U.S. Department of Commerce
National Bureau of Standards
Computer Science and Technology

NBS Special Publication 500-83
Proceedings of the Computer Performance Evaluation Users Group

17th Meeting

"Increasing Organizational Productivity"

NATIONAL BUREAU OF STANDARDS

The National Bureau of Standards¹ was established by an act of Congress on March 3, 1901.

The Bureau's overall goal is to strengthen and advance the Nation's science and technology and facilitate their effective application for public benefit. To this end, the Bureau conducts research and provides: (1) a basis for the Nation's physical measurement system, (2) scientific and technological services for industry and government, (3) a technical basis for equity in trade, and (4) technical services to promote public safety. The Bureau's technical work is per- formed by the National Measurement Laboratory, the National Engineering Laboratory, and the Institute for Computer Sciences and Technology.

THE NATIONAL MEASUREMENT LABORATORY provides the national system of physical and chemical and materials measurement; coordinates the system with measurement systems of other nations and furnishes essential services leading to accurate and uniform physical and chemical measurement throughout the Nation's scientific community, industry, and commerce; conducts materials research leading to improved methods of measurement, standards, and data on the properties of materials needed by industry, commerce, educational institutions, and Government; provides advisory and research services to other Government agencies; develops, produces, and distributes Standard Reference Materials; and provides calibration services. The Laboratory consists of the following centers:

Absolute Physical Quantities² — Radiation Research — Thermodynamics and Molecular Science — Analytical Chemistry — Materials Science.

THE NATIONAL ENGINEERING LABORATORY provides technology and technical ser- vices to the public and private sectors to address national needs and to solve national problems; conducts research in engineering and applied science in support of these efforts; builds and maintains competence in the necessary disciplines required to carry out this research and technical service; develops engineering data and measurement capabilities; provides engineering measurement traceability services; develops test methods and proposes engineering standards and code changes; develops and proposes new engineering practices; and develops and improves mechanisms to transfer results of its research to the ultimate user. The Laboratory consists of the following centers:

Applied Mathematics — Electronics and Electrical Engineering² — Mechanical Engineering and Process Technology² — Building Technology — Fire Research — Consumer Product Technology — Field Methods.

THE INSTITUTE FOR COMPUTER SCIENCES AND TECHNOLOGY conducts research and provides scientific and technical services to aid Federal agencies in the selection, acquisition, application, and use of computer technology to improve effectiveness and economy in Government operations in accordance with Public Law 89-306 (40 U.S.C. 759), relevant Executive Orders, and other directives; carries out this mission by managing the Federal Information Processing Standards Program, developing Federal ADP standards guidelines, and managing Federal participation in ADP voluntary standardization activities; provides scientific and technological advisory services and assistance to Federal agencies; and provides the technical foundation for computer-related policies of the Federal Government. The Institute consists of the following centers:

Programming Science and Technology — Computer Systems Engineering.

¹Headquarters and Laboratories at Gaithersburg, MD, unless otherwise noted; mailing address Washington, DC 20234.
²Some divisions within the center are located at Boulder, CO 80303.

NBS Special Publication 500-83

Proceedings of the Computer Performance Evaluation Users Group (CPEUG)

17th Meeting

San Antonio, Texas

November 16-19, 1981

Proceedings Editor Terry W. Potter

Conference Host Headquarters, Air Training Command Department of the Air Force

Sponsored by Institute for Computer Sciences and Technology National Bureau of Standards Washington, DC 20234

U.S. DEPARTMENT OF COMMERCE Malcolm Baldrige, Secretary

National Bureau of Standards Ernest Ambler, Director

Issued November 1981

Reports on Computer Science and Technology

The National Bureau of Standards has a special responsibility within the Federal Government for computer science and technology activities. The programs of the NBS Institute for Computer Sciences and Technology are designed to provide ADP standards, guidelines, and technical advisory services to improve the effectiveness of computer utilization in the Federal sector, and to perform appropriate research and development efforts as foundation for such activities and programs. This publication series will report these NBS efforts to the Federal computer community as well as to interested specialists in the academic and private sectors. Those wishing to receive notices of publications in this series should complete and return the form at the end of this publication.

National Bureau of Standards Special Publication 500-83 Nat. Bur. Stand. (U.S.), Spec. Publ. 500-83, 320 pages (Nov. 1981) CODEN: XNBSAV

Library of Congress Catalog Card Number: 81-600155

U.S. GOVERNMENT PRINTING OFFICE WASHINGTON: 1981

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, DC 20402. Price $9.00 (Add 25 percent for other than U.S. mailing)

Foreword

The theme of CPEUG 81 addresses the principal challenge of the 80's — INCREASING ORGANIZATIONAL PRODUCTIVITY — as well as the direct challenge to information processing professionals of increasing data processing service and quality while minimizing cost. The relevance of this theme is reflected in the topics of recent news accounts and magazine articles touching on the management challenge in all sectors of our economy. Congress and the President clearly set forth the challenge as a goal for the Federal Government in the Paperwork Reduction Act of 1980:

"automatic data processing and telecommunications technology shall be acquired and used in a manner which improves service delivery and program management, increases productivity, reduces waste and fraud, and reduces the information processing burden on the public and private sectors."

Productivity growth in both the private and public sectors is an important factor in achieving our stated national goals of:

— reducing inflationary pressures, — raising living standards, — making U.S. goods competitive in world markets, and — protecting the quality of the environment.

Productivity is defined as the relationship between the quantity of goods or services produced (output) at a given level of quality and the resources used (input). This relationship can be measured over time for an organizational unit or for an economy as a whole by comparing a specific period with a preselected base period. Productivity improvement, therefore, is an increase in the ratio of outputs to inputs; that is, a higher quantity and/or quality of goods or services at the same cost, or the same quantity of goods or services at a lower cost, without sacrificing quality.
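The definition above is simple arithmetic. As a minimal illustration (the figures below are invented for the sketch, not drawn from the proceedings), a productivity index can be computed by comparing the output/input ratio of a later period against a preselected base period:

```python
# Minimal sketch of the productivity definition given above:
# output at a given quality level divided by the resources used,
# tracked against a preselected base period.  Values are hypothetical.

def productivity(output_units, input_cost):
    return output_units / input_cost

base = productivity(output_units=10_000, input_cost=50_000)     # base period
current = productivity(output_units=12_000, input_cost=55_000)  # later period

# An index above 100 means more output per unit of input than the base period.
index = 100 * current / base
print(f"Productivity index relative to base period: {index:.1f}")
```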

It is obvious that productivity improvement is important. For a company, productivity means producing at a lower cost than the competition. For a government, productivity means providing more service for a tax dollar. For a nation, productivity means improving the standard of living, controlling inflation, and competing effectively in world markets. From 1945 to 1970, U.S. productivity increased at an annual rate of 3.2 percent. However, from 1966 on, productivity in the U.S. grew at a much more modest rate of 1.6 percent, and in 1978, the rate slipped below 1 percent. This trend continued in 1979, when productivity actually declined by almost 1 percent. The major factors which affect productivity growth are:

Human resources: constituting 70 percent of the input cost in the U.S. economy. Business Week projects that 45 million jobs will be affected by increasing automation before the year 2000 — that's half of the present total.

Technological progress: better products, processes, and systems. Promises of office automation and workflow streamlining challenge management to improve the productivity of the 55 million white collar workers in the U.S. (white collar worker productivity increased only 4 percent from 1968-1978, while blue collar productivity improved 84 percent during that period).

Investment and economic growth: new tools and capital equipment and expanding or declining markets for outputs. Business Week recently projected a tripling of U.S. investment in computer-aided design and computer-aided manufacturing equipment to about $5 billion in 1985. The greatest impediment to reaping the productivity gains from these tools was seen by many experts to be software development.

The theme of CPEUG 80 — CPE Trends in the 80's — reflected on the spectacular growth, progress, and changes in computers and computer applications in the 70's as well as future trends of the 80's. This year's theme suggests where the direction of computer usage and performance improvement efforts should be going. Increasing organizational productivity will be a focus of both Federal and commercial management for years to come. Our contributions to both increased efficiency and effectiveness in computer utilization — from microcomputers to supercomputers to digital switches in communications — will be of much importance in the drive for increased productivity. Our sensitivity and ingenuity in increasing the productivity of the user and his results, not just high CPU utilization levels and greater channel utilization, will be a key factor in our organizations' ability to succeed in increasing productivity.

I hope you both enjoy and greatly benefit from this year's conference. A truly fine team on this year's conference committee dedicated their time, talent, and effort unstintingly to bring it about. To all of them — their names appear elsewhere in these proceedings — my sincerest thanks for a job well done. I especially thank Peter J. Calomeris for an excellent job of keeping us to our deadlines and the painstaking job of arranging for all of the publications. A special thank you also goes to our hosts — the U.S. Air Force, Air Training Command.

Carl R. Palmer CPEUG 81 Conference Chairperson November 1981


Preface

The theme of CPEUG 81 directly addresses the principal challenge of the 80's — INCREASING ORGANIZATIONAL PRODUCTIVITY. To meet this challenge, information managers and performance specialists must ensure that their organizations increase the quantity and quality of automated information services while minimizing costs. The Conference program reflects the critical role of information services in the productivity and survival of today's organizations, as well as such trends as increasing personnel costs, limited budgets, and the convergence of data processing, word processing, and communications technologies. The keynote address (Business Automation and Productivity — Fact, Fiction, and Prediction) and the keynote panel (Impact of Changing Technology on Productivity) will clearly focus the Conference on this challenge.

As in previous Conferences, CPEUG 81 has expanded the size and scope of the technical program to encompass the expanding role of CPE. There are three parallel sessions instead of two, and new areas such as Office Automation are included. Track A is devoted to performance-oriented management throughout the life cycle, and includes sessions in such areas as Requirements Analysis, Capacity Planning, Systems Development, and Acquisition. Track B concentrates on special technologies (data base machines, data communications, etc.) and analysis tools/techniques important to good performance and productivity. Track B also spotlights several special application areas, including state and local government, medical, welfare, and personnel systems. Track C consists of tutorials and case studies that offer familiarization and training in the major areas addressed in the other two tracks. The program also includes three receptions and expanded Birds-of-a-Feather special interest sessions to promote the informal exchange of ideas and experiences that is a trademark of CPEUG.

Many people contributed to the success of the CPEUG 81 program. Carol B. Wilson, Program Vice-Chairperson, organized the tutorial track and several sessions in other tracks. The Conference Committee, session chairpersons, authors, tutors, and referees all deserve special recognition for their time and patience. The invaluable administrative support of Sylvia M. Mabie merits special thanks.

Thomas F. Wyrick CPEUG 81 Program Chairperson November 1981

Abstract

These Proceedings record the papers that were presented at the Seventeenth Meeting of the Computer Performance Evaluation Users Group (CPEUG 81) held November 16-19, 1981, in San Antonio, TX. With the theme, "Increasing Organizational Productivity," CPEUG 81 reflects the critical role of information services in the productivity and survival of today's organization, as well as such trends as increasing personnel costs, limited budgets, and the convergence of data processing, communications, and word processing technologies. The program was divided into three parallel sessions and included technical papers on previously unpublished works, case studies, tutorials, and panels. Technical papers are presented in the Proceedings in their entirety.

Key words: Benchmarking; capacity planning; chargeback systems; computer performance management; data base machines; end user productivity; human factors evaluation; information system management; office automation; performance management systems; resource measurement facility; simulation; supercomputers.

The material contained herein is the viewpoint of the authors of specific papers. Publication of their papers in this volume does not necessarily constitute an endorsement by the Computer Performance Evaluation Users Group (CPEUG) or the National Bureau of Standards. The material has been published in an effort to disseminate information and to promote the state-of-the-art of computer performance measurement, simulation, and evaluation.

CPEUG Advisory Board

Dennis M. Conti, Executive Secretary National Bureau of Standards Washington, DC

Richard F. Dunlavey National Bureau of Standards

Washington, DC

Dennis M. Gilbert Federal Computer Performance Evaluation and Simulation Center Washington, DC

Carl R. Palmer U.S. General Accounting Office Washington, DC

James E. Weatherbee Federal Computer Performance Evaluation and Simulation Center Washington, DC

Conference Committee

CONFERENCE CHAIRPERSON Dr. Carl R. Palmer U.S. General Accounting Office (202) 275-^797

PROGRAM CHAIRPERSON Thomas F. Wyrick Federal Computer Performance Evaluation and Simulation Center (703) 274-7910

PROGRAM VICE-CHAIRPERSON Carol B. Wilson Fiscal Associates, Inc. (703) 370-6381

PUBLICATION CHAIRPERSON Peter J. Calomeris National Bureau of Standards (301) 921-3861

PUBLICITY CHAIRPERSON Theodore F. Gonter U.S. General Accounting Office (202) 275-5040

PROCEEDINGS EDITOR Terry W. Potter Digital Equipment Corporation (617) 493-9749

ARRANGEMENTS CHAIRPERSON Dr. Jeffrey M. Mohr Arthur Young & Company (202) 828-7176

REGISTRATION CHAIRPERSON Alfred J. Perez U.S. Air Force Data Systems Design Center (205) 279-4051

AWARDS & VENDOR PROGRAM CHAIRPERSON Robert G. Van Houten U.S. Army Command & Control Support Agency (202) 695-6272

LOCAL ARRANGEMENTS CHAIRPERSON William R. Hunsicker Headquarters, ATC/ACDMX (512) 652-6471

FINANCE CHAIRPERSON James G. Sprung The MITRE Corporation (703) 827-6446

Referees

Barbara Anderson L. Arnold Johnson

H. Pat Artis John C. Kelly

Duane R. Ball Howard S. Kresin

V. David Cruz David Lindsay

Jeffrey Gee Bobby L. McKenzie

Thomas P. Giammo Jim Mulford

Dennis M. Gilbert Paul R. Nester

Theodore F. Gonter Terry W. Potter

Gregory Haralambopoulos Peter Robson

John E. Hewes Dennis Shaw

Phillip C. Howard Thomas F. Wyrick


Table of Contents

FOREWORD iii

PREFACE V

ABSTRACT vi

CPEUG ADVISORY BOARD vii

CONFERENCE COMMITTEE viii

CPEUG 81 REFEREES ix

EFFECTIVE LIFE CYCLE MANAGEMENT - TRACK A

REQUIREMENTS & WORKLOAD ANALYSIS

SESSION OVERVIEW Jim Mulford International Computing Company 5

LONG-RANGE PREDICTION OF NETWORK TRAFFIC William Alexander Richard Brice Los Alamos National Lab 7

FUNCTIONAL GROUPING OF APPLICATION PROGRAMS IN A TIMESHARE ENVIRONMENT Paul Chandler Wilson Hill Associates 15

CLUSTER ANALYSIS IN WORKLOAD CHARACTERIZATION FOR VAX/VMS Dr. Alex S. Wight University of Edinburgh 21

CAPACITY, DISASTER RECOVERY & CONTINGENCY PLANNING

SESSION OVERVIEW Theodore F. Gonter U.S. General Accounting Office 37

CAPACITY MANAGEMENT CONSIDERATIONS J. William Mullen GHR Energy Companies 39

CAPACITY CONSIDERATIONS IN DISASTER RECOVERY PLANNING Steven G. Penansky Arthur Young & Company 45

END USER PRODUCTIVITY

SESSION OVERVIEW Terry Potter Digital Equipment Corporation 57

SERVICE REPORTING FOR THE BOTTOM LINE Carol B. Wilson Fiscal Associates, Inc.

Jeffrey Mohr/Chan Mei Chu Arthur Young and Company 59

AUTOMATING THE AUTOMATION PROCESS Lou Elsen Evolving Computer Concepts Corp 67

METHODOLOGY FOR ANALYZING COMPUTER USER PRODUCTIVITY Norman Archer Digital Equipment Corporation 71

SYSTEMS DEVELOPMENT & MAINTENANCE

SESSION OVERVIEW Phillip C. Howard Applied Computer Research 83

DESIGN OF INFORMATION SYSTEMS USING SCENARIO-DRIVEN TECHNIQUES

W. T. Hardgrave/S. B. Salazar/E. J. Beller, III National Bureau of Standards 85

CONTAINING THE COST OF SOFTWARE MAINTENANCE TESTING-AN EXAMINATION OF THE QUALITY ASSURANCE ROLE WITHIN U.S. ARMY COMPUTER SYSTEMS COMMAND Maj. Steven R. Edwards U. S. Army Computer Systems Command 93

HISTORICAL FILES FOR SOFTWARE QUALITY ASSURANCE Walter R. Pyper Tracer, Inc 101

ADP COST ACCOUNTING & CHARGEBACK

A STEP-BY-STEP APPROACH FOR DEVELOPING AND IMPLEMENTING A DP CHARGING SYSTEM Kenneth W. Giese/Dean Halstead/Thomas Wyrick FEDSIM 109

CONTROL OF INFORMATION SYSTEMS RESOURCE THROUGH PERFORMANCE MANAGEMENT

SESSION OVERVIEW John C. Kelly Datametrics Systems Corporation 121

INCREASING SYSTEM PRODUCTIVITY WITH OPERATIONAL STANDARDS David R. Vincent

Boole & Babbage, Inc „ 123

BENCHMARKING

SESSION OVERVIEW Barbara Anderson FEDSIM 141

UNIVERSAL SKELETON FOR BENCHMARKING BATCH AND INTERACTIVE WORKLOADS: UNISKEL BM Walter N. Bays/William P. Kincy, Jr. Mitre Corporation 143

COMPARING USER RESPONSE TIMES ON PAGED AND SWAPPED BY THE TERMINAL PROBE METHOD Luis Felipe Cabrera University of California, Berkeley

Jehan-Francois Paris Purdue University 157

PRACTICAL APPLICATION OF REMOTE TERMINAL EMULATION IN THE PHASE IV COMPETITIVE SYSTEM ACQUISITION Deanna J. Bennett AF Directorate of System Engineering 169

COST/PERFORMANCE EVALUATION - METHODS & EXPERIENCES

AN EVALUATION TECHNIQUE FOR EARLY SELECTION OF COMPUTER HARDWARE Bart D. Hodgins/Lyle A. Cox, Jr. Naval Postgraduate School 181

DEFICIENCIES IN THE PROCESS OF PROCURING SMALL COMPUTER SYSTEMS: A CASE STUDY Andrew C. Rucks/Peter M. Ginter University of Arkansas 191

COST/PERFORMANCE COMPARISONS FOR STRUCTURAL ANALYSIS SOFTWARE ON MINI- AND MAINFRAME COMPUTERS Anneliese K. von Mayrhauser/Raphael T. Haftka Illinois Institute of Technology 199

SPECIAL TECHNOLOGIES & APPLICATIONS - TRACK B

FOCUSING ON SECONDARY STORAGE

SESSION OVERVIEW H. Pat Artis Morino Associates, Inc 221

EXPERIENCES WITH DASD, TAPE, AND MSS SPACE MANAGEMENT USING DMS Patrick A. Drayton Southwestern Bell Telephone Company 223

DASD CAPACITY PLANNING 3350 DASD Keith E. Silliman IBM Corporation 231

AN ANALYSIS OF A CDC84U-m DISK SUBSYSTEM J. William Atwood/Keh-Chiang Yu University of Texas at Austin 239

PERFORMANCE PREDICTION & OPTIMIZATION - METHODS & EXPERIENCES

SESSION OVERVIEW Thomas P. Giammo FEDSIM 249

TUNING THE FACILITIES COMPLEX OF UNIVAC 1100 OS John C. Kelly Datametrics Systems Corporation

Jerome W. Blaylock System Development Corporation 251

APPROXIMATE EVALUATION OF A BUBBLE MEMORY IN A TRANSACTION DRIVEN SYSTEM Wen-Te K. Lin Computer Corporation of America

Albert B. Tonik Sperry Univac 259

COMMUNICATIONS NETWORKS—TECHNOLOGICAL DEVELOPMENTS & APPLICATIONS

SESSION OVERVIEW Jeffrey Gee Network Analysis Corporation 269

LOCAL INTEGRATED COMMUNICATION SYSTEMS: THE KEY TO FUTURE PRODUCTIVITY Andrew P. Snow Network Analysis Corporation 271

PERFORMANCE ANALYSIS OF A FLOW CONTROLLED COMMUNICATION NETWORK NODE P. G. Harrison Imperial College, London 277


CASE STUDIES

RESOURCE MANAGEMENT IN A DISTRIBUTED PROCESSING ENVIRONMENT—COMPUTER RESOURCE SELECTION GUIDELINES Eva T. Jun U.S. Department of Energy 293

PANEL OVERVIEWS

OFFICE AUTOMATION AND PRODUCTIVITY Chairperson Mary L. Cunningham National Archives & Record Service 305

COMMAND, CONTROL, AND COMMUNICATIONS MANAGEMENT CHALLENGES IN THE 80'S Chairperson: Robert G. Van Houten HQ Army Command & Control Support Agency 309

ADP COST ACCOUNTING & CHARGEBACK Chairperson Dennis M. Conti National Bureau of Standards 311

SUPERCOMPUTERS, CPE, AND THE 80'S - PERSPECTIVES AND CHALLENGES Chairperson F. Brett Berlin Cray Research, Inc 313

ADP CHALLENGES IN MEDICAL, PERSONNEL, AND WELFARE SYSTEMS Chairperson Dinesh Kumar Social Security Administration 315

INFORMATION SYSTEMS AND PRODUCTIVITY IN STATE & LOCAL GOVERNMENT Chairperson: Ron Cornelison State of Missouri 317

THE ADP PROCUREMENT PROCESS - LESSONS LEARNED Chairperson Terry D. Miller Government Sales Consultants, Inc 319

DATABASE MACHINES Chairperson Carol B. Wilson Fiscal Associates 321

TUTORIAL OVERVIEWS

MEASUREMENT, MODELING, AND CAPACITY PLANNING: THE SECONDARY STORAGE OCCUPANCY ISSUE H. Pat Artis Morino Associates 325

MICRO-COMPUTERS IN OFFICE AUTOMATION Wendell W. Berry National Oceanic and Atmospheric Administration 327

COMPUTER PERFORMANCE MANAGEMENT IN THE ADP SYSTEM LIFE CYCLE James G. Sprung Mitre Corporation 329

SIMULATION TOOLS IN PERFORMANCE EVALUATION Doug Neuse/K. Mani Chandy/Jay Misra/Robert Berry Information Research Associates 331

COMPUTATIONAL METHODS FOR QUEUEING NETWORK MODELS Charles H. Sauer IBM Thomas J. Watson Research Center 335

PRODUCTIVITY IN THE DEVELOPMENT FUNCTION Phillip C. Howard Applied Computer Research 337

ACQUISITION BENCHMARKING Dennis Conti National Bureau of Standards 339

SOFTWARE CONVERSION PROCESS AND PROBLEMS Thomas R. Bushback General Services Administration 341

RESOURCE MEASUREMENT FACILITY (RMF) DATA: A REVIEW OF THE BASICS J. William Mullen The GHR Companies, Inc 343

Papers in this volume, except those by National Bureau of Standards authors, have not been edited or altered by the National Bureau of Standards. Opinions expressed in non-NBS papers are those of the authors, and not necessarily those of the National Bureau of Standards. Non-NBS authors are solely responsible for the content and quality of their submissions.

The mention of trade names in the volume is in no sense an endorsement or recommendation by the National Bureau of Standards.

Effective Life Cycle Management - Track A


Requirements and Workload Analysis

SESSION OVERVIEW REQUIREMENTS & WORKLOAD ANALYSIS

James O. Mulford

International Computing Company 3150 Presidential Drive Fairborn, OH 45324

Accurate definition and analysis of ADP requirements and their resultant system workload is probably the most difficult issue to be addressed by performance analysts. Requirements may be defined in numerous ways (e.g., user terminology, existing system workload, existing programs, new concepts of operations) and the performance analyst must translate these definitions into terms that can be applied in system sizing, performance analysis, benchmark definitions, or other CPE tasks. Many problems are encountered during this translation, including the following:

o large amounts of data must be analyzed
o requirements are defined in various forms
o user terminology must be understood by the analyst
o new requirements are often not fully defined

These requirements and workload analysis complexities will continue to haunt performance analysts into the 1980's. In fact, expanded use of automation to improve productivity will only add to the complexities of this analysis. Fortunately, tools and techniques that aid in requirements and workload analysis are receiving increased emphasis. Advances in ADP technology (e.g., processing speed, memory size and management, telecommunications, DBMS) have allowed this renewed emphasis on new tools and techniques. Better and more complete data on system workload can now be maintained. Advanced analysis techniques (e.g., cluster analysis, data reduction and characterization) are now being applied. Automated requirements definition tools (e.g., PSL/PSA¹) are now available.
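As a rough sketch of the kind of cluster analysis mentioned above (the program names, feature values, and the choice of a two-feature k-means grouping are illustrative assumptions, not part of the original overview), grouping programs by resource-usage features might look like this:

```python
# Illustrative sketch: cluster application programs into workload classes
# using two features per program (CPU seconds, I/O transfers per run).
# All numbers are invented examples.

def kmeans(points, k, iterations=20):
    """Minimal k-means for small 2-D workload data sets."""
    centers = points[:k]                      # naive initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each program to the nearest cluster center
            j = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                            (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        # recompute each center as the mean of its cluster
        centers = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# (CPU seconds, I/O transfers) per run for a handful of programs
workload = [(2.0, 500), (2.5, 450), (90.0, 30), (85.0, 40), (3.0, 480)]
centers, clusters = kmeans(workload, k=2)
print(centers)   # two workload classes emerge: I/O-bound and CPU-bound
```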

The following papers illustrate this renewed emphasis on innovative methods for analyzing requirements and system workload. All three papers demonstrate productive, cost effective approaches which can be applied in many environments. I sincerely hope that these papers will be followed (in future years) by additional and increasingly sophisticated approaches to requirements and workload analysis.

¹Problem Statement Language/Problem Statement Analyzer, University of Michigan.


LONG-RANGE PREDICTION OF NETWORK TRAFFIC

William Alexander Richard Brice

Los Alamos National Laboratory Computing Division Los Alamos, NM 87544

A method of making long-range computer system workload predictions is presented. The method quantifies the effect of qualitative changes in computing by identifying assumptions and by considering the effect of a change on individual users. The method is illustrated by an example involving message traffic in a large computer network.

Key words: Computer networks; long-range forecasting; user behavior; workload forecasting.

1. Introduction

Management planning procedures sometimes require computer system workload forecasts for five or even ten years in the future. Present workload prediction methods are inadequate at such long ranges because the changes in system use are likely to be qualitative rather than just quantitative. Perhaps a computer measurement professional should be reluctant to make such long-range predictions if, possibly, too much credence will be given them. However, when you must make these predictions, how do you proceed?

In this paper we present a method for making long-range workload predictions that quantifies the effects of qualitative changes in computing. Naturally, such predictions are somewhat speculative, and we claim only to provide a framework with which to organize and quantify assumptions. The method consists of constructing a polynomial expression for the workload in which each term represents the effects of one change. The terms are constructed by concentrating on the effect that the change will have on individual users. This method explicitly represents assumptions and allows parametric ranges of results.

Some papers on workload forecasting for management planning look at current workload analysis; others study the extent of growth or change in computing activities. Determination of the current workload is heavily represented, probably because it is the most straightforward process in forecasting. Prediction methods include measurement techniques [1-5],¹ abstraction of synthetic workloads from the measurements [4,6,7], and reduction of the measurement data to manageable magnitude, for example, clustering analysis [8-10]. A second category addresses forecasting from a management perspective. These papers attempt to determine growth or change in activities that may affect computing needs. Isolated approaches exist that attempt to bridge the gulf between qualitative changes in the activities and their quantitative effects on computer resource requirements. Prediction methods in this area include extrapolation from resource requirements of existing application programs [11-13], forecasting of resource requirements for applications that are not yet completely implemented [14,15], and also some effects of feedback between workload and level of service provided to users [16].

¹Figures in brackets indicate the literature references at the end of this paper.

The essence of these approaches is to determine the nature of the current computing workload and, using this information, to project the amount of similar work that will be done at some future time. There are differences in how the current workload determinations are made and in the fundamental units of measure used to describe the workload. The units of measure range from resource utilization data for specific computer components to characterizations of project activities. These approaches seem best suited for short term (1-2 year) forecasts, because the effects of quantitative changes are likely to outweigh the effects of qualitative changes during this interval.

These methods are not suited to our specific problem, which is to forecast effects of qualitative changes in computing. In particular, these approaches do not address the influence that revolutions in computing hardware and services exert on how a user does his work. A second difficulty in forecasting is that long-term forecasts are almost certain to be wrong. This difficulty suggests that these forecasts should be cast in a form that is easy to update as new information arrives. Some problems that may occur if updating is not anticipated are described in Reference 17.

In Part II, we describe the need to predict message traffic in the Los Alamos National Laboratory (Los Alamos) Integrated Computing Network up to 1990. In Part III, we explain our method and illustrate its use.

2. The Problem

2.1 Integrated Computing Network

At Los Alamos, the Central Computing Facility (CCF) includes an Integrated Computing Network (ICN) that allows all validated computer users at the Laboratory access to almost any of the machines or services of the CCF.

Figure 1 is a schematic diagram of the ICN.

[Figure 1. Functional Diagram of the ICN: terminals and concentrators at the front end, SYNC and File Transport switches, worker computers, and special service nodes (CFS, PAGES, XNET).]


At the "front end" of the Network (the right side of the diagram) an arbitrary number of terminals (currently about 1350) and remote entry stations are concentrated in stages to front end switches (the SYNCs), so that traffic can be routed between any terminal and any worker computer. Thus, aside from administrative restrictions, a user can log in on any worker from any terminal. The worker computers include three Cray-1s, four CDC 7600s, one CDC 6600, and two CDC Cyber-73 computers. Each of the worker computers is connected to the File Transport (FT) switches and, by the FT switches, to the "back end" of the Network (left side of the diagram). The FTs allow the workers to send files to each other and to the special service nodes in the Network. The special services provided by the Network at present include

o an output station (PAGES) to which are attached a wide variety of printing and graphics devices,

o a mass storage and archival facility (CFS) [18], and

o XNET, which handles file traffic between workers and computers outside the ICN.

Messages between workers and SYNCs are usually quite small and are never larger than 1000 bytes. Messages routed through the FTs can be as large as 25,976 bytes; large files are broken by the sending machine into messages no larger than this, and the messages are sent sequentially (the ICN is not a packet-switching network).

In this paper, a "message" is one user- or program-defined group of bytes (plus network header) transferred together from a source node to a destination node in the ICN. "Nodes" include terminals, worker computers, and special service stations, but not concentrators or switches (SYNCs and FTs). From the point of view of network implementation, messages are certainly the appropriate unit of workload. From a larger view, considering the ICN as a unit, one might first think of workload in terms of terminal sessions, tasks submitted for execution on worker computers, etc. We believe that messages are an additional valid measure of workload, because there is a fairly direct correspondence between user or user program commands and messages generated. Messages result from a carriage return at the terminal and from certain explicit program functions or worker computer command language commands.

With colleagues we have just begun a new network performance measurement and evaluation project on the ICN. This project includes measurement and characterization of message traffic in the Network and analytic and simulation models of the Network. With these models we are beginning to identify the critical resources in the Network as well as to investigate the effects of increased traffic load, new equipment, and alternate configurations. Both the measurements and the models are at present rather crude.

In some of the following analysis we treat short messages (less than 100 bytes) and long messages separately, because our models indicate that different resources are critical in handling them. The critical resource limiting the Network's capacity to carry large messages seems to be buffer space in the switches, while line capacity and switch processor capacity are critical for small messages.

At present there are about 3000 users of the CCF. We measure approximately 20 large and 80 small messages per second in the back end of the Network and about 100 small messages per second in the front end. This is 0.06 small and 0.007 large messages per second per user.

2.2 The Forecasting Assignment

Recently we were asked by management to predict what the network traffic in the ICN would be at various points in the future up to the year 1990, 10 years from now. Current management forecasts indicate that the number of users of the Network will grow linearly from the present 3000 to 5700 in 1990; managers also anticipate a certain number of large worker computers in the Network by that year.

If the kind of work people do and how they go about doing it both remained constant, then the problem would be relatively straightforward. We might, for example, simply predict that the load in 1990 would be

(5700/3000) * (present load)

ignoring the different number and kinds of worker machines in 1990 on the assumption that messages are generated by programs and people, not primarily by machines. However, computing habits have changed significantly in the past 10 years, and they are likely to again in the next 10. Timesharing radically altered the way people used computers in the 1970s, distributed processing and networks are doing it now, and there may be time for two more revolutions by 1990. Change seems to be a given in computing, and no one has developed a model to predict it. Thus we preceded our response with numerous caveats, and, when management promised to heed them, went to work.
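A quick arithmetic sketch of the naive projection just described (the 180 small and 20 large messages per second are the approximate present totals quoted in Section 2.1; everything else is simple scaling):

```python
# Naive projection: scale present message rates by the forecast growth in
# users, ignoring any qualitative change in how people use the network.
present_users, users_1990 = 3000, 5700
present_small, present_large = 180, 20     # messages/second, front plus back end
scale = users_1990 / present_users         # = 1.9
print(scale * present_small, scale * present_large)
```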


Clearly, the traditional PME predictive tool, namely a model of the Network, does not apply directly to this problem; models are designed to take workload as input, not to predict it. Furthermore, there exists at present no characterization of our computer workload in terms of "worksteps" or "activity units" [12,13], nor any formula for translating from these to network activity. Finally, even if we had such workload characterization and such a translation formula, it is not clear that the formula would be valid for computing conditions 10 years hence. In fact, the nature of the problem and the lack of data force us into the role of futurists, a role for which a systems analyst may be no better qualified than the next person.

3. The Solution

3.1 The Method

The central idea of our method is to concentrate on the individual user, that is, to predict the effect on the user of future changes in network equipment, topology, and services. This is clearly risky, because people are the least understood and least predictable element in computing systems. Nevertheless, this focus seems necessary, because we do, in fact, believe that network traffic is affected more by what people choose to do and how they choose to do it than by the equipment they use. Of course, network topology, equipment, and services make certain tasks easy and others more difficult, but so do other factors. We are not trying to literally predict human behavior; we are trying to orient and focus our thinking in the face of too much uncertainty.

The first step is to identify factors that will change computing in our network. Then we quantify the effect of each factor on network traffic that individual users generate. Finally, we collect the terms representing each factor into a polynomial expression.

3.2 Five Factors

We were able to identify five factors that we believe will affect the way people use the ICN in the next few years. They are as follows.

1. Specialization of the Network. At present, CFS and PAGES are specialized nodes to which users from any worker can send files for permanent storage or for output. In the future, specialized nodes for word processing, for a network status and performance data base, and for other unanticipated functions may exist. (In fact, word processing software is available on a PDP-11/70 in the Network now, but this software is not yet widely used.) In addition, the worker computers themselves may become more specialized with some machines serving mostly as number crunchers and others as general-purpose front ends to the number crunchers.

2. Increased use of intelligent and graphics terminals.

3. Proliferation of distributed processors (DPs) and local networks of DPs within the Laboratory but outside the ICN. For a variety of reasons, the number of mini- and midicomputers outside the ICN continues to grow. They are used both for specialized purposes, such as process control, and for general computing; some are connected in small local networks. Typically these can communicate with any node in the ICN via XNET.

4. Electronic mail. Some electronic mail system will probably be installed at the Laboratory within the next few years, although it may be implemented as a separate mechanism rather than through the ICN.

5. Connections with remote networks. The most likely candidates are the computing facilities at other Department of Energy laboratories. Since these installations tend, at present, to have sufficient computing power for their own needs, the connections will probably be used to transfer data, programs, reports, etc., rather than to allow remote use of our computers. Similar connections to additional networks are possible.

Each of these five factors is either a trend that we see now in computing at Los Alamos or a capability currently being discussed and considered for inclusion here. In other words, we did not attempt any serious long-range crystal ball gazing, although the method allows this if you have the courage (see Section E). In the next section, we discuss the effect of each of these five factors on network message rates.

3.3 Analysis of Factors


It seems easiest to break the estimation of the effect that a change will have on any system measure into two steps. First, one can analyze the qualitative aspects of the effect. For example, is the effect most naturally expressed as a ratio to the present number of messages a user generates or as an addition to that number? Is it independent of the user's current activity? Is it independent of the number of users? Answers to these questions will determine the position of the factor, which represents a given change in the polynomial formula for computing the value that the measure is expected to have in the future. The second step is then to plug in a numeric value for each factor, or perhaps a range of numeric values.

We will illustrate this two-step process for each of the factors described in the previous section.

1. The specialization of the Network will clearly increase message rates. As specialized service nodes are added one by one, an individual user doing tasks functionally equivalent to present tasks will generate, perhaps even unknowingly, more network messages as his files are shipped to these nodes. The portion of a user's messages due to specialization will grow in proportion to the increased specialization of the Network. Therefore, a formula for the number of small messages in the Network should contain a multiplicative factor a in a term

a*m*Ny

where Ny is the number of ICN users in the future year in question and m is the observed rate of small messages per user today. That is, specialization will increase small messages per user per unit time by some factor a. There will be a similar term

A*M*Ny

in the formula for large messages. The way specialized nodes are now used indicates that the users will mostly ship large files that will appear as large messages; this is partly a matter of economics. For every large message in our network there is at least one small protocol message, so that the absolute increase in the two types may be about equal; however, because there are presently more small than large messages, A is greater than a.

If we observe that 80% of large messages currently go to or from specialized nodes, and if we believe that a user will generate 50% more messages because of network specialization by year y, then the value for A in the formula for that year should be 1.4. We might plug in values of 1.2, 1.4, and 1.8 to get a range of answers corresponding to a range of assumptions about future network specialization.

2. The effect of intelligent and graphics terminals will be limited almost entirely to the front end of the Network. The use of graphics terminals will increase the large message rate from workers to terminals, because terminal output will sometimes consist of plot information for a full screen instead of one line of text. The effect of intelligent terminals, whether graphics or not, may be complicated. On the one hand, the ability to do local processing, especially screen editing, should result in fewer messages of much larger average size. On the other hand, some users may program their terminals to issue very frequent program or network status checks on a background basis and take some action only when a certain response is obtained, thus greatly increasing the small message rate.

In any case, the factors b and B representing this effect should probably be multiplicative as are a and A above. Management projections indicate that 1000 of the terminals in the Lab will be intelligent in 10 years. We have observed that, at present, about one-fourth of all terminals are logged in on any morning. An assumption of an upper bound of 2.5 large messages per minute at these terminals gives 625 large messages per minute, which is about half the present rate; thus, we used values of from 1.1 to 1.5 for B. We used values of from 0.9 to 1.1 for b. The small range of values for b indicates that not all terminals will be intelligent and that most messages are already small.

3. The increased use of distributed processors and of local networks will certainly decrease the ICN message rate per user. Almost all of these users' terminal traffic, which consists mostly of small messages, will be eliminated from the ICN. They will still use the ICN for executing large programs prepared locally and for special services mostly involving large files.
11 : . ;;

programs prepared locally and for spe- SMy = a*b*c*m*Ny + d*Ny + e (1) cial services mostly involving large files LMy = A"B*C*M*Ny + D-Ny + E (2) Once again, we decided that the fac- tors c and C should be ratios of the present message rates per user. where Values of from 0.5 to 1 for C and 0.25 to 1 for c seem reasonable. SMy and LMy are the number of small and large messages per 4. If electronic mail is implemented us- second in the ICN in year ing the ICN, then, obviously, message y; traffic will increase. It is not at all clear that there is any correla- m and M are the current (1980) tion between the rate at which users number of small and large currently generate messages and the messages per second per rate at which they wil] receive mail. user However, mail traffic will probably be proportional to the number of people Ny is the number of CCF users using the system. We assumed the re- that year; lationship will be linear (although there are certainly other plausible is the factor by which possibilities). Thus we included network specialization terms will affect the number of small messages per second d"Ny and D"Ny per user that year;

in the formulas for the number of b represents the effect of small and large messages. We eventu- intelligent terminals; ally decided that people would send and receive less than five large mail- c represents the effect of ings per day, which is a negligible distributed processing; addition to our load; therefore, we used the value zero for D. d is the number of small messages per user per 5. The additional message traffic caused second due to electronic by connecting our network to others mail would depend very much on the adminis- trative nature of the connection. If e is the number of addition- remote users were given essentially al small messages per the same capabilities as local users, second due to connections then the appropriate adjustment to the to external networks; and formulas is simply to increase the value of N by the number of remote A, B, C, D, and E are the corresponding fac- users. If use of the connection is tors for large messages. restricted to sharing programs, data, and reports between sites, in other We can now plug various values for each words, if the link is used as a fast of the factors into the formula and get "best substitute for the Postal Service, guess," "worst case," and other values for then the message rate might be in- message traffic. We can also experiment with dependent of the number of users alto- the effects of particular assumptions; for gether and might depend instead on example, we can assume that all terminals programmatic schedules. We assumed will be intelligent in 10 years or that elec- that the latter was more likely and tronic mail traffic is proportional to the added a simple term & to each formula square of the number of users. We can inves- to account for some small constant tigate "disaster" scenarios; to illustrate, number of messages due to this connec- we can determine the rate at which intelli- tion. gent terminal owners would have to generate status queries to the Network to saturate its 3.4 Formulas message handling capacity. Finally, we can determine by inspection or by trial which as- Collecting all the terms defined in the sumptions are most critical, for example, the previous section resulted in the following above formulas are clearly more sensitive to formulas the value of a than to the value of e.

12 3.5 Other Possible Factors total prediction. More accurate data about the effect of a given change can be easily The five change factors discussed above incorporated into the formulas so that pred- are certainly not the only ones that will af- ictions grow more accurate in an evolutionary fect computing in the Laboratory in the next way. Concentrating on the effects on indivi- few years. Since we constructed the above dual users might also work well for short- formulas, we have learned that, unknown to term predictions, but we found this method us, others in the Laboratory were already especially helpful as a way of isolating and planning another change, namely a organizing the uncertainties and shakey as- Laboratory-wide automated information manage- sumptions inherent in long-range prediction. ment system (AIMS). Some of the pieces of such a system, such as accounting programs References and some inventory programs, are already run on worker computers in the ICN. Their in- [1] Lindsay, D. S., A Hardware Monitor Study tegration into a comprehensive, widely used of a CDC KRONOS System, Proceedings, management information system would certainly International Symposium on Computer Per- increase network message rates. The point of formance Modeling Measurement and this example is that as many people as possi- Evaluation, Harvard University, March ble, from a variety of disciplines, should be 1976, pp. 136-144. included in the process of thinking of changes in computing. [2] Partridge, D. R. and Card, R. E., Hardware Monitoring of Real-Time Comput- More speculative changes than those we er Systems, Proceedings, International have given might also be included in a pro- Symposium on Computer Performance Model- jection. Very powerful processors on a sin- ing, Measurement and Evaluation, Harvard gle chip will soon be available at very low University, March 1976, pp. 85-101. cost. The use of high-quality graphics out- put devices may become much more widespread [3] McDougall, M. H. , The Event Analysis at the Laboratory to display movies (16 System, Proceedings, 1977 SIGMETRICS/CMG frames of graphics output per second) used to VIII, Washington, D.C., November 1977, study simulation modeling programs. Although pp. 91-100. present worker computers are not capable of producing 16 frames per second from these [4] Keller, T. W. and Lindsay, D. S., programs, long sequences of frames could be Development of A Performance Evaluation generated and stored in CFS; these could be Methodology for the CRAY-1, Proceedings, fetched and fed to the graphics device by the 1977 SIGMETRICS/CMG VIII, Washington, cheap powerful processor at such a rate. If D.C., November 1977, pp. 169-176. this happens, it will greatly increase the large message rate. [5] Abrams, M. D. and Neiman, D. C, NBS Network Measurement Methodology Applied 4. Conclusions to Synchronous Communications, Proceed- ings, CPEUG80, Orlando, Florida, No- Inserting our "best guess" factor values vember 1980, pp. 63-70. into the above formulas resulted in message rates for 1990 of five to six times the [6] Hughes, J. H. , A Functional Instruction present observed rates. To anyone familiar Mix and Some Related Topics, Proceed- with the history of computing, it might seem ings, International Symposium on Comput- unlikely that any workload measure on any er Performance Modeling Measurement and system will grow by "only" 500% in 10 years. Evaluation, Harvard University, March If this projection, in fact, turns out to be 1976, pp. 145-153. 
low, the reason will probably be that we failed to anticipate some development in com- [7] Terman, F. W. , A Study of Interleaved puting that radically affects network use. Memory Systems by Trace Driven Simula- The necessity of anticipating such changes tion, Proceedings, Symposium on Simula- is, of course, the greatest weakness of our tion of Computer Systems, Boulder, method; however, this weakness is inherent to Colorado, August 1976, pp. 3-10. the problem. It can be overcome somewhat by

3.5 Other Possible Factors

The five change factors discussed above are certainly not the only ones that will affect computing in the Laboratory in the next few years. Since we constructed the above formulas, we have learned that, unknown to us, others in the Laboratory were already planning another change, namely a Laboratory-wide automated information management system (AIMS). Some of the pieces of such a system, such as accounting programs and some inventory programs, are already run on worker computers in the ICN. Their integration into a comprehensive, widely used management information system would certainly increase network message rates. The point of this example is that as many people as possible, from a variety of disciplines, should be included in the process of thinking of changes in computing.

More speculative changes than those we have given might also be included in a projection. Very powerful processors on a single chip will soon be available at very low cost. The use of high-quality graphics output devices may become much more widespread at the Laboratory to display movies (16 frames of graphics output per second) used to study simulation modeling programs. Although present worker computers are not capable of producing 16 frames per second from these programs, long sequences of frames could be generated and stored in CFS; these could be fetched and fed to the graphics device by the cheap powerful processor at such a rate. If this happens, it will greatly increase the large message rate.

4. Conclusions

Inserting our "best guess" factor values into the above formulas resulted in message rates for 1990 of five to six times the present observed rates. To anyone familiar with the history of computing, it might seem unlikely that any workload measure on any system will grow by "only" 500% in 10 years. If this projection, in fact, turns out to be low, the reason will probably be that we failed to anticipate some development in computing that radically affects network use. The necessity of anticipating such changes is, of course, the greatest weakness of our method; however, this weakness is inherent to the problem. It can be overcome somewhat by requesting input from as many people as possible.

Our method of prediction presented in this paper identifies specific assumptions. It allows experimenting with different values of factors to see the part each plays in the total prediction. More accurate data about the effect of a given change can be easily incorporated into the formulas so that predictions grow more accurate in an evolutionary way. Concentrating on the effects on individual users might also work well for short-term predictions, but we found this method especially helpful as a way of isolating and organizing the uncertainties and shaky assumptions inherent in long-range prediction.

References

[1] Lindsay, D. S., A Hardware Monitor Study of a CDC KRONOS System, Proceedings, International Symposium on Computer Performance Modeling, Measurement and Evaluation, Harvard University, March 1976, pp. 136-144.

[2] Partridge, D. R. and Card, R. E., Hardware Monitoring of Real-Time Computer Systems, Proceedings, International Symposium on Computer Performance Modeling, Measurement and Evaluation, Harvard University, March 1976, pp. 85-101.

[3] McDougall, M. H., The Event Analysis System, Proceedings, 1977 SIGMETRICS/CMG VIII, Washington, D.C., November 1977, pp. 91-100.

[4] Keller, T. W. and Lindsay, D. S., Development of A Performance Evaluation Methodology for the CRAY-1, Proceedings, 1977 SIGMETRICS/CMG VIII, Washington, D.C., November 1977, pp. 169-176.

[5] Abrams, M. D. and Neiman, D. C., NBS Network Measurement Methodology Applied to Synchronous Communications, Proceedings, CPEUG80, Orlando, Florida, November 1980, pp. 63-70.

[6] Hughes, J. H., A Functional Instruction Mix and Some Related Topics, Proceedings, International Symposium on Computer Performance Modeling, Measurement and Evaluation, Harvard University, March 1976, pp. 145-153.

[7] Terman, F. W., A Study of Interleaved Memory Systems by Trace Driven Simulation, Proceedings, Symposium on Simulation of Computer Systems, Boulder, Colorado, August 1976, pp. 3-10.

[8] Mamrak, S. A. and Amer, P. D., A Feature Selection Tool for Workload Characterization, Proceedings, 1977 SIGMETRICS/CMG VIII, Washington, D.C., November 1977, pp. 113-120.

[9] Agrawala, A. K. and Mohr, J. M., The Relationship Between the Pattern Recognition Problem and the Workload Characterization Problem, Proceedings, 1977 SIGMETRICS/CMG VIII, Washington, D.C., November 1977, pp. 131-139.

[10] Artis, H. P., A Technique for Establishing Resource Limited Job Class Structure, Proceedings, CMG X, Dallas, Texas, December 1979, pp. 249-253.

[11] Linde, S. and Morgan, L., Workload Forecasting for the Shuttle Mission Simulator Computer Complex at Johnson Space Center, Proceedings, CMG XI, Boston, Massachusetts, December 1980, pp. 10-15.

[12] Mohr, J. and Penansky, S. G., A Framework for the Projection of ADP Workloads, Proceedings, CMG XI, Boston, Massachusetts, December 1980, pp. 5-9.

[13] Perlman, W. and Jozwik, R., DP Workload Forecasting, Proceedings, CMG X, Dallas, Texas, December 1979, pp. 469-475.

[14] Smith, C. and Browne, J. C., Modeling Software Systems for Performance Predictions, Proceedings, CMG X, Dallas, Texas, December 1979, pp. 321-342.

[15] Smith, C. and Browne, J. C., Performance Specifications and Analysis of Software Designs, Proceedings, Conference on Simulation Measurement and Modeling of Computer Systems, Boulder, Colorado, August 1979, pp. 173-182.

[16] Axelrod, C. W., The Dynamics of Computer Capacity Planning, CMG Transactions, Number 29, September 1980, pp. 3.3-3.21.

[17] Tomberlin, R. D., Forecasting Computer Processing Requirements: A Case Study, Proceedings, CPEUG80, Orlando, Florida, November 1980, pp. 255-261.

[18] Blood, M., Christman, R., and Collins, B., Experience with the LASL Common File System, Digest of Papers from Fourth IEEE Symposium on Mass Storage Systems, Denver, Colorado, April 1980, pp. 51-54.

FUNCTIONAL GROUPING OF APPLICATION PROGRAMS

IN A TIMESHARING ENVIRONMENT

Paul Chandler, CCP, CDP

Wilson Hill Associates 1025 Vermont Ave., N.W., #900 Washington, DC 20005

The ability to adequately describe the computing environment is important in the selection of timesharing services. A method is presented by which the user can functionally describe the workload in terms of require- ments for processing, data input, and data output. Large numbers of programs can thereby be reduced to only eight functional categories which are sufficient to characterize the workload. Unsophisticated non-ADP oriented users who are unable to use traditional per- formance or resource oriented groupings can use this functional method satisfactorily.

1. Introduction

Procurement of a timesharing service can be a lengthy and costly process in many large procurement efforts. The primary objective of such procurements usually is to provide the most cost effective computer system which will meet both current and future needs.

The acquisition of a computer system presupposes that the workload is well defined. Typical parameters used include turnaround time, core size or segment, CPU time, number of I/O transfers, number of lines printed, etc. However, many of these parameters are available only through specially implemented hardware or software monitors. These monitors require knowledgeable ADP personnel sensitive to procurement needs, as well as a thorough understanding of system requirements.

Many procurement efforts suffer from a lack of specific resource or performance areas and are left with only a functionally-oriented benchmark which does not lend itself well to a lowest cost acquisition. Also, many procurements must deal with a sizable workload and cannot simplify their workload easily. Additionally, the procurement specialist who is conducting the computer acquisition process may be far removed from daily processing requirements or else be faced with relatively unknowledgeable ADP users who do not understand the details of machine processing.

This inability to characterize the workload is the subject of this paper. Outlined is a method by which the benchmark workload can be segmented by a reduction in the number of programs into distinct classes.

2. Methodology

This method provides for a functional grouping of application programs using three representative criteria. A dichotomous classification exists already in the minds of many ADP personnel of I/O versus CPU. This method separates the "I" from


the "O" and uses the three parameters Not included are specific per- of formance oriented data such as speed of execution or elapsed seconds for o Processing a file I/O, or resource oriented data such as CPU seconds, core size o Input used, number of bytes transferred from disk to core, etc. Only gen- o Output erally available data obtained from a knowledgeable user is required By using a selection method by without specific hardware or soft- which each criteria is judged to be ware monitoring. either an important or less impor- tant functional attribute of each Following this qualitative data application program, a clustering of collection, each of several knowl- programs emerges which share similar edgeable ADP persons reviews the functionally oriented characteris- collected data. These persons need- tics . not be users or persons from whom data was collected. These persons Figure 1 shows the eight possi- then make a binary high/low choice ble ratings using the three criteria for each of the three criteria iden- described tified earlier for each application program. Each person is asked to Class Rating Factors work alone for the initial selec- ID CPU Input Output tion. Upon completion, for each person, there will be a rating for Al Low Low Low each program as to their evaluation A2 Low Low High of program functionality in each of A3 Low High Low three categories. A4 Low High High A5 High Low Low The ratings of each application A6 High Low High program are compared and the persons A7 High High Low asked to reevaluate their assign- A8 High High High ments if conflicts arise. Addition- ally, some information as to data or Figure 1. Criteria Classes other interdependencies between pro- grams might be used to classify an The assignment of application application, as well. programs to a class depends upon the relative rating of each of the three The result of the selection criteria to the application. In process is that for each application order to perform this classification, there are three descriptors. For standardized and descriptive data the binary high/low judgment re- must be collected for each applica- quired for this model across all tion system. three criteria, eight different com- binations are possible. Figure 2 Data typically can be collected shows these eight combinations in a by means of a comprehensive data spatial relationship. collection effort which results in application being described in each A8 A7 functional terms. Such a descrip- tion will include a narrative de- scription of the program's primary purpose, an estimate of required data input and output, and general qualitative descriptions of program processing.

Additional data may be collect- ed for some quantitative data such as number of files, number of re- cords or transactions processed, size of the data base, language Figure 2. Spatial Relationship type, and number of tape mounts. of Criteria
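As an illustration of the selection method just described, the following sketch (not part of the original procedure; program names and ratings are hypothetical) assigns each application to one of the eight classes of Figure 1 from the independent high/low choices of several evaluators, using a simple majority vote to stand in for the reconciliation step.

```python
# Illustrative sketch: assigning application programs to the eight criteria
# classes of Figure 1 from independent high/low ratings by several evaluators.
# A majority vote stands in for the "compare and reconcile" step in the text.
from collections import Counter

CLASS_IDS = {  # (CPU, Input, Output) -> class, as in Figure 1
    ("Low", "Low", "Low"): "A1",   ("Low", "Low", "High"): "A2",
    ("Low", "High", "Low"): "A3",  ("Low", "High", "High"): "A4",
    ("High", "Low", "Low"): "A5",  ("High", "Low", "High"): "A6",
    ("High", "High", "Low"): "A7", ("High", "High", "High"): "A8",
}

def consensus(ratings):
    """Majority high/low rating for one criterion across evaluators."""
    return Counter(ratings).most_common(1)[0][0]

def classify(evaluations):
    """evaluations: {program: [(cpu, input, output), ... one tuple per evaluator]}"""
    classes = {}
    for program, votes in evaluations.items():
        cpu, inp, out = (consensus(column) for column in zip(*votes))
        classes[program] = CLASS_IDS[(cpu, inp, out)]
    return classes

# Hypothetical example: two evaluators rating two programs.
example = {
    "PAYROLL-EDIT": [("High", "High", "Low"), ("High", "High", "Low")],
    "STATUS-QUERY": [("Low", "Low", "Low"), ("Low", "Low", "Low")],
}
print(classify(example))   # {'PAYROLL-EDIT': 'A7', 'STATUS-QUERY': 'A1'}
```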


Each of these dimensions can be thought of in a spatial relationship in such a way that the selection of these criteria will define a larger group of applications.

3. Application of Method

This method anticipates that a natural grouping of programs will occur, and that groups of applications will tend to have similar characteristics based solely on the qualitative ranking of individual applications.

An example is drawn from a recent Federal procurement effort of around 6.5 million dollars including 96 application programs in a commercial timesharing environment. This procurement effort was characterized by a wide geographic dispersion of the user community, little or no information about application systems, or any other good, well-defined quantitative measurements.

Following the application of this method, the 96 programs were classed in eight functional categories out of which one or two programs could be selected for benchmarking. Figure 3 shows the distribution of these programs across groups.

                                       Number of           Rating Factors
     ID   Group Name                   Applications    CPU     Input    Output

     A1   System Support                   19          Low     Low      Low
     A2   Display Oriented Output          17          Low     Low      High
     A3   Data Entry                        1          Low     High     Low
     A4   Information Manipulation          7          Low     High     High
     A5   Basic Analysis                   29          High    Low      Low
     A6   Display Oriented Analysis         9          High    Low      High
     A7   Complex Data Entry/Edit           7          High    High     Low
     A8   Complex Data Manipulation         7          High    High     High

     Figure 3. Group Distribution

Also included in this example is an assignment of a group name for each of the eight classes. The paragraphs describe each functional group as used in this illustration. Similar names and characteristics could be defined for another set of applications.

A1 - System Support

     This group comprises those application programs with little or no requirement for either CPU, input or output. These application systems tend to be either in use at remote sites or of an infrequent nature, or used at a central location for selected special purposes. Included are those applications which query current operating system status, perform rudimentary reports or text storage of small files, or simple computational programs.

A2 - Display Oriented Output

     This group relies to a greater extent on some kind of formatted output either in tabular, report, or plotted form. Computational usage appears to be low, and most applications seem to be parameter driven with small operator input required. The data bases used for this group appear to be fixed and well defined with infrequent updates. Also included are applications with high volume, repetitive type output.

A3 - Data Entry

     Although only one program was classed in this group for the applications surveyed, this group might be characterized by those programs which are primarily input dependent, and which include a substantial processing of data in file building.


A4 - Information Manipulation

     This group depends on a fair amount of operator supplied input and produces voluminous output. Computational resources are used in the building of file indexes, but no inter-file relationships or permanent index structure (such as in DBMS applications) occurs. Simple statistical methods may be used in some applications.

A5 - Basic Analysis

     This group contains those applications in which a fairly sophisticated computational usage of machine resources is needed, either through direct number crunching or in the building of file indexes. Both input and output are equally weighted, but form a low part of the resources used.

A6 - Display Oriented Analysis

     This group represents a sophisticated usage of output facilities to the exclusion of operator supplied input. Proprietary display packages are used in some applications. The computational resources used appear to be high.

A7 - Complex Data Entry/Edit

     Unlike the Data Entry group (A3), these applications utilize on-line input of data with on-line correction and validation of input. The computational usage is also high. Output is a secondary consideration here from the user viewpoint, since the primary purpose of these applications is to facilitate data entry and validation.

A8 - Complex Data Manipulation

     This group was judged to have the highest need and use of all three criteria measured. The overriding concern addressed here is the apparent need for high volume input and output together with complex data manipulation or index structures.

Once the initial grouping was made, a second pass was made of each group in order to generalize common features among programs. Of the 96 applications to which this method was applied, less than 7 of these had to be reevaluated, and then only a shift in one criterion resulted.

4. Discussion

A possible problem with the validity of this method might occur when only small application program environments are considered. In the example presented, a good distribution occurred across all groups but one, and the distribution appeared to be non-random. In procurement efforts where there are only a limited number of programs, it might be difficult to be certain that the best grouping had been achieved.

The familiarity of the evaluators with ADP was initially considered to be a potential problem, but did not prove to be insurmountable in this study as knowledgeable personnel who had some ADP background were available. In those instances where the evaluators are non-ADP oriented, it seems likely that additional qualitative data must be collected.

An interesting extension of this method would include the quantification of each workload criterion into small discrete steps. Each step could represent an increment of the respective criteria such that all values between an arbitrarily low and high point would be represented. The workload could then be divided up using the method presented here.

For the example given, by far the greatest time was spent in data collection. The actual evaluation of applications only required a few days. It seems likely that those procurement specialists faced with knowledgeable ADP persons will be in a better position than those faced with unfamiliar users.

In the environment where there is a scarcity of information, this


approach appears to offer some help in workload quantification. It provides a method, acceptable from a procurement standpoint, by which a savings of time and money can be obtained in an environment where there is little information.
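A minimal sketch of the extension suggested above, under the assumption that each criterion is quantised into a fixed number of discrete steps between an arbitrarily chosen low and high point; the function and parameter names are illustrative and are not part of the method as published.

```python
# Hypothetical sketch: instead of a binary high/low judgment, each criterion
# is mapped onto a small number of discrete steps, and the workload is then
# partitioned on the resulting tuples of levels.
def quantise(value, low, high, steps=4):
    """Map a raw criterion value onto one of `steps` discrete levels."""
    if value <= low:
        return 0
    if value >= high:
        return steps - 1
    width = (high - low) / steps
    return int((value - low) // width)

def workload_classes(programs, bounds, steps=4):
    """programs: {name: {'cpu': x, 'input': y, 'output': z}} with raw scores.
    bounds: {'cpu': (low, high), ...} chosen by the evaluators."""
    classes = {}
    for name, scores in programs.items():
        key = tuple(quantise(scores[c], *bounds[c], steps)
                    for c in ("cpu", "input", "output"))
        classes.setdefault(key, []).append(name)
    return classes

# e.g. workload_classes({"JOB-A": {"cpu": 0.9, "input": 0.2, "output": 0.7}},
#                       {"cpu": (0, 1), "input": (0, 1), "output": (0, 1)})
```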

The author is grateful to Dr. Shahla Butler for her inspiration in this effort, and to Thomas Mink for his review of the manuscript.

References

[1] Bayraktar, A. N., Computer Selection and Evaluation, June 1978, U.S. Dept. of Commerce, NTIS #AD-A061-070.

[2] Ferrari, Domenico, Computer Systems Performance Evaluation, Prentice-Hall, 1978.

[3] National Bureau of Standards, Guidelines for Benchmarking ADP Systems in the Competitive Procurement Environment, FIPS PUB 42-1, May 15, 1977.

[4] National Bureau of Standards, Guideline on Constructing Benchmarks for ADP System Acquisitions, May 19, 1980.


CLUSTER ANALYSIS IN WORKLOAD CHARACTERISATION FOR VAX/VMS

Alex S. Wight*,**

Department of Computer Sciences Purdue University West Lafayette, IN 47907

*Permanent address: Department of Computer Science, University of Edinburgh, Edinburgh, U.K. EH9 3JZ

**This work was done while the author was with Digital Equipment Corporation, Maynard, MA 01754.

Accurate models of computer system workloads are desirable for system modelling, performance evaluation studies, capacity planning, benchmarking, tuning and system selection. The use of cluster analysis for workload characterisation has received much attention recently. In this paper we explore the use of cluster analysis for characterisation of workloads on VAX systems running the VMS operating system. In a largely interactive environment we take as our basic workload unit the program or executable image invoked by a command from a terminal user, e.g. file directory manipulation, editor, compiler, application program. We discuss some of the problems of cluster analysis and the possible uses of cluster descriptions.

Key words: Cluster analysis; workload characterisation.

1. Introduction

There are many reports of studies of computer system workload characterisation [3,4,5,6,10,13,24,25,27,31]. Accurate models of workloads are desirable for system modelling, performance evaluation studies, capacity planning, benchmarking, tuning and selection. The use of cluster analysis in this work is not new [28]. Clustering algorithms have been developed as tools in a wide range of disciplines where extensive data analysis and recognition and classification of patterns are important. There exist both good surveys [12,14,15,18,19,20,22] and a number of books which describe clustering algorithms and the analysis of the results of clustering [7,16,21,26,29,30,33,34,35]. The application of cluster analysis to the problem of providing concise descriptions of computer system workloads has been advanced most recently by Artis [6], Mamrak and Amer [32] and Mohr and co-workers [3,4,5].


In particular Agrawala et al [4] discuss the validity of using cluster analysis, while Artis [6] and Agrawala and Mohr [3] also discuss the representativeness and usefulness of the clusters which are generated. Mamrak and Amer [32] and Hughes [27] discuss the problem of deciding which of the available, measured features can be used to describe the workload units submitted to the clustering process.

These studies illustrate the feasibility of applying cluster analysis to accounting log or system measurement data and generating concise descriptions of the workload submitted to the computer system in the measurement period. If a clear picture of a stable workload already exists we do not need sophisticated analysis to confirm it. Cluster analysis may be used either to discover if available data points lie within known or expected groupings, or to discover if they do exhibit any structure or concise groupings.

There is a very large number of publications on the use of cluster analysis. Useful starting points are the papers by Dubes and Jain [18,19] which describe and examine the problems of amalgamating the results of cluster analyses, statistical analyses, intuition and insight. Kendall [30,31] makes many points advising caution in the use of cluster analysis and stressing the importance of the analyst who is familiar with the context of the data and appreciates the strengths and weaknesses of the algorithms employed. Clustering results must be interpreted with care. Diday [17] discusses some possible new approaches.

The VAX/VMS operating system running on the DEC VAX 11/780 is being used in an ever-widening range of environments. To characterise the workloads in these environments requires a comprehensive methodology. In this paper we present an initial approach to the use of cluster analysis as part of the methodology required to get useful models for a number of possibly very diverse workloads. In a largely interactive environment we take as our basic workload unit the program or executable image invoked by a command from a terminal user, e.g. file directory manipulation, editor, compiler etc.

2. Tools

Our starting point was Artis' very comprehensive discussion of the use of cluster analysis for capacity planning [6]. A search of the literature [7,11,12,14,16,18,19,20,29] reveals ISODATA [8,9,23] as one of the most popular clustering algorithms. This may not be unrelated to the fact that its heuristic nature makes analysis of it difficult! Cluster analysis, which attempts to classify a population of objects on the basis of features describing each object, has developed in many disciplines ranging through the biological and social sciences and engineering. With a version of ISODATA available, the example of Artis and the recurring commendation of ISODATA in the literature, we chose to use ISODATA.

The standard accounting log of VMS does not provide sufficiently detailed resource usage data for characterising the submitted work at the level of individual programs. Ultimately we are interested in characterising in detail interactive users and the programs they invoke. VMS was adapted to write additional messages to the accounting log recording statistics for each executed image or command.

These statistics cover:

     CPU time
     Elapsed time
     Memory occupancy
     Disc traffic
     Terminal traffic
     Paging activity

Since it obviously imposes an additional overhead and causes interference, this form of monitoring can be switched off and on for periods of interest.
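The following sketch illustrates how per-image accounting statistics of the kind listed above might be turned into feature vectors for clustering; the record fields and names are assumptions made for illustration and do not reflect the actual VMS accounting record layout.

```python
# Minimal sketch (field names are assumptions, not the VMS record layout) of
# turning per-image accounting statistics into the kind of feature vectors
# used later: raw consumption values plus rates expressed per CPU second.
from dataclasses import dataclass

@dataclass
class ImageRecord:                 # one accounting-log entry per executed image
    name: str
    cpu_seconds: float             # sampled CPU time
    elapsed_seconds: float
    direct_ios: int                # disc traffic
    buffered_ios: int              # terminal traffic
    page_reads: int                # paging activity

def features(rec, min_cpu=0.01):
    """Five features per image: CPU time, elapsed time, and three I/O rates.
    A small floor on CPU time avoids dividing by a zero sample count."""
    cpu = max(rec.cpu_seconds, min_cpu)
    return [rec.cpu_seconds,
            rec.elapsed_seconds,
            rec.direct_ios / cpu,
            rec.buffered_ios / cpu,
            rec.page_reads / cpu]
```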


CPU time is not recorded accurately, but in the standard VMS form, by sampling and recording with a 10 millisecond unit. Attempting to account CPU time accurately at the 1 microsecond level raised the total data collection overhead from around 5% to over 10% and was abandoned. Records from a relevant period of an available accounting log can be extracted and used as data for statistical and cluster analysis.

As we have stated already, a majority of authors in the clustering literature advocate caution in applying a chosen algorithm to new data; in particular see [15,18,19,22,30,31].

3. Testing

We proceeded as follows:

(a) Fisher's Iris data [30] has been used many times to test clustering algorithms. We checked that our ISODATA program gave results consistent with Kendall's analysis [30].

(b) We generated artificial data representing points on a 2-dimensional grid. These represented intuitive ideas of dense, diffuse and scattered clusters, clusters of odd shape and outliers. Dubes and Jain [18] give the results of applying different clustering algorithms to similar artificial data. Our results were at the same time disturbing and reassuring. It was a worthwhile exercise to gain confidence in choosing values for the ISODATA control parameters, but a salutary lesson in treating all clustering results with care, since it is certainly possible even with a known algorithm and "obvious" data to generate quite different clusters from runs with different parameter settings.

(c) We next proceeded to use image accounting data collected from a VAX system which was processing a known and well-defined load of interactive scripts submitted from a remote terminal emulator. This gave us the opportunity to explore the interplay of cluster analysis and performance evaluation. We had to consider the problems of feature selection and scaling and continue exploring parameter settings in this much more complex case. Here again the literature generally persuaded us to follow Artis's example and scale the data using the robust mean and standard deviation for each feature, such that for each feature the data is normalised with unit variance. The BMDP statistical package was used for statistical analysis of the data. Use of robust statistics for scaling reduces the effect of outliers, but one must always remember that any scaling affects the formation of clusters. For this known data the results were encouraging and the expected groupings did appear.

4. Points from the Tests

The use of a remote terminal emulator provides a desirable controlled environment for this form of testing as well as more general performance work [1,2,36,37]. In this case we can explore the effects on the clusters we see of different feature sets, e.g. a resource consumption set and a rate of resource consumption set [3]. For exploratory analysis of this type one expects to iterate between clustering and examining and interpreting the clusters produced. Anderberg [7] suggests generating a first set of clusters and investigating individual clusters before perhaps re-clustering. Outliers can be considered at this point too. Note the difference between ISODATA, which weights features equally and attempts to reduce the variance of all features within each cluster formed, and the approach of Agrawala and Mohr [5] where it is intended that the clusters formed reflect their "natural" shapes, with different weights being given to features in different clusters as they are established. ISODATA tends to produce hyperspheres and treats all clusters identically. An "elongated" cluster may not be immediately obvious. If it is contained uniquely within a hypersphere it must be deduced by examining the variances of the features for members of the cluster.
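A sketch of the scaling step described in (c) above. The exact robust estimator is not specified here, so a trimmed mean and standard deviation are assumed as one plausible choice; each feature is centred and divided by its robust spread so that it enters the clustering with roughly unit variance.

```python
# Sketch of robust scaling, assuming trimmed estimates of mean and spread
# (an assumption; the paper does not pin down the estimator used).
def trimmed(values, frac=0.1):
    """Values with the top and bottom `frac` removed, to blunt outliers."""
    v = sorted(values)
    k = int(len(v) * frac)
    return v[k:len(v) - k] if len(v) > 2 * k else v

def robust_scale(matrix, frac=0.1):
    """matrix: list of feature vectors (rows = images, columns = features)."""
    cols = list(zip(*matrix))
    scaled_cols = []
    for col in cols:
        core = trimmed(col, frac)
        mean = sum(core) / len(core)
        var = sum((x - mean) ** 2 for x in core) / max(len(core) - 1, 1)
        sd = var ** 0.5 or 1.0          # guard against a constant feature
        scaled_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*scaled_cols)]
```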


Otherwise the analyst must deduce from the collection of cluster descriptions that a series of small, possibly dense clusters are in fact closely related and when viewed on a larger scale form one identifiable group.

5. Experiment

The image accounting package was installed on a VAX system in use for software development. Over a working day 1791 image records were recorded. The following five features were chosen for the cluster analysis:

     CPU time
     Elapsed time
     Direct I/O rate (request/CPU sec.)
     Buffered I/O rate (request/CPU sec.)
     Page read rate (request/CPU sec.)

Statistics for these features are displayed in Figure 1. The clustering program was run with an upper limit of 50 clusters allowed. It is clear from the statistical summaries that we are likely to reach this number. In fact after 10 iterations of trying to improve the clusters we have reduced the number of clusters to 37.

6. Cluster Summaries

At this point when we look closely at the ISODATA summary statistics we find the average distance of patterns or data points from their centroids or cluster centres to be 1.8, while the average distance between cluster centres is 31.2. The co-ordinates of a cluster centroid are the mean values for each feature of the cluster members. Note that these are distances in the transformed data space. This gives a picture of tight, well-scattered clusters. Only four of the thirty-seven have points with an average distance from the centroid greater than 3.0, compared with an average distance between centroids of 31.2. For the cluster centroids the distance to the nearest centroid ranges from 1.75 to 51.2 with a mean of 8.0.

7. Validity

Dubes and Jain [19] discuss in detail the question of how valid clusters are. They feel that in general, engineering applications, including computer science, tend to ignore cluster validity and authors are happy to demonstrate that the algorithm works in some sense that they consider suitable. We agree with the observation and criticism but nevertheless proceed, taking the view that cluster analysis is an exploratory tool to be used with care in examining the available data, investigating hypotheses and seeking underlying structure. The density of our individual clusters and the way they are spread satisfies us while we examine the populous clusters in the light of our knowledge of the workload.

8. Cluster Descriptions

Figure 2 shows that the 8 clusters with the most members include 94.8% of the images. If these clusters have interpretable characteristics we have a starting point for a characterisation of the workload. The values of the centroids of these clusters are displayed in Figures 3, 4, 5 in terms of the untransformed values of the features. For each cluster we can look for the dominant features, and for each image name we have a list of the clusters in which it appears. The groupings of the clusters can now be checked and interpreted by consulting this list. Although there are cases where images appear in many different clusters, interpretation is possible. The cluster groupings of Figure 2 are roughly as follows.

[1] Clusters 1, 5 and 10 are represented by average values of the features chosen and turn out to be largely calls on standard command language utilities manipulating files and directories. Such commands, where they appear in other clusters, are cases where they have generated a large value for a particular feature and been treated as a separate cluster, since ISODATA will split a cluster if there is a large variance in one feature.

[2] Clusters 14 and 18 are dominated by demands for CPU time and high terminal traffic represented by buffered I/O. The images represented here are editor use, mail utilities and application programs.

[3] Cluster 21 is dominated by a high page read rate. The members of 21 are subsets of some of the previous images but also closely clustered.

[4] The grouping of clusters 7 and 9 is dominated by a high buffered I/O rate and these clusters contain images making heavy use of video terminals.
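The summary statistics quoted in Section 6 (average point-to-centroid distance, distances between centroids) can be computed from any clustering of the scaled feature vectors; the sketch below is illustrative and is not specific to ISODATA.

```python
# Sketch of the Section 6 summary statistics for an arbitrary clustering.
# Assumes at least two non-empty clusters of vectors in the transformed space.
from math import dist   # Euclidean distance, Python 3.8+

def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

def cluster_summary(clusters):
    """clusters: list of lists of feature vectors."""
    centres = [centroid(c) for c in clusters]
    within = [sum(dist(p, ctr) for p in c) / len(c)
              for c, ctr in zip(clusters, centres)]
    nearest = [min(dist(a, b) for j, b in enumerate(centres) if j != i)
               for i, a in enumerate(centres)]
    n_points = sum(len(c) for c in clusters)
    return {
        "avg_point_to_centroid": sum(len(c) * w for c, w in zip(clusters, within)) / n_points,
        "nearest_centroid_min": min(nearest),
        "nearest_centroid_max": max(nearest),
        "nearest_centroid_mean": sum(nearest) / len(nearest),
    }
```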


Rather than continue in this vein examining all the remaining very small clusters, we first complete the broad picture with a more conventional statistical analysis. Returning to the raw data we list for each feature the images with the highest values. When we trace those to the clusters we find that they have been identified as clusters, and often the elongation effect can be discerned, i.e. a number of small clusters have the same dominant feature and are seen to be identified as nearest neighbours in the transformed hyperspace.

Turning to Figure 6 and the most frequently occurring images we find as follows. The most common image with 312 instances has all but 16 of these in clusters 1, 5 and 10. Proceeding to the second and third, all except 4 of 194 are in 1, 5 and 10 and all except 7 of 190 likewise. There are 77 unique images recorded and 76% of all images recorded are accounted for by instances of the 10 most frequently-occurring images.

9. Workload Modelling

The clusters found can be used as a basis for a model of the workload. For each important cluster that we identify, real or synthetic programs must be available or generated with characteristics matching those of "average" cluster members. The number of occurrences of a model job in the workload model must be adjusted so that the load on the system generated by each cluster is maintained. More detailed study of the accounting log allows us to look for variations in the load over time and fluctuations in the mix of active programs. The log also contains sufficient information for commonly occurring sequences of commands to be found and incorporated in models. We do not of course get user characteristics like typing speeds and think times.

We may hope that the largest clusters formed from a series of logs allow us to model what appears to be a stable component of the workloads under study. However we must also take note of the outliers or very small clusters as they may turn out to make some of the greatest demands for system resources or be characterised by very variable activity. If the small clusters and outliers make significant demands on the system then representing their effect is important, but may be more difficult if they differ dramatically from site to site, period to period and especially if they largely reflect the choice of features for the cluster generation. This form of variation is also influenced by the level of workload unit we choose and the nature of general-purpose interactive computing.

Note also that having identified clusters we must look at them in terms of features not used for clustering, both in construction of models and possible amendments to the clustering and characterisation methodology.

10. Discussion

Cluster analysis has often been discussed and used for workload characterisation in general and benchmark construction in particular. Much useful work can be done with fixed workloads [1,2,36,37] but as user communities broaden and technological change speeds changes in user demands on computing systems, it is desirable to attempt to generate currently representative workload models and keep them up-to-date.
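A hedged sketch of the adjustment rule described in Section 9: each important cluster is represented by one model job whose repetition count is chosen so that the model reproduces the cluster's share of the measured load. Total CPU seconds is assumed as the load metric here; any other measured resource could be substituted.

```python
# Sketch of scaling model-job counts so that each cluster's measured load is
# reproduced (CPU seconds assumed as the load metric).
def occurrences_per_cluster(clusters, model_job_cpu):
    """clusters: {cluster_id: [cpu_seconds of each member image]}
    model_job_cpu: {cluster_id: CPU seconds of the chosen model job}."""
    counts = {}
    for cid, members in clusters.items():
        load = sum(members)                      # measured load of the cluster
        per_job = model_job_cpu[cid]
        counts[cid] = max(1, round(load / per_job)) if per_job > 0 else 0
    return counts

# Hypothetical use: a cluster consumed 420 CPU s; its model job costs 2.1 CPU s,
# so the workload model runs it about 200 times in the measurement period.
print(occurrences_per_cluster({1: [2.0] * 210}, {1: 2.1}))   # {1: 200}
```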


This work explores part of the methodology and indicates that cluster analysis can be used for data from highly interactive systems with a wide range of user demands as well as some of the more constrained batch environments previously reported.

Cluster analysis has developed its own literature in separate fields. There is still scope for analysis of clustering techniques for computer workload characterisation, e.g. feature selection, effect of correlation between features, effects of scaling and weighting of features, algorithm comparison, validity of clusters, use of clusters, efficiency of algorithms and integration of results into a more complete characterisation methodology.

11. Acknowledgements

Many people contributed to an enjoyable and productive time at Digital. For the work reported here thanks particularly go to Colin Adams, Rick Fadden, Steve Forgey and Ted Pollak for many hours of assistance and discussion.

References

[1] Adams, J.C., Currie, W.S. and Gilmore, B.A.C., The Structure and Uses of the Edinburgh Remote Terminal Emulator, Software Practice and Experience, Volume 8, 1978.

[2] Adams, J.C., Performance Measurement and Evaluation of Time-Shared Virtual Memory Systems, Ph.D. Thesis, University of Edinburgh, 1977.

[3] Agrawala, A.K. and Mohr, J.M., Some Results on the Clustering Approach to Workload Modelling, Proceedings of CPEUG 13th Meeting, October 1977, NBS Special Publication 500-18.

[4] Agrawala, A.K. and Mohr, J.M., The Relationship Between the Pattern Recognition Problem and the Workload Characterisation Problem, Proceedings of SIGMETRICS/CMG VIII, November 1977.

[5] Agrawala, A.K., Mohr, J.M. and Bryant, R., An Approach to the Workload Characterisation Problem, Computer, June 1976.

[6] Artis, H.P., Capacity Planning for MVS Computer Systems, in Performance of Computer Installations, ed. Ferrari, D., North-Holland Publishing Company, 1978.

[7] Anderberg, M.R., Cluster Analysis for Applications, Academic Press, 1973.

[8] Ball, G.H., Data Analysis in the Social Sciences: What About the Details?, AFIPS FJCC, Volume 27, Part 1, 1965.

[9] Ball, G. and Hall, D.J., ISODATA, A Novel Method of Data Analysis and Pattern Classification, Stanford Research Institute, 1965.

[10] Barber, E.O., Asphjell, A. and Dispen, A., Benchmark Construction, Performance Evaluation Review, Volume 4, Number 4, 1975.

[11] Bayne, C.K., Beauchamp, J.J., Begovich, C.L. and Kane, V.E., Monte Carlo Comparisons of Selected Clustering Procedures, Proceedings of Annual Meeting of the American Statistical Association - Statistical Computing Section, 1978.

[12] Blashfield, R.K. and Aldenderfer, M.S., A Consumer Report on Cluster Analysis Software, The Pennsylvania State University, 1976.

[13] Boi, L., Cazin, J., Martin, R. and Bourret, P., The Use of a Synthetic Workload for the Optimal Allocation of Files on Disk Units, in Performance of Computer Installations, ed. Ferrari, D., North-Holland Publishing Company, 1978.

[14] Cole, A.J., ed., Numerical Taxonomy, Academic Press, 1969.

[15] Cormack, R.M., A Review of Classification, Journal of the Royal Statistical Society (Series A), 1971.

[16] Clifford, H.T. and Stephenson, W., An Introduction to Numerical Classification, Academic Press, 1975.

[17] Diday, E., Problems of Clustering and Recent Advances, IRIA Research Report Number 337, 1979.

[18] Dubes, R. and Jain, A.K., Clustering Techniques: The User's Dilemma, Pattern Recognition, Volume 8, 1976.

[19] Dubes, R. and Jain, A.K., Validity Studies in Clustering Methodologies, Pattern Recognition, Volume 11, 1979.

[20] Duran, B.S. and Odell, P.L., Cluster Analysis, Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, 1974.

[21] Everitt, B., Cluster Analysis, 2nd edition, John Wiley, 1980.

[22] Friedman, H.P. and Rubin, J., On Some Invariant Criteria for Grouping Data, Journal of the American Statistical Association, 1967.

[23] Gnanadesikan, R., Methods for Statistical Data Analysis of Multivariate Observations, John Wiley and Sons, 1977.

[24] Hall, D.J. and Dev Khanna, The ISODATA Method Computation for the Relative Perception of Similarities and Differences in Complex and Real Data, in Mathematical Methods for Digital Computers, Volume III (Statistical Methods for Digital Computers), ed. Enslein, K., Ralston, A. and Wilf, H.S., John Wiley and Sons, 1977.

[25] Haring, G. and Posch, R., On the Use of a Synthetic Online/Batch Workload for Computer Selection, Information and Management, Volume 3, Number 3, 1980.

[26] Haring, G., Posch, R., Leonhardt, C. and Cell, G., The Use of a Synthetic Jobstream in Performance Evaluation, The Computer Journal, 1979.

[27] Hartigan, J.A., Clustering Algorithms, John Wiley and Sons, 1975.

[28] Hughes, H.D., A Study of a Procedure for Reducing the Feature Set of Workload Data, Computer Performance Evaluation in the '80s, Proceedings of CMG XI, 1980.

[29] Hunt, E., Diehr, G. and Garnatz, D., Who Are the Users? An Analysis of Computer Use in a University Computer Center, AFIPS Conference Proceedings, Volume 38, 1971.

[30] Kendall, M.G., Multivariate Analysis, Hafner Press, 1975.

[31] Kendall, M.G., The Basic Problems of Cluster Analysis, in Discriminant Analysis and Applications, ed. Cacoullos, T., Academic Press, 1973.

[32] Mamrak, S.A. and Amer, P.D., A Feature Selection Tool for Workload Characterisation, Proceedings of SIGMETRICS/CMG VIII, November 1977.

[33] Pore, M.D., Moritz, T.E., Register, D.T., Yao, S.S. and Eppler, W.G., On Evaluating Clustering Procedures for Use in Classification, Proceedings of the American Statistical Association, Statistical Computing Section, 1978.

[34] Sokal, R.R. and Sneath, P.H.A., Numerical Taxonomy, W.H. Freeman, 1973.

[35] Spath, H., Cluster Analysis Algorithms for Data Reduction and Classification of Objects, Ellis Horwood Ltd., 1980.

[36] Stephens, P.D., Yarwood, J.K., Rees, D.J. and Shelness, N.S., The Evolution of the Operating System EMAS 2900, Software Practice and Experience, Volume 10, Number 12, 1980.

[37] Wright, L.S. and Burnette, W.R., An Approach to Evaluating Timesharing Systems: MH-TSS A Case Study, Performance Evaluation Review, January 1976.

[Figure 1. Mean and standard deviation of the five features: CPU time (10 millisecond units), elapsed time (seconds), direct I/O rate (QIO/CPU sec), buffered I/O rate (QIO/CPU sec) and page read rate (per CPU sec). Table not reproduced.]

[Figure 2. Number of images in each cluster; the eight largest clusters contain 94.8% of the images. Not reproduced.]

[Figures 3, 4 and 5. Centroids of the eight largest clusters expressed in the untransformed feature values. Not reproduced.]

[Figure 6. The most frequently occurring images. Not reproduced.]


Capacity, Disaster Recovery, and Contingency Planning


SESSION OVERVIEW

CAPACITY, DISASTER RECOVERY, and CONTINGENCY PLANNING

Chairperson: Theodore F. Gonter U.S. General Accounting Office

Speakers: J. William Mullen GHR Companies

Steven G. Penansky Arthur Young & Co.

Duane Fagg NAVDAC

ABSTRACT

At a recent computer conference a report from a capacity management/planning task force concluded that capacity needs are a business problem. Since data processing supplies the services that the business element uses to perform its functions, it concluded that requests for additional computer capacity must be made jointly between the business element and data processing. Thus, a capacity management/planning effort requires a close working relationship between the business and data processing community.

In addition, these same businesses most often depend on continuous availability of data processing services. To ensure the continuous availability of adequate services businesses develop contingency/recovery plans. These plans must provide for computer capacity considerations to ensure that sufficient capacity will be available to maintain adequate DP service.

The purpose of this session is to demonstrate the validity of the task force conclusion and the importance of capacity-related considerations in the contingency/recovery planning process. The presentations and papers in this session introduce considerations in capacity management/planning, introduce the importance of capacity planning in disaster recovery planning, and offer an approach to integrating the DPI security, performance management, and contingency planning functions.



CAPACITY MANAGEMENT CONSIDERATIONS

J. William Mullen

The GHR Companies, Inc. Destrehan, LA

Initiating an installation capacity management program requires certain basic elements that must be in place in the installation to assure success of the program. These basic elements will be addressed with respect to their relationship to the business community and how this interaction should be developed in support of the capacity management/planning effort. An additional aspect of the capacity management program and function, capacity requirements in disaster recovery planning, will be outlined with respect to the aid that the capacity management function can provide to the disaster recovery planning effort.

Key words: Business case; business elements; business problem; capacity management; capacity planning; disaster recovery planning.

1. Introduction

Increased demand for computing services has given impetus to dramatic growth in many corporate data centers. Finding themselves unprepared to predict and respond to these increases in requests for computing services, unable to maintain acceptable service levels for the user community, and having to cope with management demands and equipment delivery schedules, installations are beginning to realize the benefits of implementing a capacity management and planning program. The obvious questions that installation personnel responsible for program implementation ask themselves are concerned with establishing a starting point for the capacity management/planning program, determining if the basic elements are in place in the installation to support the program, and how to evaluate program direction.

Three major topical areas should be considered in answering the above questions:

     o Environment for program implementation;

     o Elements required for the process; and

     o Communications of results to management.

Each of these major topics will be addressed in the following sections on an overview level, with a final section devoted to capacity management/planning considerations in the disaster recovery plan development and how the capacity management/planning program efforts can provide aid in developing the plan.

39 2 . Environment is the installation's ability to provide acceptable service at a reason- Impetus for implem.entation of a able cost. Demand is the business capacity management/planning program community's requirements for service generally stems from an environment and the time constraints imposed by where demands for increases in com- these demands upon the installation. puting services from the user or Once the capacity management /planning business community have resulted in requirements are viewed as a ijusiness immeasurable and uncontrollable work- problem, personnel in the installation load growth and a continuing decline can begin to address the capacity in service levels. It is at this management/planning function in terms point where no amount of perform.ance of the business community as well as tuning will solve the problem, hardware/ software component utiliza- implications and consequences of the tions. This brings about the problem have been escalated to a necessity for service level agreements. management level most of us would like to avoid and reaction mode has become Developing service level agree- a way of life in the installation. ments with the business community Thus, installation personnel begin a users begins the capacity management/ serious attempt to establish a planning process. Service level capacity management /planning program. agreements provide the target that personnel performing the capacity A primary consideration in management /planning function must hit. program implementation is the From a capacity managem.ent view, the management structure of the installa- service level agreement defines tion and organization and the place- current service level requirements and ment of personnel responsible for allows the installation to evaluate performing the capacity management/ the business community's requirements planning function within the current for service against current deliver- organizational structure. Managing ables. From a capacity planning view, capacity and planning for capacity of service level agreements provide future hardware and software compon- insight to short-term business re- ents requires the availability and quirements as well as a definition of communication of a wide range of business objectives, direction and information. Information from lower long-range service requirements. levels within the installation con- Adjustments can be made, based upon cerning current configurations inputs obtained from service level utilization and workload characteri- agreements to distribute and better zations are required. Contact with manage current capacity as well as ' middle and upper management within the project future capacity requirements. business community of the organization is required to understand business A basis for capacity management/ strategies, business direction and to planning is technical knowledge. This match current business demands with knowledge is normally associated with hardware/ software component utiliza- the performance analyst or software tion and workload characterization analyst function. Though expertise data. 
Once this wide base of communi- is required in dealing with management cations has been established, a view and planning for the installation's of hardware and software component hardware/ software component require- utilization based upon business ments personnel performing the , demands can be developed and an en- capacity management/planning function vironment is created in which the should have skills more aligned with installation's current capacity can management. These skills include the be managed and future business demands oral and written communications on installation can be projected. ability to translate capacity require- ments from technical terms to business A perception of the capacity terms for each given level of manage- management and planning function as a ment receiving information. The end business problem must be created result of this translation becomes the within the installation and the business case for satisfying service business organization. This view demands of the business community. should be based upon an economic Additionally, such skills as inter- supply and demand situation. Suppl" -wing and negotiating will prove to

to be very useful in the interaction with An antecedent to the capacity management within the business com- management/planning program is a munity structure in determining their strong measurement base from which service requirements and developing data on configuration component realistic service level agreements. utilization can be obtained as well as In most installations, several people resource requirem.ents of the various V7ill be required to staff the capacity business systems receiving service. management/planning function to assure Consideration must be given to the that the above skills are available. configuration and the workloads it must support to enable the installa- The final consideration in this tion to select those measurement tools area is the charter of the capacity which will provide the information management/planning function. As required with the least impact upon stated, staff performing the function the configuration. Measurement tools should be positioned within the selected must be able to provide organization's management structure to configuration utilization in concert provide interaction with mid and upper with obtaining measurement data which management in the MIS as well as the will provide resource consumption business community's organization. information that can be related to the The charter should provide freedom business entities receiving service. from day-to-day operational problem Measurement data from the tools must solving and still allow latitude for be in a form that will provide ease of affecting change. Without the ability analysis and archival for historical to affect change within the installa- purposes. Currently available tion and the hardware/software com- measurement tools (i.e. SMF RMF) tend , ponents as well as the operating mode, not to have either of these qualities the capacity management /planning in their raw form. A major considera- function becomes an exercise in tion for the installation is imple- futility with little or no return on menting a central repository of the investment of resources. information which contains those measurement elements necessary for tracking resource consumption and 3. Process Elements workload growth as well as input to workload characterization. This task Technical knowledge and informa- has proved to be difficult for most tion is the basis of the capacity installations. Many installations management /planning function and the achieve some form of repository which raw input to the analysis and pro- provides configuration utilization and jection process. The initial infor- resource consumption growth statistics mation needed for the process is a but fail to provide a mechanism for complete, up-to-date inventory of the relating the utilization and consump- hardware/ software in use in the tion statistics to the various busi- installation and an understanding of ness entities receiving service. its capabilities and resource require- ments. Applications should be char- The final consideration in the acterized by workload requirements and capacity management /planning process subsystems needed for their support in is modeling and analysis for projec- processing. Hardware components ting workload growth and future should be characterized by their resource requirements . Persons under- function for support of processing and taking the modeling and projection their capability and potential for task will require both the low level growth. 
The inventory can thus become measurement data as well as knowledge the base for planning growth in the of business direction to properly installation's hardv/are and software apply their modeling knowledge. With- components and the applications the out knowledge of current service installation must support. The inven- requirements by the various business tory will also be an aid in the work- entities and planned business direc- load characterization process in tion and strategies, the modeling breaking out the various business process must depend entirely upon functions and systems that must be historical data which may not reflect characterized. growth patterns representative of current business direction and strategies becomes a major requirement


in the capacity management/planning Reporting must be tempered to process the various levels of management with- in the organization. Reports must be more concise and free from technical 4. Communications detail as the level of management being address moves higher up the Effective communications of the corporate ladder. This approach to results of the capacity management/ presenting a joint "business case" to planning process to management is the higher levels of corporate management area where many installations fail to tends to strengthen the communica- meet the expectations of their capaci- tions between DP and business manage- ty management/planning programs. ment and provide impetus for coopera- Reporting results of the program can tion in support of the capacity be divided into three specific areas management/planning process.

o data processing reporting; 5. Disaster Recovery Planning o business reporting; and Anyone involved in developing o joint reporting. an installation disaster backup and recovery plan will atest to the Data processing reporting re- difficulty in defining the processing volves around configuration utiliza- requirements necessary at a backup tion information and component utili- site and matching these requirements zation at a gross and sometimes very to sites available for backup detail level. Most installations purposes. Several of the elements have achieved some degree of exper- involved in the capacity management/ tise in measurement, tracking, and planning process provide inputs to reporting on configuration component the disaster recovery planning utilization and resource consumption process growth, but do not provide business entities this information by system, Inventory of applications and processed. component processing requirements will help assure that no critical Business reporting revolves application has been overlooked in around reporting at the level of defining the applications required to transactions, batch jobs, and output continue the business processing at a volumes in some form. Additional backup site and will aid during information may be provided on the backup site usage negotiations in contribution to component utilization assuring that necessary hardware/ by various business entities, but software components and options are this is generally not done at much in place at the backup site. more than a very broad spectrum of the business comm.unity (i.e. division, Resource consumption require- group within division, section, etc.). ments of the systems that are to run Very few installations have the at a backup site will assist the mechanisms in place to provide infor- personnel responsible for site deter- mation at the application level. mination and selection and provide personnel at the backup site with Joint reporting involves a com- information to judge their configura- bination of data processing reporting tion's ability to handle the and business reporting to provide additional workload requirements. management v/ith a history and current view of configuration utilization and A final aid to the disaster business entity resource consumption. planning effort provided by the ca- Projections, based upon knowledge of pacity management/planning effort is business direction and strategies, an understanding of the business can be given to management to provide direction and objectives of the a view of future configuration organization. This understanding capacity requirements based upon will help personnel involved in business growth. disaster recovery planning to identi- fy those systems critical to continu- ation of the business and the


business impact loss of processing capability will have upon the organization

6. Summary

Throughout the preceding paragraphs, the idea of approaching the capacity management/planning process as a business problem, with the involvement of mid and upper level management from the business community in the process, has been stressed. Effects upon measurement tool selection and reporting of the results of the capacity management/planning efforts to management must be approached from both the component utilization level and the business element level. This combination provides the necessary inputs to the workload characterization process, reporting process, and planning efforts for future business element capacity requirements. Data elements from the data processing environment and the business community can help achieve a viable disaster recovery plan by combining requirements for component processing and business service.


CAPACITY CONSIDERATIONS IN DISASTER RECOVERY PLANNING

Steven G. Penansky

Arthur Young & Company Washington, DC 20036

Increasing dependence on data processing in the Federal government and private industry has resulted in growing interest in recovery plans that minimize the impact resulting from a loss to an organization's data processing capability. Although disaster recovery planning is not generally the responsibility of an organization's CPE group, capacity considerations play a significant role in the development of a cost-effective recovery plan, and CPE practitioners are being asked to participate in the development of these plans.

This paper describes and discusses several aspects of the disaster recovery planning process in which capacity considerations should be addressed. The concepts of a "minimum configuration" and "recovery windows" are discussed. Capacity considerations involved in evaluating component recovery strategies based on reciprocal agreements, equipped recovery centers, recovery shells, and multiple facilities are presented. Finally, relationships between ongoing capacity planning and recovery plan maintenance activities are examined.

Key words: Capacity planning; contingency planning; disaster recovery planning.

Most Federal agencies and organizations in the private sector rely heavily on automatic data processing (ADP) systems to achieve mission or corporate goals. Their dependence on the continuing availability of these systems makes these organizations vulnerable to disasters such as fires, flood, earthquakes, or other unexpected and undesirable events which may affect their ADP facilities. While brief interruptions in ADP service might be tolerated, most organizations would suffer considerable harm should a service interruption persist. The length of an "acceptable" service interruption varies considerably from organization to organization. However, it is clear that in most cases, just the potential of a prolonged interruption in ADP services calls for the development of disaster recovery plans to mitigate the impact of the interruptions should they occur. This paper discusses a number of capacity-related considerations which should be included when these plans are developed.

1. Background

In most ADP installations, the responsibility for disaster recovery planning has traditionally rested with operational groups. These groups, in most cases, have not fully satisfied their responsibility for protecting critical corporate or agency operations or assets. They have incorporated an increasing number of security measures designed to reduce the number of threats to which an ADP facility is subject. They have also instituted programs for the off-site storage of critical data and, in some cases, have executed letters of agreement with other


similar facilities to provide "back-up" in the planning process in which capacity event of a disaster. Actual recovery plans considerations play an important role. were developed in only a few cases and, despite Several of the important concepts involved in these efforts, adequate protection was rarely disaster recovery planning are also described. achieved [l ]^ The four key phases in the recovery planning process are identified in figure 1. Section In recent years, data center management 3 discusses the capacity considerations in both the Federal government and private related to each of these phases. industry has become increasingly aware of the importance of effective recovery plans. Reports from external auditors and examiners

(e.g., reports from audits conducted by bank FIGURE 1 or insurance examiners or CPA or other firms) PHASED APPROACH TO PLAN and guidelines issued by internal auditing DEVELOPMENT groups (e.g., 0MB Circular A-71 [2 J) have focused management's attention on the critical importance of these plans. As a result, more PHASE 1 - ASSESS RECOVERY REQUIREMENTS concerted, corporate or agency-wide efforts have been laimched to identify recovery PHASE 2 - EVALUATE ALTERNATIVE RECOVERY STRATEGIES requirements and develop a suitable recovery plan to meet these needs. It is in these more - recent efforts that CPE practitioners most PHASE 3 PREPARE RECOVERY PLAN often have become involved. PHASE 4 - REVIEW & TEST PLAN ON ONGOING BASIS Generally, these more recent efforts at recovery planning have taken a more realistic view of the organization's needs and have set about the development of a plan satisfying these needs. Traditionally the responsibility of operational (but, with increasing 2.1 Phase 1 - Assess Recovery Requirements frequency, corporate security) groups, recovery planning now involves The purpose of this phase is to identify representatives from a wide range of the organization's recovery requirements and organizational components. CPE practitioners to determine the "recovery windows" and the are being asked to participate in the recovery "minimum configuration" which then become key planning process to provide much needed data ingredients in the disaster recovery planning on the capacity requirements which must be met process. The primary steps and results of by the individual recovery "strategies" which Phase 1 are presented in figure 2. are ultimately combined to provide a framework for the full recovery plan. Although these The first step in this process should be capacity considerations are only a small part the development of a series of preliminary of the overall planning process, they play an assumptions on which the recovery requirements important role in identifying viable are to be based. In particular, these strategies on which to base a plan and in assumptions should address: ensuring that the plan continues to provide the required protection as workloads and The type of disasters which will be configurations change. addressed (or ignored) by the plan.

2. Overview of the Disaster Recovery Planning The availability of public Process utilities, transportation, and other types of support not directly A full description of the steps involved controlled by the ADP facility, and in developing an effective recovery plan is beyond the scope of this paper. However, the The time ( e.g., end of week, end of various capacity considerations involved in month, etc.) when the disaster would the recovery planning process are best occur. presented in the context of this process. The following paragraphs present an overview of By identifying and documenting these this process, emphasizing those aspects of the assumptions, the organization can ensure that the recovery requirements are based on a common set of assumptions regarding the nature Figures in brackets indicate the and scope of the disaster. Additional literature references at the end of this assumptions will be added throughout the paper planning process.

46 FIGURE 2

PHASE 1 — ASSESS RECOVERY REQUIREMENTS

STEPS: RESULTS:

• ANALYZE THREATS AND EXPOSURES • APPLICATION RECOVERY PRIORITIES

• EVALUATE LOSS POTENTIAL AS A FUNCTION OF TIME      • RECOVERY WINDOW CHARACTERISTICS

• DEFINE RECOVERY WINDOWS      • MINIMUM CONFIGURATION REQUIREMENTS

• IDENTIFY MINIMUM CONFIGURATION REQUIREMENTS

REVIEW WITH MANAGEMENT AND USERS

Because the potential losses incurred by an organization will differ for each application system considered, and because, in all likelihood, the magnitude of these potential losses will change as a function of the length of the service interruption, recovery requirements should be expressed on an application-by-application basis and as a function of time. Political considerations may make it difficult to develop a precise ranking of these recovery requirements. In these cases, priorities may be assigned to classes of similar applications. At this point, a number of applications (or one or more classes) are identified as "critical applications." These applications are included in subsequent recovery planning efforts; the remainder (which might include "applications" such as systems development, performance reporting, etc.) are considered deferable until normal operations resume.

The time dependent nature of these requirements suggests that several recovery strategies may be required - each operating in one or more time frames. These time frames are termed "recovery windows" and are based on the recovery requirements identified. For each of the recovery windows identified, a "minimum configuration" must be identified. This configuration is the smallest set of:

• people;
• space, power, and other environmental preparations;
• hardware;
• software;
• data;
• documentation;
• communications (data and voice); and
• supplies

needed to process an organization's critical applications in each recovery window. As a final step in this phase, the prioritized applications, recovery window definitions, and minimum configuration requirements are reviewed with management and users.

2.2 Phase 2 - Evaluate Alternative Recovery Strategies

During this phase, the recovery requirements determined during Phase 1 are used to analyze alternative recovery strategies and select an overall recovery approach on which the organization's recovery plan can be based. Figure 3 summarizes the key steps in this process and the results to be obtained.

Depending on the length of a service interruption, one or more recovery strategies may be needed to restore an organization's ADP capabilities in the most cost-effective manner. Multiple recovery strategies are usually required because inherent time delays, costs, or other considerations preclude the use of a single strategy throughout the duration of a disaster. For example, an organization may need 24 hours or more to transfer its critical applications to an alternate site (Strategy A) due to the logistical problems involved in such a transfer. As a result, manual procedures and recordkeeping (Strategy B) may be required to maintain critical functions in the hours immediately following a disaster.

FIGURE 3
PHASE 2 — EVALUATE ALTERNATIVE RECOVERY STRATEGIES

STEPS:
• IDENTIFY VIABLE RECOVERY STRATEGIES
• EVALUATE STRATEGIES AGAINST RECOVERY REQUIREMENTS
• EVALUATE COSTS OF ACCEPTABLE STRATEGIES
• REVIEW WITH MANAGEMENT AND USERS

RESULTS:
• ALTERNATIVE RECOVERY STRATEGY EVALUATION
• COST ANALYSIS
• OVERALL RECOVERY APPROACH RECOMMENDATIONS

The recovery strategies generally available to an organization include:

Clerical procedures - The use of forms, logs, and other recordkeeping measures to record critical information and build a queue of work to be processed once automated processing capabilities are restored. If this strategy is to be used successfully, these forms and logs, along with written procedures describing their use, must be developed prior to the disaster and kept up-to-date. Because of the large number of paper records which result, clerical procedures are generally useful only for short-interval recovery windows, such as 48 hours or less.

Formal reciprocal agreements - Agreements which guarantee that an organization struck by a disaster can share another organization's ADP facility and processing time. Though simple to establish in principle, these agreements are difficult to maintain and are unlikely to be totally reliable. The impact of such agreements on the "unaffected" organization usually limits the use of this strategy to recovery windows of one week or less.

Recovery shells - A building equipped with the raised floor, power, cooling, fire protection, security, and other preparations needed to establish a new ADP facility, but not data processing hardware or communications facilities. The viability of a strategy based on the use of a recovery shell is directly related to the hardware and communications vendors' ability to deliver replacement equipment.

Equipped recovery centers - Complete ADP facilities, equipped with compatible hardware and communications equipment. Access to these facilities can generally be obtained within hours after a disaster is declared. However, their continued use is usually limited to between 30 and 60 days. Cost is usually the primary obstacle to the use of these facilities.

Multiple ADP facilities - The use of two or more geographically separated facilities, the smallest of which is large enough to process the organization's critical applications. The nature of this strategy ensures the availability of a recovery facility throughout the duration of a disaster. Because the use of multiple facilities involves business decisions and commitments beyond the scope of disaster recovery, it is difficult to determine the direct costs associated with this strategy.
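The screening of strategies against recovery windows described above can be sketched roughly as follows. The usable time spans are loose interpretations of the ranges quoted in the text (about 48 hours for clerical procedures, about a week for reciprocal agreements, 30 to 60 days of continued use for equipped centers); the 30-day vendor delivery assumption for recovery shells and the 12-hour start for equipped centers are invented for illustration.

```python
# Hypothetical sketch: screening recovery strategies against recovery windows.
# A strategy is treated as viable for a window if the window falls inside the
# time span over which the strategy can realistically carry the workload.

HOURS_PER_DAY = 24

# (earliest usable, latest usable) in hours; None means no upper limit.
STRATEGY_SPANS = {
    "clerical procedures":         (0,                  48),
    "formal reciprocal agreement": (0,                  7 * HOURS_PER_DAY),
    "recovery shell":              (30 * HOURS_PER_DAY, None),  # assumed delivery lead time
    "equipped recovery center":    (12,                 60 * HOURS_PER_DAY),
    "multiple ADP facilities":     (0,                  None),  # full duration
}

def viable_strategies(window_hours):
    """Strategies whose usable span covers the given recovery window."""
    viable = []
    for name, (earliest, latest) in STRATEGY_SPANS.items():
        if window_hours >= earliest and (latest is None or window_hours <= latest):
            viable.append(name)
    return viable

for window in (24, 7 * HOURS_PER_DAY, 90 * HOURS_PER_DAY):
    print(f"{window:5d} h window: {', '.join(viable_strategies(window))}")
```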

Once the technical, logistical, organizational, and other implications of these strategies have been identified, an evaluation on these terms is usually conducted to eliminate those strategies which are not consistent with the recovery requirements or which have unacceptable consequences in other areas. These component recovery strategies, each operating in one or more recovery windows and for some subset of the organization's critical applications, can then be combined to form one or more recovery approaches (see figure 4). The costs associated with alternative strategies and approaches are then examined and an overall recovery approach selected. The results of this evaluation are then reviewed with management and users to ensure they understand the advantages, disadvantages, costs, and any special considerations associated with each alternative and concur with the overall recovery strategy selected.

2.3 Phase 3 - Prepare Recovery Plan

Once the framework of the plan has been defined, a recovery plan development team must be established and the detailed planning, analysis, and documentation required to prepare the actual recovery plan must be performed. The primary steps and results in this phase are presented in figure 5. The specific plans and procedures to be prepared in this phase are largely a function of the particular recovery approach selected. As the final step in this phase, the plan should be reviewed with senior management, users, and ADP management to ensure that the completed plan meets the expectations of all concerned.

2.4 Phase 4 - Review and Test Plan on Ongoing Basis

During this phase, initial training for management and operational personnel should be conducted and the detailed plan developed in Phase 3 tested (see figure 6). Individuals who have direct responsibility for disaster recovery should be thoroughly versed in the plan and its execution. Those who do not should be acquainted with the policy and procedures defined in the plan.

Periodic testing of the plan is essential to ensuring that it will provide the necessary protection if and when it is needed. Tests should be conducted in a manner that evaluates the capabilities of the plan under a realistic set of conditions which could be expected during a disaster. An organization's most critical applications should be tested first, and the full range of functions required to successfully process these applications should be tested.

Maintaining the recovery plan once it has been developed is a vitally important function. As is the case in most planning efforts, the recovery plan developed in Phase 3 will be out of date by the time it is completed. The formal plan maintenance process developed in Phase 3 must be continued throughout the life of the plan to reflect the many changes which are a regular part of data processing.

FIGURE 4
COMBINATION OF STRATEGIES TO FORM OVERALL RECOVERY APPROACH
[Chart: prioritized application classes (e.g., Personnel, Accounts Receivable, Inventory Control, Payroll) plotted against expected recovery duration in months, with numbered component strategies (Strategy 1, 3, 4, 6, 9) assigned to each class over portions of the recovery period to form the overall recovery approach.]
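One hypothetical way to represent the kind of combination shown in figure 4 is as a table of component strategies per application class, which can then be checked for gaps in coverage over the expected recovery duration. The class names echo the figure; the strategies, day ranges, and 180-day horizon are invented for illustration.

```python
# Hypothetical sketch of the structure behind a chart like figure 4: each
# prioritized application class is carried by a sequence of component
# strategies, each covering part of the expected recovery duration.

OVERALL_APPROACH = {
    # class: [(strategy, from_day, to_day), ...]  -- illustrative values only
    "personnel":           [("clerical procedures", 0, 2),
                            ("equipped recovery center", 2, 45)],
    "accounts receivable": [("clerical procedures", 0, 2),
                            ("equipped recovery center", 2, 45),
                            ("recovery shell", 45, 180)],
    "inventory control":   [("equipped recovery center", 3, 60),
                            ("recovery shell", 60, 180)],
}

def coverage_gaps(segments, start, end):
    """Return uncovered (from_day, to_day) intervals between start and end."""
    gaps, cursor = [], start
    for _, seg_start, seg_end in sorted(segments, key=lambda s: s[1]):
        if seg_start > cursor:
            gaps.append((cursor, seg_start))
        cursor = max(cursor, seg_end)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps

for app_class, segments in OVERALL_APPROACH.items():
    gaps = coverage_gaps(segments, start=0, end=180)
    print(f"{app_class:20s} gaps: {gaps or 'none'}")
```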


FIGURE 5
PHASE 3 — PREPARE RECOVERY PLAN

STEPS:
• ESTABLISH PLAN DEVELOPMENT TEAM
• PREPARE DETAILED PROJECT PLAN
• CONDUCT DETAILED APPLICATION REVIEW
• PREPARE PROCEDURES FOR INITIATING RECOVERY
• PREPARE PROCEDURES FOR APPLICATION PROCESSING
• FINALIZE RECOVERY TEAM ORGANIZATION AND RESPONSIBILITIES
• PREPARE PROCEDURES FOR MAINTAINING AND TESTING PLAN
• FINALIZE PLAN DOCUMENTATION AND REVIEW WITH MANAGEMENT AND USERS

RESULTS:
• PLAN DEVELOPMENT TEAM ORGANIZATION
• PROJECT PLAN
• APPLICATION RECOVERY REQUIREMENTS
• DESCRIPTION OF CRITICAL RESOURCES
• DETAILED RECOVERY PROCEDURES
• STAFFING AND RESPONSIBILITIES
• MAINTENANCE AND TESTING PROCEDURES
• APPROVED RECOVERY PLAN

FIGURE 6
PHASE 4 — REVIEW AND TEST PLAN ON ONGOING BASIS

STEPS:
• TRAIN MANAGEMENT AND OPERATIONS PERSONNEL
• CONDUCT PERIODIC TESTING
• PERFORM ONGOING MONITORING AND MAINTENANCE

RESULTS:
• A DISASTER RECOVERY CAPABILITY

3. Capacity Considerations

Capacity considerations are present in many aspects of the disaster recovery planning process. The following sections describe the major considerations involved in terms of each of the recovery planning process phases described above.

3.1 Phase 1 - Assess Recovery Requirements

Since Phase 1 is concerned primarily with evaluating exposure and risks and prioritizing recovery requirements, capacity considerations do not play a major role. One might argue that, at this stage, questions about

• the capacity of an organization's ADP configuration,
• the capacity requirements of individual applications, and
• the capacity which may be required to support an organization's critical applications in a recovery environment

should not even be considered. The exposures and potential losses identified and the recovery requirements derived should be based solely on the impact of a disaster and, at this point, not on the efforts and costs required to satisfy these recovery requirements.

The early stages of Phase 1 do provide an opportunity to gather information which will be used later in this phase and in subsequent phases. This information should include:

• Resource requirements of each individual application
  - Run times
  - CPU and memory requirements
  - DASD space requirements
  - Tape drive requirements
  - Communications requirements
  - Printing/punching requirements
  - Other special output requirements (special forms, bursting, decollating, COM, etc.)

• System and supporting software needed to process all applications
  - Operating system software
  - Special on-line systems
  - Utilities
  - Assemblers, compilers, linkage editors, etc.
  - Supporting systems (data base system(s), tape library system, etc.)
  - Others

• An inventory of all computer hardware and communications equipment
  - Type of equipment
  - Vendor
  - Model
  - Special features.

Much of this information will already be available if a comprehensive CPE program has been established.

Once the organization's recovery requirements have been defined and the recovery windows selected, the minimum configuration required to process the critical applications in each recovery window must be determined. Since many critical applications are processed on a periodic basis, this determination must be based on earlier assumptions regarding the conditions under which a disaster would occur. A "realistic worst case" scenario should be developed describing the most disadvantageous time when a disaster could reasonably occur in terms of the resource requirements of the organization's critical applications. A disaster at that time would require the heaviest use of the ADP facility and would probably result in the highest vulnerability from a corporate or agency standpoint.

Based on this scenario, the minimum configuration can be selected using the information described above or other available information sources. Each of the components described in Section 2.1 should be considered. In developing this configuration, the following should be kept in mind:

• Only the gross characteristics of the configuration need be determined; however, care should be taken to identify any special features which must be provided or any other conditions which must be met.

• Operations in a recovery environment will be difficult, at best. Few if any deviations should be made from normal operating procedures, and more than an adequate supply of all critical resources should be included in the configuration.

• In particular, additional capacity should be included to account for the fact that:
  - the system used for recovery may not be tuned to the critical applications, and
  - normal processing schedules will be interrupted and additional capacity may be needed to "make up" lost time.

3.2 Phase 2 - Evaluate Alternative Recovery Strategies

Phase 2 capacity considerations center around the suitability of the alternative strategies available for disaster recovery. As defined in Section 2.2, these strategies generally include:

• Clerical procedures
• Formal reciprocal agreements
• Recovery shells
• Equipped recovery centers

• Multiple ADP facilities.

Since these strategies may be suitable for one or more groups of applications and for one or more recovery windows, the capacity considerations described should be applied to each situation for which a strategy is being considered.

3.2.1 Clerical Procedures

In the event of a disaster, clerical procedures impose a substantial burden on the personnel in user departments. These personnel, who have become accustomed to using an automated system, are suddenly forced to revert to manual recordkeeping. The difficulties they encounter stem primarily from a lack of:

• adequate staffing levels, which may have been reduced with the advent of the automated system;
• training in the manual system due to turnover, transfers, and retirement; and
• forms, logs, and procedures needed to re-establish the manual system.

Although clerical procedures are often an effective method of continuing critical functions in the period directly following a disaster, personnel capacity should be examined as a significant factor in the success of this strategy.

3.2.2 Formal Reciprocal Agreements

For disasters of short duration, formal reciprocal agreements can be effective. Some organizations can afford to share some portion of their ADP facilities with another organization for short periods of time. Difficulties are generally encountered when longer periods are involved. Because of the risks involved in assuming that only a short period of recovery will be required, and the effort needed to maintain the high degree of hardware and software compatibility a reciprocal agreement requires, these strategies are only effective in special circumstances.

Before getting involved in a formal reciprocal agreement, an organization should look carefully at its capacity requirements and the capacity available at the other site(s). It should evaluate whether it could afford to give up 30-40% of its configuration capacity and still process its workload. Does it have the time to process portions of several of the other organization's applications (each day, if necessary) to guarantee hardware and software compatibility? Does it really want to "buy into" someone else's disaster?

3.2.3 Recovery Shells

Recovery shells are relatively easy to evaluate from a space standpoint: determine the size of the site needed to house the minimum configuration(s) identified in Phase 1, including room for disk, tape, and related storage; input and output control areas; data entry; etc. Power, cooling, and several other requirements are handled the same way. If a shell looks "tight", opt for a larger one.

From a hardware and communications standpoint, the use of a shell is predicated on the vendor's ability to deliver the necessary equipment and provide the communications hook-ups needed. If possible, get vendors' commitments on replacement equipment in advance, in writing, and store them off-site. Don't forget the time required for installation and debugging. If your organization can't get a firm commitment, or if it can't afford the delays involved, use a shell for the later recovery stages and select a more reliable strategy for the earlier stages of recovery.

3.2.4 Equipped Recovery Centers

Equipped recovery centers come in sizes. The sizes are usually based on the size of the CPU in the configuration, with the number and type of DASD devices, the amount of memory, and the type of communications provided scaled proportionally. Sometimes several size options are available from the same vendor.

Selecting the right size center is relatively easy if your organization's needs fit nicely between the limits of each size class. If they don't, you may be paying for more than you need, but anticipated growth over the next few years may justify the additional capacity. Otherwise you'll have to develop your plan around the features provided by the class you need now and then revise the plan a few years from now. A quick look at the features and costs of each class and your growth plans should answer the class selection question. Also remember that your plan will take six months or more to develop to the point that you could actually use the center effectively, so adjust your capacity estimates accordingly.

If your organization has some type of exceptional requirement which is beyond the scope of a size class which otherwise meets

your needs, service bureaus in the vicinity of the recovery center may be able to provide the additional capacity your organization needs. The cost may be lower than moving up a class. This approach is most applicable to COM, printing, data entry, and other kinds of processing that can be done "off-line."

3.2.5 Multiple ADP Facilities

Multiple ADP facilities provide the most flexibility in recovery planning and are often the most costly strategy. However, many of these costs are not directly attributable to disaster recovery. A second ADP facility can be used in a variety of ways depending on the nature of the applications portfolio, the organization's willingness to incur rather high costs, and their willingness to undertake business ventures:

• As a ready recovery site with sufficient equipment and staff. The relatively high cost of this approach would be a direct expense to the organization. This type of redundancy might be required if recovery had to be as immediate as possible.

• As a separate but equal facility, handling approximately half of the workload and with approximately half of the total capacity. This assumes that the organization's critical applications account for less than half of the total workload.

• As a service bureau selling time for processing that could be pre-empted in the event of a disaster. In most cases, this would involve a separate business venture.

A ready site must have sufficient capacity to process the critical applications and to begin this processing within some rather stringent time constraints. This would certainly require ready capacity in all areas, including personnel. The service bureau approach imposes additional complications on an already difficult problem. Though certainly feasible, it is a decision with business and capacity considerations beyond the scope of disaster recovery.

When evaluating the suitability of each strategy and formulating an overall recovery approach, the impact of the organization's growth plans on each strategy should be considered. Each strategy should be reviewed to determine if it will still be practical and cost-effective in two to five years. The impact of each strategy on the growth plans should also be examined.

3.3 Phase 3 - Prepare Recovery Plan

During Phase 3, the detailed policies, plans, and procedures to be included in the final recovery plan must be prepared. Much of this effort parallels the analysis tasks in Phase 1, only with more attention to detail. Many of the capacity considerations involved have been discussed above.

One particular procedure to be prepared in this phase involves a considerable number of capacity considerations. This procedure addresses the replacement of the ADP facility or any equipment damaged during the disaster. In the event of a partial or total loss to the ADP facility, decisions must be made about relocating the ADP facility. Similarly, if the facility can be salvaged but the configuration was destroyed, decisions must be made whether more (or less) powerful components should be acquired, and if so, which ones.

3.4 Phase 4 - Review and Test Plan on Ongoing Basis

From a capacity standpoint, Phase 4 is primarily concerned with "closing the loop" - updating the plan to reflect the changes in:

• Hardware, software, staff, etc. at the primary site
• Changes in corresponding areas at the site(s) used in the recovery plan
• Organizational policies, priorities, or practices which affect the recovery plan
• The organization's applications portfolio.

To provide the necessary updated information, close cooperation and formal interfaces must be developed between a number of facility management functions. Most of the information needed to update the recovery plan should be available from the organization's capacity planning process. Other information should be available from application system development and change control programs. Truly effective plan maintenance will ultimately require a change in the way the organization "does business," requiring new or improved interfaces between these and other ADP and business functions.
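A minimal sketch of the "closing the loop" idea: recompute the minimum configuration from the current critical-application inventory (as in section 3.1) and compare it with the capacity of the contracted recovery arrangement, flagging anything the plan no longer covers. All application names, resource figures, headroom factors, and the recovery center capacity are hypothetical.

```python
# Hypothetical sketch: "closing the loop" between capacity planning data and
# the recovery plan.  The minimum configuration is recomputed from the current
# critical-application inventory and compared with the capacity of the
# contracted recovery arrangement; shortfalls signal that the plan needs work.

CRITICAL_APPS = [
    # worst-case figures per application for the chosen scenario (illustrative)
    {"name": "order_entry", "cpu_hours_per_day": 6.0, "memory_mb": 4, "dasd_mb": 800, "tape_drives": 2},
    {"name": "payroll",     "cpu_hours_per_day": 3.0, "memory_mb": 3, "dasd_mb": 400, "tape_drives": 3},
]

TUNING_FACTOR = 1.25   # recovery system not tuned to these applications (section 3.1)
MAKEUP_FACTOR = 1.20   # extra capacity to "make up" lost time (section 3.1)

RECOVERY_CENTER_CAPACITY = {   # what the current arrangement provides (assumed)
    "cpu_hours_per_day": 12.0, "memory_mb": 4, "dasd_mb": 1000, "tape_drives": 4,
}

def minimum_configuration(apps):
    """Gross characteristics only, with headroom applied to the CPU figure."""
    return {
        "cpu_hours_per_day": TUNING_FACTOR * MAKEUP_FACTOR *
                             sum(a["cpu_hours_per_day"] for a in apps),
        "memory_mb":   max(a["memory_mb"] for a in apps),
        "dasd_mb":     sum(a["dasd_mb"] for a in apps),
        "tape_drives": max(a["tape_drives"] for a in apps),
    }

def plan_review(apps, capacity):
    """Report any resource where requirements now exceed the planned capacity."""
    required = minimum_configuration(apps)
    return {r: (required[r], capacity[r]) for r in required if required[r] > capacity[r]}

shortfalls = plan_review(CRITICAL_APPS, RECOVERY_CENTER_CAPACITY)
print(shortfalls or "recovery plan still covers the critical workload")
```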

4. Conclusions

Concerns regarding the impact of processing interruptions have stimulated increased interest in effective disaster recovery plans. To ensure the effectiveness of these plans, capacity considerations must be incorporated throughout the recovery planning process. CPE practitioners should become increasingly involved in plan development and maintenance activities to ensure that recovery plans provide sufficient capacity to maintain the continued availability of an organization's critical ADP functions.

I would like to acknowledge the contribution Mr. Donald Cloud of Arthur Young & Company's Providence, Rhode Island office has made to the development of the concepts and approaches described in this paper.

References

[1] Comptroller General's Report to The Congress, "Most Federal Agencies Have Done Little Planning for ADP Disasters," U.S. General Accounting Office, December 18, 1980.

[2] Office of Management and Budget, Circular A-71, Transmittal Memorandum No. 1, "Security of Federal Automated Information Systems," July 27, 1978.

End User Productivity



SESSION OVERVIEW
END USER PRODUCTIVITY

Terry Potter

Digital Equipment Corporation Maynard, MA

Carl Palmer pointed out in this year's proceedings Foreword that productivity is defined as the relationship between the quantity of goods or services produced (outputs) at a given level of quality and the resources used (input). Therefore, the end user of a computer system should see higher quantity and/or quality of goods or services at the same cost if productivity increases, or at least the same quantity and quality at lower cost. Issues in measuring and reporting on productivity, techniques to increase productivity, and methods to analyze or view end user productivity are the key themes of this session.

Computers were a major boost to U.S. productivity during their early development since they were put on assignments which required little human intervention and were well structured and repetitive. Archer points out that today the applications of computers are more interactive, involving major human intervention doing work which is highly unstructured and non-repetitive. These new applications of computers are requiring fundamental changes in both hardware and software.

New hardware such as local area networks, geographically distributed systems, new database facilities, graphics and intelligent terminals, microcomputers, as well as new architectures are beginning to emerge. New types of software allowing application development without programming or, at least, more automation of application development are also emerging. Elsen's paper focuses on one aspect of this new type or style of software development.

The era of "end-user-driven" computing is now upon us. Understanding the work that the user does will be key to engineering the systems which are flexible enough to meet the productivity goals of an organization. The paper by Wilson/Mohr begins to address the issue of measuring and reporting service while allowing the user to define work in a flex ible manner

In the past, performance analysts have viewed their job as analyzing the computer system and its underlying structures. In the near future, however, performance analysts will be forced to view both the computer and the personnel subsystems. Archer's paper discusses a framework for doing performance analyses at different levels of system complexity. That is, he proposes a finite state modeling framework to express both the computer and personnel subsystems. Given this framework, one can apply the typical performance analysis methods we have all become accustomed to.
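Archer's finite state framework is not reproduced here, but the general idea - expressing the personnel and computer subsystems as states of one model and then applying ordinary performance analysis to it - can be suggested with a toy sketch. The states and timings below are invented.

```python
# Toy sketch of a finite state view of one user-computer interaction cycle.
# "think" and "key" belong to the personnel subsystem, "process" to the
# computer subsystem; the mean times in each state are invented.

MEAN_SECONDS_IN_STATE = {"think": 20.0, "key": 8.0, "process": 4.0}
NEXT_STATE = {"think": "key", "key": "process", "process": "think"}  # simple cycle

def transactions_per_hour(times):
    """One pass through all states completes one end-user transaction."""
    return 3600.0 / sum(times.values())

base = transactions_per_hour(MEAN_SECONDS_IN_STATE)
faster = transactions_per_hour({**MEAN_SECONDS_IN_STATE, "process": 1.0})

# A change in either subsystem shifts end-user throughput; here a faster
# computer subsystem raises it from about 112 to about 124 transactions/hour.
print(f"{base:.0f} -> {faster:.0f} transactions per hour")
```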

SERVICE REPORTING FOR THE BOTTOM LINE

Carol B. Wilson

Fiscal Associates, Inc.

Alexandria, VA

Jeffrey M. Mohr Chan Mei Chu

Arthur Young & Company Washington, DC

The productivity of the end-users of an ADP installation is affected by the service being rendered by the installation, and by the anticipated service levels. Losses in productivity of the end-users can ripple outward in the form of losses in benefit to the organization from missed opportunities, etc., or in the form of increased cost to the organization to produce the same products or services. As the ADP installation becomes more critical to the success of the organization, the need to ensure the continuation of adequate service becomes more important. This paper describes the basis of an approach to user service reporting based on the fiscal aspects of ADP service. Such a system would provide the necessary information to various levels of organizational management to permit the assessment of ADP performance as it affects end-user productivity and the bottom line for the organization.

Key words: ADP installation management; cost-benefit; performance manage- ment; productivity; user service reporting.

1. Productivity and ADP Service cal advances made in the hardware area. In response to these concerns, the ADP industry Productivity of the American work force has seen an upsurge in the development of has become a general issue of concern to productivity tools for both ADP professionals business and government. For the past sev- and non-ADP professionals using ADP as a eral months , hardly any issue of a manage- resource. ment related periodical has failed to have an article on productivity. We read arti- The concept of aiding programmer pro- cles concerning the great productivity found ductivity extends back many years. For in the Japanese industries which is no long- example, the addition of the Report Writer er found in American industries. We read capability to the COBOL language was an articles on how Theory Z of management will early productivity tool aimed at reducing increase the productivity of our work force. the time and effort required to format and Books have come out concerning the psycho- generate typical reports. The early efforts logical factors of groups and norm settings, in the file management system area were also and how this affects the work flow and pro- driven and sold for productivity increases ductivity of workers in different environ- by reducing the effort required to handle ments. Such concerns have also been voiced increasingly larger and more complex sets of in the computer industry, as the productivity data. Following this, products were devel- of ADP personnel lags behind the technologi- oped; e.g., data base management systems.

59 which would permit a non-ADP person to access placed on the USRS were the major topics of data and perform some limited formatting and the contract. In the remainder of the paper, manipulation of that data for reporting pur- we summarize the work of the contract and poses without having to use an ADP profess- discuss the requirements of such a USRS. ional as a go-between. 2. Service and Net Benefit to the What has sparked this interest in pro- Organization ductivity? Are we striving for greater pro- ductivity for the sake of greater productiv- The goal of any ADP installation should ity, or, is there some underlying principle be to meet the computing needs of its user that drives the pressure to maintain or to community; i.e., to provide the needed ser- increase productivity of the work force? vice to the users. User service reporting While there are undoubtedly sound social and should be the cornerstone of all computer psychological reasons for having people work performance management programs in that productively, we feel that the primary inter- (1) reports of deviations below the required est in productivity relates to the fact that level of service should be the trigger mech- decreases in productivity decrease the bene- anism for some type of performance improve- fit, or profit, to the organization. That is ment activity; (2) information should be pro- to say, that productivity affects the bottom vided by the reporting system to enable pro- line figure for an organization. This can per planning for the application of user easily be seen when we objectively look for workloads to the ADP installation resources a definition of productivity. Productivity to enhance productivity and maximization of can be defined as the amount of work perform- net benefit. Because the requirements for ed for a given amount of dollars. Productiv- service vary and the demands placed on the ity will increase when more products or ser- installation may vary, the task of managing vices are produced for a specified cost. service becomes one of global balancing and Increasing the output of an organization - planning for the best overall benefit to the products or services - increases the benefits organization. It should be noted that the to the organization in the form of income, ADP installation itself only generates a increased social benefits, etc. If the in- direct benefit to the organization if it crease occurs without a corresponding in- brings in outside income from "renting out" crease in the cost to produce the output, the its excess capacity. Otherwise, the ADP net benefit to the organization is increased. installation's benefit to the organization can only be measured in terms of the net So what does user service reporting have benefits to the organization of having the to do with productivity? Many ADP installa- workload run on it, where the net benefit to tions provide a valuable, possibly critical, the organization for a particular piece of service to the organization by providing com- workload can be defined as the difference puting resources to the organizational units between the benefit to the organization an^ to assist them in carrying out their func- the total cost of performing the work using tions. The service which the installation the ADP installation resource. provides can affect the productivity of the organizational units. As the ADP installa- How are service and net benefit related? 
tion service becomes more critical to the Bennett [BENNW 81] discusses this relation success of the organization, the need to en- by pointing out that if an ADP installation sure the continuation of adequate service provides service to an efficient user and becomes more important. A user service re- minimizes to a practical point the time re- porting system will be needed to provide the quired to successfully complete the unit of necessary information to the organizational work and the required level of user effort, management which will permit the assessment that the installation is providing practi- of ADP performance as it affects user pro- cally maximal effective service. Further, ductivity and the ability of an organization service may be considered to have improved to maximize its net benefit. if the time-to-completion or user effort are reduced and such reductions lead to reduced During fiscal year 1981, the National costs to the user or increased benefits. Bureau of Standards supported work (under The setting of service goals and the level contract NB80SBCA0501) to look into the of service rendered can affect the product-

feasibility of developing a user service re- ivity of the end-users . There are no univ- porting system (USRS) which would support ersal "best" service levels, since the decisions made to maximize the net benefit levels depend upon the environment of the to an organization. The problems associated work force which uses the ADP resource. For with developing a USRS based on such a con- example, the response time of an online cept and the requirements which would be system should be fairly fast for repetitive


data entry so that the system does not slow plex, and a discussion of the associated down the person entering the data. However, problems is definitely beyond the scope of in some situations it has been shown that a this paper. rapid response may actually lead to greater errors, uneasiness, and lessened productiv- Although non-trivial, we will assume ity of a terminal user. that the gross benefit to the organization for a particular unit of work can be deter- In the section immediately below we mined. It remains, then, to determine the give a general overview of the concept of gross, or total, cost to the organization of maximizing net benefit to an organization. processing that work at its ADP Installation. The reader is referred to the NBS special The total cost of processing work for a pro- publication [MOHRJ 81a] and other references ject (j) can be expressed as in the bibliography dealing with cost and/or benefit analysis of ADP service and the economics of computing. This discussion is C(j) = CU(j) + CI(j) (1) included to ensure that the reader has a general notion of cost-benefit analysis and its informational requirements, since the where C(j) is the total ADP-related cost for informational requirements form the basis of work for project (j the approach to a USRS. In the section fol- CU(j) is the Indirect costs to project (j) lowing the discussion of net benefit, we ex- CI(j) is the direct Installation costs amine what general requirements this approach to project(j) places on a USRS. CU(j) relates to the service rendered by the ADP installation and would include such 3. Concept of Maximizing Net Benefit to the things as the cost of the direct users of the

Organiztion project (i.e. , personnel who directly use the ADP resource such as programmers, data entry The crux of user service reporting and clerks, etc.) and delay costs (e.g., lost the management of service to the users lies opportunity costs or user impact costs). in the identification of the computing needs CI(j) would Include the computing system cost of the user community. The approach taken (e.g., chargeback) and auxllllary costs. It in this paper is that analysis of computer is well to note that CU(j) may be positive or performance should be based on the fiscal negative, since aspects of rendering ADP service to the of- fered user workload, and the determination of xvrhich work should be included in the of- CU(j) = P(j) - S(j) (2) fered user workload and the service it should receive be based on the net benefit of that work to the organization. The underlying where P(j) is the personnel cost assumption is that benefits and costs assoc- S(j) are the savings incurred as the iated with the use of the ADP resources In result of productivity Increases support of an organizational program can be due to ADP reducing the effort. quantified. In fact, it is pointed out It is at this point that we can begin to see [BENNW 81] that if the benefit associated where service, end-user productivity, and the with a unit of work can not be determined, bottom line are related. End-user productiv- the decisions concerning the usefulness of ity is affected by the service both rendered the work and its relative priority with re- (i.e., response) and anticipated. Losses, or spect to other work can not be objectively gains, in productivity will ultimately be re- determined flected in costs and benefits to the organi- zation. The provision of a particular service or end-product by an organization is assumed Regardless of the cause and source of to have an intrinsic value. Benefits are the ADP service degradation, the end result assumed to be the positive advantages which will be the same - some degree of Impact on an organization receives as a result of sup- the user community. When the user community plying an end-product to another organiza- is unable to perform their assigned jobs due tion or individual. In the commercial sec- to ADP service failures, resources are wasted, tor, that value relates to profit to a busi- user costs increase, and productivity de- ness. In the Federal sector, the value may creases. Hence, when service rendered does be derived from satisfying social needs of not meet the required levels, delay costs may the country as expressed in the system of be accrued. The estimation of the more tang- public laws and Congressional mandates. ible costs is fairly straightforward, invol- Qunatifying such benefits can be quite com- ving estimating the length of time a user is


non-productive while waiting for an overdue response from the ADP installation. But there are often less tangible delay costs, involving, for example, changes in work patterns or attitudes of users. An example of the less tangible delay costs is given by Axelrod [AXELC 79] when he discusses the effect of a job not meeting the estimated turnaround time. The effect may range from lost time and productivity of the user as he diverts his time to a monitoring operation with respect to the system queries, to a loss of potentially profitable jobs because of lack of user confidence in the ADP installation.

Secondary delay costs may also occur as the poor service to the direct user ripples its way outward in the form of missed deadlines on projects, decisions being made using incomplete data, decisions postponed, lost opportunities for benefits, etc. It should be pointed out that resources are always wasted when service is poor, since they do not contribute to the gross benefit to the organization and could always be expended more profitably in other areas. It is not a valid argument that such costs are not real because salaries must be paid anyway, machines leased, etc., since, at some point, end-user productivity loss will translate into additional resource (e.g., manpower) requirements.

Given that the above determinations and calculations can be made, we can define as follows the net benefit to the project (j) of performing its work by using the ADP installation resources:

NB(j) = B(j) - C(j)    (3)

where B(j) is the gross benefit to the organization for project (j)
      C(j) is the total cost of using the ADP installation.

The equation above assumes that the installation has a particular configuration which is tuned to perform in a certain manner. When the characteristics of the installation's ability to provide service change, the NB(j) will change, because C(j) depends on the direct installation costs - CI(j) - and the performance of the installation, which affects CU(j).

For a given ADP installation capability, the net benefit to the organization as a whole for the ADP resource is defined as:

NB = B - C    (4)

where B is the benefit to all projects given the installation capability
      C is the cost of using the ADP installation.

Since the net benefit, NB, depends on the precise capabilities of the ADP installation and the make-up of the user workload, a set of NB(i) could be developed for various alternative installation capabilities. The proper mix of user workload and ADP installation resources could then be determined by maximizing the NB(i). It should be noted that sub-maximizations must be performed throughout the process, since the determination of whether a particular user job will become part of the workload depends on system and auxiliary costs, which in turn depend on installation capability.

In order to quantify these economic consequences, the user reporting system must be able to describe user workload at a high enough level to relate meaningfully the workload to the costs and benefits associated with the user work. In addition, methods to translate user work to installation-specific measures are needed to estimate the capacity required to support the user workload at a specified level. Such a capability would permit the costs to the installation to be weighed against the costs and benefits to the users of using the installation, all of which would lead to the maximization of net benefit to the organization. Such capabilities are not found in performance systems of today. In the following section we discuss why these characteristics are important.

4. User Reporting System

4.1 Environment

The user needs the ADP installation to process his workload within certain service thresholds if the net benefit to the organization is to remain practically maximal. User service reporting enters the picture by providing the necessary information to keep the service function of the ADP installation under control - although it may not provide all the information necessary to arrive at a solution to the performance problem. Further, the user service reporting system must provide the information necessary to relate user workload to the service characteristics of a given configuration if planning is to be effective in ensuring that the needed levels of service are provided.

The goal of a user service reporting system should be to provide information with respect to ADP service to support the decisions being made by the upper management,


the project /line management, and the ADP in- informed cost analyses in order to stallation management. The users need to maximize net benefit to the organization know what service to expect and what costs will be incurred in acquiring and using the * The ability to provide information that services of the ADP installation if they are is needed by the user community to con- to effectively control and plan their pro- trol their projects and resources grams and projects. The ADP manager also needs information concerning the current * The ability to provide information that level of service being provided to the user is needed by the ADP manager to control and the impact of the projected workload on the computing resource the installation's ability to provide service. The USRS should provide an important link be- The first requirement is necessary if the tween the user community and the ADP instal- proper planning is to be carried out to en- lation. sure that the required service level can be met by the ADP installation in the future as Both the user and the ADP manager need it processes the user workload. The last service information in order to properly con- two requirements support the control, or trol the execution of their projects and pro- operational, decision-making processes of grams. For example, a manager of direct the user community and the ADP installation users may need to make a decision regarding manager the addition of data entry clerks to a pro- ject if the system is not responsive enough 4.2.1 Informed Cost Analysis for the current staff to perform the data entry function. Likewise, the manager of a In the discussion above, we have noted program development group may need to make that the computing needs of the user commun- decisions regarding the assignment of multi- ity should be determined by satisfying the ple development tasks to an individual due overall goal of maximizing net benefit to to the long turnaround time of the ADP in- the organization. The level of service pro- stallation in processing batch work. These vided by the ADP installation may affect types of decisions are control, or operating, both the costs and the benefits of completing decisions which are made to accomodate short- the user's work functions. To be useful to term situations that exist when the ADP in- a USRS report recipient, the work units for stallation is not providing acceptable levels which service reports are generated should of service. If such situations were to occur correspond to the work units of the function over a sufficiently long period of time, being performed. That is to say, the users planning decisions would be most likely be must be given the ability to define units made managers by user and the ADP manager, of work, and , thus, the levels at which the resulting in a mutually acceptable course workload will be described, so that the re- of action. ports are in user-oriented terms and mea- sures. The definition of user work units Decisions in the planning area require will vary from installation to installation, more and different information than is re- and even from application system to appli- quired for making control decisions. In cation system within an organization. particular, not only would information re- lated to the quality of service being pro- Further, the full cost of ADP service vided be necessary, but also information con- must be reported, which includes both user cerning the cost of providing service at dif- and ADP installation costs. 
The service ferent levels. Such cost information would level requirements placed on the ADP instal- include both the direct cost to the ADP in- lations reflect the quality of service re- stallation of providing the required levels quired by the user community to successfully of service, and the indirect cost to the meet its schedules and deadlines. When the organization when service falls below the ADP installation is meeting these require- specified levels. ments and is available, cost incurred by projects for idle workers, missed deadlines, 4.2 Requirements etc., are not attributable to the ADP re- source. However, whenever the ADP instal- In order for a user service reporting lation is unable to meet the service require- system to satisf actorially support both the ments of all, or part, of its user community, ADP manager and the user community in effect- there is a cost impact to projects of not ively managing their projects, the following having adequate service. For example, the requirements must be met by the system: lost time, and hence cost, ^associated with * The ability to provide management having data entry clerks sit idle while the with information necessary to make computing resource is unavailable is an

63 additional cost incurred by the project, lat ion might be a contributing factor. since the work they would have been perform- ing must still be performed when the ADP re- The user service reporting system must source becomes available. The estimation of be able to provide the required information such delay costs is not necessarily straight- concerning the projected quality of service forward or simple. Such information, however, early enough to serve as input to the deci- is essential to the user community in con- sion-making process. The requirement is trolling project costs and for management in critical is the reporting system is to be making planning decisions. In order for a able to support the project control function USRS to satisfy this requirement, it must be of the user. Such information is needed by able to model (calculate) the costs incurred the user managers to determine the adjust- by individual users and projects based on ments that need to be made to the work different levels of ADP service. When the schedules. This is especially important if reporting system is supporting the control ADP usage is on the critical path of the function, it must be able to model delay project. The ability of the USRS to fulfill costs based on the actual ADP service being this objective may hinge on its ability to provided. When the reporting system is sup- relate service with system configuration and porting the planning function, it must be offered workload. able to model the costs based on projected service and performance levels as estimated 4.2.3 Control of the ADP Resource from workload projections and projected changes to the system configuration. The The ADP manager has the responsibility satisfaction of this objective is of great for ensuring that the ADP installation meets importance to the overall ability of the USRS the service requirements of the user commun- to support the decision-making processes, ity when it is processing the expected work- particularly in the planning area. load, and that the ADP installation is run in an efficient, effective manner. In order 4.2.2 Planning and Control of Projects to carry out these functions, the ADP manager needs to know where in the system the service An ADP installation functions as a ser- bottlenecks are to be found. The key to this vice to the user progrmmatic groups and pro- requirement is that the USRS must be able to jects of an organization. Almost all user relate the quality of service to factors ADP work is not an end in itself, but is a which can point to the specific areas which part of a larger project or program. The need improvement if the service level is to programs and projects have schedules and be brought within control limits. The USRS deadlines to meet, and milestones to be is not to take the place of a performance reached. These schedules and deadlines were management system, but should be able to projected based on the quality of service the provide service related data to such a sys- user expected to receive from the ADP instal- tem. The USRS should indicate when a per- lation. The level of service was based on formance problem exists, and the performancie the function of the user project and was to management system should be used to diagnose help to optimize the end-user's productivity. the problem. In order for a manager to control effectively his project and meet his deadlines and sche- 5. 
Conclusion dules, he needs information concerning the service he is receiving for the ADP instal- With the possible exception of ADP ser- lation and the service that he can expect to vice bureaus or commercial timesharing ser- receive in the near term. vices, an installation does not exist to provide ADP services, but to support an Reporting to the user community the organization's mission. The service which level of service it has received is probably an ADP installation provides its users can not as important as projecting the quality of impact the productivity of the end-users and, service that will be provided. Although it eventually, the net benefit to the organi- may be argued that the community already zation. Since such decisions will be made knows the level of service it is receiving at multiple levels throughout the organi- from the ADP installation, and that the re- zation, the USRS must be capable of supplying porting of such informatin is not necessary, information at the appropriate level to each it is the inclination of users to blame the report recipient. In particular, the USRS computer anytime projects are not going well, must be able to relate the service rendered or are delayed. ADP service information to the costs of providing, or not providing, would appear useful in helping a user manager adequate service to the organizational units. isolate a problem with a project in which the service being received from the ADP instal- During the course of the study it was


determined that no current performance systems satisfy the requirements outlined above. The primary cause for the lack of acceptable systems is one of orientation - user versus ADP system. That is to say, the current systems are heavily ADP-system oriented and they report service in ADP units which are easy to track but which do not necessarily relate to user work functions. The USRS described above, on the other hand, must be very user-oriented if it is to fulfill its goal of assisting management in maximizing net benefit to their organization. There are three issues which must be resolved before the USRS described above can be developed:

* ADP system support in the definition of user work units and service characteristics

* techniques for the estimation of benefits and delay costs

* increased user involvement

The resolution of each of these issues is necessary if the USRS is to be developed. The NBS report [MOHRJ 81a] discusses the prospects for resolving these issues. Further, it discusses at length the differences in workload characterization required by this approach versus the traditional workload description. It discusses the requirements on the user community as a result of the required, new characterization. It recommends particular service measures to be reported which relate to cost-benefit calculations. The conclusion of the report is:

    The proposed approach to ADP service reporting should be useful in virtually all organizations or agencies. The approach is directed towards assisting the organization in making those decisions that will maximize net benefit by providing information describing the total costs and benefits of the level of ADP service being provided. The reports will be in terms known and understood by the user. The information in the reports should be compatible with the various types of management information presented to high level managers. The ADP service reports should be well-suited for inclusion in an overall reporting system.

References

[AXELC 79] Axelrod, C.W. Computer Effectiveness: Bridging the Management/Technology Gap. Washington, D.C.: Information Resources Press, 1979.

[AXELC 80] Axelrod, C.W. The Dynamics of Computer Capacity Planning. CMG Transactions. 1980 September; 29: Section 3.

[BENNW 81] Bennett, W.D. Computer Performance: A Conceptual Framework. To be published as an NBS Special Publication, 1981.

[EDPPR 80] Fulfilling User Service Objectives. EDP Performance Review. 1980 October; 8(10): 1-7.

[FLEMI 81] Flemming, I.J. Service Level Objectives as Capacity Management Criteria. Record of the Third Annual Conference on Computer Capacity Management. 1981 April 7-9; 163-170.

[GUERJ 80] Guerrieri, J.A. Establishing True User Requirements. Small Systems. 1980 September; 8: 26-27+.

[HOWAP 80] Howard, P.C. Planning Capacity to Meet User Service Requirements. Computer Performance Evaluation CMG XI. 1980 December 2-5; Boston, Massachusetts. Computer Measurement Group: 147-151.

[KLEIJ 80] Kleijnen, J.P.C. Computers and Profits: Quantifying Financial Benefits of Information. Reading, Massachusetts: Addison-Wesley Publishing Company, 1980.

[MOHRJ 81a] Mohr, J. and Wilson, C. Draft: A Report Describing an Approach to ADP User Service Reporting. Contract No. NB80SBCA0501, National Bureau of Standards, 1981.

[MOHRJ 81b] Mohr, J., Wilson, C., and Chan, M. Fiscal Aspects of ADP Service Management. Proceedings of CMG XII. Computer Measurement Group (to be published).

[NOLAG 79] Nolan, G.J. Standard Costing for Data Processing. Controlled Resource Management Through Computer Performance Evaluation CMG X. 1979 December 4-7; Dallas, Texas. Computer Measurement Group: 12-6.



AUTOMATING THE AUTOMATION PROCESS

Louis C. Elsen

Evolving Computer Concepts Corp.
St. Louis, MO

The increasing demands for data automation support have created a new "frontier": A COMPUTER SYSTEM THAT CAN PROGRAM ITSELF. When one considers that $4 billion of our $6 billion Federal budget for automation support is allocated to staffing and that the introduction of new and improved technologies places increasingly difficult requirements on personnel, the need for the new "frontier" becomes apparent. The introduction of automated automation has evolved by necessity - it is a new concept whose time has come. The circumstances that brought us to this evolution stem from:

+ rapid growth in technology which makes a particular computer system virtually obsolete before it is delivered.

+ personnel which is at a crisis level. (A January MIS report predicts shortages of 50,000 in 1981.)

The result of these two "circumstances" has had a significant impact on annual cost.

+ Hardware is decreasing at 20%.
+ Manpower is increasing at 15%.
+ Software is increasing at 30%.

Today, in order to operate a "typical" computer installation, an annual budget ranging from $800,000 to $1.2 million is required. This cost becomes considerably more significant when one considers the two distinct parts of each installation:

+ HARDWARE
  - Mainframes
  - Utilities
  - Environment
  - Supplies

+ STAFFING
  - Managers
  - Analysts
  - Programmers
  - Operators

The latter of the two parts, staffing, currently accounts for 70 to 80 percent of an installation's budget. This is a far cry from the cost relationships that were in existence 20 years ago. Then, an average installation was able to operate on a budget of approximately $500,000 with 65-75 percent of its total resources allocated to hardware. This impacted the growth of data automation, leading to the development of computer performance management (CPM) systems which targeted efforts toward increasing hardware productivity. This philosophy and approach to CPM has continued through the years, although the target emphasis has shifted and should now be redirected toward increasing personnel productivity.

This can be accomplished by evaluating those functions most commonly performed by data automators, reducing them to their logical components, and then automating them. The end result is:

+ reduced programming requirements by elimination of menial or simple design/programming tasks;


+ increased site productivity by facil- programming requirements necessary to obtain itating software generation, thereby data from an on-line data base. It performs reducing the time required to achieve this function well. a desired system objective; As alluded to earlier, programming + increased programmer productivity by efforts must be divided into logical redirecting programming efforts to categories. An attempt to automate areas where automatic programming is "everything" in one huge system would not yet practical or achieved. probably be catastrophic. LOUIS is just one part of the entire programming scheme; Although these "results" appear to rep- incorporating the techniques employed by resent an unrealistic utopia that should be LOUIS and applying them to update and batch reserved for a science fiction novel, this systems would yield similar results. capability does exist. Basic analysis of many computer facilities revealed that the LOUIS is unique because it writes "computer's" primary function was usually tailored programs optimized for the nothing more than a sophisticated filing particular machine it is running on. When system and that the majority of the an individual specifies a data base name, functions being "programmed" were either LOUIS interrogates the data file to putting something into the "file cabinet" or determine the format, type, and size of the getting something out. Observation of file. It analyzes the data to determine techniques employed by analysts and record volume, identifies numeric data and programmers revealed two methods for dates, and edits them to ensure accuracy. accomplishing these input/ output functions. LOUIS continues performing analysis on the information it obtains, including the types + Batch: the processing of data of inquiries it has executed, performing the accumulated over a period of time. very same functions programmers perform. The programs LOUIS writes are optimized + On-Line: the processing of data assembly language programs. This approach immediately with a response received achieves improved throughput and minimum as soon as possible, usually within system impact. To understand what (in my seconds opinion) necessitates the use of ASSEMBLY language and why its use has diminished, I To continue the thrust of automating must now refer you to the past. automation, one must continue analyzing data automation, further reducing it to its logi- When computers were first designed, cal components, grouping similar functions machine/assembly languages were the only together, analyzing how each individual com- means of "programming." Due to its ponent or function is performed, and finally tediousness and complexity, few individuals reducing the programming logic to machine became qualified in the "art." The logic. We, as data automators, perform evolution of data automation progressed in these functions daily whenever we automate a conformance with the law of supply and personnel, budget, or inventory system. We demand. In order to increase the number of study the manual intensive functions being individuals capable of interfacing with performed, reduce them to their logical com- computer, "simpler" languages were created. ponents, then program the computer to do them. A computer program which is the prod- + Higher Level Languages-A language in uct of the techniques I have just outlined which each instruction or statement is the Logical On-Line User's Inquiry System corresponds to several machine code (LOUIS). instruction. 
These languages allow programmers to write in notation with LOUIS is a system which writes on-line which they can become familiar. computer programs to retrieve data from the "file cabinet." It was designed to provide - COBOL managers and functional users with the - FORTRAN information they need when they need it. - BASIC The programs LOUIS produces have the full - etc range of capabilities comparable to generalized inquiry systems currently on the + Report Writers-A processing program market. LOUIS was not designed to be the that can be used to generate reports answer for everyone. It is not an update from existing sets of data through system, nor is it a batch system. The parameter cards which control the original design objectives of LOUIS were to flow of the program execution. automate approximately 60 percent of the Report writers usually execute in the batch mode. . . .

aren't enough data automators to program all - SIS the requirements that exist. With the - TRS introduction of such technologies as word - lifWDMS (These two systems are processing, teleconferencing, and electronic complete data management systems mail which are becoming the vast array of which provide a capability to the interconnected technologies available update the data base also.) through communications networks, the need - TOTAL for data automators is continuing to grow. - etc, By incorporating systems such as LOUIS to perform tasks in basic system design and + Inquiry Languages-A processing programming, we are able to utilize data program which utilizes a simple automation personnel more effectively and command language whereby the thereby increase overall productivity. interrogation of the contents of a Management is then in a position to reduce computer's storage may be initiated required manpower or redirect programmer at a local or remote point by use of expertise to critical areas and accomplish a keyboard, touchtone pad, or other their objectives without sacrificing dev ice internal machine efficiencies. A computer system designed to write specific programs - AZ7 for each user application provides: - IDS QUERY - etc. + System analysis and design + Programming Although these "simpler" languages + Elimination of program debugging time eased the loads on programmers and users, + Program optimization the workload on the computer was dramatic- + Program documentation ally increased. For example, a program written in Assembly Language required the All without the need for manual programmer to write 5,000 lines of code intervention which produced 5,000 executable machine instruction. However, the same program In order to provide some validity to written in COBOL or a higher level language this presentation, here are several facts could be written in 500 lines, but when this and figures obtained from evaluation of program is compiled and reduced to its LOUIS. One installation in Denver measured assembly language counterpart, the execut- an increase in productivity of ratios able code generated required 8,000-10,000 between 50:1 and 80:1. Eleven sites, when lines. The Report Writer, in this instance, performing their analysis, recorded a 17:1 required 50 lines of code and generated reduction in cost. These figures were 25,000-30,000 lines of executable code. relevant to hardware performance alone. When using an Inquiry Language, 1-5 lines of Another installation was capable of reducing code are necessary, producing 40,000-50,000 and redistributing its programmer work force executable lines. There are three areas to by 35 individuals. In a national survey be examined in this instance. conducted by the Air Force and substantiated by the Department of Defense, savings in the 1. The Assembly Language Program is quarter-billion- dollar range were achieved usually the ultimate in machine in just the first year. efficiency and productivity. Unbelievable? Not really. Increased 2. Programmers using higher level productivity at each site where LOUIS was languages can dedicate much less loaded was achieved because LOUIS allowed time per program. the user to request desired information in the required format when it was needed. 3. Through the use of inquiry langu- LOUIS did not question the user's need for ages, users can now communicate information. It simply analyzed the request directly with the computer; thereby and proceeded, to write the programs increasing their productivity. 
necessary to obtain the answer as quickly as possible This now brings us to the present. Although we are providing data automation The philosophy expressed is shared by support to managers and functional users, John J. Connell, author of an article titled their need for the processing of information "The Fallacy of Information Resource and the communication of that information Management," appearing in I nfosystems , May necessary for the productive operations of 1981. Mr. Connell identified four kinds of an office continues to exist; and there just information related activities:


I perceive that in the not- too-distant + Identifying future, a computer system will allow a user + Processing to state his new requirement interactively + Transporting at a terminal. The "computer" will ask the + Using user pertinent questions to obtain information such as the relationships of He stated that the identification and use of data fields, the frequency and types of information are the "responsibility of indi- information requests, and the volume of vidual, thinking, human beings" and that the information to be stored, retrieved, etc. processing and transporting "can be aided by The "computer" will then perform analysis as outside attention, as the successes achieved required to determine necessary hardware in data processing amply demonstrate." augmentation (if any), estimated cost, and feasibility in terms of the machine's It is my belief that the "...successes capability. The findings will be compiled achieved in data processing..." and the in a report to the requester and computer roles of data automators must continue to performance manager. It can then be provide users with the information desired reviewed to determine cessation, continua- when they need it. tion, or modification of the plan. Once this has been acomplished, it is perfectly However, I disagree slightly with Mr. logical to have the "computer" go on to Connell's opinion of identifying and using actually write the programs needed to information. I believe these two items of implement the new procedure in a timely, information related activities can be en- efficient manner. We have reviewed and hanced by data automation. Through educa- substantiated the shift in the managerial tion, users can become aware of what infor- functions performed by CPM from a hardware mation can be gathered, how this information emphasis to that of personnel. We have can be used, and presented with new tracked the path of data automation and its methodologies to further enhance programming evolution and shown the need for productivity through the use of data automating data automation. We have automation demonstrated feasibility, operability, and cost effectiveness of such. You may ask, "Exactly where does CPM fit in this evolution?" It is my opinion As data automators, we can produce that CPM should play a new role, one which far-reaching, beneficial results by auto- incorporates management of the "total site mating data automation. Reducing our work- productivity." I envision computer load, increasing central site productivity, performance managers performing two distinct and providing better service will signifi- functions, utilizing automation to cantly enhance our country's productivity accomplish both. and economic status. As managers of a great resource, we are bound to insure a smooth, + The first function performed by CM productive, secure transition into the should provide users with a cost future providing management with the means analysis of their current and to obtain current data necessary to projected data automation support. efficiently manage defined areas of This service will assist users in responsibil ity. determining whether or not the advantages of obtaining desired information is worth the expenditure.

+ The second function should concentrate on the internal workings of the site. The goal here is to achieve the most cost-effective balance between human and machine resources.

In summary, it is important to analyze the areas I have just described because doing so can bring the realization that these jobs are currently being performed manually. It is my contention that in order to produce truly cost-effective results, the performance of these jobs must be automated!


METHODOLOGY FOR ANALYZING COMPUTER USER PRODUCTIVITY

Norman P. Archer*

Digital Equipment Corporation 146 Main Street Maynard, MA 01754

A methodology is proposed for the analysis of computer user productivity. The methodology is based upon a multi-layered structure of the computer system in an interactive environment, where each lower level develops more detail of the interactive system. This structure enables a logical approach to the analysis of interactive systems, and productivity may be analyzed at any level, from the lowest primitive function up to the most aggregated system level. Considerations of productivity differ considerably in scope and direction, depending upon the level of detail. Examples are drawn from the area of text editing, which has an increasingly strong effect on the performance of interactive systems.

Key Words: Productivity analysis; interactive systems; text editors; end user performance; performance analysis.

1. Introduction. of work done. In order to accomplish this desirable result, the computer system-human interface should be designed in such a In the time since electronic computers fashion that the computer can perform more of were first invented and used, there has been the task with little human intervention. a continuing decline in the cost per unit work accomplished by computer systems. The This has not been difficult to do with cost per unit time for the human resources standard clerical accounting tasks, which working with the assistance of computer formed the first great wave of automated systems has increased steadily over the same business systems. Here, in essence the time interval. This latter cost has been the computer is turned loose on a great stack of driving force behind the ever-increasing move well-defined and repetitive tasks, and the towards automation. In theory, the computer results are returned to the user in some assists the human user to the extent that well-organized fashion at a later time. there is a net decrease in the overall cost There is very little need for human intervention in such a batch-oriented *0n leave from McMaster University, Hamilton, environment. Hence the productivity Ontario improvement has been due to the speed at which the computer carries out its tasks, and this has shown orders of magnitude increases due to hardware evolution alone.


primarily for prose editing. Similarly, But now the next wave of business there are common features among editors used systems is in the process of evolution, with primarily for computer program editing. As the rapidly increasing usage of on-line data time has gone on, most of the more recently base systems, the automated office, developed text editors have evolved towards a timesharing computer program development and much more common functionality which may be other similar uses, all sharing the common used for both prose and program editing. attribute that they are highly interactive systems. All carry out a variety of The line editor was the first type of unstructured and non-repetitive tasks, under text editor developed, due to the fact that the close direction of the user. These the originally available terminals were hard interactive systems are feasible only because copy terminals, which require line editors. the system hardware costs are so low that Many varieties of this type of editor have they are no longer the important been developed and used, primarily in consideration they were when the first wave systems and application programming work. of business system development was in The chief characteristic of the line editor progress. Now we are faced with a different is that it addresses one line of text at a set of problems. The current concern now is time. Most line editors allow only vertical with the human interface, since the user is movements of the cursor, so if an operation seated in front of a terminal and a great is to be performed upon a character or deal of human intervention is required due to string of characters on a particular line, the lack of structure in the tasks the system then an editing command is keyed in which is performing. Any improvement in the specifies the string in question, and the efficiency of these interactions is most operation to be performed upon it. The beneficial, because the limiting and most operation may usually be extended to include expensive factor in the cost equation is the a range of lines, if desired. Line editors human were and are used primarily for computer program editing, since computer programs are A class of tasks which is consuming a line oriented. Some of the more specialized rapidly increasing proportion of interactive line editors allow line-by-line syntax system time includes the automated office checking as the program is entered, and may functions such as word processing and also allow program compilation and execution electronic mail, as well as the more from within the editor. However, line traditional task of computer program editors do not lend themselves well to the development. Of these tasks, for example, features which are desirable in editing 100 percent of the time spent on a word proseC 3 ]. processing system involves the use of a text editor, while measurements have shown that at For the more general text editing tasks, least 50 percent of the commands issued including both prose and programs, a screen during interactive program development editor (often referred to as a character involve text editing[1]. Thus it is very editor) appears to have several advantages clear that user productivity is strongly over the line editor, from the user's point affected by text editor design. Many of the of view. 
First of all, since it is appli- human factor problems of text editors which cable to a broad range of editing tasks, it have been addressed by other workers are requires the user to be familiar with only reviewed in a recent article by Embley and one editor and one set of editing commands Nagy [2]. Miller and Thomas [3] have also which may be used for any type of task. reviewed the more general problems Secondly, the screen editor allows the user encountered in the use of interactive to view an entire page of text, and to move systems. This report will discuss a the cursor about on that page. The cursor is structure or taxonomy for the analysis of normally used to point to the text which is interactive user productivity, with specific to be entered, modified or deleted. The attention to the analysis of text editing cursor may be moved by keyboard command, interactions as examples. mouse, lightpen or other pointing devices. The operation to be performed upon the text 2. Functional Characteristics Of is then input to the computer by means of Text Editors. special function keys or, as in the case of the line editor, by keying in the instruc- There are a wide variety of text editors tion. The rapidly reducing costs of available on modern computer and word hardware have made the screen editor a viable processing systems. The functional tool in many interactive environments. The differences among text editors are very chief disadvantage of the screen editor large. However, there is a certain degree compared to the line editor is that, to make of commonality among editors intended it an effective tool, much higher trans- 72 . , .

mission rates are required from the system to the discussion to other non-computer oriented the terminal and a higher price is paid in tasks if we wish, in the standard systems cpu time. This must be weighed against the analysis manner. However, this is beyond cost of the human resources making use of the the scope of this report and the discussion system. If the primary use of the system is will be restricted to those tasks involving computing, and machine resources are interactive computer system usage. limiting, then the choice of a screen editing system is less likely. This is frequently At the next lower leveKthe task the case in a university environment, for level) , more detail allows us to consider the example. Here machine costs are usually a properties of the individual tasks in which very strong consideration, and from the point the user is involved. The characteristics of of view of the university administration, a particular task would include: student time used in editing text of any kind is relatively cheap. The opposite is a) the type of task (for example, usually the case for a firm employing transcription from a typescript, clerical and professional staff for work editing from a marked copy, etc.), involving text preparation, where the cost of b) the amount of time required to personnel time may outweigh the cost of the complete the task, additional computer resources required to c) the expected error rate from the support screen editing, user/task pair, d) the ultimate disposition of the 3. An Analysis Framework. completed task (program execution, printed copy, etc.), and In order to analyze user productivity in e) the experience of the user with this an organized and logical fashion, the first type of task. step is to develop a framework for the analysis. One way of looking at the The subtask level is more detailed than interactive environment is to develop a the task level, allowing us to address such model with increasingly detailed levels of questions as the amount of time the user the system load in order to observe and spends carrying out certain activities while possibly improve system performance at each performing a specific task. This in turn of the more detailed levels. Consider the permits more direction in performance model which is illustrated in Figure 1. improvement, because we may identify certain Since we are interested in user productivity relatively primitive operations which the we are naturally interested in general system user performs frequently. The example shown characteristics in the presence of the global in Figure 1 illustrates subtasks used to system workload. The global characteristics describe editing operations. There are that are well known to affect user three primary subtasks shown here: text productivity are the system response time entry, subtext entry, and housekeeping. The and availability. Most system performance text entry subtask includes operations studies concentrate on the analysis of relevant to the direct entry of text, and system performance in the presence of this related cursor movement and editing global load functions invoked directly by function keys. The subtext entry subtask involves those If we are interested in user produc- operations in which it it necessary to key in tivity, however, we must consider the next character string instructions for editing lower level of this model, where the purposes, and includes function editing and individual user is described by a position. 
cursor movement relevant to the editing of This is essentially a description of the these string instructions. The housekeeping tasks which a user in this position is subtask includes remaining operations such as likely to perform, including the frequency instructions for formatting printer output, of performing each such task type. The filing documents, and sending electronic performance and productivity of a user in a mail specific position is generally given by: Each of the primary text entry subtasks a) the type and number of tasks the user includes the primitive functions of keyed must carry out text, cursor moves, and editing function b) the experience of the user with these commands which are at the lowest level in tasks, and this structure. It is not necessary for all c) the ability of the user. of the lower level functions to be common to these three subtask states. Neither must Note that we are concerned here with these functions be restricted to any the user's productivity while using the particular subtask state. Finally, a computer system. Clearly, we could expand secondary subtask state is the entry/exit

Figure 1. Text Editor Level Framework. (Figure not reproduced; it shows the levels, from most to least aggregated: Global; User Position j-1, j, j+1; Task; Subtask, comprising Subtext Entry, Text Entry, Housekeeping, and Editor Access; and the primitive Functions: Keyed Text, Cursor Moves, and Edit Functions.)


state, which normally has little direct no matter which system we are considering. effect on user performance because it is From here we can build to an aggregated task entered rather infrequently. mix at a higher level, or we can break down the task to more detail at a lower level. The primitive functions which are at the lowest level in this structure include any 4. 1 The Task Level character which may be entered into the text, any function key or keyed instruction To develop an understanding of the basic which may carry out specific operations such force which drives the evolution of inter- as "delete word", "replace", etc., and any active systems to more productive forms, it key or instruction which controls cursor is helpful to consider various classes of movement. Cursor movement includes such interactive tasks, where the amount of screen editing operations as keyboard arrow structure in the tasks performed may range keys or certain applications on light pen or from very high to very low. Examples of such mouse-driven screen editors. It may also tasks are given below. include functions which control line editor cursors ("pointers"). Degrees Of Task Structure

Frequently used functions in text editing systems are often built into special a) Highly Structured function keys. These might include the - Data entry ability to move the cursor by entities which - Transcribing text from document or are larger than the usual cursor movement by dictation character, and cursor movement by word, line, paragraph or page are common. Other b) Moderately Structured special function keys may be designed to - Editing text without the aid of allow the deletion or insertion of larger marked-up copy entities as well, or to perform special - Writing a computer program inter- operations such as filing the current actively, without the aid of notes document c) Loosely Structured - Writing a letter, memo or story Some frequently performed functions do without the aid of notes not lend themselves to function key - Developing a project proposal from definition. For example, searching the text a few ideas for a particular character string will require the user to define the desired Highly structured tasks are on the character string. This will involve borderline between tasks which can be com- switching to what has been defined above as pletely automated and those which require subtext entry mode, where the subtext is the some human intervention. Frequently, the character string in question. Similarly, a human intervention is required only to replace operation will require the transform information from one form to definition of the string to be replaced as another. Since the ultimate bottleneck in any well as the string which replaces it. Line productivity improvement effort in this area editors are more likely to require subtext will always be the user, technological entry of the operation to be performed. evolution will eventually solve this problem Subtext entry is not necessarily a dis- by eliminating the human from highly advantage, because it may allow greater structured tasks. For example, the most flexibility in user control of the editing commonly used text entry device is the system. keyboard, and even the most proficient keyboard users can not improve their keying 4. Productivity Analysis. speeds above about one hundred words per minute. Added to this is the fact that a Productivity analysis considers the user substantial number of users do not time and output, and the cost of the suppor- touch-type, and many get out of practice ting system. From the point of view of the because they use a terminal infrequently. productivity analyst, the analysis structure The obvious solution to this problem is to differs less in form than in emphasis, de- develop better text entry devices, and some pending upon the level of detail upon which progress has been made in this area. As an interest is to be focussed. For this reason, example, optical character readers may be the following discussion will be directed in used to enter printed text directly into the turn to the various levels outlined above for computer for editing and storage purposes. the analysis structure. The task level will Some progress has been made in the area of be discussed first, because the specific task voice text entry systems, but these are a itself is a common basis which varies little. number of years from being sufficiently


reliable and capable devices to substitute analysis tools for the study and improvement for keyboard entry. of computer system performance are available. These include [5] analytic modeling, hardware There is less direct aid possible or and software monitoring, benchmarking, etc., even likely from computer systems in less around which an entire industry has grown and structured environments. For example, thrived. automatic spelling verification and context editing are clearly feasible and are being For interactive systems, tradeoffs are used to some degree on word processing made in the usual manner between the cost of systems. But creativity will continue to be increasing system resources and the resulting the sole province of the human in the productivity improvement due to decreasing foreseeable future. On the other hand, the response times. It should be noted that pro- computer system can be an aid to the human in ductivity in this context is rarely quanti- improving the speed with which creative ideas fied in an organized manner. Rather, can be transformed into concrete form. It is computer system management tends to rely on at this point where a great deal of common- the complaint level of the users as a stimu- ality appears to exist among the writer, the lant for taking action. It is not clear that artist, and the engineer. Design engineers shorter response times mean higher user per- already make use of CAD( Computer Aided formance for all interactive tasks[6]. In Design) techniques in designing mechanical fact, Williams and Swenson[7] suggested that and electronic equipment and buildings, among system resources needed to produce short other things. The creative, unstructured response times when not required by the user task is clearly a task in which benefits can are detrimental to overall productivity; in be derived from a marriage of graphics and many cases they studied, user performance was text editing techniques. These can combine not adversely affected by longer response to give the creative mind a potentially times. However, for most text editing tasks productive working tool in the form of a it is critical that response times be short, "professional work station". Such an inter- since most text editors are designed around active work station could include techniques this assumption. to allow color graphics, split screen editing features and the ability to quickly generate 4.3 The Position Level. hard copy or to set type. The writer could then have the option of integrating text with The analysis of the productivity of a drawings and figures, thus improving the particular position is necessarily tied in quality of the final presentation. All of with the entire system, because of inter- these features are feasible now, but are only actions among the positions in an available as integrated packages on a few organization. Engel et al [8] discuss the relatively expensive systems. The difficulty design and evolution of a prototype office is, of course, in justifying these systems on automation system, which indicated a redesign the basis of productivity improvement, except of some of the lower level positions due ito in relatively specialized situations. the delegation and shifting of certain tasks However, technological evolution has in the among the users. There are many issues in past and will continue to surmount these cost the automated office environment which can obstacles cause a great deal of employee concern due to the possible elimination of office jobs. 
Not 4.2 The Global Level. all of these concerns are due to the complete automation of certain tasks, although this is Global productivity deals with the currently an important issue in the banking productivity of the entire system, including industry, for example. Productivity analysis the computer system and the human work force in such an environment cannot ignore the very using the computer. At this level the real concerns of management and workers with executive priority decisions are made on the both job security and the quality of the work acquisition of resources to support various environment system applications, based presumably on productivity improvement. However, such Some issues deal with shifting of tasks decisions may be influenced by other factors to other positions, thus either reducing the such as prestige and the ability to attract prestige of the original position or and keep high quality employees. eliminating it entirely. For example, consider a task which may be a source of Tools used for productivity analysis at tension in an automated office. Suppose a this level include operations reviews for the manager, before the introduction of study of human productivity, and some automation, has dictated formal letters or quantitative techniques for workflow memos which his or her secretary has then analysis[4]. A range of measurement and drafted and given to the manager for editing.


The final copy is then typed, given to the manager for signing, and finally mailed. With word processing and electronic mail, the manager now has the opportunity to do his or her own typing and/or editing, and finally to send the mail to its destination with no assistance required from the secretary. The ease of keying and making corrections to the copy is an enticement that many managers cannot resist. However, the fact that the new task may reduce the time spent on other tasks assigned to the manager's position must be considered. It is very simple to analyze the productivity improvement or reduction resulting from such a change. Suppose a manager wishes only to edit and mail the document which will be initially keyed in by his or her secretary. The decision would be in favor of the manager carrying out this task if

    C_m < C_s

where C_m is the cost per task carried out by the manager, and C_s is the cost per task carried out by the secretary:

    C_m = H_m x (E_m + L_m)

    C_s = H_m x E'_m + H_s x (E_s + L_s)

where the subscripts m and s refer to the manager and secretary respectively, and

    H  is the hourly rate paid,
    E  is the time to correct the edited copy,
    E' is the manager's time to mark up the printed copy, and
    L  is the corresponding mailing time.

Analyses of this type are not appropriate for the informal short notes and reminders often communicated among managers and their employees in an automated office environment. To maintain the spontaneity, informality and speed of such communications, it is important that the sender interface directly with the communication system for maximum productivity.

4.4 The Subtask Level.

At the subtask level it is necessary to look at specific models of the particular class of task in question. This allows us to begin looking at issues such as software and hardware system design. Consider the finite state model shown in Figure 2, which is a model of the subtask structure level of Figure 1. The transitions among the subtasks are shown, as well as transitions among the primitive functions making up each subtask state. For a particular task, it is possible to measure the time spent by the system in each subtask state and/or function substate and thus determine how much time was spent in more productive operations. For example, keying text in the text entry subtask state is clearly the most productive operation which may be carried out, and it is more productive if no further editing is required due to entry errors or other changes. Among the least directly productive activities are moving the cursor or telling the system to perform the next editing operation. The time spent on these operations should be minimized as much as possible by improved system design.

An important issue which also directly affects the efficiency of text editing is the design of the command language with which the user communicates to the system. This was addressed recently by Ledgard, Whiteside, Singer and Seymour [9].

In order to analyze the performance of various text editing systems, we have developed a system which allows recording and time-stamping keystrokes from user terminals. Software has been developed to perform analyses on the transitions among the various states the system may occupy while the task is being carried out. The fractions appearing in Figure 2 represent the transition probabilities from each state to its possible successor states, in this case for a sample editing task being performed with a screen editor. We can also measure the time spent and the number of keystrokes transmitted while in the various model states. The following table contains a comparison of the time and keystrokes in each of the states of the subtask model, for a text entry task and an editing task respectively. Note that these results are for only one sample task in each case, so they will not be representative of such tasks on this editor. There is a wide variability among results obtained in entering and/or editing different tasks, with another source of variability being in the user performing the task.

Figure 2. Subtask Interactions For A Sample Text Editing Task
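The transition probabilities shown on the arcs of Figure 2 can be estimated directly from a recorded sequence of subtask states. The sketch below is a minimal illustration under an assumed log format; it is not the authors' instrumentation, and the state names and sample data are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical log: the subtask state recorded at each successive user action,
# following the four-state subtask model of Figure 2.
observed_states = [
    "text_entry", "text_entry", "text_entry", "subtext_entry",
    "text_entry", "text_entry", "housekeeping", "text_entry",
]

def transition_probabilities(states):
    """Estimate zero-order Markov transition probabilities between
    successive subtask states, as plotted on the arcs of Figure 2."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    probs = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        probs[state] = {nxt: n / total for nxt, n in successors.items()}
    return probs

if __name__ == "__main__":
    for state, successors in transition_probabilities(observed_states).items():
        for nxt, p in successors.items():
            print(f"{state:>14} -> {nxt:<14} {p:.2f}")
```

The same tabulation, applied at the primitive function level, would reveal whether the zero-order Markov assumption discussed below is tenable there as well.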


Table 1. State Time And Keystroke Analysis Of Sample Text Entry And Edit Tasks

                              Text Entry Task        Edit Task
  Subtask / Function          Time(%)  Kysk(%)    Time(%)  Kysk(%)

  Text Entry
    Keyed Text                  91.8     96.5       38.7     46.0
    Edit Fns.                    2.8      1.8       10.8      6.1
    Cursor Moves                 4.3      1.1       40.8     37.2

  Subtext Entry
    Keyed Text                   0.0      0.0        1.1      0.4
    Edit Fns.                    0.0      0.0        0.0      0.0
    Cursor Moves                 0.0      0.0        4.2      7.6

  Housekeeping
    Keyed Text                   0.3      0.3        0.8      1.3
    Edit Fns.                    0.0      0.0        0.5      0.2
    Cursor Moves                 0.8      0.3        3.1      1.2

  Editor Entry/Access            0.0      0.0        0.0      0.0
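Percentages of the kind reported in Table 1 can be derived from a time-stamped keystroke log by charging each inter-keystroke interval and each keystroke to the state in effect when the key was struck. The sketch below assumes a toy log format for illustration; it is not the recording software described in the paper.

```python
from collections import defaultdict

# Hypothetical time-stamped keystroke log: (seconds_since_start, state).
# The toy state labels and timings are assumptions made for this example.
keystroke_log = [
    (0.0, "text_entry/keyed_text"),
    (0.4, "text_entry/keyed_text"),
    (1.1, "text_entry/cursor_moves"),
    (2.0, "subtext_entry/keyed_text"),
    (2.6, "text_entry/keyed_text"),
]

def state_percentages(log, task_end):
    """Apportion elapsed time and keystroke counts to the state in effect
    at each keystroke, then express both as percentages of the task totals
    (the form of the figures reported in Table 1)."""
    time_in, keys_in = defaultdict(float), defaultdict(int)
    for (t, state), (t_next, _) in zip(log, log[1:] + [(task_end, None)]):
        time_in[state] += t_next - t
        keys_in[state] += 1
    total_time = sum(time_in.values())
    total_keys = sum(keys_in.values())
    return {state: (100 * time_in[state] / total_time,
                    100 * keys_in[state] / total_keys)
            for state in time_in}

if __name__ == "__main__":
    for state, (pct_time, pct_keys) in state_percentages(keystroke_log, task_end=3.0).items():
        print(f"{state:<26} time {pct_time:5.1f}%  keystrokes {pct_keys:5.1f}%")
```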

Figures given in Table 1 are rounded, so the zeros appearing in the table do not mean that no count was observed for the related function. It should be noted that, since such a high percentage of time is spent keying text for the text entry task, there may be little to be gained by system modifications for tasks of this type. Some improvement may be made by user training in keying or in correction of errors detected during keying. For the text editing task, the user usually spends a high percentage of time moving the cursor or using editing functions. Thus the editing process might be faster if modifications were made to the editing software or in keyboard design, and these are candidates for further investigation.

As mentioned above, it is important from the point of view of productivity improvement that efforts be made to maximize the amount of time actually spent keying text in the text entry mode. This is true only if the total time spent on the task also decreases. It is possible to have a system in which very little time is spent not keying text in the text entry mode, but which is so awkward to use that the total time increases substantially. This is the case with some primitive line editors which require re-entry of an entire line in order to correct an error.

A particular use of the finite state model is in the comparison of text editing system performance. By comparing state times and keystroke counts for specific tasks, it is possible to determine which system is better for that task, and also to determine why. Hammer and Rouse [10] used a Markovian model to carry out a statistical analysis of variance comparison of text editor performance in a similar situation. Most statistical tools used for such comparisons require that the finite state model be zero-order Markovian. The four state subtask model is likely to satisfy this assumption, but an analysis at the primitive function level might not, since the transition probability from one function state to another state may depend upon the previous state occupied.

4.5 The Function Level.

If a product is to be designed to perform well in a particular environment, the most important considerations, aside from the cost, involve the needs and perceptions of the ultimate user. There are many important human factors considerations in this area, such as user satisfaction, keyboard layout, and terminal and work station design. These questions have been considered elsewhere in some detail (for a review, see reference [11]).

Other considerations of importance at this level include software design, because design improvements to the software of a text entry system may improve the user's performance substantially. This may be done by reducing the time spent in activities which are not directly productive, such as cursor movement. To determine how much time is spent using each of the primitive functions, data may be gathered on representative tasks, and finite state model analysis applied at the detailed level. In this manner it is also possible to detect sequences of functions which are performed frequently and which might be automated. If cursor movement functions are found to be satisfactory, both by function design and by the placement of function keys on the keyboard, then one can turn to the design of editing features. A flood of such special features is undesirable, since a typical user has the memory capacity to learn and use only a relatively small percentage of such editing functions. Boies [12] found that only a small number of the commands available on a text editor were used. It may be preferable to have a small number of easily learned and useful features, and to invoke any little-used features through a common subtext entry function such as a function menu. A rough guide to the emphasis which should be placed on improving a particular function should be based on the frequency with which the user is expected to make use of the function. Features which could handle any well-defined tasks automatically and thus reduce the interaction time of the user are also desirable, but these are


typically heavy users of system resources such as cpu time and system memory. Functions in this category include automatic spelling verification and context editing.

In certain positions where a text editor is used a substantial fraction of the time, user-defined functions may play a role in giving greater flexibility to a text editing system. For example, if the user can define a function which will automatically generate an appropriate letterhead, with date and salutation, this can save a great deal of time. Similar comments apply to mailing lists. In the case of computer program editors, optional line-by-line syntax checkers for the particular language being used may be effective, especially for the novice. The ability to compile and execute programs from within the editor can avoid the time spent invoking and then leaving the editing system in each cycle of debugging.

5. Conclusions.

This paper has presented a framework for the evaluation of computer user productivity, with examples drawn from text editing systems, since this is an area in which a substantial impact could be made in the near future. The intention of this conceptual framework is to provide a logical form for the analysis of computer systems at any desired level of detail. We may wish to look at the global characteristics of the system itself such as the response time and availability, or we can look at the very detailed level of keystroke input. The framework is useful in helping to avoid excursions into those areas in which the analyst may not have either interest or influence for promoting change. The tools used for productivity analysis at these levels will differ widely in approach. However, the major point is that productivity improvement can be promoted at any level in the framework. A particularly useful application of this approach is in the analysis of interactive systems, the source of the examples used in this discussion.

I especially wish to acknowledge the assistance of Mr. Rollins Turner in assembling the system used to capture terminal character data for use in these studies.

References

[1] Doherty, W.J., Thompson, C.H., and Boies, S.J., An Analysis of Interactive System Usage With Respect to Software, Linguistic and Scheduling Attributes, IBM Research Report RC 3914, 1972.

[2] Embley, D.W., and Nagy, G., Behavioral Aspects of Text Editors, Computing Surveys, V.13, 1981, pp. 33-70.

[3] Miller, L.A., and Thomas, J.C., Jr., Behavioral Issues in the Use of Interactive Systems, Int. J. Man-Machine Studies, V.9, 1977, pp. 509-536.

[4] Smith, S.A., Minimizing Processing Costs In An Automated Office System, Cybernetics, V.10, 1980, pp. 232-242.

[5] Ferrari, D., Computer Systems Performance Evaluation, Prentice-Hall, Englewood Cliffs, N.J., 1978.

[6] Shneiderman, B., Human Factors Experiments In Designing Interactive Systems, Computer, V.12, 1979, pp. 9-19.

[7] Williams, J.D., and Swenson, J.S., Functional Workload Characteristics and Computer Response Time In The Design of On-Line Systems, Proceedings, Computer Performance Evaluation Users Group Thirteenth Meeting, NBS Special Publication 500-18, U.S. Dept. of Commerce, Washington, DC, 1977, pp. 3-11.

[8] Engel, G.H., Groppuso, J., Lowenstein, R.A., and Traub, W.G., An Office Communications System, IBM Systems Journal, V.18, 1979, pp. 402-431.

[9] Ledgard, H., Whiteside, J.A., Singer, A., and Seymour, W., The Natural Language of Interactive Systems, Communications of the ACM, V.23, 1980, pp. 556-563.

[10] Hammer, J.M., and Rouse, W.B., Analysis And Modeling of Freeform Text Editing Behavior, Proc. 1979 International Conf. Cybernetics And Society, 1979.

[11] Rouse, W.B., Design of Man-Computer Interfaces For On-Line Interactive Systems, Proceedings of the IEEE, V.63, 1975, pp. 847-857.

[12] Boies, S.J., User Behaviour In An Interactive Computer System, IBM Systems Journal, V.13, 1974, pp. 1-18.

Systems Development and Maintenance

SESSION OVERVIEW

SYSTEMS DEVELOPMENT & MAINTENANCE

Phillip C. Howard

Applied Computer Research Phoenix, AZ 85068

This session will review several concepts that may be useful during the system development life cycle in both the development and maintenance stages. The papers address methods for simplifying the statement of user requirements, for improving the quality assurance function, and for using configuration management techniques to reduce maintenance costs.

The papers represent practical applications of several software engineering concepts which contribute to the improvement in both software quality and productivity. The specific techniques discussed in the three papers apply to different stages in the life cycle and encompass the statement of user requirements, the application of configuration management techniques, and the role of quality assurance.


DESIGN OF INFORMATION SYSTEMS USING SCENARIO-DRIVEN TECHNIQUES

W. T. Hardgrave S. B. Salazar E. J. Seller, III

Institute for Computer Sciences and Technology National Bureau of Standards Washington, DC 20234

This paper describes a technique for developing information systems using a scenario-driven design approach. The approach emphasizes client (that is, the user who is purchasing the system) participation in the design process. The first step is to develop a collection of "scenarios" which document the interaction between the computer and the human user. Using the scenarios, information-flow diagrams and database designs may be constructed. After the client has approved these documents, they can be used to establish disk capacity requirements and transaction rates, and finally to specify all hardware and software requirements. The primary advantage of this approach is that the scenarios provide a good indication of the ultimate usefulness and cost of the system. The client can review these documents and approve, modify, or reject the system design before any software is generated. This paper describes the scenarios, the information flow technique, and the database design approach using, as an example, a small business application.

Key Words: Database design; data dictionary; design; flowchart; information flow; information systems; interactive systems; requirements; scenarios.

1. Introduction system and the human user working at a term- inal. A scenario is essentially a hard-copy Designing an interactive information of an interactive session. This level of system requires specialized techniques that detail is seldom available until after the are unnecessary in the development of other system is implemented and capable of writing types of computer systems. Human factors output to a terminal. One goal of the issues are involved, since the user community scenario-driven design approach is to capture typically does not consist of computer spe- this level of detail before implementation. cialists. Interactions between humans and computers must be clearly defined. The data- The collection of scenarios is a com- base must be described independently of any plete description of every screen format that particular programs. Standard flowchart the terminal operator may possibly encounter. practices must be modified to capture the The data appearing in the scenarios should be data flow through an interactive system. typical of data that may actually be used in the application. Thus, this collection of This paper describes a technique for scenarios completely and "by example" de- developing a logical design for an interactive scribes the user interface to the information information system. The product of this de- system. The scenarios are discussed further sign process is a document having these com- in Section 2. ponents : The information flow diagrams describe * Collection of scenarios the movement of data through the system. * Information flow diagrams They show where data enters the system, where A Datahase design it exits, where it is stored, and where it is displayed for the human user. While standard Step-by-step approaches to the development of flowcharts have had widespread usage, the the components, which are identified below, application of flow diagrams to interactive will be presented in subsequent sections. systems requires a somewhat different approach. A particular format and some The term "scenario" is defined for the stringent conventions are discussed in purpose of this paper as a detailed documen- Section 3. tation of the interaction between a computer
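Looking ahead to the menu-based scenarios developed in Section 2, the screens lend themselves to a simple tabular encoding in which each menu lists its numbered options and the menu each option leads to. The sketch below is illustrative only: the dictionary encoding and the replay function are assumptions introduced here, not part of the design technique itself, and the menu entries are abridged from the sample screens in Section 2.4.

```python
# Each menu maps a keyed selection number to (option text, next menu number);
# the "page" column of the design document is omitted here. Entries are
# abridged from the sample screens; the encoding itself is an assumption.
menus = {
    "0.0": {1: ("Ordering", "1.0"), 2: ("Receiving", "2.0")},
    "1.0": {1: ("Order for customer", "1.1"), 2: ("Order for inventory", "1.2")},
    "1.1": {1: ("New order", "1.1.1"), 2: ("Add to order", "1.1.2")},
}

def replay(scenario, start="0.0"):
    """Walk a scenario (a list of keyed selections) through the menu tree,
    printing the sequence of screens a terminal user would see."""
    current = start
    for choice in scenario:
        option, nxt = menus[current][choice]
        print(f"Menu {current}: select {choice} ({option}) -> Menu {nxt}")
        current = nxt
    return current

replay([1, 1, 2])   # Ordering -> Order for customer -> Add to order
```

A tabular representation of this kind also makes it easy to check, mechanically, that every "next" field in the scenario collection refers to a menu that actually exists.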


The database design is a complete de- There are some other benefits that come with scription of the data that is to be held and the development of detailed scenarios: permanently maintained by the system. One expects this data to have a long life span; * The scenarios may be included in the certainly it will be longer than the execu- User's Manual as tutorial material. tion time of any programs that manipulate it. Therefore, it is necessary to describe the * The scenarios may be incorporated into data apart from any particular program or a test plan to determine system subroutine. The database design technique, acceptance. discussed in section 4, is an application and extension of the "entity-relationship * The scenarios may be used as the basis model" [1] for the contract between the client and the systems designer or software The scenario-driven technique is cur- implementer rently being designed as part of a project to automate the operation of a wholesale book The following subsections will discuss the dealer. The examples used throughout this development of screen-based scenarios and paper are taken from that applieatdon. the menu format. Examples of menus from the wholesale book system are included. There is one significant advantage of this method. There will be detailed commun- 2.2 Development ication between the system designer and the client (the user who has. contracted to pur- In a screen-based scenario design, all chase the system) during the design stage, scenarios will have in common the top level as the client will review the scenarios many screen, that is, the first screen that the times during their development. Thus, when user views. The master menu gives an over- the report on the logical design is avail- view of the aspects of the organization able, the client is already familiar with addressed by the information system, thereby the system and can be confident that it meets defining the scope of the system. The de- expected needs. There is no mystery and signer and the client must work on the master little chance that the client will receive a menu until an agreement is reached and the system substantially different than envi- menu satisfies both parties. During this sioned . process, the system designer and the client must learn each other's terminology and 2. Scenarios develop a common basis for further work.

2.1 Definition After some agreement is reached on the nature of the master menu, work may proceed A scenario is a detailed documentation on the screens at the next level of detail. of the interaction between a computer system There is no established format for the sce!n- and the human user working at a terminal. arios; the screens may describe menus, user Scenarios may be written to mimic the inter- commands, or any Interaction that is accept- action on a line-by-line basis; a menu-by- able to both the client and the designer. menu basis, a screen-by-screen basis, or any This process of defining progressively lower other basis that can reasonably be put into levels of the system continues until all the design document. For the book wholesaler possible interactions have been completely system, scenarios are written on a screen-by- specified. screen basis, that is, each complete computer- human interaction (such as "ordering", "re- There are problems involved In the man- ceiving") is documented In terms of the agement of this documentation. An exhaustive screens sequentially viewed by the user. Each enumeration of all possible scenarios will screen described here consists of a menu or a result in a large number of screens for even menu plus system prompts. a small system. As menus at different levels are redesigned, it becomes tedious to main- Most Importantly, the scenario Is a tool tain consistency and verify correctness of to facilitate communication between the client the interfaces. and the system designer. That is, the scen- ario is a very simple visual Image that de- The notation used in the scenarios is scribes one part of system behavior. A not a metalanguage. The data requested from comprehensive collection of scenarios can the user or printed for the user should be describe system behavior in enough detail to typical data from actual situations. This convince the client that the system will avoids any formatting problems and similar perform as envisioned. misunderstandings between the client and the systems designer.


2.3 Menu Format

The menu format is illustrated in Figures 2-1 through 2-4. Each menu (and its corresponding screen) is numbered to indicate its level, starting with the master menu as 0.0. A menu has four columns with the following headings:

* Number

* Option

* Next

* Page

Each entry in the menu represents a possible selection by the user. The "option" is the textual statement of the function that is displayed for the terminal operator. The "number" signifies the keystroke to select the corresponding option.

The "next" and "page" fields do not actually appear on the screen when the system is implemented. However, they are necessary for the writer and the reader of the logical design document in order to develop and follow the sequence of menu displays. The "next" field indicates which menu appears next if the corresponding number is selected. The "page" field gives the actual page number in the logical design document. Both of these fields are useful, albeit redundant. The page field is useful to the reader of the final design document, and the next field is useful during the development cycle, when menu numbers are available but page numbers are not. (A small navigation sketch based on these fields follows Figure 2-4.)

2.4 Sample Screens and Menus

The sample screens on the following pages describe part of the "ordering" process for the book wholesaler system. The level 0.0 menu is the first menu that a terminal user would see after starting the system. "Ordering" is selected by typing "1" in response to the system's prompt "Select one:".

The "Next" field of the "ordering" entry indicates that the next menu to appear will be the 1.0 menu. After it appears, as shown in Figure 2-2, the terminal user may choose to order for a customer or order something to be stored in the inventory. In this case, a "1" is typed to signify an order for a customer.

As indicated by the "Next" field of the first entry in the 1.0 menu, the next menu to appear will be the 1.1 menu. In menu 1.1, the terminal user may choose to create a new order or to add to an existing order. In this case, the user chooses to add to an existing order by selecting "2". The system then prompts the terminal user for the "Customer Order Group Number", abbreviated COG#. The system prints the customer identifier and the shipping address.

According to the menu of the next screen, the 1.1.2 menu, the terminal user may select to search for the title based on one of several possible inputs. In this case, the terminal user selects a search on ISBN. The system prompts for the ISBN, the International Standard Book Number, which is the unique identifier for books. After the ISBN has been entered, the system adds the book to the order group and prints a message to that effect as shown. As indicated, the screen would refresh itself to allow the operator to order another title for this customer order.

Menu 0.0
No.  Option                       Next  Page
1    Ordering                     1.0      2
2    Receiving                    2.0     40
3    Inquiry/Update               3.0    106
4    Picking List/Invoice/PO      4.0    147
5    Accounting functions         5.0    166
6    End of day                   6.0    183
7    Return to Operating System
Select one: 1

Figure 2-1 Screen 0.0

Menu 1.0
No.  Option                       Next  Page
1    Order for customer           1.1      3
2    Order for inventory          1.2     26
Select one: 1

Figure 2-2 Screen 1.0

Menu 1.1
No.  Option                       Next   Page
1    New order                    1.1.1     4
2    Add to order                 1.1.2    10
Select one: 2
Enter COG#: 53296

Customer Id. is: NLM
Shipping Address is: National Library of Medicine
                     9600 Rockville Pike
                     Bethesda, MD 20014

Figure 2-3 Screen 1.1

Menu 1.1.2
No.  Option                       Next     Page
1    Search on alt. key           1.1.2.1    30
2    Search on ISBN               1.1.2      10
3    Search on series             1.1.2.3    35
4    Enter as new title           1.1.2.4    37
5    End order                    0.0         1
Select one: 2
Enter ISBN: 0-13-854547

An order for Structured System Design by Gane and Sarson has been added to COG# 53296 for National Library of Medicine.

Figure 2-4 Screen 1.1.2
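The menu table of Section 2.3 amounts to a small navigation structure: the "number" selects an option and the "next" field decides which menu appears next. The sketch below is not part of the original design document; the menu numbers and option texts are taken from Figures 2-1 through 2-4, while the data structure and the function names are illustrative assumptions only (the "page" field is omitted because it refers to the design document, not the running system).

    # Illustrative sketch: a menu table in the Number/Option/Next form of
    # Section 2.3, and a loop that uses the "next" field to pick the menu
    # that follows each selection.
    MENUS = {
        "0.0": [("1", "Ordering", "1.0"),
                ("2", "Receiving", "2.0"),
                ("7", "Return to Operating System", None)],
        "1.0": [("1", "Order for customer", "1.1"),
                ("2", "Order for inventory", "1.2")],
        "1.1": [("1", "New order", "1.1.1"),
                ("2", "Add to order", "1.1.2")],
    }

    def show(menu_id):
        """Display one menu in the screen format of Figures 2-1 to 2-4."""
        print(f"Menu {menu_id}")
        print("No.  Option                        Next")
        for number, option, nxt in MENUS[menu_id]:
            print(f"{number:<4} {option:<29} {nxt or '-'}")

    def navigate(start="0.0"):
        """Walk the menu hierarchy; stop when a selection leads outside this sketch."""
        current = start
        while current in MENUS:
            show(current)
            choice = input("Select one: ").strip()
            selected = {num: nxt for num, _, nxt in MENUS[current]}
            if choice not in selected:
                continue                # unknown keystroke: redisplay the same menu
            current = selected[choice]  # the "next" field decides what appears next

    if __name__ == "__main__":
        navigate()

Because the scenario screens and the navigation table carry the same information, a reviewer who approves the scenarios has, in effect, approved the menu hierarchy that this kind of table would implement.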

3. Flow Diagrams

3.1 Characteristics

Flow charts have been used for many years to describe the flow of computer programs and data. Although the diagrams used here are similar to traditional flow charts, some stringent rules for their creation and use have been imposed. These restrictions, discussed below, make the diagrams easier to read and more useful as a management tool.

As depicted in Figure 3-1, the format for the flow diagram exhibits the following characteristics:

* All flow is from left to right.

* The triggering event is the first box on the left.

* The process ends with the rightmost box (usually labeled "end").

* The left margin is partitioned by "roles"; that is, the organizational (or external) entities are listed down the left margin. One role that is usually included is "database". In this way, processes that store data in or retrieve data from the database may be documented.

* The boxes have various shapes that are assigned meanings to fit the application. This example uses trapezoidal, square, and cylindrical symbols; the meanings are explained below.

The diagram shown in Figure 3-1 is short and fits on one page; most diagrams will require several pages.

There are several advantages in using this kind of flow diagram:

* The triggering events can easily be spotted by looking on the leftmost part of the diagram.

* The involvement of a particular role can be isolated by looking along the appropriate horizontal strip.

* Time may be measured horizontally. Time intervals after triggering can be set up along the horizontal axis and charting symbols placed in the appropriate zones.

* Parallel charts may also be created. For example, in a manufacturing environment, a materials flow diagram can be set up parallel to the information flow diagram. The materials flow documents the flow of parts, subassemblies, etc.; the information flow documents the paperwork.

However, since a flow diagram should be generated for each possible flow sequence, there may be problems with the management of this volume of documentation.

3.2 Sample Flow Diagram

The sample flow diagram shown in Figure 3-1 depicts the flow for the part of the ordering process discussed previously. The trapezoidal boxes represent the display of a menu; the menu number is given inside the box. In cases where there is a square box underneath the trapezoid, the menu interaction requires input of a data-item by the terminal user. The name of the data-item is contained in the box. The freestanding square boxes represent processes that the system performs. If the process stores or retrieves data from the databases, this is signified by a line from the process to the appropriate database. Herein, the term "database" is used very loosely and could, in this case, be used interchangeably with the term "file". The various databases are represented by the cylindrical symbols.

In Figure 3-1, the flow begins on the left when the customer prepares the order. The order then goes to the ordering department. The terminal operator calls the system into operation and the level 0.0 menu is displayed. This is signified by the trapezoid containing 0.0. The terminal operator selects option 1; this is signified by the "1" above the trapezoid. Then menu 1.0 is displayed. The terminal operator selects option 1; again this is signified by the "1" above the trapezoid. Then menu 1.1 is displayed; the terminal operator selects option 2. The box below the trapezoid indicates that the terminal operator must enter the "Customer Order Group Number", abbreviated COG#.


"Customer Order File", abbreviated CO. Also, entity the system gets the "Customer Identifier", abbreviated C#, from information contained in * Defining non-key attributes for each entity the COG. Finally, the system retrieves the "Shipping Address", abb reviated SA and dis- * Enumerating relationships plays it along with the customer number. * Determining keys for the relationships The next menu to be displayed is menu 1.1.2. The terminal operator chooses to * Defining non-key attributes for the rela- search on ISBN. The system requests the ISBN tionships and the operator enters it. The system then searches the title file to ensure that the * Determining which relationships may also ISBN is valid. If the ISBN is not valid, have the dual role of entities then other menus not shown here are invoked. Finally, the title is added to the COG and * Assigning dual keys to dual entity/rela- the process ends. tionships

ROLES PROCESS: ADD-TO-ORDER

CUSTOMER PREPARE ORDER

ADD HUE ORDERING TO DEPT. CO FILE

RECEIVING DEPT.

DATABASE

Figure 3.1: Sample Flow Diagram

4. Database Design 4.1 Entities

The database design is, roughly speak- First, the designer must enumerate the ing, the format of the various data files entities. An entity is a concept that the that are to be stored permanently (usually designer wants the information system to recognize and manipulate. For in on a disk) . The actual values in the files example, may change as different data enters and an inventory system, a part would be an leaves the system. We will refine this entity. For some applications, there are concept as we proceed. many possible choices for entities and de- cisions on tradeoffs may be required. The Our approach to database design con- discussion herein will center on the Whole- sists of several steps as described below. sale Books example rather than dealing with While it is similar to the entity-relation- generalities ship approach described by Chen [1], the technique is specified in more detail and In the wholesale book application, the may differ somewhat from his philosophy. basic entities are:

The process consists of: * Customers

* Enumerating entities * Publishers

* Determining the key attribute of each * Titles


Sample record diagrams giving the most important attributes of each entity are shown in Figure 4-1. In the actual database, more attributes are necessary; some are omitted for this discussion to keep it manageable.

Each entity is uniquely identifiable by a single attribute called the key (denoted by "*" in the figures). For example, a person may be uniquely identifiable by his/her social security number, or a part may be uniquely identifiable by the part number.

TITLES:     ISBN*  Title  Author  Cost  Price
CUSTOMER:   CID*  C-Name  C-Address
PUBLISHER:  PID*  P-Name  P-Address

Figure 4-1 Entities

4.2 Relationships

Next, the designer must enumerate the relationships, that is, the associations among entities. The attributes of a relationship should describe the relationship, not the individual entities, and therefore must include the keys, but no other attributes, of the entities involved.

Figure 4-2 shows some typical relationships; again, these are modified from the actual application for simplicity. However, they are quite similar to an early design before the complexity required by the client was introduced. The item-order relationship relates customers to titles. A customer (CID) orders a title (ISBN) in some quantity (QTY). The title/publisher relationship relates titles to publishers. A publisher (PID) publishes a title (ISBN).

ITEM-ORDERS:      CID*  ISBN*  Qty
TITLE/PUBLISHER:  ISBN*  PID*

Figure 4-2 Relationships

Our example is somewhat more complex. The item-order relationship needs to be grouped to reflect which books appear on an incoming order. Also, the item orders need to be grouped to reflect which items are included on a single order to the publisher. Therefore, the item-order relationship needs to be referenced as an entity in other relationships. This is accomplished by adding an attribute, Customer Order Group (COG#), which becomes the key of the augmented item-order dual entity/relationship (as shown in Figure 4-3, with the key denoted by "**").

ITEM-ORDERS:  COG#**  CID*  ISBN*  QTY

Figure 4-3 Dual Entity/Relationship

4.3 Data Dictionary

The purpose of the data dictionary is to provide a written description of each attribute that is referenced in the database (in both entities and relationships). The data dictionary can be used as a basis for communication among the various people working on the questions that arise during the design process concerning the purpose and meaning of various data-items. Whenever the database is reorganized, the data dictionary is updated to reflect the new organization. Thus, the data dictionary is an important tool for managing and controlling the system design process.

Although the creation and maintenance of a dictionary can be an expensive undertaking, it is possible to scale the complexity to fit the application. For example, the dictionary may contain many attributes for each data-item or as few as two; it may be managed manually or with a computerized system.

Because this approach is intended for a small organization with limited resources, the data dictionary described here is as simple as possible and may be managed manually at a small cost. As depicted in Figure 4-4, the dictionary is a table with two columns: data-item and narrative.


Each entry in the table describes one data-item, or attribute, that is defined in the system. The column labeled "data-item" contains the names of the data-items, while the column labeled "narrative" contains descriptions of the data-items meaningful to the user. The usefulness of the data dictionary (as well as its complexity and cost) could be increased by adding more information to each entry. For example, a column "entity" could be added to keep track of the entities in which each data-item appears.

References

[1] Chen, Peter, "The Entity-Relationship Model -- Toward a Unified View of Data", ACM Transactions on Database Systems, Vol. 1, No. 1, March 1976, pp. 9-36.

[2] Gane, Chris, and Sarson, Trish, Structured Systems Analysis: Tools and Techniques, Prentice-Hall, Englewood Cliffs, NJ, 1979, 241 p.

DATA DICTIONARY

DATA-ITEM NARRATIVE

ISBN        International Standard Book Number
Title       Title of book
Author      Author of book
Cost        Publisher's price for book
Price       Our normal price for this book
CID         Unique identifier for customer
C-Name      Name of customer
C-Address   Address of customer
PID         Unique identifier for publisher
P-Name      Name of publisher
P-Address   Address of publisher
Qty         Quantity: number of this ISBN ordered by this CID
COG#        Customer Order Group Number: unique identifier for purchase order

Figure 4-4 Data Dictionary Table
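The entities of Figure 4-1, the relationships of Figures 4-2 and 4-3, and a two-column dictionary in the spirit of Figure 4-4 can be restated directly as record definitions. The sketch below is only an illustration of that restatement, not part of the paper's design: the record and field names follow the figures, while the helper function and the particular dictionary entries shown are assumptions.

    # Minimal sketch of the Figure 4-1 entities, the Figure 4-2 relationships,
    # and the Figure 4-3 dual entity/relationship.  Keys marked "*" (and "**")
    # in the figures are noted in comments.
    from dataclasses import dataclass

    @dataclass
    class Title:                 # Figure 4-1, key: ISBN*
        isbn: str
        title: str
        author: str
        cost: float              # publisher's price for the book
        price: float             # our normal price for the book

    @dataclass
    class Customer:              # Figure 4-1, key: CID*
        cid: str
        c_name: str
        c_address: str

    @dataclass
    class Publisher:             # Figure 4-1, key: PID*
        pid: str
        p_name: str
        p_address: str

    @dataclass
    class TitlePublisher:        # Figure 4-2 relationship, keys: ISBN*, PID*
        isbn: str
        pid: str

    @dataclass
    class ItemOrder:             # Figure 4-3 dual entity/relationship, key: COG#**
        cog: str                 # Customer Order Group Number
        cid: str                 # key of the Customer entity
        isbn: str                # key of the Title entity
        qty: int

    # A two-column dictionary in the spirit of Figure 4-4 (entries abridged).
    DATA_DICTIONARY = {
        "ISBN": "International Standard Book Number",
        "COG#": "Customer Order Group Number: unique identifier for purchase order",
        "Qty":  "Number of this ISBN ordered by this CID",
    }

    def describe(data_item: str) -> str:
        """Look up the narrative for a data-item, as a reviewer would in Figure 4-4."""
        return DATA_DICTIONARY.get(data_item, "not yet documented")

Note that giving ITEM-ORDERS its own key (COG#) is exactly what allows it to be referenced as an entity in other relationships, as described in Section 4.2.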

5. Concluding Remarks

The information system design technique presented in this paper is based on a sur- prisingly simple but rarely-used idea that the systems analyst build a mock-up of the system for review by the client before implementation begins. The method of constructing the mock-up, described herein, is the development of a collection of "scenarios," the possible interactions between the computerized information system and the human user. The client and the system designer take part in a number of iterations of reviewing and revising scenarios, while the information flow and the database design proceed in parallel. Approval of a collection of scenarios by the client is the first milestone in the development of the system.

CONTAINING THE COST OF SOFTWARE MAINTENANCE TESTING - AN EXAMINATION OF THE QUALITY ASSURANCE ROLE

WITHIN U.S. ARMY COMPUTER SYSTEMS COMMAND

MAJ Steven R. Edwards

Quality Assurance Directorate, US Army Computer Systems Command, Ft. Belvoir, VA 22310

This paper provides a brief introduction to the US Army Computer Systems Command (USACSC), including its various elements and their respective functions. The previous (prior to 1 Sep 80) requirements for software maintenance testing within the Command are examined, with the greatest emphasis being placed on the specific allocation of the Quality Assurance Directorate's resources (manpower and computer) necessary to fulfill these requirements. Key points in this portion include an explanation of Developmental Center Testing (DCT), Environmental Testing (ENT) as an independent, third-party test, and Field Validation Testing (FVT). The evolution of third-party testing within the Command is also discussed. Major points envisioned in the revised testing procedures which the Command and QAD are implementing are detailed, including a discussion of the advantages and disadvantages of independent testing as perceived by various members of the academic community as well as by the individuals directly involved with this independent testing within USACSC. Conclusions reached as a result of weighing these advantages and disadvantages and their ultimate impact on the revised testing procedures are also examined. The underlying point is the anticipated increase in effectiveness of the Quality Assurance program obtained from the readjustment of resources, the gradual phasing out of the Environmental Test, and the subsequent increase in "up front" QAD involvement. Concluding comments encompass an overview of the methodology being developed to gather and evaluate feedback data on the success and/or shortcomings of the revised procedures. This feedback data includes reports of problems in fielded systems, especially immediately subsequent to the release of maintenance changes to this software, as well as requested changes to fielded systems.

Key words: Developmental Center Test, Environmental Test, Field Validation Test, Independent Testing, Quality Assurance Program, Software Change Package, Software Maintenance Testing.

1. Introduction

This paper examines the issues of software maintenance testing as a part of the quality assurance mission within the United States Army Computer Systems Command (USACSC). Such an examination is particularly relevant at this time, since a significant change in the procedures is in progress. Thus, it is logical to document the software maintenance testing requirements within USACSC prior to September 1, 1980, discuss the perceived need for change in these procedures, and then detail the revisions underway. Since maintenance testing is an integral part of the overall quality assurance effort within the Command, any change in this part necessitates an in-depth review of the role of the quality assurance element in general. This consideration is also addressed in the subsequent discussion.

for In order to provide the desired setting AS«|ILtFM| AUTOMATION for a discussion of the specific topics and described above, a brief overview of USACSC COMMUNICATIONS is apropros. This introduction to the command focuses on its mission, organiza- tion, and functions in the role of a central design agency responsible for developing automated systems for the Army. The mission DCSlOt of the Command is shown in figure 1. OCSPER SYSTEM DEVEIOPMENT, CO* COMPUTER INTEORDTiail t TESTMQ DCSROA REOUIIEMENTS SYSTEMS DCSOPS COMMAND SrSTEMS FOR THE FIEID USACSC MISSION T«EO OCC USER • SERVE AS THE PRINCIPAL ARMY DEVELOPER OF D«M liUUI COMMANDS MULTICOMMAND ADP SYSTEMS.

• DESIGN. TEST, INSTALL AND MAINTAIN ADP SYSTEMS.

• DIRECT THE ARMY ADP STANDARDIZATION Figure 2. Army Interactions PROGRAMS. With USACSC

• PROVIDE TECHNICAL ASSISTANCE TO HQDA STAFF AGENCIES AND MAJOR ARMY COMMANDS. provides technical support to user commands, both in the US and overseas. The authorized RESEARCH PROGRAMS. • CONDUCT SOFTWARE strength of the Command includes 382 mili- tary and 1102 civilians scattered through- out nine geographic locations. Systems interfaces are a major concern of the command. Figure 3 shows a typical array of Figure 1. USACSC Mission interfaces for one of the standard systems. Documentation and configuration management of interface specifications are extremely Multicommand ADP systems are in support of difficult. An element that makes this functional applications which are similar control even more complex is the fact that in two or more major Army Commands. different systems developers may be involv- Figure 2 shows how the various elements ed. Hence, a developer could easily make ' within the Army interact with USACSC in changes in specifications to one system support of this mission. The Assistant without anticipating the impact these Chief of Staff for Automation and changes could have on other systems. To Communications is responsible for establish- preclude problems in this area, interface ing overall objectives, policies, and control is exercised at Command level. procedures for Army Staff direction and This centralization within a single guidance for the design, development, central design agency permits the coordina- installation, and maintenance of Army tion and management of interface require- management information systems. (A ments between systems controlled by differ- planned reorganization within the Army ent proponent agencies. Staff will alter this element; nevertheless, USACSC will continue to receive Army This paper focuses on the Command's Staff guidance and direction.) Quality Assurance Directorate (QAD) and its interactions with three of the Command's Within their respective areas of software developers: Logistic Systems responsibility, the Army Staff agencies Directorate, Personnel and Force Accounting formulate statements of requirements for Systems Directorate and Financial Systems new systems, or changes to existing systems, Directorate. These Assigned Systems which are then forwarded to USACSC. As Developers (ASD) design, test, install, the principal Army developer for Standard and maintain over forty standard systems Army Multicommand Management Information in response to Department of the Army Systems (STAMMIS), USACSC receives the Proponent Agencies (PA) approved require- approved requirements and undertakes systems ments. The STAMMIS are operational at design or change. Additionally, USACSC


tenance testing is composed of three levels STANFINS INTERFACE IHSOPS SOPPIT of evaluation conducted by the system devel- WITH OTHER SYSTEMS oper, followed by an independent third party test conducted by the Quality Assurance Directorate. At the conclusion of these smis « SDPCIIS tests, the systems change package is subject- ed to its final test at a field site where

INIEtFUND iNunruNii it is evaluated by representatives of the iniiHG inilNG STANFiNS major Army command user community (with Quality Assurance, Proponent Agency, and >SIC 1E«M DP System Developer personnel on site). 368( IKCOMI These tests are further defined in USACSC i i L regulations as follows [1]

Developmental Center Test Level I SIIRtlPS IFS (DCT I) - A test which occurs after design, coding, desk checking, and successful compilation of a new or changed program has Figure 3. STANFINS Interface occurred and which insures that the program With Other Systems operates efficiently. Primary objectives of the DCT I are: approximately 150 Army Data Processing - Identify and correct all program Installations world-wide. Some of the errors functional requirements that are satisfied by the standard systems developed by - Exercise program decision and logic USACSC are portrayed in figure 4. flow affected by change to demonstrate technical correctness of program.

- Exercise program functions affected FINANCE PERSONNEL LOGISTICS by change to demonstrate functional correct- ness of the program. INSTALLATION ACCOUNTING STRENGTH ACCOUNTING: SUPPLY MANAGEMENT ACTIVE ARMY INVENTORY ACCOUNTING MAINTENANCE REPORTING RESERVES - Assure compliance with programing CIVILIAN PAY RETIRED COMMISSARIES and documentation standards. CIVILIAN DATA INQUIRY PORT OPERATIONS FORCE DEVELOPMENT The DCT I is normally performed by the RSD BUDGETARY CONTROL INSTALLATION SUPPORT programer who conducts the test utilizing MILITARY POLICE AMMUNITION developer checklists (no formal test plan). The programer examines the developers existing test data base and prepares additional test data as necessary to test Figure 4. Functional all data types which have been affected Systems Operational by change. At the conclusion of the test, the programer considers test data for 2. Software Maintenance Testing permanent inclusion in the developers data base and approves the program for the next Having provided this broad description level of testing in accordance with the of USACSC, it is now appropriate to delve developers internal procedure. more deeply into its software maintenance testing. Maintenance testing is the test- Developmental Center Test Level II ing of approved software changes which (DCT II) - A test of the individual programs result from systems enhancements proposed which have completed DCT I, grouped in their by the PA and Army activities currently appropriate jobs/cycles. Primary objectives processing the system. The assigned devel- of the DCT II include: oper, in conjunction with the proponent, programs the necessary changes and assembles - Identify and correct program inter- the altered software into a systems change face errors. package (SCP] . Formal test condition requirements, which specify the requirements - Exercise all variations of job flow necessary to test the change, are prepared affected by change to demonstrate technical by the proponent for functional changes and correctness of the job. the developer for technical changes. Main-


- Exercise all functions affected by change. change to demonstrate functional correct- ness of the job. - Evaluate the accuracy and complete- ness of the computer operations manual. The DCT II is normally a joint respon- sibility of programers and their super- The Quality Assurance Directorate is visors. A Test Director may be appointed responsible for developing the ENT plan if the developer desires. Formal test and appointing the ENT director. Upon plans are not required, but detailed check- completion of ENT a formal recommendation list outlining performance standards and of release to the next stage of testing is I/O requirements are strongly recommended. signed by the Test Director. The ENT is The developer maintained data base will be covered in more detail in subsequent supplemented as necessary to meet test discussion. objectives. At the conclusion of DCT II the applicable programer will insure that Field Validation Test - A test actual output agrees with predetermined conducted at an operational unit with results and that the program interfaced representatives of the user community, properly within the job. proponent, developer, and quality assurance on site. The primary objectives of the Developmental Center Test Level III FVT are to: (DCT III) - A systems test which exercises multiple jobs interfacing in the total - Demonstrate to participants that the system. The objectives of the DCT III are: software change satisfies performance requirements of the change. - Demonstrate to the proponent agency and the developer that all change require- - Evaluate functional and ADP ments have been met. documentation for accuracy and ease of understanding. - Demonstrate that all cycles of the system affected by change will execute Installation of the SCP and subsequent and are technically adequate. execution of the cycles/ jobs are to be under normal production environment and are - Demonstrate that all outputs affected the responsibility of the host unit. There- by change are correct and functionally fore, the FVT is the only test conducted in adequate a true production environment. At the conclusion of the FVT a formal memorandum The DCT III is conducted by the developer, of agreement is prepared and the change is may be monitored by QAD, and is validated approved or rejected for release to all by the proponent agency. A formal test field users. If the system has multiple , plan is required. The developers test version (e.g., different operating systems) data base will be examined by the proponent a supplemental site is chosen to run the agency and supplemented as required to change for a specified time prior to thoroughly test changes to the system. At release. (There are also emergency change the conclusion of the test, a formal report packages and urgent change packages but is prepared and a signed acceptance is the time criticality of these changes required prior to release to the next stage dictates an abbreviated testing cycle). of testing. Since the primary focus of this paper Environmental Test - A third-party, is on the Quality Assurance Directorate's technical test of SCP in the form they are role in the maintenance testing cycle, a to be shipped to customers. The test more detailed look at the resources and evaluates machine and human interface by methodology employed in the ENT is using the computer operations manual, pertinent. 
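Read as a sequence, the testing levels defined above impose an ordering on a systems change package: it may not be released to all field users until each stage in turn has been completed and signed off. The sketch below is only an illustration of that ordering; the stage names come from the regulation quoted above, while the data structure, class name, and package identifier are assumptions made for illustration.

    # Illustrative sketch of the maintenance-testing sequence for a systems
    # change package (SCP).  Stage names are from the text; everything else
    # is assumed for the sake of the example.
    STAGES = ["DCT I", "DCT II", "DCT III", "ENT", "FVT"]

    class ChangePackage:
        def __init__(self, name):
            self.name = name
            self.passed = []                  # stages completed, in order

        def record_pass(self, stage):
            expected = STAGES[len(self.passed)]
            if stage != expected:
                raise ValueError(f"{stage} attempted before {expected}")
            self.passed.append(stage)

        def releasable(self):
            """An SCP may be shipped to all field users only after every stage."""
            return self.passed == STAGES

    scp = ChangePackage("SCP-EXAMPLE")        # hypothetical package name
    for stage in STAGES:
        scp.record_pass(stage)
    print(scp.releasable())                   # True once all five stages are signed off

Under the revised procedures described later in this paper, the ENT objectives are folded into a DCT III conducted on target hardware for selected systems, so the list of stages is itself subject to change.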
In order to maintain strict target hardware, representative operator third-party independence, a significant skills, and live data bases captured from amount of duplication has necessarily field installations. The primary objectives evolved. For example, separate test data of the ENT are to: bases for each system are procured and maintained by the QA Directorate. - Insure all jobs affected by change Additionally, system unique card decks, will execute to end of job. documentation, master files, -etc. must be obtained and managed to conduct ENT. The - Successfully load program and JCL magnitude of these requirements, coupled changes and execute other implementing with the personnel resource constraints jobs necessary to install the systems imposed on the ENT function, dictates that

96 the individual testers become generalists, dent test activities (or individuals) i.e., capable of interpreting systems associated with each system or one central documentation and overseeing the job/ test activity? At what stage in the devel- cycle execution (much as a system analyst opmental cycle should the independent test at a data processing installation), but for activity erter the process; that is, should the most part unfamiliar with the specific they insist on examining the change package functional output of the system. Thus, the at the end of the "assembly line" much as ENT is a technical checkout which insures a quality assurance worker in a factory? that the various cycles will execute without Or, would it be better to test early in abnormal termination. (This check-out, of the programing process thereby detecting course, is limited by the constraints of fatal flaws, but negating the effect of a the data base utilized). To maintain the final independent test before final integrity of the ENT, the Quality Assurance release? There is considerable academic Directorate must maintain accountability material available on the subject. One of the systems change package from the theory on the desirability' of a test beginning of the ENT until final release similar to ENT is that this type test is to all users. This entails a quality one of two extremes. At one end of the assurance representative delivering the spectrum (the ENT is at this end) is the package to the field test, remaining on-site person who treats the software as a "black with the developer's representative to box" and attempts to provide the majority monitor any necessary "fixes", and finally of possible inputs, verifying that the performing the necessary actions to cause programs meet the specifications. At the the creation of the shipment tapes and opposite end is the individual who studies documentation for each user. These steps the program logic and, for example, tries are required from a quality assurance to design test data for each potential point of view, but they create considerable branch in the logic. Both methods are overhead beyond that which would be required held to be too extreme and, in reality, to only perform an independent test at a impossible to fully accomplish [2]. But given point in the change package life this theory deals with the broad arena of cycle. software development and testing. USACSC, specifically QAD, in recognition of the 3. Perceived Need to Change Procedures fact that perhaps the "text-book" solutions were not being utilized, commissioned The EOT, as described above, was several studies of the quality assurance established as a testing requirement in program. One of these studies concluded 1974, almost certainly because of critical that the Environmental Test should be problems with change packages after their combined with the DCT III to be conducted installation by Army users. The ENT evolved as the latter is now. This report suggests to insure that change packages would that "independent third party testing is install and run without abnormal termination. only a good idea if it is an oversight Although this may seem to be a basic activity. In gaining the independence of criteria for developer testing, it was a third party test, so much expertise is evidently not met frequently enough which lost from the proponent and development caused the creation of the ENT function. 
groups that the results are bias free but (There are many explanations for this too shallow to be worthwhile. Only by including variances in developmental and keeping those who know the most about the user hardware, time constraints imposed software in the testing loop can their on the developer, inadequate test data deep knowledge be used to speed up the bases, and the all encompassing human cycle of error detection and correction. error). During the seven years since the While there is always danger of bias where implementation of the ENT, many theoretical the test group has special knowledge, this questions concerning its desirability and danger is less than the one of trivializing effectiveness have been raised. Two of the tests. Whenever objective measures of the most pertinent issues are: (1) The test quality are available, the best of value of an independent test in general both worlds can be obtained: The expert and (2) The Quality Assurance role within group can conduct the test, and the the overall life-cycle of a maintenance independent group can verify (in the sense change. of an auditor) the objective measure is satisfied [3]." A second study, performed The question of the value of an internally, reached similar conclusions in independent test is more complex than it noting that "The ENT is excessively might appear at first glance. Should an expensive and time consuming for the independent test be conducted at all? purpose it serves.... This is not a If so, should there be a myriad of indepen- comment on the diligence or dedication of


the personnel who conduct ENT; it is simply 4. Revised Testing Procedure that the ENT was devised as a measure to cure a symptom, that symptom being fielding With the stage being so obviously set packages which did not load or execute..,. for a change in the Command's QA program, To do this is essential; however, it can be particularly regarding independent testing, accomplished easily and cheaply by conduct- the following changes were instituted on ing all or part of the DCT Level III on a September 1, 1980: broadcast ready package and target hardware [4]." - Selected standard systems would undergo revised testing procedures. Change Thus both general literature and packages for these systems would not be specific studies appeared to point toward subjected to ENT, but instead would have a the same conclusions. But it would be DCT Level III with ENT objectives and be insufficient to separate independent test- conducted on target hardware. The devel- ing as an entity unrelated to its role in oper for these systems would be the Command the context of the complete quality representative at the field test and QAD assurance program; i.e., if the ENT is not participation in these final tests would paying sufficient dividends, what will? be virtually non-existent. Again there is abounding literature concern- ing an ideal QA program, some general in - On a time-phased basis, additional nature, some dealing specifically with systems would undergo these revised testing USACSC. One of these general works views procedures. Eventually, all USACSC systems quality assurance as only one of three would transition to the revised testing software assurance methods. Figure 5 procedures graphically depicts these three methods. The interesting point is that the quality - Quality Assurance resources would assurance portion of this triad is respon- gradually be shifted to the upfront sible for conducting design reviews, audits, activities previously mentioned, i.e., walk-throughs , and witnessing acceptance simulation, optimization, design reviews, testing [5] . Returning to the specific and early test monitoring. studies of QAD, there is a common thread exemplified by the following quote: "No The impact of these changes can not be enterprise can afford to use more people underestimated. The quality of products to check quality than to create the affecting the operation of the entire objects to be checked. The solution is to Army was at issue. The revised procedures make QAD an oversight activity [6]." A were not received with overwhelming major conclusion of virtually every report acceptance by the various developers in was that QAD should concentrate more of its the command, although their objections resources in up-front activities such as were most likely keyed by a perceived simulation, optimization, design reviews, increase in workload without a correspond- and monitoring the testing conducted by ing increase in manpower. The theoretical the developer. wisdom of the change was not significantly

FACTOR QUALITY ASSURANCE ACCEPTANCE TESTING INDEPENDENT V&V

PRIMARY OBJECTIVE ENFORCE STANDARDS DEMONSTRATE ACCEPTABLE ELIMINATE CRITICAL PERFORMANCE ERRORS

ORGANIZATION INDEPENDENT GROUP DEVELOPMENT GROUP INDEPENDENT GROUP

RELATIVE SIZE SMALL STAFF. LONG LARGE STAFF, MEDIUM MEDIUM STAFF, LONG DURATION DURATION DURATION

APPLICABILITY DELIVERABLE OPERATIONAL CRITICAL SOFTWARE SOFTWARE SOFTWARE

MAJOR STRENGTH COST BENEFIT RATIO WIDELY ESTABLISHED EXTREMELY EFFECTIVE

MAJOR WEAKNESS DIFFICULT TO SUBJECT TO TUNNEL REQUIRES ADDITIONAL INITIATE VISION RESOURCES

Figure 5. Comparison of Software Assurance Methods

98 questioned. Nevertheless, there was bility and role in the complete testing of sufficient apprehension to cause a their systems. Undoubtedly there will be delay in the change (originally scheduled set backs, perhaps even a decline in the for July 1, 1980) and a reduction in the quality of USACSC products for a period of number of systems initially affected. time. In fact, it would be an anomaly if Regardless, the mechanism to eventually problem areas did not occur. However, accept the revision for all STAMMIS was in in the long run the shifting of QAD place. resources described herein can result in significant savings in the cost of It was also recognized that any major software maintenance while actually change in testing of software was palatable strengthening the QA program within the only if some method of tracking the effec- Command. tiveness of the new procedures was devised. In this regard, QAD instituted an after the fact tracking for change package releases called STAMMIS Release Evaluation and References Performance Profiling. This methodology, in its early stages, is primarily the [1] USACSC Manual 18-1 gathering of pertinent information about a particular change package release in a [2] Meyers, G.J., Software Reliability , format which facilitates the comparison John Wiley and Sons, New York 1976 of this data from one release to the next. Significant trends can then be charted and [3] Hamlet, R., Testing of Data reported to managers. The many elements Processing Software, contract report of information being gathered can be for Battel le Columbus Laboratories, divided into two major catagories: Those 1980. which describe or differentitate vis-a-vis other systems (these factors include size, [4] Fox, M. and White, P. Report of age and % programs changed in a release) QAD Special Study on Testing, and those which provide some indications Memorandum ACSC-QA, 1979. as to the effectiveness of the recent change packages for a system (including [5] Fujii, M. S., A Comparison of Software number of incidents submitted by users Assurance Methods, Logicon, Inc. against the system, number of changes Processing requested against the system, number of [6] Hamlet, R. , Testing of Data emergency and/or urgent changes required Software, Contract Report for Battelle and frequency of normal change package Columbus Laboratories, 1980. releases.) Although the system is in its early stages of development, it is theorized that it will be used as an aid in answering questions related to the effect of change on system stability, the optimal time between change package releases, the relationship of system size to stability, and other pertinent quality indicators.
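The release-profiling methodology just described lends itself to a very simple record-per-release structure: each change-package release carries descriptive factors and effectiveness indicators, and successive releases are compared to chart trends. The sketch below is only an illustration of that idea; the two categories of data come from the text, but the field names, helper function, and numeric values are assumptions (the values are invented purely for illustration).

    # Illustrative sketch of STAMMIS Release Evaluation and Performance
    # Profiling: one record per change-package release, split into
    # descriptive factors and effectiveness indicators.
    from dataclasses import dataclass

    @dataclass
    class ReleaseProfile:
        system: str
        release: str
        # descriptive factors (size, age, percent of programs changed)
        program_count: int
        pct_programs_changed: float
        system_age_years: float
        # effectiveness indicators
        user_incidents: int
        change_requests: int
        emergency_changes: int

    def trend(profiles, indicator):
        """Return (release, value) pairs so an indicator can be charted over time."""
        return [(p.release, getattr(p, indicator)) for p in profiles]

    history = [
        ReleaseProfile("EXAMPLE-STAMMIS", "81-1", 640, 4.2, 6.0, 37, 12, 1),
        ReleaseProfile("EXAMPLE-STAMMIS", "81-2", 643, 3.1, 6.2, 29, 10, 0),
    ]
    print(trend(history, "user_incidents"))   # [('81-1', 37), ('81-2', 29)]

Comparing such records from one release to the next is what allows the trends mentioned above (system stability, optimal time between releases, size versus stability) to be charted and reported to managers.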

5. Conclusion.

The Quality Assurance Directorate of the US Army's Computer System Command has undergone some major "soul-searching" narticularly with regard to its role in independent testing. As a result of this self-evaluation, the changes detailed in this paper are being gradually implemented. Six systems have been placed under the new procedures and all have experienced one or more releases of systems change packages with the revised testing methodology. No significant problems have been encountered and, for the most part, the developers are optimistic about their increased responsi-



HISTORICAL FILES FOR SOFTWARE QUALITY ASSURANCE

Walter R. Pyper, Ph.D.

Tracor, Inc., 2007 West Olive, Fullerton, CA 92633

This paper summarizes a combination of ideas, adapted from software engineering and other disciplines, which provide insight into software life cycle management activities. This paper is written for an audience which may not be familiar with software engineering terminology, and some ideas and terminology are taken from related file maintenance and systems analysis fields. The impact of historical files, which represent the software development and testing activity, upon traditional software configuration management (CM) and software quality assurance (SQA) is examined. The implementation of historical files is explored. Both advantages and disadvantages are indicated. The relationship to keyword cross-referencing

is discussed.

Keywords: Configuration management (CM); cross reference; data structure; design; documentation; functional; historical file; keyword; life cycle management; requirements;

software quality assurance (SQA); specification.

1 . Introduc t ion such a configuration management system, some key decisions must be The intent of this paper is to made regarding which information to summarize a combination of ideas, keep and which to throw away. adapted from software engineering and Ideally, nothing is discarded from other disciplines, which provide in- program inception to system delivery sight into software life cycle and on into system maintenance. The management activities. This paper is challenge to the configuration purposely written for an audience manager is to develop a reasonable which may not be familiar with soft- update procedure for documents, pro- ware engineering terminology, so some gram listings, tapes, etc., so that ideas and terminology are taken from there is traceability from one related file maintenance and systems version to the next and so that the analysis fields. current state of documentation matches the supported software. Some Software configuration management trade-offs are necessary. In some is an area of software quality assur- cases, previous versions of documents ance which is concerned with listings, tapes, etc., cannot be controlling the current version of physically maintained; and even if each document, program listing, and resources were provided to do that, other items associated with a soft- the cross-reference system between ware system. In the maintenance of current and all other versions


of these items requires an additional controlled item (document, program resource which may grow to eclipse the listing, etc.) and indicate the originally envisioned software system relation between any two items. An development activities. equally important CM capability is to produce technical and management Although it is generally diffi- information on any item, such as the cult to decide what to throw away number of internal program segments when a new version of a controlled in a software task or the location(s) item is produced, one solution has of a given data segment within the been to keep copies of the previous entire software system. It is not two versions. Of course, one major intended that the use of a historical change followed by three successive file might eliminate the need for minor changes eliminates the version such traditional CM tools. The major reflecting the major change from the benefit of a historical file to CM configuration management system, so is in minimizing the concern over any such arbitrary rule may cause lost information due to elimination disaster in terms of t r ac e ab i 1 i ty of early versions of a controlled See Figure 1. item. The historical file records any change to a controlled item and Current Version cross-references that change to all (Minor Cha.ige) documents, computer codes, etc. which are functionally affected by Minor Change the change. One challenge is to identify all of these furctional Minor Change relationships and record the cross- references appropriately. Another challenge is to set up the historical file organization in the most useable

form at the "beginning". A thir 1 challenge is to identify the "begin- ning". When does one begin to keep a historical file? Major Change Let's start by considering the last challenge first. A historical file may be initiated at any point in the software engineering life cycle, but there is a point of Figure 1, CM Update Procedure diminishing returns, and there may > be a point of insignificant return for the effort invested. If it is The idea of maintaining a possible to begin the historical historical file which cross references file near the inception of the life all updates in each controlled item is cycle (wherever that is judged to proposed as a way in which a software be), there are rewards for keeping configuration might be controlled track of approved and eliminated appropriately. The idea is a fairly t ideas, together with the correspond- s t r a gh - f o rwar d one, but its implemen- i judgment criteria, during the tation does require discipline which ing early stages of requirements may not traditionally be found in the generation. configuration management arena. The application of historical files to If the historical file is wider areas of software quality implemented when the preliminary assurance is pursued in Section 3, software design document is generated entitled "Wider Application to document provides an outline for Software Quality Assurance (SQA)". that the historical file organization which is close to the ideal, since it 2. Applications to reflects the organization of the Configuration Management (CM) product itself - the computer code. Valuable information on software The basic procedures for software requirement specification decisions CM employ tools and techniques which will have been lost by waiting to can produce the current version of a initiate the historical file until


the preliminary design document is change? It is proposed that this produced, and it is possible to cross-reference include usage of reorganize the historical file based a keyword file where keywords and upon the structure of any document or keyword phrases are associated listing. By automating the historical with the historical file section file, the reorganization procedure may in which the change is being be accomplished less painfully. described. These keywords pro- vide a secondary check on code The second challenge, that of and data segments which might be setting up the historical file organi- affected by this change. A zation in the most useable form, has second type of explicit cross- been partially discussed already. referencing which lists all Should its organization be arbitrary, locations in associated documents should it follow a particular document, where changes are required is a or should it be a reflection of the traditional part of CM control, program listing itself? The latter and would not necessarily appear organization, that of the program within the historical file. listing itself, is proposed as the most useful, since the historical file The last element of a historical is then keyed directly to the code for file entry responds to the first design, dev.- lopment , maintenance and challenge listed earlier. The ability other purposes. It is suggested at to adequately cross-reference the this point that one of the major changes to a software system and pro- sources of software maintenance cost vide checks against inadvertent is the cost associated with familiari- "ripple effect" of a change has zing personnel with the software plagued software designers and main- design, preliminary to maintenance tenance personnel in the past. The - updates to the cod e . Sine e all doc u idea suggested here is to associate ments are written in s uppo rt of the keywords and keyword phrases with computer If s e each program segment (subroutine; code itse , it ems _ reasonable that th e hi s tor i c a 1 file procedure, etc.) of the computer code should be organize d pa ral 1 el to the and consequently with the associated coda . historical file section. These keywords and phrases provide The content o f th e hi s 1 0 r i c a 1 functional relationships between code file has not been disc u s s e d to this segments which yield a semi-automated point , and it is h igh t ime we 1 o ok e d control by listing all code and data at that in some de tail . Th e his tori c a 1 segments which share the keywords file represents de sign dec i s ions and phrases contained in that What was done? Why was it done? historical file section. See When was it approved? What effect Figure 2 for a Fortran example. did it have? The historical file should contain, at a minimum, the foil owing SUBROUTINE EXAMPLE C 3 .4.1 • Location of Change - supporting C PURPOSE information is listed under C appropriate historical file C section. C C KEYWORDS • Chronology - date change was C implemented in CM controlled C i t em C

Description of Change - Figure 2. Keyword Section in Preamble Standardized format of changes to documents, listings, etc. Admittedly, a lengthy check-list Criteria for Change - of code and data segments might result Standardized comments for reasons from the cross-reference procedure beh ind change using keywords as the means for relating an altered historical file C r o s s -Re f e r e nc e s - What was entry to the balance of the entries. functionally affected by this This procedure is amenable to


refinement by means of sub-categories, tools and testing activity. In order so that the check-list is not so to formalize the organization of the extensive. The intent is to be historical file and tie it to the exhaustive in the cross-reference delivered code structure, it is exercise, so it may be well to employ recommended that both the code and manual sub-category sorting techniques the historical file be structured rather than automate to the degree with a numbering scheme which is that the intrinsic value of such a common to the final design document. cross-referencing procedure is lost. In particular, the individual sec- tions of the final design document In what respect have we aided should be implemented within the code the CM process? If the historical in single program segments (subrou- file were not available we might tines, procedures, etc.) or in groups expect that the following deficiencies of program segments (e.g. tasks). would exist in our CM system: The numbering system associated with those individual sections of the • The history of changes to a given design document may be included with code segment or document section the preamble or header comments in might be difficult to trace back the individual program segments and more than a few versions prior may be likewise repeated in the to the current version. historical file. In the event that a program segment does not match • The reasons for changes to design, a design document section uniquely, requirements, etc., might be as in the case with certain utility obscured or non-existent with programs, an appropriate itbsequence traditional CM techniques. numbering is possible so tuat the correlation is maintained. • The cross-reference between sections of documents or segments The advantages to the mainten- of code might not exist in a ance of such a common numbering useable form for maintenance or scheme extend beyond configuration design changes, thus amplifying management activity, discussed the aftershock of CM activity. earlier. All parent and descendant documents which support the code may The additional savings associated be referenced directly to code seg- with a reduction in the need for ments. Test results at all levels maintenance of multiple prior versions may be referenced directly to code of listings and documents may be the segments. A historical file is strongest reason for implementing a easily organized and maintained historical file. based upon such a numbering procedure.

3. Wider Applications to The effort involved in maintain- Software Quality Assurance (SQA) ing the common numbering system is insignificant compared with the Upon test or examination of any potential advantages gained. The code segment by means of SQA tools, historical file may be organized in the results of such examination should such a fashion that proposed updates be recorded in an easily accessible to any code segment are complemented central file. The ramifications of by extensive background on prior such testing and the ability to dupli- testing, validation and verification cate the test should be referenced associated with that code segment. within that same central file. The Any data segment which is affected by intent of these suggestions is to a proposed change should be similarly minimize maintenance cost - the great- categorized in the historical file. est potential cost of software if redesign and major updates to the If the suppo r t i n g da ta b as e is software are included in maintenance organized by such a s t rue tur e d costs procedure the pos s i bi 1 ity of unex- pected errors due t o c ode or It is suggested that the documentation upd a t e c an be s i gn i f i historical file, with an organization cantly diminished which reflects the code structure, is the appropriate place to record There is no s u bs t itu te f or results of the application of SQA manual review and c or r e 1 a t i on of


some information, but the presence of software update activity. a historical file with built-in features which reduce the search and • Provide central repository of correlation process to a standardized all testing and SQA activity. procedure is a significant aid. Resistance to the implementation 4. Implementation of of a historical file is generally the Historical File prompted by lack of resources, to wit: time, money and personnel. The The individual entries within a return on investment for such a tool historical file may be formatted is generally not realized to a sig- under the appropriate section number nificant degree until all major design to include the items indicated in work has been completed; although Section 2, plus any others that are structuring the development activity identified by the user. Each entry so that design changes at all levels may be cross-referenced in some are carefully recorded and organized fashion, and the section in which the yields some very important benefits. entry resides is cross-referenced by keyword as well as by section number Each individual involved in the which matches the final design docu- development of the software may go to ment sectioi. number. the historical file to gain under- standing of otherwise tersely-written The historical file reside s on an documents. A significant time savings edit-driven system with the abi 1 i ty to is effected, and added insight into expand, concatenate and delete file total system requirements is derived sections, as well as to search and by the reviewer. Attempts to "second- sort on given keywords or other guess" the system architects are attributes. The ability to rap idly diminished, and due to the increased access any portion of the file by level of information available, the numerical pointer is a key to its implementor is given greater con'i- usefulness. The relationship b e twe en dence in the basic design. If that keywords and numbered sections i s is not the case, useful suggestions maintained within the preamble or for improvement will be forthcoming header of program segments (sub rou- at an early stage where the cost of tines, procedures, etc.) and du pli- change is small by comparison to cated in the historical file un less later stages. it is critical to save space in the historical file and not repeat the The details of implementation of keywords within the historical f ile . a historical file are, to a great degree, dependent upon the resources Advantages to implementing a and particular needs of the sponsoring historical file have been outlined organization. Some of the advantages in Sections 2 and 3. They are and disadvantages of starting the summarized here: historical file before the final design document is produced have • Preserve design history of already been discussed. The sugges- s o f twar e tion here has been to start as soon as possible and structure the histori- • Provide cross-reference to cal file so that it may be altered at control structure and data a later date to reflect the final structure changes. design document. This is no easier task than restructuring the require- • Provide essentially unlimited ments document in terms of the final backup to current version of all design document, so this author items under CM control. recognizes that the task of restruct- uring is probably not amenable to • Provide insight to design automation for many complex projects. 
processes for new imp 1 emen t o r s The process of correlating • Provide detailed history to keywords within the requirements support maintenance activity. document and the initial design specification allows a semi-automated • Provide management tool for procedure for relating the preliminary extraction of significant historical file (based upon the

105 . .

requirements specification) to the or after several maintenance versions final historical file, thus antici- have been generated. pating the organization of the final design specification. See Figure 3. By organizing such a file to match the structure of the delivered code, and providing the ability to cross-reference related documents, test results, SQA results, and other information, cost effective CM and Require- Prelimi- Final software maintenance is more ments nary Design probable. If data structure is also i- Spec i f Des ign Document considered in the generation of the cat ion D 0 c umen t historical file, heretofore unexploited software maintenance I 1 techniques are indicated. Prelimi- Restruc- Updated nary tured Histori- The intent of this paper has Histori- Histori- cal File been to generate interest in the use cal File cal File Structure of historical files for reducing software maintenance costs and to provide Figu e 3. Historical File greater visibility into the Update Process software development process, thus reducing the cost of design change by moving more of the detailed review If the reader feels that we are of design decisions towa.-c' the beginning to magnify the software beginning of the software development development effort beyond reason, process consider the effect of doing the cross-reference exercise to generate The keyword cross-reference approach a representative historical file. has been suggested as The mechanism of thinking about cross- logical complement to the historical references in detail provides the file. This provides a second check designer with insight perhaps not on maintenance activity, and yields otherwise accessible, and the added likely nooks and crannies to search for attention to that detail should "ripple effect" errors. The reveal problems with consistency and use of keywords and phrases is explored in more detail in J . completeness in the design. This 0. activity occurs at an early enough stage to make changes which might be References impossible once the coding cycles have begun. [l] Pyper, Walter R., T he Effectiv e Use of Comments, Tracor, Inc.. 20 March~198l7 5 . Summary

The concept of a historical file has been utilized effectively in many areas of systems and information science. It is a standard tool in the control of organizations and large projects. It has been somewhat neglected in software engineering, possibly due to the apparent complexity and cost of maintaining a separate historical file when so much was already being recorded and controlled by CM techniques.

One key element which has been lost is the history of design decisions which provide great insight into sometimes obscure reasons for the existence of data and control structures in their final, delivered form.

ADP Cost Accounting and Chargeback


A STEP-BY-STEP APPROACH FOR DEVELOPING AND IMPLEMENTING A DP CHARGING SYSTEM

Kenneth W. Giese Dean Halstead Thomas F. Wyrick

Directorate of System Evaluation* Federal Computer Performance Evaluation and Simulation Center (FEDSIM) Washington, DC 20330

Charging for data processing (DP) services refers to distributing the costs of providing DP services to the users who receive the services. The distribution of costs requires definition of the basic DP services, the resources used to provide the services, and the costs incurred to obtain and make use of the resources. A charging system is comprised of two subsystems: the rate-setting and the billing subsystems. The rate-setting subsystem incorporates procedures for forecasting the use and the total cost of each service and establishing the rate to be charged for each unit of service. The billing subsystem includes procedures for monitoring the use of services, applying the rate for each service unit to compute the total charge for the services each user receives, and reporting the charges to the user and pertinent accounting groups. Organizations may or may not collect funds to recover the costs of DP services.

The Federal Government has established policies that call for distributing the "full costs of operating DP facilities to users according to the service they receive." The National Bureau of Standards (NBS), assisted by the Federal Computer Performance Evaluation and Simulation Center (FEDSIM), is developing a set of Guidelines to assist Federal DP management in implementing the charging aspects of this policy. This paper describes a step-by-step methodology for the development and implementation of a charging system that will form the basis of the Guidelines. The material presented in this paper represents an overview of the preliminary results of the Guidelines development project and may change during the review of the draft Guidelines document. Selected technical and policy references are cited to encourage further investigation.

Key words: DP cost accounting; DP cost allocation; chargeback; pricing; standard costing; chargeout; charging; charging system.

*The views and conclusions contained in this paper are the authors' and should not be interpreted as representing the official opinions or policies of FEDSIM or of any other person(s) or agency associated with the Government. Moreover, this paper contains the authors' summaries of official documents and, therefore, may not reflect the full intent of those documents.

1. Introduction

On September 16, 1980, the Office of Management and Budget (OMB) issued Circular No. A-121, "Cost Accounting, Cost Recovery, and Inter-Agency Sharing of Data Processing Facilities." This Circular requires Federal agencies to implement policies and procedures to (1) account for the full cost of operating data processing (DP) facilities; (2) allocate all DP costs to users according to the DP services received; (3) evaluate inter-agency sharing as a means of supporting major new DP applications; (4) share excess DP capacity with other agencies; and (5) recover the cost of inter-agency DP sharing. The Circular specifies that agency procedures for cost accounting and charging must be consistent with the guidance provided in Federal Government Accounting Pamphlet Number 4 (FGAP 4), "Guidelines for Accounting for Automatic Data Processing Costs," U.S. General Accounting Office (GAO), 1978.

While Circular No. A-121 and FGAP 4 describe general criteria that agencies should use for cost accounting, Federal DP managers need technical guidance in the area of charging. The National Bureau of Standards (NBS) is developing a Federal Information Processing Standard (FIPS) Guidelines to provide the needed guidance. The Guidelines will incorporate the "best practices" as described in the literature and as used by exemplary managers in the Federal Government and private sector.

This paper presents an introduction to charging for DP services and summarizes a step-by-step methodology for developing and implementing a charging system. The material described in this paper represents an overview of the preliminary results of the Guidelines development project. These results may change during further review of the draft Guidelines. Selected technical and policy references are cited to encourage further investigation.

2. Introduction to Charging for DP Services

The term charging for DP services refers to the process of distributing the costs of providing DP services to those who receive the services. The procedures used to charge and recover costs for services will vary from DP facility to DP facility. Services, resources, and their interrelationships are the fundamental concepts that must be understood in order to understand how to charge. The term "service" refers to the work performed by the DP facility for its users. In order for work performed by the DP facility to be classified as a service, the work must be measured by only one metric and must have a user billing rate (price) associated with it. The metric is referred to as a service unit and is the smallest category of work for which the users can be charged. Services can be as simple as computer processing, with CPU seconds as the service unit, or as complex as a machine resource unit (MRU), which is an algorithm that combines several different services and that uses the number of MRU's as the service unit.

The term "resource" refers to the items employed by the DP facility to provide one or more services. In order for a resource to be included in the charging system, the DP facility must incur a cost for obtaining or using the resource. Examples of resource categories that are used in DP facilities are those listed in A-121 and FGAP 4: personnel, equipment, software, supplies, contracted services, space occupancy, intra-agency services and overhead, and inter-agency services. The total cost of each resource is used as the basis for determining the cost of each service, which is in turn used as the basis for calculating the billing rate that will be charged for that service. The concepts of accumulating and accounting for the total costs of DP services are described in detail in FGAP 4.

The billing rates for the DP facility's services typically are developed by dividing the total cost of a service by either the projected utilization or available capacity for the service, such as the number of hours the service is available for use. Usage accounting procedures collect and store the number of service units for each service utilized by each user. These data are reduced and reported to the respective user. If the DP facility is recovering its costs, then the billing rates for the service units are applied, the charge for the work is calculated, and a bill is prepared and sent to the user.

The procedures used to charge for services are referred to in the draft Guidelines as a "charging system." Developing a charging system requires many decisions that can affect not only the DP facility, but also the entire organization. Examples of the decisions are (1) whether to charge the users for all of the resources used to provide a service, or to charge for only certain resources; (2) whether or not to use the charging system

to try to influence service usage patterns; (3) which services the DP facility should charge for; and (4) which resources are used to provide a particular service. These types of decisions are the main focus of the draft Guidelines.

3. Functional Description of a Charging System

An operational charging system is composed of two subsystems: rate-setting and billing. (See Figure 1.) The rate-setting subsystem supports the development of the rates charged for each service. The billing subsystem records usage and applies the rates to compute the total charges for the services each user receives.

3.1 Rate-Setting Subsystem

The rate-setting subsystem of a charging system is most often linked to the organization's fiscal cycle, which subsequently dictates the schedule of the subsystem's operation. The frequency of a typical cycle is yearly. The main objective of the rate-setting subsystem is the setting of billing rates. Billing rates are the prices users are charged for the services they receive. Most organizations attempt to keep billing rates constant throughout the fiscal cycle to prevent disruption of budgets, which explains the interdependence between the subsystem and the organization's fiscal cycle. The work performed in the rate-setting subsystem is best described by dividing it into four procedures: usage forecasting, cost forecasting, setting billing rates, and DP management accounting. Extensive interaction occurs between the various procedures, and the work is not always completed in the exact order shown in Figure 1. Each procedure is briefly discussed below.

The first procedure of the rate-setting subsystem is usage forecasting. Usage can be forecast by one or more of several techniques. Some examples of forecasting techniques are surveying users to estimate the amount of service they will require, performing a trend analysis based on historical usage data, or a combination of the two. After the forecasts have been compiled and validated for accuracy, the DP facility determines if available DP capacity can support the forecasted usage. If not, steps must be taken to adjust either DP capacity or the number of service units that users are requesting. Additionally, the DP facility evaluates the usage forecasts for each service to determine if the DP facility should continue to offer the service.

Cost forecasting involves estimating the costs incurred to provide the resources needed to support each service. The objective of cost forecasting is to estimate all costs for the fiscal cycle. There are several techniques often used for cost forecasting. One of the more common techniques involves extrapolating the DP facility's current operating budget into the planning period by itemizing changes to the budget reflecting estimated future costs. Once cost forecasting is completed, the costs and benefits of each DP service are evaluated by comparing the forecasted usage and cost estimates to determine if that service can be economically provided. Cost forecasts can be used as a basis for the DP facility's budget requests. Together, usage and cost forecasting provide the data needed to set billing rates.

Setting billing rates involves allocating the forecasted DP costs to the various DP services. This allocation involves determining the portions of each resource or group of resources (resource center) used to support each service. The total cost of each service is divided by either the forecasted capacity or the forecasted usage for that service; the result is referred to as a "standard rate." The standard rate is the minimum amount that must be charged for each service unit to fully recover the cost of providing that unit. Differential surcharges or discounts may be applied to each standard rate, if appropriate, to develop the final billing rate for each service. The standard rates may also be used to develop charging algorithms, if the DP facility is using them. Charging algorithms are the equations used to compile the usage of several different services into a total usage unit, such as the MRU described earlier. The billing rate for the total usage unit is calculated using some combination of the standard rates of each of the services in the algorithm. The billing rates are then published and used as a basis for developing the users' DP budget requests.

The final procedure in the rate-setting subsystem is a set of activities referred to as DP management accounting. These activities include budget planning and the formulation of budget requests; establishing and maintaining accounts and accounting information for users and the DP facility; and providing any interface that may be

[Figure 1. The charging system and its two subsystems: rate-setting and billing]
112 required between the DP, budgeting, and have already been removed fron the user ' s accounting gro\:ps of the organization. account. Another technique is to have the user send funds to the DP facility for each bill received. Whatever technique is used 3.2 Billing Subsystem by a DP facility, it must fit in with the policies and procedures that each agency The billing subsystem consists of the has for the transferral of external and

procedures errployed to monitor the usage of internal funds . The agency ' s accounting the services; to prepare and distribute department typically handles all of the reports of each user's service utilization agency-level record-keeping for, and actual and charges for that utilization; and to movement of, the funds. recover costs. If the DP facility is recovering costs, usage reports may then be 4. Outline of the Charging System viewed as bills. Service usage is typi- Development and Iitplementation cally monitored continually; reporting Process typically occurs monthly; and cost recov- ery, which may or may not occur, typically The draft Guidelines discusses in follcws a schedule that is closely tied to detail four phases and eleven steps for the fiscal cycle of the agency. developing and inplementing a charging

system. (See Figure 2 . ) This section Usage accounting procedures, in the provides a sunmary of the major tasks context of charging, refer to recording the required for each of the eleven steps. service \jnits received by each user. Usage of DP equipttient and software is typically 4.1 Planning Phase monitored by job accounting software run in conjunction with the ccrtputer's operating The planning phase incorporates two system. Manual procedures, including time steps: (1) determining the management sheets, data entry logs, tape mount logs, structure and approach to developing and and other manual logs may be used for inplementing the charging system; and (2) personnel and other resources. Data fron setting the objectives for the DP facility, the job accounting software and the manual for charging for DP services, for the logs are corpiled to produce records of charging system, and for the charging total services received by each user during system development project. These steps a given period. result in the preparation of a high-level management plan for the development Reporting consists of the procedures project. ertployed to reduce the usage accounting data, to apply the billing rates to the "Step 1" Determine the Management results of the reduction, and to produce Structure and i^proach reports surtmarizing the total service utilization and charges for that utiliza- There are four major tasks in Step 1. tion for each user. Many DP facilities ac- The tasks are primarily the responsibility tually provide their users with a report at of the organization's management. First, the end of each ccjiputer session or run the level of management involvement and which suniTvarizes the services received for support in the design, development, iitple- that session or run and the estimated mentation, operation, and maintenance of charges for those services. If the DP the charging system should be determined. facility has chosen to recover the charges This involves defining the degree of for its services, then in most instances organizational ccmmitment to charging for the reports that have been produced can be DP services, the charging system, and the viewed as the bills used for recovery. charging system development project. Second, a charging team should be formed. Recovery consists of the procedures The team should consist of at least one eitployed for the collection of funds. representative from each fmctional group There are several different techniques that in the agency that will be affected by the can be used to handle the recovery of charging system, including DP, management, charges. One technique is to have the user budgeting, accounting, and the users. deposit an amount of money into an account Third, the charging team should identify maintained by the DP facility. The DP the major types of work that will need to facility then debits the user's account be perfontied and assign the responsibility every time the user utilizes the DP facil- for conpleting the work to the appropriate ity 's services. In this situation, the team meiriDer or the group they represent. reports or bills reflect the charges that Fourth, the methodology that the organiza-

113 tion will use in the design, development, inplaa:v8ntation, operation, and maintenance of the charging system should be defined by the charging team. The team's decisions should be documented in a formal Project Plan which outlines the individual tasks to be performed in each subsequent phase and step.

2" PLANNING PHASE "Step Set Objectives

Step 2 consists of four major tasks. 1. DETERMINE THE MANAGEMENT STRUCTURE AND APPROACH These tasks are coordinated by the charging 2. SET OBJECTIVES team, vtose objective for this step is to provide a formal definition of the objec- tives for the DP facility, for charging for DP services, for the charging system, and for the charging system development pro- ject. First, the management-level DP objectives should be defined to indicate DESIGN PHASE how DP relates to the organization as a whole. This involves determining the relationship of the DP facility to the 3. ESTABLISH THE ALLOCATION STRUCTURE A. DESIGN THE CHARGING SYSTEM organization, the level of maturity of the DP facility, and whether approval for DP usage should be via the DP facility's or user's budget. Since these ctojectives affect the charging system and the entire organization, they should be coordinated with upper management and all other involved groups. Second, the DP facility's RATE-SETTING PHASE or agency's objectives for charging for DP services should be determined. Exaiiples of 5. FORECAST USAGE these objectives are to (1) increase the 6. FORECAST COST accountability of DP to upper management 7. SET BILLING RATES 8. CONDUCT DP MANAGEMENT ACCOUNTING and the users, (2) irtprove the relationship between charging and the capacity planning process, (3) recover the cost of operating the DP facility, (4) encourage more effi- cient use of resources by the users, (5) encourage the DP facility to keep prices BILLING PHASE ccitpetitive with outside vendors, and (6) influence the behavior of users regarding their use of the resources. 9. DEVELOP AND IMPLEMENT USAGE ACCOUNTING PROCEDURES 10. DEVELOP AND IMPLEMENT REPOR,TING Third, the objectives of the charging PROCEDURES system should be deterroined. These objec- RECOVERY 11. DEVELOP AND IMPLEMENT tives should define how the charging system PROCEDURES should be structured and designed. There are ten primary objectives of a charging system identified in the draft Guidelines document: (1) repeatability, (2) under-

standability, (3) equitability , (4) audit- Figure 2. Charging System Development ability, (5) adaptability, (6) inexpensive Phases and Steps to operate, (7) easy to irtplement and maintain, (8) controllability, (9) stabil- ity, and (10) sirtplicity. Many of these objectives are not ccrrpatible and the charging team must work to explore trade- offs between conflicting objectives.

Fourth, the objectives of the charging system project must be defined. This involves setting the goals, schedules, and


budgets for accorplishing the phases, The service unit for each service should steps, and tasks defined in the Project also be defined. Fourth, the resources Plan. that the DP facility uses to provide each service should be itemized. Fifth, the centers, or logical groupings used to 4.2 Design Phase accumulate the resources and services, should be identified. Sixth, the structure During the design phase, the objectives for allocating the costs of the resoiarces set forth in the planning phase are used to and resource centers to the services should direct the conceptual development and be constructed. The actual costs of the preliminary design of the charging system. resources are not placed into the alloca- The objectives of this phase are to estab- tion matrix at this time, because they are lish an allocation structure, vrfiich is a not yet kncwn. Seventh, the charging matrix to be used to assign the resources algorithms, if they are to be used, should to the services, and a global charging be developed. system design. During this phase, the global requirements for the charging system "Step 4" Design the Charging System are identified and alternatives for satis- fying the requirements explored. The Step 4 consists of four major tasks. allocation structure and charging system First, the global fxmctional requirements design provide a starting point for the of the system should be identified, detailed design, development, irrplementa- analyzed, and documented. The requirements tion, and operation of the rate-setting and for the charging system that are of primary billing subsystems. The global charging concern are the basic inputs, processes, system design serves to coordinate the and outputs needed for the various proce- individual steps and tasks of subsequent dures. The allocation structure provides a phases. The design phase is coordinated basis for defining requirements for collect- and performed by the charging team members. ing, processing, and reporting information about resources, costs, and services. "Step 3" Establish the Allocation Struc- Second, the global functional requirements ture should be used to define and document the data requirements of the system. The Step 3 consists of seven major tasks. amount and types of needed and available First, the methods for allocating the costs usage and cost data should be closely of the resources to the services should be examined and defined. The data require- determined. There are two decisions vAiich ments document is very iirportant and should must be considered, (1) should the costs of be continually refined throughout the all or only a subset of the resources be charging system developnent project. All allocated to the services, and (2) should design decisions should be checked against billing rates be based on standard or the availability of the data needed to actiaal costs. The first decision is develop and irtplement the procedures and self-explanatory and is a function of the systems that those decisions dictate. policy of the DP facility or the agency in Third, the basic alternatives for satisfy- which it resides. The second decision ing the functional and data requirements of concerns how often the billing rates are the charging system must be explored. set. 
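Before taking up that timing question, a minimal sketch of the allocation arithmetic described here may help. The resource names, cost figures, allocation fractions, and usage forecasts below are hypothetical, and the code is only an illustration of the Step 3 allocation structure and the Step 7 standard-rate calculation, not part of the draft Guidelines.

    # Hedged sketch of an allocation structure and standard-rate calculation.
    # All resource names, cost figures, allocation fractions, and usage
    # forecasts are hypothetical; they only illustrate the arithmetic
    # described for Steps 3 and 7.

    # Forecasted annual cost of each resource (or resource center), in dollars.
    resource_costs = {
        "personnel": 400_000,
        "equipment": 300_000,
        "software":  100_000,
        "overhead":  200_000,
    }

    # Allocation structure: fraction of each resource consumed by each service.
    # Each row should sum to 1.0 if the full cost of the resource is distributed.
    allocation = {
        "personnel": {"batch": 0.50, "online": 0.30, "data_entry": 0.20},
        "equipment": {"batch": 0.40, "online": 0.60, "data_entry": 0.00},
        "software":  {"batch": 0.50, "online": 0.50, "data_entry": 0.00},
        "overhead":  {"batch": 0.40, "online": 0.40, "data_entry": 0.20},
    }

    # Forecasted usage, expressed in each service's own service unit
    # (for example, CPU seconds for batch work, transactions for online work).
    forecast_usage = {"batch": 2_000_000, "online": 5_000_000, "data_entry": 40_000}

    def service_costs(resource_costs, allocation):
        """Distribute resource costs to services through the allocation structure."""
        totals = {}
        for resource, cost in resource_costs.items():
            for service, fraction in allocation[resource].items():
                totals[service] = totals.get(service, 0.0) + cost * fraction
        return totals

    def standard_rates(service_cost, usage):
        """Standard rate = forecasted cost of the service / forecasted usage."""
        return {s: service_cost[s] / usage[s] for s in service_cost}

    costs = service_costs(resource_costs, allocation)
    rates = standard_rates(costs, forecast_usage)
    for service in rates:
        print(f"{service}: cost ${costs[service]:,.0f}, "
              f"standard rate ${rates[service]:.4f} per service unit")

Dividing by forecasted available capacity rather than forecasted usage, the other basis mentioned above, changes only the denominator of the rate calculation.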
Billing rates based on standard costs Exanples of the decision trade-offs for are set once for the fiscal cycle; billing each of the procedures of the charging rates based on actual costs are changed as system are manual versus autotiated, in- often as necessary to ensure that they house versus purchased, and existing versus reflect the actual costs to produce them. notf. The potential costs of each alterna- Second, the charging methods or bases tive for each procedure should be weighed shoiiLd. be determined. There are two against its benefits. These decisions are charging method decisions that shoiiLd be iterative and may result in the need to determined: (1) should the rates be based modify the objectives and requirements of upon projected usage or estimated total the system. Fourth, based on decisions capacity; and (2) should the services made in the first three tasks, a global offered by the DP facility be charged using charging system design document should be a single factor, multiple factors, or an carpiled, reviewed, and eventually approved extension of the multiple factors method? or disapproved by the charging team, (These are discussed in detail in the draft i^proval leads to the next two phases,

Guidelines . during which the detailed designs are produced, the system and subsystems are Third, the services that the DP facil- developed, and finally they are inplemented ity wants to provide should be itemized. and operated. Disapproval means that


earlier phases, steps, and tasks must be "Step 7" Set Billing Rates re-evaluated and, if necessary, repeated until approval is obtained by the involved Step 7 consists of five major tasks. parties. First, the estimated costs of the resources and resource centers should be incorporated 4.3 Rate-Setting Phase into the allocation structure and the total cost for each service calculated. Second, This section describes the four steps the "standard rate" for each service unit to follcw in setting rates. The agency's should be calculated. The standard rate standard DP systems development techniques can be calculated in one of three ways: should be used in conjunction with this (1) by dividing the total estimated cost of section to structure the detailed design, a service for the planning period by the developnnent , iitplementation, and operation total projected usage, (2) by dividing the of the rate-setting si±)system. During the total estimated cost of the service for the rate-setting phase, the detailed decisions planning period by total available DP concerning the type and structure of the capacity, or (3) by using a technique that rate-setting subsystem are made. This considers the total projected usage and section presents information which is cost over the projected (multiyear) life of pertinent to the operational rate-setting the service. Third, coefficients for each subsystem to aid in its detailed design, charging algorithm variable, if charging development, irtplemeritation, and operation. algorithms are being used, should be The functional procedures include the calculated. Fourth, any factors for forecasting of usage and costs, setting of normalizing charges between carpeting or billing rates, and performing DP management similar services, such as multiple machines accounting tasks. or DP facilities, should be determined. Fifth, any surcharges, priority charges, "Step 5" Forecast Usage and discounts that are to be used should be determined. The end result of these tasks Step 5 consists of four major tasks. is the calculation of a billing rate First, users' projected workload require- (price) for each service offered by the DP ments should be obtained. The projections facility should be in terms of the service units that are established in the allocation "Step 8" Conduct DP Management Accounting structure. Second, the forecasts should be analyzed and used to determine the capacity Step 8 consists of three main tasks. needed by the DP facility to provide those First, procedures should be developed for forecasts. Third, discrepancies between the establishment and maintenance of user available or obtainable DP capacity and and DP facility accounts. Second, proce- usage forecasts should be resolved. dures should be developed and irtplemented Fourth, the allocation structure should be for assisting users and the DP facility in reevaliaated and, if necessary, restructured establishing and justifying their budgets. to incorporate the resolutions between the Third, procedures should be developed for users' forecasts and DP capacity. providing coordination with the organiza- tion's accounting and budgeting divisions, "Step 6" Forecast Costs the DP facility, and the users concerning areas connected to the charging system, Step 6 consists of four major tasks. such as the recovery of charges and user First, an estimate of the budget that will accounts. 
These procedures are needed to be proposed for the DP facility, for the keep track of actual expenditures of both planning period of the charging system, the DP facility and the users. FGAP 4 should be developed or obtained. Second, provides the guidance needed for management additional cost data that are either not accounting for the DP facility. contained in the budget estimate or not contained in the detail that is needed 4.4 Billing Phase should be <±)tained. Third, the cost data should be analyzed and the costs for the Billing includes functional procedures resources and resource centers should be for accounting for usage, producing and calculated. Fourth, the allocation struc- distributing usage reports and bills, and ture should be reevaluated. Resources and recovering charges fran users. During the resource centers for which no cost could be billing phase, these procedures are defined obtained should be removed fran the alloca- in detail, developed, irtplemented, and used tion structure or consolidated under other in an operational environment. The proce- categories. dures directly affect the users, the DP


facility, and the organization's accounting 5 . Conclusion and budgeting practices. Representatives frem these groups should be involved in the The purpose of this paper is to iden- detailed design, developnvent, inplementa- tify the critical phases, steps, and issues tion, and operation. Cost accounting for in charging system development and irrple- the DP facility is not addressed in the mentation. The approach and issues draft Guidelines document; however, cost presented in this paper are not intended to accounting does occur in parallel with the provide a cctrprehensive , detailed plan for billing phase. developing and inplementing a DP charging system. The topic is too ccnplex and far "Step 9" Develop and Iitplettent Usage reaching to be standardized; hcwever, a Accounting Procedures general approach and best practices can be established. Step 9 consists of two major tasks. First, the usage accounting procedures that In setting policies and developing are needed to collect the necessary data guidelines for charging users for the DP for the charging system should be designed services received, the Federal Govemitent in detail. These procedures typically is attempting to shift the responsibility include both autcrtated and manual methods. for DP costs frcm the DP facility to the When designing the procedures, the design- users. Such a shift raises questions that ers should be sure that all of the data are addressed in CMB Circular A-121, FGAP have been identified from the planning, 4, and the draft Guidelines discussed in design, and rate-setting phases. The this paper. The underlying theme of these second task is to develop, implement, and new policies and guidelines is iitproving operate the usage accounting procedures. the accountability of both Federal DP Development of these procedures will facilities and their users. typically include purchasing off-the-shelf software In addition to irtproving accountabil- ity, a charging system can also inprove "Step 10" Develop and Implement Reporting both the DP planning process and the Procedures cotmunications between the DP facility and its users. The approach presented in this Step 10 consists of two major tasks. paper stresses the iitportance of planning First, the reporting procediires should be and ccmmunications as the fundamental designed in detail. This consists of elements for the successful design, develop- determining (1) how to reduce the usage ment, iiipleitentation , and operation of a accounting data, (2) how to apply the charging system. billing rates, and (3) how to report the usage levels and charges. Second, the reporting procedures should be developed, inplemented, and operated. Development of the reporting procedures will typically The authors thank Dr. Dennis Conti of MBS include purchasing off-the-shelf software and the many other professionals in both and making modifications to it as neces- Government and cormercial organizations who sary. participated in the Guidelines developrtent project and the preparation of this paper. "Step 11" Develop and Iitplement Recovery Procediores Bibliography

Step 11 consists of four major tasks. [1] Bernard, Dan, et al. Charging for First, the policies of the organization for Conputer Services; Principles and transferring funds should be researched. Guidelines , Petrocelli, 1977. Second, techniques for adjusting charges in error should be developed. Third, the plan [2] Gushing, Barry E. , "Pricing Internal for recovering fiands from the users and Ccjrputer Services - The Basic Issues," accoxmting for those funds should be Management Accounting , i^ril 1976, designed in detail, developed, inplemented, 57(10), pp. 47-50. and operated in accordance with the agency's policies and practices. Finally, [3] Dearden, John and Nolen, Richard L. feedback from the users should be obtained "How to Control the Ccuputer Resource," and procedures should be modified as Harvard Business Review , November- needed. December 1973, 51, pp. 68-78.


[4] "Charging for Cortputer Services," EDP

Analyzer , July 1974, 12(7), pp. 1-13.

[5] "Job Accounting and Chargeback," EDP

Performance Management Handbook , June 1980, pp. 2.0.1-2.90.5.

[6] "Standard Costing in Data Processing,"

EDP Performance Review , June 1981 9(6), pp. 1-6.

[7] "A User-Oriented i^proach to Charge-

back," EDP Performance Review , February 1975, 3(2), pp. 1-6.

[8] McKell, Lynn, Hansen, James V.

Heitger, Lester E. , "Charging for

Conputer Resources," Ccmputing Surveys , June 1979, 11(2), pp. 105-120.

[9] Nolan, Richard L. , "Controlling the Costs of Data Services," Harvard

Business Review , July-August 1977, pp. 114-124.

[10] United States Office of Management and Budget, "Cost Accounting, Cost Recovery and Inter-agency Sharing of Data Processing Facilities," Circular No. A-121, September 1980.

[11] Schaller, Carol, "Survey of Ccnputer Cost Allocation Techniques," Journal of

Accoimtancy , June 1974, 137(6), pp. 41-49.

[12] Statland, Norman, et al, "Guidelines for Cost Accounting Practices for Data

Processing," Data Base , Winter 1977, 8, pp. 2-20.

[13] United States Government Accounting Office, "Illustrative Accounting Proc- edures for Federal Agencies," Federal Government Accounting Panphlet Number 4, 1978.

[14] Zmud, Robert W. , "Selecting Cotputer Resources for Inclusion within a Pricing System," Journal of the American Society for Information

Science , November-December 1975, pp. 346-348. Control of Information Systems Resource Through Performance Management


SESSION OVERVIEW

CONTROL OF THE INFORMATION SYSTEM RESOURCE THROUGH PERFORMANCE MANAGEMENT

John C. Kelly

Datametrics Systems Corporation Burke, VA 22015

Performance evaluation has expanded during the past few years to take on a more global view of the data processing function. Performance analysts can no longer limit their activities to system tuning. The issues have become broader than just the central hardware and software. Thus the shift in terminology from performance evaluation to performance management. Two trends have forced this change in emphasis. First, with the thrust toward on-line systems, the performance of systems has become visible to a wider and wider user community. Performance evaluation can no longer be confined to the back room. It is out in the open whether we like it or not. Second, the productivity of the entire organization depends more and more on the computer system. In many cases, corporate profits are directly related to computer performance.

The papers presented in this session address some of the issues associated with the trend away from evaluation toward management. David Vincent discusses some of the basic issues associated with performance management and describes how the pieces fit together. Dennis Norman discusses the issues as they relate to an on-line network. The session will finish with a panel discussion of the issues presented in the papers as well as answers to the following questions:

• What is the proper role of computer performance management (CPM)? Does it properly include such activities as programmer productivity?
• Where should CPM be located within the organization?
• How many people are required to support CPM?
• What experience and education are needed for CPM analysts?
• Is it possible to cost justify CPM?
• Is there really a difference between performance evaluation, performance management, and capacity planning?
• How does one take into account user perceptions versus reality?
• What are the most useful measures of performance?

The panel will consist of myself, the two speakers, and the following individuals: Larry Greenhaw of the Texas Department of Human Resources, and Ken Moore of the National Bureau of Standards.



INCREASING SYSTEM PRODUCTIVITY WITH OPERATIONAL STANDARDS

David H. Vincent

Boole & Babbage, Inc. Educational Services Division Sunnyvale, CA 94086

ABSTRACT

The user service levels achieved by a data center are dependent on three variables: volume, mix, and efficiency.

By isolating and tracking these elements, the data center can assign accountability to the user for their demands on the system as well as isolate system inefficiencies to be corrected by the data center. These methods also help to create the environment where data center and user can agree on the essentials of the delivery of computer services.

One of the most outstanding individ- dustry in general began to adopt his princi- uals to come out of the American Industrial ples. Some of Taylor's disciples carried on Revolution was a young man by the name of his work thereafter; notably, Frank and Frederick W. Taylor. Taylor's "scientific Lillian Gilbreath (who were the subject of a management" had a large effect on the tre- movie "Cheaper oy the Dozen") and Henry mendous surge of affluence we have experi- Gantt, who initiated project scheduling in a enced in the last seventy years which has Gantt Chart. The common interest uniting lifted the working masses in the developed those people was the analysis of work and countries well above any level experienced translating that analysis into more produc- before, even that of kings and queens of tive and efficient procedures and flows of old. Taylor's analysis of work and resul- work. tant method improvements resulted in produc- tivity gains unequaled in history^ The analysis of work involves the fol- lowing: His analysis of work in the 1880's was

exemplified by his study of shoveling iron 1 . The identification of all processes ore in a steel mill. In this study, he necessary to produce an end product or found gross inefficiencies in the actual pro- result cess of shoveling. By breaking the motions 2. The rational (and manageable) organiza- and actions of the workers down into measur- tion of the sequence of operations so as able segments, he was able to develop better to make possible the optimal flow of work methods (something like tuning a com- work puter) and standards of performance so that 3. The analysis of each individual opera- output could be judged on a day-to-day tion or process including measurement basis. and historical trending 4. The integration of the above into an This work was finally completed, as we overall process of producing a product now know it, shortly after he passed on dur- or result ing World War I, at which time American in- This established process has been used successfully for decades in American indus-

. . ' Actually, Taylor was preceded in try (and Japanese, and German, and .). philosphy, by another "irascible genius", It is my contention that the size and com- Charles Babbage, who documented some of the plexity of the data center has now evolved earliest forms of scientific management in to a point where this methodology may be his most successful book. On the Economy of applied in greater depth to realize signif-

Machinery and Manufactures (1832) . However, icant economic benefits. it took Taylor to "rediscover" and implement practical scientific management in America.

123 The role of the data center in the orga- These data centers are usually per- nization to which it belongs has increased ceived by its users as being poorly run be- from simple payroll and accounting applica- cause all variances from negotiated response tions in the 60' s (remember "tab runs"); times are attributed to the inefficiency of through inventory and distribution systems the data center. When analyzing work, there in the 70's; to online, realtime management are in fact only three universal causes of and operational applications in the 80' s. A deviation from a standard which will cause good example of the online operational appli- response time to be better or worse than cation of the 80' s is the automated teller plan: systems in banks. In this case, the data

center has gone to a point where it actually 1 . Volume interacts with bank customers. 2. Mix 3. Efficiency As the data center evolves towards a utility-like process, the end product of the The financial term for these deviations data center is service. With the prolifera- is variance and simply means the difference tion of online systems, service has become between planned versus actual results. The highly visible to the user and a much more first two variances are attributable to user tangible element in top management consid- behavior. The third variance is the only eration. In fact, service levels have be- one that can be attributed to the operating come so visible to the world outside the of the data center. In essence, the plan- data center that, to a large extent, they ning and anticipation of the volume and mix are considered the measure of the data considerations (user behavior) are much of center's performance. The perception of what capacity management is all about. service falls into three basic categories^: In this paper, I will explore methods

1 . Online response time of determining the relationship of volume, 2. Batch turnaround mix, and efficiency on the current system 5. Availability and how we might predict performance at various levels. The goal will be to define When one of these is outside a user's performance curves giving standard levels of expectation, the DP manager's phone begins service at varying levels of volume and mix. to ring with complaints. Moreover, the busi- ness environment in which the DP manager now Volume is a measure of activity in lives is one that expects him to manage his terms of jobs or transactions by job or operation the same as any other functional transaction type. The various jobs or unit of the enterprise. At this point, many transactions in types result from the dif- DP managers are beginning to feel the strain ferences in computer resources consumed to of trying to negotiate and maintain service process that particular job or transaction. levels without the benefit of having fully Differences in volume can be measured by' implemented traditional scientific manage- comparing forecast or plan to actual. For ment principles in their data centers. example, the running of Job A123 resulted in the following volumes: One major element of scientific management is the need to understand the elements of PLAN TIMES ACTUAL TIMES VARIANCE the service to be performed and the vari- JOB RUN RUN (PLAN -ACTUAL) ables that can affect them. These variables have one of two sources: data center AI23 <2> operation or user behavior. There are many data centers that have not yet quantified The volume variance in this case is ex- the difference in, say, response time caused pressed as 2 jobs over plan. In terms of by data center inefficiency as opposed to resources expressed in standard work units-^, that caused by user behavior. This is it is 2 times planned resource per run of because the analysis of work has not been Job A123' Let's say Job A1 23 should use: related to these two factors.

^These were the first three items listed in a poll done in Arizona with 150 DP managers. Appendix A contains a full list prepared during the 1981 EDP Performance Management Conference.

^For this example, a standard work unit will be considered for the case of CPU only. The standard work unit is basically CPU time factored for the relative power of the CPU.
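As a hedged illustration of the footnote's definition, the sketch below weights CPU seconds by a relative power factor so that work from different machines can be added. The power factors and CPU times are hypothetical; an installation would take its figures from its own job accounting data.

    # Hedged sketch: a standard work unit (SWU) as CPU time weighted by a
    # relative CPU power factor, so work from different CPUs can be added.
    # The power factors and CPU seconds below are made-up illustrations.

    relative_power = {"cpu_a": 1.0, "cpu_b": 2.5}   # hypothetical power factors

    def standard_work_units(cpu_seconds: float, cpu_model: str) -> float:
        """CPU time factored for the relative power of the CPU it ran on."""
        return cpu_seconds * relative_power[cpu_model]

    # A job that used 100 CPU seconds on the faster machine represents the
    # same standard work as 250 seconds on the reference machine.
    print(standard_work_units(100.0, "cpu_b"))   # 250.0
    print(standard_work_units(250.0, "cpu_a"))   # 250.0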

124 Units Planned SWU's divided by planned work- (000) load is in fact the planned rate of SWU consumption for a job from this work-

CPU standard work units = 170 load .

I/O * 3' The volume variance is then a function of the jobs run over or under plan times Memory ^ the planned resource (SWU) consumption rate. This isolates those resources Total standard work units = 170 consumed over or under plan as a result of users running a different number of Therefore, the volume variance is 2 jobs than planned. times 170,000 SWU's units or 340,000. This may also be expressed in terms of dollars if 4. The numeric calculation of volume vari- a standard cost per SWU's can be calculated ance is then: and applied.

Now let's take an example where there <2> X ^11^ = <392.8> is a volume and mix variance. Let's suppose that we have 22 jobs that require a total of 4,320,000 SWU's (an average of 196,364 per job). The mix variance is calculated^: Planned SWU's The actual activity turns out to be 24 1 . The planned SWU rate ( Planned Workload jobs requiring a total of 5,760,000 SWU's for a variance of 1 ,440,000. The matrix of was just calculated for use in the vol- the above with additional information is ume variance. then constructed as follows: 2. The actual SWU rate is the key element in the calculation of the mix variance because the difference between the (000) planned and actual SWU rates is a result JOB P A v WORK UNITS P A V of a workload with different elements.

A123 4 6 <2> 170 680 1020 <340> B357 8 3 5 80 640 240 400 3. The difference in the SWU rates is multi- C896 10 15 <5> 300 3000 4500 <1500> plied by the actual workload. Totals 22 24 <2> 4320 5760 <1440>

The volume variance is calculated: 4. The numeric calculation of this example is then: Variance of job runs Planned SWU's X Planned Workload Volume Variance

4 In this case, the SWU's per job are not varied. As can be seen in the next example, they are varied and this will re- The volume variance is calculated: sult in an efficiency variance. The formula for calculating mix variance would be modi- Variance of Job Runs fied to reflect the fact that both mix and efficiency variance contribute to the X difference between the planned and actual SWU rate. The revised formula would then Planned SWU's be: Planned Workload Planned SWU's Actual SWU's Planned Workload Actual Workload Volume Variance X The variance of job runs is calculated by subtracting the actual total jobs run Actual Units (24) from the plan (24) with a resultant 2 jobs run over plan. The brackets in- Efficiency Variance = Mix Variance dicate that this variance will have an unfavorable impact on data center per- formance.


= MIX VARIANCE 1 . The total efficiency variance is a sum of the individual efficiency variances - ^) X 24 = <1047.2> calculated for each job run. 2. Each job variance is calculated by mul- The mix variance is the amount of stan- tiplying the actual runs of the job dard work units consumed over or under plan times the job's SWU variance (planned as a result of having different jobs run minus actual SWU's to process each run than were planned which, in turn, consumed of the job). different amounts of resources. The summary 3. The calculation would then be: of the variance is then: Actual Volume Variance <392.8> Times Efficiency Mix Variance <1047.2> Job Job Ran SWU Variance Variance A123 6 5 30 Total Variance <1440.0> B357 3 <20> <60> C896 15 <200> <3000> Now we can expand tlie same example to Total Efficiency Variance <3030> demonstrate an efficiency variance. Let's say that we have the following data: The new total variance summary is then:

Standard Volume Variance <392.8> Work Units Mix Variance <1047.2> Workload (OOO) Efficiency Variance < 3030.0 > <4470.0> Job P A V P A V This says that the data center perfor- Al 23 4 6 <2> 170 165 5 mance of running these jobs, especially Job C896, suffered either because of: B357 8 3 5 80 100 <20> 1. A bad application program, C896 10 15 <5> 300 500 <200> 2. Some system deficiency, 3. A bad estimate of what it would take to

Total 1 22 24 <2> run the job, or 4. A bit of all three. Total SWU' s (000) It should be noted that user behavior caused 32^ of the total variance. Often, P A V the user behavior element contributes even more to the total variance, especially at , 680 990 <310> peak periods.

640 300 340 Where can SWU's be obtained? Most computer systems have some kind of log that 3000 7500 <4500> accumulates resource usage. CPU time is the easiest measure, but relative CPU power dif- 4320 8790 <4470> ferences dictate that adding pure CPU time from different CPU's might be erroneous over In this case, the SWU' 's consumed per time. IBM has offered a solution with MVS job were different than planned. Since we by providing Service Units which are inter- have kept everything the same except the nally calculated. These Service Units are actual SWU's per job, the volume and mix indeed CPU time times an internal power variances are exactly as previously cal- factor which result in theoretically com- culated . patible service units over a variety of IBM CPU's. The efficiency variance is calculated by: The Institute for Software Engineering (Planned Job SWU^ - Actual Job SWU^ has also provided a great deal of literature X in this area which deals with Software Actual Jobsi = Efficiency Variance^ Physics. The direction of this work is some- + what similar to the service unit methods (Planned Job SWU^ - Actual Job SWU^) built into IBM and PCM systems, but is much X more complex. Actual JobSjj = Efficiency Variancej^
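The three variance formulas lend themselves to a short calculation. The sketch below is an illustration only, not the author's own tooling; it reproduces the worked example above, with SWU figures in thousands, and yields volume, mix, and efficiency variances of approximately <392.8>, <1047.2>, and <3030>.

    # Hedged sketch of the volume / mix / efficiency variance calculation
    # from the worked example (SWU figures in thousands). Negative values
    # correspond to the bracketed, unfavorable variances in the text.

    jobs = {
        #        planned runs, actual runs, planned SWU/run, actual SWU/run
        "A123": dict(plan_runs=4,  act_runs=6,  plan_swu=170, act_swu=165),
        "B357": dict(plan_runs=8,  act_runs=3,  plan_swu=80,  act_swu=100),
        "C896": dict(plan_runs=10, act_runs=15, plan_swu=300, act_swu=500),
    }

    plan_workload = sum(j["plan_runs"] for j in jobs.values())                 # 22
    act_workload  = sum(j["act_runs"] for j in jobs.values())                  # 24
    plan_swus = sum(j["plan_runs"] * j["plan_swu"] for j in jobs.values())     # 4320
    act_swus  = sum(j["act_runs"] * j["act_swu"] for j in jobs.values())       # 8790

    plan_rate = plan_swus / plan_workload     # planned SWUs per job run
    act_rate  = act_swus / act_workload       # actual SWUs per job run

    # Volume variance: jobs run over or under plan times the planned SWU rate.
    volume = (plan_workload - act_workload) * plan_rate                        # about -392.7

    # Efficiency variance: per-job SWU deviation times the actual runs.
    efficiency = sum((j["plan_swu"] - j["act_swu"]) * j["act_runs"]
                     for j in jobs.values())                                   # -3030

    # Mix variance (revised formula from the footnote): the rate difference
    # applied to the actual workload, less the efficiency variance.
    mix = (plan_rate - act_rate) * act_workload - efficiency                   # about -1047.3

    print(f"volume {volume:,.1f}  mix {mix:,.1f}  efficiency {efficiency:,.1f}")
    print(f"total {volume + mix + efficiency:,.1f}")                           # about -4470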

126 So far, the analysis of variance has the data center runs or not. Hence, it is focused on service units or the amount of more elastic as charging schemes and manage- work going through a data center. How do ment pressure are applied. these translate to levels of service? We are also talking about the necessity The data processing work plan basically for conscious management decisions regarding consists of putting out the required volume the economic benefits of achieving a given and mix of work as a basic requirement, and online response time versus the system costs on a timely basis as a second but equally that will be required to provide that level important requirement in most shops. We can of response. Or, as another case, the be sure that capacity has been exceeded when adding of another application to the online the data center physically cannot process system in light of its potential impact on the required workload even if it were to run the response time of current users and appli- a full three shifts per day, seven days a cations may or may not be possible under the week. But below this level, there are other current configuration because of the impact considerations that become practical limita- it will have on the service required for tions of available hours to do data process- other applications. This is where perfor- ing work. The variances analyzed earlier in mance management comes in, specifically, the this paper will affect the timeliness of the analysis of performance data. What we want data turnaround, especially in on-line sys- to know is which user-controlled variables, tems. Plans of user workload are especially i.e. , volume and mix, will impact service important during the peak online requirement level performance. Those of us who have that occurs during the normal five-day work been tracking this type of data over time week. This is especially important if the know what happens when a CPU gets over 30% data center is very involved in the basic busy and are well aware of exponential deg- business of the company with which it works, radation of service levels that occur beyond such as a bank or department store. this level. This is also true for many other areas within the system depending on What we're really talking about here where the system bottleneck is. Are there are the end user's work schedules and how tools that will identify such sensitive that impacts the ability of the data center areas in the system? If so, how might they to plan and deliver services matching their be applied? schedules as postulated in the User Behavior Elasticity Theorem ^; which states that the The software tools and sources of data degree to which the data center can influ - needed to perform this kind of analysis are ence end -user behavior is inversely propor- readily available. You probably have some tional to the degree that data processing of them installed on your system already. is involved in the basic business of the organization . This can be illustrated by When the data center has analyzed the two examples. effects of user behavior and workload, the next logical step is to relate them to the The first is in the banking industry capability of the system. Peak period per- where automated tellers are being imple- formance will probably be the main ingre- mented. The data center cannot influence dient in any service agreement. 
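The text does not commit to a particular functional form for this degradation. As one standard illustration, and not the author's model, a single-server M/M/1 queueing approximation shows the characteristic knee: mean response time grows roughly as the service time divided by one minus utilization. The service time used below is hypothetical.

    # Hedged illustration (not the author's model): a single-server M/M/1
    # approximation of how response time degrades as utilization rises.
    # service_time is hypothetical; the point is the shape of the curve,
    # with a pronounced "knee" well before 100% busy.

    def mm1_response_time(utilization: float, service_time: float = 0.1) -> float:
        """Mean response time for an M/M/1 queue: S / (1 - rho)."""
        if not 0 <= utilization < 1:
            raise ValueError("utilization must be in [0, 1)")
        return service_time / (1.0 - utilization)

    for busy in (0.3, 0.5, 0.7, 0.8, 0.9, 0.95):
        print(f"{busy:4.0%} busy -> {mm1_response_time(busy):.2f} s")

The performance curves the paper derives from measured data serve the same purpose without assuming a particular queueing model.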
The first the end -users of this application to any step in predicting system capability is to noticeable degree because the bank's basic know what the system can do now. By imple- business depends upon having the automated menting performance reporting, there will be tellers operating when its clients (i.e., data relating to user workload characteris- depositors, withdrawers, etc.) want to make tics. These can then be related to service a transaction. Chargeback schemes, manage- level achievement. ment pressure, and the like will have little or no effect. On the other hand, a company First, we must assume that the system producing buggy whips that has not inte- is properly tuned. A second assumption is grated data processing into its basic busi- that the workload has been shifted to the ness will tend to be much more flexible in extent possible (i.e. , batch work at night terms of end -user behavior because produc- so as not to interfere with online work). tion and distribution will continue whether At this point, the theorem of User Behavior Elasticity applies. The next logical step is to develop the operational standards of performance in terms of service levels based on the current configuration and end -user ^This is my theorem developed from service level objectives. personal observation and many discussions about chargeback systems.

127 .

In the case of predictive models, such levels may also be calculated. Both the questions as "What effect on service levels capital and service analysis will assist in will be experienced by adding another chan- establishing the financial requirements for nel or MSD device?" can be modeled and the the data center. results calculated against the current work- load. Future anticipated workload growth It is important that the model be fed can also be modeled to see future service from an historical data base fed by the level achievement with the current system. various system monitors so that workload Shifts in volume and mix as compared to data can be easily integrated. This may be plan, when modeled, will illustrate service used to define current standards of perfor- degradation caused by user behavior. mance as well as defining trends for pre- dictive modeling. Alternative hardware and software options may be considered to find what is The model should be economical to run, needed to maintain negotiated end -user that is, it should not consume much computer service level requirements. Additions to time. Since this process will be inter- the current system or other alternative active and many passes of the model will be systems may also be modeled to measure the required to determine the new performance impact curve, each pass should run in a couple of minutes or less. The major characteristics of a good predictive model are the following: And finally, the output must be easy to interpret and readily presentable for manage

1 . Results can be easily validated raent reporting. Again, a business analyst 2. Easy to use should be able to interpret the output and 5. Will distinguish the various hardware be able to input various alternatives as a and software characteristics of avail- result of the output from any given pass of able products the model. 4. Can easily integrate user historical data to define workload characteristics The objective of this measuring, ana- 5. Economical to run lyzing, and modeling will be to derive per- 5. Easy to interpret reports formance curves that can become operational standards of performance that show the rela- The first and foremost requirement is tionship of service level achievement versus that the model be easy to validate. The user behavior. Exhibit 1 shows the process value of a predictive model is in its pre- involved in establishing such standards. It diction! is an interactive process that involves tuning and end -user negotiations until The second and very important require- finally an agreed standard of performance ment is that the model be easy to use. This has been set. However, the standard is will reduce the need for highly technical dynamic. Exhibit 2 is a generalized per- systems people or, at the very minimum, a formance curve with user -controlled vari- mathematical theoretician to use the prod- ables along one axis and service level per- uct. Rather, the model should be usable by formance along the other. User-controlled a trained business analyst. Most errors variables are, in fact, the various levels made by models are created from erroneous of volume and mix for each of the various input. Complicated models tend to be re- categories of work. splendent with such opportunities for error. In this case we will discuss TSO trans- The predictive model ideally will easily action response as a function of the volume incorporate hardware and software alterna- of user activity. The data for this graph tives available to the data center, so that may be obtained from CMF, RMF, or SMF. In the analyst can play "what if?" games. That the case of a DOS or in non-IBM environ- is, various configurations can be matched ments, the system log that records the against various workload and service-level volume of activity and system resources requirements to determine which configura- consumed should provide the necessary data. tions would be optimal. This type of anal- Exhibit 3 is an example of a System Workload ysis lends itself extremely well to the Summary^ from which the following data can decision-making needed in the process of be obtained: appropriating capital goods (i.e., data center hardware or software). From this data, various suitable configurations can be selected and the cost effectiveness of each "Exhibits 3, 4 and 5 were produced by evaluated. The cost of various user service CMF for an IBM MVS system.

1. Volume of transactions by performance groups
2. Service units used in each system area

Exhibit 4 is the CPU Utilization Report which shows:

1. CPU busy data
2. CPU queue data

Exhibit 5 is a TSO Subsystem Performance Report which shows:

1. TSO response by time period
2. TSO response by command
3. Concurrent TSO users

The above data was fed into the SAS statistical program to determine which data showed a relationship between TSO response and another variable. The variable with the closest relationship turned out to be the CPU queue time,* which is directly affected by user volume and mix. In the case of our system, we are currently CPU-bound, so this is not a surprising bottleneck. As you can see in Exhibit 6, a linear regression line has been drawn with the standard error shown as a dotted line on either side. In this case the standard error is 1.1 seconds on either side of the regression line. Basically this says that two-thirds of the time, observances of TSO response versus CPU queue time will fall within the area bounded by the dashed lines.

The same analysis was done for TSO mix in Exhibit 7. This line turned out to be flat and was essentially meaningless because of the variation in data. Again, this corresponded to what we wish the system to do. TSO has a high priority, so even at high CPU busy levels, the TSO service should not suffer.

Exhibits 8 and 9 show the effect on TSO response caused by total service units consumed and concurrent users respectively. In both cases there is a direct relationship, even though the standard error is larger. As you can see, using historical data, system interrelationships with user behavior can be derived. However, this kind of data tends to be linear, and does not answer the question of how the system will react to user behavior at levels not yet experienced.

A third problem is that the system is never exactly the same from one period to the next. For this reason, it is extremely important to correlate performance data from the system to data logged from a change management tracking system. A change management tracking system is basically a problem reporting system for all hardware, software, and applications problems and changes made to the system. There have been many times that a one-byte code change on Sunday night caused a system to come to its knees on Monday morning with resultant days of analysis before the change was found. Each system change needs to be documented and available for analysis when performance data shows a major deviation. This analysis goes a long way towards explaining the efficiency variance we discussed earlier and should be mapped to each change in the performance of the system.

Using a Model to Plot a Curve

Because graphing historical data will not always answer the questions of future system behavior for future user workloads, a model is often used to simplify the process. By using a model, curves may be defined for each system limitation as well as an overall curve for the present system capability. This requires many iterations of a model to define the points on the curve and even more iterations to define other possible curves. It is also important to validate the model against actual system results. That means, if we pick a point on the performance curve and take the corresponding volume and response, how close does this match reality? In other words, will the point fall within the bounds defined in the linear projection and standard errors graphed in the analysis of historical data as was shown in Exhibit 8? If indeed the model can be validated and the results are consistent, we have in fact defined an "operational standard." "Operational standard" as used here means that there is a standard performance characteristic for a given level of user activity on the current system. This becomes an extremely powerful tool for negotiating service levels with users. A major misunderstanding with users can be avoided if they realize that for some levels of user activity, response will be lower. This is especially true if there are peak periods with extreme activity for short periods during the day and that activity causes the "knee" of the curve to be reached.

*As calculated by Little's Rule: the mean length of a queue is equal to throughput times queueing time.
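The regression and standard-error figures quoted above came out of SAS, but the computation itself is small enough to sketch directly. The program below is illustrative only: the data values are hypothetical stand-ins for observations of CPU queue time and TSO response reduced from CMF, RMF, or SMF, and the program and variable names are invented here, not part of any of those products.

      PROGRAM TSOFIT
C     FIT TSO RESPONSE (Y, SECONDS) AGAINST CPU QUEUE TIME (X, SECONDS)
C     BY ORDINARY LEAST SQUARES AND REPORT THE STANDARD ERROR OF THE
C     ESTIMATE, AS PLOTTED IN EXHIBIT 6.  DATA VALUES ARE HYPOTHETICAL.
      INTEGER N, I
      PARAMETER (N = 5)
      REAL X(N), Y(N)
      REAL SX, SY, SXX, SXY, B, A, SSE, SE, YHAT
      DATA X /0.5, 1.0, 1.5, 2.0, 2.5/
      DATA Y /2.1, 2.9, 4.2, 4.8, 6.1/
      SX  = 0.0
      SY  = 0.0
      SXX = 0.0
      SXY = 0.0
      DO 10 I = 1, N
         SX  = SX  + X(I)
         SY  = SY  + Y(I)
         SXX = SXX + X(I)*X(I)
         SXY = SXY + X(I)*Y(I)
   10 CONTINUE
C     SLOPE AND INTERCEPT OF THE REGRESSION LINE Y = A + B*X
      B = (REAL(N)*SXY - SX*SY) / (REAL(N)*SXX - SX*SX)
      A = (SY - B*SX) / REAL(N)
C     STANDARD ERROR OF THE ESTIMATE (THE DOTTED BAND IN EXHIBIT 6)
      SSE = 0.0
      DO 20 I = 1, N
         YHAT = A + B*X(I)
         SSE  = SSE + (Y(I) - YHAT)**2
   20 CONTINUE
      SE = SQRT(SSE / REAL(N - 2))
      WRITE (6, 30) A, B, SE
   30 FORMAT (' INTERCEPT =', F7.2, '  SLOPE =', F7.2,
     1        '  STD ERROR =', F7.2)
      END

Roughly two-thirds of the observations should fall within one standard error of the fitted line, which is the property the dashed band around the regression line in Exhibit 6 relies on.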

Exhibit 10 comes from an actual case where a factory made a union agreement to clock out all employees in ten minutes. This agreement brought the system to its knees. The exhibit shows a performance curve for a 370-148 which averages 2.0-second response time for about 10,000 transactions during an 8-hour period. If, however, 625 transactions come in between 4:00 and 4:10 and must be processed in the same two-second response time, a different system will be required. This is due to the fact that 625 transactions in a ten-minute period are equal to 30,000 transactions in an 8-hour period. A curve is drawn for a 3033 to illustrate the system expansion needed to accommodate the 10-minute traffic at 2-second response. In this case, the company management will have to weigh the service requirement against the additional investment. If user behavior is relatively inelastic, there will soon be a 3033 installed.

Exhibit 11 is a performance curve of the Boole & Babbage data center that was made as an example for a case study. This system is comprised of:

CPU: M80 (370-148) plus 6MB memory
Disc: 6x 3330, 8x STC 3630 (3350)
Tape: 3x STC 4534
Channels: 4 plus byte multiplexor
Operating System: MVS/JES2/TSO/SPF/VAM/CICS

Response time shown by the model degraded significantly after 20 concurrent users and 1.1 transactions per second. In further analysis of this case, we found that we are indeed CPU-bound. This means that as the CPU resource is consumed by a workload or variations in user behavior patterns, degradation of service will be a direct result. A 4341 Model 2 with expanded CPU capacity is on order and we hope to see some relief this fall when it is installed. The next step in this process will be to model the performance curve of the new system. We may then find some other system bottleneck such as I/O or DASD. In the meantime, we now have our operational standard of performance for TSO.

The Future of Performance Curves

The modeling of a workload against a given system is not only necessary, but is feasible with currently available tools and technology. The computation of and comparison to planned performance curves has value in determining:

1. Data center efficiency
2. The effects of user behavior (volume and mix)
3. Benefits of tuning
4. The ability of the data center to meet various levels of activity
5. The benefits of various system alternatives

Because of these benefits and the fact that future models will get even more involved in simulating operating system parameters (i.e., SRM under MVS), performance curves should be available on a realtime basis. This means linking the Change Management Tracking System and realtime system monitor/model so that early warning mechanisms can be implemented. Realtime monitors will in effect simulate system changes and signal when the performance curve has changed from plan. This will provide an effective tuning tool by displaying these changes much like the IPS parameters in which domains and multiprogramming levels are displayed in the IBM Initialization and Tuning Guide.

By monitoring the effects of user behavior, both on a continuous after-the-fact basis as well as in models, the data processing operation has implemented an important element of scientific management. The work analysis is done by the system itself while the manager deals with the question of user behavior and the integration of DP into the business.

APPENDIX A

MEASURES OF PRODUCTIVITY IN DP OPERATIONS

RESULTS OF VOTING*

MEASUREMENT FACTOR RANKING VOTES

ONLINE RESPONSE TIME 1 198

ON-TIME REPORTS 2 160

SYSTEM UPTIME 3 141

USER SATISFACTION 4 137

RERUN PERFORMANCE 5 101

REPORTS DISTRIBUTED W/O MISTAKES 6 56

NUMBER OF INTERRUPTS IN ONLINE SERVICE 7 55

PROBLEM RESOLUTION TIME 8 47

CPU UTILIZATION 9 45

COST OF OPERATION 10 43

ACTUAL VS. SCHEDULED RUN TIME 11 39

DEMAND BATCH TURNAROUND 12 34

RECOVERY TIME FROM FAILURE — 34

TIME SHARING INTERACTIVE RESPONSE — 30

LATE JOBS FAULT OF OPERATOR — 30

OUTAGES BY CATEGORY — 29

APPLICATION PROGRAM ABENDS — 27

NUMBER OF PROJECTS WITHIN BUDGET — 24

HARDWARE /SOFTWARE RELIABILITY, INTFAC — 23

NUMBER AND TIME OF TAPE MOUNTS — 20

JOBS COMPLETED PER COMPUTER HOUR — 19

PEOPLE TURNOVER — 16

ABSENTEE RATE — 15

*Based on a survey of a meeting of data center managers at the 1981 EDP Performance Conference in Phoenix, Arizona, February 23-26, 1981.

APPENDIX B

EXHIBITS

1. Establishing operational standards
2. Generalized performance curve
3. CMF system workload summary
4. CMF CPU utilization report
5. CMF TSO subsystem performance report
6. Linear regression - TSO response/CPU Q
7. Linear regression - TSO response/TSO mix
8. Linear regression - TSO response/SU's
9. Linear regression - TSO response/concurrent users
10. Performance curve for B/B DC
11. Performance curve for B/B DC

[The exhibit figures are not reproduced here; the legible captions follow.]

EXHIBIT 1. ESTABLISHING STANDARDS

EXHIBIT 4. CPU UTILIZATION REPORT (CMF sample output)

EXHIBIT 5. TSO SUBSYSTEM PERFORMANCE REPORT (CMF sample output)

EXHIBIT 10. TSO RESPONSE VS. CONCURRENT TSO USERS

APPENDIX C

A BIBLIOGRAPHY FOR FURTHER STUDY

1. Arnold O. Allen, Probability, Statistics and Queueing Theory with Computer Science Applications, Academic Press, New York, N.Y. (1978).

2. C. Warren Axelrod, Computer Effectiveness: Bridging the Management/Technology Gap, Information Resources Press, Washington, D.C. (1979).

3. L. Bronner, Capacity Planning: An Introduction, IBM Technical Bulletin GG22-9001-00 (January 1977).

4. L. Bronner, Capacity Planning: Implementation, IBM Technical Bulletin GG22-9015-00 (January 1979).

5. J. P. Buzen, "Queueing Network Models of Multiprogramming," Ph.D. Thesis, Harvard University, Cambridge, Mass. (1971).

6. Computer, April 1980, IEEE Computer Society (contains several articles of interest).

7. Peter F. Drucker, Management: Tasks-Responsibilities-Practices, Harper & Row, New York, N.Y. (1974).

8. Jeffrey L. Forman, Change Communication: A Management System, IBM Technical Bulletin GG22-9254-00 (July 1979).

9. An Architecture for Managing the Information Systems Business, Volume I: Management Overview, IBM GE20-662-0 (January 1980).

10. Problem and Change Management in Data Processing - A Survey and Guide, IBM GE19-5201-0 (August 1976).

11. IBM Systems Journal, Volume Nineteen, Number 1, 1980, "Installation Management, Capacity Planning."

12. H. Kobayashi, Modeling and Analysis: An Introduction to System Performance Evaluation Methodology, Addison-Wesley, Reading, Mass. (1978).

13. J. D. C. Little, "A Proof of the Queueing Formula L = λW," Operations Research 9, pgs. 383-387 (1961).

14. J. Martin, Design of Real-Time Computer Systems, Prentice-Hall, Englewood Cliffs, N.J. (1972).

15. J. Martin, Systems Analysis for Data Transmission, Prentice-Hall, Englewood Cliffs, N.J. (1972).

16. Montgomery Phister, Jr., Data Processing Technology and Economics, Santa Monica, Calif. (1977).

17. Charles H. Sauer and K. Mani Chandy, Computer Systems Performance Modeling, Prentice-Hall, Englewood Cliffs, N.J. (1981).

18. David R. Vincent, "Software Tools for Service Level Management," Data Management, pgs. 25-29 (March 1981).

19. David R. Vincent, "Measuring Performance Online," ICP Interface, Data Processing Management (Summer 1980).

20. David R. Vincent, "Service Level Management," 1980 CMG XI Proceedings, pgs. 196-207.

21. Daniel A. Wren, The Evolution of Management Thought, John Wiley & Sons (1979).


SESSION OVERVIEW BENCHMARKING

Barbara Anderson

Federal Computer Performance Evaluation and Simulation Center Washington, DC

Benchmarking is a method of measuring system performance against a predetermined workload. The approaches and tools used to represent this workload in benchmark tests have been evolving and improving over the last several years. The first two papers in this session, 1) "Universal Skeleton for Benchmarking Batch and Interactive Workloads" by William P. Kincy, Jr., and Walter N. Bays of the MITRE Corporation, and 2) "Comparing User Response Times on Paged and Swapped Unix by the Terminal Probe Method" by Dr. Luis Felipe Cabrera of the University of California, Berkeley, and Dr. Jehan-Francois Paris of Purdue University, present new approaches to constructing benchmark tests. The final paper, "Practical Application of Remote Terminal Emulation in the Phase IV Competitive System Acquisition," by Deanna J. Bennett of the Phase IV Program Management Office of the Air Force, and the mini-panel with Randy Woolley of the Tennessee Valley Authority and Robert Waters of the Air Force Computer Acquisition Center, detail recent experience in various aspects of benchmarking for system acquisition.


UNIVERSAL SKELETON FOR BENCHMARKING BATCH AND INTERACTIVE WORKLOADS* (UNISKEL BM)

Walter N. Bays

William P. Kincy, Jr.

MITRE Houston, TX 77058

The most important objective of the acquisition of new computer systems is to acquire the least expensive system or systems which have the necessary capability to meet workload and quality of service requirements.

The determination of "the necessary capability" may be accomplished by several methods; the most important and accurate method being the execution of performance benchmarks in a "Live Test Demonstration (LTD)".

Building benchmarks is time-consuming and costly. This is particularly true if the benchmarks are designed to represent accurately the batch and interactive workloads to be executed.

One solution would be a benchmark which could be adapted to represent any workload requirements relatively inexpensively. Such a benchmark (UNISKEL BM) is the subject of this paper.

1. Introduction

1.1 Background

UNISKEL BM was developed for NASA's Johnson Space Center, Houston, Texas by MITRE. The primary objective of the benchmark effort was to design representative benchmarks which could be used to select new computer systems to replace obsolete systems in the JSC Central Computing Facility (CCF). The CCF is a general purpose, open shop-type computing facility supporting the administrative, shuttle mission planning and shuttle engineering and development functions at JSC.

An additional objective of the benchmark effort was to emphasize, where possible without adding to the cost of the development effort, the universal applicability of the benchmark to any agency having similar benchmarking requirements. These goals led to the design of a benchmark skeleton which could be used by any agency in custom designing a benchmark. In fact, this benchmark skeleton was used by MITRE to build benchmarks for the Bureau of Census to be used in their acquisition of the Bridge System.

1.2 Use of UNISKEL BM

The benchmark skeleton described in this paper provides a procedure for integrating programs (SYNTHETIC TASKS) and descriptions of functions to be accomplished by system software (editor, utilities, etc.) (SYSTEM PROCESSOR TASKS) into job streams.


These job streams are further organized into benchmarks representing specific workloads to be executed by the System Under Test (SUT). These synthetic and system processor tasks may be interactive or batch and may be designed to represent a real workload (one being executed on an agency's system today) or a perceived workload (one expected to be executed on new systems in the future).

A model of any real workload may be derived from accounting data. Although this workload inevitably has the characteristics of the system it was executed on, UNISKEL BM represents it as a mix of vendor-independent FORTRAN or COBOL tasks, and functionally specified system processor tasks. The resulting benchmarks are independent of a vendor's accounting convention and include a mix of tasks representing an explicit amount of work to be accomplished. The work to be accomplished is described by the following:

SYNTHETIC TASKS:
Virtual Storage Size
Number of kernels to be executed
Number of accesses to tape/disks
Record Size
Number of lines of print
Think time (Interactive Tasks)

SYSTEMS PROCESSOR TASKS:
Functions to be performed

The major advantage of UNISKEL BM is that an agency procuring new systems can save the time required to design synthetic programs and a benchmark structure which can be easily converted and executed on a proposed system or configuration of systems. UNISKEL BM also includes a Live Test Demonstration (LTD) procedure which may be tailored to an agency's requirements. This allows the agency to concentrate efforts on 1) building a model of the future workload and plugging the workload characteristics into an existing benchmark framework (this can be accomplished manually), 2) defining the quality of service desired in new systems (in terms of response times for interactive work and overall batch capacity), and 3) testing the benchmarks to assure that the benchmark represents the desired workload.

UNISKEL BM would not be appropriate for an agency buying a minicomputer or desiring only to measure the speed of a CPU when executing specified instruction mixes. It is appropriate for mainframe competitive procurements where the objective is to procure balanced systems to process a known workload under defined quality of service objectives.

1.3 Scope of Paper

Section 2 describes the UNISKEL BM concept including a brief description of the workload model which provides input data to UNISKEL BM. Section 3.0 describes the UNISKEL BM structure and provides a brief overview of its generation and execution. Appendix A describes the procedure a vendor would follow to conduct a Live Test Demonstration (LTD) using the benchmarks. This example procedure is extracted from NASA/JSC's documentation supporting their acquisition action. Appendix A is not attached to this paper. It will be provided separately to anyone having an interest in that level of detail.

2. UNISKEL BM Concept

2.1 Design Objectives

The UNISKEL BM was designed 1) to be as independent as possible from any vendor's system, 2) to minimize biases for or against any particular computer system architecture, and 3) to minimize a vendor's time and effort in support of the LTD.

A study of the architectures indicated that a benchmark which meets the following criteria meets the design objectives:

• Easily transportable
• Representative of the expected workload as pertains to:
  • Workload characteristics
  • Virtual main storage requirements
  • "Locality of reference" profile. (Locality of reference pertains to the way a program proceeds through its instructions and data, e.g., sequentially, randomly or some other profile.)

The last criterion, if not representative of the real world, could impact performance depending on whether the system being benchmarked is virtual or non-virtual in its storage management. These criteria are addressed further in the following paragraphs.

2.2 Structural Overview

UNISKEL BM is composed of synthetic programs, data to be processed and descriptions of functions to be accomplished by systems processors/utilities. The

1U4 e : :

synthetic programs have been designed to be Scripts 1 through N in Figure 2 repre- fully transportable to any system having an sent terminal users 1 through N. These ANSI/78 FORTRAN compiler. The synthetic emulated terminal inputs cause interactive program can be made to represent different tasks (synthetic and systems processors) to sizes of batch or interactive tasks. The be executed. These inputs are delivered to synthetic tasks are characterized by the the system being demonstrated at rates which number of cycles of CPU kernels executed, are representative of the line speeds of the the number of I/O accesses to disk and tape, real communications. the number of lines printed and the voluntary wait (think) time between The prime and non-prime shift interactive activities. The structure of a benchmarks may be generated in various sizes synthetic task is illustrated in Figure 1. representing different sizes of the representative workload. For example, in NASA/JSC's acquisition specifications, one SYNTHETIC PROGRAM DYNAMIC size of computer system that was acceptable CONFIGURATION was a system which had the equivalent PARAMENTERS capacity of two UNIVAC 1108's. Conse- DATA ARRAY quently, benchmarks would be executed which (Data Space represented the prime and non-prime shift Si ze workloads of two llOB's under acceptable CPU Seconds quality of service objectives. A proposed I/O system of the proper size would have to MULTIPLE Record Length demonstrate that capability by executing the COPIES OF Print Lines benchmarks within the threshold elapsed OF KERNELS Local ity times and response times specified in the

Prof i 1 specifications.

( Instruction Think Time Space Size) Different sizes of benchmarks are CONTROL LOGIC generated by replicating a base size. For example, the base sizes used by NASA/JSC are Figure 1. Synthetic Program Structure shown in Table 1. These sizes were defined as having a Performance Rating of 0.5. The Performance Rating (PR) is a measure of a The systems processors/utilities system's throughput when executing these functions to be executed are also organized benchmarks. For example, any system which into tasks (by the demonstrating vendor). can complete these PR 0.5 workloads in the These tasks utilize the systems processors specified elapsed time is considered to be of the System Under Test (SUT). at least a PR 0.5 system for the purposes of the acquisition action. These synthetic and system processor tasks are organized into jobs or runstreams. The jobs are further organized into bench- Table 1. Benchmark-Base Sizes PR 0.5 marks, one representing a prime shift workload and one representing a non-prime BENCHMARK PRIME NON-PRIME shift workload. The non-prime shift bench- CHARACTERISTICS SHIFT SHIFT mark is batch only. The prime shift BATCH INTER- BATCF benchmark contains interactive activities ACTIVE and background batch jobs. The prime shift Program Size, benchmark is designed to be demonstrated 36-bit words. utilizing a Remote Terminal Emulator (RTE). Minimum: lOK lOK lOK An RTE is illustrated in Figure 2. Maximum: 76K 45K 125K I/O Accesses: To Disk 29K 90K 22K To Tape: 19K 0 22K SCRIPT 1 Jobs 8 NA 20 COMM REMOTE Interactive Users: NA 8 NA HOST FRONT TERMINAL Jobs Using Tape: 1 0 1 END EMULATOR Tasks Executed: 94 628 289 SCRIPT N Files: 8 80 220 Elapsed Time in -SUT- Minutes <--- .200 > 210

Figure 2. Remote Terminal Emulator

145 .

Different sizes of benchmarks are For example, a mul ti -processor generated by replicating these PR 0.5 sizes. configuration, or a loosely coupled con- For example, a system being proposed as figuration of more than one system utilizing having a Performance Rating of 6.0 would one input stream for batch, proposed as demonstrate that capability by executing having a PR7 capability must demonstrate benchmarks of size PR 6.0. These sizes that capability by executing a size PR7 would be generated by adding 12 copies of benchmark. A configuration of more than one the PR 0.5 sizes together. The PR 6.0 size uncoupled systems can also be accommodated. non-prime shift benchmark would consist of This is shown in the following example. 240 batch jobs (20 batch jobs x 12 = 240). The prime shift benchmark would be generated the same way and would consist of 96 LTD REQUIREMENT: TWO UNCOUPLED SYSTEMS interactive users and 96 background batch HAVING TOTAL CAPACITY = PR 7.0 jobs. SCORING EQUATION: 2.3 Scoring and Evaluation CAPACITY SIZE N + CAPACITY SIZE M .EG. The benchmarks are designed for ease of DESIRED CAPACITY scoring and of evaluation. Two measurements are used to evaluate performance. These are CAPACITY (SIZE N OR M) = the elapsed time required to accomplish a well-defined, finite batch or interactive ELAPSED TIME THRESHOLD workload (the selected sizes of the X SIZE benchmarks), and the response times for the MEASURED ELAPSED TIME emulated interactive users at terminals. For a system being proposed as having a For example: A batch benchmark of specified performance rating, the benchmark sizes PR 3.0 and PR 4.0 were demonstrated on elapsed time, must be equal to, or less two uncoupled systems to meet a capacity than, the elapsed time threshold contained requirement of PR 7.0. Actual elapsed times in the specifications. In addition, a were 200 minutes for the PR 3.0 LTD and 215 specified percentage of all interactive minutes for the PR 4.0 LTD. The elapsed transactions, of the same type, must have a time threshold was specified as 210 minutes. response time equal to, or less than, the Using the above equations, the actual response time goals specified. capacities would be rated as follows:

This method of evaluation is based on 210 minutes 210 minutes the concept of testing a base size of bench- X 3 + X 4 mark on a known system and determining the 200 minutes 215 minutes thresholds (elapsed time and response times) of acceptable performance with the represen- = Combined Capacity tative workload. Based on this, and the knowledge of how much capacity would be 3.15 + 3.91 = 7.06 OK required in the known system to meet the requirements new systems are being procured Size restrictions should be placed on for, a Performance Rating requirement can be the demonstrator during the LTD. For determined example, it may be desirable to limit the capacity rating attributed to a system to For example, the NASA/JSC benchmarks of the size of the benchmark executed. In the size PR 1.0 were tested on JSC's 1108's. example above, a rating of greater than 3.0 The size of the benchmark and/or the elapsed would not be allowed because a PR 3.0 times and response times were adjusted so benchmark was executed. To receive a 3.15 that the PR 1.0 size could be executed on an rating would require the execution of a 1108 with acceptable quality of service. benchmark of PR 3.5 size. This agency- The future workload to be executed on the imposed restriction recognizes that the new systems was determined to be equivalent relationship between the size of the work- to N 1108's. Consequently, systems having a load and the elapsed time is not linear. Performance Rating of N are to be procured. A system executing a benchmark of PR N size, The prime shift benchmark LTD would within the specified elapsed and response also include an evaluation of response times, has N times the capacity of an 1108 times. The actual LTD response times would when executing the benchmark. have to meet response time criteria as specified in the statement of work.

1U6 :

Typical response time specifications item (E&D-l, etc.) will become one synthetic might be: task with the characterstics shown. These individual component models (there is one 1. Of the M different types of for each marked entry in Table 2) are transactions, N must meet the following normalized in a combined model of individual criteria synthetic or system processor tasks. Several pieces of the prime shift model are a) 85% of executions of a shown in Table 4. transaction must have a response time of equal to or less than the response time threshold (RTT), plus Table 2. Composition of CCF Workload

b) 98% of executions of a transaction must have a response time of COM- MAJOR PROGRAMS/ BATCH DEMAND TOTAL equal to or less than two times the RTT. PONENTS PROCESSORS

2. The balance of transactions SVDS SVDS, SVDSl, 15%* 11%* (M minus N), must meet the following SVDS2, SVDS3 criteria: NASTRAN LINKl, LINK4, 1%* 1%* LINK 8 a) 50% of executions of a E&D ABSLB,TRASYSP, 20%* 15%* transaction must have a response time of SSFS, ETC. equal to or less than the response time PLOTS V8 PLOT, V PLOT 5%* 4%* threshold (RTT), plus TRWPLT SYS b) 98% of executions of a PROC COB, FOR, ASM, ED, 23%* 59%* 32%* transaction must have a response time of MISC. OPEN SHOP, equal to or less than two times the RTT. OPS, MNGMT. 36%* 41%* 37%*

Note that each different type of TOTAL 100% 100% |100% transaction has its own RTT. BATCH/INTERACTIVE An agency would probably allow no more (DEMAND) SHARE: 75% 25% than 15% of the transaction types to fall DISK/DRUM ACCESSES into the criteria described in "2." above. PER CPU+I/0 SECS. 16.5 24.2 TAPE ACCESS PER CPU+I/0 4.0 3.2 2.4 Workload Representation SECS. 10 DIVIDED BY CPU 1.38 2.89 UNISKEL BM represents any workload which can be described at the task level in Model Components the following terms:

BATCH OR INTERACTIVE TASKS: Table 3. E&D Batch Model VIRTUAL MEMORY SIZE CPU SECONDS TO BE BURNED NUMBER OF I/O ACCESSES TO DISK AND PROGRAM PROGRAM CPU I/O ACCESSES FREQ. TAPE SIZE K SUP

NUMBER OF PRINTED LINES OF OUTPUT WORDS SEC DISK . TAPE LOCALITY OF REFERENCE-INSTRUCTIONS AND DATA E&D-l 63 27 1101 1 THINK TIME FOR INTERACTIVE TASKS E&D-2 63 61 405 3 E&D-3 66 354 2868 1 SYSTEMS PROCESSOR/UTILITIES TASKS: E&D-4 72 166 3512 2 STATEMENT OF FUNCTIONS TO BE E&D-5 72 168 14562 3108 1 ACCOMPLISHED E&D-6 72 5 492 5 E&D-7 76 385 770 1 These workload characteristics are stated in terms acceptable as input to the TOTALS UNISKEL BM. Such a workoad model is Each 30000 3108 14 illustrated in Tables 2 through 4. Table 2 SUP SECONDS: 1474 818 101 represents an overall workload on one or more systems. Table 3 represents one TOTAL SUP SECONDS = 2393 lO/CPU = 0.62 component of the overall model. Each line

147 . .

Table 4. PRl Prime Shift until the 100 ms and the 10 I/O's had been Workload Model completed.

The kernel selected could be sub- PROG. PROG. CPU SUP I/O ACCESSES FREQ*THINK routines of applications in the expected SIZE SECONDS TIME workload or could be "canned" instruction or DISK TAPE operations mixes. NASA/JSC and the Bureau of Census elected to use primarily sub- SYSTEMS PROCESSORS - DEMAND: routines of real applications to enhance

represent at i veness ED-1 5 1.456 107 11x28 8 ED-2 5 1.743 129 65x9 8 The kernels selected to burn CPU time ED-3 5 .801 73 245x12 8 also are involved in the "locality of reference" capability of UNISKEL BM. The size of an activity or task is made up of MISCELLANEOUS DEMAND^ instructions and data. The instruction size comprises library routines, synthetic MISC-1 5 .028 56 2 9x28 1 program logic, and the multiple kernel MISC-2 5 .006 6 9x6 3 copies. The data arrays also contain MISC-3 5 .020 20 52x20 11 records which vary in number based on the desired total size of the activity. A synthetic task representing a real task of BALKbKUUINU bAILH: 20,000 words in size could be generated as approximately 10,000 words of instructions SVDS-A 68 213.0 3076 1x0 and 10,000 words of data. The instructions SVDS-B 75 17.0 1027 1x0 might include 2,000 words of overhead SVDS-C 75 21.0 481 1x0 instructions and 80 copies of a kernel con- taining 100 words each (8000 words). The number of copies of the kernel depend on the PRIME SHIFT TOTALS"! calibration of the kernel selected and the CPU DISK TAPE FREQ. size to be represented. The number of data records would be selected the same way. Each: 234428 36968 6180 SUP SECONDS: 3208 6389 1202 The multiple copies of the kernel and the data records provide two functions. TOTAL SUP SECONDS: 10823 lO/CPU = 2.37 First, they allow the selection of a kernel copy and data record for execution based on Disk Accesses/Second = 17. 3 a distribution or profile which is similiar Tape Accesses/Second = 3. 7 to that locality of reference profile shown by the real application. For example, based on a profile entered into the synthetic pro- * Transactions X Subtransactions gram, different copies of the kernel and the data records will be selected each time the CPU is to be utilized. Since the synthetic In addition to the workload activity is progressing through its instruc- characteristics shown on the previous tions and data similiarly to the real appli- tables, the CPU kernels to be used to burn cation, no bias is introduced for either CPU time and the locality of reference "paging" or "swapping" memory management profile must be selected. An agency has a systems. The locality of reference mechan- choice of using multiple copies of any ism also assures that CPUs with cache memory subroutine or instruction mix (kernels) to have realistic "hit ratios." burn CPU time. The major problem presented by this These, kernels are calibrated (CPU capability of UNISKEL BM is determining what milleseconds per cycle and size). "N" the locality of reference profile is for the cycles of the kernel are executed to burn real application being represented. A the desired amount of CPU time. A task (the special sampling routine. Code Interruption synthetic program in the benchmark) burning Analyzer (CIA) was developed to assist in 100 milleseconds of CPU and doing 10 this determination for the NASA/JSC accesses to disk per execution, would do an benchmark I/O, burn 10 ms of CPU time by executing, for example, 5 cycles of a 2 ms kernel, do Second, the use of kernel cycles and another I/O, burn another 5 cycles, etc. similiar blocks of data for each I/O, main-

US .

tain the independence from the system upon assuming the workload and locality of which the benchmark is developed. Note that reference characteristics are available, the method of accounting for use of re- represent a significant task both in terms sources, seconds, standard units of proc- of human resources and computer time. The essing, machine resource units, etc., is generation of these benchmarks by a new immaterial as the work executed is stated in using agency is the subject of paragraph 3. terms of number of kernel cycles for CPU time, number of I/O accesses to various It should be noted that although devices, elapsed wall clock time for the significant effort is required, the human total benchmark and response time for inter- and computer resources required to build active activities. This independence is benchmarks using UNISKEL BM is probably an furthered by allowing the LTD vendor to order of magnitude less than starting from compile the benchmark allowing program sizes scratch and building benchmarks utilizing to vary with word size and the instruction synthetic programs. packing capability of the compiler. For example, an activity within the benchmark 3. Benchmark Generation may be representing object code, on a UNI VAC system, of 30,000 36-bit words containing This section describes how an agency instructions and data. This same activity, uses the UNISKEL framework to build a after being converted and compiled on a benchmark. Each step of the process is CYBER system, could be only 18,000 60-bit discussed briefly; then automatic generation words in size. This is acceptable for tools are discussed. Figure 3 illustrates UNISKEL BM. the benchmark generation process.

2.5 Other Features

UNISKEL BM is a framework within which all types of workload can be represented. For example, data base management systems and transaction processors in the SUT could be exercised by jobs designed to func- tionally exercise those processors. To date, neither of the users of this framework have had a requirement to exercise such capabi 1 i ty

There is also a framework for meeting floating point accuracy requirements. This is accomplished by the selection of the kernels to be executed. In the NASA/JSC application, some kernels had a requirement for 14 decimal digets of accuracy. This would require some systems to execute those kernels using double precision.

Special kernels could also be designed to exercise special capabilities required in systems to be acquired. For example, kernels could be designed to verify special arithmetic or logical capability of a proposed system.

2.6 UNISKEL Limitations

The primary limitations on UNISKEL BM are two. First, program sizes of less than 10,000 36-bit words cannot be represented Figure 3. Benchmark Generation due to the overhead associated with the synthetic program. We found this not to be a problem in the NASA/JSC benchmarks. The first step is to determine from the workload model which parts of the benchmark Second, the generation and testing of will be represented by synthetic tasks and new benchmarks utilizing UNISKEL BM, even which by system processor tasks. As

149 discussed, a system processor task is the transactions per interactive execution and execution of a standard system program to transaction think time. The agency must accomplish specific work. A synthetic task estimate any missing factors, then develop is the execution of a synthetic program to by trial and error a script which consumes consume the same resources as a user written the proper resources and seems "reasonable" program. for the real user activity being emulated.

In the case of the non-prime shift In a competitive procurement these workload model presented in section 2.4, 23% scripts must be designed with conversion in of the work is accomplished by system mind. When writing a script for system A, processors, and 77% by synthetic tasks. how the same function would be accomplished These percentages apply only to the baseline on system B must be considered. Then the system, as system processor functions may function must be clearly specified in such a require different resources on different way that vendor B will accomplish it in a systems. For example, a text editor command realistic way, neither taking unfair requiring 80 Service Units to execute on an advantage nor being unfairly prevented from IBM system may require 0.1 SUP seconds to using a special feature of system B. For execute on a UNI VAC system. This is based example, if a vendor has a text editor on differences in accounting methods, system feature of "change string, and list changed speeds, operating systems, and software line", the agency should not force him to efficiency. There may also be a difference use two seperate commands "change string", between systems for synthetic tasks due to and "list changed line". If a vendor has a compiler efficiency, etc. These differences single editor command that accomplishes all are not undesirable, as the procuring five steps of the above example, he should agency's requirements are not for a certain not be permitted to use it, since a user amount of service units to be consumed, but would be unlikely to work that way. Command are for a certain amount of work to be stacking, entering multiple commands on one accomplished in a specified amount of time. line, may be permitted, however.

Utility tasks such as text editing, 3.2 Synthetic Tasks file copying, and compilation are best represented by system processor tasks. Building synthetic tasks requires These tasks are more expensive than several steps. First, CPU burning kernels synthetic tasks for the agency to develop are chosen. A kernel is a subroutine that and for the vendor to convert, but more does a certain amount of work; that is, on accurately represent the real workload and the baseline system it burns a calibrated differences between competing system amount of CPU time. Then the synthetic software. Some utility tasks and all user program sizes and composition are programs will be represented by synthetic determined, and the locality of reference tasks. model is incorporated if desired. Then the synthetic programs are compiled and linked. 3.1 System Processor Tasks A synthetic task is an execution of a syn- thetic program with parameters which select A system processor task is constructed CPU time, number of I/O's to disk and tape, from a functional description (script) of a disk file size and record length, number of sequence of operations to be performed. lines of print, and frequency of locale This script, commonly written in English, is change. The selection of a synthetic translated into JCL for a particular system. program to execute for a task selects the If the task is to be interactive, think kernel type (and therefore the instruction times are also specified. For example, the mix) as well as the memory size. These script for a text edit system processor task steps are described in more detail below. may be: 3.2.1 Synthetic Program Structure Change all occurrences of ENT to ENTR. List lines 12 through 50. Figure 4 shows the overall structure of On line 50, change 7.0 to 7.5. the synthetic program. For the data array, On line 48, change RTRY to RETRY. there is simply a declaration of an array in List the last line of the file. blank common. Also contributing to the data space size (D-size) of the program are the The workload model includes frequency local data for each kernel and for the of execution and resources but due to system control logic. accounting limitations may omit number of

150 . .

the execution of a single transaction without the RTE dialog. Figure 5 shows the COMMON // P0SI(6000) execution flow of a transaction. The CPU time for the transaction is interleaved with the proper number of I/O's and locale kernels' local data changes

control logic ditarQ) local data

Fortran library send THINK local data command

INKl - INK3 recei ve FPKl - FPK3 T.O.D.

KER, DATA determine locale Invariant Control Logic

Fortran Library execute no kernel (s)

Figure 4. Program Composition

The control logic comprises the main program SYNTH, the input/output routine lOW, and other routines, all of which are invar- iant whatever the type and size of the program. The control logic also includes time for \ two routines KER and DATA which are specific locale I/O or locale to the program size and type. They have ^ change? / encoded in them knowledge of the number and names of kernels and the length of the data I/O array. The last components of the control logic are the Fortran library subroutines do I/O whose size and number is system and compiler dependent Figure 5. Execute Transaction A program uses one or more types of kernels. There are enough replicas of each kernel type to achieve the desired instruc- Each of these bursts of CPU time is tion space size (I-size). For example, a generated by executing one or more kernels program using the general floating point (Figure 6). It is determined how much of kernel (FPK) and the integer Bucholz kernel the burst to consume in each kernel type. (INK) may include FPKl, FPK2, FPK3, INKl, For example, suppose it is determined by the INK2, and INKS. The only difference between Code Interruption Analyzer, or by estima- the kernels of a type is the entry point tion, that a program uses 60% INK and 40% name. FPK. A 15 millisecond burst would be divided 9 ms INK and 6 ms FPK. If one pass After initialization, the interactive through INK has been calibrated at 0.9 ms synthetic program sends a THINK command to and one pass through FPK at 1.2 ms , the the RTE. The RTE delays the specified time burst will be done by 10 executions of INK (THINK time), then sends the time of day. and 5 of FPK. These multiple executions may Then the synthetic program executes a be accomplished by looping around a call to transaction and sends another THINK command. the kernel, or by passing a loop count to The response time is the time between time the kernel which loops internally. This of day being sent and THINK being received. selection is made to best represent the The batch execution of a program is simply actual frequency of subroutine calls.

151 .

Other restrictions may be followed which make benchmark generation easier. The compiled size of a kernel should be small enough that the task sizes can be represented with the desired granularity and For each type done large enough to allow the largest desired of kernel program to be constructed. The size of the instruction space of a program is accurate to within the size of one kernel. But calculate portion compiler and UNISKEL limitations restrict of CPU time the maximum number of kernels which may be to burn included in a program. So in order to represent large programs, large kernels may be required. calculate number of kernel Similarly, the CPU time consumed by one iterations pass of a kernel should be low enough that the desired task CPU times can be achieved with the desired granularity. The CPU time based on locale consumed by a synthetic task can be accurate select a only to within the CPU time of one kernel kernel copy execution. But the overhead of the control and data block program is proportionally higher for short kernel execution times. This does not mean done For number of that the task CPU time will be wrong, but ernel iteration that task calibration will be more diffi- cult, and that the instruction mix will not be so representative. In order to avoid execute this, longer executing kernels may be kernel requi red

The execution of a kernel should be Figure 6. Execute Kernels independent of other routines. A kernel should have no output parameters and should not modify its input parameters. Local 3.2.2 Kernels variables should be minimized and common variables should be used with caution. The kernels used should be Particularly to be avoided is non-uniform representative of the agency's workload execution time and run-time data initial-' because this relatively small portion of the ization. A kernel reads its parameters, and benchmark code consumes the bulk of the CPU works in blank common, thus exercising the time. They could be actual subroutines from locale mechanism. the real workload, or could be code which provides the same frequency of logical and The CPU time of the kernels must be arithmetic operations or high level language calibrated to within the required precision. operations as the real workload. Thus, At NASA/JSC the method used was a FORTRAN though the benchmark programs are synthetic, main program that queries the CPU time used, they execute very much like real programs. executes a kernel a large number of times, and again queries the CPU time used. The The kernels may be written in FORTRAN CPU time of the kernel is the total CPU time or in any language callable from FORTRAN. divided by the number of times executed. Thus, though UNISKEL is FORTRAN based, it has also been used to represent COBOL 3.2.3 Program Sizes workloads. The kernels should conform to applicable ANSI language standards in order The compiled size of each the com- to eliminate or minimize the conversion ponents of a synthetic program must be effort required by various vendors. Real determined. Then the number of kernel subroutines may be modified to conform to copies and amount of blank common necessary the necessary transportability standards to achieve the desired instruction and data while remaining representative of the sizes on the baseline system are calculated. workload.

152 e

3.2.4 Locality of Reference Directed Locale The locality of reference may be specified for a synthetic program by means * of a reference histogram; this is called * * directed locale. Blank common is logically ** ** divided into a number of blocks; for JSC we *** * ** * arbitrarily chose 1000 word blocks. A *** * ** *** ** locale is achieved by executing a certain ***** *** *** * **** *** kernel copy and referencing a certain block ***** **** *** * ******** of data. For each kernel copy and for each ****************** ******************* data block, a probability of access is entered in the histogram. When the KERNELS DATA synthetic program makes an instruction locale change, it uses a pseudo- random Random Locale number generator to choose a kernel copy on the basis of the instruction histogram. ****************** ******************* When it makes a data locale change, it uses ****************** ******************* a pseudo- random number generator to choose ****************** ******************* a data block on the basis of the data histogram. It is not necessary that the KERNELS DATA sequences of random numbers be the same for different vendors. The agency may permit Figure 7. Directed and Random the vendor to use a convenient modulus, as Local long as the algorithm remains intact.

These histograms are compiled into the 3.3 Benchmark Jobs synthetic programs. Two locale parameters supplied as parameters at run time are the The number of jobs in which to I-locale threshhold and the D-locale thresh- apportion the tasks of the workload is first hold. The former is the number of CPU decided, then the jobs are created. The milliseconds to consume between instruction minimum number of jobs is the degree of locale changes. The latter is the number of concurrency needed in competing systems. instruction locale changes to make between The goal is to allow all vendors to use data locale changes. their machines to full potential. Having more jobs also minmizes tailoff, the time at If a locale model is not available, the end of the benchmark during which the random locale may be used, in which case the system is not fully utilized due to probability of choosing each locale is insufficient concurrency. The maximum equal. The I-locale and D-locale parameters tailoff is on the order of the execution are entered at run time as with directed time of one job. The factors limiting the locale. The working set of a program on a number of jobs are granularity of task virtual system may tend to be larger with execution times and overhead of job random locale than with directed locale for initiation and termination. any given locale threshholds. So if random locale is to be used, slightly longer locale A job, consists of JCL for initiation threshholds may be chosen to compensate. (job identification, file assignment, etc.), JCL for each task, and JCL for termination. Figure 7 shows a locale histogram for a Thus, for example, job 1 may consist of program of 18 kernels and 19 blocks of blank tasks 3, 7, 7, 2, 14, 3, and 10. Job 2 may common. The histogram value represents the consist of tasks 7, 2, 8, 17, and 3. The probability of access of a locality. The jobs are constructed to have roughly equal peaks in the histogram correspond to high execution times to minimize tailoff. The activity subroutines, loops, and data executions of tasks within a job are structures in the real program. Note that independent, and so may be in any order. there are 6 kernels and 10 data blocks which are rarely referenced. On a virtual memory Support jobs are also created which system these might not be in the program's catalog, initialize, and delete the bench- working set; but they would be referenced mark files. These run before and after the occasionally. Random locale corresponds to benchmark proper. In the NASA benchmark, a flat histogram: no locale is subject to source files are initialized by copying from high activity or low activity. base source files. Binary files are initialized with date and timestamp records.

153 . .

3.4 Batch and Interactive Benchmarks jobs are of approximately equal length to minimize tailoff. A related routine The work required to build a benchmark (SKEL/ORDER) randomly reorders the tasks down to the level of job generation (dotted within each job. line in Figure 3) should be significantly less than that required by conventional Based on the job composition, methods. Even greater savings can be SKEL/RUNBUILD generates the JCL or RTE realized if more than one benchmark need be scripts for each job. It considers the generated. This will frequently be the case files neccesary for each run and generates as an agency's requirements evolve during the JCL to assign them. It also generates the course of benchmark specification and support routines which catalog, initialize, construction and delete benchmark files.

From this job generation level the Other routines assist in calibration benchmark implementor may change: PR size and documentation by producing formatted of benchmark, number of jobs, number of tables describing the tasks and jobs, and by tasks, and interactive think times. And for producing a job which executes each task synthetic tasks the implementor may also once. Additional information on these change: CPU, I/O, lines of print, output routines will be furnished on request. record length, output file size, and frequency of locale change. These possibly 4. Benchmark Execution extensive modifications can be made and a new benchmark generated and calibrated in a 4.1 Benchmark Testing matter of days. Even if a complete benchmark generation is done, this will be It is necessary to consider the easier the second time after the agency is execution of the benchmark by the agency and familiar with the techniques. The gener- by the vendors. The agency will run the ation of a benchmark can be simplified by benchmark for pre-release verification, for the use of automatic generation tools, accurate sizing of its own system, or for described below. system tuning. The vendor will convert and generate a benchmark and will conduct a Live 3.5 Generation Tools Test Demonstration to demonstrate performance to the agency. The UNISKEL approach can be used to manually generate a benchmark. Or, if In the construction of a benchmark, an access to a UNIVAC 1100 system can be agency should consider the requirements for arranged, there are a number of automatic testing the benchmark on its own system. generation routines available, written in For realism, the benchmark typically

UNIVAC Symbolic Stream Generator language. includes many files, large and small, infre- > This section briefly describes these quently referenced as well as frequently routines referenced, both catalogued and temporary. It will, therefore, require large amounts of One routine (SKEL/PROGRAMS) determines mass storage and require special procedures the composition of the synthetic programs for execution on a general purpose system. based on the desired kernel (s) and program This could require that a system's files be size, the control program size, the kernel "unloaded" prior to testing the benchmark. sizes, the FORTRAN library size, etc. It This is time-consuming and expensive. As an considers both the instruction space size alternative for testing purposes, a second and the data space size. It detects certain benchmark can be constructed which accu- errors which would make the desired program rately represents the workload character- size infeasible. istics but which requires less storage. Since it would be almost identical to the Based on the above criteria, real benchmark, it would require very little SKEL/PROGRAMS generates the unique source additional effort but would minimize the code for each program. It also generates cost of testing. routines which compile the source code and link it into executable programs. It allows 4.2 Benchmark Conversion different (FORTRAN callable) languages to be used for the kernels if desired. Two major goals of benchmarking are often difficult to achieve together. The Another routine (SKEL/ASSIGN) benchmark should be representative so that a determines the tasks which comprise each job system which performs acceptably on the based on estimated task elapsed time. All benchmark will perform acceptably on the

154 .

real workload. But it should not be too difficult for vendors to implement, because the vendor's implementation cost may be passed on to the government in terms of higher bids.

Most of the synthetic program source code is a subset of ANSI 66 and of ANSI 78 FORTRAN for easy transportability. Only the I/O routine and the clock routine are written in assembler language and must be recoded by the vendor. These two routines total 50 lines.

Some incompatibilities may remain, however, between compilers said to conform to the ANSI standard. So the modular construction of the benchmark minimizes the amount of code to be converted. The FORTRAN control software modules total 500 lines. For each program the two unique modules (KER and DATA) average 200 lines together. For as many copies as are needed by the largest program there are kernel modules of about 50 to 100 lines each.

All the KER and all the DATA modules are very much alike. All the kernel copies are identical but for entry point name. There are approximately 300,000 lines of code in the JSC benchmark; however, there are only about 1,000 lines in the unique modules.

The components to be written or converted are the source code discussed above, the system processor scripts, the synthetic task JCL, the files, and the RTE commands. The system processor scripts should be chosen with conversion in mind. Benchmark files are either text files, binary data files whose structure is dictated by the vendor supplied I/O routine, or files created by system processors, so file conversion should be minimized.

After the system processor task JCL is written, the JCL conversion should be straightforward. Each synthetic task is the invocation of a synthetic program with two lines of parameters. Each synthetic program uses the same file(s) in the same way. Files are neither created nor destroyed except possibly by system processor tasks. Temporary files are assigned at job start and are not freed until job end. Therefore, the execution of one task is independent of the execution of another, and without knowing the containing jobs, the vendor may determine the JCL for each synthetic task.

The JCL for each job in the base workload is constructed. Each job is specified as a sequence of tasks whose JCL can be combined with job initiation and job termination JCL to form the complete job JCL. The rest of the jobs are constructed from the base jobs by changing job names and possibly changing file names.

The vendor converts and compiles the synthetic programs, converts the system processor tasks, generates and replicates the files, and builds the jobstreams. The benchmark is then ready to run.

The main elements of a description of the benchmark for the vendor are descriptions of programs, files, tasks, and jobs. The vendor is supplied with source code for each routine in the benchmark and specifications for each assembler routine. This source code includes one copy of the invariant portion of the control logic, one copy for each program of the variant portion of the control logic (KER and DATA), and many copies of each kernel. The programs are then described by their constituent modules. For example:

   Program 5:
      SYNTH, SETTR, BMTIME, FFGTST, ICV, IOW, LIBRY, LOCD, LOCI, ZOR
      KER5, DATA5
      INK1, INK2, INK3, INK4
      FPK1, FPK2, FPK3, FPK4

The benchmark files are described as follows:

   File 1:
      One copy per job
      Temporary
      300 KB
      Indexed sequential access
      Binary data
      Initialized by supplied program

The tasks are described as follows:

   Task 1:
      Execute synthetic program 3 with parameters
      0,20,448,256,5,2,0,218,5
      5,16,0,1,9
      Disk output to file 9, FORTRAN unit 9

   Task 3:
      Copy file 6 to file 7

The jobs are described as follows:

   Job 1:
      Task 3, Task 7, Task 7, Task 2, Task 14, Task 3, Task 10

4.3 Verification

The procuring agency will verify at LTD time that the benchmark work was done, that it was done in the required time, and that performance goals were met. UNISKEL provides convenient ways for the agency to do as much verification as it deems necessary.

The binary files in the benchmark are created by an initialization program that fills them with timestamps. The synthetic program writes over some or all of these files with timestamps of its own. Prior to LTD, the test team selects certain temporary files to be made permanent. The timestamps in these files, as well as those in some other permanent files, are examined after LTD for consistency.

The synthetic program keeps a count of the number of times the kernels are called, and this value is printed for verification. A more rigorous check can be accomplished with an addition to the kernel. The kernels may compute pseudo-random numbers based on a seed supplied by the agency prior to LTD. The final numbers generated by each program execution depend on the seed and on the number of calls to the random number routine. These values may be printed and compared against the known values.

In interactive mode, the synthetic program reads timestamps from the RTE, which it prints along with its own timestamps. Prior to LTD, the agency test team may have the system clocks of the RTE and of the System Under Test set to different dates and times. This difference is checked in the RTE log.

The system clock measures the benchmark elapsed time. The system log verifies that the work was done. The vendor supplied RTE log analysis program verifies that response times were acceptable. These primary checks are easy to use, but may not be reliable enough for the agency's needs, so other spot checks are typically used.

A stopwatch verifies the benchmark elapsed time, as well as correct system clock operation. Print output from some of the jobs is examined to see that the start and stop times from the system log are correct, and that the proper tasks were executed. The timestamped RTE log is examined for the same reason, and to verify response times. The synthetic program writes timestamps in all its output, which are checked for consistency.

COMPARING USER RESPONSE TIMES ON PAGED AND SWAPPED UNIX BY THE TERMINAL PROBE METHOD

Luis Felipe Cabrera

Department of Mathematics and Electronics Research Laboratory University of California, Berkeley Berkeley, CA 94720

Jehan-François Pâris

Department of Computer Sciences Purdue University W. Lafayette, IN 47907

In this paper we present a comparison of user response times on paged and swapped versions of the operating system UNIX for the DEC VAX 11/780. The technique used was to construct a script that periodically evaluated the system's work-load and measured the system's response times to a set of benchmark programs.

These measurements, collected at two different sites, show that differences of responsiveness observed between the two systems depended much more on the workloads and the configurations than on the operating systems themselves.

Since we only used standard UNIX tools to build our scripts, they are highly portable and can be installed on any standard UNIX system in a matter of minutes without bringing the system down.

Keywords: terminal-probe method; UNIX.

1. Introduction

A change from one version of an operating system to another one always constitutes a difficult problem for the management of a computer installation. The problems of updating the software while attempting to keep the user's interfaces as unchanged as possible are well known. We will address here another problem, namely, the performance implications of the change.

Numerous tools already exist with the purpose of predicting or measuring the performance of a computer system. In contrast to many of these tools, the method we present here requires no special hardware and very little effort for modeling the system or estimating its work-load. The technique used, which applies to all interactive systems, essentially consists of constructing a script that periodically evaluates the system's


work-load and measures the system's response times to a set of benchmark programs. As we will see, the only drawback of the method is that systems with highly variable work loads will require longer data gathering periods.

Since standard UNIX tools were used to build these scripts, they are highly portable and can be installed on any standard UNIX system in a matter of minutes without bringing the system down. Moreover, the philosophy of the method is by no means specific to UNIX systems and could be applied to other interactive systems.

Section 2 of this paper presents those features of the UNIX operating system which are relevant to our problem. In Section 3 we introduce the data gathering method. In Section 4 we present and analyze diagrams which compare the performance of the installations under the different versions of the operating system. Section 5 discusses some general issues about comparing distinct operating systems. Section 6 has our conclusions.

2. The UNIX System

UNIX is a trademark for a family of time-sharing operating systems developed at Bell Laboratories during the last twelve years [10, 9]. The first version of UNIX was implemented in 1969 on a PDP-7. Since then, several versions of UNIX have been developed at Bell Laboratories. They were primarily aimed at various members of the PDP-11 family of minicomputers and, in 1978, Bell Laboratories released a version of UNIX aimed at the new VAX 11/780. Despite the fact that the VAX hardware was designed to support a paged virtual memory system (DEC's own VMS), the successive versions of UNIX developed at Bell Laboratories did not support paging. This motivated a group at the University of California, Berkeley, to implement a virtual memory extension to UNIX for the VAX [1]. This extension used a Global LRU (Least Recently Used) page replacement strategy with use bits simulated by software and is now an integral part of what is known as the "Berkeley UNIX."

One of the most remarkable features of UNIX is its user interface. On logging in, each user is assigned a special process containing a command interpreter that listens to the terminal. This command interpreter, known as the shell [2], parses the input line, decoding the command requested, its flags, and the arguments passed to it. The shell then exec's the command. The shell also has a redirection mechanism that allows it to read command lines stored in files. Users can thus define sequences of shell commands, known as shell scripts, and store them in files awaiting later invocation. The versatility of these scripts is greatly enhanced by the fact that the shell language also contains control-flow primitives, string-valued variables and even simple arithmetic facilities. Thus, building a sequence of commands that will be repeatedly executed at fixed time intervals is on UNIX a nearly trivial task.

The portability of our tools was also facilitated by the fact that UNIX handles automatically all file allocation decisions. UNIX files are accessed through a hierarchy of directories and the typical user is not aware of the physical location of the files he or she manipulates.

3. The Data Gathering Method

Our strategy for monitoring each system's responsiveness was the same one used in Cabrera's performance analysis study of UNIX [3]. It consists of running periodically a set of predefined benchmarks in a totally automatic way. This was achieved by writing a shell script that is essentially a loop containing the benchmarks together with commands that gather statistics about the work loads and measure the time it takes each benchmark to complete. Each time the script has cycled through the execution of these commands, it executes a sleep command that suspends its execution and then wakes it up after a predetermined number of seconds.


This script is then run as a background job (with the same priority as any user process) during the operation time of the system.

This data gathering method can be categorized as a time-sampling method [7] and in fact is very similar to Karush's terminal probe method [8]. By using it, we measure the work load of the system as well as the dependency of our performance indexes on the underlying equipment. We are thus evaluating the performance of an installation.

Although running a script affects the load of the system, and thus its responsiveness, it was felt that this would not affect the validity of our comparison study, since each system would be presented with the same script. The main purpose of our comparison experiment was precisely to observe how each system reacted to this stimulus.

Our commitment to use only standard UNIX tools decided us upon the usage of the time command for measuring the completion time of each benchmark. The time command returns, upon completion of the command it prefixes, three measurements: response time, system time and user time. Response time is only accurate to the second (time truncates, it does not round off). This low resolution of time, together with our desire that no individual measurement be off by more than 10%, led us to restrict ourselves to benchmarks that would never take less than five seconds to complete.
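To make the mechanism concrete, a minimal probe script of the kind described above could look like the following sketch. It is not the authors' actual script: the log file name, the ten-minute sleep interval, and the use of man man (the benchmark adopted in Section 4) as the timed command are illustrative assumptions.

    #!/bin/sh
    # Minimal sketch of a terminal-probe script (illustrative only).
    # Each cycle records the work-load indexes, times one benchmark,
    # and then sleeps for a fixed interval before repeating.
    while true
    do
        date          >> probe.log     # timestamp for this sample
        who | wc -l   >> probe.log     # number of logged-in users
        # time(1) reports on the diagnostic output; the benchmark's own
        # output is discarded so that no terminal traffic is generated.
        ( time man man > /dev/null ) 2>> probe.log
        sleep 600                      # wait, then take the next sample
    done

Started once as a background job, for example with "nohup sh probe.sh &", such a loop runs at the same priority as any user process for the whole measurement period.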

4. The Measurements

The design of a comparison study like ours is faced with many constraints and trade-offs. There are basically three main areas in which major decisions have to be made: the selection of the benchmarks, the length of the data gathering period and the representation of the data.

Since our method measures the performance of an installation, i.e., the work load is also included in the observations, a desirable goal is to use as benchmark a task representative of the user's tasks. At the same time, one must try to capture the work load through some characterization. Then, the response time of the task is plotted against the characterization of load.

We have chosen tasks which exercise the system in a "natural" way (i.e., we did not run tasks which would a priori perform better in a paged environment or in a swapping environment) and which were representative of the user's activities at the time. Since text processing constituted on both sites a significant part of the system's load, we decided to use, on both sites, the command man man as benchmark. The man command retrieves and formats the entry for any given command out of the on-line copy of the UNIX Programmer's Manual; thus man man retrieves and formats the manual entry for the man command. This task is interesting because the entry is retrieved from disk using the widely used formatting program nroff. To avoid screen output problems when running the script, the output of man man was sent to /dev/null instead of being sent to a real terminal. This had the effect of discarding the already formatted text of the retrieved page. The length of each data gathering period was determined at each site and will be discussed in the next two subsections. As for the display of the data, after an exploratory analysis of our measurements it was decided to use 75-percentile curves. In Figures 1 and 2 we display four curves for the task man man, using the number of logged-in users as the characterization of load. The four curves are the median (50-percentile), mean, 75-percentile and 90-percentile of the distribution of response times per value of our work load characterization.

Figure 1 displays these curves for the swapping system and Figure 2 for the paging system. In them we may appreciate how the influence of the outliers becomes apparent in the 90-percentile curve. We may also notice that the medians of the response time samples are


consistently lower than the means of the samples. This happens in any distribution which is skewed towards the lower range values, or whose outliers lie in the higher range of the values. These two characteristics were common to all our data. To deemphasize the effect of outliers we chose to use the 75-percentile curve to display our data. This decision was partially motivated by the unavailability of abundant data from one of the sites; 95-percentile curves would have been too much influenced by outliers.

We decided on two single-variable characterizations of load: the number of users logged in and the number of user processes. The latter index represents the number of processes which have been generated by user commands. It does not include any system generated processes.
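Both load indexes can be sampled with ordinary commands inside the same probe loop. The sketch below is only one plausible approximation: separating user processes from system processes by the presence of a controlling terminal in the ps listing is an assumption made for illustration, not the authors' definition, and ps output formats differ between UNIX versions.

    #!/bin/sh
    # Approximate the two single-variable load characterizations.
    users=`who | wc -l`          # one who(1) line per logged-in user
    # Count processes that have a controlling terminal (TT field not "?"),
    # skipping the ps header line; this roughly excludes system processes.
    procs=`ps ax | awk 'NR > 1 && $2 != "?"' | wc -l`
    echo "users=$users user_processes=$procs"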

4.1. The Berkeley Measurements

This analysis was based on 431 data points for the swapping system and 1455 points for the paging system, generated during the second half of 1979. Throughout the data gathering period the system's hardware remained unaltered. The work load, however, underwent a substantial change (by very large programs such as the VAX version of the algebraic manipulation system MACSYMA [6]) to thoroughly exercise the memory management capabilities of the new kernel.

Given the small amount of main memory the system had at the time, these large address space programs did influence significantly the work load of the system. The hardware configuration of the system was a VAX 11/780 CPU, 512K bytes of main memory, 16 ports, two RP06 disk drives with a DEC disk controller and one TE16 tape drive.

We shall present data for three tasks: the command man man, the execution of a CPU-bound job and the compilation of a short C program. The choice of the CPU-bound job was motivated by our desire to observe the effect paging had on a task that would almost never be swapped out: its size is 2K bytes.

Figures 3 and 4 display the 75-percentile curves of the response time for the command man man. On both figures we see that the swapping system performs slightly better than the paging system at higher load levels. This difference can be justified by the dramatic change in the workload which occurred when the paging system was brought up. Since we do not have data from stand-alone measurements, we cannot rate the two systems solely on these data, but it becomes apparent that neither outperforms the other one throughout all the values of load.

Figures 5 and 6 display our data for the CPU-bound job. This job consists of two nested loops and an inner block of statements. A 9-statement sequence of integer arithmetic operations is executed 100,000 times. The size of the object code is only 2K bytes (i.e., four 512-byte pages), and thus its probability of being swapped out is quite small. It is quite interesting to observe that the paging system outperforms the swapping system under both characterizations of the workload. This can be explained in terms of the additional I/O activity existing in the paging system. Indeed, since our job is very small in size and performs no I/O, the only impediments that may block it from running to completion are time-quantum expirations and multiprogramming delays. Thus our CPU-bound job is either running or in the ready queue. In the paging system it more often gets the CPU because other jobs are blocked by page faults. This fact makes us believe that trivial tasks should perform better in the paged version of UNIX.

Figures 7 and 8 display our data for the C compilation of a small C program. This is the only observed task where the swapping system appears superior. This is especially clear in Figure 8, where the number of user processes characterization of load is used. We believe that the difference in performance observed in this case is mostly due to the change in workload which occurred in the system. The

execution of processes whose size was much larger than the available memory created high contention for memory. Thus medium size processes like the C compiler would always be losing their used pages and paging in those pieces of code needed for further processing. Their resident set would never be allowed to remain at any reasonable size. The effect of large processes was quite noticeable at the time, especially when processes like VAXIMA [6] would do garbage collection through a 2 Megabyte address space. At those points, response time for all commands would degrade ostensibly.

4.2. The Purdue Measurements

These experiments were performed during the Summer of 1980 on the VAX 11/780 of the Department of Computer Sciences at Purdue University, when the Department decided to switch from UNIX version 7 to Berkeley UNIX. At that time, the machine had 3 Megabytes of main memory, 56 ports, three RM03 disk drives on Massbus 0 and one TE16 tape drive on Massbus 1.

We chose to run as benchmarks the UNIX command man man, which formats and displays the entry of the UNIX manual describing the command itself, as well as a small script containing "man man" and several small tasks. As said before, this choice was motivated by the fact that text processing constituted then the application concerning the greatest number of users. In UNIX version 7, the man command is implemented as a shell script while Berkeley UNIX uses directly executable code. Since this latter alternative is inherently much faster than a script, which must be interpreted by the shell at each execution, a direct comparison of the response times for the man man command would have been grossly unfair to version 7. We thus decided to run the version 7 script in place of the original man command in our measurements with the Berkeley UNIX. As a result, the response times for the man man command measured at Purdue were much higher than those observed at Berkeley, where the same problem did not occur.

Because of the short interval of time left between our decision to run the script and the scheduled switch from swapped to paged UNIX, we were only able to collect 152 measurements with the swapped version of UNIX. This left us with about ten observations for each load level, as expressed either by the number of users logged in or the total number of user processes. These values were obviously much lower than those required to obtain acceptable estimators of the 75 percentiles of the response time.

Faced with the same problem, but on a much smaller scale, one of the authors [3, 4] decided to cluster neighboring load levels with insufficient numbers of observations. For instance, if 30 was the minimum acceptable sample size and there were 19 observations corresponding to 15 users logged in and 12 observations corresponding to 16 users, the two sets of observations would be merged into a single set of 31 observations and this set made to correspond to a work load of (15+16)/2 = 15.5 users. The same approach applied to our Purdue data would unfortunately have resulted in too few load levels after clustering. We decided therefore to use a scheme in which the 75 percentile for load level i would be computed taking into account all the measurements at load levels i-1, i and i+1. This filtering greatly reduced the influence of outliers on the 75 percentiles. It has, however, the unfortunate side-effect of introducing a positive correlation between neighboring values on the percentile curves and therefore should not be considered as a substitute for more measurements.
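A smoothed percentile of this kind can be computed with standard tools. The following sketch is an illustration, not the authors' program: the data file name, its two-column format (load level followed by response time in seconds), and the nearest-rank definition of the 75th percentile are all assumptions.

    #!/bin/sh
    # Sketch: 75th percentile of response time for load level $1,
    # pooling the measurements taken at load levels $1-1, $1 and $1+1.
    # times.dat is assumed to hold one "load_level response_time" pair per line.
    level=$1
    awk -v l="$level" '$1 >= l - 1 && $1 <= l + 1 { print $2 }' times.dat |
        sort -n > pool.tmp
    n=`wc -l < pool.tmp`
    [ "$n" -gt 0 ] || { echo "no measurements near load level $level" >&2; exit 1; }
    # Nearest-rank percentile: take the ceil(0.75 * n)-th smallest value.
    k=`awk -v n="$n" 'BEGIN { printf "%d", (3 * n + 3) / 4 }'`
    sed -n "${k}p" pool.tmp
    rm -f pool.tmp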

Figures 9 and 10 display the 75-percentile curves of the response time for the script version of the man man command. As one can see in both figures, the response times for the swapping version of UNIX appear to be somewhat higher than those corresponding to the paging UNIX and also exhibit a more erratic behavior. These results apparently do not agree with those observed on


the Berkeley VAX. One should however point out that the Purdue VAX had a much larger memory and that all our measurements relative to the paging version of UNIX were made during the month immediately following the conversion to Berkeley UNIX, thus before any change in the working habits of the users could have occurred.

Another point to mention is that the curves representing the 75-percentiles of the response time against the number of user processes are somewhat better behaved than those corresponding to the number of users. This suggests that the number of user processes is a better estimator of the system's workload.

5. Discussion

One can conclude from our measurements that the switch from UNIX version 7 to Berkeley UNIX did not significantly alter users' response times on either of the two monitored installations. This conclusion, however, depends on the work loads observed on the two machines and is by no means an answer to the question: "Which one of the two operating systems is better?"

Paging was introduced, more than twenty years ago, in order to allow bigger jobs to run on machines then characterized by very small main memories. It later became a nearly universal feature of large-scale multi-access systems because it made it possible to keep more jobs simultaneously resident in main memory, thus avoiding swapping delays. As was quickly noticed, paging unfortunately never comes for free: it requires special hardware, introduces a fair amount of software overhead and performs very poorly with programs exhibiting scattered reference patterns.

One may then question the efficiency of paging at a time when main memory is so cheap that it becomes possible to keep enough conversational users resident in memory to saturate the CPU of a machine like the VAX. Were this the normal case, jobs would typically run without ever having been swapped out during their execution. This would remove any incentive for implementing a paging scheme. Moreover, it would even make straight swapping more effective than paging, since it is usually more efficient to bring the whole address space of a program into memory in a single I/O operation than to fetch faulting pages successively.

This argument does not, however, stand against the well-known fact that larger address spaces have always resulted in larger programs. In fact, one of the strongest motivations for Berkeley UNIX was to provide the larger address space required by algebraic manipulation programs. Our measurements on the Berkeley VAX show indeed that the switch from swapping to paging has significantly altered the system's work load by allowing bigger jobs to run on the machine. Although it did not appear in our data, the same phenomenon also occurred on the Purdue VAX, whose current work load is strikingly different from the one existing before the switch to Berkeley UNIX.

The main advantage of one operating system over the other one is thus more a question of increased capabilities than of faster response times. As a cynical observer could point out, the switch to a better system might be accompanied by an increase of the average response time resulting from the increased demands placed on the system's hardware.

6. Conclusions

We have presented here a simple method for comparing user response times on two versions of an operating system. The method used consists of constructing a script that periodically evaluates the system's load and measures the system's response times to a set of benchmark programs. This method can be very easily implemented on UNIX or on any other system that has features allowing one to submit a set of tasks periodically and to collect information on response times and the system's workload. It does not require the system to be brought down and


does not affect the normal operation of the installation.

Results concerning the paged and swapped versions of the VAX 11/780 UNIX system show that the observed differences of responsiveness between the systems depended more on the workloads and the configurations than on the operating systems themselves.

Thus our method should not be construed as a technique for comparing the inherent merits of two operating systems, but rather as a tool giving prompt quantitative answers on the responsiveness of any particular installation.

Acknowledgements

The work reported here was supported in part by NSF grant MCS 80-12900. The authors express their gratitude to their respective departments for having provided the facilities used in this study.

References

[1] Babaoglu, O., W. Joy and J. Porcar, Design and Implementation of the Berkeley Virtual Memory Extension to the UNIX Operating System, Department of EECS, Computer Science Division, University of California, Berkeley (1979).

[2] Bourne, S. R., The UNIX Shell, The Bell System Technical J. 57, 6, Part 2 (Jul.-Aug. 1978), 1971-1990.

[3] Cabrera, L. F., A Performance Analysis Study of UNIX, Proceedings of the 16th Meeting of the CPEUG, Orlando, FL, October 1980, pp. 233-243.

[4] Cabrera, L. F., Benchmarking UNIX: A Comparative Study, in Experimental Computer Performance Evaluation (D. Ferrari and M. Spadoni, eds.), North-Holland, Amsterdam, Netherlands, pp. 205-215.

[5] Digital Equipment Corporation, VAX 11/780 Technical Summary, Maynard, Mass., 1978.

[6] Fateman, R. J., Addendum to the Mathlab/MIT MACSYMA Reference Manual for VAX/UNIX VAXIMA, Department of EECS, Computer Science Division, University of California, Berkeley (Dec. 1979).

[7] Ferrari, D., Computer Systems Performance Evaluation, Prentice-Hall, Englewood Cliffs, NJ, 1978.

[8] Karush, A. D., The Benchmarking Method Applied to Time-Sharing Systems, Rept. SP-3347, System Development Corporation, Santa Monica, CA, August 1969.

[9] Kernighan, B. W. and J. R. Mashey, The UNIX Programming Environment, Computer 14, 4 (Apr. 1981), 12-24.

[10] Ritchie, D. M. and K. L. Thompson, The UNIX Time-Sharing System, Comm. ACM 17, 7 (Jul. 1974), 365-375. A revised version appeared in The Bell System Technical J. 57, 6, Part 2 (Jul.-Aug. 1978), 1905-1929.

[11] Ritchie, D. M., S. C. Johnson, M. E. Lesk and B. W. Kernighan, The C Programming Language, The Bell System Technical J. 57, 6, Part 2 (Jul.-Aug. 1978), 1991-2019.


[The ten figures (plots) are not reproduced here; only their captions are retained. In each figure the vertical axis is the response time in seconds; curves for the swapping system are marked X and those for the paging system O.]

Figure 1. Median, mean, 75- and 90-percentiles of response time vs. number of users for "man man", swapped UNIX.

Figure 2. Median, mean, 75- and 90-percentiles of response time vs. number of users for "man man", paged UNIX.

Figure 3. Response time 75-percentile vs. number of users for the "man man" command.

Figure 4. Response time 75-percentile vs. number of user processes for the "man man" command.

Figure 5. Response time 75-percentile vs. number of users for the CPU-bound job.

Figure 6. Response time 75-percentile vs. number of user processes for the CPU-bound job.

Figure 7. Response time 75-percentile vs. number of users for the C compilation.

Figure 8. Response time 75-percentile vs. number of user processes for the C compilation.

Figure 9. Response time 75-percentile vs. number of users for the script version of "man man".

Figure 10. Response time 75-percentile vs. number of user processes for the script version of "man man".

PRACTICAL APPLICATION OF REMOTE TERMINAL EMULATION IN THE PHASE IV COMPETITIVE SYSTEM ACQUISITION

Deanna J. Bennett

Air Force Phase IV Program Management Office Directorate of System Engineering Gunter AFS, AL 36114

The Air Force's Phase IV program is the largest computer system acquisition ever attempted. It will replace the Air Force inventory of base supply UNIVAC 1050-II and base level support Burroughs B3500 computer systems. At the outset, a strategy was developed for expressing the workload requirements for the more than 200 systems to be replaced. Specific system throughput requirements were established. Along with this, a decision had to be made on how to validate the capabilities of the proposed systems to support the workload.

The point of this paper is to outline how the following questions were answered by the Air Force in the currently ongoing Phase IV acquisition: Given a real acquisition situation, how can we test effectively? What are the practical day-to-day decisions that must be made to support the selected testing strategy?

The end decision regarding the workload test approach was to use some tools from the classic benchmarking tradition, but to enhance their effectiveness with the use of remote terminal emulation. This determination was only a starting point for a number of implementation decisions that had to be made by the Air Force to support effective use of an emulator.

The current status of the Phase IV program is that contracts have been awarded to two contractors to compete for worldwide system replacement. The contracts require transition of five major standard Air Force on-line automated data systems (ADSs) and a small sampling of standard Air Force batch systems, and demonstration of contractor-proposed systems with the transitioned Air Force standard software. At the end of the contract, based on cost and evaluation of performance, one contractor will be selected for worldwide system replacement.

Key words: Benchmark testing; Phase IV; remote terminal emulation; response time; terminal networks; workload sizing.

1. Introduction

The Phase IV program was formally established in early 1976 as the vehicle by which a large inventory of nearly obsolete computers in the Air Force inventory could be replaced. Because of its size and complexity, the Program was established on an innovative basis, unhampered by precedents in the competitive computer selection and acquisition process. The replacement strategy was developed to fit the need and


to recognize the economies of scale that could devolve from such a massive acquisition. In its overall management and procurement approach, Phase IV falls somewhere between the off-the-shelf computer acquisitions typically managed by the Air Force Computer Acquisition Center (AFCAC) and the weapons system development and acquisition projects normally managed by system program offices (SPOs) throughout the Air Force.

One of the concepts typically implemented by SPOs is "fly before you buy," in which prototypes, including aircraft, are put through their paces in a real life environment before any commitment is made to large-scale manufacture. This was the most attractive aspect of the weapons system approach for Phase IV and was incorporated into our acquisition strategy. This is why we have two "prototype" system contractors. The main advantage is that it minimizes the risk of incorrectly estimating the processing capabilities of proposed computer systems before using them to replace over 200 Air Force systems around the world. A miscalculation could have severe impact on the capability of the new systems to keep up with day-to-day base mission support demands. One basic testing requirement emerged as paramount: workload processing capabilities had to be validated well in advance of commitment to worldwide replacement by either of the competing contractors.

2. The Communications Testing Problem

In Phase IV, we are using classic benchmarking techniques to create inputs to exercise the batch workload profile on the proposed system. This is necessary to verify batch processing throughput capabilities, but is not sufficient to satisfy us that the real range of workload can be supported. The base level systems being replaced through the Phase IV program have critical on-line as well as batch mode support roles.

There are five major on-line automated data systems which support aircraft maintenance, supply, civilian and military personnel, accounting and finance, and civil engineering functional users at each Air Force base. Depending on the location, anywhere from twelve to over one hundred fifty independent devices have to be supported by the new systems at some time in the system life. In addition to the physical capability to support the terminals, the system's responsiveness to the user is a key concern.

We rapidly concluded that batch benchmark techniques were inapplicable to our problem, and searched the field for the proper tool to apply. All the historical options for demonstrating processing capability were weighed against our on-line testing problems, bearing in mind the bounds of technical, economic, and practical feasibility, as well as the ultimate objective: to verify not only that the required terminal network could be supported but also that the support was timely.

2.1 Simulation Unacceptable

The verification required for on-line system sizing dictated that the actual system be tested and that the mode of test not interfere with the internal system processing. On this count alone, a simulation or model-based verification was rejected. Construction of proper models would be expensive. The technical feasibility of performing the effort was assessed as minimal, given the short time available, the range of workloads that would have to be modeled, and the likelihood that newly announced equipment lines could be proposed. Also, for all practical purposes, since the contractor would have to be performing the simulation/modeling exercise for the final test, the Air Force assurance of accuracy would come down to "trust me" as the verification that the model was performing what was required. A convincing bottom line to the brief consideration of simulation is the GSA prohibition of simulation as the sole basis for a contract award, a tacit acknowledgement of simulation's more artistic than empirical nature.

2.2 No Sample Terminals or Alternative Inputs

Before computers became sophisticated enough to handle more than a few terminals, the verification of a system's capability to handle a small number of terminals could be done simply by hooking the actual terminals up to the test system. As time and technology progressed, often a few terminals were used to represent a much larger whole in the benchmark process. The risk increases, of course, with the dwindling percentage of terminals actually demonstrated. After a point, the contention and queuing problems attendant on a large number of terminals are unrealistically overlooked when three or four "representative devices" are used in their place. Also, use of live terminals implies the use of live operators. While computers can generally be relied on to produce reproducible results,

human operators cannot be counted on to do the same. This violates one of the basic tenets of benchmarking in the Government: have a controllable, reproducible experiment.

The controllability factor led to other experiments in stress testing, such as that used several years ago in the World Wide Military Command and Control System (WWMCCS) computer system acquisition. In this benchmark, the externally generated terminal inputs were queued on tapes and batched in. While controllability and reproducibility were assured, the system impact of handling large volumes of devices and their communications overhead aspects was not assessable.

Using a combination of input messages queued on tape and a sampling of inputs from several live devices was judged to have too many of the disadvantages of either technique, and not enough advantages, to reasonably pursue for Phase IV's purpose.

2.3 The Practical Default

The option of having a full-scale benchmark test with an entire live terminal network was also rejected out of hand. The sheer management burden of monitoring and controlling activities across over 100 terminals was enough to render this practically and economically unfeasible. Dissatisfaction with the traditional choices for assessing workload processing capacity led to a single reasonable option: remote terminal emulation. The only method that could satisfy the requirement for testing throughput capacity in a terminal-oriented environment on all practical, economic and technical counts was to require a remote terminal emulator. Emulation is within the practical, technical state of the art. It is controllable and reproducible. It can demonstrate all the effects of communication contention and line strategies on the system under test. We could see no other way of testing.

2.4 Support for RTE-Based Test Requirements

Phase IV Program management was also reasonably convinced that as long as we were breaking new ground in other ways, we should take advantage of the current technology and do as complete a job of sizing verification as possible by requiring a remote terminal emulator (RTE) as a basic test tool. Before committing to an RTE requirement, Phase IV had to verify its legality in terms of Government contracting regulations.

Before the publication of the GSA RTE standard which applies across the government, specially-created, tailored variants of emulators were required when emulation was considered for application to a benchmarking problem. The variants depended on the specific desires of the individual user or acquisition office. The result was repeated development costs for competing vendors, seen as limiting the field of competition to only "wealthy" competitors. Because of this cost aspect, RTE-based pre-award competition was not permitted by GSA for Government acquisitions.

In the design of the Phase IV acquisition strategy, we circumvented this historical objection to the sunk pre-award cost of RTE-based competition. The Phase IV competing contractors will perform post-award benchmarks under paid contract while they compete for final selection as the Air Force-wide implementation contractor. Cost to vendors was not an issue, because the cost was to the Government. Because of the large number of systems involved, this investment for Phase IV is cost effective, whereas in single site acquisitions it could have been prohibitive. However, we did some groundwork before RFP release to assure our management and ourselves that our costs would be primarily test-related costs, not RTE development costs: a survey of the emulation capabilities of computer manufacturers who could reasonably be expected to submit Phase IV proposals verified that each had an existing emulator with the basic required characteristics.

The Request for Proposal (RFP) that was issued in 1978 for Phase IV was the first Air Force RFP which required remote terminal emulation. It was written before publication of the GSA standard, but it should be noted that the Phase IV RTE requirements are relatively close to the new standard.

3. Phase IV Workload Test Approach

The Phase IV problem was to orchestrate a throughput test to verify that the work of 237 separate systems, with 237 variations on a common processing framework, could be handled by new systems. This required development of some method for distilling the critical characteristics of the workload into simple, testable quantities, and deciding how to choose what the physical system under test should look like.


TXBLC !• XI AVERAGE PEAK HOUR OH-LIIIC WORKU>AD CATEGORIES 3. 1 Workload Expression INPUT INPUT OUTPUT OUTPUT LOGICAL System workload representation for CATEGORY MSG CHAR MSG CHAR DSK I/O replacement of a single system is done most 1 82S 72, 891 5,058 450,418 42,735 2 1,817 161,713 12,017 1,059,513 94,794 commonly by constructing a single line pro- 3 2, 556 225,027 15. 583 1,337,839 132,400 jection of future workload and constructing 4 3, 218 283, 035 19,590 1,744,843 166,692 5 3,728 330,729 24,193 2,151,753 205,769 a single timed benchmark test, a set of programs which reflect the work profile for the current system. (Naturally, the vali- dity of the ensuing test is limited by the validity of the analysis and the availabi- lity of programs in appropriate languages Note that as these categories were to form the mix.) Capability to handle developed on the basis of collections from workload growth can then be measured by individual Air Force bases, there is no decreasing the amount of time permitted to straight line relationship across cate- run the program mix. Figure 1 shows this gories for various workload components. approach diagrammatical ly This means that test development must be tailored to each category. The single system acquisition approach of developing a single mix and decreasing the amount of time permitted for its execution to test increased throughput requirements is doubly inapplicable for Phase IV. First, it is a batch-oriented approach vAiich does not account for variances in numbers of terminals. Second, in Phase IV, each cate- gory is really a separate workload profile, unrelated to lower or higher workload categories.

Instead of projecting workload by base, fixed categories are used and the Air Force 0 1 2 3 4 5 base sites move through the categories. Years After Installation— This simplifies the vendor sizing effort, as workload factors are already distilled Single System Figure 1. Typical into usable form for input to sizing Workload Expression models. Figures 2 and 3 give an example of how bases might move through the categories between two given calendar years.

To use this most common workload repre- sentation for Phase IV 's wide range of types and volumes of processing could require 237 separate workload projections and tests to be developed. Besides implying an impossible test development workload, this would be spurious precision, given the basic "plus or minus" imprecisions of workload projections, however carefully G,H they are constructed. D.E.F A,B,C For Phase IV, the performance sta- tistics from a sampling of current sites were collected and massaged into a set of 1 2 3 4 5 workload "categories", using both the Workload Categories empirical data from sites and overall growth rate projections. These categories Figure 2, Year 1 Workload for Bases were defined to the analytical level A, B, C, D, E, F, G, H

required by sizing models. See Table 1 for an example of the workload characteristics, by category, for the U1050 workloads. In Figure 2, Air Force bases A through H are mapped into 5 workload categories, de-

172 pending on their workloads in Year 1 of This required us to be prepared to test any system life. of the fifteen workload categories we had established to anticipate all potential "break points" in contractor proposals. For the Air Force to develop a test for each category and mode, as we in fact did, required construction of 32 separate tests -- 15 on-line, 15 batch and 2 combined on- line and batch tests. G,H One major feature of the Phase IV E.F acquisition that made such a test develop- ment effort feasible is that the contractor B.C.D will be supplying all the software for the test. All workload categories and their characteristics (messages in, characters 1 2 3 4 5 in, cards punched, and so forth) Workload Categories are ex- pressed in terms of automated data systems (ADSs) which the contractor is responsible Figure 3. Year 3 workload for Bases under contract to transition to his pro- A, B, C, D, E, F, G, H posed system. The Air Force test develop- ment effort did not require selecting a specific mix of high-order language Figure 3 shows these same bases and programs that the ccxnpeting contractors workload categories in Year 3 of system could easily run on their systems, as in life, and shows how the workloads of bases historical benchmark construction. Rather, have grown, causing most to migrate into a carefully measured collection of transac- larger categories. tions was collected to exercise the collec- tion of ADSs to fit the workload profile. Given this analytical framework, the This applies equally to the batch and on- testing problem is to verify that there is line tests (for Phase IV separate on-line a system available to satisfy the workload and batch workload tests are required requirements of each category, not each except in a few cases); however, this paper individual base. These category examples addresses solely the on-line test construc- are simplifications. In fact there are tion problems. The amount of work done by five workload categories for the U1050 test developers in selecting the individual workload projects projections and 10 for program transactions frcsm this large pool the B3500 workloads. was monumental, but was unhampered by the "high order language" constraint. 3.2 Stress Testing Philosophy 4. Specific RTE Implementation Decisions Phase IV contractors are permitted by the Air Force to match their equipanent The Phase IV on-line workload tests are lines against the workload categories, executed by the Phase IV contractors on letting "break points" fall vAiere they make their own RTFs using Air Force software optimal use of equipment capabilities. For they have transitioned to work on their example, a vendor could propose to satisfy systems. Much of the test burden is put on workload categories 1 through 4 with the the contractors. Commitment to the use of lowest level processor in an equipment an RTE for validating terminal-related pro- line, categories 5 through 7 with a cessing capabilities did not remove all the slightly enhanced model , and categories 8 Air Force test burden, nor was it an end through 10 with yet a more powerfvil model. decision. It was only the beginning of a Another proposal could have far different series of other implementation decisions. matching of the Government's workload cate- gorizations to equipment models. 4. 1 How Many Terminals To Test?

A flexible workload testing strategy Given the workload definition scheme was required to match the latitude allowed and the determination that an RTE would be the proposing vendors. Stress testing was required, the next decision to round out the at the heart of the strategy: the workload definition of the test enviroment was how processing capabilities tested had to be many terminals would be associated with the measured by testing the largest workload on-line test for each workload category. against which a configuration was proposed. The RFP presents two terminal profiles for

173 each system at each base : the number and problem with this is that the system test types of terminals required at initial activity must reflect the activity defined installation, and the number and types pro- in the Phase IV workload category defini- jected at an augmentation point 54 months tions for the test to produce valid after initial installation, a point close to results. Gross definitions of transaction the end of the initial contract life. We types could in fact result in the contrac- took the obvious option and decided to test tor being forced to generate more workload the larger number of terminals, that at than required by vrorkload sizing merely to augmentation. fit the transaction percentage profile—or, in imposing lesser workloads than required Many Air Force base systems are asso- because the contractor could select lew ciated with each workload category at the activity variants of transaction types. augmentation point. We had to test the terminal configuration for an actual Air Another "easy" alternative frcan the Force base as defined in the RFP because Government test development standpoint is this is the basis for vendor proposals, to put the entire burden on the contractor and, most importantly, for their com- and let him run any mix of transactions munications strategies. Our stylized which he says will fit our "bottom line" workload categories associate the same workload category requirements. The addi- number of on-line transactions with all tional test burden is on the contractor — bases within the category, independent of but there is virtually no way in which the the number of terminals actually projected Air Force can be assured that what we see for the base. We developed tests for the is what we think we are seeing! terminal profile of the base in the cate- gory with the largest number of terminals. These two alternatives have been used This tests the fullest ccanmunications- in other RTE-based system tests. They related overhead, e.g., polling, multi- generally stem from a desire to see the plexing, and/or concentration (depending on system driven, but a lack of software on the contractor-proposed methodology), for which the user can base the demonstration, the largest number of terminal devices. If either because of the cost and time to con- a configuration can support the largest vert extant complex interactive software, nximber of terminals projected for it, then or because the teleprocessing requirement it certainly can support fewer terminals. is new and there is no software. Phase IV is not constrained by the lack of demon- The decision involved a trade-off. stration software. Since the contractor- There is a Phase IV terminal response time converted Air Force automated data systems requirement as well as a workload volume have to accept the same inputs as on requirement. Spreading inputs and their current equipment, we decided to specify

associated outputs across a large number of every transaction for every emulated , terminals could have the effect of making terminal. We could and did develop indivi- the average terminal response time easier dual terminal scripts whose "bottom line" to meet than if the same number of transac- total workload fit the profiles for each tions had to be queued up for fewer workload category. Note that both the ter- terminals. The risk of making the response minal activities and the initial vrorkload time easier to meet was far lower than the categorization were based on our current risk of specifying a test system with so equipment, so the test and specification are few terminals that we wouldn't be prepared totally compatible. to validate a unique communication strategy a vendor might propose for larger numbers 4.3 What Transactions to Select? of terminals. Methodology for test transaction selec- 4.2 Who Provides the RTE Inputs? tion was to individually test a small sampling of transactions frcan all software The contractor-transitioned Air Force systems that were part of the on-line software was the basis for the test. But workload test to determine their profiles who would create the emulated terminal when acting against pre-defined test data inputs? bases. This profile included number of disk I/Os generated, number of characters input We could have defined a "mix" of trans- and output, and other characteristics action types for each transitioned data directly related to the workload category system, (e.g., 40% updates, 50% inquiries, figures. Then an automated "mix and match" and 10% record deletions), and had the program selected the proper numbers and contractor provide the transactions. The kinds of transactions and created the speci-

174 . .

fic scripts by terminal by software system were given staggered start times (ADS). Each script was validated to have within the first few minutes of the timed the proper number of transactions and in to- test period to avoid a "lock step" series tal they were validated to provide the of contention-generated delays. correct amount of work against the ADS. Though our scripts were generated so

Besides completion of all specified they could be run "as is" , we had to decide processing in one hour, there is a response if that should be required of the con- time requirement to be met in the workload tractor. Execution completely in accor- test. The Phase IV systems must provide a dance with the script time spacing of maximum 10-second average response time for transactions might make it impossible for 80% of all on-line transactions, with a the contractor to meet response time maximum 30-second response time for any one requiranents. This is because all the on-line transaction. The sampling of tran- scripts for all the software systems in the saction types used in the Air Force scripts mix could not be tested in advance, at the was carefully selected to support a fair same time. (If we could do that, we would test of on-line support capability. Real, be able to handle our projected workload on on-line, interactive type transactions were the systems we have today, and we wouldn't selected. Batch-oriented transactions were need to replace them!) There is a potential not included on the rationale that response that some combinations of transactions time's importance is related to queries for across ADSs and terminals might uninten- small quantities of information, not large tionally occur in such a way that a snowball reports. A batch job vAiich requires signi- attenuation of response times could result. ficant file manipulation is not an interac- We decided that the contractor shouldn't tive type process just because it can be have to pay for the quirks in our script activated from an interactive terminal development effort, and that some adjust- ments to the timing of inputs should be Because of the average response time permitted requirement, we had to further restrict our test transaction types and test data. For The degree of freedom permitted the Phase IV 's workload test, response time is contractor was important to define. If we defined as the time between the last opera- gave them complete control, there would be tor action of the input and the display of nothing to prevent a virtual non-mix to be the first human-readable character of the run, with the terminal scripts run sequen- output. What if there is no response? tially on an ADS by ADS basis, e.g., Vfhat if the input generates output to running all supply transactions, then all several terminals? In our case, if that finance transactions, etc. Our ultimate occurs, it invalidates our average response decision was that the contractor could run time calculations. You can't calculate the all scripts as provided, or could modify time between input and output if there is them under tight constraints. no output, or if there is no input. As these types of transactions are supported If the time delay between inputs from by current Air Force software that will be any script is to be adjusted by the contrac- carried over to the new system, the tran- tor, the script must be changed so that the sactions that exhibited unacceptable forms first transaction is initiated in the first of output for test purposes and response five minutes of the test period, the last time calculation were eliminated frc«n con- transaction is entered in the last five sideration in building our terminal minutes, and all transactions in between are scripts. distributed evenly across the test period, i.e., they have identical delay times be- 4.4 Who Controls Timing of RTE Inputs? tween them.

5. Resulting Test Characteristics

What we ended up with at the end of this on-line test development effort was the following:

(1) A useful test tool, a remote terminal emulator, defined as the basis for the test.

(2) A workload test strategy that

175 . .

rested completely on the pre-defined basis. The "maximum number of terminals" workload categorizations vAiich could verify approach didn't fit the real life bases the telecommunications support capability anymore. The base with the largest number by the system under test in the most of terminals for its B350 0 workload cate- stressful (largest number of terminals) gory could have far less than the maximum situation. number of terminals for its U1050 workload category. We couldn't run the RTF tests (3) Scripted terminal inputs and out- "as is" because we would be requiring the puts designed by the Air Force to specifi- contractors to test networks that were not cally exercise the Air Force-defined proposed and not required by the contract. workload The problem test designers had to deal (4) A strategy for permitting the with was simply put: Now that the tests contractors to eliminate timing-related have been designed and developed to spread quirks in the Air Force data without all the on-line work across the maximum degrading the purposes of the mix test. number of terminals for any base in a given workload category, how can we test fewer. What we didn't fully realize was that we also had a flexible test basis with One option was to redevelop the which we could test the workload processing tests. This was impossible. While we were support capabilities for any of the Air updating our strategy, the offerors were Force sites in the world — with some test updating their proposals. Having to deal degradation. This "object lesson" came with the proposal flexibility factor would toward the end of the development cycle for put us back in the mode of developing over the thirty-two separate Phase IV workload a hundred tests to cover all possible test packages. combinations, just to be sure we had the right mix available for the contractors to 6. Accounting for run Implementation Redirection The other option was to maintain the Original Phase IV plans called for one- amount of work, but decrease the number of for-one replacement of current U1050 and emulated test terminals to fit the actual B3500 systems, even though they usually are test Air Force base. Decreasing the number located on the same base. After initial of terminals to be emulated wDuld Phase IV proposals were received and after apparently also lose the work input by a large amount of work had been invested in these terminals. Rather than "falling developing workload tests. Air Force and back" on the amount of workload required to congressional reexamination of Phase IV 's be processed, the work was input directly implementation plans resulted in a redirec- at the mainframe instead of via the deleted' tion of this concept. The one-for-one phi- terminals. This could be done because of losophy fell by the wayside for a large the way the Air Force software works. Live number of bases vftiich would now either get terminal transactions can be batched into a a single computer system to replace their disk file in the test set up process, and two on-base computers or be remote sites pulled off the disk file to have the same associated with regional centers. The same effect on the data base as the live trans- workload categories applied, in additive actions for the B3500 software during the fashion, to the single systems: the U1050 test. 
For the U1050 software, direct card and B3500 projected workloads at a given reader input of the transactions identical Air Force base would have to be handled by to the live transactions has the same the same system. effect. Any terminal in the test data package in excess of the actual test con- From the batch testing standpoint, no figuration was deleted frcm the test, but real problems were posed. The U1050-based its transactions were redirected for the test and the B3500-based tests could be run alternative input. From the network stand- concurrently to test capability to process point the test is somewhat easier. From the the combined workloads. From the terminal workload standpoint, the system is per- network and testing stand-point, at this forming the same test. This approach point all bets were off. The workloads of required no redevelopment of test materials, the two systems at each base are continued to support full vrorkload testing, independent. The mix of new combined and verified to the developers that we had a workloads and remaining single workload powerful, well-constructed test methodology. systems meant that workloads from separate categories were combined on a base-by-base

7. Conclusions

This paper has described how Phase IV decided that a remote terminal emulator would be used for its on-line workload tests and the major implementation decisions we had to make in supporting the use of the RTE. In addition to those mentioned above, we had to decide what to call our terminal input collections -- scripts as the Air Force does, or scenarios as the rest of the world does. We had to decide how fast a terminal operator types so we could include the correct operator time in our script time delays and in our constraints to the contractors. We had to weight the number of transactions that could be input and output at a device by the kind of device it was -- a terminal with a screen display displays output faster than a printer output terminal, and so can handle more transactions in an hour than the printer terminal. We had to decide how to present the scripts to the contractors, and decided that print image format on tape would be acceptable. And there were many other decisions like this, every day of the test development effort.
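The device weighting mentioned above amounts to simple arithmetic on input, output, and operator times; the sketch below uses invented rates rather than the Air Force's figures.

```python
def transactions_per_hour(chars_in, chars_out, type_cps, output_cps, think_secs=5.0):
    """Scripted capacity of one terminal: time to key the input, time for the
    device to render the output, plus a flat operator allowance per transaction."""
    secs_per_txn = chars_in / type_cps + chars_out / output_cps + think_secs
    return 3600.0 / secs_per_txn

# A screen terminal paints its output far faster than a character printer,
# so it can absorb more scripted transactions per hour.
print(round(transactions_per_hour(80, 400, type_cps=4.0, output_cps=960.0)))  # CRT terminal
print(round(transactions_per_hour(80, 400, type_cps=4.0, output_cps=30.0)))   # printer terminal
```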

We planned RTE use into our on-line tests the hard way. There was no GSA standard, so we had to create our own RTE specification, and justify it as a requirement. Just about everywhere there was a decision to be made, we made the decision that would give the Air Force the most visibility and the most control over the testing process. Usually this meant more work for us. But we have the users of 237 Air Force systems to answer to if things don't work out correctly.

In the end, we decided that we have a very good on-line workload test. It hasn't been run yet by the contractors, and that will be the final proof. We also have come to realize, over the course of thousands of hours of computer time and several years of development and refinement of RTE scripts, that use of an RTE is not to be undertaken lightly. The end requirement must completely justify not only the burden put on the contractor performing the test, but also the massive amount of time and money it takes to adequately prepare for its use.

177

Cost/Performance Evaluation - Methods and Experiences

179

AN EVALUATION TECHNIQUE FOR EARLY SELECTION OF COMPUTER HARDWARE

Bart D. Hodgins and Lyle A. Cox, Jr.

Naval Postgraduate School Monterey, CA 93940

There is a need for a decision making tool for use early in the computer selection process. Such early selection tools are critical to the decision maker due to the environment in which the manager of large scale procurements is forced to operate. The instruction mix technique was well known in the early years of computing. It was simple, easy to use and easy to understand. However, as the complexity of machines and support software grew, the technique was replaced by more sophisticated performance prediction tools. With certain modifications, this technique can continue to provide important information. By using a number of instruction mixes in a computer assisted environment, it is possible to find performance trends of hardware in the very early design stages. These trends can be used effectively to make selection decisions. The instruction mix sensitivity technique as demonstrated here has the potential to aid the decision maker in evaluating the performance of a system prior to the actual existence or availability of that hardware without resorting to costly and time consuming techniques such as simulation or modeling.

1. Introduction fore prototype hardware is available. Hardware selection must be made quickly There is a need for a decision and accurately. Errors cost time and mo- making/selection tool for use in the com- ney. Any delay caused by selection can puter selection process (particularly for have a ripple effect building through the those involved in the very large procure- entire process. Poor selection of the ments typical of our government). The in- hardware to be used as the basis for a struction mix sensitivity technique as system can result in cost overruns in oth- demonstrated here has the potential to aid er areas to compensate for the lack of ac- the decision maker in evaluating the per- ceptable hardware performance. At present formance of a computer prior to the actual there is no general method for computer existence or availability of that hardware hardware evaluation and selection com- without resorting to costly and time con- pletely suitable for use early in the pro- suming techniques such as simulation or curement cycle. Poor procurements are of- model i ng. ten made because the decision maker is forced to make a selection without the 1.1 Background benefit of having candidate hardware available. Similarly, selections of Large Automated Data Processing pro- equipments are often based on imprecise curements evolve through a cycle that of- and quantitatively vague ideas of the ac- ten last five to seven years. The selec- tual operational utilization that the sys- tion of computer hardware is forced to oc- tem will face years in the future. cur early in the procurement cycle. This long period of time from selection to There are several methods currently operational installation often necessi- being utilized for the evaluation of a tates procurement decisions to be made be- computer's performance. They include:

181 (1) benchmark programs which are existing ences between a computer's predicted thru- programs coded in a specific language, put over a collection of mixes represent- then executed and timed on a target ing various applications. The advantage machine [1], (2) kernel functions which of this technique is that neither the are typical functions partially or com- hardware of software need be completed -- pletely coded and timed [1], (3) simula- only the organization and technology need tions which are a combination of a model be determined. The eventual utilization of the system, model of the workload, and of the system need not be precisely de- a measurement of the resulting data [2], fined. This technique provides immediate and (4) analytic models which are evaluation results with a minimum expendi- mathematical representations of the target ture of time and money. machine [1]. These methods are all in use (principally in industry) to evaluate pro- Using the IMSET requires only that posed computer systems for procurement. the vendor furnish the performance specif- For smaller procurements these methods are ications regarding instruction execution effective because their acquisition cycle times. These performance specifications is shorter. In some cases it is even pos- are often available years in advance of a sible to wait until both hardware and prototype model. With these times, and software are available, and to use the the analysis technique presented here, the standard evaluation techniques in making a evaluator can compare the predicted per- specific computer system selection. formance of any hardware against the anti- cipated application. The particular In large acquisitions, selection must machines to be considered in the selection occur early, and unique problems arise need not be prototyped. that cannot be easily solved by the vari- ous evaluation techniques. Evaluation by The use of the IMSET as a tool for the benchmark program method is impossible evaluation provides the decision maker because the different machines are not al- with a profile representing the candidate ways available. Even if a prototype computer's average execution time for the hardware of a future system were available various applications presented in the set for evaluation, the benchmark programs and of instruction mixes. From the data the the kernel function methods often prove decision maker can select the hardware inadequate because the support and appli- with the best profile for the intended ap- cations required to validate the technique plication. For example, if the evaluator usually do not exist at that point. is looking for a machine to perform ac- counting functions then the selection When a selection must be made quick- would be based upon how sensitive each ly, there is usually neither the time, mo- candidate is to the mixes which represent ney, nor the sufficiently detailed design accounting and related functions. The information necessary to model /simul ate less sensitive the machine in terms of ex- the proposed computer systems. How can ecution time the more appropriate it would the selection be scientifically made? It be for selection, since this indicates is because of this problem that the in- that it can execute effectively a broad struction mix sensitivity technique (IM- spectrum of related functions. SET) has been developed. 2. Instruction Mix Sensitivity Technique 1.2 The Instruction Mix Sensitivity Tech- nique Overview 2.1 History

The instruction mix sensitivity tech- The instruction mix as a technique nique is based upon the older instruction for evaluating the performance of a mix method for predicting computer computer's hardware came into being in the hardware performance. In the instruction late 1950's and early 1960's. It evolved mix method a number was computed which as a result of the limitations of an ear- represented the average thruput of a par- lier technique for measuring a computer's ticular hardware. This number was based performance called the instruction execu- upon the relative usage of a given in- tion timing method. struction in a particular application, and its execution time on the evaluated The instruction execution timing hardware. Where the older method was method considered only a single class of based on a single mix representing a instructions. The instruction mix tech- specific application, the sensitivity nique incorporated along with the arith- technique evaluates trends in the differ- metic class, the logical class (i.e. COM-

182 PARE, SHIFT, MOVE, etc.), and in some in- 2.2 IMSET stances I/O and other miscellaneous in- structions. Associated with each instruc- The variation of the instruction mix tion in the mix was the percentage of use technique described here is called the in- of that instruction, called a weighting struction mix sensitivity technique (IM- factor unique to that particular mix. SET) [11]. The IMSET uses a set of ten This weighting factor represented the ap- instruction mixes chosen from over 20 can- proximate probability of occurrence of didates [11] to represent a wide range of that instruction in the programs to be computer applications, spanning from used on the machine. For instance, in a real-time thru scientific to business com- scientific instruction mix one would find putations. Utilization of the IMSET pro- that the percentage of floating point mul- vides the evaluator with a profile tiplications would be higher than the per- representing a hardware's execution times centage for that same instruction in a across all mixes in the set (and hence data processing instruction mix. The pro- across a broad spectrum of applications). babilities in an instruction mix were The comparative profiles of execution determined by either statically or dynami- times provide the decision maker with a cally tracing the programs representative quantitative prediction of how sensitive of specific applications. each computer is to the various mixes and hence how the system would perform over a The instruction mix technique is easy wide range of applications. This is in to apply. By multiplying the execution contrast to the instruction mix technique time of each instruction by the weighting which only provided the evaluator with a factor and summing, one obtains the time thruput evaluation on one mix -- one ap-

required to execute an average instruction pl i cat ion. for that particular mix on that particular computer. This average time can be in- The IMSET uses eighteen functional versely related to a thruput rate in instructions which constitute the basis instructions-per-second (IPS). These to- for evaluation. These include seventeen tals can then be compared with similar specific instructions and one "I/O miscel- rates obtained from other machines, to laneous" category. The eighteen function- give an idea of relative CPU thruput. al instructions and ten mixes which con- stitute the IMSET are shown in Table I. Mixes for many applications have been developed. The most popular of all mixes To use IMSET it is necessary to was the Gibson Mix [3] developed by Jack determine the execution times of the 18 C. Gibson in 1959 on data obtained on the instructions for each central processor to IBM 7090 computer. The Gibson Mix was be evaluated. considered a general -technical mix. There were other similar mixes [4, 5, 6, 7, 8, A detailed discussion of the develop- 9, 10] for data processing, navigation, ment of this technique and the details of scientific, and a myriad of other applica- calculating hardware timing figures are tions. discussed further in Hogin's report [11]. While particular care must be paid to As computer hardware and software determining the timing figures, the pro- technology advanced, especially as systems cedure is not difficult. moved into a multiprogramming environment, it soon became apparent that using a sin- Of particular interest is the han- gle instruction mix to evaluate absolute dling of concurrency in modern hardware. performance was no longer adequate. Among In the standard application of the in- the shortcomings of the technique was its struction mix technique many special failure to account for differences in ad- features such as concurrency of a central dressing modes, word sizes, and operand processor's hardware were ignored. For lengths. The effect of system software example, some processors begin execution upon the mix weights was difficult to as- of a second instruction before the current sess. Perhaps the biggest disadvantage instruction has finished its execution. was the problem of how to validate the in- This allows effective execution times to struction mix to insure that a particular be cut significantly. The IMSET presented mix accurately reflected the intended ap- here takes into account the overlap capa- plication. How can the instruction proba- bilities of the machines being evaluated bilities be determined if the programs by applying a "Knuth Factor". This idea representing the eventual workload have was provided by [12], and is described in not yet been written? [11]. The Knuth Factor compensates for

TABLE I

IMSET FUNCTIONAL INSTRUCTIONS AND MIXES

[Table I lists the eighteen functional instructions, grouped as I. Arithmetic (fixed point, single and double precision; floating point, single precision), II. Logical (COMPARE, SHIFT, MOVE, etc.), III. Control (BRANCH), and IV. I/O Miscellaneous, against the ten IMSET mixes: Process Control, Message Processing, Real Time, Communication Control, Data Compression, Navigation, TLM Thruput, General Technical, Scientific, and General Composite. The individual weighting factors are not legible in this copy; an asterisk marks a functional instruction to which a mix assigns no weight.]
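As a rough illustration of the mix arithmetic described in this section (weight each instruction's execution time by its probability of use, sum, invert to an instructions-per-second rate, and optionally scale for overlap in the spirit of the "Knuth Factor"), here is a minimal sketch. Every number in it is hypothetical, since the Table I weights are not legible in this copy.

```python
def avg_instruction_time(exec_times_us, weights, overlap_factor=1.0):
    """Weighted-average instruction time in microseconds for one mix.

    exec_times_us  -- per-instruction execution times for the processor
    weights        -- the mix's weighting factors (approximate probabilities)
    overlap_factor -- optional scale-down (< 1.0) for processors that overlap
                      instruction execution, in the spirit of the "Knuth Factor"
    """
    total = sum(weights[op] * exec_times_us[op] for op in weights)
    return total * overlap_factor


def thruput_ips(avg_time_us):
    """Invert the average time per instruction to an instructions-per-second rate."""
    return 1e6 / avg_time_us


# Hypothetical timings and weights, for shape only (weights sum to 1.0).
times_us = {"FIX_ADD": 1.0, "FLT_MUL": 6.0, "COMPARE": 1.2, "BRANCH": 0.8, "IO_MISC": 12.0}
scientific_mix = {"FIX_ADD": 0.10, "FLT_MUL": 0.25, "COMPARE": 0.20, "BRANCH": 0.30, "IO_MISC": 0.15}

t_avg = avg_instruction_time(times_us, scientific_mix, overlap_factor=0.9)
print(round(t_avg, 2), "microseconds/instruction ->", round(thruput_ips(t_avg)), "IPS")
```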

184 the machines which have overlap or paral- computers is presented in [11]. lel processing abilities by scaling down their execution times by an amount compar- The demonstration to obtain the pro- able with the typical use of this feature files of all hardwares chosen versus the by most programs and compilers. set of ten mix applications of the IMSET was conducted on the computerized evalua- 2.3 Some User Notes tion system. Three sample profiles are shown in Figures 1 through Figure 3. Fig- The I/O instructions omitted from ure 4 is the composite of all the micros. many mixes caused problems with early Table II provides a key for the instruc- evaluations. I/O instructions are a mix- tion mixes listed by letter for each of ture of peripheral capability and a cen- the computer profiles. tral processor capability. The mixes presented here include the I/O instruc- When analyzing the profiles it is im- tions in the miscellaneous category rather portant to remember that the purpose of than as a specific instruction. In this the IMSET is to compare a machine's execu- way the central processor's ability to tion time sensitivity between applica- handle I/O is treated as an average over tions, not only its estimated effective all of the I/O instructions without having execution speed for any one application. to specify an exact instruction or partic- The sensitivity between applications is ular device. determined by comparing the times of exe- cution -- the performance trends. Consider It should be pointed out that the in- the following example: struction mix was a tool to be used prin- cipally for the comparative evaluation of the central processor hardware. The way the central processor is configured with Table II other system components such as storage devices and other I/O and peripheral dev- KEY TO THE INSTRUCTION MIXES PRESENTED

ices must be considered separately. This IN FIGURE 1 THROUGH FIGURE 4 characteristic of the instruction mix technique carries over to IMSET. LETTER INSTRUCTION MIX

The user must be aware that the a Process Control software associated with the system in- cludes the operating system, language pro- b Message Processing cessors, in addition to applications pro- grams. All these have an impact upon c Real Time overall performance. The mixes in IMSET reflect all types of computation, making d Communication Control it possible to consider performance in both system software and application e Data Compression software areas. f Navigation 3. A Demonstration of IMSET g TLM Thruput In the final stage of the IMSET development process six micro-computers h General Technical were selected for competitive evaluation

to demonstrate the strength of the IMSET i Scientific when used to evaluate machines closely re- lated in characteristics. This was in- j General Composite tended to reflect an actual evaluation for selection situation. The micros selected are all 8-bit or 16-bit machines ranging from some earlier models to some much more recent ones. Those selected are the ZILOG Suppose we wish to select a micro- 8000, INTEL 8086, MOTOROLA 68000, DIGITAL computer whose principal purpose is ex- EQUIP. CORP. LSI 11/23, and the TEXAS pected to be the processing of both navi- INST. 9900. gation and telemetry applications.

Determination of the individual in- We first screen all candidate struction times for each of the micro- machines to verify that they 1) meet our

FIGURE 1. DIGITAL EQUIPMENT CORP LSI 11/23 [sensitivity profile: average time per instruction (microseconds) plotted for instruction mixes a through j; the plotted values are not legible in this copy]

FIGURE 2. ZILOG 8000 [sensitivity profile: average time per instruction (microseconds) plotted for instruction mixes a through j; the plotted values are only partially legible in this copy]

FIGURE 3. MOTOROLA 68000 [sensitivity profile: average time per instruction (microseconds) plotted for instruction mixes a through j; the plotted values are not legible in this copy]


Figure 4. Composite sensitivity profiles of microcomputer hardwares (1) ZILOG 8000, (2) INTEL 8086, (4) MOTOROLA 68000, (5) TEXAS INSTRUMENTS 9900, (6) LSI 11/23.

minimum absolute performance requirements and 2) conform to other system requirements such as availability of support software, peripheral compatibility, etc. For those machines which meet these criteria, we apply the IMSET procedure. The micro-computer selected should present the smallest change between the two applications (since the eventual percentage of workload is not known for the two components).

For our example, we will assume that the IMSET technique will be used to decide between the Digital Equipment Corp. LSI 11/23 and the Motorola 68000 for our navigation and telemetry problem. The IMSET results are:

Table III

IMSET Demonstration Results

                     Avg. time/inst. (microseconds)    Sensitivity
PROCESSOR            NAV Mix          TLM Mix          (per cent)

Motorola 68000       10.8             8.6              26
DEC PDP 11/23        11.3             6.2              84

This analysis suggests that the MOTOROLA 68000 might be the more preferable micro-computer due to its lower sensitivity (more uniform performance) to the two applications.

If we were certain that the principal function of the system was telemetry, that the navigation functions to be performed are minimal, and that this balance between the functions would never change significantly during the lifetime of the system, the DEC 11/23 would be the better choice due to its greater thruput for telemetry applications. In the early stages of system design we are seldom sure of all these things. Averaged over both applications, the performance of the two candidates is roughly equal. However, as long as we cannot be certain of what the workload proportions will be between navigation and telemetry, it would be prudent to select the machine whose performance is less sensitive to fluctuations in the job mix, in this case the Motorola 68000.

This is a very simple example. The IMSET is capable of being used in much more challenging evaluation situations with more complex workloads and complicated computer systems. It does, however, demonstrate its utility and ease of use.

5. Conclusions

We have introduced a method, the IMSET, with which the decision maker can quickly and efficiently select computer hardware from a number of candidates. One simple example of the application of this method was presented and profiles of actual hardwares were shown.

Validation of the mixes vs. applications when using the IMSET for evaluation is not the problem it was for the instruction mix method. When evaluating by the IMSET method, particular sensitivities within a broad area of intended use are being measured, whereas with the instruction mix method, execution time of a specific application was being estimated. Thus, validation is not an issue when utilizing the IMSET.

The advantages to using the IMSET over other currently used techniques are tremendous. As mentioned above, when early selection is required, the decision makers are often not able to utilize many of the more sophisticated and proven methods. With the IMSET the decision maker needs only the manufacturer's projected instruction set execution times. With these times it is possible to obtain the evaluation data within a matter of hours and at minimal cost. This technique provides a savings in time, savings in money, and greater confidence in the selection. Perhaps its most attractive advantage is its ease of use.

The data provided by IMSET can be integrated easily into more comprehensive selection techniques. While a discussion of this aspect is beyond the scope of this paper, it is easy to see how IMSET could be used in specifying or selecting systems when used in the context of the Cost-Value Selection Technique [14], or similar methodologies.

The strength of the IMSET as an evaluation tool lies in the fact that it is able to be applied in the absence of available hardware and specific knowledge of intended application. It is very important that a tool such as the IMSET be an integral part of any decision making process affecting the procurement of computer systems in the future.
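A minimal sketch of the comparison walked through above. The sensitivity formula used here (percent spread relative to the faster of the two mix times) is our inference from the Table III numbers, which it approximately reproduces (26% and roughly 84%); it is not stated explicitly in the paper.

```python
def sensitivity_pct(time_mix_a, time_mix_b):
    """Spread between two per-mix average instruction times, as a percentage
    of the smaller (faster) of the two."""
    lo, hi = sorted((time_mix_a, time_mix_b))
    return 100.0 * (hi - lo) / lo


candidates = {
    "Motorola 68000": {"NAV": 10.8, "TLM": 8.6},
    "DEC PDP 11/23": {"NAV": 11.3, "TLM": 6.2},
}

for name, t in candidates.items():
    print(name, round(sensitivity_pct(t["NAV"], t["TLM"])), "% spread between NAV and TLM")

# The less sensitive (more uniform) candidate across the two applications:
best = min(candidates, key=lambda n: sensitivity_pct(candidates[n]["NAV"], candidates[n]["TLM"]))
print("prefer:", best)
```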

REFERENCES

1. Lucas, H. C., Jr., "Performance Evaluation and Monitoring," Computing Surveys, V. 3, No. 3, p. 79-90.

2. Svobodova, L., Computer Performance Measurement and Evaluation Methods: Analysis and Applications, p. 52-56, American Elsevier, 1976.

3. IBM Technical Report TR00.2043, The Gibson Mix, by J. C. Gibson, p. 2, 18 June 1970.

4. Corsiglia, J., "Matching Computers to the Job - First Step Towards Selection," Data Processing Magazine, p. 23-27, December 1970.

5. Arbuckle, R. A., "Computer Analysis and Thruput Analysis," Computer and Automation, V. 15, No. 1, p. 12-19, January 1966.

6. Flynn, M. J., "Trends and Problems in Computer Organization," Information Processing 74, p. 3-10, 1974.

7. Knight, K. E., "Changes in Computer Performance," Datamation, V. 12, No. 9, p. 40-54, September 1966.

8. Raichelson, E. and Collins, G., "A Method for Comparing the Internal Operating Speeds of Computers," CACM, V. 7, No. 5, p. 309-310, May 1964.

9. Weitzman, C., Distributed Micro/Minicomputer Systems, p. 18-19, Prentice-Hall, Inc., 1980.

10. Honeywell Inc., St. Petersburg, Fla., Aerospace Div., ESD-TR-76-295, Secure Communications Processor Selection Trade Study, by C. H. Bonneau, March 1976.

11. Hodgins, B. D., "A Computer Evaluation Technique for Early Selection of Hardware," Naval Postgraduate School, December 1980.

12. Computer Science Department, Stanford University, Report CS-186, An Empirical Study of FORTRAN Programs, by D. E. Knuth, p. 32, 1970.

13. Eckhouse, R. H., Jr. and Morris, L. R., Minicomputer Systems, 2nd ed., p. 138-139, p. 142-143, Prentice-Hall, 1979.

14. Barbour, R., Holcombe, J., Harris, C., and Moncrief, W., "Applications and Limitations of the Cost-Value Technique for Competitive Computer Selection," Proceedings, Computer Performance Evaluation Users Group 15th Meeting, 1979.

189


DEFICIENCIES IN THE PROCESS OF PROCURING SMALL COMPUTER SYSTEMS: A CASE STUDY

Andrew C. Rucks

Peter M. Ginter

University of Arkansas College of Business Administration Fayetteville, AR 72701

The Latin phrase caveat emptor is nowhere more applicable than in the practice of acquiring computer systems for small business and small government entities. In a recent issue of Business Week, it was estimated that approximately 50 percent of the purchasers of small computer systems are dissatisfied with their systems. While many computer users are litigating vendors for redress of losses caused by insufficient data processing capability, this does not address the major failings of the process of procuring small systems. The small system procurement process produces a high rate of failure because many buyers inadequately analyze their data processing needs and fail to express these needs to potential vendors in a manner that will lead to the procurement of need-satisfying computer systems.

Key words: Computer procurement; proposal evaluation; RFP preparation; small systems.

1. Introduction The RFP that is the basis for this paper was issued by the County Commission of Many small organizations have experi- Escambia County, Alabama in December 1980. enced dissatisfaction with procured computer Proposals were submitted to the Commission systems [l]. Ample evidence of the frustra- by three vendors. These vendors are not tion experienced by small organizations is identified in this paper since to do so found in recent litigations brought against would be in variance with disclosure restric- computer vendors [2,3]. A recent procure- tions specified in the proposals. ment undertaken by the County Commission of Escambia County, Alabama offers a typical 2. Critique of the RFP example of the major cause of user dissatis- faction — a poorly developed solicitation The Escambia County RFP is typical of document. The inadequacy of the Escambia proposals for which far too little data County solicitation document was precipita- processing needs analysis has been performed ted by a lack of technical expertise and a and too little computer procurement exper- general reluctance to obtain assistance tise has been utilized. As a result, the that could have contributed to the planning RFP failed to precisely express the County and execution of a successful procurement. Commission's requirements to potential The purpose of this paper is to (1) discuss vendors. The Escambia County RFP suffered a poorly prepared request for proposal (RFP) from three major deficiencies; (1) perfor- (2) indicate the probable consequences of mance specification vagueness, (2) require- issuing an inadequate request document and ments contradictions, and most importantly (3) offer a simple but effective RFP out- (3) a lack of functional requirements. The line tailored for small organization procure- RFP was divided into seven major sections in- I ments . cluding: —General Conditions, II—Vendor

191 Support, III— Escambia County's Responsibi- 2.1 Section I —General Conditions lities, IV— System Requirements, V— Soft- ware Requirements, VI—Form of Surety Gua- All three of the deficiencies suggested ranty, and VII—Appendix (see table 1). The above were apparent in Section I of the major sections of the RFP will be reviewed Escambia County RFP. For instance, the RFP briefly in order to delineate the proposals suggested "vendors may be required to demon- debilitating weaknesses. strate the capability of their proposed equip- ment and system software." Yet there was little in the RFP to suggest the actual capa- Table 1. Contents of the Escambia County city requirements. Also in this section, the RFP RFP required vendors to "submit his best and cheapest proposal." This requirement was Page rather contradictory and provided little in the way of a meaningful statement of expecta- I. GENERAL CONDITIONS 1 tions. Additionally, the Escambia County RFP confounded vendors' proposals by demanding

1. Purpose 1 delivery of a "garbage collection service billing and collection system" never 2. Closing Date 1 but itemized specifications in Section V Soft- 3. Vendor Proposals 1 — ware Requirements. The County also 4. Demonstrations 1 weakened 5. Acceptance of Proposal considerably its negotiating position by Content 1 indicating that it desired to consider only lease-purchase plans. 6. Rejection of Proposals .... 2 procurement The RFP 7. Incurring Costs 2 stated "Escambia County prefers a three-year, 8. Vendor Contract Respon- lease-purchase plan (lease with option to sibilities 2 purchase)." This requirement precluded vendors from offering alternative methods of 9. Equipment Expandability ... 2 acquisi- tion. 10. Delivery and Installation . . 3 11. Site Preparation 3 II 12. Type of Contract 3 2.2 Section —Vendor Support 13. Vendor's Guarantee 3 The vendor support section of the Escambia County RFP likewise left II. VENDOR SUPPORT 5 considerable to the Imagination in terms of specificity and clarity. The III. ESCAMBIA COUNTY'S RESPONSIBI- RFP called for "vendors... to supply both tested hardware and software" LITIES 6 but failed to describe what tests were re- quired. This section also contained such IV. SYSTEM REQUIREMENTS 7 vagaries as: "the vendor is requested to , furnish maintenance within four hours of 1. System Summary 7 being called on an 8-hour day basis..." and 2. Equipment Requirements .... 7 "The vendor would be required to train the 3. Equipment Operating Software county's personnel..." Requirements 8 4. System Installation 8 2.3 Section IV— System Requirements V. SOFTWARE REQUIREMENTS 9 Section IV was potentially an important section con- A. Financial Management part of the RFP and yet this Information System 9 tained specifications which were contradictory with the general conditions specified in 1. Vendor Accounting .... 9 Section I. Moreover Section IV contained 2. Payroll Accounting .... 10 statements such as "only equipment currently 3. General Ledger and in production shall be proposed" which left Budgetary Accounting . . 11 for a of vendor inter- B. Property (Real and Personal) the door open variety pretations . fax Billing and Collection . . 13

Within one subsection of Section IV was VI. FORM OF SURETY GUARANTY ..... 15 an attempt to specify the operating software for the system to be procured. The RFP speci- VII. APPENDIX 17

192 .

fied a "disk-operating system." The meaning 3.0 Vendor Responses of this requirement was never clarified. Additionally, programming languages and util- As a result of RFP specifications which ity software were treated with equally vague were often contradictory and much too general, and non-specific language. For example the the task of evaluating the actual requirements RFP stated: of Escambia County was subject to a variety of interpretations. This vagueness provided The vendor shall provide vendors an opportunity to interpret the RFP interpreters and/or compilers according to their own equipment and soft- considered to be the prime ware availability. In addition, vendor pro- programming language for the posals could legitamately contain vagaries equipment offered. These which seemingly satisfied the purchaser's programming languages shall needs. A summary of the vendor's responses be the latest and most up- is provided in table 3. to-date versions available. Also, sort and utility routines 4. Proposed Remedies to support the system shall be supplied by the vendor. The incompleteness, vagueness, and inconsistency of the Escambia County RFP 2.4 Section V— Software Requirements precipitated the delivery of proposals that were incomplete and impossible to compare Applications software specifications in terras of system costs. It was difficult were a consistent and persistent problem to make a cost comparison because each of the with the Escambia County RFP. Application vendors chose to interpret at least one of the software was mentioned in three sections of "may include" clauses in the RFP to mean the RFP and these references were not con- "not required", therefore, incongruous sys- sistent in terms of identifying the required tems were proposed. As a result the Escambia application software. Table 2 illustrates County Commission was strongly urged to with- these inconsistencies. It is interesting draw, rework, and resubmit its RFP. to note that even though "Voter Registra- tion" software was mentioned in the Soft- 4.1 RFP Outline for Small Organizations ware Requirements section of the RFP, a statement of requirements for this software An RFP becomes the basis for dealings package was not included in the RFP. The between the organization acquiring a computer property tax system was the only applica- system and computer system vendors. Simply tions software item mentioned in all three stated an RFP discloses to potential vendors applications software references. thewho, what, where, when and how of a pro- curement action. Thus an RFP can be divided into as few as five major sections including: Table 2. Application Software References in (1) procuring organization, (2) contract terms the Escambia County Alabama RFP and conditions, (3) delivery schedule, (4) system specifications and, (5) operating Application Software RFP Paragraph References environment Title 1-10* IV-1** V*** 4.1.1 Procuring Organization Financial Management Information System yes no yes This section of an RFP should be used to provide vendors with information con- Property Tax Billing cerning the business name and address of the & Collection System yes yes yes organization issuing the RFP. The organiza- tion's agent (s), contracting officer, pro- Garbage Collection curement manager and other responsible indi- Service Billing & viduals and their responsibilities should be Collection System yes yes no listed. 
It may be desirable to include a brief description of the mission of the organi- Voter Registration no yes yes zation and its history.

Automobile Liscense 4.1.2 Contract terms and conditions Records no yes no This section of the RFP delineates all *Section title—Delivery and Installation procurement and contract administration pro- **Section title— System Summary cedures. Potential vendors must be told when ***Section title— Software Requirements proposals are due, the form of proposal sub-

193 Table 3. Summary of Vendor Responses

RFP Para. Vendor I Vendor II Vendor III

I-l Concurrance assumed Concur Unknown

1-2 Concurrance assumed Concur Unknown

1-3 Concurrance assumed Concur Unknown

1-4 Concurrance assumed Concur Unknown

1-5 Questionable concur- Concur Unknown rence

1-6 Concurrance assumed Concur Concurrance assumed

1-7 Concurrance assumed Exception: Escambia Unknown Co. to incur all "liv- ing/traveling cost" associated with a demonstration

1-8 Concurrance assumed Concur: includes Unknown copies of warranties and performance guarantee

1-9 Qualified concurrance Qualified concurrance Unknown

I- 10 (a) Concurrance assumed Concur Concurrance unknown, but assumed

(b) Exception: 120 days Exception: 240-360 Unknown after award (DAA) DAA

(c) Concurrance assumed Exception: implemen- Unknown tation by Oct. 1, 1981

(d) Concurrance asstuned Exception: implemen- Unknown ted by Oct. 1, 1981

(e) Concurrance assumed Concur Unknown

I-ll Concurrance assumed Concur Unknown

1-12 Concur: No cancellation Concur: Cancellation Exception: does not penalty at end of years penalty-the lesser of offer a lease with 1 and 2 (unknown in third 20% of remaining con- option to purchase

year) ; 50% lease payment tract value or 4 (LWOP) plan-offers credited to purchase, soft- times the montly "Government Instal- ware not included in lease charge. No lment Sales Plan" lease agreement penalty charged if that calls for 10% converted to purchase. down and the remain- Maximum price increase der financed over under contract 5% 36 or 60 months. per year Requires Escambia County to sign a "Non-Funding Clause" which disallows termination of the

194 .

Summary of Vendor Responses (cont.)

RFP Para. Vendor I Vendor II Vendor III

contract, except when the county does not have suffi- cient funds in a fiscal period to pay the contract amounts. The clause precludes Escambia County from securing replacement services, except from thisvender curing the life of the contract.

1-13 Concurs with perfor- Concurs with perfor- Unknown mance bond provision. mance bond provision. Exception to require- Exception same as ment of specifications Vendor I and vendor responses becoming part of con-

tract .

II Proven hardware and Concur on tested Proven hardware and operating software hard-ware and soft- operating software supplied. An account ware. A Marketing will be supplied, manager will be Representative and the quality of assigned, the qualifi- a Systems Engineer application soft- cations of this person will be assigned ware is unknown were not supplied. as Project Man- Does not address

The number and type of ager (s) . Some project manager or "Customer Service categories of System systems analyst sup- Representatives" assign- Engineer support port. No preventive ed to the county is offered at no maintenance offered. determined by the charge, other types "Same day" response manager. Telephone of service offered time offered, but support provided. No at the rate of $68.00 no cost stated. charge for support ser- per hour. On call vices. Concurrance as- maintenance service sumed with provision of available on a 24 backup facilities. hour per day, 7 day Page 3 of Appendix A per work basis for indicates that major system compo-

Covington Co. , is a nents and 7 a.m. to comparable installa- 6 p.m. Monday tion. However, does through Friday not indicate if back- for minor systems up can be obtained. components. No Preventive mainte- added charge for nance performed leased systems, monthly. On call except for the maintenance with 4 minor system hour response time components servi- between 8 and 5 ced outside the (assumed to be 8 a.m. afore mentioned to- 5 p.m.). Whether limits which will or not this applies be billed at a 7 days per week is fee to be deter- unknown. Parts depots mined. No re- in Montgomery and sponse time

195 13 ) . . .

Summary of Vendor Responses (cont.)

RFP Para. Vendor I Vendor II Vendor III II Mobile. Concurs with guarantee is offered. (cont. training of country Concurs with training personnel needs

III Concurrance assumed Concurs Unknown

IV- Concurrance assumed Concurs Concurrance assumed by Virtue of a proposal being submitted.

IV- Master Control Program System Support Interactive Multi- COBOL Program RPGII programming Opera- RPL compiler ting System COBOL Compiler

IV- 4 No promised delivery Unknown schedule

V-A 1. General Ledger and Financial Management Financial Management Budgetary Accounting Information System Information System

V-A 2. Payroll Same as above Same as above

V-A 3. Same as V-A 1. Same as above Same as above

V-B Property Tax System Property (Real and Tax Collection Personal) Tax System Billing and Collection System

mission, the procedure for contract award, of a complete system. As a complete system the procedure for conducting face-to-face specification, this section of an RFP is negotiations, the penalties to be imposed divided into five subsections: (1) hard- > if the vendor fails to deliver any or all ware specifications, (2) software specifi- of the items specified in the RFP, the cations, (3) system performance specifica- procedure to be followed for terminating tions, (4) maintenance requirements, and a contract, and the procedure for resolving (5) training requirements. disputes arising from the contract. In summary the contract terms and conditions Hardware specifications should be section of an RFP presents all of the legal stated in functional terms. In other words, understandings among the parties. the procuring organization should not place itself in the position of specifying system 4.1.3 Delivery schedule architecture to vendors. Included in hard- ware specifications are the desired charac- The delivery schedule should contain a teristics of terminals, printers, secondary list of all of the items to be delivered storage devices, etc. Since it is unlikely under the contract—not only hardware and that a live test demonstration (LTD) will software", but also training and manuals, be conducted as part of the pre-award pro- posal evaluation process, it is particularly etc. ; and the date by which the items must be delivered. It is customary to express important to be specific concerning the delivery dates in terms of days-after-con- type and quantity of secondary storage. tract award (DAA) Software specifications fall into two 4.1.4 System specifications categories: system software and applica- tions software. The system software to be This section informs potential vendors delivered includes: an operating system, of the procuring organization needs in terms system utilities, text editor(s), programming

196 language interpreters and compilers, and pro- quired and have vendors provide a list of gramming support utilities (e.g. link/editor). dates and locations for the desired train- For small systems, the description of these ing. items should be general and functional in nature. 4.1.4 Operating Environment

The major emphasis in developing the This section discloses to vendors the system specifications section of the RFP characteristics of the site where the sys- should be devoted to providing vendors with tem will be installed. It shouJ.d contain detailed descriptions of the applications detailed drawings of the site with dimen- software that is to be delivered. This sions that will indicate to vendors such description should include as much detail important considerations as the length of as possible concerning the purpose of the cabel runs. Other important information software, the quantity and description of includes air conditioning and electrical input data, the quantity, frequency, and system capacity. description of all reports to be generated by the software system. References

The majority of procurements of large scale computer systems include provisions for an LTD. An LTD is used to determine if the data processing capacity of a proposed system is sufficient to meet the intermediate term (five to eight years) needs of the procuring organization. However, this is generally not feasible for small system acquisitions. The expense of preparing and conducting an LTD makes it impractical for small system procurements. The procuring organization may achieve some protection from insufficient capacity by requiring a "proof of capacity period" after delivery during which system capacity can be tested.

[1] Computers: A burst of critical feedback. Business Week, February 2, 1981, pp. 68-71.

[2] Laberis, B., Judge OK's $2.3 million award to NCR user, Computerworld, May 25, 1981, p. 11.

[3] Davis, L., Burroughs admits to 160 suits, MIS Week, Vol. 2, No. 29, July 22, 1981, pp. 1+.

Maintenance and vendor support are perhaps the two most critical elements to be measured in the procurement of small computer systems. Since most small organi- zations do not have personnel with the tech- nical background to repair either the soft- ware or hardware, the procuring organization must rely upon the vendor to provide these critical services. Therefore, it is impor- tant to precisely state requirements for maintenance of hardware and software. On- site maintenance, common for large instal- lations, is impractical for small systems, therefore, the user of small systems must decide between on-call and per-call main- tenance services. A good approach to this problem is to specify the temporal require- ments for vendor response to service calls and then request that the vendor provide the organization with descriptions of their offerings in these areas.

Vendor training of the procuring organi- zation's personnel must be specified in the RFP. The organization must decide the scope of training required and the location of training. The most frequently followed scenario is to identify the training re-

197

COST/PERFORMANCE COMPARISONS FOR STRUCTURAL ANALYSIS SOFTWARE ON MINI- AND MAINFRAME COMPUTERS

Anneliese K. von Mayrhauser Raphael T. Haftka

Illinois Institute of Technology Department of Computer Science Chicago, IL 60616

Due to an ever-increasing selection of machines and widely different charging algorithms, the question often arises "on which machine does a program run most efficiently and/or cheaply". The program's behavior, i.e., its resource demands on the machine under consideration and their impact on charges, needs to be explored, as well as the accuracy and correctness of the program's results and the reliability of the computer service. Human factors influence decisions as well. We will discuss the selection of computer services from a user's point of view with an emphasis on structural analysis software. However, the data collection and evaluation procedure, which analyzes cost and performance relevant factors and their relationship with each other, can also be applied to other software. The decision making process uses a set of automated tools which we developed.

Key words: Performance comparison; program behavior; structural analysis.

1. Introduction programs to be run on machines under con- sideration because they are more flexible A number of studies have described and avoid conversion problems. Since a user approaches for machine selection from a com- is also interested in the correctness of re- uter center management perspective sults of those programs on all machines,

([1]-C4],C7]) . Other work has taken the executable models of them will not do. user point of view, but has only examined Bell ([6]) reports that experimental com- measurable technical factors such as re- parisons are likely to be either very crude ponse time and transmission rates ([8], [9]). or very expensive. One of the biggest prob- In assessing the cost-effectiveness of lems is that performance indices, resource computers or computer services for a partic- consumption and therefore charges can vary ular application task, these methods turn for the same job, due to differing workload out to be less than useful, because they do situations which the user is unable to con- not deal with human factors adequately and trol. This variability factor should be they leave little room to compare different assessed and taken into account when select- software packages which could be used to ing a computer service installation. Usually perform the application tasks with respect a more stable, predictable job cost is to be to their quality on the various machines preferred- but this depends on the charging under study. algorithm employed as well as on the machine characteristics. Because of these limitations Morgan and Campbell ([5]) suggest to as well as a limited budget for the study, construct an executable model of the set of accurate performance studies for some

199

problems were augmented by more informal approaches, such as a general experience with systems, studying installation statistics, or interviewing users, and limited model building for some programs.

One question which has been interesting the scientific computing community in the last few years is that of the cost-effectiveness and viability of minicomputers versus main frames. Minicomputers promise cheaper, more widely available computing facilities, but they pose many problems, particularly to those with large calculations in mind. The smaller main memory means that users have to make more use of disk I/O. Many minicomputers have a smaller word size with a devastating effect on accuracy. Storaasli and Foster ([10]) report 4-digit accuracy on a PRIME 400 for a medium-sized problem as compared to 8-13 digit accuracy on a CDC CYBER 173. Longer elapsed times leave more room for machine hardware failures during a run. Pearson et al. ([11]) report some theoretical calculations required twenty-four hours or more, but the mean time between failures was also approximately twenty-four hours. Lack of adequate software libraries, debugging facilities, documentation and operations staff often hampers users of minicomputers. Some studies ([11]-[14]) show that based on machine cost alone minicomputers may be two to four times more cost-effective than main frames. The users of structural analysis software on minicomputers have conducted several studies that point to the economic advantages of using such machines (e.g., [10], [15]). These are based on data gathered by running structural analysis programs for a limited set of problems and machines.

When assessing the cost-effectiveness of services, the following steps should be followed:

(a) Set up of evaluation experiment. This involves designing a representative set of problems which are to be solved by the computers. The software packages and the computers/computer services also need to be determined. These are the input variables. Output variables, e.g. the class of performance indices and other relevant information, need to be determined as well. This will influence the relevant data to be collected during the measurement phase.

(b) benchmark and data collection and their analysis. The set of problems is actually run on some of the machines and technical and human performance factors are observed. The raw performance data are processed and transformed into more concise information which enables their interpretation with respect to relevant performance indices.

(c) comparison, trade-off, evaluation. Finally the quality (or lack thereof) of the various choices is assessed with respect to all performance indices. This may be done by interpreting actual measurement data, or by interpolation and/or extrapolation of measured data, or estimation of system performance through the use of results for another system.

As an example, let us take a user of structural analysis software. If he wants to assess the cost-effectiveness of a minicomputer versus a mainframe, the previously mentioned studies need to be extended in several directions. First, to use a wider mix of programs and a more representative mix of problems for the comparison. Second, to include in the study several machines so that the results do not reflect only the charging algorithm at a particular installation. Last, to further assess the human factors involved in such a comparison.

2. Experimental Design

Since the overwhelming majority of structural analysts use finite element methods, the present study is limited to such programs. By judicious choice of the type of problems and the type of structural analyses employed, one can obtain a reasonably reliable comparison. The following are the elements in such a comparison.

2.1 Most commonly used types of analysis

(i) Linear static solution for displacement and stresses
(ii) Linear eigenvalue analysis - calculation of buckling loads and vibration modes
(iii) Nonlinear response due to large deformations and material nonlinearities

2.2 Type of problem

(i) A simple cantilever beam
(ii) A plate with a hole
(iii) A stiffened cylinder

These problems are solved for the different analysis types. Several models are used for each problem, ranging from a very

200

crude model to a refined one, with varying degrees of freedom ranging from a few dozen for a crude model to more than a thousand for the refined one.

2.3 Computer programs

Three computer programs are used; these programs are:

1. SAP IV - A general purpose finite element program developed at the University of California at Berkeley; it is probably the most widely used "free" (it costs $200) finite element code.

2. SPAR - A general purpose finite element program developed by W.D. Whetstone which serves as a prototype of a commercially developed code.

3. TWODEL - A special purpose finite element program developed by D. Malkus at IIT for large deformation analysis of two dimensional elasticity problems. It is representative of in-house codes.

2.4 Computers and charging algorithms

Our study on the cost-effectiveness of minicomputers and mainframes from a user's point of view uses the charging algorithms in several installations as the measure of the cost of running any of the selected problems. In most cases the charging algorithms are devised to recover the total cost of owning and operating the system, and in some cases to show profit.

Most of the runs were performed at the computer center at the Illinois Institute of Technology. The school operates its own PRIME 400 minicomputer and buys computing services on a UNIVAC 1100/81 from the United Computing System Corporation. A few runs were also made on the CDC CYBER 176 owned by a large manufacturing firm, and on the UCS CDC CYBER 176 and on the VAX 11/780 owned by the IIT Research Institute. The performance on these computers was used to predict costs on a few additional systems. The following is a brief description of the computer systems that were used in this study, their charging algorithms, and the method used to predict costs on them.

IIT - PRIME 400 - The PRIME 400 is a medium size computer which is used at IIT to support interactive computing. The present configuration has 1.25 Megabytes of memory and two disk drives. The system is very busy from 9 AM Monday through Saturday with about 20 to 30 users. However, most of the users are not CPU intensive and do not put a heavy computing load on the system. As a result, the CPU intensive structural analysis programs often show very good response times even during the day. In running on the PRIME 400 it was found that the response time and I/O times are quite sensitive to the workload while the CPU time is not. The amount of workload depends on the number of users, but more importantly on the kind of work they are doing, i.e. whether they are doing simple editing tasks or compiling and running CPU intensive jobs with considerable working set size or core requirements. To account for the variability, runs were performed for light, medium, and heavy load periods on the PRIME. The results were averaged to indicate an expected result, as well as to give an idea of the possible deviation from it. It is important to realize that a considerable spread in the consumption of a particular resource need not entail the same difference in cost. Whether it does or not depends on how sensitive the corresponding charging algorithm is with respect to that resource. The charging algorithm for the system is given in Table 1 (considerable discounts are available at non-business hours).

NASA - Langley Research Center PRIME 400 - The NASA Langley Computer Center operates several minicomputers including a PRIME 400 computer. The charging algorithm (see Table 1) was devised to recover the capital and operating costs. Although the NASA PRIME 400 supports a small number of users, they run more CPU intensive jobs than the IIT users, resulting in a comparable system load. Therefore it was assumed that program performance is similar on the IIT and NASA systems. Experience confirmed this. The cost of running a job on the NASA PRIME 400 was calculated based on this assumption. The two PRIME computer services charge for an unstable resource such as wall time (one may also call it connect time, since these services are interactive); this was not uncommon among the charging algorithms for computer services which we encountered. The charge serves a regulatory function, because it is a deterrent when the system is crowded and any additional work would slow it down further. For long jobs it also encourages the use of batch runs when they are offered as an alternative.

NASA - Langley Research Center CYBER 173 - The NASA Langley Research Center operates several CDC main frames including two CYBER 173 computers which are used to support interactive computation. The charging algorithm is devised to recover capital and operating expenses.

201

Haftka has extensive experience using the SPAR program on both the IIT and NASA systems. Based on this experience the following assumptions were used to predict cost on the NASA CYBER 173.

1. The CPU time on the CYBER is 1.5 times the CPU time on the UNIVAC 1100/81.

2. The I/O charges are calculated based on the number of reads and writes from the SPAR data base available in the SPAR output. It was assumed that these numbers are similar for both systems, and that the charge per disk access is 1.1 times the minimum charge per access. This is based on the feature of the charging algorithm which is relatively insensitive to the number of words in each read or write operation.

3. The amount of core required is assumed to vary from 70K octal for the smallest number of nodes to 120K for the largest number of nodes.

4. Based on the above considerations the cost of running SPAR on the NASA CYBER is:

Cost = (1 + 0.048 A)(0.0136 T1 + 0.003 NIO)

where

T1 = UNIVAC 1100/81 CPU time (sec)
NIO = combined number of disk reads and writes reported by SPAR
A = core storage (in units of 10K octal words)

Large Manufacturing Company CDC CYBER 176 - A few runs of SPAR were made on a CYBER 176 which is owned by a large manufacturing company. The charging algorithm is given in Table 1. It appears to be fairly high on CPU and low on I/O and core storage charges. In predicting costs on this machine it was assumed that CPU times can be predicted from the UNIVAC 1100/81 results based on published data ([16]) rating the UNIVAC 1100/81 at 1800 KOPS and the CYBER 176 at 9300 KOPS. It was assumed that the ratio of I/O time to CPU time is the same on both machines. Because the I/O cost on the CYBER 176 is low, even a large error in this assumption is not expected to change the cost much.

Concordia College Computer Network (CCCN) - CCCN provides administrative, instructional and research computing services to a large number of educational institutions. The service includes a large software development staff and the charges are computed to recover the costs of operating the center. Interactive and batch services are provided by a UNIVAC 90/80 mainframe installed in December 1980. The UNIVAC 90/80 is roughly equivalent to an IBM 370/158 in raw processing power.

The prediction of the costs of running on the CCCN UNIVAC 90/80 was based on the charging algorithm given in Table 1 with the following assumptions:

1. CPU times are 2.25 times longer than on the UNIVAC 1100/81. This is based on published data ([16]) rating the UNIVAC 1100/81 at 1800 KOPS and the UNIVAC 90/80 at 800 KOPS.

2. Core requirements were estimated at 200K bytes for the small problems and at 300K bytes for the large problems.

IIT Research Institute VAX 11/780 - The IIT Research Institute (IITRI) operates a DEC VAX 11/780 with 3 Megabytes of core memory and 650 Megabytes of disk space. It does not have a floating point processor. As it is not heavily loaded, time is available to outside users. The charges are given in Table 1.

2.5 Performance indices

As part of the design, performance variables must be chosen. The "utility" of a service to a user is a function of those performance factors which affect the value and usefulness to the user, and is thus necessarily subjective and dependent on particular needs in a given situation. For this study we chose the following performance variables as being most important in comparing the computer service alternatives:

1. correctness of results
2. turnaround time
3. system availability or "uptime"
4. support - technical help, documentation

These factors are fairly self-explanatory and were selected because they are obviously important. Other situations and users may choose a different set of factors which reflects their goals better. For a comprehensive list of performance variables see

202 [17] and [18]. which is generated by the data collection it- self. This instrumentation overhead is removed 3. Benchmark and data collection during analysis by the ANALYZ program. ANALYZ analysis processes the raw data from REFSTR, removes instrumentation overhead, and prints summary 3.1 Data collection and analysis performance results (Figure 1). ANALYZ also produces the following dynamic call matrix to Measurement data collected during the be used for further analysis or program re- benchmark are used for job cost prediction structuring for virtual memory systems and performance comparison of the two ma- (Figure 2). A routine called LSP was developed chines. The following statistics are collect- to fit polynomial curves to the resource con- ed for each run: sumption and cost data using a least squares technique. LSP also can plot the data points -CPU time (measured in seconds) and fitted curves. -I/O time (measured in seconds) -memory used (measured in K bytes) The benchmark performance data is col- -turnaround time (measured in seconds) lected at the job level for all jobs, at the -number of disk I/O operations program level for all SPAR jobs, and at the -job cost (measured in $) subroutine level for a few selected SAPIV jobs. Since SPAR is a modular sequence of A comprehensive program performance rather independent tasks, it was not necessary measurement system was developed for the data to collect measurement data at the subroutine collection and analysis. This system will level. SAPIV, on the other hand, is not se- automatically instrument the FORTRAN source quentially structured, but is a conglomerate programs by inserting CALL statements at the of subroutines with frequent interaction. entry and exit points of each module in the This necessitates more detailed performance system. The CALL statements invoke a routine data collection from which SAPIV s resource called REFSTR which will collect performance consumption models are derived (see sample data. Example: output from ANALYZ for example of detailed analysis). The resource consumption statis- SUBROUTINE ELT3A4 (A,B) tics are collected differently on each ma- CALL REFSTR ('ELT3A4', 'ENTRY') chine. In each case instrumentation overhead is measured and removed from the data, if it

. body of subroutine is significant. Job cost computations are based on resource consumption and are not IF (ISTOP.EQ.O) GOTO 99999 directly taken from the job accounting out- put.

99999 'call REFSTR ('ELT3A4', 'EXIT') 3.2 Benchmarks and their results RETURN END 3.2.1 Beam problem

The REFSTR data collection routine logs A cantilever beam was modeled by plane entries and exits from modules in the system, beam elements (E24 elements in SPAR) and captures the CPU and I/O times at each of three vibration modes and frequencies were these points, and writes this program trace calculated using the SPAR program. The num- information out to a permanent file in very ber of nodes was varied from 5 to 600 degrees large blocks. The program trace data from of freedom per node. This simple problem ex- REFSTR looks like this: posed a bug in the PRIME version of SPAR. The program could not calculate more than two ROUTINE CPU I/O vibration modes and frequencies. On the $MAIN$ ENTRY 11 1694 9 1964 UNI VAC there was no problem. Performance FACMGT ENTRY 11 1704 9 1964 data and cost (prime time rate) for the beam FACMGT EXIT 13 0202 9 1964 problems are given in Tables 2 (UNIVAC 1100/ LODCMN ENTRY 13 0246 9 1992 81) and 3 (PRIME 400). The total cost proved INPUTJ ENTRY 13 0522 9 2172 to be very close to a linear function of the ELTYPE ENTRY 13 4028 9 4352 number of nodes for both computers. It was PLANE ENTRY 13 4034 9 4352 therefore possible to predict accurately the ELT3A4 ENTRY 13 4098 9 4454 cost of 600 node runs by extrapolating the PLNAX ENTRY 13 4104 9 4454 cost of the 5-280 node runs, It is therefore assumed that the 5-600 node results can be Calibration studies have been performed on used safely to extrapolate cost up to 1200

REFSTR to determine the CPU and I /a overhead nodes. Figure 3 shows the actual cost points

203 and the predicted cost curves for both com- UNIVAC 1100/81 and the PRIME 400 are given puters. The triangles represent cost points in Tables 8 and 9. The problem is much cheaper for the UNIVAC 1100/81, the circles averages to run on the PRIME 400 than on the UNIVAC for different load situations on the PRIME 1100/81. The 289 node problem was the largest 400, It can be seen that although the results that could be run on the UNIVAC 1100/81 be- may have been sensitive for certain resources, cause of core limitations. The PRIME 400 they are not very sensitive in their cost, and with its virtual memory could handle larger cost predictions for other problem sizes can problems. The excellent performance of the be expected to lie within an analogous range. program on the PRIME 400 is attributed to two factors. First, the large core require- The beam runs are about twice as expens- ments on the UNIVAC 1100/81 are very expens- ive to run on the UCS UNIVAC 1100/81 than on ive (fifty percent of the cost of the 289 the I IT PRIME 400. The response time or turn- run was core cost). Second, the program was around time is favorable on the PRIME for the written by the developer on both machines. small problems. The UNIVAC is quite faster for As a result it does not suffer from the dete- the larger problems. rioration in performance that afflicts pro- grams that have been converted to different The performance of the other computers machines by users who are not as knowledge- for the beam problem was actually measured or able about the program as its developer. estimated and the results are summarized in This applies to the SPAR and SAP IV programs. Table 4. The costs of running the problen on A comparison of the CPU times on the PRIME the NASA PRIME-400 are comparable to those at 400 and the UNIVAC 1100/81 for the plate IIT. The UCS UNIVAC 1100/81 is the most ex- problem shows the following ratios. The PRIME pensive mainframe. The low cost on the Cyber 400 CPU times are 7-15 times longer for 176 is a result of low I/O charges on that SAP IV, 7-13 times longer for SPAR, but only machine compared to the NASA Cyber 173 ($0.84 5-8 times longer for TWODEL. The comparison of $2.25 versus $8.00 of $ 8.60). When evalu- with other computers is given in Table 10. ating the total cost results in Table 4 the The cost on the two UNIVAC computers is very main difference is not between minicomputers high and the cost on the CDC computer is very and mainframes but between service bureau low compared to the PRIME 400. computers (the two UNIVAC machines) and the user owned machines and their lower charges. 3.2.3 Stiffened cylinder problem

3.2.2 Plate problem A stiffened cylindrical shell, see figure 6, modeled by plate and beam elements. A rectangular plate with a hole was mod- The lowest four vibration frequencies and the eled with quadrilateral plane stress finite buckling load of the cylinder were calculated elements (Figure 4 insert). It is subjected using the SPAR program. Three models were used to uniform loads. Stresses are calculated with with 5x5, 10x10, 15x15 grids of elements o,r the number of nodes varying between 35 and 594. 36, 121 and 256 nodes, respectively. The problem was run with the SAP IV and SPAR programs. Tables 5 and 6 show results for The cylinder problem revealed two bugs various machines. in the SPAR program on the PRIME. The stress analysis which is preliminary to the buckling The turnaround time on the UNIVAC is calculation was not correct when combined much better than that of the PRIME. Just as membrane-bending elements were employed. As for the beam problems, the measured data were a result it was necessary to use twice as used to predict costs of running problems up- many plate elements; pure membrane elements to 1200 nodes. The cost data and the predic- and pure bending elements. Unfortunately, the tive curve are given in Figures 4 (SPAR) and mass matrix was not calculated properly with 5 (SAPIV). The cost of the plate problems in- this replacement. As a result we had to per- creases much more rapidly on the PRIME than on form the calculations in two separate runs, the UNIVAC. Table 7 shows a cost comparison one using double elements for buckling for all computers based on the data in Table 5 analysis and one using combined elements for and 6. For SAPIV, the user owned CDC has the the vibration analysis. This did not increase lowest charges whereas the two UNIVACs and the cost of the run substantially but was an the minicomputer IIT PRIME 400 are most ex- inconvenience for the analyst. pensive. For SPAR, the 1200 node figures show a clear disadvantage for the IIT PRIME 400. The results of CPU and I/O times, re- ponse time and cost are given in Table 11 The plate problem was now analyzed for for the UCS UNIVAC 1100/81 and in Table 12 large deformations using the TWODEL program. for the IIT PRIME 400. Comparison with other Performance and cost variables for the UCS computers is given in Table 13 for the

204 largest problem of 256 nodes. It can be seen Information about IIT PRIME 400 particular- that for this problem the performance of the ities is communicated by "word of mouth". PRIME 400 is dismal and that of the CDC very Although the technical people are fairly good. Next the same problem was run with the competent, they are far overburdened and un- SAP IV program and produced quite different able to provide a consistent level of support. results. The PRIME 400 performed excellently whereas the UNIVAC 1100/81 and the UCS Cyber The results for all performance factors 176 had convergence problems. A change of can be used with a weighted scoring method parameters solved them for the 36 node prob- to determine which of the machines has the lem, but difficulties remained for larger highest utility and to perform a utility/cost ones. To resolve these was beyond the know- trade off analysis. For a subset of the ledge of an average user. results in this paper such an analysis was performed in [17], the same method could be The benchmark results showed that cor- used for the results presented here. rectness of the results cannot automatically be assumed. One problem area was improper 4 Concluding remarks conversion for those packages which were de- veloped on one machine and subsequently The study of the performance of struc- transported to another. We found several tural analysis programs on mainframe and problem types that could not be run correctly minicomputers has not shown a clear cut on the PRIME due to those problems (30 per- advantage of either type of computer. Both cent of the beam runs and 50 percent of the mainframe and minicomputer had difficulties, plate runs were completely correct when run but with different problems. For mini- on the PRIME). Another contributing factor computers some of these problems were due to lies in the fact that certain problems require portability problems (SAP IV, SPAR) and im- high precision floating point values to be perfect conversion from mainframe software stored and manipulated and the minicomputer's to a minicomputer. As the problem sizes smaller word size and floating point operand increased, the minicomputers were less effi- size can produce incorrect or poor results. cient for those programs which were not On the other hand, some of the mainframes developed for them; but some of the main- exhibited correctness and feasibility problems frames could not run those problems adequate- as well. The most important thing a user can ly, either. The minicomputer clearly out- do to protect himself is to check results performed the main frame for the one program carefully and validate them, if possible. which was developed on the minicomputer (TWODEL). 3.3 Other performance factors The cost comparison does not show any Availability was considered important. drastic difference between main frames and Table 14 shows percentage of uptime, mean minicomputers except for the user owned time between failures (MBTF) and mean time to CDC CYBER. This probably reflects a competi- repair (MTTR) for UCS UNIVAC 1100/81 and PRIME tive marketplace where the decrease in the 400. The UNIVAC data is taken from published cost of raw computation has stimulated an figures (UCS "Data Link") for January through increase in programming and computer services December 1980, PRIME 400 data could only be offered. There is no longer a main frame obtained for the months of October through monopoly on (structural analysis) software December 1980. 
The PRIME system is down fewer and services have to be priced to be com- times than the UNIVAC, but is down for extend- petitive with minicomputers. The user owned ed periods of time. The overall uptime of the machines which just charge to cover their UNIVAC is somewhat better than the PRIME. costs naturally are cheaper than those which need to show profit and may have marketing Support is a very important performance and other related expenses. This reduces the factor for many users. Support includes problem to one of microeconomics and price availability and quality of technical help theory ([19]). and documentation. We found both the UNIVAC and PRIME centers to be somewhat deficient in support. We found the UNIVAC documentation to be fairly good, the access to documentation The study reported in this paper has to be good, but the access to technical help been supported by the office of Naval Re- to be poor (due to contract between service search under contract No. N00014-80-C-0364. bureau and university). We found the PRIME The authors wish to acknowledge the help of documentation to be weak and unavailable for Mr. Dennis Witte in obtaining and process- reference on site. Purchase is always requir- ing the performance data. ed. Only standard PRIME manuals are available.
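The weighted scoring method referred to above can be illustrated with a small sketch. The Python fragment below is only an illustration, not the procedure of [17]; the weights and the factor scores assigned to each machine are hypothetical, while the job costs are taken from Table 7 (SPAR, 594 node plate).

    # Hypothetical weighted scoring of the qualitative performance factors.
    WEIGHTS = {"correctness": 0.40, "turnaround": 0.25,
               "availability": 0.20, "support": 0.15}

    # Factor scores on a 0-10 scale (illustrative only); cost in dollars from Table 7.
    machines = {
        "UCS UNIVAC 1100/81": ({"correctness": 8, "turnaround": 8,
                                "availability": 8, "support": 5}, 21.94),
        "IIT PRIME 400":      ({"correctness": 6, "turnaround": 5,
                                "availability": 7, "support": 4}, 28.71),
    }

    for name, (scores, cost) in machines.items():
        utility = sum(WEIGHTS[f] * scores[f] for f in WEIGHTS)
        print("%-20s utility = %.2f   utility per dollar = %.3f"
              % (name, utility, utility / cost))

A higher utility-per-dollar figure would favor that service for users who weight the factors this way; a different set of weights can reverse the ranking.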

205

References

[1] Randal, J.M., and Badger, G.F., Using Quantitative Criteria for Computer Selection, Computing Services Office, University of Illinois at Urbana-Champaign, Urbana, IL, 1976.
[2] Sharpe, W.F., The Economics of Computers, Columbia University Press, New York, 1969.
[3] Strauss, J.C., A Benchmark Study, AFIPS Conference Proceedings 41, FJCC, 1972, pp. 1225-1233.
[4] Kernighan, B.W., Plauger, P.J., and Plauger, D.J., On Comparing Apples and Oranges, or Why My Machine is Better than Your Machine, Performance Evaluation Review, Vol. 1, No. 3, 1972, pp. 16-20.
[5] Morgan, D.E. and Campbell, J.A., An Answer to a User's Plea?, Proc. 1st ACM-SIGME Symp. on Measurement and Evaluation, February 1973, pp. 112-120.
[6] Bell, T.E., Computer Performance Variability, AFIPS Conf. Proc. 43, NCC, 1974, pp. 761-766.
[7] Timmreck, E.M., Computer Selection Methodology, Computing Surveys, Vol. 5, No. 4, 1973, pp. 199-222.
[8] Mamrak, S.A. and Amer, P.D., Comparing Interactive Computer Services - Theoretical, Technical, and Economic Feasibility, Proc. NCC, 1979, pp. 781-787.
[9] Mamrak, S.A. and DeRuyter, P.A., Statistical Methods for the Comparison of Computer Services, Computer, Vol. 10, No. 11, 1977, pp. 32-39.
[10] Storaasli, O.O. and Foster, E.P., Cost-Effective Use of Minicomputers to Solve Structural Problems, AIAA Paper No. 78-484, 1978.
[11] Pearson, P.K., Luchese, R.R., Miller, W.H., and Schaefer, H.F., Theoretical Chemistry Via Minicomputer, Minicomputers and Large Scale Computations, American Chemical Society, 1977, pp. 171-190.
[12] Norbeck, J.M. and Certain, P.R., Large Scale Computation on a Virtual Memory Minicomputer, Minicomputers and Large Scale Computations, American Chemical Society, 1977, pp. 191-199.
[13] Wagner, A.F., Day, P., Van Buskirk, R., and Wahl, A.C., Computation in Quantum Chemistry on a Multi-Experiment Control and Data-Acquisition Sigma 5 Minicomputer, Minicomputers and Large Scale Computations, American Chemical Society, 1977, pp. 200-217.
[14] Freuler, R.J. and Petrie, S.L., An Effective Mix of Minicomputer Power and Large Scale Computers for Complex Fluid Mechanical Calculations, Minicomputers and Large Scale Computations, American Chemical Society, 1977, pp. 218-234.
[15] Conway, J.H., The Economics of Structural Analysis on Superminis, Proc. 7th ASCE Conference on Electronic Computation, 1979, pp. 373-385.
[16] Lias, E.J., Tracking the Elusive KOPS, Datamation, November 1980, pp. 99-105.
[17] Von Mayrhauser, A.K. and Witte, D.E., Cost/Performance Comparisons for Programs on Different Machines, to be presented at ECOMA 9, October 6-9, 1981, Copenhagen.
[18] Miller, W.G., Selection Criteria for Computer System Adoption, Educational Technology, Vol. 9, No. 10, 1969, pp. 71-75.
[19] Cotton, I.W., Microeconomics and the Market for Computer Services, Computing Surveys, Vol. 7, No. 2, 1975, pp. 95-111.

206 Table 1. Charging algorithms (dollars)

1. IIT PRIME 400                          0.02 (T1 + T2) + T3
                                          T1 = CPU time (sec)
                                          T2 = I/O time (sec)
                                          T3 = wall time (hours)

2. NASA Langley Research Center           30 T1 + 15 T2
   PRIME 400                              T1 = CPU time (hours)
                                          T2 = wall time (hours)

3. UCS UNIVAC 1100/81                     0.18 T1 + 0.0011 A (T1 + T2)
                                          T1 = CPU time (sec)
                                          T2 = I/O time (sec)
                                          A = core/512 (core in words)

4. Concordia College UNIVAC 90/80         (0.156 + 0.0000248 A) T
                                          T = CPU time (sec)
                                          A = core (kilobytes)

5. Large Manufacturing Company            0.2 T1 + 0.0003 A (T1 + T2) + 0.03 T2
   CDC CYBER 176                          T1 = CPU time (sec)
                                          T2 = I/O time (sec)
                                          A = core (kilowords)

6. NASA Langley Research Center           See section on "Computers and Charging
   CDC CYBER 173                          Algorithms."

7. UCS CDC CYBER 176                      Confidential

8. IIT Research Institute VAX 11/780      0.05 T1 + 5 T2
                                          T1 = CPU time (sec)
                                          T2 = connect time (sec)
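To make the formulas in Table 1 concrete, the sketch below evaluates two of them in Python for the 600 node beam run, using the resource figures from Tables 2 and 3. The function names and the assumed 80K-word core region on the UNIVAC are ours; only the formulas themselves come from Table 1.

    def cost_iit_prime(cpu_s, io_s, wall_h):
        # IIT PRIME 400: 0.02 (T1 + T2) + T3
        return 0.02 * (cpu_s + io_s) + wall_h

    def cost_ucs_univac(cpu_s, io_s, core_words):
        # UCS UNIVAC 1100/81: 0.18 T1 + 0.0011 A (T1 + T2), with A = core/512
        a = core_words / 512.0
        return 0.18 * cpu_s + 0.0011 * a * (cpu_s + io_s)

    # 600 node beam run; 13:47 of wall time is 827 seconds.
    print(round(cost_iit_prime(197.38, 125.76, 827.0 / 3600.0), 2))  # 6.69; Table 3 lists 6.68
    print(round(cost_ucs_univac(30.73, 59.68, 80 * 512), 2))         # about 13.5 with the assumed
                                                                     # 80K-word region; Table 2 lists 13.80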

Table 2. Beam results on the UCS UNIVAC 1100/81 using SPAR

Number of CPU I/O Time Turnaround Time Cost Nodes (sec) (sec) (min) (dollars)

   5      5.06     15.77     3:21      1.83
  25      5.74     16.12     2:50      2.07
  60      7.07     18.17     6:04      2.66
  90      8.30     19.79     2:41      3.16
 125     10.29     22.81     4:26      4.07
 280     16.70     32.59     7:17      6.90
 450     22.52     41.73     3:16      9.53
 600     30.73     59.68     7:45     13.80
1200*    60.76     72.12     8:55     30.89

* predicted
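The near-linear growth of cost with the number of nodes is what made the predicted entries possible. In the spirit of the LSP least-squares routine described in section 3.1 (but not reproducing it), the following Python sketch fits a straight line to the 5-280 node costs above and extrapolates to 600 nodes; the variable names are ours.

    # Ordinary least squares fit of cost (dollars) versus number of nodes, Table 2 data.
    nodes = [5, 25, 60, 90, 125, 280]
    cost = [1.83, 2.07, 2.66, 3.16, 4.07, 6.90]

    n = len(nodes)
    sx, sy = sum(nodes), sum(cost)
    sxx = sum(x * x for x in nodes)
    sxy = sum(x * y for x, y in zip(nodes, cost))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n

    # Extrapolated 600 node cost: about 12.9 dollars; the measured value above is 13.80.
    print(round(intercept + slope * 600, 2))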

207 Table 3. Beam results on the IIT PRIME 400 (averages)

Number of    CPU       I/O Time    Turnaround Time    Cost
Nodes        (sec)     (sec)       (min)              (dollars)

   5       14.70      17.65                  0.67
  25       17.45      17.43                  0.73
  60       30.65      28.45       2:37       1.22
  90       37.07      22.84       1:57       1.23
 125       47.82      37.25       6:02       1.80
 280       94.24      63.11       5:41       3.25
 450      141.07     132.87      11:04       5.16
 600      197.38     125.76      13:47       6.68
1200*     422.92     262.02      26:36      14.14

* predicted

Table 4. Comparison of the 600 and 1200 node beam problems on various computers (the estimated 1200 node results are given in parentheses)

Computer CPU I/O Turnaround Time Total (sec) (sec) (min) Cost (dollars)

IIT 197.38 125.76 13:47 6.68 ^ PRIME 400 (422.92) (262.02) (26:36) (14.14)

NASA LRC * 197.38 125.76 13:47 5.08 PRIME 400 (422.92) (262.02) (26:36) (10.17)

UCS UNIVAC 30.73 59.68 7:45 13.80 1100/81 (60.76) (72.10) (8:55) (30.89)

NASA LRC 46.1 8.60 CYBER 173 (91.14) (20.78)

LMC 6.05 24.5 2.24 CYBER 176 (11.76) (35.4) (3.98)

Concordia 69.14 11.13 College (136.71) (22.34) UNIVAC 90/80

* estimated + average
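Several of the Table 4 entries are predictions rather than measurements; as described in the text, CPU times were scaled by the published KOPS ratings of [16] and the target machine's charging algorithm was then applied. The scaling step alone is sketched below in Python (the helper name is ours; the ratings are those quoted in the text).

    KOPS = {"UNIVAC 1100/81": 1800, "CYBER 176": 9300, "UNIVAC 90/80": 800}

    def scale_cpu_seconds(measured_s, measured_on, predicted_for):
        # CPU time is assumed inversely proportional to the KOPS rating.
        return measured_s * KOPS[measured_on] / KOPS[predicted_for]

    # 600 node beam run: 30.73 CPU seconds on the UNIVAC 1100/81 (Table 2).
    print(round(scale_cpu_seconds(30.73, "UNIVAC 1100/81", "CYBER 176"), 2))     # 5.95; Table 4 lists 6.05
    print(round(scale_cpu_seconds(30.73, "UNIVAC 1100/81", "UNIVAC 90/80"), 2))  # 69.14, as in Table 4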

208 Table 5. Plate linear analysis with SPAR on various computers

Number of    CPU       I/O       Turnaround    Cost
Nodes        (sec)     (sec)     (min:sec)     (dollars)

UNIVAC 1100/81

  35       4.80      14.25      1:27       1.69
  72       6.98      15.55      1:44       2.46
 143      11.89      18.36      2:08       4.22
 285      24.46      25.27      2:48       8.69
 594      60.58      44.78      4:36      21.94
1200     170.27     102.31      9:23      62.89

IIT PRIME 400

  35      32.11      19.18      1:23       1.05
  72      58.40      29.91      2:58       1.81
 143     123.14      44.96      4:30       4.18
 285     285.14     123.65      8:56       8.39
 594     805.96     679.25     41:24      28.71
1200    2500.40    1370.10    144:15     126.21

Table 6. Plate linear analysis with SAP IV on various computers

Number of    CPU       I/O       Turnaround    Cost
Nodes        (sec)     (sec)     (min:sec)     (dollars)

UNIVAC 1100/81
  35       4.52       1.11      1:13       0.90
  72       6.77       1.93      1:51       1.56
 143      11.85       3.54      2:12       2.97
 285      25.38       8.19      3:08       7.15
 594      78.59      26.94     10:08      23.61
1200*    316.58     122.14     18:54      98.67

IIT PRIME 400
  35      30.82      25.65      7:55       1.26
  72      66.40      41.15     11:00       2.33
 143     166.12      47.15     21:27       4.62
 285     396.80      82.61     25:32      10.01
 594    1186.52     216.10     54:08      28.95
1200*   3791.00     685.74    112:12     108.60

UCS CYBER 176
  35       0.54       1.88                 0.96
  72       1.00       3.08                 1.92
 143       1.69       4.48                 3.36
 285       4.10      10.82                 7.20
 594      11.96      38.21                21.60
1200*     32.96      99.18                70.81

IIT Research Institute VAX 11/780 + **
  35      10.36                 0:25       0.52
  72      20.94                 0:30       1.05
 143      59.62                 3:30       2.98
 285     182.43                 5:28       9.12
 594     568.33                12:41      28.42

* estimates + averages ** jobs were run in batch mode, no connect time charges

209 Table 7. Comparison of the cost of linear analysis of the 594 (1200) node plate model on various computers (in dollars)

Computer SAP IV SPAR System 594 1200* 594 1200*

IIT PRIME 400+                       28.95    108.65    28.71    126.21
NASA LRC PRIME 400*                  23.38     59.64    17.06     56.90
UCS UNIVAC 1100/81                   23.61     98.67    21.94     62.89
NASA LRC CYBER 173                      **               6.84     18.90
LMC CYBER 176*                        4.23     11.39     2.85      7.82
Concordia College UNIVAC 90/80       28.98    116.42    22.28     69.57
UCS CYBER 176                        21.60     70.81
IIT Research Institute VAX 11/780    28.42       **

+ average of several runs * predicted results ** data insufficient for prediction

Table 8. Plate nonlinear analysis on the UCS UNIVAC 1100/81

Number of CPU Time I/O Time Core Storage Turnaround Cost Nodes (sec) (sec) (Kwords) Time (min)

35 19.97 0.35 22.9 32:47 4.61 72 54.15 0.66 28.4 38:01 13.13 143 156.91 1.29 42.5 55:10 42.68 289 549.69 1.78 76.0 137:21 189.33 600* 2030.59 3.59 175.7 650:43 1151.80

* estimates

Table 9. Plate nonlinear analysis on the IIT PRIME 400

Number of CPU I/O Time Response Cost Nodes Time (sec) Time (min) (dollars)

35 156.01 2.74 7:08 3.29 72 379.95 3.81 10:09 7.84 143 978.05 19.47 29:59 20.45 289 2693.59 69.70 70:50 56.44 600* 8693.00 287.91 204:14 193.06

* estimates

Table 10. Comparison of the cost (in dollars) of nonlinear analysis of the 289 and 600 node plate models on various computers using the TWODEL program

Computer                      CPU Time (sec)       Core+               Cost (dollars)
System                        289       600*       289      600*      289       600*

IIT PRIME 400                2694.     8693.        --       --        56.44    193.06
NASA LRC PRIME 400*          2694.     8693.        --       --        40.13    123.50
UCS UNIVAC 1100/81            550.     2030.6      76.0     175.7     189.33   1177.66
LMC CDC CYBER 176*             58.8     217.1      42.2      97.6      12.52     49.80
Concordia College UNIVAC*    1238.     4570.7     320.      740.      203.03    794.20

* estimates
+ UCS UNIVAC 1100/81 and LMC CDC CYBER 176 core requirements are given in K words; Concordia College UNIVAC core requirements are given in K bytes.

210 Table 11. Cylinder buckling and vibration analysis on the UCS UNIVAC 1100/81

Number of CPU Time I/O Time Turnaround Cost Nodes (sec) (sec) Time (min) (dollars)

36 27.28 19.34 7:09 7.36 121 110.65 28.16 10:11 29.21 256 302.59 48.45 19:52 79.97

Table 12. Cylinder buckling and vibration analysis on the IIT PRIME 400

Number of CPU Time I/O Time Turnaround Cost Nodes (sec) (sec) Time (min) (dollars)

36 579.93 343.28 64 19.53 121 3197.10 1184.54 156 90.23 256 10784.0 6589.98 774 360.38

Table 13. Comparison of the 256 cylinder problem on various computers

Computer CPU Time I/O Time Total Cost (sec) (sec) (dollars)

IIT PRIME 400+                  10784       6590      360.38
NASA LRC PRIME-400              10784       6590      283.37
UCS UNIVAC 1100/81                302.59      48.45    79.97
Concordia College UNIVAC 90/80*   680.8               110.96
LMC CDC CYBER 176*                 58.2       24.2     12.64
NASA LRC CYBER 173*               453.9                29.55

+ averages

* predicted results

Table 14. Uptime - downtime characteristics

UCS UNIVAC 1100/81 IIT PRIME 400

Uptime %           98.9      96.7
MTBF (hours)      56.79     87.98
MTTR (minutes)     34.7     178.2
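The uptime percentages in Table 14 are consistent with the usual steady-state relation, availability = MTBF / (MTBF + MTTR). A short check (ours) in Python:

    def availability_pct(mtbf_hours, mttr_minutes):
        # Fraction of time the system is up, expressed as a percentage.
        mttr_hours = mttr_minutes / 60.0
        return 100.0 * mtbf_hours / (mtbf_hours + mttr_hours)

    print(round(availability_pct(56.79, 34.7), 1))    # UCS UNIVAC 1100/81: 99.0 (Table 14: 98.9)
    print(round(availability_pct(87.98, 178.2), 1))   # IIT PRIME 400: 96.7, as in Table 14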

211

MODULE     CPU      %CPU     I/O      %I/O    #CALLS   #RETURNS
$MAIN$      0.065     .08      .007     .04        1         1
FACMGT      1.849    2.38      .0       .0         1         1
ELAW        0.139     .18      .128     .48      546       546
ADDSTF     21.396   27.51     9.023   34.09        1         1
SESOL      31.583   40.61    10.213   38.59        1         1
STRSC        .944    1.21     1.255    4.74     1092      1092

TOTAL 77.769 26.467

CPU        amount of CPU time spent in module
%CPU       fraction of total CPU time spent in module
I/O        amount of I/O time spent in module
%I/O       fraction of total I/O time spent in module
#CALLS     number of entries into module
#RETURNS   number of exits from module

Figure 1. Summary performance results

CALL MATRIX TO: 1 2 3 4 5 6 7 8 9 10 FROM:

1 $MAIN$ 0 1 1 1 1 0 0 0 0 0 2 FACMGT 0 0 0 0 0 0 0 0 0 0 3 LODCMN 0 0 0 0 0 0 0 0 0 0 4 INPUTJ 0 0 0 0 0 0 0 0 0 0 5 ELTYPE 0 0 0 0 0 2 0 0 0 0 6 PLANE 0 0 0 0 0 0 1 0 0 0 7 ELT3A4 0 0 0 0 0 0 0 1 0 0

 8 PLNAX  0 0 0 0 0 0 0 0 546 0
 9 ELAW   0 0 0 0 0 0 0 0 0 546
10 POSINV 0 0 0 0 0 0 0 0 0 0

Figure 2. Dynamic call matrix
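The reduction ANALYZ performs on the REFSTR trace - exclusive per-module CPU and I/O times (Figure 1) and the dynamic call matrix (Figure 2) - can be sketched as follows. This Python fragment is an illustration only, not the ANALYZ code; it reads the two-field trace times as seconds (FACMGT's 13.0202 - 11.1704 = 1.850 is consistent with the 1.849 seconds reported in Figure 1) and it does not remove instrumentation overhead.

    from collections import defaultdict

    # Simplified trace records: (module, event, cumulative CPU s, cumulative I/O s).
    trace = [
        ("$MAIN$", "ENTRY", 11.1694, 9.1964),
        ("FACMGT", "ENTRY", 11.1704, 9.1964),
        ("FACMGT", "EXIT",  13.0202, 9.1964),
        ("LODCMN", "ENTRY", 13.0246, 9.1992),
    ]

    cpu = defaultdict(float)     # exclusive CPU time per module
    io = defaultdict(float)      # exclusive I/O time per module
    calls = defaultdict(int)     # dynamic call matrix: (caller, callee) -> count
    stack = []                   # modules currently active
    prev = None                  # previous cumulative times, for interval attribution

    for module, event, cum_cpu, cum_io in trace:
        if prev is not None and stack:
            # Charge the interval since the previous event to the active module.
            cpu[stack[-1]] += cum_cpu - prev[0]
            io[stack[-1]] += cum_io - prev[1]
        if event == "ENTRY":
            if stack:
                calls[(stack[-1], module)] += 1
            stack.append(module)
        else:
            stack.pop()
        prev = (cum_cpu, cum_io)

    for m, t in cpu.items():
        print("%-8s CPU %7.4f   I/O %7.4f" % (m, t, io[m]))
    for (caller, callee), count in calls.items():
        print("%s -> %s : %d" % (caller, callee, count))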

212

(Plot of total cost in dollars versus number of nodes, 0 to 1200.)

Figure 5. Comparison of total cost for the plate problem using the SAP IV program

Figure 6. A typical cylinder model

215

Special Technologies and Applications Track B

217

Focusing on Secondary Storage

219


SESSION OVERVIEW

FOCUSING ON SECONDARY STORAGE

H. Pat Artis

Morino Associates, Inc. Vienna, VA 22180

During the past decade, secondary storage has been the fastest growing resource for most installations. These devices now dominate the physical planning process and they represent an ever more significant portion of hardware budgets. The papers in this session address three secondary storage management topics. They are:

o user experiences with an automated space management system. These systems attempt to reclaim space occupied by datasets that have been unreferenced for long periods of time.

o guidelines for designing IBM I/O configurations to insure system performance.

o an analysis of the architecture and performance characteristics of the CDC 844-41 disk subsystem.

Each of these papers provides a valuable perspective, and the session should be informative for those who attend.

221


EXPERIENCES WITH DASD , TAPE, AND MSS SPACE MANAGEMENT USING DMS Patrick A. Drayton

Southwestern Bell Telephone Company St. Louis, MO 63101

This paper was written to share our experiences in the space management of DASD, Mass Storage and tape datasets in order to reduce DASD cost and improve the performance of the I/O subsystem. Included is a description of the situation which precipitated our turning to space management as a solution. Also included are the standards and procedures we introduced and the software products we utilized and wrote to service our needs. Finally the results that were achieved and some unforeseen problems that arose are documented

Key words: data set, migration, DASD, MSS, space management, performance, disk management system.

1. System Configuration but the user community did not experience any measureable service The system under discussion was an IBM improvements. The CPU could not be 3033AP with 16MK running under driven above 65% busy and batch MVS-SE2. It was configured with 73 thruput was below our service 3350-type DASD and 4 STC 8360 DASD. objective of 80% jobs turned around in In addition, it utilized 12 tape under two hours. TSO response time on drives, one 2305 drum and one 100 average was acceptable (2.2 seconds) GIGABYTE Mass Storage System (MSS) but intermitant responses of 2-5 with four readers. The drum and minutes were not unusual. 8360s, were dedicated to local page data sets. Initial investigation identified the problem as the MSS. Its four readers The system served as the corporate were busy 90-100% of the time with an test system and was used by 800 average of six data sets awaiting programmers from primarily 7 A.M.- 5 service, this caused data sets to P.M. The average daily workload require over a minute from initial consisted of 1700 batch jobs in 25 request to accessability on the initiators, 65,000 TSO transactions staging packs. This resulted in jobs processed by an average of 70 on-line being swapped out for an average of users, plus a small test IMS system 4.3 minutes of a total average elapsed with two message processing regions. time of 11.5 minutes. In other words 37% of the time a job was awaiting 2. Problem Definition data set servicing. On comparing long wait swap times on MSS and non-MSS The problems came to light in systems it was estimated that 80% of October 1980 when the 3033 was the long wait time was directly upgraded to an attached processer (AP) attributable to the MSS related

223

functions. Further investigation placement guidelines had been issued. uncovered the fact that the operating The problem this presented to MSS was systems method of new data set previously described. In addition allocation caused severe degradation this caused system and user data sets to MSS performance. During MSS space to be spread throughout the real DASD allocation no additional new or old pool. Data sets were placed by the data sets could be allocated even if user wherever space was available they were already staged until after without regard for DASD contention or the initial new space allocation data set performance. request was satisfied. MSS volumes are considered mountable devices and 3. Corrective Action MVS allocation will stop all like-device allocations until the Now that the problems had been current allocation recovery is identified we set about to relieve resolved. With these problems in mind them. Our solutions are summarized as we felt the improvement of MSS service follows was the number one priority item to improving overall service to the user 1. Disk Pack Reconfiguration Plan - community this plan grouped functionally-related data sets In the process of developing solutions together, removed the private pack to improving MSS service, additional concept, and configured packs problems were uncovered. One of these based on performance. was extensive use of the private pack concept by application programming 2. Data Set Naming Standard - in groups. That is, each application was order that ownership could be given a number of dedicated DASD established and to allow data volumes to use as they saw fit with no migration the data set name had to guidance or policing. This was an be in a standard form and this outgrowth of capacity planning which standard had to be automatically required applications to justify their enforceable. DASD requirements. These requirements were translated into DASD volumes and 3. Data Set Placement and Migration distributed as dedicated packs. The Plan - this plan established where problem we found with this concept was data sets should be placed based the application programmers were not on their data set characteristics space managers and the packs contained and usage activity. In addition, old data, over-allocated data sets, criteria for moving data sets and were badly fragmented. after their creation if their usage activity changed was Also the shared DASD pool was badly required. Finally an automated fragmented, it contained many method of accomplishing the above out-of-date data sets and the current was essential to minimize manpower data sets were badly over-allocated requirements. and poorly blocked. This had resulted in uncontrolled growth of DASD 3.1. Disk Pack Reconfiguration Plan requirements. When the user community could not allocate adequate real DASD As described earlier there existed a they turned to MSS for their data need to configure data sets by usage space needs. This further aggravated to allow better control and the MSS busy condition by placing many performance of the DASD subsystem. highly active partitioned, TSO data One of the first criteria implemented sets and indexed files on MSS. was the termination of the private pack concept. This was met with Two additional problems soon became resistance from the user community but evident. There were no data set was overcome with upper management naming standards in place. 
Data sets backing plus the commitment that were named at the whim of the user so future user data space requirements that the ownership could not be would be met. ascertained. In addition no data set

224 .

All DASD was then divided by function Requests for a total of more than as follows: four volumes from multiple users caused volume sharing. For example 1. Scratch/work volumes - these nine one application would have the (9) volumes contained temporary volume for twelve hours a day. At data sets which would be deleted at the end of their test period the job termination. pack would be dumped and then restored with the next user's data. 2. System data sets - all data sets containing system-required modules 7. User data sets - the vast majority plus system-wide data sets such as of DASD was contained in this SMF, JES SPOOL, JES checkpoint, and pool. All user test data sets, page/swap data sets were located on libraries, and data bases were these 13 volumes. Page and swap located on these 39 volumes. These data sets were on dedicated volumes. were the only volumes to participate in data set migration. 3. Application libraries - each application's object, load and 8. New MSS allocation staging - these source libraries were contained on three (3) volumes contained all these six (6) volumes. Because of newly created permanent data sets. their low usage levels source During nightly maintenance the data libraries were used to fill the sets on these packs were migrated space remaining on local page to either real DASD (user data set packs. We found this was an volumes) or MSS. This was done to excellent use of this space and did avoid the degradation new data set not cause any paging service allocation caused on the MSS and to degradation. allow efficient placement of user data sets on real DASD. 4. TSO data sets - these five volumes contained TSO edit type data sets. This new DASD configuration greatly In order to insure space was enhanced our ability to control space available for TSO edit usage all requirements and I/O performance. data sets not used in 14 days were deleted and user data sets 3.2 Data Set Naming Standard older than 28 days were copied to user DASD (see item g) The naming standard was beneficial in locating the owner of a data set when 5. VSAM data sets - these three (3) problems arose. It also made possible volumes contained volume-size VSAM the accounting for space usage back to data spaces. This eliminated the user department. This in turn hundreds of user VSAM data spaces allowed for better capacity planning spread throughout DASD. This also of future DASD and MSS requirements. allowed for more accurate space management of VSAM data sets. Our However, the naming standard was initial survey found 45% of the essential for continuation of the MSS VSAM data sets were either empty or group concept after data set migration -unusable. was started. MSS performance is enhanced by the use of the group 6. Test data bases - frequently we had concept. This concept causes the .requirements for 1-4 volumes of volumes of MSS to be logically DASD to perform volume tests or connected and space requests to be conversion work for an directed to a group name and be application. In the past this satisfied by any volume in that group precipited a mad scramble (similar to a generic name). We chose attempting to secure unused volumes to assign groups by application. from other systems. This was Therefore, any migration to MSS must go replaced by a pool of four 3350s; to a particular group. The data set which were scheduled on an as name must, therefore, correctly identify needed temporary basis. the application owner.

225

The standard we implemented was as 2. SAS - Statistical Analysis System follows by SAS Institute Inc. We made use of this wide-spread data tool as Each data set contained two index follows. levels "DD.dddt" where DD is the department level (we have three) ddd a. Process SMF data to report on is the division level within that data set activity. department, t is the district within b. Process RMF data to report on that division. DASD pack and string contention. c. Process MSS trace data to report This standard matched our current TSO on MSS performance. user ID scheme of DDdddtu with u d. Produced control card data to representing the individual user. All direct DMS processing. data sets created over TSO by a user are automatically supplied the two 3. Local Programs - local programs standard indices. In addition, for were required to build control performance we installed three user cards for DMS to handle the catalogs based on the department level migration from DASD to MSS. qualifiers. As an aside, many of our users added a third level index of 3.4 Data Set Placement Plan their initials to allow them to interogate the catalog for their The purpose of this plan was unique three levels to obtain a three-fold listing of their data sets. Enforcement of our standard was 1. Reduce the load on the MSS readers accomplished through the use of DMS by migrating busy data sets from (see below) MSS to DASD. 2. Reduce the need for increased real 3.3 Software Products DASD by migrating unused data sets to MSS. There were two software products used 3. Avoid the degradation of new to take the data placement/migration allocation on MSS by delaying the plan from paper to reality. It will placement of new data sets until be noted in each portion of the plan nightly maintenance. where a particular tool was utilized. Following is a brief synopsis of the The placement standards were software used. established as follows:

1. DMS- (Disk Management System) by 1. DASD - frequently accessed (more Application Development Systems, than 15 times/week) user data sets; Inc. This was our primary system data sets, ISAM data sets, migration tool and our standard application group object, load, and enforcer. This product source libraries; large (over 50 accomplished the following: MB) IMS, VSAM data bases. 2. MSS - Intermediate access (1-14

a. Movement of data to and from MSB times per week) , intermediate sized and DASD. (0-50MB) data sets. The MSS b. Archival to tape of old data cartridge holds 50MB of data). sets. 3. Tape - not frequently accessed c. Deleting of old data sets. (less than once/week), large d. Deleting of non-standard named sequential data sets (larger than data sets. 200 MB), system log files, archived e. Releasing of over-allocated data sets. space in sequential and partitioned data sets. The DASD placement of system and f. Deleting of non-catalogued data application libraries was handled in sets. the reconfiguration plan. In addition at this time each application's libraries were given a 20% space increase for growth and placed on DASD

226 so that no further expansion of these MIGRATION PROCEDURE libraries was possible without the space manager's knowledge. The DASD - MSS application programming staffs were NOT ACCESSED IN FIVE DAYS then made accountable for their DMS AND LOCAL PROGRAM USED library contents and insuring that they remain within the established MSS - DASD space limitations. Large data sets ACCESSED 15 TIMES IN 7 DAYS and IMS data base were placed as SAS AND DMS PROGRAM USED specified by the space manager. All new large space requests were handled DASD - TAPE by the space manager. VSAM data sets DATA BASES NOT ACCESSED IN TWO MONTHS were placed in the common VSAM data MSS - TAPE spaces. NOT ACCESSED IN SIX MONTHS

MSS placement was handled by the TAPE - DELETED introduction of a new-WSS DASD pool NOT ACCESSED IN ONE YEAR and the elimination of direct new space requests to MSS by all but FIGURE 1 system maintenance jobs. This was accomplished through the use of JES and TSO exits to restrict the use of 3.5.1 DASD to MSS the MSS unit parameter. The users placed all new non-TSO, non-tape type Smaller (less than 50MB) data sets data sets on the new-MSS DASD volumes were migrated to MSS if they had not and during nightly maintenance the been accessed in five days. This data sets were either copied to real resulted in an average of twenty data DASD or MSS based on the following sets being migrated daily. This usage criteria: A data set used four figure would have been larger but we times in two days was copied to DASD. were not able to keep up with the Otherwise it was moved to MSS. growing demand of MSS space. For economy reasons MSS cartridges were Tape data set placement was basically not ordered until they were needed. the catch-all for data sets that were Therefore our MSS had only 42% of the too large for MSS or DASD. allowable cartridges installed but the cartridges installed were 67% 3.5 Migration Procedure allocated. DMS was used to identify those data sets not accessed in five The final phase of the space days and a local program was written management solution was the to read the MSS control table and to development of a migration procedure assign MSS group identification based which would reduce the load on MSS and on data set name. It also supplied remove uneeded DASD data sets without control cards to DMS so that it would inconveniencing the user community. perform the copy. Figure 1 is a summation of the final migration procedure parameters.
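The criteria summarized in Figure 1, together with the 50 MB size split and the 15-uses-per-week threshold described above, amount to a simple decision rule for each data set. The Python sketch below is only an illustration of those rules as we read them; it is not part of DMS, and the function and field names are ours.

    def nightly_action(location, mbytes, days_since_use, uses_last_7_days):
        """Return the action implied by the Figure 1 migration criteria."""
        if location == "DASD":
            if days_since_use >= 5 and mbytes < 50:
                return "migrate to MSS"
            if days_since_use >= 60:            # about two months: large data sets, data bases
                return "archive to tape"        # (small data sets will normally have moved to MSS long before this)
        elif location == "MSS":
            if uses_last_7_days >= 15:
                return "migrate to DASD"
            if days_since_use >= 180:           # about six months
                return "archive to tape"
        elif location == "TAPE":
            if days_since_use >= 365:           # one year
                return "delete"
        return "leave in place"

    print(nightly_action("DASD", 12, 6, 0))     # migrate to MSS
    print(nightly_action("MSS", 30, 0, 20))     # migrate to DASD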

227 .

3.5.2 MSS to DASD RESULTS

All MSS data sets accessed 15 times in Before After Change the past seven days were migrated to DASD. This was done so that the delay MSS READER BUSY 95% 57% -38% from a data set being heavily accessed and its migration to a faster device MSS DATA SET would be minimal but also avoid the STAGE TIME 62SEC 2 7SEC -3 5 sec migration of every MSS data set that was accessed. This resulted in an MSS DATA SET average of ten data sets being QUEUE 6.0 1.1 -4.9 migrated daily. SAS was utilized to read SMF data to identify heavily JOB LONG accessed MSS data sets and to build WAIT TIME 4.3MIN 2.1MIN -2.2 min control cards for DMS to perform the copy JOB ELAPSED TIME 11.5MIN 7.9MIN -3.6 min 3.5.3 DASD to Tape JOB INPUT Q Larger data sets (over 50MB) not used TIME 65MIN 20 MIN -45 min for two months were archived to tape. If required, the user had to request a TSO RESP TIME restore via a TSO command and the data (TRIVIAL) 2.2SEC 1.5SEC -.7 sec sets was returned during nightly maintenance. This was a slight TSO TRANS- inconvenience but one day turnaround ACTIONS (8-5) 55K 67K +12K for data inactive for over two months was not felt to be excessive. JOBS PROCESSED (8-5) 1100 1700 +600 3.5.4 MSS to Tape CPU BUSY(8-5) 65% 85% +20% MSS data sets not accessed in six months were archived to tape. This DASD ADDITION was caused because of the pressure of (6 MON) 8 2 -6 cartridge growth. Again, the user needed only to request a restore on FIGURE 2 TSO to retrieve his data set. As oulined previously, the MSS device 3.5.5 Tape to Delete was a "bottleneck to good system performance. The migration /placement Tape data sets, which had their efforts reduced the MSS reader busy retention periods exceeded were from S% to 57^ with the IBM guideline deleted. In addition, DMS archived value issued as 70^. More importantly, data sets which were over one year old the service provided by MSS improved were deleted. For MSS data sets that as measured by average data set stage were archived after six months, this time reduced from 62 to 27 seconds and involved a six month retention on average number of MSS data sets awaiting tape. For DASD data sets archived service reduced from 6.0 to 1.1. This after two months, this involved a ten in turn was reflected in improved batch 10 month retention. performance by reducing long wait swapped out time from U.3 to 2.1

4 . Results minutes and average job elapsed time from 11.5 to 7-9 minutes. This also The total effort from the perception effects the batch service queues by of the problem to the attainment of reducing average job input queue time improved results was six months. The from 65 to 20 minutes. The TSO system total manpower required was one person also benefited as evidence the full time and two people for 25% of reduction from 2.2 to 1.5 seconds in their time. Figure 2 summarizes the TSO trivial response time while the outcome of this effort.

228

workload executed by TSO increased 1. Data set naming standards - the from 55,000 to 67,000 transaction per users were given six weeks notice day. This increased workload was also that only standard names would be evidenced in the increase in CPU busy retained - all non-standard data from 65 to 85%. sets would be deleted. The user community said there was no way Two notes on those results - they could handle that change in under one year. Management 1. All of this was accomplished with supported our decision and we cut an increase of only two DASD over as planned. Suprisingly, there volume s were fewer than ten requests to 2. The greatest performance restore data sets of the more than improvements were realized by 4000 that were scratched. That eliminating the allocation of new indicated to us the larger number MSS data sets directly to MSS (i.e. of unused data sets on DASD and MSS. reader busy was reduced from 80% to 57% after this was implemented) and 2. Private packs - next to the naming the dedication of 8360 's as local standards the elimination of page data sets (this increased the private packs was the least liked CPU busy percentage from a level of of our changes. But we found that 75% to a level of 85%). if the needed space was provided when the user required it that was We are satisfied that the results all that the user desired. They experienced justified the effort didn't relly want or need their own expended volume

4.1 Continuous Monitoring 3. Reference date updates - since DMS used a data set's last reference In order to insure a good system date for migration purposes it was performance, continuous monitoring of Imperative that this be accurate. the DASD/MSS environment was done. We found that soon after we This included weekly reviews of: implemented our plan users developed programs that opened all 1. DASD channel and string usage using their data sets on a weekly basis RMF data. in order that they remain on DASD. 2. DASD pack usage and queueing using We have caught a few users doing RMF data. this and stopped them. We have 3. Allocated and available DASD space also coded programs that process using DMS reports. SMF data to identify this activity 4. MSS available space using IDCAMS and report it to upper management. utility reports. 5. MSS reader busy and data set 4. PDS directory problems - with our staging performance using SAS release of MVS (SE2) there occurred analysis of MSS trace files. a problem. DMS updates the VTOC each time a data set is opened ;the Any time the data indicated a problem reference time, date, and jobname. in, a particular area corrective action If two users are updating the same was taken immediately. This prevented partitioned data set, a problem can small performance problems from occur of having overlapped VTOC becoming major system bottlenecks. updates which are not replaced in the correct sequence. After losing 4.2 Experiences track of a few partitioned data sets, we turned DMS tracking of Not always was it obvious that the end system-wide partitioned data sets would justify the means. There were off. times where it was doubtful any progress could be made. I felt it would be informative to relate some of our more interesting experiences.

229 .

MSS maintenance procedures - due to 10. Release over-allocated. Through the way MSS picks and stages each the use of DMS we released data set, any maintenance which over-allocated space in operates at the data set level is partitioned and sequential data very time consuming. DMS operates sets. We averaged two volumes of at this level, and it takes about 12 space released during every weekly hours to do a DMS backup of 30 MSS run. This process required less volumes and three (3) hours to scan than fifteen minutes per week. 100 MSS volumes for invalid data set names. For this we chose the 5. Conclusion. incremental backup featyre if DMS whereby only changed data sets are In summary there are three areas which backed up. This dramatically were vital for the success of this reduced our maintenance time. project. They were communication to the user community, management Large record length on MSS - a data commitment; and credibility. set with a record length larger than track size (13030 bytes) could Throughout our effort we continually not be placed on MSS. This communicated with the user via letters occurred because the DASD (3350) to their management and bulletins to maximum is (19069 bytes). We had the programmers as to what we were no solution to this problem and were doing, when the change was effective, forced to move these data sets to why things were changing, and the tape after two months of non-use. effect of the change. To this end we made extensive use of an already Non-catalogued data sets - all of established user group. This user our migrations from MSS to DASD and group contained one representive from vice versa assume that the user can each application and served as our find the location of his data set liaison to the users. by means of its catalog entry. For that reason all volume references There were numerous times where the had to be removed and all user complaints of changes required non-catalogued data sets were could have turned management off to deleted our project. But fortunately they stuck with us and weathered the Empty data sets - when the initial storm. With their backing the migration concept was announced changes were made, the standards wer^ many users sought to guarantee DASD followed, and our project was a space by allocating empty data sets success. just in case they would need it. DMS released over-allocated space The fact that we showed the user that but as of this writing could not we could deliver service was very free totally empty data sets. This important. After placing all those resulted in a manual process of restrictions on them if we had not reviewing listings and deleting been able to accomodate legitimate empty data sets. space requests all of our efforts would have been for naught. I feel APF libraries - data sets in the this was the most important concept of Authorized Program Facility have making a data set placement and their volume serial number listed. migration plan work-credibility with Movement of these data sets caused the user community. system problems. We had to exclude these from our migration. I would like to acknowledge the efforts of Greg Marshall, in trouble shooting DMS problems and Jeff Wides, for sharing his MSS expertise and performance data.


DASD CAPACITY PLANNING 3350 DASD

Keith E. Silliman

IBM Corporation Federal Systems Division 18100 Frederick Pike Gaithersburg, MD 20879

A pragmatic approach to Direct Access Storage Device (DASD) I/O capacity is presented. I/O capacity is viewed as the ability to do I/O in terms of the number of I/O operations and bytes of data transferred. Maximum and "average" I/O capacity algorithms are developed for 3350 DASD. Data transfer and density of reference capacity values are derived. Examples of capacity value usage are given.

Key Words: DASD I/O Capacity, Density of Data Reference, I/O Capacity Algorithms, 3350 DASD.

INTRODUCTION

There is an increased need to understand the capacity of Direct Access Storage Devices (DASD) to do I/O, in terms of the number of I/O operations or data bytes transferred per unit of time. DASD technology has made tremendous advances in increasing data density. The resulting reduction in the cost per megabyte has made possible increased amounts of "online" data. Those increased amounts of online data, the fixed size of existing computer facilities, and the increasing floor space costs associated with new facilities have necessitated even higher density DASD.

Sizing of DASD subsystems based on the number of megabytes of data to be stored, floor space, and available funds frequently produces configurations with the fewest number of the highest density DASD available. As a result, we believe there is an increased awareness of the performance impact of the I/O subsystem [1,2,3,4,5,6], and considerable emphasis is being placed on capacity planning [7,8,9].

DASD capacity to do I/O is a critical consideration in the design and tuning of high performance computer systems, especially online, data retrieval, or Data Base Management Systems (DBMS). Increasing amounts of data are stored on, accessed from, and retrieved from DASD. Responsive user interactions require an adequate DASD subsystem. The design of the DASD subsystem, in turn, depends on definitive system requirements and knowledge of device characteristics and capacities. This paper explores the capacity of the 3350-type DASD to do I/O in an IBM S/370 MVS environment.

3350 DASD

The IBM 3350 [10] is a large capacity, fast access, high data rate Direct Access Storage Device unit. A unit consists of two drives, each with a head/disk assembly (HDA) with 16 recording surfaces. Each drive has a data storage capacity of 317.5 million bytes, allocated in 16,650 tracks (19,069 bytes per track) and grouped into 555 cylinders (30 tracks per cylinder). The disks rotate at 3,600 revolutions per minute, sixty revolutions per second, or once every 16.7 milliseconds. A minimum of 10 milliseconds is required to move the access mechanism (read/write head). The maximum head movement time (seek) is 50 milliseconds. The data transfer rate is 1.198 million bytes per second.
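For reference, a short sketch collecting the 3350 figures quoted above and the per-revolution values derived from them. The constant names are mine, not from the paper.

    # IBM 3350 drive characteristics quoted in the text.
    BYTES_PER_TRACK = 19_069
    TRACKS_PER_CYLINDER = 30
    CYLINDERS = 555
    RPM = 3_600
    TRANSFER_RATE_BPS = 1_198_000      # 1.198 million bytes per second

    REVOLUTIONS_PER_SECOND = RPM / 60                    # 60
    REVOLUTION_MS = 1_000 / REVOLUTIONS_PER_SECOND       # 16.7 ms
    BYTES_PER_MS = TRANSFER_RATE_BPS / 1_000             # ~1,198 ("roughly 1,000")
    CAPACITY_BYTES = BYTES_PER_TRACK * TRACKS_PER_CYLINDER * CYLINDERS

    if __name__ == "__main__":
        print(f"revolution time      : {REVOLUTION_MS:.1f} ms")
        print(f"bytes per millisecond: {BYTES_PER_MS:.0f}")
        print(f"drive capacity       : {CAPACITY_BYTES/1e6:.1f} million bytes")  # ~317.5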


The 3350 is a Rotation Position Sensing (RPS) device [11]. It logically disconnects from the channel following the set sector command and waits for the specified sector to come under the read/write head. That waiting time is the rotational delay time. When the specified sector comes under the read/write head, the device attempts to logically reconnect with the channel. Following a successful reconnect, data transfer commences. At the end of data transfer, a simultaneous channel end/device end I/O interrupt is presented to the CPU, completing the DASD portion of the I/O operation.

If at I/O initiation time the read/write head is not positioned at the correct cylinder, a seek occurs to move it. During the seek, the I/O pathway is available to service other requests. At seek completion, the channel is so notified and the set sector command is issued.

If the device cannot logically reconnect because a pathway component is busy (i.e., head of string, control unit, or channel), a missed reconnect occurs and a full rotation (16.7 milliseconds) must complete before another logical reconnect is attempted. All the while, the device is busy "processing" the I/O operation.

3350 CAPACITY MEASUREMENTS

Measurements of I/O activity provide insight into DASD I/O capacity. A plot of 3350 device busy and I/O operations per hour is presented in Figure 1. The data plotted are one-hour samples for individual 3350 devices that exhibited device utilization greater than 10 percent. (The 10 percent threshold was an arbitrary choice to reduce the total number of samples plotted.) They are from a one-week period on two IBM S/370 158 attached processor (mixed online and batch workload) systems and an IBM 3033 (batch) system. The maximum rate for a one-hour sample was approximately 127,000 I/O operations, or 35 per second, at 72 percent device utilization.

FIGURE 1. 3350 I/O CAPACITY, MEASURED 1-HOUR SAMPLES (percent device busy versus I/O operations per hour).

The plot in Figure 1 indicates an average device service time of approximately 20 milliseconds (derived by multiplying the device utilization by the number of seconds per hour and dividing by the number of I/Os per hour). Samples in the lower right have shorter device service times, indicating small effective data block sizes and/or minimal seeks and missed reconnects. Samples in the upper left have longer device service times, indicating larger effective data block sizes and/or an increased number or duration of seeks and missed reconnects.

A projection of device utilization to 100 percent with a device service time of 20 milliseconds indicates a device capacity of 180,000 I/O operations per hour, or 50 per second. If the device service time were 16.7 milliseconds at 100 percent busy, the capacity would be 216,000 I/Os per hour, or 60 per second (one per revolution).

However, the elongated queue delay effect as a function of resource utilization (response time = service time / (1 - resource utilization)) will cause I/O response time degradation at higher device utilizations.

A measure of queue delay is plotted against device busy for the same samples and is presented in Figure 2. That plot indicates an increase of queue delay with increased device utilization, but with some significant deviations. Samples of relatively high device utilization and low queue delay occur, implying synchronized arrivals, probably a single user on the device.
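The service-time derivation and the 100-percent projection above are simple arithmetic; a sketch follows (variable and function names are mine).

    def device_service_time_ms(device_busy_fraction, io_per_hour):
        """Average device service time implied by a one-hour RMF-style sample:
        busy seconds in the hour divided by the number of I/Os."""
        return device_busy_fraction * 3600 / io_per_hour * 1000

    def capacity_io_per_hour(service_time_ms, utilization=1.0):
        """I/O operations per hour the device could complete at the given
        utilization if every I/O took service_time_ms."""
        return utilization * 3600 * 1000 / service_time_ms

    # The busiest sample quoted in the text: 127,000 I/Os per hour at 72% busy.
    print(device_service_time_ms(0.72, 127_000))   # ~20.4 ms
    print(capacity_io_per_hour(20.0))              # 180,000 at 100% busy
    print(capacity_io_per_hour(16.7))              # ~215,600, i.e. one I/O per revolution
    print(capacity_io_per_hour(20.0, 0.35))        # 63,000 at the 35% rule of thumb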


FIGURE 2. 3350 I/O QUEUEING, MEASURED 1-HOUR SAMPLES (percent device busy versus percent of I/O operations queued).

General guidelines, or rules-of-thumb, for performance tracking and comparison have been developed. Batch system environment values are 35 percent device busy and an average queue length of 0.1 [2,3]. Data base system values are 30 percent device busy and an average queue length of 0.04 [4]. Measured I/O activity rates for 30 to 35 percent device utilization are 36,000 to 72,000 per hour, or 10 to 20 per second (see Figure 1). Up to 15 percent of the I/O operations are queued at 30 to 35 percent device utilization (see Figure 2).

DASD I/O

A DASD I/O operation is the combined effort of the CPU, channel, control unit, head of string, and device. The sequence of events is as follows. A software task requests a DASD I/O operation by executing the Execute Channel Program (EXCP) code, which includes a Start I/O (SIO) instruction. With the successful execution of the SIO (condition code = 0), the I/O subsystem portion of the operation is initiated. A seek may occur to position the read/write head to the specified cylinder. The set sector command is issued by the channel. The device revolves to position the requested data area under the read/write head. The device logically reconnects with the channel and data transfers. The simultaneous channel end/device end I/O interrupt is processed by the I/O Supervisor (IOS) routines in the CPU. Finally, the task that requested the I/O operation is informed that the data transfer has completed. That task then processes the data read, or prepares additional data to be written, and may repeat the process. A schematic diagram of a complete DASD I/O operation is presented in Figure 3.

FIGURE 3. Schematic of a complete DASD I/O operation (EXCP time and the device service time within it).

Device service time is the elapsed time from successful SIO to data transfer completion [3,4]. The EXCP time, or I/O time, is the device service time plus the combined CPU processing of the EXCP, any queue delay incurred initiating the I/O, and the I/O interrupt processing which follows data transfer.

The DASD I/O process may be interrupted or delayed at any of several steps. Higher priority tasks may preempt the CPU from the task attempting to execute the EXCP. A busy component of the logical channel (channel or device) can cause the request to be queued for later processing when the component is available [1]. An unsuccessful SIO (condition code = 1) will occur if the DASD control unit or head of string is busy even though the channel and device are not busy [3]. That condition occurs in a shared DASD or channel/string switched environment. Seek frequency and duration are a function of data set placement, access methods used, and concurrency of device use. The duration of rotational delay is primarily a function of missed reconnects, which occur because an I/O pathway component is busy.

Our experience indicates that DASD I/O times (from EXCP to EXCP) should average 100 milliseconds or less on S/370 Model 158 or larger systems. Five general areas contribute to that elapsed time: (1) CPU time required to execute the EXCP and process the I/O interrupt, (2) logical channel queue time waiting to successfully start the I/O, (3) seek time, (4) rotational delay time, and (5) data transfer.
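As a rough illustration of how those five components combine, the sketch below builds an EXCP-to-EXCP estimate from representative values quoted in the following paragraphs; the function and its defaults are mine and are not the author's model.

    def excp_time_ms(block_bytes,
                     cpu_instructions=10_000, mips=1.0,
                     seek_ms=10.0, rotational_delay_ms=8.3,
                     missed_reconnects=0, queue_ms=0.0,
                     transfer_rate_bytes_per_ms=1_198):
        """Rough EXCP-to-EXCP time for one DASD I/O, built from the five
        components named in the text.  Defaults are representative values
        quoted in the paper, not measurements."""
        cpu_ms = cpu_instructions / (mips * 1_000)        # instructions / (instr per ms)
        transfer_ms = block_bytes / transfer_rate_bytes_per_ms
        rotation_ms = rotational_delay_ms + 16.7 * missed_reconnects
        return cpu_ms + queue_ms + seek_ms + rotation_ms + transfer_ms

    # A 4,000-byte block with one seek, average latency, no missed reconnects:
    print(round(excp_time_ms(4_000), 1))      # ~31.6 ms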


The time required by the CPU to process a DASD I/O operation depends on the CPU speed (MIPS/MOPS) and the number of instructions executed. Measurements of MVS environments indicate 10,000 to 50,000 instructions per I/O operation [5] for supervisory and application functions combined. The number of supervisory instructions is fixed for a given access method. "In MVS 3.7 . . . the path length for an EXCP is approximately 3800 instructions" [12]. The number of application instructions depends on the data block size and the intensity of the scan of the data read or of the build process for data written. The number of instructions divided by the CPU instruction processing rate provides the CPU time.

The logical channel queue time is the delay time spent waiting for the required I/O pathway components to become available to satisfy the request and is a function of the I/O subsystem utilization [6]. The magnitude of the delay can be a multiple of the device service time when multiple requests are made to an individual drive.

The seek time delay occurs whenever the read/write head is not positioned at the correct cylinder. Seek times for an IBM 3350 are 10 to 50 milliseconds. Randomly accessed data will generally have one seek per I/O operation, whereas sequentially accessed data will generally have multiple I/Os per seek. On a contended DASD, even sequentially accessed data may have one seek per I/O.

Rotational delay time occurs for essentially every DASD I/O. The average rotational delay (not including missed reconnects) is 8.3 milliseconds, one-half of a revolution. Missed reconnects occur as a function of the I/O pathway component utilization [6]. Each missed reconnect adds 16.7 milliseconds to device service time, with a corresponding increase in I/O response time. Consecutive missed reconnects can, and do, happen.

Data transfer time depends on the data block size: the number of bytes transferred at the rate of 1.198 million bytes per second (roughly 1,000 bytes per millisecond). For direct access I/O using the Basic Direct Access Method (BDAM), the effective data block size is the actual block size. Using search previous I/O, with either the Basic Sequential Access Method (BSAM) or the Queued Sequential Access Method (QSAM), may result in an "effective" data block size twice the actual block size [3]. Use of multiple buffers may also cause the effective data block size to be larger than the actual.

Three of the I/O component times (CPU time, seek time, and data transfer time) are fixed as a function of the hardware and software. Logical channel queue time and rotational delay time, including missed reconnects, are not bounded, especially in a shared DASD environment.

DASD CAPACITY DISCUSSION

The preceding discussion of 3350 DASD, DASD I/O characteristics, and the data presented in Figure 1 indicates that, in general, a maximum of one I/O operation can occur per device revolution, or 60 per second. Because of contention at the CPU and I/O pathway component level (seeks, missed reconnects, and variable data transfer times), the "average" capacity is significantly less than one transfer per revolution.

Consecutive data blocks can be read or written in consecutive device revolutions (if the CPU is fast enough to execute the required I/O and application instructions to process the previous I/O interrupt, process the data, and initiate the next I/O operation), assuming sequential BDAM reads or writes on one cylinder (no seeks) of a dedicated device, string, control unit, channel, and CPU. However, after the last data block on a track is transferred, the next revolution will occur without a data transfer. Following the transfer of the last data block on track N, the CPU will be busy processing the I/O interrupt while the first data block on track N+1 is moving under the read/write head.

An analytical representation of that relationship for different data block sizes is presented in Figure 4. To read or write 10 consecutive one-tenth track size data blocks requires 11 revolutions. An average of five revolutions will be required to read or write four one-quarter track data blocks. Two consecutive half track blocks require an average of three revolutions, and an average of two revolutions will be required for each full track block.

Figure 4. Accessing Consecutive Data Blocks (CPU, channel/control unit, and device timelines for 0.10-, 0.25-, 0.50-, and full-track block sizes).
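The revolution counts just quoted follow from counting one revolution per block plus one missed revolution at each track boundary. The sketch below is my formulation of that bookkeeping; it reproduces the Figure 4 examples, but it is not necessarily the exact algorithm the author plots in Figure 5, whose curve levels off at 30 I/Os per second from half-track blocks upward.

    REV_MS = 16.7   # one 3350 revolution

    def revolutions_per_block(blocks_per_track):
        """Steady-state revolutions per consecutive block: one revolution for the
        block itself plus a share of the missed revolution at each track boundary."""
        return 1.0 + 1.0 / blocks_per_track

    def max_io_per_second(blocks_per_track):
        return (1000.0 / REV_MS) / revolutions_per_block(blocks_per_track)

    for label, b in [("0.10 track", 10), ("0.25 track", 4), ("0.50 track", 2), ("full track", 1)]:
        revs = b * revolutions_per_block(b)     # revolutions to read one track's worth
        print(f"{label}: {b} block(s) in ~{revs:.0f} revolutions, "
              f"up to ~{max_io_per_second(b):.0f} I/O per second")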


DASD I/O CAPACITY ALGORITHM

Based on the assumption of an effective maximum of one I/O operation per device revolution, with a missed revolution between the consecutive accesses of the last data block on a track and the first on the next track, a Maximum DASD I/O Capacity Algorithm was developed.

This algorithm indicates the maximum capacity of a 3350-type DASD to be 216,000 I/Os per hour (60 I/Os per second) for very small data block sizes. Capacity decreases steadily with increased data block size to 108,000 I/Os per hour (30 I/Os per second) for half track blocks (9,300 bytes) and remains at that level to full track blocks (19,000 bytes). Data block sizes in excess of a full track cause a further decrease in I/O capacity.

AVERAGE DASD I/O CAPACITY

The "average" capacity, rather than the maximum, should be of primary interest to the computer systems analyst, where "average" connotes practical or realistic values. The maximum rate projections assumed dedicated CPU and I/O subsystem components. DASD revolutions will be missed (i.e., go unused) in a contended environment because (1) I/O operations cannot be initiated, (2) head movements (seeks) occur, (3) reconnects are missed, and (4) application tasks are not dispatched in a timely fashion. A minimum seek (one cylinder) requires 10 milliseconds, or more than half a revolution; a maximum seek requires 50 milliseconds, or three revolutions. A single missed reconnect causes a 16.7 millisecond (one revolution) delay.

"Average" DASD I/O capacity may be one-third (or less) of the maximum capacity. An algorithm for Average DASD I/O Capacity, using the one-third value, is as follows:

    Average DASD I/O Capacity = Maximum DASD I/O Capacity / 3.3

This algorithm indicates the average capacity of a 3350 DASD to be approximately 65,000 I/Os per hour (18 I/Os per second) for very small data block sizes. With increased data block size, it decreases to 32,000 I/Os per hour (9 I/Os per second) for half track (9,300 bytes) data blocks and remains at that level to full track (19,000 bytes) block size. Figure 5 presents the algorithm results for maximum and average DASD I/O capacity for data block sizes from 1,000 to 19,000 bytes per data block.

FIGURE 5. MODELED 3350 PERFORMANCE: I/O PER SECOND VS DATA BLOCK SIZE (maximum and average).

The I/O capacity values obtained from the algorithm are consistent with the measurement data presented in Figure 1 and the rule-of-thumb value of 30 to 35 percent for I/O component busy. Twenty-seven percent (62 of 229) of the samples in Figure 1 are between 32,000 and 65,000 I/Os per hour. Only six percent (14 of 229) exceed 65,000 per hour.
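A short check of the average-capacity arithmetic against the figures just quoted (a sketch; the dictionary of maximum values simply restates the paper's numbers):

    # The paper's quoted maximum-capacity endpoints (I/Os per hour).
    MAX_CAPACITY = {"very small blocks": 216_000, "half track to full track": 108_000}

    def average_capacity(max_io_per_hour):
        """'Average' DASD I/O capacity = maximum capacity / 3.3."""
        return max_io_per_hour / 3.3

    for label, max_cap in MAX_CAPACITY.items():
        avg = average_capacity(max_cap)
        print(f"{label:26s}: max {max_cap/3600:4.0f}/s, "
              f"average {avg:7,.0f}/hour (~{avg/3600:.0f}/s)")
    # very small blocks         : max   60/s, average  65,455/hour (~18/s)
    # half track to full track  : max   30/s, average  32,727/hour (~9/s)

    # Consistency check with the Figure 1 samples quoted in the text:
    print(f"{62/229:.0%} of the 229 samples fall between 32,000 and 65,000 I/Os per hour")  # 27%
    print(f"{14/229:.0%} exceed 65,000 I/Os per hour")                                      # 6%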


Those values are from a population of samples of unknown data block sizes that did not include any samples with device utilization less than ten percent.

The 35 percent device busy rule of thumb implies similar capacity values. For a device service time of 20 milliseconds (approximately the value derived from the samples in Figure 1), the device capacity at 35 percent busy would be 63,000 I/Os per hour, or 17.5 per second (.35 * 3600 / .020 sec). For a 35 millisecond device service time, the device capacity would be 36,000 I/Os per hour, or 10 per second (.35 * 3600 / .035 sec). For a constant device utilization value (e.g., 35 percent), the device capacity decreases as device service time increases.

DATA TRANSFER CAPACITY

Multiplying the derived DASD I/O capacity values by the corresponding data block sizes produces capacity data rates in bytes transferred. The maximum and "average" data rate capacities are plotted as a function of data block size in Figure 6. The maximum data rate capacity indicated is approximately 570,000 bytes per second (30 full track blocks), which is half the 1.198 million bytes per second instantaneous data rate. The corresponding average data rate capacity is 170,000 bytes per second (9 full track blocks). Smaller data block sizes produce fewer bytes per second even though the number of data blocks transferred is larger.

FIGURE 6. MODELED 3350 PERFORMANCE: BYTES PER SECOND VS DATA BLOCK SIZE (maximum and average).

DENSITY OF REFERENCE CAPACITY

Dividing the algorithm-derived I/O capacity values by the number of hundreds of megabytes of data stored on a 3350 (3.17) gives a measure of I/O capacity per 100 megabytes of data storage, or Density of Reference Capacity. The maximum and average 3350 density of reference capacities are plotted as a function of data block size in Figure 7. The maximum density of reference (per 100 megabytes) is indicated to be 18 I/Os per second for very small data block sizes, decreasing with increased data block size to 9 I/Os per second for half track and larger. The average density of reference is indicated to be 5 I/Os per second for very small data block sizes, decreasing with increased data block size to 3 I/Os per second at half track and larger data block sizes.

FIGURE 7. MODELED 3350 PERFORMANCE: DENSITY OF DATA REFERENCE (I/Os per second per 100 megabytes versus data block size, maximum and average).

CAPACITY VALUE USAGE

DASD I/O capacity values can be used as guideline, or rule-of-thumb, values. For example, while doing a data flow analysis of a planned response-oriented computer system, an implicit requirement to accomplish 18-20 I/Os per second (at 4 k bytes each) to a 3350 data file becomes apparent. A comparison with the 3350 DASD capacity values indicates that 18-20 I/Os per second is within the capacity of the device but above the average for that data block size. Project management should be alerted to a possible performance problem. Cost/performance trade studies should be initiated to assess the benefit/impact of increasing the data block size or distributing the data file (and its accesses) across multiple devices.
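The data-rate and density-of-reference values quoted above follow directly from the capacity algorithm; a sketch of the arithmetic (names are mine):

    HUNDRED_MB_PER_3350 = 3.17     # hundreds of megabytes stored on one 3350
    FULL_TRACK = 19_000            # bytes, as rounded in the text

    def data_rate_bytes_per_sec(io_per_sec, block_bytes):
        """Capacity data rate: I/O rate times effective block size."""
        return io_per_sec * block_bytes

    def density_of_reference(io_per_sec):
        """I/O capacity per 100 megabytes of 3350 storage."""
        return io_per_sec / HUNDRED_MB_PER_3350

    print(data_rate_bytes_per_sec(30, FULL_TRACK))  # 570,000 bytes/s (maximum, full-track blocks)
    print(data_rate_bytes_per_sec(9, FULL_TRACK))   # 171,000 bytes/s (average; ~170,000 quoted)
    print(round(density_of_reference(60)))          # ~19 per 100 MB (18 quoted, small blocks)
    print(round(density_of_reference(30)))          # ~9  (half track and larger, maximum)
    print(round(density_of_reference(9)))           # ~3  (half track and larger, average)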


The planning process for the migration of data from 3330-1 to 3350 DASD could use these values. I/O activity rates on the existing DASD are either known or measurable values. For 3330-1 DASD, the I/O rate measures are density of reference values per 100 megabytes, because the 3330-1 is a 100 megabyte device. Comparison of those 3330-1 values with the derived 3350 values indicates whether the I/O capacity requirements of the data to be moved exceed the capacity of the intended 3350 DASD. The average density of reference of the volumes to be migrated should not exceed the projected value for the receiving device. If the I/O rate to existing 3330-1 volumes exceeds the projected values for 3350s, the migration may result in increased I/O contention and degraded I/O response times.

Systems tuners should compare these capacity values with their system measurements. Significant deviations, either below or above these values, warrant further investigation. Systems not completing work or providing unsatisfactory response times should be especially scrutinized. If sufficient workload exists, low activity rates (less than 5 I/Os per second) on the most active devices may indicate excessive head movement or I/O pathway contention problems. Values above the average may indicate too much data per volume (data base) or an insufficient number of devices (paging and public work space). Systems with heavily accessed volumes (10 to 20, or greater, I/Os per second) and a CPU with a significant amount of wait time (20 to 30 percent or greater) need to have the access frequency and contention reduced by distributing the data accesses across more devices.

SUMMARY

We believe DASD capacity should mean more than the number of bytes of data stored on the device. Computer system analysts and managers need to become sensitive to DASD capacity to do I/O: the number of I/O operations and the bytes of data transferred per unit time.

A discussion was presented indicating the combined, coordinated interaction of the CPU and I/O subsystem components required to accomplish DASD I/O. The speed of the CPU does not contribute to DASD I/O capacity, or impair it, as long as the combined instruction execution time of I/O and application code, plus any resultant CPU queue delay, is less than the revolution period of the DASD (16.7 milliseconds for 3350s). CPUs with less capacity than an S/370 Model 158, application functions with a high level of data manipulation, or a high multiprogramming level may have 3350 DASD I/O capacities lower than indicated here.

A 3350 DASD I/O capacity algorithm that quantifies DASD I/O capacity, in I/Os per second and bytes transferred per second, has been developed. Maximum and average capacity predictions were made. We believe this algorithm can be generalized to all DASD but have restricted our presentation and discussion to 3350-type DASD. Units of measure and predicted values are presented for DASD density of reference. They may be helpful in migrating data to higher density DASD.

The effective data block size and DASD track size are critical components of DASD I/O capacity. In general, the maximum number of I/O operations per second occurs with the smallest block sizes, where the CPU and data transfer times are minimized. For small data block sizes, the majority of the resource utilization burden is on the CPU. Increased data block sizes cause additional CPU cost to process the data as well as increased transfer times. However, CPU utilization decreases because I/O operations occur less frequently, whereas I/O pathway component utilization increases. The overall time required to process a given amount of data is reduced because of the dramatically increased number of bytes of data transferred per second.

The number of blocks of data per track determines the maximum I/O capacity of a DASD. Missing the next consecutive data block in a sequential I/O operation following the last block on the track becomes increasingly frequent with increased block size. For data block sizes greater than half track, essentially every data block is the last one on the track.

The I/O capacity measure of DASD I/O operations per second may be a more meaningful guideline than the 30 to 35 percent device utilization rule-of-thumb values. Average 3350 DASD I/O capacity is predicted to be between 9 and 18 I/O operations per second, depending on the effective data block size (for one-hour intervals). Unknown data block sizes can be approximated (channel service time multiplied by the device transfer rate).


References

1. K. Ziegler, Jr., "DASD Configuration and Sharing Considerations," Technical Bulletin GG22-9052, IBM Corporation (1978). Available through the local IBM Branch Office.

2. 'OS/VS2 MVS Performance Notebook,' GC28-0886, IBM Corporation (1979). Available through the local IBM Branch Office.

3. R. M. Schardt, "An MVS Tuning Approach," IBM Systems Journal 19, No. 1, 102-119 (1980).

4. S. Holt, "Information Management System - Virtual Storage - Multiple Virtual Storage (IMS/VS/MVS) Performance and Tuning Guide," Technical Bulletin G320-6004, IBM Corporation (1980), Available through the local IBM Branch Office.

5. J. B. Major, "Processor, I/O Path and DASD Configuration Capacity," IBM Systems Journal 20, No. 1, 63-85 (1981).

6. S. E. Frisenborg, "DASD Path and Device Contention Considerations," Technical Bulletin G22-9217 (1981) IBM Corporation, Available through the local IBM Branch Office.

7. L. Bronner, "Capacity Planning: An Introduction," Technical Bulletin GG22-9001, IBM Corporation (1977), Available through the local IBM Branch Office

8. L. Bronner, "Capacity Planning: Implementation," Technical Bulletin GG22-9015, IBM Corporation (1979), Available through the local IBM Branch Office.

9. 'Capacity Planning Extended (CPX),' SB21-2392, IBM Corporation (1981), Available through the local IBM Branch Office.

10. 'Reference Manual for IBM 3350 Direct Access Storage,' GA26-1638, IBM Corporation (1977), Available through the local IBM Branch Office.

11. D. T. Brown, R. L. Eibsen, and C. A. Thorn, "Channel and Direct Access Device Architecture," IBM Systems Journal 11, No. 3, 186-199 (1972).

12. C. C. Burns, "Memory Can Increase Your Capacity," Technical Bulletin GG22-9053, IBM Corporation (1981). Available through the local IBM Branch Office.

AN ANALYSIS OF A CDC 844-41 DISK SUBSYSTEM

J. William Atwood Keh-Chiang Yu

Department of Computer Sciences University of Texas at Austin Austin, TX 78712

In contrast with IBM control programs, input/output supervisor software for CDC systems frequently issues transfer requests for entire program images (e.g., when loading or rolling out a program). These program image transfers have substantially larger service times than regular input/output accesses. A detailed simulation model has been used to show that, for CDC systems, significant performance advantages normally accrue when the access path to the rollout files is disjoint from the path(s) used for regular input/output accesses. The system modelled has a 12-spindle CDC 844-41 disk subsystem, which is shared between two CDC Cyber 170/750 central processors. Projections for a 20-spindle subsystem are also reported.

Keywords: CDC 844-41; computer system evaluation; disk subsystem configurations; input/output; modeling; rollin/rollout files; simulation.

1. Introduction

After an upgrade in central processor capacity at the Computation Center of the University of Texas at Austin (UTA), a careful measurement study was instituted to evaluate the performance of the new configuration. Event-trace data were recorded from the system, and a simulation model was validated. The model was used to project the performance of the system if the file allocation algorithms were altered in order to minimize the expected seek time for the disk subsystem. The observed data and the model results have been reported in [1].²

While making the measurements it was observed that typical service times for most input/output accesses to the disk subsystem were in the range of 35-45 ms. However, rollout accesses had typical service times in the range of 150-250 ms. Input/output software for CDC processors normally processes these two kinds of access identically, i.e., a new file is allocated space in the next (or most) available spindle, regardless of the type of the file. Thus the disk subsystem appears as a number of identical servers, with a service time distribution containing two distinct components with widely differing means.

When the two new CDC Cyber 170/750 processors were installed at UTA, the disk subsystem consisted of 12 spindles and three controllers. After examining the preliminary measurement results, it was conjectured that the merging of rollout file accesses with regular file accesses was operating to the detriment of the overall performance of the disk subsystem. When the rollout files were confined to a single disk spindle, connected to the processors through a fourth controller, a spectacular improvement was observed, in that the interactive response time for short requests dropped by a factor of four.

As it was not clear whether the improvement was due to the separation of the two file types, or to the additional capacity provided by the new controller, a study was instituted to examine various disk subsystem configurations and recommend a best configuration for maximizing the performance of the overall system. It has been found that separation of rollout files from regular files results in improved performance, except in certain cases of severe overload of the disk subsystem.
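The effect of folding the long rollout transfers into the stream of short regular accesses can be seen even in a single-server queueing illustration. The sketch below uses the Pollaczek-Khinchine mean-wait formula with a two-class service mixture; it is an illustration only, not the authors' simulation model, and the rates and mix are made up rather than taken from the trace tapes.

    def mg1_mean_wait(arrival_rate, service_means, mix):
        """Mean queueing delay for a single server fed a two-class mixture of
        exponential service times (M/G/1, Pollaczek-Khinchine)."""
        es = sum(p * m for p, m in zip(mix, service_means))             # E[S]
        es2 = sum(p * 2 * m * m for p, m in zip(mix, service_means))    # E[S^2]
        rho = arrival_rate * es
        return arrival_rate * es2 / (2 * (1 - rho))

    regular_only = mg1_mean_wait(0.015, [40.0], [1.0])             # 15 req/s of 40 ms accesses
    mixed = mg1_mean_wait(0.015, [40.0, 200.0], [0.95, 0.05])      # 5% of requests are rollouts
    print(f"regular only: {regular_only:.1f} ms queueing delay")   # ~60 ms
    print(f"5% rollouts : {mixed:.1f} ms queueing delay")          # ~189 ms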

In section 2, the important characteristics of the disk subsystem are discussed. Section 3 contains a discussion of the experimental environment in which the disk parameters were measured. Section 4

Present address: Department of Computer Science, Concordia University, Montreal, Quebec, Canada.

² Figures in brackets indicate the literature references at the end of this paper.


introduces the simulation model. Section 5 I/O software within NOS and UT-2D gives some data concerning the observed requests transfers in units of a sector, performance of the various parts of the disk which makes it convenient to specify any subsystem. In section 6, the model unit of transfer from one sector up to projections for various 12-spindle several logical tracks in one I/O request. configurations are presented, and the Requests originating from user programs observed effects are explained. In sections normally involve the transfer of small 7 and 8, configurations with altered file quantities of information. Requests management algorithms and increased numbers originating directly from control program of spindles are evaluated, and the activity normally involve the transfer of desirability of separating rollout and much larger blocks of information (e.g., a regular file accesses is shown to be even whole compiler) , and are also much less greater than for the cases reported in frequent section 6. Section 9 concludes the paper with discussion of the results and 2.4 Disk Spindle Parameters indications of directions for further research. The elapsed time (service time), T , at . a disk spindle required to complete a 2. Disk Subsystem Parameters request for data transfer to/from a moveable-head disk consists of three components (see Figure 2-1):

2.1 Disk Subsystem Configurations 1 . the time required to move the reading head to the cylinder In the CDC Cyber 170 system measured, containing the data (seek time, T^) each 844-41 (IBM 3330-equivalent) disk spindle is connected to two 7154 2. the time required to rotate the controllers, which are each connected, via a spindle so that the data to be channel and a peripheral processor (PP), to transferred is under the read/write both central processors. head (rotational latency time, T^^);

2.2 Data Arrangement 3. the time required to actually transfer (read/write) the data CDC 844-41 disks are sectored; there between central memory and the disk are 24 sectors per recording surface, and 19 spindle (data transfer time, T. ). data recording surfaces per cylinder, making a total of 456 sectors per cylinder. Because the 7154 controller can buffer only one sector at a time, a PP cannot transfer a received sector to central memory and issue the read function for the next sector before it has passed the read heads. Therefore files are allocated to alternating sectors: a logical track consists entirely of even-numbered sectors on a cylinder, or odd-numbered sectors. This is called half-tracking by CDC. For the UT-2D control program [2] used on the system measured, a logical track contains 228 sectors. For CDC's Network Operating System (NOS) [3], running on CDC 6000 Series or Cyber processors, the disk is divided into top and bottom (9-surface) halves, and a logical T : seek time s track contains 107 sectors.

T^ : rotational latency time 2.3 File Types

T^ : data transfer time For UT-2D and NOS, files are of four types: system files, rollout files, Figure 2-1: Disk Subsystem Service Times permanent files, and local files. In UT-2D, permanent files are allocated using a ring counter, to spread them evenly across all spindles. Local files are allocated to the The time required to complete an I/O "least full" spindle. Space for all file request consists of the disk service time, types is allocated in units of one logical plus waiting times for the disk, the track. For UT-2D, when a logical track is controller, the channel, and a peripherial requested to create or extend a file, the processor first (lowest numbered) track on the selected spindle is chosen. This tends to 3. Experimental Environment pack the disk towards the lower-numbered cylinders, and to spread all file types evenly across all in-use cylinders. When 3.1 System Configuration the last sector of a logical track is read or written, the PP program must follow a The main computational resource at the track chain (in a table in central mem.ory) University of Texas at Austin Computation to determine the location (on disk) of the Center (at the time of the study) consisted logical track containing the next sequential of two Cyber 170/750 central processors, sector of the file. each with 262,144 60-bit words of central

240 . . memory, 20 peripheral processors, and 24 simulation language called QSIM [6]. The channels. The two Cyber processors share QSIM preprocessor converts the model access to a 505,204 word extended core description into a Fortran program. A storage (ECS) facility, and to a mass simulation model was used because of the storage (disk) subsystem. The disk desire to accurately model the details of subsystem consists of four CDC 7154 disk the seek, latency, and data transfer controllers and twelve spindles of CDC activity, and because of the two-path 844-41 disk storage. switches in front of both the disk spindles (access via two controllers), and the The two processors run under control of controllers (access from two CPUs). the UT-2D operating system [2]. The normal mode of operation is that interactive jobs The model has two levels, which are run on CPU A, and batch jobs are run on represent the general job flow and the I/O CPU B. When interactive jobs request input subsystem, respectively. The first level from their terminal, they are rolled out to consists of two closed chains which model ECS. If ECS gets too full, jobs with long the processors, the tape drives, the residence times are swapped to a disk rollin/rollout servers, and the user think spindle with its own controller. Batch jobs time. The second level model is patterned are rolled out directly to the same swapping after the physical interconnections (among spindle from central memory, without going channels, controllers, and spindles) which through ECS. System, local, and permanent occur in the actual system. Because of the files are kept on the remaining eleven fact that these resources are held spindles, and accessed through three simultaneously by a job, passive resources dual-access controllers. [?] are used heavily in the model. The complete model is described in Yu [8]. UT-2D has two primary components in each processor: MTR (MoniToR), which This model was validated by comparing executes in PP zero, and which is in overall its performance statistics to the control of the system (on that processor); observations reported in section 5. The and CMR (Central Memory Resident), which essential feature required for model runs on the central processor, and performs validation was the introduction of the functions which are difficult or multi-part model for the data transfer inconvenient for MTR to do itself (usually characteristic, as described in section 7. because they require substantial computat-

ional resources) . One other PP on each 5. Measured Values processor is dedicated to managing the system console display; the remaining 18 PPs

form a pool to be used for input/output 5 . 1 Interactive Response Time operations and for other operating system functions When an interactive transaction is completed, the program image is rolled out 3.2 Data Recording to ECS. If the program image resides too long in ECS, it is copied to disk. When the Data concerning the operation of UT-2D interactive user hits carriage return, those were recorded by an event-trace probe images which have spilled over to disk must embedded in MTR and CMR [4], using a buffer first be copied back to ECS, prior to being managed by CMR. This buffer is periodically rolled into central memory. The interactive written to tape by a transient PP routine. response time thus consists of the time For the models and data reported later in between the initiation of a request for this paper, two tapes were running (one for rollin and the initiation of a request for each machine), over the same time period, rollout, for all transactions, plus the time but the event recording was not synchronized required to copy from disk to ECS (the disk between the two machines. subsystem response time for a rollin

request) , for those transactions which have Data concerning the disks were recorded spilled over, averaged over all at the conclusion of each data transfer, by transactions the disk driver which is resident in all PP routines. All waiting times that occur from 5.2 Disk Subsystem Performance the time that the channel is requested to the time that the channel is released are Two trace tapes representing quite recorded. If the data transfer involves different load situations are presented in multiple logical tracks, the data for each the two following subsections. logical track are recorded separately, but it is possible to distinguish these cases. 5.2.1 Trace Tapes 72/73

The data are reduced by an event Trace tapes 72 and 73 were taken on a delogging program called EVENTD [5], which busy day, at about 11.30 a.m. The number of produces statistics and histograms for interactive users on line was 109, but the events of interest, as well as a summary demands that were placed on the two page of overall statistics. processors were relatively light. Therefore these tapes represent a situation where no 4. A Queueing Network Simulation Model component of the disk subsystem is a bottleneck (i.e., has a substantial queue), The results described in section 5 have so that the disk subsystem is, in some been used in a simulation model of the sense, operating in a "linear" range. Note, configuration described in section 3.1. The however, that the disk and controller model is written in a queueing network utilizations are substantially in excess of

241 1 . . those recommended by CDC for this disk For regular file accesses, the controller subsystem. wait is almost as large as the disk service time, and the channel waiting time is larger that the sum of all other wait times. Thus Table 5-1: Disk Subsystem Times—Tapes 72/73 severe queueing is present at the channel and the controller, and the disk subsystem Interactive Batch is overloaded. The last line of Table 5-2 Pop* Time % Pop* Time gives the disk subsystem response time for interactive spillover rollout and batch Regular Accesses rollout requests. These are actually Chan wait 0 204 9 6 14 0 107 7 .0 10 slightly smaller than those for trace tapes

Cont wait 0 240 1 3 17 0 294 19 .2 28 72/73. Spin wait 0 089 4 2 6 0 053 3 .5 5 Spin serv 0 911 42 8 63 0 586 38 .3 56 6. Projections for Alternate Configurations

Total 1 444 67 9 100 1 040 68 .0 100 Several experiments have been performed to examine the question: what is the effect Rollout Accesses of merging rollout file accesses with Total 979.0 1064.0 regular (user) accesses to the disk subsystem? Since rollout transfers tend to * Average number of jobs waiting or being be much longer than regular transfers, served at each point. variance will be introduced into the service time for the disk spindles, which will have the effect of lengthening the response time of the disk subsystem for regular accesses. On the other hand, if the fourth controller Table 5-1 presents the components that (used for rollout files) is less utilized make up the disk subsystem response time. than the first three (used for regular

For regular file accesses, actual disk files) , then the average channel utilization service time accounts for two thirds of the for regular files may be lowered, resulting disk subsystem response time, with the in a shortened disk subsystem response time. remaining one third resulting from queueing The disk subsystem response time for rollout for a channel or a controller. Thus file accesses should be shortened in all queueing is present at the channel and at cases, because more paths will be available. the controller, but it is not severe. The last line of Table 5-1 gives the disk sub- Three configurations of the disk system response time for interactive spill- subsystem are presented: over rollout and batch rollout requests. 1 . Three controllers and 1 1 spindles for regular file accesses, arranged 5.2.2 Trace Tapes 74/75 in a triangle with (4, 4, 3) spindles on each leg, one controller Trace tapes 74 and 75 were taken on the and one spindle dedicated to rollout same day as trace tapes 72 and 73. but at file transfers (configuration A). 3.30 p.m. The number of interactive users This is the actual hardware increased only slightly to 112, but the configuration measured. demands that they placed on the processors were much heavier. Therefore the total 2. Four controllers and 12 spindles, controller capacity tends to limit the arranged in a square with three overall system performance, and the disk spindles on each leg. All 12 subsystem is operating at well in excess of spindles are used indiscriminately maximum reoomended utilization. for both kinds of transfers (configuration B)

Table 5-2: Disk Subsystem Times—Tapes 74/75 3. Four controllers and 12 spindles, arranged as for configuration B, but Interactive Batch with rollout files confined to a Pop* Time % Pop* Time % single spindle, and regular files on the remaining eleven (configuration C) Chan wait 2 880 111 5 55 3 270 118 3 58 The motivation for configuration C is that

Cont wait 0 924 35 8 18 1 020 36 9 18 it is "similar" to configuration A in the Spin wait 0 144 5 6 3 0 143 5 2 3 sense that only one spindle contains rollout

Spin serv 1 260 48 8 24 1 200 43 4 21 files.

Total 5 208 201 7 100 5 633 203 8 100 6.1 Trace Tapes 72/73

Rollout Accesses Table 6-1 reports the results for trace Total 904 .0 849.0 tapes 72 and 73. As can be seen, the rollout channel (channel 4, configuration A) * Average number of jobs waiting or being has a utilization approximately equal to the served at each point. utilization of the regular channels. Therefore, effects due to increased channel availability should be negligible, and the changes in system performance should be due entirely to the variance introduced by Table 5-2 presents the components that merging the two classes of files. make up the disk subsystem response time.

242 Examining the data for configuration B, rise, while the others fall, resulting in an it may be seen that the disk subsystem unbalanced situation, and a further decrease response time increases for regular in system performance: interactive response accesses, and decreases for rollout time rises by 25% compared with accesses, as expected. The overall configuration A, and batch throughput falls controller utilizations, channel by 5.51. Note that when the system becomes utilizations, and logical channel queue saturated, the two channels used for rollout lengths increase, for both the interactive files will bottleneck first because of the and batch processors. As a result, the unbalance. This suggests that confining the interactive response time increases by 13J, rollout activity to a single spindle is and the batch throughput drops by 2.4J. undesirable, when regular and rollout files share controllers.

Table 6-1: Utilizations and response times for tapes 72/73 Table 6-2: Utilizations and response times for tapes 74/75 Logical Channel Queue Length Interactive Logical Channel Queue Length 1 2 3 4 Total Conf ig Interactive

1 2 3 4 Total Config

.076 . 080 .047 .430 0.633 A

.123 . 156 .132 .117 0.528 B 1 .07 1 .06 .757 .436 3.323 A

.119 . 059 .084 .822 1.084 C .661 .537 .484 .581 2.263 B

.422 .285 .476 1 .42 2.603 C Batch

1 2 3 4 Total Config Batch

1 2 3 4 Total Config

.035 . 041 .028 .233 0.633 A

.101 . 074 .083 .080 0.338 B 1.26 1 .21 .800 .210 3.480 A

.081 . 020 .041 .315 0.487 C 1 .00 1.03 .951 .955 3.936 B

1 .02 .627 .948 1 .45 4.045 C Channel Utilization Interactive Channel Utilization

1 2 3 4 Total Config Interactive

1 2 3 4 Total Config

.415 . 440 .384 .459 1.698 A .468 474 .467 .459 1.868 B .771 .796 .762 .477 2.806 A .598 414 .361 .653 2.026 C .733 .753 .742 .720 2.948 B .806 .659 .638 .822 2.925 C Batch

1 2 3 4 Total Config Batch

1 2 3 4 Total Config

.312 , 332 .289 .415 1.348 A .379 376 .357 .361 1.473 B .775 .823 .768 .413 2.779 A .490 287 .235 .560 1.572 C .715 .723 .712 .693 2.843 B .792 .623 .584 .795 2.794 C Controller Utilization (interactive + batch)

1 2 3 4 Total Config Controller Utilization (interactive

1 2 3 4 Total Config .551 575 .512 .607 2.245 A .592 599 .579 .579 2.349 B .914 .930 .908 .629 3.381 A .699 531 .470 .756 2,456 C .879 .890 .883 .866 3.518 B .915 .826 .794 .920 3.455 C Interactive Statistics Configuration A Interactive Statistics Configuration ABC Response time 554 626 695 Disk subsystem Response time 1664 1505 1534 response time Disk subsystem (regular) 67.6 92.6 103 response time (rollout) 979 417 897 (regular) 202 179 172 (rollout) 904 577 1033 Batch Statistics Configuration A Batch Statistics Configuration A Throughput 25.4 24.8 24.0 Disk subsystem Throughput 42.3 38.8 38.8 response time Disk subsystem (regular) 67.9 96.6 104 response time (rollout) 1064 669 939 (regular) 204 244 242 (rollout) 849 740 947

With configuration C, the controller and channel utilizations for those controllers connected to the rollout disk

243 . ; . . r r . r

6.2 Trace Tapes 7H/75 necessitates a search of, on average, one half of a cylinder, requiring one half of 19 Table 6-2 presents the results of the rotations of the spindle, or 158.333 ras. In analysis of configurations A, B, and C for this section a series of experiments are trace tapes 74 and 75. As the utilization reported in which this component is deleted of the rollout channel in configuration A is (i.e., the required information is assumed considerably smaller than the average to be stored in central memory) utilization of the regular channels, spreading the files evenly over four Table 7-1 shows the change in disk controllers lowers the average channel subsystem response time for regular file utilization for those channels which were accesses, for trace tapes 72/73 and 74/75. used for regular files in configuration A. Comparing case I with case II, it may be seen that a substantial improvement in the

On the interactive processor, the disk spindle service time, T^ , has occurred. decrease in response time due to more This drop in T, is reflected in an even channel availability is sufficient to greater drop in the overall disk subsystem

overcome the increase caused by the response time, T , due to reduction of introduction of variance, resulting in a queueing caused by high channel and modest improvement in the response time controller utilizations. (For example, for (9.0% for configuration B, 7.8% for trace tape 74, a drop of 7.0 ms. in T,

configuration C) , and in the disk subsystem results in a drop of 90.0 ms. in the disk response time for regular file accesses. subsystem response time T .) However, due to the larger variance on the batch processor (rollout file accesses are longer and regular file accesses are Table 7-1: Disk subsystem response times shorter), the net result is an 8.3% decrease (regular file accesses). in batch throughput, accompanied by an increase in the disk subsystem response time Trace Tapes 72/73 for regular file accesses. Note also that, because of controller saturation, the disk Configuration A B C subsystem response time for batch rollout files accesses is worse in configuration C System case T T chg"^ T chg"^ -r — — than it is in configuration A. Inter. I* 42.8 67 6 96 2 +42% 104. +53%

6.3 Discussion II 33.4 45 1 82 0 +82% 84.3 +87%

These three experiments at two load Batch I 38.3 67 9 96 6 +42% 104. +53% levels appear to support our contention that II 31 .8 44 6 78 4 +76% 81 .8 +83% for normal to heavy loads, the variance introduced by merging long and short accesses results in a reduced system Trace Tapes 74/75 capacity, while for extremely heavy loads the lowered channel utilizations compensate Configuration A B C for the variance effects, making the separate controller configuration less System case chg"^ chg"^ —T —T —T desirable Inter I* 48 9 202. 179. -11% 172. -15%

7 . Projections for Altered File Management II 41 9 1 12. 131 . + 17% 138. +23%

The distribution of data transfer times Batch I 43 4 204. 244. +20% 262. +28% for regular file accesses in UT-2D can be II 39 4 123. 177. +44% 191. +55% divided into four distinct components [1]: 1. An exponentially distributed compon-

ent, with a mean of approximately 10 *I : before deletion ms., representing normal user I/O; II: after deletion """percent change compared with config A 2. An uniformly distributed component, with a mean of 158 ms., representing searches for end-of-information when updating random access files; An additional result of interest is 3. A component representing accesses that, because the rollout file accesses and requiring two seeks (i.e., the total the regular file accesses now have mean data to be transferred in one access service times that are even more disparate, is split between two physical the variance introduced by merging the two cylinders) file types together causes a greater (percentage) degradation in the performance 4. A component representing loading or of the disk subsystem in case II than is checkpointing operations (typically observed for case I. requiring three or more seeks) The second component is an artifact of the Given that the uniform component in organization of UT-2D file accesses, and is case I is an artifact of the organization of necessitated by the fact that the position file accesses in UT-2D, which may be easily of the end-of-information sector within a corrected by reprogramming (in fact, this

logical track can only be determined by change has already been made) , it follows reading sectors from the logical track until that case II is the preferred mode of end-of-information is discovered. This operation. This implies that separation of

244 . . . rollout files from regular files is likely clearly superior to all other configurations to be desirable, even if the rollout studied controller is underutilized. This result should therefore be applicable to other CDC This again reinforces our previous operating systems, as the end-of-inforraation conclusion that rollout file activity should position is stored in central memory (in the be separated from regular input/output File Status Table entry) for systems such as activity, especially if it can be given CDC's NOS [3] adequate transfer capacity.

8. A 20-3pindle Configuration 9. Discussion

As the interactive processor utilization is less than 0.5 for a 9.1 Applicability of the Data 12-spindle disk subsystem, and as other studies had indicated that interactive The data observed are for a unique activity would saturate the disk subsystem operating system, executing a computational at about 150 users [1], various 20-spindle workload which is almost entirely research- configurations were investigated, with an and student-oriented, i.e., which has little interactive load of 200 users. business data processing component. However, except for the way in which direct Various configurations were access updating is done, UT-2D is investigated: Configuration D is identical essentially identical to NOS as far as to configuration A, except that two organization of the I/O subsystem is controllers are added, with eight spindles concerned. Therefore we feel that the data between thera. Configurations E and F and the models presented here are also arrange the 20 spindles into two triangles, applicable to CDC 6000 Series and Cyber with one rollout disk for each class of systems operating under NOS. Valid models rollout file (batch and spillover for NOS systems could be obtained either by interactive) . Configurations G and H are deleting the direct access component (see similar to configuration C, except that section 7) in the present models, or by eight spindles are added between two running the models with data gathered controllers, and two spindles are dedicated directly from a NOS system [9]. to rollout files, as for configurations E and F. 9.2 Comparison vilth IBH Systems

The transfer time characteristic has Table 8-1: Statistics for 20-spindle several components, and the details of its configuration shape were essential to achieving validation of the system model. The characteristic is Configuration D E F G H D radically different from published characteristics for IBM 3330 systems [10, Response time 1854 1836 1744 1795 1813 1424 11], in spite of the fact that the physical Throughput 46.6 43.2 45.1 43.1 44.1 47.2 equipment (disk spindle) is identical. This Disk Subsystem difference may be due to the fact that the response time I/O software of UT-2D (and NOS) organizes Interactive data accesses to the disk subsystem in quite (regular) 133 199 189 197 202 146 different ways from the ways in which (rollout) 5280 955 792 792 829 387 System/370 I/O Supervisors organize data Batch accesses. This difference is so strong that (regular) 113 185 159 185 179 122 Zahorjan [11] found no differences in his (rollout) 1561 908 771 855 855 405 results when he varied the shape or the mean value of the transfer time distribution, whereas our results depend critically on these parameters.

The first five columns of Table 8-1 The major point of difference comes give various statistics for the different from the decision to optimize locally [i.e., subsystem configurations. There is little to reduce the total time for an individual to choose among them: the largest and transfer (especially a long transfer such as smallest response times differ by only six a rollout access)]. This makes CDC systems percent. However, the disk subsystem fundamentally different from IBM systems response time for interactive rollout file (even though the disk spindles are accesses is over five seconds for electrically identical). For CDC systems, configuration D. This implies that this the possibility of extremely long transfers configuration is limited by the capacity of will lead to a very large variance in the the rollout channel. (Configurations E, F, service time for data transfers, which will G, and H have two rollout spindles; interfere severely with the ability of CDC configuration D has only one.) To disk subsystems to provide good service, compensate for this, the model was run using because the waiting time for the controller a rollout transfer time that was is very strongly influenced by the variance approximately 40% smaller. (This of the service time for the disk spindles. corresponds to the use of a full-tracking disk with a controller capable of buffering 9.3 Study Results more than one physical record at a time.) The last column of Table 8-1 gives the In section 6.3 we have shown that, result for this run. The interactive except for extremely heavy loads, the response time and the batch throughput are measured system appears to operate more


efficiently if the rollout activity is separated from the regular activity. We note that the load represented by trace tapes 74 and 75 is unusually high (it was created by reducing the number of disk spindles from 20 to 12 for one day without informing the users), and as such would not normally occur in practice.

In section 7 we reinforce our conclusion, as the simulation model results project that operating conditions closer to those in CDC's NOS show a clear preference for separation.

In section 8 we note that the gains to be made can also be invalidated if the rollout path is not allocated sufficient transfer capacity. We therefore feel that it is necessary to separate the rollout activity from regular I/O, in order to maximize the performance of the disk subsystem, except when the whole disk subsystem is sufficiently overloaded that increased channel availability can overcome the effects of the variance introduced.

9.4 Future Work

A considerable amount of data is available in the trace tapes which has not been used in the present study. Since the original study was done, a simulation model has been developed which was used to evaluate the utility of seek/read-write overlap in this system. The results are reported in [1]. Further refinement of the rollin/rollout model elements is planned. One solution to the long holding times is to apply preemption to the disk transfers, i.e., to break them up into multiple transfers with release of the channel periodically [12]. This will lower the variance of the disk spindle service time, but increase the elapsed time required to complete long operations, as interfering disk arm movement may take place. A study is currently planned to determine whether the benefits which can be obtained outweigh the disadvantages accruing from the preemption.

In addition, the continuous holding of resources on the path between the central processor and the disk spindle is a major factor contributing to the results presented in this paper. Another study will examine the performance improvement which results when the disk subsystem requests are centrally scheduled, i.e., when the required resources (PP, channel, controller, spindle) are not allocated until all are available.

The authors would like to acknowledge the assistance of the staff of the Computation Center at the University of Texas at Austin, especially N. Person, W. Jones, Dr. T. Keller, G. Smith, and Dr. C. Warlick, and the encouragement of Dr. J.C. Browne, who invited the first author to visit the University of Texas during 1979-80. This study was supported in part by the Natural Sciences and Engineering Research Council of Canada, grant number A-8634, and in part by the Computation Center and the Department of Computer Sciences, University of Texas at Austin.

References

[1] Atwood, J.W., Yu, K.-C., and McLeod, A., An Empirical Study of a CDC 844-41 Disk Subsystem, Performance Evaluation (accepted for publication).
[2] Howard, J.H., A Large Scale Dual Operating System, in Proceedings ACM Annual Conference 1973, pp. 242-248.
[3] NOS Version 1 Reference Manual, Control Data Corporation, St. Paul, Minn., 1978. Publication numbers: 60435400 (Volume 1), 60445300 (Volume 2).
[4] Howard, J.H., and Wedel, W.M., The UT-2D operating system event recorder, Technical Report TSN-37, University of Texas at Austin Computation Center, February 1974.
[5] Howard, J.H., and Wedel, W.M., EVENTD - UT-2D event tape summary/dump, Technical Report CCSN-38 (Revised), University of Texas at Austin Computation Center, August 1977.
[6] McGehearty, P.F., QSIM, an implementation of a language for the analysis of queueing network models, M.A. Thesis, University of Texas at Austin, Dept. of Computer Sciences, 1974. See also Foster, D.V., McGehearty, P.F., Sauer, C.H., and Waggoner, C.N., A Language for Analysis of Queueing Models, Fifth Annual Pittsburgh Modeling and Simulation Conference, April 1974.
[7] Sauer, C.H., and MacNair, E.A., Simultaneous Resource Possession in Queueing Models of Computers, Performance Evaluation Review, vol. 7, no. 1, 1978, pp. 41-52.
[8] Yu, K.-C., The effect of system configuration and file placement on the performance of a CDC 844-41 disk subsystem, M.A. Thesis, University of Texas at Austin, Department of Computer Sciences, 1980.
[9] Lo Cicero, C., An event-trace study of the performance of the I/O subsystem for a CDC Cyber 172 computer, M.Comp.Sci. Thesis, Concordia University, Dept. of Computer Science, 1980.
[10] Bard, Y., A Model of Shared DASD and Multipathing, Comm. ACM, Vol. 23, No. 10, October 1980, pp. 564-572.
[11] Zahorjan, J., Hume, J.N.P., and Sevcik, K.C., A Queueing Model of a Rotational Position Sensing Disk System, INFOR, Vol. 16, No. 3, October 1978, pp. 199-216.
[12] Browne, J.C., The Interaction of Operating Systems and Software Engineering, Proceedings of the IEEE, Vol. 68, No. 9, September 1980, pp. 1045-1049.

Performance Prediction and Optimization - Methods and Experiences


SESSION OVERVIEW PERFORMANCE PREDICTION & OPTIMIZATION - METHODS & EXPERIENCES

Thomas P. Giammo

Federal Computer Performance Evaluation & Simulation Center

The two papers of this session, entitled "Tuning The Facilities Complex of UNIVAC 1100 OS" and "Approximate Evaluation of a Bubble Memory in a Transaction Driven System," are about as far apart in their basic nature as can be imagined while remaining within the context of the subject of this session. One is a highly pragmatic "how to" approach to obtaining specific operating system parameters in a particular operating system, while the other is a highly abstract approach to predicting the general characteristics of a model of a computer system. In one sense, this diversity in content is a measure of the broad scope of our field. Each paper represents an important facet of the work that we need to perform in order to achieve results that are practical and capable of implementation as well as soundly based from a theoretical viewpoint. I believe that each of the papers will be found to be very helpful to the audience which it addresses.

In another sense, however, the diversity in content is indicative of a continuing problem in our field — the apparent difficulty of the theoretical and practical elements of the performance evaluation community in finding common ground.



TUNING THE FACILITIES COMPLEX OF UNIVAC 1100 OS

John C. Kelly

Datametrics Systems Corporation Burke, VA 22015

Jerome W. Blaylock

System Development Corporation

Slidell , LA 70458

Performance of UNIVAC 1100 computer systems is largely determined by the way the operating system, 1100 OS, manages the hardware resources. Since workload and performance requirements are unique to a particular site, UNIVAC supplies an extensive set of performance parameters for tailoring the system. One set of parameters, those associated with the Facilities Complex, is examined in detail. The Facilities Complex is responsible for managing mass storage (disks and drums). The parameters associated with the Facilities Complex are defined, performance hypotheses are stated, and sources of data for testing the hypotheses are discussed. This paper extends and updates work reported by Kelly and Route [4].

1. Introduction

The 1100 OS, commonly referred to as the EXEC, is divided into six major components: Symbiont Complex, Coarse Scheduler, Activity Control, Dynamic Allocator, Facilities Complex, and I/O Complex. The Symbiont Complex buffers data between I/O devices and main memory. The Coarse Scheduler selects which runs should be opened next. Activity Control manages CPU dispatching and allocates CPU quanta. The Dynamic Allocator manages main memory. The Facilities Complex manages peripheral storage devices and allocates file space. The I/O Complex controls all I/O operations performed on the system.

The Facilities Complex consists of a set of non-resident EXEC segments that must be loaded when their services are required. Services performed by the Facilities Complex include such things as file assignment, space allocation, and file cataloguing. A list of the frequently used segments is presented in Table 1. To keep track of what devices are on the system and how much space is allocated to which files, the Facilities Complex makes use of an extensive set of control tables. They are summarized in Table 2. The relationship among the DLT, DAD, search item, and lead item is presented in Figure 1.

Upon receiving a new request for mass storage, the Facilities Complex first checks the EST to determine if sufficient space exists on the device type requested. If sufficient space is not available on the requested device type, it selects another device type according to the selection rules defined at system generation time. The Facilities Complex then accesses the LDUST and MBT to allocate the mass storage requested.

The Facilities Complex processes facilities requests in three passes. The first pass checks for the availability of the facilities being requested. If the requested


Table 1. Main Segments Associated with Facilities Complex

Segment Name   Description
FIMAIN         The entry point for the Facilities Complex. All requests initially go to FIMAIN, which activates subordinate segments to do the work.
FIASG          Process @ASG and @CAT requests.
FIFREE         Process @FREE requests.
FIUKQ          Process @MODE and @QUAL requests.
FALL           Allocate mass storage in track (TRK) or position (POS) granularity.
FASEC          Allocate mass storage sectors to the EXEC.
FIBRKI         Break a facility request into a standard format for processing.
FICKAC         Check availability of communications lines.
FICKAF         Check availability of mass storage space.
FICKAT         Check availability of tape drives.
FICRIT         Create the PCT item for a file being assigned.
FIHOU          Place a run in a facilities hold state (put on HOLDQ) due to unavailability of facilities.
FREL           Release mass storage.
FRSEC          Release mass storage sectors assigned to the EXEC.
ROLBAK         Initialize a canned run to reload a file.
ROLOUT         Initialize a canned run to unload a set of files.
DRC            Manage the master file directory.

Table 2. Control Tables Used by Facilities Complex

Control Table  Description
MCT            Master Configuration Table. Describes the attributes of all devices configured on the system.
MFD            Master File Directory. Describes the attributes of all catalogued files in the system.
MBT            Master Bit Table. Defines how space is allocated on each mass storage device.
DAD            Device Area Descriptor. Describes each contiguous block of mass storage allocated to a file.
DLT            Directory Lookup Table. Part of the MFD that contains a pointer for each catalogued file to either a search item or a lead item.
EST            Equipment Summary Table (also called EQPSUM). Defines the number of positions and tracks available on all mass storage devices of the same type; there is one entry for each device type, such as 8450's.
LDUST          Logical Device Unit Status Table. Defines the status of each mass storage device configured on the system.

Figure 1. Master File Directory. (Diagram: HASH(QUALIFIER*FILE NAME) indexes the Directory Lookup Table, whose entries point to search items or lead items; lead items point, by F-cycle, to main items; main items point to DAD tables, which are chained by next/previous links and carry first-word and last-word file relative addresses.)


facilities are not available, the run is placed on the facilities hold queue (HOLDQ) until they become available. When it is determined that all system resources requested by the run are available, pass one is completed. The second pass builds the required control tables and creates the MFD entries. The third pass performs the actual allocations. Only mass storage requests with initial reserves require allocation; if no initial reserves are specified, the allocation is handled dynamically by the I/O Complex.

The Facilities Complex has been studied in depth by [1,3,6] and as part of larger studies by [2,4,5].

2. Virtual Mass Storage

Mass storage (peripheral storage on drums and disks) is managed as a virtual resource on UNIVAC 1100 systems. That is, from a user's viewpoint, the system contains an infinite amount of mass storage space. When more space is allocated than is physically available, the system creates space by rolling out infrequently used files to tape. When the ratio of allocated space to configured space becomes large, there is a great potential for the system to begin thrashing by moving files back and forth between real mass storage (disks) and virtual mass storage (tapes). The most severe problem occurs when a terminal user requests a catalogued file which has previously been rolled out to tape. The user must wait for the appropriate tape to be mounted and the file rolled back to disk -- a process which in extreme cases may require up to two hours [3,5,6].

The amount of mass storage available is monitored by the element FALL. When the mass storage available drops below a specified limit, FALL calls ROLOUT to initiate the rollout process. ROLOUT starts a critical deadline batch run which calls the SECURE processor to perform the actual rollout. The SECURE processor is a utility program supplied by UNIVAC to maintain backup copies of all catalogued files. When started by ROLOUT, SECURE will unload enough files to bring mass storage availability up to a predefined level.

2.1 Mass Storage Thresholds

A set of user-definable thresholds determines when the system should consider mass storage to be approaching saturation. When the available mass storage drops below threshold MSW1, mass storage is said to be "tight." This condition is detected by FALL, which in turn calls ROLOUT to initiate the rollout operation. The criticality of mass storage is tracked by four other thresholds, MSW2 through MSW5. These thresholds trigger different throttling mechanisms to prevent available mass storage from being totally exhausted before rollout can be completed. The thresholds are defined in Table 3.

Table 3. Mass Storage Thresholds

Threshold  Definition        Default Value(2)  Action
MSW1       MSW2 + MSW4       3120              If mass storage available is less than MSW1, FALL calls ROLOUT. Also the minimum number of tracks to be unloaded.
MSW2       MSW3 + MSW4       2340              Print warning message on console.
MSW3       0150 x MAXOPN(1)  1560              Stop run scheduling with "CS H."
MSW4       MSW3 / 2          780               Print warning message on console.
MSW5       0200              128               Stop all requests except those by the EXEC.

(1) MAXOPN is the maximum number of batch jobs allowed open at the same time.
(2) Default values are based on a MAXOPN value of 15.

Arguments can be made for setting MSW1 high or low. A low setting minimizes the average amount of unused (available) mass storage, while running the risk of reaching the critical thresholds where throughput will be curtailed. High settings reduce the risk of reaching the critical thresholds, but increase the amount of mass storage that remains unused. The proper setting really depends on the rate at which mass storage is consumed compared to the rate at which it can be made available with a rollout. The settings are also dependent upon the number of files with current backups when a rollout takes place.

2.2 Number of Tracks to Unload

The element ROLOUT calculates the number of tracks to unload from the formula:

    NTRKS = TOTRKS / PERCNT - AVAIL                                (1)

where

    TOTRKS = total number of mass storage tracks configured on the system
    PERCNT = fraction of configured tracks to be unloaded (current default is 10)
    AVAIL  = number of tracks currently available.
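The threshold arithmetic above is compact enough to capture in a short sketch. The following Python fragment is ours, not the EXEC's code; the constants come from Table 3 and the text, and reading MSW3's definition as 0150 (octal) per open batch job is an assumption that simply reproduces the stated default of 1560 at MAXOPN = 15.

# Illustrative sketch of the mass-storage threshold logic of sections 2.1-2.2;
# not the actual EXEC code.

MAXOPN = 15                    # batch jobs allowed open at once (default assumed)
MSW3 = 0o150 * MAXOPN          # assumption: 0150 octal per open job -> 1560 tracks
MSW4 = MSW3 // 2               # 780: print warning on console
MSW2 = MSW3 + MSW4             # 2340: print warning on console
MSW1 = MSW2 + MSW4             # 3120: FALL calls ROLOUT below this point
MSW5 = 0o200                   # 128: stop all requests except those by the EXEC

TFAST = 0o30000                # 12,288 decimal: upper bound on tracks unloaded

def fall_should_roll_out(avail):
    """FALL triggers a rollout when available mass storage drops below MSW1."""
    return avail < MSW1

def tracks_to_unload(totrks, avail, percnt=10):
    """Equation (1) plus constraint (2): how many tracks ROLOUT asks SECURE
    to unload, given configured and currently available tracks."""
    ntrks = totrks // percnt - avail          # equation (1)
    return max(MSW1, min(ntrks, TFAST))       # constraint (2)

# Example: a 100,000-track configuration that has drifted down to 2,000
# available tracks triggers a rollout of 8,000 tracks.
if fall_should_roll_out(2000):
    print(tracks_to_unload(totrks=100_000, avail=2000))   # -> 8000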


The number of tracks to be unloaded during a rollout varies depending upon the number of tracks available at the time the rollout is initiated. The objective is to maintain the number of available tracks at approximately PERCNT percent of the total tracks available.

The number of tracks to be unloaded is further constrained by the relation

    MSW1 <= NTRKS <= TFAST                                         (2)

That is, at least MSW1 tracks will always be unloaded, but never more than TFAST. TFAST is currently coded at 30,000 octal or 12,288 decimal and is historically based on the number of tracks on one FASTRAND. The number of tracks to be unloaded has important performance implications since it affects how frequently rollouts will have to be performed.

2.3 Selecting Which Files to Unload

The Unload Eligibility Factor (UEF) determines which files to unload during a rollout operation. If the UEF is improperly set, the wrong files may be rolled out, resulting in an unnecessarily high number of rollbacks. Ideally, only infrequently used files should be rolled out. When a rollout is initiated, the SECURE processor is started via a canned run stream. SECURE computes the UEF for each catalogued file and selects those files with the highest UEF for unloading.

The UEF is computed from the formula:

    UEF = (CRNCY*F(TSLR) + FREQCY*F(MTBR) + SIZEWT*G(SIZE)) / 2^UEFWT
          + EQBIAS + PVBIAS + FCBIAS + UEFBIAS                     (3)

where

    CRNCY   = currency weighting factor (default = 40)
    TSLR    = time since last reference in hours
    FREQCY  = frequency weighting factor (default = 20)
    MTBR    = mean time between references in hours
    SIZEWT  = size weighting factor (default = 10)
    SIZE    = size in tracks
    UEFWT   = shift factor to reduce UEF magnitude (default = 5)
    F(X)    = function for computing point values for TSLR and MTBR
    G(X)    = function for computing point values for SIZE
    EQBIAS  = equipment bias (default = 2)
    PVBIAS  = private file bias (default = 1)
    FCBIAS  = F-cycle bias (default = 1)
    UEFBIAS = user-defined bias

The functions F(X) and G(X) use Tables 4 and 5 respectively to compute point values that lie in the range 0-32.

Table 4. Point Values for TSLR and MTBR

POINTS  HOURS   DAYS      POINTS  HOURS   DAYS
  1        1     0.04       17       98    4.08
  2        2     0.08       18      116    4.83
  3        3     0.13       19      134    5.58
  4        4     0.17       20      152    6.33
  5        7     0.29       21      194    8.08
  6       10     0.42       22      236    9.83
  7       13     0.54       23      272   11.33
  8       16     0.67       24      320   13.33
  9       22     0.92       25      530   22.08
 10       28     1.17       26      740   30.83
 11       34     1.42       27      950   39.58
 12       40     1.67       28     1160   48.33
 13       50     2.08       29     1370   57.08
 14       60     2.50       30     1580   65.83
 15       70     2.92       31     1790   74.58
 16       80     3.33       32     2000   83.33

Table 5. Point Values for SIZE

POINTS  TRACKS      POINTS  TRACKS
  1         8         17       152
  2        16         18       176
  3        24         19       200
  4        32         20       224
  5        40         21       248
  6        48         22       272
  7        56         23       296
  8        64         24       320
  9        72         25       376
 10        80         26       432
 11        88         27       488
 12        96         28       544
 13       104         29       664
 14       112         30       784
 15       120         31       904
 16       128         32      1024

Four types of files have predefined UEF's. They are:

    UEF = 0, tape files or files already unloaded
    UEF = 1, removable disk files
    UEF = 2, unload-inhibit files (G and V options)
    UEF = 3, empty files (0 tracks)

The computation of the UEF is illustrated by the following example.
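As a companion to the numeric example that follows, here is a small illustrative sketch (Python; the names and the table-lookup rule are ours, and only the Table 4 and 5 rows needed by the example are included) of how equation (3) combines the weighting factors, the F and G point functions, and the bias terms.

# Illustrative computation of the Unload Eligibility Factor, equation (3).
# Not the SECURE processor's code.

CRNCY, FREQCY, SIZEWT, UEFWT = 40, 20, 10, 5      # default weighting factors

# Abbreviated excerpts of Tables 4 and 5 as (upper threshold, points) pairs;
# only the rows needed by the worked example below are shown.
HOURS_POINTS = [(50, 13), (60, 14), (80, 16), (98, 17)]   # Table 4 (TSLR, MTBR)
TRACK_POINTS = [(176, 18), (200, 19)]                     # Table 5 (SIZE)

def points(table, value):
    # Assumed interpretation: take the first table row whose threshold
    # is greater than or equal to the observed value.
    for threshold, pts in table:
        if value <= threshold:
            return pts
    return table[-1][1]

def uef(tslr_hours, mtbr_hours, size_tracks,
        eqbias=2, pvbias=1, fcbias=1, uefbias=0):
    """Unload Eligibility Factor per equation (3)."""
    weighted = (CRNCY * points(HOURS_POINTS, tslr_hours) +
                FREQCY * points(HOURS_POINTS, mtbr_hours) +
                SIZEWT * points(TRACK_POINTS, size_tracks))
    return weighted / 2 ** UEFWT + eqbias + pvbias + fcbias + uefbias

# The paper's example: TSLR = 4 days, MTBR = 2.3 days, 200 tracks, on drum,
# private file, current F-cycle, no user-defined bias.
print(round(uef(4 * 24, 2.3 * 24, 200), 1))     # -> 39.9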


    Time since last reference      =  4 days
    Time since last catalogued     =  104 days
    Number of times assigned       =  46
    Mean time between references   =  104/46 = 2.3 days
    Size of file                   =  200 tracks
    Stored on drum (EQBIAS)        =  2
    Private file (PVBIAS)          =  1
    Current F-cycle (FCBIAS)       =  1
    UEFBIAS                        =  0

    UEF = (40*17 + 20*14 + 10*19) / 32 + 2 + 1 + 1 + 0 = 39.9

The information required to compute the UEF is maintained in the MFD.

3. Master File Directory (MFD)

The Master File Directory (MFD) is illustrated in Figure 1. The MFD describes each catalogued file in the system. Entries into the MFD are made by computing a hash code index from the file name:

    HASH(QUALIFIER*FILENAME) = i                                   (4)

The hash code is used as an index into the Directory Lookup Table (DLT). The DLT contains a pointer to a lead item, which contains pointers, for each F-cycle of the file, to main items. The main item contains the descriptive information for the file. If two file names generate the same hash code index, a DLT conflict occurs. The conflict is resolved by replacing the lead item pointer with a pointer to a search item. The search item then contains the pointers to the lead items for the files with the duplicate hash codes.

3.1 Duplicate Hash Codes

Duplicate hash codes in the DLT result from (1) a DLT too small for the number of catalogued files, or (2) an inefficient hashing algorithm. The size of the DLT is specified by the system generation parameter DCLUTS. Besides defining the size of the DLT, DCLUTS is used in the computation of the hash code as follows:

    HASH(QUALIFIER*FILENAME) MOD (DCLUTS) = i                      (5)

Analysis by UNIVAC and user sites [5] has shown that DCLUTS should be at least twice as large as the number of catalogued files. Since DCLUTS is also used in the computation of the hash code, it should also be a prime number. The use of a prime number improves the likelihood of generating hash codes that are uniformly distributed over the range of the DLT.

The actual hashing algorithm has been found to be effective when the above guidelines are followed. Analysis has shown that the algorithm is relatively insensitive to the use of similar file names. That is, similar file names do not usually hash into the same index. This is contrary to what was previously reported in [4].

Persing [5] analyzed the hashing algorithm using 7,357 file names. His findings are summarized in Table 6. Unfortunately, he did not give the actual values used for DCLUTS. His data are rounded to the nearest thousand. Two items from his study stand out. First, low values for DCLUTS result in nearly every file requiring a search item. Second, even when the size of DCLUTS is large, 40% of the files still required a search item. Even for a perfect hashing algorithm, there will always be some chance of file names hashing into the same index. Duplicate entries in the DLT are undesirable for two reasons. First, an extra I/O is required to read the search item. Second, mass storage space is wasted storing search items. On release Level 36 of the EXEC, DCLUTS is set to a default value of 12,047.

Table 6. Results of Hashing 7,357 File Names

DCLUTS    RATIO TO NO. FILES    FILES REQ'ING SEARCH ITEMS    % OF TOT FILES
 2,049           0.3                     7,128                     97%
17,000*          2.3                     2,926                     40%

* Approximate value rounded to nearest 1000.

3.2 MFD Lock Cells

Access to the MFD is controlled by a set of lock words. Multiple lock words are used so that two users can access different parts of the MFD at the same time. The number of lock words, and equivalently the number of parts of the MFD that can be accessed simultaneously, is computed from the system

generation parameter DLOKSF. The number of lock words is equal to

    2^DLOKSF

where DLOKSF is a positive integer between zero and four. As it turns out, the EXEC does not take full advantage of the multiple lock words. The dominant user of the MFD is the Facilities Complex, and the first thing that FIMAIN does is to set all MFD lock words. Since the Facilities Complex is designed to service only one request at a time, DLOKSF may as well be set to zero. When the Facilities Complex is rewritten to process multiple requests at the same time, it will be advantageous to increase DLOKSF to three or four.

4. File Granularity

Mass storage is allocated in either track or position granularity. A track is 64 sectors (1 sector = 28 words); a position is 64 tracks. The number and location of the granules assigned to a file are defined in the Device Area Descriptor (DAD) table. The DAD table defines every contiguous block of mass storage assigned to a file. The DAD table also maps a file relative address to a device relative address.

4.1 Mass Storage Fragmentation

Mass storage can become fragmented much the same as memory. When mass storage space is allocated to a file, the EXEC attempts to assign a contiguous block of space; however, if none is available, it will assign discontiguous areas to the file. Since each separate area of mass storage is defined by a separate DAD table, extra overhead is required to access that file. The overhead shows up as extra mass storage space required to store the DAD tables, extra memory space to hold the tables while the file is active, and extra CPU time to search the DAD tables.

A common cause of too many DAD tables for a single file is requesting no initial reserve for the file. When no initial reserve is requested, the system will allocate space in four-track increments when it is needed. This results in many discontiguous areas and multiple DAD tables. Some sites have minimized the likelihood of fragmentation by changing incremental allocations to sixteen tracks. Fragmentation can also be reduced by requesting position granularity rather than track granularity. Care must be taken, however, since position granularity can result in wasted mass storage space. It can also result in rollouts if no full positions are available.

4.2 DAD Table Storage in EXPOOL

EXPOOL is the EXEC's main memory storage pool. It is an area of main memory reserved for the exclusive use of the EXEC to hold such items as DAD tables. The parameter NGTB specifies the maximum number of DAD tables that can be in EXPOOL per catalogued file when EXPOOL is "tight." When EXPOOL reaches a critical threshold of utilization, access to EXPOOL is limited, and one of the things limited is DAD tables. By keeping the number of DAD tables per file below NGTB (default value = 5), that is, by minimizing mass storage fragmentation, this will not be a problem. When the value goes above five and EXPOOL is tight, extra overhead will result from moving DAD tables back and forth between main memory and mass storage.

4.3 Mass Storage Sectors

Each DAD table is one sector (28 words) in length. The parameter MAXSEC specifies the maximum number of DAD tables that the EXEC can read into memory at one time. If the number of DAD tables per file is low, MAXSEC may be low. A guideline in [4] suggests setting MAXSEC to the 80th percentile of the number of DAD tables per file. The default value is 124.

4.4 Maximum File Size

An option of the file assignment statements allows the user to specify the maximum file length in granules. When this value is unspecified, the parameter MAXGRN is used as a default value. If MAXGRN is set too high, mass storage may accidentally be wasted by undebugged or inefficient programs. The current default value for MAXGRN is 128. As a guideline, Kelly [4] suggests that MAXGRN be set equal to the 80th percentile of all file sizes.

5. Performance Problems and Hypotheses

Potential performance problems associated with the Facilities Complex are summarized below. They are denoted as P1, P2, etc. for ease of reference. Associated with each potential problem is a set of performance hypotheses, denoted H1, H2, etc., that might be the cause of the problem. Each hypothesis also has a brief statement of how to test it.
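Before turning to the individual problems, the directory hashing of section 3.1 and its sizing guidance can be summarized in a short illustrative sketch (Python). The hash function below is a stand-in, since the EXEC's actual algorithm is not given here; the check routine simply encodes the two guidelines quoted above.

# Illustrative sketch of the MFD lookup of section 3.1: the qualified file
# name is hashed modulo DCLUTS to index the Directory Lookup Table (DLT).
# The hash function is a placeholder, not the EXEC's algorithm.

def dlt_index(qualifier, filename, dcluts):
    """Equation (5): i = HASH(QUALIFIER*FILENAME) mod DCLUTS."""
    name = f"{qualifier}*{filename}"
    h = 0
    for ch in name:                      # stand-in hash over the name's characters
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h % dcluts

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def check_dcluts(dcluts, catalogued_files):
    """Apply the sizing guidance of section 3.1."""
    warnings = []
    if dcluts < 2 * catalogued_files:    # should be at least twice the file count
        warnings.append("DCLUTS is less than twice the number of catalogued files")
    if not is_prime(dcluts):             # should also be a prime number
        warnings.append("DCLUTS is not prime")
    return warnings

# With the Level 36 default of 12,047 and Persing's 7,357 file names,
# both checks fire.
print(check_dcluts(12047, 7357))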


PI . Too many rollouts are occurring . the number of files rolled back per run. If more than one file is rolled

HI . Not enough tracks are being unloaded . back at a time, a queue must have

Monitor the frequency that rollouts are ex i sted occurring. Verify that MSWl is proper- H3. Long time to mount SECURE tape . Check ly set. Monitor the frequency of the operator console log or observe opera- threshold messages on the console. tors in action. H2. Users are allowed - to allocate mass stor H'*. Long tape search time . Observe roll- age files with no concern for the physi - back process in operation. Little can

cal resources available . Monitor the be done about this except limiting the number of files assigned to each user, number of files on a tape. their space requirements, and their fre- quency of use. Archive all files older than a specified number of days. Ph. The time to complete a rollout is too

H3. File requests arrive in a burst pattern . long . Prepare a histogram of the interarrival

time of rollouts to determine if time HI . SECURE is doing physical rather than

dependent conditions are causing a prob- 1 og i ca 1 ro 1 1 outs . When files have cur-

1 em. rent copies on tape, they can be rolled H'*. Several very large files dominate mass out by simply making the space avail-

storage . Sort files by size. Examine able. That constitutes a logical roll- the usage pattern of large files. Con- out. Physical rollouts occur only when sider removeable paclcs as an alterna- no current backup exists. Physical tive. rollouts should be avoided. Monitor

H5. Insufficient mass storage is configured . rollouts to determine if physical roll- Quantify the frequency of rollbacks and outs are happening. Evaluate file save the time to complete a rollback. Com- procedures to determine the need for pare to other sites. Compare to inter- backing up files more frequently. nal standards and past performance. H2. Too many files are being unloaded at

Track the growth of the overcommitment once . Evaluate size weight in the UEF. ratio -- the ratio of mass storage Evaluate the mass storage thresholds. space assigned to the mass storage Monitor mass storage available.

space configured. H3. Heavy workload on the system . See HI

H6. Too many files have unload inhibited . under P3. Examine all G and V option files to de- termine if they are necessary. Track

the growth of G and V option files as a P5 . Mass storage availability remains high

percent of total files. even though many rollouts are occurring .

HI . Too many tracks are being unloaded . See

P2. Too many rollbacks are occurring . HI under PI

H2. The thresholds are set too high . Moni-

HI . The wrong files are being rolled out . tor mass storage available over time. Check the UEF's of the files being roll- Monitor frequency of threshold messages ed out. Verify that the weighting fac- on operator's console. Determine rate tors and points are set properly for at which mass storage is being used and your site. Establish guidelines to aid adjust thresholds accordingly. in setting weighting factors. Check all hypotheses for PI. P6. System throughput is being held back by

rollouts and rollbacks . P3. The time to complete a rollback is too

1 ong . HI . Rollouts and rollbacks are happening

too often . All of the analyses sug-

HI. Heavy workload on the system . Since gusted for PI through P5 will be help- rollbacks run as critical deadline runs, ful in this area. Monitor the percent this is unlikely; however, to verify the of runs that require rollbacks as well hypothesis, correlate the elapsed time as the percent of time spent waiting of rollbacks with runs open and other for a rollback to complete. workload measures. H2. Excessive system resources are being

H2. Many rollbacks queued up . Monitor the used to manage mass storage . Compute number of rollback requests waiting the percent of system resources used by when a rollback run completes. Monitor SECURE and other mass storage utilities.


Compare to resources used by user SECURE maintains a summary file from programs. which several reports are available. These reports are useful in evaluating the vari- P7. A large amount of I/O is directed at ables used to compute UEF's. The SECURE

the MFD . Summary Report lists all the files unloaded, the time they were unloaded, the track size,

HI . DCLUTS is set too low for the number of and the UEF. The SECURE ROLOUT Report spec-

catalogued f i les . Compute the ratio of ifies the individual point values used to DCLUTS to the number of catalogued compute the UEF for rolled out files. Un- files. The ratio should be 2.0 or fortunately, the UEF's for the rolled back greater. Compute the percent of files files are unavailable. Since the time of requiring search items. Monitor per- the original roll out is not recorded, it is centage of search items on a regular impossible to recalculate the UEF when the

bas i s rollback takes place. This information is

H2. DCLUTS is not a prime number . DCLUTS available by searching the SECURE Summary should be a prime number at least twice Report for the day when the file was rolled as large as the number of catalogued out

f i les. 7. Conclusion

P8. Mass storage space is being inefficient - Mass storage management, as performed

ly ut i 1 i zed . by the Facilities Complex, has a significant impact on system performance. The article - HI . A large number of DAD tables are requir has emphasized tuning parameters; however,

ed per file . Prepare a histogram of the the performance analyst must not overlook the number of DAD tables per file. Compute benefits to be gained by instituting a set average number of DAD tables per file. of mass storage management guidelines. Many Check NGTB and MAXSEC for proper setting. of the problems resulting from the virtual H2. Disk areas are being assigned but not mass storage concept can be controlled or

used . Prepare a histogram of time since eliminated by exercising more stringent con- last reference for all files on the sys- trols on how the users are allowed to assign tem. Pay special attention to all large mass storage. The approach used in this

f i les. article has also been used to analyze other areas of the EXEC as described in [k] 6. Data Analysis References Data on mass storage usage are avail-

able from several sources: LA, LASSO, DIRVER, [1] Bryant, R. , A Performance Study of the and SECURE. The LA processor reads the Mas- Facilities Complex, Technical Papers:

ter Log File and produces both a ROLBAK and Spring USE Conference , March 5-9, 1979, a ROLOUT summary report. The ROLBAK report pp. 29-38. lists the termination time of each ROLBAK run, as well as the number of files loaded, [2] Caldarale, C.R., State of Georgia Local the average number of seconds per file load- Code, Technical Papers: Fall USE Con -

ed, the average number of tracks loaded per ference , September 10-lA, 1979, pp- second, the number of tapes accessed, and 255-262. the average number of files per tape. An

identical report exists for ROLOUT's. [3] Gray, G. , Mass Storage Files Management,

Technical Papers: Spring USE Conference , LASSO also summarizes data from the March 5-9, 1979, PP. 381-398. Master Log File; however, it does not pro-

vide a special report for rollouts/rollbacks. [4] Kelly, J.C. and Route, G.P., UN I VAC 1108

The information must be extracted from other EXEC Level 32R2 Performance Handbook , reports under the ROLOUT/ROLBAK run ids. It NBS Special Publication 500-3^, National is not nearly as comprehensive as LA. Bureau of Standards, June 1978.

DIRVER analyzes the MFD and produces a [5] Persing, T.D., Computer Performance

series of reports describing each catalogued Evaluation at USAF/AFSC/FTD , Technical

file on the system. It provides excellent Papers: Spring USE Conference , April 1-16. statistics on DAD tables, search items, lead 17-21 , 1978, pp. items, and other attributes of the MFD. It also reports on the amount of mass storage [6] Richards, J. et al., A More Reliable space configured and available as well as File Maintenance System, Techn ical

many other mass storage usage statistics. Papers: Spring USE Conference, April 17-21, 1978, pp. 55-198.

APPROXIMATE EVALUATION OF A BUBBLE MEMORY IN A TRANSACTION DRIVEN SYSTEM

Wen-Te K. Lin Computer Corporation of America

and

Albert B. Tonik Sperry Univac

An approximate queueing model is built to evaluate the performance of a transaction-driven computer system with three levels of memory hierarchy, one of which is the bubble memory. The criterion of perfor- mance is the average transaction response time excluding the time spent waiting for transaction initiation because it does not measure the sys- tem performance, but the amount of transactions entering the system. The results are presented in a form which can be used by the system users to decide how much memory to include in the system configuration to achieve desired transaction throughput rate and average response time.

1. Introduction In this present evaluation, we are building a mathematical model for a transac- In a previous paper [1]\ we attempted tion driven system. In such a system, the to evaluate the effect of a virtual memory people at the terminals are entering trans- computer system with three levels of storage actions and waiting for replies. These hierarchy by using a bubble memory rather transactions are processed against a common than a head-per-track device (also referred database. We obtained the specification for to as a drum) . From that evaluation we a standard transaction from data collected derived a mathematical model based on data from Sperry Univac Transaction Interface Pro- collected from a virtual memory multiprogram- cess installations. ming system which was frequently used by stu- dents to write their own programs rather thar The mathematical model attempts to to run standard transactions. Therefore, evaluate the time to process a transaction, they shared programs such as compilers and while processing "n" transactions simultane- text editors. In such an application the ously. This multiprogramming leads to queues limiting device was the intermediate storage at the various devices. The time spent wait- level or paging device. Therefore, replacing ing in each queue has to be added to the the head-per-track device with a bubble response time for each transaction. In addi- memory increased system performance by almost tion, on some devices, as the queue gets the ratio of speed between those devices. longer, the throughput of the control units Such an application reflects neither the goes up, because of the ability to overlap sequential processing of large files, as in latency periods. All of these interactions commercial applications, nor the terminals are built into the mathematical model. all accessing a large database, as in tran- saction oriented systems (real-time). 2. The Standard Transaction

The transaction driven system we want to (1) Figures in brackets indicate the litera- model is characterized as one with trans- ture references at the end of this paper. actions entered from terminals. The trans-


actions are handled by standard programs and processed against a common database. The operating system occupies 100K words of core. The programs to process each transaction occupy about 5K words of core each. The processing of each transaction requires executing about 80K instructions. In addition, during the processing there are about 12 accesses to I/O devices. Of the 12 I/O commands, 8 would be for the secondary storage and 4 for the mass storage. The CPU processing and I/O operations for a single transaction are performed in a sequential manner (no overlap between I/O and computing). Of course there is overlapping of I/O operations and computing during the processing of many transactions simultaneously.

The I/O is characterized as follows: on the average, an I/O operation will transfer about 250 words. About half of the I/O operations are reads and the other half are writes. However, of the writes about 10% are straight writes. The other 90% are writes where only part of the block is written. This operation entails a read followed by a write on the next revolution of the device.

In summary,

    M0  = 100K words = resident software
    M   = the rest of the memory excluding M0
    m   = 5K words = program size for each transaction
    I   = 80K instructions = cpu processing of each transaction
    a1  = 8 = no. of secondary storage I/O per transaction
    a2  = 4 = no. of mass storage I/O per transaction
    a1 + a2 = 12 = no. of I/O per transaction
    d   = 250 words = 1K bytes = amount of data transferred per I/O
    50% = percentage of read I/O per transaction
    5%  = percentage of straight write I/O per transaction
    w   = 45% = percentage of partial write I/O per transaction

3. The Hardware Configuration

The transaction system is assumed to run with a three-level storage hierarchy. The second level memory is either drums or bubble memory (the example used will be the Univac 8405E); the third level memory is movable-arm disks (the example will be the Univac 8433E). We assumed the cpu is capable of at least 3 MIPS (million instructions per second). The revolution time of the drum is 16.7 ms. The revolution time and average seek time of the disks are 16.7 ms and 30 ms respectively. The channel data transfer rate is 900K bytes/sec. The latency time for the bubble memory is 0.8 ms, and the data transfer rate is 3M bytes/sec.

In summary,

    t1 = 16.7 or 0.8 ms = revolution or latency time for drum or bubble memory
    t2 = 1000/(9 x 10^5 or 3 x 10^6) = channel time per drum or bubble I/O
    t3 = 30 ms + 8.35 ms = average seek plus search time of the disk
    t4 = 1000/(9 x 10^5) = channel time per disk I/O
    t5 = 80K/3 MIPS = cpu time per transaction

4. Mathematical Model

We can model the system as shown in Figure 1.

    Figure 1. High-Level System Structure

Transactions come into the system from terminals and queue up at Q. The processor S then processes them. Let us assume transactions are generated by a Poisson process with an average rate of λ transactions per second. We also assume that system S can process up to N transactions simultaneously at the average rate of μ(n) transactions per second, where μ(n) depends on the number n of total transactions in the system (being processed or in the queue). Therefore we have a birth and death process such that the birth rate λ is independent of the population size n, while the death rate is μ(n) for n <= N. Therefore we can calculate the stationary probability distribution Z_j that j transactions are in the system (including the ones being served) [2]. Let R represent the response time of a transaction, which is measured from the time the transaction enters queue Q until the transaction exits from S; then


oo

R - I /Z . k -M k=lk=iV ^ ; p N-1 k = I Z k k (1) k=l k = N

=2 In order to calculate R, we have to cal- us assume: culate and A j^. Let

= = ' Q. = Queue length at station S. tTq 1, n. ii± (2) i u u . . . u (including the one being served) (per control unit at S. = Storage levels l^r. = Average service time at S. X. = Throughput of S. (no. of operations Then from birth and death process we per second) know P. = Probability of a transaction going to S. after being served by Sq U. = utilization of S. (fraction of time Z. = (3) it is busy) 1 C = No. of control units at S„ k=0

Figure 2. More Detailed System Structure That means to compute R, all we have to com- do is to compute jU ^ (k

system parameters r^, r^^ , r2. In order to solve the equations these five


parameters must be known. We have already In the following, equations (13) through (17) shown that will be combined with equations (4) through (12) to derive Q^^ and Q^. By eliminating Q-,

Uq, X , U_, r„ from equations (6), (10;, ~ (13) (7), (5), C9),^(12f, and ^1 ^j/^^i (17) we obtain

(14) (18) = t/(a^ + a^) (15)

i=i Here equations for r^^ and r^ will be J derived. r^^ is the average number of opera- tions system can process per second. It can be dependent on the queue length Qj^ if the queue is sorted according to sector posi- and Q = tions. However, the definition of standard transaction states that about w of I/O are partial-write operation (a read is required By eliminating Q , U , X , X , U, , r, , from before a write), this makes calculation of r^^ equations (6), (I0),"(7)V (4), it), (ll), and even more complicated. A formula for l/^j^ is (16) we obtain formulated as follows, which has been vali- dated by simulations (for detail of the derivation, see [4]).

[3(l+w)t2Q^+2(wt^+t2)+t^] (16; (19)

""l [2Q^(l+w)] (l+w)2Q^'=

[3(l+w)t2Q^=2{rt^+t2)+t^](l+Q^)rQP^-(l+w)2Qf where w = ratio of I/O request that are partial write = revolution or latency time for drum or bubble memory = channel time without the extra Solving equations (18) and (19) for and Q2 revolution to obtain closed form solution is very dif- ficult. we curves for Therefore, plotted Qj^ = f(Q2) and = in the plane Q- g(Qj^) Q, , Q2 r is the average service rate of system S2 for different values of n. Where the curves which consists of a number of movable-arm intersect is a solution for each value of n. disks and possibly more than one control unit. If there is no overlapping of seek From the solutions for and Q^^ Q2, we can operations among these disk drives, then obtain values for the other quantities. We is simply 1 divided by the average access can evaluate r^^ from by using equation 16. Q^^ time of the drive. But because of overlap- We can find the r2 from Q2 by use of equation ping. is dependent upon the queue length 17 Qq is obtained from equation 6. Q„ . A formula for r„ is derived as follows which has been validated by simulations (for We now want to obtain the response time detail derivation of the formula and its to process a transaction once it has entered simulation validation results, see [5]). the memory. This assumes that all devices are on one con- trol unit.

R(n) = (a^ + a2) RQ(n) + a^Rj(n) + a2R2(n) (26) + t. (IT) i=l

Where Rj^(n) is the response time for each where t^ = average seek plus search time request "to the ith facility when the mul- t^ = channel time tiprogramming level is n. The numbers a^^ and D = no. of devices per control unit a2 are obtained from standard transaction Q' = queue length per control unit definition as defined in Section 3. The = Q2/C R^(n)'s are given by Karlin, page 433 [2] as:


    R_i(n) = r_i / (1 - X_i r_i)                                   (21)

From equations (7) through (12),

    R_i(n) = (Q_i + 1) r_i

The μ_k of equations (1), (2), and (3) can then be calculated, and by using the μ_k's we can compute the response time from equation (1).

    Figure 3. Average Internal Response Time R(M) vs. Multiprogramming Level M (Configuration One: six disks served by one control unit). The internal response time does not include waiting at the outside queue; the abscissa is in units of the maximum number of transactions that can be processed simultaneously, which is related to the size of memory in millions of bits.

5. Results From Using Math Model

The results in Section 4 are applied to the transaction model and the system defined in Sections 2 and 3. Equation (20) is used to plot the internal response time for different multiprogramming levels (which are related to the main memory size). This internal response time is the time required for a transaction to loop through the mass storage. It does not include the waiting time for the main memory, and it is directly related to the throughput of the system (through Equation 22). Equation (1) is used to plot the average response time, including time spent waiting for main memory, for various values of transaction input rate λ. The distribution of the response time can be obtained by using Equation (3). Three system configurations will be used. The first configuration consists of the cpu, one drum, and six movable-arm disks controlled by a single control unit. The second configuration has two control units for the six disks.

    Figure 4. Average Response Time R vs. Multiprogramming Level M for Different Transaction Arrival Rate λ (Configuration One)

The third configuration replaces the drum by bubble memory and has two control units for the six disks. The response times for these three configurations are plotted in Figures 3 and 4, Figures 5 and 6, and Figures 7 and 8 respectively. We can see the response time improves substantially from configuration 1 to configuration 2. By checking the values for Q0, Q1, and Q2, we find out it is because the control unit for the disks is the bottleneck. Therefore, doubling the number of disk control units improves the response time. But replacing the drum by bubble memory improves the response time only a little, not as much as the ratio of access time between drum and bubble memory, as shown in Figures 7 and 8. But if we increase the size of bubble memory, therefore reducing the number of I/O to the disks, the response time improves substantially.
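A rough calculation, outside the queueing model itself, makes the same point. The sketch below (Python, ours, not from the paper) evaluates the device parameters t1 through t5 of Section 3 and sums the unqueued service demand of one standard transaction; it ignores queueing, overlap, and the extra revolution taken by partial writes, so it is only an order-of-magnitude illustration of why a roughly 20x faster second-level device buys well under a 2x improvement once disk time is included.

# Illustrative arithmetic only: sequential service demand of one standard
# transaction, using the parameter values stated in Sections 2 and 3.

MS = 1000.0                      # work in milliseconds

a1, a2 = 8, 4                    # secondary-storage and disk I/Os per transaction
bytes_per_io = 1000              # about 250 words, roughly 1K bytes

t5 = 80_000 / 3_000_000 * MS     # CPU: 80K instructions at 3 MIPS  -> ~26.7 ms
t3 = 30.0 + 8.35                 # disk average seek plus search    -> 38.35 ms
t4 = bytes_per_io / 900_000 * MS # disk channel time                -> ~1.1 ms

def demand(t1, channel_rate_bytes_per_s):
    """Per-transaction demand with a given second-level device."""
    t2 = bytes_per_io / channel_rate_bytes_per_s * MS
    return t5 + a1 * (t1 + t2) + a2 * (t3 + t4)

drum   = demand(t1=16.7, channel_rate_bytes_per_s=900_000)
bubble = demand(t1=0.8,  channel_rate_bytes_per_s=3_000_000)
print(round(drum, 1), round(bubble, 1))   # -> 327.0 193.6 (ms)

# The second-level device gets ~20x faster, but the unqueued path length
# shrinks by well under 2x, since disk time dominates -- consistent with
# the configuration-three results discussed above.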

Figure 5. Average Internal Response Time R(M) vs. Multiprogramming Level M (Configuration Two)

Figure 7. Average Internal Response Time R(M) vs. Multiprogramming Level M (Configuration Three)

Figure 6. Average Response Time R vs. Multiprogramming Level M for Different Transaction Arrival Rate λ (Configuration Two)

Figure 8. Average Response Time R vs. Multiprogramming Level M for Different Transaction Arrival Rate λ (Configuration Three)

6. Conclusion

A mathematical model for a transaction processing system has been built, which incorporates some elaborate algorithms for handling I/O at the drum and disk memories. This model is then used to evaluate the effect of the bubble memory in a transaction system. The model is helpful when a user needs to decide on a system configuration which can meet certain transaction rates with certain response times. It provides equations for plotting response time against both transaction rate and system configuration.

7. References

[1] Lin, W.T.K., and Tonik, A.B., Approximate Evaluation of the Effect of a Bubble Memory in a Virtual Memory System, Proceedings 13th Meeting of Computer Performance Evaluation Users Group, October 1977.

[2] Karlin, S., A First Course in Stochastic Processes, Academic Press, New York, 1969.

[3] Gecsei, J., and Lukes, J.A., A Model for the Evaluation of Storage Hierarchies, IBM Systems Journal, No. 2, 1974.

[4] Tonik, A.B., and Lin, W.T.K., Approximate Evaluation of the Effect of a Bubble Memory in a Transaction Driven System, Sperry UNIVAC Research Report, Blue Bell, Pa., 19406.

[6] Jewell, W.S., A Simple Proof of: L = λW, Operations Research, 15, 1109-1119, 1967.


Communications Networks -Technological Developments and Applications

SESSION OVERVIEW

COMMUNICATIONS NETWORKS

—TECHNOLOGICAL DEVELOPMENTS & APPLICATIONS

Jeffey Gee

Network Analysis Corporation 301 Tower Building Vienna, VA 22180

A communications network is a critical component in providing effective information services. Practically every business or industry today uses some form of communication network as a bridge between people and the information resource. Recent technology developments in the area of long-haul and local area communication networks will greatly help information system users pursue various organizational objectives such as resource sharing, increasing productivity, and reducing cost. In order to realize these objectives effectively, a comprehensive network planning and design process must be carried out. The effort should include communication requirements analysis, performance evaluation, and network architecture modeling.

The objective of the session is to give a broader perspective on communication networks. Consequently, the session is organized with an introduction to local network technology, a performance model of a network node, and a network modeling case study.

The local area network has been, and will continue to be, one of the most important network technology developments. The first paper, titled "Local Integrated Communications Systems: The Key to Future Productivity," presents an overview of a local communications network concept and describes its potential impact on organizational productivity. In communication networks, flow control for traffic congestion avoidance is an important design consideration in meeting performance requirements. An analytical model for evaluating the performance of a flow-controlled network node is presented in the second paper. Finally, a case study presentation is given to illustrate the practical experience of modeling a long-haul star network.


LOCAL INTEGRATED COMMUNICATION SYSTEMS: THE KEY TO FUTURE PRODUCTIVITY

Andrew P. Snow

NETWORK ANALYSIS CORPORATION 301 Tower Building Vienna, VA 22180

An overview of the types of local communication services required in typical large organizations is given. Such services as voice communica- tion, data communications in support of Information Systems, Closed Circuit Television, Security Management, and Energy Management are discussed. Desirable characteristics of an integrated communications system that must provide a highway for these services are presented, with particular emphasis given to organizational productivity. Transmission media that are capable of forming the building blocks for an integrated communications system such as CATV, coaxial cable, microwave, fiber optics, and infrared/ laser communication media are discussed in terms of the desired character- istics. Finally, an illustrative example of an integrated communication system is presented.

Keywords: CATV systems; coaxial cable; data communications; fiber optics; information systems; integrated communications; local area networks; microwave; video teleconferencing.

1. Introduction

Distributed data processing (DDP) is adding great impetus to local area network developments. As computing and storage capabilities decentralize, the demand for intercommunication among more and more sophisticated users increases. DDP, in conjunction with integrated local communications systems, offers users a potential to improve individual productivity. This productivity enhancement is due not only to the new services that can be offered but also to the nature of the local network itself. This article will deal with local area networks as opposed to long haul networks. Long haul networks are intercity type networks where great emphasis is placed on bandwidth utilization and minimum distance layouts (as one pays for capacity by the airline mile). Local area networks are loosely characterized as being more intra- and interbuilding oriented. For this reason they typically span on the order of a few kilometers. [1] In addition, they are usually privately owned by the organization, as opposed to facilities leased from a TELCO. Also, they typically support data rates that far exceed the tariffed services or capacity of the TELCO facilities.

2. Services To Be Supported

In the intra- and interbuilding environment, the local network should be able to support the wide variety of services that users are demanding. Typical data communications applications in buildings today are many terminals accessing one or more computer hosts for, say, database updating, query/response, and programmers engaged in systems development. In addition to data communications associated with some of the classical information systems,


the local network should also support the communications needs of security, fire, and energy management systems. These systems consist partly of many microprocessor based sensor devices that will typically report back to a computing/data base machine. In addition many security systems also have video monitoring requirements. Also, with the advent of video teleconferencing, modern corporate facilities will require video distribution systems. It is the video requirement that makes traditional twisted pair wire cable plant particularly unattractive in the future.

Another service area is being created by components of the "office of the future". Communicating word processing and document distribution machines are greatly impacting approaches to information flow in large organizations. Electronic mail services have already encouraged many executives to have terminals on their desks. In the near future, a terminal on every desk is not an inconceivable development in some organizations. The prolific increase of intelligent devices on the corporate scene is placing demands on classical building transmission plants that cannot be met in a cost effective manner.

3. Desirable Local Integrated Communication Media Attributes

In order to meet the diverse demands that organizations are and will be placing on building physical plants, consolidation is necessary in order to be cost effective. The principal attributes that must be considered in the selection of an integrated communications media are:

• flexibility
• connectivity
• capacity.

Each of these three attributes directly affects the ultimate productivity of the user organization. Productivity will be discussed in qualitative terms for each of these attributes.

3.1 Flexibility

Flexibility pertains to the ability of the integrated media to adapt to both tactical and strategic change. Tactical changes deal with the day to day exigencies of a viable organization while strategic changes deal with the evolutionary character of user demands. [2] Examples of tactical change might be the migratory nature of people/organizational units within a building or building complex, and concomitantly equipment migration. Typically, productivity during migratory periods is very poor for the organizational units involved. When people move many events are triggered. Terminals and phones move. Support personnel take word processing equipment with them. If the magnitude of the move is significant, simple adjustment of the building wire plant may not be technically feasible. The plant may not be able to support the proposed move. In the cases for which the plant can adjust, significant wire installation efforts are often necessary to get people within "reach" of the plant. Flexibility in this context means the provision of reasonable access of the media to the user's changes.

Strategic flexibility pertains more to the ability to adapt to long term change without expensive architectural renovation to upgrade the media. In this context flexibility means inherent or expandable capacity to handle new requirements. An example of strategic flexibility would be the planned movement of a data center within an organization's building or building complex. With traditional wire plant, the data center is the hub of thousands of wire pairs that are distributed throughout a building. Movement of a data center within the same building might well cause organizational upheaval or expensive retrofit. In this case it is either extremely expensive or very unproductive as users are subjected to possible interruptions, degradations, or limitations on service during the move. A different example of strategic flexibility would be a planned information system which will require significant capacity in comparison to existing services. A media which could readily adapt to these two strategic migration examples would be desirable. Flexibility also means that the media should be prepared to accommodate very dissimilar types of communications (e.g., video and data).

3.2 Connectivity

Connectivity relates to the ability of the media to provide all required physical connections within the building or building complex. Traditional wire cable plants are point-to-point in nature. If a particular device requires connectivity to just a modest number of physically separated devices, the point-to-point approach can

become quite untenable. If wire is used, central switching architectures, with their significant reliability implications, are required. Because of bandwidth limitations, wire can force the separation of media as opposed to integration. The any-to-any communications connectivity requirement within buildings is becoming more of a reality as time progresses. The trend is definitely towards the heterogeneous as opposed to homogeneous environment. The IEEE 802 committee efforts [3] on the standardization of both contention and token local area networks are encouraging the heterogeneous environment. This environment will in turn encourage the growth of multiple connectivity requirements on the corporate premises. Diverse services such as electronic mail, electronic document distribution systems, and communicating word processing systems will demand diverse connectivity within buildings. A strong case can be made for a broadcast type capability to insure any-to-any physical connectivity. The implementation of the any-to-any heterogeneous connectivity must await acceptance and pursuit of such standards as the International Standards Organization (ISO) open systems interconnection architecture. [4] Organizational productivity is closely related to this connectivity endeavor.

3.3 Capacity

Capacity means the bandwidth necessary to support the tactical and strategic requirements of the user organization. Tactical capacity relates to the ability to provide the bandwidth necessary to support existing and short term requirements. Some of these bandwidth requirements might be of the point-to-point single destination type, while others might be of the multiple connectivity type. Besides data communications in support of information systems, video requirements may exist that require large amounts of bandwidth. Typical analog video requires approximately 6 mhz of bandwidth. Heavily utilized point-to-point requirements might be supported at the bandwidth required for the specific application. Others may be multiple connectivity or more bursty traffic types that might share high bandwidth channels of the contention or token nature. These channels typically will range from 1 to 10 mbps. Proper modulation techniques or bandlimiting techniques can limit a 10 mbps signal to 6 mhz on an RF type system or 20 mhz on a baseband system, respectively. The local integrated communication media should be able to accommodate all intra-building communications requirements. For this reason a large capacity is required, especially in comparison with twisted pair cable. If the system is not able to support these requirements, productivity can potentially suffer either because of disruptive expansions or because of the forced requirement of managing separate communications media.

4. Transmission Media Elements

The transmission media building blocks available in the construction of the integrated communications media are:

• twisted pair wire,
• microwave,
• fiber optics,
• coaxial cable,
• CATV cable systems,
• infrared/laser.

Each of these media will be discussed and compared in terms of the flexibility, connectivity, and capacity attributes. The summary of this comparison is shown in Table 1.

                  FLEXIBILITY     CONNECTIVITY          CAPACITY

Twisted Pair      Poor            Point-to-Point        Poor

Microwave         Fair            Point-to-Point        Good

Fiber Optics      Fair            Point-to-Point;       Excellent
(Current                          Limited Broadcast
Technology)

Coaxial Cable     Fair to Good    Point-to-Point;       Excellent
                                  Broadcast

CATV              Excellent       Point-to-Point;       Excellent
                                  Broadcast

Infrared/Laser    Fair            Point-to-Point        Good

TABLE 1: INTEGRATED MEDIA BUILDING BLOCKS

and specially conditioned pairs are necessary to pass video over very short distances. Wire is not a good candidate for an integrated media but it will continue to exist in buildings because:

• it is there,
• it currently provides the best support for distribution of voice.

In the future, if CATV systems can support voice distribution in a cost effective manner, new buildings may potentially be void of twisted pair. Wire cable plant will most likely exist in parallel with whatever integrated media is suggested for existing buildings. It will become increasingly less attractive in the future.

4.2 Microwave

Microwave systems are point-to-point in nature. They would typically be used to link buildings together that are moderate distances apart or to bridge gaps for which the organization does not have easement rights. Microwave is also line of sight and is nominally limited to a 25 mile range. Of course the maximum range of this media cannot be used for extension of Carrier Sense Multiple Access Collision Detect (CSMA/CD) contention channels to other buildings because of propagation delay. Flexibility of microwave is closely related to capacity. Digital microwave links in the 90 mbps range are commercially available. For analog microwave systems 6 mhz passbands are typical. Use of microwave links requires FCC approval of frequency plans. The approval process sometimes requires frequency interference studies. Towers, line of sight requirements, and FCC approval make microwave inflexible at times as a media. Path reliability is a function of terrain, frequency of operation, path length, and atmospheric conditions; however, extremely reliable paths are common.

4.3 Fiber Optics

Fiber optic systems, although principally point-to-point in nature, can be configured into a distribution system. This is done by splitters that subdivide the light energy into equal parts. The limitation of this continual splitting technique is that the power threshold required at receivers for acceptable bit error probability is rapidly reached. Lightwave amplifiers are still R&D devices. No analog to the high impedance tap yet exists in the light world. Fiber optic broadcast distribution systems are probably five years away from common commercial application. The capacity of fiber optics is dramatic. Capacities in the range of several hundreds of mbps are possible. Flexibility of optics is only fair as Frequency Division Multiplex (FDM) techniques are still in the R&D stage. This means that very high data rates must be allocated in a Time Division Multiple Access (TDMA) scheme to effectively use this baseband datastream in other than a point-to-point fashion. Once the allocation scheme is fixed, changes are difficult. Also, video support requires T2 (6.3 mbps) data rates for high quality video.

4.4 Coaxial Cable

Coaxial cable is distinguished from CATV in that CATV uses FDM techniques to create separate wideband channels while coaxial cable operates on a baseband basis. The coaxial cable system can be repeatered, however, and therefore its range is extendable. It suffers from the same flexibility problems mentioned for fiber optics distribution systems in that composite data rates are required to serve many distinct services. Some time slots may be dedicated while others might be used as a contention resource. Xerox's Ethernet (Trademark of Xerox Corporation) is a baseband coaxial cable system. Its capability to provide a user an integrated communications media is limited because of its baseband nature and inability to easily accommodate video. An advantage of this type of media is its inherent broadcast capability.

4.5 Community Antenna Television (CATV) Systems

CATV systems offer perhaps the best media for integrated communications within and among buildings in close proximity. Up to forty-one 6 mhz channels can easily be supported as shown on Figure 1. Channels can be used for contention systems, video distribution, and even point-to-point communications. As indicated in both [5] and [6], the service, architectural, and protocol local area network development issues are complex. CATV is the best suited all around local area network media. This media far surpasses the others in the flexibility attribute. This flexibility, combined with the any-to-any physical connectivity and large bandwidth capability, will make it the most commonly used integrated media type.
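As a rough illustration of the capacity argument, the sketch below (a hypothetical Python fragment, not part of the original paper) divides a nominal 5.75-300 mhz CATV plant into 6 mhz slots and budgets them between video distribution and 10 mbps data channels, using only the figures quoted above. The raw slot count it produces (about 49) is larger than the forty-one usable channels cited in the text because guard bands, the FM band, and other reserved allocations are ignored here.

    # Back-of-the-envelope check of the CATV capacity figures quoted above.
    # All constants are taken from, or assumed consistent with, the text:
    # a roughly 300 mhz plant, 6 mhz channels, and a 10 mbps contention
    # channel that proper modulation confines to one 6 mhz slot.

    PLANT_LOW_MHZ = 5.75     # assumed usable lower band edge
    PLANT_HIGH_MHZ = 300.0   # assumed plant upper edge
    CHANNEL_MHZ = 6.0        # one analog video channel

    def channel_budget(video_channels: int) -> dict:
        """Split the plant into 6 mhz slots and reserve some for video."""
        total_slots = int((PLANT_HIGH_MHZ - PLANT_LOW_MHZ) // CHANNEL_MHZ)
        data_slots = total_slots - video_channels
        return {
            "total_6mhz_slots": total_slots,
            "video_slots": video_channels,
            "slots_left_for_10mbps_data": data_slots,
            "aggregate_data_mbps": 10.0 * data_slots,
        }

    if __name__ == "__main__":
        # e.g. reserve a dozen channels for video distribution/teleconferencing
        print(channel_budget(video_channels=12))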


[Figure 1 (not reproduced): CATV channel designations, including the FM band and channels T-7 through T-13, plotted against their band edges in MHz, from 5.75 MHz up to 300 MHz.]

FIGURE 1: CATV FREQUENCY ALLOCATIONS

[Figure 2 (not reproduced): three buildings, A, B, and C, each with its own CATV plant and head end, interconnected by a CATV extension between buildings A and B and a point-to-point microwave link between buildings A and C.]

FIGURE 2: EXAMPLE OF A LOCAL INTEGRATED COMMUNICATIONS SYSTEM


4.6 Infrared/Laser

Infrared/laser communications links are strictly point-to-point media. They are greatly limited in range due to fog or rain attenuation and any phenomenon that might diffract, disperse, or reflect the unguided but directed light energy. The flexibility of infrared/laser links is about the same as microwave, with the exception that desk top links through window glass make it possible to establish short interbuilding links (as opposed to large tower or roof top mounted microwave antennas).

5. Example of a Local Integrated Communication System

An example of a local integrated communications system is shown in Figure 2. In this example an organization spans three buildings. Each building has its own CATV distribution system for its own communities of interest. For common services, basic connectivity is provided by derivation of point-to-point circuits between buildings. In the first case connectivity between buildings A and B is obtained by extension of the CATV system. In the second case connectivity between buildings A and C is obtained by a point-to-point microwave system. This may have been required by lack of easement between the buildings. It would be possible to establish a common broadcast channel between buildings for use as a contention channel. This could operate on different frequencies in each building CATV system and not interfere with a wide assortment of other services on other CATV channels peculiar to each building. These services could include the entire suite of those mentioned in Section 3.

6. Conclusion

CATV systems offer the best media for local integrated communications systems. For organizational needs spanning several buildings these integrated systems will consist of such elements as fiber optics, microwave, and infrared/laser links. Wire plant will continue to exist in buildings but the appearance of buildings void of wire in the next five years is a real possibility. Baseband coaxial systems will also appear on the scene but are not good candidates for a flexible integrated system. CATV systems are the best all around system in terms of capacity, flexibility, and connectivity. These integrated local systems will profoundly affect productivity in the future.

References

[1] Tanenbaum, Andrew S., Computer Networks, Prentice Hall, 1981.

[2] Kaczmarek, R. and McGregor, P., "The Synthesis of Formal User Requirements: Defining the Networking Problem To Be Solved", IEEE, 1978.

[3] IEEE 802 Local Network Standard, Draft 1, July 4, 1981, IEEE Computer Standards Committee.

[4] ISO/TC97 N537, Architectural Model of Open Systems Interconnection, December 1980.

[5] Maglaris, B. and Lissack, T., "An Integrated Broadband Local Network Architecture", 5th Conference on Local Computer Networks, IEEE, October 6-7, 1980.

[6] Stack, T. and Dillencourt, K., "Protocols For Local Networks", Proceedings, Trends & Applications: 1980 Computer Network Protocols, May 29, 1980.

PERFORMANCE ANALYSIS OF A FLOW CONTROLLED COMMUNICATION NETWORK NODE

P.G. Harrison

Department of Computing Imperial College London SW7 2BZ England

A Markov model is presented for a message processing node with batch arrivals, window size constraints and an additional flow control based on a credit allocation scheme to limit congestion effects. The model is sufficiently general for use in modelling many nodes which represent servers or sub-systems in any queueing network with congestion controls. Its principal application lies in the study of communication networks, in particular representing the network-independent flow control of messages between networks, which may have quite different message sizes and protocols, via a "gateway". No closed form analytic solution exists for the direct representation of the congestion control scheme, and the approach taken is to exploit its properties, approximately in some cases, to simplify the model, reducing the size of the state space sufficiently to permit direct, numerical solution of the balance equations. The credit allocation scheme is shown to be equivalent to the node's operation in different modes, each with its own buffer capacity. Great simplification is achieved in this way; credit control need no longer be modelled explicitly. The approximations made are suitably realistic to retain adequate representation of parameters, resulting in a good physical interpretation and accurate predictions. Accuracy is assessed by comparison with the results of explicit simulations of a selection of nodes of this type with various parameterizations. Finally, we suggest applications for the model in the assessment and comparison of performance under various congestion control schemes.

Key words: Communication networks; flow control; performance prediction; analytic models; queueing models; Markov processes; simulation; validation.

1. Introduction

In communication networks, flow control schemes are necessary to ease congestion which would otherwise cause any of a variety of problems. These include speed mis-matching (different processing rates at the sending and receiving nodes), deterioration in efficiency (e.g. throughput degradation), loss of security or reliability (e.g. due to buffer overflow at some node), and unfairness (in the sense that certain classes of message may be allowed to dominate). Various strategies have been proposed and sub-sets of these implemented in some actual networks; e.g. pacing control, [6], window size constraint as in SNA, DECNET and ARPANET, [6], buffer size limitation at individual nodes, [8,9], and isarithmic control, [2]. Performance prediction of flow controlled networks is important in the comparison of such schemes, with the aim of choosing the most efficient (always subject to adequate attainment of the original objective of the flow control).

We address this via an analytic model, constructed initially to represent a node, the message processor, connecting separate communication networks with possibly quite different characteristics, such as message size or protocol; a gateway. Flow control in the gateway is independent of the characteristics of each network and uses:
(i) Window size restriction;
(ii) Buffer size limitation via assignment of a threshold value for the number of messages therein;
(iii) Message acceptance protocol based on the availability and issue of credits, accumulated on completion of processing of a message unit by the message processor. This is a pacing control.

The analytic model was developed within a Markovian framework and results in balance equations with many global dependencies in the sense that certain state transition probabilities depend on several state vector component values; in particular, blocking is present. The credit-based control scheme is shown, in 2.1, to be equivalent to operation by the node in modes, representing congestion levels, each with its own buffer capacity. In this way, the problem specification is greatly simplified, and the state space much reduced, since credit handling need no longer be modelled explicitly. The approach taken is to make realistic approximations to further reduce the size of the state space, the number of dimensions in particular. The global state dependence, e.g. blocking, is still present, but the resulting small size of the state space permits direct numerical solution of its balance equations for the state space probabilities. From these, important performance measures follow immediately. The physical realism of the approximations made retains the original parameters of the model in an intuitively significant way, preserving an accurate representation. Thus the precision of predictions is expected to be good. Validation is performed by comparison with the results of simulation tests, although ultimately validation is only relevant when based on data measured on actual networks. Nevertheless, the consistency shown with the simulated results does increase confidence in the model; its approximations in particular.

The model has great generality in that it is the solution of raw balance equations, so that all its parameters can have any (global) state dependencies; in particular state dependent service rates. Thus it can be applied to a variety of queueing network nodes in the contexts of communications and computer systems. For example, whole sub-systems of physical nodes may be represented by one with state dependent service rate and/or customer acceptance protocol; indeed any system with a congestion control scheme which can somehow be represented realistically by a single (possibly very complex) processor is a suitable application.

In the following section, the gateway node modelled is defined and the analysis resulting in the reduced state space discussed. The balance equations for the underlying Markov Process then follow directly, from which the state transition probability matrix may be obtained. The validation exercise is described in section 3, and finally we suggest applications of the model and extensions which could be made to increase its generality, allowing representation of more sophisticated, stabilized congestion control schemes.

2. The Node Model

2.1 Node Description

The node, shown in Figure 1, accepts batches of messages from a number of classes, each of which has its own batch size and input stream. Incoming message batches from each stream enter a common hold queue (according to a Poisson arrival process) from where batches are released into the server queue of the message processor, whence they are served on a first-come-first-served (FCFS) basis. Messages are processed one message at a time and the following flow control schemes are operative:
(i) Window control. Each message class is assigned a maximum number of batches allowed in the server and hold queues, its window size;
(ii) Isarithmic control or buffer size limitation. The server queue has an associated threshold value such that no messages are accepted when the queue size reaches this level, and a congested mode is entered whereby message acceptance is restricted according to credit availability as discussed next in (iii);
(iii) Pacing control. Once the threshold has been reached, messages are accepted only if sufficient credits have been accumulated, until the queue again becomes empty. The number of available credits is set to zero for queue sizes greater than or equal to the threshold value. One credit is issued whenever a message's processing is completed with a resulting queue length less than the threshold value. Batches of messages are accepted from the hold queue, on a FCFS basis, when the number of accumulated credits is greater than or equal to the batch size of the class at the front of the hold queue. On such a message acceptance a number of credits equal to the batch size is consumed, reducing the number of accumulated credits correspondingly. If the server queue size becomes zero, the non-congested mode of operation is restored until the threshold is again crossed.
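To make the interaction of the three controls concrete, the following sketch (an illustrative Python fragment, not part of the original paper; all identifiers are hypothetical) applies rules (i)-(iii) above to decide whether the batch at the front of the hold queue may enter the server queue, and updates the credit count on each service completion.

    # Illustrative sketch of flow-control rules (i)-(iii) described above.
    # Assumed state: server queue length n, accumulated credits, a congested
    # flag, per-class window sizes w[r], batch sizes b[r], and in_node[r],
    # the number of class-r batches currently in the hold or server queues.

    def may_accept(r, n, credits, congested, b, w, in_node, T):
        """Can the class-r batch at the front of the hold queue be accepted?"""
        if in_node[r] >= w[r]:          # (i) window control
            return False
        if congested:                   # (iii) pacing: need credits for the whole batch
            return credits >= b[r]
        return n < T                    # (ii) queue below threshold in non-congested mode

    def accept_batch(r, n, credits, congested, b, T):
        """Consume credits, admit the batch, enter congested mode if T is reached."""
        if congested:
            credits -= b[r]
        n += b[r]
        if n >= T:                      # threshold reached: congested mode, zero credits
            congested, credits = True, 0
        return n, credits, congested

    def complete_message(n, credits, congested, T):
        """One message finishes service: issue a credit or restore non-congested mode."""
        n -= 1
        if congested:
            if n == 0:                  # empty queue ends the congestion episode
                congested, credits = False, 0
            elif n < T:                 # one credit per completion below the threshold
                credits += 1
        return n, credits, congested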

This is basically the node described in [5] with variable message sizes and no time-out mechanism. The time delay involved in the issue of credits, arising from their own transmission on a network in the physical system, is not represented.

Notation:

Number of message classes: R
Message processor service rate: µ messages/unit time
Queueing discipline: FCFS (both queues)
Class arrival rates: λ_r batches/unit time (1 ≤ r ≤ R)
Class batch sizes: b_r messages (1 ≤ r ≤ R)
Class window sizes: w_r batches (1 ≤ r ≤ R)
Threshold of server queue: T messages

These parameters may be state dependent in any Markov model, in particular queue length dependent. The direct representation of this system in terms of a Markov process involves a state space with dimensions corresponding to
(i) ordering of batches in the hold and server queues in terms of their classes;
(ii) number of messages in the server queue;
(iii) number of credits accumulated.

The balance equations have no closed form solution because of the global state dependencies of state transition matrix entries on more than one state vector component, c.f. [1], [7]; for example due to blocking. Furthermore, direct solution of the balance equations is infeasible due to the size of the state space. We therefore simplify the formulation of the specification for a model to represent the system so as to reduce the state space size to such an extent that explicit solution of the new balance equations becomes a practicable proposition. The global state dependencies do not vanish, so that a closed form solution still cannot be derived.

The following observations may be made:

(i) The system operates in either of two modes:
* congested, when the server queue has not been empty since the last time the threshold was reached;
* uncongested, otherwise.
(ii) Uncongested mode may be represented by some maximum server queue size, C say, where C ≥ T. Of course it would be meaningless to have C > T-1+max(b_j), since queue lengths greater than the r.h.s. of this inequality could only occur after an arrival to a queue of length greater than T-1, when no batches are accepted.
(iii) Congested mode may be represented by
* no arrivals for server queue lengths greater than or equal to T;
* arrivals subject to maximum queue length T for server queue length less than T. This is because on a service completion resulting in a queue length below the threshold, a single credit is issued corresponding to a single message completed. Also, when a batch is accepted, the number of credits consumed is equal to the number of messages in the batch. Thus the sum of the number of accumulated credits and the server queue length is constant, equal to T, since at the threshold there are zero credits available by definition. This is equivalent to (ii), uncongested mode, with C set to T.

The state space, slightly generalised from this description, is now defined in terms of mode and server queue length; the credit-based control becomes implicit and the model specification greatly simplified.

2.2 The State Space

The state space, S, is the set of ordered pairs defined by

S = { (n,m) : 0 ≤ n ≤ C_m; 1 ≤ m ≤ M }

where
M is the number of different modes (2 in the previous section);
m is the mode corresponding to state vector (n,m), with m=1 representing congested mode;
C_m is the maximum queue length permitted in mode m, the mode's capacity, so that C_1 = T;
n is the queue length corresponding to state vector (n,m).

This state space permits a more general flow control scheme than that described in the previous section:
(a) Any number of modes is possible, i.e. M ≥ 2;
(b) Mode changes may occur when the queue becomes empty (n=0). Initially, n=0 and m>1; C_m > T for a realistic system, otherwise congestion will never occur in the mode m. Subsequently, on entry to a state with n=0 from a state (1,m'), a mode transition to mode m occurs with probability τ_mm'. Thus, in the scheme defined in the previous section, M=2 and

τ_mm' = 1 if m=2,  0 if m=1.

Additional modes may be used to provide stability in the system by introducing intermediate capacities, preventing the immediate onset of repeated congestion. In fact the scheme can be generalised further, providing more stability, via multiple thresholds as discussed in 4.3.
(c) A batch from message class r at the front of the hold queue is accepted into the

[Figure 1 (diagram, not reproduced): class 1 through class R batch arrival streams feeding a common hold queue, with accumulated credits controlling release of single messages into the message processor's server queue.]

Figure 1. The Flow Controlled Node

server queue in state (n,m) iff the queue constraining factor for class k in state (n,m) by capacity wovild not be exceeded, i.e. iff uj^Cn): independent of m since the window control n+bj^Cjj^. Hence a form of blocking is is mode-independent. present even in our approximate model, (d) On crossing the threshold, congested The approximation given below for ujj(n) is mode is entered, i.e. a transition to a state based on the mean value of the length of the (n,m) with n^T causes m to be set to 1. combined hold and server queues. We assume that the total number of messages in these queues has 2.3 Implicit Representation of the Congestion the same probability distribution, Z, as the Control Mechanisms length of die steady state M/M/1 single server queue with The window flow control scheme of the node is R represented implicitly in the of arrival rate E parameterization ujj(n)A]jbjj , the following two quantities for each class k, k=l llklR: processing rate y , Pk» probability that any given R message the or in hold server queues belongs and capacity i wj^bj^ , to class k. This is chosen to be proportional k=l

to the window size as well as the arrival henceforth denoted by Q. Thus {ujj(n)} ' are class °^ rate of k. Thus, p^ -"^k^k* defined recursively (and non-linearly) and may be value of pjj will also be influenced by the determined iteratively, each being set to 1

window control through A j^, as described initially and successively updated to the required next. precision. Convergnce is not considered formally (b) ujj, the constraining factor on the here. Our approximation for uj^(n) is derived as arrival rate, Aj^, of class k batches, follows. 0

280 value of uj(n) given by the previous iteration unconstrained by window control, Le. if uij(n)=l to compute The addition of allows for pi(n). 1/2 for each message class k and queue length n, '^nm the batch currently being processed. may be inaccurate since, for example, one arrival The probability or the queues containing w stream may dominate through its high arrival rate class k batches, given total length m(n), is and yet have a small, highly constraining window therefore size.

{1-Pk(n)}^(n)-W ('"H{Pk(n)}^ 2.4 The Balance Equations for m(n)2w, and we define uj^Cn) to be 1 if m(n)iwij and 2.4.1 Further Definitions

~ ^'k / \ Before the balance equations can be derived,

^ ( w ] {Pk(n)}^ {1-Pk(n)}'°(">-^ some more definitions are required: w=0 if m(n)2wij (3) ^nm^^) ~ probability that the system is blocked in state (n,m) waiting to accept a

N otes : batch of class k from the front of the hold queue, l^k^R (i) In general, m(n) is not an integer, and die generalized factorial function is used in B^ = probability of any blocking in the combinatorials to provide an interpolated state (n,m) estimate for U]^(n) as a function of mean queue length. = Z Bnm(k) (6) (ii) This approximation assumes primarily k=l that the hold queue length is always the same for any given server queue length. A more We make the following (intuitively sound) precise analysis which does not require this approximation: assumption is given in [3]. Bnm(k) = ,^AninPk(n) if bk>Cin-n The credit allocation control scheme is (7) reflected implicitly in the effective batch ^0 otherwise. arrival rate in state (n,m), where is the probability that there is at R least one arrival of any class outstanding in the I ui(n)>^ibi hold queue in state (n,m). The parameter A^^^ is i=l important in that its choice of value allows various different message batch acceptance (4) protocols to be represented. Some examples of R relevant value assignments are: I pi(n)bi i=l (i) For a re-transmit batch acceptance protocol in which a message batch is

discarded if it cannot be accepted into the so that I uj(n)A£bi server queue immediately, no blocking occurs n+bilCm and Ajun - 0 for all (n,m) e S.

= overall messag e arrival rate in state (n,m) Such a protocol is good as regards buffer protection and no hold queue is = (effective batch arrival rate) required. Throughput is not reduced * (mean accepted batch size) significantly and it is the easiest case to model; the credit control scheme is as required. Note that for degenerate bj's or redundant. However, the protocol is unfair to large batches which do not have bounded waiting time for service and suffer severely I v4(n)X£ if there is heavy small batch traffic. n+bjlCni (5) (ii) Hold queue widi FCFS discipline and 1 Pi(n) unbounded buffer size (unlimited queue n+b^C^ lengdi). For state (n,m)eS, R again as required. let A(n) = E bkXkiqj(n) (8) k=l Note too that if batch arrival rates are the total, window -constrained, external

281 message arrival rate in state (n,m). 3. Validation If ^ (n)2)J| the system is overloaded and there will always be a blocked arrival; A^m 3.1 The Simulation Model = 1. The analytic model defined in the previous If A(n)

server queues > n : server queue represents the node's operating characteristics length = n) explicitly (although very much less efficiendy). Considering the whole system as an M/M/1 Whilst it is conceded that ultimate validation single server queue, this yields must be based on data monitored on at least one existing physical node, agreement with simulated Anm - P rob (queue length > n results certainly supports confidence in our

: queue length ^ n) model; its implementation, but most importandy its s^proximations. =1- (9) Q The simulation model is an algorithmic Z(j) E representation of the node defined in 2.1. It j=n models explicidy the flow of messages in each in the notation of 2.3. This is the class and the window flow control mechanism as parameterization we choose for our model. well as the credit allocation/threshold congestion Clearly, further i^proximation has been control scheme which is equivalent to two modes of introduced, but note that the result is exact congestion. All interarrival times and the message for A (n)=0 and as A(n)->«' with mazimiun processor service times are drawn from negative allowed queue size approaching infinity. exponential probability distributions (consistent Furthermore, one would expect the inaccuracy with our Markov model), and messages rejected due introduced to be considerably less than that to window flow constraints are discarded: a arising from the other approximations in the stream's arrival rate is effectively shut down to model. zero when it attempts to violate its window size limit. With the problem now fully defined it is a relatively simple matter to write down the balance The durations of the time periods simulated equations for the scheme. These may be found in were chosen to be long enough to yield [3], together with a discussion of some numerical sufficiendy small standard error estimates on the aspects of their solution. We note here that the derived performance measures: of the order of 5%, size of the state space is see 3.3. Both the analytic formulae and simulation 1+(M-1)T+ max Cm, and the number of elements in are simply programmed in APL, and run on an IBM l

No storage optimization techniques have been 3.2 Node Test Cases found necessary, but the transition matrix is clearly suitable for sparse matrix processing In any test, the total message arrival rate techniques. These could be enhanced further using from all classes must be at least close to the certain properties of the model. For example by message processor's service rate in order that any considering separately die mode changes which can congestion control become operative. Relatively occur only in very few states; the intervening low arrival rates were used to validate the states enterable may then be considered to have implementations of the models - giving results only one component, the queue length. In close to those of corresponding M/M/1 queues, as particular, the state (n,l) with n>T can transit required. only to the state (n-1,1) and thence, eventually, to (T,l). Thus all super -threshold states could be Similarly, for a node with all batches lumped together into a single composite state. consisting of only a single message, the maximum This would produce a more efficient implementation queue length in jH. modes is the threshold value, with respect to both execution time and storage, since congested mode will be entered as soon as but results in loss of potentially important the queue reaches this size: the modes are detail, viz. the queue length distribution above degenerate. Here, the node is precisely a regular the threshold. single server queue (M/M/1 under our Markovian

282 assumptions) with the super-threshold states specifications are given in Tables 1-4, along with aggregated into one. Thus, for a single arrival their performance predictions. These test cases stream o£ unit-sized batches with window size Q, are a sub-set of a much more comprehensive set, in the notation of 2.3, ive should have revealing our most significant results as discussed in later sections. Pi = Z(i) for 01i

where p(i) = A(i)/ij(i+l) for message arrival rate (a) Each arrival stream is inactive due to A(i) and processing rate y(i) in state (i,l), and the window contol. In this way the validity At is the probability of there being a batch of our approximation for {uif(n)}, 2.3, waiting in the hold queue in state (T,l), see 2.4. could be judged directly. Now, Z(i+1)= p(i)Z(i) for Oj^KQ, so we require (b) The hold queue is non-empty, with a Q class k batch at the front, l^k^R. Hence the {1-At}"1z(T) = E Z(j) (12) accuracy of our assignment of values to j=T {Ajimlt 2.4, can be assessed for cases more which is satisfied by our definition of A in 2.4. complex than that discussed in 3.2. (c) The node is in each mode. This can Ntmierical results are in agreement, but this provide further validation by comparison with merely provides some validation of the the appropriate marginal state space implementations of the analytic and simulation probabilities of the analytic model. models. Note that although the modes are degenerate, for finite window sizes the arrival In each simulation run, initially both the streams are not; they result in a multi-class hold and server queues are empty and the node is single server queue. in non-congested mode. No data is collected until a period of 5 time constants has el<^sed. The The following selection of nodes defined for value used for the time constant is the reciprocal our validation was chosen to demonstrate the most of the mean state transition rate, averaged over significant characteristics of the flow controls: all states in the sxialytic model. We used (1) A node with 2 arrival streams and the primarily the sampling or "snapshot" method of threshold value sufficiently high for both data collection, described below, for our the credit allocation and window flow control statistical analysis. Cumulative techniques were schemes to have significant effects. Note also used to provide an alternative approach, but that a threshold above the sum of the the resulting estimates are not significantly products of the window size and batch size different from, in fact are almost identical to, for each class would never be reached. those based on the snapshot data. The cumulative (2) The same node with the threshold reduced variant is described in [4]. so that the credit allocation control scheme becomes dominant. Snapshots of the node's state were sampled at (3) The node with total (unconstrained) a set of time points uniformly spaced at intervals message arrival rate much higher than the of 2 time constants, the first at time equal to 5 message processor's service rate - a time constants. If server queue length i is saturated node. observed nj times in a sample of N snapshots, (4) A node with a third arrival stream of the estimate for the equilibrium probability of single message batches with relatively high queue length i is pi=n£/N, O^i^I, where I is arrival rate and window size. Such a stream the maximum possible queue length. Assuming that allows the server queue length to remain at the sample is independently identically its threshold value after a service distributed according to true queue length completion (in congested mode, of course), probabilities {pi:Oii^I}, the random variable due to acceptance from the hold queue of a n^ has binomial distribution with mean Np^ and message batch of unit size. variance Npj(l-pj).

The numerical parameterizations of these Thus, p£ has expected value p^ (as

[Figure 2 (four panels, Test cases 1-4, not reproduced): probability plotted against server queue length for each test case. Legend: o analytic prediction; x simulation estimate; — 2 x standard error limit.]

Figure 2. Predicted Server Queue Length Probability Distributions

required) and variance pj(l-pi)/N. Hence, pj A numerical comparison of these performance has standard error estimate measures as computed by the analytic and simulation models is presented in Tables 1-4, with o[pi] = {pi(l-Pi)/N}0-5 (13) the goodness-of-fit assessed by the chi-squared test. Histograms derived for the server queue Since N is proportional to the (simulated) length probability distributions are shown in duration of the run, D say, the standard errors Figure 2, are proportional to D"^*^. Thus to reduce the standard error estimates by a factor of f, the 3.4 Analysis and Interpretation of the simulation run time must increase by a factor of Numerical Results f2. 3.4.1 Overview and General Observations The mean queue length is estimated as I From tables 1-4 and the plots in Figure 2 it I ipi can be seen that the analytic and simulation i=l results exhibit the same general characteristics: and so has standard errror in terms of mean queue length, the distribution

I itself and their standard errors. The distribution (i is essentially bimodal (with the exception of the { I a[pi])2}0.5 (14) i=l case with high arrival rate discussed below), decreasing from the idle probability (queue length

284 zero) to a trough before the threshold queue not inconsistent with the simulation. length, increasing to a maximum at or immediately below the threshold and falling away sharply for Our primary statistical test for goodness-of- queue lengths above the threshold. The fit is the chi-squared test. This again reflects corresponding single server queue results, with the variability of agreement, with values of the window flow controlled arrival rate and all super- test statistic in the range 12-200 on 8-11 degrees threshold states lumped into one, bear little of freedom (Tables 1-4). The corresponding resemblance. percentage points indicate a fit which is good for case 1 (about 75%), reasonable for case 4 (96%) Apart from the general bimodal shape, there and poor for case 2 (99.9%); effectively 100% for are other more local variations in the case 3. These results are not too impressive in distributions. Denoting the probability of queue themselves, indeed rather surprising in case 2. length n by P(n), for test cases 1-3, But the chi-squared test is well known for its P(0)>P(1)P(3)l is then invalid and This is precisely the situation in test cases 2 queue length 1 can only result from a transition and 4, as well as many others not reported here, 2->l, on a message completion. Thus P(1)P(T) tables). If such points are corrected or omitted, for threshold value T. This follows since the a good fit is indicated. Thus, although the chi- transition T->T is invalid (acceptance of a squared test may suggest a model worthless, it may blocked single message batch on service be that for a (majority) sub-set of queue lengths completion) and so all super -threshold states must the model provides a good representation transit monotonely to state (T-1,1). Propagation actually supported by this self-same test which analogous to that discussed above results in may reject a whole theory on the basis of one P(T-3)>P(T-2)T, yielding very emphatically threshold, precisely where the control schemes P(T)>P(T-1). Further, P(T-l)T (blocked arrival) important. Some possible causes for the transitions: no similar transition is possible inadequacies identified in the model are revealed from queue length T-2 and the effect is outweighed next. by the others in cases 1-3. 3.4.3 Interpretations 3.4.2 Goodness-of-fit (1) For many of the node specifications examined, It is clear from our tables and graphs that cases 1-3 here, the simulation predicted a sharper there is very variable agreement between the peak in the queue length distribution at (or just analytic and simulation results: from very good below) the threshold. This is quite consistent (test case 1) to poor (case 3). The mean queue with its lower predictions for the probability of lengths for cases 1,2,4 certainly do not differ zero queue length in that non congested mode would significantly (in terms of their standard error be entered less frequently. Thus the queue estimates); even case 3 is not outrageous here. capacity would be greater than the threshold value However, more rigorous quantitative testing must less often, and hence the threshold will provide a be based on whole distributions. The proportion of more dominating constraint. 
In more general terms analytically computed probability points lying also, some smoothing of the peak in the analytic outside the 2 x standard error confidence bands on model is consistent with the aggregate type of the simulation results (Figure 2) are 0, 37, 100 representation of window control: discussed and 12 per cent for cases 1-4 respectively. further below.

Excluding case 3, only one point differs by more than 3 standard errors (queue length 0 in case 2). (2) As expected, the credit allocation control Case 3 is the saturated node and is discussed in (CAC) dominated cases give the best agreement, some detail in the next section. Note here that e.g. case 1: CAC is modelled explicitly and in although quantitatively poor, its analytic greater detail than the window flow control (WFC), predictions still exhibit general characteristics represented only in terms of aggregate effects on

285 ;

arrival rates. Moreover the WFC has representation distribution, marginal probabilities of the independent of the CAC and so of the threshold joint probability distribution, £; value. Thus, if it is inadequate, the analytic (ii) Server utilization, following directly model cannot be expected to be accurate since its from (i) own input (arrival rate) parameters would be (iii) Server throughput; incorrect. This applies regardless of the (iv) Mean and standard deviation of the threshold value, so that by a "CAC dominated" test server queue length; case we really mean one in which WFC has little (v) Distribution of the time delay for a effect on arrival rates and not necessarily merely message batch to pass through the message one with low threshold. processor, a simple weighted convolution; (vi) The power of the node, defined as The good agreement obtained in case 1 is Througlq)ut of node consistent with this argument, but greater positive support is provided by the poor results Mean message time delay in both queues of test case 3 in which batch arrival rates were 2 _ (Throughput) high and consequendy adjusted by the WFC to a considerable extent. Certainly the significant Sum of mean queue lengths probabilities predicted for queue lengths 0 and 1 are counter -intuitive, and we postulate that the by Little's Law. Now the throughput may be inaccuracy arises from our representation of WFC. derived from the node's state space Exploring this possibility further, we tested a probabilities and the sum of the mean queue case with two arrival streams, one with much lengths obtained by considering the whole greater arrival rate but small window size. The system as an M/M/1 single server queue with resulting adjustments to the batch arrival rates window flow constrained arrival rate (2.3). were therefore substantial, even though the node was not overloaded. This test gave a chi-squared 4.2 Experimentation statistic of around 500 on 9 degrees of freedom, strongly suggesting that our arrival rate Having derived expressions or algorithms for adjustment formula (2.3) is inadequate. the performance measures listed in the previous section, one can attempt to optimize the node's On re-computing the analytic results for this performance in various ways, by selection of node specification and test case 3, using the WFC- appropriate parameter values. For example, constrained arrival rates estimated by monitoring (i) In the graphs of throughput and power the corresponding simulations (3.3 (b)) in place vs. mean time delay or mean queue length of the adjustment formula, much better agreement (related by Litde's Law) for messages in the was obtained. A need for improvement in our system, one can identify a knee and a peak representation of WFC is clearly identified, and respectively. Optimization would involve may be provided to some extent through a choosing model parameters to achieve refinement of our aggregate approach, [3]. A more positioning of these points which is best detailed representation would be preferable, but according to some objective; e.g. maximum recall that it is impractical for this to be fully throughput subject to some upper bound on explicit (2.1). Note that another similar possible mean time delay (reflecting provision of some source of inaccuracy is the formulation of the pre-determined minimal service quality). batch class selection probabilities, {pk(n)} of (ii) The distribution of the server queue 2.3. length is important, particularly if service rate depends on queue size. 
Clearly a low (3) Finally, and in the same vein as (2), the probability of zero queue length is required parameterisation of the blocking probabilities (minimal idle time). In addition, the through {Amnl lead to imprecision. Again the "spread" of the distribution should not be WFC representation is also significant through too great, otherwise the variance of the {plj(n)}, as well as {Ajiml to a lesser extent. number of messages in the queue will be high, However, tests using values for the blocking reflecting a highly variable message delay probabilities estimated in corresponding time and possible instability; a most simulation runs (compare (2)) were inconclusive. undesirable effect, psychologically at least. Thus the ideal shape for the queue length 4. Applications and Enhancements of the Model distribution is that of a sharp peak at a length greater than zero. Furthermore, for a 4.1 Performance Metrics queue length dependent service rate, one would like to be able to choose parameter Most of the required performance measures can values such that the peak was located near be derived from the state space probabilities, £. the queue length yielding maximum service These are as follows: rate (the queue is finite, recall); in this (i) Server queue length probability way, throughput should be optimized also.

286 Considerable experimentation with parameter Extending this argument, any number of thresholds values, such as the threshold, mode capacities, may be defined, the mode being subject to change batch and window sizes, is possible, since the at any; approaching from above or below. In this significant characteristics of the flow control way, stabilizing mechanisms may be represented via scheme are parameterized explicitly. Particular entry to intermediate modes or use of a threshold examples are suggested in [3]. band to delay revertion to the old mode on returning through a threshold. The threshold band 4.3 Extensions to the Model operates as follows: (i) The band consists of two values for the An immediate extension is to incorporate queue length, T;^ and Tg with Ty^ < Tg state-dependent parameters into the model; arrival for thresholds A and B respectively. and service rates in particular. Much more (ii) On reaching threshold B from below, the sophisticated optimization may then be undertaken, mode may change; to a "more congested" mode. as in 4.2. D ynamic flow control schemes are being For example, m may change from 2 to 1 in our used increasingly, particularly in networks with test case with M=2 at the threshold with distributed control functions. The parameters of value T. On approach from above, no mode such schemes are adjustable according to the change occurs. current state of the network. For example, as (iii) Similarly, on reaching A from above, congestion increases at any particular node, the the mode may change; to a "less congested" node can reduce, or even shut down to zero in mode. In the same example, m may change from severe cases, the window sizes of some or all 1 to 2 when the queue becomes empty. On message classes. Of course the physical mechanism approach from below, no mode change occurs. would involve transmission of control information on the network, requiring much more complex It can be seen that this mechanism will modelling which would be impracticable in our prevent instability caused by oscillation of the case. However it is a simple matter to incorporate queue length about a single threshold value having queue length or mode dependent window sizes, Wj, inverse mode transitions for approach from represented by pj and Uj in the balance opposite directions. In fact in out example we equations. In this way, the distribution of have precisely a threshold band, with T^ = 0 and messages with respect to class, entering the Tg = T, but fat greater generality is possible. message processor queue may be controlled according to the current state. 5. Conclusion

State dependent mode capacities may be used An analytic model has been developed which to model stabilizing mechanisms which may operate can represent a wide range of nodes or sub-systems physically by issuing variable (fractional or in queueing networks with congestion control multiple) credits. Indeed in general, mode schemes. The equivalence of the credit- and mode- capacity is equivalent to the size of credits based control schemes, together with certain issued on sub-threshold message completion. approximations yield a state space sufficiently Although a less trivial change, the balance small to permit feasible, direct, numerical equations are easily modified to accomodate such solution of its balance equations; even for dynamic mode capacities. A form of dynamic message complex nodes with large buffer sizes. At the same acceptance protocol is represented in this way. time, enough detail of a modelled system, in terms of its parameters, is retained to achieve great A novel stability mechanism may be provided generality; mainly through the acceptability of by the use of multiple thresholds. In our model, unrestricted state dependence of the parameter there is only one threshold, the only significance values. of which is that the mode, or queue capacity, may change when it is reached from below. The mode The type of control modelled is node-to-node always becomes 1, congested, but there is no rather than end-to-end, based on single buffer analytical reason for this restriction or why a capacity constraints and restrictions on the control is mode transition probability matrix similar to ^ arrival processes to one processor. The could not be used. Furthermore, such transitions localized^ reflecting message pacing and single are quite realistic and feasible, and capable of node protection, and is eminently suitable for representation in the balance equations, on explication to networks with distributed flow approaching the threshold from above as well as control. End-to-end flow control is modelled via below; one could view the existing model as having the window size parameters, although its explicit unit mode transition matrix on reaching the implementation is not. threshold from above. Validation to date is encouraging, suggesting Using the possibility of a mode change at a that our model provides a good analytic queue length of n as the defining criterion for a representation of nodes' behaviour, for server threshold, the empty queue is also a threshold. queue lengths near the threshold in particular.

The choices of threshold value and mode capacities are crucial for optimizing performance, and this domain is the most important for prediction purposes. We therefore expect that the model can be a suitable tool for performance prediction in the next stage of our research - experimentation with multiple modes and thresholds in the development of dynamic stability mechanisms, see 4.3.

Perhaps of less obvious importance, it should be pointed out that in addition to providing a rich class of models, this modelling process has increased insight into and understanding of the corresponding physical processors, resulting in the proposals for new or extended congestion control schemes. A network consisting of nodes all of which possess flow control mechanisms of these types could provide dynamically well paced, efficient, fair and secure communication between end users.

The research reported in this paper originated during my recent visit to IBM Watson Research Center, Yorktown Heights, New York. I would like to extend my thanks to Frank Moss, Mischa Schwartz, Paul Green, but above all to Parviz Kermani, for all their help and advice.

References

[1] Chandy, K.M., Howard, J.H., Towsley, D.F., "Product Form and Local Balance in Queueing Networks", J.ACM 24, 2.

[2] Davies, D., "The Control of Congestion in Packet Switching Networks", IEEE Transactions on Communications, Vol. COM-20, No. 3, June 1972.

[3] Harrison, P.G., "Analytic Model of a Communication Network Node with Flow Controls", IBM Research Report RC8832; Research Report DoC81/9, May 1981, Dept. of Computing, Imperial College, London SW7 2BZ, England. Also submitted to Computer Networks.

[4] Harrison, P.G., "Performance Prediction of a Flow Control System using an Analytic Model", Research Report DoC81/10, June 1981, Department of Computing, Imperial College, London SW7 2BZ, England.

[5] Kermani, P. and Bharath-Kumar, K., "A Congestion Control Scheme for Window Flow Controlled Computer Networks", IBM Research Report No. RC8401.

[6] Kleinrock, L. and Gerla, M., "Flow Control: A Comparative Survey", IEEE Transactions on Communications, Vol. COM-28, No. 4, April 1980.

[7] Lam, S.S., "Queueing Networks with Population Size Constraints", IBM J. Res. Develop., July 1977.

[8] Lam, S.S. and Reiser, M., "Congestion Control in Store and Forward Networks by Buffer Input Limits", IEEE Transactions on Communications, Vol. COM-27, No. 1, January 1979.

[9] Wecker, S., "DNA: The Digital Network Architecture", IEEE Transactions on Communications, Vol. COM-28, No. 4, April 1980.

Table 1. First Node Test Case

MESSAGE CLASSES:                 1        2
ARRIVAL RATES:               5.000    3.000
BATCH SIZES:                     2        4
WINDOW SIZES:                    3        2

MESSAGE PROCESSING RATE = 20.000
THRESHOLD VALUE = 8
NON-CONGESTED MODE CAPACITY = 11

QUEUE    PROBABILITY:   PROBABILITY:   STANDARD   PROBABILITY:   CHI-SQUARED
LENGTH   ANALYTIC       SIMULATION     ERROR      M/M/1 QUEUE    CONTRIBUTION
   0        .142           .137          .010        .068            .224
   1        .060           .048          .006        .074           3.287
   2        .086           .082          .008        .080            .239
   3        .078           .075          .007        .086            .152
   4        .092           .101          .009        .090           1.092
   5        .106           .124          .009        .092           4.147
   6        .099           .110          .009        .091           1.535
   7        .178           .165          .010        .087           1.174
   8        .147           .146          .010        .331            .009
   9        .008           .010          .003        .000            .332
  10        .003           .003          .001        .000            .098
  11        .001           .000          .000        .000

SAMPLE SIZE = 1250        LENGTH OF TIME PERIOD SIMULATED = 112.095

MEAN QUEUE LENGTH:   ANALYTIC: 5.497   SIMULATED: 4.562   STD ERROR: .140

CHI-SQUARED STATISTIC = 13.510 WITH 11 DEGREES OF FREEDOM
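The chi-squared figures in Tables 1 through 4 are per-cell goodness-of-fit contributions comparing the analytic and simulated queue-length distributions over the simulated sample. A minimal sketch of that computation, re-using the rounded Table 1 probabilities; because the published values are rounded (and the tail cells may have been pooled), the recomputed contributions and total only approximate the printed figures.

    import numpy as np

    # Rounded queue-length probabilities transcribed from Table 1 (queue lengths 0 through 11).
    analytic  = np.array([.142, .060, .086, .078, .092, .106, .099, .178, .147, .008, .003, .001])
    simulated = np.array([.137, .048, .082, .075, .101, .124, .110, .165, .146, .010, .003, .000])
    n_samples = 1250

    # Per-cell contribution: N * (observed proportion - expected proportion)^2 / expected proportion.
    contribution = n_samples * (simulated - analytic) ** 2 / analytic
    print(np.round(contribution, 3))                # per-queue-length contributions
    print(round(float(contribution.sum()), 3))      # compare with the reported statistic of 13.510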

Table 2. Second Node Test Case

MESSAGE CLASSES:                 1        2
ARRIVAL RATES:               5.000    3.000
BATCH SIZES:                     2        4
WINDOW SIZES:                    3        2

MESSAGE PROCESSING RATE = 20.000
THRESHOLD VALUE = 5
NON-CONGESTED MODE CAPACITY = 8

QUEUE    PROBABILITY:   PROBABILITY:   STANDARD   PROBABILITY:   CHI-SQUARED
LENGTH   ANALYTIC       SIMULATION     ERROR      M/M/1 QUEUE    CONTRIBUTION
   0        .184           .136          .010        .068          15.122
   1        .078           .059          .007        .074           5.592
   2        .129           .148          .010        .080           3.516
   3        .110           .138          .010        .086           9.025
   4        .257           .260          .012        .090            .054
   5        .211           .228          .012        .602           1.790
   6        .022           .021          .004        .000            .076
   7        .007           .006          .002        .000            .281
   8        .003           .004          .002        .000            .022

SAMPLE SIZE = 1250        LENGTH OF TIME PERIOD SIMULATED = 126.505

MEAN QUEUE LENGTH:   ANALYTIC: 2.954   SIMULATED: 3.144   STD ERROR: .091

CHI-SQUARED STATISTIC = 35.477 WITH 8 DEGREES OF FREEDOM


Table 3. Third Node Test Case

MESSAGE CLASSES:                 1        2
ARRIVAL RATES:              10.000    6.000
BATCH SIZES:                     4
WINDOW SIZES:                    3

MESSAGE PROCESSING RATE = 20.000
THRESHOLD VALUE = 8
NON-CONGESTED MODE CAPACITY = 11

QUEUE    PROBABILITY:   PROBABILITY:   STANDARD   PROBABILITY:   CHI-SQUARED
LENGTH   ANALYTIC       SIMULATION     ERROR      M/M/1 QUEUE    CONTRIBUTION
   0        .046           .006          .002        .001          42.705
   1        .026           .008          .003        .002          15.091
   2        .041           .013          .003        .003          24.282
   3        .047           .021          .004        .005          17.867
   4        .064           .023          .004        .007          33.280
   5        .110           .159          .010        .010          27.326
   6        .113           .153          .010        .016          17.901
   7        .287           .320          .013        .024           4.589
   8        .256           .294          .013        .932           7.401
   9        .006           .002          .001        .000           4.576
  10        .003           .001          .001        .000           1.680
  11        .001           .000          .000        .000           1.068

SAMPLE SIZE = 1250        LENGTH OF TIME PERIOD SIMULATED = 104.202

MEAN QUEUE LENGTH:   ANALYTIC: 5.881   SIMULATED: 6.516   STD ERROR: .162

CHI-SQUARED STATISTIC = 197.765 WITH 11 DEGREES OF FREEDOM

Table 4. Fourth Node Test Case

MESSAGE CLASSES:                 1
ARRIVAL RATES:
BATCH SIZES:                     3
WINDOW SIZES:                    1

MESSAGE PROCESSING RATE = 20.000
THRESHOLD VALUE = 6
NON-CONGESTED MODE CAPACITY = 8

QUEUE    PROBABILITY:   PROBABILITY:   STANDARD   PROBABILITY:   CHI-SQUARED
LENGTH   ANALYTIC       SIMULATION     ERROR      M/M/1 QUEUE    CONTRIBUTION
   0        .204           .237          .012        .163           6.567
   1        .160           .139          .010        .146           3.500
   2        .143           .142          .010        .131            .013
   3        .131           .113          .009        .117           1.782
   4        .121           .109          .009        .105           1.628
   5        .093           .112          .009        .094           2.319
   6        .138           .141          .010        .244            .098
   7        .003           .003          .002        .000            .024
   8        .001           .000          .000        .000            .868

SAMPLE SIZE = 1250        LENGTH OF TIME PERIOD SIMULATED = 88.029

MEAN QUEUE LENGTH:   ANALYTIC: 2.672   SIMULATED: 2.637   STD ERROR: .090

CHI-SQUARED STATISTIC = 16.798 WITH 8 DEGREES OF FREEDOM


Case Studies


RESOURCE MANAGEMENT IN A DISTRIBUTED PROCESSING ENVIRONMENT —COMPUTER RESOURCE SELECTION GUIDELINES

Eva T. Jun

Resource Planning and Measurement Branch Division of Operations Office of Computer Services and Telecommunications Management Office of the Assistant Secretary for Management and Administration U.S. Department of Energy Germantown, MD 20874

In an installation where computer resources of various types and sizes are either available or feasible, resource management and planning personnel have the responsibility for ensuring that user requirements are met in the most cost-effective manner. This paper is a tutorial on what considerations should be a part of the computer resource selection process for any given application.

Key words: Computer resource selection; cost; long range plan; system requirements; user requirements.

1. Introduction

The purpose of these guidelines is to ensure that the selection method used for automatic data processing (ADP) equipment and services will lead to the acquisition of an appropriate and cost-effective ADP resource for a given ADP application system.

These guidelines outline the process for selecting the resource upon which an application is to be run once user requirements have been defined.

With the emergence of distributed processing technology, resource management and planning personnel have a wider choice of equipment selection. In an installation where computer resources of various types and sizes are either available or feasible, planners have the task of matching a resource to each application.

2. ADP Resource Selection Process

2.1 Timing

The timing of ADP resource selection is critical to both the user and to the data processing department. For new applications, ADP resource selection should take place during the initiation phase of system development. As illustrated in Figure 1, resource requirements should be identified and quantified after user requirements are known and either before or concurrently with preliminary design. For existing systems undergoing expansion (running on dedicated processors), ADP resource selection should take place at least 5 months before projected saturation (much longer in certain Federal sectors). However, for existing systems undergoing major modifications, the appropriateness of the original resource selected should also be re-evaluated.


Figure 1. Application System Initiation Phase: Preliminary Analysis; User Requirements Study; ADP Resource Selection; Preliminary Design.

• Security/privacy

• Nature of interface with other systems:
  - One-way or two-way data exchange
  - Frequency of interfacing
  - Data volume
  - Response time requirement

• Nature of processing:
  - Computation or data management extensive
  - Accuracy requirement

• Online storage requirements in terms of:
  - Number of files per system
  - Average number of records per file
  - Average record length

• Geographical distribution of users

• Peak load conditions:
  - Maximum number of simultaneous users or processes
  - Peak frequency of transactions
  - Turnaround time requirement

• Input/output requirement:
  - Input - screen formatting, data editing, and validation
  - Output - format and volume

• Frequency of usage:
  - Average number of interactive and batch sessions per day
  - Average length of connect time per session

Figure 2. User Requirements

2.2 Understanding User Requirements

A distinction must be made between user requirements and system requirements. User requirements should accurately reflect the functional needs of the user stated in terms that the user understands, whereas system requirements should define the computing environment. Quantities in user requirements should be stated in the units the user utilizes to measure his tasks, not in terms of computer system terminology. Figure 2 illustrates the kinds of critical user requirements that must be fully understood.

2.3 Long Range Plan

Although the processing requirements of most applications can be satisfied by most general purpose computers, conformity to the data processing department's long range plans is an important factor in the selection of a computing resource. The following paragraphs detail the major points of one such plan, that used at the Department of Energy (DOE) Headquarters Administrative Computer Center in Germantown, Maryland.


1. The Headquarters large scale host computer will be used for systems that require:

   • Online integration of multiple data bases - with current technology, this can best be accomplished on the host computer because of response time constraints;

   • A large volume of data, because of the economy of scale offered in mass storage configurations on large scale processors;

   • A large amount of "number crunching," where the turnaround time would be intolerable on smaller scale processors.

2. Classified material will be processed on dedicated classified processors.

3. Satellite processors are small to medium scale processors used to off-load work from the host processors. They are appropriate for systems that do not have the specific requirements outlined above for a host computer. Satellite processors can be installed in field offices as well as Headquarters for the convenience of users. Factors to be considered in the assignment of users to any satellite processor will be based on functional requirements; geographic location; user organization; and the need for access to data, local control, and the relative capacity of the processors. Evaluation of these factors must be made on a case-by-case basis.

4. Office system processors may be purchased and used by specific organizations within user departments where proven to be cost-effective. Mini/micro systems are appropriate for office automation systems because they offer user control, fast turnaround for printer output, and (potential) economy. Data processing may be performed on user-owned processors when the sustained processing load of parochial data justifies the acquisition of a computer for data processing and when capacity of other suitable processors is not available.

5. The use of computing resources through interagency resource sharing (for Federal Government agencies) or commercial timesharing services must be considered when the following conditions are present:

   • When needed as backup for in-house facilities when in-house computing facilities are temporarily incapacitated, or when there is a need for a temporary means of providing service due to the saturation of in-house computing facilities and urgency of user needs;

   • When it is not feasible to provide proprietary software packages on the in-house computers;

   • When the use of special hardware capability is necessary;

   • When there is a need to access proprietary data bases.

   Interagency resource sharing is available from several Federal data processing centers that serve Federal agencies throughout the country. The General Services Administration (GSA) assists agencies in screening available Government ADP resources. Federal Property Management Regulations (FPMR) require that interagency ADP resource sharing be considered prior to acquiring ADP resources from commercial sources.

2.4 System Requirements

User requirements must be translated into system requirements in order to determine the computing environment necessary to accomplish the user's tasks. The size of the application system plays a major role in establishing system requirements. Once application size has been determined, the physical resources needed to support it can also be determined, thereby allowing selection of a processor with sufficient capacity (provided other requirement areas can be satisfied).

Determining the size of an application system before it has been developed is a difficult process. Accuracy depends on a complete understanding of how functions in the system being developed are met. Workload characteristics (frequency, volume, etc.) should be gathered in the current environment.


Adjustments can then be made to estimate projected workload for the future environment. Another means of estimating the size of a system to be developed is to examine the performance statistics of current systems of similar volume and architecture.

There are often various ways of supporting a user-required function; thus, the system configuration derived need not be unique. For instance, screen formatting as a user requirement can be translated into either high-speed transmission from the host or the satellite processor, or the use of intelligent terminals or local processors for screen formatting where less than high-speed transmission of data to the host computer will suffice. Each alternative should be evaluated according to how well it satisfies the highest priority requirements.

System requirements can be grouped into five categories: software, hardware, telecommunications, operating environment, and performance. A single user requirement can generate system requirements in more than one category, as illustrated in the sample derivation of system requirements from user requirements shown in Figure 3.

Security considerations impose requirements on all of the above mentioned areas of the system. A risk analysis should be performed to determine the extent of the security provisions necessary to satisfy user requirements.

The following paragraphs outline some of the various areas of system requirements that must be considered in the computer selection process.

2.4.1 Software

Software support for the application being considered must be made in terms of the following considerations.

2.4.1.1 Operating System

The operating system must provide support in the following areas, where required:

1. Security and integrity
   • Control of access,
   • Protection against simultaneous update when data is shared by many users;

2. Software products and utilities
   • Software control utilities,
   • Application support packages, such as debugging aids, screen formatting software, etc.,
   • File system maintenance utilities, such as disk reorganization, data migration, and recall;

3. Interprocessor communications ability (where necessary)
   • Online interface,
   • Job networking,
   • File transfer;

4. Appropriate file structures, including compatibility with interfacing systems;

5. Number of simultaneous users the software is capable of supporting.

2.4.1.2 Data Base Management System (DBMS)

If the need for a data base management system is warranted, its architecture must be evaluated to determine its suitability to the user's processing requirements for functional capabilities and efficiency.

2.4.1.3 Language

Systems that are designed using existing systems as a base have specific language support requirements. In addition, any system of this type also has a limited number of processors on which it can be used without a major conversion effort.

2.4.1.4 Automated Office System Requirements

These functions should be supported in office information systems: word processing, electronic mail, document storage, calendar planning, and typewriter-quality output.

2.4.2 Hardware

Hardware support must be made in terms of the following considerations.

2.4.2.1 Memory

Memory requirements are related to maximum execution program size (including code, tables, arrays, etc.) and operating system support (e.g., whether or not the system uses virtual memory).

Figure 3. Sample Derivation of System Requirements from User Requirements

In virtual memory systems, the size of buffers and system tables that need to be resident in real memory or shared virtual memory is a consideration.

2.4.2.2 Accuracy

Number of digits of accuracy representable (most often governed by word size) is a deciding factor for applications with stringent accuracy requirements. Applications with numerical processing requirements should examine the arithmetic error-interrupt capability (e.g., overflow, division by zero, etc.) of potential computers.

2.4.2.3 Data Storage

Online storage required by an application consists of the components required by production and development work.

Storage for production work consists of the following components:

1. Production data bases - transaction files, audit files, master files, etc.;

2. Production program and procedure libraries.

Storage for development work consists of the following components:

1. Test data bases;

2. Test program and procedure libraries;

3. Storage taken up by programmers' private files, an estimate of which can be derived by taking the numerical product of the maximum number of programmers who have access to the system, the average number of files owned per user, and the average file size (a worked sketch of this estimate follows at the end of this subsection).

The total online disk storage requirement should be padded by a life-cycle growth factor.

The minimum number of tape drives required equals the maximum number of tapes used simultaneously by the application. If the application uses tapes that are imported from or exported to other systems, tape density is constrained by this factor.
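As a minimal illustration of the private-file estimate and growth padding just described: all of the volumes, file counts, and the growth factor below are assumptions chosen for the example, not figures from this paper.

    # All figures are hypothetical; the arithmetic follows Section 2.4.2.3.
    production_online_bytes  = 1.2e9    # production data bases plus program/procedure libraries
    development_online_bytes = 0.3e9    # test data bases plus test libraries
    max_programmers          = 15       # maximum number of programmers with access to the system
    avg_files_per_programmer = 40
    avg_file_bytes           = 120e3

    private_file_bytes = max_programmers * avg_files_per_programmer * avg_file_bytes
    growth_factor      = 1.5            # life-cycle growth padding (assumed)

    total_online_bytes = growth_factor * (production_online_bytes +
                                          development_online_bytes +
                                          private_file_bytes)
    print(f"Estimated online disk requirement: {total_online_bytes / 1e9:.2f} billion bytes")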

2.4.2.4 Input/Output

Input/output requirements should be examined in terms of the following factors:

1. Input/output frequency and schedule;

2. Input/output volume;

3. Input/output media (form) - terminals, punched cards, graphics, microfiche, page printer, etc. (quality of output is sometimes a consideration, especially for office information systems);

4. Output distribution and utilization;

5. Input/output data protection.

2.4.3 Operating Environment

The operating environment must be considered with regard to the following factors:

1. Physical security and safety of hardware;

2. Geographic location of equipment, which impacts the serviceability and maintainability of parts, since some vendors do not have offices nationwide;

3. The operating environment, which must be understood when selecting terminals for an application. In some installations, many users have access to more than one type of terminal. The number and types of terminals to which the users already have access must be considered.

2.4.4 Telecommunications

Telecommunications applications should consider the following factors:

1. The number and type of terminals, their communications speed, and communications protocol must be specified;

2. The distance of transmission and the proximity of terminals, in order to determine whether or not the use of a cluster controller is cost-effective.

Applications that need to exchange data should be processed either on the same computer or on computers that have the appropriate interprocessor communications support facility.

Volume and speed of communications and their impact on total system performance must be evaluated by the system designer.

2.4.5 Workload and Performance

Workload should be specified in:

1. Number of runs processed per day;

2. Number of interactive sessions per day;

3. Hours of required operation;

4. Average and peak number of concurrent interactive sessions.

Performance specification should be stated in terms of peak load conditions. Peak workload should be specified in terms of the following considerations:

1. Maximum number of users;

2. Peak frequency of transactions;

3. Amount of processing for the average transaction in terms of lines of code (translated into number of machine instructions), number of operating system service calls (such as OPEN, CLOSE, etc.), and amount of input/output (I/O) in terms of records read/written (translated into actual hardware I/O requests);

4. Growth in terms of number of users, number of transactions, and number of records processed per month.

2.5 Criteria for Evaluating Alternative Resources

2.5.1 Support of Existing Resources

Once the size of the application system has been determined, the current capacity of existing feasible equipment should be reviewed and an impact analysis made to see how the additional requirements will impact the overall performance of the equipment. This is most often achieved by modeling techniques.

2.5.2 Cost

Cost is often the bottom line in the selection of a computer resource for an ADP application. Cost estimates must be made on a system life-cycle basis.

The cost for computer system support consists of:

1. Hardware - processor, online storage, and peripherals;

2. Software;

3. Site preparation and space rental;

4. Maintenance - hardware and software;

5. Operator;

6. Environmental control;

7. System programmer, where needed;

8. Materials and supplies;

9. Telecommunications service and equipment;

10. Equipment backup/redundancy;

11. Planning and procurement;

12. Application system support;

13. Training;

14. Management.

Cost factors vary from one computer resource to another. Besides the difference in purchase prices for a microcomputer, minicomputer, and large mainframe computer, the distribution of expenses varies with different types of equipment. Thus, cost-determining factors have been divided into three cost categories: mini/micro costs, in-house large mainframe and satellite processor costs, and teleprocessing service program (TSP) costs.


2.5.2.1 Cost of Mini/Micro Computer Support

When managed by the end user, the training and management costs for a system that uses mini/micro computers can be proportionately greater than those for the large computer environment because of a lack of expertise in DP management. When users choose to operate their own computers, they must be aware of extra costs incurred due to operation of the equipment and coordination required by the staff to accomplish the desired functions. They cannot rely on the data processing department to provide immediate assistance when support is needed in the total system operation (e.g., computer operators, data entry support, etc.). Users must spend more of their human resources to make effective use of the mini/micro computer.

The alternative to users providing their own support is for the data processing department to control the procurement, operation, and maintenance (or a subset of these functions) of the mini/micro installations along with the host and satellite processors. This alternative may prove to be most beneficial to all parties in the long run. Procurement and planning can be done by DP experts. When distributed processing is planned centrally, a more coherent growth can be achieved through standardization of hardware and software. (Even in a multivendor environment, there should be a standard interface with the host and among satellite processors.) The DP department can absorb the cost of management of the processor better because of the economy of scale in the volume of work handled.

The feasibility of having the DP department responsible for the daily operations of the satellite processors depends on the complexity of the system and its geographical location. In view of the high cost of training, one consideration that should be made is the career growth path of persons working on satellite computers.

It is important to evaluate the security provisions of candidate systems. If a system does not offer sufficient support in this area, provisions added by the user can often be very costly.

The cost of hardware is normally low when compared to the total life-cycle support cost of the system. Users who assume that the cost of hardware represents the total cost of the system will find that they have severely underestimated the total cost.

2.5.2.2 Cost of Supporting a Production Application on a Large Processor

Most installations have a charge-back system whether or not real dollars are exchanged. Accounting algorithms vary greatly according to the values placed upon various services. When using the charge-back figures to compare the cost of alternative resources, one should make sure that the algorithms have as their basis that total recovered cost reflects the true cost of operations, including all of the items listed in the previous section, as well as the cost of auxiliary operations services such as keypunching, printing, plotting, etc. (OMB Circular A-121 requires that charge-back systems of Federal agencies account for the full cost recovery of ADP expenditures.)

2.5.2.3 Cost of Teleprocessing Services Program (TSP)

The full cost of TSP service can be derived by adding the following charges to the TSP bill:

1. Conversion;

2. Management;

3. Telecommunications service cost, where needed;

4. Application system support;

5. Training.

2.5.3 Other Considerations

If an analysis of user and system requirements indicates that neither existing in-house ADP resources nor resources available through TSP are appropriate, the acquisition of ADP equipment will be required. The need for a dedicated processor for an application must be justified on a cost-effective basis.

The selection of ADP equipment will be based on system requirements as well as the following considerations.

2.5.3.1 Compatibility with Existing In-House Hardware and Software

The compatibility of hardware resources increases the flexibility of configuration. The compatibility of software reduces the personnel cost of supporting multiple systems. Compatibility with existing in-house ADP resources protects the previous investments in software/hardware and personnel training. Availability of common user facilities such as terminals and remote job entry (RJE) should also be taken into consideration.

2.5.3.2 Ease of System Expansion to Meet Future Requirements

Since the personnel cost involved in a major procurement is high, and the process of a conversion is resource consuming, the ability to upgrade ADP equipment should be weighted heavily in the selection process.

2.5.3.3 Vendor Support

The cost of system unavailability, in terms of management cost and degradation of programmer/user productivity, should be taken into consideration in comparing vendor-quoted or published mean-time-between-failure (MTBF) and mean-time-to-repair (MTTR) statistics.

2.5.3.4 Hardware and Software Maintainability

Geographical location and level of personnel (in-house or vendor) expertise contribute to hardware and software maintainability.

2.5.3.5 Time Constraints

Procurement, installation, and training time must be considered relative to the urgency of users' needs.

2.6 Summary of the Computer Resource Selection Process

Figure 4 summarizes the resource selection process. This process consists of the following steps:

1. Collection of user requirements;

2. Analysis of user requirements. The suitable class of equipment can be identified in most cases by using the principles stated in the data processing department's long range plans as baseline resource selection criteria. In instances where user requirements do not clearly fit any of the blocks identified in Figure 4, subsequent steps of analysis should be taken for all possible configurations. There are also applications for which the processing can be distributed across computers (e.g., data entry performed on a minicomputer and data base management operations performed on the host); separate analysis should be done for both components of the system;

3. Derivation of system resource requirements. Once the alternate resources are narrowed down, the size of the application (in terms of the amount of physical resources necessary to support it) can be determined for each alternative class of equipment;

4. If the resource identified in the second step is available in-house, assessment of the total impact of supporting an additional application to determine if there is sufficient capability, appropriate software, sufficient support personnel, and a requirement to change current operating procedures. Results of the impact study may lead to procurement actions for TSP service, hardware/software, or acquisition of support personnel;

5. If the resource identified in the second step is not available in-house, searching for a solution based on the Teleprocessing Services Program (TSP). If no appropriate resource exists there, procurement actions for equipment of the class identified must be initiated. In addition to the system requirements, the factors discussed in Section 2.5 of this procedure (other than system requirements) should be considered.
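Step 2's screening against the long range plan can be pictured as a small set of decision rules. The sketch below encodes a simplified version of the DOE Headquarters plan described in Section 2.3; the rule set and the requirement flags are illustrative assumptions, not an official algorithm.

    # Simplified, illustrative screening rules based on the long range plan in Section 2.3.
    def candidate_resource_class(req):
        if req.get("classified"):
            return "dedicated classified processor"
        if req.get("online_integration_of_multiple_data_bases") or \
           req.get("large_data_volume") or req.get("heavy_number_crunching"):
            return "Headquarters large scale host computer"
        if req.get("office_automation_only"):
            return "office system (mini/micro) processor"
        return "satellite processor"

    application = {
        "classified": False,
        "online_integration_of_multiple_data_bases": True,
        "large_data_volume": False,
        "heavy_number_crunching": False,
        "office_automation_only": False,
    }
    # Interagency or commercial sharing (plan item 5) is then weighed against this candidate.
    print(candidate_resource_class(application))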

Panel Overviews

OFFICE AUTOMATION AND PRODUCTIVITY

MARY L. CUNNINGHAM

NATIONAL ARCHIVES AND RECORDS SERVICE OFFICE OF RECORDS AND INFORMATION MANAGEMENT WASHINGTON, DC 20408

Productivity is a very real problem in the Federal Government. Many Federal agencies are facing increased demands for services at a time when resources are being reduced. To meet conflicting demands we must find new ways to do our jobs better. That means, among other things, that we must improve information management. Managers of offices are managers of information. Managers are effective in running their various programs only to the extent that they manage the information upon which the programs depend. Needed information must exist in the most convenient place, at the time when it is needed, and in a form that can be used most efficiently.

The most important development in recent years in the field of information management has been the rapid advance - you can even call it a revolution - in information technology.

"Word processing," basically linking high speed electric typewriters with computer capacity, has been part of this revolution. The impact on the Federal Government has been dramatic. In 1979 the General Accounting Office reported that in fiscal 1977 Federal agencies acquired word processing equipment that cost about $80 million. This same report estimated that by this year the annual expenditure for acquiring new word processing equipment would be $300 million. A generally declining price structure for word processing equipment, combined with the increasing pressure for greater efficiency and productivity in written communications, virtually guarantees that by the mid-1980's word processing will be a major growth area. But the fast-moving technology keeps taking new directions. Both Government agencies and private businesses that traditionally produced and acquired information in paper form are turning more and more to other media. Undoubtedly this trend will continue, and even accelerate, as the technologies become more advanced, more effective, and less expensive.

Computer professionals are certainly aware that computers are becoming smaller and cheaper and are being used more widely. They will eventually reach into virtually every office, and into every aspect of information creation, maintenance, and use. With new, simple-to-use minicomputers, ADP support will be used by more and more non-technical professionals and managers. Through the use of communications technologies, access to information will dramatically increase.

Another important trend that must be recognized is the merging of information technologies. Today a single terminal may be a word processor, a data processor, a link in an electronic communications network, or all of these. According to a recent Washington Post article by Will Sparks, Vice President of Citi-Bank Corporation, "All information traveling in the world's information stream is becoming an homogenous flow of digital bits,... and since, in many cases, computation is being performed on these bits of information while they are in transit, the distinction between computing and communicating - or between the computer and the telephone - is vanishing."

All of these trends are of immediate concern to Federal managers. By and large, the ordinary Federal manager is ignorant of the intricacies of computer science and communications technology. They can banter jargon, a little, but must depend on computer professionals to tell them how it all really works. Still, because it's their office and their mission, they must ensure that the technology used is cost effective and that it achieves its potential for controlling the information explosion and increasing worker productivity.


This aspect of management goes beyond the documentation of requirements and the establishment of performance criteria before installing a system. Large, technically complex systems that affect many office processes and procedures may require considerable expertise to understand and operate effectively. Even the smaller, single-function systems - such as micrographic systems for records storage and retrieval - require the continuing attention of the office manager if they are to produce the desired improvements in office performance.

Effective management today, then, requires not only effective managers of information, but also effective management of the technological tools of information management. Non-technical program managers must understand and effectively use office automation. Computer science professionals will have to be patient with those managers and teach them what they need to know. They, in turn, must be more careful to identify and explain what they are trying to accomplish. This is not an easy task. To date there is not even a widely accepted definition of office automation, much less a set of valid standards and guidelines to use in decision making. Office automation is sometimes defined in terms of the equipment used rather than the goals or objectives it is installed to fulfill. Popular descriptions of the "office of the future" represent the tasks of the office as being the same from organization to organization. That view focuses on the form of the work in an office, not its substance. In that view, offices exist to produce documents, to communicate, or to store and retrieve information. If that popular view of office automation prevails, then office automation will not be cost effective nor achieve the goal of increasing overall productivity. "Improvements," or changes in form alone, can rarely justify the expense of installing these technologies. To be effective - to make real improvements in office efficiency and productivity - an office automation system must be carefully designed to support the real mission of that office. It must be designed with the goal, not the process, of the office in mind.

It is easy, as historians of the American experience have recognized, to get carried away by exciting advances in technology. We only have to think back to those days in the 1840's when Samuel Morse's new electric telegraph had stirred American imaginations and entrepreneurs quickly began to wire the country.

One historian of the telegraph, Robert Luther Thompson, has labelled the 1846-1850 period as the time of "methodless enthusiasm." Everybody wanted to get into the act. Hastily, poorly built and duplicating systems were strung throughout the East and into the Midwest as the result of starry-eyed dreams of quick profits rather than in response to needs and carefully considered plans to fulfill them.

This was the period of which Henry David Thoreau wrote in Walden: "We are in great haste to construct a magnetic telegraph from Maine to Texas, but Maine and Texas, it may be, have nothing important to communicate."

The telegraph was one of the great developments of Nineteenth century technology, but its commercial beginnings were notable for waste, misapplications, and failed telegraph companies. One of the best lessons of that experience of "methodless enthusiasm" was that you must think in terms of needs and goals when employing new technologies, rather than seize upon them for unsuitable purposes just because they are "there." Let's hope we remember those lessons as we once again get "wired".

A good place to start a discussion of new technologies is with definitions. Analysts at the Massachusetts Institute of Technology have defined office automation as: "The utilization of technology to improve the realization of office functions, or, rather, the business objectives of an office." This definition stresses increased productivity in the achievement of specific end results.

Seen in this way, the principal benefit of office automation is the increased productivity that results when employees can perform necessary tasks more efficiently. The objective of office automation, then, is not to eliminate paper - and certainly not to produce more paper faster - but rather to help us do the work we need to do, faster and better.

Given the potential for savings, it is understandable that many agencies are interested in office automation, and many cooperative, interagency efforts have been initiated to develop a coordinated approach to it.


Over the past year, for example, GSA, OMB, and OPM have cooperated on a project to assess the use of office automation by Federal agencies. Most of the systems studied during the project were designed for information storage and retrieval, for text editing, or for data analysis, using data processing, data transmission, and word processing technologies.

Many were designed to perform several functions, using more than one technology. All have as their objective improved delivery of public services and better management of Federal programs. For example, in the area of improved delivery of services, the Social Security Administration now uses a telecommunications system to perform necessary files review. The system has significantly reduced the length of time that needy beneficiaries must wait for their checks, and is expected to save the taxpayers about $25 million between FY 1980 and FY 1984. A new Veterans Administration system, called TARGET, is expected to reduce regional offices' staffing by 1,488 work-years over the next 3 years. And, of course, the processing of income tax returns already benefits from office automation. A new IRS system rapidly checks the accuracy and currency of taxpayer identification data, so watch out.

This survey demonstrated that Federal managers are designing and installing a wide variety of office automation systems to perform many different tasks. It demonstrated, further, that there is considerable potential for savings when office automation technology is used appropriately. But we must emphasize that those systems must be installed carefully. Social Security's system, for example, was the result of more than 2 years of system design by a large number of Federal and contractor personnel.

Unfortunately, managers too quick to opt for technological solutions to systemic or procedural problems often find that the results are disappointing. The GAO report cited earlier said that failure to achieve expected savings from the use of word processing equipment was the rule rather than the exception throughout the Federal Government. It has been our experience that a thorough systems analysis is necessary to give managers a solid basis for making informed decisions about what to expect from specific applications of office automation technology. Office automation is no magic solution to such problems as poor clerical skills, backlogs, or poor turnaround time.

We are now in the process of documenting the characteristics of systems and operations having high potential for the cost-effective use of office automation. The standards we expect to develop, based on this research, will cover both the kinds of applications that can benefit from office automation and the methods for studying and evaluating the potential of individual applications. We plan to study the experiences of the office managers, to learn the lessons they have learned, and to share them with others.

Our goal is to help program managers identify and meet their needs. We intend to provide the guidance managers will need to understand what office automation can - and cannot - do for them. Managers must first of all understand the unique features and capabilities of different types of equipment. Secondly, they must be able to identify good applications for this equipment in their own organization.

Finally, they, and we, need to be reminded that office automation can't do it all. We hope that our cooperative efforts over the next year will lead to better identification and documentation of the requirements for designing and implementing effective automated systems in the office.


COMMAND, CONTROL, AND COMMUNICATIONS MANAGEMENT CHALLENGES IN THE 80'S

Robert G. Van Houten HQ Army Command & Control Support Agency

The four speakers in this panel will address a few of the many different aspects of command, control, and communications.

Panel Members are:

Mr. Joseph Volpe

Modernization of the ADP and supporting data communications for the Worldwide Military Command and Control System (WWMCCS), taking place under the WWMCCS Information Systems (WIS) Program, will be discussed. A description of the historical perspective associated with WWMCCS ADP and its status today will be presented. The WIS modernization goals and problems will be covered briefly, along with the associated WIS architectural solutions. Some challenges will be presented which reflect the maintenance automated test (bit/byte) needs which stem from all levels of modern warfare as projected into the 1990's.

Col. D. L. Moore

Discussion of Computer Performance Management (CPM) require- ments and activities for the NORAD/ADC operations systems

(Program 427M). This will include a brief description of the NORAD/ADC operational ADP systems and will cover the CPM program developments and accomplishments to date, including problems encountered and lessons learned, the current status and activities of the CPM program, and a proposed approach for the foreseeable future.


Col. Joe Newman

Opportunities for performance management of automation and communications networks for the Worldwide Military Command and Control System (WWMCCS) will be presented at three levels: the macro level, the intermediate level, and the micro level.

Dr. William Wenker

Communications management and capacity planning problems and goals associated with the WWMCCS system architectures in the 80's will be presented.

ADP COST ACCOUNTING & CHARGEBACK

Chairperson: Dennis M. Conti National Bureau of Standards

Panelists: Kenneth W. Giese FEDSIM

Christos N. Kyriazi Department of Commerce

Ranvir Trehan MITRE Corp.

ABSTRACT

There is no question that the ADP shop plays a vital role in the day-to-day operations and survival of most organizations. However, two distinct views still exist on the nature of the ADP shop within an organization. One view is that the ADP shop is an overhead function that should provide free service to the users. Another view sees the ADP shop as a service center that should run on a profit and loss basis. An extension of this latter view is that the full costs of ADP should be accounted for and charged back to the users. However, whether or not these charges are actually recovered from the users, the mere reporting to the users of their fair share of the total ADP costs is believed to have a beneficial effect by increasing the users' awareness of and accountability for their use of ADP. Most recently within the Federal Government, OMB Circular A-121 requires agencies to implement full cost accounting and, where appropriate, cost recovery. Examples of some of the costs to be accounted for under A-121 are software depreciation and space occupancy costs, in addition to the more traditional ADP cost elements.

Although many sites currently have some form of chargeback system, few account for all of the costs of ADP. For those organizations that are considering implementing a chargeback system, reluctance by both the ADP manager and the user community to make their activities more visible makes such an implementation more difficult.

The panel will address several specific aspects of cost accounting and chargeback. Questions that the panel members will be asked to address include the following:

1. Is the expense of implementing a full cost accounting and recovery system justified by its benefits?

2. Is the actual recovery of costs from the users necessary, or will "information reports" suffice? What are the advantages to be gained by an actual transfer of funds?

3. What are the salient features of a good chargeback system?

4. What are some of the human/political problems associated with introducing a chargeback system into an organization?

5. How does the trend toward distributed systems complicate the cost accounting and chargeback process?

The panel includes individuals with experience both in developing chargeback guidelines and in overseeing the implementation of chargeback strategies in a large organization.

SUPERCOMPUTERS, CPE, AND THE 80'S: PERSPECTIVES AND CHALLENGES

F. Brett Berlin

Cray Research, Inc. 1919 Pennsylvania Ave., NW Washington, DC 20006

The public glamour of minicomputers, microcomputers, and their derivatives (e.g., "distributed systems," "personal computers," "paperless offices," etc.) tends to support the misconception of supercomputing as a soon-to-pass anachronism of use to few beyond the specialized worlds of weapons, nuclear power, or weather prediction and research. Recent events in the supercomputer marketplace, however, indicate that true supercomputing is only in its embryonic stages and that supercomputers will become an even more crucial segment of the CPE practitioner's challenge. While actual numbers of installed systems are still relatively low (less than 40 worldwide, including all vendors), the breadth of users is rapidly extending far beyond the walls of the national research laboratories such as Los Alamos and Lawrence Livermore Labs. The Cray-1S supercomputer, for example, is installed at five scientific service bureaus, several universities, and a few commercial industrial accounts. Even more significantly, the number of identified supercomputer prospects has grown from less than 80 in 1972 to over 300 today. As IBM, Hitachi, and Fujitsu realize already announced intentions to build supercomputer-class systems by the mid-80's, the CPE challenges now faced by only a small coterie of supercomputer users will extend to most sectors of the community. The primary objective of this seminar is to identify these challenges and to discuss directions of supercomputer technologies for the mid and late 80's.

During this session, panelists will discuss their current work in the supercomputer field and will identify the key current and future issues which affect supercomputer development. Some of the major issues to be discussed include:

1. Major CPE Challenges to Current Supercomputer Users: The CPE practitioner is faced with the same "classic" questions common to large systems, but the unique architectures and performance characteristics may make common CPE tools of only marginal value, and previous results from conventional architectures useful as a baseline only.

2. Problem Architectures vs. Supercomputer Architectures: Much of the speed advantage in a supercomputer results from parallel architectures. These can be based upon pipelines, processor arrays, or parallel processors. However, current designs are seriously constrained in usefulness by "problem architectures" - or the way the problem and algorithm is designed to take advantage of certain architectures. There are significant costs to tailoring problem software to meet special architectures, both in conversion and future transportability, but there are some cases in which these costs may be reasonable when compared to potential performance savings.

3. Vendor-developed Software Tradeoffs: "Main line" vendors, such as IBM, DEC, etc., have developed software-rich environments in which most users rely heavily upon the vendor for the majority of their system software and utilities needs. Standard packages abound for almost every application, allowing virtually any user to "computerize" without sophisticated in-house development. Supercomputer vendors and users alike are faced with a difficult set of questions which determine the desirability of development and support of software functions expected by other market sectors. One such function is virtual memory, which has been implemented by one supercomputer vendor but rejected by another as detrimental to optimum performance.

4. Directions of Future Architectures: Supercomputer researchers and manufacturers expect to achieve dramatic performance goals before 1990. In order to accomplish these goals, developers must make certain assumptions concerning both technology and the future application environment.

SESSION PANELISTS

William Alexander
Los Alamos National Laboratories, Computer Division
Los Alamos, NM 87545

Richard McHugh
Cray Laboratories, Inc.
3375 Mitchell Lane, Boulder, CO 80301

Art Lazanoff
Control Data Corporation
8100 34th Avenue South, Minneapolis, MN 55440

Jack Worlton
Los Alamos National Laboratories, Computer Division
Los Alamos, NM 87545


ADP CHALLENGES IN MEDICAL, PERSONNEL, AND WELFARE SYSTEMS

Chairperson: DINESH KUMAR Social Security Administration

Panelists

Dr. Joe H. Ward, Jr. Air Force Human Resources Laboratory

Mr. Paul R. Croll Office of Personnel Management

Lieutenant Colonel Joe Ribotto, Jr. TRIMIS Program Office

Colonel George A. McCall Air Force Manpower and Personnel Center

Mr. Dave England Texas Department of Human Resources

Mr. Norman Clausen Internal Revenue Service

This session highlights the needs, constraints and opportunities associated with automatic data processing applications in the fields of medical, personnel and welfare systems. The state-of-the-art approaches that are being undertaken by senior managers involved in the design, development and enhancements of the ADP system will be presented. The difficulties encountered in planning, consolidation and acquisition of computer and communication systems will be explored. This session includes several planned presentations by the panel members followed by questions and comments from the audience.

The panelist presentation in the area of medical systems will concentrate on the discussion of ADP systems and approaches being undertaken in the area of medical management information systems. The functional capabilities of these systems will be described along with their status and difficulties encountered during implementation.

In the area of personnel systems, the planned presentations will cover the latest concepts in planning, design, and research on personnel action systems. The procedures for the implementation of personnel policy through computer-based systems will be discussed. The policy-generated models result in pay-off (utility) values for various person-job actions. The techniques for generating alternative pay-offs and their optimization will be discussed. The current and future plans for enhancements of an existing manpower and personnel data system will be discussed to illustrate the phased implementation of research concepts. Reference will be made to the operational Procurement Management Information System (PMIS). The other planned presentation will cover the automation opportunities in conducting tests/examinations along with related psychometric considerations.

The difficulties encountered in planning and consolidation of ADP requirements and future needs, due to several independently run data processing centers, (as compared to a primarily centralized data processing operation) will be highlighted.

In the area of welfare systems, emphasis will be on the difficulties experienced in managing large ADP and Telecommunications systems along with a discussion of constraints faced by various installations.

INFORMATION SYSTEMS AND PRODUCTIVITY IN STATE & LOCAL GOVERNMENT

Ron Cornelison

Office of Administration State of Missouri

Persons attending this session will gain an understanding of the problems and opportunities present in the public sector at the state and local level relating to applying technology to achieve organizational objectives. The three branches of government (Executive, Legislative, and Judicial) will be represented on the panel. The panelists are: Mr. Christopher Burpo, Arthur Young and Company, San Antonio, Texas; Mr. Steven Claggett, Regional Justice Information Service, St. Louis, Missouri; Mr. Ron Cornelison, Missouri Division of EDP Coordination, Jefferson City, Missouri; Mr. J. C. Humphrey, Texas Legislative Council, Austin, Texas; and Ms. Carolyn Steidley, Missouri State Court Administrator's Office, Jefferson City, Missouri.

The panelists will discuss the missions of their respective governmental agencies and how automation has been applied to increase organizational productivity. Questions and comments from persons attending will be welcome in this session.


THE ADP PROCUREMENT PROCESS - LESSONS LEARNED

Terry D. Miller

Government Sales Consultants, Inc. Annandale, VA 22003

The purpose of this session is to review for listeners the latest techniques in ADP competition, the mistakes that others have made, the latest rules, etc., so that we will not reinvent the wheel.

Colonel Donald W. Sawyer or a member of his staff will discuss the Air Force study on how to improve the procurement process by assisting field offices on the procurement learning curve.

Steve Lazerowich, Federal Proposal Manager of Digital Equipment Corporation in Lanham, will discuss how a mini vendor looks at government procurements.

I will discuss the latest regulation changes and provide continuity and a wrap-up.

I am not sure you can say we will advance the state-of-the-art. In the procurement business most people are still learning how to walk. We will provide detailed lessons on how to walk without falling into snakepits.


DATABASE MACHINES

Panel discussion chaired by: Carol B. Wilson

Fiscal Associates, Inc. Alexandria, VA 22304

Over the past 10 years, the use of data base management systems has been growing as the need to effectively manage and display data has increased. This use of data base management systems has come about because they offer several advantages: functional capability, online query capability, and reduced manpower. A DBMS affords the user such things as data protection, concurrency control, audit trails, and backup/recovery features, which he does not have to program or design himself. Likewise, manufacturers of DBMS's have developed sophisticated, easy-to-use online query capabilities which permit the user to peruse his data easily, and with no programmer intervention. DBMS's help to reduce manpower requirements through the provisions of data independence, already programmed data management procedures, flexibility in changing data base design, and the ability to quickly respond to ad hoc inquiries.

Unfortunately, these attractive advantages do not come without a price. Most data base management systems suffer in both the performance and cost areas. They often place extraordinary requirements on the hardware because they are large and complex and because they operate on general purpose operating systems and on general purpose architectures. Cost is also a factor in the DBMS environment as more hardware is purchased to accommodate the increased resource demands. Hidden costs may also arise from service and performance problems affecting other, non-DBMS users of the ADP installation. The performance problems brought about by the extraordinary requirements on the hardware and operating system can, and do, lead to decreases in ADP installation productivity. Another problem in the Federal sector comes about because many DBMS's are dependent upon the hardware and operating system. The lack of machine independence leads either to less competition in future procurements or to expensive conversions.

The advent of data base machines could possibly be a solution to the problems mentioned above: decreased ADP installation productivity and hardware dependence. The data base machine is a backend system with both software and hardware specifically designed for efficient data base operations. The resource demands normally placed on the ADP system for data manipulation are off-loaded to the data base machine. Changes in mainframes may be accommodated by the development of interfaces from the main system to the data base machine which would afford some hardware independence to the data base system.

The panelists will discuss why and how the productivity of an ADP installation could improve through the use of data base machines. They will describe the communications interface and the unique architectural features.


Tutorial Overviews


MEASUREMENT, MODELING, AND CAPACITY PLANNING THE SECONDARY STORAGE OCCUPANCY ISSUE

H. Pat Artis

Morino Associates, Vienna, VA 22180

During the past decade, the economics of computer performance have undergone a profound transition. Although hardware costs were dominated by the central processor in the early 1970s, today the relative costs of the central processor and secondary storage are approximately equal for most large installations. An example of this cost transition is shown for a FORTUNE 50 corporation in Table 1.

Table 1: Typical Installation Growth 1970 - 1980

                        1970               1980             Percent Change
  CPU Configuration     360/65J, 360/40F   3033U8, 3032U4
  CPU MIPS              0.8                7.0              875
  DASD Configuration    16 2314s           128 3350s
  Bytes (billions)      0.46               42.0             9150
  MRC Expense: CPU      $66,060            $134,400         203
  MRC Expense: DISK     $12,360            $108,800         880
  Ratio DISK/CPU        0.19               0.79             420
  Floor Space           25,000             80,000           320
    CPU                 6,000              9,000            150
    DISK                5,000              40,000           800

In 1970, the monthly rental cost (MRC) of the corporation's CPUs was six times greater than the cost of secondary storage. Today, these costs are approximately equal. Although the corporation has witnessed a growth in CPU power of 880 percent, their secondary storage space has grown by more than 9,150 percent to 46 billion bytes. Moreover, secondary storage now represents the majority of the floor space, power and cooling costs for the installation.

For users of smaller systems like the IBM 4331 or the IBM 4341, the cost transition has been even more dramatic. It is not uncommon for the secondary storage costs of a 4300 class system to be two or three times the processor cost. Clearly, computer performance evaluation (CPE) must address the issue of secondary storage occupancy to control the costs of data processing in the 1980s.

CPE in the 1970s has primarily concentrated on the measurement and modeling of the I/O traffic between the central processor and the secondary storage devices. Unfortunately, very little effort has been expended on understanding how efficiently the space on these devices is being used. This tutorial addresses the following topics relevant to the secondary storage occupancy issue:

MEASUREMENT. The available data and design considerations for a secondary storage occupancy measurement scheme will be reviewed. Applications of data from a prototype system developed by the author will also be discussed.

SPACE MANAGEMENT. In the past several years, a number of secondary storage management systems (HSM, DMS/OS, ASM2, etc.) have evolved that allow installations to automatically archive datasets that have been unreferenced for a specified number of days. Modeling techniques that allow users to predict the behavior of a secondary storage management system in their installation and to judiciously select the archiving criteria will be discussed.

DEVICE ARCHITECTURE. Although the most common statistic quoted by the vendors about secondary storage devices is that the cost per byte has steadily declined, other statistics are somewhat more valuable. Beginning with the announcement of the double density 2314, the 3330, 3330-11, 3350, and 3380 have represented a steady decline of the accessibility per byte (i.e., maximum I/O power divided by device capacity). In many installations, it is not uncommon to find that only thirty or forty percent of the datasets on the busiest packs are accessed even once per week. Therefore, any attempt to effectively use the remaining space on these devices may result in unacceptable I/O scheduling bottlenecks. This problem will require the development of dataset placement algorithms that attempt to optimize an entire installation rather than a single pack.

CAPACITY PLANNING. An approach to DASD capacity planning that incorporates device occupancy, performance and architecture considerations will be discussed. This approach is currently being evaluated by the author.

The audience will also be invited to propose other topics for discussion.
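The space management point lends itself to a small worked example. The sketch below is a minimal illustration, not the author's prototype or any of the products named above, of how an archiving criterion can be evaluated against a dataset inventory: for each candidate threshold of unreferenced days it totals the space that would be migrated. All dataset names, sizes, and reference ages are invented.

```python
# Sketch: predicting space reclaimed by an archiving criterion.
# Illustrative only; dataset names, sizes (MB), and days since last
# reference are invented, not taken from the tutorial or any installation.

datasets = [
    ("PROD.PAYROLL.MASTER",  120,   2),
    ("TEST.COMPILE.LISTING",  40,  95),
    ("PROD.GL.HISTORY",      300,  45),
    ("USER.TSO.SAVE",         15,  10),
    ("ARCHIVE.OLD.REPORTS",  200, 400),
]

def space_reclaimed(datasets, unreferenced_days):
    """Total megabytes archived if every dataset unreferenced for at
    least `unreferenced_days` days were migrated to secondary storage."""
    return sum(size for _, size, age in datasets if age >= unreferenced_days)

total = sum(size for _, size, _ in datasets)
for threshold in (30, 60, 90, 180):
    freed = space_reclaimed(datasets, threshold)
    print(f"{threshold:>3}-day criterion: {freed:5.0f} MB freed "
          f"({100 * freed / total:4.1f}% of {total:.0f} MB)")
```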


MICRO-COMPUTERS IN OFFICE AUTOMATION

Wendell W. Berry

National Oceanic and Atmospheric Administration, Office of Budget and Resource Management, Rockville, MD

Key words: micro-computers, office automation.

This tutorial describes procedures followed in office automation projects and emphasizes the significant part played therein by micro-computers. Applications are those experienced over a six-year period of involvement in administrative automation within the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce.

The administrative support environment of a service-oriented agency broadly defines the scope of automation projects discussed. Criteria were developed to guide the institution of such projects and to measure their effectiveness. A systematic approach was adopted, beginning with defining the functions of the specific target office.

Office functions, and the tools and procedures used to fulfill them, were of immediate concern. These helped us form an answer to the most important question: who was doing the work, and what skills and attitudes toward system changes were present. Most functions were found to have important inter-organizational aspects to them. A basic management tool became the categorization of functions into "administrative processes" that together define the management of resources in the organization.

Finance, personnel, facilities, material, and general administrative services were the five major groups defined. Administrative divisions and technical operating units contained portions of these processes. A high degree of inter-relationship among the groups also existed, and the integrated automation of all of them could be described as the ultimate office automation goal.

Within the macro structure that provided a dynamic system definition, individual objectives were defined. Minimal system disruption and cost savings/avoidance were often cited by those initiating such projects. Tasks were scheduled based upon opportunity, politics, and an overall attempt to develop basic system modules as a primary priority.

Two basic methods of automation emerged. Integration of existing data into routine work procedures was the first. Data found in existing batch systems formed this basis. Automation of procedures not previously economical was the second.

The necessity of micro-computers in the above described scenario becomes clearer upon consideration of their application. In their role as "personal" computers, micros provide an interface between humans and the power of computers, essential to the further development of office automation. Following is an introduction to some areas of micro-computer uses in administrative computing.

Information Processing

1) Word Processing - As price reductions and market competition begin to bring the multi-purpose micro into the office and the home, word-processing is becoming a specialized software application, as opposed to a problem with a hardware-dominated solution. Word-processing is now introduced to many as a by-product of the acquisition of a micro-computer system, rather than as an end unto itself.

2) Data-Entry - Exotic forms of rapid data entry are presently utilized in large volume applications, where their cost may be amortized over many thousands of source documents. For other systems, many key-punch operations still continue, and much data is also keyed in an on-line environment into a mini- or maxi-computer. The dis-economy of using computers at human speed provides yet another opportunity for efficient use of the micro-computer as a "front-end" device to other computers.

Information Communication

1) Data Transmission - Within the last few years, much research and development work has been concentrated upon communication systems enhancements. The use of micro-computers as part of an information communication network has emerged as a result of this emphasis.

2) Management Information Systems - Interactive graphics has been made available at an individual manager's level on a practical basis by the micro-computer. The much talked about "what-if" questions may be quickly and vividly answered by graphics, and most recently, in color.

3) Electronic Mail - Electronic mail networks now exist on many mini and maxi computers. The proliferation of words in magnetic form allows micros to perform the function of a personal electronic filing cabinet. Off-line information storage on inexpensive media is required for large-scale communications via computer.

Self-Contained Systems

1) General Development - Shared use of micro-systems has caused many small manual systems to be automated. BASIC remains the dominant micro language, with many "dialects" in use. Pascal, Fortran and Cobol also compete for system use.

2) Micro DBMS - An answer to the language conflict, and also a part of the industry-wide emphasis upon DBMS, micro DBMS are now emerging. Non-data-processing professionals now follow a systematic approach to development of their systems.

Current Developments

1) Human Engineering - Error detection produces perhaps 40% of the total amount of program coding in many applications. Reducing screen scrolling and providing audio and flashing screen error indications are examples of features designed with the end user in mind.

2) Increased computational power - More eight-bit micros are now in use than all other types of machines combined. Sixteen-bit machines are now in many product lines, and thirty-two-bit chips are a reality. These will put mini-computer power on desktops within a short period of time.

3) Increased storage - Hard disks have arrived for micro use, bringing with them capacities measured in millions of bytes. Implications for future expansion of micro usage are wide ranging, considering their projected capabilities.

4) New designs - Microcode implementation in hardware to support the ADA language, and the delegation of interrupt handling to subsidiary 16-bit processors in the new 32-bit chips, are examples of a significant departure from traditional architecture.

COMPUTER PERFORMANCE MANAGEMENT IN THE ADP SYSTEM LIFE CYCLE

James G. Sprung

The MITRE Corporation McLean, VA 22102

During the last ten to fifteen years, computer performance has evolved into a sophisticated manager-analyst interactive process. The reason for this is the recognition by government ADP managers, as well as non-government managers, that a need exists for doing computer performance work throughout the ADP system life cycle. The manager continuously faces requests from users and oversight agencies for computer resource availability data.

The goal of this tutorial is to provide the framework for a continuing computer performance management program throughout the ADP System Life Cycle. The tutorial will include descriptions of:

1) the ADP System Life Cycle,

2) the types of performance studies, and

3) the types of performance tools.

The different tools will be described along with how they can be used to support the performance studies. The performance studies will be described along with how they should be used throughout the ADP System Life Cycle. Throughout the tutorial, examples will be presented that describe the use of each of the tools for different types of studies during each phase of the life cycle.

Computer Performance Management must continuously be of service to managers to be valuable as an agency function. Therefore, the need of the CPE analyst to both perform studies and present meaningful results is critical. The key to the continuous aspect of the computer performance function is the analyst's ability to recognize when the next stage of the life cycle is occurring and to perform the studies that are appropriate at that time. Therefore, the computer performance staff needs to serve both a planning and a monitoring function. The ADP resource, to most agencies, is critical to the agency's existence, and with the help of a computer performance staff, the ADP resource can provide adequate capacity to meet the agency's needs.



SIMULATION TOOLS IN PERFORMANCE EVALUATION

Doug Neuse K. Mani Chandy Jay Misra Robert Berry

Information Research Associates 911 West 29th Street Austin, TX 78705

The selection of modeling tools for performance evaluation is discussed. Simulation and analytic tools are discussed and compared. A simulation system designed to accept computer readable, picture-oriented models is described.

Key words: Analysis; performance evaluation; pictures; simulation; tools.

1. Introduction

This paper is concerned with the question of selecting the modeling tool most appropriate to a performance evaluator's problems. We first compare simulation and analytic methods. The advantages of each method are discussed. We next study two common types of problems faced by performance evaluators, and describe the method of choice (simulation or analysis) for each type. The important characteristics of good performance tools are discussed next. The importance of ease of input to the modeling system is emphasized. A case is made for using pictorial or diagrammatic representations of computer systems as the input to simulation systems. A simulation system that is designed to accept such (computer readable) pictorial descriptions is described.

2. Modeling Methods: Simulation Versus Analysis

There are two approaches to the modeling of computer systems: simulation and analytic. Analytic models consist of sets of equations, usually derived from queueing theory. Discrete-event simulation models simulate the generation of transactions and the processing of transactions over time. The tradeoffs between simulation and analysis are described next.

2.1 Advantages of Simulation

2.1.1 The simulation methodology has been validated over and over again. There is absolutely no question about the power of the simulation approach. Given enough time and enough measured data, it is possible to simulate any system to any required degree of fidelity.

2.1.2 In contrast, the analytic approach is limited in power: there are no general analytic techniques that apply to every situation. Different sets of equations have been developed to model different systems. When a set of equations yields, for one system, predictions proven to be correct, our faith in the ability of that one set of equations to model the given system increases; however, our faith in the ability to model all systems by similar sets of equations should not increase. There is no way of validating a general analytic methodology because there is no general analytic methodology. Each analytic model is usually a distinct set of equations. At this time, there is no tractable method to generate a set of equations to model any given system faithfully. There are classes of analytic models, such as Markov models, that are very general. However, such models are not always tractable because the number of equations in a model may be too large to be solved in a reasonable amount of time.

2.1.3 Simulation is easily understood by programmers. It is much easier to hire programmers with some experience in simulation than it is to hire analysts with expertise in analytic models.

2.1.4 Simulation is flexible. A simulation model can always be changed to track changes in the system being modeled. An analytic model may work well for a system initially, but poorly when the system is changed. It is not easy to change a set of equations to model the effect of a new scheduling policy, for instance.

2.2 Disadvantages of Simulation

2.2.1 Simulations require much more computer time than analytic models.

2.2.2 There may be a paucity of measurement data from the system to be modeled. If we do not have the measurements to build a detailed model, there is no point in building a detailed model. Analytic methods may work as well as simulation for simple models. Simulation models are justified when the data to parameterize simulation models exists.

3. The Method of Choice

The method of choice -- simulation, analysis, or a hybrid approach combining simulation and analysis -- depends upon the problem being solved. Consider different situations in which models are used.

3.1 The Design of New Systems

Important characteristics of design efforts include the following.

3.1.1 Designs often change before they are implemented. Models must be able to track rapid, and sometimes radical, changes in designs.

3.1.2 The model must be understood by the designers. This point is very important. If designers do not understand a model, they are less likely to cooperate in the modeling effort. Active cooperation from designers is an absolutely critical ingredient for a successful design.

3.1.3 It is difficult to validate a model when the design is still on the drawing board. There is nothing to validate against.

Simulation is the method of choice in designing new systems because of its flexibility and ease of understanding.

3.2 Tuning and Capacity Planning

In this case, the systems being modeled are well-defined. In most cases, the systems have been on the market for some time and documentation describing system behavior is provided by the system vendor. Systems on the market do not change as radically or as frequently as designs on the drawing board. In capacity planning, the modeler knows what is being modeled. The ability to measure running systems gives the modeler the opportunity to carry out intensive validation studies.

Analytic models can be used effectively for tuning and capacity planning studies. Validation against measured data should provide credibility for the model. Analytic models can produce output almost instantaneously, allowing the capacity planner to formulate and ask "what if" questions and get answers, interactively. Interactive dialogue with the model is very helpful, especially if the model produces graphic output and English-language-like textual output. Analytic models may be backed up by simulation models to enhance credibility and for very detailed analyses. In this mode, analytic models are used to evaluate a wide range of options rapidly, and simulation models are used for detailed study of a small set of points.

4. Critical Requirements for Modeling Packages

All good modeling packages must satisfy the following requirements:

o A model should be shown to validate.

o Ease of input. A model should be described succinctly and naturally.

o Useful output. A model should predict the performance metrics desired by the modeler.

In fine-tuning and capacity planning, most of the input to the model should come automatically from measurement and monitoring packages. In designing new systems, model input should come naturally from the descriptions of the designs.
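To make the simulation-versus-analysis tradeoff concrete, the sketch below (an illustration, not part of the paper) treats a single M/M/1 queue, for which the standard analytic result gives a mean response time of 1/(mu - lambda), and estimates the same quantity by simulation using Lindley's waiting-time recursion. The arrival and service rates are invented, and the point is simply that the analytic answer is instantaneous while the simulation needs many sampled customers.

```python
import random

def analytic_response_time(lam, mu):
    """Mean response time of an M/M/1 queue (standard queueing-theory result)."""
    assert lam < mu, "queue must be stable"
    return 1.0 / (mu - lam)

def simulated_response_time(lam, mu, n_customers=200_000, seed=1):
    """Estimate mean response time by simulating the queue with Lindley's
    recursion: wait[k+1] = max(0, wait[k] + service[k] - interarrival[k+1])."""
    rng = random.Random(seed)
    wait, total_response = 0.0, 0.0
    for _ in range(n_customers):
        service = rng.expovariate(mu)
        total_response += wait + service            # response = wait + service
        interarrival = rng.expovariate(lam)
        wait = max(0.0, wait + service - interarrival)
    return total_response / n_customers

lam, mu = 0.8, 1.0   # illustrative arrival and service rates
print("analytic :", analytic_response_time(lam, mu))    # 5.0
print("simulated:", simulated_response_time(lam, mu))   # close to 5.0
```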


Since tuning and capacity planning have been described at length, the remainder of this paper is focused on the use of simulation models in the design of new systems.

5. Simulation Languages

Most simulations are written in FORTRAN (or some other general purpose language) or GPSS. GPSS has the advantage of providing structures specifically for simulation, whereas these structures have to be explicitly encoded in FORTRAN. GPSS is a general purpose simulation system; it was not designed specifically for modeling computing systems. SIMPL/1 is another general purpose simulation language which is becoming popular since it provides simulation facilities and retains the capabilities of PL/1. In the last decade, languages designed specifically for simulation of computing systems have been developed. ASPOL, QSIM, and RESQ are pioneering examples. We next describe a system called PAWS which has the same design philosophy as QSIM and RESQ. The entire design of PAWS is based on one key observation: computer professionals prefer to describe their systems by drawing pictures. The ideal simulation system should take these pictures (or some representation of these pictures) as input. In practice, a modeler using a general purpose simulation system has to translate a graphical description of a computing system into a procedural description. This translation is often demanding of the modeler's time. The translation process is also the source of several errors. PAWS is designed to take a (computer readable) representation of a designer's picture as input. There is little translation required.

The most important question for developing a simulation modeling system ought to be: how do computer professionals describe computer systems to one another? We will address this question in this paragraph. Programmers and architects describe systems by drawing diagrams which are usually in the form of graphs. System resources are represented as nodes (vertices) in the graph. Edges from one node to another show the sequence in which transactions use resources. Usually, computer professionals use a hierarchy of diagrams to describe a system. At the top level of the hierarchy, a system is blocked out into large aggregates of resources; for instance, the entire I/O system may be represented by a single node in a graph. The model is refined at lower levels in the hierarchy. The description of the model refinement is itself another diagram. For instance, we may draw a diagram for an I/O system, where the nodes (resources) are channels, controllers, DASDs, etc., and the edges show the sequence in which these resources are acquired and released.

Figures 1 and 2 show a hierarchical series of diagrams describing a swap-based time-sharing system [1]. Figure 1 shows a network with two resources: terminals and a "computer". Figure 2 shows a refinement of the computer node, which represents an aggregate of system resources. The entry and exit ports of the computer node in Figure 1 are shown in Figure 2. Figure 2 shows that a computer really consists of CPU, I/O and MEMORY resources. Jobs entering the computer node acquire memory, then use the CPU and I/O a random number of times, release memory, and leave the computer node.

[Figure 1. A Simple Model: Top Level of the Hierarchy. A network of two nodes, "terminal" and "computer".]

[Figure 2. A Model of the Computer Node: Next Level of the Hierarchy. A job acquires memory, uses the CPU, visits the I/O subsystem and returns to the CPU with probability 9/10, or releases memory and leaves with probability 1/10.]

If we wished to, we could refine the description of the I/O subsystem further. The important point to note is that in a systems engineer's mind, a model is a picture.

To analyze the model we need more information than the basic topology provides. Specifically, we need to know the scheduling disciplines by which the resources are managed, and the amount of each resource requested by each user.
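The idea of a computer-readable picture can be made concrete with a small sketch. The data structure below is not the PAWS input language; it only illustrates how the two-level model of Figures 1 and 2 might be captured as nodes, disciplines, and probabilistic edges, with a short random walk to confirm that a transaction can traverse the hierarchy. The field names and layout are invented.

```python
import random

# A "picture" of Figure 1 (top level) and Figure 2 (the refined computer
# node), written as plain data.  Illustrative only; not PAWS syntax.
model = {
    "terminals":               {"kind": "source/sink",
                                "next": [("computer.acquire_memory", 1.0)]},
    "computer.acquire_memory": {"kind": "memory", "discipline": "FCFS first-fit",
                                "next": [("computer.cpu", 1.0)]},
    "computer.cpu":            {"kind": "server",
                                "discipline": "round-robin fixed quantum",
                                "next": [("computer.io", 9/10),
                                         ("computer.release_memory", 1/10)]},
    "computer.io":             {"kind": "subsystem",
                                "next": [("computer.cpu", 1.0)]},
    "computer.release_memory": {"kind": "memory",
                                "next": [("terminals", 1.0)]},
}

def walk(model, start="terminals", rng=random.Random(0)):
    """Follow one transaction through the picture and return the nodes visited."""
    path, node = [start], start
    while True:
        targets, weights = zip(*model[node]["next"])
        node = rng.choices(targets, weights=weights)[0]
        path.append(node)
        if node == "terminals":      # back at the terminals: one cycle complete
            return path

print(" -> ".join(walk(model)))
```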

Normally, when a systems programmer says, "the CPU is managed according to a round robin fixed quantum discipline with a 20 ms quantum," he expects to be understood. Unfortunately, most modeling systems do not "understand" terminology which is standard in the computing profession. So, to build a model of a computer system using a general purpose simulation system, the modeler has to translate the discipline, "round-robin-fixed-quantum," into a sequence of small operations such as "acquire processor," "hold for 20 ms," "release processor," etc. It is this detailed translation procedure which discourages people from using simulation models.

Memory is an important resource in computing systems, and most of the common memory management disciplines are described in the literature. A systems engineer is likely to describe a system by saying "system X has 512K words and uses a first-come-first-served, first-fit discipline." The statement is quite precise, but it may well require 300 lines of code in a general purpose simulation language. Generating this code takes time and may cause errors.

PAWS was designed to overcome these problems. PAWS has a large collection of routines to model most of the scheduling disciplines and resources one comes across in computing systems. For instance, PAWS "understands" the meaning of a "round-robin-fixed-quantum" discipline, and the meaning of 512K words of memory managed using a "first-come-first-served, first-fit discipline." The input to a PAWS model is designed to represent the way in which one computer systems designer describes a system to another designer.

Resources modeled in PAWS include service resources (such as CPUs), memory, "passive" resources such as channels and controllers, and locks. Disciplines modeled in PAWS include first-come-first-served, preemptive priority, non-preemptive priority, round-robin-fixed-quantum, and polling. Memory management disciplines include first-fit, best-fit, and buddy system. PAWS also has high-level language statements for generating different kinds of transactions, modeling parallelism (forks and joins), messages, routing, and processor interference (as in CPU-channel interference for memory). Users can also define their own types of resources (by writing FORTRAN programs) and then create PAWS models using these resources.

PAWS allows for succinct descriptions of complex computer systems. Users are supplied with documented PAWS source code, which is also written in FORTRAN. PAWS is not a black box. Users may study the library of PAWS routines if they wish. Indeed, users are encouraged to add to the library and are provided with the facilities to do so.

6. Disadvantages of PAWS

PAWS was designed specifically for modeling computing systems, office automation systems, and information systems in general. However, many valuable constructs and scheduling disciplines in PAWS do not have application in modeling systems other than information systems (though its interface to user-written FORTRAN code allows a great deal of flexibility).

7. Summary

This paper is concerned with the question, "what are the characteristics of a modeling package which performance evaluation experts find most useful?" We showed that there are two distinct kinds of performance evaluation personnel: (a) those involved in designing conceptually new systems; and (b) those involved in tuning and capacity management. We gave reasons why: (a) simulation is the method of choice for those designing new systems; and (b) analytic techniques backed by simulation capability are the methods of choice in the tuning/capacity management situation. We emphasized that the most important characteristic of any modeling package is ease of user input. We observed that systems designers describe systems to one another by drawing pictures. We described a simulation system (PAWS) which takes as input a representation of these pictures. The user of the system can add new modeling constructs and thus tailor the modeling system to the user's unique needs. We think that each performance evaluation group should adopt a convention for the pictorial representation of the systems that it evaluates. PAWS provides a convenient tool for mapping the pictorial representation into a running, accurate simulation.

References

[1] R. M. Brown, J. C. Browne, and K. M. Chandy, "Memory Management and Response Time," CACM, Vol. 20, No. 3, pp. 153-165.

COMPUTATIONAL METHODS FOR QUEUEING NETWORK MODELS

Charles H. Sauer

IBM Thomas J. Watson Research Center Yorktown Heights, NY 10598

A major objective of computing systems (including computer communication systems) development in the last two decades has been to promote sharing of system resources. Sharing of resources necessarily leads to contention, i.e., queueing, for resources. Contention and queueing for resources are typically quite difficult to quantify when estimating system performance. A major research topic in computing systems performance in the last two decades has been solution and application of queueing models. These models are usually networks of queues because of the interactions of system resources. For general discussion of queueing network models of computing systems, see C.H. Sauer and K.M. Chandy, Computer System Performance Modeling (Prentice-Hall, 1981), and recent special issues of Computing Surveys (September 1978) and Computer (April 1980).

In the fifties and early sixties Jackson showed that a certain class of networks of queues has a product form solution in the sense that

$$P(n_1, \ldots, n_M) = \frac{1}{G} \prod_{m=1}^{M} X_m(n_m)$$

where $P(n_1, \ldots, n_M)$ is the joint queue length distribution in a network with $M$ queues, $X_m(n_m)$, $m = 1, \ldots, M$, is a factor obtained from the marginal queue length distribution of queue $m$ in isolation, and $G$ is a normalizing constant.

The existence of a product form solution makes tractable the solution of networks with large numbers of queues and/or large populations, but the computational methods are non-trivial unless the normalizing constant itself is easily obtained. This tutorial surveys the most important computational methods for product form queueing networks.
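As a concrete illustration of one family of such computational methods (this example is not part of the tutorial text), the sketch below computes the normalizing constant with Buzen's convolution algorithm for a closed network of fixed-rate (load-independent) queues, where each factor reduces to $X_m(n_m) = \rho_m^{n_m}$ for a relative utilization $\rho_m$. The relative utilizations and population used in the example are invented.

```python
def convolution_G(rho, N):
    """Buzen's convolution algorithm for a closed product-form network of
    fixed-rate queues.  rho[m] is the relative utilization (visit ratio
    times mean service time) of queue m; returns the list G(0), ..., G(N)."""
    g = [1.0] + [0.0] * N          # normalizing constants with no queues folded in yet
    for r in rho:                  # fold the queues in one at a time
        for n in range(1, N + 1):
            g[n] += r * g[n - 1]
    return g

rho = [1.0, 0.6, 0.4]              # illustrative relative utilizations
N = 5                              # customers in the closed network
g = convolution_G(rho, N)

# With queue 0 as the reference queue (visit ratio 1), standard results give:
throughput = g[N - 1] / g[N]
utilizations = [r * throughput for r in rho]
print("G(N) =", g[N], " throughput =", throughput, " utilizations =", utilizations)
```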



PRODUCTIVITY IN THE DEVELOPMENT FUNCTION

Phillip C. Howard

Applied Computer Research Phoenix, AZ 85021

Productivity in the development organization is a function of several influences. Among these are influences which are strictly external to the development group, and over which it has little control, the collection of tools and techniques available to support development personnel, and the set of management policies and procedures which control the development function. The combination of these influences creates a development environment which yields some level of productivity. This tutorial will address each of the key elements of this environment as depicted in Figure 1: the characteristics of the development organization, outside influences, tools and techniques, management control, and finally, productivity and its measurement.

To understand productivity in a development environment, it is necessary to understand something about the split between maintenance and new application development, the software development life cycle, and the allocation of effort to the various phases of the life cycle. Most studies have shown that an inordinate amount of effort is expended in the test and integration stage, which is directly related to errors and their causes. Various studies of errors, when and how they occur, and how they can be minimized are discussed. This ties in with the life cycle and the allocation of manpower to software projects. In addition, certain psychological aspects impacting the programming task are also considered.

Outside influences on the development organization are largely related to the nature of the interface with the user organization and the application itself. In particular, the experience of both data processing and user personnel in similar applications, and the availability of good requirements definitions (specifications), are critical to high levels of productivity. Another important factor has to do with the complexity of the application. Numerous studies have shown that there is a direct relationship between complexity and effort, and that productivity declines as complexity increases. Considerable attention is given to the question of complexity and how to keep it in control.

There are numerous tools and techniques available to development personnel, ranging from design languages to debugging and documentation aids. Vendors of these products all claim significant productivity gains, but the potential compounding effect of several such tools never really seems to work out in practice. The various categories of these tools will be considered and some guidelines given on how to evaluate them in the context of a particular development environment. In addition, one of the special problems relating to the use of tools, that of properly integrating them into the working environment and overcoming people's natural inertia, or resistance to change, will be explored.

The specific management procedures that control the development organization also have an important influence on productivity. For example, project management not only allows for better control but entails a management discipline which contributes to improved productivity. The basic elements of software engineering have been shown to contribute to both improved software quality and improved productivity. Evidence is also shown to support the idea that structured programming and the general class of "improved programming techniques" can contribute to improved productivity as well. Problems of manpower allocation are discussed, including Brooks' "mythical man-month" and some ideas on how to allocate manpower to development work.

All of those influences on the development organization (external influences, tools and techniques, and management practices) determine the level of productivity actually achieved. First, productivity as a ratio of output to input is examined, with particular attention to the problems of defining or measuring "output" from a development organization. Two possibilities are considered, lines of code and "functions." Other types of quantitative measures are also discussed in the context of developing a productivity or performance "data base" for the development organization.
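The output-measurement problem can be shown with a small numeric illustration; the project figures below are invented, not taken from the tutorial. The same projects rank differently depending on whether output is counted in lines of code or in functions delivered.

```python
# Illustrative only: project data is invented.
projects = [
    # name, delivered lines of code, functions delivered, staff-months expended
    ("Order entry rewrite", 24_000, 310, 40),
    ("Inventory reports",    6_000, 150, 12),
]

for name, loc, functions, staff_months in projects:
    print(f"{name:22s}  {loc / staff_months:7.0f} LOC/staff-month   "
          f"{functions / staff_months:5.1f} functions/staff-month")
```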

ACQUISITION BENCHMARKING

Dennis M. Conti

National Bureau of Standards Washington, DC 20234

Abstract

Benchmarking has traditionally been more of an art than a science, primarily because of the previous lack of well-established procedures for constructing and testing benchmarks. As a tool for evaluating vendor performance during the competitive acquisition of computer systems, benchmarking has become an accepted practice. However, in spite of the widespread use of benchmarking, especially within the Federal Government, practitioners are still unaware of the sources of error that affect the ability of a benchmark to represent a real workload.

This tutorial will explore the purpose of benchmarking during the acquisition of computer systems, the sources of error inherent in the benchmark process, and ten steps to constructing and testing benchmark mixes. These steps have formed the basis for a recent NBS Federal Guideline in this area, FIPS PUB 75.
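One common-sense element of constructing a benchmark mix is choosing job counts whose resource demand approximates the measured workload proportions. The sketch below is only an illustration of that idea, not the procedure in FIPS PUB 75; the workload classes, measured CPU shares, and per-job CPU times are invented.

```python
# Illustrative sketch, not the FIPS PUB 75 procedure.
measured_share  = {"batch COBOL": 0.55, "TSO edit/compile": 0.30, "utility/sort": 0.15}
cpu_sec_per_job = {"batch COBOL": 180.0, "TSO edit/compile": 25.0, "utility/sort": 60.0}

def mix_for_budget(total_cpu_seconds):
    """Number of jobs of each class so that the mix's CPU time is split
    roughly in the measured proportions."""
    return {cls: max(1, round(share * total_cpu_seconds / cpu_sec_per_job[cls]))
            for cls, share in measured_share.items()}

mix = mix_for_budget(total_cpu_seconds=3600)   # a one-CPU-hour benchmark mix
achieved = {cls: n * cpu_sec_per_job[cls] for cls, n in mix.items()}
total = sum(achieved.values())
for cls, n in mix.items():
    print(f"{cls:18s} {n:3d} jobs  target {measured_share[cls]:.0%}  "
          f"achieved {achieved[cls] / total:.0%}")
```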


SOFTWARE CONVERSION PROCESS AND PROBLEMS

Thomas R. Buschbach

Federal Conversion Support Center Office of Software Development General Services Administration

This tutorial will provide users with a basic understanding of the software conversion process. The technical and management problems which arise during a conversion project will be discussed. A brief description of the Federal Conversion Support Center's capabilities to assist agencies in the conversion process will be given.

1. Federal Conversion Support Center

The Federal Conversion Support Center (FCSC) was established in May, 1980 as the Federal Government's primary source for software conversion technology. The intended effect of the FCSC is to ensure careful consideration of software conversion factors in ADP equipment and teleprocessing services acquisitions. The FCSC provides Federal agencies with expertise, techniques and tools to conduct conversion studies and accomplish software conversions through the provision of reimbursable services.

2. Software Conversion Process

Conversion, as defined by the FCSC, is the transformation, without functional change, of computer programs or data to permit their use on a replacement or modified ADP system, telecommunications system, or teleprocessing service. Historically, Federal agencies have underestimated the resources required for software conversion due to a lack of understanding of the conversion process. A large part of the problem is the failure to realize all the tasks involved in a conversion. The conversion process consists of the following tasks: planning; materials preparation; translation; unit, system, and acceptance testing; maintenance changes; and implementation. To accurately estimate the cost of a conversion, the following cost elements must also be taken into consideration: redocumentation, site preparation, training, management, and tools.

The conversion planning task includes a complete inventory of current software, selecting tools to automate the process, defining standards for the converted programs and files, and a software conversion study which includes schedules and cost estimates. Some of the problems associated with the planning task include the large number of tasks and the amount of material to be handled.

The materials preparation task involves the generation and validation of test data, and the formation of conversion work packages. A system for tracking the work packages from the time they are formed until they are accepted back from the conversion contractor must also be developed. The problems associated with this task include the amount of test data required and its comprehensiveness.

The translation task encompasses the translation of all programs, files, JCL, and data bases. This would normally include a clean compile and testing at the program level.

The maintenance change task is required to track and eventually incorporate changes to the production software which occur during the conversion process.

The implementation task includes system testing to ensure interoperability of programs, acceptance testing to validate output from the system, and cutover from the old to the new software in the production environment.
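The work-package tracking requirement mentioned above can be illustrated with a very small sketch. The status values and allowed transitions below are invented for illustration; they are not the FCSC's tracking system.

```python
# Illustrative only: status values and transitions are invented.
ALLOWED = {
    "formed":              {"sent to contractor"},
    "sent to contractor":  {"translated"},
    "translated":          {"unit tested"},
    "unit tested":         {"accepted", "returned for rework"},
    "returned for rework": {"sent to contractor"},
    "accepted":            set(),
}

packages = {"WP-001": "formed", "WP-002": "unit tested"}

def advance(package, new_status):
    """Move a work package to a new status, rejecting invalid transitions."""
    current = packages[package]
    if new_status not in ALLOWED[current]:
        raise ValueError(f"{package}: cannot go from '{current}' to '{new_status}'")
    packages[package] = new_status

advance("WP-001", "sent to contractor")
advance("WP-002", "accepted")
print(packages)
```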



RESOURCE MEASUREMENT FACILITY (RMF) DATA: A REVIEW OF THE BASICS

J. William Mullen

The GHR Companies, Inc. Destrehan, LA

Analysis and understanding of RMF measurement data is a basic prerequisite to performance management in the MVS environment. A basic approach to RMF analysis is described for key measurement data elements often overlooked in the analysis process, how RMF measures these values, and some rules-of-thumb based upon the author's experience. Specifics in the areas of I/O, paging/swapping, TSO, and analysis of cause and effect relationships of RMF measured data will be discussed.

Key words: Analysis; data elements; I/O; logical swapping; measurement; MVS; paging/swapping; TSO.

1. Introduction

The reduction and analysis of RMF data is a basic prerequisite to performance management and configuration tuning in the MVS environment. A considerable amount of material is available on the meanings of the measurement data reported by RMF, and some documents attempt to provide basic 'rules-of-thumb' to aid the analyst in determining current performance levels for a configuration. A more basic approach to analysis of RMF data is the understanding of cause and effect relationships between the measurement data elements and the approach to determining interaction and subsequent courses of action to alleviate performance bottlenecks within the configuration.

Analysis of RMF data additionally requires a basic understanding of the System Resources Manager (SRM) and the data collection process whose data elements are subsequently reported by RMF. Algorithms used by the SRM to determine over and under utilization of the MVS system should be understood in general, with constants in the algorithms that are applicable to the hardware in the configuration. The overall knowledge gained from study and understanding of the SRM algorithms provides a base for cause and effect analysis of RMF data. The SRM algorithms are included in Appendix A for reference.

Discussion of RMF measurement elements addressed in this paper will be done by RMF report type to provide ease of reference for measurement data elements.

2. CPU Activity Report

Primary measurement elements provided in this report are concerned with the processor average utilization during the specified measurement interval, the minimum, maximum, and average number of tasks for the three types of processing (batch, time-sharing, started task), and the minimum, maximum, and average values for the various queues within the MVS system.


A distribution of queue lengths is provided by RMF to give the analyst insight into the variances during the measurement period. Information for the CPU Activity Report is obtained by the RMF CPU sampling module MRBMFECP. This sampling module obtains the maximum, minimum, average, and distribution values by sampling the SRM User Control Block (OUCB) and the ASCB queues which are anchored in the Resource Manager Control Table (RMCT) addressed via the CVT. Counts and statistics gathered are stored in the ECPUDT data area.

A statistic of primary concern to the analyst is the 'Out-And-Ready' average value. In most instances, an average value greater than 1.0 for the measurement interval is usually interpreted as an indication of over-initiation of the system. The analyst should first investigate the data elements used by SRM in the over-utilization algorithm from all RMF reports (i.e. CPU activity, demand paging, UIC, etc.) to determine if the system is being operated in an over-initiated mode and there is a lack of needed system resources. If no symptom appears from this analysis, the analyst should then evaluate the Installation Performance Specifications (IPS) designations for minimum and maximum MPL of each Domain. This can be accomplished utilizing the RMF trace facility and specifying the DMDTCMPL and DMDTRUA trace key word values. The analyst should look for any domain that is consistently running at its specified maximum MPL. This could indicate an artificial constraint on the SRM's ability to raise the system target MPL and specifically the MPL for that particular domain. RMF trace values for the DMDTCMPL and DMDTRUA trace key words should be reviewed periodically to assure proper minimum and maximum MPL designations in the IPS.

An additional consideration is the 'Logical-Out-Ready' value and the effect on TSO logical swap. Logical swapping considerations will be addressed later in the paper.

3. Channel/Device Activity Reports

The MVS Operating System provides information statistics for physical and logical channels via the Input/Output Supervisor (IOS). The RMF channel sampling module, ERBMFECH, collects these statistics and provides information for both physical and logical channel activity.

Primary interest in the physical channel reports revolves around the average service time for the channel. This value is the calculated average time required for the channel to complete an I/O operation. Averages for channel I/O operations tend to run in the 10 to 15 millisecond range. As this value increases, a significant change may be seen in the channel busy/CPU wait percentage, though the change may not be rapid enough to attract attention. A more significant indicator of channel and I/O problems are the 'Average-Que-Length' and '% Request Deferred' fields on the logical channel report. MVS associates each physical channel and each primary/optional channel specification with a logical channel. Of primary interest in the logical channel report is the 'Average-Que-Length' field. This field represents the depth of I/O requests queued for the logical channel. A value of .1 or greater for this field is an indication of possible channel overload. Though a direct correlation may be seen between this field and percentage of channel utilization, channel utilization is generally not the main contributor to the problem.

The percentage of request deferred distribution provides the major indication of the problem. If requests deferred due to channel busy is the high percentage, the device activity report should be consulted to determine the volume or volumes receiving the highest I/O activity. This will give a starting point to determine if a volume move or data set moves can be done to alleviate the problem.
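A small sketch (not an RMF program) of the rule of thumb just described: flag any logical channel whose average queue length is 0.1 or greater, and report which deferral cause (channel busy, control unit busy, or device busy) dominates, each of which is taken up in turn below. The channel names and figures are invented.

```python
# Illustrative only: logical-channel figures are invented, not RMF output.
logical_channels = [
    # name, avg queue length, % deferred due to (channel, control unit, device) busy
    ("LCH 0", 0.02, (55, 30, 15)),
    ("LCH 1", 0.18, (70, 20, 10)),
    ("LCH 2", 0.12, (15, 60, 25)),
]

CAUSES = ("channel busy", "control unit busy", "device busy")

for name, avg_q, deferred in logical_channels:
    if avg_q >= 0.1:                              # rule of thumb from the text
        dominant = CAUSES[deferred.index(max(deferred))]
        print(f"{name}: possible overload (avg queue length {avg_q:.2f}); "
              f"requests deferred mostly due to {dominant}")
```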


If the percentage of requests deferred due to control unit busy is the high percentage, then three alternatives must be investigated. The first is rerouting of channels through alternate or additional control units. This is usually not a viable alternative. The second option is the movement of DASD strings to other less utilized control units. This can be a very viable alternative in concert with proper DASD segregation by workload at the DASD string level. This effort will revolve around utilizing the device activity reports to again identify the volumes receiving the most activity and making appropriate volume or data set moves to distribute I/O activity. The final alternative is necessary if channel paths from more than one processor are routed through the control unit pairs. Workloads on each processor must be evaluated with respect to their use of the DASD devices on the strings serviced by the control units to determine if movement or combining workload processing between processors will allow a better I/O distribution to the channel paths being serviced by the control units. A final consideration is the channel algorithm (i.e. rotate, LCU) being used on each of the processors. With proper DASD segregation by workload at the string level, the Last Channel Used (LCU) algorithm has proved best for most installations (assuming an MVS/SE1 or higher environment). Processors will tend to sort channel and control unit paths out between themselves, which moves DASD contention down to the string, and more specifically, the device level.

When device busy is the highest percentage for deferred requests, the device usage must be studied to determine which data sets are responsible for the high activity level and consider either a consolidation of workloads to a given processor or duplexing the data set(s) involved. Device busy contention will only occur in a multi-processor environment.

The objective of analysis of RMF channel/device activity is the identification of I/O overload conditions and, more importantly, the elimination of contention in the multi-processor environment. DASD contention at the control unit level is most difficult to resolve. Through workload segregation and use of proper channel algorithms, the analyst should attempt to transfer any contention to the string and device level. Once this transfer is accomplished, the contention can be alleviated through volume and data set moves on existing DASD strings.

4. Page/Swap Data Set Usage

Resource Measurement Facility collects information on page data sets, swap data sets, and system paging activity via the paging sampling module ERBMFEPG. The sampling module extracts the data from two sources: the Paging Vector Table (PVT) and the Auxiliary Storage Manager Vector Table (ASMVT).

Analysis of the Page Data Set Usage report and the Swap Data Set Usage report should be directed toward the following elements:

o bad slot/swap set indications for page or swap data sets;

o average service times for allocated page and swap data sets; and

o allocated page/swap data set slots and swap sets with minimum, maximum, and average slots/swap set indicators.

An increase in page data set service times is usually caused by an increase in cross system DASD contention. Another cause of increases in service times for local page data sets can be a spill of swap activity, due to a shortage of swap sets and swap data sets, into the local page data sets. Increases in the number of logged-on time-sharing (TSO) users is usually the cause of swap data set spills. The analyst should continuously review the statistics on swap set usage. If the maximum swap sets used is equal to swap sets allocated for all in-use swap data sets and is a frequent condition, swap spill to local data sets is probably occurring. This will also be in concert with an increase in average ended transaction times for TSO transactions.

Bad slot/swap set counts for page or swap data sets can indicate the beginning of DASD volume problems and induce intermittent abnormal task terminations into the system. This is particularly true if a bad slot indicator occurs for the PLPA page data sets.


If bad slot indications go unnoticed for the PLPA page data set, the potential for a spill of PLPA into the common page data set is present, with system performance degradation being the end result.

An additional consideration is the size and placement of page data sets. Analysis of the usage of the page data sets from the usage report will provide the analyst information on page slots used with respect to page slots allocated. Several (local) page data sets should be used and should be allocated with a minimum of slots over the measured maximum slots used. This reduces the seek activity due to the spread of pages over slots by the ASM slot sort routine.

A final point is the use of the RMF trace facility to track page delay time. The trace key word RCVMSPP provides the average page delay time (excluding swap pages) in milliseconds for transfer of pages during the measurement interval. This value will alert the analyst to paging problems as an initial indicator and tends to be less subtle than other indicators discussed.

5. Paging Activity Report

Paging can be the absolute controlling factor in the performance of an MVS system if the analyst does not recognize the options available to control the system paging rate. Two values derived from the Paging Activity Report are key in the SRM's system control decision making process:

o Demand paging - number of non-VIO, non-swap page-ins plus page-outs per second; and

o Page fault rate - number of non-VIO, non-swap page-ins plus page-reclaims per second.

The analyst should track these values to determine if they are approximately equal. In most MVS installations, page-ins are the primary contributor to each of these values. The third parameter required is the page unreferenced-interval-count (UIC) average during the interval. This value must be obtained through the RMF trace facility by use of the RCVUICA trace key word. The average UIC value is a primary value used by the SRM to determine over- or under-utilization of the system and tends to be the most erratic of the three variables where systems are not memory (real) overcapacitated. The demand paging value is used in concert with other variables in the SRM's decision making process, with only UIC and page-fault rate being the stand alone variables in the algorithm. The analyst should evaluate the feasibility of changing the SRM constant values to negate the UIC value as a control and utilize the page-fault rate (PTR) to control system paging activity. This can be accomplished by altering the SRM constants located in CSECT IRARMCNS in the nucleus for MVS/SE Release 1 or prior systems, and in SYS1.PARMLIB member IEAOPTxx for MVS/SE Release 2 and MVS/System Product (SP) systems.

The second portion of the Paging Activity report displays statistics on system swapping activity during the measurement interval. Logical swap counts and their relationships will be discussed under the section on TSO logical swapping. Swap sequence counts provide the total swapping activity count for the interval and averages for pages swapped in and out during the interval. The primary element here is the swaps per second count. In a system with sufficient swap data sets allocated, the analyst should try to maintain a 1.0 swap per second or less average. If this value tends to be greater, the physical swap counts section should be analyzed.


Within the physical swap counts section, a primary element often overlooked is the 'detected wait' swap count field. Many analysts simply write this count off to system initiators going inactive. The analyst should investigate the minimum, maximum, and average values for active initiators from the CPU Activity Report and determine if this is actually the cause. If not, the other major contributor to this parameter is DASD wait time. If investigation of paging activity and paging service times appears to be within reasonable bounds, the analyst should begin investigating DASD path and device queueing. Normally this investigation will show that a given path or device(s) on the path are experiencing significant I/O activity, cross system contention, and queueing of I/O requests. The analyst can use SMF records to determine the type of workload requesting I/O service to the DASD in question, from each processor, and begin attempts to provide better workload segregation at both the processor and DASD level.

Swap counts for unilateral and exchange-on-recommendation-value swaps are associated with IPS design and parameters and are not in the scope of this paper.

Additional swap reason counts are self explanatory to the analyst. Determining the workloads affected by the above swap reasons can be accomplished through use of the RMF background monitoring facility and reduction of RMF type 79 SMF records.

6. TSO Logical Swapping

The TSO logical swap, or in memory swap, was introduced with the release of the MVS/Systems Extensions (SU50). At swap out time (terminal input or output wait), the TSO user is evaluated for logical swap. If the criteria for logical swap are met, the TSO user's fixed frames, recently referenced working set frames, and LSQA remain in storage and the user is placed on the out/wait queue to await terminal input. If terminal input is not received during a specified period (think time), the TSO user is physically swapped out of memory. If terminal input is received within the specified amount of time, the TSO user is moved to the out/ready queue to compete with other tasks waiting to be swapped in under normal SRM control.

The control variable for TSO logical swapping is a value known as 'think time' and is the amount of time the user is allowed, after being logically swapped out, to enter data at his terminal and prevent being physically swapped out. The maximum default think time is 30 seconds and is in the SRM constants table (IRARMCNS). An additional value used in the logical swap decision is the system average available frame count (RCVAFQA). Though the ASM queue length is also part of the algorithm for the logical swap determination, it is negated due to default values assigned which should not be changed by the analyst.

Primary controls for raising or lowering the system think time are the system average UIC (RCVUICA) and the system average available frame count (RCVAFQA). If the system average UIC is greater than 30 and the system available frame count is greater than 300, the system average think time is raised .5 seconds. If the system average UIC is less than 20, then the system average think time is reduced 1 second regardless of the system average available frame count.

Three problems are present with the algorithm and the current default values:

o the system UIC can be somewhat erratic depending upon the workload mix processing on the system;

o the UIC comparison values of 20 and 30 are high; and

o the available frame count value of 300 is too high.

The analyst should review the logical swap statistics from the paging activity report to determine the degree of logical swapping in the system. In concert, the average available frame count should be reviewed to determine if the average available frames is consistently greater than 100. If this is the case, the analyst should consider reducing the default compare value of 300 to 100 and evaluating the results for logical swap.

After taking the above action and reviewing results, the analyst should trace the RCVUICA value for minimum, maximum and average values to determine UIC ranges under a representative workload. If traces show the UIC value to be in a range less than 30 but greater than 10, the analyst should consider reducing the default values of 20 and 30 for the UIC compare and again review the effects on logical swap.

Tracing the system average think time should be done in concert with the above actions.
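The think-time controls just quoted can be restated as a short sketch. The constant values (the 30 and 20 UIC compares, the 300 frame compare, the 0.5-second raise, the 1-second cut, and the 30-second maximum) are the defaults described in the text; the function itself is an illustration, not SRM code, and the floor of zero is an assumption of the sketch.

```python
def adjust_think_time(think_time, avg_uic, avg_available_frames,
                      max_think_time=30.0, min_think_time=0.0):
    """Illustrative restatement of the default think-time controls described
    in the text: raise by 0.5 s when UIC > 30 and available frames > 300;
    cut by 1 s when UIC < 20, regardless of the available frame count."""
    if avg_uic > 30 and avg_available_frames > 300:
        think_time = min(max_think_time, think_time + 0.5)
    elif avg_uic < 20:
        think_time = max(min_think_time, think_time - 1.0)
    return think_time

print(adjust_think_time(10.0, avg_uic=35, avg_available_frames=450))  # 10.5
print(adjust_think_time(10.0, avg_uic=15, avg_available_frames=450))  # 9.0
```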


If the system think time remains high or at the maximum value without an increase in the percentage of TSO users being logically swapped, the analyst must assume that the user community has a large average think time for the terminals in use. The analyst may want to consider raising the default maximum system think time to induce more logical swapping into the system. This should be done only if sufficient memory is available and the increases in maximum think time are gradual, with a thorough review of the effect on logical swap.

The objective of increasing the percentages of TSO users logically swapped out and subsequently swapped back in is to reduce the instruction path length and the channel/device overhead associated with physical swapping. Care must be taken to avoid inducing additional demand paging and memory shortage overhead into the system with TSO logical swap improvements.

7. Summary

Analysis of RMF data requires an understanding of the cause and effect relationships of those variables used by the SRM to control the MVS system. Direction for understanding these relationships and analysis of the data can be gained through the following:

o review and understanding of the SRM algorithms for over and under utilization of the system;

o review and understanding of the TSO logical swap algorithm;

o review and understanding of the cause and effect relationships discussed in this paper; and

o utilization of the RMF trace facility to obtain the key values used in the SRM algorithms and analysis of the relationships in your system.

Exercising the above recommendations will give the analyst a substantial start in the RMF data analysis process.

8. Appendix A

The SRM algorithms and TSO logical swap algorithms are included for reference.

Over utilization (SRM):

   ( UICA < 2 ) or
   ( CPUA > 100 ) or
   ( ASMQ > 100 ) or
   ( PTR > 100 ) or
   ( MSPP > 1000 ) or
   (( DPR > DPRTH ) and (( CPUA > 98% ) or ( MSPP > 130 ))) or
   (( IOUSE = '1'B ) and ( CPUA < 98% ))

   ( TMPL > MINMPL ) for at least 1 Domain

TSO Logical Swap (think time increase):

   IF ( UICA > LSCTUCTH (30) ) and
      (( ASMQA < LSCTASTL (100) ) or
       ( AFQA > LSCTAFQH (300) )) then
      LSCTMTES = MIN( LSCTMTEH, LSCTMTES + LSCTMTEI )

TSO Logical Swap (think time decrease):

   IF ( UICA < LSCTUCTL (20) ) or
      ( ASMQA > LSCTASTH (100) ) or
      ( AFQA < LSCTAFQL (0) ) then
      LSCTMTES = MAX( LSCTMTEL, LSCTMTES - LSCTMTED )

U.S. DEPT. OF COMMERCE, BIBLIOGRAPHIC DATA SHEET (NBS-114A)

Publication Date: November 1981

Title and Subtitle: Computer Science and Technology: Proceedings of the Computer Performance Evaluation Users Group, 17th Meeting, "Increasing Organizational Productivity."

Author(s): Terry W. Potter, Editor

Performing Organization: National Bureau of Standards, Department of Commerce, Washington, D.C. 20234

Type of Report and Period Covered: Final

Sponsoring Organization: Same as performing organization.

Library of Congress Catalog Card Number: 81-600155

Abstract: These Proceedings record the papers that were presented at the Seventeenth Meeting of the Computer Performance Evaluation Users Group (CPEUG 81), held November 16-19, 1981, in San Antonio, TX. With the theme "Increasing Organizational Productivity," CPEUG 81 reflects the critical role of information services in the productivity and survival of today's organization, as well as such trends as increasing personnel costs, limited budgets, and the convergence of data processing, communications, and word processing technologies. The program was divided into three parallel sessions and included technical papers on previously unpublished works, case studies, tutorials, and panels. Technical papers are presented in the Proceedings in their entirety.

Key Words: Benchmarking; capacity planning; chargeback systems; computer performance management; data base machines; end user productivity; human factors evaluation; information system management; office automation; performance management systems; resource measurement facility; simulation; supercomputers.

Availability: Unlimited. Number of printed pages: 320. Price: $9.00. Order from the Superintendent of Documents, U.S. Government Printing Office, Washington, DC 20402, or from the National Technical Information Service (NTIS), Springfield, VA 22161.


