
University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange

The Harlan D. Mills Collection Science Alliance

10-5-1988

A Case Study in Cleanroom Software Engineering: The IBM COBOL Structuring Facility

Richard C. Linger

Harlan D. Mills

Follow this and additional works at: https://trace.tennessee.edu/utk_harlan


Recommended Citation

Linger, Richard C. and Mills, Harlan D., "A Case Study in Cleanroom Software Engineering: The IBM COBOL Structuring Facility" (1988). The Harlan D. Mills Collection. https://trace.tennessee.edu/utk_harlan/34

This Conference Proceeding is brought to you for free and open access by the Science Alliance at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in The Harlan D. Mills Collection by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

A Case Study in Cleanroom Software Engineering: The IBM COBOL Structuring Facility

Richard C. Linger
IBM Corporation
Bethesda, Maryland

Harlan D. Mills
University of Florida, Gainesville, Florida
and Software Engineering Technology, Inc., Vero Beach, Florida

Abstract

The IBM COBOL Structuring Facility Program Product was developed by a small team using Cleanroom Software Engineering technology in a pipeline of increments with very high quality and productivity. In the Cleanroom approach, programs are developed under statistical quality control, and mathematical verification is used in place of unit debugging. Cleanroom specification, design, functional verification, and testing are described, together with the development and management practices required for maintaining intellectual control over the process.

A Cleanroom Software Case Study

The IBM COBOL Structuring Facility (COBOL/SF) Version 2 Program Product automatically transforms unstructured COBOL programs into structured form. It was developed by a small programming team using Cleanroom Software Engineering technology [Mills 1987]. COBOL/SF Version 2 consists of 80,000 lines (52,000 new and changed over Version 1) of high-function source code that was developed under statistical quality control: it was specified, then designed, mathematically verified, and coded with no unit debugging, in a pipeline of increments, at very high productivity. Each increment was placed under engineering change control before any execution and subjected to system test under a sound statistical design.

As a result, COBOL/SF passed its field test of structuring a half-million lines of COBOL code in over 300 application programs with only 10 errors detected. All errors were trivial: none required more than a few hours to find and fix, and most just a few minutes. In all testing, only one error resulted in a COBOL program failing to execute functionally equivalent before and after structuring. As confidence in quality grew, field test participants engaged in wholesale structuring of entire systems of COBOL programs, in effect treating the field test version of COBOL/SF as a final product.

Since the common wisdom in software engineering is that mathematical verification of sizable software products is impractical and that unit debugging by programmers is necessary, these results may appear incredible. As far as we know, the axiomatic verification [Hoare 1969, Gries 1981] of software as widely taught in university computer science courses today is indeed impractical for products of this size. However, functional verification [Linger 1979] was used for COBOL/SF. And even functional verification is impractical for products of this size, except for teams whose members are well educated in formal methods of specification, design, and functional verification. Team members must also be given internships in team operations, to scale such formal methods up into work products and processes that permit day-to-day work to accumulate into mathematical verifications of software products of any size.

It is understandable that the common wisdom of a subject as new as software engineering can underestimate the potential of human achievement in various ways. Centuries ago, the common wisdom in arithmetic with Roman numerals was that large-scale arithmetic was impractical, so the great inventory after the Norman conquest was recorded in Roman numerals but never added up, in spite of the obvious interest in such a result. The experts in arithmetic of the day would never have believed that, with place notation and long division, schoolchildren of later centuries would be capable of arithmetic performance these experts deemed impossible. And so it will be that many current experts in heuristic, intuitive methods will find the use of mathematical verification in place of trial-and-error unit debugging impossible to consider as a rational methodology.

The COBOL Structuring Facility

The COBOL Structuring Facility (COBOL/SF) [COBOL/SF 1988a, 1988b] is comparable in function and complexity to a modern high-level language compiler. It embodies proprietary graph- and function-theoretic technology to automatically transform unstructured COBOL programs into hierarchies of structured procedures. COBOL/SF helps solve difficult problems by reducing complexity and increasing the understandability of program logic.

Table 1 summarizes the development history of COBOL/SF. This paper reports on the development of Version 2; results for the other versions, also developed with Cleanroom Software Engineering, were similar.

                   Lines of Code (KLOC)
               Reused   Changed   New   Total
Prototype         0        0       20     20
Version 1        18        2       15     35
Version 1A       30        5       11     46
Version 2        28       18       34     80

Table 1. The COBOL/SF Development Summary

The COBOL/SF prototype and Versions 1 and 1A provided structuring capability for VS COBOL II programs into VS COBOL II only. Version 2 incorporated the following major additional functions: structuring of OS/VS COBOL programs into either OS/VS COBOL or VS COBOL II, automation of optional manual steps to enhance the structuring process, complexity metrics analysis, program modularization, and structure chart generation for the output program procedure hierarchy. Some 52,000 lines of PL/I source code, new and changed, were written to produce Version 2, with 28,000 lines reused from Version 1.

Version 2 was developed by a Cleanroom software team composed of a technical engineering manager, six software engineers, and a certification engineer.[1] Three summer supplemental college students also participated. Team members held BS or MS degrees in [...] or mathematics and had recently joined IBM. With the exception of the team manager and the certification engineer, COBOL/SF was their first software development project.

[1] Team members: K. Cannaday, M. Deck, P. Hausler, R. Linger, C. Loving, L. Pedowitz, S. Rosen, A. Spangler.

The Version 2 development proceeded through formal specification, design, functional verification, implementation, and Cleanroom testing in five increments, beginning on April 15 and completing on December 15, 1987. Seventy person-months of effort (eight full-time people for eight months, plus three supplementals for two months each) were expended during this development period, for an overall productivity of 740 lines of code per person-month, including all specification, design, implementation, testing, and management activities. The system entered field test at customer sites on January 6, 1988.

Version 2 development was a real-world project in every respect, with shifting requirements and an extremely short development schedule. All schedules and budgets were met, and all committed functions were delivered.

All versions of COBOL/SF consist of four major components, as shown in Figure 1. The System Control Program manages system software and user interfaces, and certain common services. The Source Language Parsing Subsystem parses the input program and creates a knowledge base of program structure. The input program is prepared for structuring by the Control Flow Analysis Subsystem, which deals

with structural problems caused by complex performed procedure logic, ALTER statements, etc. The Structured Program Generation Subsystem transforms the input program into structured form and generates code. Finally, an off-line Parser Generator compiles COBOL grammars into parse tables for use by the system.

COBOL/SF Version 2 was developed top down in five increments, as depicted in Table 2. With no unit debugging permitted, the error rates shown are measured from first execution through the completion of Cleanroom testing. They range from 1.4 to 5.7 errors/KLOC of source code, with an average of 3.4 errors/KLOC. Table 2 suggests a possible correlation between increment size and error rate; however, no such relation appeared in the earlier versions, whose larger increments often exhibited the lowest error rates.

Published reports on software productivity and quality are highly variable; however, averages of 150 LOC/person-month and 70 errors/KLOC (including unit debugging) are representative of industrial experience for complex products [Boehm 1981, Jones 1986]. Table 2 shows anticipated errors for each increment at a rate of 70 errors/KLOC. Using the Cleanroom approach, a small team of software engineers produced code of compiler complexity at a rate of 3.4 errors/KLOC, roughly one-twentieth the industry average, and at a productivity rate of 740 lines/person-month, roughly five times the industry average, all within schedule and budget.

Cleanroom Software Engineering

Traditional software development proceeds through steps of specification, design, and code, then unit, component, and system testing. Selective tests are invented with knowledge of program internals, often by the developers themselves, typically to exercise primary functions, then secondary functions, error cases, etc. On completion of testing, the software is known to work as tested, but can still fail in circumstances not tested. As a result, the reliability evidence of selective testing is entirely anecdotal: it is known only that the software passed certain tests, with no inference possible about future failure rates. Worse, selective testing provides no rational basis for managing development. If few errors are found, is the code of high quality, or is the test process faulty? If many errors are found, has the quality of the code been sufficiently improved, or are there many more errors left to be found?

The objective of Cleanroom Software Engineering is to provide scientific evidence of reliability by embedding the entire development process in a statistical design [Mills 1987]. In the Cleanroom approach, a statistical property of software under test, namely the successive times between execution failures, is used to estimate reliability directly using a new certification model [Curritt 1986]. In the statistical design, all testing is randomized over projected user input distributions, to rehearse eventual use of the software in arriving at reliability estimates. To keep the estimates valid, programs are placed under engineering change control from first execution on, with no unit debugging or developer testing permitted.

Cleanroom Software Engineering requires the best possible mathematics-based development methodologies. The objective is to develop software of such high quality, with no unit debugging, that statistical testing will reveal reliability growth as lower- and lower-frequency errors are found and fixed, rather than simply thrashing from one high-frequency error to the next with no reliability growth possible, in effect debugging rather than certifying the code.
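The statistical design described above randomizes all testing over projected user input distributions. The paper notes that for COBOL/SF, statement frequencies from published analyses of COBOL inventories drove a PC-based generator of non-executable random COBOL programs, but it does not describe that generator. As a hedged illustration only, the Python sketch below draws COBOL-like procedure text from a weighted statement mix. The weights, templates, and the names STATEMENT_WEIGHTS, TEMPLATES, generate_program, and statement_mix are hypothetical; they are not the distribution or tool used for COBOL/SF.

```python
import random
from collections import Counter

# Hypothetical statement mix (relative weights). The real COBOL/SF distribution
# came from published analyses of COBOL program inventories and is not shown here.
STATEMENT_WEIGHTS = {
    "MOVE": 40,
    "IF": 15,
    "PERFORM": 12,
    "COMPUTE": 10,
    "GO TO": 10,
    "READ": 5,
    "WRITE": 5,
    "ALTER": 1,
}

# Hypothetical templates; {p} and {q} become randomly chosen paragraph names.
TEMPLATES = {
    "MOVE": "MOVE WS-A TO WS-B.",
    "IF": "IF WS-A > WS-B GO TO {p}.",
    "PERFORM": "PERFORM {p}.",
    "COMPUTE": "COMPUTE WS-C = WS-A + WS-B.",
    "GO TO": "GO TO {p}.",
    "READ": "READ IN-FILE AT END GO TO {p}.",
    "WRITE": "WRITE OUT-REC.",
    "ALTER": "ALTER {p} TO PROCEED TO {q}.",
}

KINDS = list(STATEMENT_WEIGHTS)
WEIGHTS = [STATEMENT_WEIGHTS[k] for k in KINDS]


def generate_program(rng, n_paragraphs=4, statements_per_paragraph=5):
    """Return one random, non-executable procedure division as text."""
    names = [f"P{i}" for i in range(1, n_paragraphs + 1)]
    lines = ["PROCEDURE DIVISION."]
    for name in names:
        lines.append(f"{name}.")
        # Draw statement kinds independently, in proportion to their weights.
        for kind in rng.choices(KINDS, weights=WEIGHTS, k=statements_per_paragraph):
            stmt = TEMPLATES[kind].format(p=rng.choice(names), q=rng.choice(names))
            lines.append("    " + stmt)
    return "\n".join(lines)


def statement_mix(rng, n):
    """Tally n sampled statement kinds, to check that samples track the weights."""
    return Counter(rng.choices(KINDS, weights=WEIGHTS, k=n))


if __name__ == "__main__":
    rng = random.Random(1988)  # fixed seed so any test case can be regenerated
    print(generate_program(rng))
    print(statement_mix(rng, 10_000).most_common(3))
```

Seeding the generator is a design choice in this sketch: a failing case can be regenerated exactly after a fix, while the sample as a whole still follows the assumed usage weights.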

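[Figure 1. COBOL Structuring Facility Components: System Control Program, Parser Generator, Source Language Parsing Subsystem, Control Flow Analysis Subsystem, Structured Program Generation Subsystem]

Successful Cleanroom software development depends critically on the ability of team members to apply formal methods of software engineering in the following areas.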

Increment   Lines of   Anticipated errors   Errors found in     Errors/KLOC   Errors found in
            code       at 70 errors/KLOC    Cleanroom testing                 field testing
1             4150           291                   6               1.4              1
2            11125           779                  24               2.2              2
3            10080           706                  23               2.3              2
4            19543          1368                 111               5.7              4
5             7117           498                  15               2.1              1
Totals       52015          3642                 179                               10

179 errors / 52.015 KLOC = 3.4 errors/KLOC

Table 2. Error Rates in Cleanroom Testing Measured From First Execution for COBOL/SF Version 2 Development
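The derived columns of Table 2 follow from its raw counts by simple arithmetic. The Python sketch below recomputes them as a consistency check. It assumes the anticipated-error column is rounded half up (4150 lines at 70 errors/KLOC gives exactly 290.5, which the table shows as 291), and it also reproduces the "one-twentieth" and productivity comparisons made in the text. The names TABLE_2, anticipated_errors, errors_per_kloc, and summarize are illustrative.

```python
"""Recompute the derived columns of Table 2 from its raw counts."""

INDUSTRY_ERRORS_PER_KLOC = 70  # industry average quoted in the text

# (increment, lines of code, errors found in Cleanroom testing, errors found in field testing)
TABLE_2 = [
    (1, 4150, 6, 1),
    (2, 11125, 24, 2),
    (3, 10080, 23, 2),
    (4, 19543, 111, 4),
    (5, 7117, 15, 1),
]


def anticipated_errors(loc, rate=INDUSTRY_ERRORS_PER_KLOC):
    """Errors expected at `rate` errors/KLOC, rounded half up with integer arithmetic."""
    return (loc * rate + 500) // 1000


def errors_per_kloc(errors, loc):
    """Observed error density, rounded to one decimal place as in the table."""
    return round(errors * 1000 / loc, 1)


def summarize(rows):
    """Return per-increment (increment, anticipated, errors/KLOC) rows and the totals."""
    per_increment = [
        (inc, anticipated_errors(loc), errors_per_kloc(errs, loc))
        for inc, loc, errs, _field in rows
    ]
    total_loc = sum(row[1] for row in rows)
    total_errors = sum(row[2] for row in rows)
    totals = {
        "loc": total_loc,
        # The table's total is the sum of the rounded per-increment values.
        "anticipated": sum(row[1] for row in per_increment),
        "cleanroom_errors": total_errors,
        "field_errors": sum(row[3] for row in rows),
        "errors_per_kloc": errors_per_kloc(total_errors, total_loc),
    }
    return per_increment, totals


if __name__ == "__main__":
    rows, totals = summarize(TABLE_2)
    for inc, anticipated, rate in rows:
        print(f"increment {inc}: anticipated {anticipated}, observed {rate} errors/KLOC")
    print(totals)
    exact_rate = totals["cleanroom_errors"] * 1000 / totals["loc"]
    print(f"industry rate / observed rate: {INDUSTRY_ERRORS_PER_KLOC / exact_rate:.1f}")
    # Effort quoted in the text: eight people for eight months plus three for two months.
    person_months = 8 * 8 + 3 * 2
    print(f"lines per person-month: {totals['loc'] / person_months:.0f}")
```

Run as a script, it reports a ratio of about 20.3 and about 743 lines per person-month, consistent with the text's "roughly one-twentieth" and its 740 lines/person-month figure (computed there from about 52,000 lines over 70 person-months).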

Formal Specification

A Cleanroom software specification defines required function and performance, the statistical distribution of user input, and the content of successive development increments.

A fundamental principle of Cleanroom Software Engineering is to identify formal mathematical structures for specifying the problem at hand, whether it be an entire system, a subsystem, or a component. Formal structures include the box structures of data abstractions [Mills 1988], formal grammars, regular expressions, propositional logic, predicate calculus, etc.; in short, any appropriate mathematical structures at all.

Different parts of a system typically require different specification techniques. Box structures are a natural means to specify the behavior of a system and its subsystems. Within box structure specifications, formal grammars and their semantics in conditional rules can provide the level of precision required. Much of COBOL/SF was specified with extensive formal grammars, which are closely related to the problem domain. Grammars were written both for the COBOL languages processed and for internal string substitution operations, in terms of recognition and transformation grammars.

A crucial mathematical property required of the formal structures is referential transparency in hierarchies [Mills 1988], that is, fully specified behavioral equivalence across levels of decomposition. This requirement precludes popular specification techniques that lack referential transparency, such as structure charts and data flow hierarchies.

Natural language is used not to carry the burden of specification, for which it is not well suited, but rather to explain the formal specification structures. Where ambiguities arise, it is the formal structures that must be correct, no matter what the natural language says.

Specification structures are developed incrementally, with formal team review for correctness and simplicity at each step, and often undergo substantial revision to correct errors or take advantage of better ideas. No design work on an increment is undertaken until all team members agree that its specification is correct. This level of formality is well suited to dealing with inevitable changes in requirements: the intellectual control provided by formal structures permits the precise impact of changes to be quickly assessed and accommodated.

No unnecessary work for the sake of formality was undertaken in specifying COBOL/SF; the specifications were written to a level of formality sufficient to

guarantee completeness and correctness in team reviews.

Cleanroom testing of COBOL/SF required specifying a statistical user input distribution of COBOL programs with realistic statement frequencies and coding patterns, in order to generate test cases randomized against the distribution. Published papers analyzing COBOL program inventories provided statement frequencies, which were used by a PC-based generator to produce non-executable, random COBOL programs for testing.

Formal Design

The design of COBOL/SF was carried out using function-theoretic methodology [Linger 1979]. In the function-theoretic approach, program designs are regarded as mathematical objects, namely, rules for functions, and designs are treated as expressions in an algebra of functions, with keywords if, while, etc., as function operators.

The syntactic forms required for function-theoretic design are embodied in a Process Design Language (PDL) [Linger 1979] whose principal components are function (subspecification) definitions, delimited by square brackets, and their decompositions into control structures containing new function definitions, as illustrated in Figure 2. Great effort is expended in developing concise and correct intermediate function definitions, since these serve as specifications in functional verification. Well over half the COBOL/SF design text is devoted to function definitions.

    Sequence:        Ifthenelse:        Whiledo:
    [f]              [f]                [f]
    do               if                 while
      [g]              p                  p
      [h]            then               do
    od                 [g]                [g]
                     else               od
                       [h]
                     fi

Figure 2. Syntactic Forms for Function-Theoretic Design

Designs are constructed by repeatedly decomposing specified functions into control structures and subspecifications, as illustrated in Figure 3 for a miniature design fragment, and not by assembling control structures into designs through acts of heuristic invention. The difference is crucial, even though both processes end up with a structured program, because only the former provides the referential transparency at each decomposition step that correctness verification requires.

    [for queue q and stack s, append to q all members of s (if any) in order
     followed by eoq, set s to empty]

    expands to:

    [for queue q and stack s, append to q all members of s (if any) in order
     followed by eoq, set s to empty]
    do
      [for queue q and stack s, append to q all members of s (if any) in order,
       set s to empty]
      back(q) := eoq
    od

    which expands to:

    [for queue q and stack s, append to q all members of s (if any) in order
     followed by eoq, set s to empty]
    do
      [for queue q and stack s, append to q all members of s (if any) in order,
       set s to empty]
      while
        not empty(s)
      do
        [move next member of stack s to queue q]
        x := top(s)
        back(q) := x
      od
      back(q) := eoq
    od

Figure 3. Stepwise Decomposition of a Miniature Design Fragment

The entire design, not just its most interesting parts, is embodied with full precision in each decomposition step at increasing levels of detail. Because statistically generated tests can exercise

exceptional cases as well as mainline processing, each increment must address the entire user input distribution, not just its principal components. In Cleanroom there is no protected testing of mainline functions.

Formal correctness verification in team reviews requires designs that are as small and simple as possible, to promote effective reasoning by team members.

Properly educated and motivated humans have substantial latent capability for logical precision in correctness verification, but only if program complexity can be held below a critical threshold. Dijkstra's original motivation for structured programming was to reduce the size of correctness proofs, but two additional factors contribute to complexity as well: proliferation of state space data objects, forced by insufficient abstraction in the design, and sheer growth in design size, likewise forced by insufficient abstraction of case analyses into more general forms with simpler designs (the first idea is rarely the best idea!).

Data structured programming [Mills 1986a] was used to reduce the number of state space objects and simplify correctness verification. In this approach, data objects with disciplined access to data, such as stacks and queues, are employed, rather than objects with random access to data, such as arrays and pointers. The result is a sharp reduction in the number of objects and their references. Disciplined data access designs are more difficult to invent, but easier to verify, with less state information required in the mind at each verification step.

To help reduce the size of designs and the quantity of logical material to be verified, simpler design approaches were actively sought in review, and redesigning for simplicity was made an explicit objective. This activity produced astonishing results, with size reductions by factors of up to five easily achieved. For example, the prototype of COBOL/SF, estimated at 100 KLOC of PL/I by an independent IBM group, required just 20 KLOC as a result of data structured programming and design simplification.

Formal Verification

Formal verification begins with specifications, which are checked line by line in team reviews for correctness against requirements. For example, formal grammars for OS/VS COBOL and VS COBOL II, comprising some 1500 productions each, were verified for correctness in intensive team reviews. As a result, no grammar errors whatsoever were encountered in field testing.

At the design level, traditional inspection methodology is aimed at finding errors through mental execution of program paths in group reviews. Such a process places demands on long-term memory, to recall path histories and branches, and on non-local reasoning, to integrate the effects of the operations encountered. Worse, it is a non-finite activity, since programs of any size contain a virtually infinite number of possible paths.

In contrast, function-theoretic design verification is aimed at verifying the correctness of successive function decompositions [Mills 1986b]. This process is a reduction to practice of the Correctness Theorem [Linger 1979], which defines the correctness conditions that must hold for every control structure, as illustrated in Figure 4 in terms of correctness questions to apply in team reviews. Every design is a finite structure of function decompositions, and hence is verified in a finite, though large, number of mental function comparisons based on the correctness questions. Most of the function comparisons are made in seconds in team reviews through highly structured group dynamics, with more time taken if an error is suspected. Literally hundreds of such verifications can be made in a day's work, with astonishing savings possible in testing later on. In illustration, the 3300-line COBOL/SF Parser Generator program contained some 700 control structures, representing around 1200 correctness questions to be asked and answered in team review, accomplished in a few days' work.

It is common wisdom today that all software errors are the result of inevitable human fallibility; however, function-theoretic design and verification processes

prove otherwise. It turns out that nearly all software errors result from heuristic development processes, and not from human fallibility itself. Heuristic development processes lack crucial mathematical properties, such as referential transparency for decomposition and verification, and so embody errors of process that cannot be distinguished from human errors. Rigorous processes such as the function-theoretic approach provide full referential transparency, and do not carry errors of process in their application. As in doing long division, one may make errors in computation, but they are readily identified through verification as errors of human fallibility in following a rigorous process.

The COBOL/SF experience demonstrates an upper bound on human fallibility on the order of three to four errors/KLOC remaining after a rigorous development and verification process and before first execution. Cleanroom testing then finds and fixes these errors to arrive at a near-zero-defect product. We believe that well over 90% of the 70 errors/KLOC in current industrial experience are in fact due to the processes in use and not the people.

    Sequence:
    For all inputs, does [g] followed by [h] do [f]?

    Ifthenelse:
    For all inputs, whenever p is true, does [g] do [f],
    and whenever p is false, does [h] do [f]?

    Whiledo:
    For all inputs, does the whiledo terminate,
    and whenever p is true, does [g] followed by [f] do [f],
    and whenever p is false, does doing nothing do [f]?

Figure 4. Correctness Questions for the Control Structures of Figure 2

Cleanroom Implementation

Once correctness verification is complete for each increment, the designs are translated into the target language, in this case PL/I. No acts of invention are permitted in the translation: hard-won design correctness must be maintained across the language representations. PDL designs are carried to a level of detail sufficient to ensure statement-to-statement mappings into PL/I. In addition, a PC-based translator was written to automate the implementation process.

It is worth noting that all development work, from specification through design and verification, was carried out on personal computers, with a simple text editor as the only development tool. That is, the specifications and designs were treated strictly as accumulating logical objects in text form, in a development process aimed at ensuring their completeness and correctness at each step. Once translation to PL/I was completed, the programs were shipped to a mainframe to begin compilation and testing under full engineering change control. In the Cleanroom approach, only the certification engineers who execute the Cleanroom tests have access to the machine. With no unit debugging, compilation during development is simply unnecessary. As a result, PC-based development with no compilation or execution capability is practical, and economical as well.

Formal tools to support mathematical specification, design, and verification will be welcome when they become available, but we believe that tools for heuristic specification, design, and trial-and-error coding, testing, and debugging are counterproductive.

Cleanroom Testing

Cleanroom testing proceeds for successive code increments by executing test cases randomized against projected user input distributions and recording the resulting inter-fail time intervals. The accumulating time intervals are used by a PC-based certification model to compute the current mean time to failure (MTTF) [Curritt 1986]. Failures are reported by certification engineers back to the software engineers.

Errors are fixed as they are found, and the code is returned to testing. For high-quality code, error frequency drops quickly in the testing and inter-fail times increase dramatically. In these cases, the certified MTTF rapidly exceeds total test time.

The MTTF values for early increments provide a scientific basis for managing development of later increments, say by allocating more effort to verification if MTTF values are too low, or even compressing schedules if the values are higher than required.

The types of errors present in Cleanroom code are very different from those of current industrial experience. The errors left behind after formal correctness verification are invariably "simple blunders," requiring little effort to find and fix. For example, an incorrect conjunction (say, an "and" where an "or" was intended) or a missing parameter on a call statement are typical errors.

The errors tend to show up quickly in early testing; it is often the case that all the errors that will ever be found occur in the first few test cases. For example, the COBOL/SF Parser Generator was brought up in four increments subjected to 120 statistically generated test cases. Twelve minor errors were found, all in the first five cases, with error-free execution from then on, now passing three years' use.

A Cleanroom project is scheduled on the basis of code increments running defect-free within a day or two of first execution. Under ten percent of project time is devoted to implementation and testing.

This experience is in sharp contrast to the deeper structural and interface errors commonly encountered with heuristic development processes. The difference reflects a synergism between mathematical verification and statistical testing: the former leaves behind simple errors that are easily found by test cases that cover the entire input distribution. Of course, it is impossible to give a foolproof proof that a program is zero-defect, but that conclusion is increasingly justified as error-free executions accumulate over months and years of use.

Cleanroom Management

Cleanroom team management is technical engineering management, not administrative management. A team manager must ensure proper engineering methodologies in team operation, and must be an active participant in high-level specification and design. Team management requires a deep understanding of formal methods, but also a deep conviction in their effectiveness. Without the courage of one's convictions, it is easy to cut corners when the going gets rough, just when the best methods are needed most.

In illustration, our Cleanroom team understands that any code that exhibits high error rates (say, 7 or 8 errors/KLOC) in early Cleanroom testing will come off the machine and go back into design and review. Such action is rarely required, but typically occurs at a time of stress, say from a tight schedule which itself contributed to the high error rates. To an observer accustomed to heuristic methods, taking code off the machine may seem foolhardy, but time spent in rethinking the formal structures will save far more time in testing later on. In Cleanroom, the primary function of testing is to certify code, not debug it.

Cleanroom team management is carried out primarily through education in software engineering methodology, day by day, in group and individual interaction. Every design decision, every review, every execution failure is an opportunity to discuss, evaluate, and improve the use of formal methods.

Evolution of Cleanroom work products through iterations of design and review is an egoless process. All errors are team errors, the result of human fallibility in formal verification: any error that survives review was missed by every team member. Conversely, Cleanroom team success is a source of pride and accomplishment that is difficult to understand without firsthand experience. When a code increment runs right the first time on a machine and every time thereafter (as has occurred many times in developing COBOL/SF), the team's satisfaction and motivation are remarkable indeed. Such performances become the "personal best" of the team, and anything
but that conclusion time on a machine and every time is increasingly justified as error-free thereafter (as has occurred many times in executions accumulate over months and developing COBOL/SF). the team years of use. satisfaction and motivation is remarkable indeed. Such performances become the "personal best" of the team. and anything

less only strengthens the resolve to improve.

Cleanroom team performance requires both depth of knowledge in formal methods and the convictions to apply them. The formal methods are based on the flexibility and precision of mathematics, not on the latest buzzwords of the moment. For example, the buzzword view of structured programming is syntactic and superficial, namely, programming with no gotos, but the mathematical view is semantic and powerful, namely, programming in an algebra of functions with verification at each decomposition step. In fact, this mathematical view defines the only known process that produces programs that are correct by construction.

One way to begin Cleanroom operations is to first form a single team whose members have been introduced to the formal methods and can educate and reinforce each other in pilot projects to develop the required depth of understanding and convictions.

The firsthand experience of team membership is a critical step in developing future team leaders, and in creating a whole new set of expectations and performance capabilities in software development. Once a team has demonstrated Cleanroom Software Engineering capability, new teams can be formed by cloning, as team members become leaders of new teams to continue the education process.

Cleanroom performance is demanding, but exhilarating too, in extending the human frontier to new levels of excellence.

Acknowledgments

It is a pleasure to acknowledge the excellent comments of Kathy Cannaday and the referees in the preparation of this paper.

References

[Boehm 1981] Boehm, B. W. Software Engineering Economics. Englewood Cliffs, N.J.: Prentice-Hall, 1981.

[COBOL/SF 1988a] COBOL Structuring Facility Re-Engineering Concepts. IBM Publication SC34-4079, 1988.

[COBOL/SF 1988b] COBOL Structuring Facility Users Guide and Reference. IBM Publication SC34-4080, 1988.

[Curritt 1986] Curritt, P. A.; Dyer, M.; & Mills, H. D. "Certifying the Reliability of Software." IEEE Transactions on Software Engineering, Vol. SE-12, No. 1 (Jan. 1986).

[Gries 1981] Gries, D. The Science of Programming. Springer-Verlag, 1981.

[Hoare 1969] Hoare, C. A. R. "An Axiomatic Basis for Computer Programming." Communications of the ACM, Vol. 12 (Oct. 1969): pp. 576-83.

[Jones 1986] Jones, C. Programming Productivity. New York, N.Y.: McGraw-Hill, 1986.

[Linger 1979] Linger, R. C.; Mills, H. D.; & Witt, B. I. Structured Programming: Theory and Practice. Reading, Mass.: Addison-Wesley, 1979.

[Mills 1986a] Mills, H. D.; & Linger, R. C. "Program Design Without Arrays and Pointers." IEEE Transactions on Software Engineering, Vol. SE-12, No. 2 (Feb. 1986).

[Mills 1986b] Mills, H. D. "Structured Programming: Retrospect and Prospect." IEEE Software (Nov. 1986).

[Mills 1987] Mills, H. D.; Dyer, M.; & Linger, R. C. "Cleanroom Software Engineering." IEEE Software (Sept. 1987).

[Mills 1988] Mills, H. D.; Linger, R. C.; & Hevner, A. R. "Box Structured Information Systems." IBM Systems Journal, Vol. 26, No. 4 (1987), pp. 395-413.
