A Software Science Model of Compile Time
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 15, NO. 5, MAY 1989 543

WADE H. SHAW, JR., SENIOR MEMBER, IEEE, JAMES W. HOWATT, ROBERT S. MANESS, AND DENNIS M. MILLER

Abstract: Halstead’s theory of software science is used to describe the compilation process and generate a compiler performance index. A nonlinear model of compile time is estimated for four Ada compilers. A fundamental relation between compile time and program modularity is proposed. Issues considered include data collection procedures, the development of a counting strategy, the analysis of the complexity measures used, and the investigation of significant relationships between program characteristics and compile time. The results suggest that the model has a high predictive power and provides interesting insights into compiler performance phenomena. The research suggests that the discrimination rate of a compiler is a valuable performance index and is preferred to average compile time statistics.

Index Terms: Ada compilers, compile time, performance indexes, software science.

Manuscript received February 11, 1987; revised August 3, 1987.
W. H. Shaw, Jr. and J. W. Howatt are with the Department of Electrical and Computer Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433.
R. S. Maness was with the Department of Electrical and Computer Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433. He is now with the Air Force Satellite Control Facility, Peterson AFB, CO.
D. M. Miller was with the Department of Electrical and Computer Engineering, Air Force Institute of Technology, Wright-Patterson AFB, OH 45433. He is now with the Air Force Operational and Test Center, Kirtland AFB, NM.
IEEE Log Number 8926735.
U.S. Government work not protected by U.S. Copyright.

I. INTRODUCTION

Technological advances in computer software are changing the way we understand the underlying processes governing software design. Computer systems are becoming more numerous, more complex, and deeply embedded in our society. Inherent in this explosion of technology exist questions concerning fundamental relationships between the processes of problem definition, algorithm selection and coding, and translation into an executable image. We can no longer write programs, but must “engineer” software for our systems to offset the rising cost of software development [1].

The Department of Defense (DoD) recognized this challenge in the 1970s and realized that a new standard language could be created to encourage the use of modern software engineering principles [2]. With the introduction of the Ada programming language for DoD, software engineering tools are needed to evaluate the performance and reliability of this language. Ada was developed under sponsorship of the DoD to support development of software for embedded computer systems. “By definition, an embedded computer system is one that forms a part of a larger system whose purpose is not primarily computational, such as a weapons system or a process controller. Physically, an embedded system may range from a single microcomputer to a network of large computers” [2]. For example, one area of use will be in the field of avionics. In the development of avionics software, efficient compilers are needed. As more Ada compilers become available, tools are needed to validate and evaluate these compilers to determine which, if any, could best meet DoD requirements. One measure of interest in compiler comparisons is the computer time required to translate source code into executable machine code.

Currently, benchmark test suites are used; however, they have a poor reputation because the performance figures are sometimes cited out of context and overgeneralized into overall ratings [3]. What is needed is an approach that provides insight into the effect of intrinsic software characteristics on compile time. Researchers, such as Maurice Halstead [4], [5], have raised questions about the existence of fundamental principles that govern the design and execution of software. Halstead’s goal was to develop objective measures of programming time and effort to make sound judgments about software quality and complexity.

The motivation for this research is threefold. First, the application of software science to the compiling problem is a straightforward extension of the theory. The degree to which concepts proposed in software science can be used to explain compile time represents reinforcement of the basic tenets offered by Halstead. Second, software science may, in fact, produce insight into the physical process of compilation. Clearly, compile time is a relatively minor aspect of compiler performance. Nevertheless, variation in source code characteristics such as operator/operand frequency is manifested in varying compile times, which represent a phenomenon that is not fully understood. Any relationships uncovered by mapping characteristics of source code to a model of compile time offer some evidence of a natural process. Finally, the use of a compile time model allows direct comparison of alternative compiler implementations as well as comparison of target architectures. Use of simple averages does not yield as sensitive a metric as models specifically designed to reduce the error component always present in performance measurement. A model lends structure to the problem of compiler/machine comparisons so that statistical tests can be used with a known precision.

We propose to apply the fundamental concepts of Halstead’s software science theory to determine if an extension of his theory could be used to explain compile time and evaluate Ada compilers [6], [7]. Specifically, the research hypotheses are as follows.

1) There is no variability in compile time for Ada programs which can be explained by the relationships proposed in software science.
2) There is no variability in the performance of the software science model of compile time attributable to characteristics of the program. (We develop these characteristics later.)
3) There is no variability in the performance of the software science model explained by alternative compiler/computer systems.

These hypotheses allow for the development and testing of the applicability of software science in three ways. First, how well does the model predict compile time and what fundamental relationships exist between compile time and the software metrics proposed by Halstead? Second, is there a difference in the model’s performance across various categories of Ada code (such as high versus low percentage control flow code)? Third, can performance differences between compilers and machines be detected using software science measures and, if so, can a performance measure be developed?

The next section presents the theory applicable to this investigation to provide a background for the compile time model. The research methodology and experimental design used to evaluate the compiler model are presented as well as results of the experiment for four compilers. Finally, we summarize the results and conclude the paper with comments on the applicability of the compile time model.

II. HALSTEAD’S SOFTWARE SCIENCE THEORY

In his classic work on software science [5], Halstead attempted to define and measure the complexity of software by analyzing program source code. Halstead defined four basic metrics computable from the code:

$n_1$ = the number of unique operators,
$n_2$ = the number of unique operands,
$N_1$ = the total number of occurrences of operators,
$N_2$ = the total number of occurrences of operands.

Using these basic metrics, Halstead defined the vocabulary $n$ of a program to be the total unique tokens:

$$n = n_1 + n_2 \tag{1}$$

The length $N$ of a program is the total number of token occurrences:

$$N = N_1 + N_2 \tag{2}$$

and the volume $V$ of a program is

$$V = N \log_2 n \tag{3}$$

where $V$ has a unit of measurement in bits. That is, $\log_2 n$ bits are needed to distinguish each of the $n$ tokens in a program.

An algorithm may be implemented by many different, but functionally equivalent programs. When an algorithm is implemented in its most succinct form, then its potential volume $V^*$ is

$$V^* = (2 + n_2^*) \log_2 (2 + n_2^*) \tag{4}$$

where $n_2^*$ is the number of input/output parameters and “2” is the number of required operators: the procedure/function name and the parameter list grouping operator. This represents the size of the program if it existed as a built-in function or procedure call. Halstead then argued that the amount of time required to implement an algorithm is directly proportional to the square of the program volume ($V$) divided by the potential volume ($V^*$) and a constant ($S$):

$$T = \frac{V^2}{S V^*} \tag{5}$$

The constant $S$ represents the speed of the programmer or the number of mental discriminations per unit of time. Halstead used a value of 18 because in his experiments, 18 gave him the best results when comparing actual versus predicted programming time. We use the parameter $S$ (denoted $K$) as a measure of compilation rate; it therefore represents a performance index.

III. RESEARCH METHOD

Halstead’s programming time equation serves as the basic theoretical model. The equation is specified as a set of independent variables related by a set of parameters to be estimated. The dependent variable is the actual CPU time required for the compilation process. The volume ($V$) and the potential volume ($V^*$) are the independent variables. Placing (5) in parameter form yields

$$T = K V^a (V^*)^b \tag{6}$$

This equation has exactly the form of Halstead’s time equation, where $K$ is the discrimination rate, $a$ is 2, and $b$ is $-1$. $K$ is assumed to have the same meaning in the compilation process as the constant $S$ in Halstead’s equation for predicting programmer time. $K$ represents how fast the compiler does its job (the processing rate) and will depend on the compiler architecture and the efficiency of the compiler itself. Clearly, $K$ can be interpreted as a performance index given that $a$ and $b$ are known.
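
The quantities in (1) and (4) through (6) are easy to check by hand or in a few lines of code. The following is a minimal Python sketch and not the authors' tooling: the function names and the token counts are hypothetical, and the volume is computed with the standard Halstead definition, V = (N1 + N2) log2(n1 + n2).

```python
import math


def vocabulary(n1, n2):
    """Eq. (1): unique operators plus unique operands."""
    return n1 + n2


def volume(n1, n2, N1, N2):
    """Standard Halstead volume in bits: total tokens times log2(vocabulary)."""
    return (N1 + N2) * math.log2(vocabulary(n1, n2))


def potential_volume(n2_star):
    """Eq. (4): n2_star is the number of input/output parameters."""
    return (2 + n2_star) * math.log2(2 + n2_star)


def halstead_time(V, V_star, S=18.0):
    """Eq. (5): T = V^2 / (S * V*), with Halstead's S = 18 as the default."""
    return V ** 2 / (S * V_star)


def compile_time_model(V, V_star, K, a=2.0, b=-1.0):
    """Eq. (6): T = K * V^a * (V*)^b, with K, a, b as free parameters."""
    return K * V ** a * V_star ** b


# Hypothetical counts for a toy program (illustration only, not measured data).
V = volume(n1=4, n2=4, N1=8, N2=8)
V_star = potential_volume(n2_star=2)
T = halstead_time(V, V_star)
print(V, V_star, T)  # prints: 48.0 8.0 16.0

# With a = 2 and b = -1, eq. (6) gives the same number as eq. (5)
# only when K is numerically equal to 1 / S.
T_model = compile_time_model(V, V_star, K=1.0 / 18.0)
```

With these toy counts the volume is 48 bits and the potential volume is 8 bits, so (5) gives T = 16 for S = 18. Evaluating (6) with a = 2 and b = -1 reproduces that value when K is set to 1/S, which may be worth keeping in mind when comparing estimated values of K against Halstead's constant.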