
COMPUTING PRACTICES

Software Cultures and Evolution

Working effectively with legacy software requires software engineers to view it in the context in which its developers created it.

Václav Rajlich, Wayne State University
Norman Wilde and Michelle Buckellew, University of West Florida
Henry Page, Micro Systems

To work effectively with legacy code, software engineers need to understand a legacy program’s culture: the combination of the programmer’s background, the hardware environment, and the programming techniques that guided its creation.

Software systems typically pass through a series of stages [1]. During the initial development stage, software developers create a first functioning version of the code. An evolution stage follows, during which development efforts focus on extending system capabilities to meet needs. During the servicing stage, only minor repairs and simple functional changes are possible. In the phase-out stage, the system is essentially frozen, but it still produces value. Finally, during the close-down stage, the developers withdraw the system and possibly replace it.

Most of the tasks in the evolution and servicing stages require program comprehension, that is, understanding how and why a software program functions in order to work with it effectively. Effective comprehension requires viewing a legacy program not simply as a product of inefficiency or stupidity, but instead as an artifact of the circumstances in which it was developed. This information can be an important factor in determining appropriate strategies for the program’s transition from the evolution stage to the servicing or phase-out stage.

DEFINING SOFTWARE CULTURE

Edward B. Tylor [2] defined human culture as “...that complex whole which includes knowledge, belief, art, morals, law, custom, and any other capabilities and habits acquired by man as a member of society.” The programmers who created legacy software used the knowledge and customs appropriate for the time and circumstances in which they were working. Just as an anthropologist needs background information to correctly interpret human cultures, software engineers understand a given piece of legacy software better when they know about the environment in which developers produced it.

Imagine a software engineer attempting to reengineer the statements in the example in Figure 1, drawn from Convert, a Fortran program from the 1970s. A recent computer science graduate raised on Unix or Windows would recognize that the program reads in some parameters and then prints them out. But the first WRITE statement, which uses the 9000 FORMAT, would appear meaningless. Why does a line containing just the digit “1” precede the output? A programmer from the 1970s, however, is likely to recognize this as output for a line printer. Such printers typically used the first column of output as a carriage control character, interpreted by special hardware and used with perforated paper tape to control printing. A “1” typically forced a skip to the top of a new page, a “0” forced a double space, and so on. This convention was reasonable given existing hardware because it reduced the number of lines transmitted from computer to printer.

This is a typical example of the conventions that affect comprehension of legacy programs. Although programmers in modern computer installations rarely encounter line printers, successful reengineering of such code requires knowledge of the line printer convention. Reengineering this program could require discarding the first character of every record.

Changes in programmer background, hardware
properties, problem-solving techniques, software processes, and programming practices have resulted in several major shifts over the history of mainstream software development. The concept of what constitutes a “good” program has changed radically over time.

Figure 1. Short quiz for recent computer science graduates: What does the first line of this code do?

      WRITE (IOUT,9000)
      READ (45,9010,IOSTAT=IOS) IV,IDONT,ISORT,IASCI,KPRNT,IBIN,FACT,
     * PTSS,PSTT
      WRITE (IOUT,9020) IV,IDONT,ISORT,IASCI,KPRNT,IBIN,FACT,PTSS,PSTT
 9000 FORMAT ('1')
 9020 FORMAT (' IV = ',I5,' IDONT = ',I5,' ISORT = ',I5,
     * ' IASCI = ',I5,' KPRNT = ',I5/,' IBIN = ',I5,' FACT = ',
     * F5.1,' PTSS = ',F5.1,' PSTT = ',F5.1)

24 Computer 0018-9162/01/$10.00 © 2001 IEEE

CULTURE CONTRASTS

Our case studies demonstrate that the software culture strongly affects both the comprehension and evolution of software for large systems [3]. In these studies, we used a legacy Fortran system from the 1970s and a more modern C application from the 1990s.

The Fastgen geometric modeling system is a suite of Fortran 77 programs that developers can use to construct models of solid objects such as vehicles and aircraft from primitives such as triangles, spheres, cylinders, donuts, boxes, wedges, and rods. The US Air Force uses Fastgen to model the interactions between weapons and targets by tracing rays representing explosions or projectiles [4]. Convert, a Fastgen program that consists of a single Fortran 77 source file of 2,335 lines (raw line count), is representative of the 1970s Fortran software culture. Originally developed in 1978 for a mainframe environment, Convert expands simplified geometric model input and transforms models into the formats other tools require. Software developers have updated and maintained Convert many times to keep pace with the introduction of different hardware platforms.

NCSA Mosaic [5], one of the first widely distributed and used Web browsers, was developed at the National Center for Supercomputing Applications at the University of Illinois during the mid-1990s. After four years of work and several released versions, NCSA suspended work on this application in 1997. However, Mosaic is still publicly available (http://www.ncsa.uiuc.edu), and programmers still use it for case studies of program comprehension techniques. Mosaic, which is fairly representative of the mid-1990s C culture, is a single program consisting of approximately 100,000 lines of well-structured, fairly well-commented C code in approximately 180 C source files.

Table 1 summarizes the main differences between Fastgen and Mosaic.

System partitioning

Fastgen programs share files of intermediate results. Convert typically preprocesses a geometric model and then feeds the output into other programs for simulation, analysis, or display. Early software often used this kind of partitioning, obviously motivated by the limited memory available in most computers of the time. This partitioning places a greater burden on the user, who must execute each program individually and run them in the correct order with the correct parameters. One benefit is that the developers wrote very accurate user documentation for the individual Fastgen programs.

In comparison, 1990s developers implemented Mosaic as a monolithic program in which users select options from a menu, or the software executes them in response to input.

Modularity

In keeping with modern practices, Mosaic’s developers implemented a large number of C functions,
typically with high cohesion. Relatively short descriptions summarize each function’s purpose. In contrast, Fastgen’s developers implemented it before programmers accepted the importance of cohesion. Subroutines are large, and they mix many unrelated issues, which makes comprehension difficult because the subroutines no longer serve as logical chunks of program code. Convert holds most of the data in large global common blocks that many subroutines share, as Figure 2 shows [6]. In comparison, Mosaic uses few global variables, and those it does use are closely tied to the relevant processing.

Table 1. Summary of Fastgen and Mosaic cultural differences.

| System | Partitioning | Modularity | Control flow | Obsolete program plans |
| Fastgen (1970s Fortran) | System of programs that share files | Large noncohesive subroutines; global data in large common blocks | Unstructured, tangled, many GO TOs | Input/output in batches; scratch files; data packed into integers; binary files |
| Mosaic (1990s C) | Single monolithic program | Small cohesive functions; little use of global data | Structured | |

Figure 2. Common subroutine use in Convert. In this display from the Vifor Fortran analyzer program, subroutines appear in the left column and the program’s common blocks appear in the right column. A line indicates that the subroutine references the block. [Figure: Vifor browser display of convert3.f.]

September 2001 25

Convert’s poor modularity had a substantial impact on program comprehension. In C, programmers use functions as the basic chunks for program comprehension: understanding each function is a step in the comprehension methodology. Programmers could not do this with Convert because the subroutines lack the necessary cohesion.

Control flow

Although modern control flow structures are available in Fortran 77, Convert relies extensively on GO TO. The developers may have implemented the first version of Convert in an earlier version of Fortran and then ported it to Fortran 77. The frequent use of GO TOs complicates reconstructing program plans and makes code reading difficult, particularly for today’s programmers, who are accustomed to structured control flow. For example, the main loop that reads and processes a geometric model’s components uses GO TOs that jump both forward and backward over hundreds of lines of code. This creates a complex structure that seems at first to be totally arbitrary. In contrast, Mosaic expresses program plans in structured control flow constructs and rarely uses C’s goto.

Obsolete program plans

Obsolete program plans solve hardware and operating system problems that no longer exist. The line printer carriage control character is one example, but Convert provides others.

Input/output batching. Convert reads and processes geometric models in batches of 200 records. Batching made I/O in old mainframe operating systems more efficient because the system swapped a job that was doing I/O out of memory. A simple read-process-read-process cycle was very inefficient because it caused multiple swaps. However, I/O batching provides little benefit on more modern PC systems. The batch cycle complicates the code because processing breaks at the arbitrary 200th record, not at any logical point such as the end of a model component. Consider how this might affect an attempt to reengineer Convert. The software engineer who does not understand the plan’s purpose might continue using the 200-record cycle, just in case it conceals some important functionality of the program.

Scratch files. Convert opens at least seven scratch files for intermediate results as soon as it starts processing. Because old computers had limited local memory, Convert stores all data of large or unknown size, such as parts of a geometric model, on a scratch tape, and continually rewinds and rewrites these tapes. This greatly complicates the understanding of Convert’s “mirroring” feature. Mirroring simplifies input of a geometric model by allowing the user to input just one half of a symmetric component, while the system generates the other half. The program writes the component to be mirrored to the scratch tape and then reads it back in. On the second pass, the system reexecutes much of the Convert code, but this time with data from the scratch tape instead of the original input. Understanding mirroring relies on understanding the scratch tape plan.

Packing data into an integer. In early computers, input data was largely constrained to punched cards, which imposed a limit of 80 characters per record. To save record space and main memory, programmers sometimes used complicated encodings to pack data together. For example, Convert packs the following data into a single integer for each geometric component:

• First digit: a code for the kind of component (triangle, donut, sphere, box, and so on);
• Second and third digits: the component thickness in hundredths of an inch;
• Fourth digit: a space code, to specify what is adjacent to the component; and
• Sign: the modeling mode (volume or plate).

Convert contains code that picks this integer apart to extract the individual items.
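A reengineered version would probably replace the packed integer with a named record. The following is a minimal sketch of that idea, written in Python rather than Convert’s Fortran, and it is not Convert’s actual code. It assumes the four digits and the sign are laid out exactly as in the list above. The names ComponentCode, unpack_component, and pack_component are hypothetical. The text does not say which sign denotes which modeling mode, so the sketch only records the sign and leaves its interpretation open.

```python
from typing import NamedTuple


class ComponentCode(NamedTuple):
    """Named fields for the data that the legacy code packs into one integer."""
    kind: int             # first digit: code for the kind of component
    thickness_in: float   # second and third digits, converted to inches
    space: int            # fourth digit: what is adjacent to the component
    negative_sign: bool   # sign carries the modeling mode (mapping assumed unknown)


def unpack_component(packed: int) -> ComponentCode:
    """Split a signed code of up to four digits into its separate fields."""
    magnitude = abs(packed)
    if magnitude > 9999:
        raise ValueError(f"expected at most four digits, got {packed}")
    kind, rest = divmod(magnitude, 1000)   # leading digit, remaining three
    hundredths, space = divmod(rest, 10)   # middle two digits, last digit
    return ComponentCode(kind, hundredths / 100, space, packed < 0)


def pack_component(code: ComponentCode) -> int:
    """Inverse of unpack_component, for writing data back in the legacy form."""
    hundredths = round(code.thickness_in * 100)
    magnitude = code.kind * 1000 + hundredths * 10 + code.space
    return -magnitude if code.negative_sign else magnitude


if __name__ == "__main__":
    code = unpack_component(-3251)
    print(code)
    assert pack_component(code) == -3251
```

Keeping the inverse function alongside the decoder is a deliberate choice in this sketch: other Fastgen programs read Convert’s output, so a reengineered Convert might still need to emit the packed form at its boundaries even if it uses named fields internally.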

Binary input/output. A primary use of Convert is to convert a geometric model from ASCII to binary format for use by other programs in the Fastgen suite. Because the models can be quite large, using binary input reduced both storage space and I/O time significantly on earlier machines. However, this is less beneficial for current PCs because much more memory is available.

The programmers who implemented Convert solved legitimate problems of their time, and often solved them well. However, hardware and operating system properties have continued to change so radically that these plans are now an obsolete and dysfunctional reminder of the past. Mosaic also contains many complicated program plans, but they do not appear to be as obsolete at the moment. In a few years, Mosaic too is likely to appear strange and dated to newly trained programmers.

RECENT CULTURE CHANGES

Antipatterns are an example of a recent change in programming culture. William Brown and colleagues [7] described a collection of antipatterns that programmers should avoid. Controversially, one of the antipatterns is functional decomposition, which was widely used in software architecture as late as the early 1990s. The Mosaic system we describe uses functional decomposition. Brown’s book clearly states that functional decomposition is a bad strategy and that new projects should not use it. Of course, this offers little help to the programmer who must maintain a program from the functional decomposition culture.

Harry Sneed [8] argued that program code is the result of the programmer’s thought patterns, which in turn depend on the programmer’s experience. Understanding and maintaining the programs requires deciphering these patterns.

Code decay, the accumulation of factors that make software changes difficult, drives a system from the evolution stage to the servicing stage. Loss of expertise is the main reason for code decay, and culture change is a major contributor. Recent computer science graduates invariably received training in the C/Unix, object-oriented, or component-based culture, and remain unaware of programming methods used several decades ago. Older programmers may still retain knowledge of these methods, but as time progresses, this source of expertise is disappearing.

Yet organizations still use programs implemented decades ago, and, as the Y2K problem showed, in some cases these programs are mission-critical. The large investment these systems represent makes continuing evolution a necessity, but understanding the code becomes progressively more difficult. Consequently, organizations are forced into a servicing or phase-out strategy that emphasizes minimal change, with a potentially high cost of losing business flexibility.

Sometimes software culture differences are so large that programmers who are evolving old programs must learn the old cultures. As a first step, programmers need to document, through interviews and code study, some of the main practices that were considered acceptable in different periods and environments. To prepare newly trained programmers for encountering legacy code written for a very different environment, their education should help them develop an appreciation and tolerance for old cultures.

Programmers sometimes try to reengineer old code to bring it up to date. However, simply translating an old program into a new language or restructuring the code won’t help. Obsolete program plans that solve problems that no longer exist are a troublesome cultural difference that translation or restructuring won’t solve. Because these plans are variable and have complicated interactions, programmers cannot identify and remove them automatically. Documentation of the old culture will help programmers who must identify and remove these plans manually.

The problem of software cultural change will not go away in the future. For example, the National Aeronautics and Space Administration is planning very long term space probes, which will have missions lasting 50 years or more. Such space vehicles will undoubtedly require a large volume of software, both onboard and as ground support. Presumably, programmers will use today’s best software practices for these programs, but it would be foolish to assume that the software will not require major evolution over such a time period. In the year 2050, where will NASA find trained software engineers willing to immerse themselves in the archaic ways of the early 21st century?

Acknowledgment

This work was partially supported by the Air Force Office of Scientific Research under grant number F49620-99-1-0057.

References

1. V.T. Rajlich and K.H. Bennett, “A Staged Model for the Software Life Cycle,” Computer, July 2000, pp. 66-71.
2. E. Tylor, Primitive Culture, Harper and Row, New York, 1958.
3. N. Wilde et al., “A Case Study of Feature Location in Unstructured Legacy Fortran Code,” Proc. 5th European Conf. Software Maintenance and Reengineering, IEEE CS Press, Los Alamitos, Calif., 2001, pp. 68-76.
4. E.D. Aitken et al., A Guide to Fastgen Target Geometric Modeling: User’s Manual, ASI Systems, Fort Walton Beach, Fla., 1993.
5. NCSA Group, “NCSA Mosaic Home Page,” http://www.ncsa.uiuc.edu/SDG/Software/Mosaic (current Aug. 2001).
6. V. Rajlich et al., “Vifor: A Tool for Software Maintenance,” Software Practice and Experience, Jan. 1990, pp. 67-77.
7. W.J. Brown et al., Antipatterns: Refactoring Software, Architectures, and Projects in Crisis, John Wiley & Sons, New York, 1998.
8. H. Sneed, “Human Cognition of Complex Thought Patterns: How Much Is Our Perception of the Present Determined by Our Experience of the Past?” Proc. 6th Int’l Workshop on Program Comprehension (IWPC 98), IEEE CS Press, Los Alamitos, Calif., 1998, http://cds.unina.it/~iwpc98/keynote.html.

Václav Rajlich is a full professor and former chair in the Department of Computer Science at Wayne State University. His research interests include software change, evolution, comprehension, and maintenance. Rajlich received a PhD in mathematics from Case Western Reserve University. He is a member of the IEEE Computer Society and the ACM. Contact him at [email protected].

Norman Wilde is a full professor of computer science at the University of West Florida. His research interests include software maintenance and program comprehension. Wilde received a PhD in mathematics and operations research from the Massachusetts Institute of Technology. He is a member of the IEEE and the IEEE Computer Society. Contact him at nwilde@uwf.edu.

Michelle Buckellew is a software engineer at Lockheed Martin Integrated Systems in Orlando, Fla. Her work involves developing software for the Joint Air-to-Surface Standoff Missile (JASSM). She received an MS from the University of West Florida. Contact her at Michelle.Buckellew@lmco.com.

Henry Page is a senior software engineer at Micro Systems, Fort Walton Beach, Fla. Page received an MS in software engineering from the University of West Florida. He has participated in several case studies in software maintenance and program comprehension. Contact him at [email protected].
