Program Comprehension During Software Maintenance and Evolution

Anneliese von Mayrhauser and A. Marie Vans
Colorado State University

Program understanding is a major factor in providing effective software maintenance and enabling successful evolution of computer systems. For years, researchers have tried to understand how programmers comprehend programs during software maintenance and evolution. Five types of tasks are commonly associated with software maintenance and evolution: adaptive, perfective, and corrective maintenance; reuse; and code leverage. Each task type typically involves certain activities (see Table 1). Some activities, such as understanding the system or problem, are common to several tasks. To analyze the cognitive processes behind these tasks, researchers have developed several models (see Table 2).

In this article, we describe common elements of six cognition models and compare them based on their scope and experimental support. All models use existing knowledge to build new knowledge about the mental model of the software that is under consideration. Many of these models are based on exploratory experiments, and some of those models have been validated. Programmers employ various strategies and use cues in code or documentation as guidance. However, their level of expertise greatly influences how efficiently they understand the code.

Because of limited knowledge about specialized cognition needs for some maintenance tasks, the code cognition models do not represent every single task listed in Table 1. Most models assume that the objective is to understand all of the code rather than to serve a particular purpose, such as debugging. While these general models can foster a complete understanding of a piece of code, they may not always apply to specialized tasks that more efficiently employ strategies geared toward partial understanding.

We identify open questions, particularly considering maintenance and evolution of large-scale code.
These questions relate to scalability of existing experimental results with small programs, validity and credibility of results based on experimental procedure, and challenges of data availability.

Code cognition models examine how programmers understand program code. The authors survey the current knowledge in this area by comparing six program comprehension models.

Computer, August 1995, 0018-9162/95/$4.00 © 1995 IEEE

Table 1. Tasks and activities requiring code understanding.

Maintenance task   Activities
Adaptive           Understand system
                   Define adaptation requirements
                   Develop preliminary and detailed adaptation design
                   Code changes
                   Debug
                   Regression tests
Perfective         Understand system
                   Diagnosis and requirements definition for improvements
                   Develop preliminary and detailed perfective design
                   Code changes/additions
                   Debug
                   Regression tests
Corrective         Understand system
                   Generate/evaluate hypotheses concerning problem
                   Repair code
                   Regression tests
Reuse              Understand problem, find solution based on close fit with reusable components
                   Locate components
                   Integrate components
Code leverage      Understand problem, find solution based on predefined components
                   Reconfigure solution to increase likelihood of using predefined components
                   Obtain and modify predefined components
                   Integrate modified components

Table 2. Code cognition models.

Model          Maintenance activity                               Authors
Control-flow   Understand                                         Pennington [1]
Functional     Understand                                         Pennington [1]
Top-down       Understand                                         Soloway, Adelson, and Ehrlich [2,3]; Rist [4]
Integrated     Understand, corrective, adaptive, and perfective   Von Mayrhauser and Vans [5]
Other          Enhancement                                        Letovsky [6]
               Corrective                                         Vessey [7]
               Understand                                         Brooks [8]; Shneiderman and Mayer [9]

COMMON ELEMENTS OF COGNITION MODELS

The program comprehension process uses existing knowledge to acquire new knowledge that ultimately meets the goals of a code cognition task. This process references both existing and newly acquired knowledge to build a mental model of the software that is under consideration. Understanding depends on strategies. While these cognition strategies vary, they all formulate hypotheses and then resolve, revise, or abandon them.

Knowledge

Programmers possess two types of knowledge: general knowledge that is independent of the specific software application they are trying to understand and software-specific knowledge that represents their level of understanding of the software application. During the understanding process, they acquire more software-specific knowledge but may also need more general knowledge, for example, how a round-robin algorithm works. Existing knowledge concerns the programming languages and computing environment, programming principles, domain-specific architecture choices, algorithms, and possible solution approaches. If the programmer has worked with the code before, existing knowledge includes any (partial) mental model of the software.

New knowledge primarily concerns the software product. It is acquired throughout the code understanding process. This knowledge relates to functionality, software architecture, the way algorithms and objects are implemented, control, dataflow, and so on. It spans many abstraction levels, from “this is an operating system” to “variable q is incremented in this loop.” The understanding process matches existing knowledge with software knowledge until the programmers believe they understand the code. The set of matches is the mental model; it can be complete or incomplete.

Mental model

The mental model is an internal, working representation of the software under consideration. It contains static entities such as text structures, chunks, plans, hypotheses, beacons, and rules of discourse. Top-level plans refine into more detailed plans or chunks. Each chunk, in turn, represents a higher level abstraction of other chunks or text structures. A chunk’s construction combines several dynamic behaviors, including strategies, actions, episodes, and processes.

Static elements of the mental model

Text-structure knowledge includes the program text and its structure. Text-structure knowledge for understanding grows through experience and is stored in long-term memory. Pennington [1] uses text-structure knowledge to explain control-flow knowledge for program understanding. Structured programming units form text structure and organize knowledge. Examples of text-structure knowledge units include control primes (iteration or loop constructs, sequences, and conditional constructs such as If-Then-Else); variable definitions; module calling hierarchies; and module parameter definitions. The program text’s microstructure contains the actual program statements and their relationships. For example, a Begin statement starts a block of code, while a subsequent If statement indicates a conditional control structure with a particular purpose. Their relationship is such that the If is part of the block initiated by the Begin.

Chunks are knowledge structures containing various levels of text-structure abstractions. Text-structure chunks are called macrostructures, which are identified by a label and correspond to the program text’s control-flow organization [1]. For example, the microstructure for a sort includes all its statements, while the macrostructure is an abstraction of the block of code and includes only the label sort. Lower level chunks can form higher level chunks. Higher level chunks comprise several labels and the control-flow relationships between them.

Plans are knowledge elements for developing and validating expectations, interpretations, and inferences; they capture the comprehender’s attention during the program understanding task. These plans also include causal knowledge about the information flow and relationships between parts of the program. Plans are schemas (frames) with two parts: slot types (templates) and slot fillers. Slot types describe generic objects, while slot fillers are customized to fit a particular feature. Data structures such as lists or trees are examples of slot types, and specific program fragments are examples of slot fillers. These structures are linked by either a Kind-of or an Is-a relationship.

Programming plans can be high-, low-, or intermediate-level programming concepts. For example, searching, sorting, and summing algorithms as well as data-structure knowledge such as linked lists and trees are intermediate-level concepts. Iteration and conditional code segments are low-level concepts. Programming plan knowledge includes roles for data objects, operations, tests, other plans, and constraints on what can fill the roles.

Domain plans incorporate all knowledge about the problem area except for code and low-level algorithms. Domain plans apply to objects in the real world. For instance, plans to develop a software tool for designing automobiles would include schemas related to the function and appearance of a generic car. An appropriate plan would also include slots for problem domain objects such as steering wheels, engines, doors, and tires. Domain plans are crucial for understanding program functionality. Control-flow plans alone are not enough to understand aspects such as causal relationships among variables and functions. Domain plans also address the environment surrounding the software application, the domain-specific architecture, and solution alternatives.

Letovsky [6] refers to hypotheses as conjectures and defines them as results of comprehension activities (actions) that can take seconds or minutes to occur. Letovsky has identified three major types of hypotheses:

• why conjectures hypothesize the purpose of a function or design choice;
• how conjectures hypothesize the method for accomplishing a program goal; and
• what conjectures hypothesize classification, for example, a variable or function.

Degrees of certainty associated with a conjecture vary from uncertain guesses to almost certain conclusions.

Brooks [8] theorizes that hypotheses are the only drivers of cognition. Understanding is complete when the mental model contains a complete hierarchy of hypotheses. At the top is the primary hypothesis: a high-level description of the program function, which is necessarily global and nonspecific. Subsidiary hypotheses support the primary hypothesis. Hypotheses can fail for three reasons: code to verify a hypothesis can’t be found; confusion exists because one piece of code satisfies different hypotheses; or code can’t be explained.

Hypotheses drive the direction of further investigation. Generating hypotheses about code and investigating whether they hold or must be rejected is an important facet of code understanding.

Dynamic elements of the mental model

A strategy guides the sequence of actions while following a plan to reach a particular goal. For example, if the goal is to understand a block of code, the strategy might be to systematically read and understand each line of code while building a mental representation at higher and higher abstraction levels. An opportunistic strategy studies code in a more haphazard fashion.

Strategies also differ in how they match programming plans to code. Shallow reasoning [2,3] does so without in-depth analysis. Many experts do this when they recognize familiar plans. Deep reasoning [2,3] looks for causal relationships among procedures or objects and performs detailed analyses.

Strategies guide two understanding mechanisms that produce information: chunking and cross-referencing. Chunking creates new, higher level abstraction structures from chunks of lower level structures. As structures are recognized, labels replace the detail of the lower level chunks. In this way, lower level structures can be chunked into larger structures at a higher abstraction level. For example, a code segment may represent a linked-list definition as pointers and data. In an operating system definition, this may be abstracted as a ready queue. The code segment that takes a job from the ready queue, puts it into the running state, monitors elapsed time, and removes the job after expiration of the time quantum may be abstracted as a round-robin scheduler. The code segments for the queue, timer, and scheduling are microstructures. Continued abstraction of the round-robin scheduler, deadlock resolution, interprocess communication, process creation/deletion, and process synchronization eventually leads to the higher level structure definition, “process management of the operating system.”

Cross-referencing relates different abstraction levels, such as a control-flow view and a functional view, by mapping program parts to functional descriptions. For instance, recognizing that a code segment manages processes and determining its purpose says something about functionality. Hence, cross-referencing is essential to building a mental representation across abstraction levels.

Code cognition formulates hypotheses, checks whether they are true or false, and revises them where necessary. Hypotheses, like plans and schemas, exist at all levels of abstraction. The key is to keep the number of open hypotheses manageable while increasing understanding.

Facilitating knowledge acquisition

Beacons, cues that index into knowledge, can be text or a component of other knowledge. For example, a swap statement inside a loop or a procedure can be a beacon for a sorting function; so can the procedure name Sort. Beacons are useful for gaining a high-level understanding [8].
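To make the beacon idea concrete, the following minimal Python sketch shows a routine in which both the name and a swap inside a loop hint at sorting. The function name `sort_readings` and its data are hypothetical, chosen only for illustration; they do not come from the studies cited here.

```python
def sort_readings(values):
    """Return a sorted copy of values (hypothetical example routine).

    Two beacons suggest "this sorts something" before a reader traces
    the control flow: the word "sort" in the name, and the swap of
    neighboring elements inside a loop.
    """
    items = list(values)  # copy so the caller's list is left untouched
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                # Beacon: a swap statement inside a loop.
                items[i], items[i + 1] = items[i + 1], items[i]
    return items
```

A reader who spots the swap can form the hypothesis that the routine sorts its input and then verify that hypothesis, rather than reading every line before forming any expectation.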

Rules of discourse are conventions in programming such as coding standards, algorithm implementations, expected use of data structures, and so on. Rules of discourse set programmer expectations. Programming plans are retrieved from long-term memory via these expectations. Soloway, Adelson, and Ehrlich [3] show that expert programmers perform significantly better on plan-like code (code fragments that match expert programming plans) than on nonplan-like code. In practice, this means that unconventional algorithms and programming styles are much harder to understand, even for experts.

Expert characteristics

A programmer’s level of expertise in a given domain greatly affects program understanding. Experts tend to show the following characteristics:

• They organize knowledge structures by functional characteristics of the domain in which they are experts. For instance, novices might organize their understanding of a particular program according to the program syntax, whereas experts might organize program knowledge in terms of algorithms rather than the syntax used to implement the program [10].
• Experts have developed efficiently organized specialized schemas [10], often abstracted from previously designed software systems.
• Specialized schemas contribute to efficient problem decomposition and comprehension [10]. Top-down comprehension becomes feasible for problems that match specialized schemas.
• Experts approach problem comprehension with flexibility [7]. They discard questionable hypotheses and assumptions much more quickly than novices do, and they tend to generate a breadth-first view of the program. As more information becomes available, they refine their hypotheses.

COGNITION MODELS

Common elements of program cognition models occur in various theories. We discuss the most important ones below.

Letovsky model

Letovsky’s high-level comprehension model (see Figure 1) has three main components: a knowledge base, a mental model (internal representation), and an assimilation process [6]. The knowledge base contains programming expertise, problem-domain knowledge, rules of discourse, plans, and goals.

The mental model has three layers: a specification, an implementation, and an annotation layer. The specification layer, the program’s highest abstraction level, completely characterizes the program goals. The implementation layer contains the lowest level abstraction, with data structures and functions as entities. The annotation layer links each goal in the specification layer to its realization in the implementation layer. However, these links can be incomplete. The dangling-purpose unit models such unresolved links.

The assimilation process occurs either top-down or bottom-up. It is opportunistic in that the programmers proceed in the way they think yields the highest return in knowledge gain. To contribute to one of the three layers constructed in the mental representation, the understanding process matches code, documents, and so on with elements from the knowledge base.

Figure 1. Letovsky comprehension model.

Shneiderman and Mayer model

Shneiderman and Mayer’s comprehension model (see Figure 2) [9] recodes the program in short-term memory into an internal semantic representation via a chunking process. These internal semantics contain different levels of program abstraction. At the top are high-level concepts such as program goals. The lowest levels contain details such as the algorithms used to achieve program goals. Long-term memory, a knowledge base with semantic and syntactic knowledge, assists during internal semantics construction. Syntactic knowledge is programming language dependent, while semantic knowledge concerns general programming knowledge independent of any specific programming language. Like working memory, semantic knowledge in long-term memory is layered and incorporates high-level concepts and low-level details. Program understanding is directional: It starts with the program code and continues until the problem statement is reconstructed.

Figure 2. Shneiderman and Mayer comprehension model. (Labels recoverable from the figure: program; problem statement; working memory with internal semantics; long-term memory with semantic knowledge, from high-level concepts to low-level details, and syntactic knowledge.) Source: Shneiderman and Mayer [9]

Brooks model

Brooks [8] sees program comprehension as the reconstruction of the domain knowledge used by the initial developer. Understanding involves recreating the mappings from the problem domain into the programming domain, through several intermediate domains.

The problem (application) domain concerns real-world problems, for example, maintaining appropriate inventory levels. The objects are inventories, physical entities that have properties such as size, cost, and quantity. Once the physical objects are characterized, intermediate knowledge domains are required. Inventory cost might include not only the actual cost but also overhead such as storage. Accounting knowledge recognizes the appropriate overhead calculations. Knowledge of program syntax helps implementation. This example uses at least four different knowledge domains to reach the programming domain: inventories, accounting, mathematics, and programming languages.

Knowledge within each domain includes details about domain objects, the operations allowed on them, and the order in which those operations are allowed. Interdomain knowledge describes the relationships between objects in different but closely related domains, such as the relationship between the general operating system domain and the Unix operating system domain.

The mental model is built through a top-down process that successively refines hypotheses and auxiliary hypotheses. Hypotheses are iteratively refined, passing through several knowledge domains until they can be matched to specific code in the program or a related document.

Figure 3 illustrates the Brooks model. Knowledge, shown as triangles, can be used directly for hypothesis generation in the mental model or matched (mapped) from one domain to another. A cognitive process verifies that internal representations reflect knowledge contained in external representations such as code, design documents, or requirements specifications. Beacons are the main vehicle for this verification, which is also hypothesis driven. Once a hypothesis is generated, the external (or internal) representations can be searched to support it.

Figure 3. Brooks comprehension model. (Labels recoverable from the figure: domain schemas; external representations in the problem domain, including program code, user manuals, and maintenance manuals; internal representation/mental model with hypotheses and subgoals.)

Top-down model: Soloway, Adelson, and Ehrlich

Top-down understanding (see Figure 4) [2,3] typically applies when the code or type of code is familiar. The code can then be decomposed into the typical elements of that system type. Theoretically, new code could be understood entirely in a top-down manner if the programmer had already mastered code that performed the same task and was structured in exactly the same way.

This model uses three types of plans: strategic, tactical, and implementation. Strategic plans describe a global strategy used in a program or algorithm and specify language-independent actions. Tactical plans are local strategies for solving a problem; these plans contain language-independent specifications of algorithms. Implementation plans are language dependent and are used to implement tactical plans; they contain actual code fragments.

A mental model is constructed top-down. It contains a hierarchy of goals and plans. Rules of discourse and beacons help decompose goals into plans and plans into lower level plans. Typically, shallow reasoning builds the connections between the hierarchical components.

Figure 4 shows the model’s three major components. The triangles represent knowledge (programming plans or rules of discourse). For example, a subset of the rules of discourse might include the following:

• variables are updated the same way as they are initialized,
• no dead code,
• a test for a condition means that the condition must be potentially true,
• don’t do double duty with code in a nonobvious way, and
• an If is used when a statement body is guaranteed to execute only once; a While is used when the statement may need to be executed repeatedly.

The diamond represents the understanding process. The rectangles illustrate internal or external program representations. (External representations include documents such as requirements or design documents; code; user, reference, or maintenance manuals; and other miscellaneous related documents. Internal representations include plans and schemas.) The understanding process matches external representations to programming plans, using rules of discourse to select plans by setting expectations. Once a match is complete, the internal representation is updated to reflect the newly acquired knowledge. These updated mental representations are subsequently stored as new plans.

Comprehension begins with a high-level goal and then generates detailed subgoals necessary to achieve the higher level goals. Program documentation and code invoke implementation, strategic, or tactical plans, depending on the current mental representation’s focus.

Figure 4. Soloway, Adelson, and Ehrlich comprehension model. (Labels recoverable from the figure: internal representation, the current mental representation of the program as plans/schemas.)

Pennington model: bottom-up comprehension

Comprehension develops two different mental representations: a program model and a situation model. The program model is usually developed before the situation model.

PROGRAM MODEL. Pennington [1] found that when code is completely new to programmers, the first mental representation they build is a control-flow program abstraction called the program model. This representation, built from the bottom up via beacons, identifies elementary blocks of code, or control primes, in the program. Pennington uses text structure and programming plan knowledge to explain program model development. The program model is created via the chunking of microstructures into macrostructures and via cross-referencing.

Programming plan knowledge, consisting of programming concepts, exploits existing knowledge during the understanding task and infers new plans for storage in long-term memory. Least recently used (LRU) page replacement for memory management is an example of plan knowledge from the operating systems domain. Data structure knowledge may contain the implementation of a first-in, first-out (FIFO) queue.

SITUATION MODEL. This model also is built from the bottom up, as the program model creates a dataflow/functional abstraction. The model requires knowledge of real-world domains, such as generic operating system structure and functionality for the operating system domain. The situation model is complete once the program goal is reached.

Domain plan knowledge is used to mentally represent the code in terms of real-world objects, organized as a functional hierarchy. For example, the situation model describes the actual code “pcboards = pcboards - sold” as “reducing the inventory by the number of PC boards sold.” Lower order plan knowledge can be chunked into higher order plan knowledge.

A situation model is built by cross-referencing and chunking. The matching process takes information from the program model and builds hypothesized higher order plans. These new plans are stored in long-term memory and chunked to create additional higher order plans.

Figure 5 graphically represents Pennington’s model. The right half illustrates program model building, while the left half describes situation model construction. Text-structure knowledge (control primes; program structure; syntax; programming conventions; and control sequence knowledge such as sequences, iterations, or conditions) and external representations (code, design documents, and so on) are inputs to the comprehension process. Beacons can invoke a particular schema (for example, a swap operation causes the programmer to recall sorting functions). Code statements and their interrelationships form the microstructure. Microstructures are chunked into macrostructures (chunked lines of text organized by control primes). These chunks are stored in long-term memory and used to build even larger chunks.

The program model can change after situation model construction begins. A cross-reference map allows direct mapping from procedural, statement-level representations to a functional, abstract program view. Higher order plans can cause a programmer to reconsider the program model and alter or enhance it as necessary. The programmer may directly modify the text base or use these plans as input to the program model comprehension process.

Figure 5. Pennington comprehension model.

Table 3. Model evaluation criteria. Cells in each row are listed in the order Letovsky | Shneiderman and Mayer | Brooks | Soloway, Adelson, and Ehrlich | Pennington | Integrated model. A hyphen marks a blank cell.

Static: knowledge structures
  Low:           Knowledge base | Syntactic knowledge | Programming domain | Implementation plans | Text-structure knowledge | Program model structures
  Intermediate:  Knowledge base | Semantic knowledge | Intermediate domain | Tactical plans | Plan knowledge | Situation structures
  High:          Knowledge base | Semantic knowledge | Problem domain | Strategic plans | - | Top-down structures
Static: mental representations
  Low:           Implementation layer | Working memory: low-level concepts | Hypotheses and subgoals | Plans/schemas | Program model | Program model
  Intermediate:  - | - | Hypotheses and subgoals | Plans/schemas | Situation model | Situation model
  High:          Specification layer | Working memory: high-level concepts | Hypotheses and subgoals | Plans/schemas | - | Top-down model
Dynamic: processes
  Direction:     Top-down or bottom-up | Top-down or bottom-up | Top-down | Top-down | Bottom-up | Top-down or bottom-up
Experimentation
  Type:          Small-scale code experiments | Small-scale code experiments | Self-observation experiments | Small-scale code experiments | Small-scale code experiments | Large-scale code experiments

Integrated Metamodel

The integrated code comprehension model (see Figure 6) [5] has four major components: the top-down, situation, and program models and the knowledge base. The first three reflect comprehension processes. The fourth is needed to successfully build the other three. Each component involves both the internal representation of the program (or short-term memory) and a strategy to build this internal representation. The knowledge base furnishes the process with information related to the comprehension task and stores any new and inferred knowledge. For large systems, a combination of approaches to understanding becomes necessary. Therefore, the integrated model combines the top-down understanding of Soloway, Adelson, and Ehrlich [3] with the bottom-up understanding of Pennington [1]. Experiments show that programmers switch between all three comprehension models [5].

Any of the three submodels may become active at any time during the comprehension process. For example, during program model construction, a programmer might recognize a beacon indicating a common task such as sorting. This leads to the hypothesis that the code sorts something, causing a jump to the top-down model. The programmer then generates subgoals and searches the code for clues to support these subgoals. If the programmer finds a section of unrecognized code during this search, the programmer may return to program model building. Structures built by any one model component are accessible by the other two, but each model component has its own preferred knowledge types.
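The switching behavior just described can be pictured as a small state machine. The sketch below is only an illustration under stated assumptions: the event names `beacon_recognized` and `unrecognized_code` are hypothetical labels for the two switches mentioned in the text, and any other event is assumed to leave the active submodel unchanged. It is not part of the Integrated Metamodel itself.

```python
from enum import Enum


class Submodel(Enum):
    TOP_DOWN = "top-down model"
    SITUATION = "situation model"
    PROGRAM = "program model"


# Only the two switches described in the text are encoded here.
# The event names are hypothetical labels, not terms from the model.
TRANSITIONS = {
    (Submodel.PROGRAM, "beacon_recognized"): Submodel.TOP_DOWN,
    (Submodel.TOP_DOWN, "unrecognized_code"): Submodel.PROGRAM,
}


def next_submodel(current, event):
    """Return the submodel active after an event.

    Assumption: an event with no listed transition leaves the
    active submodel unchanged.
    """
    return TRANSITIONS.get((current, event), current)


def trace(start, events):
    """Return the sequence of active submodels for a list of events."""
    path = [start]
    for event in events:
        path.append(next_submodel(path[-1], event))
    return path


if __name__ == "__main__":
    steps = trace(Submodel.PROGRAM, ["beacon_recognized", "unrecognized_code"])
    print(" -> ".join(step.value for step in steps))
```

Keeping the transitions in a table rather than in nested conditionals makes it easy to add further switches, for example into or out of the situation model, if an observation study suggests them.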

Figure 6. Integrated Metamodel. (Labels recoverable from the figure: documents; top-down process; program model process; top-down structures; programming plans, including strategic plans and rules of discourse; microstructure; chunking; macrostructure; knowledge base.)
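The chunking step named in Figure 6, in which microstructure is grouped into labeled macrostructure, can be sketched as a small tree. This is a minimal illustration in Python, not a model from the literature: the `Chunk` class, its field names, and the statement strings are hypothetical, and the labels reuse the round-robin scheduler example given earlier in the article.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Chunk:
    """A labeled abstraction over statements or lower level chunks."""
    label: str
    parts: List[Union[str, "Chunk"]] = field(default_factory=list)


def statements(node):
    """Expand a chunk back into its microstructure (the raw statements)."""
    if isinstance(node, str):
        return [node]
    result = []
    for part in node.parts:
        result.extend(statements(part))
    return result


def labels_at(node, depth):
    """Return the labels visible when the tree is viewed depth levels down."""
    if isinstance(node, str):
        return []  # raw statements carry no label
    if depth == 0:
        return [node.label]
    found = []
    for part in node.parts:
        found.extend(labels_at(part, depth - 1))
    return found


# Hypothetical statement strings standing in for real program text.
ready_queue = Chunk("ready queue", ["job = queue.head", "queue.head = job.next"])
timer = Chunk("timer", ["elapsed += tick", "if elapsed >= quantum: preempt(job)"])
scheduler = Chunk("round-robin scheduler", [ready_queue, timer])
process_mgmt = Chunk("process management", [scheduler])
```

Viewing the tree at depth 0 with `labels_at(process_mgmt, 0)` shows only the top label, while `statements(process_mgmt)` recovers the underlying lines, which mirrors how labels replace detail as chunks are recognized.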

EVALUATION AND ANALYSIS

Model scope

Model evaluation uses the following criteria: First, does the model incorporate static structures that represent persistent knowledge and the system’s current mental representation? Second, does it represent dynamic processes that build the mental representation using knowledge? Third, to what extent was each model validated by experiments?

For each model, Table 3 lists the criteria, the abstraction level for each static property, the direction of dynamic model building, and the experiment type used in validation. Blank cells indicate that the model does not include the associated attribute. For example, Letovsky’s model does not have a representation for an intermediate mental representation of the system.

Evaluation

All six models accommodate a mental representation of the code, a body of knowledge (knowledge base) stored in long-term memory, and a process for combining the knowledge in long-term memory with new external information (such as code) into a mental representation. Each differs in the amount of detail for these three main components.

Letovsky’s model is the highest level cognition model, emphasizing the mental representation’s form. No details explain how the knowledge assimilation process works or how knowledge is incorporated into the mental representation. The knowledge types coincide with Soloway, Adelson, and Ehrlich’s model. Shneiderman and Mayer’s model is more detailed, since it organizes knowledge in a hierarchy and separates semantic and syntactic knowledge. Like the Letovsky model, this model focuses on the mental representation’s form but lacks details on knowledge construction.

Brooks’ model differs from the other models in that all changes to the current mental representation stem from hypotheses. The mental model is constructed in one direction only, from the problem domain to the program domain. However, we have observed that program understanding is not unidirectional [5]. Nevertheless, both the Brooks model and the Soloway, Adelson, and Ehrlich model construct the mental representation top-down, from the highest abstraction level through finer levels of detail. The Brooks model leaves the knowledge structures undefined. Although hypotheses are important cognition drivers, mental representations can be updated by other means, for example, a strategy-driven method (using a cross-referencing, systematic, or opportunistic strategy).

Pennington’s model is more detailed and includes specific descriptions of the cognition processes and knowledge. It accounts for the types, forms, and composition of knowledge to construct most of the mental representation. It also contains mechanisms for abstraction. The major drawback is its lack of higher level knowledge structures such as design or application-domain knowledge.

Soloway, Adelson, and Ehrlich’s model emphasizes the highest level abstractions in the mental model. One aspect that distinguishes this model is its top-down development of the mental model with its assumption that the knowledge it uses has been previously acquired. By itself, this model does not account for situations when code is novel and the programmer has no experience to use as a backplane in which to plug new code.

Each model represents important aspects of code comprehension, and many characteristics are shared among different models. For example, Brooks, Letovsky, and Shneiderman and Mayer all focus on hierarchical layers in the mental representations. Brooks and Soloway, Adelson, and Ehrlich use a form of top-down program comprehension, while Pennington uses a bottom-up approach to code understanding. Letovsky and Shneiderman and Mayer use both top-down and bottom-up comprehension. All five models use a matching process between what is already known (knowledge structures) and the artifact under study.

No one model accounts for all behavior as programmers understand unfamiliar code. The Integrated Metamodel responds to the cognition needs for large software systems. It combines relevant portions of the other models and adds behaviors not found in them, for example, when a programmer switches between top-down and bottom-up code comprehension.

Analysis

Cognition models and the theories behind them must be grounded in experiments that either collected the information supporting those theories or validated it. Several types of program comprehension experiments have been conducted. The appropriate type depends on the purpose of the study and the amount of existing research. Objectives range from theory building to validation of detailed portions of a theory. Studies to build a theory are typically observational. Behaviors are observed as they occur in the real world. Once a theory is built from observations, correlational studies can be designed. These studies determine enough about the behavior in question to explore relationships among variables. At the other end of the spectrum, hypothesis-testing experiments investigate cause and effect between variables. Hypothesis testing consists of carefully controlled experiments whose purpose is to validate an existing theory. Correlational studies indicate possible relationships between variables.

Let us now analyze what aspects of code cognition have been covered by existing experiments and how much we really know about how programmers understand code. Table 4 organizes code cognition experiments in terms of models, common cognition elements, code size, programming language, and subjects. For code size, “small” refers to programs of less than 900 lines of code (LOC); “medium” refers to between 900 and 40,000 LOC; and “large” refers to more than 40,000 LOC. Languages such as Cobol, Fortran, Basic, and Pascal are distinguished from state-of-the-art development environments such as C/Unix with tools like Make or Lint. Subjects are categorized as novices, grad students, or professional programmers. Each cell in the table represents the number and types of experiments that investigated the row component with the column attribute. Experiments are classified as observational (O), correlational (C), or hypothesis-testing (H). For more details on these experiments, see von Mayrhauser and Vans [5].

Many experiments validate or verify portions of the models presented here. For example, several experiments have addressed the use of plans and strategies. Results seem to support the top-down theory of program comprehension. On the other hand, only observational exper-

August 1995 rable 4. Number and type of experiments-observational (0). correlational (C), or hypothesis-testing (H)-that investigated each component with a given code cognition attribute.

Language Pascal, Subject Code size Fortran, c, Grad Component Small Medium Large Cobol environments Novice student Professional Top-down 0:l 0:l 0:2 Situation 0:l 0:l 0:l 0:l 0:2 Program 0:l 0:l 0:l 0:l 0:2 Other 0:2 0:3 0:l 0:3 Processes 0:l 0:l -0:l o:, y&&: Hypotheses 02 0:2 02 Strategies 05 0:s 0:l y‘o:;! ; 0:4 C:l Cl .g c;:r H:l H:l -Ap &I' Plans 0:l 0:l 0:l C:l 0:l C:l C:l C:l H:2 H:l H:3 H:3 H:2 Rules of discourse H:l H:l H:l H:l Chunks 0:l 0:l C:l 0:l C:l C:l C:l H:l H:l H:l Episodes 0:l 0:l 0:l 0:l 0:l 0:2 Beacons H:2 0:l 01 0:l H:l H:2 H:3 Hz2 : Text structures 0:l o:l Actions 0:l 0:t 0:l 03 I

iments have examined the use of text structures during STATICORDYTUMICBEHAVIOR. Currentresultsmainly bottom-up processing. Correlational and hypothesis- focus on the static properties of programming skills. For testing experiments are needed to lend credence to the example, experiments identified persistent knowledge situation models and program models of program areas, such as searching or sorting algorithms, but did not comprehension. investigate knowledge use or application. Table 4 clearly reveals the unexplored areas of code comprehension. Only one experiment investigating cognition THEORY BUILDING. Many experiments are designed to elements used large-scale code. Just two experiments used measure specific conditions (for example, whether or not medium-sized code. A C/Unix environment, probably the programmers use plans) but the experimental hypothe- most commonly used state-of-the art environment today, ses (for example, that programmers use plans when appeared in only three experiments. As far as purpose, understanding code they have never seen before) are not most studies investigated strategies, beacons, or plans. Few always based on a well-defined program-comprehension studies addressed processes, rules of discourse, actions, theory. Theories regarding large-scale program compre- program comprehension episodes, or entire cognition hension for specialized maintenance tasks are in their models. In addition, most were observational experi- infancy. ments-which is appropriate, since the development of program-comprehension models is still in the theory-building phase. In light of this information, program cognition PROGFMMUNDERSTANDINGISAKEYFACTORinsoftwaremain- must concentrate on three issues: scalability of experi- tenance and evolution. We have summarized the major elements, static or dynamic behavior, and theory building. 
ments of cognition, how they are represented in several different program models, and the importance of experi- SCALABILITY OF EXPERIMENTS. We mUSt investigate mentation in developing and validating model theories. whether the many well designed experiments that use Several conclusions can be drawn from this survey: small-scale code can scale up for production code. For

example, Pennington’s study of mental representations of l While a lot of important work exists, most of it centers code’ used 200.LOC programs. These studies can answer around general understanding and small-scale code.

questions about understanding small program segments l Existing results from related areas need further investi- or very small programs. However, they do not address the gation. For example, Pennington’s model borrows from interactions of isolated components of understanding or work in understanding technical reports. Perception whether these results can play an important role in under- and problem solving are also strongly related to pro- standing large programs. gram comprehension.

• Some results generalize or appear as components of larger results. For example, elements of Pennington's and Soloway, Adelson, and Ehrlich's models appear in the Integrated Metamodel.

• Data availability is a challenge. Finding expert software engineers that work on production code is difficult unless the companies that maintain large-scale code encourage their maintenance engineers to participate in program comprehension experiments. Existing experiments inadequately address these work situations.

• The literature fails to provide a clear picture of comprehension processes based on specialized maintenance tasks such as adaptive or perfective maintenance. Models of the general understanding process play an important part in developing a complete understanding of a piece of code. However, they may not always be representative for narrow tasks such as reuse or enhancements that might more efficiently employ strategies geared towards partial understanding.

A better grasp of how programmers understand code and what is most efficient and effective can spawn various improvements, such as better tools, better maintenance guidelines and processes, and documentation that supports the cognitive process.

References

1. N. Pennington, "Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs," Cognitive Psychology, Vol. 19, 1987, pp. 295-341.
2. E. Soloway and K. Ehrlich, "Empirical Studies of Programming Knowledge," IEEE Trans. Software Eng., Vol. SE-10, No. 5, Sept. 1984, pp. 595-609.
3. E. Soloway, B. Adelson, and K. Ehrlich, "Knowledge and Processes in the Comprehension of Computer Programs," in The Nature of Expertise, M. Chi, R. Glaser, and M. Farr, eds., A. Lawrence Erlbaum Associates, Hillsdale, N.J., 1988, pp. 129-152.
4. R.S. Rist, "Plans in Programming: Definition, Demonstration, and Development," Proc. First Workshop Empirical Studies of Programmers, Ablex Publishing, Norwood, N.J., 1986, pp. 28-47.
5. A. von Mayrhauser and A.M. Vans, "Comprehension Processes During Large-Scale Maintenance," Proc. 16th Int'l Conf. Software Engineering, IEEE CS Press, Los Alamitos, Calif., Order No. 5855, 1994, pp. 39-48.
6. S. Letovsky, "Cognitive Processes in Program Comprehension," Proc. First Workshop Empirical Studies of Programmers, Ablex Publishing, Norwood, N.J., 1986, pp. 58-79.
7. I. Vessey, "Expertise in Debugging Computer Programs: A Process Analysis," Int'l J. Man-Machine Studies, Vol. 23, 1985, pp. 459-494.
8. R. Brooks, "Towards a Theory of the Comprehension of Computer Programs," Int'l J. Man-Machine Studies, Vol. 18, 1983, pp. 543-554.
9. B. Shneiderman and R. Mayer, "Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results," Int'l J. Computer and Information Sciences, Vol. 8, No. 3, 1979, pp. 219-238.
10. R. Guindon, "Knowledge Exploited by Experts during Software Systems Design," Int'l J. Man-Machine Studies, Vol. 33, 1990, pp. 279-182.
11. N. Pennington, "Comprehension Strategies in Programming," Proc. Second Workshop Empirical Studies of Programmers, Ablex Publishing, Norwood, N.J., 1987, pp. 100-112.
12. A. von Mayrhauser and A.M. Vans, "Program Understanding: A Survey," Tech. Report CS-94-120, Colorado State University, Fort Collins, Col., 1994.

Anneliese von Mayrhauser is a professor of computer science at Colorado State University. She is also the director of the Colorado Advanced Software Institute, an organization of the state of Colorado for technology transfer research. Her research interests focus on software engineering, particularly testing and maintenance. Von Mayrhauser received a Dipl. Inf. degree from the University of Karlsruhe and an AM degree and PhD from Duke University, all in computer science. She is the author of a software engineering text and over 80 journal and conference publications. She is the IEEE Computer Society's vice president for publications.

A. Marie Vans is a PhD candidate at Colorado State University. As a software research engineer for Hewlett-Packard, she developed specifications and tests for computer hardware. Her research interests include empirical studies of programmers and design of experiments for software engineering. Vans received an MS degree from Colorado State University and a BS degree from California State University at Sacramento, both in computer science. She is a member of Upsilon Pi Epsilon.

Readers can contact the authors at the Department of Computer Science, Colorado State University, Fort Collins, CO 80523; e-mail {avm, vans}@cs.colostate.edu.

Computer, August 1995