
Measurements to Manage Software Maintenance

George E. Stark The MITRE Corporation

Software maintenance is central to the mission of many organizations. Thus, it is natural for managers to characterize and measure those aspects of products and processes that seem to affect cost, schedule, quality, and functionality of a software maintenance delivery. This article answers basic questions about software maintenance for a single organization and discusses some of the decisions made based on the answers. These examples form a useful basis for other software maintenance managers to make better decisions in the day‐to‐day interactions that occur on their programs.

Software maintenance, the process of modifying existing operational software, is a critical topic in today's industry [1]. Maintenance consumes between 60 percent and 80 percent of a typical product's total software lifecycle expenditures [2-4], and over 50 percent of programmer effort is dedicated to it [5]. Given this high cost, some organizations view their maintenance processes as an area in which to gain a competitive advantage [6]. Several authors note that maintenance of software systems intended for a long operational life poses special management problems [7-9]. The Software Engineering Institute (SEI) believes that organizational processes greatly affect the predictability and quality of software [10], while J. Arthur and R. Stevens explain the importance of documentation to software maintenance [11]. Additionally, M. Hariza et al., B. Curtis, and C. Yuen conclude that programmer experience is at least as important as code attributes in determining software maintenance complexity [9,12,13]. Thus, software maintenance planning and management should be formalized and quantified.

In 1994, the Missile Warning and Space Surveillance Sensors (MWSSS) program management office took over the maintenance effort on the seven products upon which this article is based. Operating at 10 locations worldwide, these products total 8.5 million source lines of code written in 22 different languages. Some systems are more than 30 years old; the newest system became operational in 1992. To understand and manage the software maintenance effort, we instituted a measurement program based on V. Basili's Goal-Question-Metric paradigm [14] and other reported measurement experiences [15-19, 23]. Our goals are to improve customer satisfaction and meet our commitments to cost and schedule. The program questions relate to the size and type of workload we handle, the amount of wasted effort, the cost and schedule of a release, and the quality of the release. Table 1 contains the results.

Goal | Question | Metric(s)
Maximize Customer Satisfaction | How many problems affect the customer? | Current Change Backlog; Software Reliability
 | How long does it take to fix an Emergency or Urgent problem? | Change Cycle Time from Date Approved and from Date Written
Minimize Cost | How much does a software maintenance delivery cost? | Cost per Delivery
 | How are the costs allocated? | Cost per Activity
 | What kinds of changes are being made? | Number of Changes by Type
 | How much effort is expended per change type? | Staff Days Expended/Change by Type
 | How many invalid change requests are evaluated? | Percentage of Invalid Change Requests Closed Each Quarter
Minimize Schedule | How difficult is the delivery? | Complexity Assessment; Software Computer Resource Utilization
 | How many changes are made to the planned delivery content? | Percentage Content Changes by Delivery
 | Are we meeting our delivery schedules? | Percentage of On-time Deliveries

Table 1. Software maintenance goals, questions, and metrics.

Notice that the questions can support more than one goal. For example, the question "Are we meeting our delivery schedules?" supports the goal of customer satisfaction as much as the goal to minimize schedules. For brevity, Table 1 shows only one occurrence. Engineers and managers in our organization use the metrics information in three ways:

Direct attention - what problems do they need to address?
Solve problems - which choice should they make?
Keep score - how are they doing?

This article examines software maintenance in this context, summarizing the MWSSS software maintenance process, showing which process and product changes are under consideration, and describing some limitations to our measurement program.

The Software Maintenance Process

Figure 1 depicts the MWSSS software maintenance process. Using a problem report, a user identifies a desired change, e.g., a change in requirements or a need to fix an improper function. An analyst at the user's location reviews the problem report for completeness, checking it against known problem reports to eliminate duplication and to determine if the user correctly understood the system. If the problem was previously unknown but correctly identified, the analyst categorizes the problem report as either a software, hardware, or communications equipment problem. For software problems, a software change form (SCF) is generated. The analyst completes 18 SCF data items, including dates for submission and approval, current method, proposed change, justification, and resource estimates.

Figure 1. Software maintenance process.

The analyst then sends the completed SCF to a user board for validation and to a software maintenance engineer for independent evaluation. The maintenance engineer evaluates the SCF based on three criteria: (1) effort required to complete the change, (2) computer resources required, and (3) impact on the quality of service. The engineer makes the effort estimate based on the taxonomy of change types described later. If the independent estimate differs from the user's original estimate by more than 20 percent, the user and engineer meet to resolve the difference. This often helps clarify the requirement.


Another interfacing system or the maintenance team can also generate SCFs that affect systems for which they are not directly responsible but with which they must interface. These may result from technical initiatives. For example, if system A is upgrading its communication protocol and it exchanges information with system B, people from system A can generate an SCF for system B to upgrade its communications protocol.

Next, the user board categorizes the SCF as either a modification or a fix. A modification is a change that is not part of the current system requirements. A fix is a fault correction. The board also recommends a priority for the SCF:

Emergency - the SCF is necessary to avoid system downtime or high-priority mission requirements dictate it.
Urgent - the SCF is needed in the next software delivery for an upcoming mission or to fix a problem that arose as a result of a change in operations.
Routine - all other SCFs, e.g., unit conversions, default value changes, and printout fixes.

Next, the SCF is queued for incorporation into a version release. The users, maintenance team, and system engineering organization then negotiate the version content. This planning and approval process includes a review of several metrics: release complexity, software reliability, software maintainability, and computer resource utilization. The maintenance configuration control board then reviews the release plan and schedules the release. The software engineering team then completes the design, code, test, installation, and quality assurance of the release, with user and system engineering reviews at milestone dates throughout the release.

The metrics in Table 1 reflect the four information products in Figure 1 over which our management team has control: requirements validation, independent costing, delivery content definition, and release implementation. The earlier phases in the process, although important, have little effect on our organization; thus, we do not measure them.
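To make the categorization and estimate reconciliation concrete, here is a minimal sketch of how an SCF record might be represented. The field names, enums, and helper method are illustrative assumptions for this article, not the actual MWSSS form, which carries 18 data items.

    from dataclasses import dataclass
    from enum import Enum

    class ChangeType(Enum):
        MODIFICATION = "modification"   # change outside the current system requirements
        FIX = "fix"                     # fault correction

    class Priority(Enum):
        EMERGENCY = "emergency"         # needed to avoid downtime or for high-priority missions
        URGENT = "urgent"               # needed in the next software delivery
        ROUTINE = "routine"             # all other changes

    @dataclass
    class SoftwareChangeForm:
        scf_id: str
        change_type: ChangeType
        priority: Priority
        user_estimate: float            # staff-days, from the submitting site
        independent_estimate: float     # staff-days, from the maintenance engineer

        def needs_reconciliation(self, tolerance: float = 0.20) -> bool:
            """True if the two effort estimates differ by more than the tolerance."""
            return abs(self.independent_estimate - self.user_estimate) > tolerance * self.user_estimate

For example, an SCF with a user estimate of 20 staff-days and an independent estimate of 30 staff-days would be flagged, prompting the user and engineer to meet.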

Maximize Customer Satisfaction

Customer surveys and interviews revealed that our customers are satisfied when few system problems affect their ability to do their jobs, complaints are resolved quickly, and the supplier meets its commitments. The metrics that follow address these issues.

How Many Problems Affect the Customer?

We track the number of unresolved customer complaints over time. Figure 2 shows the backlog of change requests over the past two years for one project, plus the planned and actual release content. The triangles represent the planned number of changes for a release, the bars represent the actual number of changes closed, and the squares represent the total backlog of change requests at the end of the month. Managers use this chart to allocate computer and staff resources, plan release content, and track the effect of new tools or other process improvement programs over time. A backlog that remains at zero would not lead a manager to conclude that no problems are affecting the user; it may instead indicate that the office is overstaffed or simply in equilibrium with the incoming change requests. G.E. Stark, L.C. Kern, and C.W. Vowell describe an alternative to this metric, which looks at the change request arrival and closure rates [17]. Figure 2 also demonstrates how well the organization is meeting schedule commitments. It shows that we made eight of the last 10 deliveries on schedule.

Figure 2. Backlog of change requests, release plan, and actuals by month.
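The bookkeeping behind a chart like Figure 2 is simple: each month the backlog grows by the change requests opened and shrinks by the changes closed in deliveries. The monthly counts below are made-up placeholders, not MWSSS data.

    # Backlog of open change requests, tracked month by month.
    opened_per_month = [12, 9, 15, 7, 11, 10]    # new change requests (placeholder data)
    closed_per_month = [8, 10, 12, 9, 13, 10]    # changes closed in deliveries (placeholder data)

    backlog = 0
    for month, (opened, closed) in enumerate(zip(opened_per_month, closed_per_month), start=1):
        backlog += opened - closed
        print(f"Month {month}: opened={opened}, closed={closed}, backlog={backlog}")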


Software Failure Rate

We also track the software failure rate in the field to understand how many problems are affecting the customer. A failure is charged whenever the system performs below user requirements. The site analyst and engineer inspect each problem report to determine if a software failure occurred. All downtime incidents (reported in monthly maintenance logs) caused by a software failure are counted. Figure 3 shows the failure rate for the past nine deliveries of one product. Over the nine releases, the maintenance team has reduced the number of operational failures from 6.8 per 1,000 operational hours to fewer than 2.4 per 1,000 hours. This improvement in product quality shows up in the backlog chart of Figure 2. Users generate fewer change requests when the system operates without failure for longer periods.

Figure 3. Software failure rate by delivery.

The system manager uses the failure rate information in several ways. First, system managers set a threshold at four failures per 1,000 hours of operation. If the system is operating far below this threshold, the manager may decide to incorporate more difficult changes into the next release. If the system is operating above the threshold, the decision may be to switch to a previous version of the system or to allow only fault corrections, and no modifications, until the system is again below the threshold. Second, we use the graph to estimate the expected number of events during a mission and the probability of completing a mission of a certain duration without an event. For example, if the customer is planning a one-week mission and needs to know the probability of the software delivery supporting the entire mission, the probability is given by Pr[no failure in a week] = exp(-168 hours/week * 0.002 failures/hour) = 0.71, or a 71 percent chance that the system will not fail in one week of operation because of a software problem. Finally, we use the graph in Figure 3 to establish future system requirements with the customer. We have used the historical failure rate as a quality requirement, traded off against cost and schedule, when the customer requests a major system upgrade. J.D. Musa et al. describe the approach well [20].
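The same calculation, assuming the constant failure rate implied by the example above, can be sketched in a few lines; the rate and mission length are the values from the text.

    import math

    failure_rate_per_hour = 0.002   # about 2 failures per 1,000 operational hours, from the example
    mission_hours = 24 * 7          # one-week mission

    p_no_failure = math.exp(-failure_rate_per_hour * mission_hours)
    expected_failures = failure_rate_per_hour * mission_hours

    print(f"P(no failure in {mission_hours} h) = {p_no_failure:.2f}")      # about 0.71
    print(f"Expected failures during the mission = {expected_failures:.2f}")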

How Long Does It Take to Fix a Priority Problem?

This is another question related to customer satisfaction. Figure 4 helps gauge satisfaction with a cumulative distribution graph of two important process durations. The line with square markers represents the percentage of priority changes delivered within a given number of days from the time the user writes the change request. The line with diamond markers represents the time to deliver a priority change after the requirement has been approved by the using community.


Figure 4. Software change form cycle times.

The horizontal distance between the two lines on the chart is the "in‐process" time, i.e., the time the change spends in board review. For this system, the in‐process time averaged 54 days. The chart also shows that approximately 80 percent of our urgent change requests are installed and ready for use within 90 days of user board approval or within 170 days from when they are written by the customer. Thus, these three simple graphs tell us how many problems are affecting the customer, whether we are meeting our schedule commitments, and how long it takes to fix a priority problem.
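A chart like Figure 4 can be produced directly from the raw cycle times of the priority changes. The sketch below computes the two percentages quoted above from illustrative placeholder lists of cycle times (in days); it is not the actual MWSSS data.

    def pct_within(cycle_times_days, limit_days):
        """Percentage of priority changes delivered within limit_days."""
        return 100.0 * sum(d <= limit_days for d in cycle_times_days) / len(cycle_times_days)

    # Placeholder cycle times for ten priority changes.
    days_from_approval = [20, 35, 50, 60, 70, 80, 85, 88, 120, 150]
    days_from_written  = [70, 90, 110, 125, 140, 150, 160, 168, 210, 260]

    print(f"{pct_within(days_from_approval, 90):.0f}% within 90 days of board approval")
    print(f"{pct_within(days_from_written, 170):.0f}% within 170 days of being written")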

Minimize Cost

Many people participate in the software maintenance process. The 11 cost categories of software maintenance are software development activities, configuration management, quality assurance, security, administrative support, travel, project management, system engineering, hardware system maintenance, system management, and finance. These costs are further broken into categories of common and system-specific costs. Hardware maintenance is a system-specific cost since each of our systems uses different hardware platforms requiring different levels of hardware maintenance. To ease accounting, we apportion travel to a release rather than an activity. The remaining categories are common costs across our organization. Some people participate in more than one activity, and many activities do not require full-time staff attention. We allocate the common costs to each release as a cost per day.

How Much Does a Delivery Cost?

By measuring the cost each activity contributes to the total release cost, managers can direct their cost reduction efforts to specific areas. For example, Figure 5 shows the per-release cost for 17 deliveries. The average cost for this set is $373,500 with a standard deviation of $231,700. We use these high-level statistics for long-range (two to five years) budget planning.


Figure 5. Cost per delivery for one year.
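For long-range planning, per-delivery statistics like those behind Figure 5 can be turned into a rough annual budget. The individual delivery costs and the deliveries-per-year figure below are placeholders; the article reports only the mean of $373,500 and the standard deviation of $231,700.

    import statistics

    # Placeholder per-delivery costs in dollars (not the actual 17 deliveries).
    delivery_costs = [150_000, 210_000, 280_000, 350_000, 420_000, 510_000, 690_000]

    mean_cost = statistics.mean(delivery_costs)
    std_cost = statistics.stdev(delivery_costs)
    deliveries_per_year = 4                      # planning assumption

    print(f"mean = ${mean_cost:,.0f}, standard deviation = ${std_cost:,.0f}")
    print(f"planned annual budget = ${deliveries_per_year * mean_cost:,.0f}")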

How Are the Costs Allocated?

For a typical release, the costs associated with the software development activities (design, code, unit test, and integration test) and hardware system maintenance make up 88 percent of the total release cost. The other nine cost categories account for the remaining 12 percent. This is encouraging because it shows that most of our money is going to the productive work of producing a release. We would, however, like to reduce the total cost of a release. Thus, the hardware system maintenance cost is a target of opportunity. We are examining the root causes of the hardware maintenance cost to find areas for improvement.

What Kinds of Changes Are Being Made?

To answer this question, we developed the software change taxonomy shown in Table 2. It includes 10 types of changes and root causes for each change type.

Computational
    Incorrect operand in equation
    Incorrect use of parentheses
    Incorrect/inaccurate equation
    Rounding or truncation error

Logic
    Incorrect operand in logical expression
    Logic out of sequence
    Wrong variable being checked
    Missing logic or condition test
    Loop iterated incorrect number of times

Input
    Incorrect format
    Input read from incorrect location
    End-of-file missing or encountered prematurely

Data Handling
    Data file not available
    Data referenced out-of-bounds
    Data initialization
    Variable used as flag or index not set properly
    Data not properly defined/dimensioned
    Subscripting error

Output
    Data written to different location
    Incorrect format
    Incomplete or missing output
    Output garbled or misleading

Interface
    Software/hardware interface
    Software/user interface
    Software/database interface
    Software/software interface

Operations
    COTS/GOTS software change
    Configuration control

Performance
    Time limit exceeded
    Storage limit exceeded
    Code or design inefficient
    Network efficiency

Specification
    System/system interface specification incorrect/inadequate
    Requirements specification incorrect/inadequate
    User manual/training inadequate

Improvement
    Improve existing function
    Improve interface

Table 2. Software change taxonomy.

We categorized the changes delivered in the last eight releases using this taxonomy. This consisted of 67 modification changes (38 percent) and 110 fixes (62 percent). Figure 6 is a Pareto diagram of this change data. The left vertical axis shows the actual number of changes attributed to each class, and the right vertical axis represents the cumulative percentage of changes, making it a convenient scale from which to read the line graph. The line graph connects the cumulative percentages (and counts) at each category.


Figure 6. Software maintenance changes by type.
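A Pareto view like Figure 6 simply sorts the per-category counts in descending order and accumulates percentages. In the sketch below, only the specification total of 22 changes and the overall total of 177 changes come from the text; the other counts are illustrative placeholders.

    # Per-category change counts (placeholders except Specification = 22; total = 177).
    change_counts = {
        "Logic": 35, "Data Handling": 28, "Interface": 25, "Specification": 22,
        "Improvement": 20, "Output": 15, "Computational": 12, "Input": 8,
        "Performance": 7, "Operations": 5,
    }

    total = sum(change_counts.values())           # 177 changes over eight releases
    cumulative = 0
    for category, count in sorted(change_counts.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += count
        print(f"{category:15s} {count:3d}   {100.0 * cumulative / total:5.1f}%")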

How Much Effort Is Expended per Change Type?

Figure 7 is a Pareto diagram of the effort required to make each change. This figure shows that although requirements and interface specification changes ranked fourth in number of changes with 22, they account for 42 percent of the total effort at 582 staff-days. Logic changes fall to third in effort when viewed in this manner.


Figure 7. Staff‐days effort by category.

By reviewing change requests and accurately assigning them to the change taxonomy, we can estimate the engineering staff-days required to design, code, and test individual changes. For example, the average effort for an interface specification change is 36 staff-days, with a standard deviation of 43 staff-days, whereas the average for a requirements specification change is 22 staff-days, with a standard deviation of 24 staff-days. We combine this information with the distribution of expenses to allocate activity costs. We then track the actuals against the estimate, updating the taxonomy and cost information as each release is completed.
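The estimate for a planned release follows directly from these per-type averages. In the sketch below, the means and standard deviations for the two specification change types are the figures quoted above; the logic-change figures, the planned change mix, and the independence assumption used to combine the standard deviations are illustrative assumptions.

    import math

    # change type: (mean staff-days, standard deviation)
    history = {
        "interface specification":    (36, 43),   # from the text
        "requirements specification": (22, 24),   # from the text
        "logic":                      (8, 6),     # assumed for illustration
    }

    # Assumed content of a planned release: number of changes of each type.
    planned_changes = {"interface specification": 3,
                       "requirements specification": 5,
                       "logic": 10}

    expected_effort = sum(n * history[t][0] for t, n in planned_changes.items())
    # Treating the changes as independent, their variances add.
    std_dev = math.sqrt(sum(n * history[t][1] ** 2 for t, n in planned_changes.items()))

    print(f"expected effort = {expected_effort:.0f} staff-days (+/- {std_dev:.0f})")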

How Complex Is the Delivery?

We agree with B. Curtis that complexity is "a loosely defined term" [21] that is usually concerned with understanding the structure of the code. Several measures with this goal in mind have been proposed [22-25]. We believe that the complexity of a maintenance release involves much more than just the code. Hence, we define complexity as the degree to which characteristics that affect maintenance releases are present. "Characteristics that affect" include product characteristics (such as age, size, documentation, criticality, performance), management processes (such as abstraction techniques, verification and validation, maturity, tools), staff experience (such as system, scheduling, group dynamics), and the environment (such