Measurement and Estimation Symposium

SEPG Conference Amsterdam 12-15 June 2006

Experience in developing metrics for agile projects compatible with CMMI best practice

Graham Collins, UCL

Abstract This report covers the experience of, and the resolution of problems with, reporting on agile projects to senior managers familiar with Earned Value (EV) reporting.

It was found to be important to retain a variety of measures and to incorporate new measures based on progress (velocity) over short time periods, analogous to the EV performance indicators (cpi and spi), as well as to report on the percentage of completed software.

The shift of emphasis from earned value progress reporting to frequent tracking and measuring what percentage of the software has been completed (using acceptance test data) initially caused concern, as progress measured in working software is typically slow at the start of a project.

An outline of an improvement initiative is given using actual data, to try to achieve a more accurate understanding of project status, and to show how the initial concerns of senior management were overcome. An assessment of business value, progress each week, and the remaining work, which can be correlated to the EV measurement, are compatible with Capability Maturity Model Integration (CMMI) best practice and agile development. This also had the effect of putting the development team more in control: they knew they had ‘buy-in’ from senior staff and could track the team’s progress continually, as this was visible to everyone involved in development.

This invariably has an impact on other issues such as how the life-cycle is reported to senior management and how metrics should be tailored for agile projects where emphasis is placed on prioritisation of business goals (benefits) and fixed iterations.

Background Agile practices at first sight do not seem compatible with EV, especially as the goals and requirements are refined during agile projects and EV is based on having clear goals and estimates from inception. Yet increasingly developers wish to use agile approaches, due to the advantages of teamwork, feedback from the client and seeing working software at an earlier stage. In addition organisations are increasingly requiring greater compliance and justification of their projects not just in financial terms but in tracking the value to the business. An approach which is being increasingly adopted is that of earned value management (EVM) which is now mandated on all appropriate MoD [APM 06a] projects in the UK.

Added to this is the Software Engineering Institute’s (SEI) CMMI framework, which is increasingly used as best practice. Chrissis outlines that CMMI has its foundations [Chrissis 03] in the ‘principles of statistical quality control’ [Shewhart 31], although other authors [Lipke 00] refer to this work, later refined by W. Edwards Deming [Deming 86], using the term Statistical Process Control (SPC). Chrissis also cites the influence of Humphrey’s Managing the Software Process, but it is worth noting that Humphrey advocates both SPC and EV approaches and has recently included detailed examples of EVM [Humphrey 05]. Anderson, whose MSF project has achieved level 3, puts forward a strong argument that the CMMI framework is firmly rooted in the work of Deming and SPC [Anderson 05] and that because of this it must be possible to achieve CMMI level 5. In addition Chrissis makes clear, quoting the SEI, that ‘the quality of a system or product is highly influenced by the quality of the process used to develop and maintain it’, in effect laying the foundation for continual process improvement as the basis for CMMI.

Depending on the reporting required, and on the constraints and governance of the project, both approaches (EVM and SPC) may be required. Their effectiveness, and the possibility of using them in conjunction with agile projects at different levels of abstraction, will be discussed, based on experience from 19 projects (2003-2006).

The CMMI takes into account the increasingly heterogeneous environments that software is developed for and at level 3 requires effective project management practices. Assessment allows different approaches, provided the main goals are achieved.

Pair reporting Working with agile teams while having a role on several sites has necessitated a different style of management. There is a high level of trust in a typical agile team: the developers commit to an amount of work (tasks) on a daily, weekly and iteration basis. The development work is done in pairs, as this achieves faster and more robust code. To reduce the onus on the project manager to track individuals’ progress, the tasks are outlined on the whiteboard and any problems and issues are flagged up. The rate of work agreed each day, in comparison to work achieved (velocity), is marked by the pair. This is, in effect, pair reporting, which raises concerns and potential problems that may need further assistance from the group as early as possible. It has been found to be a useful addition to the agile processes adopted and fits naturally with the agile philosophy of reporting team progress rather than individual progress.

Iterations typically use 1 month time frames, similar to the time period in the agile method Scrum (30 day ‘sprints’); however the main approach is Extreme Programming (XP) [Beck 00]. Information is kept on whiteboards where appropriate and recorded via digital cameras and wiki pages. Approaches have, where possible, been integrated into existing programme, project and risk management frameworks.

Earned Value and Control Charts The work of [Lipke 00] in using SPC and control charts for air logistics software development projects seemed to point the way to a continuous improvement programme compatible with CMMI level 5. In 1996 these became the first software activities in federal service to achieve CMM level 4 distinction. That work points towards the compatibility of applying SPC control charts to earned value data and creating an application to report the risk of values falling outside acceptable boundaries. This was achieved by using the schedule performance index (spi) and cost performance index (cpi) and applying control limits to these indicators.

Software development in the nuclear contracting industry [Alleman 03] seemed to show the compatibility of using earned value with agile programming to gain CMMI level 3. Although the work used XP and achieved a high degree of compatibility, the estimates based on testing gave earned value figures not dissimilar to planned, suggesting the goals remained fairly constant and there were limited scope changes. Experience proved encouraging, but as is usual in agile projects where the goals (requirements) are being uncovered, EV was typically lower than planned, similar to Figure 1. As soon as the velocity for the first iteration was applied, results were within a margin of error of 20%. This provided the basis to see if EV could be applied to agile projects. First it was necessary to establish the velocity and ascertain whether the agile development process was under statistical control. The problem was initially the reverse of Lipke’s work: it was necessary to know whether the process was under control before any meaningful cpi and spi values could be derived.

The key issue remaining was that a more accurate metric was needed and that, for agile projects, the earned value would need to be recalculated at each iteration for improved accuracy. The agile use of acceptance tests to verify that user stories are complete seemed the ideal solution. ‘Story points’ were the metric used to indicate the size of user stories (and, at a finer granularity, tasks). As the agile approach is to address the higher-risk and more valuable stories first, this also mapped to their relative business value.

Earned Value Definition and Summary ‘The value of the useful work done at any given point in a project. The value of completed work expressed in terms of the budget assigned to that work. A measure of project progress. Note: The budget may be expressed in cost or labour hours’ [APM 06b].

Figure 1. Earned Value chart, plotting cost or value against time for three curves: Planned, Actual and Earned Value.

Planned (BCWS) = Budgeted Cost of Work Scheduled
Actual (ACWP) = Actual Cost of Work Performed
Earned Value (BCWP) = Budgeted Cost of Work Performed
CPI = Cost Performance Index = BCWP/ACWP (or Earned Value/Actual)
SPI = Schedule Performance Index = BCWP/BCWS (or Earned Value/Planned); note this is SPI_c, i.e. cost based.

EV measurements should be based on working software (Figure 2). Problems may arise when artefacts such as documentation and UML design are included, as these give a view of progress on work done (activities) rather than work directly attributable to business value.

Figure 2. Progress as a function of working software, adapted from [Bittner 05]. The chart plots progress (% complete measured in scenarios, 0% to 100%) against iterations 1 to 4 for scenarios coded, tested, and tested and passed.

Earned Value Analysis (EVA) The problem with tracking agile projects with EV is that one of the key features of agile processes is to refine goals through iterative development work so the full extent of what is required evolves through iterations. This has the advantage that the right solution is developed although it is difficult to estimate duration from the outset. As the scope has not been finalised the estimates are poor. A resolution to this problem is to consider the business value of the development.

Consider an agile project (Table 1) developing a web application with a project team of 6. Stories were measured in ‘story points’ to indicate the relative amount of development work (tasks) required. Task times were then logged during the week to give the actual (i.e. developer) cost. The table shows how complete each story is by assigning ‘points earned’. Using the approach that the earned value of each task is determined by the percentage complete of the planned activity [Lester 00], the percentage of story points ‘earned’ is multiplied by the planned developer hours. As an example, story number 1 is 100% complete (it has earned all its associated story points), therefore its EV is equal to the planned developer hours. From these values the efficiency, in terms of cpi (220/260 = 0.85) and spi (220/240 = 0.92), can be calculated to give progress estimates. Various other approaches could also be taken, incorporating the business value, to give different interpretations of progress in terms of business value and points progression.

Story    ‘Business   Story    ‘Points    Planned             Actual           EV
number   Value’      Points   earned’    (developer hours)   (logged hours)   (earned value)
1        3           10       10         100                 120              100
2        2           8        8          60                  60               60
3        2           4        4          60                  80               60
4        1           2        0          20                  0                0
total    8           24       22         240                 260              220

Table 1. Recorded values for one iteration. The EV figures were automatically calculated from the figures of ‘earned points’. Business values record the relative importance of stories.
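The calculation behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the tool used on the projects: the `Story` class, its field names and the `iteration_indices` function are hypothetical, and only the figures are taken from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """One row of Table 1 (class and field names are illustrative)."""
    number: int
    business_value: int
    story_points: int
    points_earned: int
    planned_hours: float
    actual_hours: float

    @property
    def earned_value(self) -> float:
        # Fraction of story points earned, applied to the planned hours.
        return self.points_earned / self.story_points * self.planned_hours


# Values copied from Table 1.
TABLE_1 = [
    Story(1, 3, 10, 10, 100, 120),
    Story(2, 2, 8, 8, 60, 60),
    Story(3, 2, 4, 4, 60, 80),
    Story(4, 1, 2, 0, 20, 0),
]


def iteration_indices(stories):
    """Return planned, actual and earned totals plus cpi and spi."""
    planned = sum(s.planned_hours for s in stories)
    actual = sum(s.actual_hours for s in stories)
    earned = sum(s.earned_value for s in stories)
    return {
        "planned": planned,
        "actual": actual,
        "earned": earned,
        "cpi": earned / actual,   # Earned Value / Actual
        "spi": earned / planned,  # Earned Value / Planned (cost based)
    }


result = iteration_indices(TABLE_1)
print(f"cpi = {result['cpi']:.2f}, spi = {result['spi']:.2f}")
```

Run on the Table 1 data, this reproduces the totals of 240 planned, 260 actual and 220 earned hours, and the cpi and spi values quoted in the text.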

What also needed to be established was the rate of work for the environment, the team experience and, to a lesser extent, the tools used.

Figure 3. Initial planning figures based on spi_c and cpi values, which showed that planned progress was not being maintained. The chart (Variance, Monthly) plots the value of cpi and spi (0.30 to 1.20) against months 1 to 6.

Estimates for time and cost to complete were then derived from the performance indicators spi_c and cpi. This was done on a daily and iteration basis. The estimated cost at completion equals the original planned cost divided by cpi, and the estimated time equals the planned time divided by spi_c. Note: spi_c was used in preference to spi_t (time based), which can easily be approximated by inspection of the iteration schedule. However, far more useful is to see the efficiency of the process, which in the project shown in Figure 3 was below planned on both cost and time, raising possible causes such as poor estimation or other issues. Using this data for the next project and estimating after the first iteration improved estimation figures by more than 50%. Using spi and cpi ratios is an integral part of the PSM method [McGarry 02], which is used by BAE Systems to achieve CMMI level 5. However, even if historic figures are available they may not be applicable. What was required was to understand the rate of development, i.e. velocity.
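As a rough sketch of the two estimates described above, the following Python function divides the planned cost by cpi and the planned duration by the cost-based spi. The function name is hypothetical, and the inputs (the 240 planned hours and the indices from Table 1, with an assumed 4 week plan) are used purely for illustration.

```python
def estimate_at_completion(planned_cost, planned_duration, cpi, spi_c):
    """Estimated cost = planned / cpi; estimated time = planned / spi_c."""
    if cpi <= 0 or spi_c <= 0:
        raise ValueError("performance indices must be positive")
    return planned_cost / cpi, planned_duration / spi_c


# Illustrative inputs: Table 1 totals and an assumed 4 week planned duration.
cost, weeks = estimate_at_completion(
    planned_cost=240, planned_duration=4, cpi=220 / 260, spi_c=220 / 240
)
print(f"estimated cost: {cost:.1f} hours, estimated time: {weeks:.2f} weeks")
```

With indices below 1.0, both estimates exceed the plan, which is the situation shown for the project in Figure 3.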

Measuring Velocity The rate of work or velocity (Figure 4), using story points achieved each week by the team, can be established via acceptance tests. This measure helps determine the development level for the next iteration and can be estimated within ranges [Cohn 06].

Figure 4. Velocity or rate of work in story points achieved per week. The chart plots story points (0 to 50).

Estimation of Project Duration using Story Points Cohn considers it appropriate to set expectations using a range estimate, which is achieved ideally by running one or more iterations to give some data on the progress of the team and then applying a weighting factor, as shown in Table 2, to give an upper and lower value.

Iterations Completed    Low Multiplier    High Multiplier
1                       0.60              1.60
2                       0.80              1.25
3                       0.85              1.15
4 or more               0.90              1.10

Table 2. Multipliers for estimating velocity based on number of iterations completed, from [Cohn 06].
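The multipliers in Table 2 can be applied as in the following Python sketch. The function names and the example figures (an observed velocity of 20 story points and 240 points of remaining work) are assumptions made for illustration; only the multipliers come from Table 2.

```python
# Low and high multipliers from Table 2, keyed by iterations completed.
MULTIPLIERS = {1: (0.60, 1.60), 2: (0.80, 1.25), 3: (0.85, 1.15)}
FOUR_OR_MORE = (0.90, 1.10)


def velocity_range(observed_velocity, iterations_completed):
    """Return the (low, high) velocity range for the observed velocity."""
    if iterations_completed < 1:
        raise ValueError("at least one completed iteration is required")
    low, high = MULTIPLIERS.get(iterations_completed, FOUR_OR_MORE)
    return observed_velocity * low, observed_velocity * high


def duration_range(remaining_points, observed_velocity, iterations_completed):
    """Return the (shortest, longest) number of iterations still needed."""
    low_v, high_v = velocity_range(observed_velocity, iterations_completed)
    # The highest velocity gives the shortest duration and vice versa.
    return remaining_points / high_v, remaining_points / low_v


# Hypothetical example: 20 points per iteration after one iteration.
print(velocity_range(20, 1))
print(duration_range(240, 20, 1))
```

The range narrows as more iterations are completed, which is why re-estimating at each iteration improves the figures reported to managers.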

It has sometimes been found useful to weight stories, which include a short value statement, in a range of 1-3 to give an estimate of business value (Table 1), and to apply a multiplier to provide upper and lower range estimates. The next step was to develop process control measures to alert the team and project manager to any issues.

Control Limits Control techniques indicate whether the development process is under control using criteria set within various σ ranges [Florec 99]. When a sequence of values needs to be evaluated, a time-sequenced plot of individual values may be appropriate, using individuals and moving range (XmR) charts such as Figure 5 and the formulae in Figure 6. Control limits based on 3σ can be set to indicate when the process has values that need to be examined.

Figure 5. Use of velocity and mR (the UCLR control limit = 20.07). The chart plots story points and mR (0 to 60) against weeks in January and February 2006, with the UNPLx and UCLR limits marked.

k sequential measurements provide r = k - 1 (two-point) moving range values.

ith moving range:
$$mR_i = |X_{i+1} - X_i|, \quad 1 \le i \le k-1$$

Individuals average moving range:
$$\overline{mR} = \frac{1}{r}\sum_{i=1}^{r} mR_i$$

Upper Natural Process Limit:
$$UNPL_x = \bar{X} + \frac{3\,\overline{mR}}{d_2} = \bar{X} + 2.660\,\overline{mR}$$

Centreline (average of individual values):
$$CL_x = \bar{X} = \frac{1}{k}\sum_{i=1}^{k} X_i$$

Lower Natural Process Limit:
$$LNPL_x = \bar{X} - \frac{3\,\overline{mR}}{d_2} = \bar{X} - 2.660\,\overline{mR}$$

Centreline or average moving range:
$$CL_R = \overline{mR}$$

Upper Control Limit for moving range:
$$UCL_R = D_4\,\overline{mR} = 3.268\,\overline{mR}$$

Sigma for individual values:
$$\sigma_x = \frac{\overline{mR}}{d_2}$$

When n = 2, d_2 = 1.128 and D_4 = 3.268 (from Dispersion and Bias factor tables).

Figure 6. Equations for Calculating Control Limits for XmR Charts

           Xi       mR
           35
           32       3
           45       7
           30       15
           35       5
           39       4
           32       7
           34       2
Average    35.25    6.14

Table 3. Xi and mR data from the project in Figure 5

As an illustration, using the XmR equations and the values from Table 3, where the average velocity per week is 35.25 and the average two-point moving range is 6.14, the UNPLx = 51.6 and the LNPLx = 18.9, which indicates that these values are within natural process limits.
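The XmR arithmetic above can be checked with a short Python sketch. The function names are hypothetical; the constants d2 and D4 and the averages are those given in Figure 6 and Table 3. Note that recomputing the moving ranges from the Xi values as listed gives 13 rather than the tabulated 7 for the step from 32 to 45, so the sketch takes the tabulated averages as inputs in order to reproduce the limits quoted in the text.

```python
D2 = 1.128  # bias factor for two-point moving ranges (n = 2)
D4 = 3.268  # upper control limit factor for moving ranges (n = 2)


def moving_ranges(values):
    """Two-point moving ranges |X[i+1] - X[i]| of a sequence."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


def xmr_limits(x_bar, mr_bar):
    """Control limits for an XmR chart from the two averages."""
    three_sigma = 3 * mr_bar / D2
    return {
        "CLx": x_bar,
        "UNPLx": x_bar + three_sigma,
        "LNPLx": x_bar - three_sigma,
        "CLR": mr_bar,
        "UCLR": D4 * mr_bar,
        "sigma_x": mr_bar / D2,
    }


# Averages as tabulated in Table 3.
limits = xmr_limits(x_bar=35.25, mr_bar=6.14)

# Weekly velocities (Xi) from Table 3, checked against the natural limits.
velocities = [35, 32, 45, 30, 35, 39, 32, 34]
in_control = all(limits["LNPLx"] <= v <= limits["UNPLx"] for v in velocities)

print({name: round(value, 2) for name, value in limits.items()})
print("all velocities within natural process limits:", in_control)
```

A value falling outside UNPLx or LNPLx, or a moving range above UCLR, would be the signal for the team and project manager to look for an assignable cause.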

Next it is useful to make an estimate of future progress. One approach is to use weighted averages, which have helped improve the initial estimation of planned figures.

Figure 7. X-bar average in comparison to weighted average. Iterations were of 1 week duration. The chart plots story points (0 to 35) against iterations 1 to 6, with the X-bar average centreline (CLx), the weighted average and UNPLx marked.

A typical occurrence at the start of projects was a time lag before acceptance tests were complete (Figure 7). Taking the weighted average is an alternative to the average CLx and has proved to be a more effective measure for estimating velocity at the start of a project. The values used to calculate the weighted average are shown in Table 4.

Although not shown, the values for Figure 7 give the upper control limit for the moving range, UCLR, as 22.8, and all moving range values are within this limit.

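$$\text{Weighted average} = \frac{\sum_{r=1}^{n} r\,X_r}{\sum_{r=1}^{n} r}$$

where X_r is the number of story points achieved in iteration r (the count) and n is the number of iterations.

Story points    Count    Weighted points
0               1        0
5               2        10
18              3        54
14              4        56
24              5        120
27              6        162
14.7            3.5      19.1
(average)                (weighted average)

Table 4. Weighted Average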
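The weighted average in Table 4 can be reproduced with the following Python sketch, in which each iteration's story points are weighted by the iteration count so that later iterations carry more weight. The function name is hypothetical; the data are the story points from Table 4.

```python
def weighted_average(points_per_iteration):
    """Average of story points weighted by iteration number (1, 2, 3, ...)."""
    weights = range(1, len(points_per_iteration) + 1)
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one iteration is required")
    weighted_points = sum(w * p for w, p in zip(weights, points_per_iteration))
    return weighted_points / total_weight


# Story points per iteration from Table 4.
TABLE_4 = [0, 5, 18, 14, 24, 27]

simple = sum(TABLE_4) / len(TABLE_4)
weighted = weighted_average(TABLE_4)
print(f"average = {simple:.1f}, weighted average = {weighted:.1f}")
```

Because the early iterations with few passing acceptance tests receive low weights, the weighted figure sits closer to the velocity the team reaches once the initial time lag has passed.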

Acceptance Test Charts Using acceptance tests solved some of the problems of earned value reporting. Information was available to assess progress in stories complete and how much work was involved (story points). The team can only deliver at a certain rate or velocity each iteration, and if additional stories are requested then others may need to be dropped. Provided managers understood that the team would be able to deliver at a velocity similar to earlier iterations, changes to work load were reduced and planned values remained valid for longer. Even with changes in scope, this data was of value to managers, who were familiar with re-planning and making adjustments to the inventory of work. Cumulative flow of acceptance tests provided further information to manage projects appropriately (Figure 8).

Figure 8. Cumulative flow chart. The chart (Acceptance Test Data) plots the number of acceptance tests (ATs, 0 to 100), split into failing and passing ATs, against iterations 1 to 12.

The team can easily assess the amount of work remaining using burn down charts (Figure 9), which have a motivating effect on the team, especially towards the end of an iteration or project. It can also be seen when additional work has been attributed to the project. Sometimes it may be useful to show the changing inventory line. Cockburn discusses the use of burn-up charts (i.e. cumulative progress with time) and their correlation to earned value charts [Cockburn 03] and considers this a ‘natural mapping’. Cohn outlines the separation of the impact of velocity and scope changes by delineating at the x-axis.

Figure 9. Burn down chart. The chart plots story points remaining (0 to 350) against weeks in January and February 2006.
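The series behind a burn down chart can be sketched as follows. This is an illustrative Python example only: the function name and all of the figures (300 initial story points, the weekly completions and the 20 points of added scope) are hypothetical and are not the data plotted in Figure 9.

```python
def burn_down(initial_points, completed_per_week, added_per_week=None):
    """Story points remaining at the start and after each week.

    Completed points reduce the remaining work; points added through
    scope changes increase it, so added work shows up as a rise.
    """
    if added_per_week is None:
        added_per_week = [0] * len(completed_per_week)
    if len(added_per_week) != len(completed_per_week):
        raise ValueError("both series must cover the same weeks")
    remaining = [initial_points]
    for done, added in zip(completed_per_week, added_per_week):
        remaining.append(remaining[-1] - done + added)
    return remaining


# Hypothetical data: 20 points of new scope arrive in the second week.
print(burn_down(300, [35, 32, 45], [0, 20, 0]))
```

Keeping completed and added points as separate series makes it possible to show velocity and scope change independently, in line with the separation that Cohn describes.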

Discussion Ideally acceptance data should be within control limits if it is to be used as a basis for EV reporting. However, even if data is not within limits, grouping data within iterations can give meaningful values. Variability in this metric reflects the size of user stories and changes in scope.

Measurement should be a natural by-product of the development process and should cause minimal overheads. The interpretation of tasks and acceptance data outside of control limits should help the team understand the process better and attribute reasons to these variations. Acceptance test data is one approach. Issue reporting is also essential to understand the stability of the code, dependent on the size and domain of the project.

Team tracking, using pair reporting, and keeping the data visible, allows the team more involvement in this process. Presentation of results should be as simple as possible with the team and managers able to view velocity to give rates of progress and burn-down charts showing remaining work. Charts should be in a format to gain as much leverage for reporting of progress as possible. Percentages can also be easily derived from stories and story points achieved.

With a better appreciation of the rate of work, it has been found that scope changes in terms of story points have been reduced. The managers know what the team is capable of delivering and the team realises that it cannot achieve more than a certain number of story points (tasks). The increased stability creates more effective and accurate EV reporting. Combining an understanding of acceptance data (using weighted averages at the start of the project) and the rate (velocity) with a better understanding of the process has created more viable estimation figures. Where managers are briefed as to the nature of the agile process, which ensures that the requirements with the greatest business value are delivered first, they are more likely to accept the idea of improved estimates during iterations.

From experience the approaches based on velocity are particularly useful. Reporting tasks on a daily basis, and stories on a weekly and iteration basis via charts, can give a timely and more accurate estimation of value and likely time of completion. Experience has shown that project cost was less of an issue, as developers’ time is often allocated. In team working environments where developers and project managers are working on multiple projects, the cost factor (actual cost) may cause a significant variance from planned. When using EV it is prudent not to report the initial values of cpi and spi, as there can be wide fluctuations on cumulative charts which lessen later in the project. This is somewhat akin to not reporting velocity until a few values under process control have been established.

Careful selection of metrics, and measuring progress over a shorter period, improve velocity and project estimation, which is consistent with the aims of EVM and may help achieve the requirements of continual process improvement at higher CMMI levels.

Conclusion Agile approaches can give better developer and management perspectives. Velocity measurements are an appropriate method to understand rates of work and, allied to this, acceptance tests completed give a good indication of progress. These metrics can then, if required, be the basis of EVA and on larger projects be interpreted using control studies. Approaches need to be tailored accordingly, as there is a major difference between clearly defined projects and those where the requirements are clarified through iterative development. As the use of agile methodologies widens, understanding agile development rates and value will increasingly become necessary. Linking these projects to CMMI assessments may be one viable approach to gain an objective measure of progress towards this goal.

References

[Alleman 03] Alleman, Glen B.; and Henderson, Michael. Making Agile Development Work in a Government Contracting Environment, Agile Development Conference, Salt Lake City, Utah, June 2003.

[Anderson 05] Anderson, David J. Stretching Agile to fit CMMI Level 3, Agile Conference, Denver, July 2005.

[APM 06a] APM (Association of Project Management), Project, Vol 18, Issue 7, Feb 2006.

[APM 06b] APM BoK (Body of Knowledge), APM Publishing, 2006. (Also the BS6079-2:2000 British Standards)

[Beck 00] Beck, Kent. Extreme Programming Explained, Addison-Wesley, 2000.

[Bittner 05] Bittner, Kurt; and Spence, Ian. What is iterative development? Part 3: The management perspective, May 2005. www-128.ibm.com/developerworks/rational/library/may05/bittner-spence/index.html

[Chrissis 03] Chrissis, Mary B.; Konrad, Mike; and Shrum, Sandy. CMMI Guidelines for Process Integration and Product Improvement, Addison-Wesley, 2003.

[Cockburn 03] Cockburn, Alistair. Crystal Clear, Addison-Wesley, 2003. http://alistair.cockburn.us/crystal/articles/evabc/earnedvalueandburncharts.htm

[Cohn 06] Cohn, Mike. Agile Estimating and Planning, Addison-Wesley, 2006.

[Deming 86] Deming, W. Edwards. Out of the Crisis, Cambridge, MA: MIT Centre for Advanced Engineering, 1986.

[Florec 99] Florac, William A.; and Carleton, Anita D. Measuring the Software Process, Addison-Wesley, 1999.

[Humphrey 05] Humphrey, Watts S. Personal Software Process, A Self-Improvement Process for Software Engineers, Addison-Wesley, 2005.

[Lester 00] Lester, Albert. Project Planning and Control, 3rd Edition, Butterworth-Heinemann, 2000.

[Lipke 00] Lipke, Walt. Software Planning, Statistics, and Earned Value, CrossTalk (Journal of Defence Software Engineering), Dec 2000. www.stsc.hill.af.mil/crosstalk/2000/lipke.html

[McGarry 02] McGarry, John; Card, David; Jones, Cheryl; Layman, Beth; Clark, Elizabeth; Dean, Joseph; and Hall, Fred. Practical Software Measurement, Addison-Wesley, 2002.

[Shewhart 31] Shewhart, W.A. Economic Control of Quality of Manufactured Product, New York: Van Nostrand, 1931.