UNIVERSITY OF HERTFORDSHIRE Faculty of Engineering and Information Sciences

7WCM0003 Computer Science MSc Project (Online)

Final Report January 2017

An Investigation into Introducing Test-Driven Development into a Legacy Project

P. Hall

ABSTRACT
Background: Test-Driven Development (TDD) is a development technique that is said to produce code that is less complex, more cohesive and easier to maintain than code produced with traditional development methods. TDD was introduced into a legacy application that had no testing in place, allowed users to generate their own content and was developed in the .Net MVC framework.
Objective: To determine whether adding new functionality to a legacy project with Test-Driven Development improves the internal code quality.
Method: New functionality was added to a legacy application using Test-Driven Development. Code metrics were recorded at the start of the project and after every iteration of development. The results were then compared to establish whether any difference could be seen in the metric values before and after introducing TDD.
Results: Test-Driven Development had marginal negative impacts on average lines of code, coupling and complexity, and a slight positive impact on maintainability and cohesion. None of these differences were significant.
Conclusion: Test-Driven Development had little to no impact on improving the internal code quality of a .Net legacy application. However, it did significantly increase the test coverage of the project, which can lead to a reduction in the error proneness of the application and an improvement in external code quality.


ACKNOWLEDGEMENTS I would like to give thanks to my supervisor who helped me to formalise my ideas and was always on hand to give advice. I would also like to thank my loving family for their continued support throughout the project and their understanding for all those long nights and weekends spent away from them.


CONTENTS
1. Introduction ...... 1
1.1 Background ...... 1
1.2 Research Questions ...... 3
1.3 Objectives ...... 3
1.4 Methodology ...... 4
2. Related Work ...... 6
3. Test-Driven Development ...... 11
3.1 Overview of TDD ...... 11
3.2 TDD Tools in .Net ...... 12
3.3 Methods and Techniques for TDD and Legacy Code ...... 21
4. Measuring Software Quality ...... 24
4.1 What Makes Good Software? ...... 24
4.2 Software Metrics Used for Project ...... 25
4.3 Software Metric Tools in .Net ...... 31
5. Legacy Project ...... 37
5.1 Overview of Application ...... 37
5.2 Development Plan ...... 37
6. Development Challenges and Solutions ...... 41
7. Results ...... 47
7.1 Software Metrics ...... 47
7.2 Code Test Coverage ...... 52
8. Conclusion ...... 55
8.1 Summary and Evaluation ...... 55
8.2 Future Work ...... 56
References ...... 57
Appendix A ...... 63
Appendix B ...... 66
Appendix C ...... 67
Appendix D ...... 69


1. INTRODUCTION

1.1 Background
Test-Driven Development (TDD) is an approach to developing software where test code is written before the production code that fulfils the test. It relies heavily upon refactoring whilst providing a safety net of tests that can be run each time the code is changed (Crispin 2006), and it is not just a testing technique but a process incorporating many different methods such as unit, acceptance, performance and system integration tests (Desai et al. 2008). TDD follows a red-green-refactor cycle in which a test is written that fails (red), production code is written that makes the test pass (green) and finally that code is refactored if necessary (refactor); this cycle forms a continuous rhythm for the lifecycle of the project (Beck 2002).

TDD is an increasingly popular paradigm in software engineering, especially with the advent of agile processes over the past two decades. It is a highly sought-after practice in the job market, with its citation on IT job adverts in the UK increasing year on year (ITJobsWatch 2016), but it can also be a daunting practice to begin for those new to the technique (Crispin 2006). However, a recent study shows that skilled developers with no prior experience of TDD can quickly learn and properly apply the methods (Latorre 2014). Despite this increase in popularity, heavy links to agile methodologies and apparent ease of adoption, a 2013 survey shows that less than 40% of agile teams employ a TDD approach (Ambler 2013a).


Figure 1 - shows how TDD as a job requirement has increased greatly over the past decade (IT Jobs Watch, 2016)

Figure 2 - 2013 Ambysoft “how agile are you?” survey results showing less than 40% employ a TDD approach (Ambler 2013b)

A systematic review by Bissi et al. (2016) on the effects of TDD on code quality and productivity found that most of the current work on the subject has so far focused on examining external code quality, using the Java programming language to develop new projects. The review suggests that relatively little work has been done to investigate TDD in legacy projects and states that there is scope for research into how TDD impacts internal code quality using a set of code analysis metrics, as well as how TDD extends to languages other than Java.

This project built upon the work of studies such as Guerra's (2014), which gave an overview of the process of developing a complex framework using TDD practices. That study used code metrics to analyse how the framework evolved over time and provided detailed documentation of the challenges encountered during the development process. This project used a similar technique but investigated how TDD influenced the code quality of a legacy project.

1.2 Research Questions
1. Does introducing Test-Driven Development into a legacy project improve the internal code quality?
2. What are the challenges and solutions of implementing Test-Driven Development practices into a legacy project?
3. What are the advantages and disadvantages of the different tools available for implementing Test-Driven Development in the .Net framework?
4. Do different code metric measuring tools provide different results for the same code base?

1.3 Objectives
• Complete a literature search and review of existing work regarding Test-Driven Development and legacy projects
• Identify and evaluate the available approaches for introducing Test-Driven Development into a legacy project
• Compare and evaluate the most popular tools for Test-Driven Development inside the .Net framework
• Extend the legacy project by implementing completely new functionality and advanced web 2.0 features using Test-Driven Development
• Analyse the effect Test-Driven Development has upon the internal code quality of the project
• Provide documentation of the project experience and development including any challenges encountered and the solutions used
• Analyse the project code coverage and number of unit tests introduced using Test-Driven Development
• Compare the results of software metric tools for .Net

1.4 Methodology
The research for this project applied known methodologies to a new case study, namely introducing Test-Driven Development into an existing project and using a suite of code metrics to analyse its effect on internal code quality.

A legacy project developed in ASP.Net MVC was used as the basis for the research. It is a Web 2.0 user-generated-content website based around the idea of users creating painting guides for miniature wargame models. It was developed in Visual Studio using the ASP.Net MVC framework, with C# for the backend and HTML, JavaScript and jQuery for the frontend. This project was chosen as the existing project to work upon because it fits the criteria for the proposed research: it allows for an investigation into the tools available for TDD in the .Net framework and, most importantly, it was not developed using a TDD approach and does not have any form of automated testing integrated, which fits the definition of a legacy project as given by Michael Feathers (2004). New features were added to the existing project using TDD.

To analyse the internal code quality, which is at the core of the research, third party tools were utilised. Developing tools for analysing code metrics is beyond the scope of this project, but could become part of any future work. Readings were taken at regular intervals throughout the lifecycle of development so that the gradual impact of TDD on the project over time could be observed.

Different tools and frameworks available for implementing TDD in the .Net framework were researched, compared and evaluated; some of which were used during development. These were tools for unit tests, test doubles, and dependency injection. NuGet statistics that track downloads and installs were utilised, so that focus was placed on the most popular and relevant tools and extensions in the .Net community.


The overall project was developed using an agile process, because agile processes are currently the most popular development methods (Jeremiah 2015). Agile also promotes integrating testing into the development lifecycle at all stages, rather than as a standalone testing phase at the end as in traditional methods like waterfall or the V-model, and the project aimed to see exactly how TDD fits into the process. Visual Studio Online was used to manage the project because it provides agile management tools which allow all work items to be added to a task board, iterations to be planned, and information on burndowns and velocity to be obtained.


2. RELATED WORK
There have been several studies into Test-Driven Development since it was popularised by the Extreme Programming movement and especially by Kent Beck in his 2003 work. Most previous work on TDD and code quality has focused on external quality; only a few studies have considered internal code quality and class design (Aniche and Gerosa, 2015), and even fewer of these have used software metrics to analyse results. There has also been a lack of cohesion in the metrics used to analyse code quality, which makes it difficult to compare results between studies (Bissi et al., 2016; Kollanus, 2011).

In 2003 Kaufmann and Janzen conducted an experiment comparing software metrics between TDD and test-last approaches. They concluded that TDD had greater productivity but that Cyclomatic Complexity (CC) and other metrics were similar between the two methods. However, they admit that the experiment was somewhat flawed because the TDD group had more programming experience and higher average grades, and the projects developed were too small.

Janzen and Saiedian (2006) also concluded that TDD does not show any noticeable improvement in internal code quality, but they did note concerns that quality can significantly reduce when the TDD process breaks down and no tests are written.

Siniaalto and Abrahamsson (2007) conducted an experiment that used a suite of traditional software metrics to measure code quality and the differences between TDD and an Iterative Test Last (ITL) method. They only found differences in the Coupling Between Objects (CBO) and Lack of Cohesion of Methods (LCOM) metrics between the two development approaches. TDD was found to have a better CBO but a worse LCOM when compared to ITL, but the differences were small and cannot concretely be put down to TDD alone; developer experience may have played a part. The study did, however, find that TDD significantly improved test coverage compared to ITL.

A further study by Siniaalto and Abrahamsson (2008) used the Chidamber and Kemerer metrics (1994), CC and Martin's Dependency Management Metrics (2003) to compare TDD and ITL approaches. They found that TDD may produce code that is less complex and has a higher Depth of Inheritance Tree (DIT), but that is also harder to change and maintain due to its stability in relation to its abstractness.

A study by Pancur and Ciglaric (2011) found similar results to Siniaalto and Abrahamsson when comparing TDD to ITL. They saw small positive increases in internal code quality from complexity and code coverage metrics but these were not statistically significant.

Janzen and Saiedian (2008) posited that test-first programmers are more likely to produce smaller units of less complex, highly tested code compared to ITL. They conducted a quasi-controlled experiment comparing TDD and ITL over six projects, using software metrics to analyse complexity, coupling and cohesion. The results showed that TDD produced code that was less complex, but the data for coupling and cohesion was inconclusive; a slight increase in coupling for TDD was put down to it producing more, smaller classes than ITL, which they claim could be positive design due to the classes' abstractness. However, they were unable to prove that the TDD classes were more abstract.

Vu et al. (2009) concluded that TDD does not improve internal code quality in their experiment into the differences between a test-first and a test-last approach to development. They measured internal code quality with CC and Weighted Methods per Class (WMC). They found that CC does not differ between the two approaches but that WMC was significantly higher in the TDD project, meaning that test-first produced larger and more complex classes.

Guerra (2014) developed a complex framework using TDD and showed from software metric readings that coding standards remained similar throughout development and that the results were better than statistical thresholds.

A 2016 systematic review by Bissi et al. found 27 previous papers that addressed the effects of TDD on internal code quality, external code quality and productivity. It concludes that 76% show an increase in internal quality; however, this comes with the caveat that only the code coverage metric was used to determine this, as it was the only common metric found in all the papers reviewed. This contrasts with a 2011 systematic review by Kollanus, which concluded from the 16 papers reviewed that most found no improvement in code quality and that TDD may in fact produce code that is difficult to maintain. A 2010 literature review by Shull et al. also found that previous work in the area shows no consistent effect on internal code quality.

Bissi et al. (2016) go on to propose that most current work is focused on greenfield applications and that there is a lack of research into TDD and legacy projects. They also find that most studies use the Java programming language due to its popularity and the availability of tools that support TDD, and therefore that more studies are needed in other programming languages and environments.

A literature search produced only four previous studies that looked at TDD in legacy applications, and of these only one investigated how it impacted the internal code quality. Klammer and Kern (2015) looked at industrial legacy products that had limited or no testing in place and detailed the problems encountered and solutions used when retrofitting unit tests, though not necessarily using TDD. They state that it may be impossible to add unit tests to legacy code without refactoring the code first, and that this may result in a heavy reliance on mocks, which can lead to large and complex tests being written. Their findings indicate that mocks should rarely be used and advanced mocking should only be temporary whilst refactoring, that low testability is the result of poor software design, and that adding unit tests to untested code may not be cost effective. They conclude that because of these findings retrofitting unit tests is unlikely to happen in the real world, and that testability should therefore be considered from the start of development.

Shihab et al. (2010) presented an approach for determining which functions to write unit tests for in large scale legacy projects. They state that 90% of development cost is spent on maintenance, but that writing unit tests for a whole legacy system at once is practically infeasible. The study suggests that TDD should be used for the maintenance of legacy code and proposes an approach they term Test-Driven Maintenance (TDM), in which functions of a system are isolated and unit tests incrementally written until a quality target is met. They detail a process for prioritising which functions to unit test by extracting historical data from the project, such as bug fixes and general maintenance modifications, mapping these to the functions that were changed, calculating heuristics and recommending which functions should be tested.

A 2005 study by Nair and Ramnath presented a series of developer stories documenting experiences when trying to introduce TDD into a legacy project. The study finds that the most common problems encountered when introducing TDD into legacy projects are breaking dependencies and finding fracture points to be able to get code under test, together with the assumption that all legacy code is bad and needs to be refactored. They advocate the use of dependency injection, interfaces and mock objects to overcome these difficulties, and argue that legacy code is not necessarily bad code and that it is not necessary to make it "right". They state that most legacy code can simply be avoided and only refactored if absolutely needed, when it is in direct association with new features added via TDD. The study suggests a bottom-up approach to refactoring, followed by a periodic top-down review to unify the architecture.

The most relevant previous study to this report was performed by Choudhari and Suman (2015). An experiment was conducted on five maintenance projects in which the same user stories were implemented by two different teams, one using an extreme programming method and the other a traditional waterfall approach. Metrics were collected to evaluate productivity, code quality and maintainability and a survey with the programmers was also conducted. The study found that the teams using extreme programming produced code that had lower complexity, looser coupling and higher cohesion compared to the waterfall method. It also reports that the extreme programming approach produced more maintainable code and had higher productivity. The survey results concluded that the extreme programming developers were more confident in the quality of their code and its robustness for future changes. However, this study used other extreme programming techniques such as pair programming, collective ownership of code and iterative lifecycles as well as TDD, so it is not obvious how much of a role TDD played in the results.

So far there have been mixed results from previous studies and no clear evidence has been presented that TDD increases internal code quality. Most studies so far have reported small positive impacts on code complexity that are not statistically significant, and no real difference for any other internal code quality metrics when compared to other development methods. There is not much research into how TDD impacts legacy projects and many more studies are needed before any concrete conclusions can be made. There is a myth in the software industry and literature that TDD drastically increases code quality, class design and productivity, but this is not backed up by any scientific evidence.


3. TEST-DRIVEN DEVELOPMENT

3.1 Overview of TDD
Test-Driven Development is a technique for producing software in which unit tests are written before any production code. TDD is not a testing technique but a method for designing code that is believed to be cleaner, simpler and easier to maintain (Beck, 2002). TDD was popularised by Kent Beck in the early 2000s as part of the Extreme Programming agile methodology, but the practice itself can be dated to much earlier, including an early reference in NASA's Project Mercury in the 1960s (Siniaalto and Abrahamsson, 2007).

The process of TDD is iterative and relies upon repetition of a very small development cycle. Each iteration follows a red-green-refactor cycle in which a new feature is added by first writing a failing unit test (which usually gets a red bar in unit test frameworks), then writing the simplest possible code to make the test pass (which usually gets a green bar in unit test frameworks) followed by refactoring the code to remove any duplication. This process is followed continually until all functionality has been implemented. Writing the tests before the production code in this manner is said to solidify understanding of what the code should be like (Feathers, 2004) and inspire confidence in the quality of the code (Beck, 2002).
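To make the cycle concrete, the sketch below shows one pass through red-green-refactor for a hypothetical price calculator, written with NUnit (the framework used later in this project). The class and method names are illustrative only and are not taken from the project code.

using NUnit.Framework;

// Step 1 (red): write a failing test before any production code exists.
[TestFixture]
public class PriceCalculatorTests
{
    [Test]
    public void ApplyDiscountReducesPriceByTenPercent()
    {
        var calculator = new PriceCalculator();

        Assert.That(calculator.ApplyDiscount(100m), Is.EqualTo(90m));
    }
}

// Step 2 (green): the simplest production code that makes the test pass.
public class PriceCalculator
{
    public decimal ApplyDiscount(decimal price)
    {
        return price * 0.9m;
    }
}

// Step 3 (refactor): remove duplication (for example, extract the 0.9m magic
// number into a named constant) and re-run all tests to confirm nothing broke.

The production code is deliberately minimal; any further behaviour would only be added in response to another failing test.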


Figure 3 red-green-refactor cycle for TDD: write a unit test that fails (red), make the code compile, make the test pass (green), run all tests, and rework the code to remove duplication (refactor)

3.2 TDD Tools in .Net

3.2.1 Types of Tools
There are numerous types of tools that can be used for practising TDD in .Net. However, only one type of tool is a fundamental requirement: unit test frameworks. The rest of the tools and frameworks presented in this section are optional and, if needed, the techniques they support can be implemented manually without any tools.

3.2.1.1 Unit Test Frameworks (UTF)
Unit testing is a process in which individual components of a system are tested independently for correct operation. A unit is the smallest testable part of an application, which in object-oriented systems is often an entire class or an individual method. UTFs allow for the automation of this process by enabling the developer to write a piece of code that invokes another piece of code and checks assumptions, rather than performing tests manually (Osherove, 2009).

Ten unit test frameworks were identified after performing a NuGet package search. The search was limited to NuGet because it is the package manager for the .Net development platform and it records the number of downloads, so it is possible to see the most popular packages in the .Net community.

Name                            NuGet Downloads   Last Update
csUnit                          360               13/07/2015
dbUnit                          235               04/02/2016
Fixie                           28,384            01/09/2016
Gallio                          19,584            15/10/2011
mbUnit*                         21,419            20/10/2014
MSTest**                        2,477             23/10/2015
NaturalSpec                     9,148             06/12/2012
NUnit                           7,142,172         04/10/2016
Visual Studio Unit Testing***   50,684            14/06/2013
xUnit.net                       3,659,325         06/11/2016

Table 1 list of unit test frameworks on NuGet as of 10/08/2016
* part of a bundle with Gallio on NuGet
** legacy command line utility for older versions of Visual Studio
*** included in Visual Studio so downloads are not a good indication of popularity

3.2.1.2 Dependency Injection (DI)
Dependency injection is a technique for achieving loose coupling between objects and their dependencies by using a set of software design principles and patterns (Seemann, 2011), specifically a process where objects define their dependencies only through constructor arguments and properties. It can be used to break the normal coupling between a system under test and its dependencies during automated testing (Meszaros, 2007). DI frameworks help to detect dependencies and make it easier to employ DI techniques.
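As a brief illustration of the technique (a sketch of the general pattern, not code from the project), the example below shows a controller receiving its repository through its constructor instead of creating it itself. The IForumRepository interface and the classes shown are hypothetical; in production a DI container such as those listed in Table 2 would supply the real implementation, while a unit test can supply a test double.

// Minimal stand-in for the project's entity, for illustration only.
public class ForumPost
{
    public int ForumPostId { get; set; }
}

public interface IForumRepository
{
    ForumPost GetById(int id);
}

public class ForumController
{
    private readonly IForumRepository _repository;

    // The dependency is supplied from outside rather than constructed here,
    // so a test can inject a fake or mock implementation.
    public ForumController(IForumRepository repository)
    {
        _repository = repository;
    }

    public ForumPost ShowPost(int id)
    {
        return _repository.GetById(id);
    }
}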

Name                       Downloads   Last Update
Autofac                    3,240,042   23/11/2016
BalticAmadeus.Container    202         14/09/2014
Caliburn.Micro.Container   5,467       19/06/2013
Castle Windsor             1,278,257   18/05/2016
Container.NET              740         04/11/2013
hq.container               91          24/09/2016
IoC                        4,376       04/03/2016
LinFu                      5,108       21/03/2012
Ninject                    3,775,182   14/08/2016
Petite.Container           2,943       23/04/2012
Simple Injector            756,781     13/06/2014
Spring.Net                 77,797      14/04/2015
StructureMap               1,462,689   22/11/2016
Unity                      4,718,018   10/06/2015

Table 2 list of DI frameworks on NuGet as of 07/12/2016

3.2.1.3 Automated Refactoring Tools (ART)
Refactoring is the process of restructuring a piece of code by changing its internal structure without changing its existing behaviour (Fowler, 1999). Refactoring is a key step in the TDD cycle. Automated refactoring tools help in the refactoring process by providing automated operations for refactoring techniques such as renaming, changing method signatures, extracting an interface, and so on. The degree of automation varies greatly amongst tools, but even the semiautomatic approaches to refactoring offered by many modern IDEs can increase developer productivity by a factor of ten (Mens and Tourwe, 2004). Feathers (2004) outlined some technical and practical criteria for refactoring tools: they must be accurate and preserve the behaviour of the program, fast enough to let developers work quickly, integrated so that refactorings can be performed during development, and provide an undo facility so that an exploratory approach to design can be taken with limited consequences.

Visual Studio itself provides some basic automated refactoring options and a few other sophisticated automated refactoring tools are available for .Net and can be installed as extensions to the IDE. These types of tools are usually not free to download and use.

Name            Developer
CodeRush        DevExpress
Just Code       Telerik
ReSharper       JetBrains
Visual Assist   Whole Tomato Software

Table 3 list of major ARTs for .Net that can be installed as extensions to Visual Studio


3.2.1.4 Test Doubles
Test doubles can be used in place of real objects for the express purpose of running a test (Meszaros, 2007). There are generally four types of test doubles that can be used, but there is confusion in the developer community about the differences between them, and Meszaros (2007) states that the terminology is confusing and inconsistent. "Mock" is the term generally used to cover all the different types of test doubles.

3.2.1.4.1 Dummy
Dummy objects are passed around from method to method but never actually used. They are mainly used to fill parameter lists and are never expected to do anything other than exist (Meszaros, 2007). In statically typed languages like C#, objects passed as parameters must be type compatible, so dummy objects that do not implement any actual behaviour can be used when a parameter is required but does not need to be used by the test.

3.2.1.4.2 Fake
Fakes are objects that have working implementations but are not suitable for production, for example interfaces that contain fixed data and logic or an in-memory database (Fowler, 2007). They are generally used to replace the functionality of a depended-on component when the real version has not yet been built, is too slow or is not available in the test environment (Meszaros, 2007).

3.2.1.4.3 Mock
Mocks help to break dependencies in code (Feathers, 2004) by creating alternative implementations of real objects that supply the correct values for testing. Mocks can perform assertions internally and verify calls made to the object. There are two different types of mock: strict mock objects, which fail tests if calls are received in a different order than expected, and lenient mock objects, which tolerate out-of-order calls (Meszaros, 2007).

3.2.1.4.4 Stub
A test stub is used to provide explicit behaviour of an object so that code paths that might otherwise be impossible to cover can be tested. Stubs provide a canned response to calls made during a test and usually do not respond to anything outside of the test (Fowler, 2007). Meszaros (2007) identifies two types of test stub: a responder, which returns valid or invalid values via normal method calls, and a saboteur, which raises exceptions or errors to simulate abnormal behaviour.

3.2.1.4.5 Available Test Double Tools
Thirteen test double frameworks were identified for .Net that provide different implementations of the various types of doubles. There are few frameworks specific to each type of test double; most are called mocking frameworks regardless of the type of double they implement. By far the most popular framework for C# and .Net is Moq.
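The sketch below, which reuses the hypothetical IForumRepository and ForumController from the dependency injection example above, shows how a single Moq test double can act as a stub (returning a canned value via Setup/Returns) and as a mock (verifying the expected call was made via Verify). It is an illustration of the API rather than a test taken from the project.

using Moq;
using NUnit.Framework;

[TestFixture]
public class ForumControllerTests
{
    [Test]
    public void ShowPostReadsFromRepository()
    {
        // Stub behaviour: return a canned ForumPost when GetById(1) is called.
        var repository = new Mock<IForumRepository>();
        repository.Setup(r => r.GetById(1)).Returns(new ForumPost { ForumPostId = 1 });

        var controller = new ForumController(repository.Object);
        var post = controller.ShowPost(1);

        Assert.That(post.ForumPostId, Is.EqualTo(1));

        // Mock behaviour: verify the dependency was called exactly once.
        repository.Verify(r => r.GetById(1), Times.Once);
    }
}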

Name             Downloads   Last Update
AutoMock         1,893       18/05/2016
DimMock          1,648       25/01/2012
EasyMoq          2,022       01/05/2013
FakeItEasy       613,917     05/12/2016
Foq              22,567      14/06/2015
JustMock Lite    89,147      04/05/2016
LightMock        3,400       04/02/2016
Mockjockey       2,242       13/02/2013
Moq              8,942,046   20/08/2016
NMock3           19,395      04/04/2013
Prig             1,985       14/09/2014
RhinoMocks       843,888     22/04/2014
Simple.Mocking   9,539       22/12/2015

Table 4 list of test double frameworks on NuGet as of 07/12/2016

3.2.2 Comparison of Unit Test Frameworks
Only the unit test frameworks were compared, as they are the only type of tool that is required for TDD. The top two NuGet packages by download were used in the test along with Visual Studio Unit Testing, as this is included in Visual Studio as the default unit testing option.

3.2.2.1 NUnit
NUnit is an open-source and free UTF for all .Net languages. It is part of the xUnit family of unit test frameworks and was originally ported from JUnit (a Java UTF). It fully integrates with the Visual Studio IDE and tests can be run via the console or with third party test runners such as ReSharper. The latest versions offer a constraint-based assertion syntax, which uses a single method of the Assert class and passes in a constraint object.


[Test]
public void GetDeletePostReturnsView()
{
    var result = _controller.DeletePost(_post1.ForumPostId);

    Assert.That(result, Is.InstanceOf(typeof(ViewResult)));
}

Figure 4 example NUnit unit test showing the constraint-based assertion syntax

3.2.2.2 xUnit.Net
xUnit.Net is a free, open-source UTF that is part of the xUnit family of test frameworks. It supports C#, F#, VB.Net and other .Net languages. It fully integrates into the Visual Studio IDE and can work with various test runners such as ReSharper, CodeRush and TestDriven.Net. Unlike most UTFs, xUnit.Net does not use SetUp or TearDown methods, instead favouring constructors and IDisposable. This results in larger unit tests that are, in theory, easier to understand.

[Fact]
public void GetDeletePostReturnsView()
{
    SetUpControllerWithUnitOfWorkMock();
    var testPost = new ForumPost();
    _uowMock.Setup(x => x.Repository().GetById(1)).Returns(testPost);
    var result = _controller.DeletePost(1);

    Assert.IsType<ViewResult>(result);
}

Figure 5 example unit test in xUnit.Net, showing use of the parameterised test pattern because there are no setup and teardown methods

3.2.2.3 Visual Studio Unit Testing
Visual Studio Unit Testing is a UTF that is included in some versions of Visual Studio. Tests can be executed via a runner in Visual Studio, including third party runners, or from the command line using MSTest. The framework uses attributes to decorate classes and methods as tests and uses different methods on the Assert class to verify behaviour.


[TestMethod]
public void GetDeletePostReturnsView()
{
    var result = _controller.DeletePost(_post1.ForumPostId);

    Assert.IsInstanceOfType(result, typeof(ViewResult));
}

Figure 6 example unit test in the Visual Studio Unit Test framework showing a method call on the Assert class

3.2.2.4 Method and Results
A unit test should run fast and help localise errors quickly (Feathers, 2004), and a big part of TDD is the ability to run all unit tests frequently (usually every time a build is compiled), so an important consideration for a unit test framework is the speed at which it can run tests. A speed comparison was performed on the three frameworks listed above.

One test class from the project that had been written in NUnit to implement a feature via TDD was chosen, representing a varied selection of tests. This class was then rewritten into two new classes for Visual Studio Unit Test and xUnit.Net. The tests were kept as close to the original as possible, testing the exact same piece of code and the same expected results but using the syntax and methods of the different frameworks. If any other tools or frameworks were used in the original tests (such as mocks) then the exact same version was set up and used in the new test.

There were 40 unit tests in the test suite, so 80 new tests were written (40 in Visual Studio Unit Test and 40 in xUnit) for a total of 120 unit tests. These were then executed on two different test runners, the built-in Visual Studio runner and the ReSharper test runner, to see if the test runner itself played any part in the results or if it was just the framework that determined execution speed.

For each unit test method in each framework, the time in milliseconds that the test took to execute was recorded for each test runner. Metrics were then calculated on the data collected and the results can be seen in the table below.


                     ReSharper                     VS Runner
Metric               VS Test   NUnit   xUnit       VS Test   NUnit   xUnit
Total                687       604     1931        446       234     1235
Max                  436       49      1080        243       14      662
Min                  4         8       1           3         4       1
Average              16.75     11.23   48.28       11.15     5.85    30.88
Median               5         9       4           4         5       2
Standard Deviation   67.23     7.12    190.09      37.30     2.07    120.85

Table 5 values in milliseconds for both test runners

At first glance NUnit appears to be by far the fastest framework across both test runners when looking at the total and average values. However, the median and standard deviation show that xUnit has some extreme outliers that are skewing the results. Tests generally seem to run faster in xUnit, but it has the odd test that runs extremely slowly, whereas this does not happen with the other frameworks. This can be seen more clearly in the box plot graphs below.

Figure 7 box plot for the ReSharper test runner without outliers, showing that most results in xUnit are faster than NUnit


Figure 8 box plot for VS Test Runner without outliers shows contrast in even more detail than ReSharper results, suggesting that xUnit is much faster without the extreme results.

Figure 9 box plot for VS Test Runner with outliers and mean indicated. xUnit has two outliers that are far above the rest of the results; VS Test also has one much greater outlier

Two-tailed T-Tests were used to compare the data and determine if there was a significant difference between the frameworks regarding the speed at which each of the 40 unit tests ran. A value of 0.05 or less is generally considered statistically significant (Craparo, 2007) and none of the T-Tests performed were below this threshold. The biggest difference was between xUnit and NUnit, but even this was not statistically significant.


T-Test         VS Unit Test   NUnit   xUnit
VS Unit Test                  0.61    0.33
NUnit                                 0.23
xUnit

Table 6 T-Test results for the ReSharper test runner showing P values; results <0.05 are statistically significant

T-Test         VS Unit Test   NUnit   xUnit
VS Unit Test                  0.38
NUnit                                 0.2
xUnit

Table 7 T-Test results for the VS test runner showing P values; results <0.05 are statistically significant

Overall it appears that there is some difference between the unit test frameworks and between the test runners themselves, although the differences are not statistically significant. The data shows that NUnit has the fastest overall speed and all of its test methods execute in a similar time, whereas both VS Unit Test and xUnit have some extreme results that slow down the overall time. Most tests run faster in these two frameworks than in NUnit, as seen in the median values, but the extreme outliers slow down the overall test suite. As far as test runners are concerned, the data shows that the VS Test Runner is consistently faster than ReSharper, and the best combination out of the options compared is unit tests written in the NUnit framework and executed on the VS Test Runner.

This comparison was only conducted on a small subset of the unit tests written for the project, in which more than 400 were coded, and many large scale real world products could have many thousands of unit tests that need to be run often. Over that larger amount of data a more significant difference may be seen in the results. Further work should use a much larger set of test classes and methods which also utilise more of the frameworks' nuances and techniques.

3.3 Methods and Techniques for TDD and Legacy Code
Legacy code is defined as any application that does not have testing in place (Feathers, 2004). One of the biggest problems in working with legacy applications is not being able to detect whether changes made, either from refactoring the old code or from adding new code that interacts with the legacy code, cause existing functionality to break. Without testing in place, it is impossible to be certain that changes do not cause issues in other parts of the program.

There are a few different approaches to working with legacy code. The first option is to simply abandon the legacy application and start again. This approach has merits in that the new version can be made cleaner, learn from the mistakes of the original and put tests in place, but the downside is that it can be time consuming and wasteful, especially if the legacy application is still working correctly. Another approach is to get the whole of the legacy application under test before adding any new functionality; however, this can be extremely difficult and costly, and it can be a long time before any new features appear in the program.

The third approach, and the one recommended by Feathers (2004), is to only touch legacy code in places where it is needed, i.e. when a new feature is required that must interact with parts of the existing code base. In these instances, the feature should be added using TDD and any legacy code that the new code must interact with should be wrapped inside a test harness before any refactoring takes place. On some occasions the legacy code may be untestable, for example because of dependencies or tight coupling; in these instances other forms of testing, such as acceptance or integration tests, may need to be introduced temporarily to get the code under test before breaking dependencies to write unit tests. This is done to ensure that the behaviour of the legacy code is not altered when changing its structure. Some automated refactoring tools can help with this scenario by performing the refactoring in a more secure manner, but caution must be exercised as not all tools are 100% reliable (Feathers, 2004).
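As a hedged sketch of how such a dependency might be broken (illustrative code, not taken from the project), the example below leaves a hard-to-test legacy class untouched, introduces an interface and a thin adapter as a seam, and lets the new feature code written with TDD depend only on the interface so that its unit tests can substitute a test double. All class and method names are hypothetical.

// Legacy class: talks directly to infrastructure and is hard to unit test.
public class LegacyEmailSender
{
    public void Send(string address, string body)
    {
        // ... existing code, deliberately left unchanged ...
    }
}

// Seam introduced for the new code: an interface plus a thin adapter.
public interface IEmailSender
{
    void Send(string address, string body);
}

public class LegacyEmailSenderAdapter : IEmailSender
{
    private readonly LegacyEmailSender _legacy = new LegacyEmailSender();

    public void Send(string address, string body)
    {
        _legacy.Send(address, body);
    }
}

// New feature developed with TDD depends only on the interface, so its unit
// tests can inject a fake or mock IEmailSender instead of the legacy class.
public class RegistrationService
{
    private readonly IEmailSender _emailSender;

    public RegistrationService(IEmailSender emailSender)
    {
        _emailSender = emailSender;
    }

    public void Register(string address)
    {
        _emailSender.Send(address, "Welcome to the site");
    }
}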

Feathers (2004) states that new features should be added via TDD when working with legacy code, as this is the most important feature addition technique; it allows developers to work on new code independently of the legacy code as well as introducing tests to the project. He goes on to outline an extended TDD cycle in which a new phase is added at the beginning to get the class to be changed under test, with a note of caution not to change any legacy code when making the test pass. Legacy code can be modified during the refactoring process once it is safely under a test suite.

Figure 10 modified TDD cycle for legacy code (Feathers, 2004): get the code under test, write a unit test that fails (red), make the code compile, make the test pass (green) without modifying old code, run all tests, then rework the new and old code if refactoring is needed


4. MEASURING SOFTWARE QUALITY

4.1 What Makes Good Software?
Software quality is hard to define and there is no perfect design (Marinescu, 2005), but several attempts have been made to establish ways of measuring the code of programs to determine quality. The International Organisation for Standardisation (ISO) set forth a quality model in ISO 9126 in 1991, which has since been superseded by ISO/IEC 25023:2016. These standards provide a set of quality measures that can be used to quantitatively measure and evaluate software quality, including internal metrics, across multiple characteristics, and they state that complexity and maintainability are key factors in establishing software quality (BSI, 2016).

Martin (2006) put forth a set of principles for object-oriented design which, he claimed, if applied together would lead to a less complex and easier to maintain system. These principles are based largely upon having low complexity, high cohesion and loose coupling of classes to produce high quality object-oriented code.

A large number of software metric suites have been developed for measuring object-oriented design and numerous tools are available for calculating them (Lincke et al., 2008). In 1977 Halstead proposed a suite of metrics based around measuring the complexity of software, and although it has had a lasting effect and is still used in other metrics such as the maintainability index, the work has been heavily criticised (Fenton and Bieman, 2014). McCabe (1976) also proposed a metric for measuring software complexity, Cyclomatic Complexity, that is still widely used today.

Other pioneers in software metrics for object-oriented design were Chidamber and Kemerer, who in 1991 proposed a suite of six measurements that could be used to predict the quality of code in any object-oriented language.


Metric Name                   Abbreviation
Coupling Between Objects      CBO
Depth of Inheritance Tree     DIT
Lack of Cohesion of Methods   LCOM
Number of Children            NOC
Response for a Class          RFC
Weighted Methods Per Class    WMC

Table 8 Chidamber and Kemerer metrics suite

The Martin suite of metrics (Martin, 1994) establishes a set of metrics that can measure the quality of an object-oriented design in terms of the independence of subsystems. It sets forth a design pattern to encourage all forms of dependency to be of a desirable form, and the set of metrics are designed to measure the conformance to the design pattern.

Metric                            Abbreviation
Abstractness                      A
Afferent Coupling                 Ca
Efferent Coupling                 Ce
Instability                       I
Distance from the Main Sequence   D

Table 9 Martin metric suite

A thesis by Andersson and Vestergren (2004) describes an experiment into which metrics are best suited for measuring object-oriented design, taking measurements from a selection of good and bad quality source code with several different software metrics tools. They conclude that metrics do have a practical use and can, to some extent, reflect a software system's design quality. They also state that some metrics are better than others for predicting software quality.

4.2 Software Metrics Used for Project
A subset of the metrics presented by Andersson and Vestergren (2004) in their study on measuring object-oriented design was used to analyse the code quality of the legacy application; each metric is described in detail below. These metrics were chosen because of their availability in the tools surveyed: to be able to compare the tools, a suite of metrics that all the tools supported was needed. The report looks at the metrics at an application level for server side C# code only and does not analyse any client side code (HTML, JavaScript, etc.).

Metric Name                   Abbreviation
Maintainability Index         MI
Cyclomatic Complexity         CC
Lack of Cohesion of Methods   LCOM
Coupling Between Objects      CBO
Lines of Code                 LOC

Table 10 metrics used for the project and their abbreviations

4.2.1 Maintainability Index (MI)
The Maintainability Index is a composition of other metrics that shows how easy the code base will be to maintain in the future. Oman and Hagemeister first proposed MI in 1992 and it was used in an extensive trial by Hewlett Packard (Coleman et al., 1994). The MI is composed of traditional software metrics, namely weighted Halstead metrics, McCabe's Cyclomatic Complexity, lines of code and number of comments, combined into a single figure that indicates the maintainability of a piece of code.

Two original calculations were set forth; one that considered comments and one that did not (Oman and Hagemeister, 1992).

MI = 171 - 3.42ln(aveE) - 0.23aveV(g') - 16.2ln(aveLOC)

Equation 1, where aveE is the average Halstead Effort per module, aveV(g') is the average extended cyclomatic complexity per module and aveLOC is the average number of lines of code per module.

MI = 171 - 3.42ln(aveE) - 0.23aveV(g') - 16.2ln(aveLOC) + 0.99aveCM

Equation 2, where aveE is the average Halstead Effort per module, aveV(g') is the average extended cyclomatic complexity per module, aveLOC is the average lines of code per module, and aveCM is the average number of lines of comments per module.

Several other calculations have been proposed over the subsequent years (Welker, 2001), but all are based on similar traditional metrics composed into a single figure. VS Metrics is the only tool used for this project that gives an MI rating and it calculates it differently to the original version. The value is normalised to be in the range of 0 to 100 instead of the original range of 171 down to an unbounded negative number.

MI = MAX(0,(171 – 5.2 * ln(Halstead Volume) – 0.23 * (Cyclomatic Complexity) – 16.2 * ln(Lines of Code))*100 / 171)

Equation 3 MI as calculated by the VS Metrics tool (Conorm, 2007)

The thresholds proposed by the VS Metrics tool are as follows:

Range    Maintainability
20-100   Good maintainability
10-19    Moderate maintainability
0-9      Low maintainability

Table 11 maintainability thresholds as proposed by VS Metrics tool (zainnab, 2011)
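As a worked illustration of Equation 3 (the input values are assumed for the example and are not measurements from the project), the snippet below computes the normalised index for a module with a Halstead Volume of 1000, a Cyclomatic Complexity of 10 and 200 lines of code; the result of roughly 27 falls into the "good maintainability" band of Table 11.

using System;

public static class MaintainabilityIndexExample
{
    // Normalised MI as calculated by the VS Metrics tool (Equation 3).
    public static double Calculate(double halsteadVolume, double cyclomaticComplexity, double linesOfCode)
    {
        var raw = 171 - 5.2 * Math.Log(halsteadVolume)
                      - 0.23 * cyclomaticComplexity
                      - 16.2 * Math.Log(linesOfCode);
        return Math.Max(0, raw * 100 / 171);
    }

    public static void Main()
    {
        // Assumed example inputs: Halstead Volume 1000, CC 10, LOC 200.
        Console.WriteLine(Calculate(1000, 10, 200)); // prints approximately 27.5
    }
}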

4.2.2 Cyclomatic Complexity (CC)
CC measures the complexity of a program by calculating the number of linearly independent paths in the code and has a foundation in graph theory. CC was first proposed by McCabe in 1976 and its primary purpose was to provide a mathematical technique for identifying software modules that are difficult to test or maintain (McCabe, 1976). In practical terms, it can be used to provide an upper bound for the number of tests needed to ensure that all code paths are executed (Pressman, 1997).

McCabe's CC metric is defined as:

v(G) = e - n + 2

Equation 4, where v(G) equals the cyclomatic complexity of the flow graph G, e equals the number of edges and n equals the number of nodes. To better illustrate this, consider the following example of a bubble sort algorithm in C#:


1: for (int i = intArray.Length - 1; i > 0; i--) {
2:     for (int j = 0; j <= i - 1; j++) {
3:         if (intArray[j] > intArray[j + 1]) {
4:             int highValue = intArray[j];
5:             intArray[j] = intArray[j + 1];
6:             intArray[j + 1] = highValue;
           }
       }
   }

Figure 11 bubble sort algorithm in C#

To work out the CC it can be converted to a control flow graph, using the numbers in the code above as nodes:

Figure 12 control flow graph for the sample code, showing 6 nodes and 8 edges

The number of edges is 8 and the number of nodes is 6, so the CC can be worked out as v(G) = 8 - 6 + 2 = 4. Equivalently, CC can be counted as the number of decision points plus one: the two for loops and the if statement give three decisions, so v(G) = 3 + 1 = 4.

As CC measures complexity, it is therefore desirable to keep the number as low as possible. McCabe himself used an upper bound of 10, which he stated seemed like a “reasonable, but not magical, upper limit” (McCabe, 1976).

4.2.3 Coupling Between Objects (CBO)
A large assortment of coupling measurements for object-oriented systems have been defined, but Chidamber and Kemerer's CBO metric is the most widely used (Mitchell and Power, 2005).


CBO for a class is the count of the number of other classes to which it is coupled. Coupling occurs when one class uses methods or instance variables of another, but it is not transitive; for example, if class A is coupled with class B and class B is coupled with class C, then class A is not necessarily coupled with class C.
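A minimal sketch (author's illustration, not project code) of how this is counted: in the classes below A uses B, and B uses C, so A is coupled to B and B is coupled to C, but A never uses C's methods or instance variables, so the coupling does not propagate from A to C through B.

public class C
{
    public int Value() { return 42; }
}

public class B
{
    private readonly C _c = new C();   // B uses C, so B is coupled to C.

    public int DoubleValue() { return _c.Value() * 2; }
}

public class A
{
    private readonly B _b = new B();   // A uses B, so A is coupled to B.

    public int Report() { return _b.DoubleValue(); }
}

// A is not coupled to C: it never touches C's methods or instance variables.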

CBO was first proposed by Chidamber and Kemerer (1991) in their paper establishing a metric suite for object-oriented design, along with five other metrics. The same authors revised CBO in 1994, when some ambiguities regarding inheritance and a requirement for bi-directionality in coupling were removed (Chidamber and Kemerer 1994).

Chidamber and Kemerer (1994) argue that coupling should be kept to a minimum to improve modularity and encapsulation, and that high coupling can be associated with poor maintainability due to sensitivity to changes in other areas of the design. Classes that depend on each other too much are harder to understand, change and correct, and add complexity to the resulting system (Mitchell and Power, 2005). However, Henderson-Sellers et al. (1996) state that without any coupling a system is useless, therefore some coupling is necessary, and that it is the elimination of extraneous coupling that is the goal of good object-oriented design.

CBO can be used as an indicator of whether a class is losing its integrity (Siniaalto and Abrahamsson, 2007), how complex the testing of various parts of the design is likely to be (Chidamber and Kemerer, 1994), and the maintainability, fault proneness, testability and change proneness of a software design (Mitchell and Power, 2005).

4.2.4 Lines of Code (LOC)
LOC is a measurement of a module's size and is one of the oldest software metrics (Rosenberg, 1997). However, as simple as that sounds, there are many variations of this metric and discrepancies over what exactly should be counted. There are many considerations as to what should be included in the count, such as commented lines, blank lines, single method calls that span multiple lines because of coding style, include and using statements, etc., and this can lead to counts of the same piece of code varying widely (Smacchia, 2007).
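To illustrate why such counts diverge, consider the short method below (an illustrative example, not project code).

public int Add(int a, int b)
{
    // add the two operands

    int sum = a + b;
    return sum;
}

Counting every physical line gives seven; excluding the blank line and the comment gives five; counting only logical statements gives two. Each is a defensible value of "LOC" for the same method, which is why different tools can report very different figures.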


Rosenberg (1997) states that LOC can be used as a predictor of development or maintenance effort, a covariate for other methods to normalise them to the same code density and as a standard against which other metrics can be evaluated.

On a method level, and when calculating the average LOC for methods, it can be a good indicator of the complexity of the code. The NDepend tool recommends that methods with LOC greater than 20 are hard to understand and those with LOC greater than 40 are extremely complex and should be split into many smaller methods (NDepend, 2016).

4.2.5 Lack of Cohesion of Methods (LCOM)
LCOM is a measurement of how cohesive a class is. It measures the correlation between methods and the local instance variables of a class. Cohesion is desirable in a class because it promotes encapsulation and adheres to the single responsibility principle as proposed by Martin (2003).

LCOM was first proposed by Chidamber and Kemerer in 1991, but several other definitions have been developed subsequently (Li and Henry, 1993; Henderson-Sellers, 1996).

Consider a class C1 with n methods M1, M2, ..., Mn. Let {Ii} be the set of instance variables used by method Mi; there are n such sets {I1}, ..., {In}. Let P = {(Ii, Ij) | Ii ∩ Ij = ∅} and Q = {(Ii, Ij) | Ii ∩ Ij ≠ ∅}. If all n sets {I1}, ..., {In} are ∅, then let P = ∅.

LCOM = |P| - |Q| if |P| > |Q|, and 0 otherwise.

Figure 13 definition of LCOM by Chidamber and Kemerer (1994)

LCOM produces values in the range from 0 to 1, where 0 indicates perfect cohesion and 1 an extreme lack of cohesion. It is desirable to keep the value as close to 0 as possible.

Low cohesiveness increases complexity, thereby increasing the number of errors in the development process and it implies that classes should probably be split into two or more subclasses (Chidamber and Kemerer, 1994).
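As a small worked example of this definition (illustrative code, not from the project), the class below has two methods that use disjoint sets of instance variables, so P = {(I1, I2)}, Q = ∅ and LCOM = |P| - |Q| = 1, the worst possible score and a hint that the class should be split.

public class OrderReport
{
    private int _itemCount;
    private string _customerName;

    public int CountItems()
    {
        return _itemCount;                        // uses only _itemCount
    }

    public string FormatHeading()
    {
        return "Report for " + _customerName;     // uses only _customerName
    }
}

// {_itemCount} ∩ {_customerName} = ∅, so the single method pair falls into P
// and none into Q, giving LCOM = 1 (an extreme lack of cohesion).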


4.3 Software Metric Tools in .Net

4.3.1 Previous Work
A lot of work has been done regarding metrics and their suitability for quantifying software quality, but not much research has been carried out comparing code metric tools, and only one paper has addressed this in relation to tools specifically for .Net. In their 2011 study Novak and Rakić identified 5 .Net tools that met their requirements for measuring software metrics; they went on to analyse the tools by running them on several different software applications and compared the results against one another using T-Tests. They concluded that software metric tools can give results that differ significantly for the same metrics on the same applications.

Most other work in the field has focused on other programming languages, notably Java and C/C++. Lincke et al. (2008) performed a comparison of 10 software metric tools for Java and analysed 9 metrics that most of the tools implemented. They found that there were several issues with metric definitions, notably that they are unclear and inexact, which opens the possibility of different interpretations and implementations, and that the name of a metric does not distinguish different variants of the same metric. They concluded that most of the metrics tools provide different results for the same inputs and that this could be due to differences in interpretation of the metrics.

Bakar (2012) also found that different tools calculate the same metrics differently by comparing four tools, and attributes this result to an ambiguity in metric definitions, which leads to different implementations across tools. He concludes that research using software metrics is tool dependent and that a different tool might give different results, because there can be significant differences between the measurement results; he also suggests that some standard meanings should be provided for software metrics. None of the studies formulate an opinion on which tool gives better or more accurate results, but they all conclude that the results differ from one another significantly on at least some software metrics.

4.3.2 Available Tools
14 different tools were identified for measuring software metrics in .Net after performing an internet search. These varied from relatively simple lines of code counters to complex enterprise-level static code analysis software requiring substantial licence fees.

Name                     Manufacturer             Link                                                                      Free
C# Source Code Metrics   Semantic Designs         http://www.semdesigns.com/Products/Metrics/CSharpMetrics.                 No
RSM                      M Squared Technologies   http://msquaredtechnologies.com/m2rsm/index.htm                           No
LocMetrics               LocMetrics               http://www.locmetrics.com                                                 Yes
Code Counter Pro         Gerone Soft              http://www.geronesoft.com                                                 No
SLOCCount                D H Wheeler              http://www.dwheeler.com/sloccount/                                        Yes
SLOCMetrics              Microguru                http://microguru.com/products/sloc/
EZMetrics                James Heires             http://www.jamesheiresconsulting.com/Products.htm                         No
McCabeIQ                 McCabe Software          http://www.mccabe.com/iq_developers.htm                                   No
NDepend                  NDepend                  http://www.ndepend.com                                                    No (free academic licence available)
Sourcemonitor            Campwood Software        http://www.campwoodsw.com/sourcemonitor.html                              Yes
VS Code Metrics          Microsoft                                                                                          Yes
Borland Together         Micro Focus              http://www.borland.com/en-GB/Products/Requirements-Management/Together   No
TICS                     TIOBE                    http://www.tiobe.com/tics/fact-sheet/                                     No
Designite                Designite                http://www.designite-tools.com                                            No (free academic licence available)

Table 12 shows the tools available for software metrics in .Net

4.3.3 Evaluation of Tools Only tools that were freely available, provided a free trial period or a free academic licence were considered for evaluation. In addition to this the tool had to provide at least one of the metrics listed earlier and be available for use in .Net and C#. Only six tools of the 14 found met the selection criteria, each of which is discussed in more detail in this section.


Number   Name
1        VS Code Metrics
2        LocMetrics
3        SLOCCount
4        NDepend
5        SourceMonitor
6        Designite

Table 13 list of software metric tools evaluated

4.3.3.1 Visual Studio Code Metrics
The VS Code Metrics tool is developed by Microsoft and built into the Visual Studio Integrated Development Environment (IDE). It offers a small range of software metrics for identifying code quality and measures metrics on a solution, project, namespace, type, or method level. Data can be exported directly to Excel or as a CSV.

4.3.3.2 LocMetrics
LocMetrics is a tool for measuring the various lines of code in a project. It runs external to the IDE with a GUI or from the command line. The tool has not been updated since October 2007 but is still available for download.

4.3.3.3 SLOCCount
SLOCCount supports several different programming languages and is open source software, which means that the source code can be modified. It only concentrates on the lines of code metric. It was last updated in August 2004 but is still available for download.

4.3.3.4 NDepend
NDepend is a commercial tool that provides extensive software metric coverage as well as graphical reports and trend monitoring. The tool is highly configurable so that it can be set up to the needs of the project. Metrics can be calculated for applications, assemblies, namespaces, types and methods. It can be installed as a package extension to Visual Studio and all data can be exported in various formats.


4.3.3.5 SourceMonitor
SourceMonitor is a freeware program that offers metrics at source code and method levels for a variety of programming languages. It operates within a GUI external to the IDE and can display metrics in graphical tables and charts. Metrics can also be exported to XML or CSV.

4.3.3.6 Designite
Designite is a commercial tool that provides software metrics at solution, project, class and method level. It also provides information on object-oriented design in the form of "code smells", trend analysis and a dependency matrix. The tool is highly customisable and allows results to be exported to Excel, XML or CSV file formats.

Name              MI   CC   CBO   LOC   LCOM
VS Code Metrics   X    X    X     X
LocMetrics             X          X
SLOCCount                         X
NDepend                X    X     X     X
Sourcemonitor          X          X
Designite              X    X     X     X

Table 14 shows which of the 6 tools selected support each of the metrics required by the study (X indicates the metric is supported)

4.3.4 Comparison
Of the six tools evaluated above, LocMetrics and SLOCCount were dismissed for use in the project, as they only offered a small subset of the software metrics required and are no longer supported by their developers. The remaining four tools (VS Code Metrics, NDepend, SourceMonitor and Designite) were compared using the technique described in this section, which was based upon the methods proposed by Novak and Rakić (2011).

First, all the software to be compared was installed, either as a standalone product or as an extension to Visual Studio. Code analysis was then run on the legacy application used for the project, before any new features were added with TDD. After the initial analysis was run, the results were exported to Microsoft Excel, where further analysis was performed.


All data was saved into a separate file for each tool being compared, and the average value and standard deviation were calculated for all the metrics used in this project. A vector of metric values was then created for each tool, and the vectors were compared pairwise using two-sample t-tests, which determine whether two sets of data differ significantly from one another. Following general statistical convention, a p-value below 0.05 is considered significantly different (Craparo, 2007).
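To make the comparison step concrete, the sketch below reproduces the unequal-variance (Welch) t statistic and degrees of freedom that Excel's two-sample t-test reports, using the start-of-project readings for VS Code Metrics and NDepend listed in Appendix A. This is an illustrative sketch only; the p-values quoted in this report were produced in Excel.

using System;
using System.Linq;

public static class MetricComparison
{
    // Welch (unequal-variance) two-sample t statistic and approximate degrees
    // of freedom, the calculation behind Excel's "t-Test: Two-Sample Assuming
    // Unequal Variances"; the p-value lookup itself was done in Excel.
    public static (double T, double Df) WelchTTest(double[] a, double[] b)
    {
        double meanA = a.Average(), meanB = b.Average();
        double varA = a.Sum(x => (x - meanA) * (x - meanA)) / (a.Length - 1);
        double varB = b.Sum(x => (x - meanB) * (x - meanB)) / (b.Length - 1);

        double sa = varA / a.Length, sb = varB / b.Length;
        double t = (meanA - meanB) / Math.Sqrt(sa + sb);

        // Welch-Satterthwaite approximation for the degrees of freedom.
        double df = (sa + sb) * (sa + sb) /
                    (sa * sa / (a.Length - 1) + sb * sb / (b.Length - 1));

        return (t, df);
    }

    public static void Main()
    {
        // Start-of-project readings (CC, LOC, average LOC, CBO) for VS Code
        // Metrics and NDepend, as listed in Appendix A.
        var vsMetrics = new[] { 1.46, 1288.0, 24.3, 11.25 };
        var ndepend = new[] { 1.49, 1448.0, 19.32, 3.67 };

        var (t, df) = WelchTTest(vsMetrics, ndepend);
        Console.WriteLine($"t = {t:F4}, df ~ {df:F1}"); // t = -0.0767, df ~ 5.9 (Excel rounds df to 6)
    }
}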

T-Test          VS Metrics   Designite   SourceMonitor   NDepend
VS Metrics      -            0.53        0.56            0.94
Designite       -            -           1.00            0.55
SourceMonitor   -            -           -               0.58
NDepend         -            -           -               -

Table 15 T-Test results (p-values); a value of <0.05 is considered significantly different

The results indicate that the calculated metric values for each tool do not significantly differ from one another, which contradicts previous work in this area. In some instances the t-test indicates that the tools give virtually identical results, most notably for the comparisons between VS Metrics and NDepend (p = 0.94) and between SourceMonitor and Designite (p = 1.00).

The standard deviation for each metric was also calculated and can be seen in the table below. The results show a large variation for the LOC metrics, which agrees with the results Novak and Rakić (2010) found. This could be put down to an ambiguity in the calculation of LOC across tools; there does not appear to be a standard way of calculating this metric.

Tool           MI      CC     LCOM   LOC    Avg. LOC   CBO
VS Metrics     13.63   1.36   -      1296   45.19      16.04
Designite      -       2.06   0.19   4320   79.52      1.94
SourceMonitor  -       1.11   -      4330   104.97     -
NDepend        -       1.09   0.28   1448   3.38       -
Average        13.63   1.41   0.24   1477   58.27      8.99

Table 16 standard deviation for all metrics and tools, showing a large variance for the LOC metrics across tools

In conclusion, this study found mixed results when comparing the software metric data between different tools in .Net. When taken as a suite of metrics, no significant difference was found between any of the tools, which contradicts previous work; but if each metric is analysed individually a large variance in the LOC metric can be found, which indicates a difference in the calculation method amongst the tools compared. From these results it is impossible to conclude which of the tools provides better results, and this could be an area for future research.

Threats to validity arise from the fact that this study used only a small subset of the metrics available in each tool, and only one project was used to calculate the metrics, unlike the previous studies, which used many different code bases; there may therefore be less chance for the results to differ.

4.3.5 Selection for Project
All the products compared in the previous section were used to take metric readings for the duration of the project. They were all used, rather than just one, because the comparison was unable to detect which tool was the more effective, and despite concluding that no significant difference in the data was found, most previous work in the area indicates otherwise (Lincke et al., 2008; Novak and Rakić, 2010). Showing the results of multiple tools over time allows for a more comprehensive conclusion on whether TDD improves internal code quality.

Number  Name
1       VS Code Metrics
2       NDepend
3       SourceMonitor
4       Designite

Table 17 list of tools selected for use in project


5. LEGACY PROJECT

5.1 Overview of Application
The existing project used for the research is a web application primarily focused on user-generated content (henceforth known as version 1.0 of the application). It allows users to create painting guides for miniature wargame models by letting them upload images, write text and select the equipment used for the various steps in the painting process. Version 1.0 of the application incorporated several Web 2.0 features such as commenting, rating and sharing of user-generated content.

Version 1.0 was developed using ASP.Net MVC 5 in C# for the backend, with a SQL database, and with HTML, CSS and jQuery for the frontend using a Bootstrap theme. It contained no unit tests or any other form of testing and therefore fulfilled the definition of a legacy application put forth by Michael Feathers (2004), who states that any code without unit tests can be considered legacy.

5.2 Development Plan
The new version of the application (version 2.0) was developed within an agile methodology (specifically Disciplined Agile Delivery), using TDD to implement new features. The project was developed over 10 weeks, with five iterations each two weeks in length, followed by a one-week transition phase to finalise the development.

Before development began, a one-week inception phase was held in which the initial planning of the project was carried out. First, a project schedule was produced in the form of a Gantt chart showing the duration of each phase of development.


ID  Task Name    Start        Finish       Duration
1   Inception    05/09/2016   11/09/2016   7d
2   Iteration 1  12/09/2016   25/09/2016   14d
3   Iteration 2  26/09/2016   09/10/2016   14d
4   Iteration 3  10/10/2016   23/10/2016   14d
5   Iteration 4  24/10/2016   06/11/2016   14d
6   Iteration 5  07/11/2016   20/11/2016   14d
7   Transition   21/11/2016   27/11/2016   7d

Figure 14 Gantt chart showing the schedule for development

Then a list of stakeholders for the application was produced for which user stories could be generated.

Stakeholder     Description
Admin           The admin can manage and curate the content on the site as well as registered users.
Logged in User  A logged in user can create content for the site as well as interact with other users.
User            A user can view the content on the site but cannot create content or interact with other users.

Table 18 shows all stakeholders for the project that can have user stories assigned to them

A list of user stories was then made for the new features to be added, and each was given a number of priority points based upon its relative complexity and the effort needed. A list of known bugs was also created, each given a priority, and it was decided that fixing these should be included in development.

ID     Priority Points   User Story
US1    3                 As an admin I can login securely so that I can administer the site's content
US2    8                 As an admin I can curate the site's content so that I can remove any unwanted items
US3    8                 As an admin I can see all user accounts so that I can see who is using the site
US4    13                As an admin I can modify user accounts so that I can ban or suspend unwanted users
US5    13                As an admin I can modify comments so that I can remove or edit anything inappropriate
US6    5                 As a logged in user I can flag comments so that I can report anything inappropriate
US7    2                 As a logged in user I can flag paint guides so that I can report inappropriate content
US8    2                 As a user I can filter content on the site so that I can easily find content that I like
US9    5                 As a logged in user I can create a profile so that I can give myself an identity on the site
US10   3                 As a logged in user I can add paint guides to my favourites so that I can preserve a list of all the content I like
US11   1                 As a user I can view lists of the top paint guides so I can view the best content on the site
US12   8                 As a logged in user I can send messages to other users so that I can communicate privately
US13   20                As a logged in user I can participate in forums so that I can take a more active part in the site community
US14   13                As an admin I can get detailed stats of site content use so that I can see what is happening on the site
US15   8                 As a logged in user I can rate comments so the best comments can be easily viewed

Table 19 all user stories for the new features to be added at the start of development. Stories could be changed or added right up until implementation of the item.

ID   Priority   Bug
1    2          Sort order is not preserved when switching the view type on the guides page
2    3          Items on the guides page are not separated into pages
3    1          Editing of a paint guide results in an unhandled error exception
4    1          Creating a paint guide causes an error

Table 20 list of known bugs in the previous version

These were then added into Microsoft's agile planning tool on Visual Studio Online, and a work item estimate of 150 points was calculated (allowing extra for contingency, such as additional features needing to be added or problems arising), which meant that 30 points' worth of user stories needed to be completed in each iteration.


Before the first iteration began, and at the end of each iteration, software metric readings were taken so that any differences in code quality could be tracked throughout the development lifecycle.


6. DEVELOPMENT CHALLENGES AND SOLUTIONS
The legacy application contained no form of testing, so the first thing needed was for a test project to be set up inside the solution so that unit tests could be written. The first problem occurred during the implementation of the first user story, which was functionality to allow the user to log in as an administrator. The issue was that unit tests could not be written without interacting directly with the database, because the business and data layers were too closely coupled and a layer of abstraction between the two did not exist. The dependencies needed to be broken before unit tests could be added and new functionality implemented. Three approaches for doing this were identified:

• Introduce a repository and unit of work design pattern
• Fake the database context by adding interfaces to DbSet and applicationContext, which would allow an alternative context to be used for testing
• Use a test double framework to mock the context directly without changing any legacy code

As TDD dictates implementing the simplest possible solution first, and best practice is to avoid changing legacy code until necessary, the third approach was attempted initially. However, this did not work as expected because of problems encountered when trying to mock non-interface classes, which would have led to having to change the implementation of the database context. This forced a move to the repository and unit of work approach instead, as this would make tests for other parts of the application easier to write in the future.
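The failed mocking attempt is not reproduced in the report, but a minimal sketch of the kind of problem described is shown below, assuming Moq (the Mock<T> style used elsewhere in this report) and using illustrative names (ApplicationContext, Guide) rather than the project's real classes.

using System.Data.Entity;
using Moq;

// Illustrative stand-ins for the context class and entities referred to above;
// they are not taken verbatim from the project.
public class Guide
{
    public int Id { get; set; }
}

public class ApplicationContext : DbContext
{
    public DbSet<Guide> Guides { get; set; }   // not virtual, so Moq cannot override it
}

public class FailedMockingAttempt
{
    public void Demonstrate()
    {
        var context = new Mock<ApplicationContext>();

        // Moq can only intercept virtual or interface members, so this Setup
        // call throws NotSupportedException - the problem that forced the move
        // to the repository and unit of work pattern.
        context.Setup(c => c.Guides).Returns(new Mock<DbSet<Guide>>().Object);
    }
}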

The repository and unit of work pattern is intended to create an abstraction layer between the data layer and the business logic layer. This pattern can help protect the application from changes in the data store and can help to facilitate TDD (Dykstra, 2013).


Figure 15 differences between application with no repository and with repository and unit of work pattern (Dykstra, 2013)

As this technique was going to change legacy code, and because unit tests could not yet be written, integration tests were needed to get the classes to be changed under a test harness. This was done using Selenium WebDriver to check elements on the live website, and the tests could be run after refactoring took place to ensure that functionality on the website remained the same.
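The integration tests themselves are not reproduced in the report; the following is a minimal sketch of the kind of Selenium WebDriver check described, assuming NUnit as the test framework and using a hypothetical URL and CSS selector.

using NUnit.Framework;
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

[TestFixture]
public class GuidesPageIntegrationTests
{
    private IWebDriver _driver;

    [SetUp]
    public void StartBrowser()
    {
        _driver = new FirefoxDriver();
    }

    [Test]
    public void GuidesPage_StillListsGuides_AfterRefactoring()
    {
        // Hypothetical URL and CSS class; the real site details are not given in the report.
        _driver.Navigate().GoToUrl("http://localhost:12345/Guides");
        var guideTitles = _driver.FindElements(By.CssSelector(".guide-title"));

        Assert.That(guideTitles.Count, Is.GreaterThan(0),
            "The guides listing should still render after the repository refactoring.");
    }

    [TearDown]
    public void StopBrowser()
    {
        _driver.Quit();
    }
}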

The repository and unit of work pattern was then implemented, with interfaces for both a repository and a unit of work class. Using interfaces allowed the classes to be easily mocked or faked for testing, by using dependency injection to supply the objects to the classes that needed to consume them. A fake implementation of the unit of work and repository could be injected during testing and the real version used in production, which allowed the concerns of the unit tests to be separated.

interface IUnitOfWork : IDisposable
{
    int Save();

    // Entity type names (User, Guide, Comment) are illustrative.
    IGenericRepository<User> UserRepository { get; }

    IGenericRepository<Guide> GuidesRepository { get; }

    IGenericRepository<Comment> CommentsRepository { get; }
}

Figure 16 first version of the IUnitOfWork interface, which had a repository for all data tables in the application (not all implemented here)

public interface IGenericRepository<TEntity> where TEntity : class
{
    IEnumerable<TEntity> Get(
        Expression<Func<TEntity, bool>> filter,
        Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy,
        string includeProperties);
    IEnumerable<TEntity> GetAll();
    TEntity GetById(object id);
    void Insert(TEntity entity);
    void Delete(object id);
    void Delete(TEntity entityToDelete);
    void Update(TEntity entityToUpdate);
    int SaveChanges();
}

Figure 17 first version of the IGenericRepository interface

Dependencies between the data and business logic layers were now broken, so new features could be added via TDD. After adding some new functionality, a model class from the legacy application needed to be changed, and this refactoring could be done with confidence because unit tests were now in place for this area of the application. The confidence to refactor safely is one of the great advantages of TDD over traditional software development techniques.
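To illustrate how the abstraction is consumed in a unit test, the sketch below injects a mocked IUnitOfWork into a controller (Moq and NUnit assumed; GuidesController, its Index action and the Guide entity are illustrative names, not necessarily those used in the project).

using System.Collections.Generic;
using System.Linq;
using System.Web.Mvc;
using Moq;
using NUnit.Framework;

[TestFixture]
public class GuidesControllerTests
{
    [Test]
    public void Index_ReturnsAllGuidesFromTheRepository()
    {
        // Arrange: the controller receives a mocked unit of work,
        // so the test never touches the database.
        var guides = new List<Guide> { new Guide(), new Guide() };

        var guidesRepository = new Mock<IGenericRepository<Guide>>();
        guidesRepository.Setup(r => r.GetAll()).Returns(guides);

        var unitOfWork = new Mock<IUnitOfWork>();
        unitOfWork.Setup(u => u.GuidesRepository).Returns(guidesRepository.Object);

        var controller = new GuidesController(unitOfWork.Object);

        // Act
        var result = controller.Index() as ViewResult;

        // Assert
        Assert.AreEqual(2, ((IEnumerable<Guide>)result.Model).Count());
    }
}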

The process of abstracting the data layer and breaking dependencies so that unit tests could be written was not easy, and it considerably slowed down the development of new features. A long time was spent finding seam points, writing integration tests and refactoring old code, to the detriment of adding new features. However, once this work had been done the benefits could be seen, and confidence began to grow in the ability to change the legacy code without consequences. It was tempting at this stage to move the whole project over to the new design pattern, but TDD emphasises that only the bare minimum should be done to make tests pass, and this would have required writing code and tests for parts of the application that did not currently need to be worked on.

The next problem was encountered when trying to implement a suspend user method that relied heavily upon the .Net framework classes UserStore and UserManager. It was difficult to determine where to place the dependencies for these objects in the application's own classes, and it was hard to test calls to them without mocking because they access the database directly. The first solution tried was to fake the application context in the unit tests, so that when the framework objects were called they accessed the fake context; this did not work as intended because of non-virtual properties that could not be mocked. The second solution was to inject the UserStore and UserManager into the repository's Suspend method directly, and this was done via an extension method so that not all children of the repository interface had to implement it. However, this led to problems with mocking, as extension methods are not easily mocked by most mocking frameworks. Possible solutions were to create a wrapper class, to use Moles (a Microsoft isolation framework), or to move the method directly into the interface. The last option was used, although this was not ideal because all classes implementing the repository interface now needed to have a suspend user method even if it was not required.

public interface IGenericRepository<TEntity> where TEntity : class
{
    IEnumerable<TEntity> Get(
        Expression<Func<TEntity, bool>> filter,
        Func<IQueryable<TEntity>, IOrderedQueryable<TEntity>> orderBy,
        string includeProperties);
    IEnumerable<TEntity> GetAll();
    TEntity GetById(object id);
    void Insert(TEntity entity);
    void Delete(object id);
    void Delete(TEntity entityToDelete);
    void Update(TEntity entityToUpdate);
    int SaveChanges();
    bool SuspendUser(string id);
    void ReinstateUser(string id);
}

Figure 18 SuspendUser and ReinstateUser methods added to the repository interface

The second iteration of development was easier to implement because many of the dependencies had already been broken in the first iteration. However, a problem was encountered when trying to add new functionality to the GuidesController, where some methods relied upon the .Net framework interface IIdentity. These methods checked the user's identity before proceeding via a call to GetUserId(), which is an extension method on the IIdentity interface; the problem occurred because this checks the data source directly and cannot be mocked, as it is an extension method. A seam was found by searching through the documentation: GetUserId() relies upon the Claim class, so it was possible to create a stubbed claim and then mock the HttpContextBase class to return it when GetUserId() was called in the controller.

private void SetFakeHttpContext(string id)
{
    var identity = new GenericIdentity("test id");
    identity.AddClaim(new Claim(
        "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/nameidentifier", id));
    var principal = new GenericPrincipal(identity, new[] { "user" });

    var context = new Mock<HttpContextBase>();
    context.Setup(s => s.User).Returns(principal);

    var controllerContext = new Mock<ControllerContext>();
    controllerContext.Setup(t => t.HttpContext).Returns(context.Object);

    _controller.ControllerContext = controllerContext.Object;
}

Figure 19 method for mocking the controller context so that GetUserId() returns a valid value without calling the database

During the third iteration it was decided to utilise a dependency injection framework, and Ninject was used for this. This allowed dependencies to be bound and default constructors to be removed from the controllers. A nuance of the MVC framework means that default constructors are normally used to create controllers, but Ninject removes this convention by automatically injecting the dependencies. Although an Inversion of Control pattern was already being used to inject dependencies before this, the full benefit regarding coupling of objects could not be seen, because the default constructor that instantiated these objects was still present in the controller.
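The composition root is not shown in the report, but a typical Ninject binding for the interfaces above looks something like the following; UnitOfWork and GenericRepository<> are assumed names for the concrete Entity Framework backed implementations.

using Ninject;

public static class DependencyBindings
{
    // Called once at application start-up, for example from the NinjectWebCommon
    // bootstrapper that the Ninject MVC integration package generates.
    public static void RegisterServices(IKernel kernel)
    {
        // Bind the abstractions to their implementations so that MVC controllers
        // receive them through their constructors instead of newing them up.
        kernel.Bind<IUnitOfWork>().To<UnitOfWork>();
        kernel.Bind(typeof(IGenericRepository<>)).To(typeof(GenericRepository<>));
    }
}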

The last major issue to arise during development was a problem with the repository interface that had been implemented earlier in the project. As more classes were moved over to the unit of work pattern the class was becoming unwieldy, with many different repositories having to be created as properties, and this was leading to high coupling and low cohesion in the class. It was decided to refactor the class by making it more generic; however, this led to hundreds of unit tests failing because they mocked the IUnitOfWork interface. This is one of the major downsides to using mocks, and to TDD in general: if interfaces are changed, lots of tests fail. Despite this being a long-winded refactoring process, it was relatively straightforward, due in the most part to the tests themselves, and an automated refactoring tool helped greatly here by pointing out errors and identifying all the places where changes needed to be made.

public interface IUnitOfWork : IDisposable
{
    int Save();

    IGenericRepository<T> Repository<T>() where T : class;
}

Figure 20 new IUnitOfWork interface that uses generics
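A brief usage sketch (with Guide and GuideService as illustrative names) shows how consuming code changed: repositories are now requested by entity type rather than through dedicated properties, so adding a new entity no longer requires changing the interface or the existing mocks.

using System.Collections.Generic;

// Illustrative only; Guide stands in for any entity class in the application.
public class GuideService
{
    private readonly IUnitOfWork _unitOfWork;

    public GuideService(IUnitOfWork unitOfWork)
    {
        _unitOfWork = unitOfWork;
    }

    public IEnumerable<Guide> AllGuides()
    {
        // Before the refactoring: _unitOfWork.GuidesRepository.GetAll();
        // After: a single generic method serves every entity type.
        return _unitOfWork.Repository<Guide>().GetAll();
    }
}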

Overall, many difficulties are encountered when working with TDD in .Net legacy code, but nothing that cannot be solved. However, a lot of hoops must be jumped through to get tests working for what seem to be the most basic of controller methods. It is often hard to determine what to test when working with a framework that auto-generates code, and there is a fine line between testing the application's own code and testing that of the framework; it is not always obvious how to go about testing code that calls framework methods. Another downside is that a lot of test code needs to be written, in some instances many more lines than the production code for a method. Finding seams in the framework to break dependencies is a difficult job and takes time, but one of the major upsides is the ability to refactor with confidence once tests are in place.


7. RESULTS
7.1 Software Metrics
This section looks at the results for each of the software metrics recorded for the project. Where possible, an average value taken from all the tools used to record the data is presented.

7.1.1 Maintainability Index (MI)
Only the VS Code Metrics tool recorded MI, and this metric shows a slight increase over the course of development. MI initially dropped during the first two iterations but showed a marked increase after the second iteration. A t-test between the set of data recorded at the start and that after the fifth iteration revealed no significant difference in the results.

Figure 21 Maintainability Index per iteration (VS Code Metrics); MI initially decreases before improving

7.1.2 Cyclomatic Complexity (CC)
The results for CC are inconclusive, with two tools recording a marginal increase and the other two recording a marginal decrease. The average values across all tools show a marginal increase in CC of 0.0075. Overall, CC was maintained at the same level throughout development. The values recorded by all tools are low for CC, meaning that the application started with a low complexity for methods and this was maintained during the development with TDD.

Chart: Cyclomatic Complexity per iteration (VS Code Metrics, NDepend, SourceMonitor, Designite and the average for all tools)

7.1.3 Lines of Code (LOC)
All the tools used recorded a LOC metric and, as expected, this value increased as more features were added. However, surprisingly, the average LOC per class also increased during development, and all tools showed a greater average LOC value at the end. This goes against the consensus that TDD produces smaller and more cohesive classes. No significant difference between the first and last readings was seen.


Figure 22 Lines of Code per iteration for all tools; LOC shows an increase during development, as would be expected as new features are added

Figure 23 Average lines of code per class per iteration for all tools; average LOC per class also increases, which is unexpected

7.1.4 Lack of Cohesion of Methods (LCOM)
The results for LCOM differ between tools and are therefore inconclusive. Only two of the tools used recorded LCOM: NDepend showed a decrease of 0.11 and Designite showed an increase of 0.03 between the first and last readings. The average value of the two tools showed an overall decrease across development. The values from both tools are considered low for LCOM, and therefore the classes in the application have good cohesion. There is a significant difference between the first and last readings for the NDepend tool, with a p-value of 0.000453 for a two-sample t-test of the data; however, the readings for the Designite tool do not show a significant difference.

Chart: Lack of Cohesion of Methods per iteration (NDepend, Designite and the average of the two tools)

7.1.5 Coupling Between Objects (CBO)
All the tools that recorded a CBO metric showed an increase over the course of development. CBO increased with every reading; however, none of the tools show a significant difference between the first and last readings. Two of the tools (VS Code Metrics and Designite) recorded a small increase in CBO over the course of development, but NDepend showed an increase of 8.85, with the largest leap coming at the end of the first iteration, when the repository and unit of work design pattern was introduced.


Figure 24 Coupling Between Objects per iteration; all tools show an increase in CBO throughout development

7.1.6 Overall
The results from the software metric readings are consistent with most of the previous work in the area, in that they show that TDD has a minimal impact upon internal code quality. Most of the metrics recorded did not show a significant difference between the readings taken at the start and at the end of development. Only the LCOM metric showed a significant improvement; however, this was only for one of the two tools that recorded this metric, while the other tool showed a non-significant decline in cohesion, so the overall result for LCOM is inconclusive.

Previous studies have recorded a small increase in CC, but this study shows that CC largely remained the same throughout development. The results for MI differ from those found by Kollanus (2011), whose systematic review concluded that most studies found TDD produced code that is difficult to maintain.

Overall the results show that TDD had a minimal impact upon the internal code quality of the product, with all the average values across all code metric tools showing no significant difference after introducing TDD into the project.


          Start     1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
MI        87.12     86.98           86.87           87.51           87.36           87.58
LOC       2846.50   3269.00         3155.50         3565.75         3576.50         3557.50
Avg. LOC  41.96     41.90           49.66           50.71           50.71           52.00
CBO       5.24      6.61            7.33            7.92            8.15            8.97
LCOM      0.11      0.12            0.09            0.09            0.08            0.07
CC        1.59      1.56            1.62            1.58            1.60            1.60

Table 21 average values for all metrics recorded during development

7.2 Code Test Coverage
Test coverage was recorded using the JetBrains dotCover tool, which integrates with Visual Studio and provides an analysis of the proportion of lines of code in the application that are covered by unit tests. The legacy application started with 0% code coverage, as it had no form of testing in place. After the fifth iteration the application had 46% code coverage from unit tests.

The percentage of code covered by unit tests increased during each iteration, with the largest increases coming during the first two iterations of development. This was when most of the legacy code that needed to be used was brought under a test harness. During subsequent iterations a smaller but more consistent increase can be seen, which can be put down to only new features, rather than legacy code, being placed under test.


Figure 25 Unit test coverage per iteration, showing the increase in code coverage during the development lifecycle

New namespaces that were added to the project show a 100% coverage rate by the end of the fifth iteration (except WSFinal.App_Start, which was auto-generated by the Ninject framework). This shows that, when introducing new code, TDD promotes high testability from the start and leads to an increase in developer confidence because the code is fully tested.

Namespace                  Start   1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
WSFinal.App_Start          -       -               0%              0%              0%              0%
WSFinal                    0%      0%              0%              0%              0%              0%
WSFinal.Migrations         0%      0%              0%              0%              0%              0%
WSFinal.Controllers        0%      0%              18%             25%             28%             30%
WSFinal.Models             0%      49%             59%             69%             69%             71%
WSFinal.Controllers.Admin  -       94%             96%             99%             99%             100%
WSFinal.Helpers            -       -               100%            100%            100%            100%
Total                      0%      11%             30%             37%             41%             46%

Table 22 shows the percentage of code covered by unit tests for each namespace in the application


These findings agree with most of the previous work on TDD and software quality, in that TDD introduces a significant improvement in code coverage from testing. However, this metric does not show the quality of the unit tests: it only records whether a test has covered a line of code, not how well it has been covered or whether all edge cases have been tested.


8. CONCLUSION
8.1 Summary and Evaluation
The project set out to determine if introducing TDD into a legacy application improved the internal code quality, and by using software metrics it showed that TDD had a minimal impact. A legacy application developed in the .Net MVC framework was extended, and refactored where necessary, using Test-Driven Development to implement new features. Software metric measurements were taken throughout development, and the results showed that TDD had a minimal impact upon the quality of the code in the project, not all of it positive, with most of the metrics displaying non-significant changes from the start of development to the last iteration. The results partly agree with most previous work in the area but go against the consensus of TDD advocates and pioneers such as Beck (2002), who claim that TDD produces small, cohesive classes that are not complex. However, it could be argued that the legacy code in this instance was already of a high quality and that TDD helped to maintain this.

The report identified the challenges that introducing TDD to a .Net application can bring and discovered that most of the problems encountered relate to breaking dependencies and finding seam points in the code, so that unit tests can be written that do not interact with the data layer. The report identified some solutions to these problems by pinpointing ways to separate the concerns of the application layers. It is possible to overcome all of the challenges when working with legacy .Net code in the MVC framework, but sometimes unnecessary work needs to be carried out to do so.

The report also examined the tools available for implementing TDD. From the experiments carried out it was discovered that there was no significant difference between unit test frameworks and test runners when analysing the speed of test runs.

It is inconclusive from this report whether it is worthwhile to extend a legacy .Net MVC application with TDD. On the one hand, test coverage increases significantly, which can lead to increased confidence when refactoring the legacy code; on the other hand, TDD leads to complex challenges having to be overcome and does not show a significant improvement in the quality and maintainability of the code.

8.2 Future Work
This report has brought to light that further work is needed to investigate the impact of TDD on internal code quality in legacy applications before any concrete conclusions can be drawn. The work carried out here could be extrapolated into a larger-scale project with a greater timescale, to see if trends can be found when more development work is carried out. It would also be desirable to investigate the impact upon legacy applications of differing quality, for instance a legacy application that has poor software metric values from the outset, to see if TDD improves the quality up to the levels seen in this report.

There is also scope for work comparing different development approaches when working with legacy code, to see which has the greater impact upon the internal code quality. A comparison between Test-Last Development and TDD, using two different teams to develop the same legacy application with the same user stories, would be a worthwhile experiment.

Further work is also needed on comparing unit test frameworks, with a much larger test suite than used in this project and across multiple .Net application types, to see if any differences between the frameworks and runners can be established and why this is the case.


REFERENCES
BS ISO/IEC 25023 (2016) Systems and software engineering. Systems and software Quality Requirements and Evaluation (SQuaRE). Measurement of system and software product quality 2016. British Standards Institute.

Ambler, S. (2013a) How Agile Are You? 2013 Survey Results. Available at: http://www.ambysoft.com/surveys/howAgileAreYou2013.html [Accessed: 5 June 2016].

Ambler, S. (2013b) Introduction to Test Driven Development (TDD). Available at: http://agiledata.org/essays/tdd.html [Accessed: 5 June 2016].

Andersson, M. and Vestergren, P. (2004) Object-Oriented Design Quality Metrics. Masters thesis. Uppsala University

Aniche, M. and Gerosa, M.A. (2015) ‘Does test-driven development improve class design? A qualitative study on developers’ perceptions’. Journal of the Brazilian Computer Society. 21(1). pp. 1-11.

Bakar, N. S. A. A. and Boughton, C.V. (2012) ‘Validation of measurement tools to extract metrics from open source projects’. 2012 IEEE Conference on Open Systems (ICOS). 21-24 October 2012. Kuala Lumpur, Malaysia. IEEE Xplore.

Basili, V.R., Briand, L.C. and Melo, W.L. (1996) ‘A validation of object-oriented design metrics as quality indicators’. IEEE Transactions on Software Engineering. 22(10). pp. 751-761.

Beck, K. (2002) Test Driven Development: By Example. Addison-Wesley Professional.

Bissi, W., Serra Seca Neto, Adolfo Gustavo and Emer, Maria Claudia Figueiredo Pereira (2016) ‘The effects of test driven development on internal quality, external quality and productivity: A systematic review’. Information and Software Technology. 74. pp. 45-54.

C. Klammer and A. Kern (2015) ‘Writing unit tests: It's now or never!’. IEEE Eighth International Conference on Software Testing, Verification and Validation Workshops (ICSTW). 13-17 April 2015. Graz, Austria. IEEE Xplore.

Chidamber, S.R. and Kemerer, C.F. (1994) ‘Metrics suite for object oriented design’. IEEE Transactions on Software Engineering. 20(6). pp. 476-493.

Chidamber, S.R. & Kemerer, C.F. (1991) Towards a metrics suite for object oriented design. OOPSLA 1991 Conference proceedings on Object-oriented programming systems, languages, and applications. 6-11 October 1991. Phoenix, USA. ACM.


Choudhari, J. and Suman, U. (2015) ‘An Empirical Evaluation of Iterative Maintenance Life Cycle Using XP’. SIGSOFT Softw. Eng. Notes. 40(2). pp. 1–14.

Choudhari, J. and Suman, U. (2014) ‘Extended Iterative Maintenance Life Cycle Using eXtreme Programming’. SIGSOFT Softw. Eng. Notes. 39(1). pp. 1–12.

Coleman, D., Ash, D., Lowther, B. and Oman, P. (1994) ‘Using metrics to evaluate software system maintainability’. Computer. 27(8). pp. 44-49.

conorm (2007) Maintainability Index Range and Meaning. Available at: https://blogs.msdn.microsoft.com/codeanalysis/2007/11/20/maintainability-index-range-and-meaning/ [Accessed: 21 November, 2016].

Craparo, Robert M. Significance Level. (2007). In: Encyclopaedia of Measurement and Statistics, 1st ed. Thousand Oaks, CA: SAGE Publications. pp.889-891.

Crispin, L. (2006) ‘Driving Software Quality: How Test-Driven Development Impacts Software Quality’. IEEE Software. 23(6). pp. 70-71.

Desai, C., Janzen, D. and Savage, K. (2008) ‘A survey of evidence for test-driven development in academia’. ACM SIGCSE Bulletin. pp. 97–101.

Dykstra, T. (2013) Implementing the Repository and Unit of Work Patterns in an ASP.NET MVC Application. Available at: https://www.asp.net/mvc/overview/older-versions/getting-started-with-ef-5-using-mvc-4/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application [Accessed: 26 September, 2016].

E. Shihab, Z. M. Jiang, B. Adams, A. E. Hassan and R. Bowerman (2010) ‘Prioritizing Unit Test Creation for Test-Driven Maintenance of Legacy Systems’. 2010 10th International Conference on Quality Software. 14-15 July 2010. Washington DC, USA. IEEE Computer Society.

Feathers, M. (2004) Working Effectively with Legacy Code. Prentice Hall, United States.

Fenton, N. & Bieman, J. (2014) Software Metrics: A Rigorous and Practical Approach. CRC Press.

Fowler, M. (2007) Mocks Aren't Stubs. Available at: http://martinfowler.com/articles/mocksArentStubs.html [Accessed: August 12, 2016].

Guerra, E. (2014) ‘Designing a Framework with Test-Driven Development: A Journey’. IEEE Software. 31(1). pp. 9-14.


Halstead, M.H. (1977) Elements of software science. Elsevier New York.

Henderson-Sellers, B. (1996) Object-orientated metrics: measures of complexity. Prentice Hall, Hemel Hempstead.

ITJobsWatch (2016) TDD Jobs. Available at: http://www.itjobswatch.co.uk/jobs/uk/tdd.do [Accessed: 4 June, 2016].

Janzen, D.S. and Saiedian, H. (2008) ‘Does Test-Driven Development Really Improve Software Design Quality?’. IEEE Software. 25(2). pp. 77-84.

Janzen, D.S. and Saiedian, H. (2006) ‘On the Influence of Test-Driven Development on Software Design’. 19th Conference on Software Engineering and Training 2006. 19-21 April 2006. Turtle Bay, Hawaii. IEEE Xplore.

Jeremiah, J. (2015) Agile vs. waterfall: Survey shows agile is now the norm. Available at: http://techbeacon.com/survey-agile-new-norm [Accessed: 4 June, 2016].

Jiau, H.C. and Chen, J.C. (2009) ‘Test code differencing for test-driven refactoring automation’ ACM Sigsoft Software Engineering Notes. 34(1). pp. 1–10.

Kaufmann, R. and Janzen, D. (2003) ‘Implications of test-driven development: a pilot study’. OOPSLA '03 Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications 26-30 October 2003. New York, USA. ACM.

Kollanus, S. (2011) ‘Critical Issues on Test-Driven Development’. 12th International Conference on Product-focused software process improvement (PROFES) 2011. 20-22 June 2011. Torre Canne, Italy. Springer.

Latorre, R. (2014) ‘Effects of Developer Experience on Learning and Applying Unit Test-Driven Development’ IEEE Transactions on Software Engineering. 40(4). pp. 381-395.

Li, W. and Henry, S. (1993) ‘Object-oriented metrics that predict maintainability’. Journal of Systems and Software. 23(2). pp. 111-122.

Lincke, R., Lundberg, J. and Löwe, W. (2008) ‘Comparing Software Metrics Tools’. ISSTA '08 Proceedings of the 2008 international symposium on Software testing and analysis. 20-24 July. Seattle, USA. ACM.


M. Mortensen, S. Ghosh and J. M. Bieman (2006) ‘Testing During Refactoring: Adding Aspects to Legacy Systems’. 2006 17th International Symposium on Software Reliability Engineering. 7-10 November 2006. Raleigh, USA. IEEE Xplore.

Mancl, D., Fraser, S.D. and Opdyke, B. (2011) ‘Workshop: Beyond Green-field Software Development: Reuse, Recycle, Refactor’. Proceedings of the ACM International Conference Companion on Object Oriented Programming Systems Languages and Applications Companion. 22-27 October 2011. Portland, USA. ACM.

Marinescu, R. (2005) ‘Measurement and quality in object-oriented design’. 21st IEEE International Conference on Software Maintenance (ICSM'05). 25-30 September 2005. Los Alamitos, USA. IEEE Computer Society.

Martin, M. and Martin, R.C. (2006) Agile Principles, Patterns, and Practices in C#. Prentice Hall, United States.

Martin, R. (1994) ‘OO Design Quality Metrics - An Analysis of Dependencies’. Available at: https://linux.ime.usp.br/~joaomm/mac499/arquivos/referencias/oodmetrics.pdf [Accessed: 10 October, 2016].

Martin, R.C. (2003) Agile software development: principles, patterns, and practices. Prentice Hall PTR.

McCabe, T.J. (1976) ‘A Complexity Measure’. IEEE Transactions on Software Engineering. 2(4). pp. 308-320.

Mens, T. and Tourwe, T. (2004) ‘A survey of software refactoring’. IEEE Transactions on Software Engineering. 30(2). pp. 126-139.

Meszaros, G. (2007) xUnit Test Patterns: Refactoring Test Code. Addison-Wesley Professional.

Mitchell, A. and Power, J. (2005) ‘Using object-level run-time metrics to study coupling between objects’. Proceedings of the 2005 ACM symposium on Applied computing. 13-17 March 2005. Santa Fe, USA. ACM.

Mortensen, M., Ghosh, S. and Bieman, J.M. (2008) ‘A test driven approach for aspectualizing legacy software using mock systems’, Information and Software Technology. 50(7). pp. 640.


Müller, M.M. (2006) ‘The effect of test-driven development on program code’. Proceedings of the 7th international conference on Extreme Programming and Agile Processes in Software Engineering. 17-22 June 2006. Oulu, Finland. Springer-Verlag.

NDepend (2016) Code Metrics Definitions. Available at: http://ndepend.com/docs/getting-started-with-ndepend [Accessed: 23 November, 2016].

Novak, J. and Rakić, G. (2010) ‘Comparison of software metrics tools for .Net’. Proceedings of 13th International Multiconference Information Society. 11-15 October 2010. Ljubljana, Slovenia.

Oman, P. and Hagemeister, J. (1992) ‘Metrics for assessing a software system's maintainability’. Conference on Software Maintenance 1992. 9-12 November 1992. Orlando, USA. IEEE Computer Society.

Osherove, R. (2009) The Art of Unit Testing: with Examples in .NET. Manning Publications.

Pančur, M. and Ciglarič, M. (2011) ‘Impact of test-driven development on productivity, code and tests: A controlled experiment’. Information and Software Technology. 53(6). pp. 557-573.

Pressman, R.S. and Ince, D. (1997) Software engineering: a practitioner's approach. McGraw- Hill, London.

Rosenberg, L.H. and Hyatt, L.E. (1997) ‘Software quality metrics for object-oriented environments’. Crosstalk Journal. 10(4). pp. 1-6.

S. Nair and P. Ramnath (2005) ‘Teaching a goliath to fly [Primavera Systems adoption of agile methodologies]’. Agile Development Conference (ADC'05). 24-29 July 2005. Denver, USA. IEEE Computer Society.

Sanchez, J.C., Williams, L. and Maximilien, E.M. (2007) ‘On the Sustained Use of a Test-Driven Development Practice at IBM’. Agile Conference 2007. 13-17 August 2007. Washington DC, USA. IEEE Computer Society.

Seemann, M. (2011) Dependency Injection in .NET. Manning Publications.

Shull, F., Melnik, G., Turhan, B., Layman, L., Diep, M. and Erdogmus, H. (2010) ‘What Do We Know about Test-Driven Development?’. IEEE Software. 27(6). pp. 16-19.

Siniaalto, M. and Abrahamsson, P. (2007) ‘A Comparative Case Study on the Impact of Test-Driven Development on Program Design and Test Coverage’. Proceedings of the First International Symposium on Empirical Software Engineering and Measurement. 20-21 September 2007. Madrid, Spain. IEEE Computer Society.

Siniaalto, M. and Abrahamsson, P. (2008) ‘Does Test-Driven Development Improve the Program Code? Alarming Results from a Comparative Case Study’. Balancing Agility and Formalism in Software Engineering: Second IFIP TC 2 Central and East European Conference on Software Engineering Techniques, CEE-SET 2007. 10-12 October 2007. Poznan, Poland. Springer Berlin Heidelberg, Berlin, Heidelberg.

Smacchia, P. (2007) How do you count your number of Lines Of Code (LOC)?. Available at: http://codebetter.com/patricksmacchia/2007/10/03/how-do-you-count-your-number-of-lines-of-code-loc/ [Accessed: 5 November, 2016].

Vu, J.H., Frojd, N., Shenkel-Therolf, C. and Janzen, D.S. (2009) ‘Evaluating Test-Driven Development in an Industry-Sponsored Capstone Project’. Sixth International Conference on Information Technology: New Generations, 2009. 27-29 April 2009. Las Vegas, USA. IEEE Computer Society.

Welker, K.D. (2001) ‘Software Maintainability Index Revisited’. CrossTalk - The Journal of Defense Software Engineering. pp. 18-21.

Winkler, D., Schmidt, M., Ramler, R. and Biffl, S. (2012) ‘Improving Unfamiliar Code with Unit Tests: An Empirical Investigation on Tool-Supported and Human-Based Testing’. 13th International Conference on Product-Focused Software Process Improvement, PROFES 2012. 13-15 June 2012. Madrid, Spain. Springer Berlin Heidelberg.

Y. L. Traon, T. Mouelhi, A. Pretschner and B. Baudry (2008) ‘Test-Driven Assessment of Access Control in Legacy Applications’. 2008 1st International Conference on Software Testing, Verification, and Validation. 9-11 April 2008. Lillehammer, Norway. IEEE Computer Society.

zainnab (2011) Code Metrics – Maintainability Index. Available at: https://blogs.msdn.microsoft.com/zainnab/2011/05/26/code-metrics-maintainability-index/ [Accessed: 21 November, 2016].


APPENDIX A
Full results of the T-Tests for comparing software metrics tools and readings for each metric.

VS Metrics v Designite
                              VS Metrics     Designite
Mean                          331.2525       1092.7025
Variance                      406916.7697    4629573.679
Observations                  4              4
Hypothesized Mean Difference  0
df                            4
t Stat                        -0.678589881
P(T<=t) one-tail              0.267314435
t Critical one-tail           2.131846786
P(T<=t) two-tail              0.53462887
t Critical two-tail           2.776445105

VS Metrics v SourceMonitor
                              VS Metrics     SourceMonitor
Mean                          437.92         1469.203333
Variance                      542107.4212    6139512.078
Observations                  3              3
Hypothesized Mean Difference  0
df                            2
t Stat                        -0.69103136
P(T<=t) one-tail              0.280487733
t Critical one-tail           2.91998558
P(T<=t) two-tail              0.560975465
t Critical two-tail           4.30265273

VS Metrics v NDepend
                              VS Metrics     NDepend
Mean                          331.2525       368.12
Variance                      406916.7697    518347.8713
Observations                  4              4
Hypothesized Mean Difference  0
df                            6
t Stat                        -0.076655038
P(T<=t) one-tail              0.470695081
t Critical one-tail           1.943180281
P(T<=t) two-tail              0.941390163
t Critical two-tail           2.446911851

Designite v SourceMonitor
                              Designite      SourceMonitor
Mean                          1456.67        1469.203333
Variance                      6149526.472    6139512.078
Observations                  3              3
Hypothesized Mean Difference  0
df                            4
t Stat                        -0.006192532
P(T<=t) one-tail              0.497677819
t Critical one-tail           2.131846786
P(T<=t) two-tail              0.995355638
t Critical two-tail           2.776445105

Designite v NDepend
                              Designite      NDepend
Mean                          874.174        294.528
Variance                      3710953.786    415839.8158
Observations                  5              5
Hypothesized Mean Difference  0
df                            5
t Stat                        0.638030546
P(T<=t) one-tail              0.275762439
t Critical one-tail           2.015048373
P(T<=t) two-tail              0.551524878
t Critical two-tail           2.570581836

SourceMonitor v NDepend
                              SourceMonitor  NDepend
Mean                          1469.203333    489.6033333
Variance                      6139512.078    688972.6052
Observations                  3              3
Hypothesized Mean Difference  0
df                            2
t Stat                        0.649302738
P(T<=t) one-tail              0.291374895
t Critical one-tail           2.91998558
P(T<=t) two-tail              0.58274979
t Critical two-tail           4.30265273

Reading for Each Metric and Tool
Metric    VS Metrics   Designite   SourceMonitor   NDepend
CC        1.46         1.93        1.47            1.49
LOC       1288         4320        4330            1448
Avg. LOC  24.3         48.08       76.14           19.32
CBO       11.25        0.8         -               3.67
LCOM      -            0.06        -               0.16
MI        87.12        -           -               -


APPENDIX B
Link to video demonstrating the previous version (1.0) of the legacy application: https://youtu.be/fVz7LmMcMBo


APPENDIX C
Full results for each metric for all iterations and tools.

Maintainability Index
Tool                    Start    1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
VS Code Metrics         87.12    86.98           86.87           87.51           87.36           87.58
NDepend                 -        -               -               -               -               -
SourceMonitor           -        -               -               -               -               -
Designite               -        -               -               -               -               -
Average for All Tools   87.12    86.98           86.87           87.51           87.36           87.58

Lines of Code
Tool                    Start    1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
VS Code Metrics         1288     1468            1560            1856            1952            2055
NDepend                 1448     1629            1523            1729            1743            1728
SourceMonitor           4330     5001            4807            5418            5386            5265
Designite               4320     4978            4732            5260            5225            5182
Average for All Tools   2846.5   3269            3155.5          3565.75         3576.5          3557.5

Average Lines of Code
Tool                    Start    1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
VS Code Metrics         24.3     24.47           26              27.7            28.29           28.15
NDepend                 19.32    19.98           23.08           24.01           24.04           24.69
SourceMonitor           76.14    74.64           92.44           91.83           91.29           94.02
Designite               48.08    48.5            57.1            59.29           59.22           61.14
Average for All Tools   41.96    41.8975         49.655          50.7075         50.71           52

Coupling Between Objects
Tool                    Start    1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
VS Code Metrics         11.25    11.55           11.75           12.36           12.99           13.27
NDepend                 3.67     7.38            9.21            10.33           10.46           12.52
SourceMonitor           -        -               -               -               -               -
Designite               0.80     0.89            1.03            1.07            1.01            1.13
Average for All Tools   5.24     6.61            7.33            7.92            8.15            8.97

Lack of Cohesion of Methods
Tool                    Start    1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
VS Code Metrics         -        -               -               -               -               -
NDepend                 0.16     0.17            0.1             0.09            0.07            0.05
SourceMonitor           -        -               -               -               -               -
Designite               0.06     0.07            0.08            0.08            0.08            0.09
Average for All Tools   0.11     0.12            0.09            0.085           0.075           0.07

Cyclomatic Complexity
Tool                    Start    1st Iteration   2nd Iteration   3rd Iteration   4th Iteration   5th Iteration
VS Code Metrics         1.46     1.45            1.51            1.48            1.5             1.49
NDepend                 1.49     1.48            1.55            1.5             1.5             1.48
SourceMonitor           1.47     1.46            1.49            1.43            1.42            1.42
Designite               1.93     1.83            1.94            1.91            1.98            1.99
Average for All Tools   1.59     1.56            1.62            1.58            1.60            1.60


APPENDIX D
Full results for the code coverage metric for each iteration.

Start
Namespace                  % Covered   Covered   Uncovered   Total Statements
WSFinal                    0%          0         71          71
WSFinal.Migrations         0%          0         571         571
WSFinal.Controllers        0%          0         1435        1435
WSFinal.Models             0%          0         281         281
WSFinal.Controllers.Admin  0%          0         0           0
Total                      0%          0         2358        2358

1st Iteration
Namespace                  % Covered   Covered   Uncovered   Total Statements
WSFinal                    0%          0         71          71
WSFinal.Migrations         0%          0         571         571
WSFinal.Controllers        0%          0         1435        1435
WSFinal.Models             49%         213       224         437
WSFinal.Controllers.Admin  94%         82        5           87
Total                      11%         295       2306        2601

2nd Iteration
Namespace                  % Covered   Covered   Uncovered   Total Statements
WSFinal                    0%          0         73          73
WSFinal.Migrations         0%          0         225         225
WSFinal.Controllers        18%         256       1146        1402
WSFinal.Models             59%         283       195         478
WSFinal.Controllers.Admin  96%         173       7           180
WSFinal.Helpers            100%        10        0           10
Total                      30%         722       1646        2368

3rd Iteration
Namespace                  % Covered   Covered   Uncovered   Total Statements
WSFinal.App_Start          0%          0         33          33
WSFinal                    0%          0         73          73
WSFinal.Migrations         0%          0         243         243
WSFinal.Controllers        25%         379       1138        1517
WSFinal.Models             69%         409       180         589
WSFinal.Controllers.Admin  99%         173       2           175
WSFinal.Helpers            100%        10        0           10
Total                      37%         971       1669        2640

4th Iteration
Namespace                  % Covered   Covered   Uncovered   Total Statements
WSFinal.App_Start          0%          0         34          34
WSFinal                    0%          0         73          73
WSFinal.Migrations         0%          0         162         162
WSFinal.Controllers        28%         426       1122        1548
WSFinal.Models             69%         407       180         587
WSFinal.Controllers.Admin  99%         230       2           232
WSFinal.Helpers            100%        10        0           10
Total                      41%         1073      1573        2646

5th Iteration
Namespace                  % Covered   Covered   Uncovered   Total Statements
WSFinal.App_Start          0%          0         34          34
WSFinal                    0%          0         73          73
WSFinal.Migrations         0%          0         6           6
WSFinal.Controllers        30%         472       1122        1594
WSFinal.Models             71%         447       180         627
WSFinal.Controllers.Admin  100%        252       0           252
WSFinal.Helpers            100%        12        0           12
Total                      46%         1183      1415        2598
