Software Evolution Analysis for Team Foundation Server
Total Page:16
File Type:pdf, Size:1020Kb
Software evolution analysis for Team Foundation Server Bachelor thesis University of Groningen July 2013 Author: Joost Koehoorn Primary supervisor: Prof. dr. Alexandru C. Telea Secondary supervisor: Prof. dr. Gerard R. Renardel de Lavalette Abstract To understand how software evolves, visualizing its history is a valuable approach to get an in-depth view of a software project. SolidTA, a software evolution visualization application, has been made to obtain these insights by extracting data from version control systems such as SVN and Git. Companies with large, proprietary codebases are often required to use an all-in-one solution such as Microsoft’s Team Foundation Server. During this project I have been looking into ways of extending SolidTA with the ability to import history from TFS. This has been achieved by utilizing the TFS SDK in order to import all necessary history information into SolidTA’s data domain. Another key part in understanding software evolution are source metrics, such as lines of code and McCabe’s complexity measure. The primary language of TFS projects is C# for which an analyzer was not available in SolidTA, so I have researched existing analyzers and then decided to implement an analyzer myself for greater control over the available metrics, based on an existing C# parser. This has become a fast tool that provides extensive per-file-metrics, for which SolidTA has been extended in order to visualize them. The TFS integration and C# analyzation have been field-tested on a codebase of RDW ("Rijksdienst van Wegverkeer"), which spans a history of circa six years. This test has shown that the implemented solutions are fast and reliable. i. Contents 1 Introduction 1 2 Related Work 2 2.1 SolidTA . .2 2.2 Repository mining . .4 2.3 Team Foundation Server . .4 2.3.1 Version Control . .4 2.3.2 Project Management . .5 2.3.3 Build server . .5 2.4 Metric computation . .5 2.4.1 Metrics . .6 3 Conceptual design 7 3.1 Import process . .7 3.2 Additional metrics . .8 3.3 TFS specific metrics . .9 4 TFS import 10 4.1 Requirements . 10 4.2 Possibilities . 11 4.2.1 TFS2SVN . 11 4.2.2 TFS SDK . 12 4.3 Environment . 12 4.3.1 Programming Language . 13 4.3.2 Dependencies . 14 4.4 Data storage . 14 4.5 Implementation . 15 4.5.1 Design . 15 4.5.2 Import stages . 16 4.5.3 Canceling and progress bars . 18 5 C# analysis 19 5.1 Requirements . 19 ii. 5.2 Solutions . 19 5.2.1 Existing analyzer . 20 5.2.2 Parser and Visitors . 20 5.3 Design and implementation . 21 5.3.1 Visitors . 21 5.3.2 Interface with SolidTA . 23 6 Additional changes 24 6.1 Metric based filtering . 24 6.2 Full-history analyzation . 25 7 Results 26 7.1 Test repository . 26 7.1.1 Performance . 26 7.2 Structural analysis of RDW repository . 27 7.2.1 Analyzation of code metrics . 30 7.3 Evaluation . 36 8 Conclusions 37 9 Future Work 38 10 Acknowledgments 39 11 Acronyms 40 Bibliography 41 List of Tables 2.1 Metrics of interest . .6 7.1 Performance timings . 26 iii. List of Figures 2.1 Overview of SolidTA . .3 3.1 SolidTA’s main window . .7 3.2 Metrics extraction window . .8 3.3 List of work items . .9 4.1 ORM Schema of SolidTA . 15 4.2 SolidTA’s progress bars . 18 5.1 Structural complexity code snippet . 22 6.1 Selective filtering based on metrics values . 24 7.1 Files in RDW repository . 27 7.2 Metrics legend . 28 7.3 Evolution of author activity . 28 7.4 Authors metric . 29 7.5 Authors metric (reduced) . 29 7.6 Evolution of LOC metric . 30 7.7 Evolution of CLOC metric . 31 7.8 Evolution of NCLAS metric . 31 7.9 Evolution of NCLAS metric, ≥ 9 ....................... 32 7.10 Evolution of MSIZE metric . 32 7.11 Evolution of FSIZE metric . 33 7.12 Evolution of CYCL metric . 33 7.13 Files with CYCL ≥ 9 marked . 34 7.14 Evolution of STRC metric . 34 7.15 Evolution of NCAL metric . 35 7.16 Files with NCAL ≥ 9 marked . 35 7.17 Evolution of SSIZE metric . 36 7.18 Evolution of SSIZE metric, ≥ 9 ........................ 36 iv. Software evolution analysis for TFS 1. Introduction Chapter 1 Introduction Version control systems (VCS) have become more and more important for software projects over the last couple of years. SolidTA is an application designed to perform analyzation on the evolution of large software projects. It gives insights in many differ- ent metrics and it is highly sophisticated in being able to combine many of such metrics to answer specific questions about software projects. This helps in understanding how a project is maintained, and may help in improving the development process. One of the goals of SolidTA is to provide a way to gain more insights into different types of software projects. So, not only can it be used to analyze just one individual project, but also for a comparison between multiple projects which may differ in many ways. This includes comparisons between open source and proprietary software. Since many large proprietary software companies are bound to use TFS, it is not currently possible do that, since SolidTA only has support for SVN. Patterns in how the VCS is used may be analyzed, but SolidTA offers a lot more by analyzing the actual code contents of a repository throughout the complete history. To accomplish this, static code analysis tools are used to calculate metrics from source files. The analysis tool that is currently used by SolidTA is CCCC1 and this tool only supports C, C++ and Java code. As TFS mainly targets C# applications, a solution has to be found to also support C# in SolidTA. In this thesis I research possibilities on providing an interface between TFS and SolidTA. In chapter 2 I look at related work, to continue with the conceptual design in chapter 3. Chapter 4 investigates a couple of approaches for data importing from TFS, and the best fitting solution is implemented and integrated in SolidTA. Then in chapter 5, options for C# analyzation are discussed and the implementation of the solution is given. To validate the implemented solutions, chapter 7 analyzes a large software project maintained in TFS. 1http://cccc.sourceforge.net 1 Software evolution analysis for TFS 2. Related Work Chapter 2 Related Work As visualization is an active research area, earlier research has been undertaken that is relevant to what is to be accomplished by this project. In this chapter I discuss findings from these earlier research projects and the main architecture and features of TFS. 2.1 SolidTA The provided software itself, SolidTA, is the result of research done by Voinea et al. [5]. In this paper, a new way of visualizing code evolution was proposed and a tool to support and validate their proposal was announced in the form of CVSgrab. This tool can be seen as the predecessor of SolidTA, which is a commercial release accommodating various new features compared to CVSgrab, but the visualization technique has stayed the same. Figure 2.1 shows the GUI of SolidTA, highlighting some important views. As SolidTA’s main goal is to visualize evolution of files, displayed in (a), the versions of files are visualized in (b) where in-depth insights are provided by various metrics (c), representing certain attributes of a version. 2 Software evolution analysis for TFS 2. Related Work Figure 2.1: Overview of SolidTA The visualization technique as described in aforementioned paper and used in SolidTA draws files as fixed height horizontal stripes —as seen in view (b)—, each divided up into several segments, representing all versions of the file. These segments are ordered according to creation time and the length correspond to the lifetime of the version. The paper also gives a formal definition of a repository R with NF files: R = fFi j i = 1::NF g (2.1) Each files Fi is then defined as a set of NVi versions: Fi = fVij j j = 1::NVig (2.2) These definitions are repository agnostic, whereas the exact definition of a version is dependent on the type of repository. SolidTA is flexible to support multiple version definitions by offering support for plugins that may store all kinds of data offered by a repository. 3 Software evolution analysis for TFS 2. Related Work 2.2 Repository mining Data mining is the process of extracting relevant information from version control sys- tems. This is a hard problem since those tools are primarily designed to commit changes and rollback to previous versions, they usually do not have interfaces for extraction of data [6]. Also, large software projects may span over a decade of history, covering thou- sands of files and hundreds of thousands of versions. These large amounts of data are mostly stored on a separate server and can thus not be requested on the fly, but have to be imported and stored locally first. Another problem exists when mining data from different types of repositories, because each VCS has its own features, libraries and APIs, if any. For an application such as SolidTA, these different types of repositories have to be imported into one central data domain, so differences have to be normalized. 2.3 Team Foundation Server Microsoft’s Team Foundation Server1 is much more than just a version control system. It also incorporates project management tools, a build server and continuous integration server, among more technical features, such as an SDK so external applications can extend and interact with TFS [3].