Representing Concerns in Source Code
by
Martin P. Robillard

B.Eng. (Computer Engineering), École Polytechnique de Montréal, 1997
M.Sc. (Computer Science), University of British Columbia, 1999

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Doctor of Philosophy
in
THE FACULTY OF GRADUATE STUDIES
(Department of Computer Science)

We accept this thesis as conforming to the required standard

The University of British Columbia
November 2003
© Martin P. Robillard, 2003

Abstract

Many program evolution tasks involve source code that is not modularized as a single unit. Furthermore, the source code relevant to a change task often implements different concerns, or high-level concepts that a developer must consider. Finding and understanding concerns scattered in source code is a difficult task that accounts for a large proportion of the effort of performing program evolution. One possibility to mitigate this problem is to produce textual documentation that describes scattered concerns. However, this approach is impractical because it is costly, and because, as a program evolves, the documentation becomes inconsistent with the source code.

The thesis of this dissertation is that a description of concerns, representing program structures and linked to source code, that can be produced cost-effectively during program investigation activities, can help developers perform software evolution tasks more systematically, and on different versions of a system.

To validate the claims of this thesis, we have developed a model for a structure, called concern graph, that describes concerns in source code in terms of relations between program elements. The model also defines precisely the notion of inconsistency between a concern graph and the corresponding source code, so that it is possible to automatically detect and repair inconsistencies between a description of source code and an actual code base.
To experiment with concern graphs, we have developed a tool, called FEAT, that allows developers to iteratively build concern graphs when investigating source code, to view the code related to a concern, and to perform analyses on a concern representation. Using FEAT, we have evaluated the cost and usefulness of concern graphs in a series of case studies involving the evolution of five systems of different size and style. The results show that concern graphs are inexpensive to create during program investigation, can help developers perform program evolution tasks more systematically, and are robust enough to represent concerns in different versions of a system.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgments

1 Introduction
  1.1 Concerns
  1.2 An Example of Program Evolution Involving Scattered Concerns
  1.3 Existing Support for Scattered Concerns
  1.4 Concern Graphs
  1.5 Overview of the Dissertation

2 The Concern Graph Model
  2.1 Design Goals
  2.2 Formal Representation
    2.2.1 Programs
    2.2.2 Fragments
    2.2.3 Concerns
  2.3 Analyses
    2.3.1 Concern Analysis
    2.3.2 Inconsistency Management
  2.4 Summary

3 Tool Support for Concern Graphs
  3.1 General Mapping Function for Java
  3.2 The Feature Analysis and Exploration Tool
    3.2.1 Usage Model
    3.2.2 User Interface
    3.2.3 Implementation

4 Validation
  4.1 Methodology
  4.2 AVID Study
    4.2.1 Theory
    4.2.2 Study Design
    4.2.3 Results
    4.2.4 Validity
  4.3 Jex Study
    4.3.1 Theory
    4.3.2 Study Design
    4.3.3 Results
    4.3.4 Validity
  4.4 Redback Study
    4.4.1 Theory
    4.4.2 Study Design
    4.4.3 Results
    4.4.4 Validity
  4.5 jEdit Study
    4.5.1 Theory
    4.5.2 Study Design
    4.5.3 Results
    4.5.4 Validity
  4.6 ArgoUML Study
    4.6.1 Theory
    4.6.2 Study Design
    4.6.3 Results
    4.6.4 Validity
  4.7 Summary

5 Automating Concern Graph Creation
  5.1 Investigation Transcripts
  5.2 Inference Algorithm
    5.2.1 Calculating Probabilities
    5.2.2 Calculating the Correlation Metric
    5.2.3 Generating Concerns
  5.3 Empirical Evaluation
    5.3.1 Implementation Status
    5.3.2 Configurations
    5.3.3 Studies
    5.3.4 Results
    5.3.5 Observations
  5.4 Summary

6 Discussion
  6.1 The Development and Evaluation of the FEAT Tool
  6.2 Training and the Use of FEAT
  6.3 Capturing System Behavior with Concern Graphs
  6.4 The Importance of a Good Seed
  6.5 Concern Interaction Analysis
  6.6 The Influence of Concern Graphs on the Evolution Process
  6.7 Future Work
    6.7.1 Automatic Concern Graph Construction
    6.7.2 Concern Databases
    6.7.3 Pattern-based Code Investigation
    6.7.4 Concern Graph-based Code Refactoring

7 Related Work
  7.1 Concern Code Location
    7.1.1 Cross-referencing Tools
    7.1.2 Program Slicing
    7.1.3 Feature Location Techniques
    7.1.4 Clustering Techniques
  7.2 Concern Documentation
    7.2.1 Textual Documentation
    7.2.2 Conceptual Modules
    7.2.3 Concern Visualization Tools
    7.2.4 Virtual Files
    7.2.5 Advanced Separation of Concerns Mechanisms
  7.3 Inconsistency Management

8 Conclusions

Bibliography

Appendix A Relational Algebra
  A.1 Notational Conventions
  A.2 Definitions

Appendix B Relations in Java Programs

Appendix C Transcripts for the jEdit Case Study
  C.1 Subject C1
  C.2 Subject C2
  C.3 Subject F1
  C.4 Subject F2

List of Tables

4.1 Claims addressed by the different studies
4.2 Characteristics of the different studies
4.3 Concern completeness results
4.4 Subject Characteristics
4.5 Concern graph for the ArgoUML study
4.6 Differences between classes of versions 0.11.4 and 0.13.4 of ArgoUML
4.7 Overlap between studies and claims
5.1 Configuration parameter values
5.2 Characteristics of transcripts
5.3 Results for Subject C1
5.4 Results for Subject C2
5.5 Results for Subject C3
5.6 Results for Subject J1
5.7 Results for Subject J2
C.1 View codes
C.2 Action codes

List of Figures

1.1 Concern scattering and tangling.
1.2 Sorting program
1.3 The main window of the jEdit application
1.4 The options window of the jEdit application