An Empirical Study on Object-Oriented Software Dependencies: Logical, Structural and Semantic
Total Page:16
File Type:pdf, Size:1020Kb
AN EMPIRICAL STUDY ON OBJECT-ORIENTED SOFTWARE DEPENDENCIES: LOGICAL, STRUCTURAL AND SEMANTIC A Thesis submitted as partial fulfillment of the requirement of Doctor of Philosophy (Ph.D.) by Nemitari Miebaka Ajienka Computer Science Department Brunel University London 26 January 2018 Abstract Three of the most widely studied software dependency types are the structural, logical and semantic dependencies. Logical dependencies capture the degree of co- change between software artifacts. Semantic dependencies capture the degree to which artifacts, comments and names are related. Structural dependencies capture the dependencies in the source code of artifacts. Prior studies show that a combination of dependency analysis (e.g., semantic and logical analysis) improves accuracy when predicting which artifacts are likely to be impacted by ripple effects of software changes (though not to a large extent) compared to individual approaches. In addition, some dependencies could be hid- den dependencies when an analysis of one dependency type (e.g., logical) does not reveal artifacts only linked by another dependency type (semantic). While previous studies have focused on combining dependency information with minimal benefits, this Thesis explores the consistency of these measurements, and whether hidden dependencies arise between artifacts, and in any of the axes studied. In this Thesis, 79 Java projects are empirically studied to investigate (i) the direct influence and the degree of overlap between dependency types on three axes (logical – structural (LSt); logical – semantic (LSe); structural – semantic (StSe)) (structural, logical and semantic), and (ii) the presence of hidden coupling on the axes. The results show that a high proportion of hidden dependencies can be detected on the LSt and StSe axes. Notwithstanding, the LSe axis shows a much smaller proportion of hidden dependencies. Practicable refactoring methods to mit- igate hidden dependencies are proposed in the Thesis and discussed with examples. i Contents 1 Introduction 1 1 1.1 Change impact analysis . .1 1.2 Dimensions of software coupling . .2 1.3 Coupling axes . .4 1.4 Hidden coupling . .6 1.4.1 Connection . .8 1.5 Interplay on the coupling axes . .8 1.6 Research apparatus . 10 1.6.1 Research aims and objectives . 10 1.6.2 Research questions . 13 1.6.3 Research contributions and beneficiaries . 13 1.7 Thesis structure . 16 1.8 List of publications . 17 2 State of the art 18 2.1 Introduction . 18 2.2 Software maintenance and evolution . 19 2.3 Static change impact analysis (CIA) . 22 2.4 Software dependencies . 25 2.4.1 Logical coupling . 27 ii 2.4.2 Structural coupling . 30 2.4.3 Semantic coupling . 34 2.5 Interplay on the coupling axes . 36 2.5.1 Interplay on the logical and structural (LSt) coupling axis . 37 2.5.2 Interplay on the logical and semantic (LSe) coupling axis . 41 2.5.3 Interplay on the structural and semantic (StSe) coupling axis 42 2.6 Summary of the chapter . 48 3 Methodology 50 3.1 Introduction . 50 3.2 Data sampling . 50 3.3 Data extraction . 52 3.4 Software dependency metrics extraction . 55 3.4.1 Logical coupling measurement . 56 3.4.2 Structural coupling measurement . 61 3.4.3 Semantic coupling measurement . 63 3.4.3.1 Semantic coupling measurement using class corpora 64 3.4.3.2 Semantic coupling measurement using class identi- fiers vs their corpora . 67 3.5 Data analysis . 73 3.6 Replicability . 75 3.6.1 Corpora-based semantic coupling measurement tool . 76 3.7 Summary of the chapter . 78 4 Interplay on the logical and structural (LSt) coupling axis 79 4.1 Introduction . 79 4.2 Motivation . 80 4.3 Empirical investigation . 81 4.3.1 Studied sample of OSS projects . 81 4.3.2 Data collected . 82 iii 4.3.3 The research questions . 83 4.4 Worked example . 84 4.4.1 RQ1: Overlap between logical and structural coupling . 84 4.4.2 RQ2: Linear relationship between logical and structural cou- pling . 85 4.5 Results: Overall sample of OSS projects . 87 4.5.1 RQ1: Overlap between logical and structural coupling . 87 4.5.2 RQ2: Linear relationship between logical and structural cou- pling . 91 4.6 Points of action . 93 4.7 Summary of the chapter . 94 5 Interplay on the logical and semantic (LSe) coupling axis 95 5.1 Introduction . 95 5.2 Motivation . 96 5.3 Empirical investigation . 97 5.3.1 Studied sample of OSS projects . 98 5.3.2 Data collected . 98 5.3.3 The research questions . 98 5.4 Worked example project . 99 5.4.1 RQ1: Overlap between logical and semantic coupling . 99 5.4.2 RQ2: Linear relationship between logical and semantic coupling100 5.5 Results: Overall sample of OSS projects . 102 5.5.1 RQ1: Overlap between logical and semantic coupling . 102 5.5.2 RQ2: Linear relationship between logical and semantic coupling105 5.6 Points of action . 108 5.7 Summary of the chapter . 109 6 Interplay on the structural and semantic (StSe) coupling axis 111 6.1 Introduction . 111 iv 6.2 Motivation . 112 6.3 Empirical investigation . 115 6.3.1 Studied sample of OSS projects . 115 6.3.2 Data collected . 115 6.3.3 The research questions . 116 6.4 Worked example . 116 6.4.1 RQ1: Overlap between structural and semantic coupling . 117 6.4.2 RQ2: Linear relationship between structural and semantic coupling . 123 6.5 Results: Overall sample of OSS projects . 123 6.5.1 RQ1: Overlap between structural and semantic coupling . 124 6.5.2 RQ2: Linear relationship between structural and semantic coupling . 127 6.6 Points of action . 129 6.6.1 Solving the S dependencies: ‘rename class’ design pattern . 130 6.6.2 Solving the W dependencies: a novel semantic and structural link ’extract class’ design pattern . 131 6.7 Summary of the chapter . 136 7 Conclusion 138 7.1 Introduction . 138 7.2 Research summary . 139 7.3 Research contributions and beneficiaries . 141 7.4 Personal achievement . 146 7.5 Research limitations . 147 7.5.1 External validity . 147 7.5.2 Internal validity . 148 7.5.3 Construct validity . 149 7.6 Future work . 149 v 7.6.1 Empirical studies . 150 7.6.2 Tool development . 151 7.6.3 Studied domains . 152 Glossary of software engineering terms 178 Appendix A 180 A.1 Software dependency metrics . 180 A.2 OSS project sample description . 180 A.3 Advantages of the vector space model (VSM) compared to latent semantic indexing (LSI) information retrieval (IR) approach . 191 Appendix B 192 B.1 Structural and logical coupling interplay outcomes . 192 Appendix C 199 C.1 Semantic and logical coupling interplay outcomes . 199 Appendix D 207 D.1 Structural and semantic coupling interplay outcomes . 207 vi List of Figures 1.1 The three explored coupling axes in this Thesis based on identified gaps in the literature (see Chapter 2) . .5 2.1 A simple view of software development ([189]) . 21 2.2 A method named addShape() from KOffice showing the conceptual information that is latent in (some of the) identifier names, adapted from [102] . 36 2.3 A method named removeShape() from KOffice showing the concep- tual information that is latent in (some of the) identifier names, adapted from [102] . 37 3.1 CVSAnalY database tables . 53 3.2 CVSAnalY database schema . 54 3.3 CVSAnalY actions table . 55 3.4 Association rule mining with the arules package in R . 60 3.5 Output of association rule mining with the arules package in R . 61 3.6 Structural dependency path between two OO software classes – Caller (C2) using a function and variable defined in Called (C1) [203] . 63 3.7 Chi-square association test results for class corpora (VSM) vs. iden- tifier (N-Gram, Disco word synonym category) based semantic simi- larity techniques (box-plot distribution of p-values for threshold t = 0.1, 0.2 and 0.5) . 71 vii 4.1 Summary of classes and revisions in overall studied sample of OSS projects . 83 4.2 Intersection of structurally and logically coupled classes in bluecove . 85 4.3 CLD and CSD Percentages per OSS project (KEY: CSD = co-changed structural dependencies; CLD = coupled logical dependencies) . 88 4.4 Venn Diagrams (weighted) showing the two sets of coupling in two scenarios: project ID=97 (left) and project ID=69 (right) . 90 4.5 Structural and logical coupling - Spearman’s rank correlation results with p-values . ..