The Journal of Systems and Software 108 (2015) 193–210


Exploring community structure of software Call Graph and its applications in class cohesion measurement

Yu Qu a, Xiaohong Guan a,∗, Qinghua Zheng a, Ting Liu a, Lidan Wang a, Yuqiao Hou a, Zijiang Yang b

a Ministry of Education Key Lab for Intelligent Network and Network Security, Xi'an Jiaotong University, Xi'an, Shaanxi 710049, China
b Department of Computer Science, Western Michigan University, Kalamazoo, MI 48167, USA

∗ Corresponding author. Tel.: +86 29 82663934. E-mail addresses: [email protected] (Y. Qu), [email protected] (X. Guan), [email protected] (Q. Zheng), [email protected] (T. Liu), [email protected] (L. Wang), [email protected] (Y. Hou), [email protected] (Z. Yang).

http://dx.doi.org/10.1016/j.jss.2015.06.015

Article history: Received 20 November 2014; Revised 16 April 2015; Accepted 7 June 2015; Available online 16 June 2015.

Keywords: Class cohesion metrics; Community structure

Abstract: Many complex networked systems exhibit natural divisions of network nodes. Each division, or community, is a densely connected subgroup. Such community structure not only helps comprehension but also finds wide applications in complex systems. Software networks, e.g., Class Dependency Networks, are such networks with community structures, but their characteristics at the function or method call granularity have not been investigated, although such characteristics are useful for evaluating and improving software intra-class structure. Moreover, existing proposed applications of software community structure have not been directly compared or combined with existing software engineering practices. Comparison with baseline practices is needed to convince practitioners to adopt the proposed approaches. In this paper, we show that networks formed by software methods and their calls exhibit relatively significant community structures. Based on our findings we propose two new class cohesion metrics to measure the cohesiveness of object-oriented programs. Our experiments on 10 large open-source Java programs validate the existence of community structures, and the derived metrics give additional and useful measurement of class cohesion. As an application, we show that the new metrics are able to predict software faults more effectively than existing metrics.

© 2015 Elsevier Inc. All rights reserved.

1. Introduction

Many natural and man-made complex networked systems, including metabolic networks, computer networks and social networks, exhibit divisions or clusters of network nodes (Flake et al., 2000; Fortunato, 2010; Girvan and Newman, 2002; Mucha et al., 2010; Palla et al., 2005). Each division, or community (Girvan and Newman, 2002), is a densely connected and highly correlated subgroup. Such community structure not only helps comprehension but also finds wide applications in complex systems. For example, researchers in Biology and Bioinformatics have applied community detection algorithms to identifying functional groups of proteins in Protein–Protein Interaction networks (Dunn et al., 2005; Jonsson et al., 2006). For online auction sites such as ebay.com, community structure is used to improve the effectiveness of the recommendation systems (Jin et al., 2007; Reichardt and Bornholdt, 2007). A survey on the applications of community detection algorithms can be found in Fortunato (2010).

There are also research efforts to investigate community structures in software (Concas et al., 2013; Pan et al., 2011; Šubelj and Bajec, 2011; 2012; Šubelj et al., 2014). Most of them reported a significant community structure of a certain type of software network such as Class Dependency Networks (Šubelj and Bajec, 2011). Some pioneering applications of software community structure have been proposed (for more details, please refer to Section 2). However, there are still some unsolved problems.

Firstly, most of the measurements are performed on the network of classes. Few results are reported at the granularity of software method or function calls, i.e., method/function Call Graphs (Graham et al., 1982). Such investigation is necessary from both theoretical and practical perspectives. In addition, measurements on the network of classes cannot be used for intra-class structure, which limits their applications in software quality evaluation and improvement.

Secondly, these pioneering applications have not been directly compared or combined with existing software engineering metrics and practices. Comparison with baseline practices is needed to convince people to adopt the proposed approaches. Only when the proposed approaches outperform or complement existing methods is there a possibility that they will be adopted by software engineering practitioners.

Do software networks at other granularities also present significant community structures? If so, how can we make use of them in software engineering practices? To answer these open questions and solve the existing problems, we construct static Call Graphs, in which nodes represent methods in an OO (Object-Oriented) program and edges represent method invocation relations. We then apply existing community detection algorithms to such graphs. Fig. 1 depicts the community structure of jEdit, an open-source text editor. There are 5979 nodes that are divided into 34 communities shown in different colors. The community structure is detected by the Louvain algorithm (Blondel et al., 2008) as implemented in a network analysis and visualization tool called Pajek.¹ In Section 3, we show that such a result presents typical community characteristics similar to those previously observed in other complex systems.

¹ http://vlado.fmf.uni-lj.si/pub/networks/Pajek/

Fig. 1. Community structure of jEdit with 5979 nodes and 34 communities, detected by Louvain algorithm (Blondel et al., 2008).

It is well known that high-quality software should exhibit a "high cohesion and low coupling" nature. Software with such a nature is believed to be easy to understand, modify, and maintain (Briand et al., 2001; Pressman, 2010). Object-oriented design strives to incorporate data and related functionality into modules, which usually reduces coupling between modules. However, employing the object-oriented mechanism itself does not necessarily guarantee minimal coupling and maximal cohesion. Therefore, a quantitative measurement is valuable in both a posteriori analysis of a finished product to control software quality, and a priori analysis to guide coding in order to avoid undesirable results in the first place.

The existence of community structures, as confirmed by our experiments on 10 large open-source Java programs using four widely-used community detection algorithms, sheds light on the cohesiveness measurement of OO programs. Intuitively, community structures are able to indicate cohesion, as nodes within a community are highly cohesive, and nodes in different communities are loosely coupled. In this paper, we propose two new class cohesion metrics—MCC (Method Community Cohesion) and MCEC (Method Community Entropy Cohesion)—based on community structures. The basic idea of MCC is to quantify how many methods of a certain class reside in the same community. As for MCEC, it uses the standard notion of Information Entropy (Shannon, 2001) to quantify the distribution of all the methods of a class among communities. Compared with existing metrics, these two metrics provide a new and more systematic point of view for class cohesion measurement.

Fig. 2 gives the overview of our approach. Once a Call Graph is constructed, we apply widely-used community detection algorithms. Fig. 2 shows the static Call Graph of JHotDraw, a Java GUI framework for technical and structured graphics. There are 5125 nodes divided into 35 communities, as reported by the Louvain algorithm (Blondel et al., 2008). Based on the community structures, the metrics MCC and MCEC are computed.

We validate the proposed metrics using the following processes. Firstly, we show that MCC and MCEC theoretically satisfy expected properties of class cohesion metrics (Briand et al., 1998). Secondly, we empirically compare MCC and MCEC with five widely-used class cohesion metrics, and our experiments indicate that the new metrics are more reasonable than existing ones. Thirdly, Principal Component Analysis (PCA; Pearson, 1901) is conducted to show that MCC and MCEC provide additional and useful information on class cohesion that is not reflected by existing metrics. Finally, experiments are carried out to show that MCC and MCEC usually perform equally to or better than existing class cohesion metrics when they are used in software fault prediction.

In summary we make the following contributions in this paper:

1. We show through experiments on 10 large open-source Java programs that the static Call Graphs constructed from OO programs usually exhibit relatively significant community structures, as do other networked complex systems (e.g., social networks). Such results are helpful in intra-class structure and quality evaluation.

Fig. 2. The proposed approach of this paper: the static Call Graph and community structure of JHotDraw, a Java GUI framework for technical and structured graphics.

2. Based on community structures of Call Graphs, we propose two new class cohesion metrics. We conduct a study to confirm that the proposed metrics satisfy the theoretical requirements of cohesion metrics. The comparison with five existing metrics shows that class cohesion metrics based on community structures can provide new insight into OO programs.
3. We conduct an empirical study and illustrate the effectiveness of the new metrics through software fault prediction experiments on four open-source programs with 1500 classes, among which there are 702 faulty ones. Results show that the new metrics usually perform equally to or better than existing ones.

The rest of this paper is organized as follows. Section 2 reviews related work. In Section 3, community structures of 10 large open-source programs are investigated using four community detection algorithms. Two class cohesion metrics based on community structure are proposed in Section 4. Section 5 conducts empirical evaluations of the class cohesion metrics, followed by discussions on community detection algorithms and potential applications of the proposed metrics in Section 6. Finally, Section 7 concludes the paper with future work.

2. Related work

2.1. Community structure of software

The significant progress of Complex Network theory (Barabási and Albert, 1999; Chakrabarti and Faloutsos, 2006; Watts and Strogatz, 1998), which was originally developed in Physics and Data Science, has led to wide adoption in different domains (Fortunato, 2010). In recent years, the theory has been successfully applied in the domain of software engineering, including software evolution process modeling and understanding (Li et al., 2013; Pan et al., 2011; Turnu et al., 2011), software evolution prediction (Bhattacharya et al., 2012), software structure interpretation and evaluation (Baxter et al., 2006; Louridas et al., 2008; Myers, 2003; Potanin et al., 2005), and software execution process and behavior modeling (Cai and Yin, 2009; Qu et al., 2015). These studies have revealed that networks constructed from software systems usually have Scale-free degree distributions (Baxter et al., 2006; Louridas et al., 2008; Myers, 2003; Potanin et al., 2005) and exhibit Small-world properties (Myers, 2003; Qu et al., 2015; Valverde and Solé, 2007), like other typical complex networks. Researchers have also found that the evolution processes of these networks are in accordance with the predictions made by Complex Network theory (Li et al., 2013; Pan et al., 2011; Turnu et al., 2011). These results have shown that software systems usually exhibit typical characteristics of complex network systems, and thus have laid a good foundation for us to further use community detection algorithms, originally developed in Complex Network theory, to analyze software systems.

Researchers have investigated community structures in software (Concas et al., 2013; Pan et al., 2011; Šubelj and Bajec, 2011; 2012; Šubelj et al., 2014). Typically a program is firstly converted to a certain type of software network such as a Class Dependency Network (Šubelj and Bajec, 2011), where the nodes denote classes, and edges denote relationships between classes. Such relationships include aggregation, inheritance, interface implementation, parameter types, etc. There are also existing discussions on applications of software's community structure. For instance, Šubelj and Bajec (2012) proposed using community detection results in highly modular package identification. Pan et al. (2011) proposed exploiting community results to identify refactoring points in the software evolution process by observing evolving trends of modularity (Newman and Girvan, 2004), a metric which was originally proposed to measure the quality or significance of a community structure. Very recently, Šubelj et al. (2014) showed that besides community structures, Class Dependency Networks also consist of groups of structurally equivalent nodes denoted modules, core/periphery structures and others. Similar investigation on Call Graphs is also needed in future research. Actually, such results do not contradict the approaches proposed in this paper. The heterogeneous distribution of community structures can result in the discriminative nature of the proposed metrics. Refactoring and further analysis can be done on modules with low metric scores.

Compared with existing software networks, our approach constructs a software network from a different perspective. We extract the Call Graph from a program, where nodes represent methods and edges represent method invocations. Our study confirms the existence of relatively significant community structure in Call Graphs. Existing works have proposed pioneering and promising applications in this interdisciplinary research direction, but direct comparison with existing baseline metrics and practices is still needed to convince both software engineering academia and industry to adopt the proposed approaches. To the best of our knowledge, our work is the first one that utilizes community structures to measure class cohesion, a traditional and widely-used metric in software engineering. Moreover, experiments have been conducted to show that the proposed metrics perform equally to or better than existing metrics in software fault prediction.

2.2. Class cohesion metrics

Classes are the basic ingredients of an OO program. Many class cohesion metrics have been proposed to quantify the relatedness of a class's attributes and methods. As proposed by Chen et al. (2002), when measuring a class's cohesiveness, the relationships between attributes and attributes, attributes and methods, and methods and methods should be considered simultaneously. We believe that the community structure of the Call Graph reflects methods' relations from a systematic and effective perspective. In this paper we will not give a detailed survey or taxonomy on class cohesion metrics. For a relatively comprehensive survey on these metrics, please refer to Al Dallal (2012; 2013). Generally speaking, most of the metrics leverage structural information of the class under test (Al Dallal and Briand, 2012; Briand et al., 1998; Chen et al., 2002; Chidamber and Kemerer, 1991; 1994; Sellers, 1996). They measure relationships among the methods of a class considering whether these methods share the same attributes or whether there are similarities between each pair of methods. There are some metrics (C3; Marcus et al., 2008 and MWE; Liu et al., 2009) that consider semantic information of the class under test. They extract semantically meaningful topics or concepts implemented in classes. There are also some metrics (TCC; Bieman and Kang, 1995, and DCD and DCI; Badri and Badri, 2004) reflecting method call relationships of the class under test, but these metrics only consider microscopic method call relationships. On the other hand, our metrics investigate method call relationships from a more systematic point of view. Actually, the metrics proposed in this paper take the whole system's method call relationships into consideration, and thus can reflect class cohesion information from a new and systematic perspective.

3. Community structure of software Call Graphs

3.1. Call Graphs

For an OO program P, its Call Graph CG_P is a directed graph: CG_P = (V, E), where each node v ∈ V represents a method in P, and the edge set E represents the method invocation relationships. Let m_i denote the method that v_i refers to. Then v_i → v_j ∈ E if and only if m_i has at least one method invocation that calls m_j.
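To make this definition concrete, the following sketch (not from the paper) represents such a Call Graph as a directed graph with the NetworkX package, which the study also uses for parts of its network analysis. The method names and invocation pairs are illustrative placeholders; a real edge list would come from a static analyzer.

import networkx as nx

cg = nx.DiGraph()

# Each node is a fully qualified method m_i; an edge (m_i, m_j) means that
# m_i contains at least one invocation of m_j.
invocations = [
    ("org.example.Foo.run()", "org.example.Foo.init()"),
    ("org.example.Foo.run()", "org.example.Bar.load()"),
    ("org.example.Bar.load()", "org.example.util.IO.read()"),
]
cg.add_edges_from(invocations)

print(cg.number_of_nodes(), cg.number_of_edges())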

Table 1 Subject software systems.

Programs Version SLOC # Class # Method Website

Ant 1.9.3 106,292 1280 10,509 http://ant.apache.org/
Apache POI 3.10.1 245,326 2949 24,700 http://poi.apache.org/
Eclipse Link 2.5.1 450,766 4339 51,631 http://www.eclipse.org/eclipselink/
jEdit 5.1.0 117,365 1291 7844 http://www.jedit.org/
JGroups 3.4.3 71,613 813 7596 http://www.jgroups.org/
JHotDraw 7.6 80,515 1068 7699 http://www.jhotdraw.org/
Log4j 2.0 (RC) 56,112 986 5220 http://logging.apache.org/log4j/
Lucene 4.7.1 441,685 5613 27,720 http://lucene.apache.org/
Tomcat 8.0.5 207,676 2359 19,253 http://tomcat.apache.org/
Xalan 2.7.2 175,006 1279 10,479 http://xalan.apache.org/

To empirically study community structures of software Call Graphs, a data set including 10 widely-used open-source Java programs is collected, as shown in Table 1: Ant is a Java library and command-line tool for automating software build processes; Apache POI is a Java API to process Microsoft Office files; Eclipse Link is a comprehensive persistence service project that provides interactions with various databases and data services; jEdit is a text editor and JHotDraw is a GUI framework for graphics, both of which are mentioned in Section 1; JGroups is a reliable multicast and messaging system; Log4j is a Java-based logging library; Lucene is a searching and information retrieval library; Tomcat is a web server and servlet container; Xalan is a library for processing XML documents. Table 1 summarizes basic information of these programs. Column Version gives the version number of each program ("RC" for Log4j represents "Release Candidate", a version very close to the final release). Columns SLOC, # Class and # Method list the static lines of code, the number of classes, and the number of methods, respectively. The last column shows the websites of these programs.

It is a nontrivial task to construct a relatively complete and representative Call Graph. Three important and intertwined issues should be addressed:

3.1.1. Incomplete Call Graph
It is very difficult to construct a complete Call Graph solely based on static analysis. This is mainly caused by the fact that modern software systems intensively use frameworks like Spring² to decouple interactions between components, and frameworks like Swing³ to handle GUI (Graphical User Interface) messages, which can also decouple different components' interactions. This problem can be mitigated by using dynamic monitoring techniques (Qu et al., 2015). However, dynamic monitoring itself is not a silver bullet for this problem. The main shortcoming of dynamic monitoring is the lack of a complete test case set that can drive the program to execute all of the method calls. In this paper we focus on a static approach that ignores implicit method invocations enabled by modern frameworks.

² http://spring.io/
³ http://docs.oracle.com/javase/tutorial/uiswing/

3.1.2. Special classes
When constructing the Call Graph, two special kinds of classes should be noticed. Fig. 3 gives two example Java code snippets from Ant 1.9.3. As shown in Fig. 3(a), the first one is an empty class, which has no methods or attributes. The second special kind is the anonymous inner class, as shown in Fig. 3(b). The anonymous inner class is usually used by programmers to save programming effort and also provides a convenient way to define callbacks.

Fig. 3. Example code snippets of special classes of Ant 1.9.3: (a) an empty class; (b) an anonymous inner class (code listings omitted).

In the analysis of the Call Graph and the corresponding classes, these two special kinds should be excluded. For empty classes, it is straightforward to exclude them. For anonymous inner classes, it should be noticed that they also have methods that form the Call Graph's structure. In our analysis, methods in anonymous inner classes are virtually moved to their resided class. Taking the anonymous inner class in Fig. 3(b) as an example, the accept method is moved to its resided class Diagnostics. Then the total method number (which is useful in the following analysis) of the resided class is increased correspondingly. If a class does not belong to these special kinds, it is called an effective class in our approach. In the remainder of this paper, the analysis concentrates on effective classes.

Table 2 shows statistics of the subject programs' classes and methods. Column # Class repeats the total number of classes given in Table 1. Column # Anony Class lists the number of anonymous inner classes. Column # Class no M & A shows the number of empty classes. The 5th column in Table 2 shows the number of effective classes. The last two columns give the number of methods and the number of methods in anonymous inner classes, respectively.

3.1.3. Disconnected components of the Call Graph
It is a common situation that the constructed Call Graph is not a connected graph. Table 3 gives statistics of the Call Graphs and the corresponding Largest (weakly) Connected Components (LCCs; Dill et al., 2002). Columns N_CG and E_CG give the number of nodes and edges of each Call Graph, respectively. Columns N_LCC and E_LCC show the corresponding results for each LCC. C_LCC-Related and C_LCC-Resided are two sets whose elements are classes.

Table 2 Statistics of 10 software systems’ classes and methods.

Programs # Class # Anony Class # Class no M & A # Effect Class # Method # Anony Method

Ant 1280 99 21 1160 10,509 140
Apache POI 2949 298 26 2625 24,700 376
Eclipse Link 4339 190 105 4044 51,631 497
jEdit 1291 198 3 1090 7844 229
JGroups 813 98 8 707 7596 120
JHotDraw 1068 313 8 747 7699 490
Log4j 986 42 14 930 5220 45
Lucene 5613 1204 56 4353 27,720 1974
Tomcat 2359 193 16 2150 19,253 247
Xalan 1279 132 13 1134 10,479 144

Table 3 Statistics of Call Graphs’ LCCs.

Programs N_CG E_CG N_LCC E_LCC |C_LCC-Related| |C_LCC-Resided| |C_LCC-Cohesion| (t = 0.6)

Ant 8016 17,296 7393 16,833 1034 332 696
Apache POI 20,181 43,558 19,299 42,786 2315 1069 1911
Eclipse Link 41,871 110,095 40,062 108,832 3401 1360 2726
jEdit 6719 15,443 5979 14,653 921 502 767
JGroups 5761 12,237 5194 11,880 628 190 451
JHotDraw 5879 12,205 5125 11,658 643 194 453
Log4j 4088 8320 3744 8054 777 392 590
Lucene 22,165 61,604 20,849 60,566 3866 1963 3083
Tomcat 14,073 28,952 12,868 28,024 1780 762 1291
Xalan 7864 15,578 7078 14,988 846 381 643

For a class C, if and only if at least one of its methods appears in the LCC, then C ∈ C_LCC-Related; if and only if all of its methods appear in the LCC, then C ∈ C_LCC-Resided. The sixth and seventh columns in Table 3 show the cardinalities of these two class sets. The last column in Table 3 shows the cardinality of another class set—C_LCC-Cohesion—which is explained later.

The following observations can be made based on Table 3: (1) The static Call Graph is indeed incomplete, as there are fewer methods in the Call Graph compared with the total method quantity shown in Table 2. (2) The LCC usually contains the majority (usually more than 90%) of the nodes in the Call Graph. Based on this observation, we concentrate on the LCC of each Call Graph as the basis for further analysis. Such a choice is also a common practice in network analysis (Leskovec et al., 2008) and previous related research (Šubelj and Bajec, 2011). (3) Based on C_LCC-Related and C_LCC-Resided, it can be observed that most of the classes have at least one method in the Call Graph's LCC (compared with the number of effective classes in Table 2), but only quite a few classes have all their methods appear in the LCC. The distribution of a certain class's methods in the LCC is of great importance for further analysis. This issue is discussed in more detail in the following.

The proposed class cohesion metrics are computed based on the distributions of a class's methods among the detected communities. For a certain class, if only a small proportion of its methods reside in the LCC, then the cohesion computation result will be biased, as it cannot reflect the class's major method call relations. It would be reasonable to analyze only classes in C_LCC-Resided. However, considering the small quantity of classes in C_LCC-Resided, such an option would hinder the applicability of the proposed approach. Thus, a trade-off should be made between accuracy and applicability. For a class C, suppose its total number of methods is M, and there are m methods located in the LCC. We define a threshold t and a new class set C_LCC-Cohesion; then C ∈ C_LCC-Cohesion if and only if m/M ≥ t. The cardinality of C_LCC-Cohesion depends on t, and the following relations can be easily derived:

C_{LCC-Cohesion} = \begin{cases} C_{LCC-Related}, & \text{when } t = 0, \\ C_{LCC-Resided}, & \text{when } t = 1. \end{cases}

Fig. 4. Normalized class quantity in C_LCC-Cohesion versus different values of t (one curve per subject program; plot data omitted).

Fig. 4 shows the normalized class quantity in C_LCC-Cohesion versus different values of t. The number of classes in C_LCC-Cohesion is normalized by the number of classes in C_LCC-Related for each program. It can be observed that these programs exhibit a similar tendency when the value of t varies. In this paper, we decide to use t = 0.6 as the threshold, i.e., the cohesion metrics of classes in C_LCC-Cohesion when t = 0.6 are computed. That is, a class's cohesiveness is computed only if more than 60% of its methods appear in the LCC. It has to be clarified that we only choose part of the classes in the LCC to compute their cohesiveness, but the community detection is conducted on the whole LCC. Although some classes are excluded from the cohesiveness measurement process, their methods are still located in the LCC and also influence the detected community structure.
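The LCC extraction and the t = 0.6 filtering just described can be sketched as follows with NetworkX. The graph cg and the class-to-methods map methods_of are assumed inputs for illustration; this is not the authors' original implementation.

import networkx as nx

def lcc_nodes(cg):
    # Largest weakly connected component of the directed Call Graph.
    return max(nx.weakly_connected_components(cg), key=len)

def classes_for_cohesion(cg, methods_of, t=0.6):
    # C_LCC-Cohesion: classes with at least a fraction t of their methods in the LCC.
    lcc = lcc_nodes(cg)
    selected = {}
    for cls, methods in methods_of.items():
        in_lcc = [m for m in methods if m in lcc]
        if methods and len(in_lcc) / len(methods) >= t:
            selected[cls] = in_lcc
    return selected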

Table 4 Community detection results of 10 software systems.

Programs # Com_fg Q_fg # Com_im Q_im # Com_lp Q_lp # Com_ml Q_ml

Ant 109 0.679 600 0.609 287 0.528 42 0.706
Apache POI 200 0.789 1160 0.725 842 0.751 59 0.855
Eclipse Link 425 0.716 2109 0.662 1110 0.714 60 0.813
jEdit 106 0.698 475 0.632 154 0.622 45 0.753
JGroups 106 0.649 442 0.594 183 0.339 57 0.676
JHotDraw 54 0.772 378 0.701 236 0.729 34 0.798
Log4j 51 0.754 302 0.707 225 0.718 38 0.785
Lucene 263 0.664 1276 0.649 663 0.471 81 0.740
Tomcat 148 0.747 952 0.673 498 0.646 67 0.794
Xalan 63 0.804 541 0.704 362 0.753 38 0.825

3.2. Community structure detection

Four widely-used community detection algorithms are implemented in this paper. Particularly, fast greedy (fg) is based on greedy optimization of modularity (Clauset et al., 2004); infomap (im) detects the community structure of a network based on the Infomap method proposed by Rosvall and Bergstrom (2008); label propagation (lp) is a fast partitioning algorithm proposed by Raghavan et al. (2007); multilevel (ml) is a layered and bottom-up community detection algorithm given by Blondel et al. (2008). It has to be noticed that the ml algorithm is usually referred to, as mentioned in previous sections, as the Louvain algorithm. Louvain is a city in Belgium, and the algorithm is named after its authors' location. Nevertheless, in the following part of this paper, we still name it ml to keep a consistent naming convention. It should also be noticed that most of the community detection algorithms work on undirected graphs. In the analysis of software Call Graphs and other networks in this paper, the directed graph is converted to its simple undirected version by removing the edge directions.

Implementation of these algorithms is based on the open-source network analysis package igraph.⁴ The open-source Python software NetworkX,⁵ a package for the computation of the structure, dynamics, and functions of complex networks, has been re-developed to conduct some of the network data analysis tasks.

⁴ http://igraph.org/python/
⁵ http://networkx.github.io/

In order to quantify the quality of a detected community structure, the notion of modularity Q (Newman and Girvan, 2004) has been proposed. Given a network with k communities, we can generate a k × k matrix where element e_ij represents the fraction of edges that connect nodes in communities i and j. Note that \sum_i e_{ii} denotes the fraction of edges that connect nodes in the same community, and the sum of column i, a_i = \sum_j e_{ij}, represents the fraction of edges that connect to nodes in community i. Modularity Q is defined as:

Q = \sum_i (e_{ii} - a_i^2)

According to the definition of Q, if the edges in a network are randomly distributed among communities, the value of Q approaches 0. On the other hand, Q is close to 1 with an ideal community structure. It has been reported that the typical value of Q in the domain of Physics is between 0.3 and 0.7, and "higher values are rare" (Newman and Girvan, 2004).

Table 4 gives the community detection results using the aforementioned four algorithms. For each algorithm we report the number of communities and the Q values. Most of the Q values are between 0.6 and 0.9, and the maximum and minimum values are 0.855 and 0.339, respectively. On the other hand, community detection algorithms applied to Class Dependency Networks (Šubelj and Bajec, 2011) and similar software networks (Concas et al., 2013; Pan et al., 2011) have reported relatively lower Q values.

It can be noticed that the numbers of detected communities in Table 4 present significant differences among the algorithms. In Section 4, we show that these algorithms tend to obtain similar class cohesion measurement results although they have different community detection results.
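As an illustration of this detection step, the following sketch runs the four algorithms on a Call Graph with the python-igraph package mentioned above and reports the number of communities and the modularity Q, in the spirit of Table 4. The edge-list file name is an assumption, and this is only a sketch, not the authors' tooling.

import igraph as ig

# Load the directed Call Graph, keep its largest weakly connected component,
# and drop edge directions (most detection algorithms expect undirected graphs).
g = ig.Graph.Read_Edgelist("callgraph.edgelist", directed=True)
lcc = g.clusters(mode="weak").giant()
u = lcc.as_undirected(mode="collapse").simplify()

# The four algorithms used in the paper; each yields a vertex clustering.
partitions = {
    "fg": u.community_fastgreedy().as_clustering(),   # Clauset et al., 2004
    "im": u.community_infomap(),                      # Rosvall and Bergstrom, 2008
    "lp": u.community_label_propagation(),            # Raghavan et al., 2007
    "ml": u.community_multilevel(),                   # Blondel et al., 2008 (Louvain)
}

# Number of communities and modularity Q for each algorithm, as in Table 4.
for name, part in partitions.items():
    print(name, len(part), round(part.modularity, 3))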

Although the notion of modularity was originally proposed to measure the significance of a detected community structure, it has been realized that this metric is insufficient. Researchers have shown that high values of modularity do not necessarily indicate that a network has a significant community structure (Fortunato, 2010; Karrer et al., 2008), although it is true that networks with strong community structure have high modularity. For instance, Guimera et al. (2004) showed that ordinary random graphs may also exhibit high modularity. Good et al. (2010) showed that maximizing modularity is ineffective for partition problems in many networks. Thus, a more careful and thorough study should be performed to investigate whether software Call Graphs have significant community structure.

Based on the preceding understandings, researchers have proposed other approaches to measuring the significance of communities. A group of approaches have been proposed with the basic understanding that the significance of a community structure can be quantified by its robustness or stability against random perturbations (Hu et al., 2010; Karrer et al., 2008; Lancichinetti et al., 2010). Intuitively, if a network has significant community structure, such structure should be robust to perturbation. In this paper, we use the perturbation method proposed by Hu et al. (2010) to measure the significance of Call Graphs' community structures. Briefly, the approach works as follows:

(1) Perturbing the network. To conduct a perturbation on a network, edges are randomly removed and then added with a probability p. When an edge is removed, a new edge is randomly added between another node pair. The larger the value of p, the more significant the perturbation performed on the original network. When p = 1, a random graph that is uncorrelated with the original network is generated.

(2) Measuring the similarity between the original network and the perturbed one. Once a perturbed network is obtained, the original network's and the perturbed network's community structures are detected using a certain algorithm. Then the Normalized Mutual Information (NMI) (Danon et al., 2005) is used to quantify the similarity between these two community structures. The NMI of two identical community structures is 1, and is 0 if the two structures are independent. A similarity score is computed:

S(p) = I(A, A(p)) - I(A_r, A_r(p))

where S(p) is the similarity score with p, and A and A(p) are the community structures before and after perturbation. I(A, A(p)) is the NMI between A and A(p). A_r and A_r(p) are two community structures that have the same number of communities, with the same number of nodes in each community, as A and A(p) respectively; the only difference is that the nodes in each community in A_r and A_r(p) are randomly selected from the entire set of nodes. A_r and A_r(p) are introduced to eliminate the influence of the random background and the effects of network size. Thus, networks with different sizes can be directly compared.

(3) Index R from integrating the similarities. Using the aforementioned steps, a series of S(p) is obtained by gradually increasing the probability p from 0 to 1. The simulation should be performed several times to obtain the expectation of S(p). Then an index R is computed by integrating all the expected values of S(p):

R = \int_0^1 E[S(p)] \, dp

where E[S(p)] is the expectation of S(p). If a network has a significant community structure, then its R is relatively high.
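A minimal sketch of this perturbation procedure is given below, using python-igraph for detection and its NMI-based partition comparison. Here g is assumed to be the undirected LCC (e.g., u from the previous sketch); the helper names are hypothetical and the code is only a rough approximation of Hu et al. (2010), not the authors' implementation.

import random
import numpy as np
import igraph as ig

def perturb(g, p):
    # Step (1): remove a fraction p of the edges and add the same number of
    # edges between randomly chosen node pairs.
    h = g.copy()
    k = int(p * h.ecount())
    h.delete_edges(random.sample(range(h.ecount()), k))
    n = h.vcount()
    h.add_edges([(random.randrange(n), random.randrange(n)) for _ in range(k)])
    return h.simplify()

def random_relabel(membership):
    # A_r: same community sizes, but members drawn at random (step 2).
    shuffled = list(membership)
    random.shuffle(shuffled)
    return shuffled

def similarity_score(g, p, runs=10):
    # Step (2): S(p) = I(A, A(p)) - I(A_r, A_r(p)), averaged over several runs.
    a = g.community_multilevel().membership                       # A, detected with ml
    scores = []
    for _ in range(runs):
        ap = perturb(g, p).community_multilevel().membership      # A(p)
        i_real = ig.compare_communities(a, ap, method="nmi")
        i_rand = ig.compare_communities(random_relabel(a), random_relabel(ap), method="nmi")
        scores.append(i_real - i_rand)
    return float(np.mean(scores))

def r_index(g, steps=51):
    # Step (3): R = integral of E[S(p)] over p in [0, 1] (trapezoidal rule; step 0.02).
    ps = np.linspace(0.0, 1.0, steps)
    return float(np.trapz([similarity_score(g, p) for p in ps], ps))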

Using this perturbation approach, the Call Graphs of the subject programs have been analyzed. Other networks that have been widely used in previous Complex Network research have also been analyzed. We used 0.02 as the step-size to increase p gradually, and for each network the simulation has been performed 10 times to obtain the expectation of S(p). The ml algorithm was used to detect community structure in this experiment.

Fig. 5 shows the curves of E[S(p)] of 10 networks, including software Call Graphs, neural networks, social networks, etc. Fig. 5 also reports the corresponding R values of these networks; in fact, the R value is equal to the area under a certain curve in Fig. 5. Table 5 shows all the R values of the other investigated networks.⁶ For each network, we report its number of nodes and number of edges, then the R value followed by the network's type. Table 6 lists R values of the subject programs.

Fig. 5. Similarity score curves (similarity score S versus perturbation probability p) showing the network perturbation experiment results of 10 networks; curve legend and R values omitted (see Tables 5 and 6).

Table 5
Perturbation results—R values of other types of networks.

Networks N E R Type
Yeast 1870 2277 0.1565 Protein
Aquifex aeolicus 1057 2503 0.1451 Metabolic
Helicobacter pylori 949 2291 0.1441 Metabolic
Yersinia pestis 1453 3403 0.1371 Metabolic
C. elegans neural 297 2151 0.1923 Neural
Internet 22,963 48,436 0.1407 Technical
Co-authorships in network science 1589 2742 0.1867 Social
High-energy theory collaborations 8361 15,751 0.1645 Social
Political books 105 441 0.2919 Social
Zachary's karate club 34 78 0.2252 Social

Table 6
Perturbation results—R values of 10 software systems.

Programs R Programs R
Ant 0.1496 JHotDraw 0.1951
Apache POI 0.1955 Log4j 0.1841
Eclipse Link 0.1951 Lucene 0.1772
jEdit 0.1823 Tomcat 0.1755
JGroups 0.1477 Xalan 0.1926

In previous studies, social networks, the Internet and the C. elegans neural network have been believed to have significant community structures (Fortunato, 2010; Hu et al., 2010). Protein networks and metabolic networks usually have weak or fuzzy community structures. Results in Table 5 are consistent with previous research, although different community detection algorithms have been used (Hu et al., 2010). Based on Tables 5 and 6, it can also be noticed that the R values of software Call Graphs are usually close or equal to those of social networks and the C. elegans neural network, and are noticeably larger than those of metabolic and protein networks. Based on these results, we can conclude that software Call Graphs usually have relatively significant community characteristics similar to other complex networks which have exhibited significant community structures.

⁶ The network data of Yeast, Aquifex aeolicus, Helicobacter pylori and Yersinia pestis are provided at the webpage http://www3.nd.edu/~networks/resources.htm by Albert-László Barabási, and the rest of the networks are provided at the webpage http://www-personal.umich.edu/~mejn/netdata/ by Mark Newman. For the explanations of these networks, please refer to these webpages.

Using this perturbation approach, the Call Graphs of subject pro- Definition 1. Method Community Cohesion (MCC): Given a class C grams have been analyzed. Other networks that have been widely with m methods located in LCC, after applying a certain community used in previous Complex Network research have also been analyzed. detection algorithm, these m methods distribute in N communities. ≤ ≤ We used 0.02 as the step-size to increase p gradually, and for each For the ith community, there are ni methods belonging to C (1 i N). Let n = max {n }. We define network, the simulation has been performed for 10 times to obtain max i the expectation of S(p). The ml algorithm was used to detect commu- 1, if m = 1, nity structure in this experiment. MCC(C) = 0, if nmax = 1andm ≥ 2, (1) Fig. 5 shows the curves of E[S(p)] of 10 networks, including soft- nmax , . m otherwise ware Call Graphs, neural networks, social networks, etc. Fig. 5 also re- ports the corresponding R values of these networks, actually, R value The definition of MCC describes the largest portion of its meth- is equal to the area under a certain curve in Fig. 5. Table 5 shows all ods that reside in the same community, which represents more co- the R values of other investigated networks.6 For each network, we hesive relation than the rest of the methods in C. The second line in equation (1) is proposed to make sure that the lower bound of MCC is consistent for different classes and is not influenced by the number 6 The network data of Yeast, Aquifex aeolicus, Helicobacter pylori and Yersinia of methods in a class. For instance, suppose class C1 has three meth- pestis are provided at the webpage: http://www3.nd.edu/˜networks/resources.htm ods that are distributed in three communities and class C has four by Albert-László Barabási and the rest of networks are provided at the webpage: 2 http://www-personal.umich.edu/˜mejn/netdata/ by Mark Newman. For the explana- methods that are distributed in four communities, then the cohesive- tions of these networks, please refer to these webpages. ness of these two classes should all reach the lower bound of MCC, 200 Y. Qu et al. / The Journal of Systems and Software 108 (2015) 193–210 rather than 1/3 and 1/4 respectively, according to the second line in and equation (1).ThevalueofMCC is in the interval [0, 1]. m2 C ∈ C − → ≥ 0.6. 2 LCC Cohesion M Definition 2. Method Community Entropy Cohesion (MCEC): With the 2 same symbols in Definition 1, We define Based on these two conditions, it can be easy derived that: m + m 1, if N = 1, 1 2 ≥ 0.6, M + M MCEC(C) = N (2) 1 2 1 − (− 1 ni · ln ni ),ifN ≥ 2. ln N m m which means that C3 ∈ C − . i=1 LCC Cohesion After merging C1 and C2 into C3, suppose n3max is the correspond- = , The methods reside in a single community if N 1 therefore the ing value of nmax in Definition 1 for C3. There are three cases of N method distribution of C : value of MCEC is 1. − 1 ni · ln ni in equation (2) represents the 3 ln N m m = + i=1 Case 1 . n3max n1max n2max. In such case, the largest proportion normalized Information Entropy (Shannon, 2001) of methods distri- of C3’s methods is the union of the largest proportions of C1 and C2’s n3max bution of C. The value of Entropy equation is 1 if all the methods of methods. Then MCC(C3 ) = + .ToprovethatMCC satisfies the Co- m1 m2 C evenly distribute in every community; it approaches 0 if almost all hesive modules property, it should be proved that the methods reside in a single community. 
That is, larger Entropy val- n3max n1max n2max ues represent more even distribution. We use 1 minus the Entropy ≤ max , → m + m m m value in equation (2). With an interval [0, 1], MCEC achieves its upper 1 2 1 2 bound when all the methods reside in a single community. n + n n n As previously mentioned, throughout the paper, MCC and MCEC 1max 2max ≤ 1max , 2max + max are computed based on the community structure of LCC of Call Graph. m1 m2 m1 m2 = . , They are computed for classes in CLCC−Cohesion when t 0 6 which It can be proved using proof by contradiction: Supposing means that a class’s cohesiveness is computed if and only if there are n1max + n2max n1max n2max more than 60% of its methods appear in LCC. > max , , m1 + m2 m1 m2 which means that 4.2. Theoretical properties of cohesion metrics + n1max n2max > n1max , m + m m Briand et al. (1998) proposed that well-defined class cohesion 1 2 1 metrics should have the following mathematical properties: and + n1max n2max > n2max . • Nonnegativity and normalization: The cohesion measure of a class m1 + m2 m2 C should belong to a specific interval [0, Max]. Normalization al- lows for direct comparison between cohesion measures of differ- n1max + n2max n1max ent classes. > m + m m • Null value and maximum value: The cohesion measure of a class C 1 2 1 → n1maxm1 + n2maxm1 > n1maxm1 + n1maxm2 equals 0 if the class has no cohesive relations, and the cohesion → > , measure is equal to Max if all possible interactions are presented. n2maxm1 n1maxm2 • Monotonicity: The cohesion measure does not decrease if class C’s and interactions increase. n1max + n2max n2max • > Cohesive modules: If two independent classes C1 and C2 are m1 + m2 m2 merged to a new class C3, then the cohesion measure of → n1maxm2 + n2maxm2 > n2maxm1 + n2maxm2 C3 should not be larger than the larger value of cohesion ≤ → n1maxm2 > n2maxm1. measures of C1 and C2, which means that cohesion(C3 ) { ( ), ( )} max cohesion C1 cohesion C2 . Then a contradiction is derived. Thus, the original assumption is not true. So These properties have been widely used in theoretical investiga- n + n n n tions on the proposed class cohesion metrics (Al Dallal, 2010; Al Dallal 1max 2max ≤ 1max , 2max , + max and Briand, 2012; Briand et al., 1998; Zhou et al., 2004). m1 m2 m1 m2 It can be easily concluded that MCC satisfies the Nonnegativity in other words, and normalization and the Null value and maximum value properties. MCC(C3 ) ≤ max {MCC(C1 ), MCC(C2 )}. MCC satisfies the Monotonicity property as well because when nmax increases, MCC increases correspondingly. In the following we prove Case 2 . n3max = n1max + n22, where n22 ≤ n2max.Suchcasemeans that MCC satisfies the Cohesive modules property. that the largest proportion of C3’s methods is the union of the largest proportion of C1’s methods and a proportion, which is not necessarily Proof. Following the notations in Definition 1, suppose there are m1 be the largest one, of C2’s methods. In such situation, methods in C1 located in LCC, and m2 methods in C2 located in LCC. n1max + n22 n1max + n2max n1max n2max Moreover, n1max and n2max are the corresponding values of nmax in MCC(C3 ) = ≤ ≤ max , , = n1max m1 + m2 m1 + m2 m1 m2 Definition 1 for C1 and C2 respectively. Thus, MCC(C1 ) and m1 n2max which means that MCC(C2 ) = . m2 Firstly, it should be proved that if C ∈ C and C ∈ 1 LCC−Cohesion 2 MCC(C3 ) ≤ max {MCC(C1 ), MCC(C2 )}. C − (t = 0.6), then their merged class C also satisfies that LCC Cohesion 3 = + , ≤ ≤ ∈ Case 3 . 
n3max n12 n22 where n12 n1max and n22 n2max. C3 CLCC−Cohesion. Suppose the total method quantities in C1 and C2 Such case means that the largest proportion of C3’s methods is the are M1 and M2.Then union of two proportions of C1 and C2’s methods, which are not the ∈ → m1 ≥ . , largest ones in C and C . Proof in such case is similar to that in C1 CLCC−Cohesion 0 6 1 2 M1 Case 2.  Y. Qu et al. / The Journal of Systems and Software 108 (2015) 193–210 201
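Definitions 1 and 2 translate directly into code. The sketch below assumes a mapping community_of from each LCC method to its community id (produced by the community detection step) and is meant only to illustrate equations (1) and (2); it is not the authors' implementation.

import math
from collections import Counter

def mcc(methods_in_lcc, community_of):
    # Method Community Cohesion, equation (1).
    m = len(methods_in_lcc)
    if m == 1:
        return 1.0
    counts = Counter(community_of[meth] for meth in methods_in_lcc)
    n_max = max(counts.values())
    if n_max == 1:
        return 0.0            # all m >= 2 methods fall into different communities
    return n_max / m

def mcec(methods_in_lcc, community_of):
    # Method Community Entropy Cohesion, equation (2).
    m = len(methods_in_lcc)
    counts = Counter(community_of[meth] for meth in methods_in_lcc)
    n_communities = len(counts)
    if n_communities == 1:
        return 1.0
    entropy = -sum((n / m) * math.log(n / m) for n in counts.values())
    return 1.0 - entropy / math.log(n_communities)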

Table 7 Results of the class-merging simulation experiments on jEdit.

Community detection algorithms fg im lp ml

Percentage of class pairs violating the Cohesive modules property 6% 6% 7% 5%

Table 8 Correlation analysis on each pair of MCCs and MCECs based on four community detection algorithms.

Programs Spearman Correlation Coefficients of MCCs Spearman Correlation Coefficients of MCECs (all the p-values < 0.0001) (all the p-values < 0.0001)

fg-im fg-lp fg-ml im-lp im-ml lp-ml fg-im fg-lp fg-ml im-lp im-ml lp-ml

Ant 0.582 0.425 0.631 0.547 0.608 0.300 0.522 0.398 0.586 0.528 0.547 0.273
Apache POI 0.492 0.570 0.617 0.770 0.594 0.625 0.399 0.473 0.569 0.720 0.510 0.552
Eclipse Link 0.560 0.534 0.475 0.643 0.407 0.395 0.529 0.554 0.462 0.654 0.342 0.375
jEdit 0.671 0.474 0.776 0.453 0.683 0.473 0.648 0.474 0.766 0.445 0.667 0.479
JGroups 0.737 0.369 0.807 0.479 0.689 0.344 0.716 0.365 0.756 0.466 0.642 0.346
JHotDraw 0.556 0.517 0.745 0.659 0.643 0.516 0.553 0.477 0.720 0.626 0.606 0.487
Log4j 0.725 0.635 0.783 0.814 0.778 0.682 0.692 0.607 0.763 0.792 0.747 0.644
Lucene 0.661 0.473 0.678 0.569 0.622 0.370 0.653 0.468 0.667 0.559 0.598 0.367
Tomcat 0.592 0.431 0.742 0.661 0.612 0.456 0.555 0.423 0.726 0.659 0.564 0.453
Xalan 0.633 0.687 0.796 0.810 0.715 0.718 0.626 0.689 0.770 0.808 0.691 0.715
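The pairwise rank correlations reported in Table 8 (discussed in Section 4.3 below) could be computed along the following lines with SciPy; the per-class MCC vectors shown here are illustrative placeholders, not the paper's data.

from itertools import combinations
from scipy.stats import spearmanr

# Per-class MCC values of one program under each algorithm (illustrative numbers).
mcc_scores = {
    "fg": [0.80, 0.50, 1.00, 0.33, 0.60],
    "im": [0.75, 0.50, 1.00, 0.25, 0.60],
    "lp": [0.80, 0.40, 0.90, 0.33, 0.50],
    "ml": [0.80, 0.50, 1.00, 0.33, 0.66],
}

for a, b in combinations(mcc_scores, 2):
    rho, p_value = spearmanr(mcc_scores[a], mcc_scores[b])
    print(f"{a}-{b}: rho={rho:.3f}, p={p_value:.4f}")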

It can be easily concluded that MCEC satisfies the Null value and maximum value property. The null value happens when every single method resides in a unique community, and the maximum value is achieved if all the methods are in the same community. It can also be easily concluded that MCEC satisfies the Nonnegativity and normalization and the Monotonicity properties.

However, we are not able to theoretically prove or disprove the Cohesive modules property of MCEC. Instead we conduct an empirical study to examine the property.

For each community detection result of jEdit, we randomly choose 100 pairs of classes and then merge the pairs. MCEC is then calculated for the new 100 classes. Table 7 gives the percentage of class pairs that do not satisfy the Cohesive modules property. Based on these counterexamples, it is concluded that MCEC violates the Cohesive modules property.

In summary, MCC satisfies all four properties. MCEC satisfies the Nonnegativity and normalization, the Null value and maximum value and the Monotonicity properties. Although MCEC does not satisfy all the expected properties proposed in Briand et al. (1998), our empirical study shows that it is more informative, and performs better in fault prediction, than existing class cohesion metrics. There are also some widely-used class cohesion metrics that do not satisfy all four properties. For instance, Zhou et al. (2004) showed that LCOM2 (Chidamber and Kemerer, 1994) does not satisfy the Nonnegativity and normalization and the Monotonicity properties.

4.3. Correlations between different community detection algorithms

Since there are multiple community detection algorithms, we have conducted an empirical study to evaluate their effects on MCC and MCEC. Table 8 gives the Spearman Correlation Coefficients of each pair of MCCs and of MCECs. The Spearman Correlation Coefficient is a widely-used nonparametric measure of statistical dependence between two variables (Spearman, 1904). All the p-values⁷ are less than 0.0001, meaning that all the MCCs and MCECs are statistically correlated with each other. Most of the Spearman Correlation Coefficients are greater than 0.5, indicating that all MCCs and MCECs have a significant positive correlation with each other. In summary, different community detection algorithms tend to give similar results when they are applied to class cohesion measurement and evaluation.

⁷ In statistical significance testing, the p-value is the probability that the "null hypothesis" is actually correct. In the computing process of the Spearman Correlation Coefficient, the null hypothesis is that the two variables are statistically uncorrelated.

These results are interesting and need further investigation. Here we give a partial and possible explanation for them. Table 9 shows the NMIs between each pair of community structures obtained with the four algorithms. Results of other networks are also shown in Table 9. The NMIs of Call Graphs are usually greater than 0.6 and present a significant increment compared with the NMIs of biological networks and the Internet in Table 9, which means that different algorithms tend to obtain similar results to a certain extent compared with results obtained on biological networks and the Internet. Such similarity might be one of the reasons for the results in Table 8. It can also be noticed that social networks usually present very high NMI results. Such differences also need further research.

5. Empirical evaluation of class cohesion metrics

In our empirical study we first compare our proposed class cohesion metrics with several existing ones, followed by two case studies. The purpose of the first case study is to determine whether MCC and MCEC provide additional information compared with other well-known metrics. The second case study is to explore whether MCC and MCEC can lead to better results in class fault prediction. These two evaluation processes have been widely used in previous studies (Al Dallal and Briand, 2012; Gyimothy et al., 2005; Liu et al., 2009; Marcus et al., 2008).

5.1. Comparisons with existing class cohesion metrics

Table 10 lists definitions of five widely-used class cohesion metrics. Coh and LSCC positively correlate with a class's cohesiveness and are in the interval [0, 1]. LCOM1, LCOM2 and LCOM5 negatively correlate with a class's cohesiveness and are regarded as "inverse cohesion metrics" (Al Dallal and Briand, 2012). LCOM1 and LCOM2 do not have an upper bound, while LCOM5 has an upper bound value of 2.

Figs. 6 and 7 depict distributions of Ant's and Tomcat's class cohesion metrics, respectively.

Based on the two figures, it can be noticed that most classes have very low Coh and LSCC scores, which would indicate extremely weak cohesiveness. As for LCOM5, most classes have middle scores. The results of Coh and LSCC are surprising because Tomcat and Ant are widely-used applications with very good software structure. LCOM5 shows that most classes have mediocre cohesiveness. Since it is an inverse cohesion metric, LCOM5 contradicts Coh and LSCC in that there are relatively more classes with strong cohesiveness than with weak cohesiveness.

Table 9 NMI between different community detection algorithms.

Programs fg-im fg-lp fg-ml im-lp im-ml lp-ml

Ant 0.638 0.631 0.627 0.781 0.657 0.657
Apache POI 0.616 0.640 0.671 0.846 0.666 0.697
Eclipse Link 0.555 0.574 0.608 0.713 0.605 0.640
jEdit 0.617 0.631 0.654 0.654 0.640 0.677
JGroups 0.661 0.495 0.659 0.516 0.675 0.464
JHotDraw 0.650 0.661 0.717 0.842 0.688 0.688
Log4j 0.686 0.664 0.714 0.836 0.716 0.704
Lucene 0.613 0.598 0.639 0.705 0.637 0.613
Tomcat 0.638 0.638 0.716 0.738 0.655 0.637
Xalan 0.626 0.680 0.755 0.829 0.661 0.689
Aquifex aeolicus 0.577 0.429 0.474 0.553 0.625 0.468
Helicobacter pylori 0.568 0.427 0.529 0.431 0.621 0.413
Yersinia pestis 0.565 0.331 0.573 0.347 0.627 0.351
C. elegans neural 0.520 0.356 0.514 0.363 0.651 0.364
Internet 0.503 0.466 0.614 0.566 0.625 0.498
Co-authorships in network science 0.953 0.949 0.992 0.982 0.959 0.956
High-energy theory collaborations 0.819 0.820 0.833 0.943 0.850 0.845
Political books 0.901 0.950 0.971 0.862 0.902 0.939
Zachary's karate club 0.826 0.692 0.712 0.699 0.860 0.587

Table 10 Definitions of existing class cohesion metrics.

Class cohesion metrics Definitions and explanations

Coh (Briand et al., 1998): Coh = a / (k·l), where l is the number of attributes, k is the number of methods, and a is the summation of the number of distinct attributes that are accessed by each method in a class.

LCOM1 (Lack of Cohesion in Methods) (Chidamber and Kemerer, 1991): LCOM1 = number of pairs of methods that do not share attributes.

LCOM2 (Chidamber and Kemerer, 1994): LCOM2 = P − Q if P − Q ≥ 0, and 0 otherwise, where P = number of pairs of methods that do not share attributes and Q = number of pairs of methods that share attributes.

LCOM5 (Sellers, 1996): LCOM5 = (k·l − a) / (k·l − l), where a, k and l have the same definitions as in the definition of Coh.

LSCC (LLD, Similarity-based Class Cohesion) (Al Dallal and Briand, 2012): LSCC = 0 if l = 0 and k > 1; LSCC = 1 if (l > 0 and k = 0) or k = 1; otherwise LSCC = \sum_{i=1}^{l} x_i (x_i − 1) / (l·k·(k − 1)), where k and l have the same definitions as in the definition of Coh, and x_i is the number of methods that reference attribute i.
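As an illustration of the attribute-sharing metrics in Table 10, the following sketch computes Coh, LCOM1, LCOM2 and LCOM5 for one class from a hypothetical method-to-attributes map (LSCC can be derived analogously from the per-attribute reference counts x_i). The input is made up for illustration.

from itertools import combinations

def cohesion_metrics(attrs_of_method, n_attributes):
    # attrs_of_method: {method name -> set of attributes it accesses} for one class.
    k = len(attrs_of_method)                               # number of methods
    l = n_attributes                                       # number of attributes
    a = sum(len(s) for s in attrs_of_method.values())      # summed distinct accesses
    pairs = list(combinations(attrs_of_method.values(), 2))
    p = sum(1 for s1, s2 in pairs if not (s1 & s2))        # pairs sharing no attribute
    q = len(pairs) - p                                     # pairs sharing an attribute
    return {
        "Coh": a / (k * l) if k and l else 0.0,
        "LCOM1": p,
        "LCOM2": max(p - q, 0),
        "LCOM5": (k * l - a) / (k * l - l) if l and k > 1 else 0.0,
    }

example = {"read": {"buf", "pos"}, "write": {"buf"}, "reset": {"pos"}, "close": set()}
print(cohesion_metrics(example, n_attributes=2))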

Fig. 6. Distributions of four class cohesion metrics of Ant 1.9.3 (panels: Coh, LCOM5, LSCC, MCC_fg; histogram details omitted).

Fig. 7. Distributions of four class cohesion metrics of Tomcat 8.0.5 (panels: Coh, LCOM5, LSCC, MCEC_ml; histogram details omitted).

Fig. 8. Distributions of MCCs and MCECs of Log4j 2.0 (panels: MCC_fg, MCEC_fg, MCC_ml, MCEC_ml; histogram details omitted).

Fig. 8 shows the distribution of MCC and MCEC scores of Log4j. The distribution is consistent with the last two sub-figures in Figs. 6 and 7. The results are as expected: most classes in these widely-used applications have strong cohesiveness. There are a few classes with weak cohesiveness, which provides guidance for developers to improve the software structure.

5.2. Principal component analysis

Following the processes of prior studies, we conducted PCA (Pearson, 1901) on Eclipse Link 2.5.1 and Lucene 4.7.1 to determine whether the newly proposed metrics provide additional information on class cohesiveness. PCA is a statistical procedure that has been used in a number of previous studies to identify orthogonal dimensions captured by different cohesion metrics (Al Dallal and Briand, 2012; Liu et al., 2009; Marcus et al., 2008). Generally speaking, PCA can convert a set of observations of possibly correlated variables into a set of uncorrelated variables (i.e., dimensions). These uncorrelated variables are called principal components (PCs). We conducted PCA with the same settings as previous studies (Al Dallal and Briand, 2012; Liu et al., 2009; Marcus et al., 2008).

In total, 5809 (2726+3083) classes have been analyzed. Tables 11 and 12 give the results of PCA. Table 11 shows that for Eclipse Link, three PCs are obtained. The first three rows in Table 11 show the eigenvalue (i.e., a measure of the variance of each PC), the PCs' percentages, and the cumulative percentage. The cumulative percentage indicates that these three PCs have captured 76.001% of the data set variance. After the first three rows, each metric's coefficients for each PC are shown in the corresponding row, and important coefficients are marked in bold. It can be noticed that, for Eclipse Link, MCC and MCEC are the only two major factors (i.e., the original variables that comprise the corresponding PC) in PC2, and they capture more data variance (the larger the data variance, the more variability of the data set is captured by the corresponding PC) than LSCC and Coh, which are major factors in PC3. The situation is similar for Lucene, with an even more significant and positive experimental result. These results indicate that MCC and MCEC capture an additional measurement dimension of their own. In Section 5.3, fault prediction experiments are performed to show that this additional dimension is helpful for improving the performance of fault prediction. Thus, this new measurement dimension is also important and helpful.

Table 11
Results of PCA on Eclipse Link 2.5.1.

PC1 PC2 PC3
Eigenvalue 2.553 2.083 1.444
Percent 31.911% 26.039% 18.051%
Cumulative percentage 31.911% 57.950% 76.001%
Coh -0.259 0.032 0.895
LCOM1 0.935 0.138 0.181
LCOM2 0.930 0.138 0.182
LCOM5 0.167 0.286 0.119
LOC 0.769 0.119 0.020
LSCC -0.305 0.547 0.669
MCC_im -0.144 0.887 -0.279
MCEC_im -0.118 0.929 -0.191

Table 12
Results of PCA on Lucene 4.7.1.

PC1 PC2 PC3
Eigenvalue 2.592 1.944 1.476
Percent 32.402% 24.305% 18.444%
Cumulative percentage 32.402% 56.707% 75.151%
Coh -0.365 -0.080 0.780
LCOM1 0.921 -0.047 0.272
LCOM2 0.916 -0.045 0.257
LCOM5 0.095 0.281 0.357
LOC 0.780 -0.031 0.082
LSCC -0.367 0.334 0.745
MCC_ml 0.127 0.932 -0.142
MCEC_ml 0.061 0.934 -0.136
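The PCA step itself can be sketched as follows with scikit-learn: standardize a class-by-metric matrix and inspect the per-metric loadings of the leading components, in the spirit of Tables 11 and 12. The input matrix here is a random placeholder, not the Eclipse Link or Lucene data, and the exact settings of the paper are not reproduced.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

metric_names = ["Coh", "LCOM1", "LCOM2", "LCOM5", "LOC", "LSCC", "MCC_im", "MCEC_im"]
X = np.random.rand(500, len(metric_names))        # placeholder: one row per class

pca = PCA(n_components=3)
pca.fit(StandardScaler().fit_transform(X))

print("explained variance ratio:", pca.explained_variance_ratio_)
for name, loadings in zip(metric_names, pca.components_.T):
    print(name, np.round(loadings, 3))            # coefficient of each metric per PC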

5.3. Fault prediction

The case study on PCA confirms that MCC and MCEC can provide additional information on class cohesion. In the second case study, the fault prediction capabilities of MCC and MCEC are investigated. In the literature, fault prediction has been widely used to evaluate class cohesion metrics (Al Dallal and Briand, 2012; Gyimothy et al., 2005; Liu et al., 2009; Marcus et al., 2008).

The fault data in this case study are obtained from the PROMISE data repository (http://openscience.us/repo/) (Boetticher et al., 2007). Table 13 gives information about the Ant, Apache POI, Lucene and Xalan versions that are used in this case study. The third to fifth columns list the lines of code, the number of effective classes and the number of methods, respectively.

Table 13
Basic statistics of programs with fault data.

Programs      Version   SLOC      # Effect Class   # Methods
Ant           1.7.0     93,520    1052             9271
Apache POI    3.0       53,097    508              6141
Lucene        2.4.0     35,984    459              3838
Xalan         2.6.0     155,067   1027             9686

Table 14 shows statistics of the faulty programs' LCCs and their fault data. The second and third columns give the number of nodes and edges in the corresponding LCC. The fourth column lists the number of classes in C_LCC-Cohesion (t = 0.6). The fifth column shows the number of classes that have records in the PROMISE data repository indicating whether they are faulty (C_F). The sixth column in Table 14 shows the number of classes that appear in both the fourth and fifth columns (C_LCC-Cohesion ∩ C_F). The last column gives the number of faulty classes (F_C) in C_LCC-Cohesion ∩ C_F. The statistics in Table 14 show that 1500 (468 + 295 + 208 + 529) classes are evaluated in the fault prediction experiments; among them, 702 (119 + 208 + 146 + 229) classes contain at least one fault.

The evaluation process is similar to the one used in Liu et al. (2009). We apply univariate and multivariate logistic regression (Hosmer and Lemeshow, 2004) to predict the fault proneness of a class using one of its cohesion metrics or a combination of them. Firstly, univariate logistic regression is used to investigate the fault prediction ability of a single metric. Then multivariate logistic regression is used to investigate all possible combinations of metrics, in order to determine whether the MCC and MCEC metrics can improve the fault detection results when they are combined with other metrics. There are 14 unique metrics (five previous class cohesion metrics, LOC, four versions of MCC and four versions of MCEC) in this case study, so there are 91 two-metric combinations in the multivariate logistic regression. The ten top-performing combinations for each program are given in Table 16.

In logistic regression, the Nagelkerke R² (Nagelkerke, 1991) is often used to measure the goodness of fit. R² lies in the interval [0, 1]; the larger the value of R², the larger the portion of the variance in the dependent variable that is explained by the regression model.

Furthermore, to evaluate the prediction performance of the logistic regression, the standard Precision, Recall and F-measure evaluation criteria (Olson and Delen, 2008) from information retrieval are used. Precision is defined as the number of classes correctly classified as faulty divided by the total number of classes classified as faulty. Recall is defined as the number of classes correctly classified as faulty divided by the actual number of faulty classes. Finally, the F-measure, defined as

F-measure = 2 · Precision · Recall / (Precision + Recall),

takes both Precision and Recall into consideration. The shortcoming of the F-measure criterion is that it requires a probability threshold to predict classes as faulty (Al Dallal and Briand, 2012), which is usually hard to decide. Thus, the Receiver Operating Characteristic (ROC) curve (Hanley and McNeil, 1982) is also used. In the context of fault prediction, the ROC curve is a graphical plot created by plotting the True Positive Rate (the ratio of classes correctly classified as faulty, equal to Recall) against the False Positive Rate (the ratio of classes incorrectly classified as faulty) at different thresholds. The Area Under the ROC Curve (AUC) shows the fault prediction ability, and an AUC of 100% indicates an ideal prediction result. For instance, Fig. 9 shows the ROC curves of LSCC and MCC on Apache POI 3.0, and of Coh and MCC on Xalan 2.6.0.

Results of the univariate logistic regression are given in Table 15. For each program, the second column gives the R² values, and the regression results are sorted by these values. The third to fifth columns show the values of Precision, Recall and F-measure when the fault prediction threshold is 0.5. The sixth column gives the AUC results, followed by their ranks in the seventh column. Then the parameters of the logistic regression (C0 and C1) are listed. The p-values are given in the last column. Based on Table 15, it can be observed that there is no fixed order of these metrics across the different programs; in other words, we cannot tell which metric performs better than the others. Thus, when these metrics are used in software fault prediction, their statistical significance should first be examined and then the proper metrics should be selected for each specific program. We can also conclude that the four versions of MCC and the four versions of MCEC perform comparably to the existing metrics.

Results of the multivariate logistic regression are given in Table 16. As previously mentioned, the 10 combinations of metrics with the largest R² values are listed for each program. The columns are similar to those in Table 15, followed by another parameter of the logistic regression (C2) shown in the last column. These results are quite promising. For each program, most of the top 10 combinations contain MCC or MCEC metrics; concretely, 27 combinations out of 40 contain MCC or MCEC metrics. Moreover, three out of the four top-ranked combinations contain an MCC or MCEC metric.

It has to be noticed that the R² values in Tables 15 and 16 are relatively low compared with common application scenarios of logistic regression. Previous studies have reported similar results (Gyimothy et al., 2005; Liu et al., 2009; Marcus et al., 2008). Such results are understandable, as class cohesion metrics measure software quality from only one aspect. Software faults are the consequence of many complicated and interplaying factors, e.g., the coding style, the software architecture and the programmer's grasp of the programming language. Thus, it is not beyond our expectations that cohesion metrics obtain relatively low R² values in fault prediction experiments. On the other hand, the statistical significance observed in these experiments shows that class cohesion metrics do influence software quality and that classes with low cohesion are indeed more prone to be faulty. Therefore, class cohesion metrics are still helpful indicators of software quality. For instance, some MCC and MCEC metrics in Table 15 have already obtained an AUC greater than 0.7, which indicates an acceptable predictor of software faults. In fault prediction practice, class cohesion metrics are usually used together with many other software metrics to obtain more practical and actionable results (Gyimothy et al., 2005; Scanniello et al., 2013).
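As an illustration of the evaluation criteria just described, the sketch below fits a univariate logistic regression for one metric and reports Precision, Recall, F-measure (at the 0.5 threshold), AUC and the Nagelkerke R². The array names are placeholders, and scikit-learn's logistic regression stands in for whatever statistical package was actually used in the study.

    # Sketch: evaluate one cohesion metric as a univariate fault predictor.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

    def nagelkerke_r2(y, prob_null, prob_model):
        """Nagelkerke (1991) R^2 from per-class predicted fault probabilities."""
        n = len(y)
        ll_null = np.sum(y * np.log(prob_null) + (1 - y) * np.log(1 - prob_null))
        ll_model = np.sum(y * np.log(prob_model) + (1 - y) * np.log(1 - prob_model))
        cox_snell = 1 - np.exp((2.0 / n) * (ll_null - ll_model))
        return cox_snell / (1 - np.exp((2.0 / n) * ll_null))

    def evaluate_metric(metric_values, faulty):
        """metric_values: one value per class; faulty: 0/1 label per class."""
        X = np.asarray(metric_values, dtype=float).reshape(-1, 1)
        y = np.asarray(faulty)
        prob = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
        pred = (prob >= 0.5).astype(int)              # 0.5 probability threshold
        base_rate = np.full_like(prob, y.mean())      # intercept-only (null) model
        return {
            "Precision": precision_score(y, pred),
            "Recall": recall_score(y, pred),
            "F-measure": f1_score(y, pred),           # 2*P*R / (P + R)
            "AUC": roc_auc_score(y, prob),
            "R2": nagelkerke_r2(y, base_rate, prob),
        }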

Table 14
Statistics of faulty programs' LCCs. The last column counts the faulty classes F_C, where F_C ⊆ C_LCC-Cohesion ∩ C_F.

Programs      N_LCC   E_LCC    |C_LCC-Cohesion|   |C_F|   |C_LCC-Cohesion ∩ C_F|   |F_C|
Ant           6299    14,234   590                745     468                      119
Apache POI    4369    7536     310                442     295                      208
Lucene        2667    5124     277                340     208                      146
Xalan         6573    13,819   622                885     529                      229


Fig. 9. ROC curves of LSCC and MCC on Apache POI 3.0, and of Coh and MCC on Xalan 2.6.0.
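For readers who want to reproduce this kind of plot, the following minimal sketch draws one ROC curve per metric from per-class fault labels and predicted probabilities. The variable names (y_true, prob_lscc, prob_mcc_ml) are placeholders for the study's data.

    # Sketch of how an ROC curve like the ones in Fig. 9 can be drawn.
    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, auc

    def plot_roc(y_true, y_prob, label):
        fpr, tpr, _ = roc_curve(y_true, y_prob)   # FPR = 1 - specificity, TPR = recall
        plt.plot(fpr, tpr, label=f"{label} (AUC = {auc(fpr, tpr):.3f})")

    # plot_roc(y_true, prob_lscc, "LSCC")
    # plot_roc(y_true, prob_mcc_ml, "MCC_ml")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # random-guess baseline
    plt.xlabel("False Positive Rate (1 - Specificity)")
    plt.ylabel("True Positive Rate (Sensitivity)")
    plt.legend()
    plt.show()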

Experiments using all 14 metrics together in the multivariate logistic regression were also conducted. Table 17 shows the results of these experiments. For each program, the R² value and the AUC are shown. The metrics whose p-values are less than 0.05 in the regression model are listed in the last column. It can be observed that the proposed metrics appear for three out of the four programs.

Based on these results, we can draw the conclusion that the MCC and MCEC metrics perform better than existing class cohesion metrics in the multivariate logistic regression.

Fig. 10 shows the comparison between fault prediction results with and without MCC and MCEC. For each program, the previous metrics whose p-values in the univariate logistic regression (Table 15) are less than 0.05 are selected, and these metrics are used in the multivariate logistic regression. Then each MCC and MCEC metric is included in the multivariate logistic regression separately. The red dots in Fig. 10 show the R² values without MCC and MCEC, and the box plots show the distributions of the R² values after the MCC and MCEC metrics are included in the model separately. It can be observed that after the inclusion of each single MCC or MCEC metric, the R² value always increases. Again, this result shows that every single MCC and MCEC metric can improve the performance of fault prediction. This experiment also answers the question raised in Section 5.2: the proposed metrics do provide an additional and useful measurement dimension of class cohesion.

Based on these experimental results, we observe that MCC and MCEC usually perform comparably to existing metrics when used alone in univariate logistic regression, and that they usually provide additional and useful information for predicting faults when combined with other metrics in multivariate logistic regression, where they usually perform better than existing metrics.

Fig. 10. Box plots showing the improvement introduced by the proposed metrics.

6. Discussion

6.1. The (in)stability of community detection algorithms

Community detection algorithms may not obtain exactly the same results in different runs. To study this effect, we ran each community detection algorithm 100 times on jEdit. The results of these experiments are shown in Fig. 11. The number of detected communities is shown in the first subfigure, followed by the Q values in the second subfigure. It can be noticed that the fg and ml algorithms are very stable, as both obtain exactly the same results in all the experiments. On the other hand, the lp algorithm exhibits relatively drastic changes among different runs, which makes the cohesion measurement unstable. In the computation of the proposed metrics, one can use average values over multiple runs to alleviate this problem. For instance, a programmer can run the community detection process 100 times, calculate 100 values of a certain class's MCC, and then average these 100 values to obtain an average MCC for this class; a sketch of this procedure is given below.
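The averaging procedure suggested above can be sketched as follows. The graph construction and the method-to-vertex mapping are placeholders, python-igraph's label propagation stands in for the unstable lp algorithm, and MCC is computed directly from its definition (the largest fraction of a class's methods falling into one community).

    # Sketch: average a class's MCC over repeated runs of an unstable algorithm.
    from collections import Counter
    import igraph as ig

    def mcc_of_class(membership, method_vertices):
        """Largest fraction of the class's methods that share one community."""
        counts = Counter(membership[v] for v in method_vertices)
        return counts.most_common(1)[0][1] / len(method_vertices)

    def average_mcc(graph, method_vertices, runs=100):
        values = []
        for _ in range(runs):
            clustering = graph.community_label_propagation()   # varies between runs
            values.append(mcc_of_class(clustering.membership, method_vertices))
        return sum(values) / len(values)

    # Usage (placeholders): `call_graph` is an undirected igraph Graph built from
    # the static Call Graph, and vertices 0, 3 and 7 are the methods of one class.
    # call_graph = ig.Graph(edges=[(0, 1), (0, 3), (3, 7)])
    # print(average_mcc(call_graph, [0, 3, 7]))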

Table 15
Fault prediction results using the univariate logistic regression on four programs (sorted by R²).

Ant 1.7.0
Metrics    R²      Precision  Recall  F-measure  AUC    AUC rank  C0      C1      p-value
LOC        0.321   0.529      0.605   0.565      0.814  1         -1.219  0.005   < 0.0001
LCOM1      0.304   0.561      0.538   0.549      0.784  2         -0.759  0.008   < 0.0001
LCOM2      0.268   0.541      0.496   0.518      0.767  3         -0.644  0.008   < 0.0001
LSCC       0.098   0.302      0.815   0.441      0.585  9         0.397   -1.802  < 0.0001
LCOM5      0.076   0.316      0.824   0.457      0.586  8         -1.230  1.593   0.00049
MCEClp     0.075   0.302      0.765   0.433      0.550  12        0.475   -1.352  0.00034
MCCim      0.063   0.364      0.639   0.463      0.633  6         0.872   -1.530  0.00096
MCClp      0.061   0.362      0.647   0.464      0.642  4         0.948   -1.541  0.0012
MCCfg      0.052   0.351      0.563   0.432      0.633  5         1.254   -1.646  0.0026
MCECim     0.045   0.298      0.765   0.429      0.546  13        0.329   -1.061  0.0052
MCCml      0.038   0.322      0.546   0.405      0.610  7         1.018   -1.344  0.0103
Coh        0.034   0.299      0.748   0.427      0.502  14        0.306   -1.389  0.0167
MCECml     0.033   0.318      0.630   0.423      0.552  11        0.409   -0.775  0.0160
MCECfg     0.027   0.341      0.639   0.444      0.578  10        0.373   -0.705  0.0275

Apache POI 3.0
Metrics    R²      Precision  Recall  F-measure  AUC    AUC rank  C0      C1      p-value
LSCC       0.198   0.807      0.865   0.835      0.608  10        0.674   -2.647  < 0.0001
LOC        0.158   0.837      0.519   0.641      0.764  3         -0.760  0.005   0.00014
MCECim     0.138   0.766      0.788   0.777      0.576  11        0.617   -2.245  0.00011
MCECfg     0.134   0.804      0.712   0.755      0.631  9         0.837   -1.719  < 0.0001
LCOM1      0.133   0.918      0.433   0.588      0.813  1         -0.379  0.006   0.0057
MCECml     0.130   0.800      0.750   0.774      0.641  8         0.800   -1.668  < 0.0001
MCCfg      0.113   0.798      0.625   0.701      0.697  5         1.927   -2.568  0.00023
MCCim      0.103   0.831      0.731   0.777      0.677  6         1.322   -2.291  0.00042
MCCml      0.102   0.850      0.654   0.739      0.701  4         1.690   -2.296  0.00042
MCClp      0.086   0.797      0.716   0.754      0.660  7         1.214   -2.102  0.0011
LCOM2      0.079   0.905      0.365   0.521      0.770  2         -0.231  0.004   0.0265
MCEClp     0.078   0.748      0.798   0.772      0.529  14        0.441   -1.549  0.0022
LCOM5      0.015   0.766      0.740   0.753      0.534  13        -0.421  0.564   0.1609
Coh        0.00030 0.719      0.639   0.677      0.556  12        0.029   -0.112  0.8437

Lucene 2.4.0
Metrics    R²      Precision  Recall  F-measure  AUC    AUC rank  C0      C1      p-value
LCOM1      0.134   0.853      0.397   0.542      0.706  2         -0.393  0.013   0.0071
LOC        0.133   0.871      0.555   0.678      0.716  1         -0.658  0.005   0.0030
LCOM2      0.133   0.852      0.356   0.502      0.655  3         -0.304  0.023   0.0156
MCECml     0.062   0.789      0.664   0.721      0.591  5         0.550   -1.040  0.0162
MCEClp     0.025   0.740      0.760   0.750      0.520  11        0.240   -0.676  0.1307
LCOM5      0.018   0.730      0.630   0.676      0.564  8         -0.380  0.644   0.1942
MCCml      0.013   0.813      0.507   0.624      0.615  4         0.496   -0.664  0.2780
LSCC       0.012   0.745      0.699   0.721      0.576  6         0.185   -0.571  0.2869
MCECfg     0.012   0.758      0.623   0.684      0.547  9         0.248   -0.452  0.2906
MCECim     0.009   0.731      0.801   0.765      0.481  14        0.124   -0.440  0.3693
Coh        0.002   0.745      0.562   0.641      0.516  12        0.084   -0.252  0.6902
MCClp      0.001   0.732      0.486   0.584      0.538  10        0.117   -0.199  0.7089
MCCim      0.001   0.689      0.486   0.570      0.510  13        -0.082  0.157   0.7730
MCCfg      0.00022 0.652      0.500   0.566      0.571  7         -0.067  0.088   0.8872

Xalan 2.6.0
Metrics    R²      Precision  Recall  F-measure  AUC    AUC rank  C0      C1      p-value
LOC        0.264   0.717      0.541   0.617      0.791  1         -0.906  0.004   < 0.0001
MCECim     0.134   0.528      0.852   0.652      0.584  10        0.539   -1.724  < 0.0001
LCOM1      0.102   0.795      0.288   0.423      0.705  2         -0.223  0.002   0.00019
LCOM2      0.085   0.826      0.249   0.383      0.637  4         -0.177  0.002   0.00064
MCCim      0.081   0.543      0.668   0.599      0.629  5         0.745   -1.443  < 0.0001
MCECml     0.077   0.526      0.668   0.588      0.619  7         0.587   -1.188  < 0.0001
MCCml      0.070   0.566      0.598   0.582      0.655  3         1.246   -1.708  < 0.0001
MCECfg     0.064   0.519      0.594   0.554      0.602  9         0.618   -1.092  < 0.0001
LSCC       0.063   0.488      0.721   0.582      0.548  12        0.368   -1.200  < 0.0001
MCCfg      0.039   0.516      0.493   0.504      0.624  6         1.001   -1.288  0.00036
MCEClp     0.038   0.487      0.738   0.587      0.523  13        0.337   -0.844  0.00031
MCClp      0.013   0.506      0.555   0.529      0.571  11        0.373   -0.602  0.0345
Coh        0.013   0.488      0.428   0.456      0.614  8         -0.171  0.650   0.0374
LCOM5      0.002   0.462      0.638   0.536      0.481  14        -0.108  0.173   0.4647

6.2. Applications in refactoring

We believe it is possible to apply the proposed metrics in refactoring practice. Here we give a simple example of how to use the metrics to guide the Move Method refactoring (Fowler et al., 1999). There is one threshold in this process, denoted k, which is a constant value between 0 and 1.

Assume there are m1 methods of class C1 located in the LCC, and let MCC(C1) = n1max / m1. That is, there are n1max methods in C1 that are located in a community Com1, and m1 − n1max methods in C1 that are located in other communities. If the following conditions hold:

1. MCC(C1) > k.
2. There exist a method Method1 in C1, another community Com2 and another class C2 such that: (1) Method1 ∈ Com2; (2) there are n2max methods of C2 distributed in Com2; and (3) MCC(C2) = n2max / m2 > k,

then Method1 should be moved from C1 to C2.

This process is similar to a "thought experiment" in physics, in that it lacks experimental and data support. But we hope this discussion will inspire the research community to develop more practical refactoring algorithms using the metrics proposed in this paper.
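A minimal sketch of this check is given below. The data structures (a map from each method to its community and a map from each class to its methods in the LCC) are placeholders, and the threshold value in the usage comment is an arbitrary example; the sketch only encodes the two conditions stated above.

    # Sketch of the Move Method rule: MCC(C1) > k, Method1 lies in C2's dominant
    # community Com2 (a different community from C1's), and MCC(C2) > k.
    from collections import Counter

    def mcc_and_dominant_community(cls, methods_of, community_of):
        methods = methods_of[cls]
        community, n_max = Counter(community_of[m] for m in methods).most_common(1)[0]
        return n_max / len(methods), community

    def should_move(method1, c1, c2, methods_of, community_of, k):
        mcc_c1, com1 = mcc_and_dominant_community(c1, methods_of, community_of)
        mcc_c2, com2 = mcc_and_dominant_community(c2, methods_of, community_of)
        return (mcc_c1 > k                          # condition 1
                and com2 != com1                    # Com2 is another community
                and community_of[method1] == com2   # Method1 is located in Com2
                and mcc_c2 > k)                     # condition 2(3)

    # Example with hypothetical data and threshold:
    # should_move("C1.m5", "C1", "C2", methods_of, community_of, k=0.6)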

Table 16
Fault prediction results using the multivariate logistic regression on four programs; the top 10 combinations out of 91 are shown (sorted by R²).

Ant 1.7.0
Metrics         R²     Precision  Recall  F-measure  AUC    AUC rank  C0      C1      C2
LOC+MCEClp      0.340  0.529      0.613   0.568      0.809  6         -0.846  0.004   -0.900
LCOM1+LOC       0.338  0.570      0.613   0.591      0.813  2         -1.106  0.004   0.003
LCOM1+MCEClp    0.334  0.560      0.588   0.574      0.780  18        -0.365  0.008   -1.069
LOC+MCClp       0.332  0.556      0.622   0.587      0.801  13        -0.655  0.005   -0.846
LCOM2+LOC       0.331  0.557      0.613   0.584      0.813  3         -1.139  0.003   0.004
LCOM1+LCOM2     0.328  0.576      0.571   0.574      0.783  15        -0.855  0.031   -0.025
LOC+MCCim       0.325  0.548      0.622   0.583      0.806  9         -0.863  0.005   -0.542
LCOM5+LOC       0.325  0.532      0.622   0.574      0.802  12        -1.524  0.476   0.005
LSCC+LOC        0.324  0.522      0.605   0.560      0.804  11        -1.050  -0.438  0.004
LOC+MCCfg       0.324  0.541      0.613   0.575      0.808  8         -0.787  0.005   -0.515

Apache POI 3.0
Metrics         R²     Precision  Recall  F-measure  AUC    AUC rank  C0      C1      C2
LSCC+Coh        0.301  0.801      0.870   0.834      0.742  27        0.022   -7.153  5.748
LOC+MCECfg      0.270  0.833      0.793   0.813      0.761  15        0.061   0.005   -1.800
LSCC+LOC        0.264  0.777      0.736   0.756      0.750  22        -0.006  -2.211  0.003
LOC+MCECml      0.263  0.840      0.808   0.824      0.780  5         0.017   0.005   -1.703
LOC+MCCfg       0.258  0.855      0.736   0.791      0.756  17        1.156   0.005   -2.649
LSCC+MCECml     0.251  0.852      0.774   0.811      0.698  48        1.181   -2.333  -1.254
LSCC+LCOM1      0.249  0.810      0.817   0.813      0.732  31        0.322   -2.233  0.003
LCOM1+MCECim    0.245  0.818      0.865   0.841      0.767  12        0.250   0.005   -2.312
LCOM1+MCECfg    0.243  0.821      0.769   0.794      0.773  8         0.434   0.005   -1.693
LSCC+MCECfg     0.241  0.841      0.764   0.801      0.687  54        1.125   -2.250  -1.169

Lucene 2.4.0
Metrics         R²     Precision  Recall  F-measure  AUC    AUC rank  C0      C1      C2
LCOM2+MCECml    0.176  0.808      0.719   0.761      0.720  7         0.208   0.022   -0.945
LCOM2+LOC       0.171  0.875      0.479   0.619      0.731  1         -0.633  0.016   0.003
LCOM1+MCECml    0.167  0.815      0.692   0.748      0.729  3         0.075   0.012   -0.822
LOC+MCECml      0.157  0.861      0.637   0.732      0.729  2         -0.208  0.004   -0.712
LCOM1+LCOM2     0.150  0.841      0.363   0.507      0.709  16        -0.397  0.007   0.015
LOC+MCCim       0.149  0.833      0.548   0.661      0.711  14        -1.126  0.005   0.756
LCOM2+MCEClp    0.147  0.847      0.418   0.560      0.669  29        -0.090  0.022   -0.571
LCOM1+MCCim     0.147  0.845      0.411   0.553      0.660  30        -0.791  0.014   0.687
LCOM1+LOC       0.146  0.867      0.493   0.629      0.725  4         -0.581  0.007   0.003
LCOM1+MCEClp    0.144  0.865      0.438   0.582      0.710  15        -0.205  0.013   -0.477

Xalan 2.6.0
Metrics         R²     Precision  Recall  F-measure  AUC    AUC rank  C0      C1      C2
LOC+MCECim      0.345  0.681      0.690   0.685      0.804  1         -0.398  0.004   -1.658
LOC+MCCim       0.314  0.665      0.642   0.653      0.776  10        -0.222  0.004   -1.359
LOC+MCCml       0.308  0.674      0.642   0.658      0.776  9         0.242   0.004   -1.581
Coh+LOC         0.305  0.663      0.694   0.678      0.784  4         -1.384  1.404   0.004
LOC+MCECml      0.304  0.708      0.646   0.676      0.779  6         -0.377  0.003   -1.025
LOC+MCECfg      0.303  0.678      0.633   0.655      0.774  11        -0.329  0.004   -1.020
LOC+MCCfg       0.295  0.653      0.624   0.638      0.770  13        0.113   0.004   -1.340
LCOM5+LOC       0.280  0.696      0.550   0.615      0.780  5         -0.567  -0.722  0.004
LOC+MCEClp      0.276  0.727      0.594   0.654      0.774  12        -0.653  0.004   -0.575
LOC+MCClp       0.269  0.734      0.555   0.632      0.777  8         -0.632  0.004   -0.439

Table 17
Fault prediction results using the multivariate logistic regression with all 14 metrics.

Programs      R²     AUC    Metrics (p-value < 0.05)
Ant           0.407  0.800  MCEClp
Apache POI    0.508  0.834  Coh, LCOM1, LCOM2, LSCC, MCEClp
Lucene        0.303  0.770  LCOM2
Xalan         0.408  0.826  Coh, LOC, MCECim, MCCml

6.3. Toward a unified measurement and theoretical framework

The metrics and the computation methodology discussed in this paper might be combined with other metrics or networks from previous research. Two potential directions exist.

The first is to combine them with previous class cohesion metrics. As discussed in Section 2, many class cohesion metrics (LCOM1, LSCC, etc.) measure relationships among the methods of a class by considering whether these methods share the same attributes. Such attribute-sharing quantification results can be treated as edge weights in Call Graphs. The shortcoming of this approach is that such weights can only be added if two nodes belong to the same class. How can we add weights between two nodes that represent methods from different classes? This question is left open for future research; a sketch of the basic weighting idea is given at the end of this subsection.

The second is to combine the Call Graph with networks at other granularities. For instance, the Class Dependency Network is such a network, reflecting class-level relations. It might be possible to use a similar methodology to quantify a software package's cohesiveness by detecting community structures in the Class Dependency Network, as classes belonging to the same package are supposed to reside in the same community. It is more challenging to combine these networks, which reflect software entity relations at different levels. The newly developed theory of Multilayer Networks (Boccaletti et al., 2014) might be a promising theoretical basis for this direction.
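As a concrete illustration of the first direction, the sketch below attaches an attribute-sharing weight to a Call Graph edge whose two endpoint methods belong to the same class. The Jaccard-style ratio is only one plausible choice, and all data structures here are assumed for the example.

    # Sketch: weight an intra-class Call Graph edge by shared attribute usage.
    def attribute_sharing_weight(m1, m2, attrs_of, class_of):
        """Weight in [0, 1] for methods of the same class, None otherwise."""
        if class_of[m1] != class_of[m2]:
            return None                        # the open inter-class case above
        a1, a2 = set(attrs_of[m1]), set(attrs_of[m2])
        if not (a1 or a2):
            return 0.0
        return len(a1 & a2) / len(a1 | a2)     # fraction of shared attributes

    # weighted = {(u, v): attribute_sharing_weight(u, v, attrs_of, class_of)
    #             for (u, v) in call_edges}    # call_edges: hypothetical edge list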


Fig. 11. Community detection results on jEdit over 100 runs, showing the stability of the different algorithms (fg, im, lp, ml). The upper panel gives the number of detected communities per run; the lower panel gives the modularity Q.
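The experiment summarized in Fig. 11 can be approximated with python-igraph as sketched below. The graph file and its format are placeholders, and the four calls correspond to the fast greedy (fg), Infomap (im), label propagation (lp) and multilevel (ml) algorithms used in this paper.

    # Sketch of the stability experiment behind Fig. 11: run four community
    # detection algorithms repeatedly and record community counts and modularity Q.
    import igraph as ig

    ALGORITHMS = {
        "fg": lambda g: g.community_fastgreedy().as_clustering(),
        "im": lambda g: g.community_infomap(),
        "lp": lambda g: g.community_label_propagation(),
        "ml": lambda g: g.community_multilevel(),
    }

    def stability_runs(g, runs=100):
        results = {name: [] for name in ALGORITHMS}
        for _ in range(runs):
            for name, detect in ALGORITHMS.items():
                clustering = detect(g)
                results[name].append((len(clustering), clustering.modularity))
        return results   # per algorithm: one (community count, Q) pair per run

    # g = ig.Graph.Read_Edgelist("jedit_callgraph.txt", directed=False)  # placeholder
    # runs = stability_runs(g)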

7. Conclusions

In this paper, by using four community detection algorithms in the analysis of 10 widely-used open-source Java software systems, we have shown that static software Call Graphs usually present relatively significant community structures. Two class cohesion metrics have been proposed. The two metrics are based on the distribution of a class's methods among communities and can therefore reflect the class's cohesiveness. We show that the proposed metrics can provide additional and useful information about class cohesion that is not reflected by existing class cohesion metrics. In fault prediction experiments on four open-source programs containing 1500 classes, when the proposed metrics are used alone, they usually perform comparably to existing metrics; when combinations of metrics are evaluated, the proposed metrics usually provide better results than existing metrics. In the future, we plan to investigate how to use the community structure and the proposed metrics to guide software refactoring practices.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (91118005, 91218301, 91418205, 61221063, 61203174, 61428206 and U1301254), the Doctoral Fund of the Ministry of Education of China (20110201120010), the 863 High Tech Development Plan of China (2012AA011003), the 111 International Collaboration Program of China, and the Fundamental Research Funds for the Central Universities. We would also like to thank the anonymous reviewers for their insightful comments and valuable suggestions for improving this paper.

References

Al Dallal, J., 2010. Mathematical validation of object-oriented class cohesion metrics. Int. J. Comput. 4 (2), 45–52.
Al Dallal, J., 2012. The impact of accounting for special methods in the measurement of object-oriented class cohesion on refactoring and fault prediction activities. J. Syst. Softw. 85 (5), 1042–1057.
Al Dallal, J., 2013. Qualitative analysis for the impact of accounting for special methods in object-oriented class cohesion measurement. J. Softw. 8 (2), 327–336.
Al Dallal, J., Briand, L.C., 2012. A precise method-method interaction-based cohesion metric for object-oriented classes. ACM Trans. Softw. Eng. Methodol. (TOSEM) 21 (2), 8.
Badri, L., Badri, M., 2004. A proposal of a new class cohesion criterion: an empirical study. J. Object Technol. 3 (4), 145–159.
Barabási, A.-L., Albert, R., 1999. Emergence of scaling in random networks. Science 286 (5439), 509–512.

Baxter, G., Frean, M., Noble, J., Rickerby, M., Smith, H., Visser, M., et al., 2006. Understanding the shape of Java software. In: ACM SIGPLAN Notices, vol. 41. ACM, pp. 397–412.
Bhattacharya, P., Iliofotou, M., Neamtiu, I., Faloutsos, M., 2012. Graph-based analysis and prediction for software evolution. In: Proceedings of the 2012 International Conference on Software Engineering. IEEE Press, pp. 419–429.
Bieman, J.M., Kang, B.-K., 1995. Cohesion and reuse in an object-oriented system. In: ACM SIGSOFT Software Engineering Notes, vol. 20. ACM, pp. 259–262.
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E., 2008. Fast unfolding of communities in large networks. J. Stat. Mech.: Theor. Exper. 2008 (10), P10008.
Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C., Gómez-Gardeñes, J., Romance, M., 2014. The structure and dynamics of multilayer networks. Phys. Rep. 544 (1), 1–122.
Boetticher, G., Menzies, T., Ostrand, T., 2007. Promise Repository of Empirical Software Engineering Data. Department of Computer Science, West Virginia University.
Briand, L.C., Bunse, C., Daly, J.W., 2001. A controlled experiment for evaluating quality guidelines on the maintainability of object-oriented designs. IEEE Trans. Softw. Eng. 27 (6), 513–530.
Briand, L.C., Daly, J.W., Wüst, J., 1998. A unified framework for cohesion measurement in object-oriented systems. Empir. Softw. Eng. 3 (1), 65–117.
Cai, K.-Y., Yin, B.-B., 2009. Software execution processes as an evolving complex network. Inform. Sci. 179 (12), 1903–1928.
Chakrabarti, D., Faloutsos, C., 2006. Graph mining: Laws, generators, and algorithms. ACM Comput. Survey (CSUR) 38 (1), 2.
Chen, Z., Zhou, Y., Xu, B., Zhao, J., Yang, H., 2002. A novel approach to measuring class cohesion based on dependence analysis. In: Proceedings of the International Conference on Software Maintenance, 2002. IEEE, pp. 377–384.
Chidamber, S.R., Kemerer, C.F., 1991. Towards a metrics suite for object oriented design. In: Conference Proceedings on Object-oriented Programming Systems, Languages, and Applications, vol. 26. ACM, Phoenix, Arizona, USA, pp. 197–211.
Chidamber, S.R., Kemerer, C.F., 1994. A metrics suite for object oriented design. IEEE Trans. Softw. Eng. 20 (6), 476–493.
Clauset, A., Newman, M.E., Moore, C., 2004. Finding community structure in very large networks. Phys. Rev. E 70 (6), 066111.
Concas, G., Monni, C., Orru, M., Tonelli, R., 2013. A study of the community structure of a complex software network. In: 2013 4th International Workshop on Emerging Trends in Software Metrics (WETSoM). IEEE, pp. 14–20.
Danon, L., Diaz-Guilera, A., Duch, J., Arenas, A., 2005. Comparing community structure identification. J. Stat. Mech.: Theor. Exper. 2005 (09), P09008.
Dill, S., Kumar, R., McCurley, K.S., Rajagopalan, S., Sivakumar, D., Tomkins, A., 2002. Self-similarity in the web. ACM Trans. Internet Technol. (TOIT) 2 (3), 205–223.
Dunn, R., Dudbridge, F., Sanderson, C.M., 2005. The use of edge-betweenness clustering to investigate biological function in protein interaction networks. BMC Bioinform. 6 (1), 39.
Flake, G.W., Lawrence, S., Giles, C.L., 2000. Efficient identification of web communities. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 150–160.
Fortunato, S., 2010. Community detection in graphs. Phys. Rep. 486 (3), 75–174.
Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D., 1999. Refactoring: Improving the Design of Existing Code. Addison Wesley.
Girvan, M., Newman, M.E., 2002. Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99 (12), 7821–7826.
Good, B.H., de Montjoye, Y.-A., Clauset, A., 2010. Performance of modularity maximization in practical contexts. Phys. Rev. E 81 (4), 046106.
Graham, S.L., Kessler, P.B., Mckusick, M.K., 1982. Gprof: A call graph execution profiler. In: ACM Sigplan Notices, vol. 17. ACM, pp. 120–126.
Guimera, R., Sales-Pardo, M., Amaral, L.A.N., 2004. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 70 (2), 025101.
Gyimothy, T., Ferenc, R., Siket, I., 2005. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans. Softw. Eng. 31 (10), 897–910.
Hanley, J.A., McNeil, B.J., 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143 (1), 29–36.
Hosmer Jr, D.W., Lemeshow, S., 2004. Applied Logistic Regression. John Wiley & Sons.
Hu, Y., Nie, Y., Yang, H., Cheng, J., Fan, Y., Di, Z., 2010. Measuring the significance of community structure in complex networks. Phys. Rev. E 82 (6), 066106.
Jin, R.K.-X., Parkes, D.C., Wolfe, P.J., 2007. Analysis of bidding networks in eBay: Aggregate preference identification through community detection. In: Proceedings of AAAI Workshop on Plan, Activity and Intent Recognition (PAIR). AAAI, pp. 66–73.
Jonsson, P.F., Cavanna, T., Zicha, D., Bates, P.A., 2006. Cluster analysis of networks generated through homology: Automatic identification of important protein communities involved in cancer metastasis. BMC Bioinform. 7 (1), 2.
Karrer, B., Levina, E., Newman, M., 2008. Robustness of community structure in networks. Phys. Rev. E 77 (4), 046119.
Lancichinetti, A., Radicchi, F., Ramasco, J.J., 2010. Statistical significance of communities in networks. Phys. Rev. E 81 (4), 046110.
Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W., 2008. Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th International Conference on World Wide Web. ACM, pp. 695–704.
Li, H., Zhao, H., Cai, W., Xu, J.-Q., Ai, J., 2013. A modular attachment mechanism for software network evolution. Phys. A: Stat. Mech. Appl. 392 (9), 2025–2037.
Liu, Y., Poshyvanyk, D., Ferenc, R., Gyimóthy, T., Chrisochoides, N., 2009. Modeling class cohesion as mixtures of latent topics. In: IEEE International Conference on Software Maintenance, 2009 (ICSM'09). IEEE, pp. 233–242.
Louridas, P., Spinellis, D., Vlachos, V., 2008. Power laws in software. ACM Trans. Softw. Eng. Methodol. (TOSEM) 18 (1), 2.
Marcus, A., Poshyvanyk, D., Ferenc, R., 2008. Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Trans. Softw. Eng. 34 (2), 287–300.
Mucha, P.J., Richardson, T., Macon, K., Porter, M.A., Onnela, J.-P., 2010. Community structure in time-dependent, multiscale, and multiplex networks. Science 328 (5980), 876–878.
Myers, C.R., 2003. Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs. Phys. Rev. E 68 (4), 046116.
Nagelkerke, N.J., 1991. A note on a general definition of the coefficient of determination. Biometrika 78 (3), 691–692.
Newman, M.E., Girvan, M., 2004. Finding and evaluating community structure in networks. Phys. Rev. E 69 (2), 026113.
Olson, D.L., Delen, D., 2008. Advanced Data Mining Techniques. Springer.
Palla, G., Derényi, I., Farkas, I., Vicsek, T., 2005. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435 (7043), 814–818.
Pan, W., Li, B., Ma, Y., Liu, J., 2011. Multi-granularity evolution analysis of software using complex network theory. J. Syst. Sci. Complex. 24 (6), 1068–1082.
Pearson, K., 1901. LIII. On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philos. Mag. J. Sci. 2 (11), 559–572.
Potanin, A., Noble, J., Frean, M., Biddle, R., 2005. Scale-free geometry in OO programs. Commun. ACM 48 (5), 99–103.
Pressman, R.S., 2010. Software Engineering: A Practitioner's Approach. McGraw-Hill.
Qu, Y., Guan, X., Zheng, Q., Liu, T., Zhou, J., Li, J., 2015. Calling network: A new method for modeling software runtime behaviors. ACM SIGSOFT Softw. Eng. Notes 40 (1), 1–8. doi:10.1145/2693208.2693223.
Raghavan, U.N., Albert, R., Kumara, S., 2007. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76 (3), 036106.
Reichardt, J., Bornholdt, S., 2007. Clustering of sparse data via network communities: a prototype study of a large online market. J. Stat. Mech.: Theor. Exper. 2007 (06), P06016.
Rosvall, M., Bergstrom, C.T., 2008. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105 (4), 1118–1123.
Scanniello, G., Gravino, C., Marcus, A., Menzies, T., 2013. Class level fault prediction using software clustering. In: 2013 IEEE/ACM 28th International Conference on Automated Software Engineering (ASE). IEEE, pp. 640–645.
Sellers, B.H., 1996. Object-Oriented Metrics: Measures of Complexity. Prentice Hall.
Shannon, C.E., 2001. A mathematical theory of communication. ACM SIGMOBILE Mobile Comput. Commun. Rev. 5 (1), 3–55.
Spearman, C., 1904. The proof and measurement of association between two things. Amer. J. Psychol. 15 (1), 72–101.
Šubelj, L., Bajec, M., 2011. Community structure of complex software systems: Analysis and applications. Phys. A: Stat. Mech. Appl. 390 (16), 2968–2975.
Šubelj, L., Bajec, M., 2012. Software systems through complex networks science: Review, analysis and applications. In: Proceedings of the First International Workshop on Software Mining. ACM, pp. 9–16.
Šubelj, L., Žitnik, S., Blagus, N., Bajec, M., 2014. Node mixing and group structure of complex software networks. Adv. Comp. Syst. 17, 1450022.
Turnu, I., Concas, G., Marchesi, M., Pinna, S., Tonelli, R., 2011. A modified Yule process to model the evolution of some object-oriented system properties. Inform. Sci. 181 (4), 883–902.
Valverde, S., Solé, R.V., 2007. Hierarchical small worlds in software architecture. Dynam. Cont. Discr. Impul. Syst.: Ser. B 14, 1–11.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of 'small-world' networks. Nature 393 (6684), 440–442.
Zhou, Y., Lu, J., Xu, H.L.B., 2004. A comparative study of graph theory-based class cohesion measures. ACM SIGSOFT Softw. Eng. Notes 29 (2), 1–6.

Yu Qu received the B.S. degree from the School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China in 2006. He is currently a Ph.D. candidate at the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University. His research interests include trustworthy software and applying complex network and data mining theories to analyzing software systems.

Xiaohong Guan received the B.S. and M.S. degrees from Tsinghua University, Beijing, China in 1982 and 1985 respectively, and his Ph.D. degree from the University of Connecticut in 1993. He was with the Division of Engineering and Applied Science, Harvard University from 1999 to 2000. He is the Cheung Kong Professor of Systems Engineering and the Dean of the School of Electronic and Information Engineering, Xi'an Jiaotong University. He is also the Director of the Center for Intelligent and Networked Systems, Tsinghua University, and served as the Head of the Department of Automation, 2003–2008. His research interests include cyber-physical systems and network security.

Qinghua Zheng received the B.S. and M.S. degrees in computer science and technology from Xi'an Jiaotong University, Xi'an, China in 1990 and 1993, respectively, and his Ph.D. degree in systems engineering from the same university in 1997. He was a postdoctoral researcher at Harvard University in 2002. Since 1995 he has been with the Department of Computer Science and Technology at Xi'an Jiaotong University, and was appointed director of the Department in 2008 and Cheung Kong Professor in 2009. His research interests include intelligent e-learning and trustworthy software.

Ting Liu received the B.S. and Ph.D. degrees from Xi'an Jiaotong University, Xi'an, China in 2003 and 2010 respectively. He is an associate professor in systems engineering at Xi'an Jiaotong University. His research interests include cyber-physical systems, network security and trustworthy software.

Lidan Wang received the B.S. degree from the School of Software Engineering, Xidian University, Xi'an, China in 2013. She is currently an M.S. candidate at the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University. Her research interests include trustworthy software and software engineering.

Yuqiao Hou received the B.S. degree from the School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China in 2012. She is currently an M.S. candidate at the Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University. Her research interests include trustworthy software and software engineering.

Zijiang Yang is an associate professor in computer science at Western Michigan University. He holds a Ph.D. degree from the University of Pennsylvania, an M.S. degree from Rice University and a B.S. degree from the University of Science and Technology of China. Before joining WMU he was an associate research staff member at NEC Labs America. He was also a visiting professor at the University of Michigan from 2009 to 2013. His research interests are in the area of software engineering with a primary focus on the testing, debugging and verification of software systems. He is a senior member of IEEE.