DESIGNDIFF: Continuously Modeling Software Design Difference from Code Revisions
Total Page:16
File Type:pdf, Size:1020Kb
DESIGNDIFF: Continuously Modeling Software Design Difference from Code Revisions Xiao Wang Lu Xiao Kaifeng Huang Stevens Institute of Technology Stevens Institute of Technology Fudan University Hoboken, United States Hoboken, United States Shanghai, China [email protected] [email protected] [email protected] Bihuan Chen Yutong Zhao Yang Liu Fudan University Stevens Institute of Technology Nanyang Technological University Shanghai, China Hoboken, United States Singapore [email protected] [email protected] [email protected] Abstract—The design structure of a system continuously refactoring following the boy-scout rule: Leave your code better evolves as the consequence of fast-paced code revisions. Agile than you found it. Therefore, developers need a way to quickly techniques, such as continuous testing, ensures the function goals and correctly understand the high-level design differences of a system with every code revision. However, there lacks an efficient approach that can continuously model the design resulting from their own and others’ code revisions. Previous difference resulting from every single code revision to facili- research has developed various techniques to recover the overall tate comprehension and ensure the design quality. This paper design of a software system to improve understanding [10]– contributes a novel design modeling approach, called Design [18]. The limitation is that existing approaches scan the entire Differencing (DESIGNDIFF), that models and visualizes the high- code base to build the design model, and thus are too expensive level design differences resulting from every code revision. This paper defines a complete and general set of 17 design change to apply after every code revision in fast-paced development operators to capture the design difference from any code revision. process. In fact, a code revision usually only alters (if at all) a We evaluated the potential of DESIGNDIFF in three aspects. First, very small part of the design. We argue that it is more efficient a user study of 10 developers indicated that DESIGNDIFF can help to model the design difference by focusing on the changed practitioners to faster and better understand high-level design parts in an incremental fashion. differences from real-life software commits. Second, DESIGN- DIFF analyzed 14,832 real-life commits in five real-life projects: In addition, developers often unintentionally introduce design finding 4,189 commits altered the software design, 855 commits flaws, which are formed by problematic connections among introduced and 337 commits eliminated design flaws. The latency software elements. Previous research has revealed typical design between the flaw introduction and elimination is on average 2 flaws that recur in different projects [19]–[21]. For example, months to 2 years! With an affordable performance overhead, Cyclic Dependency is a group of source files that form a DESIGNDIFF has great potential to benefit practitioners in more applications. dependency cycle. Strong evidences from dozens of real-life Index Terms—software architecture, architecture flaws, reverse projects, both open source and industry, reveal that typical engineering design flaws are responsible for poor maintenance quality and high long-term costs [19], [21]–[23]. The limitation is that I. INTRODUCTION these design flaws usually have already caused significant loss During the past decades, agile methods have gained preva- to the system when they can be detected by existing approaches. lence to enable fast-paced software development [1]–[5]. In practice, due to limited resources, the design flaws were The design structure of a system continuously evolves as seldom eliminated by refactoring and continued causing more the consequence of fast-paced code revisions. Developers loss to the project [24], [25]. We argue that a more effective often unintentionally introduce design flaws that violate well solution is to instantly identify the introduction (as well as the accepted design principles. Overtime, the system becomes ever elimination) of these design flaws with rapid code revisions. challenging to understand and maintain [6]. Agile techniques, This paper contributes an efficient approach, called Design such as continuous testing [7]–[9], ensures the function goals Differencing (DESIGNDIFF). It automatically and efficiently of a system with every code revision. However, there still lacks models and visualizes the essential design differences resulting an efficient approach that can continuously model the design from any code revision as a sequence of design change difference resulting from every single code revision to facilitate operators. This paper defines a comprehensive set of 17 comprehension and ensure the design quality. fundamental design change operators. These operators, applied Many agile methods advocate collective code ownership in combinations and/or in sequences, can capture any drastically and continuous refactoring for enabling the “Built-In Quality” design structure change in a system. The unique advantage of principle. The development team should perform Litter-Pickup our approach is that it focuses on modeling the changed parts, instead of re-modeling the entire design structure of a system. software system as a square matrix [34]–[37]. The rows and As such, it is affordable to be applied after every code commit. columns in a DSM are labeled by the same set of software DESIGNDIFF can also help to instantly detect the introduction design elements in the same order. A cell along the diagonal and elimination of design flaws with rapid code revisions. This represents self-dependency, and a non-empty off-diagonal cell paper contributes the corresponding tool that automatically captures the dependency between the element on the row and generates such design differences and visualization, taking a the element on the column. In Java, there are nine common commit ID and the code base of the system as inputs. The tool types of structural dependencies among source files, extracted developed and data used in this paper can be found here 1. by Understand [32]. In common programming languages like 1) Can DESIGNDIFF facilitate the understanding of the Java and C++, each file usually contains the definition of one high-level design differences resulting from real-life software major class, therefore we use “class” and “file” interchangeably commits? A user study involving 10 developers reviewing for the ease of presentation. Each dependency type is explained 20 real-life software commits of different complexity levels as follows. indicates that DESIGNDIFF can help improve the completeness • fa Implement fb: fa (class) implements fb (interface) and correctness of understanding with less time. In the study, • fa Extend fb: fa (child class) extends fb (parent class) we compared with the code diff viewer from GitHub [26] • fa Call fb: fa calls the methods declared in fb as the baseline. The rationale is that 1) GitHub is the most • fa Throw fb: fa throws an exception of the type of fb accessible tool that highlight code difference in every single • fa Cast fb: fa is casted into the type of fb code commit; and 2) to the best of our knowledge, there • fa Create fb: fa creates an instance of fb is no other available tool that can compute the high-level • fa Typed fb: fa declares a variable of the type of fb design difference from every single code revision. Therefore, • fa Use fb: fa uses a value assigned from the type of fb we believe that DESIGNDIFF can be a handy tool for developers • fa Import fb: fb is imported at the header of fa to perform design review for every single code commit. Fig. 1a presents an example of a file-level DSM of the Maze 2) Can DESIGNDIFF help to instantly detect the introduction Game program (see Section III-A). Cell[1,7] says “Tp, Cr”. It and elimination of design flaws with every single code revision? means that file 1, Door, declares a parameter type of (Typed) We applied DESIGNDIFF on a total of 14,832 commits from five and create an instance of (Create) file 7, Room. open source projects on the Apache Software Foundation [27]. b) Modular Operators We found that only 4,189 (28%) commits altered the design According to the design rule theory proposed by Baldwin structure of a system. DESIGNDIFF further detected that 855 and Clark [33], any complex system is composed of high- (20%) of the 4,189 design structure altering commits introduced level design rules and modules. They proposed to capture four types of architectural flaws, including cyclic dependency, the dynamics of a modular structure by six simple modular parent calls child, sibling call each other and declare child operators, which are a powerful set of conceptual tools to class which violated the SOLID design principles [28], [29]. capture the dynamics of a modular design. These six modular On the contrary, only 337 (8%) commits fixed the flaws. The operators are: latency between the flaw introduction and elimination is on • Augmenting: add a new module to a system average 2 months to 2 years! This indicates that DESIGNDIFF • Excluding: remove a module from a system can help to safeguard the design quality by providing instant • Porting: a module ports the functions from another module warning to developers when design flaws are first introduced. • Inverting: create a new design rule from existing module(s) 3) Finally, the overhead of DESIGNDIFF is averagely from • Splitting: break a module into sub-modules 13 to 29 seconds for analyzing commits in different sizes, • Substituting: replace one module by another module which is affordable to be employed for analyzing every single change. These six operators form the theory of design evolution. They can be applied in combinations and in sequences to drastically II. BACKGROUND change the design structure of a system. In this section, we introduce the fundamental background. III.T HE DESIGNDIFF APPROACH a) Dependency Graph and Design Structure Matrix The Design Differencing (DESIGNDIFF), which can incre- A dependency graph captures the structural dependencies mentally model and visualize the design structure differences among software design elements [30].