The Impact of Programming Languages in Code Cloning

Total Page:16

File Type:pdf, Size:1020Kb

The Impact of Programming Languages in Code Cloning The Impact of Programming Languages in Code Cloning Jaime Filipe Jorge1 and António Menezes Leitão2 1 Instituto Superior Técnico Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal [email protected] 2 Instituto Superior Técnico Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal [email protected] Abstract Code cloning is a duplication of source code fragments that frequently occurs in large software systems. Although different studies exist that evidence cloning benefits, several others expose its harmfulness, specifically upon inconsistent clone management. One important cause for the creation of software clones is the inherent abstraction capabilities and terseness of the programming language being used. This paper focuses on the features of two different programming languages, namely Java and Scala, and studies how different language constructs can induce or reduce code cloning. This study was further developed using our tool Kamino which provided clone detection and concrete values. 1998 ACM Subject Classification K.6.3 Software Management Keywords and phrases Clone Detection, Software Engineering, Programming Languages, Soft- ware Management Digital Object Identifier 10.4230/OASIcs.SLATE.2012.107 1 Introduction There are a number of bad smells in the process of writing software systems[1]. Code cloning is one of them. A code clone is a duplication of source code where programs are copied from one location and subsequently pasted onto other locations. Studies found that code cloning is ubiquitous in software systems[2][3][4]. Some authors consider cloning to be harmful to software development [2][3][5][6]. A study[7] reports that in some systems half of the changes to code clone groups are inconsistent and that corrective changes following inconsistent changes are rare. Its existence may be due to a number of reasons. Clones may be caused by programmer inexperience or programming methodology. In this paper we will focus in programming language constructs as causes for code cloning. Subject to our study are two programming languages: Java and Scala. The Java pro- gramming language is one of the de facto standards for writing object-oriented systems. Its language constructs are result of years of best practices and it is still passing the test of time by having a large of companies using it. We will provide insights as to how these language constructs influence code cloning. Scala is a more recent statically typed JVM language that provides different abstraction mechanisms, which are a result of learning from Java’s wrong and right doings. © Jaime Filipe Jorge and António Menezes Leitão; licensed under Creative Commons License NC-ND 1st Symposium on Languages, Applications and Technologies (SLATE’12). Editors: Alberto Simões, Ricardo Queirós, Daniela da Cruz; pp. 107–122 OpenAccess Series in Informatics Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany 108 The Impact of Programming Languages in Code Cloning We provide clone analysis using these two programming languages and we will conclude how abstraction capabilities and verbosity/terseness influence code cloning. This study originated from the development of our tool, Kamino, which is a software clone evolution analyzer that provides developer with means to acknowledge clone duplication, observe and mitigate its consequences. The rest of the document is structured as follows: section 2 describes software clones and programming language impact, section 3 describes a case study where we instantiate some of the Java problems, section 4 describes in detail some of the problematic situations which fuel software clones and how they can be mitigated using Scala, section 5 will map the scenarios described earlier to real values obtained using our tool Kamino and section 6 presents our conclusions. 2 Software Clones A code clone is a duplication of source code in which fragments are copied from one location and subsequently pasted onto other locations, which can then be modified or not. Upon duplication, it is often difficult to differentiate the original fragments from duplicated ones, hence the two fragments are uniquely called clones. Studies show that cloning is ubiquitous in software systems. For instance, Baker et al.[2] found that, on large systems, 13% to 20% of source code can be cloned code. Baxter et al.[3] reported 12.7% source code clones whereas Mayrand et al.[4] experienced 5% to 20%. Well known software systems have been subject to clone study. In terms of their per- centage of code cloning, GCC averages 8,7%[8], X Windows has 19%[2] whereas Linux and JDK have , respectively, 22,7% and 29%[9]. From the developers perspective, there are a number of factors which propel code cloning. On the one hand, we can think of limitations inherent to the programmer himself. For example, when he needs to work with a new system. Whether because it is a large system (which induces difficulty in mastering its domain) or the programmer is dealing with code which he does not own, both aspects tend to promote a kind of example-driven programming approach, in which developers copy fragments of existing functionality and modify them to solve the problem at hands, i.e., each developer’s task. Logically, this methodology enables clone propagation. In what regards programming methodology, there are also reasons which motivate clon- ing. Systems that rely heavily on generated code (generative programming), tend to have higher clone numbers, since they actually enforce a template which can be seen as a frame- work of a clone. Programming languages can ultimately induce code redundancy. Less expressive lan- guages promote code cloning by requiring greater verbosity and need for boilerplate in programming. This code verbosity may hinder the original developers intent. Other prob- lem related to programming languages are lack of reusing mechanisms. Examples of such mechanisms are inheritance composition, generic functions, higher order functions, macros, type classes, etc. Upon nonexistence of such mechanisms, programmers are motivated to copy and paste code. Software duplication has a function of cost, i.e. , implementing a refactored alternative to a cloned code fragment is not always clearly beneficial [10]. This may be caused by difficulty in creating or understanding the refactored version or by unwanted size increase that it would imply in the system. Both of these causes can be correlated to the original premise that programming languages may induce cloning namely because refactoring may be complex to J. F. Jorge and A. M. Leitão 109 create/understand and may create larger versions of the original cloned fragments. Hence, we can also create the parameters on which we can grade programming languages, based on their capabilities towards mitigating code cloning: Abstraction Capabilities This criteria judges the idiomatic and simple nature of abstraction possibilities to given cloned fragments in the programming language itself. It may be possible to create such abstraction constructs but they may not be valuable in the cost function described above (namely in complexity in construction and understanding). These abstraction capabilities may include method extractions, inheritance, mixins, high- order functions or anonymous inner classes, duck typing, etc. Language Verbosity This criteria evaluates the size of a refactored alternative to cloning as well as the overall boilerplate syntax required to implement a given feature. These map to the cost function described earlier: if a developer prefers cloning to a refactored version that is larger in size, then one can infer that the language is proliferating code cloning by being verbose. To instantiate these concerns, we now present a case study of code cloning in which we observe problematic situations of lack of abstraction and a surplus of verbosity. 3 Case Study To motivate the study of programming languages as fueling agents for software clones we now provide a case study of a mature Java project. Containing approximately 7000 Java classes, with a grand total of 775921 LoC. The project was subject to a clone detection experiment, using our tool Kamino, which analyzed several hundred revisions and provided feedback on possible inconsistent cloning situations. We used a clone detector named Simian1 to provide information about clone detection. Simian is a text based clone detector that provides fast execution and some level of token insensitivity (i.e. mark two code fragments as equal even though they present different text tokens). More specifically, Simian found 57672 duplicate lines in 5230 blocks in 2031 files and processed a total of 385622 significant (775921 raw) lines in 7136 files. Dividing the duplic- ated lines by the total lines we 7.4%. Furthermore, if we divide the number of classes that contain duplicated code by the total number of Java classes, we obtain 28.5%. Hence, our study using Simian found that, 7.4% of all lines codes are duplicated and that 28.5% Program 1 shows two classes where code duplication is evident. We can observe that there are two major types of similarities: (1) Classes have the same amount of duplicate fragments with the only difference being the type of the object in consideration (ChangeCanProc on the left, TransferCanProc on the right); (2) if we compare the methods (of either right and left programs) getValidCanProc() and getValidExtCanProc(), we can observe that the only difference resides in the if condition: one has the negation of the other’s boolean expression. We can also
Recommended publications
  • Technological Advancement in Object Oriented Programming Paradigm for Software Development
    International Journal of Applied Engineering Research ISSN 0973-4562 Volume 14, Number 8 (2019) pp. 1835-1841 © Research India Publications. http://www.ripublication.com Technological Advancement in Object Oriented Programming Paradigm for Software Development Achi Ifeanyi Isaiah1, Agwu Chukwuemeka Odi2, Alo Uzoma Rita3, Anikwe Chioma Verginia4, Okemiri Henry Anaya5 1Department of Maths/Comp Sci/Stats/Info., Faculty of science, Alex Ekwueme University, Ndufu-Alike 2Department of Computer Science, Ebonyi State University-Abakaliki. 3Alex Ekwueme University, Ndufu-Alike, 4Department of Maths/Comp Sci/Stats/Info., Faculty of science, Alex Ekwueme University, Ndufu-Alike 5Department of Maths/Comp Sci/Stats/Info., Faculty of science, Alex Ekwueme University, Ndufu-Alike Abstract and personalization problems. Take for instance, a lot of sophisticated apps are being produced and release in the Object oriented programming paradigm in software market today through the application of OOP. Almost desktop development is one of the most popular methods in the apps are being converted to mobile apps through Java, C++, information technology industry and academia as well as in PHP & MySQL, R, Python etc platform which form the many other forms of engineering design. Software testimony of OOP in the software industries. Software development is a field of engineering that came into existence developer has been revolving using procedural language for owing to the various problems that developers of software the past decade before the advent of OOP. The relationships faced while developing software projects. This paper analyzes between them is that procedural languages focus on the some of the most important technological innovations in algorithm but OOP focuses on the object model itself, object oriented software engineering in recent times.
    [Show full text]
  • Eliminating the Call Stack to Save RAM
    Eliminating the Call Stack to Save RAM Xuejun Yang Nathan Cooprider John Regehr University of Utah BAE Systems AIT University of Utah [email protected] [email protected] [email protected] Abstract it is likely to use some of its RAM to store one or more call stacks: Most programming languages support a call stack in the program- chains of activation records representing the state of executing ming model and also in the runtime system. We show that for appli- functions. From the point of view of the compiler, targeting a run- cations targeting low-power embedded microcontrollers (MCUs), time system based on a callstack has important benefits. First, the RAM usage can be significantly decreased by partially or com- stack is memory-efficient: variables live only as long as they are pletely eliminating the runtime callstack. We present flattening, needed. Second, the stack is time-efficient, supporting allocation a transformation that absorbs a function into its caller, replacing and deallocation in a handful of clock cycles. Third, compilers can function invocations and returns with jumps. Unlike inlining, flat- rapidly generate high-quality code by compiling one function at tening does not duplicate the bodies of functions that have multiple a time; the small size of the typical function makes it possible to callsites. Applied aggressively, flattening results in stack elimina- run aggressive intraprocedural optimizations while also achieving tion. Flattening is most useful in conjunction with a lifting transfor- short compile times. mation that moves global variables into a local scope. Our thesis is that despite these well-known benefits: Partially or Flattening and lifting can save RAM.
    [Show full text]
  • Merging Equivalent Contexts for Scalable Heap-Cloning-Based Context-Sensitive Points-To Analysis∗
    Merging Equivalent Contexts for Scalable Heap-Cloning-Based Context-Sensitive Points-to Analysis∗ Guoqing Xu Atanas Rountev Computer Science and Engineering Computer Science and Engineering Ohio State University Ohio State University [email protected] [email protected] ABSTRACT General Terms A context-sensitive points-to analysis maintains separate points- Algorithms, measurement, experimentation to relationships for each possible (abstract) calling context of a method. Previous work has shown that a large number of equiv- Keywords alence classes exists in the representation of calling contexts. Such Pointer analysis, points-to analysis, context sensitivity equivalent contexts provide opportunities for context-sensitive anal- yses based on binary decision diagrams (BDDs), in which BDDs 1. INTRODUCTION automatically merge equivalent points-to relationships. However, Context sensitivity in points-to analyses has been studied exten- the use of a BDD “black box” introduces additional overhead for sively; some of this work is summarized in [11, 8, 28]. Most such analysis running time. Furthermore, with heap cloning (i.e., using analyses compute a complete points-to solution for all variables in context-sensitive object allocation sites), BDDs are not as effective a program. Such algorithms usually have to sacrifice analysis pre- because the number of equivalence classes increases significantly. cision for practical scalability. An alternative is a refinement-based A further step must be taken to look inside the BDD black box to approach that performs points-to analysis by answering pointer- investigate where the equivalence comes from, and what tradeoffs related queries raised by a compiler or a client analysis on demand can be employed to enable practical large-scale heap cloning.
    [Show full text]
  • Preview ES6 Tutorial (PDF Version)
    ES6 ES6 About the Tutorial European Computer Manufacturers Association (ECMAScript) or (ES) is a standard for scripting languages like JavaScript, ActionScript and JScript. It was initially created to standardize JavaScript, which is the most popular implementation of ECMAScript. This tutorial adopts a simple and practical approach through JavaScript to describe the new features in ECMAScript 2015 (ES6), ECMAScript 2016 (ES7), ECMAScript 2017(ES8) and ECMAScript 2018 (ES9). Audience This tutorial is designed for the stemplate oftware programmers who have already worked with JavaScript and wishes to gain in-depth knowledge about the ECMAScript. The tutorial will give you enough understanding on the functionalities of ECMAScript and also about ES6, ES7, ES8 and ES9. Prerequisites An understanding of JavaScript programming concepts is necessary to gain maximum knowledge from this tutorial. Disclaimer & Copyright Copyright 2019 by Tutorials Point (I) Pvt. Ltd. All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish any contents or a part of contents of this e-book in any manner without written consent of the publisher. We strive to update the contents of our website and tutorials as timely and as precisely as possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at [email protected].
    [Show full text]
  • Difference Between Shallow and Reference Copy Java
    Difference Between Shallow And Reference Copy Java Noland is betwixt venous after unescapable Emmott take-overs his menhirs handsomely. Garold is overnice and hum altogether as faithless Brant fulminate reprehensively and Atticize astuciously. Fourfold Thom bestialising or kyanized some decreets meteorically, however flavourless Torrey gardens jovially or induct. How to convert map stream method as the previous example illustrated in origional object variable is shallow and using serializable and both reference of the object to Depth: Become a Complete Java Engineer! If we use static fields in java and copy. Subscribe to submit your print the difference between shallow reference and copy? It is not going to work correctly if we apply the same method to arrays. Modifications to original object when b are correct me of singleton pattern, lazy copying comes down arrows to become a reference between and shallow copy java. If one of the properties in the original object is an object itself, are we fine now? Later modifications to the content of either are immediately reflected on the contents of ALL other lists. Please enter a valid date. Suitable links to external documentation for common libraries. How To Remove White Spaces from String In Java? Net Shallow copy and deep copy are used for copying data between objects. What is transient variable? In this way you can retrieve the masks in case you need them in the future. Unlike shallow copy, once created, with your way you do not need to pass the object being copied. Joey, a bit by bit copy of the field is performed.
    [Show full text]
  • Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams
    Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams John Whaley Monica S. Lam Computer Science Department Stanford University Stanford, CA 94305 {jwhaley, lam}@stanford.edu ABSTRACT 1. INTRODUCTION This paper presents the first scalable context-sensitive, inclusion- Many applications of program analysis, such as program opti- based pointer alias analysis for Java programs. Our approach to mization, parallelization, error detection and program understand- context sensitivity is to create a clone of a method for every con- ing, need pointer alias information. Scalable pointer analyses text of interest, and run a context-insensitive algorithm over the ex- developed to date are imprecise because they are either context- panded call graph to get context-sensitive results. For precision, insensitive[3, 17, 19, 33] or unification-based[15, 16]. A context- we generate a clone for every acyclic path through a program’s call insensitive analysis does not distinguish between different calling graph, treating methods in a strongly connected component as a sin- contexts of a method and allows information from one caller to gle node. Normally, this formulation is hopelessly intractable as a propagate erroneously to another caller of the same method. In call graph often has 1014 acyclic paths or more. We show that these unification-based approaches, pointers are assumed to be either un- exponential relations can be computed efficiently using binary de- aliased or are pointing to the same set of locations[28]. In contrast, cision diagrams (BDDs). Key to the scalability of the technique is inclusion-based approaches are more efficient but also more expen- a context numbering scheme that exposes the commonalities across sive, as they allow two aliased pointers to point to overlapping but contexts.
    [Show full text]
  • Applying Prototype-Based Programming Techniques for the Organization of Mobile Systems
    Vrije Universiteit Brussel Faculty of Sciences In Collaboration with Ecole´ des Mines de Nantes 2001-2002 Applying Prototype-Based Programming Techniques for the Organization of Mobile Systems A Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science (Thesis research conducted in the EMOOSE exchange project funded by the European Community) By: Boris Mej´ıas Promotor: Prof. Theo D’Hondt (VUB) Advisor: Wolfgang De Meuter (VUB) Abstract Agents succeeded as mobile entities that properly solved problems on mobile systems. They can move around the network in a transparent way, and communi- cate with other agents by sending messages. But the relationship between agents does not have a strong structure, and big multi agent systems become hard to maintain and to understand. The primary aim of this work is to provide a way to organize agents in mobile systems. Objects help us to have a structured way of programming, but if we want ob- jects moving around different address spaces, we need them to be self-supporting. When objects depends on a class, we also need to move the class everywhere the object goes, and to have a sophisticated system for type checking. We decided to use Prototype-Based Programming, a paradigm to program without classes, which has a proper way of reasoning about sharing, an important property for our purpose to organize the system. We extend Borg, a language to implement autonomous agents in a wide area network. The extension provides methods to create prototypes that can move and be extended through the network.
    [Show full text]
  • Sheep Cloning with Ownership Types
    Sheep Cloning with Ownership Types Paley Li Nicholas Cameron James Noble Victoria University of Mozilla Corporation Victoria University of Wellington [email protected] Wellington New Zealand New Zealand [email protected] [email protected] ABSTRACT There are cases when it is not obvious which cloning tech- Object-oriented programmers often need to clone objects. nique would produce the more suitable clone, and there are Mainstream languages, such as C# and Java, typically de- even cases when neither technique is suitable. In all of these fault to shallow cloning, which copies just one object and cases, languages tend to offer little support, forcing program- aliases references from that object. Other languages, such mers to design and implement a custom cloning implemen- as Eiffel, provide deep cloning. A deep clone is a copy of the tation themselves. entire object graph reachable from the cloned object, which could include many unnecessary objects. Alternatively, pro- Ownership types enforce the heap into a hierarchically struc- grammers can implement their own object cloning functions, tured, by introducing an owner object for every object [10]. however, this is often difficult and error prone. The term context is used to mean the formal set of objects owned by an object, and the term representation means Sheep Cloning is an automated cloning technique which uses the set of objects which are conceptually part of an object, ownership information to provide the benefits of both shal- therefore ownership types help describe an object's repre- low and deep cloning without the costs. We describe, for- sentation.
    [Show full text]
  • BYOB Version
    The Beauty and Joy of Computing Lab Exercise 10: Shall we play a game? Objectives By completing this lab exercise, you should learn to ● Understand and work with layering of multiple sprites (applied to make a side-scrolling background); ● Be able to use object-oriented programming as supported by BYOB; ● Program self-cloning “projectiles” for use in a game; and ● Merge separately developed programs into one larger program to support development by teams. Background (Pre-Lab Reading) You should read this section before coming to the lab. It describes how various things work in BYOB and provides pictures to show what they look like in BYOB. You only need to read this material to get familiar with it. There is no need to actually do the actions described in this section, but you certainly can do them if it would help you follow along with the examples. The purpose of this lab is to introduce a few new concepts and techniques that are important for developing larger projects, and games in particular, in BYOB. These are techniques that will be useful in many of your final projects, so are important to know and understand! Image Layers In the activities and examples we’ve done up until now, sprites have been separate and have not overlapped - what if that is not the case? For example, in the very first lab we added the dragon sprite to the initial Alonzo sprite, and both sprites started in the center of the screen and had to be dragged apart. But before we separated them, how did BYOB decide whether Alonzo or the Dragon should be displayed in those locations where they overlap? This uses the concept of layers, which is common now in graphics programs as well as in the drawing tools of programs like PowerPoint or Word.
    [Show full text]
  • SELF: the Power of Simplicity*
    To be published in: LISP AND SYMBOLIC COMPUTATION: An International Journal, 4, 3, 1991 © 1991 Kluwer Academic Publishers - Manufactured in The Netherlands SELF: The Power of Simplicity* DAVID UNGAR† ([email protected]) Computer Systems Laboratory, Stanford University, Stanford, California 94305 RANDALL B. SMITH† ([email protected]) Xerox Palo Alto Research Center, Palo Alto, California 94304 Abstract. SELF is an object-oriented language for exploratory programming based on a small number of simple and concrete ideas: prototypes, slots, and behavior. Prototypes combine inheritance and instantiation to provide a framework that is simpler and more flexible than most object-oriented languages. Slots unite variables and procedures into a single construct. This permits the inheritance hierarchy to take over the function of lexical scoping in conventional languages. Finally, because SELF does not distinguish state from behavior, it narrows the gaps between ordinary objects, procedures, and closures. SELF’s simplicity and expressiveness offer new insights into object- oriented computation. To thine own self be true. —William Shakespeare 1 Introduction Object-oriented programming languages are gaining acceptance, partly because they offer a useful perspective for designing computer programs. However, they do not all offer exactly the same perspective; there are many different ideas about the nature of object-oriented computation. In this paper, we present SELF, a program- ming language with a new perspective on objects and message passing. Like the 1 Smalltalk-80 language [6], SELF is designed to support exploratory programming *This work is partially supported by Xerox, and partially by National Science Foundation Presidential Young Investigator Grant #CCR-8657631, Sun Microsystems, the Powell Foundation, IBM, Apple Computer, DEC, NCR, Texas Instruments, and Cray Laboratories.
    [Show full text]
  • Client-Driven Pointer Analysis
    Client-Driven Pointer Analysis Samuel Z. Guyer and Calvin Lin Department of Computer Sciences, The University of Texas at Austin, Austin, TX 78712, USA, ¡ sammy,lin ¢ @cs.utexas.edu Abstract. This paper presents a new client-driven pointer analysis algorithm that automatically adjusts its precision in response to the needs of client analyses. We evaluate our algorithm on 18 real C programs, using five significant error detec- tion problems as clients. We compare the accuracy and performance of our al- gorithm against several commonly-used fixed-precision algorithms. We find that the client-driven approach effectively balances cost and precision, often produc- ing results as accurate as fixed-precision algorithms that are many times more costly. Our algorithm works because many client problems only need a small amount of extra precision applied to the right places in each input program. 1 Introduction Pointer analysis is critical for effectively analyzing programs written in languages like C, C++, and Java, which make heavy use of pointers and pointer-based data structures. Pointer analysis attempts to disambiguate indirect memory references, so that subse- quent compiler passes have a more accurate view of program behavior. In this sense, pointer analysis is not a stand-alone task: its purpose is to provide pointer information to other client analyses. Existing pointer analysis algorithms differ considerably in their precision. Previous research has generally agreed that more precise algorithms are often significantly more costly to compute, but has disagreed on whether more precise algorithms yield more accurate results, and whether these results are worth the additional cost [23, 22, 16, 10, 20].
    [Show full text]
  • Object Cloning for Ownership Systems
    Object Cloning for Ownership Systems by Paley Guangping Li A thesis submitted to the Victoria University of Wellington in fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science. Victoria University of Wellington 2015 Abstract Modern object-oriented programming languages frequently need the ability to clone, duplicate, and copy objects. The usual approaches taken by languages are rudimentary, primarily because these approaches op- erate with little understanding of the object being cloned. Deep cloning naively copies every object that has a reachable reference path from the object being cloned, even if the objects being copied have no innate rela- tionship with that object. For more sophisticated cloning operations, lan- guages usually only provide the capacity for programmers to define their own cloning operations for specific objects, and with no help from the type system. Sheep cloning is an automated operation that clones objects by lever- aging information about those objects’ structures, which the programmer imparts into their programs with ownership types. Ownership types are a language mechanism that defines an owner for every object in the pro- gram. Ownership types create a hierarchical structure for the heap. In this thesis, we construct an extensible formal model for an object- oriented language with ownership types (Core), and use it to explore dif- ferent formalisms of sheep cloning. We formalise three distinct operational semantics of sheep cloning, and for each approach we include proofs or proof outlines where appropriate, and provide a comparative analysis of each model’s benefits. Our main contribution is the descripSC formal model of sheep cloning and its proof of type soundness.
    [Show full text]