Holistic Analysis for Refactoring Python-Based Analytics Programs

HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs

Weijie Zhou, North Carolina State University, Raleigh, NC, [email protected]
Yue Zhao, Facebook, Menlo Park, CA, [email protected]
Guoqiang Zhang, North Carolina State University, Raleigh, NC, [email protected]
Xipeng Shen, North Carolina State University, Raleigh, NC, [email protected]

ABSTRACT

Modern machine learning programs are often written in Python, with the main computations specified through calls to highly optimized libraries (e.g., TensorFlow, PyTorch). Maximizing the computing efficiency of such programs is essential for many application domains and has drawn much recent attention. This work points out a common limitation in existing efforts: they focus only on the static computation graphs specified by library APIs and leave the influence of the hosting Python code largely unconsidered. The limitation often causes them to miss the big picture and hence many important optimization opportunities. This work proposes a new approach named HARP to address the problem. HARP enables holistic analysis that spans computation graphs and their hosting Python code. HARP achieves it through a set of novel techniques: analytics-conscious speculative analysis to circumvent Python complexities, a unified representation (augmented computation graphs) to capture all dimensions of knowledge related to the holistic analysis, and a conditioned feedback mechanism to allow risk-controlled aggressive analysis. Refactoring based on HARP gives speedups of 1.3–3X (2.07X on average) on a set of TensorFlow and PyTorch programs.

CCS CONCEPTS

• Software and its engineering → Automated static analysis; Data flow architectures; • Computing methodologies → Machine learning.

KEYWORDS

machine learning program, computation graph, dynamic language, program analysis

ACM Reference Format:
Weijie Zhou, Yue Zhao, Guoqiang Zhang, and Xipeng Shen. 2020. HARP: Holistic Analysis for Refactoring Python-Based Analytics Programs. In 42nd International Conference on Software Engineering (ICSE ’20), May 23–29, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3377811.3380434

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
© 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7121-6/20/05...$15.00

1 INTRODUCTION

For machine learning applications, computing efficiency is essential because of the growing scale of datasets and the need for timely responses in various applications. A pattern common in today’s analytics applications is Python+X, where the main computations are written with the APIs of some library X (e.g., TensorFlow, PyTorch) while the host code, which connects all the pieces together, is written in Python. We call them Python-based analytics programs.

Such programs have some common features. We draw on TensorFlow [1] as an example for explanation. Like some other packages, TensorFlow was initially introduced for a specific scope (Deep Learning), but then evolved for a broader scope (Analytics and Scientific Computing). At its core, TensorFlow builds on the dataflow programming model. In developing a TensorFlow program, a developer writes code by calling TensorFlow APIs to specify the intended computations, and makes a call to the TensorFlow runtime. The call triggers the execution of those APIs, which constructs a computation graph; the TensorFlow runtime optimizes the graph, and then executes it. This approach represents one of the popular programming paradigms used by modern machine learning and analytics frameworks.

Recent years have seen a number of efforts to enhance the performance of Python-based analytics programs. For example, the XLA project [18] and the R-Stream-TF project [31] try to improve the runtime performance of TensorFlow by using compiler techniques to transform dataflow graphs (e.g., fusing multiple operations).

However, despite these efforts, much room for performance improvement remains beyond the reach of current optimizers. We believe that one of the fundamental reasons is that current efforts have all focused on the dataflow graphs embodied by the invocations of library APIs in a program. Although dataflow graphs usually capture the core computations of an analytics program, limiting the view to them may lose the big picture that the host code provides, and hence many large-scoped optimization opportunities.

Listing 1 offers an example. This codelet is part of the core computations in Deep Dictionary Learning [24]. The first five lines of the code specify the core computations and hence the structure of the dataflow graph.
Listing 1: Example from Deep Dictionary Learning ("tf" for the namespace of TensorFlow)

    1  A = tf.Variable(tf.zeros(shape=[N, N]), dtype=tf.float32)
    2  D = tf.placeholder(shape=[N, N], dtype=tf.float32)
    3  X = tf.placeholder(shape=[N, N], dtype=tf.float32)
    4  R = tf.matmul(D, tf.subtract(X, tf.matmul(tf.transpose(D), A)))
    5  L = tf.assign(A, R)
    6  for i in range(Iter):
    7      result = sess.run(L, feed_dict={D: D_, X: X_})

Line 1 defines $A$ as a variable because its value needs to be updated throughout the execution; Lines 2 and 3 define $D$ and $X$ as placeholders because they will hold the input values; Line 4 specifies the formula for calculating the new value of $A$ through a series of calls to the high-level TensorFlow mathematics functions; Line 5 specifies an update to $A$ with the new value. Line 6 is a Python loop statement, and Line 7 calls the TensorFlow API "sess.run" (a blocking API) to invoke the TensorFlow runtime to actually construct the dataflow graph and execute it; it passes the actual parameters D_ and X_ to the two placeholders. The following pseudo-code shows what the codelet implements:

    for $i < Iter$:
        $A = D(X - D^{\top} A)$

where each of the capitalized letters represents a multidimensional array (called a tensor), and $D$ and $X$ are input tensors that remain constant across the loop iterations.

The implementation by the codelet in Listing 1 contains some redundant computations, which can be seen if we expand the mathematical formula in the pseudo-code to $DX - DD^{\top} A$.

Listing 2: Listing 1 in a Different Form

    ...
    t1 = tf.matmul(D, X)
    t2 = tf.matmul(D, tf.transpose(D))
    R = tf.subtract(t1, tf.matmul(t2, A))
    L = tf.assign(A, R)
    for i in range(Iter):
        result = sess.run(L, feed_dict={D: D_, X: X_})
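The algebraic expansion above can be checked numerically. The following is a minimal sketch, not code from the paper: it uses plain nested lists in place of tensors, and the helper names (`matmul`, `transpose`, `subtract`, `run_original`, `run_hoisted`) and the sample values are assumptions made for illustration. It compares the iteration as written in Listing 1 with a version that computes the two products involving only $D$ and $X$ a single time.

```python
# Pure-Python sketch (no TensorFlow): nested lists stand in for tensors.

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*P)]

def subtract(P, Q):
    """Element-wise difference of two equally shaped matrices."""
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def run_original(D, X, A, iters):
    """Evaluate A = D(X - D^T A) from scratch in every iteration."""
    for _ in range(iters):
        A = matmul(D, subtract(X, matmul(transpose(D), A)))
    return A

def run_hoisted(D, X, A, iters):
    """Compute DX and DD^T once, then iterate A = t1 - t2 A."""
    t1 = matmul(D, X)
    t2 = matmul(D, transpose(D))
    for _ in range(iters):
        A = subtract(t1, matmul(t2, A))
    return A

# Small integer inputs (illustrative values) keep the arithmetic exact.
D = [[1, 2], [0, 1]]
X = [[3, 1], [2, 4]]
A0 = [[0, 0], [0, 0]]

assert run_original(D, X, A0, 3) == run_hoisted(D, X, A0, 3)
```

Integer inputs are used so that the two forms agree exactly; with floating-point tensors the results would be expected to match only up to rounding error, since the operations are reassociated.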
Because the terms $DX$ and $DD^{\top}$ are invariant across the loop iterations, they can be hoisted outside the loop, computed once, and reused for all iterations, as follows:

    $t_1 = DX$
    $t_2 = DD^{\top}$
    for $i < Iter$:
        $A = t_1 - t_2 A$

Even though the redundant computations are not hard to see at the pseudo-code level, they cannot be recognized or optimized by either TensorFlow or any of the previously proposed optimizers designed for TensorFlow programs. Moreover, even if we rewrite the code to better expose the redundancy as in Listing 2, prior optimizers still cannot detect the redundant computations.

The reason is that all these tools focus only on the dataflow graph composed through the TensorFlow APIs, while the redundancy comes from the interplay between those TensorFlow API calls and the other Python code that hosts those calls. Take Listing 1 for instance: without analyzing the host Python code, it is impossible to tell that $D$ and $X$ are invariant across the for loop and hence that $DX$ and $DD^{\top}$ are also loop invariants, because the computation graph does not capture the loop in the host code or its relation to the data in the graph. On the other hand, general Python code optimizers cannot find the redundancy either, as they do not understand the necessary semantics of the TensorFlow APIs.

[...] dynamically-typed language, the complexities of which (listed later) form some major barriers for static optimizers to work effectively. A pure dynamic optimizer, on the other hand, is complicated to develop and causes runtime overhead and delays.

The strategy we finally took is to create a refactoring assistant, which provides suggestions for developers to use in refactoring the code. This strategy allows aggressive analysis on incomplete information. Even though the suggestions may not always be correct, the tool provides enough feedback on the assumptions it uses that developers can avoid the risks while taking advantage of the often correct suggestions. HARP is the first tool that enables holistic analysis that spans dataflow graphs and their hosting Python code. HARP achieves it through several novel features:

(1) Speculative analysis. HARP relies heavily on static analysis. As a dynamically-typed scripting language, Python poses many difficulties to static analysis, such as unknown types, operator overloading, dynamic dispatch of higher-order functions, and so on. HARP circumvents the complexities with two key insights: many Python complexities rarely appear in analytics programs or can be ignored; those that do matter can often be treated successfully through speculations on common patterns in analytics applications. HARP materializes the insights through speculative analysis.
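To make the role of the host code concrete, the sketch below uses Python's standard `ast` module to inspect a loop like the one in Listing 1 and report which values passed through `feed_dict` are never rebound inside the loop. This is only an illustration of the kind of information that host-code analysis can expose; it is not HARP's actual analysis, and the function names `names_assigned_in` and `invariant_feeds` are hypothetical. The check is deliberately speculative: it assumes that a name which is not rebound in the loop keeps the same value, and it ignores aliasing, in-place mutation (such as `D_[0] = ...`), and side effects of calls.

```python
import ast

def names_assigned_in(node):
    """Collect every name that is bound (assigned, used as a loop target, ...) inside node."""
    return {
        n.id
        for n in ast.walk(node)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
    }

def invariant_feeds(source):
    """For each feed_dict found inside a for loop, list the fed names the loop never rebinds."""
    results = []
    for loop in ast.walk(ast.parse(source)):
        if not isinstance(loop, ast.For):
            continue
        rebound = names_assigned_in(loop)
        for call in ast.walk(loop):
            if not isinstance(call, ast.Call):
                continue
            for kw in call.keywords:
                if kw.arg == "feed_dict" and isinstance(kw.value, ast.Dict):
                    fed = [v.id for v in kw.value.values if isinstance(v, ast.Name)]
                    results.append([name for name in fed if name not in rebound])
    return results

# The host loop from Listing 1; the code is only parsed, never executed.
host_code = """
for i in range(Iter):
    result = sess.run(L, feed_dict={D: D_, X: X_})
"""

print(invariant_feeds(host_code))  # prints [['D_', 'X_']]
```

A tool built along these lines could report such a finding together with the assumptions it relied on, leaving the developer to confirm that the fed values really are unchanged before hoisting any computation that depends on them.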