Profiling Developers Through the Lens of Technical Debt

Zadia Codabux∗ and Christopher Dutchyn
University of Saskatchewan
[email protected], [email protected]

ABSTRACT

Context: Technical Debt needs to be managed to avoid disastrous consequences, and investigating developers' habits concerning technical debt management is invaluable information in software development. Objective: This study aims to characterize how developers manage technical debt based on the code smells they induce and the refactorings they apply. Method: We mined a publicly-available Technical Debt dataset for Git commit information, code smells, coding violations, and refactoring activities for each developer of a selected project. Results: By combining this information, we profile developers to recognize prolific coders, highlight activities that discriminate among developer roles (reviewer, lead, architect), and estimate coding maturity and technical debt tolerance.

CCS CONCEPTS

• Information systems → Data mining.

KEYWORDS

Developer Characterization; Open Source Software; Mining Software Repositories; Technical Debt; Code Smell; Refactoring

ACM Reference Format:
Zadia Codabux and Christopher Dutchyn. 2020. Profiling Developers Through the Lens of Technical Debt. In ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) (ESEM '20), October 8–9, 2020, Bari, Italy. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3382494.3422172

arXiv:2009.04005v1 [cs.SE] 8 Sep 2020
© 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7580-1/20/10...$15.00

1 INTRODUCTION

Open Source Software (OSS) platforms have thrived over the last decade. The latest "State of the Octoverse" report¹ by GitHub², one of the world's most popular and leading software development platforms for both open source and private project repositories, reports that 44+ million projects were developed by 40+ million developers from all over the world. OSS development is typically a collaborative process. Most OSS developers are skilled and experienced professionals with an average of 11.8 years of software development experience. About 50% of the developers have acquired formal training in computer science or a related domain, and the rest are self-taught or learned the skills on the job [8].

Different developers have different weaknesses and tolerances for Technical Debt (TD). TD is a metaphor introduced by Ward Cunningham [4] to help reason about the long-term costs incurred by taking shortcuts in software development. However, when TD is not managed, it can impede software development in multiple ways, ranging from quality issues to loss of a system. One of the indicators of TD is code smells [15], a term coined by Kent Beck, which refers to surface indications of potential design weaknesses in source code [5]. One of the ways to manage TD is refactoring [3]. Refactoring is a technique for restructuring code by improving its internal structure without affecting its external behavior [5]. Therefore, establishing developers' shortcomings with respect to TD can help focus scrutiny on portions of source code that may be more likely to contain flaws or debt, both by the coder during software development and by reviewers when they check the code during code reviews.

This study aims to profile developers with respect to technical debt by mining developers' commit actions, their code smells, and their refactoring activities. This could help in team selection and organization, task assignments such as complex refactoring or code reviews, and pair programming sessions. We mined the publicly available TD dataset [9] to obtain the commit information from GitHub repositories, code smells and coding violations from SonarQube³ and Ptidej [6], and refactoring information from Refactoring Miner [11]. We elaborate on the TD dataset and tools in Section 2.

The contributions of this study are:

(1) We profile developers with respect to technical debt based on the code smells they induce and close and the type of refactoring they perform.
(2) An updated TD dataset (in the .db format) with views for extracting the raw data for profiling developers, constructed using SQL queries.
(3) The mapping of SonarQube coding violations to the most common code smells by Fowler [5] and the translation of these rule violations to bad practices, in the format of a CSV file (codified.csv in the replication package, Section 4.4).

The rest of the paper is organized as follows. Section 2 describes the terminologies associated with the tools in the TD dataset used throughout this study. Section 3 highlights the relevant related work about developer characterization. Section 4 focuses on the methodology of the study, including the research goal, questions, study design, and the data analysis process. Section 5 provides insights into the results. Section 6 presents the discussion. Section 7 lists threats to the validity of this study. Section 8 concludes and outlines future work.

¹https://octoverse.github.com/
²https://github.com/
³https://www.sonarqube.org/

2 BACKGROUND

In this section, we elaborate on the tools and associated terminologies that we use in this paper.

Git distinguishes between these different developer roles⁴:

• Author: the person who originally wrote the work
• Committer: the person who last applied the work

Refactoring Miner is an open-source tool that mines refactoring operations from the Git commits of Java projects. Some examples of refactoring types extracted are Extract Interface, Move Method, and Extract Method [5].

SonarQube is a web-based open-source platform that analyzes source code to detect violations of its pre-defined rules. SonarQube defines three types of issues⁵:

• Bug: an error that breaks the code; fix immediately
• Security Vulnerability: a point in the code that is open to attack
• Code Smell: a maintainability issue that makes the code confusing and difficult to maintain

SonarQube has more than 500 rules⁶ for Java issues relating to different aspects of source code. SonarQube also classifies the rules into five severity levels: Blocker, Critical, Major, Minor, and Info. In this study, we only considered the SonarQube rule violations (which SonarQube refers to as "code smells"), but we mapped the rule violations to Fowler's widely-known code-smell classification [5] and to bad coding practices. Henceforth, when we mention code smells in this study, we are referring to Fowler's classification. Many bad practices are not categorized in the SonarQube data. In fact, when delving into the TD dataset, we discover that more than 50% of the SonarQube rule violations end up uncategorized.

3 RELATED WORK

Here, we explore the literature for studies on profiling OSS developers. Yang et al. [14] proposed a developer portrait model based on three characteristics: personal information, programming skills, and contributions from the open source platforms GitHub, GitLab, Gitee, and Bitbucket. They then successfully applied the model to code recommendation and programming task assignments. Avgustinov et al. [2] analyzed static analysis violations over the revision histories of the open-source projects Hadoop Common, MongoDB, Spark, and Gaia. They tracked down which developers were responsible for introducing and fixing these violations to build developer profiles of their coding habits. Leveraging information on how defects are prioritized by software developers in the defect repositories of the open source systems Eclipse and Mozilla is the essence of the study conducted by Xuan et al. [13]. The authors modeled developers' bug prioritization to assist in tasks such as bug triage, severity identification, and reopened-bug prediction. Along the same lines, Xia et al. [12] analyzed bug reports from GCC, OpenOffice, Mozilla, Netbeans, and Eclipse to recommend developers for bug fixing. Amor et al. [1] proposed to use information from source code management systems, mailing list archives, and bug tracking systems to characterize developer activities for more accurate effort and cost estimations in OSS. A study by Hauff et al. [7] extracted GitHub user profiles, specifically their README.md files, to match developers' information with descriptions from job advertisements. Li [10] conducted a study focused on the analysis of developer behaviors related to technical debt. Using clustering on Git commits, faults, and SonarQube issues, including vulnerabilities, the author came up with the following categories of developers: Veterans, Vulnerability Creators, and Fault Inducers/Commoners.

Although a few studies explore the characteristics of OSS developers based on artifacts (e.g., defect information and source code) for various purposes, such as recommending specific tasks (e.g., bug fixing) or mapping developer profiles to jobs, to the best of our knowledge, profiling developers from a technical debt perspective has yet to be explored from a combined code smell and refactoring perspective.

4 METHODOLOGY

4.1 Goal

The goal of this study is to mine developers' actions from code smells and refactoring information to profile developers from a technical debt perspective. In so doing, we will identify the developers who are superior at managing tasks, fixing bugs, and refactoring code. This developer characterization will help inform decisions regarding team selection and organization, task assignments, and pair programming sessions. We propose the following Research Question (RQ) to gather information to profile developers.

RQ: What do the developers' activities regarding inducing and refactoring code smells and bad practices tell us about their technical debt management?

4.2 Study Design

The data used in this study is from the publicly available Technical Debt Dataset (TD dataset), downloaded as a database, which consists of 33 Java projects from the Apache Software Foundation⁷ repository. The TD dataset contains commit information from the GitHub commit logs, refactorings extracted using Refactoring Miner, and code smells and programming violations from SonarQube and Ptidej. The projects in the TD dataset are more than three years old and contain at least 500+ commits and 100+ classes. Because the dataset is public domain and contains full developer identities, our anonymization can be trivially reversed.

Table 1: TD dataset SQL tables

  Table Name          Description
  Git Commits         Commit information retrieved from the commit log
  Sonar Measures      SonarQube measures for the commits
  Sonar Issues        SonarQube issues, antipatterns and code smells
  Refactoring Miner   List of refactoring activities

For this study, we selected the project Apache Beam (beam)⁸, which is one of the largest projects in the TD dataset. beam is a unified programming model used to implement batch and streaming data processing pipelines. The data for beam has been collected over 5 years (2014–2019) and consists of 22,300+ commits, 10,100+ refactorings, and 74,400+ technical debt items. Since the focus of this study is on profiling developers from a refactoring and code smells perspective, we only used the related tables described in Table 1.

⁴https://bit.ly/3ezc5GH
⁵https://bit.ly/2Cjp1Dz
⁶https://rules.sonarsource.com/java
⁷https://www.apache.org/
⁸https://beam.apache.org/

4.3 Data Analysis

Beginning with the TD dataset, we construct SQL queries to illuminate our research question, with summary statistics presented graphically. The data slices into three different perspectives:

(1) The commits (GIT_COMMITS) allow us to characterize the distribution of authorship, identify prolific code authors, and recognize central authorities who commit changes.
(2) Details of statically-recognized programming concerns using SonarQube rules, which we categorized as code smells and bad practices, along with the author of the defective code, the author and the committer of the repaired code, and the status of the concern.
(3) Refactoring data allows us to trace standard refactorings [5] back to the original author, ranging from simple RenameVariable tasks to complex ExtractSuperclass actions.

Commits. This data, displayed in Figure 2, tells us about authorship and commits for each developer, and we recognize the long-tail distribution and anomalies, which help us recognize specific developers. Also, the data only contains commits to the main branch, so we do not see prototyping or alternate exploration activities. However, the commits are tagged by author, so we know who constructed the code. The developer is frequently different from the committer, a point for discussion later.

There are several obstacles which introduce noise and complexity to the data supporting the analyses:

• Some individuals use more than one distinct but unambiguous name in the database (anonymized e.g.: jsmith, Jon Smith, and Jonathan Smith).
• Some entries are unambiguous email addresses or user-ids (anonymized e.g.: nasa and [email protected]).
• Other short forms are ambiguous and cannot be resolved to an individual in the project (anonymized e.g.: jsmith might be John Smith, Jon Smith, or Jane Smith).
• Some commits are not tagged, or use a non-descript handle such as GitHub or MergeBot.

We enumerate these various names into the table BEAM_HANDLES, which contains rows, each with an asEntered name-string and a handle; the handles are then anonymized into strings such as Dev1. Each of these fields is marked as case-indifferent⁹, and so collating sequences will put Bill near bill and BILL. By joining the HANDLE table on the data-source's name-string and extracting the handle, we can gather the various different names for a single developer to one canonical handle. This is seen in our queries in the extended SQL-Lite database as:

    select ..., h.handle, ... from ...
    join BEAM_HANDLES h on h.asEntered = ...

In addition, our BEAM_HANDLES includes counts of the GIT_COMMITS with the asEntered name-string as the author and committer, as well as counts of SonarQube rule violations closed.

For the remaining three data sources, just as above, we construct similar queries to select, combine, and summarize the different elements; we then create informative charts to illustrate our learnings.

Code Smells and Bad Practices. In the TD dataset, SONAR_MEASURES (almost) joins on the commitHash with GIT_COMMITS, connecting SonarQube data with commits. Unfortunately, the SONAR tables truncate the hash by one leading character, but entries still remain in a 1-1 correspondence: every row in MEASURES has a unique COMMIT with a longer hash. Therefore, we extend _BEAM_GIT_COMMITS with an additional computed field, shortHash, that contains the truncated commitHash, which we use to build BEAM_SONAR_MEASURES_COMMITS. Then, we can join that view with _BEAM_SONAR_ISSUES to connect an issue _DETAIL to each closing commit, and thus to each author and committer for that issue.

Refactoring Activities. Many of the Git commits have refactoring descriptions attached, using the commitHash as the join field. This allows us to compute summary statistics for the frequency of the various kinds of refactoring as a cross-table by author in Figure 1. But first, we recognize that the most prolific refactorers are also the authors tending to the right-hand side (expert) of the SonarQube data in that figure.

[Figure 1: Count of Refactoring Commits (bar chart of refactoring-commit counts per author, RCDev1 to RCDev23)]

4.4 Replication Package

The full set of queries and views, along with the data underlying our study, is publicly available¹⁰. Specifically, we provide the SQL-Lite database with executable views for all tables, various spreadsheets formatting the diagrams, and the codified code smells file.

5 RESULTS

This section reports findings for our research question based on the three perspectives in Section 4.3.
Commits. From the commit view in Section 4.3, we identify 754 unique name-strings, which reduce to 643 unique handles, shown in our BEAM_HANDLES_SUMMARY view. By sorting the view by descending author count, Figure 2 displays the long-tail distribution of authors as a blue line, from the highest at 2256 on the left to the lowest at 1 on the right. Overlaying the committer count for each handle as an orange line in the same figure, we see that commits tend to follow the same curve, but with a few notable differences, which highlight specific developers.

[Figure 2: Distribution of Authoring and Committing (AuthorCount in blue and CommitCount in orange, per handle)]

Most committers are authors, and over-committers stand out as blips where the orange (CommitCount) line jumps above (or below) the blue (AuthorCount) line. Four authors stand out:

• at rank 9, with 506 authorships and 2085 commits,
• at rank 14, with 377 authorships but no commits,
• at rank 39, with 98 authorships and many (929) commits, and
• at rank 119, with 19 authorships and many (188) commits.

The small blip at rank 23 in Figure 2 is MergeBot, with 218 authorships and 429 commits. This is a generic handle for some automated process and does not inform us further. Overall, the top 22 handles represent 63% (14,106 of the 22,332) of all authorships and 70% (13,060 of 18,550) of all commits to the main branch. Therefore, we will tend to focus on the top 22 handles and the two other over-committers (at ranks 39 and 119); these are noted in the BEAM_HANDLES table as having an entry in a column, significantContributor, which IS NOT NULL.

Code Smells and Bad Practices. Summarizing our statistics for SonarQube rule violations (which include Fowler code smells and other bad practices), we find 78,642 instances of an author making a change that closes a recognized SonarQube rule violation, with 2602 distinct messages. We extracted all 2602 unique Detail messages from this table and manually marked them with a Type and SubType, as described in codified.csv in our replication package. Our summary view, BEAM_SONAR_ISSUES_SUMMARY, shows the details; briefly, there are three Types:

• Anti-Pattern (with one SubType, Anti-Singleton)
• Bad Practice (with numerous SubTypes, including our own)
• Code Smell (with 12 standard SubTypes).

We summarized SonarQube messages against significant contributors by handle, as our BEAM_SONAR_MESSAGES_BY_AUTHOR view. Seventeen of our significant contributors authored changes that closed SonarQube rule violations. The most prolific contributor removed 12,500 SonarQube violations of all types. All told, seventeen authors repaired 51,187 instances. The BEAM_SONAR_MESSAGES_SUMMARY view computes these values, and Figure 3 displays the curve.

[Figure 3: Count of Sonar Violations Resolved by Author (per author, CSDev1 to CSDev17)]

We extracted those records to a CSV file for import into a presentation tool. Using simple text-processing tools, we converted the row representation into a cross-table in graphical format, c.f. Figure 4. The graph has rows and columns reordered to emphasize interesting details, deferred to the Discussion section. In this figure, the total number of commits is normalized to 100%, and the variation among the types and subtypes is emphasized. Note that the order of authors is anonymized separately and in a different order from previous figures. Each vertical bar corresponds to a single author (captioned along the bottom) and is divided into distinguishing color bands based on type and subtype. The order of the bands is identical across all authors.

The sole AntiPattern type (with its unique Anti-Singleton subtype) is so small as to be indistinguishable, in black at the top of each bar: indeed, there were only 17 instances of this among 51,187 SonarQube violations. For the remaining SonarQube rule violations, we stacked code smells above bad practices. Twelve of the recognized code smells [5] were identified in the data, but only three showed substantial differences across the developers: LongMethod, DeadCode, and ComplexMethod. Therefore, we color-coded these three interestingly-varied code smells in different shades of bright orange and merged the other nine into dim orange.

In the same way, we examined the bad practices data, and there were 34 distinct subtypes; but only eight showed up more than 1% of the time, and all of those had noticeable variation (as measured by standard deviation); they are bolded in Figure 4. The other 26 subtypes of bad practices were present in at most 1% of the violations, with a standard deviation of at most 1%, and were merged together into the very dim blue at the bottom. We note a few of these:

• Unlike most practices, only three developers engaged in MoveFile (the yellow band), and one of them performed almost all of the file moves; almost half of his work was resolving those rule violations.
• Abstraction violations were almost one-quarter of all rule violations, but tended to dominate for the others, who also had higher levels of style violations, obsolete code fixes, and magic numbers.

We defer the interpretation of these two points to a later section.

Refactoring Activities. With the cross-table query, we get 23 distinct significant contributors and a categorization of their commits into 16 distinct types. Some of them are small contributions (less than 10% of the instances, and single-digit variances), but six have some presence, and two more are close. Figure 5 shows the distribution of activity, much as we did previously. Of note in the table:

• Two authors performed one refactoring exclusively: Inline Method and Push Down Method, respectively.
• Except for those two, everyone else employed Extract Method, a relatively sophisticated refactoring type.
• Move Class was a common refactoring for most other authors, and in some cases a substantial majority of their actions.

⁹"Case-indifferent" is the SQL-Lite term for case-insensitive.
¹⁰https://bit.ly/3h7PGSb

[Figure 4: Sonar Violations Distribution by Author and Type (normalized stacked bars per author, SDev1 to SDev17; the legend reports the mean and standard deviation of each subtype, e.g., Abstraction Violation: μ=0.25, σ=0.08; Style Violation: μ=0.1, σ=0.02; Magic Number: μ=0.07, σ=0.03; Long Method: μ=0.05, σ=0.02; Dead Code: μ=0.05, σ=0.02; Move File: μ=0.03, σ=0.12)]

• Only five authors employed Move Method, but they used it extensively (25% or more).

6 DISCUSSION

Our results offer numerous opportunities for discussion. We have confidence that our analysis offers a fair assessment of the available data, regardless of our choice to focus on a small subset of contributors. Overall, the top 22 handles represent 63% (14,106 of the 22,332) of all authorships and 70% (13,060 of 18,550) of all commits to the main branch. This gives us a good understanding of the various developers involved in the beam project.

Profiling Developers. We do not produce a profile document for each developer; instead, we perform the act of profiling, recognizing the similarities and differences among developers and relating those to a level of coding maturity and coding specialization. We profile developers by author and commit activity, recognizing those with unusual commit counts compared to authoring; these individuals appear to be distinguishable code reviewers, technical leads, or project managers. It also allows us to discriminate different kinds of code authors. Some perform global changes, such as moving code among packages; these might be architects. Others create or close large numbers of ToDo items; perhaps quality assurance personnel. One focused exclusively on pushing methods down, dealing with Generalization problems. He would need to carefully consider the risks of running general code (a super method) over methods specialized to different classes; i.e., a developer with in-depth knowledge of the package. Another simply inlined methods: a method-composition act yielding performance optimization. This is a task for a developer who is confident that the original method is superfluous, one who understands when a method abstraction is unnecessary and will not add excessive complexity; i.e., someone with experience, knowledge of the code, and good intuitions about code complexity. Some others stand out for large numbers of Abstraction Violations, complex code, obsolete code (perhaps cribbed from StackOverflow), or trivial Style-Guide or Magic-Number violations; often a hallmark of immature developers. Developers with less than 80% of their SonarQube rule violations identified as bad practices tended to introduce more code smells (except for the file-mover); developers with fewer bad practices, especially of the more trivial sort, seem to tolerate larger amounts of code smell. This profiling exercise suggests that mature developers produce fewer bad practices, but admit larger amounts of technical debt.

7 THREATS TO VALIDITY

This is a preliminary study based on one system, beam, specific to the Apache foundation and written in Java. Although it is a large project spanning multiple years, we cannot generalize our results to other projects, whether OSS or industrial. However, this study serves as a basis for expanding this research to other systems. The TD dataset provides snapshots of projects over a specific timeframe, and there might be some missing data. However, the data for beam spans from 2014, which coincides with the first release of an open SDK implementation by . So we can assume that we have a fairly complete picture of the project over the TD dataset timeline.

We used the data from the TD dataset database, which uses specific tools and algorithms. Most of the tools were used with their standard configuration, so we might have missed some data that the tools did not report. Most of the data in the Git commits is self-reported by the developers and might not be an accurate reflection of the actual data, due to a lack of knowledge or experience to identify the correct components or assign the right labels.

The mapping of the SonarQube rule violations to the more commonly known code smells by Fowler, and to bad practices (captured in our codified.csv), is a manual process performed by one author and cross-checked by the other. We tried to compare with research papers and the SonarQube set of rules to be as unbiased and accurate as possible, but it is possible that we have misinterpreted or wrongly classified some of the rules.

Lastly, some of the developers' handles were ambiguous; that is, a particular handle could refer to different people, as described in Section 4.3. Therefore, we had to exclude the records of those developers. We also had to ignore the commits and violations that were not tagged with a reporter, assignee, or author, and those attributed to generic handles such as GitHub or MergeBot.

8 SUMMARY AND FUTURE WORK

Overall, the TD dataset offers a rich data source for profiling developers, including

[Figure 5: Refactorings by Type and Author (normalized refactoring-activity distribution per author, RDev1 to RDev23; the legend reports the mean and standard deviation of each refactoring type, e.g., Move Class: μ=0.38, σ=0.24; Rename Method: μ=0.17, σ=0.14; Move Method: μ=0.16, σ=0.19; Push Down Method: μ=0.15, σ=0.37; Extract Method: μ=0.12, σ=0.11; Inline Method: μ=0.1, σ=0.24)]

(1) raw commits to main branches, tagged by author and committer, which allows us to identify and distinguish OSS contributors by their level of contribution;
(2) SonarQube diagnostics and closing commits, which allow us to identify the developers and the kinds of diagnostics they contribute and discharge; and
(3) categorization of refactorings, allowing us to examine the range of refactoring techniques that authors employed: some used very few; others used many.

By analyzing these data sources, we characterize developers by author and commit activity, recognizing those with unusual commit counts compared to authoring. The SonarQube data allows us to profile each developer, distinguishing them by code smells and bad practices. The profiling exercise uncovers evidence supporting maturity and experience. Last, the Refactoring Miner data supports that the mature/experienced developers employ more sophisticated refactorings, involving generalization and in-depth code knowledge, rather than the simpler renamings that the other developers usually perform.

In profiling developers, we uncover details informing decisions

• for assigning pair-programming teams: by knowing the strengths of various developers, teams can be purpose-built to transfer knowledge, to expose new techniques, or to make speedy progress on paramount tasks;
• for inviting individuals to review panels, to ensure diversity of viewpoints, strengths, and techniques.

Future Work. The TD dataset contains additional data sources, including the full details of each commit in the GIT_COMMITS_CHANGES table and fault-tracking information in the SZZ_FAULT_INDUCING_COMMITS table. The former would give precise commit data, telling us, on a file-by-file basis, the number of lines inserted, deleted, and moved. This could sharpen our profiling focus to include expertise in specific packages or code sections. The latter can be used to identify developers who introduce faults and the developers who repair them: additional dimensions to profile. The SonarQube data could also be analyzed in another dimension: as a time series. Analyzing it in this way can allow us to see how developers change their behaviors over time. Results could inform questions like "do developers become more tolerant to more or different debt as they become more experienced?" and "do developers change the set of repair actions as they become more involved in a project?". These exciting questions remain for later work. Our data comes from the TD dataset; this decimates the information down to the main branch. In the future, we can draw on additional data sources, including the entire commit log to date, with branches that were prototypes or explored alternate constructions. Eventually, these profiling activities might form the basis of an actual artifact, a developer profile, containing salient details about an individual.

REFERENCES

[1] Juan Jose Amor, Gregorio Robles, and Jesus M Gonzalez-Barahona. 2006. Effort estimation by characterizing developer activity. In 2006 International Workshop on Economics Driven Software Engineering Research. IEEE, 3–6.
[2] Pavel Avgustinov, Arthur I Baars, Anders S Henriksen, Greg Lavender, Galen Menzel, Oege de Moor, Max Schäfer, and Julian Tibble. 2015. Tracking static analysis violations over time to capture developer characteristics. In 2015 IEEE/ACM 37th International Conference on Software Engineering, Vol. 1. IEEE, 437–447.
[3] Zadia Codabux, Byron J Williams, and Nan Niu. 2014. A quality assurance approach to technical debt. In International Conference on Software Engineering Research and Practice. WASET.
[4] Ward Cunningham. 1992. The WyCash Portfolio Management System. SIGPLAN OOPS Messenger 4, 2 (December 1992), 29–30.
[5] Martin Fowler. 2018. Refactoring: Improving the Design of Existing Code. Addison-Wesley.
[6] Yann-Gaël Guéhéneuc. 2007. Ptidej: A flexible reverse engineering tool suite. In 2007 International Conference on Software Maintenance. IEEE, 529–530.
[7] Claudia Hauff and Georgios Gousios. 2015. Matching GitHub developer profiles to job advertisements. In 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 362–366.
[8] Karim R Lakhani and Robert G Wolf. 2003. Why hackers do what they do: Understanding motivation and effort in free/open source software projects. MIT Sloan working paper.
[9] Valentina Lenarduzzi, Nyyti Saarimäki, and Davide Taibi. 2019. The Technical Debt Dataset. In 15th International Conference on Predictive Models and Data Analytics in Software Engineering. 2–11.
[10] Xiaozhou Li. 2019. Research on software project developer behaviors with K-means clustering analysis. In SSSME 2019: Summer School on Software Maintenance and Evolution. CEUR-WS.
[11] Nikolaos Tsantalis, Matin Mansouri, Laleh Eshkevari, Davood Mazinanian, and Danny Dig. 2018. Accurate and efficient refactoring detection in commit history. In 2018 IEEE/ACM 40th International Conference on Software Engineering. IEEE, 483–494.
[12] Xin Xia, David Lo, Xinyu Wang, and Bo Zhou. 2013. Accurate developer recommendation for bug resolution. In 20th Working Conference on Reverse Engineering. IEEE, 72–81.
[13] Jifeng Xuan, He Jiang, Zhilei Ren, and Weiqin Zou. 2012. Developer prioritization in bug repositories. In 34th International Conference on Software Engineering. IEEE, 25–35.
[14] Wenhua Yang, Minxue Pan, Yu Zhou, and Zhiqiu Huang. 2020. Developer Portraying: A quick approach to understanding developers on OSS platforms. Information and Software Technology 125 (2020), 106336.
[15] Nico Zazworka, Clemente Izurieta, Sunny Wong, Yuanfang Cai, Carolyn Seaman, Forrest Shull, et al. 2014. Comparing four approaches for technical debt identification. Software Quality Journal 22, 3 (2014), 403–426.