
Do Explicit Review Strategies Improve Code Review Performance?
Pavlína Wurzel Gonçalves (University of Zurich, Switzerland), Enrico Fregnan (University of Zurich, Switzerland), Tobias Baum (Leibniz Universität Hannover, Germany), Kurt Schneider (Leibniz Universität Hannover, Germany), Alberto Bacchelli (University of Zurich, Switzerland)

ABSTRACT
Context: Code review is a fundamental, yet expensive part of software engineering. Therefore, research on understanding code review and its efficiency and performance is paramount.
Objective: We aim to test the effect of a guidance approach on review effectiveness and efficiency. This effect is expected to work by lowering the cognitive load of the task; thus, we analyze the mediation relationship as well.
Method: To investigate this effect, we employ an experimental design where professional developers have to perform three code reviews. We use three conditions: no guidance, a checklist, and a checklist-based review strategy. Furthermore, we measure the reviewers' cognitive load.
Limitations: The main limitations of this study concern the specific cohort of participants, the mono-operation bias for the guidance conditions, and the generalizability to other changes and defects.
Full registered report: https://doi.org/10.17605/OSF.IO/5FPTJ; Materials: https://doi.org/10.6084/m9.figshare.11806656

ACM Reference Format:
Pavlína Wurzel Gonçalves, Enrico Fregnan, Tobias Baum, Kurt Schneider, and Alberto Bacchelli. 2020. Do Explicit Review Strategies Improve Code Review Performance?. In 17th International Conference on Mining Software Repositories (MSR '20), October 5–6, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3379597.3387509

1 INTRODUCTION
Code review is a widespread practice in which one or multiple reviewers inspect a code change written by a peer [1, 23] with the primary goal of improving code quality [3]. Performing a good code review is an expensive and time-consuming task [11]. Therefore, research is investigating how to improve code review efficiency and performance.
With this aim, researchers have developed many reading techniques to guide developers in reviewing code [15]. One of the guidance techniques commonly used in the industry is checklist-based reading [2, 8, 27]. A checklist guides a reviewer in what to look for.
Further guidance might be possible by telling a reviewer how to review. Providing a specific strategy to perform a development task has been proven to be helpful not only with scenario-based reading techniques for code inspection [2], but also in debugging or test-driven development [16, 20].
Checklists and strategies assist developers in performing complex tasks by systematizing their activity, thus lowering the cognitive load of reviewers [17, 20]. The result should be a more effective and efficient review.
In this experiment, we investigate the effect of guidance approaches on reviewing code. We compare three treatment groups – no guidance, checklist, and strategic checklist execution (strategy). If we can confirm that a guided approach helps developers identify defects or makes review tasks easier, not only "static" checklists but also explicit reviewing strategies should be incorporated into review tools and used to train reviewers.

2 RESEARCH QUESTIONS
Our main goal is to investigate whether guidance on how to perform a review (strategy) provides additional benefits compared to guidance on what to look for in the review (checklist). A good review performance not only means finding many of the contained defects (effectiveness) but also finding them quickly (efficiency) [6]. We include not having any guidance as an additional control. Therefore, we ask:

RQ1: Does guidance in review lead to:
RQ1.1: a higher review effectiveness (share of functional defects found)?
RQ1.2: a higher review efficiency (functional defects found over the review time)?

We formalize our research question in the following hypotheses:
H1.1: There are significant differences in review effectiveness between the checklist, strategy, and no guidance approaches.
H01.1: There are no significant differences in review effectiveness between the checklist, strategy, and no guidance approaches.
H1.2: There are significant differences in review efficiency between the checklist, strategy, and no guidance approaches.
H01.2: There are no significant differences in review efficiency between the checklist, strategy, and no guidance approaches.

Both guidance approaches (checklist and strategy) systematize the activity of the reviewers by reducing the amount of information to keep in mind at a given time, thus supposedly lowering developers' cognitive load [20, 25]. Therefore, we investigate:

RQ2: Is the effect of guidance on code review mediated by a lower cognitive load?

We formalize our research question in the following hypotheses:

H2.1: Cognitive load significantly mediates the relationship between the guidance approach and review effectiveness.
H02.1: Cognitive load does not significantly mediate the relationship between the guidance approach and review effectiveness.
H2.2: Cognitive load significantly mediates the relationship between the guidance approach and review efficiency.
H02.2: Cognitive load does not significantly mediate the relationship between the guidance approach and review efficiency.

3 RESEARCH PROTOCOL
For RQ1, we use a randomized controlled experiment design with three groups. We set up RQ2 as a correlational study: studies investigating mediators in an experimental design manipulate the independent variable, while the mediator is "only" measured, as in observational studies (measurement-of-mediation design [29]).

3.1 Variables
Table 1 presents the study's variables. The guidance approach being used is the independent variable for both RQ1 and RQ2. A central dependent variable for RQ1 and RQ2 is the number of functional defects found by the participants. In RQ2, cognitive load is used as the mediator variable. Furthermore, we measure participants' demographic data (e.g., Java experience), reported in Table 1, to control for potential correlation with the review performance. We employ the following treatments (guidance approaches):

No Guidance. The first group of developers does not receive any aid in the review and performs the review as they are used to.
Checklist. The second group is presented with a checklist (see Section 3.2). They are required to identify defects using this checklist, but also any other defects that might appear.
Strategy. Inspired by formerly developed strategies [18, 20], we apply the same principles in our implementation of a checklist-based reviewing strategy. While a developer has the best potential to retrieve complex information and consequently make contextualized decisions, the tool-supported strategy can free their mental capacity for these tasks by aiding the systematic execution of steps and by storing and providing relevant information when needed [20]. Our strategy builds on the "static" checklist and enforces a tool-supported process on the reviewers. Furthering the checklist's purpose – to guide the review and take the mental load off developers – the strategy iterates for a developer through the individual pieces of the change, displaying the checklist items relevant to each piece at the general, class, and method levels.
The strategy is implemented as a top bar in the review task interface. It displays the same items as the checklist, grouped by scope. Differently from the checklist, items are not shown all at the same time; participants are explicitly asked to first check the general items, then the class items, then the method ones. We display only the items that are meaningful for the selected code chunk. Furthermore, the strategy highlights the code chunk(s) the user is currently reviewing. The user must explicitly mark the items as checked before being able to proceed to the next item(s) in the review strategy.

3.2 Material
The following section introduces the material we plan to use in this study; this material is publicly available [30].
Experiment UI. We employ a web-based tool that participants use to complete the experiment remotely. We log participants' answers, environment, and UI interactions. The tool was built for our previous work, and we modified it according to the new experiment's requirements and past experience [5]. Figure 1 shows a partial view of the checklist implementation in the web-based experiment UI; a complete view is available in our online appendix [30].
Checklist. The checklist is developed based on recommendations in the literature and on Microsoft checklists [24]. According to the literature, a good checklist requires a specific answer for each item, separates items by topic, and focuses on relevant issues [9, 12, 17]. Checklists should specify the scope in which items should be checked (e.g., "for each method/class") to prevent developers from memorizing big portions of code and jumping through it [17]. Following these recommendations, we created a tentative version of the checklist.
For each seeded defect, the final checklist contains at least one item that helps to find the issue but does not give obvious clues about the type or location of the defect. To assess its face validity, we contacted three Java developers with experience in code review. Based on their feedback, we improved the items in our checklist. Then, we repeated this process with three other developers.
Cognitive Load Questionnaire. To measure cognitive load, we use a standardized questionnaire (StuMMBE-Q) [19]. It captures the two components of cognitive load (i.e., mental load and mental effort) in two 6-item subscales. Effort and difficulty ratings are reliable measures of the cognitive processing that contributes to cognitive load [13].
System Usability Scale. To measure the usability of the guidance approaches, we adapted the items of the System Usability Scale to fit the purpose of the checklist and strategy evaluation [7].

3.3 Tasks
We ask participants to perform three code reviews, clarifying that they only have to look for functional defects in the code. The code changes to review are taken from a previous experiment on code review and contain both original and seeded defects [5]. All the changes are from the text editor jEdit, a system that was successfully employed in previous studies [5, 28]. The first, short code change contains three defects, while the others contain nine and ten defects, respectively. The two large changes are presented to the participants in a randomized order. A description of the defects is publicly available [4].
Developers enter remarks in the code review UI by writing a comment in a text area that appears on any line the reviewer selects. A remark (reported defect) is a note by the developer in natural language pointing at the place of a potential defect. As done previously [5], we count a remark as referring to a defect if it is in the right position and can make a reader aware of the defect. One author rates all the remarks, a second author does the same for a subset, and then the inter-rater agreement is computed.
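The registered report does not prescribe a specific agreement statistic; as an illustration only, the sketch below computes Cohen's kappa between the two raters on hypothetical binary ratings (1 = the remark is counted as referring to the defect, 0 = it is not).

```r
# Minimal sketch (not the authors' analysis code): inter-rater agreement on
# whether a remark refers to a defect, assuming Cohen's kappa is the statistic.
library(irr)  # provides kappa2() for agreement between two raters

# Hypothetical ratings for the subset of remarks rated by both authors.
ratings <- data.frame(
  rater1 = c(1, 0, 1, 1, 0, 1, 0, 0),
  rater2 = c(1, 0, 1, 0, 0, 1, 0, 1)
)

kappa2(ratings)  # Cohen's kappa and its significance test
```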

Table 1: The variables of the study.

Independent variables (design):
Guidance approach. The guidance provided by the tool to the participant: none, checklist, or strategy. Scale: nominal. Operationalization: see Sect. 3.2 and 3.3; randomized.

Dependent variables:
Number of detected defects (RQ1). Total detected functional defects, over all reviews by the participant. Scale: ratio. Operationalization: see Sect. 3.1.
Review time (RQ1). The needed net review time, i.e., the time needed for all the reviews subtracting pauses. Scale: ratio. Operationalization: automatically measured by the tool for each review task; pauses are toggled by the participant.
Review effectiveness. Ratio of defects found by the participant over the total number of defects in the code change [6]. Scale: ratio. Operationalization: computed at the end using the number of detected defects and the total number of defects.
Review efficiency. Number of defects found per hour spent reviewing [6]. Scale: ratio. Operationalization: computed at the end using the number of detected defects and the review time.
Cognitive load (RQ2). Load imposed on a person's cognitive system while performing a particular task [25]. Scale: ordinal. Operationalization: see Sect. 3.1 and 3.2.

Treated/Measured variables:
Prof. development experience. Years of experience with professional software development. Scale: ordinal. Measured: 6-point scale ("never" ... "11 years or more"), questionnaire.
Java experience. Years of experience with the Java programming language. Scale: ordinal. Measured: 6-point scale ("never" ... "11 years or more"), questionnaire.
Code review experience. The number of years of experience with code reviews. Scale: ordinal. Measured: 6-point scale ("never" ... "11 years or more"), questionnaire.
Current programming practice. How often the participant currently programs. Scale: ordinal. Measured: 5-point scale ("not" ... "daily or more often"), questionnaire.
Current code review practice. How often the participant currently performs code reviews. Scale: ordinal. Measured: 5-point scale ("not" ... "daily or more often"), questionnaire.
Fitness. Perceived tiredness or fitness of the participant before the experiment. Scale: ordinal. Measured: 5-point scale ("very tired" ... "very fit"), questionnaire.
Experience with jEdit. Whether the participant has experience with the jEdit editor, the source of the changes to review. Scale: ordinal. Measured: 3-point scale ("none", "used", "contributed"), questionnaire.
Code change. Code change under review, including the contained defects. Scale: nominal. Design: each reviewer has to review three different code changes.
Change part order. Order of presenting the code changes. Scale: nominal. Design: for the two larger changes, the order is randomly chosen from 4 possibilities (see [5]).
Usability. Perceived efficiency, effectiveness, and satisfaction in the use of an object [7]. Scale: ordinal. Measured: 5-point scale ("agree" ... "disagree"), questionnaire.
Understandability. Participants' understanding of the code: number of correct answers over all answers per change. Scale: ratio. Measured: multiple choice questions on the reviewed code change (available in [30]).
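For reference, the effectiveness and efficiency measures defined above (following Biffl [6]) can be stated compactly; the symbols below are introduced here only as notation and do not appear in the original table:

```latex
\[
  \mathit{effectiveness} = \frac{|D_\mathrm{found}|}{|D_\mathrm{total}|}
  \qquad
  \mathit{efficiency} = \frac{|D_\mathrm{found}|}{t_\mathrm{review}}
\]
```

where $D_\mathrm{found}$ is the set of functional defects reported by the participant, $D_\mathrm{total}$ the set of defects contained in the change, and $t_\mathrm{review}$ the net review time in hours.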

3.4 Participants
The sample consists of about 100 developers from an Indian software development company normally hired for outsourcing projects. This company provides a wide range of services (e.g., web development, mobile development, and DevOps) and has more than 2,000 employees. We contacted the company via email and agreed to conduct the experiment in multiple iterations. Proceeding this way, we can adjust the experiment setup, if necessary: e.g., asking for participants with longer programming experience.
The experiment participants are going to be randomly assigned to one of the three conditions (no guidance, checklist, and strategy). The review tasks are performed in fulfillment of an official contract with the company and, therefore, we expect developers to complete them as part of their official working duties. In case we detect irregularities or drop-outs, we will ask the company to provide replacements.
We performed a power analysis to estimate the sample size needed to identify existing differences between the treatment groups. Based on previous studies, we do not expect a large effect size to appear [14]. The sample size is calculated using a standard setting for an ANOVA medium effect size [10]. The estimated total sample size is 66 participants.
The selection of the sample of developers who take part in the experiment is up to the project manager of the company. However, we are going to specify the number of participants in each iteration of the experiment and the required level of experience with Java.
Pilot. We initially contacted the aforementioned company with the aim of expanding our previous experiment on the effects of ordering code changes on review effectiveness and efficiency [5].

Figure 1: Partial view of the checklist implementation in the web-based experiment UI.

We conducted a pilot with 29 developers who did not receive any guidance in their review. Unexpectedly, the participants were not able to identify many defects. This finding triggered the idea to investigate guidance approaches to help reviewers become more effective. Moreover, it allowed us to test the experiment platform as well as the collaboration setup with the company.

3.5 Execution Plan
Participants access the experiment online via a provided URL. The assignment is rolled out in batches of 10 to 20 developers, to check for problems after each batch. The tool presents the following workflow to each participant:
Welcome. A welcome page, containing general information about the experiment and indications on how it takes place. We ask for informed consent and explicitly request the developers not to share information about the experiment with each other. We explain that participation is voluntary and can be canceled at any time.
Explanation. An explanation of the review tasks. It emphasizes that participants should focus on finding functional issues. We also give a brief explanation of the relevant support technique and an introduction to the software system to review.
Three code review tasks. In addition to the review tasks, the participant is presented with a questionnaire to measure cognitive load at the end of each task.
Usability. If the participant is assigned to the checklist or strategy group, they are presented with a questionnaire adapted from the System Usability Scale [7] to evaluate the usability and quality of the developed treatments. Moreover, participants are asked about previous knowledge of jEdit.
Demographics. General demographics questions: to gain a deeper understanding of the participants in each treatment group, we ask them general questions about potential confounders.
Closing. Final optional remarks on the experiment and ending.

3.6 Analysis Plan
Data Cleaning. We remove participants who do not finish all the reviews. We regard participants who spent less than 5 minutes on a review and entered no review remark as 'did not finish'. We also exclude participants who restart the experiment or participate several times (we collect client IPs, hashed to guarantee data anonymization, and cookies). Furthermore, we control the developers' comprehension of the system by asking questions about the change, as code comprehension is an important condition for good reviews [1, 26]. We do not plan to remove other outliers unless we find specific reasons to believe the data is not valid.
Descriptive Statistics. We present descriptive statistics for the demographic, dependent, and independent variables for each treatment group by reporting means and standard deviations of the respective variables. We present a correlation matrix table to assess potential covariance and relationships between the examined variables.
Inferential Statistics. We take a frequentist stance. To verify our hypotheses for RQ1, we will conduct a one-way ANOVA to identify whether a difference between the three treatment groups exists and report the effect size. We will explore these differences further by using Tukey's range test for the post-hoc analysis. We use mediation analysis [29] for RQ2. We assess whether there is a partial or full mediation [21, 22] of the examined relationship by cognitive load using the mediation R package.
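To make the planned analyses concrete, the sketch below outlines the RQ1 tests (one-way ANOVA with an effect size and Tukey post-hoc comparisons) and the RQ2 mediation analysis with the mediation R package. The variable names and the simulated data are illustrative assumptions rather than the authors' analysis script, and the single treatment contrast passed to mediate() stands in for the pairwise contrasts a three-group design would require.

```r
# Illustrative analysis sketch (assumed variable names; simulated data).
library(mediation)  # mediate() for the RQ2 mediation analysis

set.seed(1)
d <- data.frame(
  guidance       = factor(rep(c("none", "checklist", "strategy"), each = 22)),
  cognitive_load = rnorm(66, mean = 4, sd = 1),
  effectiveness  = runif(66, min = 0, max = 1)
)

# RQ1: one-way ANOVA across the three guidance groups, plus an effect size.
fit <- aov(effectiveness ~ guidance, data = d)
summary(fit)
anova_tab   <- summary(fit)[[1]]
eta_squared <- anova_tab[["Sum Sq"]][1] / sum(anova_tab[["Sum Sq"]])  # effect size
TukeyHSD(fit)  # post-hoc pairwise comparisons between the groups

# RQ2: measurement-of-mediation analysis with cognitive load as the mediator.
# mediate() contrasts two treatment levels; here: strategy vs. no guidance.
med_model <- lm(cognitive_load ~ guidance, data = d)
out_model <- lm(effectiveness ~ guidance + cognitive_load, data = d)
med_res <- mediate(med_model, out_model,
                   treat = "guidance", mediator = "cognitive_load",
                   control.value = "none", treat.value = "strategy",
                   boot = TRUE, sims = 1000)
summary(med_res)  # ACME, ADE, total effect, proportion mediated

# Review efficiency (RQ1.2, H2.2) would be analyzed analogously.
```

Partial versus full mediation would then be judged from the estimated average causal mediation effect (ACME) and average direct effect (ADE), following [21, 22].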
Validity Threats. The implementation of the treatments might be a validity threat. To check it, we measure the checklist/strategy usability. Furthermore, the validity of the study might be undermined by our choice of changes to review. To control for overly complicated changes, we ask developers about their comprehension of the code. All the participants work in the same company and have, to the best of our knowledge, a very similar technical and cultural background; this limits the generalizability of our findings.

4 ACKNOWLEDGMENTS
A. Bacchelli, P. Wurzel Gonçalves, and E. Fregnan gratefully acknowledge the support of the Swiss National Science Foundation through the SNF Project No. PP00P2_170529. P. Wurzel Gonçalves also acknowledges the student sponsoring support by CHOOSE, the Swiss Group for Software Engineering.

REFERENCES
[1] Alberto Bacchelli and Christian Bird. 2013. Expectations, outcomes, and challenges of modern code review. In Proceedings of the 2013 International Conference on Software Engineering. IEEE Press, 712–721.
[2] Tobias Baum. 2019. Cognitive-support code review tools: improved efficiency of change-based code review by guiding and assisting reviewers. Ph.D. Dissertation. Hannover: Institutionelles Repositorium der Universität Hannover.
[3] Tobias Baum, Hendrik Leßmann, and Kurt Schneider. 2017. The Choice of Code Review Process: A Survey on the State of the Practice. In Product-Focused Software Process Improvement. Springer International Publishing, Cham, 111–127.
[4] Tobias Baum, Kurt Schneider, and Alberto Bacchelli. 2018. Online Material for "Associating Working Memory Capacity and Code Change Ordering with Code Review Performance". https://doi.org/10.6084/m9.figshare.5808609
[5] Tobias Baum, Kurt Schneider, and Alberto Bacchelli. 2019. Associating working memory capacity and code change ordering with code review performance. Empirical Software Engineering 24, 4 (2019), 1762–1798.
[6] Stefan Biffl. 2000. Analysis of the impact of reading technique and inspector capability on individual inspection performance. In Proceedings of the Seventh Asia-Pacific Software Engineering Conference (APSEC 2000). IEEE, 136–145.
[7] John Brooke et al. 1996. SUS - A quick and dirty usability scale. Usability Evaluation in Industry 189, 194 (1996), 4–7.
[8] Jeffrey C Carver. 2003. The impact of educational background and experience on software inspections. Ph.D. Dissertation. University of Maryland.
[9] Yuri Chernak. 1996. A statistical approach to the inspection checklist formal synthesis and improvement. IEEE Transactions on Software Engineering 22, 12 (1996), 866–874.
[10] Jacob Cohen. 1992. Statistical power analysis. Current Directions in Psychological Science 1, 3 (1992), 98–101.
[11] Jason Cohen. 2010. Modern Code Review. In Making Software, Andy Oram and Greg Wilson (Eds.). O'Reilly, Chapter 18, 329–338.
[12] Asaf Degani and Earl L Wiener. 1991. Human factors of flight-deck checklists: the normal checklist. (1991).
[13] Krista E DeLeeuw and Richard E Mayer. 2008. A comparison of three measures of cognitive load: Evidence for separable measures of intrinsic, extraneous, and germane load. Journal of Educational Psychology 100, 1 (2008), 223.
[14] Alastair Dunsmore, Marc Roper, and Murray Wood. 2001. Systematic object-oriented inspection: an empirical study. In Proceedings of the 23rd International Conference on Software Engineering. IEEE Computer Society, 135–144.
[15] Shouki A Ebad. 2017. Inspection reading techniques applied to software artifacts - a systematic review. Comput. Syst. Sci. Eng. 32, 3 (2017), 213–226.
[16] Margaret Ann Francel and Spencer Rugaber. 2001. The value of slicing while debugging. Science of Computer Programming 40, 2-3 (2001), 151–169.
[17] Erik Kamsties and Christopher M Lott. 1995. An empirical evaluation of three defect-detection techniques. In European Software Engineering Conference. Springer, 362–383.
[18] Amy J Ko, Thomas D LaToza, Stephen Hull, Ellen A Ko, William Kwok, Jane Quichocho, Harshitha Akkaraju, and Rishin Pandit. 2019. Teaching Explicit Programming Strategies to Adolescents. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education. 469–475.
[19] Moritz Krell. 2017. Evaluating an instrument to measure mental load and mental effort considering different sources of validity evidence. Cogent Education 4, 1 (2017), 1280256.
[20] Thomas D LaToza, Maryam Arab, Dastyni Loksa, and Andrew J Ko. 2019. Explicit Programming Strategies. arXiv preprint arXiv:1911.00046 (2019).
[21] David P MacKinnon and Amanda J Fairchild. 2009. Current directions in mediation analysis. Current Directions in Psychological Science 18, 1 (2009), 16–20.
[22] David P MacKinnon, Amanda J Fairchild, and Matthew S Fritz. 2007. Mediation analysis. Annual Review of Psychology 58 (2007), 593–614.
[23] Laura MacLeod, Michaela Greiler, Margaret-Anne Storey, Christian Bird, and Jacek Czerwonka. 2017. Code reviewing in the trenches: Challenges and best practices. IEEE Software 35, 4 (2017), 34–42.
[24] Steve McConnell. 2004. Code Complete. Pearson Education.
[25] Fred GWC Paas and Jeroen JG Van Merriënboer. 1994. Instructional control of cognitive load in the training of complex cognitive tasks. Educational Psychology Review 6, 4 (1994), 351–371.
[26] Luca Pascarella, Davide Spadini, Fabio Palomba, Magiel Bruntink, and Alberto Bacchelli. 2018. Information needs in contemporary code review. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (2018), 135.
[27] Guoping Rong, Jingyi Li, Mingjuan Xie, and Tao Zheng. 2012. The effect of checklist in code review for inexperienced students: An empirical study. In 2012 IEEE 25th Conference on Software Engineering Education and Training. IEEE, 120–124.
[28] David Rothlisberger, Marcel Harry, Walter Binder, Philippe Moret, Danilo Ansaloni, Alex Villazon, and Oscar Nierstrasz. 2012. Exploiting dynamic information in IDEs improves speed and correctness of tasks. IEEE Transactions on Software Engineering 38, 3 (2012), 579–591.
[29] Steven J Spencer, Mark P Zanna, and Geoffrey T Fong. 2005. Establishing a causal chain: why experiments are often more effective than mediational analyses in examining psychological processes. Journal of Personality and Social Psychology 89, 6 (2005), 845.
[30] Pavlína Wurzel Gonçalves, Enrico Fregnan, Tobias Baum, Kurt Schneider, and Alberto Bacchelli. 2020. Online appendix. https://doi.org/10.6084/m9.figshare.11806656