Caesar: a Social Code Review Tool for Programming Education

Total Page:16

File Type:pdf, Size:1020Kb

Caesar: a Social Code Review Tool for Programming Education Caesar: A Social Code Review Tool for Programming Education by Mason Tang S.B., Massachusetts Institute of Technology (2010) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology September 2011 Copyright 2011 Mason Tang. All rights reserved. The author hereby grants to M.I.T. permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole and in part in any medium now known or hereafter created. Author................................................................ Department of Electrical Engineering and Computer Science August 22, 2011 Certified by . Robert C. Miller Associate Professor of Computer Science and Engineering Thesis Supervisor Accepted by . Arthur C. Smith Professor of Electrical Engineering Chairman, Department Committee on Graduate Theses 2 Caesar: A Social Code Review Tool for Programming Education by Mason Tang Submitted to the Department of Electrical Engineering and Computer Science August 22, 2011 In partial fulfillment of the requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science Abstract Caesar is a distributed, social code review tool designed for the specific constraints and goals of a programming course. Caesar is capable of scaling to a large and diverse reviewer population, provides automated tools for increasing reviewer efficiency, and implements a social web interface for reviewing that encourages discussion and participation. Our system is implemented in three loosely-coupled components: a language-specific code pre- processor that partitions code into small pieces, filters out uninteresting ones, runs static analysis, and detects clusters of similar code; an incremental task router that dynamically assigns reviewers to tasks; and a language-agnostic web interface for reviewing code. Our evaluation using actual student code and a user study indicate that Caesar provides a sig- nificant improvement over existing code review workflows and interfaces. We also believe that this work contributes a modular framework for code reviewing systems that can be easily extended and improved. Thesis Supervisor: Robert C. Miller Title: Associate Professor of Computer Science and Engineering 3 4 Acknowledgements I would like to thank my advisor, Rob Miller, whose guidance has been instrumental in keeping me on track and on task throughout the entire course of this thesis. He has been patient, responsive, and consistently receptive to even the most outlandish ideas to come out of our numerous brainstorming sessions. Most valuably to me, he has taught me a great deal about research, software interfaces, software engineering, and myself. While Prof. Miller who gave me the guidance and direction for my thesis, I would not have made it very far without the constant and unwavering support of my family. My parents and my brother have always been a warm and loving home to me, whether that means a home-cooked meal, a friendly phone call, or even a short e-mail to see how my week has been. I am truly very lucky to have them, and I constantly look to their examples in an effort to become a better person. Equally supportive throughout this project has been my girlfriend, Elena Kon, who, despite living across the country and three timezones away, has consistently managed to be one of my greatest sources of comfort and strength. She has been endlessly patient with me even at my moments of greatest stress, and has believed in me and my abilities even when I had managed to thoroughly convince myself otherwise. I would also like to thank my roommates and best friends, Greg and Katrina, for giving me advice, calming me down when I needed it, and putting up with some extra dirty dishes when deadlines were looming. They have been, in almost every respect, a second family to me. This thesis would not have made it very far either without the inspiration, encourage- ment, and example of the rest of the UID group. It has been my pleasure and honor to find such a group of exceptionally talented researchers to call my colleagues and friends. A special thanks as well to Elena Tatarchenko for joining the project and helping me with the implementation. 5 6 Contents 1 Introduction 11 2 Related Work 17 2.1 Code Review in Industry ........................... 17 2.2 Code Review in Education .......................... 18 2.3 Crowdsourced Workflows .......................... 20 2.4 Static Code Analysis ............................. 21 3 Code Preprocessing 23 3.1 Partitioning .................................. 23 3.1.1 Design ................................. 23 3.1.2 Implementation ............................ 25 3.2 Chunk Filtering ................................ 26 3.2.1 Design ................................. 26 3.2.2 Implementation ............................ 26 3.3 Automated Commenting ........................... 26 3.3.1 Design ................................. 26 3.3.2 Implementation ............................ 27 3.4 Clustering ................................... 28 3.4.1 Design ................................. 28 3.4.2 Implementation ............................ 29 4 Task Routing 31 4.1 Design Goals ................................. 31 4.2 Implementation ................................ 32 7 5 Reviewing Interface 37 5.1 Design Goals ................................. 38 5.2 Features .................................... 39 5.2.1 Streamlined Commenting Interface . 40 5.2.2 Discussion Support .......................... 42 5.2.3 Code Display ............................. 43 5.2.4 Activity Streams ............................ 44 5.2.5 Dashboard ............................... 45 6 Evaluation 47 6.1 Code Preprocessor .............................. 48 6.1.1 Partitioning .............................. 48 6.1.2 Chunk Filtering ............................ 49 6.1.3 Automated Commenting ....................... 50 6.1.4 Clustering ............................... 51 6.2 Task Routing ................................. 53 6.3 Reviewing Interface .............................. 55 7 Conclusion 57 7.1 Future Work .................................. 58 7.1.1 Additional Forms of Review Content . 58 7.1.2 Improved Chunk Filtering ...................... 59 7.1.3 Improved Chunk Clustering ..................... 60 7.1.4 Data-Driven Task Routing ...................... 60 7.1.5 Similar Comment Clustering ..................... 61 7.1.6 Community Building ......................... 61 8 List of Figures 5-1 Caesar’s reviewing interface ......................... 37 5-2 Review Board comment discussion interface . 41 5-3 Rietveld reviewing interface ......................... 41 5-4 Comment markers on a chunk ........................ 42 5-5 Comment display and discussion interface . 42 5-6 New comment form for writing a comment . 43 5-7 Interface for browsing submission code ................... 44 5-8 User contribution activity stream ...................... 45 5-9 Dashboard interface with submissions and task assignments . 46 6-1 Checkstyle comment density distributions . 51 6-2 Chunk cluster size distributions ....................... 52 List of Tables 3.1 Checkstyle violations for example code in Listing 3.1 . 28 6.1 Assignment statistics ............................. 48 6.2 Pre-filtering chunk size distribution ..................... 49 6.3 Post-filtering chunk size distribution ..................... 49 6.4 Chunks removed per submission by preprocessor filters . 50 6.5 Generated Checkstyle comment counts ................... 50 9 6.6 Task assignment student co-reviewer interactions . 54 6.7 Task assignment metrics by submission ................... 54 Listings 3.1 Example Checkstyle input .......................... 28 4.1 Simplified code for task routing ....................... 33 10 Chapter 1 Introduction Software code review is the seemingly simple practice of human examination of source code to improve software quality. This can range from informal “over the shoulder” code reviews to mandatory company-wide review policies supported by specialized code review tools. Regardless of its implementation, code review provides not only the direct effect of catching potential programmer errors, but also the indirect, and often more valuable, effect of spreading developer knowledge and expertise among those involved. Code review as an existing practice in the software engineering industry and open source community generally assumes: that the unit of code being reviewed is a set of changes against an existing codebase (incremental code review), that reviewer assignment is an ad hoc decision made by the submitter, and that the reviewers involved are also themselves developers familiar with the relevant code. Meanwhile, code review as it exists in industry has found little traction in program- ming education. While other software engineering best practices from industry like the use of static analysis tools, version control systems, unit testing, pair programming, and rapid iterations have all experienced at least some adoption in the classroom, peer review of student source code has remained relatively unexplored. Much of this can be attributed to some of the additional constraints encountered in a classroom setting that challenge or violate the assumptions inherited from the standard code review model, such as: the possibility of plagiarism, the relative inexperience of students, and the task (or burden)
Recommended publications
  • Parasoft Dottest REDUCE the RISK of .NET DEVELOPMENT
    Parasoft dotTEST REDUCE THE RISK OF .NET DEVELOPMENT TRY IT https://software.parasoft.com/dottest Complement your existing Visual Studio tools with deep static INCREASE analysis and advanced PROGRAMMING EFFICIENCY: coverage. An automated, non-invasive solution that the related code, and distributed to his or her scans the application codebase to iden- IDE with direct links to the problematic code • Identify runtime bugs without tify issues before they become produc- and a description of how to fix it. executing your software tion problems, Parasoft dotTEST inte- grates into the Parasoft portfolio, helping When you send the results of dotTEST’s stat- • Automate unit and component you achieve compliance in safety-critical ic analysis, coverage, and test traceability testing for instant verification and industries. into Parasoft’s reporting and analytics plat- regression testing form (DTP), they integrate with results from Parasoft dotTEST automates a broad Parasoft Jtest and Parasoft C/C++test, allow- • Automate code analysis for range of software quality practices, in- ing you to test your entire codebase and mit- compliance cluding static code analysis, unit testing, igate risks. code review, and coverage analysis, en- abling organizations to reduce risks and boost efficiency. Tests can be run directly from Visual Stu- dio or as part of an automated process. To promote rapid remediation, each problem detected is prioritized based on configur- able severity assignments, automatical- ly assigned to the developer who wrote It snaps right into Visual Studio as though it were part of the product and it greatly reduces errors by enforcing all your favorite rules. We have stuck to the MS Guidelines and we had to do almost no work at all to have dotTEST automate our code analysis and generate the grunt work part of the unit tests so that we could focus our attention on real test-driven development.
    [Show full text]
  • Parasoft Static Application Security Testing (SAST) for .Net - C/C++ - Java Platform
    Parasoft Static Application Security Testing (SAST) for .Net - C/C++ - Java Platform Parasoft® dotTEST™ /Jtest (for Java) / C/C++test is an integrated Development Testing solution for automating a broad range of testing best practices proven to improve development team productivity and software quality. dotTEST / Java Test / C/C++ Test also seamlessly integrates with Parasoft SOAtest as an option, which enables end-to-end functional and load testing for complex distributed applications and transactions. Capabilities Overview STATIC ANALYSIS ● Broad support for languages and standards: Security | C/C++ | Java | .NET | FDA | Safety-critical ● Static analysis tool industry leader since 1994 ● Simple out-of-the-box integration into your SDLC ● Prevent and expose defects via multiple analysis techniques ● Find and fix issues rapidly, with minimal disruption ● Integrated with Parasoft's suite of development testing capabilities, including unit testing, code coverage analysis, and code review CODE COVERAGE ANALYSIS ● Track coverage during unit test execution and the data merge with coverage captured during functional and manual testing in Parasoft Development Testing Platform to measure true test coverage. ● Integrate with coverage data with static analysis violations, unit testing results, and other testing practices in Parasoft Development Testing Platform for a complete view of the risk associated with your application ● Achieve test traceability to understand the impact of change, focus testing activities based on risk, and meet compliance
    [Show full text]
  • Devnet Module 3
    Module 3: Software Development and Design DEVASCv1 1 Module Objectives . Module Title: Software Development and Design . Module Objective: Use software development and design best practices. It will comprise of the following sections: Topic Title Topic Objective 3.1 Software Development Compare software development methodologies. 3.2 Software Design Patterns Describe the benefits of various software design patterns. 3.3 Version Control Systems Implement software version control using GIT. 3.4 Coding Basics Explain coding best practices. 3.5 Code Review and Testing Use Python Unit Test to evaluate code. 3.6 Understanding Data Formats Use Python to parse different messaging and data formats. DEVASCv1 2 3.1 Software Development DEVASCv1 3 Introduction . The software development process is also known as the software development life cycle (SDLC). SDLC is more than just coding and also includes gathering requirements, creating a proof of concept, testing, and fixing bugs. DEVASCv1 4 Software Development Life Cycle (SDLC) . SDLC is the process of developing software, starting from an idea and ending with delivery. This process consists of six phases. Each phase takes input from the results of the previous phase. SDLC is the process of developing software, starting from an idea and ending with delivery. This process consists of six phases. Each phase takes input from the results of the previous phase. Although the waterfall methods is still widely used today, it's gradually being superseded by more adaptive, flexible methods that produce better software, faster, with less pain. These methods are collectively known as “Agile development.” DEVASCv1 5 Requirements and Analysis Phase . The requirements and analysis phase involves the product owner and qualified team members exploring the stakeholders' current situation, needs and constraints, present infrastructure, and so on, and determining the problem to be solved by the software.
    [Show full text]
  • How Bad Can It Git? Characterizing Secret Leakage in Public Github Repositories
    How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories Michael Meli Matthew R. McNiece Bradley Reaves North Carolina State University North Carolina State University North Carolina State University [email protected] Cisco Systems, Inc. [email protected] [email protected] Abstract—GitHub and similar platforms have made public leaked in this way have been exploited before [4], [8], [21], [25], collaborative development of software commonplace. However, a [41], [46]. While this problem is known, it remains unknown to problem arises when this public code must manage authentication what extent secrets are leaked and how attackers can efficiently secrets, such as API keys or cryptographic secrets. These secrets and effectively extract these secrets. must be kept private for security, yet common development practices like adding these secrets to code make accidental leakage In this paper, we present the first comprehensive, longi- frequent. In this paper, we present the first large-scale and tudinal analysis of secret leakage on GitHub. We build and longitudinal analysis of secret leakage on GitHub. We examine evaluate two different approaches for mining secrets: one is able billions of files collected using two complementary approaches: a to discover 99% of newly committed files containing secrets in nearly six-month scan of real-time public GitHub commits and a public snapshot covering 13% of open-source repositories. We real time, while the other leverages a large snapshot covering focus on private key files and 11 high-impact platforms with 13% of all public repositories, some dating to GitHub’s creation. distinctive API key formats. This focus allows us to develop We examine millions of repositories and billions of files to conservative detection techniques that we manually and automat- recover hundreds of thousands of secrets targeting 11 different ically evaluate to ensure accurate results.
    [Show full text]
  • Metrics for Gerrit Code Reviews
    SPLST'15 Metrics for Gerrit code reviews Samuel Lehtonen and Timo Poranen University of Tampere, School of Information Sciences, Tampere, Finland [email protected],[email protected] Abstract. Code reviews are a widely accepted best practice in mod- ern software development. To enable easier and more agile code reviews, tools like Gerrit have been developed. Gerrit provides a framework for conducting reviews online, with no need for meetings or mailing lists. However, even with the help of tools like Gerrit, following and monitoring the review process becomes increasingly hard, when tens or even hun- dreds of code changes are uploaded daily. To make monitoring the review process easier, we propose a set of metrics to be used with Gerrit code review. The focus is on providing an insight to velocity and quality of code reviews, by measuring different review activities based on data, au- tomatically extracted from Gerrit. When automated, the measurements enable easy monitoring of code reviews, which help in establishing new best practices and improved review process. Keywords: Code quality; Code reviews; Gerrit; Metrics; 1 Introduction Code reviews are a widely used quality assurance practice in software engineer- ing, where developers read and assess each other's code before it is integrated into the codebase or deployed into production. Main motivations for reviews are to detect software defects and to improve code quality while sharing knowledge among developers. Reviews were originally introduced by Fagan [4] already in 1970's. The original, formal type of code inspections are still used in many com- panies, but has been often replaced with more modern types of reviews, where the review is not tied to place or time.
    [Show full text]
  • Github Pull Request Review
    Github Pull Request Review Archaic and delegable Pierre prenotifying her longboat reactivating while Hartley reflow some intercross fadelessly. Narratable and thickhydropathic when Pincus Francois emerged reintegrate his timing. her bedchambers filigree or deactivate tamely. Unlearned and chiromantic Bearnard never outmeasuring There is merged soon, optimize this can request review status becomes quite clear based on their code management repositories that we would react to uninstall the pros and By dzone contributors, required for projects have fixed by everyone who can. In this palace, the toolbar will show why green Checks donut, a grey Changes revision, and grey zero counters in the remaining boxes. For this page with each other process, critical security expert from empirical and. Do at production data obtained from visual studio code review so that you if you want you selected, you a pull request that bad practice. Github will see? In github pull request review your first was this. Program readability: procedures versus comments. If any change on changes in progress and effective code coverage changes in that all pull request? Stripe is not have made for other reviewers are. Haacked is a blog about Technology, Software, Management, and fast Source. Even if there is in github or bandwidth costs go read way you can be detected by submitting are changes into new posts in github pull request review time for agility, requiring signed out. Rbac rules and code and more hunting down a nice aspect of incoming pr will update it more merge methods to! Review apps will spend some changes might require a pull reminders for.
    [Show full text]
  • Best Practices for the Use of Static Code Analysis Within a Real-World Secure Development Lifecycle
    An NCC Group Publication Best Practices for the use of Static Code Analysis within a Real-World Secure Development Lifecycle Prepared by: Jeremy Boone © Copyright 2015 NCC Group Contents 1 Executive Summary ....................................................................................................................................... 3 2 Purpose and Motivation ................................................................................................................................. 3 3 Why SAST Often Fails ................................................................................................................................... 4 4 Methodology .................................................................................................................................................. 4 5 Integration with the Secure Development Lifecycle....................................................................................... 5 5.1 Training ................................................................................................................................................. 6 5.2 Requirements ....................................................................................................................................... 6 5.3 Design ................................................................................................................................................... 7 5.4 Implementation ....................................................................................................................................
    [Show full text]
  • Code Review Guide
    CODE REVIEW GUIDE 2.0 RELEASE Project leaders: Larry Conklin and Gary Robinson Creative Commons (CC) Attribution Free Version at: https://www.owasp.org 1 F I 1 Forward - Eoin Keary Introduction How to use the Code Review Guide 7 8 10 2 Secure Code Review 11 Framework Specific Configuration: Jetty 16 2.1 Why does code have vulnerabilities? 12 Framework Specific Configuration: JBoss AS 17 2.2 What is secure code review? 13 Framework Specific Configuration: Oracle WebLogic 18 2.3 What is the difference between code review and secure code review? 13 Programmatic Configuration: JEE 18 2.4 Determining the scale of a secure source code review? 14 Microsoft IIS 20 2.5 We can’t hack ourselves secure 15 Framework Specific Configuration: Microsoft IIS 40 2.6 Coupling source code review and penetration testing 19 Programmatic Configuration: Microsoft IIS 43 2.7 Implicit advantages of code review to development practices 20 2.8 Technical aspects of secure code review 21 2.9 Code reviews and regulatory compliance 22 5 A1 3 Injection 51 Injection 52 Blind SQL Injection 53 Methodology 25 Parameterized SQL Queries 53 3.1 Factors to Consider when Developing a Code Review Process 25 Safe String Concatenation? 53 3.2 Integrating Code Reviews in the S-SDLC 26 Using Flexible Parameterized Statements 54 3.3 When to Code Review 27 PHP SQL Injection 55 3.4 Security Code Review for Agile and Waterfall Development 28 JAVA SQL Injection 56 3.5 A Risk Based Approach to Code Review 29 .NET Sql Injection 56 3.6 Code Review Preparation 31 Parameter collections 57 3.7 Code Review Discovery and Gathering the Information 32 3.8 Static Code Analysis 35 3.9 Application Threat Modeling 39 4.3.2.
    [Show full text]
  • Veni, Vidi, Vici
    Code Review: Veni, ViDI, Vici Yuriy Tymchuk, Andrea Mocci, Michele Lanza REVEAL @ Faculty of Informatics - University of Lugano, Switzerland Abstract—Modern software development sees code review as For example many of them leverage static code analysis a crucial part of the process, because not only does it facilitate techniques, like the ones provided by FindBugs [5], to spot the sharing of knowledge about the system at hand, but it may implementation problems. However, the results from such also lead to the early detection of defects, ultimately improving techniques are poorly integrated in a code review process, as the quality of the produced software. Although supported by we will see later. numerous approaches and tools, code review is still in its infancy, and indeed researchers have pointed out a number of We propose an approach to augment code review by inte- shortcomings in the state of the art. grating software quality evaluation, and more general design We present a critical analysis of the state of the art of code assessment, not only as a first class citizen, but as the core review tools and techniques, extracting a set of desired features concern of code review. Our approach, called Visual Design that code review tools should possess. We then present our vision Inspection (ViDI), uses visualization techniques to drive the and initial implementation of a novel code review approach quality assessment of the reviewed system, exploiting data named Visual Design Inspection (ViDI), illustrated through a set obtained through static code analysis. ViDI enables intuitive of usage scenarios. ViDI is based on a combination of visualization and easy defect fixing, personalized annotations, and review techniques, design heuristics, and static code analysis techniques.
    [Show full text]
  • How to Trust Auto-Generated Code Patches? a Developer Survey And
    How to trust auto-generated code patches? A developer survey and empirical assessment of existing program repair tools Yannic Noller∗ Ridwan Shariffdeen∗ National University of Singapore National University of Singapore Singapore Singapore [email protected] [email protected] Xiang Gao Abhik Roychoudhury National University of Singapore National University of Singapore Singapore Singapore [email protected] [email protected] ABSTRACT works on generating multi-line fixes [12, 29], or on transplanting Automated program repair is an emerging technology that seeks patches from one version to another [37] — to cover various use to automatically rectify bugs and vulnerabilities using learning, cases or scenarios of program repair. search, and semantic analysis. Trust in automatically generated Surprisingly, there is very little literature or systematic study patches is necessary for achieving greater adoption of program from either academia or industry — on the developer trust in pro- repair. Towards this goal, we survey more than 100 software practi- gram repair. In particular, what changes do we need to bring into tioners to understand the artifacts and setups needed to enhance the program repair process so that it becomes viable to have conver- trust in automatically generated patches. Based on the feedback sations on its wide-scale adoption? Part of the gulf in terms of lack from the survey on developer preferences, we quantitatively evalu- of trust comes from a lack of specifications — since the intended ate existing test-suite based program repair tools. We find that they behavior of the program is not formally documented, it is hard to cannot produce high-quality patches within a top-10 ranking and trust that the automatically generated patches meet this intended an acceptable time period of 1 hour.
    [Show full text]
  • Code Review Quality
    Code Review Quality Defining a good code review Overview Introduction: A modern code review process The importance of reviews for QA Strength and weaknesses Tools to make (better) reviews VCS: Git - Keeping track of changes. Gerrit: State of the art for git VCS. Continuous inspection: SonarQube How to define „good“? Factors to assess review quality Case study: How Developers see it Improve Review Quality Current research Defining a good code review 2 What is a modern code review process INTRODUCTION Defining a good code review 3 Introduction A modern review process Defining a good code review 4 Introduction The review value - strengths Coding standards compliance . Code review helps to maintain consistent coding style across the company . Reviews find high level breaches of design rules and consistency issues. Finding subtile bugs and defects early . A person is able to find logical flaws and approach problems, that might be very hard to find with tests. when they are cheap to fix. (Before submitting) Teaching and sharing knowledge . During review team members gain better understanding of the code base and learn from each other Strengthens team cohesion . Review discussions save team members from isolation and bring them closer to each other. Defining a good code review 5 Introduction The review value - weaknesses Highly depends on reviewer's experience and code familiarity . Study showed that the ‘worst’ reviewer can be ten times less effective than the best reviewer. The defect coverage is hard to asses . They might find a lot of different sorts of issues while overlook some potentially serious ones. Has a frustration and conflict potential .
    [Show full text]
  • Nudge: Accelerating Overdue Pull Requests Towards Completion
    Nudge: Accelerating Overdue Pull Requests Towards Completion CHANDRA MADDILA, Microsoft Research SAI SURYA UPADRASTA, Microsoft Research CHETAN BANSAL, Microsoft Research NACHIAPPAN NAGAPPAN, Microsoft Research GEORGIOS GOUSIOS, Delft University of Technology ARIE van DEURSEN, Delft University of Technology Pull requests are a key part of the collaborative software development and code review process today. However, pull requests can also slow down the software development process when the reviewer(s) or the author do not actively engage with the pull request. In this work, we design an end-to-end service, Nudge, for accelerating overdue pull requests towards completion by reminding the author or the reviewer(s) to engage with their overdue pull requests. First, we use models based on effort estimation and machine learning to predict the completion time for a given pull request. Second, we use activity detection to filter out pull requests that may be overdue, but for which sufficient action is taking place nonetheless. Lastly, we use actor identification to understand who the blocker of the pull request is and nudge the appropriate actor (author or reviewer(s)). We also conduct a correlation analysis to understand the statistical relationship between the pull request completion times and various pull request and developer related attributes. Nudge has been deployed on 147 repositories at Microsoft since 2019. We perform a large scale evaluation based on the implicit and explicit feedback we received from sending the Nudge notifications on 8,500 pull requests. We observe significant reduction in completion time, by over 60%, for pull requests which were nudged thus increasing the efficiency of the code review process and accelerating the pull request progression.
    [Show full text]