Predicting Developers' Negative Feelings About Code Review
Carolyn D. Egelman¹, Emerson Murphy-Hill¹, Elizabeth Kammer¹, Margaret Morrow Hodges², Collin Green¹, Ciera Jaspan¹, James Lin¹
¹Google, ²Artech
{cegelman,emersonm,eakammer,hodgesm,colling,ciera,jameslin}@google.com

ABSTRACT
During code review, developers critically examine each other’s code to improve its quality, share knowledge, and ensure conformance to coding standards. In the process, developers may have negative interpersonal interactions with their peers, which can lead to frustration and stress; these negative interactions may ultimately result in developers abandoning projects. In this mixed-methods study at one company, we surveyed 1,317 developers to characterize the negative experiences and cross-referenced the results with objective data from code review logs to predict these experiences. Our results suggest that such negative experiences, which we call “pushback”, are relatively rare in practice, but have negative repercussions when they occur. Our metrics can predict feelings of pushback with high recall but low precision, making them potentially appropriate for highlighting interactions that may benefit from a self-intervention.

KEYWORDS
code review, interpersonal conflict

ACM Reference Format:
Carolyn D. Egelman, Emerson Murphy-Hill, Elizabeth Kammer, Margaret Morrow Hodges, Collin Green, Ciera Jaspan, James Lin. 2020. Predicting Developers’ Negative Feelings about Code Review. In 42nd International Conference on Software Engineering (ICSE ’20), May 23–29, 2020, Seoul, Republic of Korea. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3377811.3380414

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
ICSE ’20, May 23–29, 2020, Seoul, Republic of Korea
© 2020 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-7121-6/20/05.
https://doi.org/10.1145/3377811.3380414

1 INTRODUCTION
It is well established that modern code review provides many benefits for a software organization, including finding defects [3, 17], knowledge sharing [3, 17, 19], and improving software maintenance [19]. However, code review can also lead to interpersonal conflict in the workplace; prior research in the social sciences describes interpersonal conflicts as a “consequential stressor in the workplace” [21]. Anecdotally, professional developers have reported their experiences with stressful [2], toxic [16], and insufferable [7] reviewers of their code. A code review in 2015 of Linux kernel code illustrates conflict vividly:

    Christ people. This is just sh*t. The conflict I get is due to stupid new gcc header file crap. But what makes me upset is that the crap is for completely bogus reasons...anybody who thinks that the above [code snippet] is (a) legible (b) efficient (even with the magical compiler support) (c) particularly safe is just incompetent and out to lunch. The above code is sh*t, and it generates shit code. It looks bad, and there’s no reason for it.

Just like in the physical workplace, such interactions can have significant consequences. For example, the Linux kernel lost at least one developer due to its apparent toxic culture [22]:

    I’m not a Linux kernel developer any more...Given the choice, I would never send another patch, bug report, or suggestion to a Linux kernel mailing list again...I would prefer the communication style within the Linux kernel community to be more respectful. I would prefer that maintainers find healthier ways to communicate when they are frustrated. I would prefer that the Linux kernel have more maintainers so that they wouldn’t have to be terse or blunt.

Unfortunately, we have little systematic understanding of what makes a code review go bad. This is important for three reasons. First, from an ethical perspective, we should seek to make the software engineering process fair and inclusive. Second, from a competitiveness perspective, organizations must retain software engineering talent. Third, happier developers report increased feelings of well-being and productivity [9].

This paper seeks to understand and measure negative experiences in code review. This paper contributes the first study, to the authors’ knowledge, that quantifies feelings of pushback in the code review process. Measurement enables understanding of the prevalence of the bad experiences, whether negative experiences are occurring at different rates in subpopulations of developers, and whether initiatives aimed at reducing negative experiences, like codes of conduct, are working. In a first step towards enabling new initiatives, we use interviews, surveys, and log data from a single company, Google, to study pushback, which we define as “the perception of unnecessary interpersonal conflict in code review while a reviewer is blocking a change request.” We articulate five main feelings associated with pushback, and we create and validate logs-based metrics that predict those five feelings of pushback. This paper focuses on practical measurement of feelings of pushback rather than developing theories around the root causes of pushback happening or why developers feel pushback in different ways.

We asked the following research questions:
RQ1: How frequent are negative experiences with code review?
RQ2: What factors are associated with pushback occurring?
RQ3: What metrics detect author-perceived pushback?

2 METHOD
To study negative behaviors in code reviews, we combine qualitative and quantitative methods using surveys and log data from Google, a large international software development company. We developed three log-based metrics, informed by interviews with a diverse group of 14 developers, to detect feelings of pushback in code reviews and validated those metrics through a survey with a stratified sample of 2,500 developers that covered five feelings of pushback; 1,317 developers completed the survey. In the survey we collected qualitative feedback and asked respondents to volunteer code reviews that matched a list of problematic behaviors.

2.1 Code Review at Google
At Google, code review is mandatory. We call the artifact being evaluated in a code review a change request, or CR for short. When a change request is ready, its author seeks acceptance through code review, meaning that the reviewer certifies that they have reviewed the change and it looks okay to check in. A given code review may be performed by one or more developers. Authors choose their reviewers, or use a tool-recommended reviewer. Once each reviewer’s concerns (if any) have been addressed, the reviewer accepts the change and the author merges the code into the codebase. Unlike in the open source community, almost all code that is reviewed is eventually merged, which makes metrics like “acceptance rate,” which are common in studies on open source code reviews, inappropriate in the Google context.

Additionally, any code that is checked into the company’s code repository must comply with internal style guides. To ensure compliance, code must either be written or reviewed by a developer with readability in the programming language used, which is a certification that a Google developer earns by having their code evaluated by already-certified developers. More information about

[...] junior developers, included racial/ethnic diversity, and consisted entirely of women.

Developers’ experience with code review is mostly positive. All 14 interviewees viewed the code review process as helpful, and newer developers particularly appreciated its inherent mentorship. Nonetheless, the interviews pointed out the challenges developers face during code review.

Defining pushback. To inform our definition of pushback, we asked developers what pushback meant to them. While their definitions largely aligned with our a priori notion that acceptance was being withheld by a reviewer, they included the idea that pushback also contained interpersonal conflict. Interpersonal conflict was also a recurring theme in the instances of pushback that interviewees shared during the interviews. Thus, our definition of pushback is “the perception of unnecessary interpersonal conflict in code review while a reviewer is blocking a change request.”

Dealing with Pushback. Interviewees talked about their reactions, working towards resolution, and the consequences of the pushback they received:

• Initial reaction. Aggressive comments can provoke strong emotional reactions, followed by a desire to seek social support from trusted colleagues. Participants talked about shock, frustration, second guessing themselves, and a sense of guilt or shame that came from reading aggressive comments. From a behavioral standpoint, participants discussed asking teammates’ opinions, asking a manager or team lead for advice, reviewing the code change and related materials, and reviewing code review guidelines or documentation.
• Investing time to work towards a resolution. Participants brought up adding comments asking for clarification, talking directly with reviewers (within the code review tool, over chat, video conference,