Discovering API Usability Problems at Scale

Emerson Murphy-Hill∞, Caitlin Sadowski♥, Andrew Head†, John Daughtry♥, Andrew Macvean♥, Ciera Jaspan♥, and Collin Winter♥∗
∞North Carolina State University, ♥Google, †University of California, Berkeley
[email protected], [email protected], [email protected], {daughtry,amacvean,ciera,collinwinter}@google.com

ABSTRACT

Software developers' productivity can be negatively impacted by using APIs incorrectly. In this paper, we describe an analysis technique we designed to find API usability problems by comparing successive file-level changes made by individual software developers. We applied our tool, StopMotion, to the file histories of real developers doing real tasks at Google. The results reveal several API usability challenges, including simple typos, conceptual API misalignments, and conflation of similar APIs.

ACM Reference Format:
Emerson Murphy-Hill, Caitlin Sadowski, Andrew Head, John Daughtry, Andrew Macvean, Ciera Jaspan, and Collin Winter. 2018. Discovering API Usability Problems at Scale. In WAPI'18: IEEE/ACM 2nd International Workshop on API Usage and Evolution, June 2–4, 2018, Gothenburg, Sweden. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3194793.3194795

1 INTRODUCTION

Usable APIs are important. Egele and colleagues found that API misuse is widespread, with 88% of applications in the Google Play store having at least one API usage mistake [3]. The authors suggest that the problem could have been mitigated had the API designers made more usable API choices, such as by providing better default values and documentation. Security is one aspect of programming where API usability makes a clear difference, but other domains like reliability and performance may also suffer from poor API usability.

Researchers have created several labor-intensive methods of uncovering API usability issues, including user-centered design [12], heuristic evaluation [2], peer review [8], lab experiments [13], and interviews [11]. While these approaches can uncover important problems, they cannot be expected to scale up to uncover usability issues across a broad sample of programmers using dozens of APIs.

Researchers have created four main techniques for uncovering API usability issues at scale. The first is conducting surveys [9]; while this approach can scale up to a large number of developers, it requires social capital on the part of researchers to elicit responses, and effort on the part of busy developers. Moreover, it can only work effectively if respondents have sufficient memory recall of the API usability challenges they have faced in the past, a problem regularly explored in the social sciences (e.g., [6]). The second technique is to investigate API usability challenges by summarizing them on public discussion forums, like Stack Overflow [9]. The disadvantage is that this technique requires developers to be sufficiently reflective, and sufficiently blocked, to bother posting a question about the API that they are using. The third technique is mining software repositories, such as the Google Play store or GitHub, to look for instances of API misuse [3]. While this approach scales well, such techniques elide a large amount of valuable software evolution data from between commits or releases [10]. The fourth scalable technique is mining data provided by "try it out" web API platforms, where researchers can look at 404 errors that indicate which APIs developers have problems with [7]. However, this approach does not reveal what those problems are.

Our insight is that before a developer correctly uses an API, they may struggle to find the right way to use that API. That struggle reveals the nature of the API usability problem. We capitalize on this insight with a technique that we call StopMotion. StopMotion analyzes snapshots of developers' work history – typically one on every editor save – to reconstruct API usability challenges. By comparing adjacent snapshots, StopMotion finds instances where developers replace one API call with another. When multiple developers replace one API call with another, we infer that there is something confusing about the underlying API. This approach promises to discover API usability problems at scale, inexpensively, and at a high level of granularity.

This paper contributes a sketch of this new approach and initial results. We next discuss our approach, describe our experience in applying it at Google, and discuss some areas for future work.

∗This work took place while Emerson Murphy-Hill was a Visiting Scientist and Andrew Head was an Intern at Google.
2 APPROACH

The goal of StopMotion is to identify API usability problems automatically. It works in several steps, which we describe below.

The first step is to identify code patches of interest. At Google, patches are peer reviewed before being merged into the company's main codebase. Patches contain metadata, such as the person proposing the change, when the change was proposed, and the files that the patch changes. Our tool selects recent patches that change at least one existing Java file.

The second step is to identify all edits that have gone into a patch. For most software development at Google, file versions are recorded automatically by a FUSE-based file system, such that every save from the developer's editor is recorded as a separate version. Furthermore, some editors at Google save automatically at regular intervals, such as every 30 seconds. To access this data, we process the workspace associated with the patch. The workspace contains a history of every file changed in that workspace by a developer. Sometimes developers create a new workspace for each patch, but workspaces may also be reused for multiple patches. To deal with workspace reuse, we analyze only the files changed in association with the patch, by analyzing the changes between when the patch was submitted for peer review and when the developer's prior patch was merged into the main codebase.

For each edit, we compare each file that was changed to its prior version. To find changes to API client code, we use the abstract syntax tree (AST) differencing tool GumTree [5], which for Java uses Eclipse's Java Development Toolkit (JDT) as a back end. We found that GumTree often produces inaccurate results when syntax errors are present, and that many snapshots contained such errors; we discard those snapshots. Through manual inspection, we observed that by excluding programs with errors, we miss many interesting API usability problems, such as when a developer has typed the name of the method they want to call but has not yet added the parentheses and semicolon that would make the program syntactically valid. However, we judged the false positive rate too high to continue analysis when errors are present.

Our approach looks for particular fixed patterns of client API changes. Our tool currently detects two similar patterns:
• The developer changes an instance method call, in the form obj.a(...) to obj.b(...). The tool records the fully qualified type of the object and the method name before and after.
• The developer changes a static method call, in the form Class.a(...) to Class.b(...). The tool records the fully qualified class name and the two method names.
The tool also records timestamps of when these changes occurred, who made the changes, and the associated patch.
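As a concrete illustration of the first pattern, consider the following minimal sketch. It is our simplification, not the tool itself: it uses Eclipse JDT directly rather than GumTree's tree matching, aligns calls by position rather than by AST mapping, and keys receivers on their source text rather than on resolved, fully qualified types.

import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MethodInvocation;
import java.util.ArrayList;
import java.util.List;

class CallDiff {
  // Collects, in source order, (receiver text, method name) pairs for
  // every method invocation in one snapshot.
  static List<String[]> calls(String source) {
    ASTParser parser = ASTParser.newParser(AST.JLS8);
    parser.setKind(ASTParser.K_COMPILATION_UNIT);
    parser.setSource(source.toCharArray());
    CompilationUnit unit = (CompilationUnit) parser.createAST(null);
    List<String[]> calls = new ArrayList<>();
    unit.accept(new ASTVisitor() {
      @Override
      public boolean visit(MethodInvocation node) {
        // StopMotion records the receiver's fully qualified type; for
        // brevity this sketch keeps only the receiver's source text.
        String receiver = node.getExpression() == null
            ? "<implicit>" : node.getExpression().toString();
        calls.add(new String[] {receiver, node.getName().getIdentifier()});
        return true;
      }
    });
    return calls;
  }

  // Compares adjacent snapshots: the same receiver in the same position
  // with a different method name is a candidate obj.a(...) -> obj.b(...).
  static void report(String before, String after) {
    List<String[]> olds = calls(before);
    List<String[]> news = calls(after);
    for (int i = 0; i < Math.min(olds.size(), news.size()); i++) {
      if (olds.get(i)[0].equals(news.get(i)[0])
          && !olds.get(i)[1].equals(news.get(i)[1])) {
        System.out.println(olds.get(i)[0] + ": "
            + olds.get(i)[1] + " -> " + news.get(i)[1]);
      }
    }
  }
}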
3 STUDY

3.1 Methodology

We ran our tool in an informal study at Google on a single multicore workstation. We analyzed the most recent changes from thousands of developers working on real development projects over several weeks in July of 2017. We analyzed the output of the tool manually. There was a long tail of results – many changes were made only once in our analysis period – so we discarded such results in our analysis. In the reporting in this paper, we include some count data, in the form (n = ...), to give a rough estimate of the frequency with which changes occurred.

3.2 Collections

Collections are one of the most commonly used APIs at Google. Google developers use both an open source Google version of collections (com.google.common.collect) and Java collections (java.util).

One of the most common changes we observed (n = 174) is changing an immutable collection, such as ImmutableList, from calling the method of to calling copyOf. Both methods are static and return an instance of the collection. The method of takes any number of arguments of type T, and copyOf takes different kinds of collections of T (array, Iterator, Iterable, Collection). Our intuition about the of/copyOf change is that the names of the methods do not distill the essence of the difference between the two.

A similar change (n = 81) was changing a Java collection's instance method add to addAll. These methods add an item of type T to the collection and add a collection of items of type T, respectively. Again, these methods are conceptually similar, but arguably do a better job than of/copyOf of conveying their purpose. One might posit a simpler solution of overloading the methods so they are all called add; however, this would remove the ability to create collections of collections.

Another common change (n = 150) was changing a call to size to a call to isEmpty. While these two methods have quite different return values, why a developer might make this change is clear: in a conditional over a collection c, if(c.size()==0) is made more intention-revealing by writing if(c.isEmpty()). This refactoring is suggested by a popular static analysis tool at our company. The relative infrequency of changes from isEmpty to size (n = 22) provides some confirmation of this interpretation. Similarly, we observed developers changing add to put on maps (n = 56); add does not exist in Map, but does exist in other collections.
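To see the of/copyOf trap concretely, consider the following sketch (our example, not taken from the study data). Passing an existing collection to of compiles fine, but treats the collection as a single element rather than copying its contents:

import com.google.common.collect.ImmutableList;
import java.util.List;

class OfVersusCopyOf {
  public static void main(String[] args) {
    List<String> names = List.of("ada", "grace");
    // of(...) treats its arguments as elements, so the whole list
    // becomes a single element...
    ImmutableList<List<String>> wrapped = ImmutableList.of(names); // size() == 1
    // ...whereas copyOf(...) copies the list's elements.
    ImmutableList<String> copied = ImmutableList.copyOf(names);    // size() == 2
    System.out.println(wrapped.size() + " vs " + copied.size());   // "1 vs 2"
  }
}

The example also hints at why collapsing such pairs into a single overloaded name would be problematic: the type system could then no longer distinguish adding a collection as an element from copying its elements.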
3.3 Protocol Buffers

Remote procedure calls are critical to Google given the number of calls being made. When you make 10 billion API calls every second, optimizing calls is important [1]. Protocol buffers are a language-neutral data structure optimized for serialization. Given that protocol buffers are focused on serialization, they have their own representation of a byte string (com.google.protobuf.ByteString). We found evidence of confusion about how to copy data into a ByteString. There are 8 different methods whose names start with copyFrom. We found evidence of programmers using copyFrom and later changing their method call to copyFromUtf8 (n = 27). The specificity of copyFromUtf8 is a design choice that was not required, as there is no corresponding copyFrom(String) method on the class. However, it forces developers to acknowledge the encoding.
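The following sketch of ours illustrates that design point: raw bytes can be copied directly, but for text the caller must name an encoding, either implicitly in the method name or explicitly as an argument.

import com.google.protobuf.ByteString;
import java.nio.charset.StandardCharsets;

class ByteStringCopies {
  public static void main(String[] args) {
    // Raw bytes: copyFrom works directly.
    ByteString bytes = ByteString.copyFrom(new byte[] {1, 2, 3});
    // Text: there is no copyFrom(String), so the encoding must be
    // acknowledged, either in the method name...
    ByteString utf8 = ByteString.copyFromUtf8("hello");
    // ...or as an explicit charset argument.
    ByteString explicit = ByteString.copyFrom("hello", StandardCharsets.UTF_8);
    System.out.println(utf8.equals(explicit)); // true
  }
}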

3.4 Optional

One of the most common APIs that our tool returned results for was java.util.Optional and com.google.common.base.Optional (from Google's Guava libraries), which we'll refer to as Java Optional and Guava Optional, respectively. Both of these Optional classes serve the same purpose: wrapping an object so that it is possible to check whether the object has been set via an explicit API call, without resorting to null values, which can lead to runtime exceptions. Although Java Optional is recommended for new code, developers use both Optional APIs, since Guava Optional predates Java Optional and there is a lot of legacy code. Here is an example of how either Optional class might be used:

Optional<String> optStr = getString();
if (optStr.isPresent()) {
    return optStr.get();
}

[Figure 1 here: a node-link graph of Optional method names (of, ofNullable, fromNullable, absent, empty, isPresent, isAbsent, isEmpty, get, ifPresent, map, flatMap, or, orNull, orElse, orElseGet, and the typo abset), with edges showing observed changes.]

Figure 1: Changes in clients using Optional APIs.

Figure 1 shows a graph that illustrates API usability challenges in using the two Optional APIs. This graph was created using manually filtered data from StopMotion, fed into GraphViz [4], then tweaked manually for visual clarity. The graph should be read as follows:
• Nodes represent methods. Yellow methods exist in Guava Optional; blue methods exist in Java Optional; blue-yellow gradient methods exist in both APIs; and grey methods exist in neither. For example, the method of exists in both Java and Guava Optional, but isEmpty exists in neither.
• Edges from method a to method b represent developers calling a in one snapshot, then b in the next. Edge color indicates the receiver object's type, either Java or Guava's Optional. For example, the topmost edge in the graph represents developers who had a Guava Optional object, called a non-existent abset method, but then changed it to absent.
• Edge thickness represents the number of instances of the indicated change. Edge weight is log2(n) − 1, where n is the number of instances. The thickest edge (absent to empty with a Java Optional object) represents 189 changes. Multiple changes may have been made by the same developer.
• Dashed edges connect sister methods of equivalent or approximately equivalent functionality. For example, absent and empty are equivalent methods on each API.
• For simplicity, only changes that occurred more than 3 times are shown.
• The plot does not explicitly distinguish between static and instance methods, though they are visible implicitly; the subgraph in the upper left represents static methods, and the subgraph in the lower right represents instance methods.

The graph illustrates several confusions Google developers have about using these APIs. Developers confuse one Optional API for the other, as evidenced by looking at the sister methods. For example, developers often interchange fromNullable and ofNullable, given their similar names. Likewise, they interchange absent and empty. The case of Java's orElse and Guava's or and orNull is similar, but somewhat more complicated. Overall, the similarity – but not exact similarity – between these two APIs appears to regularly confuse developers.

The abset node indicates a simple – but repeated – typo, where developers intend to call Guava Optional's absent. For both APIs, many developers expected an isEmpty or isAbsent method, when instead they end up using the isPresent method, which returns a boolean indicating whether the contained object is present. The mistaken expectation reveals an interesting inconsistency in both APIs: after using the static constructor methods absent and empty, one might expect an Optional object to support an isAbsent or isEmpty method. However, the APIs do not; instead, the semantics are reversed, supporting only an isPresent method. The addition of an isEmpty method has been discussed for Guava's Optional, but was eventually dismissed as not having a clear enough advantage.¹ Our tool provides some empirical data to help inform the opinions expressed in that discussion.

¹https://github.com/google/guava/issues/734
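As a reading aid for Figure 1, this sketch of ours lines up the sister methods, with the Guava form shown in a comment beside each Java form; the near-miss names are exactly the pairs developers interchange.

import java.util.Optional;

class SisterMethods {
  public static void main(String[] args) {
    Optional<String> none = Optional.empty();          // Guava: Optional.absent()
    Optional<String> some = Optional.ofNullable("hi"); // Guava: Optional.fromNullable("hi")

    String a = some.orElse("fallback");                // Guava: some.or("fallback")
    String b = none.orElse(null);                      // Guava: none.orNull()

    // Both APIs expose only the positive check; at the time of the study
    // neither offered an isAbsent() or isEmpty().
    boolean present = some.isPresent();                // Guava: some.isPresent()
    System.out.println(a + ", " + b + ", " + present); // "hi, null, true"
  }
}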

3.5 Logging

Developers add logging statements to get some visibility into what is happening inside running programs. When debugging an issue, developers may log fine-grained information about an executing program, but debug logging may be too expensive or verbose to use when running in production. At Google, this problem is addressed by adding log statements at specific levels, including error, warning, and info. The log level to use is specified at runtime, and only those log statements at or above that level are printed.

We discovered that developers often do not know which log level to use when creating log statements. Figure 2 shows a graph for the Android logging API, showing transitions between logging methods at different levels: v for a verbose message, d for a debug message, i for info, w for a warning, e for an error, and wtf² for a failure that should never happen. In this graph, the numbers of change instances are labeled on the edges, and no edges are elided.

[Figure 2 here: a node-link graph over the logging methods v, d, i, w, e, and wtf, with edge labels giving change counts.]

Figure 2: Changes in clients using Android's logging API.

We note that no non-existent methods are present here, meaning typos were rare. We also observe that the graph is largely evenly bidirectional, meaning that for most edges from a to b, there is an edge from b to a of approximately equal weight. This suggests that developers were not necessarily confused about what each method does, but instead were unsure which one was appropriate in a given context. The only exception is the d(ebug) to e(rror) transition; there were half as many opposite transitions.

²What a Terrible Failure. developer.android.com/reference/android/util/Log.html
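For readers unfamiliar with the API, here is how the six methods of Figure 2 appear in client code; the tag and messages below are hypothetical.

import android.util.Log;

class CheckoutLogger {
  private static final String TAG = "Checkout";

  void logAtEachLevel() {
    // In increasing severity; Figure 2 suggests developers mostly waver
    // between adjacent levels rather than mistyping the method names.
    Log.v(TAG, "entering item loop");       // verbose
    Log.d(TAG, "cart size = 3");            // debug
    Log.i(TAG, "checkout started");         // info
    Log.w(TAG, "payment retry needed");     // warning
    Log.e(TAG, "payment failed");           // error
    Log.wtf(TAG, "cart total is negative"); // "what a terrible failure"
  }
}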

4 LIMITATIONS AND FUTURE WORK

Although our initial work is promising, a variety of improvements are needed to ready this tool for practical use. Long-term, we envision this tool running constantly over developers' snapshots, providing designers feedback on how developers are using, and struggling to use, their APIs. Although our tool capitalizes on the development environment at Google, we predict that in the future, all software development will be performed in the cloud, enabling this kind of analysis in a variety of organizations. The recent emergence of powerful cloud-based IDEs hints at this future. Nonetheless, the value of improving the usability of APIs must be balanced with developers' needs for privacy and the ability to experiment freely.

Our existing tools could be enhanced in a variety of ways. One way is to support more patterns of API change, beyond simple method changes. Another way is to support more languages than Java. Using more frequent snapshots (say, automatically snapshotting a developer's code every second) promises to enhance our ability to identify API usability problems. However, it also makes analysis more challenging, because more snapshots would contain syntax errors. Reliably performing program analysis in the presence of syntax errors, perhaps by leveraging past snapshots of the same code, could substantially improve StopMotion's ability to find API usability problems.

A significant area for improvement of our approach is in the analysis, summarization, and visualization of results. The analysis of the results done here was largely manual, which we found laborious and requiring expertise in the APIs under inspection.

The analysis presented in this paper also focused exclusively on failures as opposed to successes. For example, methods that are used often without failure might help illuminate patterns in the cases where we do see failure. For example, we saw multiple cases of issues with differentiation by specificity (add vs. addAll and copyFrom vs. copyFromUtf8). Does differentiation by specificity work well in other places, and what makes those designs different?

It also remains difficult to automatically distinguish between true API usability problems and other kinds of code changes. For example, we often found that developers would copy and paste a statement, then replace the statement's method call with a different call, as sketched below. This does not necessarily indicate any problem with the API being used, but instead reflects the frequency with which the API is used among our population of developers.
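To illustrate, consider this hypothetical pair of snapshots (the Stats interface is invented for the example): the change matches our obj.a(...) to obj.b(...) pattern, yet reflects editing habit rather than API confusion.

class CopyPasteFalsePositive {
  interface Stats {                   // hypothetical API, for illustration only
    void recordLatency(long millis);
    void recordErrorCount(long count);
  }

  // Snapshot 1: the developer pasted a copy of the first line...
  void beforeEdit(Stats stats, long elapsed, long errors) {
    stats.recordLatency(elapsed);
    stats.recordLatency(elapsed);     // pasted, not yet edited
  }

  // Snapshot 2: ...then changed the pasted call to the one they wanted.
  void afterEdit(Stats stats, long elapsed, long errors) {
    stats.recordLatency(elapsed);
    stats.recordErrorCount(errors);   // looks like recordLatency -> recordErrorCount
  }
}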

Likewise, it remains difficult to suss out the "worst" API usability problems. We originally analyzed our results by starting with the most frequent changes first (e.g., absent to empty in Optional was the most frequent change), but these turned out to not necessarily be the biggest API usability problems; instead, they simply reflected the most common APIs used in the company. We believe that a variety of heuristics could be used to find the most severe API usability problems. For instance, the longer a developer takes to converge on the final, desired API usage, the more time they are wasting, and arguably the more severe the problem. Another heuristic we think is promising is looking at developer API experience; changes from developers who have not used the API before are more likely to reflect actual API usability problems, compared to changes from developers who are already familiar with the API.

REFERENCES
[1] Tim Burks. 2017. I got a golden ticket: What I learned about APIs in my first year at Google. (September 2017). medium.com.
[2] Steven Clarke. 2005. Describing and measuring API usability with the cognitive dimensions. In Cog. Dimensions of Notations: 10th Ann. Workshop. 131–132.
[3] Manuel Egele, David Brumley, Yanick Fratantonio, and Christopher Kruegel. 2013. An empirical study of cryptographic misuse in Android applications. In Proceedings of the Conference on Computer & Communications Security. 73–84.
[4] John Ellson, Emden Gansner, Lefteris Koutsofios, Stephen C. North, and Gordon Woodhull. 2001. Graphviz – open source graph drawing tools. In International Symposium on Graph Drawing. Springer, 483–484.
[5] Jean-Rémy Falleri, Floréal Morandat, Xavier Blanc, Matias Martinez, and Martin Monperrus. 2014. Fine-grained and accurate source code differencing. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering. ACM, 313–324.
[6] Cynthia A. Graham, Joseph A. Catania, Richard Brand, Tu Duong, and Jesse A. Canchola. 2003. Recalling sexual behavior: a methodological analysis of memory recall bias via interview using the diary as the gold standard. Journal of Sex Research 40, 4 (2003), 325–332.
[7] Andrew Macvean, Luke Church, John Daughtry, and Craig Citro. 2016. API Usability at Scale. In 27th Annual Workshop of the Psychology of Programming Interest Group – PPIG 2016. 177–187.
[8] Andrew Macvean, Martin Maly, and John Daughtry. 2016. API design reviews at scale. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. ACM, 849–858.
[9] Sarah Nadi, Stefan Krüger, Mira Mezini, and Eric Bodden. 2016. Jumping through hoops: why do Java developers struggle with cryptography APIs?. In Proceedings of the 38th International Conference on Software Engineering. ACM, 935–946.
[10] Stas Negara, Mohsen Vakilian, Nicholas Chen, Ralph E. Johnson, and Danny Dig. 2012. Is it dangerous to use version control histories to study source code evolution?. In European Conference on Object-Oriented Programming. 79–103.
[11] Marco Piccioni, Carlo A. Furia, and Bertrand Meyer. 2013. An empirical study of API usability. In 2013 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement. IEEE, 5–14.
[12] Jeffrey Stylos, Benjamin Graf, Daniela K. Busse, Carsten Ziegler, Ralf Ehret, and Jan Karstens. 2008. A case study of API redesign for improved usability. In IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE, 189–192.
[13] Jeffrey Stylos and Brad A. Myers. 2008. The implications of method placement on API learnability. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 105–112.