

From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis
Mark Harman∗, Peter O’Hearn∗
∗Facebook London and University College London, UK

Abstract—This paper1 describes some of the challenges and research questions that target the most productive intersection opportunities when deploying static and dynamic analysis at we have yet witnessed: that between exciting, intellectually scale, drawing on the authors’ experience with the Infer and challenging science, and real-world deployment impact. Sapienz Technologies at Facebook, each of which started life as a research-led start-up that was subsequently deployed at scale, Many industrialists have perhaps tended to regard it unlikely impacting billions of people worldwide. that much academic work will prove relevant to their most The paper identifies open problems that have yet to receive pressing industrial concerns. On the other hand, it is not significant attention from the scientific community, yet which uncommon for academic and scientific researchers to believe have potential for profound real world impact, formulating these that most of the problems faced by industrialists are either as research questions that, we believe, are ripe for exploration and that would excellent topics for research projects. boring, tedious or scientifically uninteresting. This sociological phenomenon has led to a great deal of miscommunication between the academic and industrial sectors. I.INTRODUCTION We hope that we can make a small contribution by focusing on the intersection of challenging and interesting scientific How do we transition research on static and dynamic problems with pressing industrial deployment needs. Our aim analysis techniques from the testing and verification research is to move the debate beyond relatively unhelpful observations communities to industrial practice? Many have asked this we have typically encountered in, for example, conference question, and others related to it. A great deal has been said panels on industry-academia collaboration. Here is a list of about barriers to adoption of such techniques in industry, and such (perhaps uncharitably paraphrased) observations: the question and variations of it form the basis for the peren- nial panel sessions that spring up at countless international 1) Irrelevant: ‘Academics need to make their research conferences. more industrially-relevant’, In this paper, we wish to make a contribution to this 2) Unconvincing: ‘Scientific work should be evaluated on discussion informed by our experience with the deployment large-scale real-world problems’, of the static analysis technique Infer [23], [28], [29], [109], 3) Misdirected: ‘Researchers need to spend more time and the dynamic analysis technique Sapienz [45], [89], [91], understanding the problems faced by the industry’, at Facebook. 4) Unsupportive: ‘Industry needs to provide more funding We do not claim to have all, nor even many answers, for research’, and our experience may be somewhat specific to continuous 5) Closed System: ‘Industrialists need to make engineer- deployment in the technology sector in general and, perhaps in ing production code and engineering artifacts some places, to Facebook alone in particular. Nevertheless, we available for academic study’, and believe that our relatively unusual position as both professors 6) Closed Mind: ‘Industrialists are unwilling to adopt (in Programming Languages and ) and promising research’. 
also engineering managers/engineers in a large tech sector While some of these observations may be true some of company, may offer us a perspective from which we can make the time, focussing any further attention on them leads both a few useful contributions to this ongoing ‘deployment debate’. communities into an unproductive dead end; none of these We explain some of the myths, prevalent in Programming paraphrased quotations provides much practical actionable Languages and Software Engineering research communities, guidance that can be used by the scientific research community many of which we have, ourselves, initially assumed to be to improve the deployability of research prototypes in industry. merely ‘common sense’, yet which our more recent experience We believe (and have found in practice) that practitioners in industry has challenged. We also seek to identify attributes are, in the right circumstances, open minded and supportive of research prototypes that make them more or suitable and willing to adopt and consider academic research proto- to deployment, and focus on remaining open problems and types. More specifically, With the right kind of scientific evidence in 1This paper accompanies the authors’ joint keynote at the 18th IEEE In- ternational Working Conference on Analysis and Manipulation, the evaluation, industrialists are very willing to September 23rd-24th, 2018 - Madrid, Spain adopt, deploy and develop research. 2

In this paper we try to elucidate how the research commu- optimising for flaky tests. We characterise this as the need nity might further and better provide such scientific evidence. to recognise that the world into which we deploy testing The research prototypes need not be immediately deployable. and verification techniques is increasingly one where we This would require significant additional engineering effort could achieve greater impact by assuming that all tests that the academics could not (and should not) undertake. are flaky. However, with the right kind of research questions and con- • TERF Ratio. Section VII. The Test--to- sequent evidence in the scientific evaluation, researchers can Release-Frequency (TERF) Ratio is changing. We surface demonstrate an elevated likelihood that, with such further the issue of accounting for (and reducing) test execution engineering effort from industry, there will exist effective and cost in terms of this ratio in Section VII. efficient deployment routes. • Fix Detection. Section VIII: We had thought, before we Even where the immediate goal of the research is not migrated to our industrial roles, that the problem of deter- deployment, we believe that consideration of some of these re- mining whether a bug is fixed was a relatively trivial and search questions and scientific evaluation criteria may improve entirely solved problem. That was another misconception; the underlying science. For example, by clarifying the intended the problem of fix detection is a challenging one and, deployment , the researcher is able to widen the we believe, has not received sufficient attention from the number of techniques that become plausible thereby ensuing research community. that the community retains promising techniques that might • Testability Transformation. Section IX: As a com- happen to fall foul of some, in vogue evaluation criterion. munity, we regularly use code transformation in order In the paper we provide a set of open problems and to facilitate testing and verification, for example, by challenges for the scientific research community in testing and mocking procedures and modeling subsystems for which verification. We believe that these open problems are, and will code is unavailable. Sadly, this practice lacks a fully remain, pertinent to models that exhibit formal foundation, a problem that we briefly touch on and deployment. in Section IX. Although this constitutes a large section of the overall indus- • Evaluation. Section X: We would like to suggest some trial sector, we do not claim that these challenges necessarily possible dimensions for scientific evaluation of research retain their importance when extended beyond continuous prototypes, results along which we believe would provide integration and deployment to other parts of the Software Engi- compelling evidence for likely deployability. These evalu- neering sector. We would be interested to explore collaboration ation criteria, discussed in Section X, are not an additional opportunities that seek to tackle these open problems. burden on researchers. Rather, they offer some alternative The paper is structured as follows: ways in which research work may prove actionable and • Background. Section II: Our background lies in the important, even if it might fail to meet currently accepted scientific community, while the principal focus of our evaluation criteria. current work is in industrial deployment of static and • Deployability. 
Section XI: Our experience lies primary dynamic analysis. Section II provides a brief overview of in the challenges of deployment within continuous inte- this industrial deployment to set this paper in the context gration environments, which are increasingly industrially of our recent industrial experience. prevalent [101]. In Section XI we describe some of the • ROFL . Section III: We initially fell into a trap, lessons we learned in our efforts to deploy research on which we characterise as believing in a myth, the ‘Report testing and verification at Facebook, focusing on those Only Fault/Failure List’ (ROFL) myth, which we describe lessons that we believe to be generic to all continuous in Section III. We want to surface this myth in Section III, integration environments, but illustrating with specifics because we believe it may still enjoy widespread tacit from Facebook’s deployment of Infer and Sapienz. support in the approach to research adopted by the • Find, Fix and Verify. Section XII: Finally, we could community. We have experienced the pernicious effect not conclude without returning to a grand challenge. The that misplaced ROFL belief can have on the deployability challenge combines the two authors’ research interests of research work. and, we believe, has the potential for profound and • Compositionality and Incrementality. Sections IV and lasting impact on both the scientific and practitioner V: We have found that the related, but distinct, properties community. The challenge is the find-fix-verify challenge of compositionality and incrementality are important to (FiFiVerify), which we believe is within the grasp of the scalability of testing and verification through static the scientific community. Achieving FiFiVerify would and dynamic analysis. lead to practical deployed systems that are able to find • Assume Tests Are Flaky. Section VI: A flaky test is one failing functionality and poor performance, automatically for which a failing execution and a passing execution fix them, and automatically verify the fixes’ correctness are observed on two different occasions yet for both and the absence of regressions as a result of fixes. executions, all environmental factors that the tester seeks The FiFiVerify challenge has been described previously to control remain identical. We believe more work is elsewhere [68]. In Section XII, we generalise it to the needed on flakiness of tests. We discuss this problem FiGiVerify problem of finding (Fi) functional or non- in Section VI, where our focus is on moving beyond functional issues, (genetically) improving (Gi) them and identifying and reducing flakiness, to coping with and verifying the improvement. 3

II.BACKGROUND The authors’ industrial experience arises primarily from their work at Facebook, where they are involved in the development of the static analysis tool Infer, and the dynamic analysis tool Sapienz. This section provides a brief overview of these two technologies, as the background to the observations made subsequently about scalable static and dynamic analysis. Naturally, we hope that our observations will extend and apply Fig. 1. Continuous Development and Deployment. Diff time is the time when more widely than Facebook. Our observations are sufficiently a developer submits a Diff (a code change) to the system for review. Post land general in nature that we believe they will be found applicable is the period after a diff has been incorporated into the master of the to any tech sector organisation that deploys software into the system. CI refers to the Continuous Integration of diffs into the code base Internet and/or mobile application market, based on continuous subject to . . integration and deployment [46], [101]. There are millions of lines of code in Facebook’s Android app alone, while there are also hundreds of thousands of Infer is notable for performing a ‘deep’ static analysis commits per week to the core software repositories maintained of source code. It uses inter-procedural analysis and fol- by the company in its continuous integration and deployment lows pointer chains, yet still scales to large code bases. framework. The technical challenges raised by continuous In comparison, popular open-source tools such as Findbugs integration and deployment are felt, not only at Facebook, and Static Analyzer do indeed scale, but limit their but across the sector as a whole. For example, Memon et al. reasoning, typically to a single file. A 2017 study of 100 recent have commented on these challenges in the context of scaling fixes, committed in response to Infer reports, in several bug testing at Google, in their excellent ICSE-SEIP paper [101]. categories, found several categories for which the majority of The combination of code size and change frequency that the bugs were inter-procedural [22], confirming that analysis comes with continuous integration and deployment puts us, as beyond intra-procedure reasoning can produce value. On the research scholars making a transition to industrial deployment other hand, while many research tools may often offer inter- and practice, in a very fortunate and privileged position. Work- procedural analysis, they typically require a sophisticated ing at Facebook has given us opportunities to deploy the Infer whole-program analysis, which may be precise, but sadly and Sapienz static and dynamic analysis techniques at scales cannot scale to 1,000,000s of lines of code. that are possible in few other environments. We have benefited Infer scales by implementing a novel compositional program greatly from the considerable support, understanding (and analysis, where the analysis result of a composite program is occasionally from the necessary forbearance) of the Developer computed from the analysis results of its parts [31]. Com- Infrastructure community and leadership at Facebook. positionality presupposes that ‘analysis result of a part’ is The Facebook culture of move fast, fail fast, bold experi- meaningful without having the whole program, and allows for mentation and explore within an open, collaborative and tech- an incremental deployment which fits well with Facebook’s nically measurable and accountable environment has meshed model. 
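To make the idea of a compositional, summary-based analysis concrete, here is a minimal sketch in Python. It is our illustration only, not Infer's implementation (which is an OCaml framework over separation logic and the Infer.AI abstract interpreters): the analysis result for a composite program is computed purely from summaries of its parts, computed bottom-up over a toy call graph, for a toy "may return null" property.

```python
# Minimal sketch of a compositional, summary-based analysis (illustrative only).
# Each procedure gets a summary computed from its own body plus the summaries of
# the procedures it calls; the result for the composite program is then just the
# composition of the per-procedure summaries.
from dataclasses import dataclass
from typing import Dict

@dataclass
class Summary:
    may_return_null: bool  # toy abstract fact about the procedure

# Hypothetical intermediate representation: name -> (returns_null_literal, callees)
PROCEDURES: Dict[str, tuple] = {
    "lookup": (True,  set()),        # returns null on a cache miss
    "render": (False, {"lookup"}),   # propagates lookup's result
    "main":   (False, {"render"}),
}

def analyze(proc: str, summaries: Dict[str, Summary]) -> Summary:
    """Compute a summary for `proc` using only the summaries of its parts."""
    returns_null, callees = PROCEDURES[proc]
    for callee in callees:
        if callee not in summaries:               # analyse parts first (bottom-up)
            summaries[callee] = analyze(callee, summaries)
        returns_null = returns_null or summaries[callee].may_return_null
    return Summary(may_return_null=returns_null)

summaries: Dict[str, Summary] = {}
for p in PROCEDURES:
    if p not in summaries:
        summaries[p] = analyze(p, summaries)
print({p: s.may_return_null for p, s in summaries.items()})
```

Because each summary is meaningful without the whole program, an analysis of this shape can 'begin anywhere' and be re-run on whatever fragment a diff touches.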
perfectly with our research and scientific instincts and modus Facebook practices continuous development where a shared operandi. We believe that the relationship between academic code base is altered by thousands of submitting research and the tech sector is changing, much for the better. ‘diffs’ (code modifications). A prepares a diff, Indeed, continuous integration and deployment is, in essence, and submits it to the code review system. Infer participates as nothing more than an enormous, continuous and highly ex- a bot, writing comments for the programmer and other human ploratory scientific experiment. reviewers to consider. Figure 1 shows a simplified picture of this process. The developers share access to a single codebase and then land, or commit, a diff to the codebase after passing A. Infer: Static Analysis at Scale code review. Infer is run at diff time, before land, while longer- Infer is a static analysis tool applied to , Objective running perf and other tests are run post-land. Post-land is also and C++ code bases at Facebook. It grew out of academic where employee dogfooding occurs. work on [108], [109], which attempted When a diff is submitted, an instance of Infer is sparked to scale for reasoning about memory safety of up in , Facebook’s internal continuous integration programs with embedded pointers – one of the most pressing system. Because of compositionality, Infer does not need to verification challenges of the 2000s – from 1,000s LOC [133] process the entire code base in order to analyze a diff and, to multiple 1,000,000s LOCs [31]. Infer arrived at Facebook as a result, it is very fast. For example, a recent sampling of with the acquisition of the program proof startup Monoidics in Infer’s diff analysis within Sandcastle for Facebook’s Android 2013, and its deployment has resulted in tens of thousands of App found that it was delivering comments to developers in 12 bugs being fixed by Facebook’s developers before they reach minutes (on average) whereas, were Infer to perform a whole- production. Infer is [29] and is used as well at a program analysis on the entire app it would take over 1 hour. number of other companies, including AWS, Mozilla, JD.com There is a research paper describing Infer’s deployment as of and Spotify. 2015 [28], and a video of a talk on our experience with static 4 analysis at scale from the CurryOn 2016 conference2. Thus, Infer has its roots in program verification research, but it deploys proof technology in a novel way, in what might be termed continuous reasoning [107]. Proof technology is not used to show that a complete program is absent of all errors, but rather, to provide static reasoning about the source code to cover many paths through an app at once. It is used on diffs to prevent regressions, implementing a form of what is sometimes referred to as ‘differential analysis’ [81], but there is no claim that all regressions are caught (except sometimes up to certain assumptions). Remarkably, although program proving techniques have Fig. 2. Sapienz workflow (Taken from the ISSTA 2016 paper [91]) traditionally been considered to be expensive, Infer is seen . internally as fast; deployed at the same place as unit tests in continuous integration, and before human-designed end-to-end tests. deployed on (of the order of) 1,000 emulators, at any given Infer started as a specialized analysis based on Separation time, to test the Android app alone. 
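The diff-time deployment described above can be pictured as a small gating step in continuous integration. The sketch below is hypothetical: the function names and the `post_review_comment` hook are ours, not the Phabricator or Sandcastle APIs. It shows why compositionality matters at diff time: only the procedures touched by the diff need fresh analysis, with previously computed summaries reused for everything else.

```python
# Hypothetical diff-time analysis bot (names are illustrative, not Facebook APIs).
# Given a submitted diff, re-analyse only the changed procedures, reusing stored
# summaries for the unchanged remainder of the code base, then comment on the review.

def changed_procedures(diff):
    """Procedures whose source lines are touched by the diff (assumed helper)."""
    return {hunk.procedure for hunk in diff.hunks}

def analyze_diff(diff, summary_store, analyze, report):
    """`analyze` could be a compositional analysis such as the earlier sketch."""
    findings = []
    for proc in changed_procedures(diff):
        summary = analyze(proc, summary_store)   # compositional: parts only
        summary_store[proc] = summary            # refresh the stored summary
        findings.extend(report(proc, summary))   # e.g. possible NPE at line N
    return findings

def comment_on_review(diff, findings, post_review_comment):
    # Report at diff time, while the author's mental context is still 'swapped in'.
    for finding in findings:
        post_review_comment(diff.id, finding.line, finding.message)
```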
Logic [108] targeting memory issues, but has now evolved There is a (high quality) video presentation7 from FaceTAV into an analysis framework supporting a variety of sub- 2017 by Ke Mao, outlining the initial Sapienz deployment at analyses, including ones for races [23], for security (taint) Facebook8. The SSBSE 2018 keynote paper [45] describes the properties, and for other specialized properties. These sub- current deployment of Sapienz in more detail, while the ISSTA analyzes are implemented as instances of a framework Infer.AI paper [91] contains more details of the underlying technology for building compositional abstract interpreters, all of which and scientific evaluation of the research prototype. support the continuous reasoning model. FBLearner is a machine learning platform designed to support machine learning at a global scale, through which most B. Sapienz: Dynamic Analysis at Scale of the ML training at Facebook runs [72]. Sapienz is merely one of hundreds of services deployed on this framework. Sapienz is a multi-objective automated design sys- Using the FBLearner infrastructure, Facebook applies ma- tem, that seeks to maximize fault revelation while minimizing chine learning to a wide range of problems. These include the the debug effort to fix. The current version aims to reduce the determination of which ads to display, and the performance of debug effort by seeking to simultaneously minimise sequence distinct searches along verticals in response to search queries, length while maximising coverage, to ensure actionability of that specialize for different forms of content, such as videos, fault-revealing test sequences. photos, people and events. The machine learning infrastructure Sapienz is based on Search Based (SBST) is also used to support anomaly detection, image understand- [68], [70], [96], but it augments SBST with systematic testing ing, language translation, and speech and face recognition [72]. and is also designed to support crowd based testing [90] exten- All of these demanding ML tasks need to be performed at sions through its use of motif genes, which can subsequently a scale that supports, for example, ML inference phase execu- draw on patterns of behaviour harvested from user journeys tions run into the tens of trillions per day [72]. This scalability through a system under test [92]. allows Facebook to globally deploy the benefits of machine Sapienz was developed as a research prototype (and made learning in real time, supporting translations between more publicly available as such3). It was subsequently briefly offered than 45 languages (2000 translation language pairs), which as part of the tools and services of the Android testing start- serve approximately 4.5 billion translated post impressions per up Majicke, before being acquired by Facebook4 in February day. This language translation service alone effects hundreds 2017. Since March 2017, the Sapienz technology has been of millions of people, who see translated posts in their news developed and deployed to test Facebook’s Android app5, feeds, thereby lowering linguistic barriers to communication. while work has already begun to extend Sapienz to iOS. 
Sapienz is currently deployed to run continuously, testing Sapienz currently uses the FBLearner Flow component to the most recent internal builds of the Facebook apps, using deploy detection of crashing behaviour directly into the work FBLearner (Facebook’s Machine Learning infrastructure [72]) flow of engineers, integrated with for reporting and Facebook’s OneWorld platform6 to support scalable de- and actioning fixes to correct the failures detected by Sapienz. ployment on an arbitrary number of emulators. Currently, it is The Sapienz automated test design work flow is depicted in Figure 2. The tool starts by instrumenting the app under test, 2https://www.youtube.com/watch?v=xc72SYVU2QY and extracting statically-defined string constants by reverse 3https://github.com/Rhapsod/sapienz engineering the APK. These strings are used as inputs for 4 http://www.engineering.ucl.ac.uk/news/bug-finding-majicke-finds-home- seeding realistic strings into the app, a technique that has facebook/ 5https://arstechnica.com/information-technology/2017/08/facebook- dynamic-analysis-software-sapienz/ 7https://facetavlondon2017.splashthat.com/ 6https://code.facebook.com/posts/1708075792818517/managing-resources- 8Sapienz presentation starts at 46.45 in this video: for-large-scale-testing/ https://www.facebook.com/andre.steed.1/videos/160774057852147/ 5 been found to improve the performance of other search based technology need do is to report a list of failures to the software testing techniques [4], [49]. engineer, in order for these to be fixed. Many very valuable Sapienz’s multi-objective search initialises the scientific contributions complete their evaluation with claims first population via the MotifCore component which runs constructed in terms of the number of faults found by the on the device or emulator. On Android, when evaluating technique. individual fitnesses, Sapienz communicates with the App Exer- The implicit mode of deployment for such studies is thus a ciser via the Android Bridge (ADB) and monitors list of failures reported by the technique; the list of test inputs the execution states, returning measurement data, such as for which the execution clearly fails. Such failures are deemed Android activities covered, to the fitness evaluator. The crashes to ‘fail’ either with reference to an implicit oracle [14], by found are reported to a relevant engineer through Facebook’s deviating from an agreed specification, or by distinguishing Phabricator Continuous Integration system9, using smart fault the behaviour of a correct and a known-faulty version of the localisation, developed in-house to triage the to a specific program. line of code. The ROFL assumption is also made by static analysis The ‘debug payload’ delivered to the engineer includes a researchers (ourselves included) when they assume that they stack trace, various reporting and cross-correlation informa- merely need to report a list of Faults (ROFL = Report Only tion, crash-witness video (which can be walked through under Fault List) with a low false positive rate. Thus ROFL applies to developer control and correlated to Android activities covered) both testing and verification equally: assuming that all the user all of which combine to ensure a high fix rate, a lower-bound requires is a list of faults or failures is one sure way to skip of which10, at the time of writing, stands at 75%. 
Also, at the over all the interesting intellectual and scientific challenges time of writing (March 2018), work is underway to extend posed by deployment and will, thereby, also likely limit the Sapienz to iOS and other apps in the Facebook family of apps. ultimate impact of the research. Sapienz uses a Search Based Software Engineering (SBSE) Developers are, in practice, not short of lists of fault approach to optimise for three objectives: , and failure reports from which they might choose to act. sequence length and the number of crashes found. Work is also Many systems, such as those typically deployed through app underway to hybridise Sapienz with different search strategies stores, for example [94], have ample mechanisms for users and other techniques to elevate coverage and fault revelation. to report bugs, and most practicing developers are typically We also augmented the production version of Sapienz with overwhelmed by such lists; their most pressing problem is an ‘Automated Scientific Experimentation (ASE)’ FBlearner not simply finding more bugs; their problem is more nuanced. workflow. The ASE workflow automatically runs different They need to find bugs with: candidate techniques on reference () versions of the 1) Relevance: the developer to whom the bug report is sent Android app, collecting standard inferential statistical results, is one of the set of suitable people to fix the bug; widely recommended for use in SBSE [8], [71] and presents 2) Context: the bug can be understood efficiently; these and graphical representations such as box plots. 3) Timeliness: the information arrives in time to allow an The ASE workflow allows the Sapienz team to quickly effective bug fix. spin up experiments with new approaches, variations and 4) Debug payload: the information provided by the tool parameter choices, and to collect scientific evidence for their makes the fix process efficient; performance, relative to the current productionised choices. These four properties would likely remain relevant, even This allows us to fulfill the mantra of ‘move fast’, and were the human to be replaced by an automated repair tool ‘fail fast’; we continually experiment with new modes of such as, for example, GenProg [85]. Therefore, whether or deployment. The ASE workflow can also support researchers not it is human, machine or hybrid that attempts the fix, we to work alongside the team’s engineers to quickly experiment believe that research work needs to move beyond the ROFL with different techniques and approaches to search. More assumption, to produce, not only lists of faults and/or failures, details on the ASE workflow can be found in the SSBSE 2018 but to also focus on the actionability of the bug reports. keynote paper about Sapienz deployment [45]. Relevance: Actionability is not merely a property of a par- When applied to the top 1,000 apps (for 30 ticular bug report, but it is also a property of the context in minutes each), in an evaluation conducted in 2016 using the which it is delivered to the developer. research prototype version [91], Sapienz found 558 unique, A first problem is relevance. When a (static or dynamic) previously unknown, crashes on these 1000 apps. 
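As a concrete illustration of the three objectives just mentioned, the sketch below shows Pareto dominance over (coverage, crashes, sequence length) tuples, the comparison at the heart of a multi-objective search of the kind Sapienz performs. It is a simplification for exposition, not the published algorithm [91] or the production code.

```python
# Illustrative Pareto comparison for Sapienz-style multi-objective search:
# maximise activity coverage and crashes found, minimise test sequence length.
from typing import NamedTuple, List

class Fitness(NamedTuple):
    coverage: int   # e.g. Android activities covered (maximise)
    crashes: int    # distinct crashes revealed (maximise)
    length: int     # events in the test sequence (minimise)

def dominates(a: Fitness, b: Fitness) -> bool:
    """True if `a` is at least as good as `b` on every objective and better on one."""
    at_least = a.coverage >= b.coverage and a.crashes >= b.crashes and a.length <= b.length
    better = a.coverage > b.coverage or a.crashes > b.crashes or a.length < b.length
    return at_least and better

def pareto_front(population: List[Fitness]) -> List[Fitness]:
    return [x for x in population
            if not any(dominates(y, x) for y in population if y is not x)]

front = pareto_front([Fitness(40, 1, 120), Fitness(40, 1, 80), Fitness(55, 0, 300)])
print(front)  # the 120-event sequence is dominated by the shorter 80-event one
```

Shorter, fault-revealing sequences survive selection, which is what makes the resulting failures actionable for the engineer who has to debug them.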
program analysis issue is found, a question that arises is “To whom should this bug report be directed?” It is not only a question of responsibility, but also one of III.MAKINGBUGREPORTSMOREACTIONABLE; MOVING finding someone who knows the code related to the issue BEYONDTHE ROFL MYTH sufficiently well to act upon it; the person best placed to judge There is an implicit assumption, the ROFL (Report Only a fix. Failure List) assumption which assumes that all that a testing At Facebook, when an analysis is running post-land (see again Figure 1), a number of heuristics are used to find the 9 http://phabricator.org most appropriate developer to whom a bug report should be 10Precisely determining a guaranteed ‘fix’ is, itself, a significant challenge as we discuss in Section VIII. This is why we prefer to give a (conservative) sent. One heuristic performs a binary search of diffs to find lower bound on fix rates, thereby avoiding over-claiming. the one that ‘caused’ the issue. 6
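The binary search heuristic mentioned above can be sketched as follows. This is an idealised illustration, not the production bisection service; in particular it assumes the failure reproduces deterministically, which Section VI argues is optimistic.

```python
# Idealised bisect over landed diffs to find the one that 'caused' a failure.
# Assumes `fails(revision)` deterministically reproduces the issue, which real
# (flaky) tests often do not -- see the ATAF discussion in Section VI.
from typing import Callable, Sequence

def find_culprit(revisions: Sequence[str], fails: Callable[[str], bool]) -> str:
    """revisions[0] is known-good, revisions[-1] is known-bad; return the first bad one."""
    lo, hi = 0, len(revisions) - 1   # invariant: good at lo, bad at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

Even under the determinism assumption, the developer identified this way may have moved teams or left the company, which is why culprit finding remains heuristic.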

However, such problems are fundamentally heuristic and It therefore seems critical that researchers should attempt to there is no perfect answer. For instance, the developer who consider, report on and evaluate the specific modes of deploy- ‘caused’ the problem might have left the team, or even left the ment through which they envisage their research techniques company. Developing effective recommender systems for bug might best be deployed; those that would find most traction reports remains an open problem, and is increasingly impor- with developers. Failure to do so might invalidate the scientific tant in open source and crowd-sourced software development findings of the study, so it poses a powerful threat to validity scenarios [93]. that needs to be considered in any empirical analysis. Context: Naturally, developers tend to find bug reports more Failure to consider a wide range of deployment modes, and actionable when they provide sufficient context to allow them report on and evaluate the best-suited to the research technique to quickly determine causes and remedial actions. introduced is also important to ensure that good ideas are not However, even a perfect bug-assignment mechanism that overlooked. Otherwise promising research techniques might achieves full relevance, and with suitable context (were it ever be abandoned by researchers or rejected by referees, simply to exist), would run into the human cost of context switching. because they fail to suit the implicitly-supposed default de- If a developer is working on one problem, and then they are ployment mode. confronted with a report from a program analyzer on a separate For improved timeliness, we have found it best to deploy problem, then they must swap out the mental context of the as many analyses as possible at diff submit time rather than first problem and swap in the second. It is well-known from post land. Not only is the more likely to be relevant and psychological studies that such context switching can be an timely, but fixing bugs early is well-known to be (dramatically) expensive operation, not only for software engineers, but for less costly than fixing them later on [15]. their users too [76]. Sapienz was initially deployed post-land, yet achieved fix Timeliness: Running an analysis at diff submit time rather rates approaching 75%. It was initially deployed post-land, than post-land time elegantly, although only partially, resolves simply for efficiency and scalability reasons (it can test mul- some of the problems of context and relevance: If an analysis tiple diffs in a single Android APK file). However, more participates as a bot in code review, then the developer’s recently, in March 2018, we deployed a lightweight version mental state is already ‘swapped in’ (helping alleviate context of Sapienz at diff commit time. swapping), because the developer is already expecting to be Deploying Sapienz at diff commit time further elevated discussing the code with the human reviewers. Furthermore, the fix rate. This highlights a difference between static and if the issue concerns a change in a newly submitted diff, then dynamic analyses: While static analysis reports likely faults, there is an increased chance that it is relevant to the developer. dynamic analysis reports likely failures. Failures are, perhaps, In this way timeliness can go some of the way to addressing inherently more compelling for engineers, since they find it context and relevance issues. 
hard to ignore the evidence that there is a problem that needs However, relevance is not fully solved by diff submit time addressing. analysis alone. A developer might be refactoring code, and the We combine both the Sapienz and Infer tools in our deploy- analysis tool flags a number of pre-existing issues (and the fact ment at Facebook. For example, when Infer comments on a that they are pre-existing is fooled by the refactoring). These line of code that may lead to an Null Pointer Exception (NPE) issues are not relevant to the purpose of the diff. and Sapienz also discovers a crash that it traces back to the A warning signal might involve a trace that starts from same NPE at the same line of code, Sapienz files a task for code in the diff, for example, in product code, but uncovers a the engineer to check. This combined deployment mode has a problem in distant framework code. The product developer (close to) 100% fix rate at the time of writing, indicating the might be ill-advised to go into the framework code to fix potential for high-impact combinations of static and dynamic the issue (and in some organisations may be prevented from analysis. doing so by code access privileges). In general, the relevance Our observations about diff submit time versus post land problem is very important to the effectiveness of a program time, and the ROFL assumption are not unique to Facebook. analyzer, and is deserving of research attention. Indeed we are replicating and corroborating the experiences Even with these caveats, the benefits of diff submit time reported by other practitioners at other companies. For exam- deployment were highlighted by our experience with Infer: ple, related observations have been made at , with The first deployment we considered was a post-land, batch- Prefix (ROFL) and Prefast (similar to our diff submit time mode, ROFL deployment of the analysis, together with a observations) [83], by Coverity [35], and by Google [101], manual (rather than automated bisect-based) bug assignment. [117]. The fix rate for this deployment mode (the rate at which Debug payload: Finding a failing execution is a strong signal developers chose to resolve the issues) was close to 0%. to a developer of the need to fix the bug that causes it. As However, when we switched to a diff submit time deploy- we noted, finding a true positive failure in a timely fashion ment, the fix rate rocketed to over 70%. We learned from bitter has a good probability to occasion a fix. However, timely true experience, that an identical technical analysis, deployed in positive failure reporting, although a valuable pre-requisite, is two different modes that differed only in terms of the point often insufficient on its own to ensure that a bug fix takes within the development lifecycle at which they reported, could place. We believe that insufficient research attention is paid to have fundamentally different response rates from developers. the ‘debug payload’; the information supplied with a failing 7 execution that tends to help an engineer fix the bug(s) that of this involves effective signal. It is not always obvious when cause an observed failure. or where errors should be reported in a compositional analysis Despite its practical importance, the problem of debugging (it is easier to imagine for whole program). Fundamental work support has been overlooked, and attempts to launch scientific is needed here. 
events focusing on problems associated with debugging assis- Note that the word ‘compositional’ is used in some works on tance have had an unfortunate tendency to wane, through lack testing (e.g. [52]), in a way that is somewhat different from of a large and strong coherent community of researchers. One Frege’s original sense (which presupposes getting a testing example of an excellent scientific event, which has now sadly result without having the complete program); rather, the idea demised is the AADEBUG workshop series on automated of procedure summary from is used in algorithmic debugging (1993-2005). a testing to help scale the analysis by avoiding re- Notwithstanding admirable efforts by highly dedicated indi- computing information. This might be more directly related to viduals to build a ‘debugging scientific community’ there re- what we term an ‘incremental’ approach (see below), where mains, at the time of writing, no regular international scientific the subsequent release is considered to be a composition of event focusing on debugging support for developers. Partly as the previous release and a change. Perhaps methods that a result of this lack of an event around which a community can static and dynamic analysis would address the compositional coalesce, technical support for debugging activities clearly lags testing problem [26] behind other advances in program development environments, Mocking of as-yet unimplemented procedures is one prac- as it has continued to do for some time [59]. tical example where testing does already allow for compo- We hope that Automated Program Repair [84], Genetic sitionality; it allows for testing to be performed when the Improvement [112], recent approaches to code synthesis and whole program is not known, in such a way that testing other related automated code improvement techniques that results remain valid for any instantiation of the mocked out have recently witnessed an upsurge in scientific activity, may procedure. This is a form of what we referred to above alleviate some of the pressure on human debugging effort. as manual compositionality. Fully automatic compositional Nevertheless, it is undoubtedly the case that one of the surest testing would, for example, allow us to decompose system routes to real world impact for program analysis research lies tests to tackle the false positive problem for unit tests [55], in the area of debugging assistance. using a fragment of a system, and a corresponding fragment of a system test as a unit test in the knowledge that the unit test is realistic and cannot yield a false positive. It would therefore be IV. COMPOSITIONAL TESTING AND VERIFICATION interesting and exciting to see more research on composition The concept of compositionality comes from language se- testing and verification. mantics, and is often associated with Frege: a semantics is compositional if the meaning of a composite term is defined A. Research Questions that Address Compositional Testing in terms of the meanings of its parts. Likewise, a program and Verification analysis is compositional if the analysis result of a composite CVF Compositional Verification Formulation: How do we program is computed from the results of its parts [32]. 
best formulate a compositional version of verification In the program analysis context we are discussing here, approaches, such as CEGAR, interpolation, numerical we are concerned with automatic compositionality, where abstract domains, so that we can ‘begin-anywhere’ in the the analysis algorithm decomposes the problem into parts code with the stataic analysis and demonstrate effective and recomposes them as needed. This contrasts to manual signal, for example, in terms of fix rate. compositionality where the human specifies the interfaces of CTF Compositional Testing Formulation: How do we best procedures or modules (thus enabling local analysis). formulate testing so that we can start system level (end Compositionality pre-supposes that the ‘result of the part’ to end) test case design, from any point in the program makes sense without having all of the surrounding context. under test and in an arbitrarily determined state. This means that a compositional analysis does not need the entire program: it is in a sense the opposite of traditional V. INCREMENTAL TESTINGAND VERIFICATION ‘whole program analysis’ [10]. Compositionality naturally gives rise to incremental anal- Compositionality also supports diff submit time analysis. ysis, where changing a part of a system does not necessitate For example, through compositionality Infer is able to achieve re-analyzing the whole system. Nevertheless, compositionality a ‘begin anywhere’ formulation of its analysis; in principle, the is not necessary for incrementality: it is possible in principle analysis may start at an arbitrary line of code. This opens the to do a whole program analysis, to store summaries of the way to analyzing program components or libraries before they analysis results for program components, and to only re- are run, and that sort of use case was, in fact, a key part of the analyze changed parts, re-using the stored summaries. This application of Infer to multi-threading in Facebook’s Android sort of incremental analysis does not presuppose that a part can Newsfeed [23]. be analyzed without having the whole (which is the hallmark While the principles of compositional static analysis of compositionality). are well known [40], most research in the area focusses on A key technique in inter-procedural static analysis has been whole programs. Further development of automatic composi- to use procedure summaries to avoid re-computing informa- tional analysis is a key research direction. An important part tion. If an abstract state at a call site matches one already seen, 8 the summary can be used without re-analyzing the procedure. A. Research Questions that Address Incremental Testing and This is the basis of the fundamental RHS algorithm [115], and Verification the idea is used in many static analyzers. However, summary- We outline a few possible research questions that are based interprocedural analysis is used mostly to attack the naturally suggested by continuous integration and deployment: scaling problem for a program analysis, to make it work IVF Incremental Verification Formulation: How do we for large codebases. Comparatively less work has targeted best formulate an incremental version of verification incremental analysis of code changes. We acknowledge that approaches that demonstrate efficiency on diffs propor- valuable work has been done in this direction (e.g., [41], tional to the size of the diff rather than the size of the [119]), but much more is needed. code base into which the diff is deployed? 
The direction of incremental dynamic analysis is even less ITF Incremental Testing Formulation: How do we best for- developed, and yet could be even more impactful. Potentially, a mulate testing so that test design and execution time are host of currently-expensive dynamic analyses could be moved proportional to the size of a diff (and/or diff execution from post-land to diff time, where the reports are more easily time) rather than to the size (respectively execution time) actionable. of the code into which the diff is deployed? The work on SMART (Scalable DART, [6], [52]) works IVF would allow, for example CEGAR [37], interpolation by importing the concept of procedure summary from static [95], and numerical abstract domains [77] to scale to millions analysis into a dynamic analysis context. As far as we are of lines of code. We believe progress in this direction would aware, procedure summaries have been used in SMART to be very exciting because this ‘stretch goal’ is far beyond the address the scalability challenge, not to move an analysis current state of the art. ITF could be even more impactful, from a post-land to a diff-time, incremental deployment. The as it could conceivably transform computationally expensive general idea to use symbolic procedure summaries in testing end-to-end testing techniques to where they could be run at may extend to the problem of making automatic incremental diff time. testing. See [107] for further discussions on compositional and Most current research on automated software testing con- incremental testing and verification. siders the system under test to be a monolithic piece of code. Although there may be different levels of testing, such VI.SURVIVE AND THRIVE, EVENWHENWE ASSUME as unit level through to system level, the piece of code TESTS ARE FLAKY (ATAF) tested is typically treated as an atomic monolith. A more An important difference between verification and testing realistic scenario, better-suited to a world of low TERF (Test- derives from the way in which execution has increasingly Execution-to-Release-Frequency) ratios, takes account of the become nondeterministic. Whereas programming languages development history, thereby supporting incremental testing have included new semantic features that have posed novel to collaborate with incremental development and continuous challenges for verification research and practice, the dramatic deployment. Sadly there is very little work on incremental increase in the stochastic behaviour of most deployed systems, automated testing in the research literature. over a similar period, has posed challenges for software By ‘incremental software testing’ we mean automated test testing; tests are ‘flaky’ [50], [87], [100]. In this section we case design that constructs new test cases, that is aware of (and explore this test-specific challenge, which can be summarised exploits) the history of changes to the system and previous with the aphorism: Assume all Tests Are Flaky (ATAF). tests, thereby increasing the efficiency of automated test case Flakiness challenges implicit assumptions at the very heart design. Incremental testing should have the property that the of much of the research on software testing. We might like to time to execute a system-level test is proportional to the think of a test T for a program p as a boolean: execution time for the changed code alone (rather than the execution time of the whole system into which the diff is Tp : Bool deployed); execution benefits for unit test costs. 
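One way to picture the distinction drawn above: an incremental analysis can simply memoise per-procedure results, keyed by a hash of the procedure's source, and recompute only what a diff invalidates, without requiring that a part be analysable in isolation. The sketch below is our illustration, not a description of any particular tool.

```python
# Sketch of incrementality via memoised summaries: only procedures whose source
# (here, whose content hash) changed since the last run are re-analysed, so the
# cost of analysing a diff is proportional to the size of the diff.
import hashlib
from typing import Callable, Dict, Tuple

def incremental_analyze(sources: Dict[str, str],
                        cache: Dict[str, Tuple[str, object]],
                        analyze: Callable[[str, str], object]) -> Dict[str, object]:
    """sources: procedure name -> source text; cache: name -> (hash, summary)."""
    results = {}
    for name, text in sources.items():
        digest = hashlib.sha1(text.encode()).hexdigest()
        cached = cache.get(name)
        if cached and cached[0] == digest:
            results[name] = cached[1]        # unchanged: reuse the stored summary
        else:
            summary = analyze(name, text)    # changed or new: re-analyse
            cache[name] = (digest, summary)
            results[name] = summary
    return results
```

A realistic incremental analysis would also have to propagate invalidation to procedures whose callees' summaries changed, which is precisely where the IVF and ITF research questions above begin to bite.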
This is, essentially, the simplest form of test Oracle [14]; one Similarly, the time to automatically design the test case should in which there is a simple boolean outcome that determines be proportional to the size of the diff to be tested, rather than whether the test has passed or failed. The nomenclature of the code base into which it will be deployed. most research literature on software testing is imbued with this While there is work on selective [54], notion of tests as deterministically and reliably either ‘passing’ [136], seeding of test cases [4], [9], [49] and test case regen- or ‘failing’, with an implicit ‘Law of the excluded middle’, that eration and augmentation [118], [138], we have found little allows for no other possible outcome from the application of work on automated test case design that exploits knowledge a test input to a system under test. of previous tests [33], the mapping between these tests and However, this attractively deterministic world is highly changes to the system, and their outcomes, in order to improve unrealistic, and by relying on the assumption that few tests the efficiency of ongoing testing in a continuous integration are flaky, we miss many research opportunities, and important environment. avenues for greater research impact. Such research might, for example, exploit test case caching, The reality of testing, in most continuous integration en- and reuse partial test sequences and the resulting system states vironments, is that it is safer to Assume Tests Are Flaky in subsequent compositions of previous test fragments. (ATAF). That is, all tests that can fail, will also sometimes pass, 9 in apparently identical test scenarios. A flaky test, T , when research will evolve to cater for specific applied to a program p may thereby yield different outcomes contexts and scenarios, leading to more tailored mutant design on different occasions, such that it is better to think of the test, [3], [67] and mutant selection policies tailored to the program not is having a boolean outcome, but as having a probabilistic under test [2]. outcome: However, we need to go further than simply tailoring mutants to the program under test, but also to the ‘test Tp : [0..1] question’ for which we are seeking some ‘answer signal’; At first sight, this may seem like a problem of context: different test objectives will require different kinds of mutant. surely if we had sufficient context on the execution environ- Previous research has investigated specific types of mutants ment of the system under test, we might be able to reduce for revealing security issues [24], interface interactions (such or remove sources of non-determinism in software testing? as feature interactions on integration) [43], and memory faults Unfortunately, the search for such a wider context, even if it [132], for example, yet there appears to be no work on tailoring were theoretically possible, is certainly practically impossible; mutants to questions that are unavoidably probabilistic, due there are simply too many external components, services and to the inherent flakiness of the tests (ATAF). This challenge resources upon which the execution of the system under test of tailoring mutants to maximise signal in the presence of depends, and which cannot be controlled by the tester. 
flaky tests, therefore, remains open and, we believe, potentially Rather than resisting any move from the comforting world would have significant impact on the deployment of practical of {0, 1} to the non-deterministic world of [0..1], we believe mutation testing systems. that the software testing research community should embrace Foundations: The theoretical foundations of software testing flakiness; start from the assumption that we ‘Assume Tests need to be revisited, and reconstructed for an ATAFistic Are Flaky’ (ATAF): how does an ‘ATAFistic world’ change world. Even fundamental concepts such as coverage, need the research perspective and what new opportunities emerge to be refined, since the program elements that are covered for novel research problems? Here are some examples of how by a given test case may differ on different executions. We the ATAFistic world changes testing theory and practice: noticed this phenomenon when testing a web-based weather- Regression testing: In regression test optimisation [44], [137] reporting application, for which statement coverage for a it is typical to assume that we should optimize according given test suite depended on the prevailing weather conditions to objectives such as execution time, coverage, and resource [4]. Not every aspect of software testing foundations need consumption [60]. However, in the ATAFistic world, we have necessarily change. For example, if a particular branch is non- an additional objective to consider: test case prioritization deterministic, then the execution of the statements it controls might usefully favour more deterministic tests over those that will be consequently also non-deterministic, suggesting that are less deterministic (more flaky), for instance, so that more there will remain a subsumption relationship between branch certainty can be achieved sooner in the test process. Such an coverage and statement coverage, even in an ATAFistic world. ATAFistic regression testing model seems well suited to testing 1) Research Questions that Address the ATAF Assumption: models underpinned by information theory [36], [139]. The Assume Tests Are Flaky (ATAF) assumption may provide It might prove more subtle than merely choosing to pri- a useful point of departure for an intellectually rich landscape oritise for early determinism overall. Rather, the tester may of research possibilities. have a property of interest for which he or she would like to Here we sketch a few questions that flow receive an early signal from testing. In this scenario, the tester from this ATAFistic starting point: would like test case prioritisation to favour early execution of test cases that contribute most to reducing uncertainty TFA Test Flakiness Assessment Can we find quick and with respect to this property of interest; surely a paradigm effective approximate measures of the flakiness of a test well-suited to solutions grounded in information theory [120] case? (especially where it is already known to be applicable to TFP Test Flakiness Prediction Can we find predictive testing [7], [36], [134], [139]). We believe that this is a models that will allow us to predict the degree of novel research area that, hitherto, remains untackled by the flakiness of a test case? growing research community working on information theory TFA Test Flakiness Amelioration Can we find ways to for software engineering. 
transform test goals so that they yield stronger signals Mutation testing: In mutation testing [78], fundamental con- to developers in the presence of test flakiness? cepts such as the mutation score need to be adapted to cater for TFR Test Flakiness Reduction Can we find ways to reduce an ATAFistic world. This opens up new research possibilities, the degree of test flakiness or to transform a flaky such as the construction of mutation operators that tend to signal into a more abstract, interpolated or otherwise reduce test flakiness (while accepting that it is impossible to transformed version that makes flakiness less of an eliminate it). issue? Perhaps it would be intellectually rewarding to In general, it seems unrealistic to expect a one-size-fits- construct an [39] for Testing and all approach to mutation testing to be successful; mutants are Verification, such that the elevated level of abstraction concerned with simulating real faults, but different systems tends to reduce unwanted variability due to flakiness; and different scenarios and workflows occasion very different we can say more definite things about more abstract kinds of faults. Therefore, it seems natural to expect that properties of the System Under Test. 10
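To make the 'more certainty sooner' idea concrete, the sketch below orders tests by a simple score that rewards low estimated flakiness and high value (here proxied by coverage) per unit of execution cost. The weighting and the choice of coverage as the value signal are our illustrative choices; an information-theoretic formulation of the kind discussed above could replace the score entirely.

```python
# Sketch of flakiness-aware regression test prioritisation: reward tests that are
# cheap, informative (proxied by coverage) and near-deterministic, so that
# trustworthy signal arrives as early as possible in the test run.
from typing import List, NamedTuple

class TestCase(NamedTuple):
    name: str
    coverage: float           # proxy for the value of the test's signal
    pass_probability: float   # estimated as in the previous sketch
    cost_seconds: float

def determinism(p: float) -> float:
    """1.0 for a fully deterministic outcome, 0.0 for a coin-flip test."""
    return abs(2 * p - 1)

def prioritise(tests: List[TestCase]) -> List[TestCase]:
    def score(t: TestCase) -> float:
        return (t.coverage * determinism(t.pass_probability)) / t.cost_seconds
    return sorted(tests, key=score, reverse=True)
```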

RTT Reformulate Test Techniques for an ATAFistic world layer, behavioural layer, and code layer), concluding that Can we find new formulations of, for example, Regres- testers should do their best to control flakiness at each level. sion Testing, Mutation Testing, Search Based Testing, All of this research has been concerned with understanding Dynamic Adaptive Testing, Model Based Testing, etc, and/or removing sources of flakiness to attempt to control that place at their heart, the assumption that All Tests the flakiness problem, seeking to migrate testing from the Are Flaky (ATAF) to some degree, and which cater for uncomfortable new world of non-determinism to the more this a natural way? This research agenda is in stark familiar (and comfortable) world in which a test either passes contrast to merely seeking, generic or test-technique- or fails, reliably and repeatably. specific, approaches to reduce flakiness in order that We believe that where it is possible to reduce or control existing formulations become applicable, once again. flakiness this is clearly desirable, but we also would like to stress that hoping we may ever return to world of deterministic A. Prior work on the Flakiness Problem testing is quixotic in the extreme. The software industry badly needs automated system level testing, for efficiency There has been previous work on the flakiness problem; and effectiveness of continuous integration and deployment. understanding what makes test cases flaky and how to min- However, in order to have all the benefits that accrue form imise the pernicious effects of flakiness on testing methods automated system test design, we believe it better to assume that, implicitly or explicitly, assume tests to be deterministic. all test are flaky. Even though some will not be flaky, making We briefly review this literature here, explaining why we this assumption in research work will prioritize the central believe it is necessary, yet not sufficient, because we need to challenge posed by flakiness. move beyond the control of flakiness to acceptance and even We believe there will be dramatic real world impact if the optimisation for flakiness: assume all tests are flaky. community would undertake more research on the problem of, Luo et al. [87] categorise causes of flakiness in 201 commits not only coping with flakiness, but perhaps even constructing made in 52 Open Source projects, for which they are able to approaches that actually benefit from it. If categorise the cause in 161 cases. The must common of which we can reformulate testing problems such that flaky tests are are merely a ‘fact of life’, then we may be able to, for example, 1) Asynchronous wait (74/161; 45%), in which a test makes smooth fitness landscapes, better adopt probabilistic testing an async call and does not fully await the response; approaches, and maybe also finally place testing properly 2) Concurrency (32/161; 20%), in which undesirable within the framework of information theory, such that test interaction (other than async wait) occurs, such as race signals can truly be thought of in an information theoretic conditions, deadlocks and atomicity violation; sense. 3) Test order Dependency (19/161; 12%), in which one test depends on the outcome of another and this test VII.BETTERUNDERSTANDINGANDREDUCTIONOFCOSTS order may change dynamically (e.g. shared access), a Mobile software release cycles have become considerably phenomenon also studied by Zhang et al. 
[141] as one shorter (compared to previously widely-used software deploy- cause of flakiness. ment models), while continuous integration and deployment These results were partially replicated in the more recent has simultaneously become the norm rather than the excep- study by Palomba and Zaidman [111], who introduced the tion, accelerated by web-based and -based software concept of refactoring of test smells to reduce test flakiness. deployment paradigms [94]. In more traditional software de- Like Luo et al, Palomba and Zaidman also report that asyn- velopment environments, such as so-called shrink-wrapped chronous wait is responsible for 45% of the flakiness they software deployment, or Code Of The Shelf (COTS) [128], the discovered, and that and currency issues are also prevalent release cycle was markedly slower. Faster release cycles mean (ranked in 4th place at 17%). However, unlike the previous that we now have to distinguish between two very different work of Luo et al, Palomba and Zaidman report that (non- aspects of test efficiency: network-related) I/O operations are responsible for a much 1) Human Test Efficiency: human design of test cases has larger number of flakiness issues (22% vs. only 3% in the traditionally been (and remains) a significant bottleneck earlier study), and also found that network issues are important that inhibits more effective and efficient software testing. (10% of cases). 2) Computational Test Efficiency: the machine time re- Flakiness is also a phenomenon that may differ at different quired to (possibly design and) execute test cases and to levels of test abstraction; The Luo et al. study extracted deliver a signal back to developers. any commits with tell-tale key words (e.g., “flak*”), whereas Fortunately, due to advances in automated test case design the Palomba and Zaidman study focuses on JUnit tests. We [5], [27], [68], the slow, error prone and tedious task of human observe that, while flakiness undoubtedly poses problems at test case design is gradually, finally, receding to be replaced the unit level, it is even more challenging at the system level, by ever-more automated test environments. A remaining chal- and particularly so for automatically generated tests. In these lenge for further automation and replacement of tedious human situations, it is common to experience async wait flakiness effort in test case design centres on the Oracle problem [14]. (and harder to control for it when the test is auto generated). The oracle problem is more resistant to automation, since Gao et al. [50] consider the related problem of understand- the determination of correct operating behaviour may some- ing the impact of flakiness at different layers (user interaction times inherently involve judgment concerning the connection 11 between requirements and system under test; an aspect of tedious error-prone human-centred process to an automated the development process most closely associated with human process. As the community more successfully tackles this test domain expertise. automation problem, it needs to engage with the computational In more traditional modes of deployment, in which an efficiency challenges that have already been considered in the application was released (perhaps at most) several times per verification community. 
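Since asynchronous waits and concurrency make some level of flakiness unavoidable, a natural first step towards an ATAFistic treatment is to estimate, rather than assume away, each test's pass probability. The sketch below is illustrative only (the run_test callable, the repetition count and the simulated failure rate are invented); repeated execution can bound, but never eliminate, flakiness:

# Illustrative sketch only (not a Facebook tool): estimating per-test flakiness
# by repeated execution, as a first step towards treating All Tests As Flaky.
# run_test is a hypothetical callable returning True on pass, False on failure.

def estimate_pass_rate(run_test, repetitions=50):
    passes = sum(1 for _ in range(repetitions) if run_test())
    return passes / repetitions

def classify(run_test, repetitions=50):
    rate = estimate_pass_rate(run_test, repetitions)
    if rate == 1.0:
        return "always passed (may still be flaky on a larger sample)"
    if rate == 0.0:
        return "always failed (deterministic failure on this sample)"
    return f"flaky: passed {rate:.0%} of {repetitions} runs"

# Toy usage with a simulated async-wait flake that fails roughly 20% of the time.
if __name__ == "__main__":
    import random
    random.seed(1)
    flaky_test = lambda: random.random() > 0.2
    print(classify(flaky_test))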
year, we have been fortunate enough to enjoy an extremely low The testing community has made great strides in lifting the ratio of the time to execute tests, relative to the time to release test burden from human shoulders to place it more squarely a freshly built version of the system. This Test-Execution- on machines [27], [68]. This automation has dramatically to-Release-Frequency (TERF) Ratio has traditionally been scaled up test effectiveness and efficiency over the painful sufficiently low that computational test efficiency has been past of human-designed and executed tests (notwithstanding regarded as unimportant. However, in emerging Internet and recent developments in crowd-sourced testing [90]). Perhaps, mobile deployment scenarios, this assumption no longer holds, in the testing research community, we have tended to implicitly especially as advances in automated testing reduce problems assume that computational cost is thus, like early inaccurate associated with lack of human efficiency, but transfer this cost aspirations for electric power from nuclear fission, simply ‘too to the machine. cheap to meter’ [124]. For instance, Google recently reported [101] that of 5.5 million test cases, only 63,000 typically failed (flakily or VIII.THE FIX DETECTION CHALLENGE otherwise). Clearly, if we can be almost sure that a test will pass without needing to execute it, then it may become Testing and verification techniques have challenges in de- computationally efficient to avoid executing it. In this way, termining when a fix has occurred, but the details of the we need research that moves beyond producing test suites, to challenge are different for testing and verification. For static research that prioritises test suites; already a research topic of analysis, the challenge derives from the way in which the code growing importance [137]. base is continually changing, so traditional instrumentation In a continuous development environment in a large scale and markers that identify the fault location may change. This tech sector organisation, tens to hundreds of changes may be situation can be addressed with a source code ‘bug hash’, but submitted to the continuous integration system every minute, such a hash is unlikely to be perfect in the presence of a high while build times for the entire ‘release candidate’ version level of code change. of a large system might run to tens of minutes, even with Testing techniques also need to track the location of faults, large scalable infrastructure. Even in smaller (or growing) tech where they have been able to triage a failing test to a particular sector organizations, the TERF (Test-Execution-to-Release- fault, but they also need to track whether failure (the symptom Frequency) ratio is typically orders of magnitude higher than of the fault) disappears and this requires a form of ‘crash hash’; for previously prevalent deployment models, for which many a unique identifier for a failure. widely-used test techniques were primarily developed. A dra- The complex nature of the mapping between faults and matically higher TERF ratio, raises profound questions for failures, coupled with the changing code base, make it a automated testing research. challenge to define suitable bug and crash hash techniques. Many of our observations apply equally well to testing and We seek techniques that maximise the chance of detecting a verification, and by extension to static and dynamic analyses. fix, without introducing false positive fix identifications. 
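As a minimal illustration of what a 'crash hash' might look like (a sketch under our own simplifying assumptions, not the identifier scheme actually used for Infer or Sapienz), one can normalise away the volatile details of a stack trace and hash the remaining top frames:

# Illustrative sketch only: one way a 'crash hash' might be computed from a
# stack trace, by normalising volatile details (addresses, line numbers) and
# hashing the top few frames. This is not the scheme used by Infer or Sapienz.

import hashlib
import re

def normalise_frame(frame: str) -> str:
    frame = re.sub(r"0x[0-9a-fA-F]+", "ADDR", frame)   # strip memory addresses
    frame = re.sub(r":\d+", ":LINE", frame)            # strip line numbers
    return frame.strip()

def crash_hash(stack_trace: str, top_frames: int = 5) -> str:
    frames = [normalise_frame(f) for f in stack_trace.splitlines() if f.strip()]
    key = "\n".join(frames[:top_frames])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

# Two traces that differ only in addresses and line numbers map to the same hash.
if __name__ == "__main__":
    t1 = "com.app.Feed.render(Feed.java:120)\ncom.app.Main.loop(Main.java:42) 0xdeadbeef"
    t2 = "com.app.Feed.render(Feed.java:133)\ncom.app.Main.loop(Main.java:57) 0xcafebabe"
    print(crash_hash(t1) == crash_hash(t2))  # True

Identifiers of this kind are deliberately coarse: renamed methods or refactored call chains change the hash, which is one reason why fix detection built on them tends to be conservative, as discussed below.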
Our However, focusing attention on deployability does bring some deployment of both Infer and Sapienz at Facebook uses a of the differences between verification and testing into sharp conservative lower bound on fix detection. The problem of relief. In order to verify system correctness the verification detecting fixes conservatively, yet with minimal false positives algorithm must complete, whereas testing, being essentially a remains an interesting challenge that we consider in this counterexample-driven approach, is fundamentally an anytime section. algorithm; it can be terminated at any time, yielding the It is customary in software testing research to assume that current set of counterexamples discovered. Therefore, from its the problem of detecting when a bug is fixed is relatively triv- inception, one of the primary focuses of research on verifica- ial. Indeed, many studies of software testing techniques, (in- tion has been the computational efficiency of the underlying cluding those by one of the authors), start from the apparently- verification algorithms, and also of the practical efficiency of reasonable assumption that we reside in a world in which their implementations. it is easy to distinguish between faulty and fixed versions Although both verification and testing started as a human- of the software under test. However, while the observation centric activity, the verification community quickly moved of a field failure remains relatively uncontroversial, detecting away from the idea of human mathematicians performing when the underlying cause has been addressed is far from proofs [127], to consider automated verification systems, and straightforward. from there, immediately encountered the efficiency challenge: This is partly a ramification of the ATAF (Assume Tests Are a challenge that has remained central to research in the area Flaky) observation; the field failure may temporarily appear ever since. to have been fixed, simply because the available tests are By contrast, testing activity was already prevalent in in- insufficiently deterministic to reliably reveal it. However, this dustry before it became a topic of research interest. Research aspect of the ATAF problem can be ameliorated by repeated initially focussed on moving testing activity away from the test execution, and therefore simply degenerates to a problem 12

of test efficiency. There are other, more subtle reasons why Note that fix detection is relevant to static as well as detecting the presence of a fix is non-trivial. dynamic analysis. For example, in the static case, a tool can be A fault may lead to multiple different failures. Testing fooled into thinking a fix has been achieved if, for example, a activity typically reveals the failure, not the fault and it refactoring takes place which changes the “bug hash” or other may only reveal a proper subset of the failures caused by a method of identifying a report; an analogue of NFD above. given fault. Multiple faults may contribute to the same failure observation. An attempted fix may remove some of the failure IX.TESTABILITY AND VERIFIABILITY TRANSFORMATION manifestations of a given fault, but not all. One possibility, known as testability transformation [25], It has been known for some time [15], [103] that it is [66], [97], allows us to test, not the ultimate deployment important to distinguish between faults and the failures they candidate, but some version of it, from which useful test signal cause, and that there may exist such subtle relationships that can be extracted. The goal of testability transformation is to map faults to failures. However, research also needs to take quickly arrive at a useful version of the system under test from account of how this relationship is affected by a continuous which a better and/or faster signal can be extracted. deployment environment with high TERF ratio. In such a scenario, partial coincidental correctness and The transformation to the system need not necessarily create failed error propagation [7], [123], [129] become paramount a version that is functionally equivalent to the original [66]. concerns, because testing only reveals a subset of failures due For instance, expensive set up and tear down phases may be to a fault, and ongoing changes may cause further failures to avoidable, while expensive interactions may be unnecessary emerge or disappear, independent of any attempts to fix those for testing and thus mocked. However, the rapidly increas- that testing has revealed so far. ing TERF (Test-Execution-to-Release-Frequency) ratio creates This independent emergence and disappearance of failure other possibilities for new research directions that are currently manifestations is also subject to the ATAF problem, and is under-developed, among which we wish to draw attention to typically transient, due to the ongoing regular changes to the the problem of fully incremental testing. code base. In such a scenario there is a challenge in detecting Transformation has been familiar in both testing and verifi- when a fix can be said to have occurred. cation, where it is common to mock procedures that have yet to be implemented in order to test early, or to model with stubs, A. Research Questions on Fix Detection the behaviour of inaccessible code, such as The need for better fix detection raises scientifically inter- routines, in order to verify. For example, when Microsoft’s esting (and industrially high impact) research questions that Static Driver Verifier tool is applied to a Windows driver, it we would suggest are worthy of further research and study: replaces calls from the driver to operating system functions, PFD Partial Fix Detection Problem. 
In the presence of multiple failures caused by a single fault, how can we best detect partial fixes (that remove some failures, though not necessarily all, without regression)?

NFD Noisy Fix Detection Problem. Given the ATAF principle, and continual changes to the code base that, without techniques to ameliorate them, may otherwise partially mask faults, or confound failure signals from multiple faults, how can we detect when a fault is fixed?

TFD Transient Fix Detection Problem. If a fault, f1, is fixed by a change c it might mean that the failure F(f1) associated with fault f1 disappears (i.e. ¬F(f1)). However, to be more precise, we should take account of time: it can be that f1 is fixed by a change C(c1, t1) for time t1 to t2 because ∀t. t1 ≤ t ≤ t2 · ¬F(f1, t). All fixes are thereby generalised to transient fixes, with the 'traditional' view of a fix as the limit (fixed for an indefinite period). This raises interesting research questions, such as
a) What other code can influence a transient fix (changing its status from fixed to unfixed)? Perhaps dependence analysis [17], [75] and mutation analysis [78] can be helpful here?
b) How can defensive code be added to improve fix resilience (thereby extending the window of transience)?
c) How can we optimise the window of transience, by choosing from available fixes or by warning against code modifications that may break previous fixes?

and even some C functions, by model code that does not behave in the same way as the usual execution context of the drivers [11].

Surprisingly, given its prevalence, the provision of formal foundations for testing and verification transforms remains an open challenge [61]. Since such transformations need not necessarily preserve functional equivalence, they differ from more traditional transformation approaches [42], [51], [56], which are meaning preserving (usually by construction) in the traditional sense of functional correctness [73], [102]. It is perhaps ironic that a transformation that is used as part of the process of testing and verification need not, itself, preserve functional correctness in order to best perform this role. However, in order to rely more fully on this role, there remains a pressing need for formal underpinning of the techniques used for testability/verifiability transformation [61].

A recent set of open research questions on Testability Transformation, encompassing problems including semantic definitions, abstract interpretation, mutation testing, metamorphic testing, and anticipatory testing, can be found elsewhere [58].

X. EVALUATION CRITERIA

The evaluation criteria that are typically used in scientific evaluation of testing and verification tend to focus on efficiency and effectiveness, where effectiveness is often characterised by the number of faults detected by the technique under investigation. It is useful to know how many faults are detected by a technique, particularly when this is part of a controlled experiment with a known pool of faults [34], [47], [104], but there are other important criteria to be taken into account.

simulations can be based on sensible assumptions and parameters. For example, consider the automated generation of test cases, for which the primary driver of scalability is likely to be the execution time (of the system under test). In order to experiment with this and report on this

The evaluation criteria we list here are neither intended to replace existing criteria, nor are we suggesting that they are all mandatory.
Rather than placing an additional burden on the dimension of scalability, researchers can simply insert delays shoulders of scientists, with yet more evaluation obligations, into the example systems under test on which they report. we offer these as alternative dimensions for evaluation, along This simple delay-insertion approach would provide a simu- which it is possible to provide compelling scientific evidence lation that would allow researchers to report graphs indicating that a proposed technique will add value and may be likely the impact of increases in execution time on the technique’s deployable in practice. If a technique performs well according bugs-per-minute. to just one of these criteria, then this might suggest that it Similarly, for investigating practical real-world space and shows promise, even should it fail on others (or on more time scalability on realistic examples for a static analysis traditional efficiency and effectiveness) criteria. technique, the driver may be code size. Once again, the In this section, we set out evaluation criteria, with a researcher can easily find ways to synthetically increase the particular focus on deployability of testing and verification size of code in a non–trivial way, in order to report on this techniques. We believe that techniques that perform well dimension of scalability. according to some or all of these criteria will tend to be more deployable than those that do not, irrespective of the number B. Exploring, defining and measuring signal of faults and failures detected by the techniques concerned. Critical to all work on Testing and Verification is the ‘signal’ that the automated static and dynamic analysis techniques A. Sim-CIDie: Simulating Continuous Integration and De- provide, both to other tools and to the software engineers ployment (CID) Environments they serve. The research community should surely prioitise Software deployment increasingly uses continuous integra- the exploration definition and formalisation of measures of tion and deployment, yet researchers typically do not have signal, since this is such a key output from all static and access to such systems. It is not essential to evaluate all dynamic analysis. However, ‘defining and exploring the signal’ research on testing and verification in the context of Contin- remains a surprisingly under-researched topic, given its pivotal uous Integration and Deployment (CID). Nevertheless, where role in actionability of research findings and deployability of scientific results can be presented that support claims for CID, the techniques about which those findings are reported. this will likely be a strong indicator of deployability and Static and dynamic analyses can produce a variety of actionability of research findings. How then, are researchers different kinds of signal, including (but not limited to) bug to experiment and report on the results of their techniques in a lists, proof of bug type’s absence, potential fix candidates, CID setting, without working in an organisation that practices warning of potential issues, crash reports with stack traces, test CID? coverage achieved, and localisation information (to help locate Fortunately, researchers can easily simulate the effects of bug causes). It is easy to become primarily concerned with the continuous integration by taking a series of open source technical challenges of computing this signal in reasonable releases of a system, decomposing each release into smaller time and at scale. 
However, for deployment it is helpful to change components, and simulating the effect of these landing asses the actionability of a signal, for which we offer candidate into a repository at different rates and interleavings. Large definitions below: scale repositories of OSS code changes (together with their Signals are provided as an input to some client. The client reviews) are publicly available to support this [110]. could be a developer (who potentially benefits from the signal) This creates a simple simulation of the typical workflow or a downstream tool or system (that can make use of the in a continuous integration environment. Researchers can also signal). simulate the disruptive effects referred to in Section VIII, by Definition 1 (Signal Client): The client for a signal, is an constructing different versions of changes to the repository, actor (a human, a tool, system or service) that is able to each of which interfere with one another. respond to the signal by performing an action. Indeed, the ability to control the degree of interference, the The concept of a client is a context-sensitive one; the same rate at which changes land into the repository, and their size human (or tool) in different contexts, would be regarded as a and overlap, could be one of the advantages of proper experi- different client. For example, a developer receiving a signal, mentation; the ability to experiment with controlled parameters as he or she edits a file to make a change, is a different client in the laboratory setting. It would be very encouraging to to the same developer receiving the signal some time after he see research that tackles these challenges using laboratory– or she has landed the change into the repository. controlled experiments in terms of the parameters that affect Definition 2 (Actionable Signal): The signal is actionable if the deployment of static and dynamic analyses in a continuous there is a particular action that a client can perform to improve integration environment the as a result of receiving the signal. In order to evaluate scalability, especially scalability in We wish to define a metric for signal effectiveness that terms of people, researchers may have to simulate, but these captures the cost involved in acting on the signal. That is, 14 the concept of actionability implicitly has an associated cost. signals that emerge from a tool or technique. Even when Suppose that the client that receives the signal is a developer. effectiveness cannot be measured directly (e.g., if obtaining If we tell a developer that there is a crash in a system, without a user population is not practical) then factors that influence further information, then it is possible that they will be able, effectiveness such as timeliness and relevance could still be with great effort, to locate the cause of the crash and resolve it. measured. All told, measures such as these would complement If we flag a bug to a product developer which is due to a flaw the more traditional, and still valuable, metrics such as false in a not under their control, it can be more difficult to negative and positive rates. act than for a bug whose cause is in their own code. The form 1) Cost of obtaining signal: Bugs per unit resource: In the and quality of error reports also affects actionability, through evaluation of testing and verification techniques, it is easy to the cost involved in acting on the signal. 
focus exclusively on the benefits offered by the technique, and One way to measure this cost, would be through a ‘signal ignore the overall cost. effectiveness metric’ that formulates cost (and other influenc- Indeed, as academics ‘landing’ in a major company, we ing factors) in terms of the probability of action by the client admit that we both underestimated the significance of cost. receiving the signal. For example, in 2015 we improved the number of (potential) Definition 3 (Effectiveness Metric): An effectiveness metric bugs found by Infer’s iOS analysis. However, there was a is any function from signal, σ (and possibly additional optional performance cost: Infer went from taking 3% of the datacenter qualifying parameters, x¯ such as client c, platform p, language capacity allocated to iOS CI, to taking over 20%. Infer was l, system s etc) to [0..1]. therefore throttled back to run less frequently, until we were The effectiveness metric, f(σ, x¯), denotes the likelihood that able to reduce the ‘perf’ costs for the improved analysis. As a the client, c, will act on the signal σ. When not parameterized result, we not only started to pay even closer attention to perf, by c, the effectiveness metric refers to an arbitrary client. we also began to use bugs-per-minute as a measure to help us This definition is simply a generic (and abstract) definition decide whether to deploy new candidate analyses. of effectiveness; it can surely be refined. It could be broken For adoption and deployment, there is always an implicit down further, as it is influenced by the cost of actionability, cost-benefit analysis, and researchers can therefore improve the relevance of reports, timeliness, and other factors. Our goal information relevant to deployability, simply by surfacing the in introducing this definition is simply to seek to stimulate the discussion of this trade-off. Consider presenting results that research community to refine and improve it, or to replace report the behaviour of the proposed approach with respect to it, but not to ignore it. We believe that defining appropriate a cost-benefit model. Bugs-per-minute and other cost-benefit effectiveness metrics remains an interesting open research measures might form part of the comparison of different question for the scientific community. analyses. As a result, there may be unexpected ‘sweet spots’ Many techniques already implicitly involve an effectiveness that render a technique useful, where it would otherwise be assessment that could be characterised as a measure of the discounted according to ‘default’ cost-benefit assumptions. likelihood that intervention with positive intent would lead There is a human cost in the signal-to-noise ratio (as to positive outcome. For instance, the assessment of fault discussed in Section X-B2), but there are also other resource- localisation [79], [80], [139], [140] traditionally measures the based costs (such as the execution time) to find that signal. likelihood that the truly faulty statement is accurately elevated Suppose a proposed new test technique, α finds twice as (to a top N rank) by the ‘suspiciousness’ metric used to many (true positive) issues as the state–of–the–art technique localise. ω. It may appear, on the surface, that α outperforms ω. The Identifying the client to which the signal is sent is important evaluation of many studies of testing verification are just so- in assessing effectiveness. 
The same signal directed to different constructed, making plausible scientific claims based on total clients can have different effectiveness. For instance, a fault number of faults detected by different techniques. localisation technique will have different effectiveness (as However, one really cannot answer the question: ‘which is measured by outcome of positive-intent intervention) when better α or ω?’, without taking into account both the rate of sent to a human and an automated repair tool [85]. issues reported (e.g. bugs found per minute), and the mode in In both cases the desired outcome is a fix, but the human which the proposed technique should best be deployed. will require the truly suspicious statement to be relatively high It is fairly obvious that the rate is more compelling scien- in the ranking, while the repair tool merely requires that the tifically (and more important practically) than some absolute combination of localisation and repair will perform better than amount of issues reported. The absolute amount is always an repair alone and may, thereby, tolerate a lower ranking of the ‘amount within a given budget of time’ in any case, and so truly faulty statement. comparing simply the ‘absolute amount of signal’ might be The Infer experience described in Section III, where diff unfair; the researcher might be able to choose a time budget for time deployment led to a 70 percent fix rate, where previous which α outperforms ω and some other time budget for which ROFL deployment for the same analysis was near 0 percent, is ω outperforms α. Clearly reporting merely a total number of an extreme example of how directing signal to different clients bugs found, for example, allows the researcher to inadvertently can have different effectiveness. fall into a ‘Cherry-picked Time Budget’(CTB) fallacy. It would greatly benefit the evaluation of deployability if Instead, if we consider the rate at which issues are reported, actionability and effectiveness metrics were to be defined and we also bring into focus the consideration of the proposed applied to report on these two important properties of the deployment mode. To illustrate, consider the three different 15

on, for example, total bugs reported. By treating testing and verification evaluation as an analysis of cost-benefit, we believe that results are more likely to be useful to other scientists and to potential adopters of the proposed techniques. Therefore, we suggest that researchers should report faults or failures detected per unit of resource, and to report results that show how this measure of value (faults and failures reported) varies as the unit of resource increases. These observations impact deployability, but also feed back into more scientific reporting considerations: suppose, for instance, we wish to perform a non-parametric inferential statistical test for the effect size of the improvement of a technique over the state–of–the–art. This is a common ‘best Fig. 3. Question: Which technique is best? Answer: all of them (each in different deployment modes) practice” in much empirical software testing work [8], [71]. . Here too, the use-case (intended deployment scenario) should also be considered, for example, to motivate the choice of program analysis techniques depicted in Figure 3. Each could transformation to be applied to the effect size test performed be a static, dynamic or hybrid analysis; what we care about [105]. is the rate at which they find faults as a cumulative faults– The specific units of resource in question will differ from reported growth profile over time. study to study. They could be measured as elapsed time, CPU Which of the three techniques is the top performing? cycles, memory consumed, or some other domain-specific Technique A finds the most faults, but not initially. If we are appropriate determination of the cost of testing or verification. prepared to wait, then Technique A will yield the most faults Rather than inhibiting scientific work by providing yet found and be declared the ‘winner’. The scenarios in which it another burdensome set of evaluation criteria, this cost-benefit would make sense to wait this long preclude and approach may free scientists to find new avenues of deploy- regression testing of overnight builds. However, for scenarios ment, by considering interesting sweet spots for their particular where there is a major release every few months, for example, technique that maximally optimise the cost-benefit outcome Technique A would be the top performer, all else being equal. for some chosen deployment mode. On the other hand, suppose we want to deploy the technique 2) Cost of exploiting signal: Signal to Noise Ratio and a to perform a more shallow (but fast) analysis to find crashes generalisation of false positives: Research on software testing quickly for a so-called ‘smoke’ test. For such a smoke test, and verification often pays particular attention to false positive perhaps Technique B or Technique C would be preferable. rates. However, developers may be tolerant of a modest level If we wanted a super fast smoke test, perhaps giving the of false positives, provided that the technique provides overall developer almost instantaneous feedback on a build, then value. The false positive rate is a specific instance of the Technique C would outperform the other two. On the other more general problem of ‘noisy reporting’. We believe that a hand, if the smoke build process itself takes a few minutes more nuanced evaluation, in terms of this more general noisy (or longer), then Technique B would be the best-fitted to that reporting problem, would provide more reliable scientific deployment scenario. 
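The Figure 3 comparison can be made concrete with a small sketch (the detection times below are invented solely to reproduce the shape of the three curves): ranking techniques by faults found within different time budgets shows that the 'best' technique changes with the budget, which is exactly the Cherry-picked Time Budget fallacy discussed above.

# Illustrative sketch only: comparing techniques by their cumulative
# fault-detection profile rather than by a single total, to avoid the
# 'Cherry-picked Time Budget' (CTB) fallacy. Detection times are hypothetical.

from bisect import bisect_right

def faults_within(detection_times_minutes, budget_minutes):
    return bisect_right(sorted(detection_times_minutes), budget_minutes)

def compare(techniques, budgets):
    for budget in budgets:
        ranking = sorted(techniques.items(),
                         key=lambda kv: faults_within(kv[1], budget),
                         reverse=True)
        best, times = ranking[0]
        found = faults_within(times, budget)
        print(f"budget {budget:>4} min: best = {best} "
              f"({found} faults, {found / budget:.2f} faults/min)")

if __name__ == "__main__":
    # Technique C is fast but shallow; B suits a few-minute smoke test;
    # A is slow but eventually finds the most faults.
    techniques = {
        "A": [55, 60, 70, 80, 90, 100, 110, 120],
        "B": [10, 15, 20, 28, 50, 70],
        "C": [1, 2, 3],
    }
    compare(techniques, budgets=[5, 30, 180])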
findings regarding the effectiveness of testing and verification This discussion reveals how important it is for researchers techniques. to articulate the deployment scenario for their approach. Re- For example, consider the situation, illustrated in Figure 4, searchers can then demonstrate that their proposed technique is where Technique A has a false positive rate of 20%, and each better–fitted to that scenario than a plausible alternative state– false positive has to be investigated by a developer, costing of–the–art for that scenario. Doing so will increase chances the developer, on average, five minutes of time to establish of adoption, and will also provide more precise scientific that ‘this is a false positive’. Compare this to Technique B, evidence for the evaluation of the technique’s efficacy. which has a false positive rate of 40%, all of which need None of these observations is surprising; they denote little to be investigated, but which require developers to spend more than common sense. What is perhaps more surprising, only 20 seconds each, on average, to determine that the false is that few of them are taken into account in the scientific positive is, indeed, not worth considering further. Suppose evaluation of proposed testing and verification techniques (the both techniques find approximately the same number of faults authors of this paper are at least as guilty of this as the rest with similar severities, and differ only in their false positive of the community). rates. Clearly the technique with the higher false positive rate As a scientific community, we rightly want to accept work (Technique B) will be preferable. that demonstrates evidence for an advance on the state–of– The take home message is that noise to the developer the–art, but too often we can be seduced by simple metrics, (the overall cost of these false positives) is not merely a such as ‘total number of bugs reported’ and thereby overlook one-dimensional false positive count. It is (at least) a two- potentially promising techniques that may have something to dimensional measurement, according to the twin costs of num- offer, even though they fail to outperform the state–of–the–art ber of false positives and time to dismiss them. More generally, 16

Fig. 4. Noise has at least two dimensions of cost: number of false positives Fig. 5. Signal has at least two dimensions of value: number of issues reported and the cost of dismissing them from further consideration. However, most (e.g. bugs reported), shown here as the vertical axis and the ease with which scientific evaluations of static and dynamic analysis techniques consider only a they can be fixed, due to the ‘debug payload’ of the reporting technique (the single dimension: number of False Positives. This can lead to the abandonment horizontal axis here). The overall signal is the area under the graph. Typically, of otherwise promising research ideas and prototypes. scientific evaluations simply report along a single dimension, such as number . of bugs found, but a more nuanced evaluation involving other dimensions might also highlight promising scientific advances that might otherwise go unnoticed. . the overall noise (to the downstream client application or user) is the property that we should seek to measure, evaluate and reduce in the scientific understanding of static and dynamic of this reporting for wider uptake and deployment of research analysis. This ‘noise’ level is likely to be multidimensional, prototypes. since there are different types of cost along different axes, and Reporting on the signal–to–noise ratio, rather than focusing each may also have a different weight in its contribution to solely on false positives will support a more nuanced scientific the overall cost. Taking this into account provides for a far debate of merits and de-merits of a proposed approach. Taking richer space of candidate solutions to analysis problems and, account of these nuances may help to ensure that promising thereby opens the door to many more potentially promising techniques are not overlooked because of superficially poor research ideas. performance (in terms, for example, of their total numbers of If noise is (at least) two-dimensional, what about signal? false positives). We therefore suggest, merely that researchers This too, we argue, is multidimensional. Consider two tech- should report an assessment, even if very approximate, of the niques with identical false positive rates and dismissal costs, Relevance Ratio and the False-Positive Dismissal Cost. but where there is a difference in the time required to fix the bugs reported as true positives, based on the information XI.DEPLOYABILITY provided by each technique. This situation is depicted in Whether or not research is undertaken with ultimate de- Figure 5. Even when a technique reports a true positive, there ployment in mind, it may be helpful to understand, and report is still the cognitive load on the developer who has to fix the on some of the factors that affect deployability of research bug, or, in the case of automated repair, has to check and techniques. A lot has been said in the literature on the topic satisfy him or herself that the is correct. This cognitive of the scale of industrial software systems and the need for load is also a cost; the cost of action. The debugging payload techniques to scale up. This is undoubtedly important and we provided to the developer may contain irrelevant information, touch on it briefly later in this section. However, less widely and this has the same detrimental effects as the false positives studied and potentially far more important, is the issue of studied so much more widely by the community. friction. 
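Returning to the two techniques compared in Figure 4, a back-of-the-envelope calculation (the weekly report volume is invented for illustration) shows why the two cost dimensions must be combined rather than counting false positives alone:

# Illustrative arithmetic only, with an invented volume of 1,000 reports per week:
# false-positive burden = number of false positives x mean time to dismiss one.

reports_per_week = 1000                      # hypothetical volume
technique_a = {"fp_rate": 0.20, "dismiss_minutes": 5.0}
technique_b = {"fp_rate": 0.40, "dismiss_minutes": 20 / 60}

for name, t in [("A", technique_a), ("B", technique_b)]:
    burden = reports_per_week * t["fp_rate"] * t["dismiss_minutes"]
    print(f"Technique {name}: {burden:.0f} developer-minutes lost to false positives per week")
# Technique A: 1000 minutes; Technique B: about 133 minutes, despite the higher FP rate.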
Overall, therefore, researchers need to provide estimates of That is, it may be possible to find useful engineering the relevance of the information provided in order to fix true compromises that trade precision for scale, making scale positives relative to the total information provided (which we a relatively surmountable challenge to ultimate deployment. term the ‘Relevance Ratio’). Researchers also need to estimate However, a technique that exhibits unnecessary friction is the amount of time required to dismiss false positives, (which not only less likely to be deployed, but it may be harder to we term the ‘False-Positive Dismissal Cost’). overcome this potent barrier to deployment than it is to scale Techniques with higher relevance and lower false positive the apparently unscalable. dismissal cost will tend to be preferable, even if they have Although scale can usually be recast as a trade-off, friction, higher rates of false positives and find fewer faults. Although on the other hand, often arises because of the very design providing these estimates is challenging without involving and fundamental static and dynamic analysis characteristics, human subjects, we believe that surfacing these issues in sci- philosophy or approach adopted. Where such decisions be- entific publications will help to ensure that they are discussed come ‘baked in’ early in the design of a technique they may and accorded the full scientific attention they merit. doom the research agenda to ultimate failure. It is therefore Researchers can and should report on such metrics in their prudent to consider friction at the outset and throughout the research submissions, articles and grant proposals. We surface development of a research technique, even if there is no them here with the aim of facilitating and motivating the value immediate plan or desire for deployment. 17

A. Friction The on-going drag must also be continually checked (and In discussing friction, we do not mean to focus on the perhaps reduced) to ensure continual deployment. This gives generic barriers and problems that inhibit the adoption and on- us the following simple ‘if...do’ pseudo code for IF ... DO going use of a technique proposed by the research community deployment, the goal is to repeatedly deploy friction reduction and considered for adoption in industry. These barriers to such that this pseudo code fails to terminate: adoption have been widely discussed elsewhere [38], [114], begin [121]. Excessive focus on them can lead to the research if reduce(IF) ≤ sufficiently_low community becoming demotivated and disengaged from the then do challenges of seeking real-world industrial impact. reduce DO Rather, we wish to view friction from the specific point of if DO > tolerable_drag_threshold view of the would-be adopters (implementors and users) of then exit fi a proposed technology; how much does the technique itself od require effort from them in order to adopt and continue to else no_deployment use the technique? Thus friction is not a generic barrier that fi cease_deployment inhibits any and all deployment attempts, but a technique- specific resistance to adoption that can be identified, tackled and reduced to maximise adoption likelihood; minimising We parameterise IF and DO by the client, c because there friction will maximise deployability. are at least two distinct types of client affected by inertial There are two types of friction that can impede deployment friction when deploying a new tool: of static and dynamic analysis tools: inertial friction, and 1) the clients that will use the signal from the new tool to ongoing drag. improve software products; We define inertial friction for an analysis technique to be 2) clients (likely engineers assisted by existing tooling) the time taken from the decision to adopt to the first developer that will build and maintain the new tool so that it can receiving the first signal from the system. provide this signal efficiently and effectively. Definition 4 (Inertial Friction, IF): The inertial friction IF IF(t, c) of a tool, t for a client c is the minimum possible Here are two examples of : effort that c will expend between the decision to adopt the tool Inertial Friction (for tool users) for Infer Static Analysis: and the first signal returned by the tool. The ROFL-myth example for Infer, mentioned in Sec- We distinguish IF, the inertial friction, from the ‘Drag of tion III, in which we initially (and unsuccessfully) expected On-going use’ (or ‘on-going drag’), DO, which is any effort developers to switch context to solve bug reports in batch required by a client, per signal, in order to get further signals, mode was one example of high inertial friction for tool users. once the first signal has been obtained from the technology. We attribute our initial failure to successfully deploy entirely to Definition 5 (Drag On-going, DO): The Drag (On-going), this inertial friction. Once we moved beyond the ROFL myth DO(v, c) for deployment version v, and a client c, is a measure to diff-time testing, as explained in Section III, we went from of the average effort required from the client c, per signal zero (fix rate) to hero (∼70% fix rate) pretty fast.
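The 'if ... do' pseudocode above can be rendered as the following runnable sketch (the Tool class, its decay factors and the thresholds are placeholders invented for exposition, and the client parameter of IF(t, c) and DO(v, c) is omitted for brevity); in these terms, successful deployment corresponds to the loop never exiting:

# Illustrative sketch only: a runnable rendering of the 'if ... do' pseudocode
# above. The Tool class, its numbers and the thresholds are all invented;
# in the paper's terms, successful deployment means the loop never terminates.

class Tool:
    def __init__(self, inertial_friction, ongoing_drag):
        self.inertial_friction = inertial_friction
        self.ongoing_drag = ongoing_drag

    def reduce_inertial_friction(self):
        self.inertial_friction *= 0.5      # effort invested before first deployment
        return self.inertial_friction

    def reduce_ongoing_drag(self):
        self.ongoing_drag *= 0.9           # effort invested in each new version
        return self.ongoing_drag

def deploy(tool, sufficiently_low=1.0, tolerable_drag=5.0, max_versions=100):
    if tool.reduce_inertial_friction() > sufficiently_low:
        return "no deployment: inertial friction too high"
    for version in range(max_versions):    # bounded here only so the demo halts
        if tool.reduce_ongoing_drag() > tolerable_drag:
            return f"deployment ceased at version {version}"
    return "still deployed"

if __name__ == "__main__":
    print(deploy(Tool(inertial_friction=1.5, ongoing_drag=3.0)))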

0 Inertial Friction (for tool users) for Sapienz Dynamic produced by v between vt0 and vt , where vt0 is the time at which the first signal from version v is returned to the client Analysis: In order to automatically design tests that target c, and vt0 is the time at which version v is decommissioned. the revelation of problems we need to know what constitutes By contrast with inertial friction, on-going drag is defined a ‘problem’. In order to automate test design this means per version, because one important goal of deployment should that the test tool needs oracle information [14]. One way of be to continually release new versions of the technology, with making this available to the tool would be for the developer each successive version reducing the ongoing drag. Many to add assertions to their code that, inter alia, communicate techniques have been adopted, yet subsequently discarded due the definition of a good (respectively bad) state to the test to ongoing drag; the effort required to continue to receive tool. Having this information could dramatically improve the useful signal from the technology. performance of the tool, but it introduces friction on the While it will undoubtedly be difficult for developers of developer who has to define and insert the assertions. research prototypes to envisage all the new scenarios that If that friction is necessarily encountered before the tool might lead to such ongoing drag, any instances that can be can be deployed then it would be inertial friction. Of course, factored out and considered to be germane should naturally be it could be that the tool would initially use an implicit oracle the subject of considerable attention; reducing such inherent [14] (such as ‘applications should not crash’) and, only some on-going drag will be critical to winning champions within an time after deployment, seek additional assertions from authors, organisation, once the inertial friction has been overcome. thereby deferring the friction from inertial friction to on-going Inertial friction is something that potential adopter devel- drag. opers might assess before adopting a technique, and reducing Of course, the potential adopter community are not the it is critical for increasing initial deployment likelihood. It only engineers who might be concerned with the inertial is worthwhile considering multiple modes of deployment, in friction of the software testing for verification technology. We order to reduce inertial friction, even if this means sacrificing have witnessed many otherwise promising research techniques, some of the signal that can subsequently be obtained. including those with low inertial friction to potential adopters, 18 that fail to be adopted in industry due to the inertial friction B. Serendipitous Deployment Pivots to the tool developers. The first few versions of the proposed research technique to For example, a technique with a highly fragile language- be deployed might benefit from a ‘Lean Startup’ model [116]. sensitive instrumentation technology that needs to be carefully The goal of deployment is to address the practical static or tuned and adapted before it can be applied, or a technique that dynamic analysis problem in hand. However, a ‘lean startup requires detailed mathematical models to be constructed for model’ would also aim to gain information, concerning use- large parts of the system to be verified, would each have high cases and, in the case of friction, the ongoing drag that each developer inertial fiction. 
use case incurs. This information can be used to optimise As a result, neither is likely to find adoption due to inertial subsequent versions of the system to reduce the ongoing drag friction, not on the engineers who might use the signal from of the most frequent use cases. these tools, but due to the friction on those who have to de- It can often happen that the mode of deployment envisaged velop technology to deliver that signal. Successful deployment by the research team is not that found most beneficial to therefore also requires an initial version of the technique for engineers who use the tools. Such unanticipated use-cases which tool-developer inertial friction is also minimised. need not invalidate the tool; they may well enhance its impact. Here we give two concrete examples of inertial friction on Unexpected avenues of impact can offer new opportunities to the tool developers: one static (drawing on experience with exploit the research in ways not originally foreseen by the Infer) and one dynamic (drawing on experience with Sapienz): researchers. Inertial Friction (for tool developers) for Infer Static Many researchers have experienced and commented, often anecdotally, or off-the-record, on the unexpected use-case Analysis: phenomenon. We have witnessed it too and would like to Static analysis tools are often thought of as push-button, document it here with an example. In 2002, one of the but researchers and practitioners in the field know that there authors and his colleagues deployed a static analysis tool can be considerable human startup time. For example, the at DaimlerChrysler in Berlin, called Vada; a simple variable tool developer might need to design a model for intricate dependence analysis tool [63], based on research work on (and changing) programming language details, such as library program slicing. The aim of Vada was to assist Search Based calls for which source is unavailable. One of the precursors Testing at DaimlerChrysler. The use-case initially envisaged of Infer developed an adaptive program analysis, the aim of was to deploy dependence analysis to reduce search space size which was to make it easier to run the tool and get more [64], [98]. meaningful results than previously obtainable. Dino Distefano, However, the developers also found Vada useful to under- one of the founders of Monoidics and an engineer on the stand the relationships between input parameters to C func- Infer team at Facebook, used to talk about minimizing the tions and their influence on defined variables. The engineers ‘time spent before pressing the button’, aiming to shrink it found Vada’s analysis to be particularly useful for globals, as a from months to days or hours. He conceived of technical means of program comprehension, which was unplanned and advances related to discovering data structure descriptions and unexpected. This additional, unanticipated use-case occasioned discovering preconditions precisely in order to minimise this a great deal of subsequent development to augment with time [16], [32]. Measuring and minimising ”Push The Button” may- and must- analysis and techniques to better expose the (PTB) time is critical to determining whether a technique can previously internal dependence analysis to developers. be successfully deployed. 
Before long we were starting to field requests from our users Inertial Friction (for tool developers) for Sapienz Dynamic for new analyses to support the unplanned deployment and we Analysis: The design of Sapienz was explicit in its attempt started to develop visualisations to further support this use- to minimize intertial friction for tool developers, because the case. This visualisation effort fed back in to research develop- original deployment scenario was that it would be run on arbi- ment, with our subsequent work reporting on the visualisations trary APK files from Android applications, for which there was themselves [18], and the prevalence of dependence clusters we no source code available. As a start-up, it was essential that the were finding [20]. tool would run ‘out of the box’ on an unseen application and These dependence clusters were an apparent artifact of a give some signal (find some crashes) with zero set up cost. Its (very simple) visualisation; the Monotonic Slice size Graph ability to do this meant that we assumed absolutely no white (MSG), which simply plots the size (vertical axis) of all slices box coverage. Doing so traded fitness function guidance for in increasing order of size (on the horizontal axis). Other (considerable) reduction in inertial friction and was critical to researchers subsequently also reported finding widespread the early adoption of the technology at Facebook. prevalence of dependence clusters in other languages such as The definitions we have introduced in the section are merely Java [125] and Cobol [57] and in both open and closed source suggestions. One important research question for the scientific [1]. community is to define suitable metrics that can be used to More recently we were able to demonstrate the potentially assess the inertial friction and ongoing drag for testing and pernicious effects of dependence clusters [135], thereby moti- verification techniques, as well as other software engineer- vating this initial interest. As a result of chasing an unexpected ing techniques. Such metrics would be useful in addressing use-case for a deployed research prototype, we were thus questions of research actionability, by assessing the research- rewarded with a rich seam of novel research questions and controllable parameters that influence deployability. intellectually-stimulating scientific investigation. 19

Ironically, our research team had started to move away XII.THE FI FI/GI VERIFY CHALLENGE:COMBINING from its initial vision of slicing and dependence analysis as TESTING AND VERIFICATION FOR ANALYSIS AND a comprehension support [21], [62], because of the large size MANIPULATION of static slices [19]. Instead, we had become more focused No keynote paper on a topic as broad as static and dynamic on dependence analysis as a support for downstream applica- analysis for testing and verification would be complete without tions [65]. However, through our engagement with industrial a grand challenge. The supply of grand challenges to software deployment of our prototype tooling, we discovered that real- engineering and programming languages researchers shows world developers were, indeed, finding slices useful when little sign of drying up. Indeed, the supply of such challenges just as Weiser had initially envisaged debugging, and reported considerably out-paces the deployment of techniques that meet in his seminal PhD work from the 1970s that introduced them. program slicing [130]. Therefore, we do not propose to introduce, here, yet another completely new challenge. Rather, we would like to develop and better articulate the existing challenge of FiFiVerify (Find, Fix and Verify systems automatically), generalising it to the C. Scale wider challenge of FiGiVerify: Find, (Genetically) Improve and Verify, automatically. Scale means different things to different people. Different Fixing bugs is merely one way in which systems can be techniques will scale (or fail to scale) in different dimensions, improved. Increasingly, however, it is becoming realised that but scalability needs to be addressed in order to assess the functional correctness is merely one form of overall fitness degree of deployability for a technique. for purpose. We therefore generalise the FiFiVerify vision However, lack of scalability should not be a reason to reject to cater for all forms of fitness for purpose, not merely the a proposed scientific project or research paper; many appar- more narrowly construed functional correctness of the software ently unscalable techniques have subsequently been found to system. scale, due to subsequent innovations and discoveries. It is This challenge rests on what we believe to be one of the important to report on apparently unscalable, yet otherwise most exciting possibilities for research on testing and verifica- promising technologies in the hope that others may subse- tion: its combination with automated program improvement, quently find techniques to tackle their scalability. through techniques such as Genetic Improvement [69], [82], Nevertheless, for those researchers who additionally want [112] Code Synthesis [56], [88], Code transplantation [13], to demonstrate deployability and thereby to maximise chances [113], [122], [126], Tuning [74], [131] and Automated Repair of more immediate impact, it will be important to report on [12], [85], [106]. scalability and, to identify the relevant dimension and drivers The original formulation of the FiFiVerify Challenge con- of scalability. For example, one might imagine that the primary sidered analysis, to find and fix bugs and verify the fixes; code driver of computational time scalability for a static analysis manipulation was limited to bug–fixing patches. What if we technique will be the code size to which the technique is could further manipulate the program code to achieve perfor- applied. 
By contrast, for a dynamic analysis, the execution mance improvements? After all, in many practical deployment time of the system under test may have more influence on the scenarios, particularly mobile overall compute-time scalability. performance is the new correctness In order to address scale, researchers should identify the We believe that there is an open challenge to find ef- primary drivers of scalability and present results that report fective and efficient ways to combine testing, improvement on the impact of these drivers on the overall speed (see and verification; testing to find issues (both functionality Section X-B1). failure and performance problems), improvement to address In scientific studies of software analyses, ‘scale’ is typically the issues, and verification to ensure that the improvements are used to refer to space and time scalability and is, thereby, correct (e.g., side effect free). In this section, we briefly recap directly related to well-understood algorithmic space and time the FiFiVerify challenge and generalise it to the FiGiVerify complexity analyses. However, there are other dimensions challenge. of scalability that may prove equally and sometimes more We have previously argued for both the importance and near important for a technique to be deployed. realisability of the FiFiVerify challenge [68]; the community In industry, the term ‘scale’ is often used to refer to scaling is close to realising the ability to create “tools that will find, in terms of people, as well as engineering artefacts, such as fix and verify the systems to which they are applied”: to machines or lines of code. Is it much harder for a technology “take a program that may contain bugs (from some identified to be deployed to 1,000 people than to 10? This notion of bug class, such as memory faults) and return an improved ‘scalability’ draws us into the question of the inertial friction program” that “has all bugs for the specified class fixed and and ongoing drag (on tool deployers and their users). These is verified as being free from this class of faults” [68]. aspects of a proposed software analysis technology can also Sadly this FiFiVerify ‘vision’ remains just that; a vision, be addressed, assessed and evaluated and they are, themselves, with a set of associated open problems. However, it would important when judging deployability, and to which we now still appear that the primary challenge is to find scalable ways turn. in which to compose various existing solutions to test case 20 design [27], [48], [91] decomposible verification [30], [32], techniques [112]. The GISMOE challenge did not include any [53], [86] and automated repair [85], [106]. There is no known notion of verification of the improved code, so FiGiVerify is fundamental (technical or scientific) reason why we cannot essentially the meet point of the FiFiVerfiy Challenge and the find such a composition and thereby achieve the ‘FiFiVerify’ GISMOE Challenge: vision. FiGiVerify = GISMOE + FiFiVerify A. FiGiVerify: Find, Genetically Improve and Verify XIII.ACKNOWLEDGEMENTS We can generalise FiFiVerify to GI (Genetic Improvement): FiGiVerify would consist of finding performance issues, such Thanks to the Infer and Sapienz teams at Facebook for as regressions through which a system has become insuffi- helping the authors learn about static and dynamic analysis ciently performant, and fixing these using genetic improve- at scale, and to the Developer Infrastructure (DevInfra) organ- ment. 
A. FiGiVerify: Find, Genetically Improve and Verify

We can generalise FiFiVerify to GI (Genetic Improvement): FiGiVerify would consist of finding performance issues, such as regressions through which a system has become insufficiently performant, and fixing these using genetic improvement. The verification obligation requires a demonstration that there has been no functional regression. It may turn out that this version of the vision is, in part, easier to achieve than FiFiVerify, because the verification constraint is simply to demonstrate that the improved version of the system is at least as correct as the previous version, rather than requiring a demonstration of the absence of an entire class of faults.

There has been recent research on such regression verification problems, in which the verification obligation is not to prove the system fully correct, but merely to demonstrate the absence of regression with respect to a reference implementation [53]. Such regression verification approaches have also recently found application in automated bug fixing [99], and so we can be optimistic that they may find application both for FiFiVerify and for FiGiVerify.

It is, indeed, exciting to imagine a combination of techniques that allows us to identify scenarios where performance can be improved, perhaps even opportunistically during development, as an additional 'program improvement collaborator': a FiGiVerify bot that confers and interacts with human engineers through the continuous integration backbone that runs through the modern code review systems at the heart of most CI deployment.

Such a program improvement collaborator could suggest small modifications to the system that improve performance according to multiple non-functional properties [69], and automatically improve these using test-guided genetic improvement, but with the additional certainty that comes from a guarantee of the absence of any functional regressions.
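A minimal sketch of such a collaborator's inner loop appears below, assuming three placeholders that do not denote real tools: `mutate` (a genetic-improvement edit operator), `regression_ok` (a regression test suite, or a regression verifier in the spirit of [53]) and `cost` (a measured non-functional property such as latency, energy or footprint).

```python
import random
from typing import Callable, List

Program = str  # stand-in for a patch set or source-tree identifier


def genetically_improve(original: Program,
                        mutate: Callable[[Program], Program],
                        regression_ok: Callable[[Program], bool],
                        cost: Callable[[Program], float],
                        generations: int = 50,
                        population_size: int = 20) -> Program:
    """Test-guided genetic improvement with a regression guard: a variant is
    retained only if the regression check accepts it, so the returned program
    is never observed to be less correct than the original."""
    population: List[Program] = [original]
    best, best_cost = original, cost(original)
    for _ in range(generations):
        for parent in random.choices(population, k=population_size):
            variant = mutate(parent)
            if not regression_ok(variant):  # discard functional regressions
                continue
            population.append(variant)
            variant_cost = cost(variant)
            if variant_cost < best_cost:
                best, best_cost = variant, variant_cost
        # keep the population small, favouring the cheapest surviving variants
        population = sorted(population, key=cost)[:population_size]
    return best
```

A scalar `cost` is used only for brevity; the multi-objective setting discussed next would replace it with Pareto-based selection over several competing non-functional properties, and replacing `regression_ok` by a regression verifier is exactly what would lift this from test-guided improvement towards FiGiVerify.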
One might imagine that human developers would continue to add new functionality, but without needing to pay too much attention to the daunting task of balancing perhaps five different performance criteria, such as footprint size, bandwidth consumption, battery consumption, execution time, throughput and so on.

Instead of the human engineers concerning themselves with this impossible multi-criteria balancing act, a '21st-century compiler' or IDE, imbued with testing, verification and genetic improvement capabilities, would optimise the multiple competing non-functional constraints and properties, potentially producing multiple versions of the system for different use cases, platforms and domains.

This challenge was formulated in 2012 as the GISMOE (Genetic Improvement of Software for Multiple Objectives) challenge [69]. Since 2012, there has been considerable progress, and several breakthroughs, on Genetic Improvement techniques [112]. The GISMOE challenge did not include any notion of verification of the improved code, so FiGiVerify is essentially the meet point of the FiFiVerify challenge and the GISMOE challenge:

FiGiVerify = GISMOE + FiFiVerify

XIII. ACKNOWLEDGEMENTS

Thanks to the Infer and Sapienz teams at Facebook for helping the authors learn about static and dynamic analysis at scale, and to the Developer Infrastructure (DevInfra) organisation and leadership at Facebook for their steadfast support for this work. We are also grateful to Kathy Harman for proofreading an earlier draft of this paper.

REFERENCES

[1] M. Acharya and B. Robinson. Practical change impact analysis based on static program slicing for industrial software systems. In R. N. Taylor, H. Gall, and N. Medvidovic, editors, 33rd International Conference on Software Engineering (ICSE 2011), pages 746–755, Waikiki, Honolulu, HI, USA, May 2011. ACM.
[2] K. Adamopoulos, M. Harman, and R. M. Hierons. Mutation testing using genetic algorithms: A co-evolution approach. In Genetic and Evolutionary Computation Conference (GECCO 2004), LNCS 3103, pages 1338–1349, Seattle, Washington, USA, June 2004. Springer.
[3] M. Allamanis, E. T. Barr, R. Just, and C. A. Sutton. Tailored mutants fit bugs better. CoRR, abs/1611.02516, 2016.
[4] N. Alshahwan and M. Harman. Automated web application testing using search based software engineering. In 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011), pages 3–12, Lawrence, Kansas, USA, 6th-10th November 2011.
[5] S. Anand, A. Bertolino, E. Burke, T. Y. Chen, J. Clark, M. B. Cohen, W. Grieskamp, M. Harman, M. J. Harrold, J. Li, P. McMinn, and H. Zhu. An orchestrated survey of methodologies for automated software test case generation. Journal of Systems and Software, 86(8):1978–2001, August 2013.
[6] S. Anand, P. Godefroid, and N. Tillmann. Demand-driven compositional symbolic execution. In Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29 - April 6, 2008. Proceedings, pages 367–381, 2008.
[7] K. Androutsopoulos, D. Clark, H. Dan, M. Harman, and R. Hierons. An analysis of the relationship between conditional entropy and failed error propagation in software testing. In 36th International Conference on Software Engineering (ICSE 2014), pages 573–583, Hyderabad, India, June 2014.
[8] A. Arcuri and L. Briand. A practical guide for using statistical tests to assess randomized algorithms in software engineering. In 33rd International Conference on Software Engineering (ICSE '11), pages 1–10, New York, NY, USA, 2011. ACM.
[9] A. Arcuri, D. R. White, J. A. Clark, and X. Yao. Multi-objective improvement of software using co-evolution and smart seeding. In X. Li, M. Kirley, M. Zhang, D. G. Green, V. Ciesielski, H. A. Abbass, Z. Michalewicz, T. Hendtlass, K. Deb, K. C. Tan, J. Branke, and Y. Shi, editors, 7th International Conference on Simulated Evolution and Learning (SEAL 2008), volume 5361 of Lecture Notes in Computer Science, pages 61–70, Melbourne, Australia, December 2008. Springer.
[10] D. C. Atkinson and W. G. Griswold. Effective whole-program analysis in the presence of pointers. In Foundations of Software Engineering (FSE '98), pages 46–55, 1998.
[11] T. Ball, E. Bounimova, B. Cook, V. Levin, J. Lichtenberg, C. McGarvey, B. Ondrusek, S. K. Rajamani, and A. Ustuner. Thorough static analysis of device drivers. In Proceedings of the 2006 EuroSys Conference, Leuven, Belgium, April 18-21, 2006, pages 73–85, 2006.
[12] E. T. Barr, Y. Brun, P. Devanbu, M. Harman, and F. Sarro. The plastic surgery hypothesis. In 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014), pages 306–317, Hong Kong, China, November 2014.

[13] E. T. Barr, M. Harman, Y. Jia, A. Marginean, and J. Petke. Automated software transplantation. In Proceedings of the 2015 International Symposium on Software Testing and Analysis, ISSTA 2015, Baltimore, MD, USA, July 12-17, 2015, pages 257–269, 2015.
[14] E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo. The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering, 41(5):507–525, May 2015.
[15] B. Beizer. Software Testing Techniques. Van Nostrand Reinhold, 1990.
[16] J. Berdine, C. Calcagno, B. Cook, D. Distefano, P. O'Hearn, T. Wies, and H. Yang. Shape analysis of composite data structures. In CAV: Computer Aided Verification, 19th International Conference, pages 178–192, 2007.
[17] D. Binkley, N. Gold, M. Harman, S. Islam, J. Krinke, and S. Yoo. ORBS: Language-independent program slicing. In 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014), pages 109–120, Hong Kong, China, November 2014.
[18] D. Binkley and M. Harman. An empirical study of predicate dependence levels and trends. In 25th IEEE International Conference on Software Engineering (ICSE 2003), pages 330–339, Los Alamitos, California, USA, May 2003. IEEE Computer Society Press.
[19] D. Binkley and M. Harman. A large-scale empirical study of forward and backward static slice size and context sensitivity. In IEEE International Conference on Software Maintenance, pages 44–53, Los Alamitos, California, USA, Sept. 2003. IEEE Computer Society Press.
[20] D. Binkley and M. Harman. Locating dependence clusters and dependence pollution. In 21st IEEE International Conference on Software Maintenance, pages 177–186, Los Alamitos, California, USA, 2005. IEEE Computer Society Press.
[21] D. Binkley, M. Harman, L. R. Raszewski, and C. Smith. An empirical study of amorphous slicing as a program comprehension support tool. In 8th IEEE International Workshop on Program Comprehension, pages 161–170, Los Alamitos, California, USA, June 2000. IEEE Computer Society Press.
[22] S. Blackshear and P. O'Hearn. Finding inter-procedural bugs at scale with Infer static analyzer. code.facebook.com blog post, 6 Sept 2017.
[23] S. Blackshear and P. O'Hearn. Open-sourcing RacerD: Fast static race detection at scale. code.facebook.com blog post, 19 Oct 2017.
[24] M. Büchler, J. Oudinet, and A. Pretschner. Security mutants for property-based testing. In M. Gogolla and B. Wolff, editors, 5th International Conference on Tests and Proofs (TAP 2011), Zurich, Switzerland, June 30 - July 1, volume 6706 of Lecture Notes in Computer Science, pages 69–77. Springer, 2011.
[25] C. Cadar. Targeted program transformations for symbolic execution. In 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE), pages 906–909, 2015.
[26] C. Cadar, P. Godefroid, S. Khurshid, C. S. Păsăreanu, K. Sen, N. Tillmann, and W. Visser. Symbolic execution for software testing in practice: preliminary assessment. In International Conference on Software Engineering (ICSE 2011), 2011.
[27] C. Cadar and K. Sen. Symbolic execution for software testing: Three decades later. Communications of the ACM, 56(2):82–90, Feb. 2013.
[28] C. Calcagno, D. Distefano, J. Dubreil, D. Gabi, P. Hooimeijer, M. Luca, P. W. O'Hearn, I. Papakonstantinou, J. Purbrick, and D. Rodriguez. Moving fast with software verification. In NASA Formal Methods - 7th International Symposium, pages 3–11, 2015.
[29] C. Calcagno, D. Distefano, and P. O'Hearn. Open-sourcing Facebook Infer: Identify bugs before you ship. code.facebook.com blog post, 11 June 2015.
[30] C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. In Z. Shao and B. C. Pierce, editors, 36th Symposium on Principles of Programming Languages (POPL 2009), pages 289–300, Savannah, GA, USA, 2009. ACM.
[31] C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. J. ACM, 58(6):26, 2011. Preliminary version appeared in POPL'09.
[32] C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. Journal of the ACM, 58(6):26:1–26:66, 2011.
[33] J. Campos, A. Arcuri, G. Fraser, and R. Abreu. Continuous test generation: Enhancing continuous integration with automated test generation. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering (ASE 2014), pages 55–66, 2014.
[34] T. T. Chekam, M. Papadakis, Y. L. Traon, and M. Harman. An empirical study on mutation, statement and branch coverage fault revelation that avoids the unreliable clean program assumption. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, pages 597–608, 2017.
[35] A. Chou. Static analysis in industry. POPL'14 invited talk. http://popl.mpi-sws.org/2014/andy.pdf.
[36] D. Clark and R. M. Hierons. Squeeziness: An information theoretic measure for avoiding fault masking. Information Processing Letters, 112(8-9):335–340, 2012.
[37] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In Computer Aided Verification, 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19, 2000, Proceedings, pages 154–169, 2000.
[38] J. R. Cordy. Comprehending reality - practical barriers to industrial adoption of software maintenance automation. In International Workshop on Program Comprehension (IWPC'03), pages 196–206, 2003.
[39] P. Cousot and R. Cousot. Abstract interpretation frameworks. Journal of Logic and Computation, 2(4):511–547, Aug. 1992.
[40] P. Cousot and R. Cousot. Compositional separate modular static analysis of programs by abstract interpretation. In Proceedings of SSGRR, 2001.
[41] P. Cousot, R. Cousot, M. Fähndrich, and F. Logozzo. Automatic inference of necessary preconditions. In Verification, Model Checking, and Abstract Interpretation, 14th International Conference, VMCAI 2013, Rome, Italy, January 20-22, 2013. Proceedings, pages 128–148, 2013.
[42] J. Darlington and R. M. Burstall. A system which automatically improves programs. Acta Informatica, 6:41–60, 1976.
[43] M. E. Delamaro, J. C. Maldonado, and A. P. Mathur. Interface mutation: An approach for integration testing. IEEE Transactions on Software Engineering, 27(3):228–247, 2001.
[44] E. Engström, P. Runeson, and M. Skoglund. A systematic review on regression test selection techniques. Information & Software Technology, 52(1):14–30, 2010.
[45] D. Erb, X. Gao, M. Harman, Y. Jia, K. Mao, A. Mols, D. Stewart, and T. Tei. Deploying search based software engineering with Sapienz at Facebook (keynote paper). In 10th International Symposium on Search Based Software Engineering (SSBSE 2018), page 10, Montpellier, France, September 8th-10th 2018. To appear.
[46] D. G. Feitelson, E. Frachtenberg, and K. L. Beck. Development and deployment at Facebook. IEEE Internet Computing, 17(4):8–17, 2013.
[47] P. G. Frankl, S. N. Weiss, and C. Hu. All-uses vs mutation testing: An experimental comparison of effectiveness. Journal of Systems and Software, 38:235–253, 1997.
[48] G. Fraser and A. Arcuri. EvoSuite: automatic test suite generation for object-oriented software. In 8th European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE '11), pages 416–419. ACM, September 5th-9th 2011.
[49] G. Fraser and A. Arcuri. The seed is strong: Seeding strategies in search-based software testing. In G. Antoniol, A. Bertolino, and Y. Labiche, editors, 5th International Conference on Software Testing, Verification and Validation (ICST 2012), pages 121–130, Montreal, QC, Canada, April 2012. IEEE.
[50] Z. Gao, Y. Liang, M. B. Cohen, A. M. Memon, and Z. Wang. Making system user interactive tests repeatable: When and what should we control? In A. Bertolino, G. Canfora, and S. G. Elbaum, editors, 37th International Conference on Software Engineering (ICSE 2015), pages 55–65, Florence, Italy, May 16-24 2015. IEEE Computer Society.
[51] S. L. Gerhart. Correctness-preserving program transformations. In 2nd ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '75), pages 54–66, 1975.
[52] P. Godefroid, A. V. Nori, S. K. Rajamani, and S. Tetali. Compositional may-must program analysis: unleashing the power of alternation. In Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010, pages 43–56, 2010.
[53] B. Godlin and O. Strichman. Regression verification: proving the equivalence of similar programs. Software Testing, Verification and Reliability, 23(3):241–258, 2013.
[54] T. Graves, M. J. Harrold, J.-M. Kim, A. Porter, and G. Rothermel. An empirical study of regression test selection techniques. In 20th International Conference on Software Engineering (ICSE '98), pages 188–197. IEEE Computer Society Press, Apr. 1998.
[55] F. Gross, G. Fraser, and A. Zeller. Search-based system testing: high coverage, no false alarms. In International Symposium on Software Testing and Analysis (ISSTA 2012), pages 67–77, 2012.

[56] S. Gulwani, W. R. Harris, and R. Singh. Spreadsheet data manipulation using examples. Communications of the ACM, 55(8):97–105, Aug. 2012.
[57] Á. Hajnal and I. Forgács. A demand-driven approach to slicing legacy COBOL systems. Journal of Software Maintenance and Evolution: Research and Practice, 2011. Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/smr.533.
[58] M. Harman. Sword fight at midnight, a computer game, Sunshine Publications Ltd., 1983.
[59] M. Harman. Sifting through the wreckage. EXE, page 5, Mar. 1999. Editorial.
[60] M. Harman. Making the case for MORTO: Multi objective regression test optimization (invited position paper). In 1st International Workshop on Regression Testing (Regression 2011), Berlin, Germany, 2011.
[61] M. Harman. We need a formal semantics for testability transformation. In 16th International Conference on Software Engineering and Formal Methods (SEFM 2018), Toulouse, France, 2018.
[62] M. Harman, S. Danicic, and Y. Sivagurunathan. Program comprehension assisted by slicing and transformation. In M. Munro, editor, 1st UK Workshop on Program Comprehension, Durham University, UK, July 1995.
[63] M. Harman, C. Fox, R. M. Hierons, L. Hu, S. Danicic, and J. Wegener. Vada: A transformation-based system for variable dependence analysis. In IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2002), pages 55–64, Los Alamitos, California, USA, Oct. 2002. IEEE Computer Society Press.
[64] M. Harman, Y. Hassoun, K. Lakhotia, P. McMinn, and J. Wegener. The impact of input domain reduction on search-based test data generation. In ACM Symposium on the Foundations of Software Engineering (FSE '07), pages 155–164, Dubrovnik, Croatia, September 2007. Association for Computing Machinery.
[65] M. Harman, L. Hu, R. M. Hierons, C. Fox, S. Danicic, A. Baresel, H. Sthamer, and J. Wegener. Evolutionary testing supported by slicing and transformation. In IEEE International Conference on Software Maintenance, page 285, Los Alamitos, California, USA, Oct. 2002. IEEE Computer Society Press.
[66] M. Harman, L. Hu, R. M. Hierons, J. Wegener, H. Sthamer, A. Baresel, and M. Roper. Testability transformation. IEEE Transactions on Software Engineering, 30(1):3–16, Jan. 2004.
[67] M. Harman, Y. Jia, and W. B. Langdon. A manifesto for higher order mutation testing. In 5th International Workshop on Mutation Analysis (Mutation 2010), Paris, France, April 2010.
[68] M. Harman, Y. Jia, and Y. Zhang. Achievements, open problems and challenges for search based software testing (keynote paper). In 8th IEEE International Conference on Software Testing, Verification and Validation (ICST 2015), Graz, Austria, April 2015.
[69] M. Harman, W. B. Langdon, Y. Jia, D. R. White, A. Arcuri, and J. A. Clark. The GISMOE challenge: Constructing the Pareto program surface using genetic programming to find better programs (keynote paper). In 27th IEEE/ACM International Conference on Automated Software Engineering (ASE 2012), pages 1–14, Essen, Germany, September 2012.
[70] M. Harman, A. Mansouri, and Y. Zhang. Search based software engineering: Trends, techniques and applications. ACM Computing Surveys, 45(1):11:1–11:61, November 2012.
[71] M. Harman, P. McMinn, J. Souza, and S. Yoo. Search based software engineering: Techniques, taxonomy, tutorial. In B. Meyer and M. Nordio, editors, Empirical Software Engineering and Verification: LASER 2009-2010, pages 1–59. Springer, 2012. LNCS 7007.
[72] K. Hazelwood, S. Bird, D. Brooks, S. Chintala, U. Diril, D. Dzhulgakov, M. Fawzy, B. Jia, Y. Jia, A. Kalro, J. Law, K. Lee, J. Lu, P. Noordhuis, M. Smelyanskiy, L. Xiong, and X. Wang. Applied machine learning at Facebook: A datacenter infrastructure perspective. In 24th International Symposium on High-Performance Computer Architecture (HPCA 2018), February 24-28, Vienna, Austria, 2018.
[73] C. A. R. Hoare. An Axiomatic Basis for Computer Programming. Communications of the ACM, 12:576–580, 1969.
[74] H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard. Dynamic knobs for responsive power-aware computing. In Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011), pages 199–212, Mar. 2011.
[75] S. Horwitz, T. Reps, and D. Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12(1):26–61, 1990.
[76] S. T. Iqbal and E. Horvitz. Disruption and recovery of computing tasks: field study, analysis, and directions. In M. B. Rosson and D. J. Gilmore, editors, Human Factors in Computing Systems (CHI 2007), pages 677–686, San Jose, California, USA, 2007. ACM.
[77] B. Jeannet and A. Miné. Apron: A library of numerical abstract domains for static analysis. In Computer Aided Verification, 21st International Conference, CAV 2009, Grenoble, France, June 26 - July 2, 2009. Proceedings, pages 661–667, 2009.
[78] Y. Jia and M. Harman. An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering, 37(5):649–678, September-October 2011.
[79] J. A. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization. In 24th International Conference on Software Engineering (ICSE '02), pages 467–477, New York, NY, USA, 2002. ACM.
[80] S. Kusumoto, A. Nishimatsu, K. Nishie, and K. Inoue. Experimental evaluation of program slicing for fault localization. Empirical Software Engineering, 7:49–76, 2002.
[81] S. K. Lahiri, K. Vaswani, and C. A. R. Hoare. Differential static analysis: opportunities, applications, and challenges. In Proceedings of the Workshop on Future of Software Engineering Research, FoSER 2010, at the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2010, Santa Fe, NM, USA, November 7-11, 2010, pages 201–204, 2010.
[82] W. B. Langdon and M. Harman. Optimising existing software with genetic programming. IEEE Transactions on Evolutionary Computation (TEVC), 19(1):118–135, Feb 2015.
[83] J. R. Larus, T. Ball, M. Das, R. DeLine, M. Fähndrich, J. D. Pincus, S. K. Rajamani, and R. Venkatapathy. Righting software. IEEE Software, 21(3):92–100, 2004.
[84] C. Le Goues, S. Forrest, and W. Weimer. Current challenges in automatic software repair. Software Quality Journal, 21(3):421–443, 2013.
[85] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer. GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 38(1):54–72, 2012.
[86] F. Logozzo and T. Ball. Modular and verified automatic program repair. In 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 133–146, 2012.
[87] Q. Luo, F. Hariri, L. Eloussi, and D. Marinov. An empirical analysis of flaky tests. In 22nd International Symposium on Foundations of Software Engineering (FSE 2014), 2014.
[88] Z. Manna and R. J. Waldinger. Knowledge and reasoning in program synthesis. Artificial Intelligence, 6(2):175–208, 1975.
[89] K. Mao. Multi-objective Search-based Mobile Testing. PhD thesis, University College London, Department of Computer Science, CREST centre, 2017.
[90] K. Mao, L. Capra, M. Harman, and Y. Jia. A survey of the use of crowdsourcing in software engineering. Journal of Systems and Software, 126:57–84, 2017.
[91] K. Mao, M. Harman, and Y. Jia. Sapienz: Multi-objective automated testing for Android applications. In International Symposium on Software Testing and Analysis (ISSTA 2016), pages 94–105, 2016.
[92] K. Mao, M. Harman, and Y. Jia. Crowd intelligence enhances automated mobile testing. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pages 16–26, 2017.
[93] K. Mao, Y. Yang, M. Li, and M. Harman. Pricing crowdsourcing-based software development tasks. In 35th ACM/IEEE International Conference on Software Engineering (ICSE 2013, NIER track), San Francisco, USA, 2013.
[94] W. Martin, F. Sarro, Y. Jia, Y. Zhang, and M. Harman. A survey of app store analysis for software engineering. IEEE Transactions on Software Engineering, 43(9), 2017.
[95] K. L. McMillan. Lazy abstraction with interpolants. In Computer Aided Verification, 18th International Conference, CAV 2006, Seattle, WA, USA, August 17-20, 2006, Proceedings, pages 123–136, 2006.
[96] P. McMinn. Search-based software test data generation: A survey. Software Testing, Verification and Reliability, 14(2):105–156, June 2004.
[97] P. McMinn. Search-based failure discovery using testability transformations to generate pseudo-oracles. In F. Rothlauf, editor, Genetic and Evolutionary Computation Conference (GECCO 2009), pages 1689–1696, Montréal, Québec, Canada, 2009. ACM.
[98] P. McMinn, M. Harman, Y. Hassoun, K. Lakhotia, and J. Wegener. Input domain reduction through irrelevant variable removal and its effect on local, global and hybrid search-based structural test data generation. IEEE Transactions on Software Engineering, 38(2):453–477, March-April 2012.

[99] S. Mechtaev, M. D. Nguyen, Y. Noller, L. Grunske, and A. Roychoudhury. Semantic program repair using a reference implementation. In 40th ACM/IEEE International Conference on Software Engineering (ICSE 2018), Gothenburg, Sweden, May 2018.
[100] A. M. Memon and M. B. Cohen. Automated testing of GUI applications: models, tools, and controlling flakiness. In D. Notkin, B. H. C. Cheng, and K. Pohl, editors, 35th International Conference on Software Engineering (ICSE 2013), pages 1479–1480, San Francisco, CA, USA, May 18-26 2013. IEEE Computer Society.
[101] A. M. Memon, Z. Gao, B. N. Nguyen, S. Dhanda, E. Nickell, R. Siemborski, and J. Micco. Taming Google-scale continuous testing. In 39th International Conference on Software Engineering, Software Engineering in Practice Track (ICSE-SEIP), pages 233–242, Buenos Aires, Argentina, May 20-28 2017. IEEE.
[102] R. Milne and C. Strachey. A Theory of Programming Language Semantics. Chapman and Hall, 1976. Two volumes.
[103] G. J. Myers. The Art of Software Testing. Wiley-Interscience, New York, 1979.
[104] N. Mansour, R. Bahsoon, and G. Baradhi. Empirical comparison of regression test selection algorithms. Journal of Systems and Software, 57(1):79–90, 2001.
[105] G. Neumann, M. Harman, and S. M. Poulding. Transformed Vargha-Delaney effect size. In 7th International Symposium on Search-Based Software Engineering (SSBSE), pages 318–324, Bergamo, Italy, September 2015.
[106] H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra. SemFix: program repair via semantic analysis. In B. H. C. Cheng and K. Pohl, editors, 35th International Conference on Software Engineering (ICSE 2013), pages 772–781, San Francisco, USA, May 18-26 2013. IEEE.
[107] P. O'Hearn. Continuous reasoning: Scaling up the impact of formal methods research (invited paper). In 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, page 13, Oxford, July 2018.
[108] P. O'Hearn. Separation logic. Communications of the ACM, 2018. To appear.
[109] P. O'Hearn, J. Reynolds, and H. Yang. Local reasoning about programs that alter data structures. In CSL'01, 2001.
[110] M. Paixao, J. Krinke, D. Han, and M. Harman. CROP: Linking code reviews to source code changes. In International Conference on Mining Software Repositories (MSR 2018), 2018.
[111] F. Palomba and A. Zaidman. Does refactoring of test smells induce fixing flaky tests? In International Conference on Software Maintenance and Evolution (ICSME 2017), pages 1–12. IEEE Computer Society, 2017.
[112] J. Petke, S. O. Haraldsson, M. Harman, W. B. Langdon, D. R. White, and J. R. Woodward. Genetic improvement of software: a comprehensive survey. IEEE Transactions on Evolutionary Computation, 2018. To appear.
[113] J. Petke, M. Harman, W. B. Langdon, and W. Weimer. Using genetic improvement & code transplants to specialise a C++ program to a problem class. In 17th European Conference on Genetic Programming (EuroGP), pages 132–143, Granada, Spain, April 2014.
[114] S. T. Redwine, Jr. and W. E. Riddle. Software technology maturation. In 8th International Conference on Software Engineering (ICSE '85), pages 189–200, London, UK, 28-30 Aug. 1985. IEEE.
[115] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Conference Record of POPL '95: 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 49–61. ACM SIGACT and SIGPLAN, ACM Press, 1995.
[116] E. Ries. The Lean Startup: How Constant Innovation Creates Radically Successful Businesses. Penguin, 2011.
[117] C. Sadowski, J. van Gogh, C. Jaspan, E. Söderberg, and C. Winter. Tricorder: Building a program analysis ecosystem. In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1, pages 598–608, 2015.
[118] R. A. Santelices, P. K. Chittimalli, T. Apiwattanapong, A. Orso, and M. J. Harrold. Test-suite augmentation for evolving software. In 23rd Automated Software Engineering (ASE '08), pages 218–227, L'Aquila, Italy, 2008. IEEE.
[119] M. N. Seghir and D. Kroening. Counterexample-guided precondition inference. In Programming Languages and Systems - 22nd European Symposium on Programming, ESOP 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, Rome, Italy, March 16-24, 2013. Proceedings, pages 451–471, 2013.
[120] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, July and October 1948.
[121] K. Sherif and A. S. Vinze. Barriers to adoption of software reuse: A qualitative study. Information & Management, 41(2):159–175, 2003.
[122] S. Sidiroglou-Douskos, E. Lahtinen, F. Long, and M. Rinard. Automatic error elimination by horizontal code transfer across multiple applications. In 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015), pages 43–54, 2015.
[123] M. Staats, G. Gay, and M. P. E. Heimdahl. Automated oracle creation support, or: How I learned to stop worrying about fault propagation and love mutation testing. In 34th International Conference on Software Engineering (ICSE 2012), pages 870–880, 2012.
[124] L. L. Strauss. National Association of Science Writers founders' day dinner, 1954.
[125] A. Szegedi, T. Gergely, Á. Beszédes, T. Gyimóthy, and G. Tóth. Verifying the concept of union slices on Java programs. In 11th European Conference on Software Maintenance and Reengineering (CSMR '07), pages 233–242, 2007.
[126] J. Temperton. Code 'transplant' could revolutionise programming. Wired.co.uk, 30 July 2015. Online.
[127] A. M. Turing. Checking a large routine. In Report of a Conference on High Speed Automatic Calculating Machines, pages 67–69, Cambridge, England, June 1949. University Mathematical Laboratory.
[128] J. M. Voas. COTS software: The economical choice? IEEE Software, 15(2):16–19, 1998.
[129] J. M. Voas and K. W. Miller. Software testability: The new verification. IEEE Software, 12(3):17–28, May 1995.
[130] M. Weiser. Program slices: Formal, psychological, and practical investigations of an automatic program abstraction method. PhD thesis, University of Michigan, Ann Arbor, MI, 1979.
[131] F. Wu, M. Harman, Y. Jia, J. Krinke, and W. Weimer. Deep parameter optimisation. In Genetic and Evolutionary Computation Conference (GECCO 2015), pages 1375–1382, Madrid, Spain, July 2015.
[132] F. Wu, J. Nanavati, M. Harman, Y. Jia, and J. Krinke. Memory mutation testing. Information & Software Technology, 81:97–111, 2017.
[133] H. Yang, O. Lee, J. Berdine, C. Calcagno, B. Cook, D. Distefano, and P. W. O'Hearn. Scalable shape analysis for systems code. In 20th CAV, pages 385–398, 2008.
[134] L. Yang, Z. Dang, T. R. Fischer, M. S. Kim, and L. Tan. Entropy and software systems: towards an information-theoretic foundation of software testing. In 2010 FSE/SDP Workshop on the Future of Software Engineering Research, pages 427–432, Nov. 2010.
[135] Y. Yang, M. Harman, J. Krinke, S. S. Islam, D. Binkley, Y. Zhou, and B. Xu. An empirical study on dependence clusters for effort-aware fault-proneness prediction. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016, pages 296–307, 2016.
[136] S. Yoo and M. Harman. Pareto efficient multi-objective test case selection. In International Symposium on Software Testing and Analysis (ISSTA '07), pages 140–150, London, United Kingdom, July 2007. Association for Computing Machinery.
[137] S. Yoo and M. Harman. Regression testing minimisation, selection and prioritisation: A survey. Journal of Software Testing, Verification and Reliability, 22(2):67–120, 2012.
[138] S. Yoo and M. Harman. Test data regeneration: Generating new test data from existing test data. Journal of Software Testing, Verification and Reliability, 22(3):171–201, May 2012.
[139] S. Yoo, M. Harman, and D. Clark. Fault localization prioritization: Comparing information theoretic and coverage based approaches. ACM Transactions on Software Engineering and Methodology, 22(3), Article 19, July 2013.
[140] S. Yoo, X. Xie, F. Kuo, T. Y. Chen, and M. Harman. Human competitiveness of genetic programming in spectrum-based fault localisation: Theoretical and empirical analysis. ACM Transactions on Software Engineering and Methodology, 26(1):4:1–4:30, 2017.
[141] S. Zhang, D. Jalali, J. Wuttke, K. Muslu, W. Lam, M. D. Ernst, and D. Notkin. Empirically revisiting the test independence assumption. In C. S. Pasareanu and D. Marinov, editors, International Symposium on Software Testing and Analysis (ISSTA 2014), pages 385–396, San Jose, CA, USA, July 21-26 2014. ACM.