Scaling Static Analyses at Facebook

contributed articles DOI:10.1145/3338112

Key lessons for designing static analysis tools deployed to find bugs in hundreds of millions of lines of code.

BY DINO DISTEFANO, MANUEL FÄHNDRICH, FRANCESCO LOGOZZO, AND PETER W. O’HEARN

COMMUNICATIONS OF THE ACM | AUGUST 2019 | VOL. 62 | NO. 8

Image by Andrij Borys Associates, using Shutterstock

STATIC ANALYSIS TOOLS are programs that examine, and attempt to draw conclusions about, the source of other programs without running them. At Facebook, we have been investing in advanced static analysis tools that employ reasoning techniques similar to those from program verification. The tools we describe in this article (Infer and Zoncolan) target issues related to crashes and to the security of our services, they perform sometimes complex reasoning spanning many procedures or files, and they are integrated into engineering workflows in a way that attempts to bring value while minimizing friction.

These tools run on code modifications, participating as bots during the code review process. Infer targets our mobile apps as well as our backend C++ code, codebases with tens of millions of lines; it has seen over 100 thousand reported issues fixed by developers before code reaches production. Zoncolan targets the 100-million lines of Hack code, and is additionally integrated in the workflow used by security engineers. It has led to thousands of fixes of security and privacy bugs, outperforming any other detection method used at Facebook for such vulnerabilities. We will describe the human and technical challenges encountered and lessons we have learned in developing and deploying these analyses.

There has been a tremendous amount of work on static analysis, both in industry and academia, and we will not attempt to survey that material here. Rather, we present our rationale for, and results from, using techniques similar to ones that might be encountered at the edge of the research literature, not only simple techniques that are much easier to make scale. Our goal is to complement other reports on industrial static analysis and formal methods,1,6,13,17 and we hope that such perspectives can provide input both to future research and to further industrial use of static analysis.

Next, we discuss the three dimensions that drive our work: bugs that matter, people, and actioned/missed bugs. The remainder of the article describes our experience developing and deploying the analyses, their impact, and the techniques that underpin our tools.

Key insights

- Advanced static analysis techniques performing deep reasoning about source code can scale to large industrial codebases, for example, with 100-million LOC.
- Static analyses should strike a balance between missed bugs (false negatives) and un-actioned reports (false positives).
- A “diff time” deployment, where issues are given to developers promptly as part of code review, is important to catching bugs early and getting high fix rates.

Context for Static Analysis at Facebook

Bugs that Matter. We use static analysis to prevent bugs that would affect our products, and we rely on our engineers’ judgment as well as data from production to tell us the bugs that matter the most.

It is important for a static analysis developer to realize that not all bugs are the same: different bugs can have different levels of importance or severity depending on their context and nature. A memory leak on a seldom-used service might not be as important as a vulnerability that would allow attackers to gain access to unauthorized information. Additionally, the frequency of a bug type can affect the decision of how important it is to go after. If a certain kind of crash, such as a null pointer error in Java, were happening hourly, then it might be more important to target than a bug of similar severity that occurs only once a year.

We have several means to collect data on the bugs that matter. First of all, Facebook maintains statistics on crashes and other errors that happen in production. Second, we have a “bug bounty” program, where people outside the company can report vulnerabilities on Facebook, or on apps of the Facebook family; for example, Messenger, Instagram, or WhatsApp. Third, we have an internal initiative for tracking the most severe bugs (SEV) that occur.

Our understanding of Bugs that Matter at Facebook drives our focus on advanced analyses. For contrast, a recent paper states: “All of the static analyses deployed widely at Google are relatively simple, although some teams work on project-specific analysis frameworks for limited domains (such as Android apps) that do interprocedural analysis”17 and they give their entirely logical reasons. Here, we explain why Facebook made the decision to deploy interprocedural analysis (spanning multiple procedures) widely.

People and deployments. While not all bugs are the same, neither are all users; therefore, we use different deployment models depending on the intended audience (that is, the people the analysis tool will be deployed to).

For classes of bugs intended for all or a wide variety of engineers on a given platform, we have gravitated toward a “diff time” deployment, where analyzers participate as bots in code review, making automatic comments when an engineer submits a code modification. Later, we recount a striking situation where the diff time deployment saw a 70% fix rate, where a more traditional “offline” or “batch” deployment (where bug lists are presented to engineers, outside their workflow) saw a 0% fix rate.

In case the intended audience is the much smaller collection of domain security experts in the company, we use two additional deployment models. At “diff time,” security-related issues are pushed to the security engineer on-call, so she can comment on an in-progress code change when necessary. Additionally, for finding all instances of a given bug in the codebase or for historical exploration, offline inspection provides a user interface for querying, filtering, and triaging all alarms.

In all cases, our deployments focus on the people our tools serve and the way they work.

Actioned reports and missed bugs. The goal of an industrial static analysis tool is to help people: at Facebook, this means the engineers, directly, and the people who use our products, indirectly. We have seen how the deployment model can influence whether a tool is successful. Two concepts we use to understand this in more detail, and to help us improve our tools, are actioned reports and observable missed bugs.

The kind of action taken as a result of a reported bug depends on the deployment model as well as the type of bug. At diff time an action is an update to the diff that removes a static analysis report. In Zoncolan’s offline deployment a report can trigger the security expert to create a task for the product engineer if the issue is important enough to follow up with the product team. Zoncolan catches more SEVs than either manual security re- [...]

[...] crashes and app not-responding events that occur on mobile devices.

The actioned reports and missed bugs are related to the classic concepts of true positives and false negatives from the academic static analysis literature. A true positive is a report of a potential bug that can happen in a run of the program in question (whether or not it will happen in practice); a false positive is one that cannot happen. Common wisdom in static analysis is that it is important to keep control of the false positives because they can negatively impact engineers who use the tools, as they tend to lead to apathy toward reported alarms. This has been emphasized, for instance, in previous Communications’ articles on industrial static analysis.1,17 False negatives, on the other hand, are potentially harmful bugs that may remain undetected for a long time. An undetected bug affecting security or privacy can lead to undetected exploits. In practice, fewer false positives often (though not always) implies more false negatives, and vice versa, fewer false negatives implies more false positives. For instance, one way to rein in false positives is to fail to report when you are less than sure a bug will be real; but silencing an analy- [...]

[...] though less recognized, the false positive rate is challenging to measure for a large, rapidly changing codebase: it would be extremely time consuming for humans to judge all reports as false or true as the code is changing.

Although true positives and false negatives are valuable concepts, we don’t make claims about their rates and pay more attention to the action rate and the (observed) missed bugs.

Challenges: Speed, scale, and accuracy. A first challenge is presented by the sheer scale of Facebook’s codebases, and the rate of change they see. For the server-side, we have over 100-million lines of Hack code, which Zoncolan can process in less than 30 minutes. Additionally, we have tens of millions of lines of both mobile (Android and Objective-C) code and backend C++ code. Infer processes the code modifications quickly (within 15 minutes on average) in its diff time deployment. All codebases see thousands of code modifications each day and our tools run on each code change. For Zoncolan, this can amount to analyzing one trillion lines of code (LOC) per day.

It is relatively straightforward to scale program analyses that do simple checks on a procedure-local basis only.
