contributed articles

DOI:10.1145/3338112

Key lessons for designing static analysis tools deployed to find bugs in hundreds of millions of lines of code.

BY DINO DISTEFANO, MANUEL FÄHNDRICH, FRANCESCO LOGOZZO, AND PETER W. O'HEARN

Scaling Static Analyses at Facebook

STATIC ANALYSIS TOOLS are programs that examine, and attempt to draw conclusions about, the source of other programs without running them. At Facebook, we have been investing in advanced static analysis tools that employ reasoning techniques similar to those from program verification. The tools we describe in this article (Infer and Zoncolan) target issues related to crashes and to the security of our services, they perform sometimes complex reasoning spanning many procedures or files, and they are integrated into engineering workflows in a way that attempts to bring value while minimizing friction.

These tools run on code modifications, participating as bots during the code review process. Infer targets our mobile apps as well as our backend C++ code, codebases with 10s of millions of lines; it has seen over 100 thousand reported issues fixed by developers before code reaches production. Zoncolan targets the 100-million lines of Hack code, and is additionally integrated in the workflow used by security engineers. It has led to thousands of fixes of security and privacy bugs, outperforming any other detection method used at Facebook for such vulnerabilities. We will describe the human and technical challenges encountered and lessons we have learned in developing and deploying these analyses.

There has been a tremendous amount of work on static analysis, both in industry and academia, and we will not attempt to survey that material here. Rather, we present our rationale for, and results from, using techniques similar to ones that might be encountered at the edge of the research literature, not only simple techniques that are much easier to make scale. Our goal is to complement other reports on industrial static analysis and formal methods,1,6,13,17 and we hope that such perspectives can provide input both to future research and to further industrial use of static analysis.

Next, we discuss the three dimensions that drive our work: bugs that matter, people, and actioned/missed bugs. The remainder of the article describes our experience developing and deploying the analyses, their impact, and the techniques that underpin our tools.

key insights
˽ Advanced static analysis techniques performing deep reasoning about source code can scale to large industrial codebases, for example, with 100-million LOC.
˽ Static analyses should strike a balance between missed bugs (false negatives) and un-actioned reports (false positives).
˽ A "diff time" deployment, where issues are given to developers promptly as part of code review, is important to catching bugs early and getting high fix rates.

Context for Static Analysis at Facebook

Bugs that Matter. We use static analysis to prevent bugs that would affect our products, and we rely on our engineers' judgment as well as data from production to tell us the bugs that matter the most.

It is important for a static analysis developer to realize that not all bugs are the same: different bugs can have different levels of importance or severity depending on the context and the nature. A memory leak on a seldom-used service might not be as important as a vulnerability that would allow attackers to gain access to unauthorized information. Additionally, the frequency of a bug type can affect the decision of how important it is to go after. If a certain kind of crash, such as a null pointer error in Java, were happening hourly, then it might be more important to target than a bug of similar severity that occurs only once a year.

We have several means to collect data on the bugs that matter. First of all, Facebook maintains statistics on crashes and other errors that happen in production. Second, we have a "bug bounty" program, where people outside the company can report vulnerabilities on Facebook, or on apps of the Facebook family; for example, Messenger, Instagram, or WhatsApp. Third, we have an internal initiative for tracking the most severe bugs (SEVs) that occur.

Our understanding of Bugs that Matter at Facebook drives our focus on advanced analyses. For contrast, a recent paper states: "All of the static analyses deployed widely at Google are relatively simple, although some teams work on project-specific analysis frameworks for limited domains (such as Android apps) that do interprocedural analysis,"17 and they give their entirely logical reasons. Here, we explain why Facebook made the decision to deploy interprocedural analysis (spanning multiple procedures) widely.

People and deployments. While not all bugs are the same, neither are all users; therefore, we use different deployment models depending on the intended audience (that is, the people the analysis tool will be deployed to).

For classes of bugs intended for all or a wide variety of engineers on a given platform, we have gravitated toward a "diff time" deployment, where analyzers participate as bots in code review, making automatic comments when an engineer submits a code modification. Later, we recount a striking situation where the diff time deployment saw a 70% fix rate, where a more traditional "offline" or "batch" deployment (where bug lists are presented to engineers, outside their workflow) saw a 0% fix rate.

In case the intended audience is the much smaller collection of domain security experts in the company, we use two additional deployment models. At "diff time," security related issues are pushed to the security engineer on-call, so she can comment on an in-progress code change when necessary.

Additionally, for finding all instances of a given bug in the codebase or for historical exploration, offline inspection provides a user interface for querying, filtering, and triaging all alarms.

In all cases, our deployments focus on the people our tools serve and the way they work.

Actioned reports and missed bugs. The goal of an industrial static analysis tool is to help people: at Facebook, this means the engineers, directly, and the people who use our products, indirectly. We have seen how the deployment model can influence whether a tool is successful. Two concepts we use to understand this in more detail, and to help us improve our tools, are actioned reports and observable missed bugs.

The kind of action taken as a result of a reported bug depends on the deployment model as well as the type of bug. At diff time an action is an update to the diff that removes a static analysis report. In Zoncolan's offline deployment a report can trigger the security expert to create a task for the product engineer if the issue is important enough to follow up with the product team. Zoncolan catches more SEVs than either manual security reviews or bug bounty reports. We measured that 43.3% of the severe security bugs are detected via Zoncolan. At press time, Zoncolan's "action rate" is above 80% and we observed about 11 "missed bugs."

A missed bug is one that has been observed in some way, but that was not reported by an analysis. The means of observation can depend on the kind of bug. For security vulnerabilities we have bug bounty reports, security reviews, or SEV reviews. For our mobile apps we log crashes and app not-responding events that occur on mobile devices.

The actioned reports and missed bugs are related to the classic concepts of true positives and false negatives from the academic static analysis literature. A true positive is a report of a potential bug that can happen in a run of the program in question (whether or not it will happen in practice); a false positive is one that cannot happen. Common wisdom in static analysis is that it is important to keep control of the false positives because they can negatively impact engineers who use the tools, as they tend to lead to apathy toward reported alarms. This has been emphasized, for instance, in previous Communications articles on industrial static analysis.1,17 False negatives, on the other hand, are potentially harmful bugs that may remain undetected for a long time. An undetected bug affecting security or privacy can lead to undetected exploits. In practice, fewer false positives often (though not always) implies more false negatives, and vice versa, fewer false negatives implies more false positives. For instance, one way to rein in false positives is to fail to report when you are less than sure a bug will be real; but silencing an analysis in this way (say, by ignoring paths or by heuristic filtering) has the effect of missing bugs. And, if you want to discover and report more bugs you might also add more spurious behaviors.

The reason we are interested in advanced static analyses at Facebook might be understood in classic terms as saying: false negatives matter to us. However, it is important to note the number of false negatives is notoriously difficult to quantify (how many unknown bugs are there?). Equally, though less recognized, the false positive rate is challenging to measure for a large, rapidly changing codebase: it would be extremely time consuming for humans to judge all reports as false or true as the code is changing. Although true positives and false negatives are valuable concepts, we don't make claims about their rates and pay more attention to the action rate and the (observed) missed bugs.

Challenges: Speed, scale, and accuracy. A first challenge is presented by the sheer scale of Facebook's codebases, and the rate of change they see. For the server side, we have over 100-million lines of Hack code, which Zoncolan can process in less than 30 minutes. Additionally, we have 10s of millions of lines of both mobile (Android and Objective-C) code and backend C++ code. Infer processes code modifications quickly (within 15 minutes on average) in its diff time deployment. All codebases see thousands of code modifications each day and our tools run on each code change. For Zoncolan, this can amount to analyzing one trillion lines of code (LOC) per day.

It is relatively straightforward to scale program analyses that do simple checks on a procedure-local basis only. The simplest form is linters, which give syntactic style advice (for example, "the method you called is deprecated, please consider rewriting"). Such simple checks provide value and are in wide deployment in major companies including Facebook; we will not comment on them further in this article. But for reasoning going beyond local checks, such as one would find in the academic literature on static analysis, scaling to 10s or 100s of millions of LOC is a challenge, as is the incremental scalability needed to support diff time reporting.

Infer and Zoncolan both use techniques similar to some of what one might find at the edge of the research literature. Infer, as we will discuss, uses one analysis based on the theory of Separation Logic,16 with a novel theorem prover that implements an inference technique that guesses assumptions. Another Infer analysis involves recently published research results on concurrency analysis.2,10 Zoncolan implements a new modular parallel taint analysis algorithm.

Figure 1. Continuous development. (Developer, code review, CI system, product; diff-time and post-land phases.)

But how can Infer and Zoncolan scale? The core technical features they share are compositionality and carefully crafted abstractions.

For most of this article we will concentrate on what one gets from applying Infer and Zoncolan, rather than on their technical properties, but we outline their foundations later and provide more technical details in an online appendix (https://dl.acm.org/citation.cfm?doid=3338112&picked=formats).

The challenge related to accuracy is intimately related to actioned reports and missed bugs. We try to strike a balance between these issues, informed by the desires based on the class of bugs and the intended audience. The more severe a potentially missed issue is, the lower the tolerance for missed bugs. Thus, for issues that indicate a potential crash or performance regression in a mobile app such as Messenger, WhatsApp, Instagram, or Facebook, our tolerance for missed bugs is lower than, for example, stylistic lint suggestions (for example, don't use deprecated method). For issues that could affect the security of our infrastructure or the privacy of the people using our products, our tolerance for false positives is higher still.

Software Development at Facebook

Facebook practices continuous software development,9 where a main codebase (master) is altered by thousands of programmers submitting code modifications (diffs). Master and diffs are the analogues of, respectively, the GitHub master branch and pull requests. The developers share access to a codebase and they land, or commit, a diff to the codebase after passing code review. A continuous integration system (CI system) is used to ensure code continues to build and passes certain tests. Analyses run on the code modification and participate by commenting their findings directly in the code review tool.

The Facebook website was originally written in PHP, and then ported to Hack, a gradually typed version of PHP developed at Facebook (https://hacklang.org/). The Hack codebase spans over 100 million lines. It includes the Web frontend, the internal web tools, the APIs to access the social graph from first- and third-party apps, the privacy-aware data abstractions, and the privacy control logic for viewers and apps. Mobile apps—for Facebook, Messenger, Instagram and WhatsApp—are mostly written in Objective-C and Java. C++ is the main language of choice for backend services. There are 10s of millions of lines each of mobile and backend code.

While they use the same development models, the website and mobile products are deployed differently. This affects what bugs are considered most important, and the way that bugs can be fixed. For the website, Facebook directly deploys new code to its own datacenters, and bug fixes can be shipped directly to our datacenters frequently, several times daily and immediately when necessary. For the mobile apps, Facebook relies on people to download new versions from the Android or the Apple store; new versions are shipped weekly, but mobile bugs are less under our control because even if a fix is shipped it might not be downloaded to some people's phones.

Common runtime errors—for example, null pointer exceptions, division by zero—are more difficult to get fixed on mobile than on the server. On the other hand, server-side security and privacy bugs can severely impact both the users of the Web version of Facebook as well as our mobile users, since the privacy checks are performed on the server side. As a consequence, Facebook invests in tools to make the mobile apps more reliable and server-side code more secure.

Moving Fast with Infer

Infer is a static analysis tool applied to Java, Objective-C, and C++ code at Facebook.4 It reports errors related to memory safety, to concurrency, to security (information flow), and many more specialized errors suggested by Facebook developers. Infer is run internally on the Android and iOS apps for Facebook, Instagram, Messenger, and WhatsApp, as well as on our backend C++ and Java code.

Infer has its roots in academic research on program analysis with separation logic,5 research which led to a startup company (Monoidics Ltd.) that was acquired by Facebook in 2013. Infer was open sourced in 2015 (www.fbinfer.com) and is used at Amazon, Mozilla, Spotify, and other companies.

Diff-time continuous reasoning. Infer's main deployment model is based on fast incremental analysis of code changes. When a diff is submitted to code review an instance of Infer is run in Facebook's internal CI system (Figure 1).


in Facebook’s internal CI system (Fig- assigned them to the developers we an issue is discovered in the codebase, ure 1). Infer does not need to process thought best able to resolve them. it can be nontrivial to assign it to the the entire codebase in order to analyze The response was stunning: we were right person. In the extreme, somebody a diff, and so is fast. greeted by near silence. We assigned who has left the company might have An aim has been for Infer to run in 20–30 issues to developers, and almost caused the issue. Furthermore, even 15min–20min on a diff on average, none of them were acted on. We had if you think you have found someone and this includes time to check out the worked hard to get the false positive familiar with the codebase, the issue source repository, to build the diff, and rate down to what we thought was less might not be relevant to any of their to run on base and (possibly) parent than 20%, and yet the fix rate—the pro- past or current work. But, if we com- commits. It has typically done so, but portion of reported issues that devel- ment on a diff that introduces an issue we constantly monitor performance opers resolved—was near zero. then there is a pretty good (but not per- to detect regressions that makes it Next, we switched Infer on at diff fect) chance that it is relevant. take longer, in which case we work to time. The response of engineers was just Mental context switch has been bring the running time back down. Af- as stunning: the fix rate rocketed to over the subject of psychological studies,12 ter running on a diff, Infer then writes 70%. The same program analysis, with and it is, along with the importance comments to the code review system. same false positive rate, had much great- of relevance, part of the received col- In the default mode used most often er impact when deployed at diff time. lective wisdom impressed upon us by it reports only regressions: new issues While this situation was surprising Facebook’s engineers. Note that others introduced by a diff. The “new” issues to the static analysis experts on the have also remarked on the benefits of are calculated using a bug equivalence Infer team, it came as no surprise to reporting during code review.17 notion that uses a hash involving the Facebook’s developers. Explanations At Facebook, we are working actively bug type and location-independent they offered us may be summarized in on moving other testing technologies to information about the error message, the following terms: diff time when possible. We are also sup- and which is sensitive to file moves and One problem that diff-time deploy- porting academics on researching incre- line number changes cause by refactor- ment addresses is the mental effort of mental fuzzing and symbolic execution ing, deleting, or adding code; the aim is context switch. If a developer is working techniques for diff time reporting. to avoid presenting warnings that de- on one problem, and they are confront- Interprocedural bugs. Many of the velopers might regard as pre-existing. ed with a report on a separate problem, bugs that Infer finds involve reasoning Fast reporting is important to keep in then they must swap out the mental con- that spans multiple procedures or files. tune with the developers’ workflows. text of the first problem and swap in the An example from OpenSSL illustrates: In contrast, when Infer is run in whole- second, and this can be time consum- program mode it can take more than an ing and disruptive. 
By participating as a apps/ca.c:2780: NULL _ DEREFERENCE hour (depending on the app)—too slow bot in code review, the context switch pointer ‘revtm’ last assigned on line for diff-time at Facebook. problem is largely solved: program- 2778 could be null Human factors. The significance of mers come to the review tool to dis- and is dereferenced at line 2780, col- the diff-time reasoning of Infer is best cuss their code with human reviewers, umn 6 understood by contrast with a failure. with mental context already swapped 2778. revtm = X509 _ gmtime _ adj(NULL, 0); The first deployment was batch rather in. This also illustrates how important 2779. than continuous. In this mode Infer timeliness is: if a bot were to run for an 2780. i = revtm->length + 1; would be run once per night on the hour or more on a diff it could be too entire Facebook Android codebase, late to participate effectively. The issue is that the procedure and it would generate a list of issues. A second problem that diff-time de- X 509 _ g m t i m e _ a d j() can return We manually looked at the issues, and ployment addresses is relevance. When null in some circumstances. Overall,

Figure 2. A simple example capturing a common safety pattern used in Android apps. Threading information is used to limit the amount of synchronization required. As a comment from the original code explains: "mCount is written to only by the main thread with the lock held, read from the main thread with no lock held, or read from any other thread with the lock held." Bottom: unsafe additions to RaceWithMainThread.java.
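The listing from Figure 2 is not reproduced here; the Java sketch below is our own reconstruction of the pattern the caption describes, not the original RaceWithMainThread.java. The first three methods follow the safe protocol (in practice, knowledge such as "runs only on the main thread" is conveyed to the analysis through annotations or known framework entry points); the last method plays the role of the "unsafe additions": a background-thread read without the lock races with the main thread's locked write.

// Illustrative reconstruction of the safety pattern described in the caption;
// not the actual code from Figure 2.
class CountHolder {
  private final Object lock = new Object();
  private int mCount;

  // Called only on the main thread: the write happens with the lock held.
  void incrementOnMainThread() {
    synchronized (lock) {
      mCount++;
    }
  }

  // Called only on the main thread: reading without the lock is still safe,
  // because the only writer is the main thread itself.
  int readOnMainThread() {
    return mCount;
  }

  // Called from background threads: safe, because the read holds the lock.
  int readOnBackgroundThread() {
    synchronized (lock) {
      return mCount;
    }
  }

  // "Unsafe addition": a background-thread read without the lock races with
  // the main thread's locked write, and a race detector should report it.
  int unsafeReadOnBackgroundThread() {
    return mCount;
  }
}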

Overall, the error trace found by Infer has 61 steps, and the source of null, the call to X509_gmtime_adj(), goes five procedures deep and it eventually encounters a return of null at call-depth 4. This bug was one of 15 that we reported to OpenSSL, which were all fixed.

Infer finds this bug by performing compositional reasoning, which allows covering interprocedural bugs while still scaling to millions of LOC. It deduces a precondition/postcondition specification approximating the behavior of X509_gmtime_adj, and then uses that specification when reasoning about its calls. The specification includes 0 as one of the return values, and this triggers the error.

In 2017, we looked at bug fixes in several categories and found that for some (null dereferences, data races, and security issues) over 50% of the fixes were for bugs with traces that were interprocedural.a The interprocedural bugs would be missed bugs if we only deployed procedure-local analyses.

Concurrency. A concurrency capability recently added to Infer, the RacerD analysis, provides an example of the benefit of feedback between program analysis researchers and product engineers.2,15 Development of the analysis started in early 2016, motivated by Concurrent Separation Logic.3 After 10 months of work on the project, engineers from News Feed on Android caught wind of what we were doing and reached out. They were planning to convert part of Facebook's Android app from a sequential to a multithreaded architecture. Hundreds of classes written for a single-threaded architecture had to be used now in a concurrent context: the transformation could introduce concurrency errors. They asked for interprocedural capabilities because Android UI is arranged in trees with one class per node. Races could happen via interprocedural call chains sometimes spanning several classes, and mutations almost never happened at the top level: procedure-local analysis would miss most races.

We had been planning to launch the proof tool we were working on in a year's time, but the Android engineers were starting their project and needed help sooner. So we pivoted to a minimum viable product, which would serve the engineers—it had to be fast, with actionable reports, and not too many missed bugs on product code (but not on infrastructure code).2,15 The tool borrowed ideas from concurrent separation logic, but we gave up on the ideal of proving absolute race freedom. Instead, we established a 'completeness' theorem saying that, under certain assumptions, a theoretical variant of the analyzer reports only true positives.10

The analysis checks for data races in Java programs—two concurrent memory accesses, one of which is a write. The example in Figure 2 (top) illustrates: If we run Infer on this code it doesn't find a problem. The unprotected read and the protected write do not race because they are on the same thread. But, if we include additional methods that do conflict, then Infer will report races, as in Figure 2, bottom.

Impact. Since 2014, Facebook's developers have resolved over 100,000 issues flagged by Infer. The majority of Infer's impact comes from the diff-time deployment, but it is also run batch to track issues in master, issues addressed in fixathons and other periodic initiatives.

The RacerD data race detector saw over 2,500 fixes in the year to March 2018. It supported the conversion of Facebook's Android app from a single-threaded to a multithreaded architecture by searching for potential data races, without the programmers needing to insert annotations for saying which pieces of memory are guarded by what locks. This conversion led to an improvement in scroll performance and, speaking about the role of the analyzer, Benjamin Jaeger, an Android engineer at Facebook, stated:b "without Infer, multithreading in News Feed would not have been tenable." As of March 2018, no Android data race bugs missed by Infer had been observed in the previous year (modulo 3 analyzer implementation errors).2

The fix rate for the concurrency analysis to March 2018 was roughly 50%, lower than for the previous general diff analysis. Our developers have emphasized that they appreciate the reports because concurrency errors are difficult to debug. This illustrates our earlier points about balancing action rates and bug severity. See Blackshear et al.2 for more discussion on fix rates.

a https://bit.ly/2WloBVj
b https://bit.ly/2xurbMl


Overall, Infer reports on over 30 types of issues, ranging from deep interprocedural checks to simple procedure-local checks and lint rules. Concurrency support includes checks for deadlocks and starvation, with hundreds of "app not-responding" bugs being fixed in the past year. Infer has also recently implemented a security analysis (a 'taint' analysis), which has been applied to Java and C++ code; it gained this facility by borrowing ideas from Zoncolan.

Staying Secure with Zoncolan

One of the original reasons for the development and adoption of Hack was to enable more powerful analysis of the core Facebook codebase. Zoncolan is the static analysis tool we built to find code and data paths that may cause a security or a privacy violation in our Hack codebase.

The code in Figure 3 is an example of a vulnerability prevented by Zoncolan. If the member_id variable on line 21 contains the value ../../users/delete_user/, it is possible to redirect this form into any other form on Facebook. On submission of the form, it will invoke a request to https://facebook.com/groups/add_member/../../users/delete_user/ that will delete the user's account. The root cause of the vulnerability in Figure 3 is that the attacker controls the value of the member_id variable which is used in the action field of the form element.

Figure 3. Example of a bug that Zoncolan prevents. It may cause the attacker to delete a user account. The attacker can provide an input on line 5 that causes a redirection to any other form on Facebook at line 20.

Zoncolan follows the interprocedural flow of untrusted data (for example, user input) to sensitive parts of the codebase. Virtual calls do make interprocedural analysis difficult since the tool generally does not know the precise type of an object. To avoid missing paths (and thus bugs), Zoncolan must consider all the possible functions a call may resolve to.
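Zoncolan itself analyzes Hack; the following sketch, in Java and with hypothetical names, illustrates the kind of interprocedural source-to-sink flow a taint analysis tracks: attacker-controlled input enters at a source, passes through intermediate procedures, and ends up in a sensitive sink, here the action URL of a form, echoing the Figure 3 scenario.

// Illustrative sketch (hypothetical names) of an interprocedural taint flow:
// untrusted input (source) flows through helper procedures into a sink.
class GroupController {

  // Source: the request parameter is attacker-controlled.
  String handleAddMember(HttpRequest request) {
    String memberId = request.getParameter("member_id");
    return renderForm(memberId);
  }

  // The taint propagates through intermediate calls...
  private String renderForm(String memberId) {
    return buildActionUrl(memberId);
  }

  // ...and reaches a sink: attacker data ends up in the form's action URL,
  // so a value like "../../users/delete_user/" redirects the form elsewhere.
  private String buildActionUrl(String memberId) {
    return "<form action=\"/groups/add_member/" + memberId + "\">...</form>";
  }
}

// Minimal request interface so the sketch is self-contained.
interface HttpRequest {
  String getParameter(String name);
}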

SEV-oriented static analysis development. We designed and developed Zoncolan in collaboration with the Facebook App Security team. Alarms reported by Zoncolan are inspired by security bugs uncovered by the App Security team. The initial design of Zoncolan began with a list of SEVs that were provided to us by security engineers. For each bug we asked ourselves: "How could we have caught it with static analysis?" Most of those historical bugs were no longer relevant because the programming language or a secure framework prevented them from recurring—for instance, the widespread adoption of XHP made it possible to build XSS-free Web pages by construction. We realized the remaining bugs involved interprocedural flows of untrusted data, either directly or indirectly, into some privileged APIs. Detecting such bugs can be automated with static taint flow analysis,18 which tracks how the data originating from some untrusted sources reaches or influences the data reaching some sensitive parts of the codebase (sinks).

When a security engineer discovers a new vulnerability, we evaluate whether that class of vulnerability is amenable to static analysis. If it is, we prototype the new rule, iterating with the feedback of the engineer in order to refine results to strike the right balance of false positives/false negatives. When we believe the rule is good enough, it is enabled on all runs of Zoncolan in production. We adopt the standard Facebook App Security severity framework, which associates to each vulnerability an impact level, on a scale from 1 (best practice) to 5 (SEV-worthy). A security impact level of 3 or more is considered severe.

Scaling the analysis. A main challenge was to scale Zoncolan to a codebase of more than 100 million lines of code.

Thanks to a new parallel, compositional, non-uniform static analysis that we designed, Zoncolan performs the full analysis of the codebase in less than 30 minutes on a 24-core server. Zoncolan builds a dependency graph that relates methods to their potential callers. It uses this graph to schedule parallel analyses of individual methods. In the case of mutually recursive methods, the scheduler iterates the analysis of the methods until it stabilizes, that is, no more flows are discovered. Suitable operators (called widenings in the static analysis literature7) ensure the convergence of the iterations. It is worth mentioning that, even though the concept of taint analysis is well established in academia, we had to develop new algorithms in order to scale to the size of our codebase.
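The following Java sketch is our own illustration of this scheduling idea, not Zoncolan's implementation: per-method analyses run in parallel, and summaries are recomputed until they stabilize, with a widening step ensuring termination. A production scheduler would additionally order work bottom-up over the dependency graph and confine iteration to mutually recursive groups; here we simply re-analyze all methods until nothing changes.

import java.util.*;
import java.util.concurrent.*;

// Sketch of a compositional, parallel analysis scheduler (illustrative only).
// "Summary" stands for whatever abstract fact the analysis computes per method;
// implementations are expected to define equals() so stabilization can be detected.
class AnalysisScheduler {
  interface Summary { Summary widen(Summary previous); }
  interface Analyzer { Summary analyze(String method, Map<String, Summary> calleeSummaries); }

  private final Map<String, Set<String>> callees;   // method -> methods it calls
  private final Map<String, Summary> summaries = new ConcurrentHashMap<>();
  private final Analyzer analyzer;

  AnalysisScheduler(Map<String, Set<String>> callees, Analyzer analyzer) {
    this.callees = callees;
    this.analyzer = analyzer;
  }

  // Analyze every method, iterating to a fixpoint for (mutually) recursive ones.
  void run(int maxIterations) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(24); // e.g., a 24-core server
    for (int i = 0; i < maxIterations; i++) {
      boolean changed = analyzeAllOnce(pool);
      if (!changed) break;  // summaries stabilized: no more flows discovered
    }
    pool.shutdown();
  }

  private boolean analyzeAllOnce(ExecutorService pool) throws InterruptedException {
    List<Callable<Boolean>> tasks = new ArrayList<>();
    for (String method : callees.keySet()) {
      tasks.add(() -> {
        // Combine the current summaries of callees instead of re-analyzing their bodies.
        Map<String, Summary> needed = new HashMap<>();
        for (String callee : callees.get(method)) {
          Summary s = summaries.get(callee);
          if (s != null) needed.put(callee, s);
        }
        Summary fresh = analyzer.analyze(method, needed);
        Summary old = summaries.get(method);
        // Widening ensures the chain of summaries cannot grow forever.
        Summary next = (old == null) ? fresh : fresh.widen(old);
        summaries.put(method, next);
        return !next.equals(old);
      });
    }
    boolean changed = false;
    for (Future<Boolean> f : pool.invokeAll(tasks)) {
      try {
        changed |= f.get();
      } catch (ExecutionException e) {
        throw new RuntimeException(e);
      }
    }
    return changed;
  }
}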
Figure 4. Funneled deployment of Zoncolan.

Funneled deployment. Figure 4 provides a graphical representation of the Zoncolan deployment model. This funneled deployment model optimizes bug detection with the goal of supporting security of Facebook: The Zoncolan master analysis finds all existing instances of a newly discovered vulnerability. The Zoncolan diff analysis avoids vulnerabilities from being (re-)introduced in the codebase.

Zoncolan periodically analyzes the entire Facebook Hack codebase to update the master list. The target audience is security engineers performing security reviews. In the master analysis, we expose all alarms found. Security engineers are interested in all existing alarms for a given project or a given category. They triage alarms via a dashboard, which enables filtering by project, code location, source and/or destination of the data, length or features of the trace.

When a security engineer finds a bug, he/she files a task for the product group and provides guidance on how to make the code secure. When an alarm is a false positive, he/she files a task for the developers of Zoncolan with an explanation of why the alarm is false. The Zoncolan developers then refine the tool to improve the precision of the analysis. After a category has been extensively tested, the Zoncolan team, in conjunction with the App Security team, evaluates if it can be promoted for diff analysis. Often promotion involves improving the signal by filtering the output according to, for example, the length of the interprocedural trace, the visibility of the endpoint (external or internal?), and so on. At press time, circa 1/3 of the Zoncolan categories are enabled for diff analysis.

Zoncolan analyzes every Hack code modification and reports alarms if a diff introduces new security vulnerabilities. The target audience is: the author and the reviewers of the diff (Facebook software engineers who are not security experts), and the security engineer in the on-call rotation (who has a limited time budget). When appropriate, the on-call validates the alarm reported, blocks the diff, and provides support to write the code in a secure way. For categories with very high signal, Zoncolan acts as a security bot: it bypasses the security on-call and instead comments directly on the diff. It provides a detailed explanation of the security vulnerability, how it can be exploited, and includes references to past incidents, for example, SEVs.

Finally, note the funneled deployment model makes it possible to scale up the security fixes, without reducing the overall coverage Zoncolan achieves (that is, without missing bugs): If Zoncolan determines a new issue is not high-signal enough for autocommenting on the diff, but needs to be looked at by an expert, it pushes it to the on-call queue. If the alarm makes neither of these cuts, the issue will end up in the Zoncolan master analysis after the diff is committed.

Impact. Zoncolan has been deployed for more than two years at Facebook, first to security engineers, then to software engineers. It has prevented thousands of vulnerabilities from being introduced to Facebook's codebase. Figure 5 compares the number of SEVs, that is, bugs of severity 3-to-5, prevented by Zoncolan in a six-month period, to the traditional programs adopted by security engineers, such as manual code reviews/pentesting and bug bounty reports. The bars show that at Facebook, Zoncolan catches more SEVs than either manual security reviews or bug bounty reports. We measured that 43.3% of the severe security bugs are detected via Zoncolan.

Figure 5. Comparison of severe bugs reported by Zoncolan with respect to security reviews and bug bounty, in a six-month period (darker implies more severe). Bars: WhiteHat, Security Reviews, Zoncolan.

The graph in Figure 6 shows the distribution of the actioned bugs found by Zoncolan at different stages of the deployment funnel, according to the security impact level. The largest number of categories is enabled for the master analysis, so it is not unexpected that it is the largest bucket. However, when restricting to SEVs, the diff analysis largely overtakes the master analysis—211 severe issues are prevented at diff time, versus 122 detected on master. Overall, we measured the ratio of Zoncolan actioned bugs to be close to 80%.

Figure 6. Distribution of all the bugs fixed, in a six-month period, based on Zoncolan's funneled deployment and bug severity (darker implies more severe). Bars: master, on call, bot.

We also use the traditional security programs to measure missed bugs, that is, the vulnerabilities for which there is a Zoncolan category but the tool failed to report them. To date, we have had about 11 missed bugs, some of them caused by a bug in the tool or incomplete modeling.


Compositionality and Abstraction

The technical features that underpin our analyses are compositionality and abstraction.

The notion of compositionality comes from language semantics: A semantics is compositional if the meaning of a compound phrase is defined in terms of the meanings of its parts and a means of combining them. The same idea can be applied to program analysis.5,8 A program analysis is compositional if the analysis result of a composite program is defined in terms of the analysis results of its parts and a means of combining them. When applying compositionality in program analysis, there are two key questions:

a. How to represent the meaning of a procedure concisely?
b. How to combine the meanings in an effective way?

For (a) we need to approximate the meaning of a component by abstracting away the full behavior of the procedure and focusing only on the properties relevant for the analysis. For instance, for security analysis, one may be only interested that a function returns a user-controlled value when the input argument contains a user-controlled string, discarding the effective value of the string. More formally, the designer of the static analysis defines an appropriate mathematical structure, called the abstract domain,7 which allows us to approximate this large function space much more succinctly. The design of a static analysis relies on abstract domains precise enough to capture the properties of interest and coarse enough to make the problem computationally tractable. The 'abstraction of a procedure meaning' is often called a procedure summary in the analysis literature.19

The answer to question (b) mostly depends on the specific abstract domain chosen for the representation of summaries. Further information on the abstractions supported by Infer and Zoncolan, as well as brief information on recursion, fixpoints, and analysis algorithms, may be found in the online technical appendix. It is worth discussing the intuitive reason why compositional analysis together with crafted abstract domains can scale: each procedure only needs to be visited a few times, and many of the procedures in a codebase can be analyzed independently, thus opening opportunities for parallelism. A compositional analysis can even have a runtime that is (modulo mutual recursion) a linear combination of the times to analyze the individual procedures. For this to be effective, a suitable abstract domain, for instance limiting or avoiding disjunctions, should also contain the cost of analyzing a single procedure.

Finally, compositional analyses are naturally incremental—changing one procedure does not necessitate re-analyzing all other procedures. This is important for fast diff-time analysis.
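As a much-simplified illustration of questions (a) and (b) for a taint analysis, a procedure summary might record only which parameters flow to the return value or reach a sink inside the procedure; the caller's analysis then combines callee summaries at call sites instead of re-analyzing callee bodies. The sketch below is ours, in Java, and does not reflect Infer's or Zoncolan's actual summary representation.

import java.util.*;

// Simplified procedure summary for a taint analysis: it abstracts a method's
// behavior to (i) which parameters flow into the return value and (ii) which
// parameters reach a sensitive sink inside the method, discarding everything else.
final class TaintSummary {
  final Set<Integer> paramsFlowingToReturn;
  final Set<Integer> paramsReachingSink;

  TaintSummary(Set<Integer> toReturn, Set<Integer> toSink) {
    this.paramsFlowingToReturn = toReturn;
    this.paramsReachingSink = toSink;
  }

  // Combining meanings at a call site (question b): given which arguments are
  // tainted in the caller, the summary says whether the call reaches a sink.
  static boolean callReachesSink(TaintSummary callee, Set<Integer> taintedArgs) {
    for (int p : callee.paramsReachingSink) {
      if (taintedArgs.contains(p)) return true;
    }
    return false;
  }

  // ...and whether the call's result becomes tainted in the caller.
  static boolean callReturnsTainted(TaintSummary callee, Set<Integer> taintedArgs) {
    for (int p : callee.paramsFlowingToReturn) {
      if (taintedArgs.contains(p)) return true;
    }
    return false;
  }
}

For instance, a summary saying "parameter 0 flows to the return value" for a string-formatting helper captures exactly the user-controlled-value fact mentioned above, while the helper's actual string manipulation is abstracted away.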
Conclusion

This article described how we, as static analysis people working at Facebook, have developed program analyses in response to the needs that arise from production code and engineers' requests. Facebook has enough important code and problems that it is worthwhile to have embedded teams of analysis experts, and we have seen (for example, in the use of Infer to support multithreaded Android News Feed, and in the evolution of Zoncolan to detect SEV-worthy issues) how this can impact the company. Although our primary responsibility is to serve the company, we believe our experiences and techniques can generalize beyond the specific industrial context. For example, Infer is used at other companies such as Amazon, Mozilla, and Spotify; we have produced new scientific results,2,10 and proposed new scientific problems.11,14 Indeed, our impression as (former) researchers working in an engineering organization is that having science and engineering playing off one another in a tight feedback loop is possible, even advantageous, when practicing static analysis in industry.

To industry professionals we say: advanced static analyses, like those found in the research literature, can be deployed at scale and deliver value for general code. And to academics we say: from an industrial point of view the subject appears to have many unexplored avenues, and this provides research opportunities to inform future tools.

Acknowledgments

Special thanks to Ibrahim Mohamed for being a tireless advocate for Zoncolan among security engineers, to Cristiano Calcagno for leading Infer's technical development for several years, and to our many teammates and other collaborators at Facebook for their contributions to our collective work on scaling static analysis.

Readers interested in more technical details of this work are encouraged to review the online appendix (https://dl.acm.org/citation.cfm?doid=3338112&picked=formats).
Further information on advanced static analyses, like those Manuel Fähndrich is a software engineer at Facebook the abstractions supported by Infer found in the research literature, can be Research, Seattle, WA, USA. and Zoncolan, as well as brief infor- Francesco Logozzo is a software engineer at Facebook deployed at scale and deliver value for Research, Seattle, WA, USA. mation on recursion, fixpoints, and general code. And to academics we say: Peter W. O’Hearn is a research scientist at Facebook, analysis algorithms, may be found in from an industrial point of view the sub- London, U.K. and a professor of computer science at the online technical appendix. It is ject appears to have many unexplored University College London, U.K. worth discussing the intuitive reason avenues, and this provides research op- for why compositional analysis to- portunities to inform future tools. Copyright held by authors/owners.
