Surviving Software Dependencies

Software reuse is finally here but comes with risks.

BY RUSS COX

practice
DOI:10.1145/3347446
Article development led by queue.acm.org
COMMUNICATIONS OF THE ACM | SEPTEMBER 2019 | VOL. 62 | NO. 9

For decades, discussion of software reuse was more common than actual software reuse. Today, the situation is reversed: developers reuse software written by others every day, in the form of software dependencies, and the situation goes mostly unexamined.

My background includes a decade of working with Google’s internal source code system, which treats software dependencies as a first-class concept,[17] as well as developing support for dependencies in the Go programming language.[2]

Software dependencies carry with them serious risks that are too often overlooked. The shift to easy, fine-grained software reuse has happened so quickly that we do not yet understand the best practices for choosing and using dependencies effectively, or even for deciding when they are appropriate and when not. The purpose of this article is to raise awareness of the risks and encourage more investigation of solutions.

In software development today, a dependency is additional code a programmer wants to call. Adding a dependency avoids repeating work: designing, testing, debugging, and maintaining a specific unit of code. In this article, that unit of code is referred to as a package; some systems use the terms library and module instead.

Taking on externally written dependencies is not new. Most programmers have at one point in their careers had to go through the steps of manually installing a required library, such as C’s PCRE or zlib; C++’s Boost or Qt; or Java’s JodaTime or JUnit. These packages contain high-quality, debugged code that required significant expertise to develop. For a program that needs the functionality provided by one of these packages, the tedious work of manually downloading, installing, and updating the package is easier than the work of redeveloping that functionality from scratch. The high fixed costs of reuse, however, mean manually reused packages tend to be big; a tiny package would be easier to reimplement.

A dependency manager (a.k.a. package manager) automates the downloading and installation of dependency packages. As dependency managers make individual packages easier to download and install, the lower fixed costs make smaller packages economical to publish and reuse. For example, the Node.js dependency manager NPM provides access to more than 750,000 packages. One of them, escape-string-regexp, consists of a single function that escapes regular expression operators in its input. The entire implementation is:

    var matchOperatorsRe = /[|\\{}()[\]^$+*?.]/g;

    module.exports = function (str) {
        if (typeof str !== 'string') {
            throw new TypeError('Expected a string');
        }
        return str.replace(matchOperatorsRe, '\\$&');
    };

Before dependency managers, publishing an eight-line code library would have been unthinkable: too much overhead for too little benefit. NPM, however, has driven the overhead approximately to zero, with the result that nearly trivial functionality can be packaged and reused. In late April 2019, the escape-string-regexp package was explicitly depended upon by almost a thousand other NPM packages, not to mention all the packages developers write for their own use and don’t share.

Dependency managers now exist for essentially every programming language: Maven Central (Java), NuGet (.NET), Packagist (PHP), PyPI (Python), and RubyGems (Ruby) each host more than 100,000 packages. The arrival of this kind of fine-grained, widespread software reuse is one of the most consequential shifts in software development over the past two decades. And if we are not more careful, it will lead to serious problems.

What Could Go Wrong?

A package, for this discussion, is code downloaded from the Internet. Adding a package as a dependency outsources the work of developing that code (designing, writing, testing, debugging, and maintaining) to someone else on the Internet, often unknown to the programmer. Using that code exposes the program to all the failures and flaws in the dependency. The program’s execution now literally depends on code downloaded from this stranger on the Internet. Presented this way, it sounds incredibly unsafe. Why would anyone do this?

Because it’s easy, it seems to work, everyone else is doing it, and, most importantly, it seems like a natural continuation of age-old established practice. But there are important differences that are being ignored.

Decades ago, most developers trusted others to write the software they depended on, such as operating systems and compilers. That software was purchased from known sources, often with some kind of support agreement. There was still a potential for bugs or outright mischief,[20] but at least the developers knew who they were dealing with and usually had commercial or legal recourses available.

The phenomenon of open source software, distributed at no cost over the Internet, has displaced many of those earlier software purchases. When reuse was difficult, there were fewer projects publishing reusable code packages. Even though their licenses typically disclaimed, among other things, any “implied warranties of merchantability and fitness for a particular purpose,” the projects built up well-known reputations that often factored heavily into people’s decisions about which to use. The commercial and legal support for trusting software sources was replaced by reputational support.
Many common early packages still enjoy good reputations: consider BLAS (published in 1979), Netlib (1987), libjpeg (1991), LAPACK (1992), HP STL (1994), and zlib (1995).

Dependency managers have scaled down this open source code reuse model. Now, developers can share code at the granularity of individual functions consisting of tens of lines of code. This is a major technical accomplishment. Myriad packages are available, and writing code can involve a large number of them, but the commercial, legal, and reputational support mechanisms for trusting the code have not carried over. Developers trust more code with less justification for doing so.

The cost of adopting a bad dependency can be viewed as the sum, over all possible bad outcomes, of the cost of each bad outcome multiplied by its probability of happening (risk), as shown in the equation.

$$\text{expected cost} = \sum_{b \,\in\, \text{bad outcomes}} \text{cost}(b) \times \text{probability}(b)$$

The context in which a dependency will be used determines the cost of a bad outcome. At one end of the spectrum is a personal hobby project, where the cost of most bad outcomes is near zero: you are just having fun, bugs have no real impact other than wasting time, and even debugging can be fun. So, the risk probability almost doesn’t matter: it’s being multiplied by a failure cost of almost zero. At the other end of the spectrum is production software that must be maintained for years. Here, the cost of a bug in a dependency can be very high: servers may go down, sensitive data may be divulged, customers may be harmed, or companies may fail. High failure costs make it much more important to estimate and then reduce any risk of a serious failure.

No matter what the expected cost, experiences with larger dependencies suggest some approaches for estimating and reducing the risks of adding a software dependency. Better tooling is likely needed to help reduce the costs of these approaches, much as dependency managers have focused to date on reducing the costs of downloading and installation.

Inspect the Dependency

You would not hire a software developer you have never heard of and know nothing about. You would learn more about the person first: check references, conduct a job interview, run background checks, and so on. Before you depend on a package found on the Internet, it is similarly prudent to learn a bit about it first.

A basic inspection can provide a sense of how likely you are to run into problems trying to use this code. If the inspection reveals likely minor problems, you can take steps to prepare for or perhaps avoid them. If the inspection reveals major problems, it may be best not to use the package; maybe you will find a more suitable one, or maybe you need to develop one yourself. Remember that open source packages are published by their authors in the hope they will be useful but with no guarantee of usability or support. In the middle of a production outage, you will be the one debugging the package. As the original GNU General Public License warned, “The entire risk as to the quality and performance of the program is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction.”[7]

The following are some considerations when inspecting a package and deciding whether to depend on it:

Design. Is the documentation clear? Does the API have a clear design? If the authors can explain the package’s API and its design well in the documentation, that increases the likelihood they have explained the implementation well to the computer in the source code. Writing code using a clear, well-designed API is also easier, faster, and hopefully less error-prone.
Have the authors documented what they expect from client code in order to make future upgrades compatible? (Examples caught.
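
The expected-cost sum described earlier in this article can be made concrete with a little arithmetic. The following is a minimal sketch in JavaScript, not a method the article prescribes: the outcome names, costs, and probabilities are all invented for illustration, and `expectedCost` is a hypothetical helper name. It contrasts the same set of failure probabilities under production-scale costs and under hobby-project costs.

```javascript
// Hypothetical bad outcomes for a single dependency.
// Every name, cost, and probability below is an illustrative assumption.
const productionOutcomes = [
  { name: "production outage", cost: 50000, probability: 0.01 },
  { name: "sensitive data divulged", cost: 500000, probability: 0.001 },
  { name: "day lost to debugging", cost: 800, probability: 0.25 },
];

// expected cost = sum over bad outcomes b of cost(b) * probability(b)
function expectedCost(outcomes) {
  return outcomes.reduce((sum, b) => sum + b.cost * b.probability, 0);
}

// Same probabilities, but in a hobby project each failure costs almost nothing.
const hobbyOutcomes = productionOutcomes.map((b) => ({ ...b, cost: 1 }));

console.log("production:", expectedCost(productionOutcomes));
console.log("hobby:", expectedCost(hobbyOutcomes));
```

With these made-up numbers the production total comes to about 1,200 cost units while the hobby total is about 0.26, which mirrors the point that the probabilities hardly matter when each failure costs almost nothing.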
