Taxonomy of Package Management in Programming Languages and Operating Systems
Total Page:16
File Type:pdf, Size:1020Kb
Taxonomy of Package Management in Programming Languages and Operating Systems Hisham Muhammad Lucas C. Villa Real Michael Homer Kong Inc. IBM Research Victoria University of Wellington [email protected] [email protected] Wellington, New Zealand [email protected] Abstract For example, when writing JavaScript web applications on Package management is instrumental for programming lan- a Mac environment, a developer may require using Bower [1], guages and operating systems, and yet it is neglected by a package manager for client-side JavaScript components. both areas as an implementation detail. For this reason, it Bower is installed using npm [2], a package manager for lacks the same kind of conceptual organization: we lack ter- node.js [3], a JavaScript environment. On a Mac system, minology to classify them or to reason about their design the typical way to install command-line tools such as npm is trade-offs. In this paper, we share our experience in both via either Homebrew [4] or MacPorts [5], the two popular OS and language-specific package manager development, general-purpose package managers for macOS. This is not categorizing families of package managers and discussing a deliberately contrived example: it is the regular way to their design implications beyond particular implementations. install modules for a popular language in a modern platform. We also identify possibilities in the still largely unexplored The combinations of package managers change as we area of package manager interoperability. move to a different operating system or use a different lan- guage. Learning one’s way through a new language or sys- Keywords package management, operating systems, mod- tem, nowadays, includes learning one or more packaging ule systems, filesystem hierarchy environments. As a developer of modules, this includes not only using package managers but also learning to deploy 1 Introduction code using them, which includes syntaxes for package speci- Package managers are programs that map relations between fication formats, dependency and versioning rules and de- files and packages (which correspond to sets of files), and ployment conventions. Simply ignoring these environments between packages (dependencies), allowing users to perform and managing modules and dependencies by hand is tempt- maintenance of their systems in terms of packages rather ing, but the complexity of heterogeneous environments and than at the level of individual files. Package management is keeping track of dependency updates can become burden- an area that lies in the border between programming lan- some — all these package managers were created to solve guages and operating systems: packaging is a step that sits practical problems which the developer would have to oth- after a language’s build process, and before an operating sys- erwise directly handle, after all. Another alternative that tem’s component installation. For this reason, it seems to be is often proposed, especially by users of operating systems overlooked by both fields as an implementation issue. Mean- that feature a system-provided package manager (as is the while, package management keeps growing in complexity. case of most Linux distributions), is to avoid using multiple New languages, new deployment models, and new portabil- package managers and use a single general-purpose package ity requirements all give rise to new package management manager. This is, of course, as much of a solution as trying systems. Further, this is not simply a matter of competing im- to make everyone agree on a single programming language plementations: modern complex environments often require — one of many analogies between package management and several package managers to be used in tandem. programming languages. The result is that the ecosystem is not getting any simpler, and at first glance it seems that Permission to make digital or hard copies of all or part of this work for package management is indeed a largely unsolved problem. personal or classroom use is granted without fee provided that copies are not However, maybe the statement “package management is made or distributed for profit or commercial advantage and that copies bear an unsolved problem” simply does not make sense, and is this notice and the full citation on the first page. Copyrights for components akin to saying that “programming languages are an unsolved of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to problem”. In the programming languages world we accept redistribute to lists, requires prior specific permission and/or a fee. Request that the multitude of languages is a given. Beyond that, we permissions from [email protected]. understand that there are families of languages with differ- PLOS ’19, October 27, 2019, Huntsville, ON, Canada ent paradigms, with well-known tradeoffs. We also accept © 2019 Association for Computing Machinery. that there is room for domain-specific languages (DSLs) and ACM ISBN 978-1-4503-7017-2/19/10...$15.00 for general-purpose languages. Most importantly, we know https://doi.org/10.1145/3365137.3365402 PLOS ’19, October 27, 2019, Huntsville, ON, Canada Hisham Muhammad, Lucas C. Villa Real, and Michael Homer how to set boundaries for each language and how to make to Python’s default directory for locally-installed modules DSLs and general-purpose languages interact. Most existing (e.g. /usr/lib/python3.7/site-packages/), whose defi- package management systems, however, are still oblivious nition predates the introduction of pip. Database-oriented to the fact that they exist as part of a larger ecosystem, with designs are often chosen when the package manager needs parts of it handled by other package managers. to accomodate a pre-existing directory structure. If Python In this paper we draw on our own unique combination of packages have modules with the same name, clashes may experiences on all sides of this topic: developing a system- occur. In a filesystem-oriented design, such as for example wide package manager for a Linux distribution [6, 7], creating that of RubyGems, this problem would not happen. Each a language-specific package manager [8], and integrating package has its own subtree under a versioned directory, system and language-specific package managers [7], as well and the rubygems.rb module, part of the default installation as being simply developers and end users of other software. of Ruby, takes care of finding the appropriate files when By building a taxonomy for package management and shar- modules are loaded with the require function. Figure 1 lists ing our experiences with package management development more examples of package managers and their classification. in both the programming language and operating system The major trade-off between the filesystem-oriented and spaces, we aim in this paper to frame the design choices, the database-oriented approaches is whether applications layers, and trade-offs attendant to package managers for should be aware of the file structure defined by the manager developers, maintainers, and users. or whether the manager should adapt to the file structure defined by applications. This affects how the manager tracks 2 Paradigms of package management: the mapping of files and how applications are configured to filesystem-oriented vs. database-oriented find their resource files. It is a typical didactic device to organize the landscape of In filesystem-oriented managers the mapping of files to programming languages into paradigms, such as impera- packages is simple. File conflicts are naturally avoided by tive, functional, object-oriented, and so on. The paradigms storing files of different packages in separate subtrees. Ver- a language is categorized into inform users about its de- sioning conflicts between variants of the same package can sign choices, and with these choices come trade-offs. In the also be handled via the tree structure. The structure also world of package managers, we can also identify general becomes more transparent to users, which can simplify their paradigms, by looking at their core concepts and trade-offs. experience. The run-time lookup of files by applications, Package managers map between files and packages, and be- however, can be complicated, if they are oblivious to the tween packages and dependencies, so users can work with structure defined by the package manager. Applications must packages instead of files. The central design choice in apack- either agree beforehand to this structure (which might be age manager, therefore, is how to perform those mappings. an option in domain-specific environments), or the package There are two approaches on how to map files to packages: manager has to do extra work to configure them to use the the mapping can be based on the hierarchical directory struc- structure, such as setting configuration options or environ- ture where the files reside, or can be separate. As this choice ment variables, or in the worst case, patching them. embodies a series of trade-offs and is the single decision that Conversely, in database-oriented managers the mapping affects the design and implementation of a package manager of files to packages is more complicated. Applications may the most, we identify these as two paradigms of package man- install files wherever they please, and the package man- agement. When the mapping is internal to the file hierarchy ager needs to keep track. This includes handling potential structure, we say that package management is filesystem- conflicts if two packages want to use the same pathname. oriented1. When it is external to the hierarchy of files being Database-oriented systems will usually report on these con- managed, the mapping needs to be stored elsewhere. We say flicts and forbid them. It is up to the integrator (such asa in these cases that management is database-oriented. Most distribution developer) who is building packages to resolve package managers for Linux distributions, such as RPM and the conflict somehow.