A SURVEY OF METADATA PACKAGE MANAGEMENT SYSTEMS AND THEIR SOLVERS.

1SRIKALA BHARADWAJ, 2MEGHA.P.ARAKERI

1,2Information Science Department, MSRIT, Bangalore

Abstract— Software components that encapsulate units of functionality for an operating system are created and widely distributed to end users in packaged forms, such as the RPM format or the deb format used in many Linux distros. Many Linux flavours use a primary package manager together with a secondary metadata package manager to handle the installation, upgrade and removal of packages. This survey paper delves into metadata package management and examines the various techniques used by the package managers available today.

Index Terms—Package Management, Dependency Hell, Dependency Resolution, Automatic install, Upgrade.

I. INTRODUCTION

Software can be shipped to an end user either in source form or in pre-built package form. The source form is configurable to one's needs and so is preferred by developers, who configure, build and install it as per their requirements. However, managing software installed this way becomes tedious over time. A typical end user prefers a pre-built, ready-to-install packaged application, for which the package maintainer takes on the burden of configuring, building and packaging. A package contains the installation paths, information about the packages that it depends upon, patches and scripts. A typical open operating system consists of thousands of software components that help run various applications. These components are made available as packages. An operating system may have about 25000 to 45000 packages.

Fig. 1 Overview of package management

In order to consistently manage the installation, upgrade and removal of these software packages, a set of tools called a package manager is used. Fig. 1 shows the components involved in package distribution and management. A package is typically a piece of software which contains files to be placed in specific directories in the target system, configuration logic that is to be executed during deployment, and metadata that helps track the files and describe the component's capabilities. For example, in the RPM format all the metadata is present in a header. A software package can be visualized as shown in Fig. 2.

Fig. 2 Contents of a typical package

Package Repository: This is usually a web repository (hosted on a File Transfer Protocol (FTP) server) or a media device such as a DVD, where all packages are made available by the packager for installation.

Package Manager: This is a tool that handles dependency resolution, fetching of packages from the repository, and installation, upgrade and removal of packages. A basic package manager (such as RPM) is aided by a secondary metadata package manager, say Yellowdog Updater, Modified (YUM), for automatic dependency resolution and other advanced features.

Dependencies for a package: A package 'A' might require package 'B' as a pre-install. So package 'B' is a dependency for package 'A'. In the real world

Proceedings of 6th IRF International Conference, Bangalore, India, 01st June, 2014. ISBN: 978-93-84209-24-7

scenario, the dependencies tend to form a complex tree-like structure, as shown in Fig. 3.

Fig. 3 Dependency tree for package yum

In this paper we examine the workings of various package managers and the techniques they use to solve the package install, upgrade and removal problem. Section II gives the challenges in package management. Section III gives an overview of the types of package managers, classified according to the method used to carry out the tasks: the functional method and the imperative method. Section IV looks particularly into techniques used for dependency resolution. Section V lists package managers whose computational burden is shifted to the server side. Section VI gives the future trends likely to be seen in this area of research.

II. CHALLENGES IN PACKAGE MANAGEMENT

Language to express user request: A simple install command has to be expressed in a structured way [1] [2] such that the steps and dependencies required to carry it out are input accurately into the package manager. Usually conjunctive normal form or set theory expressions are used.

Dependency resolution: The package manager should check whether a solution that satisfies all constraints exists among the available packages. As the number of satisfiability criteria increases, the logical complexity of the problem also increases.

Optimization: The best solution should be picked from among the available solutions. User preferences should be taken into account for optimization.

Unpacking and installing: The installation should be done with the minimum possible changes to the current information profile, such that no applications are broken.

Rollback or undo the changes: When recent changes produce undesirable effects, it should be possible to roll back to a previous information profile.

Frequent need for updating: Newer versions of software that meet the user's requirements better become available every day. Routine upgrades are done to provide security patches, bug fixes and new features.

Scalability: A single repository contains thousands of packages. The real challenge arises when the package manager has to deal with multiple repositories.

III. PACKAGE MANAGEMENT METHODOLOGIES

A. Functional Package managers

Nix [3] is a package manager based on a purely functional model. Packages are built from Nix expressions, which describe graphs of build actions called derivations. A derivation consists of a build script, environment variables and other derivations (its set of dependencies). A package is built recursively from its dependencies, and each time the corresponding build scripts are run and the environment settings applied. In order to perform an upgrade with a new configuration, the system is entirely rebuilt from the new specification. This allows for an easy rollback to the old configuration if necessary. This model also benefits from its statelessness property. Statelessness makes configuration actions predictable and avoids unexplained failures.

The similarities between modules (or units) used in programming and packages used in software distribution were highlighted by Tucker and Krishnamurthi [4]. A package and a module behave similarly, as both provide their capabilities as a service and in turn may require certain imports. In the programming context the imports correspond to other functions and values, whereas in the package management context the imports are other modules or dependencies. The distribution format would be a compound unit, which is a bundle of all necessary atomic units linked correctly. Just as a class can be instantiated as several objects in the programming world, a unit can be invoked several times, and separate copies of each invocation can be uniquely identified and maintained. This makes it possible to use the correct version of each dependency without conflicting with other installed versions. Having multiple versions installed simultaneously can also aid in a better rollback mechanism.

B. Imperative package managers

Most tools use the imperative method [3] of operation, where the deployment actions are stateful. Files belonging to each package are stored in the Unix file system hierarchy. Packages are upgraded by overwriting the old versions. Statefulness makes it hard to support multiple versions at the same time. These tools are capable of automatically resolving


dependencies and installing the required package. They are less complex and hence fast. Their reliability, however, is lower, as in many cases they may not be able to find a solution. It is difficult to roll back an installation or upgrade, as configuration files are manipulated without traceability.

Cupt [5] is a partial reimplementation of APT that aims to be a bug-free package manager. The optimization criterion is hard coded. In the MISC-2010 [12] competition it found a solution with 21% success for the large set criterion (4 Debian releases) and 84% success for the single release setup.

Smart Package Manager [6] is a platform-agnostic, portable tool. It aids the user with efficient solution-finding algorithms and rollback options. In the MISC-2010 competition Smart performed very well with a single repository, finding a solution with 93% success; however, its performance degraded as the repository size scaled up, and it could then provide solutions with only 25% success.

Mancoosi Package Manager (MPM) [7] is a distribution-agnostic tool that uses the Common Upgradeability Description Format (CUDF) for describing package upgrade problems and optimizations. MPM allows users to specify high-level optimization criteria. The MPM architecture allows external dependency solvers to be used by adapting them to CUDF. In the MISC-2010 competition MPM was found to be the only manager that performed consistently well as the size of the repository grew. It had a 93% success rate in finding a solution with a single repository and 94% with four repositories.

IV. CLASSIFICATION BASED ON THE TECHNIQUE USED FOR PROBLEM SOLVING

C. Boolean Satisfiability

Most package managers use the Boolean satisfiability (SAT) technique to solve the dependency problem [8]. A Boolean satisfiability problem is represented in conjunctive normal form, that is, as a set of clauses, each of which is a disjunction of Boolean variables or their negations. A Boolean formula is satisfied if and only if all its clauses are satisfied. A clause is satisfied if and only if at least one of its literals is satisfied. Deciding whether there exists an assignment to the variables such that the formula is satisfied is known as the Boolean satisfiability problem. Packages are assigned the Boolean value true if they are installed and the value false if they are not installed. Dependencies are represented by clauses [11]: the formula $(\neg p \lor d_1) \land \dots \land (\neg p \lor d_k)$ represents the dependency of package $p$ on packages $d_1, d_2, \dots, d_k$. The binary clause $(\neg p \lor \neg p_1)$ represents a conflict between the packages $p$ and $p_1$.

When the propositional logic consists of all dependency requirements in conjunctive form, the problem can be solved in linear time. But if two or more packages can satisfy a dependency, so that disjunctions are used in the propositional logic, it becomes an NP-complete problem [8]. Boolean satisfiability is a known NP-complete problem. Hence a package upgrade request is also an NP-complete problem [7].

The YaST [10] package manager used by openSUSE Linux uses Boolean satisfiability (a SAT solver) to resolve dependencies. It uses an external solver, libzypp. User preferences are hard coded in the problem encoding.

MAX-SAT is a SAT-based method in which the effort is made to satisfy the maximum number of clauses. INESC-ID [8] is a solver built using the p2cudf parser and the MAX-SAT solver from MSUnCore, which is based on unsatisfiability cores. In the MISC-2010 competition, the INESC-ID solver had approximately twice the penalty rate of the winner. It took less time when the aim was to minimize the number of packages removed or changed, and worked best with direct CUDF input. It found a solution with a success rate of 96%, of which 12% were optimal.

D. Pseudo Boolean Optimization

Apt-pbo [11] applies Pseudo-Boolean Optimization (PBO) to the solution obtained from a SAT technique in order to get the best solution. Apt-pbo also has the capability to refine the solution as per user-specified preferences without any tradeoff in performance. Apt-pbo was found to be the slowest (one third the speed of the winner) and had the highest number of penalties (approximately 5.6 times more than the winner) in the MISC-2010 competition.

Argelich et al. [8] brought the existing knowledge of the pseudo-Boolean optimizer p2 and the Sat4j libraries, used for solving dependencies in the Eclipse environment, to Linux package management in a PBO solver called p2cudf. Using the p2 software, a single objective function consisting of all optimization criteria is created. Since such a function can contain a huge number of literals and huge coefficients, p2cudf fails to converge to the best solution [11]. In the MISC-2010 paranoid/trendy criteria competition the p2cudf solver performed well in the real-world Debian user problems category and emerged the winner there. Overall, however, it was approximately 2.3 times slower than Unsa (the winner) and had a penalty 1.6 times that of the winner. It found a solution 96% of the time regardless of the number of repositories, but the solution was not optimal.
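To make the clause encoding of Section IV concrete, the following is a minimal Python sketch, not the implementation of any solver surveyed here. It assumes a hypothetical four-package repository (`app`, `libfoo`, `mta-a`, `mta-b`); all function and variable names are illustrative. A dependency of p that can be met by any of the alternatives d1 ... dk becomes the clause (not p or d1 or ... or dk), a conflict becomes (not p or not q), and each requested package becomes a unit clause. Exhaustive enumeration of assignments stands in for a real SAT or pseudo-Boolean solver, and the objective (fewest installed packages) is only one possible optimization criterion.

```python
from itertools import product

# Hypothetical toy repository: package -> list of dependencies, where each
# dependency is a list of alternative packages (any one of them satisfies it).
DEPENDS = {
    "app": [["libfoo"], ["mta-a", "mta-b"]],
    "libfoo": [],
    "mta-a": [],
    "mta-b": [],
}
# Hypothetical conflict: the two alternatives cannot be installed together.
CONFLICTS = [("mta-a", "mta-b")]


def build_clauses(depends, conflicts, request):
    """Return CNF clauses as lists of (package, polarity) literals."""
    clauses = []
    for pkg, deps in depends.items():
        for alternatives in deps:
            # (not pkg) or alt_1 or ... or alt_k
            clauses.append([(pkg, False)] + [(alt, True) for alt in alternatives])
    for a, b in conflicts:
        # (not a) or (not b)
        clauses.append([(a, False), (b, False)])
    for pkg in request:
        # The user request is encoded as a unit clause.
        clauses.append([(pkg, True)])
    return clauses


def satisfies(assignment, clauses):
    """A formula holds when every clause has at least one true literal."""
    return all(
        any(assignment[pkg] == polarity for pkg, polarity in clause)
        for clause in clauses
    )


def solve(depends, conflicts, request):
    """Brute-force search: smallest satisfying install set, or None."""
    packages = sorted(depends)
    clauses = build_clauses(depends, conflicts, request)
    best = None
    for values in product([False, True], repeat=len(packages)):
        assignment = dict(zip(packages, values))
        if satisfies(assignment, clauses):
            installed = {p for p in packages if assignment[p]}
            # Objective: minimise the number of installed packages.
            if best is None or len(installed) < len(best):
                best = installed
    return best


if __name__ == "__main__":
    print(solve(DEPENDS, CONFLICTS, ["app"]))
    print(solve(DEPENDS, CONFLICTS, ["mta-a", "mta-b"]))
```

The enumeration is exponential in the number of packages, which is consistent with the NP-completeness noted in Section IV; practical tools hand the same kind of clauses to a dedicated solver instead.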


OPIUM [13] is a complete tool that overcomes the drawbacks of apt-get and yum by finding an optimized solution that is reliable. It combines tools in a hybrid approach: a SAT solver for installation and a pseudo-Boolean solver for optimization (a minimization problem). It also indicates which packages must be uninstalled to overcome unsatisfiability, using the resolution proof tree output by the SAT solver. Users can state preferences, such as minimizing the size of the packages downloaded when they are being charged per byte. This is done through an objective function. Comparison with the popular apt-get shows that, though OPIUM is slower, it can give a better solution and its failure rate is low. However, there is no rollback mechanism in place that lets the user return to a previous information profile.

E. Answer Set Programming

Aspcud [14] is a configuration tool that uses the Answer Set Programming (ASP) technique. The idea of ASP is to represent a given computational problem by a logic program whose answer sets correspond to solutions, and then to use an ASP solver to find the answer sets of the program. This tool has its primary hard constraints and optimizing soft constraints hard coded into configuration files. The user input is coded in CUDF. The optimization is aided by dedicated search strategies and heuristics. This tool was approximately 3 times slower and had a 2.3 times higher penalty than the winner in the MISC-2010 competition.

F. Mixed Integer Linear Programming

The Mixed Integer Linear Programming (MILP) technique is used by Michel and Rueher [15] to get an optimized solution for the upgradeability problem. It uses CUDF (Common Upgradeability Description Format) to formulate the problem's request. It finds the best solution for a given criterion by converting the problem into a minimization problem with binary variables under integer linear equations and inequalities. The MILP solver is also quick to prove unsatisfiability. The solver uses aggregate functions to solve problems with multiple criteria. The MILP solver can be improved to accommodate user preferences better. Unsa is a solver built using IBM ILOG's CPLEX. Unsa emerged the winner in the MISC-2010 competition, with a 100% success rate in finding the best solution in the fastest time.

V. PACKAGE MANAGEMENT USING SERVER-SIDE COMPUTATION

All the techniques seen in the previous section put the burden of problem solving on the client side, while only the repository is maintained on the server side. The following techniques try to shift the computational burden to the server side, making it possible for thin clients to utilize the system.

OSCAR [16] is a toolkit which aids in package management. An OSCAR package includes a meta-file that describes the package. Packages are classified as core (required by OSCAR to function), included (included in the official OSCAR release) and third-party. The main distinguishing feature of OSCAR is image-based management. An image offers a virtual view of the system with its current information profile. This image is stored, and manipulations such as installation and upgrade are done on the image server. When a suitable information profile is reached, the client can synchronize with the image server. This is beneficial to the client, as the entire process is done on the server side. It is easy to perform a rollback by synchronizing with an older saved image. However, this requires more memory on the server side. It is suitable for a high-performance scenario where abundant resources are available.

St. Pierre and Hermanson [17] highlight a technique where homogeneous package management is possible using customized configuration scripts for a similar set of clients. Clients synchronize automatically with upstream repositories customized for them. The server receives a specification, performs dependency resolution and returns the correct configuration. Repositories are tiered for efficient use. This technique faces problems when multiple clients are present and each client specifies constraints for the upgrade or removal of a package. This prevents the server from giving a homogeneous solution. Researchers have suggested methods to overcome this drawback.

VI. FUTURE TRENDS

1. Standardization of package metadata in order to ensure the compatibility of package management tools with one another. The Linux Standard Base endorses the RPM format. CUDF works with both the deb and RPM formats to create its own metadata and can be used by other tools.
2. Package managers should be written in low-level languages like C so that the tools built on top of them can use the available bindings and extend the features. Package managers written in Perl did not have a good reach, as Perl is not supported by many Linux distros. Similarly, the use of high-level languages creates compatibility issues.
3. Test-before-install and rollback-after-install are desirable features in package management. With packages available from various vendors, users need to try, install and roll back to the previous information profile if needed. Most package managers provide this feature, but some lack the ability to perform runtime failure checks and clean rollbacks.
4. Package managers should be able to pick the best solution with zero percent failure. Better and newer


optimization techniques that are capable of finding an optimal solution are needed. Independent solvers that can work with any packaging format and system are necessary.
5. User preferences should be accommodated in formulating the best solution. The optimization criteria must take as much user input as possible and render an optimal solution, with an effort to accommodate the criteria in a priority-based manner.
6. Metadata maintained by package managers must be minimal and accommodating to change. Compressed files and light databases are used by package managers to maintain their metadata. This metadata should be standardized to ensure compatibility.
7. Server-side computational assistance may be used to provide better solutions to the clients. As soon as updates are available, the server can synchronize with the client. A homogeneous setup across multiple clients is possible.
8. Easy replicability of a desired configuration on multiple servers. The information profile of a server can be replicated on another by feeding it the necessary configuration metadata.

CONCLUSION

This paper provided a survey of the challenges in package management and the types of package management. It showcased server-aided package management and the performance of some client-side package managers. As more vendors opt for packaged formats to distribute software, future solutions in package management need to address configuration rollback, scalability, reliability and the incorporation of user preferences. Adapting to standards will ensure compatibility with GUI tools and plugins. Server-side assistance for package management can be used more intensively with the availability of low-cost servers and parallel processors.

REFERENCES

[1] Roberto Di Cosmo, Stefano Zacchiroli, Paulo Trezentos, "Package Upgrades in FOSS Distributions: Details and Challenges." In Proceedings of the 1st International Workshop on Hot Topics in Software Upgrades, pages 1-5, ACM, 2008.

[2] LAN Yu-Qing, DUAN Xiao-Gang, GAO Jing, ZHOU Wen-Bin, ZHAO Hui, "Extraction Methods on Linux Package Dependency Relations." Information Engineering and Computer Science, 2009. ICIECS 2009.

[3] Eelco Dolstra and Andres Löh, "NixOS: A Purely Functional Linux Distribution." In ICFP 2008: 13th ACM SIGPLAN International Conference on Functional Programming, pages 367–378, September 2008.

[4] David B. Tucker and Shriram Krishnamurthi, "Applying module system research to package management." In Tenth International Workshop on Software Configuration Management (SCM-10), 2001.

[5] E.V. Lyubimkin, Cupt package manager, https://wiki.debian.org/Cupt, 2010.

[6] Gustavo Niemeyer, http://labix.org/smart

[7] Pietro Abate, Roberto Di Cosmo et al., "MPM: A Modular Package Manager." CBSE'11, June 2011.

[8] Josep Argelich, Daniel Le Berre, Inês Lynce, "Solving Linux upgradeability problems using boolean optimization." In LoCoCo: Logics for Component Configuration, pages 11-22, 2010.

[9] Olivier Roussel, Vasco M. Manquinho, "Pseudo-Boolean and Cardinality Constraints." In Armin Biere, Marijn Heule, Hans van Maaren and Toby Walsh, editors, Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications 185, IOS Press, pages 695–733, 2009.

[10] YaST. http://en.opensuse.org/Package_management. 2011.

[11] P. Trezentos, I. Lynce and A. L. Oliveira, "Apt-pbo: solving the software dependency problem using pseudo-Boolean optimization." In C. Pecheur, J. Andrews and E. D. Nitto, editors, 25th IEEE/ACM International Conference on Automated Software Engineering, pages 427–436, ACM, Sept. 2010.

[12] http://www.mancoosi.org/misc-2010/ 2010.

[13] Chris Tucker, David Shuffelton, Ranjit Jhala and Sorin Lerner, "OPIUM: Optimal Package Install/Uninstall Manager." In ICSE'07 (29th International Conference on Software Engineering), IEEE Computer Society, pages 178–188, 2007.

[14] M. Gebser, R. Kaminski, T. Schaub, "Aspcud: a Linux package configuration tool based on answer set programming." LoCoCo, volume 65 of EPTCS, 2011.

[15] Claude Michel, Michel Rueher, "Handling software upgradeability problems with MILP Solvers." In Proceedings of LoCoCo 2010, EPTCS 29, pages 1-10, 2010.

[16] John Mugler, Thomas Naughton and Stephen L. Scott, "OSCAR Meta-Package System." In Proceedings of the 19th International Symposium on High Performance Computing Systems and Applications, pages 353-360, 2005.

[17] Chris St. Pierre, Matt Hermanson, "Staging package deployment via repository management." In LISA'11: Proceedings of the 25th International Conference on Large Installation System Administration, pages 1-1, 2011.


