Package Manager: the Core of a GNU/Linux Distribution
Package Manager: The Core of a GNU/Linux Distribution

by Andrey Falko

A Thesis submitted to the Faculty in partial fulfillment of the requirements for the BACHELOR OF ARTS

Accepted
Paul Shields, Thesis Adviser
Allen Altman, Second Reader
Christopher K. Callanan, Third Reader
Mary B. Marcy, Provost and Vice President

Simon’s Rock College
Great Barrington, Massachusetts
2007

Abstract

As GNU/Linux operating systems have grown in popularity, they have also grown in size. To deal with this growth, people created special programs to organize the software available to GNU/Linux users. These programs are called package managers. There exists a very wide variety of package managers, each offering its own benefits and deficiencies.

This thesis explores all of the major aspects of package management in the GNU/Linux world. It covers what it is like to work without package managers, primitive package management techniques, and modern package management schemes.

The thesis presents the creation of a package manager called Vestigium. The creation of Vestigium is an attempt to automate the handling of file collisions between packages, provide a seamless interface for installing both binary packages and packages built from source, and allow per-package optimization. Some of the features Vestigium is built to have are lacking in current package managers, and no current package manager contains all of them. Additionally, the thesis explains the problems that developers face in maintaining their respective package managers.

Acknowledgments

I thank my thesis committee members, Paul Shields, Chris Callanan, and Allen Altman, for being patient with my error-ridden drafts. Thank you for the feedback, the encouragement, and being diligent in responding to my questions.

I thank my most excellent friend, Rigels.
You have encouraged me throughout this whole process. Every time I stop and think about how much work I need to complete, I say to myself, “Rigels is working really hard to become a dentist. I need to work hard to learn computers inside out”. I will always cherish how you would greet me and ask, “What’s this? Where are the commits [to the thesis project subversioning repository]? I want to see more commits”. I thank you for being so interested in my project and sometimes forcing me to bounce complicated ideas off of you. I also thank you for coming to my residence one day and helping me debug the most difficult pieces of code of Vestigium. I would not have enjoyed writing this thesis without your presence.

I thank Daniel Villarreal for sacrificing his time to proofread the thesis to help me prepare it for its final draft. I thank Ryan Hoffman for showing interest in the project and for sacrificing his time to proofread the third chapter. I thank Evan Didier for showing interest in my thesis and pointing out errors in the final draft. I thank Fedor Labounko, Mike Haskel, and my uncle Leo for showing interest in my thesis project and allowing me to bounce some ideas off of you all. I further thank Fedor for answering many of my questions about the thesis formatting and writing process.

I thank the open source software developers who volunteer their free time to develop software that I use free of charge and almost free of problems. Your work inspires millions of people, including me.

Finally, I thank my father for instilling in me the discipline, will, and patience necessary for completing projects of this magnitude. I thank you for providing the resources to make everything possible. I also thank you both for showing interest in my work and for helping me overcome all of the hardships that I had to overcome in life.

Contents

0.1 Introduction

1 Optimization
  1.1 Linux
    1.1.1 File Systems
  1.2 Toolchain
    1.2.1 Glibc
    1.2.2 Gcc
    1.2.3 Binutils
  1.3 Benchmarking
    1.3.1 Scimark
    1.3.2 Bashmark
    1.3.3 Lame
    1.3.4 Povray
    1.3.5 Kernel Benchmarks

2 Package Managers: Making Life Easy
  2.1 Linux From Scratch
  2.2 Package Management Techniques
    2.2.1 Directory and PATH Method
    2.2.2 Directory and Link Method
    2.2.3 Timestamp Method
    2.2.4 Users Method
    2.2.5 LD_PRELOAD Method
    2.2.6 Temporary Build Directory Method
  2.3 Binary Packages
  2.4 Primitive Package Managers
    2.4.1 Pkgtool
    2.4.2 RPM
    2.4.3 Dpkg
  2.5 High-level Package Managers
    2.5.1 Apt
    2.5.2 Swup
    2.5.3 Pacman
  2.6 Centralized Package Managers
    2.6.1 Portage
  2.7 Maintaining Package Managers
    2.7.1 Keeping Up To Date
    2.7.2 ABI Changes
    2.7.3 Toolchain Weaknesses
    2.7.4 Branches
    2.7.5 Bug Reporting Systems
    2.7.6 Optimization
  2.8 Frontends

3 The Birth of a Distribution: Developing the Package Manager
  3.1 Determining Features
  3.2 User Interaction
  3.3 Implementation
    3.3.1 Libraries
    3.3.2 User Requests
    3.3.3 Queues
    3.3.4 Subroutines
    3.3.5 File Tracking
  3.4 Strengths and Weaknesses of Implementation

4 Maintaining a Distribution
  4.1 Using Simulation to Estimate Personnel Requirements
  4.2 The Maintainer Scheme: Pros and Cons
  4.3 Attracting and Keeping Developers
  4.4 Attracting and Keeping Users

5 Conclusion
  5.1 Further Research

A Understanding Levels of Optimization

B Circular Dependencies

C Source Code
  C.1 Vestigium
    C.1.1 vestigium
    C.1.2 cont.lib
    C.1.3 names.lib
    C.1.4 search.lib
    C.1.5 cp
    C.1.6 install
    C.1.7 ln
    C.1.8 mkdir
    C.1.9 mv
    C.1.10 vestigium.conf
    C.1.11 subarchs
    C.1.12 vesport
  C.2 Sysmark
    C.2.1 sysmark
    C.2.2 parse
    C.2.3 README
    C.2.4 Makefile
  C.3 Simulation
    C.3.1 main
    C.3.2 maintainers-create
    C.3.3 pkg-comp-times-retrieve
    C.3.4 retrieve-pkg-commit-times

0.1 Introduction

Contrast a small village with a large city. In the village, everyone knows everyone else, what everyone does, the conflicts between people, who is friends with whom, etc. In a city, the opposite is the case. One does not know the names of every person one passes. No one knows whether two individuals might argue if they are in close proximity. The village people can manage themselves without needing a large bureaucratic order, while the city remains standing only because of such a bureaucratic order.

GNU/Linux distributions are like a city. A GNU/Linux distribution is made up of thousands of pieces of software. It is nearly impossible to have a usable distribution with fewer than three hundred packages. To manage all of the pieces in a distribution, some sort of structure is needed. This structure is centered around the idea of package management. The purpose of package management is to allow the tracking of all the pieces of software that comprise a distribution. In this way, civil order can be maintained and conflicts avoided.

Most GNU/Linux distributions differ from proprietary operating systems, such as Microsoft’s Windows and Apple’s Mac OS, in how much control a user can have. GNU/Linux operating systems, depending on how their package management systems are organized, allow the user greater flexibility in shaping the operating system. This flexibility can address significant needs and preferences, ones more significant than, for instance, merely being able to change the background picture on a desktop. The needs in this context might include things like bug fixes, security upgrades, and hardware constraints, while preferences might involve, say, a user’s preference for a minimal versus a maximal set of software, or their desire to optimize the system for particular hardware.
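To make the idea of tracking concrete, the following is a minimal sketch of how a record of which package owns which file can expose conflicts between packages. It is an illustration only and is not Vestigium's implementation. It assumes that each package supplies a manifest listing the files it installs; the function name find_collisions, the package names, and the paths are all hypothetical.

```python
from collections import defaultdict


def find_collisions(manifests):
    """Return the paths claimed by more than one package.

    `manifests` maps a package name to the list of file paths it installs
    (an assumed format chosen for this sketch). The result maps each
    contested path to the sorted list of packages that claim it.
    """
    owners = defaultdict(set)
    for package, paths in manifests.items():
        for path in paths:
            owners[path].add(package)
    # Keep only the paths that have more than one owner.
    return {path: sorted(pkgs) for path, pkgs in owners.items() if len(pkgs) > 1}


# Hypothetical manifests: both packages want to install the same file.
manifests = {
    "alpha": ["/usr/bin/alpha", "/usr/share/info/dir"],
    "beta": ["/usr/bin/beta", "/usr/share/info/dir"],
}

collisions = find_collisions(manifests)
for path, packages in collisions.items():
    print(path, "is claimed by", ", ".join(packages))
```

With a record of this kind, a package manager can notice a conflict before any file is overwritten and then decide how to resolve it, rather than discovering the damage afterwards.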
This thesis examines package management in GNU/Linux distributions and explores almost every aspect of package management. We examine the intricacies of optimizing software as well as bootstrapping compilers. We probe and describe just about all theoretical and practical ways to manage software on a system. We go as far as creating a package manager and outlining how distributions can manage their personnel.

The first chapter delves deeply into how a GNU/Linux system is optimized. This chapter introduces the reader to most aspects of optimization that.