distri: researching fast Linux package management
Michael Stapelberg (@zekjur), 2020-10-11

Overview
● 1-minute introduction
● demo videos: Arch vs. distri package installation speed
● Comparison with Arch Linux
● How does distri work?

Introduction: Michael Stapelberg
● Debian Developer for 7 years (2012-2019)
  ○ left Debian because of antique tooling and slow changes
● Using Arch Linux for 1 year
  ○ used Fedora and NixOS each for a few months
  ○ in a previous life, used Gentoo, Ubuntu and NetBSD
● Wrote the i3 tiling window manager in 2009
● other FOSS projects, too! Debian Code Search, RobustIRC, gokrazy, …

demo: installing “ack” (videos: Arch, distri)

demo: installing “qemu” (videos: Arch, distri)

Updates/package install: faster in distri
● transport compression
  → Arch switched to zstd on 2020-01-04
● mirror selection
  → Arch asks its users to maintain their mirror list
  → Why can’t Arch default to a CDN that’s fast everywhere?
● no hooks/triggers: maximum parallelism
  Arch is moving from package hooks to pacman hooks (e.g. sysusers)
● no unpacking stage: use images instead of archives

Updates/package install: more robust in distri
● Arch does not support partial upgrades
  → distri packages depend on the specific transitive closure, so they can always be installed
● Arch upgrades frequently require manual intervention
  → distri packages use separate hierarchies: file conflicts impossible :)
  → distri packages are hermetic: not easily broken by other packages on the system

Debugging experience
● Installing gdb should be all that a user needs to do:
  Debug symbols and sources of any package should be fetched on demand!
● Arch does not (yet) provide debug info for all packages
  Arch does not (yet) make symbols available transparently
  → will be solved with debuginfod
● (distri solves this on the package manager level)

Packaging experience
● quicker feedback → more engaging → more contributions
● isolating package builds from the host system should be the default
  Arch asks package maintainers to do manual chroot management

Changes over time
● declarative packaging is key to make changes happen
  the Arch package format is a custom format, not defined anywhere
  → want auto-formatting
  → want machines to be able to make edits (→ monorepo?)
  → express intents/end states, not mechanisms

How does distri work?

package manager speed: install “ack” (Perl)*

  distribution   package manager   data     wall-clock time   rate
  Fedora         dnf               114 MB   33s               3.4 MB/s
  Debian         apt               16 MB    10s               1.6 MB/s
  NixOS          Nix               15 MB    5s                3.0 MB/s
  Arch Linux     pacman            6.5 MB   3s                2.1 MB/s
  Alpine         apk               10 MB    1s                10.0 MB/s

  rate = data ÷ wall-clock time
  * standard installation, includes metadata & package download and dependencies
  → https://michael.stapelberg.ch/posts/2019-08-17-linux-package-managers-are-slow/

Why are package managers slow?
● 2 most widely used package formats:
  ○ deb (Debian package), tar(1) in ar(1)
  ○ rpm (Red Hat Package Manager), metadata around cpio(1)
  ○ (Arch: tar(1) with metadata)
● task: make package contents available
  → e.g. pacman -S nginx results in /usr/bin/nginx
● traditionally: resolve deps, download, extract, configure
  → need to carefully fsync(2) to make I/O as safe as possible

How can we go faster?
append-only package store of immutable images
1. use an image format (e.g. SquashFS) instead of an archive format
2. mount each image under its own path (“separate hierarchies”):
   e.g. /ro/nginx-amd64-1.14.1/…
   e.g. /ro/zsh-amd64-5.6.2/…
3. (rest of the system as usual, e.g. /etc, /var/cache, …)

advantages
● mount instead of extract
  → faster package installation
  → faster build environment composition
● append-only: can use unsafe I/O
● immutable: no longer possible to screw up your installation

hermetic packages
● when run, use the same version of dependencies as when built
● a wrapper script sets e.g. LD_LIBRARY_PATH, PYTHONPATH, PERL5LIB, …

separate hierarchies: exchange dirs
● packages exchange data via directories with well-known paths, e.g.:
  man(1) ⟷ nginx(1) via /usr/share/man
  gcc(1) ⟷ libusb(3) via /usr/include
● prudent approach: emulate well-known paths
  e.g.: /usr/include/jpeglib.h is a symlink to
  /ro/libjpeg-turbo-amd64-2.0.0/out/include/jpeglib.h

separate hierarchies: exchange dirs (per package)
● loose coupling (global) vs. tight coupling (per package)
  → typically suitable for plugin mechanisms where ABI must match
● e.g. /ro/xorg-server-amd64-1.20.3/out/lib/xorg/modules/

separate hierarchies: advantages
● move conflicts from package installation to program execution
  → only need to resolve /bin/python (2.7 or 3?) when assembling /bin
● packages always co-installable
  e.g. zsh-amd64-5.6.2 and zsh-amd64-5.6.3
  → partial updates/rollbacks easily possible
● package manager can be version-agnostic!
  → entirely eliminates a large source of slowness
  → no need for global metadata, package-specific metadata sufficient

immutability
● package contents and exchange dirs are read-only
● rarely, programs expect the system to be writable
  e.g. GNOME’s gsettings wants a cache in the exchange directory
● such designs need to be improved upstream:
  1. good caches are not required (fallback to slow path)
  2. good caches are transparently created
  3. good caches are automatically updated when needed

no hooks/triggers (1)
● hook (or maintscript, postinst, …): program run after package installation
  trigger: program run after other package installation (e.g. man-db)
  → work done at package-installation time, which may be unnecessary
● preclude concurrent package installation (not implemented in a concurrency-safe way)
● arbitrary code, can be slow

no hooks/triggers (2)
● claim: we can build a functioning system without hooks/triggers
  1. packages declare what they need (e.g. sysusers)
  2. move work from package installation to program execution
     e.g. ssh needs a hostkey: create it in sshd(8) wrapper script
● very few exceptions: bootloader or firmware
  (need to install them outside of the file system)

practicality
● FUSE file system for providing /ro
  → easier to implement than managing separate mounts, overlays, unions, …
  → faster (!), as kernel mounts are slow
● packages need to be built with --prefix=/ro/nginx-amd64-1.14.1 etc.
● a small number of packages need to be patched
  → path-related issues (e.g. service files, gcc, gobject, automake, …)
  → deep system integration (e.g. dracut)

practicality (2)
● removal of hooks is not for everyone
  → configuration layers (debconf, YAST, …) might be a feature to some

Why is distri faster?
● traditionally: resolve deps, download, extract, configure
  + careful fsync(2) to make I/O as safe as possible
● distri: resolve deps, download image (no extract or configure step; unsafe I/O okay)
  → scales to 12+ GB/s (!) on 100 Gbit links using Go’s net/http

conclusion
1. append-only package stores are more elegant than mutable systems
   → simpler design, faster implementation
2. exchange directories make things seem normal to third-party software
   → can compile unpackaged software, can run closed-source binaries
3. all of these ideas are practical
   → live CDs (read-only) and cross-compilation paved the way

project goals
● Not trying to build a community or user base!
● Instead, distri enables (my) Linux distribution research, with regular proof of concept releases
● Now that you know the pain points and how fast it could be, maybe you can improve things? :)
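
The “hermetic packages” slide says a wrapper sets variables such as LD_LIBRARY_PATH so that a program runs against the dependency versions it was built with. The following is a minimal sketch of that idea in Go, not distri’s actual wrapper: the function name hermeticEnv, the dependency names and versions, and the assumption that libraries live under out/lib of each image are all illustrative.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hermeticEnv returns a copy of base in which LD_LIBRARY_PATH lists only the
// lib directories of the given dependency images under root (e.g. /ro).
// The out/lib layout is an assumption made for this sketch.
func hermeticEnv(base []string, root string, deps []string) []string {
	libDirs := make([]string, 0, len(deps))
	for _, dep := range deps {
		libDirs = append(libDirs, path.Join(root, dep, "out", "lib"))
	}
	env := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if strings.HasPrefix(kv, "LD_LIBRARY_PATH=") {
			continue // drop the inherited value so only build-time deps are visible
		}
		env = append(env, kv)
	}
	return append(env, "LD_LIBRARY_PATH="+strings.Join(libDirs, ":"))
}

func main() {
	// Hypothetical dependency list; a real wrapper would have it recorded at build time.
	deps := []string{"zlib-amd64-1.2.11", "pcre-amd64-8.42"}
	env := hermeticEnv([]string{"HOME=/root", "LD_LIBRARY_PATH=/usr/lib"}, "/ro", deps)
	for _, kv := range env {
		fmt.Println(kv)
	}
	// A real wrapper would now replace itself with the packaged binary,
	// for example via syscall.Exec(binary, os.Args, env).
}
```

The same pattern would extend to PYTHONPATH or PERL5LIB by joining different subdirectories of each image.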

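The “exchange dirs” slides describe well-known paths such as /usr/include/jpeglib.h being symlinks into a package image under /ro. The sketch below shows how such a directory of symlinks could be assembled with plain file system calls. It is an illustration only: in distri, /ro is provided by a FUSE file system and the exchange directories are read-only, and the helper names linkInto and demo are hypothetical. The demo works inside a temporary directory so it does not touch the real /usr or /ro.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkInto creates, for every non-directory entry directly inside srcDir,
// a symlink with the same name in exchangeDir pointing at the original file.
func linkInto(exchangeDir, srcDir string) error {
	entries, err := os.ReadDir(srcDir)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(exchangeDir, 0o755); err != nil {
		return err
	}
	for _, e := range entries {
		if e.IsDir() {
			continue // sketch: nested directories are not handled
		}
		target := filepath.Join(srcDir, e.Name())
		if err := os.Symlink(target, filepath.Join(exchangeDir, e.Name())); err != nil {
			return err
		}
	}
	return nil
}

// demo builds a throwaway tree that mimics the paths from the slide and
// returns the symlink's actual target (got) and the expected target (want).
func demo() (got, want string, err error) {
	root, err := os.MkdirTemp("", "exchange-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)

	src := filepath.Join(root, "ro", "libjpeg-turbo-amd64-2.0.0", "out", "include")
	if err := os.MkdirAll(src, 0o755); err != nil {
		return "", "", err
	}
	want = filepath.Join(src, "jpeglib.h")
	if err := os.WriteFile(want, []byte("/* placeholder header */\n"), 0o644); err != nil {
		return "", "", err
	}

	exchange := filepath.Join(root, "usr", "include")
	if err := linkInto(exchange, src); err != nil {
		return "", "", err
	}
	got, err = os.Readlink(filepath.Join(exchange, "jpeglib.h"))
	return got, want, err
}

func main() {
	got, want, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("symlink target:", got)
	fmt.Println("matches package file:", got == want)
}
```

Because each link names a versioned package directory, two versions of the same package can coexist under /ro, and the choice between them is made only when the exchange directory is assembled.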