<<

Masaryk University Faculty of Informatics

Modern development workflow for Fedora GNU/Linux distribution

Master’s Thesis

Bc. František Lachman

Brno, Spring 2019

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Bc. František Lachman

Advisor: Ing. Milan Brož, Ph.D.


Acknowledgements

I would like to express my appreciation to my supervisor, Doctor Milan Brož, for his thorough mentoring and all the valuable advice he gave me. I would like to thank my technical consultant, Tomáš Tomeček, for guiding me during the last year and for all the discussions, reviews and suggestions; I really appreciate it. Also, I would like to thank all members of the Packit team for being very helpful to me. Finally, I would like to thank my wife for her support and understanding during the work on the thesis. The thesis would not exist without any of you.

Abstract

In this thesis, we will look at the current state of software packaging in current GNU/Linux distributions. Using three representatives (Debian, Arch Linux and Fedora), we will see various parts of the workflow and compare the different approaches to the same problems. The goal of the thesis is to propose a modern development workflow for the Fedora GNU/Linux distribution. We will describe related tools and partial solutions that exist for this distribution. Finally, we will discuss the proposed solution and examples of the real setup of the prototype.

The thesis is licensed under a Creative Commons Attribution 4.0 International license.

Keywords

GNU/Linux, packaging, distributions, git, Fedora, RPM


Contents

Introduction 1

1 Current state of three Linux distributions 3
1.1 Quick overview of the distributions 3
1.2 Package Specification 5
1.3 Storing distribution data 8
1.4 Distribution versioning scheme, repositories 11
1.5 Packager workflow 14
1.6 Building packages 16
1.7 Security, formal requirements 18
1.8 Package sources, editing of the distribution packages 20

2 Existing solutions 21
2.1 Similar or related tools 21
2.2 Custom solutions 26

3 Proposed solution for Fedora ecosystem 31
3.1 Key principles 31
3.2 Source-git 34
3.3 Upstream-downstream format conversion 36
3.4 Workflow 38

4 Implementation 43
4.1 Security and compliance 51
4.2 Implemented requirements 53

5 Use cases 55

Conclusion 59

Bibliography 61

Glossary 69

A Appendices 75
A.1 Packit CLI 75
A.2 Sources 83


List of Figures

1.1 Debian FTP server 8
1.2 Arch SVN tree 9
1.3 dist-git scheme 9
1.4 Pagure over dist-git 10
2.1 Spec file template for rpkg-util (content from [68]) 24
2.2 RDO packaging overview ([63]) 29
3.1 pull-request workflow 32
3.2 source-git repository 34
3.3 Commits in the source-git repository 35
3.4 Transformation of the source-git to dist-git 37
3.5 Automatically created pull-request in dist-git 39
3.6 Automatically created downstream commit 39
3.7 Fedora CI system status for the dist-git pull-request 40
3.8 Synchronize downstream to upstream 40
3.9 Packit-as-a-service integration with Copr 42
4.1 Dependency graph 49
4.2 Packit-as-a-service GitHub application 50
5.1 systemd source-git 58


Introduction

One of the benefits of using Linux distributions is the ability to install software on the system easily. There is a significant amount of work that needs to be done after a release of a program before it can be installed as a package on the system. The conversion from the source code to the installable package is not only about compilation. Today's package managers are responsible for sharing and resolving dependencies and tracking installed files. There are various ways to fulfil these responsibilities. The server distrowatch.com tracks more than three hundred different Linux distributions1. Many of them share a packaging system, some of them build on top of others or are influenced by others, but there are many very different ideas and approaches to how the packaging can be done. In the first section of the thesis, we will look at three important representatives. We will see a description of their solutions from the perspective of a maintainer, a person who is responsible for the conversion from source code to the installable package and for taking care of that package.

The goal of the thesis is to propose a new maintainer workflow for Fedora, one of the Linux distributions. Here are the requirements for the prototype:
• Transform the content of the upstream project into a format which the Fedora distribution build system understands.
• The transformation process should distinguish between the pristine upstream archive and additional changes layered on top; they should not be mixed together by default.
• The process has to be configurable by a user.
• Submit the changes as a new pull request in the Fedora distribution's code repository.

The proposed workflow has to fulfil the following points:
• An easy way to contribute: a user can propose a change with a minimal amount of steps.
• Maintainers can easily collaborate with upstream projects. The tooling would enable downstream maintainers to pick up fixes from upstream, and maintainers are able to use the prototype to report issues upstream.
• Provenance: the commits are signed, and it is possible to track down the origin of a certain artefact.
• Continuous Integration: contributors are getting feedback for their proposed changes.

The second section will describe tools that try to simplify or automate parts of the maintainers' workflow. These are used for gathering convenient principles and approaches for the Fedora Linux distribution, the target Linux distribution of the proposed solution. The prototype will be described in the third section of the thesis. That section will cover the principles and ideas that were used during the implementation. The implementation will be described in Section 4, which covers the used tools and libraries, the internal structure of the code, and the conclusion of the requirements. The fifth, and last, section will show the setup of the projects that were used for demonstrating the use of the prototype.

1. By the time of writing of the thesis.

1 Current state of three Linux distributions

In this chapter, we are going to show the current state of the art of three distributions of GNU/Linux, the UNIX-like operating system. After a short introduction of the used examples, we are going to discuss several packaging topics to see the differences among the distributions' approaches.

• As a first model, we are going to use Debian as a representative of deb-based Linux distributions (e.g. Ubuntu or Linux Mint).
• For the second point of view, Arch Linux is going to be used.
• The third example will be Fedora, representing distributions based on RPM (e.g. RHEL, CentOS or SUSE). Since Fedora will also be used in the implementation part, there is a strong emphasis on this distribution in the following parts. We will describe its approach more deeply and with more technicalities.

1.1 Quick overview of the distributions

1.1.1 Debian and deb-based distributions

Debian is the operating system by The Debian Project, an association of people who have made common cause to create a free1 operating system [77]. Debian was created in August 1993 and named after its creator, Ian Murdock, and his wife, Debra. It uses the Linux or FreeBSD kernel; there is also ongoing work on supporting GNU/Hurd [78]. According to the server distrowatch.com, Debian is used as a base for nearly four hundred distributions2, of which 135 are now active3. We can name Ubuntu, elementary OS, or Linux Mint as examples.

1. The meaning of the term free in this context and its relation to Open Source can be found in [79].
2. A current list of Debian derivatives can be seen at the following URL: https://distrowatch.com/search.php?basedon=Debian
3. Active derivatives are listed at this URL: https://distrowatch.com/search.php?basedon=Debian&status=Active


The packaging system of Debian, which the derivatives inherit, will be described later in this chapter.

1.1.2 Arch Linux

Arch is a Linux distribution that aims to be simple, agile and lightweight. It was founded in 2002 and was inspired by other Linux distributions [55]. It is now (29 April 2019) the 16th most popular distribution according to distrowatch.com. Arch is a rolling-release system that is highly configurable and assembled by the user. The "simple" in The Arch Way means without unnecessary additions, modifications, or complications. (It is "simple" from a technical, not a usability, point of view.) [55] The Arch way4 can be defined (based on [22]) with five core principles:
• simplicity,
• code-correctness over convenience,
• user-centric,
• openness,
• freedom.

1.1.3 Fedora and RPM-based distributions

The history of the Fedora Linux distribution is closely coupled to Red Hat Linux. According to [59], it started as a repository of newer and more experimental software for Red Hat Linux, one of the earliest Linux distributions. The start of the real distribution (called Fedora Core) was caused by the move from Red Hat Linux to Red Hat Enterprise Linux (RHEL). Fedora Core was not governed by Red Hat, but by the newly created Fedora Project. The project was later renamed to just Fedora and is driven by the community with support from Red Hat, which can use Fedora as a test bed for new technologies and applications that may later be included in Red Hat Enterprise Linux. [59]

4. The Arch way can be generalized as KISS (Keep It Simple Stupid) [22].

1.2 Package Specification

1.2.1 Debian package formats

Debian packages are regular archives with the .deb file extension and a special format of the content. The specification is registered by the Internet Assigned Numbers Authority (IANA) [46]. In the deb archive, named package_version_architecture.deb, we can find the following files [49], [45]:

• debian-binary: The version of the .deb file format. Consists of a series of lines; currently, only one line is present (equal to 2.0).

• data.tar.gz: An archive containing the filesystem with the files that will be installed. (The paths are relative to the root.) Other compression methods can be used as well, and it is also possible to use an uncompressed tar archive. (The file extension defines the used method.)

• control.tar.gz: An archive with package metadata. It contains the control file, which has an email-header-like format (defined in RFC 2822 [51]). We can name some possible fields:
  – package name,
  – version,
  – maintainer,
  – architecture,
  – depends, replaces, conflicts, suggests, ...
  – description,
  – size, installed size, hash sums.
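For illustration, a minimal control file could look as follows (all names and values are hypothetical):

Package: examplepkg
Version: 1.0.0-1
Architecture: amd64
Maintainer: Example Maintainer <maintainer@example.org>
Installed-Size: 1024
Depends: libc6 (>= 2.17)
Section: utils
Priority: optional
Description: Example command-line utility
 A longer description of the package, indented by one space
 on the continuation lines.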

There are also source packages. They are used for transforming the source code (written in a programming language) into the binary deb packages (with the compiled code). They usually consist of three files [49]:

• .dsc: Debian Source Control, contains the header defined in [51] like the control file. (The specified dependencies are for the build of a binary package.)


• .orig.tar.gz: The original, pristine source file. (It is not allowed to change the file, in order to be able to easily check its origin and integrity. [49])
• .debian.tar.gz/.gz: An archive with the maintainer's changes and files needed for packaging (i.e. the debian directory).

1.2.2 Arch PKGBUILD file

Arch uses its ABS (Arch Build System), which is inspired by the ports system from *BSD. Each port is a directory named after the package and containing the build script [55]. (The source will be downloaded and does not need to be saved locally.) ABS offers the same functionality with a file named PKGBUILD. This file is a Bash script that contains the URL of the source code along with the compilation and packaging instructions [2]. In the file, there are constants (e.g. pkgname, pkgver or url) and a function for each step:
• prepare,
• build,
• check,
• package.
The file structure and guidelines can be seen on the project's wiki page ([6]) and an example PKGBUILD can be downloaded from [7].
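A minimal PKGBUILD sketch (with a hypothetical package name and URL) could look like this:

pkgname=examplepkg
pkgver=1.0.0
pkgrel=1
pkgdesc="Example package"
arch=('x86_64')
url="https://example.org/examplepkg"
license=('GPL')
source=("$url/releases/$pkgname-$pkgver.tar.gz")
sha256sums=('SKIP')

build() {
    cd "$pkgname-$pkgver"
    make
}

package() {
    cd "$pkgname-$pkgver"
    make DESTDIR="$pkgdir" install
}

Running makepkg in a directory with such a file downloads the source, runs the defined functions and produces the installable package.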

1.2.3 Fedora RPM, spec file

The key benefit of the earliest versions of Red Hat Linux was its software distribution. Its solution, RPP (Red Hat Software Program Packages), allowed installing and uninstalling packages on the system [39]. According to [13], the crucial feature was the ability to choose the installed software after the initial setup was over. Later, Red Hat introduced the Red Hat Package Manager, built on the following principles [39]:
• Ease of use.
• Package-oriented focus.
• Upgradability of packages.
• Tracking of package interdependencies.
• Query capabilities.
• Verification.


• Support for multiple architectures.
• Use of pristine sources.

After the adoption by other Linux distributions, the meaning of RPM was changed to RPM Package Manager (here, RPM refers to the package manager itself), but the shortcut stays the same [39]. Packages in this system are in the form of compressed archives. Such an archive can contain multiple files and installation instructions. [39] There are two types of RPM packages [39]:
• Binary RPM: a *.rpm archive containing the whole (and compiled) libraries or applications.
• Source RPM (SRPM): a *.src.rpm archive containing the sources and instructions (spec file) to build a binary package.

The package definition in the RPM system is a spec file (a shortcut for the specification file). It is a text file with the .spec extension [39]. As we can see in [13], it can contain the following entries:
• Comments: Human-readable notes.
• Tags: Data definitions.
• Scripts: Commands executed in a specific phase of the workflow.
• Macros: Code that will be expanded later.
• The %files list: Files that will be present in the package.
• Directives: Used to define a specific way to handle files in the %files list.
• Conditionals: Customise the spec file depending on various conditions (e.g. the target architecture).
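As an illustration of these entries, a minimal spec file for a classic autotools-style project might look as follows (package name, URL and maintainer are hypothetical):

Name:           examplepkg
Version:        1.0.0
Release:        1%{?dist}
Summary:        Example package

License:        MIT
URL:            https://example.org/examplepkg
Source0:        %{url}/releases/%{name}-%{version}.tar.gz
Patch0:         fix-build.patch

%description
Longer description of the example package.

%prep
%autosetup -p1

%build
%configure
%make_build

%install
%make_install

%files
%{_bindir}/examplepkg

%changelog
* Mon Apr 29 2019 Example Maintainer <maintainer@example.org> - 1.0.0-1
- Update to version 1.0.0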

1.3 Storing distribution data

This part will describe the systems used for uploading or storing the packages or their updates. These systems are a core part of the thesis since they serve as the frontend of the distribution for the maintainer, and the implementation part tries to automate the gap between the upstream code and the package backend of the Linux distributions.

1.3.1 Debian FTP

The Debian repository uses an FTP (File Transfer Protocol) server as its backend for uploading packages [49]. The file structure of the FTP server can be seen in Figure 1.1. There also exist some local (e.g. European) FTP servers and dedicated servers for security updates [14].

ftp.upload.debian.org
└── /pub/UploadQueue/
    └── DELAYED
        ├── 0-day
        └── 1-day

*.security.upload.debian.org
└── /pub/SecurityUploadQueue/

Figure 1.1: Debian FTP server

There are multiple directories that form the upload queues. The standard queue is located at ftp.upload.debian.org in the directory /pub/UploadQueue/. There are also queues for delaying the updates (e.g. /DELAYED/1-day) or queues for security updates (/pub/SecurityUploadQueue/) on dedicated FTP servers.

1.3.2 Arch SVN tree

Arch Linux uses the SVN (Subversion) versioning system for storing distribution data. The main difference from Fedora is in the structure (see Figure 1.2). All packages are stored in one repository, where each package has its own directory. [3]


SVN
├── package-one
│   ├── repos
│   └── trunk
├── package-two
│   ├── repos
│   └── trunk
└── package-three
    ├── repos
    └── trunk

Figure 1.2: Arch package SVN tree

Since Arch is a rolling-release distribution [22], there is no need for versioning the whole Linux distribution. The packages are only split into multiple repositories. In combination with the architecture, each repository is represented as a subdirectory. There is also a trunk subdirectory for development. [3] The whole package directory can have the following structure:
• repos – ./$repository_name-$architecture/PKGBUILD
• trunk – Used by developers before copying to the repos directories.

1.3.3 Fedora dist-git

The Fedora Linux distribution (and also Copr [18]) uses so-called dist-git as its package backend. It is a git server with a repository for each package [27]. The structure is described in Figure 1.3.

[Figure content: three dist-git repositories (package-one, package-two, package-three), each with per-release branches such as master, f29 and f28.]

Figure 1.3: dist-git scheme


In the repository, the spec file and other distribution files (e.g. configuration files) are stored. Another important file is sources, which contains the name and hash of the upstream source(s) that are saved in the lookaside cache. This service is used for storing source archives outside of the version control system, which is not well suited for saving big files. (The difference between versions of the source file can be huge and meaningless [35].) Each dist-git repository has a git branch for each release and a master branch for rawhide, the development version of the Fedora Linux distribution. Using the git system brings us multiple benefits. We can use git authentication, local cloning of the repositories and operations that are common for many developers. For more convenient (developer-like) access, there is a Pagure instance built on top of the dist-git. It is an open-source alternative to GitHub or GitLab and allows well-known operations like forking, pull requests or issues [56]. It has a web frontend as well as a REST (REpresentational State Transfer) API (Application Programming Interface) (documented in the /api/0 endpoint of each instance, e.g. https://pagure.io/api/0), which will be intensively used in the implementation part. Figure 1.4 shows the relation to dist-git.

[Figure content: Pagure (web frontend and REST API, providing pull requests, releases, wiki, forks and issues) built on top of a dist-git repository (package-one) with branches master, f29 and f28.]

Figure 1.4: Pagure over dist-git
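To illustrate the sources file mentioned above, a single entry in the newer checksum format (hypothetical file name, shortened hash) can look like this:

SHA512 (examplepkg-1.0.0.tar.gz) = 2f1e8a...c9a4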

1.4 Distribution versioning scheme, repositories

1.4.1 Debian package sources

In this sense, package sources do not mean the source packages. The term is related to /etc/apt/sources.list, where the user can define repositories (package sources) that can be used. There are various repositories available (each providing packages for multiple releases) [49]:

• Security updates: The security updates do not share the mirror infrastructure with the common repositories and are hosted at security.debian.org.

• Stable updates: Contains non-security updates needed before the next release. A carefully selected subset of the Proposed updates.

• Proposed updates: Place for preparing regular updates. Updates from this repository are moved to the stable every 2 months.

• Stable backports: Recompilations of new versions of packages for the old version of the system, where the new package version is not present in stable.
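A sketch of /etc/apt/sources.list entries for these repositories (suite names assume Debian 9 "stretch") could be:

deb http://deb.debian.org/debian stretch main
deb http://security.debian.org/debian-security stretch/updates main
deb http://deb.debian.org/debian stretch-updates main
deb http://deb.debian.org/debian stretch-proposed-updates main
deb http://deb.debian.org/debian stretch-backports main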

1.4.2 Arch repositories

As was written in the previous part, Arch does not have any versions. The packages are only divided into multiple repositories [55]:
• [core] repository: It is the base set of packages that are critical for the system. The packages have to be in a perfect state, and there is exactly one5 of each necessary tool for a base Arch system.

• [extra] repository: These are packages not critically needed for the system but which enhance it with more applications. It also contains graphical environments like GNOME or KDE.

5. There are some exceptions to this rule, e.g. two text editors (vim and nano) [55].


• [testing] repository: Here are the candidates for the [core] and [extra] repositories. The packages in this repository are waiting for testing or need other packages to be updated, and the move to the other repository needs to be atomic. The [testing] repository is not meant to be enabled in the regular system.

• AUR [community] repository: In the [community] repository, we can find packages maintained by Trusted Users. It is a binary branch of the AUR (Arch User Repository). The move from AUR [unsupported] is based on voting and adoption by a Trusted User.

• AUR [unsupported] repository: This is the unsupported branch of the AUR. It is not directly accessible by pacman (Arch's package manager). It is a huge database of PKGBUILD scripts. A package can be moved to the AUR [community] if it acquires enough votes and is adopted by a Trusted User.

1.4.3 Fedora versions and repositories

As was described in the previous part, the versioning of Fedora packages is represented as a git tree. Each Fedora release has its branch for each package that wants to be part of that release. A git branch for a new Fedora version is so-called branched from the master (rawhide) branch (article Releases/Branched in [30]). According to that article, the branching is done some time before the release to have time for stabilising and fixing problems when preparing a new release. When going forward from the dist-git towards the users, there is a concept of repositories. The following part is based on the Releases/Branched article in [31]. For each branched version of Fedora, there are multiple repositories:

• updates: Here come the updates after enabling Bodhi (which will be described later) on the branched version. Stable itself is not a repository; it can be considered a stable snapshot of the branched version before enabling Bodhi. A stable state of the release can be represented as a combination of the Fedora repository with the content from the updates repository.

• updates-testing: Here are the package updates before the move to the updates repository. This repository can be enabled by testers to check the update before it comes to the regular users.

The key moment for a Fedora release is the Bodhi enabling point. Bodhi is an update gating system and has the following workflow [26]:
1. When a packager submits a build in Koji (the Fedora build system), they can create an update in Bodhi.
2. Bodhi will send the newly-built version to the updates-testing repository.
3. Fedora users can now install the updated version by enabling the updates-testing repository6. These users can vote (+1/-1 karma) for the update.
4. If the update reaches the specified karma level (usually 3), it can be pushed to the updates repository (manually or automatically).
This allows democratic voting on and testing of the new updates. Like in the Debian ecosystem, there are also third-party RPM repositories. For example, there are repositories called rpm-fusion with content that the Fedora Project or Red Hat does not want to ship. The rpm-fusion free repository contains open-source content that might be patent encumbered in the US [76]. The rpm-fusion nonfree repository contains software that is not free and therefore cannot be shipped in the regular release, although it is legally redistributable [76]. Similar to the PPA (Personal Package Archives) repositories for Ubuntu-based systems, there are Copr repositories created by users. Copr is an easy-to-use automatic build system providing a package repository as its output [23]. It does not provide any guarantees but allows everyone7 to create their own RPM repositories. Copr will also be used in the prototype for creating the test builds from the upstream code.
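For illustration, a tester can pull a pending update from updates-testing, or enable a user's Copr repository, with commands like these (package and project names are hypothetical):

sudo dnf --enablerepo=updates-testing upgrade examplepkg
sudo dnf copr enable someuser/someproject     # requires the dnf copr plugin
sudo dnf install examplepkg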

6. By adding the --enablerepo=updates-testing option to the DNF command.
7. Only a FAS (Fedora Accounts System) account is needed.

1.5 Packager workflow

The following subsection will describe the actions that maintainers need to take to move upstream code changes to the Linux distribution. Another topic is the handling of the downstream patches. The aim of the prototype (presented in Section 3) is to reduce the manual steps in this workflow.

1.5.1 Updating Debian packages

The release of a new version in the Debian world is done in the following way [43]:
1. Download the old source package.
2. Download the new upstream source (package_newversion.orig.tar.gz).
3. Update the changelog.
4. Apply the patches.
5. Generate .changes.
6. Upload .changes (e.g. via dupload).
For manipulation with patches, Debian has a tool called dquilt. It simplifies the creation or editing of the patch files [43]:

• Adding a new patch:
  1. dquilt new patch_name.patch: Set a patch name.
  2. dquilt add file_to_change: Set the file that we need to change.
  3. Fix the file_to_change.
  4. dquilt refresh: Save the changes to the patch file.
  5. dquilt header -e: Add the description to the patch file.

• Editing an existing patch:
  1. dquilt pop old.patch: Set the patch to be edited.
  2. Fix the problem in the old.patch.
  3. dquilt refresh: Update the old.patch file. (Save the changes from the previous step.)
  4. dquilt header -e: Update the patch description.
  5. while dquilt push; do dquilt refresh; done: Apply the rest of the patches.


1.5.2 Updating Arch packages

Since the Arch packages are stored in the SVN repository, to update a package, the maintainer needs to update the PKGBUILD file in the package subdirectory, and upload and commit the changes. To simplify this manipulation, there is a group of scripts that automate the whole process (e.g. extrapkg or communitypkg) [3]. These scripts do a similar job as fedpkg does on top of the dist-git. For moving packages between the repositories, there is a script as well [3]:

ssh repos.archlinux.org "/packages/db-move fromrepo torepo packagename"

1.5.3 Updating Fedora packages

The two main parts needed for package updates were described in the two previous parts. Here, we will see the steps that the maintainer needs to take to propose a new upstream version to Fedora users.
1. Download the dist-git repository (either with fedpkg, a utility for working with dist-git, or directly via git).
2. For each Fedora branch:
   1. Update the spec file (e.g. the version, the source URL (Uniform Resource Locator) or a new changelog entry).
   2. Make other changes to the spec file (e.g. an update of the requirements) or add/edit the patch files.
   3. Use fedpkg new-sources to send a new archive to the lookaside cache (if needed) and update the sources file.
   4. Commit the changes.
   5. Push the changes to the dist-git.
   6. Propose a build in Koji, the Fedora build system.
   7. Create an update in Bodhi as was described in the previous part.
These are the manual steps, and this gives us the main goal of the thesis – to automate as much manual work as possible.
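A sketch of these steps on the command line (hypothetical package name; exact options may differ between fedpkg versions) might be:

fedpkg clone examplepkg && cd examplepkg
fedpkg switch-branch f29
# edit examplepkg.spec: bump Version, update Source0, add a %changelog entry
spectool -g examplepkg.spec                  # download the new upstream archive
fedpkg new-sources examplepkg-1.0.0.tar.gz   # upload it to the lookaside cache
git commit -am "Update to 1.0.0"
fedpkg push
fedpkg build                                 # Koji build for the current branch
fedpkg update                                # create the Bodhi update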

1.6 Building packages

1.6.1 Building Debian packages

The Debian package build process starts with two necessary directories [49]:
• the debian directory with the control file and other files related to packaging or system integration (e.g. menu.ex),
• a directory with the virtual root containing the files to be installed (e.g. ./package-root/usr/bin/command will be installed to /usr/bin/command).
In [43], there are the steps that need to be done during the build process:
1. Clean the source tree (debian/rules clean).
2. Build the source package (dpkg-source -b).
3. Build the program (debian/rules build).
4. Build the binary packages (fakeroot debian/rules binary). (Fakeroot is used to allow setting file permissions independently of the current user.)
5. Make the .dsc file.
6. Make the .changes file (dpkg-genchanges).

1.6.2 Building Arch packages

As was described in the package specification subsection, Arch packages are defined in the PKGBUILD file located in the SVN repository. The installable (and compiled) form of a package is the .pkg.tar.xz archive. (It is a format that is understandable by pacman.) The build is provided by the makepkg command. It uses the PKGBUILD file to know the necessary steps needed to build a package. (These steps are defined as functions and use variables defined in the file as well.) It provides .pkg.tar.xz files as a result of the build. [2]
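For example, building and installing a package from its PKGBUILD directory (hypothetical package file name) can look like this:

makepkg -s                                       # build; -s installs missing build dependencies
sudo pacman -U examplepkg-1.0.0-1-x86_64.pkg.tar.xz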

1.6.3 Building Fedora packages

To create a binary RPM package, we need the content of the dist-git repository for the package that needs to be built. This git repository contains all the necessary information to build a package – the location of the sources and the recipe for the actual build.


Here are the steps done by the rpmbuild tool [13]:
1. Execution of the commands and macros in the %prep section of the spec file.
2. Check of the file list.
3. Execution of the commands and macros in the %build section of the spec file.
The execution can be stopped in various phases, and the rpmbuild command can provide binary as well as source RPM files. There is also an %install section in the spec file that contains the commands and macros executed during the installation. There are multiple systems and tools built on top of rpmbuild that are used for building the RPM packages:
• mock: Allows building packages in a clean or different environment.
• Koji: The Fedora build system; provides RPM builds from the dist-git content.
• Copr: A build system providing a package repository as its output. SRPM files can be used for submitting a Copr build.
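A few illustrative commands built around these tools (hypothetical file and project names; rpmbuild expects the sources in its SOURCES directory):

rpmbuild -bs examplepkg.spec                     # build only the source RPM (SRPM)
rpmbuild -bb examplepkg.spec                     # build the binary RPM(s)
mock -r fedora-29-x86_64 examplepkg-1.0.0-1.fc29.src.rpm
copr-cli build someuser/someproject examplepkg-1.0.0-1.fc29.src.rpm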

1.7 Security, formal requirements

All three Linux distributions are part of the Reproducible Builds initiative. Reproducible builds are a set of practices that create an independently-verifiable path from source to binary code [73]. According to [72], a build is reproducible if anyone can build the same bit-by-bit copy of the artefact from the same given source code, build environment and build instructions. Reproducibility is a key point of the formal requirements on packages – it can give resistance against some attacks8, quality assurance, smaller binary differences, and increased development speed [71].

1.7.1 Authenticity of Debian packages

According to [49], the trust in package authenticity is built on top of GPG (GNU Privacy Guard) keys and a hierarchy of hashes. For each repository, there is a signed Release file that contains the Packages files and their hashes. These files contain a list of available packages with their hashes. The trusted keys are managed by the apt-key utility.

1.7.2 Authenticity of Arch packages

After quite a hard time without proper package signing9, Arch builds its authenticity checks on GPG in a Web of Trust model [5]. There is a set of master signing keys (listed at https://www.archlinux.org/master-keys/). At least three of them need to be used to sign the public key of each Trusted User or Arch developer. Developers use their keys for signing their packages. Users can then define a level of trust (globally or per package). Pacman will then use basic GPG methods when checking the downloaded packages. [5]

8. It is possible to detect the unintended changes to the package. 9. There is a blog-post [50], Arch author’s response [20] and closed issue [1] related to this. The start of implementation and the current setup is described in the series of blog posts by Allan McRae, a pacman developer ([8], [9], [10], [11]).


According to [4], the patching policy in Arch is very strict. Only those patches are possible that fix the build, solve problems with a too-new compiler, or make the main feature of the software work at all. Minor fixes or additional features are not allowed, nor are changes that were rejected by the upstream.

1.7.3 Authenticity of Fedora packages

There are multiple security and formal aspects in the Fedora build system:
• Package signing: Fedora uses GPG to add digital signatures to the package files. It gives the user the ability to check whether the package was changed by someone else. As we can see in [13], the packages can be signed during the build, and we can also add multiple signatures afterwards. According to [34], each stable RPM package is signed by a GPG key whose public key is present in the fedora-release packages. The verification of the signatures is automatically done by DNF or by graphical tools for software management (e.g. gnome-software) [34].
• Package files verification: In the Using RPM to Verify Installed Packages chapter of [13], there is the following list of checks that are used for checking each file of the installed packages:
  – Owner
  – Group
  – Mode
  – MD5 Checksum
  – Size
  – Major Number
  – Minor Number
  – Symbolic Link String
  – Modification Time
• Pristine sources: It is one of the key principles behind RPM [13]. It means that the upstream release is taken as is and the necessary downstream changes are added as patch files. This setup has multiple advantages: patches can be applied on multiple releases, and it is easy to see the downstream changes.

1.8 Package sources, editing of the distribution packages

1.8.1 Sources for Debian packages

For each binary package, we can get the source package via the apt-get source command. The source package is downloadable if its repository has a deb-src line in the sources.list file. The used version depends on the state of the apt-get cache. Other versions can be manually downloaded from a Debian mirror or the web site. [49] Every change of a Debian package needs to be done in the source package. When we need to deliver the changes to the users, we need to build the binary package(s). Binary packages are architecture-dependent. [49]

1.8.2 Sources for Arch packages

Since Arch's build system is inspired by the *BSD ports, it is possible to clone package(s) locally and do the build on the system where they will be used. It allows using custom versions of packages (newer or patched ones) or setting custom options for compilation [2]. For manipulation with the build source files, there is ASP (Arch Build Source Management Tool). It provides git access (via svntogit) to the source files used to build Arch packages [21].

1.8.3 Sources for Fedora packages

One of the possible ways to get the sources of an RPM package is to use the dist-git repository. It can be cloned with fedpkg or directly with git. It is possible to check out a particular git commit or branch (Fedora version) of the repository. The URL of the source is specified in the spec file. The source archive can be downloaded directly, or it is possible to use a tool (e.g. spectool or fedpkg) to get the URL and do the download from the lookaside cache with one command. The RPM package can be built from the edited sources by adding a new patch file to the spec file or by creating a custom source archive. This process will be simplified in the proposed solution.
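For illustration, obtaining and unpacking the sources of a package (hypothetical name) can look like this:

fedpkg clone -a examplepkg && cd examplepkg   # anonymous clone of the dist-git repository
fedpkg switch-branch f29
fedpkg sources                                # download the archive(s) listed in the sources file
spectool -g examplepkg.spec                   # or: download the archive from the URL in the spec file
fedpkg prep                                   # unpack the sources and apply the patches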

2 Existing solutions

From the beginning of package management in the Linux distributions, there have been many tools and libraries that aim to simplify the packagers' life: from shell scripts created for specific packages to complicated tools for general usage. In the first part of this chapter, we will go through tools that try to solve some parts of the maintenance. The second part will contain examples of projects that solve the problem for one package or an ecosystem of packages.

2.1 Similar or related tools

Tito

Tito is a tool for managing RPM-based projects using git for their source code repository. [42] Tito is a Python package and CLI (command-line interface) utility aimed at the situation where the packager controls the upstream repository. There are multiple workflows supported by Tito; all of them heavily use git tags for the automation.

Tagging

Tito can automatically do the following operations:
1. Bump the version/release in the spec file.
2. Auto-generate the spec file changelog based on the git history since the last tag.
3. Commit the changes of the spec file.
4. Create a new git tag.

Building

Tito can create reliable tar.gz archive files with consistent checksums from any tag. Packages are signed by a GPG key and checked (article RPM in [31]) on the host during the installation process. It is a way to prevent unintended changes in the delivered packages. There are also other build options in Tito:


• Build source and binary RPMs off any git tag. (RPM files can be directly installed on the system, and an SRPM file can, for example, be sent to the Copr infrastructure or used during the package review process.) With tito build --test we can generate a new source archive with the content of the given (or last) commit and use it in the spec file as a source when creating an RPM file.
• Build packages using the upstream git repository as a source; the commits from the downstream git repository will be applied as patches in the SRPM.

Tito internally uses three classes for its work: tagger, builder and releaser. Each of them can have multiple implementations, and this specific workflow is achieved by using an alternative builder (UpstreamBuilder).

Releases

Just as there are multiple implementations of the builder, there are multiple releasers1. One can use Tito for releasing to Copr or directly to dist-git.
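For illustration, a typical Tito session could look like this (the releaser target name comes from the project's releasers.conf and is hypothetical here):

tito tag                       # bump version, generate changelog, create the git tag
tito build --srpm              # build an SRPM from the latest tag
tito build --test --srpm       # use the latest commit instead of the latest tag
tito release copr-example      # run a releaser defined in .tito/releasers.conf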

Conclusion of Tito

✓ It is possible to have multiple packages in one repository.
✓ There are multiple implementations of taggers/builders/releasers.
✓ There is an --offline and --dry-run mode.
✓ Tito can be run only locally.
✗ The Python API is not specified and is without documentation:
  – There is an open issue2 for that.
  – Quoting part of the README.md: "Also, there are no guarantees that tito will not change in future releases, meaning that your custom implementations may occasionally need to be updated."
✗ UpstreamBuilder expects a specific tag format. [19]
✗ UpstreamBuilder generates one patch for the whole difference. (It can be huge and not understandable.) [41]

To conclude the pros and cons, Tito is the closest solution to our goal, but it is not documented, UpstreamBuilder does not fit our needs well, and upgrades can break the custom implementation. Since Tito is only for local usage, automation/integration needs to be created on top of the tool.

1. The manual page for the releasers can be seen by using man 8 releasers.conf.
2. Provide, document a Python API: https://github.com/dgoodwin/tito/issues/165

RPKG

The RPKG project is mostly a Python library for dealing with RPM packaging in a git source control [38]. It is built on top of other package-related projects like mock, rpm-build, rpmlint and copr-cli. According to the documentation ([38]), there is no executable shipped with rpkg, but there are tools for creating a CLI on top of it and a precise definition of its API. There is also a Python API with the low-level functionality that can be used for the prototype. In the API, we can find methods for building packages (e.g. build, prep, copr_build or koji_upload), dist-git-related operations (e.g. import_srpm or switch_branch), and methods for using the lookaside cache (e.g. remote_file_exists, download or upload). There were efforts to completely rework RPKG ([24]). There exists a fork of the original project at https://pagure.io/rpkg2. Its vision is described in [62].

rpkg-util

This tool has a similar name to RPKG, but it is a different tool. From the documentation [68], it is an RPM packaging utility that works with both DistGit and standard git repositories, and it handles two types of directory content: packed and unpacked. This utility supports two ways to achieve that goal:
• spec file templates,
• auto packaging.
Unfortunately, auto packaging is deprecated, so only the spec file templates are supported. An example spec file template can be seen in Figure 2.1. rpkg-util can generate a regular spec file with a changelog made from git commits. We can also build RPM or SRPM files. There is also a possibility to run the conversion from a dirty repository or for a given commit. For this use case, the tool can add suffixes (e.g. the commit hash) to the source and resulting files.


Name:    {{{ git_dir_name }}}
Version: {{{ git_dir_version }}}
Release: 1%{?dist}
Summary: This is a test package.

License: GPLv2+
URL:     https://someurl.org
VCS:     {{{ git_dir_vcs }}}

Source:  {{{ git_dir_pack }}}

%description
This is a test package.

%prep
{{{ git_dir_setup_macro }}}

%changelog
{{{ git_dir_changelog }}}

Figure 2.1: Spec file template for rpkg-util (content from [68])

Also, there is support for multiple packages in one repository and pushing to the Copr system.

Rebase-helper

As the name of this project says, rebase-helper is a tool for automating several steps of the package rebase process. [69] The workflow used by rebase-helper has the following steps ([70]):
1. Preparation:
   1. Create the workspace and result directories.
   2. Get a new upstream version (more about this later).
   3. Bump the version in the spec file.
2. Get sources:
   1. Download the old source files from the lookaside cache.
   2. Download the new source files.
3. Downstream patches:
   1. Extract the old sources to a new git repository.
   2. Each patch is applied and committed.
   3. The new sources are extracted to a new git repository and used as a remote repository in the old sources repository.
   4. Rebase the patch commits on top of the new sources.


   5. Generate patch files from the rebased commits.
4. Build:
   1. Build an old RPM file.
   2. Build a new RPM file.
5. Comparison:
   1. Run multiple checkers against both sets of packages.
6. Cleanup:
   1. Remove the workspace directory.

The essential part is making the two temporary git repositories and using a git rebase for applying the old patches to the new version. This operation is more resilient to patching conflicts and can also skip a commit if the patch is already present in the new upstream version. (This comes from git being a snapshot-based, not delta-based, versioning system. [16]) Although it provides a CLI for performing a specific workflow, it also provides a rich Python API. This API ([66]) contains Python access to several packaging tools (rpmbuild, mock, koji, rpmdiff, licensecheck and others). It also contains an object representation of the spec file, so it is possible to change or update parts of this type of file. Another important part of this tool is gathering the upstream version. It provides several implementations of so-called versioneers that can provide the latest upstream version. At the time of writing, the possible implementations are:

• Anitya (Release Monitoring service): A database of packages and their releases. It performs regular checks for new upstream releases and also provides a mapping between upstream projects and packages (names) in the distributions. ([25])

• Package services for particular programming languages:
  – RubyGems (Ruby)
  – PyPI (Python)
  – Hackage (Haskell)
  – CPAN (Perl)
  – npmjs (Node.js)

2.2 Custom solutions

RDO/OpenStack

According to [64], RDO is two things. It is a freely-available, community-supported distribution of OpenStack that runs on Red Hat Enterprise Linux (RHEL) and its derivatives, such as CentOS. RDO also makes the latest OpenStack development release available for Fedora. It is a distribution of the OpenStack components and some other tooling around them. All these projects use a particular workflow during developing, packaging and releasing. The architecture of the workflow can be seen in Figure 2.2, taken from the RDO OpenStack Packaging documentation ([63]).

RDOPKG

RDOPKG is an RPM packaging automation tool (mainly) for the RDO projects. Its key component is the patching branch. Each repository managed with RDOPKG has multiple remote repositories [63]:

• Upstream git repository: Here, the real development is done. (The master and stable/$VERSION branches.)

• RDO dist-git: Contains files for packaging RPM packages (e.g. spec file). (The rpm-$VERSION and $VERSION-rdo branches.)

• Patched git repo: Contains the patch branches for the projects. Git commits are the patches. There are branches for versions ($VERSION-patches) and for distribution-specific patches ($DIST-patches). Commits from these branches can be rebased on the new upstream version. This tool creates the patch files from the repository and adds the patches with the generated changelog to the spec file. (The git rebasing brings the ability to work with the code, not the patch files. Also, it can automatically drop redundant or unnecessary patches.)

Locally, RDOPKG works in one repository with the branches fetched from the previously mentioned remotes. (There is also a fourth repository for metadata.)


DLRN

This tool is closely related to the RDO ecosystem – it can automatically build the RPM packages from the upstream, using packaging files from the RDO dist-git and adding patches from the patching branches. This tool is not only a build system but also handles repository management. It can create a repository for each build (e.g. /centos7/42/0c/420c638d632555dcbfb94_52cbbfe7) and make links like /centos7/current/delorean.repo to the last tested repository. [65]

Conclusion The RDO’s main advantage is the patched repo. Since the tool is closely related to the workflow of OpenStack components, it fits very well projects in that ecosystem. For other projects, there are some downsides: × Situation, where upstream developers are also the downstream maintainers. (The package-related files are placed in the third repository, and it is not easily possible to sync it with the upstream repository.) × Packaging work is still too far from the upstream and packagers need to work in three places. (Automation is done on top of them, not between them.) × Not well configurable. × No dist-git automation. (e.g. conditional merges, builds or Bodhi updates)

Fedora kernel

The kernel is a core part of each Linux distribution and probably one of the most complex packages. There are many patches, configuration files and tweaks that are added on top of the vanilla version of the upstream kernel. (See the dist-git repository [36] for more information.) Currently, Fedora uses multiple repositories and tools on top of them to handle such a big package:
• the upstream kernel source,
• the dist-git repository,
• the exploded source git tree.


The conversion from the upstream and downstream to the exploded tree is done with the fedkernel scripts [15]. This tool provides the following workflow:
1. Determine the dist-git sha1sum for the build.
2. Perform a checkout of that exact commit.
3. Prepare the dist-git tree for that commit.
4. Determine the upstream base release, tag or git revision.
5. Reset an upstream git tree.
6. Extract the patches Fedora applies.
7. Apply the patches to the upstream git tree.
8. Tag the upstream git tree with the corresponding package build NVR ("name-version-release").
The Fedora kernel package is also one of the packages that will be used to experiment with the proposed solution. It represents a package where the upstream developers are not the Fedora maintainers, and a lot of patching and configuration needs to be done on the downstream side.


Figure 2.2: RDO packaging overview ([63])


3 Proposed solution for Fedora ecosystem

After the look at the existing solutions and other related tools that were described in the previous chapter, this chapter describes the prototype implemented for the Fedora ecosystem. It builds on the useful features of the existing solutions, puts them together and improves them where needed. Since the proposal will be used as a starting point for the real solution, it has been created after discussion and collaboration with real maintainers of Fedora packages. Teams that were adopting the prototype and giving feedback are:
• Rebase-helper (discussed in the section about Rebase-helper),
• pykickstart,
• Fedora Kernel,
• standard-test-roles,
• systemd,
• Anaconda (the Fedora installer and the related tooling),
• the project itself,
• other smaller packages (conu, colin, ogr, sen, ...).
Since the development was done in an agile way and there was feedback from these projects, there were many architectural changes during the implementation, and more will probably come. The project itself was named Packit to describe the "packaging" purpose. The following sections will describe the proposed solution in more detail.

3.1 Key principles

After the discussions with the Fedora maintainers and the team that was working on the automation for containers1, we defined some principles that needed to be kept in mind during the implementation and the architecture-design phase:
1. Automation of routine work.
2. Use well-known open-source workflows (e.g. pull requests).

1. Userspace Containerization team at Red Hat


3. Allow packagers/maintainers to work on the source, not on the patch files.
4. Move downstream development close to upstream.
5. Every automated task can also be achieved locally (bot and CLI tool).

Here is a short description of the above rules.

1. Automation of routine work

This is the main goal that we need to achieve: to automate the manual and routine steps that maintainers need to do during the maintenance and releasing of new versions.

2. Use well-known open-source workflows (e.g. pull-requests)

The aim is to give maintainers the same tools and workflows that are used for development. The upstream developers are used to git, pull-request workflows (as can be seen in Figure 3.1) and CI systems with response statuses on git commits/pull requests, which can be used to avoid merging non-working code.

[Figure content: a fork of original-project with a feature-branch, proposed back to the master branch of the original project via a pull request.]

Figure 3.1: pull-request workflow

3. Allow packagers/maintainers to work on the source, not on patch files

Since maintainers in Fedora currently work in dist-git, which uses the patches, giving them only git and pull requests is not a solution, since they would need to review patch differences, which can be very hard. Another goal is to give them a way to work on the real code and review the downstream changes, not the patch difference.

4. Move downstream development close to upstream

Allowing collaboration of the maintainers and the upstream developers would be beneficial. It is useful for both sides if there is an easy way to move a downstream fix to the upstream. Also, from the opposite point of view, the upstream developer needs to get the results of the downstream checks and tests quickly when proposing new changes.

5. Every automated task can also be achieved locally (bot and CLI tool)

The solution will provide integration on the level of the APIs of the used tools, but there will also be a CLI tool that allows maintainers to perform the actions locally. It brings more control and the ability to use the tool in case of any failure of the used services.

3.2 Source-git

Source-git is a fundamental part of the proposed workflow. It uses a concept similar to the patching branches in rdopkg (described in the part about RDOPKG). We have a git repository containing the source code of the upstream repository and commits on top of it that contain the downstream content – patches and packaging files (e.g. the spec file). The source-git repository can be created quickly as a fork of the upstream git project. We can have multiple branches – for multiple upstream versions2 as well as for multiple distribution versions. (An example is in Figure 3.2. On the left side, there is an upstream repository with the v1.3 and v2.1 tags. On the right side, there is a source-git repository with the same tags and some commits on top of the upstream code.)

[Figure content: upstream-project with tags v1.3 and v2.1; source-git repository with the same tags and downstream branches v1.3-f28, v2.1-f29 and v2.1-f30.]

Figure 3.2: source-git repository

The downstream commits in the source-git repository can contain two types of changes:
• Addition/changes/removal of the packaging files: the spec file, configuration files, downstream tests, ...
• Changes to the upstream code (instead of adding patch files).
Of course, the external source-git repository is not needed if the upstream contains the downstream files as well. It is only a specific case of source-git where there are no downstream commits.

2. It can be a git tag or a branch for the releases with the same main version (e.g. v1 for v1.0 and v1.1, and v2 for v2.0 and v2.1).
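A sketch of creating such a source-git repository with plain git (hypothetical project and file names) could be:

git clone https://github.com/example/upstream-project source-git
cd source-git
git checkout -b v2.1-f29 v2.1                 # downstream branch based on the upstream v2.1 tag
git add upstream-project.spec .packit.yaml    # packaging files and the configuration
git commit -m "Add downstream packaging files"
# downstream fixes are ordinary commits on top, instead of patch files
git commit -am "Fix the build on Fedora 29"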


[Figure content: source-git repository with upstream tags v1.3 and v2.1 and downstream branches v1.3-f28 and v2.1-f29, whose commits add the spec file and add or remove downstream changes on top of the upstream code.]

Figure 3.3: Commits in the source-git repository

3.3 Upstream-downstream format conversion

Since dist-git is a core part of the packaging workflow, we cannot avoid using it. Because of the disadvantages of using dist-git for maintenance, we can automate the manipulation with dist-git and convert the source-git content to the dist-git repository. The maintainer can work with the source-git, but the infrastructure can still use dist-git as the source of truth. The distribution has to use the pristine upstream archive as a source. The source-git repository uses the corresponding git tag as a starting point. On top of this point, there can be multiple downstream commits, as was described in the previous part. The two types of changes need to be handled differently:
• The downstream content needs to be synced directly to the dist-git repository.
• The changes of the source code need to be converted to patch files.
The algorithm goes through the downstream commits and generates a diff from the previous commit. (Like in rebase-helper, we can avoid applying an already merged patch.) The diff is filtered from the directly synced files. The synced files are defined in the configuration file. (For example, maintainers can sync tests or changes in the spec file.) The configuration file itself should be synced as well. The generated patches are then added as patches to the spec file. We can add a comment with other information from the original source-git commit to make the spec file more human-readable. (There also needs to be a part that applies the patches.) In Figure 3.4, there is the relation between the upstream, source-git and dist-git repositories. In the dist-git tree, we can see the last commit containing a patch file3 (generated from the two downstream commits in source-git) and an edited spec file (with the new upstream version and the name of the patch).

3. In the real implementation, the patch files are named after the commit hashes to allow tracing the generated patches back to their commits.
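A simplified sketch of this conversion step in Python (not Packit's actual code; the synced-file list is a hypothetical configuration value) could be:

import subprocess

SYNCED_FILES = ["examplepkg.spec", ".packit.yaml"]  # synced directly, therefore excluded from patches

def generate_patches(repo_path, upstream_tag, output_dir):
    """Create one patch file per downstream commit on top of the upstream tag."""
    # list the downstream commits, oldest first
    revs = subprocess.check_output(
        ["git", "-C", repo_path, "rev-list", "--reverse", f"{upstream_tag}..HEAD"],
        text=True,
    ).split()
    exclude = [f":(exclude){path}" for path in SYNCED_FILES]
    patches = []
    for number, rev in enumerate(revs, start=1):
        # diff against the parent commit, filtering out the directly synced files
        diff = subprocess.check_output(
            ["git", "-C", repo_path, "diff", f"{rev}~1", rev, "--", ".", *exclude],
            text=True,
        )
        if not diff.strip():
            continue  # the whole commit touched only directly synced files
        patch_path = f"{output_dir}/{number:04d}-{rev[:8]}.patch"
        with open(patch_path, "w") as patch_file:
            patch_file.write(diff)
        patches.append(patch_path)
    return patches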


[Figure content: upstream repository with tag v2.1 and archive v2.1.tar.gz; source-git branch v2.1-f29 with a commit adding the spec file and commits changing the upstream code; dist-git branch f29 with the spec file (Source1: v2.1.tar.gz, Patch1: patch-1) and the generated patch-1 containing the downstream changes.]

Figure 3.4: Transformation of the source-git to dist-git

3.4 Workflow

There are multiple workflows supported in the prototype. We will go through the most important ones.

1. Upstream release to downstream pull request

1. The action needs to be triggered by a new upstream release. There are multiple sources that can be used for receiving the information about new releases – GitHub webhooks, Upstream Release Monitoring4 or Github2Fedmsg (an application that rebroadcasts GitHub events on fedmsg, the FEDerated MeSsaGe Bus). We can also start the process from the CLI with packit propose-update.
2. Create or download the upstream archive.
3. Generate patch files from the source-git commits.
4. Add the patch file names to the spec file.
5. Copy the downstream files to the dist-git (spec file included).
6. Commit the new files in the dist-git repository to a temporary branch.
7. Fork the dist-git project on Pagure (if needed).
8. Push the temporary branch to the fork project.
9. Create a pull request from the temporary branch to the target branch (e.g. f29 or master).

The pull request triggers the downstream CI. The maintainer then sees the status of the scratch (test) build, simple package checks and custom tests if they are set up.
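The behaviour is driven by a configuration file stored in the source-git repository. A sketch of such a configuration (the exact key names may differ between Packit versions and are shown here only for illustration) could look like this:

specfile_path: examplepkg.spec
upstream_project_name: examplepkg
downstream_package_name: examplepkg
synced_files:
  - examplepkg.spec
  - .packit.yaml
jobs:
  - job: propose_downstream
    trigger: release
  - job: copr_build
    trigger: pull_request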

2. Upstream pull request to downstream pull request

The pull-request workflow is mostly similar to the previous one. It only uses the pull-request branch instead of the upstream git tag.

3. Downstream to upstream synchronisation

Since there are situations that cause changes in the dist-git directly, maintainers need to copy the downstream changes to the configured synced files back to the upstream.

4. More about Upstream Release Monitoring can be found in the related page in the [30].


Figure 3.5: Automatically created pull-request in dist-git

Figure 3.6: Automatically created downstream commit

It is convenient to propose these changes as a pull request.

1. Copy the files from the dist-git repository to the cloned source-git repository. (The information about which files are synced in both directions is in the configuration file, which is synced by default as well.)
2. Commit the new files in the source-git repository to a temporary branch.


Figure 3.7: Fedora CI system status for the dist-git pull-request

3. Fork the source-git repository (if needed).
4. Push the temporary branch to the fork project.
5. Create a pull-request from the temporary branch to the target branch (e.g. master by default).

Figure 3.8: Synchronize downstream to upstream

4. Generating SRPMs

SRPM files are a common input for other tools or systems (e.g. Copr). (They contain the sources as well as the files needed to create a binary RPM package.)

1. Create or download the source archive.
2. Generate patch files from the source-git commits and add them to the spec file.


3. Generate an SRPM file from the archive, patches and the updated spec file. (A rough sketch of this step follows.)
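The sketch below assumes the archive, the generated patches and the updated spec file already sit in one working directory; the helper name and the rpmbuild macro overrides are illustrative, not Packit's actual code.

import subprocess

def build_srpm(workdir, spec_path):
    """Build a .src.rpm with rpmbuild, keeping all inputs in one directory."""
    return subprocess.run(
        [
            "rpmbuild", "-bs", spec_path,
            "--define", f"_sourcedir {workdir}",
            "--define", f"_srcrpmdir {workdir}",
        ],
        check=True,
        capture_output=True,
        text=True,
    )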

5. Triggering the Copr builds

One does not need to generate the SRPM and submit the Copr build by hand – the Copr build can be submitted automatically. The reaction to a pull-request can be seen in Figure 3.9. There are also other ways to use this project – triggering a build in Koji (the Fedora build system [37]) or creating updates in Bodhi (the system for approving package updates to be made available to the end users).
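For comparison, the manual path that the service automates could look like the following snippet, which submits a freshly built SRPM with the copr-cli client; the Copr project name and the SRPM file name are placeholders.

import subprocess

subprocess.run(
    ["copr-cli", "build", "packit/packit-dev", "packit-0.1.0-1.fc29.src.rpm"],
    check=True,
)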


Figure 3.9: Packit-as-a-service integration with Copr

4 Implementation

The following part of the thesis describes the implementation of the proposed prototype for the Fedora ecosystem. The project is called Packit, and it was developed in the open from the very beginning (in the packit-service/packit GitHub project¹).

Used technology

For the implementation, the Python programming language was used for multiple reasons: other solutions are mainly written in Python, and the libraries that can be reused also provide a Python API. There are also advantages of the language itself. In [48], there is a part called Why Do People Use Python?. It shows that Python code is considered readable, understandable and designed to be easily reused. The same amount of code is more expressive than in languages like C++ or Java. One downside of choosing Python is its execution speed. Although there are various optimisations, it is not as fast as lower-level languages like C or C++ (from the chapter OK, but What's the Downside? in [48]). Since there are no strict requirements on the speed of the operations, a high-level programming language is not a problem. The limitation is more on the side of network bandwidth/latency and the delay of the remote APIs.

Python packages:
• click: A library used for creating the CLI and manual pages.
• flask: Flask is a Python web micro-framework [54] used for receiving the web-hooks.
• GitPython: A Python wrapper on top of the git binary.
• PyYAML and jsonschema: Packages used for loading and validating the configuration files.

1. The repository was originally named source-git after the core part of the workflow, but the project covers more than source-git, so the name is now more general.


• rebasehelper: A Python interface for the tool described in more detail in the section about Rebase-helper. Its Python API is used for manipulating spec files.
• bodhi-client: Python API for communication with Bodhi.
• fedmsg: Python package for communication with the Fedora messaging system called fedmsg.
• python-gnupg: Library for interacting with GPG.

Tools called as a binary

• fedpkg: Fedora packaging tool allowing manipulation with dist-git and the lookaside-cache.
• git: CLI for git. Some operations were hard to achieve with the GitPython library and needed to be done directly via subprocess; checking the signatures of the git commits is one of them.
• rpmbuild: Tool for building RPM or SRPM files.

Tools used for development

• ansible-bender/podman/buildah: Tools for building and running a containerised version of Packit.
• pytest+flexmock: Testing and mocking framework.
• release-bot: Automatic releases of Python packages to GitHub and PyPI (Python Package Index).
• setuptools-scm: Python package versioning based on git tags.
• pre-commit: Unified git pre-commit hooks running static analysis of the code:
  – syntax checks
  – type checker (mypy)
  – black (an opinionated code formatter)
  – trailing whitespace checker
  – preventing secrets from being committed
• CentOS-CI: Continuous Integration system based on Jenkins used for running tests on the pull-requests and commits.
• Packit itself: Releasing Packit itself to Fedora is done with Packit.


Code structure

The structure of the code can be seen in Figure 4.1. The core item is the packit.api module. This Python API provides methods to accomplish the required use-cases/workflows. Packit itself has multiple entry points, or in other words, ways to be run:

• There is a CLI utility packit created with the help of the click package. This package simplifies creating CLIs defined by function decorators. All help messages and input validation are handled by this library. There is also a click-man package that can generate manual pages for click applications. From the code perspective, there is a base command defined in packit.cli.base_cli, and all use-cases/workflows are provided as subcommands (e.g. packit.cli.update).
• The Fedora ecosystem is built largely on top of fedmsg. With the usage of GitHub2Fedmsg, we can receive GitHub events from the fedmsg message bus. This functionality is implemented as the packit listen-to-fedmsg command.
• The third, and last, way to run Packit is as a web service. It is a Flask web server that can receive web-hooks and start actions based on the given data. The web-hook support is needed for Packit to be able to act as a GitHub app. (Figure 4.2 shows the settings of the installed application for a GitHub organization.) A simplified sketch of these entry points follows.
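The following simplified sketch shows how these entry points fit together with the click and Flask libraries mentioned above. The command, endpoint and function names are illustrative and the bodies are placeholders rather than Packit's real code.

import click
from flask import Flask, request

@click.group()
@click.option("-d", "--debug", is_flag=True, help="Enable debug logs.")
def packit_base(debug):
    """Integrate upstream open source projects into Fedora."""

@packit_base.command("propose-update")
@click.argument("path_or_url", default=".")
def propose_update(path_or_url):
    """Release the current upstream release into Fedora."""
    click.echo(f"Proposing an update from {path_or_url}")

app = Flask(__name__)

@app.route("/webhooks/github", methods=["POST"])
def github_webhook():
    # Packit-as-a-service: match the received event against the configured
    # jobs and start the corresponding workflow.
    event = request.get_json()
    return ("accepted", 202) if event else ("no payload", 400)

if __name__ == "__main__":
    packit_base()  # the web-service mode would run app.run() instead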

Apart from the packit.api module, there are other key blocks of the project. Some of them will be described in more detail in the following paragraphs.

Local Project

This is a class coupling together the other instances related to one git project. It contains a git.Repository instance from GitPython, the current git branch, a path to the cloned repository, and also support for manipulating the remote project via OGR project and service instances. The OGR library has been created for this project and will be described in the part about OGR. The main advantage and purpose of the LocalProject class is the ability to calculate missing information from what is available (e.g. a branch from the git.Repository, or an ogr.GitProject from the ogr.GitService and full_name). Also, it can clone the project to the given or a temporary directory if needed. The following sketch illustrates this pattern.

There is also one other related class representing the LocalProject as a CLI argument. It acts as a click argument type that gives us easy validation and help/error messages from the library. With this argument, we can very easily create a new instance of the LocalProject from a given path or git URL.
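An illustrative reduction of this idea is sketched below: attributes that were not provided are computed from the ones that were. The class and parameter names are simplified, and the real LocalProject holds more fields (including the OGR objects).

import tempfile

import git

class LocalProjectSketch:
    def __init__(self, working_dir=None, git_url=None, ref=None):
        if working_dir is None and git_url is not None:
            # Clone into a temporary directory when only the URL is known.
            working_dir = tempfile.mkdtemp(prefix="packit-")
            git.Repo.clone_from(git_url, working_dir)
        self.working_dir = working_dir
        self.repo = git.Repo(working_dir)
        # Derive the branch from the repository when it was not given.
        self.ref = ref or self.repo.active_branch.name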

Configuration

Each package that wants to be operated by Packit needs to create a configuration file that provides the needed information. It contains a mapping of the project to the downstream package, a path to the spec file and the job definitions (described in the next paragraph). The configuration can also contain so-called actions that can be used as hooks during the workflows or can overwrite some parts of a workflow with a custom command. These actions can be useful for non-trivial projects that already have some automation around packaging or cannot be maintained in the same way as other projects. As examples, we can name systemd or the Fedora kernel.

Jobs

This is the way to enable reactions to events when running Packit-as-a-service. If the service receives an event, it checks whether the event matches some job definition in the configuration. If it matches, the configured workflow is started. The job definition contains a type of event (e.g. a GitHub release) and the action that needs to be done (e.g. propose the version to the dist-git as a pull-request). It can also define some metadata (e.g. the target dist-git branch). A hypothetical configuration with one job definition is sketched below.
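The sketch below shows such a configuration loaded with PyYAML, one of the packages listed earlier. The exact keys are illustrative; the authoritative schema lives in configuration.md in the Packit documentation [53].

import yaml

config = yaml.safe_load("""
specfile_path: packit.spec
synced_files:
  - packit.spec
  - .packit.yaml
upstream_project_name: packit
downstream_package_name: packit
jobs:
  - job: propose_downstream    # what to do ...
    trigger: release           # ... and on which event (e.g. a GitHub release)
    metadata:
      dist_git_branch: master  # target dist-git branch
""")

# Packit-as-a-service would compare an incoming event against these jobs:
release_jobs = [job for job in config["jobs"] if job["trigger"] == "release"]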

Representation of upstream and downstream

The main functionality is implemented in the PackitRepositoryBase class and its subclasses, Upstream and DistGit. It contains:
• a LocalProject instance,
• the spec file representation from the rebase-helper package,


• methods needed for the required workflows:
  – creating patches,
  – editing the spec file,
  – creating/downloading source archives,
  – git operations (checkout, commit, pull-request, ...),
  – manipulation with fedpkg, the lookaside-cache or rpmbuild.

The api module only creates an instance of Upstream/DistGit and calls its methods to accomplish the required workflow. A structural sketch of this hierarchy is shown below.
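The following sketch summarises this hierarchy. Only the class names are taken from the text; the attributes and method bodies are placeholders.

class PackitRepositoryBase:
    def __init__(self, local_project, specfile_path):
        self.local_project = local_project  # LocalProject instance
        self.specfile_path = specfile_path  # spec file handled via rebase-helper

    def checkout(self, ref):
        self.local_project.repo.git.checkout(ref)

class Upstream(PackitRepositoryBase):
    def create_patches(self, upstream_tag):
        """Generate patch files from the downstream commits (Section 3.3)."""
        raise NotImplementedError

class DistGit(PackitRepositoryBase):
    def create_pull_request(self, source_branch, target_branch):
        """Open a dist-git pull-request via the OGR project object."""
        raise NotImplementedError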

Git forge API

From the early stages of the prototype, a significant amount of code was needed to handle the APIs of the git servers (e.g. GitHub and Pagure). All of them have a REST API and a Python wrapper on top of it. Each of the Python libraries has a different representation of the remote objects. These servers are not pure git servers but provide a lot of other tools (e.g. issues and pull-requests). The very first version supported only GitHub for the upstream/source-git and Pagure for the downstream. The different APIs were not a problem at that point, but then came a request to support Pagure on the upstream side (i.e. https://pagure.io). This request started an initiative to create a common Python API for multiple git servers/forges. Another reason was the similar code that handles forking of projects and pull-requests. The related code was moved to a dedicated project called OGR (One Git library to Rule). It defines one API and multiple implementations: currently GitHub and Pagure. The GitLab code is not ready yet, and there are no requests for it for now. The main user of this library is Packit, but other projects are going to use it as well. The API itself is more object-oriented than the one in libpagure, but not as complex as pygithub or python-gitlab. It defines three main classes to represent Service (connection and login to the server), Project and User. The most needed methods are implemented in the Project class – there are methods for getting various information about the project (e.g. name or description) or manipulating pull-requests (e.g. listing, creating or getting comments for the pull-requests). There are also methods for getting/creating a fork of the project. A condensed sketch of the interface follows.
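In the sketch below the method names are simplified and the bodies are placeholders; the real library is available at https://github.com/packit-service/ogr.

from abc import ABC, abstractmethod

class Service(ABC):
    """Connection and login to one git forge (GitHub, Pagure, ...)."""

    @abstractmethod
    def get_project(self, namespace: str, repo: str) -> "Project": ...

class Project(ABC):
    """One repository on a forge, independent of the concrete implementation."""

    @abstractmethod
    def get_description(self) -> str: ...

    @abstractmethod
    def list_pull_requests(self): ...

    @abstractmethod
    def create_pull_request(self, title, body, target_branch, source_branch): ...

    @abstractmethod
    def get_fork(self) -> "Project": ...

class GithubService(Service):
    """Implementation that forwards the calls to pygithub."""

    def __init__(self, token):
        self.token = token

    def get_project(self, namespace, repo):
        raise NotImplementedError  # would wrap a pygithub Repository here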


The GitHub implementation is a pure API bridge that forwards methods of OGR to pygithub and vice versa. It only needs to convert between the different representations. The Pagure implementation was a bit harder since libpagure has only one class representing projects, which also contains information about the login, the user and other less related data. Since it does not have such a rich API, we created a subclass of the one from libpagure to be able to easily move our effort back to the upstream but still have the code available quickly on our side. On top of this subclass, we are building the real implementation of the OGR interface.


Figure 4.1: Dependency graph


Figure 4.2: Packit-as-a-service GitHub application

4.1 Security and compliance

Replacing manual work with automation brings one big problem: we need to be careful not to create a security vulnerability, but at the same time be able to provide the same operations as a regular maintainer:

• When using Packit as a CLI, it is up to the user to have the required privileges to use the internal services, but then it is also his/her responsibility to check the code.

• In the case of Packit-as-a-service, we need to have the privileges to use the internal services, but we are syncing code from servers that can be outside the control of the Fedora maintainers. We need to be sure we do not sync potentially malicious code that would be executed in the downstream infrastructure.

Since we are using pull-requests to bring the code to the dist-git, we can be sure that a maintainer approves (and merges) the code before it lands in the distribution. With the pull-request there is still one problem – we are running tests and scratch builds on the pull-requests. The prototype uses one of the built-in properties of git to prove the authorship of the used code. Maintainers can define allowed GPG keys in the configuration file, and Packit checks the top git commit of the used code to determine whether it can be tested in the internal infrastructure. There are two things to consider:

• We only need to check the last commit, since it means that someone authorised verified the code in that state. There is no need to check the previous commits as well.

• We need to be careful where we get the list of the allowed keys from. If we got the keys from the upstream configuration file, anyone could change them on the upstream side, and the verification would be worthless. We therefore get the list from the downstream configuration. It gives the maintainers a simple way to edit the list of approvers, while anyone without access to the downstream package repository cannot change it. During a local run, one can easily edit the configuration file in dist-git as well, but that is not a risk since he/she is using his/her own identity for connecting to the internal infrastructure. A minimal sketch of this check follows.
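The sketch below reads the validity status and the fingerprint of the key that signed the top commit (via the git CLI, which Packit calls as a binary) and compares the fingerprint with the allowed list. The function name and the way the allowed list is obtained are illustrative.

import subprocess

def top_commit_is_trusted(repo_path, allowed_fingerprints):
    # %G? is the signature validity ("G" means a good signature),
    # %GF is the fingerprint of the key that made the signature.
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-1", "--format=%G? %GF"],
        capture_output=True,
        text=True,
        check=True,
    )
    status, _, fingerprint = result.stdout.strip().partition(" ")
    return status == "G" and fingerprint in allowed_fingerprints

# allowed_fingerprints would be read from the downstream configuration file.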

Since maintainers have varying degrees of control over their upstream projects, the described approach is optional and is left to the discussion of the maintainers and leaders. (There were also discussions about skipping the pull-requests and pushing directly to the master branch. These are more organisational questions and cannot be resolved in the thesis. The prototype only provides the ability to do it in multiple ways.)

4.2 Implemented requirements

In the following part, we will briefly look back at the prototype requirements and how they were fulfilled.

Transform the content of the upstream project into a format which Fedora distribution build system understands. The process of the transformation was described in Subsection 3.3.

The transformation process should distinguish between the pristine upstream archive and additional changes layered on top; they should not be mixed together by default. The specification of source-git was described in Subsection 3.2, and the generation of the patch files on top of the pristine sources was described in the part about the Upstream release to downstream pull request workflow in Subsection 3.4.

The process has to be configurable by a user. There is a configuration file, which was described in the part about configuration in Chapter 4. More information and example configuration files can be found in the Packit documentation [53] in configuration.md.

Submit the changes as a new pull request in Fedora distribution's code repository. This can be done for new upstream releases as well as for upstream pull-requests. When creating a downstream pull-request, a fork of the repository is created if needed, and the downstream Continuous Integration and other checks are automatically run on the proposed code.

An easy way to contribute: a user can propose a change with a minimal amount of steps. Users can work in the upstream repository (or source-git). A Copr repository is created for each pull-request to verify that the changes do not break the build.


Maintainers can easily collaborate with upstream projects. The tooling would enable downstream maintainers to pick up fixes from upstream and maintainers are able to use the prototype to report issues upstream. Maintainers can collaborate on the upstream side and synchronise upstream pull-requests to the dist-git. There are two possible reactions to breaking upstream changes:
• The status of the failed Copr build of the RPM is shown in the pull-request.
• The pull-request in dist-git (created from the upstream pull-request) triggers the downstream checks (scratch builds, basic checks and other configured tests). Maintainers can see the results and can recognise potential problems before the actual release.

Provenance: the commits are signed, and it is possible to track down the origin of a certain artefact. It is possible to configure allowed GPG keys. If specified, the top commit of the release or pull-request needs to be signed by one of the keys. (So that the automation is not run on code that was not checked by someone responsible.) The original source is always specified in the spec file. When creating an SRPM for a given git revision, it is possible to use two approaches:
• Use the source archive for the given git revision and generate patch files from the commits on top of it.
• Generate a new testing archive named after the given revision.

Continuous Integration: contributors are getting feedback for their proposed changes. The most visible feedback is for the upstream pull-requests. As described above, a Copr repository is created for each pull-request, the build status is shown as a pull-request status, and information on how to test the newly built packages is added as a new comment. The pull-requests can be synchronised to the downstream repository to get the downstream feedback as well.

5 Use cases

Multiple projects are already using and testing Packit. In the following part, we will go through some of them and describe their setup, challenges and usage of the prototype.

Packit

As a first example, we can use Packit itself. Since the project is an uncomplicated Python application without any complex dependencies or build challenges, the setup was quite easy. Another advantage is that the upstream project is under our control. It is a representative of simple projects that do not need source-git. The local workflow used in Packit:

1. Create a new release with the release-bot¹ on GitHub and PyPI. (The archive from PyPI is used for the Fedora release.)
2. packit propose-update: This command sends the source archive from PyPI to the lookaside-cache and creates a pull-request in dist-git for the master branch (Fedora Rawhide version). (We are also syncing the test suite for STI (Standard Test Interface), which is automatically run on the pull-request code.)
3. After the merge of the previous pull-request, we can merge the commits to the other branches as well. (Or run packit propose-update --dist-git-branch f30 to create a pull-request to the branch for Fedora 30.)
4. With packit build, we can start a Koji build for the specified branch. With the --scratch option, we can submit a scratch (testing) build.
5. After a successful build, we can run packit create-update, which will create a Bodhi update for the new release. This will move the new version to the updates-testing repository, and people can update to this version with the --enablerepo=updates-testing option for DNF. If a particular number of people approve the update, the new version is moved to the stable repository.

Also, we can use Packit-as-a-service for creating Copr repositories for the pull-requests. The state of the project can be seen with the packit status command. This command shows the package versions in each step (upstream repository, dist-git, Koji and Bodhi). There is also other useful information provided by this command. When there is a direct change in dist-git, we can use packit sync-from-downstream to create a pull-request to the upstream project with the changes to the synchronised files.

1. More information can be found on the project's homepage: https://github.com/user-cont/release-bot/.

conu, colin

These are Python tools for testing and linting container images. From the perspective of Packit, they are very similar to Packit itself – no need for source-git, no need for custom steps during the release.

pykickstart

Another Python package with a basic setup. This package initially requested custom steps to be able to generate the spec file automatically before other manipulation. This possibility was implemented as a part of the support for actions and hooks during the workflows, but pykickstart now places the spec file directly in the repository and does not use the actions any more.

docker-py

This is a Python API for the Docker daemon. It has an upstream repository that tends to change its API and break things quite often. Since Fedora does not support versions of Docker higher than 1.13, this package requires a lot of downstream work to maintain. There is a source-git repository with the downstream changes and tests: https://github.com/TomasTomecek/docker-py-source-git. This repository is a traditional source-git example as it was described in Subsection 3.2.


systemd

The setup for systemd is not used yet, but was created for testing purposes and as an example for the systemd maintainers of how it can be done. The systemd maintainers have control over the upstream project, but they are using stable branches as a source for the releases. For their purpose, an extra repository was created (https://github.com/packit-service/systemd-source-git/) to be used as a source-git. It contains three branches:
• master: The branch is not used and contains only a README.md file.
• upstream-v239: A mirror of the v239 git tag from the upstream repository.
• f29: This branch is built on top of upstream-v239 and contains many backported² commits from the master branch of the upstream repository. The last backported commit is tagged (e.g. 239.stable.feb.2019). This tag is used for building upstream stable versions of systemd. The commits on top of that tag are related to Fedora (configuration for Packit, spec files and configuration files). Some downstream changes are there as other commits as well.

This setup (shown in Figure 5.1) is only a slightly more complicated source-git. As the upstream version, we are using the stable tag, not the original version tag. (The backports are done in the upstream repository.)

2. With git rebase or git cherry-pick. This process does not preserve commit integrity (hash, difference, signature).


Figure 5.1: systemd source-git

Conclusion

The aim of the thesis was to look at the current state of packaging in current Linux distributions and to create a prototype for the Fedora Linux distribution. The proposed workflow was discussed with the teams from Fedora and implemented as a tool called Packit.

Contribution

Since the prototype was meant to be a starting point for real usage, the whole code base was not entirely done by the author of the thesis. Here are the author's contributions:
• The very first version and design.
• The introduction of the OGR library and its main design.
• Since February 2019, a team from Fedora has been participating in the project as well (mainly the consultant of this thesis, Tomáš Tomeček). With more people participating in the project, there is a need for many architectural discussions and reviews.
• The main contributions to the project since the adoption of the prototype by the Fedora developers.
• Packaging Packit for Fedora.
• The package review of OGR as a dependency of Packit.

All the author's git commits can be easily found since they are marked as Signed-off-by: Frantisek Lachman and signed by the 13A767B3A7030225 GPG key.³

Future

At the time of writing the thesis, there are already some projects using Packit and benefiting from it. It is expected that more projects will use it as well. With the increasing number of new projects, there are many requests for supporting more specialised workflows. Other improvements are currently being done on the continuous integration level.

3. One can list the author's commits on the following URL: https://github.com/packit-service/packit/commits?author=lachmanfrantisek


Packit-as-a-service will hopefully receive new features, and in the future, users will get all of the benefits just by enabling it for their repositories. All other communication will be done through common means like pull-requests, releases and issues.

Bibliography

1. [devtools, db-scripts] add database signatures [online]. 2011 [visited on 2019-04-30]. Available from: https://bugs.archlinux.org/task/23101?project=1.
2. AARON GRIFFIN AND OTHERS. Arch Build System - ArchWiki [online]. 2019 [visited on 2019-04-30]. Available from: https://wiki.archlinux.org/index.php/Arch_Build_System.
3. AARON GRIFFIN AND OTHERS. DeveloperWiki:HOWTO Be A Packager - ArchWiki [online]. 2019 [visited on 2019-04-30]. Available from: https://wiki.archlinux.org/index.php/DeveloperWiki:HOWTO_Be_A_Packager.
4. AARON GRIFFIN AND OTHERS. DeveloperWiki:Patching [online]. 2019 [visited on 2019-05-01]. Available from: https://wiki.archlinux.org/index.php/DeveloperWiki:Patching.
5. AARON GRIFFIN AND OTHERS. pacman/Package signing - ArchWiki [online]. 2019 [visited on 2019-04-30]. Available from: https://wiki.archlinux.org/index.php/Pacman/Package_signing.
6. AARON GRIFFIN AND OTHERS. PKGBUILD - ArchWiki [online]. 2019 [visited on 2019-04-30]. Available from: https://wiki.archlinux.org/index.php/PKGBUILD.
7. AARON GRIFFIN AND OTHERS. PKGBUILD.proto [online]. 2019 [visited on 2019-04-30]. Available from: https://git.archlinux.org/pacman.git/plain/proto/PKGBUILD.proto.
8. ALLAN MCRAE. Pacman Package Signing – 1: Makepkg and Repo-add [online]. 2011 [visited on 2019-04-30]. Available from: http://allanmcrae.com/2011/08/pacman-package-signing-1-makepkg-and-repo-add/.
9. ALLAN MCRAE. Pacman Package Signing – 2: Pacman-key [online]. 2011 [visited on 2019-04-30]. Available from: http://allanmcrae.com/2011/08/pacman-package-signing-2-pacman-key/.


10. ALLAN MCRAE. Pacman Package Signing – 3: Pacman [online]. 2011 [visited on 2019-04-30]. Available from: http://allanmcrae.com/2011/08/pacman-package-signing-3-pacman/.
11. ALLAN MCRAE. Pacman Package Signing – 4: Arch Linux [online]. 2011 [visited on 2019-04-30]. Available from: http://allanmcrae.com/2011/12/pacman-package-signing-4-arch-linux/.
12. APACHE SOFTWARE FOUNDATION. Apache Subversion [online]. 2018 [visited on 2019-05-10]. Available from: https://subversion.apache.org.
13. BAILEY, Edward C. Maximum RPM (RPM) (Other Sams). 1st ed. Redhat Press, 1997. ISBN 0672311054, 9780672311055.
14. BARTH, Andreas. Debian Developer's Reference [online]. 2007 [visited on 2019-05-07]. Available from: https://www.debian.org/doc/manuals/developers-reference.
15. BOYER, Josh. Scripts to create an exploded kernel tree from the Fedora pkg-git kernel repo [online]. 2019 [visited on 2019-05-08]. Available from: https://pagure.io/fedkernel.
16. CHACON, S.; STRAUB, B. Pro Git. Apress, 2014. The expert's voice. ISBN 9781484200766. Available also from: https://git-scm.com/book/en/v2.
17. CHANDRA, Rakesh. Python requests essentials: learn how to integrate your applications seamlessly with web services using Python requests. Birmingham, UK: Packt Publishing, 2015. ISBN 9781784395414.
18. Copr Buildsystem – COPR documentation [online] [visited on 2019-05-07]. Available from: https://docs.pagure.org/copr.copr.
19. CRAIG RINGER, Devan Goodwin. UpstreamBuilder assumes that the upstream original sources are tagged -1 [online]. 2014 [visited on 2019-05-08]. Available from: https://github.com/dgoodwin/tito/issues/146.
20. DAN MCGEE. Arch Linux and (the lack of) package signing [online]. 2011 [visited on 2019-04-30]. Available from: https://lwn.net/Articles/434990.


21. DAVE REISNER. Arch Build Source Management Tool [online]. 2019 [visited on 2019-04-30]. Available from: https://github.com/archlinux/asp.
22. DEVOLDER, Ike. Arch Linux Environment Setup How-to. Packt Publishing, 2012. ISBN 978-1-84951-972-4.
23. Fedora Copr [online] [visited on 2019-04-28]. Available from: https://copr.fedorainfracloud.org.
24. FEDORA PROJECT. Propose new mechanism to refactor cli codebase [online]. 2016 [visited on 2019-05-08]. Available from: https://pagure.io/rpkg/issue/49.
25. FEDORA PROJECT. Anitya [online]. 2019 [visited on 2019-03-27]. Available from: https://anitya.readthedocs.io/en/stable/.
26. FEDORA PROJECT. Bodhi - bodhi 3.14.0 documentation [online]. 2019 [visited on 2019-04-28]. Available from: https://bodhi.fedoraproject.org/docs/.
27. FEDORA PROJECT. DistGit [online]. 2019 [visited on 2019-04-28]. Available from: https://github.com/release-engineering/dist-git.
28. FEDORA PROJECT. Federated Message Bus [online]. 2019 [visited on 2019-04-28]. Available from: http://fedmsg.com/federated-message-bus/.
29. FEDORA PROJECT. Fedora and Red Hat Enterprise Linux [online]. 2019 [visited on 2019-04-28]. Available from: https://docs.fedoraproject.org/en-US/quick-docs/fedora-and-red-hat-enterprise-linux/.
30. FEDORA PROJECT. Fedora Project Wiki [online]. 2019 [visited on 2019-04-28]. Available from: https://fedoraproject.org/wiki.
31. FEDORA PROJECT. Fedora User Docs [online]. 2019 [visited on 2019-03-27]. Available from: https://docs.fedoraproject.org.
32. FEDORA PROJECT. Fedora's Mission and Foundations [online]. 2019 [visited on 2019-04-28]. Available from: https://docs.fedoraproject.org/en-US/project/.
33. FEDORA PROJECT. fedpkg [online]. 2019 [visited on 2019-04-28]. Available from: https://pagure.io/fedpkg.


34. FEDORA PROJECT. Package Signing FAQ [online]. 2019 [visited on 2019-05-01]. Available from: https://getfedora.org/en/keys/faq/.
35. FEDORA PROJECT. Package Source Control - Fedora Project Wiki [online]. 2019 [visited on 2019-04-25]. Available from: https://fedoraproject.org/wiki/Package_Source_Control#Lookaside_Cache.
36. FEDORA PROJECT. The kernel meta package (upstream) [online]. 2019 [visited on 2019-05-08]. Available from: https://src.fedoraproject.org/rpms/kernel.
37. FEDORA PROJECT. The Koji Build System [online]. 2019 [visited on 2019-04-28]. Available from: https://pagure.io/koji.
38. FEDORA PROJECT. Welcome to rpkg's documentation! [online]. 2019 [visited on 2019-04-03]. Available from: https://docs.pagure.org/rpkg/.
39. FOSTER-JOHNSON, Eric. Red Hat RPM Guide. Illustrated edition. Red Hat, 2003. ISBN 0764549650, 9780764549656.
40. GITLAB INC. GitLab: The first single application for the entire DevOps lifecycle [online] [visited on 2019-04-28]. Available from: https://about.gitlab.com/.
41. GOODWIN, Devan. RFE: Support one patch per tag for UpstreamBuilder. [online]. 2010 [visited on 2019-05-08]. Available from: https://github.com/dgoodwin/tito/issues/4.
42. GOODWIN, Devan. Tito: About [online]. 2019 [visited on 2019-03-27]. Available from: https://github.com/dgoodwin/tito/blob/98a517554b6b3c5c1e44b25691e8acc58b5481c2/README.md.
43. HERTZOG, Raphaël. Debian New Maintainers' Guide [online]. 2010 [visited on 2019-05-01]. Available from: https://www.debian.org/doc/manuals/maint-guide.
44. HUNGER, Steve. Debian GNU/Linux Bible. 1 CD with Debian GNU/Linux 2.2r2. Hungry Minds, 2001. ISBN 9780764547102, 0764547100.


45. IAN JACKSON, Guillem Jover. deb(5) — dpkg-dev — Debian unstable — Debian Manpages [online]. 2019 [visited on 2019-05-06]. Available from: https://manpages.debian.org/unstable/dpkg-dev/deb.5.en.html.
46. IANA. Deb () [online]. 2014 [visited on 2019-05-07]. Available from: http://www.iana.org/assignments/media-types/application/vnd.debian.binary-package.
47. LUCAS, Michael W. PGP & GPG: Email for the Practical Paranoid. 1st ed. No Starch Press, 2006. ISBN 1593270712, 9781593270711.
48. LUTZ, M. Learning Python: Powerful Object-Oriented Programming. O'Reilly Media, 2013. Safari Books Online. ISBN 9781449355715. Available also from: https://books.google.cz/books?id=ePyeNz2Eoy8C.
49. HERTZOG, Raphaël; MAS, Roland. The Debian Administrator's Handbook: Debian Wheezy from Discovery to Mastery. Freexian SARL, 2013. ISBN 9791091414029.
50. NATHAN WILLIS. Arch Linux and (the lack of) package signing [online]. 2011 [visited on 2019-04-30]. Available from: https://lwn.net/Articles/434990.
51. P. RESNICK, Ed. RFC 2822: Internet Message Format. 2001. Technical report.
52. PACKIT TEAM. One Git library to Rule [online]. 2019 [visited on 2019-04-28]. Available from: https://github.com/packit-service/ogr.
53. PACKIT TEAM. Packit: docs [online]. 2019 [visited on 2019-05-09]. Available from: https://github.com/packit-service/packit/tree/master/docs.
54. PALLETS TEAM. Flask [online]. 2018 [visited on 2019-03-21]. Available from: http://flask.pocoo.org.
55. PHILLIPS, Dusty. Arch Linux Handbook: A Simple, Lightweight Linux Handbook. CreateSpace, 2009. ISBN 9781448699605, 1448699606.


56. PIERRE-YVES CHIBON. Docs - pagure - Pagure.io [online]. 2019 [visited on 2019-05-01]. Available from: https://pagure.io/docs/pagure/.
57. PIERRE-YVES CHIBON. Pagure [online]. 2019 [visited on 2019-04-28]. Available from: https://pagure.io/pagure.
58. POSTEL, Jon; REYNOLDS, Joyce. RFC 959: File transfer protocol. 1985. Technical report.
59. PROFFITT, Brian. Introducing Fedora: Desktop Linux. 1st ed. Course Technology PTR, 2010. ISBN 9781435457782, 1435457781.
60. FEDORA PROJECT. github2fedmsg [online]. 2019 [visited on 2019-04-28]. Available from: https://github.com/fedora-infra/github2fedmsg.
61. PYTHON SOFTWARE FOUNDATION. PyPI – the Python Package Index [online]. 2019 [visited on 2019-05-10]. Available from: https://pypi.org.
62. QI, Chenxiong. Thoughts on rpkg2 [online]. 2017 [visited on 2019-05-08]. Available from: https://pagure.io/rpkg/issue/49.
63. RDO PROJECT. RDO OpenStack Packaging [online]. 2019 [visited on 2019-03-24]. Available from: https://www.rdoproject.org/documentation/intro-packaging/.
64. RDO PROJECT. RDO: Frequently Asked Questions [online]. 2019 [visited on 2019-03-24]. Available from: https://www.rdoproject.org/rdo/faq/.
65. RDO PROJECT. Welcome to DLRN's documentation! [online]. 2019 [visited on 2019-05-08]. Available from: https://dlrn.readthedocs.io.
66. RED HAT, INC. rebase-helper: API [online]. 2019 [visited on 2019-03-27]. Available from: https://rebase-helper.readthedocs.io/en/latest/api/index.html.
67. RED HAT, INC. Red Hat - We make open source technologies for the enterprise [online]. 2019 [visited on 2019-05-10]. Available from: https://www.redhat.com/en.
68. RED HAT, INC. rpkg-util [online]. 2019 [visited on 2019-03-27]. Available from: https://pagure.io/rpkg-util.


69. RED HAT, INC. Welcome to rebase-helper [online]. 2019 [visited on 2019-03-27]. Available from: https://rebase-helper.readthedocs.io/.
70. RED HAT, INC. Welcome to rebase-helper: How does it work? [online]. 2019 [visited on 2019-03-27]. Available from: https://rebase-helper.readthedocs.io/en/latest/#how-does-it-work.
71. REPRODUCIBLE BUILDS PROJECT. Buy-in – reproducible-builds.org [online]. 2019 [visited on 2019-04-30]. Available from: https://reproducible-builds.org/docs/buy-in/.
72. REPRODUCIBLE BUILDS PROJECT. Definitions – reproducible-builds.org [online]. 2019 [visited on 2019-04-30]. Available from: https://reproducible-builds.org/docs/definition/.
73. REPRODUCIBLE BUILDS PROJECT. Reproducible Builds [online]. 2019 [visited on 2019-04-30]. Available from: https://reproducible-builds.org/.
74. RITCHIE, Dennis M; THOMPSON, Ken. The UNIX time-sharing system. Bell System Technical Journal. 1978, vol. 57, no. 6.
75. ROSSUM, G van. Python tutorial, technical report CS-R9526. Centrum voor Wiskunde en Informatica (CWI), Amsterdam. 1995.
76. RPM FUSION TEAM. RPM Fusion [online]. 2019 [visited on 2019-05-01]. Available from: https://rpmfusion.org/.
77. SOFTWARE IN THE PUBLIC INTEREST, Inc. et al. About Debian [online]. 2019 [visited on 2019-03-11]. Available from: https://www.debian.org/intro/about.
78. SOFTWARE IN THE PUBLIC INTEREST, Inc. et al. Debian GNU/Hurd [online]. 2019 [visited on 2019-03-11]. Available from: https://www.debian.org/ports/hurd/.
79. SOFTWARE IN THE PUBLIC INTEREST, Inc. et al. What Does Free Mean? [online]. 2019 [visited on 2019-03-11]. Available from: https://www.debian.org/intro/free.


80. UBUNTU DEVELOPERS. Ubuntu Packaging Guide [online]. 2017 [visited on 2019-05-01]. Available from: http://packaging.ubuntu.com/ubuntu-packaging-guide.pdf.
81. VERMEULEN, Sven. Linux Sea [online]. 2016 [visited on 2019-05-10]. Available from: http://swift.siphos.be/linux_sea.

Glossary

dist-git Distribution git, git with additional data storage. It is designed to hold content of source RPMs [27]. ix, 9, 10, 12, 15–17, 20, 27, 28, 32, 36–40, 44, 46, 51, 52, 54–56, 70 lookaside-cache Dedicated service used to avoid saving whole source archives in the repositories [35]. 10, 15, 20, 23, 24, 44, 47, 55 rolling-release Without fixed releases. Arch is one of the rolling-release Linux distribution [22]. 4, 9, 69

ABS Arch Build System, a ports-like package building system that makes it simple to create your own easily installable Arch packages from source, to use and/or share with the community on the AUR [55]. 6
API Application Programming Interface. 10, 22, 23, 25, 33, 43–45, 47, 48, 56
Arch Lightweight GNU/Linux distribution [55]. It is a simple (UNIX-like design and philosophy [55]), but highly-customizable, rolling-release system [55]. 3, 4, 6, 8, 9, 11, 12, 15, 16, 18–20, 69, 72
AUR The Arch User Repository, offering many thousands of build scripts for Arch user-provided software packages [55]. 12, 69

Bodhi Update gating system designed to democratize the package update testing and release process for RPM based Linux distributions.[26] The process is based on +1/-1 karma voting of testers [26]. 12, 13, 15, 41, 55, 56

CLI Command-line Interface. 21, 23, 25, 32, 33, 38, 43–46, 51 Copr An easy-to-use automatic build system providing a package repository as its output [23]. 9, 13, 17, 22, 24, 40, 41, 53, 54, 56

Debian Linux distribution created in 1993. Used as a base for many Linux distributions like Ubuntu, elementary OS, Linux Mint or Kali Linux. 3–5, 8, 11, 13, 14, 16, 18, 20, 73


DNF Software package manager that installs, updates, and removes packages on RPM-based Linux distributions (article DNF in [30]). 13, 19, 56
downstream For an upstream program or set of programs, Fedora can be called downstream (article Staying close to upstream projects in [30]). See upstream for more information. 19, 28, 34, 36, 46, 47, 51, 53, 54, 57
fedmsg FEDerated MeSsaGe bus, originally FEDora MeSsaGing. Broker-less messaging architecture to send and receive messages to and from various services in Fedora Infrastructure [28]. 38, 44, 45, 71

Fedora Linux distribution developed and sponsored by Fedora Project and Red Hat [29]. vii, ix, 1–4, 6, 8–10, 12, 13, 15, 16, 19, 20, 26–28, 31–46, 51, 55–57, 59, 70

Fedora Project a community of people working together to build a free and open source software platform and to collaborate on and share user-focused solutions built on that platform [32]. 4, 13, 52, 70, 71
fedpkg Tooling for working with Fedora's dist-git and artifact build including RPMs, containers, modules [33]. 15, 20

Flask A powerful micro framework for creating Python web applications [17]. 43, 45

FTP File Transfer Protocol, internet standard described in the RFC 959 ([58]): The objectives of FTP are 1) to promote sharing of files (computer programs and/or data), 2) to encourage indirect or implicit (via programs) use of remote computers, 3) to shield a user from variations in file storage systems among hosts, and 4) to transfer data reliably and efficiently. 8
git A free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. 9, 10, 12, 15, 16, 20–28, 32, 34, 36, 38, 43–47, 51, 54, 57, 59, 69, 71, 72, 83


GitHub The single largest host for git repositories, and is the central point of collaboration for millions of developers and projects. A large percentage of all git repositories are hosted on GitHub, and many open-source projects use it for git hosting, issue tracking, code review, and other things. [16]. ix, 10, 38, 43–48, 50, 55, 71 GitHub2Fedmsg An application that rebroadcasts GitHub events on the fedmsg bus [60]. 38, 45 GitLab A single application for the entire software development lifecycle. From project planning and source code management to CI/CD, monitoring, and security [40]. 10, 47, 71 GNU GNU’s Not Unix. An effort of several and developers to create a free, UNIX-like operating system [81]. 18, 71 GNU/Linux UNIX-like operating system. GNU system based on the Linux kernel originally created by in 1991 [44]. 3 GPG GNU Privacy Guard. A free, open-source implementation of PGP from the GNU project GPG aims to be compatible with the OpenPGP Internet standard as defined in RFC 2440 [39]. 18, 19, 21, 44, 51, 54, 59

Koji An RPM-based build system. The Fedora Project uses Koji for their build system (instance hosted at https://koji.fedoraproject.org) [37]. 13, 15, 17, 55, 56

Linux distribution A collection of software (called the packages) bundled together in a coherent set that creates a fully functional environment [81]. vii, 1–21, 27, 59, 69, 70, 72, 73

OGR One Git library to Rule. Library for one API for many git forges. (e.g. GitHub, GitLab, Pagure) [52]. 45, 47, 48, 59, 83

Packit The name of the proposed prototype. The code is hosted at https://github.com/packit-service/packit. vii, 44–47, 51, 53, 55–57, 59, 75, 83

pacman Package manager for Arch Linux distribution, written in C and aims to be lightweight and fast [55]. 12, 16, 18

Pagure Git-centered forge, python based using pygit2 [57]. 47, 48, 71

PGP An original implementation of the OpenPGP standard, whereas GnuPG is a freely available reimplementation of that same standard [47]. 73

PPA Personal Package Archives. Non-standard software sources for Ubuntu [80]. 13

PyPI The Python Package Index (PyPI) is a repository of software for the Python programming language [61]. 44, 55

Python Python is a simple, yet powerful programming language that bridges the gap between C and shell programming, and is thus ideally suited for "throw-away programming" and rapid prototyping [75]. 21–23, 25, 43–45, 47, 55, 56, 70

RDO RPM Distribution of OpenStack [64]. ix, 26, 27, 29

Red Hat Red Hat, Inc. – Provider of enterprise open source solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies [67]. 4, 6, 13, 31, 70

REST REpresentational State Transfer. 10, 47

RPM RPM Package Manager, originally Red Hat Package Manager (before adoption by other Linux distributions) [39]. It can also mean the package format used by this manager [13]. 3, 4, 7, 13, 16, 17, 19–23, 25–27, 40, 44, 54, 69–72
spec file File containing the information used by RPM to create the binary and source packages. The spec file also sets the files that are a part of the package and where they should be placed after the installation [13]. 7, 10, 15, 17, 20–26, 34, 36, 38, 40, 41, 44, 46, 47, 54, 56, 57


SRPM Source RPM. Package containing the sources and instructions to build a binary package [39]. 7, 17, 22, 23, 40, 41, 44, 54

SVN Apache Subversion (SVN), centralized version control system [12]. 8, 15, 16

Ubuntu Linux distribution based on Debian [80]. 3, 13, 69, 72

UNIX A family of multi-user and multitasking operating systems derived from the original AT&T Unix. Development started by Ken Thompson and Dennis Ritchie in 1970 [74]. 3, 69, 71 upstream In free and open source projects, the upstream of a program or set of programs is the project that develops those programs. This term comes from the idea that water and the goods it carries float downstream and benefit those who are there to receive it (article Staying close to upstream projects in [30]). 14, 19, 27, 34, 36, 38, 46, 47, 51–54, 56, 57, 70

URL Uniform Resource Locator. 15, 20, 46, 59

Web of Trust The global network of people who have identified each other and digitally signed each other’s Open-PGP (PGP stands for Pretty Good Privacy) keys. The Web of Trust is composed entirely of links between individuals [47]. 18


A Appendices

A.1 Packit CLI

$ packit --help
Usage: packit [OPTIONS] COMMAND [ARGS]...

Integrate upstream open source projects into Fedora operating system.

Options:
  -d, --debug        Enable debug logs.
  --fas-user TEXT    Fedora Account System username.
  -k, --keytab TEXT  Path to FAS keytab file.
  --dry-run          Do not perform any remote changes (pull requests or comments).
  -h, --help         Show this message and exit.

Commands:
  build                 Build selected upstream project in...
  create-update         Create a bodhi update for the...
  listen-to-fedmsg      Listen to events on fedmsg and...
  propose-update        Release current upstream release...
  srpm                  Create new SRPM (.src.rpm file)...
  status                Display status.
  sync-from-downstream  Copy synced files from Fedora...
  version               Display the version.

$ packit build --help
Usage: packit build [OPTIONS] [PATH_OR_URL]

Build selected upstream project in Fedora.

Packit goes to dist-git and performs 'fedpkg build' for the selected branch.

PATH_OR_URL argument is a local path or a URL to the upstream git repository, it defaults to the current working directory

Options:
  --dist-git-branch TEXT  Target branch in dist-git to release into.
  --dist-git-path TEXT    Path to dist-git repo to work in. Otherwise clone the repo in a temporary directory.
  --scratch               Submit a scratch koji build
  -h, --help              Show this message and exit.

$ packit create-update --help
Usage: packit create-update [OPTIONS] [PATH_OR_URL]

Create a bodhi update for the selected upstream project

PATH_OR_URL argument is a local path or a URL to the upstream git repository, it defaults to the current working directory

Options:
  --dist-git-branch TEXT  Target branch in dist-git to release into.
  --koji-build TEXT       Koji build (NVR) to add to the bodhi update (can be specified multiple times)
  --update-notes TEXT     Bodhi update notes
  --update-type [security|bugfix|enhancement|newpackage]
                          Type of the bodhi update
  -h, --help              Show this message and exit.

$ packit listen-to-fedmsg --help
Usage: packit listen-to-fedmsg [OPTIONS] [MESSAGE_ID]...

Listen to events on fedmsg and process them.

if MESSAGE-ID is specified, process only the selected messages

Options:
  -h, --help  Show this message and exit.

$ packit propose-update --help
Usage: packit propose-update [OPTIONS] [PATH_OR_URL] [VERSION]

Release current upstream release into Fedora

PATH_OR_URL argument is a local path or a URL to the upstream git repository, it defaults to the current working directory

VERSION argument is optional, the latest upstream version will be used by default

Options:
  --dist-git-branch TEXT  Target branch in dist-git to release into.
  --dist-git-path TEXT    Path to dist-git repo to work in. Otherwise clone the repo in a temporary directory.
  --local-content         Do not checkout release tag. Use the current state of the repo.
  --force-new-sources     Upload the new sources also when the archive is already in the lookaside cache.
  --remote TEXT           Name of the remote to discover upstream project URL, If this is not specified, default to origin.
  --upstream-ref TEXT     Git ref of the last upstream commit in the current branch from which packit should generate patches (this option implies the repository is source-git).
  -h, --help              Show this message and exit.

$ packit srpm --help
Usage: packit srpm [OPTIONS] [PATH_OR_URL]

Create new SRPM (.src.rpm file) using content of the upstream repository.

PATH_OR_URL argument is a local path or a URL to the upstream git repository, it defaults to the current working directory

Options:
  --output FILE        Write the SRPM to FILE instead of current dir.
  --remote TEXT        Name of the remote to discover upstream project URL, If this is not specified, default to origin.
  --upstream-ref TEXT  Git ref of the last upstream commit in the current branch from which packit should generate patches (this option implies the repository is source-git).
  -h, --help           Show this message and exit.

$ packit status --help
Usage: packit status [OPTIONS] [PATH_OR_URL]

Display status.

- latest downstream pull requests
- versions from all downstream branches
- latest upstream releases
- latest builds in Koji
- latest updates in Bodhi

Options:
  -h, --help  Show this message and exit.

$ packit sync-from-downstream --help
Usage: packit sync-from-downstream [OPTIONS] [PATH_OR_URL]

Copy synced files from Fedora dist-git into upstream by opening a pull request.

PATH_OR_URL argument is a local path or a URL to the upstream git repository, it defaults to the current working directory

Options:
  --dist-git-branch TEXT  Source branch in dist-git to sync from.
  --upstream-branch TEXT  Target branch in upstream to sync to.
  --no-pr                 Do not create a pull request to upstream repository.
  --fork / --no-fork      Push to a fork before creating a pull request.
  --remote TEXT           Name of the remote where packit should push. If this is not specified, push to a fork if the repo can be forked.
  -h, --help              Show this message and exit.

A.2 Sources

• Packit project: https://github.com/packit-service/packit – Snapshot of the git repository is included as a digital attachment in the IS.

• OGR project: https://github.com/packit-service/ogr – Snapshot of the git repository is included as a digital attachment in the IS.

• Sources for the thesis itself, build setup and other resources: https://gitlab.fi.muni.cz/xlachma1/diploma-thesis
