
Masaryk University Faculty of Informatics

Attacks on Package Managers

Bachelor’s Thesis

Martin Čarnogurský

Brno, Spring 2019


This is where a copy of the official signed thesis assignment and a copy of the Statement of an Author is located in the printed version of the document.

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Martin Čarnogurský

Advisor: Mgr. Vít Bukač, Ph.D.


Acknowledgements

I would like to thank Mgr. Vít Bukač, Ph.D., RNDr. Václav Lorenc, and Mgr. Patrik Hudák for their continuous support over the years. I would not have been able to go so far, both in my education and professionally, without them, and for this I forever owe them my gratitude.

Abstract

The primary focus of this thesis is to analyse the current state of various package managers regarding security mechanisms related to selected malicious attacks, such as typosquatting or the distribution of malicious packages. The analysis also provides insights into the differences between the security mechanisms used in OS-level managers, smartphone application marketplaces, and the primary focus of this thesis: community repositories of libraries used by developers. We then propose several monitoring mechanisms as a proof of concept to detect malicious intent, ongoing attacks or as-yet-unknown vulnerabilities. The implemented system computes a risk score using heuristics that are language independent where possible; it is evaluated against real data from the Python Package Index.

Keywords: typosquatting, attack, malware


Contents

Introduction

1 Overview of the ecosystem
  1.1 Package managers
    1.1.1 Debian package manager
  1.2 Package managers for developers
    1.2.1 The Python Package Index

2 Anatomy of a Python package
  2.1 The Setup Script
  2.2 Package Installer for Python (PIP)
  2.3 Source distributions
  2.4 Binary package format
  2.5 The Wheel Binary Package Format

3 Previous Incidents

4 Attack vectors and threat models
  4.1 Source code modifications
  4.2 Typosquatting
  4.3 Bait packages

5 Analyzing packages on a global scale
  5.1 Existing tools and frameworks
  5.2 Static Analysis
    5.2.1 Abstract Syntax Tree
    5.2.2 Tree transformation and analysis
  5.3 Aura framework
    5.3.1 apip
  5.4 Global PyPI scan findings

6 Conclusions and Future Work
  6.1 Future work
  6.2 Conclusions

Glossary

A Appendix
  A.1 Live analysis
    A.1.1 Comparison of static analysis vs. live analysis approaches
  A.2 setup.py from the talib package

B ssh-decorate incident evidence

C Built-in Aura analyzers
  C.1 Produced hits

List of Figures

4.1 An example of a typosquatting package on PyPI when searching for a package scikit
4.2 Screenshot of a package that is already included in Python 3.3 but available for download on PyPI
5.1 Detections found during the latest scan
5.2 Screenshot of a typosquatting package
B.1 A screenshot of the opened GitHub issue by user mowshon after he found the malicious code
B.2 A screenshot of the malicious code


Introduction

Package managers are widely used in various areas, ranging from OS-level installation of software frequently used in systems to development libraries and the installation of smartphone applications. In this thesis, we aim to analyze various attack vectors on package managers used by developers and demonstrate a proof-of-concept monitoring system, developed from scratch, to address these issues. A quick introduction to package managers, their usage and how they operate is given in Chapter 1.

A majority of the package managers in question are community-based. In this context, no central authority needs to confirm when a new version of a package is uploaded or when an entirely new package is created. On one hand, this provides a low-barrier opportunity to contribute to open source and enables faster release cycles; on the other hand, it means that anyone can upload anything, which presents various exciting scenarios for malicious attacks. To understand how these attacks are performed, we first need to understand the basics of how packages themselves are used; this is covered in Chapter 2.

Several different threat vectors have been identified. We have seen from past incidents that a typical type of attack is the so-called typosquatting attack [11], now seeing a reincarnation in the world of package managers. Other forms of attack include hijacking existing packages or creating bait packages with attractive names, trying to lure developers into installing them. We compiled a brief overview of the notable incidents in Chapter 3. Since packages are not isolated components but typically have dependencies on other packages, the compromise of a package can propagate much further and faster into other packages. We discuss this topic in more depth in Chapter 4.

In Chapter 5, we present a proof-of-concept system, called Aura, that we created from scratch after extensive research and that can scan terabytes of data and find anomalies in the packages published on the PyPI repository. This goal is accomplished by using a highly-optimized hybrid analysis engine that tracks the code execution flow and defeats a selected set of code obfuscations. We further discuss these techniques in the associated chapter, as well as the difficulties of creating this engine. At the end of the chapter, we present interesting findings that we extracted from the dataset gathered by scanning the whole PyPI repository. Thesis conclusions and steps that could be taken in the future are provided in Chapter 6.

1 Overview of the ecosystem

In this chapter, we discuss the roles of package managers and briefly how they operate. As a baseline, we look into the Debian package manager, which in the context of this thesis is an ideal and mature model for how the package operations and ecosystem should work from the security point of view. Afterwards, we look into the package manager for Python, called pip.

1.1 Package managers

Package managers 1 are currently present in various areas of computer systems where a user can install missing software for her needs, avoiding the complicated process of manual installation. Users usually select the application they wish to install from a list of available applications, and the package manager installs this application; in most cases, it also performs any default configuration that is needed. One of the main benefits of such systems is that they also handle dependencies, i.e., cases where the installed package requires other packages to be installed for its functionality. These systems are commonly referred to as package managers and are available in modern operating systems and smartphones. As they are often used by non-technical people, they usually have several security mechanisms that aim to prevent the compromise of the end-user system, block malicious intent or mitigate the spread of a potentially exploitable vulnerability. In most cases, this is achieved by a central authority that manages the repository of available software, requiring the approval of every published piece of software and its different versions, in combination with static analysis to flag packages that need a human review.

1. Sometimes also called Software Managers or Application Managers. The name often depends on context; for example, in programming languages, the preferred term is Package Manager, since the installed software is often just a set of libraries and not a directly executable application.


1.1.1 Debian package manager

A Debian package2 is a collection of files that allows applications or libraries to be distributed via the Debian package management system. There is also a Linux distribution with the same name, Debian [21], that uses the Debian package management system as its core software manager, hence the name of this distribution. Any given package consists of one source package and one or more binary components, with the structure defined by the policy3, although there are numerous techniques for creating these files.

A significant note here is that every officially published package needs to have associated source code, built by maintainers. Packages that are already compiled and contain binary components are not accepted4, in order to ensure that all published packages originate from the provided source code without any (sometimes malicious) additions. Such a mechanism also allows for independent audits to verify that distributed packages are unmodified before being published. This mechanism is called reproducible builds5. Although reproducible builds in Debian are not available for all packages at the time of writing, there is a great effort to increase the coverage and essentially provide them for all officially distributed software in Debian.

Accepting a new package (or a new version of an existing package) into the official repository usually goes through several steps, such as putting it in a testing or unstable area6 for a period, which mitigates several attack vectors discussed further in this thesis. Additional extensions, such as Debsigs7, allow extending the standard package model by providing support for digital signatures and verification using PGP.

2. https://wiki.debian.org/Packaging
3. https://www.debian.org/doc/debian-policy/
4. There are special repository areas, such as contrib or non-free, that allow such packages.
5. https://wiki.debian.org/ReproducibleBuilds
6. https://www.debian.org/releases/sid/
7. https://tracker.debian.org/pkg/debsigs

1.2 Package managers for developers

Special software managers exist for developers; they are used to install specific libraries that developers then use in their programs. They are commonly referred to simply as package managers. Their functionality is very similar to software application managers, including the resolution of dependencies and updating libraries to their latest versions. Most of the popular development languages use their own dedicated package systems/managers to help users distribute their libraries. Unfortunately, these usually do not have a central authority approving packages, or they have very loose security mechanisms without frequent audits of published packages. This principle goes hand-in-hand with the idea of enabling rapid release cycles and providing a low barrier to entry for anyone to contribute, as developers do not need to wait for approvals, audits or other workflows to complete.

1.2.1 The Python Package Index

The Python Package Index8 (PyPI), combined with a client-side application called pip9, is the most frequent way of installing libraries/packages for Python [3]. Its model is community-driven: anyone can sign up for an account and begin publishing packages immediately. Every package is published under its name, which also acts as a namespace10, and is assigned to one or more categories, such as Intended Audience :: Developers or Topic :: Software Development :: Build Tools. A given package can list required (and optional) dependencies on other packages, including their versions, which must be installed and satisfied by a package manager before attempting the installation of the package itself. These package relationships can essentially be projected as a directed graph-like structure. More in-depth anatomy and functionality is described in Chapter 2.

8. https://pypi.org/
9. https://pypi.org/project/pip/
10. https://www.python.org/dev/peps/pep-0541/
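To make the dependency resolution concrete, here is a minimal sketch (not pip's actual resolver) showing that a valid installation order is simply a topological ordering of the dependency graph; the dependency data below is hypothetical:

# A minimal sketch: package dependencies form a directed graph, and a
# valid installation order is a topological ordering of that graph.
def install_order(dependencies):
    """Return packages ordered so that dependencies come first."""
    order, visited = [], set()

    def visit(pkg):
        if pkg in visited:
            return
        visited.add(pkg)
        for dep in dependencies.get(pkg, []):
            visit(dep)  # install dependencies before the package itself
        order.append(pkg)

    for pkg in dependencies:
        visit(pkg)
    return order

deps = {"my-app": ["requests"], "requests": ["urllib3", "idna"]}
print(install_order(deps))  # ['urllib3', 'idna', 'requests', 'my-app']

Note that this sketch omits cycle detection and version constraints, both of which a real resolver has to handle.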


2 Anatomy of a Python package

In this chapter, we describe the format of a typical package as it is distributed by official channels to the target developers. First, though, we need to understand how the installation of a package is done manually, as the package manager and formats are just an abstraction over this process.

Over the years, several standards (regarding data structure as well as workflow) have been developed, and some of them are still maintained and considered official. These standards are governed by the Python Packaging Authority (PyPA), which is also the maintainer of the Python Package Index (PyPI), the global standard content delivery network (CDN) for distributing the packages. From our point of view, rather than describing each format in detail, we focus on the parts that could be leveraged by a threat actor to achieve malicious code execution. Referenced Python Enhancement Proposals (PEP) [3] and external documentation further in the text provide more technical details if needed, as they serve as the official standard.

2.1 The Setup Script

At the heart of most of the different package formats for distributing a package lies a script called setup.py. This script provides the main details about the distribution and additional metadata used by a package manager, as well as the PyPI repository. Here is a simple illustration of the content of the setup.py file:

#!/usr/bin/env python

from distutils.core import setup

setup(name='Distutils',
      version='1.0',
      description='Python Distribution Utilities',
      author='Greg Ward',
      author_email='[email protected]',
      url='https://www.python.org/sigs/distutils-sig/',
      packages=['distutils', 'my_package'],
      )

In the example, we declared a distribution with the name Distutils that provides the packages distutils and my_package. This specifies that there are directories with the names listed in the packages attribute, which should be installed under the given name (Distutils). It is important to note that this setup.py script is itself executable Python code, as can be seen from the *.py extension. This means that, theoretically, there is nothing preventing threat actors from inserting malicious executable code that would be executed upon installation. Achieving code execution can be done in multiple ways (the second technique is sketched after this list):

∙ The first obvious way is inserting the malicious code directly, just before the call to the setup function from distutils.

∙ The setup function also supports overriding the various setup commands (install, develop, tests) and providing one's own implementation of such commands.1

∙ The Python language itself supports modules written in the C/C++ language, called extensions. Upon installation, these extensions are compiled, and the package developer can define how the compiler is executed.
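To illustrate the second technique, the following is a minimal, deliberately harmless sketch (not taken from any real incident) of a setup.py that overrides the standard install command:

# Illustrative only: a setup.py can run arbitrary code during installation
# by overriding the standard "install" command via cmdclass. The payload
# here is a harmless print().
from setuptools import setup
from setuptools.command.install import install

class BackdooredInstall(install):
    def run(self):
        print("arbitrary code runs here, with the installing user's rights")
        install.run(self)  # continue with the normal installation

setup(
    name="example-package",
    version="1.0",
    cmdclass={"install": BackdooredInstall},  # hook the install command
)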

A standard method of installing a Python package from an unpacked archive/directory is by running the command python setup.py install. At this point, code execution is performed, since the script is directly executed.

2.2 Package Installer for Python (PIP)

Manual installation of a Python package can be very time consuming, as it involves obtaining the package (which can be platform dependent),

1. An example can be found in Appendix A.2; the cmdclass keyword there defines the override of the setup commands.


unpacking the archives containing the code, installation of dependencies and finally installation of the package itself by running python setup.py install, as described in the previous section. While this process is not problematic for the installation of a single package, and in fact is still frequently used these days, we can easily see that it does not scale to more complex projects.

The first attempt to automate the installation process was a helper script called easy_install that was part of the default Python installation. As easy_install was the first attempt at standardization, it also had several problems, such as corrupted partial installations (on error), no uninstallation of packages, no caching and many more. Over the years, a new project called pip was created that was meant to replace easy_install and address most of these problems. Eventually it did so; easy_install was deprecated long ago, and the official package manager as declared by PyPA is now pip. Because of that, we will not cover anything related to easy_install in this thesis, and all functionality and problems are discussed from the perspective of pip.

The installation of a Python package using pip is done by executing the command pip install package-name. The following simplified workflow is performed:

1. Requirements are gathered, including the package to be installed and its dependencies. Release and version constraints are applied (e.g., allowing only a specific version to be installed). This step can be viewed as dependency resolution.

2. Gathered dependencies are retrieved either from cache or by downloading them from the PyPI mirror.

3. The dependency chain is processed by unpacking the archives at a temporary location and running python setup.py install in the same manner as a manual installation.

4. Installation of packages is finished, and results/summary are reported to the user.


It is also possible to install multiple dependencies at once; those are usually listed in a file called requirements.txt that can be passed to pip using the "-r" switch. This is currently the most widespread method for installing Python software dependencies in a manual way.

2.3 Source distributions

The legacy format for distributing Python packages is called source distribution, or sdist 2 for short. It is designed to be completely platform independent. It is a very simple archive that contains the source code of a Python package (by default). The content of the package can be customized by using a MANIFEST.in template, which whitelists/blacklists paths and files to be included inside the archive for distribution. During the installation of an sdist package, the workflow is the same as for a manual installation.

2.4 Binary package format

During the installation of Python source code files related to a package, an optimization step can be executed that compiles the source code into Python bytecode (a *.pyc or *.pyo file). This has the benefit of decreasing the startup time, as such code is already compiled into a form suitable for the Python interpreter3. Sdist packages also needed to be redistributed as a regular archive, which posed a difficulty on some operating systems, such as Windows. Binary package formats (also called built distributions)4 are an extension to source distributions. Python packages can include C/C++ extensions that would need to be compiled from source code during the installation process. The binary package format allows a pre-compiled binary extension to be included in a package, hence the name. Apart from including binary blobs, the other major change is more platform-friendly package formats. It was difficult for a developer to install a "*.tar.gz" archive under

2. https://docs.python.org/3/distutils/sourcedist.html
3. Sometimes also called a virtual machine; however, in Python, the preferred term is interpreter.
4. https://docs.python.org/3/distutils/builtdist.html


Windows or inside Linux without breaking the OS's software manager. Binary packages include additional formats that are native to the target platform, e.g., "exe" or "msi" for Windows, "rpm" for certain Linux distributions, etc.

Using binary distributions can arguably be less secure, due to the usage of binary data that is not easily auditable. Creating binary blobs (exe, pyc files) is non-deterministic, making it relatively easy to include malicious code, as external verification would be hard to do.

2.5 The Wheel Binary Package Format

A new PEP 427 [7] was proposed and eventually accepted that described the Wheel Binary Package Format ("wheel" for short), which is now the de facto official and recommended way to distribute Python packages5. It is a ZIP archive that simplifies the interaction between the build system and the installer. This format is an evolution of the binary distributions described above, explicitly designed to fit better into the new ecosystem of Python packages that evolved over the years. In theory, the package can be installed just by unpacking it at the correct location.

The introduction of binary distributions did not fully solve the problem of compiling the dependencies. These distributions permitted including binary blobs (pre-compiled code) in the package itself, but this benefit, gained by pre-compiling the code, has decreased significantly. Several factors are causing this, such as the many different platforms (Linux, Windows, etc.), Python versions (2 & 3) and even Python implementations (CPython, PyPy, Jython, etc.).

5. There were also several other formats created before wheels were standardized; those formats are, however, not entirely official, as they do not have any kind of explicitly written specification, such as the Egg format: https://packaging.python.org/discussions/wheel-vs-egg/

The package itself would need to recompile the package sources6 in most cases anyway, due to not exactly matching the original environment.

This problem was solved by proposing file naming conventions as part of the PEP, where information about the platform, Python version/implementation and additional metadata is encoded directly in the name of the package. This means that the package manager itself can select a package for installation that matches the target environment. As a fallback option, the sdist is used if a wheel matching the constraints cannot be found.
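For illustration, the sketch below shows how the tags can be recovered purely from a wheel file name under the PEP 427 convention; the file name is an example, and the parsing is simplified (real tools also handle build tags and name normalization):

# A small sketch of how an installer can select a compatible wheel purely
# from its file name (PEP 427: name-version[-build]-python-abi-platform.whl).
def parse_wheel_name(filename):
    parts = filename[:-len(".whl")].split("-")
    # the build tag is optional, so a name has 5 or 6 dash-separated parts
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3], parts[-2], parts[-1]
    return {"name": name, "version": version, "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}

print(parse_wheel_name("lxml-4.3.0-cp37-cp37m-manylinux1_x86_64.whl"))
# {'name': 'lxml', 'version': '4.3.0', 'python': 'cp37',
#  'abi': 'cp37m', 'platform': 'manylinux1_x86_64'}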

A very important new feature introduced as a part of this proposal was a file called METADATA in the dist-info directory of the wheel package. This file contains the metadata, such as package name, author, extensions, etc., which in previous formats were part of the setup.py file. When the package is created (on the developer's side), the package information is parsed and placed in this METADATA file, which the installer can then read without executing the setup.py file. As a consequence, it is now possible to install a wheel package without any code execution, an important milestone in terms of security.

Another interesting feature from the security point of view is a new special file called RECORD. This file is in CSV format and contains a list of (almost) all files within the package and their checksums. The PEP standard also includes support for a RECORD.jws file, which is a digital JSON web signature (or RECORD.p7s for the S/MIME signature type) that allows developers to digitally sign the content of a package before publishing it on PyPI. Unfortunately, it is very rare to find a package using this feature, as there is neither the infrastructure to manage the chain of digital signatures/certificates, nor the tools to validate them. The installer (pip) is also supposed to check the content of a wheel package against the RECORD file to at least validate the checksums. However, this is not the case7, and no such checks are performed

6. In the case of Python bytecode (*.pyc), at least not until Python 3.7, which introduced deterministic compilation; see https://www.python.org/dev/peps/pep-0552/
7. https://github.com/pypa/pip/issues/3513

currently by pip. There is an open issue on GitHub discussing this feature, as it is known that some packages have manipulated the content of a wheel package8.

8. https://github.com/pypa/wheel-builders/issues/1
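The check described above can be sketched as follows. Each RECORD row has the form path,algorithm=urlsafe-base64-digest,size; this simplified example (a sketch with error handling omitted, not pip's code) recomputes and compares the digests inside an unpacked wheel directory:

# Recompute each file's hash inside an unpacked wheel and compare it
# against the RECORD entry; a mismatch indicates a manipulated package.
import base64, csv, hashlib, os

def verify_record(unpacked_wheel_dir, record_path):
    with open(record_path, newline="") as fd:
        for path, digest, _size in csv.reader(fd):
            if not digest:           # the RECORD file itself has no hash
                continue
            algo, _, expected = digest.partition("=")
            with open(os.path.join(unpacked_wheel_dir, path), "rb") as f:
                raw = hashlib.new(algo, f.read()).digest()
            actual = base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
            if actual != expected:
                print(f"checksum mismatch: {path}")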


3 Previous Incidents

In this chapter, we describe various security incidents from the past. As the number of incidents is rising very quickly, due to increased traction on the threat actor side, we selected only a few of them to document here. The selection of incidents is based mainly on the new attack vectors that were used or on their total impact. Most of these attacks were discovered by sheer luck.

I.1 ssh-decorate May 2018 [22]

Description: In May 2018, user mowshon created an issue on GitHub for a project called ssh-decorate, asking the author why private user credentials were being sent to a remote web server, hinting that it was part of malware. After the issue was brought to the author's attention, he claimed that the alleged backdoor was not intentional and that he was not aware of it. It is important to mention that the malicious code was not present in the GitHub repository but only inside the released packages.

Impact: Affected users had their server credentials sent to the attacker-controlled endpoint during each connection to the server, including username, password, and their SSH key.

Analysis: Unfortunately, the associated repository was removed after the news started appearing on popular websites, although several users who were able to access it before removal claimed that they found the ssh-decorate author's PyPI credentials in the commit history. We were able to obtain screenshots of the GitHub issue and a snippet of the malicious code; see Appendices B.1 and B.2. The most accepted conclusion is that the threat actor was able to find these credentials (which were still valid) and uploaded a modified version to the PyPI repository, which included a malicious backdoor.


I.2 getcookies May 2018 [19]

Description: In early May 2018, the npm security team received a report of a package that masqueraded itself as a cookie-parsing library, called getcookies.

Impact: The significance of this incident lies in the fact that this package was used as a dependency of another package, called mailparser, which had around 64,000 weekly downloads at the time of the incident's discovery. Although the number of installations reached several thousand, upon closer analysis the npm team concluded that luckily only a small percentage of those users were impacted, due to how this dependency was used.

Analysis: This backdoor was looking for specially crafted HTTP headers from a C&C server and executing the code provided within them. The HTTP headers were modified with simple obfuscations1.

I.3 event-stream November 2018 [18]

Description: On the 20th of November 2018, a GitHub issue was created for an npm library, event-stream, claiming it contained malicious code that was inserted by a user known as right9ctrl.

Impact: Precise numbers are unknown, but the number of impacted users is estimated to be very low, as stated by the npm security team in their incident report. This is because the malicious code was checking the environment for a set of specific conditions, aimed mostly at enterprise environments. This malware targeted cryptocurrency wallets by hooking into other (legitimate) functions; it captured the users' private keys with additional information and sent them to a remote server in Kuala Lumpur.

Analysis: The package in question was no longer actively maintained by the original author, who was seeking a new maintainer/volunteer to continue the development. This threat

1. https://snyk.io/vuln/npm:getcookies:20180502

actor offered to continue the development and shortly afterward injected a specially-created dependency, which contained the malicious code. Several obfuscation mechanisms were also included, such as encrypted & heavily-obfuscated payloads and the exclusion of source code versions that were intended for humans (non-minified source code).


4 Attack vectors and threat models

In this chapter, we discuss various attack scenarios that can be performed by a malicious threat actor. These scenarios leverage how the package ecosystem currently works, and most of them are based on real-life incidents. Our focus is to demonstrate several ways of achieving code execution, which we consider, in the context of this thesis, a successful compromise of a system.

4.1 Source code modifications

This type of threat assumes, in most cases, that the target package was developed as legitimate by its original authors; however, threat actors later modified the code to include malicious functionality. We call this type of attack package hijacking. There are several different ways to introduce malicious modifications to an already-existing package.

The first method leverages stolen credentials of the author. This can be accomplished by re-using credentials across systems that were previously compromised in a different breach. It is not uncommon for inexperienced developers to accidentally leak their private credentials or access tokens [23, 9] by committing them by accident to a Git repository (or another version control system). Often, they do not realize that, even when they delete them later, they are still visible in the commit history, and a change of credentials and revocation of all associated access tokens is needed to mitigate the potential breach. This was the probable cause of the incident where the legitimate package ssh-decorate was hijacked (see I.1).

The second method for hijacking a package abuses the trust of authors [20]. Many projects are abandoned over time, due to the original authors no longer being able to continue the development. For more popular or more prominent projects, authors usually publicly announce that they are seeking a new maintainer to keep the project alive and continue the development. Another reason might be that

the development team needs to be expanded, or an external developer is claiming to have implemented a desired feature. An example of this is the event-stream incident I.3.

4.2 Typosquatting

This well-known technique has been a common attack vector in past incidents [16, 15]. The most common installation methods are done via the command line by manually typing the name of the package to be installed (for example: pip install requests). By default, there are no further confirmations to proceed with the installation, as long as a package with that name exists. As package names often contain very technical terms, abbreviations or version numbers, it is easy to make a mistake in the package name, especially for a non-fluent English speaker.

Figure 4.1: An example of a typosquatting package on PyPI when searching for a package scikit


Typosquatting was first used on the internet to target websites. Over the years, browsers evolved to combat this threat by various means, such as SSL certificates, domain reputation scores, active threat monitoring and many more [24]. While it is still an ongoing battle in the case of phishing websites, the use of this technique to target developers by typosquatting package names is completely new, and defensive mechanisms, including active security scans, are yet to be developed.
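As a minimal illustration of how such a defensive check could work (a sketch, not a production mechanism; the list of popular names is a stand-in for a real reputation feed), a package name can be compared against known popular packages using a string-similarity threshold:

# Flag a package whose name is suspiciously similar to, but not identical
# with, a more popular package.
import difflib

POPULAR = ["requests", "urllib3", "scikit-learn", "numpy"]

def possible_typosquat(name, threshold=0.85):
    # get_close_matches uses a similarity ratio; near-identical names match
    hits = difflib.get_close_matches(name, POPULAR, n=1, cutoff=threshold)
    return hits[0] if hits and hits[0] != name else None

print(possible_typosquat("reqeusts"))   # 'requests'
print(possible_typosquat("numpy"))      # None (exact name, not a squat)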

4.3 Bait packages

A slightly different but related vector is creating bait packages. Python has several built-in modules that come by default when it is installed. Threat actors started creating malicious packages using the names of built-in modules. A developer might then install such a package, not knowing that it is already available on their system. An example of such a package is shown in Fig. 4.2.

Previous studies showed that there is a prevalent number of so-called trivial packages 1, which consist of only a few lines of code. At the same time, a survey revealed that trivial packages are used because they are perceived to be well-implemented and well-tested pieces of code. Code re-use is also often encouraged, and most developers believe it increases their productivity [12]. This has the effect of creating large chains of dependencies [14], meaning that the compromise of a single package can have a potentially disastrous effect by propagating into literally thousands of other packages.

We were able to observe the effect of these large dependency chains when the author of a package called left-pad decided to remove it from the NPM repository2. This package, which had 11 lines of code, caused disruptions at large corporations, such as Netflix, Airbnb, and Facebook, only because the left-pad package was included deep in the software dependency chain.

1. Also called micropackages.
2. https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how-to-program/


Figure 4.2: Screenshot of a package that is already included in Python 3.3 but available for download on PyPI

5 Analyzing packages on a global scale

In this chapter, we discuss the various tools that we explored at the beginning of our research. The usability of these tools is tied to their ability to do execution flow analysis and to transform the AST tree to defeat simple forms of obfuscation. Due to the inability of existing tools to fulfill these requirements, we present here the Aura framework, which we created for this thesis. We used this framework to scan the whole PyPI repository to find anomalies that could indicate malicious behavior or lead to an incident. At the end of the chapter, we list remarks relevant to conducting future research in this area.

5.1 Existing tools and frameworks

During our research, we evaluated multiple tools and frameworks to assess their suitability for this research. The primary area that we were looking at was tools that do static analysis on top of the Abstract Syntax Tree (AST; explained in 5.2.1) that is parsed from the source code. Another important feature was the ability to apply transformations to such a tree, as that is the main limiting factor of usability in the domain of malicious code.

One of the first tools that we found was called Spoofax [4], which is a framework for researching abstract syntax trees (in terms of developing grammars, parsers, etc.). A part of this toolkit called Stratego is an AST transformation engine. Stratego is one of the most powerful transformation engines available these days but, unfortunately, it is designed to work only under the Java programming language, making it unsuitable for our use, as our framework is written in Python.

The next tool we evaluated was Bandit [25], a security linter for analyzing Python source code. When we started the development of our framework, we did so by extending this framework and creating analyzer plugins. The feature set of Bandit closely matched what we were looking for, including analyzers working on top of the AST tree,

23 5. Analyzing packages on a global scale built-in transformations of AST trees and a lot of examples that were already addressing some of the areas of our research. However, after several days of development, we ran into multiple blocking issues. The biggest issue was that it is not simple to conduct a global scan of the python repository. It is restricted to using a built-in AST parser that can parse only the python source code of the python version under which Bandit is installed. This means it would cover only a small percentage of all the packages in a repository. Secondly, there was no easy way to support scanning non-python (source code) files, unpacking archives (needed for scanning packages) or assigning metadata. Also, one of the last big blockers we found was the AST transformation. From the online documentation and discussions, it appeared that the Bandit has built-in AST transformations. However, during the evaluation, it was not able to recognize even the simplest form of obfuscations (such as string concatenation). Due to these issues and a few others, we concluded that it would be more beneficial to start from scratch, rather than modifying most of the Bandit source code. When designing our system for this research, we took much inspiration from Bandit.

During the development, we also found a project called Coala [26] and the related project CoAST1. Coala is a static code analyzer designed to suggest fixes for source code that would improve the overall quality of the code. It is not designed to do analysis/audits in terms of security; however, it contains an integration with the Bandit tool to do so. After evaluating Coala, we found the same issues as with the Bandit tool, and we also received recommendations on the official discussion channel to look into Bandit, which is better suited for our purpose than Coala.

One of the main reasons we looked into Coala was the discovery of CoAST, which claims to be a Universal Abstract Syntax Tree2, independent of the target programming language used by the Coala framework. This would have significant implications, namely the ability to audit source code regardless of the language and not be restricted

1. https://coast.netlify.com/
2. Later on, we also discovered Babelfish, which has similar goals: https://doc.bblf.sh/


to just Python, as is currently our case. After further inspection, we found that CoAST is not yet used by Coala but is instead proposed as a future replacement, and the current state of the project is highly experimental, far from being usable by other complex projects. In the future, if CoAST reaches a mature state, it would be an excellent base for this kind of research.

5.2 Static Analysis

In general, there are different methods for the detection of malicious code or malware itself. At present, it is common to use a live analysis (also called dynamic or sandbox analysis) approach: running a sample inside a sandboxed environment that observes its behavior. On the completely opposite side, we have the static analysis approach. This approach aims to deduce the behavior and characteristics of a given sample just by analyzing its code, without any execution, as is the case with sandboxes. Both approaches have weaknesses [2] and specific strengths [5].3 An overview of the steps done by Aura during the static analysis is listed below:

1. Find the correct Python interpreter for the input source code.

2. Parse the source code into the AST tree.

3. Transform the tree by applying the reduction rules (partial evaluation).

4. Perform execution flow analysis and matching on the AST tree.

5. Collect hits generated by analyzers and compute the security score.

Static analysis requires a completely different set of tools, most of which work on top of parsed code. This set of tools attempts to analyze the flow of the execution graph and its various functional components. Part of the functionality also usually works as a transformation mechanism, which transforms the original code into another

3. A brief description of the live analysis approach is located in Appendix A.1.

25 5. Analyzing packages on a global scale one (equivalent if executed), but with a better semantic value. These transformations are meant, for example, to defeat a simple/trivial ob- fuscation mechanism employed by malware, although they have more limitations and effectiveness then live analysis approach. This makes it one of the biggest differentiators when choosing which approach to use. A swift comparision to the live analysis approach is located in Appendix A.1.1.

For the implementation of the Aura framework, we chose a hybrid static analysis approach. Hybrid in this context means that we have primarily static methods, with support for partial execution to address some simple obfuscation mechanisms. This approach was chosen for the following reasons:

∙ There is no standard entry point for Python source code in the case of libraries/modules – as discussed in previous chapters – meaning we can't just "execute" the sample in a sandbox.

∙ Our system is designed to run on large-scale repositories. The live analysis approach is not easily scalable, since we would like to monitor the whole repository – meaning every package ever published, and not just on demand.

∙ Python by itself runs on a variety of environments and systems (Windows, Linux, embedded devices). There are also multiple versions of Python, notably 2 and 3, which are not compatible with one another. A live analysis would, in this case, have only one specific sandbox configuration, meaning very little coverage; running Python code on a different system or version would likely result in a failure. Solving this by having multiple sandbox configurations would significantly increase the resources needed to run this system.

5.2.1 Abstract Syntax Tree

To perform a static analysis of source code, we need a mechanism to understand its semantics in a machine-friendly way. We use the same method that compilers and interpreters use to translate the source code into executable instructions. This is achieved by constructing an


Abstract Syntax Tree (AST) from the input code. Each node in the tree represents a construct present in the original code, such as assignment expressions, conditions, function calls and so on. It also preserves the order of the instructions, which makes it suitable for conducting data/execution flow analysis or optimizations by transforming the tree. This syntax is "abstract" because it does not contain every detail from the source code, such as parenthesis symbols, semicolons, indentation, comments, new lines, and other inessential elements. Apart from being transformed, the AST tree can be enriched with additional information, which would not be possible with source code, as that would imply changing it.
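The standard Python ast module makes this structure visible: parsing a one-line assignment yields nested nodes for the statement, its target and the binary operation, with comments and formatting discarded (the output below is abbreviated; exact node names vary across Python versions):

# Parse a tiny snippet and print its abstract syntax tree.
import ast

tree = ast.parse("x = 2 + 2")
print(ast.dump(tree))
# Module(body=[Assign(targets=[Name(id='x', ctx=Store())],
#              value=BinOp(left=Constant(value=2), op=Add(),
#                          right=Constant(value=2)))], ...)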

There are several other approaches to consider. The simplest one is using regular expressions for extracting information; however, from automata theory, it is known that regular expressions have very limited power in recognizing languages, such as no ability to match brackets/parentheses, which is required for our purposes. Admittedly, we have integrated Yara support into the framework, which can be considered a more powerful extension of classic regular expressions; this integration is not intended to be used to understand Python source code but rather to apply signatures for detecting potentially interesting matches in other types of files, such as binary blobs. The last approach that we considered was using linters, which usually work as parsers using context-free grammars (the same principle as an AST parser). In their simplest form, they only tokenize, since their main intended use is syntax highlighting. The more advanced ones, such as ANTLR [1] or the Spoofax Workbench [4], are designed especially for language recognition and the study of such languages. These are some of the most powerful tools available for understanding source code, producing detailed parse trees that often include more details than those produced by a standard AST parser, such as comments, because these are not needed by interpreters when running/compiling the program. We have chosen the native AST parser, as we are currently not interested in these extra syntactic details, and the additional complexity of the trees would project into a much bigger complexity of our framework implementation. In the future, we would consider switching to ANTLR or a similar type

of framework, due to it being more powerful and also partially able to recover from some parsing errors, compared to the native AST parser.

For our goals, we use the built-in "ast" module of a default Python installation to construct the tree from given source code in the same way the interpreter itself would do it. Unfortunately, this module is designed to parse only source code for the target Python version under which it is running. This means that it might not be possible to parse Python 3 code under Python 2 (and vice versa), because of non-compatible changes in language syntax. To solve this problem, we created a standalone (i.e., no dependencies and no installation required) wrapper around the ast module that is designed only to construct the AST tree and serialize it to the JSON format for transferring it back to the main framework for further processing. This piece of the framework is also the only one that depends on specific Python versions, which is necessary for reliable parsing. The workflow for obtaining the AST tree is the following (a sketch of the wrapper follows the list):

1. Multiple target interpreters can be configured, each able to cover different non-compatible syntaxes, such as Python 2, Python 3, PyPy, etc.

2. Our standalone wrapper is executed using the target interpreter to parse the source code and serialize the tree into the JSON format, so it can be transferred back in a system-independent way to the framework, which performs the static analysis.

3. These interpreters are tried in the configured order until one that accepts the code is found. Otherwise, no compatible interpreter is configured, or the input code contains breaking syntax errors.
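A minimal sketch of such a wrapper is shown below (hypothetical code, not the thesis's actual implementation); it uses only the standard library, so any configured target interpreter can run it and return the tree as JSON:

# Read source code from stdin, parse it with this interpreter's own ast
# module, and emit the tree as JSON. Non-JSON-safe constants (e.g. bytes)
# would need extra handling in a real implementation.
import ast, json, sys

def node_to_dict(node):
    if isinstance(node, ast.AST):
        out = {"_type": type(node).__name__}
        for field, value in ast.iter_fields(node):
            out[field] = node_to_dict(value)
        return out
    if isinstance(node, list):
        return [node_to_dict(x) for x in node]
    return node  # plain values: numbers, strings, None

if __name__ == "__main__":
    source = sys.stdin.read()
    print(json.dumps(node_to_dict(ast.parse(source))))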

When performing a scan, our framework automatically detects the MIME type of each input file, and AST parsing is attempted on those identified as Python source code. It is also possible to define actions to be performed if the parsing fails, so we can also look for this kind of anomaly, if needed.


5.2.2 Tree transformation and analysis

After the parsing has been performed and the serialized AST tree retrieved, our framework runs it through multiple stages (implementations) of different visitors. A visitor is our wrapper implementation of a tree traversal algorithm, exposing core functionality that serves as a skeleton for several implementations. This core functionality includes:

∙ hooks and signals for events to trigger callbacks, such as the beginning of traversal, the end of traversal, node visits and node replacements

∙ multiple traversal iterations with additional passes for convergence, which becomes important when the tree is modified

∙ a FIFO queue for visited nodes, with support for queue invalidation

There are three main stages implemented as specializations of this visitor wrapper. The first and simplest one is the AST node converter. Since the input tree has been obtained from JSON, it contains only primitive structures (numbers, strings, arrays, dictionaries), due to the JSON language definition, and this lack of more advanced structures would form a bottleneck in our analysis. The converter transforms the tree by wrapping supported nodes into more advanced structures that expose additional functionality and attributes, such as the ability to add tags, pretty printing or shadowing attributes with their extended versions. Since one structure can wrap another structure, this is why we implemented the logic of multiple traversal iterations of a tree: if a tree is modified, it is marked as such, and a new traversal is performed, until the tree is no longer marked as modified. For safety reasons, we then do additional traversals that we call "converging", as it is sometimes very complicated to mark the tree as modified, due to backtracking complicated references.

29 5. Analyzing packages on a global scale

A second stage is a form of logical tree transformation that also performs a partial evaluation. To explain the partial evaluation, consider the following simple example:

x = 2 + 2
y = 5 * x

It is easy to notice that the statement x = 2 + 2 can be optimized to x = 4 by precomputing the value; this kind of optimization is called constant folding. Once we have the value of x, we can also optimize the second statement by computing the value of y, which would be y = 20. This optimization is called constant propagation. There are many more techniques, such as loop unrolling, inlining function calls or taint analysis – one of the most common methods of finding vulnerabilities [6, 13, 8]. These techniques are primarily used by compilers and interpreters to speed up program execution by optimizing instructions. We included a limited subset of this functionality in the second stage, which transforms the tree mainly by using the abovementioned constant propagation and constant folding techniques. This allows us to address some simple obfuscation mechanisms, and it is also the reason why our approach to statically analyzing the samples is hybrid, since it results in partial (safe) code evaluation. There are, of course, several limitations; as such, this stage is the main bottleneck in static detection capabilities [10]. Consider the following code:

url = "ht" + "tp://"

def func1(data):
    x = "example"
    return data + x

def func2(data):
    x = ".com/callback"
    return data + x

url = func1(url)
url = func2(url)

While this is still a very simple obfuscation mechanism and easy for humans to understand, it is non-trivial for an algorithm to determine the final value of the url variable. Such an algorithm would need to be able to simulate a frame stack during tree traversal to isolate variables and scopes, since the variable x inside func1 is completely different and unrelated to the variable x inside func2. An even more difficult example would be storing and manipulating data inside an object's attributes/properties, as that would require yet another data isolation mechanism beyond simulating frame stacks. Implementing these advanced analysis mechanisms would then be on par with developing an actual interpreter for the language. As in the previous stage, it is also important to perform this rewriting in multiple iterations, as folded constants often propagate themselves further into the code, until the tree converges with no more transformations/modifications.
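A toy constant-folding pass in the spirit of this second stage can be written as an AST transformer (a sketch, not Aura's implementation; ast.unparse requires Python 3.9+):

# Binary operations whose operands are already constants get replaced
# by their computed value, working bottom-up through the tree.
import ast

class FoldConstants(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.op, ast.Add)):
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

tree = FoldConstants().visit(ast.parse('url = "ht" + "tp://"'))
print(ast.unparse(tree))  # url = 'http://'

Running such a pass repeatedly until the tree stops changing is exactly the convergence loop described above.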

The final stage applied to the AST tree is the code execution flow analyzer, also implemented as a tree traversal visitor in the same manner as the previous stages. This analyzer does not perform any tree transformations and is intended only to interpret the defined semantic rules and match them against the final AST tree. When nodes match a semantic rule, a hit is produced, which is a simple structure containing the metadata of a match, such as position/line number, tags, severity score, file name and many more. These hits are collected by an analyzer, which uses them to compute the final risk score and several characteristics of a given input. Examples of various types of hits include:

∙ a module has been imported – this hit also tracks what kinds of functions from the module are used by the source code

∙ tracking specific function calls and their signatures, e.g., what kinds of parameters are passed to the function and their values

∙ finding embedded data blobs in a more intelligent way than just simple regex parsing (compared to the Yara integration)
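A simplified sketch of the first type of hit could look as follows (hypothetical structures, module watchlist and scores, not Aura's real API):

# Walk the tree and emit a "hit" whenever a watched module is imported.
import ast

WATCHED_MODULES = {"socket", "subprocess", "ctypes"}  # example watchlist

def module_import_hits(source, filename="<input>"):
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name in WATCHED_MODULES:
                    hits.append({"type": "ModuleImport", "module": alias.name,
                                 "line": node.lineno, "file": filename,
                                 "score": 10})  # arbitrary severity weight
    return hits

print(module_import_hits("import subprocess\nimport json"))
# [{'type': 'ModuleImport', 'module': 'subprocess', 'line': 1, ...}]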

5.3 Aura framework

As part of this thesis, we created a custom framework from scratch that is designed to conduct large-scale scans of the whole PyPI repository, as well as of other datasets of source code.

The core component of the Aura framework is the source code analyzer, which, as described in the previous section, works on top of the AST tree by analyzing data flow. Analyzers are developed as plugins, and, as their input, they receive a path to the file to be analyzed, along with metadata. The framework uses a set of Uniform Resource Identifier (URI) handlers that provide these inputs for the analyzers. These handlers are identified by the protocol and the resource locator, which define how to produce a set of inputs for the analyzers. For example:

∙ the pypi://requests URI defines that the PyPI handler should look up the package "requests" online, download the latest release and pass the local path to the analyzers

∙ file://quarantine/ or ./quarantine is a local URI that enumerates the given location on the filesystem recursively

When files are enumerated, we also added support for automatically uncompressing archives. As mentioned in Chapter 2, Python packages are in fact archives (in different formats), so the framework automatically detects the MIME type of the file and, in case it is one of the supported archives, extracts it to a temporary folder and adds it to the set of inputs for the analyzers. Adding new types of URI handlers is trivial, as they are also developed in the form of plugins – for example, adding support for a git:// resource that would automatically clone the Git repository and pass it to the analyzers is just a matter of a few lines of code.
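The dispatch of URI handlers can be sketched as follows (a hypothetical, heavily simplified plugin registry, not Aura's real code):

# The URI scheme selects a handler plugin that turns the URI into local
# paths for the analyzers to consume.
import pathlib
from urllib.parse import urlparse

HANDLERS = {}

def handler(scheme):
    def register(cls):
        HANDLERS[scheme] = cls
        return cls
    return register

@handler("file")
class LocalFiles:
    def inputs(self, uri):
        root = pathlib.Path(urlparse(uri).path or ".")
        return [p for p in root.rglob("*") if p.is_file()]

def inputs_for(uri):
    scheme = urlparse(uri).scheme or "file"  # bare paths default to file://
    return HANDLERS[scheme]().inputs(uri)    # e.g. inputs_for("file:///tmp")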

The produced output is what we call hits. They provide information about an anomaly that was found by the analyzer – or, in some cases, just additional extracted data that is later used by another analyzer to influence the score. After the scan is finished, the produced hits are


gathered and used to compute the total score of the package (or another scan target). Each hit can define its own score that contributes to the overall score, which we call the security aura (hence the name of the framework). Individual hits can also define how their score is aggregated; for more details, we direct the reader to the framework documentation. A comprehensive list of all possible detection hits can also be found in Appendix C.1.

5.3.1 apip

The Aura framework itself is more suited to working on the server side, analyzing huge amounts of data. For demonstration purposes, we also created a small client-side executable script called apip. This small wrapper acts as a replacement for the pip package installer. It is designed to proxy its functionality to pip but intercepts any package installation. Once a package installation is detected, the package is sent to the Aura framework for analysis. Using the framework's capabilities and the output produced by the scan, the developer can then decide whether she should proceed with the installation or abort the process. Unfortunately, the pip development group has stated multiple times that they do not intend to provide a public API to enable such functionality.4 5 For this reason, we use a so-called "monkey patching" technique, which means that apip replaces pip functionality at runtime.
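The monkey-patching idea itself can be illustrated with a self-contained toy example (hypothetical names and a stand-in module; deliberately not pip's real internals, since pip exposes no public API for this):

# A function attribute is swapped for a guarding wrapper at runtime.
import types

installer = types.SimpleNamespace()          # stand-in for a pip module
installer.install = lambda pkg: print(f"installing {pkg}")

def scan_with_aura(pkg):                     # hypothetical analysis hook
    return pkg != "evil-package"

_original_install = installer.install

def guarded_install(pkg):
    if not scan_with_aura(pkg):
        raise SystemExit(f"aborted: {pkg} failed the security scan")
    return _original_install(pkg)

installer.install = guarded_install          # the actual monkey patch
installer.install("requests")                # prints: installing requests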

5.4 Global PyPI scan findings

We conducted several global repository scans using our custom-developed framework to find anomalies and to test the suitability of the tools. The prerequisite for such a scan is to have an offline mirror of the central PyPI repository, which can be created via an official tool called bandersnatch. Several customizations were also made that affected which packages were synchronized to our local mirror.

4. https://github.com/pypa/pip/issues/3999
5. https://github.com/pypa/pip/issues/4696


We observed a massive increase in the total mirror size between the time we first started the offline mirror synchronization (around 2.5 TB) and the time of writing this thesis (around 4.0 TB). After closer inspection, we discovered that a new set of packages had been created and published that accounted for this size, with very aggressive new releases occurring even several times a day. We concluded that this was a result of automated tools that re-published the packages after each change in a repository, and we thus configured the offline synchronization to exclude such packages. We also had only standard consumer external disk drives at our disposal (approximately 2.0 TB) and thus needed to enable several built-in plugins to also filter the versions of packages to be synchronized, to bring the total size down. Since this was long-term, ongoing research, we did the synchronization on a best-effort basis, as we were not able to run it in a continuous, real-time form.

During the latest scan, our local offline mirror contained 173675 Python packages that were all used as the scanner input. It is important to note here that the term "package" does not equal a single file/archive: a package can have several different archive formats/files published for a given release, and they are all passed to the analyzer as one set of files. Out of those, 5655 packages failed to be processed by Aura; the reasons are as follows:

∙ a corrupted archive that cannot be unpacked

∙ encoding issues, most notably more exotic codecs (neither UTF-8 nor plain ASCII) in which the source code is written, causing a failure to load the source code and parse it via AST correctly

∙ invalid checksums as synchronized from the offline mirror

∙ timeouts, as imposed by our framework to prevent bugs or failures from hanging the scan indefinitely

When investigating previous attacks and trying to obtain malware samples, we found that the malicious packages had been removed from the repository, and we were not able to obtain copies in most

34 5. Analyzing packages on a global scale

of the cases. 6 Based on this fact, we configured bandersnatch to not synchronize removals (e.g. keep deleted packages) from the upstream mirror.

A different dataset was also used for the typosquatting research, rather than our local offline mirror. During the research, we discovered a project called Libraries.io7, which synchronizes metadata for major package managers and periodically publishes these datasets. For the parts of this research that did not require access to the packages themselves, using this dataset was preferable to extracting the same information from the local mirror in a time-consuming and inefficient way. The dataset contained a total of 172412 metadata entries for PyPI packages8 9. It is important to note that one of the most significant differences in this dataset is that the Libraries.io data also contains packages that were already removed from PyPI. Because of this, many of our typosquatting findings had already been removed from the repository by the time we investigated them further.

As a result of these decisions, the coverage of published packages continually differed between individual scans of our local mirror, and also differed from the official live public repository. The dataset from Libraries.io did not precisely match the content of our local mirror either, for the same reasons. Replicating the results would therefore be difficult and, in further research, would require more controlled input data than what we had at our disposal.

During the global scan of the PyPI repository, we found several unusual anomalies that we shall now discuss. An overview of all the different hits that we found during the scan is located in Figure 5.1.

6. We found a GitHub repository that attempted to archive some of the malicious packages found on PyPI before they were removed. https://github.com/hannob/pypi-bad
7. https://libraries.io/data
8. By metadata entries we mean one entry per package containing data such as: author, title, description, dependencies, etc.
9. The last data dump that we used was produced on the 22nd of December, 2018


Detection        Count
ArchiveAnomaly       0
Wheel             1039
FunctionCall     68289
ModuleImport    294396
URL             290535
Base64Blob         165
SetupScript          0 (a)
SensitiveFile       82
SuspiciousFile  217498
YaraMatch         3029

Figure 5.1: Detections found during the latest scan
a. Not produced, as this is an informational-only detection that is filtered by default, unless the verbose output is explicitly enabled

It seems that, due to increasing typosquatting attacks, developers of more popular packages have also started deploying simple countermeasures. These developers pre-registered typosquatting names and uploaded there a stub package that informs the victim of their error when installed. These packages have a common signature, namely this literal description: A package to prevent exploit10. Upon installation, such a package fails with the following error: You probably meant to install and run, followed by the name of the legitimate package. Example code taken from one of the packages (talib) can be found in Appendix A.2. We found a total of 1141 published packages acting as placeholders11 using this technique. While this mechanism is very simple, a downside is that it pollutes the repository with stub packages that do not serve any other purpose. We also conducted a brief reverse search using the signature of the error message on Google and StackOverflow, where we found several users discussing this error (and asking for help with resolving it) during installation12.

10. Example: https://pypi.org/project/talib/
11. Similar mechanism to this: https://github.com/mattsb42/pypi-parker


It can be concluded from this that the typosquatting placeholders served their purpose by preventing installation of the possibly malicious package, and that this kind of mistake is very common.

We found a total of 21230 typosquatting pairs of packages. They were discovered by obtaining a list of the 10000 most popular packages, using the number of downloads over the last month as the metric. Afterward, we computed the Damerau-Levenshtein edit distance between these popular packages and the rest of the packages, producing the typosquatting pairs. During the computation, we also applied a filter to exclude pairs with an edit distance greater than 2 as an optimization step.
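The pairing step can be sketched as follows (illustrative code, not the scripts used in this research; it assumes the jellyfish library for the Damerau-Levenshtein metric, and the two input lists described above):

from jellyfish import damerau_levenshtein_distance

def typosquatting_pairs(popular, all_names, max_distance=2):
    # Compare every non-popular name against every popular name and keep
    # the pairs within the edit-distance threshold.
    candidates = set(all_names) - set(popular)
    pairs = []
    for target in popular:
        for name in candidates:
            if damerau_levenshtein_distance(target, name) <= max_distance:
                pairs.append((target, name))
    return pairs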

There are also several groups/individuals researching various attack vectors. The most prominent sample we found was a package published with the description "Checking out the typosquatting state of PyPI", linking to the website www.pytosquatting.org with the keywords "typosquatting" and "honeypot". From the linked GitHub repository[17], we determined it was part of the research for a conference talk in which the authors preregistered several packages and implemented a pingback beacon that would notify them whenever such a package was installed, to collect research data.13 We found 94 packages related to this experiment.

Our initial focus on finding malicious packages was mainly aimed at finding setup scripts that perform code execution, as this was expected to be a good indicator of possibly malicious code. Surprisingly, after running the first global scan, we were overwhelmed by an enormous number of false positives. After further research into these code execution hits, we found that in at least 128 instances, the code execution pattern was used to manage the version of the package. Each package has a version associated with it, which is expected to grow over time, indicating newer releases.

12. https://stackoverflow.com/questions/54692535/error-while-downloading-talib-for-final-year-project
13. https://github.com/benjaoming/pytosquatting/tree/master/misc/bornhack-talk


Figure 5.2: Screenshot of a typosquatting package

This version information is located in several places in a standard Python package; the primary location is inside the setup script metadata, while additional locations include the package root (for example, the __init__.py script with a __version__ attribute). As development progresses, this version number is increased and needs to be updated in all of these places. In good faith, developers created a central file (for example, version.py) to hold this information, and all other places that require it obtain the version number by executing or parsing this file. Although we understand the decision from the perspective of easier and less error-prone version management, in our opinion, it is a very bad practice.
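The pattern looks roughly like this (an illustrative sketch with hypothetical file and package names, not code from any specific package). A central version.py holds only a line such as __version__ = "1.2.3", and the setup script executes it:

from setuptools import setup

about = {}
with open("version.py") as f:
    # Executing the file is what triggers the scanner's code execution hit,
    # even though the intent is only to read the version string.
    exec(f.read(), about)

setup(
    name="example-package",
    version=about["__version__"],
)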

We also found two instances of packages that even stated in their description that they were created as typosquatting attacks. These packages are "tensoflow"14 and "djamgo"15. Both of these packages were analyzed, and they did not include malicious code,16 hinting that they were more of an experiment. During installation, the tensoflow package created a file in the user's home directory with the message "You have been hacked since", followed by a timestamp.

14. Typosquatting "tensorflow" - a leading machine learning library
15. Typosquatting "Django" - a popular framework for web development
16. At the time of writing.


The djamgo package was empty: it did not contain any code, nor did it perform any actions.

We found 9 packages that accidentally included a ".pypirc" file containing user credentials. One of the affected users is a high-profile target, as he has "maintainer"17 access to several popular Python packages, including Pillow, path.py, zc., etc. If a threat actor got hold of these credentials, the impact would be disastrous due to the popularity of these packages and of other packages depending on them. This incident is still in the remediation phase by the Python security team, and for these reasons, we cannot yet publish the exact details.
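For context, a .pypirc file typically has the following structure (placeholder values shown here); anyone holding such a file can publish releases under the owner's name:

[distutils]
index-servers =
    pypi

[pypi]
repository: https://upload.pypi.org/legacy/
username: example-maintainer
password: s3cret-placeholder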

17. Meaning, he can publish new versions of the packages or replace an existing one.


6 Conclusions and Future Work

In this chapter, we propose several points for future research and ways the results can be improved. In the second half of this chapter, we recapitulate what we learned during this research, as well as its achievements.

6.1 Future work

The development of the Aura framework laid the necessary groundwork for many future research topics. As the framework is in a proof-of-concept state, many areas can be improved, such as:

∙ Introduction of something like CoAST to support languages other than Python while avoiding duplication of code and detection mechanisms

∙ A better server/client architecture: a central server providing audit/scan capability exposed via an API, while clients have only a minimal wrapper that sends payloads to the server for processing. This would be beneficial in organizational deployments, as it would provide a single point of maintenance and control while also enabling us to collect valuable (anonymous) research data

∙ More anomaly analyzers. Beyond just looking at the source code for malicious code, we can introspect the code to also look for vulnerabilities, similar to what Bandit is doing (SQL injection, shell injection propagation, etc.), and even other data that is not in the source code, such as default JSON configuration files, executable files (exe/elf), and documents (PDFs, HTML, etc.). A good starting point for this would be to implement support for taint analysis.

∙ The system can already be extended easily to also scan repositories providing data other than package manager content. One such example is scanning a repository hosting Docker images; such repositories were reportedly hosting malicious images containing cryptocurrency miners


∙ Another valuable addition would be to also scan GitHub repositories (via the linked URL from package metadata), which are more likely to include artifacts that would allow hijacking the package (leaked credentials in commit history), or to find anomalies (differences between the published files and the source code hosted on GitHub)

One of the significant improvements for a future version of the framework would be a dependency resolution system. Currently, the Aura framework scans every file in an isolated manner, ignoring whether there are cross imports between the files. This leads to a loss of information and also opens the possibility for known obfuscation techniques. Consider the following files:

Listing 6.1: a.py

x = open

Listing 6.2: b.py

import a
import fnmatch

fnmatch.os.system('echo "Malware"')
print(a.x("/etc/passwd", "r").read())

When file b.py is executed, malicious code execution is achieved, as well as file access to the local file system. Both of these actions are currently undetectable during the scan from the point of view of the framework; it does not know the semantics behind functions called from different modules. The first obfuscation technique is accessing the target module/function indirectly via another module that imports it. The second technique is obfuscating the function call via a proxy module that renames it. These obfuscations can be solved by introducing the above-mentioned dependency resolution module, which would sort the files for scanning based on their imports (dependencies on other files) and re-use the data from already performed scans. It is also possible to pre-scan built-in modules to learn their cross-dependencies and how other modules are imported within the standard library, enriching the semantic signature engine to detect the usage of proxy modules.
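A minimal sketch of this ordering step could look as follows (illustrative only, not part of Aura; it assumes Python 3.9+ for the graphlib module and a mapping from locally defined module names to their file paths):

import ast
from graphlib import TopologicalSorter  # Python 3.9+

def local_imports(path, known_modules):
    # Collect imports of locally defined modules from a single file.
    with open(path) as f:
        tree = ast.parse(f.read())
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(a.name for a in node.names if a.name in known_modules)
        elif isinstance(node, ast.ImportFrom) and node.module in known_modules:
            found.add(node.module)
    return found

def scan_order(files):
    # files: mapping of module name -> file path. Returns module names
    # ordered so that a module is scanned before the modules importing it
    # (for Listings 6.1 and 6.2, a.py would be scanned before b.py).
    graph = {name: local_imports(path, files) for name, path in files.items()}
    return list(TopologicalSorter(graph).static_order())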

6.2 Conclusions

In this thesis, we dived into the world of package managers and their role in the development process. Although developers use them almost daily, with rapid publishing mechanisms, from a security point of view the whole model is not yet mature. There are several different causes for this. The most significant one is the lack of security monitoring or audits over the published content.

A new format was created in the past to address this - wheels. This format was created to get rid of code execution during the installation phase and to introduce digital signatures. Unfortunately, these mechanisms are not yet enforced, and even the official Python installer, pip, lacks the needed functionality.

As a reaction to this, we created the Aura framework, which is designed to look for anomalies inside the packages published on PyPI. This goal is achieved by analyzing the Abstract Syntax Tree. By applying various transformations to the AST, we can defeat simple obfuscation mechanisms and provide more context for the analysis. Apart from that, Aura is also able to analyze other artifacts, such as signs of package manipulation.
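To illustrate the kind of AST transformation involved, the following toy transformer (a sketch, not Aura's actual code; ast.unparse requires Python 3.9+) folds constant string concatenation so that an obfuscated call such as __import__("o" + "s") becomes visible to a signature engine as __import__("os"):

import ast

class FoldStringConcat(ast.NodeTransformer):
    def visit_BinOp(self, node):
        # Visit children first so nested concatenations fold bottom-up.
        self.generic_visit(node)
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, str)
                and isinstance(node.right.value, str)):
            return ast.copy_location(
                ast.Constant(node.left.value + node.right.value), node)
        return node

tree = ast.parse('__import__("o" + "s").system("id")')
folded = ast.fix_missing_locations(FoldStringConcat().visit(tree))
print(ast.unparse(folded))  # __import__('os').system('id')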

During our research, we conducted multiple scans of the PyPI repository to look for these anomalies. The scans were performed against an offline PyPI mirror and run on a standard consumer laptop. Completing this was a challenging task on its own: when we first launched the scan, the estimated time to completion was more than two months. Of course, this amount of time was not feasible for our case, so we started optimizing the Aura engine and brought the total time needed to complete the scan down to 14 days. After this optimization was completed, we were finally able to run a fully complete scan, which produced 0.5 GB of raw JSON data that we then needed to analyze manually.


Nine critical findings, such as the ability to hijack an existing package, were found and reported to the Python security team. This incident is currently still in the remediation phase, preventing us from publishing the details. Although we did not find a directly malicious package during our analysis, we found several different individuals and groups creating typosquatting packages. All these packages were analyzed manually, and we did not find signs of malicious intent, only at most harmless messages, such as "You have been hacked", reminding the victim of her mistake.

In May 2019, we established a partnership with the returntocorp (r2c)1 company, which specializes in large-scale analysis, such as scanning NPM2 packages or their associated GitHub repositories. We are currently working together to add support for scanning PyPI packages (and associated GitHub repositories) by customizing the Aura framework to support the r2c platform. This integration would significantly decrease the time it takes to scan the PyPI repository, from 14 days to a matter of a few hours, while also covering additional data sources3.

This partnership shows that we are dedicated to continuing work on this research subject in the upcoming months. The ongoing collaboration will bring the results of this research closer to companies and developers, educating them about the dangers of package managers. We identified several tasks for future work (as described above) that would enhance the produced findings. In fact, we have already started working on one of these future work points: the taint analysis feature, which would allow us to find unknown vulnerabilities by analyzing how untrusted input is propagated inside the application.

1. https://returntocorp.com/
2. NPM is a package repository and collection of tools for the Javascript language, equivalent to PyPI for Python.
3. The above-mentioned GitHub repositories.

Bibliography

[1] Terence J. Parr and Russell W. Quong. "ANTLR: A predicated-LL(k) parser generator". In: Software: Practice and Experience 25.7 (1995), pp. 789–810.
[2] Andreas Moser, Christopher Kruegel, and Engin Kirda. "Limits of static analysis for malware detection". In: Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007). IEEE. 2007, pp. 421–430.
[3] Guido Van Rossum et al. "Python Programming Language." In: USENIX Annual Technical Conference. Vol. 41. 2007, p. 36.
[4] Lennart C. L. Kats and Eelco Visser. "The Spoofax language workbench: rules for declarative specification of languages and IDEs". In: ACM SIGPLAN Notices 45.10 (2010), pp. 444–463.
[5] Jusuk Lee, Kyoochang Jeong, and Heejo Lee. "Detecting metamorphic malwares using code graphs". In: Proceedings of the 2010 ACM Symposium on Applied Computing. ACM. 2010, pp. 1970–1977.
[6] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. "All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask)". In: Proceedings of the 2010 IEEE Symposium on Security and Privacy. SP '10. Washington, DC, USA: IEEE Computer Society, 2010, pp. 317–331. isbn: 978-0-7695-4035-1. doi: 10.1109/SP.2010.26. url: http://dx.doi.org/10.1109/SP.2010.26.
[7] Daniel Holth. PEP 427 – The Wheel Binary Package Format 1.0. Sept. 20, 2012. url: https://www.python.org/dev/peps/pep-0427/.
[8] Fabian Yamaguchi, Markus Lottmann, and Konrad Rieck. "Generalized Vulnerability Extrapolation Using Abstract Syntax Trees". In: Proceedings of the 28th Annual Computer Security Applications Conference. ACSAC '12. Orlando, Florida, USA: ACM, 2012, pp. 359–368. isbn: 978-1-4503-1312-4. doi: 10.1145/2420950.2421003. url: http://doi.acm.org/10.1145/2420950.2421003.


[9] Michael Henriksen. Gitrob: Putting the Open Source in OSINT. Jan. 12, 2015. url: http://michenriksen.com/blog/gitrob-putting-the-open-source-in-osint/.
[10] Federico Scrinzi. "Behavioral Analysis of Obfuscated Code". 2015.
[11] Nikolai Philipp Tschacher. "Typosquatting in Programming Language Package Managers". Bachelor Thesis. University of Hamburg, Mar. 17, 2016. 74 pp. url: http://incolumitas.com/data/thesis.pdf.
[12] Rabe Abdalkareem et al. "Why do developers use trivial packages? An empirical case study on npm". In: ESEC/SIGSOFT FSE. 2017.
[13] Philippe Arteau. "Static-Analysis, Now you're playing with power!" Hackfest 2017. Apr. 11, 2017. url: https://gosecure.github.io/presentations/2017-11-04_hackfest_static_analysis/Hackfest2017-Static_Analysis.pdf.
[14] Alexandre Decan, Tom Mens, and Philippe Grosjean. "An Empirical Comparison of Dependency Network Evolution in Seven Software Packaging Ecosystems". In: CoRR abs/1710.04936 (2017). arXiv: 1710.04936. url: http://arxiv.org/abs/1710.04936.
[15] npm, Inc. `crossenv` malware on the npm registry. The npm Blog. Aug. 2, 2017. url: http://blog.npmjs.org/post/163723642530/crossenv-malware-on-the-npm-registry.
[16] SK-CSIRT advisory: PyPI Malicious Code. Sept. 9, 2017. url: http://www.nbu.gov.sk/skcsirt-sa-20170909-pypi/index.html.
[17] Hanno Böck and Benjamin Bach. "Package mis-management". BornHack 2018 conference. Aug. 16, 2018. url: https://github.com/benjaoming/pytosquatting/tree/master/misc/bornhack-talk.
[18] npm, Inc. Details about the event-stream incident. The npm Blog. Nov. 27, 2018. url: https://blog.npmjs.org/post/173526807575/reported-malicious-module-getcookies.


[19] npm, Inc. Reported malicious module: getcookies. The npm Blog. May 2, 2018. url: https://blog.npmjs.org/post/173526807575/reported-malicious-module-getcookies.
[20] Markus Zimmermann et al. "Small World with High Risks: A Study of Security Threats in the npm Ecosystem". In: CoRR abs/1902.09217 (2019). arXiv: 1902.09217. url: http://arxiv.org/abs/1902.09217.
[21] Ian Ashley Murdock et al. Debian – The Universal Operating System. url: https://www.debian.org/index.en.html.
[22] Catalin Cimpanu. Backdoored Python Library Caught Stealing SSH Credentials. url: https://www.bleepingcomputer.com/news/security/backdoored-python-library-caught-stealing-ssh-credentials/.
[23] Michael Meli, Matthew R. McNiece, and Bradley Reaves. How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories. url: https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git-characterizing-secret-leakage-in-public-github-repositories/.
[24] Mohammad Taha Khan et al. "Every Second Counts: Quantifying the Negative Externalities of Cybercrime via Typosquatting". In: 2015 IEEE Symposium on Security and Privacy (SP). San Jose, CA, USA: IEEE. isbn: 978-1-4673-6949-7. doi: 10.1109/SP.2015.16. url: https://ieeexplore.ieee.org/abstract/document/7163023/.
[25] Author Unknown. Bandit. url: https://github.com/PyCQA/bandit.
[26] Author Unknown. coala – linting and fixing code for all languages. url: https://coala.io/#/home?lang=Python.


Glossary

A

Abstract Syntax Tree (AST)
A representation of a parsed computer program in a tree-like structure. Often used by compilers to perform optimizations and to transform the code into machine-executable instructions. 23

P

Package manager
A software abstraction to install, update, or remove packages using the command line. The primary functions also include resolving dependencies and compiling the source code, if any. 3

pip
Python Package Installer. The officially recommended package manager for Python. 5


A Appendix

A.1 Live analysis

A live analysis is excellent for observing changes being made by malware, which in turn helps to create (even automatically) signatures for quick detection in the wild. Running a live analysis requires many more resources than a static one. This high resource cost is often caused by the need for a live sandbox system, usually some kind of virtual machine or emulator that needs to imitate a usual live system (also in terms of memory and processing power). These environments often have one and the same configuration between runs (system version, imitated hardware, etc.), sometimes making it apparent to malware that it is being executed inside such a sandbox. On the contrary, a considerable advantage lies in dealing with encrypted payloads or obfuscations. At present, it is common for malware to encrypt its payload to avoid being detected easily. Upon execution, the malware automatically decrypts this payload and executes it. Sandboxes and emulators are very convenient in being able to take snapshots of memory after the decryption, making reverse engineering much easier. There is also a risk that, if such a sandbox is not properly isolated, malware could escape the sandbox and infect other systems, including the one where the sandbox is running.

A.1.1 Comparison of static analysis vs. live analysis approaches

Live Analysis

∙ Scalability - High resource cost needed to emulate the target device and environment, including CPU and memory. This is often accomplished by using virtual machines/emulators that do not scale well

∙ Obfuscations - Capturing memory snapshots is often a built-in functionality. This feature makes it easy to analyze highly obfuscated samples or those that utilize encryption.


∙ Risks - One of the biggest risks, also seen in the past, is that malware can escape a controlled sandbox environment by utilizing exploits against the virtualization software/emulator1. This is compounded by the fact that the sandbox should also have a functional internet/network connection, which malware often checks for before proceeding or downloading the next stage.

Static Analysis

∙ Scalability - Very scalable, since it often has a predictable resource usage and can easily be optimized in terms of resources. It also allows easily running multiple analyses in parallel on the same server.

∙ Obfuscation - Highly limited, but this can be partially addressed by analyzing the code execution flow. Being able to analyze highly obfuscated code or samples that use encryption mechanisms often requires implementing an almost fully functional language interpreter. In some cases, this might even be an impossible task without direct code execution.

∙ Risks - Very low, since there is no direct code execution. Of course, it is possible to target flaws in the parser or the analyzer itself2, but such an attack is much harder to achieve.

A.2 setup.py from the talib package

We reformatted the following code to better fit the page while preserving its functionality, as the original contained very long lines. The changes we made were to replace the common message with the msg variable to shorten these lines, and to move the comments from the end of each line to the line above.

1. https://www.vmware.com/security/advisories/VMSA-2019-0005.html
2. For example https://www.exploit-db.com/exploits/23524

from distutils.core import setup
from setuptools.command.develop import develop
from setuptools.command.install import install
from setuptools.command.egg_info import egg_info
from subprocess import check_call

msg = "You probably meant to install and run ta-lib"


class PostDevelopCommand(develop):
    """Post-installation for development mode."""

    def run(self):
        raise Exception(msg)
        develop.run(self)


class PostInstallCommand(install):
    """Post-installation for installation mode."""

    def run(self):
        raise Exception(msg)
        install.run(self)


class EggInfoCommand(egg_info):
    """Post-installation for installation mode."""

    def run(self):
        print(msg)
        egg_info.run(self)


setup(
    name='talib',  # this must be the same as the name above
    packages=['talib'],
    version='0.1.1',
    description='A package to prevent exploit',
    author='The Guardians',
    author_email='[email protected]',
    cmdclass={
        'develop': PostDevelopCommand,
        'install': PostInstallCommand,
        'egg_info': EggInfoCommand,
    },  # I'll explain this in a second
    # arbitrary keywords
    keywords=['testing', 'logging', 'example'],
    entry_points={
        'console_scripts': [
            'talib = talib.cli:cli',
        ],
    }
)

B ssh-decorate incident evidence

Figure B.1: A screenshot of the opened GitHub issue by user mowshon after he found the malicious code

2. http://archive.today/WUlRu


Figure B.2: A screenshot of the malicious code as posted on the Reddit forum2 after the news of the incident started to spread

C Built-in Aura analyzers

C.1 Produced hits

In this appendix, we include a list of all the possible hits that can be produced by the Aura framework and were developed as built-in functionality. These hits are also the primary output produced when we conducted the global PyPI scans, used for further (manual) analysis. It is also important to note that a hit can set an "informational" flag on itself and still be produced, as it can be acted upon by other analyzers, yet it is filtered out when the data is presented to the user or exported into the JSON format. This behavior can be disabled by using the verbose command line flag; we excluded the informational hits during the global scan, as they provide only little value when analyzing the data at scale. This informational flag is often set when the score is equal to zero, but this depends on the analyzer producing the output.

Here is a list of all possible hits that can be produced at the time of writing:

∙ ArchiveAnomaly - Produced by the Archive analyzer when an archive (Python package) contains a nonstandard path, such as an absolute location (starting with "/") or a parent specifier "..". Such paths should not be included in the archive, as they can overwrite system locations when unpacking, and they provide a hint that the archive was either manipulated or created using non-standard/outdated tools.

∙ Wheel - This hit is produced when an anomaly is found in the wheel package structure by the Wheel analyzer. Such anomalies include checksums that do not match when checking the entries against the RECORD file, or a missing entry. This analyzer was created when we found that pip does not validate the RECORD file entries, as there was evidence that some wheel packages had been manipulated/created by hand.


∙ FunctionCall - Produced by the Execution Flow analyzer when a function call is detected, as specified by the semantic rules/signatures.

∙ ModuleImport - Produced by the Execution Flow analyzer when a module import is detected, as specified by the semantic rules/signatures.

∙ URL - Produced by the Data Finder analyzer, which looks for strings in the AST and checks if they start with a URL locator (http:// or https://).

∙ Base64Blob - Produced by the Data Finder analyzer, which looks for strings in the AST that look like base64-encoded blobs. It attempts to decode them, and, if successful, this hit is produced with the relevant data (a minimal sketch of this check follows the list).

∙ SetupScript - Produced by the setup.py analyzer, which looks specifically for package setup scripts. It can be informational, in which case it contains only metadata from the parsed script (module name, author, homepage, and other keyword arguments of the setup function). This hit can also be produced when an anomaly is found, such as code execution or network communication functionality directly in the setup script.

∙ SensitiveFile - Produced by the Filesystem structure analyzer, which looks at the tree structure and file names rather than analyzing the content of files. It is produced when a filename matches a sensitive file pattern, as specified in the semantic rules.

∙ SuspiciousFile - Produced by the Filesystem structure analyzer when a filename matching the suspicious file pattern is found. This pattern currently includes "*.pyc" (compiled Python bytecode) and files starting with "." (dot, meaning the file is hidden when viewing the directory content).

∙ YaraMatch - Produced by the Yara analyzer when a Yara signature/rule matches the file input. This hit is populated with the metadata from the Yara signature (including the score) and also the patterns found.
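For illustration, a check in the spirit of the Base64Blob detection could look as follows (a minimal sketch, not the actual Aura implementation; the string constants it would inspect come from the AST walk described above):

import base64
import binascii

def looks_like_base64_blob(value, min_length=12):
    # Very short strings decode "successfully" too often, so skip them.
    if len(value) < min_length:
        return None
    try:
        # validate=True rejects strings containing non-alphabet characters.
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None
    return decoded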
