Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages
Total Page:16
File Type:pdf, Size:1020Kb
Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages Ruian Duan, Omar Alrawi, Ranjita Pai Kasturi, Ryan Elder, Brendan Saltaformaggio, and Wenke Lee Georgia Institute of Technology Abstract—Package managers have become a vital part of the organizations that rely on open-source interpreted program- modern software development process. They allow developers ming languages for different internal and external applications. to reuse third-party code, share their own code, minimize their Attackers can infiltrate well-defended organization by simply codebase, and simplify the build process. However, recent reports subverting the software supply chain of registries. For exam- showed that package managers have been abused by attackers to ple, eslint-scope [4], a package with millions of weekly distribute malware, posing significant security risks to developers downloads in Npm, was compromised to steal credentials and end-users. For example, eslint-scope, a package with millions of weekly downloads in Npm, was compromised to steal from developers. Similarly, rest-client [5], which has over credentials from developers. To understand the security gaps one hundred million downloads in RubyGems, was compro- and the misplaced trust that make recent supply chain attacks mised to leave a Remote-Code-Execution (RCE) backdoor possible, we propose a comparative framework to qualitatively on web servers. These attacks demonstrate how miscreants assess the functional and security features of package managers can covertly gain access to a wide-range of organizations by for interpreted languages. Based on qualitative assessment, we carrying out a software supply chain attack. apply well-known program analysis techniques such as metadata, static, and dynamic analysis to study registry abuse. Our initial efforts found 339 new malicious packages that we reported to Security researchers [7] are aware of these attacks and have the registries for removal. The package manager maintainers proposed several solutions to address the rise of malicious confirmed 278 (82%) from the 339 reported packages where three software in registries. Zimmermann et al. [8] systematically of them had more than 100,000 downloads. For these packages we studied 609 known security issues and revealed a large attack were issued official CVE numbers to help expedite the removal of surface in the Npm ecosystem. BreakApp [9], on the other these packages from infected victims. We outline the challenges hand, isolates untrusted packages, which addresses credential of tailoring program analysis tools to interpreted languages and theft and prevents access to sensitive data, but does not release our pipeline as a reference point for the community to stop cryptocurrency mining or backdoors. Additionally, many build on and help in securing the software supply chain. solutions [10]–[12] assume inherent trust and focus on finding bugs in packages rather than malicious packages. To make matters worse, some attacks are very sinister and use social I. INTRODUCTION engineering techniques [13], [14] to disguise themselves by Many modern web applications rely on interpreted pro- first publishing a “useful” package, then waiting until it isused gramming languages because of their rich libraries and pack- by their target to update it and include malicious payloads. ages. Registries (also known as package managers) like PyPI, Although, many security researchers are actively investigating Npm, and RubyGems provide a centralized repository that attacks on registries and proposing solutions, these approaches developers can search and install add-on packages to help in seem to be ad-hoc and one-off solutions. A better approach is development. For example, developers building a web appli- to understand the extent of the software supply chain abuse and cation can rely on Python web frameworks like Django [1], how miscreants are taking advantage of them. The approach arXiv:2002.01139v2 [cs.CR] 2 Dec 2020 Web2py [2], and Flask [3] to provide boilerplate code for rapid must be grounded to allow an objective comparison between development. Not only have registries made the development the different registry ecosystem. process more efficient, but also they have created a large community that collaborates and shares open-source code. To this end, we propose a framework that highlights key Unfortunately, miscreants have found ways to infiltrate these functionality, security mechanisms, stakeholders, and remedi- communities and infect benign popular packages with mali- ation techniques to comparatively analyze different registry cious code that steal credentials [4], install backdoors [5], and ecosystems. We use our framework to look at what features even abuse compute resources for cryptocurrency mining [6]. registries provide, what security principles are enforced, how is trust delegated between different parties, and what remediation The impact of this problem is not isolated to small one-off and contingency plans registries have in place for post-attack. web apps, but large websites, enterprises, and even government We leverage our findings to provide practical action items that registry maintainers can enforce using pre-existing tools and security principles that will improve the security of the overall Network and Distributed Systems Security (NDSS) Symposium 2021 package management ecosystem. Using well-known program 21-24 February 2021, San Diego, CA, USA ISBN 1-891562-66-5 analysis techniques, we build MALOSS, a custom pipeline https://dx.doi.org/10.14722/ndss.2021.23055 tailored for interpreted languages that we use to empirically www.ndss-symposium.org study the security of package managers. We make this pipeline public 1 for the community to use as a reference or starting integration and continuous deployment (CI/CD) pipeline to point to help analyze and identify suspicious packages. automate the release process (i.e., build and deploy). We use our pipeline MALOSS to study over one million Developers. Developers manage the app development frame- packages from PyPI, Npm, and RubyGems and identified 7 work and are consumers of the published packages. They malicious packages in PyPI, 41 malicious packages in Npm, are responsible for finding the right packages to use in their and 291 malicious packages in RubyGems. We reported these software and releasing their products to end-users. Devs focus packages to registry maintainers and had 278 of them removed, on developing unique features in their software and reuse over 82%. Three of the reported malicious packages had packages from registries for common functionalities. Also, over 100K installs and they were assigned an official CVE Devs need to address issues of reused packages, such as known number. We present an in-depth case study to demonstrate the vulnerabilities and incompatibilities. utility of our framework and demonstrate the sophistication of these malicious packages and present their infection vectors, End-users. Although not directly interacting with registries, capabilities, and persistence. Moreover, to study the impact end-users are still an important stakeholder in the ecosystem. the malicious packages, we use passive-DNS data to estimate Users are at the downstream and use services or applications how wide spread the installation of these malicious packages. from Devs on browsers, mobile devices or Internet-of-Things Finally, we propose actionable steps to help improve the overall (IoT) devices. Users are eventually customers that pay and security of package managers and protect the software supply fuel the whole ecosystem, however, they have no control of chain such as adding typo detection at the client-side to software except feedback channels and can be affected by minimize accidental errors of developers. upstream security issues. II. BACKGROUND B. An Overview of Registry Abuse Registries are platforms for code sharing and play an essen- We present a selected list of supply chain attacks in tial role in the software development process. We start by intro- Figure 2, spanning across different types of registries (e.g. ducing the four primary stakeholders involved in developing, interpreted languages, system-wide). In 2016, Tschacher [7] managing and using packages from registries, namely Registry demonstrated a proof-of-concept attack against package man- Maintainers (RMs), Package Maintainers (PMs), Developers agers. The attack used typosquatting, which is a technique (Devs) and End-users (Users). We then present an overview of that misspells the name of a popular package and waits for registry abuse and show that existing studies cannot address users installing the popular package to typo the name (hence the rising trend of supply chain attacks. We further dive into typosquatting) resulting in the installation of the malicious the security gaps and identify challenges in securing registries. package instead. As of August 2019, there were more than 300 malicious packages reported and removed in different registries (PyPI, Npm, RubyGems, etc.). In Figure 3, we aggregate the A. Primary Stakeholders number of malicious packages uploaded into registries and We sketch the characteristics of primary stakeholders and their corresponding download counts by year of uploading. We their simplified relationships in the package manager ecosys- note that these counts are documented/detected attacks, which tem in Figure 1. Note that the stakeholders are roles, which is a subset of all the attacks (known and unknown).