On Managing Vulnerabilities in AI/ML Systems
Total Page:16
File Type:pdf, Size:1020Kb
On managing vulnerabilities in AI/ML systems Jonathan M. Spring April Galyardt jspringATseidotcmudotedu Software Engineering Institute CERT® Coordination Center Carnegie Mellon University Software Engineering Institute Pittsburgh, PA Carnegie Mellon University Pittsburgh, PA Allen D. Householder Nathan VanHoudnos CERT® Coordination Center Software Engineering Institute Software Engineering Institute Carnegie Mellon University Carnegie Mellon University Pittsburgh, PA Pittsburgh, PA ABSTRACT 1 INTRODUCTION This paper explores how the current paradigm of vulnerability man- The topic of this paper is more “security for automated reasoning” agement might adapt to include machine learning systems through and less “automated reasoning for security.” We will introduce the a thought experiment: what if flaws in machine learning (ML) questions that need to be answered in order to adapt existing vul- were assigned Common Vulnerabilities and Exposures (CVE) iden- nerability management practices to support automated reasoning tifiers (CVE-IDs)? We consider both ML algorithms and model ob- systems. We suggest answers to some of the questions, but some are jects. The hypothetical scenario is structured around exploring the quite thorny questions that may require a new paradigm of either changes to the six areas of vulnerability management: discovery, re- vulnerability management, development of automated reasoning port intake, analysis, coordination, disclosure, and response. While systems, or both. algorithm flaws are well-known in the academic research commu- First, some definitions. We follow the ®CERT Coordination Cen- nity, there is no apparent clear line of communication between this ter (CERT/CC) definition of vulnerability: “a set of conditions or research community and the operational communities that deploy behaviors that allows the violation of an explicit or implicit secu- and manage systems that use ML. The thought experiments identify rity policy” [23, §1.2]. We will follow Spring et al. [51] and define some ways in which CVE-IDs may establish some useful lines of machine learning (ML) as “a set of statistical tools that analyze data communication between these two communities. In particular, it to infer relationships and patterns. Ideally, the relationships and would start to introduce the research community to operational patterns inferred by ML will lead to a useful model of the object security concepts, which appears to be a gap left by existing efforts. or phenomenon that the data describes,” and define artificial intel- ligence (AI) as “a software agent that takes actions based on its CCS CONCEPTS environment.” To be concrete, this paper will focus on vulnerability • Computing methodologies ! Machine learning algorithms; management for just ML-enabled systems. • Software and its engineering ! Maintaining software; • Secu- One practical way to think of security services for an ML system rity and privacy ! Vulnerability management. is via the set of services a Computer Security Incident Response Team (CSIRT) might provide, which is produced by Forum of In- KEYWORDS cident Response and Security Teams (FIRST) and documented by Benetis et al. [2]. A complete risk management and security perspec- vulnerability management, machine learning, CVE-ID, prioritiza- tive on ML would include more than the CSIRT services framework. tion arXiv:2101.10865v1 [cs.CR] 22 Jan 2021 However, we will work from the assertion that to manage the se- ACM Reference Format: curity of an ML-enabled system, all CSIRT services will need to be Jonathan M. Spring, April Galyardt, Allen D. Householder, and Nathan able to handle ML systems. VanHoudnos. 2020. On managing vulnerabilities in AI/ML systems. In New Specifically, of the CSIRT services, we are carving out just vul- Security Paradigms Workshop 2020 (NSPW ’20), October 26–29, 2020, Virtual nerability management for discussion. The other services, as well conference. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/ as wider issues such as risk management, all have challenges as 3442167.3442177 well, but we leave them as future work. Vulnerability management includes six services [2, §7]: Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed • Vulnerability discovery / research for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. • Vulnerability report intake For all other uses, contact the owner/author(s). • Vulnerability analysis NSPW ’20, October 26–29, 2020, Virtual conference • Vulnerability coordination © 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8995-2/20/10. • Vulnerability disclosure https://doi.org/10.1145/3442167.3442177 • Vulnerability response NSPW ’20, October 26–29, 2020, Virtual conference Spring, Galyardt, Householder, and VanHoudnos These areas cover a wide range. They span the interface between There are two common axes that help distinguish vulnerabilities software developers, software users, security teams, and people who in modern security practice. One is within vulnerability identifica- find flaws in software. There is national infrastructure in multiple tion. The second is level of abstraction of the vulnerable product. countries dedicated to support these services and facilitate com- Sections 2.1 and 2.2, respectively, discuss these levels. Section 2.3 munication. For example, both the United States and the People’s summarizes background the the CVE project. Republic of China have National Vulnerability Databases (NVDs). FIRST, a global body, provides one definition of how to score the 2.1 Vulnerability Identification severity of vulnerabilities, Common Vulnerability Scoring Sys- Vulnerability identification and classification spans from scanning tem (CVSS). The Common Vulnerabilities and Exposures (CVE) individual systems to organizing vulnerabilities into categories to scheme is designed to assist such cataloging and ranking efforts. facilitate better programming principles. A number of vulnerabil- This paper’s goal is to facilitate the creation of a trading zone be- ity identification methods exist. CVE is perhaps the most widely tween ML engineers, software architects, and security practitioners. known, but there are others. Any trading zone requires a shared language, whether it is a physi- In increasing broadness along the identification and classification cal or intellectual trading zone [15]. All participants in a trading axis, we have: zone want something of value they can take back to their respective • Instance of a vulnerable product communities. These items of value are often “boundary objects,” • Vulnerability in a product (e.g., CVE-ID, VU#) which mark boundaries by being recognizable and functional in • Category of which a vulnerability is an example (e.g., Com- both cultures in the trading zone. Rawls and Mann [41] states that mon Weakness Enumeration (CWE), Open Web Application the Mitre Corporation (MITRE) produces identifiers, such as CVE Security Project (OWASP)) identifiers (CVE-IDs), in part for their value to trading zones as boundary objects. This background makes CVE-IDs an attractive 2.1.1 Instances of Vulnerable Products. On the very specific end and useful focal point for our thought experiments. of this spectrum, we have instances of a vulnerable product. An We will organize our exploration of a new paradigm for ML se- instance is a specific computer or service that uses a vulnerable curity around one hypothetical – what if flaws in ML systems were product. When an organization scans the systems it owns to per- assigned CVE-IDs? Sections 4 and 6 do the main work on exploring form asset management, it will find instances of a vulnerability. the thought experiment. Before we can answer this question, we Instances are often tagged as the association of a host or system first lay some background on the current ways of identifying soft- identifier accompanied by the ID of the vulnerability of which they ware vulnerabilities in Section 2. Section 3 will provide background are an instance.1 Instances may also be called findings or sightings. on the current state of adversarial attacks on ML algorithms. Sec- tion 4 then steps through each of the services areas of vulnerability 2.1.2 Vulnerable Products. Moving up from instances, we find vul- management to explore the impact of giving ML algorithm flaws nerable products. The practitioner community’s expectation is that CVE-IDs. Section 5 explores the ML algorithm thought experiment a product is some sort of artificial information processing system. from the perspective of CVE Numbering Authorities (CNAs). Sec- This definition is vague because a vulnerable product is usually tion 6 steps through each of the service areas to explore the impact the thing that security practitioners say “has” the vulnerability, of giving ML model object flaws CVE-IDs. as defined above. Since a vulnerability may be introduced bya software defect, configuration decision, design decision, system interaction, or environmental mismatch, the “product” that has a vulnerability cannot be constrained much. Section 2.2 will discuss different categories of vulnerable products. 2 VULNERABILITY BACKGROUND CVE-IDs are most closely associated with products. Section 2.3 Expanding on the definition of vulnerability cited in Section 1, a will detail the CVE program.