An Ecological Approach to Software Supply Chain Risk Management

130 PROC. OF THE 15th PYTHON IN SCIENCE CONF. (SCIPY 2016) An Ecological Approach to Software Supply Chain Risk Management Sebastian Benthall‡§∗, Travis Pinney‡¶, JC Herz∗∗‡, Kit Plummerk‡ https://youtu.be/6UnuPhTPdnM F Abstract—We approach the problem of software assurance in a novel way With a small number of analytic assumptions about the inspired by an analytic framework used in natural hazard risk mitigation. Exist- propagation of vulnerability and exposure through the software ing approaches to software assurance focus on evaluating individual software dependency network, we have developed a model of ecosystem projects in isolation. We demonstrate a technique that evaluates an entire risk that predicts "hot spots" in need of more investment. In this ecosystem of software projects, taking into account the dependencey structure paper, we demonstrate this model using real software dependency between packages. Our model analytically separates vulnerability and exposure data extracted from PyPI using Ion Channel [IonChannel]. as elements of software risk, then makes minimal assumptions about the propagation of these values through a software supply chain. Combined with data collected from package management systems, our model indicates "hot spots" Prior work in the ecosystem of higher expected risk. We demonstrate this model using data collected from the Python Package Index (PyPI). Our results suggest that Zope [Verdon2004] outline the diversity of methods used for risk and Plone related projects carry the highest risk of all PyPI packages because analysis in software design. Their emphasis is on architecture- they are widely used and their core libraries are no longer maintained. level analysis and its iterative role in software development. Security is achieved through managing information flows through Index Terms—risk management, software dependencies, complex networks, architecturally distinct tiers of trust. They argue for a team-based software vulnerabilities, software security approach with diverse knowledge and experience because "risk analysis is not a science". Contrary to this, our work develops a Introduction scientific theory of risk analysis, building on work from computer science and other fields. Systems that depend on complex software are open to many kinds There is a long history of security achieved through static anal- of risk. One typical approach to software security that mitigates ysis of source code. [Wagner2000] points out that the dependency this risk is static analysis. We are developing novel methods to of modern Internet systems on legacy code and the sheer com- manage software risk through supply chain intelligence, with a plexity of source code involved makes manual source code level focus on open source software ecosystems. auditing very difficult. While some complex projects are audited The Heartbleed bug in OpenSSL is an example of community by large and dedicated communities, not all software systems are failure and of how vulnerabilities in open source software can be so gifted in human resources. Therefore, static analysis tools based a major security risk. [Wheeler2014] The recent failure of React, on firm mathematical foundations are significant for providing Babel, and many other npm packages due to the removal of one computer security at scale. small dependency, left-pad, shows how dependencies can be [Wheeler2015] develops a risk index for determining which a risk factor for production software. [Haney2016] These high open source software projects need security investments. This profile examples, though quite different from each other, illustrate work is part of the Linux Foundation (LF) Core Infrastructure how software risk traverses the supply chain. As dependencies Initiative (CII) and published by the Institute for Defense Analysis. become more numerous and interlinked, the complexity of the This metric is based on their expertise in software development system increases, as does the scope of risk management. Open analytics and an extensive literature review of scholarly and source software projects make their source code and developer commercial work on the subject. They then apply this metric to activity data openly available for analysis. This data can be used Debian packages and have successfully identified projects needing to mitigate software risk in ways that have not been explored. investment. This work is available on-line as the CII Census * Corresponding author: [email protected] project [CensusProject]. ‡ Ion Channel, ionchannel.io While software security studies general focus on the possibility § UC Berkeley School of Information of technical failure of software systems, open source software ¶ [email protected] exposes an additional risk of community failure. Development of ** [email protected] || [email protected] a software project may cease before it reaches a state of usability and maturity. [Schweik2012] is a comprehensive study of the Copyright © 2016 Sebastian Benthall et al. This is an open-access article success and failure of open source projects based on large-scale distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, analysis of SourceForge data, as well as survey and interview provided the original author and source are credited. data. They define a successful project as one that performs a AN ECOLOGICAL APPROACH TO SOFTWARE SUPPLY CHAIN RISK MANAGEMENT 131 useful function and has had at least three releases. They identify language it’s written in. (Some languages, such as C++, several key predictive factors to project success, including data are harder to write in securely and therefore generally more that indicates usefulness (such as number of downloads), number vulnerable [Wheeler2015]) of hours contributed to the project, and communicativeness of the • Exposure. A software project’s exposure is its extrinsic project leader. availability to attack. A direct network connection is a These precedents focus on individual software projects source of exposure. and their susceptiblity to technical and community failure. Vulnerability and exposure are distinct elements of a software [Nagappan2005] and [Nagappan2007] look at dependency rela- project’s risk. Analyzing them separately and then combining tionships between packages and specifically relative code churn them in a principled way gives us a better understanding of a (changes in lines of code) between dependent packages as a cause project’s risk. of system defects in Windows Server 2003. Dependencies complicate the way we think about vulnerability We build on these approaches by considering security as a and exposure. A software project doesn’t just include the code function of the entire software supply chain. This supply chain in its own repository; it also includes the code of all of its resembles a complex ecosystem more than a simple ’chain’ or dependencies, often tied to a specific version. Furthermore, a stack. We draw inspiration from a risk management strategy package does not need to be installed directly to be exposed--it can approaches used in another kind of complex system, namely be installed as a dependency of another project, or as a transitive disaster risk reduction and climate change adaptation research dependency. Based on these observations, we can articulate two developed by Cardona [Cardona2012] and widely used by the heuristics for use of dependency topology in assessing project risk: World Bank’s Global Facility for Disaster Risk Reduction among others [Yamin2013]. • If A depends on B, then a vulnerability in B implies a This framework evaluates the expected cost of low-probability corresponding vulnerability in A. events by distinguishing three factors of risk: hazards, exposure, • If A depends on B, then an exposure to A implies an and vulnerabilities. Hazards are potentially damaging factors from exposure to B. the environment. Exposure refers to the inventory of elements in places where hazards occur. Vulnerabilities are defined as the For example, if a web application (A) uses a generic web propensity of exposed elements to suffer adverse effects when application framework (B), and that web application is installed impacted by a hazard. Expected risk is then straightforwardly and recieving web traffic, then there is an instance of the web calculated using the formula: framework installed and recieving web traffic. The framework is exposed through the web application. If there is a vulnerability risk = hazard ∗ vulnerability ∗ exposure in the web application framework (such as a susceptibility to SQL injection attacks), then the web application will inherit that We adapting this framework to cybersecurity in the software vulnerability. There are exceptions to these rules. Developers ecosystem. There are significant differences between modeling of the web application (A) might recognize the vulnerability to risk from natural hazards and modeling cybersecurity risk. Most SQL injection and fix the problem without pushing the change notably, cybersecurity threats can be deliberately adversarial, de- upstream (to B). Nevertheless, this is a principled analytic way of tecting and exploiting specific weaknesses rather than presenting relating vulnerability, exposure, and software dependency that can a general hazard. In this work we focus on the interplay between be implemented as a heuristic and tested as a hypothesis. exposure and vulnerability in the software

An Ecological Approach to Software Supply Chain Risk Management

A Network Approach to Define Modularity of Components In

Practical Applications of Community Detection

Social Network Analysis with Content and Graphs

Networks in Cognitive Science

Multi-Layered Network Embedding

Exponential Random Graph Models An

Networktoolbox

Patterns in Syntactic Dependency Networks

Evaluating Network Analysis and Agent Based Modeling for Investigating the Stability of Commercial Air Carrier Schedules

Fast Cross-Layer Dependency Inference on Multi-Layered Networks

Mining Social Dependencies in Dynamic Interaction Networks

A Python Toolkit for Language Networks Construction and Analysis