Scalable Trust Establishment with Software Reputation
Sven Bugiel, System Security Lab, TU Darmstadt (CASED), Darmstadt, Germany
Lucas Davi, System Security Lab, TU Darmstadt (CASED), Darmstadt, Germany
Steffen Schulz, System Security Lab, TU Darmstadt (CASED) & Ruhr-Universität Bochum & Macquarie University (INSS), Darmstadt, Germany

ABSTRACT

Users and administrators are often faced with the choice between different software solutions, and sometimes even have to assess the security of complete software systems. With sufficient time and resources, such decisions can be based on extensive testing and review. However, in practice this is often too expensive and time consuming: when a user decides between two alternative software solutions, or when a verifier must assess the security of a complete software system during remote attestation, such assessments should happen almost in realtime.

In this paper, we present a pragmatic but highly scalable approach for the trustworthiness assessment of software programs based on their security history. The approach can be used, e.g., to automatically sort programs in an App store by their security record, or on top of remote attestation schemes that aim to assess the trustworthiness of complex software configurations. We implement our approach for the popular Debian GNU/Linux system, using publicly available information from open-source repositories and vulnerability databases. Our evaluation shows reasonable prediction accuracy for the more vulnerable packages and good accuracy when considering entire system installations.

1. INTRODUCTION

A longstanding problem in computer science is how to assess the trustworthiness of a given (remote) software system, i.e., whether it behaves as expected. To make reasonable decisions, common metrics are required along which trust is quantifiable and comparable. Example use cases for this problem are when a system administrator or user must decide between alternative implementations of, e.g., a web server or mail client, when a company decides which programs are acceptable on employees' laptops that intend to connect to the company network, or even when choosing between different available virtual machines to implement desired services in the cloud. In all these cases, we require an objective, comprehensive metric that works efficiently and scales to systems with several hundred interacting software components.

In practice, there are several methods to measure the trustworthiness of software components. They can be classified as (1) formal verification, (2) evaluation and testing, and/or (3) reputation systems. However, approaches (1) and (2) generally do not meet the required efficiency and scalability due to the intensive manual work involved. In particular, formal verification is used for scenarios with high associated risks or software classified as mission critical, e.g., military applications, train controllers, or satellite firmware. By (semi-)formally specifying the security goals, adversary model, and requirements, software is developed (or adjusted) such that it is verified to achieve these requirements. However, after several years of research, formal verification is still too expensive for most mainstream applications and requires too much time to keep up with the frequent releases of many mainstream software programs. By contrast, methodical evaluation and testing, for instance according to Common Criteria or by thorough penetration testing, is more practical and frequently used by industry and government organizations. However, the process is still too slow and costly for many applications and, moreover, potentially error-prone due to the incompleteness of the testing/evaluation procedure. Additionally, the testing results are hard to interpret for a common user who has to choose between different software solutions for his requirements and who wants to evaluate the security of these options. Finally, we are not aware of any reputation-based system for software programs that meets the required scalability and objectivity, although in practice many users and administrators already use subjective experience and recommendations in their decision making processes.

Contribution.

In this paper, we present a pragmatic and lightweight approach to determine the trustworthiness of complex software systems. We extend previous works on statistical vulnerability prediction to derive the expected trustworthiness of large commodity computer systems and provide a generic, intelligible metric to aid in the assessment and comparison of software security. We implement and evaluate the accuracy of our assessment scheme, which can be used locally, to assist in deciding between different software solutions, or to assess the trustworthiness of remote software systems during remote attestation. When assessing the trustworthiness of complete system installations, we can predict their failure rate within error margins of around 10% to 30%.
2. PROBLEM DESCRIPTION

Whether it is for large-scale investment decisions, the implementation of communication infrastructures, or personal use, system integrators and users are regularly confronted with the decision between different software systems. One major criterion for this decision making process is the trustworthiness of the program, i.e., how likely it is to "behave as expected" with regards to the security and trust requirements of the system integrator or user.

However, increasingly complex and rapidly evolving software systems impede efficient security analysis and testing, let alone formal analysis. Instead, users and system administrators typically rely on a vaguely defined reputation system, built from personal experience, recommendations, and additional meta-data such as the programming language or development process. This is particularly problematic in cases where combinations of software systems must be evaluated in realtime, e.g., when evaluating client platforms in remote attestation protocols.

We know of no works that investigate the accuracy of these informal assessments. Nevertheless, we expect their accuracy to decrease significantly with lower evaluator expertise and less time available to make the assessment. In the following sections we thus describe and evaluate an automated, generic mechanism to derive the likelihood of security failures in a given software, with the goal of supporting a more accurate decision making process. We identify the following requirements for a practical and efficient assessment of software trustworthiness.

Accuracy. The produced trustworthiness assessments must be reasonably accurate to support the decision process. Error margins must be available to judge the accuracy of the assessment.

Universality. The desired assessment system must be applicable to any software, i.e., it must not depend on individual characteristics such as the programming language or development
... failure and trustworthiness and then present a metric to express trustworthiness prediction.

3.1 State-Of-The-Art in Practical Systems

As outlined in Section 1, our daily decisions for trusting or not trusting a particular software are usually not determined by objective and comprehensive testing and evaluation, or even formal analysis, as these approaches are often simply too expensive and time consuming, and scale poorly in today's rapidly changing software landscape.

In practice, the reliability of a program is often derived from subjective impressions, recommendations and past experience. This reputation-based approach is well known and in fact very common for human interaction. In the software landscape, such reputation is established based on personal experience with the software and on reviews and recommendations by (often only partly) trusted third parties. Such events are in general difficult to track and often require the users to contribute to extensive reputation infrastructures. However, in software security, the overwhelming majority of security incidents are, in fact, objective software defects that are recorded and monitored in public databases. Moreover, multiple previous works show that one can indeed estimate the amount of future security failures based on past experience [12, 14, 13, 21], meaning the security failure rate of a program is a relatively stable, characteristic value.

3.2 An Objective Definition of Trust

In general, the definition of a security failure depends on the security goals and software environment of the respective usage scenarios, and is thus a highly subjective matter. However, from the perspective of the software developer, a different security notion can be identified, where a given security failure is either caused by a particular software or not, and a decision is made whether the program should be fixed and users be warned of potentially unexpected behavior. This decision is usually straightforward and determined by common practices in software and security engineering, such
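To make the history-based failure estimation discussed in Section 3.1 concrete, the following minimal Python sketch derives a per-package failure rate from past vulnerability disclosure dates and extrapolates the expected number of failures over a future time window, together with a rough error margin. The package names, dates, and the simple Poisson-style margin are illustrative assumptions only; they are not the metric, data, or implementation used in the paper.

# Illustrative sketch only: a naive history-based failure-rate estimate in the
# spirit of Section 3.1. Package names and vulnerability dates are made-up
# placeholders, not data from the paper's Debian evaluation.
from datetime import date
from math import sqrt

# Hypothetical vulnerability disclosure dates per package, e.g., as they could
# be collected from a public vulnerability database.
history = {
    "examplehttpd": [date(2008, 3, 1), date(2008, 11, 20), date(2009, 6, 5),
                     date(2010, 1, 14), date(2010, 9, 30)],
    "examplemail":  [date(2009, 7, 2), date(2010, 4, 18)],
}

def failure_rate(dates, start, end):
    """Empirical failure rate: vulnerabilities per year within [start, end)."""
    years = (end - start).days / 365.25
    count = sum(1 for d in dates if start <= d < end)
    return count / years

def predict_failures(dates, start, end, horizon_years=1.0):
    """Expected number of failures over the horizon, with a rough +/- one
    standard deviation margin assuming a Poisson failure process."""
    expected = failure_rate(dates, start, end) * horizon_years
    return expected, sqrt(expected)

if __name__ == "__main__":
    start, end = date(2008, 1, 1), date(2011, 1, 1)
    # Rank packages by their predicted failures, most trustworthy first.
    ranked = sorted(history.items(), key=lambda kv: predict_failures(kv[1], start, end)[0])
    for pkg, dates in ranked:
        expected, margin = predict_failures(dates, start, end)
        print(f"{pkg}: ~{expected:.1f} +/- {margin:.1f} expected failures in the next year")

Ranking packages in this way, or summing the expected failures over all packages of an installation, would yield the kind of ordering by security record that the abstract mentions for an App store; the paper's actual metric and error margins are derived from the Debian vulnerability history rather than from such toy data.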