Cryptography As an Operating System Service: a Case Study
Total Page:16
File Type:pdf, Size:1020Kb
Cryptography As An Operating System Service: A Case Study ANGELOS D. KEROMYTIS Columbia University JASON L. WRIGHT and THEO DE RAADT OpenBSD Project and MATTHEW BURNSIDE Columbia University Cryptographic transformations are a fundamental building block in many security applications and protocols. To improve performance, several vendors market hardware accelerator cards. However, until now no operating system provided a mechanism that allowed both uniform and efficient use of this new type of resource. We present the OpenBSD Cryptographic Framework (OCF), a service virtualization layer im- plemented inside the operating system kernel, that provides uniform access to accelerator function- ality by hiding card-specific details behind a carefully designed API. We evaluate the impact of the OCF in a variety of benchmarks, measuring overall system performance, application throughput and latency, and aggregate throughput when multiple applications make use of it. We conclude that the OCF is extremely efficient in utilizing cryptographic accelerator func- tionality, attaining 95% of the theoretical peak device performance and over 800 Mbps aggregate throughput using 3DES. We believe that this validates our decision to opt for ease of use by applica- tions and kernel components through a uniform API and for seamless support for new accelerators. We are grateful to Global Technologies Group, Inc. for providing us with two XL-Crypt (Hifn 7811) boards, one Hifn 6500 reference board, and one Hifn 7814 reference board. We are also grateful to Network Security Technologies, Inc. for providing us with two Hifn 7751 boards, one Broadcom 5820 board, and two Broadcom 5805 boards. In addition, Network Security Technologies funded part of the original development of the device-support software. Part of this work was supported by DARPA and the Air Force Research Laboratory, Air Force Material Command, USAF, under agreement number F30602-01-0537. Part of his work appeared in Proceedings of the USENIX Annual Technical Conference, San Antonio, TX, June 2003, pp. 165-178 and in Proceedings of the 11th IEEE International Conference on Networks, Sydney, Australia, Sept.-Oct. 2003, pp. 455-460. Authors’ addresses: A. D. Keromytis, Department of Computer Science, Columbia University, M.C. 0401, 1214 Amsterdam Avenue, New York, NY 10027; email: [email protected]; T. de Raadt, 812 23rd Avenue SE, Calgary, Alberta T2G 1N8, Canada; J. L. Wright, 3857 Beech Down Dr., Chantilly, VA 20151; M. Burnside, Department of Computer Science, Columbia University, M.C. 0401, 1214 Amsterdam Avenue, New York, NY 10027. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected]. C 2006 ACM 0734-2071/06/0200-0001 $5.00 ACM Transactions on Computer Systems, Vol. 24, No. 1, February 2006, Pages 1–38. 2 • A. D. Keromytis et al. Furthermore, our evaluation points to several bottlenecks in system and operating system design: data copying between user and kernel modes, PCI bus signaling inefficiency, protocols that use small data units, and single-threaded applications. We identify some of these limitations through a set of measurements focusing on application-layer cryptographic protocols such as SSL. We offer several suggestions for improvements and directions for future work. We provide experimental evidence of the effectiveness of a new approach which we call operating system shortcutting. Short- cutting can improve the performance of application-layer cryptographic protocols by 27% with very small changes to the kernel. Categories and Subject Descriptors: D.4.6 [Operating Systems]: Cryptographic Controls; D.4.8 [Operating Systems]: Performance—Measurements General Terms: Security, Performance Additional Key Words and Phrases: Encryption, authentication, hash functions, digital signatures, cryptographic protocols 1. INTRODUCTION Today’s computing systems are used for applications such as electronic com- merce, tele-collaboration of various types, and evolving peer-to-peer systems that often contain sensitive information. Security in these systems depends on several mechanisms that utilize cryptographic primitives as a basic build- ing block. Such cryptographic primitives can be very complex [Broscius and Smith 1991] because the design of these systems is intended to impede sim- ple, brute-force, computational attacks. This complexity drives the belief that strong security is fundamentally incompatible with good performance which, in turn, leads to favoring performance over cryptography by minimizing use of the latter. However, the foundation for this belief is often software implementation [Feldmeier and Karn 1990] of algorithms that were originally intended for effi- cient hardware implementation. Although modern encryption algorithms such as AES were designed with performance on general CPUs in mind, they remain relatively heavyweight compared to other computational tasks typically found on a server or workstation. To address this issue, vendors have been marketing hardware accelera- tion boards that implement several cryptographic algorithms used by secu- rity protocols and applications. In some ways, this mirrors the evolution of high-performance graphics processing units (GPUs) to match the needs of the computer-gaming community. Note that, despite the increasing performance of CPUs, GPUs are considered essential for serious gaming (or other graphics- intensive applications), both because they often outperform the system pro- cessor and, perhaps more importantly, because the CPU can be used to com- plete other tasks while the GPU is handling the graphics-rendering part of the application. In this environment, GPUs and applications must conform to industry- standard APIs such as DirectX and OpenGL. In the case of cryptography, mod- ern operating systems lack the necessary support to provide efficient access to similar functionality to applications and the operating system itself through a uniform API that abstracts away hardware details. As a result, accelera- tors are often used directly through libraries linked with applications typically ACM Transactions on Computer Systems, Vol. 24, No. 1, February 2006. Cryptography As An Operating System Service: A Case Study • 3 requiring device-specific knowledge by the applications and preventing the op- erating system itself from easily utilizing such hardware. We present the OpenBSD Cryptographic Framework (OCF), a service virtu- alization layer implemented inside the operating system kernel, that provides uniform access to accelerator functionality by hiding device-specific details be- hind a carefully-designed API. The abstraction introduced allows us to easily support new hardware accelerators and enables applications to use any such accelerator without device-specific knowledge. Furthermore, this intermediate layer does not unduly impact performance which is common when such abstrac- tions are introduced. The OCF has been in use with OpenBSD [de Raadt et al. 1999] for over three years (since OpenBSD 2.8) and has proven stable and efficient in practice, although it continues to evolve in response to new requirements1. The OCF has also been ported to FreeBSD and NetBSD with a port under development for Linux. It offers features such as load-balancing across multiple accelerators, session migration, and algorithm chaining. We describe the changes we made to the OpenBSD kernel and applications to take advantage of the OCF. This article should serve as a good introduction to newcomers as well as veterans of operating system design and development of the complexity of introducing a major new mechanism to a relatively widely-used and stable operating system. We evaluate the impact of the OCF in a variety of micro-benchmarks, mea- suring overall system performance, application throughput and latency, and aggregate throughput when multiple applications use the OCF. Our evaluation shows that, despite its addition in the system as a device/service virtualization layer, the OCF is extremely efficient in utilizing cryptographic accelerator func- tionality, attaining 95% of the theoretical peak device performance. In another configuration, we were able to achieve a 3DES aggregate throughput of over 800 Mbps, by employing a multithreaded application and load-balancing across multiple accelerators. A secondary observation from our work is that small data buffers should be processed in software, if possible, freeing hardware accelerators to handle larger requests that better amortize the system and PCI transaction costs. On the other hand, multithreading results in increased utilization of the OCF, im- proving aggregate throughput. We make recommendations for future directions in architectural placement of cryptographic functionality,operating system pro- visions, and application design, and discuss several improvements and promis- ing directions