The Design of the OpenBSD Cryptographic Framework
Angelos D. Keromytis, Columbia University
Jason L. Wright, OpenBSD Project
Theo de Raadt, OpenBSD Project

Abstract

Cryptographic transformations are a fundamental building block in many security applications and protocols. To improve performance, several vendors market hardware accelerator cards. However, until now no operating system provided a mechanism that allowed both uniform and efficient use of this new type of resource.

We present the OpenBSD Cryptographic Framework (OCF), a service virtualization layer implemented inside the kernel that provides uniform access to accelerator functionality by hiding card-specific details behind a carefully designed API. We evaluate the impact of the OCF in a variety of benchmarks, measuring overall system performance, application throughput and latency, and aggregate throughput when multiple applications make use of it.

We conclude that the OCF is extremely efficient in utilizing cryptographic accelerator functionality, attaining 95% of the theoretical peak device performance and over 800 Mbit/sec of aggregate throughput using 3DES. We believe that this validates our decision to opt for ease of use by applications and kernel components through a uniform API, and for seamless support for new accelerators. Furthermore, our evaluation points to several bottlenecks in system and operating system design: data copying between user and kernel modes, PCI bus signaling inefficiency, protocols that use small data units, and single-threaded applications. We offer several suggestions for improvements and directions for future work.

1 Introduction

Today's computing systems are used for applications such as electronic commerce, tele-collaboration of various types, and evolving peer-to-peer systems, which often handle sensitive information. Security in these systems depends on several mechanisms that use cryptographic primitives as a basic building block. Such cryptographic primitives can be very complex [2] because their design is intended to impede simple, brute-force, computational attacks. This complexity drives the belief that strong security is fundamentally inimical to good performance.

This belief has led to a common predilection to avoid cryptography in favor of performance [22]. However, the foundation for this belief is often a software implementation [8] of algorithms intended for efficient hardware implementation. To address this issue, vendors have been marketing hardware cryptographic accelerators that implement several of the cryptographic algorithms used by security protocols and applications. However, modern operating systems lack the support needed to give both applications and the operating system itself efficient access to such functionality through a uniform API that abstracts away device details. As a result, accelerators are often used directly through libraries linked with applications. This typically requires device-specific knowledge on the part of the applications and prevents the operating system itself from easily using such hardware.

We present the OpenBSD Cryptographic Framework (OCF), a service virtualization layer implemented inside the kernel that provides uniform access to accelerator functionality by hiding device-specific details behind a carefully designed API. This abstraction allows us to easily support new hardware accelerators and lets applications use any such accelerator without device-specific knowledge. Furthermore, unlike many such abstractions, this intermediate layer does not unduly impact performance. The OCF has been in use with OpenBSD [5] for over three years and has proven stable and efficient in practice. It offers features such as load-balancing across multiple accelerators, session migration, and algorithm chaining. We describe the changes we made to the OpenBSD kernel and applications to take advantage of the OCF.

In previous work [18], we presented a preliminary analysis of the impact of hardware acceleration on network security protocols, without describing the OCF itself in any detail. Here, we evaluate the impact of the OCF in a variety of micro-benchmarks, measuring overall system performance, application throughput and latency, and aggregate throughput when multiple applications use the OCF.

Our evaluation shows that, despite its addition to the system as a device/service virtualization layer, the OCF is extremely efficient in utilizing cryptographic accelerator functionality, attaining 95% of the theoretical peak device performance. In another configuration, we achieved an aggregate 3DES throughput of over 800 Mbps by employing a multi-threaded application and load-balancing across multiple accelerators. Furthermore, using hardware accelerators can remove contention for the CPU and thus improve overall system responsiveness and performance for unrelated tasks. Our evaluation also showed that the limiting factors for high-performance cryptography in modern systems are data copying and the PCI bus. Moreover, small data buffers should be processed in software where possible, freeing hardware accelerators to handle larger requests that better amortize the system and PCI transaction costs. On the other hand, multi-threading increases utilization of the OCF, improving aggregate throughput. We make recommendations for future directions in the architectural placement of cryptographic functionality, operating system provisions, and application design, and discuss several improvements and promising directions for future work.

The framework has been in use with IPsec since OpenBSD 2.8, although it continues to evolve in response to new requirements. Public-key support and the /dev/crypto interface were introduced in a later version. The OCF has also been ported to FreeBSD and NetBSD, and we are working on Windows and Linux versions.

Paper Organization. Section 2 discusses related work. Section 3 describes the OCF's design and implementation, while Section 4 discusses its use by various subsystems and applications. In Section 5, we evaluate the framework's performance. Section 6 discusses some of the results, potential improvements, and future work. Section 7 concludes the paper.

2 Related Work

As interest in security is currently on the upswing, recent work has been examining the overall performance impact of security technologies in real systems. Work by Coarfa et al. [4] has focused on the impact of hardware accelerators in the context of TLS web servers using a trace-based methodology, and concludes that there is some opportunity for acceleration, […] choice one might prefer a second processor as it also assists with the substantial (and perhaps dominant) non-cryptographic overheads. [18] provides some basic performance characterizations of IPsec as well as other network security protocols, and of the impact acceleration has on throughput. The authors conclude that the relative cost of high-grade cryptography is low enough that it should be the default configuration.

There has been a considerable amount of work on enhancing system performance through the addition of cryptographic hardware [2]. This early work focused on the hardware accelerator itself rather than on its implications for overall system performance. [24] began examining cryptographic subsystem issues in the context of securing high-speed networks, and observed that bus-attached cards would be limited by bus-sharing with a network adapter on systems with a single I/O bus. A second issue pointed out in that time frame [20] was the cost of system calls, and a third [21, 23, 7, 11] was the cost of buffer copying. These issues are still with us, and continue to require aggressive design to reduce their impact.

[25] describes an API to cryptographic functions whose main purpose is to separate cryptographic libraries from applications, thus allowing independent development. Our service API is similar at a high level. However, several differences were dictated by the need to support actual hardware accelerators and to allow the API to be used efficiently by protocols such as IPsec and SSL, as we discuss in Section 3. Other work includes the Microsoft CryptoAPI [17], GSS-API [16] and IDUP-GSS-API [1], PKCS #11 [14], SSAPI [26], and the CDSA [19]. These are primarily intended for use by applications that also require authentication, authorization, key management, and other higher-level security services. Our work focuses on low-level cryptographic operations, providing a simple abstraction layer that does not significantly impact performance compared to a device-specific approach.

[10] describes an open-source cryptographic coprocessor, focusing on protecting keys and other sensitive information from tampering by unauthorized applications. The author extends the cryptlib library to communicate with the coprocessor. He discusses several options for hardware acceleration and identifies some potential performance bottlenecks, but his is mostly a qualitative analysis. That work is extended in [9], which presents a comprehensive cryptographic security architecture, again focusing primarily on preserving the confidentiality of users' (and applications') cryptographic keys. We are interested in a much simpler problem: how to accelerate cryptographic operations in a general-