The Big Data Bootstrap


Ariel Kleiner [email protected]
Ameet Talwalkar [email protected]
Purnamrita Sarkar [email protected]
Computer Science Division, University of California, Berkeley, CA 94720, USA

Michael I. Jordan [email protected]
Computer Science Division and Department of Statistics, University of California, Berkeley, CA 94720, USA

Abstract

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, the computation of bootstrap-based quantities can be prohibitively demanding. As an alternative, we present the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing estimator quality. BLB is well suited to modern parallel and distributed computing architectures and retains the generic applicability, statistical efficiency, and favorable theoretical properties of the bootstrap. We provide the results of an extensive empirical and theoretical investigation of BLB's behavior, including a study of its statistical correctness, its large-scale implementation and performance, selection of hyperparameters, and performance on real data.

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

1. Introduction

Assessing the quality of estimates based upon finite data is a task of fundamental importance in data analysis. For example, when estimating a vector of model parameters given a training dataset, it is useful to be able to quantify the uncertainty in that estimate (e.g., via a confidence region), its bias, and its risk. Such quality assessments provide far more information than a simple point estimate itself and can be used to improve human interpretation of inferential outputs, perform hypothesis testing, do bias correction, make more efficient use of available resources (e.g., by ceasing to process data when the confidence region is sufficiently small), perform active learning, and do feature selection, among many more potential uses.

Accurate assessment of estimate quality has been a longstanding concern in statistics. A great deal of classical work in this vein has proceeded via asymptotic analysis, which relies on deep study of particular classes of estimators in particular settings (Politis et al., 1999). While this approach ensures asymptotic correctness and allows analytic computation, its applicability is limited to cases in which the relevant asymptotic analysis is tractable and has actually been performed. In contrast, recent decades have seen greater focus on more automatic methods, which generally require significantly less analysis, at the expense of doing more computation. The bootstrap (Efron, 1979; Efron & Tibshirani, 1993) is perhaps the best known and most widely used among these methods, due to its simplicity, generic applicability, and automatic nature. Efforts to ameliorate statistical shortcomings of the bootstrap in turn led to the development of related methods such as the m out of n bootstrap and subsampling (Bickel et al., 1997; Politis et al., 1999).

Despite the computational demands of this more automatic methodology, advancements have been driven primarily by a desire to remedy shortcomings in statistical correctness. However, with the advent of increasingly large datasets and diverse sets of often complex and exploratory queries, computational considerations and automation (i.e., lack of reliance on deep analysis of the specific estimator and setting of interest) are increasingly important. Even as the amount of available data grows, the number of parameters to be estimated and the number of potential sources of bias often also grow, leading to a need to be able to tractably assess estimator quality in the setting of large data.

Thus, unlike previous work on estimator quality assessment, here we directly address computational costs and scalability in addition to statistical correctness and automation. Indeed, existing methods have significant drawbacks with respect to one or more of these desiderata. The bootstrap, despite its strong statistical properties, has high (even prohibitive) computational costs; thus, its usefulness is severely blunted by the large datasets increasingly encountered in practice. We also find that its relatives, such as the m out of n bootstrap and subsampling, can have lesser computational costs, as expected, but are generally not robust to specification of hyperparameters (such as the number of subsampled data points) and are also somewhat less automatic due to their need to explicitly utilize theoretically derived estimator convergence rates.

Motivated by the need for an automatic, accurate means of assessing estimator quality that is scalable to large datasets, we present a new procedure, the Bag of Little Bootstraps (BLB), which functions by combining the results of bootstrapping multiple small subsets of a larger original dataset. BLB has a significantly more favorable computational profile than the bootstrap, as it only requires repeated computation of the estimator under consideration on quantities of data that can be much smaller than the original dataset. Hence, BLB is well suited to implementation on modern distributed and parallel computing architectures. Our procedure maintains the bootstrap's automatic and generic applicability, favorable statistical properties, and simplicity of implementation. Finally, as we show empirically, BLB is consistently more robust than alternatives such as the m out of n bootstrap and subsampling.

We next formalize our statistical setting and notation in Section 2, discuss relevant prior work in Section 3, and present BLB in full detail in Section 4. Subsequently, we present an empirical and theoretical study of statistical correctness in Section 5, an exploration of scalability including large-scale experiments on a distributed computing platform in Section 6, practical methods for automatically selecting hyperparameters in Section 7, and assessments on real data in Section 8.

2. Setting and Notation

We assume that we observe a sample X_1, ..., X_n ∈ X drawn i.i.d. from some true (unknown) underlying distribution P ∈ P. Based only on this observed data, we obtain an estimate θ̂_n = θ(P_n) ∈ Θ, where P_n = n^(-1) Σ_{i=1}^n δ_{X_i} is the empirical distribution of X_1, ..., X_n. The true (unknown) population value to be estimated is θ(P). For example, θ̂_n might estimate a measure of correlation, the parameters of a regressor, or the prediction accuracy of a trained classification model. Noting that θ̂_n is a random quantity because it is based on n random observations, we define Q_n(P) ∈ Q as the true underlying distribution of θ̂_n, which is determined by both P and the form of the mapping θ. Our end goal is the computation of some metric ξ(Q_n(P)) : Q → Ξ, for Ξ a vector space, which informatively summarizes Q_n(P). For instance, ξ might compute a confidence region, a standard error, or a bias. In practice, we do not have direct knowledge of P or Q_n(P), and so we must estimate ξ(Q_n(P)) itself based only on the observed data and knowledge of the form of the mapping θ.

3. Related Work

The bootstrap (Efron, 1979; Efron & Tibshirani, 1993) provides an automatic and widely applicable means of quantifying estimator quality: it simply uses the data-driven plugin approximation ξ(Q_n(P)) ≈ ξ(Q_n(P_n)). While ξ(Q_n(P_n)) cannot be computed exactly in most cases, it is generally amenable to straightforward Monte Carlo approximation as follows: repeatedly resample n points i.i.d. from P_n, compute the estimate on each resample, form the empirical distribution Q*_n of the computed estimates, and approximate ξ(Q_n(P)) ≈ ξ(Q*_n). Though conceptually simple and powerful, this procedure requires repeated estimator computation on resamples having size comparable to that of the original dataset. Therefore, if the original dataset is large, then this computation can be costly.

While the literature does contain some discussion of techniques for improving the computational efficiency of the bootstrap, that work is largely devoted to reducing the number of Monte Carlo resamples required (Efron, 1988; Efron & Tibshirani, 1993). These techniques in general introduce significant additional complexity of implementation and do not obviate the need for repeated estimator computation on resamples having size comparable to that of the original dataset.

Bootstrap variants such as the m out of n bootstrap (Bickel et al., 1997) and subsampling (Politis et al., 1999) were introduced to achieve statistical consistency in edge cases in which the bootstrap fails, though they also have the potential to yield computational benefits. The m out of n bootstrap (and subsampling) functions as follows, for m < n: repeatedly resample m points i.i.d. from P_n (subsample m points without replacement from X_1, ..., X_n), compute the estimate on each resample (subsample), form the empirical distribution Q*_{n,m} of the computed estimates, approximate ξ(Q_m(P)) ≈ ξ(Q*_{n,m}), and apply an analytical correction to in turn approximate ξ(Q_n(P)). This final analytical correction uses prior knowledge of the convergence rate of θ̂_n as n increases

Algorithm 1 Bag of Little Bootstraps (BLB)
Input: Data X_1, ..., X_n
  θ: estimator of interest
  ξ: estimator quality assessment
  b: subset size
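The Monte Carlo procedures described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the paper's reference implementation: the estimator is taken to be the sample mean, ξ is taken to be the standard error, and since Algorithm 1 is truncated in this excerpt, the BLB subset handling (random subsets of size b, simple averaging of the per-subset assessments) is assumed for illustration.

```python
import random
import statistics

def bootstrap_stderr(data, estimator, num_resamples=200, seed=None):
    """Plugin bootstrap: approximate xi(Q_n(P)) by xi(Q_n(P_n)).
    Repeatedly resample n points i.i.d. from the empirical distribution
    P_n, compute the estimate on each resample, and summarize the
    empirical distribution of the estimates (here xi = standard error)."""
    rnd = random.Random(seed)
    n = len(data)
    estimates = [estimator([rnd.choice(data) for _ in range(n)])
                 for _ in range(num_resamples)]
    return statistics.stdev(estimates)

def blb_stderr(data, estimator, b, num_subsets=3, num_resamples=50, seed=None):
    """Bag of Little Bootstraps (sketch): bootstrap several small subsets
    of size b << n, with each resample still of nominal size n but drawn
    only from the subset's points, then average the per-subset quality
    assessments."""
    rnd = random.Random(seed)
    n = len(data)
    per_subset = []
    for _ in range(num_subsets):
        subset = rnd.sample(data, b)  # one small subset of size b
        estimates = [estimator([rnd.choice(subset) for _ in range(n)])
                     for _ in range(num_resamples)]
        per_subset.append(statistics.stdev(estimates))
    return sum(per_subset) / len(per_subset)
```

Note that each BLB resample contains at most b distinct points, so a practical implementation would represent a resample by multinomial counts over the subset rather than materializing a size-n list as done above; the explicit lists merely keep the sketch short. For data drawn from a standard normal, both functions should return values near σ/√n, the true standard error of the mean.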