Secure Processors Part II: Intel SGX Security Analysis and MIT Sanctum Architecture

Total Page:16

File Type:pdf, Size:1020Kb

Secure Processors Part II: Intel SGX Security Analysis and MIT Sanctum Architecture Foundations and Trends R in Electronic Design Automation Vol. 11, No. 3 (2017) 249–361 c 2017 V. Costan, I. Lebedev and S. Devadas DOI: 10.1561/1000000052 Secure Processors Part II: Intel SGX Security Analysis and MIT Sanctum Architecture Victor Costan, Ilia Lebedev and Srinivas Devadas [email protected], [email protected] and [email protected] Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Contents 1 Introduction 250 1.1 The Case for Hardware Isolation . 251 1.2 Intel SGX is Not the Answer . 252 1.3 MIT Sanctum Processor . 253 2 An Analysis of Intel’s Software Guard Extensions (SGX) 255 2.1 SGX Implementation Overview . 256 2.2 SGX Memory Access Protection . 261 2.3 SGX Security Check Correctness . 268 2.4 Tracking TLB Flushes . 276 2.5 Enclave Signature Verification . 280 2.6 Key Hierarchy and Derivation . 285 2.7 SGX Security Properties . 288 3 The MIT Sanctum Processor 306 3.1 Threat Model . 307 3.2 Programming Model Overview . 309 3.3 Protection Boundaries . 315 3.4 Security Primitives . 315 3.5 Hardware Modifications . 317 3.6 Software Design . 324 ii iii 3.7 Security Analysis of Sanctum . 338 3.8 Work Related to Sanctum Mechanisms . 348 4 Conclusion 350 Acknowledgments 352 References 353 Abstract This manuscript is the second in a two part survey and analysis of the state of the art in secure processor systems, with a specific fo- cus on remote software attestation and software isolation. The first part established the taxonomy and prerequisite concepts relevant to an examination of the state of the art in trusted remote computation: attested software isolation containers (enclaves). This second part ex- tends Part I’s description of Intel’s Software Guard Extensions (SGX), an available and documented enclave-capable system, with a rigorous security analysis of SGX as a system for trusted remote computation. This part documents the authors’ concerns over the shortcomings of SGX as a secure system and introduces the MIT Sanctum processor developed by the authors: a system designed to offer stronger security guarantees, lend itself better to analysis and formal verification, and offer a more straightforward and complete threat model than the Intel system, all with an equivalent programming model. This two part work advocates a principled, transparent, and well- scrutinized approach to system design, and argues that practical guar- antees of privacy and integrity for remote computation are achievable at a reasonable design cost and performance overhead. V. Costan, I. Lebedev and S. Devadas. Secure Processors Part II: Intel SGX Security Analysis and MIT Sanctum Architecture. Foundations and Trends R in Electronic Design Automation, vol. 11, no. 3, pp. 249–361, 2017. DOI: 10.1561/1000000052. 1 Introduction Between the Snowden revelations and the seemingly unending series of high-profile hacks of the past few years, the public’s confidence in software systems has decreased considerably. At the same time, key initiatives such as cloud computing and the IoT (Internet of Things) are gaining popularity but require users to place much trust in the systems providing these services. We must therefore develop capa- bilities to build software systems with compelling security, and gain back our users’ trust. This manuscript is the second in a two part survey of the state of the art in secure processor systems, with a specific focus on remote software attestation and software isolation. Part I [Costan et al., 2017] established relevant background in computer system design (§ I.2) and security primitives (§ I.3), and surveyed relevant prior work (§ I.4). The same work discussed the attested software isolation container (en- clave): a modern primitive for modular secure software and trusted remote computation, as exemplified by Intel’s Software Guard Ex- tensions (§ I.5). This manuscript extends the discussion of enclaves and SGX by surveying the implementation and security properties of SGX (§ 2), 250 1.1. The Case for Hardware Isolation 251 and documents the authors’ concerns with its vulnerabilities to several classes of software attacks. Informed by the successes and shortcom- ings of SGX, this manuscript also discusses the MIT Sanctum proces- sor (§ 3): a secure processor that offers an equivalent programming model with strong security guarantees against an insidious software threat model including cache timing and memory access pattern at- tacks. With this work, we hope to enable a shift in discourse in secure hardware architecture away from plugging specific security holes to a principled approach to eliminating attack surfaces. 1.1 The Case for Hardware Isolation The best known practical method for securing a software system amounts to modularizing the system’s code in a way that minimizes code in the modules responsible for the system’s security. Formal ver- ification techniques are then applied to these modules, which make up the system’s trusted codebase (TCB). The method assumes that software modules are isolated, so the TCB must also include the mech- anism providing the isolation guarantees. Today’s systems rely on an operating system kernel, or a hypervi- sor (such as Linux or Xen, respectively) for software isolation. How- ever each of the last three years (2012-2014) witnessed over 100 new security vulnerabilities in Linux [cve, 2014a, Chen et al., 2011], and over 40 in Xen [cve, 2014b]. One may hope that formal verification methods can produce a se- cure kernel or hypervisor. Unfortunately, these codebases are far out- side our verification capabilities: Linux and Xen have over 17 million [Anthony, 2014] and 150,000 [xen, 2015] lines of code, respectively. In stark contrast, the seL4 formal verification effort [Klein et al., 2009] spent 20 man-years to cover 9,000 lines of code. Given Linux and Xen’s history of vulnerabilities and uncertain prospects for formal verification, a prudent system designer cannot in- clude either in a TCB (trusted computing base), and must look else- where for a software isolation mechanism. 252 Introduction Fortunately, Intel’s Software Guard Extensions (SGX) [McKeen et al., 2013, Anati et al., 2013] has brought attention to the alter- native of providing software isolation primitives in the CPU’s hard- ware. This avenue is appealing because the CPU is an unavoidable TCB component, and processor manufacturers have strong economic incentives to build correct hardware. 1.2 Intel SGX is Not the Answer Unfortunately, although the SGX design includes a vast array of de- fenses against a variety of software and physical attacks, it fails to offer meaningful software isolation guarantees. The SGX threat model pro- tects against all direct attacks, but excludes “side-channel attacks”, even if they can be performed via software alone. Alarmingly, cache timing attacks require only unprivileged software running on the victim’s host computer, and do not rely on any phys- ical access to the machine. This is particularly concerning in a cloud computing scenario, where gaining software access to the victim’s com- puter only requires a credit card [Ristenpart et al., 2009], whereas physical access is harder, requiring trespass, coercion, or social engi- neering on the cloud provider’s employees. Similarly, in many Internet of Things (IoT) scenarios, the process- ing units have some amount of physical security, but they run outdated software stacks that have known security vulnerabilities. For example, an attacker may exploit a vulnerability in an IoT lock’s Bluetooth stack and obtain software execution privileges, then mount a cache timing attack on its access-granting process, and obtain the crypto- graphic key that opens the lock. Furthermore, the analysis of SGX documentation as described in Part I of this work reveals that it is impossible for anyone but In- tel to reason about SGX’s security properties, because significant im- plementation details are not covered by the publicly available docu- mentation. This is a concern, as the myriad of security vulnerabilities [Wojtczuk and Rutkowska, 2011, 2009b, Wojtczuk et al., 2009, Duflot et al., 2006, Rutkowska and Wojtczuk, 2008, Wojtczuk and Rutkowska, 1.3. MIT Sanctum Processor 253 2009a, Wecherowski, 2009, Embleton et al., 2010] in TXT [Grawrock, 2009], Intel’s previous attempt at securing remote computation, show that securing the machinery underlying Intel’s processors is incredibly challenging, even in the presence of strong economic incentives. If a successor to SGX claimed to protect against cache timing at- tacks, substantiating such a claim would require an analysis of its hard- ware and microcode, and ensuring that no implementation detail is vulnerable to cache timing attacks. Barring a highly unlikely shift to open-source hardware from Intel, such analysis will never happen. A concrete example: the SGX documentation [Int, 2013, 2014] does not state where SGX stores the EPCM (enclave page cache map). If the EPCM is stored in cacheable RAM, page translation verification is subject to cache timing attacks. Interestingly, this detail is unnecessary for analyzing the security of today’s SGX implementation, as we know that SGX uses the operating system’s page tables, and page transla- tions are therefore vulnerable to cache timing attacks. The example does, however, demonstrate the fine nature of crucial details that are simply undocumented in today’s hardware security implementations. In summary, while the principles behind SGX have great potential, the SGX design does not offer meaningful isolation guarantees, and the SGX implementation is not open enough for independent researchers to be able to analyze its security properties. 1.3 MIT Sanctum Processor The Sanctum processor’s main contribution is a software isolation scheme that addresses the issues raised above: Sanctum’s isolation prov- ably defends against known software side-channel attacks, including cache timing attacks and passive address translation attacks. Sanctum is a co-design that combines minimal and minimally invasive hard- ware modifications with a trusted software security monitor that is amenable to rigorous analysis and does not perform cryptographic operations using keys.
Recommended publications
  • Memory Hierarchy Memory Hierarchy
    Memory Key challenge in modern computer architecture Lecture 2: different memory no point in blindingly fast computation if data can’t be and variable types moved in and out fast enough need lots of memory for big applications Prof. Mike Giles very fast memory is also very expensive [email protected] end up being pushed towards a hierarchical design Oxford University Mathematical Institute Oxford e-Research Centre Lecture 2 – p. 1 Lecture 2 – p. 2 CPU Memory Hierarchy Memory Hierarchy Execution speed relies on exploiting data locality 2 – 8 GB Main memory 1GHz DDR3 temporal locality: a data item just accessed is likely to be used again in the near future, so keep it in the cache ? 200+ cycle access, 20-30GB/s spatial locality: neighbouring data is also likely to be 6 used soon, so load them into the cache at the same 2–6MB time using a ‘wide’ bus (like a multi-lane motorway) L3 Cache 2GHz SRAM ??25-35 cycle access 66 This wide bus is only way to get high bandwidth to slow 32KB + 256KB main memory ? L1/L2 Cache faster 3GHz SRAM more expensive ??? 6665-12 cycle access smaller registers Lecture 2 – p. 3 Lecture 2 – p. 4 Caches Importance of Locality The cache line is the basic unit of data transfer; Typical workstation: typical size is 64 bytes 8 8-byte items. ≡ × 10 Gflops CPU 20 GB/s memory L2 cache bandwidth With a single cache, when the CPU loads data into a ←→ 64 bytes/line register: it looks for line in cache 20GB/s 300M line/s 2.4G double/s ≡ ≡ if there (hit), it gets data At worst, each flop requires 2 inputs and has 1 output, if not (miss), it gets entire line from main memory, forcing loading of 3 lines = 100 Mflops displacing an existing line in cache (usually least ⇒ recently used) If all 8 variables/line are used, then this increases to 800 Mflops.
    [Show full text]
  • Intel® Architecture Instruction Set Extensions and Future Features Programming Reference
    Intel® Architecture Instruction Set Extensions and Future Features Programming Reference 319433-037 MAY 2019 Intel technologies features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Intel does not assume any liability for lost or stolen data or systems or any damages resulting from such losses. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifica- tions. Current characterized errata are available on request. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Intel does not guarantee the availability of these interfaces in any future product. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1- 800-548-4725, or by visiting http://www.intel.com/design/literature.htm. Intel, the Intel logo, Intel Deep Learning Boost, Intel DL Boost, Intel Atom, Intel Core, Intel SpeedStep, MMX, Pentium, VTune, and Xeon are trademarks of Intel Corporation in the U.S.
    [Show full text]
  • Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance
    White Paper Inside Intel® Core™ Microarchitecture Setting New Standards for Energy-Efficient Performance Ofri Wechsler Intel Fellow, Mobility Group Director, Mobility Microprocessor Architecture Intel Corporation White Paper Inside Intel®Core™ Microarchitecture Introduction Introduction 2 The Intel® Core™ microarchitecture is a new foundation for Intel®Core™ Microarchitecture Design Goals 3 Intel® architecture-based desktop, mobile, and mainstream server multi-core processors. This state-of-the-art multi-core optimized Delivering Energy-Efficient Performance 4 and power-efficient microarchitecture is designed to deliver Intel®Core™ Microarchitecture Innovations 5 increased performance and performance-per-watt—thus increasing Intel® Wide Dynamic Execution 6 overall energy efficiency. This new microarchitecture extends the energy efficient philosophy first delivered in Intel's mobile Intel® Intelligent Power Capability 8 microarchitecture found in the Intel® Pentium® M processor, and Intel® Advanced Smart Cache 8 greatly enhances it with many new and leading edge microar- Intel® Smart Memory Access 9 chitectural innovations as well as existing Intel NetBurst® microarchitecture features. What’s more, it incorporates many Intel® Advanced Digital Media Boost 10 new and significant innovations designed to optimize the Intel®Core™ Microarchitecture and Software 11 power, performance, and scalability of multi-core processors. Summary 12 The Intel Core microarchitecture shows Intel’s continued Learn More 12 innovation by delivering both greater energy efficiency Author Biographies 12 and compute capability required for the new workloads and usage models now making their way across computing. With its higher performance and low power, the new Intel Core microarchitecture will be the basis for many new solutions and form factors. In the home, these include higher performing, ultra-quiet, sleek and low-power computer designs, and new advances in more sophisticated, user-friendly entertainment systems.
    [Show full text]
  • Make the Most out of Last Level Cache in Intel Processors In: Proceedings of the Fourteenth Eurosys Conference (Eurosys'19), Dresden, Germany, 25-28 March 2019
    http://www.diva-portal.org Postprint This is the accepted version of a paper presented at EuroSys'19. Citation for the original published paper: Farshin, A., Roozbeh, A., Maguire Jr., G Q., Kostic, D. (2019) Make the Most out of Last Level Cache in Intel Processors In: Proceedings of the Fourteenth EuroSys Conference (EuroSys'19), Dresden, Germany, 25-28 March 2019. ACM Digital Library N.B. When citing this work, cite the original published paper. Permanent link to this version: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-244750 Make the Most out of Last Level Cache in Intel Processors Alireza Farshin∗† Amir Roozbeh∗ KTH Royal Institute of Technology KTH Royal Institute of Technology [email protected] Ericsson Research [email protected] Gerald Q. Maguire Jr. Dejan Kostić KTH Royal Institute of Technology KTH Royal Institute of Technology [email protected] [email protected] Abstract between Central Processing Unit (CPU) and Direct Random In modern (Intel) processors, Last Level Cache (LLC) is Access Memory (DRAM) speeds has been increasing. One divided into multiple slices and an undocumented hashing means to mitigate this problem is better utilization of cache algorithm (aka Complex Addressing) maps different parts memory (a faster, but smaller memory closer to the CPU) in of memory address space among these slices to increase order to reduce the number of DRAM accesses. the effective memory bandwidth. After a careful study This cache memory becomes even more valuable due to of Intel’s Complex Addressing, we introduce a slice- the explosion of data and the advent of hundred gigabit per aware memory management scheme, wherein frequently second networks (100/200/400 Gbps) [9].
    [Show full text]
  • Microcode Revision Guidance August 31, 2019 MCU Recommendations
    microcode revision guidance August 31, 2019 MCU Recommendations Section 1 – Planned microcode updates • Provides details on Intel microcode updates currently planned or available and corresponding to Intel-SA-00233 published June 18, 2019. • Changes from prior revision(s) will be highlighted in yellow. Section 2 – No planned microcode updates • Products for which Intel does not plan to release microcode updates. This includes products previously identified as such. LEGEND: Production Status: • Planned – Intel is planning on releasing a MCU at a future date. • Beta – Intel has released this production signed MCU under NDA for all customers to validate. • Production – Intel has completed all validation and is authorizing customers to use this MCU in a production environment.
    [Show full text]
  • Vxworks Architecture Supplement, 6.2
    VxWorks Architecture Supplement VxWorks® ARCHITECTURE SUPPLEMENT 6.2 Copyright © 2005 Wind River Systems, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means without the prior written permission of Wind River Systems, Inc. Wind River, the Wind River logo, Tornado, and VxWorks are registered trademarks of Wind River Systems, Inc. Any third-party trademarks referenced are the property of their respective owners. For further information regarding Wind River trademarks, please see: http://www.windriver.com/company/terms/trademark.html This product may include software licensed to Wind River by third parties. Relevant notices (if any) are provided in your product installation at the following location: installDir/product_name/3rd_party_licensor_notice.pdf. Wind River may refer to third-party documentation by listing publications or providing links to third-party Web sites for informational purposes. Wind River accepts no responsibility for the information provided in such third-party documentation. Corporate Headquarters Wind River Systems, Inc. 500 Wind River Way Alameda, CA 94501-1153 U.S.A. toll free (U.S.): (800) 545-WIND telephone: (510) 748-4100 facsimile: (510) 749-2010 For additional contact information, please visit the Wind River URL: http://www.windriver.com For information on how to contact Customer Support, please visit the following URL: http://www.windriver.com/support VxWorks Architecture Supplement, 6.2 11 Oct 05 Part #: DOC-15660-ND-00 Contents 1 Introduction
    [Show full text]
  • Migration from IBM 750FX to MPC7447A by Douglas Hamilton European Applications Engineering Networking and Computing Systems Group Freescale Semiconductor, Inc
    Freescale Semiconductor AN2808 Application Note Rev. 1, 06/2005 Migration from IBM 750FX to MPC7447A by Douglas Hamilton European Applications Engineering Networking and Computing Systems Group Freescale Semiconductor, Inc. Contents 1 Scope and Definitions 1. Scope and Definitions . 1 2. Feature Overview . 2 The purpose of this application note is to facilitate migration 3. 7447A Specific Features . 12 from IBM’s 750FX-based systems to Freescale’s 4. Programming Model . 16 MPC7447A. It addresses the differences between the 5. Hardware Considerations . 27 systems, explaining which features have changed and why, 6. Revision History . 30 before discussing the impact on migration in terms of hardware and software. Throughout this document the following references are used: • 750FX—which applies to Freescale’s MPC750, MPC740, MPC755, and MPC745 devices, as well as to IBM’s 750FX devices. Any features specific to IBM’s 750FX will be explicitly stated as such. • MPC7447A—which applies to Freescale’s MPC7450 family of products (MPC7450, MPC7451, MPC7441, MPC7455, MPC7445, MPC7457, MPC7447, and MPC7447A) except where otherwise stated. Because this document is to aid the migration from 750FX, which does not support L3 cache, the L3 cache features of the MPC745x devices are not mentioned. © Freescale Semiconductor, Inc., 2005. All rights reserved. Feature Overview 2 Feature Overview There are many differences between the 750FX and the MPC7447A devices, beyond the clear differences of the core complex. This section covers the differences between the cores and then other areas of interest including the cache configuration and system interfaces. 2.1 Cores The key processing elements of the G3 core complex used in the 750FX are shown below in Figure 1, and the G4 complex used in the 7447A is shown in Figure 2.
    [Show full text]
  • CPUID Handling for Guests
    CPUID handling for guests Andrew Cooper Citrix XenServer Friday 26th August 2016 Andrew Cooper (Citrix XenServer) CPUID handling for guests Friday 26th August 2016 1 / 11 Return information about the processor I Identifying information (GenuineIntel, AuthenticAMD, etc) I Feature information (available instructions, MSRs, etc) I Topology information (sockets, cores, threads, caches, TLBs, etc) Unpriveleged I Useable by userspace I Doesn't trap to supervisor mode The CPUID instruction Introduced in 1993 I Takes input parameters in %eax and %ecx I Returns values in %eax, %ebx, %ecx and %edx Andrew Cooper (Citrix XenServer) CPUID handling for guests Friday 26th August 2016 2 / 11 Unpriveleged I Useable by userspace I Doesn't trap to supervisor mode The CPUID instruction Introduced in 1993 I Takes input parameters in %eax and %ecx I Returns values in %eax, %ebx, %ecx and %edx Return information about the processor I Identifying information (GenuineIntel, AuthenticAMD, etc) I Feature information (available instructions, MSRs, etc) I Topology information (sockets, cores, threads, caches, TLBs, etc) Andrew Cooper (Citrix XenServer) CPUID handling for guests Friday 26th August 2016 2 / 11 The CPUID instruction Introduced in 1993 I Takes input parameters in %eax and %ecx I Returns values in %eax, %ebx, %ecx and %edx Return information about the processor I Identifying information (GenuineIntel, AuthenticAMD, etc) I Feature information (available instructions, MSRs, etc) I Topology information (sockets, cores, threads, caches, TLBs, etc) Unpriveleged I Useable by userspace I Doesn't trap to supervisor mode Andrew Cooper (Citrix XenServer) CPUID handling for guests Friday 26th August 2016 2 / 11 Some kernels binary patch themselves (e.g.
    [Show full text]
  • POWER-AWARE MICROARCHITECTURE: Design and Modeling Challenges for Next-Generation Microprocessors
    POWER-AWARE MICROARCHITECTURE: Design and Modeling Challenges for Next-Generation Microprocessors THE ABILITY TO ESTIMATE POWER CONSUMPTION DURING EARLY-STAGE DEFINITION AND TRADE-OFF STUDIES IS A KEY NEW METHODOLOGY ENHANCEMENT. OPPORTUNITIES FOR SAVING POWER CAN BE EXPOSED VIA MICROARCHITECTURE-LEVEL MODELING, PARTICULARLY THROUGH CLOCK- GATING AND DYNAMIC ADAPTATION. Power dissipation limits have Thus far, most of the work done in the area David M. Brooks emerged as a major constraint in the design of high-level power estimation has been focused of microprocessors. At the low end of the per- at the register-transfer-level (RTL) description Pradip Bose formance spectrum, namely in the world of in the processor design flow. Only recently have handheld and portable devices or systems, we seen a surge of interest in estimating power Stanley E. Schuster power has always dominated over perfor- at the microarchitecture definition stage, and mance (execution time) as the primary design specific work on power-efficient microarchi- Hans Jacobson issue. Battery life and system cost constraints tecture design has been reported.2-8 drive the design team to consider power over Here, we describe the approach of using Prabhakar N. Kudva performance in such a scenario. energy-enabled performance simulators in Increasingly, however, power is also a key early design. We examine some of the emerg- Alper Buyuktosunoglu design issue in the workstation and server mar- ing paradigms in processor design and com- kets (see Gowan et al.)1 In this high-end arena ment on their inherent power-performance John-David Wellman the increasing microarchitectural complexities, characteristics. clock frequencies, and die sizes push the chip- Victor Zyuban level—and hence the system-level—power Power-performance efficiency consumption to such levels that traditionally See the “Power-performance fundamentals” Manish Gupta air-cooled multiprocessor server boxes may box.
    [Show full text]
  • Caches & Memory
    Caches & Memory Hakim Weatherspoon CS 3410 Computer Science Cornell University [Weatherspoon, Bala, Bracy, McKee, and Sirer] Programs 101 C Code RISC-V Assembly int main (int argc, char* argv[ ]) { main: addi sp,sp,-48 int i; sw x1,44(sp) int m = n; sw fp,40(sp) int sum = 0; move fp,sp sw x10,-36(fp) for (i = 1; i <= m; i++) { sw x11,-40(fp) sum += i; la x15,n } lw x15,0(x15) printf (“...”, n, sum); sw x15,-28(fp) } sw x0,-24(fp) li x15,1 sw x15,-20(fp) Load/Store Architectures: L2: lw x14,-20(fp) lw x15,-28(fp) • Read data from memory blt x15,x14,L3 (put in registers) . • Manipulate it .Instructions that read from • Store it back to memory or write to memory… 2 Programs 101 C Code RISC-V Assembly int main (int argc, char* argv[ ]) { main: addi sp,sp,-48 int i; sw ra,44(sp) int m = n; sw fp,40(sp) int sum = 0; move fp,sp sw a0,-36(fp) for (i = 1; i <= m; i++) { sw a1,-40(fp) sum += i; la a5,n } lw a5,0(x15) printf (“...”, n, sum); sw a5,-28(fp) } sw x0,-24(fp) li a5,1 sw a5,-20(fp) Load/Store Architectures: L2: lw a4,-20(fp) lw a5,-28(fp) • Read data from memory blt a5,a4,L3 (put in registers) . • Manipulate it .Instructions that read from • Store it back to memory or write to memory… 3 1 Cycle Per Stage: the Biggest Lie (So Far) Code Stored in Memory (also, data and stack) compute jump/branch targets A memory register ALU D D file B +4 addr PC B control din dout M inst memory extend new imm forward pc Stack, Data, Code detect unit hazard Stored in Memory Instruction Instruction Write- ctrl ctrl ctrl Fetch Decode Execute Memory Back IF/ID
    [Show full text]
  • SIMD Extensions
    SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
    [Show full text]
  • Hardware Architecture
    Hardware Architecture Components Computing Infrastructure Components Servers Clients LAN & WLAN Internet Connectivity Computation Software Storage Backup Integration is the Key ! Security Data Network Management Computer Today’s Computer Computer Model: Von Neumann Architecture Computer Model Input: keyboard, mouse, scanner, punch cards Processing: CPU executes the computer program Output: monitor, printer, fax machine Storage: hard drive, optical media, diskettes, magnetic tape Von Neumann architecture - Wiki Article (15 min YouTube Video) Components Computer Components Components Computer Components CPU Memory Hard Disk Mother Board CD/DVD Drives Adaptors Power Supply Display Keyboard Mouse Network Interface I/O ports CPU CPU CPU – Central Processing Unit (Microprocessor) consists of three parts: Control Unit • Execute programs/instructions: the machine language • Move data from one memory location to another • Communicate between other parts of a PC Arithmetic Logic Unit • Arithmetic operations: add, subtract, multiply, divide • Logic operations: and, or, xor • Floating point operations: real number manipulation Registers CPU Processor Architecture See How the CPU Works In One Lesson (20 min YouTube Video) CPU CPU CPU speed is influenced by several factors: Chip Manufacturing Technology: nm (2002: 130 nm, 2004: 90nm, 2006: 65 nm, 2008: 45nm, 2010:32nm, Latest is 22nm) Clock speed: Gigahertz (Typical : 2 – 3 GHz, Maximum 5.5 GHz) Front Side Bus: MHz (Typical: 1333MHz , 1666MHz) Word size : 32-bit or 64-bit word sizes Cache: Level 1 (64 KB per core), Level 2 (256 KB per core) caches on die. Now Level 3 (2 MB to 8 MB shared) cache also on die Instruction set size: X86 (CISC), RISC Microarchitecture: CPU Internal Architecture (Ivy Bridge, Haswell) Single Core/Multi Core Multi Threading Hyper Threading vs.
    [Show full text]