Lecture 13: Cache and Virtual Memory Review

Cache optimization approaches, cache miss classification
Adapted from UCB CS252 S01

What Is Memory Hierarchy

A typical memory hierarchy today:
  • Proc/Regs
  • L1-Cache
  • L2-Cache
  • L3-Cache (optional)
  • Memory
  • Disk, Tape, etc.
[Levels get bigger toward the bottom of the list and faster toward the top.]
Here we focus on L1/L2/L3 caches and main memory.

Why Memory Hierarchy?

[Figure: processor vs. DRAM performance, 1980-2000, log scale from 1 to 1000. CPU ("Moore's Law") improves 60%/yr, DRAM 7%/yr; the processor-memory performance gap grows 50%/year.]
1980: no cache in µproc; 1995: 2-level cache on chip (1989: first Intel µproc with a cache on chip)

Generations of Microprocessors

Time of a full cache miss in instructions executed:
  • 1st Alpha: 340 ns / 5.0 ns = 68 clks x 2, or 136
  • 2nd Alpha: 266 ns / 3.3 ns = 80 clks x 4, or 320
  • 3rd Alpha: 180 ns / 1.7 ns = 108 clks x 6, or 648
1/2X latency x 3X clock rate x 3X instr/clock ⇒ 4.5X

Area Costs of Caches

Processor          % Area (≈ cost)   % Transistors (≈ power)
Intel 80386        0%                0%
Alpha 21164        37%               77%
StrongArm SA110    61%               94%
Pentium Pro        64%               88%   (2 dies per package: Proc/I$/D$ + L2$)
Itanium: 92% (the slide gives a single figure)

What Is Cache, Exactly?

  • Small, fast storage used to improve average access time to slow memory; usually made of SRAM
  • Exploits locality: spatial and temporal
  • In computer architecture, almost everything is a cache!
    - Register file is the fastest place to cache variables
    - First-level cache: a cache on second-level cache
    - Second-level cache: a cache on memory
    - Memory: a cache on disk (virtual memory)
    - TLB: a cache on page table
    - Branch prediction: a cache on prediction information?
    - Branch-target buffer can be implemented as a cache
  • Beyond architecture: file cache, browser cache, proxy cache
  • Caches store redundant data only to close the performance gap
Here we focus on L1 and L2 caches (L3 optional) as buffers to main memory.

Example: 1 KB Direct Mapped Cache

Assume a cache of 2^N bytes with 2^K blocks and a block size of 2^M bytes; N = M + K (number of blocks times block size).
  • (32 - N)-bit cache tag, K-bit cache index, and M-bit block offset
  • The cache stores tag, data, and valid bit for each block
    - Cache index is used to select a block in SRAM (recall BHT, BTB)
    - Block tag is compared with the input tag
    - A word in the data block may be selected as the output
[Figure: a 32-bit block address split into tag (bits 31-10, example 0x50), index (bits 9-5, example 0x01) and block offset (bits 4-0, example 0x00). The valid bit and cache tag are stored as part of the cache "state". The cache array has 32 entries of 32 bytes each: entry 0 holds bytes 0-31, entry 1 (tag 0x50) holds bytes 32-63, ..., entry 31 holds bytes 992-1023.]

Four Questions About Cache Design

  • Block placement: Where can a block be placed?
  • Block identification: How to find a block in the cache?
  • Block replacement: If a new block is to be fetched, which of the existing blocks should be replaced (if there are multiple choices)?
  • Write policy: What happens on a write?

Where Can A Block Be Placed

  • What is a block: divide memory space into blocks as the cache is divided
    - A memory block is the basic unit to be cached
  • Direct mapped cache: there is only one place in the cache to buffer a given memory block
  • N-way set associative cache: N places for a given memory block
    - Like N direct mapped caches operating in parallel
    - Reduces miss rates at the price of increased complexity, cache access time, and power consumption
  • Fully associative cache: a memory block can be put anywhere in the cache

Set Associative Cache

Example: Two-way set associative cache
  • Cache index selects a set of two blocks
  • The two tags in the set are compared to the input in parallel
  • Data is selected based on the tag comparison
Set associative or direct mapped? Discussed later.
[Figure: two-way set associative cache. The cache index selects one entry (valid, cache tag, cache data) in each way; each stored tag is compared with the address tag; the compare outputs drive Sel1/Sel0 of a mux that selects the cache block and are ORed to produce Hit.]

How to Find a Cached Block

  • Direct mapped cache: the stored tag for the cache block matches the input tag
  • Fully associative cache: any of the stored N tags matches the input tag
  • Set associative cache: any of the stored K tags for the cache set matches the input tag
  • Cache hit time is decided by both tag comparison and data access
    - Can be determined by the Cacti model

Which Block to Replace?

  • Direct mapped cache: not an issue
  • For set associative or fully associative* cache:
    - Random: select candidate blocks randomly from the cache set
    - LRU (Least Recently Used): replace the block that has been unused for the longest time
    - FIFO (First In, First Out): replace the oldest block
  • Usually LRU performs the best, but it is hard (and expensive) to implement
*Think of a fully associative cache as a set associative one with a single set.

What Happens on Writes

Where to write the data if the block is found in cache?
  • Write through: new data is written to both the cache block and the lower-level memory
    - Helps to maintain cache consistency
  • Write back: new data is written only to the cache block
    - Lower-level memory is updated when the block is replaced
    - A dirty bit is used to indicate the necessity
    - Helps to reduce memory traffic
What happens if the block is not found in cache?
  • Write allocate: fetch the block into cache, then write the data (usually combined with write back)
  • No-write allocate: do not fetch the block into cache (usually combined with write through)

Real Example: Alpha 21264 Caches

  • 64KB 2-way associative instruction cache
  • 64KB 2-way associative data cache
[Figure: Alpha 21264 chip with the I-cache and D-cache labeled.]

Alpha 21264 Data Cache

  • D-cache: 64KB, 2-way associative
  • Uses the 48-bit virtual address to index the cache and the tag from the physical address
  • 48-bit virtual => 44-bit physical address
  • 512 blocks per way (9-bit block index)
  • Cache block size 64 bytes (6-bit offset)
  • Tag has 44 - (9 + 6) = 29 bits
  • Write back and write allocate
(We will study virtual-physical address translation.)

Cache performance

  • Calculate average memory access time (AMAT):
    $\text{AMAT} = \text{hit time} + \text{miss rate} \times \text{miss penalty}$
    - Example: hit time = 1 cycle, miss penalty = 100 cycles, miss rate = 4%, then AMAT = 1 + 100 × 4% = 5 cycles
  • Calculate cache impact on processor performance:
    $\text{CPU time} = (\text{CPU execution cycles} + \text{memory stall cycles}) \times \text{cycle time}$
    $\text{CPU time} = IC \times \left( CPI_{\text{execution}} + \frac{\text{memory stall cycles}}{\text{instruction}} \right) \times \text{cycle time}$
  • Note that cycles spent on cache hits are usually counted as execution cycles

Disadvantage of Set Associative Cache

  • Compare n-way set associative with direct mapped cache:
    - Has n comparators vs. 1 comparator
    - Has extra MUX delay for the data
    - Data comes after hit/miss decision and set selection
  • In a direct mapped cache, the cache block is available before the hit/miss decision
    - Use the data assuming the access is a hit, recover if found otherwise
[Figure: the two-way set associative cache datapath again (cache index, valid/tag/data per way, tag compare, mux select, OR to Hit).]

Virtual Memory

  • Virtual memory (VM) allows programs to have the illusion of a very large memory that is not limited by physical memory size
    - Makes main memory (DRAM) act like a cache for secondary storage (magnetic disk)
    - Otherwise, application programmers have to move data in and out of main memory
    - That is how virtual memory was first proposed
  • Virtual memory also provides the following functions
    - Allowing multiple processes to share the physical memory in a multiprogramming environment
    - Providing protection for processes (compare Intel 8086: without VM, applications can overwrite the OS kernel)
    - Facilitating program relocation in physical memory space

VM Example

[Figure: virtual memory example.]

Virtual Memory and Cache

  • VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory and secondary storage
  • Cache terms vs. VM terms
    - Cache block => page
    - Cache miss => page fault
  • Tasks of hardware and OS
    - TLB does fast address translations
    - OS handles less frequent events: page fault, and TLB miss (when a software approach is used)

Virtual Memory and Cache

Parameter           L1 Cache                    Main Memory
Block (page) size   16-128 bytes                4KB - 64KB
Hit time            1-3 cycles                  50-150 cycles
Miss penalty        8-300 cycles                1M to 10M cycles
Miss rate           0.1-10%                     0.00001-0.001%
Address mapping     25-45 bits => 13-21 bits    32-64 bits => 25-45 bits

4 Qs for Virtual Memory

Q1: Where can a block be placed in the upper level?
  • Miss penalty for virtual memory is very high => full associativity is desirable (so allow blocks to be placed anywhere in the memory)
  • Have software determine the location while accessing disk (10M cycles is enough to do sophisticated replacement)
Q2: How is a block found if it is in the upper level?
  • Address divided into page number and page offset
  • Page table and translation buffer used for address translation
  • Q: Why does full associativity not affect hit time?

4 Qs for Virtual Memory

Q3: Which block should be replaced on a miss?
  • Want to reduce miss rate, and can handle it in software
  • Least Recently Used is typically used
  • A typical approximation of LRU:
    - Hardware sets reference bits
    - OS records reference bits and clears them periodically
    - OS selects a page among the least-recently referenced for replacement
Q4: What happens on a write?
  • Writing to disk is very expensive
  • Use a write-back strategy

Virtual-Physical Translation

  • A virtual address consists of a virtual page number and a page offset
  • The virtual page number gets translated to a physical page number
  • The page offset is not changed
[Figure: Virtual address = virtual page number (36 bits) + page offset (12 bits). Translation maps it to a physical address = physical page number (33 bits) + page offset (12 bits).]

Address Translation Via Page Table

[Figure: address translation via page table.]

TLB: Improving Page Table Access

  • Cannot afford accessing the page table for every access, including cache hits (then the cache itself makes no sense)
  • Again, use a cache to speed up accesses to the page table! (cache for cache?)
  • TLB is translation ...
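The address split from the "Example: 1 KB Direct Mapped Cache" slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `split_address` and its default parameters (1 KB cache, 32-byte blocks, taken from the slide's figure) are assumptions made for the example, and sizes are assumed to be powers of two.

```python
def split_address(addr, cache_bytes=1024, block_bytes=32):
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    Hypothetical helper; assumes cache_bytes and block_bytes are powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1      # M: block size is 2**M bytes
    num_blocks = cache_bytes // block_bytes         # 2**K blocks in the cache
    index_bits = num_blocks.bit_length() - 1        # K
    offset = addr & (block_bytes - 1)               # low M bits
    index = (addr >> offset_bits) & (num_blocks - 1)  # next K bits
    tag = addr >> (offset_bits + index_bits)        # remaining high bits
    return tag, index, offset


# Rebuild the slide's example address: tag 0x50, index 0x01, offset 0x00.
example_addr = (0x50 << 10) | (0x01 << 5) | 0x00
tag, index, offset = split_address(example_addr)
print(hex(example_addr), hex(tag), hex(index), hex(offset))
```

With the defaults, M = 5 and K = 5, so N = 10 and the tag is the remaining 22 bits of a 32-bit address, which matches the (32 - N)-bit tag stated on the slide.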
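The two formulas on the "Cache performance" slide translate directly into code. The sketch below is a minimal illustration; the function names `amat` and `cpu_time`, and the sample values passed to `cpu_time`, are assumptions made for this example and do not come from the slides. Only the AMAT example (1 cycle hit time, 4% miss rate, 100-cycle miss penalty) is taken from the slide.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def cpu_time(instruction_count, cpi_execution, stall_cycles_per_instr, cycle_time):
    """CPU time = IC * (CPI_execution + memory stall cycles per instruction) * cycle time."""
    return instruction_count * (cpi_execution + stall_cycles_per_instr) * cycle_time


# Slide example: 1-cycle hit, 4% miss rate, 100-cycle miss penalty.
print(amat(hit_time=1, miss_rate=0.04, miss_penalty=100))

# Hypothetical workload: 1e9 instructions, base CPI 1.0,
# 0.5 stall cycles per instruction, 1 ns cycle time.
print(cpu_time(1e9, 1.0, 0.5, 1e-9))
```

Because cycles spent on cache hits are counted in `cpi_execution`, only the miss-induced stall cycles appear in the second term.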
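To make the set-associative lookup and the LRU policy from "Which Block to Replace?" concrete, here is a small simulator sketch. It is not a model of any real hardware: the class `SetAssociativeCache` and its methods are hypothetical names, only tags are tracked (no data, no dirty bits), and sizes are assumed to be powers of two. The geometry used in the demo is the Alpha 21264 data cache from the slides (64KB, 2-way, 64-byte blocks).

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of an N-way set-associative cache with LRU replacement."""

    def __init__(self, cache_bytes, block_bytes, ways):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = cache_bytes // (block_bytes * ways)
        # One OrderedDict per set; key order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def tag_bits(self, addr_bits):
        """Tag width = address bits - index bits - offset bits."""
        index_bits = self.num_sets.bit_length() - 1
        offset_bits = self.block_bytes.bit_length() - 1
        return addr_bits - index_bits - offset_bits

    def access(self, addr):
        """Return True on a hit; on a miss, fill the block and return False."""
        block_addr = addr // self.block_bytes
        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)        # now the most recently used block
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)     # evict the least recently used block
        lines[tag] = True
        return False


dcache = SetAssociativeCache(cache_bytes=64 * 1024, block_bytes=64, ways=2)
print(dcache.num_sets, dcache.tag_bits(44))

# Three addresses that map to set 0 in a 2-way cache force an LRU eviction.
stride = dcache.num_sets * dcache.block_bytes
trace = [0, stride, 0, 2 * stride, 0, stride]
print([dcache.access(a) for a in trace])
```

Setting `ways=1` gives a direct-mapped cache, and a single set with `ways` equal to the number of blocks gives a fully associative one, which mirrors the slide's remark that a fully associative cache is a set-associative cache with one set.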
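The "Virtual-Physical Translation" slide says the virtual page number is translated while the 12-bit page offset passes through unchanged. A rough sketch of that step follows, with heavy simplifications that are assumptions of this example rather than anything stated in the slides: the page table and the TLB are plain Python dictionaries, the TLB is unbounded with no replacement policy, and the names `translate` and `PageFault` are hypothetical.

```python
PAGE_OFFSET_BITS = 12                 # 12-bit page offset, as on the slide
PAGE_SIZE = 1 << PAGE_OFFSET_BITS     # 4 KB pages


class PageFault(Exception):
    """Raised when a virtual page has no mapping (the OS would handle this)."""


def translate(vaddr, page_table, tlb):
    """Translate a virtual address to a physical address.

    page_table and tlb both map virtual page number -> physical page number.
    """
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & (PAGE_SIZE - 1)
    if vpn in tlb:                    # TLB hit: no page-table access needed
        ppn = tlb[vpn]
    elif vpn in page_table:           # TLB miss: consult the page table, then cache it
        ppn = page_table[vpn]
        tlb[vpn] = ppn
    else:
        raise PageFault(f"no mapping for virtual page {vpn:#x}")
    return (ppn << PAGE_OFFSET_BITS) | offset   # page offset is unchanged


page_table = {0x12345: 0x00ABC}       # made-up mapping for illustration
tlb = {}
print(hex(translate(0x12345678, page_table, tlb)))
print(0x12345 in tlb)                 # the translation is now cached in the TLB
```

A second access to the same page is served from `tlb` without touching `page_table`, which is the point of the TLB slide: avoid a page-table access on every memory reference.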