Translation Lookaside Buffer Pdf
Recommended publications
Computer Organization and Architecture: Designing for Performance, Ninth Edition
COMPUTER ORGANIZATION AND ARCHITECTURE: DESIGNING FOR PERFORMANCE, NINTH EDITION. William Stallings. Credits: Figure 2.14: reprinted with permission from The Computer Language Company, Inc. Figure 17.10: Buyya, Rajkumar, High-Performance Cluster Computing: Architectures and Systems, Vol I, 1st edition, ©1999. Reprinted and electronically reproduced by permission of Pearson Education, Inc., Upper Saddle River, New Jersey. Figure 17.11: Reprinted with permission from Ethernet Alliance. Copyright © 2013, 2010, 2006 by Pearson Education, Inc., publishing as Prentice Hall. All rights reserved. Manufactured in the United States of America.
The Alpha 21264 Microprocessor: Out-of-Order Execution at 600 MHz
The Alpha 21264 Microprocessor: Out-of-Order Execution at 600 MHz. R. E. Kessler, COMPAQ Computer Corporation, Shrewsbury, MA. REK August 1998. Some Highlights • Continued Alpha performance leadership – 600 MHz operation in 0.35 µm CMOS6, 6 metal layers, 2.2 V – 15 million transistors, 3.1 cm², 587-pin PGA – SPECint95 of 30+ and SPECfp95 of 50+ – Out-of-order and speculative execution – 4-way integer issue – 2-way floating-point issue – Sophisticated tournament branch prediction – High-bandwidth memory system (1+ GB/sec). [Figure: Alpha 21264 block diagram, pipeline stages 0-6 (Fetch, Map, Queue, Reg, Exec, Dcache); 4 instructions/cycle; 80 in-flight instructions plus 32 loads and 32 stores]
Intermediate X86 Part 2
Intermediate x86 Part 2. Xeno Kovah, 2010. xkovah at gmail. All materials are licensed under a Creative Commons “Share Alike” license. • http://creativecommons.org/licenses/by-sa/3.0/ Paging • Previously we discussed how segmentation translates a logical address (segment selector + offset) into a 32-bit linear address. • When paging is disabled, linear addresses map 1:1 to physical addresses. • When paging is enabled, a linear address must be translated to determine the physical address it corresponds to. • It’s called “paging” because physical memory is divided into fixed-size chunks called pages. • The analogy is to books in a library. When you need to find some information, first you go to the library, where you look up the relevant book; then you select the book and look up a specific page; from there you maybe look for a specific sentence… or “word”? ;) • The internets ruined the analogy! Notes and Terminology • All of the figures or references to the manual in this section refer to the Nov 2008 manual (available in the class materials). This is because I think the manuals from 2008 and earlier are organized and presented much more clearly than the manuals from 2009 onward. • When I refer to a “frame”, it means “a page-sized chunk of physical memory”. • When paging is enabled, a “linear address” is the same thing as a “virtual memory address” or “virtual address”. The terrifying truth revealed! AHHHHHHHH!!! (Nah, it’s not so bad. :)) And now you know… the rest of the story. Good day. Virtual Memory • When paging is enabled, the 32-bit linear address space can be mapped to a physical address space less than 32 bits.
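The slides stop before showing how a linear address is actually split during translation. As a minimal sketch, assuming classic 32-bit x86 paging with 4 KiB pages and no PAE (10 bits of page-directory index, 10 bits of page-table index, 12 bits of offset), the walk can be modeled in Python. The nested dicts standing in for the page directory and page tables, and the function names, are hypothetical; real entries also carry present, permission, and other flag bits that this toy ignores.

```python
from __future__ import annotations

PAGE_SHIFT = 12   # 4 KiB pages (assumed: classic 32-bit paging, no PAE)
INDEX_BITS = 10   # 1024 entries per page directory and per page table

def split_linear(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("linear address must fit in 32 bits")
    offset = addr & ((1 << PAGE_SHIFT) - 1)                  # bits 11..0
    table = (addr >> PAGE_SHIFT) & ((1 << INDEX_BITS) - 1)   # bits 21..12
    directory = addr >> (PAGE_SHIFT + INDEX_BITS)            # bits 31..22
    return directory, table, offset

def translate(addr: int, page_directory: dict[int, dict[int, int]]) -> int:
    """Walk a toy two-level table: page_directory[dir][table] -> frame number.

    A missing entry at either level raises KeyError, standing in for a page fault.
    """
    directory, table, offset = split_linear(addr)
    frame = page_directory[directory][table]
    return (frame << PAGE_SHIFT) | offset
```

For example, `split_linear(0x00403ABC)` yields directory index 1, table index 3, and offset 0xABC; if that slot maps frame 0x1234, the physical address is 0x1234ABC.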
Computer Organization EECC 550 • Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Notation (RTN)
Computer Organization EECC 550 Week 1 • Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Notation (RTN). [Chapters 1, 2] • Instruction Set Architecture (ISA) Characteristics and Classifications: CISC vs. RISC. [Chapter 2] Week 2 • MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2] • Central Processor Unit (CPU) & Computer System Performance Measures. [Chapter 4] Week 3 • CPU Organization: Datapath & Control Unit Design. [Chapter 5] Week 4 – MIPS Single Cycle Datapath & Control Unit Design. – MIPS Multicycle Datapath and Finite State Machine Control Unit Design. Week 5 • Microprogrammed Control Unit Design. [Chapter 5] – Microprogramming Project Week 6 • Midterm Review and Midterm Exam Week 7 • CPU Pipelining. [Chapter 6] • The Memory Hierarchy: Cache Design & Performance. [Chapter 7] Week 8 • The Memory Hierarchy: Main & Virtual Memory. [Chapter 7] Week 9 • Input/Output Organization & System Performance Evaluation. [Chapter 8] Week 10 • Computer Arithmetic & ALU Design. [Chapter 3] If time permits. Week 11 • Final Exam. (EECC550, Shaaban, Lec. #1, Winter 2005, 11-29-2005) Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals • Computing Element Choices: – Computing Element Programmability – Spatial vs. Temporal Computing – Main Processor Types/Applications • General Purpose Processor Generations • The Von Neumann Computer Model • CPU Organization (Design) • Recent Trends in Computer Design/Performance • Hierarchy
Chapter 3 Protected-Mode Memory Management
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT This chapter describes the Intel 64 and IA-32 architecture’s protected-mode memory management facilities, including the physical memory requirements, segmentation mechanism, and paging mechanism. See also: Chapter 5, “Protection” (for a description of the processor’s protection mechanism) and Chapter 20, “8086 Emulation” (for a description of memory addressing protection in real-address and virtual-8086 modes). 3.1 MEMORY MANAGEMENT OVERVIEW The memory management facilities of the IA-32 architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism for isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks. When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional. These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-task) systems, multitasking systems, or multiple-processor systems that use shared memory. As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor’s addressable memory space (called the linear address space) into smaller protected address spaces called segments. Segments can be used to hold the code, data, and stack for a program or to hold system data structures (such as a TSS or LDT).
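Segmentation is the first stage of this pipeline: a logical address (segment, offset) becomes a linear address by adding the segment’s base after a limit check. The Python below is a simplified sketch under stated assumptions: the `Segment` class and `GeneralProtectionFault` exception are hypothetical stand-ins for a segment descriptor and the processor’s protection fault, and granularity, expand-down segments, type, and privilege checks are all omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """Hypothetical, simplified descriptor: only base and limit are modeled."""
    base: int    # linear address where the segment starts
    limit: int   # highest valid offset (byte granularity, expand-up assumed)

class GeneralProtectionFault(Exception):
    """Stand-in for the processor's fault on a segment-limit violation."""

def to_linear(seg: Segment, offset: int) -> int:
    """Translate a logical address (segment, offset) into a 32-bit linear address."""
    if not 0 <= offset <= seg.limit:
        raise GeneralProtectionFault(f"offset {offset:#x} exceeds limit {seg.limit:#x}")
    return (seg.base + offset) & 0xFFFFFFFF  # linear addresses wrap at 32 bits
```

With a flat segment (base 0, limit 0xFFFFFFFF), as most 32-bit operating systems configure, the linear address simply equals the offset; segmentation still runs, since there is no mode bit to disable it, but it no longer changes the address.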
IA-32 Intel Architecture Software Developer's Manual
IA-32 Intel Architecture Software Developer’s Manual Volume 3: System Programming Guide NOTE: The IA-32 Intel Architecture Developer’s Manual consists of three books: Basic Architecture, Order Number 245470; Instruction Set Reference Manual, Order Number 245471; and the System Programming Guide, Order Number 245472. Please refer to all three volumes when evaluating your design needs. 2001
Computer Architectures: An Overview
Computer Architectures: An Overview. PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC. Contents: Articles: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing. References: Article Sources and Contributors; Image Sources, Licenses and Contributors. Article Licenses: License.
Microarchitecture
In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design.
Relation to instruction set architecture
The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. [Figure: The Intel Core microarchitecture] The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALUs) and even larger elements.
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers
Improving the Precise Interrupt Mechanism of Software-Managed TLB Miss Handlers. Aamer Jaleel and Bruce Jacob, Electrical & Computer Engineering, University of Maryland at College Park, {ajaleel,blj}@eng.umd.edu. Abstract. The effects of the general-purpose precise interrupt mechanisms in use for the past few decades have received very little attention. When modern out-of-order processors handle interrupts precisely, they typically begin by flushing the pipeline to make the CPU available to execute handler instructions. In doing so, the CPU ends up flushing many instructions that have been brought into the reorder buffer. In particular, many of these instructions have reached a very deep stage in the pipeline, representing significant work that is wasted. In addition, an overhead of several cycles can be expected in re-fetching and re-executing these instructions. This paper concentrates on improving the performance of precisely handling software-managed translation lookaside buffer (TLB) interrupts, one of the most frequently occurring interrupts. This paper presents a novel method of in-lining the interrupt handler within the reorder buffer. Since the first-level interrupt handlers of TLBs are usually small, they could potentially fit in the reorder buffer along with the user-level code already there. In doing so, the instructions that would otherwise be flushed from the pipe need not be re-fetched and re-executed. Additionally, it allows instructions independent of the exceptional instruction to continue to execute in parallel with the handler code. We simulate two different schemes of in-lining the interrupt on a processor with a 4-way out-of-order core similar to the Alpha 21264.
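To make the baseline mechanism concrete: with a software-managed TLB, the hardware only looks up translations; on a miss it traps, a small first-level handler walks the page table and writes the entry into the TLB, and the faulting access is re-executed. The Python sketch below models only that control flow, not the paper’s reorder-buffer in-lining. The class and function names, the 4 KiB page size, the flat dict used as a page table, and the FIFO replacement policy are all assumptions for illustration; real designs vary (MIPS, for example, also offers random replacement).

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # 4 KiB pages (assumed)

class TLBMiss(Exception):
    """Models the trap raised when the TLB holds no entry for a virtual page."""

class SoftwareManagedTLB:
    """Hypothetical TLB model: hardware lookup, software refill, FIFO eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # VPN -> PFN, oldest insertion first
        self.misses = 0

    def lookup(self, vaddr: int) -> int:
        """Hardware side: translate on a hit, trap on a miss."""
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.entries:
            raise TLBMiss(vpn)
        return (self.entries[vpn] << PAGE_SHIFT) | offset

    def refill(self, vpn: int, pfn: int) -> None:
        """Software side: the handler writes an entry, evicting the oldest if full."""
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[vpn] = pfn

def access(tlb: SoftwareManagedTLB, page_table: dict, vaddr: int) -> int:
    """Perform one memory access, running the miss handler and retrying on a trap."""
    while True:
        try:
            return tlb.lookup(vaddr)
        except TLBMiss as miss:
            tlb.misses += 1
            vpn = miss.args[0]
            # First-level handler: walk the (flat, toy) page table and refill.
            # A KeyError here would correspond to a real page fault.
            tlb.refill(vpn, page_table[vpn])
```

Each miss in this model corresponds to one trap into the handler; in the out-of-order processors the abstract describes, every such trap also flushes the pipeline, which is the cost the in-lining scheme targets.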
Intel® Architecture Instruction Set Extensions and Future Features
Intel® Architecture Instruction Set Extensions and Future Features Programming Reference, May 2021, 319433-044. Copyright © 2021, Intel Corporation.
Data Caches for Superscalar Processors*
Data Caches for Superscalar Processors* Toni Juan and Juan J. Navarro (Dept. Arquitectura de Computadors, Universitat Politecnica de Catalunya, Barcelona, Spain; juanjo@ac.upc.es) and Olivier Temam (PRiSM, Versailles University, France). Abstract: As the number of instructions executed in parallel increases, superscalar processors will require higher bandwidth from data caches. Because of the high cost of true multi-ported caches, alternative cache designs must be evaluated. The purpose of this study is to examine the data cache bandwidth requirements of high-degree superscalar processors, and investigate alternative solutions. The designs studied range from classic solutions like multi-banked caches to more complex solutions recently proposed in the literature. The performance tradeoffs of these different cache designs are examined in detail. Then, using a chip area cost model, all solutions are compared with respect to both cost and performance.
… are more difficult to design because load/store requests sent in parallel share no obvious locality properties. The difficulty is to propose a design that can cope with an increasing degree of instruction parallelism. The solutions presently implemented in processors can be classified as: • True multi-porting. With respect to performance, true multi-porting is clearly an ideal solution, but its chip area cost is high. Cost can be partly reduced by accepting a certain degradation of cache access time, which would reduce the performance. • Multiple Cache Copies. For n accesses, the cache must be replicated n times with no benefit to storage space. Moreover, store requests are sent simultaneously to all cache copies for coherence, and thus no other cache request can
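The multi-banked caches mentioned in the abstract trade ports for banks: each bank is single-ported, so parallel requests proceed together only when they fall in different banks. As a minimal sketch, assuming 32-byte lines, 4 line-interleaved banks, and no merging of requests to the same line (all hypothetical parameters, not the paper’s configurations), the Python below counts the cycles a group of same-cycle requests needs.

```python
from __future__ import annotations
from collections import Counter

LINE_BYTES = 32   # assumed cache line size
NUM_BANKS = 4     # assumed number of single-ported banks

def bank_of(addr: int) -> int:
    """Line-interleaved mapping: consecutive cache lines go to consecutive banks."""
    return (addr // LINE_BYTES) % NUM_BANKS

def cycles_to_service(addrs: list[int]) -> int:
    """Cycles needed for requests issued together when each bank serves one
    request per cycle: conflicting requests serialize, so the busiest bank wins."""
    if not addrs:
        return 0
    return max(Counter(bank_of(a) for a in addrs).values())
```

Four requests to consecutive lines finish in one cycle, while three requests spaced 128 bytes apart all land in bank 0 and take three cycles; an ideal true multi-ported cache would serve either group in one cycle, which is the performance-versus-area tradeoff the paper evaluates.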
Zarka Cvetanovic and R.E. Kessler Compaq Computer Corporation
PERFORMANCE ANALYSIS OF THE ALPHA 21264-BASED COMPAQ ES40 SYSTEM. Zarka Cvetanovic and R.E. Kessler, Compaq Computer Corporation. Abstract: This paper evaluates performance characteristics of the Compaq ES40 shared memory multiprocessor. The ES40 system contains up to four Alpha 21264 CPUs together with a high-performance memory system. We qualitatively describe architectural features included in the 21264 microprocessor and the surrounding system chipset. We further quantitatively show the performance effects of these features using benchmark results and profiling data collected from industry-standard commercial and technical workloads. The profile data includes basic performance information (such as instructions per cycle, branch mispredicts, and cache misses) as well as other data that specifically characterizes the 21264. Wherever possible, we compare and contrast the ES40 to the AlphaServer 4100, a previous-generation Alpha system containing four Alpha 21164 microprocessors, to highlight the architectural advances in the ES40. We find that the Compaq ES40 often provides 2 to 3 times the performance of the AlphaServer 4100 at similar clock frequencies. We also find that the ES40 memory system has about five times the memory bandwidth of the 4100. These performance improvements come from numerous microprocessor and platform enhancements, including out-of-order execution, branch prediction, functional units, and the memory system. [Figure 1: SPEC95 Comparison, 1-CPU SPECint95 and SPECfp95 for Compaq ES40/21264 667MHz, HP PA-8500 440MHz, IBM Power3 375MHz, SUN USPARC-II 450MHz, and Intel Pentium-III 800MHz] [Figure: SPECfp_rate95 for Compaq ES40/21264 667MHz, HP PA-8500 440MHz, SUN USparc-II 400MHz, and Intel Pentium-III 800MHz] 1. INTRODUCTION The Compaq ES40 is a shared memory multiprocessor containing up to four third-generation Alpha 21264 microprocessors [1][2].
AlphaPC 264DP Technical Reference Manual (Preliminary)
AlphaPC 264DP Technical Reference Manual. Order Number: EC–RB0DA–TE. Revision/Update Information: This is a new document. Preliminary. Compaq Computer Corporation, February 1999. ©1999 Digital Equipment Corporation. All rights reserved. Printed in U.S.A. 12 February 1999 – Subject to Change. Contents: 1 AlphaPC 264DP Introduction; 1.1 System Components and Features; 1.1.1 Memory Subsystem; 1.1.2 21272 Core Logic Chipset; 1.1.3 CPU Daughtercard; 1.1.3.1 Level 2 Cache Subsystem Overview; 1.1.3.2 21264 DC-to-DC Converter