CS 61C: Great Ideas in Computer Architecture -- Virtual Memory (Fall 2012, Lecture #36)

Total Pages: 16

File Type: PDF, Size: 1020 KB

Instructors: Krste Asanovic, Randy H. Katz
http://inst.eecs.Berkeley.edu/~cs61c/fa12
11/20/12

Review
• Implementing precise interrupts in in-order pipelines:
  – Save exceptions in pipeline until commit point
  – Check for traps and interrupts before commit
  – No architectural state overwritten before commit
• Support multiprogramming with translation and protection
  – Base and bound: simple scheme, but suffers from memory fragmentation
  – Paged systems remove external fragmentation but add indirection through the page table

You Are Here!
• Parallel Requests: assigned to computer, e.g., Search "Katz" (Warehouse Scale Computer)
• Parallel Threads: assigned to core, e.g., Lookup, Ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1, A2+B2, A3+B3)
• Hardware descriptions: all gates @ one time
• Programming Languages
Software/hardware stack: Warehouse Scale Computer, Computer, Core, Memory (Cache), Input/Output, Instruction Unit(s), Functional Unit(s), Main Memory, Logic Gates (today's lecture marked on the diagram).

Private Address Space per User
• The OS and each user (User 1, User 2, User 3) map virtual addresses (e.g., VA1) through per-user page tables into physical memory, which also holds free frames
• Each user has a page table
• Page table contains an entry for each user page

A Problem in the Early Sixties
• There were many applications whose data could not fit in the main memory, e.g., payroll
  – Paged memory systems reduced fragmentation but still required the whole program to be resident in the main memory
• Example: Ferranti Mercury (1956), with 40k bits of main memory (the "Central Store") and a 640k-bit drum

Manual Overlays
• Assume an instruction can address all the storage on the drum
• Method 1: programmer keeps track of addresses in the main memory and initiates an I/O transfer when required
  – Difficult, error-prone!
• Method 2: automatic initiation of I/O transfers by software address translation
  – Brooker's interpretive coding, 1960
  – Inefficient!
• Not just an ancient black art: e.g., the IBM Cell microprocessor used in the PlayStation 3 has an explicitly managed local store!
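To make the review's contrast concrete, here is a minimal C sketch of the two translation schemes: base-and-bound relocation versus a single-level page table. The struct layouts, the 4 KB page size, and the toy 1024-entry table are illustrative assumptions, not anything specified in the lecture.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Base-and-bound: one contiguous region per process.
   Simple, but variable-sized regions fragment physical memory. */
typedef struct {
    uint32_t base;   /* physical start of the process's region */
    uint32_t bound;  /* size of the region in bytes */
} BaseBound;

bool bb_translate(BaseBound bb, uint32_t va, uint32_t *pa) {
    if (va >= bb.bound) return false;   /* protection violation */
    *pa = bb.base + va;                 /* relocation */
    return true;
}

/* Paging: fixed-size pages remove external fragmentation,
   but every access is indirected through a per-user page table. */
#define PAGE_BITS 12                    /* 4 KB pages (assumed) */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES 1024                  /* toy address-space size */

typedef struct {
    bool     valid;
    uint32_t ppn;                       /* physical page number */
} PTE;

bool page_translate(const PTE pt[NUM_PAGES], uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> PAGE_BITS;
    uint32_t offset = va & (PAGE_SIZE - 1);
    if (vpn >= NUM_PAGES || !pt[vpn].valid) return false;  /* page fault or violation */
    *pa = (pt[vpn].ppn << PAGE_BITS) | offset;
    return true;
}

int main(void) {
    uint32_t pa;

    BaseBound bb = { .base = 0x8000, .bound = 0x1000 };
    if (bb_translate(bb, 0x20, &pa))
        printf("base+bound: VA 0x20 -> PA 0x%x\n", pa);      /* 0x8020 */

    static PTE pt[NUM_PAGES];
    pt[1] = (PTE){ .valid = true, .ppn = 42 };                /* VPN 1 -> PPN 42 */
    if (page_translate(pt, (1u << PAGE_BITS) | 0x34, &pa))
        printf("paged: VA 0x%x -> PA 0x%x\n", (1u << PAGE_BITS) | 0x34, pa);
    return 0;
}
```

The point of the comparison is the indirection: base-and-bound does one compare and one add, while paging trades contiguity (and its external fragmentation) for a table lookup on every reference.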
Demand Paging in Atlas (1962)
• "A page from secondary storage is brought into the primary storage whenever it is (implicitly) demanded by the processor." (Tom Kilburn)
• Primary memory is used as a cache for secondary memory
• Primary (central) memory: 32 pages of 512 words each; secondary memory (drum): 32 x 6 pages
• The user sees 32 x 6 x 512 words of storage

Hardware Organization of Atlas
• Effective address goes through an initial address decode; 48-bit words, 512-word pages
• 16 ROM pages (system code, not swapped), 0.4 to 1 µsec; 2 subsidiary pages (system data, not swapped), 1.4 µsec
• Main memory: 32 pages, 1.4 µsec; drum (4 units): 192 pages; 8 tape decks, 88 sec/word
• One Page Address Register (PAR) per page frame, holding <effective PN, status>
• The effective page address is compared against all 32 PARs:
  – match ⇒ normal access
  – no match ⇒ page fault; save the state of the partially executed instruction

Atlas Demand Paging Scheme
• On a page fault:
  – Input transfer into a free page is initiated
  – The Page Address Register (PAR) is updated
  – If no free page is left, a page is selected to be replaced (based on usage)
  – The replaced page is written on the drum (to minimize the drum latency effect, the first empty page on the drum was selected)
  – The page table is updated to point to the new location of the page on the drum

Recap: Typical Memory Hierarchy
• Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology
• On-chip components: control, datapath, register file, instruction cache and data cache (SRAM); then second-level cache, main memory (DRAM), and secondary memory (disk)
• Speed (cycles): ½'s, 1's, 10's, 100's, 10,000's
• Size (bytes): 100's, 10K's, M's, G's, T's
• Cost per byte: highest at the top, lowest at the bottom

Modern Virtual Memory Systems: the illusion of a large, private, uniform store
• Protection & privacy: several users, each with their private address space and one or more shared address spaces; page table ≡ name space
• Demand paging: pages move between primary memory and the swapping store; provides the ability to run programs larger than the primary memory and hides differences in machine configurations
• The price is address translation on each memory reference: VA → (TLB) → PA

Administrivia
• Regrade request deadline Monday Nov 26
  – For everything up to Project 4

CS61C in the News: "World's oldest digital computer successfully reboots" (Iain Thomson, The Register, 11/20/2012)
"After three years of restoration by the National Museum of Computing (TNMOC) and staff at Bletchley Park, the world's oldest functioning digital computer has been successfully rebooted at a ceremony attended by two of its original developers. The 2.5 ton Harwell Dekatron, later renamed the Wolverhampton Instrument for Teaching Computation from Harwell (WITCH), was first constructed in 1949 and from 1951 ran at the UK's Harwell Atomic Energy Research Establishment, where it was used to process mathematical calculations for Britain's nuclear program. The system uses 828 flashing Dekatron valves, each capable of holding a single digit, for volatile memory, plus 480 GPO 3000 type relays to shift calculations and six paper tape readers. It was very slow, taking a couple of seconds for each addition or subtraction, five seconds for multiplication and up to 15 for division."

Hierarchical Page Table
• 32-bit virtual address split as p1 (10-bit L1 index, bits 31-22), p2 (10-bit L2 index, bits 21-12), and offset (bits 11-0)
• The root of the current page table is held in a processor register; p1 indexes the Level 1 page table, whose entry points to one of the Level 2 page tables, and p2 indexes that table to reach the data page
• A PTE may refer to a page in primary memory, a page in secondary memory, or a nonexistent page
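The hierarchical page table above lends itself to a short sketch. The following C code splits a 32-bit virtual address into the 10-bit p1, 10-bit p2, and 12-bit offset fields shown on the slide and walks a two-level table; the entry formats, the pointer-based L1 entries, and the walk() helper are illustrative assumptions rather than any particular hardware layout.

```c
#include <stdint.h>
#include <stdio.h>

#define L1_BITS     10
#define L2_BITS     10
#define OFFSET_BITS 12

/* Entry formats are illustrative: a valid flag plus either a pointer to the
   next-level table (L1 entry) or a physical page number (L2 entry). */
typedef struct { int valid; uint32_t ppn; } L2Entry;
typedef struct { L2Entry entries[1 << L2_BITS]; } L2Table;
typedef struct { int valid; L2Table *l2; } L1Entry;

/* Walk the two-level table for a 32-bit VA split as p1 | p2 | offset.
   Returns 0 on success; -1 stands in for a page fault (missing table or page). */
int walk(const L1Entry l1[1 << L1_BITS], uint32_t va, uint32_t *pa) {
    uint32_t p1     = (va >> (L2_BITS + OFFSET_BITS)) & ((1u << L1_BITS) - 1);
    uint32_t p2     = (va >> OFFSET_BITS)             & ((1u << L2_BITS) - 1);
    uint32_t offset =  va                             & ((1u << OFFSET_BITS) - 1);

    if (!l1[p1].valid)              return -1;   /* no L2 table mapped for this region */
    L2Entry e = l1[p1].l2->entries[p2];
    if (!e.valid)                   return -1;   /* page not resident */
    *pa = (e.ppn << OFFSET_BITS) | offset;
    return 0;
}

int main(void) {
    static L2Table l2;
    static L1Entry l1[1 << L1_BITS];
    l2.entries[5] = (L2Entry){ 1, 0x123 };       /* p2 = 5 -> PPN 0x123 */
    l1[2]         = (L1Entry){ 1, &l2 };         /* p1 = 2 -> that L2 table */

    uint32_t va = (2u << (L2_BITS + OFFSET_BITS)) | (5u << OFFSET_BITS) | 0x0AB;
    uint32_t pa;
    if (walk(l1, va, &pa) == 0)
        printf("VA 0x%08x -> PA 0x%08x\n", va, pa);   /* PA = 0x001230AB */
    return 0;
}
```

Note how the walk turns one reference into two extra memory accesses (L1 entry, then L2 entry), which is exactly the cost the TLB discussed below is meant to hide.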
Two-Level Page Tables in Physical Memory
• The page tables themselves live in physical memory alongside the data pages: e.g., the Level 1 page tables for User 1 and User 2, a Level 2 page table for User 2, and the mapped pages User1/VA1 and User2/VA1

Address Translation & Protection
• Virtual address = Virtual Page Number (VPN) + offset; together with the Kernel/User mode and the Read/Write intent, it passes a protection check and address translation (either may raise an exception) and yields Physical Page Number (PPN) + offset
• Every instruction and data access needs address translation and protection checks
• A good VM design needs to be fast (~ one cycle) and space efficient

Translation Lookaside Buffers (TLB) (really an Address Translation Cache!)
• Address translation is very expensive! In a two-level page table, each reference becomes several memory accesses
• Solution: cache translations in a TLB
  – TLB hit ⇒ single-cycle translation
  – TLB miss ⇒ page-table walk to refill
• Each entry holds V, R, W, D bits, a tag (the VPN), and the PPN; on a hit, the physical address is PPN + offset

TLB Designs
• Typically 32-128 entries, usually fully associative
  – Each entry maps a large page, hence less spatial locality across pages ⇒ more likely that two entries conflict
  – Sometimes larger TLBs (256-512 entries) are 4-8 way set-associative
  – Larger systems sometimes have multi-level (L1 and L2) TLBs
• Random or FIFO replacement policy
• No process information in TLB?
• TLB Reach: size of the largest virtual address space that can be simultaneously mapped by the TLB
  – Example: 64 TLB entries, 4 KB pages, one page per entry. TLB Reach = ________?

Handling a TLB Miss
• Software (MIPS, Alpha): the TLB miss causes an exception and the operating system walks the page tables and reloads the TLB; a privileged "untranslated" addressing mode is used for the walk
• Hardware (SPARC v8, x86, PowerPC, RISC-V): a memory management unit (MMU) walks the page tables and reloads the TLB; if a missing (data or PT) page is encountered during the TLB reload, the MMU gives up and signals a page-fault exception for the original instruction

Flashcard Quiz: Which statement is false?

Hierarchical Page Table Walk: SPARC v8
• Virtual address fields: Index 1 (bits 31-24), Index 2 (bits 23-18), Index 3 (bits 17-12), Offset (bits 11-0)
• The Context Table Register points to the Context Table, where the Context Register selects the root pointer to the L1 table; Page Table Pointers (PTPs) then lead through the L2 and L3 tables to the PTE
• Physical address = PPN (bits 31-12) + Offset (bits 11-0)
• The MMU does this table walk in hardware on a TLB miss

Page-Based Virtual-Memory Machine (Hardware Page-Table Walk)
• Pipeline: PC → Inst. TLB → Inst. Cache → Decode → Execute → Data TLB → Data Cache → Writeback; page faults and protection violations can arise on both instruction and data accesses
• On a TLB miss, a hardware page-table walker uses the Page-Table Base Register and the memory controller to fetch the translation from main memory (DRAM)
• Assumes page tables are held in untranslated physical memory
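The TLB-reach example on the slide can be worked out, and the lookup it describes modeled, in a few lines of C. The sketch below uses the slide's parameters (64 entries, 4 KB pages) and its V/R/W/D/tag/PPN entry fields; the exact struct layout, the tlb_lookup() helper, and the test values are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 64
#define PAGE_BITS   12                  /* 4 KB pages, as in the slide's example */

/* One TLB entry: V/R/W/D bits, the VPN used as the tag, and the PPN. */
typedef struct {
    uint8_t  v, r, w, d;
    uint32_t tag;                       /* virtual page number */
    uint32_t ppn;                       /* physical page number */
} TLBEntry;

/* Fully associative lookup: compare the VPN against every valid entry's tag.
   Returns 1 on a hit (and fills *pa); 0 means a miss, i.e. a page-table walk. */
int tlb_lookup(const TLBEntry tlb[TLB_ENTRIES], uint32_t va, uint32_t *pa) {
    uint32_t vpn    = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].v && tlb[i].tag == vpn) {
            *pa = (tlb[i].ppn << PAGE_BITS) | offset;
            return 1;
        }
    }
    return 0;
}

int main(void) {
    /* TLB reach = number of entries x bytes mapped per entry. */
    uint64_t reach_bytes = (uint64_t)TLB_ENTRIES << PAGE_BITS;
    printf("TLB reach = %llu KB\n", (unsigned long long)(reach_bytes >> 10));  /* 256 KB */

    static TLBEntry tlb[TLB_ENTRIES];
    tlb[0] = (TLBEntry){ .v = 1, .r = 1, .tag = 7, .ppn = 0x44 };
    uint32_t pa;
    if (tlb_lookup(tlb, (7u << PAGE_BITS) | 0x9C, &pa))
        printf("hit: PA = 0x%x\n", pa);                                        /* 0x4409C */
    return 0;
}
```

For the slide's numbers this gives 64 x 4 KB = 256 KB of reach, which is why larger pages or more entries are the usual ways to cover bigger working sets.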
Address Translation: Putting It All Together
• Virtual address → TLB lookup (hardware)
  – hit ⇒ protection check: permitted ⇒ physical address (to cache); denied ⇒ protection fault (SEGFAULT)
  – miss ⇒ page table walk (hardware or software): page ∈ memory ⇒ update TLB and redo the translation; page ∉ memory ⇒ page fault (OS loads page)
  – (a toy C model of this flow is sketched below)

Handling VM-Related Traps
• Handling a TLB miss needs a hardware or software mechanism to refill the TLB
• Handling a page fault (e.g., the page is on disk) needs a restartable trap so the software handler can resume after retrieving the page
  – Precise exceptions are easy to restart
  – Can be imprecise but restartable, but this complicates the OS software
• Handling a protection violation may abort the process

Concurrent Access to TLB & Cache (Virtual Index/Physical Tag)

Address Translation in CPU Pipeline
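Pulling together the flow from the "Address Translation: Putting It All Together" and "Handling VM-Related Traps" slides, here is a toy C model of one translation attempt: TLB lookup, then on a miss a page-table walk and TLB refill, then a protection check. The data structures, sizes, and the FIFO refill policy are illustrative assumptions; real hardware and OS code differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of the slide's flow. All formats and sizes are assumptions. */
#define PAGE_BITS 12
#define NUM_PAGES 16
#define TLB_SIZE  4

typedef struct { int valid, writable; uint32_t ppn; } PTE;
typedef struct { int valid, writable; uint32_t vpn, ppn; } TLBEntry;

typedef enum { XLATE_OK, XLATE_PAGE_FAULT, XLATE_PROTECTION_FAULT } XlateResult;

static PTE      page_table[NUM_PAGES];
static TLBEntry tlb[TLB_SIZE];
static int      tlb_next;                         /* FIFO replacement, per the TLB slide */

XlateResult translate(uint32_t va, int is_write, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_BITS, off = va & ((1u << PAGE_BITS) - 1);
    TLBEntry *e = NULL;

    for (int i = 0; i < TLB_SIZE; i++)            /* 1. TLB lookup (hardware) */
        if (tlb[i].valid && tlb[i].vpn == vpn) { e = &tlb[i]; break; }

    if (!e) {                                     /* 2. miss: page-table walk */
        if (vpn >= NUM_PAGES || !page_table[vpn].valid)
            return XLATE_PAGE_FAULT;              /*    page not in memory: OS loads it */
        e = &tlb[tlb_next]; tlb_next = (tlb_next + 1) % TLB_SIZE;
        *e = (TLBEntry){1, page_table[vpn].writable, vpn, page_table[vpn].ppn};
    }

    if (is_write && !e->writable)                 /* 3. protection check */
        return XLATE_PROTECTION_FAULT;            /*    denied: SEGFAULT */

    *pa = (e->ppn << PAGE_BITS) | off;            /* 4. physical address to the cache */
    return XLATE_OK;
}

int main(void) {
    page_table[3] = (PTE){1, 0, 7};               /* VPN 3 -> PPN 7, read-only */
    uint32_t pa;
    printf("read  VPN 3: %d\n", translate((3u << PAGE_BITS) | 0x10, 0, &pa)); /* OK */
    printf("write VPN 3: %d\n", translate((3u << PAGE_BITS) | 0x10, 1, &pa)); /* prot fault */
    printf("read  VPN 5: %d\n", translate(5u << PAGE_BITS, 0, &pa));          /* page fault */
    return 0;
}
```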