COMP 212 Computer Organization & Architecture - Fall 2008
Lecture 3: Cache & Memory System

Re-Cap of Lecture #2

• The textbook is required for the class
  – You will need it for homework, projects, reviews, etc.
  – To get it at a good price:
    » Check with senior students for a used book
    » Check with the university bookstore
    » Try this website: addall.com
    » Anyone got the book? Care to share your experience?

Comp 212 Computer Org & Arch, Z. Li, 2008

Components & Connections

• CPU: processing
• Mem: store data
• I/O & Network: exchange data with an outside world
• Connection: the bus, a broadcasting medium

Instruction

• The instruction word has 2 parts
  – Opcode: e.g., 4 bits gives a total of 2^4 = 16 different instructions
  – Operand: the address or immediate number the instruction can operate on
• In a von Neumann computer, instructions and data share the same memory space
  – Addressable space: 2^W for address width W
    » e.g., an 8-bit address gives 2^8 = 256 addressable locations
    » a 16-bit address gives 2^16 = 65536 addressable locations (room numbers)

Register & Memory Operations During Instruction Cycle

• The Instruction Cycle has 3 phases
  – Instruction fetch:
    » Pull the instruction from memory into IR, according to PC
    » The CPU can't operate on memory data directly!
  – Instruction execution:
    » Operate on the operand: load or save data to memory, move data among registers, or perform ALU operations
  – Interrupt handling: to achieve parallel operation with slower I/O devices
    » Sequential
    » Nested
• Pay attention to the following registers' change over cycles:
  – PC, IR
  – AC
  – Mem at [940], [941]


Homework

• Compute the summation of an array in memory:
  – The key is to use a counter to control how many times we add
  – Use the JMP command to control the flow of the program
  – Give the program
  – Walk thru the cycles, and give the register status at selected states
  – See the website for more detail

Lecture #3: Cache & Memory System

• Summary:
  – Memory is hierarchical:
    » CPU >> Registers >> Cache >> Memory >> Hard Disk >> Tape/Optical Disk
    » Access speed: decreasing dramatically as we go down the hierarchy
    » Cost: decreasing also
  – How to design the cache/memory system and access algorithm, such that the average access time is the best?
    » Cache design
    » Cache performance

Memory Hierarchy - Diagram

The computer memory hierarchy


Memory types

• Registers
  – Directly operable by the CPU
• L1/L2 Cache
  – Directly accessible by the CPU
• Main memory
  – Needs an address, and then a load from memory
• Disk cache
  – Memory on the disk
• Disk
• Optical, e.g. DVD, CD, Blu-ray
• Tape

Physical Types

• Semiconductor
  – RAM
• Magnetic
  – Disk & Tape
• Optical
  – CD & DVD
• Others
  – Bubble
  – Hologram

Physical Characteristics

• Volatility
  – Volatile: when power is off, the info is gone, e.g. RAM
  – Non-volatile: SSD, Hard Disk
• Erasability
  – ROM: read only memory
  – CD: write once, read multiple times
  – RAM/Disk: read/write many times
• Power consumption

The Design Goals and Constraints

• How much?
  – Capacity hierarchy: register < cache < memory < disk < tape/optical storage
• How fast?
  – Register/Cache: CPU speed of access, in the GHz range
  – Memory: limited by bus speed and data bus width, in the MHz range
• How expensive?
  – Register > cache > memory > solid state disk > disk > tape/optical


Key Characteristics of Computer Memory System

Cache system design and performance

Location – where it resides

• CPU:
  – Registers, L1 and L2 cache
• Internal
  – Main Memory
  – SSD (Solid State Disk), e.g. on the EEE PC
• External
  – Secondary storage: hard disk, optical disk, tapes

Capacity

• Word size
  – For external memory, typically bytes (8 bits), e.g., a 120GB hard disk, a 20GB SSD
  – For internal memory, a word can be 8, 16, or 32 bits
• Number of words
  – Determined by the address size for internal memory
    » E.g. a 32-bit mem address gives us 2^32 words, or a 4G-word addressable space
    » Installed memory can be less than that, e.g. 1G mem


Unit of Transfer

• Internal
  – Usually governed by the data bus width and bus clock rate
• External
  – Usually a block, which is much larger than a word
• Addressable unit
  – The smallest location which can be uniquely addressed: 2^A units, where A is the address width
  – Word internally
  – Cluster on M$ disks

Access Methods (1)

• Sequential
  – Start at the beginning and read through in order
  – Access time depends on the location of the data and the previous location
  – e.g. tape
• Direct
  – Individual blocks have unique addresses
  – Access is by jumping to the vicinity plus a sequential search
  – Access time depends on location and previous location
  – e.g. disk

Access Methods (2)

• Random
  – Individual addresses identify locations exactly
  – Access time is independent of location or previous access
  – e.g. internal memory, RAM
• Associative
  – Data is located by a comparison with the contents of a portion of the store: e.g. get the word with MSB 1110,xxxx,xxxx,xxxx
  – Access time is independent of location or previous access
  – e.g. cache

Performance

• Access time (latency)
  – Time between presenting the address and getting the valid data
• Memory cycle time – for RAM
  – Time may be required for the memory to "recover" before the next access, related to bus operations
  – Cycle time is access + recovery
• Transfer rate
  – The rate at which data can be moved
  – 1 unit per cycle time for RAM: e.g. a 32-bit data bus at a 500MHz clock gives us 2G Bytes/sec transfer rate
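The transfer-rate arithmetic in the last bullet can be checked in a couple of lines of Python (the bus width and clock rate are the slide's example numbers):

```python
# Peak transfer rate for a 32-bit data bus clocked at 500 MHz,
# moving one bus-width unit per cycle (as on the slide).
bus_width_bytes = 32 // 8       # 4 bytes per transfer
clock_hz = 500_000_000          # 500 MHz
rate = bus_width_bytes * clock_hz
print(rate)                     # 2000000000 bytes/sec = 2 GB/s
```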


Design goals of computer memory

• It is about tradeoffs between
  – More space
  – Faster access
  – Cost
• Use a cache system to balance out
  – Between CPU and Main Memory
  – Between Main Memory and Disk

Average memory access time with cache

• If it takes
  – t1 to access the cache
  – t2 to access memory, with t2 >> t1, say 20 times
  – h: prob of a data access hitting in the cache, or hit ratio
• The average access time:
  – h*t1 + (1-h)*(t1+t2)
  – h*t1: time to directly access the cache
  – (1-h)*(t1+t2): on a miss, time to load the data into the cache, and access it
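Sketched in Python with sample numbers (the values of t1 and t2 are hypothetical, with t2 = 20*t1 as the slide suggests):

```python
# Average memory access time with a cache (sketch; t1, t2, h are example values).
def avg_access_time(t1, t2, h):
    # h*t1: hit, access the cache directly.
    # (1-h)*(t1+t2): miss, load the block from memory into cache, then access it.
    return h * t1 + (1 - h) * (t1 + t2)

t1, t2 = 1.0, 20.0                           # cache 1 ns, memory 20x slower (assumed)
print(round(avg_access_time(t1, t2, 0.95), 3))   # 2.0 ns
print(round(avg_access_time(t1, t2, 0.50), 3))   # 11.0 ns
```

Even a 95% hit ratio already brings the average close to the cache speed, which is why locality matters so much.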

Average access time as a function of hit ratio

• How to improve the hit ratio?
• The good news is that data access has strong locality
  – E.g. loop operations repeatedly access a small set of data
• What is the right cache size?
• How to design the cache replacement algorithm?
  – i.e. what to keep in the cache, based on past access patterns?


Cache

• Small amount of fast memory
• Sits between normal main memory and the CPU
• May be located on the CPU chip or module

Cache Design

Typical Cache to Memory Diagram

Cache/Main Memory Structure

• Memory has 2^n addressable words
• Memory is accessed in blocks
  – K words per block
  – Total 2^n/K blocks
  – E.g.
    » n=24: we have 2^24 = 16M address space
    » If K=4, we have 2^24 / 2^2 = 2^22 memory blocks
    » So the block address is the first 22 bits of the word address
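A quick Python check of the block arithmetic in this example (the sample word address 16339Ch is borrowed from the lecture's direct-mapping example):

```python
# Verify the block counts for n = 24 address bits and K = 4 words per block.
n, K = 24, 4
assert 2 ** n == 16 * 2 ** 20       # 16M addressable words
assert (2 ** n) // K == 2 ** 22     # number of memory blocks

# Dropping the low log2(K) = 2 bits of a word address gives its block address.
word_addr = 0x16339C
print(hex(word_addr >> 2))          # 0x58ce7
```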


Cache/Main Memory Structure

• The cache contains C lines
• Each line holds one block, or K words
• Each line has a tag to indicate which block in memory is in the cache
• To uniquely identify a cache line, we need h bits, if C = 2^h

Cache operation – overview

• The CPU requests the contents of a memory location
• Check the cache for this data
• If present, get it from the cache (fast)
• If not present, read the required block from main memory into a cache line
• Then deliver from cache to CPU
• The cache includes tags to identify which block of main memory is in each cache slot

Cache Read Operation - Flowchart

Cache Design Issues

• Size of Cache
• Block Size
• Levels of Cache
  – 1, 2 or 3 levels?
• Mapping Function
  – Direct
  – Associative
  – Set Associative


Cache Design Issues (2)

• Cache Replacement Algorithm: what to keep in cache?
  – Least Recently Used (LRU)
  – First In First Out (FIFO)
  – Least Frequently Used (LFU)
  – Random
• Write Policy
  – Write thru
  – Write back
  – Write once

Cache Size

• Cost
  – Cache is expensive compared with memory, in dollars per bit
• Speed
  – A cache that is too big is not good for fast access
    » More gates and logic are needed for addressing
    » Checking the cache for data takes time

Comparison of Cache Sizes

Mapping Functions: Direct Mapping

• Simply computed as:
  – i = j mod m, where
  – i is the line number in the cache, j is the memory block number, and m is the number of lines in the cache
• Each block of main memory maps to only one cache line
  – i.e. if a block is in cache, it must be in one specific place
• The address is in two parts: total s+w bits
  – The least significant w bits identify a unique word
  – The most significant s bits specify one memory block
  – The MSBs are split into a cache line field r and a tag of s-r bits (most significant)


Direct Mapping Example

• Address format: Tag (s-r) = 8 bits | Line or Slot (r) = 14 bits | Word (w) = 2 bits
• Cache of 64 kBytes, cache block of 4 bytes
  – i.e. the cache is 16k (2^14) lines of 4 bytes, m = 2^14 = 16k
• 24-bit memory address: s+w = 24, 16M bytes of memory (2^24 = 16M)
  – Block size 2^w = 4 bytes
  – 2^(s+w) / 2^w = 2^s blocks in memory
• E.g. (next page)
  – Addr = 16339Ch = 0001 0110 0011 0011 1001 1100
  – Tag = left 8 bits = 0001 0110 = 16h
  – Line Addr = 14 bits in the middle = 00 1100 1110 0111 = 0CE7h
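The address split above can be sketched in a few lines of Python, using the field widths 8/14/2 and the address 16339Ch from the example:

```python
# Direct-mapped address split: tag = 8 bits, line = 14 bits, word = 2 bits
# (s + w = 24 total address bits).
def split_direct(addr, w=2, r=14):
    word = addr & ((1 << w) - 1)          # low w bits: word within the block
    line = (addr >> w) & ((1 << r) - 1)   # next r bits: cache line number
    tag = addr >> (w + r)                 # remaining s - r bits: tag
    return tag, line, word

tag, line, word = split_direct(0x16339C)
print(hex(tag), hex(line), word)   # 0x16 0xce7 0
```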

Direct Mapping Cache Organization


Mem block mapping to cache lines

Cache line | Main Memory blocks held
0          | 0, m, 2m, 3m, ..., 2^s - m
1          | 1, m+1, 2m+1, ..., 2^s - m + 1
...        | ...
m-1        | m-1, 2m-1, 3m-1, ..., 2^s - 1

• No two mem blocks that are mapped to the same cache line have the same Tag!

Direct Mapping Summary

• Address length = (s + w) bits, given, e.g. 24
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r, so the cache size determines r
• Size of tag = (s - r) bits

Direct Mapping pros & cons

• Simple
• Inexpensive
• Fixed location for a given block
  – Thrashing: if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high

Associative Mapping

• A main memory block can load into any line of the cache
  – Address space: s+w bits
  – Block size: 2^w bytes
  – Total memory blocks: 2^s
• The memory address is interpreted as tag and word
  – The first s bits are the tag: each mem block has a unique tag
  – Line size: 2^w bytes
• Every line's tag is examined for a match
• Cache searching gets expensive


Fully Associative Cache Organization

• Address format: Tag = 22 bits | Word = 2 bits
• A 22-bit tag is stored with each 32-bit block of data
• Compare the tag field with the tag entries in the cache to check for a hit (an expensive operation in circuit complexity)
• The least significant 2 bits of the address identify which 8-bit word is required from the 32-bit data block
• e.g.
  – Mem Address = 16339C, Tag (first 22 bits) = 058CE7, Data = FEDCAB98, Cache line = any of the m lines of cache (anywhere you want)
  – 0001 0110 0011 0011 1001 1100 -> first 22 bits = 0001 0110 0011 0011 1001 11

Example

• Each cache line can store any memory block
• Only by comparing all the Tags in the cache can we know if there's a miss or a hit

Associative Mapping Summary

• Total flexibility: a memory block can be in any line in the cache
• Allows for complex cache replacement algorithms that improve the hit ratio
• Costly in hardware implementation


Cache Mapping: Set Associative Mapping

• A compromise between direct and associative mapping
  – The cache is divided into v sets
  – Each set contains k lines
    » E.g. k=2 gives a 2-way associative cache system
  – So we have m = k x v lines in the cache
  – Each memory block can be in any line of a given cache set
  – It is like a student dorm system:
    » The cache set is the dorm room number, and the tags are the names of the students living in that room
    » To locate a student, the cache set address is pulled from the address, and then the Tags are compared

Cache Set Address Structure

• Address format: Tag = 9 bits | Cache Set = 13 bits | Word = 2 bits
• Use the set field to locate one of the 2^13 cache sets
• 2-way cache set, so data can be stored in either one of the lines in the set
• Compare the tag field to see if we have a hit
• e.g. (fig on next page)
  – Address | Tag | Data | Set number
  – 02C 7FFC | 02C | 12345678 | 1FFF
  – 1FF 7FFC | 1FF | 24682478 | 1FFF

Two Way Set Associative Cache Organization

Two Way Example


Replacement algorithms

• When there's a cache miss, a new memory block is loaded into the cache, so we need to replace cache content
  – In direct mapping, we don't have a choice: the new block has a fixed location in the cache
  – In set associative mapping, we need to choose which line in the set to replace
  – In associative mapping, there are more choices, a larger space to choose from
• Typically hardware implemented, no CPU involvement

Cache Performance

Replacement algorithms

• Algorithms used
  – Least Recently Used (LRU)
    » e.g. in a 2-way set associative cache, which of the 2 blocks is LRU?
  – First In First Out (FIFO)
    » Replace the block that has been in the cache longest
  – Least Frequently Used (LFU)
    » Replace the block which has had the fewest hits
  – Random
    » Generate a random number to determine which one to replace

Write Policy

• Memory data consistency issue – no free lunch theorem
  – When replacing a cache line, if the cache data changed, it needs to be written back to the corresponding memory location before it is replaced
  – When I/O modifies a memory word via DMA, the cache word becomes invalid and needs to be reloaded into the cache
  – In a multi-core CPU where each core has its own cache, a cache word is invalid if changed by one of the CPUs
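As an illustration of the LRU bookkeeping (a software sketch only, not the hardware implementation; the tag values are arbitrary), one set of a k-way set associative cache can be modeled with an ordered dictionary:

```python
from collections import OrderedDict

class LRUSet:
    """One set of a k-way set associative cache with LRU replacement (sketch)."""
    def __init__(self, k=2):
        self.k = k
        self.lines = OrderedDict()   # tag -> block data, least recently used first

    def access(self, tag, block=None):
        if tag in self.lines:            # hit: mark as most recently used
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) >= self.k:    # miss with a full set: evict the LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = block          # load the new block
        return False

s = LRUSet(k=2)
s.access(0x2C)
s.access(0x1FF)
s.access(0x2C)     # hit: 0x1FF becomes least recently used
s.access(0x0AA)    # miss: evicts 0x1FF
print([hex(t) for t in s.lines])   # ['0x2c', '0xaa']
```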


Write through

• All writes go to main memory as well as the cache
• Multiple CPUs can monitor main memory traffic to keep the local (to CPU) cache up to date
• Problem:
  – Many writes to memory
  – Lots of traffic on the bus

Write back

• The purpose is to minimize write operations on the bus
• When a cache line is updated, a bit is set to indicate that
• At the time of cache line replacement, only write to memory those lines updated
  – The average cache update rate is 15%, but for vector computing 33%, and for matrix transposition 50%
  – A write involves a line instead of a word, so only if a cache word gets written multiple times before replacement does this pay off

Example

• A memory write is 32 bits and takes 30ns
• A cache line is 16 bytes, i.e. 128 bits
• Average word writes per replacement: 12 times
• How much bus time will write back save over write thru?
• Solution
  – Write thru: 12 x 30 = 360ns / replacement cycle
  – Write back: (128/32) x 30 = 120ns / replacement cycle

Cache Performance

• Cost per bit for a two-level cache system
  – C1: cost per bit for cache
  – C2: cost per bit for memory
  – S1: cache size
  – S2: memory size
• What is the average cost per bit?
  – (C1*S1 + C2*S2) / (S1 + S2)
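The bus-time comparison can be redone in a few lines (using the example's numbers, with the line size taken as 16 bytes = 128 bits = 4 words of 32 bits):

```python
# Bus time per replacement cycle: write thru vs. write back.
word_write_ns = 30           # one 32-bit write to memory
writes_per_replacement = 12  # average word writes between replacements
words_per_line = 128 // 32   # 4 words per 16-byte cache line

write_through = writes_per_replacement * word_write_ns  # every write uses the bus
write_back = words_per_line * word_write_ns             # one line write-back at eviction
print(write_through, write_back)   # 360 120
```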


Cache Performance - Cost

Cache Performance – Access

• Consider the following 2-level system:
  – The cache hit ratio is h, i.e. the prob that a memory word access is in the cache
  – Time to access a word in the L1 and L2 cache: T1, T2
• What is the average word access time?
  – Ts = h*T1 + (1-h)*(T1+T2)
  – => T1/Ts = 1 / (1 + (1-h)*(T2/T1))
• We want T1/Ts to be close to 1.0
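A quick sketch of the T1/Ts ratio as a function of h (the ratio T2/T1 = 20 is an assumed example value, consistent with the earlier slide's t2 being about 20 times t1):

```python
# Access-time efficiency T1/Ts as a function of hit ratio h.
def efficiency(h, t2_over_t1=20.0):
    return 1.0 / (1.0 + (1.0 - h) * t2_over_t1)

for h in (0.90, 0.99, 1.00):
    print(h, round(efficiency(h), 3))
```

Even at h = 0.90 the average access time is about 3x the cache time; the ratio only approaches 1.0 as h gets very close to 1.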

Cache access as a function of hit ratio

Hit ratio vs data access locality

• Different programs have different access locality characteristics
• How does the cache size affect the hit ratio?
  – With no locality, the hit ratio is simply proportional to the S1/S2 ratio


Modern Computer Cache System Examples (Informational)

Intel CPU Cache

• 80386
  – No on-chip cache
• 80486
  – 8k, using 16-byte lines and a 4-way set associative organization
• Pentium (all versions)
  – Two on-chip L1 caches, for data & instructions separately
• Pentium III
  – L3 cache added off chip

Cache

• Pentium 4
  – L1 caches
    » 8k bytes
    » 64-byte lines (so 128 cache lines)
    » 4-way set associative
  – L2 cache
    » Feeding both L1 caches
    » 256k
    » 128-byte lines
    » 8-way set associative
  – L3 cache on chip

Pentium 4 Block Diagram


Pentium 4 Core Processor

• Fetch/Decode Unit
  – Fetches instructions from the L2 cache
  – Decodes into micro-ops
  – Stores micro-ops in the L1 cache
• Out-of-order execution logic
  – Schedules micro-ops
  – Based on data dependence and resources
  – May speculatively execute
• Execution units
  – Execute micro-ops
  – Data from the L1 cache
  – Results in registers
• Memory subsystem
  – L2 cache and system bus

PowerPC Cache Organization

• 601 – single 32kb, 8-way set associative
• 603 – 16kb (2 x 8kb), two-way set associative
• 604 – 32kb
• 620 – 64kb
• G3 & G4
  – 64kb L1 cache
    » 8-way set associative
  – 256k, 512k or 1M L2 cache
    » Two-way set associative
• G5
  – 32kB instruction cache
  – 64kB data cache

PowerPC G5 Block Diagram


Review Questions & Homework

• Review the notes and textbook, and try to answer the review questions
  – 4.1, 4.2, and 4.3
• Homework 1: due Oct 9, 2008
  – Please check the course webpage; it will be announced on Friday
  – It will be given in parts, covering Lectures 1~4
