CS152 Computer Architecture and Engineering, Lecture 21: Buses and I/O


CS152 Computer Architecture and Engineering
Lecture 21: Buses and I/O
November 10, 1999
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
CS152 / Kubiatowicz, 11/10/99, ©UCB Fall 1999

Recap: Levels of the Memory Hierarchy
° Upper levels are smaller, faster, and costlier per bit; lower levels are larger and slower:
• Registers: 100s of bytes, <10s ns; staging unit: instr. operands, 1-8 bytes (prog./compiler)
• Cache: K bytes, 10-100 ns, $.01-.001/bit; staging unit: blocks, 8-128 bytes (cache cntl)
• Main memory: M bytes, 100 ns - 1 us, $.01-.001; staging unit: pages, 512-4K bytes (OS)
• Disk: G bytes, ms, 10^-3 - 10^-4 cents; staging unit: files, Mbytes (user/operator)
• Tape: "infinite" capacity, sec-min, 10^-6 cents

Recap: What is virtual memory?
° Virtual memory => treat main memory as a cache for the disk
° Terminology: blocks in this cache are called "Pages"
° Typical size of a page: 1K - 8K bytes
° Page table maps virtual page numbers to physical frames:
• Virtual address = virtual page number + offset (10-bit offset in the slide's figure)
• The Page Table Base Register points to the page table, which is located in physical memory
• The virtual page number indexes the page table; each entry holds a valid bit (V), access rights, and a physical address (PA)
• Physical address = physical page number + offset

Recap: Three Advantages of Virtual Memory
° Translation:
• Program can be given a consistent view of memory, even though physical memory is scrambled
• Makes multithreading reasonable (now used a lot!)
• Only the most important part of the program (the "Working Set") must be in physical memory
• Contiguous structures (like stacks) use only as much physical memory as necessary, yet can still grow later
° Protection:
• Different threads (or processes) are protected from each other
• Different pages can be given special behavior (read only, invisible to user programs, etc.)
• Kernel data is protected from user programs
• Very important for protection from malicious programs => far more "viruses" under Microsoft Windows
° Sharing:
• Can map the same physical page to multiple users ("shared memory")
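The page-table translation above (the virtual page number indexes the table, the offset passes through untranslated) can be sketched as follows. The 10-bit offset matches the slide's figure; the table contents and the `LookupError` page-fault behavior are illustrative assumptions.

```python
# Toy page-table walk, assuming a 10-bit page offset as in the slide's figure.
# The page table is a plain dict here; a real table lives in physical memory.

OFFSET_BITS = 10
PAGE_SIZE = 1 << OFFSET_BITS

# Hypothetical entries: virtual page number -> (valid, access rights, physical frame)
page_table = {
    0x3A: (True, "RW", 0x05),
    0x3B: (True, "R",  0x12),
}

def translate(vaddr):
    """Translate a virtual address to a physical one, or raise a page fault."""
    vpn = vaddr >> OFFSET_BITS          # virtual page number indexes the table
    offset = vaddr & (PAGE_SIZE - 1)    # low bits pass through untranslated
    entry = page_table.get(vpn)
    if entry is None or not entry[0]:   # missing or invalid entry => page fault
        raise LookupError("page fault: VPN 0x%x" % vpn)
    frame = entry[2]
    return (frame << OFFSET_BITS) | offset
```

For example, virtual page 0x3A maps to frame 0x05, so `translate((0x3A << 10) | 0x7)` yields `(0x05 << 10) | 0x7`.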
Recap: Making address translation practical: TLB
° Translation Look-aside Buffer (TLB) is a cache of recent translations
° Speeds up the translation process "most of the time"
° TLB is typically a fully-associative lookup table
(Figure: virtual pages mapped through the page table to physical frames, with the TLB caching recent page-to-frame translations)

Recap: TLB organization: include protection
° TLB usually organized as a fully-associative cache
• Lookup is by virtual address
• Returns physical address + other info
° Entry fields:
• Dirty => page modified (Y/N)?
• Ref => page touched (Y/N)?
• Valid => TLB entry valid (Y/N)?
• Access => read? write?
• ASID => which user?
° Example contents:

  Virtual Address  Physical Address  Dirty  Ref  Valid  Access  ASID
  0xFA00           0x0003            Y      N    Y      R/W     34
  0x0040           0x0010            N      Y    Y      R       0
  0x0041           0x0011            N      Y    Y      R       0

Recap: MIPS R3000 pipelining of TLB
° MIPS R3000 pipeline: Inst Fetch (TLB, I-Cache) | Dcd/Reg (RF) | ALU / E.A (operation, E.A. TLB) | Memory (D-Cache TLB) | Write Reg (WB)
° TLB: 64 entries, on-chip, fully associative; TLB faults handled in software
° Virtual address layout: ASID (6 bits) | V. Page Number (20 bits) | Offset (12 bits)
° Virtual address space regions (top bits):
• 0xx: user segment (caching based on PT/TLB entry)
• 100: kernel physical space, cached
• 101: kernel physical space, uncached
• 11x: kernel virtual space
° The ASID field allows context switching among 64 user processes without a TLB flush

Reducing Translation Time I: Overlapped Access
° Machines with TLBs overlap the TLB lookup with the cache access
• Works because the lower bits of the result (the offset) are available early: for 4K pages, the 12-bit offset is untranslated and can index the cache while the TLB translates the 20-bit virtual page number
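A fully-associative TLB lookup of the kind tabulated above can be sketched like this. The entries mirror the example table; returning `None` on a miss (rather than trapping to a software fault handler, as the R3000 does) is a simplifying assumption.

```python
# Fully-associative TLB: every entry is compared against the lookup key.
# Entries mirror the example table: (VPN, PFN, dirty, ref, valid, access, ASID).
tlb = [
    (0xFA00, 0x0003, True,  False, True, "R/W", 34),
    (0x0040, 0x0010, False, True,  True, "R",   0),
    (0x0041, 0x0011, False, True,  True, "R",   0),
]

def tlb_lookup(vpn, asid):
    """Return the physical frame for (vpn, asid), or None on a TLB miss."""
    for entry_vpn, pfn, dirty, ref, valid, access, entry_asid in tlb:
        if valid and entry_vpn == vpn and entry_asid == asid:
            return pfn
    return None  # miss: fall back to a page-table walk (not modeled here)
```

Note that the ASID participates in the match, which is what lets multiple processes share the TLB without a flush on every context switch.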
Overlapped TLB & Cache Access
° For a 4K direct-mapped cache of 4-byte blocks (1K lines), the 12-bit offset splits into a 10-bit cache index and a 2-bit byte select
° The TLB produces the frame number (FN) while the cache reads the indexed line; the cache's tag is then compared against the TLB's FN (FN = FN?) to produce hit/miss
° With this technique, the size of the cache can be at most the same size as a page
⇒ What if we want a larger cache???

Problems With Overlapped TLB Access
° Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation
° If we do this in parallel, we have to be careful, however. Example: suppose everything is the same except that the cache is increased to 8K bytes instead of 4K:
• One index bit now comes from the virtual page number; it is changed by VA translation, but it is needed for the cache lookup
° Solutions:
⇒ Go to 8K byte page sizes;
⇒ Go to a 2-way set-associative cache (a 1K × 2-way set-associative cache keeps the index within the offset); or
⇒ SW guarantee VA[13]=PA[13]

Reducing Translation Time II: Virtually Addressed Cache
° Cache is indexed and tagged with virtual addresses; translation (VA -> PA) happens only on the path from the cache to main memory
° Only require address translation on cache miss!
• Very fast as a result (as fast as cache lookup)
• No restrictions on cache organization
° Synonym problem: two different virtual addresses map to the same physical address ⇒ two cache entries holding data for the same physical address!
° Solutions:
• Provide associative lookup on physical tags during a cache miss to enforce a single copy in the cache (potentially expensive)
• Make the operating system enforce one copy per cache set by selecting virtual⇒physical mappings carefully. This only works for direct-mapped caches.
° Virtually addressed caches are currently out of favor because of synonym complexities
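The constraint behind the overlapped-access problem is simple arithmetic: the cache-index-plus-byte-select bits must fit within the untranslated page offset. A small sketch under the slide's assumptions (power-of-two sizes; set associativity divides the indexed portion of the cache):

```python
def index_fits_in_offset(cache_bytes, ways, page_bytes):
    """True if a cache can be indexed entirely by untranslated page-offset bits.

    A w-way set-associative cache indexes only cache_bytes / ways bytes per way,
    which is why going 2-way is one of the slide's fixes for the 8K cache.
    """
    indexed_bits = (cache_bytes // ways - 1).bit_length()  # log2(bytes per way)
    offset_bits = (page_bytes - 1).bit_length()            # log2(page size)
    return indexed_bits <= offset_bits
```

This reproduces the slide's example: a 4K direct-mapped cache works with 4K pages, an 8K direct-mapped cache does not, and both of the first two fixes (8K pages, or a 2-way 8K cache) restore the property.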
Survey
° R4000
• 32-bit virtual, 36-bit physical
• variable page size (4KB to 16MB)
• 48 entries mapping page pairs (128 bit)
° MPC601 (32-bit implementation of the 64-bit PowerPC arch)
• 52-bit virtual, 32-bit physical, 16 segment registers
• 4KB page, 256MB segment
• 4-entry instruction TLB
• 256-entry, 2-way TLB (and variable-sized block xlate)
• overlapped lookup into 8-way 32KB L1 cache
• hardware table search through hashed page tables
° Alpha 21064
• arch is 64-bit virtual; implementation subsets: 43, 47, 51, 55 bits
• 8, 16, 32, or 64KB pages (3-level page table)
• 12-entry ITLB, 32-entry DTLB
• 43-bit virtual, 28-bit physical octword address

Alpha VM Mapping
° "64-bit" address divided into 3 segments
• seg0 (bit 63 = 0): user code/heap
• seg1 (bit 63 = 1, bit 62 = 1): user stack
• kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
° 3-level page table, each level one page
• Alpha uses only 43 unique bits of VA
• (future min page size up to 64KB => 55 bits of VA)
° PTE bits: valid, kernel & user read & write enable
• (no reference, use, or dirty bit)

Administrivia
° Important: Lab 7. Design for Test
• You should be testing from the very start of your design
• Consider adding special monitor modules at various points in the design => I have asked you to label trace output from these modules with the current clock cycle #
• The time to understand how components of your design should work is while you are designing!
° Question: Oral reports on 12/6?
• Proposal: 10 - 12 am and 2 - 4 pm
° Pending schedule:
• Sunday 11/14: Review session, 7:00 in 306 Soda
• Monday 11/15: Guest lecture by Bob Broderson
• Tuesday 11/16: Lab 7 breakdowns and Web description
• Wednesday 11/17: Midterm I
• Monday 11/29: no class? Possibly
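The Alpha's three-way segment split above is decided entirely by the top two address bits. A sketch (treating the "64-bit" address literally at 64 bits for illustration):

```python
def alpha_segment(vaddr):
    """Classify a 64-bit Alpha virtual address into seg0/seg1/kseg by bits 63-62."""
    bit63 = (vaddr >> 63) & 1
    bit62 = (vaddr >> 62) & 1
    if bit63 == 0:
        return "seg0"   # user code/heap
    elif bit62 == 1:
        return "seg1"   # user stack
    else:
        return "kseg"   # kernel segment for OS
```

This places all user code and heap in the lower half of the address space, with the stack and kernel segments distinguished only at the very top.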
• Monday 12/1: Last class (wrap up, evaluations, etc)
• Monday 12/6: final project reports due after oral report
• Friday 12/10: grades should be posted

Administrivia II
° Major organizational options:
• 2-way superscalar (18 points)
• 2-way multithreading (20 points)
• 2-way multiprocessor (18 points)
• out-of-order execution (22 points)
• deep pipelining (12 points)
° Test programs will include multiprocessor versions
° Both multiprocessor and multithreaded projects must implement a synchronizing "Test and Set" instruction:
• Normal load instruction, with a special address range:
  - Addresses from 0xFFFFFFF0 to 0xFFFFFFFF
  - Only need to implement 16 synchronizing locations
• Reads and returns the old value of the memory location at the specified address, while setting the value to one (stall the memory stage for one extra cycle)
• For the multiprocessor, this instruction must make sure that all updates to this address are suspended during the operation
• For the multithreaded version, switch to the other processor if the value is already non-zero (like a cache miss)

Computers in the News: Sony Playstation 2000
° (as reported in Microprocessor Report, Vol 13, No. 5)
• Emotion Engine: 6.2 GFLOPS, 75 million polygons per second
• Graphics Synthesizer: 2.4 billion pixels per second
• Claim: Toy Story realism brought to games!
Playstation 2000 Continued
° Emotion Engine:
• Superscalar MIPS core
• Vector coprocessor pipelines
• RAMBUS DRAM interface
° Sample Vector Unit:
• 2-wide VLIW
• Includes microcode memory
• High-level instructions like matrix-multiply

What is a bus?
° A Bus Is:
• a shared communication link
• a single set of wires used to connect multiple subsystems (processor, memory, input, output, control, datapath)
° A bus is also a fundamental tool for composing large, complex systems
• systematic means of abstraction

Buses
(Figure: processor, memory, and multiple I/O devices attached to a shared bus)

Advantages of Buses
° Versatility:
• New devices can
