ESE532: System-On-A-Chip Architecture Today Apple A11

ESE532: System-On-A-Chip Architecture Today Apple A11

ESE532: System-on-a-Chip Architecture Today • Case for Programmable SoC • Goals • Outcomes Day 1: August 29, 2018 • New/evovling Course, Risks, Tools Introduction and Overview • Sample Optimization • This course (incl. policies, logistics) • Zed Boards 1 2 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Apple A11 Bionic Questions • 90mm2 10nm FinFET • 4.3B transistors • Why do today’s SoC look like they do? • iPhone 8, 8s, X • How approach programming modern SoCs? • 6 ARM cores (64b) – 2 fast (2.4GHz) • How design a custom SoC? – 4 low energy • When building a System-on-a-Chip (SoC) • 3 custom GPUs – How much area should go into: • Neural Engine • Processor cores, GPUs, – 600 Bops? FPGA logic, memory, interconnect, • Motion, image accel. custom functions (which) …. ? • 8MB L2 cache 3 4 Penn ESE532 Fall 2017 -- DeHonhttps://wccftech.com/apple-iphone-8-plus-a11-bionic-complete-teardown/ Penn ESE532 Fall 2018 -- DeHon FPGA Field-Programmable Gate Array K-LUT (typical k=4) Compute block Case for Programmable SoC w/ optional output Flip-Flop ESE171, CIS371 5 6 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon 1 The Way things Were Today 20 years ago • Microprocessor may not be fast enough – (but often it is) • Wanted programmability – Or low enough energy – used a processor • Time and Cost of a custom IC is too high • Wanted high-throughput – $100M’s of dollars for development, Years – used a custom IC • FPGAs promising • Wanted product differentiation – But build everything from prog. gates? – Got it at the board level – Select which ICs and how wired • Premium for small part count – And avoid chip crossing • Build a custom IC – ICs with Billions of Transistors – It was about gates and logic 7 8 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Non-Recurring Engineering NRE Costs (NRE) Costs • Costs spent up front on development – Engineering Design Time – Prototypes – Mask costs • Recurring Engineering – Costs to produce each chip • 28-nm SoC development costs doubled Cost(Nchips) = CostNRE + Nchips× Costperchip over previous node – EE Times 2013 28nm+78%, 20nm+48%, 14nm+31%, 10nm+35% 9 10 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon € Economics Amortize NRE with Volume Forcing fewer, more customizable Cost(Nchips) = CostNRE + Nchips× Costperchip chips • Economics force fewer, more customizable chips – Mask costs in the millions of dollars – Custom IC design NRE 10s—100s of millions of dollars CostNRE € Cost= + Costperchip • Need market of billions of dollars to recoup investment Nchips • With fixed or slowly growing total IC industry revenues • ! Number of unique chips must decrease 11 12 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon € 2 Given Demand for Large ICs Programmable • Now contain significant software • How do we get higher performance than – Almost all have embedded processors a processor, while retaining • Must co-design SW and HW programmability? • Must solve complete computing task – Tasks has components with variety of needs – Some don’t need custom circuit – 90/10 Rule 13 14 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Programmable SoC Programmable SoC • Implementation Platform for innovation – This is what you target (avoid NRE) – Implementation vehicle 15 16 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Then and Now 20 years ago Today • Programmability? • Programmability? – use a processor – uP, FPGA, GPU, PSoC • High-throughput • High-throughput Goals, Outcomes – used a custom IC – FPGA, GPU, PSoC, • Wanted product custom differentiation • Wanted product – board level differentiation – Select & wired IC – Program FPGAs, • Build a custom IC PSoC – It was about gates • Build a custom IC and logic 17 18 Penn ESE532 Fall 2018 -- DeHon – System and software Penn ESE532 Fall 2018 -- DeHon 3 Goals Roles • Create Computer Engineers • PhD Qualifier – SW/HW divide is wrong, outdated – One broad Computer Engineering – Parallelism, data movement, resource management, abstractions • CMPE Concurrency – Cannot build a chip without software • Hands-on Project course • SoC user – know how to exploit • SoC designer – architecture space, hw/sw codesign • Project experience – design and optimization 19 20 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Outcomes Outcomes • Design, optimize, and program a modern System-on-a-Chip. • Understand the system on a chip from • Analyze, identify bottlenecks, design-space gates to application software, including: – Modeling " write equations to estimate – on-chip memories and communication • Decompose into parallel components networks, I/O interfacing, design of accelerators, processors, firmware and OS/ • Characterize and develop real-time solutions infrastructure software. • Implement both hardware and software • Understand and estimate key design solutions metrics and requirements including: • Formulate hardware/software tradeoffs, and – area, latency, throughput, energy, power, predictability, and reliability. perform hardware/software codesign 21 22 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon New and Evolving Course Tools • Spring 2017 – first offering • Are complex – Raw, all assignments new … some buggy • Will be challenging, but good for you to – Assignments too tedious, long build confidence can understand and • Fall 2017 – second offering master – Refine assignments, project • Tool runtimes can be long – Increased explicit modeling emphasis • Learning and sharing experience will be – Hard, not insane part of assignments • Fall 2018 – now – Not much different from 2017 – Hopefully fewer rough edges 23 24 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon 4 Distinction Abstraction Stack CIS240, 371, 501 ESE532 • Best Effort Computing • Hardware-Software SoC Arch: – Run as fast as you can codesign Processor Arch: ESE 532 • Binary compatible – Willing to recompile, maybe CIS 371/501 rewrite code • ISA separation – Define/refine hardware Mixed Signal: • Shared memory • Real-Time ESE 568 parallelism – Guarantee meet deadline • Non shared-memory Circuits/VLSI Digital: ESE370/570 Analog: ESE419/572 models (ESE319) Devices: ESE521 25 26 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon (ESE218) Abstract Approach • Identify requirements, bottlenecks • Decompose Parallel Opportunities Approach -- Example – At extreme, how parallel could make it? – What forms of parallelism exist? • Thread-level, data parallel, instruction-level • Design space of mapping – Choices of where to map, area-time tradeoffs • Map, analyze, refine – Write equations to understand, predict 27 28 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon SPICE Circuit Simulator Analyze Matrix Solve Ax=B A matrix B vector x unknown vector Solve for x Linear Algebra solving n eqns in n unknowns. Example: Kapre+DeHon, TRCAD 2012 29 30 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon 5 Analyze Speedup • T=Tmodeleval+Tmatsolve+Tctrl • What should we speedup first? • T=Tmodeleval+Tmatsolve+Tctrl • What happens if only speedup modeleval? 31 – T=Tmatsolve+(Tmodeleval)/S+Tctrl 32 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Model Evaluation: Trivial Analyze Hardware Implementation • If only accelerated model evaluation only about 2x speedup • If want better than 14x speed, must also attack control * * ÷ e f b - * ÷ d c a ex 33 34 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHonVerilog-AMS as Domain-Specific Language Spatial Parallelism Parallelism: Model Evaluation • Every operation (*, + /) gets dedicated hardware. • Implement task in space " use additional area for • Every device each operator. independent • Parallel – all operations occur simultaneously. • Many of each type of device • Can evaluate in * * ÷ parallel e f b – T=Tseq/Nproc - * ÷ • Build pipelined d c a circuit for model x e – Tseq=Ncomp*Tcycle vs. T =T 35 pipe cycle 36 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon 6 Single FPGA Mapping Parallelism: Model Evaluation Fully spatial Custom VLIW circuit • Spatial end up • Use custom bottlenecked by evaluation engines b ÷ * e * f * other components • …or GPUs - * ÷ + ÷ d c a ex ex ~100x Speedup 1-10x Speedup Multiple FPGAs 1 FPGA VLIW=Very Long Instruction Word 37 38 Penn ESE532 Fall 2018 -- DeHon exploits Instruction-Level Parallelism Penn ESE532 Fall 2018 -- DeHon Parallelism: Matrix Solve Example Matrix • Needed direct solver? • E.g. Gaussian elimination • Data dependence on previous reduce • Parallelism in subtracts • Some row independence 39 40 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Example Matrix Example Matrix Reduce to critical path: from 9 sequential operations 41 to path of 5 operations. 42 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon 7 Dataflow PE Matrix Solve Only Graph Incoming Nodes Messages Dataflow trigger * + ÷ Graph Outgoing ~2.4x mean Fanout Messages 43 44 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Parallelism: Matrix Solve Parallelism Controller? • Settled on • Could leave constructing dataflow sequential graph • For some designs, • Graph can be becomes the iteration independent bottleneck once – Statically scheduled others accelerated – (cheaper) • Has internal • This is bottleneck to parallelism in further acceleration condition evaluation T=Tmodeleval/S1+(Tmatsolve)/S2+Tctrl 45 46 Penn ESE532 Fall 2018 -- DeHon Penn ESE532 Fall 2018 -- DeHon Single-Chip Solution Parallelism Controller • Customized datapath controller

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    11 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us