ESE532: System-on-a-Chip Architecture

Day 1: August 28, 2019
Introduction and Overview
Everyone grab:
• Preclass
• Feedback Sheet (1/2 page)
Penn ESE532 Fall 2019 -- DeHon

Today
• Case for Programmable SoC
• Course Goals
• Outcomes
• New/evolving Course, Risks, Tools
• Sample Optimization
• This course (incl. policies, logistics)

Apple A12 Bionic
• 84 mm², 7 nm
• 7 billion transistors
• iPhone XS, XR
  – iPad 2019
• 6 ARM cores
  – 2 fast
  – 4 low energy
• 4 custom GPUs
• Neural Engine
  – 5 trillion ops/s?

Questions
• Why do today's SoCs look like they do?
• How to approach programming modern SoCs?
• How to design a custom SoC?
• When building a System-on-a-Chip (SoC)
  – How much area should go into:
    • Processor cores, GPUs, FPGA logic, memory, interconnect, custom functions (which) …. ?

FPGA
Field-Programmable Gate Array
K-LUT (typical k=4) compute block w/ optional output Flip-Flop
ESE171, ESE150, CIS371

Case for Programmable SoC

The Way Things Were
25 years ago
• Wanted programmability
  – used a processor
• Wanted high-throughput
  – used a custom IC
• Wanted product differentiation
  – Got it at the board level
  – Select which ICs and how wired
• Build a custom IC
  – It was about gates and logic

Today
• Microprocessor may not be fast enough
  – (but often it is)
  – Or low enough energy
• Time and cost of a custom IC is too high
  – $100M's of dollars for development, years
• FPGAs promising
  – But build everything from prog. gates?
• Premium for small part count
  – And avoid chip crossing
  – ICs with billions of transistors

Non-Recurring Engineering (NRE) Costs
• Costs spent up front on development
  – Engineering Design Time
  – Prototypes
  – Mask costs
• Recurring Engineering
  – Costs to produce each chip

NRE Costs
[figure]

NRE Cost (continued)
[figure]

Amortize NRE with Volume
[figure]

https://semiengineering.com/how-much-will-that-chip-cost/

Economics Forcing Fewer, More Customizable Chips
• Economics force fewer, more customizable chips
  – Mask costs in the millions of dollars
  – Custom IC design NRE 10s to 100s of millions of dollars
• Need market of billions of dollars to recoup investment
• With fixed or slowly growing total IC industry revenues
• ⇒ Number of unique chips must decrease

Large ICs
• Now contain significant software
  – Almost all have embedded processors
• Must co-design SW and HW
• Must solve complete computing task
  – Tasks have components with a variety of needs
  – Some don't need custom circuit
  – 90/10 Rule
• Chips must be programmable

Given Demand for Programmable
• How do we get higher performance than a processor, while retaining programmability?

Programmable SoC
• Implementation platform for innovation
  – This is what you target (avoid NRE)
  – Implementation vehicle

Programmable SoC
[figure]
UG1085 Xilinx UltraScale Zynq TRM (p27)

Then and Now
25 years ago
• Programmability?
  – use a processor
• High-throughput
  – used a custom IC
• Wanted product differentiation
  – board level
  – Select & wired IC
• Build a custom IC
  – It was about gates and logic
Today
• Programmability?
  – uP, FPGA, GPU, PSoC
• High-throughput
  – FPGA, GPU, PSoC, custom
• Wanted product differentiation
  – Program FPGAs, PSoC
• Build a custom IC
  – System and software

Course Goals, Outcomes

Goals
• Create Computer Engineers
  – SW/HW divide is wrong, outdated
  – Parallelism, data movement, resource management, abstractions
  – Cannot build a chip without software
• SoC user – know how to exploit
• SoC designer – architecture space, hw/sw codesign
• Project experience – design and optimization

Outcomes
• Design, optimize, and program a modern System-on-a-Chip.
• Analyze, identify bottlenecks, design-space
  – Modeling → write equations to estimate
• Decompose into parallel components
• Characterize and develop real-time solutions
• Implement both hardware and software solutions
• Formulate hardware/software tradeoffs, and perform hardware/software codesign

Roles
• PhD Qualifier
  – One broad Computer Engineering
• CMPE Concurrency
• Hands-on Project course

New and Evolving Course
• Spring 2017 – first offering
  – Raw, all assignments new … some buggy
  – Assignments too tedious, long
• Fall 2017 – second offering
  – Refine assignments, project
  – Increased explicit modeling emphasis
  – Hard, not insane
• Fall 2018 – third offering
  – Not much different from 2017
  – Added real-time ethernet data handling; project groups of 3
  – Many students challenged with C and software engineering
  – Stream debug and performance challenging
• Fall 2019 – now
  – Basic structure remains same
  – Try to front-load more C
  – Try to better introduce Stream optimization and debug
  – Group writeup on projects

Outcomes
• Understand the system on a chip from gates to application software, including:
  – on-chip memories and communication networks, I/O interfacing, design of accelerators, processors, firmware and OS/infrastructure software.
• Understand and estimate key design metrics and requirements including:
  – area, latency, throughput, energy, power, predictability, and reliability.

Tools
• Are complex
• Will be challenging, but good for you to build confidence that you can understand and master them
• Tool runtimes can be long
• Learning and sharing experience will be part of assignments

Distinction
CIS240, 371, 501
• Best Effort Computing
  – Run as fast as you can
• Binary compatible
• ISA separation
• Shared memory parallelism
ESE532
• Hardware-Software codesign
  – Willing to recompile, maybe rewrite code
  – Define/refine hardware
• Real-Time
  – Guarantee meet deadline
• Non shared-memory parallelism models

Abstraction Stack
[diagram] Layers: Software; Systems; Processors; Gates, Memories; Circuits/VLSI; Transistors
• Embedded Sys: ESE350/519
• SoC Arch: ESE 532
• Processor Arch: CIS 371/501 (CIS240)
• Mixed Signal: ESE 568 (ADC, DAC, Switched Capacitors)
• Digital: ESE370/570
• Analog: ESE419/572 (Amplifier, Compare, Voltage/Current Ref.)
• Devices: ESE521 (ESE218)

Approach -- Example

Abstract Approach
• Identify requirements, bottlenecks
• Decompose Parallel Opportunities
  – At extreme, how parallel could make it?
  – What forms of parallelism exist?
    • Thread-level, data parallel, instruction-level
• Design space of mapping
  – Choices of where to map, area-time tradeoffs
• Map, analyze, refine
  – Write equations to understand, predict

SPICE Circuit Simulator
Matrix Solve Ax=B
• A matrix
• B vector
• x unknown vector
• Solve for x
Linear Algebra solving n eqns in n unknowns.
Example: Kapre+DeHon, TRCAD 2012

Analyze
[figure]

Analyze
• T = Tmodeleval + Tmatsolve + Tctrl

Speedup
• If only accelerated model evaluation, only about 2x speedup
• If want better than 14x speedup, must also attack control

Analyze
• T = Tmodeleval + Tmatsolve + Tctrl
• What should we speed up first?
• What happens if we only speed up modeleval?
  – T = Tmatsolve + (Tmodeleval)/S + Tctrl

Model Evaluation: Trivial Hardware Implementation
$I_{D1} = I_s \times \left(e^{V_{D1}/v_j} - 1\right)$
$G_{D1} = \frac{d}{dV_{D1}}\left(I_{D1}\right) = I_s \times e^{V_{D1}/v_j} \times \frac{1}{v_j}$
[diagram: operator graph of *, ÷, - and e^x nodes over values a, b, c, d, e, f]
Verilog-AMS as Domain-Specific Language

Spatial Parallelism
• Every operation (*, +, /) gets dedicated hardware.
• Implement task in space → use additional area for each operator.
• Parallel – all operations occur simultaneously.
[diagram: operator graph of *, ÷, - and e^x nodes over values a, b, c, d, e, f]

Parallelism: Model Evaluation
Data Parallel
• Every device independent
• Many of each type of device
• Can evaluate in parallel
  – T = Tseq/Nproc
• Build pipelined circuit for model
  – Tseq = Ncomp*Tcycle vs. Tpipe = Tcycle

Spatial Too Big?
• Fully spatial circuit: ~100x Speedup, Multiple FPGAs
• Custom VLIW: ~10x Speedup, 1 FPGA
VLIW = Very Long Instruction Word, exploits Instruction-Level Parallelism

Parallelism: Model Evaluation
• Spatial end up bottlenecked by other components
• Use custom evaluation engines
• …or GPUs

Parallelism: Matrix Solve
• Needed direct solver?
• E.g. Gaussian elimination
• Data dependence on previous reduce
  – Limited data parallelism
• Parallelism in subtracts
• Some row independence

Example Matrix
[figures]

Example Matrix
Dataflow Graph
Reduce to critical path: from 9 sequential operations to path of 5 operations.

Dataflow Processing Element (PE)
[diagram: Incoming Messages; Graph Nodes; Dataflow trigger; * + ÷; Graph Fanout; Outgoing Messages]

Parallelism: Matrix Solve
• Settled on constructing dataflow graph
• Graph can be iteration independent

Matrix Solve Only
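
The NRE slides argue that up-front cost must be amortized over volume. A minimal sketch of that arithmetic follows. The dollar figures are hypothetical round numbers chosen for illustration (they are not taken from the slides or the cited article), and the function names are made up for this example.

```python
def cost_per_chip(nre, unit_cost, volume):
    """Per-chip cost when a one-time NRE is spread over `volume` chips."""
    return nre / volume + unit_cost


def break_even_volume(nre_custom, unit_custom, unit_programmable):
    """Volume at which a custom IC matches a programmable part with no NRE."""
    return nre_custom / (unit_programmable - unit_custom)


# Hypothetical numbers, for illustration only.
NRE_CUSTOM = 100e6   # assumed custom-IC development cost, dollars
UNIT_CUSTOM = 10.0   # assumed recurring cost per custom chip, dollars
UNIT_PROG = 60.0     # assumed cost per programmable chip, dollars

for volume in (1e5, 1e6, 1e7):
    custom = cost_per_chip(NRE_CUSTOM, UNIT_CUSTOM, volume)
    print(f"{volume:>12,.0f} chips: custom ${custom:,.2f} vs programmable ${UNIT_PROG:,.2f}")

print(f"break-even at {break_even_volume(NRE_CUSTOM, UNIT_CUSTOM, UNIT_PROG):,.0f} chips")
```

Under these assumed numbers the custom chip only wins above a few million units, which is the sense in which a programmable platform lets a lower-volume product avoid NRE.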
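
The Speedup and Analyze slides reason from T = Tmodeleval + Tmatsolve + Tctrl. The sketch below evaluates that model. The three time fractions are assumptions picked only to be consistent with the "about 2x" and "14x" bullets. The measured breakdown is in a figure that did not survive extraction, so treat these values as placeholders.

```python
def total_time(t_modeleval, t_matsolve, t_ctrl,
               s_modeleval=1.0, s_matsolve=1.0, s_ctrl=1.0):
    """T = Tmodeleval/S1 + Tmatsolve/S2 + Tctrl/S3, one speedup per component."""
    return t_modeleval / s_modeleval + t_matsolve / s_matsolve + t_ctrl / s_ctrl


# Assumed fractions of baseline runtime (hypothetical, sum to 1.0).
T_MODELEVAL, T_MATSOLVE, T_CTRL = 0.50, 0.43, 0.07
BASELINE = total_time(T_MODELEVAL, T_MATSOLVE, T_CTRL)


def speedup(**accel):
    """Overall speedup relative to the unaccelerated baseline."""
    return BASELINE / total_time(T_MODELEVAL, T_MATSOLVE, T_CTRL, **accel)


# Accelerate only model evaluation: the other terms cap the gain near 2x.
for s in (2, 10, 100, 1e6):
    print(f"modeleval S={s:>9,.0f}: overall {speedup(s_modeleval=s):.2f}x")

# Even with modeleval and matsolve made nearly free, control caps the gain near 14x.
print(f"modeleval+matsolve fast: {speedup(s_modeleval=1e6, s_matsolve=1e6):.2f}x")
```

This is the same reasoning as Amdahl's law: the component left unaccelerated bounds the overall speedup, which is why the slide asks what to speed up first.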
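
The Data Parallel slide gives T = Tseq/Nproc and contrasts Tseq = Ncomp*Tcycle with Tpipe = Tcycle. A rough sketch of those formulas is below. It assumes a pipeline with one stage per operation that accepts a new device every cycle, and it adds a fill latency of Ncomp - 1 cycles, which the slide does not mention. All names and numbers are illustrative.

```python
import math


def sequential_time(n_devices, n_comp, t_cycle):
    """One operator at a time: each device model costs Tseq = Ncomp * Tcycle."""
    return n_devices * n_comp * t_cycle


def pipelined_time(n_devices, n_comp, t_cycle):
    """Assumed Ncomp-stage pipeline: fill once, then one result per Tcycle."""
    return (n_comp + n_devices - 1) * t_cycle


def data_parallel_time(n_devices, n_comp, t_cycle, n_proc):
    """Split independent devices across n_proc identical pipelines."""
    per_pipe = math.ceil(n_devices / n_proc)
    return pipelined_time(per_pipe, n_comp, t_cycle)


# Hypothetical workload: 1000 devices, 9 operations per model, 1 time unit per cycle.
print(sequential_time(1000, 9, 1.0))        # one operator, no overlap
print(pipelined_time(1000, 9, 1.0))         # single pipeline
print(data_parallel_time(1000, 9, 1.0, 4))  # four pipelines
```

For many devices the fill latency is negligible, so the pipelined cost per device approaches Tcycle, matching the slide's Tpipe = Tcycle.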
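
The Example Matrix slide reduces 9 sequential operations to a critical path of 5 by building a dataflow graph. The sketch below computes a critical-path length for a small dependency graph. The graph is hypothetical, built only to have 9 operations and a longest chain of 5; it is not the graph from the slides.

```python
from functools import lru_cache

# Hypothetical dataflow graph: operation -> operations it depends on.
DEPS = {
    "a": [], "b": [], "d": [], "f": [], "h": [],
    "c": ["a", "b"],
    "e": ["c", "d"],
    "g": ["e", "f"],
    "i": ["g", "h"],
}


def critical_path_length(deps):
    """Number of operations on the longest dependency chain of an acyclic graph."""

    @lru_cache(maxsize=None)
    def depth(node):
        # A node fires one step after its slowest predecessor.
        return 1 + max((depth(p) for p in deps[node]), default=0)

    return max(depth(node) for node in deps)


print("sequential operations:", len(DEPS))
print("critical path:", critical_path_length(DEPS))
```

With enough processing elements, operations off the critical path run concurrently, so the path length rather than the operation count sets the lower bound on solve time.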
