Apple A13 Bionic Questions

Apple A13 Bionic Questions

ESE532: System-on-a-Chip Architecture Today • Case for Programmable SoC Day 1: September 2, 2020 • Course Goals Introduction and Overview • Outcomes (lecture start 10:35am) • Evovling Course, Risks, Tools • Sample Optimization Note: both linked to web (also slides) • This course (incl. policies, logistics) www.seas.upenn.edu/~ese532/fall2020/fall2020.html • Preclass • Feedback form 1 2 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Apple A13 Bionic Questions • 98mm2, 7nm • Why do today’s SoC look like they do? • 8.5 Billion Tr. • How approach programming modern SoCs? • iPhone 11 + • How design a custom SoC? • 6 ARM cores • When building a System-on-a-Chip (SoC) – 2 fast (2.6GHz) – How much area should go into: – 4 low energy • Processor cores, GPUs, • 4 custom GPUs FPGA logic, memory, interconnect, • NeurAl Engine custom functions (which) …. ? – 5 Trillion ops/s? 3 4 Penn ESE532 FAll 2020-- DeHon Penn ESE532 Fall 2020 -- DeHon FPGA Field-Programmable Gate Array K-LUT (typical k=4 or 6) Compute block Case for Programmable SoC w/ optional output Flip-Flop ESE171, ESE150, CIS371 5 6 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon 1 End of uProcessor Scaling Old • Moore’s Law scaling Now delivered faster • Dennard’s Law transistors kicked in • Processors rode • uP were burning Moore’s Law more power – Turning transistors • Lost ability to scale into performance down voltage • Could wait and ride • Processor technology curve performance stalled 7 8 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon The Way things Were Today 30 years ago • Microprocessor may not be fast enough • Wanted programmability – (but often it is) – used a processor – Or low enough energy • Wanted it a little faster • Single core processor scaling has ended – Next year’s processor would run faster… • Time and Cost of a custom IC is too high • Wanted high-throughput – $100M’s of dollars for development, Years – used a custom IC • Wanted product differentiation • FPGAs promising – Got it at the board level – But build everything from prog. gates? – Select which ICs and how wired • Premium for small part count • Build a custom IC – And avoid chip crossing – It was about gates and logic 9 10 Penn ESE532 Fall 2020 -- DeHon Penn ESE532– FallICs 2020 --withDeHon Billions of Transistors Non-Recurring Engineering NRE Costs (NRE) Costs • Costs spent up front on development – Engineering Design Time – Prototypes – Mask costs • Recurring Engineering – Costs to produce each chip 11 12 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon 2 NRE Cost (continued) Amortize NRE with Volume 13 14 Penn ESE532 Fall 2020 -- DeHon https://semiengineering.com/how-much-will-that-chip-cost/ Penn ESE532 Fall 2020 -- DeHon Economics Large ICs Forcing fewer, more • Now contain significant software customizable – Almost all have embedded processors chips • Must co-design SW and HW • Economics force fewer, more customizable chips • Must solve complete computing task – Mask costs in the millions of Dollars – Tasks has components with variety of needs – Custom IC DesiGn NRE 10s—100s of millions of dollars • NeeD market of billions of Dollars to recoup investment – Some don’t need custom circuit • With fixeD or slowly GrowinG total IC inDustry revenues – 90/10 Rule • è Number of unique chips must Decrease • Chips must be proGrammable 15 16 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Given Demand for Programmable SoC Programmable • Implementation Platform for innovation – This is what you target (avoid NRE) • How do we get higher performance than – Implementation a processor, while retaining vehicle programmability? 17 18 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon 3 Programmable SoC Then and Now 30 years ago Today • Programmability? • Programmability? – use a processor – uP, FPGA, GPU, PSoC • Faster • Faster – Processors scaled UG1085 • Can’t get with single • High-throughput core Xilinx – used a custom IC UltraScale • High-throughput • Wanted product Zynq – FPGA, GPU, PSoC, differentiation TRM custom – board level (p27) • Wanted product – Select & wired IC differentiation • Build a custom IC – Program FPGAs, PSoC – It was about gates and • Build a custom IC logic 19 – System and software 20 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Goals • Create Computer Engineers – SW/HW divide is wrong, outdated – Computer engineers understand computation Course • HW and SW are just tools and design options Goals, Outcomes – Parallelism, data movement, resource management, abstractions – Cannot build a chip without software • SoC user – know how to exploit • SoC designer – architecture space, hw/sw codesign 21 • Project experience – design and optimization22 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Outcomes Roles • Design, optimize, and program a modern System-on-a-Chip. • PhD Qualifier • Analyze, identify bottlenecks, design-space – One broad Computer Engineering – Modeling à write equations to estimate • CMPE Concurrency • Decompose into parallel components • Hands-on Project course • Characterize and develop real-time solutions • Implement both hardware and software solutions • Formulate hardware/software tradeoffs, and 23 perform hardware/software codesign 24 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon 4 Evolving Course Outcomes • Spring 2017 – first offering – Raw, all assignments new, buggy, too tedious, long • Fall 2017 – second offering • Understand the system on a chip from – Refine assignments, project; increased explicit modeling emphasis gates to application software, including: – Hard, not insane – on-chip memories and communication • Fall 2018 – third offering (similar 2017) – Added real-time ethernet data handling; project groups of 3 networks, I/O interfacing, design of – Many students challenged with C and software engineering accelerators, processors, firmware and – Stream debug and performance challenging OS/infrastructure software. • Fall 2019 – fourth (structure same) – Try front-load more C, better introduce Stream optimization and debug • Understand and estimate key design – Group writeup on projects metrics and requirements including: • Fall 2020 – now – Move to Vitis (from SDSoC) – area, latency, throughput, energy, power, – Use Amazon cloud for first half; F1 instance for FPGA access HW – Then transition to Ultra96 (SoC FPGA) for projects predictability, and reliability. 25 26 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Tools Distinction • Are complex CIS240, 371, 471, 571 ESE532 • Will be challenging, but good for you to • Best Effort Computing • Hardware-Software build confidence can understand and – Run as fast as you can codesign • Binary compatible – Willing to recompile, maybe master rewrite code • ISA separation – Define/refine hardware • Tool runtimes can be long • Shared memory • Real-Time parallelism • Learning and sharing experience will be – Guarantee meet deadline part of assignments • Non shared-memory parallelism models 27 28 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Distinction Abstraction Stack ESE680: Software ESE680 Embedded Sys: Hardware/Software Co- ESE532: HW/SW ESE350/519 Design for Machine • Deep computer Systems ML SoC Arch: Learning engineering ESE 532 • Deep on Application (ML) • Broad application • More accessible to CS Processor Arch: Mixed Signal: ADC, DAC • Program in C Processors – Less previous experience CIS 371/501 ESE 568 Switched Capacitors with circuits and • Suitable followup if want (CIS240) architecture to dig deeper Gates, Memories Digital: Analog: Amplifier, Compare th • Won’t be as deep on • 5 offering Circuits/VLSI ESE370/570 ESE419/572 Voltage/Current Ref. understanding HW and optimization Devices: ESE521 Transistors (ESE218) • Program in Pytorch, OpenCL 29 30 Penn ESE532• First Fall 2020offerin -- DeHong Penn ESE532 Fall 2020 -- DeHon 5 Abstract Approach • Identify requirements, bottlenecks • Decompose Parallel Opportunities Approach -- Example – At extreme, how parallel could make it? – What forms of parallelism exist? • Thread-level, data parallel, instruction-level • Design space of mapping – Choices of where to map, area-time tradeoffs • Map, analyze, refine – Write equations to understand, predict 31 32 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Example SPICE Circuit Example: SPICE Circuit Simulator Simulator Matrix Solve Ax=B A matrix B vector x unknown vector Solve for x = KCL, KVL * Linear Algebra solving n eqns in n unknowns. Example: Kapre+DeHon, TRCAD 2012 33 34 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon * Kirchhoff {Current,Voltage} Laws Analyze Analyze • T=Tmodeleval+Tmatsolve+Tctrl 35 36 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon 6 Speedup Analyze • If only accelerated model evaluation only about 2x speedup • If want better than 14x speed, must also attack control • T=Tmodeleval+Tmatsolve+Tctrl • What should we speedup first? • What happens if only speedup modeleval? – T=Tmatsolve+(Tmodeleval)/S+Tctrl 37 38 Penn ESE532 Fall 2020 -- DeHon Penn ESE532 Fall 2020 -- DeHon Model Evaluation: Trivial Spatial Parallelism Hardware Implementation •Every operation (*, + /) gets dedicated hardware. VD1 / vj I D1 = I s ´(e -1) •Implement task in space à use additional area for each operator. d VD1 / vj 1 GD1 = dV (I D1) = I s ´e ´ •Parallel – all operations occur simultaneously. D1 vj * * ÷ * * ÷ e f b e f b - * ÷ - * ÷ d c a d c a ex ex 39 40 Penn ESE532 Fall 2020 -- DeHonVerilog-AMS as Domain-Specific

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us