Custom Computing

Lecture 1: Custom Computing Technologies
Wayne Luk
Department of Computing
Imperial College London
https://www.doc.ic.ac.uk/~wl/teachlocal/cuscomp/
[email protected]
wl 2021

General-purpose computing: efficient?

AES, 128-bit key, 128-bit data

Implementation         Throughput     Power consumption   Efficiency (Gb/s/W)
ASIC [0]               3.84 Gbit/s    350 mW              11 (1/1)
FPGA [1]               1.32 Gbit/s    490 mW              2.7 (1/4)
Asm StrongARM [2]      31 Mbit/s      240 mW              0.13 (1/85)
Asm Pentium III [3]    648 Mbit/s     41.4 W              0.015 (1/800)
C Emb. Sparc [4]       133 kbit/s     120 mW              0.0011 (1/10,000)
Java Emb. Sparc [5]    450 bit/s      120 mW              0.0000037 (1/3,000,000)

[0] Application-Specific Integrated Circuit: 180 nm CMOS
[1] Field Programmable Gate Array: Amphion CS5230 on Virtex2 + Xilinx Virtex2 Power Estimator
[2] Dag Arne Osvik: 544 cycles AES-ECB on StrongARM SA-1110
[3] Helger Lipmaa PIII assembly handcoded + Intel Pentium III (1.13 GHz) Datasheet
[4] gcc, 1 mW/MHz @ 120 MHz Sparc (assumes 250 nm CMOS)
[5] Java on KVM (Sun J2ME, non-JIT) on 1 mW/MHz @ 120 MHz Sparc (assumes 250 nm CMOS)

• ASIC [0]: non-programmable hardware
• FPGA [1]: programmable hardware
• [2]-[5]: general-purpose processors

Source: P. Schaumont and I. Verbauwhede
Adapted from: J. Cong

TPU: Tensor Processing Unit
[Figure: systolic array]
Source: N.P. Jouppi et al.
We will find out the secret of systolic array design in Lecture 7!
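The efficiency column of the AES comparison is throughput divided by power. The sketch below recomputes it from the throughput and power figures in the table, together with each design's efficiency relative to the ASIC. The function and variable names are illustrative and not part of the lecture material; the slide's figures are rounded, so the recomputed ratios need not match the listed fractions exactly.

```python
# Throughput in bit/s and power in W, copied from the AES comparison table.
AES_DESIGNS = {
    "ASIC": (3.84e9, 0.350),
    "FPGA": (1.32e9, 0.490),
    "Asm StrongARM": (31e6, 0.240),
    "Asm Pentium III": (648e6, 41.4),
    "C Emb. Sparc": (133e3, 0.120),
    "Java Emb. Sparc": (450, 0.120),
}


def efficiency_gbps_per_watt(throughput_bps, power_w):
    """Energy efficiency in Gb/s per watt."""
    return throughput_bps / 1e9 / power_w


def efficiency_table(designs=AES_DESIGNS, baseline="ASIC"):
    """Map each design to (efficiency, factor by which the baseline is more efficient)."""
    base = efficiency_gbps_per_watt(*designs[baseline])
    table = {}
    for name, (throughput_bps, power_w) in designs.items():
        eff = efficiency_gbps_per_watt(throughput_bps, power_w)
        table[name] = (eff, base / eff)
    return table


if __name__ == "__main__":
    for name, (eff, factor) in efficiency_table().items():
        print(f"{name:16s} {eff:.2g} Gb/s/W  (1/{factor:.0f} of ASIC)")
```

Keeping throughput and power as separate inputs makes it easy to add another design point and see where it falls between the ASIC and the software implementations.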
Learning outcomes: ability to
• develop parametric descriptions of custom computers
• develop alternative designs for custom computers that meet specified requirements
• analyse the performance of a custom computer in terms of time and space
• evaluate space/time trade-offs between competing custom computing designs in order to determine optimal solutions
• use simulation to compare the intended and actual behaviour of custom computers

Module plan

Week (starting)   Monday                  Thursday                Remarks
2 (18/1)          Lecture 1, Lecture 2    Lecture 3, Ex 1         Technologies and systems; parametric block description
3 (25/1)          Lecture 4, Ex 2         Lecture 5, Ex 3         Patterns of computation; repeated composition; types, laws
4 (01/2)          Lecture 6, Lecture 7    Lecture 8, Ex 4         Reasoning and specialisation; sequential designs and pipelining; systolic design
5 (08/2)          Lecture 9, Ex 5         Lecture 10, Ex 6        Industrial case studies; state machines; summary
6 (15/2)          Lecture 11, Ex 7        Lecture 12, Ex 8        Streaming design; iterations; stream offsets; hardware mapping
7 (22/2)          Lecture 13, Ex 9        Lecture 14, Ex 10       Scheduling; design compilation; performance modelling
8 (01/3)          Lecture 15, Ex 11       Lecture 16 + Ex 12      Loops and cyclic graphs; industrial case studies; summary
9 (08/3)          Revision class          Revision class          Revision
10 (15/3)         -                       -                       Timed assessment week
11 (22/3)         -                       -                       Timed assessment week

Custom computing: key principles
• generalisation and specialisation
• often start with design: f0
• generalise f0 to become f(x)
  – f(x0) = f0 where x is a parameter, x0 is a specific value
• specialise f with values for x
  – to produce f1, f2, f3 … with tradeoffs in speed, size…
[Figure: design space f(x). Design f0 is generalised to f(x), with x=x0 giving back f0; f(x) is specialised at x=x1, x=x2, x=x3 to give designs f1, f2, f3.]

Benefits of customisation
• improvements in
  – accuracy: as needed, not necessarily 8, 32, 64, 128 bits
  – throughput: rate of producing results
  – latency: time between first input and first output
  – reconfiguration time: speed of adapting to changes
  – size: area, volume, weight
  – energy and power consumption: mobile and remote applications
  – development time: design and validation
  – cost: minimise fabrication, post-delivery fixes, enhancements
• need to prioritise design objectives
  – e.g. smallest design at a given speed consuming given energy
• opportunities for customisation
  – application-oriented, e.g. run-time conditions
  – implementation-oriented, e.g. technology used

Implementation technologies
• application-specific integrated circuit (ASIC)
  – high performance, low part cost: cheap if producing large volume
  – high risk, high development cost, slow time-to-market
  – costly to develop, build and test (Moore’s Second Law); inflexible

FPGA: Field Programmable Gate Array
[Figure: Xilinx Virtex-6 FPGA layout showing arithmetic blocks, I/O blocks, logic cells (105 elements) and memory blocks (20TB/s).]
Source: Maxeler

FPGA: getting more heterogeneous
[Figure: heterogeneous acceleration from data center to the edge. Labels: scalar engines (Arm dual-core Cortex-A72, Arm dual-core Cortex-R5), adaptable engines, intelligent AI engines, network-on-chip, I/O, custom memory hierarchy, any-to-any connectivity; scalar, sequential and complex compute; flexible parallel compute and data manipulation; vector, compute-intensive machine learning and signal processing; 160 GB/s of memory bandwidth per core; TB/s of PL-to-AI-engine bandwidth; deterministic performance and low latency. Example workloads: Video + AI, Genomics + AI, Risk Modeling + AI, Database + AI, Network IPS + AI, Storage + AI.]
Source: Xilinx

Accelerate clouds: Microsoft + Amazon
www.top500.org/news/microsoft-goes-all-in-for-fpgas-to-build-out-cloud-based-ai/
aws.amazon.com/ec2/instance-types/f1/

Implementation technologies
• Field-Programmable Gate Array (FPGA)
  – low risk, fast time-to-market, low development cost, high part cost
  – post-delivery improvement: fix bugs, update functions
  – customisable at run time: adapt to environment changes
  – prototype for ASIC
  – enable internet routing
• custom computing systems
  – stand-alone
  – PCIe / Infiniband
  – system-on-chip: instruction processor + FPGA

When to specialise?
• fabrication time: pre-fab optimisation (ASIC: Application-Specific Integrated Circuit)
  – specialise physical fabric, ↓ post-fab options
  – ↓ flexibility, ↑ efficiency for compilation and execution
• compile time: pre-execution optimisation (FPGA: Field Programmable Gate Array)
  – specialise initial mapping to fabric, ↓ execution options
  – ↓ efficiency for compilation, ↑ efficiency for execution
• run time (instruction processor, FPGA overlay or reconfiguration)
  – specialise mapping to fabric during execution
  – ↑ flexibility, ↓ efficiency for execution

Technology comparison
[Figure: technologies placed on two axes, Flexibility and Efficiency/Performance: General-Purpose Instruction Processors, Digital Signal Processors, Special-Purpose Instruction Processors, ASICs and FPGAs. Annotations: "temporal + spatial specialisation at compile time and run time"; "spatial specialisation at fab time and compile time, temporal specialisation at run time".]
Adapted from K. Fan, HPCA’09

Makimoto’s Wave: cyclical innovation
[Figure: wave alternating between "Generalisation at fab time, specialisation at compile/run time" and "Specialisation at fab time".]
Adapted from T. Makimoto, IEEE Computer’13

Design metrics
• NRE (non-recurring engineering) cost
  – one-time cost of designing a system
• total cost: total cost = NRE cost + unit cost * number of units
• size, performance, power
• flexibility
  – make changes to the hardware with low NRE cost
• time-to-prototype, time-to-market
• maintainability
• correctness, safety, robustness
Source: J. Wong

FPGA/ASIC crossover points
[Figure: cost against production volume for FPGA and ASIC, with regions labelled "FPGA Cost Advantage" and "ASIC Cost Advantage" on either side of each crossover point.]
Source: S.S.S.P. Rao

Current and future: System-on-Chip
[Figure: system-on-chip floorplan with an I/O ring and interface circuitry surrounding an embedded processor, fixed IP blocks, on-chip memory and reconfigurable logic.]
• Processor, e.g. ARM
  – functionality specified using software
• Fixed Intellectual Property Block
  – functionality fixed at design time
  – little post-fab flexibility
• Programmable Logic
  – circuit can be specified / modified after fabrication, possibly at run time
  – maybe slower than fixed IP block
Source: S. Wilton

Summary
• custom computing: theory and practice of customisation
  – from data centres/cloud computing to mobile appliances
• customisable off-the-shelf implementation technology
  – e.g. FPGAs, coarse-grained/hybrid processors, custom instructions
• factors favouring field-programmability
  – rise in FPGA capability: many exciting applications
  – rise in integrated circuit fabrication cost: zero for FPGA users!
  – customisation: facilitate product evolution and prototyping
• custom computing tools + applications at Imperial College
  – financial analysis/trading, multimedia processing, medical imaging
  – network firewall, data compression/encryption, mobile robots
  – bio-informatics, machine learning, bio-inspired/self-aware systems
see: http://cc.doc.ic.ac.uk
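The total-cost formula under Design metrics (total cost = NRE cost + unit cost * number of units) also explains the FPGA/ASIC crossover: the two cost lines meet at the volume where the difference in NRE cost equals the accumulated difference in unit cost. A minimal sketch follows; the function names and all cost figures are hypothetical values chosen only for illustration, not data from the lecture.

```python
def total_cost(nre_cost, unit_cost, units):
    """Total cost = NRE cost + unit cost * number of units."""
    return nre_cost + unit_cost * units


def crossover_volume(nre_a, unit_a, nre_b, unit_b):
    """Production volume at which technologies A and B cost the same.

    Solves nre_a + unit_a * n == nre_b + unit_b * n for n.
    Returns None when the unit costs are equal (the lines never cross).
    """
    if unit_a == unit_b:
        return None
    return (nre_b - nre_a) / (unit_a - unit_b)


# Hypothetical figures: low NRE and high part cost for the FPGA,
# high NRE and low part cost for the ASIC.
FPGA_NRE, FPGA_UNIT = 50_000, 100
ASIC_NRE, ASIC_UNIT = 2_000_000, 10

if __name__ == "__main__":
    n_star = crossover_volume(FPGA_NRE, FPGA_UNIT, ASIC_NRE, ASIC_UNIT)
    print(f"crossover at about {n_star:.0f} units")
    for units in (1_000, 100_000):
        fpga = total_cost(FPGA_NRE, FPGA_UNIT, units)
        asic = total_cost(ASIC_NRE, ASIC_UNIT, units)
        cheaper = "FPGA" if fpga < asic else "ASIC"
        print(f"{units} units: FPGA {fpga}, ASIC {asic} -> {cheaper} cheaper")
```

With these assumed figures the FPGA is cheaper below the crossover volume and the ASIC above it, which matches the qualitative claims on the slides: ASICs are cheap when producing large volumes, while FPGAs have low development cost and high part cost.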
