Poster/Demo Sessions Slides

Total Pages: 16

File Type: PDF, Size: 1020 KB

Poster/Demo Sessions Slides
From the RISC-V Workshop in Barcelona, 7-10 May 2018

Table of Contents for Poster/Demo Sessions Slides
• Derek Atkins, Slide 3
• Mary Bennett, Slides 4 – 5
• Ekaterina Berezina and Andrey Smolyarov, Slides 6 – 7
• Alex Bradbury, Slide 8
• Luca Carloni and Christian Palmiero, Slides 9 – 10
• Jie Chen, Slides 11 – 13
• Matt Cockrell, Slides 14 – 26
• Alberto Dassatti, Slides 27 – 28
• Christian Fabre, Slides 29 – 30
• Juan Fumero, Slides 31 – 32
• Nicolas Gaude and Hai Yu, Slides 33 – 36
• Chris Jones and Zdenek Prikryl, Slides 37 – 38
• Felix Kaiser, Slides 39 – 40
• Alexander Kamkin and Andrei Tatarnikov, Slides 41 – 42
• Luke Leighton, Slide 43
• Heng Lin, Slides 44 – 45
• Maja Malenko, Slide 46
• Eric Matthews and Lesley Shannon, Slides 47 – 48
• Paulo Matos, Slides 49 – 50
• Lucas Morais, Slides 51 – 52
• Mauro Olivieri, Slides 53 – 54
• Aleksandar Pajkanovic, Slides 55 – 56
• Atish Patra, Slides 57 – 60
• Shubhodeep Roy, Slides 61 – 63
• Boris Shingarov, Slides 64 – 65
• Christoph Schulz, Slides 66 – 67
• Wei Song, Rui Hou and Dan Meng, Slides 68 – 69
• Greg Sullivan, Slides 70 – 71
• Robert Trout, Slide 72
• Vasily Varaksin and Ekaterina Berezina, Slides 73 – 74
• Danny Ybarra, Slides 75 – 82
• Simon Davidmann and Lee Moore, Slides 83 – 87

Fast, Quantum-Resistant Secure Boot Solution for RISC-V MCUs with Off-Chip Program Stores (Derek Atkins, Slide 3)
Demo at the poster session: Mynewt RTOS on a HiFive1 development board, powered by the Walnut Digital Signature Algorithm™.
• Up to 40X faster than ECDSA at 128-bit security
• Small footprint: easily fits in the FE310's 8 KB OTP ROM
• Based on Group Theoretic Cryptography (GTC)
• GTC leverages structured groups, matrices and permutations, and arithmetic over finite fields

Run-time metrics (MCU: SiFive FE310, 32-bit RISC-V; CPU clock: 256 MHz; RTOS: Mynewt 1.3.0):

Method       Security Level   Verification Time
ECDSA P256   128-bit          179 ms
ECDSA P521   256-bit          (not supported)
WalnutDSA    128-bit          6.76 ms
WalnutDSA    256-bit          33.6 ms

What is CGEN and how does it work? (Mary Bennett, Slides 4 – 5)
A CGEN instruction definition captures an instruction's name, assembly syntax, encoding format, and semantics in a single description, for example:

```scheme
(define-insn
  (name "add")
  (comment "register add")
  (attrs base-isas)
  (syntax "add ${rd},${rs1},${rs2}")
  (format + (f-funct7 funct7) rs2 rs1 (f-funct3 funct3) rd (f-opcode opcode))
  (semantics (set DI rd (add DI rs1 rs2)))
)
```

About Syntacore (Ekaterina Berezina and Andrey Smolyarov, Slides 6 – 7)
IP company and founding member of the RISC-V Foundation; develops and licenses state-of-the-art RISC-V cores.
■ Initial line is available and shipping to customers
■ SDKs, samples in silicon, full collateral
■ Core team comes from 10+ years of highly relevant background
■ 2.5+ years of focused RISC-V development
Full service to specialize CPU IP for customer needs:
■ One-stop workload-specific customization for 10x improvements, with tools/compiler support
■ IP hardening at the required library node
■ SoC integration and SW migration
Copyright © 2018 Syntacore. All trademarks, product, and brand names belong to their respective owners.
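Returning briefly to the CGEN example above: the format and semantics fields describe how the instruction is encoded and what it does. The following is a minimal, hypothetical C sketch of that behaviour for the RV32I add instruction; it is not CGEN output (CGEN generates assembler, disassembler, and simulator support from such descriptions), and field positions simply follow the standard R-type layout.

```c
#include <stdint.h>

static uint32_t regs[32];   /* x0..x31; x0 stays zero */

/* Decode an R-type word and apply the "add" semantics: rd = rs1 + rs2. */
static void execute_add(uint32_t insn)
{
    uint32_t opcode = insn & 0x7f;           /* bits  6:0  (f-opcode) */
    uint32_t rd     = (insn >> 7)  & 0x1f;   /* bits 11:7             */
    uint32_t funct3 = (insn >> 12) & 0x07;   /* bits 14:12 (f-funct3) */
    uint32_t rs1    = (insn >> 15) & 0x1f;   /* bits 19:15            */
    uint32_t rs2    = (insn >> 20) & 0x1f;   /* bits 24:20            */
    uint32_t funct7 = (insn >> 25) & 0x7f;   /* bits 31:25 (f-funct7) */

    if (opcode == 0x33 && funct3 == 0x0 && funct7 == 0x00 && rd != 0)
        regs[rd] = regs[rs1] + regs[rs2];    /* (set rd (add rs1 rs2)) */
}
```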
SCR1 overview
Compact MCU core for deeply embedded applications:
■ Open source under the SHL license since May 2017 (an Apache 2.0 derivative with HW-specific terms); unrestricted commercial use allowed
■ RV32I|E[MC] ISA
■ <15 kGates in the basic RV32EC configuration
■ 2- to 4-stage pipeline
■ M-mode only
■ Optional configurable IPIC: 8..32 IRQs
■ Optional integrated Debug Controller, OpenOCD based
■ Verification suite
■ Documentation
■ Best-effort support provided
■ Commercial support available
https://github.com/syntacore/scr1

Performance* per MHz          -O2     -best**
Dhrystone 2.1 (DMIPS/MHz)     1.28    1.72
Coremark 1.0 (per MHz)        2.60    2.78

* Dhrystone 2.1, Coremark 1.0, GCC 7.1, benchmarks run from TCM
** -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto
Copyright © 2018 Syntacore. All trademarks, product, and brand names belong to their respective owners.

Diving into RISC-V LLVM (Alex Bradbury, Slide 8)
● Why RISC-V LLVM
● Current status
● Approach
● Next steps
● Supporting your own extension
● Get involved
Alex Bradbury, [email protected], @asbradbury, @lowrisc

Securing a RISC-V Core for IoT Applications with Dynamic Information Flow Tracking (DIFT) (Luca Carloni and Christian Palmiero, Slides 9 – 10)
Motivation
• Many IoT devices are prone to attacks due to low-level programming errors (e.g. buffer overflow and format string attacks)
Contributions
• Design and implementation of a low-overhead protection scheme based on DIFT to secure RISC-V cores for IoT applications
• Demonstration with an FPGA-based prototype
How DIFT works (tag initialization at the I/O channels, propagation through the vulnerable program, detection, and a security exception):
• Step 1. Each data item coming from potentially malicious channels is extended with a tag that marks it as spurious
• Step 2. During program execution, the RISC-V core performs tag propagation to keep track of information flows generated by spurious data
• Step 3. By tag checking, the RISC-V core detects if spurious data are used in an unsafe manner and generates a security exception
C. Palmiero, G. Di Guglielmo, L. Lavagno, and L. Carloni

Design, Implementation, and Evaluation of the DIFT-Enhanced RISC-V Processor Core
[Block diagram of the D-RI5CY pipeline: tag-extended PC, register file, and data memory; tag check, tag update, and tag propagation logic alongside the decoder, ALU, MULT/DIV, FPU, and load/store unit across the IF, ID, EX, and WB stages; tag propagation/control CSRs (TPR, TCR); off-chip instruction and data memories behind the instruction and data caches.]
• Design and implementation of a DIFT architecture for an optimized 4-stage, in-order, 32-bit RISC-V core based on the PULPino platform
• The D-RI5CY architecture supports various software-programmable security policies; it was evaluated with policies for memory protection
• The experimental results demonstrate that securing a RISC-V core with DIFT is feasible, does not incur any run-time overhead, and requires negligible resources
C. Palmiero, G. Di Guglielmo, L. Lavagno, and L. Carloni
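As an illustration of Steps 2 and 3 above, here is a minimal, hypothetical C sketch of DIFT-style tag propagation and checking for a few instruction classes. It is not the D-RI5CY implementation, which does this in hardware under software-programmable policy registers; the one-bit tag granularity and all names below are assumptions made for clarity.

```c
#include <stdint.h>
#include <stdbool.h>

/* One spuriousness tag per register and per memory word (assumed granularity). */
static uint32_t regs[32];
static bool     reg_tag[32];
static uint32_t mem[1024];
static bool     mem_tag[1024];

static void security_exception(const char *why) { (void)why; /* trap to a handler */ }

/* Step 2: tag propagation. The result of an ALU op is spurious if any source is. */
static void exec_add(int rd, int rs1, int rs2)
{
    regs[rd]    = regs[rs1] + regs[rs2];
    reg_tag[rd] = reg_tag[rs1] || reg_tag[rs2];
}

/* Loads copy the tag of the addressed memory word into the destination register. */
static void exec_load(int rd, int rs1, int32_t off)
{
    uint32_t idx = ((regs[rs1] + (uint32_t)off) / 4u) & 1023u;
    regs[rd]    = mem[idx];
    reg_tag[rd] = mem_tag[idx];
}

/* Step 3: tag checking. Using a spurious value where the policy forbids it
 * (here: as an indirect jump target) raises a security exception. */
static void exec_jalr(int rs1)
{
    if (reg_tag[rs1])
        security_exception("indirect jump through spurious data");
    /* ...the control transfer itself is omitted in this sketch... */
}
```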
Ultra-Low-Power Cluster I$ Exploration with RI5CY (Jie Chen, Slides 11 – 13)
[Block diagram of the cluster domain (cluster clock and VDD, dual-clock FIFOs, DMA, 16 TCDM banks, eight RI5CY cores with 128-bit L0 prefetch buffers, logarithmic interconnects, and AXI4 cluster, peripheral, and instruction buses) comparing instruction-cache organizations: a single-ported shared I$, a multi-port shared read-only I$ built from dual-port SCM tag/data ways, and private per-core I$.]

Two-Level Icache & Results
[Block diagrams of the multi-port shared instruction cache and the two-level relaxed-timing instruction cache: per-core 128-bit L0 prefetch buffers, private L1 I$ per core with its own cache controller, a shared L1.5 level with a master cache controller, read-only logarithmic interconnects, and a 64-bit AXI4 instruction bus.]

Comparison of throughput, area, and power efficiency among the SP, MP, and two-level I$ types (1.00 marks the best of the three in each row):

Characteristic     SP      MP      Two-Level
Throughput [1]     -5%     1.00    -15%
Area               1.00    +50%    +40%
Power efficiency   -25%    -50%    1.00

[1] For throughput, the value of the two-level I$ is the worst case. All values in the table are averages, except [1]. For more details and data, please see the poster.

REFERENCES
[1] I. Loi, A. Capotondi, D. Rossi, A. Marongiu and L. Benini, "The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores," IEEE Transactions on Multi-Scale Computing Systems, vol. PP, no. 99, pp. 1-1.
[2] L. Benini, E. Flamand, D. Fuin, and D. Melpignano, "P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator," in 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2012, pp. 983-987.
[3] A. Teman, D. Rossi, P. Meinerzhagen, L. Benini, and A. Burg, "Power, Area, and Performance Optimization of Standard Cell Memory Arrays Through Controlled Placement," ACM Trans. Des. Autom. Electron. Syst., vol. 21, no. 4, pp. 59:1-59:25, May 2016. [Online]. Available: http://doi.acm.org/10.1145/2890498

Evaluation of RISC-V for Pixel Visual Core (Matt Cockrell, Slides 14 – 26)
Matt Cockrell ([email protected]), May 9th, 2018

Overview
● Use case
● Core selection
● Integration

Background: Pixel Visual Core
"Pixel Visual Core is the Google-designed Image Processing Unit (IPU)—a fully programmable, domain-specific processor designed from scratch to deliver maximum performance at low power." [1]
Critical point: a dedicated A53 (top left in the magnified image of Pixel Visual Core) aggregates application-layer IPU resource requests and configures the IPU appropriately.
[1] "Pixel Visual Core: image processing and machine learning on Pixel 2", Oct 17, 2017, Ofer Shacham, Google Inc.

Pixel Visual Core today
A dedicated A-class CPU, alongside the main CPU, runs the OS and supports IPU I/O.

Modern SoC - multiple accelerators
A modern mobile SoC contains multiple devices (X, Y, Z processing units), so the main CPU is no longer dedicated only to the IPU ⇒ a local controller is desired. Add a microcontroller as the job scheduling and dispatch unit. Open-source IP is an interesting option for this microcontroller.

Core Selection Considerations
● Level of Effort: how difficult it would be to work with and integrate
● Risk: stability and reliability of support
● License: flexibility of use

Candidate 1: Bottle Rocket (https://github.com/google/bottlerocket)
● Internal project to demonstrate the ability to easily develop a custom RISC-V implementation by leveraging Rocket Chip
● Implements RV32IMC
● Represents evaluating Rocket Chip as an option.
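The slides above argue for a small controller core that accepts IPU work requests and dispatches them to the accelerator. As a purely illustrative, hypothetical C sketch (not Google's firmware; the register map and all names are invented), such a scheduling and dispatch loop might look like this:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical memory-mapped accelerator interface (addresses and bits invented). */
#define ACCEL_BASE   0x40000000u
#define ACCEL_CMD    (*(volatile uint32_t *)(ACCEL_BASE + 0x00u))
#define ACCEL_ARG    (*(volatile uint32_t *)(ACCEL_BASE + 0x04u))
#define ACCEL_STATUS (*(volatile uint32_t *)(ACCEL_BASE + 0x08u))
#define ACCEL_BUSY   0x1u

typedef struct {
    uint32_t cmd;   /* which kernel/usecase to run      */
    uint32_t arg;   /* e.g. address of a job descriptor */
} job_t;

/* Simple ring buffer; a host-facing mailbox interrupt handler would fill it. */
#define QLEN 16u
static job_t queue[QLEN];
static volatile uint32_t head, tail;

static bool pop_job(job_t *out)
{
    if (head == tail)
        return false;                 /* queue empty */
    *out = queue[tail % QLEN];
    tail++;
    return true;
}

void scheduler_loop(void)
{
    job_t job;
    for (;;) {
        if (!pop_job(&job))
            continue;                 /* could WFI here instead of spinning */
        while (ACCEL_STATUS & ACCEL_BUSY)
            ;                         /* wait for the accelerator to go idle */
        ACCEL_ARG = job.arg;          /* program the job parameters */
        ACCEL_CMD = job.cmd;          /* kick off execution */
    }
}
```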
Recommended publications
  • Lecture 26: Domain Specific Architectures, Chapter 07, CAQA 6th Edition
    Lecture 26: Domain Specific Architectures, Chapter 07, CAQA 6th Edition. CSCE 513 Computer Architecture, Department of Computer Science and Engineering, Yonghong Yan ([email protected]), https://passlab.github.io/CSCE513
    Copyright and Acknowledgements: Copyright © 2019, Elsevier Inc. All rights reserved (textbook slides); "Machine Learning for Science" (2018) and "A Superfacility Model for Science" (2017) by Kathy Yelick, https://people.eecs.berkeley.edu/~yelick/talks.html
    CSE 564 Class Contents:
    § Introduction to Computer Architecture (CA)
    § Quantitative Analysis, Trend and Performance of CA (Chapter 1)
    § Instruction Set Principles and Examples (Appendix A)
    § Pipelining and Implementation, RISC-V ISA and Implementation (Appendix C, RISC-V (riscv.org) and UCB RISC-V implementation)
    § Memory System: Technology, Cache Organization and Optimization, Virtual Memory (Appendix B and Chapter 2; midterm covered up to memory technology and cache organization)
    § Instruction Level Parallelism: Dynamic Scheduling, Branch Prediction, Hardware Speculation, Superscalar, VLIW and SMT (Chapter 3)
    § Data Level Parallelism: Vector, SIMD, and GPU (Chapter 4)
    § Thread Level Parallelism (Chapter 5)
    § Domain-Specific Architecture (Chapter 7)
    [Chart: "The Moore's Law Trend", Superscalar/Vector/Parallel: peak performance growth from early machines (CDC 7600, IBM 360/195) through GFlop/s vector and superscalar systems (Cray 1, Cray X-MP, Cray 2), TFlop/s parallel systems (TMC CM-2/CM-5, Cray T3D, ASCI Red, ASCI White Pacific), up to 1 PFlop/s (IBM BG/L); "2X transistors/chip every 1.5 years"; y-axis in floating-point operations per second.]
  • P1360R0: Towards Machine Learning for C++: Study Group 19
    P1360R0: Towards Machine Learning for C++: Study Group 19. Date: 2018-11-26 (post-SAN mailing), 10 AM ET. Project: ISO JTC1/SC22/WG21: Programming Language C++. Audience: SG19, WG21. Authors: Michael Wong (Codeplay), Vincent Reverdy (University of Illinois at Urbana-Champaign, Paris Observatory), Robert Douglas (Epsilon), Emad Barsoum (Microsoft), Sarthak Pati (University of Pennsylvania), Peter Goldsborough (Facebook), Frank Seide (MS). Contributors' emails: [email protected]. Reply to: [email protected]
    Contents: Introduction; Motivation; Scope; Meeting frequency and means; Outreach to ML/AI/Data Science community; Liaison with other groups; Future meetings; Conclusion; Acknowledgements; References.
    Introduction: This paper proposes a WG21 SG for Machine Learning with the goal of making Machine Learning a first-class citizen in ISO C++. It is the collaboration of a number of key industry, academic, and research groups, through several connections in the CPPCON BoF [reference], LLVM 2018 discussions, and the C++ San Diego meeting. The intention is to support such an SG and describe its scope, in terms of potential work resulting in papers submitted for future C++ Standards, or collaboration with other SGs. We will also propose ongoing teleconferences, meeting frequency and locations, as well as outreach to ML data scientists and conferences, and liaison with other Machine Learning groups such as at Khronos and ISO. As of the SAN meeting, this group has been officially created as SG19, and we will begin teleconferences immediately, after the US Thanksgiving and after NIPS.
  • SIMD in Different Processors
    SIMD Introduction: Application, Hardware & Software. Champ Yen (嚴梓鴻), [email protected], http://champyen.blogspot.tw
    Agenda: What & Why SIMD; SIMD in different Processors; SIMD for Software Optimization; What are important in SIMD?; Q & A. Link to these slides: https://goo.gl/Rc8xPE
    What is SIMD (Single Instruction Multiple Data)? A scalar loop processes one lane (one point) per iteration, while the SIMD version processes 8 points simultaneously:

```c
/* scalar: process 1 point per iteration */
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        out[x] = a[x] + b[x];
    }
    a += width; b += width; out += width;
}

/* ARM NEON (requires <arm_neon.h>): process 8 points simultaneously */
for (y = 0; y < height; y++) {
    for (x = 0; x < width; x += 8) {
        uint16x8_t va, vb, vout;
        va = vld1q_u16(a + x);
        vb = vld1q_u16(b + x);
        vout = vaddq_u16(va, vb);
        vst1q_u16(out + x, vout);
    }
    a += width; b += width; out += width;
}
```

    Why and how do we use SIMD? Image processing, gaming, scientific computing, deep neural networks.
    SIMD in different processors - CPU: x86 (MMX, SSE, AVX, AVX-512); ARM application profile (v5 DSP Extension, v6 SIMD, v7 NEON, v8 Advanced SIMD (NEON), SVE).
    SIMD in different processors - GPU: SIMD (AMD GCN, ARM Mali); WaveFront (Nvidia, Imagination PowerVR Rogue).
    SIMD in different processors - DSP: Qualcomm Hexagon 600 HVX, Cadence IVP P5, Synopsys EV6x, CEVA XM4.
    SIMD optimization sits at the intersection of SIMD hardware, software design, and the programming model/framework: auto/semi-auto methods, compiler intrinsics, and specific frameworks/infrastructure.
    https://software.intel.com/sites/landingpage/IntrinsicsGuide/
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf
  • An Analysis and Implementation of the HDR+ Burst Denoising Method
    Published in Image Processing On Line on 2021-05-25. Submitted on 2021-03-01, accepted on 2021-05-05. ISSN 2105-1232, © 2021 IPOL & the authors, CC-BY-NC-SA. This article is available online with supplementary materials, software, datasets and online demo at https://doi.org/10.5201/ipol.2021.336
    An Analysis and Implementation of the HDR+ Burst Denoising Method
    Antoine Monod (1,2), Julie Delon (1), Thomas Veit (2); (1) MAP5, Université de Paris, France ({antoine.monod, julie.delon}@u-paris.fr); (2) GoPro Technology, France ([email protected])
    Abstract: HDR+ is an image processing pipeline presented by Google in 2016. At its core lies a denoising algorithm that uses a burst of raw images to produce a single higher quality image. Since it is designed as a versatile solution for smartphone cameras, it does not necessarily aim for the maximization of standard denoising metrics, but rather for the production of natural, visually pleasing images. In this article, we specifically discuss and analyze the HDR+ burst denoising algorithm architecture and the impact of its various parameters. With this publication, we provide an open source Python implementation of the algorithm, along with an interactive demo.
    Source Code: The Python implementation of HDR+ has been peer-reviewed and accepted by IPOL. The source code, its documentation, and the online demo are available from the web page of this article. Compilation and usage instructions are included in the README.txt file of the archive. The code is also available on GitHub.
    Keywords: burst denoising; computational photography; motion estimation; temporal filtering; high dynamic range; image signal processor
    Introduction: Image noise is a common issue in digital photography, and that issue is even more prominent in smaller sensors used in devices like smartphones or action cameras.
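    The abstract above centres on combining a burst of noisy raw frames into one cleaner image. As a deliberately simplified, hypothetical C sketch (not the HDR+ merge, which operates on aligned tiles in the frequency domain with motion-robust weights), plain temporal averaging of already-aligned frames illustrates the principle:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy temporal filter: average N already-aligned raw frames pixel by pixel.
 * For independent noise, averaging N frames reduces the noise standard
 * deviation by roughly sqrt(N); HDR+ refines this with tile-wise alignment
 * and frequency-domain merging, which is not modeled here. */
void merge_burst_average(const uint16_t *const *frames, size_t n_frames,
                         size_t n_pixels, uint16_t *out)
{
    for (size_t p = 0; p < n_pixels; p++) {
        uint32_t acc = 0;
        for (size_t f = 0; f < n_frames; f++)
            acc += frames[f][p];
        out[p] = (uint16_t)(acc / n_frames);
    }
}
```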
  • Gables: A Roofline Model for Mobile SoCs
    2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). Gables: A Roofline Model for Mobile SoCs. Mark D. Hill (Computer Sciences Department, University of Wisconsin—Madison, [email protected]) and Vijay Janapa Reddi (School of Engineering and Applied Sciences, Harvard University, [email protected]).
    Abstract—Over a billion mobile consumer system-on-chip (SoC) chipsets ship each year. Of these, the mobile consumer market undoubtedly involving smartphones has a significant market share. Most modern smartphones comprise of advanced SoC architectures that are made up of multiple cores, GPS, and many different programmable and fixed-function accelerators connected via a complex hierarchy of interconnects with the goal of running a dozen or more critical software usecases under strict power, thermal and energy constraints. The steadily growing complexity of a modern SoC challenges hardware computer architects on how best to do early stage ideation. Late SoC design typically relies on detailed full-system simulation once the hardware is specified and accelerator software is written or ported.
    From the introduction: "…systems with multiple cores, GPUs, and many accelerators—often called intellectual property (IP) blocks, driven by the need for performance. These cores and IPs interact via rich interconnection networks, caches, coherence, 64-bit address spaces, virtual memory, and virtualization. Therefore, consumer SoCs deserve the architecture community's attention. Consumer SoCs have long thrived on tight integration and extreme heterogeneity, driven by the need for high performance in severely constrained battery and thermal power envelopes, and all-day battery life. A typical mobile SoC includes a camera image signal processor (ISP) for high-frame-…"
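    Gables builds on the classic roofline model, in which attainable throughput is bounded by the smaller of peak compute and memory bandwidth times operational intensity. A minimal C sketch of that single-processor bound (illustrative numbers only; Gables itself extends this to multiple IP blocks sharing bandwidth, which is not modeled here):

```c
#include <stdio.h>

/* Classic single-device roofline: attainable GFLOP/s is capped either by the
 * device's peak compute or by how fast memory can feed it. */
static double roofline_gflops(double peak_gflops,      /* peak compute       */
                              double peak_bw_gbs,      /* memory BW in GB/s  */
                              double intensity_flopb)  /* FLOPs per byte     */
{
    double bw_bound = peak_bw_gbs * intensity_flopb;
    return bw_bound < peak_gflops ? bw_bound : peak_gflops;
}

int main(void)
{
    /* Illustrative numbers only (not taken from the Gables paper). */
    double peak = 100.0, bw = 25.0;
    for (double i = 0.25; i <= 16.0; i *= 2.0)
        printf("intensity %5.2f FLOP/B -> %6.1f GFLOP/s attainable\n",
               i, roofline_gflops(peak, bw, i));
    return 0;
}
```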
  • Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision
    Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision. Yuhao Zhu (University of Rochester), Anand Samajdar (Georgia Institute of Technology), Matthew Mattina (ARM Research), Paul Whatmough (ARM Research).
    Abstract: Continuous computer vision (CV) tasks increasingly rely on convolutional neural networks (CNN). However, CNNs have massive compute demands that far exceed the performance and energy constraints of mobile devices. In this paper, we propose and develop an algorithm-architecture co-designed system, Euphrates, that simultaneously improves the energy-efficiency and performance of continuous vision tasks. Our key observation is that changes in pixel data between consecutive frames represents visual motion. We first propose an algorithm that leverages this motion information to relax the number of expensive CNN inferences required by continuous vision applications. We co-design a mobile System-on-a-Chip (SoC) architecture to maximize the efficiency of the new algorithm. The key to our architectural augmentation is to co-optimize different SoC IP blocks in the vision pipeline collectively. Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) …
    [Fig. 1 (hand-crafted detectors such as Haar and HOG versus scaled-down and state-of-the-art CNN detectors such as Tiny YOLO, SSD, YOLO, and Faster R-CNN, plotted against the compute capability at a 1 W power budget): Accuracy and compute requirement (TOPS) comparison between object detection techniques. Accuracies are measured against the widely-used PASCAL VOC 2007 dataset [32], and TOPS is based on the 480p (640×480) resolution common in smartphone cameras.] …requirements measured in Tera Operations Per Second (TOPS) as well as accuracies between different detectors under 60 frames per second (FPS).
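    The key idea in the abstract, using inter-frame motion to skip full CNN inferences, can be sketched in C as below. This is not the Euphrates architecture (which does this with dedicated SoC hardware and ISP-supplied motion vectors); the frame fields, the stub detector, and the fixed re-detection interval are assumptions for illustration.

```c
#include <stdbool.h>

typedef struct { float x, y, w, h; } box_t;
typedef struct {
    const unsigned char *pixels;  /* frame data (unused by this sketch)        */
    float mv_dx, mv_dy;           /* ISP-style motion estimate for the region  */
} frame_t;

/* Stub standing in for an expensive CNN inference call. */
static box_t run_cnn_detector(const frame_t *f)
{
    (void)f;
    box_t b = { 0.0f, 0.0f, 10.0f, 10.0f };
    return b;
}

/* Run the CNN only every `interval` frames; in between, extrapolate the last
 * detection using the motion estimate the imaging pipeline already computes. */
box_t track_object(const frame_t *f, unsigned frame_idx, unsigned interval)
{
    static box_t last;
    static bool have_last = false;

    if (!have_last || (frame_idx % interval) == 0u) {
        last = run_cnn_detector(f);   /* expensive path */
        have_last = true;
    } else {
        last.x += f->mv_dx;           /* cheap path: shift the box by the motion */
        last.y += f->mv_dy;
    }
    return last;
}
```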
  • Concepts Introduced in Chapter 7; Reasons for Adoption of Domain Specific Architectures
    Concepts Introduced in Chapter 7:
    • guidelines for domain specific architectures (DSAs)
    • Google's Tensor Processing Unit
    • Microsoft's Catapult
    • Google's Pixel Visual Core
    • crosscutting issues
    Reasons for Adoption of Domain Specific Architectures: The end of Dennard scaling and the slowing of Moore's Law mean we need to lower the energy per operation to improve performance. If order-of-magnitude performance improvements are to be obtained, we need to increase the number of arithmetic operations per instruction from one to hundreds. Conventional architecture innovations may not be a good match to some domains. Many believe that future computers will consist of standard processors that can run conventional programs along with domain-specific processors that can very efficiently perform only a narrow range of tasks.
    Factors That Can Make a DSA Cost Effective: Find a domain whose demand for applications is for enough chips to justify the nonrecurring engineering (NRE) costs. DSAs are best applied for small compute-intensive kernels of larger systems, where a significant fraction of the computation is done for some applications. This means that future computers may be much more heterogeneous than current homogeneous multicore chips.
    DSA Guidelines: Use dedicated memories to minimize the distance over which data is moved. Multi-level caches are expensive in area and energy for moving data. Many domains have predictable memory access patterns with little data reuse. DSA programmers understand their domain, so data movement can be more efficient with software-controlled memories.
  • CloudLeak: DNN Model Extractions from Commercial MLaaS Platforms
    CloudLeak: DNN Model Extractions from Commercial MLaaS Platforms. Yier Jin, Honggang Yu, and Tsung-Yi Ho; University of Florida, National Tsing Hua University. [email protected], #BHUSA @BLACKHATEVENTS
    Who We Are: Yier Jin, PhD (@jinyier), Associate Professor and Endowed IoT Term Professor, The Warren B. Nelms Institute for the Connected World, Department of Electrical and Computer Engineering, University of Florida. Research interests: hardware and circuit security; Internet of Things (IoT) and cyber-physical system (CPS) design; functional programming and trusted IP cores; autonomous system security and resilience; trusted and resilient high-performance computing.
    Tsung-Yi Ho, PhD (@TsungYiHo1), Professor, Department of Computer Science, National Tsing Hua University; Program Director, AI Innovation Program, Ministry of Science and Technology, Taiwan. Research interests: hardware and circuit security; trustworthy AI; design automation for emerging technologies.
    Honggang Yu (@yuhonggang), Visiting PhD student, The Warren B. Nelms Institute for the Connected World, Department of Electrical and Computer Engineering, University of Florida. Research interests: intersection of security, privacy, and machine learning; embedded systems security; Internet of Things (IoT) security; machine learning and deep learning with applications in VLSI computer-aided design (CAD).
    Outline: Background and motivation (AI interface APIs in the cloud; existing attacks and defenses); adversarial-example-based model stealing (adversarial examples …)
  • COEN-4730 Computer Architecture, Lecture 12: Domain Specific Architectures (DSA), Chapter 7
    COEN-4730 Computer Architecture, Lecture 12: Domain Specific Architectures (DSA), Chapter 7. Cristinel Ababei, Dept. of Electrical and Computer Engineering, Marquette University.
    Introduction: We need a factor of 100 improvement in the number of operations per instruction, which requires domain specific architectures (DSAs). For ASICs, NRE cannot be amortized over large volumes, and FPGAs are less efficient than ASICs.
    • Video: https://www.acm.org/hennessy-patterson-turing-lecture
    • Paper: https://cacm.acm.org/magazines/2019/2/234352-a-new-golden-age-for-computer-architecture/fulltext
    • Slides: https://iscaconf.org/isca2018/docs/HennessyPattersonTuringLectureISCA4June2018.pdf
    Machine Learning Domain; Artificial Intelligence, Machine Learning and Deep Learning.
    Example Domain: Deep Neural Networks. Three of the most important DNNs: 1. Multi-Layer Perceptrons, 2. Convolutional Neural Networks, 3. Recurrent Neural Networks.
    Convolutional Neural Network parameters:
    ◼ DimFM[i-1]: dimension of the (square) input feature map
    ◼ DimFM[i]: dimension of the (square) output feature map
    ◼ DimSten[i]: dimension of the (square) stencil
    ◼ NumFM[i-1]: number of input feature maps
    ◼ NumFM[i]: number of output feature maps
    ◼ Number of neurons: NumFM[i] × DimFM[i]²
    ◼ Number of weights per output feature map: NumFM[i-1] × DimSten[i]²
    ◼ Total number of weights per layer: NumFM[i] × …
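    The last item above is cut off in this extract; the usual convolutional-layer count (number of output feature maps times the weights per output feature map) is used below as an assumption. A small C sketch of these layer-size calculations:

```c
#include <stdio.h>

/* Per-layer sizes for a square convolutional layer, following the slide's
 * notation. The total-weights formula is the usual conv-layer arithmetic
 * (the slide text is truncated at that point). Biases are ignored. */
struct conv_layer { long num_fm_in, num_fm_out, dim_fm_out, dim_sten; };

static long neurons(const struct conv_layer *l)
{
    return l->num_fm_out * l->dim_fm_out * l->dim_fm_out;
}

static long weights_per_output_fm(const struct conv_layer *l)
{
    return l->num_fm_in * l->dim_sten * l->dim_sten;
}

static long total_weights(const struct conv_layer *l)
{
    return l->num_fm_out * weights_per_output_fm(l);
}

int main(void)
{
    /* Illustrative numbers only: 64 -> 128 feature maps, 56x56 output, 3x3 stencil. */
    struct conv_layer l = { 64, 128, 56, 3 };
    printf("neurons: %ld, weights/output FM: %ld, total weights: %ld\n",
           neurons(&l), weights_per_output_fm(&l), total_weights(&l));
    return 0;
}
```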
  • Review & Background
    Spring 2018 :: CSE 502, Review & Background, Nima Honarmand.
    Measuring & Reporting Performance. Performance Metrics:
    • Latency (execution/response time): time to finish one task
    • Throughput (bandwidth): number of tasks finished per unit of time. Throughput can exploit parallelism, latency can't; sometimes complementary, often contradictory.
    • Example: move people from A to B, 10 miles. Car: capacity = 5, speed = 60 miles/hour. Bus: capacity = 60, speed = 20 miles/hour. Latency: car = 10 min, bus = 30 min. Throughput: car = 15 PPH (w/ return trip), bus = 60 PPH. Pick the right metric for your goals.
    Performance Comparison:
    • "Processor A is X times faster than processor B" if Latency(P, A) = Latency(P, B) / X, or Throughput(P, A) = Throughput(P, B) * X
    • "Processor A is X% faster than processor B" if Latency(P, A) = Latency(P, B) / (1 + X/100), or Throughput(P, A) = Throughput(P, B) * (1 + X/100)
    • Car/bus example. Latency? Car is 3 times (200%) faster than bus. Throughput? Bus is 4 times (300%) faster than car.
    Latency/throughput of what program? Very difficult question! Best case: you always run the same set of programs, so just measure their execution time; too idealistic. Use benchmarks: representative programs chosen to measure performance, which (hopefully) predict the performance of the actual workload. Prone to benchmarketing: "The misleading use of unrepresentative benchmark software results in marketing a computer system" (wiktionary.com).
    Types of Benchmarks:
    • Real programs. Example: CAD, text processing, business apps, scientific apps. Need to know program inputs and options (not just code); may not know what programs users will run; require a lot of effort to port.
    • Kernels. Small key pieces (inner loops) of scientific programs where the program spends most of its time. Example: Livermore loops, LINPACK.
    • Toy benchmarks. e.g.
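    As a quick check of the speedup definitions above, here is a tiny C program applying them to the car/bus numbers from the slide (the helper names are just for illustration):

```c
#include <stdio.h>

/* "A is X times faster than B" for a latency metric: X = latency_B / latency_A. */
static double times_faster_latency(double lat_a, double lat_b) { return lat_b / lat_a; }

/* For a throughput metric: X = throughput_A / throughput_B. */
static double times_faster_throughput(double thr_a, double thr_b) { return thr_a / thr_b; }

int main(void)
{
    /* Car vs. bus, from the slide: latency 10 min vs. 30 min,
     * throughput 15 PPH vs. 60 PPH. */
    double x_lat = times_faster_latency(10.0, 30.0);      /* car vs. bus */
    double x_thr = times_faster_throughput(60.0, 15.0);   /* bus vs. car */

    printf("Latency:    car is %.0fx (%.0f%%) faster than bus\n", x_lat, (x_lat - 1) * 100);
    printf("Throughput: bus is %.0fx (%.0f%%) faster than car\n", x_thr, (x_thr - 1) * 100);
    return 0;
}
```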
  • MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect
    MemPool: A Shared-L1 Memory Many-Core Cluster with a Low-Latency Interconnect. Matheus Cavalcante (ETH Zürich), Samuel Riedel (ETH Zürich), Antonio Pullini (GreenWaves Technologies, Grenoble), Luca Benini (ETH Zürich and Università di Bologna). Contacts: matheusd at iis.ee.ethz.ch, sriedel at iis.ee.ethz.ch, pullinia at iis.ee.ethz.ch, lbenini at iis.ee.ethz.ch
    Abstract—A key challenge in scaling shared-L1 multi-core clusters towards many-core (more than 16 cores) configurations is to ensure low-latency and efficient access to the L1 memory. In this work we demonstrate that it is possible to scale up the shared-L1 architecture: We present MemPool, a 32-bit many-core system with 256 fast RV32IMA "Snitch" cores featuring application-tunable execution units, running at 700 MHz in typical conditions (TT/0.80 V/25 °C). MemPool is easy to program, with all the cores sharing a global view of a large L1 scratchpad memory pool, accessible within at most 5 cycles. In MemPool's physical-aware design, we emphasized the exploration, design, and optimization of the low-latency processor-to-L1-memory interconnect.
    From the introduction: "…domains, from the streaming processors of Graphics Processing Units (GPUs) [8], to the ultra-low-power domain with GreenWaves' GAP8 processor [9], to high-performance embedded signal processing with the Kalray processor clusters [10], to aerospace applications with the Ramon Chips' RC64 system [11]. However, as we will detail in Section II, these clusters only scale to low tens of cores and suffer from long memory access latency due to the topology of their interconnects. In this paper, we set out to design and optimize the first scaled-up many-core system with shared low-latency L1 memory."
  • Huawei Smartphones Without a Future: Chip Supplies and the Android License Halted
    No. 200 / '19, 20 May 2019, MAGAZINE.
    Front page: Sony leaves digital terrestrial TV, goodbye to Cine Sony and Pop (p. 02); Sound United also buys Onkyo and Pioneer (p. 04); At Dolby's headquarters with Panasonic: Vision and Atmos for everyone (p. 16); Windows 10: Microsoft says goodbye to universal apps (p. 20).
    Huawei smartphones without a future: chip supplies and the Android license halted. It's official: after Trump's ban on Huawei in the USA, Google revokes the Android license from the Chinese company. Relations with Intel and Qualcomm have also been cut off (p. 02).
    Also in this issue: OnePlus 7 Pro is official, lights and shadows in our preview; well finished, solid and with a pop-up front camera, the long-awaited OnePlus flagship also has some unconvincing aspects, our analysis (p. 23). DJI Osmo Action: the GoPro rival that looks like a GoPro (p. 07). Apple TV, a half-finished revolution, tested on Samsung TVs: completely redesigned, but in Italy the app is not what Apple had promised; we tried it on the latest Samsung TV models together with AirPlay (p. 14). Renault Zoe: 10 years of the most popular electric car.
    Reviewed in this issue: Samsung QLED Q90R, not "the usual" LCD (p. 25); Sony 75" XG95, cinema effect (p. 28); Google Pixel 3a, our preview (p. 30); Honor 20 Lite, a best buy under the lens (p. 31).
    MARKET: After Trump's move, Google has decided to revoke Huawei's Android license.
    Google pulls Huawei's Android license; relations also cut off by Intel and Qualcomm. Huawei will now have to use the open-source version of the operating system, without the Google apps. By Roberto Pezzali. The news arriving from the States is hard to believe: following Trump's ban on Huawei in the USA, Google…
    Champions League rights: Rai takes Sky to court. As expected, Rai has decided to hand its lawyers Sky's decision not to renew the option for the free-to-air broadcast of the Wednesday Champions League match for the next two seasons.