Accelerating RedisEdge with CCIX
Jeff Defilippi, Director Product Management, Arm
#ArmTechCon
Copyright © 2019 Arm TechCon, All rights reserved.
Data Consumption and Latency Driving Future Designs
[Figure: massive amounts of data and critical data flow over 5G between cloud data centers (analyze & store), the edge (filter & react), and trillions of edge devices (local decisions)]
Latency Requirements for Emerging Applications
• Robotics / Medical: 2.2ms
• V2X Control: 10ms
• Smart City, IoT Control: 15-20ms
The Throughput Challenge
All Internet Data (exabytes/month)
Year       2016  2017  2018  2019  2020  2021
EB/month     96   122   151   186   228   278
Source: Cisco Visual Networking Index 2016-2021
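As a rough check on the growth the chart implies, the compound annual growth rate can be worked out from its two endpoints. This is a minimal sketch that uses only the 2016 and 2021 values (96 and 278 exabytes/month); the variable names are illustrative.

```python
# Compound annual growth rate (CAGR) implied by the chart's endpoints.
# Values are the 2016 and 2021 "All Internet Data" figures in exabytes/month.
start_eb, end_eb = 96, 278
years = 2021 - 2016

cagr = (end_eb / start_eb) ** (1 / years) - 1
print(f"Implied growth: {cagr:.1%} per year")
```

Under these figures, traffic grows by a little under a quarter each year.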
Most Content is Served from the Edge
Exabytes/month                    2016  2017  2018  2019  2020  2021
CDN + Managed IP services           61    81   104   133   166   208
Remaining bandwidth (to Cloud)      35    41    47    54    63    70
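The share of traffic served from the edge follows directly from the two series in this chart. The sketch below recomputes it per year from the slide's values (exabytes/month); treating "CDN + Managed IP services" as the edge-served portion is the slide's framing, and the variable names are illustrative.

```python
# Edge-served share of traffic, from the two series on the chart (EB/month).
years = [2016, 2017, 2018, 2019, 2020, 2021]
cdn_managed_ip = [61, 81, 104, 133, 166, 208]   # CDN + Managed IP services
to_cloud = [35, 41, 47, 54, 63, 70]             # Remaining bandwidth (to Cloud)

edge_share = {
    year: edge / (edge + cloud)
    for year, edge, cloud in zip(years, cdn_managed_ip, to_cloud)
}

for year, share in edge_share.items():
    print(f"{year}: {share:.1%} served from the edge")
```

By these numbers the edge-served share rises from roughly 64% in 2016 to roughly 75% in 2021.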
Data Demands for New Use Cases
• 500M HD image sensors: 329EB/month
• Visual positioning – 100M driving hrs/day: 6EB/month
• Voice UI for 50% of cellphone subscribers: 1EB/month
[Chart: the demands above plotted against the 2016-2021 CDN + Managed IP services and remaining (to Cloud) traffic, in exabytes/month]
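To put the image-sensor figure in perspective, it can be converted into an average bitrate per sensor. This is a back-of-the-envelope sketch that assumes decimal exabytes (10^18 bytes), a 30-day month, and sensors streaming continuously; none of these assumptions are stated on the slide.

```python
# Average per-sensor bitrate implied by "500M HD image sensors, 329EB/month".
# Assumptions (not from the slide): decimal exabytes, 30-day month, 24/7 streaming.
EXABYTE = 10**18
SECONDS_PER_MONTH = 30 * 24 * 3600

sensors = 500_000_000
total_bytes_per_month = 329 * EXABYTE

bytes_per_sensor = total_bytes_per_month / sensors
mbit_per_s = bytes_per_sensor * 8 / SECONDS_PER_MONTH / 1e6
print(f"{bytes_per_sensor / 1e9:.0f} GB/month per sensor, about {mbit_per_s:.2f} Mbit/s")
```

The result, on the order of 2 Mbit/s per sensor, is in the range of a compressed HD video stream.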
Domain Compute Required to Filter and React at the Edge
Faster and faster general-purpose compute used to be good enough for all workloads...
[Figure: general purpose compute alongside network, storage, security, ML/AI, graphics, ...]
... heterogeneous, intelligent & optimized compute is the only way to keep up
Secure Edge Platform
[Figure: secure edge platform stack, with runtime security services alongside the layers below]
• Cloud native deployments: edge stack and engines (analytics, SQL/NoSQL data, network services; prediction, filter / react, store, network)
• Multi-tenancy, virtualized, secured: containers / hypervisor
• Diverse, heterogeneous systems: HW acceleration, Arm Neoverse Platform
• Root of Trust: Arm PSA
RedisEdge: Purpose Built for the Edge
Redis high performance, fast in-memory database
• <1ms latency for real-time processing
Redis with Streams + Modules
• RedisTimeSeries, RedisAI, RedisGears
Tunable data persistence
• Ensure recovery of critical data
Small footprint
• <5MB for resource constrained deployments (<8 cores)
Video Analytics Example – Count the Count
https://github.com/RedisGears/EdgeRealtimeVideoAnalytics
Edge Development Platform
Arm Neoverse N1: Cloud to Edge Performance
Segment                Power            Cores
Hyperscale Datacenter  150W and beyond  64-128 core
Edge Compute           35W-105W         16-64 core
Edge Access            15W-65W          8-32 core
Neoverse N1 offers hyperscale performance with industry leading power/area efficiency for more applications
Arm Neoverse N1 System Development Platform
• 4x Neoverse N1 CPUs at speeds up to 3 GHz
• Cutting-edge TSMC 7nm manufacturing process
• Silicon interoperability proof of N1 platform with Cadence’s PCIe/CCIX Gen4/3 and DDR4/3 IP
• Demonstrate performant cache-coherent integration of accelerators using Xilinx’s Alveo (U280) CCIX accelerator card
• Benchmark performance of cloud-native workloads and optimize open-source software
N1 SDP: Reference Open Source Software
Application Processor software (application CPUs)
• Distro & grub bootloader
• Linux Kernel
• UEFI/EDK2
• Trusted Firmware-A
• Secure Partition (under development)
• Open source firmware for CPU initialization & system management
• Secure runtime interface
Compute Subsystem Supervisory firmware (supervisory compute, Cortex-M)
• SCP firmware and MCP firmware
• Open source firmware for N1SDP for system initialization, power control and SoC level system management
[Figure: application CPUs, SCP and MCP, and SoC (Interconnect, IO)]
https://community.arm.com/developer/tools-software/oss-platforms/w/docs/440/neoverse-n1-sdp
Alveo U280 – Breathe New Life into Your Data Center
16nm UltraScale™ Architecture
Off-Chip Memory Support
• Max Capacity: 8GB
• Max Bandwidth: 460 GB/s
Internal SRAM
• Max Capacity: 43MB
• Max Bandwidth: 37TB/s
PCIe Gen3x16, Gen4x8, CCIX
Accelerate Any Application
• IDE for compiling, debugging, profiling
• Supports C/C++, RTL, and OpenCL
Cloud Deployable
Cloud ↔ On-Premise Mobility
Ecosystem of Applications
• Many available today
• More on the way
Server OEM Support
• Major OEMs in Qualification
CCIX Seamless Acceleration
CCIX – What is it?
Cache Coherent Interconnect for Accelerators (CCIX)
• Accelerates heterogeneous computing
• Builds on PCIe standard and infrastructure
• Adds cache coherency protocol
Supports PCIe Gen 4.0 (16GT/s) and Gen 5.0 (32GT/s)
• Includes intermediate speeds, 20GT/s, 25GT/s
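The transfer rates above translate into raw link bandwidth once a lane count and line encoding are fixed. The sketch below assumes a x16 link and 128b/130b encoding, which PCIe Gen 4.0 and Gen 5.0 use; applying the same encoding to the 20GT/s and 25GT/s CCIX speeds is an assumption here, and the function name is hypothetical. Protocol overhead is ignored, so achievable throughput is lower.

```python
# Raw per-direction link bandwidth from the per-lane transfer rate.
# Assumptions: x16 link, 128b/130b line encoding at every speed listed.
def raw_bandwidth_gb_s(gt_per_s, lanes=16, encoding=128 / 130):
    """Return raw bandwidth in GB/s for one direction of the link."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte

for rate in (16, 20, 25, 32):
    print(f"{rate} GT/s x16: {raw_bandwidth_gb_s(rate):.1f} GB/s per direction")
```

Under these assumptions a x16 link moves roughly 31.5 GB/s per direction at 16GT/s and about twice that at 32GT/s.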
CCIX Products in the Market
Members currently shipping real silicon
www.ccixconsortium.com
Flexible, Scalable Interconnect Topologies
[Figure: processors, accelerators, a switch, and accelerator memory connected by CCIX links]
Direct attached, daisy chain, mesh and switched topologies
Limitations with Traditional Accelerator Attach
[Figure: processor (CPUs, on-chip bus, memory, OS virtual memory) attached over PCIe to an accelerator (system bus, private accel memory); PCIe != on-chip; SW DMA engine: clean and copy data]
• Require software to move data
• Special driver for each accelerator
• Requires skilled kernel developers
• Long term maintenance required
• Reduced ability to leverage high level languages and tools
Benefits of Virtualized, Coherent Accelerators
[Figure: processor (CPUs, on-chip bus, memory) and accelerator (cache, accel memory) joined by CCIX, with shared virtual memory and shared data structures; on-chip = multichip]
• ‘Driverless’ model, it’s all memory
• Eliminates bespoke DMA driver
• Enables cache-coherent, fine-grained data sharing
Accelerator Framework for CCIX Virtual Functions
[Figure: guest apps in guest VMs/containers use CCIX VF0, VF1 and VF2 through the hypervisor and OS kernel (CCIX ext PCIe SW model) to reach processor memory and accelerator memory]
• Access to CCIX resources from virtual machines and containers
• Capable of creating CCIX Virtual Functions (VF) in any SW language
• Simple R/W access to any memory region
Industry Multichip Standardization
CCIX: Cache Coherent Interconnect for Accelerators
• Symmetrical coherency for scale-up and heterogeneous compute-to-compute
• Ability to create a network of coherent devices (tree, mesh, etc.)
• Accelerator-to-accelerator data movement
CXL: Compute Express Link
• CXL 1.1 specification available, built on PCIe Gen5
• Simple, asymmetrical coherency model
• Focus on accelerator and memory attach to server compute
Arm investment within CCIX and CXL
• Focus on enabling innovation for emerging use-cases and workloads
• Unify the software stack between CCIX and CXL
• Support PCIe, CCIX, CXL on Neoverse CMN IP for both server and accelerator SoC designs
Accelerating RedisEdge
Purpose Built for the AI Edge
RedisEdge = Redis + Streams + Time Series + AI + Gears
Accelerating RedisEdge – Two Use Cases
1. Persistent memory expansion
2. Acceleration function offload
Persistence with PCIe Attached SSD
[Figure: Arm N1 SDP with DDR memory and a PCIe-attached SSD]
1. Redis database application works out of memory
2. Periodic back-up (every second) of a log file (AOF) and checkpoint file (RDB)
3. If crash occurs, database is restored and rebuilt
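The three steps on this slide (work in memory, back up a log periodically, replay it after a crash) can be illustrated with a toy store. This is a minimal sketch of the general append-only-log idea, not Redis's actual AOF format or implementation; `TinyStore`, its methods, and the tab-separated log layout are all hypothetical.

```python
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with an append-only log (illustrative only)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self.pending = []  # writes not yet pushed to stable storage
        # Step 3: on start-up, rebuild the in-memory state by replaying the log.
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = line.rstrip("\n").split("\t", 1)
                    self.data[key] = value

    def set(self, key, value):
        # Step 1: the application works out of memory.
        self.data[key] = value
        self.pending.append(f"{key}\t{value}\n")

    def sync(self):
        # Step 2: called periodically (e.g. once per second) to persist the log.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.writelines(self.pending)
            f.flush()
            os.fsync(f.fileno())
        self.pending.clear()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "appendonly.log")
    store = TinyStore(path)
    store.set("sensor:1", "21.5")
    store.sync()
    store.set("sensor:2", "19.0")  # written after the last sync
    del store                      # simulated crash: unsynced writes are lost
    restored = TinyStore(path).data

print(restored)
```

Only the write that preceded the last sync survives the simulated crash, which is the data-loss window that periodic back-up leaves open.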
Benefits of Persistent Memory Expansion
Replace SSD with CCIX persistent memory
[Figure: Arm N1 SDP with DDR memory connected over CCIX to a Xilinx U280 with PMEM]
1. Application works directly on NVMe PMEM
• 2x better performance by eliminating processing for log/checkpoint
• No risk of critical data loss
• Instantaneous restart
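The idea of an application working directly on persistent memory, with no separate log or checkpoint step, can be sketched with a memory-mapped region. In this sketch an ordinary memory-mapped file stands in for the CCIX-attached persistent memory, which is an assumption made purely for illustration; the file name, region size, and counter layout are hypothetical.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 4096  # hypothetical size of the persistent region

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pmem.bin")  # stand-in for a persistent memory region
    with open(path, "wb") as f:
        f.truncate(REGION_SIZE)

    # The application updates its data structure in place with plain loads/stores.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), REGION_SIZE) as region:
        struct.pack_into("<Q", region, 0, 42)  # 64-bit counter at offset 0
        region.flush()                         # make the update durable

    # "Restart": map the same region again; the state is already there, no replay.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), REGION_SIZE) as region:
        (counter,) = struct.unpack_from("<Q", region, 0)

print(counter)
```

Because the data structure itself is the durable copy, a restart only has to map the region again rather than rebuild state from a log.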
Persistent + Near Memory Acceleration
Add in-line or off-line custom acceleration to offload critical CPU cycles
• In-line acceleration: add on the path to CCIX memory, such as crypto and compression
• Off-line acceleration: add an off-line module that runs directly on the accelerator
[Figure: Arm N1 SDP with DDR memory connected over CCIX to a Xilinx U280 with PMEM; (1) in-line acceleration - crypto, compression, etc; (2) off-line acceleration - regression, analytics, etc]
RedisGears – Serverless Engine
Support for both event driven and batch operations
Develop routines in C or Python
Port modules to accelerator without kernel changes
• On-chip or off-chip accelerators
RedisEdge with Module Acceleration
[Figure: Arm N1 SDP with DDR memory connected over CCIX to a Xilinx U280 with PMEM]
Accelerating the AI Edge
See the demo in the CCIX Booth!
RedisEdge: purpose-built data service platform for the AI edge
Arm Neoverse N1: secure, hyperscale compute in edge power budget
Xilinx Alveo U280: accelerate any workload
CCIX: seamless, driverless, efficient data movement
Thank You!
Trademark and copyright statement
The trademarks featured in this presentation are registered and/or unregistered trademarks of Arm Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.