Accelerating RedisEdge with CCIX
Jeff Defilippi, Director, Product Management, Arm

#ArmTechCon Copyright © 2019 Arm TechCon, All rights reserved.

Data Consumption and Latency Driving Future Designs

• Cloud Data Centers: Massive Amounts of Data; Analyze & Store
• Edge (5G): Critical Data; Filter & React
• Trillions of Devices: Local Decisions

Latency Requirements for Emerging Applications
• Robotics / Medical: 2.2ms
• V2X Control: 10ms
• Smart City, IoT Control: 15-20ms

The Throughput Challenge

All Internet Data (exabytes/month)

Year       2016  2017  2018  2019  2020  2021
EB/month     96   122   151   186   228   278

Source: Cisco Visual Networking Index 2016-2021
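As a rough check on how fast the chart grows, the sketch below computes the compound annual growth rate from the first and last data points. The values are the ones read off the chart; the function name `cagr` is only illustrative.

```python
# Monthly internet traffic in exabytes, as read off the chart (2016-2021).
traffic_eb_per_month = {2016: 96, 2017: 122, 2018: 151, 2019: 186, 2020: 228, 2021: 278}


def cagr(first: float, last: float, periods: int) -> float:
    """Compound annual growth rate between two values `periods` years apart."""
    return (last / first) ** (1 / periods) - 1


years = sorted(traffic_eb_per_month)
growth = cagr(
    traffic_eb_per_month[years[0]],
    traffic_eb_per_month[years[-1]],
    years[-1] - years[0],
)
print(f"CAGR {years[0]}-{years[-1]}: {growth:.1%}")
```

On these figures the traffic nearly triples over five years, which works out to a little under 24% growth per year.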

Most Content is Served from the Edge

Exabytes/month

Year                              2016  2017  2018  2019  2020  2021
CDN + Managed IP services           61    81   104   133   166   208
Remaining bandwidth (to Cloud)      35    41    47    54    63    70
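To put a number on the slide title, the following sketch computes the share of traffic carried by CDN and managed IP services in each year. The two series are the values read off the chart; the variable names are illustrative.

```python
# Exabytes/month, as read off the chart (2016-2021).
years = list(range(2016, 2022))
cdn_and_managed_ip = [61, 81, 104, 133, 166, 208]
to_cloud = [35, 41, 47, 54, 63, 70]

# Fraction of all traffic that is served from the edge in each year.
edge_share = {
    year: edge / (edge + cloud)
    for year, edge, cloud in zip(years, cdn_and_managed_ip, to_cloud)
}

for year, share in edge_share.items():
    print(f"{year}: {share:.0%} served from the edge")
```

On these figures the edge share rises from roughly 64% in 2016 to roughly 75% in 2021.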

Data Demands for New Use Cases

[Chart: exabytes/month, 2016-2021, same series as the previous slide, annotated with new use cases]
• 500M HD image sensors: 329EB/month
• Visual positioning – 100M driving hrs/day: 6EB/month
• Voice UI for 50% of cellphone subscribers: 1EB/month
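A quick sanity check on the image-sensor figure: dividing 329EB/month across 500M sensors gives the average sustained rate per sensor. The sketch assumes decimal exabytes (10^18 bytes) and a 30-day month, neither of which is stated on the slide.

```python
EXABYTE = 10**18                      # assumption: decimal exabytes
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month


def average_mbit_per_s(exabytes_per_month: float, sources: float) -> float:
    """Average sustained bit rate per source, in Mbit/s."""
    bytes_per_source = exabytes_per_month * EXABYTE / sources
    return bytes_per_source * 8 / SECONDS_PER_MONTH / 1e6


per_sensor = average_mbit_per_s(329, 500e6)
print(f"{per_sensor:.2f} Mbit/s per HD image sensor")
```

Under those assumptions each sensor averages about 2 Mbit/s, which is in the range of a compressed HD video stream.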

Domain Compute Required to Filter and React at the Edge

Faster and faster general-purpose compute used to be good enough for all workloads…

[Diagram: general-purpose compute]

[Diagram: general-purpose compute, network, storage, security, ML/AI, graphics, ...]

… heterogeneous, intelligent & optimized compute is the only way to keep up

Secure Edge Platform

[Diagram: secure edge platform stack]
• Cloud native deployments
• Multi-tenancy, virtualized, secured
• Diverse, heterogeneous systems
• Edge stack: analytics engines, SQL/NoSQL data, network services
• Functions: prediction, filter / react, store
• Containers / hypervisor, HW acceleration
• Runtime security services
• Arm Neoverse platform
• Root of trust: Arm PSA

RedisEdge: Purpose Built for the Edge

Redis: high-performance, fast in-memory database
• <1ms latency for real-time processing

Redis with Streams + Modules
• RedisTimeSeries, RedisAI, RedisGears

Tunable data persistence
• Ensure recovery of critical data

Small footprint
• <5MB for resource constrained deployments (<8 cores)

Video Analytics Example – Count the Count

https://github.com/RedisGears/EdgeRealtimeVideoAnalytics

Edge Development Platform

Arm Neoverse N1: Cloud to Edge Performance

Segment   Hyperscale Datacenter   Edge Compute   Edge Access
Power     150W and beyond         35W-105W       15W-65W
Cores     64-128 core             16-64 core     8-32 core

Neoverse N1 offers hyperscale performance with industry leading power/area efficiency for more applications

Arm Neoverse N1 System Development Platform

• 4x Neoverse N1 CPUs at speeds up to 3 GHz
• Cutting-edge TSMC 7nm manufacturing process
• Silicon interoperability proof of N1 platform with Cadence’s PCIe/CCIX Gen4/3 and DDR4/3 IP
• Demonstrate performant cache-coherent integration of accelerators using Xilinx’s Alveo (U280) CCIX accelerator card
• Benchmark performance of cloud-native workloads and optimize open-source software

N1 SDP: Reference Open Source Software

Application Processor software (runs on the application CPUs)
• Distro & grub bootloader
• Linux Kernel
• UEFI/EDK2
• Trusted Firmware-A
• Secure Partition (under development)
• Open source firmware for CPU initialization & system management
• Secure runtime interface

Compute Subsystem Supervisory firmware (runs on the supervisory compute, Cortex-M)
• SCP firmware and MCP firmware
• Open source firmware for N1SDP for system initialization, power control and SoC level system management

[Diagram: application CPUs, supervisory compute (SCP, MCP) and SoC (Interconnect, IO)]

https://community.arm.com/developer/tools-software/oss-platforms/w/docs/440/neoverse-n1-sdp

Alveo U280 – Breathe New Life into Your Data Center

16nm UltraScale™ Architecture

Off-Chip Memory Support
• Max Capacity: 8GB
• Max Bandwidth: 460 GB/s

Internal SRAM
• Max Capacity: 43MB
• Max Bandwidth: 37TB/s

PCIe Gen3x16, Gen4x8, CCIX

Cloud Deployable
• Cloud ↔ On-Premise Mobility

Ecosystem of Applications
• Many available today
• More on the way

Server OEM Support
• Major OEMs in Qualification

Accelerate Any Application
• IDE for compiling, debugging, profiling
• Supports C/C++, RTL, and OpenCL

CCIX Seamless Acceleration

CCIX – What is it?

Cache Coherent Interconnect for Accelerators (CCIX)

• Accelerates heterogeneous computing
• Builds on PCIe standard and infrastructure
• Adds cache coherency protocol

Supports PCIe Gen 4.0 (16GT/s) and Gen 5.0 (32GT/s)

• Includes intermediate speeds, 20GT/s, 25GT/s
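To relate the transfer rates above to usable bandwidth, the sketch below converts GT/s into raw GB/s per direction. It assumes a x16 link and 128b/130b line encoding at every speed, and it ignores packet and protocol overhead; the slide states none of these, so treat the results as upper bounds.

```python
def raw_gbytes_per_s(gt_per_s: float, lanes: int = 16, encoding: float = 128 / 130) -> float:
    """Raw one-direction link bandwidth in GB/s.

    Assumes each transfer carries one bit per lane and that 128 of every
    130 bits on the wire are payload (128b/130b encoding).
    """
    return gt_per_s * lanes * encoding / 8


for rate in (16, 20, 25, 32):
    print(f"{rate} GT/s x16: {raw_gbytes_per_s(rate):.1f} GB/s per direction")
```

Under those assumptions a x16 link moves about 31.5 GB/s per direction at 16GT/s and about 49 GB/s at 25GT/s.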

www.ccixconsortium.com

CCIX Products in the Market
Members currently shipping real silicon

Flexible, Scalable Interconnect Topologies

[Diagram: processors, accelerators, a switch and memory connected through CCIX ports]

Direct attached, daisy chain, mesh and switched topologies

Limitations with Traditional Accelerator Attach

PCIe != on-chip
• Require software to move data (SW DMA engine: clean and copy data)
• Special driver for each accelerator
• Requires skilled kernel developers
• Long term maintenance required
• Reduced ability to leverage high level languages and tools

[Diagram: processor (CPUs, memory, OS virtual memory) attached over PCIe to an accelerator with private accel memory]

Benefits of Virtualized, Coherent Accelerators

Shared Virtual Memory
• ‘Driverless’ model, it’s all memory
• On-chip = multichip
• Eliminates bespoke DMA driver
• Enables cache-coherent, fine-grain data sharing

[Diagram: processor (CPUs, on-chip bus, cache, memory) and accelerator (accel memory) connected by CCIX, with shared data structures]
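The difference between the two attach models can be illustrated in plain software. The sketch below is only an analogy: a `bytearray` stands in for host memory, a byte-wise XOR stands in for the accelerator's work, and both function names are hypothetical. It does not touch CCIX hardware.

```python
def copy_based_offload(host_buf: bytearray) -> int:
    """Traditional attach: stage data into private accelerator memory and back."""
    accel_private = bytearray(host_buf)      # copy in (stands in for a DMA transfer)
    for i in range(len(accel_private)):      # stand-in "accelerator" work
        accel_private[i] ^= 0xFF
    host_buf[:] = accel_private              # copy the result back to the host
    return 2 * len(host_buf)                 # bytes moved by software


def shared_offload(host_buf: bytearray) -> int:
    """Coherent attach: the accelerator works on the same buffer in place."""
    shared = memoryview(host_buf)            # second handle on the same bytes, no copy
    for i in range(len(shared)):
        shared[i] ^= 0xFF
    return 0                                 # nothing was staged or copied


a = bytearray(b"\x00\x0f")
b = bytearray(b"\x00\x0f")
print(copy_based_offload(a), a.hex())
print(shared_offload(b), b.hex())
```

Both calls leave the same result in the buffer; only the first one pays for moving the data twice, which is the cost the shared virtual memory model is meant to remove.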

Accelerator Framework for CCIX Virtual Functions

• Access to CCIX resources from virtual machines and containers
• Capable of creating CCIX Virtual Functions (VF) in any SW language
• Simple R/W access to any memory region

[Diagram: guest apps in guest VMs/containers use CCIX VF0, VF1 and VF2 through the hypervisor and OS kernel (CCIX ext PCIe SW model) to reach processor memory and accel memory]

Industry Multichip Standardization

CCIX: Cache Coherent Interconnect for Accelerators
• Symmetrical coherency for scale-up and heterogeneous compute-to-compute
• Ability to create a network of coherent devices (tree, mesh, etc.)
• Accelerator-to-accelerator data movement

CXL: Compute Express Link
• CXL 1.1 specification available, built on PCIe Gen5
• Simple, asymmetrical coherency model
• Focus on accelerator and memory attach to server compute

Arm investment within CCIX and CXL
• Focus on enabling innovation for emerging use-cases and workloads
• Unify the software stack between CCIX and CXL
• Support PCIe, CCIX, CXL on Neoverse CMN IP for both server and accelerator SoC designs

Accelerating RedisEdge

Purpose Built for the AI Edge

RedisEdge = Redis + Streams + Time Series + AI + Gears

Accelerating RedisEdge – Two Use Cases

1. Persistent memory expansion

2. Acceleration function offload

Persistence with PCIe Attached SSD

1. Redis database application works out of memory
2. Periodic back-up (every second) of a log file (AOF) and checkpoint file (RDB)
3. If crash occurs, database is restored and rebuilt

[Diagram: Arm N1 SDP with DDR memory and an SSD attached over PCIe]
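The three steps can be sketched as a toy key-value store with an append-only log. This is not Redis code: `TinyStore`, its JSON log format and `demo` are hypothetical, and a real deployment would call `sync()` on a timer (Redis exposes this policy as `appendfsync everysec`) rather than by hand.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store that logs every write to an append-only file."""

    def __init__(self, aof_path: str):
        self.data = {}
        self.aof_path = aof_path
        self._replay()                                   # step 3: rebuild after a crash
        self._aof = open(aof_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        """Re-apply every logged write, in order, to rebuild the dataset."""
        if not os.path.exists(self.aof_path):
            return
        with open(self.aof_path, encoding="utf-8") as log:
            for line in log:
                key, value = json.loads(line)
                self.data[key] = value

    def set(self, key: str, value) -> None:
        self.data[key] = value                           # step 1: work out of memory
        self._aof.write(json.dumps([key, value]) + "\n")

    def sync(self) -> None:
        """Step 2: force buffered log entries down to the storage device."""
        self._aof.flush()
        os.fsync(self._aof.fileno())

    def close(self) -> None:
        self.sync()
        self._aof.close()


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "appendonly.log")
        store = TinyStore(path)
        store.set("sensor:1", 21.5)
        store.set("sensor:2", 19.0)
        store.close()                  # the process goes away; memory contents are lost
        recovered = TinyStore(path)    # restart: the log is replayed
        data = dict(recovered.data)
        recovered.close()
        return data


print(demo())
```

Writes that arrive after the last `sync()` can be lost in a crash, and recovery time grows with the length of the log, which is the cost this slide attributes to SSD-backed persistence.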

Benefits of Persistent Memory Expansion

Replace SSD with CCIX persistent memory

1. Application works directly on NVMe PMEM

• 2x better performance by eliminating processing for log/checkpoint
• No risk of critical data loss
• Instantaneous restart

[Diagram: Arm N1 SDP (DDR memory) connected over CCIX to Xilinx U280 (PMEM)]
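A loose software analogy for working directly on persistent memory is a memory-mapped region that the application updates in place, with no separate log or checkpoint. In the sketch below an ordinary temporary file stands in for the PMEM region, and `open_region`, `increment_counter` and `demo` are hypothetical names; real persistent memory needs platform-specific mapping and flush handling that this example does not model.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 4096  # assumption: one page is enough for the example


def open_region(path: str) -> mmap.mmap:
    """Map a fixed-size region; an ordinary file stands in for PMEM here."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, REGION_SIZE)      # new files are zero-filled; existing data is kept
        return mmap.mmap(fd, REGION_SIZE)
    finally:
        os.close(fd)                       # the mapping keeps its own handle


def increment_counter(region: mmap.mmap) -> int:
    """Update a 64-bit counter in place; there is no log to append to."""
    (count,) = struct.unpack_from("<Q", region, 0)
    struct.pack_into("<Q", region, 0, count + 1)
    region.flush()                         # push the change to the backing store
    return count + 1


def demo() -> int:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pmem_region.bin")
        region = open_region(path)
        increment_counter(region)
        increment_counter(region)
        region.close()                     # the process goes away
        region = open_region(path)         # restart: data is already in place
        (count,) = struct.unpack_from("<Q", region, 0)
        region.close()
        return count


print(demo())
```

After the restart there is nothing to replay: the counter is read straight from the mapped region, which is the behaviour the slide describes as instantaneous restart.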

Persistent + Near Memory Acceleration

Add in-line or off-line custom acceleration to offload critical CPU cycles
• In-line acceleration: add on the path to CCIX memory, such as crypto and compression
• Off-line acceleration: add an off-line module that runs directly on the accelerator, such as regression and analytics

[Diagram: Arm N1 SDP (DDR memory) connected over CCIX to Xilinx U280 (PMEM), with in-line and off-line acceleration blocks on the U280]

RedisGears – Serverless Engine

Support for both event driven and batch operations

Develop routines in C or Python

Port modules to accelerator without kernel changes
• On-chip or off-chip accelerators
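The distinction between batch and event-driven operation can be shown with a few lines of plain Python. `MiniPipeline` and `to_fahrenheit` below are hypothetical stand-ins written for this example; this is not the RedisGears API, only a sketch of one routine serving both modes.

```python
from typing import Callable, Iterable, List


class MiniPipeline:
    """Hypothetical stand-in for a serverless-style routine; not the RedisGears API."""

    def __init__(self, step: Callable[[dict], dict]):
        self.step = step
        self.results: List[dict] = []

    def run(self, records: Iterable[dict]) -> List[dict]:
        """Batch mode: process records that already exist."""
        return [self.step(record) for record in records]

    def on_event(self, record: dict) -> None:
        """Event-driven mode: process one record as it arrives."""
        self.results.append(self.step(record))


def to_fahrenheit(record: dict) -> dict:
    """Example routine: add a converted temperature field to a reading."""
    return {**record, "temp_f": record["temp_c"] * 9 / 5 + 32}


pipeline = MiniPipeline(to_fahrenheit)
print(pipeline.run([{"temp_c": 0}, {"temp_c": 100}]))   # batch over stored readings
pipeline.on_event({"temp_c": 37})                       # a new reading arrives
print(pipeline.results)
```

Keeping the routine itself free of any knowledge about where it runs is what would allow the same step to be moved to an accelerator later.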

RedisEdge with Module Acceleration

[Diagram: Arm N1 SDP (DDR memory) connected over CCIX to Xilinx U280 (PMEM)]

Accelerating the AI Edge
See the demo in the CCIX Booth!

RedisEdge: purpose-built data service platform for the AI edge

Arm Neoverse N1: secure, hyperscale compute in edge power budget

Xilinx Alveo U280: accelerate any workload

CCIX: seamless, driverless, efficient data movement

Thank You!

Trademark and copyright statement
The trademarks featured in this presentation are registered and/or unregistered trademarks of Arm Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.
