
Adapting the Wave Dataflow Architecture to a Licensable AI IP Product

Presented by Yuri Panchul, MIPS Open Technical Lead, at the SKOLKOVO Robotics & AI Conference, April 15-16, 2019. www.wavecomp.ai

Wave + MIPS: A Powerful History of Innovation

2010: Wave founded by Dado Banatao as Wave Semiconductor, with a vision of ushering in a new era of AI computing

2012: Developed Coarse Grain Reconfigurable Array (CGRA) semiconductor architecture

2014: Delivered 11 GHz test chip at 28nm

2016: Announced Derek Meyer as CEO; launched Early Access Program to enable data scientists to experiment with neural networks

2018: Wave acquires MIPS to deliver on its vision for revolutionizing AI from the datacenter to the edge; announced partnership with Broadcom to develop next-gen AI chip on 7nm node; launched MIPS Open initiative; closed Series D round of funding at $86M, bringing total investment to $115M+; expanded global footprint with offices in and Manila

2010: Google selects the MIPS architecture for use in its Android 3.0 "Honeycomb" mobile device

2011: MIPS introduces first Android-MIPS-based set-top box at CES

2012: MIPS Technologies is sold to Imagination Technologies

2013: MediaTek selects MIPS for LTE modems

2015: MIPS powering 80% of ADAS-enabled automobiles; MIPS automotive MCUs & ADAS engagements with MobilEye, Microchip & MStar

2016: Renamed the company to Wave Computing to better reflect focus on accelerating AI with dataflow-based solutions; Wave develops dataflow-based technology, providing higher performance and scalability for AI applications

2017: MIPS business is sold by Imagination Technologies to Tallwood Venture Capital as Tallwood MIPS Inc. for $65M

2018: Wave expanded team to include MIPS architecture, silicon and software expertise

2019: Created MIPS Open Advisory Board; Wave joins Amazon, Facebook, Google and Samsung to support advanced AI research at UC Berkeley

Datacenter-to-Edge Needs Systems, Silicon and IP

Compute Systems AI Processors AI Licensable IP

“Deep learning is a fundamentally new software model. So we need a new computer platform to run it...”

Semiconductor Cloud Enterprise Partners

AI Applications Software

Autonomous Driving NLP Retail Smart Cities

Wave Computing © 2019

What is Driving AI to the Edge?

Market Drivers

Networking Enterprise Mobile

Industrial Autonomous IoT

AI Use Cases: AI was born in the datacenter

Privacy Security

Isolated Low latency Cost

Bandwidth

Scalable AI Platform: TritonAI™ 64

Key Benefits:

• Highly scalable to address broad AI use cases
• Supports inference and training
• High flexibility to support all AI algorithms
• High efficiency for key AI CNN algorithms
• Configurable to support AI use cases
• Mature software platform support

MIPS64 + SIMD multi-CPU · WaveFlow™ (high flexibility, supports all AI algorithms) · WaveTensor™ (high efficiency for key CNN algorithms) · rich instruction set, proven tools, mature software

Software Platform → Flexibility → Efficiency

Data Rates and New AI Network Requirements

[Chart: data rates & network complexity drive data compute performance demands (inferences/sec). Devices range from accelerometer and voice (Hz-KHz data rates, ~1-10 inferences/sec) through music, imaging, video, radar and lidar (MHz-GHz, ~100-1K inferences/sec), up to aggregated core network data (~100K inferences/sec), across a power budget of 0.01 W to 1 KW; MIPS64+SIMD anchors the low end.]

These curves represent the conceptual combination of these technologies, not actual independent performance.

WaveTensor™ Configurable Architecture

Configurable Architecture for Tensor Processing

• Configurable MACs, accumulation and array size
• Overlap of communication & computation
• Compatible datatypes with WaveFlow™ core
• Supports int8 for inferencing
• Roadmap to bfloat16, fp32 for training

[Diagram: M slices, each of N tiles; tiles are configurable 4x4 or 8x8 MAC arrays fed by data-fetch units from a high-speed buffer.]

Configurable MAC units {4x4 | 8x8} · more memory

ResNet-50 inferences/sec*: compute density ~1K/mm² · compute efficiency ~500/watt

MAX TOPs: 1024 · TOPs/Watt: 8.3 · TOPs/mm²: 10.1

* Core, int8, 8x8 tile config, 7nm FinFET nominal process/Vdd/temp; batch=1, standard model without pruning; performance and power vary with array size/configuration.
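As a rough cross-check, the peak figures above imply the power and area a maxed-out configuration would occupy. A back-of-envelope sketch by simple division (these derived numbers are not stated on the slide, whose footnotes warn results vary with configuration):

```python
# Derived estimates (simple division) from the slide's peak figures;
# not numbers the slide itself states.
max_tops = 1024       # MAX TOPs
tops_per_watt = 8.3   # TOPs/Watt
tops_per_mm2 = 10.1   # TOPs/mm^2

watts = max_tops / tops_per_watt  # implied peak power budget
mm2 = max_tops / tops_per_mm2     # implied core area

print(f"~{watts:.0f} W and ~{mm2:.0f} mm^2 at peak")  # ~123 W and ~101 mm^2
```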

WaveFlow™ Reconfigurable Architecture

• Configurable IMEM and DMEM sizes
• Overlap of communication & computation
• Compatible datatypes with WaveTensor™
• Integer (int8, int16, int32) for inference
• Roadmap (bfloat16, fp32) for training
• Wide range of scalable solutions (2-1K tiles)
• Future-proofs all AI algorithms
• Flexible 2-dimensional tiling implementation
• Reconfigurable for dynamic networks
• Concurrent network execution
• Supports signal and vision processing

Tile = 16 PEs + 8 MACs · WaveFlow™ = Wave dataflow array of tiles

[Diagram: a tile of 16 processing elements (PE-0 to PE-F), each with iMEM and dMEM, sharing 8 MAC units; tiles connect into a 2-D array over AXI.]

Future-Proofing in a Rapidly Evolving Industry

What is the likelihood that your DNN accelerator will run all these “yet to be invented” networks?

Wave’s TritonAI™ 64 platform combines a reconfigurable processor with an efficient neural network accelerator.

Offers customers peace of mind and investment protection

Operations using the WaveFlow™ Reconfigurable Processor

Future-proof your silicon:

CNN Layers:
• Sparse matrix-vector processing
• Stochastic pooling
• Median pooling

Data Preprocessing:
• Scaling
• Aspect ratio adjustment
• Illumination estimation & color correction
• Normalizing

Activation Functions:
• Leaky rectified linear unit (Leaky ReLU, used in YOLOv3)
• Parametric rectified linear unit (PReLU)
• Randomized leaky rectified linear unit (RReLU)

Other Functions:
• Compression/decompression
• Encryption/decryption
• Sorting

Custom Operators (e.g.) • Novel Loss Function • New Softmax Implementation • Image resize nearest neighbor
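For reference, the three ReLU variants listed above differ only in how they treat negative inputs. A minimal NumPy sketch (illustrative only, not Wave's implementation; the alpha defaults are common choices, not slide-specified):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Fixed small negative slope; YOLOv3 uses alpha = 0.1.
    return np.where(x >= 0, x, alpha * x)

def prelu(x, alpha):
    # PReLU: the negative slope is a learned parameter
    # (typically one alpha per channel).
    return np.where(x >= 0, x, alpha * x)

def rrelu(x, lower=1 / 8, upper=1 / 3, rng=None):
    # RReLU: during training the slope is sampled uniformly per
    # element; at inference the mean slope (lower + upper) / 2 is used.
    rng = np.random.default_rng() if rng is None else rng
    alpha = rng.uniform(lower, upper, size=np.shape(x))
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(leaky_relu(x))  # negative entries scaled by 0.1, others unchanged
```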

MIPS-64 Processor

MIPS-64:
• MIPS64r6 ISA
• 128-bit SIMD/FPU for int/SP/DP ops
• Virtualization extensions
• Superscalar 9-stage pipeline w/SMT
• Caches (32KB-64KB), DSPRAM (0-64KB)
• Advanced branch prediction and MMU

Multi-Processor Cluster:
• 1-6 cores
• Integrated L2 cache (0-8MB, optional ECC)
• Power management (F/V gating, per CPU)
• Interrupt control with virtualization
• 256b native AXI4 or ACE interface

Wave’s TritonAI™ 64 IP Software Platform

Software Platform:
• Mature IDE & tools
• Driver layer for technology mapping
• Linux operating system support/updates
• Abstract AI framework calls via WaveRT™ API
• Optimized AI libraries for CPU/SIMD/WaveFlow/WaveTensor
• TensorFlow Lite build support/updates
• Extensible to edge training with full TensorFlow build
• TensorFlow Lite I/F and ONNX I/F

Configurable Hardware Platform:
• MIPS64r6 ISA cluster: 1-6 cores, 1-4 threads/core, L1 I/D (32KB-64KB), unified L2 (256KB to 8MB)
• WaveFlow tile array: 4-N tiles
• WaveTensor slice array: 1-N slices

[Diagram: MIPS IDE (compile, build, debug, libraries) with add-ons for new/updated RT libraries, driver layer and tech support; WaveRT™ API-based runtime with libraries on Linux; WaveFlow™ and WaveTensor™ cores coherent with the MIPS64 CPU+SIMD cluster and memory.]

Federated Learning: The Next Frontier in Edge AI

Today’s Approach: Centralized Machine Learning

Better ML comes at the cost of collecting data: most training is done in the cloud, i.e. you send your data to the cloud.

Diminished Privacy • Where is your data?

• Who has access to your data?

Incompatible with the banking, insurance, military and health sectors

Latency Problems
• Most access technologies are asymmetric

High Communications Costs

Federated Learning: Edge AI with Data Privacy

Federated learning uses training at the edge to refine the global model

1. The server selects a group of K users
2. Users receive a copy of the central model
3. Users update the model based on local data (“training at the edge”); user data remains private
4. Updates are shared with the server
5. The server aggregates the changes and updates the central model
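The five steps above are essentially the federated averaging (FedAvg) loop. A toy NumPy sketch, using a linear least-squares model as a stand-in for the network (all names and the model choice are illustrative, not part of Wave's stack):

```python
import numpy as np

def local_update(w_global, data, lr=0.1, epochs=1):
    # Hypothetical client step: gradient descent on a linear model
    # (y ~ X @ w) with squared loss, using only this client's data.
    w = w_global.copy()
    X, y = data
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(w_global, clients, k, rng):
    # Steps 1-5 from the slide: select K users, send them the model,
    # let each train locally, then aggregate the returned weights.
    chosen = rng.choice(len(clients), size=k, replace=False)
    updates = [local_update(w_global, clients[i]) for i in chosen]
    sizes = [len(clients[i][1]) for i in chosen]
    # FedAvg: weight each client's model by its local dataset size;
    # only model weights travel to the server, never the raw data.
    return np.average(updates, axis=0, weights=sizes)

# Example: five clients whose private data all follows y = 2x.
rng = np.random.default_rng(0)
clients = []
for _ in range(5):
    X = rng.normal(size=(20, 1))
    clients.append((X, X @ np.array([2.0])))
w = np.zeros(1)
for _ in range(30):
    w = federated_round(w, clients, k=3, rng=rng)
print(w)  # converges toward the underlying weight 2.0
```

Each round touches only the selected clients' locally computed weights, which is what keeps the raw data on-device.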

Source: Florian Hartmann, Google AI

Federated Learning: Edge AI with Data Privacy

Benefits & Use Cases:
• Transfer learning using local data at the edge
• Edge data remains private
• Social networking applications
• Intelligent transportation systems that help increase passenger & pedestrian safety and traffic flow

[Diagram: a 5G microcell aggregating local model weights w1-w5.]


Wave is Driving Training at the Edge with bfloat16

Training splits into two camps: float16 or bfloat16 datatypes.

float32: 1 sign bit | 8 exponent bits | 23 fraction bits (bit index 31 | 30-23 | 22-0)
float16: 1 sign bit | 5 exponent bits | 10 fraction bits (bit index 15 | 14-10 | 9-0)
bfloat16: 1 sign bit | 8 exponent bits | 7 fraction bits (bit index 15 | 14-7 | 6-0)
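The layouts above are why bfloat16 is cheap to convert to and from float32: the sign and 8-bit exponent line up exactly, so conversion only drops (or rounds away) the low 16 fraction bits. A sketch using just the standard library (round-to-nearest-even is an assumption about typical hardware, not something the slide specifies):

```python
import struct

def f32_to_bfloat16(x: float) -> int:
    # bfloat16 shares float32's sign bit and 8-bit exponent, so the
    # conversion just drops the low 16 fraction bits. Here we round
    # to nearest even, a common hardware choice (real converters
    # also special-case NaN, which is omitted here).
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def bfloat16_to_f32(b: int) -> float:
    # Widening is exact: pad the 7-bit fraction with 16 zero bits.
    return struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))[0]

print(hex(f32_to_bfloat16(1.0)))  # 0x3f80: the top 16 bits of float32 1.0
```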

TritonAI™ Edge Training (Development)

Edge Training Development:
• Training stacks for federated learning at the edge
• Transfer learning at the edge
• Local or personalized models
• Full TensorFlow build
• WaveRT API extensions for training
• Optimized SIMD FP32 & bfloat16 Eigen libraries
• Deploy training at the edge

[Diagram: full TensorFlow, TensorFlow Lite I/F and Caffe frameworks over the WaveRT™ API-based runtime with libraries on Linux; MIPS IDE add-ons for new/updated RT libraries, driver layer and tech support; WaveFlow™ and WaveTensor™ cores (bf16) coherent with the MIPS64 CPU+bf16 cluster and memory.]

Summary

Wave’s TritonAI™ Platform Drives Inferencing to the Edge

Wave’s TritonAI™ Platform is a configurable, scalable & programmable offering that gives customers efficiency, flexibility and AI investment protection

Wave will enable “Training at the edge” with next-gen MIPS AI processor bfloat16 architectures

Thank You

If you have questions or would like more information, visit www.wavecomp.ai

@wavecomputing

https://www.linkedin.com/company/wave-computing

https://www.facebook.com/WaveComp/

MIPS-Classic Cores

Presented by Yuri Panchul, MIPS Open Technical Lead, MIPS Open Meetup in Moscow, April 15, 2019

Three Classes of CPU Cores: the MIPS IP Core Product Portfolio

I-Class Brings the Ultimate Differentiation

The next 12 months focus on the I-Class.

P-Class (high single-thread performance via OoO multi-issue design; high data throughput apps):
• P6600: 64-bit, MIPS64, multi-issue, 1-6 cores, VZ, 128b SIMD/FPU
• P5600: 32-bit, MIPS32, multi-issue, 1-6 cores, VZ, 128b SIMD/FPU, 40b PA

I-Class (performance/cost balanced; efficiency via multi-threading; AP & RT embedded apps):
• I6400: 64-bit, MIPS64, dual-issue, 1-6 cores, SMT, VZ, SIMD/FPU
• I6500: many-core/multi-cluster; heterogeneous cluster/multi-cluster; I6400 + DSPRAM, ITC, LL ports
• I6500-F: FuSa version of I6500; Safety Element out of Context (SEooC) design
• interAptiv: 32-bit, MIPS32, single-issue, 1-4 cores, MT, DSP, FPU, MPU or MMU
• I7200: nanoMIPS, dual-issue, 1-4 cores, VMT, DSP, MPU/MMU, LL ports

M-Class (size/cost-optimized CPUs; class-leading performance efficiency; IoT & embedded apps):
• microAptiv: 32-bit, MIPS32/microMIPS, MCU/MPU, SRAM/L1 cache, DSP, FPU (UCF), SRS
• M5100/M5150: +30% higher frequency than microAptiv, VZ, FPU, Anti-Tamper, ECC
• M6200/M6250: builds on microAptiv/M51xx with AXI, 40b PA, APB

Wave Computing Confidential © 2018

MIPS IP Cores: Features Summary

(Columns: microAptiv | M51xx | M62xx | interAptiv | I7200 | I6500/-F | P5600 | P6600)

• MIPS Primary ISA: MIPS32 r5 | MIPS32 r5 | MIPS32 r6 | MIPS32 r5 | nanoMIPS32 | MIPS64 r6 | MIPS32 r5 | MIPS64 r6
• Virtual/Phys Addr Bits: 32/32 | 32/32 | 32/32 | 32/32 | 32/32 | 48/48 | 32/40 | 48/40
• FPU: yes (UC version only) | yes | - | MT | - | MT w/SIMD | Hi Perf w/SIMD | Hi Perf w/SIMD
• DSP/SIMD extensions: DSP ASE r2 | DSP ASE r2 | DSP ASE r2 | DSP ASE r2 | DSP ASE r2 | MSA 128-bit | MSA 128-bit | MSA 128-bit
• Virtualization: - | yes | - | - | - | yes | yes | yes
• Small code size ISA: microMIPS32 | microMIPS32 | microMIPS32 | MIPS16e2 ASE | nanoMIPS32 | - | - | -
• Multi-threading: - | - | - | 2 VPE, 9 TC | 3 VPE, 9 TC | 4 VPE | - | -
• SuperScalar: - | - | - | - | dual-issue in order | dual-issue in order | multi-issue OoO | multi-issue OoO
• Pipeline stages: 5 | 5 | 6 | 9 | 9 | 9 | 16 | 16
• Relative Frequency*: 0.6x | 0.6x | 0.75x | 1x | 0.95x | 0.90x | 1.10x | 1.10x
• SPRAMs (I/D/U): I/D | I/D | I/D | I/D | I/D/U | D only | - | -
• L1 caches: yes, on all cores
• L2 cache: - | - | yes | yes | yes | yes | yes | yes
• Coherent Multi-Core: - | - | - | up to 4 cores | up to 4 cores | up to 6 cores, up to 64 clusters | up to 6 cores | up to 6 cores
• Native System Bus I/F: AHB-Lite | AHB-Lite | AXI | OCP 2 or AXI | AXI | AXI or ACE | AXI | AXI

* Relative Frequencies are approximate, are provided for rough guidance only, and will vary to some extent in different process nodes

MIPS technology differentiation

MIPS architecture and IP cores offer powerful, unique capabilities

1. Hardware Multi-threading: better efficiency and CPU utilization, available across the entire performance range
2. Hardware Virtualization: enables security by separation
3. Real-time Features: a combination of unique features for real-time applications

MIPS multi-threading makes MIPS virtualization and real-time features even better.

MIPS IP cores:
1. Offer leading Power Performance Area (PPA) across the range
2. Provide ultimate scalability: multi-thread, multi-core, multi-cluster
3. Address Functional Safety to ISO 26262 for automotive and IEC 61508 for industrial

The I-Class: I6500 & I7200

MIPS® I-Class Processor Core Roadmap: MIPS32 and MIPS64 Evolution

MIPS32 Release 3 to MIPS32/64 Release 6:

• interAptiv (MIPS32 + enhanced MIPS16e): 9-stage pipeline, multi-core (1-4 cores) with integrated L2$, MT (2 VP, 9 TC per core), DSPr2 ASE, ECC on data RAMs, I&D ScratchPad (SP) RAMs, InterThread Comms (ITC)
• I7200 (nanoMIPS ISA): large perf/IPC jump (dual-issue superscalar), fetch, latency and QoS enhancements, unified SPRAM (option), tuned embedded feature set, coherent cluster to 4 cores with SMT (3 VP/core + TCs), AXI system i/f x2; targets 5G
• I6400 (MIPS64): app-processor feature set, large performance/IPC jump (dual-issue superscalar), multi-core (1-6 cores) with SMT (4 VP/core), SIMD, HW VZ, Ld/St bonding, advanced power management, ECC on all caches
• I6500 (I6400 plus): multi-cluster, DIA accelerator clusters, DSPRAM, ITC, auxiliary low-latency peripheral ports (up to 4), heterogeneous configs, AP & RT/embedded features; targets datacenter and AI-enabled multi-core
• I6500F: ASIL-D SEooC with ASIL-B safety package (FuSa)
• Strategic core development: details released 2H'19

Highly efficient, scalable MIPS Warrior I6400 & I6500: setting a new standard in mainstream 64-bit processing

I6500:
1. Simultaneous multi-threading
2. Hardware virtualization
3. Next-generation security

Highly efficient, scalable 64-bit performance: 30-60% higher performance than ARM*
• Advanced power management
• 128-bit SIMD

Wave Computing © 2018 – Company Confidential

Key Components of a Multi-Core Cluster? (I6500)

• CPU Cluster: contains multiple CPU cores, coherent I/O units and other supporting CPU subsystem blocks (debug/trace, standard/custom GCRs, ITU, Global Interrupt Control (GIC), Cluster Power Control (CPC))
• MIPS CPU Core: 1 to 4/6 cores per cluster
• Coherency Manager: handles CPU-CPU and CPU-I/O memory transactions coherently; integrated L2$ (256K-8MB)
• IOCU: coherent I/O interface for a fast CPU interface to system I/O
• Cluster Interface: combination of industry-standard interfaces; AXI4 to I/O devices; AXI4 or ACE (AXI4 Coherency Extension) to main memory; auxiliary AXI ports

How to Build a Multi-Core Cluster? (I6500)

• MMU: enables an OS like Linux to manage memory efficiently, as well as enforcing memory isolation and protection
• SMT (multi-threading): up to 4 hardware threads make the I6500 behave like 4 CPUs; each hardware thread adds ~10% area over the 1st thread and boosts throughput by as much as 50%
• SIMD (Single Instruction Multiple Data): a vector-level arithmetic and Boolean engine, able to execute multiple math operations per cycle

[Core diagram: bus interface unit with optional MCP I/F (128-bit to CM); L1 instruction cache (32-64 KB); instruction fetch unit with 4 hardware threads and branch predict; MMU; instruction issue unit; integer execution pipes (ALU/Br, ALU/LS, MDU); SIMD pipe with SP/DP FPU; memory pipe with load/store address unit and data SPRAM (0/64KB); graduation unit; L1 data cache (32-64 KB); power management unit (PMU).]

MIPS I7200 Highlights

• Multi-threaded, multi-core 32-bit processor IP
• Designed for high-performance embedded systems with real-time requirements: LTE/5G communications processing, wireless router/gateway processors, data/packet processing

1. Leading performance: 50% higher than the previous generation, with top performance efficiency
2. Optimal real-time response via vertical multi-threading: zero-cycle context switch
3. Smallest code size from the new nanoMIPS™ ISA; ideal features for RTOS or Linux software systems

nanoMIPS ISA: Advancing Small Code Size

Achieves outstanding code density, targeting small-footprint applications or constrained devices. Native ISA for the MIPS I7200.

[Chart: total code size in bytes (smaller is better) for size-optimized (-Os) and performance-optimized (-O3) gcc builds: MIPS32 (MIPS interAptiv), Thumb2 (ARM Cortex R8, Cortex A53), MIPS16e2, and nanoMIPS (MIPS I7200); nanoMIPS produces the smallest code.]

Success Stories

Market-specific use cases and classical application segments

[Diagrams: a Fungible multi-cluster datacenter design (clusters 0-n with L2, ACE, PCIe, x86 host server and network interface) and an LTE modem design (L1 control CPU cluster with CPUs 1-4 and threads T1-T9, datapath DSPs, dual-issue multi-threaded pipeline, RF uplink/downlink).]

Feature: ADAS | LTE/5G | Datacenter
1. Application: autonomous driving | broadband LTE modem | converged infrastructure, distributed workload processing
2. Value prop: SMT, RT, IPC, VZ | SMT, RT, IPC, low power, small code | SMT, RT, IPC, distributed workload, VZ
3. Key drivers: heterogeneous, FuSa/SEooC, multi-cluster, low-latency embedded/RT features | high-efficiency class-leading 32b uArch, high-performance small-code-size ISA | threads/cores/clusters, low-latency payload processing
Likely customers: Denso, Renesas, NXP, OnSemi, Toshiba, Cypress, STM, Microchip, Bosch, ... | MTK, Qualcomm, Sony, Samsung, ... | Cisco, Broadcom, Marvell, Juniper, Google, Baidu, Argo, ...

Application Segment 1: ADAS

I6500-F lead customer: Mobileye ADAS Platform

The I6500-F is at the heart of heterogeneous coherent processing clusters in the next-gen EyeQ®5 SoC, designed to act as the central computer for sensor fusion in Fully Autonomous Driving (FAD) vehicles starting in 2020.

“Our EyeQ®5 SoC will be the most advanced solution of its kind for fully autonomous vehicles which will start rolling out in 2020. The ASIL B(D) features in the I6500-F are key to ensuring our chip achieves the highest level of safety.” – Elchanan Rushinek, SVP Engineering, Mobileye, 2018

MIPS I-Class vs Cortex A7 & A53 Performance:
1. Simultaneous multi-threading: a 45+% performance boost for 10% area
2. Less power, or more CPU headroom
3. The I6500 performs best across different workloads

[Chart: relative per-core performance normalized to Cortex A53 at the same frequency, for DMIPS (32b), CoreMark and SPECint2000 (rate); interAptiv, I6400/I6500 with 1 and 2 threads, Cortex A7 and A53. Two-threaded I6400/I6500 leads by +30% to +60%.]

• Based on Cortex A53 data reported by ARM on its website and/or in presentation materials, plus benchmarked results on Linaro (HiSilicon Octa A53 Kirin620) with Linux kernel 3.18.0-linaro-hikey SMP preempt, RFS Debian squeeze, with GCC-based 5.0.0 toolchain. A7 scores are ARM claims; measured results are lower.
• I6400 results are based on production-released RTL, FPGA platform benchmarking and, in the case of SPEC, performance-model testing of one enhancement for the next release.

Application Segment 2: LTE/5G

Success Story: Multi-threading & MediaTek I7200

MediaTek 5G/LTE modem: about MIPS in the application
• MediaTek designed MIPS into modems across multiple products
• First version, Helio X30, in production

MIPS advantages:
• Hardware multi-threading (MT) with fast inter-thread communications
• Higher performance and high processing efficiency
• Scalability: 1-4 cores, 2 threads per core
• Deterministic; real-time interrupts

[Diagram: LTE modem with an L1 control CPU cluster (CPUs 1-4, threads T1-T9), datapath DSPs, dual-issue multi-threaded pipeline, RF uplink/downlink; L1 (control)/L2/L3 protocol CPUs and L1 baseband.]

“MIPS CPUs, with their powerful multi-threading capability, offer a combination of efficiency and high throughput for LTE modems that contributes significantly to system performance,” said Dr. Kevin Jou, SVP and CTO, MediaTek, 2018.

I7200 Performance Advantage

CoreMark/MHz: 35-60% higher than interAptiv AND the competition!

[Chart: CoreMark/MHz, scale 0-7, for interAptiv (1 and 2 threads), I7200 (1 and 2 threads), Cortex A53 and Cortex R8; the two-threaded I7200 leads.]

ARM scores reported by ARM for AArch32; Thumb2 scores are likely 5-10% lower. MIPS scores are MIPS32 for interAptiv and nanoMIPS for I7200.

Application Segment 3: Datacenter

Datacenter Distributed Workload Processor

1. Scalable: multi-threading to multi-core to multi-cluster
2. Configurable: heterogeneous inside & outside
3. Optimized for high-throughput data processing applications; real-time, secure, deterministic and low latency

[Diagram: I6500 core with 4x SMT and L1 DSPRAM; I6500 clusters 0-n with L2 and ACE interfaces over a coherent fabric/NoC; PCIe uplink/downlink to a network interface and an x86 host server.]

“The MIPS Simultaneous Multi-Threading architecture is an important technique to ensure that such workloads run efficiently as measured by the CPU’s instructions per clock, or IPC. We have seen that this efficiency translates directly into a smaller area as well as lower power for silicon implementations based on MIPS.” – Pradeep Sindhu, CEO of Fungible, 2018

I6400 & I6500: Why Multi-Threading? A powerful differentiator among CPU IP cores

1. Why MT? A path to higher performance and higher efficiency: 30%-60% higher performance for a 10% area increase per thread
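The claim above can be sanity-checked with simple arithmetic: if each extra hardware thread costs ~10% core area and a second thread adds 30-60% throughput, performance per unit area improves even at the low end. A sketch (the linear area model is an assumption for illustration):

```python
# Back-of-envelope check of the SMT value proposition, under a simple
# linear model (an assumption): each extra hardware thread costs ~10%
# core area and lifts throughput by 30-60%.
BASE_AREA = 1.0
BASE_PERF = 1.0

def smt_perf_per_area(extra_threads, speedup):
    area = BASE_AREA + 0.10 * extra_threads
    perf = BASE_PERF * (1.0 + speedup)
    return perf / area

low = smt_perf_per_area(1, 0.30)
high = smt_perf_per_area(1, 0.60)
print(f"perf/area gain: {low:.2f}x to {high:.2f}x")  # 1.18x to 1.45x
```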

2. Easy to use: the programming model is the same as multi-core; a thread looks like a core to a standard SMP OS
3. Simultaneous/concurrent execution with zero-cycle-overhead context switching

[Diagram: threads 1-4 feed instruction queues, a hardware scheduler, and execution queues in a single I6500 core.]

[Chart: multi-threaded memcopies on a single I6400/I6500 core; relative throughput for 1C1T, 1C2T and 1C4T, with 4 threads delivering a >100% gain over 1 thread.]

MIPS IP Cores: Mapping to ARM

(Features as in the MIPS IP Cores Features Summary above, plus a sample ARM mapping: M3, M4, M23, M33, R52, R7, R8, A53, A57, A72 across the microAptiv-to-P6600 range, approximate and in column order.)

MIPS Open™: The New Standard in Open Use ISAs

Presented by Yuri Panchul, MIPS Open Technical Lead MIPS Open Meetup in Moscow, April 15, 2019

Wave + MIPS: A Powerful AI + IP Combination

• 1981: MIPS architecture invented by a team from Stanford; MIPS Computer Systems Inc. founded; first MIPS product ships (the R2000 uP)
• 1990s-2000s: MIPS becomes an IP licensing company; DTV/STB, cable modem and Wi-Fi residential gateway products ship
• 2010s: smartphone and LTE modem wins; enterprise networking; automotive MCUs & ADAS
• 2018: Wave acquires MIPS; Wave joins AWS, Facebook, Google & Samsung on UC Berkeley's BAIR initiative
• 2019: Wave launches MIPS Open; establishes the MIPS Open Advisory Board

MIPS Open™: Accelerating Innovation @ the Edge

• Accelerates innovation for system-on-chip designs

• Expands adoption of MIPS RISC architecture & grows supporting ecosystem

• Dedicated Advisory Board prevents architectural fragmentation

• Delivers peace of mind & investment protection for existing MIPS customers

Decades of silicon-proven innovation, fueling next-gen edge designs

ADAS TVs Smartphones Autonomous vehicles Visual analytics

Routers Set-top boxes Switches IoT HPC

MIPS Open™ Benefits

Proven Approach | Established Architecture | Matured Ecosystem | Unified Technology

• Proven approach: compatibility; same tools and software; flexibility (UDI support)
• Established architecture: MIPS architecture-based designs are shipping in billions of devices; shipping products include wired modems, wireless modems (LTE/5G/Wi-Fi), IoT, automotive (ADAS), data centers, AI and others
• Matured ecosystem: well-established tools, software, applications and boards, supported and used by customers and partners
• Unified technology: 30+ years of MIPS architecture legacy; no fragmentation; an enabler to build the next trillion devices/applications

MIPS Open is the World’s First Commercial-ready, Silicon Proven, Feature-Rich Architecture Available for Open Use

MIPS Open™ Model

Training/Teaching/Evaluation → Development (Architecture to Core) → Design & Build (Core to SoC) → Sell

• Open-source tools, software and SDK; MIPS Open FPGA; 32-/64-bit ISA with DSP/SIMD/VZ/MT; verification suite
• Everyone (student, university, professional) can go through MIPS Open Core Developer Certification to produce a certified core
• License cores to semiconductor vendors; developed cores go into SoC #1 ... SoC #n
• Partner tools and software SDKs, partner design services, customer and 3rd-party IP

MIPSOpen.com Now Live

MIPS Open™ Components

MIPS Open ISA | MIPS Open Tools | MIPS Open FPGA | MIPS Open Cores

• MIPS Open ISA: latest R6 version of the 32-bit/64-bit ISA; extensions such as virtualization, multi-threading, SIMD, DSP and microMIPS; targeted for embedded applications
• MIPS Open Tools: IDE for embedded RTOS and Linux Embedded Edition; enables MIPS software developers to build, debug and deploy applications on MIPS-based hardware and software platforms
• MIPS Open FPGA: Getting Started Guide, providing the MIPS Open FPGA system as a set of Verilog files; Labs, with 25 hands-on labs that guide users in exploring computer architecture & system-level design; SoC, showing how to build a system-on-chip design based on MIPS Open FPGA that loads open-source Linux
• MIPS Open Cores: low-power, low-footprint microAptiv cores; microprocessor (MPU) and microcontroller (MCU) variants

MIPS Open Components include the latest MIPS R6 ISA, low-power, low-footprint microAptiv cores, a microAptiv soft core targeted for FPGA, and tools for software development.

MIPS Open™ Features

• Comprehensive Package: a complete package (instruction set, cores, tools) for the community to accelerate innovation at the edge
• Open Use License: includes the latest MIPS R6 architecture documents, microAptiv cores, tools developed using the MIPS R6 ISA, and a microAptiv soft core targeted for FPGA
• Certification: access to certification services for core developers; verification and certification of cores; use of the “MIPS Certified” trademark logo for certified cores
• Patent License: right and license under R6 architecture patents to design, build and sell cores

MIPS Open provides a comprehensive open-use package and tools to enable fast time-to-market for certified cores.

Evolving the MIPS Ecosystem for AI at the Edge

Building on an established ecosystem… Growing new markets with AI

Mixed signal · Memory · DSP · NoC · I/O & accelerators

...creating an AI solution stack that enables end-to-end business models

Today → Tomorrow

Wave Computing © 2019: IPro Group Technology Day, 10 April 2019

MIPS Open™ Advisory Committee Membership Levels


MIPS Open™ Timeline

• Dec 17th, 2018
• Feb 19th, 2019
• March 28th, 2019
• Q2'19: ‘Open Use’ microAptiv UC/UP cores

Thank You
