Ncore™ Cache Coherent Interconnect Technology Overview, 24 May 2016

Craig Forrest, Chief Technology Officer
David Kruckemyer, Chief Hardware Architect

Copyright © 2016 Arteris

Contents

○ About Arteris
○ Caches, Cache Coherency and Challenges
○ Introducing Ncore Cache Coherent Interconnect
○ Summary

Arteris: The on-chip interconnect leader

Arteris Product Milestones
○ Founded in 2003 to pioneer network-on-chip (NoC) interconnect
○ NoC Solution = first released NoC implementation in 2005
○ FlexNoC® = second generation Arteris NoC in 2009/2010
○ FlexPSI = die-to-die or chip-to-chip parallel interface in 2013
○ FlexNoC Resilience Package™ = Functional Safety option in 2014
○ FlexNoC Physical™ = Physically aware IP with FlexNoC Version 3 in 2015
○ Ncore™ Cache Coherent Interconnect = Heterogeneous cache coherency in 2016

Company
○ Headquarters and Engineering Development in Campbell, USA
○ Worldwide support offices (USA, France, China, Korea, India, Japan)

[Figure: Awards]

[Figure: Customer Adoption*, bar chart by year, 2006 to 2016]

* Customer data current as of 1 May 2016

Arteris has become the standard for complex and low-power SoCs

Customers shipped > 1B SoCs as of 2015

[Figure: three cumulative bar charts by year*. Design Starts, 2006 to 2016 (headline total 240); Tape-Outs, 2007 to 2016 (headline total 146); Chips Produced, 2008 to 2016 (headline total 108).]

*Data is cumulative. Design data is customer-reported and subject to change. Data is current as of 1 May 2016.

Arteris Customers: Arteris technology is becoming a standard

Current as of 1 May 2016

Mobility
[Customer logos, including a very large SoC maker]

Automotive, IoT (Internet of Things), Camera & CE (Consumer Electronics)
[Customer logos, including Toshiba, major automotive and CE OEMs, automotive SoC makers, a Japan Tier 1, a large SoC maker, and a drone maker]

SSD (Solid State Drive), Networking & Automation
[Customer logos, including two major SSD vendors, three defense contractors, a silicon foundry, and a major IP provider]

Arteris interconnect IP now covers coherent and non-coherent use cases

[Figure: example SoC block diagram highlighting the Arteris Interconnect IP Products: Ncore™ Cache Coherent Interconnect, FlexNoC® Non-coherent Interconnect, FlexWay® Interconnect, InterChip Links™, and the Memory Scheduler. Labeled subsystems: CPU Subsystem (A57 and A53 clusters with L2 caches), Design-Specific Subsystems, Application IP Subsystem, DSP Subsystem (A/V), GPU Subsystem, Security Subsystem, Wireless Subsystem, Memory Subsystem, High Speed Wired Peripherals, and I/O Peripherals.]


Modern SoC Design Challenges

○ SCALABILITY: How to scale systems up as the number of coherent agents increases?

○ HETEROGENEITY: How to integrate coherent processing elements using different protocols, different semantics, or having different cache characteristics?

○ SYSTEM INTEGRATION: How to integrate IP that is not cache coherent and achieve better performance?

○ PHYSICAL DESIGN: How to create a cache coherent system that is easily placed on chip?

○ POWER MANAGEMENT: How to optimize power consumption of complex systems?

Why Caches?

○ Caches are small, fast memories tightly coupled to processing elements

○ Reduced average memory latency means higher performance
  • Temporal locality
  • Spatial locality

○ High bandwidth due to high frequency and wide interfaces

○ Fewer off-chip DRAM accesses resulting in lower power consumption
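
The latency point above can be made concrete with the standard average memory access time relation (hit time plus miss rate times miss penalty). The following is a minimal sketch; the function name and all latency and miss-rate numbers are illustrative assumptions, not figures from this presentation.

```python
def average_access_time_ns(hit_time_ns: float, miss_rate: float,
                           miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Illustrative numbers only (assumptions, not measurements).
# Without a cache, every access pays the full DRAM latency.
no_cache = average_access_time_ns(hit_time_ns=0.0, miss_rate=1.0,
                                  miss_penalty_ns=100.0)
# With a cache, locality keeps the miss rate low, so most accesses are fast.
with_cache = average_access_time_ns(hit_time_ns=2.0, miss_rate=0.05,
                                    miss_penalty_ns=100.0)

print(f"without cache: {no_cache:.1f} ns, with cache: {with_cache:.1f} ns")
```

The lower miss rate in the second call stands in for temporal and spatial locality; the same reduction in misses is what cuts off-chip DRAM accesses.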

Why Cache Coherency?

○ Caches create multiple copies of data
  • Managing these copies in software is difficult

○ Hardware cache coherency creates the illusion of a flat, shared memory
  • Caches are invisible to software
  • Multiple copies are kept consistent

○ But… managing copies in hardware requires a lot of communication
  • Must check every place there may be a valid copy → Snoop
  • Snoop filters reduce communication by tracking cache contents
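
To illustrate how tracking cache contents reduces snoop traffic, here is a small sketch that compares broadcast snooping with filtered snooping. The `SnoopFilter` class, its methods, and the cache names are hypothetical and chosen for illustration; they do not describe Ncore's actual implementation.

```python
from collections import defaultdict


class SnoopFilter:
    """Hypothetical model: remembers which caches may hold each line."""

    def __init__(self):
        self._sharers = defaultdict(set)  # line address -> set of cache names

    def record_fill(self, cache_id, address):
        self._sharers[address].add(cache_id)

    def record_evict(self, cache_id, address):
        self._sharers[address].discard(cache_id)

    def snoop_targets(self, requester, address):
        # Only caches that may hold the line need to be snooped.
        return self._sharers.get(address, set()) - {requester}


def broadcast_targets(all_caches, requester):
    # Without a filter, every other cache must be checked.
    return set(all_caches) - {requester}


caches = ["cpu0", "cpu1", "gpu", "dsp"]
sf = SnoopFilter()
sf.record_fill("cpu1", 0x1000)  # cpu1 now holds line 0x1000

broadcast = broadcast_targets(caches, "cpu0")
filtered_hit = sf.snoop_targets("cpu0", 0x1000)   # line is tracked
filtered_miss = sf.snoop_targets("cpu0", 0x2000)  # line is not cached anywhere

print(len(broadcast), len(filtered_hit), len(filtered_miss))
```

In this toy setup a broadcast costs three snoops per request, while the filter sends one snoop when the line is cached and none when it is not.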


Ncore Cache Coherent Interconnect IP

[Figure: Ncore connecting three groups of agents. Coherent agents: CPU cluster and GPU, each with a cache ($). Non-coherent agents: image processing, display processing, peripherals, and non-coherent subsystems. Memory agents: DRAM and SRAM.]

Ncore Interconnect Architecture

[Figure: Ncore architecture block diagram. Labeled blocks: caches ($) attached to coherent agent interfaces; a non-coherent subsystem attached to non-coherent bridges with proxy caches ($); a directory containing multiple snoop filters; coherent memory interfaces; CCTI.]

Coherent Read Example – Cache Hit

[Figure: Ncore architecture block diagram annotated with steps ❶ to ❸. A consumer cache and a producer cache attach to coherent agent interfaces; step ❷ is marked at the directory.]

Coherent Read Example – Cache Misses

[Figure: Ncore architecture block diagram annotated with steps ❶ to ❹. A consumer cache attaches to a coherent agent interface; later steps are marked at the directory's snoop filters, at a coherent memory interface, and at memory.]
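
The two read examples can be approximated in a few lines of code. Because the step annotations do not survive in the figures, the ordering below (request, directory lookup, then either a snoop of the producer's cache or a memory read) is an assumption, and the `Directory` class and its methods are hypothetical names used only for this sketch.

```python
class Directory:
    """Toy directory: one owner per line, memory as a plain dict."""

    def __init__(self, memory):
        self.memory = memory  # address -> data, stands in for DRAM/SRAM
        self.owner = {}       # snoop-filter state: address -> owning cache
        self.caches = {}      # cache name -> {address: data}

    def add_cache(self, name):
        self.caches[name] = {}

    def write(self, name, address, data):
        # The producer writes into its own cache; ownership is recorded.
        self.caches[name][address] = data
        self.owner[address] = name

    def read(self, requester, address):
        if address in self.caches[requester]:
            return self.caches[requester][address], ["local hit"]
        trace = [f"request from {requester}", "directory lookup"]
        owner = self.owner.get(address)
        if owner is not None:
            # Cache hit elsewhere: fetch the line from the owning cache.
            trace.append(f"snoop {owner}")
            data = self.caches[owner][address]
        else:
            # Miss in every tracked cache: go to memory.
            trace.append("memory read")
            data = self.memory[address]
        self.caches[requester][address] = data
        return data, trace


d = Directory(memory={0x40: "from-memory", 0x80: "stale"})
d.add_cache("consumer")
d.add_cache("producer")
d.write("producer", 0x80, "fresh")

hit_data, hit_trace = d.read("consumer", 0x80)    # served by the producer
miss_data, miss_trace = d.read("consumer", 0x40)  # served by memory
print(hit_data, hit_trace)
print(miss_data, miss_trace)
```

The hit path returns the producer's up-to-date copy rather than the stale value in memory, which is the property hardware coherency is meant to guarantee.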

Ncore Benefits

1. True heterogeneous coherency
2. Highly scalable systems
3. Higher performance with non-coherent IP
4. Lower power consumption
5. Easier chip floorplanning

Benefit #1: True heterogeneous coherency

Two features are primarily responsible for enabling Ncore’s unique heterogeneous cache coherency capabilities:

1. Support for multiple coherence models

2. Use of multiple configurable snoop filters to accommodate different cache organizations

Benefit #1: True heterogeneous coherency
Support for heterogeneous coherent agents

○ Cache coherent agents can differ greatly, which increases the difficulty in integrating them into a system-on-chip
  • Logical – coherence models
  • Physical – cache organization, transaction table sizes

○ Ncore adapts to each coherent agent’s behavior and characteristics
  • Coherent agent interfaces adapt individual coherence models to a generic model using a lightweight messaging layer
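
One way to picture adapting individual coherence models to a generic one is a per-agent translation table. The presentation does not define Ncore's generic model, so the `GenericState` attributes below are an assumption; MESI and MOESI are used only as familiar examples of differing native models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericState:
    """Assumed generic view of a cache line (not Ncore's actual model)."""
    valid: bool
    unique: bool  # no other cache may hold a copy
    dirty: bool   # this cache is responsible for writing the line back


# Native MESI states expressed in the generic attributes.
MESI = {
    "M": GenericState(valid=True, unique=True, dirty=True),
    "E": GenericState(valid=True, unique=True, dirty=False),
    "S": GenericState(valid=True, unique=False, dirty=False),
    "I": GenericState(valid=False, unique=False, dirty=False),
}

# MOESI adds an Owned state: shared, but responsible for write-back.
MOESI = dict(MESI, O=GenericState(valid=True, unique=False, dirty=True))


def to_generic(model, native_state):
    """Translate an agent's native state name into the generic view."""
    return model[native_state]


print(to_generic(MOESI, "O"))
print(to_generic(MESI, "E") == to_generic(MOESI, "E"))
```

With such a table in each agent interface, the rest of the system only has to reason about the generic attributes, whichever native model an agent uses.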

Benefit #1: True heterogeneous coherency
Coherent agent interfaces adapt individual coherence models to a generic model

[Figure: Ncore architecture block diagram (coherent agent interfaces, non-coherent bridges with proxy caches, directory with snoop filters, coherent memory interfaces, CCTI).]

Benefit #1: True heterogeneous coherency
With multiple configurable snoop filters

○ Cache coherent agents can have very different behaviors
  • Cache organization
  • Coherency models
  • Workloads

○ Associating caching agents that share common properties with individual snoop filters can consume less die area than a monolithic snoop filter

[Figure: Ncore architecture block diagram (coherent agent interfaces, proxy caches and bridges for the non-coherent domain, directory with multiple snoop filters, coherent memory interfaces, CCTI).]

Benefit #1: True heterogeneous coherency
Multiple snoop filters are more area-efficient than one

[Figure: four caches, A, B, C, and D. Traditional approach: one monolithic snoop filter (X) tracks A, B, C, and D. Ncore approach: snoop filter #1 (Y) tracks A and B; snoop filter #2 (Z) tracks C and D.]

Multiple snoop filters are smaller: area(Y+Z) < area(X)
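
A rough storage estimate shows one way the inequality can arise. The model below is an assumption made for illustration: each filter entry stores an address tag plus one presence bit per tracked cache, and the entry counts, tag width, and cache sizes are invented numbers. Real savings depend on the cache organizations involved.

```python
def filter_bits(entries: int, tag_bits: int, tracked_caches: int) -> int:
    """Storage for one snoop filter under the assumed entry format:
    an address tag plus one presence bit per tracked cache."""
    return entries * (tag_bits + tracked_caches)


LINES_PER_CACHE = 8192  # assumed: 512 KiB cache with 64-byte lines
TAG_BITS = 30           # assumed tag width

# Monolithic filter X tracks caches A, B, C and D.
x = filter_bits(4 * LINES_PER_CACHE, TAG_BITS, tracked_caches=4)
# Filter Y tracks A and B; filter Z tracks C and D.
y = filter_bits(2 * LINES_PER_CACHE, TAG_BITS, tracked_caches=2)
z = filter_bits(2 * LINES_PER_CACHE, TAG_BITS, tracked_caches=2)

print(f"X = {x} bits, Y + Z = {y + z} bits")
```

In this simplified model the saving comes only from the narrower presence vectors; grouping caches with similar organizations can also let each filter be sized and organized to fit its group.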


Benefit #2: Highly scalable systems
With a configurable, modular approach

○ Transaction processing and data bandwidth scaling
  • Each component can be scaled individually (add or subtract components)
  • Ports per component can be scaled individually (add or remove ports)

○ Why is configurable interconnect superior to fixed-function, centralized controllers?
  • Meet performance goals without wasted resources
  • Easily adjust system design as requirements evolve
  • Build derivative chips based on the same platform
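
The two scaling knobs above (component count and ports per component) can be sketched as a simple peak-bandwidth calculation. The `ComponentGroup` class, its fields, and the numbers are hypothetical; they are not Ncore configuration parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ComponentGroup:
    """Hypothetical description of one kind of interconnect component."""
    name: str
    instances: int
    ports_per_instance: int
    port_width_bytes: int
    clock_mhz: float

    def peak_bandwidth_gb_per_s(self) -> float:
        # Bytes moved per cycle across all ports, times cycles per second.
        bytes_per_cycle = (self.instances * self.ports_per_instance
                           * self.port_width_bytes)
        return bytes_per_cycle * self.clock_mhz * 1e6 / 1e9


# Illustrative baseline: one memory interface with one 16-byte port at 1 GHz.
base = ComponentGroup("coherent_memory_interface", instances=1,
                      ports_per_instance=1, port_width_bytes=16,
                      clock_mhz=1000.0)

more_components = replace(base, instances=2)      # add a component
more_ports = replace(base, ports_per_instance=2)  # or add a port

for cfg in (base, more_components, more_ports):
    print(cfg.instances, cfg.ports_per_instance, cfg.peak_bandwidth_gb_per_s())
```

Either knob doubles the peak figure in this model, so a derivative chip can be described by changing one field rather than redesigning a fixed-function controller.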

Benefit #2: Highly scalable systems
Add more components or ports to scale bandwidth

[Figure: Ncore architecture block diagram with two callouts: "Add more components…" and "…or add more ports".]


Benefit #3: Higher performance with non-coherent IP
Using configurable proxy caches

Advantages (new and novel)
1. Better for sharing data between non-coherent agents and coherent agents
2. Better for sharing data between non-coherent agents

○ Using a proxy cache minimizes communication through DRAM

○ Additional system benefits
  • Pre-fetch effect – fetch cache lines vs. individual data
  • Write-gathering benefit – writes accumulated in cache
  • Optimizes coherent memory accesses
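
The write-gathering benefit can be illustrated with a toy proxy cache that accumulates small writes into whole lines before anything reaches DRAM. The `ProxyCache` class, the 64-byte line size, and the access pattern are assumptions made for this sketch, not a description of Ncore's proxy cache.

```python
LINE_BYTES = 64  # assumed cache line size


class ProxyCache:
    """Toy write-gathering cache in front of DRAM."""

    def __init__(self):
        self.dirty_lines = {}  # line base address -> bytearray(LINE_BYTES)
        self.dram_writes = 0   # number of line write-backs issued

    def write(self, address: int, data: bytes) -> None:
        offset = address % LINE_BYTES
        if offset + len(data) > LINE_BYTES:
            raise ValueError("write crosses a line boundary")
        line = address - offset
        buf = self.dirty_lines.setdefault(line, bytearray(LINE_BYTES))
        buf[offset:offset + len(data)] = data  # gather into the cached line

    def flush(self) -> None:
        # Each dirty line goes to DRAM as a single write-back.
        self.dram_writes += len(self.dirty_lines)
        self.dirty_lines.clear()


# A non-coherent agent issues sixteen 4-byte writes covering one line.
proxy = ProxyCache()
small_writes = [(addr, bytes([addr % 256] * 4)) for addr in range(0, 64, 4)]
for addr, payload in small_writes:
    proxy.write(addr, payload)
proxy.flush()

writes_without_cache = len(small_writes)  # one DRAM access per small write
print(writes_without_cache, proxy.dram_writes)
```

Here sixteen small writes collapse into a single line write-back; the pre-fetch effect is the read-side counterpart, where one line fetch serves several later small reads.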

Copyright © 2016 Arteris 26 Benefit #3: Higher performance with non-coherent IP Sharing between non-coherent & coherent agents Using configurable proxy caches Consumer ❷ Producer Cache ($) ⋯ Cache ($) ❸ ❶ coherent - coherent Non

Coherent Agent Coherent Agent Cache ($) Proxy Interface Interface ⋯ Bridge ❺

Directory

Snoop ⋯ Filter Non-coherent Snoop CCTI Filter Subsystem Snoop coherent - coherent Non Filter Cache ($) Proxy Snoop Filter ❹ Bridge ⋱ Coherent Coherent Memory Memory Interface ⋯ Interface

Copyright © 2016 Arteris 27 Benefit #3: Higher performance with non-coherent IP Sharing between non-coherent agents Using configurable proxy caches ❷ Producer Consumer Cache ($) ⋯ Cache ($) ❶ ❸ coherent - coherent Non

Coherent Agent Coherent Agent Cache ($) Proxy Interface Interface ⋯ Bridge

Directory ❹

Snoop ⋯ Filter Non-coherent Snoop CCTI Filter Subsystem Snoop coherent - coherent Non Filter Cache ($) Proxy Snoop Filter Bridge ⋱ Coherent Coherent Memory Memory Interface ⋯ Interface


Benefit #4: Lower power consumption
With multiple clock and voltage domains

[Figure: Ncore architecture block diagram (coherent agent interfaces, non-coherent bridges with proxy caches, directory with snoop filters, coherent memory interfaces, CCTI).]


Benefit #5: Easier chip floorplanning
With a highly distributed architecture

Hub- and crossbar-based coherent interconnects require significant contiguous reserved die area

[Figure. Source: Andrei Frumusanu, AnandTech]

○ Reserve less area for cache coherent interconnect
  • Place it in existing “white space” routing channels – easier P&R
○ Locate modular Ncore components closer to critical IP – better timing
○ Minimize wiring congestion


Summary

Ncore™ Cache Coherent Interconnect IP is targeted at heterogeneous SoCs.

Benefits
○ Scalability
○ Configurability
○ Area efficiency
○ High performance
○ Optimal power consumption

Major Unique Features
○ Multiple configurable snoop filters
○ Multiple configurable proxy caches
○ Modular distributed architecture

RESULT: Custom-configured interconnect IP that meets exact system requirements

To request more information, visit us at http://www.arteris.com/contact
