Compute Express Link™ (CXL™): Memory and Cache Protocols

Compute Express Link™ (CXL™): Memory and Cache Protocols

Compute Express Link™ (CXL™): Memory and Cache Protocols Robert Blankenship Intel Corporation 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 1 Topics What is CXL? Caching 101 CXL Caching Protocol CXL Memory Protocol Merging Memory and Caching for Accelerators 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 2 What is CXL? • CXL runs across the standard PCIe physical layer with new protocols x16 PCIe x16 CXL Card Card optimized for cache and memory • CXL uses a flexible processor Port that X16 Connector can auto-negotiate to either the PCIe channel SERDES standard PCIe transaction protocol or Connector the alternate CXL transaction protocols etc. • First generation CXL aligns to 32 GT/s PCIe 5.0 specification Processor • CXL usages expected to be key driver for an aggressive timeline to PCIe 6.0 architecture 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 3 CXL Uses 3 Protocols • IO (CXL.io) → Same as PCI Express® • Memory (CXL.mem) → Memory Access from CPU Host to Device • Cache (CXL.cache) → Coherent Access from Device to CPU Host 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 4 Representative CXL Usages 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 5 Caching 101 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 6 Caching Overview Caching temporarily brings data closer to the consumer Host Memory Access Latency: ~200ns Shared Bandwidth: 100+ GB/s Improves latency and bandwidth using prefetching and/or locality Data • Prefetching: Loading Data into cache before it is required Read • Spatial Locality (locality is space): Access address X then Local Data Cache X+n Access Latency: ~10ns • Temporal Locality (locality in Time): Multiple access to the Dedicated Bandwidth: 100+ GB/s same Data Data Read Accelerator 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 7 CPU Cache/Memory Hierarchy with CXL ~10 GB – Directly Connected Memory CXL.mem CXL.mem • Modern CPUs have 2 or (aka DDR) ~10 GB ~10 GB more levels of coherent Home Agent cache Coherent CPU-to-CPU Symmetric Links • Lower levels (L1), smaller CPU Socket 1 in capacity with lowest ~10 MB – L3 (aka LLC) latency and highest Wr Wr CXL.Cache ~500 KB – L2 ~500 KB – L2 Cache Cache ... CXL.$ ~50KB ~50KB bandwidth per source. CXL.io CXL.io PCIe L1 L1 L1 L1 PCIe • Higher levels (L3), less ~50 KB ~50 KB ... ~50 KB ~50 KB CPU CPU CPU CPU bandwidth per source but CPU Socket 0 much higher capacity and support more sources ~1 MB ~1 MB • Device caches are expected Device Device to be up to 1MB. Note: Cache/Memory capacities are examples and not aligned to a specific product. 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 8 Cache Consistency How do we make sure updates in cache are visible to other agents? • Invalidate all peer caches prior to update • Can managed with software or hardware CXL uses hardware coherence Define a point of “Global Observation” (aka GO) when new data is visible from writes Tracking granularity is a “cacheline” of data 64-bytes for CXL All addresses are assumed to be Host Physical Address (HPA) in CXL cache and memory protocols Translations using existing Address Translation Services (ATS). 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 9 Cache Coherence Protocol • Modern CPU caches and CXL are built on M,E,S,I protocol/states • Modified – Only in one cache, Can be read or written, Data NOT up-to-date in memory • Exclusive – Only in one cache, Can be read or written, Data IS up-to-date in memory • Shared – Can be in many caches, Can only be read, Data IS up-to-date in memory • Invalid – Not in cache • M,E,S,I is tracked for each cacheline address in each cache • Cacheline address in CXL is Addr[51:6] • Notes: • Each level of the CPU cache hierarchy follows MESI and layers above must be consistent • Other extended states and flows are possible but not covered in context of CXL 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 10 How are Peer Caches Managed? • All peer caches managed by the “Home Agent” within the cache level Hidden from CXL device • A “Snoop” is the term for the Home to check cache state and may cause cache state changes • CXL Snoops: • Snoop Invalidate (SnpInv): Cache to degrade to I-state, and must return any Modified data • Snoop Data (SnpData): Cache to degrade to S-state, and must return any Modified data. • Snoop Current (SnpCurr): Cache state does not change, but must return any Modified data 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 11 CXL Cache Protocol 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 12 Cache Protocol Summary • Simple set of 15 cache-able reads and writes from the device to host memory • Keep complexity of global coherence management in the host 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 13 Cache Protocol Channels 3 channels in each direction: D2H vs H2D Data and RSP channels are pre-allocated D2H Requests from the device H2D Requests are snoops from the host Ordering: H2D Req (Snoop) push H2D RSP 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 14 Read Flow • Diagram to show message flows in time • X-axis: Agents • Y-axis: Time 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 15 Read Flow • Diagram to show message flows in time • X-axis: Agents • Y-axis: Time 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 16 Mapping Flow Back to CPU Hierarchy ~10 GB – Directly Connected Memory CXL.mem CXL.mem (aka DDR) ~10 GB ~10 GB Home Agent Coherent CPU-to-CPU Symmetric Links CPU Socket 1 ~10 MB – L3 (aka LLC) Wr Wr CXL.Cache ~500 KB – L2 ~500 KB – L2 Cache Cache ... CXL.$ ~50KB ~50KB CXL.io CXL.io PCIe L1 L1 L1 L1 PCIe ~50 KB ~50 KB ... ~50 KB ~50 KB CPU CPU CPU CPU CPU Socket 0 CXL~1 MB ~1 MB DeviceDevice Device 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 17 Mapping Flow Back to CPU Hierarchy • Peer Cache can be: ~10 GB – Directly Connected Memory CXL.mem CXL.mem • Peer CXL Device with (aka DDR) ~10 GB ~10 GB Cache Home Agent • CPU Cache in Local Socket Coherent CPU-to-CPU Symmetric Links Peer Cache • CPU Cache in Remote CPU Socket 1 Socket ~10 MB – L3 (aka LLC) Wr Wr CXL.Cache ~500 KB – L2 Peer~500 KBCache – L2 Cache Cache Peer Cache ... CXL.$ Peer Cache ~50KB ~50KB CXL.io CXL.io PCIe L1 L1 L1 L1 PCIe ~50 KB ~50 KB ... ~50 KB ~50 KB CPU CPU CPU CPU CPU Socket 0 CXL~1 MB ~1 MBPeer DeviceDevice DeviceCache 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 18 Mapping Flow Back to CPU Hierarchy Memory Memory • Peer Cache can be: Controller Controller ~10 GB – DirectlyMemory Connected Memory CXL.mem CXL.mem • Peer CXL Device with Memory Controller(aka DDR) ~10 GB ~10 GB Cache Controller • CPU Cache in Local Socket HomeHome Agent Home • CPU Cache in Remote Socket Coherent CPU-to-CPU Symmetric Links Peer Cache • Memory Controller CPU Socket 1 can be: ~10 MB – L3 (aka LLC) Wr Wr CXL.Cache • Native DDR on Local ~500 KB – L2 Peer~500 KBCache – L2 Cache Cache Peer Cache ... CXL.$ Peer Cache Socket ~50KB ~50KB CXL.io CXL.io PCIe • Native DDR on Remote L1 L1 L1 L1 PCIe ~50 KB ~50 KB ~50 KB ~50 KB Socket ... • CXL.mem on peer Device CPU CPU CPU CPU CPU Socket 0 CXL~1 MB ~1 PeerMB DeviceDevice DeviceCache 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 19 Example #2: Write CXL Peer • For Cache Writes there Device Cache Home I S are three phases: Memory • Ownership Controller • Silent Write • Cache Eviction Legend Cache State: Modified Exclusive Shared Invalid Allocate Tracker Deallocate Tracker 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 20 Example #2: Write CXL Peer • For Cache Writes there Device Cache Home are three phases: I S Ownership Memory • Controller • Silent Write Ownership • Cache Eviction S I I E Legend Cache State: Modified Exclusive Shared Invalid Allocate Tracker Deallocate Tracker 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 21 Example #2: Write CXL Peer • For Cache Writes there Device Cache Home are three phases: I S Memory • Ownership Controller • Silent Write Ownership • Cache Eviction S I I E Legend Write Cache State: <Silent Wr> E M Modified Exclusive Shared Invalid Allocate Tracker Deallocate Tracker 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 22 Example #2: Write CXL Peer • For Cache Writes there Device Cache Home are three phases: I S Memory • Ownership Controller • Silent Write Ownership • Cache Eviction S I I E Legend Write Cache State: <Silent Wr> Modified E M Exclusive Eviction Shared M I Invalid Data Allocate Tracker Deallocate Tracker 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 23 Example #3: Steaming Write CXL Peer • Direct Write to Host Device Cache Home • Ownership + Write in a I S Memory single flow. Controller • Rely on completion to S I indicate ordering • May see reduced Data bandwidth for ordered Legend traffic Cache State: Modified • Host may install data into Exclusive Shared LLC instead of writing to Invalid memory Allocate Tracker Deallocate Tracker 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 24 15 Request in CXL • Reads: RdShared, RdCurr, RdOwn, RdAny • Read-0: RdownNoData, CLFlush, CacheFlushed • Writes: DirtyEvict, CleanEvict, CleanEvictNoData • Streaming Writes: ItoMWr, MemWr, WOWrInv, WrInv(F) 2020 Storage Developer Conference. © CXL™ Consortium. All Rights Reserved. 25 CXL Memory Protocol 2020 Storage Developer Conference. © CXL™ Consortium.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    44 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us