Pmem Was Achieved with Spec Version 1.0 and the Rest Has Been Value on Top of That

Update on the SNIA Persistent Memory Programming Model in Theory and Practice
Andy Rudoff, Intel
2018 Storage Developer Conference. © Intel Corporation. All Rights Reserved.

Agenda
  • Why create the NVM Programming TWG?
  • What the NVM Programming Model means to most people
  • Details on actual implementations, specific to:
      • Intel
      • Linux
      • Windows
      • Virtualization
  • Areas unspecified by NVMP
  • Future work

What Motivated Us to Create the TWG
  • Concerning direction in ecosystem
  • Products were emerging with private APIs
  • ISVs forced to choose to "lock in" to a product
  • Lots of misinformation, conflicting APIs
  • But my reasons (Intel) were…

  • Big and Affordable Memory
  • 128, 256, 512GB
  • High Performance Storage
  • DDR4 Pin Compatible
  • Direct Load/Store Access
  • Hardware Encryption
  • Native Persistence
  • High Reliability

What Everyone Should Know About Persistent Memory
  • There are many ways to use it without modifying your program or even knowing it is installed in the system
  • Some applications will want direct access to it
  • Best way to fully leverage what it can do
  • The programming model is for those apps
  • Libraries like PMDK build on it too

The SNIA Programming Model
  • High order bits
  • Model, not API
  • In-kernel and User space
  • To be honest…
  • My goal of using memory-mapped files for pmem was achieved with spec version 1.0 and the rest has been value on top of that
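The memory-mapped-file goal mentioned above can be illustrated with a minimal sketch. It uses Python's standard `mmap` module on an ordinary temporary file, not on real persistent memory, and the helper names (`make_pool`, `store_and_flush`, `read_back`) are hypothetical. It only shows the shape of the model: map a file, store with plain memory writes, then flush through the standard API.

```python
import mmap
import os
import tempfile


def make_pool(size: int = 4096) -> str:
    """Create a zero-filled temporary file standing in for a pmem-backed file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size)
    return path


def store_and_flush(path: str, offset: int, payload: bytes) -> None:
    """Map the file, store bytes with ordinary memory writes, then flush."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the whole file
            m[offset:offset + len(payload)] = payload  # plain store into the mapping
            m.flush()  # on POSIX this calls msync() for the mapped range


def read_back(path: str, offset: int, length: int) -> bytes:
    """Read the bytes back through the regular file API."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)
```

On a DAX-mapped file an application could instead flush from user space with CPU instructions, which is outside what this standard-library sketch can show.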
Doug's FMS Slides on the Four Modes

                      Block Mode Innovation    Emerging PM Technologies
                      IO                       Persistent Memory
  User View           NVM.FILE                 NVM.PM.FILE
  Kernel Protected    NVM.BLOCK                NVM.PM.VOLUME
  Media Type          Disk Drive               Persistent Memory
  NVDIMM              Disk-Like                Memory-Like

The current version (1.2) of the specification is available at https://www.snia.org/sites/default/files/technical_work/final/NVMProgrammingModel_v1.2.pdf

My Summary of the Programming Model
[Diagram. User space: Management UI and Applications, using a Management Library, Standard Raw Device Access, Standard File API, and Load/Store ("DAX"). Kernel space: File System, pmem-Aware File System, MMU Mappings, Generic NVDIMM Driver. Below the kernel: Persistent Memory. The access paths are labeled "file" and "memory".]

Intel-Specific Implementation Details
  • Communicating with the OS
      • ACPI 6.0+
      • NFIT
      • SMART
      • E820 table
      • HMAT
      • UEFI
      • BTT
  • DSMs for communicating with NVDIMMs
      • Not a standard
  • CPU
      • Cache Flush, PCOMMIT, ADR, eADR, Deep Flush

Intel: How the Hardware Works
[Diagram: a store (MOV) travels from the core through the CPU caches (L1, L2, L3) and the WPQ to the DIMM.]
  • Flushing the CPU caches: CLWB + fence, -or- CLFLUSHOPT + fence, -or- CLFLUSH, -or- NT stores + fence, -or- WBINVD (kernel only)
  • Flushing the WPQ: ADR, -or- WPQ Flush (kernel only)
  • Minimum required power fail protected domain: memory subsystem
  • Custom power fail protected domain, indicated by ACPI property: CPU cache hierarchy
  • Not shown: MCA, ADR failure detection

App Responsibilities
Decision flow at program initialization:
  1. DAX mapped file? (OS provides info)
       no:  Use standard API for flushing (msync/fsync or FlushFileBuffers)
       yes: continue with 2
  2. CPU caches considered persistent? (ACPI provides info)
       yes: Stores considered persistent when globally-visible
       no:  continue with 3
  3. CLWB? (CPU_ID provides info)
       yes: Use CLWB+SFENCE for flushing
       no:  continue with 4
  4. CLFLUSHOPT? (CPU_ID provides info)
       yes: Use CLFLUSHOPT+SFENCE for flushing
       no:  Use CLFLUSH for flushing

Linux: Exposing Persistent Memory to Apps
  • DAX mechanism added (replacing old XIP mechanism)
  • Allowed drivers & file systems to provide direct access
  • ext4 & XFS support upstream

Linux: A Few Surprises That Came Up
  • Using general-purpose filesystems
      • pmfs work derailed early on
  • Requiring MAP_SYNC
      • Took a long time to arrive at this solution
  • Attitude on per-mount DAX versus per-file
  • Emerging support for RDMA
  • Device DAX

Linux: Device DAX
  • Doesn't follow the SNIA programming model
  • Surprising behavior for app writers:
      • read(2)/write(2)/msync(2) don't work!
      • stat(2) doesn't tell you the size
      • Can't back it up using off-the-shelf tools
      • PMDK library hides as much of this as possible
  • Only solution we have for RDMA w/long-lived registrations until WIP finishes
  • Avoiding some minor FS annoyances (for now)
      • ZFOD allocations, app control of large pages

Linux: Deep Flush
  • Implemented in driver
  • Exposed via sysfs (for Device DAX users)
  • FS journal writes use it
  • msync()/fsync() invokes it
  • Unsafe shutdown detection
      • Left to user space to handle (for now)
  • Some last minute tweaks (permission issues)
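The App Responsibilities decision flow earlier in this part of the deck can be sketched as a small pure function. This is only an illustration of the branching order on the slide: the function name, parameter names, and returned strings are hypothetical, and a real application would obtain the four inputs from the OS, ACPI, and CPUID rather than pass them in by hand.

```python
def choose_flush_method(dax_mapped: bool,
                        caches_persistent: bool = False,
                        has_clwb: bool = False,
                        has_clflushopt: bool = False) -> str:
    """Return the flushing approach the flowchart selects for these inputs."""
    if not dax_mapped:
        # Not a DAX mapping: fall back to the standard flushing API.
        return "standard API (msync/fsync or FlushFileBuffers)"
    if caches_persistent:
        # CPU caches are in the power fail protected domain: no flush needed.
        return "no flush (stores persistent when globally visible)"
    if has_clwb:
        return "CLWB+SFENCE"
    if has_clflushopt:
        return "CLFLUSHOPT+SFENCE"
    # Last resort, per the slide.
    return "CLFLUSH"
```

The checks are ordered exactly as on the slide, so a platform that supports both CLWB and CLFLUSHOPT ends up using CLWB.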
Linux: Uncorrectable Error Handling
  • Tracked in OS, driver will clear them on block zero
  • "mcsafe" version of memcpy not used by any upstream FS yet
      • NOVA did some excellent work in this area
  • Apps can discover, catch SIGBUS
      • punch hole/delete file to clear
  • Clearing poison & writing new data NOT atomic
      • PMDK provides one solution to this
  • Complication: not safe to clear poison with ISA (yet)

[Diagram. User space: Application, which receives SIGBUS (via MCE or Bad Block mapping). Kernel space: pmem-Aware File System, MMU Mappings, Bad Blocks (512), ARS, Machine Check Handler, Driver. Below the kernel: Persistent Memory. Granularity labels: 4k at the application and at the driver, 512 for bad blocks, 64 at persistent memory.]

Windows: Implementation
  • DAX mechanism added
      • NTFS so far
  • User Space flush always safe
      • No need for something like MAP_SYNC
  • No equivalent of Device DAX
  • No exposure of Deep Flush to user space
  • Emerging support for uncorrectables

Virtualization
  • Several products announced to the public:
      • Support for pmem programming model in a guest VM
      • Expected to use virtual NFIT table
      • VMware (vSphere 6.7), Hyper-V (announced intentions), KVM (upstream)
  • Executive summary:
      • No need for applications to be aware they are in a VM when using pmem
      • Kudos to all these products for making this happen!
  • Model (and other specs) silent on hard problems, like:
      • How to detect eADR while allowing live migration
      • Flushing large ranges
      • Handling poison/machine checks in guests
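The granularity labels in the uncorrectable-error diagram suggest why one small bad block can cost an application a whole page. The sketch below assumes, as an interpretation of those labels rather than something the slides state, that bad blocks are tracked in 512-byte units and that the application sees errors at 4 KiB page granularity. The function name and the `(first_block, count)` input format are hypothetical.

```python
BAD_BLOCK_SIZE = 512  # assumed unit of the bad-block list, in bytes
PAGE_SIZE = 4096      # assumed mapping granularity seen by the application


def pages_hit_by_bad_blocks(bad_blocks):
    """Map bad-block ranges to the byte offsets of the 4 KiB pages they touch.

    bad_blocks is an iterable of (first_block, count) pairs in 512-byte units.
    Returns a sorted list of page-aligned byte offsets.
    """
    pages = set()
    for first, count in bad_blocks:
        if count <= 0:
            continue  # ignore empty ranges
        start = first * BAD_BLOCK_SIZE                  # first bad byte
        end = (first + count) * BAD_BLOCK_SIZE - 1      # last bad byte
        for page in range(start // PAGE_SIZE, end // PAGE_SIZE + 1):
            pages.add(page * PAGE_SIZE)
    return sorted(pages)
```

For example, a two-block range starting at block 7 straddles a page boundary, so both the page at offset 0 and the page at offset 4096 are affected.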
Where We Could've Done Better
  • Spec has some inconsistencies
      • Example: arguments to operations like Flush, Clear Error
  • Spec has some areas that aren't covered at all
      • Whether CPU caches even need flushing
          • Key to building anything on top of the NVMP programming model
      • How to discover:
          • Performance
          • Granularity of operations
          • Known lost data/Incomplete flush on failure (bad block list, unsafe shutdown count, etc.)

Future Work for TWG
  • Continue to evolve spec
      • Bring it up to date
      • Add section on current practices
  • Continue to evolve rpmem, RAS, transactions
  • Continue education on persistent memory
