Persistent Memory Programming on Conventional Hardware


Terence Kelly
[email protected]
http://ai.eecs.umich.edu/~tpkelly/
SNIA Storage Developer Conference, 25 September 2019
© 2019 Terence Kelly, all rights reserved.

Bored? Fidgety? Mini-Hackathon!
ACM Queue magazine, July/Aug 2019 (queue.acm.org)
Download the "famus" library
Run the "example 01.csh" C-shell script on a Linux box
E-mail the output to [email protected] with subject line "[famus output]"
Prizes to the first three correct outputs

NVM Returns

NVM Programming Style
Persistent application data lives in memory only
Access/update with LOAD/STORE
No separate persistent store, e.g., relational database or key-value store

NVM Programming Advantage: Simplicity
"The cheapest, fastest and most reliable components of a computer system are those that aren't there." (Gordon Bell)
Fewer moving parts: no quirky external store
One data format: no serializers/parsers
One paradigm: no context switching, no "impedance mismatch"
Simplicity alone improves cost, correctness, and performance
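As a concrete preview of this style, the sketch below keeps a run counter directly in a mapped file and updates it with an ordinary STORE. It is an illustration, not taken from the deck; the file name counter.dat and the error handling are assumptions, and it relies on the orderly-shutdown assumption of Part I.

    /* A minimal sketch (not from the deck): a run counter that lives in a
     * mapped file and is updated by an ordinary STORE.  The file name
     * counter.dat is an assumption; orderly shutdown is assumed (Part I). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("counter.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, sizeof(long)) != 0) {   /* idempotent sizing */
            perror("open/ftruncate");
            return 1;
        }
        long *counter = (long *)mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                                     MAP_SHARED, fd, 0);
        if (MAP_FAILED == counter) { perror("mmap"); return 1; }

        (*counter)++;                           /* a LOAD and a STORE: the whole update */
        printf("run number %ld\n", *counter);

        msync(counter, sizeof(long), MS_SYNC);  /* explicit flush; optional under orderly shutdown */
        munmap(counter, sizeof(long));
        close(fd);
        return 0;
    }

Running it repeatedly prints run number 1, 2, 3, and so on; the mapped file is the only persistent store involved.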
NVM Programming Platforms
Research/academic: NV-heaps, Mnemosyne, Atlas
Industrial: PMDK (Intel), NVM Direct (Oracle)
Similarities: C/C++; application data in mmap'd files
Differences: compiler support; concurrency/isolation; crash-consistency mechanisms

Ye Olde Persistent Memorie
Application data in an mmap()'d file
Sparse backing file: pay-as-you-go storage footprint
A pointer cast interprets the file as typed application data
A persistent heap allocates from the mmap()'d file
Sliding a persistent heap beneath legacy software
Implementable on conventional hardware, without NVM

Persistent Memory as a Software Abstraction

Outline
Part I: Review of p-mem programming on conventional hardware (assuming orderly shutdown, i.e., no failures)
Part II: Crash consistency (power outages, kernel panics, process crashes)

mmap()
[Diagram: a range (offset, length) of a backing file in the on-disk filesystem is mapped into the process virtual address space as an in-memory image beginning at the start address returned by mmap().]

Warm-Up: In-Place "abc" → "ABC"

    char *start, *end, *p;
    /* interpret file as byte array */
    start = (char *)mmap(NULL, filesize, PROT_READ | PROT_WRITE,
                         MAP_SHARED, filedescriptor, 0);
    end = start + filesize - 2;
    for (p = start; p < end; p++) {          /* access persistent data... */
        if ('a' == *p     &&                 /* ... via LOADs ... */
            'b' == *(p+1) &&
            'c' == *(p+2)) {
            *p     = 'A';
            *(p+1) = 'B';                    /* ... and STOREs */
            *(p+2) = 'C';
        }
    }
    /* cf. "sed 's/abc/ABC/g' < input > output" */
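The scan above presumes that filedescriptor and filesize already exist. Below is a hedged sketch of that surrounding boilerplate, with the file name and error handling as assumptions rather than anything shown in the deck: open the backing file, learn its size with fstat(), map it, run the slide's loop, flush, and unmap.

    /* Sketch (assumption, not the deck's code): the setup the scan relies on.
     * The file name "data.txt" is illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int filedescriptor = open("data.txt", O_RDWR);
        struct stat sb;
        if (filedescriptor < 0 || fstat(filedescriptor, &sb) != 0) {
            perror("open/fstat");
            return 1;
        }
        if (sb.st_size < 3) { close(filedescriptor); return 0; }  /* nothing to scan */
        size_t filesize = (size_t)sb.st_size;

        char *start = (char *)mmap(NULL, filesize, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, filedescriptor, 0);
        if (MAP_FAILED == start) { perror("mmap"); return 1; }

        char *end = start + filesize - 2;
        for (char *p = start; p < end; p++)       /* the scan from the slide */
            if ('a' == *p && 'b' == *(p+1) && 'c' == *(p+2)) {
                *p = 'A'; *(p+1) = 'B'; *(p+2) = 'C';
            }

        msync(start, filesize, MS_SYNC);          /* push changes to the backing file */
        munmap(start, filesize);
        close(filedescriptor);
        return 0;
    }

Because the mapping is MAP_SHARED, the edited bytes land in the backing file itself and only the dirtied pages are written back; the sed pipeline in the closing comment, by contrast, rewrites an entire output file.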
\ABC" char *start, *end, *p; /* interpret file as byte array */ start = (char *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); end = start + filesize - 2; for (p = start; p < end; p++) { /* access persistent data... */ if ('a' == * p && 'b' == *(p+1) && /* ... via LOADs ... */ 'c' == *(p+2)) { * p = 'A'; *(p+1) = 'B'; *(p+2) = 'C'; } } 22 / 69 /* cf. "sed 's/abc/ABC/g' < input > output" */ Warm-Up: In-Place \abc" ! \ABC" char *start, *end, *p; /* interpret file as byte array */ start = (char *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); end = start + filesize - 2; for (p = start; p < end; p++) { /* access persistent data... */ if ('a' == * p && 'b' == *(p+1) && /* ... via LOADs ... */ 'c' == *(p+2)) { * p = 'A'; *(p+1) = 'B'; /* ... and STOREs */ *(p+2) = 'C'; } } 23 / 69 Warm-Up: In-Place \abc" ! \ABC" char *start, *end, *p; /* interpret file as byte array */ start = (char *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); end = start + filesize - 2; for (p = start; p < end; p++) { /* access persistent data... */ if ('a' == * p && 'b' == *(p+1) && /* ... via LOADs ... */ 'c' == *(p+2)) { * p = 'A'; *(p+1) = 'B'; /* ... and STOREs */ *(p+2) = 'C'; } } /* cf. "sed 's/abc/ABC/g' < input > output" */ 24 / 69 /* interpret file contents as application-defined type */ /* access/update persistent data ... */ /* ... via LOADs ... */ /* ... and STOREs, with type checking */ Typed Data Structures struct foo { int bar; double qux; } *foop; foop = (struct foo *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); if (foop->bar) foop->qux *= 2.0; 25 / 69 /* interpret file contents as application-defined type */ /* access/update persistent data ... */ /* ... via LOADs ... */ /* ... and STOREs, with type checking */ Typed Data Structures struct foo { int bar; double qux; } *foop; foop = (struct foo *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); if (foop->bar) foop->qux *= 2.0; 26 / 69 /* access/update persistent data ... */ /* ... via LOADs ... */ /* ... and STOREs, with type checking */ Typed Data Structures struct foo { int bar; double qux; } *foop; /* interpret file contents as application-defined type */ foop = (struct foo *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); if (foop->bar) foop->qux *= 2.0; 27 / 69 /* ... via LOADs ... */ /* ... and STOREs, with type checking */ Typed Data Structures struct foo { int bar; double qux; } *foop; /* interpret file contents as application-defined type */ foop = (struct foo *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); /* access/update persistent data ... */ if (foop->bar) foop->qux *= 2.0; 28 / 69 Typed Data Structures struct foo { int bar; double qux; } *foop; /* interpret file contents as application-defined type */ foop = (struct foo *)mmap(NULL, filesize, PROT_READ | PROT_WRITE, MAP_SHARED, filedescriptor, 0); /* access/update persistent data ... */ if (foop->bar) /* ... via LOADs ... */ foop->qux *= 2.0; /* ... and STOREs, with type checking */ 29 / 69 Persistent Heap, Gordian Knot Edition typedef struct { void *mapaddr, *root, *avail, *end; } *pheap_t; static pheap_t e_pheap; /* persistent heap bookkeeping */ static void *M(size_t s) { /* simple bump-pointer allocator */ void *r = e_pheap->avail; size_t u = sizeof *e_pheap, nu = s / u + (0 == s % u ? 0 : 1); e_pheap->avail = (pheap_t)(e_pheap->avail) + nu; assert(e_pheap->avail