Implementing SMB Semantics in a Linux Cluster 2020 Linux Storage

Implementing SMB Semantics in a Linux Cluster 2020 Linux Storage

Implementing SMB semantics in a Linux cluster 2020 Linux Storage and Filesystems Conference Santa Clara, CA Volker Lendecke Samba Team / SerNet 2020-02-25 Who am I? I Co-Founder of SerNet in G¨ottingen,Germany I First Samba patches in 1994 I Early Samba Team member I Samba infrastructure (tdb, tevent, etc) I File server I Clustered Samba I Winbind I AD controller is my colleague Stefan Metzmacher's domain I Stefan implemented AD multi-master replication in Samba Volker Lendecke SMB semantics (2 / 12) What is Samba? I www.samba.org: Samba is the standard Windows interoperability suite of programs for Linux and Unix I Server- and Client-Implementation of the Server Message Block (SMB) protocol I SMB is the Windows protocol to share drives across the network I Comparable to NFS (NFSv4 RFCs feels very familiar) I Print server for Windows clients I Active Directory domain member I Make Active Directory users and groups available on Linux I Active Directory domain controller I Provide user database for Windows and Unix clients Volker Lendecke SMB semantics (3 / 12) What is SMB? I \Server Message Block" I Started in the 1980s, developed until today I Since EU verdict (2007?) well documented I SMB semantics: single-tasking DOS \on the wire" I Every application by definition had exclusive file access I SHARE.EXE maintained illusion by blocking concurrent access I Network-aware applications could explicitly permit sharing per open I Posix opens only have to read metadata I Permissions, file location etc I Inherent scalability problem through share modes I SMB opens need to examine all other opens Volker Lendecke SMB semantics (4 / 12) Samba architecture I For every client Samba forks a new process I Distinct memory space for every process I Spec (MS-SMB2/MS-FSA) suggests a lot of shared tables I Lists of clients, open files, lots more I Samba can't use any of those data structures directly I Samba shares data structures via shared key/value stores I TDB is a memory-mapped hash table I Protection via fcntl locks or shared mutexes I TDB provides a clean separation layer I This made clustering initially possible I Process separation extended to nodes Volker Lendecke SMB semantics (5 / 12) SMB share modes and leases I Share Modes (a.k.a Share Reservations) I Every open call requests access permissions I READ, WRITE or DELETE (among others) I Every open call allows other permissions I Concurrent READ, WRITE or DELETE permitted I First come, first serve I NFS4 does not have DELETE I Oplocks / Leases (a.k.a. Delegations) I Cache coherency protocol, per-file granularity I Interoperability with NFS highly welcome I Linux fcntl F SETLEASE and flock don't match SMB semantics I Samsung's in-kernel SMB server needs this as well Volker Lendecke SMB semantics (6 / 12) Implementation of SMB locking I One locking.tdb record per inode I Metadata: File name, delete token, time stamps I One share mode entry per fd I One share mode lease per lease key (leases shared across fds) I Open a file I Walk the share mode entry array, on conflict return NT STATUS SHARING VIOLATION I Look at the share mode lease array I On conflict, send a message to lease holding process I Lease holder will \break the lease" with the client I Close a file I Clean up, inform potential lease breakers I Problem: There can be LOTS of open handles on an inode Volker Lendecke SMB semantics (7 / 12) Clustered TDB ctdb I ctdb extends tdb files beyond a single machine I ctdbd is a daemon to move records around I smbd requesting a record gets a local copy I ctdb maintains the most recent record location I locking.tdb can be lossy I Share mode state valid only for open file handles I A crashed node's file handles are closed by definition I Samba deals with crashed processes since day one I ctdb record access is like NUMA with extreme node distance I More services by ctdbd: I Cluster membership I Remote messaging transport I Remote process exists() API Volker Lendecke SMB semantics (8 / 12) ctdb Architecture Node 0 Node 1 TCP ctdb ctdb sock sock sock sock mmap smbd smbd mmap smbd mmap smbd mmap mmap mmap locking.tdb locking.tdb Volker Lendecke SMB semantics (9 / 12) Scalability work on progress I Avoid walking the share mode array I Share mode conflict: I I want to write, but someone else did not grant FILE SHARE WRITE I I don't grant FILE SHARE WRITE, but someone already writes I Same for READ and DELETE, First come, first serve I Central flags field to hold most restrictive share mode I Intersection of all share modes granted I Union of all granted access I Opening a file just checks the per-file summary I If there's a conflict, recalculate the truth I Share mode array exists in a separate TDB file I Handling much more efficient than before I Roughly factor 100 for specific tests Volker Lendecke SMB semantics (10 / 12) Next steps I Move share entries.tdb back into locking.tdb I Non-contended file access got slower (3 instead of 2 records) I Now that the logic works, we can optimize data structures I Base locking.tdb on g lock.tdb technology I Avoid tdb locks while doing open/close/unlink/rename etc I Improve parallelism, reduce contention I Enable ctdb recovery while cluster file system is stuck I Spread locking.tdb across per-node per-inode records I Parallel case (no share mode conflicts) only looks at one record I Conflicting case must take all records into account Volker Lendecke SMB semantics (11 / 12) Questions? [email protected] / [email protected] http://www.sambaxp.org/ Volker Lendecke SMB semantics (12 / 12).

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    12 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us