Design and Implementation of SMB Locking in a Clustered File System

Aravind Srinivasan
EMC, Isilon Storage Division

2012 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.

Agenda

- Overview
- OneFS Overview
- Fundamentals of Distributed Locking
- Challenges in implementing distributed locking
- Design and Implementation of DLM in OneFS
- Implementation of SMB locking on top of the DLM in OneFS

Overview

- Any clustered file system needs a robust Distributed Lock Manager (DLM) to synchronize resources.
- A file sharing protocol, such as SMB, must utilize the DLM appropriately to regulate access to files from multiple clients.

OneFS Overview

Isilon OneFS Cluster

- NAS file server
- Scalable: add more storage in 5 mins
- Reliable: 8x mirror / +4 parity
- Striped across nodes
- Single volume file system
- 3 to 144 nodes
- Fully symmetric peers
- No metadata servers
- Commodity hardware: CPU, Mem, Disks

Isilon OneFS File System

Concurrent access to all files with all protocols:

- SMB1/SMB2
- NFSv3/NFSv4
- SSH
- HTTP/FTP

OneFS – High Level Overview

OneFS is Isilon's sixth-generation operating system that provides the intelligence behind all Isilon scale-out storage systems. It combines the three layers of traditional storage architectures (file system, volume manager and RAID) into one unified software layer, creating a single intelligent file system that spans all nodes within a cluster.
OneFS – High Level Overview

Isilon's OneFS enables:

- Independent or linear scalability of performance and capacity
- A single point of management for large and rapidly growing repositories of data
- Mission-critical reliability and high availability with state-of-the-art data protection

Fundamentals of Distributed Locking

- Multiple writers to the same file: need a reader-writer lock
- Writers can be on different nodes: need a distributed locking system

[Figure: Node 1 and Node 2 each write to /volume/somefile on the clustered file system volume; file contents corrupted.]

DLM Challenges

- Performance
- Requirements that differ from protocol to protocol
- Exposing the appropriate APIs to utilize the DLM

Design and Implementation of DLM in OneFS

Goal of DLM

[Figure: two writers (1 and 2) to /ifs/somefile on the OneFS volume; each takes an EX-lock on the lock resource through the DLM module (lk); file contents intact.]

DLM in OneFS

- From the perspective of the DLM, a resource is simply an identifier. It can be a number or an arbitrary blob of data (as in the OneFS Lock Manager).
- A resource has a number of modes that can be acquired; the mode determines the level of exclusivity required by the client.
- The DLM in OneFS is named LK.

Requirements of LK

The goal of LK is to provide the infrastructure upon which POSIX, NFS and SMB can implement kernel-enforced, cluster-coherent locks.
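The reader-writer behavior described under Fundamentals of Distributed Locking, together with the idea that a resource is just an opaque identifier with acquirable modes, can be illustrated with a small sketch. This is not LK's code: the `LockTable` class, the `SHARED`/`EXCLUSIVE` names, and the single-process dictionary are all assumptions made for illustration, and a real DLM would spread this state across nodes.

```python
from collections import defaultdict

# Hypothetical mode names; the slides only mention shared and exclusive by example.
SHARED = "SH"
EXCLUSIVE = "EX"


def compatible(held: str, requested: str) -> bool:
    """Reader-writer rule: two modes coexist only if both are shared."""
    return held == SHARED and requested == SHARED


class LockTable:
    """Toy, single-process lock table keyed by an opaque resource blob."""

    def __init__(self):
        # resource (bytes) -> {node_id: mode currently held by that node}
        self._owners = defaultdict(dict)

    def try_lock(self, resource: bytes, node: int, mode: str) -> bool:
        owners = self._owners[resource]
        # Only modes held by *other* nodes can conflict with this request.
        for other, held in owners.items():
            if other != node and not compatible(held, mode):
                return False
        owners[node] = mode
        return True

    def unlock(self, resource: bytes, node: int) -> None:
        self._owners[resource].pop(node, None)


if __name__ == "__main__":
    table = LockTable()
    res = b"/ifs/somefile"  # the resource is just a blob of bytes
    print(table.try_lock(res, node=1, mode=EXCLUSIVE))  # True
    print(table.try_lock(res, node=2, mode=EXCLUSIVE))  # False: node 1 holds EX
    table.unlock(res, node=1)
    print(table.try_lock(res, node=2, mode=EXCLUSIVE))  # True
```

With both writers forced through `try_lock`, the second write has to wait until the first exclusive lock is released, which is the property the Goal of DLM figure shows.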
Requirements of LK

The requirements can be grouped into the following major areas:

- Ranges allowed (i.e., number of bits, behavior at boundaries)
- Semantics (i.e., modes allowed)
- Wait types (i.e., blocking, non-blocking, asynchronous)

Requirements of LK (Contd)

- Conversions allowed (i.e., whether a lock can be converted from one type to another, e.g., from shared to exclusive)
- Reference counting semantics (stacked vs. reference counted)
- Fairness (strict vs. opportunistic)
- Miscellaneous

LK Design

The DLM is split into two distinct roles: Initiator and Coordinator.

LK - Coordinator

- The coordinator deals with nodes as a whole and does not know about individual threads on a node.
- From the coordinator's point of view, a node requests a lock, owns the lock, and then releases the lock.
- For example, if a node asks for an exclusive lock while holding a shared lock, the exclusive lock is granted immediately, provided that no other nodes hold shared locks.

LK - Initiator

- The initiator is the one requesting the lock.
- On the initiator side, there is one entry for each resource for which there is a local owner or waiter.
- Each entry contains a list of all the local owners and a number of queues containing waiters.
- The main queue hangs directly off the lock entry, while the rest hang off per-lock-type structures.

Three Messages in LK

LK uses three messages to communicate between the initiator and the coordinator:

- Request: generated by the initiator and sent to the coordinator.
  Contains the needs and wants of the initiator.
- Grant: generated by the coordinator and sent to the initiator. Contains the goals for the resource on this initiator and the additional holds.
- Release: generated by the initiator and sent to the coordinator. Used to release an initiator's hold on a lock.

LK Terms

- Need: the mode of the lock that the client requires. E.g., needs shared mode.
- Want: the set of modes of the lock that the client may want as soon as they are not being used by another client. E.g., wants exclusive and delete.
- Holds: the set of additional modes of the lock which the coordinator has granted the initiator. E.g., holding exclusive and shared.
- Goal: the set of modes that the initiator should attempt to achieve as soon as it is able.

LK Terms (Contd)

- The resource parameter in LK represents what is being locked or unlocked. It is an arbitrary blob of data.
- The locker parameter represents who is locking or unlocking. This is the parameter used for deadlock detection.
- The domain parameter represents the lock domain. Multiple lock domains can exist at any time, each one controlling locks for a different aspect of the system. E.g., the OPLOCK domain and the CBRL domain.
- The wait type parameter controls whether the potentially blocking functions are allowed to block indefinitely.

LK Callbacks

- Lock owners can register callbacks to be called when the node gives up a certain type of lock.
- The initiator delays releasing locks which have callbacks registered. Instead, it creates a special type of local waiter and puts it on the main queue.
- When the special local waiter is converted into a lock owner, the callback is called.
- After the callback is done, its lock owner goes away and the initiator releases the lock for real, unless other lock callbacks are still pending in the main queue.

SMB Locking on top of LK in OneFS

- SMB in OneFS uses LK for all its locking purposes, such as oplocks and byte-range locks (BRLs).
- An event channel is registered between the SMB daemon and the OneFS kernel.
- The results from LK are communicated via the registered event channel.

- The locker parameter is specified as part of the syscall to acquire the appropriate lock.
- The locker parameter can be the client lease key (for leases), the MID, TID and PID combination (for BRLs), or just the file pointer (for legacy oplocks).
- In short, the locker uniquely identifies the owner of the lock.

- A unique 64-bit ID is also passed as part of the syscall; it is used to register callbacks in LK.
- Whenever a lock is contended, the registered callback routine is triggered and notifies userspace using the appropriate ID.
- Userspace has to maintain the async state and respond appropriately to the message from the kernel.

- Using LK for locking pushes all the SMB locking requirements down to the kernel, which significantly improves performance and also achieves cluster coherency.
- Support for callbacks lets us register async operations and prevents blocking in the kernel.
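The userspace side of the SMB integration described above (a locker that identifies the owner, plus a 64-bit ID that the kernel echoes back when a lock is contended) might be organized roughly as follows. This is only a sketch under stated assumptions: the locker classes, `AsyncLockState`, and its method names are hypothetical, and the event channel and syscall themselves are not modeled because the slides do not describe their interfaces.

```python
import itertools
from dataclasses import dataclass


# Hypothetical locker shapes, one per case named on the slides.
@dataclass(frozen=True)
class LeaseLocker:
    lease_key: bytes  # client lease key (leases)


@dataclass(frozen=True)
class BrlLocker:
    mid: int  # MID, TID and PID combination (byte-range locks)
    tid: int
    pid: int


@dataclass(frozen=True)
class OplockLocker:
    file_handle: int  # stand-in for the file pointer (legacy oplocks)


class AsyncLockState:
    """Userspace table mapping the 64-bit ID sent with a lock request
    to the locker and the handler to run when the kernel reports contention."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}

    def register(self, locker, on_contended) -> int:
        # Keep the ID within 64 bits, as it would travel through a syscall.
        lock_id = next(self._ids) & 0xFFFFFFFFFFFFFFFF
        self._pending[lock_id] = (locker, on_contended)
        return lock_id

    def handle_kernel_event(self, lock_id: int) -> bool:
        """Dispatch a contention notification; False if the ID is unknown."""
        entry = self._pending.get(lock_id)
        if entry is None:
            return False
        locker, on_contended = entry
        on_contended(locker)
        return True

    def forget(self, lock_id: int) -> None:
        # Drop the async state once the lock has been released.
        self._pending.pop(lock_id, None)
```

Because the lockers are frozen dataclasses, two requests from the same lease key or the same MID/TID/PID triple compare equal, which is one simple way to express "the locker uniquely identifies the owner of the lock".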
Summary

- Distributed locking in OneFS is achieved by using a OneFS-specific DLM called LK.
- LK achieves basic cluster coherency and also provides performance benefits as well as scalability.
- LK can also be easily extended to support future protocols by adding a new lock domain if necessary.

Questions?

Contact: Aravind Srinivasan, [email protected]
