Distributed Systems: Distributed File Systems Case Studies
Distributed File Systems
Distributed Systems
Paul Krzyzanowski
[email protected]
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

Distributed File Systems Case Studies
NFS • AFS • CODA • DFS • SMB • CIFS • Dfs • WebDAV • GFS • Gmail-FS? • xFS

NFS
Network File System
Sun Microsystems
c. 1985

NFS Design Goals
– Any machine can be a client or server
– Must support diskless workstations
– Heterogeneous systems must be supported
  • Different HW, OS, underlying file system
– Access transparency
  • Remote files accessed as local files through normal file system calls (via VFS in UNIX)
– Recovery from failure
  • Stateless, UDP, client retries
– High performance
  • Use caching and read-ahead

NFS Design Goals
No migration transparency
If a resource moves to another server, the client must remount the resource.

NFS Design Goals
No support for UNIX file access semantics
Stateless design: file locking is a problem.
All UNIX file system controls may not be available.

NFS Design Goals
Devices
Must support diskless workstations, where every file is remote.
Remote devices refer back to local devices.

NFS Design Goals
Transport Protocol
Initially NFS ran over UDP using Sun RPC.
Why UDP?
- Slightly faster than TCP
- No connection to maintain (or lose)
- NFS is designed for an Ethernet LAN environment – relatively reliable
- Error detection but no correction: NFS retries requests

NFS Protocols
Mounting protocol
  Request access to exported directory tree
Directory & file access protocol
  Access files and directories (read, write, mkdir, readdir, …)

Mounting Protocol
• Send pathname to server
• Request permission to access contents
  client: parses pathname, contacts server for file handle
• Server returns file handle
  – File device #, inode #, instance #
  client: creates an in-core vnode at the mount point
  (points to the inode for local files; points to an rnode for remote files, which stores state on the client)

Mounting Protocol
static mounting
– mount request contacts server
Server: edit /etc/exports
Client: mount fluffy:/users/paul /home/paul

Directory and file access protocol
• First, perform a lookup RPC
  – returns file handle and attributes
• Not like open
  – No information is stored on server
• Handle is passed as a parameter for other file access functions
  – e.g. read(handle, offset, count)
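The lookup-then-read pattern above can be sketched in C. The stubs nfs_lookup_rpc and nfs_read_rpc below are hypothetical placeholders for the underlying Sun RPC calls, not a real NFS client API; the point of the sketch is that the server keeps no open-file state, so every request carries the file handle, offset, and count, and the client tracks its own file position.

```c
/* Sketch only: nfs_lookup_rpc() and nfs_read_rpc() are hypothetical
 * placeholders for Sun RPC stubs, not a real NFS client API. */
#include <stdint.h>

#define FHSIZE 32   /* NFS v2 handles are 32 opaque bytes (encoding device #, inode #, instance #) */

typedef struct { uint8_t data[FHSIZE]; } nfs_fh;   /* opaque file handle */

/* Hypothetical RPC stubs (assumptions, not library calls). */
int nfs_lookup_rpc(const nfs_fh *dir, const char *name, nfs_fh *out);
int nfs_read_rpc(const nfs_fh *fh, uint64_t offset, uint32_t count,
                 void *buf, uint32_t *bytes_read);

/* The client, not the server, remembers the current position in the file. */
struct open_file {
    nfs_fh   fh;        /* handle returned by lookup  */
    uint64_t offset;    /* client-side file position  */
};

/* Every read is self-describing: handle + offset + count.  Because the
 * server keeps no per-client state, a timed-out request can simply be
 * retransmitted. */
int nfs_client_read(struct open_file *f, void *buf, uint32_t count)
{
    uint32_t got = 0;
    if (nfs_read_rpc(&f->fh, f->offset, count, buf, &got) != 0)
        return -1;
    f->offset += got;   /* advance the position locally */
    return (int)got;
}
```

Because a read expressed this way is self-contained and idempotent, a client that times out can simply retransmit the same request, which is what makes the stateless, UDP-based design workable.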
Directory and file access protocol
NFS has 16 functions
– (version 2; six more added in version 3)
null, link, getattr, lookup, symlink, setattr, readlink, create, statfs, remove, mkdir, rename, rmdir, readdir, read, write

NFS Performance
• Usually slower than local
• Improve by caching at client
  – Goal: reduce number of remote operations
  – Cache results of read, readlink, getattr, lookup, readdir
  – Cache file data at client (buffer cache)
  – Cache file attribute information at client
  – Cache pathname bindings for faster lookups
• Server side
  – Caching is “automatic” via buffer cache
  – All NFS writes are write-through to disk to avoid unexpected data loss if server dies

Inconsistencies may arise
Try to resolve by validation
– Save timestamp of file
– When file opened or server contacted for new block
  • Compare last modification time
  • If remote is more recent, invalidate cached data

Validation
• Always invalidate data after some time
  – After 3 seconds for open files (data blocks)
  – After 30 seconds for directories
• If data block is modified, it is:
  – Marked dirty
  – Scheduled to be written
  – Flushed on file close

Improving read performance
• Transfer data in large chunks
  – 8K bytes default
• Read-ahead
  – Optimize for sequential file access
  – Send requests to read disk blocks before they are requested by the application

Problems with NFS
• File consistency
• Assumes clocks are synchronized
• Open with append cannot be guaranteed to work
• Locking cannot work
  – Separate lock manager added (stateful)
• No reference counting of open files
  – You can delete a file you (or others) have open!
• Global UID space assumed
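The validation rules above (a 3-second window for data blocks of open files, 30 seconds for directories, then a modification-time comparison against the server) might look roughly like the following sketch; the cache_entry structure and the nfs_getattr_rpc helper are illustrative assumptions rather than actual NFS client code.

```c
/* Sketch of the client-side validation rule above; cache_entry and
 * nfs_getattr_rpc() are illustrative assumptions, not real NFS code. */
#include <stdbool.h>
#include <time.h>

#define FILE_TTL_SECS 3      /* revalidate data blocks of open files after 3 s */
#define DIR_TTL_SECS  30     /* revalidate cached directories after 30 s       */

struct cache_entry {
    bool   is_directory;
    time_t validated_at;     /* last time we checked with the server  */
    time_t cached_mtime;     /* server modification time we last saw  */
};

/* Hypothetical RPC stub: fetch the file's current attributes from the server. */
int nfs_getattr_rpc(const struct cache_entry *e, time_t *server_mtime);

/* true  -> cached copy may still be used
 * false -> discard cached data and re-fetch from the server */
bool cache_is_valid(struct cache_entry *e)
{
    time_t now = time(NULL);
    time_t ttl = e->is_directory ? DIR_TTL_SECS : FILE_TTL_SECS;

    if (now - e->validated_at < ttl)
        return true;                       /* still inside the freshness window */

    time_t server_mtime;
    if (nfs_getattr_rpc(e, &server_mtime) != 0)
        return false;                      /* cannot validate: be conservative */

    e->validated_at = now;
    if (server_mtime > e->cached_mtime) {  /* remote copy is more recent */
        e->cached_mtime = server_mtime;
        return false;                      /* invalidate cached data */
    }
    return true;
}
```

This window is also the root of the file-consistency problem listed above: two clients can legitimately see different contents for a few seconds after a write.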
Problems with NFS
• No reference counting of open files
  – You can delete a file you (or others) have open!
• Common practice
  – Create a temp file, delete it, continue access
  – Sun’s hack:
    • If the same process with the file open tries to delete it
    • Move it to a temp name
    • Delete it on close

Problems with NFS
• File permissions may change
  – Invalidating access to the file
• No encryption
  – Requests via unencrypted RPC
  – Authentication methods available
    • Diffie-Hellman, Kerberos, Unix-style
  – Rely on user-level software to encrypt

Improving NFS: version 2
• User-level lock manager
  – Monitored locks
    • status monitor: monitors clients with locks
    • Informs lock manager if host inaccessible
    • If server crashes: status monitor reinstates locks on recovery
    • If client crashes: all locks from client are freed
• NV RAM support
  – Improves write performance
  – Normally NFS must write to disk on the server before responding to client write requests
  – Relax this rule through the use of non-volatile RAM

Improving NFS: version 2
• Adjust RPC retries dynamically
  – Reduce network congestion from excess RPC retransmissions under load
  – Based on performance
• Client-side disk caching
  – cacheFS
  – Extend buffer cache to disk for NFS
    • Cache in memory first
    • Cache on disk in 64KB chunks

The automounter
Problem with mounts
– If a client has many remote resources mounted, boot-time can be excessive
– Each machine has to maintain its own name space
  • Painful to administer on a large scale
Automounter
– Allows administrators to create a global name space
– Supports on-demand mounting

Automounter
• Alternative to static mounting
• Mount and unmount in response to client demand
  – Set of directories are associated with a local directory
  – None are mounted initially
  – When local directory is referenced
    • OS sends a message to each server
    • First reply wins
  – Attempt to unmount every 5 minutes

Automounter maps
Example: automount /usr/src srcmap
srcmap contains:
  cmd     -ro  doc:/usr/src/cmd
  kernel  -ro  frodo:/release/src \
               bilbo:/library/source/kernel
  lib     -rw  sneezy:/usr/local/lib
Access /usr/src/cmd: request goes to doc
Access /usr/src/kernel: ping frodo and bilbo, mount first response

The automounter
[Diagram: an application’s request passes through the kernel VFS to NFS; the NFS request is sent to the local automounter, which issues the NFS mount and then directs NFS requests to the NFS server.]

More improvements… NFS v3
• Updated version of the NFS protocol
• Support 64-bit file sizes
• TCP support and large-block transfers
  – UDP caused more problems on WANs (errors)
  – All traffic can be multiplexed on one connection
    • Minimizes connection setup
  – No fixed limit on the amount of data that can be transferred between client and server
• Negotiate for optimal transfer size
• Server checks access for entire path from client

More improvements… NFS v3
• New commit operation
  – Check with the server after a write operation to see if data is committed
  – If commit fails, the client must resend the data
  – Reduces the number of write requests to the server
  – Speeds up write requests
    • Don’t require the server to write to disk immediately
• Return file attributes with each request
  – Saves extra RPCs
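The commit operation described on the NFS v3 slide can be sketched as follows; nfs3_write_rpc and nfs3_commit_rpc are hypothetical stubs standing in for the v3 WRITE and COMMIT procedures. The idea is that the client sends writes the server may cache without flushing, then issues a commit and compares the write verifier returned by the server (the value NFS v3 uses to detect a server reboot); if the verifier changed, the cached data may have been lost and the client must resend it.

```c
/* Sketch of the NFS v3 write-then-commit pattern; nfs3_write_rpc() and
 * nfs3_commit_rpc() are hypothetical stubs for the WRITE and COMMIT
 * procedures, not a real client API. */
#include <stddef.h>
#include <stdint.h>

typedef uint64_t write_verf;        /* server write verifier; changes if it reboots */

struct pending_write {              /* one buffered, not-yet-committed write */
    uint64_t    offset;
    uint32_t    count;
    const void *data;
};

/* Hypothetical stubs: issue an asynchronous (unstable) WRITE or a COMMIT;
 * each returns the server's current write verifier. */
int nfs3_write_rpc(const struct pending_write *w, write_verf *verf);
int nfs3_commit_rpc(uint64_t offset, uint32_t count, write_verf *verf);

/* Send a batch of writes without forcing the server to flush each one to
 * disk, then ask it to commit.  Returns 0 on success, 1 if the verifier
 * changed (the server rebooted and may have lost the cached data, so the
 * whole batch must be resent), -1 on an RPC error. */
int flush_writes(const struct pending_write *w, size_t n)
{
    write_verf first = 0, v = 0;

    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (nfs3_write_rpc(&w[i], &v) != 0)
            return -1;
        if (i == 0)
            first = v;                   /* remember the first write's verifier */
    }
    if (nfs3_commit_rpc(0, 0, &v) != 0)  /* count 0: commit the whole range */
        return -1;
    return (v == first) ? 0 : 1;         /* mismatch: client must resend the data */
}
```

This trades one synchronous disk write per request for a single commit per batch, while the verifier check still lets the client detect data lost in a server crash.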
AFS
Andrew File System
Carnegie-Mellon University
c. 1986 (v2), 1989 (v3)

AFS
• Developed at CMU
• Commercial spin-off
  – Transarc
• IBM acquired Transarc
Currently open source under the IBM Public License
Also: OpenAFS, Arla, and a Linux version

AFS Design Goal
Support information sharing on a large scale
e.g., 10,000+ systems

AFS Assumptions
• Most files are small
• Reads are more common than writes
• Most files are accessed by one user at a time
• Files are referenced in bursts (locality)
  – Once referenced, a file is likely to be referenced again

AFS Design Decisions
Whole file serving
– Send the entire file on open
Whole file caching
– Client caches the entire file on local disk
– Client writes the file back to the server on close
  • if modified
– Keeps a cached copy for future accesses

AFS Design
• Each client has an AFS disk cache
  – Part of the disk devoted to AFS (e.g. 100 MB)
  – Client manages the cache in LRU manner
• Clients communicate with a set of trusted servers
• Each server presents one identical name space to clients
  – All clients access it in the same way
  – Location transparent

AFS Server: cells
• Servers are grouped into administrative entities called cells
• Cell: collection of
  – Servers
  – Administrators
  – Users
  – Clients
• Each cell is autonomous, but cells may cooperate and present users with one uniform name space

AFS Server: volumes
Disk partition contains files and directories grouped into volumes
Volume
– Administrative unit of organization
  • e.g. a user’s home directory, local source, etc.
– Each volume is a directory tree (one root)
– Assigned a name and ID number
– A server will often have 100s of volumes

Namespace management
Clients get information via the cell directory server (Volume Location Server) that hosts the Volume Location Database (VLDB)
Goal: everyone sees the same namespace
/afs/cellname/path
e.g., /afs/mit.edu/home/paul/src/try.c

Accessing an AFS file
1. Traverse the AFS mount point, e.g., /afs/cs.rutgers.edu
2. AFS client contacts the Volume Location DB on the Volume Location server to look up the volume
3. VLDB returns the volume ID and a list of machines (>1 for replicas on read-only file systems)
4. Request the root directory from any machine in the list
5. The root directory contains files, subdirectories, and mount points
6. Continue parsing the file name until another mount point (from step