
Distributed File Systems

Distributed Computing Systems

• Early networking and files
  – Had FTP to transfer files
  – Telnet to remote login to other systems with files
• But want more transparency!
  – Local computing with remote files
• Distributed file systems → one of the earliest distributed system components
• Enables programs to access remote files as if local
  – Transparency
• Allows sharing of data and programs
• Performance and reliability comparable to local disk

Outline

• Overview (done)
• Basic principles (next)
  – Concepts
  – Models
• Network File System (NFS)
• Andrew File System (AFS)
• Dropbox

Concepts of Distributed File Systems

• Transparency
• Concurrent Updates
• Replication
• Fault Tolerance
• Consistency
• Platform Independence
• Security
• Efficiency


Transparency

Illusion that all files are similar. Includes:
• Access transparency – a single set of operations; clients that work on local files can work with remote files
• Location transparency – clients see a uniform name space; files can be relocated without changing path names
• Mobility transparency – files can be moved without modifying programs or changing system tables
• Performance transparency – within limits, local and remote file access meet performance standards
• Scaling transparency – increased loads do not degrade performance significantly; capacity can be expanded

Concurrent Updates

• Changes to a file from one client should not interfere with changes from other clients
  – Even if the changes happen at the same time
• Solutions often include:
  – File or record-level locking (see the sketch after this slide)
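One concrete form of the "file or record-level locking" bullet is advisory locking on a local Unix file. A minimal sketch using Python's fcntl module (Unix only; the file name and byte range are made-up examples, and a distributed file system needs a lock manager or server support to get the same effect across machines):

import fcntl
import os

# Whole-file advisory lock: other cooperating clients calling flock()
# on the same file block until the lock is released.
fd = os.open("shared.db", os.O_RDWR | os.O_CREAT, 0o644)   # hypothetical shared file
fcntl.flock(fd, fcntl.LOCK_EX)           # exclusive lock on the whole file
try:
    os.write(fd, b"update that must not interleave with other writers\n")
finally:
    fcntl.flock(fd, fcntl.LOCK_UN)       # release

# Record-level lock: lock only bytes 0..99 so concurrent updates to
# other regions of the file can proceed.
fcntl.lockf(fd, fcntl.LOCK_EX, 100, 0)   # (fd, cmd, len, start)
os.lseek(fd, 0, os.SEEK_SET)
os.write(fd, b"record 0")
fcntl.lockf(fd, fcntl.LOCK_UN, 100, 0)
os.close(fd)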


Replication

• File may have several copies of its data at different locations
  – Often for performance reasons
  – Requires updating other copies when one copy is changed
• Simple solution
  – Change master copy and periodically refresh the other copies
• More complicated solution
  – Multiple copies can be updated independently at the same time; needs finer-grained refresh and/or merge

Fault Tolerance

• Function when clients or servers fail
• Detect, report, and correct faults that occur
• Solutions often include:
  – Redundant copies of data, redundant hardware, backups, transaction logs and other measures
  – Stateless servers
  – Idempotent operations



Consistency

• Data must always be complete, current, and correct
• File seen by one process looks the same for all processes accessing it
• Consistency is a special concern whenever data is duplicated
• Solutions often include:
  – Timestamps and ownership information

Platform Independence

• Access even though hardware and OS are completely different in design, architecture and functioning, from different vendors
• Solutions often include:
  – Well-defined way for clients to communicate with servers


Security

• File systems must be protected against unauthorized access, data corruption, loss and other threats
• Solutions include:
  – Access control mechanisms (ownership, permissions)
  – Encryption of commands or data to prevent "sniffing"

Efficiency

• Overall, want same power and generality as local file systems
• Early days, goal was to share an "expensive" resource → the disk
• Now, allow convenient access to remotely stored files



Outline

• Overview (done)
• Basic principles (next)
  – Concepts
  – Models
• Network File System (NFS)
• Andrew File System (AFS)
• Dropbox

File Service Models

Upload/Download Model
• Read file: copy file from server to client
• Write file: copy file from client to server
• Good
  – Simple
• Bad
  – Wasteful – what if client only needs a small piece?
  – Problematic – what if client doesn't have enough space?
  – Consistency – what if others need to modify the file?

Remote Access Model
• File service provides functional interface
  – Create, delete, read bytes, write bytes, …
• Good
  – Client only gets what's needed
  – Server can manage coherent view of file system
• Bad
  – Possible server and network congestion
    • Servers used for duration of access
    • Same data may be requested repeatedly

(A sketch contrasting the two models follows.)
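A rough way to see the difference between the two models is in the client interface each one implies. The classes and method names below are invented for illustration, not any real system's API: the upload/download client moves whole files, while the remote-access client asks the server for each individual operation.

# Hypothetical interfaces only; not a real protocol.

class UploadDownloadClient:
    """Whole files move between client and server."""
    def __init__(self, server):
        self.server = server

    def read_file(self, name):
        return self.server.download(name)          # copy entire file to the client

    def write_file(self, name, data):
        self.server.upload(name, data)              # copy entire file back to the server


class RemoteAccessClient:
    """Server exposes a functional interface; client fetches only what it needs."""
    def __init__(self, server):
        self.server = server

    def read(self, name, offset, count):
        return self.server.read(name, offset, count)    # just the bytes needed

    def write(self, name, offset, data):
        self.server.write(name, offset, data)            # server keeps the coherent copy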

Semantics of File Service

Sequential Semantics
• Read returns result of last write
• Easily achieved if
  – Only one server
  – Clients do not cache data
• But
  – Performance problems if no cache
    • Can instead write-through
  – Must notify clients holding copies
  – Requires extra state, generates extra traffic

Session Semantics
• Relax sequential rules
• Changes to an open file are initially visible only to the process that modified it
• Last process to modify the file "wins"
• Can hide or lock a file under modification from other clients

Accessing Remote Files (1 of 2)

• For transparency, implement client as a module under the VFS

(Additional picture next slide)


Accessing Remote Files (2 of 2)

• Design allows for transparency

Stateful or Stateless

Stateful: server maintains client-specific state
• Shorter requests
• Better performance in processing requests
• Cache coherence possible
  – Server can know who's accessing what
• File locking possible

Stateless: server maintains no information on client accesses
• Each request must identify file and offsets (see sketch below)
• Server can crash and recover
  – No state to lose
• No open/close needed
  – They only establish state
• No server space used for state
  – Don't worry about supporting many clients
• Problems if file is deleted on server
• File locking not possible
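A minimal sketch of the stateless idea, with an invented request format (not NFS's actual wire protocol): every request names the file and offset explicitly, so the server keeps no per-client table and can crash and restart between requests without losing anything.

import os

class StatelessFileServer:
    """Each request is self-contained: file id, offset, length."""
    def __init__(self, root):
        self.root = root                          # directory being exported

    def read(self, file_id, offset, count):
        path = os.path.join(self.root, file_id)
        with open(path, "rb") as f:               # open/close per request; no open-file table
            f.seek(offset)
            return f.read(count)

    def write(self, file_id, offset, data):
        path = os.path.join(self.root, file_id)
        with open(path, "r+b") as f:
            f.seek(offset)
            f.write(data)                         # idempotent: repeating the same write is safe

# A stateful design would instead keep something like
#   self.open_files[client_id] -> {descriptor, current offset}
# which is exactly the state that is lost if the server crashes.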

Caching

• Hide latency to improve performance for repeated accesses
• Four places:
  – Server's disk
  – Server's buffer cache (memory)
  – Client's buffer cache (memory)
  – Client's disk
• Client caches risk cache consistency problems

Concepts of Caching (1 of 2)

Centralized control
• Keep track of who has what open and cached on each node
• Stateful file system with signaling traffic

Read-ahead (pre-fetch)
• Request chunks of data before needed (see sketch below)
• Minimize wait when actually needed
• But what if data pre-fetched is out of date?
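The read-ahead idea can be sketched as a small client-side wrapper (names are hypothetical; the 8 KB chunk size matches the figure used for NFS later): when block n is read, block n+1 is also requested so a sequential reader finds it already cached.

CHUNK = 8 * 1024   # 8 KB chunks, as in the NFS discussion later

class ReadAheadCache:
    def __init__(self, fetch_block):
        self.fetch_block = fetch_block     # function(block_no) -> bytes, e.g. an RPC to the server
        self.cache = {}                    # block_no -> data

    def read_block(self, block_no):
        if block_no not in self.cache:                  # miss: go to the server
            self.cache[block_no] = self.fetch_block(block_no)
        # Pre-fetch the next block so a sequential reader will not wait for it.
        nxt = block_no + 1
        if nxt not in self.cache:
            self.cache[nxt] = self.fetch_block(nxt)     # may be stale by the time it is used
        return self.cache[block_no]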


Concepts of Caching (2 of 2)

Write-through
• All writes to a file sent to the server
  – What if another client reads its own (out-of-date) cached copy?
• All accesses require checking with the server
• Or … server maintains state and sends invalidations

Delayed writes (write-behind)
• Only send writes to files in batch mode (i.e., buffer locally)
• One bulk write is more efficient than lots of little writes
• Problem: semantics become ambiguous
  – Watch out for consistency – others won't see updates!

Write on close
• Only allows session semantics
• If lock, must lock whole file

Outline

• Overview (done)
• Basic principles (done)
• Network File System (NFS) (next)
• Andrew File System (AFS)
• Dropbox

Network File System (NFS)

• Introduced in 1984 (by Sun Microsystems)
• Not the first made, but first to be used as a product
• Made interfaces public domain
  – Allowed other vendors to produce implementations
• Internet standard is NFS protocol (version 3)
  – RFC 1813
• Still widely deployed, up to v4, but maybe too bloated, so v3 widely used

NFS Overview

• Provides transparent access to remote files
  – Independent of OS (e.g., Mac, Linux, Windows) or hardware
• Symmetric – any computer can be server and client
  – But many institutions have dedicated servers
• Export some or all files
• Must support diskless clients
• Recovery from failure
  – Stateless, UDP, client retries
• High performance
  – Caching and read-ahead


Underlying Transport Protocol

• Initially NFS ran over UDP using Sun RPC
• Why UDP?
  – Slightly faster than TCP
  – No connection to maintain (or lose)
  – NFS designed for Ethernet LAN
    • Relatively reliable
    • Error detection but no correction
• NFS retries requests

NFS Protocols

• Since clients and servers can be implemented for different platforms, need a well-defined way to communicate → protocol
  – Protocol – agreed-upon set of requests and responses between clients and servers
• Once agreed upon, an Apple-implemented Mac NFS client can talk to a Sun-implemented Solaris NFS server
• NFS has two main protocols
  – Mounting Protocol: request access to exported directory tree
  – Directory and File Access Protocol: access files and directories (read, write, mkdir, readdir, …)

NFS Mounting Protocol

• Request permission to access contents at pathname
• Client
  – Parses pathname
  – Contacts server for file handle (see sketch below)
• Server
  – Returns file handle: file device #, i-node #, instance #
• Client
  – Creates in-memory VFS i-node at mount point
  – Internally points to r-node for remote files
• Client keeps state, not the server
• Soft-mounted – if client access fails, throw error to processes; but many do not handle file errors well
• Hard-mounted – client blocks processes, retries until server up (can cause problems when NFS server down)

NFS Architecture

• In many cases, on same LAN, but not required
  – Can even have client and server on same machine
• Directories available on server through /etc/exports
  – When client mounts, becomes part of directory hierarchy

[Figure: a client and two servers, each with its own root. The file system mounted at /usr/students is the sub-tree located at /export/people on Server 1, and the file system mounted at /usr/staff is the sub-tree located at /nfs/users on Server 2.]
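A toy version of the mount exchange above (field names are assumptions, not the real NFS wire format): the client sends a pathname, and the server answers with an opaque file handle built from the device number, i-node number, and instance number listed on the slide.

import os
from collections import namedtuple

# File handle contents as listed on the slide: device #, i-node #, instance #.
FileHandle = namedtuple("FileHandle", ["device", "inode", "instance"])

class MountServer:
    def __init__(self, exports, instance=1):
        self.exports = exports          # e.g. {"/export/people": "/export/people"}
        self.instance = instance        # bumped when the file system is rebuilt

    def mount(self, pathname):
        if pathname not in self.exports:
            raise PermissionError("not exported: " + pathname)
        st = os.stat(self.exports[pathname])
        # Opaque handle; the client stores it and presents it on later requests.
        return FileHandle(st.st_dev, st.st_ino, self.instance)

# Client side keeps the handle (the client holds the state, not the server):
#   server = MountServer({"/export/people": "/export/people"})
#   handle = server.mount("/export/people")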


Example NFS exports File

• File stored on server, typically /etc/exports

# See exports(5) for a description.
/public 192.168.1.0/255.255.255.0 (rw,no_root_squash)

• Share the folder /public
• Restrict to the 192.168.1.0/24 Class C subnet
  – Use '*' for wildcard/any
• Give read/write access
• Allow the root user to connect as root

NFS Automounter

• Automounter – only mount when accessing an empty NFS-specified directory
  – Attempt unmount every 5 minutes
  – Avoids long start-up costs when there are many NFS mounts
  – Useful if users don't need all mounts

NFS Access Protocol

• Most file operations supported from server to client (e.g., read(), write(), getattr())
• Doesn't support open() and close()
  – Instead, on, say, read(), the client sends the RPC handle, UFID, and offset
• Allows server to be stateless, not remember connections
  – Better for scaling and robustness
• However, typical Unix can lock a file on open(), unlock on close()
  – NFS must run a separate lock daemon

NFS Access Operations

• NFS has 16 functions (v2; v3 added six more)
• First, perform lookup RPC
  – Returns file handle and attributes (not like open(), since no information is stored on the server)


NFS Caching - Server

• Keep file data in memory as much as possible (avoid slow disk)
• Read-ahead – get subsequent blocks (typically 8 KB chunks) before needed
• Delayed write – only put data on disk when flushed from memory cache (typically every 30 seconds)
• Server responds by write-through (data to disk when client asks)
  – Performance can suffer, so another option writes to disk only when the file is closed, called commit

NFS Caching - Client

• Reduce number of requests to server (avoid slow network)
• Cache results of read, write, getattr, readdir
• Can result in different versions at client
  – Validate with timestamp (see sketch below)
  – When contacting server (open() or new block), invalidate block if server has a newer timestamp
• Clients responsible for polling server
  – Typically 3 seconds for a file
  – Typically 30 seconds for a directory
• Send written (dirty) blocks every 30 seconds
  – Flush on close()
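The client-side validation above can be sketched as follows (a simplification with invented names, not the real NFS client code): cached data is reused only if the file's attributes were checked within the 3-second/30-second windows from the slide, and a newer server timestamp invalidates the cached copy.

import time

FILE_POLL = 3      # seconds between attribute checks for files
DIR_POLL = 30      # seconds between attribute checks for directories

class CachedEntry:
    def __init__(self, data, server_mtime):
        self.data = data
        self.server_mtime = server_mtime
        self.last_checked = time.time()

class ClientCache:
    def __init__(self, server):
        self.server = server            # assumed to offer getattr(name) and read(name)
        self.entries = {}               # name -> CachedEntry

    def read(self, name, is_dir=False):
        poll = DIR_POLL if is_dir else FILE_POLL
        entry = self.entries.get(name)
        if entry and time.time() - entry.last_checked < poll:
            return entry.data                          # fresh enough: no server contact
        mtime = self.server.getattr(name)              # cheap attribute check
        if entry and entry.server_mtime == mtime:
            entry.last_checked = time.time()           # still valid: keep cached data
            return entry.data
        data = self.server.read(name)                  # stale or missing: refetch
        self.entries[name] = CachedEntry(data, mtime)
        return data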

Improve Read Performance

• Transfer data in large chunks
  – 8K bytes default (that used to be large)
• Read-ahead
  – Optimize for sequential file access
  – Send requests to read disk blocks before requested by the process

Problems with NFS

• File consistency (client caches)
• Assumes clocks are synchronized
• No locking
  – Separate lock manager needed, but adds state
• No reference count for open files
  – Could delete a file that others have open!
• File permissions may change
  – Invalidating access


NFS Version 3

• TCP support
  – UDP caused more problems on WANs (errors)
  – All traffic can be multiplexed on one connection
    • Minimizes connection setup
• Large-block transfers
  – Negotiate for optimal transfer size
  – No fixed limit on amount of data per request

NFS Version 4

• Adds state to the system
• Supports open() operations since state can be maintained on the server
• Read operations not absolute, but relative, and don't need all file information, just the handle
  – Shorter messages
• Locking integrated
• Includes optional security/encryption

Outline

• Overview (done)
• Basic principles (done)
• Network File System (NFS) (done)
• Andrew File System (AFS) (next)
• Dropbox

Andrew File System (AFS)

• Developed at CMU (hence the "Andrew", from "Andrew Carnegie")
  – Commercialized through IBM to OpenAFS (http://openafs.org/)
• Transparent access to remote files
• Using Unix-like file operations (creat, open, …)
• But AFS differs markedly from NFS in design and implementation…


General Observations Motivating AFS

• For Unix users
  – Most files are small, less than 10 KB in size
  – read() more common than write() – about 6x
  – Sequential access dominates, random access rare
  – Files referenced in bursts – if used recently, likely to be used again
• Typical scenarios for most files:
  – Many files are for one user only (i.e., not shared), so no problem
  – Shared files that are infrequently updated by others (e.g., code, large report), no problem
• Local cache of a few hundred MB is enough of a working set for most users
• What doesn't fit? → databases – updated frequently, often shared, need fine-grained control
  – Explicitly, AFS is not for databases

AFS Design

• Scalability is the most important design goal
  – More users than other distributed systems
• Key strategy is caching of whole files at clients
  – Whole-file serving – entire files and directories
  – Whole-file caching – clients store cache on disk
    • Typically several hundred files
    • Permanent, so still there if rebooted

AFS Example

• Process at client issues open() system call
• Check if local cached copy (yes, then use; no, then next step)
• Send request to server
• Server sends back entire copy
• Client opens file (normal Unix file descriptor)
• read(), write(), etc. all apply to the copy
• When close(), if the local cached copy changed, send it back to the server (sketch below)

AFS Questions

• How does AFS gain control on open() or close()?
• What space is allocated for cached files on workstations?
• How does the FS ensure cached copies are up-to-date, since they may be updated by several clients?
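The open/close flow in the AFS example can be sketched like this (class and method names are invented; real AFS does this inside Venus and the kernel, not in application code): open() pulls a whole copy into the local cache if needed, reads and writes go to that copy, and close() ships a modified copy back.

import os

class WholeFileClient:
    def __init__(self, server, cache_dir):
        self.server = server            # assumed to offer fetch(name) -> bytes and store(name, bytes)
        self.cache_dir = cache_dir
        self.dirty = {}                 # local path -> remote name, for files opened for writing

    def open(self, name, mode="r"):
        local = os.path.join(self.cache_dir, name.replace("/", "_"))
        if not os.path.exists(local):                    # not cached: fetch the whole file
            with open(local, "wb") as f:
                f.write(self.server.fetch(name))
        f = open(local, mode)                            # ordinary local file descriptor
        if "w" in mode or "a" in mode or "+" in mode:
            self.dirty[local] = name
        return f                                         # read()/write() are purely local

    def close(self, f):
        local = f.name
        f.close()
        if local in self.dirty:                          # changed: send the whole file back
            with open(local, "rb") as g:
                self.server.store(self.dirty.pop(local), g.read())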


AFS Architecture

• Vice – implements flat file system on server
• Venus – intercepts remote requests, passes them to Vice

[Figure: workstations each run user programs, Venus, and the UNIX kernel; servers run Vice over the UNIX kernel; all connected by the network.]

System Call Interception in AFS

• Kernel mod to open() and close() system calls
• If remote, pass to Venus

[Figure: within a workstation, a user program's non-local file operations are passed by the UNIX kernel to Venus, while local UNIX file operations are handled by the kernel directly.]

Cache Consistency

• Vice issues a callback promise with each file
• If the server copy changes, it "calls back" to Venus processes, cancelling the file
  – Note, change only happens on close of the whole file
• If a Venus process then opens the file, it must fetch a copy from the server
• If reboot, cannot be sure callbacks are all correct (may have missed some)
  – Checks with server for each open
• Note, versus traditional cache checking, AFS has far less communication for non-shared, read-only files
• (Callback sketch after the table below)

Implementation of System Calls in AFS

open(FileName, mode)
  UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus. Open the local file and return the file descriptor to the application.
  Venus: Check the list of files in the local cache. If not present, or there is no valid callback promise, send a request for the file to the Vice server that is the custodian of the volume containing the file. Place the copy of the file in the local file system, enter its local name in the local cache list, and return the local name to UNIX.
  Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.

read(FileDescriptor, Buffer, length)
  UNIX kernel: Perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length)
  UNIX kernel: Perform a normal UNIX write operation on the local copy.

close(FileDescriptor)
  UNIX kernel: Close the local copy and notify Venus that the file has been closed.
  Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.
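A server-side sketch of callback promises (hypothetical data structures; real Vice tracks volumes, versions, and much more): the server remembers which clients hold a copy of each file, and when one client closes a modified copy it "calls back" the others so their cached copies are cancelled.

from collections import defaultdict

class CallbackServer:
    def __init__(self):
        self.files = {}                          # name -> current contents
        self.promises = defaultdict(set)         # name -> clients holding a valid copy

    def fetch(self, client, name):
        self.promises[name].add(client)          # record the callback promise
        return self.files.get(name, b"")

    def store(self, client, name, data):
        self.files[name] = data                  # replace contents on close of the whole file
        for other in self.promises[name] - {client}:
            other.invalidate(name)               # "call back": cancel their cached copies
        self.promises[name] = {client}           # only the writer's copy is still valid

class Client:
    def __init__(self):
        self.cache = {}
    def invalidate(self, name):
        self.cache.pop(name, None)               # next open must refetch from the server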


Update Semantics

• No other access control mechanisms
• If several workstations close() a file after writing, only the last file written will be kept
  – Others silently lost
• Clients must implement concurrency control separately
• If two processes on the same machine access a file, local Unix semantics apply

AFS Misc

• 1989: Benchmark with 18 clients, standard NFS load
  – Up to 120% improvement over NFS
• 1996: Transarc (acquired by IBM) deployed on 1000 servers over 150 sites
  – 96-98% cache hit rate
• Today, some AFS cells have up to 25,000 clients (Morgan Stanley)
• OpenAFS standard: http://www.openafs.org/

Other Distributed File Systems

• SMB: Server Message Blocks, Microsoft (Samba is a free re-implementation of SMB). Favors locking and consistency over client caching.
• Coda: AFS spin-off at CMU. Disconnection and fault recovery.
• Sprite: research project in the 1980s from UC Berkeley; introduced the log-structured file system.
• Amoeba Bullet File Server: Tanenbaum research project. Favors throughput with atomic file change.
• xFS: serverless file system that distributes file service across multiple machines (distinct from SGI's XFS for Irix).

Outline

• Overview (done)
• Basic principles (done)
• Network File System (NFS) (done)
• Andrew File System (AFS) (done)
• Dropbox (next)


File Synchronization

• Client runs on desktop
• Copies changes to a local folder
  – Uploaded automatically
  – Downloads new versions automatically
  – Traffic only when changes occur (see sketch below)
• Huge scale – 100+ million users, 1 billion files/day
• Design
  – Small client, few resources
  – Possibility of low-capacity network to user
  – Scalable back-end
  – (99% of code in Python)

Dropbox Differences

• Most Web apps have high read-to-write ratios
  – e.g., Twitter, Facebook, Reddit: 100:1, 1000:1, or more
• Dropbox
  – Everyone's computer has a complete copy of their Dropbox
  – File upload : file download is about 1:1
  – Huge number of uploads compared to a traditional service
• Guided by ACID requirements (from databases)
  – Atomic – don't share partially modified files
  – Consistent – operations have to be in order, reliable; cannot delete a file in a shared folder and have others still see it
  – Durable – files cannot disappear
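A toy illustration of "traffic only when changes occur" (the hashing scheme and function names are invented here; this is not Dropbox's actual protocol): the client hashes the files in the synced folder and uploads only those whose content hash changed since the last scan.

import hashlib
import os

def content_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(64 * 1024), b""):
            h.update(block)
    return h.hexdigest()

def sync_folder(folder, known_hashes, upload):
    """Upload only files whose contents changed since the previous scan."""
    for dirpath, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = content_hash(path)
            if known_hashes.get(path) != digest:      # new or modified file
                upload(path)                          # e.g. a request to the back-end
                known_hashes[path] = digest           # unchanged files cost no traffic next time
    return known_hashes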

Dropbox Architecture – v1

Dropbox Architecture – v2


Dropbox Architecture – v3

Dropbox Architecture – v4

Dropbox Architecture – v5

Bit Bucket


File System Functions

• Abstraction of file system functions that apply to distributed file systems
  – Directory module: relates file names to file IDs
  – File module: relates file IDs to particular files
  – Access control module: checks permission for the operation requested
  – File access module: reads or writes file data or attributes
  – Block module: accesses and allocates disk blocks
  – Device module: disk I/O and buffering
• Most use a set of functions derived from Unix (next slide; usage sketch follows the table)

UNIX File System Operations

filedes = open(name, mode): Opens an existing file with the given name.
filedes = creat(name, mode): Creates a new file with the given name. Both operations deliver a file descriptor referencing the open file. The mode is read, write, or both.
status = close(filedes): Closes the open file filedes.
count = read(filedes, buffer, n): Transfers n bytes from the file referenced by filedes into buffer.
count = write(filedes, buffer, n): Transfers n bytes into the file referenced by filedes from buffer. Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence): Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name): Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2): Adds a new name (name2) for the file (name1).
status = stat(name, buffer): Gets the file attributes for file name into buffer.
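For a quick way to try these operations, Python's os module exposes them almost one-for-one (the file names below are just examples):

import os

fd = os.open("example.txt", os.O_RDWR | os.O_CREAT, 0o644)   # open/creat -> file descriptor
os.write(fd, b"hello distributed file systems\n")            # write(filedes, buffer, n)
os.lseek(fd, 0, os.SEEK_SET)                                  # move the read-write pointer
data = os.read(fd, 5)                                         # read(filedes, buffer, n) -> b"hello"
os.close(fd)                                                  # close(filedes)

info = os.stat("example.txt")                                 # stat(name, buffer)
print(info.st_size)

os.link("example.txt", "alias.txt")                           # link(name1, name2)
os.unlink("alias.txt")                                        # unlink(name)
os.unlink("example.txt")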

File Service Architecture

[Figure: the client computer runs application programs and a client module; the server computer runs the directory service and the flat file service.]

• Flat file service
  – Implements operations on the files
  – Manages unique file identifiers (UFIDs) – create, delete
• Directory service
  – Mapping between text names and UFIDs
  – Create, delete new directories and entries
  – Ideally, hierarchies, as in Unix/Windows
• Client module
  – Integrates flat file and directory services under a single API (sketch below)
  – Makes them available to all computers
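A minimal sketch of this split (UFID generation and method names are assumptions for illustration): the flat file service works purely in terms of UFIDs, the directory service maps text names to UFIDs, and the client module combines them behind one API.

import itertools

class FlatFileService:
    """Operates on files identified only by UFIDs."""
    def __init__(self):
        self._next = itertools.count(1)
        self._files = {}                       # ufid -> bytearray

    def create(self):
        ufid = next(self._next)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, offset, count):
        return bytes(self._files[ufid][offset:offset + count])

    def write(self, ufid, offset, data):
        buf = self._files[ufid]
        buf[offset:offset + len(data)] = data

class DirectoryService:
    """Maps text names to UFIDs."""
    def __init__(self):
        self._names = {}
    def add(self, name, ufid):
        self._names[name] = ufid
    def lookup(self, name):
        return self._names[name]

class ClientModule:
    """Single API that integrates the two services."""
    def __init__(self, files, dirs):
        self.files, self.dirs = files, dirs
    def create(self, name):
        self.dirs.add(name, self.files.create())
    def write(self, name, offset, data):
        self.files.write(self.dirs.lookup(name), offset, data)
    def read(self, name, offset, count):
        return self.files.read(self.dirs.lookup(name), offset, count)

# Usage: cm = ClientModule(FlatFileService(), DirectoryService())
#        cm.create("notes.txt"); cm.write("notes.txt", 0, b"hi"); cm.read("notes.txt", 0, 2)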
