
Distributed Computing Systems: Distributed File Systems



Distributed File Systems

• Early networking and files
  – Had FTP to transfer files
  – Telnet to remote login to other systems with files
• But want more transparency!
  – Local computing with remote file system
• Distributed file systems → one of the earliest distributed system components
  – Enable programs to access remote files as if local (transparency)
  – Allow sharing of data and programs
  – Performance and reliability comparable to local disk

Outline

• Overview (done)
• Basic principles (next)
  – Concepts
  – Models
• Network File System (NFS)
• Andrew File System (AFS)
• Dropbox

Concepts of Distributed File System

• Transparency
• Concurrent Updates
• Replication
• Fault Tolerance
• Consistency
• Platform Independence
• Security
• Efficiency

Transparency

Illusion that local and remote files are similar. Includes:
• Access transparency – a single set of operations; clients that work on local files can work with remote files
• Location transparency – clients see a uniform name space; files can be relocated without changing path names
• Mobility transparency – files can be moved without modifying programs or changing system tables
• Performance transparency – within limits, local and remote file access meet performance standards
• Scaling transparency – increased loads do not degrade performance significantly; capacity can be expanded

Concurrent Updates

• Changes to a file from one client should not interfere with changes from other clients
  – Even if the changes happen at the same time
• Solutions often include:
  – File or record-level locking (see the sketch below)
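A minimal sketch of file/record-level locking using standard POSIX advisory locks (fcntl(2)). This shows the mechanism applications typically see, not any particular distributed file system's own locking protocol; the file name is illustrative.

    /* Sketch: advisory record lock on a byte range with fcntl(2).
     * Works on local files; on NFS it relies on the separate lock
     * daemon mentioned later in these notes. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        struct flock lk = {
            .l_type   = F_WRLCK,   /* exclusive (write) lock        */
            .l_whence = SEEK_SET,
            .l_start  = 0,         /* lock bytes 0..99 only         */
            .l_len    = 100,
        };

        if (fcntl(fd, F_SETLKW, &lk) < 0) {  /* block until granted */
            perror("fcntl(F_SETLKW)");
            return 1;
        }

        /* ... update the locked record here ... */

        lk.l_type = F_UNLCK;                 /* release the lock    */
        fcntl(fd, F_SETLK, &lk);
        close(fd);
        return 0;
    }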



Replication

• File may have several copies of its data at different locations
  – Often for performance reasons
  – Requires updating other copies when one copy is changed
• Simple solution
  – Change master copy and periodically refresh other copies
• More complicated solution
  – Multiple copies can be updated independently at the same time → needs finer-grained refresh and/or merge

Fault Tolerance

• Function when clients or servers fail
• Detect, report, and correct faults that occur
• Solutions often include:
  – Redundant copies of data, redundant hardware, backups, transaction logs and other measures
  – Stateless servers
  – Idempotent operations


Consistency

• Data must always be complete, current, and correct
• File seen by one process looks the same for all processes accessing it
• Consistency is a special concern whenever data is duplicated
• Solutions often include:
  – Timestamps and ownership information

Platform Independence

• Access even though hardware and OS are completely different in design, architecture and functioning, from different vendors
• Solutions often include:
  – Well-defined way for clients to communicate with servers (protocol)


Security

• File systems must be protected against unauthorized access, data corruption, loss and other threats
• Solutions include:
  – Access control mechanisms (ownership, permissions)
  – Encryption of commands or data to prevent "sniffing"

Efficiency

• Overall, want the same power and generality as local file systems
• Early days, goal was to share an "expensive" resource → the disk
• Now, allow convenient access to remotely stored files



Outline

• Overview (done)
• Basic principles (next)
  – Concepts
  – Models
• Network File System (NFS)
• Andrew File System (AFS)
• Dropbox

File Service Models

Upload/Download Model
• Read file: copy file from server to client
• Write file: copy file from client to server
• Good
  – Simple
• Bad
  – Wasteful – what if client only needs a small piece?
  – Problematic – what if client doesn't have enough space?
  – Consistency – what if others need to modify the file?

Remote Access Model
• File service provides functional interface
  – Create, delete, read bytes, write bytes, …
• Good
  – Client only gets what's needed
  – Server can manage coherent view of file system
• Bad
  – Possible server and network congestion
    – Servers used for duration of access
    – Same data may be requested repeatedly

Semantics of File Service

Sequential Semantics
• Read returns result of last write
• Easily achieved if
  – Only one server
  – Clients do not cache data
• But
  – Performance problems if no cache
    – Can instead write-through
  – Must notify clients holding copies
  – Requires extra state, generates extra traffic

Session Semantics
• Relax the sequential rules
• Changes to an open file are initially visible only to the process that modified it
• Last process to modify the file "wins"
• Can hide or lock a file under modification from other clients

Accessing Remote Files (1 of 2)

• For transparency, implement client as a module under the Virtual File System (VFS)

(Additional picture next slide)

Accessing Remote Files (2 of 2)

• Virtual file system allows for transparency (figure: VFS layer routing local and remote accesses)

Stateful or Stateless Design

Stateful – server maintains client-specific state
• Shorter requests
• Better performance in processing requests
• Cache coherence possible
  – Server can know who's accessing what
• File locking possible

Stateless – server maintains no information on client accesses
• Each request must identify file and offsets
• Server can crash and recover
  – No state to lose
• No open/close needed
  – They only establish state
• No server space used for state
  – Don't worry about supporting many clients
• Problems if file is deleted on server
• File locking not possible
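To make the "each request must identify file and offsets" point concrete, here is a hypothetical layout of a stateless read request. The field names and sizes are illustrative only, not the real NFS XDR structures.

    /* Sketch: what a stateless read request must carry.
     * Field names are illustrative, not the real NFS wire format. */
    #include <stdint.h>
    #include <stdio.h>

    struct file_handle {            /* opaque id the server can resolve  */
        uint8_t data[32];
    };

    struct read_request {
        struct file_handle fh;      /* which file (no prior open needed) */
        uint64_t offset;            /* where in the file                 */
        uint32_t count;             /* how many bytes                    */
    };

    int main(void)
    {
        struct read_request req = { .offset = 8192, .count = 4096 };
        /* every request is self-describing, so the server keeps no state */
        printf("request is about %zu bytes\n", sizeof req);
        return 0;
    }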


Caching

• Hide latency to improve performance for repeated accesses
• Four places:
  – Server's disk
  – Server's buffer cache (memory)
  – Client's buffer cache (memory)
  – Client's disk
• Client caches risk cache consistency problems

Concepts of Caching (1 of 2)

Centralized control
• Keep track of what files each client has open and cached
• Stateful file system with signaling traffic

Read-ahead (pre-fetch)
• Request chunks of data before they are needed
• Minimize wait when actually needed
• But what if data pre-fetched is out of date?
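Read-ahead can also be expressed from an application with posix_fadvise(2) and POSIX_FADV_WILLNEED. This is only a sketch: servers and kernels do the equivalent internally, and the 8 KB chunk size here is just the "typical" figure from these notes.

    /* Sketch: asking the OS to pre-fetch the next chunk while we
     * process the current one (application-level read-ahead). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #define CHUNK (8 * 1024)        /* 8 KB, the "typical" NFS chunk */

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char buf[CHUNK];
        off_t off = 0;
        ssize_t n;
        while ((n = pread(fd, buf, sizeof buf, off)) > 0) {
            /* hint: we will want the next chunk soon */
            posix_fadvise(fd, off + n, CHUNK, POSIX_FADV_WILLNEED);
            /* ... process buf[0..n) here ... */
            off += n;
        }
        close(fd);
        return 0;
    }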

Concepts of Caching (2 of 2)

Write-through
• All writes to a file are sent to the server
  – What if another client reads its own (out-of-date) cached copy?
• All accesses require checking with the server
• Or… server maintains state and sends invalidations

Delayed writes (write-behind)
• Only send writes to files in batch mode (i.e., buffer locally)
• One bulk write is more efficient than lots of little writes
• Problem: semantics become ambiguous
  – Watch out for consistency – others won't see updates!

Write on close
• Only allows session semantics
• If locking, must lock the whole file

(A short write-policy sketch follows the outline below.)

Outline

• Overview (done)
• Basic principles (done)
• Network File System (NFS) (next)
• Andrew File System (AFS)
• Dropbox
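As a concrete illustration of the write-through versus delayed-write trade-off above, here is a sketch using ordinary POSIX calls (not the internals of any particular DFS): O_SYNC approximates write-through, while buffered writes plus one fsync() approximate delayed writes. File names are illustrative.

    /* Sketch: write-through vs. delayed (buffered) writes at the
     * system-call level. A DFS makes the same trade-off internally. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *msg = "hello\n";

        /* Write-through: O_SYNC makes each write() reach stable storage
         * before returning -- safe but slow. */
        int wt = open("through.log", O_WRONLY | O_CREAT | O_SYNC, 0644);
        if (wt >= 0) { write(wt, msg, strlen(msg)); close(wt); }

        /* Delayed write: writes sit in the buffer cache and are pushed
         * out later in bulk; fsync() (or flushing at close) is the
         * explicit "commit" point. */
        int dw = open("delayed.log", O_WRONLY | O_CREAT, 0644);
        if (dw >= 0) {
            for (int i = 0; i < 1000; i++)
                write(dw, msg, strlen(msg));   /* cheap, buffered */
            fsync(dw);                         /* one bulk flush  */
            close(dw);
        }
        return 0;
    }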

Network File System (NFS)

• Introduced in 1984 (by Sun Microsystems)
  – First was the 1970's Data Access Protocol by DEC
  – But NFS was the first to be used as a product
  – Developed in conjunction with Sun RPC
• Interfaces put in the public domain
  – Request For Comments (RFC) by the Internet Engineering Task Force (IETF) – technical development of Internet standards
  – Allowed other vendors to produce implementations
• Internet standard is the NFS protocol (version 3)
  – RFC 1813
• Still widely deployed, up to v4, but maybe too bloated, so v3 widely used

NFS Overview

• Provides transparent access to remote files
  – Independent of OS (e.g., Mac, Linux, Windows) or hardware
• Symmetric – any machine can be both server and client
  – But many setups have a dedicated server
• Export some or all files
• Must support diskless clients
• Recovery from failure
  – Stateless, UDP, client retries
• High performance
  – Caching and read-ahead


Underlying Transport Protocol

• Initially NFS ran over UDP using Sun RPC
• Why UDP?
  – Slightly faster than TCP
  – No connection to maintain (or lose)
  – Reliable send not an issue
    – NFS is designed for Ethernet LAN (relatively reliable)
    – UDP has error detection but no correction
  – NFS retries requests upon error/timeout

NFS Protocols

• Since clients and servers can be implemented for different platforms, need a well-defined way to communicate → protocol
  – Protocol – agreed-upon set of requests and responses between clients and servers
• Once agreed upon, an Apple Mac NFS client can talk to a Sun Solaris NFS server
• NFS has two main protocols
  – Mounting Protocol – request access to an exported directory tree
  – Directory and File Access Protocol – access files and directories (read, write, mkdir, readdir, …)

NFS Mounting Protocol

• Request permission to access contents at a pathname
• Client
  – Parses pathname
  – Contacts server for file handle
• Server
  – Returns file handle: file device #, inode #, instance #
• Client
  – Creates in-memory VFS inode at mount point
  – Internally points to an r-node (for remote/RPC) for remote files
    – Client keeps state, not the server
• Soft-mounted – if client access fails, throw an error to processes; but many programs do not handle file errors well
• Hard-mounted – client blocks processes, retries until server is up (can cause problems when NFS server is down)

NFS Architecture

• In many cases on the same LAN, but not required
  – Can even have client and server on the same machine
• Directories made available on the server through /etc/exports
  – When a client mounts them, they become part of its directory hierarchy
• (Figure: client mounts sub-trees from Server 1 and Server 2 into its own hierarchy.) File system mounted at /usr/students is the sub-tree located at /export/people on Server 1, and file system mounted at /usr/staff is the sub-tree located at /nfs/users on Server 2

Example NFS exports File

• File stored on server, typically /etc/exports

    # See exports(5) for a description.
    /public 192.168.1.0/255.255.255.0(rw,no_root_squash)

• Share folder /public
• Restrict to the 192.168.1.0/24 Class C subnet
  – Use '*' for wildcard/any
• Give read/write access (rw)
• Allow root user to connect as root (no_root_squash)

NFS Automounter

• Automounter – only mount when an empty NFS-specified directory is accessed
  – Attempt unmount every 5 minutes
  – Conserves local resources if users don't need the mount
  – Avoids dependencies on unneeded servers when there are many NFS mounts


NFS Access Protocol

• Most file operations supported from client to server (e.g., read(), write(), getattr())
  – But doesn't support open() and close()
• First, client performs a lookup RPC
  – Gets RPC handle for connection/return call
  – Successful call gets file handle (UFID) and attributes
  – Note, not like open() since no information is stored on the server
• On, e.g., read(), client sends RPC handle, UFID and offset
• Allows server to be stateless, not remember connections
  – Better for scaling and robustness
• However, a typical Unix file system can lock a file on open(), unlock on close()
  – If doing this with NFS, must run a separate lock daemon

NFS Access Operations

• NFS has 16 core operations (v2; v3 added six more)
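A sketch of the open-less access pattern above: one lookup turns a pathname into a handle, and every read then carries (handle, offset, count). The lookup_rpc() and read_rpc() functions below are local stand-ins for the real RPCs, just to show the sequence.

    /* Sketch: the NFS-style access pattern -- no open()/close(),
     * every read carries (handle, offset, count). */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct { uint64_t id; } fhandle_t;   /* opaque file handle */

    static int lookup_rpc(const char *name, fhandle_t *fh)
    {
        fh->id = 42;                 /* pretend the server resolved it */
        printf("LOOKUP %s -> handle %llu\n", name, (unsigned long long)fh->id);
        return 0;
    }

    static int read_rpc(fhandle_t fh, uint64_t off, uint32_t cnt, char *buf)
    {
        memset(buf, 'x', cnt);       /* pretend the server returned data */
        printf("READ handle=%llu offset=%llu count=%u\n",
               (unsigned long long)fh.id, (unsigned long long)off, cnt);
        return (int)cnt;
    }

    int main(void)
    {
        fhandle_t fh;
        char buf[8192];

        lookup_rpc("/export/data/report.txt", &fh);  /* once; no server state kept */
        for (uint64_t off = 0; off < 3 * sizeof buf; off += sizeof buf)
            read_rpc(fh, off, sizeof buf, buf);      /* each call self-contained   */
        return 0;
    }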

NFS Caching - Server

• Keep file data in memory as much as possible (avoid slow disk)
• Read-ahead – get subsequent blocks (typically 8 KB chunks) before needed
• Server supports write-through (data to disk immediately when client asks)
  – Performance can suffer, so another option is to write only when the file is closed, called commit

NFS Caching - Client

• Reduce number of requests to server (avoid slow network)
• Cache read(), write(), getattr(), readdir() results
• Can result in different versions at the client
  – Validate with timestamp
  – When contacting server (local open() or new block), invalidate block if server has a newer timestamp
• Clients responsible for polling server
  – Typically 3 seconds for a file
  – Typically 30 seconds for a directory
• Delayed write – only put data on disk in batches when using memory cache
  – Send written (dirty) blocks to server, typically every 30 seconds
  – Flush on close()
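A sketch of the client-side timestamp validation described above, using the 3-second/30-second windows from the notes. get_server_mtime() is a local stand-in for a GETATTR request; the structure and names are illustrative, not the actual NFS client code.

    /* Sketch: NFS-style client cache validation.  An entry is trusted
     * for a short window; after that the client re-checks attributes
     * and invalidates the entry if the server's mtime is newer. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    struct cache_entry {
        time_t validated;   /* when we last checked with the server */
        time_t mtime;       /* server modification time we cached   */
        bool   is_dir;
    };

    static time_t get_server_mtime(const struct cache_entry *e)
    {
        return e->mtime;    /* stand-in: pretend nothing changed */
    }

    static bool cache_entry_fresh(struct cache_entry *e)
    {
        time_t now = time(NULL);
        time_t window = e->is_dir ? 30 : 3;      /* dir vs. file window */

        if (now - e->validated < window)
            return true;                          /* still inside window */

        time_t server_mtime = get_server_mtime(e); /* ask the server     */
        e->validated = now;
        if (server_mtime > e->mtime) {
            e->mtime = server_mtime;
            return false;                         /* stale: refetch data */
        }
        return true;
    }

    int main(void)
    {
        struct cache_entry e = { .validated = time(NULL) - 10, .mtime = 0 };
        printf("entry fresh? %s\n", cache_entry_fresh(&e) ? "yes" : "no");
        return 0;
    }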

Improve Read Performance

• Transfer data in large chunks
  – 8K bytes "typical" default (that used to be large)
  – Common Linux default 32K
• Read-ahead
  – Optimize for sequential file access
  – Send requests to read disk blocks before they are requested by the process
• Generally → tune NFS performance
  – Many possibilities – server threads, network timeout, cache write policy, cache sizes, server disk layout, …
  – "Best" depends upon system and workload

Problems with NFS

• File consistency (if client caches)
• Assumes clocks are synchronized
• No locking
  – Separate lock manager needed, but adds state
• No reference count for open files
  – Could delete a file that others have open!
• File permissions may change
  – Invalidating access


NFS Version 3

• TCP support
  – UDP caused more problems (errors) on WANs or wireless
  – Realized all traffic from one client to a server can be multiplexed on one connection
    – Minimizes connection setup cost
• Large-block transfers
  – Negotiate for optimal transfer size
  – No fixed limit on amount of data per request

NFS Version 4

• Adds state to the system
• Supports open() operations since state can be maintained on the server
• Read operations are not absolute, but relative, and don't need all file information, just the handle
  – Shorter messages
• Locking integrated
• Includes optional security/encryption

Outline

• Overview (done)
• Basic principles (done)
• Network File System (NFS) (done)
• Andrew File System (AFS) (next)
• Dropbox

Andrew File System (AFS)

• Developed at CMU in the 1980's (hence the "Andrew", from "Andrew Carnegie")
  – Commercialized through IBM to OpenAFS (http://openafs.org/)
• Transparent access to remote files
• Using Unix-like file operations (creat(), open(), …)
• But AFS differs markedly from NFS in design and implementation…

General Observations Motivating AFS

• For Unix users
  – Most files are small, less than 10 KB in size
  – read() more common than write() – about 6x
  – Sequential access dominates, random access rare
  – Files referenced in bursts – if used recently, will likely be used again
• Typical scenarios for most files:
  – Many files are for one user only (i.e., not shared), so no problem
  – Shared files that are infrequently updated by others (e.g., code, large reports) are no problem
  – A local cache of a few hundred MB is enough for the working set of most users
• What doesn't fit? → databases – updated frequently, often shared, need fine-grained control
  – Explicitly, AFS is not for databases

AFS Design

• Scalability is the most important design goal
  – Distributed file systems generally have more users than other distributed systems
• Key strategy is caching of whole files at clients
  – Whole-file serving – entire files and directories
  – Whole-file caching – clients store the cache on disk
    – Typically several hundred files
    – "Permanent" in that it is written to local disk, so still there after a reboot


AFS Example

• Process at client issues open() system call
• Check for a local cached copy
  – Yes? Then use it. Done.
  – No? Then proceed to the next step.
• Send request to server
• Server sends back entire copy
• Client opens file (normal Unix file descriptor, local access)
• read(), write(), etc. all apply to the local copy
• When close(), if the local cached copy changed, send it back to the server (see the sketch below)

AFS Questions

• How does AFS gain control on open() or close()?
• What space is allocated for cached files on clients?
• How does AFS ensure cached copies are up-to-date, since they may be updated by several clients?
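A sketch of the AFS Example flow above: open() uses the cached copy or fetches the whole file, all reads and writes are local, and close() writes back only if the copy changed. The fetch/store helpers and paths are stand-ins for Venus talking to Vice, not real OpenAFS interfaces.

    /* Sketch of the AFS whole-file flow: open() either uses the cached
     * copy or fetches the entire file; close() ships it back only if it
     * changed. */
    #include <stdbool.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static bool have_valid_cached_copy(const char *path) { (void)path; return false; }
    static void fetch_whole_file(const char *path, const char *cache_path)
    { printf("fetch %s -> %s\n", path, cache_path); }
    static void store_whole_file(const char *cache_path, const char *path)
    { printf("store %s -> %s\n", cache_path, path); }

    static int afs_open(const char *path, const char *cache_path)
    {
        if (!have_valid_cached_copy(path))
            fetch_whole_file(path, cache_path);       /* whole file, once */
        return open(cache_path, O_RDWR | O_CREAT, 0644); /* plain local fd */
    }

    static void afs_close(int fd, const char *cache_path, const char *path, bool dirty)
    {
        close(fd);                              /* all reads/writes were local */
        if (dirty)
            store_whole_file(cache_path, path); /* write back on close         */
    }

    int main(void)
    {
        int fd = afs_open("/afs/example.edu/user/notes.txt", "/tmp/afs_cache_0001");
        if (fd >= 0) {
            write(fd, "x", 1);
            afs_close(fd, "/tmp/afs_cache_0001", "/afs/example.edu/user/notes.txt", true);
        }
        return 0;
    }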

AFS Architecture

• (Figure: workstations run user programs and the Venus process above the UNIX kernel; servers run the Vice process above the UNIX kernel; all connected over the network.)
• Vice – implements a flat file system on the server
• Venus – intercepts remote requests and passes them to Vice
  – Venus provides the directory structure, relative location, working directory

System Call Interception in AFS

• Kernel modified to intercept open() and close() system calls
• Local operations go to the UNIX file system; non-local file operations are passed to Venus

Cache Consistency

• Vice issues a callback promise with each file
• If the server copy changes, it "calls back" to Venus processes, cancelling the file
  – Note, change only happens on close of the whole file
• If a Venus process re-opens the file, it must fetch a fresh copy from the server (see the sketch below)
  – Note, if a client already had the file open, it will still proceed
• If a client reboots, it cannot be sure its callbacks are all correct (may have missed some)
  – Checks with the server on each open
• Note, versus traditional cache checking, AFS needs far less communication for non-shared, read-only files

Implementation of System Calls in AFS

• open(FileName, mode)
  – UNIX kernel: if FileName refers to a file in shared file space, pass the request to Venus
  – Venus: check the list of files in the local cache; if not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file
  – Vice: transfer a copy of the file and a callback promise to the workstation; log the callback promise
  – Venus: place the copy of the file in the local file system, enter its local name in the local cache list, and return the local name to UNIX
  – UNIX kernel: open the local file and return the file descriptor to the application
• read(FileDescriptor, Buffer, length) / write(FileDescriptor, Buffer, length)
  – UNIX kernel: perform a normal UNIX read or write operation on the local copy
• close(FileDescriptor)
  – UNIX kernel: close the local copy and notify Venus that the file has been closed
  – Venus: if the local copy has been changed, send a copy to the Vice server that is the custodian of the file
  – Vice: replace the file contents and send a callback to all other clients holding callback promises on the file
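A sketch of the callback-promise check Venus makes on open(): a valid promise means the cached copy can be used with no server traffic; a cancelled (or, after a reboot, unknown) promise forces a check with Vice. The states and functions are illustrative, not OpenAFS code, and the "unknown" case is simplified here to a refetch.

    /* Sketch of the callback-promise logic on open(). */
    #include <stdio.h>

    enum promise { PROMISE_VALID, PROMISE_CANCELLED, PROMISE_UNKNOWN };

    static enum promise promise_state = PROMISE_VALID;

    static void fetch_file_and_promise(const char *path)
    {
        printf("fetch %s plus a new callback promise from Vice\n", path);
        promise_state = PROMISE_VALID;
    }

    static void venus_open(const char *path)
    {
        switch (promise_state) {
        case PROMISE_VALID:
            printf("use cached copy of %s (no server contact)\n", path);
            break;
        case PROMISE_CANCELLED:   /* server called back: someone changed it     */
        case PROMISE_UNKNOWN:     /* after a reboot: must check with the server */
            fetch_file_and_promise(path);
            break;
        }
    }

    int main(void)
    {
        venus_open("/afs/example.edu/proj/report.tex");  /* cached, promise valid */
        promise_state = PROMISE_CANCELLED;               /* server cancelled it   */
        venus_open("/afs/example.edu/proj/report.tex");  /* must refetch          */
        return 0;
    }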


Update Semantics

• No other access control mechanisms
• If several workstations close() a file after writing, only the last file written is kept
  – Others are silently lost
• Clients must implement concurrency control separately
• If two processes on the same machine access a file, local Unix semantics apply (i.e., generally none, unless the processes explicitly lock)

AFS Misc

• 1989: benchmark with 18 clients, standard NFS load
  – Up to 120% improvement over NFS
• 1996: Transarc (acquired by IBM) deployed on 1000 servers over 150 sites
  – 96-98% cache hit rate
• Today, some AFS cells have up to 25,000 clients (Morgan Stanley)
• OpenAFS standard: http://www.openafs.org/

Other Distributed File Systems

• SMB: Server Message Blocks, Microsoft (Samba is a free re-implementation of SMB). Favors locking and consistency over client caching.
• CODA: AFS spin-off at CMU. Disconnection and fault recovery.
• Sprite: research project in the 1980's from UC Berkeley, introduced the first journaling file system.
• Amoeba Bullet File Server: Tanenbaum research project. Favors throughput with atomic file change.
• xFS: SGI serverless file system, distributing the file system across multiple machines for the Irix OS.

Outline

• Overview (done)
• Basic principles (done)
• Network File System (NFS) (done)
• Andrew File System (AFS) (done)
• Dropbox (next)

Dropbox Overview (1 of 3)

• Client runs on desktop
• Copies changes to the local folder
  – Uploaded automatically
  – New versions downloaded automatically
• Huge scale – 100+ million users, 1 billion files/day
• Design
  – Small client, few resources
  – Possibility of a low-capacity network to the user
  – Scalable back-end
  – (99% of code in Python)

Dropbox Overview (2 of 3)

• Motivation: most Web apps have high read-to-write ratios
  – e.g., Twitter, Facebook, reddit at 100:1, 1000:1, or more
• Everyone's computer has a complete copy of their Dropbox
  – Run a daemon on the computer to track the "Sync" folder
• Traffic only when changes occur
  – Results in a file upload : file download ratio of about 1:1
  – Huge number of uploads compared to a traditional service
• Uses compression to reduce traffic


Dropbox Overview (3 of 3)

• (Figure: the Dropbox daemon on the client checks for updates (e.g., stat()) and uploads files (e.g., send()).)

Dropbox Upload (1 of 3)

• Client attempts to "commit" a new file
  – Breaks the file into blocks, computes hashes
  – Contacts the Metaserver
• Metaserver checks if the hashes are known
• If not, Metaserver returns that it "needs blocks" (nb)
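A sketch of the client side of the commit step above: split the file into fixed-size blocks and hash each one, producing the list sent to the Metaserver. The block size and checksum here are illustrative stand-ins (the real client uses SHA-256 and its own block size); no actual Dropbox API is shown.

    /* Sketch: split a file into blocks and hash each block, as a
     * Dropbox-style client would do before a commit. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK_SIZE (4u * 1024 * 1024)   /* illustrative block size */

    static uint64_t toy_hash(const unsigned char *p, size_t n)  /* NOT SHA-256 */
    {
        uint64_t h = 1469598103934665603ull;          /* FNV-1a */
        for (size_t i = 0; i < n; i++) { h ^= p[i]; h *= 1099511628211ull; }
        return h;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
        FILE *f = fopen(argv[1], "rb");
        if (!f) { perror("fopen"); return 1; }

        unsigned char *buf = malloc(BLOCK_SIZE);
        size_t n;
        unsigned block = 0;
        while ((n = fread(buf, 1, BLOCK_SIZE, f)) > 0) {
            /* In the real protocol this hash list goes to the Metaserver,
             * which answers "needs blocks" for any hash it has not seen. */
            printf("block %u: %zu bytes, hash %016llx\n",
                   block++, n, (unsigned long long)toy_hash(buf, n));
        }
        free(buf);
        fclose(f);
        return 0;
    }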

Dropbox Upload (2 of 3)

• Client talks to the Blockserver to add the needed blocks
• Limit bytes/request (typically 8 MB), so there may be multiple requests

Dropbox Upload (3 of 3)

• Client commits again
  – Contacts the Metaserver with the same request
• This time, ok

Dropbox Download (1 of 2)

• Client periodically polls the Metaserver
  – Lists the files it "knows about"
• Metaserver returns information on new files

Dropbox Download (2 of 2)

• Client checks if the blocks exist locally
  – For a new file, this fails
• Retrieve blocks
• Limit bytes/request (typically 8 MB), so there may be multiple requests
• When done, reconstruct files and add them to the local file system
  – Using local filesystem system calls (e.g., open(), write(), …)
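A sketch of the reconstruction step above: blocks are fetched (from the Blockserver, or a LAN peer as described next) and written back with ordinary open()/write() calls, as the slide says. get_block() is a stand-in that fabricates block data; the file name is illustrative.

    /* Sketch: reassemble a downloaded file from its blocks using plain
     * local file-system calls. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static ssize_t get_block(unsigned idx, char *buf, size_t max)
    {
        /* stand-in: pretend each block is 16 bytes; 3 blocks total */
        if (idx >= 3 || max < 16) return 0;
        memset(buf, 'A' + idx, 16);
        return 16;
    }

    int main(void)
    {
        int fd = open("reassembled.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        ssize_t n;
        for (unsigned idx = 0; (n = get_block(idx, buf, sizeof buf)) > 0; idx++)
            write(fd, buf, (size_t)n);    /* append block idx in order */

        close(fd);                        /* file now appears in the Sync folder */
        return 0;
    }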


Dropbox Misc – Streaming Sync

• Normally, cannot download to another client until the upload is complete
  – For large files, that takes time
• Instead, enable the client to start downloading when some blocks arrive, before the commit
  – Streaming Sync

Dropbox Misc – LAN Sync

• LAN Sync – download from other clients on the LAN
• Periodically broadcast "sync" on the LAN (via UDP)
• Response is used to get a TCP connection to other clients
• Pull blocks over HTTP

Dropbox Architecture – v1 through v4

• (Figures only: evolution of the Dropbox back-end architecture across versions 1-4.)


Dropbox Architecture – v5

• (Figure only: version 5 of the Dropbox back-end architecture.)

Bit Bucket (extra slides)

File System Functions

• Abstraction of file system functions that apply to distributed file systems
  – Directory module: relates file names to file IDs
  – File module: relates file IDs to particular files
  – Access control module: checks permission for the operation requested
  – File access module: reads or writes file data or attributes
  – Block module: accesses and allocates disk blocks
  – Device module: disk I/O and buffering
• Most use a set of functions derived from Unix (next slide)

UNIX File System Operations

• filedes = open(name, mode) – opens an existing file with the given name
• filedes = creat(name, mode) – creates a new file with the given name
  – Both operations deliver a file descriptor referencing the open file; the mode is read, write or both
• status = close(filedes) – closes the open file filedes
• count = read(filedes, buffer, n) – transfers n bytes from the file referenced by filedes into buffer
• count = write(filedes, buffer, n) – transfers n bytes into the file referenced by filedes from buffer
  – Both operations deliver the number of bytes actually transferred and advance the read-write pointer
• pos = lseek(filedes, offset, whence) – moves the read-write pointer to offset (relative or absolute, depending on whence)
• status = unlink(name) – removes the file name from the directory structure; if the file has no other names, it is deleted
• status = link(name1, name2) – adds a new name (name2) for the file (name1)
• status = stat(name, buffer) – gets the file attributes for file name into buffer
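A short program exercising the calls in the table above (standard POSIX, nothing distributed): create, write, seek, read, stat, link and unlink. File names are illustrative.

    /* Demo of the calls in the table: creat/open, write, lseek, read,
     * stat, close, link, unlink. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = creat("demo.txt", 0644);            /* create a new file      */
        write(fd, "hello, dfs\n", 11);               /* write 11 bytes         */
        close(fd);

        fd = open("demo.txt", O_RDONLY);             /* open it again          */
        lseek(fd, 7, SEEK_SET);                      /* move read-write ptr    */
        char buf[16] = {0};
        read(fd, buf, 3);                            /* reads "dfs"            */
        close(fd);

        struct stat st;
        stat("demo.txt", &st);                       /* file attributes        */
        printf("read \"%s\", size %lld bytes\n", buf, (long long)st.st_size);

        link("demo.txt", "demo2.txt");               /* second name, same file */
        unlink("demo.txt");                          /* remove first name      */
        unlink("demo2.txt");                         /* file now deleted       */
        return 0;
    }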

File Service Architecture

• (Figure: client computer runs application programs and the client module; server computer runs the directory service and the flat file service.)
• Flat file service
  – Implements operations on the files
  – Manages unique file identifiers (UFIDs) – create, delete
• Directory service
  – Mapping between text names and UFIDs
  – Create, delete new directories and entries
  – Ideally, hierarchies, as in Unix/Windows
• Client module
  – Integrates flat file and directory services under a single API
  – Makes them available to all
