Outline for Today’s Lecture

Administrative:
  – TAs expect to be in Teer tonight
  – Demo signups
Objective:
  – Naming issues continued
  – Distributed file systems (naming)
  – Down a level: files themselves

Soft vs. Hard Links

What’s the difference in behavior?
[Figure: directory tree rooted at / with the names Terry, Lynn, and Jamie]

Unix File Naming (Hard Links)

• A Unix file may have multiple names.
• Each directory entry naming the file is called a hard link.
• Each inode contains a reference count showing how many hard links name it.

[Figure: directory A (wind: 18, hail: 48) and directory B (rain: 32, sleet: 48); both hail and sleet name inode 48, whose inode link count = 2]

link system call
  link (existing name, new name)
  – create a new name for an existing file
  – increment inode link count

unlink system call (“remove”)
  unlink (name)
  – destroy directory entry
  – decrement inode link count
  – if count = 0 and file is not in active use, free blocks (recursively) and on-disk inode

Symbolic (Soft) Links

• Unix files may also be named by symbolic (soft) links.
  – A soft link is a file containing a pathname of some other file.

symlink system call
  symlink (existing name, new name)
  – allocate a new file (inode) with type symlink
  – initialize file contents with existing name
  – create directory entry for new file with new name

[Figure: directory A (wind: 18, hail: 48) and directory B (rain: 32, sleet: 67); inode 48 has inode link count = 1; inode 67 is the symlink and contains ../A/hail]

• The target of the link may be removed at any time, leaving a dangling reference.
• How should the kernel handle recursive soft links?
• Convenience, but not performance!
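The link, unlink, and symlink steps listed above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class and method names are made up for the example, a single flat table stands in for all directories, the "file is not in active use" check is omitted, and the hop limit on soft links is an assumed value showing one possible answer to the recursive-link question.

```python
class TooManyLinks(Exception):
    """Raised when soft links are nested deeper than the assumed limit."""


class Inode:
    def __init__(self, kind, data=""):
        self.kind = kind    # "file" or "symlink"
        self.data = data    # file contents, or the target pathname for a symlink
        self.nlink = 0      # number of directory entries (hard links) naming this inode


class FileSystem:
    MAX_SYMLINK_HOPS = 8    # assumed bound on chained soft links

    def __init__(self):
        self.inodes = {}    # inode number -> Inode
        self.names = {}     # pathname -> inode number (one flat "directory")
        self.next_ino = 1

    def _alloc(self, kind, data):
        ino = self.next_ino
        self.next_ino += 1
        self.inodes[ino] = Inode(kind, data)
        return ino

    def _add_entry(self, name, ino):
        self.names[name] = ino
        self.inodes[ino].nlink += 1

    def create(self, name, data=""):
        self._add_entry(name, self._alloc("file", data))

    def link(self, existing, new):
        # New directory entry for the same inode; the link count goes up.
        self._add_entry(new, self.names[existing])

    def symlink(self, existing, new):
        # New inode of type symlink whose contents are the existing name.
        self._add_entry(new, self._alloc("symlink", existing))

    def unlink(self, name):
        # Destroy the directory entry; free the inode when the count reaches 0.
        ino = self.names.pop(name)
        self.inodes[ino].nlink -= 1
        if self.inodes[ino].nlink == 0:
            del self.inodes[ino]

    def read(self, name):
        for _ in range(self.MAX_SYMLINK_HOPS):
            ino = self.names.get(name)
            if ino is None:
                raise FileNotFoundError(name)   # also the dangling soft link case
            inode = self.inodes[ino]
            if inode.kind != "symlink":
                return inode.data
            name = inode.data                   # follow the soft link
        raise TooManyLinks(name)
```

In this model, removing one hard link leaves the file reachable through the other name, while a soft link to the removed name dangles.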

Soft vs. Hard Links

What’s the difference in behavior?
[Figure: the same tree (/, Terry, Lynn, Jamie), repeated on two slides, each with an X drawn on it]

Naming

• \\His\d\pictures\castle.jpg
  – Not location transparent: both machine and drive embedded in name.
• NFS mounting
  – Remote directory mounted over local directory in local naming hierarchy.
  – /usr/m_pt/A
  – No global view

[Figure: her local directory tree on the client (usr, m_pt) and his directory tree on the server (for_export with A and B), connected by the network; after the mount, her local tree shows A and B under usr/m_pt]

Distributed File Systems

• Naming
  – Location transparency/independence
• Caching
  – Consistency
• Replication
  – Availability and updates

Global Name Space

Example:
[Figure: tree rooted at / with afs, tmp, bin, lib; tmp, bin, lib are local files; afs holds shared files and looks identical to all clients]

VFS: the Filesystem Switch

Sun Microsystems introduced the VFS framework in 1985 to accommodate the Network File System cleanly.
• VFS allows diverse specific file systems to coexist in a file tree, isolating all FS-dependencies in pluggable filesystem modules.

VFS was an internal kernel restructuring with no effect on the syscall interface.

Incorporates object-oriented concepts: a generic procedural interface with multiple implementations.

Other abstract interfaces in the kernel: device drivers, file objects, executable files, memory objects.

[Figure: user space; syscall layer (file, uio, etc.); Virtual File System (VFS); NFS, FFS, LFS, *FS, etc.; network protocol stack (TCP/IP); device drivers]

Vnodes

In the VFS framework, every file or directory in active use is represented by a vnode object in kernel memory.

• Each vnode has a standard file attributes struct.
• Generic vnode points at filesystem-specific struct (e.g., inode, rnode), seen only by the filesystem.
• Active vnodes are reference-counted by the structures that hold pointers to them, e.g., the system file table.
• Vnode operations are macros that vector to filesystem-specific procedures.
• Each specific file system maintains a hash of its resident vnodes.

[Figure: syscall layer above the vnodes of NFS and UFS, with a list of free vnodes]

Example (NFS)

[Figure: client with user programs, syscall layer, VFS, UFS, and NFS client; server with syscall layer, VFS, NFS server, and UFS; client and server connected by the network]

Vnode Operations and Attributes

vnode/file attributes (vattr or fattr)
  type (VREG, VDIR, VLNK, etc.)
  mode (9+ bits of permissions)
  nlink (hard link count)
  owner user ID
  owner group ID
  filesystem ID
  unique file ID
  size (bytes and blocks)
  access time
  modify time
  generation number

generic operations
  vop_getattr (vattr)
  vop_setattr (vattr)
  vhold()
  vholdrele()

directories only
  vop_lookup (OUT vpp, name)
  vop_create (OUT vpp, name, vattr)
  vop_remove (vp, name)
  vop_link (vp, name)
  vop_rename (vp, name, tdvp, tvp, name)
  vop_mkdir (OUT vpp, name, vattr)
  vop_rmdir (vp, name)
  vop_readdir (uio, cookie)
  vop_symlink (OUT vpp, name, vattr, contents)
  vop_readlink (uio)

files only
  vop_getpages (page**, count, offset)
  vop_putpages (page**, count, sync, offset)
  vop_fsync ()

Pathname Traversal

• When a pathname is passed as an argument to a system call, the syscall layer must “convert it to a vnode”.
• Pathname traversal is a sequence of vop_lookup calls to descend the tree to the named file or directory.

  open("/tmp/zot")
    vp = get vnode for / (rootdir)
    vp->vop_lookup(&cvp, "tmp");
    vp = cvp;
    vp->vop_lookup(&cvp, "zot");

Issues:
  1. crossing mount points
  2. obtaining root vnode (or current dir)
  3. finding resident vnodes in memory
  4. caching name->vnode translations
  5. symbolic (soft) links
  6. disk implementation of directories
  7. locking/referencing to handle races with name create and delete operations
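The traversal of /tmp/zot above can be sketched as a loop of lookup calls that vector through the vnode to a filesystem-specific procedure. This is a toy model, not kernel code: Vnode, MemFSOps, and path_to_vnode are names invented for the example, lookup returns the child instead of filling an OUT parameter, and none of the listed issues (mount points, symlinks, caching, locking) are handled.

```python
class Vnode:
    """Generic vnode: a type, a pointer to filesystem ops, and private data."""

    def __init__(self, vtype, ops, fs_data):
        self.vtype = vtype      # "VDIR" or "VREG"
        self.ops = ops          # filesystem-specific operations object
        self.fs_data = fs_data  # seen only by the filesystem (stands in for inode/rnode)

    def vop_lookup(self, name):
        # Vector to the filesystem-specific procedure.
        return self.ops.lookup(self, name)


class MemFSOps:
    """Hypothetical in-memory filesystem: a directory is a dict of name -> Vnode."""

    def lookup(self, dvp, name):
        if dvp.vtype != "VDIR":
            raise NotADirectoryError(name)
        try:
            return dvp.fs_data[name]
        except KeyError:
            raise FileNotFoundError(name) from None


def path_to_vnode(root, path):
    """Convert an absolute pathname to a vnode, one vop_lookup per component."""
    vp = root
    for component in path.split("/"):
        if component:   # skip empty strings from leading or doubled slashes
            vp = vp.vop_lookup(component)
    return vp
```

Because path_to_vnode only ever calls vop_lookup on the current vnode, a second filesystem type could be plugged in by supplying another ops object with the same lookup method.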

Hints

• A valuable distributed systems design technique that can be illustrated in naming.
• Definition: information that is not guaranteed to be correct. If it is, it can improve performance. If not, things will still work OK. Must be able to validate information.
• Example: Sprite prefix tables

Prefix Tables

  /A/m_pt1 -> blue
  /A/m_pt1/usr/B -> pink
  /A/m_pt1/usr/m_pt2 -> pink

[Figure: tree / → A → m_pt1 → usr → B and m_pt2, with the pathname /A/m_pt1/usr/m_pt2/stuff.below being looked up]
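A prefix table used as a hint can be sketched as a longest-prefix match followed by a validation step. This is a simplified model and not Sprite's actual protocol: "blue" and "pink" are assumed here to name servers, the Client class and its methods are invented for the example, and a dictionary lookup stands in for both the server-side check and the slow path that finds the real owner.

```python
def longest_prefix(table, path):
    """Return the longest key of `table` that is a whole-component prefix of `path`."""
    matches = [p for p in table if path == p or path.startswith(p + "/")]
    return max(matches, key=len, default=None)


class Client:
    def __init__(self, exports):
        self.exports = exports  # authoritative prefix -> server map (what servers know)
        self.hints = {}         # client prefix table; entries may be missing or stale
        self.slow_lookups = 0   # how often the hint did not help

    def _server_accepts(self, server, path):
        # Validation: the contacted server checks that it really owns the path.
        prefix = longest_prefix(self.exports, path)
        return prefix is not None and self.exports[prefix] == server

    def locate(self, path):
        prefix = longest_prefix(self.hints, path)
        if prefix is not None:
            server = self.hints[prefix]
            if self._server_accepts(server, path):
                return server           # hint was correct: fast path
            del self.hints[prefix]      # hint was wrong: discard it
        # Slow path, standing in for asking around to find the owner.
        self.slow_lookups += 1
        prefix = longest_prefix(self.exports, path)
        if prefix is None:
            raise FileNotFoundError(path)
        self.hints[prefix] = self.exports[prefix]
        return self.hints[prefix]
```

A wrong or missing hint costs one slow lookup but never produces a wrong answer, which is the property the definition of a hint asks for.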

Performance Issue re: Naming

What to do about long paths?
• Make long lookups cheaper: cluster meta-data and data on disk to make each component resolution step somewhat cheaper
  – Immediate files: meta-data and first block of data co-located
• Collapse prefixes of paths: hash table
  – Prefix table
• “Cache it”: in this case, directory info

Meta-Data

• File size
• File type
• Protection: access control information
• History: creation time, last modification, last access.
• Location of file: which device
• Location of individual blocks of the file on disk.
• Owner of file
• Group(s) of users associated with file

Access Control for Files

• Access control lists: detailed list attached to file of users allowed (denied) access, including kind of access allowed/denied.
• UNIX RWX: owner, group, everyone

UNIX access control

• Each file carries its access control with it.

    rwx     rwx     rwx              setuid
    Owner   Group   Everybody else
    UID     GID

  – When the setuid bit is set, it allows the process executing the object to assume the UID of the owner temporarily: enter owner domain (rights amplification)
• Owner has chmod, chgrp rights (granting, revoking)
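The rwx check and the setuid rule can be sketched with a little bit arithmetic on the mode word. The function names and parameters below are hypothetical, and the sketch leaves out superuser override and any ACL extensions; it only shows how one of the three rwx triples is selected and compared with the request.

```python
SETUID_BIT = 0o4000     # conventional position of the setuid bit in the mode word


def may_access(mode, file_uid, file_gid, uid, gids, want):
    """Check `want` (some of "r", "w", "x") against the 9 permission bits."""
    if uid == file_uid:
        shift = 6       # owner triple
    elif file_gid in gids:
        shift = 3       # group triple
    else:
        shift = 0       # everybody else
    granted = (mode >> shift) & 0o7
    needed = sum({"r": 4, "w": 2, "x": 1}[c] for c in set(want))
    return (granted & needed) == needed


def effective_uid_after_exec(mode, file_uid, caller_uid):
    """With setuid set, the process runs with the owner's UID (rights amplification)."""
    return file_uid if mode & SETUID_BIT else caller_uid
```

Note that exactly one triple applies: an owner whose own bits deny access is refused even if the group or everybody-else bits would allow it.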

The Access Model

• Authorization problems can be represented abstractly by an access model.
  – each row represents a subject/principal/domain
  – each column represents an object
  – each cell: accesses permitted for the {subject, object} pair
    • read, write, delete, execute, search, control, or any other method
• In real systems, the access matrix is sparse and dynamic.
• Need a flexible, efficient representation

Access Control Matrix

• Processes execute in a protection domain, initially inherited from subject

[Figure: access matrix with object columns gradefile, hotgossip, solutions, proj1, luvltr and subject rows TA (rw, rw, rx, r), grp (r, rwx), Terry (rw), Lynn (rw, rw)]

Two Representations

• ACL - Access Control Lists
  – Columns of previous matrix
  – Permissions attached to Objects
  – ACL for file hotgossip: Terry, rw; Lynn, rw
• Capabilities
  – Rows of previous matrix
  – Permissions associated with Subject
  – Tickets, Namespace (what it is that one can name)
  – Capabilities held by Lynn: luvltr, rw; hotgossip, rw

Access Control Lists

• Approach: represent the access matrix by storing its columns with the objects.
• Tag each object with an access control list (ACL) of authorized subjects/principals.
• To authorize an access requested by S for O
  – search O’s ACL for an entry matching S
  – compare requested access with permitted access
  – access checks are often made only at bind time
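The two representations are just two ways of slicing the same sparse matrix, which a short sketch makes concrete. Only the three cells the slides state explicitly (the hotgossip ACL and Lynn's capabilities) are used as data; the function names are invented for the example, and rights are modeled as strings of single-letter accesses.

```python
# Sparse access matrix: (subject, object) -> permitted accesses.
matrix = {
    ("Terry", "hotgossip"): "rw",
    ("Lynn", "hotgossip"): "rw",
    ("Lynn", "luvltr"): "rw",
}


def acl(matrix, obj):
    """Column of the matrix: the list stored with the object."""
    return {s: rights for (s, o), rights in matrix.items() if o == obj}


def capabilities(matrix, subject):
    """Row of the matrix: the list held by the subject."""
    return {o: rights for (s, o), rights in matrix.items() if s == subject}


def authorized(acl_for_obj, subject, requested):
    """Search the object's ACL for the subject and compare requested with permitted."""
    permitted = acl_for_obj.get(subject, "")
    return all(r in permitted for r in requested)
```

For example, `acl(matrix, "hotgossip")` yields the entries for Terry and Lynn, while `capabilities(matrix, "Lynn")` yields her rights to hotgossip and luvltr.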

Dynamics of Protection Schemes

• How to endow software modules with appropriate privilege?
  – What mechanism exists to bind principals with subjects?
    • e.g., setuid syscall, setuid bit
  – What principals should a software module bind to?
    • privilege of creator: but may not be sufficient to perform the service
    • privilege of owner or system: dangerous

Capabilities

• Approach: represent the access matrix by storing its rows with the subjects.
• Tag each subject with a list of capabilities for the objects it is permitted to access.
  – A capability is an unforgeable object reference, like a pointer.
  – It endows the holder with permission to operate on the object
    • e.g., permission to invoke specific methods
  – Typically, capabilities may be passed from one subject to another.
    • Rights propagation and confinement problems

Dynamics of Protection Schemes

• How to revoke privileges?
• What about adding new subjects or new objects?
• How to dynamically change the set of objects accessible (or vulnerable) to different processes run by the same user?
  – Need-to-know principle / Principle of minimal privilege
  – How do subjects change identity to execute a more privileged module?
    • protection domain, protection domain switch (enter)

Protection Domains

• Processes execute in a protection domain, initially inherited from subject
• Goal: to be able to change protection domains
• Introduce a level of indirection
• Domains become protected objects with operations defined on them: owner, copy, control

[Figure: access matrix with columns gradefile, hotgossip, Domain0, solutions, proj1, luvltr and rows TA (rw, rwo, rxc, r, ctl), grp (r, rwx, enter), Terry (rw), Lynn (rw, rw), Domain0 (r)]

• If a domain holds the copy flag on a right to some object, then it can transfer that right to the object to another domain.
• If a domain is the owner of some object, it can grant a right to the object, with or without copy, to another domain.
• If a domain is the owner, or has the ctl right to a domain, it can remove a right to an object from that domain.
• Rights propagation.

[Figure: access matrix with columns gradefile, hotgossip, Domain0, solutions, proj1, luvltr and rows TA (rw, rwo, rc, r, ctl), grp (r, rwo), Terry (rc, rw), Lynn (rw, rw, enter), Domain0 (r, r)]

Finally Arrive at File

• What do users seem to want from the file abstraction?
• What do these usage patterns mean for file structure and implementation decisions?
  – What operations should be optimized first?
  – How should files be structured?
  – Is there temporal locality in file usage?
  – How long do files really live?

Generalizations from UNIX Workloads

• Standard disclaimers that you can’t generalize… but anyway…
• Most files are small (fit into one disk block), although most bytes are transferred from longer files.
• Most opens are for read mode; most bytes transferred are by read operations.
• Accesses tend to be sequential and 100% (the whole file is read or written).

More on Access Patterns

• There is significant reuse (re-opens): most opens go to files that are opened repeatedly and in quick succession. Directory nodes and executables also exhibit good temporal locality.
  – Looks good for caching!
• Use of temp files is a significant part of file system activity in UNIX: very limited reuse, short lifetimes (less than a minute).

File Structure Implementation: Mapping File -> Block

• Contiguous
  – 1 block pointer, causes fragmentation, growth is a problem.
• Linked
  – each block points to next block, directory points to first, OK for sequential access
• Indexed
  – index structure required, better for random access into file.

UNIX Inodes

[Figure: an inode holding file attributes and block addresses; some addresses point directly at data blocks, others point at blocks of addresses, with 1, 2, and 3 levels of indirection before the data blocks]

Decoupling meta-data from directory entries
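Why an indexed structure suits random access can be seen by computing where the block for a given byte offset lives. The sketch below assumes an inode with some direct pointers plus single, double, and triple indirect blocks; the block size, address size, and number of direct pointers are illustrative values, not taken from the slides, and locate_block is a made-up name.

```python
BLOCK_SIZE = 4096                   # assumed block size in bytes
ADDRS_PER_BLOCK = BLOCK_SIZE // 4   # assumed 4-byte block addresses
NUM_DIRECT = 10                     # assumed number of direct pointers in the inode


def locate_block(offset):
    """Return (indirection level, index path) for the block holding byte `offset`."""
    n = offset // BLOCK_SIZE        # logical block number within the file
    if n < NUM_DIRECT:
        return 0, (n,)              # direct pointer n in the inode
    n -= NUM_DIRECT
    span = ADDRS_PER_BLOCK          # blocks reachable at the current level
    for level in (1, 2, 3):
        if n < span:
            # Split n into one index per indirect block on the way down.
            path = []
            for _ in range(level):
                path.append(n % ADDRS_PER_BLOCK)
                n //= ADDRS_PER_BLOCK
            return level, tuple(reversed(path))
        n -= span
        span *= ADDRS_PER_BLOCK
    raise ValueError("offset beyond maximum file size")
```

Any block is reached in at most a handful of index steps regardless of its position in the file, whereas a linked layout must walk every preceding block.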

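File Allocation Table (FAT)

[Figure: directory entries Lecture.ppt, Pic.jpg, and Notes.txt each point at a first entry in the table; each table entry names the next block of the file, and each chain ends with eof]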
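Reading a file under FAT means following its chain of table entries from the directory's starting block to the eof marker. In the sketch below the file names come from the slide, but the block numbers, the EOF value, and the function name are made up for illustration.

```python
EOF = -1    # assumed end-of-chain marker

# Directory: file name -> first block. Block numbers are hypothetical.
directory = {"Lecture.ppt": 2, "Pic.jpg": 5, "Notes.txt": 0}

# FAT: one entry per block, giving the next block of the same file.
fat = {2: 4, 4: 7, 7: EOF, 5: 1, 1: EOF, 0: EOF}


def file_blocks(directory, fat, name):
    """Return the blocks of `name` in file order by walking its FAT chain."""
    blocks = []
    seen = set()
    block = directory[name]
    while block != EOF:
        if block in seen:
            raise ValueError("cycle in FAT chain")  # corrupted table
        seen.add(block)
        blocks.append(block)
        block = fat[block]
    return blocks
```

This is the linked scheme with the pointers gathered into one table: reaching the k-th block of a file still takes k steps, but the steps are table lookups instead of reads of the data blocks themselves.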