Multi-Process Systems: • Information Marshalling and File System Unmarshalling • Block Management • Directory Implementation

Operating Systems and Distributed Systems

What we have discussed before
• The file abstraction
• File manager and file descriptors

What we have discussed before
  Fid = open("MyInput.txt", O_RDONLY);
  numRead = read(Fid, buf, BUF_LEN);

The long-term information storage problem
• Must store large amounts of data
  – Gigabytes -> terabytes -> petabytes
• Stored information must survive the termination of the process using it
  – Lifetime can be seconds to years
  – Must have some way of finding it!
• Multiple processes must be able to access the information concurrently

What we will learn
• Files
• Directories & naming
• File system implementation
• Example file systems

What is a file system?
• An organization of data and metadata on a storage device
• Another way to think about a file system is as a protocol.
  – Just as network protocols (such as IP) give meaning to the streams of data traversing the Internet, file systems give meaning to the data on a particular storage medium.
• There are many types of file systems and media.
  – With all of this variation, the file system interface is implemented as a layered architecture, separating
    • the user interface layer
    • the file system implementation
    • the drivers that manipulate the storage devices

The Linux file system
• The Linux file system architecture is an interesting example of abstracting complexity.
• Using a common set of API functions, a large variety of file systems can be supported on a large variety of storage devices.
• Example: the read function call, which allows some number of bytes to be read from a given file descriptor. The caller is
  – unaware of file system types, such as ext3 or NFS.
  – unaware of the particular storage medium upon which the file system is mounted:
    • AT Attachment Packet Interface (ATAPI) disk,
    • Serial-Attached SCSI (SAS) disk,
    • Serial Advanced Technology Attachment (SATA) disk.
  – When the read function is called for an open file, the data is returned as expected.

High-level architecture
• User space contains the applications (for this example, the user of the file system) and the GNU C Library (glibc), which provides the user interface for the file system calls (open, read, write, close).
  – The system call interface (SCI) acts as a switch, funneling system calls from user space to the appropriate endpoints in kernel space.
• The VFS is the primary interface to the underlying file systems:
  – It exports a set of interfaces and then abstracts them to the individual file systems, which may behave very differently from one another.
• Two caches exist for file system objects (inodes and dentries), which are defined later. Each provides a pool of recently-used file system objects.
The external view: Naming files
• Important to be able to find files after they’re created
• Every file has at least one name
• Name can be
  – Human-accessible: “foo.c”, “my photo”, “Go Slugs!”
  – Machine-usable: 1138, 8675309
• Case may or may not matter
  – Depends on the file system
• Name may include information about the file’s contents
  – Certainly does for the user (the name should make it easy to figure out what’s in it!)
  – Computer may use part of the name to determine the file type

Typical file extensions
  Extension    Meaning
  File.txt     Text file
  File.c       C source program
  File.jpg     JPEG-encoded picture
  File.tgz     Compressed tar file
  File.html    HTML file
  File~        Emacs-created backup
  File.tex     Input for TeX
  File.mp3     Audio encoded in MPEG layer 3
  File.pdf     Adobe Portable Document Format
  File.ps      Postscript file

File structures
[Figure: three ways to structure a file: a sequence of bytes, a sequence of records, and a tree]

Internal file structure
[Figure: an executable file (header with magic number, text size, data size, BSS size, symbol table size, entry point and flags, followed by text, data, relocation bits and symbol table) and an archive file (a sequence of object modules, each preceded by a header with module name, date, owner, protection and size)]

Accessing a file
• Sequential access
  – Read all bytes/records from the beginning
  – Cannot jump around
    • May rewind or back up, however
  – Convenient when medium was magnetic tape
  – Often useful when whole file is needed
• Random access
  – Bytes (or records) read in any order
  – Essential for database systems
  – Read can be …
    • Move file marker (seek), then read or …
    • Read and then move file marker

File attributes (possibilities)
  Attribute              Meaning
  Protection             Who can access the file and how
  Password               Password needed to access the file
  Creator                ID of the person who created the file
  Owner                  Current owner of the file
  Read-only flag         Indicates whether file may be modified
  Hidden flag            Indicates whether file should be displayed in listings
  System flag            Is this an OS file?
  Archived flag          Has this file been backed up?
  ASCII/binary flag      Is this file ASCII or binary?
  Random access flag     Is random access allowed on this file?
  Temporary flag         Is this a temporary file?
  Lock flags             Used for locking a file to get exclusive access
  Record length          Number of bytes in a record
  Key position / length  Location & length of the key
  Creation time          When was the file created?
  Last access time       When was the file last accessed?
  Last modified time     When was the file last modified?
  Current size           Number of bytes in the file
  Maximum size           Number of bytes the file may grow to

File operations
• Create: make a new file
• Delete: remove an existing file
• Open: prepare a file to be accessed
• Close: indicate that a file is no longer being accessed
• Read: get data from a file
• Write: put data to a file
• Append: like write, but only at the end of the file
• Seek: move the “current” pointer elsewhere in the file
• Get attributes: retrieve attribute information
• Set attributes: modify attribute information
• Rename: change a file’s name

Directories
• Naming is nice, but limited
• Humans like to group things together for convenience
• File systems allow this to be done with directories (sometimes called folders)
• Grouping makes it easier to
  – Find files in the first place: remember the enclosing directories for the file
  – Locate related files (or just determine which files are related)

What’s in a directory?
• Two types of information
  – File names
  – File metadata (size, timestamps, etc.)
• Basic choices for directory information
  – Store all information in directory
    • Fixed size entries
    • Disk addresses and attributes in directory entry
  – Store names & pointers to index nodes (i-nodes)
[Figure: a directory with entries games, mail, news, research, shown two ways: storing all information (the attributes) in the directory, and using pointers to index nodes that hold the attributes]

Directory structure
• Structure
  – Linear list of files (often itself stored in a file)
    • Simple to program
    • Slow to run
    • Increase speed by keeping it sorted (insertions are slower!)
  – Hash table: name hashed and looked up in file
    • Decreases search time: no linear searches!
    • May be difficult to expand
    • Can result in collisions (two files hash to same location)
  – Tree
    • Fast for searching
    • Easy to expand
    • Difficult to do in on-disk directory
• Name length
  – Fixed: easy to program
  – Variable: more flexible, better for users

Linux directories
• Linux supports
  – Plain text directories
  – Hashed directories
  – Plain text more common, but slower
• Directories just like regular files
• Name length limited to 255 bytes
[Figure: two directory entries]
  Inode=3     Rec_len=16   Name_len=5   File_type=2   Users\0\0\0
  Inode=251   Rec_len=12   Name_len=2   File_type=2   Me\0\0

Hierarchical directory system
[Figure: a root directory containing user directories A, B and C, each with its own files and subdirectories (names shown include Papers, Photos, Family, os.tex, foo.tex, foo.ps, sunset, kids, Mom)]

Unix directory tree
[Figure: a root directory containing bin, etc, lib, usr and tmp; names shown further down include elm, foo.c, .cshrc and todo.txt; example path /usr/elm/foo.c]

Sharing files
[Figure: a root directory with user directories A, B and C and their files (Papers, Photos, os.tex, sunset, Family, lake, kids); the entry that would let one user reach another user's file is marked “?”]
Solution: use links
• A creates a file, and inserts into her directory
• B shares the file by creating a link to it
• A unlinks the file
  – B still links to the file
  – Owner is still A (unless B explicitly changes it)
[Figure: three stages. A's directory holds a.tex (Owner: A, Count: 1); B adds b.tex as a link to the same file (Owner: A, Count: 2); A unlinks a.tex and only b.tex remains (Owner: A, Count: 1)]

Operations on directories
• Create: make a new directory
• Delete: remove a directory (usually must be empty)
• Opendir: open a directory to allow searching it
• Closedir: close a directory (done searching)
• Readdir: read a directory entry
• Rename: change the name of a directory
  – Similar to renaming a file
• Link: create a new entry in a directory to link to an existing file
• Unlink: remove an entry in a directory
  – Remove the file if this is the last link to this file

High-level architecture
• Each individual file system implementation, such as ext2,

Virtual file system layer (Linux)
• The VFS acts as the root level of the file-system interface.
