File Systems: Semantics & Structure What Is a File
Total Page:16
File Type:pdf, Size:1020Kb
5/15/2017 File Systems: Semantics & Structure What is a File 11A. File Semantics • a file is a named collection of information 11B. Namespace Semantics • primary roles of file system: 11C. File Representation – to store and retrieve data – to manage the media/space where data is stored 11D. Free Space Representation • typical operations: 11E. Namespace Representation – where is the first block of this file 11L. Disk Partitioning – where is the next block of this file 11F. File System Integration – where is block 35 of this file – allocate a new block to the end of this file – free all blocks associated with this file File Systems Semantics and Structure 1 File Systems Semantics and Structure 2 Data and Metadata Sequential Byte Stream Access • File systems deal with two kinds of information int infd = open(“abc”, O_RDONLY); int outfd = open(“xyz”, O_WRONLY+O_CREATE, 0666); • Data – the contents of the file if (infd >= 0 && outfd >= 0) { – e.g. instructions of the program, words in the letter int count = read(infd, buf, sizeof buf); Metadata – Information about the file • while( count > 0 ) { e.g. how many bytes are there, when was it created – write(outfd, buf, count); sometimes called attributes – count = read(infd, inbuf, BUFSIZE); • both must be persisted and protected } – stored and connected by the file system close(infd); close(outfd); } File Systems Semantics and Structure 3 File Systems Semantics and Structure 4 Random Access Consistency Model void *readSection(int fd, struct hdr *index, int section) { struct hdr *head = &hdr[section]; • When do new readers see results of a write? off_t offset = head->section_offset; – read-after-write size_t len = head->section_length; • as soon as possible, data-base semantics void *buf = malloc(len); if (buf != NULL) { • this commonly called “POSIX consistency” lseek(fd, offset, SEEK_SET); – read-after-close (or sync/commit) if ( read(fd, buf, len) <= 0) { free(buf); • only after writes are committed to storage buf = NULL; – open-after-close (or sync/commit) } • each open sees a consistent snapshot } return(buf); – explicitly versioned files } • each open sees a named, consistent snapshot File Systems Semantics and Structure 5 File Systems Semantics and Structure 6 1 5/15/2017 File Attributes – basic properties Extended File Types and Attributes • thus far we have focused on a simple model • extended protection information – a file is a "named collection of data blocks" – e.g. access control lists • in most OS files have more state than this • resource forks – file type (regular file, directory, device, IPC port, ...) – e.g. configuration data, fonts, related objects – file length (may be excess space at end of last block)) • application defined types – ownership and protection information – e.g. load modules, HTML, e-mail, MPEG, ... – system attributes (e.g. hidden, archive) – creation time, modification time, last accessed time • application defined properties • typically stored in file descriptor structure – e.g. compression scheme, encryption algorithm, ... File Systems Semantics and Structure 7 File Systems Semantics and Structure 8 Databases Object Stores simplified file systems, cloud storage • a tool managing business critical data • – optimized for large but infrequent transfers • table is equivalent of a file system • bucket is equivalent of a file system • data organized in rows and columns – a bucket contains named, versioned objects – row indexed by unique key • objects have long names in a flat name space – columns are named fields within each row – object names are unique within a bucket • support a rich set of operations • an object is a blob of immutable bytes – multi-object, read/modify/write transactions – get … all or part of the object – SQL searches return consistent snapshots – put … new version, there is no append/update – insert/delete row/column operations – delete File Systems Semantics and Structure 9 File Systems Semantics and Structure 10 Key-Value Stores File Names and Name Binding • smaller and faster than an SQL database • file system knows files by internal descriptors – optimized for frequent small transfers • users know files by names • table is equivalent of a file system – names more easily remembered than disk addresses – a table is a collection of key/value pairs – names can be structured to organize millions of files • keys have long names in a flat name space • file system responsible for name-to-file mapping – associating names with new files – key names are unique within a table – changing names associated with existing files value is a (typically 64-64MB) string • – allowing users to search the name space – get/put (entire value) • there are many ways to structure a name space – delete File Systems Semantics and Structure 11 File Systems Semantics and Structure 12 2 5/15/2017 What is in a Name? Flat Name Spaces directory • there is one naming context per file system /home /mark /TODO. txt – all file names must be unique within that context suffix separator base name • all files have exactly one true name – these names are probably very long • suffixes and file types • file names may have some structure – file-to-application binding often based on suffix e.g. CAC101.CS111.SECTION1.SLIDES.LECTURE_13 • defined by system configuration registry – • configured per user, or per directory – this structure may be used to optimize searches – suffix may define the file type (e.g. Windows) – the structure is very useful to users – suffix may only be a hint (magic # defines type) – the structure has no meaning to the file system File Systems Semantics and Structure 13 File Systems Semantics and Structure 14 Hierarchical Namespaces A rooted directory tree • directory root – a file containing references to other files – it can be used as a naming context user_1 user_2 user_3 • each process has a current working directory • names are interpreted relative to directory file_a dir_a file_b file_c dir_a • nested directories can form a tree (/user_1/file_a) (/user_1/dir_a) (/user_2/file_b) (/user_3/file_c) (/user_3/dir_a) – file name is a path through that tree – directory tree expands from a root node file_a file_b (/user_1/dir_a/file_a) (/user_3/dir_a/file_b) • fully qualified names begin from the root – may actually form a directed graph File Systems Semantics and Structure 15 File Systems Semantics and Structure 16 True Names vs. Path Names Hard Links: example • Some file systems have “true names” root • DOS and ISO9660 have a single “path name” user_1 user_3 – files are described by directory entries – data is referred to by exactly one directory entry dir_a file_c – each file has only one (character string) name file_a ln /user_3/dir_a/file_b /user_1/file_a • Unix (and Linux) … have named links file_b – files are described by I-nodes (w/unique I#) – directories associate names with I-node numbers – many directory entries can refer to same I-node Both names now refer to the same I-node File Systems Semantics and Structure 17 File Systems Semantics and Structure 18 3 5/15/2017 Unix-style Hard Links Symbolic Links: example • all protection information is stored in the file root ln –s /user_3/dir_a/file_b /user_1/file_a – file owner sets file protection (e.g. read-only) user_1 user_3 – all links provide the same access to the file – anyone with read access to file can create new link file_a dir_a file_c – but directories are protected files too • not everyone has read or search access to every directory /user_3/dir_a/file_b file_b • all links are equal – there is nothing special about the owner‘s link – file is not deleted until no links remain to file – reference count keeps track of references /user_1/file_a is now a macro for /user_3/dir_a/file_b File Systems Semantics and Structure 19 File Systems Semantics and Structure 20 Symbolic Links Generalized Directories: Issues • another type of special file • Can there be multiple links to a single file? – an indirect reference to some other file – who can create new references to a file? – contents is a path name to another file – if one reference is deleted, does the file go away? • Operating System recognizes symbolic links – can the file's owner cancel other peoples' access? – automatically opens associated file instead • Can clients work their way back up the tree? – if file is inaccessible or non-existent, the open fails • Is the namespace truly a tree • symbolic link is not a reference to the I-node – or is it an a-cyclic directed graph – symbolic links will not prevent deletion – or an arbitrary directed graph – do not guarantee ability to follow the specified path – Internet URLs are similar to symbolic links • Does namespace span multiple file systems? File Systems Semantics and Structure 21 File Systems Semantics and Structure 22 File System Goals File System Structure • ensure the privacy and integrity of all files • disk volumes are divided into fixed-sized blocks • efficiently implement name-to-file binding – many sizes are used: 512, 1024, 2048, 4096, 8192 ... – find file associated with this name • most of them will store user data – list the file names in this part of the name space • some will store organizing “meta-data” • efficiently manage data associated w/each file – description of the file system (e.g. layout and state) – file control blocks to describe individual files – return data at offset X in file Y – lists of free blocks (not yet allocated to any file) – write data Z at offset X in file Y • all operating systems have such data structures • manage attributes associated w/each file – different OS and FS often have very different goals – what is the length of file Y – these result in very different implementations – change owner/protection of file Y to be X File Systems Semantics and Structure 23 File Systems Semantics and Structure 24 4 5/15/2017 Unix System 5 – Volume Structure File Descriptor Structures block 0 boot block • all file systems have file descriptor structures super block size and number of I-nodes are block 1 • contain all info about file UNIX I-node block specified in super block – type (e.g.