File Management What Is a File?

CPSC 410 / 611 : Operating Systems File Management File Management • What is a file? • Elements of file management • File organization • Directories • File allocation What is a File? A file is a collection of data elements, grouped together for purpose of access control, retrieval, and modification Persistence: Often, files are mapped onto physical storage devices, usually nonvolatile. Some modern systems define a file simply as a sequence, or stream of data units. A file system is the software responsible for – creating, destroying, reading, writing, modifying, moving files – controlling access to files – management of resources used by files. 1 CPSC 410 / 611 : Operating Systems File Management The Logical View of File Management user • directory management file structure • access control • access method records • blocking physical blocks in memory physical blocks on disk • disk scheduling • file allocation File Management • What is a file? • Elements of file management • File organization • Directories user file structure • directory management • File allocation • access control • UNIX file system • access method records physical blocks in memory • blocking physical blocks on disk • disk scheduling • file allocation 2 CPSC 410 / 611 : Operating Systems File Management Logical Organization of a File • A file is perceived as an ordered collection of records, R0, R1, ..., Rn . • A record is a contiguous block of information transferred during a logical read/write operation. • Records can be of fixed or variable length. • Organizations: – Pile – Sequential File – Indexed Sequential File – Indexed File – Direct/Hashed File Pile • Variable-length records • Chronological order • Random access to record by search of whole file. • What about modifying records? Pile File 3 CPSC 410 / 611 : Operating Systems File Management Sequential File • Fixed-format records key field • Records often stored in order of key field. • Good for applications that process all records. • No adequate support for random access. • Q: What about adding new record? • A: Separate pile file keeps Sequential File log file or transaction file. Indexed Sequential File • Similar to sequential file, with two additions. – Index to file supports index random access. – Overflow file indexed main file from main file. • Record is added by appending it to overflow file and providing link from predecessor. overflow file Indexed Sequential File 4 CPSC 410 / 611 : Operating Systems File Management Indexed File • Variable-length records index • Multiple Indices index partial index • Exhaustive index vs. partial index File Representation to User (Unix) UNIX File Descriptors: file descriptor system file in-memory table table inode table [0] int myfd; [1] [2] myfd = open(“myfile.txt”, O_RDONLY); [3] [4] file descriptor file structure table [0] myfd [1] 3 [2] [3] user space kernel space [4] 3 ISO C File Pointers: FILE *myfp; myfp myfp = fopen(“myfile.txt”, “w”); user space kernel space 5 CPSC 410 / 611 : Operating Systems File Management File Descriptors and fork() • With fork(), child inherits content of parent’s address parent’s file desc table A(SFT) ’ [0] space, including most of parent s B(SFT) [1] state: C(SFT) [2] D(SFT) – scheduling parameters [3] system file table (SFT) – file descriptor table [4] [5] – signal state A B – environment – etc. C child’s file desc table D (“myf.txt”) A(SFT) [0] B(SFT) [1] C(SFT) [2] D(SFT) [3] [4] [5] File Descriptors and fork() (II) parent’s file desc table A(SFT) [0] B(SFT) int main(void) { [1] ‘ ’ C(SFT) char c = ! ; [2] D(SFT) int myfd; [3] system file table (SFT) [4] myfd = open(‘myf.txt’, O_RDONLY); [5] A B fork(); C read(myfd, &c, 1); child’s file desc table D (“myf.txt”) A(SFT) [0] ‘ ’ B(SFT) printf( Process %ld got %c\n , [1] C(SFT) (long)getpid(), c); [2] D(SFT) [3] return 0; [4] } [5] 6 CPSC 410 / 611 : Operating Systems File Management File Descriptors and fork() (III) parent’s file desc table A(SFT) [0] int main(void) { B(SFT) [1] system file table (SFT) ‘ ’ C(SFT) char c = ! ; [2] int myfd; D(SFT) [3] A [4] B fork(); [5] C ‘ ’ myfd = open( myf.txt , O_RDONLY); D (“myf.txt”) read(myfd, &c, 1); child’s file desc table A(SFT) E (“myf.txt”) [0] printf(‘Process %ld got %c\n’, B(SFT) [1] C(SFT) (long)getpid(), c); [2] E(SFT) [3] return 0; [4] } [5] Duplicating File Descriptors: dup2() • Want to redirect I/O from well-known file descriptor to descriptor associated with some other file? – e.g. stdout to file? Errors: EBADF: fildes or fildes2 is not valid #include <unistd.h> EINTR: dup2 interrupted by signal int dup2(int fildes, int fildes2); Example: redirect standard output to file. int main(void) { int fd = open(‘my.file’, <some_flags>, <some_mode>); dup2(fd, STDOUT_FILENO); close(fd); write(STDOUT_FILENO, ‘OK’, 2); } 7 CPSC 410 / 611 : Operating Systems File Management Duplicating File Descriptors: dup2() (II) • Want to redirect I/O from well-known file descriptor to descriptor associated with some other file? – e.g. stdout to file? Errors: EBADF: fildes or fildes2 is not valid #include <unistd.h> EINTR: dup2 interrupted by signal int dup2(int fildes, int fildes2); after open after dup2 after close file descriptor table file descriptor table file descriptor table [0] standard input [0] standard input [0] standard input [1] standard output [1] write to file.txt [1] write to file.txt [2] standard error [2] standard error [2] standard error [3] write to file.txt [3] write to file.txt File Management • What is a file? • Elements of file management • File organization • Directories user file structure • directory management • File allocation • access control • UNIX file system records • access method physical blocks in memory • blocking physical blocks on disk • disk scheduling • file allocation 8 CPSC 410 / 611 : Operating Systems File Management Allocation Methods • File systems manage disk resources • Must allocate space so that – space on disk utilized effectively – file can be accessed quickly • Typical allocation methods: – contiguous – linked – indexed • Suitability of particular method depends on – storage device technology – access/usage patterns Contiguous Allocation Logical file mapped onto a sequence of adjacent 0 1 2 3 physical blocks. 4 5 6 7 Pros: • minimizes head movements 8 9 10 11 • simplicity of both sequential and direct access. • Particularly applicable to applications where 12 13 14 15 entire files are scanned. 16 17 18 19 Cons: 20 21 22 23 • Inserting/Deleting records, or changing length of records difficult. 24 25 26 27 • Size of file must be known a priori. (Solution: copy file to larger hole if exceeds allocated file start length size.) file1 0 5 • External fragmentation file2 10 2 • Pre-allocation causes internal fragmentation file3 16 10 9 CPSC 410 / 611 : Operating Systems File Management Linked Allocation • Scatter logical blocks throughout secondary 0 1 2 3 storage. • Link each block to next one by forward 4 5 6 7 pointer. • May need a backward pointer for backspacing. 8 9 10 11 Pros: 12 13 14 15 • blocks can be easily inserted or deleted 16 17 18 19 • no upper limit on file size necessary a priori • size of individual records can easily change 20 21 22 23 over time. 24 25 26 27 Cons: file start end • direct access difficult and expensive file 1 9 23 • overhead required for pointers in blocks … … … • reliability … … … Variations of Linked Allocation Maintain all pointers as a separate linked list, preferably in main memory. 0 16 0 1 2 3 9 0 10 23 4 5 6 7 file start end 16 24 file1 9 23 8 9 10 11 ... ... ... 23 -1 12 13 14 15 ... ... ... 24 26 26 10 16 17 18 19 20 21 22 23 Example: File-Allocation Tables (FAT) 24 25 26 27 10 CPSC 410 / 611 : Operating Systems File Management Indexed Allocation Keep all pointers to blocks in one location: index 0 1 2 3 block (one index block per file) 4 5 6 7 9 0 16 24 26 10 23 -1 -1 -1 8 9 10 11 • Pros: 12 13 14 15 – supports direct access – no external fragmentation 16 17 18 19 – therefore: combines best of continuous and linked allocation. 20 21 22 23 • Cons: 24 25 26 27 – internal fragmentation in index blocks file index block file1... ...7 • Trade-off: ... ... – what is a good size for index block? ... ... – fragmentation vs. file length Solutions for the Index-Block-Size Dilemma Linked index blocks: Multilevel index scheme: 11 CPSC 410 / 611 : Operating Systems File Management Index Block Scheme in UNIX 0 direct direct 9 single 10 indirect double 11 indirect triple 12 indirect UNIX (System V) Allocation Scheme Example: block size: 1kB access byte offset 9000 access byte offset 350000 808 367 8 367 816 331 3333 0 75 11 9156 331 3333 9156 12 CPSC 410 / 611 : Operating Systems File Management Free Space Management (conceptual) • Must keep track where unused blocks are. • Can keep information for free space management in unused blocks. • Bit vector: used free used used used free used free free ... # #1 #2 #3 #4 #5 #6 #7 #8 0 • Linked list: Each free block contains pointer to next free block. • Variations: • Grouping: Each block has more than on pointer to empty blocks. • Counting: Keep pointer of first free block and number of contiguous free blocks following it. Free-Space Management in System-V FS Management of free disk blocks: linked index to free blocks. 0 98 99 superblock data blocks inodes Management of free i-nodes: superblock 0 boot block 1 0 file system layout 0 1 cache in marked i-nodes in i-list superblock 13 CPSC 410 / 611 : Operating Systems File Management File Management • What is a file? • Elements of file management • File organization • Directories user file structure • directory management • File allocation • access control • UNIX file system records • access method physical blocks in memory • blocking physical blocks on disk • disk scheduling • file allocation Directories • Large amounts of data: Partition and structure for easier access.

File Management What Is a File?

DASH: Database Shadowing for Mobile DBMS

File Formats

Redbooks Paper Linux on IBM Zseries and S/390

Cygwin User's Guide

11.7 the Windows 2000 File System

CS 152: Computer Systems Architecture Storage Technologies

AMD Alchemy™ Processors Building a Root File System for Linux® Incorporating Memory Technology Devices

Files and Processes (Review)

The Application of File Identification, Validation, and Characterization Tools in Digital Curation

Secure Untrusted Data Repository (SUNDR)

Chapter 10: File System

Recursive Updates in Copy-On-Write File Systems - Modeling and Analysis