
Introduction to Operating Systems

File System Interface

John Franco
Electrical Engineering and Computing Systems
University of Cincinnati

Concept of a System

• The implementation of an abstract, logical view of information storage that is independent of the medium
• The entity from which this logical view is achieved is called a file
• Files consist of abstractions of physical properties and are mapped to physical devices
• A file is a collection of related information and is the smallest nameable storage unit available to a user
• The logical view is contiguous but the physical incarnation of the file may not be
• File types:
  – Data
    ∗ numeric
    ∗ binary
    ∗ character
  – Program
  – Free form or rigidly formed

File Structure

• No structure - just a sequence of alphanumeric characters
• Simple record structure -
  – lines
  – fixed length records (tables)
  – variable length records (array of )
• Complex structures -
  – formatted document
• Files can be stored in fixed size disk blocks
  – subject to internal fragmentation
• Certain files must conform to a structure that is required by the OS
  – executable files have a special structure that is used to determine where in memory to load the file and what the location of the first instruction is
• The more structures an OS supports, the more complex the OS is going to be
• New applications may demand new types, putting a further burden on the OS - hence an OS may require only a few types
  – an application must then supply its own code to interpret the contents of a file

File Attributes

• Name: kept in human-readable form, unlike other attributes
• Identifier: unique identity with respect to the file system
• Type: text, video, image, document, latex, etc.
• Location: pointer to the location of the file on the device
• Size: number of bytes
• Protection: permissions for reading, writing, executing
• Created/modified times: usage monitoring

Attributes are maintained in special file system nodes, depending on the file system - in unix OSes they are kept in inodes, in FAT32 they are kept in directory entries
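Most of the attributes listed above can be read programmatically. The following is a minimal sketch, assuming a Unix-like system and Python's standard `os` and `stat` modules; the helper name `describe` and the choice of fields are illustrative, not part of the original notes.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return a few common attributes of the file at `path` as a dict."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),           # human-readable name
        "identifier": st.st_ino,                  # inode number on Unix-like systems
        "size": st.st_size,                       # size in bytes
        "protection": stat.filemode(st.st_mode),  # e.g. '-rw-------'
        "modified": time.ctime(st.st_mtime),      # last modification time
    }


if __name__ == "__main__":
    # Create a throwaway file so the sketch can run anywhere.
    with tempfile.NamedTemporaryFile(suffix=".txt") as f:
        f.write(b"hello")
        f.flush()
        for key, value in describe(f.name).items():
            print(f"{key}: {value}")
```

Note that the name is not stored with the other attributes: it comes from the path, while everything else comes from the metadata record returned by `os.stat`.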

Use getfattr -d * to see extended file attributes
Use setfattr -n name -v value file to set extended file attributes
Use lsattr to list attributes
Use chattr + to change attributes

File Operations

A file is an abstract data type - hence, special operations are used to manipulate files. Operations include the following:

• Create: find space and add an entry to the directory
• Open: create a channel for communicating with a file logically
• Close: close the channel
• Write: write data beginning at the cursor position, update cursor position afterwards
• Read: read data beginning at the cursor position, update cursor position afterwards
• Reposition cursor: set cursor position (seek)
• Delete: free space and remove entry from directory
• Truncate: delete data beginning at some position in the file

Data Structures for Open Files
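The operations listed under File Operations map closely onto POSIX-style system calls. The following is a minimal sketch using Python's `os` module, assuming a Unix-like system; the helper name `demo_operations` is hypothetical. It also opens the same file a second time to show that each open channel carries its own cursor, which is the kind of per-open state the open-file tables have to track.

```python
import os
import tempfile


def demo_operations(path):
    """Exercise create, open, write, seek, read, truncate, close and delete."""
    # Create + open: O_CREAT adds a directory entry; the descriptor is the channel.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"hello world")                # write advances the cursor
    after_write = os.lseek(fd, 0, os.SEEK_CUR)  # ask where the cursor is now
    os.lseek(fd, 6, os.SEEK_SET)                # reposition cursor (seek)
    tail = os.read(fd, 5)                       # read starts at the cursor

    # A second open of the same file gets its own cursor, starting at 0.
    fd2 = os.open(path, os.O_RDONLY)
    head = os.read(fd2, 5)
    os.close(fd2)

    os.ftruncate(fd, 5)                         # truncate: drop data from offset 5 on
    size = os.fstat(fd).st_size
    os.close(fd)                                # close the channel
    os.remove(path)                             # delete: remove the directory entry
    return after_write, tail, head, size


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_operations(os.path.join(d, "demo.txt")))
```

The cursor position is never passed to `os.read` or `os.write`; it lives with the open channel, so two channels to the same file can read from different positions.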

• Global Table: contains data about a file that is not dependent on a particular process. Examples: size, dates, disk location, file-open count (number of processes using the file - used to delete data structures when the last reference is removed)

• Per-Process Table: cursor position for the process, permissions for reading, writing, appending, etc.

File Locking

• May be provided by some operating systems or even languages (see Java example)

• A lock on a file may be shared (several processes may hold it at once) or exclusive (only one process may hold it)

• Locking may be mandatory (access is denied or allowed based on locks) or advisory (a process can look at lock status and decide what it should do)

File Types
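The advisory locking described in the File Locking notes can be sketched with `fcntl.flock`, which is available through Python's standard library on Unix-like systems. This is an illustration under assumptions, not the Java example the notes refer to: the helper names `try_lock` and `demo` are hypothetical, the file is assumed to live on a local file system, and two independent opens in one process stand in for two cooperating processes.

```python
import fcntl
import os
import tempfile


def try_lock(fd, exclusive=True):
    """Try to take an advisory lock without blocking; return True on success."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def demo(path):
    a = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    b = os.open(path, os.O_RDWR)  # independent open of the same file
    try:
        outcome = {}
        outcome["a_exclusive"] = try_lock(a)
        # The lock is held through `a`, so a second exclusive request fails.
        outcome["b_exclusive_while_held"] = try_lock(b)
        # Advisory means nothing stops a writer that ignores the lock.
        outcome["b_bytes_written_anyway"] = os.write(b, b"x")
        fcntl.flock(a, fcntl.LOCK_UN)  # release
        outcome["b_exclusive_after_release"] = try_lock(b)
        return outcome
    finally:
        os.close(a)
        os.close(b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo(os.path.join(d, "shared.dat")))
```

The write through `b` succeeds even while `a` holds the exclusive lock. Under mandatory locking the operating system itself would refuse or delay that access.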

• Recognition: file extension, magic numbers
• Support auto opening of a file in some application
• Support execution of certain file types, such as scripts

Use file to see file types
Use ghex to see the magic numbers

File Access
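Recognition by magic number, as mentioned in the File Types notes, amounts to comparing the first few bytes of a file against known signatures. The following is a minimal sketch; the table holds a handful of well-known signatures and is far smaller than what a tool such as file consults, and the names `MAGIC`, `sniff` and `sniff_file` are hypothetical.

```python
# A few well-known leading byte signatures ("magic numbers").
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"%PDF-": "PDF document",
    b"\x7fELF": "ELF executable",
    b"PK\x03\x04": "ZIP archive",
    b"#!": "script with interpreter line",
}


def sniff(data):
    """Guess a file type from the first bytes of its contents."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return "unknown"


def sniff_file(path):
    """Read just enough of the file to compare against the table."""
    with open(path, "rb") as f:
        return sniff(f.read(16))


if __name__ == "__main__":
    print(sniff(b"%PDF-1.7 ..."))
    print(sniff(b"#!/bin/sh\necho hi\n"))
```

Unlike a file extension, the signature travels with the contents, so renaming a file does not change what `sniff` reports.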

• Sequential: cursor moves from one position to an adjacent position while reading and writing but rewind is possible too
• Direct: cursor can be placed anywhere for reading or writing
• Index file: create a file of pointers to blocks in a file; each pointer is associated with a name and the names are sorted - first search the names for the needed block, then access the block through the pointer
  – example: items in a database are stored by UPC - order the index by increasing UPC, search the index to find the pointer, then locate the block
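The index file idea can be sketched as a sorted list of (key, block number) pairs searched by bisection, followed by a direct seek into the data file. This is an illustration under assumptions: the block size, the sample UPC values, and the names `build` and `lookup` are hypothetical, and an in-memory `io.BytesIO` stands in for a data file on disk.

```python
import bisect
import io

BLOCK_SIZE = 32  # hypothetical fixed block size in bytes


def build(items):
    """Store each (upc, text) in its own block; return (data_file, sorted index)."""
    data = io.BytesIO()
    index = []
    for block_no, (upc, text) in enumerate(items):
        # Pad or cut every record to exactly one block.
        data.write(text.encode().ljust(BLOCK_SIZE, b" ")[:BLOCK_SIZE])
        index.append((upc, block_no))
    index.sort()  # order the index by increasing UPC
    return data, index


def lookup(data, index, upc):
    """Search the index for `upc`, then read its block by direct access."""
    pos = bisect.bisect_left(index, (upc, 0))
    if pos == len(index) or index[pos][0] != upc:
        return None
    block_no = index[pos][1]
    data.seek(block_no * BLOCK_SIZE)  # place the cursor at the block
    return data.read(BLOCK_SIZE).decode().rstrip()


if __name__ == "__main__":
    data, index = build([
        ("036000291452", "tissues"),
        ("012345678905", "cereal"),
        ("049000006346", "soda"),
    ])
    print(lookup(data, index, "012345678905"))
```

The data blocks stay in arrival order; only the small index is kept sorted. The `block_no * BLOCK_SIZE` seek is the same arithmetic used to reach a numbered record directly.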

• Relative file: records are numbered, access is via record number

Disk Structure

• Disks can be subdivided into partitions
• Disks or partitions can be RAID-protected against failures
• Disks or partitions can be used raw or formatted with a file system
• Partitions can be formatted independently
• There are general purpose file systems and special purpose file systems

Use sudo fdisk -l /dev/sda to see the partitions on a disk
Use dumpe2fs /dev/sda5 to see file system information

Filesystem Types

• General purpose: for user files
  – ext4: standard journaling file system
  – btrfs: new copy on write filesystem for linux
  – cramfs: read-only linux file system - simple and space-efficient
  – XFS: high-performance journaling file system
• Special purpose:
  – tmpfs: “temporary” file system in volatile main memory, contents erased if the system reboots or crashes
  – objfs: an interface to the kernel that looks like a file system - gives debuggers access to kernel symbols
  – ctfs: maintains “contract” information to manage which processes start when the system boots and must continue to run during operation
  – lofs: a “loop back” file system that allows one file system to be accessed in place of another one
  – procfs: presents information on all processes as a file system
  – sysfs: exports information about devices and drivers from kernel to user space, also used for configuration

Directories

• Needed operations on directories:
  – search: for one or more files, possibly recursively
  – create: a file in a particular directory
  – delete: one or more files from a directory or directories
  – list: the names, attributes, structure of files in directories
  – rename: files in specified directories

• Needed features:
  – efficient: a file should be located quickly
  – naming: should be convenient to humans - the same name should be allowed to be used more than once on different files - the same file should be allowed to have more than one name
  – grouping: supports grouping of files by common properties, e.g. all files of an application

[Figure: single level directory - one directory with entries .txt, hello.jpg, fizzle.doc, sway, big.xls, each entry pointing to one file]

• Single Level Directory:
  – efficiency problem: if the number of files is in the hundreds of thousands, searching for a node representing a file can be time consuming
  – naming problem: a name cannot be used more than once
  – grouping problem: supporting groups is awkward - can set a group attribute and then, when listing files, use a switch to make only members of a group or groups visible

[Figure: two level directory - users peanut, butter, jelly, each with a directory of their own; files shown: test.txt, hello.jpg, fizzle.doc, sway, big.xls]

• Two Level Directory:
  – efficiency: improved - need only search in the user’s space
  – naming: different users can give files the same name but a single user cannot have the same name for different files
  – grouping problem: this has not been addressed
  – paths: the concept of naming a file from the root becomes necessary

[Figure: tree structure - users peanut, butter, jelly; entries test.txt, images, fizzle.doc, sway, big.xls; a.jpg and b.jpg one level further down]

• Tree Structure:
  – efficiency: vastly improved - need only search in a directory path
  – grouping: easy - use subdirectories
  – subdirectories: users can create their own
  – commands: provide user commands for directory control such as rm -rf, ls -lR, etc.
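A recursive search over a tree-structured directory can be sketched with `os.walk` from Python's standard library. The helper name `search` is hypothetical, and the layout built in the usage lines (which user owns which files) is an assumption made for illustration, since the figure does not survive in enough detail to say.

```python
import fnmatch
import os
import tempfile


def search(root, pattern):
    """Return all files under `root` whose name matches the shell-style pattern."""
    matches = []
    # os.walk visits each directory once and, by default, does not follow
    # symbolic links to directories.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: peanut/test.txt, peanut/images/a.jpg, peanut/images/b.jpg
        os.makedirs(os.path.join(root, "peanut", "images"))
        for parts in (("peanut", "test.txt"),
                      ("peanut", "images", "a.jpg"),
                      ("peanut", "images", "b.jpg")):
            open(os.path.join(root, *parts), "w").close()
        for path in search(root, "*.jpg"):
            print(os.path.relpath(path, root))
```

Passing a subdirectory as `root` restricts the search to that subtree, which is what makes lookups in a tree cheaper than scanning one flat directory.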

[Figure: acyclic graph structure - home with subdirectories Play, Work, Defer; entries test.txt, images, c.jpg, sway, big.xls; a.jpg and b.jpg one level further down; one entry is reachable from more than one directory]

• Acyclic Graph Structure:
  – subdirectories and files: can now be shared, same subdirectories and files can be given different names (aliases)
  – new directory entry type: the link
  – new problem: delete a file - any link to it is dangling
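The dangling-link problem can be observed directly on a Unix-like system. The following is a minimal sketch with a hypothetical helper `demo_links`; the file names are illustrative. It creates one file with two aliases, a hard link and a symbolic link, and then deletes the original name. On such systems only the symbolic link is left dangling, because a hard link keeps the data alive through a reference count.

```python
import os
import tempfile


def demo_links(directory):
    """Create a file with two aliases, delete the original name, report what survives."""
    target = os.path.join(directory, "c.jpg")
    hard = os.path.join(directory, "hard.jpg")
    soft = os.path.join(directory, "soft.jpg")
    with open(target, "wb") as f:
        f.write(b"data")
    os.link(target, hard)     # second directory entry for the same file
    os.symlink(target, soft)  # entry that stores the target's path
    link_count = os.stat(target).st_nlink

    os.remove(target)         # delete the original name

    with open(hard, "rb") as f:
        hard_ok = f.read() == b"data"
    return {
        "link_count_before_delete": link_count,
        "hard_link_still_readable": hard_ok,
        "symlink_entry_exists": os.path.islink(soft),
        "symlink_target_exists": os.path.exists(soft),  # follows the link
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_links(d))
```

One common way to handle deletion in a shared structure is exactly this reference count: the space is freed only when the last hard link is removed.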

[Figure: general graph structure - home with subdirectories Play, Work, Defer; entries test.txt, images, tmp, sway, big.xls; a.jpg and b.jpg one level further down; one link points back up the structure, forming a cycle]

• General Graph Structure:
  – cycles: can wreak havoc if the OS does not do something special while searching subdirectories

Mounting

• Filesystems must be mounted before they can be accessed:
  – mounting is generally a privileged operation - but a user can be allowed to mount on subdirectories
  – a volume that has an unknown type or a defective structure should not be mounted
  – a kernel data structure should be provided to keep track of all mount points
  – windows: mount points look like this: C:
  – unix: any empty subdirectory can be a mount point, plus / is a mount point
  – if the current directory of some process is on a mounted volume, that volume cannot be unmounted

Across a Network

• Client-Server:
  – server has files, client uses them, server decides resource access
  – server has many clients
  – client authentication protocols are necessary for security
  – examples: NFS (unix), CIFS (windows)

• Distributed Naming Services:
  – a system that stores, organizes and provides (secure) access to information in a directory, over a network
  – examples: LDAP, DNS, NIS, Active Directory
  – authentication may be provided by Kerberos
  – firewalls provide some protection against intruders

• Failures:
  – failures are more likely over networks and are handled differently - state information must be saved for recovery, otherwise an intruder can hijack a connection

File Sharing Across a Network

• Consistency Semantics:
  – specify how multiple users are to access a shared file simultaneously
  – should writes to an open file be visible immediately to other users of the same open file, or should other users maintain their own copy with all copies synchronized under some condition (for example, when the file is closed)?
  – should all users be able to read and write to the same file concurrently?
  – should shared files be immutable, so that they cannot be written to or renamed once they are declared to be shared?
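The first option in the second question above, where writes become visible immediately, is what a local file system on a Unix-like machine typically provides. The following is a minimal sketch, assuming such a system; the helper name `immediate_visibility_demo` is hypothetical, and two independent opens in one process stand in for two users. A network file system may behave differently, for instance by publishing changes only when the writer closes the file.

```python
import os
import tempfile


def immediate_visibility_demo(path):
    """Show that a reader sees a write made through another open of the same file."""
    writer = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    reader = os.open(path, os.O_RDONLY)
    try:
        before = os.read(reader, 100)  # file is empty: nothing to read yet
        os.write(writer, b"update")    # the writer has NOT closed the file
        after = os.read(reader, 100)   # the reader already sees the new data
        return before, after
    finally:
        os.close(writer)
        os.close(reader)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(immediate_visibility_demo(os.path.join(d, "shared.txt")))
```

Under the alternative, where each user works on a private copy until close, the second read would still return nothing.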