Naming and Directories

Sarah Diesburg
Operating Systems
CS 3430

Recall from the last time…

• A file header associates the file with its data blocks

File Header Storage

• Under UNIX, a file header is stored in a data structure called an i-node
• For early UNIX systems
  • I-nodes are stored in a special array
    • Fixed number of array entries
    • Maximum number of files is fixed
  • Not stored near data blocks on disk
    • Reading a small file involves
      • One disk seek to get the i-node
      • Other disk seek(s) to get file blocks
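A minimal sketch of why a fixed-size i-node array caps the number of files. The `InodeTable` class, its field names, and the capacity of 4 are assumptions made for illustration; this is not the layout of any real UNIX file system.

```python
class InodeTable:
    """Hypothetical fixed-size i-node array (illustration only)."""

    def __init__(self, num_entries):
        # The array size is chosen once, when the file system is created.
        self.entries = [None] * num_entries

    def allocate(self, block_numbers):
        """Store a file header in the first free slot; return its i-node number."""
        for inode_number, entry in enumerate(self.entries):
            if entry is None:
                self.entries[inode_number] = {"blocks": list(block_numbers)}
                return inode_number
        raise OSError("no free i-nodes: maximum number of files reached")


table = InodeTable(num_entries=4)
numbers = [table.allocate([100 + i]) for i in range(4)]

try:
    table.allocate([200])
    table_full = False
except OSError:
    table_full = True  # data blocks may still be free, but no i-node is left
```

The fifth allocation fails even though nothing was said about free data blocks, which is the sense in which the maximum number of files is fixed.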

Reasons for Separate Allocations

• Reliability
  • Data corruptions are unlikely to affect i-nodes
• Reduced fragmentation
  • File headers are smaller than a whole block
  • By packing them in an array, multiple headers can be fetched from disk
• File headers are accessed more often
  • e.g., ls
  • Grouping file headers improves disk efficiency

More recently

• Portions of file header array stored on each cylinder
• For small directories
  • All file headers and data stored in the same cylinder
  • Reduce seek time

Naming

• Remember that odd moment when your computer asks you to name the first file?
• Naming: allows users to issue file names instead of i-node numbers
- Users tend to come up with poor names
  • e.g., test
- Many files are difficult to name…

How do you name these photos?

Directories

• A table of file names and their i-node numbers
• Under many file systems
  • Directories are implemented as normal files
    • Containing file names and i-node numbers
  • Only the OS is permitted to modify directories

Name Space

• Flat name space
• Hierarchical naming
• Relational name space
• Contextual naming
• Content-based naming

Flat Name Space

• All files are stored in a single directory
+ Easy to implement
- Not scalable for large directories
  • Name collisions: multiple files with the same name

Hierarchical Naming

• Uses multiple levels of directories
• Most popular name space organization
+ Conceptual model maps well into the human model of organizing things
  • A file cabinet contains many files
+ Scalable
  • The probability of name collisions decreases
+ Spatial locality
  • Store all files under a directory within a cylinder to avoid disk seeks

More on Hierarchical Naming

• Absolute name: consists of the path from the root directory ‘/’ to the file
  • e.g., /pets/cat.jpg
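As a small illustration, the absolute name can be split into its components with Python's standard `pathlib` module. The variable names are chosen for this example only.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/pets/cat.jpg")

# An absolute name starts at the root directory '/'.
is_absolute = path.is_absolute()

# Components in order: root directory, subdirectory, file name.
components = path.parts

# A bare file name is not an absolute name.
bare_is_absolute = PurePosixPath("cat.jpg").is_absolute()
```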

[Figure: /pets/cat.jpg labeled as root directory, subdirectory, and file name]

Drawbacks of Hierarchical Naming

- Not all files can fit into the hierarchical model

[Figure: pets or pests?]

- Accessing a file may involve many levels of directory lookups (a path resolution) before getting to the file content

An Example of Path Resolution

• To access the data content of /pets/cat.jpg
• The system needs to perform the following disk I/Os
1. Read in the file header for the root directory ‘/’
   • Stored at a fixed location on disk
2. Read the first data block for the root directory
   • Look up the directory entry for pets
3. Read the file header for pets
4. Read the first data block for the pets directory
   • Look up the directory entry for cat.jpg
5. Read the file header for cat.jpg
6. Read the data block for cat.jpg
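The six disk I/Os for /pets/cat.jpg can be traced with a small simulation. This is a sketch under stated assumptions: the `inodes` and `data_blocks` dictionaries, the i-node and block numbers, and the helper names are all hypothetical, and each directory is assumed to fit in its first data block.

```python
ROOT_INODE = 0  # assumed fixed, well-known location of the root file header

# Hypothetical on-disk state: file headers (i-nodes) and data blocks.
inodes = {
    0: {"blocks": [10]},  # '/'
    1: {"blocks": [11]},  # 'pets'
    2: {"blocks": [12]},  # 'cat.jpg'
}
data_blocks = {
    10: {"pets": 1},      # directory block for '/': name -> i-node number
    11: {"cat.jpg": 2},   # directory block for 'pets'
    12: b"<jpeg bytes>",  # file content
}

io_log = []  # every simulated disk read is recorded here


def read_inode(number):
    io_log.append(("inode", number))
    return inodes[number]


def read_block(number):
    io_log.append(("block", number))
    return data_blocks[number]


def resolve(path):
    """Walk an absolute path from the root; return the file's first data block."""
    header = read_inode(ROOT_INODE)
    for name in path.strip("/").split("/"):
        directory = read_block(header["blocks"][0])  # first directory block only
        header = read_inode(directory[name])
    return read_block(header["blocks"][0])


content = resolve("/pets/cat.jpg")
```

After the call, `io_log` holds six entries that alternate between file headers and data blocks, in the same order as the numbered steps.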

Some Performance Optimizations…

• Top-level directories are usually cached
• A user inside a directory (e.g., /pets)
  • Can issue relative path names (e.g., cat.jpg) to refer to files within the current directory
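A rough sketch of both optimizations, assuming a hypothetical in-memory `cache` of directory tables and a simulated `directories` store. Real systems cache at the i-node and block level; the names here are illustrative only.

```python
import posixpath

disk_reads = []  # directories fetched from the simulated disk
cache = {}       # hypothetical in-memory cache of directory tables

# Hypothetical directory contents: directory path -> {name: i-node number}
directories = {"/": {"pets": 1}, "/pets": {"cat.jpg": 2}}


def read_directory(path):
    """Return a directory table, going to 'disk' only on a cache miss."""
    if path not in cache:
        disk_reads.append(path)
        cache[path] = directories[path]
    return cache[path]


def lookup(name, cwd):
    """Resolve a relative name against the current directory."""
    absolute = posixpath.join(cwd, name)
    parent, leaf = posixpath.split(absolute)
    return read_directory(parent)[leaf]


first = lookup("cat.jpg", cwd="/pets")
second = lookup("cat.jpg", cwd="/pets")  # served from the cache
```

Only /pets is read from the simulated disk, and only once: the relative name skips the walk from the root, and the second lookup hits the cache.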

Relational Name Space

• Hierarchical naming model is largely a tree
• One step beyond is the relational naming model, which allows the construction of general graphs
  • A file can belong to multiple folders
    • According to its attributes
  • Files can be accessed in a manner similar to relational databases
    • e.g., keywords: cats and binder

Pros and Cons of Relational Name Space

+ More flexible than hierarchical naming
- May require a long list of attributes to name a single piece of data
  • e.g., this lecture
    • Keywords: operating systems, file systems, naming, too many cat pics, etc.
- Who will create those attributes?
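Attribute-based access can be sketched as a keyword query over an index. The `index` contents and the `query` helper are hypothetical, and a real relational name space would sit on a proper database rather than a dictionary of sets.

```python
# Hypothetical attribute index: attribute -> set of file identifiers
index = {
    "cats": {"photo1.jpg", "photo2.jpg", "notes.txt"},
    "binder": {"photo2.jpg", "scan.pdf"},
    "dogs": {"photo3.jpg"},
}


def query(*keywords):
    """Return the files tagged with every keyword (a conjunctive query)."""
    matches = [index.get(keyword, set()) for keyword in keywords]
    return set.intersection(*matches) if matches else set()


result = query("cats", "binder")
unknown = query("cats", "unicorns")  # unknown attribute: nothing matches
```

Note that photo2.jpg appears under two attributes at once, which a strict tree cannot express without duplication or links.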

Contextual Naming

• Takes advantage of the observation that certain attributes can be added automatically
  • e.g., when you try to open a file with Word, the system will search only the file types supported by Word (.doc, .txt, .html)
+ Avoids a long list of attributes
- A user may not remember the file name
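The Word example amounts to filtering by an attribute that the context supplies automatically. In this sketch the `SUPPORTED` table, the application key "word", and the file list are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from an application to the file types it can open
SUPPORTED = {"word": {".doc", ".txt", ".html"}}

files = ["report.doc", "cat.jpg", "notes.txt", "index.html", "song.mp3"]


def candidates(app, names):
    """Keep only the files whose extension the application supports."""
    allowed = SUPPORTED[app]
    return [name for name in names if PurePosixPath(name).suffix.lower() in allowed]


visible = candidates("word", files)
```

The user never states the file type; the application context narrows the search to three of the five files.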

Content-Based Naming

• Searches for a file by its content instead of its name
  • File contents are extracted automatically
  • e.g., I want a photo of a cat taken five years ago
  • The system returns all files satisfying the criteria
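The cat-photo query can be sketched once content has been extracted. Everything here is hypothetical: the `extracted` table stands in for the output of an automatic extractor (the hard part, which is not implemented), and the file names, labels, and years are made up for the example.

```python
# Hypothetical output of an automatic content extractor, keyed by file name
extracted = {
    "IMG_0412.jpg": {"labels": {"cat", "sofa"}, "year_taken": 2019},
    "IMG_0977.jpg": {"labels": {"dog"}, "year_taken": 2019},
    "IMG_1503.jpg": {"labels": {"cat"}, "year_taken": 2023},
}


def search(label, year_taken):
    """Return all files whose extracted content satisfies both criteria."""
    return sorted(
        name
        for name, info in extracted.items()
        if label in info["labels"] and info["year_taken"] == year_taken
    )


CURRENT_YEAR = 2024  # assumed for the example
hits = search("cat", CURRENT_YEAR - 5)
```

The query never mentions a file name; the names are only what comes back.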


- Requires advanced information processing techniques
  • e.g., image recognition
  • Many existing systems use manual indexing
  • Automated content-based naming is still an active area of research

Example: The “Internet

• Can be viewed as a worldwide file system
• What is the naming scheme for the Internet file system?

The “Internet File System”

• Contains shades of various naming schemes
  • Flat name space:
    • Each URL provides a unique name
  • Hierarchical name space:
    • Within individual websites
  • Relational name space:
    • Can search the Internet via search engines
  • Contextual name space:
    • Pages are ranked according to relevance
  • Content-based name space:
    • You can find your information without knowing the exact file names

Example: Plan 9

• Modern UNIX is deeply influenced by the Plan 9 OS
  • Developed at Bell Labs
• Major design philosophy: everything is a file
• A single hierarchical name space for
  • Processes (e.g., /proc)
  • Files
  • IPC (e.g., pipe)
  • Devices (e.g., /dev/fd0)
• Use open/close/read/write for everything
  • e.g., /dev/mem
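The uniform open/close/read/write interface can be tried with ordinary file descriptors. This sketch uses Python's `os` module on a pipe and a regular file rather than Plan 9 itself; the helper name `write_then_read` and the temporary file are assumptions for the example.

```python
import os
import tempfile


def write_then_read(write_fd, read_fd, payload):
    """Use the same write/read/close calls whatever the descriptors refer to."""
    os.write(write_fd, payload)
    os.close(write_fd)
    data = os.read(read_fd, len(payload))
    os.close(read_fd)
    return data


# IPC: a pipe has a read end and a write end
pipe_read, pipe_write = os.pipe()
from_pipe = write_then_read(pipe_write, pipe_read, b"hello")

# A regular file, opened once for writing and once for reading
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
file_write = os.open(path, os.O_WRONLY | os.O_CREAT)
file_read = os.open(path, os.O_RDONLY)
from_file = write_then_read(file_write, file_read, b"hello")
```

The helper does not know or care whether it was handed a pipe or a file, which is the point of the design philosophy.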