
Design decisions

• Files need to contain vastly different types of information.
• Some of this information is tightly structured with lines, records etc.
• Should the file system allow flexibility in how it deals with differently structured files?
• At the bottom level the file system is working with discrete structures (sectors and blocks) of a definite size.
• The most common solution is to treat files as a stream of bytes; any other structure is enforced by the programs using the files.
• The work has to be done somewhere.
• Some operating systems provide more facilities than others for dealing with a variety of file types.

1 File attributes

• Information about the files.
• These vary widely because of the previous design decisions.
• Standard ones
  • file name – the full name includes the directories to traverse to find this file. How much space should the system allocate for a name? Many systems use a byte to indicate the file name length and so are limited to 255 characters. There are usually limitations on the characters you can use in filenames.
  • location – where is the file stored, some pointer to the device (or server) and the positions on the device
  • size of the file – either in bytes, blocks, number of records etc
  • owner information – usually the owner can do anything to a file
  • other access information – who should be allowed to do what
  • dates and times – of creation, access, modification
  • and file types…
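As a rough illustration of the standard attributes listed above, the following Python sketch reads them back from the operating system with `os.stat`. The helper name `describe` and the choice of fields are assumptions made for this example; which attributes are available (and what they mean) varies between file systems.

```python
import os
import stat
import tempfile
import time

def describe(path):
    """Return a dict of common attributes the OS keeps for a file."""
    info = os.stat(path)
    return {
        "name": os.path.abspath(path),                 # full name, including directories
        "size_bytes": info.st_size,                    # size of the file in bytes
        "owner_uid": info.st_uid,                      # owner information
        "permissions": stat.filemode(info.st_mode),    # other access information
        "modified": time.ctime(info.st_mtime),         # dates and times
        "accessed": time.ctime(info.st_atime),
        "is_directory": stat.S_ISDIR(info.st_mode),    # a very coarse "file type"
    }

if __name__ == "__main__":
    # Create a small temporary file so the example does not depend on any existing path.
    with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
        tmp.write(b"hello")
        tmp.flush()
        for key, value in describe(tmp.name).items():
            print(f"{key}: {value}")
```

Note that the location of the data on the device is not exposed here; that attribute is normally hidden from ordinary programs.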

2 File name limits

• NTFS
  • extended paths (start with "\\?\") up to 32767 chars, but ordinary paths up to 256 + "C:\" and null (260 in total)
  • in Windows 10 can opt in to avoid this limit
  • each path component (including the final filename) commonly limited to 255 chars
  • see https://msdn.microsoft.com/en-us/library/aa365247.aspx for further info
• Linux (many different file systems)
  • path limit 4096 bytes
  • commonly 255 bytes per path component
• APFS
  • 255 characters per path component
  • ???
  • see https://arstechnica.com/gadgets/2016/06/a-zfs-developers-analysis-of-the-good-and-bad-in-apples-new-apfs-file-system/ for a nice description of this file system
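On Unix-like systems the limits above can be queried rather than memorised. The sketch below uses Python's `os.pathconf` (not available on Windows); the function name `name_limits` is made up for this example, and the values returned depend on the file system that holds the given directory.

```python
import os

def name_limits(directory="/"):
    """Return (max bytes per path component, max bytes per path) for a directory."""
    name_max = os.pathconf(directory, "PC_NAME_MAX")
    path_max = os.pathconf(directory, "PC_PATH_MAX")
    return name_max, path_max

if __name__ == "__main__":
    component, path = name_limits("/")
    print(f"per component: {component} bytes, whole path: {path} bytes")
```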

3 File type

• The more the system knows about file types the more it can perform appropriate tasks.
• e.g.
  • Executable binaries can be loaded and executed.
  • Text files can be indexed.
  • Pictures can have thumbnails generated from them.
  • Files can automatically be opened by corresponding programs.
  • Also the system can stop the user doing something stupid like printing an mp3 file.
• All operating systems “know” about executable binary files.
  • They have an OS specific structure – information for the loader about necessary libraries and where different parts should be loaded and where the first instruction is.

4 Dealing with file types

• Windows deals with different file types using a simple extension on the file name.
  • The extensions are connected to programs and commands in the system registry.
  • But there is nothing to stop a user changing an extension (except a warning message).
• UNIX uses magic numbers on the front of the file.
  • If the file is executed the magic number can be used to invoke an interpreter for example.
  • For "universal" file types many OSs can extract information from files.
  • Try using the file command on Linux (or Mac). And od, e.g. od -a -N 32 README.rtf
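The magic number idea can be sketched in a few lines of Python. This is only an illustration: the table below holds a handful of well-known signatures chosen for the example, whereas the real file command consults a much larger database, and the names `MAGIC_NUMBERS`, `guess_type` and `guess_file_type` are invented here.

```python
# A few well-known signatures found at the very start of files.
MAGIC_NUMBERS = [
    (b"\x7fELF", "ELF executable"),
    (b"#!", "script with interpreter line"),
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"{\\rtf", "Rich Text Format document"),
]

def guess_type(header):
    """Guess a file type from its first bytes, ignoring the file name entirely."""
    for magic, description in MAGIC_NUMBERS:
        if header.startswith(magic):
            return description
    return "unknown"

def guess_file_type(path):
    """Read the first 32 bytes of a file (as od -N 32 would show) and classify them."""
    with open(path, "rb") as f:
        return guess_type(f.read(32))

if __name__ == "__main__":
    print(guess_type(b"#!/bin/sh\necho hi\n"))
    print(guess_type(b"{\\rtf1\\ansi"))
```

Because the check looks only at the contents, renaming a file (changing its extension) does not change the answer.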

5 Macintosh (old) solution

• The Macintosh used 8 bytes to identify file types (4 for the creator and 4 for the type). • Not normally visible to the user. Therefore harder to change by accident.

• Also more structure - each file has two components (one can be empty).

• See https://en.wikipedia.org/wiki/Resource_fork

• Resource fork (/rsrc)
  • Program code (originally),
  • icons, menu items,
  • window information,
  • preferences.

• Data fork
  • Holds the (possibly unstructured) data, program code
  • e.g. the text of a word processing document.

• Each program includes in its resource fork a list of all the types of files it can work with.

• In MacOS X, if using a UNIX file system, the resource fork is merged back into the data fork.

• Or into bundles (directories which look like single files)

• Under HFS+ it is possible to have multiple forks.

6 NTFS files

• NTFS takes a very general approach to file attributes.
• NTFS views each file (or folder) as a set of file attributes.
• The most important one is usually the data attribute.
• New attributes can be added.

Attribute Type – Description

Standard Information – Information such as timestamp and link count.

Attribute List – Locations of all attribute records that do not fit in the MFT record.

File Name – Additional names, or hard links, can be included as additional file name attributes.

Data – File data. NTFS supports multiple data attributes per file. Each file typically has one unnamed data attribute. A file can also have one or more named data attributes, each using a particular syntax.

7 More NTFS attributes

Attribute Type – Description

Object ID – A volume-unique file identifier. Used by the distributed link tracking service. Not all files have object identifiers.

Logged Tool Stream – Similar to a data stream, but operations are logged to the NTFS log file just like NTFS metadata changes.

Reparse Point – Used for mounted drives and archives.

Index Root – Used to implement folders and other indexes.

Index Allocation – Used to implement the B-tree structure for large folders and other large indexes.

Bitmap – Used to implement the B-tree structure for large folders and other large indexes.

Volume Information – Used only in the $Volume system file. Contains the volume version.

Volume Name – Used only in the $Volume system file. Contains the volume label.

8 Alternate Data Streams

C:\Users\rshee\Documents\ads>dir
 Directory of C:\Users\rshee\Documents\ads
               0 File(s)              0 bytes
               2 Dir(s)  45,499,203,584 bytes free

C:\Users\rshee\Documents\ads>echo "this is an ADS attached to the 'ads test folder'" > :ads0.txt

C:\Users\rshee\Documents\ads>dir
 Directory of C:\Users\rshee\Documents\ads
               0 File(s)              0 bytes
               2 Dir(s)  45,499,006,976 bytes free

C:\Users\rshee\Documents\ads>echo "this is an ADS attached to 'file1.txt'" > file1.txt:ads1.txt

C:\Users\rshee\Documents\ads>dir
 Directory of C:\Users\rshee\Documents\ads
18/09/2017  02:38 PM                 0 file1.txt
               1 File(s)              0 bytes
               2 Dir(s)  45,499,006,976 bytes free

C:\Users\rshee\Documents\ads>echo "this is another ADS attached to 'file1.txt'" > file1.txt:ads2.txt

C:\Users\rshee\Documents\ads>dir
 Directory of C:\Users\rshee\Documents\ads
18/09/2017  02:39 PM                 0 file1.txt
               1 File(s)              0 bytes
               2 Dir(s)  45,499,006,976 bytes free

C:\Users\rshee\Documents\ads>more < :ads0.txt
"this is an ADS attached to the 'ads test folder'"

C:\Users\rshee\Documents\ads>more < file1.txt:ads1.txt
"this is an ADS attached to 'file1.txt'"

C:\Users\rshee\Documents\ads>more < file1.txt:ads2.txt
"this is another ADS attached to 'file1.txt'"

9 Representing files on disk

• We know that disks deal with constant size blocks. All files and information about files must be stored in these blocks (usually accessible via a logical block number).
• We need to know what we want our file system to look like to the users in terms of its structure.
• We also need to know how we can place the required information on the disk devices.
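The consequence of constant size blocks can be shown with a little arithmetic. The Python sketch below assumes a block size of 4096 bytes, which is only an example value, and the function names are invented for illustration. It computes how many blocks a file occupies and which block of the file holds a given byte offset; translating that file-relative block index into a logical block number on the device is the job of the allocation method.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes; real devices and file systems vary

def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of whole blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)

def locate(offset, block_size=BLOCK_SIZE):
    """Map a byte offset within a file to (block index within the file, offset in block)."""
    return divmod(offset, block_size)

def wasted_bytes(file_size, block_size=BLOCK_SIZE):
    """Unused space in the final block (internal fragmentation)."""
    return blocks_needed(file_size, block_size) * block_size - file_size

if __name__ == "__main__":
    size = 10_000
    print(blocks_needed(size), "blocks,", wasted_bytes(size), "bytes unused")
    print("byte 5000 is at", locate(5000))
```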

10 Structure

• Data about files and other information about storage on a device is called metadata.
• Usually a disk device has one or more directories to store metadata.
• These directories can be arranged in different ways:
  • single level – simple, small systems did this, especially with small disk devices (floppies)
    • Disadvantages in finding files as the number of files grows (some implementations use a B-tree).
    • To be workable it requires very long filenames.

11 Multiple levels

• two level – The top level (master file directory) is one entry per user on a multi-user system, the next level (user file directory) looks like a single level system to each user.
  • Creating a user file directory is usually only allowed for administrators.
  • User file directories can be allocated like other files. What about the master file directory?
  • When people log in they are placed within their own directories. Any files mentioned are in that directory.
  • Can use full pathnames to refer to other users’ files (if permissions allow it).
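A two level arrangement can be modelled with nested dictionaries. The Python sketch below is a toy, in-memory model: the class `TwoLevelDirectory`, its method names and the "/user/file" pathname syntax are all assumptions for illustration, and access permissions between users are deliberately not modelled.

```python
class TwoLevelDirectory:
    """Toy model: a master file directory mapping users to user file directories."""

    def __init__(self):
        self.master = {}  # master file directory: user name -> user file directory

    def add_user(self, user, is_admin=False):
        # Creating a user file directory is usually only allowed for administrators.
        if not is_admin:
            raise PermissionError("only administrators may create user file directories")
        self.master[user] = {}

    def create(self, current_user, name, data=b""):
        # Plain names always land in the logged-in user's own directory.
        self.master[current_user][name] = data

    def lookup(self, current_user, name):
        """Resolve 'file' in the caller's directory, or '/user/file' as a full pathname."""
        if name.startswith("/"):
            user, _, filename = name[1:].partition("/")
        else:
            user, filename = current_user, name
        return self.master[user][filename]

if __name__ == "__main__":
    fs = TwoLevelDirectory()
    fs.add_user("alice", is_admin=True)
    fs.add_user("bob", is_admin=True)
    fs.create("alice", "notes.txt", b"alice's notes")
    fs.create("bob", "notes.txt", b"bob's notes")   # same name, no clash
    print(fs.lookup("alice", "notes.txt"))
    print(fs.lookup("alice", "/bob/notes.txt"))
```

Each user sees what looks like a single level system, yet two users can use the same file name without conflict.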

12 Normal file hierarchy

• tree – as many directories as required. This facilitates the organisation of collections of files.

Directories are special files. Should users be able to write directly to their own directories?

13 Sharing files and directories

• We commonly want the same data to be accessible from different places in the file hierarchy. e.g. Two people might be working on a project and they both want the project to be in their local directories.
• This can be accomplished in several different ways:
  • If the data is read-only we could just make an extra copy.
  • We could make two copies of the file record information (not the data).
    • There is a problem with consistency.
  • There can be one true file entry in one directory. Other entries have some reference to this entry.
    • UNIX symbolic links and Windows shortcuts.
  • There can be a separate table with file information. Then all directory entries for the same file just point to the corresponding information in this table.
    • UNIX and NTFS hard links.

14 Hard links

UNIX
ln ExistingFilename NewFilename
• Each directory entry stores a pointer to the file’s inode (more on those soon) which holds the real information about the file.

NTFS
mklink /H NewFilename ExistingFilename
• Each directory entry holds copies of most file attributes plus a pointer to the file’s Master File Table (MFT) file record (more on those soon).
• NTFS updates the properties of a hard link only when a user accesses the original file by using the hard link.
• Hard links do not have security descriptors; instead, the security descriptor belongs to the original file to which the hard link points.

Both only make links on the same volume.
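The effect of a hard link can be observed from a program. The following Python sketch uses `os.link`, which plays the role of ln here; the file names and the helper `demo_hard_link` are made up for the example, and it assumes the directory lives on a file system that supports hard links.

```python
import os
import tempfile

def demo_hard_link(directory):
    """Create a file, hard link it, and show both names refer to the same file."""
    original = os.path.join(directory, "existing.txt")
    link = os.path.join(directory, "new.txt")
    with open(original, "w") as f:
        f.write("shared data")

    os.link(original, link)  # comparable to: ln existing.txt new.txt

    a, b = os.stat(original), os.stat(link)
    same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)  # same inode on same device
    link_count = a.st_nlink                                   # names pointing at the file

    os.remove(original)      # removes one name; the data stays while a name remains
    with open(link) as f:
        survived = f.read()
    return same_file, link_count, survived

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_hard_link(d))
```

Neither name is the "original" in any special sense on UNIX: both directory entries simply point at the same file information.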

15 UNIX symbolic links

• The file called “linkFile” is actually just a text file with the contents “realFile”.
• The OS knows to treat it differently from other text files because of the “l” in the attributes on the left hand side.
• If the original is moved then UNIX can’t do anything about it. You get errors such as “No such file or directory” when you try to open the link.

$ ls -al
total 1
drwxr-xr-x 2 Robert None   0 Sep 14 16:59 .
drwxr-xr-x 3 Robert None   0 Jul 16 16:56 ..
-rw-r--r-- 1 Robert None 304 Sep 12 10:34 .bash_history
-rw-r--r-- 1 Robert None   0 Sep 14 16:59 realFile

$ ln -s realFile linkFile

$ ls -l
total 0
lrwxrwxrwx 1 Robert None 8 Sep 14 16:59 linkFile -> realFile
-rw-r--r-- 1 Robert None 0 Sep 14 16:59 realFile
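The same session can be reproduced from Python, which also makes the dangling link problem visible. This is a sketch under the assumption of a Unix-like system where creating symbolic links is permitted; the helper `demo_symlink` and the name movedFile are invented for the example.

```python
import os
import tempfile

def demo_symlink(directory):
    """Create realFile and linkFile, then move realFile and check the link dangles."""
    real = os.path.join(directory, "realFile")
    link = os.path.join(directory, "linkFile")
    open(real, "w").close()

    os.symlink("realFile", link)    # comparable to: ln -s realFile linkFile
    target = os.readlink(link)      # the stored text, i.e. the path of the original
    size = os.lstat(link).st_size   # typically the length of that stored path

    os.rename(real, os.path.join(directory, "movedFile"))
    # The link still exists, but following it now fails: it is dangling.
    dangling = os.path.islink(link) and not os.path.exists(link)
    return target, size, dangling

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_symlink(d))
```

The size reported for the link is normally the number of characters in the stored name, which is where the 8 in the listing above comes from.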

16 Windows shortcuts

• Make a file. e.g. test.txt.
• Make a shortcut to the file.
• Rename the original to “test”.
• By default Windows uses attributes such as file type, size and time of modification to find renamed shortcuts. Except when I just tried this in Windows 10 I got an error message.
• Much better under NTFS with the Distributed Link Tracking Client service running.
  • This keeps a log of changes made to files that have shortcuts. So the file can almost always be found.
  • It even works in distributed environments (sometimes).

19/09/2017  11:14 AM             1,065 test - Shortcut.lnk
19/09/2017  11:13 AM                24 test.txt

17 Mac aliases

HFS+ has a unique, persistent identifier for each file or folder. An alias stores this identifier with the pathname.

Originally the identifier was first used to find the file. Only if this failed was the pathname used.

Now in order to work more like symbolic links the pathname is used first. This means that if a file is renamed and a new file with the old name is created both an alias and a symbolic link will find the same file (the new one).

If the pathname does not find the file but the identifier does then the pathname is updated to the current correct value.
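The current lookup order for aliases (pathname first, identifier second, then repair the pathname) can be sketched as follows. This is a toy model, not Apple's implementation: the class `Alias`, the function `resolve` and the two dictionaries standing in for the file system are all hypothetical.

```python
class Alias:
    """Toy alias: remembers both the file's persistent identifier and its pathname."""

    def __init__(self, file_id, path):
        self.file_id = file_id
        self.path = path

def resolve(alias, ids_by_path, paths_by_id):
    """Return the identifier of the file the alias now refers to.

    ids_by_path: pathname -> identifier of the file currently at that pathname
    paths_by_id: identifier -> current pathname of that file
    """
    if alias.path in ids_by_path:
        return ids_by_path[alias.path]            # pathname is tried first
    if alias.file_id in paths_by_id:
        alias.path = paths_by_id[alias.file_id]   # repair the stale pathname
        return alias.file_id
    raise FileNotFoundError(alias.path)

if __name__ == "__main__":
    # File 7 was renamed from /docs/a.txt to /docs/b.txt; nothing has the old name.
    alias = Alias(7, "/docs/a.txt")
    print(resolve(alias, {"/docs/b.txt": 7}, {7: "/docs/b.txt"}), alias.path)
```

If a new file is later created at the old pathname, the first branch wins and the alias finds the new file, just as a symbolic link would.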

18 Before next time

• Read from the textbook
  • 11.3 Directory and Disk Structure
  • 19.5.1 NTFS Internal Layout
  • 12.4 Allocation Methods
  • 12.5 Free-Space Management
  • 12.6.1 - Efficiency

Sad but true: http://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead
