Files Need to Contain Vastly Different Types of Information
Total Page:16
File Type:pdf, Size:1020Kb
Design decisions • Files need to contain vastly different types of information. • Some of this information is tightly structured with lines, records etc. • Should the file system allow flexibility in how it deals with differently structured files? • At the bottom level the file system is working with discrete structures (sectors and blocks) of a definite size. • The most common solution is to treat files as a stream of bytes and any other structure is enforced by the programs using the files. • The work has to be done somewhere. • Some operating systems provide more facilities than others for dealing with a variety of file types. 1 File attributes • Information about the files. • These vary widely because of the previous design decisions. • Standard ones • file name – the full name includes the directories to traverse to find this file. How much space should the system allocate for a name? Many systems use a byte to indicate the file name length and so are limited to 255 characters. There are usually limitations on the characters you can use in filenames. • location – where is the file stored, some pointer to the device (or server) and the positions on the device • size of the file – either in bytes, blocks, number of records etc • owner information – usually the owner can do anything to a file • other access information – who should be allowed to do what • dates and times – of creation, access, modification • and file types… 2 File name limits • NTFS • extended paths (start with "\\") up to 32767 chars, but ordinary paths up to 256 + "C:\" and null • in Windows 10 can opt-in to avoid 256 char limit • each directory or final filename commonly limited to 255 chars • see https://msdn.microsoft.com/en-us/library/aa365247.aspx for further info • Linux (many different file systems) • path limit 4096 • commonly 255 bytes per path component • APFS • 255 characters per path component • ??? • see https://arstechnica.com/gadgets/2016/06/a-zfs-developers-analysis-of-the- good-and-bad-in-apples-new-apfs-file-system/ for a nice description of this file system 3 File type • The more the system knows about file types the more it can perform appropriate tasks. • e.g. • Executable binaries can be loaded and executed. • Text files can be indexed. • Pictures can have thumbnails generated from them. • Files can automatically be opened by corresponding programs. • Also the system can stop the user doing something stupid like printing an mp3 file. • All operating systems “know” about executable binary files. • They have an OS specific structure – information for the loader about necessary libraries and where different parts should be loaded and where the first instruction is. 4 Dealing with file types • Windows deals with different file types using a simple extension on the file name. • The extensions are connected to programs and commands in the system registry. • But there is nothing to stop a user changing an extension (except a warning message). • UNIX uses magic numbers on the front of the file data. •If the file is executed the magic number can be used to invoke an interpreter for example. Try using the file command on Linux • For "universal" file types many OSs (or Mac). can extract information from files. And od e.g. od -a -N 32 README.rtf 5 (old) Macintosh solution • The Macintosh used 8 bytes to identify file types (4 for the creator and 4 for the type). • Not normally visible to the user. Therefore harder to change by accident. • Also more structure - each file has two components (one can be empty). • See https://en.wikipedia.org/wiki/Resource_fork • Resource fork (/rsrc) • Program code (originally), • icons, menu items, • window information, • preferences. • Data fork • Holds the (possibly unstructured) data, program code • e.g. the text of a word processing document. • Each program includes in its resource fork a list of all the types of files it can work with. • In MacOS X if using the Unix File System the resource fork has merged back into the data fork. • Or into bundles (directories which look like single files) • Under HFS+ it is possible to have multiple forks. 6 NTFS files Attribute Type Description Information such as Standard Information timestamp and link count. • NTFS takes a very general Locations of all attribute approach to file attributes. Attribute List records that do not fit in the MFT record. • NTFS views each file (or Additional names, or hard links, can be included as folder) as a set of file File Name additional file name attributes. attributes. • The most important one is File data. NTFS supports usually the data attribute. multiple data attributes per • New attributes can be file. Each file typically has one unnamed data Data added. attribute. A file can also have one or more named data attributes, each using a particular syntax. 7 More NTFS attributes Attribute Type Description A volume-unique file identifier. Used by the distributed link tracking Object ID service. Not all files have object identifiers. Similar to a data stream, but operations are logged to the NTFS log file Logged Tool Stream just like NTFS metadata changes. Reparse Point Used for mounted drives and archives. Index Root Used to implement folders and other indexes. Used to implement the B-tree structure for large folders and other Index Allocation large indexes. Used to implement the B-tree structure for large folders and other Bitmap large indexes. Volume Information Used only in the $Volume system file. Contains the volume version. Volume Name Used only in the $Volume system file. Contains the volume label. 8 Alternate Data Streams C:\Users\rshee\Documents\ads>dir Directory of C:\Users\rshee\Documents\ads 0 File(s) 0 bytes 2 Dir(s) 45,499,203,584 bytes free C:\Users\rshee\Documents\ads>echo "this is an ADS attached to the 'ads test folder'" > :ads0.txt C:\Users\rshee\Documents\ads>dir Directory of C:\Users\rshee\Documents\ads 0 File(s) 0 bytes 2 Dir(s) 45,499,006,976 bytes free C:\Users\rshee\Documents\ads>echo "this is an ADS attached to 'file1.txt'" > file1.txt:ads1.txt C:\Users\rshee\Documents\ads>dir Directory of C:\Users\rshee\Documents\ads 18/09/2017 02:38 PM 0 file1.txt 1 File(s) 0 bytes 2 Dir(s) 45,499,006,976 bytes free C:\Users\rshee\Documents\ads>echo "this is another ADS attached to 'file1.txt'" > file1.txt:ads2.txt C:\Users\rshee\Documents\ads>dir Directory of C:\Users\rshee\Documents\ads 18/09/2017 02:39 PM 0 file1.txt 1 File(s) 0 bytes 2 Dir(s) 45,499,006,976 bytes free C:\Users\rshee\Documents\ads>more < :ads0.txt "this is an ADS attached to the 'ads test folder'" C:\Users\rshee\Documents\ads>more < file1.txt:ads1.txt "this is an ADS attached to 'file1.txt'" C:\Users\rshee\Documents\ads>more < file1.txt:ads2.txt "this is another ADS attached to 'file1.txt'" 9 Representing files on disk • We know that disks deal with constant size blocks. All files and information about files must be stored in these blocks (usually accessible via a logical block number). • We need to know what we want our file system to look like to the users in terms of its structure. • We also need to know how we can place the required information on the disk devices. 10 Structure • Data about files and other information about storage on a device is called metadata. • Usually a disk device has one or more directories to store metadata. • These directories can be arranged in different ways: • single level – simple, small systems did this, especially with small disk devices (floppies) • Disadvantages in finding files as the number of files grows (some implementations use a B-tree). • To be workable it requires very long filenames. 11 Multiple levels • two level – The top level (master file directory) is one entry per user on a multi-user system, the next level (user file directory) looks like a single level system to each user. • Creating a user file directory is usually only allowed for administrators. • User file directories can be allocated like other files. What about the master file directory? • When people log in they are placed within their own directories. Any files mentioned are in that directory. • Can use full pathnames to refer to other user’s files (if permissions allow it). 12 Normal file hierarchy tree – as many directories as required. This facilitates the organisation of collections of files. Directories are special files. Should users be able to write directly to their own directories? 13 Sharing files and directories • We commonly want the same data to be accessible from different places in the file hierarchy. e.g. Two people might be working on a project and they both want the project to be in their local directories. • This can be accomplished in several different ways: • If the data is read only we could just make an extra copy. • We could make two copies of the file record information (not the data). • There is a problem with consistency. • There can be one true file entry in one directory. Other entries have some reference to this entry. • UNIX symbolic links and Windows shortcuts. • There can be a separate table with file information. Then all directory entries for the same file just point to the corresponding information in this table. • UNIX and NTFS hard links. 14 Hard links UNIX ln ExistingFilename NewFilename • Each directory entry stores a pointer to the file’s inode (more on those soon) which holds the real information about the file. NTFS mklink /H NewFilename ExistingFilename • Each directory entry holds copies of most file attributes plus a pointer to the file’s Master File Table (MFT) file record (more on those soon).