Unix Directory Structure at NERC
iTSS
November 2005

Contents

1 Introduction
2 The UNIX directory tree
  2.1 Pseudo Filesystems
  2.2 Device Files
  2.3 Links
    2.3.1 Hard links
    2.3.2 Symbolic or soft links
  2.4 Mounting filesystems
    2.4.1 Mounting on boot
  2.5 File Sharing
  2.6 Comparison with Windows filesystems
3 The NERC Directory Structure
  3.1 Local Structure
  3.2 Network Structure
    3.2.1 The /users directory
    3.2.2 The /data directory
    3.2.3 The /packages or /nerc/packages directory
    3.2.4 The /nerc directory
  3.3 Other common mount points
  3.4 Singleton machines
Chapter 1
Introduction
When learning about UNIX or Linux for the first time you will often hear that "everything is a file unless it is a process". This may seem odd: your keyboard certainly doesn't look like a file, neither does your monitor, and what about directories? Well, in a very broad sense, the "everything is a file" statement is true. Directories are just files containing a list of other files, and things like the keyboard and monitor are accessed in the same way as files, appearing as special files in the directory tree. To read input from the keyboard, for instance, you read from a file. To write a message to the console, you write to a file. Thus, files are absolutely central to understanding a UNIX system.
Chapter 2
The UNIX directory tree
Let's take a look at the basic directory structure of a typical UNIX system. The hierarchical nature is fairly evident, with a single root (or trunk) which branches into many directories and subdirectories. The base of the hierarchy is the "/" directory, which is known as the "root" directory. Below this is the rest of the file system. Here is a list of directories found in the root of a Debian system:

bin      common programs shared by the system administrator and the users
boot     the kernel and startup files, possibly boot loader configuration
dev      files that describe hardware
etc      system configuration files
home     user home directories, mainly for stand-alone machines
lib      library files
media    contains common mount points for CD and floppy (Debian)
mnt      general purpose mount points
net      mount point for remote file systems
opt      used for 3rd party software
proc     special filesystem containing system information
root     system administrator home directory (Linux)
sbin     programs used by the system and sysadmin
scratch  mount point for scratch space (arbitrary)
sys      special filesystem for interacting with the kernel
tmp      temporary space available to the system and users
usr      subtree of programs, libraries, documentation for the users
var      storage of variable files such as logs, mail, print spools etc.

Underneath /usr you will find a similar tree to this, containing bin, sbin, lib etc. This is historical, dating from when fast storage was expensive: it allowed the /usr directory tree to be mounted from a slower storage unit (likely disk) whilst the essentials on the root partition could be on fast storage (such as drum). Nowadays it is quite unusual to split /usr from the root partition, and on Solaris you will find /bin is a symbolic link to /usr/bin. This filesystem layout will vary slightly between Solaris, Linux and Irix, but not so much that you won't be able to find your way around. It is worth mentioning that there is an attempt to produce a standard hierarchy, the Filesystem Hierarchy Standard (FHS), which should allow software and users to predict the location of installed files and directories.
2.1 Pseudo Filesystems
On all Unix systems you will come across "pseudo" files. These are files that are not related to storage space. An example would be /dev/null, known as the bit bucket. Anything written to this file is simply thrown away. Very useful for discarding unwanted output.

Sometimes you will come across entire filesystems that don't exist on disk. The most obvious of these is /proc. This virtual filesystem documents kernel and process information and can sometimes be used to tune a running kernel. Here is the "content" of /proc/cpuinfo from my desktop:

    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 15
    model           : 2
    model name      : Intel(R) Pentium(R) 4 CPU 2.40GHz
    stepping        : 7
    cpu MHz         : 2399.994
    cache size      : 512 KB
    fdiv_bug        : no
    hlt_bug         : no
    f00f_bug        : no
    coma_bug        : no
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 2
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
    bogomips        : 4751.36

This shows that I have a 2.4GHz Pentium 4 clocked at the correct speed and with half a megabyte of cache. I can also tell from this that the processor is capable of hyperthreading, but that this is obviously disabled (the CPU would look like two individual CPUs to the kernel, so there would be further information for processor 1). Lots of interesting information is available through /proc and it is used by common system tools such as ps and top, as we shall see later in the course. The Solaris version of /proc contains similar information but in a less easily readable format.

There are many other virtual filesystems available for Linux. /dev is sometimes a dynamic virtual filesystem. This is because /dev was becoming very large as Linux distributions filled it with device files describing every type of hardware likely to be found on a desktop system. If we make this filesystem dynamic, we can generate the correct device file as we need it. When you come across this filesystem on Linux it is known as udev (there is also a filesystem called devfs, which is now obsolete). Other examples of special filesystems include sysfs, which is used as an API to the kernel. Along with udev, this is very useful for interacting with hotplug devices that may require additional kernel modules and firmware. tmpfs is a filesystem that exists in memory only, which is ideal for /tmp; all contents of a tmpfs filesystem are lost during a power cycle. It's also possible that you will encounter the selinuxfs filesystem, which exposes the API of SELinux to userspace.
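The bit-bucket behaviour of /dev/null described above is easy to see for yourself. This short sketch writes to /dev/null and then shows that the "file" is always empty:

```shell
# Anything written to /dev/null simply vanishes.
echo 'discard me' > /dev/null

# Reading it back gives immediate end-of-file: zero bytes.
wc -c < /dev/null
```

The second command prints 0 regardless of how much has ever been written to /dev/null, which is what makes it so handy for discarding unwanted output.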
2.2 Device Files
One of the smarter ideas the unix designers had was to make all files consist of a stream of characters. There are no structures in this stream: no record lengths or block lengths or buffer lengths, no nonsense about fixed length record files and variable length record files. There is just a stream of characters. One of these characters is the newline character (ASCII NL), which is used as a record separator by text applications, but unix itself does not care about that (though a printing utility certainly will). Those of you who have never had to struggle with data descriptors and such things on IBM mainframes or VAXes will probably never realise just how good an idea this was. Having made this simplification, they then took the next step and realised that devices such as disks, tapes, terminals and so on can also be represented as streams of characters, at least as far as users are concerned. Inside the kernel, of course, there are device drivers which have to understand about SCSI commands and magnetic tape blocks and so on, but this is kept hidden from the user so that, for example:

echo 'Hello' > output.txt

writes "Hello" into a file called output.txt, and

echo 'Hello' > /dev/tty1

writes "Hello" onto the first terminal. A moment's thought should convince you that what actually happens behind the scenes is quite different in these two cases. /dev/tty1 is actually an interface between user-land and the terminal device driver, which is part of the kernel. This may seem like fingernail-painting, but it is very useful if you want to write software which is independent of the details of your hardware. In fact, all user-mode software is hardware-independent by default, and you have to go to extraordinary lengths to make a program dependent on a particular hardware device.
If you look under the /dev directory you will find "device files" which represent all the various devices attached to your machine. It is mostly pretty obvious which device files are an interface to which hardware devices, but I probably need to say a few words about the Solaris (SPARC) disk naming convention, which is very general (it was designed for very large systems with hundreds, even thousands, of disks) but appears unnecessarily convoluted for a small workstation. The device files which represent the disks on a Solaris machine can be found in two directories: /dev/dsk and /dev/rdsk. The names of the files in these two directories are the same and are something like c0t1d0s2. Here, c0 means controller 0 (the first controller), t1 means SCSI target 1, and d0 means LUN 0 or device 0. LUNs are not commonly used except in some RAID systems and tape libraries. s2 means slice (partition) 2. There are up to 8 partitions on a Solaris disk (s0 - s7). They are defined using the format program. By convention, s2 is always a pseudo partition covering the whole disk - do not remove or change this, ever.
Linux uses a simpler system. Disks are either /dev/hdx or /dev/sdx depending on whether the kernel sees them as IDE or SCSI targets. Serial ATA (SATA) disks are treated like SCSI as they employ the SCSI instruction set; the same is true for USB attached storage. Soon parallel ATA (PATA), also known as IDE, will be a deprecated format. The first disk to be found by the kernel on boot will be labelled "a", the second "b" and so on. The important thing to remember here is that they are labelled on boot. The label bears little relation to the SCSI target or to whether the disk is master or slave on the ATA bus. The first SCSI-type storage device to be found by the kernel will be /dev/sda and the first PATA device will be labelled /dev/hda. Therefore /dev/sda1 signifies partition 1 on the first disk. As the label bears little relation to the target numbering, we must be careful, as the following example illustrates.
Consider that you have a server with two disks set up in a mirror configuration for resilience. These disks are SCSI, so the two disks are known to the kernel as /dev/sda and /dev/sdb. Suppose that the first disk develops a fault. It is reasonable to expect that you might remove the first disk and run with a degraded mirror on just one disk. However, when the machine comes up with only one disk, that disk will be /dev/sda no matter which disk failed. If you then "hot add" a new disk into the running system, this disk will become /dev/sdb. This in itself is nothing to worry about, but when you are diagnosing problems with disks, remember that /dev/sda is not necessarily the lowest SCSI target. In the case above, it may be that /dev/sda is a higher SCSI target than /dev/sdb (normally the disks are "discovered" by the kernel during boot in the order that they exist on the bus). If you are unlucky enough to suffer a second disk problem on this host (with no reboot in between), you must remember to check the correspondence between disk device and physical SCSI target before pulling disks out. This is most easily done by examining the boot log.
Returning to Solaris on SPARC, it was mentioned above that there are two device files for each partition: a "raw" device and a "block" device, found in /dev/rdsk and /dev/dsk respectively. Always use the block device when mounting partitions; in most other situations you will use the raw devices.
2.3 Links
In a strict mathematical sense, the unix file structure is not a simple, hierarchical tree because of links. There are two kinds of links in the unix file store.
2.3.1 Hard links

To create a hard link, use the command ln patha pathb. This defines pathb as an alternative name for patha. patha and pathb do not need to be in the same directory, but they do have to be on the same filesystem. The two names are completely equivalent: if you edit either patha or pathb, the changes will be seen through both names. If you remove one name, the file remains and can be accessed using the other name. You cannot create hard links to directories (this is not allowed due to the danger of loops occurring when traversing the file store recursively).
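The behaviour described above can be demonstrated in a scratch directory (the paths here are chosen for illustration only):

```shell
# Work in a throwaway directory
mkdir -p /tmp/hardlink-demo
cd /tmp/hardlink-demo

echo 'original contents' > patha   # create a file
ln patha pathb                     # hard link: a second name for the same file

ls -l patha                        # the link count column now shows 2

echo 'more text' >> pathb          # edit via one name...
cat patha                          # ...and the change is visible via the other

rm patha                           # remove one name...
cat pathb                          # ...the file itself survives under the other
```

After the final rm, pathb's link count drops back to 1, and the data is only freed once the last name (and any open file handles) are gone.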
2.3.2 Symbolic or soft links

To create a symbolic link, use the command ln -s patha pathb. Again pathb is an alternative name for patha, but the two names are not equivalent: pathb is a pointer to patha. If patha is removed, pathb remains but points to nothing (a dangling link). You can create symbolic links to directories as well as files. Symbolic links can point to another file system or even another computer. Because of this, they are the great kludge in unix file stores. The example quoted earlier on Solaris (/bin pointing to /usr/bin) is typical. Almost always, when someone uses a symbolic link, it is a "quick fix". Symbolic links can be nested, though there is a limit on the depth of nesting, again to catch loops.
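The contrast with hard links, including the dangling-link case, looks like this in practice (paths are again purely illustrative):

```shell
mkdir -p /tmp/symlink-demo
cd /tmp/symlink-demo

echo 'target data' > patha
ln -s patha pathb          # pathb is a pointer to patha, not an equal name

cat pathb                  # following the link prints "target data"
ls -l pathb                # the listing shows "pathb -> patha"

rm patha                   # remove the target...
ls -l pathb                # ...the link itself remains...
cat pathb 2>/dev/null || echo 'dangling link'   # ...but it points to nothing
```

Note the asymmetry: removing pathb would leave patha untouched, but removing patha leaves pathb dangling.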
2.4 Mounting filesystems
When talking about hard links in section 2.3.1, we found that these can only exist within a single filesystem. A filesystem (in UNIX terms) is a single partition. This definition becomes slightly more fuzzy when we use more advanced volume management tools, but the basic premise still holds. Each filesystem resides on a single (sometimes arbitrary) slice of storage medium. On Windows (and some other operating systems) it is possible to refer to each filesystem by name (Windows uses a drive letter). However, on UNIX-like operating systems, we use the mount command to link all the filesystems together so that they look like a unified homogeneous tree containing the entire file store. As an example, if we have our root partition on /dev/sda1 and our var partition on /dev/sda2, we would then:

mount /dev/sda2 /var
so that everything in the var partition now exists in the /var directory. A filesystem can be mounted at different times in different places. This concept is used when repairing a system that cannot be booted: the host is booted from a rescue image, and the filesystems on the local disks can then be mounted on arbitrary mount points and repaired.
The next step is obvious: extend this concept to disks attached to other computers. This is what NFS (Network File System) is all about.

mount livcomms:/local1/data /data

Note the machine name followed by a colon; this syntax is very common in NFS commands. You can mount, on your machine, file systems which reside on a disk attached to another machine - the NFS server. Your machine is the NFS client. I have introduced two important words here. Also, a single machine can be an NFS server and an NFS client (for different file systems) at the same time.
2.4.1 Mounting on boot

Obviously, when a machine starts up, it has a table of mounts (both local disk mounts and NFS mounts) to be performed as part of the boot process, before users are allowed to log in. These are listed in a file called /etc/fstab (/etc/vfstab on Solaris). The exact detail of the layout of this file varies a bit, so RTFM! Note also that this file is used to define other disk space parameters such as the size and location of swap space. Here is an example of a Linux /etc/fstab:

# /etc/fstab: static file system information.
#
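The entries themselves vary from machine to machine; a minimal illustrative layout for a single-disk Linux host might look like the following (the device names, partition layout and mount options here are assumptions for the sake of example, not taken from a real NERC host):

```
# <file system>        <mount point>  <type>  <options>      <dump>  <pass>
/dev/sda1              /              ext3    defaults       0       1
/dev/sda2              /boot          ext3    defaults       0       2
/dev/sda3              /var           ext3    defaults       0       2
/dev/sda4              none           swap    sw             0       0
proc                   /proc          proc    defaults       0       0
livcomms:/local1/data  /data          nfs     rw,hard,intr   0       0
```

The last line shows an NFS mount expressed in the same host:path syntax introduced above; see fstab(5) for the meaning of the dump and pass fields.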
As you may be suspecting, the warning about the way that Linux names its disks in section 2.2 can come back to haunt us here. This is particularly noticeable on large file servers with many SCSI disks attached: adding another disk can throw out all the device names, and /etc/fstab will have to be rejigged. To get around this problem, it is possible to "label" each partition, and this is the default method on the Redhat distributions. The label is part of the filesystem and it is a simple matter to add a label to an existing partition:

/sbin/tune2fs -L root /dev/sda1
/sbin/tune2fs -L boot /dev/sda2
/sbin/tune2fs -L var /dev/sda3

or when creating the filesystem:

/sbin/mkfs -t ext3 -L scratch /dev/hda7

and then your fstab may contain lines like this:

LABEL=root /     ext3 defaults 0 1
LABEL=boot /boot ext3 defaults 0 2
LABEL=var  /var  ext3 defaults 0 2

Similar utilities exist for labelling other filesystems, such as XFS. Note that it is not possible to label the swap partition. Managing disks and partitions can be made a lot easier by using Logical Volume Management. There are a number of free and commercial systems available. The commonest one on Linux is LVM2, which is available when installing a number of Linux distros, including Redhat and Debian. I will not go into detail here; there is plenty of information available on the web.
2.5 File Sharing
File sharing on Linux and UNIX is usually done using NFS. This is a venerable application, compatible with just about any version of UNIX, Linux or BSD. Other applications for file sharing exist, but this is by far the most widely used. To share a file system from a Solaris host you would use the share(1M) command and /etc/dfs/dfstab. On Linux and most other systems, you would use the exportfs(8) command and the /etc/exports file. The /etc/exports file on a typical Linux fileserver might look like:

# exports file for Bush file server
# NERC master and Bush domain exports below
/local/master 192.171.136.0/24(ro,sync)
/local/bush/domain 192.171.136.0/24(ro,sync) bufiles(rw,sync)
# experimental (for now) switch to nerc-bush.ac.uk domain
/local/nerc-bush.ac.uk/domain 192.171.136.0/24(ro,sync) bufiles(rw,sync)
# application packages shares below
/local/packages 192.171.136.0/24(ro,sync) budbase(rw,sync)
# user shares below
/local/users 192.171.136.0/24(rw,sync)
/model/users 192.171.136.0/24(rw,sync)
/poll/users 192.171.136.0/24(rw,sync)
/ifetrop/users 192.171.136.0/24(rw,sync)
# data areas below
/model/data 192.171.136.0/24(rw,sync)
/poll/data 192.171.136.0/24(rw,sync)
/ifetrop/data 192.171.136.0/24(rw,sync)
#
/var/spool/mail 192.171.136.0/24(rw,sync)
/cache/condor_ckpt 192.171.136.0/24(rw,sync)

Make changes to this file and then export everything in it with /usr/sbin/exportfs -a. Solaris is similar; here is an example of /etc/dfs/dfstab. Note that long lines may be split using the "\" character.

share -F nfs -o [email protected] -d "mail spool" /var/mail
share -F nfs -o [email protected],rw=livcomms:livhome \
    -d "/nerc structure" /local/nerc-liv.ac.uk
share -F nfs -o [email protected],rw=livcomms -d "/nerc structure" \
    /local/master
#
share -F nfs -o [email protected] -d "Solaris packages" \
    /local1/packages
share -F nfs -o [email protected] /local1/data
#
share -F nfs -o [email protected] -d "Home dirs" /local/users

Exporting all shares listed in this file can be done by using the shareall(1M) command. There are one or two things that need to be known before plunging into an implementation of NFS.

1. NFS does not do file locking. File locking is provided by an additional piece of software. Normally this is not a problem, but for certain file access patterns it can be a real performance killer.

2. NFS does not export any metadata associated with a file. Modern filesystems allow extra file attributes to be stored alongside the file itself, mostly used for access control list (ACL) functionality. Such information is not visible on files exported by NFS.

These comments relate to NFSv3. NFSv4 provides more functionality and security but is still not widely deployed.
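On the subject of point 1 above: on a local Linux filesystem you can experiment with advisory locking using the flock(1) utility from util-linux. This is only a local sketch; over NFSv3 the equivalent locks are handled by the separate lockd/statd daemons rather than by the NFS protocol itself, which is exactly why certain access patterns suffer. The lock file path here is arbitrary:

```shell
# Take an advisory lock on a lock file and run a command while holding it.
# flock creates the lock file if it does not already exist.
lockfile=/tmp/nfs-lock-demo.lock
flock "$lockfile" -c 'echo "holding the lock"'

# A second, non-blocking attempt made while another process held the lock
# would fail immediately rather than wait:
#   flock -n "$lockfile" -c '...' || echo 'lock busy'
```

Advisory means cooperating processes must all choose to take the lock; nothing stops an ill-behaved process from ignoring it and writing anyway.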
2.6 Comparison with Windows filesystems
This is a bit of a minefield: a unix file system looks very like a Windows file system (especially NTFS), but there are all sorts of differences once you get into it. There are the obvious differences which everyone knows about:

- case sensitivity
- line terminators
- path component separator
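The first two of these are easy to demonstrate on any Linux box (the file names below are arbitrary examples):

```shell
mkdir -p /tmp/case-demo
cd /tmp/case-demo

# Case sensitivity: README and readme are two distinct files on unix,
# whereas Windows filesystems would treat them as the same name.
echo 'upper' > README
echo 'lower' > readme
ls                      # both names are listed

# Line terminators: unix text lines end with LF only; Windows uses CR LF.
printf 'one line\n' > unixfile
od -c unixfile          # shows \n with no preceding \r
```

The third difference is purely notational: unix paths use "/" (as in /usr/local/bin) where Windows uses "\".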