ADVANCED I/O

ISA 563: Fundamentals of Systems Programming

Agenda

 File Locking
   File locking exercise

 Unix Domain Sockets

 Team Projects Time

File Locking Background

 Both high-performance and general-purpose database systems require atomic access to database resources & records

 Unix systems have evolved to support this need
   flock(2) – early locking primitive, only locks files
   fcntl(2) – raw, powerful, configurable interface
   lockf(3) – simplified API built on fcntl(2)

 Key ability is to lock sections, or records, in a file

Recall: Properties of Unix Files

 Unix files contain no markup

 A sequence of raw data bytes
 Contrast this with Windows files, which contain markup and index information

 File systems maintain maps to data blocks

Advisory vs. Mandatory Locking

 Advisory locking is for cooperating processes: a group of processes that have a “gentleman’s agreement” to only use a particular API to access data

 Mandatory locking puts the burden on the kernel (really, the kernel code implementing the open, read, and write system calls) to check access and intercept any process trying to access a locked file

Aside: Security Considerations

 Note the trust relationships here!
   TCB includes the kernel
 Note how mandatory locking is achieved via a complete abuse/hack on the file permission bits
   Set-group-ID ON, group-execute OFF
   The semantics of this combination are entirely unclear
   Similar to playing with unusual combinations of memory page permissions
 Mandatory locking can be used maliciously to prevent legitimate access to a file

Early Forms of File Locking: flock(2)

flock(2) is a system call that allows cooperating processes to lock a whole file. It uses advisory locks.

#include <sys/file.h>
int flock(int fd, int operation);

flock(2) operations

Symbol    Value   Meaning
LOCK_SH   1       Shared lock
LOCK_EX   2       Exclusive lock
LOCK_NB   4       Don’t block when locking
LOCK_UN   8       Unlock
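The operations in this table are combined with bitwise OR. Below is a minimal sketch, assuming Linux or BSD semantics, where each open(2) of a file yields an independent lock holder even within one process. The function name `flock_conflict_demo` and the path passed to it are hypothetical, chosen only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the slides): open `path` twice, take an
 * exclusive lock through the first descriptor, then attempt a non-blocking
 * exclusive lock through the second.
 * Returns 1 if the second attempt was refused with EWOULDBLOCK,
 * 0 if it unexpectedly succeeded, and -1 on any other error.
 */
int flock_conflict_demo(const char *path)
{
    int result = -1;
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);

    if (fd1 < 0 || fd2 < 0)
        goto out;

    if (flock(fd1, LOCK_EX) < 0)        /* blocks until the lock is granted */
        goto out;

    /* LOCK_NB is OR-ed into the operation, so this call never blocks. */
    if (flock(fd2, LOCK_EX | LOCK_NB) == 0)
        result = 0;
    else if (errno == EWOULDBLOCK)
        result = 1;

    flock(fd1, LOCK_UN);                /* explicit unlock */
out:
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);
    return result;
}
```

Calling `flock_conflict_demo("/tmp/demo.lock")` should return 1 on such systems: the second descriptor cannot obtain the lock while the first one holds it.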

Note how the LOCK_* values follow the flag pattern we discussed last class: they have significant bits at non-overlapping locations in their binary representation.

flock(2) Caveats

 flock(2) is useful but coarse-grained

 Advisory locking: only cooperating processes actually using the flock(2) interface obey the locking restrictions

 flock(2) locks whole files, not regions of a file
 Also, the lock is on the file, not a file descriptor. Child processes (via fork(2)) and new file descriptors (via dup(2)) do not result in a new lock. Children can thus unlock the file and cause the parent to lose the lock.

Raw Record Locking: fcntl(2)

 The fcntl(2) system call is a general interface for controlling file locking, including locking portions (regions) of a file

 More powerful than flock(2), and hence somewhat more complex

 The flock structure holds meta-data about the lock

The flock structure

 This structure describes:
   The type of lock
     Shared read: F_RDLCK
     Exclusive write: F_WRLCK
     Unlock: F_UNLCK
   The size of the region to lock (in bytes)
   The offset of where to begin locking (a combination of two parameters, l_start, which is relative to l_whence)
   A process ID (of a process that *may* already hold a lock on this file)

Using fcntl(2)

#include <fcntl.h>
int fcntl(int filedes, int cmd, ... /* struct flock *flockptr */);

Third argument is a pointer to a flock struct.
‘cmd’ is one of: F_GETLK, F_SETLK, F_SETLKW
‘filedes’ must be open for reading or writing (appropriate to the desired type of lock)

The fcntl ‘cmd’ parameter

Command    Description
F_GETLK    Determine whether the lock described by the flock structure would be blocked by a lock held by some other process. If so, the structure is overwritten with the details of that lock, including the holder's process ID in l_pid. If no such lock exists, the flock structure remains unchanged except that l_type is set to F_UNLCK
F_SETLK    Set (via F_RDLCK or F_WRLCK) or unset (via F_UNLCK) the lock; returns immediately with an error if the lock cannot be granted
F_SETLKW   The blocking version of F_SETLK (W means ‘wait’)
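The commands can be exercised with a short sketch like the one below. It is only an illustration under stated assumptions: the helper names `region_op` and `fcntl_region_demo`, the file path, and the byte ranges (0..9 and 100..109) are all hypothetical. Because fcntl(2) locks belong to a process, a forked child is used as the "other process".

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fill in a struct flock and issue one fcntl(2) call. */
static int region_op(int fd, int cmd, short type, off_t start, off_t len,
                     struct flock *out)
{
    struct flock fl = {0};
    int rc;

    fl.l_type = type;          /* F_RDLCK, F_WRLCK, or F_UNLCK */
    fl.l_whence = SEEK_SET;    /* l_start is relative to the start of file */
    fl.l_start = start;
    fl.l_len = len;            /* size of the region in bytes */
    rc = fcntl(fd, cmd, &fl);
    if (out != NULL)
        *out = fl;             /* F_GETLK may have rewritten the structure */
    return rc;
}

/*
 * The parent write-locks bytes 0..9 and forks. The child reports three
 * checks as bits in its exit status:
 *   bit 0: F_GETLK reports a conflicting F_WRLCK owned by the parent
 *   bit 1: F_SETLK on the same bytes fails with EAGAIN or EACCES
 *   bit 2: F_SETLK on bytes 100..109 succeeds
 * Returns that bitmask (7 when all three hold), or -1 on setup failure.
 */
int fcntl_region_demo(const char *path)
{
    int status = 0;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    pid_t parent = getpid();
    pid_t pid;

    if (fd < 0)
        return -1;
    if (region_op(fd, F_SETLK, F_WRLCK, 0, 10, NULL) < 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        struct flock seen;
        int bits = 0;

        if (region_op(fd, F_GETLK, F_WRLCK, 0, 10, &seen) == 0 &&
            seen.l_type == F_WRLCK && seen.l_pid == parent)
            bits |= 1;
        if (region_op(fd, F_SETLK, F_WRLCK, 0, 10, NULL) < 0 &&
            (errno == EAGAIN || errno == EACCES))
            bits |= 2;
        if (region_op(fd, F_SETLK, F_WRLCK, 100, 10, NULL) == 0)
            bits |= 4;
        _exit(bits);
    }
    if (pid < 0 || waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch the child attempts F_SETLK directly and inspects errno rather than deciding based on the F_GETLK answer, since a single F_SETLK call either obtains the lock or fails.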

Testing with F_GETLK and then trying to grab the lock with F_SETLK is not an atomic operation, thus two processes can race to grab the lock.

Simplifying Life with lockf(3)

 This function is a simplified API that uses fcntl(2) underneath in its implementation

 Usually used in conjunction with lseek(2) or fseek(3)
   Because it has no parameter to say where to lock from in a file, just the ‘size’ of the region to lock

The lockf(3) interface

#include <unistd.h>
int lockf(int filedes, int function, off_t size);

‘filedes’ must be open, either O_WRONLY or O_RDWR as appropriate for desired type of lock
‘size’ is the size (in bytes) of the region to lock. A value of zero means “lock through the largest possible size of the file”
‘function’ is described on next slide

The lockf(3) ‘function’ parameter

Function   Description
F_ULOCK    Unlock locked section
F_LOCK     Lock a section for exclusive use
F_TLOCK    Test and lock a section for exclusive use
F_TEST     Test a section for locks by other processes

Note that there is no distinction between read and write locks like with fcntl(2). Note also the “atomic” operation F_TLOCK. If testing and then locking with fcntl(2) is not atomic, then how might we get this operation to be atomic? Food for thought…

File Locking Experiment

Using flock(2), write two processes that share information via a file called “myinfo”. One process accepts user input and writes it to the file. The other process should attempt to lock this file and read data from it. Does this process always block?

Create a third process that writes to the file without obtaining a lock via flock(2). Observe results.

Unix Domain Sockets

 A form of fast IPC, using standard Unix names

 An alternative to using Internet sockets

 UDS datagrams (unlike Internet UDP) are reliable

Advantages of Unix Domain Sockets

 Can be referred to via a filename
   This is the standard Unix way of naming things, contrast with other forms of IPC that require a new, complex namespace
   Can use standard file tools (e.g., ls, rm) with them

 They are fast: they only copy data
   They do not involve:
     protocol state
     headers to add, remove, or checksum
     sequence numbers
     acknowledgements to send, no keepalives

More Advantages of UDS

 Both stream and datagram interfaces (like TCP and UDP)

 Datagram service is reliable:
   No lost messages
   Messages are delivered in order

 Can use network-based API or the socketpair() function

The socketpair(2) system call

 Set up a pair of unnamed UNIX domain sockets
   Endpoints will be connected
   But remain nameless
   After socketpair() returns, the only way to refer to the endpoints is via the 4th argument, an array of two socket descriptors

 Can use the standard sockets API (the same one used for Internet sockets) to bind an address (i.e., a pathname) to a UDS socket descriptor

The socketpair(2) signature

#include <sys/socket.h>
int socketpair(int domain, int type, int protocol, int sockfd[2]);
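As a rough illustration of this call, the sketch below creates a connected, unnamed datagram pair, sends two messages into one end, and reads them from the other. The function name `socketpair_demo` and the message contents are hypothetical; the arguments passed to socketpair() are the ones described right after this example.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send two datagrams over an unnamed AF_UNIX pair and read them back.
 * Returns 1 if both arrive intact and in order, 0 if a message was lost,
 * reordered, or could not be read, and -1 if setup or sending failed.
 */
int socketpair_demo(void)
{
    const char *msgs[] = { "first", "second" };
    char buf[32];
    int sockfd[2];
    int result = 1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sockfd) < 0)
        return -1;

    for (int i = 0; i < 2; i++) {
        if (send(sockfd[0], msgs[i], strlen(msgs[i]), 0) < 0)
            result = -1;
    }

    /* Each recv() returns exactly one datagram, in the order sent. */
    for (int i = 0; i < 2 && result == 1; i++) {
        ssize_t n = recv(sockfd[1], buf, sizeof buf, 0);

        if (n != (ssize_t)strlen(msgs[i]) ||
            memcmp(buf, msgs[i], (size_t)n) != 0)
            result = 0;
    }

    close(sockfd[0]);
    close(sockfd[1]);
    return result;
}
```

In a real program the two descriptors would usually end up in different processes, for example by calling fork(2) after socketpair() and closing one end in each process.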

‘domain’ should be AF_UNIX
‘type’ is ‘SOCK_STREAM’ or ‘SOCK_DGRAM’
‘protocol’ is optional, use zero
‘sockfd’ stores the two socket descriptor handles

Alternative: Using ‘socket(2)’

 Must use the ‘struct sockaddr_un’ structure to load in the desired name, then use the socket(2) system call to create the socket

 The ‘sun_path’ member of this structure is a statically sized character array that can hold a file name.

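 Example on next slide (no error checking, condensed from example on page 596 in APUE)

    #include <stddef.h>      /* offsetof */
    #include <string.h>      /* memset, strncpy, strlen */
    #include <sys/socket.h>
    #include <sys/un.h>

    void foo(void) {
        int sd, size;
        struct sockaddr_un un;

        memset(&un, 0, sizeof(un));
        un.sun_family = AF_UNIX;
        /* 15 characters plus the terminating NUL */
        strncpy(un.sun_path, "somesocket.sock", 16);
        sd = socket(AF_UNIX, SOCK_DGRAM, 0);
        /* length of the part of the address actually in use */
        size = offsetof(struct sockaddr_un, sun_path) + strlen(un.sun_path);
        bind(sd, (struct sockaddr *)&un, size);
    }

Team Projects Time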

Use the remaining time to meet with your team and discuss / work on projects.