ADVANCED I/O
ISA 563: Fundamentals of Systems Programming

Agenda
File Locking
File locking exercise
Unix Domain Sockets
Team Projects Time

File Locking Background
Both high-performance and general-purpose database systems require atomic access to database resources & records
Unix systems have evolved to support this need:
    flock(2): early locking primitive, only locks files
    fcntl(2): raw, powerful, configurable interface
    lockf(3): simplified API built on fcntl(2)
Key ability is to lock sections, or records, in a file

Recall: Properties of Unix Files
Unix files contain no markup
A sequence of raw data bytes
Contrast this with Windows files, which contain markup and index information
File systems maintain maps to data blocks

Advisory vs. Mandatory Locking
Advisory locking is for cooperating processes: a group of processes that have a “gentleman’s agreement” to only use a particular API to access data
Mandatory locking puts the burden on the kernel (really, the file system code): it checks each open, read, and write system call and intercepts any process that tries to access a locked file

Aside: Security Considerations
Note the trust relationships here! TCB includes the kernel
Note how mandatory locking is achieved via a complete abuse/hack on the file permission bits
    Set-group-ID ON, group-execute OFF
    The semantics of this combination are entirely unclear
    Similar to playing with unusual combinations of memory page permissions
Mandatory locking can be used maliciously to prevent legitimate access to a file

Early Forms of File Locking: flock(2)
flock(2) is a system call that allows cooperating processes to lock a whole file. It uses advisory locks.
#include <sys/file.h>
int flock(int fd, int operation);
Symbol    Value   Meaning
LOCK_SH   1       Shared lock
LOCK_EX   2       Exclusive lock
LOCK_NB   4       Don’t block when locking
LOCK_UN   8       Unlock
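A minimal sketch of how the symbols in the table might be used, assuming a Linux or BSD system. The helper names try_exclusive_lock and release_lock are hypothetical, chosen only for illustration; they are not part of any library.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: try to take an exclusive flock(2) lock on path
 * without blocking. Returns the open descriptor (lock held) on success,
 * or -1 on failure; errno is EWOULDBLOCK if the file is already locked
 * through another open of the same file. */
int try_exclusive_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    /* LOCK_EX | LOCK_NB combines two flags with bitwise OR. */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: release the lock and close the descriptor. */
int release_lock(int fd)
{
    if (flock(fd, LOCK_UN) < 0)
        return -1;
    return close(fd);
}
```

Dropping LOCK_NB from the call would make flock(2) wait until the lock becomes available instead of failing.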
Note how the LOCK_* values follow the flag pattern we discussed last class: each has its significant bit at a different, non-overlapping position in its binary representation, so they can be combined with bitwise OR.

flock(2) Caveats
flock(2) is useful but coarse-grained
Advisory locking: only cooperating processes actually using the flock(2) interface obey the locking restrictions
flock(2) locks whole files, not regions of a file
Also, the lock belongs to the open file (the kernel's open file table entry), not to a particular file descriptor. Child processes (via fork(2)) and duplicated descriptors (via dup(2)) share the existing lock rather than obtaining a new one. A child can thus unlock the file and cause the parent to lose the lock.

Raw Record Locking: fcntl(2)
The fcntl(2) system call is a general interface for controlling open files; among other things, it can lock a whole file or portions (regions) of a file
More powerful than flock(2), and hence somewhat more complex
The flock structure holds meta-data about the lock

The flock structure
This structure describes:
    The type of lock
        Shared read: F_RDLCK
        Exclusive write: F_WRLCK
        Unlock: F_UNLCK
    The size of the region to lock (in bytes)
    The offset of where to begin locking (a combination of two parameters: l_start, which is relative to l_whence)
    A process ID (of a process that *may* already hold a lock on this file)

Using fcntl(2)
#include <fcntl.h>
int fcntl(int filedes, int cmd, ... /* struct flock *flockptr */);
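A minimal sketch of filling in a flock structure and passing it to fcntl(2). The helper name lock_region is hypothetical, and measuring the offset from the start of the file (SEEK_SET) is an assumption made here for simplicity.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply a lock of the given type (F_RDLCK, F_WRLCK,
 * or F_UNLCK) to len bytes starting at byte offset start, measured from
 * the beginning of the file. Uses F_SETLK, so it returns -1 immediately
 * instead of waiting if another process holds a conflicting lock. */
int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type   = type;
    fl.l_whence = SEEK_SET;   /* l_start is relative to the file start */
    fl.l_start  = start;
    fl.l_len    = len;        /* 0 would mean "through end of file" */

    return fcntl(fd, F_SETLK, &fl);
}
```

For example, lock_region(fd, F_WRLCK, 0, 100) would request an exclusive lock on the first 100 bytes, and the same call with F_UNLCK would release it.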
Third argument is a pointer to a flock struct.
‘cmd’ is one of: F_GETLK, F_SETLK, F_SETLKW
‘filedes’ must be open for reading or writing (appropriate to the desired type of lock)

The fcntl ‘cmd’ parameter
Command    Description
F_GETLK    Determine whether the lock described by the flock structure would be blocked by a lock held by some other process. If so, the structure is overwritten with a description of that lock, and its l_pid member is filled in with that process ID. If no such lock exists, the flock structure remains unchanged except that l_type is set to F_UNLCK
F_SETLK    Set the lock (via F_RDLCK or F_WRLCK) or unset it (via F_UNLCK); returns immediately with an error if the lock cannot be granted
F_SETLKW   The blocking version of F_SETLK (W means ‘wait’)
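The commands in the table might be exercised roughly as follows. This is a sketch under stated assumptions: the helper names conflicting_holder and write_lock_wait are hypothetical, and both measure offsets from the start of the file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask (F_GETLK) whether a write lock on len bytes
 * at offset start would be blocked. Returns the PID of a process holding
 * a conflicting lock, 0 if the region is free, or -1 on error. Locks
 * held by the calling process itself never conflict, so they are not
 * reported. */
pid_t conflicting_holder(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;
    fl.l_len    = len;

    if (fcntl(fd, F_GETLK, &fl) < 0)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}

/* Hypothetical helper: take a write lock on the region, sleeping
 * (F_SETLKW) until any conflicting lock is released. */
int write_lock_wait(int fd, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start  = start;
    fl.l_len    = len;

    return fcntl(fd, F_SETLKW, &fl);
}
```

The result of conflicting_holder is only a snapshot: another process may take the lock immediately after the call returns, so a program that wants the lock should simply request it and handle failure.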
Testing with F_GETLK and then trying to grab the lock with F_SETLK is not an atomic operation, thus two processes can race to grab the lock.

Simplifying Life with lockf(3)
This function is a simplified API that uses fcntl(2) underneath in its implementation
Usually used in conjunction with lseek(2) or fseek(3), because it has no parameter to say where in the file to start locking, just the ‘size’ of the region to lock; the region starts at the current file offset

The lockf(3) interface
#include <unistd.h>
int lockf(int filedes, int function, off_t size);
‘filedes’ must be open, either O_WRONLY or O_RDWR as appropriate for desired type of lock
‘size’ is the size (in bytes) of the region to lock. A value of zero means “lock through the largest possible size of the file”
‘function’ is described on next slide

The lockf(3) ‘function’ parameter
Function   Description
F_ULOCK    Unlock locked section
F_LOCK     Lock a section for exclusive use
F_TLOCK    Test and lock a section for exclusive use
F_TEST     Test a section for locks by other processes
Note that there is no distinction between read and write locks like with fcntl(2).
Note also the “atomic” operation F_TLOCK. If testing and then setting a lock with fcntl(2) is not atomic, then how might we get this operation to be atomic? Food for thought…

File Locking Experiment
Using flock(2), write two processes that share information via a file called “myinfo”. One process accepts user input and writes it to the file. The other process should attempt to lock this file and read data from it. Does this process always block?
Create a third process that writes to the file without obtaining a lock via flock(2). Observe results.

Unix Domain Sockets
A form of fast IPC, using standard Unix names
An alternative to using Internet sockets
UDS datagrams (unlike Internet UDP) are reliable

Advantages of Unix Domain Sockets
Can be referred to via a filename
    This is the standard Unix way of naming things; contrast with other forms of IPC that require a new, complex namespace
    Can use standard file tools (e.g., ls, rm) with them
They are fast: they only copy data
They do not involve:
    protocol state
    headers to add, remove, or checksum
    sequence numbers
    acknowledgements to send
    keepalives

More Advantages of UDS
Both stream and datagram interfaces (like TCP and UDP)
Datagram service is reliable:
    No lost messages
    Messages are delivered in order
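A small sketch of the datagram properties listed above: two messages sent over a connected pair of Unix domain datagram sockets come back whole and in the order sent. The function name dgram_order_demo is hypothetical, and using socketpair(2) to obtain the two endpoints is an assumption made to keep the sketch short.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: send two datagrams over a connected AF_UNIX
 * SOCK_DGRAM pair and read them back. Returns 0 if both arrive intact
 * and in the order sent, -1 otherwise. */
int dgram_order_demo(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], "first", 5, 0) == 5 &&
        send(sv[0], "second", 6, 0) == 6) {
        /* Each recv() returns exactly one message, never a merged one. */
        n = recv(sv[1], buf, sizeof buf, 0);
        if (n == 5 && memcmp(buf, "first", 5) == 0) {
            n = recv(sv[1], buf, sizeof buf, 0);
            if (n == 6 && memcmp(buf, "second", 6) == 0)
                ok = 0;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```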
Can use the network-based sockets API or the socketpair() function

The socketpair(2) system call
Set up a pair of unnamed UNIX domain sockets
    Endpoints will be connected
    But remain nameless
    After socketpair() returns, the only way to refer to the endpoints is via the 4th argument, an array of two socket descriptors
Can use the standard sockets API (bind(2)) to bind an address (i.e., a pathname) to a UDS socket descriptor

The socketpair(2) signature
#include <sys/socket.h>
int socketpair(int domain, int type, int protocol, int sockfd[2]);
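A minimal sketch of the common parent/child use of socketpair(2). The function name socketpair_demo and the message text are hypothetical; error handling is kept to the minimum needed to return a sensible result.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child sends "hello" to its parent over an unnamed
 * AF_UNIX stream pair. The parent copies what it reads into out as a
 * NUL-terminated string (at most outlen - 1 bytes) and returns the
 * number of bytes read, or -1 on error. */
ssize_t socketpair_demo(char *out, size_t outlen)
{
    int sv[2];
    pid_t pid;
    ssize_t n;

    if (outlen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                 /* child keeps sv[1] */
        close(sv[0]);
        _exit(write(sv[1], "hello", 5) == 5 ? 0 : 1);
    }

    close(sv[1]);                   /* parent keeps sv[0] */
    n = read(sv[0], out, outlen - 1);
    close(sv[0]);
    waitpid(pid, NULL, 0);

    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

Because both endpoints are created before fork(2), the two processes share the connection without either socket ever needing a name.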
‘domain’ should be AF_UNIX
‘type’ is ‘SOCK_STREAM’ or ‘SOCK_DGRAM’
‘protocol’ is normally zero (the default protocol)
‘sockfd’ stores the two socket descriptor handles

Alternative: Using ‘socket(2)’
Use the socket(2) system call to create the socket, load the desired name into a ‘struct sockaddr_un’ structure, then bind(2) that address to the socket
The ‘sun_path’ member of this structure is a statically sized character array that can hold a file name.
Example on next slide (no error checking, condensed from example on page 596 in APUE)
#include
Use the remaining time to meet with your team and discuss / work on projects.