Secure Programmer: Prevent Race Conditions

Secure programmer: Prevent race conditions Resource contention can be used against you David Wheeler ([email protected]), Research Staff Member, Institute for Defense Analyses Summary: Learn what a race condition is and why it can cause security problems. This article shows you how to handle common race conditions on UNIX-like systems, including how to create lock files correctly, alternatives to lock files, how to handle the file system, and how to handle shared directories -- and, in particular, how to correctly create temporary files in the /tmp directory. You'll also learn a bit about signal handling. Date: 07 Oct 2004 Level: Intermediate Also available in: Japanese Activity: 9098 views Comments: (View | Add comment - Sign in) Rate this article Using a stolen password, "Mallory" managed to log into an important server running Linux. The account was limited, but Mallory knew how to cause trouble with it. Mallory installed and ran a trivial program with odd behavior: It quickly created and removed many different symbolic link files in the /tmp directory, using a multitude of processes. (When accessed, a symbolic link file, also called a symlink, redirects the requester to another file.) Mallory's program kept creating and removing many different symlinks pointing to the same special file: /etc/passwd, the password file. A security precaution on this server was that every day, it ran the security program Tripwire -- specifically, the older version, 2.3.0. As Tripwire started up, it tried to create a temporary file, as many programs do. Tripwire looked and found that there was no file named /tmp/twtempa19212, so that appeared to be a good name for a temporary file. But after Tripwire checked, Mallory's program then created a symlink with that very name. This was no accident. Mallory's program was designed to create the very file names most likely to be used by Tripwire. Tripwire then opened up the file and starting writing temporary information. But instead of creating a new empty file, Tripwire was now overwriting the password file. From then on, no one -- not even the administrators -- could log into the system, because the password file had been corrupted. It could have been worse. Mallory's attack could have overwritten any file, including critical data stored on the server. Introduction to race conditions The story is hypothetical; Mallory is a conventional name for an attacker. But the attack, and the vulnerability it exploits, are all too common. The problem is that many programs are vulnerable to a security problem called a race condition. A race condition occurs when a program doesn't work as it's supposed to because of an unexpected ordering of events that produces contention over the same resource. Notice that a race condition doesn't need to involve contention between two parts of the same program. Many security problems occur if an outside attacker can interfere with a program in unexpected ways. For example, Tripwire V2.3.0 determined that a certain file didn't exist, and then tried to create it, not taking into account that the file could have been created by an attacker between those two steps. Decades ago, race conditions were less of a problem. Then, a computer system would often run one simple sequential program at a time, and nothing could interrupt it or vie with it. But today's computers typically have a large number of processes and threads running at the same time, and often have multiple processors executing different programs simultaneously. This is more flexible, but there's a danger: If those processes and threads share any resources, they may be able to interfere with each other. In fact, a vulnerability to race conditions is one of the more common vulnerabilities in software, and on UNIX-like systems, the directories /tmp and /var/tmp are often misused in a way that opens up race conditions. But first, we need to know some terminology. All UNIX-like systems support user processes; each process has its own separate memory area, normally untouchable by other processes. The underlying kernel tries to make it appear that the processes run simultaneously. On a multiprocessor system, they really can run simultaneously. A process notionally has one or more threads, which share memory. Threads can run simultaneously, as well. Since threads can share memory, there are usually many more opportunities for race conditions between threads than between processes. Multithreaded programs are much harder to debug for that very reason. The Linux kernel has an elegant basic design: There are only threads, and some threads happen to share memory with other threads (thus implementing traditional threads), while others don't (thus implementing separate processes). To understand race conditions, let's first look at a trivial C statement: Listing 1. Trivial C statement b = b + 1; Looks pretty simple, right? But let's pretend that there are two threads running this line of code, where b is a variable shared by the two threads, and b started with the value 5. Here's a possible order of execution: Listing 2. Possible order of execution with shared "b" (thread1) load b into some register in thread 1. (thread2) load b into some register in thread 2. (thread1) add 1 to thread 1's register, computing 6. (thread2) add 1 to thread 2's register, computing 6. (thread1) store the register value (6) to b. (thread2) store the register value (6) to b. We started with 5, then two threads each added one, but the final result is 6 -- not the expected 7. The problem is that the two threads interfered with each other, causing a wrong final answer. In general, threads do not execute atomically, performing single operations all at once. Another thread may interrupt it between essentially any two instructions and manipulate some shared resource. If a secure program's thread is not prepared for these interruptions, another thread may be able to interfere with the secure program's thread. Any pair of operations in a secure program must still work correctly if arbitrary amounts of another thread's code is executed between them. The key is to determine, when your program is accessing any resource, if some other thread could interfere with it. Back to top Solving race conditions The typical solution to a race condition is to ensure that your program has exclusive rights to something while it's manipulating it, such as a file, device, object, or variable. The process of gaining an exclusive right to something is called locking. Locks aren't easy to handle correctly. One common problem is a deadlock (a "deadly embrace"), in which programs get stuck waiting for each other to release a locked resource. Most deadlocks can be prevented by requiring all threads to obtain locks in the same order (for example, alphabetically, or "largest grain" to "smallest grain"). Another common problem is a livelock, where programs at least manage to gain and release a lock, but in such a way that they can't ever progress. And if a lock can get stuck, it can be very difficult to make it gracefully release. In short, it's often difficult to write a program that correctly, in all circumstances, locks and releases locks as needed. A common mistake to avoid is creating lock files in a way that doesn't always lock things. You should learn to create them correctly or switch to a different locking mechanism. You'll also need to handle races in the file system correctly, including the ever-dangerous shared directories /tmp and /var/tmp, and how signals can be used safely. The sections below describe how to deal with them securely. Back to top Lock files UNIX-like systems have traditionally implemented a lock shared between different processes by creating a file that indicates a lock. Using a separate file to indicate a lock is an example of an advisory lock, not a mandatory lock. In other words, the operating system doesn't enforce the resource sharing through the lock, so all processes that need the resource must cooperate to use the lock. This may appear primitive, but not all simple ideas are bad ones. Creating separate files makes it easy to see the state of the system, including what's locked. There are some standard tricks to simplify clean-up if you use this approach -- in particular, to remove stuck locks. For example, a parent process can set a lock, call a child to do the work (make sure only the parent can call the child in a way that it can work), and when the child returns, the parent releases the lock. Or, a cron job can look at the locks, which contain a process id. If the process isn't alive, it would erase the lock and restart the process. Finally, the lock file can be erased as part of system start-up, so that if the system suddenly crashes, you won't have a stuck lock later. If you're creating separate files to indicate a lock, there's a common mistake: calling creat() or its open() equivalent (the mode O_WRONLY | O_CREAT | O_TRUNC). The problem is that root can always create files this way, even if the lock file already exists -- which means that the lock won't work correctly for root. The simple solution is to use open() with the flags O_WRONLY | O_CREAT | O_EXCL (and permissions set to 0, so that other processes with the same owner won't get the lock). Note the use of O_EXCL, which is the official way to create "exclusive" files. This even works for root on a local file system.

Load more