Cross-Checking Semantic Correctness: the Case of Finding File System Bugs

Paper #171, draft of 2015/5/20

Abstract

Today, systems software is too complex to be bug-free. To find bugs in systems software, developers often rely on code checkers, like Sparse in Linux. However, the capability of existing tools used in commodity, large-scale systems is limited to finding only shallow bugs that tend to be introduced by the simple mistakes of programmers and require no deep understanding of code. Unfortunately, a majority of the difficult-to-find bugs are semantic ones, which violate high-level rules or invariants (e.g., missing a permission check). Thus, it is difficult for code-checking tools, which lack an understanding of a programmer's true intention, to reason about semantic correctness.

To solve this problem, we present JUXTA, a tool that automatically infers high-level semantics directly from source code. The key idea of JUXTA is to compare and contrast multiple existing implementations that obey latent yet implicit high-level semantics. For example, the implementation of open() at the file system layer is expected to handle an out-of-space error of the disk, regardless of implementation. We have applied JUXTA to 54 file systems in the stock Linux kernel (680K LoC), found 139 previously unknown semantic bugs (one bug per 4.9K LoC), and provided corresponding patches to 41 different file systems that include mature, popular file systems like ext4, btrfs, xfs and nfs. These semantic bugs are not easy to locate: the bugs that JUXTA found have existed for over 6.4 years on average. Not only do our empirical results look promising, but the design of JUXTA is generic (not specific to file systems) enough to be easily extended to any software that has multiple implementations, like browsers or network stacks.

1. Introduction

System software is buggy. It is often implemented in unsafe languages (e.g., C) for achieving better performance or directly accessing the hardware, thereby facilitating the introduction of tedious bugs. At the same time, it is also complex. For example, Linux consists of 17 million lines of pure code and accepts around 190 commits every day.

To improve this situation, researchers often use memory-safe languages in the first place. For example, Singularity [32] uses C# and Unikernel [40] uses OCaml for OS development. However, in practice, developers usually rely on code checkers. For example, Linux has integrated static code analysis tools (e.g., Sparse) in its build process to detect common coding errors (e.g., whether system calls validate arguments that come from userspace). Other tools such as Coverity [7] and KINT [56] are able to find memory corruption and integer overflow bugs, respectively. Besides this, a large number of dynamic checkers are also available, such as kmemleak for detecting memory leaks, and AddressSanitizer [47] for finding use-after-free bugs in Linux.

Unfortunately, these tools tend to follow certain kinds of rules and lack a deep understanding of a programmer's intentions or execution context, so they mostly discover shallow bugs. The majority of undiscovered bugs are semantic ones that violate high-level rules or invariants. According to recent surveys of software bugs and patches, over 50% of bugs in Linux file systems are semantic bugs [39] (e.g., incorrectly updating a file's timestamps), and many tools used in practice are ineffective in detecting semantic vulnerabilities [14] (e.g., missing a permission check). Without any domain-specific knowledge, it is very hard to reason about the correctness or incorrectness of code and to discover such bugs.

In this regard, a large body of research has been proposed to check and enforce semantic or system rules, which we broadly classify into three categories: model checking, formal proof, and automatic testing. A common requirement for these techniques is that developers should manually provide the correct semantics of code for checking: models in model checking and proofs in formal proofs. Unfortunately, creating such semantics is difficult, error-prone, and virtually infeasible for commodity systems like Linux.

To solve this problem, we present JUXTA, a tool that automatically infers high-level semantics from source code. The key intuition of our approach is that different implementations of the same functionality should obey the same system rules or semantics. Therefore, we can derive latent semantics by comparing and contrasting these implementations. In particular, we have applied JUXTA to 54 file system implementations in stock Linux, which consist of 680K LoC in total. We found 139 previously unknown semantic bugs (one bug per 4.9K LoC), and provided corresponding patches to 41 different file systems, including mature and widely adopted file systems like ext4, btrfs, xfs and nfs. We would like to emphasize that the semantic bugs JUXTA found are difficult to find, as they have existed for over 6.4 years on average; over 30 bugs were introduced more than 10 years ago.

Challenges. The main challenge in comparing multiple file system implementations at once stems from the fact that each file system implements its own logic (e.g., features and disk layout), which differs dramatically from the others, yet all of them implicitly follow certain high-level semantics (e.g., they are expected to check file system permissions when opening a file). More importantly, these high-level semantics are deeply convoluted in the code in one way or another, without any explicit, common specification. Instead of directly comparing all of the file system implementations, we devised two statistical models that properly capture common semantics and remain tolerant of the specific implementation of each file system; in other words, JUXTA identifies behavior that deviates from the common semantics shared among multiple different software implementations.

Contributions. We have made the following contributions:

• We have found 139 previously unknown semantic bugs in 41 different file systems in the stock Linux kernel. We have made and submitted corresponding patches to fix the bugs that we found. Patches for 58 bugs are already applied either in a testing branch or in mainline Linux.

• Our idea and design of inferring latent semantics by comparing and contrasting multiple implementations. In particular, we devise two statistical comparison schemes that can compare multiple seemingly different implementations at the code level.

• An open source tool, JUXTA, and its pre-processed database that will facilitate other developers' ability to easily build their own checkers on top of it. We have made eight checkers as examples, including a semantic comparator, a specification/interface generator, an external APIs checker, and a lock pattern checker.

The rest of this paper is organized as follows. §2 motivates JUXTA's approach with a case study. §3 overviews its workflow. §4 describes JUXTA's design. §5 shows various checkers built on top of JUXTA. §6 shows its implementation. §7 explains bugs we found. §8 discusses potential applications. §9

2. Case Study

Linux provides an abstraction layer called the virtual file system (VFS). The Linux VFS defines an interface between a file system and Linux, which can be viewed as an implicit specification that all file systems should obey to support Linux. To derive this latent specification from the existing file systems, JUXTA compares the source code of file systems that originates from these VFS interfaces. The VFS is rather complex; it consists of 15 common operations (e.g., super_operations, inode_operations) that comprise over 170 functions. In this section, we describe three interesting cases (and bugs) that JUXTA found: rename(), write_begin/end() and fsync().

2.1 Bugs in inode.rename()

One might think that rename() is a simple system call that just changes the name of a file to another, but it has very subtle, complicated semantics. Let us consider a simple example that renames a file, "old_dir/a", to another, "new_dir/b":

    rename("old_dir/a", "new_dir/b");

Upon its successful completion, "old_dir/a" is renamed to "new_dir/b" as expected, but what about the timestamps of the involved directories and files? To precisely implement rename(), developers should specify the semantics of 12 different scenarios: three timestamps (ctime for status change, mtime for modification and atime for access) for each of four inodes (old_dir, new_dir, a and b). In fact, POSIX only partially defines these semantics: it requires updating ctime and mtime of the two directories, old_dir and new_dir:

    "Upon successful completion, rename() shall mark for update the last data modification and last file status change timestamps of the parent directory of each file." [51]

In UNIX philosophy, this specification makes sense since rename() never changes the inodes of the two files, a and b. But in practice, it causes serious problems because developers believe that the status (ctime) of both files is changed after rename(). For example, a popular archiving utility, tar, used to have a critical problem when performing an incremental backup (--listed-incremental). After a user renames a file a to b, tar assumes file b is already backed up and file a is deleted, because the ctime of b is not updated by the rename. Then, upon restoration, tar deletes file a, which it believes was deleted, and never restores b, which it skipped, thereby losing the original file that the user wanted to back up [1].

However, it is hard to say this is the fault of tar, because the majority of file systems update ctime of the new and old files after rename(), although POSIX leaves this behavior undefined. In fact, assuming that rename() updates ctime of both files is the more reasonable belief for developers, because traditionally rename() has updated ctime when it has been implemented with link() and unlink() [1, 51].
