System Administration and Network Security

Master SSCI, M2R subject

Duration: up to 3 hours.

All answers should be justified. Brevity and clarity will be rewarded.

1 Perl and multithreading

Perl threads differ from the thread implementations of other languages. While most synchronisation operators are similar to those of POSIX threads or Java threads, there is one major difference: data are not shared between threads by default. Shared data must be explicitly flagged as such:

    my $dummy :shared = 1;

1.1 Questions

1. The threads are still running in a single process. How does Perl achieve the separation of the data used by the threads? (5 lines max)

2. Ruby and Python share data between threads by default. From the performance point of view of a scripting language, is it better to share data by default or not? Why? (10 lines max)

3. From the programming point of view, give the advantages and drawbacks of Perl's approach. (10 lines max)

2 Performance evaluation

The systems A and B are evaluated using two different benchmarks. The following execution times are measured.

    Benchmark    A      B
    1            10s    15s
    2            20s    5s

2.1 Questions

1. By evaluating the A/B performance ratio, which system is the best on average?

2. By evaluating the B/A performance ratio, which system is the best on average?

3. Which system is the best? Why? (10 lines max)

3 IMAP

You discover that the IMAP server of your small company has been compromised.

1. What are the possible consequences of the attack? (3 lines max per consequence)

2. What should you do next? (10 lines max)

4 Security: Nmap

1. What is Nmap used for? How can it be used to attack a network?

2. An attacker is running Nmap on the network of your small company. How can you find where the attacker is? (several possibilities) (10 lines max)

3. How would you mitigate this problem? (10 lines max)

5 BTRFS

BTRFS is an experimental filesystem developed in the Linux kernel since 2007. Its main goal is to scale to very large storage.

5.1 BTRFS features

1. copy-on-write: when a file is copied, its blocks are not duplicated but just flagged. A block is duplicated only when it is written.

2. snapshots: read-only or copy-on-write clones of the filesystem.

3. RAID 0 (striping), 1 (mirroring), 5 (striping + 1 parity block), 6 (striping + 2 parity blocks), 10 (striping + mirroring).

4. multi-device spanning: a filesystem may use several disks simultaneously.

5. online defragmentation.

6. online balancing between disks.

7. block device addition and removal.

8. transparent compression.

9. several filesystem roots within each partition.

10. data scrubbing with self-healing: a hash function is used in the background to detect errors. Self-healing correction is done using the other copies of the block.

5.2 Questions

1. Which features of BTRFS are important for the servers of your enterprise? Why? (10 lines max)

2. Which features of BTRFS are important for the desktops of your enterprise? Why? (10 lines max)

3. Your users carry their laptops all around the world and later plug them into your network. Which features of BTRFS are important? Why? (10 lines max)

6 Vulnerabilities

The following document consists of the first two pages of a scientific article describing a security problem involving software development and compiler optimization.

1. What is the problem? When does it occur? (5 lines max)

2. Is it relevant for you? For your enterprise? How? (5 lines max)

3. How would you eliminate the problem? Give the various steps of the investigation and resolution. (10 lines max)

Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior

Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama MIT CSAIL

Abstract

This paper studies an emerging class of software bugs called optimization-unstable code: code that is unexpectedly discarded by compiler optimizations due to undefined behavior in the program. Unstable code is present in many systems, including the Linux kernel and the Postgres database. The consequences of unstable code range from incorrect functionality to missing security checks.

To reason about unstable code, this paper proposes a novel model, which views unstable code in terms of optimizations that leverage undefined behavior. Using this model, we introduce a new static checker called Stack that precisely identifies unstable code. Applying Stack to widely used systems has uncovered 160 new bugs that have been confirmed and fixed by developers.

1 Introduction

The specifications of many programming languages designate certain code fragments as having undefined behavior [15: §2.3], giving compilers the freedom to generate instructions that behave in arbitrary ways in those cases. For example, in C the “use of a nonportable or erroneous program construct or of erroneous data” leads to undefined behavior [24: §3.4.3].

One way in which compilers exploit undefined behavior is to optimize a program under the assumption that the program never invokes undefined behavior. A consequence of such optimizations is especially surprising to many programmers: code which works with optimizations turned off (e.g., -O0) breaks with a higher optimization level (e.g., -O2), because the compiler considers part of the code dead and discards it. We call such code optimization-unstable code, or just unstable code for short. If the discarded unstable code happens to be used for security checks, the optimized system will become vulnerable to attacks.

This paper presents the first systematic approach for reasoning about and detecting unstable code. We implement this approach in a static checker called Stack, and use it to show that unstable code is present in a wide range of systems software, including the Linux kernel and the Postgres database. We estimate that unstable code exists in 40% of the 8,575 Debian Wheezy packages that contain C/C++ code. We also show that compilers are increasingly taking advantage of undefined behavior for optimizations, leading to more vulnerabilities related to unstable code.

To understand unstable code, consider the pointer overflow check buf + len < buf shown in Figure 1, where buf is a pointer and len is a positive integer. The programmer’s intention is to catch the case when len is so large that buf + len wraps around and bypasses the first check in Figure 1. We have found similar checks in a number of systems, including the Chromium browser [7], the Linux kernel [49], and the Python interpreter [37].

    char *buf = ...;
    char *buf_end = ...;
    unsigned int len = ...;
    if (buf + len >= buf_end)
        return; /* len too large */
    if (buf + len < buf)
        return; /* overflow, buf+len wrapped around */
    /* write to buf[0..len-1] */

Figure 1: A pointer overflow check found in several code bases. The code becomes vulnerable as gcc optimizes away the second if statement [13].

While this check appears to work with a flat address space, it fails on a segmented architecture [23: §6.3.2.3]. Therefore, the C standard states that an overflowed pointer is undefined [24: §6.5.6/p8], which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the “overflow” check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system [13].

In addition to introducing new vulnerabilities, unstable code can amplify existing weakness in the system. Figure 2 shows a mild defect in the Linux kernel, where the programmer incorrectly placed the dereference tun->sk before the null pointer check !tun. Normally, the kernel forbids access to page zero; a null tun pointing to page zero causes a kernel oops at tun->sk and terminates the current process. Even if page zero is made accessible (e.g., via mmap or some other exploits [25, 45]), the check !tun would catch a null tun and prevent any further exploits. In either case, an adversary should not be able to go beyond the null pointer check.

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
        return POLLERR;
    /* write to address based on tun */

Figure 2: A null pointer dereference vulnerability (CVE-2009-1897) in the Linux kernel, where the dereference of pointer tun is before the null pointer check. The code becomes exploitable as gcc optimizes away the null pointer check [10].

Unfortunately, unstable code can turn this simple bug into an exploitable vulnerability. For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24: §6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be [10].

Poor understanding of unstable code is a major obstacle to reasoning about system behavior. For programmers, compilers that discard unstable code are often “baffling” and “make no sense” [46], merely gcc’s “creative reinterpretation of basic C semantics” [27]. On the other hand, compiler writers argue that the C standard allows such optimizations, which many compilers exploit (see §2.3); it is the “broken code” [17] that programmers should fix.

Who is right in this debate? From the compiler’s point of view, the programmers made a mistake in their code. For example, Figure 2 clearly contains a bug, and even Figure 1 is arguably incorrect given a strict interpretation of the C standard. However, these bugs are quite subtle, and understanding them requires detailed knowledge of the language specification. Thus, it is not surprising that such bugs continue to proliferate.

From the programmer’s point of view, the compilers are being too aggressive with their optimizations. However, optimizations are important for achieving good performance; many optimizations fundamentally rely on the precise semantics of the C language, such as eliminating needless null pointer checks or optimizing integer loop variables [20, 29]. Thus, it is difficult for compiler writers to distinguish legal yet complex optimizations from an optimization that goes too far and violates the programmer’s intent [29: §3].

This paper helps resolve this debate by introducing a model for identifying unstable code that allows a compiler to generate precise warnings when it removes code based on undefined behavior. The model specifies precise conditions under which a code fragment can induce undefined behavior. Using these conditions we can identify fragments that can be eliminated under the assumption that undefined behavior is never triggered; specifically, any fragment that is reachable only by inputs that trigger undefined behavior is unstable code. We make this model more precise in §3.

The Stack checker implements this model to identify unstable code. For the example in Figure 2, it emits a warning that the null pointer check !tun is unstable due to the earlier dereference tun->sk. Stack first computes the undefined behavior conditions for a wide range of constructs, including pointer and integer arithmetic, memory access, and library function calls. It then uses a constraint solver [3] to determine whether the code can be simplified away given the undefined behavior conditions, such as whether the code is reachable only when the undefined behavior conditions are true. We hope that Stack will help programmers find unstable code in their applications, and that our model will help compilers make better decisions about what optimizations might be unsafe and when an optimizer should produce a warning.

We implemented the Stack checker using the LLVM compiler framework [30] and the Boolector solver [3]. Applying it to a wide range of systems uncovered 160 new bugs, which were confirmed and fixed by the developers. We also received positive feedback from outside users who, with the help of Stack, fixed additional bugs in both open-source and commercial code bases. Our experience shows that unstable code is a widespread threat in systems, that an adversary can exploit vulnerabilities caused by unstable code with major compilers, and that Stack is useful for identifying unstable code.

The main contributions of this paper are:

• a new model for understanding unstable code,

• a static checker for identifying unstable code, and

• a detailed case study of unstable code in real systems.

Another conclusion one can draw from this paper is that language designers should be careful with defining language construct as undefined behavior. Almost every language allows a developer to write programs that have undefined meaning according to the language specification. Our experience with C/C++ indicates that being liberal with what is undefined can lead to subtle bugs.

The rest of the paper is organized as follows. §2 provides background information. §3 presents our model of unstable code. §4 outlines the design of Stack. §5 summarizes its implementation. §6 reports our experience of applying Stack to identify unstable code and evaluates Stack’s techniques. §7 covers related work. §8 concludes.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author. Copyright is held by the owner/author(s). SOSP’13, Nov. 3–6, 2013, Farmington, Pennsylvania, USA. ACM 978-1-4503-2388-8/13/11. http://dx.doi.org/10.1145/2517349.2522728
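As an illustration of the pointer overflow check discussed in the article (Figure 1), the sketch below shows one way to express the same bound without forming buf + len, so that the test no longer depends on pointer overflow. It is not taken from the article: the function name write_fits is hypothetical, and the sketch assumes that buf and buf_end point into the same array with buf <= buf_end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the article): returns true when len bytes
 * can be written starting at buf, using the same bound as Figure 1, which
 * rejects the write when buf + len >= buf_end.
 *
 * Assumption: buf and buf_end point into the same array and buf <= buf_end,
 * so the subtraction buf_end - buf is well defined.
 */
bool write_fits(const char *buf, const char *buf_end, unsigned int len)
{
    assert(buf != NULL && buf_end != NULL && buf <= buf_end);

    /* Remaining room, computed from two valid pointers. */
    size_t room = (size_t)(buf_end - buf);

    /* Compare integers only: no pointer is ever moved by len. */
    return len < room;
}
```

A caller would test write_fits(buf, buf_end, len) before writing to buf[0..len-1]. Because the comparison is done on integers, a very large len is simply rejected and there is no second "wrap-around" check for the compiler to discard.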
