Spatial and temporal errors Buffer overflow Buffer overflow is a common vulnerability. Memory corruption errors are sometimes classified as: É Simple cause: Secure Programming Lecture 4: Memory É putting m bytes into a buffer of size n, for m>n É spatial: the error happens because memory access Corruption II (Stack & Heap Overflows) É exceeds some region of memory that a data item is corrupts the surrounding memory intended to occupy. É Simple fix:

É main example: buffer overflow É check size of data before/when writing David Aspinall, Informatics @ Edinburgh É temporal: the error happens because memory Overflow exploits, where corruption performs something access occurs in some region of memory that the specific the attacker wants, can be very complex. program ought not currently have access to We’ll study examples to explain how devastating 26th September 2019 É main example: dangling pointer overflows can be, looking at simple (and historical) This lecture focuses on spatial errors. stack overflows and heap overflows. Examples will use Linux/x86 to demonstrate; principles are similar on other OSes/architectures.

How the stack works (reminder) Corrupting stack variables Application scenario

int authenticate(char *username, char *password) {

Stack (frames) int authenticated; // flag, non-zero if authenticated high addresses char buffer[1024]; // buffer for log message ↑ authenticated = verify_password(username, password); . . Local variables are put close together on the stack. if (authenticated ==0){ sprintf(buffer, "Incorrect password for user %s\n", É If a stray write goes beyond the size of one variable username); Data É . . . it can corrupt another log("%s",buffer); } return authenticated; }

Code low addresses ↓ É Vulnerability in authenticate() call to sprintf(). Memory É If the username is longer than 995 bytes, data will be written past the end of the buffer. Possible stack frame before exploit Stack frame after exploit Local variable corruption remarks

password: 0x080B8888 ... 1235

username: 0x080B4444 AAAAAA... password: 0x080B8888 1235 Tricky in practice:

saved EIP (return addr) username: 0x080B4444 AAAAAA... É location of variables may not be known saved EBP (frame ptr) saved EIP (return addr) É memory addresses can vary between invocations

authenticated: 0x0000000A É standards don’t specify stack layout saved EBP (frame ptr) É moves things around, optimises layout AAAA authenticated: 0x00000000 É effect depends on behaviour of application code . buffer .

AAAA buffer start addr A more predictable, general attack works by corrupting buffer[1024] (undefined contents) the fixed information in every stack frame: the frame buffer start addr pointer and return address. É If username is >995 letters long, authenticated is ... corrupted and may be to non-zero. É E.g., char 1024=‘\n’, the low byte becomes 10.

Classic stack overflow exploit Attacker controlled execution Arbitrary code exploit

. The attacker takes these steps: . By over-writing the return address, the attacker may 1. write code useful for an attacker return address The malicious argument either: overwrites all of the space 2. store executable code somewhere in memory 3. use stack overflow to direct execution there . allocated for the buffer, all the . way to the return address 1. set it to point to some known piece of the location. application code, or code inside a shared , The attack code is known as shellcode. Typically, it attack code The return address is altered which achieves something useful, or launches a shell or network connection. to point back into the stack, 2. supply his/her own code somewhere in memory, Shellcode is ideally: . somewhere before the attack which may do anything, and arrange to call that. . code. Typically, the attack code The second option is the most general and powerful. É small and self-contained buffer executes a shell. É position independent How does it work? É free of ASCII NUL (0x00) characters . . Question. Why? Arbitrary code exploit Building shellcode Invoking system calls

Consider spawning a shell in Unix. The code looks like this: To execute a library function, the code would need to find the location of the function. #include ... É for a dynamically loaded library, this requires char *args[] = { "/bin/sh", NULL }; 1. write code useful for an attack execve("bin/sh", args, NULL) ensuring it is loaded into memory, negotiating with 2. store executable code somewhere in memory the linker 3. use stack overflow to direct execution there É this would need quite a bit of assembly code É execve() is part of the Standard C Library, libc É it starts a process with the given name and It is easier to make a system call directly to the argument list and the environment as the third operating system. parameter. É luckily, execve() is a library call which corresponds We want to write (relocatable) assembly code which exactly to a system call. does the same thing: constructing the argument lists and then invoking the execve function.

Invoking system calls Invoking a shell From assembly to shellcode

Here is the assembly code for a simple system call invoking a shell:

.section .rodata # data section args: Linux system calls (32 bit x86) operate like this: .long arg # char *["/bin/sh"] However, this is not yet quite shellcode: it contains .long 0 # hard-wired (absolute) addresses and a data section. arg: É Store parameters in registers EBX, ECX, . . . .string "/bin/sh" É Put the desired system call number into AL Question. How could you turn this into position É Use the interrupt int 128 to trigger the call .text independent code without separate data? .globl main main: movl $arg, %ebx movl $args, %ecx movl $0, %edx movl $0xb, %eax int $0x80 # execve("/bin/sh",["/bin/sh"],NULL) ret From assembly to shellcode Arbitrary code exploit $ gcc shellcode.s -o shellcode.out $ objdump -d shellcode.out ... 080483ed

: 80483ed: bb a8 84 04 08 mov $0x80484a8,%ebx 1. write code useful for an attacker 80483f2: b9 a0 84 04 08 mov $0x80484a0,%ecx 80483f7: ba 00 00 00 00 mov $0x0,%edx 2. store executable code somewhere in memory 80483fc: b8 0b 00 00 00 mov $0xb,%eax 3. use stack overflow to direct execution there Moreover, we need to find the binary representation of 8048401: cd 80 int $0x80 the instructions (i.e., the compiled shell code). 8048403: c3 ret Two options: This will be the data that we can then feed back into our É shellcode on stack attack. É We take the hex op code sequence bb a8 84... etc and encode it as a string (or URL, filename, etc) to É shellcode in another part of the program data feed into the program as malicious input. Problem in both cases is : There’s a bit of an art to crafting shellcode for different architectures and scenarios. Handily many examples are É how to find out where the code is? online. For example, at shell-storm.org/shellcode or www.exploit-db.com/shellcodes.

Attack code on stack: the NOP sled Attack code elsewhere in memory Stack smashing without shellcode

...

... password: 0x080B8888 Sometimes an attacker cannot directly inject code which gets executed, but can still corrupt return addresses. username: 0x080B4444 corrupted ret. addr. The exact address of the user-controlled data saved EIP 0x0BADC0DE seeded with Return to library (ret2libc) attack code in the stack is hard executable code . to guess. . saved EBP 0x41414141 The attacker overflows a buffer causing the return The attacker can increase the instruction to jump to invoke system() with an chance of success by allowing authenticated: 0x41414141 attack code argument pointing to /bin/sh. a range of addresses to work. The overflow uses a NOP sled, AAAA NOP which the CPU execution . Return-oriented Programming (ROP) buffer . "lands on", before being Sequences of instructions (gadgets) from library code . directed to the attack code. AAAA buffer start addr . are assembled together to manipulate registers, eventually to invoke an library function or even to make NOP É Various (sometimes intricate) possibilities a Turing-complete language. É . . . in an environment variable, modifying function pointers, corrupting caller’s saved frame pointer Heap overflows: overview Memory safety and undefined behaviour Memory allocation in C Memory safety A programming language enforces memory safety if it ensures that reads and writes stay within clearly defined malloc(size) tries to allocate a space of size bytes. The heap is the region of memory that a program uses memory areas. for dynamically allocated data. É It returns a pointer to the allocated region Undefined behaviour É . . . of type void* which the programmer can cast to The runtime or operating system provides memory the desired pointer type A programming language specification describes the management for the heap. É or it fails and returns a NULL pointer meaning of programs. For languages that lack memory É The memory is uninitialised so should be written With explicit , the programmer safety, the spec may say the meaning of an illegal to before being read from uses library functions to allocate and deallocate regions memory access is undefined. of memory. Question. Which points above contribute to (memory) Question. Is there an advantage in "undefined unsafe behaviour in C? behaviour"?

Question. What risks do you see for software security with "undefined" behaviour?

Memory allocation in C Memory allocation in C Simple heap variable attack Without memory safety, heap-allocated variables may free(ptr) frees the previously allocated space at ptr. overflow from one to another.

É No return value (void) char *user = (char *)malloc((char)*8); É If it fails (ptr a non-allocated value), what happens? calloc(size) behaves like malloc(size) but it also char *adminuser = (char *)malloc(sizeof(char)*8); É if ptr is NULL, nothing initialises the memory, clearing it to zeroes. strcpy(adminuser, "root"); É “undefined” otherwise, Question. Suppose we allocate a string buffer, and É program may abort, or might carry on and let bad if (argc >1) things happen strcpy(user, argv[1]); immediately assign the empty string to it. else É What happens if ptr is dereferenced after being strcpy(user, "guest"); What security reason may there be to prefer calloc() freed? over malloc()? /* Now we'll do ordinary operations as "user" and É depends on behaviour of allocator create sensitive system files as "adminuser" */

Question. Suppose we accidently call free(ptr) before the final dereference of ptr() but before another É Is it possible to overflow user and change call to malloc(). Is that safe? adminuser ? Simple heap variable attack Simple heap variable attack $ gcc useradminuser.c -o useradminuser.out $ ./useradminuser.out #include User is at 0x9504008, contains: guest #include Admin user is at 0x9504018, contains: root #include Problem: how do we know where the allocations will be void main(int argc, char argv[]) { made? * $ ./useradminuser.out User is at 0x9483008, contains: guest char *user = (char *)malloc(sizeof(char)*8); Admin user is at 0x9483018, contains: root char adminuser = (char )malloc(sizeof(char) 8); É Heap allocator is free to allocate anywhere, not * * * necessarily in adjacent memory strcpy(adminuser, "root"); $ ./useradminuser.out frank User is at 0x8654008, contains: frank Let’s investigate what happens on Linux x86, glibc. if (argc >1) Admin user is at 0x8654018, contains: root strcpy(user, argv[1]); (for a particular version, on a particular day, . . . ) else strcpy(user, "guest"); É Buffers not adjacent, there’s some extra space printf("User is at %p, contains: %s\n", user, user); printf("Admin user is at %p, contains: %s\n", adminuser, adminuser); É Addresses not identical each run } É But admin user is stored higher in memory!

Remarks about heap variable attack Heap allocator implementation Let’s try overflowing. . . .

$ ./useradminuser.out frank...... david A common heap implementation is to use blocks laid out User is at 0x9405008, contains: frank...... david Admin user is at 0x9405018, contains: id É same kind of attack is possible for (mutable) global contiguously in memory, with a free list intermingled. variables, which are allocated statically in another Heap blocks have headers which give information such memory segment Count more carefully: as: É this is an application-specific attack, need to find

$ ./useradminuser.out frank56789ABCDEFdavid security-critical path near overflowed variable É size of previous block User is at 0x9f0b008, contains: frank56789ABCDEFdavid É need to be lucky: overwriting intervening memory É size of this block Admin user is at 0x9f0b018, contains: david might cause crashes later, before the program gets É flags, e.g., in-use flag to use the intentionally corrupted data É if not in use, pointers to next/previous free block Whoa! Is there a more generic attack for the heap? The doubly-linked free list makes finding spare memory Question. Can you think of a way to prevent this fast for the malloc() operation. attack? Heap allocator implementation General heap overflow attack Unlinking operation

void unlink(mallocblock_t *element) { mallocblock_t *mynext = element->next; mallocblock_t *myprev = element->prev; struct mallocblock { struct mallocblock *next; Rough idea: mynext->prev = myprev; struct mallocblock *prev; myprev->next = mynext; int prevsize; } int thissize; É Coalescing blocks unlinks them from the free list int freeflag; É Attacker makes unlink() do an arbitrary write! // malloc space follows the header } mallocblock_t; É uses overflow to set next and previous É performs two (related) word writes É and set flags to indicate free É mynext->prev= mynext+2, myprev->next= myprev É unlink() then performs write * * É If freeflag is non-zero, the block is in the freelist É attacker arranges at least one of these to be useful É Allocator will split blocks and coalesce them again Exercise. Check you understand this: draw a picture of a doubly linked list and explain how the attacker can make an arbitrary write.

Writing to arbitrary locations Heap spraying and browser exploits Review questions Stack overflows Apart from operating system (C code) memory What locations might the attacker choose? É Explain how uncontrolled memory writing can let an management, other application runtimes provide attacker corrupt the value of local variables. memory allocation features, which may be accessible to É Global Offset Table (GOT) used to link ELF-format É Explain how an attacker can exploit a stack overflow an attacker. binaries. Allows arbitrary locations to be called to execute arbitrary code. instead of a library call. É Draw an stack during a stack overflow attack with a É Exit handlers used in Unix for return from main(). A particular case is in browser-based exploits which NOP sled and shell code, giving addresses É Lock pointers or exception handlers stored in the have made use of heaps for managed runtimes such as Windows Process Environment Block (PEB) JavaScript, VBScript, Flash, HTML5. Heap overflows É Application-level function pointers (e.g. C++ virtual member tables). Writing shell code to predictable heap locations is É Describe the API functions used to interface to heap sometimes called heap spraying. This is simple in allocation in C. Give two examples of risky behaviour. The details are intricate, but library exploits and tookits concept: string variables manipulated in scripts are É Show how overflowing one heap-allocated variable are available (e.g., Metasploit). allocated in a heap. can corrupt a second. É Explain how a heap overflow attack can exploit memory allocation routines to allow arbitrary writes. Coming next References and credits

We’ll looking other kinds of overflow attacks, and the This lecture included examples from: general protection mechanisms mentioned at the start.

É M. Dowd, J. McDonald and J. Schuh. The Art of The best way to understand these attacks is to try them Software Security Assessment, Addison-Wesley out! 2007.

We recommend trying the SEED Labs for some good See the extras page for more pointers. walk-throughs.