Spatial and Temporal Errors Buffer Overflow

Spatial and temporal errors Buffer overflow Buffer overflow is a common vulnerability. Memory corruption errors are sometimes classified as: É Simple cause: Secure Programming Lecture 4: Memory É putting m bytes into a buffer of size n, for m>n É spatial: the error happens because memory access Corruption II (Stack & Heap Overflows) É exceeds some region of memory that a data item is corrupts the surrounding memory intended to occupy. É Simple fix: É main example: buffer overflow É check size of data before/when writing David Aspinall, Informatics @ Edinburgh É temporal: the error happens because memory Overflow exploits, where corruption performs something access occurs in some region of memory that the specific the attacker wants, can be very complex. program ought not currently have access to We’ll study examples to explain how devastating 26th September 2019 É main example: dangling pointer overflows can be, looking at simple (and historical) This lecture focuses on spatial errors. stack overflows and heap overflows. Examples will use Linux/x86 to demonstrate; principles are similar on other OSes/architectures. How the stack works (reminder) Corrupting stack variables Application scenario int authenticate(char *username, char *password) { Stack (frames) int authenticated; // flag, non-zero if authenticated high addresses char buffer[1024]; // buffer for log message ↑ authenticated = verify_password(username, password); . Local variables are put close together on the stack. if (authenticated ==0){ sprintf(buffer, "Incorrect password for user %s\n", É If a stray write goes beyond the size of one variable username); Data É . it can corrupt another log("%s",buffer); } return authenticated; } Code low addresses ↓ É Vulnerability in authenticate() call to sprintf(). Memory É If the username is longer than 995 bytes, data will be written past the end of the buffer. Possible stack frame before exploit Stack frame after exploit Local variable corruption remarks password: 0x080B8888 ... 1235 username: 0x080B4444 AAAAAA... password: 0x080B8888 1235 Tricky in practice: saved EIP (return addr) username: 0x080B4444 AAAAAA... É location of variables may not be known saved EBP (frame ptr) saved EIP (return addr) É memory addresses can vary between invocations authenticated: 0x0000000A É C standards don’t specify stack layout saved EBP (frame ptr) É compiler moves things around, optimises layout AAAA authenticated: 0x00000000 É effect depends on behaviour of application code . buffer . AAAA buffer start addr A more predictable, general attack works by corrupting buffer[1024] (undefined contents) the fixed information in every stack frame: the frame buffer start addr pointer and return address. É If username is >995 letters long, authenticated is ... corrupted and may be set to non-zero. É E.g., char 1024=‘\n’, the low byte becomes 10. Classic stack overflow exploit Attacker controlled execution Arbitrary code exploit . The attacker takes these steps: . By over-writing the return address, the attacker may 1. write code useful for an attacker return address The malicious argument either: overwrites all of the space 2. store executable code somewhere in memory 3. use stack overflow to direct execution there . allocated for the buffer, all the . way to the return address 1. set it to point to some known piece of the location. application code, or code inside a shared library, The attack code is known as shellcode. Typically, it attack code The return address is altered which achieves something useful, or launches a shell or network connection. to point back into the stack, 2. supply his/her own code somewhere in memory, Shellcode is ideally: . somewhere before the attack which may do anything, and arrange to call that. code. Typically, the attack code The second option is the most general and powerful. É small and self-contained buffer executes a shell. É position independent How does it work? É free of ASCII NUL (0x00) characters . Question. Why? Arbitrary code exploit Building shellcode Invoking system calls Consider spawning a shell in Unix. The code looks like this: To execute a library function, the code would need to find the location of the function. #include <unistd.h> ... É for a dynamically loaded library, this requires char *args[] = { "/bin/sh", NULL }; 1. write code useful for an attack execve("bin/sh", args, NULL) ensuring it is loaded into memory, negotiating with 2. store executable code somewhere in memory the linker 3. use stack overflow to direct execution there É this would need quite a bit of assembly code É execve() is part of the Standard C Library, libc É it starts a process with the given name and It is easier to make a system call directly to the argument list and the environment as the third operating system. parameter. É luckily, execve() is a library call which corresponds We want to write (relocatable) assembly code which exactly to a system call. does the same thing: constructing the argument lists and then invoking the execve function. Invoking system calls Invoking a shell From assembly to shellcode Here is the assembly code for a simple system call invoking a shell: .section .rodata # data section args: Linux system calls (32 bit x86) operate like this: .long arg # char *["/bin/sh"] However, this is not yet quite shellcode: it contains .long 0 # hard-wired (absolute) addresses and a data section. arg: É Store parameters in registers EBX, ECX, . .string "/bin/sh" É Put the desired system call number into AL Question. How could you turn this into position É Use the interrupt int 128 to trigger the call .text independent code without separate data? .globl main main: movl $arg, %ebx movl $args, %ecx movl $0, %edx movl $0xb, %eax int $0x80 # execve("/bin/sh",["/bin/sh"],NULL) ret From assembly to shellcode Arbitrary code exploit $ gcc shellcode.s -o shellcode.out $ objdump -d shellcode.out ... 080483ed <main>: 80483ed: bb a8 84 04 08 mov $0x80484a8,%ebx 1. write code useful for an attacker 80483f2: b9 a0 84 04 08 mov $0x80484a0,%ecx 80483f7: ba 00 00 00 00 mov $0x0,%edx 2. store executable code somewhere in memory 80483fc: b8 0b 00 00 00 mov $0xb,%eax 3. use stack overflow to direct execution there Moreover, we need to find the binary representation of 8048401: cd 80 int $0x80 the instructions (i.e., the compiled shell code). 8048403: c3 ret Two options: This will be the data that we can then feed back into our É shellcode on stack attack. É We take the hex op code sequence bb a8 84... etc and encode it as a string (or URL, filename, etc) to É shellcode in another part of the program data feed into the program as malicious input. Problem in both cases is : There’s a bit of an art to crafting shellcode for different architectures and scenarios. Handily many examples are É how to find out where the code is? online. For example, at shell-storm.org/shellcode or www.exploit-db.com/shellcodes. Attack code on stack: the NOP sled Attack code elsewhere in memory Stack smashing without shellcode ... ... password: 0x080B8888 Sometimes an attacker cannot directly inject code which gets executed, but can still corrupt return addresses. username: 0x080B4444 corrupted ret. addr. The exact address of the user-controlled data saved EIP 0x0BADC0DE seeded with Return to library (ret2libc) attack code in the stack is hard executable code . to guess. saved EBP 0x41414141 The attacker overflows a buffer causing the return The attacker can increase the instruction to jump to invoke system() with an chance of success by allowing authenticated: 0x41414141 attack code argument pointing to /bin/sh. a range of addresses to work. The overflow uses a NOP sled, AAAA NOP which the CPU execution . Return-oriented Programming (ROP) buffer . "lands on", before being Sequences of instructions (gadgets) from library code . directed to the attack code. AAAA buffer start addr . are assembled together to manipulate registers, eventually to invoke an library function or even to make NOP É Various (sometimes intricate) possibilities a Turing-complete language. É . in an environment variable, modifying function pointers, corrupting caller’s saved frame pointer Heap overflows: overview Memory safety and undefined behaviour Memory allocation in C Memory safety A programming language enforces memory safety if it ensures that reads and writes stay within clearly defined malloc(size) tries to allocate a space of size bytes. The heap is the region of memory that a program uses memory areas. for dynamically allocated data. É It returns a pointer to the allocated region Undefined behaviour É . of type void* which the programmer can cast to The runtime or operating system provides memory the desired pointer type A programming language specification describes the management for the heap. É or it fails and returns a NULL pointer meaning of programs. For languages that lack memory É The memory is uninitialised so should be written With explicit memory management, the programmer safety, the spec may say the meaning of an illegal to before being read from uses library functions to allocate and deallocate regions memory access is undefined. of memory. Question. Which points above contribute to (memory) Question. Is there an advantage in "undefined unsafe behaviour in C? behaviour"? Question. What risks do you see for software security with "undefined" behaviour? Memory allocation in C Memory allocation in C Simple heap variable attack Without memory safety, heap-allocated variables may free(ptr) frees the previously allocated space at ptr. overflow from one to another. É No return value (void) char *user = (char *)malloc(sizeof(char)*8); É If it fails (ptr a non-allocated value), what happens? calloc(size) behaves like malloc(size) but it also char *adminuser = (char *)malloc(sizeof(char)*8); É if ptr is NULL, nothing initialises the memory, clearing it to zeroes.

Load more