Secure Programming Lecture 4: Memory Corruption II (Stack & Heap Overflows)
David Aspinall, Informatics @ Edinburgh
26th September 2019

Spatial and temporal errors

Memory corruption errors are sometimes classified as:
• spatial: the error happens because memory access exceeds some region of memory that a data item is intended to occupy.
  • main example: buffer overflow
• temporal: the error happens because memory access occurs in some region of memory that the program ought not currently have access to
  • main example: dangling pointer
This lecture focuses on spatial errors.

Buffer overflow

Buffer overflow is a common vulnerability.
• Simple cause:
  • putting m bytes into a buffer of size n, for m>n
  • corrupts the surrounding memory
• Simple fix:
  • check size of data before/when writing
Overflow exploits, where corruption performs something specific the attacker wants, can be very complex.
We’ll study examples to explain how devastating overflows can be, looking at simple (and historical) stack overflows and heap overflows.
Examples will use Linux/x86 to demonstrate; principles are similar on other OSes/architectures.

How the stack works (reminder)

[Figure: memory layout, from high addresses (top) to low addresses (bottom): Stack (frames), . . ., Data, Code]

Corrupting stack variables

Local variables are put close together on the stack.
• If a stray write goes beyond the size of one variable
• . . . it can corrupt another

Application scenario

    int authenticate(char *username, char *password) {
        int authenticated;   // flag, non-zero if authenticated
        char buffer[1024];   // buffer for log message
        authenticated = verify_password(username, password);
        if (authenticated == 0) {
            sprintf(buffer, "Incorrect password for user %s\n",
                    username);
            log("%s", buffer);
        }
        return authenticated;
    }

• Vulnerability in authenticate() call to sprintf().
• If the username is longer than 995 bytes, data will be written past the end of the buffer.

Possible stack frame before exploit

    ...
    password:       0x080B8888    1235
    username:       0x080B4444    AAAAAA...
    saved EIP (return addr)
    saved EBP (frame ptr)
    authenticated:  0x00000000
    buffer[1024] (undefined contents)
                                  (buffer start addr)
    ...

Stack frame after exploit

    ...
    password:       0x080B8888    1235
    username:       0x080B4444    AAAAAA...
    saved EIP (return addr)
    saved EBP (frame ptr)
    authenticated:  0x0000000A
    buffer:         AAAA
                    . . .
                    AAAA          (buffer start addr)

• If username is >995 letters long, authenticated is corrupted and may be set to non-zero.
• E.g., char 1024=‘\n’, the low byte becomes 10.

Local variable corruption remarks

Tricky in practice:
• location of variables may not be known
• memory addresses can vary between invocations
• C standards don’t specify stack layout
• compiler moves things around, optimises layout
• effect depends on behaviour of application code
A more predictable, general attack works by corrupting the fixed information in every stack frame: the frame pointer and return address.

Classic stack overflow exploit

[Figure: stack frame with, from the top, the return address, the attack code and the buffer]

The malicious argument overwrites all of the space allocated for the buffer, all the way to the return address location.
The return address is altered to point back into the stack, somewhere before the attack code.
Typically, the attack code executes a shell.

Attacker controlled execution

By over-writing the return address, the attacker may either:
1. set it to point to some known piece of the application code, or code inside a shared library, which achieves something useful, or
2. supply his/her own code somewhere in memory, which may do anything, and arrange to call that.
The second option is the most general and powerful.
How does it work?

Arbitrary code exploit

The attacker takes these steps:
1. write code useful for an attacker
2. store executable code somewhere in memory
3. use stack overflow to direct execution there
The attack code is known as shellcode. Typically, it launches a shell or network connection.
Shellcode is ideally:
• small and self-contained
• position independent
• free of ASCII NUL (0x00) characters
Question. Why?

Building shellcode

Consider spawning a shell in Unix. The code looks like this:

    #include

Invoking system calls

To execute a library function, the code would need to find the location of the function.

Invoking system calls

Linux system calls (32 bit x86) operate like this:
• Store parameters in registers EBX, ECX, . . .
• Put the desired system call number into EAX (AL is its low byte)
• Use the interrupt int 128 (0x80) to trigger the call

Invoking a shell

Here is the assembly code for a simple system call invoking a shell:

            .section .rodata        # data section
    args:
            .long arg               # char *["/bin/sh"]
            .long 0
    arg:
            .string "/bin/sh"
            .text
            .globl main
    main:
            movl $arg, %ebx
            movl $args, %ecx
            movl $0, %edx
            movl $0xb, %eax
            int $0x80               # execve("/bin/sh",["/bin/sh"],NULL)
            ret

From assembly to shellcode

However, this is not yet quite shellcode: it contains hard-wired (absolute) addresses and a data section.
Question. How could you turn this into position independent code without separate data?

From assembly to shellcode

    $ gcc shellcode.s -o shellcode.out
    $ objdump -d shellcode.out
    ...
    080483ed

Attack code on stack: the NOP sled

[Figure: stack region holding the attack code, preceded by a run of NOP instructions]

The exact address of the attack code in the stack is hard to guess.
The attacker can increase the chance of success by allowing a range of addresses to work.
The overflow uses a NOP sled, which the CPU execution "lands on", before being directed to the attack code.

Attack code elsewhere in memory

    ...
    password:       0x080B8888
    username:       0x080B4444
    saved EIP       0x0BADC0DE    (corrupted ret. addr., pointing to
                                   user-controlled data seeded with
                                   executable code)
    saved EBP       0x41414141
    authenticated:  0x41414141
    buffer:         AAAA
                    . . .
                    AAAA          (buffer start addr)

• Various (sometimes intricate) possibilities
• . . . in an environment variable, modifying function pointers, corrupting caller’s saved frame pointer

Stack smashing without shellcode

Sometimes an attacker cannot directly inject code which gets executed, but can still corrupt return addresses.

Return to library (ret2libc)
The attacker overflows a buffer causing the return instruction to jump to invoke system() with an argument pointing to /bin/sh.

Return-oriented Programming (ROP)
Sequences of instructions (gadgets) from library code are assembled together to manipulate registers, eventually to invoke a library function or even to make a Turing-complete language.

Heap overflows: overview

The heap is the region of memory that a program uses for dynamically allocated data.
The runtime or operating system provides memory management for the heap.
With explicit memory management, the programmer uses library functions to allocate and deallocate regions of memory.

Memory safety and undefined behaviour

Memory safety
A programming language enforces memory safety if it ensures that reads and writes stay within clearly defined memory areas.

Undefined behaviour
A programming language specification describes the meaning of programs. For languages that lack memory safety, the spec may say the meaning of an illegal memory access is undefined.

Question. Is there an advantage in "undefined behaviour"?
Question. What risks do you see for software security with "undefined" behaviour?

Memory allocation in C

malloc(size) tries to allocate a space of size bytes.
• It returns a pointer to the allocated region
  • . . . of type void* which the programmer can cast to the desired pointer type
• or it fails and returns a NULL pointer
• The memory is uninitialised so should be written to before being read from
Question. Which points above contribute to (memory) unsafe behaviour in C?

Memory allocation in C

free(ptr) frees the previously allocated space at ptr.
• No return value (void)
• If it fails (ptr a non-allocated value), what happens?
  • if ptr is NULL, nothing
  • “undefined” otherwise,
  • program may abort, or might carry on and let bad things happen
• What happens if ptr is dereferenced after being freed?
  • depends on behaviour of allocator
Question. Suppose we accidentally call free(ptr) before the final dereference of ptr but before another call to malloc(). Is that safe?

Memory allocation in C

calloc(n, size) behaves like malloc(n * size) but it also initialises the memory, clearing it to zeroes.
Question. Suppose we allocate a string buffer, and immediately assign the empty string to it. What security reason may there be to prefer calloc() over malloc()?

Simple heap variable attack

Without memory safety, heap-allocated variables may overflow from one to another.

    char *user = (char *)malloc(sizeof(char)*8);
    char *adminuser = (char *)malloc(sizeof(char)*8);
    strcpy(adminuser, "root");
    if (argc > 1)
        strcpy(user, argv[1]);
    else
        strcpy(user, "guest");
    /* Now we'll do ordinary operations as "user" and
       create sensitive system files as "adminuser" */

• Is it possible to overflow user and change adminuser ?

Simple heap variable attack

    #include

Simple heap variable attack

    $ gcc useradminuser.c -o useradminuser.out
    $ ./useradminuser.out

Let’s try overflowing. . . .

    $ ./useradminuser.out frank...... david
    User is at 0x9405008, contains: frank...... david
    Admin user is at 0x9405018, contains: id

Count more carefully:

    $ ./useradminuser.out frank56789ABCDEFdavid
    User is at 0x9f0b008, contains: frank56789ABCDEFdavid
    Admin user is at 0x9f0b018, contains: david

Whoa!
Question. Can you think of a way to prevent this attack?

Remarks about heap variable attack

• same kind of attack is possible for (mutable) global variables, which are allocated statically in another memory segment
• this is an application-specific attack, need to find security-critical path near overflowed variable
• need to be lucky: overwriting intervening memory might cause crashes later, before the program gets to use the intentionally corrupted data
Is there a more generic attack for the heap?

Heap allocator implementation

A common heap implementation is to use blocks laid out contiguously in memory, with a free list intermingled.
Heap blocks have headers which give information such as:
• size of previous block
• size of this block
• flags, e.g., in-use flag
• if not in use, pointers to next/previous free block
The doubly-linked free list makes finding spare memory fast for the malloc() operation.

Heap allocator implementation

    typedef struct mallocblock {
        struct mallocblock *next;
        struct mallocblock *prev;
        int prevsize;
        int thissize;
        int freeflag;
        // malloc space follows the header
    } mallocblock_t;

• If freeflag is non-zero, the block is in the freelist
• Allocator will split blocks and coalesce them again

General heap overflow attack

Rough idea:
• Coalescing blocks unlinks them from the free list
• Attacker makes unlink() do an arbitrary write!
  • uses overflow to set next and previous
  • and set flags to indicate free
  • unlink() then performs write

Unlinking operation

    void unlink(mallocblock_t *element) {
        mallocblock_t *mynext = element->next;
        mallocblock_t *myprev = element->prev;
        mynext->prev = myprev;
        myprev->next = mynext;
    }

• performs two (related) word writes
  • mynext->prev = myprev and myprev->next = mynext
• attacker arranges at least one of these to be useful
Exercise. Check you understand this: draw a picture of a doubly linked list and explain how the attacker can make an arbitrary write.

Writing to arbitrary locations

What locations might the attacker choose?
• Global Offset Table (GOT) used to link ELF-format binaries. Allows arbitrary locations to be called instead of a library call.
• Exit handlers used in Unix for return from main().
• Lock pointers or exception handlers stored in the Windows Process Environment Block (PEB)
• Application-level function pointers (e.g. C++ virtual member tables).
The details are intricate, but library exploits and toolkits are available (e.g., Metasploit).

Heap spraying and browser exploits

Apart from operating system (C code) memory management, other application runtimes provide memory allocation features, which may be accessible to an attacker.
A particular case is in browser-based exploits which have made use of heaps for managed runtimes such as JavaScript, VBScript, Flash, HTML5.
Writing shell code to predictable heap locations is sometimes called heap spraying. This is simple in concept: string variables manipulated in scripts are allocated in a heap.

Review questions

Stack overflows
• Explain how uncontrolled memory writing can let an attacker corrupt the value of local variables.
• Explain how an attacker can exploit a stack overflow to execute arbitrary code.
• Draw a stack during a stack overflow attack with a NOP sled and shell code, giving addresses

Heap overflows
• Describe the API functions used to interface to heap allocation in C. Give two examples of risky behaviour.
• Show how overflowing one heap-allocated variable can corrupt a second.
• Explain how a heap overflow attack can exploit memory allocation routines to allow arbitrary writes.

Coming next

We’ll be looking at other kinds of overflow attacks, and the general protection mechanisms mentioned at the start.
The best way to understand these attacks is to try them out!
We recommend trying the SEED Labs for some good walk-throughs.

References and credits

This lecture included examples from:
• M. Dowd, J. McDonald and J. Schuh. The Art of Software Security Assessment, Addison-Wesley 2007.
See the extras page for more pointers.