<<

Program, Application CS 251 Fall 2019 CS 240 Principles of Programming Languages λ BenFoundations Wood of Systems Compiler/

Programming with Memory Software Operating System Instruction Architecture

the memory model Microarchitecture pointers and arrays in Digital Logic

Devices (, etc.) Hardware

https://cs.wellesley.edu/~cs240/ Programming with Memory 2 Solid-State PhysicsProgramming with Memory 3

Instruction Set Architecture (HW/SW ) -addressable memory = mutable byte array processor memory Instructions 0xFF…F Location / = element • Names, Encodings Instruction Encoded • Effects Logic Instructions • Identified by unique numerical address • Arguments, Results • range of possible addresses possible of range Holds one byte Registers space address Local storage • Names, Size Address = index • How many Large storage • Unsigned number • Addresses, Locations

• • • • • • Represented by one word • Computable and storable as a value

Operations: 0x00…0 • Load: read contents at given address Computer • Store: write contents at given address

Programming with Memory 4 Programming with Memory 5 In what order are the individual of a multi-byte value Multi-byte values in memory stored in memory? 64- Store across contiguous byte locations. least significant byte Words Bytes Address most significant byte 0x1F 0x1E 0x1D 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Alignment (Why?) 0x1C 0x1B 2A B6 00 0B 0x1A 0x19 0x18 0x17 Address Contents Little Endian: least significant byte first 0x16 0x15 03 2A • low order byte at low address 0x14 0x13 02 B6 • high order byte at high address ✘ 0x12 0x11 01 00 • used by , … 0x10 0x0F 0x0E 00 0B 0x0D 0x0C 0x0B 0x0A Address Contents 0x09 Big Endian: most significant byte first 0x08 03 0B • high order byte at low address 0x07 0x06 • low order byte at high address Bit order within byte always same. 0x05 02 00 0x04 • used by networks, SPARC, … 0x03 01 B6 Byte ordering within larger value? 0x02 0x01 00 2A 0x00 Programming with Memory 6 Programming with Memory 7

Data, addresses, and pointers C: Variables are locations address = index of a location in memory Compiler maps variable name à location. pointer = a reference to a location in memory, Declarations do not initialize! represented as an address stored as data 0x24 int x; // x @ 0x20 00 00 00 F0 0x20 int y; // y @ 0x0C The number 240 is stored at address 0x20. 0x24 0x1C 240 = F0 = 0x00 00 00 F0 0x20 10 16 0x18 x = 0; // store 0 @ 0x20 x 0x1C A pointer stored at address 0x08 0x14 points to the contents at address 0x20. 0x18 00 00 00 0C 0x10 // store 0x3CD02700 @ 0x0C 0x14 A pointer to a pointer 0x0C y = 0x3CD02700; 0x10 is stored at address 0x00. 00 00 00 20 0x08 0x0C y The number 12 is stored at address 0x10. 0x04 // 1. load the contents @ 0x0C 0x08 00 00 00 08 0x00 Is it a pointer? // 2. add 3 0x04 How do we know if values are pointers or not? +3 +2 +1 +0 // 3. store sum @ 0x20 0x00 How do we manage use of memory? memory drawn as 32-bit values, x = y + 3; +3 +2 +1 +0 little endian order Programming with Memory 12 Programming with Memory 13 & = address of C: Pointer operations and types C: Pointer example * = contents at Declare a variable, p address = index of a location in memory int* p; pointer = a reference to a location in memory, represented as an address stored as data that will hold the address of a memory location holding an int int x = 5; Declare two variables, x and y, that hold ints, Expressions using addresses and pointers: int y = 2; and store 5 and 2 in them, respectively. &___ address of the memory location representing ___ a.k.a. "reference to ___" Take the address of the memory location *___ contents at the given by ___ p = &x; representing x a.k.a. "dereference ___" ... and store it in the memory location representing p. Now, “p points to x.” Pointer types: Add 1 to the contents of memory at the address ___* address of a memory location holding a ___ given by the contents of the y = 1 + *p; a.k.a. "a reference to a ___" memory location representing p … and store it in the memory location representing y. Programming with Memory 17 Programming with Memory 21

& = address of C: Pointer example * = contents at C: Pointer type syntax

C assignment: location Spaces between base type, *, and variable name mostly do not matter. What is the type of *p? Left-hand-side = right-hand-side; The following are equivalent: value What is the type of &x? int* p; // p @ 0x04 What is *(&y) ? I prefer this int x = 5; // x @ 0x14, store 5 @ 0x14 int* ptr; int y = 2; // y @ 0x24, store 2 @ 0x24 0x24 y I see: "The variable ptr holds an address of an int in memory." p = &x; // store 0x14 @ 0x04 0x20 0x1C int * ptr; // 1. load the contents @ 0x04 (=0x14) 0x18 // 2. load the contents @ 0x14 (=0x5) 0x14 x // 3. add 1 0x10 int *ptr; more common C style // 4. store sum as contents @ 0x24 0x0C Looks like: "Dereferencing the variable ptr will yield an int." y = 1 + *p; 0x08 0x04 p Or "The memory location where the variable ptr points holds an int." 0x00 // 1. load the contents @ 0x04 (=0x14) Caveat: do not declare multiple variables unless using the last form. +3 +2 +1 +0 // 2. store 0xF0 as contents @ 0x14 int* a, b; means int *a, b; means int* a; int b; *p = 240; Programming with Memory 25 Programming with Memory 26 Arrays are adjacent memory locations Arrays are adjacent memory locations C: Arrays storing the same type of data. C: Arrays storing the same type of data. a is a name for the array’s base address, a is a name for the array’s base address, Declaration: int a[6]; can be used as an immutable pointer. Declaration: int a[6]; can be used as an immutable pointer. Indexing: a[0] = 0xf0; Address of a[i] is base address a element type a[5] = a[0]; plus i times element size in bytes. number of name No bounds a[6] = 0xBAD; elements check: a[-1] = 0xBAD;

0x24 Pointers: int* p; 0x24 0x20 p = a; 0x20 a[5] equivalent 0x1C { p = &a[0]; 0x1C 0x18 *p = 0xA; 0x18 … 0x14 0x14 0x10 p[1] = 0xB; 0x10 equivalent 0x0C { *(p + 1) = 0xB; 0x0C a[0] 0x08 p = p + 2; 0x08 0x04 0x04 p 0x00 array indexing = address arithmetic 0x00 Both are scaled by the size of the type. +3 +2 +1 +0 +3 +2 +1 +0 *p = a[1] + 1; Programming with Memory 27 Programming with Memory 39

C: Array allocation C: Array access ex Basic Principle Basic Principle T A[N]; T A[N]; Array of length N with elements of type T and name A Array of length N with elements of type T and name A Contiguous of N*sizeof(T) bytes of memory Identifier A has type T* char string[12]; Use sizeof to determine proper size in C. int val[5]; 0 2 4 8 1 x x + 12 x x + 4 x + 8 x + 12 x + 16 x + 20 int val[5]; Expression Type Value x x + 4 x + 8 x + 12 x + 16 x + 20 val[4] int 1 double a[3]; val int * x x + 8 x + 16 x + 24 val+1 int * &val[2] int * char* p[3]; IA32 (or char *p[3];) val[5] int x x + 4 x + 8 x + 12 *(val+1) int x86-64 val + i int * x x + 8 x + 16 x + 24 Programming with Memory 40 Programming with Memory 41 C: Null-terminated strings ex C: * and [] ex

C strings: arrays of ASCII characters ending with null . C programmers often use * where you might expect []: e.g., char*: Why? • pointer to a char 0x57 0x65 0x6C 0x6C 0x65 0x73 0x6C 0x65 0x79 0x20 0x43 0x53 0x00 • pointer to the first char in a string of unknown length 'W' 'e' 'l' 'l' 'e' 's' 'l' 'e' 'y' ' ' 'C' 'S' '\0' int strcmp(char* a, char* b); Does Endianness matter for strings? int string_length(char* str) { // Try with pointer arithmetic, but no array indexing.

int string_length(char str[]) {

} }

Programming with Memory 43 Programming with Memory 44

C: 0 vs. '\0' vs. NULL Memory address-space layout Addr Perm Contents Managed by Initialized 0 '\0' 2N-1 Name: zero Name: null character Type: int Type: char RW Procedure context Compiler Run time Size: 4 bytes Size: 1 byte Stack Value: 0x00000000 Value: 0x00 Usage: The integer zero. Usage: Terminator for C strings. Programmer, Dynamic RW malloc/free, Run time NULL Heap data structures Name: null pointer / null reference / null address new/GC Type: void* Global variables/ Compiler/ RW Startup Size: 1 word (= 8 bytes on a 64-bit architecture) Statics static data structures Assembler/Linker Value: 0x00000000000000 Usage: The absence of a pointer where one is expected. Compiler/ R String literals Startup Address 0 is inaccessible, so *NULL is invalid; it crashes. Literals Assembler/Linker Compiler/ Is it important/necessary to encode the null character or the null pointer as 0x0? X Instructions Startup Text Assembler/Linker What happens if a programmer mixes up these "zeroey" values? 0 Programming with Memory 45 Programming with Memory 50 C: Dynamic memory allocation in the heap C: memory allocator Heap: #include // include C standard library void* malloc(size_t size) Allocates a memory block of at least size bytes and returns its address. Allocated block Free block If error (no space), returns NULL. Rules: Check for error result. Managed by memory allocator: Cast result to relevant pointer type. Use sizeof(...) to determine size.

pointer to newly allocated block void free(void* ptr) of at least that size number of contiguous bytes required Deallocates the block referenced by ptr, making its space available for new allocations. void* malloc(size_t size); ptr must be a malloc result that has not yet been freed. pointer to allocated block to free Rules: ptr must be a malloc result that has not yet been freed. void free(void* ptr); Do not use *ptr after freeing.

Programming with Memory 51 Programming with Memory 52

C: Dynamic array allocation C: Array of pointers to arrays of ints #define ZIP_LENGTH 5 int** zips = (int**)malloc(sizeof(int*) * 3); int* = (int*)malloc(sizeof(int)*ZIP_LENGTH); zips[0] = (int*)malloc(sizeof(int)*5); if (zip == NULL) { // if error occurred perror("malloc"); // print error message int* zip0 = zips[0]; exit(0); // end the program zip0[0] = 0; } zips[0][1] = 2; zip 0x7fedd2400dc0 0x7fff58bdd938 zips[0][2] = 4; zips[0][3] = 8; zip[0] = 0; zips[0][4] = 1; zip[1] = 2; 1 0x7fedd2400dd0 zip[2] = 4; 8 0x7fedd2400dcc zips[1] = (int*)malloc(sizeof(int)*5); 4 0x7fedd2400dc8 zip[3] = 8; zips[1][0] = 2; 2 0x7fedd2400dc4 zips[1][1] = 1; zip[4] = 1; 0 0x7fedd2400dc0 zips[1][2] = 0; zips[1][3] = 4; printf("zip is"); zips[1][4] = 4; Why terminate with NULL? for (int i = 0; i < ZIP_LENGTH; i++) { zips[2] = NULL; printf(" %d", zip[i]); } zips printf("\n"); 0x10004380 0x10008900 0x00000000 0 2 4 8 1 Why free(zip); zip no NULL? +0 +4 +8 +12 +16 +20 0 2 4 8 1 2 1 0 4 4 Programming with Memory 53 Programming with Memory 55 Zip code

zips 0x10004380 0x10008900 0x00000000NULL http://xkcd.com/138/ 0 2 4 8 1 2 1 0 4 4

// return a count of all zips that end with digit endNum int zipCount(int* zips[], int endNum) { int count = 0; while (*zips) { if ((*zips)[4] == endNum) count++; zips++; } return count; Watch out! *zips[4] means *(zips[4])

} Programming with Memory 57 Programming with Memory 59

C: scanf reads formatted input C: Classic bug using scanf !!!

int val; Declared, but not initialized. int val; Declared, but not initialized. Holds anything. Holds anything...... Store in memory at the address Store in memory at the address given by the contents of val scanf("%d", &val); given by the address of val: scanf("%d", val); (implicitly cast as a pointer): store input @ 0x7F…F38. store input @ 0xBAD4FACE. Read one int Store it in memory Read one int Store it in memory in decimal10 format at this address. in decimal10 format at this address. Best case: ! crash immediately from input. from input. with segmenta_on fault/. Bad case: " silently corrupt data 0x7FFFFFFFFFFFFF3C 0x7FFFFFFFFFFFFF3C stored @ 0xBAD4FACE, fail to store input in val, 0x7FFFFFFFFFFFFF38 0x7FFFFFFFFFFFFF38 val BA D4 FA CE val BA D4 FA CE and keep going. 0x7FFFFFFFFFFFFF34 0x7FFFFFFFFFFFFF34 ...... Worst case: #$%& CA FE 12 34 0x00000000BAD4FACE program does literally anything. Programming with Memory 61 Programming with Memory 62 C: Memory error messages C: Why? Why learn C? • Think like actual computer (abstraction close to machine level) without dealing with machine code. • Understand just how much Your Favorite Language provides. • Understand just how much Your Favorite Language might cost. • Classic. • Still (more) widely used (than it should be). http://xkcd.com/371/ • Pitfalls still fuel devastating reliability and security failures today. 11: segmentation fault ("segfault", SIGSEGV) Why not use C? accessing address outside legal area of memory • Probably not the right language for your next personal project. 10: bus error (SIGBUS) • It "gets out of the programmer's way" accessing misaligned or other problematic address even when the programmer is unwittingly running toward a cliff. • Many advances in programming language design since then have More to come on debugging! produced languages that fix C's problems while keeping strengths.

Programming with Memory 63 Programming with Memory 64