6.006 Course Notes
Wanlin Li
Fall 2018

1 September 6

• Computation
• Definition. Problem: matching inputs to correct outputs
• Definition. Algorithm: procedure/function matching each input to a single output, correct if it matches the problem
• Want program to be able to handle an infinite space of inputs, arbitrarily large input size
• Want a finite and efficient program, need recursive code
• Efficiency determined by asymptotic growth Θ(·)
• O(f(n)) is all functions below the order of growth of f(n)
• Ω(f(n)) is all functions above the order of growth of f(n)
• Θ(f(n)) is a tight bound
• Definition. Log-linear: Θ(n log n)
• Definition. Exponential: b^Θ(n^c)
• Algorithm generally considered efficient if it runs in polynomial time
• Model of computation: computer has memory (RAM - random access memory) and a CPU for performing computation; specifies what operations are allowed in constant time
• CPU has registers to access and process memory addresses, needs log n bits to address a size n input
• Memory chunked into "words" of size at least log n
• E.g. a 32-bit machine can handle 2^32 bytes ∼ 4 GB, a 64-bit machine can handle ∼ 10^10 GB
• Model of computation is word-RAM
• Ways to solve an algorithms problem: design a new algorithm or reduce to an already known algorithm (particularly search in data structure, sort, shortest path)
• Classification of algorithms based on the graph of function calls
• Class examples and design paradigms: brute force (completely disconnected), decrease and conquer (path), divide and conquer (tree with branches), dynamic programming (directed acyclic graph), greedy (choose which paths on directed acyclic graph to take)
• Computation doesn't have to be a tree, just has to be a directed acyclic graph

2 September 11

• Example. Check if array contains item

    def contains(A, v):
        for i in range(len(A)):
            if A[i] == v:
                return True
        return False

• Other option is to call the function recursively using an additional parameter, decrease and conquer; contains(A, v, i=0)
• Divide and conquer would check if the middle element is the desired element, otherwise return whether the element is in either half of the array; still O(n)
• Maximum subarray sum problem: what is the largest sum of any nonempty (contiguous) subarray?
• Example. Brute force

    def MSS(A):
        m = A[0]
        for j in range(1, len(A) + 1):
            for i in range(0, j):
                m = max(m, SS(A, i, j))
        return m

where SS is a helper function summing the subarray A[i:j] (not shown in the notes; see the sketch at the end of this lecture); O(n^3) because there are O(n^2) nodes and each requires linear time
• Divide and conquer approach: MSS is fully in the left half of the array, fully in the right half, or contains the middle two elements

    def MSS(A, i=0, j=None):
        if j is None:
            j = len(A)
        if j - i == 1:
            return A[i]
        c = (i + j) // 2
        return max(MSS(A, i, c), MSS(A, c, j), MSS_EA(A, i, c) + MSS_SA(A, c, j))

    def MSS_SA(A, i, j):
        # max sum of a subarray of A[i:j] starting at index i
        s = m = A[i]
        for k in range(1, j - i):
            s += A[i + k]
            m = max(s, m)
        return m

    def MSS_EA(A, i, j):
        # max sum of a subarray of A[i:j] ending at index j - 1
        s = m = A[j - 1]
        for k in range(1, j - i):
            s += A[j - 1 - k]
            m = max(s, m)
        return m

which is O(n log n): O(n) for the subarray spanning both halves, plus T(n/2) for each half of the array; the recursion tree has log n levels, each requiring O(n) work, so total runtime is O(n log n)
• As dynamic programming,

    def MSS(A):
        # mss_ea: maximum subarray sum ending at the current index i
        m = mss_ea = A[0]
        for i in range(1, len(A)):
            mss_ea = A[i] + max(0, mss_ea)
            m = max(m, mss_ea)
        return m
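Not in the original notes: a minimal sketch of the SS helper called by the brute-force MSS above, assuming SS(A, i, j) should return the sum of the subarray A[i:j] (the name and call signature come from the notes; the body is an assumption):

    def SS(A, i, j):
        # sum of the contiguous subarray A[i:j] (i inclusive, j exclusive)
        s = 0
        for k in range(i, j):
            s += A[k]
        return s

Each call takes Θ(j − i) time, which is where the extra factor of n in the Θ(n^3) brute-force bound comes from; precomputing prefix sums would make each subarray sum O(1) and bring brute force down to O(n^2).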
3 September 13

• Sorting problem: input is an array, output is a permutation of the array in sorted order
• In-place sorting: B = A and O(1) extra space (no second array is created, A is modified); list.sort is in place, sorted is not in-place sorting
• Permutation sort: check whether each permutation B of A is sorted; takes Θ(n · n!) time (n! permutations and Θ(n) time to check each)
• Insertion sort: decrease and conquer, in place, recursively sort all but one item; sort the first n − 1 elements, then insert A[n − 1] in the correct place; shifting elements of the array takes linear time, so overall time is O(n^2)
• Example. Bottom-up implementation

    def insertion_sort(A):
        for i in range(1, len(A)):
            while i > 0 and A[i] < A[i-1]:
                A[i], A[i-1] = A[i-1], A[i]
                i -= 1
        return A

• T(n): running time of the algorithm on an input of size n, but the nature of the input also matters
• T(n) is the worst-case running time on inputs of size n; a worst-case bound gives a guarantee on running time
• Insertion sort is O(n^2) due to two nested loops of ≤ n iterations each, and Ω(n^2) in the reverse-sorted case (just need one bad case for the lower bound), so T(n) = Θ(n^2)
• Divide and conquer algorithm: if a size n input is divided into a subproblems of size n/b, then T(n) = aT(n/b) + f(n), where f(n) is the time needed to divide and combine
• Merge sort: recursively sort A[:n/2] and A[n/2:]; then merge the left and right halves into the output by comparing the current elements of the halves
• Example.

    def merge_sort(A, i=0, j=None):
        if j is None: j = len(A)
        if j - i == 1: return [A[i]]
        m = (i + j) // 2
        L = merge_sort(A, i, m)
        R = merge_sort(A, m, j)
        return list(merge(L, R))

    def merge(L, R):
        l = r = 0
        while l < len(L) or r < len(R):
            if l < len(L) and (r >= len(R) or L[l] <= R[r]):
                yield L[l]
                l += 1
            else:
                yield R[r]
                r += 1

• T(n) = 2T(n/2) + Θ(n), commonly called the merge sort recurrence
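Not in the original notes, but as a quick check: unrolling this recurrence (assuming n is a power of 2 and writing cn for the Θ(n) merge cost) reproduces the recursion-tree argument used for MSS above:

    T(n) = 2T(n/2) + cn
         = 4T(n/4) + cn + cn
         = ...
         = 2^k T(n/2^k) + k·cn
         = n·T(1) + cn·log n        (taking k = log n, so n/2^k = 1)
         = Θ(n log n)

Each of the log n levels of the recursion tree contributes cn work, plus Θ(n) total for the n base cases.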
4 September 18

• Sequence interface and data structures (linked lists, dynamic arrays; amortization), set interface
• Definition. Interface (API/ADT): specification; what data can be maintained, supported operations and their meaning; the problem
• Definition. Data structure: implementation; how data gets stored, algorithms implementing the operations; the solution
• Python list uses a dynamic array
• Static sequence: problem of maintaining a sequence of objects/items x_0, x_1, ..., x_{n-1}, subject to len(), sequence iteration, at(i), set-at(i, x)
• Ideal solution to the static sequence problem is a static array, e.g.

    def allocate(n):  # pretend static array
        return [None] * n

    class StaticArray(ArraySequence):  # ArraySequence base class not shown in these notes
        def __init__(self, n):
            self.n = n
            self.array = allocate(n)
        def len(self): return self.n
        def at(self, i): return self.array[i]
        def set_at(self, i, x): self.array[i] = x

• Special cases of left() (at(0)) and right() (at(n − 1))
• Static sequence is Θ(1) for everything except iteration
• Dynamic sequence: also includes insert-at(i, x), forcing all subsequent items to shift back one; delete-at(i); insert/delete at the left/right sides
• Special cases: stack (insert/delete right), queue (insert right, delete left), deque (insert/delete left/right)
• In a static array, attempting a dynamic operation requires Θ(n) time because it allocates a new array and shifts the items over
• Memory allocation: RAM can be viewed as a large array; can allocate and initialize an array of n words in Θ(n) time; ensures time is Ω(space)
• Ways to overcome the problem of copying over information in a static array: linked lists or dynamic arrays
• Linked lists allocate O(1)-size arrays instead of one contiguous block and use pointers; allocate a structure of size 3 for each item, containing a previous pointer, the item, and a next pointer; constant time for each insertion and deletion at the ends, but indexing takes Θ(n) time
• Dynamic arrays: still must allocate an array of size n words in Θ(n) time, but instead of resizing the array to exactly size n, maintain an invariant of size Θ(n) ≥ n; when inserting and size = n, increase the size to 2n; when deleting and the number of items drops to n/4, halve the size (here n is the size of the array)
• Python list doesn't necessarily use n/4 and 2n; any constant coefficients a, b work as long as a < 1 < b
• Worst-case insert-right is still Θ(n), but very few inserts actually require the expensive call; resize only when n reaches a power of 2; total cost for n = 2^k insert-rights from an empty sequence is Θ(1 + 2 + ··· + 2^k) = Θ(n), so amortized time is constant per operation
• In algorithms, care about the worst case; in data structures, care more about amortization
• Definition. Amortized analysis: only meaningful for data structures; an operation has amortized cost T(n) if, when k operations are performed, they cost at most kT(n) time in total
• Example. Insert-right requires constant amortized time
• Insertion and deletion in the middle is still expensive
• Sequence cares about the index of an item, set cares about the key of an item
• Static set: find-key(k), basic iteration (items can come out in any order)
• Dynamic set: includes insert(item), delete(key); static set and dynamic set are basically Python set/dictionary, solved by hashing
• Ordered set: find-next(key), find-previous(key), special cases of finding min/max
• Dynamic ordered set: delete min/max

5 September 20

• Sequence is like a checkout line (where the objects are), set is like a raffle (what the object is)
• Priority queue: optimized for finding the most important item; can find length, can insert at any time, can find max or find and remove max
• Priority queues are good for sorting; given array A, insert all elements into a priority queue and repeatedly remove max or remove min
• Multi-set API can have repeated elements (generally want this behavior in a priority queue)
• Priority queue: elements stored in an array without caring about order (can be a dynamic array), insert at end (Q.append) in O(1) (possibly amortized), swap max with the end of the array and pop the last element, i.e.
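The notes break off here. Not from the original: a minimal sketch of the unsorted-array priority queue just described, assuming an insert / delete-max interface (the class and method names are assumptions):

    class ArrayPriorityQueue:
        def __init__(self):
            self.Q = []                # dynamic array; order is not maintained

        def __len__(self):
            return len(self.Q)

        def insert(self, x):           # O(1) amortized, just Q.append
            self.Q.append(x)

        def delete_max(self):          # Θ(n): scan for the max, swap it to the end, pop it
            m = 0
            for i in range(1, len(self.Q)):
                if self.Q[i] > self.Q[m]:
                    m = i
            self.Q[m], self.Q[-1] = self.Q[-1], self.Q[m]
            return self.Q.pop()

Sorting with this structure (n inserts, then n delete-max calls) costs Θ(n^2), which is essentially selection sort; a heap would bring the per-operation cost down to O(log n).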