
6.006 Course Notes

Wanlin Li

Fall 2018

1 September 6

• Computation

• Definition. Problem: matching inputs to correct outputs

• Definition. Algorithm: procedure/function matching each input to a single output, correct if it matches the problem

• Want program to be able to handle infinite space of inputs, arbitrarily large size of input

• Want finite and efficient program, need recursive code

• Efficiency determined by asymptotic growth Θ()

• O(f(n)) is all functions below order of growth of f(n)

• Ω(f(n)) is all functions above order of growth of f(n)

• Θ(f(n)) is tight bound

• Definition. Log-linear: Θ(n log n)

• Definition. Exponential: b^Θ(n) for some constant b > 1

• Algorithm generally considered efficient if runs in polynomial time

• Model of computation: computer has memory (RAM - random access memory) and CPU for performing computation; what operations are allowed in constant time

• CPU has registers to access and process memory address, needs log n bits to handle size n input

• Memory chunked into “words” of size at least log n


• E.g. 32-bit machine can handle 2^32 bytes ∼ 4 GB, 64-bit machine can handle ∼ 10^10 GB

• Model of computation is word-RAM

• Ways to solve a problem: design new algorithm or reduce to already known algorithm (particularly search in a data structure, sort, shortest paths)

• Classification of algorithms based on the graph of function calls

• Class examples and design paradigms: brute force (completely disconnected), decrease and conquer (path), divide and conquer (tree with branches), dynamic programming (directed acyclic graph), greedy (choose which paths on directed acyclic graph to take)

• Computation doesn’t have to be a tree, just has to be directed acyclic graph

2 September 11

• Example. Check if array contains item

def contains(A, v):
    for i in range(len(A)):
        if A[i] == v:
            return True
    return False

• Other option to call function recursively using additional parameter, decrease and conquer; contains(A, v, i = 0)
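• Example. A possible recursive version with the extra index parameter (a sketch, not the lecture's code):

def contains(A, v, i=0):              # decrease and conquer on suffix A[i:]
    if i >= len(A):                   # base case: empty suffix
        return False
    if A[i] == v:
        return True
    return contains(A, v, i + 1)      # recurse on a smaller suffix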

• Divide and conquer would check if middle element is desired element, otherwise return if element is in either half of array; still O(n)

• Maximum subarray sum problem: what is the largest sum of any nonempty (contiguous) subarray?

• Example. Brute force

def MSS(A):
    m = A[0]
    for j in range(1, len(A) + 1):
        for i in range(0, j):
            m = max(m, SS(A, i, j))
    return m

where SS is a helper function for summing a subarray; O(n^3) because there are O(n^2) nodes and each requires linear time
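• Example. A minimal SS helper consistent with the description above (the notes only name the function):

def SS(A, i, j):                      # sum of subarray A[i:j], Θ(j - i) time
    s = 0
    for k in range(i, j):
        s += A[k]
    return s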

• Divide and conquer approach: MSS fully in left half of array, fully in right half of array, or contains middle two elements


def MSS(A, i=0, j=None):
    if j is None:
        j = len(A)
    if j - i == 1:
        return A[i]
    c = (i + j) // 2
    return max(MSS(A, i, c), MSS(A, c, j),
               MSS_EA(A, i, c) + MSS_SA(A, c, j))

def MSS_SA(A, i, j):                  # max sum of a subarray starting at i
    s = m = A[i]
    for k in range(1, j - i):
        s += A[i + k]
        m = max(s, m)
    return m

def MSS_EA(A, i, j):                  # max sum of a subarray ending at j - 1
    s = m = A[j - 1]
    for k in range(1, j - i):
        s += A[j - 1 - k]
        m = max(s, m)
    return m

which is O(n log n); O(n) from the subarray spanning both halves, plus a size-n/2 subproblem from each half of the array → recursion tree has log n levels, each requiring O(n) work, so total runtime is O(n log n)

• As dynamic programming,

def MSS(A):
    m = mss_ea = A[0]
    for i in range(1, len(A)):
        mss_ea = A[i] + max(0, mss_ea)
        m = max(m, mss_ea)
    return m

3 September 13

• Sorting problem: input is array, output is permutation of the array in sorted order

• In place sorting: B = A and O(1) extra space (no second array is created, A is modified); list.sort is in place, sorted is not in place sorting

• Permutation sort: check if each permutation B of A is sorted, takes Θ(n · n!) time (n! permutations and Θ(n) time to check)
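• Example. A permutation sort sketch (using itertools; not from the notes):

from itertools import permutations

def permutation_sort(A):
    for B in permutations(A):         # n! candidate orders
        if all(B[i] <= B[i + 1] for i in range(len(B) - 1)):  # Θ(n) check
            return list(B)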

• Insertion sort: decrease and conquer, in place, recursively sort all but one item; sort first n − 1 elements, then insert A[n − 1] in the correct place; shifting elements of array takes linear time so overall time is O(n^2)

• Example. Bottom-up implementation


for i in range(1, n):
    while i > 0 and A[i] < A[i - 1]:  # guard i > 0 so we stop at the front
        A[i], A[i - 1] = A[i - 1], A[i]
        i -= 1
return A

• T(n): running time of algorithm on input of size n, but nature of input also matters

• T (n) is the worst case running time on input of size n, worst case bound gives guarantee on running time

• Insertion sort is O(n^2) due to two nested loops of ≤ n iterations, Ω(n^2) in reverse sorted case (just need a bad case for lower bound), so T(n) = Θ(n^2)

• Divide and conquer algorithm: if size n input divided into a subproblems of size n/b, T(n) = aT(n/b) + f(n) where f(n) is the time needed to divide and combine

• Merge sort: recursively sort A[: n/2] and A[n/2 :], then merge left and right halves into output by comparing current elements of halves

• Example.

def merge_sort(A, i=0, j=None):
    if j is None:
        j = len(A)
    if j - i == 1:
        return [A[i]]
    m = (i + j) // 2
    L = merge_sort(A[:m])
    R = merge_sort(A[m:])
    return list(merge(L, R))

def merge(L, R):
    l = r = 0
    while l < len(L) or r < len(R):
        if l < len(L) and (r >= len(R) or L[l] <= R[r]):
            yield L[l]
            l += 1
        else:
            yield R[r]
            r += 1

• T(n) = 2T(n/2) + Θ(n), commonly called the merge sort recurrence

4 September 18

• Sequence interface and data structures (linked lists, dynamic arrays; amortization), set interface

• Definition. Interface (API/ADT): specification, what data can be maintained, supported operations and meaning, problem


• Definition. Data structure: implementation, how data gets stored, algorithm implementing operations, solution

• Python list uses dynamic array

• Static sequence: problem of maintaining a sequence of objects/items x_0, x_1, . . . , x_{n−1}; subject to len(), sequence iteration, at(i), set-at(i, x)

• Ideal solution to static sequence is static array, e.g.

def allocate(n):  # pretend static array
    return [None] * n

class StaticArray(ArraySequence):
    def __init__(self, n):
        self.n = n
        self.array = allocate(n)
    def len(self):
        return self.n
    def at(self, i):
        return self.array[i]
    def set_at(self, i, x):
        self.array[i] = x

• Special cases of left() (at(0)) and right() (at(n − 1))

• Static sequence is Θ(1) for everything except iteration

• Dynamic sequence: also includes insert-at(i,x) forcing all the subsequent items to shift back one; delete-at(i), insert/delete at left/right sides

• Special cases: stack (insert/delete right), queue (insert right or delete left), deque (insert/delete left/right)

• In static array, attempting to do dynamic process requires Θ(n) time because it allocates a new array and shifts the items

• Memory allocation: RAM can be viewed as large array, can allocate and initialize array of n words in Θ(n) time; ensures time is Ω(space)

• Ways to overcome problem of copying over information in static array: linked lists or dynamic arrays

• Linked lists allocate O(1) size arrays instead of contiguous block, use pointers; allocate structure of size 3 for each item, contains previous pointer, item, next pointer; constant time for each insertion and deletion but indexing takes Θ(n) time

• Dynamic arrays: still must allocate array of size n words in Θ(n) time, but instead of resizing array to exactly size n, maintain an invariant of size Θ(n) ≥ n; when inserting and size = n, want to increase size to 2n; when deleting down to size n/4, halve the size (here n is size of array)


• Python list doesn't necessarily use n/4 and 2n; any constant coefficients a, b work as long as a < 1 < b

• Worst-case insert right is still Θ(n), but very few inserts actually require the expensive call; resize only when n reaches a power of 2, total cost for n = 2^k insert rights from empty sequence is Θ(1 + 2 + · · · + 2^k) = Θ(n); amortized time is constant per operation
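• Example. A toy dynamic array illustrating the doubling/halving invariant (a sketch; constants 2 and 1/4 are one valid choice, not Python's actual policy):

class DynamicArray:
    def __init__(self):
        self.size = 1                     # allocated size
        self.n = 0                        # number of stored items
        self.array = [None] * self.size

    def _resize(self, size):              # Θ(n) copy, amortized away
        new = [None] * size
        for i in range(self.n):
            new[i] = self.array[i]
        self.array, self.size = new, size

    def insert_right(self, x):
        if self.n == self.size:           # full: double
            self._resize(2 * self.size)
        self.array[self.n] = x
        self.n += 1

    def delete_right(self):
        x = self.array[self.n - 1]
        self.n -= 1
        if self.n <= self.size // 4:      # sparse: halve
            self._resize(max(1, self.size // 2))
        return x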

• In algorithms, care about worst case; in data structures, care more about amor- tization

• Definition. Amortized analysis: only meaningful for data structures; operation has amortized cost T (n) if when k operations are performed, they cost at most kT (n) time

• Example. Insert right requires constant amortized time

• Insertion and deletion in the middle is still expensive

• Sequence cares about index of item, set cares about key of item

• Static set: find-key(k), basic iteration (can come out in any order)

• Dynamic set: includes insert(item), delete(key); static set and dynamic set are basically Python set/dictionary, solved by hashing

• Ordered set: find-next(key), find-previous(key), special cases of finding min/max

• Dynamic ordered set: delete min/max

5 September 20

• Sequence is like a checkout line (where objects are), set is like a raffle (what the object is)

• Priority queue: optimized for finding most important item; can find length, can insert at any time, can find max or find and remove max

• Priority queues good for sorting; given array A, insert all elements into priority queue and repeatedly remove max or remove min

• Multi-set API can have repeated elements (generally want this behavior in pri- ority queue)

• Priority queue: elements stored in array without caring about order (can be dynamic array), insert at end (Q.append) in O(1) (possibly amortized), swap max with end of array and pop last element, i.e. do one pass of selection sort in O(n)
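• Example. A sketch of this unsorted-array priority queue (class and method names are mine, not the lecture's):

class ArrayPriorityQueue:
    def __init__(self):
        self.Q = []                       # dynamic array, arbitrary order

    def insert(self, x):                  # O(1) amortized
        self.Q.append(x)

    def delete_max(self):                 # O(n): one selection-sort pass; assumes nonempty
        m = 0
        for i in range(1, len(self.Q)):
            if self.Q[i] > self.Q[m]:
                m = i
        self.Q[m], self.Q[-1] = self.Q[-1], self.Q[m]  # swap max to end
        return self.Q.pop()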


• Priority queue sort is not an in-place sort, instead a destructive sort; in-place is a destructive sort using O(1) extra memory

• Could also keep sorted array to make remove max O(1) time, but then insertion is O(n) time (becomes insertion sort)

• Binary heap is better balanced between remove max and insertion

• Binary tree recipe: fill complete binary tree in reading order

• Nodes labeled in sensible order so view of array as tree is just mental image

• Array is bijection with binary tree with specified shape

• Left child is 2i + 1, right child is 2i + 2, parent is ⌊(i − 1)/2⌋, depth is ⌊log(i + 1)⌋

• Definition. Node max heap property at node i: node i has property if A[i] ≥ A[2i + 1],A[2i + 2]

• Max heap is array with every node having node max heap property

• Max heap is O(1) for returning max, O(log n) for inserting new element and resorting into max heap

• Definition. Tree max heap property at node i: A[i] larger than all descendants

• Max heap insertion works by starting at the bottom and swapping with the parent until the heap property is restored

• Remove max swaps root with last leaf and pops element, then re-sorts tree from top to bottom in O(log n) time

• Heap sort: n inserts require O(n log n), n removals require O(n log n) so total is O(n log n); can be performed in place; if all elements are inserted at once, building the heap can be optimized to O(n) time
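• Example. Sketch of the standard sift-up/sift-down repairs on the array representation (helper names are mine, not the lecture's):

def sift_up(A, i):                        # after inserting at index i
    while i > 0 and A[i] > A[(i - 1) // 2]:
        A[i], A[(i - 1) // 2] = A[(i - 1) // 2], A[i]
        i = (i - 1) // 2

def sift_down(A, i, n):                   # after replacing root; heap holds A[:n]
    while 2 * i + 1 < n:
        c = 2 * i + 1                     # pick the larger child
        if c + 1 < n and A[c + 1] > A[c]:
            c += 1
        if A[i] >= A[c]:
            break
        A[i], A[c] = A[c], A[i]
        i = c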

6 September 25

• General purpose dynamic ordered (multi) set: like a priority queue but not re- stricted to min/max, like sorted array but is dynamic

• Dynamic set operations: insert, delete, find

• Ordered set operations: find min/max, successor, delete min/max

• API uses insert item with key; item can also be the key


• Linked binary tree: node and key with parent, left and right pointers; like a linked list, acyclic graph; can think of tree as pointer to root

• Binary search tree: satisfies the binary search tree property (items “kept in order”; at each node with key k, k ≥ every key in left subtree and ≤ every key in right subtree); binary search in frozen form

• Find(node, k): compare k to node.key, if k too large then move to right child and if k too small then move to left child

• Find(node) means find the smallest node of the tree rooted at that node

• Find-min(node): keep going to left child, if left child does not exist then return node

• In-order traversal: list everything to the left, then list current node, then list everything to the right; Θ(n) time

• Successor(node): if possible, return find-min(node.right); otherwise go up until the parent is on the right

• Successor is O(h), where h is height of tree

• Iterative traversal is O(n) rather than O(h) because each node is visited at most 3 times

• Insert: can always add new key as leaf following search algorithm

• Delete: find minimum of right subtree and move it to the node, then recurse

• Find, insert, delete, remove-min, all in O(h) time
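• Example. A minimal node class and find sketch matching the description above (a sketch, not the lecture's code):

class Node:
    def __init__(self, key):
        self.key = key
        self.parent = self.left = self.right = None

def find(node, k):                        # O(h) search from a subtree root
    while node is not None:
        if k == node.key:
            return node
        node = node.left if k < node.key else node.right
    return None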

• BST sort is O(n^2) due to uncertain depth of binary tree

7 September 27

• Height of node: number of edges on longest path down to leaf

• Depth of node: number of edges up to root

• Definition. Skew: height of right child minus height of left child; define height of an empty subtree to be −1

• Definition. AVL property: skew of each node has absolute value at most 1

• Definition. AVL tree: every node has AVL property

• If every node has AVL property, height of tree is at most 2 log n


• Let S(h) be the smallest number of nodes needed for an AVL tree of height h; compute S(h) recursively: S(h) = S(h − 1) + S(h − 2) + 1 ≥ 2S(h − 2) ≥ 2^{h/2} S(0), so a tree with n nodes has height at most 2 log n

• Augmenting BST nodes: every node has node.height and node.skew precomputed, at cost of dynamic operations maintaining augmented values

• Always insert something with height 0 and skew 0

• To delete from BST, find successor and delete recursively; for AVL tree, perform the normal BST operation and then find the lowest node whose subtree changed, update heights and skews up the tree

• Augmented values can be computed directly from children; changes propagate up the tree and take O(h) time total

• Other augmentations: have every node store min and max of subtree, allows returning min and max in O(1) time

• BST rotations: preserving BST property but reconfiguring the tree; if A, B are subtrees of X,X is left child of Y, and C is right subtree of Y, then rotate-right(Y) keeps A as left subtree of X but moves Y to right child of X and transfers B to left subtree of Y

• BST rotation requires changing pointers (O(1)) and updating heights/skews
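• Example. A rotate-right(Y) sketch on such linked nodes, ignoring the height/skew updates (an illustrative implementation, not the lecture's code):

def rotate_right(Y):                      # X = Y.left must exist
    X = Y.left
    X.parent = Y.parent                   # X takes Y's place
    if X.parent is not None:
        if X.parent.left is Y:
            X.parent.left = X
        else:
            X.parent.right = X
    Y.left = X.right                      # subtree B moves under Y
    if Y.left is not None:
        Y.left.parent = Y
    X.right = Y                           # Y becomes right child of X
    Y.parent = X
    return X                              # new subtree root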

• AVL rebalancing: during upward maintain pass, might hit node x with skew of ±2; problem when x has skew 2 and right child y has skew −1; rotate right at y and then rotate left at x fixes problem

8 October 2

Set Interface:

Data Structure   find     insert      delete   find-next  find-max  delete-max  Space (∼ ×n)
Unsorted Array   n        n           n        n          n         n           1
Linked List      n        1           n        n          n         n           3
Dynamic Array    n        1 (a)       n        n          n         n           4
Sorted Array     log n    n           n        log n      1         n           1
Max Heap         n        log n (a)   n        n          1         log n       1
Balanced BST     log n    log n       log n    log n      (1)       log n       5
Direct Access    1        1           1        u          u         u           u/n
Hash Table       1        1           1        n          n         n           1

((a) = amortized; (1) = O(1) with augmentation; Space is the constant factor on n words)

• Find(key), insert(item), delete(key), find-next(key)


• Comparison model: items as black boxes with comparison operation on keys, so time is Ω(number of comparisons)

• Binary decision tree algorithm to make comparisons, algorithm bounded by depth of decision tree

• Searching by in-order scan gives a decision tree of depth n; search has n + 1 possible outputs

• Question: what is smallest height for any binary tree with n + 1 leaves? (each output must be a leaf)

• Any comparison search algorithm must take Ω(log n) time, since a binary tree with n + 1 leaves has height ≥ log(n + 1); so binary search is optimal

• Better tree would have non-constant branching factor for decreased time

• Definition. Direct Access Array: store key in the index, e.g. if element is 3 then store it in index 3; decision tree has linear branching factor from root for depth of 1 so finding element takes Θ(1) time

• Direct access array inefficient in terms of space, would have to store everything

• Hashing: if u >> n, use smaller direct access array with size m = Θ(n), then use hash function to map u → m; smaller direct access array is hash table

• Space of u items maps to space of m items by function h; hash of key is h(k)

• m is number of things that are actually stored, u is number of possibilities total

• Definition. Collision: h is not injective; if k_1 ≠ k_2 and h(k_1) = h(k_2) then this is a collision; same index generated for more than one key

• Associate data structure (chain) with each space in direct access array

• Definition. Chaining: external data structure to hold all the elements with the same hash value to handle collisions; supports find, insert, delete (could be linked list, dynamic array)

• If collisions have constant size n/m = O(1), chain doesn't have to be efficient and hash function just has to minimize collisions

• Fixed hash function makes constant collision size impossible

• Sample hash function: division (mod)

• Universal hash function: example h_{ab}(k) = ((ak + b) mod p) mod m where p is some large prime and a, b ∈ F_p; defines hash family H = {h_{ab} | a, b ∈ F_p}

• Fix p, m at the beginning and then choose random hash function for input


• Definition. Universal hash function: P_{h∈H}(h(k_i) = h(k_j)) ≤ 1/m for fixed k_i ≠ k_j

• X_{ij} is an indicator random variable equal to 1 if h(k_i) = h(k_j), 0 otherwise; X_i = Σ_j X_{ij} is a random variable with expectation n/m

• Expected length of a chain is n/m, which is highly desirable
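• Example. A sketch of the h_{ab} family above (the specific prime p and key range are arbitrary illustrative choices):

import random

p = (1 << 31) - 1                         # a large prime; keys assumed in range(p)

def make_hash(m):                         # draw a random h_ab from the family H
    a = random.randrange(1, p)            # a != 0 for universality
    b = random.randrange(p)
    return lambda k: ((a * k + b) % p) % m

h = make_hash(100)                        # hash keys into a table of size m = 100
print(h(12345))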

9 October 4

• Comparison search cannot do better than Ω(log n), direct access array faster but requires too much space; hash range u down to range m = Θ(n) to create expected O(1) find, amortized for dynamic

• Comparison sort lower bound: bound height of tree by number of leaves; n! possible outputs, so using Stirling's approximation comparison sort must be Ω(log(n!)) = Ω(n log n)

• Direct access array sort of unique keys: initialize direct access array with size u = (max key + 1), insert each key into direct access array in O(n); then scan through direct access array to output items in order, taking O(u)

• Increasing the range so u = Θ(n^2) makes direct access array sorting worse; represent each key as a number k = an + b and store as tuple (a, b), base n

• Sort by second item first, then first item (sort in same space) while keeping same keyed items in same order; this ensures final output will be in correct order

• Stability: duplicate keys in same order in sorted array as in input array; order of repeated keys matters, want tuple sort to be stable

• Sort by least significant first if sort is guaranteed to be stable

• Counting sort: store collisions of duplicate keys in a sequence/queue data structure; chaining

• Keys first placed into direct access array by second tuple element with stability, then read out into second (normal) array; process then repeated for first digit, direct access array chains read out into sorted array

• To sort elements in range n^c, need O(cn) time, which is still O(n) time for a constant number of counting sorts

• Radix sort: if u < n^c for some constant c, use tuple sort to repeatedly sort digits from least to most significant using stable counting sort; O(n) sorting time
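• Example. A stable counting sort and the radix sort built on it (a sketch assuming nonnegative integer keys and n ≥ 2; key extracts base-n digits):

def counting_sort(A, u, key=lambda x: x):
    chains = [[] for _ in range(u)]       # direct access array of chains
    for x in A:                           # insert in input order => stable
        chains[key(x)].append(x)
    return [x for chain in chains for x in chain]

def radix_sort(A, u):                     # keys in range(u), u <= n**c
    n = len(A)
    c = 1
    while n ** c < u:                     # number of base-n digits needed
        c += 1
    for d in range(c):                    # least significant digit first
        A = counting_sort(A, n, key=lambda x, d=d: (x // n ** d) % n)
    return A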

10 October 11

• Graph edges as pairs of vertices; ordered pairs mean directed graph and unordered pairs mean undirected graph

• Length of path is number of edges

• (Unweighted) shortest path problem: from u ∈ V to v ∈ V, find path of minimum possible length from u to v

• δ(u, v) = min length of path from u to v or ∞ when there is no path

• Single pair problem: δ(s, t) and minimum path

• Single source: δ(s, v) for all v ∈ V and shortest path tree containing a shortest path from s to every v

• All pairs shortest path problem: δ(u, v) for all u, v ∈ V

• Graph representation: adjacency sets (neighbors), for each vertex u ∈ V store (outgoing) neighbors of u Adj(u) = {v ∈ V |(u, v) ∈ E}

• Virtually any data structure can be used for adjacency set, only iteration through set matters; adjacency list uses linked list to represent adjacency set but Python list also works

• To test if edge is in graph, hash table is best

• Array representation: assume vertices are {0, . . . , |V| − 1} so adjacency is array of set data structures; in Python this is like a dictionary, assume each vertex is some hashable thing

• Object representation: each vertex is some object and v.Adj stores adjacency set of v

• Single source shortest-path problem (SSSP)

• Breadth first search (BFS): given adjacency sets and source vertex, solves SSSP and gives shortest path tree; visits vertices v ∈ V in order by δ(s, v), i.e. start at δ(s, v) = 0 when v = s and then find the next vertices

• Want to exclude any edges going backwards or staying in the same level; ignore repeats

• BFS takes O(V + E) time which is linear time for a graph → BFS is fast
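• Example. A BFS sketch in the same adjacency-list style the notes use for DFS (a sketch, not the lecture's code):

def BFS(Adj, s):
    parent = [None] * len(Adj)            # BFS tree / shortest path tree
    d = [None] * len(Adj)                 # d[v] = delta(s, v)
    parent[s], d[s] = s, 0
    frontier = [s]                        # all vertices at current level
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in Adj[u]:              # edge (u, v)
                if parent[v] is None:     # ignore repeats and back edges
                    parent[v] = u
                    d[v] = d[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return parent, d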

11 October 18

• Adjacency set representation of graph can be stored as any type of list

• BFS: explore graph level by level

• Depth first search DFS: natural recursive exploration of graph, depth before breadth

• Example. Traversal:

def DFS(Adj, s):
    parent = [None] * len(Adj)   # DFS tree
    def visit(u):
        for v in Adj[u]:         # edge (u, v)
            if parent[v] is None:
                parent[v] = u
                visit(v)
    parent[s] = s                # s is root
    visit(s)

each vertex visited at most once; pay |Adj(u)| work local to visit(u), which sums to O(E) time; initialization takes O(V) time so overall O(V + E) time (linear for a graph)

• Parent pointers form tree; tree edges

• DFS better for maze with one solver

• DFS does not solve for shortest path, has shorter code

• BFS and DFS both check for repeated vertices visited, only differ in order of visiting vertices

• BFS uses queue to store order of visited vertices, DFS uses stack to store order of visited vertices; otherwise exactly the same

• DFS creates tree paths, at most one path between any two points

• BFS, DFS not guaranteed to visit entire graph, DFS especially can be modified to explore whole graph; try all vertices as starting point, but skip ones already seen by previous search

• Add the code

for s in range(len(Adj)):        # try all sources
    if parent[s] is None:
        parent[s] = s            # try s as root
        visit(s)

to try all possible roots, still takes same asymptotic time


• DFS outputs subgraph of disconnected trees

• Edge classification of graph:

1. Tree edges: edges of forest found by DFS
2. Forward edge: edge from one vertex of a tree to one of its descendants
3. Backward edge: edge from one vertex of a tree to one of its ancestors (or to self)
4. Cross edge: edge from one vertex to another without direct ancestry relationship, not necessarily even in the same tree

• Cycle detection: graph G has a cycle iff full DFS has a back edge

• Classification of graph edges can be determined by including timer in full DFS algorithm and considering inclusion of time intervals

• Topological sorting: given directed acyclic graph, order vertices such that for any edge (u, v), order u before order v; solution of topological sorting is reverse of full DFS finishing times, i.e. append vertex to order after visiting everything in its subtree and then reverse order
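• Example. Topological sort as reversed full-DFS finishing order (a sketch combining the full-DFS code above with a finish-time list):

def topological_sort(Adj):
    parent = [None] * len(Adj)
    order = []                            # finishing order
    def visit(u):
        for v in Adj[u]:
            if parent[v] is None:
                parent[v] = u
                visit(v)
        order.append(u)                   # append after finishing u's subtree
    for s in range(len(Adj)):             # full DFS: try all sources
        if parent[s] is None:
            parent[s] = s
            visit(s)
    order.reverse()                       # reverse finishing times
    return order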

12 October 23

• Single source shortest path, shortest path tree

• Edge-weighted graph: directed graph with edge weight function, weight of path is sum of weights of edges of path

• Edge representation now includes adjacent vertex and weight of connecting edge; one possibility as dictionary of dictionaries or tuples

• Adj is now list of keys in dictionary but weights can also be included

• Negative-weight cycle can prevent existence of minimum weight path; in this case δ(u, v) = −∞

• Instead of minimum of path weight, use infimum

• Brute force algorithm for shortest path: attempt to list all paths, not guaranteed to terminate; in directed acyclic graph this is still exponential

• Subpath property: subpaths of shortest paths are also shortest paths

• Tree property: there exists a tree containing a shortest path from s to every vertex v ∈ V where δ(s, v) is finite


• Relaxation meta-algorithm: maintain distance estimate d[v] with invariant that d[v] ≥ δ(s, v) for all v; goal for d[v] = δ(s, v)

• Relaxing edge (u, v):

def relax(u, v):
    if d[v] > d[u] + w(u, v):
        d[v] = d[u] + w(u, v)
        parent[v] = u

• Safety lemma: relax(u,v) maintains d[v] ≥ δ(s, v)

• Termination lemma: if no edges can be relaxed, d[v] = δ(s, v) for all v

• Triangle inequality δ(u, v) ≤ δ(u, x) + δ(x, v)

• For DAG, relax edges in topological sort order to minimize runtime
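• Example. A DAG relaxation sketch: compute a topological order by DFS, then relax all outgoing edges in that order (assumes the graph is acyclic; w(u, v) is a weight function):

import math

def dag_shortest_paths(Adj, w, s):
    order, seen = [], [False] * len(Adj)  # topological order via DFS finish times
    def visit(u):
        seen[u] = True
        for v in Adj[u]:
            if not seen[v]:
                visit(v)
        order.append(u)
    for u in range(len(Adj)):
        if not seen[u]:
            visit(u)
    order.reverse()

    d = [math.inf] * len(Adj)
    parent = [None] * len(Adj)
    d[s], parent[s] = 0, s
    for u in order:                       # relax edges in topological order
        for v in Adj[u]:
            if d[v] > d[u] + w(u, v):     # relax(u, v)
                d[v] = d[u] + w(u, v)
                parent[v] = u
    return d, parent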

13 October 25

• δ(u, v) is minimum weight of u, v path but could be ±∞

• SSSP computes δ(s, v) for all v

• Relaxation framework: d[v] always ≥ δ(s, v), relax in some order; if no edge is relaxable, done

• Initialize all d values to infinity except source

• Careless choice of edge ordering can result in exponential runtime

• With DAG, exists topological order and can use that; relax all edges from vertices in topological order

• Claim that every shortest path gets relaxed in order, i.e. if path v0 → v1 → ... → vk then v0, v1 is relaxed first

• Claim: when vi−1 gets visited, d[vi] = δ(s, vi), proof by induction

• Subpaths of shortest paths are shortest

• Bellman Ford: allows cycles and negative weights, full-generality and intro- duces possibility of negative weight cycle

• Possibilities: if negative weight cycle is detected, abort; compute all δ(s, v), even −∞; find a negative weight cycle


• Edge update order: E = {e1, e2, . . . , en}, one round is relaxing every edge once, repeat for |V | − 1 rounds; must handle negative weight cycles somehow, runs in O(|V ||E|) time by making list of edges first

• Correctness: if δ(s, v) is finite, there exists a shortest path from s to v with at most |V | − 1 edges that does not repeat vertices (simple path)

• Take shortest s, v path with fewest number of edges, this path cannot have cycles

• Bellman Ford correctness: take some minimum weight path s, v with at most V − 1 edges if δ(s, v) finite, path v0 → ... → vk = v

• Sometime in round 1, edge v0 → v1 gets relaxed, then in second round v1 → v2 gets relaxed, etc.

• Detect negative weight cycle: if any edge can still be relaxed after running the original algorithm, raise ValueError
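• Example. A Bellman-Ford sketch with the ValueError check just described (Adj as adjacency lists, w a weight function; a sketch, not the lecture's code):

import math

def bellman_ford(Adj, w, s):
    d = [math.inf] * len(Adj)
    parent = [None] * len(Adj)
    d[s], parent[s] = 0, s
    edges = [(u, v) for u in range(len(Adj)) for v in Adj[u]]
    for _ in range(len(Adj) - 1):         # |V| - 1 rounds
        for u, v in edges:                # one round: relax every edge once
            if d[v] > d[u] + w(u, v):
                d[v] = d[u] + w(u, v)
                parent[v] = u
    for u, v in edges:                    # still relaxable => negative cycle
        if d[v] > d[u] + w(u, v):
            raise ValueError("negative weight cycle")
    return d, parent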

• Find all min weight paths: if after algorithm, any edge (u, v) can be relaxed, then path to v has weight negative infinity and any vertex reachable from v has weight negative infinity; every −∞ found because every negative weight cycle has some relaxable edge

14 October 30

• Dijkstra: SSSP with weight ≥ 0 for all edges, runs in O(V log V + E)

• BFS is special case of Dijkstra, intuition is to travel down edges at unit speed and find all vertices reachable within time t

• Example. If weights are small integers, subdivide edges into unit edges and run BFS but this is not general enough

• Still in relaxation framework, keep track of parents and estimated distance

• Update estimated distances based on order of “events”, travel along edges from vertices based on weight order

• Example code:

import math

def dijkstra_sssp(Adj, w, s):
    parent = [None] * len(Adj)
    parent[s] = s
    d = [math.inf] * len(Adj)
    d[s] = 0
    # PriorityQueue and Item are the course's priority queue interface
    Q = PriorityQueue.build(Item(id=u, key=d[u]) for u in range(len(Adj)))
    while len(Q) > 0:
        u = Q.delete_min().id            # delete and process u
        for v in Adj[u]:
            if d[v] > d[u] + w(u, v):    # relax edge (u, v)
                d[v] = d[u] + w(u, v)
                parent[v] = u
                Q.decrease_key(id=v, new_key=d[v])
    return d, parent

• Binary heap: cross-reference with DAA/hash table to make updating keys faster

• Types of priority queues: binary heap, DAA/hash table

• Dijkstra runtime:

Queue            Build    Delete Min   Decrease Key   Total
DAA Heap         V        V            1              V^2 + E = O(V^2)
Hash Table Heap  V (e)    V (e)        1 (e)          V^2 (e)
Binary Heap      V        log V        log V          (V + E) log V
Fibonacci Heap   V        log V (a)    1 (a)          V log V + E

((a) = amortized, (e) = expected)

• In practice, binary heap; in theory, Fibonacci heap (better asymptotic but higher constant)

• Dijkstra correctness: relaxation framework, loop invariant with B the set of nodes that have been decreased and processed; every round adds one vertex to B

• Claim: for every u ∈ B, d[u] = δ(s, u) and for every v ∉ B, d[v] is the min weight of an s − v path that does not leave B until the very last node

• Base case: only one such path, goes to s and has weight 0

• B contains some nodes, M = min{d[v] | v ∉ B}; claim all paths leaving B have weight ≥ M by loop invariant

• For any v ∉ B, δ(s, v) ≥ M because any path leaving B has weight ≥ M

• Single pair shortest path problem: single s, single t, use bidirectional Dijkstra and search from both ends until frontiers meet, i.e. when some vertex has been processed in both runs

• Double processed vertex not necessarily on shortest path (paths can meet in middle of edge), must find node with minimal d_s[x] + d_t[x]

15 November 1

• Shortest path algorithms: use first that applies in order of DAG relaxation, BFS unweighted, Dijkstra (weights nonnegative), Bellman-Ford (general)

• All pairs shortest paths: compute δ(u, v) for all pairs u, v; output of size O(V^2)

• One strategy is to run SSSP from each vertex; Bellman-Ford would be O(V^2 E) but this is not optimal

• Idea to change weights so all are nonnegative while preserving shortest paths

• Add some value to all incoming edges of one vertex and subtract the same value from all outgoing edges of the same vertex; this preserves shortest paths

• Johnson: add new vertex x with a weight-0 edge to every other vertex, then h(v) = −δ(x, v) where h(v) is value to add to incoming edges and subtract from outgoing edges

• Claim: for all edges (u, v), w′(u, v) = w(u, v) + δ(x, u) − δ(x, v) ≥ 0, which is true by triangle inequality

• Johnson's algorithm: add vertex x, compute δ(x, v) for all v, reweight by h(v) = −δ(x, v) to get graph with nonnegative edge weights; then run Dijkstra from each vertex in G′, where shortest path trees are the same

• Johnson: O(V^2 log V + V E)

16 November 6

• Search problems, sorting problems, shortest path problems

• Classify algorithms based on dependencies of the recursive method; in general dependencies just need to form a DAG

• Recursive approach: define subproblems, find recurrence, identify base cases, compute solution from subproblems, analyze run time

• “SR.BST” (subproblems, relate subproblems, base cases, solution, time)

• Example. Fibonacci numbers: subproblem is kth Fibonacci number F_k, recurrence is F_k = F_{k−1} + F_{k−2}, base case F_0 = F_1 = 1, solution is F_n

• Fibonacci call graph can be a tree with Ω(2^{n/2}) vertices or a DAG with n vertices using a dictionary
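• Example. Memoization turns the Fibonacci call tree into the n-vertex DAG (a sketch using the notes' convention F_0 = F_1 = 1):

def fib(n, memo=None):                    # memo dictionary turns tree into DAG
    if memo is None:
        memo = {}
    if n <= 1:
        return 1                          # base cases F_0 = F_1 = 1
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]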


• Subproblems should not be computed more than once

• Dynamic programming or “careful brute force”

• SSSP with dynamic programming: x(v) = δ(s, v), x(v) = min{x(u)+w(u, v)|(u, v) ∈ E}, DAG shortest path is relaxing edges in topological sort order; base cases x(s) = 0 and x(v) = ∞ for anything without dependency, return x(t) and takes O(V + E) time

17 November 8

• Define subproblems (describe meaning of subproblem, record partial state), relate subproblems (topological order to argue relations form a DAG), identify base cases (solutions for all reachable independent subproblems), compute solution from subproblems, analyze run time

• Want recurrence relation that makes dependencies acyclic, use “smaller” subproblems

• SSSP by dynamic programming: subproblems are x(v, k), the weight of shortest path from s → v using exactly k edges; relation x(v, k) depends on min{x(u, k − 1) + w(u, v) | (u, v) ∈ E}

• Base cases x(v, 0), solution is minimum of x(v, i) for all i ≤ |V | − 1; looks like Bellman-Ford, can also detect negative weight cycle

• Time: V^2 subproblems; total work is Σ_{v∈V} Σ_{k∈[|V|]} work(x(v, k)) = Σ_{v∈V} O(deg_in(v) · V)

which is O(VE) just as Bellman-Ford

• Run DAG relaxation first if the graph is acyclic, otherwise Bellman-Ford

• Example. Cutting rod of length n with price list for each possible integer length; x(i) is maximum revenue from a rod of length i
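• Example. A bottom-up rod cutting sketch (price is a hypothetical list with price[j] = revenue for a length-j piece, price[0] unused):

def cut_rod(n, price):
    x = [0] * (n + 1)                     # x[i] = max revenue for rod of length i
    for i in range(1, n + 1):             # try every length j of the first piece
        x[i] = max(price[j] + x[i - j] for j in range(1, i + 1))
    return x[n]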

18 November 13

• Recurrence from subsequence, prefix/suffix

• Example. Test similarity between two strings

• Edit distance: input strings A, B and output the minimum number of edits to transform A into B


• Edits: insert/delete character somewhere in string, replace character with other character

• x(i, j): minimum number of edits needed to transform the prefix ending at the ith letter of A into the prefix ending at the jth letter of B

• If A[i] = B[j], number of edits is x(i − 1, j − 1); otherwise 1 + x(i, j − 1) if inserting into A, 1 + x(i − 1, j) if deleting from A, 1 + x(i − 1, j − 1) if replacing last character

• Base cases x(0, j) requires j insertions, x(i, 0) requires i deletions

• Constant work at each subproblem but total work is O(|A||B|)
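• Example. A bottom-up edit distance sketch following the recurrence above:

def edit_distance(A, B):
    x = [[0] * (len(B) + 1) for _ in range(len(A) + 1)]
    for i in range(len(A) + 1):
        x[i][0] = i                       # i deletions from A
    for j in range(len(B) + 1):
        x[0][j] = j                       # j insertions into A
    for i in range(1, len(A) + 1):
        for j in range(1, len(B) + 1):
            if A[i - 1] == B[j - 1]:
                x[i][j] = x[i - 1][j - 1]
            else:                         # insert, delete, or replace
                x[i][j] = 1 + min(x[i][j - 1], x[i - 1][j], x[i - 1][j - 1])
    return x[len(A)][len(B)]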

• Example. Arithmetic parenthesization: insert parentheses to maximize value of expression

19 November 20

• Arithmetic parenthesization problem: consider the operation that occurs last, break into subproblems on either side; need to solve max problem and min problem

• Sequence of n+1 numbers separated by sequence of n binary operations of {+, ×}

• x(i, j, ±1) representing max/min of subsequence i, j

• x(i, j, 1) = max_k max{x(i, k, 1) p_k x(k + 1, j, 1), x(i, k, −1) p_k x(k + 1, j, −1)}, where p_k ∈ {+, ×} is the kth operator

• x(i, j, −1) is minimum of all four extreme cases

• Base case when i = j, no operators left; solution x(1, n, 1) computed top down with memoization or bottom up in increasing j − i value

• Time: O(n^2) subproblems, each takes O(n) to compute so overall O(n^3) runtime

• Egg drop problem: building with F floors, have E eggs and D drops and want to find critical height k above which every egg will break in fewest number of drops

• With 100 floors and 1 egg, must try every floor from bottom to kth floor

• With infinitely many eggs, use binary search in O(log F ) drops

• With F = 100 and 2 eggs, use first egg to find group of 10 in which egg breaks and use second egg to find exact floor

• Start at floor t, if it breaks solve E − 1, t − 1 and if it doesn’t break solve E,F − t

• X(E, F) = min_t(1 + max{X(E − 1, t − 1), X(E, F − t)}), which is O(EF^2); optimal runtime is O(E log F)

20 November 27

• Algorithm to solve all problems: x(a, b) is subproblem, x(a, b) = R(a, b, x(a′, b′)) with (a′, b′) < (a, b) in some sense and some relation R

• Base cases x(a, b) = B(a, b) not dependent on x when recurrence is too small to hold, S(a, b) is original problem

• Choosing subproblems: input tends to be sequence (prefixes/suffixes or substring) or number (smaller numbers)

• Identify feature of subproblem's solution allowing for reduction to smaller subproblem

• Lucky algorithm: guess feature of subproblem solution, recurse and combine into solution (lucky algorithm does not actually exist)

• Simulate guessing with for loop over all choices and return best solution

• Might need to add additional subproblems: relation has cycles, current subproblems don't cover everything that needs to be known, need multiple answers to each subproblem

• Example. Subset sum: given n positive integers A = {a_0, . . . , a_{n−1}} and target sum t, does there exist a subset S ⊆ A with Σ S = t; runs in O(nt) time
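• Example. A subset sum sketch in O(nt), tracking the set of reachable sums (an equivalent formulation of the DP):

def subset_sum(A, t):
    reachable = {0}                       # sums achievable by some subset so far
    for a in A:                           # each item: keep or add to every sum
        reachable |= {s + a for s in reachable if s + a <= t}
    return t in reachable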

21 November 29

• Subset sum in O(nt), not polynomial

• Definition. Polynomial time: runtime is polynomial in size of input; input size is Θ(n) words if numbers t, a_i fit in a word, O(n log t) bits

• t is exponential in its encoding length log t

• Definition. Pseudopolynomial time: runtime polynomial in input size and numbers in input

• O(nt) is pseudopolynomial, optimal

• Definition. P: class of problems solvable in polynomial time

• EXP is class of problems solvable in exponential time 2^{n^{O(1)}} where n is size of input

• Example. Negative weight cycle detection in P


• Example. n × n chess in EXP but not in P

• R is class of problems solvable in finite time

• Halting problem: input is program, question of whether the program termi- nates; halting problem not in R

• Most decision problems are uncomputable (not in R)

• Definition. Decision problem: yes or no as outputs; function from all possible inputs to {yes, no}, all inputs also a string of bits

• An algorithm corresponds to a bit string, which is an integer; the set of algorithms is countably infinite

• Infinite string of bits from every decision problem, treating as base 2 decimal gives a real in [0, 1]

• [0, 1] not countable, way more problems than there are algorithms; cannot enumerate problems

• Tetris problem: given board configuration and next n blocks to insert, question if survival is possible; in EXP but unknown if in P

• Definition. NP: class of decision problems solvable in polynomial time by nondeterministic algorithms; also the class of decision problems with polynomial length “certificate”/“proof” when answer is yes, i.e. checkable in polynomial time

• Example. Lucky algorithm would be in NP

• Definition. Nondeterministic algorithm: makes guesses and returns yes/no, guarantee that algorithm will find yes if possible

• Computational difficulty: P ⊆ NP ⊆ EXP ⊂ R

• Claim: if P ≠ NP then Tetris ∉ P because Tetris is NP-hard

• NP-hard: as hard as all problems in NP

• Example. Chess is EXP-hard

• Definition. Reduction: given two problems A, B, reduce some instance of A problem into B problem so that B solution can be converted to A solution

• If A to B reduction exists, A is at least as easy as B and B is at least as hard as A

• All problems in NP can be reduced to Tetris


• Example. Classic NP-complete problem: 3-partition, given n integers a_1, . . . , a_n between 0 and n, does there exist a partition into n/3 triples with the same sum; strongly NP-complete

• Jigsaw puzzle: with number a_i, create a_i × 1 jigsaw piece; jigsaw ≈ 3-partition

• Reduction from 3-partition to Tetris also exists

• Other NP-complete problems: longest simple path in graph, longest common subsequence among n strings, shortest paths in 3 dimensions

22 December 4

• Improvement of AVL tree: higher branching factor, including fusion trees with degree w^{1/5}, runtime O(log n / log log n)

• Batcher odd-even mergesort: merge n elements with first and second halves sorted, then recursively merge odd slices and even slices; compare and swap (if necessary) adjacent pairs

• Odd-even mergesort nicely parallelizable

• Las Vegas algorithm, Monte Carlo algorithm

23 December 6

• Solve problem by designing new algorithm or reducing to already solved problem

• Brute force, divide/decrease and conquer, dynamic programming

• Data structures: RAM, trees, reallocation

24 Overall Points

• Heap sort is an optimal comparison sort algorithm that is in place
