EECS 336: Introduction to Algorithms Lecture 4 MSTs and

Reading: MIT notes on matroids, Sections Idea: Greedy by edge weight 4.5, 4.6. Algorithm: Kruskal’s Algorithm Last time: • T = ∅ • Greedy Stays Ahead • Sort edges by increasing cost. • Interval Scheduling • For each edge e (in sorted order): • if {e} ∪ S is acyclic add e to S Today: • else discard e.

• Minimum Spanning Example: • Kruskal’s Alg 6 5 2 3 • 1 4

Minimum Spanning Trees Runtime “maintaining minimal connectivity in a net- work, e.g., for broadcast” First attempt: Def: a spanning tree of a graph G =(V, E) is T ⊆ E s.t. T (n, m) ≤ m log m + X i i cost of |{z} detection 2 (a) (V, T ) is connected. ≈ m log m + m 2 (b) (V, T ) is acyclic. = O(m ).

Goal: given costs on edges, find spanning Idea: use “union-find” data structure: tree with minimum total cost, a.k.a., the (MST). • union: merges two sets Def: c(e) is cost of edge e ∈ E, c(S) is total cost of subset S ⊂ E. • find: returns the “name” of a set.

1 • m union or find operations costs Lemma 2: if G is conneted, Kruskal out- ∗ O(m log m) puts a tree. ∗ 2...2 (log m is inverse of 22 = O(log m)) Proof: (by condradiction) n| times{z } • use sets to keep track of connected Suppose not: commponents. • output T not connected. ⇒ T has (A, B)-cut with no edges. • e = (u, v) can be added iff u and v are ⇒ T and any e from v ∈ A to u ∈ B is acyclic. in different connected comps. • G is connected, (A, B)-cut in G has an edge e. ⇒ Kruskal considered adding e but didn’t. →←

∗ T (m)= m log m + m log m Lemma 3: (Augmentation Lemma) If = Θ(mlogm) F,H ⊂ E are forests and |F |≤|H| then exists e ∈ H \ F such that H ∪{e} is a forest. Trees, Forests, and Cuts Proof:

Def: G′ = (V, E′) is a subgraph of G = Lemma 1 (V, E) if E′ ⊆ E. ⇒ # components of (V, F ) > # comp. of (V,H) ⇒ must be edge e ∈ H \ F Def: An acyclic undirected graph is a forest between connected comps of (V, F ). ⇒ (V, F ∪{e}) is acyclic. Def: An acyclic connected undirected graph is a tree Theorem: The tree output by Kruskal is Lemma 1: If G =(V, E) is a forest then it optimal. has n − m connected components. Proof: (by contradiction) Proof: Induction (homework).

Def: A, B ⊆ V is a cut if A ∪ B = ∅ and • Let I = {i1,...,in−1} be elt’s of Greedy. A ∩ B = E. Edge e = (u, v) crosses cut if (in order) u ∈ A and v ∈ B (or vice versa). • Let J = {j1,...,jn−1} be elt’s of OPT. (in order) • Assume for condradiction: c(I) >c(J) Correctness • Let r be first index with c(jr)

2 ⇒ c(j) ≤ c(jr) • (⇒) same as for Theorem 1. ⇒ c(j) ≤ c(ir) ⇒ j considered before ir • (⇐) homework. • Suppose j considered after ik (k

Matroids

Def: A set system M =(E, I) where

• E is ground set. • I ⊆ 2E is set of compatible subsets of E.

Question: When does this greedy algorithm work?

Def: A matroid is a set system M = (E, I) satisfying:

M1 “subset property” if I ∈ I, all subsets of I are in I. M2 “augmentation property” if I,J ∈ I and |I| < |J|, then exists e ∈ J \ I such that I ∪{e} ∈ I.

(compatible sets also called indepenent sets).

Corollary: acyclic subgraphs are a matroid.

Theorem: greedy algorithm is optimal iff feasible outputs are a matroid.

Proof:

3