ECE750-TXB Lecture 1: Asymptotics
Todd L. Veldhuizen ECE750-TXB Lecture 1: Asymptotics [email protected]
Asymptotics Asymptotics: Motivation Todd L. Veldhuizen Bibliography [email protected]
Electrical & Computer Engineering University of Waterloo Canada
February 26, 2007
ECE750-TXB Motivation Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected] I We want to choose the best algorithm or data structure for the job. Asymptotics Asymptotics: Need characterizations of resource use, e.g., time, Motivation I Bibliography space; for circuits: area, depth.
I Many, many approaches:
I Worst Case Execution Time (WCET): for hard real-time applications
I Exact measurements for a specific problem size, e.g., number of gates in a 64-bit addition circuit. I Performance models, e.g., R∞, n1/2 for latency-throughput, HINT curves for linear algebra (characterize performance through different cache regimes), etc.
I ... ECE750-TXB Asymptotic analysis Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Motivation Bibliography I We will focus on Asymptotic analysis: a good “first approximation” of performance that describes behaviour on big problems
I Reasonably independent of:
I Machine details (e.g., 2 cycles for add+mult vs. 1 cycle) I Clock speed, programming language, compiler, etc.
ECE750-TXB Asymptotics: Brief history Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected] I Basic ideas originated in Paul du Bois-Reymond’s Asymptotics Infinit¨arcalc¨ul(‘calculus of infinities’) developed in the Asymptotics: 1870s. Motivation Bibliography I G. H. Hardy greatly expanded on Paul du Bois-Reymond’s ideas in his monograph Orders of Infinity (1910) [3].
I The “big-O” notation was first used by Bachmann (1894), and popularized by Landau (hence sometimes called “Landau notation.”)
I Adopted by computer scientists [4] to characterize resource consumption, independent of small machine differences, languages, compilers, etc. ECE750-TXB Basic asymptotic notations Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Asymptotic ≡ behaviour as n → ∞, where for our purposes Motivation n is the “problem size.” Bibliography Three basic notations:
I f ∼ g (“f and g are asymptotically equivalent”)
I f g (“f is asymptotically dominated by g”)
I f g (f and g are asymptotically bounded by one another)
ECE750-TXB Basic asymptotic notations Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: f (n) Motivation f ∼ g means lim = 1 n→∞ g(n) Bibliography
Example: 3x2 + 2x + 1 ∼ 3x2. ∼ is an equivalence relation:
I Transitive: (x ∼ y) ∧ (y ∼ z) ⇒ (x ∼ z)
I Reflexive: x ∼ x
I Symmetric: (x ∼ y) ⇒ (y ∼ x). Basic idea: We only care about the “leading term,” disregarding less quickly-growing terms. ECE750-TXB Basic asymptotic notations Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected] f (n) f g means lim sup < ∞ Asymptotics Asymptotics: n→∞ g(n) Motivation Bibliography f (n) i.e., g(n) is eventually bounded by a finite value.
I Basic idea: f grows more slowly than g, or just as quickly as g.
I is a preorder (or quasiorder):
I Transitive: (f g) ∧ (g h) ⇒ (f h). I Reflexive: f f
I fails to be a partial order because it is not antisymmetric: there are functions f , g where f g and g f but f 6= g.
I Variant: g f means f g.
ECE750-TXB Basic asymptotic notations Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Write f g when there are positive constants c1, c2 such that Asymptotics Asymptotics: Motivation f (n) Bibliography c ≤ ≤ c 1 g(n) 2
for sufficiently large n.
I Examples:
I n 2n I n (2 + sin πn)n
I is an equivalence relation. ECE750-TXB Strict forms Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Write f ≺ g when f g but f 6 g. Motivation Bibliography I Basic idea: f grows strictly less quickly than g f (n) I Equivalent: f g exactly when limn→∞ g(n) = 0. I Examples 2 3 I x ≺ x I log x ≺ x
I Variant: f g means g ≺ f
ECE750-TXB Orders of growth Lecture 1: Asymptotics
Todd L. Veldhuizen We can use ≺ as a “ruler” by which to judge the growth of [email protected] functions. Some common “tick marks” on this ruler are: Asymptotics n k 2 n n 2 Asymptotics: log log n ≺ log n ≺ log n ≺ n ≺ n ≺ n ≺ · · · ≺ 2 ≺ n! ≺ n ≺ 2 Motivation Bibliography We can always find in ≺ a dense total order without endpoints. i.e.,
I There is no slowest-growing function;
I There is no fastest-growing function;
I If f ≺ h we can always find a g such that f ≺ g ≺ h. (The canonical example of a dense total order without endpoints is Q, the rationals.) I This fact allows us to sketch graphs in which points on the axes are asymptotes. ECE750-TXB Big-O Notation Lecture 1: Asymptotics “Big-O” is a convenient family of notations for asymptotics: Todd L. Veldhuizen [email protected] O(g) ≡ {f : f g} Asymptotics Asymptotics: Motivation i.e., O(g) is the set of functions f so that f g. Bibliography 2 2 2 3/2 I O(n ) contains n , 7n , n, log n, n , 5,...
I Note that f ∈ O(g) means exactly f g.
I A standard abuse of notation is to treat a big-O expression as if it were a term:
x2 + 2x1/2 + 1 = x2 + O(x1/2) | {z } ≺x2 The above equation should be read as “there exists a function f ∈ O(x1/2) such that x2 + 2x1/2 + 1 = x2 + f (x).”
ECE750-TXB Big-O for algorithm analysis Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected] I Big-O notation is an excellent tool for expressing
machine/compiler/language-independent complexity Asymptotics Asymptotics: properties. Motivation Bibliography I On one machine a sorting algorithm might take ≈ 5.73n log n seconds, on another it might take ≈ 9.42n log n + 3.2n seconds.
I We can wave these differences aside by saying the algorithm runs in O(n log n) seconds.
I O(f (n)) means something that behaves asymptotically like f (n):
I Disregarding any initial transient behaviour; I Disregarding any multiplicative constants c · f (n); I Disregarding any additive terms that grow less quickly than f (n). ECE750-TXB Basic properties of big-O notation Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Given a choice between an sorting algorithm that runs in Asymptotics Asymptotics: O(n2) time and one that runs in O(n log n) time, which Motivation should we choose? Bibliography 1. Gut instinct: the O(n log n) one, of course! 2. But: note that the class of functions O(n2) also contains n log n. Just because we say an algorithm is O(n2) does not mean it takes n2 time! 3. It could be that the O(n2) algorithm is faster than the O(n log n) one.
ECE750-TXB Additional notations Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
To distinguish between “at most this fast,” “at least this Asymptotics Asymptotics: fast,” etc. there are additional big-O-like notations: Motivation Bibliography
f ∈ O(g) ≡ f g upper bound f ∈ o(g) ≡ f ≺ g strict upper bound f ∈ Θ(g) ≡ f g tight bound f ∈ Ω(g) ≡ f g lower bound f ∈ ω(g) ≡ f g strict lower bound ECE750-TXB Tricks for a bad remembering day Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Motivation Lower case means strict: I Bibliography I o(n) is strict version of O(n) I ω(n) is strict version of Ω(n)
I ω, Ω (omega) is the last letter of the greek alphabet — if f ∈ ω(g) then g comes after f in asymptotic ordering.
I f ∈ Θ(g): the line through the middle of the theta — asymptotes converge
ECE750-TXB Notation: o(·) Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics f ∈ o(g) means f ≺ g Asymptotics: Motivation Bibliography
I o(·) expresses a strict upper bound.
I If f (n) is o(g(n)), then f grows strictly slower than g. Pn −k 1 I Example: k=0 2 = 2 − 2n = 2 + o(1) I o(1) indicates the class of functions for which g(n) limn→∞ 1 = 0, which means limn→∞ g(n) = 0. I 2 + o(1) means “2 plus something that vanishes as n → ∞”
I If f is o(g), it is also O(g). n I n! = o(n ). ECE750-TXB Notation: ω(·) Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: f ∈ ω(g) means f g Motivation Bibliography
I ω(·) expresses a strict lower bound.
I If f (n) is ω(g(n)), then f grows strictly faster than g.
I f ∈ ω(g) is equivalent to g ∈ o(f ). I Example: Harmonic series Pn 1 −1 I hn = k=0 k ∼ ln n + γ + O(n ) I hn ∈ ω(1) (It is unbounded.) I hn ∈ ω(ln ln n) n n I n! = ω(2 ) (grows faster than 2 )
ECE750-TXB Notation: Ω(·) Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Motivation f ∈ Ω(g) means f g Bibliography
I Ω(·) expresses a lower bound, not necessarily strict
I If f (n) is Ω(g(n)), then f grows at least as fast as g.
I f ∈ Ω(g) is equivalent to g ∈ O(f ) 2 I Example: Matrix multiplication requires Ω(n ) time. (At least enough time to look at each of the n2 entries in the matrices.) ECE750-TXB Notation: Θ(·) Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected] f ∈ Θ(g) means f g Asymptotics Asymptotics: Motivation Bibliography I Θ(·) expresses a tight asymptotic bound
I If f (n) is Θ(g(n)), then f (n)/g(n) is eventually contained in a finite positive interval [c1, c2].
I Θ(·) bounds are very precise, but often hard to obtain.
I Example: QuickSort runs in time Θ(n log n) on average. (Tight! Not much faster or slower!)
I Example: Stirling’s approximation ln n! ∼ n ln n − n + O(ln n) implies that ln n! is Θ(n ln n)
I Don’t make the mistake of thinking that f ∈ Θ(g) f (n) means limn→∞ g(n) = k for some constant k.
ECE750-TXB Algebraic manipulations of big-O Lecture 1: Asymptotics
I Manipulating big-O terms requires some thought — Todd L. Veldhuizen always keep in mind what the symbols mean! [email protected]
I An additive O(f (n)) term swallows any terms that are Asymptotics Asymptotics: f (n): Motivation Bibliography n2 + n1/2 + O(n) + 3 = n2 + O(n)
The n1/2 and 3 on the l.h.s. are meaningless in the presence of an O(n) term.
I O(f (n)) − O(f (n)) = O(f (n)) not 0! I O(f (n)) · O(g(n)) = O(f (n)g(n)). −1 1/2 I Example: What is ln n + γ + O(n ) times n + O(n )? h i ln n + γ + O(n−1) · n + O(n1/2) = n ln n + γn + O(n1/2 ln n)
The terms γO(n1/2), O(n−1/2), O(1), etc. get swallowed by O(n1/2 ln n). ECE750-TXB Sharpness of estimates Lecture 1: Asymptotics
Todd L. Example: for a constant c, Veldhuizen [email protected] c c ln(n + c) = ln n 1 + = ln n + ln 1 + Asymptotics Asymptotics: n n Motivation 2 c c Bibliography = ln n + − + ··· (Maclaurin series) n 2n2 1 = ln n + Θ n
It is also correct to write
ln(n + c) = ln n + O(n−1) ln(n + c) = ln n + o(1)
−1 1 1 since Θ(n ) ⊆ O( n ) ⊆ o(1). However, the Θ( n ) error term is sharper — a better estimate of the error.
ECE750-TXB Sharpness of estimates & The Riemann Lecture 1: Asymptotics
Hypothesis Todd L. Veldhuizen Example: let π(n) be the number of prime numbers ≤ n. [email protected] The Prime Number Theorem is that Asymptotics Asymptotics: π(n) ∼ Li(n) (1) Motivation Bibliography R n 1 where Li(n) = x=2 ln x dx is the logarithmic integral, and n Li(n) ∼ ln n Note that (1) is equivalent to:
π(n) = Li(n) + o(Li(n))
It is known that the error term can be improved, for example to n √ π(n) = Li(n) + O e−a ln n ln n ECE750-TXB Sharpness of estimates & The Riemann Lecture 1: Asymptotics
Hypothesis Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Motivation The famous Riemann hypothesis is the conjecture that a Bibliography sharper error estimate is true:
1 π(n) = Li(n) + O(n 2 ln n)
This is one of the Clay Institute millenium problems, with a $1,000,000 reward for a positive proof. Sharp estimates matter!
ECE750-TXB Sharpness of estimates Lecture 1: Asymptotics
Todd L. To maintain sharpness of asymptotic estimates during Veldhuizen [email protected] analysis, some caution is required. E.g. If f (n) = 2n + O(n), what is log f (n)? Asymptotics Asymptotics: Bad answer: log f (n) = n + O(n). Motivation More careful answer: Bibliography
log f (n) = log(2n + O(n)) = log(2n(1 + O(n2−n))) = log(2n) + log(1 + O(n2−n))
Since log(1 + δ(n)) ∼ O(δ(n)) if δ ∈ o(1),
log f (n) = n + O(n2−n)
i.e., log f (n) is equal to n plus some value converging exponentially fast to 0. ECE750-TXB Sharpness of estimates Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected] log f (n) = n + O(n2−n) Asymptotics Asymptotics: is a reasonably sharp estimate (but, what happens if we take Motivation 2log f (n) with this estimate?) Bibliography If we don’t care about the rate of convergence we can write
f (n) = n + o(1)
where o(1) represents some function converging to zero. This is less sharp since we have lost the rate of convergence. Even less sharp is
f (n) ∼ n
which loses the idea that f (n) − n → 0, and doesn’t rule out things like f (n) = n + n3/4.
ECE750-TXB Asymptotic expansions Lecture 1: Asymptotics
Todd L. Veldhuizen An asymptotic expansion of a function describes how that [email protected] function behaves for large values. Often it is used when an Asymptotics explicit description of the function is too messy or hard to Asymptotics: derive. Motivation Bibliography e.g. if I choose a string of n bits uniformly at random (i.e., each of the 2n possible strings has probability 2−n), what is 3 the probability of getting ≥ 4 n 1’s? n Easy to write the answer: there are k ways of arranging k 3 1’s, so the probability of getting ≥ 4 n 1’s is:
n X n P(n) = 2−n k 3 k=d 4 ne This equation is both exact and wholly uninformative. ECE750-TXB Asymptotic expansions Lecture 1: Asymptotics Can we do better? Yes! Todd L. The number of 1’s in a random bit string is a binomial Veldhuizen [email protected] distribution and is well-approximated by the normal distribution as n → ∞: Asymptotics Asymptotics: Motivation Bibliography n Z ∞ 2 X −n n 1 − x 2 ∼ √ e 2 dx k 1 √ x=α 2π k= 2 n+α n = 1 − F (α) where F (x) = 1 1 + erf √x is the cumulative normal 2 2 distribution. Maple’s asympt command yields the asymptotic expansion: ! 1 F (x) ∼ 1 − O x2 xe 2
ECE750-TXB Asymptotic expansions example Lecture 1: Asymptotics
3 Todd L. We want to estimate the probability of ≥ 4 n 1’s: Veldhuizen [email protected] 1 √ 3 n + α n = n Asymptotics Asymptotics: 2 4 Motivation √ Bibliography n gives α = 4 . Therefore the probability is √ n P(n) ∼ 1 − F 4 1 ∼ 1 − 1 + O √ n ne 32 1 = O √ n ne 32
3 So, the probability of having more than 4 n 1’s converges to 0 exponentially fast. ECE750-TXB Asymptotic Expansions Lecture 1: Asymptotics
I When taking an asymptotic expansion, one writes Todd L. Veldhuizen [email protected] ln n! ∼ n ln n − n + O(1) Asymptotics Asymptotics: rather than Motivation Bibliography ln n! = n ln n − n + O(1)
Writing ∼ is a clue to the reader that an asymptotic expansion is being taken, rather than just carrying an error term around.
I Asymptotic expansions are very important in average case analysis, where we are interested in characterizing how an algorithm performs for most inputs.
I To prove an algorithm runs in O(f (n)) on average, one technique is to obtain an asymptotic estimate of the probability of running in time f (n), and show it converges to zero very quickly.
ECE750-TXB Asymptotic Expansions for Average-Case Analysis Lecture 1: Asymptotics
Todd L. I The time required to add two n-bit integers by a no Veldhuizen carry adder is proportional to the longest carry [email protected]
sequence. Asymptotics Asymptotics: I It can be shown that the probability of having a carry Motivation sequence of length ≥ t(n) satisfies Bibliography
Pr(carry sequence ≥ t(n)) ≤ 2−t(n)+log n+O(1)
I If t(n) log n, the probability converges to 0. We can conclude that the average running time is O(log n).
I In fact we can make a stronger statement:
Pr(carry sequence ≥ log n + ω(1)) → 0
Translation: “The probability of having a carry sequence longer than log n + δ(n), where δ(n) is any unbounded function, converges to zero.” ECE750-TXB The Taylor series method of asymptotic Lecture 1: Asymptotics expansion Todd L. Veldhuizen [email protected] I This is a very simple method for asymptotic expansion that works for simple cases; it is one technique Maple’s Asymptotics Asymptotics: asympt function uses. Motivation ∞ Bibliography I Recall that the Taylor series of a C function about x = 0 is given by:
x2 x3 f (x) = f (0) + xf 0(0) + f 00(0) + f 000(0) + ··· 2! 3!
I To obtain an asymptotic expansion of some function F (n) as n → ∞, 1. Substitute n = x −1 into F (n). (Then n → ∞ as x → 0.) 2. Take a Taylor series about x = 0. 3. Substitute x = n−1. 4. Use the dominating term(s) as the expansion, and the next term as the error term.
ECE750-TXB Taylor series method of asymptotic expansion: Lecture 1: Asymptotics example Todd L. 1+ 1 Veldhuizen Example expansion: F (n) = e n . [email protected] Obviously limn→∞ F (n) = e, so we expect something of the Asymptotics form F (n) ∼ e + o(1). Asymptotics: Motivation −1 −1 1+x 1. Substitute n = x into F (n): obtain F (x ) = e Bibliography 2. Taylor series about x = 0: x2 x3 e1+x = e + xe + e + e + ··· 2 6 3. Substitute x = n−1: e 1 1 = e + + e + e + ··· n 2n2 6n3 1 1 4. Since e n e 2n2 e · · · , 1 F (n) ∼ e + Θ n ECE750-TXB Asymptotics of algorithms Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics is a key tool for algorithms and data structures: Asymptotics Asymptotics: Motivation I Analyze algorithms/data structures to obtain sharp estimates of asymptotic resource consumption (e.g., Bibliography time, space)
I Possibly use asymptotic expansions in the analysis to estimate e.g. probabilities
I Use these resource estimates to
I Decide which algorithm/data structure is “best” according to design criteria
I Reason about the performance of compositions (combinations) of algorithms and data structures.
ECE750-TXB References on asymptotics Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Motivation I Course text: [1] Asymptotic notations Bibliography I Concrete Mathematics, Ronald L. Graham, Donald E. Knuth and Oren Patashnik, Ch. 9 Asymptotics [2]
I Advanced:
I Shackell, Symbolic Asymptotics [6] I Hardy, Orders of Infinity [3] I Lightstone + Robinson, Nonarchimedean fields and asymptotic expansions [5] ECE750-TXB Bibliography I Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics [1] Thomas H. Cormen, Charles E. Leiserson, and Ronald R. Asymptotics: Rivest. Motivation Bibliography Intoduction to algorithms. McGraw Hill, 1991. bib [2] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Reading, MA, USA, second edition, 1994. bib
ECE750-TXB Bibliography II Lecture 1: Asymptotics
Todd L. Veldhuizen [3] G. H. Hardy. [email protected]
Orders of infinity. The ‘Infinit¨arcalc¨ul’ of Paul du Asymptotics Asymptotics: Bois-Reymond. Motivation Hafner Publishing Co., New York, 1971. Bibliography Reprint of the 1910 edition, Cambridge Tracts in Mathematics and Mathematical Physics, No. 12. bib [4] Donald E. Knuth. Big omicron and big omega and big theta. SIGACT News, 8(2):18–24, 1976. bib pdf
[5] A. H. Lightstone and Abraham Robinson. Nonarchimedean fields and asymptotic expansions. North-Holland Publishing Co., Amsterdam, 1975. North-Holland Mathematical Library, Vol. 13. bib ECE750-TXB Bibliography III Lecture 1: Asymptotics
Todd L. Veldhuizen [email protected]
Asymptotics Asymptotics: Motivation Bibliography [6] John R. Shackell. Symbolic asymptotics, volume 12 of Algorithms and Computation in Mathematics. Springer-Verlag, Berlin, 2004. bib ECE750-TXB Lecture 2
Todd L. Veldhuizen ECE750-TXB Lecture 2 [email protected] Resources and Complexity Classes Outline Resource Consumption
Complexity classes Todd L. Veldhuizen Bibliography [email protected]
Electrical & Computer Engineering University of Waterloo Canada
January 16, 2007
ECE750-TXB Resource Consumption Lecture 2 Todd L. Veldhuizen To decide which algorithm or data structure to use, we are [email protected] interested in their resource consumption. Depending on the Outline problem context, we might be concerned with: Resource Consumption I Time and space consumption Complexity classes I For logic circuits: Bibliography I Number of gates I Depth I Area I Heat production
I For parallel/distributed computing:
I Number of processors I Amount of communication required I Parallel running time
I For randomized algorithms:
I Number of random bits used I Error probability ECE750-TXB Machine models Lecture 2 Todd L. Veldhuizen I The performance of an algorithm must always be [email protected] analyzed with reference to some machine model that defines: Outline Resource I The basic operations supported (e.g., random-access Consumption
memory; arithmetic; obtaining a random bit; etc.) Complexity classes I The resource cost of each operation. Bibliography I Some common machine models:
I Turing machine (TM): very primitive, tape-based, used for theoretical arguments only;
I Nondeterministic Turing machine: TM that can effectively fork its execution at each step, so that after t steps it can behave as if it were a superfast parallel machine with e.g. 2t processors;
I RAM (Random Access Machine) is a model that corresponds more-or-less to an everyday single-CPU desktop machine, but with infinite memory;
I PRAM and LogP [2, 3] are popular models for parallel computing.
ECE750-TXB Machine models Lecture 2 Todd L. Veldhuizen [email protected] I The performance of an algorithm can change drastically when you change machine models. e.g., many problems Outline Resource believed to take exponential time (assuming P 6= NP) Consumption
on a RAM can be solved in polynomial time on a Complexity classes Nondeterministic TM. Bibliography
I Often there are generic results that let you translate resource bounds on one machine model to another:
I An algorithm taking time T (n) and space S(n) on a Turing machine can be simulated in O(T (n) log log S(n)) time by a RAM;
I An algorithm taking time T (n) and space S(n) on a RAM can be simulated in O(T 3(n)(S(n) + T (n))2) time by a Turing machine.
I Unless otherwise stated, people are usually referring to a RAM or similar machine model. ECE750-TXB Machine models Lecture 2 Todd L. Veldhuizen [email protected]
Outline I When you are analyzing an algorithm, know your Resource machine model. Consumption
Complexity classes I There are embarassing papers in the literature in which nonspecialists have “proven” outlandish complexity Bibliography results by making basic mistakes
I e.g. Assuming that arbitrary precision real numbers can be stored in O(1) space and multiplied, added, etc. in O(1) time. On realistic sequential (nonparallel) machine models, d-digit real numbers take:
I O(d) space I O(d) time to add I O(d log d) time to multiply
ECE750-TXB Example of time and space complexity Lecture 2 Todd L. Veldhuizen I Let’s compare three containers for storing values: list, [email protected] tree, sorted array. Let n be the number of elements Outline stored. Resource Consumption I Average-case complexity (on a RAM) is: Complexity classes Space Search time Insert time Bibliography List Θ(n) Θ(n) Θ(n) Balanced tree Θ(n) Θ(log n) Θ(log n) Sorted array Θ(n) Θ(log n) Θ(n)
I If search time is important: since log n ≺ n, a balanced tree or sorted array will be faster than a list for sufficiently large n.
I If insert time is important: use a balanced tree.
I Caveat: asymptotic performance says nothing about performance for small cases. ECE750-TXB Example: Circuit complexity Lecture 2 Todd L. I In circuit complexity, we do not analyze programs per Veldhuizen se, but a family of circuits, one for each problem size [email protected] (e.g., addition circuits for n-bit integers). Outline I Circuits are built from basic gates. The most realistic Resource Consumption model is gates that have finite fan-in and fan-out, i.e., Complexity classes
gates have 2-inputs and output signals can be fed into Bibliography at most k inputs I Common resource measures are: I time (i.e., delay, circuit depth) I number of gates (or cells, for VLSI) I fan-out I area I E.g., addition circuits: Adder type Gates Depth Ripple-carry adder ≈ 7n ≈ 2n √ Carry-skip (1L) ≈ 8n ≈ 4 n Carry lookahead ≈ 14n ≈ 4 log n Conditional-sum adder ≈ 3n log n ≈ 2 log n
ECE750-TXB Resource consumption tradeoffs Lecture 2 Todd L. Veldhuizen [email protected]
Outline
Resource I Often there are tradeoffs between consumption of Consumption resources. Complexity classes Bibliography I Example: Testing whether a number is prime. The Miller-Rabin test takes time Θ(k log3 n) and has probability of error 4−k . 3 I Choosing k = 20 yields time Θ(log n) and probability of error 2−40. 1 4 I Choosing k = 2 log n yields time Θ(log n) and 1 probability of error n . ECE750-TXB Resource consumption tradeoffs: time-space Lecture 2 Todd L. Cracking passwords has a time-space tradeoff: Veldhuizen [email protected] I Passwords are stored encrypted to make them hard to recover: e.g. htpasswd (web passwords) turns “foobar” Outline into “AjsRaSQk32S6s” Resource Consumption
I Brute force approach: if there are n possible passwords, Complexity classes
precompute a database of size O(n) containing every Bibliography possible encrypted password and its plaintext. Crack passwords in O(log n) time by looking them up in the database. 64 I Prohibitively expensive in space: e.g. n ≈ 2 . 2/3 I Hellman: can recover plaintext in O(n ) time using a database of size O(n2/3).
I MS-Windows LanManager passwords are 14-characters; they are stored hashed (encrypted). With a precomputed database of size 1.4Gb (two CD-ROMs), 99.9% of all alphanumerical password hashes can be cracked in 13.6 seconds [4].
ECE750-TXB Resource consumption tradeoffs: Area-time Lecture 2 Todd L. Veldhuizen [email protected]
In designing circuits e.g., VLSI, one is concerned with how Outline much area a circuit takes up vs. how fast it is (its gate Resource Consumption
depth). Complexity classes I Often one can sacrifice area for time (depth), and vice Bibliography versa.
I e.g. Multiplying two n-bit numbers. With A the area and T the time, it is known [1] that for any circuit family
(AT )2α = Ω(n1+α)
This is an “area-time product.” ECE750-TXB Kinds of problems Lecture 2 Todd L. Veldhuizen [email protected]
Outline
I We write algorithms to solve problems. Resource Consumption I Some special classes of problems: Complexity classes I Decision problems: require a yes/no answer. Example: Does this file contain a valid Java program? Bibliography
I Optimization problems: require choosing a solution that minimizes (maximizes) some objective function. Example: Find a circuit made out of AND, OR, and NOT gates that computes the sum of two 8-bit integers, and has the fewest gates.
I Counting problems: count the number of objects that satisfy some criterion. Example: For how many inputs will this circuit output zero?
ECE750-TXB Complexity classes Lecture 2 Todd L. Veldhuizen [email protected]
I A complexity class is defined as Outline
I a style of problem Resource Consumption I that can be solved with a specified amount of resources Complexity classes I on a specified machine model Bibliography I Example: P (a.k.a PTIME) is the class of decision problems that can be solved in polynomial time (i.e., d time O(n ) for some d ∈ N) on a Turing machine. I Complexity classes:
I Let us lump together problems according to how “hard” they are
I Are usually defined so as to be invariant under non-radical changes of machine model (e.g., the class P on a TM is the same as the class P on a RAM). ECE750-TXB Some basic distinctions Lecture 2 Todd L. Veldhuizen I At the coarsest level of structure, decision problems [email protected] come in three varieties: Outline
I Problems we can write computer programs to Resource solve. What this course is about! (Program will always Consumption stop and say “yes” or “no,” and be right!) Complexity classes
I Problems we can define, but not write computer Bibliography programs to solve (e.g., deciding whether a Java program runs in polynomial time) I Problems we cannot even define. I Consider deciding whether x ∈ A for some set A ⊆ N of natural numbers. e.g., prime numbers. I In any (effective) notation system we care to choose, there are ℵ0 (countably many) problem definitions. (They can be put into 1-1 correspondence with the natural numbers). ℵ0 I There are 2 (uncountably many) problems — subsets of A ⊆ N. (They can be put into 1-1 correspondence with the reals.)
ECE750-TXB An aside: Hasse diagrams Lecture 2 Todd L. Veldhuizen [email protected]
Outline Complexity classes are sets of problems Resource I Consumption
I Some complexity classes are contained inside other Complexity classes
complexity classes. Bibliography I e.g., every problem in class P (polynomial time on TM) is also in class PSPACE (polynomial space on TM).
I We can write P ⊆ PSPACE to mean: the class P is contained in the class PSPACE.
I ⊆ is a partial order: reflexive, transitive, anti-symmetric.
I Hasse diagrams are intuitive ways of drawing partial orders. ECE750-TXB An aside: Hasse diagrams Lecture 2 Todd L. Veldhuizen [email protected] I Example: I am a professor and a geek. Professors are people; geeks are people (are too!) Outline Resource I {me} ⊆ professors Consumption Complexity classes I {me} ⊆ geeks Bibliography I professors ⊆ people
I geeks ⊆ people
people o KK ooo KK ooo KK professors geeks NN s NNN ss NN sss {me}
ECE750-TXB Whirlwind tour of major complexity classes Lecture 2 Todd L. Veldhuizen [email protected]
Outline I Some caveats: Resource I There are 462 classes in the Complexity Zoo. Consumption Complexity classes I We’ll see... slightly fewer than that. (Most complexity classes are interesting primarily to structural complexity Bibliography theorists — they capture fine distinctions that we’re not concerned with day-to-day.)
I For every class we shall see, there are many classes above, beside, and below it that are not shown;
I The Hasse diagrams do not imply that the containment is strict: e.g., when the diagram shows NP above P, this means P ⊆ NP, not P ⊂ NP. ECE750-TXB Whirlwind tour of major complexity classes Lecture 2 Todd L. Veldhuizen [email protected]
Decidable Outline Resource EXP Consumption Complexity classes PSPACE Bibliography m PP mmmm PPP coNP NP QQQ nn QQQ nnn Q P nn
I EXP = decision – exponential time on TM (aka EXPTIME)
I PSPACE = decision – polynomial space on TM
I P = decision – polynomial time on TM (aka P)
I NP, co − NP: we’ll get to these...
ECE750-TXB Randomness-related classes Lecture 2 Todd L. ZPP, RP, coRP, BPP: probabilistic classes (machine has Veldhuizen [email protected] access to random bits) EXPTIME Outline n RR nn RR Resource nnn R NP BPP coNP Consumption R nnn RRR Complexity classes nnn R RP P coRP Bibliography PPP ll PP llll ZPP
PTIME
I BPP ≈ problems that can be solved in polynomial time with access to a random number source, with 1 probability of error < 2 . (Run many times and vote: get error as low as you like.)
I ZPP=problems that can be solved in polynomial time with access to a random number source, with zero probability of error. ECE750-TXB Polynomial-time and below Lecture 2 Todd L. Veldhuizen [email protected]
Outline
Resource PTIME Polynomial time Consumption Complexity classes
Bibliography NC “Nick’s class”
LOGSPACE Logarithmic space
NC1 Logarithmic depth circuits, bounded fan in/out
AC0 Constant depth circuits, bounded fan in/out
ECE750-TXB Structural complexity theory Lecture 2 Todd L. Veldhuizen I Structural complexity theory = the study of complexity [email protected] classes and their interrelationships Outline I Many fundamental relationships are not known: Resource I Is P=NP? (Lots of industrially important problems are Consumption NP, like placement & routing for VLSI, designing Complexity classes communication networks, etc.) Bibliography
I Is ZPP=P? (Is randomness really necessary?) I Is BPP ⊆ NP? If so, we can solve those hard problems in NP by flipping coins, with some error so tiny we don’t care.
I Lots of conditional results are known, e.g.: “If BPP contains NP, then RP=NP and PH is contained in BPP; any proof of BPP=P would require showing either NEXP is not in P/poly or that #P requires superpolynomial sized circuits.”
I Luckily (for me and you) this is not a course in complexity theory. We will do basics only. ECE750-TXB Bibliography I Lecture 2 Todd L. Veldhuizen [email protected]
Richard P. Brent and H. T. Kung. Outline Resource The area-time complexity of binary multiplication. Consumption J. ACM, 28(3):521–534, 1981. bib pdf Complexity classes Bibliography David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. LogP: Towards a realistic model of parallel computation. In Marina Chen, editor, Proceedings of the 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 1–12, San Diego, CA, May 1993. ACM Press. bib
ECE750-TXB Bibliography II Lecture 2 Todd L. Veldhuizen [email protected]
Outline
David E. Culler, Richard M. Karp, David Patterson, Resource Abhijit Sahay, Eunice E. Santos, Klaus Erik Schauser, Consumption Ramesh Subramonian, and Thorsten von Eicken. Complexity classes Bibliography Logp: a practical model of parallel computation. Commun. ACM, 39(11):78–85, 1996. bib Philippe Oechslin. Making a faster cryptanalytic time-memory trade-off. In Dan Boneh, editor, CRYPTO, volume 2729 of Lecture Notes in Computer Science, pages 617–630. Springer, 2003. bib pdf ECE750-TXB Lecture 3
Todd L. Veldhuizen ECE750-TXB Lecture 3 [email protected] Basic Algorithm Analysis, Recurrences, and Z-transforms Outline
Todd L. Veldhuizen [email protected]
Electrical & Computer Engineering University of Waterloo Canada
February 28, 2007
ECE750-TXB Lecture 3
Todd L. Veldhuizen [email protected]
Part I
Basic Algorithm Analysis ECE750-TXB RAM-style machine models Lecture 3 Todd L. Veldhuizen [email protected]
I Unless we are dealing with parallelism, randomness, circuits, etc., for the remainder of this course we will always assume a RAM-style machine.
I RAM = random access memory
I Every memory location can be read and written in O(1) time. (This is in contrast to a Turing machine, where reading a symbol at position p on the tape requires moving the position of the machine across the tape to p, requiring O(p) steps.)
I Memory locations, variables, registers, etc. all contain objects of size O(1). (e.g., 64-bit words)
I Basic operations (addition, multiplication, etc.) all take O(1) time.
ECE750-TXB Styles of analysis Lecture 3 Todd L. Veldhuizen [email protected]
I Worst case: if an algorithm has worst case O(f (n)) time, there are constants c1, c2 such that no input requires more than c1 + c2f (n) time, for n big.
I Average case: average case O(f (n)) time means: the time required by the algorithm on inputs of size n, averaged according to some probability distribution (usually uniform) is O(f (n)).
I Amortized analysis: if a data structure has amortized time O(f (m)) then a sequence of m operations will take O(m · f (m)) time. (Most operations are cheap, but every now and then you need to do something expensive.) ECE750-TXB What is n? Lecture 3 Todd L. Veldhuizen [email protected] I When we say an algorithm takes O(f (n)) time, what does n refer to?
I Default: n is the number of bits required to represent the input.
I However, often we choose n to be a natural description of the “size” of the problem: 2 I Number of vertices in a graph (input length is O(n ) bits to specify edges)
I For number-theory algorithms: n is often an integer (input length is O(log n) bits) I For linear algebra, n usually indicates rank:
I Input is O(n) bits for vectors, e.g., dot product; 2 I Input is O(n ) bits for matrices.
I Exactly what n stands for is important:
I Two integers ≤ n can be multiplied in O(log n) time. I Two n-bit integers can be multiplied in O(n log n) time.
ECE750-TXB Tools for analyzing algorithms Lecture 3 Todd L. Veldhuizen [email protected]
I Asymptotics
I Recurrences, z-transforms
I Combinatorics, Ramsey theory
I Discrepancy
I Probability, Statistics
I Information Theory
I Random objects (e.g., random graphs), zero-one laws
I ... pretty much anything else you can think of ECE750-TXB No silver bullet Lecture 3 Todd L. Veldhuizen [email protected] I Finding a bound for the running time of an algorithm is an undecidable problem; it is impossible to write a program that will automatically prove a bound, if one exists, for any program.
I There are very simple algorithms that have extremely long proofs of complexity bounds.
I There are very simple algorithms that nobody knows the running time of! e.g. Collatz problem.
I In any formal system (e.g., ZFC set theory) there are simple algorithms that have a complexity bound, but this cannot be proven.
I There is no finite set of tools that suffice for algorithm analysis.
I However, there are well-defined classes of algorithms that can be analyzed in a systematic way, and we will learn some of these.
ECE750-TXB Recurrence equations Lecture 3 Todd L. Veldhuizen [email protected]
I Recurrence equations are one of the simplest techniques for algorithm analysis, and for simple programs the analysis is easily automated.
I Recipe:
I Write out the algorithm in some suitable programming language or pseudocode, so that every step is expressed in terms of basic operations supported by the machine model that take O(1) time.
I Attach to each statement/syntax block a variable that counts the amount of resource used there (e.g., time)
I Write equations that relate the variables; simplify or approximate as necessary.
I Solve the equations. ECE750-TXB Pseudocode language Lecture 3 Todd L. Veldhuizen [email protected]
I A simple pseudocode language:
s = loc ← e assignment | if e then b [else b]opt if statement | for v = e to e b for loop | v(e, ··· , e) function call | return e b = s | s b statement block (one or more statements) loc = v[e] array v variable e = loc location (array or variable) | e op e operator: + ∗ − etc. | e R e relation: ≤, =, etc. | constant
ECE750-TXB Analysis Rules Lecture 3 Todd L. Veldhuizen [email protected]
I Basic operations (array access, multiply, add, compare, etc.) take O(1) time.
I Represent constant time operations by arbitrary constants c1, c2, etc. ECE750-TXB For loops (simple version) Lecture 3 Todd L. Veldhuizen [email protected]
for i = 1 to n n t1 X . t1 = c1 + (c2 + t2(i)) t2(i) . end . i=1
I The time required by a loop is: I c1: time required to initialize i = 1 I for each loop iteration (sum):
I some constant overhead c2 (time required to increment i and compare to n) I the time required by the body of the loop t2(i), which might depend on the value of i.
ECE750-TXB If statements (simple version) Lecture 3 Todd L. Veldhuizen [email protected]
if t2 e then
t . 3 . t 1 t1 = t2 + c1 + max(t3, t4) else
t . 4 .
I Time required for an if statement: I t2 = time required to evaluate branch condition I c1 = some constant time required for branching I t3, t4 = time taken in the branches I We use max(t3, t4) because we are seeking an upper bound on running time. ECE750-TXB Function calls Lecture 3 Todd L. I For each function F , introduce a time function TF (n): Veldhuizen represents the amount of time used by the function F [email protected] on inputs characterized by parameters n = (n1, n2, ··· ). (Usually have just a single parameter: TF (n).) I The variable(s) n should include any values on which the time required by the function depends. s I Example: (naive) function to compute r Exp(r, s) p ← 1 for i = 1 to s p ← p ∗ r end return p Time depends on s (exponent) but not r (base), so time function should be TExp(s). I Time required for function call:
t1 Exp(x, y) t1 = c1 + TExp(y)
ECE750-TXB Example: Exp(r, s) Lecture 3 Todd L. Veldhuizen [email protected]
Exp(r, s) t t1 = t2 + t3 + t5 2 p ← 1 t2 = c2 for i = 1 to s Ps 0 t3 = c3 + (c + t4) t1 t3 t4 p ← p ∗ r i=1 3 t4 = c4 end t = c t5 return p 5 5
I Solve:
0 TExp(s) = t1 = c2 + c3 + s(c3 + c4) + c5
0 I So, TExp(s) = c + c s. ECE750-TXB A simplifying notation Lecture 3 Todd L. I Write c to mean Θ(1). Anytime a constant is needed, Veldhuizen [email protected] just use c.
I The result is an upper bound on the time, for c sufficiently large.
I Example:
Exp(r, s) t t1 = t2 + t3 + t5 2 p ← 1 t2 = c for i = 1 to s Ps t3 = c + (c + t4) t1 t3 t4 p ← p ∗ r i=1 t4 = c end t = c t5 return p 5
TExp(s) = c + cs
I Fine point: each time c occurs, it means Θ(1) (some bounded value): but not the same value at each occurrence.
ECE750-TXB Matrix Multiply Example Lecture 3 Todd L. Veldhuizen [email protected]
Matrix-Multiply(n, A, B, C) for i = 1 to n for j = 1 to n C(i, j) ← 0 for k = 1 to n C(i, j) ← C(i, j) + A(i, k) ∗ B(k, j) end end end ECE750-TXB Matrix Multiply: analyze Lecture 3 Todd L. Veldhuizen [email protected] Matrix-Multiply(n, A, B, C)
for i = 1 to n
for j = 1 to n t 3 C(i, j) ← 0
for k = 1 to n t1 t2 t4 t5 C(i, j) ← C(i, j) + A(i, k) ∗ B(k, j)
end
end
end t = c + Pn (c + t ) 1 i=1 2 t = c + Pn (c + t + t ) 2 j=1 3 4 t3 = c t = c + Pn (c + t ) 4 k=1 5 t5 = c
ECE750-TXB Matrix multiply: solve Lecture 3 Todd L. Veldhuizen [email protected]
I Solve:
n n n ! X X X t1 = c + 2c + 3c + 2c i=1 j=1 k=1 = c + n · (2c + n · (3c + n · 2c)) = 2cn3 + 2cn2 + 2cn + c
3 I MatrixMultiply(n, A, B, C) takes Θ(n ) time. ECE750-TXB Analyzing Recursion Lecture 3 Todd L. Veldhuizen [email protected]
I When functions call themselves, we will get time functions defined in terms of themselves.
I Such equations are called recurrences.
I Example:
T (1) = c Base case T (n) = c + T (n − 1) Recursion
This example easy to solve: T (n) = c + c + c + ··· c = cn | {z } n I Rarely that easy in practice!
ECE750-TXB Fibonacci example Lecture 3 Todd L. Veldhuizen [email protected] Fibonacci(n) if n ≤ 2 then return 1 else return Fibonacci(n − 1) + Fibonacci(n − 2)
I Analyze base case(s) separately:
T (1) = c T (2) = c
I Recurrence:
T (n) = c + T (n − 1) + T (n − 2) ECE750-TXB Fibonacci example: Call graph Lecture 3 Todd L. Veldhuizen [email protected]
ECE750-TXB Lecture 3
Todd L. Veldhuizen [email protected]
Bibliography
Part II
Z-transforms ECE750-TXB Unilateral Z-transform I Lecture 3 Todd L. Veldhuizen I Transforms give us an alternative representation of [email protected]
functions or series in which certain manipulations Bibliography and/or insights are easier.
I The Z-transform is of special interest to algorithm analysis because it makes solving some recurrences simple.
I To put the Z-transform in context of transforms in general: there are Integral transforms (for functions of the real line) and their discrete cousins Generating functions and Formal power series.
I Integral transforms [1, 2]: Represent a function f (x) by its transform F (p). R I Forward transform F (p) = K(p, x)f (x)dx R I Inverse transform f (x) = L(x, p)F (p)dp
I K, L are kernels. −px I Laplace transform: K(p, x) = e
ECE750-TXB Unilateral Z-transform II Lecture 3 Todd L. Veldhuizen [email protected] iωt I Fourier transform: K(p, x) = e p−1 Bibliography I Mellin transform: K(p, x) = x
I Exotica: Hankel transform, Hilbert transform, Abel transform, ···
I Generating functions / formal power series [6, 2]:
I Represent a sequence/discrete function f (n) by its transform/generating function F (z) P I Forward transform: F (z) = K(z, n)f (n) R I Inverse transform: f (n) = L(n, z)F (z) n I Ordinary Generating Functions: K(z, n) = z −n I ? Z-transforms: K(z, n) = z zn I Exponential Generating Functions: K(z, n) = n! e−z zn I Poisson Generating Functions: K(z, n) = n! I Exotica: Lambert series, Bell series ECE750-TXB Unilateral Z-transform III Lecture 3 Todd L. Veldhuizen [email protected]
I Z-transforms are more-or-less the same as ordinary Bibliography generating functions. The OGF can be obtained from the z-transform, and vice versa, by the substitution z 7→ z −1. The OGF form is more common in combinatorics and theoretical computer science; the Z-transform is more common in engineering, particularly signals and controls.
I Very useful for solving linear difference equations, i.e., equations of the form
f (n) + c1f (n − a1) + c2f (n − a2) + ··· = g(n)
that arise frequently in algorithm analysis.
ECE750-TXB Unilateral Z-transform IV Lecture 3 Todd L. Veldhuizen [email protected]
Bibliography I Linear difference equations are a special case of recurrences. Examples of recurrences not in this class are:
T (n) = T (n/2) + 1 Solution T (n) = Θ(log n) √ T (n) = T ( n) + 1 Solution T (n) = Θ(log log n)
However, we can use Z-transforms to obtain approximate solutions to the above pair of recurrences by performing a change of variables that results in a linear recurrence: r = 2n n for the first, r = 22 for the second. For a survey of recurrence-solving techniques, see [3]. ECE750-TXB Unilateral Z-transform V Lecture 3 Todd L. I Definition of the Z-transform and its inverse: Veldhuizen [email protected] ∞ X −n Z[f (n)] = f (n)z = F (z) (1) Bibliography n=0 I −1 1 n−1 Z [F (z)] = 2πi F (z)z dz (2) C where the contour C must be in the region of convergence of F (z) and encircle the origin.
I The function f (n) is discrete, i.e., f : N → R. I The Z-transform F (z) is complex: F : C → C. I The sum of Eqn. (1) often converges for only part of the complex plane, called the Region of Convergence (ROC) of F (z).
I In practice, never use Eqns. (1,2): instead use tables of transform pairs.
I Standard references: [4, 6, 5] or any DSP book
ECE750-TXB Unilateral Z-transform VI Lecture 3 Todd L. Veldhuizen [email protected] I An important transform pair: the z-transform of f (n) = bn is Bibliography
∞ X 1 Z [bn] = z−nbn = 1 − bz−1 n=0
Note that the sequence f (0), f (1), f (2), ··· = b0, b1, b2, ··· can be read off from the series expansion of F (z):
1 = b0 + b1z−1 + b2z−2 + b3z−3 + ··· 1 − bz−1 This is by definition — compare Eqn. (1). ECE750-TXB Unilateral Z-transform VII Lecture 3 Todd L. Veldhuizen [email protected] I A typical z-transform of a function f (n) looks like:
Bibliography −1 −1 (1 − a1z )(1 − a2z ) ··· F (z) = −1 −1 (1 − b1z )(1 − b2z ) ···
Here we have written F (z) in factored form. −1 I When z = ai , (1 − ai z ) = 0, and F (z) = 0. Such values of z are called zeros. −1 I When z → bi , (1 − bi z ) → 0 and F (z) → ∞. Such values of z are called poles.
I To take the inverse Z-transform of something in the form N(z) F (z) = −1 −1 (1 − b1z )(1 − b2z ) ···
ECE750-TXB Unilateral Z-transform VIII Lecture 3 Todd L. Veldhuizen where the b1, b2, ··· are all distinct, we can use partial [email protected] fractions expansion to write Bibliography
N1(z) N2(z) F (z) = −1 + −2 + ··· (1 − b1z ) (1 − b2z )
n 1 and then use the transform pair Z[b ] = (1−bz−1) to obtain something like
n n f (n) = c1b1 + c2b2 + ···
The term with the largest value of |bi | will be asymptotically dominant, e.g. if f (n) = 2n + 3n, then f (n) ∼ 3n.
I Hence, the asymptotic behaviour of f (n) can be read off directly from F (z): find the pole farthest from the n origin (i.e., the bi with |bi | largest); then f (n) = Θ(bi ). ECE750-TXB Unilateral Z-transform IX Lecture 3 Todd L. Veldhuizen [email protected]
Bibliography
I When the largest pole occurs in a form such as −1 2 −1 3 (1 − bi z ) or (1 − bi z ) etc. (double,triple poles), we need to consult a table of transforms and find what −1 k form (1 − bi z ) will take:
F −1[(1 − bz−1)2] = (n + 1)bn −1 −1 3 1 2 3 n F [(1 − bz ) ] = ( 2 n + 2 n + 1)b
ECE750-TXB Z-transforms Lecture 3 Todd L. Veldhuizen [email protected]
I Two compelling reasons for using Z-transforms: Bibliography 1. Because of the transform pair
Z[f (n − a)] = z−aF (z)
linear difference equations become linear equations that can be solved by simple algebraic manipulation. 2. The asymptotics of f (n) are governed by the pole(s) of F (z) farthest from the origin. If we just want to know Θ(f (n)), we can take the z-transform, and find the outermost pole(s) [2].
I e.g. If the outermost pole(s) of F (z) is a single pole at z = 2, then f (n) is Θ(2n). I e.g. If the outermost pole(s) of F (z) is a double pole at z = 5, then f (n) is Θ(n5n). ECE750-TXB Solving linear recurrences with Z-transforms Lecture 3 Todd L. I Workflow for exact solution: Veldhuizen 1. Use (discrete) δ-functions to encode initial conditions: [email protected] ( 1 when n = a Bibliography δ(n − a) = 0 otherwise 2. Take Z-transform of difference equation(s) to obtain equation(s) in F (z). 3. Solve for F (z). Linear difference equations result in F (z) being a ratio of polynomials in z. Factor the denominator. 4. Use partial fraction expansion to split into a sum of simple terms, and take the inverse Z-transform. I Workflow for asymptotic solution: 1. Disregard initial conditions. 2. Take Z-transform of recurrence. Solve for F (z). Factor denominator. 3. Identify outermost pole(s). If they are > 1, find the inverse Z-transform of the term corresponding to those pole(s). If outermost poles are ≤ 1, the initial conditions may matter ⇒ exact solution.
ECE750-TXB Common Z-transform pairs Lecture 3 Todd L. Veldhuizen [email protected] I Linearity: Bibliography Z [af (n) + bg(n)] = aZ[f (n)] + bZ[g(n)]
I Common transform pairs:
Z [T (n − a)] = z−aT (z) shift Z [δ(n)] = 1 impulse n 1 Z [a ] = 1−az−1 single pole n 1 Z [(n + 1)a ] = (1−az−1)2 double pole 1 Z [1] = 1−z−1 single pole at 1 z−1 Z [n] = (1−z−1)2 double pole at 1 2 z−1+z−2 Z n = (1−z−1)3 triple pole at 1 ECE750-TXB Finding boundary conditions I Lecture 3 Todd L. Veldhuizen [email protected]
Bibliography I We use the Z-transform as a unilateral transform: all functions are assumed to be 0 for n < 0.
I Initial conditions must be dealt with by introducing δ-functions.
I E.g., the Fibonacci numbers satisfy the recurrence
f (n) = f (n − 1) + f (n − 2)
with the boundary conditions (BCs) f (0) = f (1) = 1.
I If we evaluate f (0) = f (−1) + f (−2) = 0, it doesn’t satisfy the BC f (0) = 1.
ECE750-TXB Finding boundary conditions II Lecture 3 Todd L. I We add a term δ(n): Veldhuizen [email protected]
f (n) = f (n − 1) + f (n − 2) + αδ(n) Bibliography
Then f (0) = α, so we choose α = 1 to match the BC f (0) = 1. Then try f (1):
f (1) = f (0) + f (−1) +α δ(1) |{z} | {z } |{z} =1 =0 =0 = 1
So, our BC f (1) = 1 is satisfied.
I In general, if the recurrence has a term f (n − k), we may need terms
α0δ(n) + α1δ(n − 1) + ··· + αk δ(n − k)
to account for BC’s. ECE750-TXB Finding boundary conditions III Lecture 3 Todd L. Veldhuizen [email protected]
Bibliography
I However, if we are only interested in asymptotic behaviour, boundary conditions often do not matter: αz−k I Functions αδ(n − k) have a Z-transform 1−z−1 . I Will contribute a pole at z = 1, plus some term(s) to the numerator of the Z-transform when written in factored form.
I If the dominant poles of the Z-transform are > 1, then the pole(s) and zero(s) contributed by δ(n − k) functions do not change the asymptotics.
ECE750-TXB Z-Transforms: Fibonacci Example Lecture 3 Todd L. Example: our time recurrence for the Fibonacci Veldhuizen I [email protected] function. Bibliography t(n) = c + t(n − 1) + t(n − 2)
I Ignore initial conditions; take z-transform: c T (z) = + z−1T (z) + z−2T (z) 1 − z−1
I Solve for T (z): c T (z) = 1 − 2z−1 + z−3 √ 1± 5 I Asymptotics: have poles at z = 1, 2 √ 1+ 5 I Outermost pole (i.e., with |z| maximized) is z = 2 ; dominates asymptotics √ n 1+ 5 I T (n) is Θ(φ ) with φ = 2 ≈ 1.618. ECE750-TXB Z-Transforms: Fibonacci Example Lecture 3 √ Exact solution: via partial fractions. Let φ = 1+ 5 , Todd L. I √ 2 Veldhuizen 1− 5 [email protected] θ = 2 . c Bibliography T (z) = 1 − 2z −1 + z −3 c A B C = (1−z−1)(1−φz−1)(1−θz−1) = 1−z−1 + 1−φz−1 + 1−θz−1 ˛ A = T (z)(1 − z −1)˛ = −c ˛z=1 √ √ −1 ˛ c 5( 5+1)2 B = T (z)(1 − φz )˛ = √ ˛z=φ 10( 5−1) √ √ −1 ˛ c 5( 5−1)2 C = T (z)(1 − θz )˛ = √ (Yech.) ˛z=θ 10(1+ 5) −1 h α i n I Inverse Z transform: Z 1−az−1 = αa f (n) = −c + Bφn + Cθn ∼ Bφn + O(1)
I Partial fractions is tedious: if we only want asymptotics, just read the pole locations and do not bother with an exact inverse transform.
ECE750-TXB Method 2: Maple Lecture 3 Todd L. Veldhuizen The practical method of choice is to use a symbolic algebra [email protected] package like Maple: Bibliography > rsolve({T(n)=c+T(n−1)+T(n−2),T(1..2)=c},T); √ “ √ ”n √ “ √ ”n −1/5 c 5 −1/2 5 + 1/2 + 1/5 c 5 1/2 + 1/2 5 √ „ “√ ”−1«n −1/5 c 5 −2 5 + 1 “√ ” √ „ “ √ ”−1«n “ √ ”−1 −1/5 c 5 − 1 5 −2 − 5 + 1 − 5 + 1 − c
> asympt(%,n,2); √ !n √ 1 + 5 2/5 c 5 + O (1) 2
√ n 1+ 5 So, running time is Θ(φ ) with φ = 2 = 1.6180 ··· ECE750-TXB Fibonacci example cont’d Lecture 3 Todd L. Veldhuizen I So, our little Fibonacci(n) function requires [email protected]
exponential time. Bibliography I Is there a better way?
I Iterate: for i = 2..n sum previous two elements. Requires Θ(n) time.
I Use our Z-transform powers: ( 1 if n ≤ 2 Fib(n) = Fib(n − 1) + Fib(n − 2) otherwise = Fib(n − 1) + Fib(n − 2) + δ(n − 1) − δ(n − 2)
Z-transform, solve, inverse Z-transform:
Fib(n) = aφn + bθn
where a, b are constants, and φ, θ are as before. This can be implemented in O(log n) time.
ECE750-TXB Bibliography I Lecture 3 Todd L. Veldhuizen [email protected]
Bibliography [1] Brian Davies. Integral transforms and their applications. Springer, 3rd edition, 2005. bib [2] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. 2007. Book draft. bib pdf
[3] George S. Lueker. Some techniques for solving recurrences. ACM Comput. Surv., 12(4):419–436, 1980. bib pdf ECE750-TXB Bibliography II Lecture 3 Todd L. Veldhuizen [email protected]
[4] Alan V. Oppenheim, Alan S. Willsky, and Syed Hamid Bibliography Nawab. Signals and systems. Prentice-Hall signal processing series. Prentice-Hall, second edition, 1997. bib [5] Robert Sedgewick and Philippe Flajolet. An introduction to the analysis of algorithms. Addison-Wesley, 1996. bib [6] Herbert S. Wilf. Generatingfunctionology. Academic Press, 1990. bib ECE750-TXB Lecture 4: Search & Correctness Proofs
Todd L. ECE750-TXB Lecture 4: Search & Veldhuizen Correctness Proofs [email protected] Outline
Bibliography Todd L. Veldhuizen [email protected]
Electrical & Computer Engineering University of Waterloo Canada
February 14, 2007
ECE750-TXB Problem: Searching a sorted array Lecture 4: Search & Correctness Proofs
Todd L. Veldhuizen [email protected]
I Let hT , ≤i be a total order. e.g. T could be the Outline integers Z. Bibliography Problem: Searching a sorted array I Inputs: 1. An integer n > 0. 2. An array A[0..n − 1] of elements of T , sorted in ascending order so that (i ≤ j) ⇒ (A[i] ≤ A[j]). 3. An element x of T .
I Specification: Return true if x is equal to some element in the array, false otherwise. ECE750-TXB Linear Searching Lecture 4: Search & Correctness Proofs
Todd L. Veldhuizen [email protected] I Let’s analyze the worst-case time complexity of the
following (naive) algorithm on a RAM. Outline
Bibliography Linsearch(int n, T[] A, T x) for i = 0 to n − 1 if A[i] = x then return true end return false
I e.g.,
Linsearch(5, [3, 5, 9, 9, 13], 4) returns false Linsearch(5, [3, 5, 9, 9, 13], 9) returns true
Linear Searching

• Without thinking much we can say this algorithm takes time Θ(n), but let's get the practice, and be sure.
• The time taken depends on n, but not on x or the contents of A[], assuming that the type T allows comparisons in time Θ(1).
• Let T(n) be the time taken for an array of size n. Label the cost of each program point:

    Linsearch(int n, T[] A, T x)
      for i = 0 to n − 1               — loop setup t4; whole loop t5
        if A[i] = x then return true   — test t1; return t2; whole if t3
      end
      return false                     — return t6; whole body t7

• Write equations (see rules from Lecture 3):

    t1 = c1
    t2 = c2
    t3 = c3 + t1 + max(t2, 0)
    t4 = c4
    t5 = t4 + Σ_{i=0}^{n−1} (c5 + t3)
    t6 = c6
    t7 = t5 + t6

• (In the above analysis, I analyzed the "one-armed if" (t3) by pretending it was an if ··· then ··· else ··· statement in which the second branch required zero time.)
• Solving,

    T(n) = t7 = n(c5 + c3 + c1 + c2) + c4 + c6

  So Linsearch requires Θ(n) time, as expected.

How To Catch A Lion In A Desert

The Bolzano–Weierstraß method. Divide the desert by a line running from north to south. The lion is then either in the eastern or in the western part. Let's assume it is in the eastern part. Divide this part by a line running from east to west. The lion is either in the northern or in the southern part. Let's assume it is in the northern part. We can continue this process arbitrarily, constructing with each step an increasingly narrow fence around the selected area. The diameter of the chosen partitions converges to zero, so the lion is eventually caged inside a fence of arbitrarily small diameter.
— from How To Catch A Lion In The Desert, Mathematical Methods

Not as elegant as the inversion method¹, but a good starting point for a search algorithm.

¹ Place a spherical cage in the desert, enter it, and lock it from the inside. Perform an inversion with respect to the cage. Then the lion is inside the cage and you are outside.
Binary Search

• Search a sorted array A[0..n − 1] by repeatedly dividing it in half, and searching one of the halves for x.
• The portion of the array we are searching will be l..h (for low and high). Initially we'll have l = 0 and h = n − 1, so that we are searching A[0..n − 1].
• We will design the algorithm around an invariant: a property that is maintained as the algorithm runs.
• The invariant has three parts, each of which must always be true:
  1. A is sorted.
  2. l ≤ h. (Otherwise, A[l..h] is not a valid interval of the array.)
  3. A[l] ≤ x ≤ A[h]. (The lion is always in our segment.)

Binary Search

BinarySearch(n, A[], x):
• Require A sorted.
• Let l = 0 and h = n − 1. A[l..h] is our search range.
• If x < A[l] or A[h] < x then x is not in the array; return false.
• Otherwise, A[l] ≤ x ≤ A[h] and l ≤ h. We have established the invariant.
• Call BinarySearch2(A, x, l, h).

BinarySearch2(A[], x, l, h):
• Require l ≤ h, A[l] ≤ x ≤ A[h], A sorted (the invariant).
• If l = h then return true.
• Otherwise, split the search range in two by choosing the midpoint i = l + ⌊(h − l)/2⌋. Then either:
  • x ≤ A[i], in which case A[l] ≤ x ≤ A[i]: return BinarySearch2(A, x, l, i); or
  • A[i] < x, in which case either
    • A[i] < x < A[i + 1], in which case return false; or
    • A[i + 1] ≤ x ≤ A[h]: return BinarySearch2(A, x, i + 1, h).

Binary Search: Code

    BinarySearch2(A, x, l, h)
      requires (l ≤ h) ∧ (A[l] ≤ x ≤ A[h]) ∧ (A sorted)
      if l = h then
        return true
      else
        i ← l + ⌊(h − l)/2⌋
        if x ≤ A[i] then
          return BinarySearch2(A, x, l, i)
        else if x < A[i + 1] then
          return false
        else
          return BinarySearch2(A, x, i + 1, h)
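A direct Java transcription of BinarySearch and BinarySearch2 might look as follows. This is a sketch specialized to int arrays; the class and method names are assumptions, not from the notes.

    // Sketch of the two-routine binary search above.
    public class BinarySearch
    {
        static boolean search(int n, int[] A, int x)
        {
            // Front end: establish the invariant l <= h and A[l] <= x <= A[h].
            if (n < 1 || x < A[0] || x > A[n - 1])
                return false;
            return search2(A, x, 0, n - 1);
        }

        static boolean search2(int[] A, int x, int l, int h)
        {
            if (l == h)
                return true;              // the invariant forces A[l] = x
            int i = l + (h - l) / 2;      // midpoint, as in the pseudocode
            if (x <= A[i])
                return search2(A, x, l, i);
            else if (x < A[i + 1])
                return false;             // A[i] < x < A[i+1]: x cannot be present
            else
                return search2(A, x, i + 1, h);
        }

        public static void main(String[] args)
        {
            int[] A = { 1, 3, 6, 9, 10, 10, 13 };
            System.out.println(search(7, A, 3));   // true
            System.out.println(search(7, A, 7));   // false
        }
    }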
Binary Search: Example

• Example: n = 7, search for x = 3.

    index:  0  1  2  3  4  5  6
    A[]  =  1  3  6  9 10 10 13

    Step 1: l = 0, i = 3, h = 6.  x ≤ A[i]: recurse on A[0..3].
    Step 2: l = 0, i = 1, h = 3.  x ≤ A[i]: recurse on A[0..1].
    Step 3: l = i = 0, h = 1.  ¬(x ≤ A[i]) ∧ ¬(x < A[i + 1]): recurse on A[1..1].
    Step 4: l = h = 1.  Return true.

Binary Search: Time analysis

• Analyze for the case h − l + 1 = 2^k.
• Let n = h − l + 1, the size of the search range.
• We will analyze BinarySearch2(···) only; the 'setup' function BinarySearch(···) adds a constant overhead.
• Let T(n) be (an upper bound for) the time complexity.
• Label the cost of each program point:

    BinarySearch2(A, x, l, h)                       — whole body t11
      if l = h then                                 — test t10
        return A[l] = x                             — t9
      else
        i ← l + ⌊(h − l)/2⌋                          — t8
        if x ≤ A[i] then                            — test t6; whole if t7
          return BinarySearch2(A, x, l, i)          — t5
        else                                        — else branch t4
          if x < A[i + 1] then                      — test t3
            return false                            — t2
          else
            return BinarySearch2(A, x, i + 1, h)    — t1

Binary Search: Time analysis

• Analyze for the case where h − l + 1 = 2^k, for some k ∈ N.
• We need to prove that when BinarySearch2 calls itself, we go from a problem of size 2^k to a problem of size 2^{k−1}; then the recurrence will have the form T(n) = ··· + T(n/2) + ···.

Proposition (Both halves have size 2^{k−1}.)
If h − l + 1 = 2^k and i = l + ⌊(h − l)/2⌋, then i − l + 1 = 2^{k−1} and h − (i + 1) + 1 = 2^{k−1}.

Proof.
h − l + 1 = 2^k implies i = l + ⌊(h − l)/2⌋ = l + ⌊(2^k − 1)/2⌋ = l + 2^{k−1} − 1, so i − l + 1 = (l + 2^{k−1} − 1) − l + 1 = 2^{k−1}, and h − (i + 1) + 1 = h − (l + 2^{k−1} − 1 + 1) + 1 = 2^k − 2^{k−1} = 2^{k−1}.

Binary Search: Time analysis

• Assuming n = h − l + 1 = 2^k, a recursive call has problem size 2^{k−1} = n/2.
• Write equations (using c = Θ(1) notation):

    t1 = c + T(n/2)
    t2 = c
    t3 = c
    t4 = c + max(t1, t2)
    t5 = c + T(n/2)
    t6 = c
    t7 = t6 + c + max(t5, t4)
    t8 = c
    t9 = c
    t10 = c
    t11 = { c             if n = 1
          { c + t7 + t8   otherwise
Binary Search: Time analysis

• Solving, simplifying, and folding constants, we obtain the recurrence

    T(n) = { c            if n = 1
           { c + T(n/2)   otherwise

• So far, we have only seen recurrences of the form

    F(n) = αF(n − a) + βF(n − b) + ··· + G(n)

  i.e., linear difference equations.
• Change of variables: let ξ = log n, and r(ξ) = T(2^ξ). Then T(n/2) = r(ξ − 1). New recurrence:

    r(ξ) = c + r(ξ − 1)

  Now it is a linear difference equation in ξ.

Binary Search: Solve recurrence

• From inspection r(ξ) = c(1 + ξ), but for practice:²

    r(ξ) = c + r(ξ − 1)
      ⇓  z-transform
    R(z) = c/(1 − z⁻¹) + z⁻¹R(z)

• Solve: we get R(z) = c/(1 − z⁻¹)², a double pole at z = 1.
• Recall the transform pairs Z[n] = z⁻¹/(1 − z⁻¹)² and Z[T(n − a)] = z⁻ᵃ T(z).
• Write R(z) = c · z⁺¹ · z⁻¹/(1 − z⁻¹)². Then

    Z⁻¹[ c · z⁺¹ · z⁻¹/(1 − z⁻¹)² ] = c · Z⁻¹[ z⁻¹/(1 − z⁻¹)² ]│ξ←ξ+1 = c·ξ│ξ←ξ+1 = c(ξ + 1)

² Note: we treat the c term as if it were c · u(n), where u(n) is the step function: u(n) = 1 if and only if n ≥ 0.
Binary Search: Solve recurrence

• Therefore r(ξ) = c(ξ + 1). And T(n) = r(log n), so

    T(n) = c(1 + log n)

• So binary search takes time O(log n) for n = 2^k.
• It turns out (we won't prove it here) that it is O(log n) for any n ≥ 1. (See assignment 1, section 1, problem 1.)
• Much faster than Linsearch. On an array of size n = 1,000,000:
  • on average, Linsearch will require ≈ 500,000 comparisons;
  • BinarySearch requires ≤ 21 comparisons.
• There is an even faster search method, interpolation search, that under favourable assumptions takes Θ(log log n) steps on average.
Binary Search: Correctness Proof

• The invariant helps immensely in proving correctness.
• To prove: if the preconditions are satisfied, then BinarySearch2(A, x, l, h) returns true if and only if there is a k ∈ [l, h] such that A[k] = x.
• Proof architecture: Progress and Preservation.
  1. Progress is made: in each recursive call, the problem becomes strictly smaller. (This implies a base case is eventually reached.)
  2. Preservation: the invariant is satisfied at all entries to the function.
     2.1 The invariant is satisfied when BinarySearch2 is called from BinarySearch (initial entry).
     2.2 The invariant is preserved when BinarySearch2 calls itself recursively.
  3. Base cases: when BinarySearch2 returns directly (without calling itself), its return value is correct.
  4. (1, 2, 3) together yield a simple correctness proof by induction over problem size.

Binary Search: Correctness Proof

A basic rule of inference: in the branches of an if statement,

    if ψ then
      (1) ···
    else
      (2) ···

• At point (1), ψ is true.
• At point (2), ψ is false.

Example: if ψ were 'n > m', where n, m ∈ N, then at (1) n > m is true, and at (2) ¬(n > m) is true, which means n ≤ m.

Binary Search: Correctness Proof

Lemma (Base cases correct)
If the invariant holds, and BinarySearch2 returns directly without calling itself, then it returns true if and only if there is a k ∈ [l, h] such that A[k] = x. Moreover, BinarySearch2 always returns directly on problems of size n = 1.

Proof.
There are two cases.
1. The program path taken is:
      if (1) l = h then return true

   We have l = h, and the invariant says A[l] ≤ x ≤ A[h]; since ⟨T, ≤⟩ is a total order, antisymmetry ((x ≤ y) ∧ (y ≤ x) ⇒ x = y) implies A[l] = x. The return value is true, satisfying the requirement. If the problem size is h − l + 1 = 1 then l = h, and BinarySearch2 returns directly.
2. The program path taken is:

      BinarySearch2(A, x, l, h)
        if (1) l = h then
        else if (2) x ≤ A[i] then
        else if (3) x < A[i + 1] then
          return false

   We have (1) l ≠ h, (2) x > A[i], and (3) x < A[i + 1]. Putting (2) and (3) together, A[i] < x < A[i + 1], which together with the array being sorted implies x is not in the array, and the return value false is correct.

Binary Search: Correctness Proof

Lemma (Progress)
Each time BinarySearch2 calls itself, the problem size is strictly smaller.

Proof.
The possible recursion paths are:

    BinarySearch2(A, x, l, h)
      if (1) l = h then
      else
        i ← l + ⌊(h − l)/2⌋
        ··· (a) BinarySearch2(A, x, l, i)
        ··· (b) BinarySearch2(A, x, i + 1, h)

From (1) we have l ≠ h. Therefore the problem size n = h − l + 1 satisfies n ≥ 2. We have two cases:
1. (Call site (a).) The problem size of the recursive call is i − l + 1, so we must prove i − l + 1 < h − l + 1. This is equivalent to i < h. (By contradiction.) Suppose that i ≥ h. Then, substituting i = l + ⌊(h − l)/2⌋, we obtain ⌊(h − l)/2⌋ ≥ h − l, a contradiction since h − l ≥ 1.
2. (Call site (b).) To prove: h − (i + 1) + 1 < h − l + 1. This is equivalent to l < i + 1. From the invariant, l ≤ h, so ⌊(h − l)/2⌋ ≥ 0. Since i = l + ⌊(h − l)/2⌋, l ≤ i. Therefore l < i + 1.
Todd L. Veldhuizen [email protected]
Outline
Lemma (Preservation) Bibliography If the invariant is satisfied on a call to BinarySearch2, then it is satisfied on calls by BinarySearch2 to itself. Proof. Recall the invariant is:
(l ≤ h) ∧ (A[l] ≤ x ≤ A[h]) ∧ (A sorted)
We never modify the array, so A remains sorted. There are two cases to consider.
ECE750-TXB Binary Search: Correctness Proof I Lecture 4: Search & Correctness Proofs
Todd L. Veldhuizen 1. The program path taken is [email protected]
BinarySearch2(A, x, l, h) Outline if (1)l = h then Bibliography else i ← l + b(h − l)/2c if (2)x ≤ A[i] then return BinarySearch2(A, x, l, i)
We have (1) l 6= h and (2) x ≤ A[i]. Need to prove 1.1 A[l] ≤ x ≤ A[i]. We have A[l] ≤ x from the invariant, and x ≤ A[i] from the branch condition (2), so A[l] ≤ x ≤ A[i]. 1.2 l ≤ i. From the invariant, l ≤ h, so b(h − l)/2c ≥ 0. Therefore l ≤ i = l + b(h − l)/2c. ECE750-TXB Binary Search: Correctness Proof II Lecture 4: Search & Correctness 2. The program path taken is Proofs Todd L. Veldhuizen BinarySearch2(A, x, l, h) [email protected] if (1)l = h then Outline else Bibliography i ← l + b(h − l)/2c if (2)x ≤ A[i] then else if (3)x < A[i + 1] then else return BinarySearch2(A, x, i + 1, h)
Need to prove: 2.1 A[i + 1] ≤ x ≤ A[h]. From (3) x ≥ A[i + 1] and from the invariant x ≤ A[h]. Therefore A[i + 1] ≤ x ≤ A[h]. 2.2 i + 1 ≤ h. We proved i < h in the first case of the progress lemma, so i + 1 ≤ h.
ECE750-TXB Binary Search: Correctness Proof: Denouement Lecture 4: Search & Correctness Put everything together: Proofs Todd L. Veldhuizen Lemma (Correct Step) [email protected] If the invariant is initially satisfied, then Outline 1. If the problem is of size n = 1, BinarySearch2 Bibliography returns immediately a correct answer. 2. If the problem is of size n > 1 then BinarySearch2 either immediately returns a correct answer, or calls itself with the invariant satisfied on a problem of size n0 where 1 ≤ n0 < n.
Proof. (1) from the base cases lemma. (2) if it returns immediately it is correct from the base cases lemma. Otherwise, it calls itself: progress lemma gives n0 < n, preservation lemma gives invariants satisfied, and 1 ≤ n0 follows from l ≤ h (invariant). ECE750-TXB Binary Search: Correctness Proof Lecture 4: Search & Correctness Proofs
Theorem Todd L. If the invariant is satisfied then returns a Veldhuizen BinarySearch2 [email protected] correct answer. Outline Proof. Bibliography By induction on problem size. 1. Base case. To prove: if n = 1 a correct answer is returned. Proof: apply Correct Step Lemma. 2. Induction step. To prove: if a correct answer is returned for problems of size ≤ n (induction hypothesis), then a correct answer is returned for a problem of size n + 1. Proof: apply the Correct Step Lemma: for a problem of size n + 1, BinarySearch2 is correct or calls itself on a problem of size n0 < n + 1. Therefore n0 ≤ n, and from the induction hypothesis a correct answer is returned.
Binary Search: The front end

One last item: the entry routine BinarySearch. It establishes the invariant. Its only requirement is that A is a sorted array of at least one element.

    BinarySearch(n, A, x)
      requires (A sorted) ∧ (n ≥ 1)
      if (1) x < A[0] then return false
      else if (2) x > A[n − 1] then return false
      else return (3) BinarySearch2(A, x, 0, n − 1)

For (3) to be reached, A must be sorted, and
1. n ≥ 1 implies 0 ≤ n − 1, establishing l ≤ h;
2. (1) gives x ≥ A[0], and (2) gives x ≤ A[n − 1], establishing A[l] ≤ x ≤ A[h].
Todd L. Veldhuizen [email protected]
Outline
I This establishes the correctness of binary search up to a Bibliography basic level of rigour.
I However, proofs by hand are notoriously error-prone: I I did this proof by hand, and given my track record, I give it a 25% chance of being correct.
I Reward of $5 for each error found, up to a maximum of $20. :)
I Gold standard is a formal proof in a system such as Isabelle, Coq, ACL2, etc.
ECE750-TXB In praise of invariants Lecture 4: Search & Correctness Proofs
Todd L. Veldhuizen [email protected] I A good invariant is an indispensible tool in designing algorithms and data structures. Outline Bibliography I The fine details of the algorithm are often dictated by the need to 1. Preserve the invariant (during recursion, iteration, changing state); 2. Make progress; 3. Handle the base cases correctly.
I If you design an algorithm around an invariant: 1. the invariant guides you in the design; 2. you are more likely to have a correct implementation; 3. the proof of correctness is often easier (and, sometimes, straightforward). ECE750-TXB Bibliography I Lecture 4: Search & Correctness Proofs
Todd L. Veldhuizen [email protected]
Outline
Bibliography ECE750-TXB Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen ECE750-TXB Lecture 5: Veni, Divisi, Vici [email protected] (Divide and Conquer) Divide and Conquer
Abstract Data Todd L. Veldhuizen Types Bibliography [email protected]
Electrical & Computer Engineering University of Waterloo Canada
February 14, 2007
Divide And Conquer

BinarySearch is an (arguably degenerate) instance of a basic algorithm design pattern:

Divide And Conquer
1. If the problem is of trivial size, solve it and return.
2. Otherwise:
   2.1 Divide the problem into several problems of smaller size.
   2.2 Conquer these smaller problems by recursively applying the divide-and-conquer pattern.
   2.3 Combine the answers to the smaller problems into an answer to the whole problem.

Divide and Conquer

Binary search as divide-and-conquer:
1. If the problem is of size n = 1, the answer is obvious; return.
2. Otherwise:
   • Split the array into two halves. Principle:

       (x in A[l..h]) ≡ (x in A[l..i]) ∨ (x in A[i + 1..h])

   • Search the two halves: for one half, call self recursively; for the other half, the answer is false.
   • Combine the two answers: since one answer is always false, and "x or false" is just "x", simply return the answer from the half we searched.
Divide-and-Conquer Recurrences

If
1. the base cases (trivially small problems) require time O(1), and
2. a problem of size n is split into k subproblems of size s(n), and
3. splitting the problem into subproblems and combining the answers takes time f(n),
then the general form of the time recurrence is

    T(n) = c + k·T(s(n)) + f(n)

e.g., for binary search we had k = 1 (we only had to search one half), s(n) = n/2, and f(n) = 0.
Todd L. Veldhuizen [email protected]
Divide and Conquer Recall that our MatrixMultiply routine took Θ(n3) I Abstract Data time. Types Bibliography I Can we do better? No one thought so until...
I A landmark paper: Volker Strassen, Gaussian Elimination is not optimal (1969). [1] 3 I An o(n ) divide-and-conquer approach to matrix multiplication.
ECE750-TXB Strassen’s method Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected]
Divide and Conquer k+1 Abstract Data Strassen: “If A, B are matrices of order m2 to be Types
multiplied, write Bibliography
A A B B C C A = 11 12 B = 11 12 C = 11 12 A21 A22 B21 B22 C21 C22
k where the Aik , Bik , Cik matrices are of order m2 ... ECE750-TXB Strassen’s method Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen “Then compute [email protected] Divide and I = (A11 + A22)(B11 + B22) Conquer II = (A21 + A22)B11 Abstract Data Types III = A11(B12 − B22) Bibliography (7 subproblems) IV = A22(−B11 + B21) V = (A11 + A12)B22 VI = (−A11 + A21)(B11 + B12) VII = (A12 − A22)(B21 + B22) C = I + IV − V + VII 11 C = II + IV (and combine) 21 C12 = III + V C22 = I + III − II + VI
Strassen's method

1. Subproblems: compute 7 matrix multiplications of size n/2.
2. Constructing the subproblems and combining the answers is done with matrix additions/subtractions, taking Θ(n²) time.
3. Apply the general divide-and-conquer recurrence with k = 7, s(n) = n/2, f(n) = Θ(n²):

    T(n) = c + 7T(n/2) + Θ(n²)

Strassen's method

Recall that Θ(n²) means "some function f(n) for which f(n)/n² is eventually restricted to some finite positive interval [c₁, c₂]." Pick a value c > c₂; then eventually f(n) ≤ cn². Solve the recurrence

    T(n) = c + 7T(n/2) + cn²

This will give an asymptotically correct bound, though possibly one in which T(n) is less than the actual time required for small n.
Strassen's method

• Let r(ξ) = T(2^ξ), with ξ = log₂ n. The recurrence becomes

    r(ξ) = c + 7r(ξ − 1) + c(2^ξ)²
         = c + 7r(ξ − 1) + c·4^ξ

• Z-transform:

    Z[c]          = c/(1 − z⁻¹)
    Z[7r(ξ − 1)]  = 7z⁻¹ R(z)
    Z[c·4^ξ]      = c/(1 − 4z⁻¹)

• The Z-transform version of the recurrence is:

    R(z) = c/(1 − z⁻¹) + 7z⁻¹ R(z) + c/(1 − 4z⁻¹)

• Solve:

    R(z) = 2c(1 − (5/2)z⁻¹) / ((1 − z⁻¹)(1 − 4z⁻¹)(1 − 7z⁻¹))

Strassen's method

• Look at the singularities: a zero at z = 5/2, and poles at z = 1, 4, 7.
• The pole at z = 7 is asymptotically dominant:

    r(ξ) ∼ c₁·7^ξ

• Change variables back, with ξ = log₂ n:

    T(n) ∼ c₁·7^{log₂ n} = c₁·n^{log₂ 7} = c₁·n^{2.807···}

• Strassen matrix multiplication takes Θ(n^{2.807···}) time.
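For reference, here is a compact, unoptimized Java sketch of the recursion just described, for square matrices whose dimension is a power of two. The helper names (add, sub, block, join) are illustrative assumptions; a practical implementation would recurse down only to some small base size and then switch to the classical algorithm.

    // Sketch of Strassen's method for n×n matrices, n a power of two.
    public class Strassen
    {
        static double[][] multiply(double[][] A, double[][] B)
        {
            int n = A.length;
            if (n == 1)
                return new double[][] {{ A[0][0] * B[0][0] }};
            int m = n / 2;
            double[][] A11 = block(A, 0, 0, m), A12 = block(A, 0, m, m),
                       A21 = block(A, m, 0, m), A22 = block(A, m, m, m);
            double[][] B11 = block(B, 0, 0, m), B12 = block(B, 0, m, m),
                       B21 = block(B, m, 0, m), B22 = block(B, m, m, m);
            // The seven half-size products I..VII from the slide:
            double[][] p1 = multiply(add(A11, A22), add(B11, B22));  // I
            double[][] p2 = multiply(add(A21, A22), B11);            // II
            double[][] p3 = multiply(A11, sub(B12, B22));            // III
            double[][] p4 = multiply(A22, sub(B21, B11));            // IV
            double[][] p5 = multiply(add(A11, A12), B22);            // V
            double[][] p6 = multiply(sub(A21, A11), add(B11, B12));  // VI
            double[][] p7 = multiply(sub(A12, A22), add(B21, B22));  // VII
            double[][] C11 = add(sub(add(p1, p4), p5), p7);  // I + IV − V + VII
            double[][] C12 = add(p3, p5);                    // III + V
            double[][] C21 = add(p2, p4);                    // II + IV
            double[][] C22 = add(sub(add(p1, p3), p2), p6);  // I + III − II + VI
            return join(C11, C12, C21, C22);
        }

        // Θ(n²) helpers: the "splitting and combining" work.
        static double[][] add(double[][] X, double[][] Y) {
            int n = X.length; double[][] Z = new double[n][n];
            for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) Z[i][j] = X[i][j] + Y[i][j];
            return Z;
        }
        static double[][] sub(double[][] X, double[][] Y) {
            int n = X.length; double[][] Z = new double[n][n];
            for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) Z[i][j] = X[i][j] - Y[i][j];
            return Z;
        }
        static double[][] block(double[][] X, int r, int c, int m) {
            double[][] Z = new double[m][m];
            for (int i = 0; i < m; i++) for (int j = 0; j < m; j++) Z[i][j] = X[r + i][c + j];
            return Z;
        }
        static double[][] join(double[][] C11, double[][] C12, double[][] C21, double[][] C22) {
            int m = C11.length; double[][] C = new double[2 * m][2 * m];
            for (int i = 0; i < m; i++) for (int j = 0; j < m; j++) {
                C[i][j] = C11[i][j];        C[i][j + m] = C12[i][j];
                C[i + m][j] = C21[i][j];    C[i + m][j + m] = C22[i][j];
            }
            return C;
        }
    }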
Divide-and-conquer recurrences I

• Our analysis of Strassen's algorithm is easily generalized.
• Consider a divide-and-conquer recurrence of the form

    T(n) = kT(n/s) + Θ(n^d)                    (1)

• To obtain an asymptotic upper bound we can solve the related recurrence

    T′(n) = kT′(n/s) + cn^d                    (2)

• It can be shown that T(n) = Θ(T′(n)).
• Analyzing for the case when n = s^ξ with ξ ∈ N, we can change variables to turn Eqn. (2) into:

    r(ξ) = kr(ξ − 1) + c(s^d)^ξ

Divide-and-conquer recurrences II

• Z-transform:

    R(z) = kz⁻¹R(z) + c/(1 − s^d z⁻¹)

• Solve:

    R(z) = c / ((1 − kz⁻¹)(1 − s^d z⁻¹))

• Which is the dominant pole? It depends on the values of k, s, d.
• Three cases: s^d < k, s^d = k, s^d > k.

1. s^d < k. Then the dominant pole is z = k. We get

    r(ξ) = Θ(k^ξ)

   Since n = s^ξ, use ξ = log_s n:

    T′(n) = Θ(k^{log_s n}) = Θ((2^{log n})^{log k / log s}) = Θ(n^{log k / log s})
d Todd L. 2. s = k. Then get a double pole at z = k. Recall that Veldhuizen [email protected] −1 −1 kz ξ Z = ξk Divide and (1 − kz−1)2 Conquer Abstract Data End up with Types Bibliography r(ξ) ξkξ
log k·log n log k 0 logs n log s log s T (n) (logs n)k (log n)2 = n log n log sd = n log s log n = nd log n
3. sd > k. Then dominant singularity is z = sd . Get
r(ξ) (sd )ξ
0 d log n d log s log n d T (n) (s ) s = (2 ) log s = n ECE750-TXB ‘Master’ Theorem Lecture 5: Veni, Divisi, Vici Theorem (‘Master’) Todd L. Veldhuizen The solution to a divide-and-conquer recurrence of the form [email protected]
d Divide and T (n) = kT (dn/se) + Θ(n ) Conquer Abstract Data where s > 1, is Types Bibliography log k Θ n log s if sd < k T (n) = Θ nd log n if sd = k Θ(nd ) if sd > k
I Examples: log k I Binary search: k=1, s=2, d=0: second case, log s = 0, so T (n) = Θ(n0 log n) = Θ(log n). log k I Strassen: k = 7, s = 2, d = 2: first case, log s ≈ 2.807, so T (n) = Θ(n2.807···).
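As a further worked instance (not from the notes): mergesort splits a problem of size n into k = 2 subproblems of size n/2 (so s = 2) and merges the sorted halves in Θ(n) time (so d = 1). Then s^d = 2¹ = 2 = k, the second case applies, and

    T(n) = Θ(n¹ log n) = Θ(n log n).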
Abstract Data Types

• A basic software engineering principle:
    Separate the interface (what you can do) from the implementation (how it is done).
• An abstract data type (ADT) is an interface to a collection of data.
• There may be numerous ways to implement an ADT, each with different performance characteristics.
• An ADT consists of:
  1. Some types, which may be required to provide operations and relations, and to satisfy properties (e.g., a totally ordered set).
  2. An interface: the capabilities provided by the ADT.
Todd L. Veldhuizen I Notation for types: hT , f1, f2, ··· , R1, R2, · · · i indicates [email protected] a set T together with Divide and I Some operators or functions f1, f2,... Conquer I Some relations R1, R2, ··· . Abstract Data Types This is the standard notation for a structure in logic: Bibliography could be
I an algebra (functions but no relations) e.g. a field hF , +, ∗, −, ·−1, 0, 1i;
I a relational structure (relations but no functions) e.g. a total order hT , ≤i;
I some structure with both functions and relations e.g. an ordered field hF , +, ∗, −, ·−1, 0, 1, ≤i
I Often a type is required to satisfy certain axioms, or belong to a specified class of structures, e.g. (a field, a total order, a distance metric.) ECE750-TXB ADT: Dictionary[K, V ] Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen Dictionary[K,V] [email protected]
Stores a set of pairs (key, value), and finds values by Divide and I Conquer key. At most one value per key is permitted. Abstract Data Types I Types: Bibliography I hKi: key type (e.g. a word). I hV , 0i: value type (e.g., a definition of a word). The value 0 is a special value used to indicate absence of a dictionary entry.
I Operations:
I insert(k, v): insert a key-value pair into the dictionary. If an entry for the key is already present, it is replaced.
I V find(k): if (k, v) is in the dictionary, returns v; otherwise returns 0.
I remove(k): deletes the entry for key k, if present.
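In Java, this ADT might be written as the following interface. This is a sketch; in particular, using null to play the role of the special "absent" value 0 is an assumption, not something the notes prescribe.

    // Sketch of the Dictionary[K,V] interface.
    public interface Dictionary<K, V>
    {
        void insert(K key, V value);  // replaces any existing entry for key
        V find(K key);                // value for key, or null if absent
        void remove(K key);           // no effect if key is absent
    }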
Abstract Data Types

One ADT, several candidate data structures:

    ADT            →    Implementation (data structure)
    Dictionary     →    linked list, sorted array, search tree

    Implementation    insert time    find time    remove time
    Linked list       O(n)           O(n)         O(n)
    Sorted array      O(n)           O(log n)     O(n)
    Search tree       O(log n)       O(log n)     O(log n)
Todd L. Veldhuizen [email protected]
Questions to consider when choosing a data structure: Divide and Conquer
1. What type of data do you need to store? Abstract Data Types I Does it have a natural total order? e.g. integers, strings under lexicographic ordering, database keys, etc. Bibliography
I Does it bear some natural partial order? I Do elements represent points or regions in some metric space, e.g., Rn? (e.g., screen regions, boxes in a three-dimensional space, etc.) The order relation(s) or geometric organization of your data may allow the use of ADTs that exploit those properties to allow efficient access.
ECE750-TXB The role of ADTs in choosing data structures Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected]
Divide and 2. What operations are required? Do you need to Conquer Abstract Data I determine whether a value is in the data set (search)? Types I insert, modify, delete elements? Bibliography I iterate through the elements (order doesn’t matter)? I iterate through the elements in some sorted order? I find the “biggest” or “smallest” element? I find elements that are “close” to some value? These requirements can be compared with the interfaces provided by ADTs, to decide what ADTs might be suitable. ECE750-TXB The role of ADTs in choosing data structures Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected]
Divide and 3. What is the typical mixture of operations you will Conquer perform? Will you Abstract Data Types I insert frequently? Bibliography I delete frequently? I search frequently? Different ADT implementations may offer different performance characteristics. By understanding the typical mixture of operations you can choose an implementation with the most suitable performance characteristics.
ECE750-TXB ADT: Array[V] Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected] Array[V] Divide and I A finite sequence of cells, each containing a value. Conquer : Abstract Data I Types Types
I hV , 0i: a value type, with a “default value” 0 used to Bibliography initialize cells.
I Operations : I Array(n): create an array of length n, where n ∈ N is a positive integer. Cells are initialized to 0.
I integer length(): returns the length of the array I get(i): returns the value of cell i. It is required that 0 ≤ i ≤ length() − 1.
I set(i, v): sets the value of cell i to v. It is required that 0 ≤ i ≤ length() − 1. ECE750-TXB ADT: Set[V] Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected] Set[V] Divide and Conquer I Stores a set of values. Permits inserting, removing, and Abstract Data testing whether a value is in the set. At most one Types instance of a value can be in the set. Bibliography
I Types:
I hV i: a value type
I Operations:
I insert(v): adds v to the set, if absent. Inserting an element already in the set causes no change.
I remove(v): removes v from the set, if present; I boolean contains(v): returns true if and only if v is in the set.
ECE750-TXB ADT: Multiset[V] Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected] Multiset[V] Divide and Conquer I Stores a multiset of values (i.e., with duplicate elements Abstract Data permitted.) Permits inserting, removing, and testing Types whether a value is in the set. Sometimes called a bag. Bibliography
I Types:
I hV i: a value type
I Operations:
I insert(v): adds v to the multiset. I remove(v): removes an instance of v from the multiset, if present;
I boolean contains(v): returns true if and only if v is in the set. ECE750-TXB Stacks, Queues, and Priority Queues Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected]
I Informally, a queue is a collection of objects “awaiting Divide and their turn.” e.g., customers queueing at a grocery store. Conquer Abstract Data The queueing policy governs “who goes next.” Types
I First In, First Out (FIFO): like a line at the grocery Bibliography store: the element that was added least recently goes next.
I Last In, First Out (LIFO): the item added most recently goes next. A stack: like an “in-box” of work where new items are placed on the top, and what ever is on the top of the stack gets processed next.
I Priority Queueing: items are associated with priorities; the item with the highest priority goes next.
ECE750-TXB ADT: Queue[V] Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected]
Queue[V] Divide and Conquer
I A FIFO queue (first in, first out) of objects. Abstract Data Types I Types : Bibliography I hV i: a value type
I Operations :
I insert(v): adds the object v to the end of the queue I boolean isEmpty(): returns true just when the queue is empty
I V next(): returns and removes the value at the front of the queue. It is an error to perform this operation when the queue is empty. ECE750-TXB ADT: Stack[V] Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected]
Stack[V] Divide and Conquer
I A LIFO (last in, first out) stack of objects. Abstract Data Types I Types : Bibliography I hV i: a value type
I Operations :
I push(v): adds a value to the top of the stack. I boolean isEmpty(): returns true just when the stack contains no elements.
I V pop(): returns the value at the top of the stack. It is an error to perform this operation when the stack is empty.
ECE750-TXB ADT: PriorityQueue[P,V] Lecture 5: Veni, Divisi, Vici
Todd L. PriorityQueue[P,V] Veldhuizen [email protected] I A queue of objects, each with an associated priority, in Divide and which an object with maximal priority is always chosen Conquer
next. Abstract Data Types Types : I Bibliography I hP, ≤i: a priority type, with a total order ≤. I hV i: a value type
I Operations :
I insert(p, v): insert a pair (p, v) where p ∈ P is a priority, and v ∈ V is a value;
I boolean isEmpty(): returns true just when the queue is empty;
I (P, V ) next(): returns and removes the object at the front of the queue, which is guaranteed to have maximal priority. It is an error to perform this operation on an empty queue. ECE750-TXB Data Structure: Linked List Lecture 5: Veni, Divisi, Vici
Todd L. Veldhuizen [email protected] I Realizes a list of items, e.g., (2, 3, 5, 7, 11) Divide and I Inserting and removing elements at the front of the list Conquer requires O(1) time. Abstract Data Types
I Searching requires iterating through the list: O(n) time Bibliography I List can be iterated through from front to back. I Basic building block is a node that contains:
I data: A piece of data; I next:A pointer to the next node, or a null pointer if the node is the end of the list.
ECE750-TXB Data Structure: Linked List Lecture 5: Veni, Divisi, Vici
    public class LinkedList<T>
    {
        Node<T> head = null;
        ...
    }

    class Node<T>
    {
        T data;
        Node<T> next;

        Node(T data, Node<T> next)
        {
            this.data = data;
            this.next = next;
        }
    }

• Insert a new data element:

    public void insert (T data)
    {
        head = new Node<T>(data, head);
    }

• Remove the front element, if any:

    public T removeFirst()
    {
        if (head == null)
            throw new RuntimeException("List is empty.");
        else {
            T data = head.data;
            head = head.next;
            return data;
        }
    }

• Check if the list is empty:

    public boolean isEmpty()
    {
        return (head == null);
    }
Todd L. Veldhuizen [email protected]
Divide and I The ADT Stack[V ] is naturally implemented by a Conquer
linked list. Abstract Data Types
public class Stack
public void push(V v) { list . insert (v); } public boolean isEmpty() { return list .isEmpty(); } public V pop() { return list .removeFirst (); } }
I push(v), isEmpty(), and pop() require O(1) time.
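A brief usage sketch (the values are arbitrary, chosen only to show the LIFO behaviour):

    // The most recently pushed element is popped first.
    Stack<Integer> s = new Stack<Integer>();
    s.push(1);
    s.push(2);
    s.push(3);
    while (!s.isEmpty())
        System.out.println(s.pop());   // prints 3, 2, 1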
Iterators

Iterator[V]
• An iterator is an ADT that provides traversal through some container. It abstracts away from the details of how the data are stored, and presents a simple interface for retrieving the elements of the container one at a time.
• Types:
  • V: the value type stored in the container
• Operations:
  • Iterator(C): initialize the iterator to point to the first element in the container C.
  • boolean hasNext(): returns true just when there is another element in the container.
  • V next(): returns the next element in the container, and advances the iterator.

An Iterator for a linked list
    public class ListIterator<T>
    {
        Node<T> node;

        ListIterator(LinkedList<T> list)
        {
            node = list.head;
        }

        boolean hasNext()
        {
            return (node != null);
        }

        T next()
        {
            if (node == null)
                throw new RuntimeException("Tried to iterate past end of list");
            T data = node.data;
            node = node.next;
            return data;
        }
    }
Todd L. Veldhuizen [email protected]
Divide and Conquer
Abstract Data Example usage of an iterator: Types ListIterator
Bidirectional Linked List

• Recall that the Queue[V] ADT requires first-in, first-out queueing. But the list as we have shown it only supports inserting and removing at one end.
• A simple variant of a linked list: in addition to maintaining a pointer to the head of the list, also maintain a pointer to the tail of the list.
Todd L. Veldhuizen [email protected]
class BiNode
BiNode
BiNode(T data, BiNode
Bidirectional Linked List

    public class BiLinkedList<T>
    {
        BiNode<T> head = null;
        BiNode<T> tail = null;

        public void insert (T data)
        {
            head = new BiNode<T>(data, head, null);
            if (head.next != null)
                head.next.prev = head;   // back-link from the old head
            if (tail == null)
                tail = head;             // list was empty
        }

        public T removeLast()
        {
            if (tail == null)
                throw new RuntimeException("List is empty.");
            else {
                T data = tail.data;
                if (tail.prev == null)
                { head = null; tail = null; }
                else
                { tail = tail.prev; tail.next = null; }
                return data;
            }
        }
    }
Implementing a Queue[V] with a Bidirectional Linked List

    public class Queue<V>
    {
        BiLinkedList<V> list = new BiLinkedList<V>();

        public void insert (V v)
        { list.insert(v); }

        public boolean isEmpty()
        { return list.isEmpty(); }

        public V next()
        { return list.removeLast(); }
    }

• insert(v), isEmpty(), and next() all take O(1) time.
Todd L. Veldhuizen [email protected]
Divide and Conquer
Abstract Data Types [1] V. Strassen. Bibliography Gaussian elimination is not optimal. Numerische Mathematik, 13:354–356, 1969. bib pdf ECE750-TXB Lecture 6: Lists and Trees
ECE750-TXB Lecture 6: Lists and Trees
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering, University of Waterloo, Canada
February 14, 2007

Iterators
Iterator[V]
• An iterator is an ADT that provides traversal through some container. It abstracts away from the details of how the data are stored, and presents a simple interface for retrieving the elements of the container one at a time.
• Types:
  • V: the value type stored in the container
• Operations:
  • boolean hasNext(): returns true just when there is another element in the container;
  • V next(): returns the next element in the container, and advances the iterator.

An Iterator for a linked list

    public class ListIterator<T>
    {
        Node<T> node;

        ListIterator(LinkedList<T> list)
        {
            node = list.head;
        }

        boolean hasNext()
        {
            return (node != null);
        }

        T next()
        {
            if (node == null)
                throw new RuntimeException("Tried to iterate past end of list");
            T data = node.data;
            node = node.next;
            return data;
        }
    }

Iterators
Example usage of an iterator:

    ListIterator<Integer> i = new ListIterator<Integer>(list);
    while (i.hasNext())
        System.out.println(i.next());

Bidirectional Linked List

• Bidirectional linked list:
  • Each node has a link to both the next and previous items in the list.
  • We maintain a pointer to both the front and the back.
  • We can insert and remove items at both the front and back of the list.
• We will use bidirectional linked lists to illustrate two basic, but extremely useful, principles:
  1. Maintaining invariants of data structures;
  2. Symmetry.
Bidirectional Linked List

• We have encountered invariants already, in the correctness sketch of binary search. That was an invariant for a recursive algorithm, and was required to be true at each recursive invocation of the function. Here we discuss data structure invariants. An invariant of a data structure is a property that is required to be always true, except while we are performing some transient update operation.
• Invariants help us to implement data structures correctly: many basic operations can be viewed as disruptions of the invariant (e.g., inserting an element) after which we need to repair or maintain the invariant.

Bidirectional Linked List

• As in a linked list, the basic building block of a bidirectional linked list is a node.

    class BiNode<T>
    {
        T data;
        BiNode<T> next;
        BiNode<T> prev;

        BiNode(T data)
        {
            this.data = data;
            next = null;
            prev = null;
        }
    }
Todd L. Veldhuizen [email protected] I Let’s look at a few examples to see what invariants
suggest themselves. Linear Data Structures
Trees
Bibliography ECE750-TXB Bidirectional Linked List: Invariants Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected]
Linear Data Structures
I Here are the invariants we will use: Trees
1. (front 6= null) implies (front.prev = null). (If the list is Bibliography nonempty, there is no element before the front element.) 2. (back 6= null) implies (back.next = null). (If the list is nonempty, there is no element after the last element.) 3. (front = null) if and only if (back = null). 4. For any node x, 4.1 (x.next 6= null) implies x.next.prev = x; 4.2 (x.prev 6= null) implies x.prev.next = x.
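Such invariants can be checked mechanically during testing. The following method is a sketch, not from the notes, of how one might assert invariants 1-4; it assumes it lives inside a bidirectional list class with fields front and back.

    // Sketch: walk the list and verify invariants 1-4. Intended to be
    // called after each update operation during testing.
    boolean checkInvariants()
    {
        if ((front == null) != (back == null))        // invariant 3
            return false;
        if (front != null && front.prev != null)      // invariant 1
            return false;
        if (back != null && back.next != null)        // invariant 2
            return false;
        for (BiNode<T> x = front; x != null; x = x.next) {
            if (x.next != null && x.next.prev != x)   // invariant 4.1
                return false;
            if (x.prev != null && x.prev.next != x)   // invariant 4.2
                return false;
        }
        return true;
    }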
Bidirectional Linked List: Symmetry

• Bidirectional linked lists have a natural symmetry: if we 'reverse' the list by swapping the 'front ↔ back' pointers and each of the 'next ↔ prev' pointers, we get another bidirectional linked list.
• We'll call this the dual list.
• The symmetry extends to operations on the list: insertFront() is 'dual' to insertBack(), and removeFront() is dual to removeBack().
• This kind of duality has the following nice property:
  • carrying out a sequence of operations op1, ..., opk gives the same result as
  • taking the dual list, carrying out the dual sequence of operations ôp1, ..., ôpk, and taking the dual list.

Bidirectional Linked List: Symmetry

• Example: starting from the list [1, 2], we can
  • insertBack(3) to give [1, 2, 3];
  • removeFront() to give [2, 3].
  The dual version:
  • take the dual list [2, 1];
  • insertFront(3) to give [3, 2, 1];
  • removeBack() to give [3, 2];
  • take the dual list [2, 3].
  We get the same answer both ways.

Bidirectional Linked List: Symmetry

• Why we care about symmetry: if our implementation is correct,
  • we can obtain the routine insertBack() by taking the code for insertFront() and swapping front/back and prev/next;
  • ditto for removeBack() and removeFront();
  • the set of invariants should not change under swapping front/back and prev/next.
  Example: For any node x,
  1. (x.next ≠ null) implies x.next.prev = x;
  2. (x.prev ≠ null) implies x.prev.next = x.
  If we swap next/prev, we get
  1. (x.prev ≠ null) implies x.prev.next = x;
  2. (x.next ≠ null) implies x.next.prev = x.
  i.e., the same two invariants, exchanged.
Todd L. Veldhuizen [email protected]
Linear Data Structures I For operations at the front of the list, there are three Trees cases to consider: Bibliography 1. front=null (empty list) 2. front 6= null and front.next = null (one-element list) 3. front 6= null and front.next 6= null (multi-element list)
I We need to consider each of these cases when we implement, and ensure that in each case, the invariants are maintained.
ECE750-TXB Bidirectional Linked List: Implementation Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected] public void insertFront (T data) Linear Data { Structures
BiNode
Bibliography if ( front == null) /∗ Case 1 ∗/ { front = node; /∗ Both made non−null for Inv. 3 ∗/ back = node; } else { /∗ Case 2,3 ∗/ front .prev = node; /∗ Inv 4.1 ∗/ node.next = front; /∗ Inv 4.2 ∗/ front = node; } } ECE750-TXB Bidirectional Linked List: Implementation Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected] public void insertBack(T data) Linear Data { Structures
BiNode
Bibliography if (back == null) /∗ Case 1 ∗/ { back = node; /∗ Both made non−null for Inv. 3 ∗/ front = node; } else { /∗ Case 2,3 ∗/ back.next = node; /∗ Inv 4.1 ∗/ node.prev = back; /∗ Inv 4.2 ∗/ back = node; } }
ECE750-TXB Bidirectional Linked List: Implementation Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected] public T removeFront() { Linear Data if ( front == null) /∗ Case 1 ∗/ Structures throw new RuntimeException(”Empty list.”); Trees else { Bibliography T data = front.data; front = front.next; if ( front == null) /∗ Case 2 ∗/ back = null; else { front .prev = null; /∗ Case 3 ∗/ } return data; } } ECE750-TXB Bidirectional Linked List: Implementation Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected] public T removeBack() { Linear Data if (back == null) /∗ Case 1 ∗/ Structures throw new RuntimeException(”Empty list.”); Trees else { Bibliography T data = back.data; back = back.prev; if (back == null) /∗ Case 2 ∗/ front = null; else { back.next = null; /∗ Case 3 ∗/ } return data; } }
Trees

• Recall that binary search allowed us to find items in a sorted array in Θ(log n) time. However, inserting or removing an item from the array took Θ(n) time in the worst case.
• Balanced binary search trees offer Θ(log n) search, and also Θ(log n) insert and remove.
• More generally, trees offer a hierarchical decomposition of a search space:
  • spatial searching: R-trees, quadtrees, octtrees, kd-trees;
  • databases: B-trees and their kin;
  • intervals: interval trees;
  • ...
Todd L. Veldhuizen [email protected]
Basic building block is a tree node, which contains: Linear Data I Structures
I A data value, drawn from some total order hT , ≤i; Trees
I A pointer to a left child; Bibliography I A pointer to a right child. ECE750-TXB Binary Trees Lecture 6: Lists and Trees
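A minimal Java node class along these lines. This is a sketch, written to be consistent with the later code samples, which store int data and construct nodes both as new Tree(z) and as new Tree(data, left, right).

    // Sketch of a binary tree node as used in the search-tree code below.
    class Tree
    {
        int data;     // drawn from a total order (here: int)
        Tree left;    // left child, or null
        Tree right;   // right child, or null

        Tree(int data)
        { this(data, null, null); }

        Tree(int data, Tree left, Tree right)
        {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }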
Todd L. Veldhuizen [email protected]
Linear Data Structures
Trees
Bibliography
I Traversing a tree means visiting all its nodes. There are three common orders for doing this:
I Preorder: a node is visited before its children, e.g., [E, C, B, A, D, G, F , H, I ]
I Inorder: the left subtree is visited, then the node, then the right subtree, e.g., [A, B, C, D, E, F , G, H, I ].
I Postorder: a node is visited after its children, e.g., [A, B, D, C, F , I , H, G, E].
Binary Trees

• Terminology (referring to the example tree above):
  • E is the root and is on level 0;
  • C, G are children of E and are on level 1;
  • C is the parent of B, D;
  • C is an ancestor of A (as are E and B);
  • I is a descendent of G (and E and H);
  • the sequence (E, C, B, A) is a path from the root to A.
Todd L. Veldhuizen [email protected]
Linear Data I Binary Search Trees satisfy the following invariant: for Structures any tree node x, Trees 1. If x.left 6= null, then x.left.data ≤ x.data; Bibliography 2. If x.right 6= null, then x.data ≤ x.right.data.
I If we want to visit the nodes of the tree in order, we start at the root and recursively: 1. Visit the left subtree; 2. Visit the node; 3. Visit the right subtree. i.e. an inorder traversal.
ECE750-TXB Binary Search Trees Lecture 6: Lists and Trees
• The search procedure is very similar to binary search of an array:

    boolean contains(int z)
    {
        if (z == data)
            return true;
        else if ((z < data) && (left != null))
            return left.contains(z);
        else if ((z > data) && (right != null))
            return right.contains(z);
        return false;
    }
Todd L. Veldhuizen [email protected]
Linear Data Structures I The worst-case performance of contains(z) depends on Trees the height of the tree. Bibliography I The height of a tree is the length of the longest path from the root to a leaf.
I Root: the node at the top of the tree I Leaf (or external node): a node with no children I Internal node: any node that is not a leaf.
I Worst-case time required for contains(z) is O(h), where h is the height of the tree.
ECE750-TXB Binary Search Trees Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected]
I A binary search tree of height h can contain up to Linear Data 2h − 1 values. Structures Trees Given n values, we can always construct a binary search I Bibliography tree of height at most 1 + log2(n): I Sort the n values in ascending order and put them in an array A[0..n − 1].
I Make the root element A[bn/2c]. I Build the left subtree using elements A[0..bn/2c − 1] I Build the right subtree using elements A[bn/2c + 1].
I But, we can also construct a binary tree of height n − 1.
I Make the root A[0]; make its right subtree A[1], ... ECE750-TXB Binary Search Trees Lecture 6: Lists and Trees
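A recursive Java sketch of the balanced construction; buildBalanced is an assumed helper name, using the Tree class from earlier.

    // Build a balanced BST from a sorted array A[lo..hi] (inclusive bounds).
    // The middle element becomes the root; each recursion halves the range,
    // so the resulting height is O(log n).
    static Tree buildBalanced(int[] A, int lo, int hi)
    {
        if (lo > hi)
            return null;
        int mid = lo + (hi - lo) / 2;
        Tree root = new Tree(A[mid]);
        root.left = buildBalanced(A, lo, mid - 1);
        root.right = buildBalanced(A, mid + 1, hi);
        return root;
    }

    // Usage: Tree t = buildBalanced(sortedArray, 0, n - 1);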
• To achieve O(log n) search times, it is necessary to keep the tree balanced, i.e., to have all leaves roughly the same distance from the root.
• This is easy if the contents of the tree are fixed.
• It is not easy if we are adding and removing elements dynamically.
• We can aim for average-case balance, i.e., the probability of having a badly balanced tree → 0 as n → ∞.
  • Example: treaps.
• Or we can use deterministic balancing, which guarantees balance in the worst case:
  • red-black trees;
  • AVL trees;
  • 2-3 trees;
  • B-trees;
  • splay trees.
Enumeration of Binary Search Trees

• A useful fact: the number of valid binary search trees on n keys is given by the Catalan numbers:

    Cn = (1/(n + 1)) · (2n choose n)
       ∼ 4^n / (√π · n^{3/2})

  (Sequence A000108 in the Online Encyclopedia of Integer Sequences.)
• First few values:

    n : 1  2  3  4   5   6    7    8     9     10     11     12
    Cn: 1  2  5  14  42  132  429  1430  4862  16796  58786  208012

Binary Search Trees

• A naive insertion strategy, which does not guarantee balance, is the following:
    void insert (int z)
    {
        if (z == data) return;
        else if (z < data)
        {
            if (left == null)
                left = new Tree(z);
            else
                left.insert(z);
        }
        else if (z > data)
        {
            if (right == null)
                right = new Tree(z);
            else
                right.insert(z);
        }
    }

  Note the symmetry: we can swap left/right and <, >.
• Naive insertion works fairly well in the average case [1].

Theorem
The expected height of a binary search tree constructed by inserting a sequence of n random values is ∼ c log n, with c ≈ 4.311.

• Equivalently: inserting n values in a randomly chosen order.
• Using Markov's inequality, we can say that if H is a random variable giving the height of a tree after the insertion of n keys chosen uniformly at random, then

    Pr(H ≥ αn) ≤ E[H]/(αn) = (c log n)/(αn) = O(log n / n)

  i.e., the probability of a tree having height linear in n converges to zero. So badly balanced trees are very unlikely for large n.
[Figure: binary search tree resulting from 100 random insertions.]
Rotations

• A rotation is a simple, local operation that makes some subtrees shorter and others deeper.
• Rotations preserve the inorder traversal, i.e., the order of keys in the tree remains the same.
• Any two binary search trees on the same set of keys can be transformed into one another by a sequence of rotations.¹
• Rotations are a common method of restoring balance to a tree.

          D                                  B
         / \      rotate right at D →      / \
        B   E                             A   D
       / \        ← rotate left at B         / \
      A   C                                 C   E

¹ In fact, something even more interesting is true: for each n there is a sequence of rotations that produces every possible binary tree without any duplicates, and eventually returns the tree to its initial configuration. (i.e., the rotation graph Gn, where vertices are trees and edges are rotations, contains a Hamiltonian path [2].)
Todd L. Veldhuizen I Code to rotate right: [email protected]
Tree rotateRight () Linear Data { Structures if ( left == null) Trees throw new RuntimeException(”Cannot rotate here”); Bibliography
Tree A = left . left ; Tree B = left ; Tree C = left . right ; Tree D = this; Tree E = right;
return new Tree(B.data, A, new Tree(D.data, C, E)); }
I Code to rotate left: use duality, swap left/right.
ECE750-TXB Rotation example Lecture 6: Lists and Trees
Todd L. Veldhuizen [email protected]
Linear Data Structures I Badly balanced binary tree: rebalance by rotating left at a, then left at c: Trees Bibliography a >> b > b ÐÐ > b >> a c Ò == c ?? a Ò ? d ? d ?? d ?? c e ?? e e ECE750-TXB Iterator for a binary search tree I Lecture 6: Lists and Trees
    class BSTIterator implements Iterator
    {
        Stack stack;

        public BSTIterator(BSTNode t)
        {
            stack = new Stack();
            fathom(t);
        }

        public boolean hasNext()
        {
            return !stack.empty();
        }

        public Object next()
        {
            BSTNode t = (BSTNode) stack.pop();
            if (t.right_child != null)
                fathom(t.right_child);
            return t;
        }

        // Push t and every node along its leftmost path onto the stack,
        // so the stack top is always the next node in inorder.
        void fathom(BSTNode t)
        {
            do {
                stack.push(t);
                t = t.left_child;
            } while (t != null);
        }
    }
Todd L. Veldhuizen [email protected]
Linear Data Structures [1] Luc Devroye. Trees
A note on the height of binary search trees. Bibliography Journal of the ACM (JACM), 33(3):489–498, 1986. bib pdf
[2] J. M. Lucas, D. R. van Baronaigien, and F. Ruskey. On rotations and the generation of binary trees. Journal of Algorithms, 15(3):343–366, November 1993. bib ps ECE750-TXB Lecture 7: Red-Black Trees, Heaps, and Treaps
ECE750-TXB Lecture 7: Red-Black Trees, Heaps, and Treaps
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering, University of Waterloo, Canada
February 14, 2007
Binary Search Trees

• Recall that in a binary tree of height h, the time required to find or insert an element is O(h).
• In the worst case h = n, the number of elements.
• To keep h ∈ O(log n), one needs a balancing strategy.
• Balancing strategies may be either:
  • randomized: e.g., a random insertion order results in expected height c log n with c ≈ 4.311;
  • deterministic (in the sense of not random).
• Today we will see an example of each:
  • red-black trees: deterministic balancing;
  • treaps: randomized. Treaps also demonstrate persistence and unique representation.
Todd L. Veldhuizen [email protected] I Red-black trees are a popular form of binary search tree with a deterministic balancing strategy. Red-Black Trees Heaps I Nodes are coloured red or black. Treaps
I Properties of the node-colouring ensure that the longest Bibliography path to a leaf is no more than twice the length of the shortest path.
I This ensures height of ≤ 2 log2(n + 1), which implies search, min, max in O(log n) worst-case time.
I Insert and Delete can also be performed in O(log n) worst-case time.
I Invented by Bayer [2], red-black formulation due to Guibas and Sedgewick [9]. Other sources: [5, 10].
Red-Black Trees: Invariants

• Balance invariants:
  1. No red node has a red child.
  2. For every node, every path from that node to a leaf contains the same number of black nodes.
Todd L. Veldhuizen [email protected]
Red-Black Trees
Heaps
Treaps
Bibliography
ECE750-TXB Red-Black Trees: Balance I Lecture 7: Red-Black Trees, Heaps, and Treaps
Todd L. Veldhuizen [email protected]
Red-Black Trees Let bh(x) be the number of black nodes along any path Heaps from a node x to a leaf, excluding the leaf. Treaps Bibliography Lemma The number of internal nodes in the subtree rooted at x is at least 2bh(x) − 1.
Proof. ECE750-TXB Red-Black Trees: Balance II Lecture 7: Red-Black Trees, Heaps, and Treaps
Todd L. Veldhuizen By induction on height: [email protected] 1. Base case: If x has height 0, then x is a leaf, and Red-Black Trees bh(x) = 0; the number of internal (non-leaf) Heaps bh(x) descendents of x is 0 = 2 − 1. Treaps 2. Induction step: assume the hypothesis is true for height Bibliography ≤ h. Consider a node of height h + 1. From invariant (2), the children have black height either bh(x) − 1 (if the child is black) or bh(x) (if the child is red). By induction hypothesis, each child subtree has at least 2bh(x)−1 − 1 internal nodes. The total number of internal nodes in the subtree rooted at x is therefore ≥ (2bh(x)−1 − 1) + 1 + (2bh(x)−1 − 1) = 2bh(x) − 1.
ECE750-TXB Red-Black Trees: Balance Lecture 7: Red-Black Trees, Heaps, and Treaps
Todd L. Veldhuizen [email protected]
Theorem Red-Black Trees A red-black tree with n internal nodes has height at most Heaps Treaps 2 log2(n + 1). Bibliography Proof. Let h be the tree height. From invariant 1 (a red node must have both children black), the black-height of the root must be ≥ h/2. Applying Lemma 1.1, the number of internal nodes n of the tree satisfies n ≥ 2h/2 − 1. Rearranging, h ≤ 2 log2(n + 1). ECE750-TXB Red-Black Trees: Balance Lecture 7: Red-Black Trees, Heaps, and Treaps
Todd L. Veldhuizen [email protected] I As with all non-randomized binary search trees, balance must be maintained when insert or delete operations are Red-Black Trees performed. Heaps Treaps
I These operations may disrupt the invariants, so Bibliography rotations and recolourings are needed to restore them.
I Insert for red-black tree: 1. Insert the new key as a red node, using the usual binary tree insert. 2. Perform restructurings and recolourings along the path from the newly added leaf to the root to restore invariants. 3. Root is always coloured black.
Red-Black Trees: Balance

I Four cases for red nodes with red children (case diagrams omitted; they are the four left/right arrangements of a red child under a red parent).
I Restructure/recolour to correct: each of the above cases becomes a configuration with no red-red violation (diagram omitted).

Red-Black Trees: Example
I Insertion of [1, 2, 3, 4, 5] into a red-black tree (figure omitted).
I Implementation of rebalancing is straightforward but a bit involved.
Heaps and Treaps

I Treaps are randomized search trees that combine TRees and hEAPs.
I First, let's look at heaps.
I Consider determining the maximum element of a set.
I We could iterate through the array and keep track of the maximum element seen so far. Time taken: Θ(n).
I We could build a binary tree (e.g. red-black). We can obtain the maximum (minimum) element in O(h) time by following rightmost (leftmost) branches. If the tree is balanced, this requires O(n log n) time to build the tree, and O(log n) time to retrieve the maximum element.
I A heap is a highly efficient data structure for maintaining the maximum element of a set. It is a rudimentary example of a dynamic algorithm/data structure.

Dynamic Algorithms

I A static problem is one where we are given an instance of a problem to solve, we solve it, and are done (e.g., sort an array).
I A dynamic problem is one where we are given a problem to solve, we solve it.
I Then the problem is changed slightly, and we re-solve it.
I ... ad infinitum.
I The challenge goes from solving a single instance of a problem to maintaining a solution as the problem is modified.
I It is usually more efficient to update the solution than recompute from scratch.
I e.g., binary search trees can be viewed as a method for dynamically maintaining an ordered list as elements are inserted and removed.
Heaps

I A heap dynamically maintains the maximum element in a collection (or, dually, the minimum element). A binary heap can:
  I obtain the maximum element in O(1) time;
  I remove the maximum element in O(log n) time;
  I insert a new element in O(log n) time.
I Heaps are a natural implementation of the PriorityQueue ADT.
I There are several flavours of heaps: binary heaps, binomial heaps, Fibonacci heaps, pairing heaps. The more sophisticated of these support merging (melding) two heaps.
I We will look at binary heaps.

Binary Heap Invariants

1. A binary heap is a complete binary tree of height h − 1, plus a possibly incomplete level of height h filled from left to right.
2. The key stored at each node is ≥ the key(s) stored in its children.

Binary Heap

I A binary heap may be stored as a (1-based) array, where
  I Parent(j) = ⌊j/2⌋
  I LeftChild(i) = 2i
  I RightChild(i) = 2i + 1
I e.g., [17, 11, 13, 9, 6, 2, 12, 4, 3, 1] is an array representation of the heap (figure omitted).

Heap operations
I To insert a key k into the heap (a code sketch follows this list):
  I Place k at the next available position.
  I Swap k with its parent(s) until the heap invariant is satisfied. (Takes O(log n) time.)
I The maximum element is just the key stored at the root, which can be read off in O(1) time.
I To delete the maximum element:
  I Place the key at the last heap position at the root (overwriting the current maximum), and decrease the size of the heap by one.
  I Choose the largest of the root and its two children, and make this the root; perform this procedure recursively until the heap invariant is satisfied.
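A minimal sketch of these operations over the 1-based array layout above, assuming integer keys; the class and method names are illustrative, not from the lecture.

    import java.util.ArrayList;

    class BinaryHeap {
        private final ArrayList<Integer> a = new ArrayList<>();
        BinaryHeap() { a.add(0); }                  // dummy slot 0: heap is 1-based

        int max() { return a.get(1); }              // O(1)

        void insert(int k) {                        // O(log n)
            a.add(k);                               // place k at the next free position
            int j = a.size() - 1;
            while (j > 1 && a.get(j / 2) < a.get(j)) {  // swap with parent until invariant holds
                swap(j, j / 2); j = j / 2;
            }
        }

        int deleteMax() {                           // O(log n)
            int max = a.get(1);
            a.set(1, a.get(a.size() - 1));          // move last key to the root
            a.remove(a.size() - 1);
            int j = 1, n = a.size() - 1;
            while (true) {                          // sift down: swap with the larger child
                int l = 2 * j, r = 2 * j + 1, big = j;
                if (l <= n && a.get(l) > a.get(big)) big = l;
                if (r <= n && a.get(r) > a.get(big)) big = r;
                if (big == j) break;
                swap(j, big); j = big;
            }
            return max;
        }

        private void swap(int i, int j) {
            int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
        }
    }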
Heap: insert example

I Example: insert 23 into the heap and restore the heap invariant (figure omitted).

Heap: delete-max example
I To delete the max element, move the element from the last position to the root;
I To restore heap invariant, swap root with the largest child greater than it, if any, and repeat down the heap.
Treaps

Treaps (binary TRee + hEAP) are:
I a randomized binary search tree
I with O(log n) average-case insert, delete, search
I with O(∆ log n) average-case union, intersection, ⊆, ⊇, where ∆ = |(A \ B) ∪ (B \ A)| is the size of the symmetric difference of the sets
I uniquely represented (to be explained)
I easily made persistent (to be explained)
I Due to Vuillemin [14] and, independently, Seidel and Aragon [11]. Additional references: [3, 16, 15].

Treaps: Basics

I Keys are assigned (randomly chosen) priorities.
I Two total orders on keys:
  I the usual key order;
  I a randomly chosen priority order, often obtained by assigning each key a random integer, or by using an appropriate hash function.
I Treaps are kept sorted by key in the usual way (inorder tree traversal visits keys in order).
I The heap property is maintained wrt the priority order.
Treap ordering

I Each node has a key k and priority p.
I Ordering invariants, for a node (k2, p2) with left child (k1, p1) and right child (k3, p3):

    k1 ≤ k2 ≤ k3          (key order)
    p2 ≥ p1 and p2 ≥ p3   (priority order)

I Every node has a higher priority than its descendents.

Treaps: Basics
I If priorities are chosen randomly, the tree is on average balanced, and insert, delete, and search take O(log n) time.
I Random priorities behave like a random insertion order: the structure of the treap is exactly that obtained by inserting the keys into a binary search tree in descending order of heap priority.
I If keys are unique (no duplicates), and priorities are unique, then the treap has the unique representation property
Unique representation

I Unique representation: each set is represented by a unique data structure [1, 13, 12].
I Most tree data structures do not have this property: depending on the order of inserts, deletes, etc., the tree can have different forms for the same set of keys.
I Recall there are Cn ∼ 4^n n^(−3/2) π^(−1/2) ways to place n keys in a binary search tree (the Catalan numbers), e.g. C20 = 6564120420.
I Deterministic (i.e., not randomized) uniquely represented search trees are known to require Ω(√n) worst-case time for insert, delete, and search [12].
I Treaps are randomized (not deterministic), and have O(log n) average-case time for insert, delete, search
I If you memoize or cache the constructors of a uniquely represented data structure, you can do equality testing in O(1) time by comparing pointers.

Treap: Example

    Treap A1 = R.insert("f");  // Insert the key f
    Treap A2 = A1.insert("u"); // Insert the key u

    Treap B1 = R.insert("u");  // Insert the key u into R
    Treap B2 = B1.insert("f"); // Insert the key f

By unique representation, A2 and B2 are the same treap, even though the keys were inserted in different orders.
Canonical forms
I The structure of the treap does not depend on the order in which the operations are carried out.
I Treaps give a canonical form for sets: if A, B are sets, we can determine whether A = B by constructing treaps containing the elements of A and B, and comparing them. If the treaps are the same, the sets are equal.
I Treaps give an easy decision procedure for equality of terms modulo associativity, commutativity, and idempotency.
I Treaps are very useful in program analysis (e.g., for compilers) for solving fixpoint equations on sets.
Persistent Data Structures

Literature: [7, 8, 4, 6]

I Partially persistent: can access previous versions of a data structure, but cannot derive new versions from them (read-only access to a linear past).
I Fully persistent: can make changes in previous versions of the data structure: versions can “fork.”
I Any linked data structure with constant bounded in-degree can be made fully persistent with amortized O(1) space and time overhead, and worst case O(1) overhead for access [7]
I Confluently persistent: can branch into two versions of the data structure, and later reconcile these branches.

The Version Graph

The version graph shows how versions of a data structure are derived from one another.
I Vertices: data structures.
I Edges: show how one data structure was derived from another.
I Treaps example: R branches to A1 and B1; A1 yields A2, and B1 yields B2.
Version graph

I Partial persistence: the version graph is a linear sequence of versions, each derived from the previous version.
I Partial/full persistence: get a version tree.
I Confluent persistence: get a version DAG (directed acyclic graph). (Figure omitted: a version DAG in which branches later merge.)

Purely Functional Data Structures
I Literature: [10]
I Functional data structures: one cannot modify a node of the data structure once it is created. (One implication: no cyclic data structures.)
I Functional data structures are by nature partially persistent: we can always hold onto pointers to old versions of the data structure.
Scopes

I Partial persistence is very useful for managing scopes in compilers and program analysis.
I A scope is a representation of the names that are visible at a given program point:

    int foo(int a, int b)
    {                                   // S1
        int x = a*a, y = b*b, z = 0;    // S2
        for (int k=0; k < x; ++k)       // S3
            for (int l=0; l < y; ++l)   // S4
                ++z;                    // S5
        return x;
    }
Bibliography

[1] A. Andersson and T. Ottmann. Faster uniquely represented dictionaries. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science, pages 642–649. IEEE Computer Society Press, 1991.
[2] Rudolf Bayer. Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Informatica, 1:290–306, 1972.
[3] Guy E. Blelloch and Margaret Reid-Miller. Fast set operations using treaps. In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 16–26, Puerto Vallarta, Mexico, June 1998.
[4] Adam L. Buchsbaum and Robert E. Tarjan. Confluently persistent deques via data-structural bootstrapping. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 155–164. ACM Press, 1993.
[5] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. McGraw-Hill, 1991.
[6] P. F. Dietz. Fully persistent arrays. In F. Dehne, J.-R. Sack, and N. Santoro, editors, Proceedings of the Workshop on Algorithms and Data Structures, volume 382 of LNCS, pages 67–74. Springer, Berlin, August 1989.
[7] James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, and Robert Endre Tarjan. Making data structures persistent. In ACM Symposium on Theory of Computing, pages 109–121, 1986.
[8] Amos Fiat and Haim Kaplan. Making data structures confluently persistent. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA-01), pages 537–546. ACM Press, January 2001.
[9] Leonidas J. Guibas and Robert Sedgewick. A dichromatic framework for balanced trees. In FOCS, pages 8–21. IEEE, 1978.
[10] Chris Okasaki. Purely Functional Data Structures. Cambridge University Press, Cambridge, UK, 1998.
[11] Raimund Seidel and Cecilia R. Aragon. Randomized search trees. Algorithmica, 16(4/5):464–497, 1996.
[12] Lawrence Snyder. On uniquely representable data structures. In 18th Annual Symposium on Foundations of Computer Science, pages 142–146. IEEE Computer Society Press, October 1977.
[13] R. Sundar and R. E. Tarjan. Unique binary search tree representations and equality-testing of sets and sequences. In Proceedings of the 22nd Annual ACM Symposium on the Theory of Computing, pages 18–25, Baltimore, MD, May 1990. ACM Press.
[14] Jean Vuillemin. A unifying look at data structures. Communications of the ACM, 23(4):229–239, 1980.
[15] M. A. Weiss. A note on construction of treaps and Cartesian trees. Information Processing Letters, 54(2):127, April 1995.
[16] Mark Allen Weiss. Linear-time construction of treaps and Cartesian trees. Information Processing Letters, 52(5):253–257, December 1994.

ECE750-TXB Lecture 8: Treaps, Tries, and Hash Tables
Todd L. Veldhuizen <[email protected]>
Electrical & Computer Engineering, University of Waterloo, Canada
February 1, 2007
Review: Treaps

I Recall that a binary search tree has keys drawn from a totally ordered structure ⟨K, ≤⟩.
I An inorder traversal of the tree recovers the keys in ascending order.

          d
        b   h
       a c f i

I Recall that a heap has priorities drawn from a totally ordered structure ⟨P, ≤⟩.
I The priority of a parent is ≥ that of its children (for a max heap).
I The largest priority is at the root.

          23
        11  14
       7 1  6 13
Review: Treaps

I In a treap, nodes contain a pair (k, p) where k ∈ K is a key, and p ∈ P is a priority.
I A treap is a mixture of a binary search tree and a heap:
  I a binary search tree with respect to keys;
  I a heap with respect to priorities.

              (d,23)
         (b,11)    (h,14)
      (a,7) (c,1) (f,6) (i,13)

Review: Unique Representation
I If the keys and priorities are unique, then treaps have the unique representation property: given a set of (k, p) pairs, there is only one way to build the tree.
I For the heap property to be satisfied, there is only one (k, p) pair that can be the root: the one with the highest priority.
I The left subtree of the root will contain all keys < k, and the right subtree of the root will contain all keys > k.
I Of the keys < k, the one with the highest priority must occupy the left child of the root. This then splits constructing the left subtree into two subproblems.
I etc.
Review: Unique Representation

I Example: to build a treap from {(i, 13), (c, 1), (d, 23), (b, 11), (h, 14), (a, 7), (f, 6)}, the unique choice of root is (d, 23):

                    (d, 23)
    {(c, 1), (b, 11), (a, 7)}   {(i, 13), (h, 14), (f, 6)}

I To build the left subtree, pick out the highest-priority element: (b, 11). And so forth:

              (d, 23)
        (b, 11)     {(i, 13), (h, 14), (f, 6)}
    (a, 7) (c, 1)

Review: Unique Representation
I Data structures with the unique representation property can be checked for equality in O(1) time by using caching (also known as memoization):
  I Implement the data structure in a purely functional style (a node's fields are never altered after construction; any change requires creating a new node).
I Maintain a map from (key, priority, lchild, rchild) tuples to already constructed nodes.
I Before constructing a node, check the cache to see if it already exists; if so, return the pointer to that node. Otherwise, construct the node and add it to the cache.
I If two treaps contain the same keys, their root pointers will be equal: can be checked in O(1) time.
I Checking and maintaining the cache requires additional time overhead.
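A minimal sketch of such a constructor cache ("hash-consing") in Java, assuming purely functional nodes with string keys and integer priorities; all names are illustrative. Because nodes are interned, children compare correctly by pointer, and two treaps over the same key set share one root.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    final class TNode {
        final String key; final int priority;
        final TNode left, right;                     // never mutated after construction
        private TNode(String k, int p, TNode l, TNode r) {
            key = k; priority = p; left = l; right = r;
        }

        private static final Map<List<Object>, TNode> CACHE = new HashMap<>();

        // Return the canonical node for (key, priority, lchild, rchild).
        static TNode make(String k, int p, TNode l, TNode r) {
            List<Object> id = Arrays.asList(k, p, l, r);   // children compare by pointer,
            return CACHE.computeIfAbsent(id,               // which interning makes sound
                    ignored -> new TNode(k, p, l, r));
        }
    }
    // Pointer equality now decides set equality: treapA == treapB iff same keys.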
Review: Balance of treaps

I Treaps are balanced if the priorities are chosen randomly.
I Recall that building a binary search tree with a random insertion order results in a tree of expected height c log n, with c ≈ 4.311.
I A treap with random priorities assigned to keys has exactly the same structure as a binary search tree created by inserting keys in descending order of priority.
I Descending order of priority is a random order;
I therefore treaps have expected height c log n with c ≈ 4.311.

Insertion into treaps
I Insertion for treaps is much simpler than for red-black trees (a code sketch follows this list):
  1. Insert the (k, p) pair as for a binary search tree, by key alone: the new node will be placed somewhere at the bottom of the tree.
  2. Perform rotations along the path from the new leaf to the root to restore the invariants:
     I If there is a node x whose right child has a higher priority, rotate left at x.
     I If there is a node x whose left child has a higher priority, rotate right at x.
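A minimal recursive sketch of this procedure in Java, assuming integer keys and priorities; names are illustrative, not the lecture's code.

    class Treap {
        static class Node {
            int key, priority;
            Node left, right;
            Node(int k, int p) { key = k; priority = p; }
        }

        static Node insert(Node t, int key, int priority) {
            if (t == null) return new Node(key, priority);   // BST insert at the bottom
            if (key < t.key) {
                t.left = insert(t.left, key, priority);
                if (t.left.priority > t.priority) t = rotateRight(t);  // fix heap order
            } else {
                t.right = insert(t.right, key, priority);
                if (t.right.priority > t.priority) t = rotateLeft(t);
            }
            return t;
        }

        static Node rotateRight(Node y) {   // left child rises
            Node x = y.left; y.left = x.right; x.right = y; return x;
        }
        static Node rotateLeft(Node x) {    // right child rises
            Node y = x.right; x.right = y.left; y.left = x; return y;
        }
    }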
Insertion into treaps

I Example: the treap below has just had (e, 19) inserted as a new leaf. Rotations have not yet been performed.

              (d,23)
         (b,11)    (h,14)
      (a,7) (c,1) (f,6) (i,13)
                 (e,19)

I f has a left child with greater priority: rotate right at f.

Insertion into treaps
I After rotating right at f:

              (d,23)
         (b,11)    (h,14)
      (a,7) (c,1) (e,19) (i,13)
                    (f,6)

I h has a left child with greater priority: rotate right at h.
Insertion into treaps

I After rotating right at h:

              (d,23)
         (b,11)    (e,19)
      (a,7) (c,1)      (h,14)
                   (f,6) (i,13)

I The heap invariant is satisfied: all done.

I Treaps are easily made persistent (retain previous versions) by implementing them in a purely functional style. Insertion requires duplicating at most a sequence of nodes from the root to a leaf: an O(log n) space overhead. The remaining parts of the tree are shared.
I E.g., the previous insert done in a purely functional style: Version 2 duplicates only the nodes along the path affected by the insert, and shares all remaining nodes with Version 1 (figure omitted).
Strings

I A string is a sequence of characters drawn from some alphabet Σ. We will often use Σ = {0, 1}: binary strings.
I We write Σ* to mean all finite strings¹ composed of characters from Σ. (* is the Kleene closure.)
I Σ* contains the empty string ε.
I If w, v ∈ Σ* are strings, we write w · v or just wv to mean the concatenation of w and v.
I Example: given w = 010 and v = 11, w · v = 01011.
I ⟨Σ*, ·, ε⟩ is an example of a monoid: a set (Σ*) together with an associative binary operator (·) and an identity element (ε). For any strings u, v, w ∈ Σ*,

    u · (v · w) = (u · v) · w
    ε · v = v · ε = v

¹Infinite strings are very useful also: if we write a real number x ∈ [0, 1] as a binary number, e.g. 0.101100101000···, this is a representation of x by an infinite string from Σω.

Tries

I Recall that we may label the left and right links of a binary tree with 0 (for left) and 1 (for right):

           ·
         0/ \1
         ·    x
            0/ \1
            y    z
I To describe a path in the tree, one can list the sequence of left/right branches to take from the root. E.g., 10 gives y, 11 gives z.
I The set of all paths from the root to leaves is P◦ = {0, 10, 11}
I The set of all paths from the root to leaves or internal nodes is P• = {ε, 0, 1, 10, 11}, where ε is the empty string indicating the path starting and ending at the root.
Tries

I The set P◦ is prefix-free: no string is an initial segment of any other string. Otherwise, there would be a path to a leaf passing through another leaf!
I The set P• is prefix-closed: if wv ∈ P•, then w ∈ P• also; i.e., P• contains all prefixes of all strings in P•.²

²We can define • as an operator by A• ≡ {w : wv ∈ A}. • is a closure operator. A useful fact: every closure operator has as its range a complete lattice, where meet and join are given by (X ⊓ Y)• = X• ∩ Y• and (X ⊔ Y)• = (X• ∪ Y•)•. Applying this fact to the representation of binary trees by strings, • induces a lattice of binary trees.

Tries

I Given a binary tree, we can produce a set of strings P• or P◦ that describe all paths (resp. all paths to leaves).
I The converse is also true: given a set P• or P◦, we can reproduce the tree.³
I Example: the set {100, 11, 001, 01} is prefix-free, and the corresponding tree can be built by simply adding the paths one-by-one to an initially empty tree:
(Figure omitted: the binary tree whose leaves sit at paths 001, 01, 100, and 11.)
³Formally we can say there is a bijection (a 1-1 correspondence) between binary trees and prefix-closed (resp. prefix-free) sets.
Tries

I A tree constructed in this way — by interpreting a set of strings as paths of the tree — is called a trie. (The term comes from reTRIEval; pronounced either “tree” or “try” depending on taste. Tries were invented by de la Briandais, and independently by Fredkin [5].)
I The most common use of a trie is to implement a Dictionary⟨K, V⟩, i.e., maintaining a map f : K ⇀ V by associating each k ∈ K with a path through the trie to a node where f(k) is stored.⁴
I Tries find applications in bioinformatics, coding and compression, sorting, SAT solving, routing, natural language processing, very large databases (VLDBs), data mining, etc.
I Binary Decision Diagrams (BDDs) are essentially tries with caching and sharing of subtrees.
I Recent survey by Flajolet [4].

⁴The notation K ⇀ V indicates a partial function from K to V: a function that might not be defined for some keys.

Trie example: word list

I Example: build a trie to store English words: trie, larch, saxophone, tried, saxifrage, squeak, try, squeak, squeaky, squeakily, squeakier.
I Common implementation variants of a trie (a sketch of the first variant appears after this list):
  I associate internal nodes with entries also, if one occurs there. (Can use 1 bit on internal nodes to indicate whether a key terminates there.)
I when a node has only one descendent, end the trie there, rather than including a possibly long chain of nodes with single children.
I Use the trie to store keys only; implicitly the values we are storing are V = {0, 1}. The function the trie represents is a map χ : K → {0, 1} where χ is the characteristic function of the set: χ(k) = 1 if and only if k is in the set.
  I Use the alphabet {a, b, ..., z}.
  I Instead of having a 26-way branch in each node, put a little BST at each node with up to 26 elements in it (a “ternary search trie” [1]).
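A minimal sketch of the first variant in Java: a 26-way trie over {a..z} with a terminates-here bit on each node. Names are illustrative, not from the lecture.

    class Trie {
        private static class Node {
            Node[] child = new Node[26];   // one branch per letter a..z
            boolean isWord;                // does a key terminate here?
        }
        private final Node root = new Node();

        void insert(String w) {            // assumes lowercase a..z only
            Node t = root;
            for (char c : w.toCharArray()) {
                int i = c - 'a';
                if (t.child[i] == null) t.child[i] = new Node();
                t = t.child[i];
            }
            t.isWord = true;
        }

        boolean contains(String w) {
            Node t = root;
            for (char c : w.toCharArray()) {
                t = t.child[c - 'a'];
                if (t == null) return false;
            }
            return t.isWord;
        }
    }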
Trie example: wordlist

(Figure omitted: the trie storing larch, saxifrage, saxophone, squeak, squeakier, squeakily, squeaky, trie, tried, try.)

Trie example: coding

I Suppose we want to transmit (or compress) data.
I At the receiving (or decoding) end, we will have a long string of bits to decode.
I A simple but effective strategy is to build a codebook that maps binary codewords to plaintext. The incoming transmission is then just a sequence of codewords that we will replace, one by one, with their corresponding plaintext.⁵
I A code that can be described by a trie, with outputs only at the leaves, is an example of a uniquely decodeable code: there is only one way an encoded message can be decoded. Specifically, such codes are called prefix codes or instantaneous codes.
⁵This strategy is asymptotically optimal (achieves a bitrate ≤ H + ε for any ε > 0) for stationary ergodic random processes, with an appropriate choice of codebook.
Trie example: coding

I Example: to encode English, we might assign codewords to sequences of three letters, giving the most frequent words shorter codes:

    Three-letter combination   Codeword
    the                        000
    and                        001
    for                        010
    are                        011
    but                        100
    not                        1010
    you                        1011
    all                        1100
    ...
    etc                        11101101
    ...
    qxw                        1111011001101001

I These codewords are chosen to be a prefix-free set.

Trie example: coding
I For decoding messages we build a trie (figure omitted: the code trie with leaves labelled the, and, for, are, but, not, you, all).
Trie example: decoding

I Incoming message: 100101001010111100
I To decode: start at the root of the trie, follow the path given by the bits. When a leaf is reached, output the word there, and return to the root.

    100   1010   010   1011   1100
    but   not    for   you    all
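A small Java sketch of this decoding loop, using the slide's (partial) codebook; a hash map of codeword-so-far stands in for walking the trie, which is equivalent because the code is prefix-free. Names are illustrative.

    import java.util.HashMap;
    import java.util.Map;

    class PrefixDecoder {
        private final Map<String, String> codebook = new HashMap<>();
        PrefixDecoder() {
            codebook.put("000", "the");  codebook.put("001", "and");
            codebook.put("010", "for");  codebook.put("011", "are");
            codebook.put("100", "but");  codebook.put("1010", "not");
            codebook.put("1011", "you"); codebook.put("1100", "all");
        }

        String decode(String bits) {
            StringBuilder out = new StringBuilder(), cw = new StringBuilder();
            for (char b : bits.toCharArray()) {
                cw.append(b);                        // follow one edge of the trie
                String word = codebook.get(cw.toString());
                if (word != null) {                  // reached a leaf: output, restart
                    out.append(word).append(' ');
                    cw.setLength(0);
                }
            }
            return out.toString().trim();
        }
    }
    // new PrefixDecoder().decode("100101001010111100") yields "but not for you all"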
I This requires substantially fewer bits than transmitting as ASCII text (24 bits per 3-letter sequence).
I A good code assigns short codewords to frequently-occurring strings; if a string occurs with probability pᵢ, one wants the codeword to have length about −log₂ pᵢ.
I Later in the course we shall see how such codes can be constructed optimally using a greedy algorithm.

Tries: Kraft's inequality
I Kraft's inequality is a simple constraint on the lengths of codewords in a prefix code (equivalently, leaf depths in a binary tree).

Theorem (Kraft). Let (d1, d2, ...) be a sequence of code lengths of a code. There is a prefix code with code lengths d1, d2, ... (equivalently, a binary tree with leaves at depths d1, d2, ...) if and only if

    Σ_{i=1}^{n} 2^(−dᵢ) ≤ 1    (1)
Tries: Kraft's inequality I

I Positive example: the codeword lengths 3, 3, 2, 2 satisfy Kraft's inequality: 1/8 + 1/8 + 1/4 + 1/4 = 3/4. A trie realization exists (figure omitted).
I Negative example: the codeword lengths 3, 3, 3, 2, 2, 2 violate Kraft's inequality: the sum is 9/8.
I Kraft's inequality becomes an equality for trees in which every internal node has two children.
(A small checker for these two examples follows.)
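A small Java sketch that evaluates the Kraft sum for a list of codeword lengths, reproducing the two examples above; names are illustrative.

    class Kraft {
        static double kraftSum(int... depths) {
            double s = 0;
            for (int d : depths) s += Math.pow(2, -d);   // sum of 2^(-d_i)
            return s;
        }
        public static void main(String[] args) {
            System.out.println(kraftSum(3, 3, 2, 2));        // 0.75  <= 1: realizable
            System.out.println(kraftSum(3, 3, 3, 2, 2, 2));  // 1.125 >  1: impossible
        }
    }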
Tries: Kraft's inequality

Two ways to prove Kraft's inequality:
I Put each node of a binary tree in correspondence with a subinterval of [0, 1] on the real line: the root is [0, 1]; its children get [0, 1/2] and [1/2, 1]. Each node at depth d receives an interval of length 2^(−d) and splits it in half for its children. The union of the intervals at the leaves is ⊆ [0, 1], and the intervals at the leaves are pairwise disjoint, so the sum of their interval lengths is ≤ 1.
I Kraft's inequality can also be proved with a simple induction argument. The list of valid codeword-length sequences can be generated from the initial sequence ⟨1, 1⟩ (codewords {0, 1}) by the rewrite rules k → k + 1, k + 1 (expand a node into two children) and k → k + 1 (expand a node to have a single child). Base case: with ⟨1, 1⟩, obviously 2^(−1) + 2^(−1) = 1. Induction step: if the sum is ≤ 1, consider expanding a single element of the sequence: we have either the rewrite k → k + 1, k + 1, and 2^(−k) ≥ 2^(−k−1) + 2^(−k−1); or the rewrite k → k + 1, and 2^(−k) ≥ 2^(−k−1). So rewrites never increase the “weight” of a node.
Tries: Kraft's inequality I

It is occasionally useful to have an infinite set of codewords handy, in case we do not know in advance how many different objects we might need to code. For an infinite set of codewords (or an infinite binary tree), Kraft's inequality implies

    d_k ≥ c + log⁺ k − log∗ k    infinitely often    (2)
where

    log⁺ x ≡ log x + log log x + log log log x + ···

with the sum taken only over the positive terms, and log∗ x is the “iterated logarithm”:

    log∗ x = 0 if x ≤ 1;  log∗ x = 1 + log∗(log x) otherwise.

Tries: Kraft's inequality II
See e.g. [2, 9]. Where does this bound come from? A necessary condition for

    Σ_{k} 2^(−d_k) ≤ 1

to hold is that the series Σ_k 2^(−d_k) converges. For example, if d_k = log k, then 2^(−d_k) = 1/k, the harmonic series. The harmonic series diverges, so Kraft's inequality can't hold. We can parlay this into an inequality by remembering the “comparison test” for convergence of series: if a_k, b_k are two positive series, and a_k ≤ b_k for all k, then Σ a_k ≤ Σ b_k. If we put the harmonic series in for a_k and 2^(−d_k) for b_k, we get:

    If 1/k ≤ 2^(−d_k) for all k, then Σ 2^(−d_k) = ∞.

The premiss of this test must be false if Σ 2^(−d_k) does not diverge to infinity. Therefore 2^(−d_k) must be < 1/k for at least some k. If 2^(−d_k) < 1/k for only some finite number of choices of k, the series would still diverge. So, a necessary condition for Σ 2^(−d_k) to converge is that 2^(−d_k) < 1/k
for infinitely many terms. Taking logarithms and multiplying through by −1, we get d_k > log k for infinitely many k. We can generalize this by saying that if g ∈ ω(1) is any diverging function, then d_k > −log g′(k) for infinitely many k. (The harmonic-series bound follows from choosing g(x) = log x.) Unfortunately there is no “slowest growing function” g(x) from which we could obtain a tightest possible bound. Eqn. (2) is from [2]; Bentley credits the result to Ronald Graham and Fan Chung, apparently unpublished.

Tries: Variations on a theme I
There are many useful variants of tries [4]:
I Multiway branching: instead of choosing Σ = {0, 1}, one can choose any finite alphabet, and allow each node to have |Σ| children.
I Paged trie: each node is required to have a minimal number of leaves descended from it; when this threshold is not met, the subtree is converted into a compact form (e.g., an array of keys and values) suitable for secondary storage. This technique can also be used to increase performance in main memory [6].
I Patricia tries [7] (“Practical Algorithm To Retrieve Information Coded In Alphanumeric”⁶) introduce skip pointers to avoid long sequences of single-branch nodes (figure omitted: a chain of single-child nodes labelled 0, 1, 1, 0).
Tries: Variations on a theme II

I LC-Trie: the first few levels of a big trie tend to form an almost-complete binary tree of some depth, which can be collapsed into an array of pointers to tries [8].
I Ternary Search Tries (TSTs): a blend of a trie and a BST; can require substantially less space than a trie. For a large |Σ|, replace a |Σ|-way branch at each internal node with a BST of depth ≤ log |Σ|.

⁶Almost better than my all-time favourite strained CS acronym, PERIDOT: “Programming by Example for Real-time Interface Design Obviating Typing.” Great project, despite the acronym.

Hash Tables

I Suppose we wanted to represent the following set:

    M = {35, 139, 395, 1691, 1760, 1795, 3632, 3789, 4657}

I Given some x ∈ ℕ, we want to quickly test whether x ∈ M.
I Binary search trees: require following a path through a tree — perhaps not fast enough for our problem.
I Super fast way: allocate an array of 4657 bytes. Set A[i] = 1 if i ∈ M, and A[i] = 0 if i ∉ M.
I Then, on a RAM, we can test whether x ∈ M with a single memory access to A[x] (a constant amount of time). However, the space required by this strategy is O(sup M).
Hash Tables

I Obviously the array A would contain mostly empty space. Can we somehow “compress” the array but still support fast access?
I Yes: allocate a much smaller table B of length k. Define a function h : [1, 4657] → [1, k] that maps indices of A to indices of B, can be computed quickly, and ensures that if x, y ∈ M and x ≠ y, then h(x) ≠ h(y), i.e., no two elements of M have the same index in B.
I Then, x ∈ M if and only if B[h(x)] = x.

Hash Tables
I For our example, h(x) = x mod 17 does the trick. Here is the array B:

    j   B[j]     j   B[j]     j   B[j]
    0   0        6   0        12  0
    1   35       7   0        13  0
    2   0        8   1691     14  0
    3   139      9   1760     15  3789
    4   395      10  1795     16  4657
    5   0        11  3632
I e.g.: x = 1691: h(x) = 8, and B[8] = 1691, so x ∈ M.
I e.g.: x = 1692: h(x) = 9, and B[9] = 1760 ≠ 1692, so x ∉ M.
I This is a hash table. h(x) = x mod 17 is called a hash function.
Hash Functions

I A hash function is a map h : K → H from some (usually large) key space K to some (usually small) set of hash values H. In our example, we were mapping from K = [1, 4657] to H = [1, 17].
I If the set M ⊆ K is chosen uniformly at random, keys are uniformly distributed (i.e., each k ∈ K has the same probability of appearing in a set to represent). In this case the hash function should distribute the keys evenly amongst elements of H, i.e., we want |h⁻¹(y)| ≈ |h⁻¹(z)| for y, z ∈ H.⁷
I For a nonuniform distribution on keys, one just wants to choose h so that the distribution induced on H is close to uniform.

⁷Recall that for a function f : R → S, f⁻¹(s) ≡ {r : f(r) = s}.

Hash Functions

I We will describe some hash functions where K = ℕ (keys are nonnegative integers). These are easily adapted to other kinds of keys (e.g., strings) by interpreting the binary representation of the key as an integer.
I Some commonly used hash functions are the following:
  1. Division: use h(k) = k mod m, where m = |H| is usually chosen to be a prime number far away from any power of 2.⁸
     I For long bit strings, use Horner's rule for evaluating polynomials in ℤ/mℤ (will explain).
  2. Multiplication: use h(k) = ⌊m{kφ}⌋, where 0 < φ < 1 is an irrational number and {x} ≡ x − ⌊x⌋. A popular choice of φ is φ = (√5 − 1)/2.

⁸A particularly terrible choice would be m = 256, which would hash objects based only on their lowest 8 bits; e.g., the hash of a string would depend only on its last character.
Multiplication hash functions: Example

Example of a multiplication hash function using φ = (√5 − 1)/2, and a hash table with m = 100 slots:

    key   {kφ}       ⌊m{kφ}⌋
    1     0.618034   61
    2     0.236068   23
    3     0.854102   85
    4     0.472136   47
    5     0.090170   9
    6     0.708204   70
    7     0.326238   32
    8     0.944272   94
    9     0.562306   56
    10    0.180340   18
    11    0.798374   79
    12    0.416408   41
    13    0.034442   3
    14    0.652476   65
    15    0.270510   27
    16    0.888544   88
    17    0.506578   50

The idea is that the third column (the hash slots) ‘looks like’ a random sequence.

Multiplication hash functions

I The reason why h(k) = ⌊m{kφ}⌋ is a reasonable hash function is interesting.
I The short answer is that the sequence {kφ} for k = 1, 2, 3, ... ‘kind of behaves like’ a random real drawn from (0, 1). So h(k) = ⌊m{kφ}⌋ ‘looks like’ a randomly chosen hash function.
I A less sketchy explanation:
  1. {kφ} is uniformly distributed on (0, 1): asymptotically, the proportion of {kφ} falling in an interval (α, β) ⊆ (0, 1) is (β − α). Just like a uniform distribution on (0, 1)!
  2. {kφ} satisfies an ergodic theorem: if we sample a suitably well-behaved⁹ function f at points {kφ} and average, this converges to the integral:

        (1/m) Σ_{k=1}^{m} f({kφ}) → ∫₀¹ f(x) dx

     Just like a uniform distribution on (0, 1)! See [3]. Variously called Weyl's ergodic principle or Weyl's equidistribution theorem. However, {kφ} is emphatically not a random sequence.

⁹Continuously differentiable and periodic with period 1.

Hash Functions

I To evaluate whether a hash function is a good choice for a set of data S ⊆ K, one can see how the observed distribution of keys into hash table slots compares to a uniform distribution.
I Suppose there are n keys and m hash slots. Compute the observed distribution of the keys:

    p̂ᵢ = |{k : h(k) = i}| / n

I To measure how far from uniform, compute

    D(P̂||U) = log₂ m + Σ_{i=1}^{m} p̂ᵢ log₂ p̂ᵢ
  Convention: 0 log₂ 0 = 0.
I This is the Kullback-Leibler divergence of the observed distribution P̂ from the uniform distribution U. It may be thought of as the “distance” from P̂ to U.
I The smaller D(P̂||U), the better the hash function. (A small sketch combining the multiplication hash with this measure follows.)
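A small Java sketch combining the multiplication hash above with the divergence measure; class and method names are illustrative, and Math.log is rescaled since Java has no log₂.

    class HashQuality {
        static final double PHI = (Math.sqrt(5) - 1) / 2;   // ~0.618034

        static int hash(int k, int m) {
            double frac = (k * PHI) % 1.0;                  // {k*phi}
            return (int) (m * frac);                        // floor(m * {k*phi})
        }

        // D(P-hat || U) = log2 m + sum p_i log2 p_i, with 0 log 0 = 0.
        static double divergenceFromUniform(int[] keys, int m) {
            double[] p = new double[m];
            for (int k : keys) p[hash(k, m)] += 1.0 / keys.length;
            double d = Math.log(m) / Math.log(2);
            for (double pi : p)
                if (pi > 0) d += pi * Math.log(pi) / Math.log(2);
            return d;                                       // smaller is better
        }

        public static void main(String[] args) {
            for (int k = 1; k <= 17; k++)                   // reproduces the table above
                System.out.println(k + " -> " + hash(k, 100));
            int[] keys = new int[1000];
            for (int i = 0; i < 1000; i++) keys[i] = i + 1;
            System.out.println(divergenceFromUniform(keys, 100));
        }
    }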
Bibliography

[1] Jon L. Bentley and Robert Sedgewick. Fast algorithms for sorting and searching strings. In SODA '97: Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 360–369. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.
[2] Jon Louis Bentley and Andrew Chi-Chih Yao. An almost optimal algorithm for unbounded searching. Information Processing Letters, 5(3):82–87, 1976.
[3] Bernard Chazelle. The Discrepancy Method — Randomness and Complexity. Cambridge University Press, Cambridge, 2000.
[4] Philippe Flajolet. The ubiquitous digital tree. In Bruno Durand and Wolfgang Thomas, editors, STACS, volume 3884 of Lecture Notes in Computer Science, pages 1–22. Springer, 2006.
[5] Edward Fredkin. Trie memory. Communications of the ACM, 3(9):490–499, 1960.
[6] Steffen Heinz, Justin Zobel, and Hugh E. Williams. Burst tries: a fast, efficient data structure for string keys. ACM Transactions on Information Systems, 20(2):192–223, 2002.
[7] Donald R. Morrison. PATRICIA—practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM, 15(4):514–534, 1968.
[8] Stefan Nilsson and Gunnar Karlsson. IP-address lookup using LC-tries. IEEE Journal on Selected Areas in Communications, 17:1083–1092, June 1999.
[9] Jorma Rissanen. Stochastic Complexity in Statistical Inquiry, volume 15 of Series in Computer Science. World Scientific, 1989.

ECE750-TXB Lecture 9: Hashing
Todd L. Veldhuizen <[email protected]>
Electrical & Computer Engineering, University of Waterloo, Canada
February 6, 2007
Hash tables

I Recall that a hash table consists of
  I m slots into which we are placing items;
  I a map h : K → [0, m − 1] from key values to slots.
I We put n keys k1, k2, ..., kn into locations h(k1), h(k2), ..., h(kn).
I In the ideal situation we can then locate keys with O(1) operations.

Horner's Rule I

I Horner's rule gives an efficient method for evaluating hash functions for sequences, e.g., strings.
I Consider a hash function of the form
h(k) = k mod m
I If we wish to hash a string such as “hello,” we can interpret it as a long binary number: in ASCII, “hello” is
    01101000 01100101 01101100 01101100 01101111
       h        e        l        l        o
I As a sequence of integers, “hello” is [104, 101, 108, 108, 111]. We want to compute
    (104 · 2³² + 101 · 2²⁴ + 108 · 2¹⁶ + 108 · 2⁸ + 111 · 2⁰) mod m
Horner's Rule II

I Horner's rule is a general trick for evaluating a polynomial. We write

    ax³ + bx² + cx + d = (ax² + bx + c)x + d
                       = ((ax + b)x + c)x + d

I So that instead of computing x³, x², ... we have only multiplications:
t1 = ax + b
t2 = t1x + c
t3 = t2x + d
I Trivia: some early CPUs included an instruction opcode for applying Horner's rule. May be making a comeback!

Horner's Rule III

I To use Horner's rule for hashing: to compute (a · 2²⁴ + b · 2¹⁶ + c · 2⁸ + d) mod m,

    t1 = (a · 2⁸ + b) mod m
    t2 = (t1 · 2⁸ + c) mod m
    t3 = (t2 · 2⁸ + d) mod m

I Note that multiplying by 2ᵏ is simply a shift by k bits. (A code sketch appears after the algebra discussion below.)
I Why this works. In short, algebra. The integers ℤ form a ring under multiplication and addition. The hash function h(k) = k mod m can be interpreted as a homomorphism from the ring ℤ of integers to the ring ℤ/mℤ of integers modulo m. Homomorphisms preserve structure in the following sense: if we write + for integer addition, and ⊕ for addition modulo m,
h(a + b) = h(a) ⊕ h(b)
i.e., it doesn’t matter whether we compute (a + b) mod m or compute (a mod m) and (b mod m) and add with modular
arithmetic: we get the same answer either way. Similarly, if we write × for multiplication in ℤ, and ⊗ for multiplication in ℤ/mℤ,

    h(a × b) = h(a) ⊗ h(b)

Horner's rule works precisely because h : ℤ → ℤ/mℤ is a homomorphism:

    h(((a × 2⁸ + b) × 2⁸ + c) × 2⁸ + d)
      = (((h(a) ⊗ h(2⁸) ⊕ h(b)) ⊗ h(2⁸) ⊕ h(c)) ⊗ h(2⁸) ⊕ h(d))

This can be optimized to use fewer applications of h, as above. In this form it is obvious why m = 2⁸ is a horrible choice for a hash table size: 2⁸ mod 2⁸ = 0, so

    (((h(a) ⊗ h(2⁸) ⊕ h(b)) ⊗ h(2⁸) ⊕ h(c)) ⊗ h(2⁸) ⊕ h(d))
      = (((h(a) ⊗ 0 ⊕ h(b)) ⊗ 0 ⊕ h(c)) ⊗ 0 ⊕ h(d))
      = h(d)

i.e., the hash value depends only on the last byte. Similarly, if we used m = 2¹⁶, we would have h(2¹⁶) = 0, which would remove all but the last two bytes from the hash value computation. For background on algebra see, e.g., [1, 9, 7].
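A small Java sketch of Horner's-rule hashing over byte strings; names are illustrative, the modulus 4093 is an arbitrary prime chosen for the example, and long arithmetic avoids overflow for moderate m.

    class HornerHash {
        // h = (h * 2^8 + next byte) mod m, one byte at a time (Horner's rule).
        static int hash(byte[] data, int m) {
            long h = 0;
            for (byte b : data) {
                h = ((h << 8) + (b & 0xFF)) % m;   // multiply by 2^8 is a shift
            }
            return (int) h;
        }
        public static void main(String[] args) {
            // Same value as reducing the full 40-bit number for "hello" mod m.
            System.out.println(hash("hello".getBytes(), 4093));
        }
    }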
Collisions

I A collision occurs when two keys map to the same location in the hash table, i.e., there are distinct x, y ∈ M such that h(x) = h(y).
I Strategies for handling collisions:
  1. Pick a value of m large enough that collisions are rare and can be easily dealt with, e.g., by maintaining a short “overflow” list of items whose hash slot is already occupied.
  2. Pick the hash function h to avoid collisions.
  3. Put another data structure in each hash table slot (a list, a tree, or another hash table).
  4. If a hash slot is full, try other slots in some fixed sequence (open addressing).
Collision Strategy 1: Pick m big I

I Let's see how big m must be for the probability of collisions to be small.
I Two cases:
  I n > m: there must be a collision, by the pigeonhole principle.¹
I n ≤ m: may or may not be a collision.
I The “birthday problem”: what is the probability that amongst n people, at least two share the same birthday?
I This is a hashing problem: people are keys, days of the year are slots, and h maps people to their birthdays.
I If n ≥ 23, then the probability of two people having the same birthday is > 1/2. (Counterintuitive, but true.)
I The “birthday problem” analysis is straightforward to adapt to hashing.

Collision Strategy 1: Pick m big II

I Suppose the hash function h and the distribution of keys cooperate to produce a uniform distribution of keys into hash table slots.
I Recall that with a uniform distribution, probability may be computed by simple counting:

    Pr(event E happens) = (# outcomes in which E happens) / (# outcomes)
I First we count the number of hash functions without collisions:
I There are m choices of where to put the first key; m − 1 choices of where to put the second key; ... m − n + 1 choices of where to put the nth key.
I The number of hash functions with no collisions is m^{\underline{n}} = m · (m − 1) ··· (m − n + 1) = m!/(m − n)!. (Note².)
I Next we count the number of hash functions allowing collisions:

Collision Strategy 1: Pick m big III

I There are m choices of where to put the first key; m choices of where to put the second key; ... m choices of where to put the nth key.
I The number of hash functions allowing collisions is mⁿ.
I The probability of a collision-free arrangement is

    p = m! / ((m − n)! · mⁿ)
I Asymptotic estimate of ln p, assuming n ≺ m:

    ln p ∼ −n²/(2m) + n/(2m) + O(n³/m²)    (1)

  Here we have used Stirling's approximation and ln(m − n) = ln m − n/m − O(n²/m²).
I Two cases: If n² ≺ m then ln p → 0. If n² ≻ m then ln p → −∞.

Collision Strategy 1: Pick m big IV

I Recall that if
    ln p = x + ε,

then

    p = e^(x+ε) = eˣ e^ε = eˣ (1 + ε + ε²/2 + ···)    (Taylor series)
      = eˣ (1 + O(ε))    if ε ∈ o(1)
I The probability of a collision-free arrangement is

    p ∼ e^(−n(n−1)/(2m)) + O((n³/m²) · e^(−n(n−1)/(2m)))
I Interpretation:
Collision Strategy 1: Pick m big V

I If m ∈ ω(n²) there are no collisions (almost surely).
I If m ∈ o(n²) there is a collision (almost surely).
I i.e., if we want a low probability of collisions, our hash table has to be quadratic (or more) in the number of items.

¹If m + 1 pigeons are placed in m pigeonholes, there must be two pigeons in the same hole. (Replace “pigeons” with “keys,” and “pigeonholes” with “hash slots.”)
²The handy notation m^{\underline{n}} is called a “falling power” [8].

Threshold functions

I m = (1/2)n² is an example of a threshold function:
  I ≺ the threshold, the asymptotic probability of the event is 0;
  I ≻ the threshold, the asymptotic probability of the event is 1.

(Figure omitted: probability of no collision vs. hash table size m, rising from 0 to 1 around the threshold m ≈ n²; ticks at n^(2−ε), n², n^(2+ε), n³.)
Collision Strategy 1: pick m big

I Picking m big is not an effective strategy for handling collisions.
I For n = 1000 elements, this table shows how big m must be to achieve the desired probability of no collisions:

    p      m
    0.1    5 000 000
    0.01   50 000 000
    10⁻⁶   500 000 000 000
    10⁻⁹   500 000 000 000 000

Collision Strategy 1: pick m big

I The analysis of collisions in hashing demonstrates two pigeonhole principles.
I The simplest pigeonhole principle states that if you put ≥ m + 1 pigeons in m holes, there must be one hole with ≥ 2 pigeons.
I With respect to hash tables, the pigeonhole applies as follows: If a hash table with m slots is used to store ≥ m + 1 elements, there is a collision.
I The probability-of-collision analysis of the previous slide demonstrates a probabilistic pigeonhole principle: if you √ put ω( n) pigeons in n holes, there is a hole with ≥ 2 pigeons almost surely (i.e., with probability converging to 1 as n → ∞.)
Collision Strategy 2: pick h carefully I

I Can we pick our hash function h to avoid collisions?
I For example, if we use hash functions of the form h(k) = ⌊m{kφ}⌋, we could try random values of φ ∈ (0, 1) until we found one that was collision-free.
I We have a probability of success

    p ∼ e^(−n(n−1)/(2m)) (1 + o(1))
I Geometric distribution:
  I Probability of success p, probability of failure 1 − p.
  I Each trial independent, identically distributed.
  I Probability that k tries are needed for success: (1 − p)^(k−1) p.
  I Mean: p⁻¹.

Collision Strategy 2: pick h carefully II
I Number of values of φ we expect to try before we find a collision-free hash table for n = 1000:

    m        Expected failures before success
    1000     10²¹⁷
    2000     10¹⁰⁹
    10000    10²²
    100000   147
I Picking hash functions randomly in this manner is unlikely to be practical.
I There are better strategies: see [6, 2].
Collision Strategy 3: secondary data structures I

I By far the most common technique for handling collisions is to put a secondary data structure in each hash table slot:
  I a linked list (“chaining”);
  I a binary search tree (BST);
  I another hash table.
I Let α = n/m be the load factor: the average number of items per hash table slot.
I Assuming uniform distribution of keys into slots:
I Linked lists require 1 + α steps (on average) to find a key;
I Suitable BSTs require 1 + max(c log α, 0) steps (on average).3
I Using secondary hash tables of size quadratic in the number of elements in the slot, one can achieve O(1) lookups on average, and require only Θ(n) space.

Collision Strategy 3: secondary data structures II

I Analysis of secondary hash tables:
  I Let Nᵢ be a random variable indicating the number of items landing in slot i.
  I E[Nᵢ] = α
  I Var[Nᵢ] = n · (1/m)(1 − 1/m)    (Bernoulli variance)
  I The space required for the secondary hash tables is proportional to

        E[Σ_{1≤i≤m} Nᵢ²] = Σ_{1≤i≤m} E[Nᵢ²] = Σ_{1≤i≤m} (Var[Nᵢ] + α²)
                        = m · (n · (1/m)(1 − 1/m) + n²/m²)
                        ∼ n²/m + n − n/m

    plus space Θ(m) for the primary hash table: Θ(m + n²/m + n) in total. Choosing m = Θ(n) yields linear space.

³The max(···) deals with the possibility that α < 1, in which case log α < 0.
Collision Strategy 4: open addressing I

I Open addressing is a family of techniques for resolving collisions that do not require secondary data structures. This has the advantage of not requiring any dynamic memory allocation.
I In the simplest scenario we have a function s : H → H that is ideally a permutation of the hash values, for example the “linear probing” function

    s(x) = (x + 1) mod m

I When we attempt to insert a key k, we look in slots h(k), s(h(k)), s(s(h(k))), etc. until an empty slot is found.
I To find a key k, we look in slots h(k), s(h(k)), s(s(h(k))), etc. until either k or an empty slot is found. (A sketch of linear probing follows below.)

Collision Strategy 4: open addressing II

I However, the use of permutations performs badly as the hash table becomes fuller: we tend to get “clumps/clusters,” i.e., long sequences h(k), s(h(k)), s(s(h(k))), ... where all the slots are occupied (see e.g. [10]).
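A minimal Java sketch of linear probing as just described, assuming nonzero integer keys with 0 as the empty marker and a table that never fills completely (α < 1); deletion is omitted and names are illustrative.

    class OpenAddressingTable {
        private final int[] slot;
        private final int m;

        OpenAddressingTable(int m) { this.m = m; slot = new int[m]; }

        private int h(int k) { return Math.floorMod(k, m); }

        void insert(int k) {                       // assumes the table is never full
            int i = h(k);
            while (slot[i] != 0 && slot[i] != k)   // probe until empty (or k found)
                i = (i + 1) % m;                   // s(x) = (x+1) mod m
            slot[i] = k;
        }

        boolean contains(int k) {
            int i = h(k);
            while (slot[i] != 0) {                 // empty slot reached: k is absent
                if (slot[i] == k) return true;
                i = (i + 1) % m;
            }
            return false;
        }
    }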
I Performance can be good for not very full tables, e.g. Bibliography 2 √ α < 3 . As α → 1 operations begin to take Θ( n) time [5]. I Quadratic probing offers less clumping: try slots h0(k), h1(k), ··· where
2 hi (k) = (h(k) + i ) mod m
h(k) is an initial fixed hash function. If m prime, the sequence hi (k) will visit every slot. I Double hashing uses two hash functions, h1 and h2:
hi (k) = (h1(k) + i · h2(k)) mod m
h_1(k) gives an initial slot to try; h_2(k) gives a ‘stride’ (this reduces to linear probing when h_2(k) = 1).
ECE750-TXB Collision Strategy 4: open addressing III
I Under favourable conditions, an open addressing scheme behaves like a geometric distribution when searching for an open slot: the probability of finding an empty slot is 1 − α, so the expected number of trials is 1/(1 − α). Note the catastrophe as α → 1.
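I A minimal sketch of open addressing with linear probing, assuming Python; it assumes the table is never allowed to fill completely (α < 1), since insert loops until an empty slot is found:

    # Collision strategy 4: open addressing with s(x) = (x + 1) mod m.
    class LinearProbingTable:
        def __init__(self, m=1024):
            self.m = m
            self.keys = [None] * m
            self.vals = [None] * m

        def insert(self, key, value):
            i = hash(key) % self.m
            while self.keys[i] is not None and self.keys[i] != key:
                i = (i + 1) % self.m      # probe h(k), s(h(k)), s(s(h(k))), ...
            self.keys[i], self.vals[i] = key, value

        def search(self, key):
            i = hash(key) % self.m
            while self.keys[i] is not None:   # an empty slot ends the search
                if self.keys[i] == key:
                    return self.vals[i]
                i = (i + 1) % self.m
            return None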
Strategy                E[access time]          Space
Choose m big            O(1)                    Ω(n²)
Linked list             1 + α                   O(n + m)
Binary search tree      1 + max(c log α, 0)     O(n + m)
Secondary hash tables   O(1)                    O(n + m)
Open addressing         1/(1 − α)               O(m)
I Open addressing can be quite effective if α ≪ 1, but fails catastrophically as α → 1.
ECE750-TXB Summary of collision strategies
I If unexpectedly n ≫ m (e.g. we have far more data than we designed for), then α → ∞. For example, if m ∈ O(1) and n ∈ ω(1):
I Linked list has O(n) accesses;
I BSTs have O(log n) accesses, offering a gentler failure mode.
I If the hash function is badly nonuniform:
I Linked list can be O(n);
I BST will have O(log n);
I Secondary hash tables may require O(n²) space.
I To summarize: hash table + BST will give fast search times, and let you sleep at night.
I To maintain O(1) access times as n → ∞, it is necessary to keep m proportional to n. This can be done by choosing an allowable interval α ∈ [c1, c2]; when α > c2, resize the hash table to make α = c1. So long as c2 > c1, this strategy adds O(1) amortized time per insertion, as in dynamic arrays.
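I A sketch of this resizing policy, assuming Python and the chained table sketched earlier; the constants c1, c2 are illustrative:

    # Keep alpha in [C1, C2] by rebuilding when it grows past C2.
    C1, C2 = 0.5, 2.0

    def insert_with_resize(table, key, value):
        table.insert(key, value)
        if table.n / table.m > C2:            # alpha exceeded c2: rebuild
            old = [kv for bucket in table.slots for kv in bucket]
            table.__init__(m=max(1, int(len(old) / C1)))  # new alpha ~= c1
            for k, v in old:
                table.insert(k, v)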
I Hashing is a ubiquitous concept, used not just for maintaining collections but also for
I cryptography
I combinatorics
I data mining
I computational geometry
I databases
I router traffic analysis
I An example: probabilistic counting
ECE750-TXB Probabilistic Counting I Lecture 9: Hashing Todd L. Veldhuizen [email protected]
I Problem: estimate the number of unique elements in a LARGE collection (e.g., a database, a data stream) without requiring much working space
I Useful for query optimization in databases [11]:
I e.g. to evaluate A ∩ B ∩ C can do either A ∩ (B ∩ C) or (A ∩ B) ∩ C
I one of these might be very fast, one very slow.
I have rough estimates of |B ∩ C| vs |A ∩ B| to decide which strategy will be faster. ECE750-TXB Probabilistic Counting I Lecture 9: Hashing Todd L. Veldhuizen [email protected] I Less serious (but more readily understood) example: Outline I Shakespeare’s complete works: Hashing I N=884,647 words (or so) Bibliography I n=28,239 unique words (or so)
I w = average word length
I Nmax = a prior upper bound on n
I Problem: estimate n, the number of unique words used. Approaches:
1. Sorting: put all 884,647 words in a list and sort, then count. (Time O(Nw log N), space O(Nw).)
2. Trie: scan through the words and build a trie, with counters at each node; requires O(nw) space (neglecting the size of counters).
3. Super-LogLog probabilistic counting [3]: use 128 bytes of space, obtain an estimate of 30897 words (error 9.4%).
ECE750-TXB Probabilistic Counting I Lecture 9: Hashing Todd L. Veldhuizen [email protected] I Inputs: a multiset A of elements, possibly with many duplicates (e.g., Shakespeare’s plays) Outline Hashing I Problem: estimate card(A): the number of unique elements in A (e.g., number of distinct words Bibliography Shakespeare used)
I Simple starting idea: hash the objects into an m-element hash table. Instead of storing keys, just count the number of elements landing in each hash slot.
I Extreme cases to illustrate the principle:
I Elements of A are all different: will get an even distribution in the hash table.
I Elements of A are all the same: will get one hash table slot with all the elements!
I The shape of the hash table distribution reflects the frequency of duplicates. ECE750-TXB Probabilistic Counting Lecture 9: Hashing Todd L. Veldhuizen [email protected]
I Linear Counting [11]
I Compute hash values in the range [0, Nmax).
I Maintain a bitmap representing which slots of the hash table would be occupied, and estimate n from the sparsity of the hash table.
I Uses Θ(Nmax) bits, i.e., on the order of card(A) bits.
I Room for improvement: the precise sparsity pattern doesn’t matter: just the number of full vs. empty slots.
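I A sketch of linear counting, assuming Python; the estimator n̂ = −m ln(V), where V is the observed fraction of empty slots, is the one from [11] (E[V] ≈ e^(−n/m)):

    import math

    def linear_count(items, m):
        bitmap = [False] * m
        for x in items:
            bitmap[hash(x) % m] = True
        v = bitmap.count(False) / m       # fraction of empty slots
        if v == 0:                        # bitmap saturated: estimate breaks down
            return float('inf')
        return -m * math.log(v)

    # e.g. linear_count(words, m) with m on the order of the expected cardinality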
ECE750-TXB Probabilistic Counting
I Probabilistic Counting [4]
I Compute hash values in the range [0, Nmax).
I Instead of counting hash values directly, count the occurrence of hash values matching certain patterns:
Pattern     Expected occurrences
xxxxxxx1    2^−1 · card(A)
xxxxxx10    2^−2 · card(A)
xxxxx100    2^−3 · card(A)
xxxx1000    2^−4 · card(A)
...         ...
Use these counts to estimate card(A).
I To improve accuracy, use m different hash functions.
I Uses Θ(m log Nmax) storage, and delivers accuracy of O(m^−1/2).
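I A sketch of the single-hash-function version, assuming Python; recording the position of the lowest-order 1 bit of each hash value implements the pattern counts above, and the correction constant 0.77351 is the one from [4]:

    def fm_estimate(items, bits=32):
        bitmap = 0
        for x in items:
            h = hash(x) & ((1 << bits) - 1)
            # rho = index of the lowest-order 1 bit of h
            rho = (h & -h).bit_length() - 1 if h else bits - 1
            bitmap |= 1 << rho
        r = 0
        while bitmap & (1 << r):          # lowest position never observed
            r += 1
        return (2 ** r) / 0.77351         # estimate of card(A)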
I Super-LogLog [3] requires Θ(log log Nmax) bits per register. With 1.28 kB of memory it can estimate card(A) to within an accuracy of 2.5% for Nmax ≤ 130 million.
I Probabilistic counters: count to N using log log N bits:
[Diagram: a probabilistic counter as a chain of states holding estimates 1, 3, 7, 15, ...; each transition to the next state is taken with probability 1/2, 1/4, 1/8, 1/16, ... respectively.]
Need log N states, which can be encoded in log log N bits.
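I A sketch of such a probabilistic counter (a Morris-style counter, a name not used in the lecture), assuming Python: store only the exponent c, increment it with probability 2^−c, and estimate the count as 2^c − 1 (which has expectation equal to the true count):

    import random

    class ProbabilisticCounter:
        def __init__(self):
            self.c = 0                    # only log log N bits of state

        def increment(self):
            if random.random() < 2.0 ** (-self.c):
                self.c += 1

        def estimate(self):
            return 2 ** self.c - 1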
ECE750-TXB Bibliography, Lecture 9: Hashing
[1] Stanley Burris and H. P. Sankappanavar. A Course in Universal Algebra. Springer-Verlag, 1981.
[2] Martin Dietzfelbinger, Anna Karlin, Kurt Mehlhorn, and Friedhelm Meyer auf der Heide. Dynamic perfect hashing: Upper and lower bounds. SIAM J. Comput., 23(4):738–761, 1994.
[3] Marianne Durand and Philippe Flajolet. Loglog counting of large cardinalities (extended abstract). In Giuseppe Di Battista and Uri Zwick, editors, ESA, volume 2832 of Lecture Notes in Computer Science, pages 605–617. Springer, 2003.
[4] Philippe Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182–209, September 1985.
[5] Philippe Flajolet, Patricio V. Poblete, and Alfredo Viola. On the analysis of linear probing hashing. Algorithmica, 22(4):490–515, 1998.
[6] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. J. ACM, 31(3):538–544, 1984.
[7] Joseph A. Gallian. Contemporary Abstract Algebra. D. C. Heath and Company, Toronto, 3rd edition, 1994.
[8] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics: A Foundation for Computer Science. Addison-Wesley, Reading, MA, USA, second edition, 1994.
[9] Saunders MacLane and Garrett Birkhoff. Algebra. Chelsea Publishing Co., New York, third edition, 1988.
[10] Robert Sedgewick and Philippe Flajolet. An Introduction to the Analysis of Algorithms. Addison-Wesley, Reading, MA, 1996.
[11] Kyu-Young Whang, Brad T. Vander-Zanden, and Howard M. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. Database Syst., 15(2):208–229, 1990.
ECE750-TXB Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
Todd L. Veldhuizen [email protected]
Electrical & Computer Engineering University of Waterloo Canada
March 1, 2007
Part I
Design Tradeoffs
ECE750-TXB Theme: Design Tradeoffs
I Tradeoffs between design parameters: a recurring theme in algorithms & data structures.
I Examples:
I By making a hash table bigger, we can decrease α (the load factor) and achieve faster search times. (A tradeoff between space and time.)
I In designing circuits to add n-bit integers, we can obtain very low delays (the maximum number of gates between inputs and outputs) by increasing the number of gates: trading time (delay) for area (number of gates)
I In many tasks we can trade the precision of an answer for time and space, e.g., responding quickly to database queries with an estimate of the answer, rather than the exact answer.
ECE750-TXB Theme: Design Tradeoffs Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
I Design tradeoffs are often parameterizable.
I For example, in speed/accuracy tradeoffs we don’t usually have to choose either speed or accuracy. Instead we have a parameter ε, the allowable error, that we can adjust.
I With large ε we get fast (but possibly not very accurate) answers.
I As ε → 0 we get very accurate answers that take longer to compute.
I Let’s look at an example of a tradeoff in the design of data structures.
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs
I Consider representing a collection of n keys drawn from an ordered structure ⟨K, ≤⟩.
I A (balanced) binary search tree (BST) has Θ(log n) search times.
I A hash table has Θ(1) search (if we keep the size proportional to the number of elements, and choose an appropriate hash function).
I Difference between these two data structures:
I A BST allows us to iterate through the elements in order, using Θ(log n) working space. The Θ(log n) space is used to record the path from the root to the iterator position in a stack.
I Items in a hash table are not stored in order — if we want to iterate through them in order, we need extra space and time, e.g. Θ(n) space for a temporary array and Θ(n log n) time to sort the items.
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs
I We can view BSTs and hash tables as two points in a design space:

Data structure       Search time   Working space for ordered iteration
Hash table           Θ(1)          Θ(n)
Binary search tree   Θ(log n)      Θ(log n)
I Suppose:
I We have a very large (n = 10⁹) collection of keys that barely fits in memory
I Dynamic: keys added and removed frequently. I We need fast search, fast insert, fast remove.
I Red-black: height is ≈ 2 log n ≈ 61 levels
I We need to be able to iterate through the collection in order.
I There is not enough room in memory to create a temporary array for sorting; also, this would be prohibitively slow.
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs
I Let’s make a simple data structure that will offer a smoother tradeoff between search time and the working space required for an ordered iteration.
I If you think of BST + hash table as two points in a design space, we want a structure that will ‘interpolate’ smoothly between them.
[Figure: search time (1 up to c log n) plotted against working space for ordered iteration (log n up to n), with Hash Table and Binary Search Tree as the two endpoints.]
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs I Lecture 10: Design Tradeoffs, Introduction to Average-Case I Consider a hash table of m slots, using a BST in each Analysis
slot to resolve collisions:
[Figure: a hash table whose slots each point to a small binary search tree.]
I Observation:
I When m = 1 we have a degenerate hash table with a single slot.
I All the keys are put in a single BST.
I So, choosing m = 1 essentially gives us a BST: we can iterate through the keys in order, search requires c log n steps, where c reflects the average tree depth. ECE750-TXB Design Tradeoff: Hash tables vs. BSTs II Lecture 10: Design Tradeoffs, Introduction to What about the case m = 2? Average-Case I Analysis I We have a hash table with two slots. If hash function is Todd L. good, we get two BSTs of roughly n/2 keys apiece. Veldhuizen [email protected] I Search time is about c log(n/2). I Can we iterate through the keys in order?
I Yes: have two iterators, one for each tree. Initially the two iterators point at the smallest key in their tree. I At each step of the iteration, choose the iterator that is pointing at the smaller of the two keys. Retrieve that key, and advance the iterator.
I Generalize: if we choose an arbitrary m,
I We will have m BSTs of average size n/m
I Search times will be around c log(n/m), assuming m ≤ n.
I To iterate through the keys in order,
I Obtain m iterators, one for each tree. I At each step, choose the iterator pointing at the smallest key, retrieve that key and advance the iterator.
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs III Lecture 10: Design Tradeoffs, Introduction to I To do this efficiently, we need a fast way to maintain a Average-Case collection (of iterators) that lets us quickly obtain the Analysis one with the smallest value (the iterator pointing at the Todd L. Veldhuizen smallest key) [email protected]
I Easy: a min-heap.
I Our algorithm for ordered iteration will look like this (see the sketch after this analysis):
1. Create an array of m BST iterators, one for each hash table slot.
2. Turn this array into a min-heap, ordering iterators by the key they are pointing at. (The heap can be built in O(m) time.)
3. To obtain the next element,
3.1 Remove the least element from the min-heap. (This takes O(log m) time.)
3.2 Obtain its key, and advance the iterator. (Advancing a BST iterator requires O(1) amortized time.)
3.3 Put the iterator back into the min-heap. (This takes O(log m) time.)
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs
I We can iterate through the keys in order in time O(n(1 + log m)):
I O(m) time to obtain the iterators and build the heap;
I O(1 + log m) time per key to adjust the heap, times n keys = O(n log m). (The 1 + ··· handles the case m = 1.)
I Overall, O(n(1 + log m)) time, assuming m ≤ n.
I The space required for iterating through the keys in order is O(m(1 + log(n/m))):
I We need m iterators, one per hash table slot. I Each iterator requires space O(1 + log(n/m)), on average, for a stack recording its position in the tree. (The 1 + ··· handles the case where n = m.)
I The number of steps for searching is on average 1 + c log(n/m), where c is a constant depending on the kind of BST we choose. The constant 1 is added to reflect visiting the correct slot in the hash table; and to handle the case where m = n, in which case c log(n/m) = 0, and having 0 search steps doesn’t make sense.
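I A minimal sketch of this ordered iteration, assuming Python; heapq stands in for the min-heap, and any iterators that produce each slot’s keys in sorted order (e.g., in-order BST traversals) may be passed in:

    import heapq

    def ordered_iteration(slot_iterators):
        heap = []
        for i, it in enumerate(slot_iterators):   # O(m) to build the heap
            first = next(it, None)
            if first is not None:
                heap.append((first, i, it))       # i breaks ties between iterators
        heapq.heapify(heap)
        while heap:
            key, i, it = heapq.heappop(heap)      # iterator with the smallest key
            yield key
            nxt = next(it, None)                  # advance that iterator
            if nxt is not None:
                heapq.heappush(heap, (nxt, i, it))  # O(log m) to reinsert

    # e.g. ordered_iteration([iter(sorted(bucket)) for bucket in table.slots])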
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs
I Looking at these complexities, a sensible parameterization is m = n^(1−β).
I When β = 0, m = n and we get a hash table;
I When β = 1, m = 1 and we get a BST.
I Space and time:
I Number of search steps is ≈ 1 + c log(n/m) = 1 + c log(n/n^(1−β)) = 1 + c log n^β = 1 + βc log n.
I β directly multiplies our search time: choosing β = 1/2 halves our search time.
I Working space for ordered iteration is O(m(1 + log(n/m))) = O(n^(1−β)(1 + log n^β)).
I E.g., if we choose β = 1/2 we are twice as fast as a BST for searching, and require O(√n log n) working space for ordered iteration.
I The amount of extra space we need for ordered iteration, relative to the space needed to store the keys, is ≈ n^(1−β)(1 + β log n)/n = n^(−β)(1 + β log n). NB: if β > 0 the relative space overhead for supplying ordered iteration → 0.
ECE750-TXB Design Tradeoff: Hash tables vs. BSTs
I Let’s look at some real-life numbers. Take n = 10⁹ keys.
I Assume we use red-black trees, so that the average depth of keys in a tree of n/m elements is ≤ 2 log(n/m).

β (parameter)   #Search steps    Space for iter.         Space overhead
                1 + 2β log n     n^(1−β)(1 + β log n)    n^(−β)(1 + β log n)
(Hash)  0       1                1000000000              100%
        1/8     8.5              355237568               35%
        1/4     16.0             47654705                4.7%
        1/2     31.9             504341                  0.05%
        3/4     45.8             4165                    0.0004%
        7/8     53.3             362                     0.00004%
(BST)   1       60.8             31                      0.000003%
I e.g. Choosing β = 1/4, we can get searches 4 times faster than the plain red-black tree, and have only a 4.7% space overhead for ordered iteration.
I Choosing β = 1/2, we can get searches twice as fast as a plain red-black tree, with a 0.05% space overhead for ordered iteration.
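I The table’s numbers can be reproduced with a few lines; a sketch, assuming Python and logarithms base 2:

    import math

    n = 10**9
    for beta in [0, 1/8, 1/4, 1/2, 3/4, 7/8, 1]:
        steps = 1 + 2 * beta * math.log2(n)                # 1 + 2 beta log n
        space = n**(1 - beta) * (1 + beta * math.log2(n))  # iterator stacks
        print(f"beta={beta}: {steps:.1f} steps, "
              f"space {space:.0f} ({100 * space / n:.5f}% overhead)")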
ECE750-TXB Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
Todd L. Veldhuizen [email protected]
Part II
Introduction to Average-Case Analysis
ECE750-TXB Average-case Analysis
Todd L. Veldhuizen I Worst-case analysis is very important for some [email protected] applications (e.g., real-time systems: worst-case execution time), and in theoretical computer science. Bibliography
I However, average-case performance is usually more important for practical engineering work.
I Given a choice between a data structure that always finds an item in 253 steps, vs one that finds an item in 5 steps on average (but with probability 10−12 takes more than 10000 steps), we would usually choose the fast-on-average data structure.
I In practice we are usually interested in performance for the average case, rather than worst case.
ECE750-TXB Average-case Analysis Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
I We can often find algorithms + data structures that are Todd L. Veldhuizen much more efficient on average than the best [email protected] worst-case data structures. Bibliography I We shall see that randomness can be an extremely effective tool for achieving good average-case performance.
I Example: uniquely represented dictionaries.
I It is known that deterministic (no randomness) tree-based uniquely represented dictionaries require Ω(√n) time for insert/search operations
I Sundar-Tarjan trees [3] achieve this bound. I Treaps are uniquely represented and on average achieve O(log n) search and insert. However, with vanishingly small probability they may require O(n) time. ECE750-TXB Average-case Analysis Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis I Example: QuickSort Todd L. I QuickSort is the standard sorting algorithm. It achieves Veldhuizen O(n log n) time on average, and is faster in practice [email protected] than other algorithms. However, in the worst case it Bibliography requires O(n2) time.
I Merge Sort requires O(n log n) time in the worst case, but is slower in practice than QuickSort.
I Example: Searching
I Binary search requires O(log n) time in the worst case. I Interpolation search [1] requires O(log log n) time on average, if the data is uniformly distributed. However, it can require O(n) time with pathological distributions.
I The function log log n is so slowly growing as to be effectively constant: log log 10^10 ≈ 5; log log 10^20 ≈ 6; log log 10^40 ≈ 7.
I We can often use hashing to make an arbitrary key distribution uniform.
ECE750-TXB Some key ideas we will explore I Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis 1. We can always obtain an algorithm that combines the Todd L. best average case and best worst-case bounds of any Veldhuizen [email protected] algorithms we have available. (We needn’t settle for “fast on average, but occasionally catastrophically Bibliography slow.”) 2. If an algorithm has O(f (n)) average time, the probability of taking ω(f (n)) time goes to zero. 3. The amount of randomness, or entropy, of an input distribution plays a critical role in the performance of algorithms.
I Randomness can help average-case performance: if there are comparatively few “worst cases,” then as long as we have at least a certain amount of randomness in the inputs, those worst cases do not contribute to the average running time. ECE750-TXB Some key ideas we will explore II Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
Todd L. Veldhuizen [email protected]
Bibliography I Randomness can hurt average-case performance: we can design BSTs so that search time depends not on the number of keys n, but just on the amount of randomness in the distribution of keys we are asked to search for. In this case, the less randomness, the better!
ECE750-TXB Best of both worlds Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
Todd L. I Algorithms that are good in the average-case Veldhuizen occasionally break down on pathological examples, e.g., [email protected] 2 that make QuickSort run in O(n ) time, cause search Bibliography trees of height O(n), etc.
I Suppose we have a pair of algorithms:
I Algorithm A has average-case time Θ(f(n)) and worst case Θ(g(n))
I Algorithm B has average-case time Θ(f′(n)) and worst case Θ(g′(n))
I We can always construct an algorithm that has the best of both:
I average-case time Θ(min(f(n), f′(n))) and
I worst-case time Θ(min(g(n), g′(n))).
[Figure: the input is fed to Algorithm A and Algorithm B in parallel; the result is taken from the first finisher.]
I E.g. we can simultaneously perform a binary search and an interpolation search: this gives an algorithm with O(log log n) average case, and O(log n) worst case [2].
I However, note that this may entail larger constant factors and extra space compared to using a single algorithm.
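I A sketch of the construction, assuming Python generators that yield once per unit of work and return their answer when done; the helper name is illustrative:

    def first_finisher(gen_a, gen_b):
        # Interleave the two algorithms one step at a time; return the
        # result of whichever finishes first.
        while True:
            for g in (gen_a, gen_b):
                try:
                    next(g)                   # one step of this algorithm
                except StopIteration as done:
                    return done.value         # its `return` value (Python 3)

    # e.g. first_finisher(binary_search_steps(a, x),
    #                     interpolation_search_steps(a, x))
    # where both arguments are stepwise generator versions of the searches.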
ECE750-TXB Applications of randomness Lecture 10: Design Tradeoffs, Here are a few of the applications of randomness we will Introduction to I Average-Case encounter: Analysis I Modelling the distribution of inputs, so we can: Todd L. Veldhuizen I compute average-case performance; [email protected] I determine the structure of “typical” inputs, so we can tune our algorithm to them; Bibliography I Exploiting the amount of randomness (entropy) of inputs to achieve better performance;
I Using randomness to force some quantity of interest into a desirable distribution (e.g., height of a treap);
I Using randomness to foil an adversary: e.g. our algorithm/data structure could perform poorly only if the entity generating the queries could predict some random sequence we were using
I Using randomness to break symmetry, e.g., leader election in distributed systems;
I Using randomness to efficiently approximate answers in extremely small amounts of time or space.
I Random distributions of inputs can cause complexity classes to collapse, so that problems that were hard to solve efficiently suddenly become efficiently solvable on average.
Todd L. Veldhuizen I We can use tools of average-case analysis to [email protected] characterize the “typical case” for which we should tune our algorithms. Bibliography
I Simple example: find the first nonzero bit in a binary string of n bits.1
I Simple strategy: scan the string from left to right, stop when a 1 is encountered.
I Clearly has worst case O(n). I What does a typical input look like?
I With a uniform distribution on n-bit strings, each bit is 0 or 1 with probability 1/2.
ECE750-TXB Typical inputs II
I Waiting time to encounter a 1 follows a geometric distribution with probability of success p = 1/2.

Bitstring pattern   Probability
1xxxxx ···          1/2
01xxxx ···          1/4
001xxx ···          1/8
...                 ...

I Mean of a geometric distribution is 1/p = 2.
I Conclusion: on average we encounter a 1 after 1/p = 2 bits. The running time of our naive “scan left to right looking for a 1” algorithm is Θ(1): it does not depend on n.
¹In practice, many CPUs have a single instruction that will compute this for you.
ECE750-TXB Typical inputs
I This is an example of an exponential concentration:
[Figure: probabilities 1/2, 1/4, 1/8, 1/16, ... against the location 1, 2, 3, 4, ... of the first nonzero bit.]
I Probability is concentrated around the mean.
I Probability of the first nonzero bit being ≥ 2 + δ is ≤ 2^(−δ−1) = O(2^−δ): it falls off exponentially quickly.
I Exponential concentrations are enormously useful.
I An exponential concentration can swallow any polynomial function: if f(n) = O(n^a) for a ∈ N, then

  Σ_{δ=0}^∞ O(2^−δ) · f(c + δ) = O(1)

The tail of the distribution can contribute only an O(1) factor when we compute an average of f(···).
ECE750-TXB Typical inputs Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
I On a more practical note, understanding typical inputs can help us engineer fast implementations.²
I Consider our “first nonzero bit” example:
I From our analysis we know that with probability 15/16 ≈ 0.94 we will encounter a 1 in the first 4 bits.
I Design: have a lookup table indexed by the first four bits: 94% of the time we can just glance in the table and return.
I For the other ≈ 6% of the time, scan the remaining bits to find the first 1.
²Again, this is only ‘for example’: in practice if you were at all interested in performance you would be using a single CPU instruction for this.
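I A sketch of this table-driven design, assuming Python with the bitstring represented as a str of '0'/'1' characters (purely illustrative):

    # Precompute the first-1 position for every 4-bit prefix.
    TABLE = {}
    for v in range(16):
        bits4 = format(v, '04b')
        TABLE[bits4] = bits4.find('1')        # -1 if no 1 in the prefix

    def first_one(bits):
        pos = TABLE[bits[:4].ljust(4, '0')]   # ~94% of the time: one lookup
        if pos != -1:
            return pos
        for i in range(4, len(bits)):         # rare slow path: scan the rest
            if bits[i] == '1':
                return i
        return -1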
ECE750-TXB Average-case time
I We will consider several equivalent definitions of average-case performance. First, an informal definition: an algorithm runs in average time O(f(n)) (respectively, Θ, Ω, o, ω) if the average time T(n) is O(f(n)), where

  T(n) = Σ_{all inputs w of size n} Pr(w) · T(w)

with Pr(w) the probability of input w, and T(w) the time of the algorithm on input w.
ECE750-TXB Average-case time Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis I Let’s make this more precise. Todd L. Veldhuizen I For each n, let Kn be all possible inputs of size n. [email protected]
I For each n, let µn : Kn → R be a probability distribution on Kn:
I µn(w) ≥ 0 for all w ∈ Kn. (Probabilities are nonnegative.)
I Σ_{w∈Kn} µn(w) = 1. (Probabilities sum to 1.)
I Example: bit strings
I Kn = {0, 1}^n, e.g., K2 = {00, 01, 10, 11}.
I We often choose the uniform distribution µn(w) = 2^−n, e.g. µ2(00) = 1/4, µ2(01) = 1/4, etc.
I (Kn)n∈N is a family of sets indexed by n.
I Similarly, (µn)n∈N is a family of distributions. We often call (µn)n∈N an asymptotic distribution. ECE750-TXB Average-case time Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
I With these definitions in hand, our first formal definition:
Definition (1). Let the average time T(n) for inputs of size n be given by

  T(n) = Σ_{w∈Kn} µn(w) T(w)

An algorithm has average-case time O(f(n)) if and only if T(n) ∈ O(f(n)).
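I As a small sketch (assuming Python), Definition (1) can be evaluated exactly for the earlier “scan for the first 1” algorithm under the uniform distribution µn(w) = 2^−n; T(n) approaches 2, matching the geometric-distribution argument (the enumeration is exponential in n, so this is for small n only):

    from itertools import product

    def cost(w):                      # bits examined by the left-to-right scan
        for i, b in enumerate(w):
            if b == 1:
                return i + 1
        return len(w)

    def average_time(n):              # T(n) = sum over K_n of mu_n(w) T(w)
        return sum(cost(w) for w in product([0, 1], repeat=n)) / 2**n

    print([round(average_time(n), 3) for n in range(1, 10)])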
ECE750-TXB Bibliography, Lecture 10: Design Tradeoffs, Introduction to Average-Case Analysis
[1] Yehoshua Perl, Alon Itai, and Haim Avni. Interpolation search—a log log n search. Commun. ACM, 21(7):550–553, 1978.
[2] Nicola Santoro and Jeffrey B. Sidney. Interpolation-binary search. Inform. Process. Lett., 20(4):179–181, 1985.
[3] R. Sundar and R. E. Tarjan. Unique binary search tree representations and equality-testing of sets and sequences. In Baruch Awerbuch, editor, Proceedings of the 22nd Annual ACM Symposium on the Theory of Computing, pages 18–25, Baltimore, MD, May 1990. ACM Press.
ECE750-TXB Lecture 11: Probability
Todd L. Veldhuizen [email protected]
Electrical & Computer Engineering University of Waterloo Canada
February 28, 2007
ECE750-TXB Twentieth-Century Probability Lecture 11: Probability
I Two foundational questions:
1. How to define probability of infinite sequences in a meaningful way?
2. What is randomness? What is a “random sequence”?
I Two landmarks:
1. Andrei N. Kolmogorov (1933): the rigorous formulation of probability theory using measure theory, allowing a consistent treatment of both finite and infinite sample spaces.
2. Per Martin-Löf (1966): an acceptable definition of random sequences using constructive measure theory. Martin-Löf’s definition implies a sequence is random if a computer is incapable of compressing it [2].
Todd L. Veldhuizen I Modern probability theory is based on the concept of a [email protected]
measure. A measure generalizes the idea of “volumes,” Bibliography “lengths,” “probabilities,” and so forth.
I Consider defining a “length measure” that assigns a measure to subsets of the real line.
I Recall the following notations for closed and open intervals:
  [a, b] = {x ∈ R : a ≤ x ≤ b}
  (a, b) = {x ∈ R : a < x < b}

where we require a ≤ b.
I We will say that the interval [a, b] has measure b − a.
I The empty set ∅ has measure 0.
I This is the usual definition of the measure of an interval, called Lebesgue measure.
ECE750-TXB Probability and Measure II Lecture 11: Probability I Lebesgue measure is a function µ that assigns measures Todd L. to certain subsets of the reals R. For example, Veldhuizen µ([1, 2]) = 1. [email protected]
I What should the measure of [1, 2] ∪ [3, 5] be?
I [1, 2] and [3, 5] are disjoint sets, so we should be able to add their measures: µ([1, 2]) = 1 and µ([3, 5]) = 2, so we can set µ([1, 2] ∪ [3, 5]) = 3.
I In general, if A and B are disjoint sets,

  µ(A ∪ B) = µ(A) + µ(B)
I What should be the measure of the open interval (1, 2) (that is, [1, 2] with the endpoints removed)?
I Note that [1, 2] = (1, 2) ∪ {1} ∪ {2}.
I We can write the set {1} as [1, 1], and by our previous definition
µ([1, 1]) = 0
similarly for {2}: “points have no length.” ECE750-TXB Probability and Measure III Lecture 11: Probability
I We can then apply the disjoint-sets rule to say that

  µ([1, 2]) = µ({1} ∪ (1, 2) ∪ {2})
            = µ({1}) + µ((1, 2)) + µ({2})
            = 0 + µ((1, 2)) + 0

therefore µ([1, 2]) = µ((1, 2)).
I By similar reasoning, µ([a, b]) = µ((a, b) ).
I The disjoint-sets rule implies we can combine any finite number of disjoint sets:
  µ(A1 ∪ A2 ∪ · · · ∪ An) = Σ_{i=1}^n µ(Ai)

Should this extend also to an infinite collection of disjoint sets?
ECE750-TXB Probability and Measure IV Lecture 11: Probability
I For example, we can construct the interval (0, 1] as the union of the intervals

  (0, 1] = (1/2, 1] ∪ (1/4, 1/2] ∪ (1/8, 1/4] ∪ · · ·

We would like to say that

  µ( ⋃_{i=0}^∞ (2^(−i−1), 2^(−i)] ) = Σ_{i=0}^∞ 2^(−i−1) = 1

I We could also build the interval (0, 1] as a union of points, ⋃_{x∈(0,1]} {x}; each point has measure 0, so summing measures over this union would give 0, not 1. This is an inconsistency.
ECE750-TXB Probability and Measure V
I To avoid this particular inconsistency, we can restrict the union-of-disjoint-sets rule to countable sequences of sets, i.e., collections of sets that can be put into one-to-one correspondence with the naturals:
I If A1, A2, ··· are a countable sequence of pairwise disjoint sets, then

  µ( ⋃_{i∈N} Ai ) = Σ_{i∈N} µ(Ai)

I However, even this restriction (to countable sequences of sets) is not enough. In certain flavours of set theory (e.g., ZFC: usual set theory with the axiom of choice), measures cannot be assigned to every subset of R in a consistent way; this leads to contradictions like being able to chop up the unit interval (measure 1) into pieces and reassemble it into something of measure 2. (If interested, see Vitali sets, Banach-Tarski paradox.)
ECE750-TXB Probability and Measure VI
I Measure theory sidesteps inconsistencies by declaring certain sets to be nonmeasurable. In Lebesgue measure on the real line, measurable sets are defined by the following rules:
1. R is measurable, and has measure µ(R) = ∞.
2. All intervals of the form [a, b] are measurable.
3. If A is measurable, then its complement R \ A is measurable.
4. The union of a finite or countable sequence of measurable sets is measurable.
I Sets that cannot be constructed by the above rules are deemed nonmeasurable (e.g., Vitali sets).
I So, a measure space consists of three things:
1. A set Ω on which measures are being defined (e.g., R)
2. A set F of measurable sets. Each X ∈ F is a subset of Ω.
3. A measure µ : F → [0, ∞].
ECE750-TXB Probability Spaces I
I A probability space comprises
I A sample space of outcomes. For continuous distributions the sample space is often R or R^n; for discrete distributions the sample space is often Z, N, {0, 1}, etc.
I A class of measurable sets of the sample space;
I A probability measure that assigns probabilities to events.
I A probability space is a triple (Ω, F, µ) where
I Ω is a sample space. The elements x ∈ Ω are outcomes.
I F is a collection of subsets of Ω we call events (the measurable sets);
I µ : F → R is a probability measure.
I Of the events F, these properties are required:
I Ω ∈ F (we can measure the whole sample space)
I F is closed under complementation and countable union:
I If X ∈ F then so is its complement (Ω \ X) ∈ F;
I If X1, X2, ··· are in F, then so is ⋃_{i∈N} Xi.
I The probability measure µ must satisfy these properties (the Kolmogorov axioms):
1. Probabilities are nonnegative: for every X ∈ F, µ(X) ≥ 0.
2. µ(Ω) = 1. (Probabilities sum to 1.)
3. For any finite or countable sequence of pairwise disjoint events X1, X2, ··· ,

  µ( ⋃_i Xi ) = Σ_i µ(Xi)

(The probability of one of the events Xi happening is the sum of their probabilities.)
I Finite probability spaces are very simple. We usually take F = 2^Ω (the powerset of Ω), i.e., every subset of Ω is measurable.
ECE750-TXB Probability Spaces III
I Example: a random bit. We can define the probability
space by Ω = {0, 1}, F = {∅, {0}, {1}, {0, 1}}, and

  µ(∅) = 0,  µ({0}) = 1/2,  µ({1}) = 1/2,  µ({0, 1}) = 1

I We can take a product of two probability spaces. For a uniform distribution on two bits, we can use the probability space (Ω2, F2, µ2) = (Ω, F, µ) × (Ω, F, µ) which has
I Sample space Ω2 = Ω × Ω = {(0, 0), (0, 1), (1, 0), (1, 1)}, i.e., all combinations of two bits;
I F2 = F × F, containing all pairs of events drawn from F;
I Probability measure µ2 defined by µ2(X, Y) = µ(X)µ(Y) for all X, Y. Note this implies that events in the first probability space of the product are independent from those of the second.
I We can repeat this process to obtain probability measures µ3, µ4, ... on any finite number of bits.
I One of the useful consequences of the measure-theoretic treatment of probability is the Kolmogorov extension theorem, which says that the finite distributions µ1, µ2, µ3, ... (on bit sequences of length 1, 2, 3, etc.) define a unique stochastic process:
I a probability space whose sample space Ω^ω is infinite binary sequences,
I with an appropriate set F of measurable sets of sequences,
I with a probability measure µ^ω on those measurable sets;
I such that the finite projections of µ^ω (e.g., the first k bits) match the finite distributions µ1, µ2, ··· .
I Any set of finite distributions satisfying the Kolmogorov consistency conditions can be extended to a random process in this way. This particular stochastic process is an example of a Bernoulli process, in which outcomes are sequences of digits drawn from a binary alphabet.
ECE750-TXB Probability Basics I
I Informally, we will just write Pr(···) to mean the probability of some event; the implication is that “···” specifies some measurable event X ∈ F.
I For example, when we write Pr(Z ≥ 0) we are referring to µ(X) where X is the set of outcomes in which Z ≥ 0. (Strictly speaking, it would not make sense to write µ(Z ≥ 0) because ‘Z ≥ 0’ is a formula, rather than a subset of Ω.)
Probability essentials.
1. Independence. Two events X and Y are said to be independent if and only if

  µ(X ∩ Y) = µ(X) · µ(Y)

i.e., Pr(both X and Y happen) = Pr(X) · Pr(Y). Similarly, events X1, X2, ··· are independent if and only if

  µ( ⋂_i Xi ) = ∏_i µ(Xi)

2. Union bound. For any finite or countable set of events X1, X2, ··· ,

  µ( ⋃_i Xi ) ≤ Σ_i µ(Xi)     (1)

Eqn. (1) is an equality when the Xi are pairwise disjoint. The union bound is often used to obtain an upper bound on the probability of some rare event happening.
I Example: what is the probability that a binary string of length n chosen uniformly at random contains a run of ≥ √n ones? (For convenience, we limit ourselves to n = k² for some integer k.)
I Uniformly at random means the probability space is defined as a product measure Ω^n with Ω = {0, 1} and µ1(0) = µ1(1) = 1/2.
I Define Xi to be the event that the string contains a run of √n ones starting at position i, where i ≤ n − √n.
I The probability of a run of √n 1’s starting at position i is µn(Xi) = 2^−√n. (This is obvious, but to be finicky we could consider every possible string having such a run starting at i. Each such string w is an event {w}; the events are pairwise disjoint; there are 2^(n−√n) such strings, each with probability 2^−n, so summing over the pairwise disjoint events (cf. Kolmogorov’s 3rd axiom) the sum comes out to 2^−√n.)
I The events X1, X2, ··· , X_{n−√n} are definitely not independent. However, we can use the union bound to obtain

  Pr(run of √n ones) ≤ Σ_{i=1}^{n−√n} µ(Xi) = (n − √n) · 2^−√n

So, the probability of having a run of √n ones is O(n 2^−√n), which is going to zero very quickly. (Note we used O(·), not Θ(·), because we have a possibly loose upper bound.)
3. Inclusion-Exclusion. For any two events X1 and X2,

  µ(X1 ∪ X2) = µ(X1) + µ(X2) − µ(X1 ∩ X2)

More generally, for any events X1, X2, ··· ,

  µ( ⋃_i Xi ) = Σ_i µ(Xi) − Σ_{i<j} µ(Xi ∩ Xj) + · · ·

I This can be easily remembered by looking at Venn diagrams: sum up the areas, subtract the things you counted twice, then add in the things you subtracted too many times, ...
I Inclusion-Exclusion is a particular instance of a general principle called Möbius inversion.
4. Random variables. In the measure-theoretic treatment of probability, a random variable Z is a function Z : Ω → V from outcomes to some set V. Commonly V is R (a continuous random variable), N (a discrete random variable), or {0, 1} (an indicator variable or Bernoulli variable).
I Example. Take bitstrings of length n again. Let Z be a random variable counting the number of ones in the string. Then formally Z is a function from {0, 1}^n → N, so that for example Z(010010110) = 4.
5. Indicator (Bernoulli) random variables. In the special case that a random variable takes on only the values 0 and 1, it is called an indicator variable or Bernoulli random variable. We can associate each event E ∈ F with an indicator variable Z_E that is 1 if E occurs, and 0 otherwise, i.e.

  Z_E(X) = 1 if X ∈ E, and 0 otherwise

6. Expectation. For a random variable Z, we write E[Z] for the expected value of Z. This is simply the average over the sample space. For a finite probability space (i.e., the number of outcomes |Ω| is finite),

  E[Z] ≡ Σ_{X∈Ω} Z(X) µ({X})     (2)

But in the general case (i.e., a possibly infinite sample space) the expectation is an integral over the probability space; in the case that Ω = R this is the familiar integral on the real line defined in terms of pdf’s. I.e., if F(x) = µ((−∞, x]) (the cdf), and f(x) = F′(x) (the pdf), then

  E[Z] = ∫_{−∞}^{+∞} x f(x) dx

7. Linearity of Expectation. If Z1, Z2, ... are random variables, then

  E[ Σ_i Zi ] = Σ_i E[Zi]

I The usefulness of this cannot be overstated! The random variables may be far from independent, but we can still sum their expectations.
I In particular, the combination of indicator variables with linearity of expectation is very powerful, and one of the most basic tools of the Probabilistic Method [1].
I In algorithm analysis, we can sometimes choose a set of indicator variables Z1, Z2, ... where each Zi represents some piece of work we may or may not have to do. The expected value of the sum E[Z1 + Z2 + ···] is an upper bound on the average amount of work we need to do. For example, in [3, §2.5] you can find an ingeniously simple analysis of the average time complexity of QuickSort using this method.
I It can also be used to characterize what the typical “largest occurrence” of some pattern is in a random object. For example, what is the expected length of the longest run of 1’s in a random n-bit string?
I Let t be the length of a run. (We will solve for t to find the most likely situation.)
I Let X1, X2, ..., X_{n−t+1} be indicator random variables, where Xi = 1 just when a run of ≥ t 1’s starts at position i. The probability of a run of t 1’s is 2^−t, so Pr(Xi = 1) = 2^−t.
I Let Y = X1 + X2 + ... + X_{n−t+1} be the number of runs of length ≥ t. Although the Xi are not independent, we can use linearity of expectation to obtain

  E[Y] = E[X1 + X2 + ··· + X_{n−t+1}]
       = E[X1] + E[X2] + ··· + E[X_{n−t+1}]
       = (n − t + 1) 2^−t

To find a likely value for t, we can set E[Y] = 1: i.e., we want to find the value of t for which we expect to have one run of t 1’s.
I This is a simple asymptotics exercise: taking logarithms of E[Y] = 1, we obtain

  t = log(n − t + 1) = log(n) − Θ(t/n) = log(n) − Θ(log n / n)

I Note also the similarity to the union bound: if we view the Xi as events (rather than random variables), we can say

  Pr( ⋃_i Xi ) ≤ Σ_i Pr(Xi) = (n − t + 1) 2^−t

This works because the expected value of an indicator variable is just its probability of being 1.
I We can combine the above results to obtain a concentration inequality for the length of the longest run: set t = log(n) + δ. Then

  Pr(run of length ≥ log(n) + δ) ≤ (n − log n − δ + 1) 2^(−log n − δ)
                                 = (n − log n − δ + 1) (1/n) 2^−δ
                                 = (1 − o(1)) 2^−δ

So, with very little work we have obtained an exponential concentration for the length of the longest run.
8. Markov’s inequality. This inequality gives us quick but loose bounds on the deviation of random variables from their expectation. If X is a random variable and α > 0, then

  Pr(|X| ≥ α) ≤ E[|X|] / α

If X takes on only positive values, then the |·|’s may be dropped.
I Example. Recall that the expected height of a binary search tree constructed by inserting a random sequence of n keys is c log n, with c ≈ 4.311. If we let H be a random variable representing the height of the tree, then Markov’s inequality gives

  Pr(H ≥ βn) ≤ (c log n)/(βn) = O(log n / n)

So, the probability of getting a tree of height Θ(n) tends to zero as O(log n / n).
I Example. Suppose an algorithm runs in average-case time f(n) but has worst-case time g(n), where g(n) ≻ f(n). What is the probability that the algorithm will take time g(n)?
Treat the running time as a random variable; applying Markov’s inequality, we immediately obtain

  Pr(running in time ≥ g(n)) ≤ f(n)/g(n)

Since f(n) ≺ g(n), the probability of running in time ≥ g(n) is tending to zero, at least as quickly as the ratio between the average and worst-case time.
9. Variance and Standard Deviation. Recall that the variance and standard deviation of a random variable X are:

  Var[X] = E[(X − E[X])²] = E[X²] − (E[X])²
  σ[X] = (Var[X])^(1/2)

A common special case: if X is a Bernoulli random variable with probability β of being 1, then Var[X] = β(1 − β). If X1, X2, ... are independent random variables, then

  Var[ Σ_i Xi ] = Σ_i Var[Xi]

10. Chebyshev’s bound. Chebyshev’s bound, like Markov’s bound, is a tail inequality that gives a bound on how slowly probability can drop as you move away from the mean:

  Pr(|X − E[X]| ≥ a σ[X]) ≤ 1/a²

11. Distributions. A probability space (Ω, F, µ) on Ω = N or Ω = R coincides with our familiar idea of a “probability distribution.” A random variable X : Ω → Ω_X has an associated distribution (probability space) (Ω_X, F_X, µ_X), where
I The sample space is Ω_X;
I The measurable events are given by F_X = {E ⊆ Ω_X : X^−1(E) ∈ F};
I The probability measure is µ_X(E) = µ(X^−1(E)).
I Consider a uniform distribution on 3-bit strings, i.e., Ω = {0, 1}³, and a random variable X that counts the number of bits that are 1, e.g. X(011) = 2. Then

  µ_X(2) = µ(X^−1(2)) = µ({110, 101, 011})

I For a continuous random variable, e.g., something of the form X : Ω → R, the familiar probability density function (pdf) and cumulative density function (cdf) are:

  F(x) = µ_X((−∞, x])
  f(x) = (d/dx) F(x)

ECE750-TXB Bibliography, Lecture 11: Probability
[1] Noga Alon and Joel Spencer. The Probabilistic Method. John Wiley, second edition, 2000.
[2] Per Martin-Löf. The definition of random sequences. Information and Control, 9(6):602–619, December 1966.
[3] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
ECE750-TXB Lecture 12: Markov Chains and their Applications
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering, University of Waterloo, Canada
February 20, 2007
ECE750-TXB Markov Chains I
I The probabilistic counter is a simple example of a Markov chain. Roughly speaking, a Markov chain is a finite state machine with probabilistic transitions. Consider the following two-state Markov chain with transition probabilities as shown:
[Diagram: states A and B; A → B with probability p, B → A with probability p; each state returns to itself with probability 1 − p.]
Let fA(n) be the probability that the machine is in state A at step n, and similarly fB(n). Obviously the machine can only be in one state at a time, so

  fA(n) + fB(n) = 1
Let’s consider the scenario where the machine is in state A to begin:

  fA(0) = 1,  fB(0) = 0

These are the initial conditions. We can write equations to describe how the system evolves. For each state, we look at the incident edges, and write the probability of being in that state at time n in terms of where it was at time n − 1. We reach state A at time n
I with probability (1 − p) if we were in state A at time n − 1;
I with probability p if we were in state B at time n − 1.
This leads to the equations

  fA(n) = (1 − p) fA(n − 1) + p fB(n − 1)
  fB(n) = p fA(n − 1) + (1 − p) fB(n − 1)

To encode the initial conditions we add δ(n) to the equation for fA; this will result in fA(0) = 1. In matrix form, with P the transition matrix, we can write the equations as:

  [fA(n)]   [1 − p    p  ] [fA(n − 1)]   [δ(n)]
  [fB(n)] = [  p    1 − p] [fB(n − 1)] + [ 0  ]

These are called the Chapman-Kolmogorov equations. Taking Z-transforms, we obtain

  [FA(z)]     [1 − p    p  ] [FA(z)]   [1]
  [FB(z)] = z^−1 [  p    1 − p] [FB(z)] + [0]

Rearranging, and writing x for the initial-condition vector [1 0]^T,

  (I − P z^−1) F = x     (1)

or,

  z^−1 (zI − P) F = x     (2)

I If you are familiar with eigenvalues, the term (zI − P) should look conspicuous. Note that we can solve Eqn. (2) by left-multiplying through by z(zI − P)^−1 to obtain

  F = z (zI − P)^−1 x

Furthermore, (zI − P)^−1 can be written in terms of the adjugate and determinant: recall that K^−1 = adj K / |K|, where |K| is the determinant. We can then write the solution as:

  F = ( z · adj(zI − P) / |zI − P| ) x

So, the poles of the functions in F will occur at values of z where

  |P − zI| = 0

Compare this to the characteristic equation for the eigenvalues of P:

  |P − λI| = 0

I The poles are located at λ1, λ2, ··· , where the λi are the eigenvalues of the transition matrix P. Solving for FA(z), we obtain

  FA(z) = (1 − (1 − p) z^−1) / ( (1 − z^−1)(1 − (1 − 2p) z^−1) )

This has poles at z = 1 and z = 1 − 2p, and a zero at z = 1 − p.
I The pole at z = 1 reflects the limiting distribution of the chain. (Recall that Z^−1[ c/(1 − z^−1) ] = c · u(n).)
I The pole at 1 − 2p produces transient behaviour (so long as 0 < p < 1).
I If p = 0 or p = 1, the zero at 1 − p cancels one of the poles.
Taking the inverse transform, we obtain

  fA(n) = 1/2 + (1/2)(1 − 2p)^n   (poles z = 1 and z = 1 − 2p)
  fB(n) = 1/2 − (1/2)(1 − 2p)^n

where we have used fB(n) = 1 − fA(n). Depending on the value of p, this two-state Markov chain can exhibit five distinct behaviours:
1. p = 0: the machine always stays in state A. The only possible sequence is AAAAAA··· .
2. p = 1: we always get a strict alternation between states: ABABABAB··· .
3. p < 1/2: a monotone, exponentially fast approach to the limiting distribution fA(n) = fB(n) = 1/2.
4. p = 1/2: fA(n) = fB(n) = 1/2 for n ≥ 1. Every sequence of A’s and B’s is equally likely.
5. p > 1/2: oscillating decay to the limiting distribution fA(n) = fB(n) = 1/2.
I Markov Chains.
I A (finite) Markov chain is a set of states S = {s1, ..., sn} together with a matrix P of transition probabilities pij of moving from state si to state sj.
I The sample space is Ω = S^ω, i.e., infinite sequences of states.
I For an initial distribution u, the distribution after n steps is P^n u.
I Write p^(n)_ij for the probability of going from state i to state j in n steps. (Note: this is the entry (i, j) of the matrix P^n.) Write i →+ j if there is an n > 0 such that p^(n)_ij > 0, i.e., j can be reached from i. If i →+ j and j →+ i, we say i and j are communicating, and we write i ↔ j. The relation ↔ is an equivalence relation, and it partitions the states into classes of states that communicate with each other.
I Classification of Markov chains: a chain is either Reducible or Irreducible/Ergodic; an irreducible chain is either Aperiodic/Mixing or Periodic.
I Irreducible: all states are communicating. This means there is only one class of long-term behaviours. If a chain is irreducible, there is a limiting distribution u such that Pu = u, and the chain spends a proportion of time ui in state i. This chain is irreducible:
[Diagram: a three-state cycle A → B → C → A, each transition taken with probability 1.]
and the distribution u = [1/3 1/3 1/3]^T is a limiting distribution. In particular, for a sample sequence,

  ui = lim_{n→∞} (1/n) (#times in state i)

with probability 1. (This is a Cesàro limit: the probability of being in state i after n steps might not converge; it might be 0, 0, 1, 0, 0, 1, ...; but the Cesàro limit does converge.) An irreducible Markov chain is ergodic, meaning that sample-space averages coincide with time averages (with probability 1). (Note that some authors use “ergodic” as a synonym for “mixing,” which is not quite correct.)
I A Markov chain is aperiodic if for all i, j,

  gcd{n : p^(n)_ij > 0} = 1

otherwise, the chain is called periodic.
I If a chain is not irreducible, it is called reducible. There are multiple equivalence classes under ↔; this means there is more than one possible long-term behaviour. A simple example is:
[Diagram: state B moves to A or C with probability 1/2 each; A and C each loop on themselves with probability 1.]
I An irreducible chain is mixing if for any initial distribution p0,

  lim_{n→∞} P^n p0 = u

where u is the limiting distribution. This chain is mixing:
[Diagram: a three-state cycle in which each state stays put with probability 3/4 and moves to the next state with probability 1/4.]
A chain that is mixing “forgets” its initial conditions, in the sense that the distribution P^n p0 is asymptotically independent of the distribution p0.
I An absorbing state is one that communicates only with itself. A chain is called absorbing if every state communicates with an absorbing state.
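I As a sketch (assuming Python), we can check the closed-form solution for the two-state chain above by iterating the Chapman-Kolmogorov equations directly; p = 0.3 is an arbitrary choice:

    p = 0.3
    fA, fB = 1.0, 0.0                 # initial conditions: start in state A
    for n in range(1, 11):
        fA, fB = (1 - p) * fA + p * fB, p * fA + (1 - p) * fB
        closed_form = 0.5 + 0.5 * (1 - 2 * p) ** n
        print(n, round(fA, 6), round(closed_form, 6))   # the two columns agree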
I Applications discussed in lecture:
I PageRank (Google)
I Anomaly detection
ECE750-TXB Convergence Time of Markov Chains
I Recall that the z-transform of the Chapman-Kolmogorov equations has the form

  F = ( N(z) / |P − zI| ) x

where P is the transition matrix, N(z) is some polynomial in z, and x encodes the initial conditions.
I The inverse z-transform of fi(n) will have terms corresponding to the poles of N(z)/|P − zI|; for example,

  fi(n) = ui + α2 (λ2)^n + α3 (λ3)^n + ···

where the λk are the poles; equivalently, the eigenvalues of the transition matrix P. By convention we number the eigenvalues so that |λ1| ≥ |λ2| ≥ |λ3| ≥ ··· . The largest eigenvalue, λ1, always satisfies λ1 = 1, which generates the limiting distribution term ui.
I The rate of convergence to the limiting distribution is governed by |λ2|, the magnitude of the second-largest pole/eigenvalue. (Or, the largest eigenvalue with |λi| < 1, if there are multiple eigenvalues equal to 1.) For example, if λ2 = 0.99, we get a term of the form α2 (0.99)^n, which has a half-life of n ≈ 69 (very slow convergence). If λ2 = 0.7, the half-life is n ≈ 2 (very fast convergence).
I |λ2| is sometimes referred to as the SLEM: Second Largest Eigenvalue Modulus. (Modulus = magnitude.)
I In designing randomized algorithms that can be modelled as Markov chains, we can optimize the convergence time by minimizing |λ2|.
ECE750-TXB Leader Election
I Scenario: we have some number of computers that can communicate with each other. We want them to randomly elect one of them to be the leader. Each computer must run the same algorithm.
I One method for leader election is the following:
I Initially, each machine thinks it is the leader.
I At each time step, machines broadcast whether they think they are the leader or not.
I If a machine thinks it is the leader, but some other machine also does, it flips a coin to decide whether to drop out or stay in the leadership race.
I The process is finished when there is only one machine that thinks it is the leader.
I If nobody thinks they are the leader, start all over again with everyone thinking they are the leader.
I Goal: minimize the amount of time needed to elect a leader.
I Implies: make the Markov chain converge to a stable configuration as quickly as possible.
I Implies: choose the dropout probability p in order to make the secondary pole as close to the origin as possible.
I E.g., with 2 players there are 4 states: 11 (everyone thinks they could be a leader), 10 and 01 (one leader), 00 (both drop out).
[Diagram: from state 11, go to 11 with probability (1 − p)², to 01 or 10 with probability p(1 − p) each, and to 00 with probability p²; states 01 and 10 are absorbing; state 00 returns to 11 with probability 1.]
I We will show that the average time to elect a leader is minimized when p = (3 − √5)/2 ≈ 0.381966.
- The equations:

  [ f00(n) ]   [ 0   0   0   p²      ] [ f00(n−1) ]   [ 0    ]
  [ f01(n) ] = [ 0   1   0   p(1−p)  ] [ f01(n−1) ] + [ 0    ]
  [ f10(n) ]   [ 0   0   1   p(1−p)  ] [ f10(n−1) ]   [ 0    ]
  [ f11(n) ]   [ 1   0   0   (1−p)²  ] [ f11(n−1) ]   [ δ(n) ]

- The eigenvalue/pole locations are (via Maple):

  λ_1 = 1
  λ_2 = 1
  λ_{3,4} = ½(p² − 2p + 1) ± ½ √(p⁴ − 4p³ + 10p² − 4p + 1)

Leader Election IV

- The positive root λ_3 dominates the convergence. To find the p that minimizes it, set dλ_3/dp = 0 to obtain

  p = (3 − √5)/2

- This yields λ_3 ≈ 0.618, corresponding to a half-life of about 1.44 steps. (Choosing p = 1/2, an unbiased coin flip, would yield λ_3 ≈ 0.64, and a half-life ≈ 1.55.)

ECE750-TXB Lecture 13: Information Theory
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering
University of Waterloo
Canada
February 22, 2007

Entropy I

- The central concept of information theory is entropy, which measures:
  1. the "amount of randomness" in a distribution;
  2. how many bits are required, on average, to represent an object, assuming the distribution is known to both the encoder and the decoder.
- Entropy is a functional from distributions to R: if µ is a distribution, then the entropy H(µ) is a real number describing "how random" the distribution µ is.
- The following requirements lead to a unique definition of entropy (up to a multiplicative constant). Suppose µ is a distribution on {1, 2, ..., n}; we can treat µ as a vector in R^n, i.e., µ = [p_1 p_2 ··· p_n], where p_1 is the probability of outcome 1, etc.

Entropy II

1. Continuity: H(µ) should be a continuous function of µ. Using an ε-δ definition of continuity: for each µ and each ε > 0 there is a δ > 0 such that ‖µ − µ′‖ < δ implies |H(µ) − H(µ′)| < ε.
2. Maximality: H(µ) attains its maximum value when µ = [1/n 1/n ··· 1/n], i.e., a uniform distribution is the "most random."
3. Additivity: if µ and µ′ are two distributions, then

   H(µ × µ′) = H(µ) + H(µ′)

   i.e., the entropy of a product of probability spaces is the sum of their entropies.
4. Expandability: if we expand the domain of µ from {1, 2, ..., n} to {1, 2, ..., n+1}, then H([p_1 p_2 ··· p_n]) = H([p_1 p_2 ··· p_n 0]).

Entropy III

- The unique function satisfying these conditions is

  H(µ) = β Σ_{i=1..n} −p_i log₂ p_i

  where 0 log 0 = 0, to satisfy continuity. The constant β is usually taken to be 1, so that the entropy can be interpreted as "the number of bits needed to represent an outcome."
- Example. Let µ = [1/n 1/n ··· 1/n]. Then

  H(µ) = Σ_{i=1..n} −p_i log p_i = n · (−(1/n) log(1/n)) = log n

Entropy IV

With a uniform distribution on n outcomes, log n bits are needed to represent an outcome. The uniform distribution is the only distribution on n outcomes that has entropy log n.
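A two-line sketch of the entropy functional (base-2 logarithms, β = 1, with the convention 0 log 0 = 0), checked against the examples in this lecture:

    from math import log2

    def H(mu):
        return -sum(p * log2(p) for p in mu if p > 0)

    print(H([0.25] * 4))    # uniform on 4 outcomes: log 4 = 2.0 bits
    print(H([0, 0, 1, 0]))  # a certain outcome: 0.0 bits
    print(H([0.5, 0.5]))    # fair coin: 1.0 bit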
- Example. Let µ = [0 0 1 0 0 ··· 0]. Then

  H(µ) = (n − 1) · (−0 log 0) − 1 log 1 = 0

  The only distributions with H(µ) = 0 are those in which a single outcome has probability 1.
- Example. Let µ = [1/2 1/2]. Then H(µ) = 1: one bit is required to represent the outcome of a uniform distribution on two outcomes (e.g., a fair coin flip).

Entropy V

- Example. Let µ = [p (1 − p)] (a Bernoulli random variable with probability p). Then

  H(µ) = −p log p − (1 − p) log(1 − p)

  [Plot: entropy H of a Bernoulli random variable as a function of the probability p, rising from 0 at p = 0 to a maximum of 1 at p = 1/2 and falling back to 0 at p = 1.]

Noiseless coding theorem I

- Recall that {0,1}* is the set of finite binary strings.
- Let C ⊆ {0,1}* be a prefix-free set of codewords. (Prefix-free means no codeword occurs as a prefix of another codeword; see the lecture on tries.)
- A code for µ is a function c : dom µ → C. (Note that we can treat c as a random variable.)
- The average code length of c is

  c̄ = E[|c|] = Σ_{x ∈ dom µ} µ(x) |c(x)|

- Shannon's noiseless coding theorem states that:
  1. c̄ ≥ H(µ); that is, no code can achieve an average code length less than the entropy.
  2. There exists a code achieving c̄ ≤ 1 + H(µ).

Applications of information theory to algorithms

- Information theory is applied in algorithm design and analysis in several ways:
  1. Deriving lower bounds on the time or space required by an algorithm or data structure, using a uniform distribution (usually called an "information-theoretic lower bound").
  2. Designing structures for searching that exploit the entropy of the distribution on keys.
  3. Using the noiseless coding theorem to derive probability bounds. (Often such arguments are phrased in terms of Kolmogorov complexity; the technique is called "the incompressibility method.")

Information-theoretic lower bounds I

- Information theory provides a quick method for obtaining lower bounds, i.e., Ω(·) bounds, on time and space.
- Example: sorting an array. Consider the problem of sorting an array of n integers using comparison operations and swaps.
  - Assume no two elements of the array are equal.
  - Assume comparisons such as "a[i] ≤ a[j]" are the only means of obtaining information about the ordering of the input array.
- There are n! orderings of the input array, but only one possible output ordering.
- Each comparison of the form "a[i] ≤ a[j]" yields a true-or-false answer, i.e., at most one bit of information about the ordering of the input array.
- To distinguish amongst the n! possible orderings of the input array, at least log n! tests are required, on average. This can be established in several ways:

Information-theoretic lower bounds II

- Decision tree: consider the sequence of comparisons performed by a sorting algorithm as a path through a tree, where each tree node represents a comparison; the left branch is taken if a[i] > a[j], and the right branch if a[i] ≤ a[j]. Each leaf of the tree is a rearrangement (sorting) of the input array. Since there are n! possible orderings of the input array, the tree must have ≥ n! leaves; and since it is a binary tree, it must have depth ≥ log n!.
- Coding argument: it must be possible to recover the initial array ordering from the sequence of comparison outcomes, so the sequence of outcomes constitutes a "code" for input array orderings. Applying the noiseless coding theorem, at least log n! comparisons are required, on average.

Information-theoretic lower bounds III

- Irreversibility argument: after sorting the input array, all information about the original ordering is lost. If we wanted to "run the algorithm backwards" and recover the original input array, we would need log n! bits of information to reproduce the original, unsorted array. We say that sorting the array incurs an "irreversibility cost" of log n! bits. RAM and Turing-machine models allow at most O(1) bits of information to be "erased" at each step; therefore Ω(log n!) = Ω(n log n) time steps are required. (These ideas are stock-in-trade of the "thermodynamics of computation," which investigates, among other things, the minimum amount of heat that must be produced when computations are performed.)

From any one of these three arguments, one can conclude that any comparison-based sorting algorithm requires Ω(log n!) = Ω(n log n) worst-case time.

Information-theoretic lower bounds IV

- It is possible to sort in o(n log n) time in certain circumstances: for example, to sort a large array of values in the range [0, 255] one can simply maintain an array of 256 counters, scan the array to build a histogram, and then expand the histogram into a sorted array. This can be done with O(n) operations.
- However, by either the second or third argument above (the coding argument or the irreversibility-cost argument), it is never possible to sort an array in fewer than Ω(H) operations, where H is the entropy of the distribution on input array orderings.
- Example: Binary Decision Diagrams (BDDs). BDDs are a very popular representation for boolean functions.
- A boolean function on k variables is a function f(x_1, ..., x_k) : {0,1}^k → {0,1}.

Information-theoretic lower bounds V

- Example: the majority function on three variables, MAJ(x, y, z), is true when a majority of its inputs are true:

  x y z | MAJ(x, y, z)
  0 0 0 | 0
  0 0 1 | 0
  0 1 0 | 0
  0 1 1 | 1
  1 0 0 | 0
  1 0 1 | 1
  1 1 0 | 1
  1 1 1 | 1

- Note that a boolean function on k variables has 2^k possible input combinations (hence the 8 lines in the truth table above).
- BDDs are popular because they can often represent a boolean function very compactly, e.g., in space linear in the number of variables. Can they always do this?

Information-theoretic lower bounds VI

- Put a uniform distribution on boolean functions of k variables. There are 2^k possible input combinations; a boolean function can be true or false on each input combination; hence there are 2^(2^k) boolean functions on k variables.
- Under a uniform distribution, the entropy is log₂ 2^(2^k) = 2^k.
- Any representation of boolean functions can be viewed as a "code" for representing a boolean function in memory. Applying the noiseless coding theorem, we get

  c̄ ≥ 2^k

  That is, any representation for boolean functions requires an average code length of 2^k bits: exponential in the number of inputs.
- We can therefore say: any representation of boolean functions requires Ω(2^k) bits per function, on average, under a uniform distribution.

Information-theoretic lower bounds VII

- The fact that BDDs often achieve small representations suggests that:
  1. the distribution on boolean functions arising in practice has quite low entropy;
  2. BDDs are a reasonably efficient code for that distribution.

Entropy and searching I

- Scenario: retrieving records from a database.
- Let K be a set of keys, and n = |K|. We have seen that binary search, or binary search trees, yield a worst-case time of O(log n).
- It turns out that the best achievable average-case performance does not depend on n per se, but rather on the entropy of the input distribution on keys.
- Suppose µ is a probability distribution on K, indicating the frequency with which keys are requested.
- In some applications, search problems have highly nonuniform distributions on input keys.
- Often the probability distribution on very large key sets follows a distribution where the mth most popular key has probability ≺ m⁻¹ (cf. Zipf's law, and Chris Anderson's article The Long Tail).
- Let H = H(µ) be the entropy of the input distribution.

Entropy and searching II

- We can achieve search times of O(H) by placing commonly requested keys close to the root of the tree.
- Example: suppose we are performing dictionary lookups on the following set of keys:

  Key            | Probability
  entropy        | 1/2
  caffeine       | 1/4
  Markov         | 1/16
  thermodynamics | 1/16
  convex         | 1/16
  stationary     | 1/16

Entropy and searching III

- We can use the following search tree: "entropy" at the root; its left child "caffeine", with right child "convex"; its right child "stationary", with children "Markov" and "thermodynamics".
- We have placed the most commonly requested keys (entropy, caffeine) close to the root.
- The average depth is

  (1/2)(1) + (1/4)(2) + (1/16)(2) + 3 · (1/16)(3) ≈ 1.69

- There are good algorithms for designing such search trees offline (i.e., when the distribution is known): one can build binary search trees that achieve optimal search times, e.g., [1].

Entropy and searching IV

- However, in some situations it is impractical to build such trees offline, because the contents of the database are changing, or the key distribution is not stationary (e.g., every day brings different "hot topics" that people search for).
- Splay trees [2] are fascinating binary search trees that reorganize themselves in response to the input key distribution. The underlying idea is very simple: each time a key is requested, it is moved to the root of the tree. In this way, popular keys tend to stay close to the root.
- Splay trees are known to be optimal for a static distribution on keys. Their performance for nonstationary distributions is a longstanding open question: the "dynamic optimality conjecture."

Bibliography

[1] Kurt Mehlhorn. Nearly optimal binary search trees. Acta Informatica, 5(4):287–295, 1975.
[2] Daniel Dominic Sleator and Robert Endre Tarjan. Self-adjusting binary search trees. Journal of the ACM, 32(3):652–686, July 1985.
ECE750-TXB Lecture 14: Typical Inputs
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering
University of Waterloo
Canada
February 27, 2007

Asymptotic Distributions I

- Scenario for average-case analysis:
  - n is the input "size."
  - Inputs: (K_n)_{n∈N} is a family of sets indexed by n, giving the possible inputs of size n to an algorithm.
  - For each n, there is a probability distribution µ_n on inputs.
- Example: if an algorithm operates on binary strings, we could choose size to mean "length of the string," and K_n = {0,1}^n (strings of length n).
- An asymptotic distribution is a family of probability distributions (µ_n)_{n∈N}, where µ_n is a probability measure on the sample space K_n.
- When our meaning is obvious, we will write µ_n(w) for the probability of the input w ∈ K_n. (If µ_n is a measure then, to be fastidious, we should write µ_n({w}), where {w} is the event that the outcome is w. But writing µ(w) is clearer.)

Asymptotic Distributions II

- Example: for binary strings of length n, the uniform distribution on {0,1}^n is defined by

  µ_n(w) = 1/2^n

  since there are |K_n| = 2^n binary strings of length n.
- To design algorithms that behave well on average, it helps to know what properties are "typical" for the distribution of inputs.

Sets of asymptotic measure 1 I

- Let K = ∪_{n∈N} K_n be all possible inputs of any length.
- For a class of inputs A ⊆ K, the asymptotic measure of A, if it exists, is given by

  µ_∞(A) = lim_{n→∞} µ_n(A ∩ K_n)

- Note that in general the limit may not exist. For example, taking K_n to be binary strings, the probability of the set A = {w ∈ {0,1}* : |w| is even} alternates between 0 and 1:

  µ_3(A ∩ K_3) = 0
  µ_4(A ∩ K_4) = 1
  µ_5(A ∩ K_5) = 0
  ...

Sets of asymptotic measure 1 II

and so the limit fails to exist.

- If µ_∞(A) = 1, we say:
  - A has asymptotic measure 1;
  - A is almost surely true;
  - A happens almost surely.
  The phrases "with probability 1," "almost certain," "almost always," and "with high probability" are also used; the abbreviation "a.s." is common for "almost surely."
- Let's look at a few examples of almost-sure properties of random binary strings:
  1. runs of 1's;
  2. the balance of 0's and 1's;
  3. the position of the first nonzero bit;
  4. the number of prime divisors of the string when interpreted as a base-2 integer.

Runs in random strings I

- Example: runs of 1's in binary strings.
- Let K_n = {0,1}^n be the binary strings of length n, and µ_n the uniform distribution.
- Define the random variable R : K_n → N to be the length of the longest run of 1's. For example, R(0100111110100) = 5.
- Recall that in previous lectures we obtained a concentration inequality for R using the union bound: let X_i be the event that a run of t 1's starts at position i; then

  Pr(R ≥ t) = µ(X_1 ∪ ··· ∪ X_{n−t+1}) ≤ Σ_{i=1..n−t+1} µ(X_i) = (n − t + 1) 2^{−t}

  where, of course, we require t ≤ n.
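The bound is easy to test empirically. A sketch (the parameters are illustrative, not from the lecture): estimate Pr(R ≥ log n + δ) over random n-bit strings and compare with (n − t + 1) 2^{−t}:

    import random
    from math import log2

    def longest_run(bits):
        best = cur = 0
        for b in bits:
            cur = cur + 1 if b else 0
            best = max(best, cur)
        return best

    n, delta, trials = 1024, 3, 20000
    t = int(log2(n) + delta)                   # threshold log n + delta
    hits = sum(longest_run(random.choices([0, 1], k=n)) >= t
               for _ in range(trials))
    print(hits / trials)                       # empirical probability
    print((n - t + 1) * 2**-t)                 # union bound, ~0.12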
Runs in random strings II

- Assume t ≺ n. Setting the bound (n − t + 1) 2^{−t} equal to 1 and taking logarithms gives t = log(n − t + 1) = log n − Θ(t/n), so the bound only becomes useful around t ≈ log n.
- Choosing t(n) = log n + δ, we obtain

  Pr(R ≥ log n + δ) ≤ (n − log n − δ + 1) 2^{−log n − δ}
                    = (1/n)(n − log n − δ + 1) 2^{−δ}
                    = (1 − o(1)) 2^{−δ}

Runs in random strings III

- Conversely,

  Pr(R < log n + δ) = 1 − Pr(R ≥ log n + δ) > 1 − (1 − o(1)) 2^{−δ}

- If δ ∈ ω(1) then Pr(R < log n + δ) → 1.
- We can say: "Almost surely, a binary string chosen uniformly at random does not have a run of length log n + ω(1)."
- Define

  A_δ ≡ {w ∈ K : longest run length is < log |w| + δ(|w|)}

  (A_δ is a family of sets of strings, indexed by a function δ.)

Runs in random strings IV

- For any function δ,

  µ_n(A_δ ∩ K_n) > 1 − (1 − o(1)) 2^{−δ(n)}

  A less sharp, but clearer, statement is:

  µ_n(A_δ ∩ K_n) = 1 − O(2^{−δ(n)})

  Note that δ ∈ ω(1) implies µ_∞(A_δ) = 1.
- A_δ is an example of what we shall call a typical set.

Balance of 0's and 1's I

- Example: the balance of 0's and 1's in a string.
- Choose a binary string of length n uniformly at random, and define the random variables Y_1, ..., Y_n by

  Y_i = +1 if the ith bit is 1, and −1 if the ith bit is 0.

  Then E[Y_i] = 0, and Var[Y_i] = E[(Y_i − E[Y_i])²] = 1.
- Let Y = Σ_{i=1..n} Y_i.
- Y can be interpreted as a "random walk" on Z, where each bit of the string indicates whether to move up or down.
- |Y| is the discrepancy between the number of zeros and ones.

Balance of 0's and 1's II

- The expectation and variance of Y are

  E[Y] = 0
  Var[Y] = Σ_{i=1..n} Var[Y_i] = n

- To bound the discrepancy |Y| we can use:

  Theorem (Chernoff inequality). Let Y_1, ..., Y_n be discrete, independent random variables with E[Y_i] = 0 and |Y_i| ≤ 1 for all i. Let Y = Σ_{i=1..n} Y_i, and let σ² = Var[Y] be the variance of Y. Then

  Pr(|Y| ≥ λσ) ≤ 2 e^{−λ²/4}

Balance of 0's and 1's III

- Applying the Chernoff inequality with σ² = n, we obtain

  Pr(|Y| ≥ λ√n) ≤ 2 e^{−λ²/4}

- Let's work the right-hand side of the inequality into the form 2^{−δ}. Setting 2^{−δ} = 2 e^{−λ²/4} and solving, we obtain λ = 2 √(ln 2 (1 + δ)).
- Substituting,

  Pr(|Y| ≥ 2 √(n(δ + 1) ln 2)) ≤ 2^{−δ}

- Let B_δ be the set of binary strings satisfying this bound:

  B_δ = {w ∈ K : discrepancy < 2 √(|w|(δ + 1) ln 2)}

- As in the previous example, µ_n(B_δ ∩ K_n) = 1 − O(2^{−δ}).

First nonzero bit I

- Example: the first nonzero bit in a string.
- As before, consider binary strings of length n under a uniform distribution.
- Let Y be a random variable indicating the position of the first nonzero bit: for example, Y(000010110111) = 5.
- Y has a geometric distribution with probability p = 1/2:

  E[Y] = 1/(1 − p) = 2
  Pr(Y ≤ δ) = Σ_{k=1..δ} (1/2)^k = 1 − 2^{−δ}

First nonzero bit II

- Let C_δ be the set of strings whose first nonzero bit is at a position ≤ δ; then

  µ_n(C_δ ∩ K_n) = 1 − 2^{−δ}

- Almost surely, a binary string of length n has a 1 at a position ≤ f(n), for any f ∈ ω(1).
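The discrepancy bound from the balance example can be checked the same way (again a sketch with illustrative parameters; the Chernoff bound is quite loose here, so the empirical frequency comes out well under 2^{−δ}):

    import random
    from math import sqrt, log

    n, delta, trials = 1024, 5, 20000
    thresh = 2 * sqrt(n * (delta + 1) * log(2))
    hits = 0
    for _ in range(trials):
        ones = bin(random.getrandbits(n)).count("1")
        if abs(2 * ones - n) >= thresh:   # |Y| = |#1's - #0's|
            hits += 1
    print(hits / trials, 2**-delta)       # empirical vs. 2^-delta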
Erdős-Kac theorem I

- Example: the number of prime divisors.
- Let w be a binary string of length n chosen uniformly at random.
- We can interpret w as a number written in base 2: for example, given the string w = 010011, we can take 010011₂ = 19.
- Let W be a random variable counting the number of prime divisors of w.
- The Erdős-Kac theorem [1] states that the distribution of W converges to a normal distribution:

  E[W] = ln n + ln ln 2 + o(1)

  Pr(a ≤ (W − E[W])/√E[W] ≤ b) = (1/√(2π)) ∫_a^b e^{−t²/2} dt + o(1)

Erdős-Kac theorem II

- Choosing a = −b and integrating the normal distribution,

  Pr(−b ≤ (W − E[W])/√E[W] ≤ b) = 1 − erfc(b/√2)

- We employ the following inequality, found on the internet (MathWorld) so it must be true:

  erfc(α) < (2/√π) · e^{−α²} / (α + √(α² + 2))

- This yields

  Pr(−b ≤ (W − E[W])/√E[W] ≤ b) > 1 − (2/√π) · e^{−b²/2} / (b/√2 + √(b²/2 + 2))

Erdős-Kac theorem III

- If b = b(n) = ω(1) then

  Pr(−b ≤ (W − E[W])/√E[W] ≤ b) > 1 − O(e^{−b²/2})

  where we have deliberately made the asymptotic bound less sharp to make the next step easier: setting O(2^{−δ}) = O(e^{−b²/2}), we obtain b = √(2δ ln 2).

Erdős-Kac theorem IV

- Therefore the number of prime divisors W satisfies

  Pr(|W − E[W]| ≤ √(2δ ln 2) · √E[W]) > 1 − O(2^{−δ})    (1)

  where E[W] = ln n + ln ln 2 + o(1).
- Let D_δ be the set of strings w ∈ K satisfying Eqn. (1), where n = |w|.

Typical sets I

The following definition of typical sets is loosely inspired by a similar idea in information theory, but uses a parameter δ resembling the "randomness deficiency" of Kolmogorov complexity [3, 2].

Definition. Let A_δ be a family of sets indexed by functions δ : N → R. We say A_δ is typical if

  µ_n(A_δ ∩ K_n) = 1 − O(2^{−δ(n)})

- We will call A_δ a typical set, even though strictly speaking it is a family of sets indexed by δ.
- The following properties are straightforward:
  1. If δ ∈ ω(1) then µ_∞(A_δ) = 1.
  2. If A_δ ⊆ B_δ, and A_δ is a typical set, then so is B_δ.
  3. The set of all possible inputs K_δ = K = ∪_{n∈N} K_n is typical.

Typical sets II

- A typical set represents an almost-sure property with an exponential concentration inequality:
  - every input is in A_δ almost surely when δ ∈ ω(1);
  - the probability of not being in A_δ falls off as O(2^{−δ}).
- The intersection A_δ ∩ B_δ ∩ C_δ ∩ ··· of any finite number of typical sets is also typical. We prove this for the intersection of two sets; any finite number follows by induction.

Proposition. If A_δ and B_δ are typical, so is C_δ = A_δ ∩ B_δ.

Typical sets III

- We will use the following elementary probability identity:

  Pr(α ∧ β) = Pr(¬¬(α ∧ β))
            = 1 − Pr(¬(α ∧ β))
            = 1 − Pr(¬α ∨ ¬β)
            = 1 − [Pr(¬α) + Pr(¬β) − Pr(¬α ∧ ¬β)]
            = 1 − (1 − Pr(α)) − (1 − Pr(β)) + Pr(¬α ∧ ¬β)
            = Pr(α) + Pr(β) − 1 + Pr(¬α ∧ ¬β)

Typical sets IV

Proof. Let A_{δ,n} = A_δ ∩ K_n, and similarly for B_{δ,n}. Write Ā_{δ,n} for the complement K_n \ A_{δ,n}.
We start from the identity above:

  µ(A_{δ,n} ∩ B_{δ,n}) = µ(A_{δ,n}) + µ(B_{δ,n}) − 1 + µ(Ā_{δ,n} ∩ B̄_{δ,n})

Note that

  µ(Ā_{δ,n}) = 1 − µ(A_{δ,n}) = 1 − (1 − O(2^{−δ})) = O(2^{−δ})

and similarly for µ(B̄_{δ,n}). Since µ(Ā_{δ,n} ∩ B̄_{δ,n}) ≤ max(µ(Ā_{δ,n}), µ(B̄_{δ,n})),

  µ(Ā_{δ,n} ∩ B̄_{δ,n}) = O(2^{−δ})

Therefore

  µ(A_{δ,n} ∩ B_{δ,n}) = (1 − O(2^{−δ})) + (1 − O(2^{−δ})) − 1 + O(2^{−δ}) = 1 − O(2^{−δ})  ∎

Typical sets V

- Binary strings chosen uniformly at random have all of the following properties, almost surely, for any δ ∈ ω(1):
  1. a run of 1's no longer than log n + δ;
  2. a discrepancy between the number of 0's and 1's of less than √(4n(δ + 1) ln 2);
  3. a first nonzero bit at a position ≤ δ;
  4. when viewed as a base-2 integer, ln n + ln ln 2 prime divisors, give or take δ^{1/2} √(2 ln 2 (ln n + ln ln 2)).
- For example, choosing δ = 10 (so 2^{−δ} ≈ 10^{−3}), a 1024-bit string has, with fairly high probability:
  1. no run of 1's longer than 20 bits;
  2. a discrepancy of ≤ 176 bits;
  3. a 1 somewhere in the first 10 positions;
  4. about 6.5 ± 9.5 prime divisors.

Typical sets VI

- (Note that the constant factors associated with the concentration inequality 1 − O(2^{−δ}) may change when we take intersections of typical sets. For these examples I am just using δ = 10 and hiding the constant factor inside the waffly "fairly high probability.")

Typical sets as a filter

- We have established the following properties of typical sets:
  1. If A_δ is typical, and A_δ ⊆ B_δ, then B_δ is typical.
  2. If A_δ and B_δ are typical, then A_δ ∩ B_δ is typical.
  3. The set of all inputs K_δ = K = ∪_{n∈N} K_n is typical.
  4. The empty set ∅ is not typical.
- The typical sets form a mathematical structure called a filter.
- A filter on a set K is a collection F ⊆ 2^K of subsets of K satisfying these properties:
  1. If A ∈ F and A ⊆ B, then B ∈ F.
  2. If A, B ∈ F, then (A ∩ B) ∈ F.
  3. K ∈ F.
  4. ∅ ∉ F.
- Filters are a bit abstract, but powerful. One useful application is an ultraproduct, which can be used to construct a single (infinite) structure that embodies the Σ¹₁ properties of typical inputs. (Σ¹₁ properties are those definable by second-order sentences of the form ∃R_1, ..., R_k . ψ(R_1, ..., R_k), which includes first-order sentences. For example, χ-colourability of graphs is a Σ¹₁ property.)

Typical sets and average-case time I

- Say an algorithm runs in time O(f(n)) on a typical set A_δ if for any δ ∈ O(1), the algorithm has worst-case performance O(f(n)) on A_δ.
- Question: does running in time O(f(n)) on a typical set imply average-case time O(f(n))?
- Answer: not necessarily; it is easy to construct counterexamples. Consider the following algorithm on strings:

  function Broken(w)
      if w = 111···11 then
          wait for 2^(2^|w|) seconds
      return

- It returns right away, unless the string is all 1's, in which case it takes O(2^(2^n)) time (where n = |w|).
I Suppose the worst-case running time of the algorithm can be expressed in the form O(g(n, δ)): note that O(g(n, O(1))) gives worst-case time on a typical set. I The average-case time is then: log |Kn| X T (n) = O(2−δ)O(g(n, δ)) δ=0 log |Kn| X = O(2−δg(n, δ)) δ=0 ECE750-TXB Typical sets and average-case time III Lecture 14: Typical Inputs Todd L. Veldhuizen [email protected] Bibliography P∞ −k c Note that anything of the form k=0 2 k where c ∈ O(1) converges to a constant — an exponential concentration swallows any polynomial. g(n,δ) I If g(n,O(1)) is at most polynomial in δ, then worst-case time on the typical set equals average-case time. ECE750-TXB Example: No-carry adder I Lecture 14: Typical Inputs Todd L. Veldhuizen [email protected] I The no-carry adder is a simple algorithm for adding Bibliography binary numbers. I Let x0, y0 be n-bit integers. The no-carry adder repeats the following iteration: xi+1 = xi ⊕ yi yi+1 = (xi &yi ) LSH 1 where ⊕ is bitwise XOR, & is bitwise AND, and LSH 1 shifts left by one bit. At each iteration xi holds a partial sum, and yi holds carry bits. The iteration continues until yi = 0. ECE750-TXB Example: No-carry adder II Lecture 14: Typical Inputs I Example: to calculate the sum of Todd L. Veldhuizen [email protected] x0 = 011011102 Bibliography y0 = 000000102 the following steps occur: x1 = 011011002 y1 = 000001002 x2 = 011010002 y2 = 000010002 x3 = 011000002 y3 = 000100002 x4 = 011100002 y4 = 000000002 I How many iterations are required? ECE750-TXB Example: No-carry adder III Lecture 14: Typical Inputs The number of iterations is determined by the length of I Todd L. the longest “carry sequence,” i.e., the longest span Veldhuizen [email protected] across which a carry must be propagated. Bibliography I For there to be a carry sequence of length t, there must be a bit position where x0 and y0 are both 1, followed by t − 1 positions where x0 and y0 have opposite bits: x0 = 01101110 y0 = 00000010 I The probability of a carry sequence of length t is easily bounded by employing the union bound: let Zi be the event that x0, y0 match in bit positions i through i + t − 2. Then [ X Pr( Zi ) ≤ Pr(Zi ) = (n − t + 1)2−t+1 ECE750-TXB Example: No-carry adder IV Lecture 14: This is very close to the equation for a run of 1’s; using Typical Inputs Todd L. t = log n + δ, we obtain Veldhuizen [email protected] [ −δ Pr( Zi ) ≤ 1 − O(2 ) Bibliography I So, the number of iterations is O(g(n, δ)) where g(n, δ) = log n + δ I To calculate the average case: n X T (n) = O(2−δ)O(log n + δ) δ=0 n X = O(2−δ log n + δ2−δ) δ=0 = O(log n) P −δ P −δ since δ log n2 = log n · δ 2 = log n, and P −δ δ δ2 = O(1). ECE750-TXB Bibliography I Lecture 14: Typical Inputs Todd L. Veldhuizen [email protected] [1] P. Erd¨osand M. Kac. Bibliography The Gaussian law of errors in the theory of additive number theoretic functions. Amer. J. Math., 62:738–742, 1940. bib pdf [2] M. Li and P. Vit´anyi. An introduction to Kolmogorov complexity and its applications. Springer-Verlag, New York, 2nd edition, 1997. bib [3] V. G. Vovk. The Kolmogorov-Stout law of the iterated logarithm. Mat. Zametki, 44(1):27–37, 154, 1988. bib ECE750-TXB Lecture 16: Randomized Algorithms Todd L. ECE750-TXB Lecture 16: Randomized Veldhuizen Algorithms [email protected] Bibliography Todd L. 
Bibliography

[1] P. Erdős and M. Kac. The Gaussian law of errors in the theory of additive number theoretic functions. Amer. J. Math., 62:738–742, 1940.
[2] M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and its Applications. Springer-Verlag, New York, 2nd edition, 1997.
[3] V. G. Vovk. The Kolmogorov-Stout law of the iterated logarithm. Mat. Zametki, 44(1):27–37, 1988.

ECE750-TXB Lecture 16: Randomized Algorithms
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering
University of Waterloo
Canada
March 6, 2007

Stochastic algorithms and data structures I

- A stochastic algorithm or data structure is one with access to a stream of random bits (a.k.a. coin flips). These random bits can be used to make or influence decisions about how to proceed. The intended effect might be to:
  - avoid worst cases;
  - achieve average-case-like performance even on arbitrary inputs;
  - use the random bits to guess answers, if good answers are plentiful.
- In understanding stochastic algorithms and data structures there are two distributions to keep in mind:
  1. the distribution of inputs;
  2. the distribution of the random bits being used.
- Some possibly familiar examples of stochastic algorithms include simulated annealing, genetic algorithms, Kernighan-Lin graph partitioning, etc.

Stochastic algorithms and data structures II

- A randomized algorithm (or data structure) is one that offers good performance for any input, with high probability; i.e., there are no classes of inputs or operation sequences for which the performance is asymptotically poor.
- A classic application of randomization is to Hoare's QuickSort. To sort an array A[1..n]:
  1. If n = 1, we are done.
  2. Otherwise, choose a pivot element A[i] by examining some finite number of elements of the array.
  3. Partition the array into three parts: items > A[i], items < A[i], and items = A[i].
  4. Recursively sort the first two partitions, and merge the resulting arrays.

Stochastic algorithms and data structures III

- If the pivot element is chosen deterministically, then we can force the algorithm to take Θ(n²) time by designing the input array carefully. For example, a common heuristic is "median of three": choose the pivot to be the median of A[1], A[⌊n/2⌋], A[n]. By placing the maximum elements of the array in these positions, the array is partitioned into subarrays of sizes n − 3, 3, and 0. Repeating this design recursively yields an array on which QuickSort requires Θ(n²) time.
- In randomized QuickSort, one chooses the pivot element uniformly at random from 1..n. Then it is impossible to design a worst-case input array without knowing in advance the random bits that will be used to choose the pivots.
- Performance for randomized algorithms is usually measured as "worst average case":

Stochastic algorithms and data structures IV

- The time required for an input w, written T(w), is no longer a deterministic function, but a random variable of the coin-flip sequence s = 011001101100... supplied to the algorithm.
- We measure the time required by the algorithm as

  max_{w ∈ K_n} E_s[T(w)]

  i.e., the maximum over all inputs w ∈ K_n of length n, of the expectation with respect to the random bit sequence s of the running time.
- The input distribution is ignored: one is concerned with the worst case (with respect to inputs) of the average time (with respect to the random bits).
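A compact sketch of randomized QuickSort as just described (pivot drawn uniformly at random, three-way partition):

    import random

    def rquicksort(a):
        if len(a) <= 1:
            return a
        pivot = a[random.randrange(len(a))]   # uniform random pivot
        less    = [x for x in a if x < pivot]
        equal   = [x for x in a if x == pivot]
        greater = [x for x in a if x > pivot]
        return rquicksort(less) + equal + rquicksort(greater)

    print(rquicksort([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]

No fixed input can force quadratic behaviour: the expected number of comparisons is O(n log n) for every input, with the expectation taken over the pivot choices.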
Randomized Equality Protocol I

- Setting: Machine A and Machine B, connected by reliable communication, each hold a file of n bits.
- Consider the problem of maintaining a mirror of a large database across a reliable network connection. Both machines A and B have a copy of the database, and we wish to determine whether the files are the same.
- Any algorithm achieving zero error for arbitrary files must transmit ≥ n bits: one can do no better than just transmitting the entire file from machine A to B.
- Why? Each bit transmitted can be thought of as the outcome of some test performed on a file. If t tests are performed, and t < n, then there are 2^t test outcomes and 2^n > 2^t possible files; by pigeonhole there must be two different files with the same test outcomes.

Randomized Equality Protocol II

- Note that if we transmit, e.g., an MD5 checksum, there exist pairs of files that are different but have the same checksum, called hash collisions. (In fact there are growing databases on the internet that one can use to attempt to produce MD5 hash collisions.)
- There is a simple randomized algorithm that:
  1. transmits O(log n) bits;
  2. achieves an astronomically low error probability, and this probability can be made as low as desired;
  3. makes it impossible to produce "hash collisions" that reliably cause the algorithm to wrongly report that files are equal when they are not.
- Randomized Equality Protocol:
  - Alice has a file x = x_0 x_1 ··· x_{n−1}, and Bob has a file y = y_0 y_1 ··· y_{n−1}. (x_i, y_i are bits; we interpret x and y as large integers.)
  - Alice chooses a prime p uniformly at random in [2, n²]. (This prime can be represented in ≤ 2 log n bits.)

Randomized Equality Protocol III

- Alice computes

  s = x mod p

  and transmits s and p to Bob. (This requires ≤ 2⌈2 log n⌉ bits, plus change.)
- Bob computes

  q = y mod p

  If q = s, Bob outputs "x = y." If q ≠ s, Bob outputs "x ≠ y."
- Note that for a file of 10^16 bits (roughly a thousand terabytes), the amount of data transmitted is ≈ 256 bytes.
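Before analyzing the error probability, a toy end-to-end sketch of the protocol (trial-division primality is adequate at this illustrative scale, though not for real file sizes):

    import random

    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True

    def files_probably_equal(x, y, n):
        while True:                          # Alice: random prime in [2, n^2]
            p = random.randint(2, n * n)
            if is_prime(p):
                break
        return x % p == y % p                # Bob compares y mod p with s

    n = 64                                   # pretend the files are 64 bits
    x = random.getrandbits(n)
    y = random.getrandbits(n)                # almost surely differs from x
    print(files_probably_equal(x, x, n))     # equal files: always True
    print(files_probably_equal(x, y, n))     # unequal: False w.h.p.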
- To analyze the error, we take the usual "worst-case average" approach: for the worst possible choice of files, what is the average probability of error?

Randomized Equality Protocol IV

- Say a prime p is "bad" for (x, y) if x mod p = y mod p but x ≠ y. Otherwise, say p is "good" for (x, y). Our general approach is to prove that the good primes vastly outnumber the bad ones, and so our chance of picking a good prime is high.
- The probability that an error occurs is

  (#bad primes in [2, n²]) / (#primes in [2, n²])

- The number of primes in [2, n²] is

  π(n²) ∼ Li(n²) ∼ n² / ln n²

  (the prime number theorem; Li is the logarithmic integral).

Randomized Equality Protocol V

- An error occurs when x ≠ y but x and y are the same modulo p, i.e., we can write

  x = x′ · p + s
  y = y′ · p + s

  for some integers x′, y′. Then p divides (x − y), since x − y = x′ · p + s − (y′ · p + s) = (x′ − y′) · p. Let r = |x − y|. Since r ≤ 2^n, r has ≤ n − 1 prime divisors.
- The probability that p is a prime divisor of r is therefore

  ≤ (n − 1) / π(n²) ∼ n / (n² / ln n²) = (2 ln n)/n

  Therefore the probability of error is ≤ (1 − o(1)) (2 ln n)/n.

Randomized Equality Protocol VI

- For example, if n = 10^16, the error probability is ≈ 10^{−14}.
- This is a specific example of a general pattern: the "abundance of witnesses." The principle is that if x ≠ y and p does not divide (x − y), then p is a "witness" to the fact that x ≠ y. There are lots of witnesses, so if we choose a potential witness (a prime) at random, we are likely to find one.
- To get an even lower error, we can repeat the protocol k times: Alice chooses k primes uniformly at random from [2, n²] and transmits x mod p_i for each prime p_i. With k independent trials, each with failure probability ε, the probability of k failures is ≤ ε^k. For example, with n = 10^16 and k = 10, by sending ≈ 2 kb of data we can obtain a probability of error ≈ 10^{−141}. This is an example of success amplification.

Classification of randomized algorithms I

- Stochastic algorithms use random bits in some way:
  1. Las Vegas algorithms: no error; coin flips are used to avoid worst cases, yielding good worst-case expected time.
  2. Monte Carlo algorithms: allow some probability of error.
     2.1 One-sided Monte Carlo (1MC): a NO answer is always correct; a YES answer has some probability of error (i.e., false positives are possible).
     2.2 Bounded-error Monte Carlo (2MC): computes a function f(w) with probability ≥ 1/2 + δ of being correct, for some δ > 0.
     2.3 Unbounded-error Monte Carlo (UMC).

One-sided Monte Carlo I

- Recall that a decision problem is described by some set L; we are asked to decide "Is w ∈ L?"
- An algorithm A is one-sided Monte Carlo when:
  - if x ∈ L then Pr(A(x) = 1) ≥ 1/2;
  - if x ∉ L then Pr(A(x) = 0) = 1.
- The randomized equality protocol we saw was a one-sided Monte Carlo algorithm:
  - It had zero probability of error if the files were equal, and some probability of error when the files were unequal.
  - To match the definition of one-sided MC, we can take the set being decided to be the pairs of n-bit files that differ. (A NO answer to the decision problem = the files are equal.)
- Since the probability of error is < 1/2, we can drive the error below δ with t = ⌈−log δ⌉ repetitions.

Bounded-Error Monte Carlo I

- Also known as two-sided Monte Carlo.
- Computes a function f : Σ* → Σ*, e.g., a function of binary strings.
- The probability of being correct is ≥ 1/2 + ε, for some constant ε > 0.
- Since ε > 0 is constant, to obtain an error probability < δ we need only a constant number of iterations, independent of n.
- If ε ∈ o(1), we might need exponentially many repetitions (in n) to achieve an error probability < δ.
- Success amplification:
  - Run the algorithm t times.
  - If some output appears at least ⌈t/2⌉ times, output it (i.e., majority vote).
  - Otherwise, output "?" (the algorithm fails).

Bounded-Error Monte Carlo II

- A tedious analysis shows that to achieve an error probability < δ, it suffices to choose

  t ≥ (2 ln δ) / ln(1 − 4ε²)

- Note that this formula does not depend on the length of the input.
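The repetition count is easy to evaluate; a quick sketch with illustrative values of ε and δ (both logarithms are negative, so t comes out positive):

    from math import ceil, log

    def repetitions(eps, delta):
        return ceil(2 * log(delta) / log(1 - 4 * eps * eps))

    print(repetitions(0.10, 1e-6))   # ~677 runs of a 60%-correct algorithm
    print(repetitions(0.25, 1e-6))   # ~97 runs of a 75%-correct algorithm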
Unbounded-Error Monte Carlo

- Has probability 1/2 + ε(n) of being correct, i.e., better than chance (but possibly not much better!).
- Using the same formula as before, to obtain an error δ the number of repetitions required is

  t ≥ (2 ln δ) / ln(1 − 4ε(n)²)

  If ε ∈ o(1), then 1 − 4ε(n)² → 1 and ln(1 − 4ε(n)²) → 0, so t ∈ ω(1).
- We need t ∈ Ω(ε^{−2}) to keep the error bounded.
- It could be that exponentially many repetitions of the algorithm are required.

ECE750-TXB Lecture 17: Algorithms for binary relations and graphs
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering
University of Waterloo
Canada
March 8, 2007

Binary Relations I

- Recall that a binary relation on a set X is a set R ⊆ X².
- We may interpret a binary relation as a directed graph G = (X, R).
- Some common axioms relations may satisfy:
  1. Transitive (T): ∀x, y, z . (R(x, y) ∧ R(y, z) → R(x, z)). If there is a path from x to z, there is an edge from x to z.

Binary Relations II

  2. Reflexive (R): ∀x . R(x, x). Every vertex has an edge to itself.
  3. Symmetric (S): ∀x, y . R(x, y) → R(y, x). If there is an edge from x to y, there is an edge from y to x. Usually one then draws the graph without arrows, and it is called simply a "graph" rather than a directed graph.

Binary Relations III

  4. Antisymmetric (A): ∀x, y . R(x, y) ∧ R(y, x) → (x = y). When a relation is reflexive, transitive, and also antisymmetric, it is a partial order.
- A rough classification of binary relations:

  Binary relation / directed graph
    Graph (S)
    Preorder/Quasiorder (T, R)
      Equivalence (T, R, S)
      Partial order/Poset (T, R, A)
        Tree order
          Total order

Binary Relations IV

- Good algorithms are known for managing the common classes of binary relations. If you can identify the abstract relation(s) underlying a problem, this may lead you directly to efficient algorithms.

Part I: Equivalence Relations

Equivalence relations and partitions I

- An equivalence relation ∼ is a binary relation that is reflexive, transitive, and symmetric. (The most familiar example: equality, "=".)
- Pictured as a graph, an equivalence relation is a collection of cliques: e.g., one clique on {a, b, c, d} and another on {e, f, g}.
- For an equivalence ∼ ⊆ X², we write:
  - [a]_∼ = {b ∈ X : a ∼ b} for the equivalence class of a;

Equivalence relations and partitions II

  - X/∼ for the set of equivalence classes induced by ∼:

    X/∼ = {[a]_∼ : a ∈ X}

    X/∼ is a partition. (Recall that a partition of a set X is a collection of subsets Y_1, ..., Y_k of X that are pairwise disjoint and satisfy ∪ Y_i = X.)
- Example: in the figure above, the equivalence classes are {{a, b, c, d}, {e, f, g}}.
- Example: take N with a ∼ b ≡ (a mod 5 = b mod 5). The equivalence classes N/∼ are {{0, 5, 10, ...}, {1, 6, 11, ...}, ..., {4, 9, 14, ...}}.
- Common algorithmic problems we encounter with equivalence classes:
  - answering queries of the form "Is a ∼ b?";

Equivalence relations and partitions III

  - maintaining an equivalence relation as we progressively decide objects are equivalent (this results from an inductively defined equivalence relation); example: the Nelson-Oppen method for equational reasoning [7];
  - maintaining an equivalence relation as we progressively decide objects are not equivalent (this results from a co-inductive definition of equivalence [6]); examples: minimizing the states of a DFA [4], maintaining bisimulations, congruence closure [3].
- A system of representatives is the primary means for efficient manipulation of equivalence relations.
- A system of representatives for ∼ is a function s : (X/∼) → X choosing a single element from each block of the partition, such that (writing s(a) for s([a]_∼)) a ∼ b if and only if s(a) = s(b).

Equivalence relations and partitions IV

- Example: to reason about equivalence of integers modulo 5, we could choose the representatives 0, 1, 2, 3, and 4. The integer 1 represents the equivalence class [1]_∼ = {1, 6, 11, 16, ...}.
- With a means of quickly computing representatives, we can test whether a ∼ b by computing the representatives of the equivalence classes [a]_∼ and [b]_∼, then testing equality.
- If the equivalence relation is static, one can precompute a system of representatives as, e.g., a table. If the equivalence relation is discovered dynamically, more sophisticated methods are needed.

Disjoint Set Union I

- "Disjoint set union" is algorithms-speak for maintaining an inductively defined equivalence relation:
  - Initially we have a set of objects, none of which are known to be equivalent.
  - We gradually discover that objects are equivalent, and we wish to maintain a representation of the equivalence relation that lets us quickly answer queries of the form "Is a ∼ b?"
- Interface:
  - union(a, b): include a ∼ b in the equivalence relation;
  - find(a): return an equivalence class representative (ECR) for a.
- There is a wonderfully elegant data structure due to Tarjan [8] that performs a sequence of n such operations in O(n α(n)) time, where α(n) ≤ 3 for n less than (cosmologists' best estimate of) the number of particles in the universe.
Disjoint Set Union II

- Tarjan's data structure maintains the equivalence relation on the set X as a forest, i.e., a collection of trees. Each node in a tree is an element of the set X, each tree is an equivalence class, and each root is an equivalence class representative. For example, a tree rooted at b with children a, c, d, together with a tree rooted at e with children f, g, is a forest representation of the equivalence classes {{a, b, c, d}, {e, f, g}}.
- Each element has a pointer to its parent; to determine the equivalence class representative, we follow the parent pointers to the root of the tree.

Disjoint Set Union III

- The efficiency of the representation depends on how deep the trees are. To keep the trees shallow, two techniques are employed: (i) path compression, and (ii) union by rank.
- Record representation: for each element x ∈ X, we track:
  - parent(x): a pointer to the parent of x, or a pointer to itself if it is the root (alternately, a null pointer can be used);
  - rank(x): an indication of how deep the tree is (but not its depth per se).
- Pseudocode for find(a):

  find(a)
      if parent(a) ≠ a then
          parent(a) ← find(parent(a))
      return parent(a)

Disjoint Set Union IV

This recursively follows the parent pointers up to the root, then rewrites all the parent pointers along the way so they point directly at the root, called "path compression." (For example, in a tree rooted at f with child d and grandchild c, calling find(c) leaves both c and d pointing directly at f.)

- A simple way to implement union(a, b) is to make the root of a's tree have b as a parent:

  union(a, b)
      parent(find(a)) ← b

Disjoint Set Union V

However, this can lead to poorly balanced trees. For better asymptotic efficiency, one can track how deep the trees are and always make the deeper tree the parent of the shallower tree, called "union by rank":

  union(a, b)
      pa ← find(a)
      pb ← find(b)
      if pa = pb then return
      if rank(pa) > rank(pb) then
          parent(pb) ← pa
      else
          parent(pa) ← pb
          if rank(pa) = rank(pb) then
              rank(pb) ← rank(pa) + 1

Disjoint Set Union VI

- Tarjan proved that, using both path compression and union by rank, a sequence of n calls to union and find requires O(n α(n)) time, where α(n) ≤ 3 for n up to a tower of 65536 powers of two, i.e.,

  n ≤ 2^(2^(2^(···)))   (65536 levels)

  The function α(n) is the "inverse" of the Ackermann function; see CLR [2] or [8] for details.
- For any practical purpose, the time required by Tarjan's algorithm is indistinguishable from O(n) for a sequence of n operations, or O(1) per operation amortized (amortized analysis is to come).
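A compact runnable rendering (a sketch) of the structure above, with both path compression and union by rank:

    class DSU:
        def __init__(self, elems):
            self.parent = {x: x for x in elems}
            self.rank = {x: 0 for x in elems}

        def find(self, a):
            if self.parent[a] != a:
                # path compression: point directly at the root
                self.parent[a] = self.find(self.parent[a])
            return self.parent[a]

        def union(self, a, b):
            pa, pb = self.find(a), self.find(b)
            if pa == pb:
                return
            if self.rank[pa] > self.rank[pb]:
                pa, pb = pb, pa              # make pb the deeper root
            self.parent[pa] = pb             # union by rank
            if self.rank[pa] == self.rank[pb]:
                self.rank[pb] += 1

    d = DSU("abcdefg")
    d.union("a", "b"); d.union("b", "c")
    print(d.find("a") == d.find("c"))   # True: a ~ c
    print(d.find("a") == d.find("g"))   # False: separate classes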
Part II: Graphs

Representation of Graphs I

- Here are four common methods of representing graphs.
- If the graph is large (e.g., infinite), or its structure is not known beforehand, we may choose an implicit representation for the graph, in which vertices and edges are computed on the fly as needed. For example, the graph G = (N, E), where (x, y) ∈ E if and only if y divides x, is an infinite graph whose edges can be computed on the fly by factorization.
- An explicit representation is one where we directly encode the structure of the graph in a data structure. Some common methods for this:

Representation of Graphs II

- Adjacency matrix: an n × n matrix A of 0's and 1's, with A_ij = 1 if and only if (v_i, v_j) ∈ E. Row i indicates the out-edges of vertex i, and column i the in-edges. For the graph with edges a → b, a → c, b → d, c → d (vertex order a, b, c, d):

  A = [ 0 1 1 0 ]
      [ 0 0 0 1 ]
      [ 0 0 0 1 ]
      [ 0 0 0 0 ]

Representation of Graphs III

- Adjacency lists: each vertex maintains a set of vertices to/from which there is an edge, e.g.:

  out(a) = {b, c}
  out(b) = {d}
  out(c) = {d}
  out(d) = ∅

- If the graph structure is static (i.e., not changing as the algorithm runs), it is common to represent the lists of in- and out-edges as vectors, for efficiency.
- For more elaborate algorithms on, e.g., weighted graphs, a representation of this sort is commonly used:

Representation of Graphs IV

  public class Edge {
      Vertex x, y;
      double weight;
  }

  public class Vertex {
      Set<Edge> inEdges, outEdges;   // incident edges
  }

Depth-First Search I

- One of the commonest operations on a graph is to visit the vertices of the graph one by one in some desired order. This is commonly called a search.
- In a depth-first search, we explore along a single path into the graph as far as we can, until no new vertices can be reached; then we return to some earlier point where new vertices are still reachable, and continue. (Think of exploring a maze.)
- [Figure: an example of a depth-first search, highlighted in yellow, starting at the centre vertex of a graph.]

Depth-First Search II

- As we visit each new vertex, we perform some action there. The choice of action depends on what we hope to accomplish; for now we will just call it "visiting the vertex," but later we will see examples of specific useful actions. We might choose to visit the vertex the first time we see it (preorder) or the last time we see it (postorder).
- Here is a recursive implementation of depth-first search. It uses a set Seen to track which vertices have been visited. (One can also include a flag field in the vertex data structure that is "marked" to indicate the vertex has been seen.)

Depth-First Search III

  dfs(x)
      dfs(x, ∅)

  dfs(x, Seen)
      if x ∉ Seen
          Seen ← Seen ∪ {x}
          preorderVisit(x)    // do something
          for each edge (x, y), dfs(y, Seen)
          postorderVisit(x)   // do something

- This search is easily implemented in a nonrecursive version, using a stack data structure to keep track of the current path into the graph:
Depth-First Search IV

  dfs(x)
      Seen ← ∅
      Stack S
      push(S, x)
      while S is not empty,
          y ← pop(S)
          if y ∉ Seen then
              Seen ← Seen ∪ {y}
              preorderVisit(y)
              for each edge (y, z), push(S, z)

Topological Sort I

- A directed acyclic graph (DAG) is a graph in which there are no cycles (i.e., no paths from a vertex to itself).
- The reflexive, transitive closure of a DAG is a partial order. (If you add to a DAG an edge (x, y) whenever there is a path from x to y, plus self-loops (x, x), the resulting edge relation is a partial order: reflexive, transitive, and antisymmetric.)
- Every finite partial order can be extended to a total order: i.e., if ⊑ is a partial order on a finite set, there is a total order ≤ such that (x ⊑ y) ⇒ (x ≤ y); or, more obtusely, ⊑ ⊆ ≤. (The axiom of choice implies this for infinite sets also.)

Topological Sort II

- Example: let V = N² (pairs of natural numbers), and for all i, j put edges (i, j) → (i+1, j) and (i, j) → (i, j+1). Then the transitive, reflexive closure of this graph is a partial order ⊑ where (i, j) ⊑ (i′, j′) if and only if i ≤ i′ and j ≤ j′.

Topological Sort III

- One way to extend ⊑ to a total order is to enumerate the pairs diagonal by diagonal: (0,0), (1,0), (0,1), (2,0), (1,1), (0,2), ...; an example of what computer scientists call "dovetailing."
- Topological sort is a method for obtaining a total-order extension of a partial order.

Topological Sort IV

- Example: suppose we want to evaluate a digital circuit with input signals a, b and derived signals c, d, e. Build a graph where signals are vertices, and an edge indicates that one signal depends upon another (a "dependence graph").

Topological Sort V

- The transitive, reflexive closure of this graph yields an order ⊒ where, e.g., e ⊒ d means signal e can be evaluated only after signal d. Extending ⊒ to a total order ≥ gives us a valid order in which to evaluate the signals, e.g.,

  e ≥ d ≥ c ≥ b ≥ a

  If we evaluate the signals in the order a, b, c, d, e, we respect the dependencies.
- Other examples:
  - ordering the presentation of topics in a course or paper;
  - solving equations;
  - Makefiles;
  - planning (keeping track of task dependencies);
  - spreadsheets and dataflow languages [5];
  - ordering static initializers in programming languages;
  - dynamization of static algorithms, e.g., [1].

Topological Sort VI

- Here is an algorithm for topological sort based on depth-first search; a runnable sketch follows below. Note that there are many ways in which a partial order can be extended to a total order; this is just one method.

  TopologicalSort(V, E)
      visited ← ∅
      order ← empty list
      for x ∈ V
          dfs(x, visited, order)

  dfs(x, visited, order)
      if x ∉ visited
          visited.add(x)
          for each out-edge (x, y)
              dfs(y, visited, order)
          order.insertBack(x)

Topological Sort VII

- We search the dependence graph depth-first, visiting vertices postorder, at which time we insert them at the back of the list.
- Example: for the circuit example, a depth-first search might visit the vertices in the order a, b, d, c, e.
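The promised runnable sketch, applied to the circuit example; the exact gate wiring is my assumption for illustration (c depending on a and b, d on a, and e on c and d):

    def topo_sort(deps):
        order, visited = [], set()
        def dfs(x):
            if x in visited:
                return
            visited.add(x)
            for y in deps[x]:        # first visit everything x depends on
                dfs(y)
            order.append(x)          # postorder: insert at the back
        for x in deps:
            dfs(x)
        return order

    deps = {'a': [], 'b': [], 'c': ['a', 'b'], 'd': ['a'], 'e': ['c', 'd']}
    print(topo_sort(deps))   # ['a', 'b', 'c', 'd', 'e']: dependencies first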
Topological Sort VII

- We search the dependence graph depth-first, visiting vertices in postorder, at which time we insert them at the back of the list.
- Example: for the circuit example, a depth-first search might visit the vertices in the order a, b, d, c, e.

Connected components of an undirected graph I

- Defn: A set of vertices Y ⊆ V is connected if for every a, b ∈ Y there is a path from a to b. Y is a maximal connected component if it cannot be enlarged, i.e., for any connected set of vertices Y′ with Y ⊆ Y′, Y = Y′.
- Note that the connected components of a graph form a partition of the vertices.
  [Figure: an undirected graph on the vertices a, b, c, d, e, g, q whose connected components are {{a, b, g, q}, {c, d, e}}.]
- Using Tarjan's disjoint set union, there is a very simple algorithm for connected components:

Connected components of an undirected graph II

1. Associate a parent pointer and a rank with each vertex (e.g., by creating a separate record for each vertex, or by storing these fields directly in the vertex data structure).
2. For each edge (a, b), call union(a, b).

No searching is necessary! The complexity is O((|E| + |V|) α(|E| + |V|)), 'practically' linear in the number of vertices and edges.
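A minimal Java sketch of this procedure, using the array-based parent/rank representation with path compression and union by rank (the ingredients of Tarjan's bound [8]). The assumption that vertices are labelled 0..n−1 and the class and method names are illustrative.

    public class ConnectedComponents {
        private final int[] parent;   // parent pointer for each vertex
        private final int[] rank;     // rank (approximate tree height)

        public ConnectedComponents(int n) {
            parent = new int[n];
            rank = new int[n];
            for (int v = 0; v < n; v++) parent[v] = v;  // singleton sets
        }

        // find with path compression
        public int find(int x) {
            if (parent[x] != x) parent[x] = find(parent[x]);
            return parent[x];
        }

        // union by rank
        public void union(int a, int b) {
            int ra = find(a), rb = find(b);
            if (ra == rb) return;
            if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
            parent[rb] = ra;
            if (rank[ra] == rank[rb]) rank[ra]++;
        }

        // One union per edge, no searching; vertices with equal find()
        // values lie in the same connected component.
        public static int[] components(int n, int[][] edges) {
            ConnectedComponents cc = new ConnectedComponents(n);
            for (int[] e : edges) cc.union(e[0], e[1]);
            int[] label = new int[n];
            for (int v = 0; v < n; v++) label[v] = cc.find(v);
            return label;
        }
    }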
Bibliography

[1] Umut A. Acar, Guy E. Blelloch, Robert Harper, Jorge L. Vittes, and Shan Leung Maverick Woo. Dynamizing static algorithms, with applications to dynamic trees and history independence. In SODA '04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 531–540, Philadelphia, PA, USA, 2004. Society for Industrial and Applied Mathematics.
[2] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. McGraw-Hill, 1991.
[3] Peter J. Downey, Ravi Sethi, and Robert Endre Tarjan. Variations on the common subexpression problem. Journal of the ACM, 27(4):758–771, 1980.
[4] J. E. Hopcroft. An n log n algorithm for minimizing the states in a finite automaton. In Z. Kohavi, editor, Theory of Machines and Computations, pages 189–196. Academic Press, 1971.
[5] Wesley M. Johnston, J. R. Paul Hanna, and Richard J. Millar. Advances in dataflow programming languages. ACM Computing Surveys, 36(1):1–34, 2004.
[6] Y. N. Moschovakis. Elementary Induction on Abstract Structures. North-Holland, Amsterdam, 1974.
[7] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM, 27(2):356–364, 1980.
[8] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 22:215–225, 1975.

ECE750-TXB Lecture 18: Graph Algorithms
Todd L. Veldhuizen
[email protected]
Electrical & Computer Engineering
University of Waterloo
Canada
March 13, 2007

Weighted Graphs

- A weighted graph is a triple G = (V, E, w) where w is a weight function, often

    w : E → R ∪ {+∞}   or   w : E → Q⁺

- Often w(x, y) > 0 and represents a "distance" or "score."
- Example: vertices are cities, and edges represent driving times between adjacent cities.

Distance metrics on graphs I

- If edge weights are positive, we can define a distance metric (or quasimetric) on the graph. You are familiar with Euclidean distance, for example, in R²:

    d(x, z) = √((x₁ − z₁)² + (x₂ − z₂)²)

  We can define distances in graphs in such a way that they share many of the useful properties of Euclidean distance.
- A distance metric d : V × V → R satisfies:
  1. d(x, y) ≥ 0
  2. d(x, y) = 0 if and only if x = y
  3. d(x, y) = d(y, x) (symmetry)
  4. d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)
  If the symmetry axiom (3) is omitted, d is called a quasimetric. (For weighted directed graphs, a quasimetric may be appropriate.)

Distance metrics on graphs II

- A set V together with a distance metric d : V × V → R is called a metric space.
- A connected graph with nonnegative edge weights can be turned into a metric space:
  1. Define the length of a path to be the sum of the edge weights along the path;
  2. Define d(x, x) = 0, and d(x, y) to be the minimum path length from x to y.
- In R² we can define open and closed discs:

    {(x, y) : √(x² + y²) < r}
    {(x, y) : √(x² + y²) ≤ r}

Distance metrics on graphs III

- In a metric space (V, d) we can likewise define open and closed balls:

    B_r(x) = {y ∈ V : d(x, y) < r}

  E.g., if we construct a graph of settlements in Ontario where edges indicate roads and weights are driving times, then a ball is, e.g., the set of settlements within two hours of Waterloo.

Breadth-first search I

- Breadth-first search is another method of visiting all the vertices of a graph. Conceptually, we put a weight of 1 on each edge. Then, from some starting vertex x, we consider balls of radius r around x and let r → ∞; we visit vertices in the order they are added to the ball.

Breadth-first search II

- Basic scheme: we maintain a queue of vertices that are just outside the current ball.

    BFS(V, E, x)
        Seen ← ∅
        Queue Q
        enqueue(Q, x)
        while Q is not empty
            y ← dequeue(Q)
            if y ∉ Seen
                Seen ← Seen ∪ {y}
                visit(y)
                for each edge (y, z) ∈ E
                    enqueue(Q, z)

  This algorithm runs in time linear in the size of the graph, O(|V| + |E|).

Single-source shortest paths I

- The BFS algorithm is easily modified to solve the following problem: given a connected graph with nonnegative edge weights and a specified vertex x, compute d(x, y) for all y ∈ V. That is, find the length of the shortest path from x to every other vertex in the graph.
- Intuition: again, consider balls centered around x, but now use the edge weights. We want to visit vertices in order of their distance from x, so we modify the BFS algorithm to use a priority queue. We put pairs (z, d) into the priority queue, where z is a vertex and d is the length of some path from x to z. The priority queue orders (z, d) pairs by d, using e.g. a min-heap, so that at each step we can efficiently retrieve the next-closest vertex to x.

Single-source shortest paths II

    SSSP(V, E, x, w)
        Seen ← ∅
        PriorityQueue PQ
        put (x, 0) in PQ
        while PQ is not empty
            get (y, d) from PQ (least element)
            if y ∉ Seen
                Seen ← Seen ∪ {y}
                visit(y, d)
                for each edge (y, z)
                    put (z, d + w(y, z)) in PQ

- Time complexity: with a min-heap, this achieves O(|E| log |E|) time. It is possible to get this down to O(|V| log |V| + |E|) by the use of somewhat exotic data structures such as Fibonacci heaps.
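The following Java sketch implements this scheme with java.util.PriorityQueue as the binary min-heap; the weighted adjacency representation (neighbour/weight pairs packed into a double[2]) is an illustrative assumption. A vertex may be enqueued several times, once per path found to it; stale entries for already-visited vertices are simply skipped, which is what gives the O(|E| log |E|) bound.

    import java.util.*;

    public class ShortestPaths {
        // Returns dist with dist.get(y) = d(x, y) for every y reachable from x.
        // adj.get(y) lists edges (y, z) as {z, w(y,z)} pairs.
        public static Map<Integer, Double> sssp(Map<Integer, List<double[]>> adj,
                                                int x) {
            Map<Integer, Double> dist = new HashMap<>();  // plays the role of Seen
            Comparator<double[]> byDist = Comparator.comparingDouble(p -> p[1]);
            PriorityQueue<double[]> pq = new PriorityQueue<>(byDist);
            pq.add(new double[]{x, 0.0});
            while (!pq.isEmpty()) {
                double[] p = pq.poll();          // least element (y, d)
                int y = (int) p[0];
                double d = p[1];
                if (dist.containsKey(y)) continue;  // stale entry: y already visited
                dist.put(y, d);                     // first extraction gives d(x, y)
                for (double[] e : adj.getOrDefault(y, Collections.emptyList()))
                    pq.add(new double[]{e[0], d + e[1]});  // (z, d + w(y, z))
            }
            return dist;
        }
    }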
Transitive Closure I

- Let G = (V, E) be a graph.
- The transitive closure of G is G′ = (V, E*), where (x, y) ∈ E* if there is a path from x to y in G.
- Define T(E) = {(x, y) : there is a path from x to y in E}. Then E* = T(E).
- T is a closure operator:
  1. E ⊆ T(E) (nondecreasing)
  2. (E₁ ⊆ E₂) ⇒ (T(E₁) ⊆ T(E₂)) (monotone)
  3. T(T(E)) = T(E) (idempotent/fixpoint)
- The complexity of transitive closure is closely linked to that of matrix multiplication.

Transitive Closure II

- There is a path of length 2 from i to j if there is some vertex k such that E(i, k) ∧ E(k, j). We can write this as

    E²(i, j) = ⋁_{k∈V} E(i, k) ∧ E(k, j)

  where ⋁_{k∈V} is a disjunction over all vertices k ∈ V (⋁ is to ∨ as Σ is to +, Π is to ×, etc.)

Transitive Closure III

- Compare to matrix multiplication: if B = AA, then

    b_ij = Σ_k a_ik · a_kj

- If A is the adjacency matrix of the graph, then to find paths of length 2 we can compute the matrix product A² in the Boolean semiring (B, +, ·, 0, 1), where B = {0, 1}, addition is disjunction ((α + β) ≡ (α ∨ β)), and multiplication is conjunction ((α · β) ≡ (α ∧ β)).
- To find paths of any length, we can write

    A* = I + A + A² + A³ + ···                (1)

  where we need only compute terms up to A^(n−1), with n = |V| the number of vertices. With the leading I term, Eqn. (1) gives a reflexive transitive closure. The difference between transitive closure and reflexive-transitive closure is trivial: the latter has E*(x, x) for every x.

Transitive Closure IV

- The obvious method of evaluating Eqn. (1) requires O(n⁵) time. If we write

    A* = I + A(I + A(I + A(I + A(I + ···))))   (2)

  we can compute A* with O(n⁴) operations.
- By using power trees, e.g., A⁸ = ((A²)²)², we can compute the transitive closure with O(n³ log n) operations; or O(n^γ log n), where γ is the exponent of, e.g., Strassen matrix multiplication.
- There is a simple algorithm (Warshall's algorithm) that computes the transitive closure from the adjacency matrix in O(n³) time; a sketch is given after this section.
- In practice, the best way to compute transitive closure depends strongly on the anticipated structure of the input graph: its size, density, planarity, etc. There is a large literature on algorithms for TC.

Transitive Closure V

- Perhaps surprisingly, there is an algorithm computing transitive closure in O(n²) average time, for a uniform distribution on graphs.
- The G(n, p) random graph model is a distribution on graphs with n vertices where each edge is present independently with probability p. Choosing p = 1/2 gives a uniform distribution on graphs.
- In G(n, 1/2), transitive closure can be computed in O(n²) time on average.
- The reason why: with probability tending to 1 as n → ∞, every vertex is at most two steps away from every other vertex:
  - Let x, y ∈ V be vertices; there are n − 2 choices of intermediate vertex to make a path of length 2 from x to y. For each intermediate vertex w, we have a probability of 1/4 of having both the edge (x, w) and the edge (w, y).
  - The probability of there being no w such that E(x, w) ∧ E(w, y) is therefore (3/4)^(n−2).
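For concreteness, here is Warshall's algorithm, mentioned above, as a short Java sketch operating directly on the Boolean adjacency matrix; the class and method names are illustrative.

    public class TransitiveClosure {
        // Warshall's algorithm: O(n^3) transitive closure from the Boolean
        // adjacency matrix. After iteration k of the outer loop, a[i][j] is
        // true iff there is a path from i to j whose intermediate vertices
        // all lie in {0, ..., k}.
        public static boolean[][] warshall(boolean[][] adj) {
            int n = adj.length;
            boolean[][] a = new boolean[n][];
            for (int i = 0; i < n; i++) a[i] = adj[i].clone();  // work on a copy
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    if (a[i][k])                        // some path i -> ... -> k
                        for (int j = 0; j < n; j++)
                            a[i][j] = a[i][j] || a[k][j];  // then i -> k -> j
            return a;   // a[i][j] == true iff (i, j) ∈ E*
        }
    }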