CSE 638: Advanced Algorithms

Lectures 20 & 21 ( Cache-oblivious with Decrease-Keys )

Rezaul A. Chowdhury
Department of Computer Science
SUNY Stony Brook
Spring 2013

Cache-Oblivious Buffer Heap

Amortized I/O Bounds

Priority Queue                               | Delete / Delete-Min       | Decrease-Key
Cache-oblivious: Buffer Heap                 | O((1/B) log2(N/B))        | O((1/B) log2(N/B))
Cache-aware: Tournament Tree                 | O((1/B) log2(N/B))        | O((1/B) log2(N/B))
Internal memory: Binary Heap                 | O(log2 N) (worst-case)    | O(log2 N) (worst-case)
Internal memory: (name missing in source)    | O(log2 N)                 | O(1)

Cache-Oblivious Buffer Heap: Structure

Consists of r = 1 + log 2N levels, where N = total number of elements.

For 0 ≤≤≤ i ≤≤≤ r - 1, level i contains two buffers:

Element Buffers Update Buffers  element buffer Bi B U contains elements of the form 0 0 B U (x, kx), where x is the element 1 1

id, and kx is its key B2 U2

 update buffer Ui B U contains updates ( Delete , i i Decrease-Key and Sink ), each B U augmented with a time-stamp. r - 1 r - 1 Fig: The Buffer Heap Cache-Oblivious Buffer Heap: Invariants

Invariant 1: |B_i| ≤ 2^i

Invariant 2:

(a) No key in B_i is larger than any key in B_{i+1}

(b) For each element x in B_i, all updates yet to be applied on x reside in U_0, U_1, …, U_i

Invariant 3:

(a) Each B_i is kept sorted by element id

(b) Each U_i (except U_0) is kept (coarsely) sorted by element id and time-stamp

Cache-Oblivious Buffer Heap: Operations

The following operations are supported:

− Delete(x): Deletes the element x from the queue.

− Delete-Min(): Extracts an element with minimum key from the queue.

− Decrease-Key(x, k_x) (weak Decrease-Key): If x already exists in the queue with key k'_x, replaces k'_x with min(k_x, k'_x); otherwise inserts x with key k_x into the queue.

A new element x with key k_x can be inserted into the queue by Decrease-Key(x, k_x).
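The semantics of the three operations can be pinned down with a small in-memory reference model. This is a sketch of the operation semantics only: the class name `ReferencePQ`, the dictionary representation, and treating Delete on an absent element as a no-op are assumptions, and nothing here models buffers or I/O.

```python
class ReferencePQ:
    """Reference semantics only (hypothetical helper, not the Buffer Heap itself)."""

    def __init__(self):
        self.key = {}  # element id -> current key

    def decrease_key(self, x, kx):
        # Weak Decrease-Key: insert x if absent, otherwise keep the smaller key.
        self.key[x] = min(kx, self.key.get(x, kx))

    def delete(self, x):
        # Assumption: deleting an element that is not in the queue does nothing.
        self.key.pop(x, None)

    def delete_min(self):
        # Extract an element with minimum key; None signals an empty queue.
        if not self.key:
            return None
        x = min(self.key, key=self.key.get)
        return x, self.key.pop(x)
```

An insertion is just `decrease_key` on an id that is not yet present, which is why no separate Insert operation is needed.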

Decrease-Key(x, k_x):

Insert the operation into U_0 augmented with the current time-stamp.

Delete(x):

Insert the operation into U_0 augmented with the current time-stamp.

Delete-Min():

Two phases:
− Descending Phase (Apply Updates)
− Ascending Phase (Redistribute Elements)

Cache-Oblivious Buffer Heap: Delete-Min

Delete-Min() - Descending Phase (Apply Updates):

At level 0:

1. sort updates (U_0)

2. apply updates (to B_0):
   − Delete(x)
   − Decrease-Key(x, k_x)

3. carry updates (to U_1):
   − all updates that did not apply
   − applied Decrease-Keys as Deletes

At each following level i ≥ 1:

1. sort updates:
   − merge segments

2. apply updates:
   − Delete(x)
   − Decrease-Key(x, k_x)
   − Sink(x, k_x)

3. carry updates

The steps repeat level by level down to level k, the smallest level whose element buffer B_k is non-empty after the updates have been applied. Updates carried out of level k go to U_{k+1}.

Fig: element buffers B_0, B_1, …, B_k and update buffers U_0, U_1, …, U_{k+1} during the descending phase
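The sort / apply / carry loop of the descending phase can be sketched in memory as follows. The slides do not spell out when an update "applies", so the rule used here is an assumption: an insertion (Decrease-Key of an absent element, or Sink) applies at a level if its key does not exceed the largest key already stored there, or if the level is the last one. The names `descend` and `fits`, the tuple layout `(op, x, key, timestamp)`, and dropping carried updates at the last level are likewise hypothetical choices for illustration.

```python
def fits(level, kx, last):
    # Assumed applicability rule: the last level accepts every insertion;
    # otherwise kx must not exceed the largest key currently in the buffer.
    return last or (len(level) > 0 and kx <= max(level.values()))


def descend(B, U):
    """B: list of dicts {id: key}; U: list of lists of (op, x, key, ts).

    Returns the smallest level k with non-empty B[k] after updates, or None.
    """
    r = len(B)
    for i in range(r):
        last = (i == r - 1)
        carry = []
        # Step 1: sort updates by element id, then by time-stamp.
        for op, x, kx, ts in sorted(U[i], key=lambda u: (u[1], u[3])):
            # Step 2: apply the update to B[i] if possible.
            if op == "delete":
                if x in B[i]:
                    del B[i][x]                    # applied, not carried
                else:
                    carry.append((op, x, kx, ts))  # did not apply
            elif op == "decrease_key":
                if x in B[i] or fits(B[i], kx, last):
                    B[i][x] = min(kx, B[i].get(x, kx))
                    # An applied Decrease-Key is carried as a Delete.
                    carry.append(("delete", x, None, ts))
                else:
                    carry.append((op, x, kx, ts))
            elif op == "sink":
                if fits(B[i], kx, last):
                    B[i][x] = kx
                else:
                    carry.append((op, x, kx, ts))
        # Step 3: carry the remaining updates one level down.
        U[i] = []
        if not last:
            U[i + 1].extend(carry)
        if B[i]:
            return i
    return None
```

Note that B[k] may temporarily hold more than 2^k elements after this loop; in the slides, trimming to the 2^k smallest keys happens at the start of the ascending phase.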

Delete-Min() - Ascending Phase (Redistribute Elements):

At level k:

− find the 2^k elements with the 2^k smallest keys

− convert each remaining element x with key k_x to Sink(x, k_x)

− push the Sinks to U_{k+1}

− move all elements from B_k to shallower levels, leaving B_k empty

The element with minimum key is the one extracted by Delete-Min.

Fig: element buffers B_0, B_1, …, B_k and update buffers U_0, U_1, …, U_{k+1} during the ascending phase
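The ascending phase can be sketched in the same in-memory style. The slides only say that the elements of B_k move to shallower levels; filling level i with 2^i elements in ascending key order is an assumption chosen here because it satisfies Invariants 1 and 2(a). The function name `ascend`, the `now` time-stamp parameter, and the requirement that U has an entry at index k + 1 whenever B[k] overflows are also hypothetical.

```python
def ascend(B, U, k, now):
    """Redistribute B[k] (assumed non-empty) and return (x, key) with minimum key."""
    by_key = sorted(B[k].items(), key=lambda item: item[1])
    # Keep the 2^k elements with the smallest keys.
    keep, rest = by_key[: 2 ** k], by_key[2 ** k:]
    # Convert every remaining element to a Sink and push it to U[k+1].
    if rest:
        U[k + 1].extend(("sink", x, kx, now) for x, kx in rest)
    B[k] = {}
    minimum, others = keep[0], keep[1:]
    # Assumed fill order: smallest keys go to the shallowest levels, 2^i per level.
    pos = 0
    for i in range(k):
        B[i] = dict(others[pos: pos + 2 ** i])
        pos += 2 ** i
    return minimum
```

Since levels 0 to k − 1 have total capacity 2^k − 1, the kept elements other than the minimum always fit above level k.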

Cache-Oblivious Buffer Heap: I/O Complexity

Potential Function:

\[ \Phi(H) = \frac{1}{B} \left( 4r\,|U_0| + \sum_{i=1}^{r-1} (3r - i)\,|U_i| + \sum_{i=0}^{r-1} (i + 1)\,|B_i| \right) \]

Fig: Major Data Flow Paths in the Buffer Heap (labels: direction of elements, direction of updates, updates generate elements)

Lemma: A Buffer Heap on N elements supports Delete, Delete-Min and Decrease-Key operations cache-obliviously in O((1/B) log2 N) amortized I/Os each using O(N) space.

Cache-Oblivious Buffer Heap: Layout

• All B_i's are kept in a stack S_B.
• All U_i's are kept in a stack S_U.
• An array A_s is maintained in a stack S_A.

For 0 ≤ i ≤ r − 1, A_s[i] contains:

− |B_i|

− number of segments in U_i

− number of updates in each segment of U_i

In both stacks, lower-level buffers are placed above higher-level buffers. The left-to-right order of the elements in any buffer is maintained top to bottom in the stack.

Cache-Oblivious Buffer Heap: Reconstruction

After each operation, check whether Σ_i |U_i| ≥ Σ_i |B_i|, and if so:

Step 1: Sort the elements in S_B by element id and level number.

Step 2: Sort the updates in S_U by element id and time-stamp.

Step 3: Scan S_B and S_U simultaneously, and apply the updates in S_U on the elements of S_B.

Step 4: Reconstruct the data structure by filling the shallowest levels with the current elements in S_B, and emptying S_U.

Cache-Oblivious Buffer Heap: I/O Complexity

Lemma: A BH supports Delete, Delete-Min, and Decrease-Key operations in O((1/B) log2(N/B)) amortized I/Os each assuming a tall cache.

Proof: We will use the Potential Method.

Potential Method: Associates credit with the entire data structure instead of specific objects.

− D_0 = initial state of the data structure

− D_i = state of the data structure after the i-th operation, i = 1, 2, …, n

− Φ = a potential function mapping each D_i to a real number Φ(D_i)

− c_i / a_i = actual / amortized cost of the i-th operation, i = 1, 2, …, n

Then

\[ a_i = c_i + \Phi(D_i) - \Phi(D_{i-1}) \;\Rightarrow\; \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0) \]

− Define Φ so that Φ(D_0) = 0 and Φ(D_i) ≥ 0 for all i.

Then

\[ \sum_{i=1}^{n} c_i = \sum_{i=1}^{n} a_i - \Phi(D_n) \le \sum_{i=1}^{n} a_i \]

Thus the total amortized cost is an upper bound on the total actual cost.

In the proof:

− Each Decrease-Key inserted into U_0 will be treated as a pair of operations: ⟨Decrease-Key, Dummy⟩.

− Each component of the actual cost will have a Θ(1/B) factor associated with it, which we will drop for simplicity.

− For 0 ≤ i ≤ r − 1, let u_i = |U_i| and b_i = |B_i|.

− If H is the current state of BH, we define the potential of H as:

\[ \Phi(H) = 4r\,u_0 + \sum_{i=1}^{r-1} (3r - i)\,u_i + \sum_{i=0}^{r-1} (i + 1)\,b_i \]
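To make the potential concrete, the sketch below evaluates Φ(H) = 4r·u_0 + Σ_{i=1}^{r−1} (3r − i)·u_i + Σ_{i=0}^{r−1} (i + 1)·b_i from the buffer sizes alone, with the Θ(1/B) factor dropped as in the proof. The function name `potential` and the list-based inputs are assumptions for illustration.

```python
def potential(u, b):
    """u[i] = |U_i| and b[i] = |B_i| for levels 0..r-1 (Theta(1/B) factor dropped)."""
    r = len(b)
    assert len(u) == r, "one update buffer and one element buffer per level"
    return (4 * r * u[0]
            + sum((3 * r - i) * u[i] for i in range(1, r))
            + sum((i + 1) * b[i] for i in range(r)))
```

For example, with r = 3 levels, adding one update to U_0 raises the potential by 4r = 12, and moving an update from U_0 to U_1 lowers it by 4r − (3r − 1) = r + 1 = 4, so updates release potential as they move to deeper levels.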

Amortized Cost of Reconstruction:

\[ \Phi(H) = 4r\,u_0 + \sum_{i=1}^{r-1} (3r - i)\,u_i + \sum_{i=0}^{r-1} (i + 1)\,b_i \]

− Starts with Σ_{i=0}^{r−1} u_i ≥ Σ_{i=0}^{r−1} b_i. Thus the cost of steps 1 to 4 is ≤ r Σ_{i=0}^{r−1} u_i.

− Total potential drop is ≥ r Σ_{i=0}^{r−1} u_i.

− Amortized cost of reconstruction is ≤ r Σ_{i=0}^{r−1} u_i − r Σ_{i=0}^{r−1} u_i = 0.

[I/O cost of sorting X elements = O((X/B) log2 X), and r = O(log2 N)]

Reconstruction steps, for reference:

After each operation, check whether Σ u_i ≥ Σ b_i, and if so:

Step 1: Sort the elements in S_B by element id and level number.

Step 2: Sort the updates in S_U by element id and time-stamp.

Step 3: Scan S_B and S_U simultaneously, and apply the updates in S_U on the elements of S_B.

Step 4: Reconstruct the data structure by filling the shallowest levels with the current elements in S_B, and emptying S_U.
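The trigger condition and Step 4 can be sketched as below; Steps 1 to 3 are left out. The names `needs_reconstruction` and `refill` are hypothetical, and filling level i with 2^i elements in ascending key order is an assumption made here so that the rebuilt structure satisfies Invariants 1 and 2(a).

```python
def needs_reconstruction(B, U):
    # Reconstruct when the updates are at least as numerous as the elements.
    return sum(len(u) for u in U) >= sum(len(b) for b in B)


def refill(current):
    """current: dict {id: key} of surviving elements after Steps 1-3.

    Returns new element buffers, filled from the shallowest level down.
    """
    items = sorted(current.items(), key=lambda item: item[1])
    B, pos, i = [], 0, 0
    while pos < len(items):
        B.append(dict(items[pos: pos + 2 ** i]))  # level i holds at most 2^i
        pos += 2 ** i
        i += 1
    return B
```

With N surviving elements this produces 1 + floor(log2 N) levels, in line with r = 1 + log2 N from the structure definition.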

Amortized Cost of Delete:

Delete(x): Insert the operation into U_0 augmented with the current time-stamp.
[Stack Push / Pop requires O(1/B) amortized I/Os each]

− Actual cost = 1.

− Increase in potential = 4r.

− Amortized cost = 1 + 4r = O(log2 N).

Amortized Cost of Decrease-Key:

Decrease-Key(x, k_x): Insert the operation into U_0 augmented with the current time-stamp.
[Each Decrease-Key is considered as two operations]

− Actual cost = 2 × 1.

− Increase in potential = 2 × 4r.

− Amortized cost = 2 + 8r = O(log2 N).

Amortized Cost of Delete-Min:

Actual Cost:

Here k is the smallest level with non-empty B_k after updates.

− Cost of sorting U_0 is ≤ r u_0.
  [I/O cost of sorting X elements = O((X/B) log2 X)]

− Cost of examining the updates in U_0, U_1, …, U_k is Σ_{i=0}^{k} (k − i + 1) u_i.
  [I/O cost of scanning X elements = O(X/B)]

− Cost of examining the elements in B_0, B_1, …, B_{k−1} is Σ_{i=0}^{k−1} b_i.

− Let b_k and b'_k be the number of elements in B_k before and after the updates, respectively. Then the total cost of examining B_k during updates and selection after updates is max(b_k, b'_k).
  [I/O cost of selection from X elements = O(X/B)]

− Cost of accessing A_s is ≤ r.
  [A_s stores information on each buffer and has length O(r)]

Thus the actual cost is

\[ c \le r\,u_0 + \sum_{i=0}^{k} (k - i + 1)\,u_i + \sum_{i=0}^{k-1} b_i + \max(b_k, b'_k) + r \]

Amortized Cost of Delete-Min:

\[ \Phi(H) = 4r\,u_0 + \sum_{i=1}^{r-1} (3r - i)\,u_i + \sum_{i=0}^{r-1} (i + 1)\,b_i \]

Potential Drop:

− Potential drop due to changes in update buffers is

\[ \ge 4r\,u_0 + \sum_{i=1}^{k} (3r - i)\,u_i - (3r - k - 1) \sum_{i=0}^{k} u_i \]

− Potential drop due to changes in element buffers is

\[ \ge \sum_{i=0}^{k-1} b_i + \max(b_k, b'_k) \]

Thus, total drop in potential is

\[ \ge r\,u_0 + \sum_{i=0}^{k} (k - i + 1)\,u_i + \sum_{i=0}^{k-1} b_i + \max(b_k, b'_k) \]

Therefore, amortized cost of Delete-Min is

\[ \hat{c} = c + \left( \Phi(H_{\mathrm{after}}) - \Phi(H_{\mathrm{before}}) \right) \le r = O(\log_2 N) \]
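The step from the update-buffer bound to the total drop can be checked by expanding the coefficient of each u_i. The derivation below is a worked check using only the symbols already defined (r, k, u_i); it is not taken from the slides.

```latex
\begin{align*}
4r\,u_0 + \sum_{i=1}^{k} (3r - i)\,u_i - (3r - k - 1) \sum_{i=0}^{k} u_i
  &= (r + k + 1)\,u_0 + \sum_{i=1}^{k} (k - i + 1)\,u_i \\
  &= r\,u_0 + \sum_{i=0}^{k} (k - i + 1)\,u_i .
\end{align*}
```

Adding the element-buffer drop gives the total drop, which matches every term of the actual cost except the final + r, so only r remains as the amortized cost.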

We assume N ≫ M = Ω(B^{1+ε}) for some ε > 0  ⇒  log2 N = O(log2(N/B)).
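The implication can be spelled out as follows. This is a sketch that reads the tall cache condition as N ≥ B^{1+ε}, ignoring the constant hidden by Ω; that reading is an assumption.

```latex
\begin{align*}
N \ge B^{1+\varepsilon}
  &\;\Rightarrow\; \log_2 B \le \frac{\log_2 N}{1+\varepsilon} \\
  &\;\Rightarrow\; \log_2 \frac{N}{B} = \log_2 N - \log_2 B \ge \frac{\varepsilon}{1+\varepsilon}\,\log_2 N \\
  &\;\Rightarrow\; \log_2 N \le \frac{1+\varepsilon}{\varepsilon}\,\log_2 \frac{N}{B} = O\!\left(\log_2 \frac{N}{B}\right).
\end{align*}
```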

Therefore, under the tall cache assumption, the amortized I/O cost of each operation (Delete, Delete-Min and Decrease-Key) is O((1/B) log2(N/B)).

Removing the "Tall Cache" Assumption

− Restrict the size of each update buffer: |U_i| ≤ 2^i

− Now all buffers (U_i and B_i) of the first log2 B levels occupy only O(B) blocks.

− No external I/O is required to access the first log2 B levels.

− Thus the amortized I/O cost of each operation is

\[ O\!\left( \frac{1}{B} \left( \log_2 N - \log_2 B \right) \right) = O\!\left( \frac{1}{B} \log_2 \frac{N}{B} \right) \]