ALGORITHMS - CSE202 - SPRING 2000: Divide and Conquer and Sorting, Some Solutions

Name Tree Isomorphism

Problem Give an efficient algorithm to decide, given two rooted binary trees T and T', whether T and T' are isomorphic as rooted trees.

Solution

Let root be the root of T and root' be the root of T', and let L, R and L', R' be the left and right subtrees of T and T' respectively. If T embeds in T', then the isomorphism f must map root to root', and must map every member of L either entirely to L' or entirely to R', and vice versa. Thus, we can define the isomorphism relationship Iso on sub-trees as follows: Iso(NIL, NIL) = True; Iso(x, NIL) = Iso(NIL, x) = False if x ≠ NIL; and otherwise

    Iso(r1, r2) =
        (Iso(Left(r1), Left(r2)) AND Iso(Right(r1), Right(r2)))
      OR
        (Iso(Left(r1), Right(r2)) AND Iso(Right(r1), Left(r2))).

The above immediately translates into a program, and the time, I claim, is O(n^2).

The reason is that for any nodes x in T, y in T', the above calls itself on (x, y) at most once (in fact, precisely when x and y are at the same depths of their trees). Let P be the path from Root(T) to x and P' be that from Root(T') to y; in a tree there is exactly one such path. Then, of the four recursive calls at any level, at most one involves nodes from both P and P'. Inductively, at any recursion depth, at most one call compares a sub-tree containing x to one containing y. Since all sub-trees called at depth d of the recursion are at depth d in their respective trees, the claim follows. Therefore the total number of calls is at most O(n^2).
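The Iso recurrence translates into code almost verbatim. Here is a minimal Python sketch (an illustration supplied here, not part of the original handout), representing NIL as None and every other node as a (left, right) pair:

```python
def iso(r1, r2):
    """Iso from the text: are two rooted binary trees isomorphic?

    A tree is either None (NIL) or a pair (left, right) of subtrees.
    """
    if r1 is None and r2 is None:   # Iso(NIL, NIL) = True
        return True
    if r1 is None or r2 is None:    # Iso(x, NIL) = Iso(NIL, x) = False
        return False
    # Either the subtree pairs match directly, or they match crosswise.
    return (iso(r1[0], r2[0]) and iso(r1[1], r2[1])) or \
           (iso(r1[0], r2[1]) and iso(r1[1], r2[0]))
```

For example, a tree and its mirror image are isomorphic: with leaf = (None, None), iso((leaf, None), (None, leaf)) is True.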

PROBLEM 1 (8-3 in CLR): Stooge sort

Solution:

a) We will prove by induction that Stooge sort correctly sorts the input array A[1..n]. The base cases are for n = 1 and n = 2. If n = 1 then i = j, and thus i + 1 > j; then, line 4 returns A[i] (which is correct). If n = 2 then lines 1 and 2 guarantee that a bigger element will be returned in a position following that of a smaller element. Thus, Stooge sort correctly sorts A when n = 1 and when n = 2.

The inductive case happens when n ≥ 3. The inductive hypothesis is that Stooge sort correctly sorts arrays of length strictly less than n.

The input array A is essentially split into three pieces: the first third and the last third are both of length k, and the second third is of length n − 2k, where k = ⌊n/3⌋. Note 1: j − i + 1 = n in the first or "top" call to Stooge sort. Note 2: Observe that the middle third is at least as long as each of the other thirds (since k = ⌊n/3⌋ ≤ n/3, but n − 2k = n − 2⌊n/3⌋ ≥ n − 2n/3 = n/3; this is actually necessary for the algorithm to work).

In line 6, Stooge sort is applied to the first two-thirds of A (i.e., the subarray of length n − k consisting of the first and middle thirds of A). We are assuming that n ≥ 3, so k = ⌊n/3⌋ ≥ 1 and n − k is strictly less than n. This means that we can invoke the inductive hypothesis on this subarray (i.e., we assume by inductive hypothesis that the first two-thirds of A will be sorted in the first two-thirds of the new array A').

0

A n k A

After line 6, the largest k elements of will be in the last positions of (i.e.,

0 x

they will be in the middle third and/or the last third of A ). Here’s why: Let be any one

A x A

of the k largest elements in .Case1:If is in the last third of , then it will be in the

0 x

last third of A , and our claim is rather trivially true. Case 2: Assume is in the first two-

x k A n k

thirds of A.Since is one of the largest elements in , at least other elements

 x k n k

in A are “smaller” (actually, ) than .Atmost, of these elements can be in the

n 2k

last third of A. Therefore, at least of these elements are in the first two-thirds of

n 2k A

A. We can conclude that there are at least elements in the first two-thirds of

n 2k = n 2bn=3cn2n=3 = n=3 bn=3c = k

that are smaller than x.But .

0

 n 2k A

Because k , the first k positions of will consist of elements that are all

0

x A x

smaller than x. That means that will be in the middle third of . In either case, will 0

be in the last two-thirds of A . 0

In line 7, Stooge sort is applied to the last two-thirds of A' (i.e., the subarray of length n − k consisting of the middle and last thirds of A'). We are assuming that n ≥ 3, so k = ⌊n/3⌋ ≥ 1 and n − k is strictly less than n. This means that we can invoke the inductive hypothesis on this subarray (i.e., we assume by inductive hypothesis that the last two-thirds of A' will be sorted in the last two-thirds of the new array A'').

Recall that after line 6, the largest k elements of A will be in the last two-thirds (i.e., the last n − k positions) of A'. After line 7, the largest k elements of A (or A') will make up the last third of A'', and they will be in sorted order.

In line 8, Stooge sort is applied to the first two-thirds of A'' (i.e., the subarray of length n − k consisting of the first and middle thirds of A''). We are assuming that n ≥ 3, so k = ⌊n/3⌋ ≥ 1 and n − k is strictly less than n. This means that we can invoke the inductive hypothesis on this subarray (i.e., we assume by inductive hypothesis that the first two-thirds of A'' will be sorted in the first two-thirds of the new array B).


Recall that after line 7, the largest k elements of A will be in the last third of A'' in sorted order. After line 8, the other elements (i.e., the n − k smallest elements) will make up the first two-thirds of B, and they will be in sorted order; also, the last third of A'' will be the same as the last third of B. Thus, B, which is the final array returned by the algorithm, will consist of all the elements of A in sorted order.
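The proof above follows the CLR pseudocode by its line numbers. As a runnable companion (an illustration supplied here, not part of the original handout), here is a Python transcription of that pseudocode, with comments mapping back to the lines used in the proof:

```python
def stooge_sort(A, i=0, j=None):
    """Stooge sort A[i..j] in place and return A (CLR problem 8-3)."""
    if j is None:
        j = len(A) - 1
    if i >= j:                   # zero- or one-element range: nothing to do
        return A
    if A[i] > A[j]:              # lines 1-2: put the smaller end element first
        A[i], A[j] = A[j], A[i]
    if i + 1 >= j:               # lines 3-4: base case, two elements
        return A
    k = (j - i + 1) // 3         # line 5: k = floor(n/3)
    stooge_sort(A, i, j - k)     # line 6: sort the first two-thirds
    stooge_sort(A, i + k, j)     # line 7: sort the last two-thirds
    stooge_sort(A, i, j - k)     # line 8: sort the first two-thirds again
    return A
```

Because each call spawns three calls on inputs of 2/3 the size, even modest inputs trigger a very large number of recursive calls, which is exactly what parts b and c quantify.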

b,c) The recurrence is T(n) = 3T(2n/3) + O(1), since we split arrays into three smaller arrays, each of size ≈ 2/3 the size of the big array (don't worry about floor and ceiling operators here); the nonrecursive portion of the code is O(1) (i.e., the complexity is bounded by a constant). Its solution is, by Case 1 of the Master Theorem (with a = 3, b = 3/2, and ε = log_{3/2} 3, say), T(n) = Θ(n^{log_{3/2} 3}). Now, to compute log_{3/2} 3, observe that it is equal to (ln 3)/(ln(3/2)) ≈ 2.71. Thus, since 2.71 > 2.7, we can say that T(n) = Ω(n^{2.7}). The worst-case running time of Stooge sort is worse than all worst-case running times of insertion sort Θ(n^2), quicksort Θ(n^2), merge sort Θ(n lg n), and heapsort Θ(n lg n). Begone, Stooges!
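The exponent log_{3/2} 3 is easy to sanity-check numerically (a quick aside, not in the original handout):

```python
import math

# log_{3/2} 3 = ln 3 / ln(3/2); the text pins it between 2.7 and 2.71
exponent = math.log(3) / math.log(3 / 2)
assert 2.70 < exponent < 2.71
```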

Note: You could also look at the corresponding recursion tree, which will have depth ≈ log_{3/2} n (remember that n = 1 and n = 2 are both base cases). For simplicity, let's say that the depth of all leaves in the recursion tree is log_{3/2} n. Then, when we add up the number of subproblems at each recursive level, we get:

    3^0 + 3^1 + ... + 3^{log_{3/2} n} = (1 − 3^{log_{3/2} n + 1}) / (1 − 3) = (3^{log_{3/2} n + 1} − 1) / 2

(Think: ("first in" − "first out") / (1 − common base).)

For simplicity, let's consider 3^{log_{3/2} n}. By Logarithm Property 6 on p. 443 in Appendix A.5, 3^{log_{3/2} n} = n^{log_{3/2} 3} (look familiar?). Remember that O(1) work is being done at each node in the recursion tree.

PROBLEM N.1: Stooge sort (problem 8-3, page 169 of CLR, questions a, b, c).

Solution:

a) We will prove by induction that Stooge sort correctly sorts the input array A[1..n]. The base case is for n = 2. In this case the first two lines of Stooge sort guarantee that a bigger element will be returned in a position following that of a smaller element. Thus, Stooge sort correctly sorts A when n = 2. The inductive case happens when n ≥ 3. Then, assume by inductive hypothesis that the recursive call to Stooge-sort in line 6 correctly sorts the elements in the first two-thirds of the input array A, and let A' be the array thus obtained. Also, assume by inductive hypothesis that the recursive call to Stooge-sort in line 7 correctly sorts the elements in the last two-thirds of the array A', and let A'' be the array thus obtained. Finally, assume by inductive hypothesis that the recursive call to Stooge-sort in line 8 correctly sorts the elements in the first two-thirds of the array A'', and let B be the (final) array thus obtained. Now we need to prove that at the end of the algorithm, the array B is sorted. To prove this, we just need to prove that for any two elements A[i] < A[j], the position of A[i] in B will be smaller than the position of A[j], no matter what their positions are in the input array A. Since this will be true for any pair A[i], A[j], the correctness of Stooge sort will follow.

We divide the proof in two cases. 1) Consider the simple case in which the position of A[i] is smaller than the position of A[j] in the input array A. Then, clearly, from the three facts that we have established using the inductive hypothesis, we have that the position of A[i] in B will be smaller than the position of A[j]. 2) Consider the more involved case in which the position of A[j] is smaller than the position of A[i] in the input array A. We need to prove that their order in B will be swapped. Here we have some other cases. First, if A[j] and A[i] are both in the first two-thirds of the input array A, then they will be swapped in the recursive call in line 6 (which is correct by inductive hypothesis) and will not be swapped again in the following recursive calls (again, by the inductive hypothesis). Second, if A[j] and A[i] are both in the last two-thirds of the input array A, then they will be swapped in the recursive call in line 7 (which is correct by inductive hypothesis) and will not be swapped again in the following recursive call (again, by the inductive hypothesis). Finally, if A[j] is in the first one-third of the input array A and A[i] is in the last one-third of the input array A, then we claim that their positions are swapped either in the recursive call in line 7 or in the recursive call in line 8. In fact, assume their positions are not swapped in the recursive call in line 7. This means that after the execution of the recursive call in line 6 (which is correct by inductive hypothesis) the position of A[j] in array A' is still in the first one-third and, moreover, at least all the elements in the second third of A' are greater than A[j]. Then, since these elements are greater than A[j], they are also greater than A[i] (since we assumed A[i] < A[j]), and after the execution of the recursive call in line 7 (which is correct by inductive hypothesis), A[i] will appear in the first two-thirds of A''. Now, since A[j] remains in the first third of A'', their positions will be swapped by the execution of the recursive call in step 8.

b,c) The recurrence is clearly T(n) = 3T(2n/3) + O(1). Its solution is, by case 1 of the Master Theorem, T(n) = Θ(n^{log_{3/2} 3}) ≈ Θ(n^{2.7}). Clearly, the worst-case running time of Stooge-sort is worse than all worst-case running times of insertion sort, merge sort, quicksort, and heapsort.

Name MAX SUM

Problem Use the divide-and-conquer approach to write an efficient recursive algorithm

that finds the maximum sum in any contiguous sublist of a given list of n real values. Analyse your algorithm, and show the results in order notation. Can you do better?

Solution(s)

Solution 1 (O(n lg n)): Informally, one solution is the following. Divide the given list into two sublists (call them "left" and "right") of roughly 1/2 the size of the original list, and recursively "solve" each sublist by finding the maximum "contiguous sum" in both. How do we then solve our original list? Take the largest of: (1) the maximum "contiguous sum" in the "left" sublist; (2) the maximum "contiguous sum" in the "right" sublist; (3) the maximum "contiguous sum" that "spreads" from the point at which we split the list into two sublists (this sum may or may not include values from both sublists). To compute this, we will need to compute:

(3a) the maximum "contiguous sum" whose corresponding values make up the last l positions of the "left" sublist (for some l)

(3b) the maximum "contiguous sum" whose corresponding values make up the first r positions of the "right" sublist (for some r). We then want the maximum of (3a), (3b), and (3a)+(3b) as our answer to (3). Observe that we could just compare (3a)+(3b) with (1) and (2), because if (3a) is the maximum "contiguous sum" in the big list, then (3a) must equal (1); the same goes for (3b) and (2).

Here is the algorithm:

Inputs: A, an array of real values; and Left and Right, the leftmost and rightmost position numbers of A in the subarray under consideration.

Outputs: the maximum "contiguous sum" in A.

procedure MAXSUM (A, Left, Right);
var
  Center: integer;
    # The position (in A) of the last element in the left subarray.
  LeftSum, RightSum: real;
    # The answers to (1) and (2)
  LeftBorderSum, RightBorderSum: real;
    # The answers to (3a) and (3b)
begin
  if Left = Right then
    return A[Left]
    # The base case is n = 1 (i.e., when Left = Right).
    # We return the sole array entry.
  Center := ⌊(Left + Right)/2⌋
  LeftSum := MAXSUM(A, Left, Center)
  RightSum := MAXSUM(A, Center + 1, Right)
  compute LeftBorderSum
  compute RightBorderSum
    # Code omitted. Both these computations are O(n).
  return max(LeftSum, RightSum, LeftBorderSum + RightBorderSum)
    # We want the maximum of (1), (2), and (3a)+(3b).
end;
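Filling in the omitted border-sum computations, the whole of MAXSUM can be sketched in Python (an illustration of the same O(n lg n) scheme, using 0-based indexing instead of the 1-based Pascal-style indexing above):

```python
def maxsum(A, left=0, right=None):
    """Maximum contiguous-subarray sum of nonempty A[left..right], O(n lg n).

    Every candidate sum includes at least one element, matching the
    base case `return A[Left]` in the pseudocode.
    """
    if right is None:
        right = len(A) - 1
    if left == right:                         # base case: n = 1
        return A[left]
    center = (left + right) // 2
    left_sum = maxsum(A, left, center)        # answer (1)
    right_sum = maxsum(A, center + 1, right)  # answer (2)
    # (3a): best sum of a suffix of the left half, computed by an O(n) scan
    total, left_border = 0, float("-inf")
    for i in range(center, left - 1, -1):
        total += A[i]
        left_border = max(left_border, total)
    # (3b): best sum of a prefix of the right half, computed by an O(n) scan
    total, right_border = 0, float("-inf")
    for i in range(center + 1, right + 1):
        total += A[i]
        right_border = max(right_border, total)
    return max(left_sum, right_sum, left_border + right_border)
```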

The recurrence is T(n) = 2T(n/2) + O(n). We split arrays into two smaller arrays, each of size ≈ 1/2 the size of the big array. The nonrecursive portion of the code is O(n); in fact, it is Θ(n). Its solution is, by Case 2 of the Master Theorem (with a = 2 and b = 2, and thus n^{log_b a} = n^{log_2 2} = n^1 = n), T(n) = Θ(n lg n).

Note: You could also look at the corresponding recursion tree, which will have depth ≈ log_2 n, or lg n. At each level in the recursion tree, O(n) work is being done. So, the algorithm is O(n lg n).

Solution 2 (O(n)):

There is a way to reduce the O(n) "overhead" in the previous solution to O(1). Instead of having the algorithm compute the "border sums" in Θ(n) time in each call, let's have the algorithm "paste together" information provided by results from the recursive calls. Consider the following algorithm:

procedure MAXSUM2 (A, Left, Right, var LeftEndSum, var RightEndSum, var AllSum);
var
  Center: integer;
  LeftSum, RightSum: real;
  LeftEndSum, RightEndSum: real;
    # The maximum "contiguous sums" starting from the left end
    # (respectively, ending at the right end) of the array
  LeftBorderSum, RightBorderSum: real;
    # The answers to (3a) and (3b)
  AllLeftSum, AllRightSum: real;
    # The total sum of all elements in the left (respectively right) subarrays.
  AllSum: real;
    # This is the sum of all the elements in the array.
begin
  if Left = Right then
  begin
    LeftEndSum := A[Left]
    RightEndSum := A[Left]
    AllSum := A[Left]
    return A[Left]
  end
  Center := ⌊(Left + Right)/2⌋
  LeftSum := MAXSUM2(A, Left, Center, LeftEndSum, LeftBorderSum, AllLeftSum)
  RightSum := MAXSUM2(A, Center + 1, Right, RightBorderSum, RightEndSum, AllRightSum)
  AllSum := AllLeftSum + AllRightSum
  LeftEndSum := max(LeftEndSum, AllLeftSum + RightBorderSum)
  RightEndSum := max(RightEndSum, LeftBorderSum + AllRightSum)
  return max(LeftSum, RightSum, LeftBorderSum + RightBorderSum)
end;

The recurrence is T(n) = 2T(n/2) + O(1), since we split arrays into two smaller arrays, each roughly half the size of the larger array. The nonrecursive portion of the code is O(1) (i.e., the complexity is bounded by a constant). Its solution is, by Case 1 of the Master Theorem (with a = 2, b = 2, log_b a = log_2 2 = 1, and ε = 1, say), T(n) = Θ(n).

Note: You could also look at the corresponding recursion tree, which will have depth ≈ log_2 n. For simplicity, let's say that the depth of all leaves in the recursion tree is log_2 n. Then, when we add up the number of subproblems at each recursive level, we get:

    2^0 + 2^1 + ... + 2^{log_2 n} = (1 − 2^{log_2 n + 1}) / (1 − 2) = 2^{log_2 n + 1} − 1 = 2n − 1

(Think: ("first in" − "first out") / (1 − common base).)

Remember that O(1) work is being done at each node in the recursion tree.
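MAXSUM2's idea can be rendered in Python by returning the extra values instead of using var parameters (the tuple fields below are descriptive stand-ins for the pseudocode's out-parameters; this is an illustration, not the original code):

```python
def maxsum2(A, left=0, right=None):
    """O(n) divide-and-conquer max contiguous sum over nonempty A[left..right].

    Returns (best, left_end, right_end, all_sum):
    best      - maximum contiguous sum,
    left_end  - maximum sum of a prefix of the range (LeftEndSum),
    right_end - maximum sum of a suffix of the range (RightEndSum),
    all_sum   - sum of every element in the range (AllSum).
    """
    if right is None:
        right = len(A) - 1
    if left == right:                              # base case: one element
        return A[left], A[left], A[left], A[left]
    center = (left + right) // 2
    lbest, l_leftend, l_rightend, l_all = maxsum2(A, left, center)
    rbest, r_leftend, r_rightend, r_all = maxsum2(A, center + 1, right)
    all_sum = l_all + r_all
    # Paste together: a prefix of the whole range is either a prefix of the
    # left half, or the entire left half plus a prefix of the right half.
    left_end = max(l_leftend, l_all + r_leftend)
    right_end = max(r_rightend, r_all + l_rightend)
    # The border sums (3a) and (3b) are the left half's best suffix and the
    # right half's best prefix; no O(n) scan is needed.
    best = max(lbest, rbest, l_rightend + r_leftend)
    return best, left_end, right_end, all_sum
```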


Solution 1 (Multi-pass Θ(n) algorithm):

Let's say the array (or list) A is indexed from 1 to n. Set up an array L indexed from 0 to n. Let L[i] be the "maximum contiguous sum" whose corresponding subarray ends with A[i], the element of the array in the i-th position (1 ≤ i ≤ n). Let L[0] = 0, since 0 is a viable answer if all of the A[i] values are nonpositive or if the array has no elements (a weird case). Our "maximum contiguous sum" in A is thus max_{0≤i≤n} L[i]; this is because the true "maximum contiguous sum" is L[0] = 0, or it corresponds to a subarray that terminates at some position i in the array (1 ≤ i ≤ n). (There may, incidentally, be more than one subarray that corresponds to the sum.)

Here’s the algorithm:

Inputs: n, array A indexed from 1 to n.
Outputs: MaxSum, the "maximum contiguous sum" in A.

procedure MAX.CONTIGUOUS.SUM (n, A);
var
  L: array [0..n];
begin
  L[0] := 0
  for i := 1 to n do
    if L[i − 1] < 0 then
      L[i] := A[i]
      # If the "maximum contiguous sum" ending at array position i − 1
      # is negative (let's call the corresponding subarray B), then we would
      # rather take A[i] as our "maximum contiguous sum" ending at position i
      # instead of attaching A[i] to B, since B just "drags down" A[i].
    else
      L[i] := L[i − 1] + A[i]
      # If the "maximum contiguous sum" ending at array position i − 1
      # is nonnegative, then we want to attach A[i] to the corresponding
      # subarray and take the sum.
  end
  MaxSum := max(L)
    # Here, max takes the maximum value in L. This operation is Θ(n).
  return MaxSum
end;

This iterative algorithm is Θ(n), since the for loop and the max function are both Θ(n).
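The same multi-pass algorithm in Python (a 0-based illustration, with L[i] defined exactly as in the text):

```python
def max_contiguous_sum(A):
    """Multi-pass Theta(n) algorithm: build L, then take its maximum.

    L[i] is the best contiguous sum ending exactly at (1-based) position i;
    L[0] = 0 covers the all-nonpositive and empty-array cases, so the
    empty sum 0 is a viable answer.
    """
    n = len(A)
    L = [0] * (n + 1)
    for i in range(1, n + 1):
        if L[i - 1] < 0:
            L[i] = A[i - 1]             # start a fresh subarray at position i
        else:
            L[i] = L[i - 1] + A[i - 1]  # extend the best subarray ending at i-1
    return max(L)                        # second Theta(n) pass over L
```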

Solution 2 (One-pass Θ(n) algorithm):

Set up a variable MaxSum that will keep track of max_{0≤k≤i} L[k] as i increases from 0 to n. Then, the second pass through the L array to find the maximum value becomes unnecessary. In fact, we don't even need the L array! We can replace it with another variable, CurrentSum. We don't need to maintain the L array because (1) MaxSum is already keeping track of the running maximum of the L[i] quantities, and (2) CurrentSum contains all the information from previous array elements that is needed to update MaxSum. In particular, CurrentSum keeps track of max(0, L[i]) as i increases from 0 to n.

Here’s the algorithm:

Inputs: n, array A indexed from 1 to n.
Outputs: MaxSum, the "maximum contiguous sum" in A.

procedure MAX.CONTIGUOUS.SUM2 (n, A);
var
  CurrentSum, MaxSum: real;
begin
  CurrentSum := 0
  MaxSum := 0
  for i := 1 to n do
    CurrentSum := CurrentSum + A[i]
      # This is the sum we would get if we were to attach A[i] to the
      # "maximum contiguous subarray" B ending at position i − 1,
      # if the sum of B's elements is nonnegative. The initialization
      # of CurrentSum and the upcoming if statement guarantee that,
      # if the B sum is negative, it will not be added to A[i], and we
      # will take A[i] as our CurrentSum. In short, CurrentSum is
      # assigned max(0, L[i − 1]) + A[i], which was how we computed
      # successive values of L[i] in the first algorithm.
    if CurrentSum < 0 then
      CurrentSum := 0
      # This guarantees that, in the next iteration, array position i + 1 will
      # start off a new subarray under consideration. Since the "maximum
      # contiguous sum" ending at position i is negative, we don't want to
      # add it in next time.
    else if CurrentSum > MaxSum then
      MaxSum := CurrentSum
      # Update MaxSum as necessary.
  end
  return MaxSum
end;

This iterative algorithm is Θ(n), since the for loop is Θ(n).
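The one-pass version in Python (an illustration; CurrentSum is clamped at 0 exactly as in the pseudocode):

```python
def max_contiguous_sum2(A):
    """One-pass Theta(n) algorithm (no L array).

    current_sum tracks max(0, L[i]) from the multi-pass version, and
    max_sum tracks the running maximum, so the empty sum 0 is again a
    viable answer.
    """
    current_sum = 0
    max_sum = 0
    for x in A:
        current_sum += x        # attach x to the current subarray
        if current_sum < 0:
            current_sum = 0     # a negative prefix would only drag x down
        elif current_sum > max_sum:
            max_sum = current_sum
    return max_sum
```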

Name Weighted Median

Problem

For n distinct elements x_1, x_2, ..., x_n with positive weights w_1, w_2, ..., w_n such that Σ_{i=1}^{n} w_i = 1, the weighted median is the element x_k satisfying

    Σ_{x_i < x_k} w_i ≤ 1/2   and   Σ_{x_i > x_k} w_i ≤ 1/2.

1. Argue that the median of x_1, x_2, ..., x_n is the weighted median of the x_i with weights w_i = 1/n for i = 1, 2, ..., n.

2. Show how to compute the weighted median of n elements in O(n lg n) worst-case time using sorting.

3. Show how to compute the weighted median in Θ(n) worst-case time using a linear-time median algorithm.

Solution

1.

2.
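For part 2, one standard sorting-based sketch (supplied here as an illustration, assuming the stated convention that the weights sum to 1): sort the pairs by key, then scan until the cumulative weight first reaches 1/2.

```python
def weighted_median_by_sorting(xs, ws):
    """O(n lg n) weighted median via sorting (part 2).

    Assumes distinct xs and positive ws summing to 1. Returns the x_k whose
    smaller elements weigh at most 1/2 and whose larger elements weigh at
    most 1/2.
    """
    pairs = sorted(zip(xs, ws))          # O(n lg n)
    prefix = 0.0                         # total weight strictly below x
    for x, w in pairs:
        if prefix + w >= 0.5:            # cumulative weight first reaches 1/2:
            return x                     # prefix <= 1/2 and 1 - prefix - w <= 1/2
        prefix += w
    return pairs[-1][0]                  # unreachable when the weights sum to 1
```

With w_i = 1/n for all i this returns the ordinary (lower) median, which is the content of part 1.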

3. We are given an unsorted list x_1, ..., x_n and positive weights w_1, ..., w_n with Σ_i w_i = 1, and wish to find an x_k so that Σ_{x_j < x_k} w_j ≤ 1/2 and Σ_{x_j > x_k} w_j ≤ 1/2.

For a weighted unsorted list (x_1, w_1), ..., (x_n, w_n), let W = Σ_{1≤i≤n} w_i. We will give an algorithm which solves a more general problem: given a weighted array and a number 0 ≤ α ≤ W, find an x_k so that

    Σ_{x_j < x_k} w_j ≤ α   and   Σ_{x_j > x_k} w_j ≤ W − α.

We know there is an algorithm Select which can find the j-th largest element of a list of length n in O(n) time. To find the weighted median (or, in general, the α-sample), we first find the non-weighted median x_l using Select. Then we divide the list into the subset S of elements smaller than x_l and the subset B of elements bigger than or equal to x_l. Since x_l is the unweighted median, both S and B have at most n/2 elements. (If our list contains only a single element, the recursion bottoms out, and we return that element, which is a correct solution since both sums in the two inequalities are over empty sets, hence are 0, and 0 ≤ α ≤ W.)

We then calculate W_S = Σ_{x_j ∈ S} w_j. If W_S > α, we recursively call our algorithm on (S, α). We output the result of this recursive call, x_k, since x_k satisfies:

    Σ_{j : x_j < x_k} w_j = Σ_{j : x_j ∈ S, x_j < x_k} w_j ≤ α,

since only elements of S can be smaller than an element of S, and

    Σ_{j : x_j > x_k} w_j = Σ_{j : x_j ∉ S} w_j + Σ_{j : x_j ∈ S, x_j > x_k} w_j ≤ (W − W_S) + (W_S − α) = W − α,

since all elements not in S are larger than x_k. Otherwise, we recursively call the algorithm on (B, α − W_S). The proof of correctness in this case is similar to the last. Thus, the above algorithm always finds a valid solution.

Since the time used by the sub-routine Select is O(n), as is that to divide the list into B and S and to compute W_S, there is a constant c so that the time taken by the algorithm on an n-element input, T(n), is at most cn plus the time taken by a recursive call on an input of 1/2 the size. I.e., T(n) ≤ cn + T(n/2). Thus T(n) ≤ cn + cn/2 + cn/4 + ... ≤ 2cn, so the algorithm takes O(n) time.
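A Python sketch of part 3's recursion (an illustration: statistics.median_high stands in for a true linear-time Select, so this sketch is O(n lg n) overall, and the upper median is used so that both S and B are strictly smaller than the input; with a genuine linear-time Select the recurrence T(n) ≤ cn + T(n/2) above gives O(n)):

```python
import statistics

def weighted_select(items, alpha):
    """Generalized problem from the text: items is a list of (x, w) pairs with
    distinct x and positive w summing to W; returns x_k with the weight of
    {x_j < x_k} at most alpha and the weight of {x_j > x_k} at most W - alpha.
    """
    if len(items) == 1:
        return items[0][0]                              # recursion bottoms out
    xl = statistics.median_high([x for x, _ in items])  # stand-in for Select
    S = [(x, w) for x, w in items if x < xl]            # smaller than x_l
    B = [(x, w) for x, w in items if x >= xl]           # bigger than or equal
    WS = sum(w for _, w in S)
    if WS > alpha:
        return weighted_select(S, alpha)                # answer lies in S
    return weighted_select(B, alpha - WS)               # otherwise recurse on B

def weighted_median(xs, ws):
    """Weighted median = the alpha-sample with alpha = 1/2."""
    return weighted_select(list(zip(xs, ws)), 0.5)
```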