<<

Introsp ective and Selection

David R Musser

Computer Science Department

Rensselaer Polytechnic Institute Troy NY

mussercsrpiedu

Abstract

Quicksort is the preferred inplace sorting in manycontexts since its average

computing time on uniformly distributed inputs is N log N and it is in fact faster than

most other sorting algorithms on most inputs Its drawback is that its worstcase time

bound is N Previous attempts to protect against the worst case by improving the

way cho oses pivot elements for partitioning have increased the average computing

time to o muchone might as well use which has aN log N worstcase time

b ound but is on the average to times slower than quicksort A similar dilemma exists

with selection algorithms for nding the ith largest element based on partitioning This

pap er describ es a simple solution to this dilemma limit the depth of partitioning and for

subproblems that exceed the limit switch to another algorithm with a b etter worstcase

bound Using heapsort as the stopp er yields a that is just as fast

as quicksort in the average case but also has an N log N worst case time bound For

selection a hybrid of Hoares find algorithm which is linear on average but quadratic

in the worst case and the BlumFloydPrattRivestTarjan algorithm is as fast as Hoares

algorithm in practice yet has a linear worstcase time b ound Also discussed are issues

of implementing the new algorithms as generic algorithms and accurately measuring their

p erformance in the framework of the C Standard Library

key words Quicksort Heapsort Sorting algorithms Introsp ective algorithms Hybrid

algorithms Generic algorithms STL

Intro duction

Among sorting algorithms with O N log N average computing time of quicksort

is considered to b e a go o d choice in most contexts It sorts in place except for log N

stack space and is usually faster than other inplace algorithms such as heapsort mainly

b ecause it do es substantially fewer data assignments and other op erations These characteristics

make medianof quicksort a go o d candidate for a sorting routine and it is in

fact used as such in the C Standard Template Library STL as well as in older C

librariesit is the algorithm most commonly used for for example However its worst

case time is N and although the worstcase behavior app ears to be highly unlikely the

very existence of input sequences that can cause such bad p erformance may be a concern in

some situations The b est alternativeinsuch situations has b een to use heapsort which has a

N log N worstcase as well as averagecase time bound but doing so results in computing

times to times longer than quicksorts time on most inputs Thus in order to protect

This work was partially supp orted by a grant from IBM Corp oration

completely against deterioration to quadratic time it would seem necessary to pay a substantial

time p enalty for the great ma jority of input sequences

A similar dilemma app ears with selection algorithms for nding the ith smallest elementof

a sequence Hoares algorithm based on partitioning has a linear b ound in the average case

but is quadratic in the worst case The BlumFloydPrattRivestTarjan lineartime worst

case algorithm is much slower on the average than Hoares algorithm In this pap er we

concentrate on the sorting problem and return to the selection problem only briey in a later

section

The next section presents a class of arbitrarily long input sequences that do in fact cause

medianof quicksort to take quadratic time call these sequences medianof killers It

is p ointed out that such input sequences might not be as rare in practice as a probabilistic

argument based on uniform distribution of p ermutations would suggest

The third section then describ es for introspective a new hybrid sorting

algorithm that b ehaves almost exactly like medianof quicksort for most inputs and is just as

fast but which is capable of detecting when partitioning is tending toward quadratic b ehavior

By switching to heapsort in those situations introsort achieves the same O N log N time

b ound as heapsort but is almost always faster than just using heapsort in the rst place On a

medianof killer sequence of length for example introsort switches to heapsort after



partitions and has a total running time less than th of that of medianof quicksort

In this case it would be somewhat faster to use heapsort in the rst place but for almost all

randomlychosen integer sequences introsort is faster than heapsort by a factor of between

and

Ideally quicksort partitions sequences of size N into sequences of size approximately N

those sequences in sequences of size N and so on implicitly pro ducing a of subproblems

whose depth is approximately log N The quadratictime problem arises on sequences that

cause many unequal partitions resulting in the subproblem tree depth growing linearly rather

than logarithmically Introsort avoids the problem by putting a logarithmic b ound on the depth

and switching to heapsort whenever the b ound is reached

Another way of obtaining an O N log N worstcase time bound with an algorithm based

on quicksort is to install a lineartime mediannding algorithm for cho osing partition pivots

However the resulting algorithm is useless in practice since the large overhead of the

mediannding algorithm makes the overall sorting algorithm much slower than heapsort This

algorithm could however be used as the stopp er in an introsp ective algorithm in place of

heapsort There might be some advantage in this algorithm in terms of commonality of co de

with the selection algorithms describ ed in a later section

Instead of nding the median exactly other less exp ensive alternatives have b een suggested

such as adaptively selecting from larger arrays a larger sample of elements from which to

estimate the median A new version of the C library function qsort based on this technique and

other improvements outp erforms medianof versions in most cases but still takes N

time in the worstcase

Yet another p ossibility is to use a randomized version of quicksort in whichpivot elements

are chosen at random The worstcase time b ound is still N but the probabilityof there

seconds on a MHz microSPARC II versus seconds for the generic sort algorithm from the

HewlettPackard HP implementation of the C Standard Template library with b oth algorithms compiled

with the Ap ogee C compiler version Performance comparisons that are more architecture and compiler

indep endent are given in a later section



For a purely recursiveversion of quicksort the subproblem depth would b e the same as the recursion depth

but the most ecientversions recurse directly or using a stack on only one branch of the subproblem tree and

iterate down the other

b eing suciently many bad partitions to achieve this b ound is extremely low However cho osing

a single element of the sequence at random do es not pro duce go o d enough partitions on the

average to comp ete with medianof choices and cho osing three elements at random requires

to o much time for random numb er generation Introsort b eats these randomized variations in

practice

Medianof Killer Sequences

As is wellknown the simplest metho ds of cho osing pivot elements for partitioning such as

always cho osing the rst element cause quicksort to deteriorate to quadratic time on already

or nearly sorted sequences Since there are many situations in which one applies a sorting

algorithm to an already sorted or almost sorted sequence a more robust wayofcho osing pivots

is needed While the ideal choice would be the median value the amount of computation

required is to o large Instead one of the most frequently used metho ds is to cho ose the rst

middle and last elements and then cho ose the median of these three values This metho d

pro duces go o d partitions in most cases including the sorted or almost sorted cases that cause



the simpler pivoting metho ds to blow up Nevertheless there are sequences that can cause

medianof quicksort to make many bad partitions and take quadratic time

For example let k be any even p ositiveinteger and consider the following p ermutation of

k

    k  k   k k  k  k  k   k

 k   k   k   k   k     k  k

Call this p ermutation K for reasons to be seen it will also b e called a medianof killer

k

or sequence Assume the three elements chosen by a medianof quicksort

algorithm on an array ak are aak and ak Thus for K the three values

k

chosen are and k and the median value chosen as the pivot for partitioning is In the

partition the only elements exchanged are k and resulting in

    k  k   k k  k  k  k   k

  k   k   k   k   k    k  k

where the partition point is after a Now the array ak now holds a sequence with

the structure of K as can be seen by subtracting from every element As this step can

k 

be repeated ktimeswehave the following

Theorem For any even integer N divisible by the permutation K causes medianof

N

partitioning to produce a sequence of N partitions into subsequences of length and N

and N and N

Corollary On any sequencepermuted from an ordered sequencebyK where N is divisible

N

by medianof quicksort takes N time

Pro of By the theorem the subproblem tree pro duced by partitioning K has at least N

N

levels Since the total time for all partitions at each level is N the total time for all levels

is N

One might argue that the K sequences are not likely to occur in real applications or at

N

least not nearly so likely as the almost sorted sequences that cause simpler versions of quicksort



This assumes that care is also taken with other asp ects of the algorithm such as programming the partition

algorithm so that it swaps equal elements in order to ensure that go o d partitions result on sequences of all equal

elements or sequences whose elements are chosen from a relatively small set

to blow up While this is probably true p erhaps there is still some reason to exp ect these or

equally damaging sequences to o ccur more often in practice than a probabilistic argument based

on uniform distribution of would suggest Indeed some authors have prop osed

substituting for the uniform distribution a universal distribution that assigns much higher

probability to sequences that can b e pro duced by short programs as can the K sequences than

N

to random sequences those that require programs of length prop ortional to the sequence length

Under the universal distribution b oth quicksorts worstcase and average computing times

are N The average and worstcase times for heapsortand introsortremain N log N

under the universal distribution

The Algorithm

The details of the introsort algorithm are p erhaps easiest to understand in terms of its dierences

from the following version of medianof quicksortthe one used in the HP implementation

of the C Standard Template Library for the sort function This version uses a constant

called size threshold to stop generating subproblems for small sequences leaving the problem

instead for a later pass using Leaving small subproblems to insertion sort is one

of the usual optimizations of quicksort the merits of leaving them until a nal pass rather than

calling insertion sort immediately are discussed later As in blo ck structure is indicated in

the pseudoco de using indentation rather than bracketing

Algorithm QuicksortA f b

Inputs A a containing the sequence

of data to b e sorted in p ositions Af Ab

f the rst p osition of the sequence

b the rst p osition b eyond the end of the sequence

Output A is p ermuted so that Af  Af   Ab

Quicksort loopA f b

Insertion sortA f b

loopA f b Algorithm Quicksort

Inputs A f b as in Quicksort

Output A is p ermuted so that Ai  Aj

for all i j f  i j b and size threshold j i

while b f size threshold

of Af Afbf Ab do p PartitionA f b Median

if p f b p

then Quicksort loopA p b

bp

else Quicksort loopA f p

fp

The test p f b p is to ensure that the recursive call is on a subsequence of length no more

than half of the input sequence so that the stack depth is O log N rather than O N The

sort Median of and Partition are not imp ortant for this discussion details of Insertion

these routines are also used without change in Introsort

The key mo dication to quicksort to avoid quadratic worstcase behavior is to make the

algorithm aware of how many subproblem levels it is generating It can then avoid the

p erformance problem by switching to heapsort when the number of levels reaches a limit

According to Theorem below the depth limit must be O log N Although any choice for

the constant factor in this b ound will ensure an overall time b ound of O N log N it must not

b e so small that the algorithm calls heapsort to o frequently causing it to b e little b etter than

using heapsort in the rst place In the following pseudoco de the depth limit is blog N c

since this value empirically pro duces go o d results

Algorithm IntrosortA f b

Inputs A a random access data structure containing the sequence

of data to b e sorted in p ositions Af Ab

f the rst p osition of the sequence

b the rst p osition b eyond the end of the sequence

Output A is p ermuted so that Af  Af   Ab

Introsort loopA f b Floor lgb f

Insertion sortA f b

loopA f b depth limit Algorithm Introsort

Inputs A f b as in Introsort

depth limit a nonnegativeinteger

Output A is p ermuted so that Ai  Aj

for all i j f  i j b and size threshold j i

threshold while b f size

do if depth limit

then HeapsortA f b

return

depth limit depth limit

p PartitionA f b Median of Af Afbf Ab

Introsort loopA p b depth limit

bp

loop it is p ossible to omit the test for recursing on the smaller half of the In Introsort

partition since the depth limit puts an O log N bound on the stack depth anyway This

omission nicely osets the added test for exceeding the depthlimit keeping the algorithms

overhead essentially the same as Quicksorts We can skip the details of the Heapsort

algorithm since they are unimp ortantfor the time b ound claimed for Introsortas long as

Heapsortor any other algorithm used as the stopp er has an O N log N worstcase time

b ound

Theorem Assuming an O log N subproblem tree depth limit and an O N log N worstcase

time bound for Heapsort the worstcase computing time for Introsort is N log N

Pro of The time for partitioning is b ounded by N times the number of levels of partitioning

which is b ounded by the subproblem tree depth limit Hence the total time for partitioning is

O N log N Supp ose Introsort calls Heapsort j times on sequences of length n n

 j

Let c be a constant such that cN log N b ounds the time for Heapsort on a sequence of length

N then the time for all the calls of Heapsort is b ounded by

j j

X X

n cN log N cn log n c log N

i i i

i i

or O N log N also Therefore the total time for Introsort is O N log N

The lower bound is also N log N since a sequence such as an already sorted sequence

pro duces equal length partitions in all cases resulting in log N levels each taking N time

Implementation and Optimization

An imp ortant detail not indicated by the pseudoco de is the parameterization of the algorithms

by the typ e of data and the typ e of function used for comparisons Such parameterization

is a fairly common use of the C template feature but the Standard Template Library

go es further by also parameterizing on the sequence representation Sp ecically instead of

passing an array base address and using integer indexing the b oundaries of the sequence are

passed via iterators which are a generalization of ordinary CC p ointers The iterator typ e

is a template parameter of the algorithms as in The implementation of introsort is

parameterized in the same way for example the C source co de for the main function is

template class RandomAccessIterator class T class Distance

void introsortloopRandomAccessIterator first

RandomAccessIterator last T

Distance depthlimit

while last first stlthreshold

if depthlimit

partialsortfirst last last

return

depthlimit

RandomAccessIterator cut unguardedpartition

first last Tmedianfirst first last first

last

introsortloopcut last valuetypefirst depthlimit

last cut

STL denes ve categories of iterators the most powerful b eing randomaccess iterators for

which i n for iterator i and integer n is a constant time op eration Introsort like quicksort and

heapsort requires randomaccess iterators which means that it cannot b e used with linked lists

However it is not restricted just to arrays in STL randomaccess iterators are also provided

by vectors and deques which are like arrays in most resp ects but dynamically expandable at

one or at b oth ends resp ectively

For heapsort the implementation calls the STL generic partial sort algorithm Similarly

the call to insertion sort is co ded as a call to an internal insertion sort routine already provided

in the HP STL co de

Performance Measurements

To compare the computing time p erformance of algorithms that have the same asymptotic time

b ounds as is the case here wemust lo ok for more discriminating ways of describing them For

this purp ose it is traditional to use formulas that express the actual time an algorithm uses or

the number of op erations it do es in the worst or average case when applied to any input in

a certain class Actual times are the most accurate description but may only apply to a single

hardware architecture and compiler conguration Op eration counts are more p ortableexcept

for dep endence in some cases on compiler optimizationsbut they may b ear little relation to

actual times when only certain op erations are counted such as only counting the comparison

op erations done by a sorting algorithm For example in STL heapsort present as a sp ecial

case of its partial sort generic algorithm do es fewer comparisons than the STL medianof

quicksort version sort but its actual computing time is usually signicantly higher b ecause

it do es more assignments and other op erations And since these algorithms are generic they

access most of the op erations they do through typ e parameters and thus the relative cost of

dierent kinds of op erations can vary when they are sp ecialized with dierenttyp es

With algorithms that are generic in the sequence representation as in STL the number of

iterator operations and their cost relativeto comparisons and assignments are also imp ortant

As discussed in more detail at the end of this section in the deque representation in the HP

STL implementation most iterator op erations cost several times as much as vector or array

iterator op erations and this fact is signicant in assessing the relative p erformance of dierent

variants of introsort

A fourth kind of op erations that app ears in these algorithms is distance operations which

are arithmetic op erations on integer results of iterator subtractions such as the division byin

the expression last first The cost of a distance op eration is mostly indep endentof

the data or iterator typ es but it is still useful to knowhowmanysuch op erations are p erformed

Heapsort for example do es many more distance op erations than quicksort or introsort which

along with its higher numb er of iterator op erations accounts for its p o orer p erformance in most

cases

Thus the p erformance tables below do not show actual times but instead give separate

counts for comparison assignment iterator and distance op erations Table shows the op er

ation counts for introsort quicksort and heapsort on randomlyordered sequences ranging in

size from to elements Eachvalue is the median of measurements on seven ran

dom sequences The table shows that on such sequences the op eration counts for introsort are

almost identical to those of quicksort Heapsort comes in with an approximately smaller

comparison count but do es ab out times as many assignments times as many iterator

op erations and more than times as many distance op erations The last column gives the to

tal numb er of op erations which is a guide to how actual computing times compare when each

kind of op eration takes ab out the same time as for example with arrays containing integer

elements The total for heapsort is more than times the total for introsort or quicksort

Table shows the op eration counts for the three algorithms on the medianof killer se

quences For quicksort the op eration total grows quadratically as predicted by the analysis

Note also that on these sequences introsort is somewhat slower than heapsort b ecause it p er

forms blog N c partitions b efore switching to heapsort on a sequence of length N blog N c

Thus the time for the heapsort call within introsort is almost as great as the heapsort time on

the original sequence but introsort adds to this the time for the partitions

There are also some sequences on whichintrosort fares worse than quicksort though never

so dramatically as the dierences in the other direction Such sequences T can b e constructed

N

by taking K and randomly shuing the elements in p ositions blog N c through N and

N

Nblog N c through N call such sequences twofaced On T introsort switches to

N

heapsort after blog N c partitions but since the heapsort call is on a random sequence it takes

longer than continuing to partition would

Table Performance of Introsort Quicksort and Heapsort on Random Sequences Sizes and

Op erations Counts in Multiples of

Size Algorithm Comparisons Assignments Iterator Ops Distance Ops Total Ops

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Exp eriments were also p erformed with an introsort version in which the insertion sort call

is placed at the end of Introsort loop where it do es many separate calls on small subarrays

rather than at the end of Introsort where it do es one nal pass over the entire array The

onenalpass p osition is one of the quicksort optimizations suggested by Sedgewick But

with mo dern memory caches the savings in overhead may b e cancelled or outweighed for large



arrays since the complete pass at the end can double the number of misses However

in these exp eriments the manyinsertionsortcall version almost never ran faster than the one

nalpass version and in some cases ran considerably slower One imp ortant case is sorting

a deque whose implementation in HP STL is in terms of a twolevel structure consisting of

a blo ck of pointers to xedsized blo cks This representation supp orts random access but

iterator op erations are slower than array or vector iterator op erations by a constant factor

typically three to ve Measurements show that the manycall version p erforms ab out more

iterator op erations than the onecall version causing it to take more time when sorting

a deque Thus the onecall version is denitely to b e preferred when sorting a deque and the

only question is whether the manycall version should b e used for arrays or vectors giving up

some genericity For now it seems b etter to retain only the onecall version since the cases

in which the manycall version might be faster are rare but this decision should probably be

revisited as relativecachemiss p enalties continue to increase

Finallyitmay b e of some interest that the op eration counts were obtained without mo di

fying the algorithm source co de at all by sp ecializing their typ e parameters with classes whose



In Introsort the manycall version also has the advantage that insertion sorting is never done on a region

already sorted by Heapsort although this is relatively unimp ortantsince Heapsort is very rarely called and

insertionsorting an already sorted arrayisvery fast anyway

Table Performance of Introsort Quicksort and Heapsort on Medianof Killer Sequences

Sizes and Op erations Counts in Multiples of

Size Algorithm Comparisons Assignments Iterator Ops Distance Ops Total Ops

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

Introsort

Quicksort

Heapsort

op erations have counters built into them This is an obvious technique for counting element

typ e comparison and assignment op erations but iterator op erations and even distance op era

tions can be counted in the same way since STL generic algorithms also access them through

typ e parameters The most useful way of dening a counting class is as an adaptor which

is a generic comp onent that takes another comp onent and outts it with a new interface or

provides the same interface but with dierent or additional b ehindthescenes functionality

Acounting adaptor is an example of adding b oth to the interface for initializing and rep orting

counts and internal functionality incrementing the counts

Introsp ective Selection Algorithms

Hoares nd algorithm for selecting the ismallest element of a sequence is similar to quick

sort but only one of the two subproblems generated by partitioning must be pursued With

even splits the computing time is O N NN or O N But the same medianof

killer sequences describ ed in Section cause the to do N partitions when

nding the median and the computing time thus b ecomes N

If as in introsort we limit the depth to a logarithmic b ound the average time remains linear

but the worst case is N log N We could get an overall linear worstcase b ound by putting a

constant b ound on partitioning depth but that would mean that for suciently large sequences

wewould switch to the stopp er algorithm on most inputs even though Hoares algorithm would

run faster A b etter approach is to require that the sequence size is cut at least in half byany

k consecutive partitions for some p ositiveinteger k if this test fails then switch to the linear

time algorithm This limits the partitioning depth to k dlog N e and the total time to O N

Another approachthatachieves this time b ound is limiting the sum of the sizes of all partitions

generated to some constant times N

The details of such introselect algorithms and the results of exp eriments comparing them

with other selection algorithms will b e given in a separate pap er

Implications for Generic Software Libraries

Let us conclude by noting the p ossible advantages and disadvantages of introsort and introselect

relative to other sorting and selection algorithms included in a generic software library suchas

STL

STL is not a set of sp ecic software comp onents but a set of requirements which comp o

nents must satisfy By making part of the requirements for comp onents STL

insures that compliant comp onents not only have the sp ecied interfaces and semantics but also

meet certain computing time b ounds The complexity requirements were chosen based on the

characteristics of the b est algorithms and data structures known at the time the requirements

were prop osed Whenever p ossible the requirements are expressed as worstcase time b ounds

Requiring O N log N worstcase time for the generic sort algorithm would have meant

sort could b e used but would have precluded quicksort And of course only requiring O N

worstcase time would have p ermitted insertion sort or even to be used So in

this case the requirements only stipulate O N log N average time in order to allow quicksort

to be used for this comp onent The sp ecication then app ends the caveat if the worstcase

b ehavior is imp ortant stable sort or partial sort should be used In view of introsorts

combination of parity with quicksort on most sequences while meeting an O N log N worst

case b ound the STL requirement could b e nowbestbeexpressedasan O N log N worstcase

b ound with no need for the caveat

Is there nowany need to have heapsort publicly available in a generic library Yes if as in

STL it is present in a more general form called partial sort capable of placing the smallest

k elements of a sequence of length N in the rst k p ositions leaving the remaining N k

elements in an undened order Its time complexity requirementisO N log k whichisnotas

go o d as could be done using a linear selection algorithm to place the k smallest elements at

the b eginning of the array and sorting them with heapsort for a total time of O N k log k

However it do es p ermit an algorithm that makes a heap out of the rst k elements then

maintains in it the k smallest elements of the i elements seen so far as i go es to N and nally

sorts the heap This algorithm which is used in HP STL is faster on the average than the

selectandsort algorithm

Introsort likequicksort is not stabledo es not preserve the order of equivalent elements

so there is still a need to have a separate requirement for a stable sorting routine In STL the

requirements for stable sort are written to b e satisable by a algorithm

A p ossible disadvantage of introsort is the extra co de space required roughly double the

requirement for quicksort or heapsort alone In a compiletime generic library such as STL this

problem is exacerbated when many dierent instances of a generic comp onent are required in

a single application program The problem fades away however if there is a separate need for

partial sort since the co de for introsort can just call it and need not include it inline

Finallycombining introsp ection with partitioning metho ds other than medianof should

b e explored For example it can also b e applied to partitioning using the simpler choice of the

rst element as the pivot the resulting algorithm still has a N log N time b ound However

b ecause the simpler choice of pivot do es not partition as evenly as medianof partitioning

do es the average time is higher Another p ossibility is to apply depth limiting to the algorithm

describ ed in which adaptively selects from larger arrays a larger sample of elements from

which to estimate the median Exp erimentation with this algorithm and other variations of

will b e the sub ject of a future pap er

Acknowledgments C Stewart suggested the idea for the class of sequences K J Valois

N

and twoanonymous referees made many useful comments on an earlier draft of this pap er

References

C A R Hoare Quicksort Computer Journal

R C Singleton Communications of the ACM

J W J Williams Algorithm heapsort Communications of the ACM

D R Musser and A A Stepanov Algorithmoriented generic libraries Software Practice

and Exp erience July

A A Stepanov and M Lee The Standard Template Library Technical Rep ort HPL

HewlettPackard Lab oratories May revised Octob er incorp orated

into Accredited Standards Committee X American National Standards Institute Infor

mation Pro cessing Systems Working Pap er for Draft Prop osed International Standard for

Information Pro cessing SystemsProgramming Language C Do c No XJ

WGN April

D R Musser and A Saini STL Tutorial and Reference Guide C Programming with

the Standard Template Library AddisonWesley Reading MA

C A R Hoare Algorithm partition and algorithm nd Communications of the

ACM

M Blum R W Floyd V Pratt R L Rivest and R E Tarjan Time b ounds for selection

Journal of Computer and System Sciences

J L Bentley and M D McIlroy Engineering a sort function Software Practice and Ex

p erience November

C V Stewart Private communication

M Li and P M B Vitanyi Average case complexity under the universal distribution

equals worstcase complexity Information Pro cessing Letters

A A Stepanov M Lee and D R Musser HewlettPackard Lab oratories refer

ence implementation of the Standard Template Library source les available from

ftpftpcsrpiedupubstl

T H Cormen C E Leiserson and R L Rivest Intro duction to AlgorithmsMITPress

Cambridge MA

R Sedgewick Implementing quicksort programs Communications of the ACM

A LaMarca and R E Ladner The inuence of caches on the p erformance of sorting

Pro ceedings of Eighth Annual ACMSIAM Symp osium on Discrete Algorithms January