RadixSort SortingintheC++STL

CS311DataStructuresandAlgorithms LectureSlides Monday,March16,2009

GlennG.Chappell DepartmentofComputerScience UniversityofAlaskaFairbanks [email protected]

©2005–2009GlennG.Chappell UnitOverview AlgorithmicEfficiency&Sorting

MajorTopics • IntroductiontoAnalysisofAlgorithms • IntroductiontoSorting • ComparisonSortsI • MoreonBigO • TheLimitsofSorting • DivideandConquer • ComparisonSortsII • ComparisonSortsIII • RadixSort • SortingintheC++STL

16Mar2009 CS311Spring2009 2 Review IntroductiontoAnalysisofAlgorithms

Efficiency • General:usingfewresources(time,space,bandwidth,etc.). • Specific:fast(time). AnalyzingEfficiency • Determinehowthe size of the input affectsrunningtime,measuredin steps ,inthe worst case . Scalable • Workswellwithlargeproblems.

Using Big-O In Words Cannotread O(1) Constanttime allofinput O(log n) Logarithmictime Faster O(n) Lineartime O(n log n) Loglineartime Slower 2 Usually too O(n ) Quadratictime slowtobe O(bn),forsome b >1 Exponentialtime scalable

16Mar2009 CS311Spring2009 3 Review IntroductiontoSorting— Basics,Analyzing

Sort :Placealistinorder. 3 1 5 3 5 2 Key :Thepartofthedataitemusedtosort.

Comparison sort :Asortingalgorithm x x

Fivecriteriaforanalyzingageneralpurposecomparisonsort: • (Time)Efficiency In-place =nolargeadditional • RequirementsonData storagespacerequired (constantadditionalspace). • SpaceEfficiency • Stability 1.Allitemsclosetoproper • PerformanceonNearlySortedData places,OR 2.fewitemsoutoforder. 16Mar2009 CS311Spring2009 4 Review IntroductiontoSorting— OverviewofAlgorithms

Thereisno known sortingalgorithmthathasallthepropertieswe wouldlikeonetohave. Wewillexamineanumberofsortingalgorithms.Mostofthesefall intotwocategories: O(n2)and O(n log n). • QuadraticTime[ O(n2)]Algorithms  BubbleSort  SelectionSort  InsertionSort  Treesort(laterinsemester) • LogLinearTime[ O(n log n)]Algorithms  MergeSort (some) HeapSort(mostlylaterinsemester)  (notinthetext) • SpecialPurpose— NotComparisonSorts PigeonholeSort RadixSort

16Mar2009 CS311Spring2009 5 Review ComparisonSortsIII— Quicksort:Description

Quicksort isanotherdivideandconquer algorithm.Procedure: • Choosesalistitem(the pivot ). 3 1 5 3 5 2 3 • Doa Partition :putitemslessthanthe Pivot pivotbeforeit,anditemsgreaterthan Partition thepivotafterit. • Recursivelysorttwosublists:items 2 1 3 3 5 5 3 beforepivot,itemsafterpivot. Pivot Sort Sort Wedidasimplepivotchoice:the (recurse) (recurse) firstitem.Later,weimprovethis. 1 2 3 3 3 5 5 FastPartitionalgorithmsareinplace,but notstable. • Note:InplacePartitiondoesnotgiveus aninplaceQuicksort.Quicksortuses memoryforrecursion.

16Mar2009 CS311Spring2009 6 Review ComparisonSortsIII— Quicksort:Improvements

UnoptimizedQuicksortisslow(quadratictime)onnearlysorted dataandusesalotofspace(linear)forrecursion. ThreeImprovements InitialState: • Medianofthreepivotselection. We 2 12 9 10 3 1 6 Improvesperformanceon most wrote Pivot nearlysorteddata. these AfterPartition: two. Requiresrandomaccessdata. • Tailrecursioneliminationon 2 1 3 6 10 9 12 largerrecursivecall. Reducesspaceusagetologarithmic. • Donotsortsmallsublists;finishwithInsertionSort. Generalspeedup. Mayadverselyaffectcachehits. Withtheseoptimizations,Quicksortisstill O(n2)time.

16Mar2009 CS311Spring2009 7 Review ComparisonSortsIII— Quicksort:Analysis

Efficiency • Quicksortis O(n2). • Quicksorthasa very good O(n log n)averagecasetime. ☺☺ RequirementsonData • Nontrivialpivotselectionalgorithms(medianofthreeandothers) areonlyefficientforrandomaccessdata. SpaceUsage • Quicksortusesspaceforrecursion. Additionalspace: O(log n),ifyouarecleveraboutit. Evenif all recursioniseliminated, O(log n)additionalspaceisused. Thisadditionalspaceneednotholdanydataitems. Stability • EfficientversionsofQuicksortarenotstable. PerformanceonNearlySortedData • AnonoptimizedQuicksortis slow onnearlysorteddata: O(n2). • MedianofthreeQuicksortis O(n log n)onmostnearlysorteddata.

16Mar2009 CS311Spring2009 8 Review ComparisonSortsIII— Introsort:Description

In1997,DavidMusserfoundouthowtomakeQuicksortloglinear time. • Keeptrackoftherecursiondepth.

• Ifthisexceedssomebound(recommended:2log 2n),thenswitchto HeapSortforthecurrentsublist. HeapSortisageneralpurposecomparisonsortthatisloglineartime andinplace.Wewilldiscussitindetaillaterinthesemester. • Mussercallsthistechnique“introspection”.Thus,introspective sorting,or Introsort .

16Mar2009 CS311Spring2009 9 Review ComparisonSortsIII— Introsort:Diagram

HereisanillustrationofhowIntrosortworks. • Inpractice,therecursionwillbedeeperthanthis. • TheInsertionSortcall might notbedone,duetoitseffectoncachehits. Now,thelistis Introsort nearlysorted.Finish witha(lineartime!) Introsort-recurse InsertionSort[??]. LikeMo3Quicksort: FindMo3Pivot,Partition Tailrecursion

Introsort-recurse Introsort-recurse eliminated.(Butit LikeMo3Quicksort: LikeMo3Quicksort: stillcountstoward FindMo3Pivot,Partition FindMo3Pivot,Partition themaximum recursiondepth.) Introsort-recurse Introsort-recurse Whenthesublisttosortis LikeMo3Quicksort: LikeMo3Quicksort: verysmall,donotrecurse. FindMo3Pivot,Partition FindMo3Pivot,Partition InsertionSortwillfinishthe joblater[??]. Introsort-recurse Introsort-recurse LikeMo3Quicksort: LikeMo3Quicksort: RecursionDepthLimit FindMo3Pivot,Partition FindMo3Pivot,Partition

Whentherecursiondepthis Heap Sort Heap Sort toogreat,switchtoHeapSort tosortthecurrentsublist.

16Mar2009 CS311Spring2009 10 Review ComparisonSortsIII— Introsort:Analysis

Efficiency ☺☺ • Introsortis O(n log n). • Introsortalsohasanaveragecasetimeof O(n log n)[ofcourse]. ItsaveragecasetimeisjustasgoodasQuicksort. ☺☺ RequirementsonData • Introsortrequiresrandomaccessdata. SpaceUsage • Introsortusesspaceforrecursion(orsimulatedrecursion). Additionalspace: O(log n)— evenifallrecursioniseliminated. Thisadditionalspaceneednotholdanydataitems. Stability • Introsortisnotstable. PerformanceonNearlySortedData • Introsortisnotsignificantlyfasterorsloweronnearlysorted data.

16Mar2009 CS311Spring2009 11 RadixSort Background

Wehavelookedindetailatsixgeneralpurposecomparisonsorts. Nowwelookattwosortingalgorithmsthatdonotusea comparisonfunction: • PigeonholeSort. • RadixSort. Laterinthesemester,wewilllookcloseratHeapSort,which is a generalpurposecomparisonsort,butwhichcanalsobe convenientlymodifiedtohandleothersituations.

16Mar2009 CS311Spring2009 12 RadixSort Preliminaries:PigeonholeSort— Description

Supposewehavealisttosort,and: • Keyslieinasmallfixedsetofvalues. Not general • Keyscanbeusedtoindexanarray. purpose E.g.,theymightbesmallishnonnegativeintegers. Notevena Procedure comparisonsort • Makeanarrayofemptylists( buckets ),oneforeachpossiblekey. • Iteratethroughthegivenlist;inserteachitemattheendofthe bucketcorrespondingtoitsvalue. • Copyitemsineachbucket,inorder,backtotheoriginallist. Timeefficiency: linear time ,ifwrittenproperly. • Howisthispossible?Answer:Wearenotdoinggeneralpurpose comparisonsorting.Our(n log n)bounddoesnotapply. Thisalgorithmisoftencalled Pigeonhole Sort . • Itisnotverypractical;itrequiresaverylimitedsetofkeys. • PigeonholeSortisstable,anduseslinearadditionalspace.

16Mar2009 CS311Spring2009 13 RadixSort Preliminaries:PigeonholeSort— WriteIt

TODO • WriteafunctiontodoPigeonholeSort.

16Mar2009 CS311Spring2009 14 RadixSort Description

UsingPigeonholeSort,wecandesignapracticalalgorithm: Radix Sort . Supposewewanttosortalistof strings (insomesense): • Characterstrings. • Numbers,consideredasstringsofdigits. • Shortishsequencesofsomeotherkind. Calltheentriesinastring“ characters ”. • TheseneedtobevalidkeysforPigeonholeSort. • Inparticular,wemustbeabletousethemasarrayindices. Thealgorithmwillarrangethelistin lexicographic order . • Thismeanssortfirstbyfirstcharacter,thenbysecond,etc. • Forstringsofletters,thisisalphabeticalorder. • Forpositiveintegers(paddedwithleadingzeroes),thisisnumericalorder. RadixSortProcedure • PigeonholeSortthelistusingthe last characterasthekey. • TakethelistresultingfromthepreviousstepandPigeonholeSortit,using the next-to-last characterasthekey. • Continue… • Afterresortingby first character,thelistissorted.

16Mar2009 CS311Spring2009 15 RadixSort Example

Hereisthelisttobesorted. • 583 508 183 90 223 236 924 4 426 106 624

Wefirstsortthembytheunitsdigit,using Nonemptybuckets PigeonholeSort. areunderlined • 90 583 183 223 924 4 624 236 426 106 508 ThenPigeonholeSortagain,basedonthetensdigit, inastablemanner(notethatthetensdigitof4is0). • 4 106 508 223 924 624 426 236 583 183 90 Again,usingthehundredsdigit. • 4 90 106 183 223 236 426 508 583 624 924 Andnowthelistissorted.

16Mar2009 CS311Spring2009 16 RadixSort WriteIt,Comments

TODO • WriteRadixSortforsmallishpositiveintegers.

Comments • RadixSortmakesverystrongassumptionsaboutthevalues inthelisttobesorted. • Itrequireslinearadditionalspace. • Itisstable. • Itdoesnotperformespeciallywellorbadlyonnearlysorted data. • Ofcourse,whatwereallycareaboutisspeed. See the next slide.

16Mar2009 CS311Spring2009 17 RadixSort Efficiency[1/2]

HowFastisRadixSort? • Fixthenumberofcharactersandthecharacterset. • Theneachsortingpasscanbedoneinlineartime. PigeonholeSortwithonebucketforeachpossiblecharacter. • Andthereareafixednumberofpasses. • Thus,RadixSortis O(n): linear time . Howisthispossible? • RadixSortisasortingalgorithm.However,again,itis neithergeneralpurposenoracomparisonsort. Itplacesrestrictionsonthevaluestobesorted:notgeneral purpose. Itgetsinformationaboutvaluesinwaysotherthanmakinga comparison:notacomparisonsort. • Thus,ourargumentshowingthat(n log n)comparisons wererequiredintheworstcase,doesnotapply.

16Mar2009 CS311Spring2009 18 RadixSort Efficiency[2/2]

Inpractice,RadixSortisnotreallyasfastasitmightseem. • Thereisahiddenlogarithm.Thenumberofpassesrequiredisequal tothelengthofastring,whichissomethinglikethelogarithm of thenumberofpossiblevalues. • IfweconsiderRadixSortappliedtoalistinwhichallthevalues mightbe different ,thenitisinthesameefficiencyclassasnormal sortingalgorithms. However,incertainspecialcases(e.g.,biglistsofsmallnumbers) RadixSortcanbeausefultechnique.

100million recordstosortby ZIPcode?Radix Sortworkswell.

16Mar2009 CS311Spring2009 19 SortingintheC++STL SpecifyingtheInterface

Iteratorbasedsortingfunctionscanbespecifiedtwoways: • Givenarange “last ”isactuallyjustpasttheend,asusual. void sortIt(Iterator first, Iterator last);

• Givenarangeandacomparison. template void sortIt(Iterator first, Iterator last, Ordering compare);

“compare ”,above,shouldbesomethingyoucanusetocomparetwo values. • “compare(val1, val2) ”shouldbealegalexpression,andshouldreturna bool :trueif val1 comesbefore val2 (think“lessthan”). • So compare canbeafunction(passedasafunctionpointer). • Itcanalsobeanobjectwith operator() defined:a function object .

16Mar2009 CS311Spring2009 20 SortingintheC++STL OverviewoftheAlgorithms[1/4]

TheC++StandardTemplateLibraryhassixsortingalgorithms: • Globalfunction std::sort • Globalfunction std::stable_sort • Memberfunction std::list::sort • Globalfunctions std::partial_sort and partial_sort_copy . • Combinationoftwoglobalfunctions: std::make_heap & std::sort_heap Wenowlookbrieflyateachofthese.

16Mar2009 CS311Spring2009 21 SortingintheC++STL OverviewoftheAlgorithms[2/4]

Function std::sort ,in • Globalfunction. • Takestworandomaccessiteratorsandanoptionalcomparison. • O(n2),buthas O(n log n)averagecase.Notstable. Thisshouldbecome O(n log n)intheforthcomingrevisedC++standard. Itiscurrently O(n log n)ingoodSTLimplementations. • Algorithmused: Quicksortiswhatthestandardscommitteewasthinking. Introsortiswhatgoodimplementationsnowuse. Otheralgorithms(HeapSort?)arepossible,butunlikely. Function std::stable_sort ,in • Globalfunction. • Takestworandomaccessiteratorsandanoptionalcomparison. • O(n log n).Stable. • Algorithmused:probablyMergeSort,generalsequenceversion.

16Mar2009 CS311Spring2009 22 SortingintheC++STL OverviewoftheAlgorithms[3/4]

Function std::list::sort ,in • Memberfunction.Sortsonlyobjectsoftype std::list . • Takeseithernoparametersoracomparison. • O(n log n).Stable. • Algorithmused:probablyMergeSort,LinkedListversion.

16Mar2009 CS311Spring2009 23 SortingintheC++STL OverviewoftheAlgorithms [4/4]

WewilllookatthelasttwoSTLalgorithmsinmoredetaillater in thesemester,whenwecoverPriorityQueuesandHeaps: • Functions std::partial_sort and std::partial_sort_copy ,in Globalfunctions. Takethreerandomaccessiteratorsandanoptionalcomparison. O(n log n).Notstable. Solveamoregeneralproblemthancomparisonsorting. Algorithmused:probablyHeapSort. • Combination: std::make_heap & std::sort_heap ,in BothGlobalfunctions. Bothtaketworandomaccessiteratorsandanoptionalcomparison. Combinationis O(n log n).Notstable. Solvesamoregeneralproblemthancomparisonsorting. Algorithmused:almostcertainlyHeapSort.

16Mar2009 CS311Spring2009 24 SortingintheC++STL UsingtheAlgorithms[1/2]

Algorithm std::sort isdeclaredintheheader . Callitwithtwoiterators: vector v; Defaultconstructorcall. std::sort(v.begin(), v.end()); Wecanonlypassan // Ascending order object ,nota type .

Orusetwoiteratorsandacomparison: std::sort(v.begin(), v.end(), std::greater()); // Descending order

• Classtemplate std::greater isdefinedin . Use std::stable_sort similarlyto std::sort .

16Mar2009 CS311Spring2009 25 SortingintheC++STL UsingtheAlgorithms[2/2]

Whensortinga std::list ,usethe sort memberfunction:

#include std::list myList; myList.sort(); // Ascending order myList.sort(std::greater()); // Descending order

16Mar2009 CS311Spring2009 26