SI 335, Unit 7: Advanced Sort and Search Daniel S. Roche (
[email protected]) Spring 2016 Sorting’s back! After starting out with the sorting-related problem of computing medians and percentiles, in this unit we will see two of the most practically-useful sorting algorithms: quickSort and radixSort. Along the way, we’ll get more practice with divide-and-conquer algorithms and computational models, and we’ll touch on how random numbers can play a role in making algorithms faster too. 1 The Selection Problem 1.1 Medians and Percentiles Computing statistical quantities on a given set of data is a really important part of what computers get used for. Of course we all know how to calculate the mean, or average of a list of numbers: sum them all up, and divide by the length of the list. This takes linear-time in the length of the input list. But averages don’t tell the whole story. For example, the average household income in the United States for 2010 was about $68,000 but the median was closer to $50,000. Somehow the second number is a better measure of what a “typical” household might bring in, whereas the first can be unduly affected by the very rich and the very poor. Medians are in fact just a special case of a percentile, another important statistical quantity. Your SAT scores were reported both as a raw number and as a percentile, indicating how you did compared to everyone else who took the test. So if 100,000 people took the SATs, and you scored in the 90th percentile, then your score was higher than 90,000 other students’ scores.