Compsci 201 Priority Queues & Autocomplete

Owen Astrachan, Jeff Forbes
November 15, 2017

11/15/17 Compsci 201, Fall 2017, PQ + Compare

U is for …
• URL and URI
• Uniform Resource (Locator and Identifier)
• Usenet
• p2p original source of FAQ, Flame, Spam
• Unix
• Before there was Linux, …
• User Interface, UI, UX
• User is the heart and soul

Plan for the Day
• Where are we? Where are we going? What’s left?
• Review of PQs+Heaps: implementation and API
• PQ API is a key to Autocomplete
• Software Design and Software Testing
• Autocomplete, algorithms, trade-offs
• Testing, debugging, understanding

Work [todate | todo]
• APTQuiz2: Median 30, Mean 24.6
• APTQuiz1: Median 30, Mean 25.5
• Midterm 1: Median 78%, Mean 75%
• Midterm 2: Median 87%, Mean 84%
• Assignments: 5 of 6 out
• APT: one more set, one more quiz

One path, two paths

Heap Review
• Used to implement priority queues efficiently
• Binary tree implemented in array: indexes!
• Heap shape and heap property; children of index k are at 2*k and 2*k+1 (root at index 1)
• Minimal element: O(1) peek and O(log N) poll
• Change definition of min: max-heap!
• How to compare elements?
• Comparable or Comparator!

A Sore by any other name …
• String implements Comparable<String>
• We know there’s a .compareTo method
• What if we want to change how strings are compared, e.g., by length or lowercase or …
• Sometimes we don’t have access to the class, or sub-classing is not a good idea
• Some classes are final, no sub-classing!
• Some classes are not designed for sub-classing

Comparator or Raptor Coma?
• Must implement .compare(T a, T b)
• Different than .compareTo, similar as well
• Return < 0 when a < b
• Return == 0 when a == b
• Return > 0 when a > b
• You write code to determine what this means!
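The .compare contract above can be illustrated with a hand-written Comparator. This is a minimal sketch, not code from the course repository: the class name LengthThenAlpha and the sample words are made up for illustration. It orders strings by length and falls back on String.compareTo for ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical example: orders strings by length, ties broken alphabetically.
class LengthThenAlpha implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        // Negative when a is shorter than b, positive when a is longer.
        int diff = Integer.compare(a.length(), b.length());
        if (diff != 0) {
            return diff;
        }
        // Same length: use String's natural (compareTo) ordering.
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        List<String> words =
            new ArrayList<>(Arrays.asList("pear", "fig", "banana", "kiwi"));
        Collections.sort(words, new LengthThenAlpha());
        System.out.println(words); // prints [fig, kiwi, pear, banana]
    }
}
```

Because the ordering lives in its own class, String itself is untouched, which matters since String is final and cannot be sub-classed.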

Something Old, Something New
• Create class that implements Comparator<T>
• Write the .compare(T a, T b) method
• Create Comparator object, use it
• Java 8: New tricks: Create classes “anonymously” by calling Comparator.comparing
• Pass method names as parameters
https://coursework.cs.duke.edu/201fall17/sortall/blob/master/src/PersonSorter.java

PersonSorter.java

    static class Person implements Comparable<Person> {
        String first;
        String last;
        public String getLast(){
            return last;
        }
        // more here

https://coursework.cs.duke.edu/201fall17/sortall/blob/master/src/PersonSorter.java
• Changing how people are compared (run it)

    Comparator<Person> comp =
        Comparator.comparing(Person::getFirst)
                  .thenComparing(Person::getLast);
    Collections.sort(list,comp);

WOTO
http://bit.ly/201fall17-nov15-compare
• What is Comparator.comparing?
• Creating Comparator at runtime

It's time for Autocomplete

What is Autocomplete?
• 40,000 queries/second, thousands of computers, 0.2 seconds to answer query

Geolocating Heaven…

Data Structure for Autocomplete
• We'd like the "best" or "top" matching Terms
• Each Term is a (word, weight) pair
• If we sort by weight, we get the best easily!
• Priority Queues help
• Find "best" element
• Comparator!
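To make the (word, weight) idea concrete, here is a small sketch of finding the "best" Term with a priority queue and a Comparator. The Term class below is a hypothetical stand-in (the slides do not show the real Term API), and the names TermPQDemo and heaviest are invented for this example.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class TermPQDemo {
    // Hypothetical stand-in for a Term: a (word, weight) pair.
    static class Term {
        final String word;
        final double weight;

        Term(String word, double weight) {
            this.word = word;
            this.weight = weight;
        }

        String getWord() { return word; }
        double getWeight() { return weight; }
    }

    // Returns the word of the heaviest term, or null if terms is empty.
    static String heaviest(Term[] terms) {
        Comparator<Term> byWeight = Comparator.comparingDouble(Term::getWeight);
        // Java's PriorityQueue keeps the minimum at the head, so reverse
        // the comparator to make the heaviest term the head instead.
        PriorityQueue<Term> pq = new PriorityQueue<>(byWeight.reversed());
        for (Term t : terms) {
            pq.add(t);
        }
        Term best = pq.peek(); // O(1) look at the head; null when empty
        return best == null ? null : best.getWord();
    }

    public static void main(String[] args) {
        Term[] terms = {
            new Term("bee", 5.0), new Term("beer", 20.0), new Term("been", 12.0)
        };
        System.out.println(heaviest(terms)); // prints beer
    }
}
```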

Tradeoffs in Autocomplete
• Bruteforce: look at every (word, weight) pair
• Find heaviest/best ones after looking at all N
• Binary Search: search efficiently for prefixes
• Find these candidates, choose best from M
• Trie Search: search really efficiently for prefixes
• Search tree-like structure, choose best from M

Overview of Approaches
• Have N total terms, want k best matches from M
• N is millions; k is 10’s; M is hundreds…
• We want k "heav…" prefix matches, from M of N
• We can organize the N elements to find M
• ITRW (in the real world) we'd do lots/different organizing
• Bruteforce: search for M matching terms, sort M
• O(N) to search, O(M log M) to sort, get top k

How to get organized
https://www.youtube.com/watch?v=1ve57l3c19g

Bruteforce made smarter
• Suppose we insert all N elements into a priority queue ordered by weight, limited to M elements
• PQ contains M elements, minimum first
• Default PQ in Java, can specify comparator
• Example: M = 4, add 10,70,30,40,20,60,50
• After adding 20, drop the minimum (10) using pq.remove()
• After adding 60, drop the minimum (20) using pq.remove()
• Contains largest M elements seen so far

Quantifying Improvements
• PQ changes O(N + M log M) to O(N log M)
• Find M, then sort versus PQ insertion. Doesn't take constant factors into account
• Still typically faster when N > M
https://coursework.cs.duke.edu/201fall17/sortall/blob/master/src/TopMSorting.java
• Where are heaviest/best when sorting?
• Last M: Collections.sort(...)
• First M: .sort(...,comp.reversed());

Sort N take top k
• Why use reversed comparator? Alternative?
• If we used comp, greatest/top k at end
• Could still use subList to return these!
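The alternative raised in the bullets above (sort with comp itself, then take the greatest elements from the end with subList) might look like the following sketch. The class and method names are hypothetical and this is not the repository's code. It assumes mSize is non-negative, and it copies and reverses the tail so the best element still comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class TopMFromEnd {
    // Hypothetical variant: sort ascending by comp, then take the LAST
    // mSize elements. Assumes mSize >= 0.
    static List<String> sortTopMFromEnd(List<String> list, int mSize,
                                        Comparator<String> comp) {
        List<String> copy = new ArrayList<>(list);
        Collections.sort(copy, comp);
        int n = copy.size();
        int m = Math.min(mSize, n); // guard against asking for more than n
        // The greatest m elements sit at indexes [n - m, n).
        List<String> top = new ArrayList<>(copy.subList(n - m, n));
        Collections.reverse(top);   // put the greatest element first
        return top;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi", "apple");
        System.out.println(
            sortTopMFromEnd(words, 2, Comparator.naturalOrder())); // prints [pear, kiwi]
    }
}
```

One difference from sorting with comp.reversed(): Collections.sort is stable, so elements that compare as equal can come out in a different relative order between the two approaches.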
https://coursework.cs.duke.edu/201fall17/sortall/blob/master/src/TopMSorting.java

    public static List<String> sortTopM(List<String> list, int mSize,
                                        Comparator<String> comp){
        List<String> copy = new ArrayList<>(list);
        // reversed comparator: greatest elements come first
        Collections.sort(copy,comp.reversed());
        return copy.subList(0, mSize);
    }

Use limited-PQ for top k
https://coursework.cs.duke.edu/201fall17/sortall/blob/master/src/TopMSorting.java
• Remove/poll min when pq size exceeds mSize
• Why do we remove smallest seen so far?
• Why do we use LinkedList and addFirst?

    public static List<String> pqTopM(List<String> list, int mSize,
                                      Comparator<String> comp) {
        PriorityQueue<String> pq = new PriorityQueue<>(comp);
        for(String s : list) {
            pq.add(s);
            if (pq.size() > mSize)
                pq.remove();
        }
        LinkedList<String> ret = new LinkedList<>();
        while (pq.size() > 0)
            ret.addFirst(pq.remove());
        return ret;
    }

No organization in BruteForce
• Always search through all N (word, weight) terms
• Even to find the best 10, or the best 100 of M
• New Query? Search again, no improvement
• We can organize data to facilitate prefix search!
• Binary search through a sorted list
• Trie data structure to help with prefixes
• Both better in theory, in practice? It depends

Binary Search in Autocomplete
• Given "beenie" and prefix of 3, find M matches
• Find first "bee.." and last "bee.."
• Sort these M elements by weight! Done
• O(log N) to find first and last, O(M log M) to sort

BinarySearch (Autocomplete)
• Sort all N elements: cost O(N log N)
• Find the M prefix-matching (weight, word) pairs in O(log N), using binary search for firstIndex and lastIndex (the matches are adjacent)
• Take the top k of these
• More queries? Sort once! Sorting cost amortized
• Carefully code binary search to find first/last
• Top k after sorting: O(M log M)
• Use limited PQ?: O(M log k)

Summary of Two Approaches
• Bruteforce: O(N + M log M) or O(N log M)
• Binary search: O(log N + M log M)*
• Requires initial sort of O(N log N), only once!
• We are willing to sort once, recoup $$ over time
• Which is better?
• What if we do LOTS of queries, say Q of them?
• Comparing Q*N to Q*log N

One compare, cut list in half!
binary search

Finding the firstIndex
• Use Collections.binarySearch
• Code below doesn't check index < 0
• Why is this O(N) in worst case?

    public static int firstIndex(String[] values, String target,
                                 Comparator<String> comp) {
        List<String> list = Arrays.asList(values);
        int index = Collections.binarySearch(list,target,comp);
        // walk backward while elements still match target
        while (0 <= index && comp.compare(list.get(index),target) == 0) {
            index -= 1;
        }
        return index+1;
    }

Start with code and change, …
• Do not use this reference to achieve O(log N)
http://stackoverflow.com/questions/6676360/first-occurrence-in-a-binary-search
• One idea: find standard code and mess with it until it works

How to develop loops
• David Gries: The Science of Programming
• Edsger Dijkstra: A Discipline of Programming

Reasoning about code
https://coursework.cs.duke.edu/201fall17/sortall/blob/master/src/Looper.java
A. Runs forever
B. Exhausts memory and stops
C. Prints ~ (2 billion)
D. Prints ~ (-2 billion)

    public class Looper {
        public static void main(String[] args){
            int x = 0;
            while (x < x + 1) {
                x = x + 1;
            }
            System.out.println("value of x = "+x);
        }
    }

Reasoning with Logic
• While loop test is a boolean expression
• Negation must be true when loop exits, why?
• Other boolean expressions aid loop development
• Loop invariant: true when loop test checked
• Use invariant and loop guard to develop loop
• Reason semi-formally about loops

Better than late-night coding?
• Proving code correct?
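As one possible illustration of developing a loop from an invariant, here is a sketch of a firstIndex that stays O(log N) even when many elements match, because it never walks backward through a run of equal elements. It is not the course's solution: the class name FirstIndexSketch is made up, the method returns -1 when nothing matches (a choice made for this sketch), and values is assumed to be sorted according to comp.

```java
import java.util.Comparator;

class FirstIndexSketch {
    // Returns the smallest index i with comp.compare(values[i], target) == 0,
    // or -1 if no element matches. Assumes values is sorted by comp.
    static int firstIndex(String[] values, String target,
                          Comparator<String> comp) {
        int low = 0;
        int high = values.length;
        // Invariant: every index < low holds an element less than target;
        //            every index >= high holds an element >= target.
        while (low < high) {
            int mid = low + (high - low) / 2; // low <= mid < high
            if (comp.compare(values[mid], target) < 0) {
                low = mid + 1;  // values[mid] < target, invariant still holds
            } else {
                high = mid;     // values[mid] >= target, invariant still holds
            }
        }
        // Guard is false and low <= high always held, so low == high:
        // low is the first index whose element is >= target, if any.
        if (low < values.length && comp.compare(values[low], target) == 0) {
            return low;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] values = {"ant", "bee", "bee", "bee", "cat"};
        System.out.println(
            firstIndex(values, "bee", Comparator.naturalOrder())); // prints 1
    }
}
```

The interval high - low shrinks on every iteration, which is what bounds the loop at O(log N) comparisons; the negated guard combined with the invariant is what justifies the final check.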
