Algorithms ROBERT SEDGEWICK | KEVIN WAYNE

Algorithms ROBERT SEDGEWICK | KEVIN WAYNE COLLECTIONS, IMPLEMENTATIONS, COLLECTIONS, IMPLEMENTATIONS, PRIORITY QUEUES PRIORITY QUEUES ‣ collections ‣ collections ‣ priority queues, sets, symbol tables ‣ priority queues, sets, symbol tables Algorithms ‣ heaps and priority queues Algorithms ‣ heaps and priority queues FOURTH EDITION ‣ heapsort ‣ heapsort ‣ event-driven simulation (optional) ‣ event-driven simulation (optional) ROBERT SEDGEWICK | KEVIN WAYNE ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu http://algs4.cs.princeton.edu Collections Terminology “The difference between a bad programmer and a good one is whether Abstract Data Type (ADT) [the programmer] considers code or data structures more important. Bad ・A set of abstract values, and a collection of operations on those values. programmers worry about the code. Good programmers worry about ・Operations: data structures and their relationships.” – Queue: enqueue, dequeue — Linus Torvalds (creator of Linux) – Stack: push, pop – Union-Find: union, find, connected Things we might like to represent ・Sequences of items. Example: queue of integers ・Sets of items. ・A sequence of integers ・Mappings between items, e.g. jhug’s grade is 88.4 – Mathematical sequence, not any particular data structure! ・create: returns empty sequence. ・enqueue x: puts x on the right side of the sequence. ・dequeue x: removes and returns the element on the left-hand side of the sequence. For more: COS326 3 4 Terminology Terminology Abstract Data Type (ADT) Implementation ・A set of abstract values, and a collection of operations on those values. ・Data structures are used to implement ADTs. ・Choice of data structure may involve performance tradeoffs. Collection – Worst case vs. average case performance. ・An abstract data type that contains a collection of data items. – Space vs. time. ・Restricting ADT capabilities may allow better performance. [stay tuned] Data Structure ・A specific way to store and organize data. Examples ・Can be used to implement ADTs. ・Queue ・Examples: Array, linked list, binary tree. – Linked list – Resizing array ・Randomized Queue – Linked list (slow, but you can do it!) – Resizing array 5 6 List (Java terminology) (some) List operations operation parameters returns effect contains Item boolean checks if an item is in the list OLLECTIONS MPLEMENTATIONS add Item appends Item at end C , I , add index, Item adds Item at position index PRIORITY QUEUES set index, Item replaces item at position index with Item remove index boolean removes item at index ‣ collections remove Item boolean removes Item if present in list get index Item returns item at index ‣ priority queues, sets, symbol tables Java implementations ‣ heaps and priority queues Algorithms ArrayList ‣ heapsort ・・LinkedList ‣ event-driven simulation (optional) ROBERT SEDGEWICK | KEVIN WAYNE http://algs4.cs.princeton.edu Caveat ・Java list does not match standard ADT terminology. ・Abstract lists don’t support random access. 8 Task Priority queue NSA Monitoring Priority queue. Remove the largest (or smallest) item. You receive 1,000,000,000 unencrypted documents every day. MaxPQ: Supports largest operations. return contents contents ・・ operation argument size ・You’d like to save the 1,000 documents with the highest score for ・MinPQ: Supports smallest operations. value (unordered) (ordered) manual review. insert P 1 P P insert Q 2 P Q P Q insert E 3 P Q E E P Q In real code, pick a list implementation, e.g. LinkedList remove max Q 2 P E E P insert X 3 P E X E P X insert A 4 P E X A A E P X public void process(List<Document> top1000, Document newDoc) { insert M 5 P E X A M A E M P X Document lowest = top1000.get(0); remove max X 4 P E M A A E M P insert for (Document d : top1000) P 5 P E M A P A E M P P insert L 6 P E M A P L A E L M P P if (d.score() < lowest.score()) insert E 7 P E M A P L E A E E L M P P lowest = d; remove max P 6 E M A P L E A E E L M P A sequence of operations on a priority queue if (newDoc.score() > lowest.score()) { Min PQ operations. top1000.remove(lowest); top1000.add(newDoc); operation parameters returns effect } inserts Item adds an item } min Item returns minimum Item delMin Item deletes and returns minimum Item List based solution 9 10 Priority queue Priority queue Implementation operation parameters returns effect s We’ll get to that later. insert Item adds an item ・ min Item returns minimum Item delMin Item deletes and returns minimum Item “Programmers waste enormous amounts of time thinking about, Actual algs4 class or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong public void process(MinPQ<Document> top1000, Document newDoc) { negative impact when debugging and maintenance are considered. top1000.insert(newDoc); We should forget about small efficiencies, say about 97% of the top1000.delMin(newDoc); time: premature optimization is the root of all evil. Yet we should } not pass up our opportunities in that critical 3%” — Donald Knuth, Structured Programming with Go To Statements PQ based solution Advantages ・Much simpler code. ・ADT is problem specific. May be faster. 11 12 Sets Application Genre identification operation parameters returns effect s Collect set of all words from a song’s lyrics. add Item adds an item, only one copy may exist ・ contains boolean returns true if item is present ・Compare against large dataset using machine learning techniques. delete deletes item – Guess genre. In real code, pick a set implementation, e.g. TreeSet public Set<String> wordsInFile(In file) { Set<String> s = new Set<String>(); while (!file.isEmpty()) s.add(file.readString()); return s; } Finding all words in a file 13 14 Symbol tables Collinear Collinear revisited (on board / projector) operation parameters returns effect put Key, Value associates Value with Key ・Collections make things easier. contains Key boolean returns true if Key is present Likely to be slower and use more memory. get Key Value returns Value associated with Key (if any) ・ delete Key Value deletes Key and returns Value In real code, pick a symbol table implementation, e.g. TreeMap public void countChars(SymT<Character, Integer> charCount, String s) { for (Character c : s.toCharArray()) if (charCount.contains(c)) charCount.put(c, charCount.get(c) + 1); else charCount.put(c, 1); } Adding letter counts to array of strings Other names ・Associative array, map, dictionary 15 16 Design Problem Implementations Solo in Groups ・Erweiterten Netzwerk is a new German minimalist social networking Collection Java Implementations algs4 Implementations site that provides only two operations for its logged-in users. LinkedList, ArrayList, Stack (oops) – : Enter another user’s username and click the Neu button. List None This marks the two users as friends. MinPQ MinPQ PriorityQueue MaxPQ MaxPQ TreeSet, HashSet, Set SET (note: ordered) – : Type in another user’s username and CopyOnWriteArraySet, ... determine whether the two users are in the same extended network TreeMap, HashMap, RedBlackBST, SeparateChainingHashST, Symbol Table (i.e. there exists some chain of friends between the two users). ConcurrentHashMap, ... LinearProbingHashST, ... pollEv.com/jhug text to 37607 Implementations Identify at least one ADT that Erweiterten Netzwerk should use: ・Use algs4 classes when possible in COS226. ・When performance matters, pick the right implementation! A. Queue [879345] D. Priority Queue [879348] ・Next two weeks: Implementation details of these collections. B. Union-find [879346] E. Symbol Table [879349] ・More collections to come. C. Stack [879347] F. Randomized Queue [879350] Note: There may be more than one ‘good’ answer. 17 18 Priority queue API Requirement. Generic items are Comparable. Key must be Comparable COLLECTIONS, IMPLEMENTATIONS, (bounded type parameter) PRIORITY QUEUES public class MaxPQ<Key extends Comparable<Key>> MaxPQ() create an empty priority queue ‣ collections MaxPQ(Key[] a) create a priority queue with given keys ‣ priority queues, sets, symbol tables void insert(Key v) insert a key into the priority queue ‣ heaps and priority queues Algorithms Key delMax() return and remove the largest key ‣ heapsort boolean isEmpty() is the priority queue empty? ‣ event-driven simulation (optional) ROBERT SEDGEWICK | KEVIN WAYNE Key max() return the largest key http://algs4.cs.princeton.edu int size() number of entries in the priority queue 20 Priority queue applications Priority queue client example See optional slides / Coursera lecture. ・Event-driven simulation. [customers in a line, colliding particles] Challenge. Find the largest M items in a stream of N items. ・Numerical computation. [reducing roundoff error] ・Data compression. [Huffman codes] Graph searching. [Dijkstra's algorithm, Prim's algorithm] ・ order of growth of ﬁnding the largest M in a stream of N items Number theory. [sum of powers] ・ Assignment 4. ・Artificial intelligence. [A* search] implementation time space ・Statistics. [maintain largest M values in a sequence] sort N log N N Operating systems. [load balancing, interrupt handling] ・ elementary PQ M N M Discrete optimization. [bin packing, scheduling] ・ binary heap N log M M Spam filtering. [Bayesian spam filter] ・ best in theory N M Generalizes: stack, queue, randomized queue. 21 22 Priority queue: unordered and ordered array implementation Priority queue: unordered array implementation public class UnorderedArrayMaxPQ<Key extends Comparable<Key>> { private Key[] pq; // pq[i] = ith element on pq return contents contents private int N;

Load more