Really Efficient Use of Keys: the Adding Item to a Trie Scrunching Example

much about cost of comparisons. adding bat and faceplate. (unrelated to on preceding slides) worst case is length of string. ticked. arrays, each indexed 0..9

should throw extra factor of key length, L, into costs: 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 a f A2: comparisons really means Θ(ML) operations. b ✷ trout pike ghee milk oil look for key X, keep looking at same chars of X M times. a bat f ✷ x a 3 4 5 6 7 8 9 better? Can we get search cost to be O(L)? b ab ax fa a b e o b c cumin mace multi-way decision , with one decision per character ✷ ✷ ✷ ✷ aba abbas axe axolotl fabric fac t them, but keep track of original index of each item: e ✷ A3: 0 1* 2 3 4 5* 6 7 8 9* abate face A2: 0 1 2* 3 4 5 6* 7* 8 9 h p t A1: 0* 1 2 3 4 5* 6 7* 8 9 0 -1 1 -1 2 5 5 7 6 7 9 ✷ faceplate✷ ✷ abash facet A123:

bass trout pike ghee milk oil salt cumin mace

19:39:39 2018 CS61B: Lecture #31 2 19:39:39 2018 CS61B: Lecture #31 4 19:39:39 2018 CS61B: Lecture #31 6

CS61B Lecture #31 The Trie: Example A Side-Trip: Scrunching

obvious implementation for internal nodes is array in- balanced search structures (DS(IJ), Chapter 9 abase, abash, abate, abbas, axolotl, axe, fabric, facet} character. L show paths followed for “abash” and “fabric” performance, length of search key. Pseudo-random Numbers (DS(IJ), Chapter 11) N internal node corresponds to a possible prefix. independent of , number of keys. Is there a depen- in path to node = that prefix. arrays are sparsely populated by non-null values—waste of

a f

a f arrays on top of each other! ✷ x b a empty) entries of one array to hold non-null elements of ab ax fa a b e o b c ✷ ✷ ✷ ✷ ✷ aba abbas axe axolotl fabric facet markers to tell which entries belong to which array. s t ✷ abas abate h

abash✷

19:39:39 2018 CS61B: Lecture #31 1 19:39:39 2018 CS61B: Lecture #31 3 19:39:39 2018 CS61B: Lecture #31 5 Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists

can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which to put the keys at “random” heights. to put the keys at “random” heights. to put the keys at “random” heights. thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large

example: example: example: ⇓ ⇓

25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞

20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ start at top layer on left, search until next step would then go down one layer and repeat. start at top layer on left, search until next step would start at top layer on left, search until next step would above, we search for 125 and 127. Gray nodes are looked at; then go down one layer and repeat. then go down one layer and repeat. nodes are overshoots. above, we search for 125 and 127. Gray nodes are looked at; above, we search for 125 and 127. Gray nodes are looked at; the nodes were chosen randomly so that there are about nodes are overshoots. nodes are overshoots. nodes that are >k high as there are that are k high. the nodes were chosen randomly so that there are about the nodes were chosen randomly so that there are about >k k >k k searches fast with high probability. nodes that are high as there are that are high. nodes that are high as there are that are high. searches fast with high probability. searches fast with high probability.

19:39:39 2018 CS61B: Lecture #31 8 19:39:39 2018 CS61B: Lecture #31 10 19:39:39 2018 CS61B: Lecture #31 12

Practicum Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists scrunching idea is cute, but can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which to put the keys at “random” heights. to put the keys at “random” heights. good if we want to expand our trie. complicated. thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large more useful for representing large, sparse, fixed tables many rows and columns. example: example: Furthermore, number of children in trie tends to drop drastically ⇓ gets a few levels down from the root. practice, might as well use linked lists to represent of ∞ ∞ children.. . 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 arrays for the first few levels, which are likely to have start at top layer on left, search until next step would start at top layer on left, search until next step would children. then go down one layer and repeat. then go down one layer and repeat. above, we search for 125 and 127. Gray nodes are looked at; above, we search for 125 and 127. Gray nodes are looked at; nodes are overshoots. nodes are overshoots. the nodes were chosen randomly so that there are about the nodes were chosen randomly so that there are about nodes that are >k high as there are that are k high. nodes that are >k high as there are that are k high. searches fast with high probability. searches fast with high probability.

19:39:39 2018 CS61B: Lecture #31 7 19:39:39 2018 CS61B: Lecture #31 9 19:39:39 2018 CS61B: Lecture #31 11 Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists

can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which to put the keys at “random” heights. to put the keys at “random” heights. to put the keys at “random” heights. thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large

example: example: example: ⇓ ⇓ ⇓

20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞

start at top layer on left, search until next step would start at top layer on left, search until next step would start at top layer on left, search until next step would then go down one layer and repeat. then go down one layer and repeat. then go down one layer and repeat. above, we search for 125 and 127. Gray nodes are looked at; above, we search for 125 and 127. Gray nodes are looked at; above, we search for 125 and 127. Gray nodes are looked at; nodes are overshoots. nodes are overshoots. nodes are overshoots. the nodes were chosen randomly so that there are about the nodes were chosen randomly so that there are about the nodes were chosen randomly so that there are about nodes that are >k high as there are that are k high. nodes that are >k high as there are that are k high. nodes that are >k high as there are that are k high. searches fast with high probability. searches fast with high probability. searches fast with high probability.

19:39:39 2018 CS61B: Lecture #31 14 19:39:39 2018 CS61B: Lecture #31 16 19:39:39 2018 CS61B: Lecture #31 18

Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists Probabilistic Balancing: Skip Lists

can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which can be thought of as a kind of n-ary search tree in which to put the keys at “random” heights. to put the keys at “random” heights. to put the keys at “random” heights. thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large thought of as an ordered list in which one can skip large

example: example: example: ⇓ ⇓ ⇓

20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ 20 25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞

start at top layer on left, search until next step would start at top layer on left, search until next step would start at top layer on left, search until next step would then go down one layer and repeat. then go down one layer and repeat. then go down one layer and repeat. above, we search for 125 and 127. Gray nodes are looked at; above, we search for 125 and 127. Gray nodes are looked at; above, we search for 125 and 127. Gray nodes are looked at; nodes are overshoots. nodes are overshoots. nodes are overshoots. the nodes were chosen randomly so that there are about the nodes were chosen randomly so that there are about the nodes were chosen randomly so that there are about nodes that are >k high as there are that are k high. nodes that are >k high as there are that are k high. nodes that are >k high as there are that are k high. searches fast with high probability. searches fast with high probability. searches fast with high probability.

19:39:39 2018 CS61B: Lecture #31 13 19:39:39 2018 CS61B: Lecture #31 15 19:39:39 2018 CS61B: Lecture #31 17 Summary Structures that Implement Abstractions

search trees allows us to realize Θ(lg N) performance. red-black trees: linked lists, circular buffers N) performance for searches, insertions, deletions. good for external storage. Large nodes minimize # of OrderedSet operations Priority Queue: heaps Set: binary search trees, red-black trees, B-trees, ) performance for searches, insertions, and deletions, arrays or linked lists is length of key being processed. Unordered Set: to manage space efficiently. idea: scrunched arrays share space. Map: hash table probable Θ(lg N) performace for searches, insertions, dele- Map: red-black trees, B-trees, sorted arrays or linked lists

implement. Presented for interesting ideas: probabilistic balance, random- structures.

19:39:39 2018 CS61B: Lecture #31 20 19:39:39 2018 CS61B: Lecture #31 22

Example: Adding and deleting Summary of Collection Abstractions Corresponding Classes in Java

from initial list: (Collection) Multiset Blue: Java has corresponding interface contains, Green: Java has no corresponding interface ArrayList, LinkedList, Stack, ArrayBlockingQueue,

25 30 40 50 55 60 90 95 100 115 120 125 130 140 150 ∞ Set List get(n) order, we add 126 and 127 (choosing random heights for OrderedSet remove 20 and 40: Ordered Set Unordered Priority Queue: PriorityQueue first Set Set (SortedSet): TreeSet Unordered Set: HashSet Priority Queue 30 50 55 60 90 95 100 115 120 125 126 127 130 140 150 ∞ Sorted Set subset nodes here have been modified. Map Map: HashMap contains, iterator get Map (SortedMap): TreeMap

Unordered Ordered Map Map

19:39:39 2018 CS61B: Lecture #31 19 19:39:39 2018 CS61B: Lecture #31 21 19:39:39 2018 CS61B: Lecture #31 23