Algorithms ROBERT SEDGEWICK | KEVIN WAYNE
5.2 TRIES
‣ string symbol tables ‣ R-way tries Algorithms ‣ ternary search tries FOURTH EDITION ‣ character-based operations
ROBERT SEDGEWICK | KEVIN WAYNE https://algs4.cs.princeton.edu
Last updated on 4/15/21 9:40 AM 5.2 TRIES
‣ string symbol tables ‣ R-way tries Algorithms ‣ ternary search tries ‣ character-based operations
ROBERT SEDGEWICK | KEVIN WAYNE https://algs4.cs.princeton.edu Symbol tables: performance summary
Review. Two classic symbol tables: red–black BSTs and hash tables.
frequency of core operations ordered core operations implementation operations on keys search insert delete
red–black BST log n log n log n ✔ compareTo()
equals() hash table 1 † 1 † 1 † hashCode()
† under uniform hashing assumption
Q. Can we do better? A. Yes, if we can avoid examining the entire key, as with string sorting.
3 String symbol tables: performance summary
Goal (for string keys). Faster than hashing, more flexible than BSTs. Benchmark. Count distinct words in a text file. exchange rate: L character accesses per hash around Θ(log n) character accesses per string compare
character accesses (typical case) count distinct
search search space implementation insert moby.txt actors.txt hit miss (references)
2 2 2 red–black BST L + log n log n log n 4 n 1.4 97.4
hashing L L L 4 n to 16 n 0.76 40.6 (linear probing)
n = number of key–value pairs file size words distinct
L = length of key moby.txt 1.2 MB 210 K 32 K R = radix actors.txt 82 MB 11.4 M 900 K
4 5.2 TRIES
‣ string symbol tables ‣ R-way tries Algorithms ‣ ternary search tries ‣ character-based operations
ROBERT SEDGEWICK | KEVIN WAYNE https://algs4.cs.princeton.edu Tries
Etymology. [ from retrieval, but pronounced “try” ]
6 Tries
Abstract trie. Store characters in nodes (not keys). Each node has up to R children, one for each possible character in alphabet.
link to trie containing all keys root that start with s link to trie containing all keys that start with she
b s t
y 4 e h h key value
by 4 a 6 l e 0 o e 5 sea 6
sells 1 l l r value associated with she in node corresponding to she 0 last character in key shells 3 s 1 l e 7 shore 7
the 5 s 3 7 Tries: search hit
Follow links corresponding to each character in the key. Search hit: node where search ends has a non-null value. Search miss: reach null link or node where search ends has null value.
get("shells")
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7 return value in node corresponding to s 3 last character in key (return 3) 8 Tries: search hit
Follow links corresponding to each character in the key. Search hit: node where search ends has a non-null value. Search miss: reach null link or node where search ends has null value.
get("she")
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r search may terminate at an internal node (return 0)
s 1 l e 7
s 3 9 Tries: search miss
Follow links corresponding to each character in the key. Search hit: node where search ends has a non-null value. Search miss: reach null link or node where search ends has null value.
get("shelter")
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
no link to t s 3 (return null) 10 Tries: search miss
Follow links corresponding to each character in the key. Search hit: node where search ends has a non-null value. Search miss: reach null link or node where search ends has null value.
get("shell")
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7 no value associated node corresponding to s 3 last character in key (return null) 11 Tries: insertion
Follow links corresponding to each character in the key. Encounter a null link: create new node. Encounter the last character of the key: set value in that node.
put("shore", 7)
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
s 3 12 Trie construction demo
trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
s 3
13 R-way tries: Java representation
Node. A value, plus references to R nodes.
private static class Node { private Object val; no generic array creation private Node[] next = new Node[R]; }
characters are implicitlycharacters are implicitly defined by link definedindex by link indexs s s s e e h h e h e h a l a e l e a 2 l e 0a 2 l e2 0 2 0 0
l l l l each node has each node has an array of linksan array of links s sand a value and a value s 1 s 1 1 1
Trie representationTrie representation Remark. An R-way trie stores neither keys nor characters explicitly.
14 R-way tries: Java implementation
public class TrieST
private static class Node { /* see previous slide */ }
public void put(String key, Value val) { root = put(root, key, val, 0); }
private Node put(Node x, String key, Value val, int d)
{
if (x == null) x = new Node();
if (d == key.length()) { x.val = val; return x; }
char c = key.charAt(d);
x.next[c] = put(x.next[c], key, val, d+1);
return x;
}
private Value get(String key) { /* similar, see book or booksite */ } }
15 R-way trie: performance
Parameters. n = number of key–value pairs; L = length of key; R = alphabet size.
Search hit. Θ(L). Search miss (worst case). Θ(L). sublinear in L Search miss (typical case). Θ(log R n).
Space. At least Θ(nR) space. characters are implicitly defined by link index s s e h e h a l e a 2 l e 0 2 0 l l each node has an array of links s and a value at least R links per key s 1 1
Trie representation Bottom line. Fast search hit; even faster search miss; but wastes space.
16 Trie quiz 1
What is worst-case running time to insert a key of length L into an R-way trie that contains n key–value pairs? n = number of key–value pairs L = length of key A. Θ(L) R = alphabet size
B. Θ(R + L)
C. Θ(n + L)
D. Θ(R L) might need to create L nodes, each containing an array of R links
17 String symbol table implementations cost summary
character accesses (typical case) count distinct
search search space implementation insert moby.txt actors.txt hit miss (references)
2 2 2 red–black BST L + log n log n log n 4 n 1.4 97.4
hashing L L L 4 n to 16 n 0.76 40.6 (linear probing)
out of R-way trie L log R n R + L (R+1) n 1.12 memory
R-way trie. Method of choice for small R. Effective for medium R. Too much memory for large R.
Challenge. Use less memory, e.g., a 65,536-way trie for Unicode!
18 5.2 TRIES
‣ string symbol tables ‣ R-way tries Algorithms ‣ ternary search tries ‣ character-based operations
ROBERT SEDGEWICK | KEVIN WAYNE https://algs4.cs.princeton.edu Ternary search tries