Efficient Data Structures for High Speed Packet Processing

Efficient data structures for high speed packet processing Paolo Giaccone Notes for the class on \Computer aided simulations and performance evaluation " Politecnico di Torino November 2020 Outline 1 Applications 2 Theoretical background 3 Tables Direct access arrays Hash tables Multiple-choice hash tables Cuckoo hash 4 Set Membership Problem definition Application Fingerprinting Bit String Hashing Bloom filters Cuckoo filters 5 Longest prefix matching Patricia trie Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 2 / 93 Applications Section 1 Applications Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 3 / 93 Applications Big Data and probabilistic data structures 3 V's of Big Data Volume (amount of data) Velocity (speed at which data is arriving and is processed) Variety (types of data) Main efficiency metrics for data structures space time to write, to update, to read, to delete Probabilistic data structures based on different hashing techniques approximated answers, but reliable estimation of the error typically, low memory, constant query time, high scaling Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 4 / 93 Applications Probabilistic data structures Membership answer approximate membership queries e.g., Bloom filter, counting Bloom filter, quotient filter, Cuckoo filter Cardinality estimate the number of unique elements in a dataset. e.g., linear counting, probabilistic counting, LogLog and HyperLogLog Frequency in streaming applications, find the frequency of some element, filter the most frequent elements in the stream, detect the trending elements, etc. e.g., majority algorithm, frequent algorithm, count sketch, count{min sketch Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 5 / 93 Applications Probabilistic data structures Rank estimate quantiles and percentiles in a data stream using only one pass through the data e.g., random sampling, q-digest, t-digest Similarity find the nearest neighbor for large datasets using sub-linear in time solutions. e.g., locality-sensitive hashing, MinHash, SimHash Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 6 / 93 Applications Application to computer networks Packet processing examples Address lookup Where to send an incoming packet? Exact matching: Ethernet forwarding tables MAC address Port 56:d4:ae:eb:f8:b6 2 06:b3:01:8b:24:1e 1 Longest prefix matching: IP routing tables Network prefix Gateway Interface 130.192.9.0/24 * eth0 130.192.0.0/16 130.192.9.250 eth0 default/0 130.192.9.252 eth0 Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 7 / 93 Applications Application to computer networks Packet processing examples Firewall, ACL Which packet to accept or deny? \Drop all packets from evil source network 66.66/16 on ports 6-666" Usually needs 5 fields: IP source-address, IP dest-address, L4 protocol, source-port, dest-port Intrusion Detection Schemes Deep Packet inspection (DPI) \Drop all packets that contains the string EvilWorm anywhere within the packet" Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 8 / 93 Applications Challenges at high speed high speed processing allows a limited number of clock cycles to process a packet Data rate Transmission time (64B) 100 Mbps 5.12 µs 1 Gbps 512 ns 10 Gbps 51 ns 40 Gbps 12.8 ns 100 Gbps 5.12 ns the time required to elaborate a packet depends on the specific data structure adopted to store all the packet processing rules each kind of packet processing problem requires an optimized data structure to minimize memory accesses memory access time is typically the main performance bottleneck e.g., use hash tables for exact matching, use Patricia tries for longest prefix matching Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 9 / 93 Applications Exact matching problem Solution #1 sorted table: array of (MAC address, port) pairs sorted based on the MAC address perform binary search to lookup for an address each iteration corresponds to a memory access at most log N iterations/memory accesses for a lookup, where N is the number of stored MAC addresses deterministic lookup time Solution #2 hash tables: search is much faster than solution #1 on average, even if the worst case can be O(log N= log log N) it is possible to adopt more sophisticated data structure/hashing techniques (e.g., to reduce memory) Bloom Filters, fingerprinting, etc. Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 10 / 93 Theoretical background Section 2 Theoretical background Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 11 / 93 Theoretical background \Bins and Balls" model for load balancing n balls, n bins for any dropping policy, average occupancy 1 ball/bin a deterministic algorithm, by comparing the occupancy of all n bins, is able to achieve a maximum occupancy 1 ball/bin not always feasible to track the size of O(n) bins, when n is very large Main question Is it possible to achieve similar performance to the optimal load balancing (i.e., 1 ball/bin) with an algorithm that track O(1) bins? Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 12 / 93 Theoretical background Randomized load balancing Random Dropping policy 1 choose uniformly at random 1 bin log n Maximum occupancy sharply concentrated on (1 + O(1)) and log log n 3 log n upper bounded by log log n Random Load Balancing policy 1 choose uniformly at random d bins 2 place the ball in the least occupied one log log n Maximum occupancy sharply concentrated on + O(1) log d Very famous result: \the power of 2 or d random choices" Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 13 / 93 Theoretical background Theoretical results for randomized dropping policies 8 Random Dropping Random Load Balancing d=2 7 Random Load Balancing d=3 Random Load Balancing d=4 6 5 4 Maximum occupancy 3 2 1 100 1000 10000 100000 1e+06 1e+07 1e+08 1e+09 1e+10 n Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 14 / 93 Theoretical background Simulation results for randomized dropping policies 10 Random dropping Random load balancing d=2 Random load balancing d=4 8 95% confidence intervals 6 occupancy bin 4 Max 2 0 100 1000 10000 100000 1x106 Bins/Balls Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 15 / 93 Theoretical background The birthday problem Exact question What is the minimum number of students in a class to be sure that at least 2 of them have the same birthday? 366 students (we neglect leap years for simplicity) Approximated question What is the minimum number of students in a class such that at least 2 of them have the same birthday with some given probability? 23 students to get probability > 0:5 41 students to get probability > 0:9 Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 16 / 93 Theoretical background Exact analysis of birthday problem Assume a class of m students, with m < 366. Letp ¯ be the probability that all m students have distinct birthdays: m−1 m−1 364 363 365 − m + 1 Y 365 − i Y i p¯ = × × ::: × = = 1 − 365 365 365 365 365 i=0 i=0 Now the probability p at least two students have the same birthday (i.e., birthday collision event) is m−1 Y i 365! p = 1 − p¯ = 1 − 1 − = 1 − 365 (365 − m)! × 365m i=0 If m = 23 then p = 0:507. If m = 41, then p = 0:903. Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 17 / 93 Theoretical background Approximated analysis of birthday problem Now observe that e−ax ≈ 1 − ax if x is small m−1 Pm−1 i m(m−1) 2 Y − i − i=0 − − m p ≈ 1 − e 365 = 1 − e 365 = 1 − e 2×365 ≈ 1 − e 2×365 i=0 1 Exact 0.9 Approx 0.8 0.7 0.6 0.5 0.4 0.3 Prob(birthday collision) 0.2 0.1 0 0 10 20 30 40 50 60 70 80 Number of people Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 18 / 93 Theoretical background Generalized version of birthday problem Probability of collision Consider m elements chosen uniformly at random, with repetition, from a set of cardinality n, with m < n. The probability p(n) that at least one pair of equal elements has been chosen (i.e., a \collision" has occurred) is 2 − m p(n) ≈ 1 − e 2n (1) Proof: this is just a generalization of birthday problem with a generic number n of days in a year. Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 19 / 93 Theoretical background Generalized version of birthday problem Number of elements for a collision Consider m elements chosen uniformly at random, with repetition, from a set of cardinality n, with m < n. To observe at least one collision with probability p, it must hold: s 1 m ≈ 2n log 1 − p Proof: From (1), we can write: m2 s − m2 1 1 − p = e 2n ) log(1 − p) = − ) m = 2n log 2n 1 − p Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov. 2020 20 / 93 Theoretical background Generalized version of birthday problem Typical number of elements for a collision Assume p = 0:5. Then, p m ≈ 1:17 n It can be shown that the typical number of elements is sharply concentrated around its average, when n ! 1: rπ p E[m] = × n ≈ 1:25 n 2 For example, for n = 365, m ≈ 22:3 and E[m] = 23:9. Giaccone (Politecnico di Torino) Hash, Cuckoo, Bloom and Patricia Nov.

Efficient Data Structures for High Speed Packet Processing

Podcast Ch23a

A General-Purpose High Accuracy and Space Efficient

Binary Index Trees a Cumulative Frequency Array Allows Us To

B-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches

Compact Fenwick Trees for Dynamic Ranking and Selection

K-Mer Data Structures Rayan Chikhi CNRS, Univ

CMU SCS 15-721 (Spring 2020) :: OLTP Indexes (Trie Data Structures)

Space- and Time-Efficient String Dictionaries

Quotient Hash Tables-Efficiently Detecting Duplicates in Streaming

Reconfigurable Architecture for Minimal Perfect Sequencing Using the Convey Hybrid Core Computer Chad Michael Nelson Iowa State University

The Guide to Xillybus Lite

Hashing and Amortization