PAM: Parallel Augmented Maps Yihan Sun Daniel Ferizovic Guy E. Belloch Carnegie Mellon University Karlsruhe Institute of Technology Carnegie Mellon University
[email protected] [email protected] [email protected] Abstract 1 Introduction Ordered (key-value) maps are an important and widely-used The map data type (also called key-value store, dictionary, data type for large-scale data processing frameworks. Beyond table, or associative array) is one of the most important da- simple search, insertion and deletion, more advanced oper- ta types in modern large-scale data analysis, as is indicated ations such as range extraction, filtering, and bulk updates by systems such as F1 [60], Flurry [5], RocksDB [57], Or- form a critical part of these frameworks. acle NoSQL [50], LevelDB [41]. As such, there has been We describe an interface for ordered maps that is augment- significant interest in developing high-performance paral- ed to support fast range queries and sums, and introduce a lel and concurrent algorithms and implementations of maps parallel and concurrent library called PAM (Parallel Augment- (e.g., see Section 2). Beyond simple insertion, deletion, and ed Maps) that implements the interface. The interface includes search, this work has considered “bulk” functions over or- a wide variety of functions on augmented maps ranging from dered maps, such as unions [11, 20, 33], bulk-insertion and basic insertion and deletion to more interesting functions such bulk-deletion [6, 24, 26], and range extraction [7, 14, 55]. as union, intersection, filtering, extracting ranges, splitting, One particularly useful function is to take a “sum” over a and range-sums.