STXXL: Standard Template Library for XXL Data Sets

STXXL: Standard Template Library for XXL Data Sets Roman Dementiev Institute for Theoretical Computer Science, Algorithmics II University of Karlsruhe, Germany March 19, 2007 Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 1/43 Outline 1 Introduction 2 Experimental Parallel Disk System 3 The STXXL Library 4 Engineering Algorithms for Large Graphs 5 Engineering Large Suffix Array Construction 6 Porting Algorithms to External Memory 7 Summary Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 2/43 Large Data Sets Where they come from Geographic information systems: GoogleEarth, NASA’s World Wind Computer graphics: visualize huge scenes Billing systems: phone calls, traffic Analyze huge networks: Internet, phone call graph Text collections: , , etc. How to process them Buy a TByte main memory? expensive or impossible Here: how to process very large data sets cost-efficiently Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 3/43 Large Data Sets Where they come from Geographic information systems: GoogleEarth, NASA’s World Wind Computer graphics: visualize huge scenes Billing systems: phone calls, traffic Analyze huge networks: Internet, phone call graph Text collections: , , etc. How to process them Buy a TByte main memory? expensive or impossible Here: how to process very large data sets cost-efficiently Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 3/43 RAM Model vs. Real Computer A straightforward solution Use small main memory keep data on cheap disks, “unlimited” virtual memory O(1) registers Theory: should work (von Neumann (RAM) model) ALU 1 word = O(log n) bits I Unit cost memory access freely programmable large memory I No locality of reference Practice: terrible performance 6 I Random hard disk accesses are 10 slower than main memory accesses I Strong locality of reference ⇒ I/O is the bottleneck Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 4/43 The Parallel Disk Model (PDM) [Vitter&Shriver’1994] Parameters Input size N CPU Memory size M N Block size B The number of disks D M Performance DB Minimize the number of I/O steps In an I/O step try transfer D blocks 1 2... D Minimize the number of CPU instructions I/O-efficient algorithms ≡ external memory algorithms Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 5/43 Engineering Parallel Disk Systems 2x Xeon 4 Threads 400x64 Mb/s Intel 1 GB E7500 DDR Chipset 128 RAM 2x64x66 Mb/s PCI−Busses 4x2x100 MB/s Controller Channels 8x45 8x80 MB/s GB Challenges Cheap case for ≥ 8 hard disks Many fast PCI slots for ATA controllers (no bus bottlenecks) Wide Parallel ATA cables worsen airflow (later system use Serial ATA) File system scalability: very large files ⇒ 375 MB/s (≈ 98% of the peak) for about 3000 Euro in 2002 ⇒ Other systems: 10 disks = 640 MB/s, 4 disks = 214 MB/s Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 6/43 The STXXL Library Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 7/43 I/O-Efficient Software Libraries Advantages Abstract away the technical details of I/O Provide implementation of basic I/O-eff. algorithms and data structures ⇒ Boost algorithm engineering Existing Libraries TPIE: many (geometric) search data structures LEDA-SM: extension of LEDA (discontinued) + Good demonstrations of the external memory concepts – Do not implement many features that speed up I/O-efficient algorithms Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 8/43 I/O-Efficient Software Libraries Advantages Abstract away the technical details of I/O Provide implementation of basic I/O-eff. algorithms and data structures ⇒ Boost algorithm engineering Existing Libraries TPIE: many (geometric) search data structures LEDA-SM: extension of LEDA (discontinued) + Good demonstrations of the external memory concepts – Do not implement many features that speed up I/O-efficient algorithms Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 8/43 The STXXL Library STL – C++ Standard Template Library, implements basic containers (maps, sets, priority queues, etc.) and algorithms (quicksort, mergesort, selection, etc.) STXXL : Standard Template Library for XXL Data Sets http://stxxl.sourceforge.net containers and algorithms that can process huge volumes of data that only fit on disks (I/O-efficient) I Compatible with STL I Performance–oriented Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 9/43 STXXL Features Dementiev, Kettner, Sanders STXXL: The Standard Template Library for Extra Large Data Sets. ESA 2005, 13th Annual European Symposium on Algorithms CPU Transparent parallel disk support Handles very large problems (up to petabytes) SORT SCAN SCAN M Pipelining saves many I/Os Explicitly overlaps I/O and computation DB Avoids superfluous copying I in OS I/O subsystem and the library itself1 2... D Compatible with STL – C++ Standard Template Library I Short development times I Reuse of STL code (e.g. selection alg.) Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 10/43 STXXL Design Applications STL−user layer Streaming layer vector, stack, set Containers: priority_queue, map Pipelined sorting, Algorithms: sort, for_each, merge zero−I/O scanning Block management (BM) layer typed block, block manager, buffered streams, block prefetcher, buffered block writer TXXL S Asynchronous I/O primitives (AIO) layer files, I/O requests, disk queues, completion handlers Operating System Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 11/43 STXXL Design: AIO Layer Hides details of async. I/O Applications (portability, user-friendly) STL−user layer Streaming layer vector, stack, set Containers: priority_queue, map Pipelined sorting, Implementations for Algorithms: sort, for_each, merge zero−I/O scanning Linux/MacOSX/BSD/Solaris Block management (BM) layer and Windows systems typed block, block manager, buffered streams, block prefetcher, buffered block writer Asynchrony provided by Asynchronous I/O primitives (AIO) layer POSIX threads or Boost files, I/O requests, disk queues, Threads completion handlers Unbuffered I/O support: Operating System more control over I/O Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 12/43 STXXL Design: BM Layer Applications Block abstraction STL−user layer Streaming layer vector, stack, set Containers: priority_queue, map Pipelined sorting, Parallel disk model Algorithms: sort, for_each, merge zero−I/O scanning Block management (BM) layer (Randomized) striping and typed block, block manager, buffered streams, cycling block prefetcher, buffered block writer Parallel disk buffered writing Asynchronous I/O primitives (AIO) layer and optimal prefetching files, I/O requests, disk queues, completion handlers [Hutchinson&Sanders&Vitter01] Operating System Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 13/43 STXXL User Layers Applications STL−user layer Streaming layer STL-user layer: compatible vector, stack, set Containers: Pipelined sorting, priority_queue, map Algorithms: sort, for_each, merge zero−I/O scanning with STL, vector, stack, queue, deque, priority queue, Block management (BM) layer typed block, block manager, buffered streams, map, sorting, scanning block prefetcher, buffered block writer Streaming layer: Asynchronous I/O primitives (AIO) layer files, I/O requests, disk queues, programming with pipelining completion handlers Operating System Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 14/43 Example: Generate Random Graph with STXXL 1 stxxl::vector<edge> Edges(10000000000ULL); 2 std::generate(Edges.begin(),Edges.end(),random_edge()); 3 stxxl::sort(Edges.begin(),Edges.end(),edge_cmp(),512*1024*1024); 4 stxxl::vector<edge>::iterator NewEnd = std::unique(Edges.begin(),Edges.end()); 5 Edges.resize(NewEnd - Edges.begin()); Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 15/43 Streaming Layer and Pipelining DATA DATA EM algorithm ⇒ data flow through a DAG SCAN Feed output data stream directly to the consumer algorithm SORT A new iterator-like interface for EM algorithms Basic pipelined implementations (file, sorting nodes, etc.) provided by STXXL SCAN Saves many I/Os (factor 2–3) in many EM algorithms DATA Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 16/43 Parallel Disk Sorting: an Important Part of STXXL Sorting is the core routine of almost every I/O-eff. algorithm. We engineer a parallel disk sorting algorithm and implementation which guarantees: 1 N N Optimal I/O volume sort(N) = O DB logM/B B 2 Almost perfect overlapping of I/O and computation Dementiev and Sanders Asynchronous Parallel Disk Sorting. SPAA 2003, 15th ACM Symposium on Parallelism in Algorithms and Architectures Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 17/43 I/O-Efficient Multiway Merge Sort The Algorithm 1 Sort input chunks of size Θ(M) (aka runs) 2 Θ(M/B)-way merge runs until only one sorted run left N N ⇒ O B logM/B M I/Os Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 18/43 I/O-Efficient Multiway Merge Sort The Challenges 1 Overlapping of I/O and computation: running time ≈ max(IOtime(N),CPUtime(N)) N N 2 B logM/B M Disk balancing: IOtime(N) ≈ O D Roman Dementiev (University of Karlsruhe) STXXL March 19, 2007 19/43 Run Formation: Easy Two threads cooperate to build k runs of size M/2: post a read request for runs 1 and 2 thread A: | thread B: for r:=1 to k do | for r:=1 to k-2 do wait until | wait until run r is read | run r is written sort run r | post a read for run r+2 post a write for run r | control flow in thread A control flow in thread B time 1 2 3 4 k−1 k read ... I/O Stripe data over the disks 1 2 3 4 bound ... k−1 k sort case 1 2 3 4 k−1 k write ... TRF (N) = M N 1 2 3 4 k−1 k max(2Tsort ( 2 ) M ,TIO (N)) + Tstartup read ... compute 1 2 3 4 k−1 k bound ... sort case 1 2 3 4 k−1 k write ..

Load more