2014 IEEE International Conference on Big Data

MMap: Fast Billion-Scale Graph Computation on a PC via Memory Mapping

Zhiyuan Lin, Minsuk Kahng, Kaeser Md. Sabrin, Duen Horng (Polo) Chau
Georgia Tech, Atlanta, Georgia
{zlin48, kahng, kmsabrin, polo}@gatech.edu

Ho Lee, U Kang
KAIST, Daejeon, Republic of Korea
{crtlife, ukang}@kaist.ac.kr

Abstract—Graph computation approaches such as GraphChi and TurboGraph recently demonstrated that a single PC can perform efficient computation on billion-node graphs. To achieve high speed and scalability, they often need sophisticated data structures and memory management strategies. We propose a minimalist approach that forgoes such requirements, by leveraging the fundamental memory mapping (MMap) capability found on operating systems. We contribute: (1) a new insight that MMap is a viable technique for creating fast and scalable graph algorithms that surpass some of the best techniques; (2) the design and implementation of popular graph algorithms for billion-scale graphs with little code, thanks to memory mapping; (3) extensive experiments on real graphs, including the 6.6 billion edge YahooWeb graph, showing that this new approach is significantly faster than or comparable to the highly-optimized methods (e.g., 9.5X faster than GraphChi for PageRank on the 1.47B edge Twitter graph). We believe our work provides a new direction in the design and development of scalable algorithms. Our packaged code is available at http://poloclub.gatech.edu/mmap/.

Fig. 1: Top: Our method (memory mapping) is 9.5X as fast as GraphChi and comparable to TurboGraph; these state-of-the-art techniques use sophisticated data structures and explicit memory management, while MMap takes a minimalist approach using memory mapping. Bottom: MMap is 46X as fast as TurboGraph for querying 1-step neighbors on the 6.6 billion edge YahooWeb graph (times are averages over 5 nodes with degrees close to 1000 each). [Figure data: PageRank runtime on the Twitter graph (1.5 billion edges; 10 iterations): GraphChi 1248 s, TurboGraph 198 s, MMap 131 s. 1-step neighbor query runtime on the YahooWeb graph (6.6 billion edges): TurboGraph 154.7 ms, MMap 3.3 ms.]

I. INTRODUCTION

Large graphs with billions of nodes and edges are increasingly common in many domains, ranging from computer science, physics, and chemistry to bioinformatics. Such graphs' sheer sizes call for new kinds of scalable computation frameworks. Distributed frameworks have become popular choices; prominent examples include GraphLab [17], PEGASUS [12], and Pregel [18]. However, distributed systems can be expensive to build [14], [8] and they often require cluster management and optimization skills from the user. Recent works, such as GraphChi [14] and TurboGraph [8], take an alternative approach. They focus on pushing the boundaries of what a single machine can do, demonstrating impressive results showing that even for large graphs with billions of edges, computation can be performed at a speed that matches or even surpasses that of a distributed framework. When analyzing these works, we observed that they often employ sophisticated techniques in order to efficiently handle a large number of graph edges [14], [8] (e.g., via explicit memory allocation, edge file partitioning, and scheduling).

Can we streamline all these, to provide a simpler approach that achieves the same, or even better, performance? Our curiosity led us to investigate whether memory mapping, a fundamental capability in operating systems (OS) built upon the virtual memory management system, can be a viable technique to support fast and scalable graph computation. Memory mapping is a mechanism that maps a file on the disk to the virtual memory space, which enables us to programmatically access graph edges on the disk as if they were in the main memory. In addition, the OS caches items that are frequently used, based on policies such as the least recently used (LRU) page replacement policy, which allows us to defer memory management and optimization to the OS, instead of implementing these functionalities ourselves, as GraphChi [14] and TurboGraph [8] did. This caching feature is particularly desirable for computation on large real-world graphs, which often exhibit power-law degree distributions [6]. In such graphs, information about a high-degree node tends to be accessed many times by a graph algorithm (e.g., PageRank), and thus is cached in the main memory by the OS, resulting in higher overall algorithm speed.

In this paper, we present MMap, a fast and scalable graph computation method that leverages the memory mapping technique to achieve the same goal as GraphChi and TurboGraph, but through a simple design. Our major contributions include:

• New Insight. We show that the well-known memory mapping capability is in fact a viable technique for easily creating fast and scalable graph algorithms that surpass some of the best graph computation approaches, such as GraphChi and TurboGraph, as shown in Figure 1 and Section IV.

• Design & Implementation. We explain how MMap can be leveraged to build important algorithms for large graphs, using simple data structures and little code (Section III). For example, GraphChi's framework consists of more than 8000 lines of code [14], while MMap's is fewer than 450 lines¹.

(¹ MMap code measured by the Metrics Plugin; GraphChi measured by LocMetrics.)

• Extensive Evaluation on Large Graphs. Using large real graphs with up to 6.6 billion edges (YahooWeb [25]), our experiments show that MMap is significantly faster than or comparable to GraphChi and TurboGraph. We also evaluate how MMap's performance sustains across different graph sizes.

Importantly, we are not advocating replacing existing approaches with MMap. Rather, we want to highlight the kind of performance we can achieve by leveraging memory mapping alone. We believe MMap has strong potential to benefit a wide array of algorithms, besides the graph algorithms that we focus on in this work.

II. BACKGROUND: MEMORY MAPPING AND ITS ADVANTAGES

Here, we describe how memory mapping works and how it may benefit large graph computation. We refer our readers to [19], [20], [24], [4] for more details on memory mapping.

A. Memory Mapping

Memory mapping is a mechanism that maps a file or part of a file into the virtual memory space, so that files on the disk can be accessed as if they were in memory. Memory mapping is a mature and well-studied technique; we refer the readers to comprehensive resources such as [15] for more details. Figure 2 briefly describes how memory mapping works.

Fig. 2: How memory mapping works. A portion of a file on disk is mapped into memory (blue); portions no longer needed are unmapped (orange). A large graph is often stored as an edge list (left; e.g., tens of GB), which usually does not fit in the physical memory (right; e.g., 8 GB). MMap treats the edge file as if it were fully loaded into memory; programmatically, the edge list is accessed like an array. Each "row" of the edge file describes an edge, identified by its source node ID and target node ID.

Systems like GraphChi and TurboGraph implemented some of these memory management techniques, such as custom pages and page tables, by themselves, which are eventually translated into OS-level paging. We believe this indirection incurs overhead and may not fully utilize the memory management optimizations already built into the OS. This belief prompted us to investigate using memory mapping to directly scale up graph algorithms.
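As a concrete illustration of this mechanism (a minimal sketch, not the authors' released code), the following Java snippet memory-maps a binary edge list file and streams over its edges. The file name is a placeholder, each edge is assumed to be a pair of 4-byte node IDs as described later in Section III-B, and the file is mapped in chunks because a single Java MappedByteBuffer is limited to about 2 GB.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class EdgeScan {
    private static final int EDGE_BYTES = 8;          // two 4-byte node IDs per edge
    private static final long CHUNK_BYTES = 1L << 30; // ~1 GB per mapping; a multiple of EDGE_BYTES

    public static void main(String[] args) throws IOException {
        // "edges.bin" is a placeholder name for a binary edge list file.
        try (FileChannel ch = FileChannel.open(Paths.get("edges.bin"),
                                               StandardOpenOption.READ)) {
            long fileSize = ch.size();
            long edgeCount = 0;
            for (long pos = 0; pos < fileSize; pos += CHUNK_BYTES) {
                long len = Math.min(CHUNK_BYTES, fileSize - pos);
                // The OS pages this region in and out on demand; no explicit read() calls.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.remaining() >= EDGE_BYTES) {
                    int src = buf.getInt();   // source node ID
                    int dst = buf.getInt();   // target node ID
                    edgeCount++;              // replace with an algorithm's per-edge work
                }
            }
            System.out.println("Edges scanned: " + edgeCount);
        }
    }
}
```

Note that the loop issues no explicit I/O: the OS fetches pages when they are first touched and evicts them under memory pressure, which is exactly the behavior MMap relies on.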
B. Advantages of Memory Mapping

There are many advantages to using memory mapping, especially when processing large files. Below we summarize the major advantages of memory mapping [15].

• Reading from and writing to a memory-mapped file do not require the data to be copied to and from a user-space buffer, while standard read/write calls do.

• Aside from any potential page faults, reading from and writing to a memory-mapped file do not incur any overhead due to context switching.

• When multiple processes map the same data into memory, they can access that data simultaneously. Read-only and shared writable mappings are shared in their entirety; private writable mappings may have their not-yet-COW (copy-on-write) pages shared.

III. MMAP: FAST & SCALABLE GRAPH COMPUTATION THROUGH MEMORY MAPPING

We describe our fast and minimal MMap approach for large graph computation. We explain how MMap uses simpler data structures for storing and accessing graph edges and how MMap flexibly supports important classes of graph algorithms.

A. Main Ideas

Existing approaches. As identified by the GraphChi and TurboGraph researchers [14], [8], the crux of enabling fast graph computation is designing efficient techniques to store and access the large number of a graph's edges. GraphChi and TurboGraph, among others, designed sophisticated methods such as parallel sliding windows [14] and pin-and-slide [8] to efficiently access the edges. To handle the large number of edges that may be too large to fit in memory (e.g., 50GB for YahooWeb), GraphChi and TurboGraph use sharding to break the edge lists into chunks, load and unload those chunks into memory, perform the necessary computation on them, and move the partially computed results back and forth to the disk. This requires them to convert the simple edge list file into a complex, sharded and indexed database, and to employ extraneous memory management for optimally accessing the database.

Our streamlined approach. We would like to forgo these steps with a simpler approach by leveraging memory mapping. In spirit, our goal is the same as GraphChi and TurboGraph, but we defer the memory management to the OS. Once a graph data file is memory-mapped to its binary representation, we can programmatically access the edges as if they were in the main memory, even when they are too large to fit in it. Furthermore, the OS employs several memory and process management techniques for optimizing memory usage, including paging management techniques such as read-ahead paging and the least recently used (LRU) page replacement policy, which make it possible for us to defer memory management and optimization to the OS, instead of implementing these functionalities ourselves as GraphChi and TurboGraph did.
Why MMap works for graph computation. As Kang et al. [12] showed, many graph algorithms can be formulated as iterative matrix-vector multiplications; this formulation allows MMap to leverage the resulting spatial locality to maximize algorithm performance through policies such as read-ahead paging. In addition, many real graphs follow power-law degree distributions [6], and in many graph algorithms, such as PageRank, high-degree nodes are likely to be accessed very frequently. This lets the OS take great advantage of temporal locality to improve graph computation's performance, via replacement policies such as the LRU policy.

To verify our hypothesis about MMap (see Section III-C), we implemented multiple graph algorithms that are commonly offered by standard graph libraries [12], [17], [14], [7], which include finding 1-step and 2-step neighbors, and the important class of algorithms based on iterative matrix-vector multiplications, which includes PageRank and Connected Components. GraphChi [14] also explored this locality in the early phase of their project, but they decided not to pursue it.

B. Graph Storage Structures

Here, we explain how we store graphs for computing graph algorithms via memory mapping. Our storage consists of two main data structures: an edge list file and an index file.

Edge List file: The edge list is one of the simplest and most popular ways of representing graphs. In this representation, each row in the data file represents a single edge, which is identified by its two endpoints' node IDs, as depicted in Figure 3. For memory mapping, we need to convert the file into its binary representation, i.e., converting each node ID into a binary integer. In contrast, GraphChi and TurboGraph create custom, sophisticated databases that are often much larger than the given edge list text file; this also incurs considerable conversion (preprocessing) time. MMap, then, primarily works on the simple binary edge list file.

Index file: For some classes of graph queries, we use a binary index file in addition to the binary edge list file. We assume that edges corresponding to the same source node are stored contiguously, an assumption also made by GraphChi and TurboGraph. Based on this assumption, we define an index file that keeps the starting offset of each source node's edges in the edge list file. As shown in Figure 3, we store a node's file offset into the binary edge file as an 8-byte Long². To ensure that the file offsets are correctly recorded, we include empty padding for nodes that are missing. Optionally, we can store other information about each node for efficient computation, such as node degrees (see Section III-C).

(² We could not use a Java int, because it can represent only about 4.2 billion (2^32) distinct values, but the number of edges in the graphs we used exceeds that.)

Other Data Structures: Many other data structures can be emulated. For instance, we can emulate adjacency lists by having an index file that keeps the offset of each source node's first edge in the edge list, which could be useful for some classes of algorithms.

Fig. 3: Data structures used for computing PageRank. In our PageRank implementation, a binary edge list file and three node vectors are used. In addition to the edge list file (denoted as E at the bottom), out-degree information of each node (N2) is used to normalize the edge matrix.
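The sketch below shows one plausible realization of these two structures in Java, used to answer a 1-step out-neighbor query: an index file holding one 8-byte offset per node ID and a binary edge list of 4-byte (source, target) pairs. The class name and layout details are our own assumptions based on the description above, not the exact MMap implementation; for brevity it assumes each file fits in a single mapping (under 2 GB) and that the padded index entry of a missing node stores the offset of the next present node's edges.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class NeighborQuery {
    private final MappedByteBuffer index; // one 8-byte offset per node ID into the edge file
    private final MappedByteBuffer edges; // consecutive (source, target) pairs of 4-byte ints
    private final int numNodes;

    public NeighborQuery(String indexPath, String edgePath) throws IOException {
        try (FileChannel ic = FileChannel.open(Paths.get(indexPath), StandardOpenOption.READ);
             FileChannel ec = FileChannel.open(Paths.get(edgePath), StandardOpenOption.READ)) {
            numNodes = (int) (ic.size() / 8);
            // For brevity, each file is mapped in one piece (assumes < 2 GB per file);
            // larger files would be mapped in chunks, as in the earlier scan sketch.
            index = ic.map(FileChannel.MapMode.READ_ONLY, 0, ic.size());
            edges = ec.map(FileChannel.MapMode.READ_ONLY, 0, ec.size());
        }
    }

    /** Returns the 1-step out-neighbors of node v by slicing its contiguous edge block. */
    public List<Integer> outNeighbors(int v) {
        int start = (int) index.getLong(v * 8);
        int end = (v + 1 < numNodes) ? (int) index.getLong((v + 1) * 8) : edges.capacity();
        List<Integer> result = new ArrayList<>();
        for (int off = start; off < end; off += 8) {
            result.add(edges.getInt(off + 4)); // target ID sits 4 bytes after the source ID
        }
        return result;
    }
}
```

A 2-step query can then be answered by repeating this lookup for each returned neighbor and collecting the union of the results.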
C. Supporting Scalable Queries via MMap

We support several popular graph algorithms, which can be divided into two classes: (a) global queries, which tend to access all edges in the graph, and (b) targeted queries, which access a small number of edges that are often localized (e.g., finding a node's 2-step-away neighbors). We implemented algorithms that both GraphChi and TurboGraph have implemented, in order to compare our performance with theirs.

Global Queries: PageRank, as well as many other global queries and algorithms such as Connected Components, Belief Propagation [9], and Eigensolver [10], can be implemented using generalized iterative matrix-vector multiplications [12]. We refer our readers to [12] for the full list of such algorithms. This makes it possible to leverage MMap to achieve strong performance through a simple design. Figure 3 visualizes the data structures used for computing PageRank using the power iteration method [3].

Targeted Queries: For targeted queries, we chose to implement two queries which TurboGraph [8] implemented: finding the 1-step and 2-step neighbors of a node. They require access to the portion of the edge list file containing information about the node in question. To help speed up the targeted queries, we use the simple binary index file, in addition to the binary edge list file, as described in Section III-B.
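To make the global-query path concrete, here is a minimal sketch (our own illustration, not the authors' released code) of PageRank by power iteration over the memory-mapped binary edge list, using the three node vectors shown in Figure 3: each pass adds damping * cur[src] / outDegree[src] to next[dst] for every edge (src, dst), on top of the teleport term (1 - damping) / numNodes. The damping factor, method signature, and chunked mapping are illustrative choices, and dangling-node handling is omitted.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class PageRankMMap {
    /** Power iteration: stream the mapped edge list once per iteration. */
    public static float[] compute(String edgePath, int numNodes, int[] outDegree,
                                  int iterations, float damping) throws IOException {
        float[] cur = new float[numNodes];
        float[] next = new float[numNodes];
        Arrays.fill(cur, 1.0f / numNodes);

        try (FileChannel ch = FileChannel.open(Paths.get(edgePath), StandardOpenOption.READ)) {
            long fileSize = ch.size();
            long chunk = 1L << 30; // map ~1 GB at a time; a multiple of the 8-byte edge record
            for (int it = 0; it < iterations; it++) {
                Arrays.fill(next, (1.0f - damping) / numNodes);
                for (long pos = 0; pos < fileSize; pos += chunk) {
                    long len = Math.min(chunk, fileSize - pos);
                    MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                    while (buf.remaining() >= 8) {
                        int src = buf.getInt();
                        int dst = buf.getInt();
                        // Distribute src's current rank along this out-edge.
                        next[dst] += damping * cur[src] / outDegree[src];
                    }
                }
                float[] tmp = cur; cur = next; next = tmp; // reuse the vectors across iterations
            }
        }
        return cur;
    }
}
```

For the YahooWeb-scale variant described in Section IV, the in-memory float arrays would themselves be replaced by memory-mapped, disk-backed buffers.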

IV. EXPERIMENTS ON LARGE GRAPHS

We compared our memory mapping approach with two state-of-the-art approaches, GraphChi [14] and TurboGraph [8]. Following their experimental setups, we measured elapsed times for two classes of queries: global queries and targeted queries. Table I lists all the queries being evaluated. In the following subsections, we describe the datasets we used and the experimental setups, then we present and discuss our results.

Fig. 4: Runtimes of GraphChi, TurboGraph, and our MMap approach (in seconds) on the LiveJournal, Twitter and YahooWeb graphs for global queries (PageRank with 10 iterations on LiveJournal and Twitter, 3 iterations on YahooWeb; Connected Components). MMap is the fastest across all tests and all graph sizes, and significantly faster than GraphChi. PageRank is an iterative algorithm, thus taking longer to run than Connected Components.
(a) LiveJournal graph (69M edges): PageRank (10 iter.): GraphChi 54.0 s, TurboGraph 6.3 s, MMap 3.3 s. Conn. Comp.: GraphChi 6.4 s, TurboGraph 4.3 s, MMap 2.0 s.
(b) Twitter graph (1.47B edges): PageRank (10 iter.): GraphChi 1248 s, TurboGraph 198 s, MMap 131 s. Conn. Comp.: GraphChi 149 s, TurboGraph 120 s, MMap 69 s.
(c) YahooWeb graph (6.6B edges): PageRank (3 iter.): GraphChi 2308 s, TurboGraph 643 s, MMap 579 s. Conn. Comp.: GraphChi 367 s, TurboGraph 411 s, MMap 274 s.

TABLE I: Global queries and targeted queries being evaluated.
Global Queries: PageRank; Connected Components.
Targeted Queries: 1-Step Out-neighbors; 2-Step Out-neighbors.

TABLE II: Large real-world graphs used in our experiments.
Graph | Nodes | Edges
LiveJournal | 4,847,571 | 68,993,773
Twitter | 41,652,230 | 1,468,365,182
YahooWeb | 1,413,511,391 | 6,636,600,779

A. Graph Datasets

We used the same three large graph datasets used in GraphChi and TurboGraph's experiments, which come at different scales. The three datasets are: the LiveJournal graph [2] with 69 million edges, the Twitter graph [13] with 1.47 billion edges, and the YahooWeb graph [25] with 6.6 billion edges. Table II shows the exact numbers of nodes and edges of these graphs.

B. Experimental Setup

Machine: All tests are conducted on a desktop computer with an Intel i7-4770K quad-core CPU at 3.50GHz, 4×8GB RAM, a 1TB Samsung 840 EVO-Series SSD, and 2×3TB WD 7200RPM hard disks. Unless specified otherwise, all experiments use 16GB of RAM and store the graphs on the SSD drives, as required by TurboGraph [8]. TurboGraph only runs on Windows, thus we chose Windows 8 (x64) as our main test OS, where we also run MMap (which also runs on other OSes since it is written in Java). Unfortunately, we could not run GraphChi on Windows due to a missing dependency. Therefore, we run GraphChi on Linux Mint 15 (x64). Each library's configuration is as follows:

MMap: Written in Java 1.7.

TurboGraph: V0.1 Enterprise Edition. TurboGraph requires a user-specified buffer size. We found that a size that is too close to the system's physical RAM amount causes the whole system to freeze. Empirically, we were able to use a buffer size of 12 GB (out of 16 GB available) without crashing. TurboGraph's source code is not available.

GraphChi: C++ V0.2.6, with default configurations.

Test Protocol: Each test was run under the same configuration three times, and the average is reported. Page caches were cleared before every test by completely rebooting the machine.
C. Global Queries

Global queries represent the class of algorithms that need access to the entire edge list file one or more times. Figure 4 shows the elapsed times of computing PageRank (10 iterations on the LiveJournal and Twitter graphs and 3 iterations on the YahooWeb graph) and finding the connected components. For finding connected components, we note that we used the Union-Find [23] algorithm, which requires only a single pass over the edge list file. Our approach (MMap) outperforms TurboGraph by 1.11 to 2.15 times on all three graphs, and GraphChi by even larger margins (1.34 to 16.4 times).
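For reference, the Union-Find [23] structure used above for connected components can be summarized in a short generic sketch (ours, not the authors' code), using path compression and union by rank. It assumes node IDs fit an int-indexed in-memory array, which holds for the smaller graphs; YahooWeb's 1.4 billion nodes would need a more compact or disk-backed variant.

```java
public class UnionFind {
    private final int[] parent;
    private final int[] rank;

    public UnionFind(int numNodes) {
        parent = new int[numNodes];
        rank = new int[numNodes];
        for (int i = 0; i < numNodes; i++) parent[i] = i;
    }

    /** Returns the component representative of x, compressing the path along the way. */
    public int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    /** Merges the components of an edge's two endpoints (union by rank). */
    public void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }
}
```

The single pass then simply calls union(src, dst) for every edge streamed from the memory-mapped edge file; afterwards, find(v) labels each node v with its connected component.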
1) Results of PageRank and Connected Components on the LiveJournal and Twitter Graphs: LiveJournal and Twitter represent the small and medium sized graphs in our experiments. In our implementation of PageRank, three node vectors are required: one for the out-degrees, and one each for the PageRank values of the current and next steps. For LiveJournal and Twitter, we kept all three node vectors in memory and only mapped the binary edge list file from disk. The three node vectors for the Twitter graph require around 500MB of RAM, thus allowing the OS to use the rest of the memory for mapping the edge file.

For the LiveJournal graph, we see the most significant speedup because of its small size (the binary edge file is around 526MB). The OS can memory-map the entire file and keep it in memory at all times, eliminating many of the loading and unloading operations which the other approaches may require. There is less speedup for the Twitter graph: MMap is 1.5 times faster than TurboGraph for PageRank. This may be due to the large binary edge list file (11GB on disk) in addition to the 0.5GB of node vectors.

2) Results of PageRank and Connected Components on the YahooWeb Graph: Our implementation of PageRank for YahooWeb is slightly different from that for the other two datasets due to its large size. We cannot use the in-memory node vectors which we used for the smaller graphs, because the data cannot be loaded in 16GB of RAM. Note that a single node vector containing 4-byte floats for YahooWeb would need around 5.6GB of space. To resolve this, we used disk-based memory-mapped files for the node vectors. We expected this would significantly slow down our approach. Much to our surprise, even with this approach, MMap performed quite nicely compared to TurboGraph and GraphChi. As depicted in Figure 4(c), we in fact achieved slightly better performance than theirs. Our understanding is that, for such large node vectors, there exists a strong locality of reference. As the edge list file is grouped by source nodes, access to the node vectors is often localized to the source node's information (such as the node's PageRank value and degree). Thus, for a period of time, the OS loads only small chunks of each node vector into memory and uses almost the entire remaining RAM for mapping the huge binary edge list file (which is around 50GB). This speeds up the overall throughput.

D. Targeted Queries

Targeted queries represent the class of algorithms that need to access random partial chunks of the edge list file at most once. For targeted queries, we compared our approach with TurboGraph only, since GraphChi does not have a direct implementation for targeted queries. As explained in Section III, we used the index file to find the 1-step and 2-step neighbors of a node. TurboGraph uses 1MB as the custom page size for its memory manager. However, for most nodes, the chunk containing all of their neighbors is much smaller. Typically, the OS works at the much smaller granularity of a page, giving MMap a much faster average time than TurboGraph.

LiveJournal and YahooWeb graphs. In TurboGraph's experiments, 5 nodes were chosen randomly for the targeted queries and the average of their runtimes was reported. This approach works well for the LiveJournal and YahooWeb graphs; however, it does not work for the Twitter graph, because the Twitter graph has a much wider range of node degrees, as we show in Table III. For example, while the nodes that TurboGraph's experiments had selected had only up to 115,966 2-step neighbors, there are nodes in the Twitter graph with more than 22.5 million 2-step neighbors. Thus, we ran the queries on the LiveJournal and YahooWeb graphs following TurboGraph's approach, but when experimenting with the Twitter graph, we treated it separately. TurboGraph uses a custom notation for identifying a node, which consists of the pageID and slotID corresponding to its internal data structure. We were unable to recreate that mapping and thus resorted to finding comparable nodes which returned roughly equal numbers of 1-step and 2-step neighbors. Figure 5 shows the average query times for similar nodes in the LiveJournal and YahooWeb graphs.

TABLE III: The graphs' highest out-degrees.
Graph | Max. Out-degree
LiveJournal | 20,293
Twitter | 2,997,469
YahooWeb | 2,531

Fig. 5: Average runtimes of 1-step and 2-step out-neighbor queries for the LiveJournal and YahooWeb graphs. We chose 5 nodes with similar numbers of 1-step and 2-step out-neighbors. MMap outperforms TurboGraph by several orders of magnitude.
(a) LiveJournal graph: 1-step neighbors: TurboGraph 15.6 ms, MMap 0.7 ms. 2-step neighbors: TurboGraph 176.0 ms, MMap 7.0 ms.
(b) YahooWeb graph: 1-step neighbors: TurboGraph 154.7 ms, MMap 3.3 ms. 2-step neighbors: TurboGraph 667.3 ms, MMap 20.0 ms.

Twitter graph. We picked representative nodes covering the entire range of 1-step and 2-step neighbor counts and report the node IDs and corresponding runtimes in Tables IV and V, respectively. As shown in both tables, TurboGraph uses a special format to identify nodes. We were not able to locate the nodes used in TurboGraph's experiments. Thus, for TurboGraph, we resorted to randomly picking a large number of nodes and reporting their runtimes when their neighbor counts are close to those of the nodes used for MMap. We note that for 2-step neighbors, this approach has drawbacks: a node with a million 2-step neighbors may have a single 1-step neighbor with an out-degree of a million, or a million 1-step neighbors each with an out-degree of one, and these two cases will have a large variation in runtimes. The dashed cells in Tables IV and V indicate that we could not locate a similar node in TurboGraph's representation.

TABLE IV: 1-step neighbor query times (ms) on the Twitter graph for representative nodes from different degree ranges (MMap node IDs on the left, TurboGraph pageID:slotID identifiers on the right; the shorter time in each pair was shown in bold in the original).
MMap: Node ID | #Neighbors | ms || TurboGraph: Node ID | #Neighbors | ms
41955 | 67 | 1 || 6382:15 | 62 | 11
955 | 987 | 2 || 2600:16 | 764 | 12
1000 | 1,794 | 2 || 3666:64 | 1,770 | 13
989 | 5,431 | 4 || 3048:48 | 4,354 | 14
1,037,947 | 2,997,469 | 140 || (no comparable node found)

TABLE V: 2-step neighbor query times (ms) on the Twitter graph for representative nodes from different degree ranges (MMap node IDs on the left, TurboGraph pageID:slotID identifiers on the right; the shorter time in each pair was shown in bold in the original).
MMap: Node ID | #2-step Neighbors | ms || TurboGraph: Node ID | #2-step Neighbors | ms
25892360 | 102,000 | 156 || 6382:15 | 115,966 | 166
1000 | 835,941 | 235 || 2600:16 | 776,764 | 1,446
100000 | 1,096,771 | 532 || 3666:64 | 1,071,513 | 2,382
10000 | 6,787,901 | 7,281 || 3048:48 | 7,515,811 | 6,835
1037947 | 22,411,443 | 202,026 || (no comparable node found)

V. RELATED WORK

We survey some of the most relevant works, broadly divided into multi-machine and single-machine approaches.

Multi-machine. Distributed graph systems are divided into memory-based approaches (Pregel [18], GraphLab [17], [16] and Trinity [22]) and disk-based approaches (GBase [11] and Pegasus [12]). Pregel, and its open-source version Giraph [1], use the BSP (Bulk-Synchronous Parallel) model, which updates vertex states by message passing at each of a sequence of iterations called super-steps. GraphLab is a recent, best-of-the-breed distributed machine learning library for graphs. Trinity is a distributed graph system consisting of a memory-based distributed database and a computation platform. For huge graphs that do not fit in memory, distributed disk-based approaches are popular. Pegasus and GBase are disk-based graph systems on Hadoop, the open-source version of MapReduce [5]. These systems represent graph computation by matrix-vector multiplication and perform it efficiently.

Single-machine. This category is more related to our work. GraphChi [14] is one of the first works that demonstrated how graph computation can be performed on massive graphs with billions of nodes and edges on a commodity Mac mini computer, with speed matching that of distributed frameworks. TurboGraph [8] improves on GraphChi, with greater parallelism, to achieve speeds orders of magnitude faster. X-Stream [21] is an edge-centric graph system using streaming partitions; it forgoes the need for pre-processing and for building an index, which would cause random access into the set of edges.

VI. CONCLUSIONS AND FUTURE WORK

We proposed a minimalist approach for fast and scalable graph computation based on memory mapping (MMap), a fundamental OS capability. We contributed: (1) an important insight that MMap is a viable technique for creating fast and scalable graph algorithms; (2) the design and implementation of popular graph algorithms for billion-scale graphs using simple data structures and little code; (3) large-scale experiments on real graphs showing that MMap outperforms or attains speed comparable to state-of-the-art approaches. Our work provides a new direction in the design and development of scalable algorithms. We look forward to seeing how this technique may help with other general data mining and machine learning algorithms. For the road ahead, we will explore several related ideas, such as porting our Java implementation to C++ for even greater speed, and exploring how to support time-evolving graphs.

ACKNOWLEDGMENT

Funding was provided in part by the U.S. Army Research Office (ARO) and DARPA under Contract No. W911NF-11-C-0088. This material is based upon work supported by the NSF Grant No. IIS-1217559 and the NSF Graduate Research Fellowship Program under Grant No. DGE-1148903. This work was partly supported by the IT R&D program of MSIP/IITP of Korea [10044970, Development of Core Technology for Human-like Self-taught Learning based on Symbolic Approach].

REFERENCES

[1] C. Avery. Giraph: Large-scale graph processing infrastructure on Hadoop. Proc. of the Hadoop Summit, 2011.
[2] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, pages 44–54. ACM, 2006.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In WWW, pages 107–117, Amsterdam, The Netherlands, 1998. Elsevier Science Publishers B. V.
[4] R. E. Bryant and D. R. O'Hallaron. Computer Systems: A Programmer's Perspective, 2nd ed. Addison-Wesley, 2010.
[5] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI'04, Dec. 2004.
[6] M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. SIGCOMM, 1999.
[7] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, 2012.
[8] W.-S. Han, S. Lee, K. Park, J.-H. Lee, M.-S. Kim, J. Kim, and H. Yu. TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC. In KDD. ACM, 2013.
[9] U. Kang, D. Chau, and C. Faloutsos. Inference of beliefs on billion-scale graphs. The 2nd Workshop on Large-scale Data Mining: Theory and Applications, 2010.
[10] U. Kang, B. Meeder, E. Papalexakis, and C. Faloutsos. HEigen: Spectral analysis for billion-scale graphs. IEEE Transactions on Knowledge and Data Engineering, 26(2):350–362, February 2014.
[11] U. Kang, H. Tong, J. Sun, C.-Y. Lin, and C. Faloutsos. GBase: an efficient analysis platform for large graphs. VLDB Journal, 21(5):637–650, 2012.
[12] U. Kang, C. E. Tsourakakis, and C. Faloutsos. PEGASUS: A peta-scale graph mining system implementation and observations. In ICDM, 2009.
[13] H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In WWW, pages 591–600. ACM, 2010.
[14] A. Kyrola, G. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, 2012.
[15] R. Love. Linux System Programming. O'Reilly Media, 2007.
[16] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proc. of the VLDB Endowment, 5(8):716–727, 2012.
[17] Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new framework for parallel machine learning. arXiv preprint, 2010.
[18] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conf., pages 135–146. ACM, 2010.
[19] MathWorks. Overview of memory-mapping. Accessed: 2013-07-31.
[20] MSDN. Memory-mapped files. Accessed: 2013-07-31.
[21] A. Roy, I. Mihailovic, and W. Zwaenepoel. X-Stream: Edge-centric graph processing using streaming partitions. In Proc. SOSP, 2013.
[22] B. Shao, H. Wang, and Y. Li. Trinity: a distributed graph engine on a memory cloud. In SIGMOD Conf., pages 505–516, 2013.
[23] R. E. Tarjan and J. van Leeuwen. Worst-case analysis of set union algorithms. J. ACM, 31(2):245–281, Mar. 1984.
[24] A. Tevanian, R. F. Rashid, M. Young, D. B. Golub, M. R. Thompson, W. J. Bolosky, and R. Sanzi. A Unix interface for shared memory and memory mapped files under Mach. In USENIX Summer, pages 53–68. Citeseer, 1987.
[25] Yahoo! Labs. Yahoo AltaVista web page hyperlink connectivity graph, 2002. Accessed: 2013-08-31.
