Tachyon: A Reliable Memory Centric Storage for Analytics

Haoyuan (HY) Li, , , Scott Shenker,

UC Berkeley

Outline

• Overview

• Research

• Open Source

• Future

Memory is King

• RAM throughput increasing exponentially

• Disk throughput increasing slowly

Memory-locality key to interactive response time

Realized by many…

• Frameworks already leverage memory

Problem solved?

An Example: Spark

• Fast in-memory data processing framework
  – Keep one in-memory copy inside JVM
  – Track lineage of operations used to derive data
  – Upon failure, use lineage to recompute data
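The three points above (one in-memory copy, tracked lineage, recompute on failure) can be illustrated with a toy model. This is a minimal sketch in plain Scala, not Spark's implementation: the names `LineageSketch`, `Dataset`, `Source`, `Derived`, and `loseCache` are hypothetical and chosen only for illustration.

```scala
// Toy model of lineage-based recovery. All names here are hypothetical;
// this is not Spark's RDD API.
object LineageSketch {

  abstract class Dataset[A] {
    // The single in-memory copy of this dataset, if currently cached.
    private var cache: Option[Vector[A]] = None
    // Counts how many times this dataset had to be (re)computed.
    var computations: Int = 0

    // The lineage: how to derive this dataset from its source or parent.
    protected def compute(): Vector[A]

    // Return the cached copy, or rebuild it from lineage and cache it.
    def collect(): Vector[A] = cache.getOrElse {
      computations += 1
      val data = compute()
      cache = Some(data)
      data
    }

    // Simulate a failure that drops the in-memory copy.
    def loseCache(): Unit = { cache = None }

    // Each transformation records its parent and the operation applied.
    def map[B](f: A => B): Dataset[B] = new Derived[A, B](this, _.map(f))
    def filter(p: A => Boolean): Dataset[A] = new Derived[A, A](this, _.filter(p))
  }

  // A dataset loaded from stable storage.
  final class Source[A](load: () => Vector[A]) extends Dataset[A] {
    protected def compute(): Vector[A] = load()
  }

  // A dataset derived from a parent by one recorded operation.
  final class Derived[A, B](parent: Dataset[A], f: Vector[A] => Vector[B])
      extends Dataset[B] {
    protected def compute(): Vector[B] = f(parent.collect())
  }
}
```

In this sketch, calling `loseCache()` on a derived dataset and then `collect()` again replays only the recorded operation against the parent, which is the property the slide relies on for fault tolerance.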

Lineage Tracking

[Figure: lineage graph of map, filter, join, and reduce operations]

Issue 1

Different jobs share data: slow writes to disk

[Figure: two Spark tasks, each running the execution engine and storage engine in the same process and caching blocks in its own block manager, share data only through slow writes to HDFS on disk]

Different frameworks share data: slow writes to disk

[Figure: a Spark task and a Hadoop MR job on YARN share data only through slow writes to HDFS on disk]

Issue 2

Process crash: lose all cache

[Figure: the execution engine and storage engine share one process, with blocks 1 and 3 cached in Spark memory by the block manager; when the process crashes, the cached blocks are lost and only the HDFS copies on disk remain]

Issue 3

Duplicate memory per job & GC

[Figure: two Spark tasks each cache their own copies of blocks in Spark memory (duplication & GC); HDFS holds blocks 1 to 4 on disk]

Tachyon

Reliable data sharing at memory-speed within and across cluster frameworks/jobs

Solution Overview

Basic idea

• Push lineage down to storage layer

• Use memory aggressively

Facts

• One data copy in memory

• Rely on recomputation for fault-tolerance

Stack

Computation Frameworks (Spark, MapReduce, Impala, Tez, …)

Tachyon

Existing Storage Systems (HDFS, S3, GlusterFS, …)

Architecture

Lineage
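Pushing lineage down to the storage layer means the storage system itself knows how each file was produced. The following is a minimal sketch of that idea, assuming a hypothetical record of input files plus the job that wrote the output; `FileLineageSketch`, `Lineage`, and `recoveryPlan` are illustrative names, not Tachyon's API.

```scala
// Toy model of file-level lineage kept by a storage layer.
// All names are hypothetical; this is not Tachyon's API.
object FileLineageSketch {

  // For one output file: the files it was derived from and the job that wrote it.
  final case class Lineage(inputs: List[String], job: String)

  // Jobs to re-run, in dependency order, so that `file` exists again.
  // `available` holds files that are still readable (in memory or checkpointed).
  def recoveryPlan(file: String,
                   lineage: Map[String, Lineage],
                   available: Set[String]): List[String] = {

    def visit(f: String, planned: List[String]): List[String] = {
      if (available(f)) {
        planned
      } else {
        lineage.get(f) match {
          case None =>
            throw new IllegalStateException(s"$f is lost and has no lineage")
          case Some(Lineage(inputs, job)) =>
            // Recover missing inputs first, then schedule the producing job once.
            val withInputs = inputs.foldLeft(planned)((acc, in) => visit(in, acc))
            if (withInputs.contains(job)) withInputs else withInputs :+ job
        }
      }
    }

    visit(file, Nil)
  }
}
```

With a chain raw → clean → report and only raw still available, the plan is to re-run the job that produced clean and then the job that produced report; if clean survives, only the last job is re-run.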

Issue 1 revisited

Different frameworks share at memory-speed

[Figure: a Spark task and a Hadoop MR job on YARN share blocks through Tachyon in memory (fast writes), with HDFS on disk underneath]

Issue 2 revisited

Process crash: keep memory-cache

[Figure: blocks are held in Tachyon's memory rather than in the Spark process, so they survive when the process running the execution engine crashes; HDFS on disk sits underneath]

Issue 3 revisited

Off-heap memory storage: one memory copy & no GC

[Figure: two Spark tasks read the same blocks from Tachyon's memory (no duplication & GC), with HDFS on disk underneath]

Outline

• Overview

• Research

• Open Source

• Future

Question 1: How long to get missing data back?

That server contains the data computed last month!

Lineage enables Asynchronous Checkpointing

Edge Algorithm

• Checkpoint leaves

• Checkpoint hot files

• Bounded Recovery Cost
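The first two bullets can be read as a selection rule over the lineage graph. Below is a minimal sketch of such a rule, under stated assumptions: the graph is given as a map from each file to its parent files, "hot" means an access count at or above a caller-supplied threshold, and hot files are listed before leaves. The names `EdgeSketch`, `leaves`, `nextToCheckpoint`, and `hotThreshold` are hypothetical, and the ordering and threshold are illustrative choices rather than details confirmed by the slides.

```scala
// Toy checkpoint selection in the spirit of the Edge algorithm.
// Names, ordering, and the hotness threshold are illustrative assumptions.
object EdgeSketch {

  // parents(f) = files that f was derived from. Every file must appear as a
  // key; source files map to an empty set.
  def leaves(parents: Map[String, Set[String]]): Set[String] = {
    val hasChild = parents.values.flatten.toSet
    parents.keySet.diff(hasChild)
  }

  // Files to checkpoint next: hot files first, then leaves of the lineage graph.
  def nextToCheckpoint(parents: Map[String, Set[String]],
                       checkpointed: Set[String],
                       accessCount: Map[String, Int],
                       hotThreshold: Int): List[String] = {
    val pending = parents.keySet.diff(checkpointed)
    val hot = pending.filter(f => accessCount.getOrElse(f, 0) >= hotThreshold)
    val edge = leaves(parents).intersect(pending).diff(hot)
    hot.toList.sorted ++ edge.toList.sorted
  }
}
```

The intuition for the third bullet is that once the current leaves are checkpointed, recovering any later file only needs to replay the lineage added after that checkpoint, not the whole history.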

Question 2: How to allocate recomputation resource?

Would recomputation slow down my high priority jobs? Priority Inversion?

Recomputation Resource Allocation

• Priority Based Scheduler

• Fair Sharing Based Scheduler
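For the priority-based case, one plausible way to avoid priority inversion is priority inheritance: a recomputation runs at the lowest priority unless a job is blocked on its output, in which case it takes that job's priority. The slides do not spell out the mechanism, so the sketch below is an assumption, and `RecomputeSchedulingSketch`, `Job`, `Recompute`, `effectivePriority`, and `runOrder` are hypothetical names.

```scala
// Toy strict-priority ordering with priority inheritance for recomputation.
// The mechanism and all names are illustrative assumptions.
object RecomputeSchedulingSketch {

  // A user job with a priority (higher runs first) and the lost files it waits on.
  final case class Job(name: String, priority: Int, waitsOn: Set[String] = Set.empty)

  // A task that regenerates one lost file from lineage.
  final case class Recompute(output: String)

  val LowestPriority = 0

  // A recomputation inherits the highest priority among jobs blocked on its output.
  def effectivePriority(r: Recompute, jobs: Seq[Job]): Int =
    jobs.filter(_.waitsOn.contains(r.output))
      .map(_.priority)
      .foldLeft(LowestPriority)((a, b) => math.max(a, b))

  // Order in which a strict-priority scheduler would run everything.
  // Recomputations are listed first so that, on a tie, the stable sort places
  // a recomputation ahead of the job that waits on it.
  def runOrder(jobs: Seq[Job], recomputes: Seq[Recompute]): Seq[String] = {
    val entries =
      recomputes.map(r => (effectivePriority(r, jobs), "recompute:" + r.output)) ++
        jobs.map(j => (j.priority, j.name))
    entries.sortBy { case (p, _) => -p }.map(_._2)
  }
}
```

In this sketch a recomputation that nobody is waiting for stays behind all regular jobs, so it cannot slow down high-priority work, while one that blocks a high-priority job is promoted ahead of lower-priority jobs.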

Comparison with In-Memory HDFS

Further Improve Spark’s Performance

[Figure: chart labeled Grep]

Master Faster Recovery

Recomputation Resource Consumption

Outline

• Overview

• Research

• Open Source

• Future

Open Source Status

• Apache License, Version 0.4.1 (Feb 2014)

• 15+ Companies

• Spark and MapReduce applications can run without any code change

Release Growth

• Tachyon 0.2 (Apr ‘13): 3 contributors

• Tachyon 0.3 (Oct ‘13): 15 contributors

• Tachyon 0.4 (Feb ‘14): 30 contributors

• Tachyon 0.5 (July ‘14): ? contributors

Contributors Outside of Berkeley

More than 75%

Tachyon is the Default Off-Heap Storage Solution for Spark.

Thanks to Redhat!

Commercially Supported By Atigeo

Spark/MapReduce/Shark without Tachyon

• Spark
  – val file = sc.textFile("hdfs://ip:port/path")

• Hadoop MapReduce
  – hadoop jar hadoop-examples-1.0.4.jar wordcount hdfs://localhost:19998/input hdfs://localhost:19998/output

• Shark
  – CREATE TABLE orders_cached AS SELECT * FROM orders;

Spark/MapReduce/Shark with Tachyon

• Spark
  – val file = sc.textFile("tachyon://ip:port/path")

• Hadoop MapReduce
  – hadoop jar hadoop-examples-1.0.4.jar wordcount tachyon://localhost:19998/input tachyon://localhost:19998/output

• Shark
  – CREATE TABLE orders_tachyon AS SELECT * FROM orders;

Spark OFF_HEAP with Tachyon

import org.apache.spark.storage.StorageLevel

// Input data from Tachyon's memory
val file = sc.textFile("tachyon://ip:port/path")

// Store RDD OFF_HEAP in Tachyon's memory
file.persist(StorageLevel.OFF_HEAP)

Thanks to our Code Contributors!

Aaron Davidson, Chang Cheng, Jey Kottalam, Raymond Liu, Achal Soni, Du Li, Joseph Tang, Ali Ghodsi, Fei Wang, Lukasz Jastrzebski, Robert Metzger, Andrew Ash, Gerald Zhang, Manu Goyal, Rong Gu, Anurag Khandelwal, Grace Huang, Mark Hamstra, Sean Zhong, Aslan Bekirov, Hao Cheng, Nick Lanham, Srinivas Parayya, Bill Zhao, Orcun Simsek, Tao Wang, Calvin Jia, Henry Saputra, Pengfei Xuan, Timothy St. Clair, Vamsi Chitters, Colin Patrick McCabe, Hobin Yoon, Qifan Pu, Xi Liu, Shivaram Venkataraman, Huamin Chen, Qianhao Dong, Xiaomin Zhang

Outline

• Overview

• Research

• Open Source

• Future

Goal?

Better Assist Other Components

MapReduce, Spark, Tez, Shark, GraphX, Impala, ……

Tachyon

GlusterFS, HDFS, S3, Localfs, NFS, Ceph, ……

Welcome Collaboration!

Short Term Roadmap

• Ceph Integration (Redhat)

• Hierarchical Local Storage (Intel)

• Further improve Shark Performance (Yahoo)

• Better support for Multi-tenancy (AMPLab)

• Many more from AMPLab and Industry Collaborators.

Your Requirements?

Tachyon Summary

• High-throughput, fault-tolerant memory centric storage, with lineage as a first class citizen

• Further improve performance for frameworks such as Spark, Hadoop, and Shark

• Healthy community with 15+ companies contributing

Thanks! Questions?

• More Information:
  – http://tachyon-project.org
  – https://github.com/amplab/tachyon

• Email:
  – [email protected]