Evaluating Cloud Computing for HPC Applications

Lavanya Ramakrishnan, CRD & NERSC

Magellan Research Agenda

• What are the unique needs and features of a science cloud?
  – NERSC Magellan user survey
• What applications can efficiently run on a cloud?
  – Benchmarking cloud technologies (e.g., Hadoop) and platforms (Amazon EC2, Azure)
• Are programming models such as Hadoop effective for scientific applications?
  – Experimentation with early applications
• Can scientific applications use a data-as-a-service or software-as-a-service model?
• What are the security implications of user-controlled cloud images?
• Is it practical to deploy a single logical cloud across multiple DOE sites?
• What is the cost and energy efficiency of clouds?

Magellan User Survey

[Bar chart of survey responses (0–90% of users) for:
 – User interfaces/science gateways: use of clouds to host science gateways and/or access to cloud resources through science gateways
 – Hadoop File System
 – MapReduce programming model/Hadoop
 – Cost associativity (i.e., 10 CPUs for 1 hour or 2 CPUs for 5 hours at the same cost)
 – Easier to acquire/operate than a local cluster
 – Exclusive access to the computing resources/ability to schedule independently of other groups/users
 – Ability to control groups/users
 – Ability to share setup of software or experiments with collaborators
 – Ability to control software environments specific to my application
 – Access to on-demand (commercial) paid resources closer to deadlines
 – Access to additional resources]

Program Office

Advanced Scientific Computing Research        17%
High Energy Physics                           20%
Biological and Environmental Research          9%
Nuclear Physics                               13%
Basic Energy Sciences – Chemical Sciences     10%
Advanced Networking Initiative (ANI) Project   3%
Fusion Energy Sciences                        10%
Other                                         14%

Cloud Computing Services

• Infrastructure (IaaS)
  – Provides unlimited access to data storage and compute cycles
  – e.g., Amazon EC2, Eucalyptus
• Platform (PaaS)
  – Delivery of a computing platform/software stack
  – Containers/images for specific user groups
  – e.g., Hadoop, Azure
• Software (SaaS)
  – Specific function provided for use across multiple user groups (e.g., science gateways)

Magellan Software

[Diagram: the Magellan cluster hosts an ANI cloud (Eucalyptus), Hadoop, science gateways, batch queues, private clusters, and virtual machines, alongside public or remote science and storage resources]

Amazon EC2

• Web-service API to IaaS offering (illustrated below)
• Uses Xen paravirtualization
  – cluster compute instance type uses hardware-assisted virtualization
• Non-persistent local disk in VM
• Simple Storage Service (S3) – scalable, persistent object store
• Elastic Block Store (EBS) – persistent, block-level storage
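For illustration, a minimal sketch of driving the EC2 web-service API from Python using the classic boto library. The AMI ID, key pair, and region are placeholders, not the images used in this work.

    import time
    import boto.ec2

    # Hypothetical AMI, key pair, and region -- substitute your own.
    conn = boto.ec2.connect_to_region("us-east-1")
    res = conn.run_instances("ami-12345678", instance_type="c1.xlarge",
                             key_name="my-keypair")
    inst = res.instances[0]
    while inst.update() != "running":   # poll until the VM boots
        time.sleep(10)

    # Local instance disk is non-persistent; EBS provides persistent
    # block storage that survives instance shutdown.
    vol = conn.create_volume(100, inst.placement)  # 100 GB, same zone
    conn.attach_volume(vol.id, inst.id, "/dev/sdf")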

Eucalyptus

• Open-source IaaS implementation
  – API-compatible with Amazon AWS
  – manages virtual machines
• Walrus & Block Storage
  – interface-compatible with S3 & EBS
• Available to users on the Magellan testbed
• Private virtual clusters
  – scripts to manage dynamic virtual clusters (see the sketch below)
  – NFS, Torque, etc.
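The AWS API compatibility is what makes such scripts portable. A sketch of pointing boto at a Eucalyptus endpoint to start a virtual cluster; the endpoint, credentials, and EMI ID are hypothetical.

    import boto
    from boto.ec2.regioninfo import RegionInfo

    # Hypothetical endpoint and credentials for a site-local Eucalyptus cloud.
    region = RegionInfo(name="eucalyptus", endpoint="euca.example.gov")
    conn = boto.connect_ec2(aws_access_key_id="...",
                            aws_secret_access_key="...",
                            is_secure=False, region=region,
                            port=8773, path="/services/Eucalyptus")

    # Start an 8-node virtual cluster; the first instance can then be
    # configured as the NFS/Torque head node, the rest as workers.
    res = conn.run_instances("emi-12345678", min_count=8, max_count=8,
                             instance_type="c1.xlarge", key_name="my-keypair")
    head, workers = res.instances[0], res.instances[1:]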

Virtualization Impact

• Platforms
  – Amazon EC2, Azure, Lawrencium (LBNL IT cluster)
  – Magellan: IB, TCP over IB, TCP over Ethernet, VM
• Workloads
  – HPCC
  – NERSC-6 benchmarks
  – application pipelines: JGI, Supernova Factory
• Metrics
  – performance, cost, reliability, programmability

NERSC-6 Benchmark Performance [1/2]

[Chart: runtime relative to Carver (0–18) for GAMESS, GTC, IMPACT, fvCAM, and MAESTRO256 on Amazon EC2, Lawrencium, EC2-Beta, EC2-Beta-Opt, Franklin, and Carver]

NERSC-6 Benchmark Performance [2/2]

[Chart: runtime relative to Carver (0–60) for MILC and PARATEC on the same platforms]

Magellan: NERSC-6 Application Benchmarks

[Chart: performance relative to native (0–100%) for GTC, PARATEC, and CAM under TCP over IB, TCP over Ethernet, and VM]

Magellan: HPCC

[Charts: HPCC performance relative to native for DGEMM, PingPong bandwidth, HPL, and PTRANS under TCP over IB, TCP over Ethernet, and VM; PingPong and RandomRing latency relative to native under TCP over IB, TCP over Ethernet, and IB/VM]

Performance-Cost Tradeoffs

• James Hamilton’s cost model
  – http://perspectives.mvdirona.com/2010/09/18/OverallDataCenterCosts.aspx
  – expanded for HPC environments
• Quantifies the difference in cost between IB and Ethernet

Application Class                                        Performance Increase / Cost Increase
Tightly coupled with I/O (e.g., CAM)                     4.36
Tightly coupled with minimal I/O (e.g., PARATEC, GTC)    1.2
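The metric in this table is a ratio of ratios. A minimal worked example, using illustrative placeholder numbers rather than the study's measurements:

    # Performance increase / cost increase: how much faster the code runs
    # on InfiniBand, divided by how much more the InfiniBand cluster costs.
    def perf_per_cost(runtime_eth, runtime_ib, cost_ib, cost_eth):
        speedup = runtime_eth / runtime_ib    # performance increase from IB
        cost_increase = cost_ib / cost_eth    # relative cost of the IB fabric
        return speedup / cost_increase

    # e.g., an I/O-heavy code running 5x faster on hardware costing 15% more:
    print(perf_per_cost(runtime_eth=10.0, runtime_ib=2.0,
                        cost_ib=1.15, cost_eth=1.0))   # ~4.35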

Nearby Supernova Factory

• Tools to measure the expansion history of the Universe and explore the nature of Dark Energy
  – largest data-volume supernova search
• Data pipeline
  – custom data analysis codes, coordinated by Python scripts (a sketch follows)
  – runs on a standard Linux batch-queue cluster
• Cloud provides
  – control over OS versions
  – root access and a shared “group” account
  – immunity to externally enforced OS or architecture changes
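A minimal sketch of that driver pattern: each analysis stage is an ordinary executable chained by a Python script, so the same workflow runs on a batch cluster or inside a cloud VM. The stage names and file names are hypothetical, not the actual Supernova Factory codes.

    import subprocess

    # Hypothetical pipeline stages; each is a custom analysis executable.
    STAGES = [
        ["subtract_reference", "night.fits", "ref.fits", "diff.fits"],
        ["find_candidates", "diff.fits", "candidates.list"],
        ["score_candidates", "candidates.list", "scored.list"],
    ]

    for cmd in STAGES:
        # Run the stages in order; abort the pipeline on the first failure.
        subprocess.check_call(cmd)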

Experiments on Amazon EC2

Experiment   Input Data                           Output Data
EBS-A1       EBS via NFS                          Local storage to EBS via NFS
EBS-A2       Staged to local storage from EBS     Local storage to EBS via NFS
EBS-B1       EBS via NFS                          EBS via NFS
EBS-B2       Staged to local storage from EBS     EBS via NFS
S3-A         EBS via NFS                          Local storage to S3
S3-B         Staged to local storage from S3      Local storage to S3
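A sketch of the "staged to local storage from S3" input mode using classic boto; the bucket name, key prefix, and scratch path are placeholders.

    import boto

    # Hypothetical bucket, key prefix, and local scratch directory.
    conn = boto.connect_s3()
    bucket = conn.get_bucket("snfactory-inputs")
    for key in bucket.list(prefix="night-2010-09-18/"):
        # Copy each object to fast local (ephemeral) disk before the run,
        # instead of reading it over NFS from an EBS volume.
        key.get_contents_to_filename("/scratch/" + key.name.split("/")[-1])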

Total Cost (Instance + Storage per Month)

[Chart: total cost of an experiment (cost of instances + storage cost per month, in dollars, 0–180) for EBS-A1, EBS-A2, EBS-B1, EBS-B2, S3-A, and S3-B, with instance cost and data cost per month shown separately for the 40 and 80 configurations]
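The shape of the chart follows from a simple model: instance-hours plus persistent storage dominate the bill. A back-of-the-envelope sketch with placeholder prices, not the rates used in the study:

    # Assumed, illustrative prices: $/instance-hour and $/GB-month.
    def monthly_cost(nodes, hours, gb_stored,
                     instance_rate=0.68, storage_rate=0.10):
        return nodes * hours * instance_rate + gb_stored * storage_rate

    # e.g., a 40-node, 6-hour run that keeps 500 GB on EBS for the month:
    print("$%.2f" % monthly_cost(nodes=40, hours=6, gb_stored=500))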


Hadoop Stack

• Open-source, reliable, scalable distributed computing
  – implementation of MapReduce
  – Hadoop Distributed File System (HDFS)

[Diagram: the Hadoop stack – Pig, Chukwa, Hive, and HBase atop MapReduce and HDFS, with ZooKeeper, Core, and Avro alongside]

Source: Hadoop: The Definitive Guide

Hadoop for Science

• Advantages of Hadoop
  – transparent data replication, data-locality-aware scheduling
  – fault-tolerance capabilities
• Mode of operation (see the sketch below)
  – use streaming to launch a script that calls the executable
  – HDFS for input; a shared file system is needed for the binary and database
  – input formats must handle multi-line inputs (BLAST sequences) and binary data (high-energy physics)
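A minimal sketch of that streaming mode: a Python mapper that reassembles multi-line FASTA records from stdin and shells out to the executable. It assumes an input format that presents whole sequences to each mapper; the binary and database paths are placeholders.

    #!/usr/bin/env python
    import subprocess
    import sys

    def run_blast(record):
        # Hypothetical binary and database paths on the shared file system.
        p = subprocess.Popen(
            ["blastall", "-p", "blastp", "-d", "/shared/db/img_genes"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        out, _ = p.communicate(record)
        sys.stdout.write(out)

    # Group stdin lines into FASTA records: a ">" header starts a new one.
    record = []
    for line in sys.stdin:
        if line.startswith(">") and record:
            run_blast("".join(record))
            record = []
        record.append(line)
    if record:
        run_blast("".join(record))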

Hadoop Benchmarking: Early Results [1/2]

• Compare traditional parallel file systems to HDFS
  – TeraGen and TeraSort to compare file system performance
  – 32 maps for TeraGen and 64 reduces for TeraSort over a terabyte of data (invocation sketched below)
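For reference, a sketch of how such runs might be driven; the examples-jar path varies by Hadoop release, so treat it as a placeholder.

    import subprocess

    JAR = "/usr/lib/hadoop/hadoop-examples.jar"   # placeholder path

    # Generate ~1 TB (10 billion 100-byte rows) with 32 map tasks...
    subprocess.check_call(["hadoop", "jar", JAR, "teragen",
                           "-Dmapred.map.tasks=32",
                           "10000000000", "/bench/terasort-in"])
    # ...then sort it with 64 reduce tasks.
    subprocess.check_call(["hadoop", "jar", JAR, "terasort",
                           "-Dmapred.reduce.tasks=64",
                           "/bench/terasort-in", "/bench/terasort-out"])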

[Chart: time in seconds (0–6000) for TeraGen and TeraSort (64 reduces) on GPFS, HDFS, and Lustre]

Hadoop Benchmarking: Early Results [2/2]

[Chart: TestDFSIO throughput (MB/s, 0–70) versus number of concurrent writers (0–60) for 10 MB and 10 GB file sizes; TestDFSIO is used to understand concurrency at the default block size]
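A sketch of the sweep behind this chart: TestDFSIO writes one file per map task, so varying -nrFiles varies the number of concurrent writers. The test-jar path is a placeholder, and -fileSize is given in MB.

    import subprocess

    JAR = "/usr/lib/hadoop/hadoop-test.jar"   # placeholder path
    for writers in (1, 2, 4, 8, 16, 32, 64):
        subprocess.check_call(["hadoop", "jar", JAR, "TestDFSIO", "-write",
                               "-nrFiles", str(writers),
                               "-fileSize", "10240"])  # 10 GB per file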

IMG Systems: Genome & Metagenome Data Flow

[Diagram: IMG data flow – 5,115 genomes / 6.5 million genes, growing by ~350–500 genomes (~0.5–1 million genes) every 4 months; +330 genomes and 158 GEBA genomes on demand (8.2 million genes); +287 samples from ~105 studies, +12.5 million genes (19 million genes) monthly; 65 samples from 21 studies, +2.6 million genes (9.1 million total) on demand]

BLAST on Hadoop

• NCBI BLAST (2.2.22)
  – reference: IMG genomes, 6.5 million genes (~3 GB in size)
  – full input set: 12.5 million metagenome genes against the reference
• BLAST Hadoop
  – uses streaming to manage input data sequences
  – binary and databases on a shared file system
• BLAST task-farming implementation (sketched below)
  – server reads inputs and manages the tasks
  – client runs BLAST, copies the database to local disk or ramdisk once at startup, and pushes back results
  – advantages: fault-resilient, and allows incremental expansion as resources become available
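A minimal sketch of the task-farming client loop under stated assumptions: the server URL, its get_task/put_result RPC methods, and the file paths are all hypothetical, not the implementation used here.

    import subprocess
    import xmlrpclib  # Python 2 era, contemporary with this work

    # Hypothetical task-farm server exposing get_task()/put_result().
    server = xmlrpclib.ServerProxy("http://taskfarm-head:8080")

    # Copy the BLAST database to local disk (or ramdisk) once at startup.
    subprocess.check_call(["cp", "-r", "/shared/db/img_genes", "/tmp/db"])

    while True:
        task = server.get_task()      # a chunk of input sequences
        if not task:
            break                     # no work left; client exits
        out = subprocess.check_output(
            ["blastall", "-p", "blastp", "-d", "/tmp/db/img_genes",
             "-i", task["input_path"]])
        server.put_result(task["id"], out)  # push results back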

BLAST Performance

[Chart: time in minutes (0–100) versus number of processors (16, 32, 64, 128) for EC2-taskFarmer, Franklin-taskFarmer, EC2-Hadoop, and Azure]

BLAST on Yahoo! M45 Hadoop

• Initial configuration
  – Hadoop memory ulimit issues; Hadoop memory limits increased to accommodate high-memory tasks
  – 1 map per node for high-memory tasks to reduce contention
  – thrashing when the database does not fit in memory
• NFS shared file system for the common database
  – moved the database to local nodes (copied to local /tmp)
  – the initial copy takes 2 hours, but the BLAST job then completes in under 10 minutes
  – performance is equivalent to other cloud environments
  – future: experiment with DistributedCache
• Time to solution varies – no guarantee of simultaneous availability of resources
• Strong user-group and sysadmin support was key in working through this

Summary: Virtualization for Science

• Porting applications still requires a lot of work
• Public clouds
  – virtualization has a performance impact
  – failures when creating large instances
  – data costs tend to be overwhelming
• Eucalyptus
  – learning curve and stability at scale
  – alternate stacks; trying version 2.0
  – SSE instructions were not exposed in the VM
• Additional benchmarking needed

Summary: Hadoop for Science

• Deployment challenges
  – all jobs run as user “hadoop”, affecting file permissions
  – less control over how many nodes are used, which affects allocation policies
  – file system performance for large file sizes
• Programming challenges: no turn-key solution
  – using existing code bases, managing input formats and data
• Performance
  – BLAST over Hadoop: performance is comparable to existing systems
  – existing parallel file systems can be used through Hadoop On Demand
• Additional benchmarking and tuning needed; plug-ins for science

Acknowledgements

This work was funded in part by the Advanced Scientific Computing Research (ASCR) program in the DOE Office of Science under contract number DE-AC02-05CH11231. Thanks to CITRIS/UC, Yahoo! M45, the Amazon EC2 Education and Research Grants, Microsoft Research, Wei Lu, Dennis Gannon, Masoud Nikravesh, and Greg Bell.

Magellan – Shane Canon, Iwona Sakrejda
Magellan benchmarking – Shane Canon, Nick Wright
EC2 benchmarking – Keith Jackson, Krishna Muriki, John Shalf, Shane Canon, Harvey Wasserman, Shreyas Cholia, Nick Wright
BLAST – Shreyas Cholia, Keith Jackson, Shane Canon, John Shalf
Supernova Factory – Keith Jackson, Rollin Thomas, Karl Runge

Questions? [email protected]
