HPC-ABDS Integrated Software
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack Geoffrey Foxa, Judy Qiua, Shantenu Jhab, Supun Kamburugamuvea and Andre Luckowb a School of Informatics and Computing, Indiana University, Bloomington, IN 47408, USA b RADICAL, Rutgers University, Piscataway, NJ 08854, USA Abstract We review the High Performance Computing Enhanced Apache Big Data Stack HPC-ABDS and summarize the capabilities in 21 identified architecture layers. These cover Message and Data Protocols, Distributed Coordination, Security & Privacy, Monitoring, Infrastructure Management, DevOps, Interoperability, File Systems, Cluster & Resource management, Data Transport, File management, NoSQL, SQL (NewSQL), Extraction Tools, Object-relational mapping, In-memory caching and databases, Inter-process Communication, Batch Programming model and Runtime, Stream Processing, High-level Programming, Application Hosting and PaaS, Libraries and Applications, Workflow and Orchestration. We summarize status of these layers focusing on issues of importance for data analytics. We highlight areas where HPC and ABDS have good opportunities for integration. 1. Introduction to HPC-ABDS Big Data ABDS HPC, Cluster 17. Orchestration Crunch, Tez, Cloud Dataflow Kepler, Pegasus, Taverna 16. Libraries Mllib/Mahout, R, Python ScaLAPACK, PETSc, Matlab 15A. High Level Programming Pig, Hive, Drill Domain-specific Languages 15B. Platform as a Service App Engine, BlueMix, Elastic Beanstalk XSEDE Software Stack Languages Java, Erlang, Scala, Clojure, SQL, SPARQL, Python Fortran, C/C++, Python 14B. Streaming Storm, Kafka, Kinesis HPC-ABDS 13,14A. Parallel Runtime MapReduce MPI/OpenMP/OpenCL Integrated CUDA, Exascale Runtime 2. Coordination Zookeeper 12. Caching Memcached Software 11. Data Management Hbase, Neo4J, MySQL iRODS 10. Data Transfer Sqoop GridFTP 9. Scheduling Yarn Slurm 8. File Systems HDFS, Object Stores Lustre 1, 11A Formats Thrift, Protobuf FITS, HDF 5.
[Show full text]