NASA Big Data Challenges: Ames Perspective
Total Page:16
File Type:pdf, Size:1020Kb
NASA Big Data Challenges: Ames Perspective Dr. Piyush Mehrotra Chief, NASA Advanced Supercomputing (NAS) Division [email protected] Agenda • Ames Overview • Big Data Challenges for Users • NASA Supercomputing – HECC project • Big Data related projects @ NAS National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 2 • Occupants: ~1130 civil servants; ~2,100 contractors;1,650 tenants Ames Research Center ~1344 summer students in 2015 • FY2016 Budget: ~$915M (including reimbursable/EUL) • ~1,900 acres (400 acres security perimeter); 5M building ft2 • Airfield: ~9,000 and 8,000 ft runways National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 Partnerships at Ames • Partnering with external organizations to access capabilities under collaborative agreements • Entering into reimbursable agreements for partner access to NASA capabilities • Expanding overall landscape of space activity (maximizing public and private sector growth) • Spurring innovation International Commercial Virtual Institutes Interagency NAI: the Community Military NASA Research Park Academia NASA Centers National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 Core Competencies at Ames Today Space and Earth Sciences Intelligent/ Air Traffic Entry Adaptive Management Systems Systems Advanced Computing Cost-Effective Astrobiology and Life Space Missions Aerosciences & IT Science Systems National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 National Strategic Computing Initiative Executive Order -- Creating a National Strategic Computing Initiative, July 2015 Objectives: 1. Accelerate delivery of a capable exascale computing system delivering approximately 100 times the performance of current systems across a range of applications. 2. Increase coherence between the technology base used for modeling and simulation and that used for data analytic computing. 3. Establish a viable path forward for future HPC systems even after the limits of current semiconductor technology are reached (the "post-Moore's Law era"). 4. Increase the capacity and capability of an enduring national HPC ecosystem. 5. Develop an enduring public-private collaboration to ensure that the benefits of the research and development advances are shared among government, industrial, and academic sectors. National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 6 High-End Computing Capability (HECC) NASA’s Premier Supercomputer Center Resources have broad mission impact across all of NASA’s Mission Directorates Over 500 science & engineering projects with more than 1,500 users (hosted by the NASA Advanced Supercomputing (NAS) Division at Ames) • Pleiades – 7.25 PF peak – Distributed memory cluster SGI Altix ICE – 246K-core; 11472 nodes; 4 Xeon generations – #15 (#7 in US) on TOP500; #9 in HPCG list (06/2016) • Specialized Hardware – Endeavour: shared memory nodes – 1024 core 4 TB & 512 core 2 TB – GPGPU nodes: 64 nodes NVIDIA Tesla K40 2 bldgs, 4 floors – Xeon Phi: 20 many-integrated core nodes 26,600 sq. ft. – NVIDIA DGX-1: 8 Tesla Pascale GPUs for machine learning 6.25 MW elec. power • Storage: ~30 PB disk; ~500 PB tape capacity 2160 tons cooling • Networking: 10 Gb/s external peering National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 7 Integrated Spiral Support for MS&A Develop and deliver the most productive integrated supercomputing environment in the world, enabling NASA to extend technology, expand knowledge, and explore the universe NASA Mission Challenges Outcome: Dramatically enhanced understanding and Scientists and engineers plan computational insight, accelerated science and engineering, and analyses, selecting the best-suited codes to increased mission safety and performance address NASA’s complex mission challenges Performance Data Analysis Optimization and Visualization NAS visualization experts apply advanced data analysis and rendering techniques to help users explore and understand NAS software experts large, complex computational utilize tools to parallelize results and optimize codes, dramatically increasing simulation performance Computational Modeling, while decreasing turn-around time Simulation, & Analysis NAS support staff help users to productively utilize NASA’s supercomputing environment (hardware, software, networks, and National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 storage) to rapidly solve large computational problems Big Data Challenges for NASA Users Data Discovery – finding what data is available and where NASA supports • Indexing, federated metadata service and semantic reasoning enormous collections of Data management – transferring very large data sets from archives to big data sets: computational resources • Increased WAN bandwidth • Fault tolerant and resilient hardware/software infrastructure Observational Data Estimate 100+ active Tools/models/algorithms - developing analytics/analysis software at scale satellites producing 50PBs • Mechanisms for sharing software to reduce duplication per year Analysis workflow – increasing complexity of processing pipelines have multiple components requiring heterogeneous resources Model Data • Software for workflow description and management to tie all components together and NAS has 30 PBs of online facilitate re-use storage- MITGcm run produced > 3PBs Analysis/Analytics infrastructure – inadequacy of available resources • I/O infrastructure Experimental Data • Large memory spaces for in-core analysis • Support for the heterogeneous resources in an integrated environment: distributed Wind tunnel tests projected memory & shared memory systems, hadoop cluster, accelerators, FPGAs etc. to produce 100 TBs per test Data Dissemination– difficult to share knowledge across a wider community • Support for dissemination and sharing of code, data products, results, etc….. National Aeronautics and Space Administration BasedNASA on Big a HECCData Task survey Force,: NAS September Technical 2016 Report: NAS-2014-02.pdf 9 Big Data Challenges for NASA Users Data Discovery – finding what data is available and where NASA supports • Indexing, federated metadata service and semantic reasoning enormous collections of Data management – transferring very large data sets from archives to big data sets: computational resources Fun Fact: The term “Big• Data”Increased WANwas bandwidth first used by Michael Cox & David • Fault tolerant and resilient hardware/software infrastructure ObservationalEllsworth Data of the NAS Division at Ames in their paper: Estimate 100+ active Tools/models/algorithms - developing analytics/analysis software at scale “Visualizingsatellites producing flow 50PBs around an• Mechanismsairframe for” sharing Visualization software to reduce 97, duplication Phoenix AZ. per- yearBiggest data set consideredAnalysis workflow 7.5 GB – increasing; high -complexityend analysis of processing machines pipelines have had multiple components requiring heterogeneous resources Model Dataless than 1GB memory• Software for workflow description and management to tie all components together and NAS has 30 PBs of online facilitate re-use storage- MITGcm run produced > 3PBs Analysis/Analytics infrastructure – inadequacy of available resources •I/O infrastructure Experimental Data • Large memory spaces for in-core analysis • Support for the heterogeneous resources in an integrated environment: distributed Wind tunnel tests projected memory & shared memory systems, hadoop cluster, accelerators, FPGAs etc. to produce 100 TBs per test Data Dissemination– difficult to share knowledge across a wider community • Support for dissemination and sharing of code, data products, results, etc….. National Aeronautics and Space Administration BasedNASA on Big a HECCData Task survey Force,: NAS September Technical 2016 Report: NAS-2014-02.pdf 10 Merging HPC and Data Analysis @ NAS: Data Intensive Supercomputing Environment Collaboration / User Portals Cloud-based data servers, web servers Data Resources Large-scale Compute Analytics/Analysis Data Rces1 Platforms Pleiades Endeavour • Optimized for analytics/analysis Optimized for large-scale - Visualization engines • User & community Datasets simulation and analysis - Accelerators • Shared across platforms - FPGAs • Tools for data/workflow High-speed WAN connection management; machine learning External • Data Repositories Data Resources • Experimental sources(labs, wind tunnels, etc.) National Aeronautics and Space Administration NASA Big Data Task Force, September 2016 11 Big Data Related Projects @ NAS • Mining network flows for cyber-security risk assessment Applications • Tree cover classification for continental US • … • Machine learning algorithms using GPUs/Xeon Phis/FPGAs Analytics • Large scale analytics on distributed and shared memory architectures • Collaboration platform for scientists (NASA Earth eXchange - NEX) Support Software • Ontology-based data search environment for observational data • Data tagging for security and data discovery • Workflow management system with automated provenance capture • Optimization of Lustre calls to improve QOS for applications System Software • Support for caching using SSDs - Lustre - target metadata and/or user access - Analysis – as local disks or fast global “file” system (RDMA over IB) • SSDs for I/O optimization – linux-based block device software Hardware Infrastructure • Specialized clusters for data analytics § Xeon PHI,