HDF5 Overview.Pptx

The HDF Group
HDF5 Overview
Elena Pourmal
[email protected]
The HDF Group
10/17/15, ICALEPCS 2015
www.hdfgroup.org

Outline
• The HDF Group company
• Products and services
• Overview of HDF5
• What is coming in the HDF5 1.10.0 release?
• Future directions

THE HDF GROUP COMPANY

Champaign, Illinois, USA

The HDF Group
• Not-for-profit company (since 2006), formerly part of NCSA at the University of Illinois
• Offices in 5 states
• About 40 employees (more than 50% growth in the past 9 years)
  - Core software developers
  - Domain specialists
  - Documentation team
  - Technical support
• Mission-driven

The HDF Group Mission
To ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.

The HDF Group philosophy
• Committed to Open Source
  - HDF software is free
  - BSD-type license
• Community involvement
  - Testing
  - Patches
  - New features (e.g., CMake support)
• Serving a diverse user base
  - Remote sensing, HPC, non-destructive testing, medical records, scientific modeling, etc.

Revenue by Source
[Pie chart: 2014 revenue by source, with categories Light Sources, Earth science, Finance, General, NASA, NOAA, National Labs, HPC, Oil & gas, and Particle science]

Revenue by Project Type
[Pie chart: revenues by type of project, with categories Training and other outreach, Consulting, R&D, Development, Premium support, and Enterprise support]

PRODUCTS AND SERVICES

The HDF Group products
• Main product: HDF Technology Suite
  - For managing high-volume, complex, heterogeneous data
  - Flagship: HDF5 data store
  - Flexible and efficient storage and I/O
  - Portable
  - Highly customizable
  - Misc. tools
  - Specialized software and tools (e.g., JPSS)

HDF5 IN 5 MINUTES
Data challenges addressed by HDF5

HDF5 Technology Platform
• HDF5 Abstract Data Model
  - Defines the "building blocks" for data organization and specification
  - Files, Groups, Links, Datasets, Attributes, Datatypes, Dataspaces
• HDF5 Software
  - Tools
  - Language Interfaces (C, Fortran, C++, Java)
  - HDF5 Library
• HDF5 Binary File Format
  - Bit-level organization of an HDF5 file
  - Defined by the HDF5 File Format Specification
• HDF5 Ecosystem
  - Tools and services (h5py, MATLAB, IDL, OPeNDAP, etc.)
  - Communities (Earth sciences, medical imaging, modeling and visualization)
  - Community standards (NeXus, HDF-EOS5, H5Part, CGNS)
  - Institutional support and endorsement (NASA, NOAA, DOE)

Members of the HDF community

Success stories
• Petabytes of NASA remote sensing data in HDF4 and HDF5 file formats
• New NASA/JPSS missions chose the HDF5 format for data archiving
• Need to organize complex collections of data
• Long-term data preservation
• Efficient, scalable storage and access

lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

Success story: Trillion Particle Simulation
• Plasma physics simulation on the NERSC Cray XE6
• The simulation ran on 120,000 cores using 80% of the computing resources, 90% of the available memory, and 50% of the Lustre scratch file system
• It wrote 10 one-trillion-particle dumps of 30-42 TB each in HDF5 files, sustaining ~27 GB/sec; 350 TB in HDF5 in total

The HDF Group services
• Helpdesk and mailing lists
  - [email protected]
  - [email protected]
  - Open to all users of HDF
• HDF5 Documentation: https://www.hdfgroup.org/HDF5/doc/index.html
• HDF Examples (C, Fortran, C++, Java, Python, MATLAB): https://www.hdfgroup.org/HDF5/examples/
The HDF Group services
• Standard support
  - Assistance in general areas of HDF usage
• Premium support
  - Access to our consulting and training resources
  - Limited consulting hours are included
• Enterprise support
  - Help with developing common strategies for managing HDF data within an organization
  - Organization shares consulting/troubleshooting services
• Training
• Consulting, custom development and support

HDF5 1.10.0 RELEASE
New upcoming features

PERSISTENT FILE FREE SPACE TRACKING
Reusing free space in a file

Unused space in HDF5 file
• The HDF5 library currently tracks free space only while the file is open
  - Space from deleted objects
  - Space from resized compressed chunks
• Free space in the file is "lost" after the file is closed
  - h5repack is used to remove "holes" in the file
• New function H5Pset_file_space
  - Sets a property to track free space in the file so that it can be reused when the file is reopened
  - Allows fine-tuning of space tracking

SCALABLE CHUNK INDEXING
Improving performance and saving space

Optimizing chunked storage and performance
• HDF5 can add more data to existing datasets (data arrays)
  - Special storage mechanism: chunked storage
• B-trees are used to index chunks in the file
  - O(log n) lookup time
• HDF5 1.10.0 takes advantage of the access pattern and properties of the datasets
  - O(1) lookup time
  - File space savings when storing HDF5 metadata

Optimizing chunked storage and performance
• The B-tree implementation was reworked to use less space in the file
  - Used for datasets with more than one unlimited dimension
• New indexing structures were introduced to achieve O(1) performance and storage savings in special cases
Optimizing chunked storage and performance
• Examples of O(1) lookup access:
  - Fixed-size chunked dataset with no compression filters: algorithmic lookup
  - Fixed-size chunked dataset with compression filters: array to index chunks
  - Fixed-size dataset stored in one chunk (i.e., we now allow compression for contiguous datasets): no index
  - Dataset with one unlimited dimension: extensible array to index chunks

CONCURRENCY: SINGLE-WRITER/MULTIPLE-READER (SWMR)

Concurrent Access to Data
New data elements are added by a writer to a dataset in the HDF5 file, where they can be read by a reader with no IPC necessary.

VIRTUAL DATASET (VDS)
Managing data stored across HDF5 files

VDS use case with NPP satellite data
• 4 granules in 9 GMODO-SVM07… files
• Visualization with IDV

VDS use case with NPP satellite data
• One virtual dataset with 36 granules stored in one file
• Visualization with IDV

VDS use case: Percival detector
[Diagram: four writers write images to Dataset A, B, C and D in a.h5, b.h5, c.h5 and d.h5; a reader accesses VDS.h5, whose virtual dataset presents the series of images with A, B, C and D interleaved in time (t1, t2, t3, t4, ..., t1+4k, t3+4k)]

VDS: Conceptual View

METADATA CACHE IMAGE
Performance boost when opening and closing HDF5 files

Problem: Metadata Cache Image
• HDF5 metadata is typically small and scattered throughout the file.
• The resulting many small I/Os are a major problem for parallel file systems.
• The metadata cache minimizes this during normal operation, but the cache must still be populated on file open and flushed on file close.
• This is a problem if files are opened and closed often.
Solution: Metadata Cache Image
• Store the contents of the metadata cache in a single block at file close, and then populate the cache with the stored entries on file open.
• If the access pattern is similar across close and reopen, this should save a significant number of small I/O operations.
• This solution is implemented in the metadata cache image feature.

Metadata Cache Image
• To enable, set the cache image FAPL property on file create or open:

    /* Ask the library to generate a cache image when the file is closed */
    H5AC_cache_image_config_t cache_image_config =
        {H5AC__CURR_CACHE_IMAGE_CONFIG_VERSION, TRUE, 0};

    hid_t fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    /* Request the latest file format */
    H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    /* Attach the cache image configuration to the file access property list */
    H5Pset_mdc_image_config(fapl_id, &cache_image_config);

• Then create or open the file as usual.

Metadata Cache Image
• The metadata cache image is read and deleted automatically on file open.
• The cache image FAPL property must be set again if a new cache image is desired on file close.
• Earlier versions of HDF5 that don't understand the cache image will refuse to open the file.
• A lightweight utility can remove the caching information, making the file compatible with HDF5 1.8.
• A prototype implementation showed an order-of-magnitude speedup on parallel systems.

DATA AGGREGATION AND PAGE BUFFERING
Performance improvements

Page buffering / Data aggregation
• Aggregate and align metadata and small data; perform I/O in aligned pages

Data and Metadata Aggregators
The new aggregators pack small raw data and metadata allocations into aligned