Enhancing the Performance and Scalability of Third-Party Libraries in the Sandia I/O Software Stack


HDF5 User Group Meeting 2020, October 14-16, 2020
Gregory Sjaardema, Engineering Sciences Center, Sandia National Laboratories, Albuquerque NM

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. Unclassified, Unlimited Release. SAND2020-10955 C

Themes: Scalability, Capacity, Complexity

Exodus Background
◦ Developed in 1992
◦ Replaced old Fortran unformatted read/write files with a device-independent, random-access database
◦ Based on NetCDF
◦ Single format used by all SNL structural and thermal finite element codes
◦ Nodes, homogeneous element blocks, node sets, side sets
◦ Transient data on nodes and elements
◦ SEACAS: a suite of mesh generation, preprocessing, postprocessing, visualization, and translation tools developed to generate and modify Exodus files
◦ Backward compatibility is essential

Exodus Evolution – Capacity
◦ 1992: CDF1, ~34 million nodes/elements (~NetCDF 2.3.x)
◦ 2004: CDF2, ~500 million nodes/elements (NetCDF 3.6.0); internal changes to the Exodus format to reduce dataset sizes
◦ 2008: NetCDF-4 (HDF5 based), 2.1 billion nodes/elements
◦ 2012: 64-bit integer support (HDF5 enabled), multi-billion nodes/elements; converts integer size on the fly for ids, indices, and maps

Exodus Evolution – Capability
◦ Named blocks, sets, attributes, and maps
◦ Transient variables on all blocks and sets
◦ "Unlimited" string size replaced the fixed 32-character limit
◦ Full topology support (Element -> Face -> Edge -> Node)
◦ Arbitrary polyhedral element support
◦ Compression (via HDF5)
◦ File groups (via HDF5)
◦ Assemblies, entity attributes, Discontinuous Galerkin
(A minimal netCDF-4 sketch of the compression and 64-bit settings follows this slide.)

Exodus Evolution – Parallel
Original workflow was file-per-processor:
◦ External tools to split to parallel and join from parallel
◦ Extension library "nemesis" provided the parallel data structures
◦ Each "processor" file is a valid Exodus file
Exodus becomes "parallel-aware":
◦ Nemesis embedded in Exodus (ne_ prefix changed to ex_)
◦ PnetCDF and HDF5 provide parallel I/O capabilities
◦ Can support auto-decomposition and auto-join

I/O Subsystem – IOSS Library
◦ Started as the I/O component of the Sierra project, 12/1999
◦ Provides a database-independent interface to applications (Exodus, CGNS, SAF, XDMF, ADIOS2, …)
◦ Also functions as a "pseudo-C++ API" for the Exodus library
◦ Supports advanced HPC capabilities: Kokkos data, burst buffer, Data Warehouse (FAODEL), embedded visualization
◦ Auto-decomposition
  ◦ Replaces legacy file-per-processor mode
  ◦ Uses either HDF5 or PnetCDF for parallel input
  ◦ Uses decomposition methods in Zoltan and ParMETIS
  ◦ Supports Exodus and CGNS (structured and unstructured)
◦ Auto-join (single file output) option
  ◦ Uses HDF5 or PnetCDF for parallel output
  ◦ Scalability issues... being addressed
◦ HDF5 is used by 3 of the supported database types
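The capacity and capability items above all ride on the NetCDF-4 (HDF5-based) storage layer. As a point of reference, here is a minimal sketch of how 64-bit integer data and HDF5-backed compression are requested through the plain netCDF C API; the file, dimension, and variable names are hypothetical, and this is not the Exodus library's own implementation.

    /* Illustrative sketch only: NC_NETCDF4 selects HDF5-based storage, which is
     * what enables 64-bit integer variables and per-variable compression. */
    #include <netcdf.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void check(int status, const char *what) {
        if (status != NC_NOERR) {
            fprintf(stderr, "%s: %s\n", what, nc_strerror(status));
            exit(EXIT_FAILURE);
        }
    }

    int main(void) {
        int ncid, dim_node, varid;

        check(nc_create("mesh_example.nc", NC_CLOBBER | NC_NETCDF4, &ncid), "nc_create");

        /* Node count; netCDF-4 removes the old CDF1/CDF2 size limits. */
        check(nc_def_dim(ncid, "num_nodes", 1000000, &dim_node), "nc_def_dim");

        /* 64-bit integer ids become possible with the HDF5-based format. */
        check(nc_def_var(ncid, "node_ids", NC_INT64, 1, &dim_node, &varid), "nc_def_var");

        /* "Compression (via HDF5)": enable shuffle + deflate on the variable. */
        check(nc_def_var_deflate(ncid, varid, 1, 1, 4), "nc_def_var_deflate");

        check(nc_enddef(ncid), "nc_enddef");
        check(nc_close(ncid), "nc_close");
        return 0;
    }

The Exodus library layers its mesh object model (blocks, sets, maps, transient variables) on top of exactly this kind of netCDF-4 file.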
Parallel Scalability – Exodus
2 billion elements, Sequoia, Lustre filesystem
◦ Large model being run on Sequoia; application reported "analysis is hanging"
◦ Ran the same mesh in the IOSS simulator with tracing enabled to determine what was happening
◦ Initial results looked promising…
[Chart: execution time (seconds, minutes to a week) vs. processor count (64 to 262,144); series "Execution - Old" and "Open - Old"]

"A supercomputer is a device for turning compute-bound problems into I/O-bound problems."

◦ Added option to use the (new at the time) HDF5 collective metadata option (see the sketch after this section)
◦ Additional performance improvements, parallel fixes, code cleanups
◦ Work closely with HDF5 and NetCDF developers
◦ Working with HDF5 on flush, truncate, and open scalability issues
[Chart: the same plot with "Execution - New" and "Open - New" series added]

Parallel Scalability – Exodus AutoJoin
134 million elements, rzuseq
Improving parallel scalability of auto-join (single file output):
◦ Patch NetCDF/HDF5 to avoid unneeded data access (PR 336)
◦ Code review of Ioss "autojoin" routines
Preliminary results look promising; more to do:
◦ Darshan profiling
◦ MPI profiling
◦ MPI-IO tuning
◦ Filesystem tuning
Working with HDF5 on flush, truncate, and close scalability issues.
[Charts: Compose vs. FPP execution time at 32-512 processors; compose timing (Fill/NoFill redefine and elapsed) from 2 to 32,768 processors]

Input/Output Performance Improvements – CGNS Structured Mesh
[Chart: execution time (seconds) vs. MPI ranks (32 to 32,768); series: Baseline, Add Metadata BCast, Improved N->1, File-per-Processor (FPP), MPI_Bcast(), Compact Storage, Serial Reference]

Input/Output Performance Improvements – Large Model, CGNS Structured Mesh
[Chart: execution time (seconds) vs. MPI ranks (32 to 16,384); series: Baseline, Improved N->1, Improved + Striped, File-per-Processor (FPP), Serial Reference]
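The "HDF5 collective metadata option" mentioned above refers to the collective metadata read/write properties HDF5 added in release 1.10. As a rough sketch, assuming a parallel HDF5 build, this is what enabling them on a file access property list looks like at the HDF5 C API level; it is not the Exodus or NetCDF internal code.

    /* Sketch only: enabling HDF5 collective metadata operations (HDF5 >= 1.10)
     * on a parallel file access property list. */
    #include <hdf5.h>
    #include <mpi.h>

    hid_t open_with_collective_metadata(const char *filename, MPI_Comm comm)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* Use the MPI-IO virtual file driver for parallel access. */
        H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL);

        /* Collective metadata reads: ranks cooperate on metadata I/O instead of
         * each issuing its own stream of small independent reads. */
        H5Pset_all_coll_metadata_ops(fapl, 1);

        /* Collective metadata writes at flush/close time. */
        H5Pset_coll_metadata_write(fapl, 1);

        hid_t file = H5Fopen(filename, H5F_ACC_RDONLY, fapl);
        H5Pclose(fapl);
        return file;
    }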
Improvements:
• 2 to 3 orders of magnitude
• Hours to seconds…
• Competitive with FPP
• Additional speedups (stripe, …)

Recipe for Success…
• Test on representative models
• Use simulator for faster trials
• Instrument / trace / profile
• Collaborate with TPL developers
• Test hypotheses
• Retest
• Verify on different models
• Verify in actual application
• Productionize / stabilize
• Repeat…

Exodus Speedups
Exodus format file in NetCDF-4 (HDF5) format:
• 64-bit integers, 64-bit doubles
• 48.3 GByte file
• 390 million finite element nodes
• 49 million 27-node hexahedral elements
• 1 time step with 8 variables per node
Lustre filesystem:
• "nscratch", 11 PByte
• Input file stripe count = 1
• Output file stripe count = 36
Run on "serrano" CTS-1 system:
• 2.1 GHz processors, dual sockets with 18 cores each
• Intel Broadwell E5-2695 v4, ~1.2 TFLOPs per node
• 128 GB RAM per node (3.55 GB per core)
• Intel Omni-Path high-speed interconnect
[Chart: execution time (seconds) vs. MPI ranks (32 to 32,768); series: Baseline, Compact, FPP]

Significant speedup realized:
• NetCDF PR 1570 (HDF5 compact storage); merged, released in netCDF 4.7.4
• Uses an existing API function
• Still slower than FPP by >2X; more profiling needed

CGNS Many-Zone Optimization
Creating a CGNS file with "many" zones exposes N^2 behavior:
◦ Metadata only being written
◦ Serial or parallel
Profiling showed H5Literate_by_name was a major contributor, so the H5Literate_by_name call was changed to H5Lexists; the code calling this function no longer shows up as a hot spot (see the sketch after this section). There is still N^2 behavior in another part of the code due to string compares; looking at hashing the strings to avoid this cost. With both of these fixes, behavior is near-linear out to a million zones.
[Chart: execution time for zone-create testing vs. zone count (0 to 1,000,000); series: Baseline, H5Literate_by_name Change, Remove strcmp cost]
Input by Scot Breitenfeld and Mickael Philit.

Conclusions
Scalability, Complexity, and Capacity are very important to HPC applications.
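The H5Literate_by_name to H5Lexists change described above replaces a scan over all of a group's links with a single link-existence lookup, which is what removes the per-zone cost. A minimal sketch, assuming a hypothetical CGNS-like layout in which each zone is a link under a "/Base" group; this is not the actual CGNS library code.

    /* Sketch only: checking whether a zone already exists.  The old code
     * iterated over every link under the parent group with H5Literate_by_name
     * until a callback matched the requested name; doing that once per zone
     * while creating N zones gives the N^2 behavior described above.  A direct
     * H5Lexists call is a single lookup whose cost does not grow with the
     * number of zones already in the file. */
    #include <hdf5.h>
    #include <stdio.h>

    int zone_exists(hid_t file, const char *zone_name)
    {
        char path[512];

        /* Hypothetical layout: zones are links under "/Base". */
        snprintf(path, sizeof(path), "/Base/%s", zone_name);

        return H5Lexists(file, path, H5P_DEFAULT) > 0;
    }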
Recommended publications
  • NetCDF-4: A New Data Model, Programming Interface, and Format Using HDF5
    NetCDF-4: A New Data Model, Programming Interface, and Format Using HDF5. Russ Rew, Ed Hartnett, John Caron (UCAR Unidata Program Center); Mike Folk, Robert McGrath, Quincey Koziol (NCSA and The HDF Group, Inc.). Final Project Review, August 9, 2005. Motivation: Why is this area of work important? "While the commercial world has standardized on the relational data model and SQL, no single standard or tool has critical mass in the scientific community. There are many parallel and competing efforts to build these tool suites – at least one per discipline. Data interchange outside each group is problematic. In the next decade, as data interchange among scientific disciplines becomes increasingly important, a common HDF-like format and package for all the sciences will likely emerge." – Jim Gray (Distinguished Engineer at Microsoft, 1998 Turing Award winner), "Scientific Data Management in the Coming Decade," Jim Gray, David T. Liu, Maria A. Nieto-Santisteban, Alexander S. Szalay, Gerd Heber, David DeWitt, Cyberinfrastructure Technology Watch Quarterly, Volume 1, Number 2, February 2005. Preservation of scientific data: "… the ephemeral nature of both data formats and storage media threatens our very ability to maintain scientific, legal, and cultural continuity, not on the scale of centuries, but considering the unrelenting pace of technological change, from one decade to the next. … And that's true not just for the obvious items like images, documents, and audio files, but also for scientific images, … and simulations. In the scientific research community, standards are emerging here and there – HDF (Hierarchical Data Format), NetCDF (network Common Data Form), FITS (Flexible Image Transport System) – but much work remains to be done to define a common cyberinfrastructure." – MacKenzie Smith, Associate Director for Technology at the MIT Libraries, Project director at MIT for DSpace, a groundbreaking …
  • Using Netcdf and HDF in Arcgis
    Using netCDF and HDF in ArcGIS. Nawajish Noman, Dan Zimble, Kevin Sigwart. Outline: netCDF and HDF in ArcGIS; visualization and analysis; sharing; customization using Python; demo; future directions. Scientific data and Esri: direct support for netCDF and HDF; OPeNDAP/THREDDS, a framework for scientific data networking, integrated use by our customers; users of Esri technology include the National Climate Data Center, National Weather Service, National Center for Atmospheric Research, U.S. Navy (NAVO), Air Force Weather, USGS, Australian Navy, Australian Bureau of Meteorology, and UK Met Office. NetCDF support in ArcGIS: ArcGIS reads/writes netCDF since version 9.2; an array-based data structure for storing multidimensional data; N-dimensional coordinate systems (X, Y, Z, time, and other dimensions); variables, with support for multiple variables (temperature, humidity, pressure, salinity, etc.); geometry, implicit or explicit (regular grid (implicit), irregular grid, points). Reading netCDF data in ArcGIS: netCDF data is accessed as raster, feature, or table; direct read; exports GIS data to netCDF. CF Convention: the Climate and Forecast (CF) convention (http://cf-pcmdi.llnl.gov/) was initially developed for climate and forecast data (atmosphere, surface, and ocean model-generated data) and also for observational datasets; the CF conventions generalize and extend the COARDS (Cooperative Ocean/Atmosphere Research Data Service) convention; CF is now the most widely used convention for geospatial netCDF data and has the best coordinate system handling. NetCDF and coordinate systems: Geographic Coordinate Systems (GCS) use X dimension units degrees_east and Y dimension units degrees_north; Projected Coordinate Systems (PCS) use X dimension standard_name projection_x_coordinate and Y dimension standard_name projection_y_coordinate; the variable has a grid_mapping attribute.
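    The CF coordinate handling described above comes down to a few attributes on the coordinate and data variables. A minimal sketch with the netCDF C API, assuming a hypothetical lat/lon temperature grid; the variable names and metadata-only focus are illustrative, not taken from the slides.

        /* Sketch only: CF-style coordinate metadata (units, grid_mapping). */
        #include <netcdf.h>
        #include <string.h>

        int define_cf_grid(int ncid)
        {
            int dim_lat, dim_lon, var_lat, var_lon, var_crs, var_temp, dims[2];

            nc_def_dim(ncid, "lat", 180, &dim_lat);
            nc_def_dim(ncid, "lon", 360, &dim_lon);

            /* Coordinate variables carry CF units so tools can recognize a
             * geographic coordinate system. */
            nc_def_var(ncid, "lat", NC_FLOAT, 1, &dim_lat, &var_lat);
            nc_put_att_text(ncid, var_lat, "units", strlen("degrees_north"), "degrees_north");
            nc_def_var(ncid, "lon", NC_FLOAT, 1, &dim_lon, &var_lon);
            nc_put_att_text(ncid, var_lon, "units", strlen("degrees_east"), "degrees_east");

            /* A grid-mapping variable describes the spatial reference ... */
            nc_def_var(ncid, "crs", NC_INT, 0, NULL, &var_crs);
            nc_put_att_text(ncid, var_crs, "grid_mapping_name",
                            strlen("latitude_longitude"), "latitude_longitude");

            /* ... and the data variable points at it via grid_mapping. */
            dims[0] = dim_lat;
            dims[1] = dim_lon;
            nc_def_var(ncid, "temperature", NC_FLOAT, 2, dims, &var_temp);
            nc_put_att_text(ncid, var_temp, "grid_mapping", strlen("crs"), "crs");

            return NC_NOERR;
        }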
  • A COMMON DATA MODEL APPROACH to NETCDF and GRIB DATA HARMONISATION Alessandro Amici, B-Open, Rome @Alexamici
    A Common Data Model Approach to NetCDF and GRIB Data Harmonisation. Alessandro Amici, B-Open, Rome (@alexamici, http://bopen.eu). Workshop on developing Python frameworks for earth system sciences, 2017-11-28, ECMWF, Reading. Motivation, forecast data and tools: ECMWF weather forecasts, re-analyses, satellite and in-situ observations; N-dimensional gridded data in GRIB; archive: Meteorological Archival and Retrieval System (MARS); a lot of tools: Metview, Magics, ecCodes... Motivation, climate data: Copernicus Climate Change Service (C3S / ECMWF); re-analyses, seasonal forecasts, climate projections, satellite and in-situ observations; N-dimensional gridded data in many dialects of NetCDF and GRIB; archive: Climate Data Store (CDS). Harmonisation strategic choices: Metview Python Framework and CDS Toolbox projects; the Python 3 programming language and its scientific ecosystems; xarray data structures; the NetCDF data model (variables and coordinates, support for arbitrary metadata); CF Conventions support on I/O; label-matching broadcast rules on coordinates; parallelized and out-of-core computations with dask. ECMWF NetCDF dialect:
    >>> import xarray as xr
    >>> ta_era5 = xr.open_dataset('ERA5-t-2016-06.nc', chunks={}).t
    >>> ta_era5
    <xarray.DataArray 't' (time: 60, level: 3, latitude: 241, longitude: 480)>
    dask.array<open_dataset-..., shape=(60, 3, 241, 480), dtype=float64, chunksize=(60, 3, ...
    Coordinates:
      * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
      * latitude   (latitude) float32 90.0 89.25 88.5 87.75 87.0 86.25 85.5 ...
      * level      (level) int32 250 500 850
      * time       (time) datetime64[ns] 2017-06-01 2017-06-01T12:00:00 ..
  • MATLAB Tutorial - Input and Output (I/O)
    MATLAB Tutorial - Input and Output (I/O). Mathieu Dever. NOTE: at any point, typing "help functionname" in the Command Window will give you a description and examples for the specified function. 1. Importing data. There are many different ways to import data into MATLAB, mostly depending on the format of the datafile. Below are a few tips on how to import the most common data formats into MATLAB. MATLAB format (*.mat files): this is the easiest to import, as it is already in MATLAB format; the function load(filename) will do the trick. ASCII format (*.txt, *.csv, etc.): an easy strategy is to use MATLAB's GUI for data import. To do that, right-click on the datafile in the "Current Folder" window in MATLAB and select "Import data ...". The GUI gives you a chance to select the appropriate delimiter (space, tab, comma, etc.), the range of data you want to extract, how to deal with missing values, etc. Once you have selected the appropriate options, you can either import the data, or you can generate a script that imports the data. This is very useful in showing the low-level code that takes care of reading the data. You will notice that textscan is the key function that reads the data into MATLAB. Higher-level functions exist for the different data types: csvread, xlsread, etc. As a rule of thumb, it is preferable to use lower-level functions as they are "simpler" to understand (i.e., no bells and whistles). "Black-box" functions are dangerous! NetCDF format (*.nc, *.cdf): MATLAB comes with a netCDF library that includes all the functions necessary to read netCDF files.
  • R Programming for Climate Data Analysis and Visualization: Computing and Plotting for NOAA Data Applications
    Citation of this work: Shen, S.S.P., 2017: R Programming for Climate Data Analysis and Visualization: Computing and plotting for NOAA data applications. The first revised edition. San Diego State University, San Diego, USA, 152pp. Samuel S.P. Shen, San Diego State University, and Scripps Institution of Oceanography, University of California, San Diego, USA. Contents: Preface; Acknowledgements; 1 Basics of R Programming (1.1 Download and install R and R-Studio; 1.2 R Tutorial: R as a smart calculator, define a sequence in R, define a function in R, plot with R, symbolic calculations by R, vectors and matrices, simple statistics by R; 1.3 Online Tutorials: YouTube tutorial for true beginners, YouTube tutorial for some basic statistical summaries, YouTube tutorial on input data by reading a csv file into R; References; Exercises); 2 R Analysis of Incomplete Climate Data (2.1 The missing data problem; 2.2 Read NOAAGlobalTemp and form the space-time data matrix: read the downloaded data, plot the temperature data map of a given month, extract the data for a specified region, extract data from only one grid box; 2.3 Spatial averages and their trends: compute and plot the global area-weighted average of monthly data, percent coverage of the NOAAGlobalTemp, difference from the …
  • 4.04 Netcdf.Pptx
    Network Common Data Form (NetCDF): An Introduction. NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The purpose of NetCDF: the purpose of the Network Common Data Form (netCDF) interface is to allow you to create, access, and share array-oriented data in a form that is self-describing and portable. Self-describing means that a dataset includes information defining the data it contains. Portable means that the data in a dataset is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers. The netCDF software includes C, Fortran 77, Fortran 90, and C++ interfaces for accessing netCDF data. These libraries are available for many common computing platforms. NetCDF features: Self-describing: a netCDF file may include metadata as well as data: names of variables, data locations in time and space, units of measure, and other useful information. Portable: data written on one platform can be read on other platforms. Direct-access: a small subset of a large dataset may be accessed efficiently, without first reading through all the preceding data. Appendable: data may be efficiently appended to a netCDF file without copying the dataset or redefining its structure. Networkable: the netCDF library provides client access to structured data on remote servers through OPeNDAP protocols. Extensible: adding new dimensions, variables, or attributes to netCDF files does not require changes to existing programs that read the files. Sharable: one writer and multiple readers may simultaneously access the same netCDF file.
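    To make the "self-describing" and "direct-access" properties above concrete, here is a small sketch with the netCDF C API that inspects a variable's metadata and reads only a small hyperslab of it; the file name, variable name, and 3-D layout are assumptions for illustration, and error checking is omitted.

        /* Sketch only: metadata inquiry plus subset (hyperslab) read. */
        #include <netcdf.h>
        #include <stdio.h>

        int main(void)
        {
            int ncid, varid, ndims;
            char units[256] = "";
            size_t start[3] = {100, 0, 0};  /* skip the first 100 records */
            size_t count[3] = {1, 4, 4};    /* read one record, a 4x4 region */
            double block[16];

            nc_open("ocean_state.nc", NC_NOWRITE, &ncid);

            /* Self-describing: the file itself tells us about "temperature". */
            nc_inq_varid(ncid, "temperature", &varid);
            nc_inq_varndims(ncid, varid, &ndims);
            nc_get_att_text(ncid, varid, "units", units);
            printf("temperature: %d dims, units '%s'\n", ndims, units);

            /* Direct access: only the requested subset is read
             * (assumes a 3-D time/y/x variable). */
            nc_get_vara_double(ncid, varid, start, count, block);

            nc_close(ncid);
            return 0;
        }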
  • Parallel Netcdf
    COMP 705: Advanced Parallel Computing. HW 6: Accessing XSEDE resources; parallel NetCDF. Mary Thomas, Department of Computer Science, Computational Science Research Center (CSRC), San Diego State University (SDSU). Due: 11/01/17. Updated: 11/06/17. Table of Contents: 1 Problem 6a: Accessing comet.sdsc.edu; 2 Problem 6b: Using Parallel NetCDF Data Storage; 3 General Instructions for all Problems (Configuring the Problem Size and Processor Distribution; Analysis; Use TAU to Analyze Code Performance; Matlab code to read NetCDF Data). Problem 6a: Accessing comet.sdsc.edu. Use your parallel MPI I/O (or NetCDF) code to show you can access and run jobs on comet. This includes: logging on, setting your environment, and running batch script jobs. Problem 6b: Using Parallel NetCDF Data Storage. Note: this problem is no longer required for HW4 (running on tuckoo). Modify your code to save temperature data at different time slices into a single NetCDF file. You will need to define your own metadata and arrays to store; examples include: data stored in structs, temperature, time data; create names for data and scalings (min/max temperature, etc.). Visualize 3 or more time slices using software that can read your netCDF file.
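    A rough sketch of the Problem 6b task (each rank contributing its part of a temperature time slice to a single shared file) using the PnetCDF C API; the dimension names, sizes, and data decomposition are made up for illustration, and error checking is omitted.

        /* Sketch only: collective write of one time slice into a shared file. */
        #include <pnetcdf.h>
        #include <mpi.h>

        void write_slice(MPI_Comm comm, int rank, int nprocs,
                         const double *local_temp, MPI_Offset nx_local, MPI_Offset ny)
        {
            int ncid, dim_t, dim_x, dim_y, varid, dims[3];

            ncmpi_create(comm, "temperature.nc", NC_CLOBBER | NC_64BIT_DATA,
                         MPI_INFO_NULL, &ncid);

            ncmpi_def_dim(ncid, "time", NC_UNLIMITED, &dim_t);
            ncmpi_def_dim(ncid, "x", nx_local * nprocs, &dim_x);
            ncmpi_def_dim(ncid, "y", ny, &dim_y);
            dims[0] = dim_t; dims[1] = dim_x; dims[2] = dim_y;
            ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 3, dims, &varid);
            ncmpi_enddef(ncid);

            /* Each rank writes its own x-slab of time slice 0, collectively. */
            MPI_Offset start[3] = {0, rank * nx_local, 0};
            MPI_Offset count[3] = {1, nx_local, ny};
            ncmpi_put_vara_double_all(ncid, varid, start, count, local_temp);

            ncmpi_close(ncid);
        }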
  • They Have Very Good Docs At
    Intro to the Julia programming language. Brendan O'Connor, CMU, Dec 2013. They have very good docs at: http://julialang.org/. I'm borrowing some slides from: http://julialang.org/blog/2013/03/julia-tutorial-MIT/. Julia: a relatively new, open-source numeric programming language that's both convenient and fast. Version 0.2; still in flux, especially libraries, but the basics are very usable. Lots of development momentum. Why Julia? Dynamic languages are extremely popular for numerical work (Matlab, R, NumPy/SciPy, Mathematica, etc.): very simple to learn and easy to do research in. However, all have a "split language" approach: a high-level dynamic language for scripting low-level operations, with C/C++/Fortran for implementing fast low-level operations. Libraries in C mean no productivity boost for library writers; forced vectorization, when sometimes a scalar loop is just better. "Gang of Forty": Matlab, Maple, Mathematica, SciPy, SciLab, IDL, R, Octave, S-PLUS, SAS, J, APL, Maxima, Mathcad, Axiom, Sage, Lush, Ch, LabView, O-Matrix, PV-WAVE, Igor Pro, OriginLab, FreeMat, Yorick, GAUSS, MuPad, Genius, SciRuby, Ox, Stata, JLab, Magma, Euler, Rlab, Speakeasy, GDL, Nickle, gretl, ana, Torch7. Numeric programming environments, core properties: C/C++/Fortran/Java are fast but not dynamic or math-y; Matlab, NumPy/SciPy, and R are dynamic and math-y but not fast. Older table: http://brenocon.com/blog/2009/02/comparison-of-data-analysis-packages-r-matlab-scipy-excel-sas-spss-stata/. Dynamic vs. fast is the usual tradeoff; language quality is more subjective. (Slides from Bezanson, Karpinski, Shah, Edelman, 2012-2013.)
  • Introduction to Using the Netcdf Data Format with Fortran 90 I. Introduction
    Introduction to using the netCDF data format with Fortran 90. Michael Thorne ([email protected]). Last updated: July 20, 2010. I. Introduction. NetCDF: Network Common Data Form. NetCDF is an array-based data structure for storing multidimensional data. A netCDF file is written with an ASCII header and stores the data in a binary format. The space saved by writing binary files is an obvious advantage, particularly because one does not need to worry about the byte order: any byte-swapping is automatically handled by the netCDF libraries, and a netCDF binary file can thus be read on any platform. Some features associated with using the netCDF data format are as follows. Coordinate systems: support for N-dimensional coordinate systems (X-coordinate (e.g., lat), Y-coordinate (e.g., lon), Z-coordinate (e.g., elevation), time dimension, other dimensions). Variables: support for multiple variables (e.g., S-wave velocity, P-wave velocity, density, stress components…). Geometry: support for a variety of grid types, implicit or explicit (regular grid (implicit), irregular grid, points). Self-describing: a dataset can include information defining the data it contains, such as units (e.g., km, m/sec, gm/cm3, …) and comments (e.g., titles, conventions used, names of variables (e.g., P-wave velocity), names of coordinates (e.g., km/sec), …). Full documentation on the data format can be found at: http://www.unidata.ucar.edu/software/netcdf/ (netCDF webpage); http://www.unidata.ucar.edu/software/netcdf/docs/ (full netCDF documentation); http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-f90/ (F90 interface guide). Naming convention: netCDF files are generally named with the .nc extension.
  • Netcdf User's Guide for Fortran 90
    NetCDF User's Guide for Fortran 90: An Access Interface for Self-Describing, Portable Data. Version 3.5, March 2002. Russ Rew, Unidata Program Center, and Robert Pincus, NOAA/CIRES Climate Diagnostics Center. Copyright © 2002 University Corporation for Atmospheric Research, Boulder, Colorado. Permission is granted to make and distribute verbatim copies of this manual provided that the copyright notice and these paragraphs are preserved on all copies. The software and any accompanying written materials are provided "as is" without warranty of any kind. UCAR expressly disclaims all warranties of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. The Unidata Program Center is managed by the University Corporation for Atmospheric Research and sponsored by the National Science Foundation. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Mention of any commercial company or product in this document does not constitute an endorsement by the Unidata Program Center. Unidata does not authorize any use of information from this publication for advertising or publicity purposes. Contents: 1 Introduction; 1.1 The NetCDF Interface; 1.2 NetCDF Is Not a Database Management System; 1.3 File Format; 1.4 What about Performance?; 1.5 Is NetCDF a Good Archive Format?; 1.6 Creating Self-Describing Data conforming to Conventions; 1.7 Background and Evolution of the NetCDF Interface …
  • Efficient Netcdf Processing for Big Datasets Singh, R.M., Yu
    22nd International Congress on Modelling and Simulation, Hobart, Tasmania, Australia, 3 to 8 December 2017, mssanz.org.au/modsim2017. Efficient NetCDF processing for big datasets. R.M. Singh, J. Yu and G. Podger, CSIRO Land and Water Flagship, GPO Box 1666, Canberra ACT 2601. Email: [email protected]. Abstract: NetCDF (Network Common Data Form) is a data format commonly used to store scientific array-oriented data. A number of software tools exist that can be used to process and view NetCDF data. In some cases, though, the existing tools cannot do the processing required and thus we have to write code for some of the data processing operations. One such case is when we have to merge two NetCDF files. Although the core library for read-write access to NetCDF files is written in C, interfaces to the C library are available in a number of languages including Python, C#, Perl, R and others. Python with the Numpy package is widely used in scientific computing, where the Numpy package provides fast and sophisticated handling of arrays in Python. NetCDF being an array-oriented data format, the Python/Numpy combination is a good candidate for NetCDF processing. But in terms of performance, Python is not as fast as C. So if we are looping over the NetCDF data, as is the case when we merge two files, then as the dimension size of the arrays increases, so does the computing time. And if we have to merge thousands of such files, then we have to look for opportunities to speed up the process.
  • Parallel Netcdf: a Scientific High-Performance I/O Interface
    Parallel netCDF: A Scientific High-Performance I/O Interface. Jianwei Li, Wei-keng Liao, Alok Choudhary, ECE Department, Northwestern University ({jianwei, wkliao, choudhar}@ece.northwestern.edu); Robert Ross, Rajeev Thakur, William Gropp, Rob Latham, MCS Division, Argonne National Laboratory ({rross, thakur, gropp, robl}@mcs.anl.gov). Abstract: Dataset storage, exchange, and access play a critical role in scientific applications. For such purposes netCDF serves as a portable and efficient file format and programming interface, which is popular in numerous scientific application domains. However, the original interface does not provide an efficient mechanism for parallel data storage and access. In this work, we present a new parallel interface for writing and reading netCDF datasets. This interface is derived with minimal changes from the serial netCDF interface but defines semantics for parallel access and is tailored for high performance. The underlying parallel I/O is achieved through MPI-IO, allowing for dramatic performance gains through the use of collective I/O optimizations. [From the introduction:] … an easy-to-use application programming interface (API) for storing and retrieving netCDF files across multiple platforms. More and more scientific applications choose netCDF as their output file format. While these applications become computational and data intensive, they tend to be parallelized on high-performance computers. Hence, it is highly desirable to have an efficient parallel programming interface to the netCDF files. Unfortunately, the original design of the netCDF interface is proving inadequate for parallel applications because of its lack of a parallel access mechanism. In particular, there is no support for concurrently writing to a netCDF file. Therefore, parallel applications operating on netCDF files must serialize access. Traditionally, parallel applications write to netCDF files through one of the allocated processes, which easily becomes a …
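    The abstract's point that the parallel performance comes from MPI-IO collective operations can be illustrated with a bare MPI-IO sketch (not PnetCDF itself): each rank writes its block of a shared file with a collective call, which lets the MPI-IO layer aggregate the requests. The file name, data layout, and absence of error handling are illustrative simplifications.

        /* Sketch only: a collective MPI-IO write of per-rank contiguous blocks. */
        #include <mpi.h>

        void collective_write(MPI_Comm comm, const double *local, int nlocal)
        {
            int rank;
            MPI_File fh;

            MPI_Comm_rank(comm, &rank);
            MPI_File_open(comm, "output.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

            /* Each rank writes at its own offset; the _all (collective) form lets
             * the MPI-IO layer combine requests into large, well-formed I/O. */
            MPI_Offset offset = (MPI_Offset)rank * nlocal * (MPI_Offset)sizeof(double);
            MPI_File_write_at_all(fh, offset, local, nlocal, MPI_DOUBLE,
                                  MPI_STATUS_IGNORE);

            MPI_File_close(&fh);
        }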