4.04 Netcdf.Pptx
Network Common Data Form (NetCDF): An Introduction

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

The Purpose of NetCDF
● The purpose of the Network Common Data Form (netCDF) interface is to allow you to create, access, and share array-oriented data in a form that is self-describing and portable.
● Self-describing means that a dataset includes information defining the data it contains.
● Portable means that the data in a dataset is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
● The netCDF software includes C, Fortran 77, Fortran 90, and C++ interfaces for accessing netCDF data.
● These libraries are available for many common computing platforms.

NetCDF Features
● Self-describing: A netCDF file may include metadata as well as data: names of variables, data locations in time and space, units of measure, and other useful information.
● Portable: Data written on one platform can be read on other platforms.
● Direct-access: A small subset of a large dataset may be accessed efficiently, without first reading through all the preceding data.
● Appendable: Data may be efficiently appended to a netCDF file without copying the dataset or redefining its structure.
● Networkable: The netCDF library provides client access to structured data on remote servers through OPeNDAP protocols.
● Extensible: Adding new dimensions, variables, or attributes to netCDF files does not require changes to existing programs that read the files.
● Sharable: One writer and multiple readers may simultaneously access the same netCDF file. With Parallel netCDF, multiple writers may efficiently and concurrently write into the same netCDF file.
● Archivable: Access to all earlier forms of netCDF data will be supported by current and future versions of the software.
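
The "portable" property can be illustrated without the netCDF library. The following is a minimal sketch, assuming a fixed big-endian external representation for 32-bit integers (the byte order the classic format uses); the function names are hypothetical and this is not the library's own code.

```python
import struct
import sys


def encode_int32_external(value: int) -> bytes:
    """Encode a signed 32-bit integer in a fixed big-endian layout.

    The ">" prefix forces big-endian regardless of the host CPU, so the
    bytes written are identical on every platform.
    """
    return struct.pack(">i", value)


def decode_int32_external(raw: bytes) -> int:
    """Decode the fixed big-endian layout back into a Python int."""
    return struct.unpack(">i", raw)[0]


if __name__ == "__main__":
    raw = encode_int32_external(1)
    print("host byte order:", sys.byteorder)
    print("external bytes :", raw.hex())
    print("decoded value  :", decode_int32_external(raw))
```

Because both the writer and the reader agree on the external layout, neither needs to know the other's native byte order. The cost is a byte swap on little-endian hosts.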

Format description
● The netCDF libraries support 3 different binary formats for netCDF files:
  ● The classic format was used in the first netCDF release, and is still the default format for file creation.
  ● The 64-bit offset format was introduced in version 3.6.0, and it supports larger variable and file sizes.
  ● The netCDF-4/HDF5 format was introduced in version 4.0; it is the HDF5 data format, with some restrictions.
● All formats are "self-describing".
● Starting with version 4.0, the netCDF API allows the use of the HDF5 data format.
● NetCDF users can create HDF5 files with benefits not available with the classic netCDF format, such as much larger files and multiple unlimited dimensions.
● Full backward compatibility in accessing old netCDF files and using previous versions of the C and Fortran APIs is supported.

Classic dataset
● A netCDF classic or 64-bit offset dataset is stored as a single file comprising two parts:
  ● a header, containing all the information about dimensions, attributes, and variables except for the variable data;
  ● a data part, comprising fixed-size data (the data for variables that don't have an unlimited dimension) and variable-size data (the data for variables that have an unlimited dimension).
● Both the header and data parts are represented in a machine-independent form.
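
The three formats can be told apart by the first bytes of the file. The sketch below assumes the usual magic numbers: "CDF" followed by a version byte of 1 (classic) or 2 (64-bit offset), and the 8-byte HDF5 signature at offset 0 for netCDF-4 files. The function names are hypothetical, and an HDF5 signature alone does not prove that a file was written through the netCDF-4 interface.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature that starts an HDF5 file


def classify_magic(header: bytes) -> str:
    """Classify a file from its leading bytes (at least 8 are needed for HDF5)."""
    if header[:4] == b"CDF\x01":
        return "classic"
    if header[:4] == b"CDF\x02":
        return "64-bit offset"
    if header[:8] == HDF5_SIGNATURE:
        return "HDF5 (possibly netCDF-4)"
    return "unknown"


def detect_netcdf_format(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return classify_magic(f.read(8))
```

For example, `detect_netcdf_format("simple_xy.nc")` could be used to check which format a given writer program produced.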

Classic netCDF data model

Limitations in the classic netCDF data model
● Its simplicity makes it easy to understand, but limitations include:
  ● No real data structures, just multidimensional arrays and lists
  ● No nested structures, variable-length types, or ragged arrays
  ● Only one shared unlimited dimension for appending new data
  ● A flat name space for dimensions and variables
  ● Character arrays rather than strings
  ● A small set of numeric types
● In addition, the classic netCDF format has performance limitations for high performance computing with very large datasets:
  ● Large variables must be less than 4 GB (per record)
  ● No real compression supported, just scale/offset packing
  ● Changing a file schema (the logical structure of the file) may be very inefficient
  ● Efficient access sometimes requires data to be read in the same order as it was written
  ● Big-endian bias may hamper performance on little-endian platforms
  ● I/O is serial in Unidata netCDF-3 (but see the Argonne/Northwestern Parallel netCDF project)

NetCDF-4/HDF5 data format
● NetCDF-4 files are created with the HDF5 library; they are HDF5 files and can be read without the netCDF-4 interface.
● Note that modifying these files with HDF5 will almost certainly make them unreadable to netCDF-4.
● Groups in a netCDF-4 file correspond with HDF5 groups.
● Variables in netCDF correspond with identically named datasets in HDF5.
● Attributes correspond similarly.
● Since there is more metadata in a netCDF file than in an HDF5 file, special datasets are used to hold netCDF metadata.
  ● The _netcdf_dim_info dataset (in group _netCDF) contains the IDs of the shared dimensions and their lengths (0 for unlimited dimensions).
  ● The _netcdf_var_info dataset (in group _netCDF) holds an array of compound types which contain the variable ID and the associated dimension IDs.
● Backward compatibility with the classic format is preserved.
● Support for parallel I/O
● http://www.unidata.ucar.edu/netcdf/netcdf-4.
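
The scale/offset packing mentioned among the classic-format limitations is simple arithmetic: floating-point values are stored as small integers and reconstructed on read. The sketch below assumes the conventional attribute names scale_factor and add_offset and a 16-bit signed storage type; the functions themselves are hypothetical illustrations, not netCDF library calls.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def pack(values, scale_factor, add_offset):
    """Pack floats as integers: packed = round((value - add_offset) / scale_factor)."""
    packed = [round((v - add_offset) / scale_factor) for v in values]
    for p in packed:
        if not INT16_MIN <= p <= INT16_MAX:
            raise ValueError(f"packed value {p} does not fit in 16 bits")
    return packed


def unpack(packed, scale_factor, add_offset):
    """Reconstruct floats: value = packed * scale_factor + add_offset."""
    return [p * scale_factor + add_offset for p in packed]


if __name__ == "__main__":
    pressures = [900.0, 906.3, 971.0]  # hPa, illustrative values
    stored = pack(pressures, scale_factor=0.5, add_offset=900.0)
    print(stored)
    print(unpack(stored, scale_factor=0.5, add_offset=900.0))
```

Packing is lossy: each reconstructed value can differ from the original by up to half of scale_factor, which is why it is a space-saving convention and not real compression.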

Data Model in NetCDF-4/HDF5 Files

Fortran Code example
● http://www.unidata.ucar.edu/software/netcdf/examples/programs/
● Download simple_xy_rd.f90 and simple_xy_wr.f90

Extract from simple_xy_wr.f90:

  ! This is a very simple example which writes a 2D array of
  ! sample data. To handle this in netCDF we create two shared
  ! dimensions, "x" and "y", and a netCDF variable, called "data".

  ! Create the file, overwriting any existing file of the same name.
  ! A newly created file starts in define mode.
  call check( nf90_create(FILE_NAME, NF90_CLOBBER, ncid) )

  ! Define the dimensions. NetCDF will hand back an ID for each.
  call check( nf90_def_dim(ncid, "x", NX, x_dimid) )
  call check( nf90_def_dim(ncid, "y", NY, y_dimid) )
  dimids = (/ y_dimid, x_dimid /)

  ! Define the variable.
  call check( nf90_def_var(ncid, "data", NF90_INT, dimids, varid) )

  ! End define mode. This tells netCDF we are done defining metadata.
  call check( nf90_enddef(ncid) )

  ! Write the data to the file.
  call check( nf90_put_var(ncid, varid, data_out) )

  ! Close the file.
  call check( nf90_close(ncid) )

Code example (Using a Cray XC or XE), page 2
● How to compile:
  > module load cray-netcdf
  > ftn simple_xy_rd.f90 -o simple_xy_rd
  > ftn simple_xy_wr.f90 -o simple_xy_wr
● Executing (in interactive mode):
  > srun ./simple_xy_wr
  *** SUCCESS writing example file simple_xy.nc!
  > srun ./simple_xy_rd
  *** SUCCESS reading example file simple_xy.nc!

Code example (Using a Cray XC or XE), page 3

  > ncdump simple_xy.nc
  netcdf simple_xy {
  dimensions:
          x = 6 ;
          y = 12 ;
  variables:
          int data(x, y) ;
  data:

   data =
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
    12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
    36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
    48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59,
    60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71 ;
  }

Self-Describing Data
● The mere use of netCDF is not sufficient to make data "self-describing" and meaningful to both humans and machines.
● The names of variables and dimensions should be meaningful and conform to any relevant conventions.
● Dimensions should have corresponding coordinate variables where sensible.
● Attributes play a vital role in providing ancillary information. It is important to use all the relevant standard attributes, following the relevant conventions.
● A number of groups have defined their own additional conventions and styles for netCDF data. Descriptions of these conventions, as well as examples incorporating them, can be accessed from the netCDF Conventions site, http://www.unidata.ucar.edu/netcdf/conventions.html.
● These conventions should be used where suitable. Additional conventions are often needed for local use.

Second example: Adding attributes
● Using sfc_pres_temp_wr.f90 from the same webpage as before
Note: In the original source, each netCDF call is wrapped in call check(...). The wrapper is omitted here for ease of reading (see example 1).

  ! We will write surface temperature and pressure fields.
  character (len = *), parameter :: PRES_NAME = "pressure"
  character (len = *), parameter :: UNITS = "units"
  character (len = *), parameter :: PRES_UNITS = "hPa"

  nf90_def_var(ncid, PRES_NAME, NF90_REAL, dimids, pres_varid)

  ! Assign a units attribute to the pressure variable.
  ! The temperature variable is handled the same way.
  nf90_put_att(ncid, pres_varid, UNITS, PRES_UNITS)

  nf90_enddef(ncid)
  nf90_put_var(ncid, pres_varid, pres_out)

Second example (output)

  netcdf sfc_pres_temp {
  dimensions:
          latitude = 6 ;
          longitude = 12 ;
  variables:
          float latitude(latitude) ;
                  latitude:units = "degrees_north" ;
          float longitude(longitude) ;
                  longitude:units = "degrees_east" ;
          float pressure(latitude, longitude) ;
                  pressure:units = "hPa" ;
          float temperature(latitude, longitude) ;
                  temperature:units = "celsius" ;
  data:
   latitude = 25, 30, 35, 40, 45, 50 ;
   longitude = -125, -120, -115, …, -90, -85, -80, -75, -70 ;
   pressure = 900, 906, …, 971 ;
   temperature = 9, 10.5, …, 26.75 ;
  }

NetCDF-4 Performance Improvements
● Per-variable compression
  Compresses a variable with zlib.
  Transparent to the user.
● Per-variable chunking (multidimensional tiling)
  A chunk is a hyper-rectangle of any shape. When a dataset is chunked, each chunk is read or written as a single I/O operation and is passed individually from stage to stage of the filter pipeline.
● Parallel I/O for platforms with parallel file systems
  Uses HDF5's parallel I/O, not Argonne's Parallel netCDF.
● "Reader makes right" rules
  ● Writer always uses native representations,