Introduction Commonalities Examples Final Notes
Scientific File Formats
Daniel L. Wang
SLAC
6 October 2010
1 Introduction
2 Commonalities
3 Examples: FITS, XTC, ROOT I/O, NetCDF, HDF5, Others
4 Final Notes
Introduction: why files?
Files contain many (most?) scientific data
Files last for a long time
Explosion in data → more, bigger files
Figure: Magnetic tape drive http://www.flickr.com/photos/laughingsquid/102689398/
Scientific File Access
Simple, non-transactional
Data access: value lookups, statistics, plotting
Transformations: simple math + complex algorithms
Logging/history: coarse
    “Image C is the result of function(Image A, parameter set B)”
    not “$X was deducted from Account Y and added to Account Z”
Longevity: >10, 50 or 100+ years
Themes and Commonalities in Formats
Sequential or random-access navigation
Storage efficiency
Self-description (i.e., metadata)
Ordering: Object sequences, grids, images, N-D arrays (+ tables)
Write-append (non-update)
Machine portability (FP format, byte order)
Standards
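The machine-portability theme can be illustrated with a minimal sketch. It assumes a hypothetical record holding one 32-bit integer and one IEEE 754 double; the record layout and the function names are illustrative and not taken from any of the formats discussed here. Stating the byte order in the format string makes the bytes identical on every host.

```python
import struct

# Hypothetical record: one 32-bit signed integer and one IEEE 754 double.
# ">" forces big-endian byte order with no padding, whatever the host CPU.
BIG_ENDIAN_RECORD = struct.Struct(">id")
LITTLE_ENDIAN_RECORD = struct.Struct("<id")


def encode(count, value):
    """Serialize the record in big-endian order."""
    return BIG_ENDIAN_RECORD.pack(count, value)


def decode(buf):
    """Read the record back, again stating the byte order explicitly."""
    return BIG_ENDIAN_RECORD.unpack(buf)


payload = encode(1, 0.5)
print(payload.hex())
print(decode(payload))
# Decoding with the wrong byte order silently yields a different integer.
print(LITTLE_ENDIAN_RECORD.unpack(payload)[0])
```

A reader that assumes the wrong byte order gets no error, only wrong numbers, which is one reason portable formats fix or record the byte order.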
FITS
Flexible Image Transport System. Standard astronomical data format (NASA/IAU)
Generic N-D arrays, images, and ASCII or binary tables
Image tile-compression
Not random access
Human-readable header
See also: AipsIO (used by casacore)
File layout: Primary HDU, Extension HDU, Extension HDU, ...

HDU type        Contents
Primary         ASCII header + n-D array
Image*          ASCII header w/image metadata + n-D array
ASCII Table*    ASCII header w/table metadata + fixed-width row data
Binary Table*   ASCII header w/table metadata + 2-D array
* Extension HDU
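The header-plus-array structure of a primary HDU can be sketched as follows. The sketch assumes the FITS standard's layout of 80-character ASCII header cards padded to 2880-byte blocks and ended by an END card. The helper names (make_header, parse_header, data_bytes) are hypothetical, and the parser is simplified: it ignores quoted strings, continuation cards, and extension keywords such as PCOUNT and GCOUNT.

```python
CARD_LEN = 80      # each header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880   # headers and data are padded to 2880-byte blocks


def make_header(cards):
    """Build a padded header from (keyword, value-text) pairs."""
    lines = [f"{key:<8}= {value:>20}".ljust(CARD_LEN) for key, value in cards]
    lines.append("END".ljust(CARD_LEN))
    text = "".join(lines)
    padding = -len(text) % BLOCK_LEN
    return (text + " " * padding).encode("ascii")


def parse_header(raw):
    """Return a dict of keyword -> value text, stopping at the END card."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            # Drop an optional "/ comment"; quoted strings containing "/"
            # are not handled by this simplified parser.
            header[keyword] = card[10:].split("/")[0].strip()
    return header


def data_bytes(header):
    """Size of the primary array implied by BITPIX and NAXISn (unpadded)."""
    naxis = int(header["NAXIS"])
    if naxis == 0:
        return 0
    count = 1
    for axis in range(1, naxis + 1):
        count *= int(header[f"NAXIS{axis}"])
    return abs(int(header["BITPIX"])) // 8 * count


raw = make_header([("SIMPLE", "T"), ("BITPIX", "16"), ("NAXIS", "2"),
                   ("NAXIS1", "100"), ("NAXIS2", "50")])
header = parse_header(raw)
print(len(raw), header["NAXIS"], data_bytes(header))
```

Because the size of each data unit is only known after its header has been read, a reader has to walk the HDUs in order to reach a later one, which is consistent with the "not random access" point on the previous slide.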
XTC
eXtended Tagged Container (HEP, photon science)
Object serialization: Vectors, trajectories, events, detections
Not random-access
No compression
Lightweight, streaming
File layout: Datagram, Datagram, ...

Datagram type   Notes
Sequence        Event transitions (e.g., {Begin,End}Run, L1Accept)
Xtc             User data objects
Env
More information: https://confluence.slac.stanford.edu/display/PCDS/XTC+format
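Streaming access to a tagged container can be sketched with a generic tag-length-payload reader. The record header below (a 4-byte tag followed by a 4-byte length) and the tag names are assumptions made for illustration; this is not the real XTC datagram layout, which is documented at the link above.

```python
import io
import struct

# Hypothetical record header: 4-byte type tag + 4-byte payload length.
# This is NOT the real XTC layout; it only illustrates tagged streaming.
HEADER = struct.Struct("<4sI")


def write_record(stream, tag, payload):
    """Append one tagged record to the stream."""
    stream.write(HEADER.pack(tag, len(payload)))
    stream.write(payload)


def read_records(stream):
    """Yield (tag, payload) pairs in file order until the stream ends."""
    while True:
        head = stream.read(HEADER.size)
        if len(head) < HEADER.size:
            return
        tag, length = HEADER.unpack(head)
        yield tag, stream.read(length)


stream = io.BytesIO()
write_record(stream, b"BEGR", b"")
write_record(stream, b"L1AC", b"\x01\x02\x03")
write_record(stream, b"ENDR", b"")
stream.seek(0)
tags = [tag for tag, _ in read_records(stream)]
print(tags)
```

Writing is a plain append and reading needs no index, which keeps the format lightweight. The cost is that reaching the n-th record means reading the n-1 headers before it.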
ROOT I/O
ROOT Object I/O (HEP)
Object serialization
Tree-structured (similar to fs)
Object deletion
Compression (deflate)
Ranged values
See also: LCIO
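The storage effect of ranged values and deflate compression can be sketched generically. The sketch assumes values known to lie in a fixed range, quantizes them to 16-bit integers, and compresses the result with zlib (which produces a deflate stream). The function names, the 16-bit width, and the sample data are assumptions; this is not ROOT's own implementation or API.

```python
import struct
import zlib


def quantize(values, lo, hi, nbits=16):
    """Map floats in [lo, hi] to unsigned integers of nbits bits."""
    scale = (2 ** nbits - 1) / (hi - lo)
    return [round((v - lo) * scale) for v in values]


def dequantize(codes, lo, hi, nbits=16):
    """Map the integer codes back to floats in [lo, hi]."""
    scale = (hi - lo) / (2 ** nbits - 1)
    return [lo + c * scale for c in codes]


# Repetitive sample data in [0, 1); real data would come from a detector.
values = [(i % 10) / 10 for i in range(1000)]
raw = struct.pack(f"<{len(values)}d", *values)       # 8 bytes per value
codes = quantize(values, 0.0, 1.0)
packed = struct.pack(f"<{len(codes)}H", *codes)      # 2 bytes per value
compressed = zlib.compress(packed)                   # deflate stream

unpacked = struct.unpack(f"<{len(codes)}H", zlib.decompress(compressed))
restored = dequantize(unpacked, 0.0, 1.0)
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(len(raw), len(packed), len(compressed), max_error)
```

The quantization step is lossy (the error is bounded by half a quantization step), while the deflate step is lossless. How much deflate gains depends entirely on how repetitive the data are.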
Figure: From [Brun and Rademakers, 1996]
NetCDF
Network Common Data Form. (Geo)
N-dimensional arrays
Arrays appendable in one dimension (v4: or more)
Named dimensions with explicit coordinates (allow irregular spacing)
Machine portable
Slabbed, sliced, random access
NetCDF4: NetCDF API on HDF5 physical structure
Figure: 2000-2006 JJA wind power, courtesy Scott Capps
Section           Notes
Header            magic, #records, dimension, global attr, variable meta
Non-record data   fixed-size variables, incl. fixed-dimensions
Record data       record variables (incl. record dimensions)
From [Rew et al., 1997]
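The three-section layout above explains why appending along the record dimension is cheap and why any slab can be located by arithmetic. A minimal sketch, assuming made-up variable names and byte sizes (a real file stores these in its header):

```python
# Hypothetical sizes in bytes; a real file stores these in its header.
HEADER_SIZE = 256
FIXED_VARS = {"lat": 4 * 10, "lon": 4 * 20}               # non-record data
RECORD_VARS = {"temp": 4 * 10 * 20, "wind": 4 * 10 * 20}  # one slab per record


def fixed_offset(name):
    """Byte offset of a fixed-size variable, stored right after the header."""
    offset = HEADER_SIZE
    for var, size in FIXED_VARS.items():
        if var == name:
            return offset
        offset += size
    raise KeyError(name)


def record_offset(name, record):
    """Byte offset of one record's slab for a record variable."""
    record_size = sum(RECORD_VARS.values())
    offset = HEADER_SIZE + sum(FIXED_VARS.values()) + record * record_size
    for var, size in RECORD_VARS.items():
        if var == name:
            return offset
        offset += size
    raise KeyError(name)


print(fixed_offset("lon"), record_offset("temp", 0), record_offset("wind", 2))
```

Each record holds one slab of every record variable, so a new record is written at the end of the file without moving existing data. Only the record count in the header needs to change.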
HDF5
N-D arrays
Array nesting via pointers + VL datatypes
Compression
Parallel I/O (MPI I/O)
Flexible, chunked data layout (slabs or custom tiles)
Ragged arrays via variable-length datatypes
fs-like, w/symlinks, heaps, freelists
Up to 255 byte offsets
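The chunked layout can be sketched as index arithmetic: an element index splits into a chunk coordinate and a position inside that chunk. The function names and the 64 x 64 chunk shape are assumptions for illustration; how HDF5 itself indexes and stores chunks on disk is defined by the file format specification and is not shown here.

```python
def locate(index, chunk_shape):
    """Split an N-D element index into (chunk coordinates, offset in chunk)."""
    chunk = tuple(i // c for i, c in zip(index, chunk_shape))
    within = tuple(i % c for i, c in zip(index, chunk_shape))
    # Row-major linear position of the element inside its chunk.
    linear = 0
    for w, c in zip(within, chunk_shape):
        linear = linear * c + w
    return chunk, linear


def chunks_for_slab(start, stop, chunk_shape):
    """Count the chunks touched by the half-open hyperslab [start, stop)."""
    count = 1
    for lo, hi, c in zip(start, stop, chunk_shape):
        count *= (hi - 1) // c - lo // c + 1
    return count


print(locate((130, 45), (64, 64)))
print(chunks_for_slab((0, 0), (100, 10), (64, 64)))
```

Choosing a chunk shape that matches the expected access pattern keeps the number of chunks touched per read small, and each chunk can be compressed independently.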
Figure: From [HDF Group, 2010]
Many more formats!
Irregular/non-rectangular grids (e.g., geodesic, radial, etc.) [Wadsley and Shell Internationale Petroleum, 1980], triangular meshes [Górski et al., 2005]
See also Scientific Data Format FAQ [Stern, 1995]
Figure: Hexagonal mesh http://www.flickr.com/photos/danhorst/819469908/
From HEALPix: http://healpix.jpl.nasa.gov
Final notes
Data > formats
Long-lived formats, long-lived software
Figure: Looking for data in files. http://www.flickr.com/photos/mahmood/4616170423/
References
Brun, R. and Rademakers, F. (1996). ROOT Object I/O System.
Górski, K., Hivon, E., Banday, A., Wandelt, B., Hansen, F., Reinecke, M., and Bartelmann, M. (2005). HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere. The Astrophysical Journal, 622:759.
HDF Group (2010). HDF5 file format specification version 2.0. http://www.hdfgroup.org/HDF5/doc/H5.format.html.
Rew, R., Davis, G., Emmerson, S., and Davies, H. (1997). NetCDF user’s guide for C. Unidata Program Center.
Stern, I. (1995). Scientific Data Format Information FAQ.
Wadsley, W. A. and Shell Internationale Petroleum (1980). Modelling reservoir geometry with non-rectangular coordinate grids. In SPE Annual Technical Conference and Exhibition, Dallas, Texas. American Institute of Mining, Metallurgical, and Petroleum Engineers.