Using the FITS Data Structure
Stephen Finniss (FNNSTE003) Literature Synthesis
Abstract
In the interest of implementing a back-end application that can handle FITS data, this synthesis introduces the FITS file format and what it aims to achieve. To understand the contents of a FITS file, a brief overview of the structure of FITS files is given. In addition, the standard and conforming extensions to the FITS format are briefly investigated to demonstrate the flexibility of the format. To demonstrate the usefulness of the format, an overview is given of the kinds of data recorded using it and of the number of sources that produce files in this format. The results of a brief investigation into the available FITS IO libraries are then discussed. There are two main conclusions. First, writing a library for handling FITS IO is a large task, so using CFITSIO is advised. Second, so many different kinds of information can be stored in the format that targeting a subset of it would allow a better application to be written.
Introduction
The FITS file format was defined as a way to allow astronomical information to be transported between institutions that all use different internal formats. To achieve this goal the format itself is very generic and can be used to represent a large number of different data types. Furthermore, all valid FITS files remain valid even when updates are made to the FITS standard[1]. Implementing an application that uses these files is an interesting task, as the flexibility of the format means that there are many ways to interpret the data contained in them. Exploring the data format, the structures it uses, the extensions available to it, what it is used for, and what libraries are available to work with it should outline what would be required to write a back-end application for handling FITS files.
The FITS Format
The format was originally standardised in 1981[1].
The original FITS file format contained two parts: a human-readable header and a data component. The purpose of the header is to describe what is contained in the data component and to provide metadata. The header consists of keyword-value pairs. Each entry is exactly 80 characters long and may optionally include a comment. Not all keywords have a value. The header always contains certain mandatory keywords, and many optional keyword-value pairs can be added[1]. For flexibility, it is also possible to add keyword-value pairs that are not specified by the standard, and programs used for viewing and handling FITS files are expected to simply ignore them[2]. Many of the keywords describe the format of the data. Although the format is used to describe what are effectively images, the stored values are not display-ready pixel data. They are real-world measurements that applications must interpret correctly before viewing or manipulating them, for example by applying the BSCALE and BZERO scaling keywords. The original specification has since been revised, and the most recent revision was published in 2008[2]. This revision deprecates certain keywords and techniques previously used to store extra data in FITS files, and it formally standardises support for FITS extensions. All FITS extensions obey the original rule of a header of keyword-value pairs followed by data. An extension simply defines its own mandatory keywords and how they are used to read the data[3].
FITS Extensions
The standard extensions are the ‘IMAGE’[5], ‘TABLE’[6] and ‘BINTABLE’[4] extensions, as they are all specified in the FITS standard[2]. Conforming extensions are those that follow the rules for extensions set out in the FITS standard but have not been approved by the IAU FITS Working Group[2]. An example of a conforming extension is the proposed image compression extension[7, 8].
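To make the 80-character entries concrete, the following is a minimal sketch in Python (standard library only) of writing and reading simple header cards. It also pads a header to the 2880-byte blocks that FITS uses. It assumes fixed-format values only: logicals, integers, floats and short strings. Long strings, CONTINUE and HIERARCH cards, and most validity checks are left out. All function names are hypothetical. The `physical_value` helper applies the linear scaling physical = BZERO + BSCALE × stored, which is how stored array values become real-world quantities.

```python
# Minimal sketch of FITS header cards (hypothetical helper names).
# Assumes fixed-format values only; long strings, CONTINUE and HIERARCH
# cards, and most validity checks are deliberately left out.

CARD_LEN = 80      # every header card is exactly 80 ASCII characters
BLOCK_LEN = 2880   # headers are padded to whole 2880-byte blocks (36 cards)


def format_card(keyword, value=None, comment=None):
    """Build one 80-character card, e.g. 'NAXIS   =                    2'."""
    if value is None:
        # Commentary cards (COMMENT, HISTORY) have no '= ' value indicator.
        card = keyword.ljust(8) + (comment or "")
    else:
        if isinstance(value, str):
            # Strings start in column 11; embedded quotes are doubled.
            text = "'" + value.replace("'", "''").ljust(8) + "'"
        elif isinstance(value, bool):
            text = ("T" if value else "F").rjust(20)   # logical in column 30
        elif isinstance(value, float):
            text = f"{value:.14E}".rjust(20)           # explicit E exponent
        else:
            text = str(value).rjust(20)                # integers end in column 30
        card = keyword.ljust(8) + "= " + text
        if comment:
            card += " / " + comment
    if len(card) > CARD_LEN:
        raise ValueError("card longer than 80 characters")
    return card.ljust(CARD_LEN)


def _convert(text):
    """Convert a non-string value field to bool, int or float."""
    if text in ("T", "F"):
        return text == "T"
    if not text:
        return None                                    # undefined value
    try:
        return int(text)
    except ValueError:
        return float(text.replace("D", "E"))           # FITS allows a D exponent


def parse_card(card):
    """Split a card into (keyword, value, comment); value is None if absent."""
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return keyword, None, card[8:].strip() or None
    rest = card[10:]
    stripped = rest.lstrip()
    if stripped.startswith("'"):
        # Find the closing quote, skipping doubled quotes ('') inside the string.
        i = 1
        while True:
            i = stripped.index("'", i)
            if stripped[i + 1:i + 2] == "'":
                i += 2
            else:
                break
        value = stripped[1:i].replace("''", "'").rstrip()
        tail = stripped[i + 1:]
    else:
        value_text, sep, after = rest.partition("/")
        value = _convert(value_text.strip())
        tail = sep + after
    _, _, comment = tail.partition("/")
    return keyword, value, comment.strip() or None


def build_header(cards):
    """Join cards, append END, and pad with spaces to a multiple of 2880 bytes."""
    text = "".join(cards) + "END".ljust(CARD_LEN)
    text += " " * (-len(text) % BLOCK_LEN)
    return text.encode("ascii")


def physical_value(stored, bscale=1.0, bzero=0.0):
    """Convert a stored array value to a physical value: BZERO + BSCALE * stored."""
    return bzero + bscale * stored
```

A three-card header such as SIMPLE, BITPIX and NAXIS fits into a single 2880-byte block once END and the padding are added. Round-tripping a card through `format_card` and `parse_card` returns the original keyword, value and comment.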
The original FITS specification was well suited to recording a single set of n-dimensional data. However, multiple sets of related n-dimensional data each required a separate FITS file. The image extension was the answer to this problem, allowing multiple related sets of n-dimensional data to be stored in a single FITS file[5]. An image extension looks almost identical to the primary header and data. The difference is that its header includes certain keyword-value pairs that are mandatory in all conforming FITS extensions, such as XTENSION, PCOUNT and GCOUNT[2].
As the FITS format became popular as a method of transferring astronomical data, the ability to define other kinds of related data was proposed. The table extension was created to allow the recording of standard catalogues, observation data and tables of results[6]. The extension defines a number of new keywords in its header, such as TFIELDS, TBCOLn and TFORMn, that describe the layout of the table's columns. The rows themselves are stored as ASCII text in the data component.
The binary table extension is a more generalised version of the table extension. Because every value in a table extension must be written as ASCII text[6], that extension is both bulky and limited in the data types it can represent. The binary table removes this restriction by storing values in binary form in the data component. This allows a more flexible and efficient way to store data structures inside FITS files[4]. Using the binary table, one is able to define variable-length arrays, multidimensional arrays and sub-string arrays.
FITS files are used to store very large data sets, such as all the information received from a telescope. However, the file specification itself makes no attempt to compress this information, so one would have to compress the FITS file after it has been created to reduce its size. The problem this creates is that the entire FITS file must be decompressed before one can investigate its contents.
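The difference between the two table types can be illustrated by packing the same row both ways. The sketch below uses Python with only the standard library. It assumes a hypothetical table with an integer column, a floating-point column and an 8-character name column.

For the binary table, it maps a subset of the TFORMn column type codes onto `struct` codes, using big-endian byte order as FITS requires:

- J: 32-bit integer
- I: 16-bit integer
- K: 64-bit integer
- B: unsigned byte
- E: 32-bit float
- D: 64-bit float
- A: characters

For the ASCII table, it formats each field as fixed-width text using the Iw, Ew.d and Aw forms. The helper names are illustrative. Other column types (logical, bit, complex, variable-length array descriptors) are left out.

```python
import struct

# FITS binary-table TFORMn type codes -> struct codes (a subset only).
BINTABLE_CODES = {"B": "B", "I": "h", "J": "i", "K": "q", "E": "f", "D": "d", "A": "s"}


def row_format(tforms):
    """Big-endian struct format for BINTABLE TFORMn values such as '1J' or '8A'."""
    parts = []
    for tform in tforms:
        repeat = int(tform[:-1] or "1")        # repeat count defaults to 1
        parts.append(f"{repeat}{BINTABLE_CODES[tform[-1]]}")
    return ">" + "".join(parts)                # '>' = big-endian, no alignment padding


def pack_binary_row(tforms, values):
    """Pack one row as it would be laid out in a BINTABLE data unit."""
    flat = []
    for tform, value in zip(tforms, values):
        if tform.endswith("A"):
            flat.append(value.encode("ascii"))  # struct NUL-pads short strings
        elif isinstance(value, (list, tuple)):
            flat.extend(value)                  # fixed-length array column
        else:
            flat.append(value)
    return struct.pack(row_format(tforms), *flat)


def pack_ascii_row(tforms, values):
    """Format one row as fixed-width text using TABLE TFORMn values (Iw, Ew.d, Aw)."""
    fields = []
    for tform, value in zip(tforms, values):
        kind, spec = tform[0], tform[1:]
        if kind == "I":
            fields.append(f"{value:>{spec}d}")
        elif kind == "E":
            width, decimals = spec.split(".")
            fields.append(f"{value:>{width}.{decimals}E}")
        elif kind == "A":
            fields.append(f"{value:<{spec}s}")
        else:
            raise ValueError(f"unsupported TFORM {tform!r}")
    return "".join(fields).encode("ascii")


binary_row = pack_binary_row(["1J", "1E", "8A"], [42, 1.5, "NGC224"])
ascii_row = pack_ascii_row(["I10", "E15.7", "A8"], [42, 1.5, "NGC224"])
print(len(binary_row), len(ascii_row))  # 16 33
```

The 16-byte binary row width is what NAXIS1 would record for this hypothetical binary table, and `struct.unpack` recovers the stored values exactly. The ASCII row is about twice as large, and its floating-point field is rounded to the precision given in its TFORMn.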
The response to this problem is a compression extension, which allows the data component to be compressed while the headers remain human-readable[7]. This way one can easily tell what a file contains without having to decompress it, which can be a time-consuming process for large files. Furthermore, when processing the file, one only has to decompress the part of the file one is interested in. This extension is not part of the FITS standard; however, it is implemented in some libraries for handling FITS files[8].
There are also conventions that allow different coordinate systems to be defined: world[14], celestial[13] and spectral[12] coordinates can all be specified.
Example Usage
FITS files have many uses within astronomy. The format was originally designed to store n-dimensional data, but it has been extended to store far more, and any information retrieved from a telescope can be represented using FITS files. Many published datasets use these files. In the 6dF Galaxy Survey, FITS files are used to record the results of the survey. These include information in various coordinate systems, information about spectra, and images taken by the telescopes[10]. FITS files are also used to store spectral data, as there is no widely used format for storing spectroscopic datasets[11]. FITS files are available from a large number of sources, many of which make them available online[9].
Available FITS File Handling Libraries
A large number of libraries have been written to handle IO with FITS files. These libraries are available in a number of different languages, each with varying levels of compliance with the FITS standard and its extensions. Many of the libraries are high level and represent FITS data using objects. They are easier to use but give the programmer less control. In the interest of performance, a low-level library is ideal[9].
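The benefit of compressing the data in pieces, rather than compressing the whole file, can be shown with a small sketch. The Python below uses only the standard `zlib` module. It splits an image, given as a list of row byte strings, into hypothetical tiles of four rows. It then compresses each tile independently and decompresses only the tile holding a requested row.

This only illustrates the idea. Tiled image compression as implemented in CFITSIO stores each compressed tile in a binary-table cell and records the tiling in keywords such as ZTILEn. It also offers algorithms other than zlib-style compression (Rice, for example). None of that is modelled here.

```python
import zlib

TILE_ROWS = 4  # hypothetical tile height, in image rows


def compress_tiles(rows, tile_rows=TILE_ROWS):
    """Compress groups of `tile_rows` image rows independently of each other."""
    return [
        zlib.compress(b"".join(rows[start:start + tile_rows]))
        for start in range(0, len(rows), tile_rows)
    ]


def read_row(tiles, row_index, row_len, tile_rows=TILE_ROWS):
    """Return one row, decompressing only the tile that contains it."""
    tile = zlib.decompress(tiles[row_index // tile_rows])
    offset = (row_index % tile_rows) * row_len
    return tile[offset:offset + row_len]


# Ten 8-byte rows; row i is filled with the byte value i.
rows = [bytes([i]) * 8 for i in range(10)]
tiles = compress_tiles(rows)
print(len(tiles), read_row(tiles, 5, 8) == rows[5])  # 3 True
```

Reading row 5 touches only the second tile. With whole-file compression, everything before that row would have to be decompressed first.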
The FITS standard is well supported, with FITS IO libraries written in more than 10 languages. CFITSIO is a low-level library written in C that supports all the standard FITS features[8]. Most of the other libraries do not have the same level of compliance with the FITS standard. Several libraries in other languages are wrappers around CFITSIO, namely CCfits (C++), CSharpFits (C#), fitsTcl (Tcl), CFITSIO.pm (Perl), MFITSIO (MATLAB), GFITSIO (LabVIEW), PFITS (Python) and FITSload (IGOR Pro). CFITSIO appears to be the best choice; however, there are other candidates, such as PyFITS, MRDFITS/MWRFITS and nom.tam.fits. PyFITS has a similar level of support for the FITS format. However, Python is a poor choice for a high-performance application, so this library is more useful for prototyping. MRDFITS/MWRFITS for IDL also has good support. Unfortunately, IDL is an entirely proprietary language, which limits its accessibility. There is a free alternative to IDL, but it does not provide the same feature set, and there is no guarantee that libraries will work with it. nom.tam.fits for Java also supports all the standard FITS features and is efficient for a Java library. Overall, CFITSIO is the best choice in terms of both support and performance. For simpler FITS file handling tasks there are tools such as FTOOLS, which provides a plethora of small applications, each handling a specific aspect of FITS file manipulation[15].
Conclusion
Because the FITS format was originally designed to be used on magnetic tape[1] and must remain backwards compatible with previous versions[2], there are a large number of deprecated features which need to be catered for.
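The size of the IO task that CFITSIO takes on is easier to appreciate by looking at the lowest layer any FITS reader needs: stepping from one header-and-data unit (HDU) to the next. The Python sketch below (standard library only, hypothetical function names) reads 2880-byte header blocks up to the END card. It then skips each data unit, whose size in bytes is |BITPIX|/8 × GCOUNT × (PCOUNT + NAXIS1 × … × NAXISn), rounded up to whole 2880-byte blocks.

The sketch assumes well-formed files and only extracts the integer keywords it needs. It deliberately does not handle the deprecated random-groups structure, whose size formula differs. That is exactly the kind of legacy feature a complete library must still support.

```python
import io

BLOCK = 2880  # header and data units are padded to multiples of 2880 bytes
CARD = 80


def read_header(stream):
    """Read header blocks up to the END card; return {keyword: raw value text}."""
    header = {}
    while True:
        block = stream.read(BLOCK)
        if len(block) < BLOCK:
            raise EOFError("truncated header")
        for i in range(0, BLOCK, CARD):
            card = block[i:i + CARD].decode("ascii")
            keyword = card[:8].strip()
            if keyword == "END":
                return header
            if card[8:10] == "= ":
                # Good enough for the keywords used below; a real parser must
                # handle quoted strings that contain '/'.
                header[keyword] = card[10:].split("/")[0].strip()


def data_size(header):
    """Bytes occupied by the data unit, rounded up to whole blocks.

    Random groups (a deprecated primary-HDU structure) are not handled.
    """
    naxis = int(header["NAXIS"])
    if naxis == 0:
        return 0
    count = 1
    for i in range(1, naxis + 1):
        count *= int(header[f"NAXIS{i}"])
    pcount = int(header.get("PCOUNT", 0))   # 0 and 1 are the primary-HDU defaults
    gcount = int(header.get("GCOUNT", 1))
    nbytes = abs(int(header["BITPIX"])) // 8 * gcount * (pcount + count)
    return -(-nbytes // BLOCK) * BLOCK      # ceiling division to whole blocks


def list_hdus(stream):
    """Return [(extension type or 'PRIMARY', padded data size)] for each HDU."""
    hdus = []
    while True:
        start = stream.tell()
        if not stream.read(1):              # clean end of file
            return hdus
        stream.seek(start)
        header = read_header(stream)
        kind = header.get("XTENSION", "PRIMARY").strip("' ")
        size = data_size(header)
        hdus.append((kind, size))
        stream.seek(size, io.SEEK_CUR)      # skip over the data unit
```

Passing a file opened in binary mode to `list_hdus` lists each HDU in order. Everything beyond this (value parsing, scaling, table decoding, compression, deprecated structures) is what a full library such as CFITSIO adds on top.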