A Lossy Method for Compressing Raw CCD Images
Revista Mexicana de Astronomía y Astrofísica, vol. 38, no. 2, October 2002, pp. 233–249. ISSN 0185-1101. Instituto de Astronomía, Distrito Federal, México.
Available at: http://www.redalyc.org/articulo.oa?id=57138212

Alan M. Watson
Instituto de Astronomía, Universidad Nacional Autónoma de México, Campus Morelia, México
Received 2002 June 3; accepted 2002 August 7

RESUMEN

A method is presented for compressing raw images from devices such as CCDs. The method is very simple: lossy quantization followed by lossless compression with general-purpose tools such as gzip or bzip2. The compressed files are converted to FITS files by decompressing them with gunzip or bunzip2, which is an important advantage when distributing compressed data. The degree of quantization is chosen to eliminate the low-order bits, which over-sample the noise, provide no information, and are difficult or impossible to compress. The method is lossy, but it provides certain guarantees on the maximum absolute difference, the RMS difference, and the mean difference between the compressed image and the original image; these guarantees imply that the method is suitable for compressing raw images. The method produces compressed images 1/5 the size of the original images when the images are quantized such that no value changes by more than 1/2 of the standard deviation of the background. This is a significant improvement over the compression ratios produced by lossless methods, and images compressed with bzip2 apparently exceed the theoretical limit by no more than a few tens of percent.

ABSTRACT

This paper describes a lossy method for compressing raw images produced by CCDs or similar devices. The method is very simple: lossy quantization followed by lossless compression using general-purpose compression tools such as gzip and bzip2. A key feature of the method is that compressed images can be converted to FITS files simply by decompressing with gunzip or bunzip2, and this is a significant advantage for distributing compressed files. The degree of quantization is chosen to eliminate low-order bits that over-sample the noise, contain no information, and are difficult or impossible to compress. The method is lossy but gives guarantees on the maximum absolute difference, the expected mean difference, and the expected RMS difference between the compressed and original images; these guarantees make it suitable for use on raw images. The method consistently compresses images to roughly 1/5 of their original size with a quantization such that no value changes by more than 1/2 of a standard deviation in the background. This is a dramatic improvement over lossless compression. It appears that bzip2 compresses the quantized images to within a few tens of percent of the theoretical limit.

Key Words: TECHNIQUES: IMAGE PROCESSING

© Copyright 2002: Instituto de Astronomía, Universidad Nacional Autónoma México

1. INTRODUCTION

Optical and infrared instruments now routinely produce huge amounts of image data. The largest common-user CCD mosaic has 12k × 8k pixels (Veillet 1998); a single image from such a mosaic is 192 MB in size. The largest current common-user infrared detector mosaic has 2k × 2k pixels (Beckett et al. 1998), but makes up for its smaller size by being read more frequently. More than a few nights of data from such instruments can easily overwhelm current workstation-class computers. Compression is a solution to some of the problems generated by these large quantities of data. Similarly, compression can improve the effective bandwidth to remote observatories (in particular space observatories), remote data archives, and even local storage devices.

This paper describes in detail a lossy method for compressing raw images and presents a quantitative comparison to other lossy methods. It is organized as follows: § 2 reviews the limitations of lossless compression and the motivation for lossy compression; § 3 briefly summarizes the most relevant previous work on lossy compression; § 4 describes the new method; § 5 presents results on the distribution of differences between the original and compressed images; § 6 investigates the performance of the method with particular reference to hcomp; § 7 compares the performance of the method to other similar methods; § 8 discusses the suitability of the method for compressing raw data and investigates some consequences of such use; § 9 discusses how the method might be improved; and § 10 presents a brief summary.

2. LOSSLESS COMPRESSION

2.1. The Shannon Limit

Compression methods can be classified as lossless or lossy depending on whether compression changes the values of the pixels. The difficulties of lossless compression of CCD images have been discussed by White (1992), Press (1992), and Véran & Wright (1994). The noise in images from an ideal CCD consists of Poisson noise from the signal and uncorrelated Gaussian read noise. If the signal is measured in electrons, the variance of the Poisson noise is equal to the signal and the variance of the read noise is typically 10–100 electrons². However, CCD images are often quantized to sample the read noise, with typical analog-to-digital converter gains of 1–5 electrons per ADU. This means that the larger noise associated with larger signal levels can be hugely over-sampled. If this is the case, the low-order bits in an image are essentially uniform white noise, contain no information, and cannot be compressed.

To illustrate this, a series of 512 × 512 16-bit FITS images of pure Gaussian noise was created. The images were written with a BSCALE of $q\sigma$, so the standard deviation is sampled by a factor of $1/q$. Shannon's first theorem (Shannon 1948a,b, 1949) allows us to calculate the optimal compression ratio for these images. The theorem states that if a stream of bits is divided into fixed-length "input code units", then the minimum average number of bits required to encode each input code unit is just the Shannon entropy,

$$H \equiv -\sum_{i:\,p_i \neq 0} p_i \log_2 p_i, \qquad (1)$$

where $p_i$ is the normalized frequency of the input code unit $i$. From this we can derive that the optimal input code unit is the longest one for which the bits are still correlated; since in this case each pixel is independent, the optimal input code unit is simply a 16-bit pixel. The resulting optimal compression ratio (the ratio of the compressed size to the original size) will be $H/16$.

Figure 1 shows, as a function of $q$, the optimal compression ratio determined by calculating the Shannon entropy explicitly. As can be seen, the optimal compression ratio is a constant minus $(\log_2 q)/16$. Intuitively, this is expected: the number of incompressible low-order bits grows as $-\log_2 q$, and the fraction of the compressed image occupied by these bits grows as $-(\log_2 q)/16$. This result is not new; Romeo et al. (1999) investigated the Shannon limit for compression of a quantized Gaussian distribution, and their equation (3.3), in my notation, is

$$H \approx \log_2 \sqrt{2\pi e} - \log_2 q. \qquad (2)$$

This equation was derived in the limit of small $q$, but Gaztañaga et al. (2001) have shown that the corrections for finite $q$ are small for $q$ at least as large as 1.5. Not surprisingly, the Shannon entropies and compression ratios calculated using this equation and calculated explicitly for the Gaussian noise images are in almost perfect agreement.

Fig. 1. The compression ratio (ratio of compressed size to original size) as a function of $\log_2 q$ for an image consisting of pure Gaussian noise with standard deviation $\sigma$, written as a 16-bit FITS image with a BSCALE of $q\sigma$ and then losslessly compressed. The solid line is the Shannon limit for optimal compression of the individual 16-bit pixels, and the other lines show the ratios achieved by bzip2, gzip, lzop, and hcomp (used losslessly).

Real astronomical images contain information in addition to noise, but nevertheless this exercise demonstrates the futility of attempting to achieve good compression ratios for images that contain over-sampled noise. For example, if the noise is over-sampled by a factor of 60 (which corresponds to $q = 1/60$), equation (2) gives $H \approx 2.05 + 5.91 \approx 8.0$ bits, or about half of each 16-bit pixel, so it is impossible to compress the image losslessly to better than half its size; too much of the image is occupied by white noise in the low-order bits.

instead can approximate their compression ratios by the Shannon limit.

Hcomp (White 1992) is specialized for compressing 16-bit astronomical images and uses a complex algorithm based on a wavelet transformation of the pixel values, optional quantization of the coefficients, and quad-tree compression of the coefficients. Hcomp can be used losslessly by omitting the quantization of the coefficients.

Bzip2 (Seward 1998), gzip (Gailly 1993), and lzop (Oberhumer 1998) are methods for compressing general byte streams and use dictionary-based algo-