www.library.carleton.ca/find/gis
File Compression: An Introduction for Raster Image Files

Introduction: Image files can be very large and take up a lot of disk space. Many image file formats use compression techniques to reduce the storage space required by image data. The details of compression vary by format and are typically described in the format's specification. Compression techniques are distinguished by whether they remove detail and color from the image.

• Lossless techniques compress image data without removing detail.

• Lossy techniques compress images by removing detail.
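The distinction can be illustrated with a minimal Python sketch. It assumes a hypothetical row of 8-bit pixel values, uses the standard-library zlib module as the lossless method, and uses simple rounding to multiples of 8 as a stand-in for a lossy method. Real lossy image codecs are far more sophisticated than this.

```python
import zlib

# Hypothetical 8-bit pixel values for one short image row (illustration only).
pixels = bytes([12, 13, 13, 14, 200, 201, 203, 202, 12, 12, 13, 15])

# Lossless: zlib (the DEFLATE method used by ZIP) restores every value exactly.
restored = zlib.decompress(zlib.compress(pixels))

# Lossy stand-in: round each value down to a multiple of STEP, discarding fine detail.
STEP = 8
quantized = bytes((p // STEP) * STEP for p in pixels)

# Largest difference between an original value and its lossy version.
worst_error = max(abs(a - b) for a, b in zip(pixels, quantized))

print("lossless round trip identical:", restored == pixels)
print("lossy version identical:", quantized == pixels)
print("largest lossy error:", worst_error)
```

Once the lossy version has been saved, the discarded detail cannot be recovered from it, whereas the lossless round trip returns the original values.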

“Most file types that are lossy compressed are typically where the algorithm looks for structure and pattern that is essentially uniform or repeated and can be reduced by using a simpler representation (often with a code telling how much it is uniform over what space, or an equation/ function of some sort). The problem is that in the compression processing, some data variance may be removed that contains information that can't be recovered when the inverse algorithm is run to re-expand (re-construct) the image. However, most lossy algorithms can be lossy to a user selected degree, i.e. the greater the compression, the more potential for loss of information. In satellite images of more uniform features such as deserts and water at a moderate (Landsat) to coarse (AVHRR, SPOT-VGT) scale, high compression can be used. However, where surface features spatially vary a lot in reflectance and the image scale captures that, then high compression will lose some of that info. So, the degree of compression needed is a trade-off between:

• storage capabilities, processing capabilities, both which have to be high for non-compressed or lossless compressed (which is a very low level of compression generally for remote sensing images) images, the image scale, and

• the user needs in terms of details of features that need to be resolved and minimum mapping unit, amongst other things.

If you have the storage space, I would suggest getting lossless where possible. GeoTiff files can generally be obtained lossless, sometimes JPEGs could be considered lossless if the compression is really low. Compression methods such as have advanced a lot and can also vary from lossless to lossy. Where possible choose lossless, or no compression. Users can then compress if they need to.”1
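The trade-off described in the quotation can be sketched in Python. This is only a stand-in under stated assumptions: the "image" is a hypothetical row of random 8-bit values (a surface that varies a lot), the lossy step is simple quantization, and zlib supplies the final compression. It is not how JPEG, MrSID, or any other real codec works internally.

```python
import random
import zlib

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "busy" image row: 4096 random 8-bit values.
pixels = bytes(random.randrange(256) for _ in range(4096))

def quantize(data, step):
    """Round every value down to a multiple of step (step=1 changes nothing)."""
    return bytes((v // step) * step for v in data)

def summarize(step):
    """Return (compressed size in bytes, largest per-pixel error) for a step."""
    lossy = quantize(pixels, step)
    size = len(zlib.compress(lossy, 9))
    worst = max(abs(a - b) for a, b in zip(pixels, lossy))
    return size, worst

for step in (1, 16, 64):
    size, worst = summarize(step)
    print(f"step={step:>2}  compressed bytes={size}  largest error={worst}")
```

A larger step plays the role of a higher user-selected compression setting: the compressed size falls, and the largest per-pixel error rises.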

Raster Data: Raster images, like vector data, often lack sufficient metadata from the data producer. Metadata is crucial to using the image files. Metadata such as the following makes a researcher's job easier:

• MrSID, lossy, at 20:1 compression ratio, 3 bands (RGB), 9 MB each

• The original uncompressed 4 km × 4 km GeoTIFF tiles (about 250 MB each)

• 4 km × 4 km tiles in lossless JPEG 2000 format (compressed to 125 MB each but without any loss in image quality from the original GeoTIFFs)
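The figures in metadata like this can be checked with simple arithmetic. The following Python sketch uses two hypothetical helper functions (the names are illustrative, not part of any GIS package) and the sizes quoted in the examples above.

```python
def compression_ratio(original_mb, compressed_mb):
    """Return N for an N:1 compression ratio."""
    return original_mb / compressed_mb

def implied_original_mb(compressed_mb, ratio):
    """Estimate the uncompressed size implied by a stated N:1 ratio."""
    return compressed_mb * ratio

# Lossless JPEG 2000 tiles: 250 MB reduced to 125 MB.
print(f"{compression_ratio(250, 125):g}:1")          # prints "2:1"

# A 9 MB MrSID file at 20:1 implies this much uncompressed data.
print(f"about {implied_original_mb(9, 20):g} MB")    # prints "about 180 MB"
```

The comparison shows why the stated ratio matters: the lossless example halves the file size, while the lossy 20:1 example reduces it twentyfold.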

1 Dr. Douglas J. King, Professor, Dept. of Geography, Carleton University

Information needed in the metadata is:

• Specify in the abstract section how the raster data was produced. For example, the following is a sample abstract for a Digital Raster Graphic (DRG): This Digital Raster Graphic (DRG) was produced by downsampling, georeferencing, and converting a 1000 dots-per-inch (dpi) composite image of revised map separates to a standard USGS GeoTIFF format. This DRG includes collar information and is georeferenced to the UTM grid.

• Specify in the purpose section how the raster data can be used. For example: Due to the georeferencing and high accuracy, this DRG is useful as a source or background layer in a GIS, as a means to perform quality assurance on other digital products, and as a source for the collection and revision of vector data. This DRG can also be merged with other digital data, such as Digital Elevation Models (DEMs) or Digital Orthophoto Quads (DOQs), to provide additional visual information for the extraction and revision of base cartographic information. 2

Commonly used compression techniques:

For large files requiring a high degree of compression, such as TIFF files:

• Wavelet compression is used in Lizardtech’s MrSID format.3 MrSID raster datasets are normally highly compressed with a lossy algorithm. Older MrSID files will always be lossy, whereas the newer version of the software allows for lossless compression. MrSID is widely supported and can be used in almost every remote sensing and GIS software package. Whether an image is lossy or lossless is set at the time of creation. The user can specify the compression ratio as well as select the number of resolution levels to include with the file. You can export large images with high compression while still preserving image quality by choosing the MrSID (Multiresolution Seamless Image Database) file format. The wavelet compression technique embeds multiple image levels with differing resolution into the file. The export also provides two methods of managing memory during the compression stage: a faster one-pass method that does not limit the amount of memory that can be accessed by the process, and a slower two-pass method that limits memory use to enable the compression of very large images.
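The idea of embedding several resolution levels in one file can be sketched with the simplest wavelet, the Haar transform, applied to a single row of hypothetical pixel values. This is an illustration only: the wavelet and file layout actually used by MrSID are not described here, and all function names below are assumptions made for the example.

```python
def haar_step(values):
    """Split an even-length row into half-resolution averages and details."""
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details

def haar_inverse(averages, details):
    """Rebuild the next finer row from averages and their details."""
    row = []
    for avg, det in zip(averages, details):
        row.extend([avg + det, avg - det])
    return row

def build_levels(row, levels):
    """Return the coarsest row plus the detail lists, coarsest first."""
    detail_stack = []
    current = list(row)
    for _ in range(levels):
        current, details = haar_step(current)
        detail_stack.append(details)
    return current, detail_stack[::-1]

def reconstruct(coarse, detail_stack, upto):
    """Rebuild using only the first `upto` detail levels."""
    current = list(coarse)
    for details in detail_stack[:upto]:
        current = haar_inverse(current, details)
    return current

row = [10, 12, 14, 16, 100, 104, 50, 54]  # hypothetical pixel values
coarse, stack = build_levels(row, levels=3)

print(coarse)                          # lowest-resolution level (one value)
print(reconstruct(coarse, stack, 1))   # two-value overview
print(reconstruct(coarse, stack, 3))   # full resolution
```

A viewer can stop after any level to get a quick low-resolution overview, and using every detail level returns the original row exactly. A lossy encoder would discard or coarsen some of the detail values to save space.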

Lossless Encoding Increases Image Accuracy

Lossless MrSID is wavelet-based imaging technology that dramatically increases the value of geospatial data by making it more accessible and useful while maintaining the highest level of quality and accuracy. Lossless technology maintains numeric pixel fidelity between original and encoded imagery, enabling geospatial imagery to be used in new applications, on more devices and across more networks. In addition, it provides significant reductions in the cost of storing, sharing and using geospatial information.4

Note: Coordinate information is embedded within SID files automatically when they are created. If the projection is not set correctly, problems will occur when using the file. With ArcGIS, the projection information needs to be set in ArcCatalog. If there are problems, a workaround exists: create an .sdw file. Aux files are automatically created by ArcMap.
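An .sdw file is a world file: six lines of plain text giving the pixel size, two rotation terms, and the map coordinates of the centre of the upper-left pixel. It positions the image but does not name the coordinate system. The sketch below builds that text in Python. The function name and all coordinate values are hypothetical, chosen only for illustration.

```python
def world_file_text(pixel_width, pixel_height, x_center_ul, y_center_ul,
                    row_rotation=0.0, column_rotation=0.0):
    """Return the six lines of a world file as one string."""
    values = [
        pixel_width,         # line 1: pixel size in x, in map units
        row_rotation,        # line 2: rotation term, 0 for north-up images
        column_rotation,     # line 3: rotation term, 0 for north-up images
        -abs(pixel_height),  # line 4: pixel size in y, negative (rows run downward)
        x_center_ul,         # line 5: x of the centre of the upper-left pixel
        y_center_ul,         # line 6: y of the centre of the upper-left pixel
    ]
    return "\n".join(f"{v:.8f}" for v in values) + "\n"

# Hypothetical 0.5 m pixels on a UTM grid (coordinates are made up).
text = world_file_text(0.5, 0.5, 440000.25, 5030999.75)
print(text)
```

The text would be saved beside the image with the same base name, for example a hypothetical tile.sdw next to tile.sid.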

2 Geospatial Metadata Standards. Missouri Geographic Information System Advisory Committee. http://www.mgisac.org/index.php?n=Standards.MetadataStandard
3 Lizardtech. http://www.lizardtech.com
4 Lizardtech. http://www.lizardtech.com/press/news.php?item=07-07-2003

For images with continuous tone, such as photographs, including the Library’s collection of air photos and orthophotos:

• Joint Photographic Experts Group (JPEG) is a lossy compression technique supported by JPEG, TIFF, PDF, and PostScript language file formats. JPEG compression provides the best results with continuous-tone images, such as photographs. When you choose JPEG compression, you specify the image quality by choosing an option from the Quality menu, dragging the Quality pop-up slider, or entering a value between 1 and 12 in the Quality text box. For the best printed results, choose maximum-quality compression. Files with JPEG encoding can be printed only on Level 2 (or later) PostScript printers and may not separate into individual plates.

Be aware that the higher the quality, the larger the file size. If in doubt, run a test: save the image at a medium quality setting and then at a high quality setting, and compare the file sizes.

For images that contain large areas of a single colour:

• LZW is a lossless compression technique supported by TIFF, PDF, GIF, and PostScript language file formats. This technique is most useful in compressing images that contain large areas of single color, such as screenshots or simple paint images.

• ZIP encoding is a lossless compression technique supported by the PDF and TIFF file formats.

For images that are black and white:

• CCITT encoding is a family of lossless compression techniques supported by the PDF and PostScript language file formats. (CCITT is an abbreviation for the French name of the International Telegraph and Telephone Consultative Committee.)
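The earlier point that LZW suits images with large areas of a single colour can be illustrated with a minimal textbook LZW encoder and decoder in Python. This is a sketch under simplifying assumptions: it works on a hypothetical row of byte values and returns a list of integer codes, and it omits the variable-width bit packing and table resets that TIFF and GIF add on top.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer LZW codes."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate               # keep extending the known run
        else:
            codes.append(table[current])      # emit code for the longest match
            table[candidate] = len(table)     # remember the new, longer run
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes

def lzw_decode(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the entry that is being defined right now.
            entry = previous + previous[:1]
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)

flat = bytes([255]) * 64    # hypothetical row of one solid colour
varied = bytes(range(64))   # hypothetical row where every pixel differs

print(len(lzw_encode(flat)), "codes for the flat row")
print(len(lzw_encode(varied)), "codes for the varied row")
print(lzw_decode(lzw_encode(flat)) == flat)
```

The flat row needs far fewer codes than the varied row because each new code stands for a longer run of the same value, and decoding returns the original bytes exactly.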