
Image Standards

Betsy Fanning 03/29/2006

TIFF, JPEG, and PDF. Different formats for different jobs.

When one thinks of enterprise content management (ECM), one tends to focus on the management of content in an organization. As AIIM defines ECM, however, there is much more to it: it also includes the technologies used to capture, manage, store, preserve, and deliver content and documents.

Even though a majority of the information organizations handle today is born digital and stays that way, a lot of information still enters organizations as paper. While paper continues to be useful in business, digital information enables an organization to respond more rapidly to changing circumstances. In the late 1980s, the buzz in the business world centered on digital document imaging. It was the solution many organizations implemented to gain some level of control over their documents. As recently as 10 years ago (according to AIIM research), industry solutions were typically departmental, with a particular business focus. As our survey indicates, 82% of end users now see ECM technologies (imaging among them) as a core element of their overall IT infrastructure.* Today, organizations don't think twice about imaging, because imaging technology is integrated into most products. Image viewers are prevalent.

Image file format standards have helped drive the widespread adoption of imaging technology. They provide a standardized method of organizing and storing image data. A scanned document or image consists of picture elements, or pixels, that represent the brightness and color of the information on the page. While there are numerous graphic and image file formats, this article looks at three of the standards used in document imaging: TIFF, JPEG, and PDF.

Compression of Image Files

When considering image file formats, one needs a basic understanding of compression. No single compression method is suitable for all scanned documents; when choosing the best method, one must consider the type of document that will be scanned.

A compression scheme is the method used to reduce the amount of data needed to store or transmit a representation of an image. Compression is lossless when the data is compressed by efficiently coding the information in the image, so that the reconstructed image contains exactly the same information as the original. In lossy compression, images are compressed by selectively removing information from the image. This does not mean that words, phrases, or sentences are removed. Through complex algorithms, statistically redundant information, as well as perceptually irrelevant or unimportant information, is removed, leaving only the useful information. ANSI/AIIM TR 33, Selecting an Appropriate Method to Match User Requirements, explains compression algorithms and provides useful information for selecting the best compression algorithm for your application.

From www.aiim.org/article-aiim.asp?ID=31178 1 18 December 2006
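The difference between lossless and lossy compression can be shown with a toy example. This minimal Python sketch uses simple run-length encoding as a stand-in for lossless coding (it is not CCITT Group IV or any method described in TR 33) and coarse quantization of gray levels as a stand-in for lossy removal of detail. The function names are hypothetical.

```python
from itertools import groupby

def rle_encode(row):
    """Lossless: store each run of identical pixels as (value, run_length)."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(runs):
    """Expand the runs back into the exact original pixel sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

def quantize(row, step):
    """Lossy: snap each gray level down to a multiple of `step`,
    discarding fine tonal differences that cannot be recovered."""
    return [(v // step) * step for v in row]

# A bitonal scan line (0 = white, 1 = black) has long runs, so RLE shrinks it.
scan_line = [0] * 20 + [1] * 5 + [0] * 15
runs = rle_encode(scan_line)
print(runs)                               # [(0, 20), (1, 5), (0, 15)]
assert rle_decode(runs) == scan_line      # reconstruction is exact

# A grayscale row loses information when quantized.
gray_row = [12, 13, 15, 200, 201, 203]
print(quantize(gray_row, 16))             # [0, 0, 0, 192, 192, 192]
```

After quantization there is no way to tell that the original values were 12, 13, and 15, which is exactly the information a lossy scheme gives up in exchange for smaller data.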

TIFF

TIFF, Tagged Image File Format, is used mainly for storing raster images, including photographs and line art, and is largely credited with founding the imaging industry. Aldus is credited with developing TIFF for use with PostScript printing. It is now widely used for images, along with JPEG. TIFF's primary goal is to provide a rich environment within which applications can exchange image data. This richness is required to take advantage of the varying capabilities of scanners and other imaging devices.

TIFF uses tags to handle multiple images and data in a single file. These tags describe the size of the image, define how the image data is arranged, and identify the compression algorithm, if any, that is used. Images created using TIFF can be used for archiving purposes because TIFF is generally used as a lossless format; that is, the file may be edited and resaved without losing image quality.
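The tag structure can be illustrated with a short Python sketch that uses only the standard `struct` module. It builds a tiny two-page, little-endian TIFF header and IFD (image file directory) chain in memory, with no pixel data, so it is not a viewable image, and then walks the chain, reading the ImageWidth (256), ImageLength (257), and Compression (259) tags of each page. Tag numbers and compression codes follow the TIFF 6.0 specification; the helper names are hypothetical, and a real reader would handle many more tags and field types.

```python
import struct

# Tag numbers and Compression codes from the TIFF 6.0 specification.
IMAGE_WIDTH, IMAGE_LENGTH, COMPRESSION = 256, 257, 259
SHORT, LONG = 3, 4                              # TIFF field types
COMPRESSION_NAMES = {1: "none", 4: "CCITT Group 4", 5: "LZW"}

def build_ifd(entries, next_ifd_offset):
    """Serialize one IFD: entry count, 12-byte entries, offset of the next IFD."""
    out = struct.pack("<H", len(entries))
    for tag, value in sorted(entries.items()):  # entries must be in ascending tag order
        out += struct.pack("<HHIHH", tag, SHORT, 1, value, 0)  # one SHORT, padded to 4 bytes
    return out + struct.pack("<I", next_ifd_offset)

def build_two_page_tiff():
    """Header ('II' = little-endian, magic 42, first IFD offset) plus two chained IFDs.
    Pixel data and other required tags are omitted (hypothetical demo only)."""
    page = {IMAGE_WIDTH: 2550, IMAGE_LENGTH: 3300, COMPRESSION: 4}
    first = 8                                   # first IFD follows the 8-byte header
    second = first + 2 + 12 * len(page) + 4     # count + entries + next-IFD offset
    header = b"II" + struct.pack("<HI", 42, first)
    return header + build_ifd(page, second) + build_ifd(page, 0)  # 0 ends the chain

def read_pages(data):
    """Walk the IFD chain and return one {tag: value} dict per page."""
    if data[:2] != b"II":
        raise ValueError("this sketch only handles little-endian TIFF")
    magic, offset = struct.unpack_from("<HI", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    pages = []
    while offset:
        (count,) = struct.unpack_from("<H", data, offset)
        tags = {}
        for i in range(count):
            pos = offset + 2 + 12 * i
            tag, field_type, _n = struct.unpack_from("<HHI", data, pos)
            # A single SHORT or LONG value is stored inline in the entry's last 4 bytes.
            fmt = {SHORT: "<H", LONG: "<I"}[field_type]
            (tags[tag],) = struct.unpack_from(fmt, data, pos + 8)
        pages.append(tags)
        (offset,) = struct.unpack_from("<I", data, offset + 2 + 12 * count)
    return pages

for number, tags in enumerate(read_pages(build_two_page_tiff()), start=1):
    print(number, tags[IMAGE_WIDTH], tags[IMAGE_LENGTH], COMPRESSION_NAMES[tags[COMPRESSION]])
# 1 2550 3300 CCITT Group 4
# 2 2550 3300 CCITT Group 4
```

Because each IFD ends with the offset of the next one, a reader can find every page of a multipage file without knowing the page count in advance.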

In document management, TIFF is commonly used in conjunction with CCITT Group IV compression (typically used with facsimile technology). Black-and-white documents are usually captured using TIFF; however, color may also be used. In large-volume applications, documents are typically scanned in black and white, rather than color or grayscale, to conserve file size. Because TIFF supports multiple pages, a multipage document can be scanned to a single file rather than to an individual file for each page.
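The file-size savings from black-and-white capture can be estimated with simple arithmetic. The sketch below assumes a US Letter page (8.5 x 11 inches) scanned at 300 dpi; these are illustrative values, not figures from the article. It computes uncompressed sizes at 1, 8, and 24 bits per pixel and ignores row padding; compression such as Group IV would shrink the black-and-white figure much further, by an amount that depends on the page content.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw bitmap size in bytes before any compression (row padding ignored)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

# Assumed page: 8.5 x 11 inches at 300 dpi, i.e. 2550 x 3300 pixels.
for label, bpp in [("black and white", 1), ("grayscale", 8), ("color (RGB)", 24)]:
    print(f"{label}: {uncompressed_bytes(8.5, 11, 300, bpp):,} bytes")
# black and white: 1,051,875 bytes
# grayscale: 8,415,000 bytes
# color (RGB): 25,245,000 bytes
```

Even before compression, the bitonal page is 8 times smaller than grayscale and 24 times smaller than 24-bit color.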

JPEG

JPEG (pronounced jay-peg; Joint Photographic Experts Group) is a lossy compression format for photographic images. It is designed for use with either full-color or grayscale images. JPEG is best used with photographs rather than text. The JPEG standard specifies how an image is transformed into a stream of bytes, but not how those bytes are encapsulated in any particular storage medium. JFIF (JPEG File Interchange Format), originally developed at C-Cube Microsystems and widely supported through the Independent JPEG Group's software, specifies how to produce a file suitable for computer storage and transmission over the Internet from a JPEG stream.
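The split between the JPEG byte stream and the JFIF wrapper shows up in the first bytes of a file. This Python sketch checks for the JPEG Start Of Image marker (0xFFD8) followed by a JFIF APP0 segment whose identifier is "JFIF" plus a zero byte. The function names are hypothetical, and the check only inspects the header, not the compressed data. Many camera files carry an Exif (APP1) segment instead of JFIF, which this sketch reports as a JPEG stream without a JFIF header.

```python
import struct

SOI = b"\xff\xd8"   # JPEG Start Of Image marker
APP0 = b"\xff\xe0"  # application segment 0, where JFIF places its header

def describe_jpeg_wrapper(data: bytes) -> str:
    """Classify the start of a byte stream as JFIF, other JPEG, or not JPEG."""
    if not data.startswith(SOI):
        return "not a JPEG stream"
    # Layout: SOI (2) + APP0 marker (2) + length (2) + b"JFIF\0" (5) + version (2)
    if data[2:4] == APP0 and data[6:11] == b"JFIF\x00":
        major, minor = data[11], data[12]
        return f"JFIF {major}.{minor:02d}"
    return "JPEG stream without a JFIF APP0 header"

def jfif_header(major=1, minor=2):
    """Hypothetical helper: SOI plus a minimal JFIF APP0 segment (no thumbnail)."""
    # identifier, version, units (0 = aspect ratio only), X/Y density, thumbnail size 0 x 0
    body = b"JFIF\x00" + bytes([major, minor, 0]) + struct.pack(">HHBB", 1, 1, 0, 0)
    return SOI + APP0 + struct.pack(">H", len(body) + 2) + body

print(describe_jpeg_wrapper(jfif_header()))         # JFIF 1.02
print(describe_jpeg_wrapper(b"\xff\xd8\xff\xe1"))   # JPEG stream without a JFIF APP0 header
print(describe_jpeg_wrapper(b"%PDF-1.4"))           # not a JPEG stream
```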

JPEG/JFIF is commonly used to store and transmit photographs over the Internet. It is not suitable for line drawings or text, because its compression method does not perform well on the sharp edges these images contain; PNG or GIF is typically used instead. JPEG works best with photographs and paintings of realistic scenes with smooth variations of tone and color. For such images, JPEG often produces much higher quality at a given file size than other common methods.
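Why JPEG suits smooth tones better than text can be seen in its transform step. The sketch below applies the 8 x 8 two-dimensional DCT-II that baseline JPEG uses, then divides every coefficient by a single quantization step of 16 (a simplification: real encoders use per-coefficient quantization tables) and counts how many coefficients survive as nonzero. The step value and the two test blocks are illustrative assumptions, and entropy coding is not shown.

```python
import math

N = 8  # baseline JPEG transforms 8 x 8 pixel blocks

def dct_2d(block):
    """2-D DCT-II of an 8 x 8 block, indexed block[y][x]; result is indexed [v][u]."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    result = [[0.0] * N for _ in range(N)]
    for v in range(N):              # vertical frequency
        for u in range(N):          # horizontal frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            result[v][u] = 0.25 * scale(u) * scale(v) * total
    return result

def nonzero_after_quantization(pixels, step=16):
    """Level-shift by 128, transform, divide by one uniform step, count nonzero results."""
    shifted = [[p - 128 for p in row] for row in pixels]
    return sum(1 for row in dct_2d(shifted) for coef in row if round(coef / step) != 0)

smooth = [[100 + 4 * x for x in range(N)] for _ in range(N)]                # gentle ramp
stroke = [[0 if x in (3, 4) else 255 for x in range(N)] for _ in range(N)]  # black bar on white
print(nonzero_after_quantization(smooth), nonzero_after_quantization(stroke))  # 2 4
```

The smooth ramp keeps only two low-frequency coefficients, while the sharp black-and-white bar needs large high-frequency coefficients as well; coarsely quantizing those is what produces the ringing artifacts visible around JPEG-compressed text.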

With the increasing use of multimedia technologies, image compression requires higher performance and new features. JPEG 2000 is intended to advance standardized image coding systems to serve applications for years to come. JPEG 2000 is a new image format based on state-of-the-art wavelet compression. It is applicable to a number of different applications in the market, including digital cameras, pre-press, medical imaging, and others. JPEG 2000 Part 1 (ISO/IEC 15444-1) offers both lossless and lossy compression and provides better image quality at smaller file sizes than JPEG. JPEG 2000 Part 6 (ISO/IEC 15444-6) is used to compress scanned color documents containing both bitonal elements and images.
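The reversible wavelet behind JPEG 2000's lossless mode can be sketched in a few lines. The Python below implements one level of the one-dimensional LeGall 5/3 integer lifting transform with symmetric extension at the edges, which is the reversible filter in JPEG 2000 Part 1. A real codec applies it in two dimensions over several levels and adds quantization (for lossy coding) and entropy coding, none of which is shown; the function names are hypothetical and the sketch assumes an even number of samples.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform on an even-length list of ints.
    Returns (lowpass, highpass) coefficient lists."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    even, odd = x[0::2], x[1::2]
    m = len(odd)
    # Predict: each odd sample minus the floored average of its even neighbours.
    # At the right edge the missing neighbour is mirrored (symmetric extension).
    high = [odd[k] - (even[k] + even[min(k + 1, m - 1)]) // 2 for k in range(m)]
    # Update: adjust the even samples using neighbouring detail coefficients.
    low = [even[k] + (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(m)]
    return low, high

def inverse_53(low, high):
    """Undo the lifting steps in reverse order; integer arithmetic makes it exact."""
    m = len(high)
    even = [low[k] - (high[max(k - 1, 0)] + high[k] + 2) // 4 for k in range(m)]
    odd = [high[k] + (even[k] + even[min(k + 1, m - 1)]) // 2 for k in range(m)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out

signal = [10, 12, 14, 16, 18, 20, 22, 24]   # a smooth ramp, like a gradual change in tone
low, high = forward_53(signal)
print(low, high)                            # [10, 14, 18, 23] [0, 0, 0, 2]
assert inverse_53(low, high) == signal      # lossless: exact reconstruction
```

For smooth content the detail (highpass) coefficients are mostly zero, which is what lets the later coding stages store them compactly, and because every step uses integers the original samples come back exactly.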

The development of JPEG 2000 is the result of collaboration between the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU-T, formerly CCITT), with input from a multitude of industry experts.

PDF

The final file format to be discussed is PDF, Portable Document Format. Did you know that there are over 500 PDF product suppliers? PDF is a file format developed by Adobe Systems for representing documents in a manner that is independent of the original application software, hardware, and operating system used to create those documents. PDF is an open standard, and anyone may write applications, royalty free, that can read or write a PDF document. A PDF document is a self-contained, cross-platform document: it looks the same on the screen and in print, regardless of what kind of computer or printer someone is using and regardless of what software package was originally used to create it. Although they contain the complete formatting of the original document, including fonts and images, PDF files can be highly compressed, allowing complex information to be downloaded efficiently. PDF is the de facto standard for secure, dependable electronic information exchange and is widely recognized by industries and governments around the world.
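To show what "self-contained" means at the file level, here is a Python sketch that writes a minimal one-page PDF by hand: a header, numbered objects (catalog, page tree, page, and content stream), a cross-reference table giving each object's byte offset, and a trailer. It assumes PDF 1.4 syntax, draws a filled rectangle so that no font is needed, and uses a hypothetical function name; production tools compress streams, embed fonts, and do far more.

```python
def build_minimal_pdf() -> bytes:
    """Hypothetical helper: assemble a one-page PDF that draws a filled black rectangle."""
    content = b"0 0 0 rg 72 600 200 100 re f"   # black fill color, rectangle, fill it
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))                 # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"              # entry 0 heads the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset      # every entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)

pdf = build_minimal_pdf()
print(pdf[:8])   # b'%PDF-1.4'
```

Because the cross-reference table records where every object starts, a reader can locate any part of the document directly from the trailer at the end of the file.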

In addition to being an open standard, PDF is also flexible. A family of PDF standards has either been produced or is in development. AIIM and NPES, working with many records managers, archivists, industry representatives, and other PDF developers, completed work on PDF/Archive, or PDF/A (ISO 19005-1), which is intended to ensure the long-term preservation of electronic documents. A second part of ISO 19005, currently being developed, will address digital signatures, OpenType fonts, 3D graphics, JPEG 2000, and consistency with PDF/X, PDF/E, and PDF/UA. The digital pre-press industry joined forces to develop the PDF/X standard (ISO 15930), which defines methods for the exchange of digital data within the graphic arts industry and between graphic arts establishments. PDF/X is predominantly used in the exchange of advertisements for magazines. In the developmental pipeline you will find PDF/Engineering (PDF/E), which defines a PDF-based file format for exchanging engineering documents among the various communities that work with them. It is intended to improve document exchange and collaboration within engineering workflows, both inside companies and with their partners, suppliers, customers, and others. PDF/UA (PDF/Universal Access) will define a file format to ensure that PDF documents are accessible to people with disabilities. This standard is in the early development phase.

There are a multitude of image file formats to choose from. The image file format and compression your organization chooses will depend on the application you are using. It is important to take into consideration the type of documents you will be scanning, the graphical content contained in the documents, and how they will be used. The file format you select should meet the intended use and support the compression scheme you choose.

Betsy Fanning is AIIM's director of content and standards development. She welcomes any and all comments regarding standards and/or the AIIM standards program.
