• Part I - Images
• COLOR IMAGES – Image Formats
• IMAGE STANDARDS – JPEG – JPEG 2000
Color image formats
• There are three basic kinds of graphic format that a computer can use to store and display an image: – raster – vector – metafile
Raster image format
• A raster format breaks the image into a grid of colored dots called pixels. The number of bits used for each pixel (the color depth) determines how many different colors the image can contain.
– Raster image formats can save at 8, 16, 24 and 32 bits per pixel. With 24 bits of color per pixel, a pixel can take up to 16,777,216 (2^24) different colors; 32-bit formats usually carry the same 24 bits of color plus 8 extra bits, typically used for transparency.
– The main Internet formats (Bitmap, GIF, PNG, JPEG) are all raster formats.
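The relation between bit depth and number of colors can be checked with a short sketch. The function name `color_count` is only illustrative.

```python
def color_count(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel with the given bit depth can take."""
    return 2 ** bits_per_pixel


for depth in (1, 8, 16, 24):
    print(f"{depth:2d} bits per pixel -> {color_count(depth):,} values")
```
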
Enlargement of a raster image: the quality is not improved.
Vector image format
• An image stored in a vector format is defined by lines, curves, circles etc., which are stored as mathematical formulas. Unlike raster images, only the formulas are stored, which makes the file very small. The image does not lose sharpness when you zoom, since the lines are re-rendered at the new size.
• Vector formats are either open standards or proprietary formats made for specific programs: – SVG (Scalable Vector Graphics), an open standard created and developed by the World Wide Web Consortium – AI (Adobe Illustrator) – CDR (CorelDRAW) – …
• A vector image gives very high quality, requires little storage space and is easy to edit. You should always try to save your vector images in a vector format. Photos, scanned images and similar content cannot be saved in a vector format.
• Examples of vector images are drawings, diagrams and illustrations.
Enlargement of a vector image: the quality is still good.
Metafile format
• An image in metafile format combines the two basic formats, vector and raster. Metafile formats are portable formats that can include both vector and raster information.
• Photos are stored in raster format. In some cases you might want to add descriptive numbers, text and arrows to these images. Text and arrows should be saved as vector information rather than raster to keep their quality, while the photo itself still needs to be in raster format. Both are then stored together in a metafile format.
• Examples of metafile formats: – WMF (Windows Metafile) – EMF and EMF+ (Windows Enhanced Metafile). EMF+ is a 32-bit format used by Windows from Windows XP onward. It stores a list of function calls that are issued to the Windows GDI to display an image on screen and for printer drivers. It is the native vector format for the Microsoft applications Word, PowerPoint and Publisher.
Enlargement of a metafile image: the quality is good for the vector information.
Bitmap image format
• The Bitmap image format was invented by Microsoft as a device-independent bitmap (DIB) format. It can store 2D digital images of any width, height and resolution, both monochrome and color. Images are typically stored uncompressed, but can optionally be compressed.
• Bitmap images can have a pixel depth of 1, 4, 8, 16, 24 or 32 bits. Images of 1, 4 or 8 bits use a color table to convert pixel values into colors. Images with higher depths encode the color directly with the three RGB components.
• A bitmap is loaded in memory as a DIB structure. It includes: – Header (file size…) – Bitmap info (size, n. pixel depth…) – Color table – Pixel map, stored upside-down (bottom row first) and packed in rows, each rounded up to a multiple of 4 bytes
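The row padding rule can be made concrete with a small sketch. The function names are illustrative only, and the code only computes sizes; it does not read or write BMP files.

```python
def bmp_row_stride(width_px: int, bits_per_pixel: int) -> int:
    """Bytes used by one stored pixel row, rounded up to a multiple of 4."""
    return ((width_px * bits_per_pixel + 31) // 32) * 4


def bmp_pixel_data_size(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Total bytes of the pixel map for an uncompressed bitmap."""
    return bmp_row_stride(width_px, bits_per_pixel) * abs(height_px)


# A 3-pixel-wide 24-bit row needs 9 bytes of pixel data but occupies 12 bytes.
print(bmp_row_stride(3, 24))
```
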
24-bit depth bitmap
Image compression
• Image compression aims at reducing the number of bits used to represent raster image content. Compression can be either lossless or lossy. Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.
• Lossless compression algorithms usually exploit statistical redundancy to represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.
• Lossy compression instead assumes that some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly.
• Which compression technique to use depends on the application. In general: – Text documents: lossless compression – Data for numerical analysis: lossless compression – Programs: lossless compression – Typographic images: lossless compression – Web images: lossy compression – Video: lossy compression – Audio: lossy compression
• Lossy compression generally achieves higher compression ratios than lossless compression.
Lossless compression
Run-length encoding
• Run-length encoding is a fixed-length coding scheme. A sequence of equal symbols (a run) is encoded with a single copy of the symbol together with a number that specifies how many times the symbol appears consecutively. A special marker symbol is required to flag each encoded run.
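A minimal sketch of run-length encoding, under stated assumptions: runs are written as text in the form marker, count, symbol (for example `$11r`), the input contains neither digits nor the marker, and the optional `max_run` parameter (a hypothetical name) splits long runs into chunks. A real encoder would store the count in a fixed number of bits rather than as decimal text.

```python
from itertools import groupby


def rle_encode(text: str, marker: str = "$", max_run=None) -> str:
    """Encode each run of equal symbols as marker + count + symbol."""
    out = []
    for symbol, run in groupby(text):
        remaining = len(list(run))
        while remaining > 0:
            count = remaining if max_run is None else min(remaining, max_run)
            out.append(f"{marker}{count}{symbol}")
            remaining -= count
    return "".join(out)


def rle_decode(encoded: str, marker: str = "$") -> str:
    """Invert rle_encode (assumes symbols are neither digits nor the marker)."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] != marker:
            raise ValueError(f"expected marker at position {i}")
        j = i + 1
        while encoded[j].isdigit():
            j += 1
        out.append(encoded[j] * int(encoded[i + 1:j]))
        i = j + 1
    return "".join(out)


print(rle_encode("r" * 11 + "p" * 11 + "c" * 11))
```
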
• For example, consider a text string with three sequences of 11 (eleven) characters "r" each, followed by three sequences of 11 "p" each and three sequences of 11 "c" each. With $ as the special symbol, the whole string is encoded as:
$11r $11r $11r $11p $11p $11p $11c $11c $11c
Total characters (special symbols and letters): 6+6+6 = 18
Total numbers: 3+3+3 = 9
Total size: 27 bytes = 216 bits, against 99 bytes for the original string
Prefix-free coding
• Fixed-length codes are always uniquely decipherable. However, these do not always give the best compression and variable length codes are preferred.
• Prefix-free coding is a coding scheme where no codeword is a prefix of another one. Every message encoded by a prefix-free code is uniquely decipherable. Since no codeword is a prefix of any other, we can always find the first codeword in a message, peel it off, and continue decoding.
• We are therefore interested in finding good (best compression) prefix-free codes.
Huffman coding
• Huffman coding refers to the use of a variable-length code for encoding a source symbol, where the variable-length code has been derived on the basis of the estimated probability of occurrence of the source symbol.
• Huffman coding uses a specific method for choosing the representation for each symbol: – a prefix code that expresses the most common source symbols with shorter strings of bits than those used for less common source symbols. – the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol (prefix-free code)
• It is the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size with the same symbol frequencies.
• The Huffman code for an alphabet (set of symbols) may be generated by constructing a binary tree with nodes containing the symbols to be encoded and their frequencies of occurrence. The tree may be constructed as follows:
– Step 1. Select the two parentless nodes with the lowest frequencies.
– Step 2. Create a new node which is the parent of the two lowest frequency nodes.
– Step 3. Assign the new node a frequency equal to the sum of its children's frequencies.
– Step 4. Repeat from Step 1 until there is only one parentless node left.
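The four steps can be sketched with a priority queue. This is one possible implementation, not the only one: the function name `huffman_code` is illustrative, symbols are assumed to be strings, and ties are broken by insertion order, so the exact bit patterns may differ from a hand-built tree while the code lengths, and hence the total size, stay the same. The frequencies are those of the five-symbol example used in this section.

```python
import heapq
from itertools import count


def huffman_code(frequencies: dict) -> dict:
    """Return a prefix-free code {symbol: bit string} for a non-empty table."""
    tie = count()  # unique tie-breaker so the heap never compares tree nodes
    heap = [(freq, next(tie), symbol) for symbol, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 1-3: merge the two parentless nodes with the lowest frequencies.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left child, right child)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a symbol
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
codes = huffman_code(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits)
```
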
• Huffman code example:
Consider a set of five different symbols with the following frequencies: A 24, B 12, C 10, D 8, E 8.
With a fixed-length code this results in a total of 186 bits (62 symbols, 3 bits per codeword).
Huffman encoding:
– Step 1. Combine D and E into DE with a frequency of 16
– Step 2. Combine B and C into BC with a frequency of 22
– Step 3. Combine BC and DE into BCDE with a frequency of 38
– Step 4. Combine A with BCDE into ABCDE with a frequency of 62
• The code for each symbol may be obtained by tracing a path to the symbol from the root of the tree. A 1 is assigned for a branch in one direction and a 0 is assigned for a branch in the other direction.
• Huffman's method is fairly efficient: constructing the code tree takes O(n log n) operations for n symbols.
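• Building the code tree:
Tree: the root branches to A (0) and BCDE (1); BCDE branches to BC (0) and DE (1); BC branches to B (0) and C (1); DE branches to D (0) and E (1).
Symbol   Frequency   Code   Code length   Total length
------
A        24          0      1             24
B        12          100    3             36
C        10          101    3             30
D        8           110    3             24
E        8           111    3             24
------
Total: 186 bits with the 3-bit code, 138 bits with Huffman encoding
Decoding Huffman encoded files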
• In order to decode Huffman encoded files, the decoding algorithm must know what code was used to encode the data. A table containing symbols and their codes or the Huffman tree should be used.
• Decoding a file is a two step process: – the header data is read in and the Huffman code for each symbol is reconstructed. – the encoded data is read and decoded.
• The simplest method for decoding symbols is to read the encoded file one bit at a time and traverse the Huffman tree according to each of the bits until a leaf containing a symbol is reached. When a bit causes a leaf of the tree to be reached, the symbol contained in that leaf is written to the decoded file, and traversal starts again from the root of the tree.
• Example (with the codes A = 0, B = 100, C = 101, D = 110, E = 111):
Input sequence: 0111100110111
Decoded sequence: A E B D E
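The bit-by-bit traversal can be sketched as follows. The tree is rebuilt here from a code table (A = 0, B = 100, C = 101, D = 110, E = 111) and represented as nested dictionaries, which is an implementation choice made for this illustration; bits are given as a string of "0" and "1" characters.

```python
def huffman_decode(bits: str, codes: dict) -> list:
    """Decode a bit string by walking the code tree from the root."""
    # Rebuild the tree: each node maps "0"/"1" to a child node or to a symbol.
    root = {}
    for symbol, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = symbol

    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):  # reached a leaf: emit symbol, restart
            out.append(node)
            node = root
    if node is not root:
        raise ValueError("input ends in the middle of a codeword")
    return out


codes = {"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}
print(huffman_decode("0111100110111", codes))
```
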
• Although Huffman's original algorithm is optimal for a stream of unrelated symbols with a known input probability distribution, it is not optimal when the probability mass functions are unknown, not identically distributed, …
• Other methods, such as the LZW (Lempel-Ziv-Welch) coding used in GIF images, often have better compression capability: these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics, which is useful when input probabilities are not precisely known or vary significantly within the stream.
GIF image format
• GIF, which stands for "Graphics Interchange Format", was first standardized in 1987 by CompuServe as a format with lossless compression. It is an 8-bit-per-pixel image format using a palette of up to 256 distinct colors from the 24-bit RGB color space.
• The format was widely used on the early WWW thanks to its wide support and portability.
GIF main characteristics
Suited:
• A GIF image employs lossless LZW (Lempel-Ziv-Welch) compression, so that the file size of an image may be reduced without degrading the visual quality, provided the image can be rendered with only 256 colors. Its lossless compression preserves very sharp edges.
Unsuited:
• The 256-color limitation makes the GIF format unsuitable for color photographs and other images with continuous color. It is instead well suited for simpler images such as graphics or logos with solid areas of color.
• GIF has generally been used for sharp-edged line art (often vector-based logos) with a very limited number of colors. A large portion of web page line art, including logos and design elements, is GIF. The GIF format is also still used both for short, small animations and for low-resolution film clips on web pages.
• Monochrome photographs (with continuous grey tones) can be represented well as GIFs but have large file sizes, because LZW compression is poorly suited to continuous-tone data.
• With the exception of animated GIFs, however, PNG has increasingly replaced GIF on web pages.
LZW compression
• LZW compression replaces strings of symbols (i.e. sequences of 2 or more symbols) with single codes. It does not do any analysis of the incoming string pattern. Instead, it adds every new string of symbols it finds to a table of strings. Compression occurs when a single code is output instead of a string of symbols.
• The LZW code can be of any arbitrary length, but it must have more bits in it than a single symbol. For 8-bit symbols the first 256 codes are assigned to the standard character set. The remaining codes are assigned to strings that are found as the algorithm proceeds.
• For example, with 12-bit codes, codes 0-255 refer to individual bytes, and codes 256-4095 refer to substrings.
Example
• The input character string is a list of words separated by the '/' character: /WED/WE/WEE/WEB/WET
‒ A table with 256 individual characters is assumed to be present. Each character is assigned a unique code (0-255).
‒ Starting from the first pair of characters, it is checked whether the string /W is in the table. Since it is not, the string /W is added to the table and the code for / is output. The new string is assigned code 256 (codes 0-255 are already used for the 256 characters).
‒ After E has been read in, the second string, WE, is added to the table with code 257, and the code for W is output.
‒ This continues until, in the second word, the characters / and W are read, matching string number 256. The next character E is then read: the string /WE (three characters) is not in the table, so it is added and the code 256 is output.
‒ The process continues until the string is exhausted and all the codes have been output.
Input string: /WED/WE/WEE/WEB/WET
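The compression procedure can be sketched in a few lines. This is a simplified illustration: codes are returned as plain integers rather than packed into fixed-width bit fields, the table is allowed to grow without limit, and every input character is assumed to have a code point below 256. The function name `lzw_compress` is illustrative.

```python
def lzw_compress(text: str) -> list:
    """Return the list of LZW codes for text (characters with code < 256)."""
    table = {chr(i): i for i in range(256)}  # codes 0-255: single characters
    next_code = 256
    current = ""
    out = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate             # keep extending the known string
        else:
            out.append(table[current])      # output code of the longest match
            table[candidate] = next_code    # add the new string to the table
            next_code += 1
            current = ch
    if current:
        out.append(table[current])          # flush the last pending string
    return out


codes = lzw_compress("/WED/WE/WEE/WEB/WET")
print([c if c > 255 else chr(c) for c in codes])
```
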
Character input   Existing?   String checked   Existing?   Code output   New code   New string
------
/                 Y           -                -           -             -          -
W                 Y           /W               N           /             256        /W
E                 Y           WE               N           W             257        WE
D                 Y           ED               N           E             258        ED
/                 Y           D/               N           D             259        D/
W                 Y           /W               Y
E                 Y           /WE              N           256           260        /WE
/                 Y           E/               N           E             261        E/
W                 Y           /W               Y
E                 Y           /WE              Y
E                 Y           /WEE             N           260           262        /WEE
/                 Y           E/               Y
W                 Y           E/W              N           261           263        E/W
E                 Y           WE               Y
B                 Y           WEB              N           257           264        WEB
/                 Y           B/               N           B             265        B/
W                 Y           /W               Y
E                 Y           /WE              Y
T                 Y           /WET             N           260           266        /WET
EOF                                                        T
------
LZW decompression
• The decompression algorithm takes the stream of codes output from the compression algorithm and uses them to recreate the input stream. • The LZW algorithm does not need to pass the string table to the decompression code. The table can be built exactly as it was during compression, using the input stream as data
• In decompression, the string table ends up looking exactly like the table built up during compression. The output string is identical to the input string of the compression algorithm.
Input codes: / W E D 256 E 260 261 257 B 260 T
Input (New_code)   Old_code   Output (String)   New table entry
------
/                  /          /
W                  /          W                 256 = /W
E                  W          E                 257 = WE
D                  E          D                 258 = ED
256                D          /W                259 = D/
E                  256        E                 260 = /WE
260                E          /WE               261 = E/
261                260        E/                262 = /WEE
257                261        WE                263 = E/W
B                  257        B                 264 = WEB
260                B          /WE               265 = B/
T                  260        T                 266 = /WET
------
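A matching sketch of decompression, again with plain integer codes and an unbounded table (both simplifications). The branch for a code that is not yet in the table handles the one case where the compressor uses an entry it has only just created; it does not occur in the example above but is needed in general.

```python
def lzw_decompress(codes: list) -> str:
    """Rebuild the original string from a list of LZW codes."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry about to be created.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        # Same rule as in compression: previous string + first char of new one.
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)


codes = [ord("/"), ord("W"), ord("E"), ord("D"), 256, ord("E"),
         260, 261, 257, ord("B"), 260, ord("T")]
print(lzw_decompress(codes))
```
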
• CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing.
GIF87a Image 61.4KB; GIF89a Image 61.4KB
GIF animation
• GIF89a animation makes it possible to set how long each cell (technically called an "animation frame") is shown, in units of 1/100 of a second. The viewer keeps count of the delay stored with each frame and flips to the next image when the time comes.
• GIF89a was designed on the principle of rendering images to a logical screen. Each image can optionally have its own palette, and the format allows a delay and waiting for user input to be specified.
GIF Transparency
• In a transparent GIF the computer is told to home in on one color. A particular red / green / blue shade already found in the image is chosen and blanked out (the color is dropped from the palette that makes up the image), so that whatever is behind it shows through. A transparent GIF is limited in that only one color of the 256-shade palette can be made transparent.
• The process is similar to the chroma key used in television. A computer is told to home in on a specific color, say green. Chroma key screens are usually green because it is the color least likely to be found in human skin tones. That chroma is then erased and replaced by another image.
Interlaced, non-interlaced GIF
• A non-interlaced image is filled in from the top to the bottom, one line after another. Interlacing is the concept of filling in only some of the lines on a first pass (for example every other line), then going back to the top and filling in the lines that were skipped. GIF does this in four passes.
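As an illustration, the order in which rows are stored in an interlaced GIF can be listed with a short sketch. It uses the four-pass scheme of GIF89a (every 8th row from row 0, every 8th row from row 4, every 4th row from row 2, every 2nd row from row 1); the function name is illustrative.

```python
def gif_interlaced_row_order(height: int) -> list:
    """Row indices in the order an interlaced GIF stores them."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]  # (first row, step) per pass
    order = []
    for start, step in passes:
        order.extend(range(start, height, step))
    return order


print(gif_interlaced_row_order(8))
```
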
• The effect on a computer monitor is that the graphic appears blurry at first and then sharpens up as the other lines fill in. That allows the viewer to at least get an idea of what is coming up rather than waiting for the entire image, line by line.
PNG image format
• PNG (Portable Network Graphics) is a lossless compression format created to improve on and replace the GIF format. PNG is a single-image format. A companion format called MNG has been defined for animation. PNG files use the file extension "PNG" or "png".
• The motivation for creating the PNG format came in early 1995, after Unisys announced that it would enforce its patent on the LZW compression algorithm used in the GIF format. A replacement was desirable anyway because of GIF's limitation to 256 colors, at a time when computers were capable of displaying far more.
• PNG was designed for transferring images on the Internet, and is the optimal choice for exporting images with repeating gradients for web usage. It is not suited to professional graphics.
• PNG is also useful for saving temporary photographs that require successive editing. When the photograph is ready to be distributed, it can then be saved as a JPEG, and this limits the information loss to just one generation.
• A PNG file can be 10 times the size of the corresponding JPEG.
• PNG employs the RGB color space. PNG did not originally support EXIF (Exchangeable Image File) image data from sources such as digital cameras (camera settings like shutter speed, focal length, exposure compensation, flash used… and scene information, date and time…), which makes it problematic for use amongst amateur and especially professional photographers.
• PNG is supported by the platform-independent libpng reference library, with functions for handling PNG images (http://www.libpng.org/pub/png/libpng.html)
Deflate compression
• PNG employs the DEFLATE lossless compression algorithm, which uses a combination of the LZ77 algorithm and Huffman coding (it is also used in the PKZIP archiving tool and is specified in RFC 1951).
• A DEFLATE stream consists of a series of blocks. Each block uses a single mode of compression and is preceded by a 3-bit header:
– 1 bit: last-block-in-stream marker:
  • 1: this is the last block in the stream
  • 0: there are more blocks to process after this one
– 2 bits: encoding method used for this block:
  • 00: a stored/raw/literal section follows, between 0 and 65535 bytes in length
  • 01: a static Huffman compressed block, using a pre-agreed Huffman tree
  • 10: a compressed block complete with the Huffman table supplied
• Compression is achieved through two steps:
– LZ77 algorithm: matching and replacement of duplicate strings with pointers. If a string of bytes is repeated, a back reference to the previous location of the same string is inserted. This reference is expressed as (distance, length). References can be made across any number of blocks, as long as the distance stays within the sliding window.
– Huffman coding: replacing symbols with new, weighted symbols based on frequency of use
• The LZ77 compression finds sequences of characters that are repeated. It uses a sliding window of 32K (it records what the last 32768 characters were).
• When the same sequence of characters is encountered again, the sequence is replaced by a distance (how far back in the window) and a length (the number of identical characters), which is equivalent to the statement: "each of the next length characters is equal to the character exactly distance characters behind the current point in the uncompressed stream".
• Example:
Let's take the sequence: Homehomehomehomehome
Consider the first five characters Homeh and the next 4 characters omeh.
– There is an exact match of these 4 characters omeh with the characters before, 4 positions behind the current point. We can output special characters to the stream that represent a number for length and a number for distance. We can encode: Homeh [D=4, L=4]
– Looking also at the characters that follow each of the two strings declared to be equal, we see that further characters are the same: the match continues to the end of the sequence, overlapping the characters being produced. For this 20-character sequence we can increase compression as: Homeh [D=4, L=15]
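How a decoder expands a (distance, length) reference, including the overlapping case, can be sketched as below. The token format (a literal string or a `(distance, length)` tuple) is made up for this illustration and is not the DEFLATE bit format. For the 20-character string Homehomehomehomehome, the five literals Homeh leave 15 characters, so the reference is written here as (4, 15).

```python
def lz77_expand(tokens: list) -> str:
    """Expand literals and (distance, length) back references."""
    out = []
    for token in tokens:
        if isinstance(token, str):
            out.extend(token)                # literal characters
            continue
        distance, length = token
        if distance < 1 or distance > len(out):
            raise ValueError("distance points outside the decoded data")
        for _ in range(length):
            # Copy one character at a time, so a reference may overlap
            # the characters it is itself producing.
            out.append(out[-distance])
    return "".join(out)


print(lz77_expand(["Homeh", (4, 4)]))
print(lz77_expand(["Homeh", (4, 15)]))
```
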
• PNG supports indexed palette-based images (palettes of 24-bit RGB colors), greyscale images and RGB images (one or more channels).
• The number of channels depends on whether the image is greyscale or color and whether it has an alpha channel. PNG allows the following combinations of channels:
– indexed (one channel containing indexes into a palette of colors)
– greyscale
– greyscale and alpha (alpha indicates the level of transparency for each pixel)
– red, green and blue (RGB / truecolor)
– red, green, blue and alpha
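The total bits per pixel follow from the number of channels and the bit depth per channel. A small sketch, where the dictionary layout and the function name are illustrative and the color type numbers and allowed depths are those defined by the PNG format:

```python
# color type -> (name, number of channels, allowed bit depths per channel)
PNG_COLOR_TYPES = {
    0: ("greyscale", 1, (1, 2, 4, 8, 16)),
    2: ("truecolor", 3, (8, 16)),
    3: ("indexed", 1, (1, 2, 4, 8)),
    4: ("greyscale & alpha", 2, (8, 16)),
    6: ("truecolor & alpha", 4, (8, 16)),
}


def png_bits_per_pixel(color_type: int, bit_depth: int) -> int:
    """Total bits per pixel, or ValueError if the combination is not allowed."""
    name, channels, depths = PNG_COLOR_TYPES[color_type]
    if bit_depth not in depths:
        raise ValueError(f"{name} does not allow bit depth {bit_depth}")
    return channels * bit_depth


print(png_bits_per_pixel(6, 8))  # RGBA with 8 bits per channel
```
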
Bits per pixel for each color type and bit depth per channel (cell values are total bits per pixel):
Type                                       1     2     4     8     16
------
indexed (color type 3)                     1     2     4     8     No
greyscale (color type 0)                   1     2     4     8     16
greyscale & alpha (color type 4)           No    No    No    16    32
truecolor (RGB - color type 2)             No    No    No    24    48
truecolor & alpha (RGBA - color type 6)    No    No    No    32    64
------
TIFF image format
• TIFF (Tagged Image File Format) is a standard by Aldus Corporation, generally used with lossless compression, that stores color images with 24 bits per pixel.
• TIFF can compress images up to a certain point while still preserving image quality. Compared with PNG, a TIFF file is much larger for an equivalent image.
• TIFF is a format that incorporates an extremely wide range of options. It is useful as a generic format for interchange between professional image editing applications, but many applications, including web browsers, can read only a subset of TIFF types. So the same image can display in different colors depending on the TIFF interpreter:
– The most common general-purpose, lossless compression algorithm used with TIFF is LZW, which compresses less well than PNG's algorithm.
– There is a TIFF variant that uses the same compression algorithm as PNG, but it is not supported by many proprietary programs.
– TIFF also offers special-purpose lossless compression algorithms like CCITT Group IV, which can compress bilevel images (e.g., faxes or black-and-white text) better than PNG's compression algorithm.
• Images in different color spaces can be represented in TIFF, namely: RGB, CMYK, Lab
Lossy compression
The JPEG Standard
• JPEG is a compression algorithm developed by the Joint Photographic Experts Group. The Web took to the format straightaway because it makes it possible to store images in fewer bytes, and to transfer them in fewer bytes (http://www.jpeg.org/)
• JPEG does not define which color space is to be used for images. JPEG provides lossy compression, i.e. it trades off detail in the displayed picture for a smaller storage file.
• The compression algorithm is not as well suited to line drawings and other textual or iconic graphics, and thus the PNG and GIF formats are preferred for these types of images.
JPEG - JFIF
• JPEG specifies both the codec, which defines how an image is transformed into a stream of bytes, and the file format used to contain that stream. JFIF (JPEG File Interchange Format) specifies how a file is created to store a JPEG stream on a computer. JFIF defines the color model to be used as the YCbCr (or YUV) color space, which is derived directly from the RGB space.
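The derivation of YCbCr from RGB can be sketched as a per-pixel conversion. The coefficients below are the ones commonly quoted for JFIF (full-range 8-bit values, with Cb and Cr offset by 128); the function name and the rounding and clamping choices are assumptions of this sketch.

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple:
    """Convert one 8-bit RGB pixel to 8-bit (Y, Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    # Round and clamp each component to the 0..255 range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(255, 255, 255))  # white: full luminance, neutral chroma
print(rgb_to_ycbcr(255, 0, 0))      # pure red
```
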
• JPEG/JFIF is the format most used for storing and transmitting photographs on the WWW. For this application, it is preferred to formats such as GIF, which has a limit of 256 distinct colors that is insufficient for color photographs, and PNG, which produces much larger image files for this type of image.
Mode of operation
• JPEG uses transform coding. It is based on the following observations:
– Observation 1: A large majority of useful image content changes relatively slowly across the image, i.e., it is unusual for intensity values to vary up and down several times in a small area, for example within an 8 x 8 pixel image block. In terms of frequencies, low spatial frequency components contain more information than high frequency components (which correspond to less useful details and noise).
– Observation 2: Psychophysical experiments suggest that humans are much less sensitive to the loss of higher spatial frequency components than to the loss of lower frequency components.
Some rules of use
• Since JPEG is lossy, bytes are saved at the expense of detail.
• You can see where the compression algorithm found groups of pixels that all appeared to be close in color and just grouped them all together as one:
JPEG image compression example
The difference between the 1% and 50% compression is not too bad, but the drop in bytes is impressive.
• A useful property of JPEG is that the loss can be varied by adjusting compression parameters. This means that the image maker can trade off file size against output image quality. For good-quality, full-color source images, the default quality setting is Q 75 (which corresponds to 25% in tools that express the setting as a compression percentage). A good rule is to save your JPEGs at 50% or medium compression.
JPEG major steps
– DCT (Discrete Cosine Transform)
– Quantization
– Zigzag Scan
– Entropy coding
  • Coefficient encoding: – DPCM on DC component – RLE on AC components
  • Huffman Coding
JPEG chain
DCT (Discrete Cosine Transform)
• Apply DCT to 8x8 image blocks
• If the image size is not a multiple of 8, copies of the last row or column are added until a multiple of 8 is reached. This way the tone and luminance of the 8x8 block do not change too much after the DCT, as they would if the added elements were set to 0.
• The DCT shifts the block from the spatial domain to the frequency domain:
f(i,j) is the value in position (i,j) of the 8x8 block of the original image. F(u,v) is the DCT coefficient in position (u,v) of the 8x8 matrix that holds the transformed coefficients.
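Discrete Cosine Transform (DCT):
$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}$$
Inverse Discrete Cosine Transform (IDCT):
$$f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}$$
where $C(x) = 1/\sqrt{2}$ for $x = 0$ and $C(x) = 1$ otherwise.
Why DCT not FFT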
• DCT is like FFT, but can approximate linear signals well with few coefficients.
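The claim about linear signals can be checked numerically with a direct, unoptimized sketch of the 8-point DCT. The scaling used here (a factor 1/2 and C(0) = 1/sqrt(2) per dimension) is the usual JPEG convention and is an assumption of this example; function names are illustrative. The 2D transform is computed as 1D transforms along rows and then along columns.

```python
import math

N = 8


def scale(k: int) -> float:
    """Normalisation factor C(k) of the 8-point DCT."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_1d(samples: list) -> list:
    """8-point DCT of a list of 8 samples."""
    return [
        0.5 * scale(u) * sum(
            samples[i] * math.cos((2 * i + 1) * u * math.pi / 16)
            for i in range(N)
        )
        for u in range(N)
    ]


def dct_2d(block: list) -> list:
    """8x8 DCT computed as 1D DCTs along rows, then along columns."""
    rows = [dct_1d(row) for row in block]                              # G[i][v]
    cols = [dct_1d([rows[i][v] for i in range(N)]) for v in range(N)]  # F[u][v] stored as cols[v][u]
    return [[cols[v][u] for v in range(N)] for u in range(N)]


# A linear ramp: almost all of the energy ends up in the first two coefficients.
ramp_coeffs = dct_1d(list(range(N)))
energy = sum(c * c for c in ramp_coeffs)
print([round(c, 3) for c in ramp_coeffs])
print((ramp_coeffs[0] ** 2 + ramp_coeffs[1] ** 2) / energy)

# A flat block: only the DC coefficient F[0][0] is non-zero.
flat = dct_2d([[100] * N for _ in range(N)])
```
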
The 64 (8 x 8) DCT basis functions, starting from F[0,0]
DCT factoring
• To compute the DCT, factoring reduces the problem to a series of 1D DCTs, first along the rows and then along the columns:
f[i,j] → G[i,v] → F[u,v]
Quantization
• To reduce the number of bits per sample, quantization is used:
F'[u, v] = round (F[u, v] / q[u, v])
where q[u, v] is the quantization matrix and F[u, v] is the DCT coefficient matrix.
Example: F[u, v] = 45 = 101101 (6 bits), q[u, v] = 4: truncate to 4 bits, 1011 = 11
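A minimal sketch of quantization and of the reconstruction done by the decoder, applied element by element to small matrices. The function names are illustrative. Note that Python's `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule of a particular encoder.

```python
def quantize(F: list, q: list) -> list:
    """F'[u][v] = round(F[u][v] / q[u][v]) for every coefficient."""
    return [
        [round(F[u][v] / q[u][v]) for v in range(len(F[u]))]
        for u in range(len(F))
    ]


def dequantize(Fq: list, q: list) -> list:
    """Decoder side: multiply back by the quantization step."""
    return [
        [Fq[u][v] * q[u][v] for v in range(len(Fq[u]))]
        for u in range(len(Fq))
    ]


F = [[45, -45], [3, 100]]
q = [[4, 4], [16, 11]]
Fq = quantize(F, q)
print(Fq)                  # quantized values
print(dequantize(Fq, q))   # reconstructed values differ from F: this is the loss
```
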
• Quantization error is the main source of loss in JPEG's lossy compression. Different quantization matrices can be used.
• Uniform quantization: each F[u,v] is divided by the same constant N.
• Non-uniform quantization accounts for the fact that the human eye is most sensitive to low frequencies (upper left corner), less sensitive to high frequencies (lower right corner), more sensitive to luminance and less to color.
Luminance Quantization Table q(u, v)
------
16   11   10   16   24   40   51   61
12   12   14   19   26   58   60   55
14   13   16   24   40   57   69   56
14   17   22   29   51   87   80   62
18   22   37   56   68  109  103   77
24   35   55   64   81  104  113   92
49   64   78   87  103  121  120  101
72   92   95   98  112  100  103   99
------
Chrominance Quantization Table q(u, v)
------
17   18   24   47   99   99   99   99
18   21   26   66   99   99   99   99
24   26   56   99   99   99   99   99
47   66   99   99   99   99   99   99
99   99   99   99   99   99   99   99
99   99   99   99   99   99   99   99
99   99   99   99   99   99   99   99
99   99   99   99   99   99   99   99
------
Non-uniform Quantization
Zig-zag scan
• As a result of quantization we have an 8x8 matrix with many elements equal to 0. The non-zero coefficients are mostly in the upper left corner.
• This suggests transforming the 8x8 matrix into a 64-element vector using a zig-zag order. The zig-zag scan groups the low frequency coefficients at the top of the vector: it maps the 8x8 matrix to a 1x64 vector.
Coefficient encoding
• Coefficients are encoded differently:
– Differential Pulse Code Modulation (DPCM) on the DC component
  • The DC component is large and varied, but often close to the value of the DC component of the previous block. For this reason JPEG encodes the difference (DC diff) between the DC components of the previous and the current 8 x 8 block.
– Run Length Encoding (RLE) on the AC components
  • Many of the AC coefficients are equal to 0. For this reason they are encoded using RLE, which counts the number of consecutive 0s:
    – a minimum of 0 and a maximum of 16 consecutive 0s is allowed (in the latter case the special symbol (15,0) is used);
    – the end of block is encoded with (0,0).
• For the DC component (DC diff) we build the pair: (SIZE) (AMPLITUDE)
SIZE is the number of bits needed to represent the DC difference value; AMPLITUDE is the value of the DC difference.
------
SIZE   Value
------
1      -1, 1
2      -3, -2, 2, 3
3      -7..-4, 4..7
4      -15..-8, 8..15
.      .
10     -1023..-512, 512..1023
------
Example: if the DC value is 4, 3 bits are needed.
• For the AC components the following representation is used: (RUNLENGTH, SIZE) (AMPLITUDE)
where RUNLENGTH is the number of consecutive 0s (from 0 to 15) preceding the coefficient, SIZE has the same meaning as for the DC coefficient, and AMPLITUDE is the actual value of the nonzero AC coefficient.
Figure: DC coefficient and DC of the preceding block; symbol-1 (including the special symbols) carries the size, symbol-2 the amplitude.
Huffman encoding
• After we have encoded every block, we have a sequence of symbols:
‒ Symbol 1: (SIZE) or (RUNLENGTH, SIZE)
‒ Symbol 2: (AMPLITUDE)
These symbols are further encoded using Huffman encoding to reduce the amount of data. The most frequent symbols are encoded with shorter codes, less frequent ones with longer codes.
• Huffman tables provide codes for every symbol of the sequence. Huffman tables can be custom (sent in the header) or default. Different Huffman tables are used for DC and AC symbol 1.
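The symbol sequence that is handed to the Huffman coder can be sketched for a single quantized block: zig-zag scan, DPCM on the DC coefficient, and run-length coding of the AC coefficients. This is an illustration only. Symbols are returned as Python tuples instead of bits, the amplitude is kept as a plain integer, and the function names are made up for this example.

```python
def zigzag_order(n: int = 8) -> list:
    """(row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked upwards, odd diagonals downwards.
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order


def size_category(value: int) -> int:
    """SIZE: number of bits needed to represent |value| (0 for value 0)."""
    return abs(value).bit_length()


def block_symbols(block: list, previous_dc: int) -> list:
    """Return [(symbol-1, symbol-2), ...] for one quantized 8x8 block."""
    zz = [block[r][c] for r, c in zigzag_order(len(block))]
    diff = zz[0] - previous_dc                   # DPCM on the DC coefficient
    symbols = [(size_category(diff), diff)]
    ac = zz[1:]
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    run = 0
    for v in ac[:last + 1]:
        if v == 0:
            run += 1
            if run == 16:                        # 16 zeros: special symbol (15,0)
                symbols.append(((15, 0), None))
                run = 0
        else:
            symbols.append(((run, size_category(v)), v))
            run = 0
    if last < len(ac) - 1:                       # trailing zeros: end of block
        symbols.append(((0, 0), None))
    return symbols


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 5, -3
print(block_symbols(block, previous_dc=10))
```
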
Symbol1 and Symbol2 encoding
Byte stuffing
Progressive JPEG
• Progressive JPEG serves the same purpose as interlaced GIF89a: a rough version of the whole image is shown first and is then refined, rather than the image appearing line by line.
• The DCT progressive mode of operation consists of the same DCT and quantization steps that are used by the DCT sequential mode. The key difference is that each image component is encoded in multiple scans rather than in a single scan. After each block of DCT coefficients is quantized, it is stored in a coefficient buffer memory. The buffered coefficients are then partially encoded in each of multiple scans.
• The first scan(s) encode a rough but recognizable version of the image, which can be transmitted quickly in comparison to the total transmission time and is refined by succeeding scans until reaching the level of picture quality established by the quantization tables.
• Each scan of a progressive JPEG takes about the same computation to display as a whole JPEG. It makes sense only if a decoder is available that is faster than the communication link.
Progressive spectral selection - Progressive successive approximation
• There are two complementary methods by which a block of quantized DCT coefficients may be partially encoded.
• Progressive Spectral Selection algorithm:
– The DCT coefficients are grouped into several spectral bands: only a specified band of coefficients from the zig-zag sequence need be encoded within a given scan.
– Low-frequency DCT coefficient bands are sent first, and then higher-frequency ones.
This procedure is called spectral selection, because each band typically contains coefficients which occupy a lower or higher part of the spatial-frequency spectrum.
• Progressive Successive Approximation algorithm:
– The coefficients within the band need not be encoded to their full quantized accuracy in a given scan: DCT coefficients are sent first with lower precision, and then refined in later scans (first the N most significant bits, then the less significant ones in successive scans).
• The quantized DCT coefficient information can be viewed as a rectangle whose axes are the DCT coefficients (in zig-zag order) and their amplitudes.
‒ Spectral Selection slices the information in one dimension.
‒ Successive Approximation slices the information in the other.
The JPEG Bitstream
• A JPEG image consists of a sequence of segments, each beginning with a marker. Each marker begins with a 0xFF byte followed by a byte indicating what kind of marker it is. Some markers consist of just those two bytes; others are followed by two bytes indicating the length of the marker-specific payload data that follows.
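Walking the segments of a JPEG byte stream can be sketched as follows. This is a simplified reader, not a full parser: it knows only a handful of marker names, assumes the two-byte length is big-endian and counts itself, and stops at the start-of-scan marker because entropy-coded data follows it. The test data is a synthetic byte string built for the example, not a real image.

```python
MARKER_NAMES = {
    0xD8: "SOI", 0xD9: "EOI", 0xE0: "APP0", 0xDB: "DQT",
    0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS",
}
# Markers that consist of just the two marker bytes (no length field).
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))


def list_segments(data: bytes) -> list:
    """Return [(marker name, length field)] up to SOS or EOI."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        name = MARKER_NAMES.get(code, f"0x{code:02X}")
        pos += 2
        if code in STANDALONE:
            segments.append((name, 0))
            if code == 0xD9:          # end of image
                break
            continue
        length = int.from_bytes(data[pos:pos + 2], "big")
        segments.append((name, length))
        if code == 0xDA:              # entropy-coded data follows: stop here
            break
        pos += length                 # the length includes its own two bytes
    return segments


sample = (
    b"\xff\xd8"                                                       # start of image
    + b"\xff\xe0" + (16).to_bytes(2, "big") + b"JFIF\x00" + bytes(9)  # APP0
    + b"\xff\xdb" + (67).to_bytes(2, "big") + bytes(65)               # quantization table
    + b"\xff\xd9"                                                     # end of image
)
print(list_segments(sample))
```
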
• The bitstream of a JPEG/JFIF image file gives the following segments:
Type of segment                 Length of segment
------
Start of image                  0
APP0                            16
Quantisation table              67
Start of frame: baseline DCT    11
Huffman table                   28
Huffman table                   63
Start of scan                   49363
End of image                    0
------
– The APP0 segment marks this file as a JFIF/JPEG. JFIF defines some extra fields (like the image resolution and an optional thumbnail).
– The JPEG file contains one frame (image) and that frame can include one or more scans. Scans can give progressively more detail (with Progressive JPEG) or can be different axes of the colour space.
– The quantisation and Huffman tables are needed for decoding the image. There can be many Huffman and quantisation tables, and different scans might use different tables.
JPEG image format example
Annotations of the example file, in segment order:
– Application-specific segment. For example, an EXIF JPEG file uses the marker to store EXIF metadata.
– Luminance quantization table and chrominance quantization table.
– Start of frame: baseline DCT-based JPEG; specifies the width, height, number of components, and component subsampling (e.g., 4:2:0).
– Huffman tables for DC symbols (1 and 2).
– Huffman tables for AC symbols (1 and 2).
– Start of scan: in baseline DCT JPEG images there is generally a single scan, while progressive DCT JPEG images usually contain multiple scans. The marker specifies which slice of data the scan will contain, and is immediately followed by entropy-coded data.
Sequential Lossless JPEG
• Lossless JPEG does not use the DCT. It uses a predictive scheme based on the nearest neighbors, and entropy coding is applied to the prediction error. The prediction block replaces the DCT encoding and quantization blocks of the baseline sequential JPEG encoder.
• The simplest predictive coding scheme is DPCM, which encodes the difference between the actual value of each pixel and its predicted value. The predicted value is provided by an appropriate function of the values of the pixels above and to the left (A, B and C in the figure). The predictor formula can be as simple as $\hat{X} = A$ or the average of the neighbors, or as complex as $\hat{X} = B + (A - C)/2$.
Neighborhood used to predict the pixel X:
C B
A X
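A small sketch of DPCM with the three predictors named above may make the scheme concrete. It assumes the layout shown (A to the left of X, B above, C above-left) and, as a simplification that is not taken from the standard, treats neighbors outside the image as 0; the function and mode names are hypothetical.

```python
def predict(a, b, c, mode):
    """Predicted value of X from its neighbors A (left), B (above), C (above-left)."""
    if mode == "A":
        return a
    if mode == "average":
        return (a + b) // 2
    if mode == "B+(A-C)/2":
        return b + (a - c) // 2
    raise ValueError("unknown predictor: %s" % mode)


def neighbors(image, y, x):
    """Neighbor values, with 0 outside the image (simplifying assumption)."""
    a = image[y][x - 1] if x > 0 else 0
    b = image[y - 1][x] if y > 0 else 0
    c = image[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a, b, c


def dpcm_encode(image, mode):
    """Prediction errors, i.e. the values that would be entropy coded."""
    return [[image[y][x] - predict(*neighbors(image, y, x), mode)
             for x in range(len(image[0]))] for y in range(len(image))]


def dpcm_decode(errors, mode):
    """Rebuild the image exactly by repeating the same predictions."""
    image = [[0] * len(errors[0]) for _ in errors]
    for y in range(len(errors)):
        for x in range(len(errors[0])):
            image[y][x] = errors[y][x] + predict(*neighbors(image, y, x), mode)
    return image


image = [[10, 12, 13], [11, 13, 15]]
errors = dpcm_encode(image, "A")   # first row becomes [10, 2, 1]
```

Because the decoder forms the same prediction from pixels it has already reconstructed, no information is lost; the gain comes from the errors being small and therefore cheap to entropy code.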
• The sequence of prediction errors is encoded with the Huffman code.
Hierarchical JPEG
• Hierarchical coding represents images at different pixel resolutions, i.e. several versions of the same image can be made available, e.g. 512x512, 1024x1024 and 2048x2048.
• Hierarchical JPEG mode creates a set of compressed images, beginning with small images and continuing with images of increasing resolution. In this way it provides a pyramidal encoding of an image at multiple resolutions, each differing in resolution from its adjacent encoding by a factor of two in either the horizontal or vertical dimension or both.
• Hierarchical encoding is useful in applications in which a very high resolution image must be accessed by a lower-resolution display, or for real-time applications. An example is an image scanned and compressed at high resolution for a very high-quality printer, where the image must also be displayed on a low-resolution PC video screen.
• In hierarchical JPEG, the lower-resolution image is scaled up to the next resolution and used as a prediction for the following stage. The encoding procedure can be summarized as follows:
– 1 Filter and down-sample the original image by the desired number of multiples of 2 in each dimension (DSF).
– 2 Encode this reduced-size image using one of the sequential DCT or progressive DCT encoders (FDCT).
– 3 Decode this reduced-size image (IDCT).
– 4 Interpolate and up-sample it by 2 horizontally and/or vertically, using the identical interpolation filter which the receiver must use (USF).
– 5 Use this up-sampled image as a prediction of the original at this resolution, and encode the difference (error) image using one of the sequential DCT or progressive DCT encoders (FDCT).
– 6 Decode (IDCT) the difference image and add it to the up-sampled version available.
– 7 Repeat steps 4), 5) and 6) until the full resolution of the image has been encoded.
Figure: image downsampling and image encoding stages, labeled with the step numbers above.
• The individual frames of a hierarchical JPEG can additionally be coded in progressive JPEG mode.
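The seven steps can be followed on a one-dimensional signal. This is a simplified sketch under stated assumptions: pair averaging stands in for the down-sampling filter, sample repetition for the interpolation filter, and rounding to a multiple of `step` stands in for the whole DCT encode/decode round trip. None of these stand-ins is the actual JPEG machinery, and all names are hypothetical.

```python
def downsample(signal):
    """Step 1 stand-in: average neighboring pairs (halves the resolution)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Step 4 stand-in: repeat every sample twice (doubles the resolution)."""
    return [v for v in signal for _ in range(2)]


def lossy_codec(signal, step):
    """Stand-in for encode followed by decode: round to multiples of step."""
    return [step * round(v / step) for v in signal]


def hierarchical_encode(signal, levels, step):
    """Return the coded layers (coarse image, then difference images) and the
    full-resolution reconstruction a decoder would obtain from them."""
    pyramid = [list(signal)]
    for _ in range(levels):                       # step 1, repeated
        pyramid.append(downsample(pyramid[-1]))
    decoded = lossy_codec(pyramid[-1], step)      # steps 2 and 3
    layers = [decoded]
    for target in reversed(pyramid[:-1]):         # step 7: loop over resolutions
        prediction = upsample(decoded)            # step 4
        difference = [t - p for t, p in zip(target, prediction)]
        coded_difference = lossy_codec(difference, step)   # step 5
        layers.append(coded_difference)
        decoded = [p + d for p, d in zip(prediction, coded_difference)]  # step 6
    return layers, decoded


signal = [10, 12, 20, 22, 30, 28, 16, 14]
layers, decoded = hierarchical_encode(signal, levels=2, step=2)
```

Since each difference image is computed against the decoded (not the original) lower layer, coding errors do not accumulate from one resolution to the next.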
Handling Color Images
• JPEG encoding of color images can be done according to alternative approaches:
- Consider the R, G and B components and JPEG encode each channel separately:
  Color image → R component → apply JPEG compression
  Color image → G component → apply JPEG compression
  Color image → B component → apply JPEG compression
- Transform RGB to another representation in order to separate luminance from chroma, and apply JPEG to each channel separately (downsampling the chroma channels):
  Color image → transform to YCrCb or YUV → Y component (luminance) → apply JPEG compression
  Color image → transform to YCrCb or YUV → Cr, Cb components (chrominance) → subsample by 2 in H & V → apply JPEG compression
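The second approach can be sketched as follows. The conversion coefficients are the ones commonly used for JFIF files (full-range 8-bit YCbCr), which is an assumption here because the slides do not give the matrix; the subsampling simply averages each 2x2 block, and the function names are illustrative.

```python
def clamp(value):
    """Keep a rounded value inside the 8-bit range."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) with the JFIF-style matrix."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp(y), clamp(cb), clamp(cr)


def subsample_2x2(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks."""
    return [[(plane[y][x] + plane[y][x + 1] +
              plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]
```

After this step the Y plane keeps its full resolution, while each chroma plane holds a quarter of the samples, before any JPEG compression is applied.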
GIF, PNG, TIFF and JPEG standards compared
• GIF – PNG
GIF and PNG formats use lossless compression to achieve medium levels of compression on images.
– GIF works best on images with few colors or images in which one color is dominant. It acts best on identical, adjacent pixels (or rows of identical, adjacent pixels). There is no loss during the compression process as long as the original image has no more than 256 colors. The key to using GIFs effectively is to use the smallest possible number of colors.
– PNG is the other choice, for images with a higher number of colors.
– Also notable about GIFs is the fact that images can be transparent, animated, or both.
• JPEG
The JPEG format uses lossy compression to achieve high levels of compression on images with many colors.
– The compression works best with continuous-tone images, that is, images where the change between adjacent pixels is small but not zero.
– JPEG images generally store 16 or 24 bits of color and thus are best for 16- or 24-bit images.
– Due to the noticeable loss of quality during the compression process, JPEGs should be used only where image file size is important, primarily on web pages.
• Example: image with 25 plain colors
– The GIF and the PNG have the same quality as the original, but the GIF takes almost 3 times as many kilobytes as the PNG.
– The JPEG with the lowest compression factor is already blurring a lot. The JPEG with 25% compression is smaller, but still 4 times the size of the PNG, and has poor quality, blurring the 25 original colors.
• Example: photo image
– JPEG is a standard for digital photographs because it can save information on more than 16 million different hues. Its "lossy" compression has little effect on photographs.
– JPEG will produce a smaller file than PNG for photographic images. Using PNG instead of JPEG for such images would result in a large increase in file size (often 5–10 times) with negligible gain in quality.
– The JPEG with 50% compression shows quite a lot of JPEG artifacts in the air around the right tower.
– The GIF and the 25% compressed JPEG are both reasonable.
• Example: text image
– PNG is a better choice than JPEG for storing images that contain text, line art, or other content with sharp transitions that do not transform well into the frequency domain.
– Where an image contains both sharp transitions and photographic parts, a choice must be made between the large but sharp PNG and a small JPEG with artifacts around sharp transitions.
• If your image is...
– Black and white: use GIF
– Text on a plain background: use GIF
– Transparent or animated: use GIF
– Computer-drawn line or cartoon art: use GIF
– Small images, like icons and buttons: use GIF
– Predominantly (>80%) one color: consider GIF
The TIFF format is rarely seen on the web because it offers poor compression.
• Image size: 146 x 184, 75 colors. File size (FC = compression factor with respect to the raw 24 bpp size):
– 8 bpp: 26864 bytes
– 24 bpp: 80592 bytes
– PPM (24 bpp): 80674 bytes
– GIF (8 bpp): 3585 bytes (FC=22.48)
– JPG (24 bpp): 4805 bytes (FC=16.77)
– PPM.ZIP (24 bpp): 3698 bytes (FC=21.79)
PPM portable pixel map (most redundant and inefficient image format)
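The FC values quoted for the 146 x 184 sample are consistent with reading FC as a compression factor, i.e. the raw 24 bpp size divided by the compressed file size. That reading is an assumption, since the slides do not define FC; the short check below reproduces the three figures under it.

```python
def compression_factor(width, height, bits_per_pixel, compressed_bytes):
    """Raw image size in bytes divided by the compressed file size."""
    raw_bytes = width * height * bits_per_pixel // 8
    return raw_bytes / compressed_bytes


# File sizes in bytes, taken from the 146 x 184 sample above.
sizes = {"GIF": 3585, "JPG": 4805, "PPM.ZIP": 3698}
factors = {name: round(compression_factor(146, 184, 24, size), 2)
           for name, size in sizes.items()}
```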
– 16/24-bit scanned photograph: use JPEG
– Computer-drawn continuous-tone art: use JPEG
– Scanned images and photographs: use JPEG
– (Large) images with a lot of detail: use JPEG
• Image size: 244 x 334, 31322 colors. File size:
– 8 bpp: 81496 bytes
– 24 bpp: 244488 bytes
– PPM (24 bpp): 244620 bytes
– GIF (8 bpp): 49613 bytes (FC=4.92)
– JPG (24 bpp): 16352 bytes (FC=14.95)
– PPM.ZIP (24 bpp): 190977 bytes (FC=1.28)
JPEG 2000 Standard
• JPEG 2000 is an ISO standard (ISO/IEC 15444-1:2000) for images, created by the Joint Photographic Experts Group committee in the year 2000. It allows multispectral imaging. JPEG2000 supports both lossy and lossless compression. JPEG2000 image files have the extension .jp2 or .j2k.
• JPEG2000 is not as widely used as JPEG, on the web or elsewhere: many important software programs for image manipulation and processing, and many web browsers, do not support JPEG2000, and there is no accepted way to embed EXIF data.
– Photoshop: Adobe plugin
– Paint Shop Pro: proprietary plugin
– Browsers: plugin available (Luratech)
– Linux: yes, through JasPer (MIT licence)
– MS Windows: only read, proprietary
– Mac OS X: yes, through QuickTime
– …
JPEG2000 color components
• JPEG2000 requires that images are transformed from RGB into:
– YUV (fully reversible)
– YCbCr (irreversible because of floating point implementation and round-offs)
• YCbCr is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. YCbCr is not an absolute color space; it is a way of encoding RGB information. The actual color displayed depends on the actual RGB colorants used to display the signal. Typically the terms YCbCr and YUV are used interchangeably.
YCbCr color space
Figure: original image and its Y, Cb and Cr components.
Gamma correction
• Y'CbCr can be used instead of YCbCr, where Y' stands for luma, i.e. gamma-corrected luminance. Gamma correction is a nonlinear operation used to code and decode luminance in video or still image systems. It is required to compensate for properties of human vision.
• Gamma correction is made according to $V_{out} = a\,V_{in}^{\gamma}$, where a is a constant (typically equal to 1) and the input and output values are non-negative real values in the range 0–1. A gamma value γ < 1 is called an encoding gamma; a gamma value γ > 1 is called a decoding gamma. In most computer systems, images are encoded with a gamma of about 0.45 and decoded with a gamma of 2.2.
• Data in still image files (e.g. JPEG) are explicitly encoded (that is, they carry gamma-encoded values, not linear intensities).
Figure: gamma encoding/decoding (dashed/continuous) and linear transfer function (dotted), plotted as Vout against Vin.
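The power law is easy to try numerically. The sketch below assumes a = 1 and values already normalized to the range 0–1; the function name is hypothetical.

```python
def gamma_transfer(v_in, gamma, a=1.0):
    """Apply V_out = a * V_in ** gamma to a value in the range 0..1."""
    return a * v_in ** gamma


linear = 0.5
encoded = gamma_transfer(linear, 0.45)   # encoding gamma < 1 brightens mid-tones
decoded = gamma_transfer(encoded, 2.2)   # decoding gamma > 1 darkens them again
```

Because 0.45 x 2.2 = 0.99 rather than exactly 1, the round trip with these two typical values returns a number very close to, but not exactly, the original 0.5.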
Reference Grid and Image Area
• JPEG2000 supports multiple image components, from 1 to 255 or more. Each component can have a different size and bit depth (1 to 32 bits), and a different alignment relative to the others.
• JPEG2000 uses the concept of the canvas to align multiple image components in a single coordinate system. By default, images are placed on the canvas so that the image and canvas origins align.
• Only those samples which fall within the image area actually belong to the image component. Thus the samples of component i are mapped into the image component domain as a rectangle whose upper left-hand sample has coordinates (XOsiz, YOsiz) and whose lower right-hand sample has coordinates (Xsiz-1, Ysiz-1).
Figure: the canvas as a rectangular grid of size Xsiz x Ysiz. The image is aligned to the bottom-right corner of the grid (Xsiz, Ysiz), with the vertical and horizontal sampling periods of each component.
Size of the image area: (Xsiz-XOsiz) x (Ysiz-YOsiz)
Image Tiling
• Each image component is further broken down into tiles. Tile sizes are variable, and can differ from component to component. Tiles are similar to blocks in JPEG, but more flexible.
Figure: image tiles relative to the reference grid.
• The reference grid is partitioned into a regular-sized rectangular array of tiles. The tile size and tiling offset are defined on the reference grid by the dimensional pairs (XTsiz, YTsiz) and (XTOsiz, YTOsiz), respectively.
• By default, images will have one tile that has the same dimensions and offset on the canvas as the image. If the tile dimensions are smaller than the image dimensions and the tile offsets are different from the image offsets, some tiles may extend beyond the borders of the image.
Image subsampling and cropping
• Each image component can also be subsampled. The subsampling factors indicate the scaling factor between the component dimensions and the image dimensions. For example, an image component that has subsampling factors of 2 by 2 in a 1280 by 720 image will have dimensions 640 by 360. The samples of component i are at integer multiples of (XRsiz(i), YRsiz(i)) on the canvas.
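The canvas geometry reduces to a few integer formulas. The sketch below uses the parameter names that appear in the text (Xsiz, XOsiz, XTsiz, XTOsiz) plus XRsiz/YRsiz for the per-component subsampling factors; the ceiling-based expressions follow the usual JPEG 2000 definitions, and the function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def image_area(Xsiz, Ysiz, XOsiz, YOsiz):
    """Width and height of the image area on the reference grid."""
    return Xsiz - XOsiz, Ysiz - YOsiz


def tile_grid(Xsiz, Ysiz, XTsiz, YTsiz, XTOsiz, YTOsiz):
    """Number of tiles needed horizontally and vertically."""
    return ceil_div(Xsiz - XTOsiz, XTsiz), ceil_div(Ysiz - YTOsiz, YTsiz)


def component_size(Xsiz, Ysiz, XOsiz, YOsiz, XRsiz, YRsiz):
    """Width and height in samples of a component subsampled by (XRsiz, YRsiz)."""
    return (ceil_div(Xsiz, XRsiz) - ceil_div(XOsiz, XRsiz),
            ceil_div(Ysiz, YRsiz) - ceil_div(YOsiz, YRsiz))
```

With a 1280 x 720 image at the canvas origin, `component_size(1280, 720, 0, 0, 2, 2)` gives the 640 x 360 component of the example, and a hypothetical 512 x 512 tile size gives a 3 x 2 tile grid whose last row and column extend beyond the image border.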
Figure: original image area (full resolution, 1280 x 720, 16:9), new image area (subsampled, 640 x 360) and cropped area (396 x 297, 4:3).
• Tiling with subsampling and cropping can be used to obtain new images from original images. For example, sub-sampling a 1280 x 720 (16:9) image at a 2:1 ratio on each side and then cropping it to a 4:3 aspect ratio gives a new image of size 396 x 297.
Wavelet compression
• JPEG2000 uses the Discrete Wavelet Transform in the lossy stage of image compression. The wavelet transform breaks down the image into multiresolution representations.
• For JPEG2000, the wavelet transform is applied to the image on a tile-by-tile basis.
Discrete Wavelet Transform
• In numerical analysis and functional analysis, the Discrete Wavelet Transform refers to wavelet transforms for which the wavelets are discretely sampled.
• The first Discrete Wavelet Transform was invented by the Hungarian mathematician Alfréd Haar:
– For an input represented by a list of $2^n$ numbers, the Haar wavelet transform may be considered to simply pair up input values, storing the difference and passing the sum.
– This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in $2^n - 1$ differences and one final sum.
• The Discrete Wavelet Transform has nice properties:
– it can be performed in O(n) operations;
– it captures not only some notion of the frequency content of the input, by examining it at different scales, but also the temporal content, i.e. the times at which these frequencies occur.
Combined, these two properties make the wavelet transform an alternative to the conventional Fast Fourier Transform.
1D Discrete Wavelet Transform
• The Haar wavelet can be described as a step func on:
$$F(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \qquad \text{2x2 matrix } H = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$
– Given a sequence $(a_0, a_1, a_2, a_3, \ldots, a_{2n+1})$ of even length, this can be transformed into a sequence of two-component vectors $(a_0, a_1), \ldots, (a_{2n}, a_{2n+1})$.
– Multiplying each vector by the matrix H gives the result $(s_0, d_0), \ldots, (s_n, d_n)$ of one stage of the Haar wavelet transform (sum, difference).
– The two sequences s and d are separated, and the process is repeated on the sequence $(s_0, s_1, \ldots, s_n)$.
• In the one-dimensional Discrete Wavelet Transform case, this is equivalent to breaking the signal into subbands by passing it through a low-pass filter and a high-pass filter. The outputs give:
– the approximation coefficients (from the low-pass filter)
– the detail coefficients (from the high-pass filter)
Figure: approximation coefficients, e.g. (-100 + 200 + 600 + 200 - 200)/8 = 700/8 = 87.5, and detail coefficients.
• Taking only the sum at each level implies that half the frequencies of the signal have been removed at each level. So half of the samples can be discarded according to Nyquist's rule. The filter outputs are therefore downsampled by 2.
Nyquist theorem: a signal must be sampled at a rate of at least twice its highest frequency in order to extract all the information from the bandwidth.
• Due to the decomposition process, the length of the input signal must be a multiple of $2^n$, where n is the number of levels.
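The sum-and-difference recursion can be written directly. This sketch uses the unnormalized matrix H given above, so sums and differences are not scaled, and for brevity it requires the input length to be a power of two so that the recursion can run down to a single sum; the function names are hypothetical.

```python
def haar_step(values):
    """One stage: multiply each pair (a, b) by H, giving (a + b, a - b)."""
    sums = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    differences = [values[i] - values[i + 1] for i in range(0, len(values), 2)]
    return sums, differences


def haar_transform(values):
    """Full decomposition: the final sum plus the differences of every level."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    details = []
    current = list(values)
    while len(current) > 1:
        current, differences = haar_step(current)
        details.append(differences)       # finest level first
    return current[0], details


total, details = haar_transform([1, 3, 5, 7])
# total is the final sum; details holds 2 + 1 = 3 differences in all
```

Each level touches half as many values as the previous one, which is where the O(n) operation count comes from.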
2D Discrete Wavelet Transform
• In the two-dimensional case, as in the 1D case, the signal is broken into subbands by passing it through a low pass filter and a high pass filter, and both subbands are downsampled by 2.
• According to the Mallat method, the decomposition can be applied separably in the vertical and horizontal directions, in that order. This leads to a two-dimensional signal being broken down into four subbands, known as:
– LL (Low frequency horizontal, Low frequency vertical)
– HL (High frequency horizontal, Low frequency vertical)
– LH (Low frequency horizontal, High frequency vertical)
– HH (High frequency horizontal, High frequency vertical)
Figure: two-step 2D wavelet Mallat decomposition. The input image is first split into L and H bands (step 1), which are then split into the LL, HL, LH and HH subbands (step 2).
2D Wavelet Decomposition
• Conceptually, for a particular image, these subbands translate to:
– a low-frequency approximation of the original (LL)
– primarily vertical edges (HL)
– primarily horizontal edges (LH)
– diagonal edges (HH)
• Decomposition is applied iteratively. Since downsampling is performed at each pass, at each iteration the image halves its size in the vertical and horizontal directions.
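One level of the separable decomposition can be sketched with the Haar pair as the filter bank: the average of a pair acts as the low-pass output and the half-difference as the high-pass output. The filter choice, the order of the two passes and the function names are assumptions made for illustration; JPEG2000 itself uses longer wavelet filters.

```python
def split_pairs(sequence):
    """Low-pass (average) and high-pass (half difference), downsampled by 2."""
    low = [(sequence[i] + sequence[i + 1]) / 2 for i in range(0, len(sequence), 2)]
    high = [(sequence[i] - sequence[i + 1]) / 2 for i in range(0, len(sequence), 2)]
    return low, high


def filter_columns(block):
    """Apply split_pairs down every column and return (low, high) blocks."""
    results = [split_pairs(list(column)) for column in zip(*block)]
    low = [list(row) for row in zip(*(r[0] for r in results))]
    high = [list(row) for row in zip(*(r[1] for r in results))]
    return low, high


def dwt2_level(image):
    """One decomposition level: returns the LL, HL, LH and HH subbands."""
    row_low, row_high = [], []
    for row in image:                      # horizontal pass
        low, high = split_pairs(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)       # low horizontal: low / high vertical
    hl, hh = filter_columns(row_high)      # high horizontal: low / high vertical
    return ll, hl, lh, hh


# A 2 x 2 image containing a vertical edge (dark left column, bright right column).
ll, hl, lh, hh = dwt2_level([[0, 10], [0, 10]])
```

In this tiny example only LL and HL are non-zero, matching the statement that HL responds primarily to vertical edges. Applying `dwt2_level` again to `ll` gives the next, half-sized level.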
Wavelet decomposition quantization
• After the Discrete Wavelet decomposition has been performed, a quantization matrix is applied to the decomposed image. Uniform quantization is performed within each subband, with different levels of quantization for each subband.
• JPEG2000 does not specify the use of particular quantization matrices. A way of calculating a quantization matrix for a particular filter is suggested. Generally, the higher-frequency subbands are quantized more coarsely, since humans have lower contrast sensitivity to high-frequency information.
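Per-subband uniform quantization can be sketched as below. The step sizes and coefficient values are hypothetical and chosen only to show coarser steps for the higher-frequency subbands; truncation toward zero (which gives a dead zone around zero) and midpoint reconstruction are common choices assumed here, not something the slides specify.

```python
# Hypothetical step sizes: coarser for higher-frequency subbands.
STEPS = {"LL": 1.0, "HL": 4.0, "LH": 4.0, "HH": 8.0}


def quantize(coefficients, step):
    """Uniform quantizer; int() truncates toward zero."""
    return [int(c / step) for c in coefficients]


def dequantize(indices, step):
    """Reconstruct each non-zero index at the midpoint of its interval."""
    return [0.0 if q == 0 else (abs(q) + 0.5) * step * (1 if q > 0 else -1)
            for q in indices]


# Hypothetical coefficients of one decomposition level.
subbands = {"LL": [52.0, 49.5], "HL": [-7.5, 3.2],
            "LH": [4.0, -2.0], "HH": [5.0, -9.0]}
quantized = {name: quantize(values, STEPS[name])
             for name, values in subbands.items()}
```

With these numbers most of the HL, LH and HH values collapse to 0 or ±1, while LL keeps nearly its full precision.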
An example
Region of Interest (ROI) coding
• JPEG 2000 offers increased flexibility that can make it more applicable than JPEG, and has other interesting features such as ROI coding and progressive transmission.
• In ROI coding, portions of an image are stored at higher quality than the rest of the image. This is useful because we may care more about detail in some portions of an image than in others.
An example of Region of Interest JPEG2000 coding
• ROI coding is easy to do when the image is stored compressed in a multiresolution format.
• We start with a ROI mask, which marks out the region of the image we wish to store at higher quality. The wavelet coefficients corresponding to the transform of the mask have to be stored at higher quality (quantized less coarsely). We can find them by applying the transform to the mask and looking at which coefficients fall in the mask.
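The idea of pushing the mask through the transform can be sketched in one dimension. It assumes a Haar-like transform in which each coefficient depends on exactly two samples of the previous level, so a coefficient belongs to the ROI if either of its inputs does; longer filters would enlarge the region. The two step sizes and all names are hypothetical.

```python
def roi_masks_per_level(mask, levels):
    """Propagate a sample-domain ROI mask to each decomposition level."""
    masks = []
    current = [bool(m) for m in mask]
    for _ in range(levels):
        # A coefficient is in the ROI if either of its two source samples is.
        current = [current[i] or current[i + 1] for i in range(0, len(current), 2)]
        masks.append(current)
    return masks


def quantize_with_roi(coefficients, roi, coarse_step, fine_step):
    """Quantize ROI coefficients with the finer step, the rest with the coarser."""
    return [int(c / (fine_step if inside else coarse_step))
            for c, inside in zip(coefficients, roi)]


masks = roi_masks_per_level([0, 0, 1, 1, 0, 0, 0, 0], levels=2)
indices = quantize_with_roi([9.0, 9.0, -9.0, 9.0], masks[0],
                            coarse_step=8.0, fine_step=2.0)
```

Here the second level-1 coefficient falls in the mask and keeps four quantization levels of precision, while its neighbors are reduced to a single level.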
Figure: coefficients in the highlighted regions are quantized less coarsely in every subband.
JPEG2000 vs JPEG
• Compared with JPEG, it saves space in the order of 20%-30%. Therefore it appears to be particularly suited to large images.
• However, this is not the primary motivation for its use. More importantly, JPEG 2000 employs a multiresolution representation and can accommodate a wide range of bit rates (both very low and very high compression rates are supported). With JPEG, if we want to transmit at a low bit rate, we must first reduce the resolution and then encode.
• The wavelet representations of an image generally perform better than DCT representations for lossy image compression, as there is less perceptual loss for the same bit rate even when performed on the same block size.
• Multi-resolution wavelet representations give better performance because:
– Multi-resolution representations are more similar to how the human visual system represents images. Consequently, better quantization matrices can be chosen, to more closely match and exploit the characteristics of the human visual system.
– The wavelet basis functions are smoother than the DCT basis functions (which tend to be blocky), and are more natural and pleasing to the eye.
A comparative example
Figure: JPEG at 0.125 bpp (enlarged) and JPEG2000 at 0.125 bpp. Source: C. Christopoulos, A. Skodras, T. Ebrahimi, JPEG2000 (online tutorial).