• Part I - Images
  • COLOR IMAGES
    – Image Formats
  • IMAGE STANDARDS
    – JPEG
    – JPEG 2000

Color image formats

• There are three basic graphic formats a computer can use to store and display an image:
  – raster
  – vector
  – metafile

Raster image format

• A raster format breaks the image into a grid of colored dots called pixels. The number of bits (ones and zeros) used for each pixel determines the color depth of the image.
  – Raster image formats can store 8, 16, 24, or 32 bits per pixel. At the two highest depths, each pixel can take any of 16,777,216 (2^24) different colors.
  – The main Internet formats (Bitmap, GIF, PNG, JPEG) are all raster formats.

Enlargement of a raster image. The quality is not improved.

Vector image format

• An image stored in a vector format is defined by lines, curves, circles, etc., which are stored as mathematical formulas. Unlike raster images, only the formulas are stored, which makes the file very small. The image does not lose sharpness when you zoom, since the lines are re-rendered.

• Vector formats fall into open formats and proprietary formats made for specific programs:
  – SVG (Scalable Vector Graphics), an open standard created and developed by the World Wide Web Consortium
  – AI (Adobe Illustrator)
  – CDR (CorelDRAW)
  – …

• A vector image gives very high quality, requires little storage space, and is easy to edit. You should always try to save your vector images in a vector format. Photos, scanned images, etc. cannot be saved in a vector format.

• Examples of vector images are drawings, diagrams and illustrations.


Enlargement of a vector image. The quality is still good.

Metafile format

• An image in metafile format is a combination of the two basic formats, vector and raster. Metafile formats are portable formats that can include both vector and raster information.

• Photos are stored in raster format. In some cases you might want to add descriptive numbers, text, and arrows to these images. Text and arrows should be saved as vector information, not raster, to preserve their quality, but the photo still needs to be in raster format. The result is stored in a metafile format.

• Examples of metafile formats:
  – WMF (Windows Metafile)
  – EMF and EMF+ (Windows Enhanced Metafile). EMF+ is a 32-bit format used by Windows after Windows XP. It stores a list of function calls that are issued to the Windows GDI to display an image on screen and for printer drivers. It is the native vector format for the Microsoft applications Word, PowerPoint, and Publisher.

Enlargement of a metafile image: the quality is good for the vector information.

Bitmap image format

• The Bitmap image format was invented by Microsoft as a device-independent bitmap (DIB) format. It can store 2D digital images of any width, height, and resolution, both monochrome and color. Images are typically uncompressed but can optionally be compressed.

• Bitmap images can have a pixel depth of 1, 4, 8, 16, 24, 32 bits. Bitmap images of 1, 4, 8 bits have a table for color conversion. Images with higher depths have the color directly encoded with the three RGB components.

• A bitmap in memory is loaded as a DIB structure. It includes:
  – Header (file size…)
  – Bitmap info (size, n. pixel depth..)
  – Color table
  – Pixel map, stored upside-down (bottom row first) and packed in rows, each rounded up to a multiple of 4 bytes
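The row padding rule can be made concrete with a small calculation. The following is a minimal Python sketch, assuming an uncompressed pixel map; the function names `bmp_row_stride` and `bmp_pixel_array_size` are illustrative and do not come from any library.

```python
def bmp_row_stride(width_px, bits_per_pixel):
    """Bytes occupied by one stored row, rounded up to a multiple of 4 bytes."""
    # (bits in the row + 31) // 32 gives the number of 32-bit words needed.
    return ((width_px * bits_per_pixel + 31) // 32) * 4


def bmp_pixel_array_size(width_px, height_px, bits_per_pixel):
    """Total bytes of an uncompressed pixel map (all rows include padding)."""
    return bmp_row_stride(width_px, bits_per_pixel) * abs(height_px)


# A 5-pixel-wide, 24-bit row holds 15 bytes of color data plus 1 padding byte.
stride = bmp_row_stride(5, 24)
```

With these assumptions, a 24-bit image that is 5 pixels wide wastes one byte per row, while widths that are multiples of 4 pixels need no padding at this depth.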

24-bit depth bitmap

• Image compression aims at reducing the number of bits used to represent raster image content. Compression can be either lossless or lossy. Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.

• Lossless compression algorithms usually exploit statistical redundancy to represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is possible because most real-world data has statistical redundancy. For example, in English text, the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q' will be followed by the letter 'z' is very small.

• Lossy compression instead assumes that some loss of fidelity is acceptable. For example, a person viewing a picture or television video scene might not notice if some of its finest details are removed or not represented perfectly.

• Which compression technique to use depends on the application. In general:
  – Text documents: lossless compression
  – Data for numerical analysis: lossless compression
  – Programs: lossless compression
  – Typographic images: lossless compression
  – Web images: lossy compression
  – Video: lossy compression
  – Audio: lossy compression

• Lossy compression generally achieves higher compression rates.

Lossless compression

Run-length encoding

• Run-length encoding is a fixed-length coding scheme. A sequence of equal symbols is encoded with a single copy of the symbol together with a number that specifies how many times the symbol appears consecutively. A special symbol is required to mark each run.
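The scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker `$` never occurs in the input, the input contains no digits (otherwise the count and the symbol could not be told apart), and the names `rle_encode` and `rle_decode` are hypothetical.

```python
import re
from itertools import groupby

MARKER = "$"  # assumed special symbol; must not appear in the input text


def rle_encode(text):
    """Replace each run of equal symbols with MARKER + count + symbol."""
    pieces = []
    for symbol, run in groupby(text):
        pieces.append(f"{MARKER}{len(list(run))}{symbol}")
    return "".join(pieces)


def rle_decode(encoded):
    """Expand every MARKER + count + symbol group back into a run."""
    pattern = re.compile(re.escape(MARKER) + r"(\d+)(.)", re.DOTALL)
    return "".join(symbol * int(count) for count, symbol in pattern.findall(encoded))


encoded = rle_encode("r" * 11 + "p" * 11 + "c" * 11)
```

Each run costs a fixed number of symbols regardless of its length, so the method pays off only when runs are long; a text without repetitions would grow instead of shrink.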

• For example, consider a text string with three sequences of 11 (eleven) characters “r” each, followed by three sequences of 11 (eleven) “p” each and three sequences of 11 (eleven) “c” each. With the special symbol $, the whole string is encoded as:

$11r $11r $11r $11p $11p $11p $11c $11c $11c

Total characters: 6+6+6 = 18
Total numbers: 3+3+3 = 9
Total dimension: 27 bytes = 216 bits

Prefix-free coding

• Fixed-length codes are always uniquely decipherable. However, they do not always give the best compression, so variable-length codes are preferred.

• Prefix-free coding is a coding scheme where no codeword is a prefix of another one. Every message encoded by a prefix-free code is uniquely decipherable. Since no codeword is a prefix of any other, we can always find the first codeword in a message, peel it off, and continue decoding.

• We are therefore interested in finding good (best compression) prefix-free codes.

Huffman coding

• Huffman coding refers to the use of a variable-length code for encoding a source symbol, where the variable-length code has been derived on the basis of the estimated probability of occurrence of the source symbol.

• Huffman coding uses a specific method for choosing the representation for each symbol:
  – a prefix code that expresses the most common source symbols with shorter strings of bits than are used for less common source symbols;
  – the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol (prefix-free code).

• It is the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size with the same symbol frequencies.

• The Huffman code for an alphabet (set of symbols) may be generated by constructing a binary tree with nodes containing the symbols to be encoded and their frequencies of occurrence. The tree may be constructed as follows:
  – Step 1. Select the two parentless nodes with the lowest frequencies.
  – Step 2. Create a new node which is the parent of the two lowest frequency nodes.
  – Step 3. Assign the new node a frequency equal to the sum of its children's frequencies.
  – Step 4. Repeat from Step 1 until there is only one parentless node left.

• Huffman code example:

Consider a set of five different symbols. The symbols' frequencies are: A 24, B 12, C 10, D 8, E 8. If not encoded, this results in a total of 186 bits (3 bits per codeword).

Huffman encoding:
  – Step 1. Combine D and E into DE with a frequency of 16
  – Step 2. Combine B and C into BC with a frequency of 22
  – Step 3. Combine BC and DE into BCDE with a frequency of 38
  – Step 4. Combine A with BCDE into ABCDE with a frequency of 62

[Figure: Huffman tree for A (24), B (12), C (10), D (8), E (8), with branches labelled 0 and 1]

• The code for each symbol may be obtained by tracing a path to the symbol from the root of the tree. A 1 is assigned for a branch in one direction and a 0 is assigned for a branch in the other direction.

• The running time of Huffman's method is fairly efficient: constructing the tree for n symbols takes O(n log n) operations.
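The tree construction can be sketched with a priority queue, which is what gives the O(n log n) behaviour. The Python below is a minimal illustration for the five-symbol example (A 24, B 12, C 10, D 8, E 8); the function name `huffman_code_lengths` is an assumption of this sketch, and for brevity it tracks only the depth of each symbol (its code length) in dictionaries instead of building explicit tree nodes, which costs some extra work per merge.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} by repeatedly merging the two rarest nodes."""
    tie_breaker = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tie_breaker), {symbol: 0}) for symbol, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, depths1 = heapq.heappop(heap)  # lowest frequency
        f2, _, depths2 = heapq.heappop(heap)  # second lowest frequency
        # Every symbol under the new parent moves one level further from the root.
        merged = {s: d + 1 for s, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie_breaker), merged))
    return heap[0][2]


freqs = {"A": 24, "B": 12, "C": 10, "D": 8, "E": 8}
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
```

For these frequencies the merges are forced (D+E, then B+C, then BC+DE, then A), so A gets a 1-bit code and the other symbols 3-bit codes, for a total of 138 bits against 186 bits with a fixed 3-bit code.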

• Building the code tree:

Symbol   Frequency   Code   Code length   Total length
------
A        24          0      1             24
B        12          100    3             36
C        10          101    3             30
D        8           110    3             24
E        8           111    3             24
------
Total: 186 bits with a 3-bit fixed-length code, 138 bits with Huffman encoding

Decoding Huffman encoded files

• In order to decode Huffman encoded files, the decoding algorithm must know what code was used to encode the data. A table containing symbols and their codes or the Huffman tree should be used.

• Decoding a file is a two step process: – the header data is read in and the Huffman code for each symbol is reconstructed. – the encoded data is read and decoded.

• A straightforward method for decoding symbols is to read the encoded file one bit at a time and traverse the Huffman tree according to each of the bits until a leaf containing a symbol is reached. When a bit causes a leaf of the tree to be reached, the symbol contained in that leaf is written to the decoded file, and traversal starts again from the root of the tree.

• Example

Input sequence: 0111100110111
Decoded sequence: A E B D E

[Figure: Huffman tree for A (24), B (12), C (10), D (8), E (8), with branches labelled 0 and 1]
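The bit-by-bit traversal can be sketched as follows, assuming the code table A=0, B=100, C=101, D=110, E=111 from the example. The nested-dictionary tree and the names `build_tree` and `huffman_decode` are choices made for this illustration only.

```python
CODES = {"A": "0", "B": "100", "C": "101", "D": "110", "E": "111"}


def build_tree(codes):
    """Rebuild the decoding tree: inner nodes are dicts keyed by '0'/'1', leaves are symbols."""
    root = {}
    for symbol, bits in codes.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol
    return root


def huffman_decode(bitstring, codes):
    """Walk the tree one bit at a time; emit a symbol and restart at each leaf."""
    root = build_tree(codes)
    node = root
    decoded = []
    for bit in bitstring:
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            decoded.append(node)
            node = root
    return decoded


decoded = huffman_decode("0111100110111", CODES)
```

Because the code is prefix-free, no separators are needed between codewords: the 13 input bits split unambiguously into 0 | 111 | 100 | 110 | 111.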

• Although Huffman's original algorithm is optimal for a stream of unrelated symbols with a known input probability distribution, it is not optimal when the probability mass functions are unknown, not identically distributed, …

• Other methods, such as the LZW (Lempel-Ziv-Welch) coding used in GIF images, often have better compression capability: these methods can combine an arbitrary number of symbols for more efficient coding, and generally adapt to the actual input statistics. The latter is useful when input probabilities are not precisely known or vary significantly within the stream.

GIF image format

• GIF, which stands for "Graphics Interchange Format," was first standardized in 1987 as a lossless compression standard by CompuServe. It is an 8-bit-per-pixel image format using a palette of up to 256 distinct colors from the 24-bit RGB color space.

• The format had widespread usage on the early WWW due to its wide support and portability.

GIF main characteristics

Suited

• A GIF image employs lossless LZW (Lempel-Ziv-Welch) compression so that the file size of an image may be reduced without degrading the visual quality, provided the image can be rendered with only 256 colours. Its lossless compression preserves very sharp edges.

Unsuited

• The 256-color limitation makes the GIF format unsuitable for color photographs and other images with continuous color. It is instead well suited for simpler images such as graphics or logos with solid areas of color.

• GIF has generally been used for sharp-edged line art (often vector-based logos) with a very limited number of colors. A large portion of web page line art, including logos and design elements, is GIF. The GIF format is also still utilized both for short, small animations and for low-resolution film clips on web pages.

• Monochrome photographs (with continuous grey tones) can be represented well as GIFs but have large file sizes due to the inappropriate compression technique.

• With the exception of animated GIFs, however, PNG has increasingly replaced GIF on web pages.

LZW compression

• LZW compression replaces strings of symbols (i.e. sequences of 2 or more symbols) with single codes. It does not do any analysis of the incoming string pattern. Instead, it adds every new string of symbols it finds to a table of strings. Compression occurs when a single code is output instead of a string of symbols.

• The LZW code can be of any arbitrary length, but it must have more bits in it than a single symbol. For 8-bit symbols the first 256 codes are assigned to the standard character set. The remaining codes are assigned to strings that are found as the algorithm proceeds.

• For example, with 12-bit codes, codes 0-255 refer to individual bytes, and codes 256-4095 refer to substrings.

Example

• The input character string is a list of words separated by the '/' character: /WED/WE/WEE/WEB/WET

‒ A table with 256 individual characters is assumed to be present. Each character is assigned a unique code (0-255).
‒ Starting from the first pair of characters, it is checked whether the string /W is in the table. Since it is not, the string /W is added to the table and the code for / is output. The string is assigned code 256 (codes 0-255 are already used for the 256 characters).
‒ After E has been read in, the second string, WE, is added to the table, and the code for W is output.

‒ This continues until, in the second word, the characters / and W are read, matching string number 256. In this case, the string /WE (three characters) is added to the table and the code 256 is output.

‒ The process continues until the string is exhausted and all the codes have been output.
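The steps above can be sketched in Python as follows. This is a minimal illustration, not the GIF variant: it assumes an unbounded table (no fixed code width, no clear or end codes), and the function name `lzw_compress` is hypothetical.

```python
def lzw_compress(text):
    """Return the list of LZW codes for text, starting from a 256-entry character table."""
    table = {chr(i): i for i in range(256)}  # codes 0-255: single characters
    next_code = 256
    current = ""   # longest string matched so far
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            output.append(table[current])  # emit code for the known prefix
            table[candidate] = next_code   # remember the new string
            next_code += 1
            current = ch
    if current:
        output.append(table[current])      # flush the last match at end of input
    return output


codes = lzw_compress("/WED/WE/WEE/WEB/WET")
```

For this input the result is the code sequence / W E D 256 E 260 261 257 B 260 T, with the single characters appearing as their character codes (for example 47 for '/').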

Character input   Existing?   String checked   Existing?   Code output   New code   New string
------
/                 Y           -                -           -             -          -
W                 Y           /W               N           /             256        /W
E                 Y           WE               N           W             257        WE
D                 Y           ED               N           E             258        ED
/                 Y           D/               N           D             259        D/
W                 Y           /W               Y
E                 Y           /WE              N           256           260        /WE
/                 Y           E/               N           E             261        E/
W                 Y           /W               Y
E                 Y           /WE              Y
E                 Y           /WEE             N           260           262        /WEE
/                 Y           E/               Y
W                 Y           E/W              N           261           263        E/W
E                 Y           WE               Y
B                 Y           WEB              N           257           264        WEB
/                 Y           B/               N           B             265        B/
W                 Y           /W               Y
E                 Y           /WE              Y
T                 Y           /WET             N           260           266        /WET
EOF                                                        T

LZW decompression

• The decompression algorithm takes the stream of codes output from the compression algorithm and uses them to recreate the input stream.

• The LZW algorithm does not need to pass the string table to the decompression code. The table can be built exactly as it was during compression, using the input stream as data.

• In decompression, the string table ends up looking exactly like the table built up during compression. The output string is identical to the input string from the compression algorithm.

Input codes: / W E D 256 E 260 261 257 B 260 T

Input (New_code)   Old_code   Output (String)   New table entry
------
/                  /          /
W                  /          W                 256 = /W
E                  W          E                 257 = WE
D                  E          D                 258 = ED
256                D          /W                259 = D/
E                  256        E                 260 = /WE
260                E          /WE               261 = E/
261                260        E/                262 = /WEE
257                261        WE                263 = E/W
B                  257        B                 264 = WEB
260                B          /WE               265 = B/
T                  260        T                 266 = /WET
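The table rebuilding can be sketched in Python as follows, again assuming an unbounded table and plain integer codes rather than the packed bit stream used in GIF files. The function name `lzw_decompress` is hypothetical. The branch for a code that is not yet in the table handles the one case where the compressor uses a string in the very step after creating it.

```python
def lzw_decompress(codes):
    """Rebuild the original text from LZW codes, recreating the string table on the fly."""
    codes = list(codes)
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}  # codes 0-255: single characters
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code defined by the step being decoded: previous string + its first character.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        table[next_code] = previous + entry[0]  # same entry the compressor added
        next_code += 1
        previous = entry
    return "".join(output)


codes = [47, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]  # / W E D 256 E 260 261 257 B 260 T
text = lzw_decompress(codes)
```

The decoder adds each table entry one step later than the encoder did, which is why no table has to be transmitted alongside the codes.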

• CompuServe updated the GIF format in 1989 to include animation, transparency, and interlacing.

[Figures: GIF87a image, 61.4 KB; GIF89a image, 61.4 KB]

GIF animation

• GIF89a animation provides the ability to set the display time of each cell (technically called an "animation frame") in units of 1/100 of a second. An internal clock embedded into the GIF keeps count and flips the image when the time comes.

• GIF89a was designed on the principle of rendering images to a logical screen. Each image can optionally have its own palette, and the format allows a delay and a wait for user input to be specified.

GIF Transparency

• In a transparent GIF the computer is told to hone in on one color. A particular red / green / blue shade already found in the image is chosen and blanked out (the color is dropped from the palette that makes up the image), so that whatever is behind it shows through. A transparent GIF is limited in that only one color of the 256-shade palette can be made transparent.

• The process is similar to chroma key used in television. A computer is told to hone in on a specific color, let's say green. Chroma key screens are usually green because it is the color least likely to be found in human skin tones. That chroma is then erased and replaced by another image.

Interlaced, non-interlaced GIF

• A non-interlaced image is filled in from the top to the bottom, one line after another. Interlacing is the concept of filling in every other line of data, then going back to the top and doing it all again, filling in the lines you skipped.

• The effect on a computer monitor is that the graphic appears blurry at first and then sharpens up as the other lines fill in. That allows the viewer to at least get an idea of what is coming up rather than waiting for the entire image, line by line.

PNG image format

• PNG (Portable Network Graphics) is a lossless compression format created to improve on and replace the GIF format. PNG is a single-image format. A companion format called MNG has been defined for animation. PNG files use the file extension "PNG" or "png".

• The motivation for creating the PNG format came in early 1995, after Unisys announced that it would enforce its patent on the LZW compression algorithm used in the GIF format. A replacement was desirable anyway because of the GIF limitation to 256 colors at a time when computers were capable of displaying far more.

• PNG was designed for transferring images on the Internet, and is the optimal choice for exporting images with repeating gradients for web usage. It is not suited to professional graphics.

• PNG is also useful for saving temporary photographs that require successive editing. When the photograph is ready to be distributed, it can then be saved as a JPEG, and this limits the information loss to just one generation.

• A PNG file can be 10 times the size of the equivalent JPEG.

• PNG employs the RGB color space. PNG does not support EXIF (Exchangeable Image File) image data (including camera settings like shutter speed, focal length, exposure compensation, flash used… and scene information, date and time…) from sources such as digital cameras, which makes it problematic for use amongst amateur and especially professional photographers.

• PNG is supported by the platform-independent libpng reference library, with functions for handling PNG images (http://www.libpng.org/pub/png/libpng.html)

Deflate compression

• PNG employs the DEFLATE lossless compression algorithm, which uses a combination of the LZ77 algorithm and Huffman coding (also used in the PKZIP archiving tool and specified in RFC 1951).

• A DEFLATE stream consists of a series of blocks. Each block uses a single mode of compression and is preceded by a 3-bit header:
  – 1 bit: last-block-in-stream marker:
    • 1: this is the last block in the stream
    • 0: there are more blocks to process after this one
  – 2 bits: encoding method used for this block:
    • 00: a stored/raw/literal section follows, between 0 and 65535 bytes in length
    • 01: a static Huffman compressed block, using a pre-agreed Huffman tree
    • 10: a compressed block complete with the Huffman table supplied

• Compression is achieved through two steps:
  – LZ77 algorithm: matching and replacement of duplicate strings with pointers. If a repeated string exists, a back reference to the previous location of the same string is inserted. This reference is expressed as (distance, length). References can be made across any number of blocks.
  – Huffman coding: replacing symbols with new, weighted symbols based on frequency of use.

• The LZ77 compression finds sequences of characters that are repeated. It uses a sliding window of 32K (it records what the last 32768 characters were).

• When the same sequence of characters is encountered, the sequence is replaced by a distance (how far back in the window) and a length (the number of identical characters), which is equivalent to the statement: "each of the next length characters is equal to the characters exactly distance characters behind the current point in the uncompressed stream".

• Example:

Let's take the sequence: Homehomehomehomehome

Consider the characters Homeh and the next 4 characters omeh.

– There is an exact match of these 4 characters omeh with the characters 4 positions behind the current point. We can output special characters to the stream that represent a number for length and a number for distance. We can encode: Homeh [D=4, L=4]

– Considering also the characters that follow each of the two strings declared to be equal, we see that further characters are the same. The match in fact continues to the end of the 20-character sequence, so we can increase compression as: Homeh [D=4, L=15]
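How a decoder expands such (distance, length) references can be sketched in Python. The token representation (plain strings for literals, tuples for references) and the name `lz77_expand` are assumptions of this illustration; real DEFLATE streams encode both kinds of token with Huffman codes. Copying one character at a time is what allows the length to exceed the distance, so a reference can overlap the text it is producing.

```python
def lz77_expand(tokens):
    """Expand a list of literal strings and (distance, length) back references."""
    out = []
    for token in tokens:
        if isinstance(token, str):
            out.extend(token)                 # literal characters
            continue
        distance, length = token
        if distance <= 0 or distance > len(out):
            raise ValueError("reference points outside the decoded data")
        for _ in range(length):
            out.append(out[-distance])        # may read characters written by this same copy
    return "".join(out)


short = lz77_expand(["Homeh", (4, 4)])
full = lz77_expand(["Homeh", (4, 15)])
```

Here the second call rebuilds the 20-character string Homehomehomehomehome from 5 literal characters and one reference, even though only 4 characters lie between the reference and its target when the copy starts.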

• PNG supports indexed palette-based images (palettes of 24-bit RGB colors), grayscale images, and RGB images (one or more channels).

• The number of channels depends on whether the image is greyscale or color and whether it has an alpha channel. PNG allows the following combinations of channels:
  • indexed (channel containing indexes into a palette of colors)
  • greyscale
  • greyscale and alpha (0/1 indicates the level of transparency for each pixel)
  • red, green and blue (rgb / truecolor)
  • red, green, blue and alpha
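The total number of bits per pixel follows from the channel count and the bit depth per channel. The short Python sketch below illustrates the arithmetic; the dictionary keys and the function name `bits_per_pixel` are illustrative, and the allowed depths listed are those permitted for each PNG color type.

```python
# Channels per pixel for each PNG image type.
CHANNELS = {
    "indexed": 1,
    "greyscale": 1,
    "greyscale+alpha": 2,
    "truecolor": 3,
    "truecolor+alpha": 4,
}

# Bit depths per channel that each image type permits.
ALLOWED_DEPTHS = {
    "indexed": (1, 2, 4, 8),
    "greyscale": (1, 2, 4, 8, 16),
    "greyscale+alpha": (8, 16),
    "truecolor": (8, 16),
    "truecolor+alpha": (8, 16),
}


def bits_per_pixel(kind, depth):
    """Total bits per pixel = channels x bit depth per channel."""
    if depth not in ALLOWED_DEPTHS[kind]:
        raise ValueError(f"{kind} images cannot use {depth} bits per channel")
    return CHANNELS[kind] * depth


rgba16 = bits_per_pixel("truecolor+alpha", 16)
```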

Type                                      Bit depth per channel
                                          1     2     4     8     16
------
indexed (color type 3)                    1     2     4     8     No
greyscale (color type 0)                  1     2     4     8     16
greyscale & alpha (color type 4)          No    No    No    16    32
truecolor (RGB - color type 2)            No    No    No    24    48
truecolor & alpha (RGBA - color type 6)   No    No    No    32    64

Cell values are total bits per pixel.

TIFF image format

• The TIFF (Tagged Image File Format) format is a lossless standard by Aldus Corporation that stores color images with 24 bits per pixel.

• TIFF can compress images up to a certain point while still preserving image quality. Compared with PNG, TIFF is much larger in file size for an equivalent image.

• TIFF is a format that incorporates an extremely wide range of options. It is useful as a generic format for interchange between professional image editing applications, but many applications, including web browsers, can read only a subset of TIFF types. So the same image can display in different colors depending on the TIFF interpreter:
  – The most common general-purpose, lossless compression algorithm used with TIFF is LZW, which is inferior to PNG's compression.
  – There is a TIFF variant that uses the same compression algorithm as PNG, but it is not supported by many proprietary programs.
  – TIFF also offers special-purpose lossless compression algorithms like CCITT Group IV, which can compress bilevel images (e.g., faxes or black-and-white text) better than PNG's compression algorithm.

• Different color images can be represented in TIFF, namely: RGB, CMYK, Lab

Lossy compression

The JPEG Standard

• JPEG is a compression algorithm developed by the Joint Photographic Experts Group. The Web took to the format straightaway because it allows images to be stored and transferred in fewer bytes (http://www..org/)

• JPEG does not define which color space is to be used for images. JPEG provides lossy compression, i.e. it trades off detail in the displayed picture for a smaller storage file.

• The compression algorithm is not as well suited for line drawings and other textual or iconic graphics, and thus the PNG and GIF formats are preferred for these types of images.

JPEG - JFIF

• JPEG specifies both the codec, defining how an image is transformed into a stream of bytes, and the file format used to contain that stream. JFIF (JPEG File Interchange Format) specifies how a file is created to store a JPEG stream on a computer. JFIF defines the color model to be used as the YCbCr or YUV color spaces, which are directly derived from the RGB space.

• JPEG/JFIF is the format most used for storing and transmitting photographs on the WWW. For this application, it is preferred to formats such as GIF, which has a limit of 256 distinct colors that is insufficient for colour photographs, and PNG, which produces much larger image files for this type of image.

Mode of operation

• JPEG uses transform coding. It is based on the following observations:

– Observation 1: A large majority of useful image contents change relatively slowly across images, i.e., it is unusual for intensity values to alter up and down several times in a small area, e.g. within an 8 x 8 pixel image block. In terms of frequencies, low spatial frequency components contain more information than high frequency components (which correspond to less useful details and noise).

– Observation 2: Psychophysical experiments suggest that humans are less likely to notice the loss of higher spatial frequency components than the loss of lower frequency components.

Some rules of use

• Since JPEG is lossy, bytes are saved at the expense of detail.

• You can see where the compression algorithm found groups of pixels that all appeared to be close in color and just grouped them all together as one.

[Figures: JPEG image compression examples]

The difference between the 1% and 50% compression is not too bad, but the drop in bytes is impressive.

• A useful property of JPEG is that the loss can be varied by adjusting compression parameters. This means that the image maker can trade off file size against output image quality. For good-quality, full-color source images, the default quality setting is Q 75. A good rule is to save your images at 50% or medium compression.

JPEG major steps

– DCT (Discrete Cosine Transformation)
– Quantization
– Zigzag Scan
– Entropy coding
  • Coefficient encoding
    – DPCM on DC component
    – RLE on AC components
  • Huffman coding

[Figure: JPEG chain]

DCT (Discrete Cosine Transformation)

• Apply DCT to 8x8 image blocks

• If the image size is not a multiple of 8, then copies of the last row or column are added until a multiple of 8 is reached. In this way the tone and luminance of the 8x8 block do not change much after the DCT, as they would if the added elements were set to 0.

• DCT shifts the data from the spatial domain to the frequency domain:

f(i,j) is the value that is present in the (i,j) position of the 8x8 block of the original image. F(u,v) is the DCT coefficient of the 8x8 block in the (u,v) position of the 8x8 matrix that encodes the transformed coefficients.

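Discrete Cosine Transform (DCT):

$$F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{i=0}^{7} \sum_{j=0}^{7} f(i,j)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}$$

Inverse Discrete Cosine Transform (IDCT):

$$f(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, F(u,v)\, \cos\frac{(2i+1)u\pi}{16}\, \cos\frac{(2j+1)v\pi}{16}$$

where $C(\xi) = 1/\sqrt{2}$ for $\xi = 0$ and $C(\xi) = 1$ otherwise.

Why DCT not FFT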

• DCT is like FFT, but can approximate linear signals well with few coefficients.

[Figure: the 64 (8 x 8) DCT basis functions, with F[0,0] marked]

DCT factoring

• To compute DCT, factoring reduces the problem to a series of 1D DCTs:
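As a rough sketch of this row-column factoring in Python: each row of the block is transformed with a 1D DCT to give an intermediate matrix G[i][v], and then each column of G is transformed to give F[u][v]. The function names `dct_1d` and `dct_2d` are illustrative, and the normalization (a factor 1/2 and C(0) = 1/sqrt(2) per dimension, so 1/4 overall) is the one commonly used for the 8x8 JPEG transform; it is stated here as an assumption.

```python
import math

N = 8  # JPEG block size


def dct_1d(samples):
    """1D DCT of N samples: F(u) = 1/2 * C(u) * sum_i f(i) * cos((2i+1) * u * pi / 16)."""
    result = []
    for u in range(N):
        c = math.sqrt(0.5) if u == 0 else 1.0
        total = sum(
            samples[i] * math.cos((2 * i + 1) * u * math.pi / (2 * N))
            for i in range(N)
        )
        result.append(0.5 * c * total)
    return result


def dct_2d(block):
    """2D DCT as two passes of 1D DCTs: rows first (f -> G), then columns (G -> F)."""
    g = [dct_1d(row) for row in block]                                   # G[i][v]
    columns = [dct_1d([g[i][v] for i in range(N)]) for v in range(N)]    # columns[v][u]
    return [[columns[v][u] for v in range(N)] for u in range(N)]         # F[u][v]


flat = [[1.0] * N for _ in range(N)]
F = dct_2d(flat)
```

For a perfectly flat block all the energy ends up in the DC coefficient F[0][0], and every AC coefficient is zero up to rounding error. The factored form needs 16 one-dimensional transforms of 8 samples instead of 64 double sums over 64 samples.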

f[i,j] → G[i,v] → F[u,v]

Quantization

• To reduce the number of bits per sample, quantization is used:

F'[u, v] = round (F[u, v] / q[u, v])

where q[u, v] is the quantization matrix and F[u, v] is the DCT coefficient matrix.

Example: F[u, v] = 101101 = 45 (6 bits), q[u, v] = 4. Dividing by 4 and rounding truncates the value to 4 bits: 1011 = 11
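The quantization step and its inverse can be sketched in Python as below. The function names are illustrative. Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule of a particular encoder; the example value avoids that case.

```python
def quantize(coefficient, step):
    """F'[u, v] = round(F[u, v] / q[u, v])."""
    return round(coefficient / step)


def dequantize(level, step):
    """The decoder can only multiply back; the discarded remainder is lost."""
    return level * step


def quantize_block(block, table):
    """Apply the per-position step sizes of a quantization table to a block."""
    return [
        [quantize(c, q) for c, q in zip(row, table_row)]
        for row, table_row in zip(block, table)
    ]


level = quantize(45, 4)            # 45 / 4 = 11.25, which rounds to 11 (binary 1011)
restored = dequantize(level, 4)    # 44
error = 45 - restored              # quantization error of 1
```

The value 45 needs 6 bits while the quantized level 11 needs only 4; the price is that the decoder recovers 44 instead of 45.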

• Quantization error is the main source of loss in lossy compression. Different quantization matrices can be used.

• Uniform quantization: each F[u,v] is divided by the same constant N.

• Non-uniform quantization accounts for the fact that the human eye is most sensitive to low frequencies (upper left corner) and less sensitive to high frequencies (lower right corner), and that it is more sensitive to luminance than to color.

Luminance Quantization Table q(u, v)
------
16  11  10  16  24   40   51   61
12  12  14  19  26   58   60   55
14  13  16  24  40   57   69   56
14  17  22  29  51   87   80   62
18  22  37  56  68   109  103  77
24  35  55  64  81   104  113  92
49  64  78  87  103  121  120  101
72  92  95  98  112  100  103  99
------

Chrominance Quantization Table q(u, v)
------
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
------

Zig-zag scan

• As a result of quantization we have an 8x8 matrix with many elements equal to 0. The non-null coefficients are concentrated in the upper left corner.

• This suggests transforming the 8x8 matrix into a 64-element vector using a zig-zag order. The zig-zag scan groups the low-frequency coefficients at the start of the vector: it maps the 8x8 matrix to a 1x64 vector.

Coefficient encoding

• Coefficients are encoded differently:

– Differential Pulse Code Modulation (DPCM) on the DC component
  • The DC component is large and varied, but often close to the value of the DC component of the previous block. JPEG therefore encodes the difference (DC diff) between the previous and the current 8 x 8 block.
– Run Length Encoding (RLE) on the AC components
  • Many of the AC coefficients are equal to 0, so they are encoded using RLE, which counts the number of consecutive 0s:
    – runs from a minimum of 0 to a maximum of 16 consecutive 0s are allowed (in the latter case the special symbol (15,0) is used);
    – the end of block is encoded with (0,0).

• For the DC component (DC diff) we build the pair (SIZE) (AMPLITUDE), where SIZE is the number of bits needed to represent the DC difference value and AMPLITUDE is the value of the DC difference.

SIZE   Value
------
1      -1, 1
2      -3, -2, 2, 3
3      -7..-4, 4..7
4      -15..-8, 8..15
.      .
.      .
10     -1023..-512, 512..1023
------

Example: if the DC difference is 4, 3 bits are needed.

[Figure: DC coefficient and DC of the preceding block]

• For the AC components the representation (RUNLENGTH, SIZE) (AMPLITUDE) is used, where RUNLENGTH is the number of consecutive 0s (from 0 to 15) preceding the coefficient, SIZE has the same meaning as for the DC coefficient, and AMPLITUDE is the actual value of the nonzero AC coefficient.
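The zig-zag scan and the construction of these symbols can be sketched together in Python. Everything below is illustrative: the tuple layout (`"DC"`, `"AC"`, `"ZRL"` for the (15,0) symbol, `"EOB"` for (0,0)) and the function names are assumptions of this sketch, and the bit-level encoding of the amplitudes is left out.

```python
N = 8


def zigzag_order(n=N):
    """(row, column) positions of an n x n block in zig-zag order."""
    def key(pos):
        i, j = pos
        diagonal = i + j
        # Odd diagonals run downwards (row increasing), even ones upwards.
        return (diagonal, i if diagonal % 2 else -i)
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def size_category(value):
    """SIZE: number of bits needed for |value| (0 for a zero value)."""
    return abs(value).bit_length()


def block_symbols(block, previous_dc):
    """Turn a quantized 8x8 block into DC and AC symbols."""
    coeffs = [block[i][j] for i, j in zigzag_order()]
    dc_diff = coeffs[0] - previous_dc                      # DPCM on the DC component
    symbols = [("DC", size_category(dc_diff), dc_diff)]
    ac = coeffs[1:]
    last_nonzero = max((k for k, c in enumerate(ac) if c != 0), default=-1)
    run = 0
    for c in ac[: last_nonzero + 1]:
        if c == 0:
            run += 1
            if run == 16:                                  # 16 zeros in a row
                symbols.append(("ZRL",))                   # special symbol (15,0)
                run = 0
        else:
            symbols.append(("AC", run, size_category(c), c))
            run = 0
    if last_nonzero < len(ac) - 1:
        symbols.append(("EOB",))                           # end of block, (0,0)
    return symbols


block = [[0] * N for _ in range(N)]
block[0][0] = 12   # DC coefficient
block[0][1] = 5    # first AC coefficient in zig-zag order
block[1][1] = -2   # fourth AC coefficient in zig-zag order
symbols = block_symbols(block, previous_dc=8)
```

With a previous DC value of 8, the DC difference is 4 (SIZE 3); the value -2 is preceded by two zeros in zig-zag order, so it becomes the pair (2, 2) with amplitude -2, and the trailing zeros collapse into a single end-of-block symbol.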

[Figure: symbol-1 and symbol-2; special symbols (symbol-1); size and amplitude]

Huffman encoding

• After we have encoded every block, we have a sequence of symbols:
  ‒ Symbol 1: (SIZE) or (RUNLENGTH, SIZE)
  ‒ Symbol 2: (AMPLITUDE)
These symbols are further encoded using Huffman encoding to reduce the amount of data. The most frequent symbols are encoded with shorter codes, the less frequent ones with longer codes.

• Huffman tables provide codes for every symbol of the sequence. Huffman Tables can be custom (sent in header) or default. Huffman tables are different for DC and AC symbol 1.

Symbol1 and Symbol2 encoding

Byte stuffing

Progressive JPEG

• Progressive JPEG works a lot like the interlaced GIF89a by filling in every other line, then returning to the top of the image to fill in the remainder.

• The DCT progressive mode of operation consists of the same DCT and quantization steps that are used by the DCT sequential mode. The key difference is that each image component is encoded in multiple scans rather than in a single scan. After each block of DCT coefficients is quantized, it is stored in a coefficient buffer memory. The buffered coefficients are then partially encoded in each of multiple scans.

• The first scan(s) encode a rough but recognizable version of the image, which can be transmitted quickly in comparison to the total transmission time, and are refined by succeeding scans until reaching the level of picture quality that was established by the quantization tables.

• Each scan of progressive JPEG takes about the same computation to display as a whole JPEG. It makes sense only if a decoder is available that is faster than the communication link.

Progressive spectral selection - Progressive successive approximation

• There are two complementary methods by which a block of quantized DCT coefficients may be partially encoded.

• Progressive Spectral Selection algorithm:
  – The DCT coefficients are grouped into several spectral bands: only a specified band of coefficients from the zig-zag sequence needs to be encoded within a given scan.
  – Low-frequency DCT coefficient bands are sent first, and then higher-frequency ones.
This procedure is called spectral selection because each band typically contains coefficients which occupy a lower or higher part of the spatial-frequency spectrum.

• Progressive Successive Approximation algorithm:
  – The coefficients within the band need not be encoded to their full quantized accuracy in a given scan: DCT coefficients are sent first with lower precision, and then refined in later scans (first the N most significant bits, then the less significant ones in successive scans).

• The quantized DCT coefficient information can be viewed as a rectangle for which the axes are the DCT coefficients (in zig-zag order) and their amplitudes.

‒ Spectral Selection slices the information in one dimension.
‒ Successive Approximation slices the information in the other.

The JPEG Bitstream

• A JPEG image consists of a sequence of segments, each beginning with a marker. Each marker begins with a 0xFF byte followed by a byte indicating what kind of marker it is. Some markers consist of just those two bytes; others are followed by two bytes indicating the length of the marker-specific payload data that follows.
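Walking through the segments of a JPEG byte stream can be sketched in Python as below. This is a simplified illustration, not a full parser: the function name `list_segments` is hypothetical, the length it reports is the value of the two-byte big-endian length field (which counts the two length bytes themselves), and the entropy-coded data after a start-of-scan segment is skipped by searching for the next real marker, treating 0xFF 0x00 (a stuffed byte) and the restart markers as part of the data.

```python
import struct

# Markers that consist of the two marker bytes only: SOI, EOI, TEM, RST0-RST7.
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))
SOS = 0xDA  # start of scan


def list_segments(data):
    """Return (marker byte, length field) pairs for the segments in a JPEG stream."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, 0))
            continue
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        segments.append((marker, length))
        pos += length
        if marker == SOS:
            # Skip entropy-coded data up to the next marker that is not a
            # stuffed byte (0xFF 0x00) or a restart marker (0xFF 0xD0-0xD7).
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt != 0x00 and not (0xD0 <= nxt <= 0xD7):
                    break
                pos += 1
    return segments


# Tiny synthetic stream: SOI, a 16-byte APP0 segment, a scan with 4 data bytes, EOI.
sample = (
    b"\xFF\xD8"
    + b"\xFF\xE0\x00\x10" + b"JFIF\x00" + bytes(9)
    + b"\xFF\xDA\x00\x03\x00" + b"\x12\xFF\x00\x34"
    + b"\xFF\xD9"
)
segments = list_segments(sample)
```

The sample bytes are made up for the illustration and do not form a decodable image; they only exercise the marker and length handling.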

• The bitstream of a JPEG/JFIF image file gives the following segments:

Type of segment                Length of segment
------
Start of image                 0
APP0                           16
Quantisation table             67
Start of frame: baseline DCT   11
Huffman table                  28
Huffman table                  63
Start of scan                  49363
End of image                   0

– The APP0 segment marks this file as a JFIF/JPEG. JFIF defines some extra fields (like the image resolution and an optional thumbnail).
– The JPEG file contains one frame (image), and that frame can include one or more scans. Scans can give progressively more detail (with Progressive JPEG) or can carry different axes of the colour space.
– The quantisation and Huffman tables are needed for decoding the image. There can be many Huffman and quantisation tables, and different scans might use different tables.

JPEG image format example

Application specific. For example, an EXIF JPEG file uses the marker to store EXIF metadata.

Luminance quantization table
Chrominance quantization table

Baseline DCT-based JPEG, specifies the width, height, number of components, and component subsampling (e.g., 4:2:0).

Huffman table for DC symbol 1 Huffman table for DC symbol 2

Huffman table for AC symbol 1 Huffman table for AC symbol 2

In baseline DCT JPEG images, there is generally a single scan. Progressive DCT JPEG images usually contain multiple scans. The marker specifies which slice of data it will contain, and is immediately followed by entropy-coded data.

Sequential Lossless JPEG

• Lossless JPEG does not use the DCT. It uses a predictive scheme based on the nearest neighbors, and entropy coding is applied to the prediction error. The prediction block replaces the DCT encoding and quantization blocks of the baseline sequential JPEG encoder.

• The simplest predictive coding scheme is DPCM, which encodes the difference between the actual value of each pixel and its predicted value. The predicted value is provided by an appropriate function based on the modified values of the pixels above and to the left (A, B and C in the figure). The predictor formula can be as simple as X = A or the average, or as complex as X = B + (A - C)/2.

C B
A X
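The prediction step can be sketched as follows, with A the left neighbor, B the neighbor above and C the neighbor above-left of the pixel X. This is a simplified illustration: the function names and mode labels are hypothetical, neighbors outside the image are assumed to be 0 (the standard treats the first row and column differently), and the Huffman coding of the residuals is omitted.

```python
def predict(a, b, c, mode):
    """Three of the predictors mentioned above (integer arithmetic)."""
    if mode == "left":
        return a
    if mode == "average":
        return (a + b) // 2
    if mode == "planar":
        return b + (a - c) // 2
    raise ValueError(f"unknown predictor: {mode}")


def neighbours(above, current, x):
    """Return (A, B, C) for position x; missing neighbors count as 0 (assumption)."""
    a = current[x - 1] if x > 0 else 0
    b = above[x] if above is not None else 0
    c = above[x - 1] if above is not None and x > 0 else 0
    return a, b, c


def dpcm_encode(img, mode="left"):
    """Replace every pixel by its prediction error."""
    residuals = []
    for y, row in enumerate(img):
        above = img[y - 1] if y > 0 else None
        residuals.append([value - predict(*neighbours(above, row, x), mode)
                          for x, value in enumerate(row)])
    return residuals


def dpcm_decode(residuals, mode="left"):
    """Rebuild the image in raster order from the prediction errors."""
    img = []
    for res_row in residuals:
        above = img[-1] if img else None
        row = []
        for x, r in enumerate(res_row):
            row.append(r + predict(*neighbours(above, row, x), mode))
        img.append(row)
    return img


image = [[10, 12, 13], [11, 13, 15], [12, 14, 17]]
errors = dpcm_encode(image, "planar")
restored = dpcm_decode(errors, "planar")
```

Because the decoder forms exactly the same predictions from already reconstructed pixels, `restored` equals `image`, while the residuals are mostly small values that an entropy coder can represent compactly.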

• The sequence of prediction errors is encoded with the Huffman code.

Hierarchical JPEG

• Hierarchical coding represents images at different pixel resolutions, i.e. various versions of the same image can be created, e.g. 512x512, 1024x1024 and 2048x2048.

• Hierarchical JPEG mode creates a set of compressed images beginning with small images, and then continuing with images of increased resolutions. In this way it provides a pyramidal encoding of an image at multiple resolutions, each differing in resolution from its adjacent encoding by a factor of two in either the horizontal or vertical dimension or both.

• Hierarchical encoding is useful in applications in which a very high resolution image must be accessed by a lower-resolution display, or for real-time applications. An example is an image scanned and compressed at high resolution for a very high-quality printer, where the image must also be displayed on a low-resolution PC video screen.

• In hierarchical JPEG, the lower-resolution image is scaled up to the next resolution and used as a prediction for the following stage. The encoding procedure can be summarized as follows:

– 1 Filter and down-sample the original image by the desired number of multiples of 2 in each dimension (DSF).
– 2 Encode this reduced-size image using one of the sequential DCT or progressive DCT encoders (FDCT).
– 3 Decode this reduced-size image (IDCT).
– 4 Interpolate and up-sample it by 2 horizontally and/or vertically, using the identical interpolation filter which the receiver must use (USF).
– 5 Use this up-sampled image as a prediction of the original at this resolution, and encode the difference (error) image using one of the sequential DCT or progressive DCT encoders (FDCT).
– 6 Decode (IDCT) the difference image and add it to the up-sampled version available.
– 7 Repeat steps 4), 5) and 6) until the full resolution of the image has been encoded.

[Figure: block diagram of the hierarchical encoder, with an image downsampling stage and an image encoding stage; blocks are numbered according to the steps above]
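The steps above can be sketched as a small pyramid coder. This is an illustration under loud assumptions: `codec` is a stand-in that replaces the real DCT encode/decode pair with coarse uniform quantization, down-sampling is a plain 2x2 average, up-sampling is pixel replication, and image sides must be divisible by 2 for every level. All function names are hypothetical.

```python
def downsample(img):
    """Step 1 (DSF): average each 2x2 block."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upsample(img):
    """Step 4 (USF): replicate each pixel into a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def codec(img, step):
    """Stand-in for FDCT + IDCT: lossy round trip via uniform quantization."""
    return [[(v // step) * step for v in row] for row in img]


def hierarchical_encode(img, levels, step=4):
    """Return the decoded layers (smallest image, then difference images)
    and the final reconstruction the receiver would obtain."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    recon = codec(pyramid[-1], step)                # steps 2 and 3
    layers = [recon]
    for target in reversed(pyramid[:-1]):           # step 7: repeat per level
        pred = upsample(recon)                      # step 4
        diff = [[t - p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(target, pred)]
        diff_dec = codec(diff, step)                # step 5, then decode
        recon = [[p + d for p, d in zip(p_row, d_row)]
                 for p_row, d_row in zip(pred, diff_dec)]  # step 6
        layers.append(diff_dec)
    return layers, recon


picture = [[10, 12, 40, 44], [14, 16, 48, 52], [90, 91, 20, 22], [93, 95, 24, 27]]
layers, reconstruction = hierarchical_encode(picture, levels=1, step=4)
```

Using the decoded (not the original) low-resolution image as the prediction keeps encoder and receiver in step, which is why step 3 appears in the procedure at all.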

• Single frames of hierarchical JPEG can furthermore be coded with the progressive JPEG mode.

Handling Color Images

• JPEG encoding of color images can be done according to alternative approaches:
- Consider the R, G and B components and JPEG-encode each channel separately.

[Figure: the color image is split into its R, G and B components, and JPEG compression is applied to each component]

- Transform RGB to another representation in order to separate luminance from chroma, and apply JPEG to each channel separately (downsampling the chroma channels).

[Figure: the color image is transformed to YCrCb or YUV; JPEG compression is applied to the Y (luminance) component directly, and to the Cr and Cb (chrominance) components after subsampling by 2 in H and V]
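The second approach can be sketched as a color transform followed by chroma subsampling. The slides do not give the transform coefficients; the ones below are the full-range values commonly used with JFIF files and are stated here as an assumption, as are the function names and the simple 2x2 averaging used for subsampling.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr conversion (JFIF-style coefficients, assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def subsample_2x2(plane):
    """Subsample a chroma plane by 2 in H and V by averaging 2x2 blocks.

    Assumes even width and height.
    """
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]


# A 2x2 RGB image: the Y plane keeps 4 samples, Cb and Cr keep 1 each.
pixels = [[(255, 0, 0), (250, 5, 5)], [(245, 10, 0), (255, 0, 10)]]
converted = [[rgb_to_ycbcr(*p) for p in row] for row in pixels]
y_plane = [[p[0] for p in row] for row in converted]
cb_plane = subsample_2x2([[p[1] for p in row] for row in converted])
cr_plane = subsample_2x2([[p[2] for p in row] for row in converted])
```

After this step the three planes hold 4 + 1 + 1 samples instead of 12, before any JPEG compression has been applied.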

GIF, PNG, TIFF and JPEG standards compared

• GIF – PNG
The GIF and PNG formats use lossless compression to achieve medium levels of compression on images.

– GIF works best on images with few colors or images in which one color is dominant. It acts best on identical, adjacent pixels (or rows of identical, adjacent pixels). There is no loss during the compression process as long as the original image has no more than 256 colors. The key to using GIFs effectively is to use the smallest possible number of colors.
– PNG is the alternative for images with a higher number of colors.

– Also notable about GIFs is the fact that images can be transparent, animated, or both.

• JPEG
The JPEG format uses lossy compression to achieve high levels of compression on images with many colors.
– The compression works best with continuous-tone images, that is, images where the change between adjacent pixels is small but not zero.
– JPEG images generally store 16 or 24 bits of color and thus are best for 16- or 24-bit images.
– Due to the noticeable loss of quality during the compression process, JPEGs should be used only where image file size is important, primarily on web pages.

• Example: image with 25 plain colors

‒ The GIF and the PNG have the same quality as the original, but the GIF takes almost 3 times as many KB as the PNG.
‒ The JPEG with the lowest compression factor is already blurring a lot. The JPEG with 25% compression is smaller, but still 4 times the size of the PNG, and has bad quality, blurring the 25 original colors.

• Example: photo image
‒ JPEG is a standard for digital photographs because it can save information on more than 16 million different hues. Its "lossy" compression has little effect on photographs.
‒ JPEG will produce a smaller file than PNG for photographic images. Using PNG instead of JPEG for such images would result in a large increase in file size (often 5–10 times) with negligible gain in quality.
‒ The JPEG with 50% compression shows quite a lot of JPEG artifacts in the air around the right tower.
‒ The GIF and the 25% compressed JPEG are both reasonable.

• Example: text image

– PNG is a better choice than JPEG for storing images that contain text, line art, or other images with sharp transitions that do not transform well into the frequency domain.

– Where an image contains both sharp transitions and photographic parts, a choice must be made between the large but sharp PNG and a small JPEG with artifacts around sharp transitions.

• If your image is...

– Black and white: use GIF
– Text on a plain background: use GIF
– Transparent or animated: use GIF
– Computer-drawn line or cartoon art: use GIF
– Small images, like icons and buttons: use GIF
– Predominantly (>80%) one color: consider GIF

The TIFF format is rarely seen on the web because it offers poor compression.

• Image size: 146 x 184, 75 colors. File size (FC = compression factor relative to the 24 bpp size):
– 8 bpp: 26864 bytes
– 24 bpp: 80592 bytes
– PPM (24 bpp): 80674 bytes
– GIF (8 bpp): 3585 bytes (FC=22.48)
– JPG (24 bpp): 4805 bytes (FC=16.77)
– PPM.ZIP (24 bpp): 3698 bytes (FC=21.79)

PPM: portable pixel map (the most redundant and inefficient image format)

– 16/24-bit scanned photograph: use JPEG
– Computer-drawn continuous-tone art: use JPEG
– Scanned images and photographs: use JPEG
– (Large) images with a lot of detail: use JPEG

• Image size: 244 x 334, 31322 colors. File size:
– 8 bpp: 81496 bytes
– 24 bpp: 244488 bytes
– PPM (24 bpp): 244620 bytes
– GIF (8 bpp): 49613 bytes (FC=4.92)
– JPG (24 bpp): 16352 bytes (FC=14.95)
– PPM.ZIP (24 bpp): 190977 bytes (FC=1.28)

JPEG 2000 Standard

• JPEG 2000 is an ISO standard (ISO/IEC 15444-1:2000) for images, created by the Joint Photographic Experts Group committee in the year 2000. It allows multispectral imaging. JPEG2000 supports both lossy and lossless compression. JPEG2000 image files have the extension .jp2 or .j2f

• JPEG2000 is not as widely used as JPEG, on the web or elsewhere: many important software programs for image manipulation and processing, as well as web browsers, do not support JPEG2000, and there is no accepted way to embed EXIF data.
– Photoshop: Adobe plugin
– Paint Shop Pro: proprietary plugin
– Browsers: plugin available (Luratech)
– Linux: yes, through JasPer (MIT licence)
– MS Windows: read only, proprietary
– Mac OS X: yes, through QuickTime
– …

JPEG2000 color components

• JPEG2000 requires that images are transformed from RGB into:
– YUV (fully reversible)
– YCbCr (irreversible because of floating point implementation and round-offs)

• YCbCr is a family of color spaces used as a part of the color image pipeline in video and digital photography systems. YCbCr is not an absolute color space; it is a way of encoding RGB information. The actual color displayed depends on the actual RGB colorants used to display the signal. Typically the terms YCbCr and YUV are used interchangeably.

YCbCr color space

[Figure: original image and its Y, Cb and Cr components]

Gamma correction

• Y’CbCr can be used instead of YCbCr, where Y’ stands for luma, i.e. gamma-corrected luminance. Gamma correction is a nonlinear operation used to code and decode luminance in video or still image systems. It is required to compensate for properties of human vision.

• Gamma correction is made according to $V_{out} = a\,V_{in}^{\gamma}$, where a is a constant (typically equal to 1) and the input and output values are non-negative real values in the range 0–1. A gamma value γ < 1 is called an encoding gamma; a gamma value γ > 1 is called a decoding gamma. In most computer systems, images are encoded with a gamma of about 0.45 and decoded with a gamma of 2.2.
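A small numeric sketch of the power law above, with a = 1 and the typical encoding/decoding pair 1/2.2 ≈ 0.45 and 2.2; the function name is hypothetical.

```python
def gamma_correct(v_in, gamma, a=1.0):
    """Apply V_out = a * V_in ** gamma to a value in the range 0..1."""
    if not 0.0 <= v_in <= 1.0:
        raise ValueError("input must lie in the range 0..1")
    return a * v_in ** gamma


linear = 0.5
encoded = gamma_correct(linear, 1 / 2.2)   # encoding gamma (< 1) brightens mid-tones
decoded = gamma_correct(encoded, 2.2)      # decoding gamma (> 1) undoes the encoding
print(f"linear={linear:.3f} encoded={encoded:.3f} decoded={decoded:.3f}")
```

The end points 0 and 1 are left unchanged by both curves; only the values in between are redistributed.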

• Data in still image files (e.g. JPEG) are explicitly encoded (that is, they carry gamma-encoded values, not linear intensities).

[Figure: Vout versus Vin for gamma encoding/decoding (dashed/continuous) and the linear transfer function (dotted)]

Reference Grid and Image Area

• JPEG2000 supports multiple image components, from 1 to 255 or more. Each component can have a different size and bit depth (1 to 32 bits), and a different alignment relative to the others.

• JPEG2000 uses the concept of the canvas to align multiple image components in a single coordinate system. By default, images are placed on the canvas so that the image and canvas origins align.

• Only those samples which fall within the image area actually belong to the image component. Thus the samples of component i are mapped into the image component domain as a rectangle having its upper left-hand sample at coordinates (XOsiz, YOsiz) and its lower right-hand sample at coordinates (Xsiz-1, Ysiz-1).

[Figure: the canvas as a rectangular grid of size Xsiz x Ysiz. The image is aligned to the bottom-right corner of the grid (Xsiz, Ysiz), with the vertical and horizontal sampling periods of each component. Size of the image area: (Xsiz-XOsiz) x (Ysiz-YOsiz)]

Image Tiling

• Each image component is further broken down into tiles. Tile sizes are variable, and can differ from component to component. Tiles are similar to blocks in JPEG, but more flexible.

[Figure: image tiles relative to the reference grid]

• The reference grid is partitioned into a regular-sized rectangular array of tiles. The tile size and tiling offset are defined on the reference grid by the dimensional pairs (XTsiz, YTsiz) and (XTOsiz, YTOsiz), respectively.

• By default, images will have one tile that has the same dimensions and offset on the canvas as the image. If the tile dimensions are smaller than the image dimensions and the tile offsets are different from the image offsets, some tiles may extend beyond the borders of the image.

Image subsampling and cropping

• Each image component can also be subsampled. The subsampling factors indicate the scaling factor between the component dimensions and the image dimensions. For example, an image component of a 1280 by 720 image with subsampling factors of 2 by 2 will have dimensions 640 by 360. The samples of component i are at integer multiples of (Xsiz(i), Ysiz(i)) of the canvas.
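The relation between canvas size, image offset and subsampling factors can be checked with a short calculation. This is a sketch: the function and parameter names are hypothetical (`xrsiz` and `yrsiz` stand for the horizontal and vertical subsampling factors of one component, names not used in these slides), and the ceiling-based rounding is stated as an assumption about how partial sampling periods at the image border are counted.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def component_size(xsiz, ysiz, xosiz, yosiz, xrsiz, yrsiz):
    """Width and height, in samples, of a component subsampled by
    (xrsiz, yrsiz) inside the image area of the canvas."""
    width = ceil_div(xsiz, xrsiz) - ceil_div(xosiz, xrsiz)
    height = ceil_div(ysiz, yrsiz) - ceil_div(yosiz, yrsiz)
    return width, height


# The example from the text: a 1280 x 720 image, subsampling 2 by 2, no offset.
full = component_size(1280, 720, 0, 0, 1, 1)
half = component_size(1280, 720, 0, 0, 2, 2)
print(full, half)
```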

[Figure: original image area (full resolution, 1280 x 720, 16:9), new image area (subsampled, 640 x 360) and the cropped 4:3 area (396 x 297)]

• Tiling with subsampling and cropping can be used to obtain new images from original images. An example sub-samples a 1280 x 720 (16:9) image at a 2:1 ratio on each side and then crops it to a 4:3 aspect ratio: the new image size is 396 x 297.

Wavelet compression

• JPEG2000 uses the Discrete Wavelet Transform in the lossy stage of image compression. The wavelet transform breaks down the image into multi-resolution representations.

• For JPEG2000, the wavelet transform is applied to the image on a tile-by-tile basis.

Discrete Wavelet Transform

• In numerical analysis and functional analysis, the Discrete Wavelet Transform refers to wavelet transforms for which the wavelets are discretely sampled.

• The Discrete Wavelet Transform was invented by the Hungarian mathematician Alfréd Haar:
– For an input represented by a list of $2^n$ numbers, the Haar wavelet transform may be considered to simply pair up input values, storing the difference and passing the sum.
– This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in $2^n - 1$ differences and one final sum.

• The Discrete Wavelet Transform has nice properties:
– it can be performed in O(n) operations;
– it captures not only some notion of the frequency content of the input, by examining it at different scales, but also the temporal content, i.e. the times at which these frequencies occur.
Combined, these two properties make the wavelet transform an alternative to the conventional Fast Fourier Transform.

1D Discrete Wavelet Transform

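• The Haar wavelet can be described as a step function F(x), with an associated 2x2 matrix H:

$$F(x) = \begin{cases} 1 & 0 \le x < 1/2 \\ -1 & 1/2 < x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad H = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$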

– A sequence $(a_0, a_1, a_2, a_3, \ldots, a_{2n+1})$ of even length can be transformed into a sequence of two-component vectors $(a_0, a_1), \ldots, (a_{2n}, a_{2n+1})$.

– If one multiplies each vector by the matrix H, one gets the result $(s_0, d_0), \ldots, (s_n, d_n)$ of one stage of the Haar wavelet transform (sum, difference).

– The two sequences s and d are separated, and the process is repeated on the sequence $s = (s_0, s_1, \ldots, s_n)$.

• In the one-dimensional case, this is equivalent to breaking the signal into subbands by passing it through a low-pass filter and a high-pass filter. The outputs give:
– the approximation coefficients (from the low-pass filter)
– the detail coefficients (from the high-pass filter)

[Figure: example decomposition showing the approximation coefficients, e.g. (-100+200+600+200-200)/8 = 700/8 = 87.5, and the detail coefficients]

• Taking only the sum at each level implies that half the frequencies of the signal have been removed at each level, so half of the samples can be discarded according to Nyquist's rule. The filter outputs are therefore downsampled by 2.

Nyquist theorem: a signal must be sampled at a rate of at least twice its highest frequency in order to extract all the information from the bandwidth.

• Due to the decomposition process, the length of the input signal must be a multiple of $2^n$, where n is the number of levels.
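The pairing procedure described above (store the difference, pass on the sum) can be sketched directly. The function names are hypothetical, and the transform is left unnormalized, as in the description; practical implementations usually scale the sums and differences.

```python
def haar_step(values):
    """One stage: multiply each pair (a, b) by H = [[1, 1], [1, -1]]."""
    sums = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    diffs = [values[i] - values[i + 1] for i in range(0, len(values), 2)]
    return sums, diffs


def haar_transform(values, levels):
    """Repeat the stage on the sums; return the final approximation and
    the detail coefficients of every level (finest level first)."""
    if len(values) % (2 ** levels) != 0:
        raise ValueError("input length must be a multiple of 2**levels")
    approx = list(values)
    details = []
    for _ in range(levels):
        approx, diffs = haar_step(approx)
        details.append(diffs)
    return approx, details


signal = [1, 3, 5, 7, 2, 2, 4, 0]          # 2**3 samples
approx, details = haar_transform(signal, levels=3)
print(approx, details)
```

For this 8-sample input the three levels yield 4 + 2 + 1 = 7 differences and one final sum, matching the $2^n - 1$ count stated earlier.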

2D Discrete Wavelet Transform

• In the two-dimensional case, as in the 1D case, the signal is broken into subbands by passing it through a low pass filter and a high pass filter, and both subbands are downsampled by 2.

• According to the Mallat method, decomposition can be applied separably in the vertical and horizontal directions, in that order. This leads to a two-dimensional signal being broken down into four subbands, known as:
– LL (Low frequency horizontal, Low frequency vertical),
– HL (High frequency horizontal, Low frequency vertical),
– LH (Low frequency horizontal, High frequency vertical),
– HH (High frequency horizontal, High frequency vertical)

[Figure: two-step 2D wavelet Mallat decomposition: the input image is first split into the L and H bands (step 1), which are then split into the subbands LL, HL, LH and HH (step 2)]

2D Wavelet Decomposition

• Conceptually, for a particular image, these subbands translate to:
– a low-frequency approximation of the original (LL)
– primarily vertical edges (HL)
– primarily horizontal edges (LH)
– diagonal edges (HH).

• Decomposition is iteratively applied. Since downsampling is performed at each pass, at each iteration the image halves its size in the vertical and horizontal directions.
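One level of the separable decomposition can be sketched with the unnormalized Haar filters (pairwise sum as low-pass, pairwise difference as high-pass). This is an illustration only: the function names are hypothetical, the filtering order chosen here (rows first, then columns) is one possible convention, and JPEG2000 itself uses longer wavelet filters than Haar.

```python
def split(seq):
    """Low-pass (pairwise sums) and high-pass (pairwise differences),
    each already downsampled by 2."""
    low = [seq[i] + seq[i + 1] for i in range(0, len(seq), 2)]
    high = [seq[i] - seq[i + 1] for i in range(0, len(seq), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def filter_columns(matrix):
    """Vertical filtering: returns (low, high), each with half the rows."""
    parts = [split(col) for col in transpose(matrix)]
    return transpose([p[0] for p in parts]), transpose([p[1] for p in parts])


def dwt2_level(img):
    """Return the subbands (LL, HL, LH, HH) of one decomposition level.

    First letter: horizontal frequency; second letter: vertical frequency.
    """
    rows = [split(row) for row in img]
    low_h = [r[0] for r in rows]     # low horizontal frequencies
    high_h = [r[1] for r in rows]    # high horizontal frequencies
    ll, lh = filter_columns(low_h)
    hl, hh = filter_columns(high_h)
    return ll, hl, lh, hh


# Vertical stripes, i.e. an image containing only vertical edges.
stripes = [[0, 8, 0, 8] for _ in range(4)]
ll, hl, lh, hh = dwt2_level(stripes)
```

For this striped input only LL and HL are non-zero, in line with HL capturing primarily vertical edges; applying `dwt2_level` again to `ll` gives the next, half-sized level.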

Wavelet decomposition quantization

• After the Discrete Wavelet decomposition has been performed, a quantization matrix is applied to the decomposed image. Uniform quantization is performed within each subband, with different levels of quantization for each subband.

• JPEG2000 does not specify the use of particular quantization matrices. A way of calculating a quantization matrix for a particular filter is suggested. Generally, the higher-frequency subbands are quantized more coarsely, since humans have lower contrast sensitivity to high-frequency information.
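Per-subband uniform quantization can be sketched as follows. The step sizes and coefficient values are purely hypothetical, chosen only to show coarser steps for higher-frequency subbands, and the sign-magnitude rounding toward zero (a dead-zone quantizer) is an assumption of this sketch rather than something stated in the slides.

```python
def quantize(coeffs, step):
    """Uniform quantization: index = sign(c) * floor(|c| / step)."""
    return [int(abs(c) // step) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct each non-zero index at the middle of its interval."""
    return [0 if q == 0 else (abs(q) + 0.5) * step * (1 if q > 0 else -1)
            for q in indices]


# Hypothetical step sizes: coarser for the higher-frequency subbands.
steps = {"LL": 1, "HL": 4, "LH": 4, "HH": 8}
subbands = {
    "LL": [120, 118, 97, 101],
    "HL": [10, -10, 3, -3],
    "LH": [6, -5, 0, 2],
    "HH": [7, -9, 1, 0],
}
quantized = {name: quantize(values, steps[name])
             for name, values in subbands.items()}
restored = {name: dequantize(values, steps[name])
            for name, values in quantized.items()}
```

With these steps most HH and LH coefficients become zero, which is where the bulk of the compression in the lossy path comes from.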

[Figure: an example]

Region of Interest (ROI) coding

• JPEG 2000 offers increased flexibility that can make it more applicable than JPEG, and has other interesting features such as ROI coding and progressive transmission.

• In ROI coding, portions of an image are stored at higher quality than the rest of the image. This is useful because we may care more about detail in some portions of an image than in others.

An example of Region of Interest JPEG2000 coding

• ROI is easy to do when the image is stored compressed in a multi-resolution format.

• We first start with a ROI mask, which marks out a region of the image we wish to store at higher quality. The wavelet coefficients corresponding to the transform of the mask have to be stored at higher quality (quantized less coarsely). We can do this by applying the transform to the mask, and looking at which coefficients fall in the mask.

[Figure: coefficients falling in the mask are quantized less coarsely in every subband]

JPEG2000 vs JPEG

• Compared with JPEG, it allows space savings on the order of 20%-30%. It therefore appears to be particularly suited to large images.

• However, this is not the primary motivation for its use. More importantly, JPEG 2000 employs multiresolution and can accommodate a large range of bit rates (both very low and very high compression rates are supported). With JPEG, if we want to transmit over a low bit rate channel, we should first reduce the resolution and then encode.

• The wavelet representations of an image generally perform better than DCT representations for lossy image compression, as there is less perceptual loss for the same bit rate, even when performed on the same block size.

• Multi-resolution wavelet representations give better performance because:
– Multi-resolution representations are more similar to how the human visual system represents images. Consequently, better quantization matrices can be chosen to more closely match and exploit the characteristics of the human visual system.
– The wavelet basis functions are smoother than the DCT basis functions (which tend to be blocky), and are more natural and pleasing to the eye.

A comparative example

[Figure: JPEG at 0.125 bpp (enlarged) and JPEG2000 at 0.125 bpp. Source: C. Christopoulos, A. Skodras, T. Ebrahimi, JPEG2000 (online tutorial)]