Image and Compression Coding Theory Contents

1 JPEG 1 1.1 The JPEG standard ...... 1 1.2 Typical usage ...... 1 1.3 JPEG compression ...... 2 1.3.1 Lossless editing ...... 2 1.4 JPEG files ...... 3 1.4.1 JPEG filename extensions ...... 3 1.4.2 profile ...... 3 1.5 Syntax and structure ...... 3 1.6 JPEG example ...... 4 1.6.1 Encoding ...... 4 1.6.2 Compression ratio and artifacts ...... 8 1.6.3 Decoding ...... 10 1.6.4 Required precision ...... 11 1.7 Effects of JPEG compression ...... 11 1.7.1 Sample ...... 11 1.8 Lossless further compression ...... 11 1.9 Derived formats for stereoscopic 3D ...... 12 1.9.1 JPEG Stereoscopic ...... 12 1.9.2 JPEG Multi-Picture Format ...... 12 1.10 Patent issues ...... 12 1.11 Implementations ...... 13 1.12 See also ...... 13 1.13 References ...... 14 1.14 External links ...... 15

2 16 2.1 Examples ...... 16 2.2 Conversion ...... 17 2.3 RGB density ...... 17 2.4 Lists ...... 17 2.4.1 Generic ...... 17 2.4.2 Commercial ...... 18

i ii CONTENTS

2.4.3 Special-purpose ...... 18 2.4.4 Obsolete ...... 18 2.5 Absolute color space ...... 18 2.5.1 Conversion ...... 19 2.5.2 Arbitrary spaces ...... 19 2.6 See also ...... 19 2.7 References ...... 19 2.8 External links ...... 20

3 21 3.1 Wavelength and detection ...... 21 3.2 Physiology of color ...... 21 3.2.1 Theories ...... 22 3.2.2 Cone cells in the ...... 22 3.2.3 Color in the human brain ...... 23 3.2.4 Subjectivity of color perception ...... 24 3.2.5 In other animal species ...... 24 3.3 Evolution ...... 25 3.4 Mathematics of color perception ...... 26 3.5 ...... 27 3.6 See also ...... 27 3.7 References ...... 27 3.8 External links ...... 29

4 YUV 30 4.1 History ...... 30 4.2 Conversion to/from RGB ...... 31 4.2.1 SDTV with BT.601 ...... 31 4.2.2 HDTV with BT.709 ...... 32 4.2.3 Notes ...... 32 4.3 Numerical approximations ...... 32 4.3.1 Studio swing for BT.601 ...... 33 4.3.2 Full swing for BT.601 ...... 33 4.4 / systems in general ...... 33 4.5 Relation with Y′CbCr ...... 34 4.6 Types of sampling ...... 34 4.7 Converting between Y′UV and RGB ...... 34 4.7.1 Y′UV444 to RGB888 conversion ...... 35 4.7.2 Y′UV422 to RGB888 conversion ...... 35 4.7.3 Y′UV411 to RGB888 conversion ...... 35 4.7.4 Y′UV420p (and Y′V12 or YV12) to RGB888 conversion ...... 36 4.7.5 Y′UV420sp (NV21) to RGB conversion (Android) ...... 36 CONTENTS iii

4.8 References ...... 37 4.9 External links ...... 37

5 YCbCr 38 5.1 Rationale ...... 38 5.2 YCbCr ...... 38 5.2.1 ITU-R BT.601 conversion ...... 40 5.2.2 ITU-R BT.709 conversion ...... 40 5.2.3 ITU-R BT.2020 conversion ...... 41 5.2.4 JPEG conversion ...... 41 5.3 CbCr Plane at Y = 0.5 ...... 41 5.4 References ...... 41 5.5 External links ...... 42

6 43 6.1 Rationale ...... 43 6.2 How subsampling works ...... 43 6.3 Sampling systems and ratios ...... 44 6.4 Types of sampling and subsampling ...... 44 6.4.1 4:4:4 ...... 44 6.4.2 4:2:2 ...... 44 6.4.3 4:2:1 ...... 44 6.4.4 4:1:1 ...... 44 6.4.5 4:2:0 ...... 45 6.4.6 4:1:0 ...... 46 6.4.7 3:1:1 ...... 46 6.5 Out-of- ...... 46 6.6 Terminology ...... 46 6.7 History ...... 46 6.8 Effectiveness ...... 47 6.9 Compatibility issues ...... 47 6.10 See also ...... 47 6.11 References ...... 47

7 Discrete cosine transform 49 7.1 Applications ...... 49 7.1.1 JPEG ...... 49 7.2 Informal overview ...... 50 7.3 Formal definition ...... 51 7.3.1 DCT-I ...... 51 7.3.2 DCT-II ...... 51 7.3.3 DCT-III ...... 51 iv CONTENTS

7.3.4 DCT-IV ...... 51 7.3.5 DCT V-VIII ...... 51 7.4 Inverse transforms ...... 52 7.5 Multidimensional DCTs ...... 52 7.6 Computation ...... 53 7.7 Example of IDCT ...... 53 7.8 See also ...... 53 7.9 Notes ...... 54 7.10 Citations ...... 54 7.11 References ...... 54 7.12 Further reading ...... 55 7.13 External links ...... 55

8 H.264/MPEG-4 AVC 56 8.1 Naming ...... 56 8.2 History ...... 57 8.2.1 Versions ...... 57 8.3 Applications ...... 58 8.3.1 Derived formats ...... 59 8.4 Design ...... 59 8.4.1 Features ...... 59 8.4.2 Profiles ...... 61 8.4.3 Levels ...... 63 8.4.4 Decoded picture buffering ...... 63 8.5 Implementations ...... 63 8.5.1 Software encoders ...... 64 8.5.2 Hardware ...... 64 8.6 Licensing ...... 64 8.7 See also ...... 65 8.8 References ...... 65 8.9 Further reading ...... 66 8.10 External links ...... 67

9 Group of pictures 68 9.1 Description ...... 68 9.2 GOP Structure ...... 68 9.3 References ...... 69

10 Video compression picture types 70 10.1 Summary ...... 70 10.2 Pictures/Frames ...... 70 10.3 Slices ...... 70 CONTENTS v

10.4 ...... 71 10.5 Intra coded frames/slices (I‑frames/slices or Key frames) ...... 71 10.6 Predicted frames/slices (P-frames/slices) ...... 71 10.7 Bi-directional predicted frames/slices (B-frames/slices) ...... 71 10.8 See also ...... 72 10.9 References ...... 72 10.10External links ...... 72

11 73 11.1 Inter frame prediction ...... 73 11.2 Frame types ...... 74 11.2.1 P-frame ...... 74 11.2.2 B-frame ...... 74 11.3 Typical Group Of Pictures (GOP) structure ...... 74 11.4 H.264 Inter frame prediction improvements ...... 74 11.4.1 More flexible block partition ...... 74 11.4.2 Resolution of up to ¼ ...... 75 11.4.3 Multiple references ...... 75 11.4.4 Enhanced Direct/Skip ...... 75 11.5 Additional ...... 76 11.6 References ...... 76 11.7 See also ...... 76

12 Motion compensation 77 12.1 How it works ...... 77 12.2 Illustrated example ...... 77 12.3 Motion Compensation in MPEG ...... 77 12.4 Global motion compensation ...... 78 12.5 Block motion compensation ...... 78 12.6 Variable block-size motion compensation ...... 79 12.7 Overlapped block motion compensation ...... 79 12.8 Quarter Pixel (QPel) and Half Pixel motion compensation ...... 79 12.9 3D image coding techniques ...... 79 12.10See also ...... 79 12.11Applications ...... 79 12.12References ...... 79 12.13External links ...... 80

13 81 13.1 Related terms ...... 81 13.2 Algorithms ...... 81 13.2.1 Direct methods ...... 81 vi CONTENTS

13.2.2 Indirect methods ...... 81 13.2.3 Additional note on the categorization ...... 81 13.3 Applications ...... 82 13.3.1 Video coding ...... 82 13.4 See also ...... 82 13.5 References ...... 82 13.6 Text and image sources, contributors, and licenses ...... 83 13.6.1 Text ...... 83 13.6.2 Images ...... 86 13.6.3 Content license ...... 88 Chapter 1

JPEG

For other uses, see JPEG (disambiguation). 1.1 The JPEG standard JPEG (/ˈdʒeɪpɛɡ/ JAY-peg)[1] is a commonly used “JPEG” stands for Joint Photographic Experts Group, the name of the committee that created the JPEG standard and also other still picture coding standards. The “Joint” stood for ISO TC97 WG8 and CCITT SGVIII. In 1987 ISO TC 97 became ISO/IEC JTC1 and in 1992 CCITT became ITU-T. Currently on the JTC1 side JPEG is one of two sub-groups of ISO/IEC Joint Technical Commit- tee 1, Subcommittee 29, Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1) – titled as Coding of still pic- tures.[6][7][8] On the ITU-T side ITU-T SG16 is the re- spective body. The original JPEG group was organized in 1986,[9] issuing the first JPEG standard in 1992, which was approved in September 1992 as ITU-T Recommen- dation T.81[10] and in 1994 as ISO/IEC 10918-1. Continuously varied JPEG compression (between Q=100 and The JPEG standard specifies the codec, which defines Q=1) for an abdominal CT scan how an image is compressed into a stream of bytes and decompressed back into an image, but not the file for- mat used to contain that stream.[11] The and JFIF standards define the commonly used file formats for in- terchange of JPEG-compressed images. method of for digital images, partic- ularly for those images produced by digital . JPEG standards are formally named as Information tech- The degree of compression can be adjusted, allowing a nology – Digital compression and coding of continuous- selectable tradeoff between storage size and image qual- tone still images. ISO/IEC 10918 consists of the following ity. JPEG typically achieves 10:1 compression with little parts: [2] perceptible loss in . Ecma International TR/98 specifies the JPEG File Inter- JPEG compression is used in a number of image file for- change Format (JFIF); the first edition was published in mats. JPEG/Exif is the most common image format used June 2009.[14] by digital cameras and other photographic image capture devices; along with JPEG/JFIF, it is the most common format for storing and transmitting photographic images 1.2 Typical usage on the World Wide Web.[3] These format variations are often not distinguished, and are simply called JPEG. The JPEG compression algorithm is at its best on pho- The term “JPEG” is an acronym for the Joint Photo- tographs and paintings of realistic scenes with smooth graphic Experts Group, which created the standard. The variations of tone and color. For web usage, where the MIME media type for JPEG is image/, except in older amount of data used for an image is important, JPEG is Explorer versions, which provides a MIME type very popular. JPEG/Exif is also the most common format of image/pjpeg when uploading JPEG images.[4] JPEG saved by digital cameras. files usually have a filename extension of .jpg or .jpeg. On the other hand, JPEG may not be as well suited for JPEG/JFIF supports a maximum image size of line drawings and other textual or iconic graphics, where 65,535×65,535 ,[5] hence up to 4 gigapixels the sharp contrasts between adjacent pixels can cause no- for an aspect ratio of 1:1. ticeable artifacts. Such images may be better saved in a

1 2 CHAPTER 1. JPEG lossless graphics format such as TIFF, GIF, PNG, or a a portion of the data. However, support for progressive . The JPEG standard actually includes is not universal. When progressive JPEGs are re- a lossless coding mode, but that mode is not supported in ceived by programs that do not support them (such as ver- most products. sions of Internet Explorer before )[15] the soft- As the typical use of JPEG is a lossy compression ware displays the image only after it has been completely method, which somewhat reduces the image fidelity, it downloaded. should not be used in scenarios where the exact repro- There are also many medical imaging and traffic systems duction of the data is required (such as some scientific that create and process 12-bit JPEG images, normally and medical imaging applications and certain technical images. The 12-bit JPEG format has been part image processing work). of the JPEG specification for some time, but this format JPEG is also not well suited to files that will undergo mul- is not as widely supported. tiple edits, as some image quality will usually be lost each time the image is decompressed and recompressed, par- ticularly if the image is cropped or shifted, or if encoding parameters are changed – see digital for details. To avoid this, an image that is being modified or may be modified in the future can be saved in a lossless 1.3.1 Lossless editing format, with a copy exported as JPEG for distribution. See also: jpegtran

1.3 JPEG compression A number of alterations to a JPEG image can be per- formed losslessly (that is, without recompression and the JPEG uses a lossy form of compression based on the associated quality loss) as long as the image size is a mul- discrete cosine transform (DCT). This mathematical op- tiple of 1 MCU block (Minimum Coded Unit) (usually 16 eration converts each frame/field of the video source from pixels in both directions, for 4:2:0 chroma subsampling). the spatial (2D) domain into the (a.k.a. Utilities that implement this include jpegtran, with user transform domain). A perceptual model based loosely on interface Jpegcrop, and the JPG_TRANSFORM plugin the human psychovisual system discards high-frequency to IrfanView. information, i.e. sharp transitions in intensity, and color Blocks can be rotated in 90-degree increments, flipped hue. In the transform domain, the process of reducing in- in the horizontal, vertical and diagonal axes and moved formation is called quantization. In simpler terms, quan- about in the image. Not all blocks from the original image tization is a method for optimally reducing a large number need to be used in the modified one. scale (with different occurrences of each number) into a smaller one, and the transform-domain is a convenient The top and left edge of a JPEG image must lie on an 8 representation of the image because the high-frequency × 8 pixel block boundary, but the bottom and right edge coefficients, which contribute less to the overall picture need not do so. This limits the possible lossless crop op- than other coefficients, are characteristically small-values erations, and also prevents flips and rotations of an image with high compressibility. The quantized coefficients are whose bottom or right edge does not lie on a block bound- then sequenced and losslessly packed into the output bit- ary for all channels (because the edge would end up on top stream. Nearly all software implementations of JPEG or left, where – as aforementioned – a block boundary is permit user control over the compression-ratio (as well obligatory). as other optional parameters), allowing the user to trade Rotations where the image is not a multiple of 8 or 16, off picture-quality for smaller file size. In embedded ap- which value depends upon the chroma subsampling, are plications (such as miniDV, which uses a similar DCT- not lossless. Rotating such an image causes the blocks to compression scheme), the parameters are pre-selected be recomputed which results in loss of quality.[16] and fixed for the application. When using lossless cropping, if the bottom or right side The compression method is usually lossy, meaning that of the crop region is not on a block boundary then the some original image information is lost and cannot be re- rest of the data from the partially used blocks will still be stored, possibly affecting image quality. There is an op- present in the cropped file and can be recovered. It is also tional lossless mode defined in the JPEG standard. How- possible to transform between baseline and progressive ever, this mode is not widely supported in products. formats without any loss of quality, since the only differ- There is also an interlaced progressive JPEG format, in ence is the order in which the coefficients are placed in which data is compressed in multiple passes of progres- the file. sively higher detail. 
This is ideal for large images that Furthermore, several JPEG images can be losslessly will be displayed while downloading over a slow connec- joined together, as long as they were saved with the same tion, allowing a reasonable after receiving only quality and the edges coincide with block boundaries. 1.5. SYNTAX AND STRUCTURE 3

1.4 JPEG files most JPEG files contain a JFIF marker segment that pre- cedes the Exif header. This allows older readers to cor- The file format known as “JPEG Interchange Format” rectly handle the older format JFIF segment, while newer (JIF) is specified in Annex B of the standard. However, readers also decode the following Exif segment, being this “pure” file format is rarely used, primarily because of less strict about requiring it to appear first. the difficulty of programming encoders and decoders that fully implement all aspects of the standard and because of certain shortcomings of the standard: 1.4.1 JPEG filename extensions

• Color space definition The most common filename extensions for files employ- ing JPEG compression are .jpg and .jpeg, though .jpe, • Component sub-sampling registration .jfif and .jif are also used. It is also possible for JPEG data to be embedded in other file types – TIFF encoded • Pixel aspect ratio definition. files often embed a JPEG image as a thumbnail of the main image; and MP3 files can contain a JPEG of cover Several additional standards have evolved to address these art, in the ID3v2 tag. issues. The first of these, released in 1992, was JPEG File Interchange Format (or JFIF), followed in recent years by Exchangeable image file format (Exif) and ICC color pro- 1.4.2 Color profile files. Both of these formats use the actual JIF byte lay- out, consisting of different markers, but in addition em- Many JPEG files embed an ICC color profile (color ploy one of the JIF standard’s extension points, namely space). Commonly used color profiles include sRGB and the application markers: JFIF uses APP0, while Exif uses Adobe RGB. Because these color spaces use a non-linear APP1. Within these segments of the file, that were left transformation, the of an 8-bit JPEG file for future use in the JIF standard and aren't read by it, is about 11 stops; see gamma curve. these standards add specific . Thus, in some ways JFIF is a cutdown version of the JIF standard in that it specifies certain constraints (such as 1.5 Syntax and structure not allowing all the different encoding modes), while in other ways it is an extension of JIF due to the added meta- A JPEG image consists of a sequence of segments, each data. The documentation for the original JFIF standard beginning with a marker, each of which begins with a states:[17] 0xFF byte followed by a byte indicating what kind of marker it is. Some markers consist of just those two JPEG File Interchange Format is a minimal file bytes; others are followed by two bytes (high then low) format which enables JPEG bitstreams to be ex- indicating the length of marker-specific payload data that changed between a wide variety of platforms follows. (The length includes the two bytes for the length, and applications. This minimal format does not but not the two bytes for the marker.) Some markers include any of the advanced features found in are followed by entropy-coded data; the length of such the TIFF JPEG specification or any application a marker does not include the entropy-coded data. Note specific file format. Nor should it, for the only that consecutive 0xFF bytes are used as fill bytes for purpose of this simplified format is to allow the padding purposes, although this fill byte padding should exchange of JPEG compressed images. only ever take place for markers immediately following entropy-coded scan data (see JPEG specification section Image files that employ JPEG compression are commonly B.1.1.2 and E.1.2 for details; specifically “In all cases called “JPEG files”, and are stored in variants of the JIF where markers are appended after the compressed data, image format. Most image capture devices (such as digi- optional 0xFF fill bytes may precede the marker”). tal cameras) that output JPEG are actually creating files in Within the entropy-coded data, after any 0xFF byte, a the Exif format, the format that the camera industry has 0x00 byte is inserted by the encoder before the next byte, standardized on for metadata interchange. 
On the other so that there does not appear to be a marker where none hand, since the Exif standard does not allow color pro- is intended, preventing framing errors. Decoders must files, most software stores JPEG in JFIF skip this 0x00 byte. This technique, called byte stuffing format, and also include the APP1 segment from the Exif (see JPEG specification section F.1.2.3), is only applied file to include the metadata in an almost-compliant way; [18] to the entropy-coded data, not to marker payload data. the JFIF standard is interpreted somewhat flexibly. Note however that entropy-coded data has a few markers Strictly speaking, the JFIF and Exif standards are incom- of its own; specifically the Reset markers (0xD0 through patible because each specifies that its marker segment 0xD7), which are used to isolate independent chunks of (APP0 or APP1, respectively) appear first. In practice, entropy-coded data to allow parallel decoding, and en- 4 CHAPTER 1. JPEG

coders are free to insert these Reset markers at regular 5. The resulting data for all 8×8 blocks is further intervals (although not all encoders do this). compressed with a lossless algorithm, a variant of There are other Start Of Frame markers that introduce Huffman encoding. other kinds of JPEG encodings. Since several vendors might use the same APPn marker The decoding process reverses these steps, except the type, application-specific markers often begin with a quantization because it is irreversible. In the remainder standard or vendor name (e.g., “Exif” or “Adobe”) or of this section, the encoding and decoding processes are some other identifying string. described in more detail. At a restart marker, block-to-block predictor variables are reset, and the bitstream is synchronized to a byte bound- ary. Restart markers provide means for recovery after bitstream error, such as transmission over an unreliable 1.6.1 Encoding network or file corruption. Since the runs of macroblocks between restart markers may be independently decoded, Many of the options in the JPEG standard are not com- these runs may be decoded in parallel. monly used, and as mentioned above, most image soft- ware uses the simpler JFIF format when creating a JPEG file, which among other things specifies the encoding method. Here is a brief description of one of the more 1.6 JPEG codec example common methods of encoding when applied to an input that has 24 bits per pixel (eight each of , , and Although a JPEG file can be encoded in various ways, ). This particular option is a lossy most commonly it is done with JFIF encoding. The en- method. coding process consists of several steps:

1. The representation of the colors in the image is con- Color space transformation verted from RGB to Y′CBCR, consisting of one component (Y'), representing , and First, the image should be converted from RGB into two chroma components, (CB and CR), represent- a different color space called Y′CBCR (or, informally, ing color. This step is sometimes skipped. YCbCr). It has three components Y', CB and CR: the Y' component represents the brightness of a pixel, and 2. The resolution of the chroma data is reduced, usu- the CB and CR components represent the chrominance ally by a factor of 2 or 3. This reflects the fact that (split into blue and red components). This is basically the eye is less sensitive to fine color details than to the same color space as used by digital color as fine brightness details. well as including video , and is simi- lar to the way color is represented in analog PAL video 3. The image is split into blocks of 8×8 pixels, and for and MAC (but not by analog NTSC, which uses the YIQ each block, each of the Y, CB, and CR data un- color space). The Y′CBCR color space conversion allows dergoes the discrete cosine transform (DCT), which greater compression without a significant effect on per- was developed in 1974 by N. Ahmed, T. Natara- ceptual image quality (or greater perceptual image quality jan and K. R. Rao; see Citation 1 in discrete cosine for the same compression). The compression is more ef- transform. A DCT is similar to a Fourier transform ficient because the brightness information, which is more in the sense that it produces a kind of spatial fre- important to the eventual perceptual quality of the image, quency spectrum. is confined to a single . This more closely cor- 4. The amplitudes of the frequency components are responds to the perception of color in the human visual quantized. Human vision is much more sensitive to system. The color transformation also improves compres- small variations in color or brightness over large ar- sion by statistical decorrelation. eas than to the strength of high-frequency brightness A particular conversion to Y′CBCR is specified in the variations. Therefore, the magnitudes of the high- JFIF standard, and should be performed for the result- frequency components are stored with a lower accu- ing JPEG file to have maximum compatibility. However, racy than the low-frequency components. The qual- some JPEG implementations in “highest quality” mode ity setting of the encoder (for example 50 or 95 on do not apply this step and instead keep the color informa- a scale of 0–100 in the Independent JPEG Group’s tion in the RGB , where the image is stored library[20]) affects to what extent the resolution of in separate channels for red, green and blue brightness each frequency component is reduced. If an exces- components. This results in less efficient compression, sively low quality setting is used, the high-frequency and would not likely be used when file size is especially components are discarded altogether. important. 1.6. JPEG CODEC EXAMPLE 5

Downsampling

Due to the densities of color- and brightness-sensitive re- ceptors in the human eye, humans can see considerably more fine detail in the brightness of an image (the Y' com- ponent) than in the hue and color saturation of an image (the Cb and Cr components). Using this knowledge, en- coders can be designed to images more effi- ciently. The transformation into the Y′CBCR color model en- ables the next usual step, which is to reduce the spa- tial resolution of the Cb and Cr components (called "downsampling" or "chroma subsampling"). The ratios at which the downsampling is ordinarily done for JPEG images are 4:4:4 (no downsampling), 4:2:2 (reduction by a factor of 2 in the horizontal direction), or (most com- monly) 4:2:0 (reduction by a factor of 2 in both the hor- izontal and vertical directions). For the rest of the com- pression process, Y', Cb and Cr are processed separately The 8×8 sub-image shown in 8-bit grayscale and in a very similar manner.

Block splitting

After subsampling, each channel must be split into 8×8 blocks. Depending on chroma subsampling, this yields Minimum Coded Unit (MCU) blocks of size 8×8 (4:4:4   – no subsampling), 16×8 (4:2:2), or most commonly 52 55 61 66 70 61 64 73   16×16 (4:2:0). In video compression MCUs are called  63 59 55 90 109 85 69 72    macroblocks.  62 59 68 113 144 104 66 73     63 58 71 122 154 106 70 69  If the data for a channel does not represent an integer   .  67 61 68 104 126 88 68 70  number of blocks then the encoder must fill the remaining    79 65 60 70 77 68 58 75  area of the incomplete blocks with some form of dummy   data. Filling the edges with a fixed color (for example, 85 71 64 59 55 61 65 83 ) can create artifacts along the visible part of 87 79 69 68 65 76 78 94 the border; repeating the edge pixels is a common tech- nique that reduces (but does not necessarily completely eliminate) such artifacts, and more sophisticated border filling techniques can also be applied. Before computing the DCT of the 8×8 block, its values are shifted from a positive range to one centered on zero. For an 8-bit image, each entry in the original block falls in the range [0, 255] . The midpoint of the range (in this Discrete cosine transform case, the value 128) is subtracted from each entry to pro- duce a data range that is centered on zero, so that the Next, each 8×8 block of each component (Y, Cb, Cr) modified range is [−128, 127] . This step reduces the dy- is converted to a frequency-domain representation, us- namic range requirements in the DCT processing stage ing a normalized, two-dimensional type-II discrete cosine that follows. (Aside from the difference in dynamic range transform (DCT), which was introduced by N. Ahmed, within the DCT stage, this step is mathematically equiv- T. Natarajan and K. R. Rao in 1974; see Citation 1 in alent to subtracting 1024 from the DC coefficient after discrete cosine transform. The DCT is sometimes re- performing the transform – which may be a better way to ferred to as “type-II DCT” in the context of a family of perform the operation on some architectures since it in- transforms as in discrete cosine transform, and the corre- volves performing only one subtraction rather than 64 of sponding inverse (IDCT) is denoted as “type-III DCT”. them.) As an example, one such 8×8 8-bit subimage might be: This step results in the following values: 6 CHAPTER 1. JPEG

If we perform this transformation on our matrix above, we get the following (rounded to the nearest two digits x beyond the decimal point):  −→  −76 −73 −67 −62 −58 −67 −64 −55    −65 −69 −73 −38 −19 −43 −59 −56      u  −66 −69 −60 −15 16 −24 −62 −55   g =   y. −→  −65 −70 −57 −6 26 −22 −58 −59 y    −415.38 −30.19 −61.20 27.24 56.12 −20.10 −2.39 0.46  −61 −67 −60 −24 −2 −40 −60 −58     4.47 −21.86 −60.76 10.25 13.15 −7.09 −8.54 4.88    −49 −63 −68 −58 −51 −60 −70 −53      −46.83 7.37 77.13 −24.56 −28.91 9.93 5.42 −5.65   −43 −57 −64 −69 −73 −67 −63 −G45=   v.  −48.53 12.07 34.10 −14.76 −10.24 6.30 1.83 1.95  y −41 −49 −59 −60 −63 −52 −50 −34    12.12 −6.55 −13.20 −3.95 −1.87 1.75 −2.79 3.14    The next step is to take the two-dimensional DCT, which  −7.73 2.91 2.38 −5.94 −2.38 0.94 4.30 1.85    is given by: −1.03 0.18 0.42 −2.42 −0.88 −3.02 4.12 −0.66 −0.17 0.14 −1.07 −4.19 −1.17 −0.10 0.50 1.68

Note the top-left corner entry with the rather large magni- tude. This is the DC coefficient (also called the constant component), which defines the basic hue for the entire block. The remaining 63 coefficients are the AC coef- ficients (also called the alternating components).[21] The advantage of the DCT is its tendency to aggregate most of the signal in one corner of the result, as may be seen above. The quantization step to follow accentuates this effect while simultaneously reducing the overall size of the DCT coefficients, resulting in a signal that is easy to compress efficiently in the entropy stage. The DCT temporarily increases the bit-depth of the data, since the DCT coefficients of an 8-bit/component image take up to 11 or more bits (depending on fidelity of the DCT calculation) to store. This may force the codec to temporarily use 16-bit numbers to hold these coefficients, doubling the size of the image representation at this point; The DCT transforms an 8×8 block of input values to a linear these values are typically reduced back to 8-bit values by combination of these 64 patterns. The patterns are referred to as the quantization step. The temporary increase in size at the two-dimensional DCT basis functions, and the output values this stage is not a performance concern for most JPEG are referred to as transform coefficients. The horizontal index is implementations, since typically only a very small part of u and the vertical index is v . the image is stored in full DCT form at any given time during the image encoding or decoding process.

[ ] [ ] 1 ∑7 ∑7 (2x + 1)uπ (2y + 1)vπ G = α(u)α(v) g cos cos Quantization u,v 4 x,y 16 16 x=0 y=0 The human eye is good at seeing small differences in where brightness over a relatively large area, but not so good at distinguishing the exact strength of a high frequency • u is the horizontal spatial frequency, for the integers brightness variation. This allows one to greatly reduce 0 ≤ u < 8 . the amount of information in the high frequency compo- nents. This is done by simply dividing each component in • v is the vertical spatial frequency, for the integers the frequency domain by a constant for that component, 0 ≤ v < 8 . { and then rounding to the nearest integer. This rounding √1 , if u = 0 operation is the only lossy operation in the whole process • α(u) = 2 is a normalizing scale (other than chroma subsampling) if the DCT computa- 1, otherwise tion is performed with sufficiently high precision. As a factor to make the transformation orthonormal result of this, it is typically the case that many of the higher frequency components are rounded to zero, and • gx,y is the pixel value at coordinates (x, y) many of the rest become small positive or num- • Gu,v is the DCT coefficient at coordinates (u, v). bers, which take many fewer bits to represent. 1.6. JPEG CODEC EXAMPLE 7

The elements in the quantization matrix control the com- Entropy coding pression ratio, with larger values producing greater com- pression. A typical quantization matrix (for a quality of Main article: 50% as specified in the original JPEG Standard), is as Entropy coding is a special form of lossless data com- follows:

  16 11 10 16 24 40 51 61   12 12 14 19 26 58 60 55    14 13 16 24 40 57 69 56    14 17 22 29 51 87 80 62  Q =  . 18 22 37 56 68 109 103 77    24 35 55 64 81 104 113 92  49 64 78 87 103 121 120 101 72 92 95 98 112 100 103 99 The quantized DCT coefficients are computed with

( ) Gj,k Bj,k = round for j = 0, 1, 2,..., 7; k = 0, 1, 2,..., 7 Qj,k where G is the unquantized DCT coefficients; Q is the quantization matrix above; and B is the quantized DCT coefficients. Using this quantization matrix with the DCT coefficient Zigzag ordering of JPEG image components matrix from above results in: pression. It involves arranging the image components in a "zigzag" order employing run-length encoding (RLE) algorithm that groups similar frequencies together, insert- ing length coding zeros, and then using Huffman coding on what is left. The JPEG standard also allows, but does not require, de- coders to support the use of , which is Left: a final image is built up from a series of basis functions. mathematically superior to Huffman coding. However, Right: each of the DCT basis functions that comprise the image, this feature has rarely been used, as it was historically and the corresponding weighting coefficient. Middle: the basis covered by patents requiring royalty-bearing licenses, and function, after multiplication by the coefficient: this component is because it is slower to encode and decode compared to added to the final image. For clarity, the 8x8 macroblock in this Huffman coding. Arithmetic coding typically makes files example is magnified by 10x using bilinear interpolation. about 5–7% smaller. The previous quantized DC coefficient is used to predict   the current quantized DC coefficient. The difference be- −26 −3 −6 2 2 −1 0 0   tween the two is encoded rather than the actual value. The  0 −2 −4 1 1 0 0 0    encoding of the 63 quantized AC coefficients does not  −3 1 5 −1 −1 0 0 0    use such prediction differencing.  −3 1 2 −1 0 0 0 0  B =   .  1 0 0 0 0 0 0 0  The zigzag sequence for the above quantized coefficients    0 0 0 0 0 0 0 0  are shown below. (The format shown is just for ease of  0 0 0 0 0 0 0 0  understanding/viewing.) 0 0 0 0 0 0 0 0 For example, using −415 (the DC coefficient) and round- ing to the nearest integer

If the i-th block is represented by Bi and positions within ( ) each block are represented by (p, q) where p = 0, 1, ..., 7 −415.37 round = round (−25.96) = −26. and q = 0, 1, ..., 7 , then any coefficient in the DCT im- 16 age can be represented as Bi(p, q) . Thus, in the above Notice that most of the higher-frequency elements of the scheme, the order of encoding pixels (for the i-th block) sub-block (i.e., those with an x or y spatial frequency is Bi(0, 0) , Bi(0, 1) , Bi(1, 0) , Bi(2, 0) , Bi(1, 1) , greater than 4) are compressed into zero values. Bi(0, 2) , Bi(0, 3) , Bi(1, 2) and so on. 8 CHAPTER 1. JPEG

• AMPLITUDE is the bit-representation of x.

The run-length encoding works by examining each non- zero AC coefficient x and determining how many zeroes came before the previous AC coefficient. With this in- formation, two symbols are created:

Both RUNLENGTH and SIZE rest on the same byte, Baseline sequential JPEG encoding and decoding processes meaning that each only contains four bits of information. The higher bits deal with the number of zeroes, while the This encoding mode is called baseline sequential encod- lower bits denote the number of bits necessary to encode ing. Baseline JPEG also supports progressive encoding. the value of x. While sequential encoding encodes coefficients of a This has the immediate implication of Symbol 1 being single block at a time (in a zigzag manner), progressive only able store information regarding the first 15 zeroes encoding encodes similar-positioned batch of coeffi- preceding the non-zero AC coefficient. However, JPEG cients of all blocks in one go (called a scan), followed defines two special Huffman code words. One is for end- by the next batch of coefficients of all blocks, and so ing the sequence prematurely when the remaining co- on. For example, if the image is divided into N 8×8 efficients are zero (called “End-of-Block” or “EOB”), blocks B0,B1,B2, ..., Bn−1 , then a 3-scan progressive and another when the run of zeroes goes beyond 15 be- encoding encodes DC component, Bi(0, 0) for all fore reaching a non-zero AC coefficient. In such a case blocks, i.e., for all i = 0, 1, 2, ..., N − 1 , in first scan. where 16 zeroes are encountered before a given non-zero This is followed by the second scan which encoding AC coefficient, Symbol 1 is encoded “specially” as: (15, a few more components (assuming four more compo- 0)(0). nents, they are B (0, 1) to B (1, 1) , still in a zigzag i i The overall process continues until “EOB” – denoted by manner) coefficients of all blocks (so the sequence is: (0, 0) – is reached. B0(0, 1),B0(1, 0),B0(2, 0),B0(1, 1),B1(0, 1),B1(1, 0), ..., BN (2, 0),BN (1, 1) ), followed by all the remained coefficients of all blocks With this in mind, the sequence from earlier becomes: in the last scan. It should be noted here that once all similar-positioned co- (0, 2)(−3); (1, 2)(−3); (0, 2)(−2); (0, 3)(−6); efficients have been encoded, the next position to be en- (0, 2)(2); (0, 3)(−4); (0, 1)(1); (0, 2)(−3); (0, coded is the one occurring next in the zigzag traversal as 1)(1); indicated in the figure above. It has been found that base- (0, 1)(1); (0, 3)(5); (0, 1)(1); (0, 2)(2); (0, line progressive JPEG encoding usually gives better com- 1)(−1); (0, 1)(1); (0, 1)(−1); (0, 2)(2); (5, pression as compared to baseline sequential JPEG due to 1)(−1); the ability to use different Huffman tables (see below) tai- lored for different frequencies on each “scan” or “pass” (0, 1)(−1); (0, 0). (which includes similar-positioned coefficients), though the difference is not too large. (The first value in the matrix, −26, is the DC coefficient; In the rest of the article, it is assumed that the coefficient it is not encoded the same way. See above.) pattern generated is due to sequential mode. From here, frequency calculations are made based on oc- In order to encode the above generated coefficient pat- currences of the coefficients. In our example block, most tern, JPEG uses Huffman encoding. The JPEG standard of the quantized coefficients are small numbers that are provides general-purpose Huffman tables; encoders may not preceded immediately by a zero coefficient. These also choose to generate Huffman tables optimized for the more-frequent cases will be represented by shorter code actual frequency distributions in images being encoded. words. 
The process of encoding the zig-zag quantized data be- gins with a run-length encoding explained below, where: 1.6.2 Compression ratio and artifacts

• x is the non-zero, quantized AC coefficient. The resulting compression ratio can be varied according • RUNLENGTH is the number of zeroes that came be- to need by being more or less aggressive in the divisors fore this non-zero AC coefficient. used in the quantization phase. Ten to one compression usually results in an image that cannot be distinguished • SIZE is the number of bits required to represent x. by eye from the original. A compression ration of 100:1 1.6. JPEG CODEC EXAMPLE 9

This image shows the pixels that are different between a non- compressed image and the same image JPEG compressed with a quality setting of 50. Darker means a larger difference. Note especially the changes occurring near sharp edges and having a block-like shape.

The compressed 8×8 squares are visible in the scaled-up picture, together with other visual artifacts of the lossy compression

mosquito , as the resulting “edge busyness” and spu- rious dots, which change over time, resemble mosquitoes swarming around the object.[22][23] These artifacts can be reduced by choosing a lower level of compression; they may be completely avoided by sav- ing an image using a lossless file format, though this will result in a larger file size. The images created with ray- tracing programs have noticeable blocky shapes on the terrain. Certain low-intensity compression artifacts might be acceptable when simply viewing the images, but can be The original image emphasized if the image is subsequently processed, usu- ally resulting in unacceptable quality. Consider the exam- ple below, demonstrating the effect of lossy compression is usually possible, but will look distinctly artifacted com- on an edge detection processing step. pared to the original. The appropriate level of compres- Some programs allow the user to vary the amount by sion depends on the use to which the image will be put. which individual blocks are compressed. Stronger com- Those who use the World Wide Web may be familiar with pression is applied to areas of the image that show fewer the irregularities known as compression artifacts that ap- artifacts. This way it is possible to manually reduce JPEG pear in JPEG images, which may take the form of noise file size with less loss of quality. around contrasting edges (especially curves and corners), Since the quantization stage always results in a loss of in- or “blocky” images. These are due to the quantization formation, JPEG standard is always a lossy compression step of the JPEG algorithm. They are especially no- codec. (Information is lost both in quantizing and round- ticeable around sharp corners between contrasting colors ing of the floating-point numbers.) Even if the quantiza- (text is a good example, as it contains many such corners). tion matrix is a matrix of ones, information will still be The analogous artifacts in MPEG video are referred to as lost in the rounding step. 10 CHAPTER 1. JPEG

1.6.3 Decoding

Decoding to display the image consists of doing all the above in reverse. Taking the DCT coefficient matrix (after adding the dif- ference of the DC coefficient back in)

  −26 −3 −6 2 2 −1 0 0    0 −2 −4 1 1 0 0 0    Notice  −3 1 5 −1 −1 0 0 0    the slight differences between the original (top) and  −3 1 2 −1 0 0 0 0    decompressed image (bottom), which is most readily  1 0 0 0 0 0 0 0    seen in the bottom-left corner.  0 0 0 0 0 0 0 0      0 0 0 0 0 0 0 0 −66 −63 −71 −68 −56 −65 −68 −46 0 0 0 0 0 0 0 0    −71 −73 −72 −46 −20 −41 −66 −57     −70 −78 −68 −17 20 −14 −61 −63  and taking the entry-for-entry product with the quantiza-    −63 −73 −62 −8 27 −14 −60 −58  tion matrix from above results in    −58 −65 −61 −27 −6 −40 −68 −50     −57 −57 −64 −58 −48 −66 −72 −47      −53 −46 −61 −74 −65 −63 −62 −45 −416 −33 −60 32 48 −40 0 0 −47 −34 −53 −74 −60 −47 −47 −41    0 −24 −56 19 26 0 0 0    and adding 128 to each entry  −42 13 80 −24 −40 0 0 0     −42 17 44 −29 0 0 0 0       18 0 0 0 0 0 0 0  62 65 57 60 72 63 60 82  0 0 0 0 0 0 0 0       57 55 56 82 108 87 62 71   0 0 0 0 0 0 0 0     58 50 60 111 148 114 67 65  0 0 0 0 0 0 0 0    65 55 66 120 155 114 68 70    .  70 63 67 101 122 88 60 78    which closely resembles the original DCT coefficient ma-  71 71 64 70 80 62 56 81  trix for the top-left portion.  75 82 67 54 63 65 66 83  The next step is to take the two-dimensional inverse DCT 81 94 75 54 68 81 81 87 (a 2D type-III DCT), which is given by: ∑ ∑ [ ] [This is the] decompressed subimage. In general, the de- 1 7 7 (2x+1)uπ (2y+1)vπ fx,y = 4 u=0 v=0 α(u)α(v)Fu,v cos 16 cos compression16 process may produce values outside the orig- inal input range of [0, 255] . If this occurs, the decoder where needs to clip the output values keep them within that range to prevent overflow when storing the decompressed • x is the pixel row, for the integers 0 ≤ x < 8 . image with the original bit depth. The decompressed subimage can be compared to the • y is the pixel column, for the integers 0 ≤ y < 8 . original subimage (also see images to the right) by tak- ing the difference (original − uncompressed) results in the following error values: • α(u) is defined as above, for the integers 0 ≤ u < 8 .   −10 −10 4 6 −2 −2 4 −9 •   Fu,v is the reconstructed approximate coefficient at  6 4 −1 8 1 −2 7 1    coordinates (u, v).  4 9 8 2 −4 −10 −1 8     −2 3 5 2 −1 −8 2 −1    • fx,y is the reconstructed pixel value at coordinates  −3 −2 1 3 4 0 8 −8    (x, y)  8 −6 −4 −0 −3 6 2 −6   10 −11 −3 5 −8 −4 −1 −0  6 −15 −6 14 −3 −5 −3 7 Rounding the output to integer values (since the original had integer values) results in an image with values (still with an average∑ ∑ absolute error of about 5 values per pixels 1 7 7 | | shifted down by 128) (i.e., 64 x=0 y=0 e(x, y) = 4.8750 ). 1.8. LOSSLESS FURTHER COMPRESSION 11

The error is most noticeable in the bottom-left corner 1.7.1 Sample photographs where the bottom-left pixel becomes darker than the pixel to its immediate right. For information, the uncompressed 24-bit RGB image below (73,242 pixels) would require 219,726 bytes (excluding all other information headers). The filesizes 1.6.4 Required precision indicated below include the internal JPEG information headers and some meta-data. For highest quality im- The encoding description in the JPEG standard does not ages (Q=100), about 8.25 bits per color pixel is required. fix the precision needed for the output compressed im- On grayscale images, a minimum of 6.5 bits per pixel age. However, the JPEG standard (and the similar MPEG is enough (a comparable Q=100 quality color informa- standards) includes some precision requirements for the tion requires about 25% more encoded bits). The high- decoding, including all parts of the decoding process est quality image below (Q=100) is encoded at nine bits (variable length decoding, inverse DCT, dequantization, per color pixel, the medium quality image (Q=25) uses renormalization of outputs); the output from the refer- one bit per color pixel. For most applications, the quality ence algorithm must not exceed: factor should not go below 0.75 bit per pixel (Q=12.5), as demonstrated by the low quality image. The image at • a maximum of one bit of difference for each pixel lowest quality uses only 0.13 bit per pixel, and displays component very poor color. This is useful when the image will be displayed in a significantly scaled-down size. A method • low mean square error over each 8×8-pixel block for creating better quantization matrices for a given image • very low mean error over each 8×8-pixel block quality using PSNR instead of the Q factor is described in Minguillón & Pujol (2001).[24] • very low mean square error over the whole image

• extremely low mean error over the whole image

These assertions are tested on a large set of randomized The medium quality photo uses only 4.3% of the storage input images, to handle the worst cases. The former IEEE space required for the uncompressed image, but has lit- 1180–1990 standard contained some similar precision re- tle noticeable loss of detail or visible artifacts. However, quirements. The precision has a consequence on the im- once a certain threshold of compression is passed, com- plementation of decoders, and it is critical because some pressed images show increasingly visible defects. See the encoding processes (notably used for encoding sequences article on rate– theory for a mathematical ex- of images like MPEG) need to be able to construct, on planation of this threshold effect. A particular limitation the encoder side, a reference decoded image. In order of JPEG in this regard is its non-overlapped 8×8 block to support 8-bit precision per pixel component output, transform structure. More modern designs such as JPEG dequantization and inverse DCT transforms are typically 2000 and JPEG XR exhibit a more graceful degradation implemented with at least 14-bit precision in optimized of quality as the bit usage decreases – by using transforms decoders. with a larger spatial extent for the lower frequency coeffi- cients and by using overlapping transform basis functions.

1.7 Effects of JPEG compression 1.8 Lossless further compression JPEG compression artifacts blend well into photographs with detailed non-uniform textures, allowing higher com- From 2004 to 2008 new research emerged on ways to fur- pression ratios. Notice how a higher compression ra- ther compress the data contained in JPEG images with- tio first affects the high-frequency textures in the upper- out modifying the represented image.[25][26][27][28] This left corner of the image, and how the contrasting lines has applications in scenarios where the original image is become more fuzzy. The very high compression ra- only available in JPEG format, and its size needs to be tio severely affects the quality of the image, although reduced for archiving or transmission. Standard general- the overall colors and image form are still recognizable. purpose compression tools cannot significantly compress However, the precision of colors suffer less (for a human JPEG files. eye) than the precision of contours (based on luminance). Typically, such schemes take advantage of improvements This justifies the fact that images should be first trans- to the naive scheme for coding DCT coefficients, which formed in a color model separating the luminance from fails to take into account: the chromatic information, before subsampling the chro- matic planes (which may also use lower quality quantiza- tion) in order to preserve the precision of the luminance • Correlations between magnitudes of adjacent coef- plane with more information bits. ficients in the same block; 12 CHAPTER 1. JPEG

• Correlations between magnitudes of the same coef- Stereoscopic (JPS, extension .jps) is a JPEG-based for- ficient in adjacent blocks; mat for stereoscopic images.[30][31] It has a range of con- figurations stored in the JPEG APP3 marker field, but • Correlations between magnitudes of the same coef- usually contains one image of double width, representing ficient/block in different channels; two images of identical size in cross-eyed (i.e. left frame on the right half of the image and vice versa) side-by-side • The DC coefficients when taken together resemble arrangement. This file format can be viewed as a JPEG a downscale version of the original image multi- without any special software, or can be processed for ren- plied by a scaling factor. Well-known schemes for dering in other modes. lossless coding of continuous-tone images can be ap- plied, achieving somewhat better compression than the Huffman coded DPCM used in JPEG. 1.9.2 JPEG Multi-Picture Format

Some standard but rarely used options already exist in JPEG Multi-Picture Format (MPO, extension .mpo) is a JPEG to improve the efficiency of coding DCT coeffi- JPEG-based format for multi-view images. It contains [32][33] cients: the arithmetic coding option, and the progres- two or more JPEG files concatenated together. sive coding option (which produces lower bitrates be- There are also special EXIF fields describing its purpose. cause values for each coefficient are coded independently, This is used by the Fujifilm FinePix Real 3D W1 cam- and each coefficient has a significantly different distri- era, Lumix DMC-TZ20, DMC-TZ30, DMC- bution). Modern methods have improved on these tech- TZ60& DMC-TS4 (FT4), DSC-HX7V, HTC Evo niques by reordering coefficients to group coefficients of 3D, the JVC GY-HMZ1U AVCHD/MVC extension larger magnitude together;[25] using adjacent coefficients and by the Nintendo 3DS for its 3D Camera. and blocks to predict new coefficient values;[27] dividing In the last few years, due to the growing use of stereo- blocks or coefficients up among a small number of in- scopic images, much effort has been spent by the sci- dependently coded models based on their statistics and entific community to develop algorithms for stereoscopic [34][35] adjacent values;[26][27] and most recently, by decoding . blocks, predicting subsequent blocks in the spatial do- main, and then encoding these to generate predictions for DCT coefficients.[28] 1.10 Patent issues Typically, such methods can compress existing JPEG files between 15 and 25 percent, and for JPEGs compressed In 2002, Forgent Networks asserted that it owned and at low-quality settings, can produce improvements of up would enforce patent rights on the JPEG technology, aris- to 65%.[27][28] ing from a patent that had been filed on October 27, 1986, and granted on October 6, 1987 (U.S. Patent 4,698,672). A freely available tool called packJPG[29] is based on the The announcement created a furor reminiscent of Unisys' 2007 paper “Improved Redundancy Reduction for JPEG attempts to assert its rights over the GIF image compres- Files.” sion standard. The JPEG committee investigated the patent claims in 2002 and were of the opinion that they were invalidated 1.9 Derived formats for stereo- by prior art.[36] Others also concluded that Forgent did scopic 3D not have a patent that covered JPEG.[37] Nevertheless, between 2002 and 2004 Forgent was able to obtain about 1.9.1 JPEG Stereoscopic US$105 million by licensing their patent to some 30 com- panies. In April 2004, Forgent sued 31 other companies to enforce further license payments. In July of the same year, a consortium of 21 large computer companies filed a countersuit, with the goal of invalidating the patent. In addition, launched a separate lawsuit against Forgent in April 2005.[38] In February 2006, the United States Patent and Trademark Office agreed to re-examine Forgent’s JPEG patent at the request of the Public Patent [39] An example of a stereoscopic .JPS file Foundation. On May 26, 2006 the USPTO found the patent invalid based on prior art. The USPTO also found JPS is a stereoscopic JPEG image used for creating 3D that Forgent knew about the prior art, and did not tell the Patent Office, making any appeal to reinstate the patent effects from 2D images. It contains two static images, [40] one for the left eye and one for the right eye; encoded highly unlikely to succeed. as two side-by-side images in a single JPG file. 
JPEG Forgent also possesses a similar patent granted by the Eu- 1.11. IMPLEMENTATIONS 13 ropean Patent Office in 1994, though it is unclear how separate grounds.[49] On Nov. 24, 2009, a Reexamina- enforceable it is.[41] tion Certificate was issued cancelling all claims. As of October 27, 2006, the U.S. patent’s 20-year term Beginning in 2011 and continuing as of early 2013, an appears to have expired, and in November 2006, Forgent entity known as Princeton Corporation,[50] agreed to abandon enforcement of patent claims against based in Eastern Texas, began suing large numbers use of the JPEG standard.[42] of companies for alleged infringement of U.S. Patent 4,813,056. Princeton claims that the JPEG image com- The JPEG committee has as one of its explicit goals that their standards (in particular their baseline methods) be pression standard infringes the '056 patent and has sued implementable without payment of license fees, and they large numbers of websites, retailers, camera and device have secured appropriate license rights for their JPEG manufacturers and resellers. The patent was originally 2000 standard from over 20 large organizations. owned and assigned to General Electric. The patent ex- pired in December 2007, but Princeton has sued large Beginning in August 2007, another company, Global numbers of companies for “past infringement” of this Patent Holdings, LLC claimed that its patent (U.S. Patent patent. (Under U.S. patent laws, a patent owner can sue 5,253,341) issued in 1993, is infringed by the download- for “past infringement” up to six years before the filing ing of JPEG images on either a website or through e-mail. of a lawsuit, so Princeton could theoretically have con- If not invalidated, this patent could apply to any website tinued suing companies until December 2013.) As of that displays JPEG images. The patent emerged in July March 2013, Princeton had suits pending in New York 2007 following a seven-year reexamination by the U.S. and Delaware against more than 55 companies. General Patent and Trademark Office in which all of the original Electric’s involvement in the suit is unknown, although claims of the patent were revoked, but an additional claim court records indicate that it assigned the patent to Prince- (claim 17) was confirmed.[43] ton in 2009 and retains certain rights in the patent.[51] In its first two lawsuits following the reexamination, both filed in Chicago, Illinois, Global Patent Holdings sued the Green Bay Packers, CDW, Motorola, Apple, Orb- 1.11 Implementations itz, Officemax, Caterpillar, Kraft and Peapod as defen- dants. A third lawsuit was filed on December 5, 2007 A very important implementation of a JPEG codec is in South Florida against ADT Security Services, Auto- the free programming library of the Independent Nation, Florida Crystals Corp., HearUSA, MovieTick- JPEG Group. It was first published in 1991 and was key ets.com, Ocwen Financial Corp. and Tire Kingdom, and for the success of the standard.[52] This library or a direct a fourth lawsuit on January 8, 2008 in South Florida derivative of it is used in countless applications. against the Boca Raton Resort & Club. A fifth lawsuit was filed against Global Patent Holdings in Nevada. That law- suit was filed by Zappos.com, Inc., which was allegedly threatened by Global Patent Holdings, and seeks a judi- 1.12 See also cial declaration that the '341 patent is invalid and not in- fringed. 
1.12 See also

• BPG, a new format based on intra-frame encoding of the HEVC video compression format
• C-Cube, an early implementer of JPEG in chip form
• Comparison of graphics file formats
• Comparison of layout engines (graphics)
• Deblocking filter (video) – similar deblocking methods can be applied to JPEG
• Design rule for Camera File system (DCF)
• File extensions
• Graphics editing program
• High Efficiency Image File Format, image container format for HEVC and other image coding formats
• Image compression
• Image file formats

, the traditional standard image used to test [15] “Progressive Decoding Overview”. Microsoft Developer image processing algorithms Network. Microsoft. Retrieved 2012-03-23.

• Lossless Image Codec FELICS [16] “Why You Should Always Rotate Original JPEG Photos Losslessly”. • Motion JPEG [17] “JFIF File Format as PDF” (PDF). • PGF [18] Tom Lane (1999-03-29). “JPEG image compression • PNG FAQ”. Retrieved 2007-09-11. (q. 14: “Why all the argu- ment about file formats?") • WebP [19] “ISO/IEC 10918-1 : 1993(E) p.36”.

[20] Thomas G. Lane. “Advanced Features: Compression pa- 1.13 References rameter selection”. Using the IJG JPEG Library.

[1] “Definition of “JPEG"". Collins English Dictionary. Re- [21] http://forum.doom9.org/showthread.php?p=184647# trieved 23 May 2013. post184647

[2] Haines, Richard F.; Chuang, Sherry L. (1 July 1992). [22] Phuc-Tue Le Dinh and Jacques Patry. Video compression The effects of video compression on acceptability of im- artifacts and MPEG . Video Imaging De- ages for monitoring life sciences experiments (Techni- signLine. February 24, 2006. Retrieved May 28, 2009. cal report). NASA. NASA-TP-3239, A-92040, NAS [23] "3.9 mosquito noise: Form of edge busyness distor- 1.60:3239. Retrieved 13 March 2016. The JPEG still- tion sometimes associated with movement, characterized image-compression levels, even with the large range of by moving artifacts and/or blotchy noise patterns super- 5:1 to 120:1 in this study, yielded equally high levels of imposed over the objects (resembling a mosquito fly- acceptability ing around a person’s head and shoulders).” ITU-T Rec. [3] “HTTP Archive - Interesting Stats”. httparchive.org. Re- P.930 (08/96) Principles of a reference impairment sys- trieved 2016-04-06. tem for video

[4] MIME Type Detection in Internet Explorer: Uploaded [24] Julià Minguillón, Jaume Pujol (April 2001). “JPEG MIME Types (msdn.microsoft.com) standard uniform quantization error modeling with appli- cations to sequential and progressive operation modes”. [5] JPEG File Layout and Format Electronic Imaging. 10 (2): 475–485. Retrieved 10 June 2016. [6] ISO/IEC JTC 1/SC 29 (2009-05-07). “ISO/IEC JTC 1/SC 29/WG 1 – Coding of Still Pictures (SC 29/WG 1 [25] I. Bauermann and E. Steinbacj. Further Lossless Com- Structure)". Retrieved 2009-11-11. pression of JPEG Images. Proc. of Picture Coding Sym- posium (PCS 2004), San Francisco, USA, December 15– [7] ISO/IEC JTC 1/SC 29. “Programme of Work, (Allocated 17, 2004. to SC 29/WG 1)". Retrieved 2009-11-07. [26] N. Ponomarenko, K. Egiazarian, V. Lukin and J. Astola. [8] ISO. “JTC 1/SC 29 – Coding of audio, picture, multime- Additional of JPEG Images, Proc. dia and hypermedia information”. Retrieved 2009-11-11. of the 4th Intl. Symposium on Image and Signal Process- [9] JPEG. “Joint Photographic Experts Group, JPEG Home- ing and Analysis (ISPA 2005), Zagreb, Croatia, pp.117– page”. Retrieved 2009-11-08. 120, September 15–17, 2005.

[10] “T.81 : Information technology – Digital compression and [27] M. Stirner and G. Seelmann. Improved Redundancy Re- coding of continuous-tone still images – Requirements duction for JPEG Files. Proc. of Picture Coding Sympo- and guidelines”. Retrieved 2009-11-07. sium (PCS 2007), Lisbon, Portugal, November 7–9, 2007

[11] William B. Pennebaker; Joan L. Mitchell (1993). JPEG [28] Ichiro Matsuda, Yukio Nomoto, Kei Wakabayashi and still image data compression standard (3rd ed.). Springer. Susumu Itoh. Lossless Re-encoding of JPEG images us- p. 291. ISBN 978-0-442-01272-4. ing block-adaptive intra prediction. Proceedings of the 16th European Conference (EUSIPCO [12] ISO. “JTC 1/SC 29 – Coding of audio, picture, multime- 2008). dia and hypermedia information”. Retrieved 2009-11-07. [29] “Latest Binary Releases of packJPG: V2.3a”. January 3, [13] JPEG (2009-04-24). “Press Release – 48th WG1 meet- 2008. ing, Maui, USA – JPEG XR enters FDIS status, JPEG File Interchange Format (JFIF) to be standardized as [30] J. Siragusa, D. C. Swift, “General Purpose Stereoscopic JPEG Part 5”. Retrieved 2009-11-09. Data Descriptor”, VRex, Inc., Elmsford, New York, USA, 1997. [14] “JPEG File Interchange Format (JFIF)". ECMA TR/98 1st ed. Ecma International. 2009. Retrieved 2011-08-01. [31] Tim Kemp, JPS files 1.14. EXTERNAL LINKS 15

[32] “Multi-Picture Format” (PDF). 2009. Retrieved 2015- [50] Workgroup. “Princeton Digital Image Corporation Home 12-30. Page”. Retrieved 2013-05-01.

[33] cybereality, MPO2Stereo: Convert Fujifilm MPO files to [51] Workgroup. “Article on Princeton Court Ruling Regard- JPEG stereo pairs, mtbs3d, retrieved 12 January 2010 ing GE License Agreement”. Retrieved 2013-05-01.

[34] Alessandro Ortis; Sebastiano Battiato, A new fast match- [52] Jpeg.org ing method for adaptive compression of stereoscopic im- ages, SPIE - Three-Dimensional Image Processing, Mea- surement (3DIPM), and Applications 2015, retrieved 30 1.14 External links April 2015

[35] Alessandro Ortis; Francesco Rundo; Giuseppe Di Giore; • JPEG Standard (JPEG ISO/IEC 10918-1 ITU-T Sebastiano Battiato, Adaptive Compression of Stereoscopic Recommendation T.81) at W3.org Images, International Conference on Image Analysis and Processing (ICIAP) 2013, retrieved 30 April 2015 • Official Joint Photographic Experts Group site

[36] “Concerning recent patent claims”. Jpeg.org. 2002-07- • JFIF File Format at W3.org 19. Retrieved 2011-05-29. • JPEG viewer in 250 lines of easy to understand [37] JPEG and JPEG2000 – Between Patent Quarrel and python code Change of Technology at the Wayback Machine (archived August 17, 2004)‹The template Wayback is being • Wotsit.org’s entry on the JPEG format considered for merging.› • Example images over the full range of quantization [38] Kawamoto, Dawn (April 22, 2005). “Graphics patent suit levels from 1 to 100 at visengi.com fires back at Microsoft”. CNET News. Retrieved 2009- • 01-28. Public domain JPEG compressor in a single C++ source file, along with a matching decompressor at [39] “Trademark Office Re-examines Forgent JPEG Patent”. code..com Publish.com. February 3, 2006. Retrieved 2009-01-28. • Example of .JPG file decoding [40] “USPTO: Broadest Claims Forgent Asserts Against JPEG Standard Invalid”. Groklaw.net. May 26, 2006. Re- • Jpeg Decoder Open Source Code , Copyright (C) trieved 2007-07-21. 1995–1997, Thomas G. Lane.

[41] “Coding System for Reducing Redundancy”. • JPEG compression and decompression on GPU. Gauss.ffii.org. Retrieved 2011-05-29.

[42] “JPEG Patent Claim Surrendered”. Public Patent Foun- dation. November 2, 2006. Retrieved 2006-11-03.

[43] Ex Parte Reexamination Certificate for U.S. Patent No. 5,253,341 Archived June 2, 2008, at the Wayback Ma- chine.‹The template Wayback is being considered for merging.›

[44] Workgroup. “Rozmanith: Using Software Patents to Si- lence Critics”. Eupat.ffii.org. Retrieved 2011-05-29.

[45] “A Bounty of $5,000 to Name Troll Tracker: Ray Niro Wants To Know Who Is saying All Those Nasty Things About Him”. Law.com. Retrieved 2011-05-29.

[46] Reimer, Jeremy (2008-02-05). “Hunting trolls: USPTO asked to reexamine broad image patent”. Arstech- nica.com. Retrieved 2011-05-29.

[47] U.S. Patent Office – Granting Reexamination on 5,253,341 C1

[48] “Judge Puts JPEG Patent On Ice”. Techdirt.com. 2008- 04-30. Retrieved 2011-05-29.

[49] “JPEG Patent’s Single Claim Rejected (And Smacked Down For Good Measure)". Techdirt.com. 2008-08-01. Retrieved 2011-05-29. Chapter 2

Color space

A comparison of the chromaticities enclosed by some color spaces.

Comparison of some RGB and CMYK colour gamuts on a CIE 1931 xy chromaticity diagram.

A color space is a specific organization of colors. In combination with physical device profiling, it allows for reproducible representations of color, in both analog and digital representations. A color space may be arbitrary, with particular colors assigned to a set of physical color swatches and corresponding assigned names or numbers such as with the Pantone collection, or structured mathematically, as with the NCS System, Adobe RGB or sRGB.

A color model is an abstract mathematical model describing the way colors can be represented as tuples of numbers (e.g. triples in RGB or quadruples in CMYK); however, a color model with no associated mapping function to an absolute color space is a more or less arbitrary color system with no connection to any globally understood system of color interpretation. Adding a specific mapping function between a color model and a reference color space establishes within the reference color space a definite “footprint”, known as a gamut, and for a given color model this defines a color space. For example, Adobe RGB and sRGB are two different absolute color spaces, both based on the RGB color model. When defining a color space, the usual reference standard is the CIELAB or CIEXYZ color spaces, which were specifically designed to encompass all colors the average human can see.

Since “color space” is a more specific term, identifying a particular combination of color model and mapping function, it tends to be used informally to identify a color model, since identifying a color space automatically identifies the associated color model; however, this usage is strictly incorrect. For example, although several specific color spaces are based on the RGB color model, there is no such thing as the singular RGB color space.

2.1 Examples

Colors can be created in printing with color spaces based on the CMYK color model, using the subtractive primary colors of pigment (cyan (C), magenta (M), yellow (Y), and black (K)). To create a three-dimensional representation of a given color space, we can assign the amount of magenta color to the representation's X axis, the amount of cyan to its Y axis, and the amount of yellow to its Z axis. The resulting 3-D space provides a unique position for every possible color that can be created by combining those three pigments.

A comparison of CMYK and RGB color models. This image demonstrates the difference between how colors will look on a computer monitor (RGB) compared to how they will reproduce in a CMYK print process.

Colors can be created on computer monitors with color spaces based on the RGB color model, using the additive primary colors (red, green, and blue). A three-dimensional representation would assign each of the three colors to the X, Y, and Z axes. Note that colors generated on a given monitor will be limited by the reproduction medium, such as the phosphor (in a CRT monitor) or filters and backlight (LCD monitor).

Another way of creating colors on a monitor is with an HSL or HSV color space, based on hue, saturation, and brightness (value/brightness). With such a space, the variables are assigned to cylindrical coordinates.

Many color spaces can be represented as three-dimensional values in this manner, but some have more, or fewer dimensions, and some, such as Pantone, cannot be represented in this way at all.

2.2 Conversion

Main article: Color translation

Color space conversion is the translation of the representation of a color from one basis to another. This typically occurs in the context of converting an image that is represented in one color space to another color space, the goal being to make the translated image look as similar as possible to the original.

2.3 RGB density

The RGB color model is implemented in different ways, depending on the capabilities of the system used. By far the most common general-purpose incarnation as of 2006 is the 24-bit implementation, with 8 bits, or 256 discrete levels of color per channel. Any color space based on such a 24-bit RGB model is thus limited to a range of 256×256×256 ≈ 16.7 million colors. Some implementations use 16 bits per component for 48 bits total, resulting in the same gamut with a larger number of distinct colors. This is especially important when working with wide-gamut color spaces (where most of the more common colors are located relatively close together), or when a large number of digital filtering algorithms are used consecutively. The same principle applies for any color space based on the same color model, but implemented in different bit depths.
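As a small illustration of these bit-depth figures (a sketch we added, not part of the original text), 24-bit RGB packs three 8-bit channels into one integer, giving exactly 256³ = 16,777,216 representable codes:

```c
#include <stdint.h>
#include <stdio.h>

/* Pack and unpack 8-bit-per-channel RGB, the common 24-bit case. */
static uint32_t pack_rgb888(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

int main(void) {
    uint32_t c = pack_rgb888(255, 128, 0);
    printf("packed: 0x%06X\n", c);
    printf("24-bit codes: %lu\n", 256UL * 256UL * 256UL); /* 16,777,216 */
    /* 16 bits per channel spans the same gamut but with 65536^3 codes,
       which is what avoids visible banding after repeated filtering. */
    return 0;
}
```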

2.4 Lists

Main article: List of color spaces and their uses

CIE 1931 XYZ color space was one of the first attempts to produce a color space based on measurements of human color perception (earlier efforts were by James Clerk Maxwell, König & Dieterici, and Abney at Imperial College)[1] and it is the basis for almost all other color spaces. The CIERGB color space is a linearly-related companion of CIE XYZ. Additional derivatives of CIE XYZ include the CIELUV, CIEUVW, and CIELAB.

2.4.1 Generic

Main article: Color models

RGB uses additive color mixing, because it describes what kind of light needs to be emitted to produce a given color. RGB stores individual values for red, green and blue. RGBA is RGB with an additional channel, alpha, to indicate transparency. Common color spaces based on the RGB model include sRGB, Adobe RGB, ProPhoto RGB, scRGB, and CIE RGB.

CMYK uses subtractive color mixing as used in the printing process, because it describes what kind of inks need to be applied so the light reflected from the substrate and through the inks produces a given color. One starts with a white substrate (canvas, page, etc.), and uses ink to subtract color from white to create an image. CMYK stores ink values for cyan, magenta, yellow and black. There are many CMYK color spaces for different sets of inks, substrates, and press characteristics (which change the dot gain or transfer function for each ink and thus change the appearance).

YIQ was formerly used in NTSC (North America, Japan and elsewhere) television broadcasts for historical reasons. This system stores a luma value roughly analogous to (and sometimes misidentified as)[2][3] luminance, along with two chroma values as approximate representations of the relative amounts of blue and red in the color. It is similar to the YUV scheme used in most video capture systems[4] and in PAL (Australia, Europe, except France, which uses SECAM) television, except that the YIQ color space is rotated 33° with respect to the YUV color space and the color axes are swapped. The YDbDr scheme used by SECAM television is rotated in another way.

YPbPr is a scaled version of YUV. It is most commonly seen in its digital form, YCbCr, used widely in video and image compression schemes such as MPEG and JPEG.

xvYCC is a new international digital video color space standard published by the IEC (IEC 61966-2-4). It is based on the ITU BT.601 and BT.709 standards but extends the gamut beyond the R/G/B primaries specified in those standards.

HSV (hue, saturation, value), also known as HSB (hue, saturation, brightness), is often used by artists because it is often more natural to think about a color in terms of hue and saturation than in terms of additive or subtractive color components. HSV is a transformation of an RGB colorspace, and its components and colorimetry are relative to the RGB colorspace from which it was derived.

HSL (hue, saturation, lightness/luminance), also known as HLS or HSI (hue, saturation, intensity), is quite similar to HSV, with “lightness” replacing “brightness”. The difference is that the brightness of a pure color is equal to the brightness of white, while the lightness of a pure color is equal to the lightness of a medium gray.

Additive color mixing: Three overlapping lightbulbs in a vacuum, adding together to create white.

Subtractive color mixing: Three splotches of paint on white paper, subtracting together to turn the paper black.

2.4.2 Commercial

• Pantone Matching System (PMS)
• Natural Color System (NCS)

2.4.3 Special-purpose

• The RG Chromaticity space is used in computer vision applications. It shows the color of light (red, yellow, green etc.), but not its intensity (dark, bright).
• The TSL color space (Tint, Saturation and Luminance) is used in face detection.

2.4.4 Obsolete

Early color spaces had two components. They largely ignored blue light because the added complexity of a 3-component process provided only a marginal increase in fidelity when compared to the jump from monochrome to 2-component color.

• RG for early Technicolor film
• RGK for early color printing

2.5 Absolute color space

In color science, there are two meanings of the term absolute color space:

• A color space in which the perceptual difference between colors is directly related to distances between colors as represented by points in the color space.[5][6]
• A color space in which colors are unambiguous, that is, where the interpretations of colors in the space are colorimetrically defined without reference to external factors.[7][8]

In this article, we concentrate on the second definition. CIEXYZ and sRGB are examples of absolute color spaces, as opposed to a generic RGB color space.

A non-absolute color space can be made absolute by defining its relationship to absolute colorimetric quantities. For instance, if the red, green, and blue colors in a monitor are measured exactly, together with other properties of the monitor, then RGB values on that monitor can be considered as absolute. The L*a*b* is sometimes referred to as absolute, though it also needs a white point specification to make it so.[9]

A popular way to make a color space like RGB into an absolute color is to define an ICC profile, which contains the attributes of the RGB. This is not the only way to express an absolute color, but it is the standard in many industries. RGB colors defined by widely accepted profiles include sRGB and Adobe RGB. The process of adding an ICC profile to a graphic or document is sometimes called tagging or embedding; tagging therefore marks the absolute meaning of colors in that graphic or document.

2.5.1 Conversion

Main article: Color translation

A color in one absolute color space can be converted into another absolute color space, and back again, in general; however, some color spaces may have gamut limitations, and converting colors that lie outside that gamut will not produce correct results. There are also likely to be rounding errors, especially if the popular range of only 256 distinct values per component (8-bit color) is used.

One part of the definition of an absolute color space is the viewing conditions. The same color, viewed under different natural or artificial lighting conditions, will look different. Those involved professionally with color matching may use viewing rooms, lit by standardized lighting.

Occasionally, there are precise rules for converting between non-absolute color spaces. For example, HSL and HSV spaces are defined as mappings of RGB. Both are non-absolute, but the conversion between them should maintain the same color. However, in general, converting between two non-absolute color spaces (for example, RGB to CMYK) or between absolute and non-absolute color spaces (for example, RGB to L*a*b*) is almost a meaningless concept.
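To make the RGB↔HSV remark concrete, here is a minimal sketch of the standard hexcone mapping (our illustration, not a normative definition; h is in degrees, s and v in [0,1]). Converting to HSV and straight back reproduces the original RGB triple up to floating-point rounding:

```c
#include <math.h>
#include <stdio.h>

/* RGB (each in [0,1]) to HSV: h in [0,360), s and v in [0,1]. */
static void rgb_to_hsv(double r, double g, double b,
                       double *h, double *s, double *v) {
    double max = fmax(r, fmax(g, b)), min = fmin(r, fmin(g, b));
    double d = max - min;
    *v = max;
    *s = (max == 0.0) ? 0.0 : d / max;
    if (d == 0.0)      *h = 0.0; /* achromatic: hue undefined */
    else if (max == r) *h = fmod((g - b) / d, 6.0) * 60.0;
    else if (max == g) *h = ((b - r) / d + 2.0) * 60.0;
    else               *h = ((r - g) / d + 4.0) * 60.0;
    if (*h < 0.0) *h += 360.0;
}

/* Inverse mapping, HSV back to RGB. */
static void hsv_to_rgb(double h, double s, double v,
                       double *r, double *g, double *b) {
    double c = v * s;                                   /* chroma */
    double x = c * (1.0 - fabs(fmod(h / 60.0, 2.0) - 1.0));
    double m = v - c;
    double rp, gp, bp;
    if      (h <  60.0) { rp = c; gp = x; bp = 0; }
    else if (h < 120.0) { rp = x; gp = c; bp = 0; }
    else if (h < 180.0) { rp = 0; gp = c; bp = x; }
    else if (h < 240.0) { rp = 0; gp = x; bp = c; }
    else if (h < 300.0) { rp = x; gp = 0; bp = c; }
    else                { rp = c; gp = 0; bp = x; }
    *r = rp + m; *g = gp + m; *b = bp + m;
}

int main(void) {
    double h, s, v, r, g, b;
    rgb_to_hsv(0.9, 0.4, 0.1, &h, &s, &v);
    hsv_to_rgb(h, s, v, &r, &g, &b);
    printf("h=%.1f s=%.3f v=%.3f -> r=%.3f g=%.3f b=%.3f\n",
           h, s, v, r, g, b); /* recovers 0.900 0.400 0.100 */
    return 0; /* compile with -lm */
}
```

Note that the round trip preserves the color only because both spaces are defined relative to the same underlying RGB; neither becomes absolute by this conversion.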

[4] Dean Anderson. “Color Spaces in Frame Grabbers: RGB A color in one absolute color space can be converted into vs. YUV”. Retrieved 2008-04-08. another absolute color space, and back again, in general; however, some color spaces may have gamut limitations, [5] Hans G. Völz (2001). Industrial Color Testing: Fun- and converting colors that lie outside that gamut will not damentals and Techniques. Wiley-VCH. ISBN 3-527- produce correct results. There are also likely to be round- 30436-3. ing errors, especially if the popular range of only 256 dis- tinct values per component (8-bit color) is used. [6] Gunter Buxbaum; Gerhard Pfaff (2005). Industrial Inor- ganic Pigments. Wiley-VCH. ISBN 3-527-30363-4. One part of the definition of an absolute color space is the viewing conditions. The same color, viewed under differ- [7] Jonathan B. Knudsen (1999). 2D Graphics. ent natural or artificial conditions, will look dif- O'Reilly. ISBN 1-56592-484-3. ferent. Those involved professionally with color matching may use viewing rooms, lit by standardized lighting. [8] Bernice Ellen Rogowitz; Thrasyvoulos N Pappas; Scott J Occasionally, there are precise rules for converting be- Daly (2007). Human Vision and Electronic Imaging XII. tween non-absolute color spaces. For example, HSL and SPIE. ISBN 0-8194-6605-0. HSV spaces are defined as mappings of RGB. Both are non-absolute, but the conversion between them should [9] Yud-Ren Chen; George E. Meyer; Shu-I. Tu (2005). maintain the same color. However, in general, convert- Optical Sensors and Sensing Systems for Natural Resources and Food Safety and Quality. SPIE. ISBN 0-8194-6020- ing between two non-absolute color spaces (for example, 6. RGB to CMYK) or between absolute and non-absolute color spaces (for example, RGB to L*a*b*) is almost a meaningless concept. 3. www.iscc.org/aic2001/abstracts/poster/Zoch.doc 20 CHAPTER 2. COLOR SPACE

2.8 External links

• Color FAQ, Charles Poynton

• FAQ about color physics, Stephen Westland
• Color Science, Dan Bruton

• Color Spaces, Rolf G. Kuehni (October 2003)

• Colour spaces – perceptual, historical and applicational background, Marko Tkalčič (2003)

• Color formats for image and – Color conversion between RGB, YUV, YCbCr and YPbPr. • C library of SSE-optimised color format conver- sions. • Konica Sensing: Precise Color Communi- cation Chapter 3

Color vision

Colorless, green, and red photographic filters as imaged ("perceived") by camera

Color vision is the ability of an organism or machine to distinguish objects based on the wavelengths (or frequencies) of the light they reflect, emit, or transmit. Colors can be measured and quantified in various ways; indeed, a person's perception of colors is a subjective process whereby the brain responds to the stimuli that are produced when incoming light reacts with the several types of cone cells in the eye. In essence, different people see the same illuminated object or light source in different ways.

3.1 Wavelength and hue detection

Isaac Newton discovered that white light splits into its component colours when passed through a dispersive prism. Newton also found that he could recombine these colours by passing them through a different prism to make white light.

The characteristic colours are, from long to short wavelengths (and, correspondingly, from low to high frequency), red, orange, yellow, green, blue, indigo, and violet. Sufficient differences in wavelength cause a difference in the perceived hue; the just-noticeable difference in wavelength varies from about 1 nm in the blue-green and yellow wavelengths, to 10 nm and more in the longer red and shorter blue wavelengths. Although the human eye can distinguish up to a few hundred hues, when those pure spectral colours are mixed together or diluted with white light, the number of distinguishable chromaticities can be quite high.

In very low light levels, vision is scotopic: light is detected by rod cells of the retina. Rods are maximally sensitive to wavelengths near 500 nm, and play little, if any, role in colour vision. In brighter light, such as daylight, vision is photopic: light is detected by cone cells which are responsible for colour vision. Cones are sensitive to a range of wavelengths, but are most sensitive to wavelengths near 555 nm. Between these regions, mesopic vision comes into play and both rods and cones provide signals to the retinal ganglion cells. The shift in colour perception from dim light to daylight gives rise to differences known as the Purkinje effect.

The perception of “white” is formed by the entire spectrum of visible light, or by mixing colours of just a few wavelengths in animals with few types of colour receptors. In humans, white light can be perceived by combining wavelengths such as red, green, and blue, or just a pair of complementary colours such as blue and yellow.[1]

3.2 Physiology of color perception

The modern model of human color perception as it occurs in the retina, pertaining to both the trichromatic and opponent process theories introduced in the 19th century: trichromatic cone cells respond positively to one of three frequencies exhibited by photons arriving on their surface, and the three opponent color channels are then derived by nearby opponent cells. Opponent cells tuned to luminosity are excited by the red, green, and blue color signals; Cg cells are excited by red and blue and inhibited by green; Cb cells are excited by red and green and inhibited by blue.

Perception of color begins with specialized retinal cells containing pigments with different spectral sensitivities, known as cone cells. In humans, there are three types of cones sensitive to three different spectra, resulting in trichromatic color vision.


Each individual cone contains pigments composed of opsin apoprotein, which is covalently linked to either 11-cis-hydroretinal or more rarely 11-cis-dehydroretinal.[2]

The cones are conventionally labeled according to the ordering of the wavelengths of the peaks of their spectral sensitivities: short (S), medium (M), and long (L) cone types. These three types do not correspond well to particular colors as we know them. Rather, the perception of color is achieved by a complex process that starts with the differential output of these cells in the retina and is finalized in the visual cortex and associative areas of the brain. For example, while the L cones have been referred to simply as red receptors, microspectrophotometry has shown that their peak sensitivity is in the greenish-yellow region of the spectrum. Similarly, the S- and M-cones do not directly correspond to blue and green, although they are often described as such. The RGB color model, therefore, is a convenient means for representing color, but is not directly based on the types of cones in the human eye.

The peak response of human cone cells varies, even among individuals with so-called normal color vision;[3] in some non-human species this polymorphic variation is even greater, and it may well be adaptive.[4]

Normalized response spectra of human cones, to monochromatic spectral stimuli, with wavelength given in nanometers.

The same figures as above represented here as a single curve in three (normalized cone response) dimensions.

3.2.1 Theories

Two complementary theories of color vision are the trichromatic theory and the opponent process theory. The trichromatic theory, or Young–Helmholtz theory, proposed in the 19th century by Thomas Young and Hermann von Helmholtz, as mentioned above, states that the retina's three types of cones are preferentially sensitive to blue, green, and red. Ewald Hering proposed the opponent process theory in 1872.[5] It states that the visual system interprets color in an antagonistic way: red vs. green, blue vs. yellow, black vs. white. Both theories are now accepted as valid, describing different stages in visual physiology, visualized in the diagram on the right.[6]

Green ←→ Magenta and Blue ←→ Yellow are scales with mutually exclusive boundaries. In the same way that there cannot exist a “slightly negative” positive number, a single eye cannot perceive a blueish-yellow or a reddish-green. (But such impossible colors can be perceived due to binocular rivalry.)

Relative brightness sensitivity of the human visual system as a function of wavelength

3.2.2 Cone cells in the human eye

A range of wavelengths of light stimulates each of these receptor types to varying degrees. Yellowish-green light, for example, stimulates both L and M cones equally strongly, but only stimulates S-cones weakly. Red light, on the other hand, stimulates L cones much more than M cones, and S cones hardly at all; blue-green light stimulates M cones more than L cones, and S cones a bit more strongly, and is also the peak stimulant for rod cells; and blue light stimulates S cones more strongly than red or green light, but L and M cones more weakly.

give rise to different of different wavelengths Visual information is then sent to the brain from retinal of light. ganglion cells via the optic nerve to the optic chiasma: a The opsins (photopigments) present in the L and M cones point where the two optic nerves meet and information are encoded on the X chromosome; defective encoding from the temporal (contralateral) visual field crosses to of these leads to the two most common forms of color the other side of the brain. After the optic chiasma the blindness. The OPN1LW gene, which codes for the opsin visual tracts are referred to as the optic tracts, which enter present in the L cones, is highly polymorphic (a recent the thalamus to synapse at the lateral geniculate nucleus study by Verrelli and Tishkoff found 85 variants in a sam- (LGN). ple of 236 men).[9] A very small percentage of women The lateral geniculate nucleus is divided into laminae may have an extra type of color receptor because they (zones), of which there are three types: the M-laminae, have different alleles for the gene for the L opsin on each consisting primarily of M-cells, the P-laminae, consisting X chromosome. X chromosome inactivation means that primarily of P-cells, and the koniocellular laminae. M- only one opsin is expressed in each cone , and some and P-cells receive relatively balanced input from both women may therefore show a degree of tetrachromatic L- and M-cones throughout most of the retina, although color vision.[10] Variations in OPN1MW, which codes the this seems to not be the case at the fovea, with midget opsin expressed in M cones, appear to be rare, and the cells synapsing in the P-laminae. The koniocellular lam- observed variants have no effect on . inae receive axons from the small bistratified ganglion cells.[11][12] After synapsing at the LGN, the visual tract continues on 3.2.3 Color in the human brain back to the primary visual cortex (V1) located at the back of the brain within the occipital lobe. Within V1 there is a distinct band (striation). This is also referred to as “striate cortex”, with other cortical visual regions referred to collectively as “extrastriate cortex”. It is at this stage that color processing becomes much more complicated. In V1 the simple three-color segregation begins to break down. Many cells in V1 respond to some parts of the spectrum better than others, but this “color tuning” is of- ten different depending on the adaptation state of the vi- sual system. A given cell that might respond best to long wavelength light if the light is relatively bright might then become responsive to all wavelengths if the stimulus is relatively dim. Because the color tuning of these cells is not stable, some believe that a different, relatively small, population of neurons in V1 is responsible for color vi- Visual pathways in the human brain. The ventral stream () is important in color recognition. The dorsal stream (green) is sion. These specialized “color cells” often have receptive also shown. They originate from a common source in the visual fields that can compute local cone ratios. Such “double- cortex. opponent” cells were initially described in the goldfish retina by Nigel Daw;[13][14] their existence in primates Color processing begins at a very early level in the visual was suggested by David H. Hubel and Torsten Wiesel and [15] system (even within the retina) through initial color op- subsequently proven by Bevil Conway. As Margaret ponent mechanisms. 
Both Helmholtz’s trichromatic the- Livingstone and David Hubel showed, double opponent ory, and Hering’s opponent process theory are therefore cells are clustered within localized regions of V1 called correct, but arises at the level of the recep- blobs, and are thought to come in two flavors, red–green [16] tors, and opponent processes arise at the level of retinal and blue–yellow. Red–green cells compare the rela- ganglion cells and beyond. In Hering’s theory opponent tive amounts of red–green in one part of a scene with the mechanisms refer to the opposing color effect of red– amount of red–green in an adjacent part of the scene, re- green, blue–yellow, and light–dark. However, in the vi- sponding best to (red next to green). sual system, it is the activity of the different receptor Modeling studies have shown that double-opponent cells types that are opposed. Some midget retinal ganglion are ideal candidates for the neural machinery of color cells oppose L and M cone activity, which corresponds constancy explained by Edwin H. Land in his retinex [17] loosely to red–green opponency, but actually runs along theory. an axis from blue-green to magenta. Small bistratified From the V1 blobs, color information is sent to cells in retinal ganglion cells oppose input from the S cones to the second visual area, V2. The cells in V2 that are most input from the L and M cones. This is often thought to strongly color tuned are clustered in the “thin stripes” that, correspond to blue–yellow opponency, but actually runs like the blobs in V1, stain for the enzyme cytochrome ox- along a color axis from green to . 24 CHAPTER 3. COLOR VISION

of electromagnetic radiation from invisible portions of the broader spectrum. In this sense, color is not a prop- erty of electromagnetic radiation, but a feature of by an observer. Furthermore, there is an ar- bitrary mapping between wavelengths of light in the vi- sual spectrum and human experiences of color. Although most people are assumed to have the same mapping, the philosopher John Locke recognized that alternatives are possible, and described one such hypothetical case with the “inverted spectrum” thought experiment. For exam- ple, someone with an inverted spectrum might experience green while seeing 'red' (700 nm) light, and experience red while seeing 'green' (530 nm) light. Synesthesia (or ideasthesia) provides some atypical but illuminating ex- amples of subjective color experience triggered by input that is not even light, such as sounds or shapes. The pos- sibility of a clean dissociation between color experience from properties of the world reveals that color is a sub- jective psychological phenomenon. This image (when viewed in full size, 1000 pixels wide) contains 1 million pixels, each of a different color. The human eye can The Himba people have been found to categorize colors distinguish about 10 million different colors.[18] differently from most Euro-Americans and are able to easily distinguish close , barely discernible for most people.[23] The Himba have created a very dif- idase (separating the thin stripes are interstripes and thick ferent which divides the spectrum to dark stripes, which seem to be concerned with other visual in- shades (zuzu in Himba), very light (vapa), vivid blue and formation like motion and high-resolution form). Neu- green (buru) and dry colors as an adaptation to their spe- rons in V2 then synapse onto cells in the extended V4. cific way of life. This area includes not only V4, but two other areas in the posterior inferior temporal cortex, anterior to area V3, Perception of color depends heavily on the context in the dorsal posterior inferior temporal cortex, and poste- which the perceived object is presented. For example, rior TEO.[19][20] Area V4 was initially suggested by Semir a white page under blue, , or purple light will reflect Zeki to be exclusively dedicated to color, but this is now mostly blue, pink, or purple light to the eye, respectively; thought to be incorrect.[21] In particular, the presence in the brain, however, compensates for the effect of lighting V4 of orientation-selective cells led to the view that V4 (based on the color shift of surrounding objects) and is is involved in processing both color and form associated more likely to interpret the page as white under all three with color.[22] Color processing in the extended V4 oc- conditions, a phenomenon known as . curs in millimeter-sized color modules called globs.[19][20] This is the first part of the brain in which color is pro- cessed in terms of the full range of hues found in color 3.2.5 In other animal species space.[19][20] Many species can see light with frequencies outside the Anatomical studies have shown that neurons in extended human “”. Bees and many other in- V4 provide input to the inferior temporal lobe . “IT” cor- sects can detect ultraviolet light, which helps them to find tex is thought to integrate color information with shape nectar in flowers. Plant species that depend on insect and form, although it has been difficult to define the ap- pollination may owe reproductive success to ultraviolet propriate criteria for this claim. 
Despite this murkiness, “colors” and patterns rather than how colorful they ap- it has been useful to characterize this pathway (V1 > V2 pear to humans. Birds, too, can see into the ultraviolet > V4 > IT) as the ventral stream or the “what pathway”, (300–400 nm), and some have sex-dependent markings distinguished from the dorsal stream (“where pathway”) on their plumage that are visible only in the ultraviolet that is thought to analyze motion, among many other fea- range.[24][25] Many animals that can see into the ultravio- tures. let range, however, cannot see red light or any other red- dish wavelengths. For example, bees’ visible spectrum ends at about 590 nm, just before the orange wavelengths 3.2.4 Subjectivity of color perception start. Birds, however, can see some red wavelengths, al- though not as far into the light spectrum as humans.[26] It See also: Linguistic relativity and the color naming is an incorrect popular belief that the common goldfish is debate the only animal that can see both infrared and ultraviolet light,[27] their color vision extends into the ultraviolet but Nothing categorically distinguishes the visible spectrum not the infrared.[28] 3.3. EVOLUTION 25

The basis for this variation is the number of cone types that differ between species. Mammals in general have color vision of a limited type, and usually have red-green color blindness, with only two types of cones. Humans, some primates, and some marsupials see an extended range of colors, but only by comparison with other mammals. Most non-mammalian vertebrate species distinguish different colors at least as well as humans, and many species of birds, fish, reptiles and amphibians, and some invertebrates, have more than three cone types and probably superior color vision to humans.

In most Catarrhini (Old World monkeys and apes—primates closely related to humans) there are three types of color receptors (known as cone cells), resulting in trichromatic color vision. These primates, like humans, are known as trichromats. Many other primates (including New World monkeys) and other mammals are dichromats, which is the general color vision state for mammals that are active during the day (i.e., felines, canines, ungulates). Nocturnal mammals may have little or no color vision. Trichromat non-primate mammals are rare.[29][30]

Many invertebrates have color vision. Honeybees and bumblebees have trichromatic color vision which is insensitive to red but sensitive to ultraviolet. Osmia rufa, for example, possesses a trichromatic color system, which they use in foraging for pollen from flowers.[31] In view of the importance of color vision to bees one might expect these receptor sensitivities to reflect their specific visual ecology; for example the types of flowers that they visit. However, the main groups of hymenopteran insects excluding ants (i.e., bees, wasps and sawflies) mostly have three types of photoreceptor, with spectral sensitivities similar to the honeybee's.[32] Papilio butterflies possess six types of photoreceptors and may have pentachromatic vision.[33] The most complex color vision system in the animal kingdom has been found in stomatopods (such as the mantis shrimp) with up to 12 spectral receptor types thought to work as multiple dichromatic units.[34]

Vertebrate animals such as tropical fish and birds sometimes have more complex color vision systems than humans; thus the many subtle colors they exhibit generally serve as direct signals for other fish or birds, and not to signal mammals.[35] In bird vision, tetrachromacy is achieved through up to four cone types, depending on species. Each single cone contains one of the four main types of vertebrate cone photopigment (LWS/MWS, RH2, SWS2 and SWS1) and has a colored oil droplet in its inner segment.[32] Brightly colored oil droplets inside the cones shift or narrow the spectral sensitivity of the cell. It has been suggested that it is likely that pigeons are pentachromats.[36]

Reptiles and amphibians also have four cone types (occasionally five), and probably see at least the same number of colors that humans do, or perhaps more. In addition, some nocturnal geckos have the capability of seeing color in dim light.[37]

In the evolution of mammals, segments of color vision were lost, then for a few species of primates, regained by gene duplication. Eutherian mammals other than primates (for example, dogs, mammalian farm animals) generally have less-effective two-receptor (dichromatic) color perception systems, which distinguish blue, green, and yellow—but cannot distinguish oranges and reds. There is some evidence that a few mammals, such as cats, have redeveloped the ability to distinguish longer wavelength colors, in at least a limited way, via one-amino-acid mutations in opsin genes.[38] The adaptation to see reds is particularly important for primate mammals, since it leads to identification of fruits, and also newly sprouting reddish leaves, which are particularly nutritious.

However, even among primates, full color vision differs between New World and Old World monkeys. Old World primates, including monkeys and all apes, have vision similar to humans. New World monkeys may or may not have color sensitivity at this level: in most species, males are dichromats, and about 60% of females are trichromats, but the owl monkeys are cone monochromats, and both sexes of howler monkeys are trichromats.[39][40][41][42] Visual sensitivity differences between males and females in a single species is due to the gene for yellow-green sensitive opsin protein (which confers ability to differentiate red from green) residing on the X sex chromosome.

Several marsupials such as the fat-tailed dunnart (Sminthopsis crassicaudata) have been shown to have trichromatic color vision.[43]

Marine mammals, adapted for low-light vision, have only a single cone type and are thus monochromats.

3.3 Evolution

Main article: Evolution of color vision

Color perception mechanisms are highly dependent on evolutionary factors, of which the most prominent is thought to be satisfactory recognition of food sources. In herbivorous primates, color perception is essential for finding proper (immature) leaves. In hummingbirds, particular flower types are often recognized by color as well. On the other hand, nocturnal mammals have less-developed color vision, since adequate light is needed for cones to function properly. There is evidence that ultraviolet light plays a part in color perception in many branches of the animal kingdom, especially insects. In general, the optical spectrum encompasses the most common electronic transitions in matter and is therefore the most useful for collecting information about the environment.

The evolution of trichromatic color vision in primates occurred as the ancestors of modern monkeys, apes, and humans switched to diurnal (daytime) activity and began consuming fruits and leaves from flowering plants.[44] Color vision, with UV discrimination, is also present in a number of arthropods—the only terrestrial animals besides the vertebrates to possess this trait.[45]

Some animals can distinguish colors in the ultraviolet spectrum. The UV spectrum falls outside the human visible range, except for some cataract surgery patients.[46] Birds, turtles, lizards, many fish and some rodents have UV receptors in their retinas.[47] These animals can see the UV patterns found on flowers and other wildlife that are otherwise invisible to the human eye.

Ultraviolet vision is an especially important adaptation in birds. It allows birds to spot small prey from a distance, navigate, avoid predators, and forage while flying at high speeds. Birds also utilize their broad spectrum vision to recognize other birds, and in sexual selection.[48][49]

3.4 Mathematics of color perception

The CIE 1931 xy chromaticity diagram. The Planckian locus is shown with color temperatures labeled in kelvins. The outer curved boundary is the spectral (or monochromatic) locus, with wavelengths shown in nanometers (blue). Note that the colors in this file are being specified using sRGB. Areas outside the triangle cannot be accurately rendered because they are out of the gamut of sRGB, therefore they have been interpreted. Note that the colors depicted depend on the color space of the device you use to view the image (number of colors on your monitor, etc.), and may not be a strictly accurate representation of the color at a particular position.

A “physical color” is a combination of pure spectral colors (in the visible range). Since there are, in principle, infinitely many distinct spectral colors, the set of all physical colors may be thought of as an infinite-dimensional vector space, in fact a Hilbert space. We call this space Hcolor. More technically, the space of physical colors may be considered to be the (mathematical) cone over the simplex whose vertices are the spectral colors, with white at the centroid of the simplex, black at the apex of the cone, and the monochromatic color associated with any given vertex somewhere along the line from that vertex to the apex depending on its brightness.

An element C of Hcolor is a function from the range of visible wavelengths—considered as an interval of real numbers [Wmin, Wmax]—to the real numbers, assigning to each wavelength w in [Wmin, Wmax] its intensity C(w).

A humanly perceived color may be modeled as three numbers: the extents to which each of the 3 types of cones is stimulated. Thus a humanly perceived color may be thought of as a point in 3-dimensional Euclidean space. We call this space R3color.

Since each wavelength w stimulates each of the 3 types of cone cells to a known extent, these extents may be represented by 3 functions s(w), m(w), l(w) corresponding to the response of the S, M, and L cone cells, respectively.

Finally, since a beam of light can be composed of many different wavelengths, to determine the extent to which a physical color C in Hcolor stimulates each cone cell, we must calculate the integral (with respect to w), over the interval [Wmin, Wmax], of C(w)·s(w), of C(w)·m(w), and of C(w)·l(w). The triple of resulting numbers associates to each physical color C (which is an element in Hcolor) a particular perceived color (which is a single point in R3color). This association is easily seen to be linear. It may also easily be seen that many different elements in the “physical” space Hcolor can all result in the same single perceived color in R3color, so a perceived color is not unique to one physical color.

Thus human color perception is determined by a specific, non-unique linear mapping from the infinite-dimensional Hilbert space Hcolor to the 3-dimensional Euclidean space R3color.

Technically, the image of the (mathematical) cone over the simplex whose vertices are the spectral colors, by this linear mapping, is also a (mathematical) cone in R3color. Moving directly away from the vertex of this cone represents maintaining the same chromaticity while increasing its intensity. Taking a cross-section of this cone yields a 2D chromaticity space. Both the 3D cone and its projection or cross-section are convex sets; that is, any mixture of spectral colors is also a color.
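As a numerical sketch of this linear map (our illustration; the Gaussian-shaped cone sensitivities and the 10 nm grid are stand-ins, not real colorimetric data), the three integrals reduce to Riemann sums of C(w)·s(w), C(w)·m(w), and C(w)·l(w), and the chromaticity cross-section corresponds to normalizing the resulting triple by its sum:

```c
#include <math.h>
#include <stdio.h>

/* Stand-in cone sensitivity: a Gaussian bump centered on `peak` nm.
 * Real S/M/L curves are tabulated, not Gaussian; this is illustrative only. */
static double bump(double w, double peak) {
    double d = (w - peak) / 35.0;
    return exp(-0.5 * d * d);
}

int main(void) {
    const double wmin = 390.0, wmax = 700.0, dw = 10.0;
    double S = 0.0, M = 0.0, L = 0.0;

    for (double w = wmin; w <= wmax; w += dw) {
        /* a broadband spectrum C(w): flat light plus a reddish slope */
        double C = 1.0 + 0.002 * (w - wmin);
        S += C * bump(w, 445.0) * dw; /* integral of C(w)*s(w) */
        M += C * bump(w, 540.0) * dw; /* integral of C(w)*m(w) */
        L += C * bump(w, 565.0) * dw; /* integral of C(w)*l(w) */
    }
    double sum = S + M + L;
    printf("tristimulus: S=%.1f M=%.1f L=%.1f\n", S, M, L);
    /* projecting onto the plane S+M+L = const gives a chromaticity */
    printf("chromaticity: (%.3f, %.3f, %.3f)\n", S / sum, M / sum, L / sum);
    return 0; /* compile with -lm */
}
```

Scaling the input spectrum C scales (S, M, L) proportionately but leaves the normalized triple unchanged, which is exactly the "same chromaticity, higher intensity" ray structure described above.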

In practice, it would be quite difficult to physiologically measure an individual's three cone responses to various physical color stimuli. Instead, a psychophysical approach is taken. Three specific benchmark test lights are typically used; let us call them S, M, and L. To calibrate human perceptual space, scientists allowed human subjects to try to match any physical color by turning dials to create specific combinations of intensities (IS, IM, IL) for the S, M, and L lights, resp., until a match was found. This needed only to be done for physical colors that are spectral, since a linear combination of spectral colors will be matched by the same linear combination of their (IS, IM, IL) matches. Note that in practice, often at least one of S, M, L would have to be added with some intensity to the physical test color, and that combination matched by a linear combination of the remaining 2 lights. Across different individuals (without color blindness), the matchings turned out to be nearly identical.

By considering all the resulting combinations of intensities (IS, IM, IL) as a subset of 3-space, a model for human perceptual color space is formed. (Note that when one of S, M, L had to be added to the test color, its intensity was counted as negative.) Again, this turns out to be a (mathematical) cone—not a quadric, but rather all rays through the origin in 3-space passing through a certain convex set. Again, this cone has the property that moving directly away from the origin corresponds to increasing the intensity of the S, M, L lights proportionately. Again, a cross-section of this cone is a planar shape that is (by definition) the space of “chromaticities” (informally: distinct colors); one particular such cross section, corresponding to constant X+Y+Z of the CIE 1931 color space, gives the CIE chromaticity diagram.

This system implies that for any hue or non-spectral color not on the boundary of the chromaticity diagram, there are infinitely many distinct physical spectra that are all perceived as that hue or color. So, in general there is no such thing as the combination of spectral colors that we perceive as (say) a specific version of tan; instead there are infinitely many possibilities that produce that exact color. The boundary colors that are pure spectral colors can be perceived only in response to light that is purely at the associated wavelength, while the boundary colors on the “line of purples” can each only be generated by a specific ratio of the pure violet and the pure red at the ends of the visible spectral colors.

The CIE chromaticity diagram is horseshoe-shaped, with its curved edge corresponding to all spectral colors (the spectral locus), and the remaining straight edge corresponding to the most saturated purples, mixtures of red and violet.

3.5 Chromatic adaptation

Main article: Chromatic adaptation

In color science, chromatic adaptation is the estimation of the representation of an object under a different light source from the one in which it was recorded. A common application is to find a chromatic adaptation transform (CAT) that will make the recording of a neutral object appear neutral (color balanced), while keeping other colors also looking realistic.[50] For example, chromatic adaptation transforms are used when converting images between ICC profiles with different white points. Adobe Photoshop, for example, uses the Bradford CAT.[51]

In color vision, chromatic adaptation refers to color constancy; the ability of the visual system to preserve the appearance of an object under a wide range of light sources.[52]

3.6 See also

• Color blindness
• Color theory
• Inverted spectrum
• Tetrachromacy
• Visual perception

3.7 References

[1] “Eye, human.” Encyclopædia Britannica 2006 Ultimate Reference Suite DVD, 2009.

[2] Nathans, Jeremy; Thomas, Darcy; Hogness, David S. (April 11, 1986). “Molecular Genetics of Human Color Vision: The Genes Encoding Blue, Green, and Red Pigments”. Science. 232 (4747): 193–202. Bibcode:1986Sci...232..193N. doi:10.1126/science.2937147. JSTOR 169687. PMID 2937147.

[3] Neitz J, Jacobs GH (1986). “Polymorphism of the long-wavelength cone in normal human color vision”. Nature. 323 (6089): 623–5. Bibcode:1986Natur.323..623N. doi:10.1038/323623a0. PMID 3773989.

[4] Jacobs GH (January 1996). “Primate photopigments and primate color vision”.
Proc. Natl. Acad. Sci. U.S.A. 93 (2): 577–81. Bibcode:1996PNAS...93..577J. doi:10.1073/pnas.93.2.577. PMC 40094. PMID 8570598.

[5] Hering, Ewald (1872). “Zur Lehre vom Lichtsinne”. Sitzungsberichte der Mathematisch–Naturwissenschaftliche Classe der Kaiserlichen Akademie der Wissenschaften. K.-K. Hof- und Staatsdruckerei in Commission bei C. Gerold's Sohn. LXVI. Band (III Abtheilung).

[6] Ali, M.A. & Klyne, M.A. (1985), p. 168

[7] Wyszecki, Günther; Stiles, W.S. (1982). Color Science: Concepts and Methods, Quantitative Data and Formulae (2nd ed.). New York: Wiley Series in Pure and Applied Optics. ISBN 0-471-02106-7.

[8] R. W. G. Hunt (2004). The Reproduction of Colour (6th ed.). Chichester UK: Wiley–IS&T Series in Imaging Science and Technology. pp. 11–2. ISBN 0-470-02425-9.

[9] Verrelli BC, Tishkoff SA (September 2004). “Signatures of Selection and Gene Conversion Associated with Human Color Vision Variation”. Am. J. Hum. Genet. 75 (3): 363–75. doi:10.1086/423287. PMC 1182016. PMID 15252758.

[10] Roth, Mark (2006). “Some women may see 100 million colors, thanks to their genes” Post-Gazette.com

[11] R.W. Rodieck, “The First Steps in Seeing”. Sinauer Associates, Inc., Sunderland, Massachusetts, USA, 1998.

[12] Hendry, Stewart H. C.; Reid, R. Clay (2000). “The Koniocellular Pathway in Primate Vision”. Annual Review of Neuroscience. 23: 127–53. doi:10.1146/annurev.neuro.23.1.127. PMID 10845061. Retrieved 2012-09-09.

[13] Nigel W. Daw (17 November 1967). “Goldfish Retina: Organization for Simultaneous Color Contrast”. Science. 158 (3803): 942–4. Bibcode:1967Sci...158..942D. doi:10.1126/science.158.3803.942. PMID 6054169.

[14] Bevil R. Conway (2002). Neural Mechanisms of Color Vision: Double-Opponent Cells in the Visual Cortex. Springer. ISBN 1-4020-7092-6.

[15] Conway BR (15 April 2001). “Spatial structure of cone inputs to color cells in alert macaque primary visual cortex (V-1)”. J. Neurosci. 21 (8): 2768–83. PMID 11306629.

[16] John E. Dowling (2001). Neurons and Networks: An Introduction to Behavioral Neuroscience. Harvard University Press. ISBN 0-674-00462-0.

[17] McCann, M., ed. 1993. Edwin H. Land's Essays. Springfield, Va.: Society for Imaging Science and Technology.

[18] Judd, Deane B.; Wyszecki, Günter (1975). Color in Business, Science and Industry. Wiley Series in Pure and Applied Optics (3rd ed.). New York: Wiley-Interscience. p. 388. ISBN 0-471-45212-2.

[19] Conway BR, Moeller S, Tsao DY (2007). “Specialized color modules in macaque extrastriate cortex”. Neuron. 56 (3): 560–73. doi:10.1016/j.neuron.2007.10.008. PMID 17988638.

[20] Conway BR, Tsao DY (2009). “Color-tuned neurons are spatially clustered according to color preference within alert macaque posterior inferior temporal cortex”. Proc Natl Acad Sci U S A. 106 (42): 18035–18039. Bibcode:2009PNAS..10618034C. doi:10.1073/pnas.0810943106. PMC 2764907. PMID 19805195.

[21] John Allman; Steven W. Zucker (1993). “On cytochrome oxidase blobs in visual cortex”. In Laurence Harris; Michael Jenkin. Spatial Vision in Humans and Robots: The Proceedings of the 1991 York Conference. Cambridge University Press. ISBN 0-521-43071-2.

[22] Zeki S (2005). “The Ferrier Lecture 1995 Behind the Seen: The functional specialization of the brain in space and time”. Philosophical Transactions of the Royal Society B. 360 (1458): 1145–1183. doi:10.1098/rstb.2005.1666. PMC 1609195. PMID 16147515.

[23] Roberson, Davidoff, Davies & Shapiro; referred by Debi Roberson, University of Essex 2011

[24] Cuthill, Innes C (1997). “Ultraviolet vision in birds”. In Peter J.B. Slater. Advances in the Study of Behavior. 29. Oxford, England: Academic Press. p. 161. ISBN 978-0-12-004529-7.

[25] Jamieson, Barrie G. M. (2007). Reproductive Biology and Phylogeny of Birds. Charlottesville VA: University of Virginia. p. 128. ISBN 1-57808-386-9.

[26] Varela, F. J.; Palacios, A. G.; Goldsmith T. M. “Color vision of birds” in Ziegler & Bischof (1993) 77–94

[27] “True or False? “The common goldfish is the only animal that can see both infra-red and ultra-violet light.” – Skeptive”. Retrieved September 28, 2013.

[28] Neumeyer, Christa (2012). “Chapter 2: Color Vision in Goldfish and Other Vertebrates”. In Lazareva, Olga; Shimizu, Toru; Wasserman, Edward. How Animals See the World: Comparative Behavior, Biology, and Evolution of Vision. Oxford Scholarship Online. ISBN 978-0-195-33465-4.

[29] Ali, Mohamed Ather; Klyne, M.A. (1985). Vision in Vertebrates. New York: Plenum Press. pp. 174–175. ISBN 0-306-42065-1.

[30] Jacobs, G. H. (1993). “The Distribution and Nature of Colour Vision Among the Mammals”. Biological Reviews. 68 (3): 413–471. doi:10.1111/j.1469-185X.1993.tb00738.x. PMID 8347768.

[31] Menzel, R.; Steinmann, E.; Souza, J. De; Backhaus, W. (1988-05-01). “Spectral Sensitivity of Photoreceptors and Colour Vision in the Solitary Bee, Osmia Rufa”. Journal of Experimental Biology. 136 (1): 35–52. ISSN 0022-0949.

[32] Osorio D, Vorobyev M (June 2008). “A review of the evolution of animal colour vision and visual communication signals”. Vision Research. 48 (20): 2042–2051. doi:10.1016/j.visres.2008.06.018 (inactive 2016-01-03). PMID 18627773.

[33] Arikawa K (November 2003). “Spectral organization of the eye of a butterfly, Papilio”. J. Comp. Physiol. A Neuroethol. Sens. Neural. Behav. Physiol. 189 (11): 791–800. doi:10.1007/s00359-003-0454-7. PMID 14520495.

[34] Cronin TW, Marshall NJ (1989). “A retina with at least ten spectral types of photoreceptors in a mantis shrimp”. Nature. 339 (6220): 137–40. Bibcode:1989Natur.339..137C. doi:10.1038/339137a0.

[35] Kelber A, Vorobyev M, Osorio D (February 2003). “Animal color vision—behavioural tests and physiological concepts”. Biol Rev Camb Philos Soc. 78 (1): 81–118. doi:10.1017/S1464793102005985. PMID 12620062.

[36] Introducing Comparative Colour Vision Colour Vision: A [50] Süsstrunk, Sabine. Chromatic Adaptation Study in Cognitive Science and the Philosophy of Percep- tion, By Evan Thompson [51] Lindbloom, Bruce. Chromatic Adaptation

[37] Roth, Lina S. V.; Lundström, Linda; Kelber, Almut; [52] Fairchild, Mark D. (2005). “8. Chromatic Adaptation”. Kröger, Ronald H. H.; Unsbo, Peter (March 30, 2009). Color Appearance Models. Wiley. p. 146. ISBN 0-470- “The pupils and optical systems of eyes”. Journal 01216-1. of Vision. 9 (3:27): 1–11. doi:10.1167/9.3.27. PMID 19757966.

[38] Shozo Yokoyamaa and F. Bernhard Radlwimmera, “The 3.8 External links Molecular Genetics of Red and Green Color Vision in Mammals”, Genetics, Vol. 153, 919–932, October 1999. • Feynman’s lecture on color vision [39] Jacobs G. H.; Deegan J. F. (2001). “Photopig- • Peter Gouras, “Color Vision”, Webvision, University ments and color vision in New World monkeys from of Utah School of Medicine, May 2009. the family Atelidae”. Proceedings of the Royal So- ciety B: Biological Sciences. 268 (1468): 695–702. • James T. Fulton, “The Human is a Blocked Tetra- doi:10.1098/rspb.2000.1421. chromat”, Neural Concepts, July 2009. [40] Jacobs G. H., Deegan J. F., Neitz , Neitz J., Crog- • Vurdlak, “Mega Color Blindness Test”, Mighty Op- nale M. A. (1993). “Photopigments and color vision tical Illusions, March 2009. in the nocturnal monkey, Aotus". Vision Research. 33 (13): 1773–1783. doi:10.1016/0042-6989(93)90168-V. • Clive (Max) Maxfield, “Color Vision: One of Na- PMID 8266633. ture’s Wonders”, CliveMaxfield.com, 2006. [41] Mollon J. D.; Bowmaker J. K.; Jacobs G. H. (1984). • Egopont, “Color Vision Test”. “Variations of color vision in a New World primate can be explained by polymorphism of retinal photopigments”. • Lintonapps, “Color Vision Test for Iphone” Proceedings of the Royal Society B: Biological Sciences. 222 (1228): 373–399. Bibcode:1984RSPSB.222..373M. • Bruce McEvoy (2008). “Color vision”. Retrieved doi:10.1098/rspb.1984.0071. 2012-03-30. [42] Sternberg, Robert J. (2006): Cognitive Psychology. 4th • What colors do animals see? Web Exhibits Ed. Thomson Wadsworth. • The Science of Why No One Agrees on the Color [43] Arrese CA, Beazley LD, Neumeyer C (March 2006). of This Dress “Behavioural evidence for marsupial trichromacy”. Curr. Biol. 16 (6): R193–4. doi:10.1016/j.cub.2006.02.036. PMID 16546067.

[44] Pinker, Steven (1997). How the Mind Works. New York: Norton. p. 191. ISBN 0-393-04535-8.

[45] Koyanagi, M.; Nagata, T.; Katoh, K.; Yamashita, S.; Tokunaga, F. (2008). “Molecular Evolution of Arthro- pod Color Vision Deduced from Multiple Opsin Genes of Jumping Spiders”. Journal of Molecular Evolution. 66 (2): 130–137. doi:10.1007/s00239-008-9065-9. PMID 18217181.

[46] David Hambling (May 30, 2002). “Let the light shine in: You don't have to come from another planet to see ultra- violet light”. EducationGuardian.co.uk.

[47] Jacobs GH, Neitz J, Deegan JF (1991). “Retinal receptors in rodents maximally sensitive to ultraviolet light”. Na- ture. 353 (6345): 655–6. Bibcode:1991Natur.353..655J. doi:10.1038/353655a0. PMID 1922382.

[48] FJ Varela; AG Palacios; TM Goldsmith (1993). Bischof, Hans-Joachim; Zeigler, H. Philip, eds. Vision, brain, and behavior in birds. Cambridge, Mass: MIT Press. pp. 77– 94. ISBN 0-262-24036-X.

[49] IC Cuthill; JC Partridge; ATD Bennett; SC Church; NS Hart; S Hunt (2000). “Ultraviolet Vision in Birds”. Ad- vances in the Study of Behavior. 29. pp. 159–214. Chapter 4

YUV

V Y′UV color model is used in the PAL and SECAM +.4 composite color video standards. Previous black-and- white systems used only luma (Y′) information. Color +.3 information (U and V) was added separately via a sub- carrier so that a black-and-white receiver would still be +.2 able to receive and display a color picture transmission in the receiver’s native black-and-white format. +.1 U Y′ stands for the luma component (the brightness) and U and V are the chrominance (color) components; −.4−.3−.2−.1 +.1 +.2 +.3 +.4 luminance is denoted by Y and luma by Y′ – the prime −.1 symbols (') denote gamma compression,[1] with “lumi-

−.2 nance” meaning perceptual (color science) brightness, while “luma” is electronic (voltage of display) brightness. −.3 The YPbPr color model used in analog

−.4 and its digital version YCbCr used in digital video are more or less derived from it, and are sometimes called Y′UV. (CB/PB and CR/PR are deviations from on blue–yellow and red–cyan axes, whereas U and V are Example of U-V color plane, Y′ value = 0.5, represented within blue–luminance and red–luminance differences respec- RGB color gamut tively.) The Y′IQ color space used in the analog NTSC television broadcasting system is related to it, although in YUV is a color space typically used as part of a color a more complex way. image pipeline. It encodes a or video tak- As for etymology, Y, Y′, U, and V are not abbreviations. ing human perception into account, allowing reduced The use of the letter Y for luminance can be traced back bandwidth for chrominance components, thereby typi- to the choice of X Y Z primaries. This lends itself natu- cally enabling transmission errors or compression arti- rally to the usage of the same letter in luma (Y′), which facts to be more efficiently masked by the human per- approximates a perceptually uniform correlate of lumi- ception than using a “direct” RGB-representation. Other nance. Likewise, U and V were chosen to differentiate color spaces have similar properties, and the main reason the U and V axes from those in other spaces, such as the to implement or investigate properties of Y′UV would be x and y chromaticity space. See the equations below or for interfacing with analog or digital television or photo- compare the historical development of the math.[2][3][4] graphic equipment that conforms to certain Y′UV stan- dards. The scope of the terms Y′UV, YUV, YCbCr, YPbPr, etc., is sometimes ambiguous and overlapping. Histori- 4.1 History cally, the terms YUV and Y′UV were used for a specific analog encoding of color information in television sys- Y′UV was invented when engineers wanted color televi- tems, while YCbCr was used for digital encoding of color sion in a black-and-white infrastructure.[5] They needed information suited for video and still-image compression a signal transmission method that was compatible with and transmission such as MPEG and JPEG. Today, the black-and-white (B&W) TV while being able to add term YUV is commonly used in the computer industry to color. The luma component already existed as the black describe file-formats that are encoded using YCbCr. and white signal; they added the UV signal to this as a The Y′UV model defines a color space in terms of one solution. luma (Y′) and two chrominance (UV) components. The The UV representation of chrominance was chosen over

30 4.2. CONVERSION TO/FROM RGB 31

the U and V signals would be zero and only the Y′ sig- nal would need to be transmitted. If R and B were to have been used, these would have non-zero values even in a B&W scene, requiring all three data-carrying signals. This was important in the early days of , because holding the U and V signals to zero while con- necting the signal to Y′ allowed color TV sets to display B&W TV without the additional ex- pense and complexity of special B&W circuitry. In ad- dition, black and white receivers could take the Y′ sig- nal and ignore the color signals, making Y′UV backward- compatible with all existing black-and-white equipment, input and output. It was necessary to assign a narrower bandwidth to the chrominance channel because there was no additional bandwidth available. If some of the lumi- nance information arrived via the chrominance channel (as it would have if RB signals were used instead of dif- ferential UV signals), B&W resolution would have been compromised.[6]

4.2 Conversion to/from RGB

4.2.1 SDTV with BT.601

Y′UV signals are typically created from RGB (red, green and blue) source. Weighted values of R, G, and B are summed to produce Y′, a measure of overall brightness or luminance. U and V are computed as scaled differences between Y′ and the B and R values. BT.601 defines the following constants:

WR = 0.299

WG = 1 − WR − WB = 0.587

WB = 0.114

UMax = 0.436

VMax = 0.615

Y′UV is computed from RGB as follows:

′ Y = WRR + WGG + WBB = 0.299R + 0.587G + 0.114B ′ B − Y ′ U = UMax ≈ 0.492(B − Y ) 1 − WB ′ R − Y ′ V = VMax ≈ 0.877(R − Y ) 1 − WR An image along with its Y′, U, and V components respectively The resulting ranges of Y′, U, and V respectively are [0, 1], [-UMₐₓ, UMₐₓ], and [-VMₐₓ, VMₐₓ]. straight R and B signals because U and V are color differ- Inverting the above transformation converts Y′UV to ence signals. This meant that in a black and white scene RGB: 32 CHAPTER 4. YUV

when directly converting between SDTV and HDTV, the luma (Y′) information is roughly the same but the rep- ′ 1 − W ′ V ′ R = Y + V R = Y + = Y + 1.14V resentation of the chroma (U & V) channel information V 0.877 Max needs conversion. Still in coverage of the CIE 1931 color W (1 − W ) W (1 − W ) G = Y ′ − U B B − V R R space the Rec. 709 color space is almost identical to Rec. UMaxWG VMaxWG 601 and covers 35.9%.[7] In contrast to this UHDTV with 0.232U 0.341V = Y ′ − − = Y ′ − 0.395U − 0.581VRec. 2020 covers a much larger area and would further 0.587 0.587 see its very own matrix set for YUV/Y′UV. 1 − W U B = Y ′ + U B = Y ′ + = Y ′ + 2.033U BT.709 defines these weight values: UMax 0.492 Equivalently, substituting values for the constants and expressing them as matrices gives these formulas for WR = 0.2126 BT.601: WB = 0.0722

     The conversion matrices & formulas for BT.709 are ′ Y 0.299 0.587 0.114 R these:  U  = −0.14713 −0.28886 0.436 G − − V 0.615 0.51499 0.10001 B           ′ R 1 0 1.13983 Y ′ Y 0.2126 0.7152 0.0722 R      G = 1 −0.39465 −0.58060 U  U = −0.09991 −0.33609 0.436 G V 0.615 −0.55861 −0.05639 B B 1 2.03211 0 V      R 1 0 1.28033 Y ′    − −   4.2.2 HDTV with BT.709 G = 1 0.21482 0.38059 U B 1 2.12798 0 V

0.9 520 UHDTV 4.2.3 Notes

HDTV 0.8 540 • The weights used to compute Y′ (top row of matrix) are identical to those used in the Y′IQ color space. 0.7 560 • Equal values of red, green and blue (i.e. levels of 0.6 gray) yield 0 for U and V. Black, RGB=(0, 0, 0), 500 580 yields YUV=(0, 0, 0). White, RGB=(1, 1, 1), yields 0.5 y YUV=(1, 0, 0). 0.4 600 • These formulas are traditionally used in analog tele- 620 visions and equipment; digital equipment such as 0.3 D65 490 700 HDTV and digital video cameras use Y′CbCr.

0.2 • UV planes in a range of [−1,1] 480 0.1 470 • Y′ value of 0 460 0.0 0.0 0.1380 0.2 0.3 0.4 0.5 0.6 0.7 0.8 • Y′ value of 0.5 x • Y′ value of 1 HDTV Rec. 709 (quite close to SDTV Rec. 601) compared with UHDTV Rec. 2020

For HDTV the ATSC decided to change the basic values 4.3 Numerical approximations for WR and WB compared to the previously selected val- ues in the SDTV system. For HDTV these values are pro- Prior to the development of fast SIMD floating-point pro- vided by Rec. 709. This decision further impacted on the cessors, most digital implementations of RGB→Y′UV matrix for the Y′UV↔RGB conversion so that its mem- used integer math, in particular fixed-point approxima- ber values are also slightly different. As a result, with tions. Approximation means that the precision of the SDTV and HDTV there are generally two distinct Y′UV used numbers (input data, output data and constant val- representations possible for any RGB triple: a SDTV- ues) is limited and thus a precision loss of typically about Y′UV and a HDTV-Y′UV one. This means in detail that the last binary digit is accepted by whoever makes use of 4.4. LUMINANCE/CHROMINANCE SYSTEMS IN GENERAL 33

that option in typically a trade off to improved computa- 4.3.2 Full swing for BT.601 tion speeds. For getting a 'full swing' 8 bit representation of Y′UV for In the following examples, the operator " a ≫ b " denotes SDTV/BT.601 the following operations can be used: a right-shift of a by b bits. For clarification the variables are using two suffix characters: 'u' is used for the un- 1. Basic transform from 8 bit RGB to 16 bit values signed final representation and 't' is used for the scaled (Y′: unsigned, U/V: signed, matrix values got rounded down intermediate value. The examples below are given so that the later on desired Y′UV range of each [0..255] for BT.601 only. The same principle can be used for do- is reached whilst no overflow can happen): ing functionally equivalent operations using values that do an acceptable match for data that follows the BT.709 or      ′ any other comparable standard. Y 76 150 29 R  U  = −43 −84 127 G Y′ values are conventionally shifted and scaled to the V 127 −106 −21 B range [16, 235] (referred to as studio swing or “TV lev- els”) rather than using the full range of [0, 255] (referred 2. Scale down (">>8”) to 8 bit values with rounding to as full swing or “PC levels”). This confusing practice ("+128”) (Y′: unsigned, U/V: signed): derives from the MPEG standards and explains why 16 is added to Y′ and why the Y′ coefficients in the basic ′ ′ transform sum to 220 instead of 255.[8] U and V values, Y t = (Y + 128) ≫ 8 which may be positive or negative, are summed with 128 Ut = (U + 128) ≫ 8 to make them always positive, giving a studio range of V t = (V + 128) ≫ 8 16–240 for U and V. (These ranges are important in video 3. Add an offset to the values to eliminate any negative editing and production, since using the wrong range will values (all results are 8 bit unsigned): result either in an image with “clipped” and , or a low-contrast image.) Y u′ = Y t′ Uu = Ut + 128 4.3.1 Studio swing for BT.601 V u = V t + 128

For getting the traditional 'studio swing' 8 bit representa- tion of Y′UV for SDTV/BT.601 the following operations 4.4 Luminance/chrominance sys- can be used: tems in general 1. Basic transform from 8 bit RGB to 16 bit values (Y′: unsigned, U/V: signed, matrix values got rounded so that The primary advantage of luma/chroma systems such as the later on desired Y′ range of [16..236] and U/V range Y′UV, and its relatives Y′IQ and YDbDr, is that they re- of [16..240] is reached): main compatible with black and white (largely due to the work of ). The Y′ channel saves all the data recorded by black and white      Y ′ 66 129 25 R cameras, so it produces a signal suitable for reception on  U  = −38 −74 112 G old monochrome displays. In this case, the U and V are V 112 −94 −18 B simply discarded. If displaying color, all three channels are used, and the original RGB information can be de- 2. Scale down (">>8”) to 8 bit with rounding ("+128”) coded. (Y′: unsigned, U/V: signed): Another advantage of Y′UV is that some of the infor- mation can be discarded in order to reduce bandwidth. The human eye has fairly little spatial sensitivity to color: Y t′ = (Y ′ + 128) ≫ 8 the accuracy of the brightness information of the lumi- Ut = (U + 128) ≫ 8 nance channel has far more impact on the image detail V t = (V + 128) ≫ 8 discerned than that of the other two. Understanding this human shortcoming, standards such as NTSC and PAL 3. Add an offset to the values to eliminate any negative reduce the bandwidth of the chrominance channels con- values (all results are 8 bit unsigned): siderably. (Bandwidth is in the temporal domain, but this translates into the spatial domain as the image is scanned out.) Y u′ = Y t′ + 16 Therefore, the resulting U and V signals can be substan- Uu = Ut + 128 tially “compressed”. In the NTSC (Y′IQ) and PAL sys- V u = V t + 128 tems, the chrominance signals had significantly narrower 34 CHAPTER 4. YUV bandwidth than that for the luminance. Early versions Y′UV is not an absolute color space. It is a way of en- of NTSC rapidly alternated between particular colors in coding RGB information, and the actual color displayed identical image areas to make them appear adding up to depends on the actual RGB colorants used to display the each other to the human eye, while all modern analogue signal. Therefore a value expressed as Y′UV is only pre- and even most digital video standards use chroma sub- dictable if standard RGB colorants are used (i.e. a fixed sampling by recording a picture’s color information at set of primary chromaticities, or particular set of red, reduced resolution. Only half the horizontal resolution green, and blue). compared to the brightness information is kept (termed Furthermore, the range of colors and 4:2:2 chroma subsampling), and often the vertical reso- (known as the color gamut) of RGB (whether it be lution is also halved (giving 4:2:0). The 4:x:x standard BT.601 or Rec.709) is far smaller than the range of col- was adopted due to the very earliest color NTSC stan- ors and brightnesses allowed by Y′UV. This can be very dard which used a chroma subsampling of 4:1:1 (where important when converting from Y′UV (or Y′CbCr) to the horizontal color resolution is quartered while the ver- RGB, since the formulas above can produce “invalid” tical is full resolution) so that the picture carried only a RGB values – i.e., values below 0% or very far above quarter as much color resolution compared to brightness 100% of the range (e.g. 
outside the standard 16-235 luma resolution. Today, only high-end equipment processing range (and 16-240 chroma range) for TVs and HD con- uncompressed signals uses a chroma subsampling of 4:4:4 tent, or outside 0-255 for standard definition on PCs). with identical resolution for both brightness and color in- Unless these values are dealt with they will usually be formation. “clipped” (i.e., limited) to the valid range of the channel The I and Q axes were chosen according to bandwidth affected. This changes the hue of the color, so it is there- needed by human vision, one axis being that requiring the fore often considered better to desaturate the offending most bandwidth, and the other (fortuitously at 90 degrees) colors such that they fall within the RGB gamut.[9] Like- the minimum. However, true I and Q demodulation was wise, when RGB at a given bit depth is converted to YUV relatively more complex, requiring two analog delay lines, at the same bit depth, several RGB colors can become the and NTSC receivers rarely used it. same Y′UV color, resulting in information loss. However, this color space conversion is lossy, particu- larly obvious in crosstalk from the luma to the chroma- carrying wire, and vice versa, in analogue equipment (in- 4.5 Relation with Y′CbCr cluding RCA connectors to transfer a digital signal, as all they carry is analogue , which is ei- Y′UV is often used as a term for YCbCr. However, ther YUV, YIQ, or even CVBS). Furthermore, NTSC they are completely different formats with different scale and PAL encoded color signals in a manner that causes factors.[10] high bandwidth chroma and luma signals to mix with each Nevertheless, the relationship between them in the stan- other in a bid to maintain backward compatibility with dard case is simple. In particular, the Y channel is the black and white television equipment, which results in dot same in both, both Cb and U are proportional to (B-Y), crawl and cross color artifacts. When the NTSC standard and both Cr and V are proportional to (R-Y). was created in the 1950s, this was not a real concern since the quality of the image was limited by the monitor equip- ment, not the limited-bandwidth signal being received. However today′s modern television is capable of display- 4.6 Types of sampling ing more information than is contained in these lossy sig- nals. To keep pace with the abilities of new display tech- To get a digital signal, Y′UV images can be sampled in nologies, attempts were made since the late 1970s to pre- several different ways; see chroma subsampling. serve more of the Y′UV signal while transferring images, such as SCART (1977) and S-Video (1987) connectors. Instead of Y′UV, Y′CbCr was used as the standard for- 4.7 Converting between Y′UV and mat for (digital) common video compression algorithms RGB such as MPEG-2. Digital television and DVDs preserve their compressed video streams in the MPEG-2 format, RGB files are typically encoded in 8, 12, 16 or 24 bits per which uses a full Y′CbCr color space, although retaining pixel. In these examples, we will assume 24 bits per pixel, the established process of chroma subsampling. The pro- which is written as RGB888. The standard byte format fessional CCIR 601 digital video format also uses Y′CbCr is: at the common chroma subsampling rate of 4:2:2, pri- marily for compatibility with previous analog video stan- r0 = rgb[0]; g0 = rgb[1]; b0 = rgb[2]; r1 = rgb[3]; g1 = dards. This stream can be easily mixed into any output rgb[4]; b1 = rgb[5]; ... format needed. 
Y′UV files can be encoded in 12, 16 or 24 bits per pixel. The common formats are Y′UV444 (or YUV444), 4.7. CONVERTING BETWEEN Y′UV AND RGB 35

YUV411, Y′UV422 (or YUV422) and Y′UV420p (or R = clamp((298 × C + 409 × E + 128) >> 8) YUV420). The apostrophe after the Y is often omit- G = clamp((298 × C − 100 × D − 208 × E + 128) >> 8) ted, as is the “p” after YUV420p. In terms of actual B = clamp((298 × C + 516 × D + 128) >> 8) file formats, YUV420 is the most common, as the data is more easily compressed, and the file extension is usu- Note: The above formulae are actually implied for ally ".YUV”. YCbCr. Though the term YUV is used here, it should be noted that YUV and YCbCr are not exactly the same The relation between data rate and sampling (A:B:C) is in a strict manner. defined by the ratio between Y to U and V channel.[11][12] The ITU-R version of the formulae is different: To convert from RGB to YUV or back, it is simplest to use RGB888 and YUV444. For YUV411, YUV422 and YUV420, the bytes need to be converted to YUV444 Y = 0.299 × R + 0.587 × G + 0.114 × B + 0 first. Cb = −0.169 × R − 0.331 × G + 0.499 × B + 128 YUV444 3 bytes per pixel (12 bytes per 4 pixels) C = 0.499 × R − 0.418 × G − 0.0813 × B + 128 YUV422 4 bytes per 2 pixels ( 8 bytes per 4 pixels) r YUV411 6 bytes per 4 pixels YUV420p 6 bytes per 4 pixels, reordered R = clamp(Y + 1.402 × (Cr − 128)) G = clamp(Y − 0.344 × (Cb − 128) − 0.714 × (Cr − 128)) 4.7.1 Y′UV444 to RGB888 conversion B = clamp(Y + 1.772 × (Cb − 128))

The function [R, G, B] = Y′UV444toRGB888(Y′, U, V) Integer operation of ITU-R standard for YCbCr(8 bits per converts Y′UV format to simple RGB format. channel) to RGB888: The RGB conversion formulae used for Y′UV444 format are also applicable to the standard NTSC TV transmission Cr = Cr − 128; format of YUV420 (or YUV422 for that matter). For C = C − 128; YUV420, since each U or V sample is used to represent 4 b b Y samples that form a square, a proper sampling method R = Y + Cr + (Cr >> 2) + (Cr >> 3) + (Cr >> 5) can allow the utilization of the exact conversion formulae G = Y − ((Cb >> 2) + (Cb >> 4) + (Cb >> 5)) − ((Cr >> 1) + (Cr >> 3) + (Cr >> 4) + (Cr >> 5)) shown below. For more details, please see the 420 format B = Y + Cb + (Cb >> 1) + (Cb >> 2) + (Cb >> 6) demonstration in the bottom section of this article. These formulae are based on the NTSC standard: 4.7.2 Y′UV422 to RGB888 conversion

Y ′ = 0.299 × R + 0.587 × G + 0.114 × B Input: Read 4 bytes of Y′UV (u, y1, v, y2 ) U = −0.147 × R − 0.289 × G + 0.436 × B Output: Writes 6 bytes of RGB (R, G, B, R, G, B) V = 0.615 × R − 0.515 × G − 0.100 × B

On older, non-SIMD architectures, floating point arith- u = [0]; y1 = yuv[1]; v = yuv[2]; y2 = yuv[3]; metic is much slower than using fixed-point arithmetic, Using this information it could be parsed as regular so an alternative formulation is:[13] Y′UV444 format to get 2 RGB pixels info: rgb1 = Y′UV444toRGB888(y1, u, v); rgb2 = Y ′ = ((66 × R + 129 × G + 25 × B + 128) >> 8) + 16Y′UV444toRGB888(y2, u, v); U = ((−38 × R − 74 × G + 112 × B + 128) >> 8) +Y′UV422 128 can also be expressed in YUY2 FourCC V = ((112 × R − 94 × G − 18 × B + 128) >> 8) + 128format code. That means 2 pixels will be defined in each macropixel (four bytes) treated in the image. For the conversion from Y'UV to RGB, using the coeffi- Address range cients C, D and E and noting that clamp() denotes clamp- ing a value to the range of 0 to 255, the following formu- Y0 U0 Y1 V0 Y2 U1 Y3 V1 lae provide the conversion from Y′UV to RGB (NTSC …. version):

4.7.3 Y′UV411 to RGB888 conversion C = Y ′ − 16 D = U − 128 Input: Read 6 bytes of Y′UV E = V − 128 Output: Writes 12 bytes of RGB 36 CHAPTER 4. YUV

// Extract YUV components u = yuv[0]; y1 = yuv[1]; y2 = yuv[2]; v = yuv[3]; y3 = yuv[4]; y4 = yuv[5]; rgb1 = Y′UV444toRGB888(y1, u, v); rgb2 = Y′UV444toRGB888(y2, u, v); rgb3 = Y′UV444toRGB888(y3, u, v); rgb4 = Y′UV444toRGB888(y4, u, v); So the result is we are getting 4 RGB pixels values (4*3 bytes) from 6 bytes. This means reducing the size of transferred data to half, with a loss of quality.

4.7.4 Y′UV420p (and Y′V12 or YV12) to RGB888 conversion

Y′UV420p is a planar format, meaning that the Y′, U, As shown in the above image, the Y′, U and V compo- and V values are grouped together instead of interspersed. nents in Y′UV420 are encoded separately in sequential The reason for this is that by grouping the U and V val- blocks. A Y′ value is stored for every pixel, followed by ues together, the image becomes much more compress- a U value for each 2×2 square block of pixels, and finally ible. When given an array of an image in the Y′UV420p a V value for each 2×2 block. Corresponding Y′, U and format, all the Y′ values come first, followed by all the U V values are shown using the same color in the diagram values, followed finally by all the V values. above. Read line-by-line as a byte stream from a device, the Y′ block would be found at position 0, the U block at The Y′V12 format is essentially the same as Y′UV420p, position x×y (6×4 = 24 in this example) and the V block but it has the U and V data switched: the Y′ values are at position x×y + (x×y)/4 (here, 6×4 + (6×4)/4 = 30). followed by the V values, with the U values last. As long as care is taken to extract U and V values from the proper locations, both Y′UV420p and Y′V12 can be processed using the same algorithm. As with most Y′UV formats, there are as many Y′ values as there are pixels. Where X equals the height multiplied 4.7.5 Y′UV420sp (NV21) to RGB conver- by the width, the first X indices in the array are Y′ values that correspond to each individual pixel. However, there sion (Android) are only one fourth as many U and V values. The U and V values correspond to each 2 by 2 block of the image, This format (NV21) is the standard picture format on meaning each U and V entry applies to four pixels. After Android camera preview. YUV 4:2:0 planar image, with the Y′ values, the next X/4 indices are the U values for 8 bit Y samples, followed by interleaved V/U plane with each 2 by 2 block, and the next X/4 indices after that are 8bit 2x2 subsampled chroma samples.[14] the V values that also apply to each 2 by 2 block. C++ code used on Android to convert pixels of Translating Y′UV420p to RGB is a more involved pro- YUVImage:[15] cess compared to the previous formats. Lookup of the Y′, U and V values can be done using the following method: void YUVImage::yuv2rgb(uint8_t yValue, uint8_t uValue, uint8_t vValue, uint8_t *r, uint8_t *g, uint8_t size.total = size.width * size.height; y = yuv[position.y *b) const { int rTmp = yValue + (1.370705 * (vValue- * size.width + position.x]; u = yuv[(position.y / 2) * 128)); int gTmp = yValue - (0.698001 * (vValue-128)) (size.width / 2) + (position.x / 2) + size.total]; v = - (0.337633 * (uValue-128)); int bTmp = yValue + yuv[(position.y / 2) * (size.width / 2) + (position.x / 2) + (1.732446 * (uValue-128)); *r = clamp(rTmp, 0, 255); size.total + (size.total / 4)]; rgb = Y′UV444toRGB888(y, *g = clamp(gTmp, 0, 255); *b = clamp(bTmp, 0, 255); u, v); } Here "/" means integer division. 4.9. EXTERNAL LINKS 37

4.8 References

[1] Engineering Guideline EG 28, “Annotated Glossary of Essential Terms for Electronic Production,” SMPTE, 1993. [2] CIELUV [3] CIE 1960 color space [4] Macadam, David L. (1 August 1937). “Projective Trans- formations of I. C. I. Color Specifications”. Journal of the Optical society of America. 27 (8): 294–297. doi:10.1364/JOSA.27.000294. Retrieved 12 April 2014. [5] Maller, Joe. RGB and YUV Color, FXScript Reference [6] W. Wharton & D. Howorth, Principles of Television Re- ception, Pitman Publishing, 1971, pp 161-163 [7] ""Super Hi-Vision” as Next-Generation Television and Its Video Parameters”. Information Display. Retrieved 1 January 2013. [8] Keith Jack. Video Demystified. ISBN 1-878707-09-4. [9] Limiting of YUV digital video signals (BBC publication) Authors: V.G. Devereux http://downloads.bbc.co.uk/rd/ pubs/reports/1987-22. [10] Poynton, Charles (19 June 1999). “YUV and luminance considered harmful”. Retrieved 22 August 2008. [11] msdn.microsoft.com, Recommended 8-Bit YUV Formats for Video Rendering [12] msdn.microsoft.com, YUV Video Subtypes [13] https://msdn.microsoft.com/en-us/library/ms893078. aspx [14] fourcc.com YUV pixel formas [15] https://android.googlesource.com/platform/frameworks/ av/+/master/media/libstagefright/yuv/YUVImage.cpp

4.9 External links

• RGB/Y′UV Pixel Conversion • Explanation of many different formats in the Y′UV family • Poynton, Charles. Video engineering • Kohn, Mike. Y′UV422 to RGB using SSE/Assembly • YUV, YCbCr, YPbPr color spaces • Color formats for image and video processing - Color conversion between RGB, YUV, YCbCr and YPbPr • How to convert RGB to YUV420P • libyuv • pixfc-sse - C library of SSE-optimized color format conversions Chapter 5

YCbCr

“CbCr” redirects here. For other uses, see CBCR. as a part of the in video and digital YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light intensity is nonlin- early encoded based on gamma corrected RGB primaries. Y′CbCr color spaces are defined by a mathematical coordinate transformation from an associated RGB color space. If the underlying RGB color space is absolute, the Y′CbCr color space is an absolute color space as well; conversely, if the RGB space is ill-defined, so is Y′CbCr.

5.1 Rationale

Cathode ray tube displays are driven by red, green, and blue voltage signals, but these RGB signals are not ef- ficient as a representation for storage and transmission, since they have a lot of redundancy. YCbCr and Y′CbCr are a practical approximation to A visualization of YCbCr color space color processing and perceptual uniformity, where the primary colors corresponding roughly to red, green and blue are processed into perceptually meaningful informa- tion. By doing this, subsequent image/video processing, transmission and storage can do operations and introduce errors in perceptually meaningful ways. Y′CbCr is used to separate out a luma signal (Y′) that can be stored with high resolution or transmitted at high bandwidth, and two chroma components (CB and CR) that can be bandwidth- reduced, subsampled, compressed, or otherwise treated separately for improved system efficiency. One practical example would be decreasing the band- width or resolution allocated to “color” compared to “black and white”, since humans are more sensitive to the black-and-white information (see image example to the right). This is called chroma subsampling.

5.2 YCbCr

The CbCr plane at constant luma Y′=0.5 YCbCr is sometimes abbreviated to YCC. Y′CbCr is of- ten called YPbPr when used for analog component video, YCBCR or Y′CBCR, is a family of color spaces used although the term Y′CbCr is commonly used for both sys-

38 5.2. YCBCR 39

tems, with or without the prime. Y′CbCr is often confused with the YUV color space, and typically the terms YCbCr and YUV are used inter- changeably, leading to some confusion; when referring to signals in video or digital form, the term “YUV” mostly means “Y′CbCr”. Y′CbCr signals (prior to scaling and offsets to place the signals into digital form) are called YPbPr, and are cre- ated from the corresponding gamma-adjusted RGB (red, green and blue) source using two defined constants KB and KR as follows:

′ ′ ′ ′ Y = KR · R + (1 − KR − KB) · G + KB · B 1 B′ − Y ′ PB = · 2 1 − KB 1 R′ − Y ′ PR = · 2 1 − KR where KB and KR are ordinarily derived from the defi- nition of the corresponding RGB space. (The equivalent matrix manipulation is often referred to as the “color ma- trix”.) Here, the prime ′ symbols mean is be- ing used; thus R′, G′ and B′ nominally range from 0 to 1, with 0 representing the minimum intensity (e.g., for dis- play of the color black) and 1 the maximum (e.g., for dis- play of the color white). The resulting luma (Y) value will then have a nominal range from 0 to 1, and the chroma (PB and PR) values will have a nominal range from −0.5 to +0.5. The reverse conversion process can be readily derived by inverting the above equations. When representing the signals in digital form, the results are scaled and rounded, and offsets are typically added. For example, the scaling and offset applied to the Y′ com- ponent per specification (e.g. MPEG-2[1]) results in the value of 16 for black and the value of 235 for white when using an 8-bit representation. The standard has 8-bit digi- tized versions of CB and CR scaled to a different range of 16 to 240. Consequently, rescaling by the fraction (235- 16)/(240-16) = 219/224 is sometimes required when do- ing color matrixing or processing in YCbCr space, result- ing in quantization when the subsequent pro- cessing is not performed using higher bit depths. The scaling that results in the use of a smaller range of digital values than what might appear to be desirable for representation of the nominal range of the input data al- lows for some “” and “undershoot” during pro- cessing without necessitating undesirable . This “head-room” and “toe-room” can also be used for exten- sion of the nominal color gamut, as specified by xvYCC. A color image and its Y, CB and CR components. The Y image The value 235 accommodates a maximum black-to-white is essentially a greyscale copy of the main image. overshoot of 255 - 235 = 20, or 20 / ( 235 - 16 ) = 9.1%, which is slightly larger than the theoretical max- imum overshoot (Gibbs’ Phenomenon) of about 8.9% of the maximum step. The toe-room is smaller, allowing 40 CHAPTER 5. YCBCR only 16 / 219 = 7.3% overshoot, which is less than the The resultant signals range from 16 to 235 for Y′ (Cb theoretical maximum overshoot of 8.9%. and Cr range from 16 to 240); the values from 0 to 15 Since the equations defining YCbCr are formed in a way are called footroom, while the values from 236 to 255 are that rotates the entire nominal RGB color cube and scales called headroom. it to fit within a (larger) YCbCr color cube, there are some Alternatively, digital Y′CbCr can derived from digital points within the YCbCr color cube that cannot be rep- R'dG'dB'd (8 bits per sample, each using the full range resented in the corresponding RGB domain (at least not with zero representing black and 255 representing white) within the nominal RGB range). This causes some diffi- according to the following equations: culty in determining how to correctly interpret and dis- play some YCbCr signals. These out-of-range YCbCr ′ ′ ′ values are used by xvYCC to encode colors outside the ′ 65.738 · R 129.057 · G 25.064 · B Y = 16+ D + D + D BT.709 gamut. 256 256 256 37.945 · R′ 74.494 · G′ 112.439 · B′ C = 128− D − D + D B 256 256 256 112.439 · R′ 94.154 · G′ 18.285 · B′ C = 128+ D − D − D R 256 256 256 In the above formula, the scaling factors are multiplied by 256 255 . 
This allows for the value 256 in the denominator, which can be calculated by a single bitshift. RGB to YCbCr conversion If the R'dG'dB'd digital source includes footroom and headroom, the footroom offset 16 needs to be subtracted 255 first from each signal, and a scale factor of 219 needs to 5.2.1 ITU-R BT.601 conversion be included in the equations. The inverse transform is: The form of Y′CbCr that was defined for standard- definition television use in the ITU-R BT.601 (formerly CCIR 601) standard for use with 298.082 · Y ′ 408.583 · C R′ = + R − 222.921 is derived from the corresponding RGB space as follows: D 256 256 298.082 · Y ′ 100.291 · C 208.120 · C G′ = − B − R + 135.576 D 256 256 256 K = 0.299 · ′ · R ′ 298.082 Y 516.412 CB − BD = + 276.836 KG = 0.587 256 256 KB = 0.114 The inverse transform without any roundings (using val- ues coming directly from ITU-R BT.601 recommenda- From the above constants and formulas, the following can tion) is: be derived for ITU-R BT.601. Analog YPbPr from analog R'G'B' is derived as follows: 255 255 R′ = · (Y ′ − 16) + · 0.701 ·(C − 128) D 219 112 R ′ ′ ′ ′ Y = 0.299 · R + 0.587 · G + 0.114 · B′ 255 · ′ − − 255 · ·0.114 · − − 255 · ·0.299 · − GD = (Y 16) 0.886 (CB 128) 0.701 (CR 128) ′ ′ ′ 219 112 0.587 112 0.587 P = − 0.168736 · R − 0.331264 · G + 0.5 · B B 255 255 ′ ′ ′ ′ ′ · − · − ·B = · (Y − 16)+ · 0.886 ·(CB − 128) PR = 0.5 R 0.418688 G 0.081312 BD 219 112 Digital Y′CbCr (8 bits per sample) is derived from analog This form of Y′CbCr is used primarily for older standard- R'G'B' as follows: definition television systems, as it uses an RGB model that fits the phosphor emission characteristics of older CRTs.

Y ′ = 16+ (65.481 · R′+ 128.553 · G′+ 24.966 · B′) ′ ′ 5.2.2′ ITU-R BT.709 conversion CB = 128+ (−37.797 · R − 74.203 · G + 112.0 · B ) ′ ′ ′ CR = 128+ (112.0 · R − 93.786 · G − 18.214A· differentB ) form of Y′CbCr is specified in the ITU-R BT.709 standard, primarily for HDTV use. The newer or simply componentwise form is also used in some computer-display oriented ap- plications. In this case, the values of Kb and Kr differ, but the formulas for using them are the same. For ITU-R ′ (Y ,CB,CR) = (16, 128, 128) + (219 · Y, 224 · PB, 224BT.709,· PR) the constants are: 5.3. CBCR PLANE AT Y = 0.5 41

top priority is the most accurate retention of luminance 0.9 [3] 520 UHDTV information. For YcCbcCrc, the coefficients are:

HDTV 0.8 540 KB = 0.0593 0.7 560 KR = 0.2627

0.6

500 580 5.2.4 JPEG conversion 0.5 y JFIF usage of JPEG supports Y′CbCr where Y′, CB and 0.4 600 CR have the full 8-bit range of [0...255].[4] Below are 620 0.3 D65 the conversion equations expressed to six decimal digits 490 700 of precision. (For ideal equations, see ITU-T T.871.[5])

0.2 Note that for the following formulae, the range of each input (R,G,B) is also the full 8-bit range of [0...255]. 480 0.1

470 ′ ′ ′ ′ 460 · · · 0.0 Y = 0 + (0.299 RD) + (0.587 GD) + (0.114 BD) 380 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 − · ′ − · ′ · ′ x CB = 128 (0.168736 RD) (0.331264 GD) + (0.5 BD) · ′ − · ′ − · ′ CR = 128 + (0.5 RD) (0.418688 GD) (0.081312 BD) Rec. 709 compared with Rec. 2020 And back:

′ ′ · − RD = Y + 1.402 (CR 128) ′ ′ KB = 0.0722 − · − − · − GD = Y 0.344136 (CB 128) 0.714136 (CR 128) ′ ′ KR = 0.2126 · − BD = Y + 1.772 (CB 128) This form of Y′CbCr is based on an RGB model that more closely fits the phosphor emission characteristics of 5.3 CbCr Plane at Y = 0.5 newer CRTs and other modern display equipment. The definitions of the R', G', and B' signals also differ • Y=0.5 between BT.709 and BT.601, and differ within BT.601 depending on the type of TV system in use (625-line as Note: when Y = 0, R, G and B must all be zero, thus Cb in PAL and SECAM or 525-line as in NTSC), and differ and Cr can only be zero. Likewise, when Y = 1, R, G and further in other specifications. In different designs there B must all be 1, thus Cb and Cr can only be zero. are differences in the definitions of the R, G, and B chro- Unlike R, G, and B, the Y, Cb and Cr values are not in- maticity coordinates, the reference white point, the sup- dependent; choosing YCbCr values arbitrarily may lead ported gamut range, the exact gamma pre-compensation to one or more of the RGB values that are out of gamut, functions for deriving R', G' and B' from R, G, and B, i.e. greater than 1.0 or less than 0.0. and in the scaling and offsets to be applied during con- version from R'G'B' to Y′CbCr. So proper conversion of Y′CbCr from one form to the other is not just a matter of inverting one matrix and applying the other. In fact, when 5.4 References Y′CbCr is designed ideally, the values of KB and KR are derived from the precise specification of the RGB color [1] e.g. the MPEG-2 specification, ITU H.262 2000 E pg. 44 primary signals, so that the luma (Y′) signal corresponds [2] Charles Poynton, Digital Video and HDTV, Chapter 24, as closely as possible to a gamma-adjusted measurement pp. 291–292, Morgan Kaufmann, 2003. of luminance (typically based on the CIE 1931 measure- [3] “BT.2020 : Parameter values for ultra-high definition ments of the response of the human visual system to color television systems for production and international pro- [2] stimuli). gramme exchange”. International Telecommunication Union. June 2014. Retrieved 2014-09-08. 5.2.3 ITU-R BT.2020 conversion [4] JPEG File Interchange Format Version 1.02 [5] T.871: Information technology – Digital compression and The ITU-R BT.2020 standard defines both gamma cor- coding of continuous-tone still images: JPEG File Inter- rected Y′CbCr and a linear encoded version of YCbCr change Format (JFIF). ITU-T. September 11, 2012. Re- called YcCbcCrc.[3] YcCbcCrc may be used when the trieved 2016-07-25. 42 CHAPTER 5. YCBCR

5.5 External links

• Charles Poynton — Color FAQ

• Charles Poynton — Video engineering • Color Space Visualization

• PC Magazine Encyclopedia: YCbCr

• YUV, YCbCr, YPbPr color spaces. • Color formats for image and video processing — Color conversion between RGB, YUV, YCbCr and YPbPr.

. Chapter 6

Chroma subsampling

Chroma subsampling is the practice of encoding images detail at a lower rate. In video systems, this is achieved by implementing less resolution for chroma information through the use of color difference components. The sig- than for luma information, taking advantage of the human nal is divided into a luma (Y') component and two color visual system’s lower acuity for color differences than for difference components (chroma). luminance.[1] In human vision there are three channels for color detec- It is used in many video encoding schemes — both analog tion, and for many color systems, three “channels” is suf- and digital — and also in JPEG encoding. ficient for representing most colors. For example: red, green, blue or magenta, yellow, cyan. But there are other ways to represent the color. In many video systems, the 6.1 Rationale three channels are luminance and two chroma channels. In video, the luma and chroma components are formed as a weighted sum of gamma-corrected (tristimulus) R'G'B' components instead of linear (tristimulus) RGB compo- nents. As a result, luma must be distinguished from lumi- nance. That there is some “bleeding” of luminance and color information between the luma and chroma compo- nents in video, the error being greatest for highly satu- rated colors and noticeable in between the magenta and green bars of a color bars test pattern (that has chroma subsampling applied), should not be attributed to this engineering approximation being used. Indeed, similar bleeding can occur also with gamma = 1, whence the re- In full size, this image shows the difference between four sub- versing of the order of operations between gamma cor- sampling schemes. Note how similar the color images appear. The lower row shows the resolution of the color information. rection and forming the weighted sum can make no dif- ference. The chroma can influence the luma specifically Digital signals are often compressed to save transmission at the pixels where the subsampling put no chroma. In- time and reduce file size. Since the human visual sys- terpolation may then put chroma values there which are tem is much more sensitive to variations in brightness incompatible with the luma value there, and further post- than color, a video system can be optimized by devot- processing of that Y'CbCr into R'G'B' for that pixel is ing more bandwidth to the luma component (usually de- what ultimately produces false luminance upon display. noted Y'), than to the color difference components Cb and Cr. In compressed images, for example, the 4:2:2 Y'CbCr scheme requires two-thirds the bandwidth of (4:4:4) R'G'B'. This reduction results in almost no visual difference as perceived by the viewer.

Original without color subsampling. 200% zoom. 6.2 How subsampling works

Because the human visual system is less sensitive to the position and motion of color than luminance,[2] bandwidth can be optimized by storing more luminance detail than color detail. At normal viewing distances, there is no perceptible loss incurred by sampling the color Image after color subsampling (compressed with Sony

43 44 CHAPTER 6. CHROMA SUBSAMPLING

Vegas DV codec, box filtering applied.) 6.4.2 4:2:2

The two chroma components are sampled at half the sam- 6.3 Sampling systems and ratios ple rate of luma: the horizontal chroma resolution is halved. This reduces the bandwidth of an uncompressed video signal by one-third with little to no visual differ- The subsampling scheme is commonly expressed as a ence. three part ratio J:a:b (e.g. 4:2:2) or four parts if alpha channel is present (e.g. 4:2:2:4), that describe the num- Many high-end digital video formats and interfaces use ber of luminance and chrominance samples in a concep- this scheme: tual region that is J pixels wide, and 2 pixels high. The parts are (in their respective order): • AVC-Intra 100

• J: horizontal sampling reference (width of the con- • Digital ceptual region). Usually, 4. • DVCPRO50 and DVCPRO HD • a: number of chrominance samples (Cr, Cb) in the first row of J pixels. • Digital-S • b: number of changes of chrominance samples (Cr, • CCIR 601 / / D1 Cb) between first and second row of J pixels. • ProRes (HQ, 422, LT, and Proxy) • Alpha: horizontal factor (relative to first digit). May be omitted if alpha component is not present, and is • XDCAM HD422 equal to J when present. • Canon MXF HD422 This notation is not valid for all combinations and has ex- ceptions, e.g. 4:1:0 (where the height of the region is not 6.4.3 4:2:1 2 pixels but 4 pixels, so if 8 bits/component are used the media would be 9 bits/pixel) and 4:2:1. This sampling mode is not expressible in J:a:b notation. An explanatory image of different chroma subsampling '4:2:1' is an obsolete term from a previous notational schemes can be seen at the following link: http://lea. scheme, and very few software or hardware use it. hamradio.si/~{}s51kq/subsample. (source: “Basics of Cb horizontal resolution is half that of Cr (and a quarter Video": http://lea.hamradio.si/~{}s51kq/V-BAS.HTM) of the horizontal resolution of Y). This exploits the fact or in details in Chrominance Subsampling in Digital Im- that human eye has less spatial sensitivity to blue/yellow ages, by Douglas Kerr. than to red/green. NTSC is similar, in using lower res- The mapping examples given are only theoretical and for olution for blue/yellow than red/green, which in turn has illustration. Also note that the diagram does not indicate less resolution than luma. any chroma filtering, which should be applied to avoid . 6.4.4 4:1:1 To calculate required bandwidth factor relative to 4:4:4 (or 4:4:4:4), one needs to sum all the factors and divide In 4:1:1 chroma subsampling, the horizontal color res- the result by 12 (or 16, if alpha is present). olution is quartered, and the bandwidth is halved com- pared to no chroma subsampling. Initially, 4:1:1 chroma subsampling of the DV format was not considered to be 6.4 Types of sampling and subsam- broadcast quality and was only acceptable for low-end and consumer applications.[3][4] Currently, DV-based formats pling (some of which use 4:1:1 chroma subsampling) are used professionally in electronic news gathering and in 6.4.1 4:4:4 servers. DV has also been sporadically used in feature films and in digital . Each of the three Y'CbCr components have the same In the NTSC system, if the luma is sampled at 13.5 MHz, sample rate, thus there is no chroma subsampling. This then this means that the Cr and Cb signals will each be scheme is sometimes used in high-end film scanners and sampled at 3.375 MHz, which corresponds to a maximum cinematic post production. Nyquist bandwidth of 1.6875 MHz, whereas traditional Note that “4:4:4” may instead be referring to R'G'B' color “high-end broadcast analog NTSC encoder” would have space, which implicitly does also not have any chroma a Nyquist bandwidth of 1.5 MHz and 0.5 MHz for the I/Q subsampling. Formats such as HDCAM SR can record channels. However, in most equipment, especially cheap 4:4:4 R'G'B' over dual-link HD-SDI. TV sets and VHS/ VCR’s the chroma channels 6.4. TYPES OF SAMPLING AND SUBSAMPLING 45

have only the 0.5 MHz bandwidth for both Cr and Cb • In MPEG-2, Cb and Cr are cosited horizontally. Cb (or equivalently for I/Q). Thus the DV system actually and Cr are sited between pixels in the vertical direc- provides a superior color bandwidth compared to the best tion (sited interstitially). composite analog specifications for NTSC, despite having only 1/4 of the chroma bandwidth of a “full” digital signal. • In JPEG/JFIF, H.261, and MPEG-1, Cb and Cr are Formats that use 4:1:1 chroma subsampling include: sited interstitially, halfway between alternate luma samples. • DVCPRO (NTSC and PAL) • In 4:2:0 DV, Cb and Cr are co-sited in the horizontal • NTSC DV and DVCAM direction. In the vertical direction, they are co-sited • D-7 on alternating lines.

6.4.5 4:2:0 Most digital video formats corresponding to PAL use 4:2:0 chroma subsampling, with the exception of In 4:2:0, the horizontal sampling is doubled compared to DVCPRO25, which uses 4:1:1 chroma subsampling. 4:1:1, but as the Cb and Cr channels are only sampled on Both the 4:1:1 and 4:2:0 schemes halve the bandwidth each alternate line in this scheme, the vertical resolution compared to no chroma subsampling. is halved. The data rate is thus the same. This fits rea- With interlaced material, 4:2:0 chroma subsampling can sonably well with the PAL color encoding system since result in motion artifacts if it is implemented the same this has only half the vertical chrominance resolution of way as for progressive material. The luma samples are NTSC. It would also fit extremely well with the SECAM derived from separate time intervals while the chroma color encoding system since like that format, 4:2:0 only samples would be derived from both time intervals. It stores and transmits one color channel per line (the other is this difference that can result in motion artifacts. The channel being recovered from the previous line). How- MPEG-2 standard allows for an alternate interlaced sam- ever, little equipment has actually been produced that out- pling scheme where 4:2:0 is applied to each field (not both puts a SECAM analogue video signal. In general SECAM fields at once). This solves the problem of motion arti- territories either have to use a PAL capable display or a facts, reduces the vertical chroma resolution by half, and transcoder to convert the PAL signal to SECAM for dis- can introduce comb-like artifacts in the image. play. Different variants of 4:2:0 chroma configurations are found in:

• All ISO/IEC MPEG and ITU-T VCEG H.26x video coding standards including H.262/MPEG-2 Part 2 Original. *This image shows a single field. The moving implementations (although some profiles of MPEG- text has some motion blur applied to it. 4 Part 2 and H.264/MPEG-4 AVC allow higher- quality sampling schemes such as 4:4:4) • DVD-Video and Blu-ray Disc.[5][6] • PAL DV and DVCAM 4:2:0 progressive sampling applied to moving interlaced • HDV material. Note that the chroma leads and trails the moving text. *This image shows a single field. • AVCHD and AVC-Intra 50 • Apple Intermediate Codec • most common JPEG/JFIF and MJPEG implemen- tations • VC-1 4:2:0 interlaced sampling applied to moving interlaced material. *This image shows a single field. • SuperMHL [7] In the 4:2:0 interlaced scheme however, vertical resolu- tion of the chroma is roughly halved since the chroma Cb and Cr are each subsampled at a factor of 2 both hor- samples effectively describe an area 2 samples wide by 4 izontally and vertically. samples tall instead of 2X2. As well, the spatial displace- There are three variants of 4:2:0 schemes, having differ- ment between both fields can result in the appearance of ent horizontal and vertical siting. [8] comb-like chroma artifacts. 46 CHAPTER 6. CHROMA SUBSAMPLING

6.5 Out-of-gamut colors

One of the artifacts that can occur with chroma subsam- pling is that out-of-gamut colors can occur upon chroma Original still image. reconstruction. Suppose the image consisted of alternat- ing 1-pixel red and black lines and the subsampling omit- ted the chroma for the black pixels. Chroma from the red pixels will be reconstructed onto the black pixels, caus- ing the new pixels to have positive red and negative green and blue values. As displays cannot output negative light (negative light does not exist), these negative values will 4:2:0 progressive sampling applied to a still image. Both effectively be clipped and the resulting luma value will fields are shown. be too high.[10] Similar artifacts arise in the less artificial example of gradation near a fairly sharp red/black bound- ary. Filtering during subsampling can also cause colors to go out of gamut. 4:2:0 interlaced sampling applied to a still image. Both fields are shown. If the interlaced material is to be de-interlaced, the comb- 6.6 Terminology like chroma artifacts (from 4:2:0 interlaced sampling) can [9] be removed by blurring the chroma vertically. The term Y'UV refers to an analog encoding scheme while Y'CbCr refers to a digital encoding scheme. One difference between the two is that the scale factors on the chroma components (U, V, Cb, and Cr) are different. 6.4.6 4:1:0 However, the term YUV is often used erroneously to re- fer to Y'CbCr encoding. Hence, expressions like “4:2:2 YUV” always refer to 4:2:2 Y'CbCr since there simply is This ratio is possible, and some codecs support it, but it no such thing as 4:x:x in analog encoding (such as YUV). is not widely used. This ratio uses half of the vertical and one-fourth the horizontal color resolutions, with only one- In a similar vein, the term luminance and the symbol Y eighth of the bandwidth of the maximum color resolu- are often used erroneously to refer to luma, which is de- tions used. Uncompressed video in this format with 8-bit noted with the symbol Y'. Note that the luma (Y') of video quantization uses 10 bytes for every macropixel (which engineering deviates from the luminance (Y) of color sci- is 4 x 2 pixels). It has the equivalent chrominance band- ence (as defined by CIE). Luma is formed as the weighted width of a PAL I signal decoded with a delay line decoder, sum of gamma-corrected (tristimulus) RGB components. and still very much superior to NTSC. Luminance is formed as a weighed sum of linear (tristim- ulus) RGB components. In practice, the CIE symbol Y is often incorrectly used • Some video codecs may operate at 4:1:0.5 or to denote luma. In 1993, SMPTE adopted Engineering 4:1:0.25 as an option, so as to allow similar to VHS Guideline EG 28, clarifying the two terms. Note that the quality. prime symbol ' is used to indicate gamma correction. Similarly, the chroma/chrominance of video engineer- ing differs from the chrominance of color science. The chroma/chrominance of video engineering is formed 6.4.7 3:1:1 from weighted tristimulus components, not linear compo- nents. In video engineering practice, the terms chroma, Used by Sony in their HDCAM High Definition chrominance, and saturation are often used interchange- recorders (not HDCAM SR). In the horizontal dimen- ably to refer to chrominance. sion, luma is sampled horizontally at three quarters of the full HD sampling rate- 1440 samples per row instead of 1920. Chroma is sampled at 480 samples per row, a third 6.7 History of the luma sampling rate. 
In the vertical dimension, both luma and chroma are sam- Chroma subsampling was developed in the 1950s by Alda pled at the full HD sampling rate (1080 samples verti- Bedford for the development of color television by RCA, cally). which developed into the NTSC standard; luma-chroma 6.10. SEE ALSO 47 separation was developed earlier, in 1938 by Georges haloing artifacts. Valensi. Through studies, he showed that the human eye has high resolution only for black and white, somewhat less for 6.10 See also “mid-range” colors like and , and much less for colors on the end of the spectrum, reds and . • Color space Using this knowledge allowed RCA to develop a system • in which they discarded most of the blue signal after it SMPTE - Society of Motion Picture and Television comes from the camera, keeping most of the green and Engineers only some of the red; this is chroma subsampling in the • Digital video YIQ color space, and is roughly analogous to 4:2:1 sub- sampling, in that it has decreasing resolution for luma, • HDTV yellow/green, and red/blue. • YCbCr • YPbPr 6.8 Effectiveness • CCIR 601 4:2:2 SDTV

6.8 Effectiveness

While subsampling can easily reduce the size of an uncompressed image by 50% with minimal loss of quality, the final effect on the size of a compressed image is considerably smaller. This is because image compression algorithms also remove redundant chroma information. In fact, by applying something as rudimentary as chroma subsampling prior to compression, information is removed from the image that the compression algorithm could otherwise have used to produce a higher quality result with no increase in size. For example, with wavelet compression methods, better results are obtained by dropping the highest-frequency chroma layer inside the compression algorithm than by applying chroma subsampling prior to compression. This is because wavelet compression operates by repeatedly using wavelets as high- and low-pass filters to separate frequency bands in an image, and the wavelets do a better job than chroma subsampling does.
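A quick check of the 50% figure (an added illustrative calculation, assuming 4:2:0 subsampling and equal bit depth for all components): each 2 × 2 block of pixels carries four luma samples plus one Cb and one Cr sample, instead of twelve samples in total, so

(4 + 1 + 1) / (4 × 3) = 6/12 = 50%

of the original data remains.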
6.9 Compatibility issues

The details of chroma subsampling implementation cause considerable confusion. Is the upper leftmost chroma value stored, or the rightmost, or is it the average of all the chroma values? This must be exactly specified in standards and followed by all implementors. Incorrect implementations cause the chroma of an image to be offset from the luma. Repeated compression/decompression can cause the chroma to “travel” in one direction. Different standards may use different versions of, for example, “4:2:0” with respect to how the chroma value is determined, making one version of “4:2:0” incompatible with another version of “4:2:0”.

Proper upsampling of chroma can require knowing whether the source is progressive or interlaced, information which is often not available to the upsampler.
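The siting question is easy to see in code. The following is a minimal illustrative sketch (not from the source): two plausible ways of upsampling the same 4:2:0 chroma plane back to full resolution, which agree only at the original sample sites; exactly this kind of mismatch offsets chroma from luma.

    import numpy as np

    def upsample_nn(c):
        # Replicate each chroma sample over a 2x2 block, treating the
        # sample as co-sited with the top-left luma sample.
        return np.repeat(np.repeat(c, 2, axis=0), 2, axis=1)

    def upsample_midpoint(c):
        # Treat each chroma sample as sited between luma samples and
        # linearly interpolate the interior pixels instead.
        up = upsample_nn(c).astype(float)
        up[:, 1:-1] = 0.5 * (up[:, :-2] + up[:, 2:])  # horizontal pass
        up[1:-1, :] = 0.5 * (up[:-2, :] + up[2:, :])  # vertical pass
        return up

    c = np.array([[100.0, 200.0],
                  [100.0, 200.0]])
    print(upsample_nn(c))        # rows of 100 100 200 200 (blocky)
    print(upsample_midpoint(c))  # rows of 100 150 150 200 (smoothed)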
Chroma subsampling causes problems for film makers trying to do keying with blue or green screening. The chroma interpolation along edges produces noticeable haloing artifacts.

6.10 See also

• Color space
• SMPTE - Society of Motion Picture and Television Engineers
• Digital video
• HDTV
• YCbCr
• YPbPr
• CCIR 601 4:2:2 SDTV
• YUV
• Color
• Color vision
• Rod cell
• Cone cell

6.11 References

[1] S. Winkler, C. J. van den Branden Lambrecht, and M. Kunt (2001). “Vision and Video: Models and Applications”. In Christian J. van den Branden Lambrecht. Vision models and applications to image and video processing. Springer. p. 209. ISBN 978-0-7923-7422-0.

[2] Livingstone, Margaret (2002). “The First Stages of Processing Color and Luminance: Where and What”. Vision and Art: The Biology of Seeing. New York: Harry N. Abrams. pp. 46–67. ISBN 0-8109-0406-3.

[3] Jennings, Roger; Bertel Schmitt (1997). “DV vs. Betacam SP”. DV Central. Retrieved 2008-08-29.

[4] Wilt, Adam J. (2006). “DV, DVCAM & DVCPRO Formats”. adamwilt.com. Retrieved 2008-08-29.

[5] Clint DeBoer (2008-04-16). “HDMI Enhanced Black Levels, xvYCC and RGB”. Audioholics. Retrieved 2013-06-02.

[6] “Digital Color Coding” (PDF). Telairity. Retrieved 2013-06-02.

[7] “look out and theres a new cable in town”.

[8] Poynton, Charles (2008). “Chroma Subsampling Notation” (PDF). Charles Poynton. Retrieved 2008-10-01.
[9] Munsil, Don; Stacey Spears (2003). “DVD Player Benchmark - Chroma Upsampling Error”. Secrets of Home Theater & High Fidelity. Retrieved 2008-08-29.

[10] Chan, Glenn. “Towards Better Chroma Subsampling”. SMPTE Journal. Retrieved 2008-08-29.

• Poynton, Charles. “YUV and luminance considered harmful: A plea for precise terminology in video”

• Poynton, Charles. “Digital Video and HDTV: Algorithms and Interfaces”. U.S.: Morgan Kaufmann Publishers, 2003.

• Kerr, Douglas A. “Chrominance Subsampling in Digital Images”

Chapter 7

Discrete cosine transform

A discrete cosine transform (DCT) expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies. DCTs are important to numerous applications in science and engineering, from lossy compression of audio (e.g. MP3) and images (e.g. JPEG), where small high-frequency components can be discarded, to spectral methods for the numerical solution of partial differential equations. The use of cosine rather than sine functions is critical for compression, since it turns out (as described below) that fewer cosine functions are needed to approximate a typical signal, whereas for differential equations the cosines express a particular choice of boundary conditions.

[Figure: DCT-II (bottom) compared to the DFT (middle) of an input signal (top): a generic sampled signal, the modulus of its DFT, and its DCT, each plotted for n = 0, 2, 4, 6, 8, 10.]

In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are common.

The most common variant of discrete cosine transform is the type-II DCT, which is often called simply “the DCT”.[1][2] Its inverse, the type-III DCT, is correspondingly often called simply “the inverse DCT” or “the IDCT”. Two related transforms are the discrete sine transform (DST), which is equivalent to a DFT of real and odd functions, and the modified discrete cosine transform (MDCT), which is based on a DCT of overlapping data. The MDCT (based on the DCT-IV) is used in AAC, Vorbis, WMA, and MP3 audio compression.

DCTs are also widely employed in solving partial differential equations by spectral methods, where the different variants of the DCT correspond to slightly different even/odd boundary conditions at the two ends of the array.

DCTs are also closely related to Chebyshev polynomials, and fast DCT algorithms (below) are used in Chebyshev approximation of arbitrary functions by series of Chebyshev polynomials, for example in Clenshaw–Curtis quadrature.

7.1 Applications

The DCT, and in particular the DCT-II, is often used in signal and image processing, especially for lossy compression, because it has a strong “energy compaction” property:[1][2] in typical applications, most of the signal information tends to be concentrated in a few low-frequency components of the DCT. For strongly correlated Markov processes, the DCT can approach the compaction efficiency of the Karhunen-Loève transform (which is optimal in the decorrelation sense). As explained below, this stems from the boundary conditions implicit in the cosine functions.

7.1.1 JPEG

Main article: JPEG § Discrete cosine transform

The DCT is used in JPEG image compression, MJPEG, MPEG, DV, Daala, and Theora video compression. There, the two-dimensional DCT-II of N × N blocks is computed and the results are quantized and entropy coded. In this case, N is typically 8 and the DCT-II formula is applied to each row and column of the block. The result is an 8 × 8 transform coefficient array in which the (0, 0) element (top-left) is the DC (zero-frequency) component, and entries with increasing vertical and horizontal index values represent higher vertical and horizontal spatial frequencies.
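To make the block transform concrete, here is a minimal added sketch (using SciPy's general DCT routines, not any particular codec's fixed-point implementation) of the row-and-column application of the DCT-II to an 8 × 8 block:

    import numpy as np
    from scipy.fftpack import dct

    def dct2(block):
        # 2-D DCT-II, applied along one axis and then the other (the
        # row-column method); norm='ortho' makes the transform
        # orthonormal, a common convention in image coding.
        return dct(dct(block, type=2, norm='ortho', axis=0),
                   type=2, norm='ortho', axis=1)

    block = np.arange(64, dtype=float).reshape(8, 8)
    coeffs = dct2(block)
    print(coeffs[0, 0])  # DC term: 8 * mean of the block under this scaling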


7.2 Informal overview

Like any Fourier-related transform, discrete cosine transforms (DCTs) express a function or a signal in terms of a sum of sinusoids with different frequencies and amplitudes. Like the discrete Fourier transform (DFT), a DCT operates on a function at a finite number of discrete data points. The obvious distinction between a DCT and a DFT is that the former uses only cosine functions, while the latter uses both cosines and sines (in the form of complex exponentials). However, this visible difference is merely a consequence of a deeper distinction: a DCT implies different boundary conditions from the DFT or other related transforms.

The Fourier-related transforms that operate on a function over a finite domain, such as the DFT or DCT or a Fourier series, can be thought of as implicitly defining an extension of that function outside the domain. That is, once you write a function f(x) as a sum of sinusoids, you can evaluate that sum at any x, even for x where the original f(x) was not specified. The DFT, like the Fourier series, implies a periodic extension of the original function. A DCT, like a cosine transform, implies an even extension of the original function.

[Figure: illustration of the implicit even/odd extensions of DCT input data, for N = 11 data points (red dots), for the four most common types of DCT (types I-IV).]

However, because DCTs operate on finite, discrete sequences, two issues arise that do not apply for the continuous cosine transform. First, one has to specify whether the function is even or odd at both the left and right boundaries of the domain (i.e. the min-n and max-n boundaries in the definitions below, respectively). Second, one has to specify around what point the function is even or odd. In particular, consider a sequence abcd of four equally spaced data points, and say that we specify an even left boundary. There are two sensible possibilities: either the data are even about the sample a, in which case the even extension is dcbabcd, or the data are even about the point halfway between a and the previous point, in which case the even extension is dcbaabcd (a is repeated).

These choices lead to all the standard variations of DCTs and also discrete sine transforms (DSTs). Each boundary can be either even or odd (2 choices per boundary) and can be symmetric about a data point or the point halfway between two data points (2 choices per boundary), for a total of 2 × 2 × 2 × 2 = 16 possibilities. Half of these possibilities, those where the left boundary is even, correspond to the 8 types of DCT; the other half are the 8 types of DST.

These different boundary conditions strongly affect the applications of the transform and lead to uniquely useful properties for the various DCT types. Most directly, when using Fourier-related transforms to solve partial differential equations by spectral methods, the boundary conditions are directly specified as a part of the problem being solved. Or, for the MDCT (based on the type-IV DCT), the boundary conditions are intimately involved in the MDCT's critical property of time-domain aliasing cancellation. In a more subtle fashion, the boundary conditions are responsible for the “energy compactification” properties that make DCTs useful for image and audio compression, because the boundaries affect the rate of convergence of any Fourier-like series.

In particular, it is well known that any discontinuities in a function reduce the rate of convergence of the Fourier series, so that more sinusoids are needed to represent the function with a given accuracy. The same principle governs the usefulness of the DFT and other transforms for signal compression; the smoother a function is, the fewer terms in its DFT or DCT are required to represent it accurately, and the more it can be compressed. (Here, we think of the DFT or DCT as approximations for the Fourier series or cosine series of a function, respectively, in order to talk about its “smoothness”.) However, the implicit periodicity of the DFT means that discontinuities usually occur at the boundaries: any random segment of a signal is unlikely to have the same value at both the left and right boundaries. (A similar problem arises for the DST, in which the odd left boundary condition implies a discontinuity for any function that does not happen to be zero at that boundary.) In contrast, a DCT where both boundaries are even always yields a continuous extension at the boundaries (although the slope is generally discontinuous). This is why DCTs, and in particular DCTs of types I, II, V, and VI (the types that have two even boundaries), generally perform better for signal compression than DFTs and DSTs. In practice, a type-II DCT is usually preferred for such applications, in part for reasons of computational convenience.
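Both extension rules, and the DFT equivalences stated in the formal definitions below, can be checked numerically. An added sketch (note that SciPy's unnormalized DCT-I and DCT-II are twice the sums defined in the next section):

    import numpy as np
    from scipy.fftpack import dct

    x = np.array([1.0, 2.0, 3.0, 4.0])          # the sequence "abcd"

    # Even about the end samples: one period of ...dcbabcd... is abcdcb,
    # and the DFT of that extension matches SciPy's unnormalized DCT-I.
    ext1 = np.concatenate([x, x[-2:0:-1]])       # a b c d c b (length 2N-2)
    print(np.fft.fft(ext1).real[:4])             # [15, -4, 0, -1]
    print(dct(x, type=1))                        # the same values

    # Even about half-sample points: one period is abcddcba, and the DFT
    # matches the DCT-II after undoing a half-sample phase shift.
    ext2 = np.concatenate([x, x[::-1]])          # a b c d d c b a (length 2N)
    k = np.arange(4)
    phase = np.exp(-1j * np.pi * k / 8)          # e^{-i pi k / (2N)}, N = 4
    print((phase * np.fft.fft(ext2)[:4]).real)   # matches dct(x, type=2)
    print(dct(x, type=2))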

7.3 Formal definition

Formally, the discrete cosine transform is a linear, invertible function f : R^N → R^N (where R denotes the set of real numbers), or equivalently an invertible N × N square matrix. There are several variants of the DCT with slightly modified definitions. The N real numbers x_0, ..., x_{N-1} are transformed into the N real numbers X_0, ..., X_{N-1} according to one of the formulas:

7.3.1 DCT-I

X_k = \frac{1}{2}(x_0 + (-1)^k x_{N-1}) + \sum_{n=1}^{N-2} x_n \cos\left[\frac{\pi}{N-1}\, n k\right], \qquad k = 0, \ldots, N-1.

Some authors further multiply the x_0 and x_{N-1} terms by \sqrt{2}, and correspondingly multiply the X_0 and X_{N-1} terms by 1/\sqrt{2}. This makes the DCT-I matrix orthogonal, if one further multiplies by an overall scale factor of \sqrt{2/(N-1)}, but breaks the direct correspondence with a real-even DFT.

The DCT-I is exactly equivalent (up to an overall scale factor of 2) to a DFT of 2N − 2 real numbers with even symmetry. For example, a DCT-I of N=5 real numbers abcde is exactly equivalent to a DFT of eight real numbers abcdedcb (even symmetry), divided by two. (In contrast, DCT types II-IV involve a half-sample shift in the equivalent DFT.)

Note, however, that the DCT-I is not defined for N less than 2. (All other DCT types are defined for any positive N.)

Thus, the DCT-I corresponds to the boundary conditions: x_n is even around n=0 and even around n=N−1; similarly for X_k.

7.3.2 DCT-II

X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right) k\right], \qquad k = 0, \ldots, N-1.

The DCT-II is probably the most commonly used form, and is often simply referred to as “the DCT”.[1][2]

This transform is exactly equivalent (up to an overall scale factor of 2) to a DFT of 4N real inputs of even symmetry where the even-indexed elements are zero. That is, it is half of the DFT of the 4N inputs y_n, where y_{2n} = 0, y_{2n+1} = x_n for 0 ≤ n < N, y_{2N} = 0, and y_{4N−n} = y_n for 0 < n < 2N.

Some authors further multiply the X_0 term by 1/\sqrt{2} and multiply the resulting matrix by an overall scale factor of \sqrt{2/N} (see below for the corresponding change in DCT-III). This makes the DCT-II matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted input. This is the normalization used by Matlab, for example. In many applications, such as JPEG, the scaling is arbitrary because scale factors can be combined with a subsequent computational step (e.g. the quantization step in JPEG[3]), and a scaling can be chosen that allows the DCT to be computed with fewer multiplications.[4][5]

The DCT-II implies the boundary conditions: x_n is even around n=−1/2 and even around n=N−1/2; X_k is even around k=0 and odd around k=N.

7.3.3 DCT-III

X_k = \tfrac{1}{2} x_0 + \sum_{n=1}^{N-1} x_n \cos\left[\frac{\pi}{N}\, n \left(k + \tfrac{1}{2}\right)\right], \qquad k = 0, \ldots, N-1.

Because it is the inverse of DCT-II (up to a scale factor, see below), this form is sometimes simply referred to as “the inverse DCT” (“IDCT”).[2]

Some authors divide the x_0 term by \sqrt{2} instead of by 2 (resulting in an overall x_0/\sqrt{2} term) and multiply the resulting matrix by an overall scale factor of \sqrt{2/N} (see above for the corresponding change in DCT-II), so that the DCT-II and DCT-III are transposes of one another. This makes the DCT-III matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted output.

The DCT-III implies the boundary conditions: x_n is even around n=0 and odd around n=N; X_k is even around k=−1/2 and odd around k=N−1/2.

7.3.4 DCT-IV

X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right)\left(k + \tfrac{1}{2}\right)\right], \qquad k = 0, \ldots, N-1.

The DCT-IV matrix becomes orthogonal (and thus, being clearly symmetric, its own inverse) if one further multiplies by an overall scale factor of \sqrt{2/N}.

A variant of the DCT-IV, where data from different transforms are overlapped, is called the modified discrete cosine transform (MDCT) (Malvar, 1992).

The DCT-IV implies the boundary conditions: x_n is even around n=−1/2 and odd around n=N−1/2; similarly for X_k.

7.3.5 DCT V-VIII

DCTs of types I-IV treat both boundaries consistently regarding the point of symmetry: they are even/odd around either a data point for both boundaries or halfway between two data points for both boundaries. By contrast, DCTs of types V-VIII imply boundaries that are even/odd around a data point for one boundary and halfway between two data points for the other boundary.

In other words, DCT types I-IV are equivalent to real-even DFTs of even order (regardless of whether N is even or odd), since the corresponding DFT is of length 2(N−1) (for DCT-I) or 4N (for DCT-II/III) or 8N (for DCT-IV). The four additional types of discrete cosine transform (Martucci, 1994) correspond essentially to real-even DFTs of logically odd order, which have factors of N±½ in the denominators of the cosine arguments.

However, these variants seem to be rarely used in practice. One reason, perhaps, is that FFT algorithms for odd-length DFTs are generally more complicated than FFT algorithms for even-length DFTs (e.g. the simplest radix-2 algorithms are only for even lengths), and this increased intricacy carries over to the DCTs as described below. (The trivial real-even array, a length-one DFT (odd length) of a single number a, corresponds to a DCT-V of length N=1.)
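The DCT-II sum above can be transcribed directly and checked against a library implementation (an added sketch; SciPy's unnormalized dct is scaled by a factor of 2 relative to the sum as written here):

    import numpy as np
    from scipy.fftpack import dct

    def dct_ii(x):
        # Direct O(N^2) evaluation of X_k = sum_n x_n cos[pi/N (n + 1/2) k]
        N = len(x)
        n = np.arange(N)
        return np.array([np.sum(x * np.cos(np.pi / N * (n + 0.5) * k))
                         for k in range(N)])

    x = np.random.rand(8)
    assert np.allclose(2 * dct_ii(x), dct(x, type=2))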

7.4 Inverse transforms

Using the normalization conventions above, the inverse of DCT-I is DCT-I multiplied by 2/(N−1). The inverse of DCT-IV is DCT-IV multiplied by 2/N. The inverse of DCT-II is DCT-III multiplied by 2/N and vice versa.[2]

Like for the DFT, the normalization factor in front of these transform definitions is merely a convention and differs between treatments. For example, some authors multiply the transforms by \sqrt{2/N} so that the inverse does not require any additional multiplicative factor. Combined with appropriate factors of \sqrt{2} (see above), this can be used to make the transform matrix orthogonal.
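A numerical check of the DCT-II/DCT-III pairing (an added sketch; SciPy's unnormalized types II and III are each twice the sums in the definitions above, so the round trip comes back scaled by 2N):

    import numpy as np
    from scipy.fftpack import dct

    x = np.random.rand(8)
    N = len(x)
    roundtrip = dct(dct(x, type=2), type=3) / (2 * N)
    assert np.allclose(roundtrip, x)  # DCT-III inverts DCT-II up to scale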
7.5 Multidimensional DCTs

Multidimensional variants of the various DCT types follow straightforwardly from the one-dimensional definitions: they are simply a separable product (equivalently, a composition) of DCTs along each dimension.

For example, a two-dimensional DCT-II of an image or a matrix is simply the one-dimensional DCT-II, from above, performed along the rows and then along the columns (or vice versa). That is, the 2D DCT-II is given by the formula (omitting normalization and other scale factors, as above):

X_{k_1,k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{n_1,n_2} \cos\left[\frac{\pi}{N_1}\left(n_1 + \tfrac{1}{2}\right) k_1\right] \cos\left[\frac{\pi}{N_2}\left(n_2 + \tfrac{1}{2}\right) k_2\right].

Technically, computing a two- (or multi-) dimensional DCT by sequences of one-dimensional DCTs along each dimension is known as a row-column algorithm (after the two-dimensional case). As with multidimensional FFT algorithms, however, there exist other methods to compute the same thing while performing the computations in a different order (i.e. interleaving/combining the algorithms for the different dimensions).

The inverse of a multi-dimensional DCT is just a separable product of the inverse(s) of the corresponding one-dimensional DCT(s) (see above), e.g. the one-dimensional inverses applied along one dimension at a time in a row-column algorithm.

[Figure: two-dimensional DCT frequencies from the JPEG DCT.]

The figure shows the combination of horizontal and vertical frequencies for an 8 × 8 (N_1 = N_2 = 8) two-dimensional DCT. Each step from left to right and top to bottom is an increase in frequency by 1/2 cycle. For example, moving right one from the top-left square yields a half-cycle increase in the horizontal frequency. Another move to the right yields two half-cycles. A move down yields two half-cycles horizontally and a half-cycle vertically. The source data (8x8) is transformed to a linear combination of these 64 frequency squares.
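The order-independence of the row-column method is easy to confirm numerically (an added sketch):

    import numpy as np
    from scipy.fftpack import dct

    A = np.random.rand(4, 6)
    rows_then_cols = dct(dct(A, type=2, axis=1), type=2, axis=0)
    cols_then_rows = dct(dct(A, type=2, axis=0), type=2, axis=1)
    assert np.allclose(rows_then_cols, cols_then_rows)  # separable product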
7.6 Computation

Although the direct application of these formulas would require O(N^2) operations, it is possible to compute the same thing with only O(N log N) complexity by factorizing the computation similarly to the fast Fourier transform (FFT). One can also compute DCTs via FFTs combined with O(N) pre- and post-processing steps. In general, O(N log N) methods to compute DCTs are known as fast cosine transform (FCT) algorithms.

The most efficient algorithms, in principle, are usually those that are specialized directly for the DCT, as opposed to using an ordinary FFT plus O(N) extra operations (see below for an exception). However, even “specialized” DCT algorithms (including all of those that achieve the lowest known arithmetic counts, at least for power-of-two sizes) are typically closely related to FFT algorithms—since DCTs are essentially DFTs of real-even data, one can design a fast DCT algorithm by taking an FFT and eliminating the redundant operations due to this symmetry. This can even be done automatically (Frigo & Johnson, 2005). Algorithms based on the Cooley–Tukey FFT algorithm are most common, but any other FFT algorithm is also applicable. For example, the Winograd FFT algorithm leads to minimal-multiplication algorithms for the DFT, albeit generally at the cost of more additions, and a similar algorithm was proposed by Feig & Winograd (1992) for the DCT. Because the algorithms for DFTs, DCTs, and similar transforms are all so closely related, any improvement in algorithms for one transform will theoretically lead to immediate gains for the other transforms as well (Duhamel & Vetterli 1990).

While DCT algorithms that employ an unmodified FFT often have some theoretical overhead compared to the best specialized DCT algorithms, the former also have a distinct advantage: highly optimized FFT programs are widely available. Thus, in practice, it is often easier to obtain high performance for general lengths N with FFT-based algorithms. (Performance on modern hardware is typically not dominated simply by arithmetic counts, and optimization requires substantial engineering effort.) Specialized DCT algorithms, on the other hand, see widespread use for transforms of small, fixed sizes such as the 8×8 DCT-II used in JPEG compression, or the small DCTs (or MDCTs) typically used in audio compression. (Reduced code size may also be a reason to use a specialized DCT for embedded-device applications.)

In fact, even the DCT algorithms using an ordinary FFT are sometimes equivalent to pruning the redundant operations from a larger FFT of real-symmetric data, and they can even be optimal from the perspective of arithmetic counts. For example, a type-II DCT is equivalent to a DFT of size 4N with real-even symmetry whose even-indexed elements are zero. One of the most common methods for computing this via an FFT (e.g. the method used in FFTPACK and FFTW) was described by Narasimha & Peterson (1978) and Makhoul (1980), and this method in hindsight can be seen as one step of a radix-4 decimation-in-time Cooley–Tukey algorithm applied to the “logical” real-even DFT corresponding to the DCT-II. (The radix-4 step reduces the size 4N DFT to four size-N DFTs of real data, two of which are zero and two of which are equal to one another by the even symmetry, hence giving a single size-N FFT of real data plus O(N) butterflies.) Because the even-indexed elements are zero, this radix-4 step is exactly the same as a split-radix step; if the subsequent size-N real-data FFT is also performed by a real-data split-radix algorithm (as in Sorensen et al. 1987), then the resulting algorithm actually matches what was long the lowest published arithmetic count for the power-of-two DCT-II (2N log₂ N − N + 2 real-arithmetic operations[lower-alpha 1]). So, there is nothing intrinsically bad about computing the DCT via an FFT from an arithmetic perspective—it is sometimes merely a question of whether the corresponding FFT algorithm is optimal. (As a practical matter, the function-call overhead in invoking a separate FFT routine might be significant for small N, but this is an implementation rather than an algorithmic question since it can be solved by unrolling/inlining.)
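An added sketch of one such FFT route (the same-length-FFT reordering commonly attributed to Narasimha & Peterson (1978) and Makhoul (1980), written here for SciPy's scaling conventions; this is an illustration, not FFTPACK's or FFTW's actual code):

    import numpy as np
    from scipy.fftpack import dct

    def dct_ii_via_fft(x):
        # Permute to evens followed by reversed odds, take one complex FFT
        # of the same length, then undo a quarter-sample phase shift.
        N = len(x)
        v = np.concatenate([x[::2], x[1::2][::-1]])
        V = np.fft.fft(v)
        k = np.arange(N)
        return 2 * (np.exp(-1j * np.pi * k / (2 * N)) * V).real

    x = np.random.rand(16)
    assert np.allclose(dct_ii_via_fft(x), dct(x, type=2))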
7.7 Example of IDCT

Consider this 8x8 grayscale image of capital letter A.

[Figure: the letter-A image at original size, scaled 10x (nearest neighbor), and scaled 10x (bilinear).]

DCT of the image:

\begin{pmatrix}
 6.1917 & -0.3411 &  1.2418 &  0.1492 &  0.1583 &  0.2742 & -0.0724 &  0.0561 \\
 0.2205 &  0.0214 &  0.4503 &  0.3947 & -0.7846 & -0.4391 &  0.1001 & -0.2554 \\
 1.0423 &  0.2214 & -1.0017 & -0.2720 &  0.0789 & -0.1952 &  0.2801 &  0.4713 \\
-0.2340 & -0.0392 & -0.2617 & -0.2866 &  0.6351 &  0.3501 & -0.1433 &  0.3550 \\
 0.2750 &  0.0226 &  0.1229 &  0.2183 & -0.2583 & -0.0742 & -0.2042 & -0.5906 \\
 0.0653 &  0.0428 & -0.4721 & -0.2905 &  0.4745 &  0.2875 & -0.0284 & -0.1311 \\
-0.3169 & -0.0541 & -0.1033 & -0.0225 & -0.0056 &  0.1017 &  0.1650 &  0.1500 \\
-0.2970 & -0.0627 & -0.1960 & -0.0644 &  0.1136 &  0.1031 &  0.1887 &  0.1444
\end{pmatrix}

Each basis function is multiplied by its coefficient and then this product is added to the final image.

[Figure: basis functions of the discrete cosine transformation with corresponding coefficients (specific for our image).]

[Figure: on the left, the final image; in the middle, the weighted function (multiplied by a coefficient) which is added to the final image; on the right, the current function and corresponding coefficient. Images are scaled (using bilinear interpolation) by factor 10x.]
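The additive reconstruction described above can be sketched as follows (an added example; coeffs stands for an 8 × 8 coefficient array such as the one above, under SciPy's orthonormal scaling, and the triangular mask is simply one convenient way of adding low-frequency basis functions first):

    import numpy as np
    from scipy.fftpack import idct

    def idct2(coeffs):
        # Inverse of the orthonormal 2-D DCT-II, i.e. a scaled DCT-III
        # applied along each axis.
        return idct(idct(coeffs, type=2, norm='ortho', axis=0),
                    type=2, norm='ortho', axis=1)

    def partial_reconstruction(coeffs, m):
        # Keep only coefficients (i, j) with i + j < m and invert; as m
        # grows, higher-frequency basis functions are added to the image.
        mask = np.add.outer(np.arange(8), np.arange(8)) < m
        return idct2(np.where(mask, coeffs, 0.0))

    coeffs = np.random.rand(8, 8)  # stand-in for the coefficient array above
    frames = [partial_reconstruction(coeffs, m) for m in range(1, 16)]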

7.8 See also

• JPEG — contains a potentially easier to understand example of DCT transformation
• Modified discrete cosine transform
• Discrete sine transform
• Discrete Fourier transform
• List of Fourier-related transforms
• Discrete wavelet transform

7.9 Notes

[1] The precise count of real arithmetic operations, and in particular the count of real multiplications, depends somewhat on the scaling of the transform definition. The 2N log₂ N − N + 2 count is for the DCT-II definition shown here; two multiplications can be saved if the transform is scaled by an overall \sqrt{2} factor. Additional multiplications can be saved if one permits the outputs of the transform to be rescaled individually, as was shown by Arai, Agui & Nakajima (1988) for the size-8 case used in JPEG.

7.10 Citations

[1] Ahmed, N.; Natarajan, T.; Rao, K. R. (January 1974), “Discrete Cosine Transform”, IEEE Transactions on Computers, C–23 (1): 90–93, doi:10.1109/T-C.1974.223784

[2] Rao, K; Yip, P (1990), Discrete Cosine Transform: Algorithms, Advantages, Applications, Boston: Academic Press, ISBN 0-12-580203-X

[3] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. New York: Van Nostrand Reinhold, 1993.

[4] Y. Arai, T. Agui, and M. Nakajima, “A fast DCT-SQ scheme for images,” Trans. IEICE, vol. 71, no. 11, pp. 1095–1097, 1988.

[5] X. Shao and S. G. Johnson, “Type-II/III DCT/DST algorithms with reduced number of arithmetic operations,” Signal Processing, vol. 88, pp. 1553–1564, June 2008.

7.11 References

• Narasimha, M.; Peterson, A. (June 1978). “On the Computation of the Discrete Cosine Transform”. IEEE Transactions on Communications. 26 (6): 934–936. doi:10.1109/TCOM.1978.1094144.

• Makhoul, J. (February 1980). “A fast cosine transform in one and two dimensions”. IEEE Transactions on Acoustics, Speech, and Signal Processing. 28 (1): 27–34. doi:10.1109/TASSP.1980.1163351.

• Sorensen, H.; Jones, D.; Heideman, M.; Burrus, C. (June 1987). “Real-valued fast Fourier transform algorithms”. IEEE Transactions on Acoustics, Speech, and Signal Processing. 35 (6): 849–863. doi:10.1109/TASSP.1987.1165220.

• Arai, Y.; Agui, T.; Nakajima, M. (November 1988). “A fast DCT-SQ scheme for images”. IEICE Transactions. 71 (11): 1095–1097.

• Plonka, G.; Tasche, M. (January 2005). “Fast and numerically stable algorithms for discrete cosine transforms”. Linear Algebra and its Applications. 394 (1): 309–345. doi:10.1016/j.laa.2004.07.015.

• Duhamel, P.; Vetterli, M. (April 1990). “Fast fourier transforms: A tutorial review and a state of the art”. Signal Processing. 19 (4): 259–299. doi:10.1016/0165-1684(90)90158-U.

• Ahmed, N. (January 1991). “How I came up with the discrete cosine transform”. Digital Signal Processing. 1 (1): 4–9. doi:10.1016/1051-2004(91)90086-Z.

• Feig, E.; Winograd, S. (September 1992). “Fast algorithms for the discrete cosine transform”. IEEE Transactions on Signal Processing. 40 (9): 2174–2193. doi:10.1109/78.157218.

• Malvar, Henrique (1992), Signal Processing with Lapped Transforms, Boston: Artech House, ISBN 0-89006-467-9

• Martucci, S. A. (May 1994). “Symmetric convolution and the discrete sine and cosine transforms”. IEEE Transactions on Signal Processing. 42 (5): 1038–1051. doi:10.1109/78.295213.

• Oppenheim, Alan; Schafer, Ronald; Buck, John (1999), Discrete-Time Signal Processing (2nd ed.), Upper Saddle River, N.J: Prentice Hall, ISBN 0-13-754920-2

• Frigo, M.; Johnson, S. G. (February 2005). “The Design and Implementation of FFTW3” (PDF). Proceedings of the IEEE. 93 (2): 216–231. doi:10.1109/JPROC.2004.840301.

7.12 Further reading

• Wen-Hsiung Chen; Smith, C.; Fralick, S. (September 1977). “A Fast Computational Algorithm for the Discrete Cosine Transform”. IEEE Transactions on Communications. 25 (9): 1004–1009. doi:10.1109/TCOM.1977.1093941.

• Press, WH; Teukolsky, SA; Vetterling, WT; Flannery, BP (2007), “Section 12.4.2. Cosine Transform”, Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press, ISBN 978-0-521-88068-8

7.13 External links

• “discrete cosine transform”. PlanetMath.

• Syed Ali Khayam: The Discrete Cosine Transform (DCT): Theory and Application

• Implementation of MPEG integer approximation of 8x8 IDCT (ISO/IEC 23002-2)

• Matteo Frigo and Steven G. Johnson: FFTW, http://www.fftw.org/. A free (GPL) C library that can compute fast DCTs (types I-IV) in one or more dimensions, of arbitrary size.

• Takuya Ooura: General Purpose FFT Package, http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html. Free C & FORTRAN libraries for computing fast DCTs (types II-III) in one, two or three dimensions, power of 2 sizes.

• Tim Kientzle: Fast algorithms for computing the 8-point DCT and IDCT, http://drdobbs.com/parallel/184410889.

• LTFAT is a free Matlab/Octave toolbox with interfaces to the FFTW implementation of the DCTs and DSTs of type I-IV.

• Discrete Cosine Transform: An interactive demonstration.

Chapter 8

H.264/MPEG-4 AVC

H.264 or MPEG-4 Part 10, Advanced Video Coding (MPEG-4 AVC) is a block-oriented motion-compensation-based video compression standard. As of 2014 it is one of the most commonly used formats for the recording, compression, and distribution of video content.[1]

The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (i.e., half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without increasing the complexity of design so much that it would be impractical or excessively expensive to implement. An additional goal was to provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems. The H.264 standard can be viewed as a “family of standards” composed of a number of different profiles. A specific decoder decodes at least one, but not necessarily all, profiles. The decoder specification describes which profiles can be decoded. H.264 is typically used for lossy compression, although it is also possible to create truly lossless-coded regions within lossy-coded pictures or to support rare use cases for which the entire encoding is lossless.

H.264 was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC JTC1 Moving Picture Experts Group (MPEG). The project partnership effort is known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content. The final drafting work on the first version of the standard was completed in May 2003, and various extensions of its capabilities have been added in subsequent editions. High Efficiency Video Coding (HEVC), a.k.a. H.265 and MPEG-H Part 2, is a successor to H.264/MPEG-4 AVC developed by the same organizations, while earlier standards are still in common use.

H.264 is perhaps best known as being one of the video encoding standards for Blu-ray Discs; all Blu-ray Disc players must be able to decode H.264. It is also widely used by streaming internet sources, such as videos from Vimeo, YouTube, and the iTunes Store, web software such as the Adobe Flash Player and Microsoft Silverlight, and also various HDTV broadcasts over terrestrial (Advanced Television Systems Committee standards, ISDB-T, DVB-T or DVB-T2), cable (DVB-C), and satellite (DVB-S and DVB-S2).

H.264 is protected by patents owned by various parties. A license covering most (but not all) patents essential to H.264 is administered by patent pool MPEG LA.[2] Commercial use of patented H.264 technologies requires the payment of royalties to MPEG LA and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming internet video that is free to end users, and Cisco Systems pays royalties to MPEG LA on behalf of the users of binaries for its open source H.264 encoder.

8.1 Naming

The H.264 name follows the ITU-T naming convention, where the standard is a member of the H.26x line of VCEG video coding standards; the MPEG-4 AVC name relates to the naming convention in ISO/IEC MPEG, where the standard is part 10 of ISO/IEC 14496, which is the suite of standards known as MPEG-4. The standard was developed jointly in a partnership of VCEG and MPEG, after earlier development work in the ITU-T as a VCEG project called H.26L. It is thus common to refer to the standard with names such as H.264/AVC, AVC/H.264, H.264/MPEG-4 AVC, or MPEG-4/H.264 AVC, to emphasize the common heritage. Occasionally, it is also referred to as “the JVT codec”, in reference to the Joint Video Team (JVT) organization that developed it. (Such partnership and multiple naming is not uncommon. For example, the video compression standard known as MPEG-2 also arose from the partnership between MPEG and the ITU-T, where MPEG-2 video is known to the ITU-T community as H.262.[3]) Some software programs (such as VLC media player) internally identify this standard as AVC1.


8.2 History

In early 1998, the Video Coding Experts Group (VCEG – ITU-T SG16 Q.6) issued a call for proposals on a project called H.26L, with the target to double the coding efficiency (which means halving the bit rate necessary for a given level of fidelity) in comparison to any other existing video coding standards for a broad variety of applications. VCEG was chaired by Gary Sullivan (Microsoft, formerly PictureTel, U.S.). The first draft design for that new standard was adopted in August 1999. In 2000, Thomas Wiegand (Heinrich Hertz Institute, Germany) became VCEG co-chair.

In December 2001, VCEG and the Moving Picture Experts Group (MPEG – ISO/IEC JTC 1/SC 29/WG 11) formed a Joint Video Team (JVT), with the charter to finalize the video coding standard.[4] Formal approval of the specification came in March 2003. The JVT was (is) chaired by Gary Sullivan, Thomas Wiegand, and Ajay Luthra (Motorola, U.S.; later Arris, U.S.). In June 2004, the Fidelity Range Extensions (FRExt) project was finalized. From January 2005 to November 2007, the JVT was working on an extension of H.264/AVC towards scalability by an Annex (G) called Scalable Video Coding (SVC). The JVT management team was extended by Jens-Rainer Ohm (Aachen University, Germany). From July 2006 to November 2009, the JVT worked on Multiview Video Coding (MVC), an extension of H.264/AVC towards free viewpoint television and 3D television. That work included the development of two new profiles of the standard: the Multiview High Profile and the Stereo High Profile.

The standardization of the first version of H.264/AVC was completed in May 2003. In the first project to extend the original standard, the JVT then developed what was called the Fidelity Range Extensions (FRExt). These extensions enabled higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including sampling structures known as Y'CbCr 4:2:2 (=YUV 4:2:2) and Y'CbCr 4:4:4. Several other features were also included in the Fidelity Range Extensions project, such as adaptive switching between 4×4 and 8×8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support of additional color spaces. The design work on the Fidelity Range Extensions was completed in July 2004, and the drafting work on them was completed in September 2004.

Further recent extensions of the standard then included adding five other new profiles intended primarily for professional applications, adding extended-gamut color space support, defining additional aspect ratio indicators, defining two additional types of “supplemental enhancement information” (post-filter hint and tone mapping), and deprecating one of the prior FRExt profiles that industry feedback indicated should have been designed differently.

The next major feature added to the standard was Scalable Video Coding (SVC). Specified in Annex G of H.264/AVC, SVC allows the construction of bitstreams that contain sub-bitstreams that also conform to the standard, including one such bitstream known as the “base layer” that can be decoded by a H.264/AVC codec that does not support SVC. For temporal bitstream scalability (i.e., the presence of a sub-bitstream with a smaller temporal sampling rate than the main bitstream), complete access units are removed from the bitstream when deriving the sub-bitstream. In this case, high-level syntax and inter-prediction reference pictures in the bitstream are constructed accordingly. On the other hand, for spatial and quality bitstream scalability (i.e. the presence of a sub-bitstream with lower spatial resolution/quality than the main bitstream), the NAL (Network Abstraction Layer) is removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction (i.e., the prediction of the higher spatial resolution/quality signal from the data of the lower spatial resolution/quality signal) is typically used for efficient coding. The Scalable Video Coding extensions were completed in November 2007.

The next major feature added to the standard was Multiview Video Coding (MVC). Specified in Annex H of H.264/AVC, MVC enables the construction of bitstreams that represent more than one view of a video scene. An important example of this functionality is stereoscopic 3D video coding. Two profiles were developed in the MVC work: Multiview High Profile supports an arbitrary number of views, and Stereo High Profile is designed specifically for two-view stereoscopic video. The Multiview Video Coding extensions were completed in November 2009.

8.2.1 Versions

Versions of the H.264/AVC standard include the following completed revisions, corrigenda, and amendments (dates are final approval dates in ITU-T, while final “International Standard” approval dates in ISO/IEC are somewhat different and slightly later in most cases). Each version represents changes relative to the next lower version that is integrated into the text.

• Version 1: (May 30, 2003) First approved version of H.264/AVC containing Baseline, Main, and Extended profiles.[5]

• Version 2: (May 7, 2004) Corrigendum containing various minor corrections.[6]

• Version 3: (March 1, 2005) Major addition to H.264/AVC containing the first amendment providing Fidelity Range Extensions (FRExt) containing High, High 10, High 4:2:2, and High 4:4:4 profiles.[7]

• Version 4: (September 13, 2005) Corrigendum containing various minor corrections and adding three aspect ratio indicators.[8]

• Version 5: (June 13, 2006) Amendment consisting of removal of prior High 4:4:4 profile (processed as a corrigendum in ISO/IEC).[9]

• Version 6: (June 13, 2006) Amendment consisting of minor extensions like extended-gamut color space support (bundled with above-mentioned aspect ratio indicators in ISO/IEC).[9]

• Version 7: (April 6, 2007) Amendment containing the addition of High 4:4:4 Predictive and four Intra-only profiles (High 10 Intra, High 4:2:2 Intra, High 4:4:4 Intra, and CAVLC 4:4:4 Intra).[10]

• Version 8: (November 22, 2007) Major addition to H.264/AVC containing the amendment for Scalable Video Coding (SVC) containing Scalable Baseline, Scalable High, and Scalable High Intra profiles.[11]

• Version 9: (January 13, 2009) Corrigendum containing minor corrections.[12]

• Version 10: (March 16, 2009) Amendment containing definition of a new profile (the Constrained Baseline profile) with only the common subset of capabilities supported in various previously specified profiles.[13]

• Version 11: (March 16, 2009) Major addition to H.264/AVC containing the amendment for the Multiview Video Coding (MVC) extension, including the Multiview High profile.[13]

• Version 12: (March 9, 2010) Amendment containing definition of a new MVC profile (the Stereo High profile) for two-view video coding with support of interlaced coding tools and specifying an additional SEI message (the frame packing arrangement SEI message).[14]

• Version 13: (March 9, 2010) Corrigendum containing minor corrections.[14]

• Version 14: (June 29, 2011) Amendment specifying a new level (Level 5.2) supporting higher processing rates in terms of maximum macroblocks per second, and a new profile (the Progressive High profile) supporting only the frame coding tools of the previously specified High profile.[15]

• Version 15: (June 29, 2011) Corrigendum containing minor corrections.[15]

• Version 16: (January 13, 2012) Amendment containing definition of three new profiles intended primarily for real-time communication applications: the Constrained High, Scalable Constrained Baseline, and Scalable Constrained High profiles.[16]

• Version 17: (April 13, 2013) Amendment with additional SEI message indicators.[17]

• Version 18: (April 13, 2013) Amendment to specify the coding of depth map data for 3D stereoscopic video, including a Multiview Depth High profile.[17]

• Version 19: (April 13, 2013) Corrigendum to correct an error in the sub-bitstream extraction process for multiview video.[17]

• Version 20: (April 13, 2013) Amendment to specify additional color space identifiers (including support of ITU-R Recommendation BT.2020 for UHDTV) and an additional model type in the tone mapping information SEI message.[17]

• Version 21: (February 13, 2014) Amendment to specify the Enhanced Multiview Depth High profile.[18]

• Version 22: (February 13, 2014) Amendment to specify the multi-resolution frame compatible (MFC) enhancement for 3D stereoscopic video, the MFC High profile, and minor corrections.[18]

8.3 Applications

Further information: List of video services using H.264/MPEG-4 AVC

The H.264 video format has a very broad application range that covers all forms of digital compressed video from low bit-rate Internet streaming applications to HDTV broadcast and Digital Cinema applications with nearly lossless coding. With the use of H.264, bit rate savings of 50% or more compared to MPEG-2 Part 2 are reported. For example, H.264 has been reported to give the same Digital Satellite TV quality as current MPEG-2 implementations with less than half the bitrate, with current MPEG-2 implementations working at around 3.5 Mbit/s and H.264 at only 1.5 Mbit/s.[19] Sony claims that 9 Mbit/s AVC recording mode is equivalent to the image quality of the HDV format, which uses approximately 18–25 Mbit/s.[20]

To ensure compatibility and problem-free adoption of H.264/AVC, many standards bodies have amended or added to their video-related standards so that users of these standards can employ H.264/AVC. Both the Blu-ray Disc format and the now-discontinued HD DVD format include the H.264/AVC High Profile as one of 3 mandatory video compression formats. The Digital Video Broadcast project (DVB) approved the use of H.264/AVC for broadcast television in late 2004.

The Advanced Television Systems Committee (ATSC) standards body in the United States approved the use of H.264/AVC for broadcast television in July 2008, although the standard is not yet used for fixed ATSC broadcasts within the United States.[21][22] It has also been approved for use with the more recent ATSC-M/H (Mobile/Handheld) standard, using the AVC and SVC portions of H.264.[23]

The CCTV (Closed Circuit TV) and Video Surveillance markets have included the technology in many products.

Canon and Nikon DSLRs use H.264 video wrapped in QuickTime MOV containers as the native recording format.

8.3.1 Derived formats

AVCHD is a high-definition recording format designed by Sony and Panasonic that uses H.264 (conforming to H.264 while adding additional application-specific features and constraints).

AVC-Intra is an intraframe-only compression format, developed by Panasonic.

XAVC is a recording format designed by Sony that uses level 5.2 of H.264/MPEG-4 AVC, which is the highest level supported by that video standard.[24][25] XAVC can support 4K resolution (4096 × 2160 and 3840 × 2160) at up to 60 frames per second (fps).[24][25] Sony has announced that cameras that support XAVC include two CineAlta cameras—the Sony PMW-F55 and Sony PMW-F5.[26][27] The Sony PMW-F55 can record XAVC with 4K resolution at 30 fps at 300 Mbit/s and 2K resolution at 30 fps at 100 Mbit/s.[28] XAVC can record 4K resolution at 60 fps with 4:2:2 chroma subsampling at 600 Mbit/s.[29][30]

8.4 Design

8.4.1 Features

H.264/AVC/MPEG-4 Part 10 contains a number of new features that allow it to compress video much more efficiently than older standards and to provide more flexibility for application to a wide variety of network environments. In particular, some such key features include:

• Multi-picture inter-picture prediction, including the following features:

  • Using previously encoded pictures as references in a much more flexible way than in past standards, allowing up to 16 reference frames (or 32 reference fields, in the case of interlaced encoding) to be used in some cases. In profiles that support non-IDR frames, most levels specify that sufficient buffering should be available to allow for at least 4 or 5 reference frames at maximum resolution. This is in contrast to prior standards, where the limit was typically one; or, in the case of conventional "B pictures" (B-frames), two. This particular feature usually allows modest improvements in bit rate and quality in most scenes. But in certain types of scenes, such as those with repetitive motion or back-and-forth scene cuts or uncovered background areas, it allows a significant reduction in bit rate while maintaining clarity.

  • Variable block-size motion compensation (VBSMC) with block sizes as large as 16×16 and as small as 4×4, enabling precise segmentation of moving regions. The supported luma prediction block sizes include 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4, many of which can be used together in a single macroblock. Chroma prediction block sizes are correspondingly smaller according to the chroma subsampling in use.

  • The ability to use multiple motion vectors per macroblock (one or two per partition) with a maximum of 32 in the case of a B macroblock constructed of 16 4×4 partitions. The motion vectors for each 8×8 or larger partition region can point to different reference pictures.

  • The ability to use any macroblock type in B-frames, including I-macroblocks, resulting in much more efficient encoding when using B-frames. This feature was notably left out from MPEG-4 ASP.

  • Six-tap filtering for derivation of half-pel luma sample predictions, for sharper subpixel motion-compensation (see the sketch after this feature list). Quarter-pixel motion is derived by linear interpolation of the half-pel values, to save processing power.

  • Quarter-pixel precision for motion compensation, enabling precise description of the displacements of moving areas. For chroma the resolution is typically halved both vertically and horizontally (see 4:2:0), therefore the motion compensation of chroma uses one-eighth chroma pixel grid units.

  • Weighted prediction, allowing an encoder to specify the use of a scaling and offset when performing motion compensation, and providing a significant benefit in performance in special cases—such as fade-to-black, fade-in, and cross-fade transitions. This includes implicit weighted prediction for B-frames, and explicit weighted prediction for P-frames.

• Spatial prediction from the edges of neighboring blocks for “intra” coding, rather than the “DC”-only prediction found in MPEG-2 Part 2 and the transform coefficient prediction found in H.263v2 and MPEG-4 Part 2. This includes luma prediction block sizes of 16×16, 8×8, and 4×4 (of which only one type can be used within each macroblock).

• Lossless macroblock coding features including:

  • A lossless “PCM macroblock” representation mode in which video data samples are represented directly,[31] allowing perfect representation of specific regions and allowing a strict limit to be placed on the quantity of coded data for each macroblock.

  • An enhanced lossless macroblock representation mode allowing perfect representation of specific regions while ordinarily using substantially fewer bits than the PCM mode.

• Flexible interlaced-scan video coding features, including:

  • Macroblock-adaptive frame-field (MBAFF) coding, using a macroblock pair structure for pictures coded as frames, allowing 16×16 macroblocks in field mode (compared with MPEG-2, where field mode processing in a picture that is coded as a frame results in the processing of 16×8 half-macroblocks).

  • Picture-adaptive frame-field coding (PAFF or PicAFF) allowing a freely selected mixture of pictures coded either as complete frames where both fields are combined together for encoding or as individual single fields.

• New transform design features, including:

  • An exact-match integer 4×4 spatial block transform, allowing precise placement of residual signals with little of the "ringing" often found with prior codec designs. This design is conceptually similar to that of the well-known discrete cosine transform (DCT), introduced in 1974 by N. Ahmed, T. Natarajan and K. R. Rao, which is Citation 1 in Discrete cosine transform. However, it is simplified and made to provide exactly specified decoding.

  • An exact-match integer 8×8 spatial block transform, allowing highly correlated regions to be compressed more efficiently than with the 4×4 transform. This design is conceptually similar to that of the well-known DCT, but simplified and made to provide exactly specified decoding.

  • Adaptive encoder selection between the 4×4 and 8×8 transform block sizes for the integer transform operation.

  • A secondary Hadamard transform performed on “DC” coefficients of the primary spatial transform applied to chroma DC coefficients (and also luma in one special case) to obtain even more compression in smooth regions.

• A quantization design including:

  • Logarithmic step size control for easier bit rate management by encoders and simplified inverse-quantization scaling.

  • Frequency-customized quantization scaling matrices selected by the encoder for perceptual-based quantization optimization.

• An in-loop deblocking filter that helps prevent the blocking artifacts common to other DCT-based image compression techniques, resulting in better visual appearance and compression efficiency.

• An entropy coding design including:

  • Context-adaptive binary arithmetic coding (CABAC), an algorithm to losslessly compress syntax elements in the video stream knowing the probabilities of syntax elements in a given context. CABAC compresses data more efficiently than CAVLC but requires considerably more processing to decode.

  • Context-adaptive variable-length coding (CAVLC), which is a lower-complexity alternative to CABAC for the coding of quantized transform coefficient values. Although lower complexity than CABAC, CAVLC is more elaborate and more efficient than the methods typically used to code coefficients in other prior designs.

  • A common simple and highly structured variable length coding (VLC) technique for many of the syntax elements not coded by CABAC or CAVLC, referred to as Exponential-Golomb coding (or Exp-Golomb); a small sketch follows this feature list.

• Loss resilience features including:

  • A Network Abstraction Layer (NAL) definition allowing the same video syntax to be used in many network environments. One very fundamental design concept of H.264 is to generate self-contained packets, to remove the header duplication as in MPEG-4's Header Extension Code (HEC).[32] This was achieved by decoupling information relevant to more than one slice from the media stream. The combination of the higher-level parameters is called a parameter set.[32] The H.264 specification includes two types of parameter sets: Sequence Parameter Set (SPS) and Picture Parameter Set (PPS). An active sequence parameter set remains unchanged throughout a coded video sequence, and an active picture parameter set remains unchanged within a coded picture. The sequence and

picture parameter set structures contain information such as picture size, optional coding modes employed, and macroblock to slice group map.[32]

  • Flexible macroblock ordering (FMO), also known as slice groups, and arbitrary slice ordering (ASO), which are techniques for restructuring the ordering of the representation of the fundamental regions (macroblocks) in pictures. Typically considered an error/loss robustness feature, FMO and ASO can also be used for other purposes.

  • Data partitioning (DP), a feature providing the ability to separate more important and less important syntax elements into different packets of data, enabling the application of unequal error protection (UEP) and other types of improvement of error/loss robustness.

  • Redundant slices (RS), an error/loss robustness feature that lets an encoder send an extra representation of a picture region (typically at lower fidelity) that can be used if the primary representation is corrupted or lost.

  • Frame numbering, a feature that allows the creation of “sub-sequences”, enabling temporal scalability by optional inclusion of extra pictures between other pictures, and the detection and concealment of losses of entire pictures, which can occur due to network packet losses or channel errors.

• Switching slices, called SP and SI slices, allowing an encoder to direct a decoder to jump into an ongoing video stream for such purposes as video streaming bit rate switching and “trick mode” operation. When a decoder jumps into the middle of a video stream using the SP/SI feature, it can get an exact match to the decoded pictures at that location in the video stream despite using different pictures, or no pictures at all, as references prior to the switch.

• A simple automatic process for preventing the accidental emulation of start codes, which are special sequences of bits in the coded data that allow random access into the bitstream and recovery of byte alignment in systems that can lose byte synchronization.

• Supplemental enhancement information (SEI) and video usability information (VUI), which are extra information that can be inserted into the bitstream to enhance the use of the video for a wide variety of purposes. The SEI FPA (Frame Packing Arrangement) message contains the 3D arrangement:

  • 0: checkerboard: pixels are alternatively from L and R.
  • 1: column alternation: L and R are interlaced by column.
  • 2: row alternation: L and R are interlaced by row.
  • 3: side by side: L is on the left, R on the right.
  • 4: top bottom: L is on top, R on bottom.
  • 5: frame alternation: one view per frame.

• Auxiliary pictures, which can be used for such purposes as alpha compositing.

• Support of monochrome (4:0:0), 4:2:0, 4:2:2, and 4:4:4 chroma subsampling (depending on the selected profile).

• Support of sample bit depth precision ranging from 8 to 14 bits per sample (depending on the selected profile).

• The ability to encode individual color planes as distinct pictures with their own slice structures, macroblock modes, motion vectors, etc., allowing encoders to be designed with a simple parallelization structure (supported only in the three 4:4:4-capable profiles).

• Picture order count, a feature that serves to keep the ordering of the pictures and the values of samples in the decoded pictures isolated from timing information, allowing timing information to be carried and controlled/changed separately by a system without affecting decoded picture content.

These techniques, along with several others, help H.264 to perform significantly better than any prior standard under a wide variety of circumstances in a wide variety of application environments. H.264 can often perform radically better than MPEG-2 video—typically obtaining the same quality at half of the bit rate or less, especially on high bit rate and high resolution situations.[33]

Like other ISO/IEC MPEG video standards, H.264/AVC has a reference software implementation that can be freely downloaded.[34] Its main purpose is to give examples of H.264/AVC features, rather than being a useful application per se. Some reference hardware design work is also under way in the Moving Picture Experts Group. The above-mentioned are complete features of H.264/AVC covering all profiles of H.264. A profile for a codec is a set of features of that codec identified to meet a certain set of specifications of intended applications. This means that many of the features listed are not supported in some profiles. The various profiles of H.264/AVC are discussed in the next section.
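The six-tap half-pel filter mentioned in the feature list above can be sketched as follows (an added illustrative example; the (1, −5, 20, 20, −5, 1)/32 kernel with rounding and clipping follows the standard's luma interpolation, but this sketch is not a conformant decoder):

    def halfpel(e, f, g, h, i, j):
        # Half-pel luma sample between g and h from six neighbouring
        # integer samples: weights (1, -5, 20, 20, -5, 1) / 32, with
        # rounding, clipped to the 8-bit range.
        b = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5
        return max(0, min(255, b))

    # Quarter-pel samples are then rounded averages of integer- and
    # half-pel neighbours, e.g. a = (g + b + 1) >> 1.
    print(halfpel(10, 10, 100, 100, 10, 10))  # 123: overshoot at an edge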
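Similarly, the Exp-Golomb codes used for many syntax elements are compact enough to show in full (an added sketch of the unsigned ue(v) code):

    def exp_golomb(k: int) -> str:
        # Unsigned Exp-Golomb codeword for k >= 0: write k + 1 in binary
        # and prefix it with one fewer zeros than it has bits.
        v = k + 1
        return "0" * (v.bit_length() - 1) + format(v, "b")

    # k:        0    1    2      3      4
    # codeword: 1  010  011  00100  00101
    assert [exp_golomb(k) for k in range(5)] == \
        ["1", "010", "011", "00100", "00101"]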
has a reference software implementation that can be freely downloaded.[34] Its main purpose is to give exam- • A simple automatic process for preventing the ac- ples of H.264/AVC features, rather than being a use- cidental emulation of start codes, which are special ful application per se. Some reference hardware design sequences of bits in the coded data that allow ran- work is also under way in the Moving Picture Experts dom access into the bitstream and recovery of byte Group. The above-mentioned are complete features of alignment in systems that can lose byte synchroniza- H.264/AVC covering all profiles of H.264. A profile for a tion. codec is a set of features of that codec identified to meet a • Supplemental enhancement information (SEI) and certain set of specifications of intended applications. This video usability information (VUI), which are extra means that many of the features listed are not supported information that can be inserted into the bitstream in some profiles. Various profiles of H.264/AVC are dis- to enhance the use of the video for a wide variety of cussed in next section. purposes. SEI FPA (Frame Packing Arrangement) message that contains the 3D arrangement: 8.4.2 Profiles • 0: checkerboard: pixels are alternatively from L and R. The standard defines a sets of capabilities, which are re- • 1: column alternation: L and R are interlaced ferred to as profiles, targeting specific classes of applica- by column. tions. These are declared as a profile code (profile_idc) 62 CHAPTER 8. H.264/MPEG-4 AVC and a set of constraints applied in the encoder. This al- High 4|2|2 Profile (Hi422P, 122) Primarily targeting lows a decoder to recognize the requirements to decode professional applications that use , that specific stream. this profile builds on top of the High 10 Profile, Profiles for non-scalable 2D video applications include adding support for the 4:2:2 chroma subsampling the following: format while using up to 10 bits per sample of decoded picture precision.
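The parameter-set and start-code machinery above is easy to observe in practice. The following is a minimal sketch (not a full parser, and assuming the Annex B byte-stream format): it scans a stream for 00 00 01 start codes and reports each NAL unit type. The type codes used (7 = SPS, 8 = PPS, 5 = IDR slice, 6 = SEI) are defined by the standard; the file name clip.h264 is only a placeholder.

```python
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}

def iter_nal_units(stream: bytes):
    """Yield (offset, nal_unit_type) for each NAL unit after a 00 00 01 start code."""
    i = stream.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(stream):
        nal_header = stream[i + 3]     # first byte after the start code
        yield i, nal_header & 0x1F     # nal_unit_type is the low 5 bits
        i = stream.find(b"\x00\x00\x01", i + 3)

with open("clip.h264", "rb") as f:     # placeholder file name
    for offset, ntype in iter_nal_units(f.read()):
        print(offset, NAL_TYPES.get(ntype, "other"))
```

In a conforming stream the SPS and PPS units appear before the slices that refer to them, which is what allows a decoder joining mid-stream to recover once it has seen a parameter set and a random access point.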

8.4.2 Profiles

The standard defines several sets of capabilities, which are referred to as profiles, targeting specific classes of applications. These are declared as a profile code (profile_idc) and a set of constraints applied in the encoder. This allows a decoder to recognize the requirements to decode that specific stream.

Profiles for non-scalable 2D video applications include the following:

Constrained Baseline Profile (CBP, 66 with constraint set 1) Primarily for low-cost applications, this profile is most typically used in videoconferencing and mobile applications. It corresponds to the subset of features that are in common between the Baseline, Main, and High Profiles.

Baseline Profile (BP, 66) Primarily for low-cost applications that require additional data loss robustness, this profile is used in some videoconferencing and mobile applications. This profile includes all features that are supported in the Constrained Baseline Profile, plus three additional features that can be used for loss robustness (or for other purposes such as low-delay multi-point video stream compositing). The importance of this profile has faded somewhat since the definition of the Constrained Baseline Profile in 2009. All Constrained Baseline Profile bitstreams are also considered to be Baseline Profile bitstreams, as these two profiles share the same profile identifier code value.

Extended Profile (XP, 88) Intended as the streaming video profile, this profile has relatively high compression capability and some extra tricks for robustness to data losses and server stream switching.

Main Profile (MP, 77) This profile is used for standard-definition digital TV broadcasts that use the MPEG-4 format as defined in the DVB standard.[35] It is not, however, used for high-definition television broadcasts, as the importance of this profile faded when the High Profile was developed in 2004 for that application.

High Profile (HiP, 100) The primary profile for broadcast and disc storage applications, particularly for high-definition television applications (for example, this is the profile adopted by the Blu-ray Disc storage format and the DVB HDTV broadcast service).

Progressive High Profile (PHiP, 100 with constraint set 4) Similar to the High Profile, but without support of field coding features.

Constrained High Profile (100 with constraint sets 4 and 5) Similar to the Progressive High Profile, but without support of B (bi-predictive) slices.

High 10 Profile (Hi10P, 110) Going beyond typical mainstream consumer product capabilities, this profile builds on top of the High Profile, adding support for up to 10 bits per sample of decoded picture precision.

High 4:2:2 Profile (Hi422P, 122) Primarily targeting professional applications that use interlaced video, this profile builds on top of the High 10 Profile, adding support for the 4:2:2 chroma subsampling format while using up to 10 bits per sample of decoded picture precision.

High 4:4:4 Predictive Profile (Hi444PP, 244) This profile builds on top of the High 4:2:2 Profile, supporting up to 4:4:4 chroma sampling, up to 14 bits per sample, and additionally supporting efficient lossless region coding and the coding of each picture as three separate color planes.

For camcorders, editing, and professional applications, the standard contains four additional Intra-frame-only profiles, which are defined as simple subsets of other corresponding profiles. These are mostly for professional (e.g., camera and editing system) applications:

High 10 Intra Profile (110 with constraint set 3) The High 10 Profile constrained to all-Intra use.

High 4:2:2 Intra Profile (122 with constraint set 3) The High 4:2:2 Profile constrained to all-Intra use.

High 4:4:4 Intra Profile (244 with constraint set 3) The High 4:4:4 Profile constrained to all-Intra use.

CAVLC 4:4:4 Intra Profile (44) The High 4:4:4 Profile constrained to all-Intra use and to CAVLC entropy coding (i.e., not supporting CABAC).

As a result of the Scalable Video Coding (SVC) extension, the standard contains five additional scalable profiles, which are defined as a combination of an H.264/AVC profile for the base layer (identified by the second word in the scalable profile name) and tools that achieve the scalable extension:

Scalable Baseline Profile (83) Primarily targeting video conferencing, mobile, and surveillance applications, this profile builds on top of the Constrained Baseline Profile, to which the base layer (a subset of the bitstream) must conform. For the scalability tools, a subset of the available tools is enabled.

Scalable Constrained Baseline Profile (83 with constraint set 5) A subset of the Scalable Baseline Profile intended primarily for real-time communication applications.

Scalable High Profile (86) Primarily targeting broadcast and streaming applications, this profile builds on top of the H.264/AVC High Profile, to which the base layer must conform.

Scalable Constrained High Profile (86 with constraint set 5) A subset of the Scalable High Profile intended primarily for real-time communication applications.

Scalable High Intra Profile (86 with constraint set 3) Primarily targeting production applications, this profile is the Scalable High Profile constrained to all-Intra use.

As a result of the Multiview Video Coding (MVC) extension, the standard contains two multiview profiles:

Stereo High Profile (128) This profile targets two-view stereoscopic 3D video and combines the tools of the High Profile with the inter-view prediction capabilities of the MVC extension.

Multiview High Profile (118) This profile supports two or more views using both inter-picture (temporal) and MVC inter-view prediction, but does not support field pictures and macroblock-adaptive frame-field coding.

Multiview Depth High Profile (138)

Feature support in particular profiles

8.4.3 Levels

As the term is used in the standard, a "level" is a specified set of constraints that indicate a degree of required decoder performance for a profile. For example, a level of support within a profile specifies the maximum picture resolution, frame rate, and bit rate that a decoder may use. A decoder that conforms to a given level must be able to decode all bitstreams encoded for that level and all lower levels.

The maximum bit rate for the High Profile is 1.25 times that of the Baseline/Extended/Main Profiles, 3 times for Hi10P, and 4 times for Hi422P/Hi444PP.

The number of luma samples is 16×16 = 256 times the number of macroblocks (and the number of luma samples per second is 256 times the number of macroblocks per second).

8.4.4 Decoded picture buffering

Previously encoded pictures are used by H.264/AVC encoders to provide predictions of the values of samples in other pictures. This allows the encoder to make efficient decisions on the best way to encode a given picture. At the decoder, such pictures are stored in a virtual decoded picture buffer (DPB). The maximum capacity of the DPB, in units of frames (or pairs of fields), can be computed as follows:

    DpbCapacity = min(floor(MaxDpbMbs / (PicWidthInMbs * FrameHeightInMbs)), 16)

where MaxDpbMbs is a constant provided in the standard's level table as a function of the level number, and PicWidthInMbs and FrameHeightInMbs are the picture width and frame height for the coded video data, expressed in units of macroblocks (rounded up to integer values and accounting for cropping and macroblock pairing when applicable). This formula is specified in sections A.3.1.h and A.3.2.f of the 2009 edition of the standard.

For example, for an HDTV picture that is 1920 samples wide (PicWidthInMbs = 120) and 1080 samples high (FrameHeightInMbs = 68), a Level 4 decoder (MaxDpbMbs = 32768) has a maximum DPB storage capacity of floor(32768/(120×68)) = 4 frames (or 8 fields) when encoded with minimal cropping parameter values.

It is important to note that the current picture being decoded is not included in the computation of DPB fullness (unless the encoder has indicated for it to be stored for use as a reference for decoding other pictures or for delayed output timing). Thus, a decoder needs to actually have sufficient memory to handle (at least) one frame more than the maximum capacity of the DPB as calculated above.
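A minimal sketch of this computation in Python is shown below. The Level 4 value MaxDpbMbs = 32768 comes from the example above; the other entries reflect my reading of the standard's Table A-1 and should be verified against the specification before being relied on.

```python
import math

MAX_DPB_MBS = {"3": 8100, "3.1": 18000, "4": 32768, "4.1": 32768,
               "5": 110400, "5.1": 184320}   # assumed values; check Table A-1

def dpb_capacity(width: int, height: int, level: str) -> int:
    """Maximum DPB size in frames for a given picture size and level."""
    pic_width_in_mbs = math.ceil(width / 16)       # 1920 -> 120
    frame_height_in_mbs = math.ceil(height / 16)   # 1080 -> 68
    return min(MAX_DPB_MBS[level] // (pic_width_in_mbs * frame_height_in_mbs), 16)

print(dpb_capacity(1920, 1080, "4"))   # -> 4, matching the HDTV example above
```

As noted above, a real decoder must budget at least one frame of memory beyond this value for the picture currently being decoded.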
8.5 Implementations

In 2009, the HTML5 working group was split between supporters of Ogg Theora, a free video format which is thought to be unencumbered by patents, and H.264, which contains patented technology. As late as July 2009, Google and Apple were said to support H.264, while Mozilla and Opera supported Ogg Theora (now Google, Mozilla and Opera all support Theora and WebM with VP8).[36] Microsoft, with the release of Internet Explorer 9, added support for HTML5 video encoded using H.264. At the Gartner Symposium/ITxpo in November 2010, Microsoft CEO Steve Ballmer answered the question "HTML 5 or Silverlight?" by saying "If you want to do something that is universal, there is no question the world is going HTML5."[37] In January 2011, Google announced that it was pulling support for H.264 from its Chrome browser and supporting both Theora and WebM/VP8, to use only open formats.[38]

On March 18, 2012, Mozilla announced support for H.264 in Firefox on mobile devices, due to the prevalence of H.264-encoded video and the increased power efficiency of using the dedicated H.264 decoder hardware common on such devices.[39] On February 20, 2013, Mozilla implemented support in Firefox for decoding H.264 on Windows 7 and above. This feature relies on Windows' built-in decoding libraries.[40] Firefox 35.0, released on January 13, 2015, supports H.264 on OS X 10.6 and higher.[41]

On October 30, 2013, Rowan Trollope from Cisco Systems announced that Cisco would release both binaries and source code of an H.264 video codec called OpenH264 under the Simplified BSD license, and pay all royalties for its use to MPEG LA itself for any software projects that use Cisco's precompiled binaries (thus making Cisco's OpenH264 binaries free to use); any software projects that use Cisco's source code instead of its binaries would, however, be legally responsible for paying all royalties to MPEG LA themselves. Current target CPU architectures are x86 and ARM, and current target operating systems are Linux, Windows XP and later, Mac OS X, and Android; iOS is notably absent from this list because it doesn't allow applications to fetch and install binary modules from the Internet.[42][43][44] Also on October 30, 2013, Brendan Eich of Mozilla wrote that it would use Cisco's binaries in future versions of Firefox to add support for H.264 to Firefox where platform codecs are not available.[45] Cisco published the source to OpenH264 on December 9, 2013.[46]

8.5.1 Software encoders

8.5.2 Hardware

See also: List of cameras with onboard video stream encoding and H.264/MPEG-4 AVC products and implementations

Because H.264 encoding and decoding requires significant computing power in specific types of arithmetic operations, software implementations that run on general-purpose CPUs are typically less power-efficient. However, the latest quad-core general-purpose x86 CPUs have sufficient computation power to perform real-time SD and HD encoding. Compression efficiency depends on the video algorithmic implementation, not on whether a hardware or software implementation is used. Therefore, the difference between hardware- and software-based implementations lies more in power efficiency, flexibility, and cost. To improve power efficiency and reduce hardware form factor, special-purpose hardware may be employed, either for the complete encoding or decoding process, or for acceleration assistance within a CPU-controlled environment.

CPU-based solutions are known to be much more flexible, particularly when encoding must be done concurrently in multiple formats, multiple bit rates and resolutions (multi-screen video), and possibly with additional features such as container format support, advanced integrated advertising features, etc. A CPU-based software solution generally makes it much easier to load-balance multiple concurrent encoding sessions within the same CPU.

The 2nd generation Intel "Sandy Bridge" Core i3/i5/i7 processors introduced at the January 2011 CES (Consumer Electronics Show) offer an on-chip hardware full-HD H.264 encoder, known as Intel Quick Sync Video.[51][52]

A hardware H.264 encoder can be an ASIC or an FPGA. ASIC encoders with H.264 encoder functionality are available from many different companies, but the design used in the ASIC is typically licensed from one of a few companies such as Chips&Media, Allegro DVT, On2 (formerly Hantro, acquired by Google), Imagination Technologies, and NGCodec. Some companies have both FPGA and ASIC product offerings.[53]

Texas Instruments manufactures a line of ARM + DSP cores that perform DSP H.264 BP encoding of 1080p video at 30 fps.[54] This permits flexibility with respect to codecs (which are implemented as highly optimized DSP code) while being more efficient than software on a generic CPU.

8.6 Licensing

See also: Microsoft Corp. v. Motorola Inc.

In countries where patents on software algorithms are upheld, vendors and commercial users of products that use H.264/AVC are expected to pay patent licensing royalties for the patented technology that their products use.[55] This applies to the Baseline Profile as well.[56]

A private organization known as MPEG LA, which is not affiliated in any way with the MPEG standardization organization, administers the licenses for patents applying to this standard, as well as the patent pools for MPEG-2 Part 1 Systems, MPEG-2 Part 2 Video, MPEG-4 Part 2 Video, and other technologies. The MPEG LA patents in the US last at least until 2027.[57]

On August 26, 2010, MPEG LA announced that H.264-encoded internet video that is free to end users will never be charged royalties.[58] All other royalties remain in place, such as royalties for products that decode and encode H.264 video as well as for operators of free television and subscription channels.[59] The license terms are updated in 5-year blocks.[60]

In 2005, Qualcomm, which was the assignee of U.S. Patent 5,452,104 and U.S. Patent 5,576,767, sued Broadcom in US District Court, alleging that Broadcom infringed the two patents by making products that were compliant with the H.264 video compression standard.[61] In 2007, the District Court found that the patents were unenforceable because Qualcomm had failed to disclose them to the JVT prior to the release of the H.264 standard in May 2003.[61] In December 2008, the US Court of Appeals for the Federal Circuit affirmed the District Court's order that the patents be unenforceable, but remanded to the District Court with instructions to limit the scope of unenforceability to H.264-compliant products.[61]

8.7 See also

• VP8
• VP9
• Comparison of H.264 and VC-1
• Dirac (video compression format)
• H.264/MPEG-4 AVC products and implementations
• High Efficiency Video Coding
• Ultra-high-definition television
• IPTV
• ISO/IEC JTC 1/SC 29

8.8 References

[1] Ozer, Jan. "Encoding for Multiple Screen Delivery, Section 3, Lecture 7: Introduction to H.264". Udemy. Retrieved 10 October 2016.

[2] "AVC/H.264 FAQ". www.mpegla.com. Retrieved 2016-09-15.

[3] "H.262 : Information technology — Generic coding of moving pictures and associated audio information: Video". Retrieved 2007-04-15.

[4] Joint Video Team, ITU-T web site.

[5] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (05/2003)". ITU. 2003-05-30. Retrieved 2013-04-18.

[6] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (05/2003) Cor. 1 (05/2004)". ITU. 2004-05-07. Retrieved 2013-04-18.

[7] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (03/2005)". ITU. 2005-03-01. Retrieved 2013-04-18.

[8] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (2005) Cor. 1 (09/2005)". ITU. 2005-09-13. Retrieved 2013-04-18.

[9] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (2005) Amd. 1 (06/2006)". ITU. 2006-06-13. Retrieved 2013-04-18.

[10] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (2005) Amd. 2 (04/2007)". ITU. 2007-04-06. Retrieved 2013-04-18.

[11] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (11/2007)". ITU. 2007-11-22. Retrieved 2013-04-18.

[12] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (2007) Cor. 1 (01/2009)". ITU. 2009-01-13. Retrieved 2013-04-18.

[13] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (03/2009)". ITU. 2009-03-16. Retrieved 2013-04-18.

[14] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (03/2010)". ITU. 2010-03-09. Retrieved 2013-04-18.

[15] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (06/2011)". ITU. 2011-06-29. Retrieved 2013-04-18.

[16] "ITU-T Home : Study groups : ITU-T Recommendations : ITU-T H.264 (01/2012)". ITU. 2012-01-13. Retrieved 2013-04-18.

[17] "ITU-T Recommendation H.264 (04/2013)". ITU. 2013-06-12. Retrieved 2013-06-16.

[18] "Recommendation H.264 (02/14)". ITU. 2014-11-28. Retrieved 2016-02-28.

[19] Wenger; et al. "RFC 3984: RTP Payload Format for H.264 Video": 2.

[20] "Which recording mode is equivalent to the image quality of the High Definition Video (HDV) format?". Sony eSupport.

[21] "ATSC Standard A/72 Part 1: Video System Characteristics of AVC in the ATSC Digital Television System" (PDF). Retrieved 2011-07-30.

[22] "ATSC Standard A/72 Part 2: AVC Video Transport Subsystem Characteristics" (PDF). Retrieved 2011-07-30.

[23] "ATSC Standard A/153 Part 7: AVC and SVC Video System Characteristics" (PDF). Retrieved 2011-07-30.

[24] "Sony introduces new XAVC recording format to accelerate 4K development in the professional and consumer markets". Sony. 2012-10-30. Retrieved 2012-11-01.

[25] "Sony introduces new XAVC recording format to accelerate 4K development in the professional and consumer markets" (PDF). Sony. 2012-10-30. Retrieved 2012-11-01.

[26] "Sony supports "Beyond HD" strategy with new full sensor cameras". broadcastengineering.com. 2012-10-30. Retrieved 2012-11-01.

[27] Steve Dent (2012-10-30). "Sony goes Red-hunting with PMW-F55 and PMW-F5 pro CineAlta 4K Super 35mm sensor camcorders". Engadget. Retrieved 2012-11-05.

[28] "F55 CineAlta 4K the future, ahead of schedule" (PDF). Sony. 2012-10-30. Retrieved 2012-11-01.

[29] "Ultra-fast "SxS PRO+" memory cards transform 4K video capture". Sony. Retrieved 2012-11-05.

[30] "Ultra-fast "SxS PRO+" memory cards transform 4K video capture" (PDF). Sony. Retrieved 2012-11-05.

[31] "The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions" (PDF). Retrieved 2011-07-30.

[32] RFC 3984, p. 3.

[33] Apple Inc. (1999-03-26). "H.264 FAQ". Apple. Archived from the original on March 7, 2010. Retrieved 2010-05-17.

[34] Karsten Suehring. "H.264/AVC JM Reference Software Download". Iphome.hhi.de. Retrieved 2010-05-17.

[35] "TS 101 154 – V1.9.1 – Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in Broadcasting Applications based on the MPEG-2 Transport Stream" (PDF). Retrieved 2010-05-17.

[36] "Decoding the HTML 5 video codec debate". Ars Technica. 2009-07-06. Retrieved 2011-01-12.

[37] "Steve Ballmer, CEO Microsoft, interviewed at Gartner Symposium/ITxpo Orlando 2010". Gartnervideo. November 2010. Retrieved 2011-01-12.

[38] "HTML Video Codec Support in Chrome". 2011-01-11. Retrieved 2011-01-12.

[39] "Video, Mobile, and the Open Web". 2012-03-18. Retrieved 2012-03-20.

[40] "WebRTC enabled, H.264/MP3 support in Win 7 on by default, Metro UI for Windows 8 + more – Firefox Development Highlights". hacks.mozilla.org. Mozilla. 2013-02-20. Retrieved 2013-03-15.

[41] Firefox Notes, Version 35.0.

[42] "Open-Sourced H.264 Removes Barriers to WebRTC". 2013-10-30. Retrieved 2013-11-01.

[43] "Cisco OpenH264 project FAQ". 2013-10-30. Retrieved 2013-11-01.

[44] "OpenH264 Simplified BSD License". 2013-10-27. Retrieved 2013-11-21.

[45] "Video Interoperability on the Web Gets a Boost From Cisco's H.264 Codec". 2013-10-30. Retrieved 2013-11-01.

[46] https://github.com/cisco/openh264/commit/59dae50b1069dbd532226ea024a3ba3982ab4386

[47] "x264 4:2:2 encoding support". Retrieved 2011-09-22.

[48] "x264 4:4:4 encoding support". Retrieved 2011-06-22.

[49] "x264 support for 9 and 10-bit encoding". Retrieved 2011-06-22.

[50] "x264 replace High 4:4:4 profile lossless with High 4:4:4 Predictive". Retrieved 2011-06-22.

[51] "Quick Reference Guide to 2nd generation Intel® Core™ Processor Built-in Visuals". Intel® Software Network. 2010-10-01. Retrieved 2011-01-19.

[52] "Intel® Quick Sync Video". www.intel.com. 2010-10-01. Retrieved 2011-01-19.

[53] "Design-reuse.com". Design-reuse.com. 1990-01-01. Retrieved 2010-05-17.

[54] "Category:DM6467 - Texas Instruments Embedded Processors Wiki". Processors.wiki.ti.com. 2011-07-12. Retrieved 2011-07-30.

[55] "Summary of AVC/H.264 License Terms" (PDF). Retrieved 2010-03-25.

[56] "OMS Video, A Project of Sun Microsystems' Open Media Commons Initiative". Retrieved 2008-08-26.

[57] "US Patent Expiration for MP3, MPEG-2, H.264". OSNews. http://www.osnews.com/story/24954/US_Patent_Expiration_for_MP3_MPEG-2_H_264. The article notes the MPEG LA patent US 7826532, filed September 5, 2003, with a 1546-day term extension; see http://patft1.uspto.gov/netacgi/nph-Parser?patentnumber=7826532 and http://www.google.com/patents/about?id=2onYAAAAEBAJ.

[58] "MPEG LA's AVC License Will Not Charge Royalties for Internet Video that is Free to End Users through Life of License" (PDF). MPEG LA. 2010-08-26. Retrieved 2010-08-26.

[59] Hachman, Mark (2010-08-26). "MPEG LA Cuts Royalties from Free Web Video, Forever". pcmag.com. Retrieved 2010-08-26.

[60] "AVC FAQ". MPEG LA. 2002-08-01. Retrieved 2010-05-17.

[61] See Qualcomm Inc. v. Broadcom Corp., No. 2007-1545, 2008-1162 (Fed. Cir. December 1, 2008). For articles in the popular press, see signonsandiego.com, "Qualcomm loses its patent-rights case" and "Qualcomm's patent case goes to jury"; and bloomberg.com, "Broadcom Wins First Trial in Qualcomm Patent Dispute".

8.9 Further reading

• Wiegand, Thomas; Sullivan, Gary J.; Bjøntegaard, Gisle; Luthra, Ajay (July 2003). "Overview of the H.264/AVC Video Coding Standard" (PDF). IEEE Transactions on Circuits and Systems for Video Technology. 13 (7). Retrieved January 31, 2011.

• Topiwala, Pankaj; Sullivan, Gary J.; Luthra, Ajay (August 2004). "Overview and Introduction to the Fidelity Range Extensions" (PDF). SPIE Applications of Digital Image Processing XXVII. Retrieved January 31, 2011.

• Ostermann, J.; Bormans, J.; List, P.; Marpe, D.; Narroschke, M.; Pereira, F.; Stockhammer, T.; Wedi, T. (2004). "Video coding with H.264/AVC: Tools, Performance, and Complexity" (PDF). IEEE Circuits and Systems Magazine. 4 (1). Retrieved January 31, 2011.

• Sullivan, Gary J.; Wiegand, Thomas (January 2005). "Video Compression—From Concepts to the H.264/AVC Standard" (PDF). Proceedings of the IEEE. 93 (1). doi:10.1109/jproc.2004.839617. Retrieved January 31, 2011.

• Richardson, Iain E. G. (January 2011). "Learn about video compression and H.264". VCODEX. Vcodex Limited. Retrieved January 31, 2011.

8.10 External links

• ITU-T publication page: H.264: Advanced video coding for generic audiovisual services
• MPEG-4 AVC/H.264 Information (Doom9's Forum)
• H.264/MPEG-4 Part 10 Tutorials (Richardson)
• "Part 10: Advanced Video Coding". ISO publication page: ISO/IEC 14496-10:2010 – Information technology — Coding of audio-visual objects.
• "H.264/AVC JM Reference Software". IP Homepage. Retrieved 2007-04-15.
• "JVT document archive site". Retrieved 2007-05-06.
• "Publications". Thomas Wiegand. Retrieved 2007-06-23.
• "Publications". Detlev Marpe. Retrieved 2007-04-15.
• "Fourth Annual H.264 video codecs comparison". Moscow State University. (dated December 2007)
• "Discussion on H.264 with respect to IP cameras in use within the security and surveillance industries". (dated April 2009)
• "Sixth Annual H.264 video codecs comparison". Moscow State University. (dated May 2010)

Chapter 9

Group of pictures

In video coding, a group of pictures, or GOP structure, specifies the order in which intra- and inter-frames are arranged. The GOP is a group of successive pictures within a coded video stream. Each coded video stream consists of successive GOPs, and the visible frames are generated from the pictures contained in them. Encountering a new GOP in a compressed video stream assures the decoder that no previous frame will be needed to decode the next ones, and this is what allows fast seeking through the video.

9.1 Description

A GOP can contain the following picture types:

• I picture or I frame (intra coded picture) – a picture that is coded independently of all other pictures. Each GOP begins (in decoding order) with this type of picture.

• P picture or P frame (predictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1, H.262/MPEG-2 and H.263, each P picture can only reference one picture, and that picture must precede the P picture in display order as well as in decoding order and must be an I or P picture. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC.

• B picture or B frame (bipredictive coded picture) – contains motion-compensated difference information relative to previously decoded pictures. In older designs such as MPEG-1 and H.262/MPEG-2, each B picture can only reference two pictures, one of which must precede the B picture in display order and the other of which must follow it, and all referenced pictures must be I or P pictures. These constraints do not apply in the newer standards H.264/MPEG-4 AVC and HEVC.

• D picture or D frame (DC direct coded picture) – serves as a fast-access representation of a picture for loss robustness or fast-forward. D pictures are only used in MPEG-1 video.

An I frame indicates the beginning of a GOP; several P and B frames follow it. In older designs, the allowed ordering and referencing structure is relatively constrained.[1]

The I frames contain the full image and do not require any additional information to be reconstructed. Typically, encoders use GOP structures that cause each I frame to be a "clean random access point", such that any errors within the GOP structure are corrected by the next I frame. In the newer designs found in H.264/MPEG-4 AVC and HEVC, encoders have much more flexibility about referencing structures. They can use the same referencing structures as older designs, or they can use more pictures as references and order the coding order more flexibly relative to the display order. They are also allowed to use B pictures as references when coding other (B or P) pictures. This extra flexibility can improve compression efficiency, but it can cause propagation of errors if some data becomes lost or corrupted. One popular structure for use with the newer designs is a hierarchy of B pictures. Hierarchical B pictures can provide very good compression efficiency and can also limit the propagation of errors, since the hierarchy can ensure that the number of pictures affected by any data corruption problem is strictly limited.

Generally, the more I frames the video stream has, the more editable it is. However, having more I frames substantially increases the bit rate needed to code the video.

9.2 GOP Structure

The GOP structure is often referred to by two numbers, for example M=3, N=12. The first number tells the distance between two anchor frames (I or P). The second number tells the distance between two full images (I frames): it is the GOP size.[2] For the example M=3, N=12, the GOP structure is IBBPBBPBBPBBI. Instead of the M parameter, the maximal count of B frames between two consecutive anchor frames can be used. For example, in a sequence with pattern IBBBBPBBBBPBBBBI, the GOP size is equal to 15 (the length between two I frames) and the distance between two anchor frames (the M value) is 5 (the length between I and P frames, or between two consecutive P frames).
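A minimal Python sketch of this (M, N) notation follows. It assumes the simple repeating pattern described above, with N a multiple of M; function and variable names are illustrative.

```python
def gop_pattern(m: int, n: int) -> str:
    """Display-order frame types for one GOP: M = anchor spacing, N = GOP size."""
    if n % m != 0:
        raise ValueError("N is normally a multiple of M")
    frame_types = []
    for i in range(n):
        if i == 0:
            frame_types.append("I")    # each GOP begins with an I frame
        elif i % m == 0:
            frame_types.append("P")    # anchor frames every M pictures
        else:
            frame_types.append("B")    # B frames fill the gaps between anchors
    return "".join(frame_types)

print(gop_pattern(3, 12))   # -> IBBPBBPBBPBB; the next GOP begins with the next I
```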

9.3 References

[1] http://www.cs.cf.ac.uk/Dave/Multimedia/node258.html

[2] http://documentation.apple.com/en/compressor/usermanual/index.html#chapter=18%26section=5%26tasks=true

Chapter 10

Video compression picture types

In the field of video compression, a video frame is compressed using different algorithms with different advantages and disadvantages, centered mainly around the amount of data compression. These different algorithms for video frames are called picture types or frame types. The three major picture types used in the different video algorithms are I, P and B. They differ in the following characteristics:

• I-frames are the least compressible but don't require other video frames to decode.
• P-frames can use data from previous frames to decompress and are more compressible than I-frames.
• B-frames can use both previous and forward frames for data reference to get the highest amount of data compression.

10.1 Summary

There are three types of pictures (or frames) used in video compression: I-frames, P-frames and B-frames.

An I-frame is an 'Intra-coded picture', in effect a fully specified picture, like a conventional static image file. P-frames and B-frames hold only part of the image information, so they need less space to store than an I-frame and thus improve video compression rates.

A P-frame ('Predicted picture') holds only the changes in the image from the previous frame. For example, in a scene where a car moves across a stationary background, only the car's movements need to be encoded. The encoder does not need to store the unchanging background pixels in the P-frame, thus saving space. P-frames are also known as delta-frames.

A B-frame ('Bi-predictive picture') saves even more space by using differences between the current frame and both the preceding and following frames to specify its content.

[Figure: A sequence of video frames, consisting of two keyframes (I), one forward-predicted frame (P) and one bi-directionally predicted frame (B).]

10.2 Pictures/Frames

While the terms "frame" and "picture" are often used interchangeably, strictly speaking the term picture is the more general notion, as a picture can be either a frame or a field. A frame is a complete image captured during a known time interval, and a field is the set of odd-numbered or even-numbered scanning lines composing a partial image. When video is sent in interlaced-scan format, each frame is sent as the field of odd-numbered lines followed by the field of even-numbered lines.

Frames that are used as a reference for predicting other frames are referred to as reference frames. In such designs, the frames that are coded without prediction from other frames are called I-frames, frames that use prediction from a single reference frame (or a single frame for prediction of each region) are called P-frames, and frames that use a prediction signal that is formed as a (possibly weighted) average of two reference frames are called B-frames.

10.3 Slices

In the latest international standard, known as H.264/MPEG-4 AVC, the granularity of the establishment of prediction types is brought down to a lower level called the slice level of the representation. A slice is a spatially distinct region of a frame that is encoded separately from any other region in the same frame. In that standard, instead of I-frames, P-frames, and B-frames, there are I-slices, P-slices, and B-slices.


10.4 Macroblocks

Typically, pictures (frames) are segmented into macroblocks, and individual prediction types can be selected on a macroblock basis rather than being the same for the entire picture, as follows:

• I-frames can contain only intra macroblocks.
• P-frames can contain either intra macroblocks or predicted macroblocks.
• B-frames can contain intra, predicted, or bi-predicted macroblocks.

Furthermore, in the video codec H.264, the frame can be segmented into sequences of macroblocks called slices, and instead of using I, B and P frame-type selections, the encoder can choose the prediction style distinctly on each individual slice. Also found in H.264 are several additional types of frames/slices:

• SI-frames/slices (Switching I); facilitate switching between coded streams; contain SI-macroblocks (a special type of intra-coded macroblock).
• SP-frames/slices (Switching P); facilitate switching between coded streams; contain P and/or I macroblocks.
• Multi-frame motion estimation (up to 16 reference frames, or 32 reference fields).

Multi-frame motion estimation increases the quality of the video while allowing the same compression ratio. SI- and SP-frames (defined for the Extended Profile) allow increased error resistance. When such frames are used along with a smart decoder, it is possible to recover the broadcast streams of damaged DVDs.

10.5 Intra coded frames/slices (I-frames/slices or Key frames)

See also: Key frame (animation) and Intra-frame

• I-frames are coded without reference to any frame except themselves.
• May be generated by an encoder to create a random access point (to allow a decoder to start decoding properly from scratch at that picture location).
• May also be generated when differentiating image details prohibit generation of effective P or B-frames.
• Typically require more bits to encode than other frame types.

Often, I-frames are used for random access and are used as references for the decoding of other pictures. Intra refresh periods of a half-second are common in such applications as digital television broadcast and DVD storage. Longer refresh periods may be used in some environments. For example, in videoconferencing systems it is common to send I-frames very infrequently.

10.6 Predicted frames/slices (P-frames/slices)

• Require the prior decoding of some other picture(s) in order to be decoded.
• May contain both image data and motion vector displacements and combinations of the two.
• Can reference previous pictures in decoding order.
• Older standard designs (such as MPEG-2) use only one previously decoded picture as a reference during decoding, and require that picture to also precede the P picture in display order.
• In H.264, can use multiple previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for prediction.
• Typically require fewer bits for encoding than I pictures do.

10.7 Bi-directionally predicted frames/slices (B-frames/slices)

• Require the prior decoding of other frame(s) in order to be decoded.
• May contain image data and motion vector displacements or both.
  • Older standards have a single global motion compensation vector for the entire frame.
  • Some standards have a single motion compensation vector per macroblock.
• Include some prediction modes that form a prediction of a motion region (e.g., a macroblock or a smaller area) by averaging the predictions obtained using two different previously decoded reference regions.
  • In other words, some standards allow two motion compensation vectors per macroblock (biprediction).
• In older standard designs (such as MPEG-2), B-frames are never used as references for the prediction of other pictures. As a result, a lower-quality encoding (using fewer bits than would otherwise be the case) can be used for such B-frames, because the loss of detail will not harm the prediction quality for subsequent pictures.
• In H.264, may or may not be used as references for the decoding of other pictures (at the discretion of the encoder).
• In older standard designs (such as MPEG-2), use exactly two previously decoded pictures as references during decoding, and require one of those pictures to precede the B-frame in display order and the other one to follow it.
• In H.264, can use one, two, or more than two previously decoded pictures as references during decoding, and can have any arbitrary display-order relationship relative to the picture(s) used for prediction.
• Typically require fewer bits for encoding than either I- or P-frames.

10.8 See also

• Key frame (term in animation)

• Video compression
• Intra frame

• Inter frame

• Group of pictures (application of frame types)
• Datamosh

• Video

10.9 References

10.10 External links

• Video streaming with SP and SI frames

Chapter 11

Inter frame

An inter frame is a frame in a video compression stream that is expressed in terms of one or more neighboring frames. The "inter" part of the term refers to the use of inter-frame prediction. This kind of prediction tries to take advantage of the temporal redundancy between neighboring frames, enabling higher compression rates.

11.1 Inter frame prediction

[Figure: Inter-frame prediction process. In this case, there has been an illumination change between the block in the reference frame and the block being encoded; this difference will be the prediction error for this block.]

An inter-coded frame is divided into blocks known as macroblocks. After that, instead of directly encoding the raw pixel values for each block, the encoder tries to find a block similar to the one it is encoding in a previously encoded frame, referred to as a reference frame. This process is done by a block matching algorithm. If the encoder succeeds in its search, the block can be encoded by a vector, known as a motion vector, which points to the position of the matching block in the reference frame. The process of motion vector determination is called motion estimation.

In most cases the encoder will succeed, but the block found is likely not an exact match to the block it is encoding. This is why the encoder computes the differences between them. Those residual values are known as the prediction error and need to be transformed and sent to the decoder.

To sum up, if the encoder succeeds in finding a matching block in a reference frame, it obtains a motion vector pointing to the matched block and a prediction error. Using both elements, the decoder is able to recover the raw pixels of the block, as the figure above shows graphically.

This kind of prediction has some pros and cons:

• If everything goes well, the algorithm will be able to find a matching block with little prediction error, so that, once transformed, the overall size of motion vector plus prediction error is lower than the size of a raw encoding.

• If the block matching algorithm fails to find a suitable match, the prediction error will be considerable. Thus the overall size of motion vector plus prediction error will be greater than that of a raw encoding. In this case the encoder makes an exception and sends a raw encoding for that specific block.

• If the matched block in the reference frame has also been encoded using inter-frame prediction, the errors made in its encoding will be propagated to the next block. If every frame were encoded using this technique, there would be no way for a decoder to synchronize to a video stream, because it would be impossible to obtain the reference images.

Because of these drawbacks, a reliable and time-periodic reference frame must be used for this technique to be efficient and useful. That reference frame is known as an intra-frame, which is strictly intra-coded, so it can always be decoded without additional information.

In most designs, there are two types of inter frames: P-frames and B-frames. These two kinds of frames and the I-frames (intra-coded pictures) are usually joined in a GOP (Group Of Pictures). The I-frame doesn't need additional information to be decoded, and it can be used as a reliable reference. This structure also allows achieving an I-frame periodicity, which is needed for decoder synchronization.
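The round trip described above can be sketched for a single 16×16 macroblock, assuming grayscale frames stored as NumPy uint8 arrays and a motion vector that keeps the block inside the reference frame; all names here are illustrative.

```python
import numpy as np

def encode_block(cur, ref, y, x, dy, dx, n=16):
    """Return the motion vector and the residual that would be sent."""
    pred = ref[y + dy : y + dy + n, x + dx : x + dx + n].astype(np.int16)
    residual = cur[y : y + n, x : x + n].astype(np.int16) - pred
    return (dy, dx), residual      # residual is transformed and coded in practice

def decode_block(ref, y, x, mv, residual, n=16):
    """Recover the raw pixels from the reference block plus the residual."""
    dy, dx = mv
    pred = ref[y + dy : y + dy + n, x + dx : x + dx + n].astype(np.int16)
    return np.clip(pred + residual, 0, 255).astype(np.uint8)
```

If the residual is transmitted losslessly, decode_block reproduces the original block exactly; a lossy codec quantizes the transformed residual, so the reconstruction is only approximate.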


11.2 Frame types

The difference between P-frames and B-frames is the reference frame they are allowed to use.

11.2.1 P-frame

P-frame is the name for forward-predicted pictures. The prediction is made from an earlier picture, mainly an I-frame, so P-frames require less coding data (≈50% of the I-frame size). The amount of data needed for this prediction consists of motion vectors and transform coefficients describing the prediction correction. It involves the use of motion compensation.

11.2.2 B-frame

B-frame is the term for bidirectionally predicted pictures. This kind of prediction requires even less coding data than P-frames (≈25% of the I-frame size) because the pictures can be predicted or interpolated from an earlier and/or a later frame. Similar to P-frames, B-frames are expressed as motion vectors and transform coefficients. In order to avoid growing propagation error, B-frames are not used as a reference to make further predictions in most encoding standards. However, in newer encoding methods (such as AVC), B-frames may be used as references.

11.3 Typical Group Of Pictures (GOP) structure

The typical Group Of Pictures (GOP) structure is IBBPBBP... The I-frame is used to predict the first P-frame, and these two frames are also used to predict the first and second B-frames. The second P-frame is predicted using the first I-frame as well. Both P-frames join together to predict the third and fourth B-frames. [Figure: Example of a GOP structure.]

This structure suggests a problem: the fourth frame (a P-frame) is needed in order to predict the second and third frames (B-frames). So we need to transmit the P-frame before the B-frames, and this delays the transmission (it is necessary to keep the P-frame).

This structure has strong points:

• It minimizes the problem of possible uncovered areas.
• P-frames and B-frames need less data than I-frames, so less data is transmitted.

But it has weak points:

• It increases the complexity of the decoder, which can mean more memory is needed to rearrange the frames.
• The interpolated frames (namely B-frames) require more motion vectors, which means an increased bit rate.

11.4 H.264 Inter frame prediction improvements

The most important improvements of this technique in H.264, with respect to previous standards, are:

• More flexible block partitioning
• Resolution of up to ¼ pixel motion compensation
• Multiple references
• Enhanced Direct/Skip Macroblock

11.4.1 More flexible block partition

Luminance block partitions of 16×16 (as in MPEG-2), 16×8, 8×16 and 8×8 are supported. In the last case, the 8×8 block can be further divided into new blocks of 4×8, 8×4 or 4×4. [Figure: partition shapes 16×16, 8×16, 16×8, 8×8 and sub-partitions 4×8, 8×4, 4×4.]

The frame to be coded is divided into blocks of equal size, as shown in the picture above. Each block prediction is a block of the same size in a reference picture, offset by a small displacement.

11.4.2 Resolution of up to ¼ pixel motion compensation

Pixels at half-pixel positions are obtained by applying a six-tap filter H = [1 −5 20 20 −5 1]. For example, given six consecutive full samples A–F, the half-sample between C and D is b = A − 5B + 20C + 20D − 5E + F (followed by normalization and clipping). Pixels at quarter-pixel positions are obtained by bilinear interpolation.

While MPEG-2 allowed ½-pixel resolution, H.264 inter-frame prediction allows up to ¼-pixel resolution. This means that a block matching the one being coded can be searched for in other reference frames, or nonexistent pixels can be interpolated to find blocks that are even better suited to the current block. If the motion vector is an integer number of samples, the motion-compensated block can be found directly in the reference picture. If the motion vector is not an integer, the prediction is obtained from interpolated pixels, using an interpolation filter in the horizontal and vertical directions.

11.4.3 Multiple references

Multiple-reference motion estimation allows finding the best reference in two possible buffers (List 0 for past pictures, List 1 for future pictures), which contain up to 16 frames each. Block prediction is done by a weighted sum of blocks from the reference picture. This allows enhanced picture quality in scenes where there are changes of plane, zoom, or where new objects are revealed.

11.4.4 Enhanced Direct/Skip Macroblock

Skip and Direct modes are used very frequently, especially with B-frames, and they significantly reduce the number of bits to be coded. A block coded in these modes is sent without residual error or motion vectors; the encoder only records that it is a Skip macroblock, and the decoder deduces the motion vector of a Direct/Skip-coded block from other blocks already decoded. There are two ways to deduce the motion:

Temporal It uses the motion vector of the co-located block in the List 1 frame to deduce the motion vector. The List 1 block uses a List 0 block as reference.

Spatial It predicts the motion from neighbouring macroblocks in the same frame. A possible criterion could be to copy the motion vector from a neighbouring block. These modes are used in uniform zones of the picture where there is not much movement.

[Figure: pink blocks mark Direct/Skip Mode coded blocks; as can be seen, they are used very frequently, mainly in B-frames.]
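The six-tap half-sample rule from section 11.4.2 is easy to check numerically. The sketch below implements b = A − 5B + 20C + 20D − 5E + F for one row of samples; the rounding offset (+16), right shift by 5 and clipping follow the normalization H.264 applies to luma half-samples.

```python
def half_pel(row, i):
    """Half-sample value between row[i] and row[i+1] from six full samples."""
    a, b, c, d, e, f = row[i-2], row[i-1], row[i], row[i+1], row[i+2], row[i+3]
    val = a - 5*b + 20*c + 20*d - 5*e + f
    return min(255, max(0, (val + 16) >> 5))   # round, divide by 32, clip to 8 bits

print(half_pel([10, 10, 10, 20, 20, 20], 2))   # edge between 10s and 20s -> 15
```

A quarter-sample value is then the bilinear average of the two nearest full/half samples, as described above.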

11.5 Additional information

Although the use of the term "frame" is common in informal usage, in many cases (such as in international standards for video coding by MPEG and VCEG) a more general concept is applied by using the word "picture" rather than "frame", where a picture can be either a complete frame or a single interlaced field. Video codecs such as MPEG-2, H.264 or Ogg Theora reduce the amount of data in a stream by following key frames with one or more inter frames. These frames can typically be encoded using a lower bit rate than is needed for key frames, because much of the image is ordinarily similar, so only the changing parts need to be coded.

11.6 References

• Software H.264: http://iphome.hhi.de/suehring/tml/download/

• T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra: Overview of the H.264/AVC Video Coding Standard. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, July 2003

11.7 See also

• Video compression picture types

Chapter 12

Motion compensation

[Figure: Visualization of MPEG block motion compensation. Blocks that moved from one frame to the next are shown as white arrows, making the motions of the different platforms and the character clearly visible.]

Motion compensation is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames, by accounting for motion of the camera and/or of objects in the video. It is employed in the encoding of video data for video compression, for example in the generation of MPEG-2 files. Motion compensation describes a picture in terms of the transformation of a reference picture to the current picture. The reference picture may be previous in time or even from the future. When images can be accurately synthesised from previously transmitted/stored images, the compression efficiency can be improved.

12.1 How it works

Motion compensation exploits the fact that, often, for many frames of a movie, the only difference between one frame and another is the result of either the camera moving or an object in the frame moving. In reference to a video file, this means much of the information that represents one frame will be the same as the information used in the next frame. Using motion compensation, a video stream will contain some full (reference) frames; the only information stored for the frames in between would then be the information needed to transform the previous frame into the next frame.

12.2 Illustrated example

The following is a simplistic illustrated explanation of how motion compensation works. Two successive frames were captured from the movie Elephants Dream. As can be seen from the images, the bottom (motion-compensated) difference between the two frames contains significantly less detail than the prior images, and thus compresses much better. The information required to encode the compensated frame is therefore much smaller than for the plain difference frame. It is also possible to encode the information using the difference image alone, at a cost of lower compression efficiency but lower coding complexity; in fact, motion-compensated coding (motion estimation together with motion compensation) occupies more than 90% of encoding complexity.

12.3 Motion Compensation in MPEG

In MPEG, images are predicted from previous frames (P frames) or bidirectionally from previous and future frames (B frames). B frames are more complex because the image sequence must be transmitted and stored out of order, so that the future frame is available to generate the B frames.[1]

After predicting frames using motion compensation, the coder finds the error (residual), which is then compressed and transmitted.


12.4 Global motion compensation

In global motion compensation, the motion model basically reflects camera motions such as:

• Dolly – moving the camera forward or backward
• Track – moving the camera left or right
• Boom – moving the camera up or down
• Pan – rotating the camera around its Y axis, moving the view left or right
• Tilt – rotating the camera around its X axis, moving the view up or down
• Roll – rotating the camera around the view axis

It works best for still scenes without moving objects. There are several advantages of global motion compensation:

• It models the dominant motion usually found in video sequences with just a few parameters. The share in bit rate of these parameters is negligible.
• It does not partition the frames. This avoids artifacts at partition borders.
• A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.

MPEG-4 ASP supports GMC with three reference points, although some implementations can only make use of one. A single reference point only allows for translational motion, which, for its relatively large performance cost, provides little advantage over block-based motion compensation.

Moving objects within a frame are not sufficiently represented by global motion compensation. Thus, local motion estimation is also needed.

12.5 Block motion compensation

In block motion compensation (BMC), the frames are partitioned into blocks of pixels (e.g. macroblocks of 16×16 pixels in MPEG). Each block is predicted from a block of equal size in the reference frame. The blocks are not transformed in any way apart from being shifted to the position of the predicted block. This shift is represented by a motion vector.

To exploit the redundancy between neighboring block vectors (e.g. for a single moving object covered by multiple blocks), it is common to encode only the difference between the current and previous motion vector in the bitstream. The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning. Further down the encoding pipeline, an entropy coder will take advantage of the resulting statistical distribution of the motion vectors around the zero vector to reduce the output size.

It is possible to shift a block by a non-integer number of pixels, which is called sub-pixel precision. The in-between pixels are generated by interpolating neighboring pixels. Commonly, half-pixel or quarter-pixel precision (Qpel, used by H.264 and MPEG-4 ASP) is used. The computational expense of sub-pixel precision is much higher, due to the extra processing required for interpolation and, on the encoder side, a much greater number of potential source blocks to be evaluated.

The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts). These artifacts appear in the form of sharp horizontal and vertical edges which are easily spotted by the human eye, and they produce false edges and ringing effects (large coefficients in high-frequency sub-bands) due to quantization of the coefficients of the Fourier-related transform used for transform coding of the residual frames.[2]

Block motion compensation divides up the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from (a common misconception is that the previous frame is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move to). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously transmitted frames.

Frames can also be predicted from future frames. The future frames then need to be encoded before the predicted frames, and thus the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called B-frames. A coding scheme could, for instance, be IBBPBBPBBPBB.

Further, the use of triangular tiles has also been proposed for motion compensation. Under this scheme, the frame is tiled with triangles, and the next frame is generated by performing an affine transformation on these triangles.[3] Only the affine transformations are recorded/transmitted. This is capable of dealing with zooming, rotation, translation, etc.
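A minimal sketch of the block copying described above, using NumPy: given one motion vector per 16×16 block, the predicted frame is assembled by copying displaced blocks from the reference frame. Clamping the source block to the frame is a simplification; real codecs define border padding/extrapolation.

```python
import numpy as np

def predict_frame(ref, vectors, n=16):
    """vectors[by][bx] is the (dy, dx) displacement of the block at (bx, by)."""
    h, w = ref.shape
    pred = np.empty_like(ref)
    for by, row in enumerate(vectors):
        for bx, (dy, dx) in enumerate(row):
            y, x = by * n, bx * n
            sy = min(max(y + dy, 0), h - n)   # clamp source block to the frame
            sx = min(max(x + dx, 0), w - n)
            pred[y : y + n, x : x + n] = ref[sy : sy + n, sx : sx + n]
    return pred
```

Subtracting pred from the actual frame yields the residual that is transformed and coded.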

12.6 Variable block-size motion 12.9 3D image coding techniques compensation Motion compensation is utilized in Stereoscopic Video Coding Variable block-size motion compensation (VBSMC) is the use of BMC with the ability for the encoder to dynam- In video, time is often considered as the third dimension. ically select the size of the blocks. When coding video, Still image coding techniques can be expanded to an extra the use of larger blocks can reduce the number of bits dimension. needed to represent the motion vectors, while the use of JPEG2000 uses wavelets, and these can also be used to smaller blocks can result in a smaller amount of predic- encode motion without gaps between blocks in an adap- tion residual information to encode. Older designs such tive way. Fractional pixel affine transformations lead to as H.261 and MPEG-1 video typically use a fixed block bleeding between adjacent pixels. If no higher internal size, while newer ones such as H.263, MPEG-4 Part 2, resolution is used the delta images mostly fight against H.264/MPEG-4 AVC, and VC-1 give the encoder the the image smearing out. The delta image can also be ability to dynamically choose what block size will be used encoded as wavelets, so that the borders of the adaptive to represent the motion. blocks match. 2D+ techniques utilize H.264 and MPEG-2 compatible coding and can use motion 12.7 Overlapped block motion compensation to compress between stereoscopic images. compensation 12.10 See also Overlapped block motion compensation (OBMC) is a good solution to these problems because it not only in- • Motion estimation creases prediction accuracy but also avoids blocking ar- tifacts. When using OBMC, blocks are typically twice • Image stabilization as big in each dimension and overlap quadrant-wise with • all 8 neighbouring blocks. Thus, each pixel belongs to 4 Inter frame blocks. In such a scheme, there are 4 predictions for each • HDTV blur pixel which are summed up to a weighted mean. For this purpose, blocks are associated with a • Television standards conversion that has the property that the sum of 4 overlapped win- dows is equal to 1 everywhere. • VidFIRE

Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally-adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the H.263 Annex F Advanced Prediction mode.

12.8 Quarter Pixel (QPel) and Half Pixel motion compensation

In motion compensation, quarter or half samples are actually interpolated sub-samples caused by fractional motion vectors. Based on the vectors and full-samples, the sub-samples can be calculated by using bicubic or bilinear 2-D filtering. See subclause 8.4.2.2 "Fractional sample interpolation process" of the H.264 standard.
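As a minimal sketch of the bilinear case (the helper name is invented; H.264 itself derives luma half samples with a longer 6-tap filter, so this only illustrates the idea):

```python
import numpy as np

def half_pel(ref):
    """Bilinearly interpolate a frame to half-pixel resolution.
    Output has shape (2H-1, 2W-1); even indices hold the original
    (full) samples, odd indices the interpolated half samples."""
    h, w = ref.shape
    out = np.zeros((2 * h - 1, 2 * w - 1))
    out[::2, ::2] = ref
    out[1::2, ::2] = (ref[:-1, :] + ref[1:, :]) / 2      # vertical half samples
    out[::2, 1::2] = (ref[:, :-1] + ref[:, 1:]) / 2      # horizontal half samples
    out[1::2, 1::2] = (ref[:-1, :-1] + ref[:-1, 1:] +
                       ref[1:, :-1] + ref[1:, 1:]) / 4   # diagonal half samples
    return out
```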

12.9 3D image coding techniques

Motion compensation is utilized in stereoscopic video coding.

In video, time is often considered as the third dimension. Still image coding techniques can be expanded to an extra dimension.

JPEG2000 uses wavelets, and these can also be used to encode motion without gaps between blocks in an adaptive way. Fractional pixel affine transformations lead to bleeding between adjacent pixels. If no higher internal resolution is used, the delta images mostly fight against the image smearing out. The delta image can also be encoded as wavelets, so that the borders of the adaptive blocks match.

2D+Delta encoding techniques utilize H.264 and MPEG-2 compatible coding and can use motion compensation to compress between stereoscopic images.

12.10 See also

• Motion estimation
• Image stabilization
• Inter frame
• HDTV blur
• Television standards conversion
• VidFIRE
• X-Video Motion Compensation

12.11 Applications

• video compression
• change of framerate for playback of 24 frames per second movies on 60 Hz LCDs or 100 Hz interlaced cathode ray tubes

12.12 References

[1] berkeley.edu - Why do some people hate B-pictures?

[2] Zeng, Kai, et al. "Characterizing perceptual artifacts in compressed video streams." IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2014.

[3] Aizawa, Kiyoharu, and Thomas S. Huang. "Model-based image coding: advanced video coding techniques for very low bit-rate applications." Proceedings of the IEEE 83.2 (1995): 259-271.

Garnham, N. W., Motion Compensated Video Coding, University of Nottingham PhD Thesis, October 1995, OCLC 59633188.

12.13 External links

• Temporal Rate Conversion - article giving an overview of motion compensation techniques.

• A New FFT Architecture and Chip Design for Motion Compensation based on Phase Correlation
• DCT and DFT coefficients are related by simple factors
• DCT better than DFT also for video

• John Wiseman, An Introduction to MPEG Video Compression

• DCT and motion compensation
• Compatibility between DCT, motion compensation and other methods

Chapter 13

Motion estimation

[Figure: Motion vectors that result from a movement into the z-plane of the image, combined with a lateral movement to the lower-right. This is a visualization of the motion estimation performed in order to compress an MPEG movie.]

Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another; usually from adjacent frames in a video sequence. It is an ill-posed problem as the motion is in three dimensions but the images are a projection of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or specific parts, such as rectangular blocks, arbitrary shaped patches or even per pixel. The motion vectors may be represented by a translational model or many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom.

13.1 Related terms

More often than not, the term motion estimation and the term optical flow are used interchangeably. It is also related in concept to image registration and stereo correspondence. In fact all of these terms refer to the process of finding corresponding points between two images or video frames. The points that correspond to each other in two views (images or frames) of a real scene or object are "usually" the same point in that scene or on that object. Before we do motion estimation, we must define our measurement of correspondence, i.e., the matching metric, which is a measurement of how similar two image points are. There is no right or wrong here; the choice of matching metric is usually related to what the final estimated motion is used for, as well as the optimisation strategy in the estimation process.

13.2 Algorithms

The methods for finding motion vectors can be categorised into pixel based methods ("direct") and feature based methods ("indirect"). A famous debate resulted in two papers from the opposing factions being produced to try to establish a conclusion.[1][2]

13.2.1 Direct methods

• Block-matching algorithm
• Phase correlation and frequency domain methods
• Pixel recursive algorithms
• Optical flow

13.2.2 Indirect methods

Indirect methods use features, such as corner detection, and match corresponding features between frames, usually with a statistical function applied over a local or global area. The purpose of the statistical function is to remove matches that do not correspond to the actual motion. Statistical functions that have been successfully used include RANSAC.

13.2.3 Additional note on the categorization

It can be argued that almost all methods require some kind of definition of the matching criteria. The difference is only whether you summarise over a local image region first and then compare the summarisation (such as feature based methods), or you compare each pixel first (such as squaring the difference) and then summarise over a local image region (block based motion and filter based motion). An emerging type of matching criterion summarises a local image region first for every pixel location (through some feature transform such as a Laplacian transform), compares each summarised pixel and summarises over a local image region again.[3] Some matching criteria have the ability to exclude points that do not actually correspond to each other despite producing a good matching score; others do not have this ability, but they are still matching criteria.
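For a concrete feel of a direct, pixel-based method, the sketch below performs exhaustive block matching: it slides a block of the current frame over a search window in the reference frame and keeps the displacement with the lowest sum of absolute differences (SAD). The names and the ±7 search range are assumptions for the example; production encoders typically replace the exhaustive scan with fast search patterns.

```python
import numpy as np

def block_match(cur, ref, top, left, size=16, search=7):
    """Estimate the motion vector of one block of `cur` by exhaustive
    SAD search over a +/-`search` window in the reference frame."""
    block = cur[top:top+size, left:left+size].astype(int)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if not (0 <= y <= ref.shape[0] - size and 0 <= x <= ref.shape[1] - size):
                continue  # candidate block would leave the reference frame
            sad = np.abs(ref[y:y+size, x:x+size].astype(int) - block).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best  # the motion vector (dy, dx) for this block
```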

13.3 Applications

13.3.1 Video coding

Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation. As a way of exploiting temporal redundancy, motion estimation and compensation are key parts of video compression. Almost all video coding standards use block-based motion estimation and compensation, such as the MPEG series including the most recent HEVC.
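A minimal decoder-side sketch of this synthesis, assuming one whole-pixel vector per 16×16 block (the function and argument names are invented, and the vectors are assumed to keep every source block inside the reference frame):

```python
import numpy as np

def motion_compensate(ref, vectors, block=16):
    """Rebuild the prediction of a frame from a reference frame and one
    motion vector per block, as a block-based decoder would.
    vectors[by, bx] = (dy, dx) displacement, in whole pixels, of the
    block whose top-left corner is (by*block, bx*block)."""
    pred = np.empty_like(ref)
    for by in range(ref.shape[0] // block):
        for bx in range(ref.shape[1] // block):
            dy, dx = vectors[by, bx]
            y, x = by * block + dy, bx * block + dx
            pred[by*block:(by+1)*block, bx*block:(bx+1)*block] = \
                ref[y:y+block, x:x+block]
    return pred  # the residual (current - pred) is what gets transform-coded
```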

13.4 See also

• Video processing unit
• Vision processing unit

• Scale-invariant feature transform

13.5 References

[1] Philip H.S. Torr and Andrew Zisserman: Feature Based Methods for Structure and Motion Estimation, ICCV Workshop on Vision Algorithms, pages 278-294, 1999

[2] Michal Irani and P. Anandan: About Direct Methods, ICCV Workshop on Vision Algorithms, pages 267-277, 1999.

[3] Rui Xu, David Taubman & Aous Thabit Naman, "Motion Estimation Based on Mutual Information and Adaptive Multi-scale Thresholding", IEEE Transactions on Image Processing, vol. 25, no. 3, pp. 1095-1108, March 2016.
