Video Formation, Perception, and Representation
In this first chapter, we describe what a video signal is, how it is captured and perceived, how it is stored and transmitted, and what the important parameters are that determine the quality and bandwidth (which in turn determines the data rate) of the signal. We first present the underlying physics of color perception and specification (Section 1.1). We then describe the principles and typical devices for video capture and display (Section 1.2). As will be seen, analog video is captured, stored, and transmitted in a raster scan format, using either progressive or interlaced scans. As an example, we review the analog color television (TV) system (Section 1.4), and give insights as to how certain critical parameters (such as frame rate and line rate) are chosen, what the spectral content of a color TV signal is, and how different components of a signal can be multiplexed into a composite signal. Finally, Section 1.5 introduces the ITU-R BT.601 video format (formerly CCIR601), the digitized version of the analog color TV signal. We present some of the considerations that have gone into the selection of various digitization parameters. We also describe several other digital video formats, including high-definition TV (HDTV). The compression standards developed for different applications and their associated video formats are summarized.

The purpose of this chapter is to give the reader background knowledge about analog and digital video, and to provide insights into common video system design problems. As such, the presentation is intentionally more qualitative than quantitative. In later chapters, we will return to certain problems mentioned in this chapter and provide more rigorous descriptions and solutions.

1.1 COLOR PERCEPTION AND SPECIFICATION

A video signal is a sequence of two-dimensional (2-D) images, projected from a dynamic three-dimensional (3-D) scene onto the image plane of a video camera. The color value at any point in a video frame records the emitted or reflected light at a particular 3-D point in the observed scene. To understand what the color value means physically, we review the basics of light physics in this section and describe the attributes that characterize light and its color. We also describe the principle of human color perception and different ways to specify a color signal.

1.1.1 Light and Color

Light consists of an electromagnetic wave, with wavelengths in the range of 380–780 nanometers (nm), to which the human eye is sensitive. The energy of light is measured by flux, in units of watts, which is the rate at which energy is emitted. The radiant intensity of a light, which is directly related to our perception of its brightness, is defined as the flux radiated into a unit solid angle in a particular direction, measured in watts per solid angle. A light source usually emits energy over a range of wavelengths, and its intensity can vary in both space and time. In this text, we use the notation C(X, t, λ) to represent the radiant intensity distribution of a light, which specifies the light intensity at wavelength λ, spatial location X = (X, Y, Z), and time t.

The perceived color of a light depends on its spectral content (i.e., its wavelength composition). For example, a light that has its energy concentrated near 700 nm appears red.
A light that has equal energy across the entire visible band appears white. In general, a light that has a very narrow bandwidth is referred to as a spectral color, whereas a white light is said to be achromatic.

There are two types of light sources: the illuminating source, which emits an electromagnetic wave, and the reflecting source, which reflects an incident wave.* Illuminating light sources include the sun, lightbulbs, television monitors, and so on. The perceived color of an illuminating light source depends on the wavelength range in which it emits energy. Illuminating light follows an additive rule: the perceived color of several mixed illuminating light sources depends on the sum of the spectra of all the light sources. For example, combining red, green, and blue lights in the right proportions creates the color white.

Reflecting light sources are those that reflect an incident light (which could itself be a reflected light). When a light beam hits an object, the energy in a certain wavelength range is absorbed, while the rest is reflected. The color of a reflected light depends on the spectral content of the incident light and the wavelength range that is absorbed. The most notable reflecting light sources are color dyes and paints. A reflecting light source follows a subtractive rule: the perceived color of several mixed reflecting light sources depends on the remaining, unabsorbed wavelengths. For example, if the incident light is white, a dye that absorbs the wavelengths near 700 nm (red) appears as cyan. In this sense, we say that cyan is the complement of red (i.e., white minus red). Similarly, magenta and yellow are the complements of green and blue, respectively. Mixing cyan, magenta, and yellow dyes produces black, which absorbs the entire visible spectrum.

*Illuminating and reflecting light sources are also referred to as primary and secondary light sources, respectively. We do not use those terms, to avoid confusion with the primary colors associated with light. In other sources, illuminating and reflecting lights are also called additive colors and subtractive colors, respectively.

1.1.2 Human Perception of Color

The perception of light by a human being starts with the photoreceptors located in the retina (the rear surface of the interior of the eyeball). There are two types of receptors: cones, which function under bright light and can perceive color tone; and rods, which work under low ambient light and can only extract luminance information. Visual information from the retina is passed via optic nerve fibers to the brain area called the visual cortex, where visual processing and understanding is accomplished.

There are three types of cones, which have overlapping pass-bands in the visible spectrum with peaks at red (near 570 nm), green (near 535 nm), and blue (near 445 nm) wavelengths, respectively, as shown in Figure 1.1. The responses of these receptors to an incoming light distribution C(λ) can be described by the following equation:

C_i = \int C(\lambda) a_i(\lambda) \, d\lambda, \qquad i = r, g, b, \qquad (1.1.1)

where a_r(λ), a_g(λ), and a_b(λ) are referred to as the frequency responses or relative absorption functions of the red, green, and blue cones. The combination of these three types of receptors enables a human being to perceive any color. This implies that the perceived color depends only on three numbers, C_r, C_g, C_b, rather than the complete light spectrum C(λ).
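To make equation (1.1.1) concrete, the following Python sketch approximates the three cone responses by numerical integration on a sampled wavelength grid. The Gaussian-shaped curves standing in for a_r(λ), a_g(λ), a_b(λ) and the example spectrum C(λ) are hypothetical and purely illustrative; they are not the measured absorption functions plotted in Figure 1.1.

```python
import numpy as np

# Wavelength grid covering the visible band (380-780 nm), 1 nm spacing.
wavelengths = np.arange(380.0, 781.0, 1.0)
d_lambda = 1.0

def gaussian_band(peak_nm, width_nm):
    """Illustrative bell-shaped curve; NOT a measured sensitivity function."""
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)

# Hypothetical stand-ins for the cone absorption functions a_r, a_g, a_b,
# peaking near 570, 535, and 445 nm as stated in the text.
absorption = {
    "r": gaussian_band(570.0, 50.0),
    "g": gaussian_band(535.0, 50.0),
    "b": gaussian_band(445.0, 40.0),
}

# An example light spectrum C(lambda) with its energy concentrated near 700 nm,
# i.e., a light that appears red.
C = gaussian_band(700.0, 20.0)

# Equation (1.1.1): C_i = integral of C(lambda) * a_i(lambda) d(lambda),
# approximated by a Riemann sum on the sampled grid.
tristimulus = {i: float(np.sum(C * a_i) * d_lambda) for i, a_i in absorption.items()}
print(tristimulus)  # the "r" response dominates for this spectrum
```

With measured absorption functions in place of these stand-ins, the same three numbers C_r, C_g, C_b would suffice to describe the perceived color of the light.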
This observation, that perceived color is determined by just three numbers, is known as the tri-receptor theory of color vision, first advanced by Young [17].

[Figure 1.1 Frequency responses of the three types of cones in the human retina and the luminous efficiency function. The blue response curve is magnified by a factor of twenty in the figure. (Horizontal axis: wavelength, 400–700 nm; vertical axis: relative sensitivity.) Reprinted from D. H. Pritchard, US color television fundamentals, IEEE Trans. Consum. Electron. (1977), CE-23:467–78. Copyright 1977 IEEE.]

There are two attributes that describe the color sensation of a human being: luminance and chrominance. Luminance refers to the perceived brightness of the light, which is proportional to the total energy in the visible band. Chrominance describes the perceived color tone of a light, which depends on the wavelength composition of the light. Chrominance is in turn characterized by two attributes: hue and saturation. Hue specifies the color tone, which depends on the peak wavelength of the light, whereas saturation describes how pure the color is, which depends on the spread or bandwidth of the light spectrum. In this book, we use the word "color" to refer to both the luminance and chrominance attributes of a light, although it is customary to use the word color to refer to the chrominance aspect of a light only.

Experiments have shown that there exists a secondary processing stage in the human visual system (HVS), which converts the three color values obtained by the cones into one value that is proportional to the luminance and two other values that are responsible for the perception of chrominance. This is known as the opponent color model of the HVS [2, 8]. It has been found that the same amount of energy produces different sensations of brightness at different wavelengths, and this wavelength-dependent variation of brightness sensation is characterized by a relative luminous efficiency function, a_y(λ), which is also shown in Figure 1.1. It is essentially the sum of the frequency responses of all three types of cones. We can see that the green wavelength contributes the most to the perceived brightness, the red wavelength the second most, and the blue the least. The luminance (often denoted by Y) is related to the incoming light spectrum by

Y = \int C(\lambda) a_y(\lambda) \, d\lambda. \qquad (1.1.2)

In the preceding equations, we have omitted the time and space variables, since we are only concerned with the perceived color or luminance at a fixed spatial and temporal location.
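Equation (1.1.2) can be approximated in the same way. The sketch below again uses hypothetical Gaussian curves rather than measured data: it takes a_y(λ) to be the sum of the three illustrative cone curves (as the text notes it essentially is) and compares the luminance of two equal-energy spectra, one concentrated near 700 nm (red) and one near 535 nm (green).

```python
import numpy as np

wavelengths = np.arange(380.0, 781.0, 1.0)  # visible band, 1 nm spacing
d_lambda = 1.0

def gaussian_band(peak_nm, width_nm):
    """Illustrative bell-shaped curve; NOT a measured sensitivity function."""
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)

# Hypothetical luminous efficiency function a_y(lambda), taken as the sum of
# the three illustrative cone curves, so it peaks in the green region.
a_y = (gaussian_band(570.0, 50.0)
       + gaussian_band(535.0, 50.0)
       + gaussian_band(445.0, 40.0))

def luminance(C):
    """Equation (1.1.2): Y = integral of C(lambda) * a_y(lambda) d(lambda)."""
    return float(np.sum(C * a_y) * d_lambda)

# Two spectra with equal total energy, concentrated near 700 nm and 535 nm.
C_red = gaussian_band(700.0, 20.0)
C_green = gaussian_band(535.0, 20.0)
print(luminance(C_red), luminance(C_green))  # the green light yields a much larger Y
```

For these illustrative curves, the mid-band spectrum produces a far larger Y than the long-wavelength one, reflecting the greater contribution of green wavelengths to perceived brightness described above.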