
2013 Spring Term 1, Lecture 2
Video Formation and Representation
Wen-Hsiao Peng (彭文孝)
Multimedia Architecture and Processing Lab (MAPL)
Department of Computer Science, National Chiao Tung University
1 Mar. 2013, Hsinchu, Taiwan

Preface 2

The previous lecture talked about what light is and how it is perceived by our visual system to initiate color vision. In this lecture, we shall look at methods for capturing and representing video signals.

Video Signal 3

When we refer to a video, we are actually referring to a sequence of moving images, each of which is the perspective projection of a 3-D scene onto a 2-D image plane. This drawing by Dürer clearly conveys the idea of perspective projection. Normally, we refer to a point in the image plane as a pixel or a pel, especially when we talk about digital imagery.

Color Video Camera 4

This block diagram shows the typical imaging pipeline in a video camera. As can be seen, to capture color information, there are three types of sensors, each with a frequency response determined by the color matching functions of the chosen primary.

Color Video Camera 5

Most cameras nowadays use CCD or CMOS sensors for digital color imaging. Normally, with these sensors, only one color value can be sampled at each point, and the sampling pattern is usually 50% green, 25% red, and 25% blue. Green has a higher sampling rate because, as we saw in the first lecture, it carries most of the brightness information. To get a complete set of RGB values for each point, interpolation is required. Recently, some advanced sensors have appeared that can acquire all three color values at a single point without interpolation.

Color Video Camera 6

For more efficient processing and transmission, most cameras further convert the captured RGB values into more independent luminance and chrominance information.
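The luminance/chrominance conversion mentioned above can be sketched numerically. Below is a minimal Python sketch using the BT.601 luma weights; the full-range offset of 128 and the simple rounding are simplifying assumptions (real cameras typically apply studio-range scaling):

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to Y'CbCr (BT.601 luma weights, full range)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance: green dominates
    cb = 0.564 * (b - y) + 128              # blue-difference chrominance, offset to [0, 255]
    cr = 0.713 * (r - y) + 128              # red-difference chrominance
    return round(y), round(cb), round(cr)

# A neutral gray carries no chrominance: Cb = Cr = 128.
print(rgb_to_ycbcr(128, 128, 128))  # → (128, 128, 128)
```

Note that the green weight (0.587) is the largest, which is consistent with the Bayer pattern devoting 50% of the sensor sites to green.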
Progressive and Interlaced Scan 7

This slide presents two different ways of sampling a video signal: progressive sampling and interlaced sampling. With progressive sampling, a video signal is sampled as a sequence of complete video frames, just as you would normally expect. With interlaced sampling, we keep only half of the information in a complete frame each time: we sample only the even-numbered lines at one instant, and then the odd-numbered lines at the next. The pictures so obtained are called field pictures. The field containing the first and following alternating lines is referred to as the top field, and the one containing the second and following alternating lines as the bottom field.

Progressive and Interlaced Scan 8

Since field pictures have a lower vertical resolution, they are normally sampled twice as frequently as frame pictures along the temporal dimension. That is, with the same data rate, we can send twice as many field pictures as the number of frame pictures in a progressive sequence. As a result, an interlaced sequence tends to have smoother motion when played back. This is the motivation for interlaced sampling.

Progressive and Interlaced Scan 9

However, the downside of interlaced sampling is that visual artifacts may appear when the scene contains fast-moving objects. In this case, you can observe zig-zag or feather-like artifacts along the vertical edges of objects. These arise because when the top field and the bottom field are displayed together as a complete video frame, images captured at different time instants are blended together. It is important to remember that field pictures are actually separated in time.

Progressive and Interlaced Scan 10

To alleviate these artifacts, a de-interlacing algorithm is usually employed to convert field pictures into frame pictures before playback.
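The field splitting and one simple de-interlacing scheme described above can be sketched as follows; a frame is modeled as a plain list of scan lines, and `line_average` is a hypothetical name for the naive intra-field averaging approach:

```python
def split_fields(frame):
    """Interlaced sampling: top field = even-numbered lines (0, 2, ...),
    bottom field = odd-numbered lines (1, 3, ...)."""
    return frame[0::2], frame[1::2]

def weave(top, bottom):
    """Re-interleave two fields into one frame. If the fields were captured
    at different time instants, moving edges show feather-like artifacts."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame

def line_average(field):
    """Simple intra-field de-interlacing: synthesize the missing lines by
    averaging vertically adjacent field lines (no temporal blending)."""
    out = []
    for i, line in enumerate(field):
        nxt = field[min(i + 1, len(field) - 1)]
        out.append(line)
        out.append([(a + b) / 2 for a, b in zip(line, nxt)])
    return out

frame = [[10, 10], [20, 20], [30, 30], [40, 40]]  # 4 lines, 2 samples each
top, bottom = split_fields(frame)
assert weave(top, bottom) == frame                # lossless for a static scene
```

Weaving is exact when the scene is static; `line_average` trades vertical detail for freedom from the motion artifacts that weaving introduces.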
Analog Video Raster 11

This slide describes the mechanism for video capture and display in the early days when analog cameras were in use. As illustrated by this figure, analog cameras capture a video signal by continuously and periodically scanning an image region from top to bottom. Different lines are scanned at slightly different times, and the scan format can be either progressive or interlaced. Along contiguous scan lines, the intensity values are recorded as a 1-D waveform, which is known as a raster scan. This figure shows a typical waveform of such a raster signal.

Analog Video Raster 12

In general, a raster is characterized by two basic parameters: the frame rate (frames/second) and the line number. The frame rate defines the temporal sampling rate of a raster, while the line number indicates the vertical sampling rate. From these parameters, we can derive other parameters, such as the line rate (lines/second), the line interval, and the frame interval. Notice that the 1-D raster signal is periodically set to a constant level to indicate when the display device should retrace horizontally or vertically to begin a new line or a new field.

Spectrum & Signal Bandwidth 13

This and the following slides talk about the spectrum of the 1-D raster signal and its bandwidth estimation. I will skip this part. For details, please refer to Wang's book.

Analog Color TV Systems 14

This table compares the three major analog TV systems used worldwide. Please refer to Wang's book for a more detailed exposition. [Note: Taiwan's over-the-air TV networks have gone digital since May 2012, but most households subscribe to cable TV, whose signals remain analog.]

Digital Video (1/2) 15

A digital video can be obtained either by sampling a raster scan or by sampling the scene directly with a digital video camera.
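Sampling a raster presupposes its timing. A small sketch deriving the secondary raster parameters (line rate, line interval, frame interval) from the two basic ones described earlier; the nominal monochrome NTSC figures are used for illustration:

```python
def raster_params(frame_rate, lines_per_frame):
    """Derive secondary raster parameters from the two basic ones."""
    line_rate = frame_rate * lines_per_frame   # lines/second
    return {
        "line_rate": line_rate,
        "line_interval": 1.0 / line_rate,      # seconds per line
        "frame_interval": 1.0 / frame_rate,    # seconds per frame
    }

# Nominal monochrome NTSC: 30 frames/s x 525 lines -> 15,750 lines/s
print(raster_params(30, 525)["line_rate"])
# PAL/SECAM: 25 frames/s x 625 lines -> 15,625 lines/s
print(raster_params(25, 625)["line_rate"])
```

(Color NTSC actually uses a slightly lower frame rate of 30/1.001 ≈ 29.97 frames/s, giving a line rate of about 15,734 lines/s.)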
Like an analog video, a digital video is defined by a few parameters, such as the frame rate, the number of lines per frame, the number of samples per line, and the bit depth, which denotes the number of bits used to represent a pixel value. The raw data rate of a digital video can be computed as the product of these parameters, and has a unit of bits per second.

Digital Video (2/2) 16

Conventionally, the luminance or each of the three color values is specified with 8 bits; so Nb is equal to 8 for a monochrome video and 24 for a color video. However, in cases where the chrominance components have a different sampling resolution (spatial and temporal) from that of the luminance, Nb should reflect the equivalent number of bits used for each pixel at the luminance resolution. Two other important parameters are the image aspect ratio and the pixel aspect ratio. The pixel aspect ratio indicates the ratio of the width to the height of the physical rectangular area used for rendering a pixel.

ITU-R BT.601 (1/2) 17

ITU-R BT.601 is a standard format used to represent "different" analog TV video signals (NTSC, PAL, SECAM). It specifies how to convert a 1-D raster scan into a digital video by sampling. The sampling rate is chosen to meet two constraints:
(1) the horizontal and vertical sampling intervals should match;
(2) the same rate should be used for NTSC and PAL/SECAM, and it should be a multiple of their respective line rates (so that each line has an integer number of samples).
Constraint (1) leads to about 11 MHz for NTSC and 13 MHz for PAL/SECAM; constraint (2) requires a common multiple of the two line rates (approximately 15,750 Hz for NTSC and exactly 15,625 Hz for PAL/SECAM). A number that satisfies both constraints is 13.5 MHz.

ITU-R BT.601 (2/2) 18

With this sampling rate, we have 858 samples per line for NTSC and 864 samples per line for PAL/SECAM. The resulting formats are shown in these figures.
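The 858 and 864 figures can be checked directly. A small sketch, assuming the exact color-NTSC line rate of 4.5 MHz/286 (≈ 15,734.27 lines/s) and the PAL/SECAM line rate of exactly 15,625 lines/s:

```python
F_S = 13.5e6                     # BT.601 common sampling rate, Hz

# Color NTSC line rate is defined as 4.5 MHz / 286 ~= 15,734.27 lines/s;
# PAL/SECAM uses exactly 15,625 lines/s.
NTSC_LINE_RATE = 4.5e6 / 286
PAL_LINE_RATE = 15625.0

samples_ntsc = F_S / NTSC_LINE_RATE   # samples per line, NTSC
samples_pal = F_S / PAL_LINE_RATE     # samples per line, PAL/SECAM
print(round(samples_ntsc), round(samples_pal))  # → 858 864
```

Because 13.5 MHz is an integer multiple of both line rates, every scan line in either system carries a whole number of samples, which is exactly what constraint (2) demands.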
It is noteworthy that some pixels lie in the so-called "non-active" area; they correspond to signal samples taken during the horizontal or vertical retrace and are thus not intended for display. So the true display resolution is 480 or 576 lines per frame, depending on whether the signal is NTSC or PAL, and both have 720 pixels per line. A digital video with either of these resolutions is often called a Standard Definition (SD) video.

Digital Video Formats (1/2) 19

This table summarizes some common digital video formats, along with their main applications and compression methods. The rightmost column gives their raw data rates to indicate how much bandwidth they would take if transmitted without any compression. As an example, for an SD video with 4:2:0 color sampling, the raw data rate is 124 Mbps, which is roughly the bandwidth limit of the best Wi-Fi technology we have today. With MPEG-2 compression, it is possible to reduce the bit rate to 4-8 Mbps, which is equivalent to roughly a 15-30x compression ratio.

Digital Video Formats (2/2) 20

At the top of this table are the two popular HD formats, which have been widely used for HDTV as well as smartphone video. They are usually referred to as 720p or 1080p video, according to the number of lines in height. The suffix "p" means progressive sampling; the suffix "i" is used when referring to interlaced sampling. 1080p video is also known as "Full HD" video. The SIF/CIF/QCIF formats were quite popular 10 years ago but are gradually being phased out.

High Definition and Ultra High Definition 21

This chart compares the resolutions of different video formats. In particular, the green/purple and dark blue areas show the sizes of the so-called Ultra High Definition formats, which are going to be the formats for next-generation digital video.
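The 124 Mbps figure for SD 4:2:0 quoted earlier can be reproduced directly. In the sketch below, `raw_data_rate_420` is a hypothetical helper; the 1.5 samples-per-pixel factor follows from subsampling both chroma components by 2 horizontally and vertically:

```python
def raw_data_rate_420(width, height, frame_rate, bits_per_sample=8):
    """Raw data rate of a 4:2:0 video in bits/second. Each chroma component
    has 1/4 as many samples as luma, so 1 + 2 * (1/4) = 1.5 samples/pixel."""
    samples_per_frame = width * height * 1.5
    return samples_per_frame * bits_per_sample * frame_rate

rate = raw_data_rate_420(720, 480, 30)   # SD, BT.601 NTSC active area
print(rate / 1e6)                        # → 124.416 (Mbps, uncompressed)
print(rate / 8e6, rate / 4e6)            # compression ratio at 8 and 4 Mbps
```

Dividing the raw rate by the 4-8 Mbps range that MPEG-2 achieves gives compression ratios of roughly 15x to 31x, in line with the figure quoted above.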