
Need for Compression

Reducing the amount of data needed to reproduce images (compression) saves storage space, increases access speed, and is the only way to achieve digital motion video on personal computers. In order to compare video compression systems, we need ways to evaluate compression performance. Three key parameters must be considered:
i. Amount or degree of compression
ii. Image quality
iii. Speed of compression or decompression.
In addition, we must also look at the hardware and software required by each compression method.
• Compression is useful because it helps reduce resource usage, such as data storage space or transmission capacity.
• Because compressed data must be decompressed before it can be used, this extra processing imposes computational or other costs; the situation is far from being a free lunch.

Data compression is subject to a space–time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes therefore involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and decompress the data.

However, the most important reason for compressing data is that more and more we share data. The Web and its underlying networks have limitations on bandwidth that define the maximum number of bits or bytes that can be transmitted from one place to another in a fixed amount of time.

Non-lossy and Lossy Compression for Images

Image compression may be lossless (non-lossy) or lossy. Lossless means that the reproduced image is not changed in any way by the compression/decompression process; therefore, we do not have to worry about picture quality for a lossless system - the output picture will be exactly the same as the input picture. Lossless compression is possible because we can use more efficient methods of data transmission than the pixel-by-pixel PCM (Pulse-Code Modulation) format that comes from a digitizer.

Lossless compression is preferred for archival purposes and often for medical imaging, technical drawings, clip art, or comics. It is used in cases where it is important that the original and the decompressed data be identical, or where deviations from the original data could be deleterious. Typical examples are executable programs, text documents, and source code. Some image file formats, like PNG (Portable Network Graphics) or GIF (Graphics Interchange Format), use only lossless compression, while others like TIFF (Tagged Image File Format) and MNG (Multiple-image Network Graphics) may use either lossless or lossy methods.

Lossless data compression is used in many applications. For example, it is used in the ZIP file format and in the GNU tool gzip. It is also often used as a component within lossy data compression technologies (e.g. lossless mid/side joint stereo preprocessing by the LAME MP3 encoder and other lossy audio encoders).

One of the reasons that there is still interest in new lossless compression techniques is that only very inexact data structures can survive lossy compression. It is often the case that loss of a single bit renders a whole phrase or line of data inaccurate. This is why we attempt to build more and more stable memory systems.
The recent shift from RDRAM to SDRAM, for instance, was partly because RDRAM needed more active maintenance of its data. If we are so protective of the memory that holds data, then it makes sense that we must also be protective of the compression scheme we use to store and retrieve data. So the only places lossy compression can be used are places where accuracy at the bit level does not materially affect the quality of the data.

Methods for lossless image compression include:
• Run-length encoding - used as the default method in PCX and as one of the possible methods in BMP, TGA and TIFF (a short code sketch follows at the end of this subsection)
• Area image compression
• DPCM and predictive coding
• Adaptive dictionary algorithms such as LZW - used in GIF and TIFF
• Deflation - used in PNG, MNG, and TIFF
• Chain codes

Lossy compression systems by definition do make some change to the image - something is different. The trick is making that difference hard for the viewer to see. Lossy compression systems may introduce any of the familiar compression artifacts, or they may even create some unique artifacts of their own. None of these effects is easy to quantify, and final decisions about compression systems, or about any specific compressed image, will usually have to be made after a subjective evaluation - there is no good alternative to looking at test pictures. The various measures of analog picture quality - signal-to-noise ratio, resolution, color errors, etc. - may be useful in some cases, but only after viewing real pictures to make sure that the right artifacts are being measured.

Lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossy methods are especially suitable for natural images such as photographs in applications where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. Lossy compression that produces imperceptible differences may be called visually lossless. Lossy compression is most commonly used to compress multimedia data (audio, video, and still images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is typically required for text and data files, such as bank records and text articles. In many cases it is advantageous to make a master lossless file that can then be used to produce compressed files for different purposes. For example, a multi-megabyte file can be used at full size to produce a full-page advertisement in a glossy magazine, and a 10 kilobyte lossy copy can be made for a small image on a web page.

Methods for lossy compression include:
• Reducing the color space to the most common colors in the image. The selected colors are specified in the color palette in the header of the compressed image. Each pixel just references the index of a color in the color palette; this method can be combined with dithering to avoid posterization.
• Chroma subsampling. This takes advantage of the fact that the human eye perceives spatial changes of brightness more sharply than those of color, by averaging or dropping some of the chrominance information in the image.
• Transform coding. This is the most commonly used method. In particular, a Fourier-related transform such as the Discrete Cosine Transform (DCT) is widely used; the DCT is sometimes referred to as "DCT-II" in the context of a family of discrete cosine transforms. The more recently developed wavelet transform is also used extensively, followed by quantization and entropy coding.
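Run-length encoding, the first lossless method listed above, is simple enough to sketch in a few lines of Python. This is an illustration of the idea only, not the exact packing used by PCX, BMP or TIFF:

def rle_encode(data):
    # Collapse a byte string into (value, run length) pairs, capping runs at 255.
    runs = []
    for b in data:
        if runs and runs[-1][0] == b and runs[-1][1] < 255:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs

def rle_decode(runs):
    return bytes(b for b, count in runs for _ in range(count))

scanline = bytes([255] * 40 + [0] * 20 + [255] * 40)    # a flat scan line
packed = rle_encode(scanline)
assert rle_decode(packed) == scanline                   # lossless round trip
print(len(scanline), "bytes ->", 2 * len(packed), "bytes")   # 100 bytes -> 6 bytes

Flat areas compress dramatically, while a noisy scan line would hardly compress at all, which is why run-length coding is usually combined with other methods.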

Color

Adding color to a grayscale image is a neat little effect you see all over the place. Now, this isn't to be confused with taking a color image and removing its color, only to add some of it back in certain places. This technique is entirely different. What I'm going to show you today is how to apply a new color to a naked image, so to speak. It's a really simple technique that's fun to use, and it's great for creating visual interest and drawing attention to a certain portion of a photo.

Admittedly, the process of colorizing a grayscale photo certainly seems straightforward enough, in that it probably involves grabbing a paint brush and painting color onto the image itself. The problem, though, is that while you would succeed in adding color to the photo, you would systematically destroy any detail it once contained. Using the cutest photo *ever* (snatched from iStockphoto.com), I'm going to show you the trick to adding color while retaining all the glorious detail of the photo.

The very first thing we want to do is make sure the document is in a color mode, and not grayscale; otherwise we won't get very far, and your frustration level with all things digital could reach an all-time high.

Step 1: Choose Image > Mode and make sure the document is set to either RGB or CMYK. If the document mode is Grayscale, you won't be allowed to paint in color, which can be quite maddening.
NOTE: If this image will be printed professionally, then you want to choose CMYK. If you're going to print the image on your home color inkjet, or if the image is destined to live out its life only on screen, then go with RGB.

Step 2: Create a new layer by clicking the New Layer button at the bottom of the Layers palette. This is where the new paint will live, so that we don't screw up the original photo.

Step 3: Change the blending mode of the new layer to either Color or Overlay, as shown below. This will allow the detail of the image to show through the paint, instead of the paint being a solid coat.

Step 4: Press B to select the Brush tool, and click the foreground color chip at the bottom of the main Toolbar. Pick a nice pastel color from the resulting color picker and press OK.
TIP: Press Command + (PC: Ctrl +) to zoom in, and Command - (PC: Ctrl -) to zoom back out of your document. Another handy tip to remember while doing detail work: while zoomed in on your document, pressing the spacebar turns the cursor into a little hand which you can then use to move over to a different area of the image.

Step 5: Since we're about to embark upon a bit of detail work, I'm going to share a workspace trick with you before we start painting. Choose Window > New Window for [insert image name]. This is going to allow us to be zoomed in really far on the image in one window, and still see what the image looks like at its normal size in another. The neat bit is that what you do in one window happens simultaneously in the other.

Step 6: As you move around in the image, you will come upon places where your brush is too large, such as the little strap around her neck. Here you can press the left bracket key, [, to cycle down in brush size, and later cycle back up by pressing the right bracket key, ]. If you mess up during the painting process, just press E to select the Eraser tool and fix your mistake. Press B to pick the brush back up and soldier on. After painting her dress, gloves, purse and hat, here's the little cutie all clad in purple.

Step 7: Create an adjustment layer by pressing the half-black/half-white circle at the bottom of the Layers palette, and choose Hue/Saturation.

Step 8: In the resulting dialog box, grab the Hue slider and move it rightward.

Step 9: In our case, I decided on a peachy color (it matches my web site) and, to make the effect a bit more subtle, I decreased the Saturation just a tad, as shown below.

Grayscale and Still-Video Images

In photography and computing, a grayscale or greyscale digital image is an image in which the value of each pixel is a single sample, that is, it carries only intensity information. Images of this sort, also known as black-and-white, are composed exclusively of shades of gray, varying from black at the weakest intensity to white at the strongest. Grayscale images are distinct from one-bit bi-tonal black-and-white images, which in the context of imaging are images with only the two colors, black and white (also called bilevel or binary images). Grayscale images have many shades of gray in between.
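Since each grayscale pixel carries only an intensity value, a color image can be reduced to grayscale by taking a weighted sum of its red, green and blue components. A minimal sketch, assuming NumPy is available (the Rec. 601 luminance weights used here are a common choice, not the only one):

import numpy as np

def to_grayscale(rgb):
    # rgb is an H x W x 3 array of 8-bit values; the result is H x W intensities.
    weights = np.array([0.299, 0.587, 0.114])         # Rec. 601 luminance weights
    return (rgb.astype(float) @ weights).astype(np.uint8)

rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)   # toy "image"
gray = to_grayscale(rgb)                  # each pixel now holds intensity only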
Grayscale is a range of shades of gray without apparent color. The darkest possible shade is black, which is the total absence of transmitted or reflected light. The lightest possible shade is white, the total transmission or reflection of light at all visible wavelengths. Intermediate shades of gray are represented by equal brightness levels of the three primary colors (red, green and blue) for transmitted light, or equal amounts of the three primary pigments (cyan, magenta and yellow) for reflected light.

Grayscale images are often the result of measuring the intensity of light at each pixel in a single band of the electromagnetic spectrum (e.g. infrared, visible light, ultraviolet, etc.), and in such cases they are monochromatic proper when only a given frequency is captured. But they can also be synthesized from a full-color image (as in the grayscale conversion sketched earlier).

Still-Video Images

You can extract any video frame as a still image to place wherever you want it to appear in your iMovie project. To easily add a still frame to the same project where it already appears as part of a clip, extract it from the project clip. If you want to add a still frame from a video clip that isn't part of your project, you can extract it from the source video in any one of your Events.

To extract a still frame from video:
A) Let the pointer hover over the video frame that you want to extract as a still image.
B) Hold down the Control key and press the mouse button to open the menu, and then choose "Add still frame to project."
C) The image is added to the end of your open project as a four-second clip. (If you created the still frame from a source video clip, the Ken Burns (motion) effect is automatically applied; if you created the still frame from a project clip, the Ken Burns effect isn't applied.)
D) To change the duration, click the Duration button (it looks like a clock) that appears in the clip's lower-left corner when the pointer moves over the clip; enter a time for the duration, and then click OK.

You can also easily export a still image from a video file using GoPro Studio. Here is a procedure that details the process:
Step 1: Go to the Edit step (Step 2: Edit) in GoPro Studio.
Step 2: Select the desired video clip so that it is displayed in the Player window.
Step 3: Mac instructions - place the playhead so that the frame you want to export is displayed in the Player window, then select Share > Export Still.
Step 4: Windows instructions - place the playhead so that the frame you want to export is displayed in the Player window, then select File > Export > Still Image.
Step 5: Then just select the image name, location and size to export (small, medium, large, or native) and click Export.
Step 6: Check the location you specified to verify that the image was created.

Audio Compression

Audio data compression is a type of lossy or lossless compression in which the amount of data in a recorded waveform is reduced to differing extents for transmission, respectively with or without some loss of quality; it is used in CD and MP3 encoding, Internet radio, and the like. It should not be confused with dynamic range compression, also called audio level compression, in which the dynamic range - the difference between loud and quiet - of an audio waveform is reduced.

Video and audio files are very large beasts. Unless we develop and maintain very high bandwidth networks (gigabytes per second or more), we have to compress the data.
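To make "very large" concrete, a few lines of arithmetic give the raw, uncompressed data rates involved (the HDTV figure anticipates the numbers discussed next):

# Raw (uncompressed) data rates, in bits per second.
cd_audio = 44_100 * 16 * 2                 # 44.1 kHz sampling, 16 bits, stereo
sd_video = 640 * 480 * 24 * 30             # 640x480 pixels, 24 bits/pixel, 30 frames/s
hd_video = 1920 * 1080 * 24 * 30           # the HDTV format discussed below

print(f"CD audio: {cd_audio / 1e6:.2f} Mb/s")   # about 1.41 Mb/s
print(f"SD video: {sd_video / 1e6:.1f} Mb/s")   # about 221 Mb/s
print(f"HD video: {hd_video / 1e9:.2f} Gb/s")   # about 1.49 Gb/s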
Relying on higher bandwidths is not a good option - the "M25 syndrome": traffic demand always increases and will adapt to swamp the current limit, whatever it is. As we will see, compression becomes part of the representation or coding schemes behind the popular audio, image and video formats. We will first study basic compression algorithms and then go on to study some actual coding formats. We have studied the theory of encoding; now let us see how this is applied in practice.

We need to compress video (and audio) in practice because:

1. Video (and audio) data are huge. In HDTV, the bit rate easily exceeds 1 Gbps - a big problem for storage and network communications. For example, one of the formats defined for HDTV broadcasting in the United States is 1920 pixels horizontally by 1080 lines vertically, at 30 frames per second. If these numbers are all multiplied together, along with 8 bits for each of the three primary colors, the total data rate required would be approximately 1.5 Gb/sec. Because of the 6 MHz channel bandwidth allocated, each channel will only support a data rate of 19.2 Mb/sec, which is further reduced to 18 Mb/sec by the fact that the channel must also support audio, transport, and ancillary data information. As can be seen, this restriction in data rate means that the original signal must be compressed by a factor of approximately 83:1. This number seems all the more impressive when it is realized that the intent is to deliver very high quality video to the end user, with as few visible artifacts as possible.

2. Lossy methods have to be employed, since the compression ratio of lossless methods (e.g., Huffman, arithmetic, LZW) is not high enough for image and video compression, especially when the distribution of pixel values is relatively flat.

The following compression types are commonly used in video compression:
• Spatial redundancy removal - intraframe coding (JPEG)
• Spatial and temporal redundancy removal - intraframe and interframe coding (H.261, MPEG)
These are discussed in the following sections.

Audio compression has become well entrenched in consumer and professional digital audio products such as the compact disc (CD), digital versatile disc (DVD), digital audio broadcasting (DAB) and Motion Picture Experts Group (MPEG) audio layer 3 (MP3) distribution on the Internet. Audio and speech compression schemes can be conveniently partitioned into applications reflecting some measure of acceptable quality, ranging from telephone speech to wideband audio.

MPEG Layers I, II and III

The International Standards Organisation (ISO) and Motion Picture Experts Group (MPEG) audio coding standard describes audio compression for synchronized audio to accompany the compressed video known as MPEG. It combines features of MUSICAM (Masking pattern adapted Universal Subband Integrated Coding and Multiplexing) and ASPEC (Adaptive Spectral Perceptual Entropy Coding). It consists of three layers (coding schemes) of increasing complexity and improving subjective performance. It operates with input sampling rates of, for example, 32, 44.1 and 48 kHz, and it outputs bit rates per monophonic channel between 32 and 192 kbit/sec, or per stereophonic channel between 64 and 384 kbit/sec. The standard supports single channel mode, stereo mode, dual channel mode (for bilingual audio programs) and an optional joint stereo mode.

The encoder operates in conjunction with a real-time model of the human spectral perception threshold. This threshold is a frequency-dependent boundary that marks the sound pressure levels (SPL) below which the human ear cannot detect sounds. Signal spectral components below the threshold level that cannot be heard are declared irrelevant, and they are not encoded in the compression process. The encoder operation is quite complex!

Advanced Audio Coding (AAC) in MPEG-4 Part 3 was enhanced relative to the previous standard, MPEG-2 Part 7, in order to provide better audio quality for a given encoding bitrate. AAC's best known use is as the default audio format of Apple's iPhone, iPod and iTunes. AAC's multiple profiles are:
- Low Complexity Advanced Audio Coding (LC-AAC)
- High-Efficiency Advanced Audio Coding (HE-AAC)
- Scalable Sample Rate Advanced Audio Coding (AAC-SSR)
- Bit Sliced Arithmetic Coding (BSAC)
- Long Term Predictor (LTP)

AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. The MPEG-2 standard contains several audio coding methods, including the MP3 coding scheme. AAC is able to include 48 full-bandwidth (up to 96 kHz) audio channels in one stream, plus 16 low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. The quality for stereo is satisfactory for modest requirements at 96 kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s. The MPEG-2 audio tests showed that AAC meets the requirements referred to as "transparent" by the ITU at 128 kbit/s for stereo, and 320 kbit/s for 5.1 audio.

Simple Audio Compression Methods

Traditional lossless compression methods (Huffman, LZW, etc.) usually don't work well on audio (for the same reason as in image compression). The following are some of the lossy methods applied to audio compression:

• Silence compression - detect the "silence", similar to run-length coding.
• Adaptive Differential Pulse Code Modulation (ADPCM), e.g., in CCITT G.721 - 16 or 32 kbits/sec. It (a) encodes the difference between two consecutive samples, and (b) adapts the quantization so that fewer bits are used when the value is smaller. It is necessary to predict where the waveform is headed, which is difficult. Apple has a proprietary scheme called ACE/MACE, a lossy scheme that tries to predict where the wave will go in the next sample; it gives about 2:1 compression.
• Linear Predictive Coding (LPC) fits the signal to a speech model and then transmits the parameters of the model. It sounds like a computer talking, at 2.4 kbits/sec.
• Code Excited Linear Predictor (CELP) does LPC, but also transmits the error term - audio conferencing quality at 4.8 kbits/sec.
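The ADPCM idea described above - code the difference from a running prediction, with a step size that adapts to the signal - can be sketched as a toy encoder/decoder pair. This is an illustration only, not the actual CCITT G.721 algorithm:

import numpy as np

def adpcm_encode(samples, bits=4):
    step, predicted, codes = 1.0, 0.0, []
    half = 2 ** (bits - 1)
    for s in samples:
        q = int(np.clip(round((s - predicted) / step), -half, half - 1))
        codes.append(q)
        predicted += q * step                    # the decoder tracks the same value
        step = max(step * (1.5 if abs(q) > half // 2 else 0.9), 1e-3)   # adapt step
    return codes

def adpcm_decode(codes, bits=4):
    step, value, out = 1.0, 0.0, []
    half = 2 ** (bits - 1)
    for q in codes:
        value += q * step
        out.append(value)
        step = max(step * (1.5 if abs(q) > half // 2 else 0.9), 1e-3)
    return out

t = np.linspace(0, 1, 400)
x = 100 * np.sin(2 * np.pi * 5 * t)              # a smooth test waveform
codes = adpcm_encode(x)                          # 4 bits per sample instead of 16
y = adpcm_decode(codes)                          # close to x, but not bit-exact: lossy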
JPEG Standard

In computing, JPEG (seen most often with the .jpg or .jpeg filename extension) is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality.

JPEG compression is used in a number of image file formats. JPEG/Exif is the most common image format used by digital cameras and other photographic image capture devices; along with JPEG/JFIF, it is the most common format for storing and transmitting photographic images on the Web. These format variations are often not distinguished, and are simply called JPEG. The term "JPEG" is an acronym for the Joint Photographic Experts Group, which created the standard. The MIME media type for JPEG is image/jpeg (defined in RFC 1341), except in older Internet Explorer versions, which provide a MIME type of image/pjpeg when uploading JPEG images. JPEG/JFIF supports a maximum image size of 65535×65535 pixels - one to four gigapixels (1000 megapixels), depending on aspect ratio (from panoramic 3:1 to square).

"JPEG" stands for Joint Photographic Experts Group, the name of the committee that created the JPEG standard and also other still picture coding standards. The "Joint" stood for ISO TC97 WG8 and CCITT SGVIII. In 1987 ISO TC 97 became ISO/IEC JTC1, and in 1992 CCITT became ITU-T. Currently, on the JTC1 side, JPEG is one of two sub-groups of ISO/IEC Joint Technical Committee 1, Subcommittee 29, Working Group 1 (ISO/IEC JTC 1/SC 29/WG 1), titled Coding of still pictures; on the ITU-T side, ITU-T SG16 is the respective body. The original JPEG group was organized in 1986, issuing the first JPEG standard in 1992, which was approved in September 1992 as ITU-T Recommendation T.81 and in 1994 as ISO/IEC 10918-1.

The JPEG standard specifies the codec, which defines how an image is compressed into a stream of bytes and decompressed back into an image, but not the file format used to contain that stream. The Exif and JFIF standards define the commonly used file formats for interchange of JPEG-compressed images. The JPEG standards are formally named "Information technology - Digital compression and coding of continuous-tone still images."

The discussion of the JPEG standard below covers:
• Objectives
• Architecture
• DCT encoding and quantization
• Statistical coding
• Predictive lossless coding, and
• Performance
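Before going through these parts in turn, the adjustable compression-versus-quality tradeoff mentioned above is easy to observe in practice. A minimal sketch using the Pillow library (the file name photo.png is just a placeholder for any image you have on hand):

import os
from PIL import Image   # Pillow

img = Image.open("photo.png").convert("RGB")      # hypothetical source image
for quality in (95, 75, 30):
    out = f"photo_q{quality}.jpg"
    img.save(out, "JPEG", quality=quality)        # higher quality -> larger file
    print(quality, os.path.getsize(out), "bytes")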

JPEG Objectives

JPEG undertook to develop a single standard applicable to the still-imaging needs of a wide range of applications in all the different industries that might use digital continuous-tone imaging. The scope of this is best seen by listing the objectives in detail:

1. To be at or near the state of the art for degree of compression versus image quality;
2. To be parameterizable, so that the user can select the desired compression-versus-quality tradeoff;
3. To be applicable to practically any kind of source image, without regard to dimensions, image content, aspect ratio, etc.;
4. To have computational requirements that are reasonable for both hardware and software implementation; and
5. To support four different modes of operation:
(a) sequential encoding, where each image component is encoded in the same order that it was scanned;
(b) progressive encoding, where the image is encoded in multiple passes so that a coarse image is presented rapidly, followed by repeated images showing greater and greater detail;
(c) lossless encoding, where the encoding guarantees exact reproduction of all the data in the source image; and
(d) hierarchical encoding, where the image is encoded at multiple resolutions.

These objectives were extremely ambitious, yet they are largely met by the completed standard, which is testimony to the excellent work of the JPEG committee.

JPEG-Architecture

The lossy modes of operation (a, b, d) are implemented with DCT encoding of 8 x 8 pixel blocks, followed by one of two statistical coding methods, while the lossless option (c) is implemented with simple predictive coding followed by statistical coding. This is shown in the following figures:

Fig 1(a): Sequential coding block diagram - source image data → DCT → quantizer → zig-zag ordering → statistical coding → compressed image bit stream, with quantizing-table and code-table specifications feeding the quantizer and the coder.

Fig 1(b): Progressive encoding block diagram.

Fig 1(c): Lossless coding block diagram.

Fig 1(d): Hierarchical encoding block diagram - hierarchical control ahead of the DCT, quantizer, zig-zag ordering and statistical coding stages, with quantizing-table and code-table specifications.

The architectures shown in the previous figures apply to a single grayscale image or to one component of a color image. To compress a color image, the color components can either be compressed completely one after another, or the three components can be interleaved for each block of the image.

In the case of sequential-mode encoding, DCT encoding is done on the blocks of the image as they are scanned, and the DCT coefficient output is transmitted block by block in the same order. For progressive-mode encoding, an image buffer is added after the DCT encoding step; the progressive-mode behavior is obtained by reading out different portions of the DCT coefficients to achieve progressively improved quality over several scans. For hierarchical-mode encoding, processing is added ahead of the DCT encoder to filter and subsample the source image before encoding. This subsampling and encoding is done repeatedly, with progressively less subsampling, to transmit images of increasing resolution one after another.

JPEG-DCT Encoding and Quantization

The output of the DCT encoder (the DCT coefficients) is shown in Figure 2(a) as a 2-D array with the DC coefficient in the upper left corner and the AC coefficients arranged with increasing spatial frequency horizontally and vertically. These coefficients are quantized according to a 64-entry table, which must be specified to the encoder by the application. The quantization table has 8 bits per entry and specifies the quantizing step size for each DCT coefficient. This allows each coefficient to be represented with no more precision than is necessary to achieve the desired image quality. The standard does not specify any quantization tables; these must be provided by the application and become part of the data stream, so the decoder knows what table was used. Therefore, modification of the quantization table specified during encoding is one way to vary the degree of compression.

After quantization, the DC coefficient is treated differently from the AC coefficients. Because there is usually a strong correlation between the DC coefficients of adjacent 8 x 8 blocks, the DC coefficient is encoded as the difference from the previous block in the encoding sequence.

Figure 1(a) also shows a step called zig-zag ordering between the quantizer and the statistical coder. This is an important step which arranges the DCT coefficients so that the statistical coding will be more effective; it is diagrammed in Figure 2(b). In order to create a bit-stream where coefficients that are more likely to be nonzero (the low-frequency ones) are placed before coefficients that are more likely to be zero (the high-frequency ones), the zig-zag sequence shown in Figure 2(b) is used to read the coefficients into the bit-stream. The result is that all the zero-value coefficients tend to be together at the end of the block and can be transmitted with very few bits using a simple run-length code.
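The chain described above - DCT of an 8 x 8 block, quantization against a table of step sizes, and zig-zag reordering - can be sketched briefly. This is an illustration only: the step sizes below are invented for the example (the real tables come from the application), and SciPy is assumed for the DCT:

import numpy as np
from scipy.fft import dctn

def zigzag_order(n=8):
    # Scan diagonal by diagonal, alternating direction, starting at the DC term.
    idx = [(i, j) for i in range(n) for j in range(n)]
    return sorted(idx, key=lambda ij: (ij[0] + ij[1],
                                       ij[1] if (ij[0] + ij[1]) % 2 == 0 else ij[0]))

# An 8 x 8 block with a smooth horizontal gradient, typical of natural images.
block = np.tile(np.arange(0, 256, 32, dtype=float), (8, 1)) - 128   # level-shifted

coeffs = dctn(block, norm="ortho")                    # 2-D DCT of the block
quant = 16 + 4 * np.add.outer(np.arange(8), np.arange(8))   # toy step sizes, coarser
quantized = np.round(coeffs / quant).astype(int)            # at higher frequencies

scan = [quantized[i, j] for i, j in zigzag_order()]
print(scan[:10])                                      # nonzero values cluster at the front
zeros_at_end = len(scan) - len(np.trim_zeros(scan, "b"))
print(zeros_at_end, "trailing zeros -> one short run-length code")

In a real JPEG coder the quantized DC term would then be differenced against the previous block, and everything would be entropy-coded as described next.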

Fig 2(a): 2-D matrix of DCT coefficients for one 8 x 8 block - the DC coefficient is at the upper left (the start point), with horizontal spatial frequency increasing from 0 to 7 across and vertical spatial frequency increasing from 0 to 7 downward.

Fig 2(b): Zig-zag ordering - the sequence runs from the DC coefficient at the upper left to the highest-frequency AC coefficient at the lower right.

JPEG-Statistical Coding

The final encoder processing step is statistical coding. It achieves lossless compression by encoding the quantized DCT coefficients more efficiently based on their statistical characteristics. The JPEG standard allows two types of statistical coding - Huffman coding or arithmetic coding. The Baseline sequential coder uses only Huffman statistical coding.

Huffman coding (a Huffman code is an optimal prefix code found using the algorithm developed by David A. Huffman) requires specification of the Huffman table or code block; this is the job of the application - the standard does not specify a table except for the Baseline coder. The Huffman tables also become part of the image bit-stream; the standard supports up to four Huffman tables per image, to provide for different tables for each component of a multi-component image. The arithmetic statistical coding option does not require a separate table to be provided, but it does require a little more processing to implement. However, it results in somewhat better compression (5 to 10 percent) for many images.

JPEG-Predictive Lossless Coding

The lossless compression option does not use the DCT. Instead, a simple predictor is used, but there is a choice of seven different kinds of prediction available. The different predictor choices specify how many and which adjacent pixels are used to predict the next pixel. The statistical coding in the lossless mode can use either of the two methods specified for the DCT modes, and is similar to what is specified for the DC coefficient of the DCT modes. The lossless compression will work with source images having from 2 to 16 bits per pixel, and typically gives around 2:1 compression for photographic color images.

JPEG-Performance

Compression performance is best specified by relating image quality to bits per pixel (bpp) in the compressed data stream. This relationship depends to some degree on the characteristics of the source image - some images are harder to compress successfully than others. With this in mind, here are some figures for 'typical' source images [2]:
• 0.25-0.5 bpp: moderate to good quality, sufficient for some applications;
• 0.5-0.75 bpp: good to very good quality, sufficient for many applications;
• 0.75-1.5 bpp: excellent quality, sufficient for most applications;
• 1.5-2.0 bpp: indistinguishable from the original, sufficient for the most demanding applications.

MPEG Standard

Digital motion video can be accomplished with the JPEG still-image standard if you have fast enough hardware to process 30 images per second. However, the maximum compression potential cannot be achieved that way, because the redundancy between frames is not being exploited. Furthermore, there are many other things to be considered in compressing and decompressing motion video, as indicated in the objectives below.

The discussion of the MPEG standard below covers:
1. Objectives
2. Architecture
3. Bitstream syntax
4. Performance
5. MPEG-2 and MPEG-4

MPEG Objectives

As with the JPEG standard, the MPEG standard is intended to be generic, meaning that it will support the needs of many applications. As such, it can be considered a motion video compression toolkit, from which a user selects the particular features that best suit his or her application. More specific objectives are:

1. The standard will deliver acceptable video quality at compressed data rates between 1.0 and 1.5 Mbps.
2. It will support either symmetric or asymmetric compress/decompress applications.
3. When compression takes it into account, random-access playback is possible to any specified degree.
4. Similarly, when compression takes it into account, fast-forward, fast-reverse, or normal-reverse playback modes can be made available in addition to normal (forward) playback.
5. Audio/video synchronization will be maintained.
6. Catastrophic behavior in the presence of data errors should be avoidable.
7. When it is required, the compression-decompression delay can be controlled.
8. Editability should be available when required by the application.
9. There should be sufficient format flexibility to support playing of video in windows.
10. The processing requirements should not preclude the development of low-cost implementations.

Some of these objectives are conflicting, and they all conflict with the objectives of cost and quality. In spite of that, the standard provides for all of the objectives, but of course not all at once. A proposed application has to make its own choices about which features of the standard it requires and accept any tradeoffs that this may cause.

Architecture

The MPEG standard is primarily a bitstream specification, although it also specifies a typical decoding process to assist in interpreting the bitstream specification. This approach supports data interchange, but it does not restrict creativity and innovation in the means for creating or decoding that bitstream. The bitstream architecture is based on a sequence of pictures, each of which contains the data needed to create a single displayable image. Note that the order of transmission of pictures in the data stream may not be the same as the order in which the pictures will be displayed - the reason will be evident shortly.

There are four different kinds of pictures, depending on how each picture is to be decoded:
• I pictures are intra-coded, meaning that they are coded independently of any other picture. An I picture must exist at the start of any video stream and also at any random-access entry point in the stream.
• P pictures are predicted pictures, which are coded using motion-compensated prediction from a previous I or P picture.
• B pictures are interpolated pictures, which are coded by interpolating between a previous and a future I or P picture. This process is sometimes referred to as bidirectional prediction.
• D pictures are a special format that is only used for implementing fast search modes.

An I picture requires the most data; it is similar to a JPEG image. It is structured into 8 x 8 blocks that are DCT coded, quantized, and statistically encoded. A P picture requires about one-third of the data of an I picture; it consists of 16 x 16 macroblocks, which are DCT-coded motion-correction values. A B picture takes 2:1 to 5:1 less data than a P picture; it also has macroblocks and blocks containing interpolation parameters and DCT-coded correction values.

The most compression is obtained by using as many B pictures as possible. However, to perform B decoding, the 'future' I or P picture involved must be transmitted before any of the dependent B pictures can be decoded, so the decoding delay is proportional to the number of B pictures in series. Because of the delay issue, the consideration of random access, and the effectiveness of the interpolation technique, a typical displayed picture sequence would be of the form (the numbers are the order of display):

I B B B P B B B I B B B P ...
1 2 3 4 5 6 7 8 9 10 11 12 13

This picture sequence is diagrammed in the figure below, showing the dependencies of the pictures on each other.

Figure: A typical MPEG picture sequence showing inter-frame dependences - forward prediction from I and P pictures, bidirectional interpolation for B pictures, and possible random-access entry points.

The standard, however, is completely flexible with regard to the picture sequence, and an application can (if it wishes) tailor it to optimize any situation. As mentioned before, when B pictures are used, the reference I and P pictures must be transmitted before any dependent B pictures, so the order of transmission for the sequence above is (the numbers are still the order of display):

I P B B B I B B B P B B B ...
1 5 2 3 4 9 6 7 8 13 10 11 12

Bitstream Syntax

The bitstream is structured at six levels of hierarchy in order to support all of the features of MPEG video. These are:
• Sequence layer - an independent video stream.
• Group-of-pictures layer - a clip of video that begins with a random-access entry point and has uniform video parameters within it.
• Picture layer - represents one displayable image.
• Slice layer - a variable-size sub-picture group that provides for resynchronizing the decoder in the event of an error.
• Macroblock layer - the 16 x 16 pixel motion-compensation unit.
• Block layer - the 8 x 8 pixel unit on which the DCT is performed.

The syntax levels are diagrammed in the figure below, after a small sketch of the picture-reordering rule described above.
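Here is that sketch of the display-order to transmission-order rule: each B picture is held back until the later reference picture it depends on has been sent (a simplified model that ignores D pictures):

def transmission_order(display):
    # display is a list of (picture_type, display_number) tuples in display order.
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)        # B pictures wait for their future reference
        else:
            out.append(pic)              # I or P reference picture goes out first...
            out.extend(pending_b)        # ...then the B pictures that depend on it
            pending_b = []
    return out + pending_b

display = [("I", 1), ("B", 2), ("B", 3), ("B", 4), ("P", 5), ("B", 6), ("B", 7),
           ("B", 8), ("I", 9), ("B", 10), ("B", 11), ("B", 12), ("P", 13)]
print(transmission_order(display))
# -> I1 P5 B2 B3 B4 I9 B6 B7 B8 P13 B10 B11 B12, matching the sequence above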

Figure: MPEG layered bitstream structure - a video sequence contains groups of pictures, each group contains pictures, each picture contains slices, each slice contains macroblocks (four luminance blocks plus CR and CB chrominance blocks), and each block is 8 x 8 pixels. Dots mark possible random-access entry points.

Performance

MPEG provides for a wide range of video resolutions and data rates. One set of choices that has been widely researched is optimized for a data rate of about 1.2 Mbps (the CD-ROM data rate). For 30 frames/second video at a resolution of 352 x 240 pixels, the quality of compressed and decompressed video at this data rate is often described as similar to VHS recording. Most scenes do not exhibit compression artifacts, but the most demanding material may require resolution or other tradeoffs to obtain visually acceptable results.

MPEG-2 and MPEG-4

The ISO committee which developed the MPEG standard is currently at work specifying a successor standard known as MPEG-2. The video component is targeted for bit rates in the range of about 2 to 15 Mbps, which is sufficient for supporting HDTV. Additionally, MPEG-2 includes a number of new features with the intent of providing compatibility with existing standards such as terrestrial video, MPEG-1 and H.261. Compatible transmission is conceptually similar to today's TV broadcasts, which can be received by both color and black-and-white sets. The MPEG-2 encoding is intended to allow a single transmission to be received by a range of digital receivers, from small portable units that might only support NTSC resolution to HDTV receivers.

Scalable digital video is critical to transmission over packet-switching networks. As the load on the network increases, the transmitting node adjusts by decreasing the quality of the transmitted video. The audio encoding for MPEG-2 is also being extended: MPEG-2 audio will encode up to five full-bandwidth channels (left, right, center, and two surround channels), an additional low-frequency enhancement channel, and up to seven commentary or multilingual channels. Several improvements on the MPEG audio format are planned for lower sample rates.

More recently, another digital video encoding standard effort, known as MPEG-4, is underway. This new initiative is for very low bit-rate coding of audiovisual programs, with particular application to mobile multimedia communications. Although MPEG-2 and MPEG-1 both use a DCT algorithm, it is anticipated that MPEG-4 will be based on a new algorithm which, though computationally more expensive, results in significantly higher compression.

DVI TECHNOLOGY

DVI technology is different from the other standards discussed here because it is specifically based on the use of special hardware. Intel Corporation and IBM Corporation have developed a programmable chipset which implements the technology in a co-processor environment on any type of computer platform. These chips support a wide range of multimedia functions in software, including JPEG compression and several DVI-unique compression algorithms for stills or motion video. Because of the programmability of the chipset, it can respond to new algorithm developments - for example, the programming of the chips to do MPEG processing is being explored. Intel has committed to producing higher-speed DVI functionality in the future and has even discussed the possible integration of DVI functionality into a future generation of the x86 CPU family. Thus, the DVI hardware is an important engine for present and future compression developments.

DVI TECHNOLOGY MOTION VIDEO COMPRESSION

DVI technology can do both symmetric and asymmetric motion video compression/decompression. The asymmetric approach is called Production-Level Video (PLV). Video for PLV must be sent to a central compression facility, which uses large computers and special interface equipment, but any DVI system is capable of playing back the resulting compressed video. The picture quality of PLV is the highest that can currently be achieved.

The other DVI compression approach is called Real Time Video (RTV). It is done on any DVI system that has the ActionMedia Capture Board installed. Playback of RTV is on the same system or any other DVI system.
Because RTV is a symmetric approach, which requires that compression be done with only the computing power available in a DVI system, RTV picture quality is not as good as PLV picture quality.

DVI PRODUCTION-LEVEL COMPRESSION

PLV is an interframe compression technique; the algorithm details are proprietary, and we can say only that it is block-oriented and that it involves multiple compression techniques. Since it was designed specifically for the DVI chipset, it is optimized for that environment, and it probably would not make much sense to run it on different hardware. PLV compression is an asymmetric approach where a large computer does the compression and the DVI hardware in the PC does the decompression. It takes a facility costing several hundred thousand dollars to perform PLV compression at reasonable speeds. Since this cost is too much for a single application developer to bear, centralized facilities are provided where developers can send their video to be compressed for a fee.

High-quality motion video compression has difficulties right from the start. The data rates created by the initial digitizing are high, even for large computers. This happens because the initial digitizing really has to be done in real time to obtain the best quality. In most cases, the input video medium for compression will be an analog videotape - for best results, it will be one-inch tape. Although one-inch tape machines can play at slow speeds, they do it by introducing frame storage and processing, which would interfere with the quality of the compressed result. The only way to get around that processing is to run the VTR at normal play speed (30 frames per second). Therefore, for best quality, we must invest in digitizing and interface hardware which will let the VTR run at normal speed and capture the video on computer disk.

For PLV compression, the real-time video from the VTR is digitized, filtered, and chrominance-subsampled by special hardware before storage on digital disk. Such storage still requires a data rate of about 2 megabytes per second - substantially higher than the storage data rate of a typical PC - and a 1.2-gigabyte digital disk only holds 10 minutes of this partially compressed digital video. Then, in non-real time, the data is taken frame by frame from the digital disk and run through the PLV compression algorithm. The compression algorithm typically runs on a parallel-processing computer; compression takes about 3 seconds per frame on a 250-MIPS machine - still about 90 times slower than real time. (A minute of final compressed video will take 90 minutes to compress.) Of course, compression speed is proportionally faster on an even larger parallel machine.

PLV Performance

The DVI PLV compression algorithm is proprietary and will not be described here. Its performance is also difficult to describe or show here, because it does not make sense to show still frames from a motion sequence. That is because an individual frame from motion video may contain artifacts which are not visible when the frames are delivered to a viewer at 30 frames per second. There is a significant degree of visual averaging taking place when viewing 30-frames-per-second video. This is also true for normal analog video - noise artifacts become highly visible in a single still frame, whereas the averaging between frames in a motion sequence makes noise much less visible.
You can observe this problem if you experiment with a VCR which has slow-motion or still-frame playback. Anyway, PLV-compressed video delivers full-screen, full-motion pictures at a quality subjectively competitive with half-inch VCR pictures.

The PLV compression algorithm must be given goals for the data rate of the compressed bit stream and for the amount of DVI processor chip time per frame which will be devoted to decompression. Even when working with CD-ROM, we will often want to use fewer than 5120 bytes per frame for the average data rate of the video, because we want to leave space in the CD-ROM data rate for audio or possibly other data. We also may wish to display the motion video at less than full screen in order to save data, so that more than 72 minutes could be on one CD-ROM disc. Another way to effectively reduce the compressed data rate is to lower the video frame rate; in some cases, 15 frames per second is fast enough. This can be used either to cut the video data rate in half or to allow more than 5120 bytes per frame, to achieve somewhat higher video quality. In the case of DVI processor decompression time, there are 33 milliseconds per frame available at 30 frames per second (40 milliseconds at 25 frames per second), but not all of that time can be used for decompression, because we also need the processor to perform tasks such as moving the image or scaling the image to a different size.

You can see that there is a multidimensional tradeoff here involving four interacting parameters, which together will determine the resulting picture quality:
• Image cropping (pixel count)
• Compressed data bytes per frame
• Video frame rate
• Decompression processing time
The PLV compression software takes all of these parameters as input, and it will try to produce the best-quality pictures within these constraints.

Because PLV is an interframe compression scheme, there are some special considerations involved in starting up playback of a scene or in starting a scene in the middle. The first frame of a motion sequence must be treated as a still image (called a reference frame); additional time is required to send all the data for a reference frame - about three times the data of an average motion video frame. (If we are using motion video at 5120 bytes average per frame, a reference frame will require around 15,000 bytes of data.) If it is intended that a scene will be started by the application at points other than the beginning, reference frames must also be placed at those points. This can usually be done without causing any noticeable interruption of motion when the scene is played from end to end, because the DVI decompression software uses multiple frame buffers in VRAM, so that variations in the input compressed data rate are accommodated without affecting the displayed frame rate. In any case, if you have special needs for reference frames in your video, they have to be expressed at the time you order PLV compression.

DVI REAL-TIME COMPRESSION

The use of a centralized compression service that is remote from the developer of an application introduces delay and expense into the application development process. It also precludes any application that needs to do real-time compression. In creating an application, a developer needs a way to experiment with the video and audio in the context of the application without incurring this delay and expense. This need is filled by DVI's Real-Time Video (RTV), which is a compression process that is done in real time on a DVI development system.
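As an aside, the CD-ROM data budget quoted in the PLV discussion above (and used again for RTV below) can be checked with a few lines of arithmetic; the figures are the ones given in the text:

bytes_per_second = 153_600                   # single-speed CD-ROM data rate
frames_per_second = 30
print(bytes_per_second / frames_per_second)  # 5120.0 bytes available per frame
print(bytes_per_second * 60 * 72 / 1e6)      # about 664 MB for 72 minutes of video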
With RTV, the developer may compress his or her video and audio to the same file size as from the PLV service, and then use those files in the application under development in exactly the same way that the final PLV files will be used. By this means the developer may experiment as much as needed and actually try out the complete application before sending any video out for PLV compression.

The tradeoff in RTV is picture quality. To accomplish motion video compression with only the resources of a DVI development system means that RTV is lower in resolution and frame rate than PLV. Compression is done with the DVI processor chips and, while these chips are very powerful among their peers, in the milliseconds available to compress a frame in real time they cannot match the computing cycles available to the PLV facility, so the algorithm must be simplified. However, the results produced are good enough to fill most needs for application development and testing, and for some applications which do require real-time compression within the application, RTV may completely fill the bill.

RTV compression allows the user to make some trades of compression versus picture quality if the RTV-compressed video will never have to be stored on CD-ROM. By allowing the data rate to go higher than 153,600 bytes per second, using fast hard disk storage, the RTV frame rate can be increased to 30 frames per second. Most DVI capture software provides for user choice of these parameters.

Communication between RTV and PLV occurs through the medium of SMPTE time code. When the original one-inch videotape which will eventually go to PLV compression is captured by RTV for development purposes, the time code is captured for storage with the video frames. RTV is not frame-to-frame compression, so an RTV file can be started or stopped at any point. In the RTV mode, decisions about in and out cut points for the displayed video can be made, and the time code values may be read from the RTV-file data to create the edit list which will be used for the PLV compression. After all decisions about the video material have been made, the master one-inch tapes and the edit list go to the PLV compression facility for final compression of exactly and only the selected scenes.

MIDI PROTOCOL

In the early 1980s, several music instrument manufacturers agreed on a networking standard called MIDI, the Musical Instrument Digital Interface. The standard is now maintained by the MIDI Manufacturers Association (MMA) and disseminated by the International MIDI Association (IMA) [67]. The specification is also reproduced in whole or in part in references such as [65, 68, 69].

The specification calls for certain hardware connections, using a 5-pin DIN connector. There are three kinds of connections allowed: in, out and 'thru'. A thru connector provides a direct copy of the input signal. It is worth mentioning in passing that, although the MIDI network has been made to work, it is to be expected that some superset of MIDI will appear on the market; companies such as Lone Wolf have already attempted to bring to market an optical network which includes MIDI as a subset.

The MIDI software specification involves 8 data bits, a start bit, and a stop bit, for a total of 10 bits transmitted at a rate of 31.25 kbaud. A message consists of one status byte followed by zero or more data bytes. MIDI devices, such as tone generators, can be connected in networks, such as chains or trees.
Each device can listen to one or more MIDI channels. All data and mode messages are sent to all receivers, but the messages include a channel number, so that a device acts only on messages sent on the channels to which it is listening. Messages are defined for musical events, such as note on, note off, and pitch bend change. The key number represents keys from the bottom of the keyboard range to the top. Velocity means the speed with which the key is struck, and generally controls attack characteristics, overall amplitude, and the spectrum of the note.

The polyphonic key pressure message is sent by devices such as keyboards that can measure the pressure applied as each key is held. The pressure for each key can be sent separately, so that individual notes can be modified in performance. A channel pressure message comes from a device that can measure pressure from its sensors but can send only one pressure value (usually the maximum detected). A program change message causes the synthesizer to select one of 128 voices. In the early years of MIDI, each manufacturer assigned arbitrary voices to those program numbers; the more recent General MIDI specification includes a 128-voice Instrument Patch Map. A melody recorded using one General MIDI synthesizer's xylophone sound, for example, will also be played back using a xylophone on any other General MIDI synthesizer.

Four mode messages (not shown in the table) determine, among other things, whether the instrument's voices will be assigned to incoming notes in monophonic (single melody) or polyphonic (several voices at once) fashion. There is also provision for common messages (sent to all receivers), real-time messages (for synchronization), and system exclusive (sysex) messages. System exclusive is essentially a generalized escape mechanism for messages of arbitrary length.

MIDI is not limited to hardware systems. Indeed, the acceptance of MIDI made possible the proliferation of software programs running on personal computers such as the Macintosh, Atari, and PC. MIDI software includes sequencer programs, with which the musician can record, play back, view, and alter musical events, working with music notation, piano-roll notation, text displays of MIDI commands, and the like.

The figure shows the basic MIDI messages for playing back a melody on a synthesizer. In the figure, all the messages are sent out over channel 0. The note numbers and velocities are given in decimal representation. A note-on message with a velocity of 0 is the same as a note-off message. Time in the first column is in milliseconds, at a tempo of 90 quarter-notes to the minute. Note that the first note occurs after a 3-second delay from the start of the playback.

The original MIDI specification dealt primarily with real-time music performance. To represent time in music, there are basically two possibilities - absolute time and delta time. With delta time, the time elapsed since the previous event is recorded; with absolute time, the time elapsed since the beginning of the composition is represented. In the most general terms, the two kinds of time carry the same information, but in practical implementations delta time has the advantage that a whole sequence can be moved as one unit; only the start time of the unit must be changed.
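A short sketch of the channel messages and of the delta-time idea described above. The 0x9n status byte for Note On on channel n, followed by key and velocity data bytes, is the standard MIDI convention; the timing values are simply the ones from the example melody:

def note_on(channel, key, velocity):
    # Status byte 0x90 + channel, then two 7-bit data bytes: key number and velocity.
    return bytes([0x90 | (channel & 0x0F), key & 0x7F, velocity & 0x7F])

def note_off(channel, key):
    return note_on(channel, key, 0)          # velocity 0 is treated as Note Off

print(note_on(0, 60, 64).hex())              # '903c40': middle C on channel 0

# Delta time stores the gap since the previous event, so a whole sequence can be
# shifted by changing only its start time.
absolute = [3000, 3500, 4000, 4500]                           # milliseconds
delta = [t - p for t, p in zip(absolute, [0] + absolute[:-1])]
print(delta)                                                  # [3000, 500, 500, 500]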
BRIEF SURVEY OF SPEECH RECOGNITION AND GENERATION

Speech is one of the main channels of human communication and thus must be handled carefully in any multimedia system. In contrast to what has been discussed thus far about music, a major criterion for speech is intelligibility. Telephone-quality speech has a bandwidth limited to around 200-3400 Hz; an 8-kHz sample rate results in a 64 kb/sec bit rate for 8-bit PCM speech, far smaller than what is required for music PCM.

Speech Production

The organs involved in speech include the larynx, which encloses loose flaps of muscle called the vocal cords. The puffs of air that are released create a waveform which can be approximated by a series of rounded pulses. The waveform created by the vocal cords propagates through a series of irregularly shaped tubes, including the throat, the mouth, and the nasal passages. At the lips and at other points in the tract, part of the waveform is transmitted further and part is reflected. The flow can be significantly constricted or completely interrupted by the uvula, the teeth, and the lips.

A voiced sound occurs when the vocal cords produce a more or less regular waveform. The less periodic, unvoiced sounds involve turbulence in which some part of the vocal tract is tightened. Vowels are voiced sounds produced without any major obstruction in the vocal cavity. In speech, formants (introduced earlier) are created by the position of the tongue and jaw; in separating vowels, for example, the first three formants are the most significant. The fundamental frequency of the female voice is around 200 Hz and higher, with the formants perhaps 10 percent higher than those of the male. Consonants arise when the vocal tract is more or less obstructed. Sounds at the level of consonants and vowels are collectively known as phonemes, the most basic units of speech differentiation, analysis, and synthesis, below the level of the word.

The figure shows a sonogram, a time-varying representation of a speech signal. The regions of high energy appear dark. The vertical stripes in the dark regions correspond to individual pulses from the vocal cords, and the change in the position of the darkest areas from left to right corresponds to changes in the formants. The SPASM system developed by Cook [76] combines models of the glottal waveform and noise sources in the vocal tract with modeling of the shape of, and obstructions in, the vocal tract. The resulting articulatory model is implemented with a GUI, including a cross-section of the head, to permit synthesis of spoken and singing voice.

Encoding and Transmitting Speech

The simplest way to encode speech is to use PCM, discussed above. The 8-bit, 8-kHz standard for speech is of significantly lower quality than what is required for music. Still, at the nominal 64-kb/sec rate for speech, if one bit per sample can be saved, the total saving is 8 kb/sec, so methods for lowering the bit rate remain an active area of research. The ADPCM method discussed above can easily save 2 to 4 bits per sample.

PCM, ADPCM, and related methods attempt to describe the waveform itself. There are other methods, such as the subband coding discussed above under MPEG. We now turn to another class of methods, called voice coders, or vocoders. The human vocal tract can be simplified by assuming, for example, that the source of vibration for voiced sounds is not affected by the rest of the vocal tract. The series of filters that model the vocal tract can then be modeled such that if one filter changes, there is no effect on the others.
Under such conditions, we can calculate the voice model coefficients independently of the fundamental frequency or of the voiced/unvoiced decision. We can also reasonably assume that formants change quite slowly compared to the rate of individual pulses from the vocal cords, and so transmit the filter coefficients at a slower rate.

The channel vocoder pioneered by Dudley analyzes speech with a bank of filters. The driving function for synthesis is noise or a series of pulses like those generated by the vocal cords. The filter coefficients, the fundamental frequency, and the voiced/unvoiced decision are transmitted. Research on the channel vocoder ultimately led to the phase vocoder implementation mentioned above.

Linear prediction, also mentioned above, models the vocal tract as a source followed by a series of filters. These filters can be modeled as a series of tubes, and the tube parameters can be transmitted. There is, unfortunately, no intuitive relationship between the tube parameters and, say, the spectrogram representation, but LPC is certainly adequate for compressing speech for reproduction in speech chips. One transmission consists of the pitch period, the gain, the voiced/unvoiced decision, and a dozen or so filter coefficients.

In a different kind of system, both the encoder and the decoder contain a lookup table. Each table entry is a vector containing a series of samples. Rather than transmit the samples, one can transmit just the index into the table; if the exact sequence of samples cannot be found, the closest vector is transmitted. This method can be used to transmit the waveform itself or sequences of coefficients for a vocoder.

As we have seen, the basic data rate is 64 kb/sec (CCITT G.711) for 8-bit PCM. With ADPCM, 4 to 6 bits per sample are transmitted, for 32 to 48 kb/sec; there is a 32 kb/sec CCITT standard, G.721, for ADPCM. Some subband coding systems operate as low as 16 kb/sec. For higher-quality speech with subband coding, there is CCITT G.722 for 50-7000 Hz audio at a 64-kb/sec rate. For various methods of coding, bit rates can fall as low as 2400 bits/sec, but with a corresponding reduction in quality. Good discussions of the various CCITT standards can be found in the references. Improvements in quality and reductions in bit rate are being driven by military research and the telephone companies, but also by factors such as the desire to incorporate voice with other data, as in ISDN, and the need to squeeze more channels out of cellular networks.

End of Unit-II