Perceptual Criteria and Design Alternatives for Low Coding

V. Ralph Algazi, CIPIC, University of California, Davis, and Norimichi Hiwasa, CIPIC-HQUL and Mitsubishi Electric Corp., Of una, Japan

Abstract frames/second,the raw data rate is 60 Mbps, and some- The design of very low bit rate coders, below 64 kilo- whatmore to accountfor chrominance.Means and options bits per second,presents a number of new challenges. availablefor the drastic decreaseof the bit rate, which is a Such low bit rate codersare targetedto small size images, primary interest in this paper are shown in Table I. We say 176 X 144, or below and are limited to head and considerin our discussiona subsetof the options listed, shouldersscenes. with emphasison the effect on performance and image In this paper, we compareand rate the performance of quality. several still image encodingmethods such as DC!: Sub- band and Wavelets,using a new quality scale basedon 3. Someproposed standards for low bit rate properties of human vision. Thesestill image coders are video. embeddedinto video coderssuch as the H261 coder. The relative importance of intraframe and interframe quality Someimages sizes have beenadopted as tentative stan- and contributions to the total bit rate provide an overall dardsin a hierarchicalformat framework known as H.261, designframework for theselow bit rate video coders. which considervideo encoding at ratesp* 64 kbps, where Wediscuss and illustrate, for the encoding of low bit p is an integer[1]. rate video. the still image and nwtion impairments that Two of thesevideo standards,known as Common In- are perceptuallymost important. termediate Format (CIF) and Quarter CIF (QCIF) are shownin Table2. Note that the raw bit rates at 30 frames per secondare 1. Physical basis and design parameters in slightly more than 24 Mbps for CIF and 6 Mbps for QCIF. video coding. For a final bit rate at or below 64 kbps,we require a com- pressionratio slightly larger than 100:1 for QCIF video, at The coding of video sequencesis based on both the 10 kbps, the compressionrequired would be more than propertiesof human perceptionand on propertiesof the data that make it possible. Without repeatinghere an ex- 640:1. The targetedcompression mandates strong restric- tions on the type of image sequencesthat can be consid- position of these basic properties,we will discuss im- mediatelythe typical structureand some of the options in ered,following the frameworkof options of Table 1. The image size has to be a QCIF or smaller, limited camera video coding, and discuss the underlying principles and motion is allowed, the sceneis a simple "head and shoul- propertiesas needed. ders" video conferencescene, and motion is limited. Video coders have severalcomponents. An intraframe coder which embodiesone of severalmethods for the re- presentationand coding of still images.We will discuss 4. CompressionGoals and Means. and comparesome of them later in this paper. Frame to frame redundancyis quite high in video. Thus motion Let's consider briefly reasonable ranges of compression compensation(MC) is used to estimateone of the frames for tl1e intraframe and interframe portions of tl1e coder. in the sequencefrom one or severaladjacent frames. The Intraframe coding will provide substantial compression for interframe coder is the embodimentof the efficient repre- tl1e restricted type of scenes of interest. Clearly, we are sentationand coding of the motion compensationresid- not targeting here a very high quality image and a com- uals, that will be generallyquite small. pression of 10 for a still image tl1atis not too busy is rea- sonable. Below.3 bit/pixel, most of tl1e known metl1ods 2. Designalternatives. will result in images tl1atare not generally acceptable. We target, for discussion, a compression of 20 for tl1e intra- Consider,as a reference,the raw digital data rate of a frame portion of the coder, so that factors of 5 to 32 addi- standard television image sequence. For typical image tional compressionare needed to lower the range of output size of 640 X 480 pixels, or 525 lines video, the number rates to 64-10 kbps. An improvement factor of 5 C2I1be of pixels is about 0.25 Megabytes per frame. At 30 achieved by inter frame coding, but not a factor of 32. Therefore,low bit rates will require decreasesof both the Distortionfactors Fl and F2 are of the fonn frame rate and the image size, as well as interframe cod- F2-~ ing. Note however that lowering the frame rate will de- -111112 crease the correlation between frames and, thus the whereIIIlf is the meansquare value of the image and lIewll2 effectiveness of interframe coding by motion is the weightedM.S. value of the error. b) End of block effect: This effect is commonlyobserved compensation. in block transfonncoding. Define horizontalend of block 5. Image compressiontechniques and image discontinuityas quality for intraframe coding. Aew(m,N)~[ew(m,N) -ew(m,N + 1)]2 where N is the block size. With a similar definition for the vertical discontinuity, we evaluate the end of block As discussed,we are targetinga compressionratio of distortionfactor as 20 for the intraframe portion of the video coder,and there- fore, we compressa QCIF image to about .4 bit/pel. Let F3~[IIAew(M,n)112 -t-IIAew(m,N)112]1/2 us considerthe elementsof image quality at sucha low bit c) Structured errors in any part of the image. We com- rate taking as a prototypical examplethe H.261 coder. pute the average local correlation of weighted errors, Ry(M,N) in 8 X 8 neighborhoodsto detenninelocal error A. Quality Factors: structure,and we define the distortion factor for these structureerrors as This subjectivequality evaluationis basedon a number F4 = [IIRyI12 + IIRxI12]1/2 of impairments that can be observed in the encoded image. We identify the types of impairmentsand define Thus,if errorsare uncorrelated.F4 is zero. correspondingdistortion factors that can be objectively d) Structure errors in the vicinity of high contrast edges. In the vicinity of high contrast transitions,visual impair- quantified. Distortion factors are perceptuallyweighted measures mentswill be maskedby the activity of the image. How- of image impairments. These distortion factors are sug- ever, the largesterrors due to the ocr also occur in the gestedby experiencein observingartifacts due to coding sameportions of images. We define a distortion factor F5 and by knowledgeof propertiesof the human visual sys- that includes a masking function as well as a measureof tem. A global subjective measure,the Mean Opinion imageactivity. Score (MOS), is a subjective assessmentor ranking of a B. A composite quality metric: The Picture Quality compositeimage quality. We assumethat MOS is a linear combination of ob- Scale: (PQS) serveddisturbances Di and that MOS is approximatedby The five distortion factors are highly correlated and a an objective picture quality scale (PQS) which is of linear numericalPicture Quality Scale (PQS)is defined as combinationof measurabledistortion factors Fi. 3 The distortion factors Fj are functions of the difference PQS~bo + L biZi betweenthe original and reconstructedencoded image, so Ii==I that PQS will be a measureof the degradationfrom an original. Propertiesof the visual system with respectto where Zj are the principal components of tl1e covariance perceptionof luminance, contrast transfer function. and matrix of tl1e Fi's and tl1e b's are partial regression anisotropyof vision are used to weight the error e(m,n) coefficients. betweenthe original image and its encodedversion [2]. By multiple regression analysis, a correlation of 0.88 We denotesuch perceptually weighted errors by ew(m,n). witl1 tl1eMean Opinion Score (MaS) taken on a five point a) RandomErrors: Somerandom errors are visible in en- scale is determined. (The five point impairment scale is: codedimages. We use two quality factors for theseerrors, 5=imperceptible; 4=perceptible but not annoying; 3=slightly annoying; 2=annoying; and l=annoying.) This obtained by weighted mean squareerrors measures.Fl. makesuse of the standarderror weight used in television. l1QS provides a useful quality metric for still monochro- F2 makes use of a more complete,two-dimensional and matic images [2,3]. anisotropicperceptual weight [2]. Comparison of intraframe coding techniques We have observed that randomerrors are seldom the c. dominantfactors in coded still frames.This is becausethe with PQS. human visual system is more sensitive to patterns and By using PQS we compare the quality versus bit rate structured misalignmenterrors in image than to random curvesof severalcoders at about 0.4 bit/pixel. Resultsare disturbances. shown in Figure I and indicate that all three coding meth- sensitivity function. A major result is that the perception ods have comparableperformance, with a advantagefor of moving detail is higher if the moving detail is being the DCT coder. Note that the performanceof suchcoders trackedvisually. The major effects used in coding tempo- dependon the quantizationmatrix usedas well as on the ral changesare to control rapid temporal changesof low error free coding strategy. We have used in each case spatial impairments,and to allow errors at higher spatial methods reported in the literature [4,5,6]. However, the frequenciesfor rapid temporalchanges. quality of the coded still image does not correlate well The importanceof theseresults to our discussionare in with observedquality of video as we constructimage se- the changein perceptualeffects when forming an image quencefrom thesestill frames. For instance,we are quite sequencefrom coded still images, which exhibit the types sensitive to temporal changes in the averagegray scale of visual impairmentsdiscussed in a previous section. To value in flat portionsof images. addressthe specifics of image quality, we considerthe H.261 coder,designed for low bit rate video. 6. Coding techniques and quality effects in video and image sequence coding. 7. The H 261 Coder.

We now discuss briefly some techniquesand quality In the H.261 coder, 8 X 8 blocks are transformedwith factors in image sequencecoding. Of necessity,the dis- the DCT. Intraframecoding follows closely the JPEG still cussionis now more qualitative. image standard[7]. A 2 X 2 group of blocks or macro- The intraframecoding of image sequencestake advan- blocks, representingan array of 16 X 16 pixels is usedto tage first of the small incrementalinformation from frame determine (MC). For each macro- to frame. If large changes occur from frame to frame, block, the decision is made to encode it either by inter- thenspatial details cannotbe perceived. If image motion frame or intraframemethods. The decision is also made, is small and smooth,then the incrementalinformation in a basedon the motion compensatedmacroblock residual en- new frame can be predicted from previouslyencoded ad- ergy, whether to use motion compensationor not. The jacent frames. This motion compensation(MC) is an es- H.261 may be used as a fixed rate coder, so that the vari- timation process. Because of the accumulation of able rate digital bit streamresulting of the encodingpro- distortions in interframe coding and becauseof the disas- cess feeds a buffer that maintains a fixed output rate. trous effect of transmissionerrors in the quality of images Buffer overflow results in drastic decisions in order to that have been coded differentially, some intraframe maintainthe fixed bit rate. Theserange from adaptationof coded image are included periodically in the overall en- the quantizer step size to discarding whole coding scheme. data. Quality factors for sequences: Experimen~ with the 8.261 Coder. To examine t11e combined effect of all t11epossible causes of image im- Flicker. A very objectionable degradation of quality oc- painnents at low bit rates, we encoded a standard video curs when flicker is present. Although the flicker effect sequence "Miss America" using t11eH.261 coder, with depends of somewhat of viewing conditions, keeping the some of the options that will be of use in illustrating the flat field rate at 60 frames per second will reduce the image quality issues. The 8.261 coder simulator is made flicker effect to tolerable limits. For a low frame rate vid- available by t11eportable Video Research Group (PVRG) eo, the flat field rate is maintained above this limit by re- at Stanford University [8]. For a target bit rate, say 64 peating the frames at the display. kbps, we have the option of buffer size, initial quantiza- Reproduction of motion. It is generally accepted that tion step, and frame rate. We chose a 16,000 bit buffer smooth motion can be approximated in an perceptually t11atdoes not result in buffer overflow for our experi- acceptable fashion if a sequence of 24 motion frames per ments, and thus allows nonnal operation of t11ecoder. second is sustained. Thus, if interlaced scan is not re- We now consider several frame rates for this target bit quired, a frame rate of 24 frames per second is as high as rate. For high frame rates, t11erendition of motion is im- needed for a smooth motion rendition. For low bit rates, proved, while for lower frame rates, t11edistortion of the such a frame rate cannot be achieved, and jerky motion isolated frames will be decreased, at the expense of in- will result. creasing the jerkiness of motion. Perceptual Sensitivity to Temporal Changes. We are Intraframe and Interframe Coding. It is infonnative to interested in the combined spatial and temporal sensitivity consider the contribution to the total bit rate in the H.26 I of human visual perception. Combined spatial-temporal coder. AtlS frames/sec, the interframe coder achieves ap- sinusoids can be used to study this 3 D contrast proximately 3.5 higher compression than t11eintraframe coder. Accounting for the chrominanceinformation, the H.261 coder with a smaller image, say 128 X 128, and compressionfactor for the intraframecoder is about20 as achieve20 kbps. Below that rate, it does not appear that anticipated.The bits neededto encodeboth the intraframe the H.261 OCT based coder is suitable, or either is the and interframe information increase as the frame rate block basedmotion compensation. Subbandand wavelet decreases. codermay provide some incrementaladvantages because distortions track more closely the moving object. Other 8. Discussionof Image Quality issues. methods,such analysis based motion compensationor carefuluse of spatia-temporalvisual perception,may have The goals of our srudy were two-fold: more promise, without going to the complexity of an 1. To determine the relative importance of each of the imagesynthesis coder [9]. distortion factors, identified for still image coding, One of the objectives at CIPIC, is to pursue the de- whenthese images are displayedin a time sequence. velopmentof quality metrics for image sequencescoding. 2. To determine the importanceof jerky motion at re- The current study was of help in providing a rough cut ducedframe rates. iliroughsome major issues. With respectto the still image distortion factors, some conclusionsare easyto reachfor the low bit rate we have Rererenc~ to maintain. The random error factors,Fl and F2 are not important becauseother effectsare much larger. The end [1] Ming Liou, "Overview of the px64 Kbps Video Cod- of block factor F3 may be quite detrimental, principally ing Standard." Communicationsof the AIM, Vol. 34, when the entire macroblockis suddenlydegraded because No.4, pp. 59-63, April 1991. of buffer overflow. It is not yet clear to us that this effect [2] M. Miyahara. K. Kotani and V. R. Algazi, "Objective could not be mitigated by careful quantizerselection, and Picture Quality Scale (PQS) for Image Coding," SID with acceptablebuffer size and delay. The structureder- Symposium92, Boston,MA, May 1992. ror factors F4 and F5 are the dominantfactors in all cases. [3] V. R. Algazi, Y. Kato, M. Miyahara and K. Kotani, In smooth portion of the image,such as the face of Miss "Comparisonof Image Coding Techniqueswith a Pic- America, tile factor F4 measuresimage blotchiness. The ture Quality Scale",SPIE Technical Programon Pho- factor F5, measuresvisible blocking artifacts near higher tonics Instrumentation Conference,Vol. 1771, San contrast transitions, such as in tile neck area of Miss Diego,CA, July 1992. America. [4] J. D. Johnston,"A Filter Family Designedfor use in With respectto tile effects due to motion, we also Quadrature Mirror Filter Banks," ICASSP, April reach readily some preliminary conclusions. For a "head 1980. and shoulders"scene such as Miss America, the smooth- [5] I. Daubechies, "Orthonomal Bases of Compactly nessof motion is not a critical issue. Frame ratesof 10 or SupportedWavelets," Commun. on Pure and Applied even 5 frarnes/secare more acceptabletilan a poor still Mathematics,Vol. XLI, 1988. image quality. If tile structuredimpairments measured by [6] J. W. Woods, Ed., "SubbandImage Coding," Kluwer F3, F4 and F5 are significant in tile framesof an image se- AcademicPublishers, Norwell, MA, 1991. quence,their perceptual impact is increased. Structured [7] JPEG-W. B. Pennebakerand J. L. Mitchell, "JPEG, errors move acrossportions of tile image in a slow, ran- Still Image Data CompressionStandard 1993", Van dom fashionare quite annoying. Thus tile trade-off seems NostrandReinhold, New York. to favor strongly higher quality still images and a low [8] PVRG-P64Codec 1.1, Andy C. Hung, Stanford Uni- frame rate £0achieve the overall goals of a very low bit versity,June 1993. rate system. [9] U. Golz and R. Schafer,"Considerations on the Pos- sibility to ExchangeTemporal Against Spatial Reso- 9. Discussionand Conclusions lution in Image Coding," Signal Processing Image Communication2, 1990,39-51. We examinedthe perfonnanceof the H.26l coder on a QCIF sequenceat 64 and 32 kbps and 30, 15 and 5 fra- mes/sec. At 64 kbps, 15 frames per second appear to achieve the best compromise between motion rendition We are very gralefullo Sandeep Shelly for assistancc and image quality. At 32 kbps, it is not possible to with the H.261 coder simulation. achieve 30 frames/secat all, and 5 frames/secis the pre- We also acknowledge support of the UC Micro Pro ferred choice. To go below 32 kbps, we could use the gram, Pacific Bell, Lockheed and Hewlelt Packard.

Acknowledgements. ,,~ Gradation " Sbittt TransferSpeed tis -1.5 Mbitls, 2 Mbitl5

Table2. CIF and QCIF Standardsin H.261 Po vs bit xel

;::::: /' /' ""'"

o-N Figure 1. 1) ocr (H.26I); 2) wavelcl-(scalarQ + Huffman) /16 dim L VQ) 3) subband-(scalarQ + Huffman)/16 dim L VQ) 4) subband-(scalarQ + Huffman)

~"-4,2'3O.