
FRAME RATE PREFERENCES IN LOW BIT RATE VIDEO

Gayatri Yadavalli, Mark Masry and Sheila S. Hemami

Cornell University, School of Electrical and Computer Engineering, Ithaca, NY; email: [email protected], {masry, hemami}@ece.cornell.edu

ABSTRACT

A double stimulus subjective evaluation was performed to determine preferred frame rates at a fixed bit rate for low bit rate video. Stimuli consisted of eight reference color video sequences of size 352 × 240. These were compressed at rates of 100, 200 and 300 kbps for low, medium, and high motion sequences, respectively, using three encoders and frame rates of 10, 15 and 30 frames per second. Twenty-two viewers rated each set of frame rates on an adjectival categorical scale, and these ratings were converted into frame rate preference rankings. Their preferences were analyzed across sequence content, motion type, and encoder. Viewers preferred a frame rate of 15 frames per second across all categories, with several notable content-based exceptions.

1. INTRODUCTION

As the demand for streaming video rises among Internet users, providers are faced with new challenges to maximize video quality under fixed bandwidth constraints. At a given bit rate, it is not known how the frame rate affects the perceived quality of the video. The majority of video available today is coded at 30 frames per second (fps). At a fixed bit rate, a lower frame rate leaves more bits for each frame, which improves the quality of individual frames at the expense of the smooth motion of video coded at high frame rates. Example frames from one of the video sequences in the Video Quality Experts Group (VQEG) [1] database, coded at 10 and 30 fps, are shown in Figure 1. Given this tradeoff, it is useful for video providers to have a set of rules governing the selection of an optimal frame rate for a particular content type, given a fixed bit rate.

This paper presents the results of a subjective evaluation of viewer-preferred frame rates for a variety of low bit rate video content. Low, medium, and high motion sequences were selected; the motion content of each sequence determined the bit rate used, ranging from 100 to 300 kbps. The reference sequences were selected from a wide assortment of film and television programming to span the range of available content types.

In order to test frame rate preferences, each reference sequence was compressed with a given encoder at three different frame rates: 10, 15, and 30 fps. The resulting set of three compressed video sequences was presented and evaluated together. Since each of the eight reference sequences was processed with three encoders, at a bit rate fixed for that sequence, there were 8 × 3 = 24 such test sets, all of which were evaluated by each viewer. The evaluation was performed using the double-stimulus, five-grade adjectival categorical quality scale of ITU-R BT.500-11 [2]. A rank-based analysis of variance was performed to analyze the significance of viewer preferences in terms of sequence content, motion level, and encoder. Fifteen fps was generally preferred across categories; specific exceptions are noted and discussed.

The paper is organized as follows: the frame rate subjective preference test is described in Section 2. In Section 3, a statistical analysis is used to determine viewer preferences across subgroups. Section 4 concludes the paper.

2. TEST DESCRIPTION

This section presents the design of the frame rate preference evaluation. It includes a summary of the content chosen, followed by a discussion of the coding conditions and a description of the test environment.

2.1. Video Sequences

To determine the effects of frame rate on different content types, eight reference sequences of approximately eight seconds each were chosen specifically to represent a variety of streaming video content, and were resized to 352 × 240 pixels. All sequences were digitized in 4:2:2 YUV format. The reference sequences are described here:

Low motion:
• videoconference - a head-and-shoulders view of a woman talking to a stationary camera.
• news - news footage consisting of one scene with a stationary camera.

Medium motion:
• crowd - a moving crowd with a high level of detail and some camera motion.
• martial arts - a fight scene involving several people and fast motion.
• airport - several people walking through a moving crowd with high camera motion.
• animation - several scenes of computer animated characters with some camera motion.

High motion:
• sports - a panning shot of a football game with a large number of moving players on a stationary background.
• car chase - a very high-speed car chase scene with many cuts and extremely high motion.

2.2. Coding Conditions

Each of the sequences was encoded in color using three different motion-compensated video coding algorithms: the Sorenson Professional video coder version 2.1 [3], the University of British Columbia's H.263+ coder version 3.0 [4], and a wavelet-based rate-distortion optimized coder developed at Cornell University [5]. The three coders implement vector quantization, the Discrete Cosine Transform (DCT), and the wavelet transform, respectively, and are referred to as VQ, H.263+ and Wavelet throughout this paper. Different bit rates were chosen for each motion category: the low motion sequences were encoded at 100 kbps, the medium motion sequences at 200 kbps, and the high motion sequences at 300 kbps, in order to maintain a similar level of quality across the categories.
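The bit-rate/frame-rate tradeoff behind this design can be made concrete with simple arithmetic: at a fixed bit rate, the average budget per frame is the bit rate divided by the frame rate. The sketch below is illustrative only. It assumes bits are spread evenly over frames (real encoders vary the allocation from frame to frame, and H.263+ may drop frames) and uses the 352 × 240 frame size to express the budget in bits per pixel; the helper names are hypothetical.

```python
# Average bit budget per coded frame at a fixed channel rate.
# Illustrative assumption: bits are spread evenly across frames.
# Real encoders allocate bits unevenly, and some drop frames.

FRAME_WIDTH, FRAME_HEIGHT = 352, 240                       # sequence size in the test
BIT_RATES_KBPS = {"low": 100, "medium": 200, "high": 300}  # by motion category
FRAME_RATES_FPS = (10, 15, 30)


def kbits_per_frame(bit_rate_kbps: float, fps: float) -> float:
    """Average kilobits available to each frame."""
    return bit_rate_kbps / fps


def bits_per_pixel(bit_rate_kbps: float, fps: float) -> float:
    """Average bits per pixel of a 352 x 240 frame."""
    return bit_rate_kbps * 1000 / (fps * FRAME_WIDTH * FRAME_HEIGHT)


if __name__ == "__main__":
    for motion, rate in BIT_RATES_KBPS.items():
        for fps in FRAME_RATES_FPS:
            print(f"{motion:>6} motion, {rate} kbps, {fps:2d} fps: "
                  f"{kbits_per_frame(rate, fps):6.2f} kbit/frame, "
                  f"{bits_per_pixel(rate, fps):.3f} bpp")
```

For example, at 100 kbps a 10 fps encoding has about 10 kbit (roughly 0.12 bits per pixel) for each frame, versus about 3.3 kbit (roughly 0.04 bits per pixel) at 30 fps.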
The encoded sequences exhibited a range of blocking and blurring artifacts typical of coded video sequences due to the low bit rates used. These artifacts differed across coders. Blockiness occurs when quantization causes the appearance of distinct edges between adjacent blocks. Blurriness is the loss of high frequency detail. Sequences coded with the H.263+ and VQ coders exhibited a higher degree of blocking, while those coded with the Wavelet coder showed greater blurriness. The Wavelet coder also exhibited a pulsing effect that was most visible on low motion sequences. H.263+ was the only coder that dropped frames in order to meet bit rate targets.

2.3. Test Environment

The test environment was designed to simulate standard viewing conditions for low bit rate video as closely as possible. Room lighting was fixed at approximately 230 lux. The video sequences were displayed on a 21" Nokia Multigraph 445XPro monitor at a resolution of 1024 × 768. Viewing distance was fixed at 6 picture heights. Monitor gamma was 2.3. Maximum and minimum luminances were measured at 98.9 and 1.5 cd/m², respectively.

The test setup consisted of a single screen with two display areas. Each of the twenty-four test sets of sequences at 10, 15 and 30 fps was presented to every viewer in random order to remove contextual effects. For each set, subjects were first shown the broadcast-quality reference video, coded at 4 Mbps and 30 fps, in the left display area. This video remained available for replay at any time during the test. Subjects then viewed each of the three frame rate versions in the right display area. Three buttons labeled A, B, and C, located just below the right display area, allowed users to replay the test sequences as desired. The three encodings in each test set were assigned to these buttons in pseudorandom order to eliminate the possibility of viewer acclimatization to a button/fps combination. Viewers were able to examine all three video sequences before rating them.

Fig. 1. Example sections of frames compressed at 275 kbps using H.263+ and frame rates of (a) 30 fps and (b) 10 fps.

Viewers used a five-position slider to rate each of the three sequences in each test set on a five-grade categorical scale; ratings were then converted to ordinal rankings for the purposes of analysis. Ties were allowed in order to account for a lack of preference.

The test subjects were twenty-two viewers (eleven male and eleven female) with varying levels of experience viewing and rating video quality. Each viewer performed two full trials of the test, and the results from both trials have been consolidated into the overall results. Viewers had normal (20/20) or corrected-to-normal visual acuity.

3. RESULTS AND ANALYSIS

This section analyzes the results of testing frame rate preferences for twenty-two viewers. The rank-based Friedman Test used to perform the statistical analysis is described first. Results are then discussed in terms of sequence content, motion category, and encoder to determine the effect of each on viewer preference.

3.1. Statistical Analysis

Twenty-four test sets were generated as described in Section 2. Ratings for the sequences within a test set were converted into ordinal rankings on a three-rank scale (i.e., ranks of 1, 2 and 3) for analysis purposes. Tied values were assigned the average of the rankings they would otherwise occupy. Lower ranking numbers correspond to higher viewer preference. The rankings for each test set were grouped and summed by frame rate to determine overall preferences for each viewer. To test preference across a particular subset of sequences, the rankings of each viewer's ratings of the sequences in that subset were summed, and the sums for each viewer were then ranked again. The Friedman Test [6] was performed on the resulting set of 22 per-viewer rankings.
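The ranking procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: each viewer's data is assumed to be a list of rating triples ordered (10, 15, 30 fps) on the five-grade scale, with a higher rating meaning better quality; only exact ties receive averaged ranks (the tie tolerances used for Tables 1 and 2 are not modeled); and all function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical data layout: each rating triple is ordered (10 fps, 15 fps, 30 fps),
# on the five-grade scale where a higher rating means better quality.
FRAME_RATES = (10, 15, 30)


def average_ranks(values: Sequence[float], higher_is_better: bool) -> List[float]:
    """Rank values 1..k (1 = most preferred); exact ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for j in range(start, end + 1):
            ranks[order[j]] = shared_rank
        start = end + 1
    return ranks


def viewer_rank_sums(test_sets: Sequence[Sequence[float]]) -> List[float]:
    """Convert each test set's ratings to ranks and sum them by frame rate for one viewer."""
    sums = [0.0] * len(FRAME_RATES)
    for ratings in test_sets:
        for g, r in enumerate(average_ranks(ratings, higher_is_better=True)):
            sums[g] += r
    return sums


def friedman_rank_totals(viewers: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Re-rank each viewer's sums (lowest sum = rank 1) and total the ranks over viewers."""
    totals = [0.0] * len(FRAME_RATES)
    for test_sets in viewers:
        sums = viewer_rank_sums(test_sets)
        for g, r in enumerate(average_ranks(sums, higher_is_better=False)):
            totals[g] += r
    return totals
```

For two hypothetical viewers, `friedman_rank_totals([[[3, 5, 2], [4, 4, 2]], [[2, 4, 5]]])` returns `[5.0, 3.0, 4.0]`. Because each viewer's re-ranked values always sum to 1 + 2 + 3 = 6, the totals over n viewers sum to 6n, which is a quick consistency check on sums such as those in Table 1.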
Results are discussed The Friedman Test [6] was performed on the resulting set of 22 Sum of Viewer Rankings Preference χ2 Confidence Level by Frame Rate Grouping Motion 10 15 30 10 15 30 By Video Sequence Videoconference L 49.5 31.5 51 2 1 3 10.70 99.5% News L 31.5 36.5 62 1 2 3 24.38 99.9% Crowd M 43 31.5 56.5 2 1 3 14.25 99.9% Martial Arts M 45 33 53 3 1 2 9.23 99.0% Airport M 52 27 53 2 1 2 19.73 99.9% Animation M 39.5 39 51.5 1 1 3 8.07 98.2% Sports H 54 37 41 3 1 2 7.18 97.2% Car chase H 39.5 37.5 55 2 1 3 8.34 98.4% By Motion Low Motion L 39.5 34 58.5 2 1 3 15.02 99.9% Medium Motion M 43 32 57 2 1 3 14.27 99.9% High Motion H 47 35.5 49.5 3 1 2 5.07 92.0% By Encoder Wavelet All 54 35 43 3 1 2 8.27 98.4% H.263+ All 45 32 55 2 1 3 12.09 99.7% VQ All 37.5 36 58.5 2 1 3 14.39 99.9%

Table 1. Frame rate preferences grouped by sequence across viewers and encoders
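The χ² and confidence columns of Table 1 can be recomputed from the rank sums alone. The sketch below assumes two standard facts: the Friedman statistic for k groups is 12/(k(k + 1)) times the sum over groups of n(M_g − (k + 1)/2)², with mean rank M_g = R_g / n, so for k = 3 frame rates the factor is 1 and the scale mean is 2; and for a chi-square variable with k − 1 = 2 degrees of freedom the upper-tail probability is exactly exp(−χ²/2), so no lookup table is needed. Function names are hypothetical.

```python
import math
from typing import Sequence


def friedman_chi2(rank_sums: Sequence[float], n: int) -> float:
    """Friedman statistic from per-group rank sums R_g over n observations.

    Uses 12 / (k (k + 1)) * sum_g n * (M_g - (k + 1) / 2)^2 with M_g = R_g / n;
    for k = 3 the factor is 1 and the scale mean is 2.
    """
    k = len(rank_sums)
    scale_mean = (k + 1) / 2
    factor = 12 / (k * (k + 1))
    return factor * sum(n * (r / n - scale_mean) ** 2 for r in rank_sums)


def confidence_level(chi2: float) -> float:
    """1 - p for a chi-square statistic with 2 degrees of freedom (p = exp(-chi2 / 2))."""
    return 1.0 - math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    rows = {
        "Videoconference (Table 1)": (49.5, 31.5, 51),
        "High Motion (Table 1)": (47, 35.5, 49.5),
    }
    for label, sums in rows.items():
        chi2 = friedman_chi2(sums, n=22)
        print(f"{label}: chi2 = {chi2:.2f}, confidence = {100 * confidence_level(chi2):.1f}%")
```

For the videoconference row this gives χ² ≈ 10.70 at about 99.5% confidence, matching the tabulated values. The confidence function is only valid for three frame rates; with more groups, a chi-square survival function with k − 1 degrees of freedom would be needed instead.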

The Friedman test is a rank-based analysis of variance designed for non-parametric tests of three or more correlated samples. In this test, group rankings are summed, and the sums for each observation are rank-ordered once again. Lower rankings correspond to higher preferences, since the lowest sums indicate the most preferred frame rate for each observation. Tied rankings are given the average of the ranks they would otherwise occupy.

The null hypothesis is that the rank sums for all frame rates are equal, which in this case implies a lack of frame rate preference. The test therefore measures the statistical significance of the degree to which the sums differ across frame rates. For each frame rate g, a squared deviate is calculated by multiplying the number of observations in the group (n = 22 viewers) by the squared difference between the frame rate's mean rank M_g and the ranking scale mean of 2. The χ² value is obtained by scaling the sum of these squared deviates between groups (SS_bg). Given the test parameters (k = 3 frame rates, so the general Friedman scaling factor 12/(k(k + 1)) equals 1), the χ² value is equal to the sum of the squared deviates:

$$\chi^2 = SS_{bg} = \sum_{g} n\,(M_g - 2)^2 \qquad (1)$$

The χ² value, with k − 1 = 2 degrees of freedom, is translated to a confidence level using a standard lookup table. The confidence levels given do not represent the level of confidence that a certain frame rate is preferred above all others; rather, they signify the confidence that the sums are statistically different.

3.2. General Results

The results in the upper portion of Table 1 were derived by performing the Friedman Test on per-viewer results for each video sequence across encoders. Sums that were less than one unit apart were treated as ties. Rankings of viewer-preferred frame rates are given for each sequence and show that viewers generally preferred a frame rate of 15 fps.

Of all the sequences, news alone resulted in a slight preference for 10 fps and, correspondingly, the strongest preference against 30 fps. News was of the poorest visual quality of all the sequences. This, in combination with its low motion content, may account for viewers placing such high emphasis on improved frame quality relative to accurate representation of motion. The high confidence level associated with this sequence illustrates particularly well that the confidence level does not imply that 10 fps is strongly preferred; the sums indicate only a slight preference. Rather, the confidence level shows the statistical significance of the degree to which the sums differ.

The results for sports show a slight preference for 15 fps over 30 fps, with a definite bias against 10 fps. The content consisted of a continuous, unbroken pan of small moving figures with no scene changes. In this case, some viewers may have preferred the smoother motion of the players and reduced temporal aliasing at higher frame rates, whereas the improvement in frame quality had little effect on understanding the content of the scene. The results for animation and car chase show little preference between 10 fps and 15 fps. These were high-detail sequences, and frame quality may have been more important than in the other sequences.

Table 2 gives overall results, where the Friedman Test was performed on per-viewer rankings across all sequences and encoders. Sums that were four or fewer units apart were treated as ties. These results also suggest that a majority of viewers preferred a frame rate of 15 fps, while 30 fps was the least preferred overall. In this case the χ² value is 11.27, corresponding to a confidence level of 99.6%.

                                 10 fps   15 fps   30 fps
Sum of rankings                  42       34       56
Mean rank (Mg)                   1.909    1.545    2.545
Squared deviate n(Mg − 2)²       0.182    4.545    6.545
χ²                               11.27

Table 2. General viewer frame rate preferences across all encodings

3.3. Effect of Motion Type

An analysis of the effect of motion on preferred frame rate indicates that 15 fps is preferred across motion types. The "By Motion" results in Table 1 were derived by performing the Friedman Test on the sequences in each motion category and calculating the χ² value within each category. Notably, the confidence level is lowest for high motion sequences, suggesting that viewers had a less pronounced preference for a particular frame rate as the overall level of motion increased.

Furthermore, at low motion levels, 10 fps was consistently chosen over 30 fps as the second choice. For medium motion, the difference between the two narrows, and at high motion levels viewers exhibit only a very slight preference for 10 fps over 30 fps. This seems to indicate that the "compromise" rate of 15 fps is consistently preferred among the three choices regardless of motion content, and that 10 fps is a safe second choice, particularly at low levels of motion.

3.4. Effect of Coder Choice

As shown in the bottom portion of Table 1, an analysis of frame rate preferences across coders also suggests that 15 fps is the best general choice. Results were derived by examining viewer responses to all sequences generated using the same encoder and calculating the χ² value for each encoder. Although viewers' first preference was consistently 15 fps, the second choice varied by encoder: for sequences generated using the Wavelet coder, 30 fps was the second choice, whereas for the H.263+ and VQ coders, 10 fps was second. For sequences encoded using VQ, there was little preference between 10 and 15 fps.

4. CONCLUSIONS

At a fixed bit rate, video compression algorithms allocate more bits to each frame when coding a sequence at low frame rates than at high frame rates. This comes at the expense of the quality of the motion reproduced in the coded video. A subjective test was performed to determine viewer-preferred frame rates for a variety of low bit rate video content. The results indicate that 15 fps is preferred across different types of content, and this result was found to be independent of the encoder used to compress the sequences. A rank-based analysis of variance confirms that frame rate had a statistically significant effect on viewer preference. Fifteen fps was the evident viewer preference in almost all cases, perhaps because it represents a compromise between frame quality and motion quality.

There were notable exceptions to this general trend. The results for news, animation, and car chase indicated that viewers prefer lower frame rates for poorer quality, low-motion, or high-detail content. For high-motion, unbroken panning sequences such as sports, viewers preferred higher frame rates.

Preliminary testing with similar sequences at slightly higher bit rates suggested a preference for higher frame rates, perhaps because motion quality becomes a more important consideration once adequate frame quality has been achieved. The analogous experiment for much lower bit rates has not been conducted. Further work will attempt to determine the bit rates at which viewer preferences shift to higher or lower frame rates, the results of which will further enable providers to tailor their content to available rates.

5. REFERENCES

[1] VQEG, "Final Report from the Video Quality Experts Group on the validation of objective models of video quality assessment," http://www.vqeg.org, 2000.
[2] Methodology for the Subjective Assessment of the Quality of Television Pictures, Recommendation ITU-R BT.500-10, ITU Telecom. Standardization Sector of ITU, August 2000.
[3] Sorenson Media, "Sorenson Video Version 2," http://www.sorenson.com, 2000.
[4] Processing Lab, University of British Columbia, "TMN (H.263+) encoder/decoder, version 3.0," http://www.ee.ubc.ca/image, September 1997.
[5] Y. Yang and S. S. Hemami, "Generalized rate-distortion optimizations for motion-compensated video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 6, pp. 942-955, September 2000.
[6] J. Devore, "Probability and Statistics for Engineering and the Sciences," Duxbury Press, December 1999.
[7] M. Masry, S. S. Hemami, A. M. Rohaly, and W. Osberger, "Subjective quality evaluation of low bit rate video," Proceedings of the SPIE Conference on Human Vision and Electronic Imaging, San Jose, CA, pp. 195-195, January 2001.