COMPARISON OF LOSSLESS FOR CROWD-BASED QUALITY ASSESSMENT ON TABLETS

Christian Keimel, Christopher Pangerl and Klaus Diepold

[email protected], [email protected], [email protected] Institute for Data Processing, Technische Universitat¨ Munchen¨ Arcisstr. 21, 80333 Munchen¨

ABSTRACT diverse group, but also to reduce the financial expenditures significantly. Video quality evaluation with subjective testing is both time Comparisons between the results from classic and crowd- consuming and expensive. A promising new approach to sourced subjective testing in our previous contributions [1, traditional testing is the so-called crowdsourcing, moving 2] show a good correlation for some methodologies simi- the testing effort into the Internet. The advantages of this lar to usual inter-lab correlations. There are, however, still approach are not only the access to a larger and more di- some challenges [3]. In particular, the hardware variation verse pool of test subjects, but also the significant reduction and the necessity for lossless coding. of the financial burden. Extending this approach to tablets, Tablet-based video quality assessment in combination allows us not only to assess video quality in a realistic envi- with crowdsourcing offers not only a more realistic test setup ronment for an ever more important use case, but also pro- for a ever more important use case of video codecs, but vides us with a well-defined hardware platform, eliminat- can also provide a uniform hardware and software platform ing on of the main drawbacks of crowdsourced video qual- with well-defined parameters. In this contribution we there- ity assessment. One prerequisite, however, is the support fore examine a key issue for this application of crowd-based of lossless coding on the used tablets. We therefore exam- video quality assessment to tablets, the support, but also ine in this contribution the performance of lossless video performance of lossless coding technologies on tablets. A codecs on the iPad platform. Our results show, that crowd- similar concept for tablet-based quality assessment was pro- based video testing is already feasible for CIF-sized posed by Rasmussen in [4] for still images, but not for video. on tablets, but also that there may be limits for higher reso- To the best of our knowledge, this is the first contribution fo- lution videos. cusing on lossless coding performance of video codecs on tablet platforms in the context of crowd-based video quality 1. INTRODUCTION assessment. This contribution is organized as follows: after a short Video quality is usually evaluated with subjective testing, introduction into the concept of crowdsourcing and crowd- as no universally accepted objective quality metrics exist, based video quality assessment, we discuss the extension of yet. Subjective testing, however, is both time consuming the concept to tablets and the lossless coding technologies and expensive. On the one hand this is caused by the limited available for current tablets. After presenting the results of capacity of the laboratories due to both the hardware and the the performance comparison, we conclude with a short sum- requirements of the relevant standards, e.g. ITU-R BT.500, mary. on the other hand by the reimbursement of the test subjects that needs to be competitive to the general wage level at 2. CROWDSOURCING the laboratories’ locations in order to be able to hire enough qualified subjects. The term Crowdsourcing is a neologism from the words Crowdsourcing is an alternative to the classical approach crowd and outsourcing and describes the transfer of services to subjective testing that has received increased attention re- from professionals to the public via the Internet. These ser- cently. It uses the Internet to assign simple tasks to a group vices often consist of tasks which cannot or not efficiently of online workers. Hence tests are no longer performed in be solved by computers but are simple enough to be per- a standard conforming laboratory, but conducted via the In- formed by non-trained workers, e.g. tagging photos with ternet with participants from all over the world. This not meaningful key words. However, even rather complex ser- only allows us to recruit the subjects from a larger, more vices can be crowdsourced, like creative tasks such as the generation of new business ideas [5], all kinds of profes- 4. EXTENDING THE CROWD-BASED APPROACH sional design work [5] or financial services via crowd-funding TO TABLETS [6]. There are many examples where such services are per- formed by volunteers, the most prominent one may be Wiki- Extending the crowd-sourcing approach to tablets, the videos pedia, but by now there also exist a number of professional under test are no longer presented using a web browser, but platforms that connect businesses with workers willing to rather with a native application for the tablets’ operating collaborate for a small payment. system. Additionally, this allows us to assess video qual- The first and still most prominent platform was created ity in a realistic environment and use case that is becoming in 2005 by Amazon Inc. under the name Mechanical Turk ever more important. where a requester can define and place so called Human In this contribution, we chose the Apple iPad family, Intelligence Tasks (HITs). These HITs are small tasks which as it currently has a significant market share in the tablet can be performed independently of each other. Any worker category and can therefore be considered representative of who is registered at the platform may choose to perform any tablet devices. Also the large market share provides a huge HIT for the amount of payment which has been assigned to pool of potential workers for the quality assessment tasks. this HIT by the requester. There are, however, means to Compared to tablets using the Android operating system, further limit the workforce based on age, nationality, or via the limited range of devices in the iPad family provides us a qualification test [1]. also with a well-defined set of hardware capabilities. This addresses one of the main challenges mentioned in [3]: the large variation of different hardware in crowd- based video quality assessment and its potential influence on the quality assessment. In particular, only the brightness 3. CROWD-BASED VIDEO QUALITY can be modified by the user, whereas all other options re- ASSESSMENT garding the presentation e.g. colour rendition of the videos on the iPad are controlled by the operating system and can In crowd-based video quality assessment we utilize these not be changed. crowdsourcing platforms to perform subjective testing with a global worker pool, usually with a web-based application, that can be accessed via common web browsers, e.g. Firefox 5. LOSSLESS CODING or Internet Explorer. Examples of web-based audio-visual The Apple multimedia frame work available on iOS de- quality assessment applications include [1, 2, 7–10]. vices supports numerous video coding standards including Videos under test are losslessly compressed and then H.264/AVC, that supports lossless coding in its High 4:4:4 provided to the test subjects via a web interface in their Predictive profile. This built-in support for H.264/AVC, browser. In order to avoid frame freezes or similar distor- however, is limited to the High profile, thus excluding native tions due to limitations in the available bandwidth of the support for the decoding of losslessy encoded H.264/AVC transmission network, the videos are locally cached before videos. the playback starts. Subjects then assess the visual quality An alternative is the open source multimedia framework and the corresponding judgements are provided to the test FFmpeg [11]. It consists of a collection of different software manager. The aim is to keep the methodology as close as libraries that provide encoding and decoding functionality possible to the methodology used for subjective tests in a for a wide range of video codecs. Also FFmpeg is avail- lab environment. able not only for desktop operating systems, but also tablet The motivation to use is based on operating system including iOS. In the version used in this the fact that in general the worker’s web-browser and plug- contribution, the following codecs supporting lossless video ins cannot be assumed to support the original encoding for- compression are included: H.264/AVC, Motion JPEG2000, mat of the videos under test, as this would necessarily limit , FFvHuff, FFv1 and . H.264/AVC is a our research to already widely adopted coding standards and well-known ITU/ISO standard and we therefore refer to the their profiles. Therefore the videos need to be delivered ei- exising literature e.g. [12,13] for further information. Com- ther uncompressed or only using lossless compression to pared to these two standards, the HuffYUV, FFvHuff, FFv1 the workers. This enables us to consider also new coding and Lagarith codecs were developed as alternatives to un- technologies or other processing algorithms. One could of compressed video within the open-source community. course re-encode the videos for the delivery with common HuffYUV was proposed by Ben Rudiak-Gould [14] as lossy coding techniques, but then we would move further an alternative to uncompressed YCbCr video. It combines from the ideal lab setup, as we then also implicitly assess an intra-frame prediction with a consecutive Huffman en- the artefacts introduced by this additional compression. tropy coding of the residuals. The intra-frame prediction Fig. 1: Video Sequences used in the performance comparison. In the left column from top to bottom: City, Crew, Football, Foreman and Stephan. In the right column from top to bottom: MobCal, ParkRun, Shields and Stockholm

Similiar to FFvYUV and FFv1, Lagarith [18] is also based on HuffYUV.It combines the median intra-frame cod- ing model with a combination of Huffman variable run length coding followed by . Unique to Lagarith is the support of so-called null frames: if the previous frame is identical to the current frame, the current frame is skipped and the decoder will use the previous frame.

Fig. 2: Intra-frame prediction for HuffYUV 6. PERFORMANCE COMPARISON

In order to determine if lossless coding and therefore crowd- selects between three different prediction models: left, gra- based video quality assessment is feasible on a tablet, we dient and median. The first model, left, only uses the pixel compare H.264/AVC, FFvYUV, FFv1 and Lagarith with re- l to predict the pixel x as x = l, the second model, gra- spect to the achieved compression ratio and . As dient, predicts x as x = l + a − d and the median model FFvYUV is identical to HuffYUV except for the improve- selects the median value from the model left, the model gra- ment in the , we will only consider FFvYUV dient and from the pixel a above x as illustrated in Fig. 2. in the performance comparison. Note, that the model selection is not adaptive, but rather one The test device was an iPad2 with iOS 5.1.1 and for model is selected a-priori to the encoding and then used for encoding the video sequences FFmpeg version 0.7.12 was the complete video sequence. Also only a fixed table for the used. FFmpeg was compiled with the inline assembler op- Huffmann code entropy encoding is used. tion to increase the decoding performance. For H.264/AVC, FFvYUV [15] is an extension of HuffYUV developed by the High 4:4:4 Predictive profile, with both CABAC and the FFmpeg project to address some of HuffYUV’s short- CAVLAC entropy encoding was used, for all other codecs comings: instead of a fixed Huffman table, context-adaptive the FFmpeg default settings were used. Huffman tables are used for the entropy encoding. We considered in total four different video formats: CIF, FFv1 [16] was also developed by the FFmpeg project 576i50, 720p50 and 1080i50. For CIF, we used City, Crew, and is derived from HuffYUV. The main difference is that Football, Foreman and Stephan, all at a frame rate of 30 the intra-frame coding model is limited to the median model frames per second (fps). The 576i50 and 1080i50 video as described above and instead of a fixed Huffmann code sequences are MobCal, ParkRun, Shields and Stockholm. table, two different options for the entropy coding are avail- Lastly, for the 720p50 format we used the sequences Mob- able: the first option is a Golomb-Rice variable run length Cal, ParkRun and Shields. All video sequences were pro- code, the second option an arithmetic code based on [17]. vided in YCbCr, 4:2:0 format to the different encoders and 3 CIF 576i 2.75 H.264/AVC 2.5 (CABAC) H.264/AVC 2.25 (CAVLC) FFv1 2

1.75 FFvHuff compression ratio 1.5 Lagarith 1.25

1 city crew football foreman stephan mobcal parkrun shields stockholm 3 720p 1080i 2.75

2.5

2.25

2

1.75 compression ratio 1.5

1.25

1 mobcal parkrun shields mobcal parkrun shields stockholm

Fig. 3: Compression ratio for different codecs and different video formats for both interlaced formats the videos were de-interlaced to If we consider the achieved frame rates in Fig. 4, how- 50 fps before encoding. All video sequences have a length ever, we can see that unfortunately the good compression of 10 s and are shown in Fig.1. ratio seems to corresponds to more decoding complexity and slower frame rates. For the CIF format, all codecs are able to achieve a frame rate of at least 30 fps, the neces- 7. RESULTS AND DISCUSSION sary frame rate to display the video sequences without jud- der. But for the higher resolution formats, none of the video The compression ratios achieved for the different video se- codecs considered in this contribution is able to achieve the quences and codecs are presented in Fig.3, the achieved required frame rate for judder-free play back and the frame frame rate in Fig.4. Considering the compression ratios, we rate drops nearly linear with increasing spatial resolution. can see from Fig.3 that H.264/AVC with activated CABAC outperforms all other codecs for most of the CIF, 576i50 and This lack in performance could be caused by two dif- 720p50 video sequences. Only for the 1080i50 sequences, ferent issues: FFmpeg’s lack of optimisation for the iOS another , FFv1, comes close. Not surprisingly, the platform and/or the limited hardware capabilities of tablets lower the complexity of the codec, the worse the compres- in general and the iPad 2 in particular. Although the current sion performance. In particular, we can also notice the dif- iPad platform allows for GPU acceleration using OpenGL ference between CABAC and CAVLC for H.264/AVC. But ES, the available FFmpeg versions have so far not been even H.264/AVCwith CABAC as the best performing codec adapted and therefore perform all operations on the iPad’s leads to rather large bitrates and corresponding file sizes for CPU. Secondly, due the high bitrates of upto 34 Mbyte/s or the encoded video sequences: for CIF, and 576i50, the av- roughly 272 MBit/s required by the lossless coding, the uti- erage bitrate is 1.8 MByte/s and 6.5 MByte/s, respectively, lized flash memory may not be able to achieve the necessary and for 720p50 and 1080i50 the is 34 MByte/s data rate for judder-free play back. An indication for this is for both formats. a periodicity of the judder, typically also observed on desk- 120 35 CIF 576i 105 30 H.264/AVC (CABAC) 90 H.264/AVC 25 (CAVLC) 75 FFv1 20 60 FFvHuff 15 45 Lagarith frame [fps] rate frame rate [fps] 10 30

15 5

0 0 city crew football foreman stephan mobcal parkrun shields stockholm 8 14 1080i 720p 7 12 6 10 5 8 4 6 3 frame rate [fps] frame [fps] rate 4 2

2 1

0 0 mobcal parkrun shields mobcal parkrun shields stockholm

Fig. 4: Achieved frame rate for different codecs and different video formats top systems in the case of insufficient hard disk data transfer Of course another option would be to avoid lossless cod- rate. These two points, however, have not been confirmed, ing altogether and instead encode the videos under test with yet, and further studies are necessary to determine the exact . On the one hand, the bitrate must be performance bottle-neck. low enough for play back with the tablet’s native multime- dia framework, on the other hand the bitrate must be high enough so that additional coding artefacts introduced by the 8. CONCLUSION re-encoding are not perceivable. But it is unclear if such an equilibrium is achievable and additional studies are needed In this contribution we discussed the extension of crowd- to examine this option in more detail. based video quality assessment to tablets. After shortly discussing the crowdsourcing principle and its approach to video quality assessment, we extended the concept to tablets, 9. REFERENCES exploiting the uniform hardware and realistic testing envi- [1] C. Keimel, J. Habigt, C. Horch, and K. Diepold, ronment. “QualityCrowd - A Framework for Crowd-based We then examined an important part of this framework, Quality Evaluation,” in Picture Coding Symposium lossless video coding, and compared available lossless codecs (PCS) 2012, Krakow, Poland, May 2012, pp. 245–248. for the Apple iPad platform for the complete range of cur- rently used resolutions from CIF to HDTV: [2] ——, “Video Quality Evaluation in the Cloud,” in Pro- The results show that at least for video sequences in CIF ceedings International Packet Video Workshop (PV) resolution and a CIF-typical frame rate of 30 fps, a tablet 2012, Munich, Germany, May 2012, pp. 155 – 160. based video quality assessment is feasible. For higher res- olutions, however, further studies regarding possible hard- [3] C. Keimel, J. Habigt, and K. Diepold, “Challenges in ware limitations of current generation tablets and the eval- crowd-based video quality assessment,” in Quality of uation of encoder optimizations on the tablet platform are Multimedia Experience (QoMEX), 2012 Fourth Inter- necessary. national Workshop on, Jul. 2012, pp. 13 –18. [4] D. R. Rasmussen, “The mobile image quality survey [11] FFmpeg, “About FFmpeg,” Aug. 2012. [Online]. game,” in Society of Photo-Optical Instrumentation Available: http://ffmpeg.org/about.html Engineers (SPIE) Conference Series, ser. Society of Photo-Optical Instrumentation Engineers (SPIE) Con- [12] D. Marpe, T. Wiegand, and G. Sullivan, “The ference Series, vol. 8293, Jan. 2012, pp. 82 930I–1 – H.264/MPEG4 standard and 82 930I–12. its applications,” Communications Magazine, IEEE, vol. 44, no. 8, pp. 134 –143, August 2006. [5] D. C. Brabham, “Crowdsourcing as a model for prob- lem solving,” Convergence: The International Jour- [13] J. Heo, S.-H. Kim, and Y.-S. Ho, “Improved CAVLC nal of Research into New Media Technologies, vol. 14, for H.264/AVC Lossless Intra-Coding,” Circuits and no. 1, pp. 75–90, 2008. Systems for Video Technology, IEEE Transactions on, vol. 20, no. 2, pp. 213 –222, Februar 2010. [6] A. Gaggioli and G. Riva, “Working the crowd,” Sci- ence, vol. 321, no. 5895, p. 1443, 2008. [14] B. Rudiak-Gould, “Huffyuv,” Aug. 2012. [Online]. Available: http://neuron2.net/www.math.berkeley. [7] K.-T. Chen, C.-C. Wu, Y.-C. Chang, and C.-L. Lei, “A edu/benrg/huffyuv.html crowdsourceable QoE evaluation framework for mul- timedia content,” in Proceedings of the 17th ACM in- [15] M. Niedermayer, “FFvHuff,” Aug. 2012. [On- ternational conference on Multimedia, ser. MM ’09. line]. Available: http://svn.ffmpeg.org/doxygen/trunk/ ACM, 2009, pp. 491–500. huffyuv 8c-source.html [8] K.-T. Chen, C.-J. Chang, C.-C. Wu, Y.-C. Chang, [16] ——, “FFV1 Specification,” Aug. and C.-L. Lei, “Quadrant of euphoria: a crowdsourc- 2012. [Online]. Available: http://www1.mplayerhq. ∼ ing platform for QoE assessment,” Network, IEEE, hu/ michael/.html vol. 24, no. 2, pp. 28–35, Mar. 2010. [17] G. Nigel and N. Martin, “Range encoding: an algo- [9] F. Ribeiro, D. Florencio, C. Zhang, and M. Seltzer, rithm for removing redundancy from a digitised mes- “Crowdmos: An approach for crowdsourcing mean sage,” in Proceedings of the Conference on Video and opinion score studies,” in Acoustics, Speech and Sig- Data Recording, ser. IERE conference proceedings. nal Processing (ICASSP), 2011 IEEE International Institution of Electronic and Radio Engineers, Jul. Conference on, May 2011, pp. 2416–2419. 1979. [10] F. Ribeiro, D. Florencio, and V. Nascimento, “Crowd- [18] B. Greenwood, “Lagarith lossless video codec,” April sourcing subjective image quality evaluation,” in Im- 2004. [Online]. Available: http://lags.leetcode.net/ age Processing (ICIP), 2011 IEEE International Con- codec.html ference on, Sep. 2011, pp. 3158–3161.