
Title: Enhancing Libva-Utils with VP8 and HEVC Encoding and Temporal Scalability for VP8

= Context =

Many modern CPUs support hardware acceleration for processing, decoding and encoding video. Hardware acceleration lowers the CPU time needed for handling videos, and it can also be more energy-efficient, resulting in longer battery life for mobile devices. Applications can leverage these capabilities with interfaces that enable access to hardware acceleration from user mode.

For CPUs on Linux, this interface is called VA-API. It consists of “a main library and driver-specific acceleration backends for each supported hardware vendor” [1]. The main library is called libva, and the driver-specific backend for integrated Intel GPUs is called intel-vaapi-driver.

In this project, I want to focus on libva-utils, written in C. It provides simple reference applications for encoding, decoding and video processing and includes a number of API conformance tests for libva (which are implemented using the GoogleTest framework). Libva-utils is designed to test the hardware acceleration and API stack and serves as a starting point for further development, such as including VA-API acceleration into application software and multimedia frameworks.

= Problem =

As hardware advances, VA-API must reflect these changes. Most available engineering resources are invested in the backend (intel-vaapi-driver) and the main library (libva). Regarding libva-utils, several areas can be identified that have lagged behind the rest of the stack. Libva-utils in its current state does not include reference encoders for either VP8 or HEVC. Further encoder-specific enhancements, such as temporal scalability, are currently only available for selected codecs (e.g., H.264). With the present proposal, I intend to contribute to an up-to-date libva-utils.

= Project Goals/Deliverables =

* Implement a sample encoder application for the VP8 codec (called vp8enc).
* Implement a sample encoder application for the HEVC codec (called hevcenc).
* Add automated testing for the encoders.
* Add temporal scalability to the VP8 encoder.
* Optional: Add temporal scalability to the VP9 encoder.

= Prerequisites =

VP8 and HEVC encoding are supported on Intel Gen9 GPUs and higher; therefore, a Skylake or newer system is needed for development and testing. I plan to set up a low-cost, remotely accessible Kaby Lake Celeron server running a recent 64-bit Ubuntu as the main development machine for this project. On the software side, having a self-contained stack with intel-vaapi-driver, libva and libva-utils would allow different versions to be tested independently. The entire stack should be compiled in a way that all relevant shared libraries, such as libva*.so and libi965_drv_video.so, can be placed in arbitrary locations. This can be done by setting the relevant environment variables (such as LIBVA_LIBS at compile time and LIBVA_DRIVERS_PATH at runtime). This allows the stack to coexist with the standard Ubuntu packages and several versions of the stack to be installed on the same machine.

= Implementation =

== VP8 Encoder Application ==

Developing a simple VP8 encoder application demands a basic understanding of the VP8 codec [2], its bitstream [3], the IVF container [4] and VA-API. Because we are interfacing with a hardware encoder, and since most of the intra-frame processing is transparent to the user, it is not necessary to understand every detail of the VP8 codec. Nevertheless, we must know how VP8 handles inter-frame processing because we must provide the necessary buffers for the reference frames.

Libva-utils already includes a reference implementation of a VP9 encoder application [5], which can be used as a starting point to prototype a VP8 encoder. To understand which parts of the code need modification, it is worth investigating the differences between VP8 and VP9. The following list is a first attempt to identify these differences, but it may not currently be complete:

* Superblocks and Macroblocks

VP8 processes a frame in fixed-size macroblocks (16x16 luma samples with the corresponding 8x8 chroma blocks). VP9 uses a more flexible approach: so-called superblocks of up to 64x64 pixels, which can be further subdivided into smaller blocks down to 4x4 pixels. This helps VP9 perform better with high-resolution content. For example, imagine encoding a uniformly blue sky: the corresponding blocks do not hold much frequency information and can therefore be encoded more efficiently in one large block. Having had a brief look at the source of vp9enc.c, I am inclined to assume that, when going through VA-API, this difference is transparent to the user.

* Segments

Both codecs support segmenting frames: each macroblock or superblock can be individually assigned to a segment number. The segments need not be contiguous or follow any predefined order [6]. Each segment can be processed with its own quantization and filter settings. VP8 and VP9 differ in the maximum number of segments per frame: VP8 allows four segments and VP9 allows eight. In libva, VP8 and VP9 segments are also exposed slightly differently, as a comparison of va_enc_vp8.h and va_enc_vp9.h shows, so some adaptation work in this regard can be expected.
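As a first impression of the VP8 side, the following minimal sketch shows how per-segment quantization could be expressed, assuming the VAQMatrixBufferVP8 layout from va_enc_vp8.h (one base quantization index per segment); the concrete values are arbitrary examples:

#include <va/va_enc_vp8.h>

/* Minimal sketch: fill one base quantization index per VP8 segment (four
 * segments at most). The structure layout is assumed from va_enc_vp8.h and
 * needs to be verified; the values are arbitrary examples. */
static void fill_vp8_segment_quant(VAQMatrixBufferVP8 *quant)
{
    quant->quantization_index[0] = 60; /* segment 0 */
    quant->quantization_index[1] = 50; /* segment 1 */
    quant->quantization_index[2] = 40; /* segment 2 */
    quant->quantization_index[3] = 30; /* segment 3 */
}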

* Reference Frames

Both codecs use reference frames. VP8- and VP9-encoded frames can have up to three references (the last frame, the golden frame and an alternative reference frame). In VP9, the potential reference frames are stored in a pool of eight frames; for example, in va_enc_vp9.h, the _VAEncPictureParameterBufferVP9 structure holds an element VASurfaceID reference_frames[8]. Currently, I am unsure whether the VP8 implementation uses such a pool; in any case, reference frames are handled slightly differently. Generally speaking, reference frames are part of the inter-frame processing, and since the frames must be allocated (by calling vaCreateSurfaces()), it is certain that this part of the VP8 encoder application must be reworked.
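To illustrate the expected rework, here is a minimal sketch of allocating the reconstructed and reference surfaces with vaCreateSurfaces() and wiring them into the VP8 picture parameter buffer; the field names (reconstructed_frame, ref_last_frame, ref_gf_frame, ref_arf_frame) are assumed from va_enc_vp8.h, and error handling is reduced to a bare minimum:

#include <va/va.h>
#include <va/va_enc_vp8.h>

/* Minimal sketch: allocate the surfaces for the VP8 reference structure.
 * Assumes "va_dpy" is an initialized VADisplay and "width"/"height" are the
 * frame dimensions; most error handling is omitted for brevity. */
static VAStatus setup_vp8_references(VADisplay va_dpy, unsigned int width,
                                     unsigned int height,
                                     VAEncPictureParameterBufferVP8 *pic_param)
{
    VASurfaceID surfaces[4]; /* reconstructed + last/golden/altref */
    VAStatus va_status;

    va_status = vaCreateSurfaces(va_dpy, VA_RT_FORMAT_YUV420,
                                 width, height, surfaces, 4, NULL, 0);
    if (va_status != VA_STATUS_SUCCESS)
        return va_status;

    /* Field names as found in va_enc_vp8.h (assumption to be verified). */
    pic_param->reconstructed_frame = surfaces[0];
    pic_param->ref_last_frame      = surfaces[1];
    pic_param->ref_gf_frame        = surfaces[2];
    pic_param->ref_arf_frame       = surfaces[3];

    return VA_STATUS_SUCCESS;
}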

* Tiles and Data Partitioning

VP9 uses tiling to divide frames into sections that can be processed independently. VP8 instead uses data partitioning, which allows the macroblock modes and motion vectors to be entropy-coded independently of the quantized transform coefficients. Both techniques address the need to parallelize work across several CPU cores. VP8 does not support tiling and VP9 does not support data partitioning. Implementing data partitioning in the VP8 encoder application can most likely be done by setting the auto_partitions flag in the _VAEncPictureParameterBufferVP8 structure.
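If the flag behaves as its name suggests, enabling data partitioning from the encoder application might look like the following sketch (both flag names are assumed from va_enc_vp8.h and would need to be verified against the driver):

#include <va/va_enc_vp8.h>

/* Sketch: request data partitioning for VP8. The flag names are assumed from
 * va_enc_vp8.h; whether the driver honours them must be verified. */
static void enable_vp8_data_partitioning(VAEncPictureParameterBufferVP8 *pic_param)
{
    pic_param->pic_flags.bits.auto_partitions = 1;      /* driver picks the partition layout */
    pic_param->pic_flags.bits.num_token_partitions = 2; /* log2 of token partitions (assumed) */
}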

Developing the VP8 encoder application can be done in incremental steps. This process would start with an I-frame-only encoder and later add support for segments and reference frames as well as VP8-specific features such as data partitioning.

== HEVC Encoder Application ==

As for the previous task, the prerequisites for developing a simple HEVC encoder application are a basic understanding of the HEVC codec [7], its bitstream and container, as well as the corresponding VA-API. For reference, an H.264 encoder is already available in libva-utils. Studying other sources such as libyami [8] and gstreamer-vaapi [9] may also be worthwhile. Comparing H.264 to HEVC hints at which parts of the implementation require special attention:

* Macroblocks and Coding Tree Units

H.264 uses fixed-size 16x16 macroblocks (with the corresponding 8x8 chroma blocks), which can be partitioned for prediction into blocks as small as 4x4. HEVC (like VP9) addresses the need for high-resolution content by dividing the frame into coding tree units (CTUs), “which can use larger block structures of up to 64×64 pixels and can better sub-partition the picture into variable sized structures” [10]. The coding tree is a quadtree, i.e., each subdivision has four children, and coding blocks can be split down to 8x8 pixels (with prediction and transform blocks as small as 4x4). My guess is that this is handled internally and the encoder application need not interfere at this level.

* Prediction Modes

In HEVC, the number of intra prediction modes has increased to 35 (from nine 4x4 intra modes in H.264). Again, this difference is most likely handled internally.

* Reference Frames

H.264 can have up to 16 reference frames, whereas in HEVC, the number of references is 2x8, meaning that the same reference frame can be used more than once but with different weights. The total number of unique references in HEVC is eight.

* Others

This list is certainly not complete at this time. In addition, more research on HEVC, its bitstream and container is required. The time needed to become familiar with both codecs is reserved in the timeline.

== Automated Encoder Test ==

To test the developed encoders automatically, the following criteria for encoder output can be evaluated:

* Decodability: tests whether the bitstream can be decoded without errors
* Number of frames: checks whether the number of input frames equals the number of output frames
* Resolution: checks whether input and output resolutions match
* Frame content: tests for content reproduction

All tests require a corresponding decoder, either from within libva-utils or an external one. Testing frame content is a bit more involved, as it includes generating suitable test patterns, encoding, decoding and automated analysis of the decoded pattern. I would like to propose QR codes as a test pattern that is simple both to generate and to analyze. Because we are testing lossy codecs, the QR dots must be the size of a macroblock (or an integer multiple of it) to keep distortion from the coding process low. Depending on the QR profile used, a certain percentage of dot defects can also be tolerated (7–30%). Rather than quantifying the quality of the resulting image, this test is limited to basic image reproduction and only qualifies as a pass/fail test.

This test could be extended in two ways:

* Testing interframe encoding by using a series of QR codes

While this test is intended for static images, it can be enhanced to handle interframe encoding by using a series of QR codes. For example, the frame number (such as “frame:%03d”) can be encoded into each QR code. Because the QR dots are chosen to be the size of a macroblock (and placed in the exact macroblock raster), the motion predictor can easily make a reference to another white or black dot. This should work well with VP8 and H.264. For VP9 or HEVC, with superblocks and CTUs, it may be more complicated, but by using QR dots that are large enough, the probability of success is higher.

* Testing chroma processing

Because video codecs process luma (Y) and chroma (UV) components separately, the encoder could be tested against different input colors. By using colors that are orthogonal in the YUV color space, this test could be further enhanced to test channel separation between the components. This case would require special frames that code three QR codes into one image by using one of the three components Y, U or V to store the QR dots of each code.

I performed preliminary testing for this method on the vp9enc encoder using static (single-frame) content with a QR dot size of 16x16. Therefore, I expect this test methodology to work for the static case; for the extended tests, more experiments must be carried out to determine whether they are practicable.
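To sketch how such a test frame could be generated programmatically, the following example renders a QR code into the luma plane of a raw frame, scaling each QR module to one 16x16 macroblock and aligning it to the macroblock raster. It uses libqrencode (see the packages mentioned below); the helper name and the assumption that the frame is large enough for the code are mine:

#include <stdint.h>
#include <string.h>
#include <qrencode.h>

#define MB_SIZE 16 /* one QR module per 16x16 macroblock */

/* Sketch: render a QR code into the luma plane of an I420 test frame, one
 * macroblock per QR module, aligned to the macroblock raster. The caller must
 * ensure the frame is at least qr->width * MB_SIZE pixels in both dimensions;
 * the chroma planes are left untouched here. */
static void render_qr_test_frame(uint8_t *luma, int width, int height,
                                 const char *text)
{
    QRcode *qr = QRcode_encodeString(text, 0, QR_ECLEVEL_H, QR_MODE_8, 1);
    if (!qr)
        return;

    memset(luma, 235, (size_t)width * height); /* white background (video range) */

    for (int y = 0; y < qr->width; y++) {
        for (int x = 0; x < qr->width; x++) {
            if (!(qr->data[y * qr->width + x] & 1)) /* LSB set means black module */
                continue;
            for (int dy = 0; dy < MB_SIZE; dy++)
                memset(luma + (size_t)(y * MB_SIZE + dy) * width + x * MB_SIZE,
                       16, MB_SIZE); /* black macroblock-sized dot */
        }
    }
    QRcode_free(qr);
}

Feeding frames generated with, e.g., render_qr_test_frame(luma, w, h, "frame:001") to the encoder and decoding the output with the zbar tools would then give a simple pass/fail criterion, including for the inter-frame variant described above.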

Encoders and decoders for QR codes are readily available as standard Ubuntu packages (such as libqrencode3 and zbar-utils) [11].

== Temporal Scalability for VP8 (and Optionally VP9) ==

Scalable video coding (SVC) is a technique for encoding a video stream in a way that enables its distribution to clients with different bandwidth requirements, without the need to transcode the stream. The video is encoded only once, and elements of the stream are selectively omitted during distribution in order to meet the desired bandwidth. SVC can operate on three kinds of layers [12]:

* Temporal: using different framerates
* Spatial: using different frame sizes
* Quality: using different levels of encoding quality

In the present proposal, I am focusing only on temporal scalability. Distributing the same video stream at different framerates means skipping frames. For example, a video recorded and encoded at 60 fps can be distributed at 60 fps, 30 fps (1/2), 20 fps (1/3), 15 fps (1/4) or perhaps only 10 fps (1/6). This may sound easy at first, but it becomes more complicated when considering that video codecs encode frames using references to other frames in the stream. This method is called inter-frame prediction and introduces dependencies between frames. When skipping frames, it is therefore important that all dependencies of the remaining frames stay intact.

This can be achieved by creating a hierarchy of temporal layers. For example [13], all frames in Layer 0 depend on Layer 0 frames only, while frames in Layer 1 may depend on Layer 1 and Layer 0 frames. By removing all Layer 1 frames, all dependencies of the remaining Layer 0 frames are still met. Care must be taken during encoding so that references are chosen in a way that establishes this temporal-layer hierarchy. The WEBM project provides a reference implementation that lists several layer patterns for this purpose [14]. When it comes to reference frames for inter-frame prediction, VP8 and VP9 work very similarly, as both can use up to three references (the last frame, the golden frame and an alternative reference). On the bitstream level, the temporal-layer ID is stored in the temporal-layer index field of the RTP payload descriptor and is encoded slightly differently depending on whether VP8 or VP9 is used. VP8 allows up to four different temporal layers [15], and VP9 up to eight [16].
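To make the layer hierarchy more concrete, here is a minimal sketch of a two-layer pattern (the structure and field names are illustrative, loosely following the examples in [13] and [14], and are not taken from libva): even frames form Layer 0, refresh the last-frame reference and are kept when the stream is thinned to half the framerate; odd frames form Layer 1, are never used as a reference and can therefore be dropped safely:

/* Sketch: two temporal layers. Even frames (layer 0) refresh the "last"
 * reference; odd frames (layer 1) only reference it and can be dropped.
 * Structure and field names are illustrative, not part of libva. */
struct temporal_layer_desc {
    int layer_id;      /* temporal-layer index to signal for this frame */
    int refresh_last;  /* does this frame update the last-frame reference? */
};

static const struct temporal_layer_desc two_layer_pattern[2] = {
    { 0, 1 },  /* layer 0: base layer, kept at half framerate */
    { 1, 0 },  /* layer 1: disposable, no other frame depends on it */
};

static const struct temporal_layer_desc *layer_for_frame(unsigned frame_num)
{
    return &two_layer_pattern[frame_num % 2];
}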

For the implementation, I plan to start by adding temporal scalability support to vp8enc and, if time allows, to port it to vp9enc afterwards. One way of testing the temporal scalability of the encoders is to write a simple frame-skipper application that analyzes the bitstream, removes the frames of a specific temporal layer and adjusts the bitstream so that it remains processable. The resulting stream could then be fed to the testing procedure proposed above.
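A rough sketch of such a frame skipper, based on the IVF layout described in [4] (a 32-byte file header followed by frames that each carry a 12-byte header starting with a little-endian 32-bit payload size). Since VP8 signals the temporal-layer index in the RTP payload descriptor rather than in the stored frame data, this sketch derives the layer from the frame index using the hypothetical two-layer pattern above:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch of a frame skipper for IVF files [4]: copy the 32-byte file header,
 * then copy only the frames whose temporal layer does not exceed max_layer.
 * The layer is derived from the frame index (two-layer pattern from above). */
static int skip_layers(FILE *in, FILE *out, unsigned max_layer)
{
    uint8_t file_hdr[32], frame_hdr[12];
    unsigned frame_num = 0;

    if (fread(file_hdr, 1, sizeof(file_hdr), in) != sizeof(file_hdr))
        return -1;
    /* Note: the frame-count field in the file header would need fixing up. */
    fwrite(file_hdr, 1, sizeof(file_hdr), out);

    while (fread(frame_hdr, 1, sizeof(frame_hdr), in) == sizeof(frame_hdr)) {
        /* Frame header: 4-byte little-endian payload size, 8-byte timestamp. */
        uint32_t size = frame_hdr[0] | frame_hdr[1] << 8 |
                        frame_hdr[2] << 16 | (uint32_t)frame_hdr[3] << 24;
        uint8_t *payload = malloc(size);

        if (!payload || fread(payload, 1, size, in) != size) {
            free(payload);
            return -1;
        }
        if (frame_num % 2 <= max_layer) { /* keep frame if its layer is wanted */
            fwrite(frame_hdr, 1, sizeof(frame_hdr), out);
            fwrite(payload, 1, size, out);
        }
        free(payload);
        frame_num++;
    }
    return 0;
}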

= Timeline =

Week 1: Become familiar with VP8
Week 2: Program the VP8 encoder application
Week 3: Develop automated testing and enhance the VP8 encoder
Week 4: Document/merge/evaluate

Week 5: Become familiar with HEVC
Week 6: Program the HEVC encoder application
Week 7: Develop automated testing and enhance the HEVC encoder
Week 8: Document/merge/evaluate

Week 9: Become familiar with VP8 and VP9 temporal scalability
Week 10: Implement temporal scalability for VP8
Week 11: Implement temporal scalability for VP9 (optional)
Week 12: Document/merge/evaluate

The task of implementing temporal scalability for VP9 is optional. If time allows, I would like to use week 11 to port temporal scalability to vp9enc; otherwise, week 11 will be used to compensate for delays.

= About me =

My name is Georg Ottinger, and I am pursuing a master’s degree in practical computer science at the University of Hagen. Before that, I completed a master’s degree in sociology (University of Vienna) and worked for several years as an embedded software engineer, both as an employee and as a contractor, gaining decent practice in C/C++ development. I am also inclined towards electrical engineering and have initiated open-source hardware projects in the domain of audio and video streaming.

OggStreamer is a device enabling simple live audio streaming setups, and it is used by many independent radio stations (in Europe and abroad). Its prototype won second place in the Lantronix Design Contest 2010, and it can be considered a finished product. See https://oggstreamer.wordpress.com/ . I started VideoBrick with two friends; it is intended to perform live capturing from HDMI. We built a prototype and coded a proof of concept but then ran out of energy. Nevertheless, we documented our progress here: https://videobrick.wordpress.com/

Currently, I am focusing on progressing with my studies, and I really like the idea of learning more about video codecs. I am excited about royalty-free codecs such as VP8, VP9 and AV1.

= References =

[1] VA-API (Video Acceleration API) user mode driver for the Intel GEN Graphics family https://github.com/intel/intel-vaapi-driver

[2] Technical overview of VP8, an open source video codec for the web https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37073.pdf

[3] RFC 6386: VP8 Data Format and Decoding Guide https://tools.ietf.org/html/rfc6386

[4] IVF Container https://wiki.multimedia.cx/index.php/IVF

[5] Libva-utils VP9 encoder https://github.com/intel/libva-utils/blob/master/encode/vp9enc.c

[6] An analysis of VP8, a new video codec for the web http://scholarworks.rit.edu/cgi/viewcontent.cgi?article=4188&context=theses

[7] Overview of the High Efficiency Video Coding (HEVC) Standard http://iphome.hhi.de/wiegand/assets/pdfs/2012_12_IEEE-HEVC-Overview.pdf

[8] Yet Another Media Infrastructure https://github.com/intel/libyami

[9] VA-API support to GStreamer https://github.com/GStreamer/gstreamer-vaapi

[10] Coding tree unit https://en.wikipedia.org/wiki/Coding_tree_unit

[11] QR code: Encode and Decode QR code on linux command line https://tuxthink.blogspot.co.at/2014/01/qr-code-encode-and-decode-qr-code-on.html

[12] Chrome’s WebRTC VP9 SVC Layer Cake https://webrtchacks.com/chrome-vp9-svc/

[13] HOWTO Use temporal scalability to adapt video bitrates http://www.rtcbits.com/2017/04/howto-implement-temporal-scalability.html

[14] vpx_temporal_svc_encoder (WEBM Project) https://www.webmproject.org/docs/webm-sdk/example_vpx_temporal_svc_encoder.html

[15] RTP Payload Format for VP8 Video https://tools.ietf.org/html/rfc7741#section-4.2

[16] RTP Payload Format for VP9 Video (draft-ietf-payload-vp9-04) https://tools.ietf.org/html/draft-ietf-payload-vp9-04#section-4.1