Overview of Mpeg-5 Part 2 – Low Complexity
Total Page:16
File Type:pdf, Size:1020Kb
ITU Journal: ICT Discoveries, Vol. 3(1), 8 June 2020 OVERVIEW OF MPEG-5 PART 2 – LOW COMPLEXITY ENHANCEMENT VIDEO CODING (LCEVC) Florian Maurer1, Stefano Battista2, Lorenzo Ciccarelli3, Guido Meardi3, Simone Ferrara3 1RWTH Aachen University, Germany, 2Università Politecnica delle Marche (UNIVPM), Italy, 3V-Nova, UK Abstract – This paper provides an overview of MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC), a novel video coding standard from the MPEG ISO Working Group. The codec is designed for use in conjunction with existing video codecs, leveraging specific tools for encoding "residuals", i.e. the difference between the original video and its compressed representation. LCEVC can improve compression efficiency and reduce the overall computational complexity using a small number of specialized enhancement tools. This paper provides an outline of the coding structure of encoder and decoder, coding tools, and an overview of the performance of LCEVC with regard to both compression efficiency and processing complexity. Keywords – Low Complexity Enhancement Video Coding (LCEVC), MPEG-5 Part 2, multi-resolution video coding, video compression. 1. INTRODUCTION is the case, changing codecs cannot be done without relevant up-front investments and large amounts of This paper provides an overview of MPEG-5 Part 2 time. Accordingly, having the possibility to upgrade Low Complexity Enhancement Video Coding an ecosystem without the need to replace it (LCEVC), a new video coding standard developed by completely and still having the freedom to select a MPEG that is scheduled to be published as base codec of their choice is an important option ISO/IEC 23094-2 [1]. that operators need to have. Rather than being a replacement for existing video Further, service operators, small and big alike, are coding schemes, LCEVC is designed to leverage increasingly concerned about the cost of delivering existing (and future) codecs to enhance their a growing number of services, often using performances whilst reducing their computational decentralized infrastructures such as cloud-based complexity. It is not meant to be an alternative to systems or battery-powered edge devices. The need other codecs, but rather a useful complement to any to increase the overall efficiency of video delivery codec. systems must also be balanced with the seemingly This is achieved by a combination of processing an conflicting needs to upgrade video resolutions and input video at a lower resolution with an existing consume less power. single-layer codec and using a simple and small set Finally, the “softwarization” of solutions across the of highly specialized tools to correct impairments, technological spectrum has brought up the need to upscale and add details to the processed video. have also codec solutions which do not necessarily require a bespoke dedicated hardware for 2. COMMERCIAL REASONS FOR THE operating efficiently, but rather can operate as a STANDARD software layer on top of existing infrastructures and LCEVC was driven by several commercial needs put deliver the required performances. forward to MPEG by many leading industry experts LCEVC seeks to solve the above issues by providing from various areas of the video delivery chain, from a solution that is compatible with existing vendors to traditional broadcasters, from satellite (and future) ecosystems whilst delivering it at a providers to over-the-top (OTT) service providers lower computational cost than it would be and social media [2]. otherwise possible with a tout-court upgrade. Service providers work with complex ecosystems. Aside from rapidly improving the efficiency of They make choices on codecs based on various legacy workflows, LCEVC can also improve the factors, including maximum compatibility with business case for the adoption of next-generation their existing ecosystems, costs of deploying the codecs, by combining their superior coding technology (including royalty rates), etc. Sometimes efficiency with significantly lower processing they are forced to make certain choices. Whichever requirements. © International Telecommunication Union, 2020 Some rights reserved. This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/. More information regarding the license and suggested citation, additional permissions and disclaimers is available at: https://www.itu.int/en/journal/2020/001/Pages/default.aspx ITU Journal: ICT Discoveries, Vol. 3(1), 8 June 2020 3. KEY TECHNICAL FEATURES 3.5 Parallelization LCEVC deploys a small number of very specialized The scheme does not use any inter-block prediction. coding tools that are well suited for the type of data The image is processed by applying small (2x2 or it processes. Some of the key technical features are 4x4) independent transform kernels over the layers highlighted below. of residual data. Since no prediction is made between blocks, each 2x2 or 4x4 block can be 3.1 Sparse residual data processing processed independently and in a parallel manner. As further shown in Section 5, the coding scheme Moreover, each layer is processed separately, thus processes one or two layers of residual data. This allowing the decoding of the blocks and decoding of residual data is produced by taking differences the layers to be done in a largely parallel manner. between a reference video frame (e.g., a source video) and a base-decoded upscaled version of the 4. BITSTREAM STRUCTURE video. The resulting residual data is sparse The LCEVC bitstream contains a base layer, which information, typically edges, dots and details which may be at a lower resolution, and an enhancement are then efficiently processed using very simple and layer consisting of up to two sub-layers. The small transforms which are designed to deal with following section briefly explains the structure of sparse information. this bitstream and how the information can be 3.2 Efficient use of existing codecs extracted. The base codec is typically used at a lower While the base layer can be created using any video resolution. Because of this, the base codec operates encoder and is not specified further in the LCEVC on a smaller number of pixels, thus allowing the standard, the enhancement layer must follow the codec to use less power, operate at a lower structure as specified. Similar to other MPEG codecs quantization parameter (QP) and use tools in a [3][4], the syntax elements are encapsulated in more efficient manner. network abstraction layer (NAL) units which also help synchronize the enhancement layer 3.3 Resilient and adaptive coding process information with the base layer decoded The scheme allows the overall coding process to be information. Depending on the position of the frame resilient to the typical coding artefacts introduced within a group of pictures (GOP), additional data by traditional discrete cosine transform (DCT) specifying the global configuration and for block-based codecs. The first enhancement controlling the decoder may be present. sub-layer (L-1 residuals) enables us to correct The data of one enhancement picture is encoded artefacts introduced by the base codec, whereas the into several chunks. These data chunks are second enhancement sub-layer (L-2 residuals) hierarchically organized as shown in Fig. 1. For each enables us to add details and sharpness to the processed plane (nPlanes), up to two enhancement corrected upscaled base for maximum fidelity sub-layers (nLevels) are extracted. Each of them (up to lossless coding). Typically, the worse the again unfolds into numerous coefficient groups of base reconstruction is, the more the first layer may contribute to correct. Conversely, the better the Coefficient Group-1 data base reconstruction is, the more bit rate can be nLayers (0) allocated to the second sub-layer to add the finest details. Coefficient Group-1 data 3.4 Agnostic base enhancement nLevels (nLayers - 1) The scheme can enhance any base codec, from Plane Y existing ones (MPEG-2, VP8, AVC, HEVC, VP9, AV1, etc.) to future and under-development ones nPlanes Coefficient Group-2 data nLayers (0) (including EVC and VVC). The reason is that the enhancement operates on a decoded version of the base codec in the pixel domain, and therefore it can Plane V Coefficient Group-2 data be used on any format as it does not require any (nLayers - 1) information on how the base has been encoded and/or decoded. Fig. 1 – Encoded enhancement picture data chunk structure ITU Journal: ICT Discoveries, Vol. 3(1), 8 June 2020 entropy encoded transform coefficients. residuals. These residuals form the starting point The amount depends on the chosen type of for the encoding process of the first enhancement transform (nLayers). Additionally, if the temporal sub-layer. A number of coding tools, which will be prediction is used, for each processed plane an described further in the following subsection, additional chunk with temporal data for process the input and generate entropy encoded Enhancement sub-layer 2 is present. quantized transform coefficients. 5. CODING STRUCTURE 5.1.3 Enhancement sub-layer 2 As a last step of the encoding process, the 5.1 Encoder enhancement data for Layer 2 needs to be The encoding process to create an LCEVC generated. In order to create the residuals, the conformant bitstream is shown in Fig. 2 and can be coefficients from Layer 1 are processed by an in- depicted in three major steps. loop decoder to achieve the corresponding reconstructed picture. Since Layer 1 might have a 5.1.1 Base codec different resolution than the input sequence, the Firstly, the input sequence is fed into two reconstructed picture is processed by an upscaler, consecutive downscalers and is processed again depending on the chosen scaling mode. according to the chosen scaling modes. Any Finally, the residuals are calculated by a subtraction combination of the three available options of the input sequence and the upscaled (2-dimensional scaling, 1-dimensional scaling in the reconstruction.