Design and Implementation of a Fast HEVC Random Access Video Encoder

ALFREDO SCACCIALEPRE
Master's Degree Project
Stockholm, Sweden, March 2014
XR-EE-KT 2014:003

Contents

1 Introduction
  1.1 Background
  1.2 Thesis work
    1.2.1 Factors to consider
  1.3 The problem
    1.3.1 C65
  1.4 Methods and thesis outline
    1.4.1 Methods
    1.4.2 Objective measurement
    1.4.3 Subjective measurement
    1.4.4 Test sequences
    1.4.5 Thesis outline
    1.4.6 Abbreviations
2 General concepts
  2.1 Color spaces
  2.2 Frames, slices and tiles
    2.2.1 Frames
    2.2.2 Slices and Tiles
  2.3 Predictions
    2.3.1 Intra
    2.3.2 Inter
  2.4 Merge mode
    2.4.1 Skip mode
  2.5 AMVP mode
    2.5.1 I, P and B frames
  2.6 CTU, CU, CTB, CB, PB, and TB
  2.7 Transforms
  2.8 Quantization
  2.9 Coding
  2.10 Reference picture lists
  2.11 Gop structure
  2.12 Temporal scalability
  2.13 Hierarchical B pictures
  2.14 Decoded picture buffer (DPB)
  2.15 Low delay and random access configurations
  2.16 H.264 and its encoders
    2.16.1 H.264
3 Preliminary tests
  3.1 Speed - quality considerations
    3.1.1 Interactive applications
    3.1.2 Entertainment applications
  3.2 Compression efficiency tests
    3.2.1 C65 and HM12
    3.2.2 C65 and x264
  3.3 Conclusions
4 Implementing B pictures
  4.1 Gop structure
    4.1.1 C65's original gop structure
    4.1.2 Hierarchical gop structure
    4.1.3 Choice of gop size
    4.1.4 C65 optional parameter -hgopsize
    4.1.5 Considerations on coding delay
    4.1.6 Memory requirements for DPB
    4.1.7 Syntax elements to set
  4.2 B slices - details about implementation
    4.2.1 Gop structure and references
    4.2.2 B slices - building list 0 and list 1
    4.2.3 Generation of merge candidates
    4.2.4 Motion Vector Prediction
    4.2.5 B slices - syntax fixes
    4.2.6 Other modifications
  4.3 Motion compensation
    4.3.1 Interpolation
    4.3.2 Averaging
    4.3.3 SIMD code
  4.4 Motion estimation
5 Rate distortion optimization
  5.1 Motion estimation
  5.2 Motion estimation in C65 B
    5.2.1 Step zero: generate the merge candidates list
    5.2.2 Step one: choose best merge index
    5.2.3 Step two: test the skip mode
    5.2.4 Step three: test the intra mode
    5.2.5 Step four: test uni-prediction
    5.2.6 Step five: motion vector local search
    5.2.7 Step six: mode selection
  5.3 Quantization parameter choice
6 Results and future works
  6.1 Methodology
  6.2 Results
    6.2.1 hgopsize 8 - various qp values
    6.2.2 hgopsize 4 - various qp values
    6.2.3 hgopsize 2 - various qp values
    6.2.4 hgopsize 2 vs hgopsize 4 vs hgopsize 8
    6.2.5 C65 against C65 B
    6.2.6 Subjective test
    6.2.7 C65 B against x264
    6.2.8 C65 B against HM
    6.2.9 Encoding speed
    6.2.10 Final considerations
  6.3 Future works
    6.3.1 Gop selection
    6.3.2 Combined prediction signal for motion vector search
    6.3.3 32 x 32 and 64 x 64 mode
    6.3.4 Deblocking filter and TMVP
    6.3.5 I frames to improve with more directions
7 Conclusions
A Listings and result data
  A.1 Listings
  A.2 Result data

List of Figures

1.1 Video sequences under test
2.1 CTU and CTB
2.2 CTB split in CB
2.3 CB and PB
2.4 CB and TB
2.5 Temporal scalability
2.6 Hierarchical coding structure with 4 temporal levels
3.1 Interactive applications, low-delay mode c65 comparison against other codecs
3.2 Entertainment applications, c65 comparison against other codecs
4.1 C65 original encoding order
4.2 Dyadic hierarchical gop structures implemented in C65
4.3 Hierarchical non-dyadic gop structure size 8, -hgopsize 8n
4.4 Sample DPB content
4.5 Position of spatial candidates of motion information
4.6 Fractional positions in motion compensation
4.7 Averaging of input signal in H.264
4.8 Averaging of input signal in HEVC
5.1 Scene cut
5.2 Video sequence with scene cut
6.1 Subjective test
6.2 Hierarchical gop structure

List of Tables

3.1 C65 vs HM 12.0 random access configuration
3.2 HM 12.0 random access configuration vs same software using only one reference picture
3.3 C65 vs x264 with settings: --psnr --threads 1 --profile high --preset veryslow --tune psnr -I 48
3.4 x264 with settings: --psnr --threads 1 --profile high --preset veryslow --tune psnr -I 48 vs x264 with settings: --psnr --threads 1 --profile high --preset veryslow --ref 1 --tune psnr -I 48
4.1 Luma interpolation filter in HEVC
6.1 Qp toggling possibilities
6.2 Summary qp toggling configurations, gop size 8
6.3 Summary qp toggling configurations, gop size 4
6.4 Summary qp toggling configurations, gop size 2
6.5 Summary gop structures comparison
6.6 C65 vs C65 B
6.7 Encoding speed
6.8 C65 B vs HM 12.0 for one Intra picture
A.1 No qp toggle vs configuration -4 -1 0 1 2
A.2 No qp toggle vs configuration -4 -2 0 2 4
A.3 No qp toggle vs configuration -4 -2 2 3 4
A.4 No qp toggle vs configuration -4 -3 0 4 8
A.5 No qp toggle vs configuration -4 -2 2 4 6
A.6 No qp toggle vs configuration -4 -3 2 4 8
A.7 No qp toggle vs configuration -4 -3 -3 4 8
A.8 No qp toggle vs configuration -4 -1 0 1
A.9 No qp toggle vs configuration -4 -2 0 2
A.10 No qp toggle vs configuration -4 -3 0 3
A.11 No qp toggle vs configuration -4 -3 2 6
A.12 No qp toggle vs configuration -4 -2 2
A.13 No qp toggle vs configuration -4 -3 3
A.14 No qp toggle vs configuration -4 -3 6
A.15 Configuration -4 -3 3 vs configuration -4 -3 2 6
A.16 Configuration -4 -3 2 6 vs configuration -4 -3 -3 4 8
A.17 C65 B vs x264 ..

