Design and Implementation of a Fast HEVC Random Access Video Encoder
ALFREDO SCACCIALEPRE
Master's Degree Project
Stockholm, Sweden, March 2014
XR-EE-KT 2014:003

Contents

1 Introduction
  1.1 Background
  1.2 Thesis work
    1.2.1 Factors to consider
  1.3 The problem
    1.3.1 C65
  1.4 Methods and thesis outline
    1.4.1 Methods
    1.4.2 Objective measurement
    1.4.3 Subjective measurement
    1.4.4 Test sequences
    1.4.5 Thesis outline
    1.4.6 Abbreviations
2 General concepts
  2.1 Color spaces
  2.2 Frames, slices and tiles
    2.2.1 Frames
    2.2.2 Slices and Tiles
  2.3 Predictions
    2.3.1 Intra
    2.3.2 Inter
  2.4 Merge mode
    2.4.1 Skip mode
  2.5 AMVP mode
    2.5.1 I, P and B frames
  2.6 CTU, CU, CTB, CB, PB, and TB
  2.7 Transforms
  2.8 Quantization
  2.9 Coding
  2.10 Reference picture lists
  2.11 Gop structure
  2.12 Temporal scalability
  2.13 Hierarchical B pictures
  2.14 Decoded picture buffer (DPB)
  2.15 Low delay and random access configurations
  2.16 H.264 and its encoders
    2.16.1 H.264
3 Preliminary tests
  3.1 Speed - quality considerations
    3.1.1 Interactive applications
    3.1.2 Entertainment applications
  3.2 Compression efficiency tests
    3.2.1 C65 and HM12
    3.2.2 C65 and x264
  3.3 Conclusions
4 Implementing B pictures
  4.1 Gop structure
    4.1.1 C65's original gop structure
    4.1.2 Hierarchical gop structure
    4.1.3 Choice of gop size
    4.1.4 C65 optional parameter -hgopsize
    4.1.5 Considerations on coding delay
    4.1.6 Memory requirements for DPB
    4.1.7 Syntax elements to set
  4.2 B slices - details about implementation
    4.2.1 Gop structure and references
    4.2.2 B slices - building list 0 and list 1
    4.2.3 Generation of merge candidates
    4.2.4 Motion Vector Prediction
    4.2.5 B slices - syntax fixes
    4.2.6 Other modifications
  4.3 Motion compensation
    4.3.1 Interpolation
    4.3.2 Averaging
    4.3.3 SIMD code
  4.4 Motion estimation
5 Rate distortion optimization
  5.1 Motion estimation
  5.2 Motion estimation in C65 B
    5.2.1 Step zero: generate the merge candidates list
    5.2.2 Step one: choose best merge index
    5.2.3 Step two: test the skip mode
    5.2.4 Step three: test the intra mode
    5.2.5 Step four: test uni-prediction
    5.2.6 Step five: motion vector local search
    5.2.7 Step six: mode selection
  5.3 Quantization parameter choice
6 Results and future works
  6.1 Methodology
  6.2 Results
    6.2.1 hgopsize 8 - various qp values
    6.2.2 hgopsize 4 - various qp values
    6.2.3 hgopsize 2 - various qp values
    6.2.4 hgopsize 2 vs hgopsize 4 vs hgopsize 8
    6.2.5 C65 against C65 B
    6.2.6 Subjective test
    6.2.7 C65 B against x264
    6.2.8 C65 B against HM
    6.2.9 Encoding speed
    6.2.10 Final considerations
  6.3 Future works
    6.3.1 Gop selection
    6.3.2 Combined prediction signal for motion vector search
    6.3.3 32 x 32 and 64 x 64 mode
    6.3.4 Deblocking filter and TMVP
    6.3.5 I frames to improve with more directions
7 Conclusions
A Listings and result data
  A.1 Listings
  A.2 Result data

List of Figures

1.1 Video sequences under test
2.1 CTU and CTB
2.2 CTB split in CB
2.3 CB and PB
2.4 CB and TB
2.5 Temporal scalability
2.6 Hierarchical coding structure with 4 temporal levels
3.1 Interactive applications, low-delay mode c65 comparison against other codecs
3.2 Entertainment applications, c65 comparison against other codecs
4.1 C65 original encoding order
4.2 Dyadic hierarchical gop structures implemented in C65
4.3 Hierarchical non-dyadic gop structure size 8, -hgopsize 8n
4.4 Sample DPB content
4.5 Position of spatial candidates of motion information
4.6 Fractional positions in motion compensation
4.7 Averaging of input signal in H.264
4.8 Averaging of input signal in HEVC
5.1 Scene cut
5.2 Video sequence with scene cut
6.1 Subjective test
6.2 Hierarchical gop structure

List of Tables

3.1 C65 vs HM 12.0 random access configuration
3.2 HM 12.0 random access configuration vs same software using only one reference picture
3.3 C65 vs x264 with settings: --psnr --threads 1 --profile high --preset veryslow --tune psnr -I 48
3.4 x264 with settings: --psnr --threads 1 --profile high --preset veryslow --tune psnr -I 48 vs x264 with settings: --psnr --threads 1 --profile high --preset veryslow --ref 1 --tune psnr -I 48
4.1 Luma interpolation filter in HEVC
6.1 Qp toggling possibilities
6.2 Summary qp toggling configurations, gop size 8
6.3 Summary qp toggling configurations, gop size 4
6.4 Summary qp toggling configurations, gop size 2
6.5 Summary gop structures comparison
6.6 C65 vs C65 B
6.7 Encoding speed
6.8 C65 B vs HM 12.0 for one Intra picture
A.1 No qp toggle vs configuration -4 -1 0 1 2
A.2 No qp toggle vs configuration -4 -2 0 2 4
A.3 No qp toggle vs configuration -4 -2 2 3 4
A.4 No qp toggle vs configuration -4 -3 0 4 8
A.5 No qp toggle vs configuration -4 -2 2 4 6
A.6 No qp toggle vs configuration -4 -3 2 4 8
A.7 No qp toggle vs configuration -4 -3 -3 4 8
A.8 No qp toggle vs configuration -4 -1 0 1
A.9 No qp toggle vs configuration -4 -2 0 2
A.10 No qp toggle vs configuration -4 -3 0 3
A.11 No qp toggle vs configuration -4 -3 2 6
A.12 No qp toggle vs configuration -4 -2 2
A.13 No qp toggle vs configuration -4 -3 3
A.14 No qp toggle vs configuration -4 -3 6
A.15 Configuration -4 -3 3 vs configuration -4 -3 2 6
A.16 Configuration -4 -3 2 6 vs configuration -4 -3 -3 4 8
A.17 C65 B vs x264