Phd THESIS Retina-Inspired Image and Video Coding

UNIVERSITE´ COTEˆ D’AZUR GRADUATE SCHOOL STIC SCIENCE ET TECHNOLOGIES DE L’INFORMATION ET DE LA COMMUNICATION PhD THESIS A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Science Specialized in Signal and Image Processing presented by Effrosyni Doutsi Retina-Inspired Image and Video Coding supervised by Pr. Lionel Fillatre and Dr. Marc Antonini prepared at I3S Laboratory, SIS Team, Mediacoding Group Funded by 4G-TECHNOLOGY and ANRT Grant Defended on March 22, 2017 Committee: Referees : Pr. Janusz Konrad - Boston University Dr. Laurent Perrinet - CNRS, Aix-Marseille Université Directors : Pr. Lionel Fillatre - I3S, UniversitéCôte d’Azur Dr. Marc Antonini - I3S, CNRS Examiners : Pr. Béatrice Pesquet - Télécom ParisTech Dr. Pierre Kornbropst - Inria Pr. Fernando Pereira - University of Lisbon Invited : Mr. Julien Gaulmin - 4G-TECHNOLOGY 2 Contents 1 Introduction 9 1.1 GeneralFramework................................ 9 1.2 ContributionsandOutline. 11 1.2.1 PartII:DynamicFiltering. 11 1.2.2 Part III: Dynamic Quantization . 12 1.2.3 PartIV:Applications .......................... 13 I STATE-OF-THE-ART 15 2 State-of-the-art and Background in Compression 17 2.1 Whyweusecompression? . .. .. .. .. .. .. .. 18 2.1.1 Losslesscompression . 18 2.1.2 Lossycompression ............................ 19 2.2 CodingPrinciple ................................. 20 2.3 Rate-DistortionTheory . 20 2.3.1 Qualitative Metrics . 21 2.3.1.1 MeanSquareError(MSE) . 22 2.3.1.2 Peak Signal to Noise Ratio (PSNR) . 22 2.3.1.3 Structure SIMilarities (SSIM) . 22 2.3.1.4 PSNRvsSSIM......................... 23 2.3.2 ShannonEntropy............................. 24 2.3.2.1 HuffmanCoding ........................ 25 2.3.2.2 ArithmeticCoding. 26 2.3.2.3 Stack-RunCoding . 27 2.3.3 Rate-Distortion Optimality . 28 2.3.4 CodingUnitandComplexity . 29 2.3.5 Lagrangian Optimization . 29 2.3.5.1 Rate Allocation Problem . 31 2.3.5.2 Distortion Allocation Problem . 31 2.4 Progress inVideoCompressionStandards . ...... 32 2.4.1 VideoStream............................... 32 2.4.2 FromJPEGtoHEVC .......................... 33 2.4.3 OverviewofJPEGandJPEG2000 . 34 2.4.3.1 Discrete Cosine Transform (DCT) . 34 2.4.3.2 Discrete Wavelet Transform (DWT) . 35 2.4.3.3 Quantization . 36 2.4.3.4 Entropycoding......................... 36 2.4.3.5 Overview MJPEG and MJPEG2000 . 36 2.4.4 OverviewofMPEG-1 .......................... 37 2.4.4.1 Macroblocks .......................... 38 2.4.4.2 Motion Compensation . 38 1 2 CONTENTS 2.4.5 OverviewofMPEG-2 .......................... 39 2.4.6 Overview of H.264/MPEG-4/AVC . 40 2.4.7 OverviewofH.265 ............................ 42 2.5 Alternative Video Compression Algorithms . ....... 43 2.5.1 VP9.................................... 43 2.5.2 GreenMetadataStandard. 44 2.6 IEEE1857Standard ............................... 46 2.7 Conclusion: What is the future of video compression? . ......... 47 II DYNAMIC FILTERING 49 3 DoG Filters from Neuroscience to Image Processing 53 3.1 Introduction.................................... 53 3.2 IntroductiontotheVisualSystem . 54 3.2.1 Retina................................... 54 3.2.1.1 Photoreceptors . 56 3.2.1.2 Horizontal Cells . 56 3.2.1.3 Bipolar cells . 56 3.3 OPLApproximationModels. 56 3.3.1 SpatialDoGFilter ............................ 57 3.3.2 Separable Spatiotemporal DoG Filter . 57 3.3.3 Non-Separable Spatiotemporal Receptive Field . ....... 58 3.3.4 Non-Separable Spatiotemporal Filter . ..... 59 3.4 DoGinImageProcessing ............................ 60 3.4.1 SpatialDoGPyramid .......................... 60 3.4.2 Invertible Spatial DoG Pyramid . 61 3.4.3 Invertible Spatiotemporal DoG Pyramid . 61 3.5 Conclusion .................................... 62 4 Retina-inspired Filtering 63 4.1 Introduction.................................... 63 4.2 Definition ..................................... 64 4.3 Spatiotemporal Behavior and Convergence . ....... 65 4.4 WeightedDoGAnalysis ............................. 67 4.4.1 WDoGinSpaceDomain. .. .. .. .. .. .. .. 68 4.4.2 WDoG in Frequency Domain . 69 4.5 NumericalResults ................................ 72 4.5.1 1DInputSignal ............................. 72 4.5.2 ImageInputSignal............................ 73 4.6 Conclusion .................................... 76 5 Inverse Retina-Inspired Filtering 77 5.1 Introduction.................................... 77 5.2 Discrete Retina-inspired Filter . ...... 78 5.3 Retina-inspiredFrame . 79 5.3.1 FrameTheory .............................. 79 5.3.2 FrameProof ............................... 80 5.4 Pseudo-inverseFrame .. .. .. .. .. .. .. .. .. 82 5.5 ConjugateGradient ............................... 83 5.6 NumericalResults ................................ 84 5.6.1 Progressive Reconstruction . 85 5.6.2 Additive White Gaussian Noise . 86 CONTENTS 3 5.7 Conclusion .................................... 91 III DYNAMIC QUANTIZATION 93 6 Generation of Spikes 97 6.1 Introduction.................................... 97 6.2 GanglionCells .................................. 98 6.2.1 Morphology................................ 99 6.2.2 Functionality ............................... 99 6.3 SpikeGenerationModels. 100 6.3.1 RateCodes ................................ 101 6.3.1.1 Michaelis-Menten Function . 101 6.3.1.2 Rate as a Spike Count . 101 6.3.1.3 Rate as a Spike Density . 102 6.3.1.4 Rate as a Population Activity . 102 6.3.2 Time Code: Leaky Integrate and Fire (LIF) . 103 6.3.3 RankCode ................................ 105 6.4 Howtointerpretthespikes?. 106 6.5 Spikesincodingsystems. 107 6.5.1 RankOrderCoder(ROC). 107 6.5.2 ExtensionofROC ............................ 108 6.5.3 TimeEncodingMachine(TEM) . 108 6.5.4 A/D Bio-inspired Converter . 109 6.6 Conclusion .................................... 110 7 LIF Quantizer 113 7.1 Introduction.................................... 113 7.2 LIFQuantizer................................... 114 7.2.1 Decodingspikes.............................. 115 7.2.2 Dead-zone................................. 117 7.2.3 Quantization ............................... 117 7.2.4 Perfect-LIF dead-zone Quantizer . 120 7.2.5 Uniform-LIF dead-zone Quantizer . 121 7.2.6 Adaptive-LIF dead-zone Quantizer . 127 7.2.7 Optimized-LIF dead-zone Quantizer . 131 7.3 ProgressiveReconstruction . 135 7.4 Conclusion .................................... 136 IV APPLICATIONS 137 8 Application on Video Surveillance Data 139 8.1 VideoSurveillanceoverWSN . 140 8.1.1 TransmissionConstraints . 140 8.1.2 Network Topologies . 140 8.1.3 Pre-processingsolutions . 141 8.2 4G-TECHNOLOGY ............................... 142 8.2.1 BSViArchitecture ............................ 142 8.2.2 EViBOX/BVi Architecture . 142 8.2.3 SurveillanceScenarios . 142 8.2.3.1 WorkingArea ......................... 143 8.2.3.2 Traffic.............................. 143 4 CONTENTS 8.2.3.3 PublicArea........................... 144 8.3 OurContributions ................................ 144 8.4 VideoNumericalResults. 146 8.5 Conclusion .................................... 153 9 General Conclusion 155 9.1 Contributions................................... 155 9.1.1 Retina-inspired Filtering . 155 9.1.2 LIFQuantizer .............................. 156 9.1.3 Retina-inspired image and video codec . 156 9.2 Perspectives.................................... 156 V APPENDIX 159 A Proof of Lemma 1 161 A.1 Closed-form of Jc(t) ............................... 161 A.2 Closed-form of Js(t) ............................... 162 B List of Symbols 165 C List of Abbreviations 169 CONTENTS 5 Abstract The goal of this thesis is to propose a novel video coding architecture which is inspired by the visual system and the retina. If one sees the retina as a machine which processes the visual stimulus, it seems an intelligent and very efficient model to mimic. There are several reasons to claim that, first of all because it consumes low power, it also deals with high resolution inputs and the dynamic way it transforms and encodes the visual stimulus is beyond the current standards. We were motivated to study and release a retina-inspired video codec. The proposed algorithm was applied to a video stream in a very simple way according to the coding standards like MJPEG or MJPEG2000. However, this way allows the reader to study and explore all the advantages of the retina dynamic processing way in terms of compression and image processing. The current performance of the retina- inspired codec is very promising according to some final results which outperform MJPEG for bitrates lower than 100 kbps and MPEG-2 for bitrates higher than 70 kpbs. In addition, for lower bitrates the retina-inspired codec outlines better the content of the input scene. There are many perspectives which concern the improvement of the retina-inspired video codec which seem to lead to a groundbreaking compression architecture. Hopefully, this manuscript will be a useful tool for all the researchers who would like to study further than the perceptual capability of the visual system and understand how the structure and the functions of this efficient machine can in practice improve the coding algorithms. Résumé Cette thèse vise àproposer une nouvelle architecture de codage vidéo qui s’inspire du système visuel et de la rétine. La rétine peut être considérée comme une machine intelligente qui traite le stimulus visuel de fa¸con très efficace. De ce fait, elle représente donc un bon candidat pour développer de nouveaux systèmes de traitement d’image. Il y a plusieurs raisons pour cela, tout d’abord parce

Phd THESIS Retina-Inspired Image and Video Coding

Versatile Video Coding – the Next-Generation Video Standard of the Joint Video Experts Team

VOL. E100-C NO. 6 JUNE 2017 the Usage of This PDF File Must Comply

Security Solutions Y in MPEG Family of MPEG Family of Standards

Dynamic Resource Management of Network-On-Chip Platforms for Multi-Stream Video Processing

Parameter Optimization in H.265 Rate-Distortion by Single Frame

The H.264 Advanced Video Coding (AVC) Standard

Design and Implementation of a Fast HEVC Random Access Video Encoder

Fast Coding Unit Encoding Mechanism for Low Complexity Video Coding

ITC Confplanner DVD Pages Itcconfplanner

The H.264/MPEG4 Advanced Video Coding Standard and Its Applications

H.264/MPEG-4 Advanced Video Coding Alexander Hermans

The Open-Source Turing Codec: Towards Fast, Flexible and Parallel Hevc Encoding