Phd THESIS Retina-Inspired Image and Video Coding
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITE´ COTEˆ D’AZUR GRADUATE SCHOOL STIC SCIENCE ET TECHNOLOGIES DE L’INFORMATION ET DE LA COMMUNICATION PhD THESIS A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Science Specialized in Signal and Image Processing presented by Effrosyni Doutsi Retina-Inspired Image and Video Coding supervised by Pr. Lionel Fillatre and Dr. Marc Antonini prepared at I3S Laboratory, SIS Team, Mediacoding Group Funded by 4G-TECHNOLOGY and ANRT Grant Defended on March 22, 2017 Committee: Referees : Pr. Janusz Konrad - Boston University Dr. Laurent Perrinet - CNRS, Aix-Marseille Universit´e Directors : Pr. Lionel Fillatre - I3S, Universit´eCˆote d’Azur Dr. Marc Antonini - I3S, CNRS Examiners : Pr. B´eatrice Pesquet - T´el´ecom ParisTech Dr. Pierre Kornbropst - Inria Pr. Fernando Pereira - University of Lisbon Invited : Mr. Julien Gaulmin - 4G-TECHNOLOGY 2 Contents 1 Introduction 9 1.1 GeneralFramework................................ 9 1.2 ContributionsandOutline. 11 1.2.1 PartII:DynamicFiltering. 11 1.2.2 Part III: Dynamic Quantization . 12 1.2.3 PartIV:Applications .......................... 13 I STATE-OF-THE-ART 15 2 State-of-the-art and Background in Compression 17 2.1 Whyweusecompression? . .. .. .. .. .. .. .. 18 2.1.1 Losslesscompression . 18 2.1.2 Lossycompression ............................ 19 2.2 CodingPrinciple ................................. 20 2.3 Rate-DistortionTheory . 20 2.3.1 Qualitative Metrics . 21 2.3.1.1 MeanSquareError(MSE) . 22 2.3.1.2 Peak Signal to Noise Ratio (PSNR) . 22 2.3.1.3 Structure SIMilarities (SSIM) . 22 2.3.1.4 PSNRvsSSIM......................... 23 2.3.2 ShannonEntropy............................. 24 2.3.2.1 HuffmanCoding ........................ 25 2.3.2.2 ArithmeticCoding. 26 2.3.2.3 Stack-RunCoding . 27 2.3.3 Rate-Distortion Optimality . 28 2.3.4 CodingUnitandComplexity . 29 2.3.5 Lagrangian Optimization . 29 2.3.5.1 Rate Allocation Problem . 31 2.3.5.2 Distortion Allocation Problem . 31 2.4 Progress inVideoCompressionStandards . ...... 32 2.4.1 VideoStream............................... 32 2.4.2 FromJPEGtoHEVC .......................... 33 2.4.3 OverviewofJPEGandJPEG2000 . 34 2.4.3.1 Discrete Cosine Transform (DCT) . 34 2.4.3.2 Discrete Wavelet Transform (DWT) . 35 2.4.3.3 Quantization . 36 2.4.3.4 Entropycoding......................... 36 2.4.3.5 Overview MJPEG and MJPEG2000 . 36 2.4.4 OverviewofMPEG-1 .......................... 37 2.4.4.1 Macroblocks .......................... 38 2.4.4.2 Motion Compensation . 38 1 2 CONTENTS 2.4.5 OverviewofMPEG-2 .......................... 39 2.4.6 Overview of H.264/MPEG-4/AVC . 40 2.4.7 OverviewofH.265 ............................ 42 2.5 Alternative Video Compression Algorithms . ....... 43 2.5.1 VP9.................................... 43 2.5.2 GreenMetadataStandard. 44 2.6 IEEE1857Standard ............................... 46 2.7 Conclusion: What is the future of video compression? . ......... 47 II DYNAMIC FILTERING 49 3 DoG Filters from Neuroscience to Image Processing 53 3.1 Introduction.................................... 53 3.2 IntroductiontotheVisualSystem . 54 3.2.1 Retina................................... 54 3.2.1.1 Photoreceptors . 56 3.2.1.2 Horizontal Cells . 56 3.2.1.3 Bipolar cells . 56 3.3 OPLApproximationModels. 56 3.3.1 SpatialDoGFilter ............................ 57 3.3.2 Separable Spatiotemporal DoG Filter . 57 3.3.3 Non-Separable Spatiotemporal Receptive Field . ....... 58 3.3.4 Non-Separable Spatiotemporal Filter . ..... 59 3.4 DoGinImageProcessing ............................ 60 3.4.1 SpatialDoGPyramid .......................... 60 3.4.2 Invertible Spatial DoG Pyramid . 61 3.4.3 Invertible Spatiotemporal DoG Pyramid . 61 3.5 Conclusion .................................... 62 4 Retina-inspired Filtering 63 4.1 Introduction.................................... 63 4.2 Definition ..................................... 64 4.3 Spatiotemporal Behavior and Convergence . ....... 65 4.4 WeightedDoGAnalysis ............................. 67 4.4.1 WDoGinSpaceDomain. .. .. .. .. .. .. .. 68 4.4.2 WDoG in Frequency Domain . 69 4.5 NumericalResults ................................ 72 4.5.1 1DInputSignal ............................. 72 4.5.2 ImageInputSignal............................ 73 4.6 Conclusion .................................... 76 5 Inverse Retina-Inspired Filtering 77 5.1 Introduction.................................... 77 5.2 Discrete Retina-inspired Filter . ...... 78 5.3 Retina-inspiredFrame . 79 5.3.1 FrameTheory .............................. 79 5.3.2 FrameProof ............................... 80 5.4 Pseudo-inverseFrame .. .. .. .. .. .. .. .. .. 82 5.5 ConjugateGradient ............................... 83 5.6 NumericalResults ................................ 84 5.6.1 Progressive Reconstruction . 85 5.6.2 Additive White Gaussian Noise . 86 CONTENTS 3 5.7 Conclusion .................................... 91 III DYNAMIC QUANTIZATION 93 6 Generation of Spikes 97 6.1 Introduction.................................... 97 6.2 GanglionCells .................................. 98 6.2.1 Morphology................................ 99 6.2.2 Functionality ............................... 99 6.3 SpikeGenerationModels. 100 6.3.1 RateCodes ................................ 101 6.3.1.1 Michaelis-Menten Function . 101 6.3.1.2 Rate as a Spike Count . 101 6.3.1.3 Rate as a Spike Density . 102 6.3.1.4 Rate as a Population Activity . 102 6.3.2 Time Code: Leaky Integrate and Fire (LIF) . 103 6.3.3 RankCode ................................ 105 6.4 Howtointerpretthespikes?. 106 6.5 Spikesincodingsystems. 107 6.5.1 RankOrderCoder(ROC). 107 6.5.2 ExtensionofROC ............................ 108 6.5.3 TimeEncodingMachine(TEM) . 108 6.5.4 A/D Bio-inspired Converter . 109 6.6 Conclusion .................................... 110 7 LIF Quantizer 113 7.1 Introduction.................................... 113 7.2 LIFQuantizer................................... 114 7.2.1 Decodingspikes.............................. 115 7.2.2 Dead-zone................................. 117 7.2.3 Quantization ............................... 117 7.2.4 Perfect-LIF dead-zone Quantizer . 120 7.2.5 Uniform-LIF dead-zone Quantizer . 121 7.2.6 Adaptive-LIF dead-zone Quantizer . 127 7.2.7 Optimized-LIF dead-zone Quantizer . 131 7.3 ProgressiveReconstruction . 135 7.4 Conclusion .................................... 136 IV APPLICATIONS 137 8 Application on Video Surveillance Data 139 8.1 VideoSurveillanceoverWSN . 140 8.1.1 TransmissionConstraints . 140 8.1.2 Network Topologies . 140 8.1.3 Pre-processingsolutions . 141 8.2 4G-TECHNOLOGY ............................... 142 8.2.1 BSViArchitecture ............................ 142 8.2.2 EViBOX/BVi Architecture . 142 8.2.3 SurveillanceScenarios . 142 8.2.3.1 WorkingArea ......................... 143 8.2.3.2 Traffic.............................. 143 4 CONTENTS 8.2.3.3 PublicArea........................... 144 8.3 OurContributions ................................ 144 8.4 VideoNumericalResults. 146 8.5 Conclusion .................................... 153 9 General Conclusion 155 9.1 Contributions................................... 155 9.1.1 Retina-inspired Filtering . 155 9.1.2 LIFQuantizer .............................. 156 9.1.3 Retina-inspired image and video codec . 156 9.2 Perspectives.................................... 156 V APPENDIX 159 A Proof of Lemma 1 161 A.1 Closed-form of Jc(t) ............................... 161 A.2 Closed-form of Js(t) ............................... 162 B List of Symbols 165 C List of Abbreviations 169 CONTENTS 5 Abstract The goal of this thesis is to propose a novel video coding architecture which is inspired by the visual system and the retina. If one sees the retina as a machine which processes the visual stimulus, it seems an intelligent and very efficient model to mimic. There are several reasons to claim that, first of all because it consumes low power, it also deals with high resolution inputs and the dynamic way it transforms and encodes the visual stimulus is beyond the current standards. We were motivated to study and release a retina-inspired video codec. The proposed algorithm was applied to a video stream in a very simple way according to the coding standards like MJPEG or MJPEG2000. However, this way allows the reader to study and explore all the advantages of the retina dynamic processing way in terms of compression and image processing. The current performance of the retina- inspired codec is very promising according to some final results which outperform MJPEG for bitrates lower than 100 kbps and MPEG-2 for bitrates higher than 70 kpbs. In addition, for lower bitrates the retina-inspired codec outlines better the content of the input scene. There are many perspectives which concern the improvement of the retina-inspired video codec which seem to lead to a groundbreaking compression architecture. Hopefully, this manuscript will be a useful tool for all the researchers who would like to study further than the perceptual capability of the visual system and understand how the structure and the functions of this efficient machine can in practice improve the coding algorithms. R´esum´e Cette th`ese vise `aproposer une nouvelle architecture de codage vid´eo qui s’inspire du syst`eme visuel et de la r´etine. La r´etine peut ˆetre consid´er´ee comme une machine intelligente qui traite le stimulus visuel de fa¸con tr`es efficace. De ce fait, elle repr´esente donc un bon candidat pour d´evelopper de nouveaux syst`emes de traitement d’image. Il y a plusieurs raisons pour cela, tout d’abord parce