Perceptual Depth Compression for Stereo Applications

EUROGRAPHICS 2014 / B. Lévy and J. Kautz Volume 33 (2014), Number 2 (Guest Editors) Perceptual Depth Compression for Stereo Applications Dawid Paj ˛ak,1 Robert Herzog,2 Radosław Mantiuk,3 Piotr Didyk,4 Elmar Eisemann,5 Karol Myszkowski,2 and Kari Pulli1 1NVIDIA, 2MPI Informatik, 3West Pomeranian University of Technology, 4MIT CSAIL, 5TU Delft Depth Frame Vergence Low-res. Depth Representation Spatial/Temporal H.264 Encoder Z Prediction Macroblock P M Color Frame D R Disparity H.264 Masking DCT C Spatial D Decorrelation Compressed Macroblock Entropy Quantization R 101100101101011010101 Coding 1010101111100100101... Figure 1: Outline of our compression framework. H.264 compression tools are framed in orange, our enhancements in blue. Abstract Conventional depth video compression uses video codecs designed for color images. Given the performance of current encoding standards, this solution seems efficient. However, such an approach suffers from many issues stemming from discrepancies between depth and light perception. To exploit the inherent limitations of human depth perception, we propose a novel depth compression method that employs a disparity perception model. In contrast to previous methods, we account for disparity masking, and model a distinct relation between depth perception and contrast in luminance. Our solution is a natural extension to the H.264 codec and can easily be integrated into existing decoders. It significantly improves both the compression efficiency without sacrificing visual quality of depth of rendered content, and the output of depth-reconstruction algorithms or depth cameras. 1. Introduction the rendering engine can perform such adjustments, but for played-back encoded content (server-side rendering broad- Stereo imaging is an important trend in modern media. The casted to multiple devices, movies, stereo cameras, etc.) such amount of available 3D content is increasing and, especially, a solution is impossible. Nonetheless, warping operations, on a consumer level it has gained much importance. Yet, cre- based on color images with per-pixel depth (considered by ating effective 3D content for different displays from movie several standards, e.g., HDMI 1.4), can efficiently produce theaters to hand-held auto-stereoscopic displays is difficult, an appropriate high-quality view [DRE∗10]. Color + Depth so that neither viewing comfort nor depth perception is sacri- even compresses better than two color images [MWM∗09]. ficed. Discomfort typically results from accommodation (fo- Using multiple color images and a depth map, or even multi- cusing distance) and vergence (the distance at which eye ple depth maps, can increase the warping quality, and is ben- viewing directions meet) [LIFH09,SKHB11] disagreements, eficial for multi-view systems [HWD∗09]. Such solutions which is addressed by fitting the scene within a limited com- can reduce the server workload, as multiple heterogeneous fort zone [LHW∗10, KHH∗11, MWA∗13]. More precisely, clients can receive the same information stream, which is lo- the pixel distance between the same scene point in the left cally adapted to the viewing conditions, instead of having and right views is adjusted, hereby influencing vergence. the server provide distinct content for each observer. Fur- The required modifications depend on device (type, thermore, depth information also enables inserting commer- screen size, and resolution), viewer (left-right eye distance, cials or synthesizing new views from arbitrary positions as observer distance), and content [OHB∗11] (extent of the required in 3DTV applications, but increase bandwidth com- scene, visible objects). In a real-time rendering context, pared to a standard video. c 2014 The Author(s) Computer Graphics Forum c 2014 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd. D. Paj ˛aket al. / Perceptual Depth Compression for Stereo Applications Encoding color-converted depth via standard Similarly, previous work rarely considers particularities codecs [PKW11] is wasteful. Color and depth have a of depth information. Depth is usually a smooth signal strong correlation, but it is rarely exploited. Other recent with abrupt discontinuities at object boundaries [WYT∗10] solutions only work for rendered content [PHE∗11]. Finally, and, typically, exhibits more spatial redundancy than tex- no monocular video encoding method considers the percep- ture. Depth-map quality is also less important than the qual- tual quality of stereo, thereby risking compression artifacts, ity of the synthesized views [MMS∗09], created by depth- potentially introducing viewing discomfort. image-based rendering (DIBR) [Feh04]. Nonetheless, high- quality discontinuities are critical for avoiding displaced We present three major contributions: pixel warps, which may lead to frayed object boundaries • measurements of neighborhood masking (in terms of spa- and local inconsistencies in depth perception [VTMC11]. tial location and spatial frequencies); Since high-frequency information is typically discarded first • a corresponding model, applicable in various contexts; by standard compression methods, a lossless coding is of- • a method for simultaneous depth/color compression inte- ten essential, leading to a high bandwidth. Edge-preserving grated into existing H.264 encoders. depth filtering [PJO∗09] combined with downscaling (and Our compression exploits the fact that depth edges of- upscaling while decoding) reduces bandwidth significantly ten coincide with color edges. But if a color edge is miss- [MKAM09], but it is difficult to control these filters and pre- ing, the depth difference is often not visible either. Our so- dict their impact. lution upsamples a special low-resolution depth stream us- ∗ ing the color image and can be integrated easily into stan- Custom solutions Merkle et al. [MMS 09] propose dard codecs (e.g., H.264) by using our new neighborhood- platelets; a quadtree decomposition hierarchically dividing masking model to reduce residuals in the frame prediction. the depth image into blocks. Each block is split by a linear segment, defining two regions that are piecewise constant or linearly varying. The resulting sharp edges lead to higher 2. Previous work quality than H.264/MVC that is explicitly designed for mul- In this section, we discuss existing solutions for depth com- tiview video representations, but the linear edges might dif- pression and give an overview of attempts to explore vi- fer from the actual geometry, and the fitting process is rela- sual masking for compression. More on left/right RGB video tively costly. stream compression, asymmetric coding, and multi-view Template-based solutions [EWG99] potentially respect compression can be found in an excellent survey [DHH11]. edges, but are not general enough and have difficulties in meeting bandwidth constraints. Edge detection and ex- Depth-map compression The H.264/AVC standard en- plicit representation results in higher precision [ZKU∗04, ables simultaneous (but independent) depth handling via the PHE∗11]. The discontinuities can be found via texture seg- “Auxiliary Picture Syntax” [MWM∗09], yet encoding is op- mentation [ZKU∗04] and Paj ˛aket al. [PHE∗11] show that timized for human perception of natural images. There is important edges can be identified via the luminance content, no mechanism concentrating on depth quality [MMS∗09], but this thresholding process is overly conservative and only which is also true when encoding depth in color channels removes edges in low-contrast regions. While this contrast- [PKW11]. Further, bandwidth control is unintuitive; in some based metric works well for their remote rendering and up- sequences depth bitrates have more influence on the qual- sampling applications, recent findings [DRE∗11, DRE∗12] ity of a warped image than the bitrate of the color image suggest that considering apparent depth is less than optimal, [MWM∗09, Fig. 3], while high depth quantization can be and more aggressive compression is possible. tolerated in others [HWD∗09, Fig. 6] and regions-of-interest strategies are even harder to control [KbCTS01]. Joint texture/depth coding Strong edges in color and depth often coincide and Wildeboer et al. [WYT∗10] use Multi-view video codecs, such as H.264 MVC, encode the full-resolution color image to guide depth-image up- views in separate streams and exploit spatio-temporal re- sampling, similar to joint-bilateral upsampling [KCLU07]. dundancy between neighboring views. When compared to Spatiotemporal incarnations can lead to even better results independent compression of views with H.264 AVC, multi- [RSD∗12], or one can choose edges manually [MD08]. view coding reduces the bit-rates by 15%−25% [DGK∗12]. These approaches also improve depth scans that have notori- Nonetheless, this still can be a significant cost in terms of ously low resolution and are more prone to inaccuracies than bandwidth and computation. For example, autostereoscopic high-resolution color cameras. Unfortunately, due to varia- displays require multiple UHD video streams, leading to tions in luminance (e.g., texture) that do not correspond to prohibitively expensive transmission and coding times with changes in depth, such reconstruction schemes may lead to H.264 MVC [VTMC11]. Further, image edges are ignored, so-called texture copying artifacts [STDT09]. which can lead to problems for high-quality 3D videos. Our solution can be integrated in such frameworks and would In a recent compression proposal for multiview with as- improve

Perceptual Depth Compression for Stereo Applications

How to Exploit the Transferability of Learned Image Compression to Conventional Codecs

Chapter 9 Image Compression Standards

Image Compression Using Discrete Cosine Transform Method

Task-Aware Quantization Network for JPEG Image Compression

Why Compression Matters

Video Compression and Communications: from Basics to H.261, H.263, H.264, MPEG2, MPEG4 for DVB and HSDPA-Style Adaptive Turbo-Transceivers

Website Quality Evaluation Based on Image Compression Techniques

Comparison of Video Compression Standards

Discernible Image Compression

Image Compression Algorithms Using Dct

DVC: an End-To-End Deep Video Compression Framework

AN4672, Using the Run-Length Decoding Features on Vybrid Devices