Computational Light Field Photography Depth Estimation, Demosaicing, and Super-Resolution

Computational Light Field Photography Depth Estimation, Demosaicing, and Super-Resolution Yongwei Li Department of Information Systems and Technology Mid Sweden University Doctoral Thesis No. 327 Sundsvall, Sweden 2020 Mittuniversitetet Informationssystem och teknologi ISBN 978-91-88947-57-4 SE-851 70 Sundsvall ISSN 1652-893X SWEDEN Akademisk avhandling som med tillstånd av Mittuniversitetet framlägges till of- fentlig granskning för avläggande av teknologie doktorsexamen den 10 juni 2020 klockan 09.00 i sal C-312, Mittuniversitetet Holmgatan 10, Sundsvall. Seminariet kommer att hållas på engelska. c Yongwei Li, juni 2020 ⃝ Tryck: Tryckeriet Mittuniversitetet To Medical Professionals during COVID-19 Pandemic iv Abstract The transition of camera technology from film-based cameras to digital cameras has been witnessed in the past twenty years, along with impressive technological ad- vances in processing massively digitized media content. Today, a new evolution emerged – the migration from 2D content to immersive perception. This rising trend has a profound and long-term impact to our society, fostering technologies such as teleconferencing and remote surgery. The trend is also reflected in the scientific research community, and more intention has been drawn to the light field and its applications. The purpose of this dissertation is to develop a better understanding of light field structure by analyzing its sampling behavior and to addresses three problems con- cerning the light field processing pipeline: 1) How to address the depth estimation problem when there is limited color and texture information. 2) How to improve the rendered image quality by using the inherent depth information. 3) How to solve the interdependence conflict of demosaicing and depth estimation. The first problem is solved by a hybrid depth estimation approach that combines advantages of correspondence matching and depth-from-focus, where occlusion is handled by involving multiple depth maps in a voting scheme. The second problem is divided into two specific tasks – demosaicing and super-resolution, where depth- assisted light field analysis is employed to surpass the competence of traditional image processing. The third problem is tackled with an inferential graph model that encodes the connections between demosaicing and depth estimation explicitly, and jointly performs a global optimization for both tasks. The proposed depth estimation approach shows a noticeable improvement in point clouds and depth maps, compared with references methods. Furthermore, the objective metrics and visual quality are compared with classical sensor-based demosaicing and multi-image super-resolution to show the effectiveness of the proposed depth-assisted light field processing methods. Finally, a multi-task graph modelis proposed to challenge the performance of the sequential light field image processing pipeline. The proposed method is validated with various kinds of light fields, and outperforms the state-of-the-art in both demosaicing and depth estimation tasks. The works presented in this dissertation raise a novel view of the light field data structure in general, and provide tools to solve image processing problems in specific. The impact of the outcome can be manifold: To support scientific research with v vi light field microscopes, to stabilize the performance of range cameras for industrial applications, as well as to provide individuals with a high-quality immersive expe- rience. Sammanfattning Under de senaste tjugo åren har det skett en övergång från filmbaserad till digital ka- merateknik, parallellt med en imponerande teknisk utveckling inom bearbetning av omfattande digitaliserat medieinnehåll. På senare tid även en ny utvecklingslinje – övergången från 2D-innehåll till omslutande perception. Detta är en utveckling som har långtgående och långvarig påverkan på samhället och främjar arbetsmetoder såsom telekonferens och fjärrstyrd kirurgi. Den här utvecklingst trenden återspeglas också i det vetenskapliga forskningssamhället, och mer uppmärksamhet har lagts på light field och dess olika tillämpningsområden. Syftet med avhandlingen är att nå en bättre förståelse av strukturen i light field genom att analysera hur light field samplas, och att lösa tre problem inom behand- lingsprocessen av light field: 1) Hur problemet med djupestimering kan lösas med begränsad information om färg och textur. 2) Hur renderad bildkvalitet kan förbätt- ras genom att utnyttja den inneboende djupinformationen. 3) Hur beroendekonflik- ten mellan demosaicing (färgfiltrering) och djupestimering kan lösas. Det första problemet har lösts genom en hybridmetod för djupestimering, som kombinerar styrkorna med korrespondensmatchning och djup från fokus, där ock- lusion hanteras genom att använda flera djupkartor i ett röstningssystem. Det andra problemet delas upp i två separata moment – demosaicing och superupplösning, där djupassisterad analys av light field används för att överträffa kapaciteten för tradi- tionell bildbehandling. Det tredje problemet har angripits med en inferentiell grafmodell som explicit kopplar samman demosaicing och djupestimering, och samfällt utför en global optimering för båda dessa processteg. Den metod för djupestimering som föreslås producerar visuellt tilltalande punkt- moln och djupkartor, jämfört med andra referensmetoder. Objektiva mätvärden och visuell kvalitet jämförs vidare med klassisk sensorbaserad demosaicing och superupp- lösning från multipla bilder, för att visa effektiviteten hos de föreslagna metoderna för djupassisterad behandling av light field. En multitaskande grafmodell föreslås även för att matcha och överträffa prestandan hos sekventiell light field-baserad bildbehandling. Den metod som föreslås valideras med olika sorters light fields och överträffar de bästa existerande metoderna inom både demosaicing och djupestimering. De arbeten som presenteras i avhandlingen utgör ett nytt sätt att betrakta den generella datastrukturen hos light field, och tillhandahåller verktyg för att lösa spe- vii viii cifika bildbehandlingsproblem. Effekterna av dessa resultat kan vara många, till ex- empel som stöd för vetenskaplig forskning om light field-baserade mikroskop, för att förbättra prestandan hos avståndsmätande kameror i industriella tillämpningar, såväl som för att erbjuda högkvalitativa omslutande mediaupplevelser. Acknowledgements I would like to thank my supervisors, Prof. Mårten Sjöström and Dr. Roger Olsson, for offering me the opportunity to be a Ph.D. student in the Realistic 3D group. I want to thank them for their supports, kindness, encouragement, and trust through- out my education, and their wisdom and positive attitude when I had tough mo- ments. This thesis and many other research works would undoubtedly never have achieved fruition without your dedication. The supports of my friends and colleagues are vital to me. I thank all the members of the Realistic 3D group that I have had the pleasure of working with, es- pecially Waqas Ahmad and Elijs Dima, who have helped me consistently from the very beginning of this journey and shared four years of happy memories with me. At Chinese festivals, more than ever we think of our family and friends far away. I want to thank my compatriots from EKS and their family who organized cultural events and gatherings. I thank Jan Lundgren for his review of this thesis, and all the staff members of Mid Sweden University I met along the way. I wish you all the best in your future endeavors. I am grateful to my family and my wife’s family. Without their constant support, I would never have had a chance to start and finish a Ph.D., or even to come to study in Sweden. I want to thank my wife, Jun Wang. There is so much I want to say, but so little can be expressed. She quit her job as a teacher in China and came to me with great courage. She took care of me, spared me from tremendous housework, and even managed to be my Swedish translator after SFI and grundläggande. Anywhere I go, I know it would be my sweet home only if you are with me. This work has received funding from the European Unions Horizon 2020 research and innovation program under the Marie Sklodowska-Curie Actions, and specifically European Training Network on Full Parallax Imaging. The network has provided me with numerous exciting training schools, workshops, internal confer- ences, online seminars, and secondments. I am greatly thankful for the coordinators and researchers from all our academic and industrial partners/organizations. They made this project both fruitful and enjoyable. Special thanks to Prof. Manuel Mar- tinez from University of Valencia, Prof. Filiberto Pla from Jaume I University, and their research groups for hosting me during my research secondments. Thank you for your assistance and motivation. ix x Table of Contents Abstract v Sammanfattning vii Acknowledgements ix List of Papers xv Terminology xxi 1 Introduction 1 1.1 Motivation . .1 1.2 Purpose . .2 1.3 Objectives . .2 1.4 Scope . .3 1.5 Contributions . .3 1.6 Outline . .4 2 Computational Photography and Light Field 5 2.1 Computational photography . .5 2.1.1 Image processing . .5 2.1.2 Digital photography . .6 2.2 Plenoptic function and light field . .8 2.2.1 Definition . .9 2.2.2 Light field sampling . .9 3 Light Field Depth Estimation, Demosaicing and Super-Resolution 11 xi xii TABLE OF CONTENTS 3.1 View-based depth estimation . 11 3.1.1 Correspondence matching . 12 3.1.2 Depth-from-focus/defocus . 12 3.1.3 Other techniques . 13 3.2 Light field demosaicing . 13 3.3 Light field spatial super-resolution . 14 3.4 Markov random field in computer vision . 15 4 Related Works 19 4.1 Light field depth estimation . 19 4.1.1 View-based initial depth estimation . 19 4.1.2 Markov random field depth refinement . 20 4.1.3 Concluding remarks . 20 4.2 Light field demosaicing . 21 4.2.1 Sampling-based approaches . 21 4.2.2 Other approaches . 21 4.2.3 Concluding remarks . 22 4.3 Light field spatial super-resolution . 22 4.3.1 Projection-based approaches . 22 4.3.2 Optimization-based approaches . 22 4.3.3 Learning-based approaches . 23 4.3.4 Concluding remarks . 23 5 Depth-Assisted Light Field Processing 25 5.1 Light field microscopic depth estimation .

Computational Light Field Photography Depth Estimation, Demosaicing, and Super-Resolution

Multi-Perspective Stereoscopy from Light Fields

Light Field Saliency Detection with Deep Convolutional Networks Jun Zhang, Yamei Liu, Shengping Zhang, Ronald Poppe, and Meng Wang

Convolutional Networks for Shape from Light Field

Head Mounted Display Optics II

Light Field Features in Scale and Depth

Light Field Rendering Marc Levo Y and Pat Hanrahan Computer Science Department Stanford University

Learning a Deep Convolutional Network for Light-Field Image Super-Resolution

Improving Resolution and Depth-Of-Field of Light Field Cameras Using a Hybrid Imaging System

The Fresnel Zone Light Field Spectral Imager

Suitability Analysis of Holographic Vs Light Field and 2D Displays

Deep Sparse Light Field Refocusing

An Equivalent F-Number for Light Field Systems: Light Eﬀiciency, Signal-To-Noise Ratio, and Depth of Field Ivo Ihrke