Synthesis, Coding, and Evaluation of 3D Images Based on Integral Imaging
Total Page:16
File Type:pdf, Size:1020Kb
Synthesis, Coding, and Evaluation of 3D Images Based on Integral Imaging Roger Olsson Department of Information Technology and Media Mid Sweden University Doctoral Thesis No. 55 Sundsvall, Sweden 2008 Mittuniversitetet Informationsteknologi och medier ISBN 978-91-85317-98-1 SE-851 70 Sundsvall ISSN 1652-893X SWEDEN Akademisk avhandling som med tillstånd av Mittuniversitetet framlägges till offent- lig granskning för avläggande av Teknologie Doktorsexamen torsdagen den 12 juni 2008 i L111, Mittuniversitetet, Holmgatan 10, Sundsvall. c Roger Olsson, maj 2008 Tryck: Kopieringen, Mittuniversitetet, Sundsvall. Always in motion is the future Yoda iv Sammanfattning De senaste åren har kameraprototyper som kan fånga tredimensionella (3D) bilder presenterats, baserade på 3D-tekniken Integral Imaging (II). När dessa II-bilder be- traktas på en 3D-skärm, delger de både ett djup och ett innehåll som på ett realistiskt sätt ändrar perspektiv när tittaren ändrar sin betraktningsposition. Avhandlingen koncentrerar sig på tre hämmande faktorer gällande II-bilder. För det första finns det en mycket begränsad allmän tillgång till II-bilder för jämförande forskning och utveckling av kodningsmetoder. Det finns heller inga objektiva kvali- tetsmått som uttryckligen mäter distorsion med avseende på II-bildens egenskaper: djup och betraktningsvinkelberoende. Slutligen uppnår nuvarande standarder för bildkodning låg kodningseffektivitet när de appliceras på II-bilder. En metod baserad på datorrendrering har utvecklats som tillåter produktion av olika typer av II-bilder. En II-kameramodell ingår som bas, kombinerad med ett scen- beskrivningsspråk som möjligör att godtydligt komplexa virtuella scener definieras. Ljustransporten inom scenen och fram till II-kameran simuleras med strålföljning och geometrisk optik. Den presenterade metoden används för att skapa ett antal II- kameramodeller, scendefinitioner och II-bilder. Två kvalitetmått har tagits fram för att objektivt kvantifiera distorsion som kan uppträda i en II-bild med avseende på dess specifika egenskaper. Det första måt- tet modellerar hur distortionen uppfattas av en tittare som betraktar en 3D-skärm ur olika betraktningsvinklar. Det andra måttet beräknar distorsionens djupdistribu- tion inom II-bilden. Nya aspekter av kodningsinducerade artefakter påvisas med de föreslagna kvalitetsmåtten. Slutligen har en kodningsmetod för II-bilder utarbetats som bland annat utnytt- jar videokodningsstandarden H.264/AVC genom att först transformera II-bilden till en pseudovideosekvens (PVS). Kodningsmetodens egenskaper har studerats i detalj och jämförts med andra kodningsmetoder, bland annat med hjälp av de föreslag- na kvalitetsmåtten. Den föreslagna kodningsmetoden åstadkommer samma kvalitet som JPEG2000 till ungefärligen 1/60-del av kraven på lagring och distribution. v vi Abstract In recent years camera prototypes based on Integral Imaging (II) have emerged that are capable of capturing three-dimensional (3D) images. When being viewed on a 3D display, these II-pictures convey depth and content that realistically change perspective as the viewer changes the viewing position. The dissertation concentrates on three restraining factors concerning II-picture progress. Firstly, there is a lack of digital II-pictures available for inter alia compara- tive research and coding scheme development. Secondly, there is an absence of ob- jective quality metrics that explicitly measure distortion with respect to the II-picture properties: depth and view-angle dependency. Thirdly, low coding efficiencies are achieved when present image coding standards are applied to II-pictures. A computer synthesis method has been developed, which enables the produc- tion of different II-picture types. An II-camera model forms a basis and is combined with a scene description language that allows for the describing of arbitrary com- plex virtual scenes. The light transport within the scene and into the II-camera is simulated using ray-tracing and geometrical optics. A number of II-camera models, scene descriptions, and II-pictures are produced using the presented method. Two quality evaluation metrics have been constructed to objectively quantify the distortion contained in an II-picture with respect to its specific properties. The first metric models how the distortion is perceived by a viewer watching an II-display from different viewing-angles. The second metric estimates the depth-distribution of the distortion. New aspects of coding-induced artifacts within the II-picture are revealed using the proposed metrics. Finally, a coding scheme for II-pictures has been developed that inter alia utilizes the video coding standard H.264/AVC by firstly transforming the II-picture into a pseudo video sequence. The properties of the coding scheme have been studied in detail and compared with other coding schemes using the proposed evaluation metrics. The proposed coding scheme achieves the same quality as JPEG2000 at approximately 1/60th of the storage- or distribution requirements. vii viii Acknowledgements I would like to thank my main supervisor Professor Youzhi Xu for his confidence in providing me with the opportunity to become a PhD-student at Mid Sweden Uni- versity. Thanks also to my supervisor Docent Tingting Zhang who, together with Youzhi, allowed me to independently explore and define the topic of this work. In addition I would like to specially acknowledge the help provided by my supervisor Docent Mårten Sjöström. His guidance and helpful comments on scientific character and writing has helped me transform my text into a form that is far more under- standable than it was initially. Thanks also to Professor Theo Kanter for his sug- gestions on how to improve the main theme of the dissertation, and Fiona Wait for proofreading the text. My colleagues at the Division of Information and Communication Systems are worthy of great appreciation. The interesting discussions at coffee breaks have pro- vided rewarding pauses from struggling with subordinate clauses or something sim- ilarly intriguing. Better still have been the (more or less speculative) lunch debates on various topics well outside the realm of 3D images that I have had the opportu- nity of participating in together with Magnus Eriksson, Stefan Pettersson, Mårten, and others. Thanks also to all other PhD-students on the Department of Informa- tion Technology and Media and the Swedish Graduate School of Telecommunication (GST) with whom I have crossed paths during these years and learned a lot. As a GST-student I also greatly appreciate the financial support provided by GST as well as project funding from the EU Objective 1-programme Södra Skogslän region. Finally, I want to give special thanks to my family who have cheered me through both floods and droughts in motivation and creativity. To my wife Sara I want to express my love; this thesis would not have been written without your love, support and encouragement. Roger Olsson Sundsvall, May 2008 ix x Contents Sammanfattning v Abstract vii Acknowledgements ix List of selected papers xv Terminology xvii 1 Introduction 1 1.1 Motivation . .................................. 2 1.2 Overall aim and problem definitions . ................... 3 1.3 Solution approach . .............................. 4 1.4 Outline . .................................. 5 1.5 Contributions . .............................. 6 1.5.1 Condensed abstracts from the selected papers . ....... 7 2 Background 11 2.1 Three-dimensional images .......................... 11 2.1.1 History . .............................. 11 2.1.2 Human visual system requirements . ............... 12 2.1.3 3D-techniques . .......................... 18 2.1.4 Applications .............................. 22 xi xii CONTENTS 2.2 Integral Imaging (II) . ........................... 24 2.2.1 History . ............................... 24 2.2.2 The dimension taxonomy . ................ 26 2.2.3 II - a way to sample the light field . ................ 28 2.2.4 II-properties . ........................... 30 2.2.5 Constraints in property trade-off . ................ 37 2.2.6 The Component Images of the II-picture . ............ 38 2.2.7 Multi-view - another way to sample the light field ........ 43 2.3 Related works . ............................... 44 2.3.1 Synthesis . ............................... 44 2.3.2 Evaluation ............................... 46 2.3.3 Coding . ............................... 47 2.4 Concluding remarks . ........................... 48 2.4.1 Problem definitions . ....................... 48 3 Synthesis 51 3.1 Chapter outline . ............................... 52 3.2 Methodology . ............................... 52 3.3 II-camera model . ............................... 53 3.3.1 II-camera model representation . ................ 57 3.4 II-picture synthesis . ........................... 58 3.4.1 Ray-tracing using MegaPOV . ................ 58 3.4.2 Integrating II-camera model and MegaPOV ............ 60 3.5 Example of II-camera model parametrization . ............ 61 3.6 Results . ............................... 64 3.6.1 II-camera models ........................... 64 3.6.2 Virtual scenes . ....................... 66 3.6.3 Synthesized II-pictures . ....................... 66 3.6.4 Comparison between synthesis approaches ............ 67 3.7 Concluding remarks . ........................... 68 CONTENTS xiii 3.7.1 Authors contributions . ................... 69 3.7.2 Problem definitions – P1a and P1b . ............... 69 4 Evaluation 71 4.1 Chapter outline . .............................. 71 4.2 Methodology