Estimating Reflectance Properties and Reilluminating Scenes Using Physically Based Rendering and Deep Neural Networks

Farhan Rahman Wasee

A Thesis in The Department of Computer Science and Software Engineering

Presented in Partial Fulfillment of the Requirements for the Degree of Master of Computer Science (Computer Science) at Concordia University

Montréal, Québec, Canada

October 2020

© Farhan Rahman Wasee, 2020


CONCORDIA UNIVERSITY
School of Graduate Studies

This is to certify that the thesis prepared

By: Farhan Rahman Wasee
Entitled: Estimating Reflectance Properties and Reilluminating Scenes Using Physically Based Rendering and Deep Neural Networks

and submitted in partial fulfillment of the requirements for the degree of Master of Computer Science (Computer Science) complies with the regulations of this University and meets the accepted standards with respect to originality and quality.

Signed by the Final Examining Committee:

Chair: Dr. Jinqiu Yang
Examiner: Dr. Ching Suen
Examiner: Dr. Jinqiu Yang
Supervisor: Dr. Charalambos Poullis

Approved by Dr. Lata Narayanan, Chair, Department of Computer Science and Software Engineering
October 2020
Dr. Mourad Debbabi, Interim Dean, Faculty of Engineering and Computer Science


Abstract

Estimating Reflectance Properties and Reilluminating Scenes Using Physically Based Rendering and Deep Neural Networks

Farhan Rahman Wasee

Estimating material properties and modeling the appearance of an object under varying illumination conditions is a complex process. In this thesis, we address the problem by proposing a novel framework that re-illuminates scenes by recovering their reflectance properties. Uniquely, following a divide-and-conquer approach, we recast the problem into its two constituent sub-problems.

For the first sub-problem, we develop a synthetic dataset of spheres with realistic materials. The dataset covers a wide range of material properties, rendered from varying viewpoints under a fixed directional light. Images from the dataset are further processed into reflectance maps, which are used to train the network.

For the second sub-problem, reflectance maps are created for the scenes by reorganizing the outgoing radiances recorded in the multi-view images. The network trained on the synthetic dataset is then used to infer the material properties from the reflectance maps acquired for the test scenes. These predictions are used to relight the scenes from novel viewpoints and under different lighting conditions using path tracing.

A number of experiments are conducted, and performance is reported using different metrics to justify our design decisions and the choice of our network. We also show that, given multi-view images, the camera properties, and the geometry of a scene, our technique can predict the reflectance properties using our trained network within seconds. Finally, we present visual results of re-illumination on several scenes under different lighting conditions.


Acknowledgments

At the very beginning, I would like to take a moment to express my gratitude towards everyone who directly and indirectly played a role in helping me complete this thesis. Firstly, I am very grateful to my supervisor, Dr. Charalambos Poullis, for believing in me and for making me a part of his wonderful lab. I have enjoyed every moment of working side by side with him. His in-depth knowledge, patience and guidance during the most difficult phases of the project helped me navigate my way throughout the last couple of years.
It would not have been possible if it were not for him. I would also like to thank the members of my lab and my friends Bodhiswatta [Bodhi] Chatterjee and Pinjing Xu [Alex] for their countless help and advice throughout the time I was a part of the ICT Lab. I would like to thank the respected committee members and the other professors and staff of Concordia University for the help and support I have received from them.

The research done as part of this thesis is based upon work supported by the Natural Sciences and Engineering Research Council of Canada Grants DG-N01670 (Discovery Grant) and DND-N01885 (Collaborative Research and Development with the Department of National Defence Grant). I want to thank Jonathan Fournier from Valcartier DRDC, and Hermann Brassard, Sylvain Pronovost, Bhakti Patel, and Scott McAvoy from Presagis Inc. Canada, for their invaluable discussions and assistance.

Last but not least, I would like to thank my family for supporting me throughout my life. It would not have been possible at all if it had not been for you and your love. You are the source of all my inspiration, joy and motivation.


Contents

List of Figures
List of Tables

1 Introduction
  1.1 Thesis Organization

2 Literature Review
  2.1 Deep Learning and Convolutional Neural Networks
    2.1.1 LeNet
    2.1.2 AlexNet
    2.1.3 ResNet
    2.1.4 ResNeXt
    2.1.5 DenseNet
    2.1.6 EfficientNet
  2.2 Appearance Modeling and Inverse Rendering
    2.2.1 Procedural-based
    2.2.2 Deep Learning-based
  2.3 Conclusion

3 Background
  3.1 Rendering and Different Rendering Techniques
    3.1.1 Rasterization
    3.1.2 Ray casting
    3.1.3 Raytracing
    3.1.4 Path tracing
  3.2 Reflection Models
    3.2.1 Bidirectional Reflectance Distribution Function
    3.2.2 BRDF Models and Acquisition

4 Estimating Reflectance Properties and Relighting
  4.1 Overview
    4.1.1 Materials
    4.1.2 Lights
    4.1.3 Camera viewpoints
  4.2 Dataset and Reflectance Maps Generation
    4.2.1 Generating realistic spheres dataset
    4.2.2 Generating reflectance maps from spheres
  4.3 Network Training
    4.3.1 Network Architecture
    4.3.2 Loss function
    4.3.3 Data augmentations
    4.3.4 Training and Inference
  4.4 Evaluation and Results
    4.4.1 Comparison of networks
    4.4.2 Analysis of image relighting

5 Conclusion and Future Work
  5.1 Concluding Remarks
  5.2 Future Work


List of Figures

Figure 2.1  The architecture diagram of the LeNet model [56], which was designed to recognize handwritten characters from images.

Figure 2.2  The architecture of AlexNet [53], which created a revolution in image classification and introduced several new ideas, including the use of data augmentation and GPUs for training.

Figure 2.3  Two different types of residual blocks. The left one corresponds to a residual block for the ResNet-34 model [32]. The right one, on the other hand, shows a bottleneck block for ResNet-50/101/152 [32].

Figure 2.4  Figure illustrating the fundamental concept of parallel paths in ResNeXt [102]. On the left, we can see the basic residual block of a ResNet. On the right, we can see parallel blocks of this same type, a total of 32 paths converging into one. The cardinality is 32 in this case.

Figure 2.5  Figure illustrating the flow of data through a DenseNet [37]. There are 3 dense blocks here.
The layers in between two dense blocks are referred to as transition layers, and they contribute towards reducing the size of the feature maps with the help of convolution and pooling operations. Each dense block has 5 layers, with a growth rate k = 4. Each of these layers also takes in the feature maps from the previous layers as input.

Figure 2.6  This figure shows the motivation behind the architectural design of EfficientNet [92] in contrast to other conventional ConvNets. Here, (a), (b), (c) and (d) show how each dimension, i.e. width, number of channels, depth, etc., is adjusted to achieve optimal performance. In comparison, (e) shows how EfficientNet uses all of these factors together and uniformly scales them with a fixed ratio to find the optimal results more efficiently.

Figure 3.1  The raytracing process involves tracing rays from the camera through the image plane towards the scene geometry. This diagram from [16] shows the rays traced for some pixels on the image plane that form the image of the sphere. The view rays intersect the sphere, from where a shadow ray is spawned towards the light source. The bottommost view ray hits the surface at a point from which the shadow ray cannot reach the light, hence creating the shadow.

Figure 3.2  Three different renders generated using path tracing. The three images are rendered using sample counts of 4, 32 and 256 respectively. The noise gets stronger as the number of samples is lowered. As the number of samples grows bigger, more rays end up reaching the light source, thereby producing a cleaner and smoother image.

Figure 3.3  The figure [86] illustrates the measurements involved in computing the BRDF. A distant light source emits radiance Li. The incoming angles are expressed using the zenith and azimuth angles denoted by (θi, φi). The sensor captures the reflected radiance Lo, which leaves at an angle represented by (θo, φo). (Variable symbols are edited from the original figure to match our equations; a worked form of this definition is sketched after this list.)

Figure 4.1  Each row highlights a distinct property used to define a material. The topmost row, from left to right, illustrates the effect of Kd: changing the value of Kd alters the diffuse color of the material, which is tweaked in this image to red, then green and finally blue. The second row demonstrates the effect of Ks, the specular component, on the overall color of the sphere. Ks controls the reflectivity, and also the color of the specular hotspot. (An illustrative shading sketch follows this list.)
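As a quick reference for the quantities named in the Figure 3.3 caption, the BRDF can be written in its standard textbook form, namely as the ratio of the differential reflected radiance to the incident irradiance. This is only a sketch of the general definition using the caption's symbols (Li, Lo, θi, φi, θo, φo); the formulation actually used in this work is discussed in Section 3.2.1.

```latex
% Standard BRDF definition: differential outgoing radiance dL_o along
% (theta_o, phi_o) divided by the differential irradiance arriving
% from direction (theta_i, phi_i).
f_r(\theta_i,\phi_i;\,\theta_o,\phi_o)
  \;=\;
  \frac{\mathrm{d}L_o(\theta_o,\phi_o)}
       {L_i(\theta_i,\phi_i)\,\cos\theta_i\,\mathrm{d}\omega_i}
```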
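To make concrete how Kd and Ks shape the appearance described in the Figure 4.1 caption, below is a minimal, self-contained sketch of a generic diffuse-plus-specular (Blinn-Phong style) shading evaluation. It is an illustration only and not the material model or renderer used in this thesis; the function name shade_blinn_phong, the shininess exponent, and all numeric values are assumptions chosen for the example.

```python
import numpy as np

def normalize(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)

def shade_blinn_phong(kd, ks, shininess, n, l, v, light_rgb):
    """Evaluate a simple diffuse + specular material at one surface point.

    kd        : RGB diffuse reflectance  -- controls the base color of the surface
    ks        : RGB specular reflectance -- controls the strength/color of the highlight
    shininess : specular exponent        -- controls the tightness of the specular hotspot
    n, l, v   : surface normal, light direction, view direction
    light_rgb : RGB radiance arriving from the (directional) light
    """
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize(l + v)                                   # half-vector between light and view
    diffuse = kd * max(np.dot(n, l), 0.0)                  # Lambertian term scaled by Kd
    specular = ks * max(np.dot(n, h), 0.0) ** shininess    # highlight scaled by Ks
    return light_rgb * (diffuse + specular)

# Example: a reddish diffuse surface with a bright, neutral specular hotspot.
rgb = shade_blinn_phong(
    kd=np.array([0.7, 0.1, 0.1]),   # red diffuse color (the Kd effect in the top row)
    ks=np.array([0.9, 0.9, 0.9]),   # neutral specular component (the Ks effect)
    shininess=64.0,
    n=np.array([0.0, 0.0, 1.0]),
    l=np.array([0.3, 0.3, 1.0]),
    v=np.array([0.0, 0.0, 1.0]),
    light_rgb=np.array([1.0, 1.0, 1.0]),
)
print(rgb)
```

In this sketch, raising Ks relative to Kd brightens and tints the specular hotspot while the base color remains governed by Kd, which mirrors the row-by-row behavior described in the Figure 4.1 caption.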