Differentiable Visual Computing

Differentiable Visual Computing by Tzu-Mao Li Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY June 2019 © Tzu-Mao Li, MMXIX. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created. Author.............................................................. Department of Electrical Engineering and Computer Science May 10, 2019 Certified by . arXiv:1904.12228v2 [cs.GR] 27 May 2019 Frédo Durand Professor of Electrical Engineering and Computer Science Thesis Supervisor Accepted by . Leslie A. Kolodziejski Professor of Electrical Engineering and Computer Science Chair, Department Committee on Graduate Students 2 Differentiable Visual Computing by Tzu-Mao Li Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract Derivatives of computer graphics, image processing, and deep learning algorithms have tremen- dous use in guiding parameter space searches, or solving inverse problems. As the algorithms become more sophisticated, we no longer only need to differentiate simple mathematical func- tions, but have to deal with general programs which encode complex transformations of data. This dissertation introduces three tools, for addressing the challenges that arise when obtaining and applying the derivatives for complex graphics algorithms. Traditionally, practitioners have been constrained to composing programs with a limited set of coarse-grained operators, or hand-deriving derivatives. We extend the image processing language Halide with reverse-mode automatic differentiation, and the ability to automatically optimize the gradient computations. This enables automatic generation of the gradients of arbitrary Halide programs, at high performance, with little programmer effort. We demonstrate several applications, including how our system enables quality improvements of even traditional, feed-forward image processing algorithms, blurring the distinction between classical and deep learning methods. In 3D rendering, the gradient is required with respect to variables such as camera parame- ters, light sources, geometry, and appearance. However, computing the gradient is challenging because the rendering integral includes visibility terms that are not differentiable. We intro- duce, to our knowledge, the first general-purpose differentiable ray tracer that solves the full rendering equation, while correctly taking the geometric discontinuities into account. We show prototype applications in inverse rendering and the generation of adversarial examples for neural networks. Finally, we demonstrate that the derivatives of light path throughput, especially the second- order ones, can also be useful for guiding sampling in forward rendering. Simulating light transport in the presence of multi-bounce glossy effects and motion in 3D rendering is challenging due to the high-dimensional integrand and narrow high-contribution areas. We extend the Metropolis Light Transport algorithm by adapting to the local shape of the integrand, thereby increasing sampling efficiency. In particular, the Hessian is able to capture the strong anisotropy of the integrand. We use ideas from Hamiltonian Monte Carlo and simulate physics in Taylor expansion to draw samples from high-contribution region. Thesis Supervisor: Frédo Durand Title: Professor of Electrical Engineering and Computer Science 3 4 Acknowledgments I would like to acknowledge by briefly reflecting how I ended up writing this dissertation. I was fascinated by computer science since I knew the existence of computers. I love the idea of automating tedious computation and the intellectual challenges involved in the process of automation. Naturally, I enrolled in a computer science program during my undergraduate education. At there I was introduced to the enchanting world of computer graphics, where people are able to generate beautiful images with equations and code, instead of pen and paper. During my undergraduate and master’s studies at National Taiwan University, I worked with Yung-Yu Chuang, the professor who brought me into computer graphics research. I started to read academic papers, and was mesmerized by people who contribute their own ideas to improve image generation. In the end I was also able to contribute my own little idea, by publishing a paper related to denoising Monte Carlo rendering images during my Master’s study. I hoped to do more, and decided to pursue a Ph.D. next. Among all the academic literature I studied, a few names caught my eyes. My Ph.D. advisor Frédo Durand is one of them. His work on frequency analysis for light transport simulation provides a rigorous and insightful theoretical foundation for sampling and reconstruction in light transport. I am also amazed by his versatility in research. At the point this dissertation was written, he has already worked on physically-based rendering, non-photorealistic rendering, computational photography, computer vision, geometric and material modeling, human-computer interaction, medical imaging, and programming systems. After I first met him, I also found out that he has a great sense of humor and is in general a very likable and good-tempered person. I joined the MIT graphics group as a rendering person. Frédo and I decided that research on Metropolis light transport [216] aligns our interests the most. Metropolis light transport is a classical rendering algorithm that was recently revitalized thanks to Wenzel Jakob’s reim- plementation of this notoriously difficult-to-implement algorithm. We found the classical literature of Langevin Monte Carlo [181] and Hamiltonian Monte Carlo [48], and thought that the derivatives information these algorithms use can also help Metropolis light transport. I also learned about the field of automatic differentiation, which later became the most essential tool of this thesis. Frédo, Luke Anderson, and Shuang Zhao had a lot of discussions with me on this topic, which helped to shape my thoughts. Frédo also brought Ravi Ramamoorthi, Jaakko Lehtinen, and Wenzel Jakob into this project. Ravi helped the most on this particular project. When we were collaborating, every week he would monitor my progress, and try to understand the current obstacles and provide advice. While the discussions with Frédo and others are usually higher-level, Ravi tried to understand every single detail of what I do. Explaining my 5 thought process to them clarified my thinking. Ravi and Frédo also have strong influences on my paper writing style. Frédo taught me to focus on the high-level pictures and Ravi taught me to explain everything in a clear manner. Like many other computer science research projects, this project ended up to be a huge undertaking of software engineering. Our algorithm requires the Hessian and gradient of the light transport contribution. Implementing this efficiently requires complex metaprogramming and is very difficult with existing tools. This also motivates my later work on extending the Halide programming language [171] for generating gradients. Nevertheless, I was still able to implement the first differentiable bidirectional path tracer that is able to generate gradient and Hessian of path contribution. We published a paper in 2015. The results from our first project on improving Metropolis light transport were quite encouraging. This motivated us to look deeper into this subfield. Our derivative-based algorithm helped local exploration, but the bigger issue of these Markov chain Monte Carlo methods lies in the global exploration. As light transport contribution is inherently multi-modal due to discontinuities, the Markov chains in Metropolis light transport algorithms usually have bad mixing. This typically manifests as blotchy artifacts on the images. We were hoping to have a better understanding of the global structure of light transport path space, in order to resolve the exploration issue. Unfortunately, this turns out to be more difficult than we imagined, due to the curse of dimensionality. As the dimensionality of the path space increases, the difficulty to capture the structure increases exponentially. We were able to get decent results by fitting Gaussians on low-dimensional cases (say, 4D), but I got stuck as soon as I proceeded to higher-dimensional space.1 I stuck on the global Metropolis light transport project for nearly two years. In the meantime, I also explored a few other directions. For example, I tried to generalize gradient domain rendering [127] to the wavelet domain. None of these attempts were very successful. As most researchers already knew, when working on a research project, it is very difficult to know whether the researcher is missing something, or the project simply will go nowhere in the first place. Still, I gained a lot of useful knowledge in these years. During this period, I helped Luke Anderson on his programming system for rendering, Aether [7]. The system stores the Monte Carlo sampling process symbolically, and automatically produces the probability density function of this process. Like my first Metropolis rendering project, this process also heavily involves metaprogramming and huge engineering efforts. This increased my interests

Load more