New Information-Theoretic Analyses and Algorithmic Methods for Parameter Estimation in Structured Data Settings and Plenoptic Imaging Models

A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY

Abhinav Viswanathan Sambasivan

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy

Prof. Jarvis Haupt

August, 2020

© Abhinav Viswanathan Sambasivan 2020. ALL RIGHTS RESERVED

Acknowledgements

I believe that no singular accomplishment is truly “singular,” and in that spirit, I would like to express my heartfelt gratitude to the plurality of factors that have led me to achieve this professional and personal milestone. First and foremost, I would like to begin by thanking my parents, who have always put my needs and joy over theirs, and strived to raise me as a good human being. I would like to thank my mother, Sujatha Viswanathan, for showing me what unconditional love is, and for emphasizing the value of good education over all else, and my father, Viswanathan Sambasivan, for teaching me the importance of having faith, and for always keeping my spirits high. I would like to thank my grandmother, Smt. Visalakshi Sambasivan, and my aunts, uncles, and in-laws, who have constantly shown their support and instilled important family values in me. I would also like to thank my cousins, Aravind, Alamelu, Uma, and Krishnan, and their spouses, for holding me in a special place. I am deeply indebted to my Ph.D. advisor, Prof. Jarvis Haupt, for his unwavering support and for believing in me throughout my graduate study. He has always been approachable, counseled me when I needed his support, and shared his wisdom with me at every turn. My interactions with him over the years have greatly shaped my thought process and research outlook. I hope to carry these values and ethics with me throughout my professional life. I would like to express my gratitude to the Defense Advanced Research Projects Agency (DARPA) for their financial support throughout my graduate study. I would also like to thank all my collaborators and Principal Investigators who worked with me on the DARPA REVEAL project. I would especially like to thank Prof. James Leger and Prof. Joey Talghader for many productive research discussions. I also thank Di Lin, Takahiro Sasaki, and Connor Hashemi, who have helped me with the experimental setup and data collection for this project.

I am privileged to have had the opportunity to collaborate with Dr. Richard Paxman, from Maxar Technologies. I have learnt a lot about how to pursue a research problem from him. I am also very thankful to Prof. Gary Meyer and his students, Michael Tetzlaff and Michael Ludwig, for discussions on topics in computer graphics and ray-tracing, which were a key component of this thesis. I would like to thank Prof. Nikos Sidiropoulos, Prof. Soheil Mohajer, Prof. Arindam Banerjee, and Prof. Tom Luo, whose graduate classes were pivotal for me to gain a strong understanding of the fundamental concepts in Electrical Engineering and Computer Science. I would also like to extend a special thanks to Prof. Anand Gopinath, for his support and guidance during the early stages of my graduate study. I would like to thank Prof. Georgios Giannakis, Prof. Soheil Mohajer, and Prof. Nikos Papanikolopoulos for serving on my Ph.D. committee and providing useful comments that helped improve my thesis. I would also like to thank my labmates, Swayambhoo Jain, Mojtaba Elyaderani, Xingguo Li, Sirisha Rambhatla, Jineng Ren, Di Xiao, Alex Gutierrez, Gamini Udawat, and Akshay Kumar, for the numerous exciting and enlightening discussions that have helped me overcome hurdles in my research path. I am especially thankful to Swayambhoo Jain for his mentorship during my internship at Technicolor AI Labs. I have been extremely fortunate to have always been surrounded by a great set of friends. I am thankful to Ashwin Varadarajan, who has been one of my closest friends for the better part of my life. I would also like to thank Rohit Sridhar, Venkat Ram Subramanian, and Ramesh K.G. for being my support system and for endless hours of fun and banter. I am grateful to my Minnesota family, Vaishnavi, Karthik, Deepak, and Subhash, for filling the void of missing my home and family. I would like to express my limitless appreciation to my wife and best friend, Ramya Ramasubramanian, for standing beside me through thick and thin, for helping with the drafts of this thesis, for making me a better person every day, for the myriad sweet things she has done for me, and finally for showing me what true love is. Finally, I would like to thank the Almighty for giving me the strength, the ability, and the opportunity to pursue my dreams, and I pray that the good fortune bestowed upon me continues for a long time.

Dedicated to the loving memory of my grandfather, Shri. A.V. Sambasivan, whose name I have inherited and whose values I wish to inherit, and my grandmother, Smt. Kamala Ramdhas, whose love for me had no limits.

Abstract

Parameter estimation problems involve estimating an unknown quantity (or parameter) of interest from a set of data (or observations) that contains some information about the parameter. Such problems are ubiquitous and widely studied across diverse disciplines in science and engineering including, but not limited to, physics, computer science, signal processing, computational genomics, and economics. Information-theoretic limits of a parameter estimation problem quantify the best-achievable performance (under a suitable metric), thus establishing the fundamental difficulty of solving the problem. A central theme of the first two parts of this work is to develop information-theoretic tools to analyze the fundamental limits of estimating parameters from noisy data under two very different settings: (1) the parameter of interest belongs to a structured class of signals, and (2) a concise forward model relating the observations to the parameters is analytically challenging to obtain.

The first part of this work examines the fundamental error characteristics of a general class of matrix completion problems, where the matrix of interest is a product of two a priori unknown matrices, one of which is sparse, and the observations are noisy. Our main contributions come in the form of minimax lower bounds for the expected per-element squared error for this problem under several common noise models. Our results establish that the error bounds derived in (Soni et al. 2016) for complexity-regularized maximum likelihood estimation achieve, up to multiplicative constants and logarithmic factors, the minimax error rates under certain (mild) conditions.

The rest of this work focuses on plenoptic imaging, which usually involves taking multiple single-snapshot images of a scene, collected across time (videos), wavelength (multi-spectral cameras), and from multiple vantage points (light field sensor arrays), thus providing substantially more information about a given scene than conventional imaging. For this thrust, we first focus on assessing the fundamental limits of scene parameter estimation in plenoptic imaging systems, with an eye towards passive indirect imaging problems. We develop a general framework to obtain lower bounds on the variance of unbiased estimators for scene parameter estimation from noisy plenoptic data. The novelty of this work lies in the use of computer graphics rendering software to synthesize the (often complicated) forward mapping to evaluate the Hammersley-Chapman-Robbins lower bound (HCR-LB), which is at least as tight as the more commonly used Cramer-Rao lower bound. When the rendering software yields inexact estimates of the forward mapping, we analyze the effects of such inaccuracies on the HCR-LB both theoretically and via simulations, and provide a method to obtain upper and lower bounds bracketing the true HCR-LB.

The final part of this work explores algorithmic methods for Non-Line-of-Sight (NLOS) imaging from (noisy) plenoptic data, where the aim is to recover a hidden scene of interest from noisy measurements that arise from reflections off a scattering surface, e.g., a wall or the floor. We use the insight that plenoptic data is highly structured due to parallax and/or motion in the hidden scene, and propose a multi-way Total Variation (TV) regularized inversion methodology to leverage this structure and recover hidden scenes. We demonstrate our recovery algorithm on real-world plenoptic data measurements at visible and Long-Wave InfraRed (LWIR) wavelengths. Experiments in LWIR (or thermal) imaging show that it is possible to reliably image human subjects around a corner, nearly in real time, using our framework.

Contents

Acknowledgements

Abstract

List of Tables

List of Figures

1 Introduction
1.1 Information-Theoretic Analyses of Parameter Estimation Problems
1.2 Algorithmic Methods for Non-Line-of-Sight Plenoptic Imaging

2 Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models
2.1 Introduction
2.1.1 Organization
2.1.2 Notations and Preliminaries
2.2 Problem Statement
2.2.1 Observation Model
2.2.2 The Minimax Risk
2.3 Main Results and Implications
2.3.1 Additive Gaussian Noise
2.3.2 Additive Laplace Noise
2.3.3 One-bit Observation Model
2.3.4 Poisson-distributed Observations
2.4 Conclusion
2.5 Acknowledgement
2.6 Appendix
2.6.1 Proof of Theorem 2.1
2.6.2 Proof of Corollary 2.1
2.6.3 Proof of Corollary 2.2
2.6.4 Proof of Corollary 2.3
2.6.5 Proof of Theorem 2.2

3 Parameter Estimation Lower Bounds for Plenoptic Imaging Systems
3.1 Introduction
3.1.1 Prior Art
3.1.2 Our Contribution
3.2 Forward Model
3.2.1 The Rendering Equation
3.2.2 Illustrative Example Scene: A Π-shaped Hallway
3.3 Problem Statement
3.4 Renderer-Enabled Computation of Lower Bounds
3.4.1 HCR Lower Bound for Poisson Noise
3.4.2 HCR Lower Bound for Additive White Gaussian Noise
3.4.3 Localizing Information Content in Plenoptic Observations
3.4.4 Experimental Evaluation: Lower Bounds for the Π-shaped Hallway Scene
3.5 Computing Lower Bounds with Inexact Rendering
3.5.1 Estimating HCR Lower Bounds from Inexact Rendering
3.5.2 Experimental Validation: NLOS Object Localization
3.6 Maximum Likelihood Estimation
3.6.1 Experimental Evaluation: NLOS Object Localization using Maximum Likelihood Estimation
3.7 Conclusion
3.8 Acknowledgment
3.9 Appendix
3.9.1 Empirical Validation of Assumption A.2
3.9.2 Proof of Theorem 3.1
3.9.3 Proof of Theorem 3.2

4 Non-Line-Of-Sight Imaging from Plenoptic Observations
4.1 Introduction
4.1.1 Prior Art
4.2 Problem Formulation
4.2.1 The NLOS Imaging Problem
4.2.2 Notation and Preliminaries
4.3 Multi-Way Total Variation Regularization for NLOS Scene Recovery
4.4 Signal-Separation to Noise Ratio: An Unsupervised Evaluation Metric
4.5 Recovering a 2D NLOS Light Field
4.5.1 Experimental Setup
4.5.2 Results and Discussion
4.6 NLOS Scene Recovery using Thermal Imaging
4.6.1 Experimental Setup
4.6.2 Results and Discussion
4.7 Conclusion
4.8 Acknowledgment

5 Directions for Future Work

References

List of Tables

3.1 Comparison of the HCR lower bound and the performance of the MLE for AWGN with σ = 0.1.

4.1 Average throughput of reconstruction methods in frames/sec (fps). The image acquisition rate was 10 fps. Methods (or settings) with throughputs > 10 fps show potential for real-time imaging and are highlighted in bold.

4.2 Average ∆-SSNR (over all video frames) for thermal NLOS recovery (in dB). For each video, the method with the highest average ∆-SSNR is highlighted in bold.

5.1 Acronyms used (in alphabetical order).

List of Figures

1.1 A typical parameter estimation problem consists of the “forward model,” which describes how noisy observations $y$ are generated from the parameter of interest $\theta^*$, and the “inverse problem,” which entails estimating $\theta^*$ from $y$. The noise (and hence the observations) are modeled as random quantities whose distribution is assumed to be known a priori. The distance between $\widehat{\theta}(y)$ and $\theta^*$ (under a suitable, specified distance metric) determines the “goodness” of the estimator.

3.1 The rendering equation, explained graphically: (a) The proportion of incident light coming in from direction $\phi_i$ that gets reflected along direction $\phi_o$ is determined by the BRDF of the surface; (b) Light incident on a surface point $r$ can be seen as light leaving from another point in the scene $g(r, \phi_i)$, i.e., $L^{\mathrm{in}}_{\theta^*}(r, -\phi_i) = L^{\mathrm{out}}_{\theta^*}(g(r, \phi_i), -\phi_i)$.

3.2 Simulating the forward model using rendering: (a) Layout of a Π-shaped hallway with dimensions marked. Corridors A, B, and C are 2.5m, 3m, and 2.5m long respectively, and 2m tall. The hallway is illuminated with white ceiling lights with a luminance of $3\ \mathrm{lm \cdot sr^{-1} m^{-2}}$. The camera $C_0$ is located 0.5m outside corridor A. The location and radius of a red spherical ball constitute the unknown scene parameter $\theta^*$. (b) If we define $\theta^*$ by setting the ball radius to 10cm and the ball location to the intersection of corridors A and B, then $L_{\theta^*}$ is the nominal RGB image of the scene captured by a camera at $C_0$. We obtain $L_{\theta^*}$ using the rendering engine Mitsuba [1] as shown in (b).

3.3 HCR-LB for ball location estimation for Poisson noise: (a)-(d), and AWGN with different values of $\sigma$: (e)-(h). The HCR-LB under different regimes is shown in: (a),(e) LOS region: the HCR-LB is very small; the LB drops significantly when the ball starts moving in corridor B; (b),(f) Transition from LOS to NLOS: sharp increase in the HCR-LB when the ball moves away from LOS; (c),(g) NLOS region: the HCR-LB is much higher, indicating the potential hardness of the estimation problem; (d),(h) Ball radius estimation: the HCR-LB decreases with increasing size (radius) of the ball. With the help of these curves, one can quantify how difficult the problem of estimating NLOS parameters can be. For the AWGN model, we can see that the HCR-LB increases with $\sigma$, as expected.

3.4 Pixelwise FD-Fisher Information (FD-FI), obtained by aggregating contributions from all 30 spectral channels. Darker regions ⇒ more informative. The pixelwise FD-FI shows where and how information about the parameter of interest is localized in our observations. These images highlight subtle details about the scene parameters which are not visible from the nominal RGB images (bottom row) obtained from the rendering software. (a) Pixelwise FD-FI for Poisson noise (top row) and AWGN with $\sigma = 0.2$ (second row) for 4 different ball locations: (from left to right) completely in LOS, just inside LOS, just outside LOS, center of corridor B. Notice that different regions in the scene are more informative than others for different ball locations. (b) Pixelwise FD-FI for Poisson noise (top row) and AWGN with $\sigma = 0.2$ (second row). These images show where information about the ball radius is localized. Notice that the regions of information differ from the FD-FI images for ball location in Figure 3.4(a).

3.5 (a) Top view of the scene layout used. The scene geometry is the same as in Section 3.2.2. Instead of a red spherical object, we consider a red teapot, and the camera is now placed in the middle of the hallway and captures RGB images. The scalar parameter of interest $\theta^*$ is the horizontal displacement of the teapot from the intersection of corridors A and B. RGB images rendered using Redner for different values of $\theta^*$ are shown in (b) and (c). 65536 samples per pixel were used, and it took around 3.3 minutes to render each scene. (b) $\theta^* = 0.2$m: teapot fully in LOS; (c) $\theta^* = 0.9$m: teapot just moved completely away from LOS.

3.6 HCR-LB for estimation of the teapot location under AWGN and Poisson noise. $\mathrm{HCR}_{N_{\mathrm{eff}}}$ (red lines): HCR-LB computed directly using rendered data with $N_{\mathrm{eff}} = 65536$ samples per pixel; $\widehat{\mathrm{HCR}}$ (black lines): HCR-LB estimated from rendering scenes with $N = 2048, 3072, \dots, 11264$. Due to rendering errors, typically we have $\mathbb{E}[\widehat{\mathrm{HCR}}(\theta^*)] \ge \mathrm{HCR}(\theta^*) \ge \mathrm{HCR}_{N_{\mathrm{eff}}}$. The region between $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ denotes the interval within which the true HCR-LB is likely to lie. (a) HCR-LB for the Poisson noise model. (b) HCR-LB for AWGN with $\sigma = 0.1$. (c) HCR-LB when the teapot is not in LOS for $\sigma = 0.1, 0.2, 0.4, 0.6$, and $0.8$.

3.7 Effect of samples per pixel on $\lambda$ and the HCR functional $f(\lambda)$ for estimation of the teapot location. Noise model: AWGN with $\sigma = 0.1$, true object location $\theta^* = 1.05$m. (a) $\widetilde{\lambda}_N$ in the neighborhood of $\theta^* = 1.05$m, showing how $\widetilde{\lambda}_N$ decreases with $N$ uniformly for all values of $\Delta$. (b) Plot of estimated and observed $\lambda$'s for $\theta^* = 1.05$m, showing that $\widetilde{\lambda}_{N_{\mathrm{eff}}} \ge \widehat{\lambda}$, even with $N_{\mathrm{eff}} = 65536$ samples per pixel. (c) HCR functional obtained from the estimated and observed $\lambda$'s.

3.8 Relationship between $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ for different teapot locations, (a)-(c) for AWGN with $\sigma = 0.8$ and (d)-(f) for the Poisson noise model, under 2 different scenarios: (a),(d) The maximum of the HCR functional occurs as $\Delta \to 0$, implying that $\widehat{\mathrm{HCR}}$ is much larger than $\mathrm{HCR}_{N_{\mathrm{eff}}}$. (b),(e) The maximum of the HCR functional occurs for $\|\Delta\| \gg 0$, implying that $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ are approximately equal. (c),(f) HCR-LB for $0.7\mathrm{m} \le \theta^* \le 1.69\mathrm{m}$.

3.9 Top row: Clean images for different teapot locations rendered using 65536 samples per pixel. Bottom row: A single instance of noisy images corrupted by AWGN with $\sigma = 0.1$. After the teapot goes completely out of LOS ($\theta^* > 0.9$m), it is very hard to discern any information about the teapot by simply looking at these images (both the clean and the noisy versions).

3.10 Comparison of the HCR lower bound and the MLE for AWGN with $\sigma = 0.1$.

3.11 Top left: A single instance of the teapot image rendered with 1024 samples. Other plots: Per-pixel variance (summed over the 3 color channels) for the teapot image for different values of samples per pixel $N$. It can be seen that the per-pixel variance is not the same across all pixels in the image. While the general pattern of pixel-wise variance is similar across different values of $N$, the magnitude of the variance decreases (as expected) with increasing samples.

3.12 Results from the simulation with $10^4$ independent draws of weights $W_\omega$: (a) Average squared $L_2$-error of fit vs. degree $p$; (b) Distribution of the optimal degree $p_{\mathrm{opt}}$, showing that most of it is concentrated around $p = 1$; (c) A single instance of the weighted sum of pixel variance ($\gamma(N)$) along with the model fit using $p = 1$.

4.1 Illustration of a typical NLOS imaging problem. The camera FOV $F_c$ defines the set of all points on the reflecting surface corresponding to the pixels captured by the camera located at $c$. The light field observed/measured by the camera from a surface point $r \in F_c$ is denoted by $L^{\mathrm{out}}(r, \phi_o)$, where the outgoing ray direction $\phi_o = (c - r)/\|c - r\|_2$.

4.2 Experimental setup used for the 2D NLOS light field recovery. (a) Layout of the measurement studio. The length of the rotating arm is 30cm, the scattering surface is a brushed metal sheet coated with satin paint, and 2 LED strips placed 9cm apart are the (hidden) 1D objects of interest. A CMOS camera captures pictures of the scattering surface at multiple angles, which constitutes the observed light field. (b) A picture of the measurement studio. All the apparatus is placed inside an enclosure, and the surfaces in the interior of the enclosure are covered with anti-reflection black felt to minimize ambient light and unwanted reflections. While the picture here shows only one LED strip, the actual experimental measurements involved 2 LED strips.

4.3 Results for NLOS 2D light field recovery using multi-way TV regularization. (a) Observed light field, $Y$. (b) Reconstructed (incident) light field containing the hidden scene, $\widehat{X}$. $\Delta$-SSNR = 30.3055 dB. (c)-(f) An overlay of the 1D observed and reconstructed light fields at different scatter points. Each plot corresponds to a single column of the 2D light fields shown in (a) and (b). Our regularized inversion algorithm produces sharp reconstructions of the 1D objects from diffuse or blurry measurements.

4.4 Experimental setup used for thermal NLOS imaging. (a) Layout of the L-shaped hallway scene used. A LWIR camera mounted on a 1m-long computer-controlled arm captures thermal images of a flat piece of masonite at 10 frames/second. The hidden scene comprises a moving human subject, located at a distance $d_H$ from the wall. (b) A picture of the experimental setup. The reflecting surface used in our experiments (masonite) is different from the one pictured here (black masonite).

4.5 Observed (noisy) LWIR images and corresponding reconstructions (of the “mirror-reflections”) for $d_H = 1.8$m. 3D-TV reconstructions are “smoother” than their 2D counterparts since they exploit the temporal structure. Reconstruction speeds (throughputs) of the 2D-TV method (column 2) and the 3D-TV method with 125 iterations (column 4) are approximately the same. Full videos are available in the supplementary material.

4.6 Observed (noisy) LWIR images and corresponding reconstructions (of the “mirror-reflections”) for $d_H = 2.7$m. The signal level in the observations is significantly smaller compared to Figure 4.5, since the person is farther away from the wall. 3D-TV reconstructions are “smoother” than their 2D counterparts since they exploit the temporal structure. Reconstruction speeds (throughputs) of the 2D-TV method (column 2) and the 3D-TV method with 125 iterations (column 4) are approximately the same. Full videos are available in the supplementary material.

4.7 Masonite scattering surface as seen by a visible camera.

Chapter 1

Introduction

Parameter estimation problems involve estimating an unknown quantity (or parameter) of interest from a set of data (or observations) that contains some information about the parameter. Such problems are ubiquitous and widely studied across diverse disciplines in science and engineering including, but not limited to, physics, computer science, signal processing, computational genomics, and economics. The steps involved in a typical parameter estimation problem are outlined in Figure 1.1. The functional dependence between the parameter of interest $\theta^*$ and the observations $y$ is described by the so-called “forward operator” $f(\cdot)$, which is typically assumed to be known a priori. In real-world applications, the observations $y$ are modeled as noisy versions of the nominal forward mapping $f(\theta^*)$, where the noise might arise from a variety of sources, e.g., unmodeled effects, quantization errors, etc. This constitutes the forward model. The inverse problem, as the name suggests, involves recovering $\theta^*$ from the noisy observations $y$ using an estimation algorithm (or an estimator, in short). The error (under a suitable metric) incurred by an estimator $\widehat{\theta}$ in recovering the true parameter determines the accuracy of the estimates. Thus, estimation algorithms are often accompanied by an analysis of their respective estimation errors, which determines the “goodness” of the estimates. A useful complement involves analyzing the fundamental (or information-theoretic) limits of estimating parameters from noisy data, which quantify the smallest achievable error (by any estimator). Such analyses provide a benchmark for optimality against which the performance of any estimator can be compared.


Figure 1.1: A typical parameter estimation problem consists of the “forward model,” which describes how noisy observations $y$ are generated from the parameter of interest $\theta^*$, and the “inverse problem,” which entails estimating $\theta^*$ from $y$. The noise (and hence the observations) are modeled as random quantities whose distribution is assumed to be known a priori. The distance between $\widehat{\theta}(y)$ and $\theta^*$ (under a suitable, specified distance metric) determines the “goodness” of the estimator.
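To make the forward/inverse dichotomy concrete, here is a minimal synthetic sketch; the forward map, noise level, and grid search below are illustrative choices, not a model used elsewhere in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forward operator f: maps a scalar parameter to 3 observations.
f = lambda theta: np.array([theta, 2.0 * theta, np.sin(theta)])

theta_star = 1.5                                   # true (unknown) parameter
y = f(theta_star) + 0.1 * rng.standard_normal(3)   # forward model: noisy f(theta*)

# Inverse problem: pick the parameter whose forward mapping best explains y
# (a brute-force grid search, applicable even when f is non-linear).
grid = np.linspace(0.0, 3.0, 3001)
theta_hat = grid[np.argmin([np.sum((y - f(t)) ** 2) for t in grid])]

print(abs(theta_hat - theta_star))  # "goodness" under an absolute-error metric
```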

1.1 Information-Theoretic Analyses of Parameter Estimation Problems

It is well-known that an inverse problem depends on two key components: (1) the complexity of the parameter class, and (2) the forward model. If the parameter of interest $\theta^*$ belongs to a structured class of signals (or vectors), then it is possible to leverage this structure and develop estimators that incur small errors; e.g., compressive sensing methods utilize such insights to accurately recover signals that admit sparse representations under a suitable basis (see, e.g., results in [2]). The forward model also plays a crucial role in determining how much information about $\theta^*$ is conveyed by the noisy observations $y$; e.g., results in compressive sensing have shown that measurement matrices with independent and identically distributed (i.i.d.) Gaussian entries are a good forward model for recovering sparse signals from linear measurements (see, e.g., results in [2]). Unsurprisingly, the aforementioned factors play a crucial role in determining the fundamental limits of a parameter estimation problem as well. In Chapters 2 and 3 of this dissertation, we analyze the fundamental limits of parameter estimation from an information-theoretic standpoint for two different problems: one where the signal of interest lies in a structured class, and another where the forward model is complicated and highly non-linear, making it difficult to obtain lower bounds using conventional methods.

We first examine the fundamental error characteristics for a general class of matrix completion problems, where the matrix of interest belongs to a structured class, in Chapter 2. In particular, we consider matrices that can be expressed as a product of two a priori unknown matrices, one of which is sparse. Our main contributions come in the form of minimax lower bounds for the expected per-element squared error for this problem under several common noise models. Specifically, we analyze scenarios where the corruptions are characterized by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations, as instances of our general result that appeared in [3]. Our results establish that the error bounds derived in [4] for complexity-regularized maximum likelihood estimators achieve, up to multiplicative constants and logarithmic factors, the minimax error rates in each of these noise scenarios, provided that the nominal number of observations is large enough and the sparse factor has (on average) at least one non-zero per column. It is worth pointing out that the lower bounds derived here quantify the smallest achievable errors by any estimator over all possible values of the parameter of interest. In other words, the lower bounds derived in Chapter 2 are not a function of the individual parameter value, but depend on the entire parameter class, and hence are known as global lower bounds.

The second part of this dissertation involves analyzing the fundamental limits of scene parameter estimation in plenoptic imaging systems, with an eye towards passive indirect imaging problems. In imaging science, plenoptic functions (also known as “light fields”) are high-dimensional functions (often 5D or higher) that describe the amount of light flowing through every point in space in every direction [5].
Given that traditional images can be interpreted as projections of the plenoptic function onto distinct (2D) spatial planes, exploiting the other dimensions of the plenoptic function can provide substantially more information about scenes of interest than do single snapshots. Specifically, we are interested in the fundamental limits of estimating scene parameters that are not in the line-of-sight (LOS) of the imaging system from information-rich (but noisy) plenoptic observations. In Chapter 3, we present a general framework to compute lower bounds on the per-parameter mean squared error (MSE) of any unbiased estimator from noisy plenoptic data. The proposed framework builds on our initial work that appeared in [6] and enables us to compute local lower bounds (bounds that are a function of the individual parameter value) for plenoptic imaging problems. Unlike the matrix completion problem mentioned above, the forward model in plenoptic imaging settings is analytically challenging to express in closed form, as it involves solving an integral equation (a Fredholm integral equation of the second kind). We circumvent this roadblock by using computer graphics rendering software to synthesize the forward model and numerically evaluate the Hammersley-Chapman-Robbins lower bound (HCR-LB), establishing lower bounds on the variance (or equivalently, the MSE) of any unbiased estimator of the unknown parameters. The HCR-LB enjoys several advantages over the more commonly used Cramer-Rao lower bound (CR-LB). Firstly, the HCR-LB doesn’t make any regularity assumptions about the log-likelihood function and hence is applicable to a broader class of problems. Secondly, the HCR-LB is at least as tight as the CR-LB when both bounds exist. Unlike the CR-LB, the HCR-LB doesn’t require computing derivatives of the score function, which is challenging to do here (since the forward model is not easily described in closed form, computing its derivatives is even more of a challenge), and hence it is a natural choice for our setting. The potential benefits of using the HCR-LB come at the cost of increased computational requirements: evaluating the HCR-LB requires solving an optimization problem that can be computationally demanding relative to evaluating the CR-LB, which simply requires computing (or approximating using Finite Difference (FD) methods) the derivatives of the log-likelihood function.

While computing lower bounds typically requires knowing the true forward mapping ($\theta^* \mapsto f(\theta^*)$) exactly, in practice, rendering software packages only provide an approximate solution of the true forward mapping. For scenarios where the rendering error is non-negligible, we analyze the effects of such rendering inconsistencies on the proposed lower bounding framework both theoretically and via simulations. We also provide a simple method to obtain upper and lower bounds bracketing the true HCR-LB in the presence of rendering errors. In addition to computing lower bounds, our framework enables us to localize the information content in the plenoptic observations using the pixelwise Fisher Information metric (computed using FD), thus providing valuable insights about Non-Line-of-Sight (NLOS) imaging problems. Some of the findings from our framework provide additional validation for the phenomenon observed in [7–11], where the authors use occluders to aid in NLOS scene recovery.
We demonstrate the utility of our framework by computing the HCR-LB under Poisson noise and additive white Gaussian noise models for a few canonical estimation problems. We also compare the lower bounds with the performance of Maximum Likelihood Estimators (MLEs) for an object localization problem, which shows that, for the scenarios we examined, our lower bounds are nearly tight and hence are indicative of the true fundamental limits.
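As a concrete (and deliberately simplified) illustration of the quantity being computed, the sketch below evaluates the HCR-LB for a scalar parameter observed through a generic vector-valued forward map under AWGN; it relies on the standard Chapman-Robbins form of the bound together with the closed-form chi-square divergence between Gaussians with a common variance. The toy forward map and all numeric values are stand-ins, not the rendered scenes of Chapter 3.

```python
import numpy as np

def hcr_lower_bound(f, theta, deltas, sigma):
    """Chapman-Robbins (HCR) lower bound on the variance of any unbiased
    estimator of a scalar theta from y ~ N(f(theta), sigma^2 I):
        Var >= sup_{d != 0} d^2 / chi2(P_{theta+d} || P_theta),
    where, for this Gaussian model, the chi-square divergence is
        chi2 = exp(||f(theta + d) - f(theta)||^2 / sigma^2) - 1."""
    f0 = f(theta)
    bounds = []
    for d in deltas:
        gap2 = np.sum((f(theta + d) - f0) ** 2)
        chi2 = np.expm1(gap2 / sigma**2)
        if chi2 > 0:                      # skip d = 0 (divergence vanishes)
            bounds.append(d**2 / chi2)
    return max(bounds)

# Toy stand-in for a renderer-synthesized forward mapping.
f = lambda th: np.array([np.sin(th), np.cos(2.0 * th), th**2])
print(hcr_lower_bound(f, theta=1.0, deltas=np.linspace(-0.5, 0.5, 201), sigma=0.1))
```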

1.2 Algorithmic Methods for Non-Line-of-Sight Plenoptic Imaging

In the third and final part of this dissertation, we move away from the paradigm of fundamental limits and focus on algorithmic methods for NLOS imaging from noisy plenoptic data, where the aim is to recover a hidden scene of interest from noisy measurements that arise from reflections off a scattering surface, e.g., a wall or the floor. Unlike the previous problem, where the parameter of interest is related to the observations through multiple (and potentially infinite) bounces, the aim of NLOS imaging is to invert a “single-bounce” model and recover the light field incident on the reflecting surface. Using the rendering equation [12], it can be shown that the NLOS observations are a linear function of the hidden scene and the reflectance properties of the surface. At visible wavelengths, commonly occurring surfaces scatter the incident light diffusely (over a wide range of angles), making the forward model of the NLOS imaging problem extremely ill-conditioned, and thus challenging to invert. However, plenoptic data, which typically comprises multiple single-snapshot images collected across time and through multiple viewpoints, is a highly structured multi-way tensor and has smooth variations across all plenoptic dimensions, due to parallax and/or motion in the hidden scene. We propose leveraging this structure of the plenoptic function (or the light field) using a multi-way Total Variation (TV) regularized linear inversion methodology, which jointly enforces sparse gradients across multiple plenoptic dimensions, to recover the hidden scene of interest.

In Chapter 4, we present an algorithm based on the split Bregman method [13] to solve our regularized linear inverse problem. The proposed algorithm has a fast convergence rate and admits a distributed (GPU-accelerated) implementation. Additionally, our algorithm only requires access to the linear forward model via function calls to the forward operator (and its adjoint), and doesn’t require storing the full forward operator as a matrix. This can be extremely beneficial, as matricized forward operators can get very large even for problems of modest size. We demonstrate the efficacy of our regularized inversion algorithm for recovering simple 2D NLOS scenes from data collected in a real-world experimental setup.

A common method to quantify the performance of a signal recovery algorithm is the Signal to Noise Ratio (SNR). However, computing the SNR requires access to ground truth data. For NLOS imaging experiments, measuring ground truth data can be an arduous or even impossible task. To overcome this hurdle, we propose a novel metric called the Signal-Separation to Noise Ratio (SSNR) as an evaluation metric for NLOS imaging problems. The proposed SSNR metric can be directly estimated from (observed and reconstructed) images without ground truth data, and faithfully quantifies the reconstruction quality. We use this metric to quantify the accuracy of our reconstructions.

It was recently observed in [14] that ordinary materials (e.g., rough metallic surfaces, colored acrylic) behave almost mirror-like at infrared wavelengths. Using this insight, we apply our multi-way TV regularized inversion methodology to recovering hidden scenes from noisy plenoptic data at Long-Wave InfraRed (LWIR) wavelengths. Experimental results on real data show that it is possible to reliably image human subjects around a corner, nearly in real time, using our algorithm.
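To make the regularizer concrete, the following is a minimal sketch of the multi-way (anisotropic) TV penalty on a 3D plenoptic tensor and of the soft-thresholding (“shrinkage”) step that split Bregman iterations apply to the gradient variables; it is an illustrative fragment, not the full matrix-free GPU solver developed in Chapter 4.

```python
import numpy as np

def grad3(x):
    """Forward finite differences along each mode of a 3D plenoptic tensor
    (e.g., two spatial axes and time); edges are replicated so each
    difference array has the same shape as x."""
    return [np.diff(x, axis=a, append=x.take([-1], axis=a)) for a in range(3)]

def multiway_tv(x, weights=(1.0, 1.0, 1.0)):
    """Anisotropic multi-way TV: a weighted sum of l1 norms of the
    directional gradients, jointly penalizing all tensor modes."""
    return sum(w * np.abs(g).sum() for w, g in zip(weights, grad3(x)))

def shrink(v, t):
    """Soft-thresholding: the closed-form proximal update that split
    Bregman applies to each gradient (auxiliary) variable per iteration."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

A tensor with smooth variations across its plenoptic dimensions has small directional gradients, so this penalty directly rewards exactly the structure induced by parallax and motion.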
Furthermore, we compare our multi-way TV regularized method against applying standard 2D TV regularization independently across multiple frames of the plenoptic images, as in [14]. The reconstruction results show the added benefits of jointly exploiting the structure across multiple plenoptic dimensions, both qualitatively and quantitatively.

Finally, it is worth noting that while the notation within each chapter is consistent, there might be some variations and reuse of symbols across chapters. Since each chapter is a self-contained effort, there might also be some overlap in the introductory material throughout.

Chapter 2

Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models

2.1 Introduction

The matrix completion problem involves imputing the missing values of a matrix from an incomplete, and possibly noisy, sampling of its entries. In general, without making any assumption about the entries of the matrix, the matrix completion problem is ill-posed and it is impossible to recover the matrix uniquely. However, if the matrix to be recovered has some intrinsic structure (e.g., low-rank structure), it is possible to design algorithms that exactly estimate the missing entries. Indeed, the performance of low-rank matrix completion and estimation methods has been extensively studied in noiseless settings [15–19], in noisy settings where the observations are affected by additive noise [20–26], and in settings where the observations are non-linear (e.g., highly-quantized or Poisson-distributed) functions of the underlying matrix entries (see, [27–29]). Recent works which explore robust recovery of low-rank matrices under malicious sparse corruptions include [30–33].

A notable advantage of using low-rank models is that the estimation strategies involved in completing such matrices can be cast as efficient convex methods which are well-understood and amenable to analysis. The fundamental estimation error characteristics of more general completion problems, for example those employing general bilinear factor models, have not (to our knowledge) been fully characterized. In this work, we provide several new results in this direction. Our focus here is on matrix completion problems under sparse factor model assumptions, where the matrix to be estimated is well-approximated by a product of two matrices, one of which is sparse. Such models have been motivated by a variety of applications in dictionary learning, subspace clustering, image demosaicing, and various machine learning problems (see, e.g., the discussion in [4]). Here, we investigate fundamental lower bounds on the achievable estimation error for these problems in several specific noise scenarios: additive Gaussian noise, additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. Our analyses complement the upper bounds provided recently in [4] for complexity-penalized maximum likelihood estimation methods, and establish that the error rates obtained in [4] are nearly minimax optimal (as long as the nominal number of measurements is large enough).¹

¹ The material in Chapter 2 is © 2018 IEEE. Reprinted, with permission, from IEEE Transactions on Information Theory, “Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models,” A. V. Sambasivan and J. D. Haupt.

2.1.1 Organization

The remainder of this chapter is organized as follows. We begin with a brief overview of the various preliminaries and notations in Section 2.1.2, followed by a formal definition of the matrix completion problem considered here in Section 2.2. Our main results are stated in Section 2.3; there, we establish minimax lower bounds for the recovery of a matrix $X^*$ that admits a sparse factorization under a general class of noise models. We also briefly discuss the implications of these bounds for different instances of noise distributions and compare them with existing works. In Section 2.4 we conclude with a concise discussion of possible extensions and potential future directions. The proofs of our main results are provided in the Appendix.

2.1.2 Notations and Preliminaries

We provide a brief summary of the notations used here and revisit a few key concepts before delving into our main results.

We let $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. For any $n \in \mathbb{N}$, $[n]$ denotes the set of integers $\{1, \dots, n\}$. For a matrix $M$, we use the following notation: $\|M\|_0$ denotes the number of non-zero elements in $M$, $\|M\|_\infty = \max_{i,j} |M_{i,j}|$ denotes the maximum (absolute) entry of $M$, and $\|M\|_F = \sqrt{\sum_{i,j} M_{i,j}^2}$ denotes the Frobenius norm. We use the standard asymptotic computational complexity ($O$, $\Omega$, $\Theta$) notations to suppress leading constants in our results for clarity of exposition.

We also briefly recall an important information-theoretic quantity, the Kullback-Leibler divergence (or KL divergence). When $x(z)$ and $y(z)$ denote the pdfs (or pmfs) of real scalar random variables, the KL divergence of $y$ from $x$ is denoted by $K(\mathbb{P}_x, \mathbb{P}_y)$ and given by

$$K(\mathbb{P}_x, \mathbb{P}_y) = \mathbb{E}_{Z \sim x(Z)}\left[\log \frac{x(Z)}{y(Z)}\right] \triangleq \mathbb{E}_x\left[\log \frac{x(Z)}{y(Z)}\right],$$

provided $x(z) = 0$ whenever $y(z) = 0$, and $\infty$ otherwise. The logarithm is taken to be the natural log. It is worth noting that the KL divergence is not symmetric in general, and $K(\mathbb{P}_x, \mathbb{P}_y) \ge 0$ with $K(\mathbb{P}_x, \mathbb{P}_y) = 0$ when $x(Z) = y(Z)$. In a sense, the KL divergence quantifies how “far” apart two distributions are.
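For a concrete feel for this definition, the following minimal snippet (with made-up pmfs) evaluates the KL divergence for discrete distributions and illustrates both its non-negativity and its asymmetry:

```python
import numpy as np

def kl_divergence(p, q):
    """K(P, Q) = E_P[log(p(Z)/q(Z))] for discrete pmfs p and q; returns
    inf unless q(z) = 0 implies p(z) = 0 (absolute continuity)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if np.any((q == 0) & (p > 0)):
        return np.inf
    mask = p > 0                       # terms with p(z) = 0 contribute 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # 0.5108...
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # 0.3681... (not symmetric)
```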

2.2 Problem Statement

2.2.1 Observation Model

We consider the problem of estimating the entries of an unknown matrix $X^* \in \mathbb{R}^{n_1 \times n_2}$ that admits a factorization of the form

$$X^* = D^* A^*, \qquad (2.1)$$

where for some integer $r \in [n_1 \wedge n_2]$, $D^* \in \mathbb{R}^{n_1 \times r}$ and $A^* \in \mathbb{R}^{r \times n_2}$ are a priori unknown factors. Additionally, our focus in this paper will be restricted to the cases where the matrix $A^*$ is $k$-sparse (having no more than $k \le rn_2$ nonzero elements). We further note here that if $k < r$, then the matrix $A^*$ will necessarily have zero rows, which can be removed without affecting the product $X^*$. Hence, without loss of generality, we assume that $k \ge r$. In addition to this, we assume that the elements of $D^*$ and $A^*$ are bounded, so that

$$\|D^*\|_\infty \le 1 \quad \text{and} \quad \|A^*\|_\infty \le A_{\max}, \qquad (2.2)$$

for some constant $A_{\max} > 0$. A direct implication of (2.2) is that the elements of $X^*$ are also bounded ($\|X^*\|_\infty \le X_{\max} \le rA_{\max}$).

However, in most applications of interest, $X_{\max}$ need not be as large as $rA_{\max}$ (for example, in the case of recommender systems, the entries of $X^*$ are bounded above by constants, i.e., $X_{\max} = O(1)$). Hence we further assume here that $X_{\max} = \Theta(A_{\max}) = O(1)$. While bounds on the amplitudes of the elements of the matrix to be estimated often arise naturally in practice, the assumption that the entries of the factors are bounded fixes some of the scaling ambiguities inherent to the bilinear model.

Instead of observing all the elements of the matrix $X^*$ directly, we assume here that we make noisy observations of $X^*$ at a known subset of locations. In what follows, we will model the observations $Y_{i,j}$ as independent draws from a probability density (or mass) function parametrized by the true underlying matrix entry $X^*_{i,j}$. We denote by $S \subseteq [n_1] \times [n_2]$ the set of locations at which observations are collected, and assume that these points are sampled randomly with $\mathbb{E}[|S|] = m$ (the nominal number of measurements) for some integer $m$ satisfying $1 \le m \le n_1 n_2$. Specifically, for

$\gamma_0 = m/(n_1 n_2)$, we suppose $S$ is generated according to an independent Bernoulli($\gamma_0$) model, so that each $(i,j) \in [n_1] \times [n_2]$ is included in $S$ independently with probability $\gamma_0$. Thus, given $S$, we model the collection of $|S|$ measurements of $X^*$ in terms of the collection $\{Y_{i,j}\}_{(i,j) \in S} \triangleq Y_S$ of conditionally (on $S$) independent random quantities. The joint pdf (or pmf) of the observations can be formally written as

$$p_{X^*_S}(Y_S) \triangleq \prod_{(i,j) \in S} p_{X^*_{i,j}}(Y_{i,j}) \triangleq \mathbb{P}_{X^*}, \qquad (2.3)$$

where $p_{X^*_{i,j}}(Y_{i,j})$ denotes the corresponding scalar pdf (or pmf), and we use the shorthand $X^*_S$ to denote the collection of elements of $X^*$ indexed by $(i,j) \in S$. Given $S$ and the corresponding noisy observations $Y_S$ of $X^*$ distributed according to (2.3), our matrix completion problem aims at estimating $X^*$ under the assumption that it admits a sparse factorization as in (2.1).
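For concreteness, here is a minimal synthetic instantiation of this observation model under additive Gaussian noise (the dimensions, sparsity level, and noise scale are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r, k, m = 60, 80, 5, 200, 2000

# Sparse factor model (2.1): X* = D* A*, with ||D*||_inf <= 1 and k-sparse A*
D = rng.uniform(-1.0, 1.0, (n1, r))
A = np.zeros((r, n2))
A.flat[rng.choice(r * n2, size=k, replace=False)] = rng.uniform(-1.0, 1.0, k)
X = D @ A

# Bernoulli(gamma0) sampling of locations with E|S| = m, then noisy draws (2.3)
gamma0 = m / (n1 * n2)
S = rng.random((n1, n2)) < gamma0
Y = np.where(S, X + 0.1 * rng.standard_normal((n1, n2)), np.nan)  # observed on S
```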

2.2.2 The Minimax Risk

In this paper, we examine the fundamental limits of estimating the elements of a matrix that follows the model (2.1), with observations as described above, using any possible estimator (irrespective of its computational tractability). The accuracy of an estimator $\widehat{X}$ in estimating the entries of the true matrix $X^*$ can be measured in terms of its risk $R_{\widehat{X}}$, which we define to be the normalized (per-element) Frobenius error,

$$R_{\widehat{X}} \triangleq \frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2}. \qquad (2.4)$$

Here, our notation is meant to denote that the expectation is taken with respect to all of the random quantities (i.e., the joint distribution of S and YS ). Let us now consider a class of matrices parametrized by the inner dimension r, sparsity factor k and upper bound Amax on the amplitude of elements of A, where each element in the class obeys the factor model (2.1) and the assumptions in (2.2). Formally, we set

$$\mathcal{X}(r, k, A_{\max}) \triangleq \left\{X = DA \in \mathbb{R}^{n_1 \times n_2} : D \in \mathbb{R}^{n_1 \times r}, \|D\|_\infty \le 1 \text{ and } A \in \mathbb{R}^{r \times n_2}, \|A\|_0 \le k, \|A\|_\infty \le A_{\max}\right\}. \qquad (2.5)$$

The worst-case performance of an estimator $\widehat{X}$ over the class $\mathcal{X}(r, k, A_{\max})$, under the Frobenius error metric defined in (2.4), is given by its maximum risk,

$$\widetilde{R}_{\widehat{X}} \triangleq \sup_{X^* \in \mathcal{X}(r, k, A_{\max})} R_{\widehat{X}}.$$

The estimator having the smallest maximum risk among all possible estimators is said to achieve the minimax risk, which is a characteristic of the estimation problem itself. For the problem of matrix completion under the sparse factor model described in Section 2.2.1, the minimax risk is expressed as

$$R^*_{\mathcal{X}(r,k,A_{\max})} \triangleq \inf_{\widehat{X}} \widetilde{R}_{\widehat{X}} = \inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}(r,k,A_{\max})} R_{\widehat{X}} = \inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}(r,k,A_{\max})} \frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2}. \qquad (2.6)$$

As we see, the minimax risk depends on the choice of the model class parameters $r$, $k$, and $A_{\max}$. It is worth noting that also inherent in the formulation of the minimax risk are the noise model and the nominal number of observations ($m = \mathbb{E}[|S|]$). For the sake of brevity, we shall not make all such dependencies explicit. In general it is complicated to obtain closed-form solutions for (2.6). Here, we will adopt a common approach employed for such problems, and seek to obtain lower bounds on the minimax risk $R^*_{\mathcal{X}(r,k,A_{\max})}$ using tools from [34]. Our analytical approach is also inspired by the approach in [35], which considered the problem of estimating low-rank matrices corrupted by sparse outliers.

2.3 Main Results and Implications

In this section we establish lower bounds on the minimax risk for the problem settings defined in Section 2.2, where the KL divergences of the associated noise distributions exhibit a certain property (a quadratic upper bound in terms of the underlying parameters of interest; we elaborate on this later). We consider four different noise models: additive Gaussian noise, additive Laplace noise, Poisson noise, and quantized (one-bit) observations, as instances of our general result. The proof of our main result presented in Theorem 2.1 appears in Appendix 2.6.1.

Theorem 2.1. Suppose the scalar pdf (or pmf) of the noise distribution satisfies, for all x, y in the domain of the parameter space,

$$K(\mathbb{P}_x, \mathbb{P}_y) \le \frac{1}{2\mu_D^2}(x - y)^2, \qquad (2.7)$$

for some constant $\mu_D$ which depends on the distribution. For observations made as independent draws $Y_S \sim \mathbb{P}_{X^*}$, there exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \ge 2$, $r \in [n_1 \wedge n_2]$, and $r \le k \le n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys

$$R^*_{\mathcal{X}(r,k,A_{\max})} \ge C \cdot \min\left\{\Delta(k, n_2) A_{\max}^2,\; \gamma^2 \mu_D^2 \left(\frac{n_1 r + k}{m}\right)\right\}, \qquad (2.8)$$

where

$$\Delta(k, n_2) \triangleq \min\left\{1, (k/n_2)\right\}. \qquad (2.9)$$

Let us now analyze the result of this theorem more closely and see how the estimation risk varies as a function of the number of measurements obtained, as well as the dimension and sparsity parameters of the matrix to be estimated. We can look at the minimax risk in equation (2.8) in two different scenarios w.r.t. the sampling regime:

• Large sample regime, or when $m \gg (n_1 r \vee k)$ (where we use the notation $\gg$ to suppress dependencies on constants). In this case we can rewrite (2.8) and lower bound the minimax risk as

$$R^*_{\mathcal{X}(r,k,A_{\max})} = \Omega\left((\mu_D \wedge A_{\max})^2 \,\Delta(k, n_2) \left(\frac{n_1 r + k}{m}\right)\right). \qquad (2.10)$$

Here the quantities $n_1 r$ and $k$ (which give the maximum number of non-zeros in $D^*$ and $A^*$, respectively) can be viewed as the number of degrees of freedom contributed by each of the factors in the matrix to be estimated.

The term $\frac{n_1 r}{m} \cdot \Delta(k, n_2)$ can be interpreted as the error associated with the non-sparse factor, which follows the parametric rate $(n_1 r / m)$ when $k \ge n_2$, i.e., when $A^*$ (on average) has at least one non-zero element per column. Qualitatively, this implies that all the degrees of freedom offered by $D^*$ manifest in the estimation of the overall matrix $X^*$ provided there are enough non-zero elements (at least one non-zero per column) in $A^*$. If there are (on average) fewer than one non-zero element per column in the sparse factor, a few rows of $D^*$ vanish due to the presence of zero columns in $A^*$ and hence not all the degrees of freedom in $D^*$ are carried over to $X^*$ (resulting in zero columns in $X^*$). This makes the overall problem easier and reduces the minimax risk (associated with $D^*$) by a factor of $(k/n_2)$. Similarly, $\frac{k}{m} \cdot \Delta(k, n_2)$ is the error term associated with the sparse factor $A^*$, and it follows the parametric rate of $(k/m)$ in the large sample regime provided $k \ge n_2$.

• Small sample regime, or when $m \ll (n_1 r \vee k)$. In this case the minimax risk in (2.8) becomes

$$R^*_{\mathcal{X}(r,k,A_{\max})} = \Omega\left(\Delta(k, n_2) A_{\max}^2\right). \qquad (2.11)$$

Equation (2.11) implies that the minimax risk in estimating the unknown matrix doesn’t become arbitrarily large when the nominal number of observations is much smaller than the number of degrees of freedom in the factors (or when $m \ll n_1 r + k$), but is instead lower bounded by the squared amplitude ($A_{\max}^2$) of the sparse factor (provided there are sufficient non-zeros in $A^*$ for all the degrees of freedom to manifest). This is a direct consequence of our assumption that the entries of the factors are bounded.

The virtue of expressing the lower bounds for the minimax risk as in (2.8) is that we don’t make any assumptions on the nominal number of samples collected; the bound is hence valid over all sampling regimes. However, in the discussions that follow, we shall often consider the large sample regime and appeal to lower bounds of the form (2.10). In the following sections, we consider different noise models which satisfy the KL-divergence criterion (2.7) and present lower bounds for each specific instance as corollaries of our general result presented in Theorem 2.1.
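As a quick numeric illustration of which term in (2.8) is active (the dimensions and constants below are arbitrary illustrative values, with the absolute constants suppressed):

```python
# Evaluate the two competing terms in (2.8) for one illustrative setting,
# assuming A_max = 1 and mu_D = 0.5 (e.g., Gaussian noise with sigma = 0.5).
n1, n2, r, k, m = 1000, 1000, 10, 10_000, 100_000
delta = min(1.0, k / n2)                    # Delta(k, n2) from (2.9)
term_amplitude = delta * 1.0**2             # Delta(k, n2) * A_max^2
term_sampling = 0.5**2 * (n1 * r + k) / m   # mu_D^2 * (n1 r + k) / m
print(min(term_amplitude, term_sampling))   # 0.05 -> sampling-limited regime
```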

2.3.1 Additive Gaussian Noise

Let us consider a setting where the observations are corrupted by i.i.d. zero-mean additive Gaussian noise with known variance. We have the following result; its proof appears in Appendix 2.6.2.

Corollary 2.1 (Lower bound for Gaussian Noise). Suppose $Y_{i,j} = X^*_{i,j} + \xi_{i,j}$, where the $\xi_{i,j}$ are i.i.d. Gaussian $\mathcal{N}(0, \sigma^2)$, $\sigma > 0$, $\forall (i,j) \in S$. There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \ge 2$, $r \in [n_1 \wedge n_2]$, and $r \le k \le n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys

$$R^*_{\mathcal{X}(r,k,A_{\max})} \ge C \cdot \min\left\{\Delta(k, n_2) A_{\max}^2,\; \gamma^2 \sigma^2 \left(\frac{n_1 r + k}{m}\right)\right\}. \qquad (2.12)$$

Remark 2.1. If instead of i.i.d. Gaussian noise we have that the $\xi_{i,j}$ are just independent zero-mean additive Gaussian random variables with $\sigma^2_{i,j} \ge \sigma^2_{\min}$ $\forall (i,j) \in S$, the result in (2.12) is still valid with $\sigma$ replaced by $\sigma_{\min}$. This stems from the fact that the KL divergence between the distributions in equations (2.42) and (2.45) can be upper bounded using the smallest variance amongst all the noise entries.
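As a quick sanity check on how Corollary 2.1 instantiates Theorem 2.1 (the complete argument is in Appendix 2.6.2), recall the closed form of the KL divergence between two scalar Gaussian observation models with a common variance:

$$K\big(\mathcal{N}(x, \sigma^2),\, \mathcal{N}(y, \sigma^2)\big) = \frac{(x - y)^2}{2\sigma^2},$$

so condition (2.7) holds (with equality) for $\mu_D = \sigma$, and (2.12) is exactly (2.8) under this substitution.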

It is worth noting that our lower bounds on the minimax risk relate directly to the work in [4], which gives upper bounds for matrix completion problems under similar sparse factor models. The normalized (per-element) Frobenius error for the sparsity-penalized maximum likelihood estimator under a Gaussian noise model presented in [4] satisfies

$$\frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2} = O\left((\sigma \wedge A_{\max})^2 \left(\frac{n_1 r + k}{m}\right) \log(n_1 \vee n_2)\right). \qquad (2.13)$$

A comparison of (2.13) to our results in Equation (2.12) implies that the rate attained by the estimator presented in [4] is minimax optimal up to a logarithmic factor when there is (on average) at least one non-zero element per column of $A^*$ (i.e., $k \ge n_2$), and provided we make sufficient observations (i.e., $m \gg n_1 r + k$).

Another direct point of comparison to our result here is the low-rank matrix completion problem with entry-wise observations considered in [22]. In particular, if we adapt the lower bounds obtained in Theorem 6 of that work to our settings, we observe that the risk involved in estimating rank-$r$ matrices that are sampled uniformly at random follows

$$\frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2} = \Omega\left((\sigma \wedge X_{\max})^2 \left(\frac{(n_1 \vee n_2) r}{m}\right)\right) = \Omega\left((\sigma \wedge X_{\max})^2 \left(\frac{(n_1 + n_2) r}{m}\right)\right), \qquad (2.14)$$

where the last equality follows from the fact that $n_1 \vee n_2 \ge (n_1 + n_2)/2$. If we consider non-sparse factor models (where $k = rn_2$), it can be seen that the product $X^* = D^* A^*$ is low-rank with $\mathrm{rank}(X^*) \le r$ and our problem reduces to the one considered in [22]

(with $m \ge (n_1 \vee n_2) r$, an assumption made in [22]). Under the conditions described above, and our assumption in Section 2.2.1 that $X_{\max} = \Theta(A_{\max})$, the lower bound given in (2.12) (or its counterpart for the large sample regime) coincides with (2.14). However, the introduction of sparsity brings additional structure which can be exploited in estimating the entries of $X^*$, thus decreasing the risk involved.

2.3.2 Additive Laplace Noise

The following corollary gives a lower bound on the minimax risk in settings where the observations $Y_S$ are corrupted with heavier-tailed (Laplace) noise; its proof is given in Appendix 2.6.3.

Corollary 2.2 (Lower bound for Laplacian Noise). Suppose $Y_{i,j} = X^*_{i,j} + \xi_{i,j}$, where the $\xi_{i,j}$ are i.i.d. Laplace$(0, \tau)$, $\tau > 0$, $\forall (i,j) \in S$. There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \ge 2$, $r \in [n_1 \wedge n_2]$, and $r \le k \le n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys

$$R^*_{\mathcal{X}(r,k,A_{\max})} \ge C \cdot \min\left\{\Delta(k, n_2) A_{\max}^2,\; \gamma^2 \tau^{-2} \left(\frac{n_1 r + k}{m}\right)\right\}. \qquad (2.15)$$

When we compare the lower bounds obtained under this noise model to the results of the previous case, it can be readily seen that the overall error rates achieved are similar in both cases. Since the variance of a Laplace$(\tau)$ random variable is $2/\tau^2$, the leading term $\tau^{-2}$ in (2.15) is analogous to the $\sigma^2$ factor which appears in the error bound for Gaussian noise. Using (2.15), we can observe that the complexity-penalized maximum likelihood estimator described in [4] is minimax optimal up to a constant times a logarithmic factor, $\tau X_{\max} \log(n_1 \vee n_2)$, in the large sample regime, and when there is (on average) at least one non-zero element per column of $A^*$ (i.e., $k \ge n_2$).
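Condition (2.7) can likewise be checked directly here. For two Laplace densities with a common rate $\tau$, a standard computation (sketched here; the formal argument is in Appendix 2.6.3) gives

$$K\big(\mathrm{Laplace}(x, \tau),\, \mathrm{Laplace}(y, \tau)\big) = \tau|x - y| + e^{-\tau|x - y|} - 1 \;\le\; \frac{\tau^2 (x - y)^2}{2},$$

where the inequality uses $t + e^{-t} - 1 \le t^2/2$ for all $t \ge 0$; thus (2.7) holds with $\mu_D = 1/\tau$, consistent with the $\tau^{-2}$ factor in (2.15).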

2.3.3 One-bit Observation Model

We consider here a scenario where the observations are quantized to a single bit, i.e., the observations $Y_{i,j}$ can take only binary values (either 0 or 1). Quantized observation models arise in many collaborative filtering applications where user ratings are quantized to fixed levels, as well as in quantum physics, communication networks, etc. (see, e.g., discussions in [27, 36]).

For a given sampling set S, we consider the observations YS to be conditionally (on S) independent random quantities defined by

$$Y_{i,j} = \mathbb{1}_{\{Z_{i,j} \ge 0\}}, \quad (i,j) \in S, \qquad (2.16)$$

where $Z_{i,j} = X^*_{i,j} - W_{i,j}$.

Here the $\{W_{i,j}\}_{(i,j) \in S}$ are i.i.d. continuous zero-mean scalar noises having (bounded) probability density function $f(w)$ and cumulative distribution function $F(w)$ for $w \in \mathbb{R}$, and $\mathbb{1}_{\{A\}}$ is the indicator function, which takes the value 1 when the event $A$ occurs (or is true) and zero otherwise. Our observations are thus quantized, corrupted versions of the true underlying matrix entries. Note that the independence of the $W_{i,j}$ implies that the elements $Y_{i,j}$ are also independent. Given this model, it can easily be seen that each $Y_{i,j}$, $(i,j) \in S$, is a Bernoulli random variable whose parameter is a function of the true parameter $X^*_{i,j}$ and the cumulative distribution function $F(\cdot)$. In particular, for any $(i,j) \in S$, we have $\Pr(Y_{i,j} = 1) = \Pr(W_{i,j} \le X^*_{i,j}) = F(X^*_{i,j})$. Hence the joint pmf of the observations $Y_S \in \{0,1\}^{|S|}$ (conditioned on the underlying matrix entries) can be written as

$$p_{X^*_S}(Y_S) = \prod_{(i,j) \in S} \left[F(X^*_{i,j})\right]^{Y_{i,j}} \left[1 - F(X^*_{i,j})\right]^{1 - Y_{i,j}}. \qquad (2.17)$$
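A minimal simulation of this one-bit model (with a probit-type choice $F = \Phi$, the standard normal cdf, chosen purely for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
X = 0.3 * rng.standard_normal((500, 500))  # stand-in for the entries X*_{i,j}
W = rng.standard_normal(X.shape)           # i.i.d. zero-mean noise, cdf F = Phi
Y = (X - W >= 0).astype(int)               # one-bit observations, as in (2.16)

# Empirical check: the fraction of ones should track E[F(X*)] over the entries
print(Y.mean(), norm.cdf(X).mean())        # agree up to sampling fluctuation
```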

We will further assume that $F(rA_{\max}) < 1$ and $F(-rA_{\max}) > 0$, which will allow us to avoid some pathological scenarios in our analyses. In such settings, the following corollary gives a lower bound on the minimax risk; its proof appears in Appendix 2.6.4.

Corollary 2.3 (Lower bound for One-bit observation model). Suppose that the observations Yi,j are obtained as described in (2.16) where Wi,j are i.i.d continuous zero-mean scalar random variables as described above, and define

 1/2 1/2 1 ! 2 cF,rAmax ,  sup  sup f (t) . (2.18) |t|≤rAmax F (t)(1 − F (t)) |t|≤rAmax

There exist absolute constants C, γ > 0 such that for all n1, n2 ≥ 2, 1 ≤ r ≤ (n1 ∧ n2), and r ≤ k ≤ n1n2/2, the minimax risk for sparse factor matrix completion over the model class X (r, k, Amax) obeys

$$R^*_{\mathcal{X}(r,k,A_{\max})} \ge C \cdot \min\left\{\Delta(k, n_2) A_{\max}^2,\; \gamma^2 c_{F, rA_{\max}}^{-2} \left(\frac{n_1 r + k}{m}\right)\right\}. \qquad (2.19)$$

It is worth commenting on the relevance of our result (in the linear sparsity regime) to the upper bounds established in [4] for the matrix completion problem under similar settings. The normalized (per-element) error of the complexity-penalized maximum likelihood estimator described in [4] obeys

h i kX − X∗k2 2 ! ! ! EYS b F c 1 n r + k  = O F,rAmax + X2 1 log(n ∨ n ) , n n c0 c2 max m 1 2 1 2 F,rAmax F,rAmax (2.20) where Xmax (≥ 0) is the upper bound on the entries of the matrix to be estimated and 18 c0 is defined as F,rAmax

\[
c'_{F,rA_{\max}} \triangleq \inf_{|t| \le rA_{\max}} \frac{f^2(t)}{F(t)\left(1 - F(t)\right)}. \tag{2.21}
\]

Comparing (2.20) with the lower bound established in (2.19), we can see that the estimator described in [4] is minimax optimal up to a logarithmic factor (in the large sample regime, and with $k \ge n_2$) when the term $c_{F,rA_{\max}}^2 / c'_{F,rA_{\max}}$ is bounded above by a constant. The lower bounds obtained for the one-bit observation model and the Gaussian case essentially exhibit the same dependence on the matrix dimensions ($n_1$, $n_2$, and $r$), the sparsity ($k$), and the nominal number of measurements ($m$), except for the leading term (which explicitly depends on the distribution of the noise variables $W_{i,j}$ in the one-bit case). Such a correspondence in error rates between rate-constrained tasks and their Gaussian counterparts was observed in earlier works on rate-constrained parameter estimation [37, 38]. It is also interesting to compare our result with the lower bounds for the one-bit (low-rank) matrix completion problem considered in [27]. In that work, the authors establish that the risk involved in matrix completion over a (convex) set of max-norm and nuclear-norm constrained matrices (with a noise pdf $f(t)$ decreasing for $t > 0$) obeys

\[
\frac{\mathbb{E}_{Y_S}\left[ \|\widehat{X} - X^*\|_F^2 \right]}{n_1 n_2} = \Omega\left( X_{\max} \sqrt{\frac{1}{c'_{F,rA_{\max}}}} \sqrt{\frac{(n_1 \vee n_2) r}{m}} \right) = \Omega\left( X_{\max} \sqrt{\frac{1}{c'_{F,rA_{\max}}}} \sqrt{\frac{(n_1 + n_2) r}{m}} \right), \tag{2.22}
\]

where $c'_{F,rA_{\max}}$ is defined as in (2.21). As long as $c_{F,rA_{\max}}^2$ and $c'_{F,rA_{\max}}$ are comparable, the leading terms of our bound and (2.22) are analogous to each other. To note the difference between this result and ours, consider the case when $A^*$ is not sparse, i.e., set $k = rn_2$ in (2.19), so that the resulting matrix $X^*$ is low-rank (with $\mathrm{rank}(X^*) \le r$). For such a setting, our error bound (2.19) scales in proportion to the ratio of the degrees of freedom $(n_1 + n_2) r$ and the nominal number of observations $m$, while the bound in [27] scales with the square root of that ratio. A more recent work [39] proposed an estimator for low-rank matrix completion on finite alphabets and establishes convergence rates faster than those in [27]. Cast into our setting, the estimation error in [39] obeys

\[
\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} = O\left( \left( \frac{c_{F,rA_{\max}}^2}{c'_{F,rA_{\max}}} \right)^2 \left( \frac{(n_1 + n_2) r}{m} \right) \log(n_1 + n_2) \right). \tag{2.23}
\]

Comparing (2.23) with our lower bounds (for the low-rank case, where $k = rn_2$), it is worth noting that their estimator achieves minimax optimal rates up to a logarithmic factor when the ratio $c_{F,rA_{\max}}^2 / c'_{F,rA_{\max}}$ is bounded above by a constant.

2.3.4 Poisson-distributed Observations

Let us now consider a scenario where the data may be observed as discrete 'counts' (common in imaging applications, e.g., the number of photons hitting the receiver per unit time). A popular model for such settings is the Poisson model, where all the entries of the matrix $X^*$ to be estimated are positive and the observation $Y_{i,j}$ at each location $(i,j) \in S$ is an independent Poisson random variable with rate parameter $X^*_{i,j}$. The problem of matrix completion now involves the task of Poisson denoising. Unlike the previous cases, this problem cannot be directly cast into the setting of our general result, as there is an additional restriction on the model class: the entries of $X^*$ must be strictly bounded away from zero. A straightforward consequence is that the sparse factor $A^*$ in the factorization cannot have any zero-valued columns. Hence $k \ge n_2$ is a necessary (but not sufficient) condition in this case. The approach we use to derive the following result is similar in spirit to the previous cases and is described in Appendix 2.6.5.
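To make the model concrete, the following minimal sketch (our own illustration; the sizes, values, and the clipping used to enforce the rate floor are all hypothetical) generates Poisson-distributed observations from a sparse-factor matrix whose entries are bounded below by $X_{\min}$:

```python
# Minimal sketch of the Poisson observation model: X* = D*A* has entries
# bounded below by X_min, and each observed entry is an independent Poisson
# count with rate X*_ij. All sizes are toy values.
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r, X_min = 40, 30, 3, 0.5       # hypothetical sizes and rate floor
D = rng.uniform(0, 1, (n1, r))
A = rng.uniform(0, 1, (r, n2))
A[:, rng.choice(n2, n2 // 2, replace=False)] *= 0.1  # columns remain nonzero
X = np.maximum(D @ A, X_min)            # crude clipping to enforce X*_ij >= X_min

mask = rng.random((n1, n2)) < 0.5       # Bernoulli(m/n1n2) sampling set S
Y = rng.poisson(X)                      # Y_ij ~ Poisson(X*_ij), observed on S
print("mean rate vs. mean count on S:", X[mask].mean(), Y[mask].mean())
```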

Theorem 2.2 (Lower bound for Poisson noise). Suppose that the entries of the matrix $X^*$ satisfy $\min_{i,j} X^*_{i,j} \ge X_{\min}$ for some constant $0 < X_{\min} \le A_{\max}$, and that the observations $Y_{i,j}$ are independent Poisson distributed random variables with rates $X^*_{i,j}$ for all $(i,j) \in S$.

There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \ge 2$, $r \in [n_1 \wedge n_2]$, and $n_2 \le k \le n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}'(r, k, A_{\max}, X_{\min})$, the subset of $\mathcal{X}(r, k, A_{\max})$ comprised of matrices with positive entries, obeys
\[
R^*_{\mathcal{X}'(r,k,A_{\max},X_{\min})} \ge C \cdot \min\left\{ \widetilde{\Delta}(k, n_2, \delta)\, A_{\max}^2,\; \gamma^2 X_{\min} \left( \frac{n_1 r + k - n_2}{m} \right) \right\}, \tag{2.24}
\]
where $\delta = X_{\min}/A_{\max}$, and the function $\widetilde{\Delta}(k, n_2, \delta)$ is given by
\[
\widetilde{\Delta}(k, n_2, \delta) = \min\left\{ (1 - \delta)^2,\; \frac{k - n_2}{n_2} \right\}. \tag{2.25}
\]

As in the previous cases, our analysis rests on establishing quadratic upper bounds on the KL divergence to obtain parametric error rates for the minimax risk; a similar approach was used in [40], which describes performance bounds for a compressive sensing sparse signal estimation task under a Poisson noise model, and in [41]. Recall that the lower bounds for each of the preceding cases exhibited a leading factor on the parametric rate, which was essentially the noise variance. Note that for a Poisson observation model, the noise variance equals the rate parameter and hence depends on the true underlying matrix entry. So we might interpret the factor $X_{\min}$ in (2.24) as the minimum variance over all the independent (but not necessarily identically distributed) Poisson observations; in this sense the result is analogous to those presented for the Gaussian and Laplace noise models. The dependence of the minimax risk on the nominal number of observations ($m$), the matrix dimensions ($n_1$, $n_2$, $r$), and the sparsity factor $k$ is encapsulated in the two terms $n_1 r/m$ and $(k - n_2)/m$. The first term, which corresponds to the error associated with the dictionary factor $D^*$, is exactly the same as in the previous noise models. However, the term associated with the sparse factor $A^*$ differs slightly from the other models discussed. In a Poisson-distributed observation model, the entries of the true underlying matrix to be estimated (which also serve as the Poisson rate parameters of the observations $Y_{i,j}$) are positive. A necessary implication is that the sparse factor $A^*$ contains no zero-valued columns, i.e., every column has at least one non-zero entry (hence $k \ge n_2$). This reduces the effective number of degrees of freedom (as described in Section 2.3.1) in the sparse factor from $k$ to $k - n_2$, thus reducing the overall minimax risk. It is worth further commenting on the relevance of this result (in the large sample regime) to the work in [4], which establishes error bounds for Poisson denoising problems with sparse factor models. From Corollary III.3 of [4], we see that the normalized (per-element) error of the complexity-penalized maximum likelihood estimator obeys

\[
\frac{\mathbb{E}_{Y_S}\left[ \|\widehat{X} - X^*\|_F^2 \right]}{n_1 n_2} = O\left( \left( X_{\max} + \frac{X_{\max}^2}{X_{\min}} \right) \left( \frac{n_1 r + k}{m} \right) \log(n_1 \vee n_2) \right), \tag{2.26}
\]
where $X_{\max}$ is the upper bound on the entries of the matrix to be estimated. Comparing (2.26) with the lower bound established in (2.24) (or again, its counterpart in the large sample regime), we can see that the estimator described in [4] is minimax optimal with respect to the matrix dimension parameters, up to a logarithmic factor (neglecting the leading constants), when $k \ge 2n_2$.

We now comment briefly on our assumption that the elements of the true underlying matrix $X^*$ be greater than or equal to some $X_{\min} > 0$. Here, this parameter shows up as a multiplicative factor on the parametric rate $(n_1 r + k - n_2)/m$, which suggests that the minimax risk vanishes to 0 at the rate $X_{\min}/m$ (when the problem dimensions are fixed). This implication is in agreement with the classical Cramér-Rao lower bound, which states that the variance associated with estimating a Poisson($\theta$) random variable from $m$ iid observations decays at the rate $\theta/m$ (and is achieved by the sample-average estimator). Thus the notion that the denoising problem becomes easier as the rate parameter decreases is intuitive and consistent with classical analyses. On this note, we briefly mention recent efforts that do not make assumptions on the minimum rate of the underlying Poisson processes: for matrix estimation tasks as considered here [29], and for sparse vector estimation from Poisson-distributed compressive observations [42].
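As a quick numerical illustration (ours, not part of the thesis results), the classical rate quoted above is easily verified by simulation:

```python
# Numerical check of the Cramér-Rao rate quoted above: the sample mean of
# m iid Poisson(theta) draws is unbiased with variance theta/m.
import numpy as np

rng = np.random.default_rng(4)
theta, m, trials = 2.5, 200, 20_000     # illustrative values
estimates = rng.poisson(theta, size=(trials, m)).mean(axis=1)
print(estimates.var(), theta / m)       # both approximately 0.0125
```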

2.4 Conclusion

In this chapter, we established minimax lower bounds for sparse factor matrix completion tasks under very general noise/corruption models. We also provided lower bounds for several specific noise distributions that fall under our general noise model. This indicates that property (2.7), which requires that the scalar KL divergences of the noise distribution admit a quadratic upper bound in terms of the underlying parameters, is not overly restrictive in many interesting scenarios. A unique aspect of our analysis is its applicability to matrices representable as a product of structured factors. While our focus here was specifically on models in which one factor is sparse, the approach we utilize to construct packing sets extends naturally to other structured factor models (of which standard low-rank models are one particular case). A similar analysis to that utilized here could also be used to establish lower bounds on the estimation of structured tensors, for example those expressible in a Tucker decomposition with a sparse core and possibly structured factor matrices (see, e.g., [43] for a discussion of Tucker models). We defer investigations along these lines to a future effort.

2.5 Acknowledgement

We acknowledge support for this effort from the DARPA Young Faculty Award, Grant No. N66001-14-1-4047. We are grateful to the anonymous reviewer of our paper [3] for their detailed and thorough evaluations of the paper. In particular, we thank the reviewer for pointing out some subtle errors in the initial versions of our main results that motivated us to obtain tighter lower bounds.

2.6 Appendix

In order to prove Theorem 2.1, we use standard minimax analysis techniques; in particular, we rely on the following theorem, whose proof is available in [34].

Theorem 2.3 (Adapted from Theorem 2.5 in [34]). Assume that $M \ge 2$ and suppose that there exists a finite set $\mathcal{X} = \{X_0, X_1, \ldots, X_M\} \subset \mathcal{X}(r, k, A_{\max})$ such that

• $d(X_j, X_k) \ge 2s$ for all $0 \le j < k \le M$, where $d(\cdot,\cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a semi-distance function, and

• $\frac{1}{M} \sum_{j=1}^{M} K(\mathbb{P}_{X_j}, \mathbb{P}_{X_0}) \le \alpha \log M$ with $0 < \alpha < 1/8$.

Then
\[
\inf_{\widehat{X}} \sup_{X \in \mathcal{X}(r,k,A_{\max})} \mathbb{P}_X\left( d(\widehat{X}, X) \ge s \right) \ge \inf_{\widehat{X}} \sup_{X \in \mathcal{X}} \mathbb{P}_X\left( d(\widehat{X}, X) \ge s \right) \ge \frac{\sqrt{M}}{1 + \sqrt{M}} \left( 1 - 2\alpha - \sqrt{\frac{2\alpha}{\log M}} \right) > 0. \tag{2.27}
\]

Here the first inequality arises from the fact that the supremum over the class of matrices $\mathcal{X}$ is upper bounded by that over the larger class $\mathcal{X}(r, k, A_{\max})$ (in other words, estimating the matrix over an uncountably infinite class is at least as difficult as solving the problem over any finite subclass). We thus reduce the problem of matrix completion over the uncountably infinite set $\mathcal{X}(r, k, A_{\max})$ to a carefully chosen finite collection of matrices $\mathcal{X} \subset \mathcal{X}(r, k, A_{\max})$, and lower bound the latter, which then gives a valid bound for the overall problem. In order to obtain tight lower bounds, it is essential to carefully construct the class $\mathcal{X}$ (commonly called a packing set) with large cardinality, such that its elements are also as far apart as possible in terms of the (normalized) Frobenius distance, which is our choice of semi-distance metric.
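As a small numerical illustration (ours; the sizes are toy values), pairwise Hamming distances among random binary patterns typically concentrate near $k/2$, comfortably above the $k/8$ separation that the Varshamov-Gilbert bound guarantees for a packing of cardinality $2^{k/8}$:

```python
# Toy check of the packing-set idea: random binary patterns of length k are
# typically well separated in Hamming distance, relative to the k/8 target.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
k, M = 64, 32                           # pattern length and packing size (toy)
patterns = rng.integers(0, 2, size=(M, k))
dists = [int(np.sum(a != b)) for a, b in combinations(patterns, 2)]
print(f"min pairwise Hamming distance: {min(dists)} (VG target: {k // 8})")
```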

2.6.1 Proof of Theorem 2.1

Let us define a class of matrices $\mathcal{X} \subset \mathbb{R}^{n_1 \times n_2}$ as
\[
\mathcal{X} \triangleq \{ X = DA : D \in \mathcal{D},\ A \in \mathcal{A} \}, \tag{2.28}
\]
where the factor classes $\mathcal{D} \subset \mathbb{R}^{n_1 \times r}$ and $\mathcal{A} \subset \mathbb{R}^{r \times n_2}$ are constructed as follows for $\gamma_d, \gamma_a \le 1$ (to be quantified later):
\[
\mathcal{D} \triangleq \left\{ D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, 1, d_0\},\ \forall (i,j) \in [n_1] \times [r] \right\}, \tag{2.29}
\]
and
\[
\mathcal{A} \triangleq \left\{ A \in \mathbb{R}^{r \times n_2} : A_{i,j} \in \{0, A_{\max}, a_0\},\ \forall (i,j) \in [r] \times [n_2],\ \|A\|_0 \le k \right\}, \tag{2.30}
\]
where
\[
d_0 \triangleq \min\left\{ 1,\ \frac{\gamma_d \cdot \mu_D}{A_{\max} \sqrt{\Delta(k, n_2)}} \left( \frac{n_1 r}{m} \right)^{1/2} \right\}, \qquad
a_0 \triangleq \min\left\{ A_{\max},\ \frac{\gamma_a \cdot \mu_D}{\sqrt{\Delta(k, n_2)}} \left( \frac{k}{m} \right)^{1/2} \right\},
\]
and $\Delta(k, n_2)$ is defined in (2.9). Clearly $\mathcal{X}$ as defined in (2.28) is a finite class of matrices which admits a factorization as in Section 2.2.1, so $\mathcal{X} \subset \mathcal{X}(r, k, A_{\max})$. We consider the lower bounds involving the non-sparse factor $D$ and the sparse factor $A$ separately, and then combine those results to get an overall lower bound on the minimax risk $R^*_{\mathcal{X}(r,k,A_{\max})}$.

Let us first establish the lower bound obtained by using the sparse factor $A$. To do this, we define a set of sparse matrices $\bar{\mathcal{A}} \subset \mathcal{A}$ in which all the nonzero terms are stacked in the first $r' = \lceil k/n_2 \rceil$ rows. Formally, we define

\[
\bar{\mathcal{A}} \triangleq \left\{ A \in \mathbb{R}^{r \times n_2} : A = (A_{\mathrm{nz}} \,|\, 0_A)^T,\ (A_{\mathrm{nz}})_{i,j} \in \{0, a_0\},\ \forall (i,j) \in [n_2] \times [r'],\ \|A_{\mathrm{nz}}\|_0 \le k \right\}, \tag{2.31}
\]

where $A_{\mathrm{nz}}$ is an $n_2 \times r'$ sparse matrix with at most $k$ non-zeros and $0_A$ is an $n_2 \times (r - r')$ zero matrix. Let us now define the finite class of matrices $\mathcal{X}_A \subset \mathcal{X}$ as

\[
\mathcal{X}_A \triangleq \left\{ X = D_I A : A \in \bar{\mathcal{A}} \right\}, \tag{2.32}
\]
where $D_I$ is made up of zero blocks and identity blocks, defined as
\[
D_I \triangleq \begin{pmatrix} I_{r'} & 0 \\ \vdots & \vdots \\ I_{r'} & 0 \\ 0 & 0 \end{pmatrix}, \tag{2.33}
\]
where $I_{r'}$ denotes the $r' \times r'$ identity matrix.

The definitions in equations (2.31) to (2.33) imply that the elements of $\mathcal{X}_A$ are block matrices of the form $\left( A_{\mathrm{nz}}^T \ \cdots \ A_{\mathrm{nz}}^T \ 0 \right)^T$, with $\lfloor n_1/r' \rfloor$ blocks of $A_{\mathrm{nz}}$ for each $A \in \bar{\mathcal{A}}$, the rest being a zero matrix of the appropriate dimension. Since the entries of $A_{\mathrm{nz}}$ can take only one of two values, 0 or $a_0$, and since there are at most $k$ non-zero elements (due to the sparsity constraint), the Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) guarantees the existence of a subset $\mathcal{X}'_A \subseteq \mathcal{X}_A$ with cardinality $\mathrm{Card}(\mathcal{X}'_A) \ge 2^{k/8} + 1$, containing the $n_1 \times n_2$ zero matrix $\mathbf{0}$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_A$ we have

\[
\|X_1 - X_2\|_F^2 \ge \frac{k}{8} \left\lfloor \frac{n_1}{r'} \right\rfloor a_0^2
= \frac{k}{8} \left\lfloor \frac{n_1}{\lceil k/n_2 \rceil} \right\rfloor \min\left\{ A_{\max}^2,\ \frac{\gamma_a^2 \mu_D^2}{\Delta(k, n_2)} \left( \frac{k}{m} \right) \right\}
\ge \frac{n_1 n_2}{32} \underbrace{\left( \frac{k \wedge n_2}{n_2} \right)}_{=\Delta(k,n_2)} \min\left\{ A_{\max}^2,\ \frac{\gamma_a^2 \mu_D^2}{\Delta(k, n_2)} \left( \frac{k}{m} \right) \right\}
= \frac{n_1 n_2}{32} \min\left\{ \Delta(k, n_2) A_{\max}^2,\ \gamma_a^2 \mu_D^2 \left( \frac{k}{m} \right) \right\}, \tag{2.34}
\]
where the second-to-last inequality uses the facts that $k \le n_1 n_2/2$ and $\lfloor x \rfloor \ge x/2$ for all $x \ge 1$.

For any $X \in \mathcal{X}'_A$, consider the KL divergence of $\mathbb{P}_0$ from $\mathbb{P}_X$:
\[
K(\mathbb{P}_X, \mathbb{P}_0) = \mathbb{E}_X\left[ \log \frac{p_{X_S}(Y_S)}{p_0(Y_S)} \right] = \frac{m}{n_1 n_2} \sum_{i,j} K(\mathbb{P}_{X_{i,j}}, \mathbb{P}_{0_{i,j}}) \tag{2.35}
\]
\[
\le \frac{m}{n_1 n_2} \sum_{i,j} \frac{1}{2\mu_D^2} |X_{i,j}|^2, \tag{2.36}
\]
where (2.35) is obtained by conditioning² the expectation with respect to the sampling set $S$, and (2.36) follows from the assumption (2.7) on the noise model. To further upper bound the right-hand side of (2.36), we note that the maximum number of nonzero entries in any $X \in \mathcal{X}'_A$ is at most $n_1 (k \wedge n_2)$ by the construction of the sets $\mathcal{X}_A$ and $\bar{\mathcal{A}}$ in (2.32) and (2.31), respectively. Hence we have
\[
K(\mathbb{P}_X, \mathbb{P}_0) \le \frac{m}{2\mu_D^2} \underbrace{\left( \frac{k \wedge n_2}{n_2} \right)}_{=\Delta(k,n_2)} a_0^2
= \frac{m}{2\mu_D^2} \min\left\{ \Delta(k, n_2) A_{\max}^2,\ \gamma_a^2 \mu_D^2 \left( \frac{k}{m} \right) \right\}. \tag{2.37}
\]

²Here, both the observations $Y_S$ and the sampling set $S$ are random quantities. Thus, by conditioning with respect to $S$, we get $\mathbb{E}_{X}\left[ \log\left( p_{X_S}(Y_S)/p_0(Y_S) \right) \right] = \mathbb{E}_S\left[ \mathbb{E}_{X_S|S}\left[ \log\left( p_{X_S}(Y_S)/p_0(Y_S) \right) \right] \right]$. Since $S$ is generated according to the independent Bernoulli($m/n_1 n_2$) model, $\mathbb{E}_S[\cdot]$ yields the constant factor $m/(n_1 n_2)$. We shall use such conditioning techniques in subsequent proofs as well.

From (2.37) we see that

\[
\frac{1}{\mathrm{Card}(\mathcal{X}'_A) - 1} \sum_{X \in \mathcal{X}'_A} K(\mathbb{P}_X, \mathbb{P}_0) \le \alpha \log\left( \mathrm{Card}(\mathcal{X}'_A) - 1 \right) \tag{2.38}
\]

is satisfied for any $0 < \alpha < 1/8$ by choosing $0 < \gamma_a < \sqrt{\alpha \log 2}/2$. Equations (2.34) and (2.38) imply that we can apply Theorem 2.3 (with the Frobenius error used as the semi-distance function) to yield
\[
\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_A} \mathbb{P}_{X^*}\left( \frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{ \Delta(k, n_2) A_{\max}^2,\ \gamma_a^2 \mu_D^2 \left( \frac{k}{m} \right) \right\} \right) \ge \beta, \tag{2.39}
\]
for some absolute constant $\beta \in (0,1)$.

We now consider the non-sparse factor $D$ to construct a testing set and establish lower bounds similar to the previous case. Let us define a finite class of matrices $\mathcal{X}_D \subseteq \mathcal{X}$ as

\[
\mathcal{X}_D \triangleq \left\{ X = DA : D \in \bar{\mathcal{D}},\ A = A_{\max} \left( I_r \ \cdots \ I_r \ 0_D \right) \in \mathcal{A} \right\}, \tag{2.40}
\]
where $A$ is constructed with $\lfloor (k \wedge n_2)/r \rfloor$ blocks of $r \times r$ identity matrices (denoted $I_r$), $0_D$ is an $r \times \left( n_2 - r \lfloor (k \wedge n_2)/r \rfloor \right)$ zero matrix, and $\bar{\mathcal{D}} \subseteq \mathcal{D}$ is defined as
\[
\bar{\mathcal{D}} \triangleq \left\{ D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, d_0\},\ \forall (i,j) \in [n_1] \times [r] \right\}. \tag{2.41}
\]

The definition in (2.40) is similar to the one we used to construct $\mathcal{X}_A$, and hence it results in a block matrix structure for the elements of $\mathcal{X}_D$. We note here that there are $n_1 r$ elements in each block $D$, where each entry can be either 0 or $d_0$. Hence the Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) guarantees the existence of a subset $\mathcal{X}'_D \subseteq \mathcal{X}_D$ with cardinality $\mathrm{Card}(\mathcal{X}'_D) \ge 2^{n_1 r/8} + 1$, containing the $n_1 \times n_2$ zero matrix $\mathbf{0}$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_D$ we have

\[
\|X_1 - X_2\|_F^2 \ge \frac{n_1 r}{8} \left\lfloor \frac{k \wedge n_2}{r} \right\rfloor A_{\max}^2 d_0^2
\ge \frac{n_1 n_2}{16} \min\left\{ A_{\max}^2 \Delta(k, n_2),\ \gamma_d^2 \mu_D^2 \left( \frac{n_1 r}{m} \right) \right\}, \tag{2.42}
\]
where we use the facts that $(k \wedge n_2) = n_2 \cdot \Delta(k, n_2)$ and $\lfloor x \rfloor \ge x/2$ for all $x \ge 1$ to obtain the last inequality.

For any $X \in \mathcal{X}'_D$, consider the KL divergence of $\mathbb{P}_0$ from $\mathbb{P}_X$:
\[
K(\mathbb{P}_X, \mathbb{P}_0) = \mathbb{E}_X\left[ \log \frac{p_{X_S}(Y_S)}{p_0(Y_S)} \right] \le \frac{m}{n_1 n_2} \sum_{i,j} \frac{1}{2\mu_D^2} |X_{i,j}|^2, \tag{2.43}
\]
where the inequality follows from the assumption (2.7) on the noise model. To further upper bound the right-hand side of (2.43), we note that the maximum number of nonzero entries in any $X \in \mathcal{X}'_D$ is at most $n_1 (k \wedge n_2)$ by the construction of the class $\mathcal{X}_D$ in (2.40). Hence we have
\[
K(\mathbb{P}_X, \mathbb{P}_0) \le \frac{m}{2\mu_D^2} \underbrace{\left( \frac{k \wedge n_2}{n_2} \right)}_{=\Delta(k,n_2)} A_{\max}^2 d_0^2
= \frac{m}{2\mu_D^2} \min\left\{ \Delta(k, n_2) A_{\max}^2,\ \gamma_d^2 \mu_D^2 \left( \frac{n_1 r}{m} \right) \right\}. \tag{2.44}
\]

From (2.44) we see that

\[
\frac{1}{\mathrm{Card}(\mathcal{X}'_D) - 1} \sum_{X \in \mathcal{X}'_D} K(\mathbb{P}_X, \mathbb{P}_0) \le \alpha' \log\left( \mathrm{Card}(\mathcal{X}'_D) - 1 \right) \tag{2.45}
\]

is satisfied for any $0 < \alpha' < 1/8$ by choosing $0 < \gamma_d < \sqrt{\alpha' \log 2}/2$. Equations (2.42) and (2.45) imply that we can apply Theorem 2.3 (with the Frobenius error used as the semi-distance function) to yield
\[
\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_D} \mathbb{P}_{X^*}\left( \frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{ \Delta(k, n_2) A_{\max}^2,\ \gamma_d^2 \mu_D^2 \left( \frac{n_1 r}{m} \right) \right\} \right) \ge \beta', \tag{2.46}
\]
for some absolute constant $\beta' \in (0,1)$. Inequalities (2.39) and (2.46) imply the result

\[
\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}(r,k,A_{\max})} \mathbb{P}_{X^*}\left( \frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{128} \cdot \min\left\{ \Delta(k, n_2) A_{\max}^2,\ \gamma_D^2 \mu_D^2 \left( \frac{n_1 r + k}{m} \right) \right\} \right) \ge (\beta' \wedge \beta), \tag{2.47}
\]
where $\gamma_D = (\gamma_d \wedge \gamma_a)$ is a suitable value for the leading constant, and $(\beta' \wedge \beta) \in (0,1)$. In order to obtain this result for the entire class $\mathcal{X}(r, k, A_{\max})$, we use the fact that solving the matrix completion problem described in Section 2.2.1 over a larger (and possibly uncountable) class of matrices is at least as difficult as solving the same problem over a smaller (and possibly finite) subclass. Applying Markov's inequality to (2.47) directly yields the result of Theorem 2.1, completing the proof. □

2.6.2 Proof of Corollary 2.1

For a Gaussian distribution with mean $x \in \mathbb{R}$ and variance $\sigma^2$, denoted by $\mathbb{P}_x \sim \mathcal{N}(x, \sigma^2)$, we have
\[
p_x(z) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2\sigma^2} (z - x)^2 \right), \quad \forall z \in \mathbb{R}. \tag{2.48}
\]

Using the expression for the pdf of a Gaussian random variable in (2.48), the KL divergence of $\mathbb{P}_x$ from $\mathbb{P}_y$ (for any $y \in \mathbb{R}$) satisfies
\[
K(\mathbb{P}_x, \mathbb{P}_y) = \mathbb{E}_x\left[ \log \frac{p_x(z)}{p_y(z)} \right] = \frac{1}{2\sigma^2} (x - y)^2. \tag{2.49}
\]

The expression for the KL divergence between scalar Gaussian distributions (with identical variances) given in (2.49) obeys the condition (2.7) with equality, with $\mu_D = \sigma$. Hence we directly appeal to Theorem 2.1 to yield the desired result. □

2.6.3 Proof of Corollary 2.2

For a Laplace distribution with parameter $\tau > 0$ centered at $x \in \mathbb{R}$, denoted by $\mathbb{P}_x \sim \mathrm{Laplace}(x, \tau)$, the KL divergence of $\mathbb{P}_x$ from $\mathbb{P}_y$ (for any $y \in \mathbb{R}$) can be computed by (relatively) straightforward calculation as
\[
K(\mathbb{P}_x, \mathbb{P}_y) = \mathbb{E}_x\left[ \log \frac{p_x(z)}{p_y(z)} \right] = \tau|x - y| - 1 + e^{-\tau|x-y|}. \tag{2.50}
\]

Using a series expansion of the exponential in (2.50), we have

\[
e^{-\tau|x-y|} = 1 - \tau|x-y| + \frac{(\tau|x-y|)^2}{2!} - \frac{(\tau|x-y|)^3}{3!} + \cdots \le 1 - \tau|x-y| + \frac{(\tau|x-y|)^2}{2!}. \tag{2.51}
\]

Rearranging the terms in (2.51) yields the result
\[
K(\mathbb{P}_x, \mathbb{P}_y) \le \frac{\tau^2}{2} (x - y)^2. \tag{2.52}
\]

With the upper bound on the KL divergence established in (2.52), we directly appeal to Theorem 2.1 with $\mu_D = \tau^{-1}$ to yield the desired result. □
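As a sanity check (ours, not part of the proof), the closed form in (2.50) and the quadratic bound (2.52) can be verified numerically; the integration limits below are an assumption chosen wide enough to capture the mass of the density:

```python
# Numerical verification of the Laplace KL closed form (2.50) and the
# quadratic upper bound (2.52).
import numpy as np
from scipy.integrate import quad

def kl_laplace(x, y, tau):
    px = lambda z: 0.5 * tau * np.exp(-tau * abs(z - x))
    py = lambda z: 0.5 * tau * np.exp(-tau * abs(z - y))
    return quad(lambda z: px(z) * np.log(px(z) / py(z)), -50, 50, points=[x, y])[0]

x, y, tau = 0.3, -0.2, 2.0
u = tau * abs(x - y)
print(kl_laplace(x, y, tau))            # numerical KL divergence
print(u - 1 + np.exp(-u))               # closed form tau|x-y| - 1 + e^{-tau|x-y|}
print(0.5 * tau**2 * (x - y)**2)        # quadratic upper bound (2.52)
```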

2.6.4 Proof of Corollary 2.3

For any $X, X^* \in \mathcal{X}(r, k, A_{\max})$, using the pmf model described in (2.17), it is straightforward to show that the scalar KL divergence is given by
\[
K(\mathbb{P}_{X^*_{i,j}}, \mathbb{P}_{X_{i,j}}) = F(X^*_{i,j}) \log\left( \frac{F(X^*_{i,j})}{F(X_{i,j})} \right) + \left( 1 - F(X^*_{i,j}) \right) \log\left( \frac{1 - F(X^*_{i,j})}{1 - F(X_{i,j})} \right),
\]
for any $(i,j) \in S$. We directly use an intermediate result from [4] to invoke a quadratic upper bound on the KL divergence term,
\[
K(\mathbb{P}_{X^*_{i,j}}, \mathbb{P}_{X_{i,j}}) \le \frac{1}{2} c_{F,rA_{\max}}^2 \left( X^*_{i,j} - X_{i,j} \right)^2, \tag{2.53}
\]
where $c_{F,rA_{\max}}$ is defined in (2.18). Such an upper bound in terms of the underlying matrix entries can be attained by following a procedure illustrated in [36]: one first establishes quadratic bounds on the KL divergence in terms of the Bernoulli parameters, and then bounds the squared difference between Bernoulli parameters in terms of the squared difference of the underlying matrix elements. With the quadratic upper bound (2.53) on the scalar KL divergences in terms of the underlying matrix entries, we directly appeal to the result of Theorem 2.1 with $\mu_D = c_{F,rA_{\max}}^{-1}$ to yield the desired result. □

2.6.5 Proof of Theorem 2.2

The Poisson observation model considered here assumes that all the entries of the underlying matrix $X^*$ are strictly non-zero. We will use techniques similar to those in Appendix 2.6.1 to derive the result for this model. However, we need to be careful while constructing the sample class of matrices, as we need to ensure that all the entries of its members are strictly bounded away from zero (and in fact $\ge X_{\min}$). In this proof sketch, we show how an appropriate packing set can be constructed for this problem, and obtain lower bounds using arguments as in Appendix 2.6.1. As before, let us fix $D$ and establish the lower bounds due to the sparse factor $A$ alone. For $\gamma_d, \gamma_a \le 1$ (to be quantified later), we construct the factor classes $\mathcal{D} \subset \mathbb{R}^{n_1 \times r}$ and $\mathcal{A} \subset \mathbb{R}^{r \times n_2}$ as
\[
\mathcal{D} \triangleq \left\{ D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, 1, \delta, d_0\},\ \forall (i,j) \in [n_1] \times [r] \right\}, \tag{2.54}
\]
and
\[
\mathcal{A} \triangleq \left\{ A \in \mathbb{R}^{r \times n_2} : A_{i,j} \in \{0, X_{\min}, A_{\max}, a_0\},\ \forall (i,j) \in [r] \times [n_2],\ \|A\|_0 \le k \right\}, \tag{2.55}
\]
where
\[
\delta \triangleq \frac{X_{\min}}{A_{\max}}, \qquad
d_0 \triangleq \min\left\{ 1 - \delta,\ \frac{\gamma_d \sqrt{X_{\min}}}{A_{\max}} \left( \frac{n_1 r}{m} \right)^{1/2} \right\}, \qquad
a_0 \triangleq \min\left\{ A_{\max},\ \gamma_a \sqrt{\frac{X_{\min}}{\Delta(k - n_2, n_2)}} \left( \frac{k - n_2}{m} \right)^{1/2} \right\},
\]
and $\Delta(\cdot,\cdot)$ is defined in (2.9). Similar to the previous case, we consider a subclass $\bar{\mathcal{A}} \subset \mathcal{A}$ with at most $k$ nonzero entries, all stacked in the first $r' + 1 = \lceil k/n_2 \rceil$ rows, such that for all $A \in \bar{\mathcal{A}}$ we have
\[
(A)_{i,j} = \begin{cases} X_{\min} & \text{for } i = 1,\ j \in [n_2], \\ (A)_{i,j} \in \{0, a_0\} & \text{for } 2 \le i \le r' + 1,\ j \in [n_2], \\ 0 & \text{otherwise.} \end{cases} \tag{2.56}
\]

Now we define a finite class of matrices $\mathcal{X}_A \subset \mathbb{R}^{n_1 \times n_2}$ as
\[
\mathcal{X}_A \triangleq \left\{ X = (D_0 + D_I) A : A \in \bar{\mathcal{A}} \right\}, \tag{2.57}
\]
where $D_I, D_0 \in \mathcal{D}$ are defined as
\[
D_I \triangleq \begin{pmatrix} 0_{r'} & I_{r'} & 0 \\ \vdots & \vdots & \vdots \\ 0_{r'} & I_{r'} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \text{and} \tag{2.58}
\]
\[
D_0 \triangleq \left( \mathbf{1}_{n_1} \,|\, 0 \right), \tag{2.59}
\]
where $\mathbf{1}_{n_1}$ is the $n_1 \times 1$ vector of all ones, and $D_I$ contains $\lfloor n_1/r' \rfloor$ blocks of $0_{r'}$ (the $r' \times 1$ zero vector) and $I_{r'}$ (the $r' \times r'$ identity matrix). The above definitions ensure that $\mathcal{X}_A \subset \mathcal{X}'(r, k, A_{\max}, X_{\min})$. In particular, for any $X \in \mathcal{X}_A$ we have

\[
X = (D_0 + D_I) A = \underbrace{X_{\min} \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T}_{=X_0} + D_I A', \tag{2.60}
\]
where $(D_0 + D_I) \in \mathcal{D}$, $A \in \bar{\mathcal{A}}$, and $A' \in \mathbb{R}^{r \times n_2}$ retains only rows 2 through $r' + 1$ of $A$. It is also worth noting here that the matrix $D_I A'$ contains $\lfloor n_1/r' \rfloor$ copies of the nonzero elements of $A'$. Now let us consider the Frobenius distance between any two distinct elements $X_1, X_2 \in \mathcal{X}_A$:

\[
\|X_1 - X_2\|_F^2 = \|(D_0 + D_I) A_1 - (D_0 + D_I) A_2\|_F^2 = \|X_0 + D_I A'_1 - X_0 - D_I A'_2\|_F^2 = \|D_I A'_1 - D_I A'_2\|_F^2. \tag{2.61}
\]

The construction of $A'$ in (2.60) and of the class $\bar{\mathcal{A}}$ in (2.56) imply that the number of degrees of freedom in the sparse matrix $A'$ (whose entries can take values 0 or $a_0$) is restricted to $k - n_2$. The Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) can be easily applied to the set of matrices of the form $\widetilde{X} = D_I A'$ for $A \in \bar{\mathcal{A}}$, and this, coupled with (2.61), guarantees the existence of a subset $\mathcal{X}'_A \subseteq \mathcal{X}_A$ with cardinality $\mathrm{Card}(\mathcal{X}'_A) \ge 2^{(k-n_2)/8} + 1$, containing the $n_1 \times n_2$ reference matrix $X_0 = X_{\min} \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_A$ we have
\[
\|X_1 - X_2\|_F^2 \ge \frac{k - n_2}{8} \left\lfloor \frac{n_1}{r'} \right\rfloor a_0^2
= \frac{k - n_2}{8} \left\lfloor \frac{n_1}{\lceil (k - n_2)/n_2 \rceil} \right\rfloor \min\left\{ A_{\max}^2,\ \gamma_a^2 \frac{X_{\min}}{\Delta(k - n_2, n_2)} \left( \frac{k - n_2}{m} \right) \right\}
\]
\[
\ge \frac{n_1 n_2}{32} \underbrace{\left( \frac{(k - n_2) \wedge n_2}{n_2} \right)}_{=\Delta(k - n_2, n_2)} \min\left\{ A_{\max}^2,\ \gamma_a^2 \frac{X_{\min}}{\Delta(k - n_2, n_2)} \left( \frac{k - n_2}{m} \right) \right\}
= \frac{n_1 n_2}{32} \min\left\{ \Delta(k - n_2, n_2) A_{\max}^2,\ \gamma_a^2 X_{\min} \left( \frac{k - n_2}{m} \right) \right\}, \tag{2.62}
\]
where the second-to-last inequality uses the fact that $\lfloor x \rfloor \ge x/2$ when $x \ge 1$. The joint pmf of the set of $|S|$ observations (conditioned on the true underlying matrix) can conveniently be written as a product of Poisson pmfs using the independence criterion:
\[
p_{X^*_S}(Y_S) = \prod_{(i,j)\in S} \frac{(X^*_{i,j})^{Y_{i,j}} e^{-X^*_{i,j}}}{(Y_{i,j})!}. \tag{2.63}
\]

For any $X \in \mathcal{X}'_A$, the KL divergence of $\mathbb{P}_{X_0}$ from $\mathbb{P}_X$, where $X_0$ is the reference matrix (whose entries are all equal to $X_{\min}$), is obtained by using an intermediate result from [40], giving
\[
K(\mathbb{P}_X, \mathbb{P}_{X_0}) = \mathbb{E}_S\left[ \sum_{i,j} K(\mathbb{P}_{X_{i,j}}, \mathbb{P}_{X_{0,i,j}}) \right]
= \frac{m}{n_1 n_2} \sum_{i,j} \left( X_{i,j} \log\left( \frac{X_{i,j}}{X_{\min}} \right) - X_{i,j} + X_{\min} \right).
\]

Using the inequality $\log t \le t - 1$, we can bound the KL divergence as
\[
K(\mathbb{P}_X, \mathbb{P}_{X_0}) \le \frac{m}{n_1 n_2} \sum_{i,j} \left( X_{i,j} \left( \frac{X_{i,j} - X_{\min}}{X_{\min}} \right) - X_{i,j} + X_{\min} \right)
= \frac{m}{n_1 n_2} \sum_{i,j} \frac{(X_{i,j} - X_{\min})^2}{X_{\min}}. \tag{2.64}
\]

To further upper bound the right-hand side of (2.64), we note that the number of entries greater than $X_{\min}$ in any $X \in \mathcal{X}'_A$ is at most $n_1 \left( n_2 \wedge (k - n_2) \right)$ by the construction of the sets $\mathcal{X}_A$ and $\bar{\mathcal{A}}$ in (2.57) and (2.56), respectively. Hence we have
\[
K(\mathbb{P}_X, \mathbb{P}_{X_0}) \le m\, \frac{(a_0 + X_{\min} - X_{\min})^2}{X_{\min}} \underbrace{\left( \frac{n_2 \wedge (k - n_2)}{n_2} \right)}_{=\Delta(k - n_2, n_2)} \tag{2.65}
\]
\[
= m\, \frac{\Delta(k - n_2, n_2)\, a_0^2}{X_{\min}}, \tag{2.66}
\]
where (2.65) uses the fact that, by construction, the entries of the matrices in $\mathcal{X}'_A$ are upper bounded by $a_0 + X_{\min}$. From (2.66) we can see that

0 where (2.65) uses the fact that by construction, entries of the matrices in XA are upper bounded by a0 + Xmin. From (2.66) we can see that

\[
\frac{1}{\mathrm{Card}(\mathcal{X}'_A) - 1} \sum_{X \in \mathcal{X}'_A} K(\mathbb{P}_X, \mathbb{P}_{X_0}) \le \alpha \log\left( \mathrm{Card}(\mathcal{X}'_A) - 1 \right) \tag{2.67}
\]

is satisfied for any $0 < \alpha < 1/8$ by choosing $\gamma_a < \sqrt{\alpha \log 2}/(2\sqrt{2})$. Equations (2.62) and (2.67) imply that we can apply Theorem 2.3 (with the Frobenius error used as the semi-distance function) to yield
\[
\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_A} \mathbb{P}_{X^*}\left( \frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{ \Delta(k - n_2, n_2) A_{\max}^2,\ \gamma_a^2 X_{\min} \left( \frac{k - n_2}{m} \right) \right\} \right) \ge \beta, \tag{2.68}
\]
for some absolute constant $\beta \in (0,1)$.

We use arguments similar to the previous case to establish lower bounds using the dictionary term $D$. Again, the key consideration in the construction of a packing set is to ensure that the entries of its matrices are bounded away from zero (and in fact $\ge X_{\min}$). To this end, let us first define the finite class $\mathcal{X}_D \subseteq \mathcal{X}'(r, k, A_{\max}, X_{\min})$ as

\[
\mathcal{X}_D \triangleq \big\{ X = (D_\delta + D) A : D \in \bar{\mathcal{D}}, \tag{2.69}
\]
\[
\qquad\qquad A = A_{\max} \left( I_r \ \cdots \ I_r \ \Psi_D \right) \in \mathcal{A} \big\}, \tag{2.70}
\]
where $I_r$ denotes the $r \times r$ identity matrix, $\Psi_D$ is an $r \times (n_2 - r\lfloor n_2/r \rfloor)$ matrix given by $\Psi_D = \binom{I_D}{0_D}$, where $I_D$ is the identity matrix of dimension $n_2 - r\lfloor n_2/r \rfloor$ and $0_D$ is the $(r - n_2 + r\lfloor n_2/r \rfloor) \times (n_2 - r\lfloor n_2/r \rfloor)$ zero matrix, and $\bar{\mathcal{D}} \subseteq \mathcal{D}$ and $D_\delta \in \mathcal{D}$ are defined as
\[
\bar{\mathcal{D}} \triangleq \left\{ D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, d_0\},\ \forall (i,j) \in [n_1] \times [r] \right\}, \quad \text{and} \quad (D_\delta)_{i,j} \triangleq \frac{X_{\min}}{A_{\max}} = \delta,\ \forall (i,j).
\]

The above definition of $A$ (with the block identity matrices and $\Psi_D$) ensures that the entries of the members of our packing set $\mathcal{X}_D$ are greater than or equal to $X_{\min}$. We can see that for any $X \in \mathcal{X}_D$ we have
\[
X = D_\delta A + DA = \underbrace{\delta A_{\max} \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T}_{=X_0} + DA.
\]

Thus we can appeal to the Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) for matrices of the form $\widetilde{X} = DA$, where $D \in \bar{\mathcal{D}}$ and $A$ is defined in (2.69), to guarantee the existence of a subset $\mathcal{X}'_D \subseteq \mathcal{X}_D$ with cardinality $\mathrm{Card}(\mathcal{X}'_D) \ge 2^{n_1 r/8} + 1$, containing the $n_1 \times n_2$ reference matrix $X_0 = X_{\min} \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_D$ we have
\[
\|X_1 - X_2\|_F^2 \ge \frac{n_1 r}{8} \left\lfloor \frac{n_2}{r} \right\rfloor A_{\max}^2 d_0^2
\ge \frac{n_1 n_2}{16} \min\left\{ (1 - \delta)^2 A_{\max}^2,\ \gamma_d^2 X_{\min} \left( \frac{n_1 r}{m} \right) \right\}. \tag{2.71}
\]

For any $X \in \mathcal{X}'_D$, the KL divergence of $\mathbb{P}_{X_0}$ from $\mathbb{P}_X$, where $X_0$ is the reference matrix (whose entries are all equal to $X_{\min}$), can be upper bounded using (2.64) by
\[
K(\mathbb{P}_X, \mathbb{P}_{X_0}) \le \frac{m}{n_1 n_2} \sum_{i,j} \frac{(X_{i,j} - X_{\min})^2}{X_{\min}} \le m\, \frac{d_0^2 A_{\max}^2}{X_{\min}}, \tag{2.72}
\]

where (2.72) uses the fact that, by construction, the entries of the matrices in $\mathcal{X}'_D$ are upper bounded by $d_0 A_{\max} + X_{\min}$. From (2.72) we can see that
\[
\frac{1}{\mathrm{Card}(\mathcal{X}'_D) - 1} \sum_{X \in \mathcal{X}'_D} K(\mathbb{P}_X, \mathbb{P}_{X_0}) \le \alpha' \log\left( \mathrm{Card}(\mathcal{X}'_D) - 1 \right) \tag{2.73}
\]
is satisfied for any $0 < \alpha' < 1/8$ by choosing $\gamma_d < \sqrt{\alpha' \log 2}/(2\sqrt{2})$. Equations (2.71) and (2.73) imply that we can apply Theorem 2.3 (with the Frobenius error used as the semi-distance function) to yield

\[
\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_D} \mathbb{P}_{X^*}\left( \frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{ (1 - \delta)^2 A_{\max}^2,\ \gamma_d^2 X_{\min} \left( \frac{n_1 r}{m} \right) \right\} \right) \ge \beta', \tag{2.74}
\]
for some absolute constant $\beta' \in (0,1)$. Using (2.68) and (2.74), and applying Markov's inequality, we directly obtain the result presented in Theorem 2.2, thus completing the proof. □

Chapter 3

Parameter Estimation Lower Bounds for Plenoptic Imaging Systems

This work focuses on assessing the information-theoretic limits of scene parameter estimation in plenoptic imaging systems, which are capable of providing substantially more information about a given scene than conventional cameras. We present a general framework to compute lower bounds on the parameter estimation error from noisy plenoptic observations. Our particular focus is on passive indirect imaging problems, where the observations do not contain line-of-sight information about the parameter(s) of interest. Using computer graphics rendering software to synthesize the (often complicated) dependence among the parameter(s) of interest and the observations, i.e., the forward model, we numerically evaluate the Hammersley-Chapman-Robbins (HCR) bound to establish lower bounds on the variance of any unbiased estimator of the unknown parameters. For scenarios where the rendering software produces an inexact version of the true forward model, we analyze the effects of such rendering inconsistencies on the computed lower bounds both theoretically and via simulations. We also compare our lower bound with the performance of the Maximum Likelihood Estimator on a canonical object localization problem, which shows that our lower bounds are indicative of the true underlying fundamental limits.¹

¹Portions of the material in this chapter are © 2019 IEEE. Reprinted, with permission, from the 53rd Asilomar Conference on Signals, Systems, and Computers, "Computer Graphics meets Estimation Theory: Parameter Estimation Lower Bounds for Plenoptic Imaging Systems," A. V. Sambasivan, R. G. Paxman, and J. D. Haupt.

3.1 Introduction

Conventional imaging systems are modeled after human vision and provide information about a scene via a two-dimensional image, where each pixel is described by a tristimulus value (e.g., RGB). However, the plenoptic function [5] captures the intensity of light at every location in space, over all possible angles and wavelengths, and at all time instances, providing much more information about a given scene than a conventional camera. The resulting plenoptic function takes the form

L = L(r, ϕ, ν, t),

where $r \in \mathbb{R}^3$ refers to the location at which the plenoptic observation is made, $\varphi \in [0, 2\pi) \times [0, \pi)$ denotes the angle pair (direction of arrival of light) in polar coordinates, $\nu$ is the wavelength, and $t$ is the time index. The domain over which the plenoptic function is measured typically changes with the imaging modality used for a particular application. Plenoptic imaging has been used for a wide range of applications including stereoscopy [44, 45], microscopy [46, 47], and Non-Line-Of-Sight (NLOS) imaging [7, 8, 10, 48]. While plenoptic imaging systems have a variety of applications in computer vision and image processing, we focus here on NLOS or indirect imaging. NLOS imaging corresponds to the scenario where the scene of interest is hidden from the observer (or imaging system) and the aim is to recover some parameters corresponding to the hidden scene from indirect reflections off various surfaces in the direct LOS of the observer (or imaging system). The parameters of interest in these problems could be as simple and high-level as the location of hidden objects (NLOS object tracking) or as complex as the entire hidden scene (NLOS image reconstruction). NLOS imaging, or more colloquially "imaging around corners," has numerous potential applications in defense, autonomous navigation, and imaging inaccessible tissues in endoscopy for better diagnostics, to name a few. We consider the problem of determining the fundamental limits of parameter estimation from noisy plenoptic observations, with an eye towards indirect (NLOS) imaging, from an estimation-theoretic standpoint. The difficulty in establishing fundamental lower bounds for plenoptic imaging problems lies in the complexity of the forward model, which codifies the functional dependence of the nominal (noiseless) observations on the scene parameters. In this chapter, we build on our previous work [6], where we first proposed combining analytical tools from classical estimation theory [49-52] with computer graphics rendering engines [1, 53], which help us simulate the often complicated forward model, for computing lower bounds on the estimability of scene parameters from noisy plenoptic observations. In doing so, we can provide a benchmark of optimality against which various estimation strategies could be compared, and also obtain useful insights on where information about the NLOS scene parameters is localized in a given set of observations.

3.1.1 Prior Art

NLOS imaging methods can be broadly classified into two categories: (1) active imaging methods, and (2) passive imaging methods. Active imaging typically involves illuminating the hidden scene using external stimuli, e.g., pulsed lasers or LIDARs, and using the information from the returning photons (e.g., time-of-flight information), measured by specialized detectors, e.g., single-photon avalanche diodes (SPADs), to reconstruct the shape and albedo of hidden objects. These methods were initially demonstrated in [54, 55] and have recently gained a lot of attention [56-58]. However, active imaging requires the use of expensive and bulky hardware like femtosecond lasers and specialized detectors, which makes it less amenable for on-field deployment in real-life applications. In this work, we are particularly interested in passive imaging methods that rely on the intensity measurements obtained at the imaging device to estimate the hidden scene parameters. The main challenge in passive imaging is that the indirect photons bouncing off diffuse (non-mirror-like) surfaces have very low signal levels, making the imaging problems ill-conditioned. In a pioneering work [59], it was observed that the presence of occluders like sharp edges and corners facilitates the NLOS recovery problem. This led to many follow-up works that exploit occluders (and motion) in the hidden scene to perform NLOS imaging [7-11, 48]. Other passive imaging methods use coherence-based techniques for NLOS reconstruction [60-63]. A comprehensive survey of various NLOS imaging methods can be found in [64]. More recently, [65, 66] proposed using the Cramér-Rao bound and Fisher information as a proxy to study the feasibility and conditioning of their NLOS problem. The aforementioned works have shown significant promise in "imaging around corners," but a thorough information-theoretic treatment of the NLOS parameter estimation problem for more general and realistic scenes is still lacking.

3.1.2 Our Contribution

We present a general framework to establish fundamental limits for NLOS parameter estimation problems using noisy passive plenoptic observations. In contrast to the above-mentioned efforts, which develop simplified/approximate (linear) forward models for controlled experimental environments to study NLOS imaging, our framework can handle complicated (non-linear) forward models that describe realistic scenes. In [6], we proposed using rendering engines to simulate the forward model and numerically evaluate the Hammersley-Chapman-Robbins (HCR) bound, which provides a lower bound on the variance of any unbiased estimator of the parameter(s) of interest. The HCR bound is applicable to a wider range of problems than the more commonly used Cramér-Rao bound (CRB), and is at least as tight as the CRB when both exist. Our framework also enables us to localize the information content in the observations, which can be used to further validate some of the conclusions about the benefits of occluding objects for NLOS imaging [7, 9-11, 65, 66]. One important assumption made by [6] is that the rendering engine simulates the forward model exactly, i.e., provides an error-free version of the true plenoptic observations for a given set of scene parameter values. However, in practice this assumption might not hold, as the rendering engine might yield a close albeit inexact estimate of the true plenoptic observations. In this work, we consider unbiased and progressive rendering techniques, which produce unbiased estimates of the true plenoptic observations with continually decreasing error as we let the renderer run indefinitely (see the documentation of Mitsuba [1]). We extend the efforts in [6] by analyzing the effects of using such inexact renderings in our lower bounding framework and provide a simple method to estimate intervals for the true HCR lower bounds. We instantiate the HCR lower bounds for Poisson noise and Additive White Gaussian Noise (AWGN) models, and demonstrate the utility of our framework using a few canonical estimation problems. Finally, we compare our lower bounds with the performance of Maximum Likelihood Estimators for the problem of object localization, which shows that our lower bounds are almost tight and are indicative of the true fundamental limits. The rest of the chapter is organized as follows. We begin by providing an overview of the rendering equation [12], which explains the forward model for our settings, in Section 3.2. Using this, the problem setup is formalized in Section 3.3. We explain our renderer-enabled lower bound computation framework and the information-theoretic tools used in Section 3.4. The effect of rendering errors on the lower bound computation is studied in Section 3.5. We develop Maximum Likelihood Estimators for the problem of NLOS object localization and compare their performance against the computed lower bounds in Section 3.6, and finish with some concluding remarks about future work in Section 3.7.

3.2 Forward Model

Let $\Theta \subseteq \mathbb{R}^d$ denote the parameter class containing all possible values of the parameter of our interest. For a given scene with some unknown (deterministic) parameter $\theta^* \in \Theta$, we denote the set of plenoptic samples (obtained by sampling the full plenoptic field) associated with the scene by $L_{\theta^*} \triangleq \{L_{\theta^*}(\omega)\}_{\omega \in \Omega}$, where we introduce the shorthand $\omega := [r, \varphi, \nu, t]$ to denote the arguments of the plenoptic function, and $\Omega$ is the observation space, which is a subset of the domain over which the plenoptic function is defined.

3.2.1 The Rendering Equation

Information-theoretic treatment of this estimation problem requires understanding the forward model $\theta^* \mapsto L_{\theta^*}$ that captures the functional dependence of the plenoptic observations on the scene parameters. This can be mathematically explained by the rendering equation [12], which models the behavior of light rays as they originate from a light source, bounce off the objects in the scene, and ultimately reach the detector (an imaging device). This forward mapping is commonly referred to as "light transport" in the computer graphics literature. For fixed $(\nu, t)$, light incident on a surface point $r$ of a scene along direction $-\varphi_i$, denoted by $L^{\mathrm{in}}_{\theta^*}(r, -\varphi_i)$, is typically scattered along all directions, as shown in Figure 3.1(a). The proportion of this light reflected along $\varphi_o$ is determined by a surface-dependent quantity known as the bi-directional reflectance distribution function (BRDF), $f(r, \cdot, \cdot)$.

The overall radiance leaving a surface point $r$ along direction $\varphi_o$, denoted by $L^{\mathrm{out}}_{\theta^*}(r, \varphi_o)$, can hence be obtained by summing the contributions of $L^{\mathrm{in}}_{\theta^*}(r, -\varphi_i)$


Figure 3.1: The rendering equation, explained graphically: (a) the proportion of incident light coming in from direction $\varphi_i$ that gets reflected along direction $\varphi_o$ is determined by the BRDF of the surface; (b) light incident on a surface point $r$ can be seen as light leaving another point in the scene, $g(r, \varphi_i)$, so that $L^{\mathrm{in}}_{\theta^*}(r, -\varphi_i) = L^{\mathrm{out}}_{\theta^*}(g(r, \varphi_i), -\varphi_i)$.

from all possible incident directions $-\varphi_i$, and also any light emitted by the surface (if it is an emitter). Mathematically,
\[
L^{\mathrm{out}}_{\theta^*}(r, \varphi_o) = L^{e}_{\theta^*}(r, \varphi_o) + \int_{\mathbb{S}^2_+(r)} L^{\mathrm{in}}_{\theta^*}(r, -\varphi_i)\, f(r, \varphi_o, \varphi_i)\, (\varphi_i \cdot n)\, d\varphi_i,
\]
where $L^{e}_{\theta^*}(r, \varphi_o)$ is the emitted radiance at $r$ along direction $\varphi_o$, $n$ is the surface normal at $r$, and $\mathbb{S}^2_+(r)$ is the unit hemisphere at $r$ containing all outgoing directions. From Figure 3.1(b), we can see that $L^{\mathrm{in}}_{\theta^*}(r, -\varphi_i) = L^{\mathrm{out}}_{\theta^*}(g(r, \varphi_i), -\varphi_i)$, where $g(r, \varphi_i)$ is a scene-geometry-dependent operator that finds the first surface point reached when traveling outward from $r$ along direction $\varphi_i$. Using this, we obtain the rendering equation
\[
L^{\mathrm{out}}_{\theta^*}(r, \varphi_o) = L^{e}_{\theta^*}(r, \varphi_o) + \int_{\mathbb{S}^2_+(r)} L^{\mathrm{out}}_{\theta^*}(g(r, \varphi_i), -\varphi_i)\, f(r, \varphi_o, \varphi_i)\, (\varphi_i \cdot n)\, d\varphi_i, \tag{3.1}
\]

where $L^{\mathrm{out}}_{\theta^*}$ is the plenoptic function of interest in the forward model alluded to above. Equation (3.1) is a Fredholm integral equation of the second kind, and is difficult to solve in closed form for all but the simplest of settings. To overcome this hurdle, ray-tracing/rendering engines were developed to approximately solve the integral equation in (3.1) using Monte Carlo methods. Ray-tracing engines are widely used in the computer graphics community to generate photo-realistic images by tracing the path of light rays as they interact with objects in a scene. Here, we rely upon this (mature) technology for our purposes; for a given parameter value $\theta^*$, we use ray-tracing packages to approximately solve (3.1) and synthesize the plenoptic observations $L_{\theta^*}$.
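To illustrate how such Monte Carlo solutions work in the simplest possible setting, the following toy sketch (our own; real renderers such as Mitsuba use far more sophisticated path-tracing strategies) estimates the reflection integral in (3.1) for a single Lambertian bounce under hemispherically constant incident radiance, where the analytic answer is $\rho \cdot L^{\mathrm{in}}$:

```python
# Toy Monte Carlo estimate of the reflection integral for a Lambertian
# surface with albedo rho under constant incident radiance L_in.
import numpy as np

rng = np.random.default_rng(3)

def mc_outgoing_radiance(L_e, L_in, rho, n_samples=100_000):
    # For directions sampled uniformly on the unit hemisphere (pdf = 1/(2*pi)),
    # cos(theta) is uniformly distributed on [0, 1].
    cos_theta = rng.random(n_samples)
    brdf = rho / np.pi                  # Lambertian BRDF is constant
    # MC estimate of integral of L_in * f * cos(theta) over the hemisphere.
    return L_e + np.mean(L_in * brdf * cos_theta) * 2 * np.pi

# Analytic value is L_e + rho * L_in (integral of cos over hemisphere = pi).
print(mc_outgoing_radiance(L_e=0.0, L_in=1.0, rho=0.7))   # approx 0.7
```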

3.2.2 Illustrative Example Scene: A Π-shaped Hallway

We illustrate the connection between scene parameters and plenoptic observations using an illustrative example scene. The scene of interest is a Π-shaped hallway with dimensions as marked in Figure 3.2(a). The inner walls of the hallway are painted with a diffuse eggshell paint, and their BRDF is modeled using the roughplastic plugin in Mitsuba [1]. The ceiling lights are 0.1 m × 1.5 m with a luminance of 3 lm·sr⁻¹·m⁻² and emit white light (uniformly over all wavelengths).² Suppose we place a red spherical object in this hallway; one might then be interested in estimating various parameters related to the object, e.g., its location, radius, etc., which when bundled together constitute the scene parameter vector $\theta^*$. Figure 3.2 shows how the plenoptic observations $L_{\theta^*}$ can be rendered using a ray-tracing engine for a given parameter value $\theta^*$, which in this case comprises the ball location and radius. Given the ball location and radius, and a priori knowledge of the scene layout, we can use the ray-tracing software to (approximately) solve the rendering equation (3.1). A rendered RGB image of the scene is shown in Figure 3.2(b). We shall use this hallway setup as a recurring example in the following sections, with different objects and camera configurations, to demonstrate the utility of our lower bounding framework.

3.3 Problem Statement

We model the noisy observations $Y_\omega$ as independent draws from a known class of probability distributions $p(Y_\omega; \theta^*)$, whose parameters depend on $\theta^*$ through the plenoptic function. Letting $Y_\Omega \triangleq \{Y_\omega\}_{\omega \in \Omega}$ be shorthand for the entire collection of noisy plenoptic observations, the likelihood of the observations can be written as
\[
p(Y_\Omega; \theta^*) = \prod_{\omega \in \Omega} p(Y_\omega; \theta^*) \triangleq \mathbb{P}_{\theta^*}. \tag{3.2}
\]

Let $\mathcal{Y}$ denote the observation class that defines the set of all possible observations for

²The rendered images using the illumination and camera setup used here are roughly similar to those obtained using a commercially-available 2000 lm ceiling light with an exposure time of 1/350 seconds.


Figure 3.2: Simulating the forward model using rendering: (a) layout of a Π-shaped hallway with dimensions marked. Corridors A, B, and C are 2.5 m, 3 m, and 2.5 m long, respectively, and 2 m tall. The hallway is illuminated with white ceiling lights with a luminance of 3 lm·sr⁻¹·m⁻². The camera C0 is located 0.5 m outside corridor A. The location and radius of a red spherical ball constitute the unknown scene parameter $\theta^*$. (b) If we define $\theta^*$ by setting the ball radius to 10 cm and the ball location to the intersection of corridors A and B, then $L_{\theta^*}$ is the nominal RGB image of the scene captured by a camera at C0, obtained using the rendering engine Mitsuba [1], as shown in (b).

our problem. We are then interested in determining the fundamental limits of imaging a given scene with unknown parameter $\theta^* \in \Theta$ from noisy plenoptic measurements

$Y_\Omega \in \mathcal{Y}$. Specifically, we seek to evaluate the performance of estimators $\widehat{\theta}(Y_\Omega)$, $\widehat{\theta} : \mathcal{Y} \to \Theta$, which have a finite measure with respect to $\mathbb{P}_{\theta^*}$, as a function of the true underlying parameter of interest $\theta^*$. The accuracy (or lack thereof) of an estimator $\widehat{\theta}$ in estimating the true scene parameter $\theta^*$ can be measured in terms of the Mean Squared Error (MSE),
\[
\mathrm{MSE}_{\widehat{\theta}}(\theta^*) = \mathbb{E}\left[ \big\| \widehat{\theta} - \theta^* \big\|_2^2 \right], \tag{3.3}
\]
where the expectation is taken with respect to the randomness in the noisy observations

$Y_\Omega$. In order to derive meaningful local lower bounds (bounds as a function of $\theta^*$) on the MSE of estimators, we further need to make assumptions about the class of estimators. Indeed, for any $\theta^* \in \Theta$, we have a trivial (yet valid) estimator $\widehat{\theta}(Y_\Omega) = \theta^*$, $\forall\, Y_\Omega \in \mathcal{Y}$, which achieves an MSE of 0. Such an estimator performs extremely well when $\theta = \theta^*$, but generalizes poorly in other cases. In order to avoid such trivial instances, we restrict our attention to the class of unbiased estimators, for which the MSE reduces to the variance (the trace of the covariance matrix, to be precise). Hence we seek lower bounds on the covariance matrix $\mathrm{Cov}_{\widehat{\theta}}(\theta^*)$ in a semi-definite ordering sense, or on $\mathrm{Var}_{\widehat{\theta}}(\theta^*) \triangleq \mathrm{tr}\big( \mathrm{Cov}_{\widehat{\theta}}(\theta^*) \big)$.

3.4 Renderer-Enabled Computation of Lower Bounds

Our approach employs the Hammersley-Chapman-Robbins lower bound (HCR-LB) [51, 52], which provides lower bounds on the variance of unbiased estimators. HCR-LB and its variants have been used in constrained parameter estimation problems for sensor array signal processing, like bearing estimation [67–69], frequency estimation [70, 71] and many more. In this section, we will state the HCR-LB and discuss some of its salient aspects which make it ideally suited to this problem setting.

Lemma 3.1 (HCR Lower Bound). Let $\theta^* \in \Theta \subseteq \mathbb{R}^d$ be any deterministic but unknown parameter, and let $Y_\Omega$ denote a set of noisy observations of the unknown parameter $\theta^*$. Then the variance of any unbiased estimator of $\theta^*_i$ obeys
\[
\mathrm{Var}(\theta^*_i) \ge \mathrm{HCR}(\theta^*_i),
\]
where the HCR lower bound is given by
\[
\mathrm{HCR}(\theta^*_i) = \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\mathbb{E}_{\theta^*}\left[ \left( \frac{p(Y_\Omega; \theta^* + \Delta)}{p(Y_\Omega; \theta^*)} \right)^2 \right] - 1}, \quad \forall\, i = 1, \ldots, d. \tag{3.4}
\]

Here $p(Y_\Omega; \theta^*)$ and $p(Y_\Omega; \theta^* + \Delta)$ denote the pdfs (or pmfs) of the observations parametrized by $\theta^*$ and $\theta^* + \Delta$, respectively.

The denominator on the right-hand side of (3.4) is the so-called χ²-divergence of $p(Y_\Omega; \theta^* + \Delta)$ from $p(Y_\Omega; \theta^*)$, which measures the change in the probability distribution when the true parameter $\theta^*$ is perturbed by $\Delta$. When a small change in the unknown parameter results in distinctly different sets of observations, yielding a large χ²-divergence between the likelihoods, the HCR-LB is small, suggesting that the estimation problem could be easy, and vice versa. It is worth commenting on the relationship of the HCR-LB framework to the well-known Cramér-Rao lower bound (CR-LB) [49, 50, 72]. If we let $\Delta \to 0$, the expression inside the supremum of (3.4) converges to the CR-LB (if the corresponding limit exists). Thus, when both bounds exist, the HCR-LB is at least as tight as the CR-LB. Unlike the CR-LB, the HCR-LB makes no "regularity" assumptions on the noise likelihood function and hence is applicable to a wider range of problems. In particular, the CR-LB requires computing derivatives of the log-likelihood function (with respect to $\theta$) and is not well defined in scenarios where the log-likelihood function is not differentiable (e.g., when the parameter space is a countable set). Even for simple scenes, due to the presence of occluding barriers and edges, sharp "transition regions" may occur in the true plenoptic intensities $L_{\theta^*}$ as the underlying scene parameter $\theta^*$ varies smoothly, causing the log-likelihood to be non-differentiable. The HCR-LB, on the other hand, does not require explicitly computing derivatives of the log-likelihood. For any given noise model, the bound can be evaluated by rendering or synthesizing $L_{\theta^* + \Delta}$ for a suitably large collection of possible values of $\theta^*$ and $\Delta$ using a ray-tracing engine and then evaluating the functional form of the χ²-divergence. In addition to lower bounds on the variance of the individual parameters $\theta^*_i$, we can naturally extend the result in Lemma 3.1 to lower bound the MSE of estimators for a given value of $\theta^*$ as follows.

Corollary 3.1 (HCR lower bound on the MSE). Let $\theta^* \in \Theta \subseteq \mathbb{R}^d$ be any deterministic but unknown parameter, and let $Y_\Omega$ denote a set of noisy observations of the unknown parameter $\theta^*$. Then the MSE of any unbiased estimator of $\theta^*$ obeys
\[
\mathrm{MSE}(\theta^*) \ge \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\|\Delta\|^2}{\mathbb{E}_{\theta^*}\left[ \left( \frac{p(Y_\Omega; \theta^* + \Delta)}{p(Y_\Omega; \theta^*)} \right)^2 \right] - 1}, \tag{3.5}
\]
where $p(Y_\Omega; \theta^*)$ and $p(Y_\Omega; \theta^* + \Delta)$ denote the pdfs (or pmfs) of the observations parametrized by $\theta^*$ and $\theta^* + \Delta$, respectively.

Proof. For unbiased estimators, $\mathrm{MSE}(\theta^*) \triangleq \sum_{i=1}^{d} \mathrm{Var}(\theta^*_i)$. We can further lower bound each of the variance terms in this summation using Lemma 3.1 to obtain
\[
\mathrm{MSE}(\theta^*) \ge \sum_{i=1}^{d} \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\mathbb{E}_{\theta^*}\left[ \left( \frac{p(Y_\Omega; \theta^* + \Delta)}{p(Y_\Omega; \theta^*)} \right)^2 \right] - 1}
\ge \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\sum_{i=1}^{d} \Delta_i^2}{\mathbb{E}_{\theta^*}\left[ \left( \frac{p(Y_\Omega; \theta^* + \Delta)}{p(Y_\Omega; \theta^*)} \right)^2 \right] - 1},
\]
where the last inequality follows from the fact that $\sum_i \sup_x f_i(x) \ge \sup_x \sum_i f_i(x)$. □

In the following sub-sections, we instantiate Lemma 3.1 and provide functional expressions for the HCR-LB under some common noise models, viz. the Poisson noise and additive white Gaussian noise models; the sketch below outlines how such expressions can be evaluated numerically once the intensities are rendered.
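The following minimal sketch is our own illustration (the function `render` stands in for a call into a ray-tracing engine, and a scalar parameter is assumed); it approximates the supremum in (3.4) by a grid search over candidate perturbations Δ:

```python
# Grid-search approximation of the HCR lower bound (3.4) for a scalar
# parameter; `render` and `chi2_divergence` are supplied by the user.
def hcr_lower_bound(theta_star, deltas, render, chi2_divergence):
    """deltas: iterable of nonzero perturbations with theta* + delta in Theta."""
    L_ref = render(theta_star)
    best = 0.0
    for d in deltas:
        chi2 = chi2_divergence(render(theta_star + d), L_ref)
        if chi2 > 0:
            best = max(best, d**2 / chi2)   # Delta_i^2 / chi-squared divergence
    return best
```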

3.4.1 HCR Lower Bound for Poisson Noise

Consider noisy plenoptic observations $Y_\omega$ drawn independently from a Poisson distribution with rates given by the true plenoptic intensities $L_{\theta^*}(\omega)$, i.e., $Y_\omega \overset{\text{ind}}{\sim} \mathrm{Poisson}(L_{\theta^*}(\omega))$, $\forall \omega \in \Omega$. The Poisson distribution is commonly used to characterize noise which is discrete or quantized in nature, e.g., when the imaging device counts the number of photons incident on the detector over a certain window of time. If we specialize the HCR-LB given in (3.4) to this setting, we get
\[
\mathrm{HCR}_{\mathrm{Poisson}}(\theta^*_i) := \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\left( \sum_{\omega \in \Omega} \frac{\left( L_{\theta^*+\Delta}(\omega) - L_{\theta^*}(\omega) \right)^2}{L_{\theta^*}(\omega)} \right) - 1}, \quad \forall\, i = 1, 2, \ldots, d, \tag{3.6}
\]
where all the true plenoptic intensities in (3.6) can be obtained from the rendering engine.

3.4.2 HCR Lower Bound for Additive White Gaussian Noise

Consider noisy plenoptic observations with additive white Gaussian noise (AWGN) of the form $Y_\omega = L_{\theta^*}(\omega) + \epsilon_\omega$, where the noise $\epsilon_\omega \overset{\text{i.i.d}}{\sim} \mathcal{N}(0, \sigma^2)$, $\forall \omega \in \Omega$. We further assume that the noise variance $\sigma^2$ is known a priori. If we specialize the HCR-LB given in (3.4) to this setting, we get
\[
\mathrm{HCR}_{\mathrm{AWGN}}(\theta^*_i) := \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\left( \sum_{\omega \in \Omega} \frac{\left( L_{\theta^*+\Delta}(\omega) - L_{\theta^*}(\omega) \right)^2}{\sigma^2} \right) - 1}, \quad \forall\, i = 1, 2, \ldots, d, \tag{3.7}
\]
where all the true plenoptic intensities in (3.7) can be obtained from the rendering engine.
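As a hedged illustration (ours, not code from the thesis), the χ²-divergences appearing in the denominators of (3.6) and (3.7) are straightforward to evaluate from rendered intensity arrays, and they plug directly into the grid-search sketch given in Section 3.4:

```python
# Chi-squared divergences for the denominators of (3.6) and (3.7); L_pert
# and L_ref are rendered intensity arrays for theta* + Delta and theta*.
import numpy as np

def chi2_poisson(L_pert, L_ref):
    # exp( sum_w (L_{theta*+D}(w) - L_{theta*}(w))^2 / L_{theta*}(w) ) - 1
    return np.expm1(np.sum((L_pert - L_ref) ** 2 / L_ref))

def chi2_awgn(L_pert, L_ref, sigma):
    # exp( sum_w (L_{theta*+D}(w) - L_{theta*}(w))^2 / sigma^2 ) - 1
    return np.expm1(np.sum((L_pert - L_ref) ** 2) / sigma**2)

# Example usage with the earlier sketch:
#   chi2 = lambda a, b: chi2_awgn(a, b, sigma=0.2)
#   bound = hcr_lower_bound(theta_star, deltas, render, chi2)
```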

3.4.3 Localizing Information Content in Plenoptic Observations

In addition to computing lower bounds, we can go a step further and localize the information content in the plenoptic observations. Since we assume that our observations are statistically independent, the Fisher information of a particular pixel $\omega$, for any given parameter value $\theta^*$, can be expressed as
\[
I_\omega(\theta^*_i) = \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta_i} \log p(Y_\omega; \theta) \Big|_{\theta = \theta^*} \right)^2 \right], \quad \forall\, i = 1, 2, \ldots, d, \tag{3.8}
\]
where we implicitly assume that the partial derivative of the log-likelihood function is well defined at $\theta = \theta^*$, and the expectation is with respect to the pdf (or pmf) of the observations $p(Y_\Omega; \theta^*)$. If we instantiate (3.8) for the AWGN and Poisson noise models considered above, we get
\[
I_\omega(\theta^*_i) = \begin{cases}
\dfrac{1}{L_{\theta^*}(\omega)} \left( \dfrac{\partial L_{\theta^*}(\omega)}{\partial \theta_i} \right)^2 & \text{for Poisson noise}, \\[2ex]
\dfrac{1}{\sigma^2} \left( \dfrac{\partial L_{\theta^*}(\omega)}{\partial \theta_i} \right)^2 & \text{for AWGN with variance } \sigma^2,
\end{cases} \quad \forall\, i = 1, 2, \ldots, d. \tag{3.9}
\]
In the absence of access to the gradients of the plenoptic observations, we can still easily compute the pixel-wise Fisher information for different noise models by approximating (3.9) using finite differences (FD), which we refer to as (pixel-wise) FD-FI; a minimal sketch follows. $I_\omega(\theta^*_i)$ quantifies how much information a given pixel $\omega$ conveys about the parameter of interest $\theta^*_i$ and hence is a useful tool for localizing the overall information content in the observations.
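The sketch below is our own illustration of the FD-FI computation (`render` again stands in for the rendering engine, and a scalar parameter with step size $h$ is an assumption):

```python
# Pixel-wise FD-Fisher Information per (3.9), using a central finite
# difference in place of the unavailable gradient of L_theta.
import numpy as np

def fd_fisher_information(render, theta_star, h, noise="poisson", sigma=None):
    """render(theta) returns an array of plenoptic intensities L_theta."""
    grad = (render(theta_star + h) - render(theta_star - h)) / (2 * h)
    if noise == "poisson":
        return grad**2 / render(theta_star)   # (1/L) (dL/dtheta)^2 per pixel
    return grad**2 / sigma**2                 # (1/sigma^2) (dL/dtheta)^2
```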

3.4.4 Experimental Evaluation: Lower bounds for Π-shaped Hallway Scene

In order to demonstrate the utility of our lower bounding framework, we consider the hallway scene setup described in Section 3.2.2. We assume that noisy plenoptic measurements are made by an imaging device located at C0, which collects multispectral images of size 512 × 384 with 30 uniformly spaced spectral channels in the 368-830 nm range (as opposed to the RGB observations illustrated in Figure 3.2(b)). With this setup, we study the fundamental limits of two different scalar estimation problems:

• Estimating location of the ball (with a fixed radius of 10cm) along the Π-shaped hallway.

• Estimating radius (size) of the ball located at the center of Corridor B.

For these estimation problems, we assume that the scene geometry and the BRDFs of the surfaces are known, i.e., the only unknown in the scene is either the location of the ball or its radius. For the problem of estimating the ball location, we discretize the 1-D Π-shaped manifold along which the ball could be located into 790 equispaced points (each 1 cm apart). Ball locations are numbered clockwise from 1 (bottom-left of the Π-shaped manifold) to 790 (bottom-right of the manifold). For the size estimation problem, we consider 101 possible values for the ball radius in the range 5 cm to 15 cm (with 0.1 cm increments). We use the Mitsuba renderer [1] to synthesize physically accurate plenoptic samples $L_{\theta^*}$ for different values of the unknown parameter $\theta^*$ (and $\Delta$), and numerically evaluate the HCR bound given by Equations (3.6) and (3.7). In addition to numerically evaluating the HCR-LB, we use finite differences on the rendered data, as discussed in Section 3.4.3, to (approximately) compute the pixel-wise Fisher information (3.9), which is referred to as Finite Difference-Fisher Information (FD-FI). All the scenes were rendered with 16384 samples per pixel using the Bi-Directional Path Tracer (BDPT) rendering algorithm on an HP Linux distributed cluster with 20 Intel Haswell E5-2680v3 processor cores and 10 GB of RAM per scene, and it took approximately 2.5 hours to render each scene.



Figure 3.3: HCR-LB for ball location estimation under Poisson noise, (a)-(d), and under AWGN with different values of σ, (e)-(h). The HCR-LB under different regimes is shown in: (a), (e) LOS region, where the HCR-LB is very small and drops significantly when the ball starts moving in corridor B; (b), (f) transition from LOS to NLOS, with a sharp increase in the HCR-LB when the ball moves away from LOS; and (c), (g) NLOS region, where the HCR-LB is much higher, indicating the potential hardness of the estimation problem; (d), (h) ball radius estimation, where the HCR-LB decreases with increasing size (radius) of the ball. With the help of these curves, one can quantify how difficult the problem of estimating NLOS parameters can be. For the AWGN model, we can see that the HCR-LB increases with σ, as expected.

Results and Discussion

The HCR-LB under Poisson noise for estimating the ball location is shown in Figures 3.3(a) to 3.3(c). Similarly, the HCR-LB under AWGN with different noise variances σ² is shown in Figures 3.3(e) to 3.3(g). It can be observed that the HCR-LB for AWGN increases with increasing σ, which is intuitive, as the estimation problem is expected to get harder as the noise level increases. This trend is observed uniformly for both the location estimation and the radius estimation problems. For the location estimation problem, the HCR-LB is (unsurprisingly) small (≪ 1)


Figure 3.4: Pixelwise FD-Fisher Information (FD-FI), obtained by aggregating contributions from all 30 spectral channels. Darker regions are more informative. Pixelwise FD-FI shows where and how information about the parameter of interest is localized in our observations. These images highlight subtle details about the scene parameters which are not visible in the nominal RGB images (bottom row) obtained from the rendering software. (a) Pixelwise FD-FI for Poisson noise (top row) and AWGN with σ = 0.2 (second row) for 4 different ball locations: (from left to right) completely in LOS, just inside LOS, just outside LOS, center of corridor B. Notice that different regions of the scene are more informative than others for different ball locations. (b) Pixelwise FD-FI for Poisson noise (top row) and AWGN with σ = 0.2 (second row), showing where information about the ball radius is localized. Notice that the regions of information differ from the FD-FI images for ball location in Figure 3.4(a).

when the ball is in corridor A (in LOS). The lower bound suddenly drops when the ball translates horizontally (with respect to the camera) in corridor B (see Figures 3.3(a) and 3.3(e)). This is expected, as detecting an object is easier when it moves horizontally than when it moves away from the camera. The HCR-LB increases by a few orders of magnitude when the ball leaves the LOS region (location 320 to location 321; see Figures 3.3(b) and 3.3(f)), and continues to increase sharply for ball locations further down corridor B (see Figures 3.3(c) and 3.3(g)). In corridor C, the actual value of the bound becomes very large (∼ 10⁴ cm²), which indicates that accurately locating the ball around two corners is (again, unsurprisingly) a much more difficult estimation problem. Some recent works [7, 8] have proposed that sharp edges and occlusions in a scene can help recover NLOS imagery. Such sharp occlusions act like a pin-hole camera, projecting information about NLOS regions of a scene onto visible regions like walls and floors. The authors of [7] refer to this imaging phenomenon as the "corner-camera." Our pixel-wise FD-FI images provide some evidence for this effect, showing that regions around the corners contain a higher amount of information about the hidden object location than other regions. This effect can be clearly seen in the video of the pixel-wise FD-FI and HCR-LB as the ball moves from location 1 to 790, provided in the supplementary material. By looking at the FD-FI images, we also believe that some of the dips in the HCR-LB in corridor C (in Figures 3.3(c) and 3.3(g)) result from "corner-camera" type effects. The pixelwise FD-FI images in Figure 3.4(a) provide valuable insights into where and how information about a parameter of interest is distributed among the observed samples. As one might expect, when the ball is in LOS, its edges provide most of the information about its location. When the ball moves out of LOS, information about its location is conveyed by the shadows cast by the ceiling light in corridor B on the floor, and by other indirect photons from the back wall and the floor. These subtle details are, however, not visible in the raw plenoptic data (or the nominal RGB images in the bottom row of Figure 3.4). The ability to localize information content in indirect photons is extremely useful for those interested in developing NLOS imaging algorithms.
The FD-FI images can be used as a guide to develop clever imaging modalities that collect observations with “high information content” about the NLOS scene parameters. The HCR-LB for estimating the radius of a ball placed at the center of corridor B (location 400) is shown in Figures 3.3(d) and 3.3(h). We observe that the lower bound decreases as the radius increases, suggesting that estimating the radius of a larger ball could be relatively easier. Pixelwise FD-FI images for the radius estimation problem in Figure 3.4(b) show that the regions of information lie mainly on the back wall, arising from multi-bounce photons, and that the number of informative pixels increases for larger radii. We can see that the regions of high information differ significantly between Figures 3.4(a) and 3.4(b). We conclude that these informative regions/observations depend both on what the parameter of interest is (see the difference between Figures 3.4(a) and 3.4(b)) and on the particular value of the parameter (see the differences amongst the images in each of Figures 3.4(a) and 3.4(b)). Another interesting point to note is that, while the informative regions change significantly with the type of parameter under consideration and with different values of a given parameter, they are nearly identical for the two noise models considered here. However, the magnitudes of the FD-FI pixels differ considerably between the two noise models (see the first two rows of Figures 3.4(a) and 3.4(b)).
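To make this computation concrete, here is a minimal sketch of how such a pixelwise FD-FI map can be produced by finite-differencing two renderings; the function name and array layout are illustrative assumptions, not the exact implementation used to generate Figure 3.4:

```python
import numpy as np

def pixelwise_fd_fisher_information(L0, L1, delta, noise="poisson", sigma=0.2):
    """Pixelwise FD-Fisher Information for a scalar scene parameter.

    L0, L1 : rendered intensities at theta and theta + delta
             (arrays of shape [channels, H, W]); delta is the FD step.
    Per-pixel FI is (dL/dtheta)^2 / L under Poisson noise and
    (dL/dtheta)^2 / sigma^2 under AWGN; contributions are summed over
    the spectral channels, as in Figure 3.4.
    """
    dL = (L1 - L0) / delta                      # FD approximation of dL/dtheta
    if noise == "poisson":
        fi = dL ** 2 / np.maximum(L0, 1e-12)    # guard against zero intensity
    else:
        fi = dL ** 2 / sigma ** 2
    return fi.sum(axis=0)                       # aggregate over channels
```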

3.5 Computing Lower Bounds with Inexact Rendering

The framework above implicitly assumes that the rendering engine yields the exact plenoptic intensities for a given set of scene parameter values. While this assumption might be valid for scenes with simple geometry, lighting, and/or surface scattering models (or BRDFs), for complex scenes the rendered images can be erroneous, with per-pixel error that depends inversely on the number of Monte-Carlo samples used by the rendering algorithm to solve the rendering equation (3.1). In this section, we analyze the inconsistencies that arise when using plenoptic data with rendering errors to compute lower bounds of the form of Equations (3.6) and (3.7). We also provide a simple algorithm to estimate intervals for the true HCR-LB when using inexact renderings. The errors induced by light-transport (or rendering) algorithms have been previously studied by Arvo et al. [73], where the authors identify different sources of error in the rendering algorithm and provide theoretical bounds. More recently, Celarek et al. [74] propose a methodology to numerically estimate rendering errors, both in terms of MSE and per-pixel, using multiple short renderings. While that work aims to use the error estimates to compare various rendering methodologies, our aim here is to assess and quantify how rendering errors percolate into our HCR-LB framework.

It is further useful to highlight the distinction between the rendering errors considered in this section and the different noise models for the plenoptic observations in Sections 3.4.1 and 3.4.2. While the Poisson and AWGN noise models describe the noise/uncertainty in the observations of an imaging system for scene parameter estimation, this section analyzes the effect of using inaccurately rendered data to compute the HCR-LB presented in Sections 3.4.1 and 3.4.2. For some value of the scene parameter θ, let us denote the true plenoptic intensities by $L_\theta$, and the output of the rendering algorithm computed using N samples per-pixel by $\widetilde{L}^{(N)}_\theta = L_\theta + E^{(N)}_\theta$, where $E^{(N)}_\theta$ is the (additive) rendering error. In our analysis, we assume that for all θ ∈ Θ and ω ∈ Ω, the rendering algorithm satisfies the following assumptions:

A.1 The rendering algorithm is unbiased, i.e., $\mathbb{E}[\widetilde{L}^{(N)}_\theta(\omega)] = L_\theta(\omega)$, or equivalently, $\mathbb{E}[E^{(N)}_\theta(\omega)] = 0$.

A.2 Most weighted sums of the pixel-wise error variance decay at the rate $\Theta(N^{-1})$³. In other words, for weights $W_\omega \sim \mathrm{Uniform}[0, L_{\max}]$, where $L_{\max}$ is an absolute constant,
$$\sum_{\omega\in\Omega} W_\omega \cdot \mathrm{Var}\big(E^{(N)}_\theta(\omega)\big) = C_\theta \cdot N^{-1}$$
with high probability⁴, for some scene-dependent constant $C_\theta > 0$.

A.3 The higher-order moments of the relative per-pixel error are upper bounded, i.e., $\exists\, N_0, \delta > 0$ such that for all $N > N_0$ and $k > 2$,
$$\mathbb{E}\left[\left(\frac{E^{(N)}_\theta(\omega)}{L_\theta(\omega)}\right)^{k}\right] = O\big(N^{-(1+\delta)}\big).$$

A.4 The pixel-wise errors for different scene parameter values are statistically uncorrelated.

Remark 3.1. Assumption A.1 is a mild assumption and is satisfied by many modern rendering algorithms. In particular, we use Mitsuba’s [1] Bi-Directional Path Tracer (BDPT) and Redner’s [53] path tracer, both of which are unbiased, for rendering all our scenes.

Remark 3.2. A sufficient condition for Assumption A.2 is that the error variance of each pixel satisfies $\mathrm{Var}(E^{(N)}_\theta(\omega)) = \Theta(N^{-1})$ for all ω ∈ Ω. While such behavior is expected from standard Monte-Carlo based algorithms, modern rendering algorithms use other techniques, like importance sampling, to improve performance. Hence the $N^{-1}$ rate might not hold pixel-wise, but it is typically valid when averaged over all pixels.

³Here Θ(·) refers to asymptotic notation and is not to be confused with the class of scene parameter values Θ defined in Section 3.3.
⁴The probability is with respect to the randomness in the weights $W_\omega$ and not the rendering errors.

We present simulations in Appendix 3.9.1 to empirically validate Assumption A.2. In particular, we show empirically that the weighted sum of pixel-wise error variances decays as 1/N for the example scene used in this paper. For rendering algorithms with different rates of convergence, we could simply assume that $\sum_{\omega\in\Omega} W_\omega \cdot \mathrm{Var}(E^{(N)}_\theta(\omega)) = C_\theta \cdot N^{-p}$, where the degree of decay p > 0 depends on the nature of the rendering algorithm used. The decay rate p may be known a priori or could be estimated empirically from simulations similar to those presented in Appendix 3.9.1. Our analysis extends naturally to any p > 0.

Remark 3.3. Assumption A.3 is automatically satisfied if the rendering errors are bounded in magnitude, which they typically are for most rendering algorithms used in practice, and when Assumption A.2 holds. This criterion suggests that for “sufficiently large” values of N, the higher-order error terms are “relatively small.”

Remark 3.4. Assumption A.4 holds when the renderer uses different (random number) seeds for generating its samples.

We first look at the functional form of the HCR-LB presented in Sections 3.4.1 and 3.4.2. We can see that under both the Poisson noise and AWGN models, the HCR-LB for $\theta_i^*$ takes the form

$$\mathrm{HCR}(\theta_i^*) = \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\{\lambda(L_{\theta^*}, L_{\theta^*+\Delta})\} - 1}, \qquad (3.10)$$
where $\lambda(L_{\theta^*}, L_{\theta^*+\Delta})$ captures the dependence on the rendered plenoptic intensities. It is easy to see that

$$\lambda(L_{\theta^*}, L_{\theta^*+\Delta}) = \begin{cases} \displaystyle\sum_{\omega\in\Omega} \frac{\big(L_{\theta^*}(\omega) - L_{\theta^*+\Delta}(\omega)\big)^2}{L_{\theta^*}(\omega)} =: \lambda_P, & \text{for the Poisson noise model} \\[2ex] \displaystyle\frac{\|L_{\theta^*} - L_{\theta^*+\Delta}\|^2}{\sigma^2} =: \lambda_G, & \text{for AWGN with noise variance } \sigma^2. \end{cases} \qquad (3.11)$$

We suppress the dependence of λ on θ*, Δ, and the forward model for the sake of brevity; such dependences will be made explicit whenever additional clarity is required. When we use inaccurately rendered plenoptic data $\widetilde{L}^{(N)}_{\theta^*}$ and $\widetilde{L}^{(N)}_{\theta^*+\Delta}$, where N samples per-pixel are used for rendering, we end up with $\widetilde{\lambda}^{(N)}$ and obtain an erroneous estimate of the HCR functional (the function being maximized in the HCR-LB),

$$f\big(\widetilde{\lambda}^{(N)}; \theta^*, \Delta\big) = \frac{\Delta_i^2}{\exp\big\{\widetilde{\lambda}^{(N)}\big\} - 1}. \qquad (3.12)$$
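For concreteness, a minimal sketch of evaluating (3.10)-(3.12) from rendered images is given below; `render` is a hypothetical stand-in for the rendering engine, and the exhaustive search over a finite grid of perturbations mirrors the procedure used in our experiments:

```python
import numpy as np

def lam(L0, L1, noise="poisson", sigma=0.1):
    """lambda(L_theta*, L_theta*+Delta) from (3.11), for two rendered images."""
    if noise == "poisson":
        return float(np.sum((L0 - L1) ** 2 / np.maximum(L0, 1e-12)))
    return float(np.sum((L0 - L1) ** 2) / sigma ** 2)

def hcr_lb(render, theta_star, deltas, i=0, **lam_kwargs):
    """HCR-LB (3.10) for the i-th parameter coordinate via exhaustive search
    over a grid of admissible perturbations (each d satisfies theta* + d in Theta)."""
    L0 = render(theta_star)
    best = 0.0
    for d in deltas:
        # HCR functional (3.12) evaluated at this perturbation
        val = d[i] ** 2 / np.expm1(lam(L0, render(theta_star + d), **lam_kwargs))
        best = max(best, val)
    return best
```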

We first characterize the effect of rendering error on the computation of λ via the following two results, and then use them to analyze the effect on the HCR functional (3.12) and on the overall HCR-LB. The proofs of Theorems 3.1 and 3.2 appear in Appendices 3.9.2 and 3.9.3, respectively.

Theorem 3.1 (Effect of rendering error on $\lambda_P$). Let us denote the value of $\lambda_P$ computed using the (inexactly) rendered data with N samples per-pixel by $\widetilde{\lambda}_P^{(N)}$. Then we have

$$\widetilde{\lambda}_P^{(N)} = \lambda_P + \frac{C_P}{N} + \eta_P, \qquad (3.13)$$

where $C_P \geq 0$ is a constant that depends only on the scene parameters, $|\mathbb{E}[\eta_P]| = O(N^{-(1+\delta)})$, $\mathrm{Var}(\eta_P) = O(N^{-1})$, and δ > 0 is the constant appearing in Assumption A.3.

Theorem 3.2 (Effect of rendering error on $\lambda_G$). Let us denote the value of $\lambda_G$ computed using the (inexactly) rendered data with N samples per-pixel by $\widetilde{\lambda}_G^{(N)}$. Then we have

$$\widetilde{\lambda}_G^{(N)} = \lambda_G + \frac{C_G}{N} + \eta_G, \qquad (3.14)$$
where $C_G \geq 0$ is a constant that implicitly depends only on the scene parameters and the AWGN variance σ², $\mathbb{E}[\eta_G] = 0$, and $\mathrm{Var}(\eta_G) = O(N^{-1})$.

Theorems 3.1 and 3.2 imply that λ̃ overestimates the true λ in expectation, where the estimation bias shrinks at the rate $\Theta(N^{-1})$. While the $\frac{C_P}{N}$ and $\frac{C_G}{N}$ terms denote the (positive) bias, the $\eta_P$ and $\eta_G$ terms codify the variance of the computed λ̃’s. Although both the bias and the variance terms exhibit the same $O(N^{-1})$ rates, there is a bias-variance tradeoff in the computed λ̃ that depends on the scene parameters θ* and θ* + Δ. In order to better understand this bias-variance tradeoff, let us take a closer look at the expression for λ̃. We use the AWGN case here for the sake of exposition, but the tradeoff holds similarly for the Poisson noise model as well. Using (3.11), we can write

$$\widetilde{\lambda}_G^{(N)} = \lambda_G + \underbrace{\frac{1}{\sigma^2}\big\|E_{\theta^*} - E_{\theta^*+\Delta}\big\|^2}_{\eta_1} + \underbrace{\frac{2}{\sigma^2}\big(L_{\theta^*} - L_{\theta^*+\Delta}\big)^T \cdot \big(E_{\theta^*} - E_{\theta^*+\Delta}\big)}_{\eta_2}. \qquad (3.15)$$

Let us see how $\eta_1$ and $\eta_2$ behave under two different scenarios:

Case (1): $\|L_{\theta^*} - L_{\theta^*+\Delta}\| \gg \|E_{\theta^*} - E_{\theta^*+\Delta}\|$: In this scenario, $|\eta_1| \ll |\eta_2|$, and we can write $\widetilde{\lambda}_G^{(N)} \approx \lambda_G + \eta_2$. Since $\mathbb{E}[\eta_2] = 0$ (from Assumption A.1), we can see that the computed value $\widetilde{\lambda}_G^{(N)}$ is (approximately) unbiased, i.e., $\mathbb{E}[\widetilde{\lambda}_G^{(N)}] \approx \lambda_G$. Furthermore, we have $\mathrm{Var}(\widetilde{\lambda}_G^{(N)}) \approx \mathrm{Var}(\eta_2) = O(N^{-1})$ from Assumption A.2. In other words, this is a regime where the (zero-mean) variance term dominates the overall error. This typically happens when:

• $\|\Delta\|$ is large, or
• the parameters of interest are in LOS, which means that even a small perturbation of the true parameter θ* would result in a large difference in the plenoptic observations, or
• the value of N ≫ 1, which would result in the rendering errors being very small.

Case (2): $\|L_{\theta^*} - L_{\theta^*+\Delta}\| \ll \|E_{\theta^*} - E_{\theta^*+\Delta}\|$: In this scenario, $|\eta_1| \gg |\eta_2|$, and we can write $\widetilde{\lambda}_G^{(N)} \approx \lambda_G + \eta_1 > \lambda_G$. This in turn means that the HCR functional is underestimated, i.e., $f(\widetilde{\lambda}_G^{(N)}; \theta^*, \Delta) < f(\lambda_G; \theta^*, \Delta)$. Furthermore, we have $\mathbb{E}[\eta_1] = \Theta(N^{-1})$ and $\mathrm{Var}(\eta_1) = O(N^{-(1+\delta)})$ from Assumptions A.2 and A.3, respectively. In other words, this is a regime where the (non-negative) bias term dominates the overall error. This happens when:

• $\|\Delta\|$ is small, or
• the parameters of interest are in NLOS, which means that even large perturbations of the true parameter θ* would yield very similar plenoptic observations, or
• the value of N is small, which would result in larger rendering errors.

Furthermore, since the overall HCR-LB is achieved by maximizing the HCR functional $f(\lambda; \theta^*, \Delta)$, the supremum typically occurs for small values of λ, or when $\|L_{\theta^*} - L_{\theta^*+\Delta}\|$ is small (corresponding to Case (2) above). Thus the discussion above implies that the overall HCR-LB is usually underestimated, especially for NLOS imaging problems, when using inaccurately rendered plenoptic data to compute the bounds.

Remark 3.5 (Effect of rendering error on the overall HCR lower bound). A direct implication of Theorems 3.1 and 3.2 is that the HCR-LB for Poisson and AWGN noise models computed using plenoptic data rendered with N samples per-pixel obeys,

$$\lim_{N\to\infty} \mathrm{HCR}_N(\theta_i^*) \stackrel{a.s.}{=} \mathrm{HCR}(\theta_i^*), \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.16)$$

Furthermore, if the supremum of the true HCR functional $f(\lambda; \theta^*, \Delta)$ occurs for $\|\Delta\| \leq \epsilon$, then from the above discussion we can infer that there exists a constant $N_0(\epsilon)$, depending on the scene parameters, such that for all $N < N_0(\epsilon)$, with high probability,

$$\mathrm{HCR}_N(\theta_i^*) \leq \mathrm{HCR}(\theta_i^*), \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.17)$$

The value $N_0(\epsilon)$ essentially determines how the number of samples per-pixel affects the bias-variance tradeoff described above.

Equation (3.17) simply follows from the observation that for small values of N, the bias terms ($\frac{C_P}{N}$ and $\frac{C_G}{N}$) dominate the variance terms ($\eta_P$ and $\eta_G$, respectively), so that $\widetilde{\lambda}^{(N)} \geq \lambda$, which in turn means that $\mathrm{HCR}_N(\theta_i^*) \leq \mathrm{HCR}(\theta_i^*)$.

3.5.1 Estimating HCR Lower Bounds from Inexact Rendering

Our error analysis above shows how rendering error manifests itself in the computation of the HCR-LB. One important takeaway from this analysis is that, in order to get a good estimate of the HCR-LB, one must use as many samples per-pixel as possible to render the scenes. However, this may create a computational bottleneck, as ray-tracing is a highly time- and memory-intensive process. For example, rendering each high-resolution plenoptic (multi-spectral) image in Section 3.4.4 until convergence using Mitsuba [1] took about 2.5 hours, and we had to render close to 800 such images for the location estimation example (and an additional 100 images for the radius estimation example). In this section, we describe a simple method to estimate upper and lower intervals for the true HCR-LB. For any given values of θ* and Δ, Theorems 3.1 and 3.2 suggest that the relationship between the true and the observed λ’s (for both the Poisson and AWGN noise models) is given by

$$\widetilde{\lambda}^{(N)}(\theta^*, \Delta) = \lambda(\theta^*, \Delta) + \frac{C(\theta^*, \Delta)}{N} + \eta, \qquad (3.18)$$
where $C(\theta^*, \Delta) \geq 0$ is the coefficient of the rate of decay of the bias, and η models the higher-order (third and higher moment) terms of the rendering errors, with $\mathrm{Var}(\eta) = O(N^{-1})$. Thus we can use a data-driven method to estimate the unknowns $\lambda(\theta^*, \Delta)$ and $C(\theta^*, \Delta)$ by rendering the scenes with different values of N and solving for the unknowns using a simple (weighted) least-squares algorithm. In particular, if we render the scenes for parameter values θ* and θ* + Δ using $N_1, N_2, \ldots, N_k$ samples per-pixel, for some k ≥ 2, then we get the following system of linear equations:

$$\begin{bmatrix} \widetilde{\lambda}^{(N_1)}(\theta^*, \Delta) \\ \vdots \\ \widetilde{\lambda}^{(N_k)}(\theta^*, \Delta) \end{bmatrix} = \begin{bmatrix} 1 & \frac{1}{N_1} \\ \vdots & \vdots \\ 1 & \frac{1}{N_k} \end{bmatrix} \begin{bmatrix} \lambda(\theta^*, \Delta) \\ C(\theta^*, \Delta) \end{bmatrix} + \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_k \end{bmatrix}. \qquad (3.19)$$

Equation (3.19) can be solved in closed form to obtain an unbiased estimate $\widehat{\lambda}(\theta^*, \Delta)$ of $\lambda(\theta^*, \Delta)$ (approximately unbiased in the case of the Poisson noise model). We can then use the estimated $\widehat{\lambda}(\theta^*, \Delta)$ in (3.10) to get

$$\widehat{\mathrm{HCR}}(\theta_i^*) := \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\big\{\widehat{\lambda}(\theta^*, \Delta)\big\} - 1}, \quad \forall\, i = 1, 2, \ldots, d, \qquad (3.20)$$

where $\widehat{\mathrm{HCR}}(\theta_i^*)$ is a random variable whose stochasticity arises from the rendering errors.
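A minimal sketch of this estimation step is given below, assuming the $\widetilde{\lambda}^{(N_j)}$ values have already been computed from renderings at sample counts $N_j$; the weighting by $\sqrt{N_j}$ is one reasonable choice motivated by $\mathrm{Var}(\eta_j) = O(1/N_j)$, not necessarily the exact scheme used here:

```python
import numpy as np

def estimate_lambda(lams_tilde, Ns):
    """Estimate lambda(theta*, Delta) and the bias coefficient C(theta*, Delta)
    by solving the linear system (3.19) in the (weighted) least-squares sense.
    lams_tilde[j] is the lambda computed from renderings with Ns[j] samples
    per-pixel."""
    lams = np.asarray(lams_tilde, dtype=float)
    Ns = np.asarray(Ns, dtype=float)
    A = np.column_stack([np.ones_like(Ns), 1.0 / Ns])  # design matrix in (3.19)
    w = np.sqrt(Ns)                                    # heteroscedastic weights
    sol, *_ = np.linalg.lstsq(A * w[:, None], lams * w, rcond=None)
    lam_hat, C_hat = sol
    return lam_hat, C_hat
```

The returned `lam_hat` can then be plugged into (3.20) in place of the true λ to obtain $\widehat{\mathrm{HCR}}(\theta_i^*)$.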

Claim 3.1 (Relationship between $\widehat{\mathrm{HCR}}(\theta_i^*)$ and $\mathrm{HCR}(\theta_i^*)$). For any given parameter value θ*, the HCR lower bound computed using the unbiased estimates $\widehat{\lambda}(\theta^*, \Delta)$ (for all Δ) is, in expectation, an upper bound on the true HCR lower bound, i.e.,

$$\mathbb{E}\big[\widehat{\mathrm{HCR}}(\theta_i^*)\big] \geq \mathrm{HCR}(\theta_i^*), \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.21)$$

Proof. We have

$$\mathbb{E}\big[\widehat{\mathrm{HCR}}(\theta_i^*)\big] = \mathbb{E}\left[\sup_{\Delta\neq 0} f\big(\widehat{\lambda}; \Delta, \theta^*\big)\right] \geq \sup_{\Delta\neq 0} \mathbb{E}\big[f\big(\widehat{\lambda}; \Delta, \theta^*\big)\big], \qquad (3.22)$$

where we use the fact that $\mathbb{E}_X\big[\sup_y G(X, y)\big] \geq \sup_y \mathbb{E}_X[G(X, y)]$. Next we observe that the HCR functional $f(\lambda) := \frac{\Delta_i^2}{\exp(\lambda) - 1}$ is a convex function of λ. Hence, for every θ* and Δ, we can apply Jensen’s inequality to get

$$\mathbb{E}\big[f\big(\widehat{\lambda}; \Delta, \theta^*\big)\big] \geq f\big(\mathbb{E}[\widehat{\lambda}]; \Delta, \theta^*\big) = f(\lambda; \Delta, \theta^*), \qquad (3.23)$$
where the last equality in (3.23) follows from the fact that the $\widehat{\lambda}$’s are unbiased estimates. Combining Equations (3.22) and (3.23) yields the desired result, thus concluding the proof. □
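As a quick numerical illustration of the Jensen step above (with arbitrary, made-up values for λ and $\Delta_i$, and Gaussian noise standing in for the estimation error), one can verify that averaging the convex HCR functional over unbiased estimates of λ yields a value at least $f(\lambda)$:

```python
import numpy as np

rng = np.random.default_rng(0)
delta_i, lam_true = 0.05, 2.0            # illustrative values, not from the thesis
f = lambda l: delta_i ** 2 / np.expm1(l)  # convex HCR functional f(lambda)
lam_hat = lam_true + rng.normal(0.0, 0.3, 10_000)  # unbiased estimates of lambda
# Jensen: E[f(lam_hat)] >= f(E[lam_hat]) = f(lam_true)
print(np.mean(f(lam_hat)), ">=", f(lam_true))
```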

While Claim 3.1 gives an upper bound (in expectation) on the true HCR-LB, Remark 3.5 suggests that the HCR-LB computed directly using rendered plenoptic data is (typically) a lower bound on the true HCR-LB value. Thus, for any given scene and unknown parameter value, we can compute, with a fixed computational/rendering budget, an upper and lower interval within which the true HCR-LB is expected to lie. In the next section, we present experimental results for the problem of object localization, which demonstrate the utility of our error analysis framework. It is worth commenting on the relationship between the λ’s in our error analysis and the CR-LB. We can obtain the CR-LB for some scalar parameter θ* as

$$\mathrm{CR}(\theta^*) = \lim_{\Delta\to 0} \frac{\Delta^2}{\lambda(L_{\theta^*}, L_{\theta^*+\Delta})}. \qquad (3.24)$$

From Remark 3.5 and the discussion preceding it, we can see that for small values of Δ, using rendered data (and $\widetilde{\lambda}^{(N)}$) to compute the CR-LB via Finite Differences typically results in underestimating the CR-LB as well, especially for NLOS parameter estimation problems.

3.5.2 Experimental Validation: NLOS Object Localization

Setup

We use the same scene layout as described in Section 3.2.2, with a different camera configuration and object. We consider the problem of estimating the location of a red teapot (downloaded from Morgan McGuire’s website [75]) that is constrained to lie on a straight line in corridor B, as shown in Figure 3.5(a). The distance of the teapot from the intersection of corridors A and B is the scalar parameter of interest θ*. The camera is located at the center of corridor A, as shown in Figure 3.5(a), and captures RGB images of size 160 × 120. Using the insights on Fisher information from Section 3.4.4, we see that regions on the floor near the corner carry more information about hidden objects than others; hence we point the camera slightly towards the floor instead of looking straight at the back wall. The luminance of the ceiling lights was set to 12 lm·sr⁻¹·m⁻², emitting white light (uniformly over all wavelengths)⁵. We use Redner’s [53] default path tracer to render the scenes. Figures 3.5(b) and 3.5(c) are the rendered images for θ* = 0.2 m and 0.9 m, respectively, using 65536 samples per-pixel. We compute the HCR-LB for 100 teapot locations from 0.7 m to 1.69 m at 1 cm increments. For obtaining $\widehat{\mathrm{HCR}}$ from inexact renderings, as discussed in Section 3.5.1, we render all 100 scenes with 10 different values of samples per-pixel, N = 2048, 3072, ..., 11264 (= 11·1024). For computing the HCR-LB directly from the rendered plenoptic data, as discussed in Section 3.4, we use $N_{\mathrm{eff}}$ = 65536 samples per-pixel. We use the same rendering/computational budget for both methods, i.e., the overall rendering time using $N_{\mathrm{eff}}$ = 65536 samples and the time for rendering the scene 10 times with N = 2048, 3072, ..., 11264 are approximately equal (around 3.3 minutes per teapot location).

Results and Discussion

We can see from Figure 3.6 that $\widehat{\mathrm{HCR}}(\theta^*) \geq \mathrm{HCR}_{N_{\mathrm{eff}}}(\theta^*)$ for θ* ≥ 0.9 m, which corresponds to the NLOS regime. We can also observe a sharp increase in the lower bounds around θ* = 0.9 m, which corresponds to the teapot moving away from the LOS of the camera. Such an increase is expected, since localizing NLOS objects is a much harder problem, as we saw in Section 3.4.4. We also observe that the HCR-LB (both $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$) increases for both the AWGN and Poisson noise models as the teapot moves further away from LOS and into corridor B, which again is expected.

⁵The rendered images using the illumination and camera setup here are roughly similar to those obtained using a commercially-available 2000 lm ceiling light with an exposure time of 1/120 seconds.


Figure 3.5: (a) Top view of the scene layout used. The scene geometry is the same as in Section 3.2.2. Instead of a red spherical object, we consider a red teapot, and the camera is now placed in the middle of the hallway and captures RGB images. The scalar parameter of interest θ* is the horizontal displacement of the teapot from the intersection of corridors A and B. RGB images rendered using Redner for different values of θ* are shown in (b) and (c); 65536 samples per-pixel were used, and it took around 3.3 minutes to render each scene. (b) θ* = 0.2 m: teapot fully in LOS. (c) θ* = 0.9 m: teapot just moved completely out of LOS.

It can also be seen that $\widehat{\mathrm{HCR}}$ (black lines) is not as smooth as $\mathrm{HCR}_{N_{\mathrm{eff}}}$ (red lines). We believe that this is because $\widehat{\mathrm{HCR}}$ is a noisy estimate obtained from many sets of low samples per-pixel renderings and hence has higher variance than $\mathrm{HCR}_{N_{\mathrm{eff}}}$, which is computed using a single set of rendered images with a larger number of samples per-pixel. Furthermore, it is worth emphasizing that $\widehat{\mathrm{HCR}}$, shown as is, does not necessarily upper bound the true HCR-LB; the relationship holds in expectation, i.e., $\mathbb{E}[\widehat{\mathrm{HCR}}(\theta^*)] \geq \mathrm{HCR}(\theta^*)$, where the expectation is taken with respect to the rendering errors. Since computing $\mathbb{E}[\widehat{\mathrm{HCR}}]$ is computationally prohibitive, we use $\widehat{\mathrm{HCR}}$ as a surrogate for the upper bound on the true HCR-LB in our discussions. Thus it is important to keep in mind that the upper bounds (and hence the HCR intervals) shown here are “approximate” and help us understand and interpret the fundamental limits of parameter estimation problems associated with plenoptic imaging systems.

Figure 3.7(a) shows the effect of samples per-pixel N on λ. As we increase N, we can see that $\widetilde{\lambda}^{(N)}$ uniformly decreases across all values of Δ for NLOS parameters, as predicted by our analysis. From Figure 3.7(b), we can see that $\widetilde{\lambda}^{N_{\mathrm{eff}}}(\theta^*, \Delta) \geq \widehat{\lambda}(\theta^*, \Delta)$ for all Δ. Even though $\widehat{\lambda}$ and $\widetilde{\lambda}^{N_{\mathrm{eff}}}$ are close, Figure 3.7(c) shows that the difference in the HCR functional is significant around the neighborhood of Δ = 0.


Figure 3.6: HCR-LB for estimation of the teapot location under AWGN and Poisson noise. $\mathrm{HCR}_{N_{\mathrm{eff}}}$ (red lines): HCR-LB computed directly using rendered data with $N_{\mathrm{eff}}$ = 65536 samples per-pixel. $\widehat{\mathrm{HCR}}$ (black lines): HCR-LB estimated from rendering the scenes with N = 2048, 3072, ..., 11264. Due to rendering errors, typically we have $\mathbb{E}[\widehat{\mathrm{HCR}}(\theta^*)] \geq \mathrm{HCR}(\theta^*) \geq \mathrm{HCR}_{N_{\mathrm{eff}}}$. The region between $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ denotes the interval within which the true HCR-LB is likely to lie. (a) HCR-LB for the Poisson noise model. (b) HCR-LB for AWGN with σ = 0.1. (c) HCR-LB when the teapot is not in LOS, for σ = 0.1, 0.2, 0.4, 0.6, and 0.8.


Figure 3.7: Effect of samples per-pixel on λ and the HCR functional f(λ) for estimation of the teapot location. Noise model: AWGN with σ = 0.1; true object location θ* = 1.05 m. (a) $\widetilde{\lambda}^{(N)}$ in the neighborhood of θ* = 1.05 m, showing how $\widetilde{\lambda}^{(N)}$ decreases with N uniformly over all values of Δ. (b) Plot of the estimated and observed λ’s for θ* = 1.05 m, showing that $\widetilde{\lambda}^{N_{\mathrm{eff}}} \geq \widehat{\lambda}$ even with $N_{\mathrm{eff}}$ = 65536 samples per-pixel. (c) HCR functional obtained from the estimated and observed λ’s.


Figure 3.8: Relationship between $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ for different teapot locations: (a)-(c) for AWGN with σ = 0.8; (d)-(f) for the Poisson noise model; under two different scenarios: (a),(d) the maximum of the HCR functional occurs as Δ → 0, implying that $\widehat{\mathrm{HCR}}$ is much larger than $\mathrm{HCR}_{N_{\mathrm{eff}}}$; (b),(e) the maximum of the HCR functional occurs for $\|\Delta\| \gg 0$, implying that $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ are approximately equal; (c),(f) HCR-LB for 0.7 m ≤ θ* ≤ 1.69 m.

The difference slowly tapers off for larger values of Δ. This implies that the overall HCR-LB will have larger uncertainty intervals when the maximum of the HCR functional occurs in the neighborhood of Δ = 0 (see Figures 3.8(a) and 3.8(d)), and smaller intervals when the maximum occurs for $\|\Delta\| \gg 0$ (see Figures 3.8(b) and 3.8(e)). We can also see from Figure 3.6(c) that the HCR intervals for the AWGN model are relatively large for smaller values of σ and decrease as we increase the noise level σ. This arises from the fact that as we increase σ, the location of the maximum of the HCR functional moves further away from Δ = 0, which results in smaller HCR intervals, as we have seen from Figure 3.8. Similar conclusions also hold for the HCR-LB under the Poisson noise model (see Figures 3.8(d) to 3.8(f)). From (3.24), we can see that the CR-LB is essentially the limit of the HCR functional as Δ → 0. Figure 3.8 would then imply that the CR-LB computed with inexactly rendered data might, in general, have larger uncertainty intervals than the HCR-LB: the CR-LB always corresponds to Δ → 0, whereas the supremum of the HCR functional may be achieved away from the neighborhood of Δ = 0, in which case the uncertainty in the HCR-LB would be smaller (see Figures 3.8(b) and 3.8(e)). The experimental results provided here validate our analysis of the effects of rendering errors on the lower bound computation and illustrate the utility of our HCR estimation framework outlined in Section 3.5.1, where we use multiple sets of erroneously rendered data with different numbers of samples per-pixel to obtain intervals for the true HCR-LB.

3.6 Maximum Likelihood Estimation

While the HCR-LB provides lower bounds that are at least as tight as the CR-LB, there are no guarantees on the existence of an unbiased estimator that achieves the lower bound. In order to show that the HCR-LB we derived here accurately depicts the true fundamental limits of scene parameter estimation, we compare our lower bounds with the performance of the Maximum Likelihood (ML) Estimator. Maximum Likelihood Estimates (MLEs) are a good first choice because they are intuitive, simple to derive, asymptotically unbiased, and optimal for many simple estimation problems. While we make no claims about the optimality of MLEs, we show how they compare against the HCR-LB to give us a sense of how tight our lower bounds are. If the errors of the MLEs are close to the corresponding HCR-LB values, it would imply that our lower bounds are tight and indicative of the true fundamental limits. Under the additive white Gaussian noise model described in Section 3.4.2, we can obtain the MLE for θ* from noisy observations $Y_\Omega$ as

$$\widehat{\theta}_{\mathrm{ML}}(Y_\Omega) = \arg\min_{\theta\in\Theta} \sum_{\omega\in\Omega} \big(Y_\omega - L_\theta(\omega)\big)^2. \qquad (3.25)$$

It is worth noting that ML estimation under Gaussian noise is equivalent to minimizing the ℓ₂ loss. We use the optimization library in PyTorch [76] to solve (3.25) and obtain the ML estimates. In particular, we use the Adam optimizer [77], where gradients of the loss function with respect to the scene parameters are obtained using Finite Differences (FD).

Figure 3.9: Top row: clean images for teapot locations (from left to right) θ* = 0.8 m, 1.0 m, 1.2 m, 1.4 m, and 1.6 m, rendered using 65536 samples per-pixel. Bottom row: a single instance of noisy images corrupted by AWGN with σ = 0.1. After the teapot goes completely out of LOS (θ* > 0.9 m), it is very hard to discern any information about the teapot by simply looking at these images (both the clean and the noisy versions).

A principled alternative to FD gradients is to use differentiable renderers like Redner [53] or Mitsuba 2.0 [78], which enable us to directly render gradients of the plenoptic observations with respect to the scene parameters of interest. We observed that rendering gradients of multi-bounce photon paths has a much larger memory footprint and a 25× to 30× computational overhead compared to rendering the actual plenoptic observations. Thus, for problems with a small number of parameters (< 15 or so), we were able to obtain more robust gradients using FD than by using differentiable rendering.
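A minimal sketch of this renderer-in-the-loop estimation procedure is shown below. The function `render` is a hypothetical stand-in for a (low-sample) call to the rendering engine, and the hyperparameter values are illustrative, not the exact settings used in our experiments:

```python
import torch

def ml_estimate(render, y_noisy, theta_init, xi=0.01, n_iters=200, lr=0.02):
    """Sketch of ML estimation (3.25) under AWGN: Adam updates driven by
    central-difference gradients of the l2 loss (cf. footnote 6).
    `render(theta)` returns a torch tensor image for scene parameter theta."""
    theta = torch.tensor([theta_init])
    opt = torch.optim.Adam([theta], lr=lr)

    def loss(t):
        return torch.sum((y_noisy - render(t)) ** 2)

    for _ in range(n_iters):
        # central-difference approximation of the loss gradient
        g = (loss(theta + xi) - loss(theta - xi)) / (2 * xi)
        opt.zero_grad()
        theta.grad = g.reshape(theta.shape)  # feed the FD gradient to Adam
        opt.step()
    return theta.item()
```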

3.6.1 Experimental Evaluation: NLOS Object Localization using Maximum Likelihood Estimation

Setup

We use the exact same setup as in Section 3.5.2. We consider the problem of estimating the location of a red teapot that is constrained to lie on a straight line in corridor B, as shown in Figure 3.5(a). The distance of the teapot from the intersection of corridors A and B is the scalar parameter of interest θ*. The camera is located at the center of corridor A, as shown in Figure 3.5(a), and captures RGB images of size 160 × 120. We use Redner [53] to render the true plenoptic observations using $N_{\mathrm{eff}}$ = 65536 samples per-pixel and then synthetically add AWGN with σ = 0.1 to generate noisy observations. Figure 3.9 shows the rendered clean images and a single instance of the noisy observations for 5 different teapot locations, θ* (in m) = 0.8, 1.0, 1.2, 1.4, and 1.6. Except for θ* = 0.8 m, all other locations correspond to the case where the teapot is completely out of LOS.

Parameter θ* (cm) | $\mathrm{HCR}_{N_{\mathrm{eff}}}(\theta^*)$ (cm²) | $\widehat{\mathrm{HCR}}(\theta^*)$ (cm²) | MSE($\widehat{\theta}_{\mathrm{ML}}$) (cm²) | Var($\widehat{\theta}_{\mathrm{ML}}$) (cm²)
80  | 4.17 × 10⁻²¹⁴ | 1.30 × 10⁻²¹⁶ | 0.0024 | 0.0019
100 | 2.2201 | 4.3282 | 4.0518 | 4.0372
120 | 2.6999 | 4.9048 | 5.9466 | 4.0569
140 | 3.3964 | 5.6872 | 6.7245 | 6.2610
160 | 6.7721 | 11.6408 | 12.8361 | 12.3953

Table 3.1: Comparison of the HCR lower bound and the performance of the MLE for AWGN with σ = 0.1.

Results and Discussion

We obtain the ML estimates by solving (3.25), where the derivative of the ℓ₂-loss is computed using FD⁶. As opposed to the 65536 samples used to render the clean (noiseless) images, we use low-sample renderings to obtain fast (and noisy) gradients for the stochastic optimization procedure. We use N = 512 samples for the initial few iterations and then N = 1024 towards the end. It is worth noting that it took approximately 3 seconds and 6 seconds to compute gradients with 512 and 1024 samples, respectively, on an NVIDIA Quadro RTX 8000 GPU with 48 GB of RAM. The overall runtime for (a single run of) the ML estimation algorithm varied from 28 to 40 minutes, depending on the true location of the teapot; the algorithm converged faster when the teapot was closer to the beginning of the corridor. We perform 30 independent runs of ML estimation, with different realizations of the noisy observations, for each θ* to assess the performance of the MLE. The initialization point for every run was chosen uniformly at random in the interval 0.7 m to 1.7 m. We report the average MSE and the variance of the ML estimates over the 30 runs, along with the computed HCR-LB (both $\mathrm{HCR}_{N_{\mathrm{eff}}}$ and $\widehat{\mathrm{HCR}}$) from the previous section, in Table 3.1.

⁶We use the central difference method to obtain the gradients of the loss: $\nabla_\theta \ell(\theta_0) \approx \frac{\ell(\theta_0 + \xi) - \ell(\theta_0 - \xi)}{2\xi}$, with ξ = 0.01 m, where $\ell(\theta_0)$ is the loss function.


Figure 3.10: Comparison of the HCR lower bound and MLE performance for AWGN with σ = 0.1.

Firstly, it is worth noting that the average MSE and the variance of the MLEs are very close to each other for most values of θ*. This implies that the ML estimator for this problem is (nearly) unbiased, and hence the HCR-LBs are applicable to the ML estimates. While the average MSE is a bit higher than the variance for θ* = 1.2 m, we believe that this difference would vanish if we performed more runs (≫ 30) of the ML estimation. When the teapot is in LOS (θ* = 0.8 m), we can see that the ML estimator is (unsurprisingly) able to estimate the object location quite accurately. As the object translates further away from LOS, the performance of the ML estimator degrades in tandem with the HCR-LB. Furthermore, the MSE (and the variance) of the MLEs are quite close to the HCR-LB intervals computed in the previous sections. Starting from random initializations in a 1 m wide interval, the ML estimates converge to within a few cm of the true teapot location. This is impressive and promising, considering that it is nearly impossible to find any information about the hidden object directly by looking at the images in Figure 3.9. The renderer-enabled ML estimator, on the other hand, is able to make use of the information content in the very weak signals from indirect photons to localize the object. While there are no theoretical guarantees about the existence of estimators that achieve the HCR-LB, we can see from this experiment that the performance of the ML estimator is close to the HCR-LB. Also, the error rates of the MLEs exhibit similar behavior as a function of the true object location as the HCR-LB. This shows that the renderer-enabled HCR-LB framework proposed here yields lower bounds that are indicative of the true underlying fundamental limits for scene parameter estimation problems in plenoptic imaging systems.

3.7 Conclusion

We presented a framework to compute information-theoretic lower bounds for estimating scene parameters from noisy plenoptic data. Our approach employed the HCR-LB over the commonly used CR-LB, as the former: (a) is more amenable to our settings, where computing partial derivatives (with respect to the parameters of interest) might be infeasible; (b) applies to a wider range of problems; and (c) yields a bound that is at least as tight as the CR-LB. Using computer graphics rendering packages, we overcome the difficulty of having to solve the forward model in closed form, and numerically evaluate the HCR-LB. Furthermore, we analyze the effects of rendering error on the computed HCR-LB and show that rendering error typically introduces a bias in the computed HCR-LB values. In particular, we show that the HCR functional computed using erroneously rendered images underestimates the true value, especially for NLOS parameter estimation problems. We show that this bias vanishes at the rate of O(N⁻¹) for the unbiased and progressive renderers used here, where N is the number of samples per-pixel. Based on our error analysis, we also provide a simple method to estimate intervals for the true HCR-LB. Our error analysis automatically accounts for the error accrued in computing the CR-LB using Finite Differences (FD) and indicates that the uncertainty (or the size) of the estimated intervals for the CR-LB would be at least as large as those for the HCR-LB. Thus, in addition to being at least as tight as the CR-LB in value, the HCR-LB is also at least as robust to rendering errors as the CR-LB. Our renderer-enabled lower-bounding framework has been used to compute lower bounds for a few illustrative NLOS imaging problems under two common noise models: Poisson noise and AWGN with different levels of noise variance. Additionally, we are able to compute pixelwise Fisher Information (or FD-FI). This FD-FI data provides useful insights, especially for NLOS imaging problems, as it tells us which of the indirect photons/observations convey more information about the parameter(s) of interest. We believe that these insights and tools can be used to develop novel adaptive sensing strategies for scene parameter estimation. Although we explore only classical multi-spectral imaging systems in this work, our estimation-theoretic framework readily generalizes to accommodate additional dimensions of the plenoptic function, e.g., lenslet-array camera systems, polarization, motion in the scene, etc. The potential benefits of our HCR-LB framework come at the cost of increased computational requirements. Computing the HCR-LB involves finding the supremum of the expression in (3.4) for general noise models. In this work, we compute the supremum by exhaustive search, since the parameter space is small. However, for problems with multiple parameters (d ≫ 1), the computational time for exhaustive search would grow exponentially, making it prohibitive in high-dimensional settings. For such settings, we could potentially explore either derivative-free (zeroth-order) optimization methods, or the recently developed differentiable renderers like [53, 78] to compute derivatives and evaluate the HCR-LB in Equations (3.6) and (3.7) using gradient-based optimization algorithms. While it might be hard to obtain the global maximum of the HCR functional using gradient-based methods, convergence to any local maximum would still yield a valid (but possibly loose) lower bound.
It would be interesting to study the landscape of the HCR functional for such multi-parameter estimation problems; we defer these investigations to future work. Finally, we supplement our lower-bounding framework by comparing the HCR-LB computed here with the performance of the Maximum Likelihood Estimator (MLE) for a simple, but illustrative, object localization problem. We see that the HCR-LB values closely match the behavior of the MLEs, indicating that our framework is able to compute meaningful lower bounds that reflect the true fundamental limits of scene parameter estimation in plenoptic imaging systems. While (asymptotically) unbiased estimators like MLEs are useful in understanding the fundamental limits of parameter estimation, it is common to introduce “bias” into the estimates in the form of regularization to reduce the overall estimation error [79], especially in high-dimensional statistical inference. Generalizing our lower-bounding framework to such biased estimators would be an interesting avenue for future work.

3.8 Acknowledgment

We gratefully acknowledge support for this work by the DARPA REVEAL program, Contract No. HR0011-16-C-0024, and the Minnesota Supercomputing Institute (MSI) at the University of Minnesota (URL: http://www.msi.umn.edu) for providing computational resources for rendering the scenes used in Section 3.4.4. We also thank Prof. Gary Meyer and his students, Prof. Michael Tetzlaff and Dr. Michael Ludwig, for useful discussions about ray-tracing algorithms and for helping with the design of the Π-shaped hallway scene used in this paper.

3.9 Appendix

3.9.1 Empirical validation of Assumption A.2

We provide empirical evidence using simulations to validate Assumption A.2. For an example scene described below, we empirically study the relationship between the weighted sum of the per-pixel variance of rendered images and the number of samples per-pixel, N, used for rendering. It is worth noting that for unbiased rendering algorithms, the per-pixel variance of the rendered images equals the per-pixel error variance, i.e., $\mathrm{Var}(\widetilde{L}^{(N)}(\omega)) = \mathrm{Var}(E^{(N)}(\omega))$. Hence we use the variance of the rendered pixel values in this section to empirically validate Assumption A.2. Our example scene for this simulation consists of a red teapot placed at the intersection of corridors A and B in the hallway scene described in Section 3.2.2. We consider 11 equally-spaced values of samples per-pixel N ranging over 1024, 2048, ..., 11264 (= 11·1024). For each value of N, we render 100 independent low-resolution (40 × 30) RGB images of the same scene using Redner [53] and compute the per-pixel variance. The first image (top-left) in Figure 3.11 shows a single instance of the rendered RGB scene with N = 1024 samples per-pixel. The other 11 images in Figure 3.11 show the per-pixel variance (summed over the 3 color channels) of the rendered images. We can see that the per-pixel variance is not uniform across the entire image. Some regions of the image have smaller error (pixels corresponding to the teapot) than others (pixels on the back wall above the teapot). However, we can also observe the general trend that pixel variances consistently decrease with increasing values of N.

We consider weights $W_\omega \sim \mathrm{Uniform}[0, L_{\max}]$, where we set $L_{\max} = 12$, which is the radiance value of the lights in the scene. It is worth mentioning that the value of $L_{\max}$ was observed to have little to no effect on the results of this simulation. For different values of N, we then compute $\gamma^{(N)} := \sum_{\omega\in\Omega} W_\omega \cdot \mathrm{Var}(\widetilde{L}^{(N)}(\omega))$. In order to find the exact rate of decay of $\gamma^{(N)}$ with respect to N, we fit parametric models given by


Figure 3.11: Top left: a single instance of the teapot image rendered with 1024 samples. Other plots: per-pixel variance (summed over the 3 color channels) for the teapot image for different values of samples per-pixel N. It can be seen that the per-pixel variance is not the same across all pixels in the image. While the general pattern of pixel-wise variance is similar across different values of N, the magnitude of the variance decreases (as expected) with increasing samples.

$\gamma^{(N)} = C_p \cdot N^{-p}$, for values of p between 0.85 and 1.15 (in 0.001 increments), and choose the value of p that has the smallest L2 error of fit,

$$p_{\mathrm{opt}} = \arg\min_{0.85 \leq p \leq 1.15} \big\|\gamma^{(N)} - C_p \cdot N^{-p}\big\|_2^2.$$

We repeat the above process with 10⁴ independent draws of $W_\omega$ to find the distribution of $p_{\mathrm{opt}}$. Figure 3.12(a) shows that the average fitting error is smallest for p = 1, with an average squared L2-error of 3.154 × 10⁻⁴. Furthermore, we can see from the distribution of $p_{\mathrm{opt}}$ in Figure 3.12(b) that $p_{\mathrm{opt}}$ is highly concentrated around p = 1; the mean, median, and mode of the distribution all occur at p = 1. Finally, Figure 3.12(c) shows a single instance of $\gamma^{(N)}$ and the corresponding model fit using p = 1, which shows how well the parametric model $C_p \cdot N^{-1}$ fits the observed weighted sum of pixel variances. These simulations show that the weighted sum of pixel variances follows a Θ(N⁻¹) parametric decay rate with high probability, thus validating Assumption A.2.
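A minimal sketch of this model-fitting step is given below; the function name is illustrative, and `gammas` holds the observed weighted sums $\gamma^{(N)}$ for the sample counts in `Ns`:

```python
import numpy as np

def fit_decay_rate(gammas, Ns, p_grid=None):
    """Fit gamma^(N) = C_p * N^(-p) over a grid of decay exponents and return
    the p_opt with the smallest squared L2 error of fit.  For each candidate p,
    the best C_p has a closed form (least squares with a single regressor)."""
    if p_grid is None:
        p_grid = np.arange(0.85, 1.15 + 1e-9, 0.001)
    gammas = np.asarray(gammas, dtype=float)
    Ns = np.asarray(Ns, dtype=float)
    errors = []
    for p in p_grid:
        x = Ns ** (-p)
        C = (x @ gammas) / (x @ x)      # closed-form least-squares C_p
        errors.append(np.sum((gammas - C * x) ** 2))
    return p_grid[int(np.argmin(errors))]
```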


Figure 3.12: Results from the simulation with 10⁴ independent draws of weights $W_\omega$: (a) average squared L2-error of fit vs. degree p; (b) distribution of the optimal degree $p_{\mathrm{opt}}$, showing that it is highly concentrated around p = 1; (c) a single instance of the weighted sum of pixel variances ($\gamma^{(N)}$) along with the model fit using p = 1.

3.9.2 Proof of Theorem 3.1

Let us denote the true plenoptic intensities for parameter values θ* and θ* + Δ, namely $L_{\theta^*}$ and $L_{\theta^*+\Delta}$, by $L_1$ and $L_2$ respectively, for brevity of notation. We can then describe the corresponding (inexactly) rendered plenoptic values using N samples per-pixel as $\widetilde{L}_1 = L_1 + E_1$ and $\widetilde{L}_2 = L_2 + E_2$, where the dependence of the rendering noise on N is implicit. For the case of the HCR lower bound under Poisson noise (Section 3.4.1), if we use the rendered plenoptic values, we end up with an erroneous estimate of $\lambda_P$, given by

$$\widetilde{\lambda}_P = \sum_{\omega\in\Omega} \frac{\big(\widetilde{L}_1(\omega) - \widetilde{L}_2(\omega)\big)^2}{\widetilde{L}_1(\omega)} = \sum_{\omega\in\Omega} \frac{\big(L_1(\omega) - L_2(\omega) + E(\omega)\big)^2}{L_1(\omega) + E_1(\omega)}, \qquad (3.26)$$
where we let $E := E_1 - E_2$. If the errors in the rendered image are relatively small, i.e., $|E_1(\omega)| \ll L_1(\omega)$ for all ω ∈ Ω, then we can use the series expansion to write

$$\frac{1}{L_1(\omega) + E_1(\omega)} = \frac{1}{L_1(\omega)}\left(1 - \frac{E_1(\omega)}{L_1(\omega)} + \frac{E_1(\omega)^2}{L_1(\omega)^2} - \cdots\right). \qquad (3.27)$$

Substituting (3.27) into (3.26), we get

$$\begin{aligned}
\widetilde{\lambda}_P &= \sum_{\omega\in\Omega} \frac{\big(L_1(\omega) - L_2(\omega) + E(\omega)\big)^2}{L_1(\omega)} \left(1 - \frac{E_1(\omega)}{L_1(\omega)} + \frac{E_1(\omega)^2}{L_1(\omega)^2} - \cdots\right) \\
&= \lambda_P + \underbrace{\sum_{\omega\in\Omega} \frac{\big(L_1(\omega) - L_2(\omega)\big)^2}{L_1(\omega)} \sum_{k=1}^{\infty} \left(\frac{-E_1(\omega)}{L_1(\omega)}\right)^{k}}_{(I)} + \underbrace{\sum_{\omega\in\Omega} \frac{E(\omega)^2 + 2\big(L_1(\omega) - L_2(\omega)\big)E(\omega)}{L_1(\omega)} \sum_{k=0}^{\infty} \left(\frac{-E_1(\omega)}{L_1(\omega)}\right)^{k}}_{(II)}. \qquad (3.28)
\end{aligned}$$

If we further take a closer look at the term (I), we can see that

$$\mathbb{E}[(I)] = \sum_{\omega\in\Omega} \frac{\big(L_1(\omega) - L_2(\omega)\big)^2}{L_1(\omega)^3}\,\mathrm{Var}(E_1(\omega)) + O\big(N^{-(1+\delta)}\big), \qquad (3.29)$$
where we have used the unbiasedness (Assumption A.1) and the bound on the higher-order moments (Assumption A.3) of the rendering errors to get (3.29). Using similar arguments, we can show that

$$\begin{aligned}
\mathbb{E}[(II)] &= \sum_{\omega\in\Omega} \left\{\frac{\mathrm{Var}(E(\omega))}{L_1(\omega)} - \frac{2\big(L_1(\omega) - L_2(\omega)\big)}{L_1(\omega)^2}\,\mathbb{E}[E(\omega)E_1(\omega)]\right\} + O\big(N^{-(1+\delta)}\big) \\
&= \sum_{\omega\in\Omega} \left\{\frac{\mathrm{Var}(E_1(\omega)) + \mathrm{Var}(E_2(\omega))}{L_1(\omega)} - \frac{2\big(L_1(\omega) - L_2(\omega)\big)}{L_1(\omega)^2}\big(\mathrm{Var}(E_1(\omega)) - \mathbb{E}[E_1(\omega)E_2(\omega)]\big)\right\} + O\big(N^{-(1+\delta)}\big) \qquad (3.30) \\
&= \sum_{\omega\in\Omega} \left\{\frac{2L_2(\omega) - L_1(\omega)}{L_1(\omega)^2}\,\mathrm{Var}(E_1(\omega)) + \frac{\mathrm{Var}(E_2(\omega))}{L_1(\omega)}\right\} + O\big(N^{-(1+\delta)}\big), \qquad (3.31)
\end{aligned}$$
where we have used the fact that $E_1$ and $E_2$ are independent (Assumption A.4) and zero-mean (Assumption A.1) to get $\mathrm{Var}(E(\omega)) = \mathrm{Var}(E_1(\omega)) + \mathrm{Var}(E_2(\omega))$ and $\mathbb{E}[E_1(\omega)E_2(\omega)] = \mathbb{E}[E_1(\omega)] \cdot \mathbb{E}[E_2(\omega)] = 0$. Combining Equations (3.29) and (3.31), we get

$$\mathbb{E}[(I) + (II)] = \sum_{\omega\in\Omega} \left\{\frac{L_2(\omega)^2}{L_1(\omega)^3}\,\mathrm{Var}(E_1(\omega)) + \frac{\mathrm{Var}(E_2(\omega))}{L_1(\omega)}\right\} + O\big(N^{-(1+\delta)}\big) = \frac{C_P}{N} + O\big(N^{-(1+\delta)}\big), \qquad (3.32)$$
for some scene-dependent constant $C_P \geq 0$ and δ > 0, the constant appearing in Assumption A.3. Equation (3.32) follows directly from Assumption A.2 on the rate of decay of weighted sums of the pixel-wise error variance. Now we can get a handle on the variance of (I) + (II):

$$\mathrm{Var}\big((I) + (II)\big) = \mathbb{E}\big[\big((I) + (II)\big)^2\big] - \big\{\mathbb{E}[(I) + (II)]\big\}^2. \qquad (3.33)$$

From (3.28), we can rewrite

$$(I) = \sum_{\omega\in\Omega} \frac{c_0(\omega)\,E_1(\omega)}{L_1(\omega) + E_1(\omega)}, \qquad (3.34)$$

where we let $c_0(\omega) := \frac{\left(L_1(\omega) - L_2(\omega)\right)^2}{L_1(\omega)}$, and

$$(II) = \sum_{\omega\in\Omega} \frac{E(\omega)^2 + 2\big(L_1(\omega) - L_2(\omega)\big)E(\omega)}{L_1(\omega) + E_1(\omega)}. \qquad (3.35)$$

If we use Equations (3.34) and (3.35) and the series expansion of $\frac{1}{\left(L_1(\omega) + E_1(\omega)\right)^2}$ to evaluate $\mathbb{E}\big[\big((I) + (II)\big)^2\big] = \mathbb{E}\big[(I)^2 + (II)^2 + 2\cdot(I)\cdot(II)\big]$, we get

$$\mathrm{Var}\big((I) + (II)\big) = \sum_{\omega\in\Omega} \big\{c_1(\omega)\,\mathrm{Var}(E_1(\omega)) + c_2(\omega)\,\mathrm{Var}(E_2(\omega))\big\} + O\big(N^{-(1+\delta)}\big) - \big\{\mathbb{E}[(I) + (II)]\big\}^2 = O(N^{-1}), \qquad (3.36)$$

where $c_1(\omega)$ and $c_2(\omega)$ are scene-dependent constants, and the final equality follows from Assumption A.2 and (3.32). Combining Equations (3.32) and (3.36) with (3.28), we get the desired result in Theorem 3.1, thus completing the proof. □

3.9.3 Proof of Theorem 3.2

We use similar arguments as above to prove Theorem 3.2. For the case of the HCR lower bound under the AWGN model (Section 3.4.2), if we use the rendered plenoptic values, we end up with an erroneous estimate of $\lambda_G$, given by

$$\widetilde{\lambda}_G = \lambda_G + \underbrace{\frac{1}{\sigma^2}\|E_1 - E_2\|^2}_{\eta_1} + \underbrace{\frac{2}{\sigma^2}(L_1 - L_2)^T \cdot (E_1 - E_2)}_{\eta_2}. \qquad (3.37)$$

If we further take a look at η1, we can see that

$$\mathbb{E}[\eta_1] = \sum_{\omega\in\Omega} \left\{\frac{\mathrm{Var}(E_1(\omega)) + \mathrm{Var}(E_2(\omega))}{\sigma^2} - \frac{2}{\sigma^2}\underbrace{\mathbb{E}[E_1(\omega)E_2(\omega)]}_{=0}\right\} = \frac{C_G}{N}, \qquad (3.38)$$
for some scene-dependent constant $C_G \geq 0$, where the last equality follows from Assumption A.2. Also, due to the unbiasedness of the rendering errors, it is easy to see that $\mathbb{E}[\eta_2] = 0$. Thus we have

$$\mathbb{E}[\eta_1 + \eta_2] = \frac{C_G}{N}. \qquad (3.39)$$

As in the previous proof, we denote E := E1 − E2, and calculate the variance of η1 + η2,

$$\begin{aligned}
\mathrm{Var}(\eta_1 + \eta_2) &= \mathbb{E}\big[(\eta_1 + \eta_2)^2\big] - \big\{\mathbb{E}[\eta_1 + \eta_2]\big\}^2 \\
&= \frac{1}{\sigma^4}\underbrace{\Big(\mathbb{E}\big[\|E\|^4\big] + 4(L_1 - L_2)^T \cdot \mathbb{E}\big[\|E\|^2 \cdot E\big]\Big)}_{=O(N^{-(1+\delta)})} + \underbrace{\frac{4}{\sigma^4}(L_1 - L_2)^T \cdot \mathbb{E}[EE^T] \cdot (L_1 - L_2)}_{=O(N^{-1})} - \frac{C_G^2}{N^2} \\
&= O(N^{-1}), \qquad (3.40)
\end{aligned}$$
where we use Assumption A.3 to bound the higher-order moments of the rendering error by $O(N^{-(1+\delta)})$, and the independence of $E_1$ and $E_2$ together with Assumption A.2 to get a handle on $\mathbb{E}[EE^T]$. Combining Equations (3.39) and (3.40) with (3.37), we get the desired result in Theorem 3.2, thus completing the proof. □

Chapter 4

Non-Line-Of-Sight Imaging from Plenoptic Observations

4.1 Introduction

In the previous chapter, we considered the fundamental limits of estimating high-level scene parameters, e.g., object location and size, from noisy plenoptic data, with an eye towards estimating Non-Line-of-Sight (NLOS) scene parameters. In this chapter, we consider the problem of imaging hidden scenes from (noisy) plenoptic observations made from a reflecting surface with known surface reflectance properties. NLOS imaging refers to the general paradigm of discerning hidden objects and scenes that lie outside the Field-of-View (FOV) of an imaging device. NLOS imaging has numerous applications in autonomous navigation (aiding collision avoidance with vehicles around a corner), medical diagnostics (imaging tissues and regions inaccessible with an endoscope), and military and search-and-rescue operations (alerting soldiers and first responders to hidden adversaries in a building), to name a few. While problems like imaging through (semi-)opaque scattering media, e.g., seeing through walls, medical tomography, etc., can be thought of as types of NLOS imaging problems, in this work we restrict our attention to the class of NLOS problems that can be broadly described as “seeing around corners.” The plenoptic function, or the light field¹ [5], is a high-dimensional vector function (often 5D or higher) that describes the amount of light flowing through every point in

¹We use the terms plenoptic function and light field interchangeably in this work.

space in every direction, at any given point of time, and across multiple wavelengths. The resulting plenoptic function takes the form

L = L(r, ϕ, ν, t),

where $r \in \mathbb{R}^3$ refers to the location at which the plenoptic observation is made, $\varphi \in [0, 2\pi) \times [0, \pi)$ denotes the angle pair (direction of arrival/exitance of light) in polar coordinates, ν is the wavelength, and t is the time index. Plenoptic imaging systems obtain “slices” of the plenoptic function over a subset of the domain that defines the imaging modality. For instance, multi-spectral imaging systems capture the plenoptic function at a specified set of wavelengths, at a fixed location in space, and over a finite FOV. Thus plenoptic imaging systems contain far more information about a given scene than a conventional camera. In this work, we describe a regularized inversion methodology for recovering hidden scenes from NLOS plenoptic measurements that leverages the richness of the plenoptic function as well as the substantial structural regularity of the underlying scene across multiple dimensions. For example, plenoptic data comprising multiple single-snapshot images, collected across time and through multiple viewpoints, is highly structured. We exploit these structural regularities (or redundancies), induced by motion (in the scene and/or the camera) and parallax, using our regularization-based approach. We demonstrate the efficacy of our proposed regularization approach on different imaging modalities using real data.

4.1.1 Prior Art

NLOS imaging methods can either be active, based on the transient or time-resolved imaging framework initially proposed in [54, 80], which requires external excitation of the hidden scene using a suitable source of illumination (e.g., pulsed lasers, LIDARs), or passive, using existing ambient lighting in the scene to perform imaging. Early active NLOS imaging methods used a combination of expensive femtosecond lasers and a streak camera with 2-picosecond resolution [54, 81], but newer active imaging systems have proposed using single-photon avalanche diode (SPAD) detectors and time-correlated single photon counting (TCSPC) modules as less expensive alternatives. Recently, active methods like [56–58], to name a few, have gained a lot of attention and have been used widely for NLOS imaging.

In this chapter, we focus on passive imaging methods, which are typically less expensive, have lower power requirements and faster data acquisition rates, and, more importantly, are stealthier than active methods (since they don’t require “probing” the hidden scene with external stimuli). The main challenge in passive imaging is that common materials exhibit diffuse scattering properties (at least in the visible portion of the spectrum), resulting in a very ill-conditioned forward model. It was first observed in [59] that the presence of occluding objects like sharp edges and corners facilitates the NLOS recovery problem. This led to many follow-up works that exploit the presence of such occluders in the hidden scene to perform NLOS scene recovery [7–11, 48, 65, 66]. While the above-mentioned efforts use information about the scene geometry to obtain a well-conditioned forward model for imaging the hidden scene, we propose a supplementary approach, which can be applied in conjunction with the occluder-based imaging paradigm. Recently, passive thermal imaging has been proposed in [14] as a viable alternative to using visible light for NLOS imaging. The authors argue that at Long-Wave InfraRed (LWIR) wavelengths (8-14 µm), the hidden objects themselves usually act as light sources rather than mere light reflectors, resulting in a simpler “one-bounce” scenario rather than a “multi-bounce” scenario. Furthermore, the surface reflectance (or BRDF) of common materials exhibits much stronger mirror-like (or specular) behavior in the IR spectrum than at visible wavelengths. However, the main challenge in LWIR settings is the low albedo of surfaces (due to high absorption levels), resulting in observations with low SNR. The authors of [14] demonstrate NLOS object localization, (2D) shape recovery, and human pose estimation using thermal images. In this work, we demonstrate thermal NLOS imaging from a video of thermal images using our regularized inversion methodology outlined in Section 4.3. The hidden scene comprises a (moving) human subject who is several feet (6 ft to 9 ft) away from the wall, and our reconstructions run almost in real-time, the first such demonstration to the best of our knowledge.

4.2 Problem Formulation

Let us first mathematically characterize the forward mapping, or light-transport model, that relates the light field incident on a reflecting surface (i.e., light from the hidden scene) to the reflections off the scattering surface (i.e., the observations made by the camera), using geometric ray optics². For a fixed wavelength ν and time instant t, let us denote the light field incident at a point r on the reflecting surface, along some direction $\varphi_i$, by $L^{\mathrm{in}}(r, \varphi_i)$. We can then write the outgoing light field from r along direction $\varphi_o$ as

$$L^{\mathrm{out}}(r, \varphi_o) = \sum_{\varphi_i \in S^2_+(r)} L^{\mathrm{in}}(r, \varphi_i)\, f(r, \varphi_o, \varphi_i)\,(\varphi_i \cdot n_r), \qquad (4.1)$$
where $f(r, \cdot, \cdot)$ is the bi-directional reflectance distribution function (BRDF) of the reflecting surface, $n_r$ is the surface normal, and $S^2_+(r)$ denotes the (discretized version of the) unit hemisphere containing all possible incident rays at point r. It is worth pointing out that (4.1) is a discretized version of the rendering equation [12]. Equation (4.1) also implicitly assumes that the reflecting surface does not emit any light of its own. By combining the BRDF and the dot-product terms into a single forward operator $B(r, \varphi_o, \varphi_i) := f(r, \varphi_o, \varphi_i)(\varphi_i \cdot n_r)$, we get the following linear relationship between $L^{\mathrm{in}}$ and $L^{\mathrm{out}}$:

$$L^{\mathrm{out}}(r, \varphi_o) = \sum_{\varphi_i \in S^2_+(r)} L^{\mathrm{in}}(r, \varphi_i)\, B(r, \varphi_o, \varphi_i). \qquad (4.2)$$

If we assume knowledge of the BRDF of the reflecting surface, we can characterize the forward model completely. A typical NLOS imaging problem is illustrated in Figure 4.1. The exact relationship between the observations of interest and the NLOS scene depends on the imaging modality used, e.g., the FOV of the camera, the location(s) of the camera relative to the scattering surface, and knowledge of the presence of any occluders in the scene (for occluder-based imaging).

4.2.1 The NLOS Imaging Problem

We formally define the NLOS imaging problem for a broad a class of imaging modalities here and then we shall instantiate it for specific cases in the following sections. Let 3 C ⊆ R denote set of camera locations in the scene where the observations are made, and Fc denote the set of points on the reflecting surface(s) observed by the camera located at c ∈ C. For a fixed wavelength ν and time instant t, the set of observations

^2We neglect the effects of participating media and sub-surface scattering here.

Figure 4.1: Illustration of a typical NLOS imaging problem. The camera FOV F_c defines the set of all points on the reflecting surface corresponding to the pixels captured by the camera located at c. The light field observed/measured by the camera from a surface point r ∈ F_c is denoted by L^out(r, ϕ_o), where the outgoing ray direction ϕ_o = (c − r)/‖c − r‖_2.

made by the imaging system is given by

Y := { Y(c, r) = L^out(r, ϕ_o) : ϕ_o = (c − r)/‖c − r‖_2, r ∈ F_c and c ∈ C }.   (4.3)

The observations Y, in the general setting, form a 5D tensor (7D if we include measurements across wavelength and time), which is indexed by the 3D location where the measurement is made and the 2D angle/direction of the observed light field ray. In the simplest setting, where we have a single camera location, the observations form a standard 2D image (4D if we include wavelength and time), as shown in Figure 4.1. Thus, the overall observation tensor Y can be thought of as multiple 2D images from various camera locations, stacked to form a multi-way tensor. Since we operate in discrete settings, the total number of measurements depends on the cardinality of C (or the number of camera vantage points) and the cardinality of F_c (or the resolution of the camera) for all c ∈ C. Using similar formalism, we define the light field incident on the reflecting surface (containing the NLOS scene of interest) as

X^* := { X^*(r, ϕ_i) = L^in(r, ϕ_i) : ϕ_i ∈ S^2_+(r) and r ∈ F },   (4.4)

where F = ∪_{c∈C} F_c is the overall FOV of the entire imaging system. The light field X^* is a multi-way tensor, which contains not only light from the NLOS scene of interest, but the entire 2D hemispherical region at every point in the camera FOV, as shown in Figure 4.1. The resolution of the light field X^* that we aim to recover depends on how finely we want to discretize the incident upper hemisphere of light rays S^2_+(r) at every surface point r ∈ F. Using (4.2), we write the forward model relating the observations of the plenoptic imaging system and the (unknown) light field incident on the reflecting surface (with known BRDF) as

Y(c, r) = Σ_{ϕ_i ∈ S^2_+(r)} B(r, (c − r)/‖c − r‖_2, ϕ_i) · X^*(r, ϕ_i),   ∀ c ∈ C, r ∈ F_c,   (4.5)

or, succinctly, using operator notation, Y = B(X^*), where B(·) is the linear forward operator acting on the light field tensor X^*. While it is possible to vectorize the observations and represent the linear forward operator as a matrix, the size of such matricized forward operators typically becomes very large even for problems of modest size. However, on many occasions the forward model is simplified by making structural assumptions about the BRDF, e.g., a shift-invariant BRDF model, which yields a much simpler form of the forward model when represented using a convolutional operator (or kernel) rather than a full-blown matrix.
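To make the last point concrete, the following is a minimal MATLAB sketch (illustrative only, not our deployed code) of a forward operator under a hypothetical shift-invariant BRDF, where B(·) reduces to a convolution along the angular dimension of the light field and its adjoint B^H(·) is correlation (convolution with the flipped kernel); the Gaussian kernel and its width are arbitrary illustrative choices, and the adjoint identity holds up to boundary effects of the 'same'-size convolution.

% Hypothetical shift-invariant BRDF: the forward operator becomes a
% 1D convolution along the angular dimension of the light field tensor.
theta  = -7:7;                          % angular support of the blur kernel
kernel = exp(-theta.^2 / (2*3^2));      % Gaussian angular blur (illustrative)
kernel = kernel / sum(kernel);          % normalize so total energy is preserved

B    = @(X) convn(X, kernel, 'same');           % forward operator B(X)
Badj = @(Y) convn(Y, fliplr(kernel), 'same');   % adjoint B^H(Y), exact up to edges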

For a fixed imaging modality (defined by the choice of C and {F_c}_{c∈C}) and known BRDF of the reflecting surface (and hence the forward operator B), the NLOS imaging problem aims to recover the unknown light field incident on the reflecting surface, X^*, from the plenoptic measurements Y = B(X^*).

4.2.2 Notation and Preliminaries

Let X be a K-way light field tensor. We use the shorthand ω := [ω_1, ω_2, . . . , ω_K]^T to denote the arguments (or indices) of X, where we let ω_k take values from 1 to N_k, for k = 1, . . . , K. We then define the set of indices for the light field tensor as Ω := [N_1] × [N_2] × · · · × [N_K], where [N] denotes the set {1, 2, . . . , N} for any N ∈ ℕ. The values {N_k}_{k=1}^K typically depend on the imaging modality used and the desired level of discretization/resolution of the light field. The directional gradient of X at location ω is a K-dimensional vector defined by

∇X(ω) := [ ∇_1 X(ω), ∇_2 X(ω), . . . , ∇_K X(ω) ]^T,   (4.6)

where each term ∇_k X(ω) is the derivative of X along the k-th plenoptic dimension, for k = 1, . . . , K. In this work, we use a discrete approximation of the directional derivative,^3

∇_k X(ω) ≈ X(ω_1, . . . , ω_k + 1, . . . , ω_K) − X(ω_1, . . . , ω_k, . . . , ω_K).   (4.7)

Finally, we denote the adjoint (or the Hermitian conjugate) of a linear operator A(·) as A^H(·), the composition A^H(A(·)) as A^H A(·), and the Frobenius norm of a tensor by ‖·‖_F.
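The finite-difference operator (4.7) and its adjoint admit a direct MATLAB implementation; the following is a minimal illustrative sketch (our own example code, using the symmetric post-padding via padarray mentioned in the footnote), with helper names grad_k and grad_k_adj chosen here for exposition.

function D = grad_k(X, k)
% Forward difference (4.7) along dimension k, with symmetric post-padding
% so that the output has the same size as X (the last slice of D is zero).
  pad = double((1:ndims(X)) == k);            % pad by 1 along dimension k only
  Xp  = padarray(X, pad, 'symmetric', 'post');
  D   = diff(Xp, 1, k);                       % X(..,w_k+1,..) - X(..,w_k,..)
end

function Xa = grad_k_adj(D, k)
% Adjoint of grad_k: a negative backward difference. Exact on the range of
% grad_k, where the last slice of D along dimension k is zero.
  pad = double((1:ndims(D)) == k);
  Dp  = padarray(D, pad, 0, 'pre');           % prepend a zero slice
  Xa  = -diff(Dp, 1, k);                      % D(..,w_k-1,..) - D(..,w_k,..)
end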

4.3 Multi-Way Total Variation Regularization for NLOS Scene Recovery

In this section, we present our regularized linear inversion methodology to recover the unknown light field X∗ containing the NLOS scene by exploiting the structural regularity of the multi-way tensor. We propose recovering X∗ from plenoptic observations Y by solving the following optimization problem:

X̂ = arg min_X  (µ/2) ‖Y − B(X)‖_F^2 + ‖X‖_TV,   (4.8)

where ‖X‖_TV := Σ_{ω∈Ω} ‖∇X(ω)‖_2 denotes the K-way isotropic Total Variation (TV) semi-norm that promotes robustness against noise and enforces structural regularity, i.e., smooth regions across all plenoptic dimensions. Here µ is a regularization or tuning parameter, which determines the tradeoff between the data-fit (or the ℓ_2) term and the

^3We perform symmetric post-padding of the tensors using the MATLAB function padarray to accommodate edge cases in our code.

regularization (or the TV) term in the cost function. TV regularization [82] has been widely used for (2D) image restoration, since it effectively removes high (spatial) frequency noise by promoting smooth regions in the image [83-86]. In our case, if we assume that the set of camera locations C is contiguous and that the FOV points on the reflecting surface change slowly with different camera locations, then the incident light field tensor X has structural regularity across all plenoptic dimensions. In other words, the incident 2D light fields across various points on the reflecting surface are all related via parallax, which results in sparse gradients across the other plenoptic dimensions too. We propose solving (4.8) using the split Bregman method described in [13]. An outline of the split Bregman method adapted to our setting to solve (4.8) is shown in Algorithm 4.1, and the method is briefly explained in the rest of this section.
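For reference, the K-way isotropic TV semi-norm in (4.8) can be computed in a few lines of MATLAB; this is an illustrative sketch building on the grad_k helper sketched in Section 4.2.2, not a performance-tuned routine.

function tv = tv_norm(X)
% K-way isotropic TV semi-norm: sum over all indices w of ||grad X(w)||_2.
  sq = zeros(size(X));
  for k = 1:ndims(X)
    sq = sq + grad_k(X, k).^2;   % accumulate squared differences per dimension
  end
  tv = sum(sqrt(sq(:)));         % l2 norm of the gradient vector, summed over w
end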

We use Bregman splitting to decouple the TV regularization term from the ℓ_2 term in (4.8) by replacing ∇_k X with D_k, for k = 1, . . . , K. This yields the following constrained optimization problem:

min_{X, {D_k}}  ‖(D_1, . . . , D_K)‖_2 + (µ/2) ‖Y − B(X)‖_F^2,   (4.13)

s.t.  D_k = ∇_k X  for all k = 1, . . . , K,

where ‖(D_1, . . . , D_K)‖_2 = Σ_{ω∈Ω} sqrt( Σ_{k=1}^K D_k(ω)^2 ). To weakly enforce the constraints in the above formulation, we add penalty terms to the objective function in (4.13), which yields

min_{X, {D_k}}  ‖(D_1, . . . , D_K)‖_2 + (µ/2) ‖Y − B(X)‖_F^2 + (λ/2) Σ_{k=1}^K ‖D_k − ∇_k X‖_F^2.   (4.14)

This is where the split Bregman method begins to differ from continuation-based methods. While continuation methods enforce the equality constraints in (4.13) by letting λ → ∞ in (4.14), we enforce the penalty here by applying the Bregman iteration, as mentioned in [13], to get

min_{X, {D_k}}  ‖(D_1, . . . , D_K)‖_2 + (µ/2) ‖Y − B(X)‖_F^2 + (λ/2) Σ_{k=1}^K ‖D_k − ∇_k X − B_k‖_F^2,   (4.15)

where the values of the B_k terms are updated via the Bregman iteration, in which we

Algorithm 4.1 Split Bregman algorithm for multi-way (isotropic) TV regularized NLOS scene recovery (4.8)
Require: B(·), µ, λ (we typically set λ = 2µ), ε (tolerance for convergence)
Input: Observed light field data Y
Initialize: X^(0) = B^H(Y), D_1^(0) = · · · = D_K^(0) = B_1^(0) = · · · = B_K^(0) = 0, and t = 0.
1: repeat
2:    Compute:
         S^(t) = sqrt( Σ_{k=1}^K |∇_k X^(t) + B_k^(t)|^2 )   (4.9)
3:    Update X^(t+1):
         X^(t+1) = arg min_X  (µ/2) ‖Y − B(X)‖_F^2 + (λ/2) Σ_{k=1}^K ‖D_k^(t) − ∇_k X − B_k^(t)‖_F^2   (4.10)
4:    Update D_k^(t+1):
         D_k^(t+1) = max( S^(t) − 1/λ, 0 ) · ( ∇_k X^(t) + B_k^(t) ) / S^(t),   for k = 1, . . . , K   (4.11)
5:    Update B_k^(t+1):
         B_k^(t+1) = B_k^(t) + ( ∇_k X^(t+1) − D_k^(t+1) ),   for k = 1, . . . , K   (4.12)
6:    t ← t + 1
7: until ‖X^(t) − X^(t−1)‖_F / ‖X^(t)‖_F < ε
Output: Recovered light field X̂ = X^(t).

iteratively solve

( X^(t+1), {D_k^(t+1)}_{k=1}^K ) = arg min_{X, {D_k}_{k=1}^K}  ‖(D_1, . . . , D_K)‖_2 + (µ/2) ‖Y − B(X)‖_F^2 + (λ/2) Σ_{k=1}^K ‖D_k − ∇_k X − B_k^(t)‖_F^2,   (4.16)

B_k^(t+1) = B_k^(t) + ( ∇_k X^(t+1) − D_k^(t+1) ),   ∀ k = 1, . . . , K.   (4.17)

The authors of [13] suggested that, for most practical problems of interest, it suffices to perform just one pass of optimizing with respect to X, followed by optimizing with respect to the D_k's, to solve (4.16), instead of iterating the two steps to convergence.

Algorithm 4.2 Algorithm for the X^(t+1)-update step (4.10)
Require: B(·), B^H(·), µ, λ (we typically set λ = 2µ), N_GD (number of gradient descent steps), L (largest eigenvalue of B^H B(·))
Input: Y, X^(t), {D_k^(t), B_k^(t)}_{k=1}^K
Initialize: Step size η = 1 / (µL + 4Kλ)
Compute: b = µ B^H(Y) + λ Σ_{k=1}^K ∇_k^H ( D_k^(t) − B_k^(t) )
Define the linear operator: A := µ B^H B + λ Σ_{k=1}^K ∇_k^H ∇_k
Set: X^(t,0) = X^(t)
Perform:
for n = 0, 1, . . . , N_GD − 1 do
    X^(t,n+1) = max( X^(t,n) − η ( A(X^(t,n)) − b ), 0 )
end for
Output: X^(t+1) := X^(t, N_GD)

Therefore, we approximately solve the subproblem (4.10), which can be done using a few steps of (projected) gradient descent as long as we have access to the adjoint of the forward operator, B^H(·). The algorithm for the X^(t+1)-update is outlined in Algorithm 4.2. Projecting X onto the non-negative orthant in Algorithm 4.2 ensures that our light field tensors are always non-negative. For our experiments, we perform anywhere between 1 and 50 gradient descent steps, and compute L, the largest eigenvalue of B^H B(·), using the power method. Finally, we use a generalized shrinkage formula [85] for the {D_k^(t+1)}_{k=1}^K-update step, as shown in Equations (4.9) and (4.11), to solve that subproblem in closed form.

The split Bregman method has several advantages for solving TV regularized inverse problems, including fast convergence, subsequent iterates being "smooth" (or having small TV values), and, most importantly, the ability to enforce equality constraints without letting the constraint penalty factor λ → ∞. This is crucial to ensure that the subproblem (4.10) is not ill-conditioned, so that we can solve it using fast iterative methods without running into numerical instabilities. Equations (4.9), (4.11), and (4.12), and the finite difference operations ∇_k and ∇_k^H in the Bregman iterative procedure, are computed element-wise and hence admit fast distributed (GPU-accelerated) implementations. The computational bottleneck for Algorithm 4.1 is the complexity of the forward operator B (and its adjoint B^H). If the BRDF of the reflecting surface is well-approximated, for example, by convolutional operators, the algorithm can be accelerated end-to-end using GPUs to yield fast (near real-time) reconstructions.

We implemented our multi-way (isotropic) TV regularized inversion algorithm, outlined in Algorithms 4.1 and 4.2, in MATLAB. Our implementation was GPU-accelerated using MATLAB's Parallel Computing Toolbox™, and run on an NVIDIA Quadro RTX 8000 GPU.
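To summarize the pieces above in executable form, the following MATLAB sketch (our own illustrative code under simplifying assumptions, not the GPU pipeline itself) runs Algorithms 4.1 and 4.2 in the standard split Bregman ordering for the denoising setting of Section 4.6, where the forward operator is the identity so L = 1; it reuses the grad_k and grad_k_adj helpers from Section 4.2.2, and includes a small power-method routine of the kind used to estimate L for a general operator B.

function Xhat = tv_recover(Y, mu, lam, nBregman, nGD)
% Split Bregman recovery (Algorithms 4.1 and 4.2) with B = identity (L = 1).
  K = ndims(Y);
  X = max(Y, 0);                              % initialize with B^H(Y) = Y, projected
  [D, Bk] = deal(cell(K, 1));
  for k = 1:K, D{k} = zeros(size(Y)); Bk{k} = zeros(size(Y)); end
  eta = 1 / (mu*1 + 4*K*lam);                 % step size, with L = 1
  for t = 1:nBregman
    % --- X-update (4.10): N_GD projected gradient steps ---
    b = mu*Y;
    for k = 1:K, b = b + lam*grad_k_adj(D{k} - Bk{k}, k); end
    for n = 1:nGD
      AX = mu*X;
      for k = 1:K, AX = AX + lam*grad_k_adj(grad_k(X, k), k); end
      X = max(X - eta*(AX - b), 0);           % project onto X >= 0
    end
    % --- D-update (4.9), (4.11): generalized shrinkage ---
    G = cell(K, 1); S2 = zeros(size(Y));
    for k = 1:K, G{k} = grad_k(X, k) + Bk{k}; S2 = S2 + G{k}.^2; end
    S = sqrt(S2);
    shrink = max(S - 1/lam, 0) ./ max(S, eps);  % guard against division by zero
    for k = 1:K
      D{k}  = shrink .* G{k};
      Bk{k} = Bk{k} + grad_k(X, k) - D{k};    % B-update (4.12)
    end
  end
  Xhat = X;
end

function L = power_method(B, Badj, sz, nIters)
% Estimate the largest eigenvalue of B^H B(.) by power iteration.
  v = randn(sz); v = v / norm(v(:));
  for i = 1:nIters
    w = Badj(B(v)); L = norm(w(:)); v = w / L;
  end
end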

4.4 Signal-Separation to Noise Ratio: An Unsupervised Evaluation Metric

For scenarios where we do not have access to ground truth data to quantitatively evaluate the performance of the reconstruction algorithm, we propose using an alternative to the popular Signal to Noise Ratio (SNR), called the Signal-Separation to Noise Ratio (SSNR). For NLOS imaging problems, the quality of reconstruction depends on how well we can recover or "tease apart" the incident light, which is buried under a bed of ambient light from diffuse scattering. Thus, we define the Signal-Separation to Noise Ratio (SSNR) as

SSNR ≜ 20 log_10 [ (Mean Signal Level − Mean Background Level) / Mean Noise Level ] dB.   (4.18)

In general, ground truth data is required to obtain the signal, background, and the noise levels, but we overcome this by “learning” these values from the images. For a given (2D) image Z, we model the pixels Zi,j as random draws from a Gaussian Mixture Model (GMM), with unknown parameters,

Z_{i,j} ∼ p(z) := π_s · N(z | µ_s, σ_s) + π_b · N(z | µ_b, σ_b),   independently and identically for all pixels (i, j),

where µ_s and µ_b represent the mean signal level and the mean background level, respectively, σ_s^2 and σ_b^2 denote the corresponding noise variances, and the mixture weights satisfy π_s + π_b = 1. Given an image, we can learn (or estimate) these parameters, say using an iterative Expectation-Maximization (EM) algorithm,^4 and use them to compute the SSNR defined in (4.18).

For a given image Z, let µ̂_s, µ̂_b, σ̂_s, and σ̂_b denote the estimated GMM parameters. Then the SSNR for the image Z is defined as

SSNR(Z) ≜ 20 log_10 [ (µ̂_s − µ̂_b) / (0.5 (σ̂_s + σ̂_b)) ] dB.   (4.19)

For an algorithm that generates an output X̂ using observations Y, the SSNR gain is given by

∆-SSNR ≜ SSNR(X̂) − SSNR(Y).   (4.20)

Thus, we use the ∆-SSNR defined in (4.20) as a metric to evaluate the quality of our reconstructions in the absence of ground truth information for our experiments.

In the following sections, we demonstrate our multi-way TV regularization based inversion methodology for NLOS scene reconstruction (or recovery) under different imaging modalities, which are instances of the general setup outlined in Section 4.2.1, and highlight the benefits of exploiting the joint structure present in multiple dimensions of the light field.

^4We use MATLAB's built-in function fitgmdist in our implementations.
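As an illustration of (4.19), the following is a minimal MATLAB sketch (our own example code, assuming the Statistics and Machine Learning Toolbox for fitgmdist) that estimates the SSNR of a single image by fitting a two-component GMM to its pixels, taking the component with the larger mean as the signal.

function s = ssnr(Z)
% Estimate the SSNR (4.19) of a 2D image Z via a two-component GMM fit.
  gm = fitgmdist(Z(:), 2, 'RegularizationValue', 1e-6);
  [mu, idx] = sort(gm.mu, 'descend');   % larger-mean component taken as signal
  sig = sqrt(squeeze(gm.Sigma));        % component standard deviations
  sig = sig(idx);
  s = 20*log10((mu(1) - mu(2)) / (0.5*(sig(1) + sig(2))));
end

The ∆-SSNR in (4.20) is then simply ssnr(Xhat) - ssnr(Y), computed per frame and averaged over frames for video data.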


Figure 4.2: Experimental setup used for the 2D NLOS light field recovery. (a) Layout of the measurement studio. The length of the rotating arm is 30cm, the scattering surface is a brushed metal sheet coated with satin paint, and 2 LED strips placed 9cm apart are the (hidden) 1D objects of interest. A CMOS camera captures pictures of the scattering surface at multiple angles, which constitute the observed light field. (b) A picture of the measurement studio. All the apparatus is placed inside an enclosure, and the surfaces in the interior of the enclosure are covered with anti-reflection black felt to minimize ambient light and unwanted reflections. While the picture here shows only one LED strip, the actual experimental measurements involved 2 LED strips.

We begin with a simple experimental setup where the aim is to reconstruct an unknown 2D light field, comprising (self-illuminating) 1D hidden objects, from diffuse measurements. We then demonstrate the effectiveness of our regularizer in exploiting motion and temporal redundancies to image around corners in the Long-Wave InfraRed (LWIR), where surfaces behave almost mirror-like (or specular).

4.5 Recovering a 2D NLOS light field

4.5.1 Experimental Setup

This experiment involved a specially constructed enclosure that housed a motorized rotational stage with a 30cm arm attached to it. A shot-noise-limited CMOS camera (Photometrics Prime Sigma) was placed at one end of the arm, which was capable of taking pictures of the reflecting (scattering) surface from multiple angles. The motorized stage and the camera were fully computer-controlled to capture the scattering surface from multiple angles. The scatterer used was a flat brushed metal sheet painted with a satin paint, whose BRDF was measured a priori using the setup described in Section 5.A of [87]. The scene of interest comprised two dense LED strips with a width of 3mm. The layout of the experimental setup and a picture of the light field measurement studio are shown in Figure 4.2.

It is worth taking a moment to explain how the light fields are indexed. Both the unknown incident light field X and the measured light field Y are anchored to the scattering points on the reflecting surface. X is a 2D light field indexed as X(P_n, θ_in), where P_n is the scatter point on the reflecting surface, and θ_in corresponds to the incoming angle with respect to the surface normal at P_n. Similarly, Y is the 2D measured light field indexed as Y(P_n, θ_out), where θ_out corresponds to the outgoing angle with respect to the surface normal at P_n. A total of 100 scatter points, indexed from left to right as shown in Figure 4.2(a), are considered for collecting angular data (corresponding to multiple camera viewpoints), as the rotating arm sweeps uniformly over the range of angles [29.98°, 61.03°] with an angular resolution of 0.3489°, collecting 90 angular measurements. The camera FOV changes dynamically as the rotating arm moves, and hence all 100 points are not in the camera FOV at all times. While the camera takes 2D pictures at every angular position, we average the pixels (column-wise) from a horizontal strip (in the center of the camera FOV) of 50 rows to obtain a single measurement of the 1D light field (e.g., one row of Figure 4.3(a)). The overall NLOS imaging problem thus reduces to reconstructing (or recovering) a 100 × 90 unknown light field incident on the 100 scattering points, from light field measurements of the same size. Even though the scene comprises 1D objects, the light field corresponding to them is 2D, making this a 2D light field recovery problem.

4.5.2 Results and Discussion

We ran 500 iterations of the split Bregman method outlined in Algorithm 4.1 with µ = 0.005, λ = 2µ, and the number of (projected) gradient descent iterations (for Algorithm 4.2) set to N_GD = 50. The forward model was a 90 × 90 matrix, which comprised the measured BRDF modulated by the appropriate cosine factors, as outlined in (4.5). The reconstruction took approximately 50 seconds, while the data collection took close to 4.2 minutes. Figure 4.3 shows the results of applying our TV regularization method on the 2D light field data.
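For concreteness, a forward matrix of this kind could be assembled as in the following MATLAB sketch; the tabulated brdf values, the angular grid, and the incident light field xIn are hypothetical stand-ins for the measured data, and only the cosine modulation from (4.5) is taken from the text.

% Hypothetical assembly of the 90 x 90 forward matrix from a tabulated BRDF:
% brdf(i, j) ~ f(theta_out_i, theta_in_j) on a 90 x 90 angular grid.
nAngles  = 90;
theta_in = linspace(0, 60, nAngles) * pi/180;   % illustrative incidence angles (rad)
brdf     = rand(nAngles);                       % placeholder for the measured BRDF
Bmat     = brdf .* cos(theta_in);               % cosine factor per column, as in (4.5)
xIn      = rand(nAngles, 1);                    % placeholder incident 1D light field
yOut     = Bmat * xIn;                          % forward model at one scatter point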


Figure 4.3: Results for NLOS 2D light field recovery using multi-way TV regularization. (a) Observed light field, Y. (b) Reconstructed (incident) light field containing the hidden scene, X̂; ∆-SSNR = 30.3055 dB. (c)-(f) An overlay of the 1D observed and reconstructed light fields at different scatter points. Each plot corresponds to a single column of the 2D light fields shown in (a) and (b). Our regularized inversion algorithm produces sharp reconstructions of the 1D objects from diffuse or blurry measurements.

We can see from Figure 4.3(a) that the observed light field is "blurry" due to the diffuse scattering of the reflecting surface. However, using the BRDF information, it is possible to reconstruct the NLOS scene using our algorithm; the ∆-SSNR for the reconstruction is 30.3055 dB. Figure 4.3(b) shows that the recovered signal has 2 sharp peaks corresponding to the 2 LED strips, and that the background signal is significantly suppressed. We can see that the (normalized) peaks of the observations are not equal even though the LED strips had the same brightness. This is due to the "cosine falloff" (or Lambert's law). However, using the BRDF information, which accounts for this effect, we get reconstructions where the peak values are nearly identical, as expected (see Figures 4.3(c) to 4.3(f)). It is worth mentioning that the reconstructed light field can further be used to identify the 2D locations of the hidden objects, simply by triangulating the θ_in's corresponding to the "peak" values of the light field across all scatter points.

4.6 NLOS Scene Recovery using Thermal Imaging

In this section, we demonstrate NLOS scene recovery at infrared (or LWIR) wavelengths by applying our multi-way TV reconstruction approach. By jointly exploiting the structure of the hidden scenes across spatial and temporal dimensions, we perform near real-time thermal NLOS imaging. We present results both for 2D-TV regularized reconstruction, where we apply 2D-TV regularization frame-by-frame, and for 3D-TV regularized reconstruction, where we jointly enforce the TV penalty uniformly across spatial and temporal dimensions.

4.6.1 Experimental Setup

The setup consists of an L-shaped hallway, as shown in Figure 4.4. An infrared (IR) camera (optris® PI 400) is mounted on an arm of length 1m, which is attached to a motorized stage. Both the motorized stage and the IR camera are fully computer-controlled. The IR camera has a (horizontal) FOV of 29° and a thermal sensitivity of 40mK, and captures images of size 288 × 382 at 10 frames/second. The ambient temperature of the room was around 22°C. The thermal camera was pointed at a flat masonite board, which is the scattering surface, at a viewing angle of roughly 37°. The hidden scene comprised a moving human subject wearing a t-shirt and full-length trousers. The distance of the subject from the wall, denoted in Figure 4.4(a) as


Figure 4.4: Experimental setup used for thermal NLOS imaging. (a) Layout of the L-shaped hallway scene used. A LWIR camera mounted on a 1m-long computer-controlled arm captures thermal images of a flat piece of masonite at 10 frames/second. The hidden scene comprises a moving human subject, who is located at a distance d_H from the wall. (b) A picture of the experimental setup. The reflecting surface used in our experiments (masonite) is different from the one pictured here (black masonite).

d_H, was 1.8m and 2.7m for different datasets. The measured and reconstructed light fields are videos (3D tensors), where we aim to exploit the spatio-temporal correlations in the light field for NLOS scene recovery.

Based on BRDF measurements of the masonite scatterer in IR (using a setup similar to the one used in the visible experiments), it was observed that the masonite scatterer has a sharp specular peak and behaves mirror-like. However, the surface albedo is small, resulting in a weak signal from the NLOS scene. Thus, the forward model in this case can be written as Y = ρ · M(X) + η, where M(·) denotes the forward operator of a perfect mirror, ρ is the (unknown) albedo of the scattering surface, and η is measurement noise from the camera. Instead of estimating X directly, we use our multi-way TV regularization to denoise the observations, computationally converting the scattering surface into a mirror (up to a scaling factor). Since recovering the mirror image is equivalent to recovering the NLOS scene (modulo geometric transformations),^5 we refer to our recovery algorithm as "NLOS imaging" (or NLOS scene recovery), without loss of generality.

^5X and M(X) are identical up to permutations. A mirror simply reflects all incident light rays back into the scene in a different direction, following the law of reflection: "angle of incidence = angle of reflection."

4.6.2 Results and Discussion

For a fast implementation, we chop the video into blocks (or batches) of 20 frames and apply the multi-way TV regularization algorithm to 3D tensors of size 288 × 382 × 20. It was empirically observed that increasing the block size beyond 20 resulted in negligible improvement in reconstruction quality, which is expected, as the temporal correlation between frames that are far apart is expected to be small. In order to exploit the temporal correlation between the frames in different batches, we divide the video into overlapping batches with a 5-frame overlap (in other words, frames 1 to 20 form batch 1, frames 15 to 35 form batch 2, and so on). Since we are interested in denoising in this case, the forward operator (and its adjoint) were set to the identity.

For both the 2D-TV and the 3D-TV methods, we set^6 µ = 0.8 for d_H = 1.8m and µ = 0.3 for d_H = 2.7m, λ = 2µ, and the number of (projected) gradient descent iterations (for Algorithm 4.2) to N_GD = 1. The number of Bregman iterations (outer loop) was set to 170. Unsurprisingly, the 2D-TV method is approximately 40% faster than its 3D counterpart for the same number of iterations. Thus, in order to have a fair(er) comparison between the two methods, we also provide reconstruction results using 125 iterations of the 3D-TV method, which takes approximately the same computational time as the 2D-TV method with 170 iterations (see Table 4.1).

Figures 4.5 and 4.6 show a few representative frames from the videos corresponding to the human subject at d_H = 1.8m (approximately 6ft) and d_H = 2.7m (approximately 9ft), respectively. It can be seen that, even though the observations are extremely noisy, both the 2D-TV and the 3D-TV reconstructions are able to image the human subject well. We further observe that exploiting the temporal correlations using 3D-TV helps reduce or "smoothen out" the background noise better than the 2D-TV method. This improvement is much more evident from the reconstructed videos (see supplementary material), where we can see "flickering" between image frames in the 2D-TV reconstruction videos that is not present in the corresponding 3D-TV reconstructions. A scenario where the 2D-TV reconstruction would outperform the 3D-TV method is when there is rapid motion in the hidden scene. If the observed plenoptic video frames change "too fast," then the 3D-TV method tends to introduce "motion blurs" that blur out the edges of moving objects, which is undesirable. Since the 2D-TV method does not use the temporal information, this effect is not seen in its reconstructions.

^6It was empirically seen that the same values of µ worked well for both the 2D-TV and 3D-TV methods.

Columns (left to right): Observations | 2D-TV (170 iters) | 3D-TV (170 iters) | 3D-TV (125 iters)

Figure 4.5: Observed (noisy) LWIR images and corresponding reconstructions (of the "mirror-reflections") for d_H = 1.8m. 3D-TV reconstructions are "smoother" than their 2D counterparts since they exploit the temporal structure. Reconstruction speeds (throughputs) of the 2D-TV method (column 2) and the 3D-TV method with 125 iterations (column 4) are approximately the same. Full videos are available in the supplementary material.

Columns (left to right): Observations | 2D-TV (170 iters) | 3D-TV (170 iters) | 3D-TV (125 iters)

Figure 4.6: Observed (noisy) LWIR images and corresponding reconstructions (of the "mirror-reflections") for d_H = 2.7m. The signal level in the observations is significantly smaller compared to Figure 4.5, since the person is farther away from the wall. 3D-TV reconstructions are "smoother" than their 2D counterparts since they exploit the temporal structure. Reconstruction speeds (throughputs) of the 2D-TV method (column 2) and the 3D-TV method with 125 iterations (column 4) are approximately the same. Full videos are available in the supplementary material.

Method               Throughput (in fps)
2D-TV (170 iters)    13.07
3D-TV (170 iters)     9.30
3D-TV (125 iters)    12.74

Table 4.1: Average throughput of reconstruction methods in frames/sec (fps). Image acquisition rate was 10 fps. Methods (or settings) with throughputs > 10 fps show potential for real-time imaging and are highlighted in bold.

Figure 4.7: Masonite scattering surface as seen by a visible camera.

Distance of human subject     2D-TV              3D-TV              3D-TV
from wall, d_H (in m)         (170 iterations)   (170 iterations)   (125 iterations)
1.8                           7.135              7.330              7.183
2.7                           6.907              6.867              6.609

Table 4.2: Average ∆-SSNR (over all video frames) for thermal NLOS recovery (in dB). For each video, the method with the highest average ∆-SSNR is highlighted in bold.

Table 4.2 provides a quantitative comparison of the reconstruction methods used here, using the ∆-SSNR metric proposed in Section 4.4. The average ∆-SSNR is higher for the case when the subject was closer to the wall, as expected. For the d_H = 1.8m case, 3D-TV with 170 iterations outperforms (according to the ∆-SSNR metric) the other methods, and for the d_H = 2.7m case, 2D-TV with 170 iterations (marginally) outperforms the 3D-TV methods.

Furthermore, it is interesting to compare the LWIR images (and especially the reconstructions) in Figures 4.5 and 4.6 with Figure 4.7, which shows a single snapshot of the masonite scattering surface used in our experiments, at visible wavelengths (taken by a smartphone, with a human standing less than 30cm away from the surface). While there is practically no discernible information in the visible spectrum, even the noisy measurements from the IR camera convey a lot of information about the hidden scene. This highlights the potential benefits of using IR imaging over visible techniques for NLOS scene recovery.

4.7 Conclusion

In this chapter, we presented an algorithmic method for NLOS imaging that utilizes the structural redundancies of the information-rich plenoptic functions (or light fields). Our NLOS reconstruction methodology aims to reconstruct hidden scenes by "inverting" the known linear light transport operator (of a scattering surface) using a suitable prior (or regularization) on the unknown light field. Specifically, we propose using a multi-way TV regularized inversion methodology for recovering NLOS scenes from noisy plenoptic data. Based on the popular split Bregman method [13], we provide an algorithm for solving the regularized inverse problem for general K-dimensional NLOS light field recovery. We demonstrate the utility of our regularizer on real plenoptic data (in the visible spectrum) of simple 2D scenes consisting of 1D objects.

We also demonstrate the efficacy of our regularized inversion algorithm on thermal (LWIR) images, which have recently been proposed as a viable alternative to visible NLOS imaging due to the specular (mirror-like) BRDF of common materials at LWIR wavelengths [14]. Experimental results on real data show, both qualitatively and quantitatively (using the proposed ∆-SSNR metric), the ability of our method to exploit spatio-temporal correlations in the thermal videos, enabling us to reliably image human subjects who are several feet away from the scattering surface. Based on the technique of chopping up the video into (overlapping) mini-batches of small tensors, outlined in Section 4.6.2, we implemented a near real-time thermal NLOS imaging pipeline in MATLAB. Our pipeline does the following (a sketch of the mini-batching step follows the list):

• receive an online stream of thermal image frames,

• convert them into (overlapping) mini-batches of size 20,

• denoise the batches (using our 3D-TV regularization method), and

• display a slightly-time-delayed version of the observed and reconstructed thermal videos.
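A minimal MATLAB sketch of the mini-batching step (our own illustrative code with hypothetical variable names, reusing the tv_recover sketch from Section 4.3; it processes a stored video rather than an online stream):

% Synthetic stand-in for a recorded H x W x T thermal video and its parameters.
V  = rand(288, 382, 100);                      % placeholder video
mu = 0.8;                                      % e.g., the d_H = 1.8m setting
batchLen = 20; overlap = 5; stride = batchLen - overlap;
nFrames  = size(V, 3);
for s = 1:stride:max(nFrames - batchLen + 1, 1)
  idx   = s : min(s + batchLen - 1, nFrames);
  batch = V(:, :, idx);                        % one overlapping mini-batch
  Xb    = tv_recover(batch, mu, 2*mu, 170, 1); % 3D-TV denoising of the batch
  % ... display/stitch Xb, e.g., keeping only its non-overlapping frames ...
end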

Using a GPU-accelerated implementation of our multi-way TV reconstruction algorithm, we demonstrated thermal NLOS imaging in near real-time, with a latency of about 5 seconds. We believe that this latency can be further reduced by an end-to-end optimized pipeline implemented in a low-latency programming language like C++. The results shown here indicate potential for deploying such imaging methods in practice under real-time operating constraints. We envision that, by suitably combining information from multiple imaging modalities, it would be possible to further push the limits of passive indirect imaging systems. For example, we could consider using light field sensor arrays co-located with an IR camera, to jointly exploit the structure of the plenoptic function across spatial, angular, time, and wavelength dimensions simultaneously. We defer such investigations to future work.

4.8 Acknowledgment

We would like to thank Prof. James Leger and his students, Connor Hashemi and Dr. Takahiro Sasaki, for their help with the experimental setup used for the LWIR experiments, and for providing the BRDF data. Connor Hashemi helped with the setup of the LWIR experiments, data collection, and the implementation of the online batch-processing pipeline in Section 4.6. Dr. Sasaki provided the BRDF measurements for the satin paint-coated scattering surface, and the measured light field data for the experiments in Section 4.5.

Chapter 5

Directions for Future Work

In Chapter 2, we obtain lower bounds for problems where the parameter of interest is structured, using the concept of "packing sets" to measure the complexity of the parameter class. In particular, we outline a method for constructing packing sets for matrices that can be expressed as a product of two structured factor matrices. However, it is possible to utilize similar analytical techniques to establish lower bounds for multi-dimensional tensor completion problems, where the tensor of interest can be expressed as a product of structured factors, e.g., using a PARAFAC model or a Tucker decomposition. The aim here would be to complement the upper bounds derived in [88]. We defer investigations along these lines to future work.

The work on parameter estimation lower bounds for plenoptic imaging problems uses rendering software as a "black box" to synthesize the forward model and numerically evaluate the HCR-LB. In this work, we maximized the HCR functional by exhaustive search in our example problems, but this would scale poorly as the size of the parameter class and/or the dimensionality of the parameter increases. An alternative to exhaustive search would be to use derivative-free (or zeroth order) optimization (DFO) algorithms [89] to maximize the HCR functional, e.g., Equations (3.6) and (3.7), without using gradient information. An important consideration for this line of work would be to develop optimization algorithms that can handle rendering inconsistencies, and also to analyze the effects of rendering error on the convergence of such algorithms. A principled alternative to the DFO approach would be to use recently developed differentiable renderers like [53, 78, 90] to render gradients of the plenoptic observations with respect to scene parameters. The rendered gradients can either be used to directly

compute the CR-LB, or to maximize the HCR functional to obtain the HCR-LB. Rendered gradients, just like rendered images, are Monte Carlo estimates and hence are erroneous. It would be a topic of significant interest to extend our rendering error analysis and study the effects of using noisy rendered gradients for the HCR-LB computation and Maximum Likelihood Estimation problems.

It is worth pointing out that the HCR-LB derived here applies only to unbiased estimators, which are useful when we have many more observations than unknowns, i.e., when the number of observations significantly exceeds the dimensionality of the parameter. However, it is well-known that adding (a suitable) bias in the form of regularization (or a prior) to the estimator, as we do in Chapter 4, can significantly reduce the MSE [79]. Extending our framework to accommodate potentially biased estimators (such as those that arise from many regularized or constrained optimization problems) would be an exciting line of work with significant applicability to high-dimensional scene parameter inference.

The work on Non-Line-of-Sight (NLOS) scene recovery in Chapter 4 provides a regularized linear inversion methodology, which leverages the high structural regularity of the plenoptic function (or the light field) using a multi-way Total Variation (TV) regularizer. We also outline an algorithm based on the split Bregman method [13] to solve our regularized inverse problem. Experimental results show the efficacy of our approach, especially in the Long-Wave Infrared (LWIR), where we are able to reliably image human subjects from highly noisy LWIR observations, almost in real-time. Our recovery algorithm requires a priori knowledge of the forward operator (a.k.a. the light transport operator), and hence the Bi-directional Reflectance Distribution Function (BRDF) of the scattering surface, which might be hard to obtain in many scenarios. However, we can treat the light transport operator as an unknown factor and try to recover both the hidden scene and the light transport operator using a "blind inverse imaging" approach. This was recently studied in [91], where the authors proposed jointly learning the hidden scene and the light transport operator using deep convolutional neural networks. An alternative approach would be to parametrize the BRDF using a physically realistic analytical model like [92, 93], and use differentiable renderers to learn the BRDF parameters and the hidden scene jointly. This would be an interesting direction for future work.

References

[1] W. Jakob, “Mitsuba renderer,” 2010, http://www.mitsuba-renderer.org.

[2] S. Foucart and H. Rauhut, “A mathematical introduction to compressive sensing,” in A mathematical introduction to compressive sensing, pp. 1–39. Springer, 2013.

[3] A. V. Sambasivan and J. D. Haupt, "Minimax lower bounds for noisy matrix completion under sparse factor models," IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3274–3285, 2018.

[4] A. Soni, S. Jain, J. Haupt, and S. Gonella, “Noisy matrix completion under sparse factor models,” IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 3636–3661, 2016.

[5] E. H. Adelson, J. R. Bergen, et al., “The plenoptic function and the elements of early vision,” 1991.

[6] A. V. Sambasivan, R. G. Paxman, and J. D. Haupt, "Computer graphics meets estimation theory: Parameter estimation lower bounds for plenoptic imaging systems," in 2019 53rd Asilomar Conference on Signals, Systems, and Computers. IEEE, 2019, pp. 1021–1025.

[7] K. L. Bouman, V. Ye, A. B. Yedidia, F. Durand, G. W. Wornell, A. Torralba, and W. T. Freeman, "Turning corners into cameras: Principles and methods," in International Conference on Computer Vision, 2017, vol. 1, p. 8.

[8] M. Baradad, V. Ye, A. B. Yedidia, F. Durand, W. T. Freeman, G. W. Wornell, and A. Torralba, “Inferring light fields from shadows,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6267–6275.

[9] C. Saunders, J. Murray-Bruce, and V. K. Goyal, "Computational periscopy with an ordinary digital camera," Nature, vol. 565, no. 7740, pp. 472–475, 2019.

[10] A. B. Yedidia, M. Baradad, C. Thrampoulidis, W. T. Freeman, and G. W. Wornell, “Using unknown occluders to recover hidden scenes,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 12231–12239.

[11] S. W. Seidel, Y. Ma, J. Murray-Bruce, C. Saunders, W. T. Freeman, C. C. Yu, and V. K. Goyal, "Corner occluder computational periscopy: Estimating a hidden scene from a single photograph," in 2019 IEEE International Conference on Computational Photography (ICCP). IEEE, 2019, pp. 1–9.

[12] J. T. Kajiya, “The rendering equation,” in ACM Siggraph Computer Graphics. ACM, 1986, vol. 20, pp. 143–150.

[13] T. Goldstein and S. Osher, “The split Bregman method for L1-regularized prob- lems,” SIAM journal on Imaging Sciences, vol. 2, no. 2, pp. 323–343, 2009.

[14] T. Maeda, Y. Wang, R. Raskar, and A. Kadambi, “Thermal non-line-of-sight imaging,” in 2019 IEEE International Conference on Computational Photography (ICCP). IEEE, 2019, pp. 1–11.

[15] E. J. Candès and B. Recht, "Exact matrix completion via convex optimization," Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717–772, 2009.

[16] E. J. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2053–2080, 2010.

[17] R. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE Transactions on Information Theory, vol. 56, no. 6, pp. 2980–2998, 2010.

[18] B. Recht, “A simpler approach to matrix completion,” The Journal of Machine Learning Research, vol. 12, pp. 3413–3430, 2011.

[19] D. Gross, “Recovering low-rank matrices from few coefficients in any basis,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1548–1566, 2011.

[20] R. Keshavan, A. Montanari, and S. Oh, "Matrix completion from noisy entries," in Advances in Neural Information Processing Systems, 2009, pp. 952–960.

[21] E. J. Candès and Y. Plan, "Matrix completion with noise," Proceedings of the IEEE, vol. 98, no. 6, pp. 925–936, 2010.

[22] V. Koltchinskii, K. Lounici, and A. B. Tsybakov, “Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion,” The Annals of Statistics, vol. 39, no. 5, pp. 2302–2329, 2011.

[23] A. Rohde and A. B. Tsybakov, "Estimation of high-dimensional low-rank matrices," The Annals of Statistics, vol. 39, no. 2, pp. 887–930, 2011.

[24] T. T. Cai and W. Zhou, "Matrix completion via max-norm constrained optimization," Electronic Journal of Statistics, vol. 10, no. 1, pp. 1493–1525, 2016.

[25] O. Klopp, “Noisy low-rank matrix completion with general sampling distribution,” Bernoulli, vol. 20, no. 1, pp. 282–303, 2014.

[26] J. Lafond, “Low rank matrix completion with exponential family noise.,” in COLT, 2015, pp. 1224–1243.

[27] M. A. Davenport, Y. Plan, E. van den Berg, and M. Wootters, “1-bit matrix completion,” Information and Inference, vol. 3, no. 3, pp. 189–223, 2014.

[28] Y. Plan, R. Vershynin, and E. Yudovina, "High-dimensional estimation with geometric constraints," Information and Inference, p. iaw015, 2016.

[29] A. Soni and J. Haupt, “Estimation error guarantees for Poisson denoising with sparse and structured dictionary models,” in IEEE International Symposium on Information Theory. IEEE, 2014, pp. 2002–2006.

[30] E. J. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?," Journal of the ACM (JACM), vol. 58, no. 3, pp. 11, 2011.

[31] D. Hsu, S. M. Kakade, and T. Zhang, "Robust matrix decomposition with sparse corruptions," IEEE Transactions on Information Theory, vol. 57, no. 11, pp. 7221–7234, 2011.

[32] H. Xu, C. Caramanis, and S. Sanghavi, "Robust PCA via outlier pursuit," IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 3047–3064, 2012.

[33] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis, "Low-rank matrix recovery from errors and erasures," IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4324–4337, 2013.

[34] A. B. Tsybakov, Introduction to nonparametric estimation, Springer, 2008.

[35] O. Klopp, K. Lounici, and A. B. Tsybakov, "Robust matrix completion," Probability Theory and Related Fields, pp. 1–42, 2014.

[36] J. Haupt, N. Sidiropoulos, and G. Giannakis, “Sparse dictionary learning from 1-bit data,” in IEEE International. Conference on Acoustics, Speech and Signal Processing. IEEE, 2014, pp. 7664–7668.

[37] A. Ribeiro and G. B. Giannakis, “Bandwidth-constrained distributed estimation for wireless sensor networks-part i: Gaussian case,” IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 1131–1143, 2006.

[38] Z. Q. Luo, “Universal decentralized estimation in a bandwidth constrained sensor network,” IEEE Transactions on Information Theory, vol. 51, no. 6, pp. 2210–2219, 2005.

[39] J. Lafond, O. Klopp, E. Moulines, and J. Salmon, “Probabilistic low-rank matrix completion on finite alphabets,” in Advances in Neural Information Processing Systems, 2014, pp. 1727–1735.

[40] M. Raginsky, R. M. Willett, Z. T. Harmany, and R. F. Marcia, “Compressed sensing performance bounds under Poisson noise,” IEEE Transactions on Signal Processing, vol. 58, no. 8, pp. 3990–4002, 2010.

[41] E. D. Kolaczyk and R. D. Nowak, “Multiscale likelihood analysis and complexity penalized estimation,” Annals of Statistics, pp. 500–527, 2004.

[42] X. Jiang, G. Raskutti, and R. Willett, “Minimax optimal rates for Poisson inverse problems with physical constraints,” IEEE Transactions on Information Theory, vol. 61, no. 8, pp. 4458–4474, 2015.

[43] T. G. Kolda and B. W. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.

[44] E. H. Adelson and J. Y. A. Wang, "Single lens stereo with a plenoptic camera," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 2, pp. 99–106, 1992.

[45] M. Levoy and P. Hanrahan, “Light field rendering,” in Proc. of the 23rd Annual Conference on Computer graphics and interactive techniques. ACM, 1996, pp. 31– 42.

[46] R. Prevedel, Y. G. Yoon, M. Hoffmann, N. Pak, G. Wetzstein, S. Kato, T. Schrödel, R. Raskar, M. Zimmer, E. S. Boyden, et al., "Simultaneous whole-animal 3D imaging of neuronal activity using light-field microscopy," Nature Methods, vol. 11, no. 7, pp. 727–730, 2014.

[47] M. Levoy, R. Ng, A. Adams, M. Footer, and M. Horowitz, “Light field microscopy,” in ACM SIGGRAPH 2006 Papers, pp. 924–934. 2006.

[48] T. Sasaki and J. R. Leger, “Light-field reconstruction from scattered light using plenoptic data,” in Unconventional and Indirect Imaging, Image Reconstruction, and Wavefront Sensing 2018. International Society for Optics and Photonics, 2018, vol. 10772, p. 1077203.

[49] H. Cramer, Mathematical Methods of Statistics, Princeton, NJ: Princeton Univ. Press, 1946.

[50] C. R. Rao, "Minimum variance and the estimation of several parameters," in Mathematical Proceedings of the Cambridge Philosophical Society. Cambridge University Press, 1947, vol. 43, pp. 280–283.

[51] J. M. Hammersley, “On estimating restricted parameters,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 12, no. 2, pp. 192–240, 1950.

[52] D. G. Chapman and H. Robbins, “Minimum variance estimation without regularity assumptions,” The Annals of Mathematical Statistics, pp. 581–586, 1951.

[53] T. M. Li, M. Aittala, F. Durand, and J. Lehtinen, "Differentiable Monte Carlo ray tracing through edge sampling," in SIGGRAPH Asia 2018 Technical Papers. ACM, 2018, p. 222.

[54] A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar, "Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging," Nature Communications, vol. 3, no. 1, pp. 1–8, 2012.

[55] A. Velten, D. Wu, A. Jarabo, B. Masia, C. Barsi, C. Joshi, E. Lawson, M. Bawendi, D. Gutierrez, and R. Raskar, “Femto-photography: capturing and visualizing the propagation of light,” ACM Transactions on Graphics (ToG), vol. 32, no. 4, pp. 1–8, 2013.

[56] V. Arellano, D. Gutierrez, and A. Jarabo, “Fast back-projection for non-line of sight reconstruction,” Optics express, vol. 25, no. 10, pp. 11574–11583, 2017.

[57] C. Thrampoulidis, G. Shulkind, F. Xu, W. T. Freeman, J. H. Shapiro, A. Torralba, F. N. C. Wong, and G. W. Wornell, "Exploiting occlusion in non-line-of-sight active imaging," IEEE Transactions on Computational Imaging, vol. 4, no. 3, pp. 419–431, 2018.

[58] F. Heide, M. O'Toole, K. Zang, D. B. Lindell, S. Diamond, and G. Wetzstein, "Non-line-of-sight imaging with partial occluders and surface normals," ACM Transactions on Graphics (ToG), vol. 38, no. 3, pp. 1–10, 2019.

[59] A. Torralba and W. T. Freeman, "Accidental pinhole and pinspeck cameras: Revealing the scene outside the picture," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 374–381.

[60] O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nature photonics, vol. 8, no. 10, pp. 784–790, 2014.

[61] B. M. Smith, M. O’Toole, and M. Gupta, “Tracking multiple objects outside the line of sight using speckle imaging,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6258–6266.

[62] M. Batarseh, S. Sukhov, Z. Shen, H. Gemar, R. Rezvani, and A. Dogariu, "Passive sensing around the corner using spatial coherence," Nature Communications, vol. 9, no. 1, pp. 1–6, 2018.

[63] A. Viswanath, P. Rangarajan, D. MacFarlane, and M. P. Christensen, "Indirect imaging using correlography," in Computational Optical Sensing and Imaging. Optical Society of America, 2018, pp. CM2E–3.

[64] T. Maeda, G. Satat, T. Swedish, L. Sinha, and R. Raskar, “Recent advances in imaging around corners,” arXiv preprint arXiv:1910.05613, 2019.

[65] J. Murray-Bruce, C. Saunders, and V. K. Goyal, “Occlusion-based computational periscopy with consumer cameras,” in Wavelets and Sparsity XVIII. International Society for Optics and Photonics, 2019, vol. 11138, p. 111380X.

[66] S. W. Seidel, J. Murray-Bruce, Y. Ma, C. Yu, W. T. Freeman, and V. K. Goyal, “Two-dimensional non-line-of-sight scene estimation from a single edge occluder,” arXiv preprint arXiv:2006.09241, 2020.

[67] S. K. Chow and P. Schultheiss, “Delay estimation using narrow-band processes,” IEEE Conference on Acoustics, Speech, and Signal Processing, vol. 29, no. 3, pp. 478–484, 1981.

[68] J. Tabrikian and J. L. Krolik, “Barankin bounds for source localization in an uncertain ocean environment,” IEEE Transactions on Signal Processing, vol. 47, no. 11, pp. 2917–2927, 1999.

[69] R. McAulay and E. Hofstetter, “Barankin bounds on parameter estimation,” IEEE Transactions on Information Theory, vol. 17, no. 6, pp. 669–676, 1971.

[70] L. Knockaert, "The Barankin bound and threshold behavior in frequency estimation," IEEE Transactions on Signal Processing, vol. 45, no. 9, pp. 2398–2401, 1997.

[71] J. D. Gorman and A. O. Hero, "Lower bounds for parametric estimation with constraints," IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1285–1301, Nov 1990.

[72] C. R. Rao, "Information and the accuracy attainable in the estimation of statistical parameters," in Bulletin of the Calcutta Mathematical Society, 1945, vol. 37, pp. 81–91.

[73] J. Arvo, K. Torrance, and B. Smits, "A framework for the analysis of error in global illumination algorithms," in Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, 1994, pp. 75–84.

[74] A. Celarek, W. Jakob, M. Wimmer, and J. Lehtinen, “Quantifying the error of light transport algorithms,” in Computer Graphics Forum. Wiley Online Library, 2019, vol. 38, pp. 111–121.

[75] M. McGuire, “Computer graphics archive,” July 2017, https://casual-effects.com/data.

[76] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., "PyTorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.

[77] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," in ICLR: International Conference on Learning Representations, 2015.

[78] M. Nimier-David, D. Vicini, T. Zeltner, and W. Jakob, “Mitsuba 2: A retargetable forward and inverse renderer,” ACM Transactions on Graphics (TOG), vol. 38, no. 6, pp. 1–17, 2019.

[79] Y. C. Eldar, Rethinking Biased Estimation: Improving Maximum Likelihood and the Cramér-Rao Bound, Now Publishers Inc, 2008.

[80] A. Kirmani, T. Hutchison, J. Davis, and R. Raskar, “Looking around the corner using transient imaging,” in 2009 IEEE 12th International Conference on Computer Vision. IEEE, pp. 159–166.

[81] O. Gupta, T. Willwacher, A. Velten, A. Veeraraghavan, and R. Raskar, “Reconstruction of hidden 3d shapes using diffuse reflections,” Optics express, vol. 20, no. 17, pp. 19096–19108, 2012.

[82] L. I. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259–268, 1992.

[83] A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Transactions on Image Processing, vol. 18, no. 11, pp. 2419–2434, 2009.

[84] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.

[85] Y. Wang, W. Yin, and Y. Zhang, "A fast algorithm for image deblurring with total variation regularization," CAAM Technical Report TR-7-10, Rice University, Houston, 2007.

[86] C. R. Vogel and M. E. Oman, “Fast, robust total variation-based reconstruction of noisy, blurred images,” IEEE transactions on Image Processing, vol. 7, no. 6, pp. 813–824, 1998.

[87] T. Sasaki and J. R. Leger, "Light field reconstruction from scattered light using plenoptic data," JOSA A, vol. 37, no. 4, pp. 653–670, 2020.

[88] S. Jain, A. Gutierrez, and J. Haupt, “Noisy tensor completion for tensors with a sparse canonical polyadic factor,” in 2017 IEEE International Symposium on Information Theory (ISIT). IEEE, 2017, pp. 2153–2157.

[89] A. R. Conn, K. Scheinberg, and L. N. Vicente, Introduction to derivative-free optimization, SIAM, 2009.

[90] M. M. Loper and M. J. Black, "OpenDR: An approximate differentiable renderer," in European Conference on Computer Vision. Springer, 2014, pp. 154–169.

[91] M. Aittala, P. Sharma, L. Murmann, A. Yedidia, G. Wornell, B. Freeman, and F. Durand, “Computational mirrors: Blind inverse light transport by deep matrix factorization,” in Advances in Neural Information Processing Systems, 2019, pp. 14311–14321.

[92] R. L. Cook and K. E. Torrance, “A reflectance model for computer graphics,” ACM Transactions on Graphics (ToG), vol. 1, no. 1, pp. 7–24, 1982.

[93] B. Walter, S. R. Marschner, H. Li, and K. E. Torrance, "Microfacet models for refraction through rough surfaces," in Proceedings of the 18th Eurographics Conference on Rendering Techniques, 2007, pp. 195–206.

List of Acronyms

Table 5.1: Acronyms used (in alphabetical order)

Acronym          Meaning
AWGN             Additive White Gaussian Noise
BRDF             Bi-directional Reflectance Distribution Function
CR-LB            Cramér-Rao Lower Bound
DFO              Derivative-Free Optimization
EM algorithm     Expectation-Maximization algorithm
FD               Finite Differences method
FD-FI            Fisher Information computed using Finite Differences
FOV              Field of View
HCR-LB           Hammersley-Chapman-Robbins Lower Bound
i.i.d            independent and identically distributed
KL divergence    Kullback-Leibler divergence
(LW)IR           (Long-Wave) Infrared
MLE              Maximum Likelihood Estimation (or Estimate)
MSE              Mean Squared Error
NLOS             Non-Line-of-Sight
SNR              Signal to Noise Ratio
SPAD             Single-Photon Avalanche Diode
SSNR             Signal-Separation to Noise Ratio
TCSPC            Time-Correlated Single Photon Counting
TV               Total Variation