New Information-Theoretic Analyses and Algorithmic Methods for Parameter Estimation in Structured Data Settings and Plenoptic Imaging Models
A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY
Abhinav Viswanathan Sambasivan
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy
Prof. Jarvis Haupt
August, 2020

© Abhinav Viswanathan Sambasivan 2020
ALL RIGHTS RESERVED

Acknowledgements
I believe that no singular accomplishment is truly “singular,” and in that spirit, I would like to express my heartfelt gratitude to the plurality of factors that have led me to achieve this professional and personal milestone. First and foremost, I would like to begin by thanking my parents, who have always put my needs and joy over theirs, and strived to raise me as a good human being. I would like to thank my mother, Sujatha Viswanathan, for showing me what unconditional love is and for emphasizing the value of a good education over all else, and my father, Viswanathan Sambasivan, for teaching me the importance of having faith and for always keeping my spirits high. I would like to thank my grandmother, Smt. Visalakshi Sambasivan, and my aunts, uncles, and in-laws, who have constantly shown their support and instilled important family values in me. I would also like to thank my cousins, Aravind, Alamelu, Uma, and Krishnan, and their spouses, for holding me in a special place. I am deeply indebted to my Ph.D. advisor, Prof. Jarvis Haupt, for his unwavering support and for believing in me throughout my graduate study. He has always been approachable, counseled me when I needed his support, and shared his wisdom with me at every turn. My interactions with him over the years have greatly shaped my thought process and research outlook. I hope to carry these values and ethics with me throughout my professional life. I would like to express my gratitude to the Defense Advanced Research Projects Agency (DARPA) for their financial support throughout my graduate study. I would also like to thank all my collaborators and Principal Investigators who worked with me on the DARPA REVEAL project. I would especially like to thank Prof. James Leger and Prof. Joey Talghader for many productive research discussions. I also thank Di Lin, Takahiro Sasaki, and Connor Hashemi, who have helped me with the experimental setup and
data collection for this project. I am privileged to have had the opportunity to collaborate with Dr. Richard Paxman, from Maxar Technologies. I have learnt a lot about how to pursue a research problem from him. I am also very thankful to Prof. Gary Meyer and his students, Michael Tetzlaff and Michael Ludwig, for discussions on topics in computer graphics and ray-tracing, which was a key component of this thesis. I would like to thank Prof. Nikos Sidiropoulos, Prof. Soheil Mohajer, Prof. Arindam Banerjee, and Prof. Tom Luo, whose graduate classes were pivotal for me to gain a strong understanding of the fundamental concepts in Electrical Engineering and Computer Science. I would also like to extend a special thanks to Prof. Anand Gopinath for his support and guidance during the early stages of my graduate study. I would like to thank Prof. Georgios Giannakis, Prof. Soheil Mohajer, and Prof. Nikos Papanikolopoulos for serving on my Ph.D. committee and providing useful comments that helped improve my thesis. I would also like to thank my labmates, Swayambhoo Jain, Mojtaba Elyaderani, Xingguo Li, Sirisha Rambhatla, Jineng Ren, Di Xiao, Alex Gutierrez, Gamini Udawat, and Akshay Kumar, for the numerous exciting and enlightening discussions that have helped me overcome hurdles in my research path. I am especially thankful to Swayambhoo Jain for his mentorship during my internship at Technicolor AI Labs. I have been extremely fortunate to have always been surrounded by a great set of friends. I am thankful to Ashwin Varadarajan, who has been one of my closest friends for the better part of my life. I would also like to thank Rohit Sridhar, Venkat Ram Subramanian, and Ramesh K.G. for being my support system and for endless hours of fun and banter. I am grateful to my Minnesota family, Vaishnavi, Karthik, Deepak, and Subhash, for filling the void of missing my home and family.
I would like to express my limitless appreciation to my wife and best friend, Ramya Ramasubramanian, for standing beside me through thick and thin, for helping with the drafts of this thesis, for making me a better person every day, for the myriad sweet things she has done for me, and finally for showing me what true love is. Finally, I would like to thank the Almighty for giving me the strength, the ability, and the opportunity to pursue my dreams, and I pray that the good fortune bestowed upon me continues for a long time.
Dedicated to the loving memory of my grandfather Shri. A.V. Sambasivan, whose name I have inherited and whose values I wish to inherit, and my grandmother Smt. Kamala Ramdhas, whose love for me had no limits.
Abstract
Parameter estimation problems involve estimating an unknown quantity (or parameter) of interest from a set of data (or observations) that contains some information about the parameter. Such problems are ubiquitous and widely studied across diverse disciplines in science and engineering including, but not limited to, physics, computer science, signal processing, computational genomics, and economics. Information-theoretic limits of a parameter estimation problem quantify the best-achievable performance (under a suitable metric), thus establishing the fundamental difficulty of solving the problem. A central theme of the first two parts of this work is to develop information-theoretic tools to analyze the fundamental limits of estimating parameters from noisy data under two very different settings: (1) the parameter of interest belongs to a structured class of signals, and (2) a concise forward model relating the observations to the parameters is analytically challenging to obtain. The first part of this work examines the fundamental error characteristics of a general class of matrix completion problems, where the matrix of interest is a product of two a priori unknown matrices, one of which is sparse, and the observations are noisy. Our main contributions come in the form of minimax lower bounds on the expected per-element squared error for this problem under several common noise models. Our results establish that the error bounds derived in (Soni et al., 2016) for complexity-regularized maximum likelihood estimators achieve, up to multiplicative constants and logarithmic factors, the minimax error rates under certain (mild) conditions.
The rest of this work focuses on plenoptic imaging, which usually involves taking multiple single-snapshot images of a scene, collected across time (videos), wavelength (multi-spectral cameras), and from multiple vantage points (light field sensor arrays), thus providing substantially more information about a given scene than conventional imaging. For this thrust, we first focus on assessing the fundamental limits of scene parameter estimation in plenoptic imaging systems, with an eye towards passive indirect imaging problems. We develop a general framework to obtain lower bounds on the variance of unbiased estimators for scene parameter estimation from noisy plenoptic data. The novelty of this work lies in the use of computer graphics rendering software
to synthesize the (often complicated) forward mapping and evaluate the Hammersley-Chapman-Robbins lower bound (HCR-LB), which is at least as tight as the more commonly used Cramer-Rao lower bound. When the rendering software yields inexact estimates of the forward mapping, we analyze the effects of such inaccuracies on the HCR-LB both theoretically and via simulations, and provide a method to obtain upper and lower intervals for the true HCR-LB. The final part of this work explores algorithmic methods for Non-Line-of-Sight (NLOS) imaging from (noisy) plenoptic data, where the aim is to recover a hidden scene of interest from noisy measurements that arise from reflections off a scattering surface, e.g., a wall or the floor. We use the insight that plenoptic data is highly structured due to parallax and/or motion in the hidden scene and propose a multi-way Total Variation (TV) regularized inversion methodology to leverage this structure and recover hidden scenes. We demonstrate our recovery algorithm on real-world plenoptic data measurements at visible and Long-Wave InfraRed (LWIR) wavelengths. Experiments in LWIR (or thermal) imaging show that it is possible to reliably image human subjects around a corner, nearly in real time, using our framework.
Contents
Acknowledgements i
Abstract iv
List of Tables ix
List of Figures x
1 Introduction 1
 1.1 Information-Theoretic Analyses of Parameter Estimation Problem ...... 2
 1.2 Algorithmic Methods for Non-Line-of-Sight Plenoptic Imaging ...... 5
2 Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models 7
 2.1 Introduction ...... 7
  2.1.1 Organization ...... 8
  2.1.2 Notations and Preliminaries ...... 8
 2.2 Problem Statement ...... 9
  2.2.1 Observation Model ...... 9
  2.2.2 The Minimax Risk ...... 10
 2.3 Main Results and Implications ...... 12
  2.3.1 Additive Gaussian Noise ...... 14
  2.3.2 Additive Laplace Noise ...... 15
  2.3.3 One-bit Observation Model ...... 16
  2.3.4 Poisson-distributed Observations ...... 19
 2.4 Conclusion ...... 21
 2.5 Acknowledgement ...... 22
 2.6 Appendix ...... 22
  2.6.1 Proof of Theorem 2.1 ...... 23
  2.6.2 Proof of Corollary 2.1 ...... 28
  2.6.3 Proof of Corollary 2.2 ...... 28
  2.6.4 Proof of Corollary 2.3 ...... 29
  2.6.5 Proof of Theorem 2.2 ...... 30
3 Parameter Estimation Lower Bounds for Plenoptic Imaging Systems 36
 3.1 Introduction ...... 37
  3.1.1 Prior Art ...... 38
  3.1.2 Our Contribution ...... 39
 3.2 Forward Model ...... 40
  3.2.1 The Rendering Equation ...... 40
  3.2.2 Illustrative Example Scene: A Π-shaped Hallway ...... 42
 3.3 Problem Statement ...... 42
 3.4 Renderer-Enabled Computation of Lower Bounds ...... 44
  3.4.1 HCR Lower Bound for Poisson Noise ...... 46
  3.4.2 HCR Lower Bound for Additive White Gaussian Noise ...... 47
  3.4.3 Localizing Information Content in Plenoptic Observations ...... 47
  3.4.4 Experimental Evaluation: Lower bounds for Π-shaped Hallway Scene ...... 48
 3.5 Computing Lower bounds with Inexact Rendering ...... 52
  3.5.1 Estimating HCR Lower Bounds from Inexact Rendering ...... 57
  3.5.2 Experimental Validation: NLOS Object Localization ...... 60
 3.6 Maximum Likelihood Estimation ...... 64
  3.6.1 Experimental Evaluation: NLOS Object Localization using Maximum Likelihood Estimation ...... 65
 3.7 Conclusion ...... 68
 3.8 Acknowledgment ...... 69
 3.9 Appendix ...... 70
  3.9.1 Empirical validation of Assumption A.2 ...... 70
  3.9.2 Proof of Theorem 3.1 ...... 72
  3.9.3 Proof of Theorem 3.2 ...... 75
4 Non-Line-Of-Sight Imaging from Plenoptic Observations 76
 4.1 Introduction ...... 76
  4.1.1 Prior Art ...... 77
 4.2 Problem Formulation ...... 78
  4.2.1 The NLOS Imaging Problem ...... 79
  4.2.2 Notation and Preliminaries ...... 81
 4.3 Multi-Way Total Variation Regularization for NLOS Scene Recovery ...... 82
 4.4 Signal-Separation to Noise Ratio: An Unsupervised Evaluation Metric ...... 86
 4.5 Recovering a 2D NLOS light field ...... 88
  4.5.1 Experimental Setup ...... 88
  4.5.2 Results and Discussion ...... 89
 4.6 NLOS Scene Recovery using Thermal Imaging ...... 91
  4.6.1 Experimental Setup ...... 91
  4.6.2 Results and Discussion ...... 93
 4.7 Conclusion ...... 97
 4.8 Acknowledgment ...... 98
5 Directions for Future Work 99
References 101
List of Tables
3.1 Comparison of HCR lower bound and performance of MLE for AWGN with σ = 0.1 ...... 66
4.1 Average throughput of reconstruction methods in frames/sec (fps). Image acquisition rate was 10 fps. Methods (or settings) with throughputs > 10 fps show potential for real-time imaging and are highlighted in bold ...... 96
4.2 Average ∆-SSNR (over all video frames) for thermal NLOS recovery (in dB). For each video, the method with the highest average ∆-SSNR is highlighted in bold ...... 96
5.1 Acronyms used (in alphabetical order) ...... 110
List of Figures
1.1 A typical parameter estimation problem consists of the “forward model,” which describes how noisy observations y are generated from the parameter of interest θ∗, and the “inverse problem,” which entails estimating θ∗ from y. The noise (and hence the observations) is modeled as a random quantity whose distribution is assumed to be known a priori. The distance between θ̂(y) and θ∗ (under a suitable, specified distance metric) determines the “goodness” of the estimator ...... 2
3.1 The rendering equation, explained graphically: (a) The proportion of incident light coming in from direction ϕi that gets reflected along direction ϕo is determined by the BRDF of the surface; (b) Light incident on a surface point r can be seen as light leaving from another point in the scene g(r, ϕi), i.e., L^in_θ∗(r, −ϕi) = L^out_θ∗(g(r, ϕi), −ϕi) ...... 41
3.2 Simulating the forward model using rendering: (a) Layout of a Π-shaped hallway with dimensions marked. Corridors A, B, and C are 2.5m, 3m, and 2.5m long respectively, and 2m tall. The hallway is illuminated with white ceiling lights with a luminance of 3 lm · sr^−1 · m^−2. The camera C0 is located 0.5m outside corridor A. The location and radius of a red spherical ball constitute the unknown scene parameter θ∗. (b) If we define θ∗ by setting the ball radius = 10cm and the ball location as the intersection of corridors A and B, then Lθ∗ is the nominal RGB image of the scene captured by a camera at C0. We obtain Lθ∗ using the rendering engine Mitsuba [1], as shown in (b) ...... 43
3.3 HCR-LB for ball location estimation for Poisson noise: (a)-(d), and AWGN with different values of σ: (e)-(h). HCR-LBs under different regimes are shown in: (a),(e) LOS region - HCR-LB is very small. The LB drops significantly when the ball starts moving in Corridor B; (b),(f) Transition from LOS to NLOS - Sharp increase in the HCR-LB when the ball moves away from LOS; and (c),(g) NLOS region - HCR-LB is much higher, indicating the potential hardness of the estimation problem; (d),(h) Ball radius estimation - HCR-LB decreases with increasing size (radius) of the ball. With the help of these curves, one can quantify how difficult the problem of estimating NLOS parameters can be. For the AWGN model, we can see that the HCR-LB increases with σ, as expected ...... 49
3.4 Pixelwise FD-Fisher Information (FD-FI), obtained by aggregating contributions from all 30 spectral channels. Darker regions ⇒ more informative. Pixelwise FD-FI shows where and how information about the parameter of interest is localized in our observations. These images highlight subtle details about the scene parameters which are not visible from the nominal RGB images (bottom row) which are obtained from the rendering software; (a) Pixelwise FD-FI for Poisson noise (top row) and AWGN with σ = 0.2 (second row) for 4 different ball locations: (from left to right) completely in LOS, just inside LOS, just outside LOS, center of corridor B. Notice that different regions in the scene are more informative than others for different ball locations; (b) Pixelwise FD-FI for Poisson noise (top row) and AWGN with σ = 0.2 (second row). These images show where information about ball radius is localized. Notice that the regions of information differ from the FD-FI images for ball location in Figure 3.4(a) ...... 50
3.5 (a) Top view of the scene layout used. Scene geometry is the same as in Section 3.2.2. Instead of a red spherical object, we consider a red teapot, and the camera is now placed in the middle of the hallway and captures RGB images. The scalar parameter of interest θ∗ is the horizontal displacement of the teapot from the intersection of corridors A and B. RGB images rendered using Redner for different values of θ∗ are shown in (b) and (c). 65536 samples per pixel were used, and it took around 3.3 minutes to render each scene; (b) θ∗ = 0.2m: teapot fully in LOS; (c) θ∗ = 0.9m: teapot just moved completely away from LOS ...... 61
3.6 HCR-LB for estimation of teapot location under AWGN and Poisson noise. HCR_Neff (red lines): HCR-LB computed directly using rendered data with Neff = 65536 samples per pixel; ĤCR (black lines): HCR-LB estimated from rendering scenes with N = 2048, 3072, ..., 11264. Due to rendering errors, typically we have E[ĤCR(θ∗)] ≥ HCR(θ∗) ≥ HCR_Neff. The region between ĤCR and HCR_Neff denotes the interval within which the true HCR-LB is likely to lie. (a) HCR-LB for the Poisson noise model. (b) HCR-LB for AWGN with σ = 0.1. (c) HCR-LB when the teapot is not in LOS for σ = 0.1, 0.2, 0.4, 0.6, and 0.8 ...... 62
3.7 Effect of samples per pixel on λ and the HCR functional f(λ) for estimation of teapot location. Noise model: AWGN with σ = 0.1, true object location θ∗ = 1.05m. (a) λ̃_N in the neighborhood of θ∗ = 1.05m - shows how λ̃_N decreases with N uniformly for all values of ∆. (b) Plot of estimated and observed λ's for θ∗ = 1.05m shows that λ̃_Neff ≥ λ̂, even with Neff = 65536 samples per pixel. (c) HCR functional obtained from the estimated and observed λ's ...... 62
3.8 Relationship between ĤCR and HCR_Neff for different teapot locations (a)-(c) for AWGN with σ = 0.8, (d)-(f) for the Poisson noise model; under 2 different scenarios: (a),(d) The maximum of the HCR functional occurring as ∆ → 0 implies that ĤCR is much larger than HCR_Neff. (b),(e) The maximum of the HCR functional occurring for ‖∆‖ ≫ 0 implies that ĤCR and HCR_Neff are approximately equal. (c),(f) HCR-LB for 0.7m ≤ θ∗ ≤ 1.69m ...... 63
xii 3.9 Top row: Clean Images for different teapot locations rendered using 65536 samples per-pixel. Bottom row: A single instance of noisy images corrupted by AWGN with σ = 0.1. After the teapot goes completely out of LOS (θ∗ > 0.9m), it is very hard to discern any information about the teapot by simply looking at these images (both from clean and the noisy versions)...... 65 3.10 Comparison of HCR lower bound and MLE for AWGN with σ = 0.1... 67 3.11 Top left: A single instance of the teapot image rendered with 1024 sam- ples. Other plots: Per-pixel variance (summed over the 3 color channels) for the teapot image for different values of samples per-pixel N. It can be seen that per-pixel variance is not same across all pixels in the image. While the general pattern of pixel-wise variance is similar across different values of N, the magnitude of the variance decreases (as expected) with increasing samples...... 71 4 3.12 Results from the simulation with 10 independent draws of weights Wω: (a) Average Squared L2-Error of fit vs Degree p; (b) Distribution of the
optimal degree popt shows that most of it is concentrated around p = 1; (c) A single instance of weighted sum of pixel variance (γ(N)) along with the model fit using p = 1...... 72
4.1 Illustration of a typical NLOS imaging problem. The camera FOV Fc defines the set of all points on the reflecting surface corresponding to the pixels captured by the camera located at c. The light field observed/measured by the camera from a surface point r ∈ Fc is denoted by L^out(r, ϕo), where the outgoing ray direction ϕo = (c − r)/‖c − r‖₂ ...... 80
4.2 Experimental setup used for the 2D NLOS light field recovery. (a) Layout of the measurement studio. The length of the rotating arm is 30cm, the scattering surface is a brushed metal sheet coated with satin paint, and 2 LED strips placed 9cm apart are the (hidden) 1D objects of interest. A CMOS camera captures pictures of the scattering surface at multiple angles, which constitutes the observed light field. (b) A picture of the measurement studio. All the apparatus is placed inside an enclosure and the surfaces in the interior of the enclosure are covered with anti-reflection black felt to minimize ambient light and unwanted reflections. While the picture here shows only one LED strip, the actual experimental measurements involved 2 LED strips ...... 88
4.3 Results for NLOS 2D light field recovery using multi-way TV regularization. (a) Observed light field, Y. (b) Reconstructed (incident) light field containing the hidden scene, X̂. ∆-SSNR = 30.3055 dB. (c)-(f) An overlay of the 1D observed and reconstructed light fields at different scatter points. Each plot corresponds to a single column of the 2D light fields shown in (a) and (b). Our regularized inversion algorithm produces sharp reconstructions of the 1D objects from diffuse or blurry measurements ...... 90
4.4 Experimental setup used for thermal NLOS imaging. (a) Layout of the L-shaped hallway scene used. A LWIR camera mounted on a 1m long computer-controlled arm captures thermal images of a flat piece of masonite at 10 frames/second. The hidden scene comprises a moving human subject, who is located at a distance d_H from the wall. (b) A picture of the experimental setup. The reflecting surface used in our experiments (masonite) is different from the one pictured here (black masonite) ...... 92
4.5 Observed (noisy) LWIR images and corresponding reconstructions (of the “mirror-reflections”) for d_H = 1.8m. 3D-TV reconstructions are “smoother” than their 2D counterparts since they exploit the temporal structure. Reconstruction speeds (throughputs) of the 2D-TV method (column 2) and the 3D-TV method with 125 iterations (column 4) are approximately the same. Full videos are available in the supplementary material ...... 94
4.6 Observed (noisy) LWIR images and corresponding reconstructions (of the “mirror-reflections”) for d_H = 2.7m. The signal level in the observations is significantly smaller compared to Figure 4.5 since the person is farther away from the wall. 3D-TV reconstructions are “smoother” than their 2D counterparts since they exploit the temporal structure. Reconstruction speeds (throughputs) of the 2D-TV method (column 2) and the 3D-TV method with 125 iterations (column 4) are approximately the same. Full videos are available in the supplementary material ...... 95
4.7 Masonite scattering surface as seen by a visible camera ...... 96
Chapter 1
Introduction
Parameter estimation problems involve estimating an unknown quantity (or parameter) of interest from a set of data (or observations) that contains some information about the parameter. Such problems are ubiquitous and widely studied across diverse disciplines in science and engineering including, but not limited to, physics, computer science, signal processing, computational genomics, and economics. The steps involved in a typical parameter estimation problem are outlined in Figure 1.1. The functional dependence between the parameter of interest θ∗ and the observations y is described by the so-called “forward operator” f(·), which is typically assumed to be known a priori. In real-world applications, the observations y are modeled as noisy versions of the nominal forward mapping f(θ∗), where the noise might arise from a variety of sources, e.g., unmodeled effects, quantization errors, etc. This constitutes the forward model. The inverse problem, as the name suggests, involves recovering θ∗ from the noisy observations y using an estimation algorithm (or an estimator, in short). The error (under a suitable metric) incurred by an estimator θ̂ in recovering the true parameter determines the accuracy of the estimates. Thus, estimation algorithms are often accompanied by an analysis of their respective estimation errors, which determines the “goodness” of the estimates. A useful complement involves analyzing the fundamental (or information-theoretic) limits of estimating parameters from noisy data, which quantify the smallest achievable error (by any estimator). Such analyses provide a benchmark for optimality against which the performance of any estimator can be compared.
Figure 1.1: A typical parameter estimation problem consists of the “forward model,” which describes how noisy observations y are generated from the parameter of interest θ∗, and the “inverse problem,” which entails estimating θ∗ from y. The noise (and hence the observations) is modeled as a random quantity whose distribution is assumed to be known a priori. The distance between θ̂(y) and θ∗ (under a suitable, specified distance metric) determines the “goodness” of the estimator.
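As a toy illustration of the pipeline in Figure 1.1, the sketch below generates noisy observations from a hypothetical forward operator and recovers the parameter with a simple grid-search least-squares estimator. The operator, noise level, and estimator are illustrative placeholders, not the models studied in later chapters.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(theta):
    """Hypothetical forward operator f(.) mapping the scalar parameter
    theta to a vector of nominal (noise-free) observations."""
    return np.array([theta, theta**2])

theta_star = 2.0                                          # true parameter
y = forward(theta_star) + 0.05 * rng.standard_normal(2)   # noisy observations

# Inverse problem: a simple least-squares estimator over a parameter grid.
grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmin([np.sum((y - forward(t)) ** 2) for t in grid])]
```

Here the "distance" of Figure 1.1 is the squared error |θ̂ − θ∗|², and any estimator's typical value of that distance can be compared against the lower bounds discussed next.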
1.1 Information-Theoretic Analyses of Parameter Estimation Problem
It is well known that the difficulty of an inverse problem depends on two key components: (1) the complexity of the parameter class, and (2) the forward model. If the parameter of interest θ∗ belongs to a structured class of signals (or vectors), then it is possible to leverage this structure and develop estimators that incur small errors; e.g., compressive sensing methods utilize such insights to accurately recover signals that admit sparse representations under a suitable basis (see, e.g., results in [2]). The forward model also plays a crucial role in determining how much information about θ∗ is conveyed by the noisy observations y; e.g., results in compressive sensing have shown that measurement matrices with independent and identically distributed (i.i.d.) Gaussian entries are a good forward model for recovering sparse signals from linear measurements (see, e.g., results in [2]). Unsurprisingly, the aforementioned factors play a crucial role in determining the fundamental limits of a parameter estimation problem as well. In Chapters 2 and 3 of this dissertation, we analyze the fundamental limits of parameter estimation from an information-theoretic standpoint for two different problems: one where the signal of interest lies in a structured class, and the other where the forward model is complicated and highly non-linear, making it difficult to obtain lower bounds using conventional methods. We first examine the fundamental error characteristics for a general class of matrix completion problems, where the matrix of interest belongs to a structured class, in Chapter 2. In particular, we consider matrices that can be expressed as a product of two a priori unknown matrices, one of which is sparse. Our main contributions come in the form of minimax lower bounds for the expected per-element squared error for this problem under several common noise models.
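The sparse factor observation model studied in Chapter 2 can be sketched as follows; the dimensions, sparsity level, sampling rate, and noise scale below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse factor model: the matrix of interest X = D @ A is a product of
# two unknown factors, one of which (A) is sparse; only a random subset
# of entries of X is observed, under additive Gaussian noise.
n1, n2, r = 60, 50, 5                    # outer dimensions and inner rank
D = rng.standard_normal((n1, r))         # dense factor
A = rng.standard_normal((r, n2)) * (rng.random((r, n2)) < 0.2)  # sparse factor
X = D @ A                                # matrix to be recovered

mask = rng.random((n1, n2)) < 0.5        # entries revealed to the observer
Y = np.where(mask, X + 0.1 * rng.standard_normal((n1, n2)), np.nan)  # noisy data
```

The minimax lower bounds of Chapter 2 characterize the smallest per-element squared error achievable by any procedure that maps such a partially observed, noisy `Y` back to `X`.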
Specifically, we analyze scenarios where the corruptions are characterized by additive Gaussian noise or additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations, as instances of our general result that appeared in [3]. Our results establish that the error bounds derived in [4] for complexity-regularized maximum likelihood estimators achieve, up to multiplicative constants and logarithmic factors, the minimax error rates in each of these noise scenarios, provided that the nominal number of observations is large enough, and the sparse factor has (on average) at least one non-zero entry per column. It is worth pointing out that the lower bounds derived here quantify the smallest errors achievable by any estimator over all possible values of the parameter of interest. In other words, the lower bounds derived in Chapter 2 are not a function of the individual parameter value, but depend on the entire parameter class, and hence are known as global lower bounds. The second part of this dissertation involves analyzing the fundamental limits of scene parameter estimation in plenoptic imaging systems, with an eye towards passive indirect imaging problems. In imaging science, plenoptic functions (also known as “light fields”) are high-dimensional functions (often 5D or higher) that describe the amount of light flowing through every point in space in every direction [5]. Given that traditional images can be interpreted as projections of the plenoptic function onto distinct (2D) spatial planes, exploiting the other dimensions of the plenoptic function can provide substantially more information about scenes of interest than do single snapshots. Specifically, we are interested in the fundamental limits of estimating scene parameters that are not in the line-of-sight (LOS) of the imaging system from information-rich (but noisy) plenoptic observations.
In Chapter 3, we present a general framework to compute lower bounds on the per-parameter mean squared error (MSE) of any unbiased estimator from noisy plenoptic data. The proposed framework builds on our initial work that appeared in [6] and enables us to compute local lower bounds (bounds that are a function of the individual parameter value) for plenoptic imaging problems. Unlike the matrix completion problem mentioned above, the forward model in plenoptic imaging settings is analytically challenging to express in closed form, as it involves solving an integral equation (a Fredholm integral equation of the second kind). We circumvent this roadblock by using computer graphics rendering software to synthesize the forward model and numerically evaluate the Hammersley-Chapman-Robbins lower bound (HCR-LB) to establish lower bounds on the variance (or equivalently, the MSE) of any unbiased estimator of the unknown parameters. The HCR-LB enjoys several advantages over the more commonly used Cramer-Rao lower bound (CR-LB). Firstly, the HCR-LB does not make any regularity assumptions about the log-likelihood function and hence is applicable to a broader class of problems. Secondly, the HCR-LB is at least as tight as the CR-LB when both bounds exist. Unlike the CR-LB, the HCR-LB does not require computing derivatives of the score function, which is challenging to do here (since the forward model is not easily described in closed form, computing its derivatives is even more of a challenge), and hence is a natural choice for our setting. The potential benefits of using the HCR-LB come at the cost of increased computational requirements. Evaluating the HCR-LB requires solving an optimization problem that can be computationally demanding relative to evaluating the CR-LB, which simply requires computing (or approximating using Finite Difference (FD) methods) the derivatives of the log-likelihood function.
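As a toy illustration of the renderer-enabled bound, the sketch below evaluates the HCR-LB for a scalar parameter observed through a forward map under additive white Gaussian noise, maximizing the HCR ratio over a finite grid of test offsets ∆. The chi-square divergence expression is the standard one for Gaussian likelihoods; the forward map and offset grid are hypothetical placeholders.

```python
import numpy as np

def hcr_lb_awgn(f, theta, deltas, sigma):
    """HCR lower bound for y = f(theta) + AWGN, evaluated over a grid of
    offsets:  HCR(theta) = sup_Delta Delta^2 / chi2(p_{theta+Delta}, p_theta),
    where for Gaussian likelihoods
    chi2 = exp(||f(theta+Delta) - f(theta)||^2 / sigma^2) - 1."""
    f0 = np.asarray(f(theta), dtype=float)
    best = 0.0
    for d in deltas:
        gap = np.sum((np.asarray(f(theta + d), dtype=float) - f0) ** 2)
        chi2 = np.expm1(gap / sigma**2)   # chi-square divergence
        if chi2 > 0:
            best = max(best, d**2 / chi2)
    return best

# Toy check: for a linear scalar map f(t) = [t], the bound approaches the
# Cramer-Rao value sigma^2 as the smallest offset Delta shrinks toward 0.
bound = hcr_lb_awgn(lambda t: [t], theta=1.0, deltas=[0.01, 0.1, 0.5], sigma=1.0)
```

In the chapter itself, `f` is replaced by calls to a rendering engine, which is exactly where the extra computation (one render per test offset) enters.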
While computing lower bounds typically requires exact knowledge of the true forward mapping (θ∗ ↦ f(θ∗)), in practice, rendering software packages only provide an approximate solution of the true forward mapping. For scenarios where the rendering error is non-negligible, we analyze the effects of such rendering inconsistencies on the proposed lower bounding framework both theoretically and via simulations. We also provide a simple method to obtain upper and lower intervals for the true HCR-LB in the presence of rendering errors. In addition to computing lower bounds, our framework enables us to localize the information content in the plenoptic observations using the pixelwise Fisher Information metric (computed using FD), thus providing valuable insights about Non-Line-of-Sight (NLOS) imaging problems. Some of the findings from our framework provide additional validation of the phenomenon observed in [7–11], where the authors use occluders to aid in NLOS scene recovery. We demonstrate the utility of our framework by computing the HCR-LB under Poisson noise and additive white Gaussian noise models for a few canonical estimation problems. We also compare the lower bounds with the performance of Maximum Likelihood Estimators (MLEs) for an object localization problem, which shows that for the scenarios we examined, our lower bounds are nearly tight and hence are indicative of the true fundamental limits.
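The pixelwise Fisher Information map mentioned above can be approximated with finite differences. The sketch below does this for the Poisson noise model, where pixel i contributes (df_i/dθ)²/f_i(θ) to the total Fisher information; the forward map and step size here are hypothetical placeholders for renderer output.

```python
import numpy as np

def pixelwise_fd_fisher_poisson(f, theta, h=1e-4):
    """Per-pixel Fisher information for y_i ~ Poisson(f_i(theta)):
    FI_i = (d f_i / d theta)^2 / f_i(theta), with the derivative of the
    forward map approximated by central finite differences."""
    fp = np.asarray(f(theta + h), dtype=float)
    fm = np.asarray(f(theta - h), dtype=float)
    df = (fp - fm) / (2.0 * h)                      # FD derivative per pixel
    return df**2 / np.asarray(f(theta), dtype=float)

# Toy check with a linear forward map: f(t) = [t, 2t] at t = 1 has
# derivatives [1, 2] and intensities [1, 2], giving FI = [1, 2].
fi = pixelwise_fd_fisher_poisson(lambda t: [t, 2.0 * t], 1.0)
```

Displayed as an image, such a per-pixel map reveals which regions of the observation carry the most information about θ∗, as in the FD-FI figures of Chapter 3.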
1.2 Algorithmic Methods for Non-Line-of-Sight Plenoptic Imaging
In the third and final part of this dissertation, we move away from the paradigm of fundamental limits and focus on algorithmic methods for NLOS imaging from noisy plenoptic data, where the aim is to recover a hidden scene of interest from noisy measurements that arise from reflections off a scattering surface, e.g., a wall or the floor. Unlike the previous problem, where the parameter of interest is related to the observations through multiple (and potentially, infinite) bounces, the aim of NLOS imaging is to invert a “single-bounce” model and recover the light field incident on the reflecting surface. Using the rendering equation [12], it can be shown that the NLOS observations are a linear function of the hidden scene and the reflectance properties of the surface. At visible wavelengths, commonly occurring surfaces scatter the incident light diffusely (over a wide range of angles), making the forward model of the NLOS imaging problem extremely ill-conditioned, and thus, challenging to invert. However, plenoptic data, which typically comprises multiple single-snapshot images collected across time and through multiple viewpoints, is a highly structured multi-way tensor and has smooth variations across all plenoptic dimensions, due to parallax and/or motion in the hidden scene. We propose leveraging this structure of the plenoptic function (or the light field) using a multi-way Total Variation (TV) regularized linear inversion methodology, which jointly enforces sparse gradients across multiple plenoptic dimensions, to recover the hidden scene of interest. In Chapter 4, we present an algorithm based on the split Bregman method [13] to solve our regularized linear inverse problem. The proposed algorithm has a fast convergence rate and admits a distributed (GPU-accelerated) implementation.
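For concreteness, an anisotropic multi-way TV penalty of the kind minimized alongside the data-fit term can be written as the sum of absolute forward differences along every plenoptic dimension. This is a minimal sketch, not the exact penalty or solver of Chapter 4; the tensor sizes and noise level are illustrative.

```python
import numpy as np

def multiway_tv(x):
    """Anisotropic multi-way total variation of an N-way tensor: the sum
    of absolute first-order forward differences taken along every
    plenoptic dimension (e.g., angle, time, viewpoint)."""
    return sum(np.abs(np.diff(x, axis=ax)).sum() for ax in range(x.ndim))

# A piecewise-constant tensor has small TV; adding noise inflates it,
# which is why penalizing multiway_tv favors smooth, structured scenes.
rng = np.random.default_rng(0)
clean = np.zeros((8, 8, 8)); clean[2:6, 2:6, 2:6] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
```

Jointly penalizing gradients along all dimensions is what distinguishes this multi-way penalty from applying 2D TV to each frame independently.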
Additionally, our algorithm only requires access to the linear forward model via function calls to the forward operator (and its adjoint), and doesn't require storing the full forward operator as a matrix. This can be extremely beneficial, as matricized forward operators can get very large even for problems of modest size. We demonstrate the efficacy of our regularized inversion algorithm by recovering simple 2D NLOS scenes from data collected in a real-world experimental setup. A common method to quantify the performance of a signal recovery algorithm is the Signal to Noise Ratio (SNR). However, computing the SNR requires access to the ground truth data. For NLOS imaging experiments, measuring ground truth data can be an arduous or even impossible task. To overcome this hurdle, we propose a novel metric called the Signal-Separation to Noise Ratio (SSNR) as an evaluation metric for NLOS imaging problems. The proposed SSNR metric can be directly estimated from (observed and reconstructed) images without the ground truth data, and also faithfully quantifies the reconstruction quality. We use this metric to quantify the accuracy of our reconstructions. It was recently observed in [14] that ordinary materials (e.g., rough metallic surfaces, colored acrylic) behave almost mirror-like at infra-red wavelengths. Using this insight, we apply our multi-way TV regularized inversion methodology to recover hidden scenes from noisy plenoptic data at Long-Wave InfraRed (LWIR) wavelengths. Experimental results on real data show that it is possible to reliably image human subjects around a corner, nearly in real-time, using our algorithm. Furthermore, we compare our multi-way TV regularized method against applying standard 2D TV regularization independently across multiple frames of the plenoptic images, as in [14].
Reconstruction results show the added benefits of jointly exploiting the structure across multiple plenoptic dimensions, both qualitatively and quantitatively. Finally, it is worth noting that while the notation within each chapter is consistent, there may be some variation and reuse of symbols across chapters. Since each chapter is a self-contained effort, there may also be some overlap in the introductory material throughout.

Chapter 2
Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models
2.1 Introduction
The matrix completion problem involves imputing the missing values of a matrix from an incomplete, and possibly noisy, sampling of its entries. In general, without making any assumption about the entries of the matrix, the matrix completion problem is ill-posed and it is impossible to recover the matrix uniquely. However, if the matrix to be recovered has some intrinsic structure (e.g., low rank structure), it is possible to design algorithms that exactly estimate the missing entries. Indeed, the performance of low-rank matrix completion and estimation methods has been extensively studied in noiseless settings [15–19], in noisy settings where the observations are affected by additive noise [20–26], and in settings where the observations are non-linear (e.g., highly-quantized or Poisson distributed) functions of the underlying matrix entries (see [27–29]). Recent works which explore robust recovery of low-rank matrices under malicious sparse corruptions include [30–33]. A notable advantage of using low-rank models is that the estimation strategies involved in completing such matrices can be cast into efficient convex methods which are well understood and amenable to analysis. The fundamental estimation error characteristics for more general completion problems, for example, those employing general
bilinear factor models, have not (to our knowledge) been fully characterized. In this work, we provide several new results in this direction. Our focus here is on matrix completion problems under sparse factor model assumptions, where the matrix to be estimated is well-approximated by a product of two matrices, one of which is sparse. Such models have been motivated by a variety of applications in dictionary learning, subspace clustering, image demosaicing, and various machine learning problems (see, e.g., the discussion in [4]). Here, we investigate fundamental lower bounds on the achievable estimation error for these problems in several specific noise scenarios: additive Gaussian noise, additive heavier-tailed (Laplace) noise, Poisson-distributed observations, and highly-quantized (e.g., one-bit) observations. Our analyses complement the upper bounds provided recently in [4] for complexity-penalized maximum likelihood estimation methods, and establish that the error rates obtained in [4] are nearly minimax optimal (as long as the nominal number of measurements is large enough).¹
2.1.1 Organization
The remainder of this chapter is organized as follows. We begin with a brief overview of the preliminaries and notation in Section 2.1.2, followed by a formal definition of the matrix completion problem considered here in Section 2.2. Our main results are stated in Section 2.3; there, we establish minimax lower bounds for the recovery of a matrix $X^*$ that admits a sparse factorization, under a general class of noise models. We also briefly discuss the implications of these bounds for different instances of noise distributions and compare them with existing works. In Section 2.4 we conclude with a concise discussion of possible extensions and potential future directions. The proofs of our main results are provided in the Appendix.
2.1.2 Notations and Preliminaries
We provide a brief summary of the notation used here and revisit a few key concepts before delving into our main results. We let $a \vee b = \max\{a, b\}$ and $a \wedge b = \min\{a, b\}$. For any $n \in \mathbb{N}$, $[n]$ denotes the set of integers $\{1, \dots, n\}$. For a matrix $M$, we use the following notation: $\|M\|_0$ denotes the number of non-zero elements in $M$, $\|M\|_\infty = \max_{i,j} |M_{i,j}|$ denotes the entry-wise maximum (absolute) entry of $M$, and $\|M\|_F = \sqrt{\sum_{i,j} M_{i,j}^2}$ denotes the Frobenius norm. We use the standard asymptotic computational complexity ($O$, $\Omega$, $\Theta$) notations to suppress leading constants in our results for clarity of exposition.
We also briefly recall an important information-theoretic quantity, the Kullback-Leibler divergence (or KL divergence). When $x(z)$ and $y(z)$ denote the pdfs (or pmfs) of real scalar random variables, the KL divergence of $y$ from $x$ is denoted by $K(\mathbb{P}_x, \mathbb{P}_y)$ and given by
$$K(\mathbb{P}_x, \mathbb{P}_y) = \mathbb{E}_{Z \sim x}\left[\log \frac{x(Z)}{y(Z)}\right],$$
provided $x(z) = 0$ whenever $y(z) = 0$, and $\infty$ otherwise. The logarithm is taken to be the natural log. It is worth noting that the KL divergence is not symmetric in its arguments, and that $K(\mathbb{P}_x, \mathbb{P}_y) \geq 0$ with $K(\mathbb{P}_x, \mathbb{P}_y) = 0$ when $x = y$. In a sense, the KL divergence quantifies how "far" apart two distributions are.

¹ The material in Chapter 2 is © 2018 IEEE. Reprinted, with permission, from IEEE Transactions on Information Theory, "Minimax Lower Bounds for Noisy Matrix Completion Under Sparse Factor Models," A. V. Sambasivan and J. D. Haupt.
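For discrete distributions, the KL divergence can be computed directly from its definition. The following minimal sketch (the function name `kl` is our own) illustrates the properties just noted: nonnegativity, asymmetry, K = 0 for identical distributions, and K = ∞ when the support condition fails.

```python
import numpy as np

def kl(p, q):
    """KL divergence K(P, Q) = sum_z p(z) log(p(z)/q(z)) for pmfs,
    with the convention 0*log(0/q) = 0; returns inf if q(z) = 0
    at some z where p(z) > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.9, 0.05, 0.05]
q = [1 / 3, 1 / 3, 1 / 3]
```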
2.2 Problem Statement
2.2.1 Observation Model
We consider the problem of estimating the entries of an unknown matrix $X^* \in \mathbb{R}^{n_1 \times n_2}$ that admits a factorization of the form
$$X^* = D^* A^*, \qquad (2.1)$$
where for some integer $r \in [n_1 \wedge n_2]$, $D^* \in \mathbb{R}^{n_1 \times r}$ and $A^* \in \mathbb{R}^{r \times n_2}$ are a priori unknown factors. Additionally, our focus in this paper will be restricted to the cases where the matrix $A^*$ is $k$-sparse (having no more than $k \leq r n_2$ nonzero elements). We further note here that if $k < r$, then the matrix $A^*$ will necessarily have zero rows, which can be removed without affecting the product $X^*$. Hence, without loss of generality, we assume that $k \geq r$. In addition to this, we assume that the elements of $D^*$ and $A^*$ are bounded, so that
$$\|D^*\|_\infty \leq 1 \quad \text{and} \quad \|A^*\|_\infty \leq A_{\max}, \qquad (2.2)$$
for some constant $A_{\max} > 0$. A direct implication of (2.2) is that the elements of $X^*$ are also bounded ($\|X^*\|_\infty \leq X_{\max} \leq r A_{\max}$). However, in most applications of interest, $X_{\max}$ need not be as large as $r A_{\max}$ (for example, in the case of recommender systems, the entries of $X^*$ are bounded above by constants, i.e., $X_{\max} = O(1)$). Hence we further assume here that $X_{\max} = \Theta(A_{\max}) = O(1)$. While bounds on the amplitudes of the elements of the matrix to be estimated often arise naturally in practice, the assumption that the entries of the factors are bounded fixes some of the scaling ambiguities inherent to the bilinear model.
Instead of observing all the elements of the matrix $X^*$ directly, we assume here that we make noisy observations of $X^*$ at a known subset of locations. In what follows, we will model the observations $Y_{i,j}$ as independent draws from a probability distribution (or mass) function parametrized by the true underlying matrix entry $X^*_{i,j}$. We denote by $S \subseteq [n_1] \times [n_2]$ the set of locations at which observations are collected, and assume that these points are sampled randomly with $\mathbb{E}[|S|] = m$ (the nominal number of measurements) for some integer $m$ satisfying $1 \leq m \leq n_1 n_2$. Specifically, for $\gamma_0 = m/(n_1 n_2)$, we suppose $S$ is generated according to an independent Bernoulli($\gamma_0$) model, so that each $(i,j) \in [n_1] \times [n_2]$ is included in $S$ independently with probability $\gamma_0$. Thus, given $S$, we model the collection of $|S|$ measurements of $X^*$ in terms of the collection $\{Y_{i,j}\}_{(i,j) \in S} \triangleq Y_S$ of conditionally (on $S$) independent random quantities. The joint pdf (or pmf) of the observations can be formally written as
$$p_{X^*_S}(Y_S) \triangleq \prod_{(i,j) \in S} p_{X^*_{i,j}}(Y_{i,j}) \triangleq \mathbb{P}_{X^*}, \qquad (2.3)$$
where $p_{X^*_{i,j}}(Y_{i,j})$ denotes the corresponding scalar pdf (or pmf), and we use the shorthand $X^*_S$ to denote the collection of elements of $X^*$ indexed by $(i,j) \in S$. Given $S$ and the corresponding noisy observations $Y_S$ of $X^*$ distributed according to (2.3), our matrix completion problem aims at estimating $X^*$ under the assumption that it admits a sparse factorization as in (2.1).
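The observation model above is straightforward to simulate. The sketch below, with arbitrarily chosen dimensions, draws a $k$-sparse factor $A^*$, forms $X^* = D^* A^*$, samples $S$ under the Bernoulli($\gamma_0$) model, and adds Gaussian noise at the sampled locations (one instance of the scalar observation distributions considered later).

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r, k, m = 30, 40, 4, 60, 600
sigma, A_max = 0.1, 1.0

# Factors: D* entrywise bounded by 1, A* k-sparse with entries in [-A_max, A_max].
D = rng.uniform(-1, 1, (n1, r))
A = np.zeros((r, n2))
support = rng.choice(r * n2, size=k, replace=False)
A.flat[support] = rng.uniform(-A_max, A_max, k)
X = D @ A

# Bernoulli(gamma0) sampling: each entry observed independently, E|S| = m.
gamma0 = m / (n1 * n2)
S = rng.random((n1, n2)) < gamma0

# Noisy observations at the sampled locations (Gaussian noise instance).
Y = np.where(S, X + sigma * rng.standard_normal((n1, n2)), np.nan)
```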
2.2.2 The Minimax Risk
In this paper, we examine the fundamental limits of estimating the elements of a matrix that follows the model (2.1), with observations as described above, using any possible estimator (irrespective of its computational tractability). The accuracy of an estimator $\widehat{X}$ in estimating the entries of the true matrix $X^*$ can be measured in terms of its risk $R_{\widehat{X}}$, which we define to be the normalized (per-element) Frobenius error,
$$R_{\widehat{X}} \triangleq \frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2}. \qquad (2.4)$$
Here, our notation is meant to denote that the expectation is taken with respect to all of the random quantities (i.e., the joint distribution of $S$ and $Y_S$). Let us now consider a class of matrices parametrized by the inner dimension $r$, sparsity factor $k$, and upper bound $A_{\max}$ on the amplitude of elements of $A$, where each element in the class obeys the factor model (2.1) and the assumptions in (2.2). Formally, we set
$$\mathcal{X}(r, k, A_{\max}) \triangleq \left\{X = DA \in \mathbb{R}^{n_1 \times n_2} : D \in \mathbb{R}^{n_1 \times r},\ \|D\|_\infty \leq 1,\ A \in \mathbb{R}^{r \times n_2},\ \|A\|_0 \leq k,\ \|A\|_\infty \leq A_{\max}\right\}. \qquad (2.5)$$
The worst-case performance of an estimator $\widehat{X}$ over the class $\mathcal{X}(r, k, A_{\max})$, under the Frobenius error metric defined in (2.4), is given by its maximum risk,
$$\widetilde{R}_{\widehat{X}} \triangleq \sup_{X^* \in \mathcal{X}(r, k, A_{\max})} R_{\widehat{X}}.$$
The estimator having the smallest maximum risk among all possible estimators is said to achieve the minimax risk, which is a characteristic of the estimation problem itself. For the problem of matrix completion under the sparse factor model described in Section 2.2.1, the minimax risk is expressed as
$$R^*_{\mathcal{X}(r,k,A_{\max})} \triangleq \inf_{\widehat{X}} \widetilde{R}_{\widehat{X}} = \inf_{\widehat{X}}\ \sup_{X^* \in \mathcal{X}(r,k,A_{\max})} R_{\widehat{X}} = \inf_{\widehat{X}}\ \sup_{X^* \in \mathcal{X}(r,k,A_{\max})} \frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2}. \qquad (2.6)$$
As we see, the minimax risk depends on the choice of the model class parameters $r$, $k$, and $A_{\max}$. It is worth noting that inherent in the formulation of the minimax risk are the noise model and the nominal number of observations ($m = \mathbb{E}[|S|]$) made. For the sake of brevity, we shall not make all such dependencies explicit. In general, it is complicated to obtain closed-form solutions for (2.6). Here, we will adopt a common approach employed for such problems, and seek to obtain lower bounds on the minimax risk $R^*_{\mathcal{X}(r,k,A_{\max})}$ using tools from [34]. Our analytical approach is also inspired by the approach in [35], which considered the problem of estimating low-rank matrices corrupted by sparse outliers.
2.3 Main Results and Implications
In this section we establish lower bounds on the minimax risk for the problem settings defined in Section 2.2, where the KL divergences of the associated noise distributions exhibit a certain property (a quadratic upper bound in terms of the underlying parameters of interest; we elaborate on this later). We consider four different noise models: additive Gaussian noise, additive Laplace noise, Poisson noise, and quantized (one-bit) observations, as instances of our general result. The proof of our main result, presented in Theorem 2.1, appears in Appendix 2.6.1.
Theorem 2.1. Suppose the scalar pdf (or pmf) of the noise distribution satisfies, for all $x, y$ in the domain of the parameter space,
$$K(\mathbb{P}_x, \mathbb{P}_y) \leq \frac{1}{2\mu_D^2}(x - y)^2, \qquad (2.7)$$
for some constant $\mu_D$ which depends on the distribution. For observations made as independent draws $Y_S \sim \mathbb{P}_{X^*}$, there exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \geq 2$, $r \in [n_1 \wedge n_2]$, and $r \leq k \leq n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys
$$R^*_{\mathcal{X}(r,k,A_{\max})} \geq C \cdot \min\left\{\Delta(k, n_2)\, A_{\max}^2,\ \gamma^2 \mu_D^2\, \frac{n_1 r + k}{m}\right\}, \qquad (2.8)$$
where
$$\Delta(k, n_2) \triangleq \min\left\{1, (k/n_2)\right\}. \qquad (2.9)$$
Let us now analyze the result of this theorem more closely and see how the estimation risk varies as a function of the number of measurements obtained, as well as the dimension and sparsity parameters of the matrix to be estimated. We can look at the minimax risk in equation (2.8) in two different scenarios w.r.t. the sampling regime:
• Large sample regime, i.e., when $m \gg (n_1 r \vee k)$ (where we use the notation $\gg$ to suppress dependencies on constants). In this case we can rewrite (2.8) and lower bound the minimax risk as
$$R^*_{\mathcal{X}(r,k,A_{\max})} = \Omega\left((\mu_D \wedge A_{\max})^2\, \Delta(k, n_2)\, \frac{n_1 r + k}{m}\right). \qquad (2.10)$$
Here the quantities $n_1 r$ and $k$ (which give the maximum number of non-zeros in $D^*$ and $A^*$, respectively) can be viewed as the number of degrees of freedom contributed by each of the factors in the matrix to be estimated. The term $\frac{n_1 r}{m} \cdot \Delta(k, n_2)$ can be interpreted as the error associated with the non-sparse factor, which follows the parametric rate $(n_1 r/m)$ when $k \geq n_2$, i.e., when $A^*$ (on average) has more than one non-zero element per column. Qualitatively, this implies that all the degrees of freedom offered by $D^*$ manifest in the estimation of the overall matrix $X^*$ provided there are enough non-zero elements (at least one non-zero per column) in $A^*$. If there are (on average) fewer than one non-zero element per column in the sparse factor, a few rows of $D^*$ vanish due to the presence of zero columns in $A^*$, and hence not all the degrees of freedom in $D^*$ are carried over to $X^*$ (resulting in zero columns in $X^*$). This makes the overall problem easier and reduces the minimax risk (associated with $D^*$) by a factor of $(k/n_2)$. Similarly, $\frac{k}{m} \cdot \Delta(k, n_2)$ is the error term associated with the sparse factor $A^*$, and it follows the parametric rate of $(k/m)$ in the large sample regime provided $k \geq n_2$.
• Small sample regime, i.e., when $m \ll (n_1 r \vee k)$. In this case the minimax risk in (2.8) becomes
$$R^*_{\mathcal{X}(r,k,A_{\max})} = \Omega\left(\Delta(k, n_2)\, A_{\max}^2\right). \qquad (2.11)$$
Equation (2.11) implies that the minimax risk in estimating the unknown matrix doesn't become arbitrarily large when the nominal number of observations is much smaller than the number of degrees of freedom in the factors (or when $m \ll n_1 r + k$), but is instead lower bounded by the squared amplitude ($A_{\max}^2$) of the sparse factor (provided there are sufficient non-zeros in $A^*$ for all the degrees of freedom to manifest). This is a direct consequence of our assumption that the entries of the factors are bounded.
The virtue of expressing the lower bounds for the minimax risk as in (2.8) is that we make no assumptions on the nominal number of samples collected; the bound is hence valid over all sampling regimes. However, in the discussions that follow, we shall often consider the large sample regime and appeal to lower bounds of the form (2.10). In the following sections, we consider different noise models which satisfy the KL-divergence criterion (2.7) and present lower bounds for each specific instance as corollaries of our general result presented in Theorem 2.1.
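The two regimes can be seen by evaluating the right-hand side of (2.8) directly. In the sketch below, the unspecified absolute constants $C$ and $\gamma$ are set to 1 purely for illustration, so the numbers indicate scaling behavior rather than actual risk values.

```python
def minimax_lower_bound(n1, n2, r, k, m, A_max, mu_D, C=1.0, gamma=1.0):
    """Right-hand side of (2.8):
    C * min{ Delta(k, n2) * A_max^2, gamma^2 * mu_D^2 * (n1*r + k)/m }."""
    delta = min(1.0, k / n2)
    return C * min(delta * A_max ** 2,
                   gamma ** 2 * mu_D ** 2 * (n1 * r + k) / m)

# Small-sample regime: the amplitude term is active.
b_small = minimax_lower_bound(100, 100, 5, 200, 10, A_max=1.0, mu_D=1.0)
# Large-sample regime: the parametric (n1*r + k)/m term is active.
b_large = minimax_lower_bound(100, 100, 5, 200, 100_000, A_max=1.0, mu_D=1.0)
```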
2.3.1 Additive Gaussian Noise
Let us consider a setting where the observations are corrupted by i.i.d. zero-mean additive Gaussian noise with known variance. We have the following result; its proof appears in Appendix 2.6.2.
Corollary 2.1 (Lower bound for Gaussian noise). Suppose $Y_{i,j} = X^*_{i,j} + \xi_{i,j}$, where the $\xi_{i,j}$ are i.i.d. Gaussian $\mathcal{N}(0, \sigma^2)$, $\sigma > 0$, $\forall (i,j) \in S$. There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \geq 2$, $r \in [n_1 \wedge n_2]$, and $r \leq k \leq n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys
$$R^*_{\mathcal{X}(r,k,A_{\max})} \geq C \cdot \min\left\{\Delta(k, n_2)\, A_{\max}^2,\ \gamma^2 \sigma^2\, \frac{n_1 r + k}{m}\right\}. \qquad (2.12)$$
Remark 2.1. If instead of i.i.d. Gaussian noise we have that the $\xi_{i,j}$ are just independent zero-mean additive Gaussian random variables with variances $\sigma_{i,j}^2 \geq \sigma_{\min}^2$ $\forall (i,j) \in S$, the result in (2.12) is still valid with $\sigma$ replaced by $\sigma_{\min}$. This stems from the fact that the KL divergence between the distributions in equations (2.42) and (2.45) can be upper bounded using the smallest variance among all the noise entries.
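For this noise model, the closed-form KL divergence between $\mathcal{N}(x, \sigma^2)$ and $\mathcal{N}(y, \sigma^2)$ is exactly $(x-y)^2/(2\sigma^2)$, so condition (2.7) holds with $\mu_D = \sigma$. The sketch below checks this closed form against a Monte Carlo estimate of $\mathbb{E}[\log p_x(Z)/p_y(Z)]$.

```python
import numpy as np

x, y, sigma = 1.0, 0.5, 1.0
kl_closed_form = (x - y) ** 2 / (2 * sigma ** 2)   # (2.7) holds with mu_D = sigma

# Monte Carlo estimate of E_{Z ~ N(x, sigma^2)}[log p_x(Z)/p_y(Z)];
# the log-density ratio of two equal-variance Gaussians is simple.
rng = np.random.default_rng(2)
z = x + sigma * rng.standard_normal(200_000)
log_ratio = (-(z - x) ** 2 + (z - y) ** 2) / (2 * sigma ** 2)
kl_mc = log_ratio.mean()
```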
It is worth noting that our lower bounds on the minimax risk relate directly to the work in [4], which gives upper bounds for matrix completion problems under similar sparse factor models. The normalized (per-element) Frobenius error for the sparsity-penalized maximum likelihood estimator under a Gaussian noise model presented in [4] satisfies
$$\frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2} = O\left((\sigma \wedge A_{\max})^2\, \frac{n_1 r + k}{m}\, \log(n_1 \vee n_2)\right). \qquad (2.13)$$
A comparison of (2.13) to our result in equation (2.12) implies that the rate attained by the estimator presented in [4] is minimax optimal up to a logarithmic factor when there is (on average) at least one non-zero element per column of $A^*$ (i.e., $k \geq n_2$), and provided we make sufficient observations (i.e., $m \gg n_1 r + k$).
Another direct point of comparison to our result here is the low rank matrix completion problem with entry-wise observations considered in [22]. In particular, if we adapt the lower bounds obtained in Theorem 6 of that work to our setting, we observe that the risk involved in estimating rank-$r$ matrices that are sampled uniformly at random follows
$$\frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2} = \Omega\left((\sigma \wedge X_{\max})^2\, \frac{(n_1 \vee n_2) r}{m}\right) = \Omega\left((\sigma \wedge X_{\max})^2\, \frac{(n_1 + n_2) r}{m}\right), \qquad (2.14)$$
where the last equality follows from the fact that $n_1 \vee n_2 \geq (n_1 + n_2)/2$. If we consider non-sparse factor models (where $k = r n_2$), it can be seen that the product $X^* = D^* A^*$ is low-rank with $\mathrm{rank}(X^*) \leq r$, and our problem reduces to the one considered in [22] (with $m \geq (n_1 \vee n_2) r$, an assumption made in [22]). Under the conditions described above, and our assumption in Section 2.2.1 that $X_{\max} = \Theta(A_{\max})$, the lower bound given in (2.12) (or its counterpart for the large sample regime) coincides with (2.14). However, the introduction of sparsity brings additional structure which can be exploited in estimating the entries of $X^*$, thus decreasing the risk involved.
2.3.2 Additive Laplace Noise
The following corollary gives a lower bound on the minimax risk in settings where the observations $Y_S$ are corrupted with heavier-tailed noise; its proof is given in Appendix 2.6.3.
Corollary 2.2 (Lower bound for Laplacian noise). Suppose $Y_{i,j} = X^*_{i,j} + \xi_{i,j}$, where the $\xi_{i,j}$ are i.i.d. Laplace$(0, \tau)$, $\tau > 0$, $\forall (i,j) \in S$. There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \geq 2$, $r \in [n_1 \wedge n_2]$, and $r \leq k \leq n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys
$$R^*_{\mathcal{X}(r,k,A_{\max})} \geq C \cdot \min\left\{\Delta(k, n_2)\, A_{\max}^2,\ \gamma^2 \tau^{-2}\, \frac{n_1 r + k}{m}\right\}. \qquad (2.15)$$
When we compare the lower bounds obtained under this noise model to the results of the previous case, it can be readily seen that the overall error rates achieved are similar in both cases. Since the variance of a Laplace$(\tau)$ random variable is $2/\tau^2$, the leading term $\tau^{-2}$ in (2.15) is analogous to the $\sigma^2$ factor which appears in the error bound for Gaussian noise. Using (2.15), we can observe that the complexity-penalized maximum likelihood estimator described in [4] is minimax optimal up to a constant times a logarithmic factor, $\tau X_{\max} \log(n_1 \vee n_2)$, in the large sample regime, and when there is (on average) at least one non-zero element per column of $A^*$ (i.e., $k \geq n_2$).
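For Laplace noise with scale $b = 1/\tau$ (so the variance is $2/\tau^2$), the standard closed-form KL divergence between two equal-scale Laplace distributions is $\tau|x-y| + e^{-\tau|x-y|} - 1$. The sketch below verifies numerically that this lies under the quadratic bound $\tau^2 (x-y)^2/2$, consistent with the leading term $\tau^{-2}$ in (2.15).

```python
import numpy as np

def kl_laplace(x, y, tau):
    """KL between Laplace(x, 1/tau) and Laplace(y, 1/tau); the scale is
    b = 1/tau, so each variable has variance 2/tau^2."""
    u = tau * abs(x - y)
    return u + np.exp(-u) - 1.0

tau = 2.0
diffs = np.linspace(-3, 3, 601)
quad_bound = tau ** 2 * diffs ** 2 / 2          # (2.7) with mu_D = 1/tau
kl_vals = np.array([kl_laplace(d, 0.0, tau) for d in diffs])
```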
2.3.3 One-bit Observation Model
We consider here a scenario where the observations are quantized to a single bit, i.e., the observations $Y_{i,j}$ can take only binary values (either 0 or 1). Quantized observation models arise in many collaborative filtering applications where the user ratings are quantized to fixed levels, as well as in quantum physics, communication networks, etc. (see, e.g., discussions in [27, 36]).
For a given sampling set $S$, we consider the observations $Y_S$ to be conditionally (on $S$) independent random quantities defined by
$$Y_{i,j} = \mathbb{1}_{\{Z_{i,j} \geq 0\}}, \quad (i,j) \in S, \qquad (2.16)$$
where
$$Z_{i,j} = X^*_{i,j} - W_{i,j}.$$
Here the $\{W_{i,j}\}_{(i,j) \in S}$ are i.i.d. continuous zero-mean scalar noises having (bounded) probability density function $f(w)$ and cumulative distribution function $F(w)$ for $w \in \mathbb{R}$, and $\mathbb{1}_{\{A\}}$ is the indicator function which takes the value 1 when the event $A$ occurs (or is true) and zero otherwise. Our observations are thus quantized, corrupted versions of the true underlying matrix entries. Note that the independence of the $W_{i,j}$ implies that the elements $Y_{i,j}$ are also independent. Given this model, it can easily be seen that each $Y_{i,j}$, $(i,j) \in S$, is a Bernoulli random variable whose parameter is a function of the true parameter $X^*_{i,j}$ and the cumulative distribution function $F(\cdot)$. In particular, for any $(i,j) \in S$, we have $\Pr(Y_{i,j} = 1) = \Pr(W_{i,j} \leq X^*_{i,j}) = F(X^*_{i,j})$. Hence the joint pmf of the observations $Y_S \in \{0,1\}^{|S|}$ (conditioned on the underlying matrix entries) can be written as
$$p_{X^*_S}(Y_S) = \prod_{(i,j) \in S} \left[F(X^*_{i,j})\right]^{Y_{i,j}} \left[1 - F(X^*_{i,j})\right]^{1 - Y_{i,j}}. \qquad (2.17)$$
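This model is easy to instantiate. The sketch below uses logistic noise $W$ (an illustrative choice, for which $F$ is the logistic sigmoid) to generate one-bit observations from a stand-in parameter matrix; entries with large positive $X^*$ are observed as 1 with high probability, as expected.

```python
import numpy as np

def logistic_cdf(t):
    """F for the logistic noise choice: P(W <= t)."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(3)
n1, n2 = 50, 50
X = rng.uniform(-2, 2, (n1, n2))       # stand-in for X* = D*A*

# One-bit model: Y = 1{X - W >= 0} with i.i.d. logistic W,
# so that P(Y = 1) = F(X).
W = rng.logistic(size=(n1, n2))
Y = (X - W >= 0).astype(int)
```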
We will further assume that $F(r A_{\max}) < 1$ and $F(-r A_{\max}) > 0$, which will allow us to avoid some pathological scenarios in our analyses. In such settings, the following corollary gives a lower bound on the minimax risk; its proof appears in Appendix 2.6.4.
Corollary 2.3 (Lower bound for one-bit observation model). Suppose that the observations $Y_{i,j}$ are obtained as described in (2.16), where the $W_{i,j}$ are i.i.d. continuous zero-mean scalar random variables as described above, and define
$$c_{F, rA_{\max}} \triangleq \left(\sup_{|t| \leq r A_{\max}} \frac{1}{F(t)(1 - F(t))}\right)^{1/2} \left(\sup_{|t| \leq r A_{\max}} f^2(t)\right)^{1/2}. \qquad (2.18)$$
There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \geq 2$, $1 \leq r \leq (n_1 \wedge n_2)$, and $r \leq k \leq n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}(r, k, A_{\max})$ obeys
$$R^*_{\mathcal{X}(r,k,A_{\max})} \geq C \cdot \min\left\{\Delta(k, n_2)\, A_{\max}^2,\ \gamma^2 c_{F, rA_{\max}}^{-2}\, \frac{n_1 r + k}{m}\right\}. \qquad (2.19)$$
It worth commenting on the relevance of our result (in the linear sparsity regime) to the upper bounds established in [4], for the matrix completion problem under similar settings. The normalized (per element) error of the complexity penalized maximum likelihood estimator described in [4] obeys
h i kX − X∗k2 2 ! ! ! EYS b F c 1 n r + k = O F,rAmax + X2 1 log(n ∨ n ) , n n c0 c2 max m 1 2 1 2 F,rAmax F,rAmax (2.20) where Xmax (≥ 0) is the upper bound on the entries of the matrix to be estimated and 18 c0 is defined as F,rAmax
f 2(t) 0 cF,rAmax , inf . (2.21) |t|≤rAmax F (t)(1 − F (t))
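For a concrete noise choice, these two constants can be approximated on a grid. The sketch below does so for logistic noise, where $f = F(1-F)$, with `beta` standing in for $r A_{\max}$; the expressions follow the forms in (2.18) and (2.21), and the logistic choice is an illustrative assumption.

```python
import numpy as np

F = lambda t: 1.0 / (1.0 + np.exp(-t))     # logistic cdf
f = lambda t: F(t) * (1.0 - F(t))          # logistic pdf

beta = 2.0                                  # stands in for r * A_max
t = np.linspace(-beta, beta, 20001)

# c_{F, beta} per (2.18): product of the two square-rooted suprema.
c_F = np.sqrt(np.max(1.0 / (F(t) * (1.0 - F(t))))) * np.sqrt(np.max(f(t) ** 2))
# c'_{F, beta} per (2.21): infimum of f^2 / (F (1 - F)).
c_Fp = np.min(f(t) ** 2 / (F(t) * (1.0 - F(t))))
```

For the logistic choice the ratio $c_{F}^2 / c'_{F}$ stays bounded for moderate `beta`, which is the condition under which the rates above match up to logarithmic factors.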
Comparing (2.20) with the lower bound established in (2.19), we can see that the estimator described in [4] is minimax optimal up to a logarithmic factor (in the large sample regime, and with $k \geq n_2$) when the term $(c_{F, rA_{\max}}^2 / c'_{F, rA_{\max}})$ is bounded above by a constant. The lower bounds obtained for the one-bit observation model and the Gaussian case essentially exhibit the same dependence on the matrix dimensions ($n_1$, $n_2$, and $r$), sparsity ($k$), and the nominal number of measurements ($m$), except for the leading term (which explicitly depends on the distribution of the noise variables $W_{i,j}$ in the one-bit case). Such a correspondence in error rates between rate-constrained tasks and their Gaussian counterparts was observed in earlier works on rate-constrained parameter estimation [37, 38].
It is also interesting to compare our result with the lower bounds for the one-bit (low rank) matrix completion problem considered in [27]. In that work, the authors establish that the risk involved in matrix completion over a (convex) set of max-norm and nuclear norm constrained matrices (with a noise pdf $f(t)$ decreasing for $t > 0$) obeys
$$\frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2} = \Omega\left(X_{\max} \sqrt{\frac{1}{c'_{F, rA_{\max}}}} \sqrt{\frac{(n_1 \vee n_2) r}{m}}\right) = \Omega\left(X_{\max} \sqrt{\frac{1}{c'_{F, rA_{\max}}}} \sqrt{\frac{(n_1 + n_2) r}{m}}\right), \qquad (2.22)$$
where $c'_{F, rA_{\max}}$ is defined as in (2.21). As long as $c_{F, rA_{\max}}^2$ and $c'_{F, rA_{\max}}$ are comparable, the leading terms of our bound and (2.22) are analogous to each other. In order to note the difference between this result and ours, we consider the case when $A^*$ is not sparse, i.e., we set $k = r n_2$ in (2.19) so that the resulting matrix $X^*$ is low-rank (with $\mathrm{rank}(X^*) \leq r$). For such a setting, our error bound (2.19) scales in proportion to the ratio of the degrees of freedom $(n_1 + n_2) r$ and the nominal number of observations $m$, while the bound in [27] scales with the square root of that ratio.
A more recent work [39] proposed an estimator for low-rank matrix completion on finite alphabets and establishes convergence rates faster than those in [27]. On casting their results to our settings, the estimation error in [39] was shown to obey
$$\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} = O\left(\left(\frac{c_{F, rA_{\max}}^2}{c'_{F, rA_{\max}}}\right)^2 \frac{(n_1 + n_2) r}{m}\, \log(n_1 + n_2)\right). \qquad (2.23)$$
On comparing (2.23) with our lower bounds (for the low-rank case, where $k = r n_2$), it is worth noting that their estimator achieves minimax optimal rates up to a logarithmic factor when the ratio $(c_{F, rA_{\max}}^2 / c'_{F, rA_{\max}})$ is bounded above by a constant.
2.3.4 Poisson-distributed Observations
Let us now consider a scenario where the data may be observed as discrete 'counts' (which is common in imaging applications, e.g., the number of photons hitting the receiver per unit time). A popular model for such settings is the Poisson model, where all the entries of the matrix $X^*$ to be estimated are positive and our observation $Y_{i,j}$ at each location $(i,j) \in S$ is an independent Poisson random variable with rate parameter $X^*_{i,j}$. The problem of matrix completion now involves the task of Poisson denoising. Unlike the previous cases, this problem cannot be directly cast into the setting of our general result, as there is an additional restriction on the model class that the entries of $X^*$ are strictly bounded away from zero. A straightforward observation that follows is that the sparse factor $A^*$ in the factorization cannot have any zero-valued columns. Hence $k \geq n_2$ must be satisfied as a necessary (but not sufficient) condition in this case. The approach we use to derive the following result is similar in spirit to the previous cases and is described in Appendix 2.6.5.
Theorem 2.2 (Lower bound for Poisson noise). Suppose that the entries of the matrix $X^*$ satisfy $\min_{i,j} X^*_{i,j} \geq X_{\min}$ for some constant $0 < X_{\min} \leq A_{\max}$, and the observations $Y_{i,j}$ are independent Poisson-distributed random variables with rates $X^*_{i,j}$ $\forall (i,j) \in S$. There exist absolute constants $C, \gamma > 0$ such that for all $n_1, n_2 \geq 2$, $r \in [n_1 \wedge n_2]$, and $n_2 \leq k \leq n_1 n_2/2$, the minimax risk for sparse factor matrix completion over the model class $\mathcal{X}'(r, k, A_{\max}, X_{\min})$, which is the subset of $\mathcal{X}(r, k, A_{\max})$ comprised of matrices with positive entries, obeys
$$R^*_{\mathcal{X}'(r,k,A_{\max},X_{\min})} \geq C \cdot \min\left\{\widetilde{\Delta}(k, n_2, \delta)\, A_{\max}^2,\ \gamma^2 X_{\min}\, \frac{n_1 r + k - n_2}{m}\right\}, \qquad (2.24)$$
where $\delta = \frac{X_{\min}}{A_{\max}}$, and the function $\widetilde{\Delta}(k, n_2, \delta)$ is given by
$$\widetilde{\Delta}(k, n_2, \delta) = \min\left\{(1 - \delta)^2,\ \frac{k - n_2}{n_2}\right\}. \qquad (2.25)$$
As in the previous cases, our analysis rests on establishing quadratic upper bounds on the KL divergence to obtain parametric error rates for the minimax risk; a similar approach was used in [40], which describes performance bounds on a compressive sensing sparse signal estimation task under a Poisson noise model, and in [41]. Recall that the lower bounds for each of the preceding cases exhibited a leading factor to the parametric rate, which was essentially the noise variance. Note that for a Poisson observation model, the noise variance equals the rate parameter and hence depends on the true underlying matrix entry. So we might interpret the factor $X_{\min}$ in (2.24) as the minimum variance of all the independent (but not necessarily identically distributed) Poisson observations; it is hence somewhat analogous to the results presented for the Gaussian and Laplace noise models. The dependence of the minimax risk on the nominal number of observations ($m$), matrix dimensions ($n_1$, $n_2$, $r$), and sparsity factor $k$ is encapsulated in the two terms, $\frac{n_1 r}{m}$ and $\frac{k - n_2}{m}$. The first term, which corresponds to the error associated with the dictionary term $D^*$, is exactly the same as with the previous noise models. However, we can see that the term associated with the sparse factor $A^*$ is a bit different from the other models discussed. In a Poisson-distributed observation model, we have that the entries of the true underlying matrix to be estimated are positive (each entry also serving as the Poisson rate parameter for the corresponding observation $Y_{i,j}$). A necessary implication of this is that the sparse factor $A^*$ should contain no zero-valued columns, i.e., every column should have at least one non-zero entry (and hence we have $k \geq n_2$). This reduces the effective number of degrees of freedom (as described in Section 2.3.1) in the sparse factor from $k$ to $k - n_2$, thus reducing the overall minimax risk.
It is worth further commenting on the relevance of this result (in the large sample regime) to the work in [4], which establishes error bounds for Poisson denoising problems with sparse factor models. From Corollary III.3 of [4], we see that the normalized (per-element) error of the complexity-penalized maximum likelihood estimator obeys
$$\frac{\mathbb{E}_{Y_S}\left[\|\widehat{X} - X^*\|_F^2\right]}{n_1 n_2} = O\left(\left(X_{\max} + \frac{X_{\max}}{X_{\min}} \cdot X_{\max}^2\right) \frac{n_1 r + k}{m}\, \log(n_1 \vee n_2)\right), \qquad (2.26)$$
where $X_{\max}$ is the upper bound on the entries of the matrix to be estimated. Comparing (2.26) with the lower bound established in (2.24) (or again, its counterpart in the large sample regime), we can see that the estimator described in [4] is minimax optimal w.r.t. the matrix dimension parameters up to a logarithmic factor (neglecting the leading constants) when $k \geq 2n_2$.
We comment a bit on our assumption that the elements of the true underlying matrix $X^*$ be greater than or equal to some $X_{\min} > 0$. Here, this parameter shows up as a multiplicative term on the parametric rate $\left(\frac{n_1 r + k - n_2}{m}\right)$, which suggests that the minimax risk vanishes to 0 at the rate of $X_{\min}/m$ (when the problem dimensions are fixed). This implication is in agreement with the classical Cramér-Rao lower bound, which states that the variance associated with estimating a Poisson($\theta$) random variable using $m$ i.i.d. observations decays at the rate $\theta/m$ (and is achieved by a sample average estimator). Thus our notion that the denoising problem becomes easier as the rate parameter decreases is intuitive and consistent with classical analyses. On this note, we briefly mention recent efforts which do not make assumptions on the minimum rate of the underlying Poisson processes: for matrix estimation tasks as here, [29]; and for sparse vector estimation from Poisson-distributed compressive observations, [42].
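The classical fact invoked here is easy to check empirically: the sketch below estimates the variance of the sample-mean estimator of a Poisson rate over many repeated trials and compares it with $\theta/m$ (the parameter values are arbitrary).

```python
import numpy as np

theta, m_obs, trials = 5.0, 100, 20_000
rng = np.random.default_rng(4)

# Variance of the sample-mean estimator of a Poisson rate from m_obs
# i.i.d. draws, estimated over many trials; the Cramer-Rao bound
# predicts theta / m_obs, which the sample mean achieves.
samples = rng.poisson(theta, size=(trials, m_obs))
estimates = samples.mean(axis=1)
empirical_var = estimates.var()
crlb = theta / m_obs
```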
2.4 Conclusion
In this chapter, we established minimax lower bounds for sparse factor matrix completion tasks under very general noise/corruption models. We also provided lower bounds for several specific noise distributions that fall under our general noise model. This indicates that property (2.7), which requires that the scalar KL divergences of the noise distribution admit a quadratic upper bound in terms of the underlying parameters, is not overly restrictive in many interesting scenarios. A unique aspect of our analysis is its applicability to matrices representable as a product of structured factors. While our focus here was specifically on models in which one factor is sparse, the approach we utilize here to construct packing sets extends naturally to other structured factor models (of which standard low-rank models are one particular case). A similar analysis to that utilized here could also be used to establish lower bounds on estimation of structured tensors, for example, those expressible in a Tucker decomposition with a sparse core and possibly structured factor matrices (see, e.g., [43] for a discussion of Tucker models). We defer investigations along these lines to a future effort.
2.5 Acknowledgement
We acknowledge support for this effort from the DARPA Young Faculty Award, Grant No. N66001-14-1-4047. We are grateful to the anonymous reviewer of our paper [3] for their detailed and thorough evaluations of the paper. In particular, we thank the reviewer for pointing out some subtle errors in the initial versions of our main results that motivated us to obtain tighter lower bounds.
2.6 Appendix
In order to prove Theorem 2.1, we use standard minimax analysis techniques; in particular, we rely on the following theorem (whose proof is available in [34]).
Theorem 2.3 (Adapted from Theorem 2.5 in [34]). Assume that $M \ge 2$ and suppose that there exists a set with finite elements, $\mathcal{X} = \{X_0, X_1, \ldots, X_M\} \subset \mathcal{X}(r, k, A_{\max})$, such that

• $d(X_j, X_k) \ge 2s$ for all $0 \le j < k \le M$, where $d(\cdot, \cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a semi-distance function, and

• $\frac{1}{M}\sum_{j=1}^{M} K(P_{X_j}, P_{X_0}) \le \alpha \log M$ with $0 < \alpha < 1/8$.

Then
$$\inf_{\widehat{X}} \sup_{X \in \mathcal{X}(r,k,A_{\max})} P_X\big(d(\widehat{X}, X) \ge s\big) \;\ge\; \inf_{\widehat{X}} \sup_{X \in \mathcal{X}} P_X\big(d(\widehat{X}, X) \ge s\big) \;\ge\; \frac{\sqrt{M}}{1+\sqrt{M}}\left(1 - 2\alpha - \sqrt{\frac{2\alpha}{\log M}}\right) > 0. \tag{2.27}$$
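For concreteness, the probability constant on the right-hand side of (2.27) is easy to tabulate. The sketch below (an illustrative aid, not part of the proof) evaluates it:

```python
import math

def tsybakov_probability_bound(M, alpha):
    """Evaluate the right-hand side of (2.27):
    sqrt(M)/(1 + sqrt(M)) * (1 - 2*alpha - sqrt(2*alpha / log M)).
    Requires M >= 2 and 0 < alpha < 1/8, as in Theorem 2.3."""
    assert M >= 2 and 0.0 < alpha < 0.125
    s = math.sqrt(M)
    return (s / (1.0 + s)) * (1.0 - 2.0 * alpha - math.sqrt(2.0 * alpha / math.log(M)))
```

For a packing of cardinality $M = 2^{k/8}$ with, say, $k = 64$ (so $M = 256$) and $\alpha = 0.05$, the bound evaluates to roughly 0.72, so the minimax error probability is bounded away from zero by an absolute constant.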
Here the first inequality arises from the fact that the supremum over a class of matrices $\mathcal{X}$ is upper bounded by that over the larger class $\mathcal{X}(r, k, A_{\max})$ (in other words, estimating the matrix over an uncountably infinite class is at least as difficult as solving the problem over any finite subclass). We thus reduce the problem of matrix completion over the uncountably infinite set $\mathcal{X}(r, k, A_{\max})$ to a carefully chosen finite collection of matrices $\mathcal{X} \subset \mathcal{X}(r, k, A_{\max})$, and lower bound the latter, which then gives a valid bound for the overall problem. In order to obtain tight lower bounds, it is essential to carefully construct the class $\mathcal{X}$ (commonly called a packing set) with a large cardinality, such that its elements are also as far apart as possible in terms of the (normalized) Frobenius distance, which is our choice of semi-distance metric.
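The kind of packing set used below can be illustrated on a toy scale. The following brute-force sketch (purely illustrative; the proofs instead invoke the Varshamov-Gilbert bound, Lemma 2.9 in [34], which guarantees that large packings exist without constructing them) greedily collects binary strings with a prescribed pairwise Hamming distance:

```python
import itertools

def hamming(u, v):
    """Hamming distance between two equal-length binary tuples."""
    return sum(a != b for a, b in zip(u, v))

def greedy_packing(n, min_dist):
    """Greedily collect binary strings of length n whose pairwise Hamming
    distance is at least min_dist.  Brute force, so only feasible for tiny n;
    it serves only to illustrate that such packings can be exponentially large."""
    packing = []
    for cand in itertools.product((0, 1), repeat=n):
        if all(hamming(cand, c) >= min_dist for c in packing):
            packing.append(cand)
    return packing
```

For $n = 8$ and minimum distance 2, this greedy sweep recovers the even-weight code with $2^{n-1} = 128$ elements, already exponentially large in $n$, in the spirit of the cardinality guarantees (e.g., $\mathrm{Card}(\mathcal{X}'_A) \ge 2^{k/8} + 1$) used in the proofs below.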
2.6.1 Proof of Theorem 2.1
Let us define a class of matrices $\mathcal{X} \subset \mathbb{R}^{n_1 \times n_2}$ as
$$\mathcal{X} \triangleq \{X = DA : D \in \mathcal{D}, A \in \mathcal{A}\}, \tag{2.28}$$
where the factor classes $\mathcal{D} \subset \mathbb{R}^{n_1 \times r}$ and $\mathcal{A} \subset \mathbb{R}^{r \times n_2}$ are constructed as follows for $\gamma_d, \gamma_a \le 1$ (to be quantified later):
$$\mathcal{D} \triangleq \left\{D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, 1, d_0\}, \ \forall (i,j) \in [n_1] \times [r]\right\}, \tag{2.29}$$
and
$$\mathcal{A} \triangleq \left\{A \in \mathbb{R}^{r \times n_2} : A_{i,j} \in \{0, A_{\max}, a_0\}, \ \forall (i,j) \in [r] \times [n_2], \ \|A\|_0 \le k\right\}, \tag{2.30}$$
where
$$d_0 \triangleq \min\left\{1, \ \frac{\gamma_d \cdot \mu_D}{A_{\max}}\left(\frac{n_1 r}{\Delta(k, n_2)\, m}\right)^{1/2}\right\}, \qquad a_0 \triangleq \min\left\{A_{\max}, \ \gamma_a \cdot \mu_D\left(\frac{k}{\Delta(k, n_2)\, m}\right)^{1/2}\right\},$$
and $\Delta(k, n_2)$ is defined in (2.9). Clearly $\mathcal{X}$ as defined in (2.28) is a finite class of matrices which admits a factorization as in Section 2.2.1, so $\mathcal{X} \subset \mathcal{X}(r, k, A_{\max})$. We consider the lower bounds involving the non-sparse factor $D$ and the sparse factor $A$ separately, and then combine those results to get an overall lower bound on the minimax risk $R^*_{\mathcal{X}(r,k,A_{\max})}$. Let us first establish the lower bound obtained by using the sparse factor $A$. In order to do this, we define a set of sparse matrices $\bar{\mathcal{A}} \subset \mathcal{A}$, where all the nonzero entries are stacked in the first $r' = \lceil k/n_2 \rceil$ rows. Formally, we define
$$\bar{\mathcal{A}} \triangleq \left\{A \in \mathbb{R}^{r \times n_2} : A = (A_{nz} \,|\, \mathbf{0}_A)^T, \ (A_{nz})_{i,j} \in \{0, a_0\}, \ \forall (i,j) \in [n_2] \times [r'], \ \|A_{nz}\|_0 \le k\right\}, \tag{2.31}$$
where $A_{nz}$ is an $n_2 \times r'$ sparse matrix with at most $k$ nonzeros, and $\mathbf{0}_A$ is an $n_2 \times (r - r')$ zero matrix. Let us now define the finite class of matrices $\mathcal{X}_A \subset \mathcal{X}$ as
$$\mathcal{X}_A \triangleq \left\{X = D_I A : A \in \bar{\mathcal{A}}\right\}, \tag{2.32}$$
where $D_I$ is made up of block zero and block identity matrices, defined as
$$D_I \triangleq \begin{pmatrix} I_{r'} & \mathbf{0} \\ \vdots & \vdots \\ I_{r'} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}, \tag{2.33}$$
where $I_{r'}$ denotes the $r' \times r'$ identity matrix.
The definitions in equations (2.31) to (2.33) imply that the elements of $\mathcal{X}_A$ form block matrices of the form $\left(A_{nz}^T \ \cdots \ A_{nz}^T \ \mathbf{0}\right)^T$, with $\lfloor n_1/r' \rfloor$ blocks of $A_{nz}$ for each $A \in \bar{\mathcal{A}}$, and the rest a zero matrix of the appropriate dimension. Since the entries of $A_{nz}$ can take only one of two values, $0$ or $a_0$, and since there are at most $k$ nonzero elements (due to the sparsity constraint), the Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) guarantees the existence of a subset $\mathcal{X}'_A \subseteq \mathcal{X}_A$ with cardinality $\mathrm{Card}(\mathcal{X}'_A) \ge 2^{k/8} + 1$, containing the $n_1 \times n_2$ zero matrix $\mathbf{0}$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_A$ we have
$$\begin{aligned}
\|X_1 - X_2\|_F^2 &\ge \frac{k}{8}\left\lfloor\frac{n_1}{r'}\right\rfloor a_0^2 \\
&= \frac{k}{8}\left\lfloor\frac{n_1}{\lceil k/n_2 \rceil}\right\rfloor \min\left\{A_{\max}^2, \ \frac{\gamma_a^2 \mu_D^2}{\Delta(k, n_2)}\frac{k}{m}\right\} \\
&\ge \frac{n_1 n_2}{32}\underbrace{\frac{k \wedge n_2}{n_2}}_{=\Delta(k, n_2)}\min\left\{A_{\max}^2, \ \frac{\gamma_a^2 \mu_D^2}{\Delta(k, n_2)}\frac{k}{m}\right\} \\
&= \frac{n_1 n_2}{32}\min\left\{\Delta(k, n_2) A_{\max}^2, \ \gamma_a^2 \mu_D^2 \frac{k}{m}\right\},
\end{aligned} \tag{2.34}$$
where the second-to-last inequality comes from the facts that $k \le (n_1 n_2)/2$ and $\lfloor x \rfloor \ge x/2$ for all $x \ge 1$.

For any $X \in \mathcal{X}'_A$, consider the KL divergence of $P_0$ from $P_X$:
$$K(P_X, P_0) = \mathbb{E}_X\left[\log\frac{p_{X_S}(Y_S)}{p_0(Y_S)}\right] = \frac{m}{n_1 n_2}\sum_{i,j} K(P_{X_{i,j}}, P_{0_{i,j}}) \tag{2.35}$$
$$\le \frac{m}{n_1 n_2}\sum_{i,j}\frac{1}{2\mu_D^2}|X_{i,j}|^2, \tag{2.36}$$
where (2.35) is obtained by conditioning² the expectation with respect to the sampling set $S$, and (2.36) follows from the assumption (2.7) on the noise model. To further upper bound the RHS of (2.36), we note that the maximum number of nonzero entries in any $X \in \mathcal{X}'_A$ is at most $n_1(k \wedge n_2)$, by the construction of the sets $\mathcal{X}_A$ and $\bar{\mathcal{A}}$ in (2.32) and (2.31), respectively. Hence we have
$$K(P_X, P_0) \le \frac{m}{2\mu_D^2}\underbrace{\frac{k \wedge n_2}{n_2}}_{=\Delta(k, n_2)} a_0^2 = \frac{m}{2\mu_D^2}\min\left\{\Delta(k, n_2) A_{\max}^2, \ \gamma_a^2 \mu_D^2 \frac{k}{m}\right\}. \tag{2.37}$$
²Here, both the observations $Y_S$ and the sampling set $S$ are random quantities. Thus, by conditioning with respect to $S$, we get $\mathbb{E}_X\left[\log\left(p_{X_S}(Y_S)/p_0(Y_S)\right)\right] = \mathbb{E}_S\left[\mathbb{E}_{X_S|S}\left[\log\left(p_{X_S}(Y_S)/p_0(Y_S)\right)\right]\right]$. Since $S$ is generated according to the independent Bernoulli$(m/n_1 n_2)$ model, $\mathbb{E}_S[\cdot]$ yields the constant term $\frac{m}{n_1 n_2}$. We shall use such conditioning techniques in subsequent proofs as well.

From (2.37) we see that
$$\frac{1}{\mathrm{Card}(\mathcal{X}'_A) - 1}\sum_{X \in \mathcal{X}'_A} K(P_X, P_0) \le \alpha \log\left(\mathrm{Card}(\mathcal{X}'_A) - 1\right) \tag{2.38}$$
is satisfied for any $0 < \alpha < 1/8$ by choosing $0 < \gamma_a < \frac{\sqrt{\alpha \log 2}}{2}$. Equations (2.34) and (2.38) imply that we can apply Theorem 2.3 (with the normalized Frobenius error used as the semi-distance function) to yield
$$\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_A} P_{X^*}\left(\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{\Delta(k, n_2) A_{\max}^2, \ \gamma_a^2 \mu_D^2 \frac{k}{m}\right\}\right) \ge \beta, \tag{2.39}$$
for some absolute constant $\beta \in (0, 1)$. We now consider the non-sparse factor $D$ to construct a testing set and establish lower bounds similar to the previous case. Let us define a finite class of matrices $\mathcal{X}_D \subseteq \mathcal{X}$ as
$$\mathcal{X}_D \triangleq \left\{X = DA : D \in \bar{\mathcal{D}}, \ A = A_{\max}\left(I_r \ \cdots \ I_r \ \mathbf{0}_D\right) \in \mathcal{A}\right\}, \tag{2.40}$$
where $A$ is constructed with $\lfloor (k \wedge n_2)/r \rfloor$ blocks of $r \times r$ identity matrices (denoted by $I_r$), $\mathbf{0}_D$ is the $r \times \left(n_2 - r\left\lfloor\frac{k \wedge n_2}{r}\right\rfloor\right)$ zero matrix, and $\bar{\mathcal{D}} \subseteq \mathcal{D}$ is defined as
$$\bar{\mathcal{D}} \triangleq \left\{D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, d_0\}, \ \forall (i,j) \in [n_1] \times [r]\right\}. \tag{2.41}$$
The definition in (2.40) is similar to the one we used to construct $\mathcal{X}_A$, and hence it results in a block matrix structure for the elements of $\mathcal{X}_D$. We note here that there are $n_1 r$ elements in each block $D$, where each entry can be either $0$ or $d_0$. Hence the Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) guarantees the existence of a subset $\mathcal{X}'_D \subseteq \mathcal{X}_D$ with cardinality $\mathrm{Card}(\mathcal{X}'_D) \ge 2^{n_1 r/8} + 1$, containing the $n_1 \times n_2$ zero matrix $\mathbf{0}$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_D$ we have
$$\begin{aligned}
\|X_1 - X_2\|_F^2 &\ge \frac{n_1 r}{8}\left\lfloor\frac{k \wedge n_2}{r}\right\rfloor A_{\max}^2 d_0^2 \\
&\ge \frac{n_1 n_2}{16}\min\left\{A_{\max}^2 \Delta(k, n_2), \ \gamma_d^2 \mu_D^2 \frac{n_1 r}{m}\right\},
\end{aligned} \tag{2.42}$$
where we use the facts that $(k \wedge n_2) = n_2 \cdot \Delta(k, n_2)$ and $\lfloor x \rfloor \ge x/2$ for all $x \ge 1$ to obtain the last inequality.

For any $X \in \mathcal{X}'_D$, consider the KL divergence of $P_0$ from $P_X$:
$$K(P_X, P_0) = \mathbb{E}_X\left[\log\frac{p_{X_S}(Y_S)}{p_0(Y_S)}\right] \le \frac{m}{n_1 n_2}\sum_{i,j}\frac{1}{2\mu_D^2}|X_{i,j}|^2, \tag{2.43}$$
where the inequality follows from the assumption (2.7) on the noise model. To further upper bound the RHS of (2.43), we note that the maximum number of nonzero entries in any $X \in \mathcal{X}'_D$ is at most $n_1(k \wedge n_2)$, by the construction of the class $\mathcal{X}_D$ in (2.40). Hence we have
$$K(P_X, P_0) \le \frac{m}{2\mu_D^2}\underbrace{\frac{k \wedge n_2}{n_2}}_{=\Delta(k, n_2)} A_{\max}^2 d_0^2 = \frac{m}{2\mu_D^2}\min\left\{\Delta(k, n_2) A_{\max}^2, \ \gamma_d^2 \mu_D^2 \frac{n_1 r}{m}\right\}. \tag{2.44}$$
From (2.44) we see that
$$\frac{1}{\mathrm{Card}(\mathcal{X}'_D) - 1}\sum_{X \in \mathcal{X}'_D} K(P_X, P_0) \le \alpha' \log\left(\mathrm{Card}(\mathcal{X}'_D) - 1\right) \tag{2.45}$$
is satisfied for any $0 < \alpha' < 1/8$ by choosing $0 < \gamma_d < \frac{\sqrt{\alpha' \log 2}}{2}$. Equations (2.42) and (2.45) imply that we can apply Theorem 2.3 (with the normalized Frobenius error used as the semi-distance function) to yield
$$\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_D} P_{X^*}\left(\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{\Delta(k, n_2) A_{\max}^2, \ \gamma_d^2 \mu_D^2 \frac{n_1 r}{m}\right\}\right) \ge \beta', \tag{2.46}$$
for some absolute constant $\beta' \in (0, 1)$. Inequalities (2.39) and (2.46) imply the result
$$\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}(r,k,A_{\max})} P_{X^*}\left(\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{128} \cdot \min\left\{\Delta(k, n_2) A_{\max}^2, \ \gamma_D^2 \mu_D^2 \frac{n_1 r + k}{m}\right\}\right) \ge (\beta' \wedge \beta), \tag{2.47}$$
where $\gamma_D = (\gamma_d \wedge \gamma_a)$ is a suitable value for the leading constant, and we have $(\beta' \wedge \beta) \in (0, 1)$. In order to obtain this result for the entire class $\mathcal{X}(r, k, A_{\max})$, we use the fact that solving the matrix completion problem described in Section 2.2.1 over a larger (and possibly uncountable) class of matrices is at least as difficult as solving the same problem over a smaller (and possibly finite) subclass. Applying Markov's inequality to (2.47) directly yields the result of Theorem 2.1, completing the proof.
2.6.2 Proof of Corollary 2.1
For a Gaussian distribution with mean $x \in \mathbb{R}$ and variance $\sigma^2$, denoted by $P_x \sim \mathcal{N}(x, \sigma^2)$, we have
$$p_x(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(z - x)^2}{2\sigma^2}\right), \quad \forall z \in \mathbb{R}. \tag{2.48}$$
Using the expression (2.48) for the pdf of a Gaussian random variable, the KL divergence of $P_x$ from $P_y$ (for any $y \in \mathbb{R}$) satisfies
$$K(P_x, P_y) = \mathbb{E}_x\left[\log\frac{p_x(z)}{p_y(z)}\right] = \frac{1}{2\sigma^2}(x - y)^2. \tag{2.49}$$
The expression (2.49) for the KL divergence between scalar Gaussian distributions (with identical variances) obeys condition (2.7) with equality, where $\mu_D = \sigma$. Hence we directly appeal to Theorem 2.1 to yield the desired result.
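As a quick numerical check of the closed form (2.49), the sketch below (illustrative only, not part of the proof) compares it against direct numerical integration of the divergence:

```python
import math

def gaussian_kl(x, y, sigma):
    """Closed-form KL divergence K(P_x, P_y) = (x - y)^2 / (2 sigma^2)
    for P_x = N(x, sigma^2) and P_y = N(y, sigma^2), as in (2.49)."""
    return (x - y) ** 2 / (2.0 * sigma ** 2)

def gaussian_kl_numeric(x, y, sigma, lo=-25.0, hi=25.0, n=60000):
    """Midpoint-rule evaluation of E_x[log(p_x(z) / p_y(z))] for comparison."""
    def log_pdf(z, mu):
        return -((z - mu) ** 2) / (2.0 * sigma ** 2) - 0.5 * math.log(2.0 * math.pi * sigma ** 2)

    dz = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        total += math.exp(log_pdf(z, x)) * (log_pdf(z, x) - log_pdf(z, y)) * dz
    return total
```

For example, with $x = 1.5$, $y = 0.3$, $\sigma = 2$, both routes give $(1.2)^2/8 = 0.18$ up to discretization error.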
2.6.3 Proof of Corollary 2.2
For a Laplace distribution with parameter $\tau > 0$ centered at $x \in \mathbb{R}$, denoted by $P_x \sim \mathrm{Laplace}(x, \tau)$, the KL divergence of $P_x$ from $P_y$ (for any $y \in \mathbb{R}$) can be computed by a (relatively) straightforward calculation as
$$K(P_x, P_y) = \mathbb{E}_x\left[\log\frac{p_x(z)}{p_y(z)}\right] = \tau|x - y| - 1 + e^{-\tau|x - y|}. \tag{2.50}$$
Using a series expansion of the exponential in (2.50) we have
$$e^{-\tau|x - y|} = 1 - \tau|x - y| + \frac{(\tau|x - y|)^2}{2!} - \frac{(\tau|x - y|)^3}{3!} + \cdots \le 1 - \tau|x - y| + \frac{(\tau|x - y|)^2}{2!}. \tag{2.51}$$
Rearranging the terms in (2.51) yields the result,
$$K(P_x, P_y) \le \frac{\tau^2}{2}(x - y)^2. \tag{2.52}$$
With the upper bound on the KL divergence established in (2.52), we directly appeal to Theorem 2.1 with $\mu_D = \tau^{-1}$ to yield the desired result.
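The quadratic bound (2.52) on the Laplace KL divergence (2.50) is easy to verify numerically; the following sketch is illustrative only:

```python
import math

def laplace_kl(x, y, tau):
    """KL divergence between Laplace(x, tau) and Laplace(y, tau):
    K = tau|x - y| - 1 + exp(-tau|x - y|), as in (2.50)."""
    d = tau * abs(x - y)
    return d - 1.0 + math.exp(-d)

def laplace_kl_quadratic_bound(x, y, tau):
    """Quadratic upper bound (tau^2 / 2) (x - y)^2 from (2.52)."""
    return 0.5 * tau ** 2 * (x - y) ** 2
```

Sweeping a grid of offsets and rate parameters confirms that the divergence is nonnegative and never exceeds the quadratic bound.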
2.6.4 Proof of Corollary 2.3
For any $X, X^* \in \mathcal{X}(r, k, A_{\max})$, using the pdf model described in (2.17), it is straightforward to show that the scalar KL divergence is given by
$$K(P_{X^*_{i,j}}, P_{X_{i,j}}) = F(X^*_{i,j})\log\frac{F(X^*_{i,j})}{F(X_{i,j})} + \left(1 - F(X^*_{i,j})\right)\log\frac{1 - F(X^*_{i,j})}{1 - F(X_{i,j})},$$
for any $(i,j) \in S$. We directly use an intermediate result from [4] to invoke a quadratic upper bound for the KL divergence term,
$$K(P_{X^*_{i,j}}, P_{X_{i,j}}) \le \frac{1}{2}c^2_{F, rA_{\max}}\left(X^*_{i,j} - X_{i,j}\right)^2, \tag{2.53}$$
where $c_{F, rA_{\max}}$ is defined in (2.18). Such an upper bound in terms of the underlying matrix entries can be attained by following a procedure illustrated in [36], where one first establishes quadratic bounds on the KL divergence in terms of the Bernoulli parameters, and then subsequently establishes a bound on the squared difference between Bernoulli parameters in terms of the squared difference of the underlying matrix elements. With a quadratic upper bound (2.53) on the scalar KL divergences in terms of the underlying matrix entries, we directly appeal to the result of Theorem 2.1 with $\mu_D = c^{-1}_{F, rA_{\max}}$ to yield the desired result.

2.6.5 Proof of Theorem 2.2
The Poisson observation model considered here assumes that all the entries of the underlying matrix $X^*$ are strictly nonzero. We will use techniques similar to those in Appendix 2.6.1 to derive the result for this model. However, we need to be careful while constructing the sample class of matrices, as we need to ensure that all the entries of its members are strictly bounded away from zero (and in fact $\ge X_{\min}$). In this proof sketch, we will show how an appropriate packing set can be constructed for this problem, and obtain lower bounds using arguments as in Appendix 2.6.1. As before, let us first fix $D$ and establish the lower bounds due to the sparse factor $A$ alone. For $\gamma_d, \gamma_a \le 1$ (to be quantified later), we construct the factor classes $\mathcal{D} \subset \mathbb{R}^{n_1 \times r}$ and $\mathcal{A} \subset \mathbb{R}^{r \times n_2}$ as
$$\mathcal{D} \triangleq \left\{D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, 1, \delta, d_0\}, \ \forall (i,j) \in [n_1] \times [r]\right\}, \tag{2.54}$$
and
$$\mathcal{A} \triangleq \left\{A \in \mathbb{R}^{r \times n_2} : A_{i,j} \in \{0, X_{\min}, A_{\max}, a_0\}, \ \forall (i,j) \in [r] \times [n_2], \ \|A\|_0 \le k\right\}, \tag{2.55}$$
where
$$\delta \triangleq \frac{X_{\min}}{A_{\max}}, \qquad d_0 \triangleq \min\left\{1 - \delta, \ \frac{\gamma_d \cdot \sqrt{X_{\min}}}{A_{\max}}\left(\frac{n_1 r}{m}\right)^{1/2}\right\}, \qquad a_0 \triangleq \min\left\{A_{\max}, \ \gamma_a \cdot \sqrt{\frac{X_{\min}}{\Delta(k - n_2, n_2)}}\left(\frac{k - n_2}{m}\right)^{1/2}\right\},$$
and $\Delta(\cdot, \cdot)$ is defined in (2.9). Similar to the previous case, we consider a subclass $\bar{\mathcal{A}} \subset \mathcal{A}$ with at most $k$ nonzero entries, which are all stacked in the first $r' + 1 = \lceil k/n_2 \rceil$ rows, such that for all $A \in \bar{\mathcal{A}}$ we have
$$(A)_{i,j} = \begin{cases} X_{\min} & \text{for } i = 1, \ j \in [n_2], \\ (A)_{i,j} \in \{0, a_0\} & \text{for } 2 \le i \le r' + 1, \ j \in [n_2], \\ 0 & \text{otherwise}. \end{cases} \tag{2.56}$$
Now we define a finite class of matrices $\mathcal{X}_A \subset \mathbb{R}^{n_1 \times n_2}$ as
$$\mathcal{X}_A \triangleq \left\{X = (D_0 + D_I)A : A \in \bar{\mathcal{A}}\right\}, \tag{2.57}$$
where $D_I, D_0 \in \mathcal{D}$ are defined as
$$D_I \triangleq \begin{pmatrix} \mathbf{0}_{r'} & I_{r'} & \mathbf{0} \\ \vdots & \vdots & \vdots \\ \mathbf{0}_{r'} & I_{r'} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{pmatrix}, \qquad \text{and} \tag{2.58}$$
$$D_0 \triangleq \left(\mathbf{1}_{n_1} \,|\, \mathbf{0}\right), \tag{2.59}$$
where $\mathbf{1}_{n_1}$ is the $n_1 \times 1$ vector of all ones, and $D_I$ contains $\lfloor n_1/r' \rfloor$ blocks of $\mathbf{0}_{r'}$ (the $r' \times 1$ zero vector) and $I_{r'}$ (the $r' \times r'$ identity matrix).

The above definitions ensure that $\mathcal{X}_A \subset \mathcal{X}(r, k, A_{\max}, X_{\min})$. In particular, for any $X \in \mathcal{X}_A$ we have
$$X = (D_0 + D_I)A = \underbrace{X_{\min}\mathbf{1}_{n_1}\mathbf{1}_{n_2}^T}_{=X_0} + D_I A', \tag{2.60}$$
where $(D_0 + D_I) \in \mathcal{D}$, $A \in \bar{\mathcal{A}}$, and $A' \in \mathbb{R}^{r \times n_2}$ retains only rows $2$ through $r' + 1$ of $A$. It is also worth noting here that the matrix $D_I A'$ contains $\lfloor n_1/r' \rfloor$ copies of the nonzero elements of $A'$. Now let us consider the Frobenius distance between any two distinct elements $X_1, X_2 \in \mathcal{X}_A$:
$$\|X_1 - X_2\|_F^2 = \|(D_0 + D_I)A_1 - (D_0 + D_I)A_2\|_F^2 = \|X_0 + D_I A'_1 - X_0 - D_I A'_2\|_F^2 = \|D_I A'_1 - D_I A'_2\|_F^2. \tag{2.61}$$
The constructions of $A'$ in (2.60) and of the class $\bar{\mathcal{A}}$ in (2.56) imply that the number of degrees of freedom in the sparse matrix $A'$ (whose entries can take values $0$ or $a_0$) is restricted to $k - n_2$. The Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) can be easily applied to the set of matrices of the form $\widetilde{X} = D_I A'$ for $A \in \bar{\mathcal{A}}$, and this, coupled with (2.61), guarantees the existence of a subset $\mathcal{X}'_A \subseteq \mathcal{X}_A$ with cardinality $\mathrm{Card}(\mathcal{X}'_A) \ge 2^{(k - n_2)/8} + 1$, containing the $n_1 \times n_2$ reference matrix $X_0 = X_{\min}\mathbf{1}_{n_1}\mathbf{1}_{n_2}^T$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_A$ we have
$$\begin{aligned}
\|X_1 - X_2\|_F^2 &\ge \frac{k - n_2}{8}\left\lfloor\frac{n_1}{r'}\right\rfloor a_0^2 \\
&\ge \frac{n_1 n_2}{32}\underbrace{\frac{(k - n_2) \wedge n_2}{n_2}}_{=\Delta(k - n_2, n_2)}\min\left\{A_{\max}^2, \ \frac{\gamma_a^2 X_{\min}}{\Delta(k - n_2, n_2)}\frac{k - n_2}{m}\right\} \\
&= \frac{n_1 n_2}{32}\min\left\{\Delta(k - n_2, n_2) A_{\max}^2, \ \gamma_a^2 X_{\min}\frac{k - n_2}{m}\right\},
\end{aligned} \tag{2.62}$$
where the second-to-last inequality comes from the fact that $\lfloor x \rfloor \ge x/2$ when $x \ge 1$. The joint pmf of the set of $|S|$ observations (conditioned on the true underlying matrix) can be conveniently written as a product of Poisson pmfs using independence:
$$p_{X^*_S}(Y_S) = \prod_{(i,j) \in S} \frac{(X^*_{i,j})^{Y_{i,j}}\, e^{-X^*_{i,j}}}{(Y_{i,j})!}. \tag{2.63}$$
For any $X \in \mathcal{X}'_A$, the KL divergence of $P_{X_0}$ from $P_X$, where $X_0$ is the reference matrix (whose entries are all equal to $X_{\min}$), is obtained by using an intermediate result from [40], giving
$$K(P_X, P_{X_0}) = \mathbb{E}_S\left[\sum_{i,j} K(P_{X_{i,j}}, P_{X_{0,i,j}})\right] = \frac{m}{n_1 n_2}\sum_{i,j}\left(X_{i,j}\log\frac{X_{i,j}}{X_{\min}} - X_{i,j} + X_{\min}\right).$$
Using the inequality $\log t \le t - 1$, we can bound the KL divergence as
$$K(P_X, P_{X_0}) \le \frac{m}{n_1 n_2}\sum_{i,j}\left(X_{i,j}\frac{X_{i,j} - X_{\min}}{X_{\min}} - X_{i,j} + X_{\min}\right) = \frac{m}{n_1 n_2}\sum_{i,j}\frac{(X_{i,j} - X_{\min})^2}{X_{\min}}. \tag{2.64}$$
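The scalar inequality underlying (2.64), namely $x\log(x/X_{\min}) - x + X_{\min} \le (x - X_{\min})^2/X_{\min}$, can be checked numerically with a few lines of Python (illustrative only):

```python
import math

def poisson_kl(x, x_min):
    """Scalar KL divergence K(Poisson(x), Poisson(x_min)):
    x log(x / x_min) - x + x_min."""
    return x * math.log(x / x_min) - x + x_min

def poisson_kl_quadratic_bound(x, x_min):
    """Upper bound (x - x_min)^2 / x_min obtained via log t <= t - 1."""
    return (x - x_min) ** 2 / x_min
```

A sweep over rates above and below $X_{\min}$ confirms that the divergence is nonnegative and sits below the quadratic bound, which is what makes the minimum-rate assumption enter the final result multiplicatively.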
To further upper bound the RHS of (2.64), we note that the number of entries greater than $X_{\min}$ in any $X \in \mathcal{X}'_A$ is at most $n_1(n_2 \wedge (k - n_2))$, by the construction of the sets $\mathcal{X}_A$ and $\bar{\mathcal{A}}$ in (2.57) and (2.56), respectively. Hence we have
$$K(P_X, P_{X_0}) \le m\,\underbrace{\frac{n_2 \wedge (k - n_2)}{n_2}}_{=\Delta(k - n_2, n_2)}\,\frac{(a_0 + X_{\min} - X_{\min})^2}{X_{\min}} \tag{2.65}$$
$$= m\,\frac{\Delta(k - n_2, n_2)\, a_0^2}{X_{\min}}, \tag{2.66}$$
where (2.65) uses the fact that, by construction, the entries of the matrices in $\mathcal{X}'_A$ are upper bounded by $a_0 + X_{\min}$. From (2.66) we can see that
$$\frac{1}{\mathrm{Card}(\mathcal{X}'_A) - 1}\sum_{X \in \mathcal{X}'_A} K(P_X, P_{X_0}) \le \alpha \log\left(\mathrm{Card}(\mathcal{X}'_A) - 1\right) \tag{2.67}$$
is satisfied for any $0 < \alpha < 1/8$ by choosing $\gamma_a < \frac{\sqrt{\alpha \log 2}}{2\sqrt{2}}$. Equations (2.62) and (2.67) imply that we can apply Theorem 2.3 (with the normalized Frobenius error used as the semi-distance function) to yield
$$\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_A} P_{X^*}\left(\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{\Delta(k - n_2, n_2) A_{\max}^2, \ \gamma_a^2 X_{\min}\frac{k - n_2}{m}\right\}\right) \ge \beta, \tag{2.68}$$
for some absolute constant $\beta \in (0, 1)$.

We use arguments similar to the previous case to establish lower bounds using the dictionary term $D$. Again, the key factor in the construction of a packing set is to ensure that the entries of its matrices are bounded away from zero (and in fact $\ge X_{\min}$). For this, let us first define the finite class $\mathcal{X}_D \subseteq \mathcal{X}(r, k, A_{\max}, X_{\min})$ as
$$\mathcal{X}_D \triangleq \left\{X = (D_\delta + D)A : D \in \bar{\mathcal{D}}, \ A = A_{\max}\left(I_r \ \cdots \ I_r \ \Psi_D\right) \in \mathcal{A}\right\}, \tag{2.69, 2.70}$$
where $I_r$ denotes the $r \times r$ identity matrix, $\Psi_D$ is an $r \times (n_2 - r\lfloor n_2/r \rfloor)$ matrix given by $\Psi_D = \begin{pmatrix} I_D \\ \mathbf{0}_D \end{pmatrix}$, $I_D$ is the identity matrix of dimension $n_2 - r\lfloor n_2/r \rfloor$, $\mathbf{0}_D$ is the $(r - n_2 + r\lfloor n_2/r \rfloor) \times (n_2 - r\lfloor n_2/r \rfloor)$ zero matrix, and $\bar{\mathcal{D}} \subseteq \mathcal{D}$ and $D_\delta \in \mathcal{D}$ are defined as
$$\bar{\mathcal{D}} \triangleq \left\{D \in \mathbb{R}^{n_1 \times r} : D_{i,j} \in \{0, d_0\}, \ \forall (i,j) \in [n_1] \times [r]\right\}, \qquad \text{and} \qquad (D_\delta)_{i,j} \triangleq \frac{X_{\min}}{A_{\max}} = \delta, \ \forall (i,j).$$
The above definition of $A$ (with the block identity matrices and $\Psi_D$) ensures that the entries of the members of our packing set $\mathcal{X}_D$ are greater than or equal to $X_{\min}$. We can see that for any $X \in \mathcal{X}_D$ we have
$$X = D_\delta A + DA = \underbrace{\delta A_{\max}\mathbf{1}_{n_1}\mathbf{1}_{n_2}^T}_{=X_0} + DA.$$
Thus we can appeal to the Varshamov-Gilbert bound (cf. Lemma 2.9 in [34]) for matrices of the form $\widetilde{X} = DA$, where $D \in \bar{\mathcal{D}}$ and $A$ is defined in (2.69), to guarantee the existence of a subset $\mathcal{X}'_D \subseteq \mathcal{X}_D$ with cardinality $\mathrm{Card}(\mathcal{X}'_D) \ge 2^{n_1 r/8} + 1$, containing the $n_1 \times n_2$ reference matrix $X_0 = X_{\min}\mathbf{1}_{n_1}\mathbf{1}_{n_2}^T$, such that for any two distinct elements $X_1, X_2 \in \mathcal{X}'_D$ we have
$$\|X_1 - X_2\|_F^2 \ge \frac{n_1 r}{8}\left\lfloor\frac{n_2}{r}\right\rfloor A_{\max}^2 d_0^2 \ge \frac{n_1 n_2}{16}\min\left\{(1 - \delta)^2 A_{\max}^2, \ \gamma_d^2 X_{\min}\frac{n_1 r}{m}\right\}. \tag{2.71}$$
For any $X \in \mathcal{X}'_D$, the KL divergence of $P_{X_0}$ from $P_X$, where $X_0$ is the reference matrix (whose entries are all equal to $X_{\min}$), can be upper bounded using (2.64) by
$$K(P_X, P_{X_0}) \le \frac{m}{n_1 n_2}\sum_{i,j}\frac{(X_{i,j} - X_{\min})^2}{X_{\min}} \le m\,\frac{d_0^2 A_{\max}^2}{X_{\min}}, \tag{2.72}$$
where (2.72) uses the fact that, by construction, the entries of the matrices in $\mathcal{X}'_D$ are upper bounded by $d_0 A_{\max} + X_{\min}$. From (2.72) we can see that
$$\frac{1}{\mathrm{Card}(\mathcal{X}'_D) - 1}\sum_{X \in \mathcal{X}'_D} K(P_X, P_{X_0}) \le \alpha' \log\left(\mathrm{Card}(\mathcal{X}'_D) - 1\right) \tag{2.73}$$
is satisfied for any $0 < \alpha' < 1/8$ by choosing $\gamma_d < \frac{\sqrt{\alpha' \log 2}}{2\sqrt{2}}$. Equations (2.71) and (2.73) imply that we can apply Theorem 2.3 (with the normalized Frobenius error used as the semi-distance function) to yield
$$\inf_{\widehat{X}} \sup_{X^* \in \mathcal{X}_D} P_{X^*}\left(\frac{\|\widehat{X} - X^*\|_F^2}{n_1 n_2} \ge \frac{1}{64} \cdot \min\left\{(1 - \delta)^2 A_{\max}^2, \ \gamma_d^2 X_{\min}\frac{n_1 r}{m}\right\}\right) \ge \beta', \tag{2.74}$$
for some absolute constant $\beta' \in (0, 1)$. Using (2.68) and (2.74), and applying Markov's inequality, we directly get the result presented in Theorem 2.2, thus completing the proof.

Chapter 3
Parameter Estimation Lower Bounds for Plenoptic Imaging Systems
This work focuses on assessing the information-theoretic limits of scene parameter estimation in plenoptic imaging systems, which are capable of providing substantially more information about a given scene than conventional cameras. We present a general framework to compute lower bounds on the parameter estimation error from noisy plenoptic observations. Our particular focus is on passive indirect imaging problems, where the observations do not contain line-of-sight information about the parameter(s) of interest. Using computer graphics rendering software to synthesize the (often complicated) dependence among the parameter(s) of interest and the observations, i.e., the forward model, we numerically evaluate the Hammersley-Chapman-Robbins (HCR) bound to establish lower bounds on the variance of any unbiased estimator of the unknown parameters. For scenarios where the rendering software produces an inexact version of the true forward model, we analyze the effects of such rendering inconsistencies on the computed lower bounds, both theoretically and via simulations. We also compare our lower bound with the performance of the Maximum Likelihood Estimator on a canonical object localization problem, which shows that our lower bounds are indicative of the true underlying fundamental limits.¹
¹Portions of the material in this chapter are © 2019 IEEE. Reprinted, with permission, from the 53rd Asilomar Conference on Signals, Systems, and Computers, "Computer Graphics meets Estimation Theory: Parameter Estimation Lower Bounds for Plenoptic Imaging Systems," A. V. Sambasivan, R. G. Paxman, and J. D. Haupt.
3.1 Introduction
Conventional imaging systems are modeled after human vision and provide information about a scene via a two-dimensional image, where each pixel is described by a tristimulus value (e.g., RGB). However, the plenoptic function [5] captures the intensity of light at every location in space, over all possible angles and wavelengths, and at all time instances, providing much more information about a given scene than a conventional camera. The resulting plenoptic function takes the form
$$L = L(\mathbf{r}, \boldsymbol{\varphi}, \nu, t),$$
where $\mathbf{r} \in \mathbb{R}^3$ refers to the location at which the plenoptic observation is made, $\boldsymbol{\varphi} \in [0, 2\pi) \times [0, \pi)$ denotes the angle pair (direction of arrival of light) in polar coordinates, $\nu$ is the wavelength, and $t$ is the time index. The domain over which the plenoptic function is measured typically changes with the imaging modality used for a particular application. Plenoptic imaging has been used for a wide range of applications including stereoscopy [44, 45], microscopy [46, 47], and Non-Line-Of-Sight (NLOS) imaging [7, 8, 10, 48]. While plenoptic imaging systems have a variety of applications in computer vision and image processing, we focus here on NLOS or indirect imaging. NLOS imaging corresponds to the scenario where the scene of interest is hidden from the observer (or imaging system), and the aim is to recover some parameters corresponding to the hidden scene from indirect reflections off various surfaces in the direct LOS of the observer (or imaging system). The parameters of interest in these problems could be as simple and high-level as the location of hidden objects (NLOS object tracking) or as complex as the entire hidden scene (NLOS image reconstruction). NLOS imaging, or more colloquially "imaging around corners," has numerous potential applications in defense, autonomous navigation, and imaging inaccessible tissues in endoscopy for better diagnostics, to name a few. We consider the problem of determining the fundamental limits of parameter estimation from noisy plenoptic observations, with an eye towards indirect (NLOS) imaging, from an estimation-theoretic standpoint. The difficulty in establishing fundamental lower bounds for plenoptic imaging problems lies in the complexity of the forward model, which codifies the functional dependence of the nominal (noiseless) observations on the scene parameters.
In this chapter, we build on our previous work [6], where we first proposed combining analytical tools from classical estimation theory [49–52] with computer graphics rendering engines [1, 53], which help us simulate the often complicated forward model, to compute lower bounds on the estimability of scene parameters from noisy plenoptic observations. In doing so, we can provide a benchmark of optimality against which various estimation strategies can be compared, and also obtain useful insights into where information about the NLOS scene parameters is localized in a given set of observations.
3.1.1 Prior Art
NLOS imaging methods can be broadly classified into two categories: (1) active imaging methods, and (2) passive imaging methods. Active imaging typically involves illuminating the hidden scene using external stimuli, e.g., pulsed lasers or LIDARs, and using the information from the returning photons (e.g., time-of-flight information), measured by specialized detectors such as single-photon avalanche diodes (SPADs), to reconstruct the shape and albedo of hidden objects. These methods were initially demonstrated in [54, 55] and have recently gained a lot of attention [56–58]. However, active imaging requires the use of expensive and bulky hardware like femtosecond lasers and specialized detectors, which makes it less amenable to on-field deployment in real-life applications. In this work, we are particularly interested in passive imaging methods that rely on the intensity measurements obtained at the imaging device to estimate the hidden scene parameters. The main challenge in passive imaging is that the indirect photons bouncing off diffuse (non-mirror-like) surfaces have very low signal levels, making the imaging problems ill-conditioned. In a pioneering work [59], it was observed that the presence of occluders like sharp edges and corners facilitates the NLOS recovery problem. This led to many follow-up works that exploit occluders (and motion) in the hidden scene to perform NLOS imaging [7–11, 48]. Other passive imaging methods use coherence-based techniques for NLOS reconstruction [60–63]. A comprehensive survey of various NLOS imaging methods can be found in [64]. More recently, [65, 66] proposed using the Cramér-Rao bound and Fisher information as a proxy to study the feasibility and conditioning of their NLOS problem. The aforementioned works have shown significant promise in "imaging around corners," but a thorough information-theoretic treatment of the NLOS parameter estimation problem for more general and realistic scenes is still lacking.
3.1.2 Our Contribution
We present a general framework to establish fundamental limits for NLOS parameter estimation problems using noisy passive plenoptic observations. In contrast to the above-mentioned efforts, which develop simplified/approximate (linear) forward models for controlled experimental environments to study NLOS imaging, our framework can handle complicated (non-linear) forward models that describe realistic scenes. In [6], we proposed using rendering engines to simulate the forward model and numerically evaluate the Hammersley-Chapman-Robbins (HCR) bound, which provides a lower bound on the variance of any unbiased estimator of the parameter(s) of interest. The HCR bound is applicable to a wider range of problems than the more commonly used Cramér-Rao bound (CRB), and is at least as tight as the CRB when both exist. Our framework also enables us to localize the information content in the observations, which can be used to further validate some of the conclusions about the benefits of occluding objects for NLOS imaging [7, 9–11, 65, 66]. One important assumption made in [6] is that the rendering engine simulates the forward model exactly, i.e., provides an error-free version of the true plenoptic observations for a given set of scene parameter values. In practice, however, this assumption might not hold, as the rendering engine might yield a close albeit inexact estimate of the true plenoptic observations. In this work, we consider unbiased and progressive rendering techniques, which produce unbiased estimates of the true plenoptic observations with continually decreasing error as we let the renderer run indefinitely (see the documentation of Mitsuba [1]). We extend the efforts in [6] by analyzing the effects of using such inexact renderings in our lower bounding framework, and provide a simple method to estimate intervals for the true HCR lower bounds.
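To make the HCR computation concrete, the sketch below evaluates the bound for a scalar parameter under AWGN, where the chi-square divergence between the shifted and nominal observation models has the closed form $\exp(\|f(\theta+\delta) - f(\theta)\|^2/\sigma^2) - 1$. The `forward` map here is a hypothetical stand-in for the renderer output, and the supremum is approximated over a finite grid of offsets; both are simplifications relative to the framework described in this chapter.

```python
import math

def hcr_bound_awgn(forward, theta, deltas, sigma):
    """Hammersley-Chapman-Robbins lower bound on the variance of any
    unbiased estimator of a scalar parameter theta observed through a
    deterministic forward model under AWGN with noise level sigma:
        var >= sup_delta  delta^2 / chi2(P_{theta+delta} || P_theta),
    where for Gaussian likelihoods
        chi2 = exp(||f(theta + delta) - f(theta)||^2 / sigma^2) - 1.
    `forward` maps a parameter value to a list of noiseless observations
    (a hypothetical stand-in for rendered intensities); the supremum is
    approximated over the finite grid `deltas`."""
    f0 = forward(theta)
    best = 0.0
    for d in deltas:
        fd = forward(theta + d)
        dist2 = sum((a - b) ** 2 for a, b in zip(fd, f0))
        chi2 = math.expm1(dist2 / sigma ** 2)
        if chi2 > 0.0:
            best = max(best, d * d / chi2)
    return best
```

For a linear toy model $f(\theta) = 2\theta$ with $\sigma = 1$, the grid approximation approaches the Cramér-Rao value $\sigma^2/4 = 0.25$ from below as smaller offsets are included, as expected for a smooth model.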
We instantiate the HCR lower bounds for Poisson noise and Additive White Gaussian Noise (AWGN) models, and demonstrate the utility of our framework using a few canonical estimation problems. Finally, we compare our lower bounds with the performance of Maximum Likelihood Estimators for the problem of object localization, which shows that our lower bounds are almost tight and are indicative of the true fundamental limits. The rest of the chapter is organized as follows. We begin by providing an overview of the rendering equation [12], which explains the forward model for our settings, in Section 3.2. Using this, the problem setup is formalized in Section 3.3. We explain our renderer-enabled lower bound computation framework and the information-theoretic tools used in Section 3.4. The effect of rendering errors on the lower bound computation is studied in Section 3.5. We develop Maximum Likelihood Estimators for the problem of NLOS object localization and compare their performance against the computed lower bounds in Section 3.6, and finish with some concluding remarks about future work in Section 3.7.
3.2 Forward Model
Let $\Theta \subseteq \mathbb{R}^d$ denote the parameter class containing all possible values of the parameter of interest. For a given scene with some unknown (deterministic) parameter $\theta^* \in \Theta$, we denote the set of plenoptic samples (obtained by sampling the full plenoptic field) associated with the scene by $L_{\theta^*} \triangleq \{L_{\theta^*}(\omega)\}_{\omega \in \Omega}$, where we introduce the shorthand $\omega := [\mathbf{r}, \boldsymbol{\varphi}, \nu, t]$ to denote the arguments of the plenoptic function, and $\Omega$ is the observation space, which is a subset of the domain over which the plenoptic function is defined.
3.2.1 The Rendering Equation
An information-theoretic treatment of this estimation problem requires understanding the forward model $\theta^* \mapsto L_{\theta^*}$ that captures the functional dependence of the plenoptic observations on the scene parameters. This can be mathematically explained by the rendering equation [12], which models the behavior of light rays as they originate from a light source, bounce off the objects in the scene, and ultimately reach the detector (an imaging device). This forward mapping is commonly referred to as "light transport" in the computer graphics literature. For fixed $(\nu, t)$, light incident on a surface point $\mathbf{r}$ of a scene along direction $-\boldsymbol{\varphi}_i$, denoted by $L^{in}_{\theta^*}(\mathbf{r}, -\boldsymbol{\varphi}_i)$, is typically scattered along all directions, as shown in Figure 3.1(a). The proportion of this light reflected along $\boldsymbol{\varphi}_o$ is determined by a surface-dependent quantity known as the bi-directional reflectance distribution function (BRDF), $f(\mathbf{r}, \cdot, \cdot)$.
The overall radiance leaving a surface point $\mathbf{r}$ along direction $\boldsymbol{\varphi}_o$, denoted by $L^{out}_{\theta^*}(\mathbf{r}, \boldsymbol{\varphi}_o)$, can hence be obtained by summing up the contributions of $L^{in}_{\theta^*}(\mathbf{r}, -\boldsymbol{\varphi}_i)$
Figure 3.1: The rendering equation, explained graphically: (a) the proportion of incident light coming in from direction $\boldsymbol{\varphi}_i$ that gets reflected along direction $\boldsymbol{\varphi}_o$ is determined by the BRDF of the surface; (b) light incident on a surface point $\mathbf{r}$ can be seen as light leaving from another point in the scene, $g(\mathbf{r}, \boldsymbol{\varphi}_i)$, so that $L^{in}_{\theta^*}(\mathbf{r}, -\boldsymbol{\varphi}_i) = L^{out}_{\theta^*}(g(\mathbf{r}, \boldsymbol{\varphi}_i), -\boldsymbol{\varphi}_i)$.
from all possible incident directions $-\boldsymbol{\varphi}_i$, and also any light emitted by the surface (if it is an emitter). Mathematically,
$$L^{out}_{\theta^*}(\mathbf{r}, \boldsymbol{\varphi}_o) = L^{e}_{\theta^*}(\mathbf{r}, \boldsymbol{\varphi}_o) + \int_{\mathbb{S}^2_+(\mathbf{r})} L^{in}_{\theta^*}(\mathbf{r}, -\boldsymbol{\varphi}_i)\, f(\mathbf{r}, \boldsymbol{\varphi}_o, \boldsymbol{\varphi}_i)\, (\boldsymbol{\varphi}_i \cdot \mathbf{n})\, d\boldsymbol{\varphi}_i,$$
where $L^{e}_{\theta^*}(\mathbf{r}, \boldsymbol{\varphi}_o)$ is the emitted radiance at $\mathbf{r}$ along the direction $\boldsymbol{\varphi}_o$, $\mathbf{n}$ is the surface normal at $\mathbf{r}$, and $\mathbb{S}^2_+(\mathbf{r})$ is the unit hemisphere at $\mathbf{r}$ containing all outgoing directions. From Figure 3.1(b), we can see that $L^{in}_{\theta^*}(\mathbf{r}, -\boldsymbol{\varphi}_i) = L^{out}_{\theta^*}(g(\mathbf{r}, \boldsymbol{\varphi}_i), -\boldsymbol{\varphi}_i)$, where $g(\mathbf{r}, \boldsymbol{\varphi}_i)$ is a scene-geometry-dependent operator that finds the first surface point reached when traveling outward from $\mathbf{r}$ along direction $\boldsymbol{\varphi}_i$. Using this, we can obtain the rendering equation as
$$L^{out}_{\theta^*}(\mathbf{r}, \boldsymbol{\varphi}_o) = L^{e}_{\theta^*}(\mathbf{r}, \boldsymbol{\varphi}_o) + \int_{\mathbb{S}^2_+(\mathbf{r})} L^{out}_{\theta^*}(g(\mathbf{r}, \boldsymbol{\varphi}_i), -\boldsymbol{\varphi}_i)\, f(\mathbf{r}, \boldsymbol{\varphi}_o, \boldsymbol{\varphi}_i)\, (\boldsymbol{\varphi}_i \cdot \mathbf{n})\, d\boldsymbol{\varphi}_i, \tag{3.1}$$
where $L^{out}_{\theta^*}$ is the plenoptic function of interest in the forward model alluded to above. Equation (3.1) is a Fredholm integral equation of the second kind and is difficult to solve in closed form for all but the simplest of settings. To overcome this hurdle, ray-tracing/rendering engines were developed to approximately solve the integral equation in (3.1) using Monte Carlo methods. Ray-tracing engines are widely used in the computer graphics community to generate photo-realistic images by tracing the path of light rays as they interact with objects in a scene. Here, we rely upon this (mature) technology for our purposes: for a given parameter value $\theta^*$, we use ray-tracing packages to approximately solve (3.1) and synthesize the plenoptic observations $L_{\theta^*}$.
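To give a flavor of how such engines approximate the integral in (3.1), the sketch below Monte Carlo-estimates a single-bounce version of the reflection integral under strong simplifying assumptions (constant incoming radiance over the hemisphere and a Lambertian BRDF). It illustrates the numerical principle only, not what Mitsuba actually does internally:

```python
import math
import random

def mc_reflected_radiance(L_in, albedo, n_samples=200000, seed=1):
    """Monte Carlo estimate of the reflection integral in the rendering
    equation for the simplest setting: constant incoming radiance L_in over
    the hemisphere and a Lambertian BRDF f = albedo / pi.  Directions are
    drawn uniformly on the hemisphere (pdf = 1 / (2*pi)), so each sample
    contributes L_in * f * cos(theta) / pdf.  The exact value of the
    integral is L_in * albedo."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # z-coordinate of a uniform hemisphere direction is uniform on [0, 1].
        cos_theta = rng.random()
        total += L_in * (albedo / math.pi) * cos_theta * (2.0 * math.pi)
    return total / n_samples
```

With $L_{in} = 1$ and albedo 0.6, the estimate concentrates near the exact answer 0.6; production renderers apply the same idea recursively along full light paths, with far more sophisticated sampling.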
3.2.2 Illustrative Example Scene: A Π-shaped Hallway
We illustrate the connection between scene parameters and plenoptic observations using an example scene. The scene of interest is a Π-shaped hallway with dimensions as marked in Figure 3.2(a). The inner walls of the hallway are painted with a diffuse eggshell paint, and their BRDF is modeled using the roughplastic plugin in Mitsuba [1]. The ceiling lights are 0.1m × 1.5m with a luminance of $3\ \mathrm{lm \cdot sr^{-1} m^{-2}}$ and emit white light (uniformly over all wavelengths).² Suppose we place a red spherical object in this hallway; one might then be interested in estimating various parameters related to the object, e.g., its location, radius, etc., which when bundled together constitute the scene parameter vector $\theta^*$. Figure 3.2 shows how the plenoptic observations $L_{\theta^*}$ can be rendered using a ray-tracing engine for a given parameter value $\theta^*$, which in this case comprises the ball location and radius. Given the ball location and radius, and a priori knowledge of the scene layout, we can use the ray-tracing software to (approximately) solve the rendering equation (3.1). A rendered RGB image of the scene is shown in Figure 3.2(b). We shall use this hallway setup as a recurring example in the following sections, with different objects and camera configurations, to demonstrate the utility of our lower bounding framework.
3.3 Problem Statement
We model the noisy observations Yω as independent draws from a known class of probability distributions p(Yω; θ∗), whose parameters depend on θ∗ through the plenoptic function. Letting YΩ ≜ {Yω}ω∈Ω be shorthand for the entire collection of noisy plenoptic observations, the likelihood of the observations can be written as
$$p(Y_\Omega; \theta^*) \,=\, \prod_{\omega \in \Omega} p(Y_\omega; \theta^*) \,\triangleq\, P_{\theta^*}. \qquad (3.2)$$
Let Y denote the observation class that defines the set of all possible observations for
²The rendered images using this illumination and camera setup are roughly similar to those obtained using a commercially-available 2000 lm ceiling light with an exposure time of 1/350 seconds.
Figure 3.2: Simulating the forward model using rendering: (a) Layout of a Π-shaped hallway with dimensions marked. Corridors A, B, and C are 2.5m, 3m, and 2.5m long respectively, and 2m tall. The hallway is illuminated with white ceiling lights with a luminance of 3 lm·sr⁻¹m⁻². The camera C0 is located 0.5m outside corridor A. The location and radius of a red spherical ball constitute the unknown scene parameter θ∗. (b) If we define θ∗ by setting the ball radius to 10cm and the ball location as the intersection of corridors A and B, then Lθ∗ is the nominal RGB image of the scene captured by a camera at C0. We obtain Lθ∗ using the rendering engine Mitsuba [1] as shown in (b).

our problem. We are then interested in determining the fundamental limits of imaging a given scene with unknown parameter θ∗ ∈ Θ, from noisy plenoptic measurements
YΩ ∈ Y. Specifically, we seek to evaluate the performance of estimators θ̂(YΩ), θ̂ : Y → Θ, which have a finite measure with respect to Pθ∗, as a function of the true underlying parameter of interest θ∗. The accuracy (or lack thereof) of an estimator θ̂ in estimating the true scene parameter θ∗ can be measured in terms of the Mean Squared Error (MSE),
$$\mathrm{MSE}_{\hat{\theta}}(\theta^*) \,=\, \mathbb{E}\left[ \big\| \hat{\theta} - \theta^* \big\|_2^2 \right], \qquad (3.3)$$
where the expectation is taken with respect to the randomness in the noisy observations
YΩ. In order to derive meaningful local lower bounds (bounds as a function of θ∗) on the MSE of estimators, we further need to make assumptions about the class of estimators. Indeed, for any θ∗ ∈ Θ, we have a trivial (yet valid) estimator θ̂(YΩ) = θ∗, ∀ YΩ ∈ Y, which achieves an MSE of 0. Such an estimator performs extremely well when θ = θ∗, but generalizes poorly in other cases. In order to avoid such trivial instances, we restrict our attention to the class of unbiased estimators, for which the MSE reduces to the variance (or the trace of the covariance matrix, to be precise). Hence we seek lower bounds on the covariance matrix $\mathrm{Cov}_{\hat{\theta}}(\theta^*)$ in a semi-definite ordering sense, or on $\mathrm{Var}_{\hat{\theta}}(\theta^*) \triangleq \mathrm{tr}\big(\mathrm{Cov}_{\hat{\theta}}(\theta^*)\big)$.
3.4 Renderer-Enabled Computation of Lower Bounds
Our approach employs the Hammersley-Chapman-Robbins lower bound (HCR-LB) [51, 52], which provides lower bounds on the variance of unbiased estimators. HCR-LB and its variants have been used in constrained parameter estimation problems for sensor array signal processing, like bearing estimation [67–69], frequency estimation [70, 71] and many more. In this section, we will state the HCR-LB and discuss some of its salient aspects which make it ideally suited to this problem setting.
Lemma 3.1 (HCR Lower Bound). Let θ∗ ∈ Θ ⊆ ℝᵈ be any deterministic but unknown parameter, and let YΩ denote a set of noisy observations of the unknown parameter θ∗. Then the variance of any unbiased estimator of θᵢ∗ obeys
$$\mathrm{Var}(\theta_i^*) \,\geq\, \mathrm{HCR}(\theta_i^*),$$
where the HCR lower bound is given by
$$\mathrm{HCR}(\theta_i^*) \,=\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\mathbb{E}_{\theta^*}\left[\left(\frac{p(Y_\Omega;\, \theta^* + \Delta)}{p(Y_\Omega;\, \theta^*)}\right)^2\right] - 1}, \quad \forall\, i = 1, \ldots, d. \qquad (3.4)$$
Here p(YΩ; θ∗) and p(YΩ; θ∗ + Δ) denote the pdfs (or pmfs) of the observations parametrized by θ∗ and θ∗ + Δ, respectively.
The denominator on the RHS of (3.4) is the so-called χ²-divergence of p(YΩ; θ∗ + Δ) from p(YΩ; θ∗), which essentially measures the change in the probability distribution when the true parameter θ∗ is perturbed by Δ. When a small change in the unknown parameter results in distinctly different sets of observations, and hence a large χ²-divergence between the likelihoods, the HCR-LB is small, suggesting that the estimation problem could be easy, and vice versa. It is worth commenting on the relationship of the HCR-LB framework with the well known Cramer-Rao lower bound (CR-LB) [49, 50, 72]. If we let Δᵢ → 0, the expression inside the supremum of (3.4) converges to the CR-LB (if the corresponding limit exists). Thus, when both bounds exist, the HCR-LB is at least as tight as the CR-LB. Unlike the CR-LB, the HCR-LB makes no "regularity" assumptions on the noise likelihood function and hence is applicable to a wider range of problems. In particular, the CR-LB requires computing derivatives of the log-likelihood function (with respect to θ) and is not well defined in scenarios where the log-likelihood function is not differentiable (e.g., when the parameter space is a countable set). Even for simple scenes, due to the presence of occluding barriers and edges, sharp "transition regions" may occur in the true plenoptic intensities Lθ∗ as the underlying scene parameter θ∗ varies smoothly, rendering the log-likelihood non-differentiable. The HCR-LB, on the other hand, does not require explicitly computing derivatives of the log-likelihood. For any given noise model, the bound can be computed by rendering or synthesizing Lθ∗+Δ for a suitably large collection of possible values of θ∗ and Δ using a ray-tracing engine, and then evaluating the functional form of the χ²-divergence. In addition to lower bounds on the variance of individual parameters θᵢ∗, we can naturally extend the result in Lemma 3.1 to lower bound the MSE of estimators for a given value of θ∗ as follows.
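As a sanity check on this relationship, the sketch below evaluates the HCR functional for a toy problem with a single scalar Gaussian observation Y ~ N(θ∗, σ²), where the χ²-divergence has the closed form exp(Δ²/σ²) − 1; for this model the supremum is approached as Δ → 0 and recovers the CR-LB, which equals σ². The noise level and grid are illustrative assumptions.

```python
import numpy as np

# Toy instance of the HCR bound (3.4): one Gaussian sample Y ~ N(theta*, sigma^2).
# The chi^2-divergence is exp(Delta^2 / sigma^2) - 1 in closed form, so the HCR
# functional can be evaluated exactly and compared against the CR-LB (= sigma^2).
sigma = 0.5

def chi2_gauss(delta, sigma):
    return np.expm1(delta ** 2 / sigma ** 2)  # expm1(x) = exp(x) - 1, numerically stable

deltas = np.linspace(1e-4, 2.0, 20000)        # candidate perturbations Delta != 0
hcr = np.max(deltas ** 2 / chi2_gauss(deltas, sigma))
cr = sigma ** 2                               # CR-LB for one Gaussian sample
```

Here the HCR functional is monotonically decreasing in |Δ|, so the supremum over a grid of perturbations approaches the CR-LB from below as the smallest grid point shrinks toward zero.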
Corollary 3.1 (HCR Lower bound on the MSE). Let θ∗ ∈ Θ ⊆ ℝᵈ be any deterministic but unknown parameter, and let YΩ denote a set of noisy observations of the unknown parameter θ∗. Then the MSE of any unbiased estimator of θ∗ obeys
$$\mathrm{MSE}(\theta^*) \,\geq\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\|\Delta\|^2}{\mathbb{E}_{\theta^*}\left[\left(\frac{p(Y_\Omega;\, \theta^* + \Delta)}{p(Y_\Omega;\, \theta^*)}\right)^2\right] - 1}, \qquad (3.5)$$
where p(YΩ; θ∗) and p(YΩ; θ∗ + Δ) denote the pdfs (or pmfs) of the observations parametrized by θ∗ and θ∗ + Δ, respectively.
Proof. For unbiased estimators we have $\mathrm{MSE}(\theta^*) = \sum_{i=1}^{d} \mathrm{Var}(\theta_i^*)$. We can further lower bound each of the variance terms in this summation using Lemma 3.1 to obtain
$$\mathrm{MSE}(\theta^*) \,\geq\, \sum_{i=1}^{d}\; \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\mathbb{E}_{\theta^*}\left[\left(\frac{p(Y_\Omega;\, \theta^* + \Delta)}{p(Y_\Omega;\, \theta^*)}\right)^2\right] - 1} \,\geq\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\sum_{i=1}^{d} \Delta_i^2}{\mathbb{E}_{\theta^*}\left[\left(\frac{p(Y_\Omega;\, \theta^* + \Delta)}{p(Y_\Omega;\, \theta^*)}\right)^2\right] - 1},$$
where the last inequality follows from the fact that $\sum_i \sup_x f_i(x) \geq \sup_x \sum_i f_i(x)$.

In the following sub-sections, we instantiate Lemma 3.1 and provide functional expressions for the HCR-LB under some common noise models, viz. the Poisson noise and additive white Gaussian noise models.
3.4.1 HCR Lower Bound for Poisson Noise
Consider noisy plenoptic observations Yω that are drawn independently from a Poisson distribution with rates given by the true plenoptic intensities Lθ∗(ω), i.e., $Y_\omega \overset{\text{ind}}{\sim} \mathrm{Poisson}(L_{\theta^*}(\omega)),\ \forall \omega \in \Omega$. The Poisson distribution is commonly used to characterize noise which is discrete or quantized in nature, e.g., when the imaging device counts the number of photons incident on the detector over a certain window of time. If we specialize the HCR-LB given in (3.4) for this setting, we get
$$\mathrm{HCR}_{\mathrm{Poisson}}(\theta_i^*) \,:=\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\left\{\sum_{\omega \in \Omega} \frac{\left(L_{\theta^*+\Delta}(\omega) - L_{\theta^*}(\omega)\right)^2}{L_{\theta^*}(\omega)}\right\} - 1}, \quad \forall\, i = 1, 2, \ldots, d, \qquad (3.6)$$
where all the true plenoptic intensities in (3.6) can be obtained from the rendering engine.
3.4.2 HCR Lower Bound for Additive White Gaussian Noise
Consider noisy plenoptic observations with additive white Gaussian noise (AWGN) of the form Yω = Lθ∗(ω) + εω, where the noise $\varepsilon_\omega \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2),\ \forall \omega \in \Omega$. We further assume that the noise variance σ² is known a priori. If we specialize the HCR-LB given in (3.4) for this setting, we get
$$\mathrm{HCR}_{\mathrm{AWGN}}(\theta_i^*) \,:=\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\left\{\frac{\sum_{\omega \in \Omega}\left(L_{\theta^*+\Delta}(\omega) - L_{\theta^*}(\omega)\right)^2}{\sigma^2}\right\} - 1}, \quad \forall\, i = 1, 2, \ldots, d, \qquad (3.7)$$
where all the true plenoptic intensities in (3.7) can be obtained from the rendering engine.
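Given a table of rendered intensities over a discretized parameter space, the bounds (3.6) and (3.7) reduce to a straightforward maximization over candidate perturbations. The sketch below carries this out for a hypothetical one-dimensional forward model (a synthetic intensity bump standing in for actual Mitsuba renders); the scene, pixel count, and noise levels are all illustrative assumptions.

```python
import numpy as np

# Evaluating the HCR-LB (3.6)/(3.7) from a table {theta: L_theta} of rendered
# plenoptic intensities. `render` is a synthetic stand-in for a ray tracer:
# a bump of light whose center is the scalar scene parameter.
pix = np.linspace(0.0, 1.0, 64)
grid = [round(t, 2) for t in np.arange(0.0, 1.001, 0.01)]  # discretized Theta

def render(theta):
    # hypothetical forward model L_theta(omega); kept positive for Poisson rates
    return 0.1 + np.exp(-((pix - theta) ** 2) / 0.02)

renders = {t: render(t) for t in grid}

def hcr_lb(theta_star, noise="poisson", sigma=0.1):
    L0 = renders[theta_star]
    best = 0.0
    for t, L1 in renders.items():
        if t == theta_star:
            continue                                    # Delta != 0
        if noise == "poisson":
            lam = np.sum((L1 - L0) ** 2 / L0)           # exponent in (3.6)
        else:
            lam = np.sum((L1 - L0) ** 2) / sigma ** 2   # exponent in (3.7)
        lam = min(lam, 700.0)   # cap to avoid overflow; such terms are ~0 anyway
        best = max(best, (t - theta_star) ** 2 / np.expm1(lam))
    return best

lb_poisson = hcr_lb(0.5, "poisson")
lb_awgn = hcr_lb(0.5, "awgn", sigma=0.1)
```

Re-running the AWGN case with a larger σ yields a larger lower bound, mirroring the trend observed in the hallway experiments below.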
3.4.3 Localizing Information Content in Plenoptic Observations
In addition to computing lower bounds, we can go a step further and localize the information content in the plenoptic observations. Since we assume that our observations are statistically independent, the Fisher Information of a particular pixel ω, for any given parameter value θ∗, can be expressed as
" 2 # ∗ ∂ Iω(θi ) = E log p(Yω; θ) , ∀i = 1, 2, . . . , d, (3.8) ∂θi θ=θ∗ where we implicitly assume that the partial derivative of the log-likelihood function is well-defined at θ = θ∗, and the expectation is with respect to the pdf (or pmf) of the ∗ observations p(YΩ; θ ). If we instantiate (3.8) for the AWGN and Poisson noise models considered above, we get
$$I_\omega(\theta_i^*) \,=\, \begin{cases} \dfrac{1}{L_{\theta^*}(\omega)}\left(\dfrac{\partial L_{\theta^*}(\omega)}{\partial \theta_i}\right)^2 & \text{for Poisson noise} \\[2ex] \dfrac{1}{\sigma^2}\left(\dfrac{\partial L_{\theta^*}(\omega)}{\partial \theta_i}\right)^2 & \text{for AWGN with variance } \sigma^2 \end{cases}, \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.9)$$
In the absence of access to the gradients of the plenoptic observations, we can still easily compute the pixel-wise Fisher Information for different noise models by approximating (3.9) using Finite Differences (FD), which we refer to as (pixel-wise) FD-FI. Iω(θᵢ∗) quantifies how much information is conveyed by a given pixel ω about the parameter of interest θᵢ∗, and hence is a useful tool for localizing the overall information content in the observations.
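A minimal sketch of the FD-FI computation, using the same kind of hypothetical one-dimensional forward model as before (not actual rendered data): the gradient in (3.9) is replaced by a central finite difference between two renders at perturbed parameter values.

```python
import numpy as np

# Finite-Difference Fisher Information (FD-FI) per pixel, approximating (3.9)
# with a central difference in place of dL/dtheta. `render` is a hypothetical
# forward model standing in for the ray tracer.
pix = np.linspace(0.0, 1.0, 64)

def render(theta):
    return 0.1 + np.exp(-((pix - theta) ** 2) / 0.02)

def fd_fisher(theta, h=1e-3, noise="poisson", sigma=0.1):
    L0 = render(theta)
    dL = (render(theta + h) - render(theta - h)) / (2 * h)  # FD gradient in theta
    if noise == "poisson":
        return dL ** 2 / L0        # (1/L) (dL/dtheta)^2, per pixel
    return dL ** 2 / sigma ** 2    # (1/sigma^2) (dL/dtheta)^2, per pixel

fi = fd_fisher(0.5)
```

Consistent with the discussion of Figure 3.4 below, the informative pixels in this toy model cluster where the intensity changes fastest with the parameter (the edges of the bump), not at the bump's center, where the gradient vanishes.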
3.4.4 Experimental Evaluation: Lower bounds for Π-shaped Hallway Scene
In order to demonstrate the utility of our lower bounding framework, we consider the hallway scene setup described in Section 3.2.2. We assume that noisy plenoptic measurements are made by an imaging device located at C0, which collects multispectral images of size 512 × 384 with 30 uniformly spaced spectral channels in the 368−830nm range (as opposed to the RGB observations illustrated in Figure 3.2(b)). With this setup, we study the fundamental limits of two different scalar estimation problems:
• Estimating location of the ball (with a fixed radius of 10cm) along the Π-shaped hallway.
• Estimating radius (size) of the ball located at the center of Corridor B.
For these estimation problems, we assume that the scene geometry and the BRDFs of the surfaces are known, i.e., the only unknown in the scene is either the location of the ball or its radius. For the problem of estimating the ball location, we discretize the 1-D Π-shaped manifold along which the ball could be located into 790 equispaced points (each 1cm apart). Ball locations are numbered clockwise from 1 (bottom-left of the Π-shaped manifold) to 790 (bottom-right of the manifold). For the size estimation problem, we consider 101 possible values for the ball radius in the range 5cm to 15cm (with 0.1cm increments). We use the Mitsuba renderer [1] to synthesize physically accurate plenoptic samples Lθ∗ for different values of the unknown parameter θ∗ (and Δ), and numerically evaluate the HCR bound given by Equations (3.6) and (3.7). In addition to numerically evaluating the HCR-LB, we use Finite Differences on the rendered data, as discussed in Section 3.4.3, to (approximately) compute the pixel-wise Fisher Information (3.9), which is referred to as Finite Difference-Fisher Information (FD-FI). All the scenes were rendered with 16384 samples per-pixel using the Bi-Directional Path Tracer (BDPT) rendering
Figure 3.3: HCR-LB for ball location estimation for Poisson noise: (a)-(d), and AWGN with different values of σ: (e)-(h). The HCR-LB under different regimes is shown in: (a),(e) LOS region - the HCR-LB is very small. The LB drops significantly when the ball starts moving in Corridor B; (b),(f) Transition from LOS to NLOS - sharp increase in the HCR-LB when the ball moves away from LOS; (c),(g) NLOS region - the HCR-LB is much higher, indicating the potential hardness of the estimation problem; (d),(h) Ball radius estimation - the HCR-LB decreases with increasing size (radius) of the ball. With the help of these curves, one can quantify how difficult the problem of estimating NLOS parameters can be. For the AWGN model, we can see that the HCR-LB increases with σ, as expected.

algorithm on an HP Linux distributed cluster with 20 Intel Haswell E5-2680v3 processor cores and 10GB of RAM per scene; it took approximately 2.5 hours to render each scene.
Results and Discussion
The HCR-LB under Poisson noise for estimating the ball location is shown in Figures 3.3(a) to 3.3(c). Similarly, the HCR-LB under AWGN with different noise variances σ² is shown in Figures 3.3(e) to 3.3(g). It can be observed that the HCR-LB for AWGN increases with increasing σ, which is intuitive, as the estimation problem is expected to get tougher as the noise level increases. This trend is observed uniformly for both the location estimation and the radius estimation problems. For the location estimation problem, the HCR-LB is (unsurprisingly) small (≪ 1)
(a) (b)
Figure 3.4: Pixelwise FD-Fisher Information (FD-FI), obtained by aggregating contributions from all 30 spectral channels. Darker regions ⇒ more informative. Pixelwise FD-FI shows where and how information about the parameter of interest is localized in our observations. These images highlight subtle details about the scene parameters which are not visible from the nominal RGB images (bottom row) obtained from the rendering software; (a) Pixelwise FD-FI for Poisson noise (top row) and AWGN with σ = 0.2 (second row) for 4 different ball locations: (from left to right) completely in LOS, just inside LOS, just outside LOS, center of corridor B. Notice that different regions in the scene are more informative than others for different ball locations; (b) Pixelwise FD-FI for Poisson noise (top row) and AWGN with σ = 0.2 (second row). These images show where information about the ball radius is localized. Notice that the regions of information differ from the FD-FI images for ball location in Figure 3.4(a).

when the ball is in corridor A (in LOS). The lower bound suddenly drops when the ball translates horizontally (with respect to the camera) in corridor B (see Figures 3.3(a) and 3.3(e)). This is expected, as detecting an object is easier when it moves horizontally as compared to when it moves away from the camera. The HCR-LB increases by a few orders of magnitude when the ball leaves the LOS - location 320 → location 321 (see Figures 3.3(b) and 3.3(f)), and continues to increase sharply for ball locations further down corridor B (see Figures 3.3(c) and 3.3(g)). In corridor C, the actual value of the bound gets very large (∼ 10⁴ cm²), which indicates that accurately locating the ball around two corners is (again, unsurprisingly) a much more difficult estimation problem. Some recent works [7, 8] have proposed that sharp edges and occlusions in a scene can help recover NLOS imagery.
These sharp occlusions act like a pin-hole camera, projecting information about NLOS regions of a scene onto visible regions like walls and floors. The authors of [7] refer to this imaging phenomenon as the "corner-camera". Our pixel-wise FD-FI images provide some evidence for this effect, showing that regions around the corners contain a higher amount of information about the hidden object location than other regions. This effect can be clearly seen from the video of the pixel-wise FD-FI and HCR-LB as the ball moves from location 1 to 790, provided in the supplementary material. By looking at the FD-FI images, we also believe that some of the dips in the HCR-LB in corridor C (in Figures 3.3(c) and 3.3(g)) result from "corner-camera" type effects. The pixelwise FD-FI images in Figure 3.4(a) provide valuable insights into where and how information about a parameter of interest is distributed amongst the observed samples. As one might expect, when the ball is in LOS, its edges provide the most information about its location. When the ball moves away from LOS, information about its location is conveyed by the shadows cast by the ceiling light in corridor B on the floor, and by other indirect photons from the back wall and the floor. These subtle details are, however, not visible in the raw plenoptic data (or the nominal RGB images in the bottom row of Figure 3.4). The ability to localize information content in indirect photons is extremely useful for those interested in developing NLOS imaging algorithms. The FD-FI images can be used as a guide to develop clever imaging modalities that collect observations with "high information content" about the NLOS scene parameters. The HCR-LB for estimating the radius of a ball placed at the center of corridor B (location: 400) is shown in Figures 3.3(d) and 3.3(h). We observe that the lower bound decreases as the radius increases, suggesting that estimating the radius of a larger ball could be relatively easier.
The pixelwise FD-FI images for the radius estimation problem in Figure 3.4(b) show that the regions of information lie mainly on the back wall and arise from multi-bounce photons, and that the number of informative pixels increases for larger radii. We can see that the regions of high information differ significantly between Figure 3.4(a) and Figure 3.4(b). We conclude that these informative regions/observations depend on both what the parameter of interest is (see the difference between Figures 3.4(a) and 3.4(b)) and on the particular value of the parameter (see the differences amongst the images in each of Figures 3.4(a) and 3.4(b)). Another interesting point to note is that, while the informative regions change significantly with the type of parameter under consideration and with different values of a given parameter, they are nearly identical for the two noise models considered here. However, the magnitudes of the FD-FI pixels differ considerably between the two noise models (see the first two rows of Figures 3.4(a) and 3.4(b)).
3.5 Computing Lower bounds with Inexact Rendering
The framework above implicitly assumes that the rendering engine yields the exact plenoptic intensities for a given set of scene parameter values. While this assumption might be valid for scenes with simple geometry, lighting, and/or surface scattering models (or BRDFs), for complex scenes the rendered images could be erroneous, with per-pixel error that depends inversely on the number of Monte-Carlo samples used by the rendering algorithm to solve the rendering equation (3.1). In this section, we analyze the inconsistencies that arise when using plenoptic data with rendering errors to compute lower bounds of the form of (3.6) and (3.7). We also provide a simple algorithm to estimate intervals for the true HCR-LB when using inexact renderings. The errors induced by light-transport (or rendering) algorithms have been previously studied by Arvo et al. [73], where the authors identify different sources of error in the rendering algorithm and provide theoretical bounds. More recently, Celarek et al. [74] propose a methodology to numerically estimate rendering errors in terms of MSE and also per-pixel standard deviation using multiple short renderings. While that work aims to use the error estimates to compare various rendering methodologies, our aim here is to assess and quantify how rendering errors percolate into our HCR-LB framework. It is further useful to highlight the distinction between the rendering errors considered in this section and the different noise models for the plenoptic observations in Sections 3.4.1 and 3.4.2. While the Poisson and AWGN noise models describe the noise/uncertainty in the observations of an imaging system for scene parameter estimation, this section analyzes the effect of using inaccurately rendered data to compute the HCR-LB presented in Sections 3.4.1 and 3.4.2.
For some value of the scene parameter θ, let us denote the true plenoptic intensities as Lθ, and the output of the rendering algorithm computed using N samples per-pixel as $\tilde{L}_\theta^{(N)} = L_\theta + E_\theta^{(N)}$, where $E_\theta^{(N)}$ is the (additive) rendering error. In our analysis, we assume that for all θ ∈ Θ and ω ∈ Ω, the rendering algorithm satisfies the following assumptions:
A.1 The rendering algorithm is unbiased, i.e., $\mathbb{E}[\tilde{L}_\theta^{(N)}(\omega)] = L_\theta(\omega)$, or equivalently $\mathbb{E}[E_\theta^{(N)}(\omega)] = 0$.
A.2 Most weighted sums of the pixel-wise error variance decay at the rate of Θ(N⁻¹)³. In other words, for weights Wω ∼ Uniform[0, Lmax], where Lmax is an absolute constant,
$$\sum_{\omega \in \Omega} W_\omega \cdot \mathrm{Var}\big(E_\theta^{(N)}(\omega)\big) \,=\, C_\theta \cdot N^{-1},$$
with high probability⁴, for some scene-dependent constant Cθ > 0.
A.3 The higher-order moments of the relative per-pixel error are upper bounded by O(N⁻¹); i.e., ∃ N₀, δ > 0, s.t. for all N > N₀ and k > 2, $\mathbb{E}\left[\left(\frac{E_\theta^{(N)}(\omega)}{L_\theta(\omega)}\right)^k\right] = O(N^{-(1+\delta)})$.
A.4 The pixel-wise errors for different scene parameter values are statistically uncorrelated.
Remark 3.1. Assumption A.1 is a mild assumption and is satisfied by many modern rendering algorithms. In particular, we use Mitsuba's [1] Bi-Directional Path Tracer (BDPT) and Redner's [53] path tracer, which are unbiased, for rendering all our scenes.
Remark 3.2. A sufficient condition for Assumption A.2 is that the error variance of each pixel satisfy $\mathrm{Var}(E_\theta^{(N)}(\omega)) = \Theta(N^{-1})$ for all ω ∈ Ω. While such behavior is expected from standard Monte-Carlo based algorithms, modern rendering algorithms use other techniques like importance sampling to improve performance. Hence the N⁻¹ rate might not hold pixel-wise, but is typically valid when averaged over all pixels.

³Here Θ(·) refers to asymptotic notation and is not to be confused with the class of scene parameter values defined in Section 3.3.
⁴The probability is with respect to the randomness in the weights Wω and not the rendering errors.
We present simulations in Appendix 3.9.1 to empirically validate Assumption A.2. In particular, we show empirically that the weighted sum of the pixel-wise error variance decays as 1/N for the example scene used in this paper. For rendering algorithms with different rates of convergence, we could simply assume that $\sum_{\omega \in \Omega} W_\omega \cdot \mathrm{Var}(E_\theta^{(N)}(\omega)) = C_\theta \cdot N^{-p}$, where the degree of decay p > 0 depends on the nature of the rendering algorithm used. The decay rate p may be known a priori, or could be estimated empirically from simulations similar to those presented in Appendix 3.9.1. Our analysis extends naturally to any p > 0.
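The empirical estimation of the decay rate p mentioned above can be carried out with a log-log least-squares fit across renders at several sample counts. The sketch below demonstrates the procedure on synthetic variance measurements generated with a known p = 1; the data-generating constants are assumptions, standing in for measured rendering-error variances.

```python
import numpy as np

# Estimating the variance-decay exponent p in
#   sum_w W_w * Var(E^{(N)}(w)) ~ C * N^{-p}
# via a log-log least-squares fit. Synthetic "measurements" with p = 1 stand
# in for weighted error-variance sums measured from repeated renderings.
rng = np.random.default_rng(2)
Ns = np.array([256, 512, 1024, 2048, 4096, 8192])
true_C, true_p = 3.0, 1.0
var_sums = true_C * Ns ** (-true_p) * np.exp(rng.normal(0.0, 0.02, Ns.size))

# Model: log(var_sum) = log(C) - p * log(N); fit [log C, p] by least squares.
A = np.column_stack([np.ones_like(Ns, dtype=float), -np.log(Ns)])
coef, *_ = np.linalg.lstsq(A, np.log(var_sums), rcond=None)
logC_hat, p_hat = coef
```

The recovered slope estimates p; the same fit applied to measured variance sums would reveal whether a given renderer deviates from the canonical 1/N Monte Carlo rate.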
Remark 3.3. Assumption A.3 is automatically satisfied if the rendering errors are bounded in magnitude, which they typically are for most rendering algorithms used in practice, and when Assumption A.2 holds. This criterion suggests that for “sufficiently large” values of N, the higher-order error terms are “relatively small.”
Remark 3.4. Assumption A.4 holds when the renderer uses different (random number) seeds for generating its samples.
We first look at the functional form of the HCR-LB presented in Sections 3.4.1 and 3.4.2. We can see that under both the Poisson noise and AWGN models, the HCR-LB for θᵢ∗ takes the form
$$\mathrm{HCR}(\theta_i^*) \,=\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\left\{\lambda(L_{\theta^*}, L_{\theta^*+\Delta})\right\} - 1}, \qquad (3.10)$$
where λ(Lθ∗, Lθ∗+Δ) captures the dependence on the rendered plenoptic intensities. It is easy to see that
$$\lambda(L_{\theta^*}, L_{\theta^*+\Delta}) \,=\, \begin{cases} \displaystyle\sum_{\omega \in \Omega} \frac{\left(L_{\theta^*}(\omega) - L_{\theta^*+\Delta}(\omega)\right)^2}{L_{\theta^*}(\omega)} \,\triangleq\, \lambda_P & \text{for the Poisson noise model} \\[2.5ex] \dfrac{\|L_{\theta^*} - L_{\theta^*+\Delta}\|^2}{\sigma^2} \,\triangleq\, \lambda_G & \text{for AWGN with noise variance } \sigma^2 \end{cases}. \qquad (3.11)$$
We suppress the dependence of λ on θ∗, Δ, and the forward model for the sake of brevity. Such dependences will be made explicit whenever additional clarity is required. When we use inaccurately rendered plenoptic data $\tilde{L}_{\theta^*}^{(N)}$ and $\tilde{L}_{\theta^*+\Delta}^{(N)}$, where N samples per-pixel are used for rendering, we end up with $\tilde{\lambda}^{(N)}$ and obtain an erroneous estimate of the HCR functional (the function being maximized in the HCR-LB),
$$f(\tilde{\lambda}^{(N)}; \theta^*, \Delta) \,=\, \frac{\Delta_i^2}{\exp\left\{\tilde{\lambda}^{(N)}\right\} - 1}. \qquad (3.12)$$
We first characterize the effect of rendering error on the computation of λ via the following theorems, and then use them to analyze the effect on the HCR functional (3.12) and the overall HCR-LB. The proofs of Theorems 3.1 and 3.2 appear in Appendices 3.9.2 and 3.9.3, respectively.
Theorem 3.1 (Effect of rendering error on λP). Let us denote the value of λP computed using the (inexactly) rendered data with N samples per-pixel as $\tilde{\lambda}_P^{(N)}$. Then we have
$$\tilde{\lambda}_P^{(N)} \,=\, \lambda_P + \frac{C_P}{N} + \eta_P, \qquad (3.13)$$
where CP ≥ 0 is a constant that depends only on the scene parameters, $|\mathbb{E}[\eta_P]| = O(N^{-(1+\delta)})$, $\mathrm{Var}(\eta_P) = O(N^{-1})$, and δ > 0 is the constant appearing in Assumption A.3.
Theorem 3.2 (Effect of rendering error on λG). Let us denote the value of λG computed using the (inexactly) rendered data with N samples per-pixel as $\tilde{\lambda}_G^{(N)}$. Then we have
$$\tilde{\lambda}_G^{(N)} \,=\, \lambda_G + \frac{C_G}{N} + \eta_G, \qquad (3.14)$$
where CG ≥ 0 is a constant that implicitly depends only on the scene parameters and the AWGN variance σ², $\mathbb{E}[\eta_G] = 0$, and $\mathrm{Var}(\eta_G) = O(N^{-1})$.
Theorems 3.1 and 3.2 imply that λ̃ overestimates the true λ in expectation, with the estimation bias shrinking at the rate of Θ(N⁻¹). While the CP/N and CG/N terms denote the (positive) bias, the ηP and ηG terms codify the variance of the computed λ̃'s. Although both the bias and the variance terms exhibit the same O(N⁻¹) rates, there is a bias-variance tradeoff in the computed λ̃ that depends on the scene parameters θ∗ and θ∗ + Δ. In order to better understand this bias-variance tradeoff, let us take a closer look at the expression for λ̃. We use the AWGN case here for the sake of exposition, but the tradeoff holds similarly for the Poisson noise model as well. Using (3.11) we can write
$$\tilde{\lambda}_G^{(N)} \,=\, \lambda_G + \underbrace{\frac{1}{\sigma^2}\left\| E_{\theta^*} - E_{\theta^*+\Delta} \right\|^2}_{\eta_1} + \underbrace{\frac{2}{\sigma^2}\left(L_{\theta^*} - L_{\theta^*+\Delta}\right)^T \left(E_{\theta^*} - E_{\theta^*+\Delta}\right)}_{\eta_2}. \qquad (3.15)$$
Let us see how η1 and η2 behave under two different scenarios:
Case (1): ‖Lθ∗ − Lθ∗+Δ‖ ≫ ‖Eθ∗ − Eθ∗+Δ‖: In this scenario, |η1| ≪ |η2|, and we can write $\tilde{\lambda}_G^{(N)} \approx \lambda_G + \eta_2$. Since E[η2] = 0 (from Assumption A.1), we can see that the computed value $\tilde{\lambda}_G^{(N)}$ is (approximately) unbiased, i.e., $\mathbb{E}[\tilde{\lambda}_G^{(N)}] \approx \lambda_G$. Furthermore, we have $\mathrm{Var}(\tilde{\lambda}_G^{(N)}) \approx \mathrm{Var}(\eta_2) = O(N^{-1})$ from Assumption A.2. In other words, this is a regime where the (zero-mean) variance term dominates the overall error. This typically happens when:

• ‖Δ‖ is large, or
• the parameters of interest are in LOS, which means that even a small perturbation of the true parameter θ∗ would result in a large difference in the plenoptic observations, or
• the value of N ≫ 1, which would result in the rendering errors being very small.
Case (2): ‖Lθ∗ − Lθ∗+Δ‖ ≪ ‖Eθ∗ − Eθ∗+Δ‖: In this scenario, |η1| ≫ |η2|, and we can write $\tilde{\lambda}_G^{(N)} \approx \lambda_G + \eta_1 > \lambda_G$. This in turn means that the HCR functional is underestimated, i.e., $f(\tilde{\lambda}_G^{(N)}; \theta^*, \Delta) < f(\lambda_G; \theta^*, \Delta)$. Furthermore, we have $\mathbb{E}[\eta_1] = \Theta(N^{-1})$ and $\mathrm{Var}(\eta_1) = O(N^{-(1+\delta)})$ from Assumptions A.2 and A.3, respectively. In other words, this is a regime where the (non-negative) bias term dominates the overall error. This happens when:

• ‖Δ‖ is small, or
• the parameters of interest are in NLOS, which means that even large perturbations of the true parameter θ∗ would yield very similar plenoptic observations, or
• the value of N is small, which would result in larger rendering errors.
Furthermore, since the overall HCR-LB is achieved by maximizing the HCR functional f(λ; θ∗, Δ), the supremum typically occurs for small values of λ, or when ‖Lθ∗ − Lθ∗+Δ‖ is small (corresponding to Case (2) above). Thus the discussion above implies that the overall HCR-LB is usually underestimated, especially for NLOS imaging problems, when using inaccurately rendered plenoptic data to compute the bounds.
Remark 3.5 (Effect of rendering error on the overall HCR lower bound). A direct implication of Theorems 3.1 and 3.2 is that the HCR-LB for Poisson and AWGN noise models computed using plenoptic data rendered with N samples per-pixel obeys,
$$\lim_{N \to \infty} \mathrm{HCR}_N(\theta_i^*) \,\overset{a.s.}{=}\, \mathrm{HCR}(\theta_i^*), \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.16)$$
Furthermore, if the supremum of the true HCR functional f(λ; θ∗, Δ) occurs for ‖Δ‖ ≤ ε, then from the above discussion we can infer that there exists a constant N₀(ε) that depends on the scene parameters such that for all N < N₀(ε), with high probability,
$$\mathrm{HCR}_N(\theta_i^*) \,\leq\, \mathrm{HCR}(\theta_i^*), \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.17)$$
The value N₀(ε) essentially determines how the number of samples per-pixel affects the bias-variance tradeoff described above.
Equation (3.17) simply follows from the observation that for small values of N, the bias terms CP/N and CG/N dominate the variance terms (ηP and ηG respectively), so that $\tilde{\lambda}^{(N)} \geq \lambda$, which in turn means that $\mathrm{HCR}_N(\theta_i^*) \leq \mathrm{HCR}(\theta_i^*)$.
3.5.1 Estimating HCR Lower Bounds from Inexact Rendering
Our error analysis above shows how rendering error manifests itself in the computation of the HCR-LB. One important take-away from this analysis is that in order to get a good estimate of the HCR-LB, one must use as many samples per-pixel as possible to render the scenes. However, this might result in a computational bottleneck, as ray-tracing is a highly time and memory intensive process. For example, rendering each high-resolution plenoptic (multi-spectral) image in Section 3.4.4 until convergence using Mitsuba [1] took ∼ 2.5 hours, and we had to render close to 800 such images for the location estimation example (and an additional 100 images for the radius estimation example). In this section, we describe a simple method to estimate upper and lower intervals for the true HCR-LB. For any given values of θ∗ and Δ, Theorems 3.1 and 3.2 suggest that the relationship between the true and the observed λ's (for both the Poisson and AWGN noise models) is given by
$$\tilde{\lambda}^{(N)}(\theta^*, \Delta) \,=\, \lambda(\theta^*, \Delta) + \frac{C(\theta^*, \Delta)}{N} + \eta, \qquad (3.18)$$
where C(θ∗, Δ) ≥ 0 is the coefficient of the rate of decay of the bias, and η models the higher-order (moments 3 and higher) terms of the rendering errors, with Var(η) = O(N⁻¹). Thus we can use a data-driven method to estimate the unknown λ(θ∗, Δ) and C(θ∗, Δ) by rendering the scenes with different values of N and solving for the unknowns using a simple (weighted) Least-Squares algorithm. In particular, if we render the scenes for parameter values θ∗ and θ∗ + Δ using
N₁, N₂, ..., Nₖ samples per-pixel, for some k ≥ 2, then we get the following system of linear equations:
$$\begin{bmatrix} \tilde{\lambda}^{(N_1)}(\theta^*, \Delta) \\ \vdots \\ \tilde{\lambda}^{(N_k)}(\theta^*, \Delta) \end{bmatrix} \,=\, \begin{bmatrix} 1 & \frac{1}{N_1} \\ \vdots & \vdots \\ 1 & \frac{1}{N_k} \end{bmatrix} \begin{bmatrix} \lambda(\theta^*, \Delta) \\ C(\theta^*, \Delta) \end{bmatrix} + \begin{bmatrix} \eta_1 \\ \vdots \\ \eta_k \end{bmatrix}. \qquad (3.19)$$
Equation (3.19) can be solved in closed form to obtain an unbiased estimate (approximately unbiased for the case of the Poisson noise model), λ̂(θ∗, Δ), of λ(θ∗, Δ). We can then use the estimated λ̂(θ∗, Δ) in (3.10) to get
$$\widehat{\mathrm{HCR}}(\theta_i^*) \,:=\, \sup_{\substack{\Delta \neq 0 \\ \theta^* + \Delta \in \Theta}} \frac{\Delta_i^2}{\exp\left\{\hat{\lambda}(\theta^*, \Delta)\right\} - 1}, \quad \forall\, i = 1, 2, \ldots, d, \qquad (3.20)$$
where $\widehat{\mathrm{HCR}}(\theta_i^*)$ is a random variable whose randomness or stochasticity arises due to the rendering errors.
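A sketch of this estimation step under assumed numbers: we simulate λ̃^(N) observations according to (3.18) for a hypothetical true λ and bias coefficient C, solve the system (3.19) by least squares, and plug the debiased estimate into the HCR functional as in (3.20). The true λ, C, noise scale, and Δᵢ below are all illustrative assumptions, not values measured from any scene.

```python
import numpy as np

# Debiasing step of (3.19)-(3.20): regress observed lambda_tilde^{(N)} values
# on [1, 1/N] to recover lambda, then plug into the HCR functional. The
# lambda observations are simulated here from assumed ground-truth constants.
rng = np.random.default_rng(3)
lam_true, C_true = 0.8, 400.0
Ns = np.array([2048, 3072, 4096, 5120, 6144, 7168, 8192, 9216, 10240, 11264])

# lambda_tilde^{(N)} = lambda + C/N + eta, with Var(eta) = O(1/N)
lam_obs = lam_true + C_true / Ns + rng.normal(0.0, 1.0, Ns.size) / np.sqrt(Ns)

# Ordinary least squares on the design [1, 1/N]; a weighted fit with weights
# proportional to N would additionally account for Var(eta) = O(1/N).
A = np.column_stack([np.ones_like(Ns, dtype=float), 1.0 / Ns])
(lam_hat, C_hat), *_ = np.linalg.lstsq(A, lam_obs, rcond=None)

delta_i = 0.05                                   # perturbation of interest
hcr_hat = delta_i ** 2 / np.expm1(lam_hat)       # HCR functional at the estimate
naive = delta_i ** 2 / np.expm1(lam_obs[-1])     # plug-in using the largest N only
```

The regression extrapolates the bias C/N to N → ∞ using only moderate sample counts, which is what allows the upper interval of Claim 3.1 to be obtained without rendering to full convergence.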
Claim 3.1 (Relationship between $\widehat{\mathrm{HCR}}(\theta_i^*)$ and $\mathrm{HCR}(\theta_i^*)$). For any given parameter value θ∗, the HCR lower bound computed using the unbiased estimates λ̂(θ∗, Δ) (for all Δ) is, in expectation, an upper bound on the true HCR lower bound, i.e.,
$$\mathbb{E}\left[\widehat{\mathrm{HCR}}(\theta_i^*)\right] \,\geq\, \mathrm{HCR}(\theta_i^*), \quad \forall\, i = 1, 2, \ldots, d. \qquad (3.21)$$
Proof. We have
" # h ∗ i ∗ h ∗i E HCR([ θi ) = E sup f λb; ∆, θ ≥ sup E f λb; ∆, θ , (3.22) ∆6=0 ∆6=0
where we use the fact that $\mathbb{E}_X\big[\sup_y G(X, y)\big] \geq \sup_y \mathbb{E}_X[G(X, y)]$. Next we observe that the HCR functional $f(\lambda) := \frac{\Delta_i^2}{\exp(\lambda) - 1}$ is a convex function of λ. Hence for every θ∗ and Δ, we can apply Jensen's inequality to get
$$\mathbb{E}\left[f\big(\hat{\lambda}; \Delta, \theta^*\big)\right] \,\geq\, f\big(\mathbb{E}[\hat{\lambda}]; \Delta, \theta^*\big) \,=\, f(\lambda; \Delta, \theta^*), \qquad (3.23)$$
where the last equality in (3.23) follows from the fact that the λ̂'s are unbiased estimates. Combining Equations (3.22) and (3.23) yields the desired result, thus concluding the proof.
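The Jensen step can also be checked numerically: for an unbiased λ̂ fluctuating around a fixed λ, the average of the convex HCR functional exceeds its value at the true λ. The Gaussian distribution of λ̂ below is an arbitrary stand-in for the least-squares estimate, and all numbers are illustrative.

```python
import numpy as np

# Numerical check of the Jensen inequality in (3.23): the HCR functional
# f(lambda) = Delta_i^2 / (exp(lambda) - 1) is convex, so an unbiased
# estimate lambda_hat gives E[f(lambda_hat)] >= f(lambda).
rng = np.random.default_rng(4)
lam, delta_i = 1.0, 0.1

def f(x):
    return delta_i ** 2 / np.expm1(x)

lam_hat = lam + rng.normal(0.0, 0.15, 100000)  # unbiased: mean equals lam
mean_f = f(lam_hat).mean()                     # Monte Carlo E[f(lambda_hat)]
```

The gap between the two sides grows with the variance of λ̂, which is why the upper interval of Claim 3.1 is loose for small sample counts and tightens as N increases.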
While Claim 3.1 gives an upper bound (in expectation) on the true HCR-LB, Remark 3.5 suggests that the HCR-LB computed directly using rendered plenoptic data is (typically) a lower bound on the true HCR-LB value. Thus, for any given scene and unknown parameter value, we can compute an upper and lower interval within which the true HCR-LB is expected to lie, using a fixed computational/rendering budget. In the next section, we present experimental results for the problem of object localization, which demonstrate the utility of our error analysis framework. It is worth commenting on the relationship between the λ's in our error analysis and the CR-LB. We can obtain the CR-LB for some scalar parameter θ∗ as
\[
\mathrm{CR}(\theta^*) = \lim_{\Delta \to 0} \frac{\Delta^2}{\lambda(L_{\theta^*}, L_{\theta^* + \Delta})}. \tag{3.24}
\]
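The limit in (3.24) can be checked numerically on a toy forward model. Using the Poisson-case expression $\lambda_P = \sum_\omega (L_1(\omega) - L_2(\omega))^2 / L_1(\omega)$ from Section 3.9.2, and a purely illustrative single-pixel linear model $L_\theta = a + b\,\theta$, the HCR functional approaches the CR-LB from below as $\Delta \to 0$:

```python
import numpy as np

# Toy single-pixel linear forward model (purely illustrative):
# L_theta = a + b * theta, so the Poisson-case lambda from (3.26)
# is lambda_P(Delta) = (b * Delta)^2 / L_{theta*}.
a, b, theta_star = 5.0, 2.0, 1.0
L_star = a + b * theta_star

lam = lambda d: (b * d) ** 2 / L_star
cr_lb = L_star / b**2                 # closed-form value of the limit (3.24)

# The HCR functional Delta^2 / (exp(lambda) - 1) approaches the CR-LB
# from below as Delta -> 0, since exp(lam) - 1 >= lam.
hcr_vals = [d**2 / np.expm1(lam(d)) for d in (0.5, 0.1, 0.01, 0.001)]
```

Shrinking $\Delta$ drives the HCR functional monotonically up toward the CR-LB, consistent with the HCR-LB being at least as tight as the CR-LB.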
From Remark 3.5 and the discussion above it, we can see that for small values of $\Delta$, using rendered data (and $\widetilde{\lambda}^N$) to compute the CR-LB via Finite Differences typically results in underestimating the CR-LB as well, especially for NLOS parameter estimation problems.

3.5.2 Experimental Validation: NLOS Object Localization
Setup
We use the same scene layout as described in Section 3.2.2, with a different camera configuration and object. We consider the problem of estimating the location of a red teapot (downloaded from Morgan McGuire's website [75]) that is constrained to lie on a straight line in Corridor B, as shown in Figure 3.5(a). The distance of the teapot from the intersection of corridors A and B is the scalar parameter of interest $\theta^*$. The camera is located at the center of corridor A, as shown in Figure 3.5(a), and captures RGB images of size 160 × 120. Using the insights on Fisher information from Section 3.4.4, we see that regions on the floor near the corner carry more information about hidden objects than others. Hence we point the camera slightly towards the floor instead of looking straight at the back wall. The luminance of the ceiling lights was set to $12\ \mathrm{lm \cdot sr^{-1} m^{-2}}$, emitting white light (uniformly over all wavelengths)$^5$. We use Redner's [53] default path-tracer to render the scenes. Figures 3.5(b) and 3.5(c) are the rendered images for $\theta^* = 0.2$m and $0.9$m respectively, using 65536 samples per pixel. We compute the HCR-LB for 100 teapot locations from 0.7m to 1.69m at 1cm increments. For obtaining $\widehat{\mathrm{HCR}}$ from the inexact renderings discussed in Section 3.5.1, we render all 100 scenes with 10 different values of samples per pixel, $N = 2048, 3072, \dots, 11264\,(= 11 \cdot 1024)$. For computing the HCR-LB directly from the rendered plenoptic data as discussed in Section 3.4, we use $N_{\mathrm{eff}} = 65536$ samples per pixel. We use the same rendering/computational budget for both methods, i.e., the overall rendering time using
$N_{\mathrm{eff}} = 65536$ samples and rendering the scene 10 times with $N = 2048, 3072, \dots, 11264$ takes approximately the same time (around 3.3 minutes per teapot location).
Results and Discussion
We can see from Figure 3.6 that $\widehat{\mathrm{HCR}}(\theta^*) \ge \mathrm{HCR}_{N_{\mathrm{eff}}}(\theta^*)$ for $\theta^* \ge 0.9$m, which corresponds to the NLOS regime. We can also observe a sharp increase in the lower bounds around $\theta^* = 0.9$m, which corresponds to the teapot moving away from the LOS of the camera. Such an increase is expected, since localizing NLOS objects is a much harder problem, as we saw in Section 3.4.4. We also observe that the HCR-LB (both $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$) increases for both the AWGN and Poisson noise models as the teapot moves
$^5$The rendered images obtained using the illumination and camera setup here are roughly similar to those obtained using a commercially-available 2000 lm ceiling light with an exposure time of 1/120 seconds.
Figure 3.5: (a) Top view of the scene layout used. The scene geometry is the same as in Section 3.2.2. Instead of a red spherical object, we consider a red teapot, and the camera is now placed in the middle of the hallway and captures RGB images. The scalar parameter of interest $\theta^*$ is the horizontal displacement of the teapot from the intersection of corridors A and B. RGB images rendered using Redner for different values of $\theta^*$ are shown in (b) and (c); 65536 samples per pixel were used, and it took around 3.3 minutes to render each scene. (b) $\theta^* = 0.2$m: teapot fully in LOS; (c) $\theta^* = 0.9$m: teapot just moved completely out of LOS.

further away from LOS and into corridor B, which again is expected.
It can also be seen that $\widehat{\mathrm{HCR}}$ (black lines) is not as smooth as $\mathrm{HCR}_{N_{\mathrm{eff}}}$ (red lines). We believe that this is because $\widehat{\mathrm{HCR}}$ is a noisy estimate obtained from many sets of low samples-per-pixel renderings, and hence has higher variance than $\mathrm{HCR}_{N_{\mathrm{eff}}}$, which is computed using a single set of rendered images with a larger number of samples per pixel. Furthermore, it is worth emphasizing that $\widehat{\mathrm{HCR}}$, shown as is, does not necessarily upper bound the true HCR-LB; the relationship holds only in expectation, i.e., $\mathbb{E}[\widehat{\mathrm{HCR}}(\theta^*)] \ge \mathrm{HCR}(\theta^*)$, where the expectation is taken with respect to the rendering errors. Since computing $\mathbb{E}[\widehat{\mathrm{HCR}}]$ is computationally prohibitive, we use $\widehat{\mathrm{HCR}}$ as a surrogate for the upper bound on the true HCR-LB in our discussions. Thus it is important to keep in mind that the upper bounds (and hence the HCR intervals) shown here are "approximate" and help us understand and interpret the fundamental limits of parameter estimation problems associated with plenoptic imaging systems. Figure 3.7(a) shows the effect of samples per pixel $N$ on $\lambda$. As we increase $N$, we can see that $\widetilde{\lambda}^N$ uniformly decreases across all values of $\Delta$ for NLOS parameters, as predicted by our analysis. From Figure 3.7(b), we can see that $\widetilde{\lambda}^{N_{\mathrm{eff}}}(\theta^*, \Delta) \ge \widehat{\lambda}(\theta^*, \Delta)$ for all $\Delta$. Even though $\widehat{\lambda}$ and $\widetilde{\lambda}^{N_{\mathrm{eff}}}$ are close, Figure 3.7(c) shows that the difference in
Figure 3.6: HCR-LB for estimation of teapot location under AWGN and Poisson Noise.
$\mathrm{HCR}_{N_{\mathrm{eff}}}$ (red lines): HCR-LB computed directly using rendered data with $N_{\mathrm{eff}} = 65536$ samples per pixel; $\widehat{\mathrm{HCR}}$ (black lines): HCR-LB estimated from rendering scenes with $N = 2048, 3072, \dots, 11264$. Due to rendering errors, typically we have $\mathbb{E}[\widehat{\mathrm{HCR}}(\theta^*)] \ge \mathrm{HCR}(\theta^*) \ge \mathrm{HCR}_{N_{\mathrm{eff}}}$. The region between $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ denotes the interval within which the true HCR-LB is likely to lie. (a) HCR-LB for the Poisson noise model. (b) HCR-LB for AWGN with $\sigma = 0.1$. (c) HCR-LB when the teapot is not in LOS, for $\sigma = 0.1, 0.2, 0.4, 0.6$, and $0.8$.
Figure 3.7: Effect of samples per pixel on $\lambda$ and the HCR functional $f(\lambda)$ for estimation of the teapot location. Noise model: AWGN with $\sigma = 0.1$; true object location $\theta^* = 1.05$m. (a) $\widetilde{\lambda}^N$ in the neighborhood of $\theta^* = 1.05$m, showing how $\widetilde{\lambda}^N$ decreases with $N$ uniformly for all values of $\Delta$. (b) Plot of the estimated and observed $\lambda$'s for $\theta^* = 1.05$m, showing that $\widetilde{\lambda}^{N_{\mathrm{eff}}} \ge \widehat{\lambda}$, even with $N_{\mathrm{eff}} = 65536$ samples per pixel. (c) HCR functional obtained from the estimated and observed $\lambda$'s.
Figure 3.8: Relationship between $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ for different teapot locations, (a)-(c) for AWGN with $\sigma = 0.8$ and (d)-(f) for the Poisson noise model, under two different scenarios: (a),(d) the maximum of the HCR functional occurs as $\Delta \to 0$, implying that $\widehat{\mathrm{HCR}}$ is much larger than $\mathrm{HCR}_{N_{\mathrm{eff}}}$; (b),(e) the maximum of the HCR functional occurs for $\|\Delta\| \gg 0$, implying that $\widehat{\mathrm{HCR}}$ and $\mathrm{HCR}_{N_{\mathrm{eff}}}$ are approximately equal. (c),(f) HCR-LB for $0.7\mathrm{m} \le \theta^* \le 1.69$m.

the HCR functional is significant around the neighborhood of $\Delta = 0$, and the difference slowly tapers off for larger values of $\Delta$. This implies that the overall HCR-LB will have larger uncertainty intervals when the maximum of the HCR functional occurs in the neighborhood of $\Delta = 0$ (see Figures 3.8(a) and 3.8(d)), and smaller intervals when the maximum occurs for $\|\Delta\| \gg 0$ (see Figures 3.8(b) and 3.8(e)). We can also see from Figure 3.6(c) that the HCR intervals for the AWGN model are relatively large for smaller values of $\sigma$ and decrease as we increase the noise level $\sigma$. This arises from the fact that as we increase $\sigma$, the location of the maximum of the HCR functional moves further away from $\Delta = 0$, which results in smaller HCR intervals, as we have seen from Figure 3.8. Similar conclusions also hold for the HCR-LB under the Poisson noise model (see Figures 3.8(d) to 3.8(f)). From (3.24), we can see that the CR-LB is essentially the limit of the HCR functional as $\Delta \to 0$. Figure 3.8 would then imply that the CR-LB computed with inexactly rendered data might (in general) have larger uncertainty intervals than the HCR-LB, since there could be instances where the supremum of the HCR functional is achieved away from the neighborhood of $\Delta = 0$, in which case the uncertainty in the HCR-LB would be smaller (see Figures 3.8(b) and 3.8(e)).
The experimental results provided here validate our analysis of the effects of rendering errors on the lower bound computation, and illustrate the utility of our HCR estimation framework outlined in Section 3.5.1, where we use multiple sets of erroneously rendered data, with different numbers of samples per pixel, to obtain intervals for the true HCR-LB.
3.6 Maximum Likelihood Estimation
While the HCR-LB provides lower bounds that are at least as tight as the CR-LB, there are no guarantees on the existence of an unbiased estimator that achieves the lower bound. In order to show that the HCR-LB we derived here accurately depicts the true fundamental limits of scene parameter estimation, we compare our lower bounds with the performance of the Maximum Likelihood (ML) estimator. Maximum Likelihood Estimates (MLEs) are a good first choice because they are intuitive, simple to derive, asymptotically unbiased, and optimal for many simple estimation problems. While we make no claims about the optimality of MLEs, we show how they compare against the HCR-LB to give us a sense of how tight our lower bounds are. If the errors of the MLEs are close to the corresponding HCR-LB values, it would imply that our lower bounds are tight and indicative of the true fundamental limits. Under the additive white Gaussian noise model described in Section 3.4.2, we can obtain the MLE for $\theta^*$ from noisy observations $Y_\Omega$ as
\[
\widehat{\theta}_{\mathrm{ML}}(Y_\Omega) = \operatorname*{arg\,min}_{\theta \in \Theta} \; \sum_{\omega \in \Omega} \left(Y_\omega - L_\theta(\omega)\right)^2. \tag{3.25}
\]
It is worth noting that ML estimation under Gaussian noise is equivalent to minimizing the $\ell_2$ loss. We use the optimization library in PyTorch [76] to solve (3.25) and obtain the ML estimates. In particular, we use the Adam optimizer [77], where gradients of the loss function with respect to the scene parameters are obtained using Finite Differences (FD). A principled alternative to FD gradients is to use differentiable renderers like Redner
Figure 3.9: Top row: clean images for different teapot locations ($\theta^* = 0.8$m, $1.0$m, $1.2$m, $1.4$m, and $1.6$m), rendered using 65536 samples per pixel. Bottom row: a single instance of noisy images corrupted by AWGN with $\sigma = 0.1$. After the teapot goes completely out of LOS ($\theta^* > 0.9$m), it is very hard to discern any information about the teapot by simply looking at these images (both the clean and the noisy versions).
[53] or Mitsuba 2.0 [78], which enable us to directly render gradients of the plenoptic observations with respect to the scene parameters of interest. We observed that rendering gradients of multi-bounce photon paths has a much larger memory footprint and a 25× to 30× computational overhead compared to rendering the actual plenoptic observations. Thus, for problems with a small number of parameters (< 15 or so), we were able to obtain more robust gradients using FD than by using differentiable rendering.
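The optimization procedure described above can be sketched as follows. This is a minimal NumPy sketch rather than the PyTorch implementation used in the experiments; `render` is a hypothetical placeholder for a call to the (possibly stochastic) renderer, and the step sizes are illustrative:

```python
import numpy as np

def fd_grad(loss, theta, xi=0.01):
    """Central-difference gradient of the loss (cf. footnote 6)."""
    return (loss(theta + xi) - loss(theta - xi)) / (2.0 * xi)

def ml_estimate(Y, render, theta0, iters=1000, lr=0.02, xi=0.01):
    """Minimize the l2 loss of (3.25) with a minimal Adam loop, using
    finite-difference gradients.  render(theta) stands in for a call
    to the renderer for parameter value theta."""
    loss = lambda th: float(np.sum((Y - render(th)) ** 2))
    m, v, theta = 0.0, 0.0, theta0
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, iters + 1):
        g = fd_grad(loss, theta, xi)
        m = b1 * m + (1 - b1) * g            # first-moment estimate
        v = b2 * v + (1 - b2) * g * g        # second-moment estimate
        m_hat = m / (1 - b1 ** t)            # bias corrections
        v_hat = v / (1 - b2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta
```

On a synthetic forward model, e.g. `render = lambda th: np.array([2.0 * th, 3.0 * th])` with noiseless observations at the true parameter, the loop recovers the parameter to within a small fraction of the FD step size.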
3.6.1 Experimental Evaluation: NLOS Object Localization using Maximum Likelihood Estimation
Setup
We use the exact same setup as in Section 3.5.2. We consider the problem of estimating the location of a red teapot that is constrained to lie on a straight line in Corridor B, as shown in Figure 3.5(a). The distance of the teapot from the intersection of corridors A and B is the scalar parameter of interest $\theta^*$. The camera is located at the center of corridor A, as shown in Figure 3.5(a), and captures RGB images of size 160 × 120. We use

Parameter $\theta^*$ (cm) | $\mathrm{HCR}_{N_{\mathrm{eff}}}(\theta^*)$ (cm$^2$) | $\widehat{\mathrm{HCR}}(\theta^*)$ (cm$^2$) | MSE($\widehat{\theta}_{\mathrm{ML}}$) (cm$^2$) | Var($\widehat{\theta}_{\mathrm{ML}}$) (cm$^2$)
80  | $4.17 \times 10^{-214}$ | $1.30 \times 10^{-216}$ | 0.0024 | 0.0019
100 | 2.2201 | 4.3282 | 4.0518 | 4.0372
120 | 2.6999 | 4.9048 | 5.9466 | 4.0569
140 | 3.3964 | 5.6872 | 6.7245 | 6.2610
160 | 6.7721 | 11.6408 | 12.8361 | 12.3953
Table 3.1: Comparison of HCR lower bound and performance of MLE for AWGN with σ = 0.1.
Redner [53] to render the true plenoptic observations using $N_{\mathrm{eff}} = 65536$ samples per pixel, and then synthetically add AWGN with $\sigma = 0.1$ to generate noisy observations. Figure 3.9 shows the rendered clean images and a single instance of the noisy observations for 5 different teapot locations, $\theta^*$ (in m) $= 0.8, 1.0, 1.2, 1.4$, and $1.6$. Except for $\theta^* = 0.8$m, all other locations correspond to the case where the teapot is completely out of LOS.
Results and Discussion
We obtain the ML estimates by solving (3.25), where the derivative of the $\ell_2$ loss is computed using FD$^6$. As opposed to the 65536 samples used to render the clean (noiseless) images, we use low-sample renderings to obtain fast (and noisy) gradients for the stochastic optimization procedure. We use $N = 512$ samples for the initial few iterations and then use $N = 1024$ towards the end. It is worth noting that it took approximately 3 seconds and 6 seconds to compute gradients with 512 and 1024 samples respectively, on an NVIDIA Quadro RTX 8000 GPU with 48GB of RAM. The overall runtime for (a single run of) the ML estimation algorithm varied from 28 to 40 minutes, depending on the true location of the teapot. The algorithm converged faster when the teapot was closer to the beginning of the corridor. We perform 30 independent runs of ML estimation, with different realizations of the noisy observations, for each $\theta^*$ to assess the performance of the MLE. The initialization point for every run was chosen uniformly at random in the interval 0.7m to 1.7m. We report the average MSE and the variance of the ML estimates over the 30 runs, along with the computed HCR-LB (both $\mathrm{HCR}_{N_{\mathrm{eff}}}$ and $\widehat{\mathrm{HCR}}$) from the previous section, in
$^6$We use the central difference method to obtain the gradients of the loss: $\nabla_\theta \ell(\theta_0) \approx \frac{\ell(\theta_0 + \xi) - \ell(\theta_0 - \xi)}{2\xi}$, with $\xi = 0.01$m, where $\ell(\theta_0)$ is the loss function.
Figure 3.10: Comparison of the HCR lower bound and the MLE for AWGN with $\sigma = 0.1$.
Table 3.1. Firstly, it is worth noting that the average MSE and the variance of the MLEs are very close to each other for most values of $\theta^*$. This implies that the ML estimator for this problem is (nearly) unbiased, and hence the HCR-LBs are applicable to the ML estimates. While it can be seen that the average MSE is a bit higher than the variance for $\theta^* = 1.2$m, we believe that this difference would vanish as we perform more runs (> 30) of the ML estimation. When the teapot is in LOS ($\theta^* = 0.8$m), we can see that the ML estimator is (unsurprisingly) able to estimate the object location quite accurately. As the object translates further away from LOS, the performance of the ML estimator degrades in tandem with the HCR-LB. Furthermore, the MSE (and the variance) of the MLEs are quite close to the HCR-LB intervals computed in the previous sections. Starting at random initializations from a 1m-wide interval, the ML estimates converge to within a few cm of the true teapot location. This is impressive and promising, considering that it is nearly impossible to find any information about the hidden object directly by looking at the images in Figure 3.9. On the other hand, the renderer-enabled ML estimator is able to make use of the information content in the very weak signals from the indirect photons to localize the object. While there are no theoretical guarantees about the existence of estimators that can achieve the HCR-LB, we can see from this experiment that the performance of the ML estimator is close to the HCR-LB. Also, the error rates of the MLEs exhibit similar behavior as a function of the true object location as the HCR-LB. This shows that the renderer-enabled HCR-LB framework proposed here yields lower bounds that are indicative of the true underlying fundamental limits for scene parameter estimation problems in plenoptic imaging systems.
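The near-equality of the average MSE and the variance in Table 3.1 is exactly the statement that the squared bias, MSE $-$ Var, is negligible, since MSE decomposes exactly as variance plus squared bias. Given the per-run estimates, the check is a few lines (the estimate values below are synthetic stand-ins, not the experimental data):

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 1.2                               # hypothetical true location (m)
est = theta_true + rng.normal(0.0, 0.02, 30)   # stand-in for 30 ML estimates

mse = np.mean((est - theta_true) ** 2)         # average MSE over the runs
var = np.var(est)                              # variance of the estimates
bias_sq = (np.mean(est) - theta_true) ** 2     # squared bias of the estimator
```

Here `mse` equals `var + bias_sq` exactly, so comparing the MSE and variance columns of Table 3.1 directly reads off how biased the estimator is.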
3.7 Conclusion
We presented a framework to compute information-theoretic lower bounds for estimating scene parameters from noisy plenoptic data. Our approach employed the HCR-LB over the commonly used CR-LB, since the former (a) is more amenable to our settings, where computing partial derivatives (with respect to parameters of interest) might be infeasible, (b) applies to a wider range of problems, and (c) yields a bound that is at least as good as the CR-LB. Using computer graphics rendering packages, we overcome the difficulty of having to solve the forward model in closed form, and numerically evaluate the HCR-LB.

Furthermore, we analyze the effects of rendering error on the computed HCR-LB and show that the rendering error typically introduces a bias in the computed HCR-LB values. In particular, we show that the HCR functional computed using erroneously rendered images underestimates the true value, especially for NLOS parameter estimation problems. We show that this bias vanishes at the rate of $O(N^{-1})$ for the unbiased and progressive renderers used here, where $N$ is the number of samples per pixel. Based on our error analysis, we also provide a simple method to estimate intervals for the true HCR-LB. Our error analysis automatically accounts for the error accrued in computing the CR-LB using Finite Differences (FD), and indicates that the uncertainty (or the size) of the estimated intervals for the CR-LB would be at least as large as those for the HCR-LB. Thus, in addition to being at least as tight as the CR-LB in value, the HCR-LB is also at least as robust to rendering errors as the CR-LB.

Our renderer-enabled lower-bounding framework has been used to compute lower bounds for a few illustrative NLOS imaging problems under two common noise models: Poisson noise and AWGN with different noise variances. Additionally, we are able to compute pixelwise Fisher Information (or FD-FI).
This FD-FI data provides useful insights, especially for NLOS imaging problems, as it tells us which of the indirect photons/observations convey more information about the parameter(s) of interest. We believe that these insights and tools can be used to develop novel adaptive sensing strategies for scene parameter estimation. Although we explore only classical multi-spectral imaging systems in this work, our estimation-theoretic framework readily generalizes to accommodate additional dimensions of the plenoptic function, e.g., lenslet-array camera systems, polarization, motion in the scene, etc.

The potential benefits of our HCR-LB framework come at the cost of increased computational requirements. Computing the HCR-LB involves finding the supremum of the expression in (3.4) for general noise models. In this work, we compute the supremum by exhaustive search, since the parameter space is small. However, for problems with multiple parameters ($d \gg 1$), the computational time for exhaustive search would increase exponentially, making it prohibitive to apply in high-dimensional settings. For such settings, we could potentially explore either derivative-free (zeroth-order) optimization methods, or recently developed differentiable renderers like [53, 78] to compute derivatives and evaluate the HCR-LB, Equations (3.6) and (3.7), using gradient-based optimization algorithms. While it might be hard to obtain the global maximum of the HCR functional using gradient-based methods, convergence to any local maximum would yield a valid (but possibly loose) lower bound. It would be interesting to study the landscape of the HCR functional for such multi-parameter estimation problems. We defer these investigations to future work.

Finally, we supplement our lower-bounding framework by comparing the HCR-LB computed here with the performance of the Maximum Likelihood Estimator (MLE) for a simple, but illustrative, object localization problem.
We see that the HCR-LB values closely match the behavior of the MLEs, indicating that our framework is able to compute meaningful lower bounds that reflect the true fundamental limits of scene parameter estimation in plenoptic imaging systems. While (asymptotically) unbiased estimators like MLEs are useful in understanding the fundamental limits of parameter estimation, it is common to introduce "bias" into the estimates in the form of regularization to reduce the overall estimation error [79], especially in high-dimensional statistical inference. Generalizing our lower-bounding framework to such biased estimators would be an interesting avenue for future work.
3.8 Acknowledgment
We graciously acknowledge support for this work by the DARPA REVEAL program, Contract No. HR0011-16-C-0024, and the Minnesota Supercomputing Institute (MSI) at the University of Minnesota (URL: http://www.msi.umn.edu) for providing computational resources for rendering the scenes used in Section 3.4.4. We also thank Prof. Gary Meyer and his students, Prof. Michael Tetzlaff and Dr. Michael Ludwig, for useful discussions about ray-tracing algorithms and for helping out with the design of the Π-shaped hallway scene used in this paper.
3.9 Appendix
3.9.1 Empirical validation of Assumption A.2
We provide empirical evidence using simulations to validate Assumption A.2. For an example scene described below, we empirically study the relationship between the weighted sum of the per-pixel variance of rendered images and the number of samples per pixel, $N$, used for rendering. It is worth noting that for unbiased rendering algorithms, the per-pixel variance of rendered images is equivalent to the per-pixel error variance, i.e., $\mathrm{Var}(L^{(N)}(\omega)) = \mathrm{Var}(E^{(N)}(\omega))$. Hence we use the variance of the rendered pixel values in this section to empirically validate Assumption A.2. Our example scene for this simulation consists of a red teapot placed at the intersection of corridors A and B in the hallway scene described in Section 3.2.2. We consider 11 equally-spaced values of samples per pixel, $N = 1024, 2048, \dots, 11264\,(= 11 \cdot 1024)$. For each value of $N$, we render 100 independent low-resolution ($40 \times 30$) RGB images of the same scene using Redner [53] and compute the per-pixel variance. The first image (top-left) in Figure 3.11 shows a single instance of the rendered RGB scene with $N = 1024$ samples per pixel. The other 11 images in Figure 3.11 show the per-pixel variance (summed over the 3 color channels) of the rendered images. We can see that the per-pixel variance is not uniform across the entire image. Some regions of the image have smaller error (pixels corresponding to the teapot) than others (pixels on the back wall above the teapot). However, we can also observe the general trend that the pixel variances consistently decrease with increasing values of $N$.
We consider weights $W_\omega \sim \mathrm{Uniform}[0, L_{\max}]$, where we set $L_{\max} = 12$, the radiance value of the lights in the scene. It is worth mentioning that the value of $L_{\max}$ was observed to have little to no effect on the results of this simulation. For different values of $N$, we then compute $\gamma^{(N)} := \sum_{\omega \in \Omega} W_\omega \cdot \mathrm{Var}(L^{(N)}(\omega))$. In order to find the exact rate of decay of $\gamma^{(N)}$ with respect to $N$, we fit parametric models given by
Figure 3.11: Top left: a single instance of the teapot image rendered with 1024 samples. Other plots: per-pixel variance (summed over the 3 color channels) for the teapot image for different values of samples per pixel $N$. It can be seen that the per-pixel variance is not the same across all pixels in the image. While the general pattern of pixel-wise variance is similar across different values of $N$, the magnitude of the variance decreases (as expected) with increasing samples.
$\gamma^{(N)} = C_p \cdot N^{-p}$ for values of $p$ between 0.85 and 1.15 (in 0.001 increments), and choose the value of $p$ that has the smallest $L_2$ error of fit,
\[
p_{\mathrm{opt}} = \operatorname*{arg\,min}_{0.85 \le p \le 1.15} \big\| \gamma^{(N)} - C_p \cdot N^{-p} \big\|_2^2.
\]
We repeat the above process with $10^4$ independent draws of $W_\omega$ to find the distribution of $p_{\mathrm{opt}}$. Figure 3.12(a) shows that the average fitting error is smallest for $p = 1$, with an average squared $L_2$ error of $3.154 \times 10^{-4}$. Furthermore, we can see from the distribution of $p_{\mathrm{opt}}$ in Figure 3.12(b) that $p_{\mathrm{opt}}$ is highly concentrated around $p = 1$; the mean, median, and mode of the distribution all occur at $p = 1$. Finally, Figure 3.12(c) shows a single instance of $\gamma^{(N)}$ and the corresponding model fit using $p = 1$, which shows how well the parametric model $C_p \cdot N^{-1}$ fits the observed weighted sum of pixel variances. These simulations show that the weighted sum of pixel variances follows a $\Theta(N^{-1})$ parametric decay rate with high probability, thus validating Assumption A.2.
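The fitting procedure above can be sketched end to end. The synthetic $\gamma^{(N)}$ values below stand in for the weighted variance sums measured from the 100 independent renders; for each candidate $p$, the scale $C_p$ is fit by least squares and the residual is recorded:

```python
import numpy as np

rng = np.random.default_rng(0)
Ns = 1024.0 * np.arange(1, 12)             # N = 1024, 2048, ..., 11264

# Stand-in for the measured weighted variance sums gamma^(N):
# a true C / N decay perturbed by small multiplicative noise.
C_true = 40.0
gamma = (C_true / Ns) * (1.0 + 0.005 * rng.normal(size=Ns.size))

ps = np.arange(0.85, 1.1501, 0.001)        # candidate decay exponents
errs = []
for p in ps:
    basis = Ns ** (-p)
    C_p = (basis @ gamma) / (basis @ basis)  # least-squares C_p for this p
    errs.append(np.sum((gamma - C_p * basis) ** 2))

p_opt = ps[int(np.argmin(errs))]           # exponent with smallest L2 fit error
```

With the synthetic $C/N$ data, the selected exponent lands very close to $p = 1$, mirroring the concentration seen in Figure 3.12(b).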
Figure 3.12: Results from the simulation with $10^4$ independent draws of the weights $W_\omega$: (a) average squared $L_2$ error of fit vs. degree $p$; (b) distribution of the optimal degree $p_{\mathrm{opt}}$, showing that it is concentrated around $p = 1$; (c) a single instance of the weighted sum of pixel variances ($\gamma^{(N)}$), along with the model fit using $p = 1$.
3.9.2 Proof of Theorem 3.1
Let us denote the true plenoptic intensities for the parameter values $\theta^*$ and $\theta^* + \Delta$, namely $L_{\theta^*}$ and $L_{\theta^* + \Delta}$, by $L_1$ and $L_2$ respectively, for brevity of notation. We can then describe the corresponding (inexactly) rendered plenoptic values using $N$ samples per pixel as $\widetilde{L}_1^{(N)} = L_1 + E_1$ and $\widetilde{L}_2^{(N)} = L_2 + E_2$, where the dependence of the rendering noise on $N$ is implicit. For the case of the HCR lower bound under Poisson noise (Section 3.4.1), if we use the rendered plenoptic values, we end up with an erroneous estimate of $\lambda_P$, given by
\[
\widetilde{\lambda}_P = \sum_{\omega \in \Omega} \frac{\big(\widetilde{L}_1^{(N)}(\omega) - \widetilde{L}_2^{(N)}(\omega)\big)^2}{\widetilde{L}_1(\omega)} = \sum_{\omega \in \Omega} \frac{\left(L_1(\omega) - L_2(\omega) + E(\omega)\right)^2}{L_1(\omega) + E_1(\omega)}, \tag{3.26}
\]
where we let $E := E_1 - E_2$. If the errors in the rendered image are relatively small, i.e., $|E_1(\omega)| \ll L_1(\omega)$ for all $\omega \in \Omega$, then we can use the series expansion to write
\[
\frac{1}{L_1(\omega) + E_1(\omega)} = \frac{1}{L_1(\omega)} \left(1 - \frac{E_1(\omega)}{L_1(\omega)} + \frac{E_1(\omega)^2}{L_1(\omega)^2} - \dots \right). \tag{3.27}
\]
Substituting (3.27) in (3.26), we get
\[
\begin{aligned}
\widetilde{\lambda}_P &= \sum_{\omega \in \Omega} \frac{\left(L_1(\omega) - L_2(\omega) + E(\omega)\right)^2}{L_1(\omega)} \left(1 - \frac{E_1(\omega)}{L_1(\omega)} + \frac{E_1(\omega)^2}{L_1(\omega)^2} - \dots \right) \\
&= \lambda_P + \underbrace{\sum_{\omega \in \Omega} \frac{\left(L_1(\omega) - L_2(\omega)\right)^2}{L_1(\omega)} \sum_{k=1}^{\infty} \left(\frac{-E_1(\omega)}{L_1(\omega)}\right)^k}_{\text{(I)}} \\
&\qquad + \underbrace{\sum_{\omega \in \Omega} \frac{E(\omega)^2 + 2\left(L_1(\omega) - L_2(\omega)\right) E(\omega)}{L_1(\omega)} \sum_{k=0}^{\infty} \left(\frac{-E_1(\omega)}{L_1(\omega)}\right)^k}_{\text{(II)}}. \tag{3.28}
\end{aligned}
\]
If we further take a closer look at the term (I), we can see that
\[
\mathbb{E}\left[(\mathrm{I})\right] = \sum_{\omega \in \Omega} \frac{\left(L_1(\omega) - L_2(\omega)\right)^2}{L_1(\omega)^3} \,\mathrm{Var}(E_1(\omega)) + O(N^{-(1+\delta)}), \tag{3.29}
\]
where we have used the unbiasedness (Assumption A.1) and the bound on higher-order moments (Assumption A.3) of the rendering errors to get (3.29). Using similar arguments, we can show that
\[
\begin{aligned}
\mathbb{E}\left[(\mathrm{II})\right] &= \sum_{\omega \in \Omega} \frac{1}{L_1(\omega)} \left\{ \mathrm{Var}(E(\omega)) - \frac{2\left(L_1(\omega) - L_2(\omega)\right)}{L_1(\omega)} \,\mathbb{E}[E(\omega) E_1(\omega)] \right\} + O(N^{-(1+\delta)}) \\
&= \sum_{\omega \in \Omega} \left\{ \frac{\mathrm{Var}(E_1(\omega)) + \mathrm{Var}(E_2(\omega))}{L_1(\omega)} - \frac{2\left(L_1(\omega) - L_2(\omega)\right)}{L_1(\omega)^2} \left(\mathrm{Var}(E_1(\omega)) - \mathbb{E}[E_1(\omega) E_2(\omega)]\right) \right\} + O(N^{-(1+\delta)}) \quad (3.30) \\
&= \sum_{\omega \in \Omega} \left\{ \frac{2 L_2(\omega) - L_1(\omega)}{L_1(\omega)^2} \,\mathrm{Var}(E_1(\omega)) + \frac{\mathrm{Var}(E_2(\omega))}{L_1(\omega)} \right\} + O(N^{-(1+\delta)}), \tag{3.31}
\end{aligned}
\]
where we have used the fact that $E_1$ and $E_2$ are independent (Assumption A.4) and zero-mean (Assumption A.1) to get $\mathrm{Var}(E(\omega)) = \mathrm{Var}(E_1(\omega)) + \mathrm{Var}(E_2(\omega))$ and $\mathbb{E}[E_1(\omega) E_2(\omega)] = \mathbb{E}[E_1(\omega)] \cdot \mathbb{E}[E_2(\omega)] = 0$. Combining Equations (3.29) and (3.31), we get
\[
\mathbb{E}\left[(\mathrm{I}) + (\mathrm{II})\right] = \sum_{\omega \in \Omega} \left\{ \frac{L_2(\omega)^2}{L_1(\omega)^3} \,\mathrm{Var}(E_1(\omega)) + \frac{\mathrm{Var}(E_2(\omega))}{L_1(\omega)} \right\} + O(N^{-(1+\delta)}) = \frac{C_P}{N} + O(N^{-(1+\delta)}), \tag{3.32}
\]
for some scene-dependent constant $C_P \ge 0$, where $\delta > 0$ is the constant appearing in Assumption A.3. Equation (3.32) follows directly from Assumption A.2 about the rate of decay of weighted sums of the pixel-wise error variance. Now we can get a handle on the variance of $(\mathrm{I}) + (\mathrm{II})$,
\[
\mathrm{Var}\left((\mathrm{I}) + (\mathrm{II})\right) = \mathbb{E}\left[\left((\mathrm{I}) + (\mathrm{II})\right)^2\right] - \left\{\mathbb{E}\left[(\mathrm{I}) + (\mathrm{II})\right]\right\}^2. \tag{3.33}
\]
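The leading-order $C_P/N$ bias established in (3.32) can be observed in a toy Monte Carlo model of an unbiased renderer, where each pixel's rendering error is an average of $N$ i.i.d. zero-mean samples (so its variance is $\sigma_\omega^2/N$). The intensity values and noise scale below are illustrative, not taken from the scenes in this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
L1 = np.array([3.0, 5.0, 2.0])    # true intensities L_1 (illustrative)
L2 = np.array([3.5, 4.0, 2.5])    # true intensities L_2 (illustrative)
sig = 0.8                         # per-sample rendering noise std
lam_P = np.sum((L1 - L2) ** 2 / L1)

def mean_lam_tilde(N, trials=200_000):
    """Average lambda_P computed from unbiased renderings whose per-pixel
    error is an average of N i.i.d. samples (variance sig^2 / N)."""
    E1 = rng.normal(0.0, sig / np.sqrt(N), size=(trials, L1.size))
    E2 = rng.normal(0.0, sig / np.sqrt(N), size=(trials, L1.size))
    lam = np.sum((L1 - L2 + E1 - E2) ** 2 / (L1 + E1), axis=1)
    return lam.mean()

# Leading-order bias predicted by (3.32), with Var(E_i) = sig^2 / N:
C_P = sig**2 * np.sum(L2**2 / L1**3 + 1.0 / L1)
bias_256 = mean_lam_tilde(256) - lam_P
bias_1024 = mean_lam_tilde(1024) - lam_P
```

The measured biases are positive, shrink as $N$ grows, and agree with the $C_P/N$ prediction up to Monte Carlo noise and higher-order terms.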
From (3.28), we can rewrite
\[
(\mathrm{I}) = \sum_{\omega \in \Omega} \frac{c_0(\omega) E_1(\omega)}{L_1(\omega) + E_1(\omega)}, \tag{3.34}
\]