OPTIMAL STOPPING TIMES FOR ESTIMATING BERNOULLI PARAMETERS WITH APPLICATIONS TO ACTIVE IMAGING

Safa C. Medin, John Murray-Bruce, and Vivek K Goyal

Boston University Electrical and Computer Engineering Department

ABSTRACT

We address the problem of estimating the parameter of a Bernoulli process. This arises in many applications, including photon-efficient active imaging where each illumination period is regarded as a single Bernoulli trial. We introduce a framework within which to minimize the mean-squared error (MSE) subject to an upper bound on the mean number of trials. This optimization has several simple and intuitive properties when the Bernoulli parameter has a beta prior. In addition, by exploiting typical spatial correlation using total variation regularization, we extend the developed framework to a rectangular array of Bernoulli processes representing the pixels in a natural scene. In simulations inspired by realistic active imaging scenarios, we demonstrate a 4.26 dB reduction in MSE due to the adaptive acquisition, as an average over many independent experiments and invariant to a factor of 3.4 variation in trial budget.

Index Terms— adaptive sensing, Bernoulli processes, beta distribution, computational imaging, low-light imaging, photon counting, total variation regularization

1. INTRODUCTION

Estimating the parameter of a Bernoulli process is a fundamental problem in statistics and signal processing, and it underlies the relative frequency interpretation of probability [1]. From the outcomes of independent and identically distributed (i.i.d.) binary-valued trials (generically, failure (0) or success (1)), we wish to estimate the probability p of success. Among myriad applications, our interest is raster-scanned active imaging in which a scene patch is periodically illuminated with a pulse, and each period either has a photon-detection event (success) or not (failure) [2]. The probability p has a monotonic relationship with the reflectivity of that patch, and an estimate of p becomes the corresponding image pixel. For efficiency in acquisition time or illumination energy, we are motivated to form the image accurately from a small number of illumination pulses, under conditions where p is small.¹

Conventional systems are not adaptive. With a fixed number of trials n, the number of successes K is a binomial random variable, and the maximum likelihood (ML) estimate of p is K/n, which has mean-squared error (MSE) of p(1 − p)/n.² As we will show, allowing the number of trials to vary while maintaining an upper bound of n on the average number of trials can result in improved performance under various metrics. We limit our attention here to the MSE of p.

First we establish a framework for optimal adaptation of the number of trials for a single Bernoulli process. Then we consider a rectangular array of Bernoulli processes representing a scene in an imaging problem, and we evaluate the inclusion of total variation regularization for exploiting correlations among neighbors. In a simulation with parameters realistic for active optical imaging, we demonstrate a reduction in MSE by a factor of 2.67 (4.26 dB) in comparison to the same regularized reconstruction approach applied without adaptation in numbers of trials.

1.1. Related Work

In statistics, forming a parameter estimate from a number of i.i.d. observations that is dependent on the observations themselves is called sequential estimation [4]. Early interest in the sequential estimation problem for a Bernoulli process parameter was inspired by the high relative error of deterministically stopping after n trials when p is small. Specifically, the standard error of the ML estimate is √(p(1 − p)/n), which for small p is unfavorable compared to anything proportional to p. This shortcoming manifests, for example, in the difficulty of distinguishing between two small possible values for p when n is not large.

Haldane [5] observed that if one stops after ℓ successes, the (random) number of trials is informative about p, and a simple unbiased estimate with standard error proportional to p can be found provided that ℓ ≥ 3. Tweedie [6] suggested to call this inverse binomial sampling; the resulting number of trials is now commonly said to be negative binomial or Pascal distributed. More recent works have focused on non-MSE performance metrics [7, 8], estimation of functions of p [9], estimation from imperfect observations [10], and composite hypothesis testing [11].

First-photon imaging [12] introduced the use of a nondeterministic dwell time to active imaging. This method uses the number of illumination pulses until the first photon is detected to reveal information about reflectivity, setting ℓ = 1 in the concept of Haldane. Spatial correlations are used to regularize the estimation of the full scene reflectivity image, resulting in good performance from only 1 detected photon per pixel, even when half of the detected photons are attributable to uninformative ambient light. Comparing first-photon imaging to photon-efficient methods with deterministic dwell time [2, 13–21] was an initial inspiration for this work. To the best of our knowledge, no previous paper has explained an advantage or disadvantage from variable dwell time.

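The contrast between fixed-n sampling and Haldane's rule is easy to check numerically. The sketch below (Python with NumPy; the values p = 0.02, ℓ = 3, and n = 150 are illustrative choices, not taken from the paper's experiments) simulates inverse binomial sampling and Haldane's unbiased estimator (ℓ − 1)/(M − 1), whose standard error is proportional to p.

```python
import numpy as np

rng = np.random.default_rng(0)
p, ell, reps = 0.02, 3, 200_000

# Fixed number of trials: the ML estimate K/n has standard error
# sqrt(p(1-p)/n), which is large *relative to p* when p is small.
n = 150
ml = rng.binomial(n, p, size=reps) / n

# Inverse binomial sampling: stop after ell successes. NumPy's
# negative_binomial draws the number of failures before the ell-th success,
# so the total number of trials is M = ell + failures.
m = ell + rng.negative_binomial(ell, p, size=reps)
haldane = (ell - 1) / (m - 1)  # Haldane's unbiased estimator of p

print(haldane.mean())      # close to p: the estimator is unbiased
print(haldane.std() / p)   # relative standard error stays O(1) for ell >= 3
```

The estimator (ℓ − 1)/(M − 1) is exactly unbiased for a negative binomial stopping time, which is why the simulated mean lands on p even though p is small.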
This material is based upon work supported in part by the US National Science Foundation under Grant No. 1422034.

¹In applications using time-correlated single photon counters, it is recommended to keep p below 0.05 to avoid time skew and missed detections due to detector dead time [3].

²Motivations for forming some other estimate of p include a prior distribution for p or a minimax criterion.

978-1-5386-4658-8/18/$31.00 ©2018 IEEE, ICASSP 2018

1.2. Main Contributions and Outline

This work introduces a novel framework for depicting and understanding stopping rules for estimating a Bernoulli process parameter (Section 2). This framework is not limited to a single error criterion or to Bayesian formulations, but here we limit our attention to MSE of p when a prior distribution for p is known. We develop an optimal data-dependent stopping rule in detail for the case of a beta prior on p in Section 2.2, extending it to arrays of Bernoulli parameters in Section 3. Numerical results inspired by active imaging applications are presented in Section 4 to validate the proposed schemes. We conclude the paper in Section 5.

2. A SINGLE BERNOULLI PROCESS

Let {Xt : t = 1, 2, ...} be a Bernoulli process with (unknown random) parameter p, and let n ∈ R+ be a trial budget. A (randomized) stopping rule consists of a sequence of continuation probability functions

    qt : {0, 1}^t → [0, 1],    t = 0, 1, ...,    (1)

that give the probability of continuing observations after trial t – based on a biased coin flip independent of the Bernoulli process – as a function of (X1, X2, ..., Xt). The result is a random number of observed trials T.³ The stopping rule is said to satisfy the trial budget when E[T] ≤ n.

³The time T does not satisfy the standard definition of a stopping time because randomness independent of {Xt} is allowed to influence the decision of whether to continue the observations. This is to allow the trial budget to be expended exactly.

Our goal is to minimize the MSE in estimation of p through the design of a stopping rule that satisfies the trial budget and an estimator p̂(X1, X2, ..., XT). We will first show that the continuation probability functions can be simplified greatly with no loss of optimality. Then, we will provide results on optimizing the stopping rule under a beta prior on p.

2.1. Framework for Data-Dependent Stopping

Based on (1), a natural representation of a stopping rule is a binary tree representing all sample paths of the Bernoulli process, with a probability-of-continuation label at each node. This representation has 2^(t+1) − 1 labels for observation sequences up to length t. However, the tree can be simplified to a trellis without loss of optimality. Conditioned on observing k successes in m trials, all C(m, k) sequences of length m with k successes are equally likely. Thus, regardless of the prior on p, no improvement can come from having unequal continuation probabilities for tree nodes all representing k successes in m trials. Instead, these nodes should be combined, reducing the tree to a trellis. This representation has (1/2)(t + 1)(t + 2) labels for observation sequences up to length t. The continuation probability functions are reduced to a set of probabilities {q_{k,m} : m = 0, 1, ...; k = 0, 1, ..., m} for continuing after k successes in m trials, as depicted in Fig. 1.

[Fig. 1. A trellis showing continuation probabilities for observation sequences up to length 5; q_{k,m} denotes the probability of continuing after observing k successes in m trials.]

The conventional use of a fixed number of trials n corresponds to continuation probabilities

    q_{k,m} = 1 if m < n;  0 otherwise.

Regardless of the sample path, one observes exactly n trials, and the number of successes K is a Binomial(n, p) random variable.

The technique analyzed by Haldane [5] and employed in first-photon imaging [12] with ℓ = 1 can be expressed with continuation probabilities

    q_{k,m} = 1 if k < ℓ;  0 otherwise.

Observations cease with ℓ successes in M trials, where M is a NegativeBinomial(ℓ, p) random variable.

In general, observations cease with K successes in M trials, where K and M are both random variables. Importantly, the i.i.d. nature of a Bernoulli process makes the pair (K, M) contain all the information that is relevant from the sequence of observations. Similarly to the reduction from tree to trellis, conditioned on (K, M) = (k, m), all sequences of length m with k successes are equally likely, so the specific sequence among these is uninformative about p.

Our method for optimizing the design of continuation probabilities is through analyzing the mean Bayes risk reduction from continuation. We define the risk function L as the squared error

    L(p, p̂) = (p − p̂)²,

where p is the Bernoulli parameter and p̂ is the estimate of this parameter. The Bayes risk R is defined as

    R(p, p̂) = E[L(p, p̂)] = E[(p − p̂)²],

which in this case is the MSE. Using the minimum MSE (MMSE) estimator, for which p̂ = E[P], the Bayes risk is the variance of the posterior distribution. Thus, key to the optimization is to track posterior variances through the trellis. While this could be done for any prior on p, here we consider only the convenient case of choosing a conjugate prior.

2.2. Analysis Under Beta Prior

The beta distribution is the conjugate prior for the Bernoulli, binomial, and negative binomial distributions. When P has the Beta(a, b) distribution with probability density function

    f_P(p; a, b) = [Γ(a + b) / (Γ(a) Γ(b))] p^(a−1) (1 − p)^(b−1),

the posterior after observing k successes in m trials has the Beta(a + k, b + m − k) distribution. Thus, if we have started with a Beta(α, β) prior, reaching node (k, m) in the trellis implies the posterior distribution is Beta(α + k, β + m − k).

Key facts about P ~ Beta(a, b) are that its mode is (a − 1)/(a + b − 2), its mean is E[P] = a/(a + b), and its variance is

    σ²_{a,b} = var(P) = ab / [(a + b)²(a + b + 1)].    (2)

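The conjugate update and the variance formula (2) are all that is needed to track posterior variances through the trellis. A minimal sketch (the prior Beta(1, 1) and the node k = 2, m = 5 are arbitrary illustrative choices) verifies that the mean posterior variance after one additional trial collapses to a closed form, which is the continuation risk used in Section 2.2.

```python
from math import isclose

def beta_var(a, b):
    """Variance of P ~ Beta(a, b), Eq. (2): ab / ((a+b)^2 (a+b+1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def posterior(alpha, beta, k, m):
    """Beta posterior parameters after observing k successes in m trials."""
    return alpha + k, beta + m - k

def mean_risk_after_one_trial(a, b):
    """Mean posterior variance after one more trial from a Beta(a, b) posterior:
    a success has probability E[P] = a/(a+b) and leads to Beta(a+1, b);
    a failure leads to Beta(a, b+1)."""
    p_succ = a / (a + b)
    return (1 - p_succ) * beta_var(a, b + 1) + p_succ * beta_var(a + 1, b)

# Posterior update under a uniform Beta(1, 1) prior: 2 successes in 5 trials.
a, b = posterior(1, 1, k=2, m=5)
assert (a, b) == (3, 4)

# The mixture of next-step variances collapses to ab / ((a+b)(a+b+1)^2).
assert isclose(mean_risk_after_one_trial(a, b),
               a * b / ((a + b) * (a + b + 1) ** 2))
print(beta_var(a, b), mean_risk_after_one_trial(a, b))
```

The difference between the two printed values is the Bayes risk reduction that the stopping rule of Section 2.2 compares against a threshold.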
Suppose a sequence of trials reaches a node in the trellis corresponding to the posterior distribution Beta(a, b). The Bayes risk without performing an additional trial, Rstop(a, b), is the variance σ²_{a,b} given in (2). When one additional trial is performed, the posterior distribution is either Beta(a + 1, b) if the outcome of the additional trial is a success, or Beta(a, b + 1) if the outcome of the additional trial is a failure. Therefore, the mean Bayes risk resulting from continuing with one additional trial is

    Rcont(a, b) = E[(1 − P) σ²_{a,b+1} + P σ²_{a+1,b}] = ab / [(a + b)(a + b + 1)²].    (3)

The Bayes risk reduction from one additional trial is

    ∆R(a, b) = Rstop(a, b) − Rcont(a, b) = ab / [(a + b)²(a + b + 1)²].    (4)

The Bayes risk reduction provides an intuitive – while also theoretically sound – guide to allocating trials. Starting from a Beta(α, β) prior, upon reaching node (k, m), the posterior is Beta(α + k, β + m − k). Then, the Bayes risk reduction from an additional trial would be

    ∆R(k, m; α, β) = (α + k)(β + m − k) / [(α + β + m)²(α + β + m + 1)²].    (5)

Let ∆thresh > 0 denote a specified threshold value for the reduction in Bayes risk that justifies an additional trial. Then the probabilities of continuing at each node of the trellis can be given by

    q_{k,m} = 1 if ∆R(k, m; α, β) ≥ ∆thresh;  0 if ∆R(k, m; α, β) < ∆thresh.    (6)

Figure 2(a) shows an example of a trellis marked with ∆R(k, m; α, β) values, and Figure 2(b) shows the resulting continuation probabilities for ∆thresh = 0.005.

[Fig. 2. Trellis representations of (a) the Bayes risk reductions ∆R(k, m; α, β) from an additional trial and (b) the resulting continuation probabilities q_{k,m} from applying (6) with ∆thresh = 0.005. A Beta(1, 1) prior for P (i.e., uniform) is assumed.]

Through a Lagrangian formulation, one can formally establish that sweeping ∆thresh over [0, ∞) is equivalent to varying the trial budget n over [0, ∞), except that we are only achieving the values of E[T] reachable with q_{k,m} ∈ {0, 1}. Intermediate values of E[T] can be achieved by finding (k*, m*) such that ∆R(k, m; α, β) is largest among those below ∆thresh and varying q_{k*,m*} over (0, 1).

Notice that for a fixed trellis depth m, the denominator of (5) is fixed, and the numerator of (5) is a product of factors with fixed sum. Thus, from the arithmetic–geometric mean inequality, ∆R(k, m; α, β) is largest where the posterior distribution is most symmetric. This is apparent in the example in Figure 2(b); since we have started with a uniform prior, the center of each row represents a symmetric posterior, and additional observations are most merited near the center of each row. Starting with a highly asymmetric prior (α ≪ β or α ≫ β), the same principle explains an asymmetry in the optimal continuation probabilities.

The phenomenon of more trials being merited when p is near 1/2 counteracts the traditional MSE of p(1 − p)/n being largest for p near 1/2. This is illustrated in Figure 3, which shows RMSE as a function of p with and without the optimal stopping. We have optimized for MSE averaged over p and have obtained a modest improvement in this average. A more significant reduction in the worst-case MSE is a by-product of the optimization.

[Fig. 3. Dependence of RMSE on the true Bernoulli parameter p, with and without the proposed optimal stopping, when p has a uniform prior and the trial budget is n = 123. For p values sampled at multiples of 0.01, the mean from 100 000 experiments is shown; additionally, standard deviations are shown for p values that are multiples of 0.1. The proposed optimal stopping reduces the MSE (averaged over p) from 0.00134 to 0.00129.]
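The threshold rule (6) is straightforward to implement. The following sketch recomputes the trellises of Fig. 2 for the Beta(1, 1) prior and ∆thresh = 0.005:

```python
def delta_R(k, m, alpha, beta):
    """Bayes risk reduction from one more trial at trellis node (k, m), Eq. (5)."""
    s = alpha + beta + m
    return (alpha + k) * (beta + m - k) / (s ** 2 * (s + 1) ** 2)

def continuation_trellis(alpha, beta, thresh, depth):
    """q_{k,m} per Eq. (6): continue iff the risk reduction meets the threshold."""
    return [[1 if delta_R(k, m, alpha, beta) >= thresh else 0
             for k in range(m + 1)] for m in range(depth + 1)]

q = continuation_trellis(alpha=1, beta=1, thresh=0.005, depth=5)
print(round(delta_R(0, 0, 1, 1), 5))  # 0.02778, the root value in Fig. 2(a)
for row in q:
    print(row)  # rows match Fig. 2(b), e.g. m = 3 gives [0, 1, 1, 0]
```

The rows reproduce the symmetry argument above: for the uniform prior, continuation survives longest at the center of each row, where the posterior is most symmetric.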

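The trade-off between E[T] and Bayes MSE can also be checked by Monte Carlo simulation. The sketch below draws p from the uniform prior, runs the deterministic threshold rule, and reports the mean number of trials and the resulting MSE of the posterior-mean estimate. The threshold 5 × 10⁻⁵ is an arbitrary illustrative choice and does not correspond to the n = 123 budget of Fig. 3.

```python
import random

def delta_R(k, m, alpha, beta):
    """Bayes risk reduction from one more trial, Eq. (5)."""
    s = alpha + beta + m
    return (alpha + k) * (beta + m - k) / (s ** 2 * (s + 1) ** 2)

def run_adaptive(p, alpha, beta, thresh, rng):
    """Observe trials while Eq. (6) says to continue; return (MMSE estimate, trials)."""
    k = m = 0
    while delta_R(k, m, alpha, beta) >= thresh:
        k += rng.random() < p   # one Bernoulli(p) trial
        m += 1
    return (alpha + k) / (alpha + beta + m), m  # posterior mean, trials used

rng = random.Random(1)
alpha = beta = 1.0              # uniform prior, as in Figs. 2 and 3
thresh = 5e-5                   # assumed value; smaller thresholds mean larger E[T]
reps = 20_000
sq_err = trials = 0.0
for _ in range(reps):
    p = rng.random()            # draw p from the uniform prior
    est, m = run_adaptive(p, alpha, beta, thresh, rng)
    sq_err += (est - p) ** 2
    trials += m
print(trials / reps)            # empirical mean number of trials E[T]
print(sq_err / reps)            # empirical Bayes MSE of the adaptive scheme
```

Sweeping `thresh` traces out the budget/MSE trade-off described by the Lagrangian argument above.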
3. ARRAYS OF BERNOULLI PROCESSES

Active imaging systems typically raster scan the scene by probing patch (i, j), i = 1, ..., Ni and j = 1, ..., Nj, using pulsed illumination. The measured data – used to form an image of the scene – are arrays [k_{i,j}] and [m_{i,j}]; i.e., the number of detections (successes) and number of illumination pulses (trials) for each scene patch. Note that the conventional approach of a fixed number of trials makes m_{i,j} = n for all (i, j) and {k_{i,j}} random, whereas both {k_{i,j}} and {m_{i,j}} are random when the proposed approach is applied pixelwise.

The Bernoulli process generated by probing an individual scene patch (i, j) is typically correlated with the processes of neighboring patches. This can be exploited in the image formation stage through mechanisms inspired by any of various image compression or denoising methods. For this initial demonstration of adaptive acquisition, we apply total variation (TV) regularization. An alternative approach is to attempt to exploit the spatial correlation at the data acquisition stage. This could involve updating the beta distributions not only for the probed pixel itself, but also for pixels within some neighborhood. The interaction of the adaptive acquisition with regularized estimation is nontrivial and not yet well understood. Therefore, we will defer a study of this approach to future works and focus on the merits of a pixelwise-adaptive data acquisition scheme with TV-regularized reconstructions in the subsequent section.

4. IMAGE ESTIMATION METHODS AND RESULTS

We present simulation results to quantify the performance of the proposed method using MATLAB's built-in Shepp–Logan phantom with 100×100 pixels. The pixel values p_{i,j}, rescaled to fall between (0, 1), are used to simulate the underlying Bernoulli process for each pixel to obtain k_{i,j} and m_{i,j}. We focus here on comparing the conventional fixed number of trials against the data-adaptive stopping rule.

4.1. Stopping rule without TV regularization

We consider pixelwise MMSE estimation using Beta(α, β) priors without regularization: p̂_MMSE[i, j] = (k_{i,j} + α)/(m_{i,j} + α + β). For one choice of prior, Figure 4(a) shows MSE improvements of at least 2 dB for the data-adaptive stopping rule over the conventional fixed number of trials for the entire range of simulated trial budgets n. For a fixed budget n = 83, Figure 4(b) suggests that significant MSE improvements can be gained when the assumed beta prior becomes more asymmetric (increasing β).

[Fig. 4. MSE of pixelwise MMSE estimates for the Shepp–Logan phantom image scaled to [0.001, 0.201], for the proposed and fixed-trials methods: (a) MSE [dB] vs. budget n for a Beta(2, 80) prior; (b) MSE [dB] vs. β for n = 83 and a Beta(2, β) prior.]

4.2. Stopping rule with TV-regularized image formation

Pixelwise, the ML estimate would be p̂_ML[i, j] = k_{i,j}/m_{i,j}. Reconstruction quality can be improved through the use of TV-regularized ML estimation [2, 22]. In one typical experimental trial shown in Figure 5, the TV-regularized reconstruction from data obtained with the adaptive stopping rule outperforms the conventional method with a 4.61 dB improvement in MSE.

[Fig. 5. Reconstructions with a fixed number of trials (left, MSE = 3.4056 × 10⁻⁵) and with optimal stopping (right, MSE = 1.1779 × 10⁻⁵), showing an MSE improvement of 4.61 dB. The phantom image is rescaled to [0.001, 0.101] and the Beta(2, 152) prior is assumed. Both images are formed using TV-regularized ML estimation and trial budget n = 200.]

The average MSE improvement is studied by performing 100 independent experiments with TV-regularized image reconstruction when the data is acquired using the binomial (fixed) and proposed stopping rules. As summarized in Table 1, we observe an overall improvement in reconstruction MSE when adaptive acquisition is used. Interestingly, the improvement factor remains constant at 4.26 dB for both trial budgets.

Table 1. The average reconstruction MSE for the fixed number of trials (binomial) stopping rule and the adaptive stopping rule, for two different trial budgets n.

    Budget   | Binomial + TV | Adaptive (proposed) + TV
    n = 58   | 9.14e-05      | 3.43e-05
    n = 196  | 3.37e-05      | 1.26e-05

5. CONCLUSION

We established an optimal stopping framework for estimating Bernoulli parameters that improves the trade-off between MSE and mean number of trials, especially when the prior for the parameter is highly asymmetric. Application of the framework in TV-regularized image estimation was shown to provide significant MSE improvement in active imaging.

6. REFERENCES

[1] T. L. Fine, Probability and Probabilistic Reasoning for Electrical Engineering. Upper Saddle River, NJ: Pearson Prentice Hall, 2006.

[2] D. Shin, A. Kirmani, V. K. Goyal, and J. H. Shapiro, "Photon-efficient computational 3D and reflectivity imaging with single-photon detectors," IEEE Trans. Comput. Imaging, vol. 1, pp. 112–125, June 2015.

[3] M. Wahl, "Time-correlated single photon counting (TCSPC)," tech. rep., PicoQuant, Berlin, Germany, 2014.

[4] F. J. Anscombe, "Sequential estimation," J. Roy. Statist. Soc. Ser. B, vol. 15, no. 1, pp. 1–29, 1953.

[5] J. B. S. Haldane, "On a method of estimating frequencies," Biometrika, vol. 33, pp. 222–225, Nov. 1945.

[6] M. C. K. Tweedie, "Inverse statistical variates," Nature, vol. 155, p. 453, Apr. 14, 1945.

[7] P. Cabilio and H. Robbins, "Sequential estimation of p with squared relative error loss," Proc. Nat. Acad. Sci. USA, vol. 72, pp. 191–193, Jan. 1975.

[8] P. Cabilio, "Sequential estimation in Bernoulli trials," Ann. Statist., vol. 5, pp. 342–356, Mar. 1977.

[9] S. L. Hubert and R. Pyke, "Sequential estimation of functions of p for Bernoulli trials," in Game Theory, Optimal Stopping, Probability and Statistics, vol. 35 of Lecture Notes–Monograph Series, pp. 263–294, Institute of Mathematical Statistics, 2000.

[10] P. M. Djurić and Y. Huang, "Estimation of a Bernoulli parameter p from imperfect trials," IEEE Signal Process. Lett., vol. 7, pp. 160–163, June 2000.

[11] D. Ciuonzo, A. De Maio, and P. Salvo Rossi, "A systematic framework for composite hypothesis testing of independent Bernoulli trials," IEEE Signal Process. Lett., vol. 22, pp. 1249–1253, Sept. 2015.

[12] A. Kirmani, D. Venkatraman, D. Shin, A. Colaço, F. N. C. Wong, J. H. Shapiro, and V. K. Goyal, "First-photon imaging," Science, vol. 343, no. 6166, pp. 58–61, 2014.

[13] N. J. Krichel, A. McCarthy, and G. S. Buller, "Resolving range ambiguity in a photon counting depth imager operating at kilometer distances," Opt. Express, vol. 18, no. 9, pp. 9192–9206, 2010.

[14] P. A. Morris, R. S. Aspden, J. E. C. Bell, R. W. Boyd, and M. J. Padgett, "Imaging with a small number of photons," Nat. Commun., vol. 6, Jan. 5, 2015. doi: 10.1038/ncomms6913.

[15] D. Shin, J. H. Shapiro, and V. K. Goyal, "Single-photon depth imaging using a union-of-subspaces model," IEEE Signal Process. Lett., vol. 22, pp. 2254–2258, Dec. 2015.

[16] Y. Altmann, X. Ren, A. McCarthy, G. S. Buller, and S. McLaughlin, "Lidar waveform-based analysis of depth images constructed using sparse single-photon data," IEEE Trans. Image Process., vol. 25, pp. 1935–1946, May 2016.

[17] D. Shin, F. Xu, D. Venkatraman, R. Lussana, F. Villa, F. Zappa, V. K. Goyal, F. N. C. Wong, and J. H. Shapiro, "Photon-efficient imaging with a single-photon camera," Nat. Commun., vol. 7, June 24, 2016. doi: 10.1038/ncomms12046.

[18] D. Shin, J. H. Shapiro, and V. K. Goyal, "Performance analysis of low-flux least-squares single-pixel imaging," IEEE Signal Process. Lett., vol. 23, pp. 1756–1760, Dec. 2016.

[19] L. Mertens, M. Sonnleitner, J. Leach, M. Agnew, and M. J. Padgett, "Image reconstruction from photon sparse data," Sci. Rep., vol. 7, Feb. 7, 2017. doi: 10.1038/srep42164.

[20] J. Rapp and V. K. Goyal, "A few photons among many: Unmixing signal and noise for photon-efficient active imaging," IEEE Trans. Comput. Imaging, vol. 3, pp. 445–459, Sept. 2017.

[21] Y. Altmann, R. Aspden, M. Padgett, and S. McLaughlin, "A Bayesian approach to denoising of single-photon binary images," IEEE Trans. Comput. Imaging, vol. 3, pp. 460–471, Sept. 2017.

[22] C. Louchet and L. Moisan, "Total variation denoising using posterior expectation," in Proc. 16th European Signal Process. Conf., pp. 1–5, Aug. 2008.
