Magnetic Resonance in Medicine 000:000–000 (2011)
The Bias/Variance Trade-Off When Estimating the MR Signal Magnitude From the Complex Average of Repeated Measurements
M. Dylan Tisdall,1,2* Richard A. Lockhart,3 and M. Stella Atkins4
1Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, USA
2Department of Radiology, Harvard Medical School, Brookline, Massachusetts, USA
3Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
4School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
Grant sponsor: Natural Sciences and Engineering Research Council of Canada
*Correspondence to: M. Dylan Tisdall, Ph.D., Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, 149 13th Street, Charlestown, MA 02129. E-mail: [email protected]
Received 18 October 2010; revised 25 January 2011; accepted 15 February 2011.
DOI 10.1002/mrm.22910
Published online in Wiley Online Library (wileyonlinelibrary.com).

The signal-dependent bias of MR images has been considered a hindrance to visual interpretation almost since the beginning of clinical MRI. Over time, a variety of procedures have been suggested to produce less-biased images from the complex average of repeated measurements. In this work, we re-evaluate these approaches using first a survey of previous estimators in the MRI literature, then a survey of the methods statisticians employ for our specific problem. Our conclusions are substantially different from much of the previous work: first, removing bias completely is impossible if we demand the estimator have bounded variance; second, reducing bias may not be beneficial to image quality. Magn Reson Med 000:000–000, 2011. © 2011 Wiley-Liss, Inc.

Key words: signal-to-noise ratio; image bias; maximum likelihood estimate; denoising; nuisance parameters

INTRODUCTION

MRI magnitude images are constructed from noisy measurements of some presumed "true" signal magnitude at each voxel in the subject. An estimator is a function which takes the noisy measurement as its arguments and returns an estimate of the "true" value. In the context of MRI, we use an estimator function to estimate the values of the "true" signal magnitude at each voxel and then render an image of these estimates. The quality of the estimator function used is then of great importance in determining the quality of the resulting image.

An estimator is biased if the "true magnitude" s is, on average, mapped to an estimated value E[ŝ] ≠ s; the bias is then b_ŝ = E[ŝ] − s. An estimator is said to have a signal-dependent bias if b_ŝ varies with s. In the context of MRI images, this could, for example, reduce contrast by overestimating the magnitude in low-magnitude regions. This is, in fact, a well-known issue: interest in the signal-dependent bias of MRI magnitude images seems to have first appeared, along with the earliest attempt to compensate for it, in work by Henkelman (1,2). However, the recognition that the magnitude of a quadrature MRI measurement has a Rician distribution, and this distribution's link to the signal processing literature, was first noted by Bernstein et al. (3), who proposed that correcting the Rician's bias would improve image contrast and in turn improve feature detectability. Several authors have since suggested techniques for correcting or reducing the bias of a Rician-distributed MRI magnitude measurement (4–10), in addition to related previous work on this problem in other disciplines (11–14).

The problem of bias correction can be thought of in statistical terms as the construction of an estimator function taking the measured data as its arguments and giving an estimate of the magnitude, where the function's bias varies slowly—ideally not at all—as the "true" signal varies. It is important to note that these estimators are only tasked with correcting for the measurement contribution due to thermal noise. Artifacts (e.g., ghosts) are not considered noise for the purpose of these estimators but are lumped in as part of the "true" signal. Although ghosts and other artifacts almost always have a structured, systematic influence on the measurements, the thermal noise is a stochastic contribution to the measurement and is thus particularly amenable to statistical estimation methods.

Several related estimation problems have arisen in the MRI literature from this basic setup, so we will begin by surveying them and carefully specifying the estimation problem that we are interested in for the remainder of this work. Our chosen estimation problem has been studied in the MRI literature, but we make three contributions here. First, we survey the previous work and contextualize it with the suggested methods from statistical estimation theory, along the way introducing a novel estimator [the mean Bayesian (MB) estimator]. Second, we introduce a novel bound on estimator performance for our chosen problem with Eqs. 32 and 33, and their extensions in the Appendix. This bound imposes a trade-off between bias and variance for any possible estimator in our problem, and any improvement beyond the bound must come from the inclusion of prior information (e.g., smoothness assumptions). Third, we analyze the estimators from the previous literature, and the new estimator we present, relative to this bound.
Before we describe the estimation problem under study, we need a model of the thermal noise. All of the estimators we are interested in rely on a common probabilistic model: measurements in quadrature MRI are assumed to consist of two independent channels, each corrupted with the addition of zero-mean gaussian noise (1,6,15–17):

p(a, b; s, \phi, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{(a - s\cos\phi)^2 + (b - s\sin\phi)^2}{2\sigma^2}\right).   [1]

In our notation here, a and b are the random variables for the measurements in the real and imaginary channels respectively, s and φ are the true signal magnitude and phase (where "true" phase is due, for example, to B0 inhomogeneity), and σ is the standard deviation of the thermal noise. If we represent our measurements in polar coordinates, we have

p(r, \theta; s, \phi, \sigma) = \frac{r}{2\pi\sigma^2} \exp\left(-\frac{s^2 + r^2 - 2 s r \cos(\theta - \phi)}{2\sigma^2}\right),   [2]

where r and θ are the random variables for the measured signal magnitude and phase.
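As a concrete illustration of this noise model, the following sketch (our own illustration, not code from the article; it assumes NumPy and arbitrary example parameter values) draws repeated quadrature measurements according to Eq. 1 and shows the polar form of Eq. 2. The magnitude of each complex sample then follows the Rician behavior discussed later in this section.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical "true" parameters for one voxel: magnitude s, phase phi,
    # thermal-noise standard deviation sigma, and n repeated measurements.
    s, phi, sigma, n = 2.0, 0.7, 1.0, 10000

    # Real and imaginary channels, each with independent zero-mean gaussian noise (Eq. 1).
    a = s * np.cos(phi) + rng.normal(0.0, sigma, n)
    b = s * np.sin(phi) + rng.normal(0.0, sigma, n)

    # Polar form of each measurement (Eq. 2).
    r = np.hypot(a, b)
    theta = np.arctan2(b, a)

    # The sample mean of r overshoots the true s at this SNR, illustrating
    # the signal-dependent bias discussed in the Introduction.
    print(r.mean(), s)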
A family of similar-looking estimation problems can be generated from this model depending on what assumptions are made about the signal. In the following, we assume that s and φ are always unknown and σ is always known.

1. r and θ are measured one or more times at each voxel. We assume that s, φ, and σ are constant for all measurements at each voxel. We want to estimate the unknown s at each voxel and ignore the unknown φ. This form of the problem has been addressed by several groups (1,4–6,9). Under the additional assumption that the unknown φ is slowly varying spatially, this problem has been addressed by several groups (3,18–20); with a spatial local-smoothness constraint on s instead, this problem has been addressed in (21,22).

2. r is measured one or more times at each voxel (θ is not measured). We assume that s and σ are constant for all measurements at each voxel, as we allow φ to vary between measurements. Estimation of the unknown s while ignoring the unknown φ has been addressed by Sijbers et al. (8,9).

There are many more permutations of this problem structure; others are addressed in (6,9,10,23–25).

In this work, we are interested solely in the problem of estimating the true signal magnitude from single-channel quadrature MRI measurements without assuming spatial smoothness—where only measurements from a single voxel are used in the estimation of the true signal magnitude at that voxel. We assume that we have taken one or more measurements of the pair (r, θ) for our voxel of interest, and that the unknown s, φ, and σ are constant for all measurements of our voxel. Our goal is to estimate s while ignoring φ and σ. We will represent our measurements via the measurement lists r = [r_0, ..., r_{n-1}] and θ = [θ_0, ..., θ_{n-1}], or alternatively a = [a_0, ..., a_{n-1}] and b = [b_0, ..., b_{n-1}].

Additionally, although s and φ vary between voxels, we note that, for a given volume, σ is fixed for all measurements of all voxels, as the thermal noise results from processes unrelated to the sampling location in k-space (15). This fact makes σ particularly easy to estimate. Any region of homogeneous true signal magnitude (usually a region of air where it is known that s = 0) can be used to produce an effective estimate of σ. Alternatively, a series of noise-only measurements can be quickly generated with the subject in the scanner by acquiring data without an excitation pulse or gradients. Based on this, many of the previous estimators have assumed that σ is effectively known, and remove it from their models' parameters, treating it as a fixed, known value (1,4–6,8,9). In the remainder of this article, we will similarly treat σ as known.

We will use the notation ŝ(r, θ) to represent an estimator for the unknown s at a voxel where the n-length list (r, θ) of measurements was acquired. Two estimators are most commonly used for our problem. First,

\hat{s}_{\mathrm{Mag}}(r, \theta) = \frac{1}{n}\sqrt{\left(\sum_{i=0}^{n-1} r_i \cos\theta_i\right)^2 + \left(\sum_{i=0}^{n-1} r_i \sin\theta_i\right)^2},   [3]

is commonly called a "magnitude image." This estimator averages the complex measurements and subsequently takes the magnitude of the average. An alternative formulation that also goes by the name "magnitude image" is given by first taking the magnitude of each measurement and then combining via sum-of-squares:

\hat{s}_{\mathrm{MagAlt}}(r) = \sqrt{\frac{1}{n}\sum_{i=0}^{n-1} r_i^2}.   [4]

It is important to note that these two estimators will not generate the same image and that the resulting "magnitude images" will have different noise distributions in general. Under the assumptions we have given for our problem, ŝ_Mag has a Rician distribution while ŝ_MagAlt has a noncentral-χ distribution with 2n degrees of freedom—these two estimators are therefore the same only when n = 1.
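The distinction between Eqs. 3 and 4 is easy to demonstrate numerically. The short sketch below (again our own illustration under the same simulation assumptions as the previous snippet; the function names are ours) evaluates both "magnitude image" estimators on one voxel's repeated measurements. The two agree only when n = 1.

    import numpy as np

    def s_mag(r, theta):
        # Eq. 3: average the complex measurements, then take the magnitude.
        z = np.mean(r * np.exp(1j * theta))
        return np.abs(z)

    def s_mag_alt(r):
        # Eq. 4: take each magnitude first, then combine via sum-of-squares.
        return np.sqrt(np.mean(r ** 2))

    rng = np.random.default_rng(1)
    s, phi, sigma, n = 2.0, 0.7, 1.0, 8
    a = s * np.cos(phi) + rng.normal(0.0, sigma, n)
    b = s * np.sin(phi) + rng.normal(0.0, sigma, n)
    r, theta = np.hypot(a, b), np.arctan2(b, a)

    print(s_mag(r, theta), s_mag_alt(r))  # these differ for n > 1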
Most of the attention on bias-correction in previous MRI literature has been focused on the Rician distribution (i.e., images generated via ŝ_Mag or where n = 1). Henkelman suggested that the unbiased signal could be estimated by setting the expected-value function for the Rician distribution equal to the measured magnitude and solving for s (1,2). Using the notation E[z] to represent the expected value of the random variable z, we can write this as

E[\hat{s}_{\mathrm{Mag}}] = \hat{s}_{\mathrm{Mag}}.   [5]

In statistical terminology, this is the simplest form of the method of moments. McGibney and Smith (4) and Miller and Joseph (5) independently also used the method of moments, but noted that the Rician's second moment,

E[(\hat{s}_{\mathrm{Mag}})^2] = s^2 + 2\sigma^2 = (\hat{s}_{\mathrm{Mag}})^2,   [6]

permits a simpler method of moments estimator:

\hat{s}_{\mathrm{Mom2}}(r, \theta) = \sqrt{(\hat{s}_{\mathrm{Mag}}(r, \theta))^2 - 2\sigma^2}.   [7]

This method has also previously been employed for the same statistical problem as it arises in other disciplines [e.g., (11,13,14)].

One complaint that has been leveled against ŝ_Mom2 is that it can produce imaginary-valued estimates for the real-valued s (8). We have a different interpretation, suggested by the estimation of s² in (11). We note that, as (ŝ_Mag)² − 2σ² is an unbiased estimator of s², when s ≈ 0 this quantity must sometimes take negative values. However, as we know that s² must be positive, there is another estimator that is guaranteed to have equal or smaller error: max[0, (ŝ_Mag)² − 2σ²], although it will now be biased. Taking the square root then gives us

\{\max[0, (\hat{s}_{\mathrm{Mag}})^2 - 2\sigma^2]\}^{1/2} = \mathrm{Re}\{[(\hat{s}_{\mathrm{Mag}})^2 - 2\sigma^2]^{1/2}\}.   [8]

Thus, by simply taking the real part of ŝ_Mom2 (likely what the original authors intended), we are basing our result on an improved estimator of s² and producing a valid estimator for s.

A related estimator has been suggested by Gudbjartsson and Patz,

\hat{s}_{\mathrm{GP}}(r, \theta) = \sqrt{\left|(\hat{s}_{\mathrm{Mag}}(r, \theta))^2 - \sigma^2\right|},   [9]

which is proposed to have a more gaussian-like distribution (6).
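A minimal sketch of the two moment-based corrections (our own illustration, assuming NumPy, a known σ, and arbitrary example values) makes the real-part interpretation of Eq. 8 explicit:

    import numpy as np

    def s_mom2_real(s_mag, sigma):
        # Eqs. 7-8: method-of-moments estimate, clipped at zero before the
        # square root so that the result is always real-valued.
        return np.sqrt(np.maximum(0.0, s_mag ** 2 - 2.0 * sigma ** 2))

    def s_gp(s_mag, sigma):
        # Eq. 9: Gudbjartsson-Patz correction.
        return np.sqrt(np.abs(s_mag ** 2 - sigma ** 2))

    print(s_mom2_real(1.2, 1.0), s_gp(1.2, 1.0))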
Sijbers et al. have considered several applications of the maximum likelihood method for deriving estimators in the context of MRI (8,9,26). Their conclusions, as relevant to our problem, can be summarized as: when given a pair of lists r and θ of quadrature MRI measurements from a single voxel meeting our stated assumptions, we should take ŝ_Mag as the estimate of s (9). Additional previous work on this problem has extended estimators to multiple channels (10), but for simplicity we will leave the channel-combination problem aside here.

In the "Theory" section, we consider the major methods that statisticians have suggested for deriving information-maximizing estimators (i.e., estimators that seek to get the most information about the true MRI signal magnitude based on the available measurements). Although maximizing the use of available information is theoretically appealing, it is also possible, as suggested by Bernstein et al. (3), that the signal-dependent, and thus spatially varying, bias is a significant perceptual issue in radiologists' interpretation of MRI magnitude images. With this in mind, we continue the "Theory" section by studying the properties of estimators that seek explicitly to reduce bias. As part of this, we show that there is an unavoidable trade-off between bias and variance when estimating the true MR signal magnitude.

In the "Computing Estimator Metrics" section, we quantify all the previous estimators' performance relative to this trade-off bound. In addition to quantifying mean squared error and bias [some of which we previously computed with a different method in (27)], we attempt to illuminate the bias/variance trade-off between these estimators via comparison with computed bounds on best-case bias or variance (28). The results of these comparisons are presented in the "Results" section. In the "Discussion" section, we summarize our conclusions generally and suggest measures other than bias that we believe are worth considering when evaluating image estimators.

THEORY

Choice of Distribution

Our approach to the problem of estimating the magnitude of the true MRI signal begins by selecting the distribution which we will use as the basis for our estimators. We begin by collapsing the probability distribution for our multiple measurements (also called multiple excitations) into one distribution. We do this using the concept of a sufficient statistic. Informally, if we have a probability distribution p(x, y; P), with x and y being measurements and P being a parameter, the derived statistic f(x, y) is sufficient for P if knowing just f(x, y) tells us as much about P as knowing both x and y [for further discussion of sufficiency see (29,30)].

For our problem, it is known that the optimal sufficient statistic is the complex average of the measurements (30). Thus, we define our sufficient statistics to be A = (1/n) Σ_{i=0}^{n-1} a_i and B = (1/n) Σ_{i=0}^{n-1} b_i, or equivalently R = √(A² + B²) and Θ = arg(A, B), where arg is the argument function (e.g., atan2 in the C programming language) with range from −π to π. The distribution of A and B is then also binormal, but with an appropriately scaled standard deviation,

p(A, B; s, \phi) = \frac{n}{2\pi\sigma^2} \exp\left(-\frac{n[(A - s\cos\phi)^2 + (B - s\sin\phi)^2]}{2\sigma^2}\right),   [10]

or in polar coordinates

p(R, \Theta; s, \phi) = \frac{R n}{2\pi\sigma^2} \exp\left(-\frac{n[s^2 + R^2 - 2 s R \cos(\Theta - \phi)]}{2\sigma^2}\right).   [11]

If we define s' = s√n/σ, A' = A√n/σ, and B' = B√n/σ, we can write

p(A', B'; s', \phi) = \frac{1}{2\pi} \exp\left(-\frac{(A' - s'\cos\phi)^2 + (B' - s'\sin\phi)^2}{2}\right),   [12]

and with R' = R√n/σ we have

p(R', \Theta; s', \phi) = \frac{R'}{2\pi} \exp\left(-\frac{s'^2 + R'^2 - 2 s' R' \cos(\Theta - \phi)}{2}\right).   [13]

This equation makes clear that the quantities marked with a prime do not depend on any of the fixed parameters, and so their estimation is the same regardless of the number of measurements or the standard deviation of the noise. For the remainder of this work we will operate on the standardized sufficient statistics A', B', R', and Θ and their standardized binormal distributions in terms of s' and φ. It is useful to note that ŝ'_Mag = R'. More concretely, the remainder of this work will address the question of whether ŝ'_Mag can be improved by selecting a different function applied to the complex average of the measurements.
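In practice, collapsing a voxel's n complex measurements into the standardized sufficient statistic is a one-liner. The sketch below is our own illustration (assuming NumPy and a known σ; the helper name is ours) of the quantities defined in Eqs. 12 and 13:

    import numpy as np

    def standardized_statistics(a, b, sigma):
        # Complex average of the n measurements (the sufficient statistic).
        A, B = np.mean(a), np.mean(b)
        n = len(a)
        # Standardization of Eqs. 12-13: scale by sqrt(n)/sigma.
        R_prime = np.hypot(A, B) * np.sqrt(n) / sigma
        Theta = np.arctan2(B, A)
        return R_prime, Theta

The returned R_prime is also the standardized "magnitude image" value ŝ'_Mag, which the remaining estimators take as their input.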
Information Maximizing Estimators

The majority of statistical prescriptions for deriving estimators proceed from the idea that the measurements contain information about the unknown parameters. Estimators are then suggested that optimally use the available information in the measurements to calculate an estimate of the desired parameters. However, there are several theories in the statistics literature regarding how the informational relationship between the measurements and the unknown parameters should be turned into an estimator. We will consider several approaches from the statistical theory and derive the estimators that they consider optimal.

Maximum Likelihood Estimator

The basic mechanism of the maximum likelihood estimator (MLE) is to view the distribution as a function of the parameters with the measurements fixed (viewed this way, a distribution is renamed a likelihood), and maximize the resulting function. It is often the case that taking the logarithm of the likelihood function makes taking derivatives easier while having no effect on the location of maxima, and so the maximum log-likelihood is often substituted. In our problem, taking derivatives of the log-likelihood with respect to s' and φ and using our sufficient statistics as our input data gives

\frac{\partial \ell(s', \phi; A', B')}{\partial s'} = A' \cos\phi + B' \sin\phi - s',   [14]

\frac{\partial \ell(s', \phi; A', B')}{\partial \phi} = s'(B' \cos\phi - A' \sin\phi),   [15]

where ℓ(s', φ; A', B') = log p(A', B'; s', φ). Setting these equal to zero gives the maximum likelihood (ML) estimator, previously noted by Sijbers and den Dekker (9),

(\hat{s}'_{\mathrm{ML}}, \hat{\phi}_{\mathrm{ML}}) = \left(\sqrt{A'^2 + B'^2}, \arg(A', B')\right) = (R', \Theta).   [16]

As our model has two parameters, the maximum likelihood estimator of our model is necessarily a pair of values. If we wish to take just the portion representing the estimate of s' alone, then we refer to φ as a nuisance parameter. Simply dropping the nuisance parameter in this way gives an estimator called the maximum profile likelihood estimate. More formally, it is justified by replacing the likelihood by the profile likelihood, where φ̂_ML is substituted for φ, giving (29)

\frac{d \ell_P(s'; A', B', \hat{\phi}_{\mathrm{ML}})}{d s'} = \sqrt{A'^2 + B'^2} - s',   [17]

where we use ℓ_P to designate the log of the profile likelihood and move φ̂_ML to the list of known parameters as necessitated by the profiling. Setting this equal to zero gives the maximum profile likelihood estimate

\hat{s}'_P(R') = R'.   [18]

We can see from this that the maximum profile likelihood estimate of the true signal magnitude is the same as ŝ'_Mag.
Information-Based Nuisance Parameter Removal

Although profiling to remove nuisance parameters from the maximum likelihood estimator is a well-known practice, it is essentially ad hoc, and several correction factors have been suggested by statisticians in attempts to improve the performance, in particular the bias, of the resulting estimators. We will briefly consider two famous examples of these corrections.

Although we omit details of the derivation here, the Bartlett-corrected profile likelihood is produced by introducing a correction factor ∆ to the derivative of the profile likelihood function (31). We have previously shown (27) that for our problem, the Bartlett correction is given by

\Delta = -\frac{1}{2 s'^2} E\left[\frac{\partial}{\partial s'} \frac{\partial^2}{\partial \phi^2} \ell(s', \phi; A', B')\right] = \frac{1}{2 s'}.   [19]

Applying this correction gives us the equation

\frac{d \ell_P(s'; A', B')}{d s'} - \Delta = R' - s' - \frac{1}{2 s'} = 0,   [20]

whose solution defines the Bartlett-corrected profile likelihood estimator for our problem,

\hat{s}'_{\mathrm{Corr}}(R') = \frac{R' + \sqrt{R'^2 - 2}}{2}.   [21]

We note that this estimator can produce complex-valued estimates of the real-valued s' when R' < √2. We also note that this estimator is simply the average of ŝ_Mag and ŝ_Mom2, and so we can use the same approach as with ŝ_Mom2 to force our estimates to be real-valued—in this case, the logic implies that when R' < √2, the estimator becomes ŝ'_Corr = R'/2.

ŝ_Corr also results from application of another famous method of correction: the stably adjusted profile likelihood estimator proposes multiplying the profile likelihood by a factor M(s') designed to approximate the likelihood function that would result if φ could be factored out of the probability density function (29,32). It can be shown that, for our problem, the estimator that results from maximizing the stably adjusted profile likelihood is identical to the one produced via Bartlett correction, and so we will refer to the estimator in Eq. 21 simply as the maximum corrected profile likelihood estimator for the remainder of this article.
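In the standardized coordinates, the corrected profile likelihood estimator of Eq. 21, with the real-valued branch described above, reduces to a few lines. This is a sketch of our own (assuming NumPy; the branch point at R' = √2 follows the argument in the text):

    import numpy as np

    def s_corr(R_prime):
        # Eq. 21 with the real-valued branch: R'/2 when R' < sqrt(2).
        R = np.asarray(R_prime, dtype=float)
        return np.where(R >= np.sqrt(2.0),
                        0.5 * (R + np.sqrt(np.maximum(R ** 2 - 2.0, 0.0))),
                        0.5 * R)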
Maximum Marginal Likelihood Estimator

Noting that none of the previous estimators based on the profile likelihood involve the phase of the sufficient statistic Θ, it is tempting to discard the phase statistic Θ entirely from the model (called marginalizing the distribution) in Eq. 13, giving the Rician distribution

p(R'; s') = \int_{-\pi}^{\pi} p(R', \Theta; s', \phi) \, d\Theta = R' \exp\left(-\frac{s'^2 + R'^2}{2}\right) I_0(s' R').   [22]

This marginalized model is especially convenient for estimation as it also removes φ from the parameters; it leaves us with a single statistic R' and a single parameter s'. Although it may be convenient, the critical issue is whether basing estimators on this model is correct. It has been a major point of debate in the statistics community exactly when such a marginalization is actually "information preserving," and thus whether we should proceed to use this model [for a survey of this debate, see (33)]. For the present, we leave aside arguments about when this reduction is justified. The maximum marginal likelihood estimator for our problem is the same as the maximum likelihood estimator for the Rician distribution. As noted above, this has been presented in several contexts (8,11,12). We will denote this estimate as ŝ'_Marg. Unfortunately, the estimator does not have a closed-form equation, but Sijbers et al. showed that ŝ'_Marg = 0 is the unique solution when R'² ≤ 2, and there is a single positive solution that can be found numerically when R'² > 2, which allows us to certify that we have found the correct root using an efficient numerical algorithm (8).
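Although ŝ'_Marg has no closed form, the root described above is easy to locate numerically. The sketch below is our own illustration (assuming NumPy and SciPy; the bracketing interval is an assumption that follows from the sign of the score at 0+ and at R'), solving the Rician score equation R' I_1(s'R')/I_0(s'R') − s' = 0:

    import numpy as np
    from scipy.special import i0e, i1e
    from scipy.optimize import brentq

    def s_marg(R_prime):
        # Zero is the unique solution when R'^2 <= 2 (Sijbers et al.).
        if R_prime ** 2 <= 2.0:
            return 0.0
        def score(sp):
            # Derivative of the Rician log-likelihood (Eq. 22); the scaled
            # Bessel ratio i1e/i0e equals I1/I0 but avoids overflow.
            return R_prime * i1e(sp * R_prime) / i0e(sp * R_prime) - sp
        # The score is positive near 0+ and negative at s' = R', so the
        # single positive root is bracketed by (0, R').
        return brentq(score, 1e-12, R_prime)

    print(s_marg(3.0))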
Thus, we chose 0 4 the prior p(φ) = 1/(2π) and Estimators Minimizing Bias 1 s ≥ 0 p(s ) = [23] As an alternative to focusing on maximizing the use of 0 otherwise. information in the measurements, it is often possible to construct estimators that optimize a chosen error metric. The prior p(s ) is clearly not a well-defined probability den- Given the focus in previous work on the detrimental effects sity function, as it does not integrate to unity. However, it is of signal-dependent estimator bias, it seems natural to ask common practice in Bayesian estimation to allow improper if we can construct an estimator of s whose bias does not priors for parameters so that we can express equivalent vary as s varies. Let us call this constant-biased estimator ˆ ˆ probabilities of events over an infinite range, as in our s CB. We will use b ˆ to denote the bias of the estimator s . s case where R is equally likely on the range (0, ∞). Using Importantly, if we can produce a constant-bias estimator these priors and Bayes’ Theorem, we can write the posterior (with bias b ˆ everywhere), then we can produce an unbi- s CB distribution ˆ ased estimator by taking s −b ˆ . Thus, statements about CB s CB the existence of a constant-bias estimator and an unbiased Θ φ φ Θ = p(R , ; s , ) estimator are equivalent. p(s , ; R , ) . [24] p(R , Θ)2π In particular, we will focus on estimators that ignore the phase of the sufficient statistic (i.e., radially symmetric esti- ˆ If we desire to remove φ from the model, we can do this by mators of the form s (R )) as all of the estimators derived simply marginalizing the posterior to remove φ, thus previously for our problem ignore the sufficient statistic’s phase—refer to Eqs. 3, 7, 9, 18, 21, and 29. A lengthier anal- π ysis, which can be found in the Appendix, is required for Θ = φ Θ φ = p(R ; s ) ˆ Θ p(s ; R , ) p(s , ; R , )d . [25] estimators of the form s (R , ), but we will leave this aside −π p(R , Θ)2π here as it is not necessary to understand the bias-variance trade-off for the cited estimators. where p(R ; s ) is the Rician distribution as in Eq. 22. In general, we can study the relationship between an Having derived the posterior distribution for our param- estimator’s bias and variance using the Cramér-Rao bound. eter of interest, we must now decide how to summarize it This a commonly used bound from statistics that links the with a single estimate for s . If we take the maximum of variance of an estimator with the gradient of its bias and the posterior distribution (commonly called the maximum the second-derivative of the log-likelihood function, which a posteriori or MAP estimate), we end up with the same is also known as the Fisher information. As R is Rician- result as the maximum marginalized likelihood estimator; distributed, we are interested in the Fisher information for ˆ = ˆ formally, s MAP s Marg. the Rician distribution, previously shown by Sijbers and As an alternative to the MAP, we could also consider the den Dekker to be (9) mean value of the parameter s specified by this posterior. This choice is attractive because it minimizes the expected I(s ) = Z(s ) − s 2, [30] 6 Tisdall et al. where for v > 1, I(s)−1 = v coincides with one value of s, which ≤ = ∞ 2 2 + 2 we label sv .Ifv 1, we will define sv 0. We also know 3 I1(s x) s x d ≡ − ˆ Z(s ) x exp dx. [31] that the minimum absolute value of ds bs is attained for 0 I0(s x) 2 < s sv by having equality in Eq. 32. 
Thus, we can write ˆ We then note that if an estimator s (R ) has bias gradient > 0 s sv [ ˆ] d d dE s ˆ = b ˆ (s ) = (s )−1, we can write the Cramér-Rao bound bs v − ≤ . [33] ds s ds ds −1 1 s sv for the estimator’s variance (28) I(s ) 2 This equation indicates that, irrespective of the value of v σ2 ≥ + d −1 → + d →− ˆ (s ) 1 b ˆ (s ) I(s ) . [32] chosen, as s 0 we must have b ˆ 1. However, s ds s ds s it also indicates that changes to v affect the√ rate at which d + − ˆ →− → + 1 ds bs 1ass 0 ; the effect is O( v). Thus, if we Studying Eq. 32, we can see that lims →0 I(s ) =∞, d d →− → + ˆ = = want to halve the rate at which b ˆ 1ass 0 , which implies that, if ds bs (s ) 0 for all s 0, we ds s + σ2 we must pay for this with a quadrupling of the bound on cannot put an upper bound on lims →0 ˆ (s ) in Eq. 32. s variance v (a variation of this result is extended to all esti- Thus, there is no estimator for our problem with constant mators sˆ(R, Θ) in the Appendix). We will return to this bias and bounded variance. This is a significant result as bound when we evaluate our estimators in the “Results” it explains why none of the estimators we show above are section. able to achieve an unbiased output: there is in fact no esti- mator of this form that can do so as maintaining a useful bound on variance. Visualization of Estimators Having shown the lack of a useful constant-bias estima- As these estimators are all functions from the real-valued tor, we might naturally ask how much signal-dependent R to a real-valued estimate sˆ, they are easy to visualize as variation in the estimator’s bias we must accept in order to filters by plotting the response (sˆ) as a function of the input bound the estimator’s variance at some chosen v. Consid- (R). In Fig. 1, we have performed such a plot to facilitate d −1 ˆ = ≤ ering [32], we can have ds bs 0asI(s ) v. Then, comparison of estimator behavior. −1 d > ˆ where I(s ) v, we must decrease d s bs proportion- In Fig. 2, we show an illustrative example of the esti- ally to compensate for the growth of I(s)−1.AsI(s)−1 is mators’ output given a synthetic input image corrupted −1 strictly decreasing with lims→∞ I(s ) = 1, we know that with complex gaussian noise. We note that some of the
Visualization of Estimators

As these estimators are all functions from the real-valued R' to a real-valued estimate ŝ', they are easy to visualize as filters by plotting the response (ŝ') as a function of the input (R'). In Fig. 1, we have performed such a plot to facilitate comparison of estimator behavior.

In Fig. 2, we show an illustrative example of the estimators' output given a synthetic input image corrupted with complex gaussian noise. We note that some of the

FIG. 1. Responses of the estimators. Each plot displays the estimate of s' produced by one estimator. The x-axis is the measured value R', and the y-axis is the resulting estimate ŝ'.
FIG. 2. Example of estimator output. Complex zero-mean gaussian noise with σ = 1 was added to the true image, and the resulting magnitude data was used to produce estimated images from the estimators discussed above. The four target circles have integer amplitudes from 1 to 4. The images are scaled independently so that each covers the entire dynamic range available for display.
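Response curves in the style of Fig. 1 can be reproduced by evaluating each estimator over a grid of standardized inputs R'. The following self-contained sketch is our own illustration (assuming NumPy, SciPy, and Matplotlib; it uses the standardized forms with σ = 1, and the grid range and plotting details are arbitrary choices, not taken from the article's figure):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.special import i0e, i1e
    from scipy.optimize import brentq

    def s_corr(R):
        # Eq. 21 with the real-valued branch R'/2 below sqrt(2).
        return np.where(R >= np.sqrt(2.0),
                        0.5 * (R + np.sqrt(np.maximum(R ** 2 - 2.0, 0.0))),
                        0.5 * R)

    def s_marg(R):
        # Maximum marginal (Rician) likelihood estimate, found numerically.
        if R ** 2 <= 2.0:
            return 0.0
        score = lambda sp: R * i1e(sp * R) / i0e(sp * R) - sp
        return brentq(score, 1e-12, R)

    def s_mb(R):
        # Eq. 29 evaluated with the exponentially scaled Bessel function.
        return np.sqrt(2.0 / np.pi) / i0e(R ** 2 / 4.0)

    R = np.linspace(0.0, 5.0, 500)
    curves = {
        "Mag (Eq. 3)": R,
        "Mom2 (Eqs. 7-8)": np.sqrt(np.maximum(R ** 2 - 2.0, 0.0)),
        "GP (Eq. 9)": np.sqrt(np.abs(R ** 2 - 1.0)),
        "Corr (Eq. 21)": s_corr(R),
        "Marg (Eq. 22)": np.array([s_marg(x) for x in R]),
        "MB (Eq. 29)": s_mb(R),
    }
    for label, curve in curves.items():
        plt.plot(R, curve, label=label)
    plt.xlabel("R'")
    plt.ylabel("estimate of s'")
    plt.legend()
    plt.show()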