
arXiv:0804.4391v2 [cs.IT] 27 May 2009

A Lower Bound on the Bayesian MSE Based on the Optimal Bias Function

Zvika Ben-Haim, Student Member, IEEE, and Yonina C. Eldar, Senior Member, IEEE

This work was supported in part by the Israel Science Foundation under Grant no. 1081/07 and by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract no. 216715). The authors are with the Department of Electrical Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel (e-mail: [email protected]; [email protected]).

Abstract—A lower bound on the minimum mean-squared error (MSE) in a Bayesian estimation problem is proposed in this paper. This bound utilizes a well-known connection to the deterministic estimation setting. Using the prior distribution, the bias function which minimizes the Cramér–Rao bound is determined, resulting in a lower bound on the Bayesian MSE. The bound is developed for the general case of a vector parameter with an arbitrary probability distribution, and is shown to be asymptotically tight in both the high and low signal-to-noise ratio (SNR) regimes. A numerical study demonstrates several cases in which the proposed technique is both simpler to compute and tighter than alternative methods.

Index Terms—Bayesian bounds, Bayesian estimation, minimum mean-squared error estimation, optimal bias, performance bounds.

I. INTRODUCTION

The goal of estimation theory is to infer the value of an unknown parameter based on observations. A common approach to this problem is the Bayesian framework, in which the estimate is constructed by combining the measurements with prior information about the parameter [1]. In this setting, the parameter θ is random, and its distribution describes the a priori knowledge of the unknown value. In addition, measurements x are obtained, whose conditional distribution, given θ, provides further information about the parameter. The objective is to construct an estimator θ̂, which is a function of the measurements, so that θ̂ is close to θ in some sense. A common measure of the quality of an estimator is its mean-squared error (MSE), given by E{‖θ − θ̂‖²}.

It is well-known that the posterior mean E{θ|x} is the technique minimizing the MSE. Thus, from a theoretical perspective, there is no difficulty in finding the minimum MSE (MMSE) estimator in any given problem. In practice, however, the complexity of computing the posterior mean is often prohibitive. As a result, various alternatives, such as the maximum a posteriori (MAP) technique, have been developed [2]. The purpose of such suboptimal methods is to approach the performance of the MMSE estimator with a computationally efficient algorithm.

An important goal is to quantify the performance degradation resulting from the use of these suboptimal techniques. One way to do this in practice is to compare the MSE of the technique used with the MMSE. Unfortunately, computation of the MMSE is itself infeasible in many cases. This has led to a large body of work seeking to find simple lower bounds on the MMSE in various estimation problems [3]–[12].

Generally speaking, previous bounds can be divided into two categories. The Weiss–Weinstein family is based on a covariance inequality and includes the Bayesian Cramér–Rao bound [3], the Bobrovski–Zakai bound [8], and the Weiss–Weinstein bound [9], [10]. The Ziv–Zakai family is based on comparing the estimation problem with a related detection scenario. This family includes the Ziv–Zakai bound [4] and its improvements, notably the Bellini–Tartara bound [6], the Chazan–Zakai–Ziv bound [7], and the generalization of Bell et al. [11]. Recently, Renaux et al. [12] have combined both approaches.

The accuracy of the bounds described above is usually tested numerically in particular estimation settings. Few previous results provide any sort of analytical proof of accuracy, even under asymptotic conditions. Bellini and Tartara [6] briefly discuss performance of their bound at high signal-to-noise ratio (SNR), and Bell et al. [11] prove that their bound converges to the true value at low SNR for a particular family of Gaussian-like probability distributions. To the best of our knowledge, there are no other results concerning the asymptotic performance of Bayesian bounds.

A different estimation setting arises when one considers θ as a deterministic unknown parameter. In this case, too, a common goal is to construct an estimator having low MSE. However, the term MSE has a very different meaning in the deterministic setting, since in this case, the expectation is taken only over the random variable x. One elementary difference with far-reaching implications is that in the Bayesian case the MSE is a single real number, whereas the deterministic MSE is a function of the unknown parameter θ [13]–[15].

Many lower bounds have been developed for the deterministic setting as well. These include the classical Cramér–Rao [16], [17], Bhattacharya [18], Hammersley–Chapman–Robbins [19], [20], and Barankin [21] results, as well as more recent bounds [22]–[27]. By far the simplest and most commonly used of these approaches is the Cramér–Rao bound (CRB). Like most other deterministic bounds, the CRB deals explicitly with estimators having a specific, pre-specified bias function or, most commonly, with unbiased estimators. Two exceptions are the uniform CRB [23], [25] and the minimax linear-bias bound [26], [27]. The CRB is known to be asymptotically tight in many cases, even though many later bounds are sharper than it [14], [25], [28].

Although the deterministic and Bayesian settings stem from different points of view, there exist insightful relations between the two approaches. The basis for this connection is the fact that, by adding a prior distribution for θ, any deterministic problem can be transformed to a corresponding Bayesian setting. Several theorems relate the performance of corresponding Bayesian and deterministic scenarios [13]. As a consequence, numerous bounds have both a deterministic and a Bayesian version [3], [10], [12], [29].

The simplicity and asymptotic tightness of the deterministic CRB motivate its use in problems in which θ is random. Such an application was described by Young and Westerberg [5], who considered the case of a scalar θ constrained to the interval [θ₀, θ₁]. They used the prior distribution of θ to determine the optimal bias function for use in the biased CRB, and thus obtained a Bayesian bound. It should be noted that this result differs from the Bayesian CRB of Van Trees [3]; the two bounds are compared in Section II-C. We refer to the result of Young and Westerberg as the optimal-bias bound (OBB), since it is based on choosing the bias function which optimizes the CRB using the given prior distribution.

This paper provides an extension and a deeper analysis of the OBB. Specifically, we generalize the bound to an arbitrary n-dimensional estimation setting [30]. The bound is determined by finding the solution to a certain partial differential equation. Using tools from functional analysis, we demonstrate that a unique solution exists for this differential equation. Under suitable symmetry conditions, it is shown that the method can be reduced to the solution of an ordinary differential equation and, in some cases, presented in closed form.

The mathematical tools employed in this paper are also used for characterizing the performance of the OBB. Specifically, it is demonstrated analytically that the proposed bound is asymptotically tight for both high and low SNR values. Furthermore, the OBB is compared with several other bounds; in the examples considered, the OBB is both simpler computationally and more accurate than all relevant alternatives.

The remainder of this paper is organized as follows. In Section II, we derive the OBB for a vector parameter. Section III discusses some mathematical concepts required to ensure the existence of the OBB. In Section IV, a practical technique for calculating the bound is developed using variational calculus. In Section V, we demonstrate some properties of the OBB, including its asymptotic tightness. Finally, in Section VI, we compare the performance of the bound with that of other relevant techniques.

II. THE OPTIMAL-BIAS BOUND

In this section, we derive the OBB for the general vector case. To this end, we first examine the relation between the Bayesian and deterministic estimation settings (Section II-A). Next, we focus on the deterministic case and review the basic properties of the CRB (Section II-B). Finally, the OBB is derived from the CRB (Section II-C).

The focus of this paper is the Bayesian estimation problem, but the bound we propose stems from the theory of deterministic estimation. To avoid confusion, we will indicate that a particular quantity refers to the deterministic setting by appending the symbol ;θ to it. For example, the notation E{·} denotes expectation over both θ and x, i.e., expectation in the Bayesian sense, while expectation solely over x (in the deterministic setting) is denoted by E{·; θ}. The notation E{·|θ} indicates Bayesian expectation conditioned on θ.

Some further notation used throughout the paper is as follows. Lowercase boldface letters signify vectors and uppercase boldface letters indicate matrices. The ith component of a vector v is denoted v_i, while v⁽¹⁾, v⁽²⁾, … signifies a sequence of vectors. The derivative ∂f/∂v of a function f(v) is a vector function whose ith element is ∂f/∂v_i. Similarly, given a vector function b(θ), the derivative ∂b/∂θ is defined as the matrix function whose (i, j)th entry is ∂b_i/∂θ_j. The squared Euclidean norm vᵀv of a vector v is denoted ‖v‖², while the squared Frobenius norm Tr(MMᵀ) of a matrix M is denoted ‖M‖²_F. In Section III, we will also define some functional norms, which will be of use later in the paper.

A. The Bayesian–Deterministic Connection

We now review a fundamental relation between the Bayesian and deterministic estimation settings. Let θ be an unknown random vector in Rⁿ and let x be a measurement vector. The joint probability density function (pdf) of θ and x is p_{x,θ}(x, θ) = p_{x|θ}(x|θ) p_θ(θ), where p_θ is the prior distribution of θ and p_{x|θ} is the conditional distribution of x given θ. For later use, define the set Θ of feasible parameter values by

Θ = {θ ∈ Rⁿ : p_θ(θ) > 0}.   (1)

Suppose θ̂ = θ̂(x) is an estimator of θ. Its (Bayesian) MSE is given by

MSE = E{‖θ̂ − θ‖²} = ∫∫ ‖θ̂ − θ‖² p_{x,θ}(x, θ) dx dθ.   (2)

By the law of total expectation, we have

MSE = ∫ ( ∫ ‖θ̂ − θ‖² p_{x|θ}(x|θ) dx ) p_θ(θ) dθ = E{ E{‖θ̂ − θ‖² | θ} }.   (3)

Now consider a deterministic estimation setting, i.e., suppose θ is a deterministic unknown which is to be estimated from random measurements x. Let the distribution p_{x;θ} of x (as a function of θ) be given by p_{x;θ}(x; θ) = p_{x|θ}(x|θ), i.e., the distribution of x in the deterministic case equals the conditional distribution in the corresponding Bayesian problem.

The estimator θ̂ defined above is simply a function of the measurements, and can therefore be applied in the deterministic case as well. Its deterministic MSE is given by

E{‖θ̂ − θ‖²; θ} = ∫ ‖θ̂ − θ‖² p_{x;θ}(x; θ) dx.   (4)

Since p_{x;θ}(x; θ) = p_{x|θ}(x|θ), we have

E{‖θ̂ − θ‖²; θ} = E{‖θ̂ − θ‖² | θ}.   (5)

Combining this fact with (3), we find that the Bayesian MSE equals the expectation of the MSE of the corresponding deterministic problem, i.e.,

E{‖θ̂ − θ‖²} = E{ E{‖θ̂ − θ‖²; θ} }.   (6)

This relation will be used to construct the OBB in Section II-C.
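As a quick illustration (ours, not from the paper), the identity (6) can be checked by simulation. The toy model below is an arbitrary choice: scalar θ ~ N(0,1), x|θ ~ N(θ,1), and the (suboptimal) estimator θ̂(x) = x/2, for which the inner deterministic MSE has the closed form (1 + θ²)/4.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 200_000  # Monte Carlo draws

# Toy Bayesian model (our assumption, not from the paper):
# theta ~ N(0,1), x | theta ~ N(theta, 1), estimator theta_hat(x) = x/2.
theta = rng.normal(size=M)
x = theta + rng.normal(size=M)
theta_hat = x / 2

# Left-hand side of (6): Bayesian MSE, one expectation over (theta, x).
lhs = np.mean((theta_hat - theta) ** 2)

# Right-hand side of (6): average the deterministic MSE over the prior;
# here E{(x/2 - theta)^2; theta} = (1 + theta^2)/4 in closed form.
rhs = np.mean((1 + theta**2) / 4)

print(lhs, rhs)  # both approach (1 + E{theta^2})/4 = 0.5
```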

B. The Deterministic Cramér–Rao Bound

Before developing the OBB, we review some basic results in the deterministic estimation setting. Suppose θ is a deterministic parameter vector and let x be a measurement vector having pdf p_{x;θ}(x; θ). Denote by Θ ⊆ Rⁿ the set of all possible values of θ. We assume for technical reasons that Θ is an open set.¹

Let θ̂ be an estimator of θ from the measurements x. We require the following regularity conditions to ensure that the CRB holds [31, §3.1.3].

1) p_{x;θ}(x; θ) is continuously differentiable with respect to θ. This condition is required to ensure the existence of the Fisher information.

2) The Fisher information matrix J(θ), defined by

[J(θ)]_{ij} = E{ (∂ log p_{x;θ}/∂θ_i)(∂ log p_{x;θ}/∂θ_j) ; θ }   (7)

is bounded and positive definite for all θ ∈ Θ. This ensures that the measurements contain information about the unknown parameter.

3) Exchanging the integral and derivative in the equation

∂/∂θ_i ∫ t(x) p_{x;θ}(x; θ) dx = ∫ t(x) ∂p_{x;θ}(x; θ)/∂θ_i dx   (8)

is justified for any measurable function t(x), in the sense that, if one side exists, then the other exists and the two sides are equal. A sufficient condition for this to hold is that the support of p_{x;θ} does not depend on θ.

4) All estimators θ̂ are Borel measurable functions which satisfy

‖ θ̂ ∂p_{x;θ}/∂θᵀ ‖_F ≤ g(x) for all θ   (9)

for some integrable function g(x). This technical requirement is needed in order to exclude certain pathological estimators whose statistical behavior is insufficiently smooth to allow the application of the CRB.

The bias of an estimator θ̂ is defined as

b(θ) = E{θ̂; θ} − θ.   (10)

Under the above assumptions, it can be shown that the bias of any estimator is continuously differentiable [5, Lemma 2]. Furthermore, under these assumptions, the CRB holds, and thus, for any estimator having bias b(θ), we have

E{‖θ − θ̂‖²; θ} ≥ CRB[b, θ] ≜ Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) + ‖b(θ)‖².   (11)

A more common form of the CRB is obtained by restricting attention to unbiased estimators (i.e., techniques for which b(θ) = 0). Under the unbiasedness assumption, the bound simplifies to MSE ≥ Tr(J⁻¹(θ)). However, in the sequel we will make use of the general form (11).

C. A Bayesian Bound from the CRB

The OBB of Young and Westerberg [5] is based on applying the Bayesian–deterministic connection described in Section II-A to the deterministic CRB (11). Specifically, returning now to the Bayesian setting, one can combine (6) and (11) to obtain that, for any estimator θ̂ with bias function b(θ),

E{‖θ − θ̂‖²} ≥ Z[b] ≜ ∫_Θ CRB[b, θ] p_θ(dθ)   (12)

where the expectation is now performed over both θ and x. Note that (12) describes the Bayesian MSE as a function of a deterministic property (the bias) of θ̂. Since any estimator has some bias function, and since all bias functions are continuously differentiable in our setting, minimizing Z[b] over all continuously differentiable functions b yields a lower bound on the MSE of any Bayesian estimator. Thus, under the regularity conditions of Section II-B, a lower bound on the Bayesian MSE is given by

s = inf_{b∈C¹} ∫_Θ [ ‖b(θ)‖² + Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) ] p_θ(dθ)   (13)

where C¹ is the space of continuously differentiable functions f : Θ → Rⁿ.

Note that the OBB differs from the Bayesian CRB of Van Trees [3]. Van Trees' result is based on applying the Cauchy–Schwarz inequality to the joint pdf p_{x,θ}, whereas the deterministic CRB is based on applying a similar procedure to p_{x;θ}. As a consequence, the regularity conditions required for the Bayesian CRB are stricter, requiring that p_{x,θ} be twice differentiable with respect to θ. By contrast, the OBB requires differentiability only of the conditional pdf p_{x|θ}. An example in which this difference is important is the case in which the prior distribution p_θ is discontinuous, e.g., when p_θ is uniform. The performance of the OBB in this setting will be examined in Section VI.

In the next section, we will see that it is advantageous to perform the minimization (13) over a somewhat modified class of functions. This will allow us to prove the unique existence of a solution to the optimization problem, a result which will be of use when examining the properties of the bound later in the paper.

¹This is required in order to ensure that one can discuss differentiability of p_{x;θ} with respect to θ at any point θ ∈ Θ. In the Bayesian setting to which we will return in Section II-C, Θ is defined by (1); in this case, adding a boundary to Θ essentially leaves the setting unchanged, as long as the probability for θ to be on the boundary of Θ is zero. Therefore, this requirement is of little practical relevance.
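For concreteness, the functional Z[b] of (13) can be evaluated by direct quadrature once a candidate bias is chosen. The sketch below is ours, under an assumed scalar toy setting: θ uniform on (−r, r) and J(θ) = 1/σ² (a Gaussian location model). It shows that even a simple linear shrinkage bias beats the unbiased choice b = 0, whose objective equals the averaged CRB σ².

```python
import numpy as np
from scipy.integrate import quad

# Toy setting (our assumption): theta ~ U(-r, r), x = theta + w,
# w ~ N(0, sigma^2), so J(theta) = 1/sigma^2 for all theta.
r, sigma = 1.0, 0.5

def Z(b, b_prime):
    """Evaluate the functional Z[b] of (13) by numerical quadrature."""
    integrand = lambda t: (b(t) ** 2
                           + sigma**2 * (1.0 + b_prime(t)) ** 2) / (2 * r)
    val, _ = quad(integrand, -r, r)
    return val

# Unbiased choice b = 0 recovers the averaged CRB, E{Tr(J^{-1})} = sigma^2.
print(Z(lambda t: 0.0, lambda t: 0.0))        # 0.25

# A linear shrinkage bias b(theta) = -c*theta does strictly better.
c = sigma**2 / (sigma**2 + r**2 / 3)          # minimizer among linear biases
print(Z(lambda t: -c * t, lambda t: -c))      # < sigma^2
```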

III. MATHEMATICAL SAFEGUARDS

In the previous section, we saw that a lower bound on the MMSE can be obtained by solving the minimization problem (13). However, at this point, we have no guarantee that the solution s of (13) is anywhere near the true value of the MMSE. Indeed, at first sight, it may appear that s = 0 for any estimation setting. To see this, note that Z[b] is a sum of two components, a bias gradient part and a squared bias part. Both parts are nonnegative, but the former is zero when the bias gradient is −I, while the latter is zero when the bias is zero. No differentiable function b satisfies these two constraints simultaneously for all θ, since if the squared bias is everywhere zero, then the bias gradient is also zero. However, it is possible to construct a sequence of functions b⁽ⁱ⁾ for which both the bias gradient and the squared bias norm tend to zero for almost every value of θ. An example of such a sequence in a one-dimensional setting is plotted in Fig. 1; a numerical rendition of this construction is sketched after Proposition 1 below. Here, a sequence b⁽ⁱ⁾ of smooth, periodic functions is presented. The function period tends to zero, and the percentage of the cycle in which the derivative equals −1 increases as i increases. Thus, the pointwise limit of the function sequence is zero almost everywhere, and the pointwise limit of the derivative is −1 almost everywhere.

Fig. 1. A sequence of continuous functions for which both |b(θ)|² and |1 + b′(θ)|² tend to zero for almost every value of θ.

In the specific case shown in Fig. 1, it can be shown that the value of Z[b⁽ⁱ⁾] does not tend to zero; in fact, Z[b⁽ⁱ⁾] tends to infinity in this situation. However, our example illustrates that care must be taken when applying concepts from finite-dimensional optimization problems to variational calculus. The purpose of this section is to show that s > 0, so that the bound is meaningful, for any problem setting satisfying the regularity conditions of Section II-B. (This question was not addressed by Young and Westerberg [5].) While doing so, we develop some abstract concepts which will also be used when analyzing the asymptotic properties of the OBB in Section V.

As often happens with variational problems, it turns out that the minimum of (13) is not necessarily achieved by any continuously differentiable function. In order to guarantee an achievable minimum, one must instead minimize (13) over a slightly modified space, which is defined below. As explained in Section II-B, all bias functions are continuously differentiable, so that the minimizing function ultimately obtained, if it is not differentiable, will not be the bias of any estimator. However, as we will see, the minimum value of our new optimization problem is identical to the infimum of (13). Furthermore, this approach allows us to demonstrate several important theoretical properties of the OBB.

Let L² be the space of p_θ-measurable functions b : Θ → Rⁿ such that

∫_Θ ‖b(θ)‖² p_θ(dθ) < ∞.   (14)

Define the associated inner product

⟨b⁽¹⁾, b⁽²⁾⟩_{L²} ≜ Σ_{i=1}^{n} ∫_Θ b_i⁽¹⁾(θ) b_i⁽²⁾(θ) p_θ(dθ)   (15)

and the corresponding norm ‖b‖²_{L²} ≜ ⟨b, b⟩_{L²}. Any function b ∈ L² has a derivative in the distributional sense, but this derivative might not be a function. For example, discontinuous functions have distributional derivatives which contain a Dirac delta. If, for every i, the distributional derivative ∂b_i/∂θ of b is a function in L², then b is said to be weakly differentiable [32], and its weak derivative is the matrix function ∂b/∂θ. Roughly speaking, a function is weakly differentiable if it is continuous and its derivative exists almost everywhere.

The space of all weakly differentiable functions in L² is called the first-order Sobolev space [32], and is denoted H¹. Define an inner product on H¹ as

⟨b⁽¹⁾, b⁽²⁾⟩_{H¹} ≜ ⟨b⁽¹⁾, b⁽²⁾⟩_{L²} + Σ_{j=1}^{n} ⟨∂b_j⁽¹⁾/∂θ, ∂b_j⁽²⁾/∂θ⟩_{L²}.   (16)

The associated norm is ‖b‖²_{H¹} ≜ ⟨b, b⟩_{H¹}. An important property which will be used extensively in our analysis is that H¹ is a Hilbert space.

Note that since Θ is an open set, not all functions in C¹ are in H¹. For example, in the case Θ = Rⁿ, the function b(θ) = k, for some nonzero constant k, is continuously differentiable but not integrable. Thus b is in C¹ but not in H¹, nor even in L². However, any measurable function which is not in H¹ has ‖b‖_{H¹} = ∞, meaning that either b or ∂b/∂θ has infinite L² norm. Consequently, either the bias norm part or the bias gradient part of Z[b] is infinite. It follows that performing the minimization (13) over C¹ ∩ H¹, rather than over C¹, does not change the minimum value. On the other hand, C¹ ∩ H¹ is dense in H¹, and Z[b] is continuous, so that minimizing (13) over H¹ rather than C¹ ∩ H¹ also does not alter the minimum. Consequently, we will henceforth consider the problem

s = inf_{b∈H¹} Z[b].   (17)

The advantage of including weakly differentiable functions in the minimization is that a unique minimizer can now be guaranteed, as demonstrated by the following result.

Proposition 1: Consider the problem

b̄ = arg min_{b∈H¹} Z[b]   (18)

where Z[b] is given by (12) and J(θ) is positive definite and bounded with probability 1. This problem is well-defined, i.e., there exists a unique b̄ ∈ H¹ which minimizes Z[b]. Furthermore, the minimum value s = Z[b̄] is finite and nonzero.
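The Fig. 1 phenomenon is easy to reproduce numerically. The piecewise-linear construction below is our own (the paper's figure uses smooth functions): on Θ = (0, 1) with J ≡ 1, the ith sawtooth has period 1/i, slope −1 on a fraction 1 − 1/i of each period, and a steep compensating rise on the remainder, so b⁽ⁱ⁾ → 0 and (b⁽ⁱ⁾)′ → −1 almost everywhere, while Z[b⁽ⁱ⁾] grows roughly like i.

```python
import numpy as np

def sawtooth_bias(theta, i):
    """Slope -1 on a fraction 1 - 1/i of each period of length 1/i,
    compensated by a steep linear rise back to zero on the rest."""
    T, eps = 1.0 / i, 1.0 / i
    u = np.mod(theta, T)
    down = u <= (1 - eps) * T
    up = -(1 - eps) * T + (1 - eps) / eps * (u - (1 - eps) * T)
    return np.where(down, -u, up)

def Z(i):
    # Theta = (0,1), p_theta uniform, J = 1: Z[b] = int (1+b')^2 + b^2.
    m = 50 * i**2 + 1          # enough samples to resolve the steep segments
    theta = np.linspace(0.0, 1.0, m)
    b = sawtooth_bias(theta, i)
    db = np.gradient(b, theta)
    return np.trapz((1 + db) ** 2 + b**2, theta)

for i in [4, 16, 64]:
    print(i, Z(i))   # grows ~ i: the pointwise limits do not minimize Z
```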

Proving the unique existence of a minimizer for (17) is a technical exercise in functional analysis which can be found in Appendix II. However, once the existence of such a minimizer is demonstrated, it is not difficult to see that 0 < s < ∞. … the optimal b(θ) of (20) is given by the solution to the system of differential equations (21) with the boundary condition (22). …

C. Tightness at High SNR

We now examine the performance of the OBB for high SNR values. To formally define the high SNR regime, we consider a sequence of measurements x⁽¹⁾, x⁽²⁾, … of a single parameter vector θ. It is assumed that, when conditioned on θ, all measurements x⁽ⁱ⁾ are identically and independently distributed (IID). Furthermore, we assume that the Fisher information matrix of a single observation J(θ) is well-defined, positive definite and finite for p_θ-almost all θ. We consider the problem of estimating θ from the set of measurements {x⁽¹⁾, …, x⁽ᴺ⁾}, for a given value of N. The high SNR regime is obtained when N is large.

When N tends to infinity, the MSE of the optimal estimator tends to zero. An important question, however, concerns the rate of convergence of the minimum MSE. More precisely, given the optimal estimator θ̂⁽ᴺ⁾ of θ from {x⁽¹⁾, …, x⁽ᴺ⁾}, one would like to determine the asymptotic distribution of √N(θ̂⁽ᴺ⁾ − θ), conditioned on θ. A fundamental result of asymptotic estimation theory can be loosely stated as follows [28, §III.3], [13, §6.8]. Under some fairly mild regularity conditions, the asymptotic distribution of √N(θ̂⁽ᴺ⁾ − θ), conditioned on θ, does not depend on the prior distribution p_θ; rather, √N(θ̂⁽ᴺ⁾ − θ)|θ converges in distribution to a Gaussian random vector with mean zero and covariance J⁻¹(θ). It follows that

lim_{N→∞} N E{‖θ̂⁽ᴺ⁾ − θ‖²} = E{Tr[J⁻¹(θ)]}.   (33)

Since the minimum MSE tends to zero at high SNR, any lower bound on the minimum MSE must also tend to zero as N → ∞. However, one would further expect a good lower bound to follow the behavior of (33). In other words, if β_N represents the lower bound for estimating θ from {x⁽¹⁾, …, x⁽ᴺ⁾}, a desirable property is Nβ_N → E{Tr[J⁻¹(θ)]}. The following theorem, whose proof is found in Appendix V, demonstrates that this is indeed the case for the OBB.

Except for a very brief treatment by Bellini and Tartara [6], no previous Bayesian bound has shown such a result. Although it appears that the Ziv–Zakai and Weiss–Weinstein bounds may also satisfy this property, this has not been proven formally. It is also known that the Bayesian CRB is not asymptotically tight in this sense [34, Eqs. (37)–(39)].

Theorem 6: Let θ be a random vector whose pdf p_θ(θ) is nonzero over an open set Θ ⊆ Rⁿ. Let x⁽¹⁾, x⁽²⁾, … be a sequence of measurement vectors, such that x⁽¹⁾|θ, x⁽²⁾|θ, … are IID. Let J(θ) be the Fisher information matrix for estimating θ from x⁽¹⁾, and suppose J(θ) is finite and positive definite for p_θ-almost all θ. Let β_N be the optimal-bias bound (20) for estimating θ from the observation sequence {x⁽¹⁾, …, x⁽ᴺ⁾}. Then,

lim_{N→∞} Nβ_N = E{Tr(J⁻¹(θ))}.   (34)

Note that for Theorem 6 to hold, we require only that J(θ) be finite and positive definite. By contrast, the various theorems guaranteeing asymptotic efficiency of Bayesian estimators all require substantially stronger regularity conditions [28, §III.3], [13, §6.8]. One reason for this is that asymptotic efficiency describes the behavior of θ̂ conditioned on each possible value of θ, and is thus a stronger result than the asymptotic Bayesian MSE of (33).

VI. EXAMPLE: UNIFORM PRIOR

The original bound of Young and Westerberg [5] predates most Bayesian bounds, and, surprisingly, it has never been cited by or compared with later results. In this section, we measure the performance of the original bound and of its extension to the vector case against that of various other techniques. We consider the case in which θ is uniformly distributed over an n-dimensional open ball Θ = {θ : ‖θ‖ < r} ⊆ Rⁿ, so that

p_θ(θ) = (1/V_n(r)) 𝟙_Θ   (35)

where 𝟙_S equals 1 when θ ∈ S and 0 otherwise, and

V_n(r) = πⁿ′²  rⁿ / Γ(1 + n/2)   (36)

is the volume of an n-ball of radius r [35]. We further assume that

x = θ + w   (37)

where w is zero-mean Gaussian noise, independent of θ, having covariance σ²I. We are interested in lower bounds on the MSE achievable by an estimator of θ from x.

We begin by developing the OBB for this setting, as well as some alternative bounds. We then compare the different approaches in a one-dimensional and a three-dimensional setting.

The Fisher information matrix for the given estimation problem is given by J(θ) = σ⁻²I, so that the conditions of Theorem 3 hold. It follows that the optimal bias function is given by b(θ) = b(‖θ‖) θ/‖θ‖, where b(·) is a solution to the differential equation

b/σ² = b″ + (n − 1)( b′/θ − b/θ² )   (38)

with boundary conditions b(0) = 0, b′(r) = −1. The general solution to this differential equation is given by

b(θ) = C₁ θ^{1−n/2} I_{n/2}(θ/σ) + C₂ θ^{1−n/2} K_{n/2}(θ/σ)   (39)

where I_α(z) and K_α(z) are the modified Bessel functions of the first and second types, respectively [36]. Since K_α(z) is singular at the origin, the requirement b(0) = 0 leads to C₂ = 0. Differentiating (39) with respect to θ, we obtain

b′(θ) = C₁ θ^{−n/2} [ I_{n/2}(θ/σ) + (θ/σ) I_{1+n/2}(θ/σ) ]   (40)

so that the requirement b′(r) = −1 leads to

C₁ = − r^{n/2} / [ I_{n/2}(r/σ) + (r/σ) I_{1+n/2}(r/σ) ].   (41)
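A minimal numerical sketch of (39)–(41) follows; it assumes SciPy's `iv` for I_α, and the parameter values n = 3, r = 1, σ = 0.5 are arbitrary choices of ours. The boundary conditions b(0) = 0 and b′(r) = −1 are verified directly.

```python
import numpy as np
from scipy.special import iv  # modified Bessel function I_alpha

# Assumed setting from (35)-(38): theta uniform on an n-ball of radius r,
# x = theta + w with w ~ N(0, sigma^2 I).
n, r, sigma = 3, 1.0, 0.5

# Constant C1 of (41); C2 = 0 since K_alpha is singular at the origin.
C1 = -r**(n / 2) / (iv(n / 2, r / sigma) + (r / sigma) * iv(1 + n / 2, r / sigma))

def b(rho):
    """Radial optimal bias profile (39) with C2 = 0."""
    rho = np.asarray(rho, dtype=float)
    return C1 * rho**(1 - n / 2) * iv(n / 2, rho / sigma)

# Check the boundary conditions b(0) = 0 and b'(r) = -1.
eps = 1e-6
print(b(1e-12))                    # ~ 0 (b vanishes at the origin)
print((b(r) - b(r - eps)) / eps)   # ~ -1 (finite-difference derivative)
```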

Substituting this value of b(·) into (24) yields the OBB, which can be computed by evaluating a single one-dimensional integral. Alternatively, in the one-dimensional case, the integral can be computed analytically, as will be shown below.

Despite the widespread use of finite-support prior distributions [4], [10], the regularity conditions of many bounds are violated by such prior pdf functions. Indeed, the Bayesian CRB of Van Trees [3], the Bobrovski–Zakai bound [8], and the Bayesian Abel bound [12] all assume that p_θ(θ) has infinite support, and thus cannot be applied in this scenario.

Techniques from the Ziv–Zakai family are applicable to constrained problems. An extension of the Ziv–Zakai bound for vector parameter estimation was developed by Bell et al. [11]. From [11, Property 4], the MSE of the ith component of θ is bounded by

E{(θ_i − θ̂_i)²} ≥ ∫₀^∞ V{ max_{δ: e_iᵀδ = h} A(δ) P_min(δ) } h dh   (42)

where e_i is a unit vector in the direction of the ith component, V{·} is the valley-filling function defined by

V{f(h)} = max_{η≥0} f(h + η),   (43)

A(δ) ≜ ∫_{Rⁿ} min( p_θ(θ), p_θ(θ + δ) ) dθ,   (44)

and P_min(δ) is the minimum probability of error for the problem of testing hypothesis H₀: θ = θ₀ vs. H₁: θ = θ₀ + δ. In the current setting, P_min(δ) is given by P_min(δ) = Q(‖δ‖/2σ), where Q(z) = (2π)^{−1/2} ∫_z^∞ e^{−t²/2} dt is the tail function of the normal distribution. Also, we have

A(δ) = V_nᶜ(r, ‖δ‖) / V_n(r)   (45)

where

V_nᶜ(r, h) = ∫_{Rⁿ} 𝟙_Θ 𝟙_{Θ+he₁} dθ   (46)

and Θ + he₁ = {θ + he₁ : θ ∈ Θ}. Thus, V_nᶜ(r, h) is the volume of the intersection of two n-balls whose centers are at a distance of h units from one another. Substituting these results into (42), we have

E{(θ_i − θ̂_i)²} ≥ ∫₀^∞ V{ max_{δ: e_iᵀδ = h} ( V_nᶜ(r, ‖δ‖)/V_n(r) ) Q(‖δ‖/2σ) } h dh.   (47)

Note that both V_nᶜ(r, ‖δ‖) and Q(‖δ‖/2σ) decrease with ‖δ‖. Therefore, the maximum in (47) is obtained for δ = he_i. Also, since the argument of V{·} is monotonically decreasing, the valley-filling function has no effect and can be removed. Finally, since V_nᶜ(r, h) = 0 for h > 2r, the integration can be limited to the range [0, 2r]. Thus, the extended Ziv–Zakai bound is given by

E{‖θ − θ̂‖²} ≥ n ∫₀^{2r} ( V_nᶜ(r, h)/V_n(r) ) Q(h/2σ) h dh.   (48)

We now compute the Weiss–Weinstein bound for the setting at hand. This bound is given by

E{‖θ − θ̂‖²} ≥ Tr(H G⁻¹ Hᵀ)   (49)

where H = [h₁, …, h_m] is a matrix containing an arbitrary number m of test vectors and G is a matrix whose elements are given by

G_ij = E{ r(x, θ; h_i, s_i) r(x, θ; h_j, s_j) } / ( E{L^{s_i}(x; θ+h_i, θ)} E{L^{s_j}(x; θ+h_j, θ)} )   (50)

in which

r(x, θ; h_i, s_i) ≜ L^{s_i}(x; θ+h_i, θ) − L^{1−s_i}(x; θ−h_i, θ)   (51)

and

L(x; θ₁, θ₂) ≜ p_θ(θ₁) p_{x|θ}(x|θ₁) / ( p_θ(θ₂) p_{x|θ}(x|θ₂) ).   (52)

The vectors h₁, …, h_m and the scalars s₁, …, s_m are arbitrary, and can be optimized to maximize the bound (49). To avoid a multidimensional nonconvex optimization problem, we restrict attention to m = n, h_i = h e_i, and s_i = 1/2, as suggested by [10]. This results in a dependency on a single scalar parameter h.

Under these conditions, G_ij can be written as

G_ij = [ M̃(h_i − h_j, h_j) + M̃(h_i − h_j, h_i) − M̃(h_i + h_j, h_j) − M̃(h_i + h_j, h_i) ] / ( M(h_i) M(h_j) )   (53)

where

M(h) ≜ E{ L^{1/2}(x; θ+h, θ) }   (54)

and

M̃(h₁, h₂) ≜ E{ L^{1/2}(x; θ+h₁, θ) 𝟙_{Θ+h₂} }.   (55)

Note that we have used the corrected version of the Weiss–Weinstein bound [37]. Substituting the probability distributions of x and θ into the definitions of M(h) and M̃(h₁, h₂), we have

M(h) = E{ e^{−‖θ+h−x‖²/4σ²} e^{‖θ−x‖²/4σ²} 𝟙_{Θ+h} } = ( V_nᶜ(r, ‖h‖)/V_n(r) ) e^{−‖h‖²/8σ²}   (56)

and, similarly,

M̃(h₁, h₂) = ( e^{−‖h₁‖²/8σ²}/V_n(r) ) ∫ 𝟙_Θ 𝟙_{Θ+h₁} 𝟙_{Θ+h₂} dθ.   (57)

Thus, M(h) is a function only of ‖h‖, and M̃(h₁, h₂) is a function only of ‖h₁‖, ‖h₂‖, and ‖h₁ − h₂‖. Since h_i = h e_i, it follows that, for i ≠ j, the numerator of (53) vanishes. Thus, G is a diagonal matrix, whose diagonal elements equal

G_ii = 2 [ M̃(0, he₁) − M̃(2he₁, he₁) ] / M²(he₁).   (58)

The Weiss–Weinstein bound is given by substituting this result into (49) and maximizing over h, i.e.,

E{‖θ − θ̂‖²} ≥ max_{h∈[0,2r]} n h² M²(he₁) / ( 2[ M̃(0, he₁) − M̃(2he₁, he₁) ] ).   (59)

The value of h yielding the tightest bound can be determined by performing a grid search.
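The extended Ziv–Zakai bound (48) is straightforward to evaluate numerically. The sketch below is ours: it estimates the intersection-volume ratio V_nᶜ(r, h)/V_n(r) of (45)–(46) by Monte Carlo and integrates (48) with a simple trapezoidal rule; sample sizes, grid resolution, and the parameter values are arbitrary choices.

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(1)
n, r, sigma = 3, 1.0, 0.5
Q = lambda z: 0.5 * erfc(z / np.sqrt(2))   # Gaussian tail function

# Uniform samples in the n-ball of radius r: uniform direction times
# a radius drawn as r * U^{1/n}.
g = rng.normal(size=(100_000, n))
pts = g / np.linalg.norm(g, axis=1, keepdims=True)
pts *= r * rng.uniform(size=(100_000, 1)) ** (1 / n)

def overlap_ratio(h):
    """Monte Carlo estimate of V_n^C(r, h) / V_n(r): the fraction of the
    ball Theta that also lies in the shifted ball Theta + h*e_1."""
    shifted = pts.copy()
    shifted[:, 0] -= h
    return np.mean(np.linalg.norm(shifted, axis=1) < r)

# Extended Ziv-Zakai bound (48) by trapezoidal integration over [0, 2r].
hs = np.linspace(0.0, 2 * r, 401)
vals = np.array([overlap_ratio(h) * Q(h / (2 * sigma)) * h for h in hs])
zzb = n * np.trapz(vals, hs)
print(zzb)
```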

[Fig. 2 graphic: panel (a) shows the MSE of each method vs. SNR (dB); panel (b) shows the ratio between each bound and the actual MMSE. Curves: actual MMSE, optimal-bias bound, Weiss–Weinstein, Bellini–Tartara.]

Fig. 2. Comparison of the MSE bounds and the minimum achievable MSE in a one-dimensional setting for which θ ∼ U[−r, r] and x|θ ∼ N(θ, σ²).

To compare the OBB with the alternative approaches developed above, we first consider the one-dimensional case in which θ is uniformly distributed in the range Θ = (−r, r). Let x = θ + w be a single noisy observation, where w is zero-mean Gaussian noise, independent of θ, with variance σ². We wish to bound the MSE of an estimator of θ from x.

The optimal bias function is given by (39). Using the fact that I_{1/2}(t) = √(2/π) sinh(t)/√t, we obtain

b(θ) = −σ sinh(θ/σ) / cosh(r/σ)   (60)

which also follows [5] from Corollary 1. Substituting this expression into (20), we have that, for any estimator θ̂,

E{(θ − θ̂)²} ≥ σ² ( 1 − tanh(r/σ)/(r/σ) ).   (61)

Apart from the reduction in computational complexity, the simplicity of (61) also emphasizes several features of the estimation problem. First, the dependence of the problem on the dimensionless quantity r/σ, rather than on r and σ separately, is clear. This is to be expected, as a change in units of measurement would multiply both r and σ by a constant. Second, the asymptotic properties demonstrated in Theorems 5 and 6 can be easily verified. For r ≫ σ, the bound converges to the noise variance σ², corresponding to an uninformative prior whose optimal estimator is θ̂ = x; whereas, for σ ≫ r, a Taylor expansion of tanh(z)/z immediately shows that the bound converges to r²/3, corresponding to the case of uninformative measurements, where the optimal estimator is θ̂ = 0. Thus, the bound (61) is tight both for very low and for very high SNR, as expected.

In the one-dimensional case, we have V₁(r) = 2r and V₁ᶜ(r, h) = max(2r − h, 0), so that the extended Ziv–Zakai bound (48) and the Weiss–Weinstein bound (59) can also be simplified somewhat. In particular, the extended Ziv–Zakai bound (48) can be written as

E{(θ − θ̂)²} ≥ ∫₀^{2r} ( 1 − h/2r ) Q(h/2σ) h dh.   (62)

Using integration by parts, (62) becomes

E{(θ − θ̂)²} ≥ (2r²/3) Q(r/σ) + σ² Γ_{3/2}( r²/2σ² ) − ( 8σ³/(3√(2π) r) ) Γ₂( r²/2σ² )   (63)

where Γ_a(z) = (1/Γ(a)) ∫₀^z e^{−t} t^{a−1} dt is the incomplete Gamma function. Like the expression (61) for the OBB, this bound can be shown to converge to the noise variance σ² when r ≫ σ and to the prior variance r²/3 when σ ≫ r. However, while the convergence of the OBB to these asymptotic values has been demonstrated in general in Theorems 5 and 6, the asymptotic tightness of the Ziv–Zakai bound in the general case remains an open question.

The Weiss–Weinstein bound (59) can likewise be simplified further in the one-dimensional case, yielding

E{(θ − θ̂)²} ≥ max_{h∈[0,2r]} h² (1 − h/2r)² e^{−h²/4σ²} / ( 2[ (1 − h/2r) − max(0, 1 − h/r) e^{−h²/2σ²} ] ).   (64)

However, calculating this bound still requires a numerical search for the optimal value of h.

These bounds are compared with the exact value of the MMSE in Fig. 2. In this figure, the SNR is defined as

SNR(dB) = 10 log₁₀( Var(θ)/Var(w) ) = 10 log₁₀( r²/3σ² ).   (65)

The MMSE was computed by Monte Carlo approximation of the error of the optimal estimator E{θ|x}, which was itself computed by numerical integration.
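The three closed-form bounds above can be compared directly; the sketch below is our own reproduction of the comparison underlying Fig. 2 (grid sizes and the SNR values are arbitrary choices), evaluating the OBB (61), the extended Ziv–Zakai bound via direct quadrature of (62), and the Weiss–Weinstein bound (64) by grid search.

```python
import numpy as np
from scipy.special import erfc

Q = lambda z: 0.5 * erfc(z / np.sqrt(2))

def obb(r, sigma):
    """Optimal-bias bound (61)."""
    a = r / sigma
    return sigma**2 * (1 - np.tanh(a) / a)

def zzb(r, sigma, m=2000):
    """Extended Ziv-Zakai bound (62), by trapezoidal quadrature."""
    h = np.linspace(0.0, 2 * r, m)
    return np.trapz((1 - h / (2 * r)) * Q(h / (2 * sigma)) * h, h)

def wwb(r, sigma, m=2000):
    """Weiss-Weinstein bound (64), grid search over h."""
    h = np.linspace(1e-6, 2 * r, m)
    num = h**2 * (1 - h / (2 * r)) ** 2 * np.exp(-h**2 / (4 * sigma**2))
    den = 2 * ((1 - h / (2 * r))
               - np.maximum(0.0, 1 - h / r) * np.exp(-h**2 / (2 * sigma**2)))
    return np.max(num / den)

r = 1.0
for snr_db in [-10, 0, 10]:
    sigma = np.sqrt(r**2 / 3 / 10 ** (snr_db / 10))   # invert (65)
    print(snr_db, obb(r, sigma), zzb(r, sigma), wwb(r, sigma))
```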

[Fig. 3 graphic: panel (a) shows the MSE of each method vs. SNR (dB); panel (b) shows the ratio between each bound and the optimal MSE. Curves: actual MSE, optimal-bias bound, Weiss–Weinstein, extended Ziv–Zakai.]

Fig. 3. Comparison of the MSE bounds and the minimum achievable MSE in a three-dimensional setting for which θ is uniformly distributed over a ball of radius r and x|θ ∼ N(θ, σ²I).

Fig. 2(a) plots the MMSE and the values obtained by the aforementioned bounds, while Fig. 2(b) plots the ratio between each of the bounds and the actual MMSE in order to emphasize the difference in accuracy between the various bounds. As can be seen from this figure, the OBB is closer to the true MSE than all other bounds, for all tested SNR values.

The improvements provided by the OBB continue to hold in higher dimensions as well, although in this case it is not possible to provide a closed form for any of the bounds. For example, Fig. 3 compares the aforementioned bounds with the true MMSE in the three-dimensional case. In this case the SNR is given by

SNR(dB) = 10 log₁₀( Var(θ)/Var(w) ) = 10 log₁₀( r²/5σ² ).   (66)

Here, computation of the minimum MSE requires multidimensional numerical integration, and is by far more computationally complex than the calculation of the bounds. Again, it is evident from this figure that the OBB is a very tight bound in all ranges of operation, and is considerably closer to the true value than either of the alternative approaches.

VII. CONCLUSION

Although often considered distinct settings, there are insightful connections between the Bayesian and deterministic estimation problems. One such relation is the use of the deterministic CRB in a Bayesian problem. The application of this deterministic bound to the problem of estimating the minimum Bayesian MSE results in a Bayesian bound which is provably tight at both high and low SNR values. Numerical simulation of the location estimation problem demonstrates that the technique is both simpler and tighter than alternative approaches.

ACKNOWLEDGEMENT

The authors are grateful to Dr. Volker Pohl for fruitful discussions concerning many of the mathematical aspects of the paper. The authors would also like to thank the anonymous reviewers for their many constructive comments.

APPENDIX I
SOME TECHNICAL LEMMAS

The proof of several theorems in the paper relies on the following technical results.

Lemma 1: Consider the minimization problems

M_ℓ = inf_{b∈S} Z_ℓ[b],  ℓ = 1, 2, 3   (67)

where J(θ) is positive definite and bounded a.e. (p_θ),

Z₁[b] ≜ ∫_Θ ‖b(θ)‖² p_θ(dθ),
Z₂[b] ≜ ∫_Θ Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) p_θ(dθ),
Z₃[b] ≜ Z₁[b] + Z₂[b]   (68)

and S ⊂ H¹ is convex, closed, and bounded under the H¹ norm (16). Then, for each ℓ, there exists a function b⁽⁰⁾ ∈ S such that Z_ℓ[b⁽⁰⁾] = M_ℓ. If ℓ = 1 or ℓ = 3, then the minimizer of (67) is unique.

Note that Z₃[b] equals Z[b] of (12); the notation Z₃[b] is introduced for simplicity. Also note that under mild regularity assumptions on J(θ), uniqueness can be demonstrated for ℓ = 2 as well, but this is not necessary for our purposes.

Proof: The space H¹ is a Cartesian product of n Sobolev spaces H¹(Θ), each of which is a separable Hilbert space [38, §3.7.1]. Therefore, H¹ is also a separable Hilbert space. It follows from the Banach–Alaoglu theorem [39, §3.17] that all bounded sequences in H¹ have weakly convergent subsequences [32, §2.18]. Recall that a sequence f⁽¹⁾, f⁽²⁾, … ∈ H¹ is said to converge weakly to f⁽⁰⁾ ∈ H¹ (denoted f⁽ⁱ⁾ ⇀ f⁽⁰⁾) if

L[f⁽ʲ⁾] → L[f⁽⁰⁾]   (69)

for all continuous linear functionals L[·] [32, §2.9].

Given a particular value ℓ ∈ {1, 2, 3}, let b⁽ⁱ⁾ be a sequence of functions in S such that Z_ℓ[b⁽ⁱ⁾] → M_ℓ. This is a bounded sequence since S is bounded, and therefore there exists a subsequence b⁽ⁱₖ⁾ which converges weakly to some b_opt⁽ℓ⁾ ∈ H¹.² Furthermore, since S is closed, we have b_opt⁽ℓ⁾ ∈ S. We will now show that Z_ℓ[b_opt⁽ℓ⁾] = M_ℓ.

To this end, it suffices to show that Z_ℓ[·] is weakly lower semicontinuous, i.e., for any sequence f⁽ⁱ⁾ ∈ H¹ which converges weakly to f⁽⁰⁾ ∈ H¹, we must show that

Z_ℓ[f⁽⁰⁾] ≤ lim inf_{i→∞} Z_ℓ[f⁽ⁱ⁾].   (70)

Consider a weakly convergent sequence f⁽ʲ⁾ ⇀ f⁽⁰⁾. Then, (69) holds for any continuous linear functional L[·]. Specifically, choose the continuous linear functional

L₁[f] = ∫_Θ f⁽⁰⁾ᵀ(θ) f(θ) p_θ(dθ).   (71)

We then have

Z₁[f⁽⁰⁾] = L₁[f⁽⁰⁾] = lim_{j→∞} L₁[f⁽ʲ⁾]
 = lim_{j→∞} Σ_{i=1}^{n} ∫_Θ f_i⁽⁰⁾(θ) f_i⁽ʲ⁾(θ) p_θ(dθ)
 ≤ lim inf_{j→∞} √( ∫_Θ ‖f⁽⁰⁾(θ)‖² p_θ(dθ) ) √( ∫_Θ ‖f⁽ʲ⁾(θ)‖² p_θ(dθ) )
 = √(Z₁[f⁽⁰⁾]) lim inf_{j→∞} √(Z₁[f⁽ʲ⁾])   (72)

where we have used the Cauchy–Schwarz inequality. It follows that

√(Z₁[f⁽⁰⁾]) ≤ lim inf_{j→∞} √(Z₁[f⁽ʲ⁾])   (73)

and therefore Z₁[f⁽⁰⁾] ≤ lim inf_{j→∞} Z₁[f⁽ʲ⁾], so that Z₁[·] is weakly lower semicontinuous.

Similarly, consider the continuous linear functional

L₂[f] = ∫_Θ Tr( (I + ∂f⁽⁰⁾/∂θ) J⁻¹(θ) (I + ∂f/∂θ)ᵀ ) p_θ(dθ)   (74)

for which we have

Z₂[f⁽⁰⁾] = L₂[f⁽⁰⁾] = lim_{j→∞} L₂[f⁽ʲ⁾] = lim_{j→∞} ∫_Θ Tr( (I + ∂f⁽⁰⁾/∂θ) J⁻¹(θ) (I + ∂f⁽ʲ⁾/∂θ)ᵀ ) p_θ(dθ).   (75)

Note that, for any positive definite matrix W, Tr(A W Bᵀ) is an inner product of the two matrices A and B. Therefore, by the Cauchy–Schwarz inequality,

Tr(A W Bᵀ) ≤ √(Tr(A W Aᵀ)) √(Tr(B W Bᵀ)).   (76)

Applying this to (75), we have

Z₂[f⁽⁰⁾] ≤ lim inf_{j→∞} ∫_Θ √( Tr( (I + ∂f⁽⁰⁾/∂θ) J⁻¹(θ) (I + ∂f⁽⁰⁾/∂θ)ᵀ ) ) · √( Tr( (I + ∂f⁽ʲ⁾/∂θ) J⁻¹(θ) (I + ∂f⁽ʲ⁾/∂θ)ᵀ ) ) p_θ(dθ).   (77)

Once again using the Cauchy–Schwarz inequality results in

Z₂[f⁽⁰⁾] ≤ lim inf_{j→∞} √( Z₂[f⁽⁰⁾] Z₂[f⁽ʲ⁾] )   (78)

and therefore Z₂[f⁽⁰⁾] ≤ lim inf_{j→∞} Z₂[f⁽ʲ⁾], so that Z₂[·] is weakly lower semicontinuous. Since Z₃[f] = Z₁[f] + Z₂[f], it follows that Z₃[·] is also weakly lower semicontinuous.

Now recall that b⁽ⁱₖ⁾ ⇀ b_opt⁽ℓ⁾ and Z_ℓ[b⁽ⁱₖ⁾] → M_ℓ. By the definition (70) of lower semicontinuity, it follows that

Z_ℓ[b_opt⁽ℓ⁾] ≤ lim inf_{k→∞} Z_ℓ[b⁽ⁱₖ⁾] = M_ℓ   (79)

and since M_ℓ is the infimum of Z_ℓ[b], we obtain Z_ℓ[b_opt⁽ℓ⁾] = M_ℓ. Thus b_opt⁽ℓ⁾ is a minimizer of (67).

It remains to show that for ℓ ∈ {1, 3}, the minimizer of (67) is unique. To this end, we first show that Z₁[·] is strictly convex. Let b⁽⁰⁾, b⁽¹⁾ ∈ S be two essentially different functions, i.e.,

p_θ{ θ ∈ Θ : b⁽⁰⁾(θ) ≠ b⁽¹⁾(θ) } > 0.   (80)

Let b⁽²⁾(θ) = λb⁽⁰⁾(θ) + (1−λ)b⁽¹⁾(θ) for some 0 < λ < 1, so that b⁽²⁾ ∈ S by convexity. Denoting by Q the set in (80), we then have

Z₁[b⁽²⁾] = ∫_Q ‖λb⁽⁰⁾(θ) + (1−λ)b⁽¹⁾(θ)‖² p_θ(dθ) + ∫_{Θ∖Q} ‖λb⁽⁰⁾(θ) + (1−λ)b⁽¹⁾(θ)‖² p_θ(dθ)
 < ∫_Q [ λ‖b⁽⁰⁾(θ)‖² + (1−λ)‖b⁽¹⁾(θ)‖² ] p_θ(dθ) + ∫_{Θ∖Q} [ λ‖b⁽⁰⁾(θ)‖² + (1−λ)‖b⁽¹⁾(θ)‖² ] p_θ(dθ)
 = λZ₁[b⁽⁰⁾] + (1−λ)Z₁[b⁽¹⁾]   (81)

where the inequality follows from strict convexity of the squared Euclidean norm ‖x‖². Thus Z₁[·] is strictly convex, and hence has a unique minimum.

Note that Z₃[b] = Z₁[b] + Z₂[b]. Since Z₁[·] is strictly convex and Z₂[·] is convex, it follows that Z₃[·] is strictly convex, and thus also has a unique minimum. This completes the proof. ∎

²In fact, we require that S be "weakly closed" in the sense that weakly convergent sequences in S converge to an element in S. However, since S is convex, this notion is equivalent to the ordinary definition of closure [39, §3.13].

The following lemma can be thought of as a triangle inequality for a normed space of matrix functions over Θ.

Lemma 2: Let p_θ be a probability measure over Θ, and let M : Θ → R^{n×n} be a matrix function. Suppose

∫_Θ ‖I + M(θ)‖²_F p_θ(dθ) ≤ α   (82)

for some constant α. It follows that

∫_Θ ‖M(θ)‖²_F p_θ(dθ) ≤ (√α + √n)².   (83)

Proof: By the triangle inequality,

‖M(θ)‖_F = ‖M(θ) + I − I‖_F ≤ ‖M(θ) + I‖_F + ‖I‖_F.   (84)

Since ‖I‖²_F = n, we have

∫_Θ ‖M(θ)‖²_F p_θ(dθ) ≤ ∫_Θ [ ‖I + M(θ)‖²_F + n + 2√n ‖I + M(θ)‖_F ] p_θ(dθ).   (85)

Using the fact that

∫_Θ ‖I + M(θ)‖_F p_θ(dθ) ≤ √( ∫_Θ ‖I + M(θ)‖²_F p_θ(dθ) )   (86)

and combining with (82), it follows that

∫_Θ ‖M(θ)‖²_F p_θ(dθ) ≤ α + n + 2√(nα)   (87)

which completes the proof. ∎
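Lemma 2 is easy to stress-test numerically. The sketch below is ours: it takes p_θ to be the empirical measure of m random sample points, draws a random matrix field M(θ), and checks that (83) holds with α set to the left-hand side of (82).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 10_000  # matrix size and number of sampled "theta" points

# Random matrix field M(theta), sampled at m equiprobable points.
M = rng.normal(scale=0.7, size=(m, n, n))
I = np.eye(n)

alpha = np.mean(np.sum((I + M) ** 2, axis=(1, 2)))   # E ||I + M||_F^2
lhs = np.mean(np.sum(M**2, axis=(1, 2)))             # E ||M||_F^2
rhs = (np.sqrt(alpha) + np.sqrt(n)) ** 2             # bound (83)
print(lhs <= rhs, lhs, rhs)   # True: Lemma 2 holds for this sample
```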

APPENDIX II
PROOF OF PROPOSITION 1

… Similarly, we have

Z[b] ≥ ∫_Θ Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) p_θ(dθ)   (90)

so that it suffices to minimize (17) over functions b for which

∫_Θ Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) p_θ(dθ) ≤ U.   (91)

Note that J(θ) is bounded a.e., and therefore λ_min(J⁻¹) ≥ 1/K a.e., for some constant K. It follows that

Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) ≥ (1/K) ‖I + ∂b/∂θ‖²_F  a.e. (p_θ).   (92)

Combining with (91) yields

∫_Θ ‖I + ∂b/∂θ‖²_F p_θ(dθ) ≤ KU.   (93)

From Lemma 2, we then have

∫_Θ ‖∂b/∂θ‖²_F p_θ(dθ) ≤ (√n + √(KU))².   (94)

From (89) and (94) it follows that the minimization (17) can be limited to the closed, bounded, convex set

S = { b ∈ H¹ : ‖b‖²_{H¹} ≤ U + (√(KU) + √n)² }.   (95)

Applying Lemma 1 proves the unique existence of a minimizer of (17). The proof that 0 < s < ∞ …

… where ε is an infinitesimal quantity, b_i⁽ʲ⁾ = ∂b_i/∂θ_j, and ν(θ) is an outward-pointing normal at the boundary point θ ∈ Λ. We now seek conditions for which δZ[h] = 0 for all h(θ). Consider first functions h(θ) which equal zero on the boundary Λ. In this case, the second integral vanishes, and we obtain the Euler–Lagrange equations

∀i,  ∂F/∂b_i − Σ_j ∂/∂θ_j ( ∂F/∂b_i⁽ʲ⁾ ) = 0.   (98)

Substituting this result back into (97), and again using the fact that δZ[h] = 0 for all h, we obtain the boundary condition

∀i, ∀θ ∈ Λ,  ( ∂F/∂b_i⁽¹⁾, …, ∂F/∂b_i⁽ⁿ⁾ ) ν(θ) = 0.   (99)

Plugging F[b, θ] = CRB[b, θ] p_θ(θ) into (98) and (99) provides the required result. ∎

APPENDIX IV
PROOF OF THEOREM 3

Before proving Theorem 3, we provide the following two lemmas, which demonstrate some symmetry properties of the CRB.

Lemma 3: Under the conditions of Theorem 3, the functional Z[b] of (12) is rotation and reflection invariant, i.e., Z[b] = Z[Ub] for any unitary matrix U.

Proof: We first demonstrate that Z[b] is rotation invariant. From the definitions of Z[b] and CRB[b, θ], we have

Z[b] = ∫_Θ Tr( (I + ∂b/∂θ)(I + ∂b/∂θ)ᵀ ) ( q(‖θ‖)/J(‖θ‖) ) dθ + ∫_Θ ‖b(θ)‖² q(‖θ‖) dθ.   (100)

The second integral is clearly rotation invariant, since a rotation of b does not alter its norm. It remains to show that the first integral, which we denote by I₁[b], does not change when b is rotated. To this end, we begin by considering a rotation about the first two coordinates, such that b is transformed to b̃ ≜ R_φ b, where the rotation matrix R_φ is defined such that

R_φ b = ( b₁ cos φ + b₂ sin φ, −b₁ sin φ + b₂ cos φ, b₃, …, b_n )ᵀ.   (101)

We must thus show that I₁[b] = I₁[b̃]. Let us perform the change of variables θ ↦ θ̃, where θ̃ = R_{(−φ)} θ. Rewriting the trace in (100) as a sum, we have

I₁[b̃] = ∫_Θ Σ_{i,j} ( δ_{ij} + ∂b̃_i/∂θ_j )² ( q(‖θ̃‖)/J(‖θ̃‖) ) dθ̃   (102)

where we have used the facts that ‖θ‖ = ‖θ̃‖ and that Θ does not change under the change of variables.

We now demonstrate some properties of the transformation of b and θ. First, we have, for any j,

( ∂b̃₁/∂θ_j )² + ( ∂b̃₂/∂θ_j )² = ( ∂b₁/∂θ_j cos φ + ∂b₂/∂θ_j sin φ )² + ( −∂b₁/∂θ_j sin φ + ∂b₂/∂θ_j cos φ )² = ( ∂b₁/∂θ_j )² + ( ∂b₂/∂θ_j )².   (103)

Also, for any i,

( ∂b_i/∂θ̃₁ )² + ( ∂b_i/∂θ̃₂ )² = ( ∂b_i/∂θ₁ · ∂θ₁/∂θ̃₁ + ∂b_i/∂θ₂ · ∂θ₂/∂θ̃₁ )² + ( ∂b_i/∂θ₁ · ∂θ₁/∂θ̃₂ + ∂b_i/∂θ₂ · ∂θ₂/∂θ̃₂ )² = ( ∂b_i/∂θ₁ )² + ( ∂b_i/∂θ₂ )²   (104)

where we used the fact that θ = R_φ θ̃. Third, we have

∂b̃₁/∂θ₁ = ∂b₁/∂θ̃₁ cos²φ + ∂b₁/∂θ̃₂ sin φ cos φ + ∂b₂/∂θ̃₁ sin φ cos φ + ∂b₂/∂θ̃₂ sin²φ,
∂b̃₂/∂θ₂ = ∂b₁/∂θ̃₁ sin²φ − ∂b₁/∂θ̃₂ sin φ cos φ − ∂b₂/∂θ̃₁ sin φ cos φ + ∂b₂/∂θ̃₂ cos²φ,   (105)

so that

∂b̃₁/∂θ₁ + ∂b̃₂/∂θ₂ = ∂b₁/∂θ̃₁ + ∂b₂/∂θ̃₂.   (106)

We now show that

Σ_{i,j} ( δ_{ij} + ∂b̃_i/∂θ_j )² = Σ_{i,j} ( δ_{ij} + ∂b_i/∂θ̃_j )².   (107)

For terms with i, j ≥ 3, we have b̃_i = b_i and θ_j = θ̃_j, so that replacing b̃ with b and θ with θ̃ does not change the result. The terms with i = 1, 2 and j ≥ 3 do not change because of (103), while the terms with i ≥ 3 and j = 1, 2 do not change because of (104). It remains to show that the terms i, j = 1, 2 do not modify the sum. To this end, we write out these four terms as

( 1 + ∂b̃₁/∂θ₁ )² + ( 1 + ∂b̃₂/∂θ₂ )² + ( ∂b̃₁/∂θ₂ )² + ( ∂b̃₂/∂θ₁ )²
 = 2 + 2 ∂b̃₁/∂θ₁ + 2 ∂b̃₂/∂θ₂ + ( ∂b̃₁/∂θ₁ )² + ( ∂b̃₁/∂θ₂ )² + ( ∂b̃₂/∂θ₁ )² + ( ∂b̃₂/∂θ₂ )²
 = 2 + 2 ∂b₁/∂θ̃₁ + 2 ∂b₂/∂θ̃₂ + ( ∂b₁/∂θ̃₁ )² + ( ∂b₁/∂θ̃₂ )² + ( ∂b₂/∂θ̃₁ )² + ( ∂b₂/∂θ̃₂ )²
 = ( 1 + ∂b₁/∂θ̃₁ )² + ( 1 + ∂b₂/∂θ̃₂ )² + ( ∂b₁/∂θ̃₂ )² + ( ∂b₂/∂θ̃₁ )²   (108)

where, in the second transition, we have used (103), (104), and (106). It follows that I₁[b̃] of (102) is equal to I₁[b], and hence Z[b] = Z[b̃]. The result similarly holds for rotations about any other two coordinates. Since any rotation can be decomposed into a sequence of two-coordinate rotations, we conclude that Z[b] is rotation invariant.

Next, we prove that Z[b] is invariant to reflections through hyperplanes containing the origin. Since Z[b] is invariant to rotations, it suffices to choose a single hyperplane, say {θ : θ₁ = 0}. Let

b̃ ≜ ( −b₁(θ), b₂(θ), …, b_n(θ) )ᵀ   (109)

be the reflection of b, and consider the corresponding change of variables

θ̃ ≜ ( −θ₁, θ₂, …, θ_n )ᵀ.   (110)

By the symmetry assumptions, p_θ and J are unaffected by the change of variables; furthermore, ∂b̃/∂θ̃ = ∂b/∂θ. It follows that CRB[b̃, θ̃] = CRB[b, θ], and therefore Z[b] = Z[b̃]. ∎

Lemma 4: Suppose b(θ) is radial and rotation invariant, i.e., b(θ) = t(‖θ‖²)θ for some function t ∈ H¹. Also suppose that J(θ) = J(‖θ‖)I, where J(·) is a scalar function. Then, CRB[b, θ] of (11) is rotation invariant in θ, i.e., CRB[b, Rθ] = CRB[b, θ] for any rotation matrix R.

Proof: We will show that CRB[b, θ] depends on θ only through ‖θ‖², and is therefore rotation invariant. For the given value of b(θ) and J(θ), we have

CRB[b, θ] = ‖b(θ)‖² + Tr( (I + ∂b/∂θ) J⁻¹(θ) (I + ∂b/∂θ)ᵀ ) = t²‖θ‖² + (1/J(‖θ‖)) Tr( (I + ∂tθ/∂θ)(I + ∂tθ/∂θ)ᵀ )   (111)

where, for notational convenience, we have omitted the dependence of t on ‖θ‖². It remains to show that the trace in the above expression is a function of θ only through ‖θ‖². To this end, we note that

∂b_i/∂θ_j = t δ_{ij} + t′ θ_i ∂‖θ‖²/∂θ_j = t δ_{ij} + 2t′ θ_i θ_j   (112)

where δ_{ij} is the Kronecker delta. It follows that

( δ_{ij} + ∂b_i/∂θ_j )² = (1+t)² δ_{ij} + 4(1+t)t′ θ_iθ_j δ_{ij} + 4t′² θ_i² θ_j².   (113)

Therefore

Tr( (I + ∂b/∂θ)(I + ∂b/∂θ)ᵀ ) = Σ_{i,j} ( δ_{ij} + ∂b_i/∂θ_j )² = n(1+t)² + 4t′² Σ_{i,j} θ_i²θ_j² + 4(1+t)t′ Σ_i θ_i² = n(1+t)² + 4t′²‖θ‖⁴ + 4(1+t)t′‖θ‖².   (114)

Thus, CRB[b, θ] depends on θ only through ‖θ‖², completing the proof. ∎

Proof: [Proof of Theorem 3] We have seen in Theorem 2 that the solution of (20) is unique. Now suppose that the optimum b is not rotation invariant, i.e., there exists a rotation matrix R such that Rb(θ) is not identical to b(θ). By Lemma 3, Rb(θ) is also optimal, which is a contradiction. Furthermore, suppose that b is not radial, i.e., for some value of θ, b(θ) contains a component perpendicular to the vector θ. Consider a hyperplane passing through the origin, whose normal is the aforementioned perpendicular component. By Lemma 3, the reflection of b through this hyperplane is also an optimal solution of (20), which is again a contradiction. Therefore, the optimum b is spherically symmetric and radial, so that it can be written as

b(θ) = b(‖θ‖) θ/‖θ‖   (115)

where b(·) is a scalar function.
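The rotation invariance established in Lemma 4 can be verified numerically via (112) and (114). The sketch below is ours; the radial profile t(s) is an arbitrary choice, and the check compares the trace term at a random point and at a randomly rotated copy of it, as well as against the closed form (114).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
t = lambda s: -1.0 / (1.0 + s)        # arbitrary radial profile t(||theta||^2)
tp = lambda s: 1.0 / (1.0 + s) ** 2   # its derivative t'(s)

def trace_term(theta):
    """Tr((I + db/dtheta)(I + db/dtheta)^T) for b(theta) = t(||theta||^2)*theta,
    using the Jacobian closed form (112)."""
    s = theta @ theta
    Jac = t(s) * np.eye(n) + 2 * tp(s) * np.outer(theta, theta)
    A = np.eye(n) + Jac
    return np.sum(A**2)   # squared Frobenius norm

theta = rng.normal(size=n)
R, _ = np.linalg.qr(rng.normal(size=(n, n)))  # random orthogonal matrix

print(trace_term(theta), trace_term(R @ theta))  # equal: depends on ||theta|| only
s = theta @ theta
print(n * (1 + t(s)) ** 2 + 4 * tp(s) ** 2 * s**2 + 4 * (1 + t(s)) * tp(s) * s)  # (114)
```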
∂q ′ θj k k · = q (ρ) (116) Then, CRB[b, θ] of (11) is rotation invariant in θ, i.e., ∂θj ρ CRB[b, Rθ] = CRB[b, θ] for any rotation matrix R. where we have denoted ρ = θ , so that ρ is weakly Proof: We will show that CRB[b, θ] depends on θ only differentiable and k k through θ 2, and is therefore rotation invariant. For the given ∂ρ θj k k = . (117) value of b(θ) and J(θ), we have ∂θj ρ Along the θ axis, we have θ = ρ while θ = = θ =0, b θ 1 1 2 n CRB[ , ] so that · · · T ∂q ′ 2 ∂b −1 ∂b = q (ρ)δ . (118) = b(θ) + Tr I + J (θ) I + ∂θ j1 k k ∂θ ∂θ j θ=ρe1 "    # T Similarly, since J(θ)= J(ρ)I, 1 ∂tθ ∂tθ 2 θ 2 I I = t + Tr + + ∂(J −1) J ′(ρ) θ k k J( θ ) " ∂θ ∂θ # jk j (119) k k     = 2 δjk (111) ∂θj −J (ρ) ρ so that along the θ axis where, for notational convenience, we have omitted the de- 1 −1 ′ pendence of t on θ 2. It remains to show that the trace in ∂(J )jk J (ρ) = δjkδj1. (120) the above expressionk k is a function of θ only through θ 2. To ∂θ −J 2(ρ) j θ=ρe1 k k

15

From (115), we have Substituting this into the definition of CRB[b, θ], we obtain

∂bi ′ θiθj b(ρ) θiθj CRB[b,ρe1] = b (ρ) + δij . (121) ∂θ ρ2 ρ − ρ2 2 j   1 n 1 b(ρ) = b2(ρ)+ (1 + b′(ρ))2 + − 1+ . (130) Thus, on the θ1 axis, we have J(ρ) J(ρ) ρ   ∂b1 Combining (130) with (127) yields (24), as required. = b′(ρ)δ . (122) ∂θ j1 j θ=ρe1 APPENDIX V The second derivative of bi(θ) can be shown to equal PROOFS OF ASYMPTOTIC PROPERTIES 2 ∂ bi ′′ θiθj θk Theorems 5 and 6 demonstrate asymptotic tightness of the = b (ρ) 3 ∂θj ∂θk ρ OBB. The proofs of these two theorems follow. b′(ρ) b(ρ) θ θ θ θ θ θ + i δ + j δ + k δ 3 i j k . Proof: [Proof of Theorem 5] We begin the proof by ρ − ρ2 ρ jk ρ ik ρ ij − ρ3 studying a certain optimization problem, whose relevance will     (123) be demonstrated shortly. Let t 0 be a constant and consider the problem ≥ Therefore, on the θ1 axis 2 2 ∂b ∂ b1 ′′ u(t) = inf I + pθ(dθ) 2 = b (ρ) b 1 ∂θ ∈H Θ ∂θ F 1 θ=ρe1 Z 2 ∂2b b′(ρ) b(ρ) s.t. b(θ) p θ(dθ) t. (131) 1 = (j = 1) k k ≤ 2 2 ZΘ ∂θj ρ − ρ 6 θ=ρe1 Notice that u(t) n for all t, since an objective having a value ∂2b ≤ 0 1 = 0 (j, k = 1). (124) of n is achieved by the function b(θ) = . Thus, it suffices 1 ∂θj∂θk θ e 6 to perform the minimization (131) over functions b =ρ 1 H satisfying ∈ Substituting these derivatives into (21), we obtain ∂b 2 ′ I + pθ(dθ) n. (132) q(ρ) ′′ b (ρ) b(ρ) ∂θ ≤ q(ρ)b(ρ)= b (ρ) + (n 1) (n 1) ZΘ F J(ρ) − ρ − − ρ2   It follows from Lemma 2 that such functions also satisfy q′(ρ) J ′(ρ) ′ (125) +(1+ b (ρ)) q(ρ) 2 2 J(ρ) − J (ρ) ∂b 2   pθ(dθ) (2√n) =4n. (133) ∂θ ≤ which is equivalent to (25). ZΘ F

To obtain the boundary conditions, observe that Lemma 3 Therefore, (131) is equivalent to the minimization implies b 0 0, whence we conclude that . Next, ( ) = b(0) = 0 2 evaluate the boundary condition (22) at boundary point θ = ∂b u(t) = inf I + pθ(dθ) (134) b re , where the surface normal ν(θ) equals e , so that ∈St ∂θ 1 1 ZΘ F

∂b where 1+ b′(ρ)=1+ 1 =0, θ = re (126) ∂θ 1 1 1 2 S = b H : b(θ) pθ(dθ) t, ′ t which is equivalent to the boundary condition b (r)= 1. ∈ Θ k k ≤ −  Z To find the OBB (24), we must now calculate Z[b] for ∂b 2 pθ(dθ) 4n . (135) the obtained bias function (115). To this end, note that, by ∂θ ≤ ZΘ F  Lemma 4, CRB[b, θ] is rotation invariant in θ for the required 1 b(θ). Thus, the integrand CRB[b, θ]q( θ ) is constant on any The set St is convex, closed, and bounded in H . Applying (n 1)-sphere centered on the origin,k sok that Lemma 1 (with ℓ = 2) implies that there exists a function − bopt S which minimizes (134), and hence also minimizes r ∈ t Z[b]= CRB[b,ρe1]q(ρ)Sn(ρ)dρ (127) (131). Z0 Note that the objective in (131) is zero if and only if where ∂bopt 2πn/2 = I a.e. (pθ). (136) S (ρ)= ρn−1 (128) ∂θ − n Γ(n/2) The only functions in H1 satisfying this requirement are the is the hypersurface area of an (n 1)-sphere of radius ρ [35]. functions It thus suffices to calculate the value− of CRB[b, θ] at points b(θ)= k θ a.e. (pθ) (137) along the θ1 axis. From (121), it follows that − for some constant k Rn. Let µ , E θ and define ∂b b(ρ) b(ρ) ∈ { } = diag b′(ρ), ,..., . (129) ∂θ ρ ρ v , E θ E θ 2 . (138) θ=ρe1   k − { }k

 16

(N) For functions of the form (137), the constraint of (131) is given and note that λN > 0 for all N, since J (θ) is positive by definite. Thus

2 2 θ θ (N) −1 k θ p (dθ)= k µ + µ θ p (dθ) (N) (N) ∂b (N) Θ k − k Θ k − − k Z [b ] Tr I + J (θ) Z Z ≥  ∂θ = k µ 2 + v ZΘ ! k − k   v. (139)  T ≥ ∂b(N) I + pθ(dθ) In (139), equality is obtained if and only if k = µ. Therefore, · ∂θ !  if t < v, no functions satisfying (136) are feasible, and thus 2  1 ∂b(N) u(t)=0 if t v, I + pθ(dθ) ≥ ≥ λN Θ ∂θ Z F u(t) > 0 if t 0. Since all eigenvalues of J (N)(θ) (N). Thus, for example, Z(N)[b] denotes the functional Z[b] decrease to zero, we have λ 0, and thus of (12) for the problem corresponding to the measurement N → vector x(N). u(q) (N) βN . (151) Since all eigenvalues of J (θ) decrease monotonically ≥ λN → ∞ with N for pθ-almost all θ, we have This contradicts the fact (145) that βN v. We conclude that (N) (N+1) ≤ CRB [b, θ] CRB [b, θ] (141) q = v, as required. ≤ Proof: [Proof of Theorem 6] The proof is analogous to for any b H1, for pθ-almost all θ, and for all N. Therefore ∈ that of Theorem 5. We begin by considering the optimization Z(N)[b] Z(N+1)[b]. (142) problem ≤ 1 2 for any b H and for all N. It follows that for all N inf b(θ) pθ(dθ) b 1 ∈ ∈H Θ k k (N) (N+1) Z T βN = min Z [b] min Z [b]= βN+1 (143) b 1 b 1 ∂b −1 ∂b ∈H ≤ ∈H s.t. Tr I + J (θ) I + pθ(dθ) t ∂θ ∂θ ≤ so that β is a non-decreasing sequence. Furthermore, note ZΘ     ! N (152) that (N) Z [µ θ]= v for all N (144) for some constant t 0. Denote the minimum value of (152) − by w(t). Let µ = E≥θ and note that b(θ)= µ θ satisfies where is given by (138). Therefore, for all . Thus v βN v N the constraint in (152){ } for any t 0, and has− an objective β converges to some value q, and we have≤ N equal to v of (138). Thus, to determine≥ w(t), it suffices to β q v for all N. (145) minimize (152) over the set N ≤ ≤ To prove the theorem, it remains to show that q = v. 1 2 S = b H : b(θ) pθ(dθ) v, Let b(N) be the minimizer of (17) when θ is estimated from t ∈ k k ≤  ZΘ x(N); this minimizer exists by virtue of Proposition 1. We then T ∂b −1 ∂b have Tr I + J (θ) I + pθ(dθ) t . Θ ∂θ ∂θ ≤ β = Z(N)[b(N)] q (146) Z     !  N ≤ Define and therefore λ , ess sup λmax(J(θ)). (153) (N) 2 θ∈Θ b (θ) pθ(dθ) q. (147) Θ k k ≤ Z Since J(θ) is positive definite almost everywhere, we have It follows that b(N) satisfies the constraint of the optimization λ> 0. For any b S , we have ∈ t problem (131) with t = q. As a consequence, we have 1 ∂b 2 2 (N) I + pθ(dθ) t (154) ∂b λ Θ ∂θ F ≤ I θ θ (148) Z + p (d ) u(q). Θ ∂θ ≥ Z F and therefore, by Lemma 2,

Proof of Theorem 6: The proof is analogous to that of Theorem 5. We begin by considering the optimization problem
$$\inf_{b \in H^1} \int_\Theta \|b(\theta)\|^2 p_\theta(d\theta) \quad \text{s.t.} \quad \int_\Theta \mathrm{Tr}\!\left( \left( I + \frac{\partial b}{\partial \theta} \right) J^{-1}(\theta) \left( I + \frac{\partial b}{\partial \theta} \right)^{\!T} \right) p_\theta(d\theta) \le t \tag{152}$$
for some constant $t \ge 0$. Denote the minimum value of (152) by $w(t)$. Let $\mu = E\{\theta\}$ and note that $b(\theta) = \mu - \theta$ satisfies the constraint in (152) for any $t \ge 0$, and has an objective equal to $v$ of (138). Thus, to determine $w(t)$, it suffices to minimize (152) over the set
$$S_t = \left\{ b \in H^1 : \int_\Theta \|b(\theta)\|^2 p_\theta(d\theta) \le v, \ \int_\Theta \mathrm{Tr}\!\left( \left( I + \frac{\partial b}{\partial \theta} \right) J^{-1}(\theta) \left( I + \frac{\partial b}{\partial \theta} \right)^{\!T} \right) p_\theta(d\theta) \le t \right\}.$$
Define
$$\lambda \triangleq \operatorname*{ess\,sup}_{\theta \in \Theta} \lambda_{\max}(J(\theta)). \tag{153}$$
Since $J(\theta)$ is positive definite almost everywhere, we have $\lambda > 0$. For any $b \in S_t$, we have
$$\frac{1}{\lambda} \int_\Theta \left\| I + \frac{\partial b}{\partial \theta} \right\|_F^2 p_\theta(d\theta) \le t \tag{154}$$
and therefore, by Lemma 2,
$$\int_\Theta \left\| \frac{\partial b}{\partial \theta} \right\|_F^2 p_\theta(d\theta) \le \left( \sqrt{t\lambda} + \sqrt{n} \right)^2. \tag{155}$$


Hence, for any $b \in S_t$,
$$\|b\|_{H^1}^2 = \int_\Theta \|b(\theta)\|^2 p_\theta(d\theta) + \int_\Theta \left\| \frac{\partial b}{\partial \theta} \right\|_F^2 p_\theta(d\theta) \le v + \left( \sqrt{t\lambda} + \sqrt{n} \right)^2. \tag{156}$$
Thus $S_t$ is bounded for all $t$. It is straightforward to show that $S_t$ is also closed and convex. Therefore, employing Lemma 1 (with $\ell = 1$) ensures that there exists a (unique) $b_{\mathrm{opt}} \in S_t$ minimizing (152).

Note that the objective in (152) is $0$ if and only if $b_{\mathrm{opt}}(\theta) = 0$ almost everywhere. So, if $0 \in S_t$, we have $w(t) = 0$, and otherwise $w(t) > 0$. Let us define
$$s \triangleq E\left\{ \mathrm{Tr}\big(J^{-1}(\theta)\big) \right\} \tag{157}$$
and note that $0 \in S_t$ if and only if $t \ge s$. Thus
$$w(t) = 0 \ \text{ for } t \ge s, \qquad w(t) > 0 \ \text{ otherwise.} \tag{158}$$

Let us now return to the setting of Theorem 6. For simplicity, we denote functions corresponding to the problem of estimating $\theta$ from $\{x^{(1)}, \ldots, x^{(N)}\}$ with a superscript $(N)$. For example, from the additive property of the Fisher information [2, §3.4], we have
$$J^{(N)}(\theta) = N J(\theta). \tag{159}$$
It follows that
$$(N+1)\, \mathrm{CRB}^{(N+1)}[b, \theta] \ge N\, \mathrm{CRB}^{(N)}[b, \theta] \tag{160}$$
for all $b \in H^1$, all $\theta \in \Theta$, and all $N$. Therefore
$$(N+1)\, Z^{(N+1)}[b] \ge N\, Z^{(N)}[b] \tag{161}$$
for all $b \in H^1$, and hence
$$(N+1)\beta_{N+1} = \min_{b \in H^1} (N+1) Z^{(N+1)}[b] \ge \min_{b \in H^1} N Z^{(N)}[b] = N \beta_N. \tag{162}$$
Thus $\{N\beta_N\}$ is a non-decreasing sequence. Furthermore, we have
$$N Z^{(N)}[0] = s \tag{163}$$
so that $N\beta_N \le s$ for all $N$. It follows that $\{N\beta_N\}$ is non-decreasing and bounded, and therefore converges to some value $r$ such that
$$N\beta_N \le r \le s \quad \text{for all } N. \tag{164}$$
To prove the theorem, we must show that $r = s$.

Let $b^{(N)} \in H^1$ denote the minimizer of (17) when $\theta$ is estimated from $\{x^{(1)}, \ldots, x^{(N)}\}$ (the existence of $b^{(N)}$ is guaranteed by Proposition 1). We then have $N\beta_N = N Z^{(N)}[b^{(N)}] \le r$, so that
$$\int_\Theta \mathrm{Tr}\!\left( \left( I + \frac{\partial b^{(N)}}{\partial \theta} \right) J^{-1}(\theta) \left( I + \frac{\partial b^{(N)}}{\partial \theta} \right)^{\!T} \right) p_\theta(d\theta) \le r. \tag{165}$$
Thus, $b^{(N)}$ satisfies the constraint of (152) with $t = r$. As a consequence, we have
$$\int_\Theta \|b^{(N)}(\theta)\|^2 p_\theta(d\theta) \ge w(r) \tag{166}$$
and therefore
$$N\beta_N = N Z^{(N)}[b^{(N)}] \ge N \int_\Theta \|b^{(N)}(\theta)\|^2 p_\theta(d\theta) \ge N w(r). \tag{167}$$
Now suppose by contradiction that $r < s$. It then follows from (158) that $w(r) > 0$. Hence, by (167), $N\beta_N \to \infty$, which contradicts the fact that $N\beta_N$ is bounded. We conclude that $r = s$, as required. ∎
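In words, Theorem 6 states that Nβ_N approaches s = E{Tr(J⁻¹(θ))}, which (under suitable regularity) is also the well-known first-order asymptotic behavior of the Bayesian MSE itself, so the OBB is asymptotically tight in this regime. Here is a minimal sketch in the scalar Gaussian case, an illustrative assumption under which J(θ) = 1/σ² is constant in θ and the MMSE for N i.i.d. measurements is the posterior variance in closed form:

```python
# Scalar Gaussian prior theta ~ N(0, sigma_theta2); N i.i.d. measurements
# x_i = theta + w_i with w_i ~ N(0, sigma2), so J(theta) = 1/sigma2 and
# s = E{Tr(J^{-1}(theta))} = sigma2.
sigma_theta2, sigma2 = 2.0, 1.0  # illustrative prior and noise variances
s = sigma2

for N in [1, 10, 100, 1000, 10000]:
    # Posterior variance = MMSE for N i.i.d. Gaussian measurements;
    # beta_N lower-bounds this quantity.
    mmse_N = 1.0 / (1.0 / sigma_theta2 + N / sigma2)
    print(f"N = {N:5d}   N * MMSE_N = {N * mmse_N:.5f}   s = {s}")
```

Since Nβ_N ≤ N·MMSE_N ≤ s in this example, and both Nβ_N (by Theorem 6) and N·MMSE_N (numerically above) converge to s, the bound and the attainable MSE agree to first order in 1/N.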

REFERENCES

[1] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd ed. New York, NY: Springer-Verlag, 1985.
[2] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ: Prentice Hall, 1993.
[3] H. L. Van Trees, Detection, Estimation, and Modulation Theory. New York: Wiley, 1968, vol. 1.
[4] J. Ziv and M. Zakai, "Some lower bounds on signal parameter estimation," IEEE Trans. Inf. Theory, vol. 15, no. 3, pp. 386–391, May 1969.
[5] T. Y. Young and R. A. Westerberg, "Error bounds for stochastic estimation of signal parameters," IEEE Trans. Inf. Theory, vol. 17, no. 5, pp. 549–557, Sep. 1971.
[6] S. Bellini and G. Tartara, "Bounds on error in signal parameter estimation," IEEE Trans. Commun., vol. 22, no. 3, pp. 340–342, 1974.
[7] D. Chazan, M. Zakai, and J. Ziv, "Improved lower bounds on signal parameter estimation," IEEE Trans. Inf. Theory, vol. 21, no. 1, pp. 90–93, 1975.
[8] B. Z. Bobrovski and M. Zakai, "A lower bound on the estimation error for certain diffusion problems," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 45–52, Jan. 1976.
[9] A. J. Weiss and E. Weinstein, "A lower bound on the mean-square error in random parameter estimation," IEEE Trans. Inf. Theory, vol. 31, no. 5, pp. 680–682, Sep. 1985.
[10] E. Weinstein and A. J. Weiss, "A general class of lower bounds in parameter estimation," IEEE Trans. Inf. Theory, vol. 34, no. 2, pp. 338–342, Mar. 1988.
[11] K. L. Bell, Y. Steinberg, Y. Ephraim, and H. L. Van Trees, "Extended Ziv–Zakai lower bound for vector parameter estimation," IEEE Trans. Inf. Theory, vol. 43, no. 2, pp. 624–637, 1997.
[12] A. Renaux, P. Forster, P. Larzabal, and C. Richmond, "The Bayesian Abel bound on the mean square error," in Proc. Int. Conf. Acoust., Speech and Signal Processing (ICASSP 2006), vol. III, Toulouse, France, May 2006, pp. 9–12.
[13] E. L. Lehmann and G. Casella, Theory of Point Estimation, 2nd ed. New York: Springer, 1998.
[14] Y. C. Eldar, "Rethinking biased estimation: Improving maximum likelihood and the Cramér–Rao bound," Foundations and Trends in Signal Processing, vol. 1, no. 4, pp. 305–449, 2008.
[15] S. M. Kay and Y. C. Eldar, "Rethinking biased estimation," IEEE Signal Process. Mag., vol. 25, no. 3, pp. 133–136, May 2008.
[16] H. Cramér, "A contribution to the theory of statistical estimation," Skand. Akt. Tidskr., vol. 29, pp. 85–94, 1945.
[17] C. R. Rao, "Information and accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, pp. 81–91, 1945.
[18] J. M. Hammersley, "On estimating restricted parameters," J. Roy. Statist. Soc. B, vol. 12, no. 2, pp. 192–240, 1950.
[19] D. G. Chapman and H. Robbins, "Minimum variance estimation without regularity assumptions," Ann. Math. Statist., vol. 22, no. 4, pp. 581–586, Dec. 1951.
[20] P. K. Bhattacharya, "Estimating the mean of a multivariate normal population with general quadratic loss function," Ann. Math. Statist., vol. 37, no. 6, pp. 1819–1824, Dec. 1966.
[21] E. W. Barankin, "Locally best unbiased estimates," Ann. Math. Statist., vol. 20, no. 4, pp. 477–501, Dec. 1949.
[22] J. S. Abel, "A bound on mean-square-estimate error," IEEE Trans. Inf. Theory, vol. 39, no. 5, pp. 1675–1680, 1993.
[23] A. O. Hero, J. A. Fessler, and M. Usman, "Exploring estimator bias-variance tradeoffs using the uniform CR bound," IEEE Trans. Signal Process., vol. 44, no. 8, pp. 2026–2041, 1996.
[24] P. Forster and P. Larzabal, "On lower bounds for deterministic parameter estimation," in Proc. Int. Conf. Acoust., Speech and Signal Processing (ICASSP 2002), vol. 2, Orlando, FL, May 2002, pp. 1137–1140.
[25] Y. C. Eldar, "Minimum variance in biased estimation: Bounds and asymptotically optimal estimators," IEEE Trans. Signal Process., vol. 52, no. 7, pp. 1915–1930, 2004.
[26] ——, "Uniformly improving the Cramér–Rao bound and maximum-likelihood estimation," IEEE Trans. Signal Process., vol. 54, no. 8, pp. 2943–2956, 2006.
[27] ——, "MSE bounds with affine bias dominating the Cramér–Rao bound," IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3824–3836, Aug. 2008.
[28] I. A. Ibragimov and R. Z. Has'minskii, Statistical Estimation: Asymptotic Theory. New York: Springer, 1981.
[29] A. Renaux, "Contribution à l'analyse des performances d'estimation en traitement statistique du signal," Ph.D. dissertation, École Normale Supérieure de Cachan, 2006. [Online]. Available: http://tel.archives-ouvertes.fr/tel-00129527/
[30] Z. Ben-Haim and Y. C. Eldar, "A Bayesian estimation bound based on the optimal bias function," in Proc. 2nd Int. Workshop on Computational Adv. in Multi-Sensor Adapt. Process. (CAMSAP 2007), St. Thomas, U.S. Virgin Islands, Dec. 2007.
[31] J. Shao, Mathematical Statistics, 2nd ed. New York: Springer, 2003.
[32] E. H. Lieb and M. Loss, Analysis, 2nd ed. American Mathematical Society, 2001.
[33] D. R. Cox and N. Reid, "Parameter orthogonality and approximate conditional inference," J. Roy. Statist. Soc. B, vol. 49, no. 1, pp. 1–39, 1987.
[34] H. L. Van Trees and K. L. Bell, Bayesian Bounds for Parameter Estimation and Nonlinear Filtering/Tracking. New York: Wiley, 2007.
[35] I. M. Vinogradov, Ed., Encyclopaedia of Mathematics. Dordrecht, The Netherlands: Kluwer, 1995.
[36] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover, 1964.
[37] Z. Ben-Haim and Y. C. Eldar, "A comment on the use of the Weiss–Weinstein bound with constrained parameter sets," IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4682–4684, Oct. 2008.
[38] L. P. Lebedev and M. J. Cloud, The Calculus of Variations and Functional Analysis. New Jersey: World Scientific, 2003.
[39] W. Rudin, Functional Analysis. New York: McGraw-Hill, 1973.
[40] I. M. Gelfand and S. V. Fomin, Calculus of Variations. Mineola, NY: Dover, 2000.