Nicolai Bissantz†, Thorsten Hohage‡, and Axel Munk†§
† Institut für Mathematische Stochastik, Maschmühlenweg 8-10, 37073 Göttingen, Germany, [email protected], [email protected]
‡ Institut für Numerische und Angewandte Mathematik, Lotzestr. 16-18, 37083 Göttingen, Germany, [email protected]

Consistency and Rates of Convergence of Nonlinear Tikhonov Regularization with Random Noise

Abstract. We consider nonlinear inverse problems described by operator equations $F(a) = u$. Here $a$ is an element of a Hilbert space $H$ which we want to estimate, and $u$ is an $L^2$-function. The given data consist of measurements of $u$ at $n$ points, perturbed by random noise. We construct an estimator $\hat a_n$ for $a$ by a combination of a local polynomial estimator and a nonlinear Tikhonov regularization and establish consistency in the sense that the mean integrated square error (MISE) $\mathbf{E}\,\|\hat a_n - a\|_H^2$ tends to 0 as $n \to \infty$ under reasonable assumptions. Moreover, if $a$ satisfies a source condition, we prove convergence rates for the MISE of $\hat a_n$, as well as almost surely. Further, it is shown that a cross-validated parameter selection yields a fully data-driven consistent method for the reconstruction of $a$. Finally, the feasibility of our algorithm is investigated in a numerical study for a groundwater filtration problem and an inverse obstacle scattering problem, respectively.

Keywords and phrases: statistical inverse problems, nonlinear Tikhonov regularization, consistency, convergence rates, local polynomial estimators, cross-validation, variance estimation

AMS classification scheme numbers: 62G05, 62J02, 35R30

§ To whom correspondence should be addressed

1. Introduction

Suppose we want to estimate a quantity described by an element $a$ of a separable Hilbert space $H$ over the real numbers. Suppose further that $a$ is not directly observable, but that only measurements of an $L^2$-function $u \in L^2(\Omega, \mu)$ defined on a metric space $\Omega$ with a Borel measure $\mu$ are available. Here $u$ is related to $a$ by a possibly nonlinear operator $F : D(F) \to L^2(\Omega, \mu)$ defined on a subset $D(F) \subset H$,

$F(a) = u.$   (1)

Examples will be given in the last two sections of this paper. We assume that the true solution $a^\dagger \in D(F)$ is uniquely determined by the function $u^\dagger := F(a^\dagger)$, i.e. for all $a \in D(F)$,

$F(a) = F(a^\dagger) \;\Rightarrow\; a = a^\dagger.$   (2)

In all practical situations only a finite number of data is available. We assume that at our disposal are $n$ noisy measurements $Y_i$ of the function $u^\dagger$ at points $X_i \in \Omega$ described by an inverse regression model

$Y_i = u^\dagger(X_i) + v(X_i)^{1/2}\,\varepsilon_i, \qquad i = 1, \ldots, n.$   (3)

Here $(X_i, \varepsilon_i)$ are independent and identically distributed random variables such that $\mathbf{E}[\varepsilon_i \mid X_i] = 0$ and $\mathbf{E}[\varepsilon_i^2 \mid X_i] = 1$, and the function $v \in L^2(\Omega, \mu)$ describes the conditional variance of the measurement errors. More precisely, we have $\mathbf{E}[Y_i \mid X_i] = u^\dagger(X_i)$ and $\operatorname{Var}[Y_i \mid X_i] = v(X_i)$. If $v$ is constant, the model is called homoscedastic, otherwise heteroscedastic. We would like to stress that for the estimators discussed in this paper neither $v$ nor the distribution of the $\varepsilon_i$ has to be known. In other words, no a priori information on the noise level is required; the suggested estimators are fully data driven and will be shown to be consistent under very general assumptions on $u$, $v$ and the distribution of $(X_i, \varepsilon_i)$. In statistical terminology, model (3) is called a random design model, because the points $X_i$ are random variables with values in $\Omega$ distributed according to a common (but unknown) design density $f$. In order to keep the proofs more readable we confine ourselves to this setting. However, we mention that similar results hold under the assumption of a deterministic design, i.e. when the $X_i$ are determined by the experimenter in advance. In this case further regularity conditions on the distribution of the measurement points $X_i$ have to be imposed (see [25]).

To construct an estimator $\hat a_n$ of $a^\dagger$ we proceed in two steps: First we estimate $u^\dagger$ by a proper smoothing estimator $\hat u_n$. In the second step, an estimator for $a^\dagger$ is constructed using the estimator $\hat u_n$ of the first step. To this end we use nonlinear Tikhonov regularization, i.e. the estimator $\hat a_{n,\alpha_n}$ for some regularization parameter $\alpha_n > 0$ and some a priori guess $a_0 \in H$ is given by a global minimum of the functional

$a \mapsto \|F(a) - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n \|a - a_0\|_H^2, \qquad a \in D(F).$   (4)

The main result of this paper states that this procedure is consistent under certain assumptions, i.e. $\mathbf{E}\,\|\hat a_n - a^\dagger\|_H^2 \to 0$ as $n \to \infty$, as well as almost surely. Moreover, we establish the rate of convergence of $\mathbf{E}\,\|\hat a_n - a^\dagger\|_H^2$ if $a^\dagger - a_0$ belongs to the range of $F'[a^\dagger]^*$ and is sufficiently small.

An attractive feature of the model (3) is the possibility to reliably estimate the variance $v$ of the noise. (Recall that in a deterministic framework the data noise level has to be known a priori to construct parameter choice rules leading to convergent regularization methods, see [2].) This estimate is useful in the second step, the nonlinear Tikhonov regularization (4), for the choice of the regularization parameter $\alpha_n$. To this end, cross validation of the mean integrated square error $\mathbf{E}\,\|\hat u_n - u^\dagger\|_{L^2(\Omega)}^2$ will be used in the first step, and consistency of the entire algorithm will be shown. Hence, our method is completely data-driven, i.e. the choice of the smoothing parameters in both steps depends only on the given data $(X_i, Y_i)$.

Whereas there exists a huge literature on linear inverse problems with random noise (see [8] for a recent review), only a few results have been published on nonlinear problems of this kind (see e.g. [33, 35, 38]). To the best of our knowledge, rigorous consistency and convergence rate results for nonlinear inverse problems with random noise are so far only available in a benchmark paper by O'Sullivan [34]. The approach discussed there, often called the method of regularization (MOR), consists in replacing the term $\|F(a) - \hat u_n\|_{L^2(\Omega)}^2$ in (4) by $\frac{1}{n}\sum_{j=1}^{n} |(F(a))(X_j) - Y_j|^2$ (see also [35]). Apparently, this has the advantage that only one regularization parameter has to be chosen. However, if we aim at a completely data-driven parameter selection method, e.g. by cross validation, then the MOR requires a large number of numerically expensive operator inversions. Already for linear inverse problems, an analysis of cross-validation (see Lucas [22]) requires a number of technical assumptions on the operator. With our approach a difficult parameter selection problem occurs only in the first step of estimating $u^\dagger$. Since this step also provides an estimate of $\mathbf{E}\,\|\hat u_n - u^\dagger\|_{L^2(\Omega)}^2$, we can apply well-known parameter selection methods from deterministic theory (see [7]) to select $\alpha_n$ in (4). Another advantage of our approach is a simplification and improvement of the convergence analysis. O'Sullivan [34] investigated the convergence of the MOR only for a special class of problems for which the linearized operator equations are well-posed if the space $L^2(\Omega)$ is replaced by a Sobolev space $W^s(\Omega)$. This excludes important inverse problems such as the inverse scattering problem considered in subsection 4.3 of this paper. Moreover, we demonstrate for a particular example in section 5 that our method is capable of achieving minimax rates for a particular smoothness class, whereas O'Sullivan's convergence rate estimates are always slightly worse than the corresponding rates for linear problems obtained by Nychka & Cox [28], i.e. they are suboptimal.

As opposed to the frequentist point of view followed in this paper, Bayesian methods have been successfully applied to the solution of nonlinear inverse problems, with applications in impedance tomography and other areas (see [19] and references therein). Whereas Bayesian methods involve a large number of regularized inversions of the operator $F$, the frequentist method considered here requires only one operator inversion. Hence, this method is computationally more efficient. Moreover, our method requires no a priori information on the distribution of the data noise. On the other hand, a posterior distribution of $a$ in (1) computed by a Bayesian approach contains much more information than a single estimator provided by a frequentist approach.

The plan of this paper is as follows: Sections 2 and 3 contain the main results of this paper on consistency and convergence rates of the method under investigation, given certain assumptions on the estimators $\hat u_n$ and on the operator $F$. In section 4 we demonstrate for two important inverse problems, the recovery of a diffusion coefficient in groundwater filtration from distributed measurements and an inverse obstacle scattering problem, that all assumptions of our consistency theorem can be verified. To estimate $u^\dagger$ in the first step, we use local polynomial estimators. Monte Carlo experiments show the validity of our results. Finally, in section 5, we show the potential minimax rate optimality of our method for a particular linear example.
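To fix ideas, data from the regression model (3) can be simulated in a few lines. The following sketch is purely illustrative: the domain $\Omega = [0,1]$, the uniform design density, the regression function, the variance function and the Gaussian errors are assumptions made for this example, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(u_dagger, v, n):
    """Draw (X_i, Y_i), i = 1, ..., n, from model (3):
    Y_i = u_dagger(X_i) + v(X_i)^{1/2} eps_i, E[eps|X] = 0, E[eps^2|X] = 1."""
    X = rng.uniform(0.0, 1.0, n)           # random design with uniform density f
    eps = rng.standard_normal(n)           # standardized (here: Gaussian) errors
    Y = u_dagger(X) + np.sqrt(v(X)) * eps  # heteroscedastic noise via v
    return X, Y

# illustrative choices of u_dagger and v:
X, Y = simulate(lambda x: np.sin(2 * np.pi * x), lambda x: 0.01 * (1.0 + x), n=200)
```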

2. Consistency

For the reconstruction of $a$ from observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ it will be crucial to construct a sequence of estimators $\hat u_n$, $n \in \mathbb{N}$, for $u^\dagger$ such that

$\frac{1}{\gamma_n} \|\hat u_n - u^\dagger\|_{L^2(\Omega)} \to 0 \quad \text{almost surely}$   (5)

or

$\mathbf{E}\,\|\hat u_n - u^\dagger\|_{L^2(\Omega)}^2 = O(\beta_n^2)$   (6)

for some sequences $\gamma_n \to 0$ and $\beta_n \to 0$. In order to construct estimators satisfying these conditions, a priori smoothness information on $u^\dagger$ is required. One possibility to construct estimators $\hat u_n$ satisfying (5) and (6) is discussed in subsection 4.1. In general, in nonparametric regression this is a severe difficulty; however, for most inverse problems in partial differential equations $u^\dagger$ is part of the solution of a partial differential equation, and hence a priori smoothness information is available. Two examples will be discussed in subsections 4.2 and 4.3.

In what follows, we assume that $F$ is weakly sequentially closed, i.e. if a sequence $(a_n)_{n\in\mathbb{N}} \subset D(F)$ converges weakly to some $a \in H$, $a_n \rightharpoonup a$, and if $F(a_n) \rightharpoonup u$ for some $u \in L^2(\Omega, \mu)$, then $a \in D(F)$ and $F(a) = u$. A sufficient condition for weak sequential closedness is that $D(F)$ is weakly closed (e.g. closed and convex) and that $F$ is weakly continuous. As an immediate consequence of the deterministic convergence theorem for nonlinear Tikhonov regularization (see [7, Theorem 10.3]) we obtain the following result.

Proposition 1. Assume that $F$ is weakly sequentially closed and that (2), (3) and (5) hold true. Let $\hat a_n$ denote a solution to the minimization problem

$\|F(a) - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n \|a - a_0\|_H^2 = \min! \qquad a \in D(F)$   (7)

and assume that the regularization parameters $\alpha_n > 0$ are chosen such that

$\alpha_n \to 0, \qquad \frac{\gamma_n^2}{\alpha_n} \to 0 \qquad \text{as } n \to \infty.$   (8)

Then $\|\hat a_n - a^\dagger\|_H \to 0$ almost surely.

We mention that the weak closedness of $F$ guarantees existence, but not uniqueness, of a solution to the minimization problem (7). For many initial estimators of $u^\dagger$, in particular for the local polynomial estimators considered below, condition (5) with an explicit rate $\gamma_n$ may be hard to verify. Moreover, the parameter choice rule (8) may not be advantageous, especially if the estimate of the rate $\gamma_n$ is too pessimistic. We will establish the following result based on condition (6).

Theorem 2. Assume that $F$ is weakly sequentially closed and that (2), (3) and (6) hold true. Let $\hat a_n$ denote a (not necessarily unique) solution to the minimization problem (7) and assume that the regularization parameters $\alpha_n > 0$ are chosen such that

$\alpha_n \to 0, \qquad \frac{\beta_n^2}{\alpha_n} \to 0 \qquad \text{as } n \to \infty.$   (9)

Then

$\mathbf{E}\,\|\hat a_n - a^\dagger\|_H^2 \to 0 \qquad \text{as } n \to \infty.$   (10)

Proof. By the definition of $\hat a_n$ as a solution to the minimization problem (7), the inequality

$\|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n \|\hat a_n - a_0\|_H^2 \le \|u^\dagger - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n \|a^\dagger - a_0\|_H^2$   (11)

holds true. It follows from assumption (6) that

$\mathbf{E}\,\|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n\,\mathbf{E}\,\|\hat a_n - a_0\|_H^2 \le C\beta_n^2 + \alpha_n \|a^\dagger - a_0\|_H^2.$

With the parameter choice rule (9) and assumption (6) we obtain

$\limsup_{n\to\infty} \mathbf{E}\,\|\hat a_n - a_0\|_H^2 \le \limsup_{n\to\infty} \Big( C\,\frac{\beta_n^2}{\alpha_n} + \|a^\dagger - a_0\|_H^2 \Big) = \|a^\dagger - a_0\|_H^2,$   (12)

$\mathbf{E}\,\|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)}^2 \to 0, \qquad n \to \infty.$   (13)

Our next aim is to show that there exist a limiting vector $a_\infty \in H$ and a subsequence $(\hat a_{n(k)})$ of $(\hat a_n)$ such that for all $\varphi \in H$

$\mathbf{E}\,\langle \hat a_{n(k)}, \varphi \rangle_H \to \langle a_\infty, \varphi \rangle_H, \qquad k \to \infty.$   (14)

Note that $A := (\sup_{n\in\mathbb{N}} \mathbf{E}\,\|\hat a_n\|_H^2)^{1/2}$ is finite due to (12). Let $\{\varphi_l : l \in \mathbb{N}\}$ be a complete orthonormal system in the separable Hilbert space $H$. By the Cauchy-Schwarz inequality, we have $|\mathbf{E}\,\langle \hat a_n, \varphi_1 \rangle_H|^2 \le \mathbf{E}\,\|\hat a_n\|_H^2\,\|\varphi_1\|_H^2 \le A^2$. Hence, there exist a subsequence $n_1(k)$ and $\xi_1 \in \mathbb{R}$ such that $\mathbf{E}\,\langle \hat a_{n_1(k)}, \varphi_1 \rangle_H \to \xi_1$ as $k \to \infty$. Repeating the same argument, we obtain a subsequence $n_2(k)$ of $n_1(k)$ and a number $\xi_2 \in \mathbb{R}$ such that $\mathbf{E}\,\langle \hat a_{n_2(k)}, \varphi_2 \rangle_H \to \xi_2$ as $k \to \infty$, and so on. The diagonal sequence $n(k) := n_k(k)$ has the property that

$\mathbf{E}\,\langle \hat a_{n(k)}, \varphi_l \rangle_H \to \xi_l, \qquad k \to \infty,$

for all $l \in \mathbb{N}$, where $\xi_l \in \mathbb{R}$. For all $L \in \mathbb{N}$ we have

$\sum_{l=1}^{L} \xi_l^2 \le \lim_{k\to\infty} \sum_{l=1}^{L} |\mathbf{E}\,\langle \hat a_{n(k)}, \varphi_l \rangle_H|^2 \le \limsup_{k\to\infty} \mathbf{E}\,\|\hat a_{n(k)}\|_H^2 \le A^2.$   (15)

Hence, $a_\infty := \sum_{l=1}^{\infty} \xi_l \varphi_l$ is well defined. Let $\epsilon > 0$ and choose $L \in \mathbb{N}$ such that $\sum_{l=L+1}^{\infty} \langle \varphi, \varphi_l \rangle_H^2 \le (\epsilon/(4A))^2$. There exists $K > 0$ such that

$\Big| \sum_{l=1}^{L} \langle \varphi, \varphi_l \rangle_H\, \mathbf{E}\,\langle a_\infty - \hat a_{n(k)}, \varphi_l \rangle_H \Big| \le \frac{\epsilon}{2}$

for $k \ge K$. Now it follows from the Cauchy-Schwarz inequality that $|\mathbf{E}\,\langle a_\infty - \hat a_{n(k)}, \varphi \rangle_H| \le \epsilon$ for $k \ge K$. This completes the proof of (14).

The next step of the proof is to show that $a_\infty = a^\dagger$. By (6) and (13), it follows that $\mathbf{E}\,\|F(\hat a_{n(k)}) - u^\dagger\|_{L^2(\Omega)}^2 \to 0$ as $k \to \infty$. Hence, $\|F(\hat a_{n(k)}) - u^\dagger\|_{L^2(\Omega)} \to 0$ in probability, which yields a subsequence of $n(k)$, again denoted by $n(k)$, such that $\|F(\hat a_{n(k)}) - u^\dagger\|_{L^2(\Omega)} \to 0$ almost surely as $k \to \infty$. Using a similar diagonal sequence argument as above, we obtain a subsequence $m(k)$ of $n(k)$ such that

$\langle \hat a_{m(k)}, \varphi_l \rangle_H \to \xi_l, \qquad k \to \infty \quad \text{a.s.}$

for all $l \in \mathbb{N}$. We claim that almost surely there exists a bounded subsequence $(\hat a_{\tilde m(k)})$ of $(\hat a_{m(k)})$. Assume on the contrary that the event $D$ that $\liminf_{k\to\infty} \|\hat a_{m(k)}\|_H = \infty$ has probability $P(D) > 0$. Then Fatou's lemma together with (12) gives the contradiction

$\infty = \mathbf{E}\,\liminf_{k\to\infty} \|\hat a_{m(k)}\|_H^2 \le \liminf_{k\to\infty} \mathbf{E}\,\|\hat a_{m(k)}\|_H^2 < \infty.$

A similar argument as above, with the bound $A$ in (15) replaced by $B := \sup_{k\in\mathbb{N}} \|\hat a_{\tilde m(k)}\|_H$, shows that $\hat a_{\tilde m(k)} \rightharpoonup a_\infty$ as $k \to \infty$. Now the weak closedness of $F$ and the uniqueness assumption (2) imply that $a_\infty = a^\dagger$.

To finally prove the assertion (10), we assume on the contrary that there exist an $\epsilon > 0$ and a subsequence $(\hat a_{\tilde n(k)})$ such that

$\mathbf{E}\,\|\hat a_{\tilde n(k)} - a^\dagger\|_H^2 \ge \epsilon$   (16)

for all $k \in \mathbb{N}$. Using the results above, we may assume, by possibly passing to another subsequence, that

$\mathbf{E}\,\langle \hat a_{\tilde n(k)} - a^\dagger, \varphi \rangle_H \to 0$   (17)

for all $\varphi \in H$ as $k \to \infty$. Using the identity

$\|\hat a_{\tilde n(k)} - a^\dagger\|_H^2 = \|\hat a_{\tilde n(k)} - a_0\|_H^2 + \|a_0 - a^\dagger\|_H^2 + 2\,\langle \hat a_{\tilde n(k)} - a_0,\, a_0 - a^\dagger \rangle_H,$

the inequality (12), and (17) we obtain

$\limsup_{k\to\infty} \mathbf{E}\,\|\hat a_{\tilde n(k)} - a^\dagger\|_H^2 \le 2\|a_0 - a^\dagger\|_H^2 + 2 \limsup_{k\to\infty} \mathbf{E}\,\langle \hat a_{\tilde n(k)} - a_0,\, a_0 - a^\dagger \rangle_H = 0,$

which contradicts (16). This completes the proof of (10).
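As a numerical illustration of the minimization problem (7), the sketch below discretizes $a$ on a grid and feeds the Tikhonov functional, rewritten in least-squares form, to a generic solver. The exponential forward map, the grid, the noise level and the parameter value are all illustrative assumptions; none of the operators of section 4 is used here, and the pilot estimate $\hat u_n$ is simply faked by perturbing $F(a^\dagger)$.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0.0, 1.0, 200)   # quadrature grid on Omega = [0, 1]
dt = t[1] - t[0]

def F(a_vals):
    """Toy nonlinear forward operator (illustrative): F(a)(t) = exp(a(t))."""
    return np.exp(a_vals)

def tikhonov(u_hat, alpha, a0):
    """Minimize ||F(a) - u_hat||_{L^2}^2 + alpha * ||a - a0||^2 over grid values
    of a, stacked as a least-squares problem (cf. (7))."""
    def residuals(a_vals):
        misfit = np.sqrt(dt) * (F(a_vals) - u_hat)     # discretized data misfit
        penalty = np.sqrt(alpha * dt) * (a_vals - a0)  # Tikhonov penalty
        return np.concatenate([misfit, penalty])
    return least_squares(residuals, x0=a0).x

a_true = np.sin(np.pi * t)
u_hat = F(a_true) + 0.01 * np.random.default_rng(1).standard_normal(t.size)
a_hat = tikhonov(u_hat, alpha=1e-3, a0=np.zeros_like(t))
```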

3. A convergence rate result

The following theorem is, in our statistical setting (3), an analogue of a well-known result by Engl, Kunisch & Neubauer [6].

Theorem 3. Assume that $F$ is Fréchet differentiable, that $D(F)$ is convex, and that there exists a Lipschitz constant $L > 0$ such that

$\|F'[a_1] - F'[a_2]\|_{\mathcal{L}(H, L^2(\Omega,\mu))} \le L \|a_1 - a_2\|_H$   (18)

for all $a_1, a_2 \in D(F)$. Moreover, assume that the source condition

$a^\dagger - a_0 = F'[a^\dagger]^* w, \qquad L \|w\|_{L^2(\Omega)} < 1$   (19)

is satisfied for some $w \in L^2(\Omega, \mu)$. Let $\hat a_n$ be a solution to (7).

(i) If condition (5) is satisfied and

$\alpha_n \sim \gamma_n,$   (20)

i.e. $c^{-1}\alpha_n \le \gamma_n \le c\alpha_n$ for all $n \in \mathbb{N}$ and some constant $c > 0$, then, as $n \to \infty$,

$\frac{1}{\sqrt{\gamma_n}} \|\hat a_n - a^\dagger\|_H \to 0 \quad \text{a.s.}, \qquad \text{and} \qquad \frac{1}{\gamma_n} \|F(\hat a_n) - u^\dagger\|_{L^2(\Omega)} \to 0 \quad \text{a.s.}$   (21)

(ii) If the estimate (6) holds true and

$\alpha_n \sim \beta_n,$   (22)

then

$\mathbf{E}\,\|\hat a_n - a^\dagger\|_H^2 = O(\beta_n), \qquad \text{and} \qquad \mathbf{E}\,\|F(\hat a_n) - u^\dagger\|_{L^2(\Omega)}^2 = O(\beta_n^2).$   (23)

Proof. As in the proof of Theorem 2 we use inequality (11). Adding $\alpha_n \|\hat a_n - a^\dagger\|_H^2 - \alpha_n \|\hat a_n - a_0\|_H^2$ on both sides of this inequality yields

$\|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n \|\hat a_n - a^\dagger\|_H^2 \le \|u^\dagger - \hat u_n\|_{L^2(\Omega)}^2 + 2\alpha_n \langle a^\dagger - a_0,\, a^\dagger - \hat a_n \rangle_H$
$= \|u^\dagger - \hat u_n\|_{L^2(\Omega)}^2 + 2\alpha_n \langle w,\, F'[a^\dagger](a^\dagger - \hat a_n) \rangle_{L^2(\Omega)}.$   (24)

Here condition (19) has been used in the last line. Due to (18) and the convexity of $D(F)$ the Taylor remainder satisfies the standard estimate

$\|F(\hat a_n) - F(a^\dagger) - F'[a^\dagger](\hat a_n - a^\dagger)\|_{L^2(\Omega)} \le \frac{L}{2} \|\hat a_n - a^\dagger\|_H^2.$

Hence,

$\|F'[a^\dagger](\hat a_n - a^\dagger)\|_{L^2(\Omega)} \le \frac{L}{2} \|\hat a_n - a^\dagger\|_H^2 + \|F(\hat a_n) - F(a^\dagger)\|_{L^2(\Omega)}$
$\le \frac{L}{2} \|\hat a_n - a^\dagger\|_H^2 + \|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)} + \|\hat u_n - u^\dagger\|_{L^2(\Omega)}.$

Inserting this into (24) and using the Cauchy-Schwarz inequality yields

$\|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)}^2 + \alpha_n \|\hat a_n - a^\dagger\|_H^2 \le \|\hat u_n - u^\dagger\|_{L^2(\Omega)}^2 + 2\alpha_n \|w\|_{L^2(\Omega)} \|\hat u_n - u^\dagger\|_{L^2(\Omega)}$
$+ 2\alpha_n \|w\|_{L^2(\Omega)} \|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)} + \alpha_n L \|w\|_{L^2(\Omega)} \|\hat a_n - a^\dagger\|_H^2,$

and hence

$\left( \|F(\hat a_n) - \hat u_n\|_{L^2(\Omega)} - \alpha_n \|w\|_{L^2(\Omega)} \right)^2 + \alpha_n (1 - L\|w\|_{L^2(\Omega)}) \|\hat a_n - a^\dagger\|_H^2 \le \left( \|\hat u_n - u^\dagger\|_{L^2(\Omega)} + \alpha_n \|w\|_{L^2(\Omega)} \right)^2.$

Neglecting the first term on the left hand side of this inequality yields

$\|\hat a_n - a^\dagger\|_H^2 \le \frac{1}{\alpha_n (1 - L\|w\|_{L^2(\Omega)})} \left( \|\hat u_n - u^\dagger\|_{L^2(\Omega)} + \alpha_n \|w\|_{L^2(\Omega)} \right)^2,$   (25)

and neglecting the second term yields

$\|F(\hat a_n) - u^\dagger\|_{L^2(\Omega)} \le 2\|\hat u_n - u^\dagger\|_{L^2(\Omega)} + 2\alpha_n \|w\|_{L^2(\Omega)},$   (26)

using the triangle inequality. Together with the assumption (5) and the parameter choice rule (20), the estimates (25) and (26) immediately yield (21). Squaring (26), taking expected values of (25) and (26), and using (22), we obtain (23).

Remark 1. If no a priori smoothness information on $u^\dagger$ is available, it is possible to introduce a smoothing operator $Q : L^2(\Omega, \mu) \to \bar H$ mapping to another Hilbert space $\bar H$ and to multiply both sides of (1) by $Q$:

$QF(a) = Qu.$

The first step then consists in constructing estimators $\hat q_n$ of $q^\dagger := Qu^\dagger$ satisfying (5) or (6) with $\hat u_n$ replaced by $\hat q_n$ and $u^\dagger$ by $q^\dagger$. The second step consists in minimizing the Tikhonov functional $\|QF(a) - \hat q_n\|_{\bar H}^2 + \alpha_n \|a - a_0\|_H^2$ over $a \in D(F)$. The results above can now be applied to the operator $QF$ instead of $F$.

If $F$ is linear, the adjoint operator $Q = F^*$ is typically used. Under reasonable assumptions it is possible to construct an unbiased, $\sqrt{n}$-consistent estimator $\hat q_n$ for $q^\dagger$ in the first step, which, however, introduces additional ill-posedness induced by the multiplication by $F^*$. In a second step some linear regularization method is employed. Mair and Ruymgaart [23] have shown that for linear operators $F$ the resulting estimators for $a^\dagger$ achieve the best possible rate of convergence as $n \to \infty$ in many cases.
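For a linear, self-adjoint $F$ with an explicit kernel — such as the integral operator taken up in section 5 — one such unbiased estimator of $q^\dagger = F^* u^\dagger$ can be written down directly. The sketch below is our own illustration, not a prescription from [23], and it assumes a uniform design density on $[0, 1]$, so that $\mathbf{E}[Y_i\, Q(X_i, t)] = (F u^\dagger)(t)$.

```python
import numpy as np

def Q(x, y):
    """Kernel of the linear integral operator used in section 5."""
    return np.where(x <= y, x * (1.0 - y), y * (1.0 - x))

def q_hat(X, Y, t):
    """Unbiased, sqrt(n)-consistent estimator of q = F* u_dagger = F u_dagger
    at the points t, assuming a uniform design density on [0, 1]."""
    return np.mean(Y[:, None] * Q(X[:, None], t[None, :]), axis=0)
```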

4. Applications

In this section we show how the results above can be applied to the estimation of a diffusion coefficient from distributed measurements and an inverse obstacle scattering problem. In both cases the estimation of u† in the first step of our method is done by local polynomial estimators.

4.1. Estimation of $u^\dagger$ by local polynomial estimators

The basic idea of local polynomial estimators is to estimate the regression function $u^\dagger$ at a particular point $x \in \Omega$ by locally fitting a polynomial of degree $p$ to the data by weighted least squares (see [11]).

We assume from now on that $\Omega$ is a smooth bounded subdomain of $\mathbb{R}^d$, that $u^\dagger \in C^2(\bar\Omega)$, and that the design density is continuously differentiable and positive on $\bar\Omega$. In this section we argue that local polynomial estimators are well suited to estimate $u^\dagger$ in this setting, and in particular that condition (6) holds if $u^\dagger$ is estimated in this way. Furthermore, we describe an estimator of the MISE. This is necessary to select the regularization parameter $\alpha_n$ in the Tikhonov functional (4). Of course, local polynomial estimators are not the only valid option; other estimators such as series or wavelet estimators would work as well (see [37] or [5] for an overview of statistical smoothing methods).

A particularly important and simple class of local polynomial estimators are local linear estimators, corresponding to the choice $p = 1$. The local linear estimator can be expressed in matrix notation as

$\hat u_n(x; h) = e_1^T (\mathsf{X}_x^T \mathsf{W}_{x,h} \mathsf{X}_x)^{-1} \mathsf{X}_x^T \mathsf{W}_{x,h} \mathsf{Y},$   (27)

where $e_1 := (1, 0, \ldots, 0)^T \in \mathbb{R}^{d+1}$, $\mathsf{Y} := (Y_1, \ldots, Y_n)^T$,

$\mathsf{X}_x := \begin{pmatrix} 1 & (X_1 - x)^T \\ \vdots & \vdots \\ 1 & (X_n - x)^T \end{pmatrix}, \qquad \mathsf{W}_{x,h} := \operatorname{diag}\{ K_h(X_1 - x), \ldots, K_h(X_n - x) \},$

and $K_h(x) = \frac{1}{h^d} K(\frac{x}{h})$, where $K : \mathbb{R}^d \to [0, \infty)$ denotes a kernel function with $\int K = 1$ and $h > 0$ is the bandwidth. The representation (27) allows for an efficient and simple numerical implementation.

The following theorem, adapted from [36], gives an estimate of the mean integrated square error $\mathbf{E}\,\|\hat u_n(\cdot, h) - u^\dagger\|_{L^2(\Omega)}^2$ conditioned on $X_1, \ldots, X_n$. Recall that for two real random sequences $A_n, B_n$ with $B_n \ne 0$ we write $A_n = o_P(B_n)$ if $\lim_{n\to\infty} P(|A_n/B_n| > \varepsilon) = 0$ for all $\varepsilon > 0$, i.e. if $A_n/B_n \to 0$ in probability.

Theorem 4. Assume that

(i) $\Omega \subset \mathbb{R}^d$ is a smooth, bounded domain, $u^\dagger \in C^2(\Omega)$, and the design density $f \in C^1(\Omega)$ satisfies $f > 0$.

(ii) $K$ is continuous, has compact support, $\int x x^T K(x)\,dx = \mu_2(K) I$ for some constant $\mu_2(K) > 0$, and all odd-order moments of $K$ vanish, i.e. $\int x_1^{\kappa_1} \cdots x_d^{\kappa_d} K(x)\,dx = 0$ for all combinations of integers $\kappa_1, \ldots, \kappa_d \ge 0$ with $\sum_i \kappa_i$ odd. (These assumptions are satisfied, e.g., for product kernels constructed from symmetric univariate kernels.)

(iii) The scalar bandwidth $h = h(n)$ tends to zero as $n \to \infty$ such that $nh^d \to \infty$.

Define $R(K) := \int_{\mathbb{R}^d} K(x)^2\,dx$ and denote by $H_{u^\dagger}(x)$ the Hessian of $u^\dagger$ at $x \in \Omega$. Then the local linear estimator (27) is well defined with probability 1, and the MISE satisfies

$\mathbf{E}\left[ \|\hat u_n(\cdot; h) - u^\dagger\|_{L^2(\Omega)}^2 \mid X_1, \ldots, X_n \right] = \frac{h^4}{4}\,\mu_2(K)^2 \int_\Omega (\operatorname{tr} H_{u^\dagger}(x))^2\,dx + \frac{R(K)}{nh^d} \int_\Omega \frac{v(x)}{f(x)}\,dx + o_P\!\left( \frac{1}{nh^d} + h^4 \right).$   (28)

An important issue is a proper selection of the bandwidth $h$. Minimizing the right hand side of eq. (28) (without the $o_P(\cdot)$ term) leads to the choice $h_{\mathrm{opt}} = C_d \cdot n^{-1/(d+4)}$ with asymptotically optimal constant

$C_d = C_d(u^\dagger, v, f) = \left( \frac{d\, R(K) \int_\Omega \frac{v(x)}{f(x)}\,dx}{\mu_2(K)^2 \int_\Omega (\operatorname{tr} H_{u^\dagger}(x))^2\,dx} \right)^{1/(d+4)}.$

There are several options to estimate the optimal bandwidth $h_{\mathrm{opt}}$. Recently, Xia & Li [40] proved that for the local polynomial estimator $h_{\mathrm{opt}}$ can be consistently estimated by cross validation, which consists in minimizing

$\mathrm{LSCV}(h) := n^{-1} \sum_{i=1}^{n} \left( \hat u_n^{(-i)}(X_i; h) - Y_i \right)^2$

if $h$ is restricted to the interval $H_n = [a n^{-1/(2p+3)}, b n^{-1/(2p+3)}]$ for some positive constants $a < b$. Here $\hat u_n^{(-i)}$ is the "leave-one-out" estimator based on the data $(X_j, Y_j)$, $j \in \{1, \ldots, n\} \setminus \{i\}$. Under additional regularity assumptions on $K$, $f$, $u^\dagger$ and the distribution of $\varepsilon_i$, and if $d = 1$, Xia & Li [40] showed that

$\lim_{n\to\infty} \frac{\mathrm{MISE}(\hat h_n)}{\inf_{h \in H_n} \mathrm{MISE}(h)} = 1 \quad \text{a.s.},$   (29)

where $\hat h_n$ denotes the bandwidth determined by cross-validation. This is sufficient to verify that assumption (6) holds a.s., and hence the assertions of Theorem 2 and of Theorem 3 (ii) hold a.s. We thus obtain that our method with cross-validated bandwidth $\hat h_n$ yields a fully data-driven consistent estimator of $u^\dagger$.
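For $d = 1$ the matrix representation (27) and the score $\mathrm{LSCV}(h)$ translate directly into code. The following sketch uses a Gaussian kernel and plain loops; it is meant only to make the definitions concrete, not as an efficient implementation.

```python
import numpy as np

def kern(u):
    """Gaussian kernel (integrates to one)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_linear(x, X, Y, h):
    """Local linear estimator (27) at the point x for d = 1:
    e_1^T (Xx^T W Xx)^{-1} Xx^T W Y with W = diag K_h(X_i - x)."""
    D = np.column_stack([np.ones_like(X), X - x])   # design matrix Xx
    w = kern((X - x) / h) / h                       # weights K_h(X_i - x)
    A = D.T @ (w[:, None] * D)
    b = D.T @ (w * Y)
    return np.linalg.solve(A, b)[0]                 # first component = u_hat(x)

def lscv(h, X, Y):
    """Leave-one-out least-squares cross-validation score LSCV(h)."""
    n = len(X)
    score = 0.0
    for i in range(n):
        m = np.arange(n) != i                       # drop the i-th observation
        score += (local_linear(X[i], X[m], Y[m], h) - Y[i])**2
    return score / n

# stand-in for minimizing LSCV over the interval H_n:
# h_hat = min(h_grid, key=lambda h: lscv(h, X, Y))
```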

We remark that many alternative methods exist for the estimation of $h_{\mathrm{opt}}$, among them methods based on plug-in selection (e.g. [14] or [30]) or on the bootstrap (cf. [18]). There is a controversial and comprehensive discussion about which data-driven selection method for $h$ performs best (e.g. [18]). Recently, Loader [21] argued forcefully that cross validation still represents a reliable method, albeit criticized by various authors in the past. Hence, in the subsequent examples we confine ourselves to cross validation for the selection of $h$.

So far we have only argued that the local linear polynomial estimator provides an estimator for $u^\dagger$ for which assumption (6) holds, i.e. which has sufficient convergence properties of the MISE. However, in order to estimate $a^\dagger$ we have to select the regularization parameter $\alpha_n$ in the Tikhonov functional (4) asymptotically proportional to the square root of the MISE, as in (22), and hence we need to estimate the MISE itself. In theory we could select $\alpha_n = \sqrt{\mathrm{MISE}(\hat h_n)}$, i.e. use the MISE based on the cross-validation bandwidth $\hat h_n$. However, this approach did not perform well in our numerical simulation studies (cf. sections 4.2 and 4.3). We therefore used a different method, related to plug-in bandwidth selection methods (see e.g. [15]), which is based on estimating the right hand side of (28). We stress that we do not aim to give a completely rigorous analysis of this method here, which would require an almost sure expansion of (28) in order to obtain a.s. consistency. However, by means of (28) one can obtain a weakly and $L^2$-consistent method. To this end, the first term on the right hand side of (28) is estimated via estimates of the derivatives of $u^\dagger$. Following the idea of plug-in methods, this is done again by local polynomial estimators of degree $p \le m$ with cross-validated bandwidth.

To estimate the second term on the right hand side of (28), we use an estimator of the variance $v$. For simplicity we restrict ourselves to homoscedastic noise, i.e. $v(x) = v_0$ for all $x \in \Omega$ (see [32, 12] for heteroscedastic noise). Following [13], in this case a simple estimator for $v_0$ is given by

$\hat v_0 = \frac{1}{m} \sum_{i=1}^{n} |Y_i - \hat u_n(X_i, h)|^2$   (30)

with a normalization constant $m = n - 2\sum_{i=1}^{n} w_i(X_i) + \sum_{i=1}^{n} \sum_{j=1}^{n} w_j(X_i)^2$, where $w_j(x)$ are the weights of the linear smoother $\hat u_n(x; h) = \sum_j w_j(x) Y_j$, chosen such that $\mathbf{E}\,\hat v_0 = v_0$ if $u^\dagger$ is a polynomial of degree $\le p$ (a code sketch is given after Remark 2 below). For $d = 1$ this estimator can be shown to be asymptotically minimax under certain regularity conditions (see [13]) and has been extended to $d \ge 2$ by Munk et al. [26]. If the selection of $h$ is performed by the cross-validation procedure, consistency has been shown by Neumann [27]. A computationally much simpler, but less efficient, option is the use of local residual estimators as described in [4] and [26], which have been demonstrated to work sufficiently well in many examples.

Remark 2. We mention that it is also possible to show that condition (5) holds for local polynomial estimators, e.g. based on a result of Masry [24], who established uniform strong consistency over compact subsets of $\mathbb{R}^d$. Again, (29) can be used to prove a.s. consistency of our method. Masry's [24] results involve various additional technical assumptions on $u^\dagger$, the design density $f$, the distribution of the error variables $\varepsilon_i$ and the kernel $K$. Uniform almost sure convergence of the $L^2$-error for local polynomial estimators has also been proved by Kohler [20] under general assumptions, but without rates as required in condition (5).
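As announced above, here is a direct transcription of the variance estimator (30) for $d = 1$, based on the rows $w(x)$ of the local linear hat matrix; the Gaussian kernel is again an illustrative choice.

```python
import numpy as np

def smoother_weights(x, X, h):
    """Row of the local linear hat matrix at x: u_hat(x; h) = sum_j w_j(x) Y_j."""
    kern = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    D = np.column_stack([np.ones_like(X), X - x])
    w = kern((X - x) / h) / h
    A = D.T @ (w[:, None] * D)
    return np.linalg.solve(A, np.array([1.0, 0.0])) @ (D.T * w)  # e_1^T A^{-1} Xx^T W

def variance_estimate(X, Y, h):
    """Estimator (30) for homoscedastic noise v(x) = v_0, with normalization
    m = n - 2 * sum_i w_i(X_i) + sum_{i,j} w_j(X_i)^2."""
    n = len(X)
    W = np.vstack([smoother_weights(X[i], X, h) for i in range(n)])  # hat matrix
    m = n - 2.0 * np.trace(W) + np.sum(W**2)
    return np.sum((Y - W @ Y)**2) / m
```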

4.2. Estimation of the diffusivity in groundwater filtration

Let $\Omega \subset \mathbb{R}^d$, $d \in \mathbb{N}$, be a smooth bounded domain and consider the elliptic boundary value problem

$\nabla \cdot (a \nabla u) = g \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega.$   (31)

This differential equation is used to model steady-state groundwater flow. Here $a$ is the diffusivity of the sediment, $u$ is the piezometric head, and $g$ represents sinks and sources of water. For a discussion of this and related problems in a deterministic context we refer to [1].

It is well known that the boundary value problem (31) has a unique weak solution $u \in H_0^1(\Omega)$ if $g \in H^{-1}(\Omega)$, $a \in L^\infty(\Omega)$, and $a \ge \underline{a}$ for some constant $\underline{a} > 0$. We introduce the operator $F : D(F) \to L^2(\Omega)$ which maps a diffusivity $a \in D(F)$ to the corresponding solution $u$ of the differential equation (31). It is customary to define the domain of $F$ as

$D(F) := \{ a \in H^s(\Omega) : a \ge \underline{a} \}$

with $s > d/2$. By Sobolev's embedding theorem, $H^s(\Omega)$ is continuously embedded in $L^\infty(\Omega)$ for smooth $\Omega$. By elliptic regularity results (see [39]) the degree of smoothness of $u^\dagger$ is determined by the smoothness of $a^\dagger$, $g$, and $\partial\Omega$; e.g. $u^\dagger \in H^{1+k}(\Omega)$ for $k = 0, 1, 2, \ldots$ if $\partial\Omega$ is $C^{1+k,1}$-smooth, $a \in C^k(\Omega)$, and $g \in H^{k-1}(\Omega)$. Hence, we can apply the results of subsection 4.1 on local polynomial estimators.

Using a weak formulation of the differential equation (31), it is not difficult to prove that $F$ is Fréchet differentiable. The interpretation of the source condition (19) is discussed for $d = 1$ in [7, Chapter 10.5]. Weak closedness of $F$ can be shown using Lemma 5 below and the compactness of the embedding $H^s(\Omega) \hookrightarrow H^\sigma(\Omega)$ for $d/2 < \sigma < s$ and $C^{k,\alpha}$-smooth $\partial\Omega$ with $k + \alpha > s$ (see [39]). The domain $D(F)$ is weakly closed since it is closed and convex. It has been shown by Richter [29] that $a^\dagger$ is uniquely determined by $u^\dagger$ (condition (2)) if $g$ is positive and Hölder continuous on $\Omega$. Hence, the results of sections 2 and 3 can be applied to our problem.

Lemma 5. Assume that the Hilbert space $H$ is compactly embedded in a Hilbert space $\tilde H$. Let $F : D(F) \to Y$ be an operator defined on a weakly closed domain $D(F) \subset H$ mapping to another Hilbert space $Y$, and assume that $F$ can be extended to an operator $\tilde F : D(\tilde F) \to Y$ with $D(F) \subset D(\tilde F) \subset \tilde H$ which is continuous with respect to the norm in $\tilde H$. Then $F$ is weakly sequentially closed.

Proof. Let $(a_n)$ be a sequence in $D(F)$ such that $a_n \rightharpoonup a^\dagger$ in $H$ and $F(a_n) \rightharpoonup u^\dagger$ in $Y$ for some $a^\dagger \in H$ and $u^\dagger \in Y$. Since $D(F)$ is weakly closed, $a^\dagger \in D(F)$. By the compactness of the embedding $H \hookrightarrow \tilde H$, we have $\lim_{n\to\infty} \|a_n - a^\dagger\|_{\tilde H} = 0$. Using the continuity of $\tilde F$ we obtain $F(a^\dagger) = \tilde F(a^\dagger) = \lim_{n\to\infty} \tilde F(a_n) = \lim_{n\to\infty} F(a_n) = u^\dagger$.

In our simulations we chose $\Omega = \{ x \in \mathbb{R}^2 : |x| \le 1 \}$, $a^\dagger(x_1, x_2) = 1 - 0.99\,x_1^3$, and $g(x_1, x_2) = 1 + x_1 + x_2$. Further, we assumed that the measurement points (wells)

$X_1, \ldots, X_n$ are distributed randomly according to a uniform distribution over $\Omega$ and that the noise in model (3) is Gaussian and homoscedastic with $\sqrt{v} \equiv 0.03$. The direct problem, to compute the solution $u$ of the elliptic boundary value problem (31) for a given diffusion coefficient $a$, was solved by finite elements. To estimate $u^\dagger$ in the first part of our algorithm we implemented a local linear polynomial estimator as described in subsection 4.1 with a bandwidth $h$ determined by

least squares cross-validation. Further we used the estimator $\hat v_0$ given by formula (30)

for the constant variance $v_0$ of the data.

In the second part of our algorithm, the estimation of $a^\dagger$ from the estimate $\hat u_n$, we minimized the nonlinear Tikhonov functional by a Levenberg-Marquardt algorithm. For simplicity, the Sobolev index $s$ in the definition of $D(F)$ was chosen to be $s = 1$. (To be consistent with the theory, we actually should have used $s > 1$!) The Tikhonov functional (4) was minimized over the finite dimensional subspace $\{ \sum_{j=1}^{J} c_j \Phi(\cdot - r_j) : c_j \in \mathbb{R} \}$ defined by radial basis functions with $\Phi(x) = \exp(-|x|^2/2)$ and points $r_j$ from an equidistant rectangular grid in $\mathbb{R}^2$. Since radial basis functions have excellent approximation properties, we need a much smaller number $J$ of degrees of freedom to approximate $a^\dagger$ with the required accuracy than with a linear finite element approximation.

Figure 1. Dependence of the quality of estimation of $u^\dagger$ and $a^\dagger$ on the sample size $n$. Left: $L^2$-distance between the true function $u^\dagger$ and the estimate $\hat u_n$. Right: $L^2$-distance between the true diffusivity $a^\dagger$ and the estimate $\hat a_n$ computed from the estimate $\hat u_n$.

Figure 1 presents the $L^2$-distance between the "true" diffusivity function $a^\dagger$ and the estimate $\hat a_n$ for varying sample sizes $n$, where 25 simulations have been performed for each sample size. Observe from these log-log plots that the quality of reproduction both for $u^\dagger$ and $a^\dagger$ suggests a polynomial rate as the sample size increases. Finally, the performance of our algorithm is illustrated in Figure 2, which shows a typical estimate of $u^\dagger$ and $a^\dagger$ for $n = 50$ and $\sqrt{v} = 0.03$. The quality of the reconstruction of $a^\dagger$ is reasonable considering the small number of observations and the ill-posedness of the problem. Observe the correlation between regions of low well density and sub-average quality of recovery of $a^\dagger$.
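Schematically, the second part of the algorithm then takes the following form. The function solve_pde below is a mere placeholder for the finite element solver of (31), the 5 x 5 grid of centers is an arbitrary illustrative choice, and the penalty is simplified to a plain Euclidean norm of the coefficient vector (rather than the $H^s$-norm), with quadrature weights omitted.

```python
import numpy as np
from scipy.optimize import least_squares

gx = np.linspace(-1.0, 1.0, 5)                       # illustrative 5 x 5 grid
centers = np.array([(x1, x2) for x1 in gx for x2 in gx])

def diffusivity(c, pts):
    """a(x) = sum_j c_j Phi(x - r_j) with Phi(x) = exp(-|x|^2 / 2)."""
    d2 = ((pts[:, None, :] - centers[None, :, :])**2).sum(axis=2)
    return np.exp(-0.5 * d2) @ c

def solve_pde(a_vals, pts):
    """Placeholder for the finite element solution u of (31) evaluated at pts."""
    raise NotImplementedError  # plug in a FEM solver here

def reconstruct(u_hat, pts, alpha, c0):
    """Levenberg-Marquardt minimization of the Tikhonov functional over the
    RBF coefficients c (simplified penalty on the coefficient vector)."""
    def residuals(c):
        return np.concatenate([solve_pde(diffusivity(c, pts), pts) - u_hat,
                               np.sqrt(alpha) * (c - c0)])
    return least_squares(residuals, x0=c0, method="lm").x
```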

Figure 2. A typical simulation in our standard setting with $n = 50$ and $\sqrt{v} = 0.03$. Upper left: true piezometric head $u^\dagger$. The black symbols indicate the positions $X_i$ of the wells (observation points). Upper right: estimated piezometric head $\hat u_n$. Lower left: true diffusivity $a^\dagger$. Lower right: estimated diffusivity $\hat a_n$.

4.3. Inverse obstacle scattering

In this section we apply our algorithm to the problem of reconstructing the shape of a sound-hard acoustic scatterer from far field measurements of the scattered field (cf. [3]). Let $K \subset \mathbb{R}^2$ be a smooth compact scatterer, which for simplicity is assumed to be star-shaped with respect to the origin. We consider a plane incident wave $u_i(x) := e^{ikx \cdot d}$ with direction $d \in S^1$, $S^1 := \{ x \in \mathbb{R}^2 : |x| = 1 \}$, which gives rise to a scattered field $u_s : \mathbb{R}^2 \setminus K \to \mathbb{C}$. The total field $u := u_i + u_s$ satisfies the Helmholtz equation

$\Delta u + k^2 u = 0 \quad \text{in } \mathbb{R}^2 \setminus K$

and the boundary condition

$\frac{\partial u}{\partial n} = 0 \quad \text{on } \partial K$

in the case of a sound-hard scatterer. Furthermore, the scattered field satisfies the

Sommerfeld radiation condition

$\lim_{r\to\infty} \sqrt{r} \left( \frac{\partial u_s}{\partial r} - ik u_s \right) = 0, \qquad r = |x|,$

uniformly in all directions $x/|x|$. The last condition implies that $u_s$ behaves asymptotically like a cylindrical outgoing wave:

$u_s(x) = \frac{e^{ikr}}{\sqrt{r}} \left( u_\infty(x/r) + O(r^{-1}) \right), \qquad r = |x| \to \infty.$

The amplitude factor $u_\infty : S^1 \to \mathbb{C}$ is called the far field pattern of $u_s$. Far field patterns are always analytic functions (see [3]). Since $K$ is assumed to be star-shaped, its boundary can be described by a $2\pi$-periodic function $a : \mathbb{R} \to (0, \infty)$:

$\partial K = \{ a^\dagger(t)(\cos t, \sin t)^T : 0 \le t < 2\pi \}.$

We assume that $a^\dagger$ belongs to the Sobolev space $H^s([0, 2\pi])$ of periodic functions on $[0, 2\pi]$ with $s > 3/2$. The inverse problem is described by the operator equation

$F(a) = u_\infty,$

where the operator $F : D(F) \to L^2(S^1)$ maps a function $a$ to the far field pattern $u_\infty$ corresponding to the scatterer parameterized by $a$ for some fixed incident field. The domain of $F$ is given by $D(F) := \{ a \in H^s([0, 2\pi]) : a \ge r_0 \}$ with $s > 3/2$, $r_0 > 0$. The far field patterns naturally belong to the complex Hilbert space $L^2_{\mathbb{C}}(S^1)$. Since we have formulated our theoretical results for real Hilbert spaces and since $a$ is real valued, we interpret $L^2_{\mathbb{C}}(S^1)$ as a real Hilbert space with the inner product $\langle u_\infty, v_\infty \rangle := \operatorname{Re} \int_{S^1} u_\infty \overline{v_\infty}\,ds$.

It can be shown that $F$ is well defined and Fréchet differentiable for $s > 3/2$ (see [3, 17]). Weak closedness of $F$ follows again from Lemma 5, using the compactness of the embedding $H^s([0, 2\pi]) \hookrightarrow H^\sigma([0, 2\pi])$ for $3/2 < \sigma < s$ and the weak closedness of $D(F)$. (This was our reason for introducing the parameter $r_0$ in the definition of $D(F)$ instead of requiring simply $a > 0$.) Although uniqueness (condition (2)) is an open problem for the situation described above, a number of uniqueness results have been shown if more data or a priori information are available (see [3]). Concerning Theorem 3, it has been shown in [16] that condition (19) is far too restrictive for this problem, implying analyticity of $a^\dagger - a_0$, and a weaker, so-called logarithmic source condition has been investigated (cf. Remark 3 in section 5). Hence, we cannot expect the rate (23) for this exponentially ill-posed problem. However, Theorem 2 still applies, i.e. our method is consistent.

For our simulations we chose the wave number $k = 3$, the direction $d = (1, 0)$ of the incident wave, and a bean-shaped obstacle shown in the right part of Figure 3.

[Figure 3: left panel "estimation of real part of far field pattern" showing the data, the true far field, and the estimated far field; right panel "estimation of the shape of the scatterer" showing the true shape, the estimated shape, and the a priori guess.]

Figure 3. Estimation of the shape of a sound-hard scattering obstacle. d = (1, 0), k = 3. Left: Estimation of the far field pattern from measurements at n = 64 randomly distributed points. Right: Estimation of the shape of the obstacle.

[Figure 4: left panel "estimation of the variance v0", right panel "estimation of the MISE"; relative errors of the estimates plotted against the sample size n.]

Figure 4. Estimation of variance and MISE for random design. Left: $\hat v_0 / v_0$. Right: $\widehat{\mathrm{MISE}} / \|\hat u_{\infty,n} - u_\infty^\dagger\|_{L^2(\Omega)}^2$.

[Figure 5: left panel "convergence of far field estimators" ($L^2$-error), right panel "convergence of shape reconstructions" ($H^s$-error of boundary parametrizations), both plotted against the sample size n.]

Figure 5. Left: $\|\hat u_{n,\infty} - u_\infty^\dagger\|_{L^2(\Omega)}$. Right: $\|\hat a_n - a^\dagger\|_{H^s}$. The solid lines linearly connect the mean values for each sample size.

The evaluation of the direct solution operator and its Fréchet derivative was implemented by a boundary integral equation method (see [16]). The parametrizations $a$ of the boundary were approximated by trigonometric polynomials of degree $\le 32$.

In the first step we estimated the true far field pattern from measurements at randomly distributed points using a polynomial estimator of degree $p = 3$. The variance function was chosen as $\sqrt{v} \equiv 0.3$ in all our experiments. The left part of Figure 3 shows a typical estimate of the far field pattern. The right hand part shows the shape of our test obstacle and a reconstruction of it. Clearly, the illuminated side of the obstacle is reconstructed much more accurately than the shadow side. The initial guess

$a_0$ of the shape was the circle of radius 1 centered at the origin. In Figure 4 we plotted the relative errors of our estimates of the variance $v$ and the MISE as a function of the sample size. Except for a few outliers, our estimates are accurate up to at least 10%, although convergence of the relative errors seems rather slow. However, an estimate of this quality is more than enough for a sufficiently accurate

choice of the regularization parameter $\alpha_n$ in Tikhonov regularization. Using the exact

(unknown) errors of the far field estimator $\hat u_{\infty,n}$ improved the quality of the shape reconstructions only marginally.

The left part of Figure 5 is a plot of $\|\hat u_{\infty,n} - u_\infty^\dagger\|_{L^2(\Omega)}$ over $n$ with logarithmic scaling of both axes. As expected, we see a linear behavior corresponding to a polynomial convergence rate. The right plot displays the convergence of the error in the shape reconstructions, $\|\hat a_n - a^\dagger\|_{H^s}$. In the range of $n$'s and the values of $k$ and $\sqrt{v}$ considered here, the graph deviates only slightly from a linear behavior. This indicates that in our parameter regime the problem is not that severely ill-posed, which is also reflected in the rather small difference in quality of the far field and the shape estimator in Figure 3.
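For concreteness, the star-shaped boundary parametrization by trigonometric polynomials used in these reconstructions can be evaluated as follows; the coefficient layout and the evaluation grid are illustrative assumptions.

```python
import numpy as np

def boundary_curve(coeffs, t):
    """Evaluate the radial function a(t) = c0 + sum_k (c_k cos kt + s_k sin kt)
    and return the star-shaped boundary points a(t) * (cos t, sin t)."""
    c0, c, s = coeffs                       # c, s: arrays of Fourier coefficients
    k = np.arange(1, len(c) + 1)
    a = c0 + np.cos(np.outer(t, k)) @ c + np.sin(np.outer(t, k)) @ s
    return a[:, None] * np.column_stack([np.cos(t), np.sin(t)])

t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
deg = 32                                    # degree bound used in the experiments
circle = boundary_curve((1.0, np.zeros(deg), np.zeros(deg)), t)  # initial guess a_0
```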

5. Potential minimax rate optimality of the method

In this section we discuss an example where it is possible to achieve minimax rates with the proposed method. Of course, for most nonlinear problems minimax rates are unknown; for some results in the linear case we refer to [23]. Our example, taken from [23, Section 7.1], is the linear integral equation $F(a) = u$ with the integral operator $F : L^2([0,1]) \to L^2([0,1])$ defined by

$(Fa)(x) := \int_0^1 Q(x, y) a(y)\,dy, \qquad x \in [0, 1],$

with kernel

$Q(x, y) := \begin{cases} x(1 - y), & x \le y, \\ y(1 - x), & x > y. \end{cases}$

Two functions $u, a \in L^2([0,1])$ satisfy $F(a) = u$ if and only if $u$ has a generalized derivative $u'' \in L^2([0,1])$, $u(0) = u(1) = 0$, and $u'' = -a$. The source condition (19) with $a_0 \in R(F)$ (e.g. $a_0 = 0$) reduces to $a^\dagger \in R(F)$ since $L = 0$ for linear operators and since $F$ is self-adjoint. Therefore, (19) is equivalent to

$(u^\dagger)^{(4)} \in L^2([0,1]), \qquad u^\dagger(0) = u^\dagger(1) = (u^\dagger)''(0) = (u^\dagger)''(1) = 0.$   (33)

For this smoothness class, cubic polynomial estimators ($p = 3$) converge with the rate

$\mathbf{E}\left[ \|\hat u_n(\cdot; h) - u^\dagger\|_{L^2([0,1])}^2 \mid X_1, \ldots, X_n \right] = \frac{h^{2(p+1)}}{(4!)^2} \left( \int y^{p+1} K_{(p)}(y)\,dy \right)^2 \int_0^1 |(u^\dagger)^{(4)}(x)|^2\,dx$
$+ \frac{1}{nh} \int K_{(p)}(y)^2\,dy \int_0^1 \frac{v(x)}{f(x)}\,dx + o_P\!\left( h^8 + \frac{1}{nh} \right),$   (34)

where $K_{(p)}$ is a kernel of order $p + 1$ constructed from $K$ as defined in [31]. This follows immediately from the proof of Theorem 4.1 in [31] if $(u^\dagger)^{(4)}$ is continuous (note that for this result continuity of $(u^\dagger)^{(p+1)}$ is sufficient since $p$ is odd). It can easily be shown that this condition can be relaxed to $(u^\dagger)^{(4)} \in L^2([0,1])$ by considering a sequence $u_\epsilon^\dagger \in C^4([0,1])$, $\epsilon > 0$, satisfying (33) and $\sum_{j=0}^{4} \|(u_\epsilon^\dagger - u^\dagger)^{(j)}\|_{L^2([0,1])} \to 0$ as $\epsilon \to 0$. Then $\|u_\epsilon^\dagger - u^\dagger\|_\infty \to 0$ as $\epsilon \to 0$, and therefore the left hand side of (34) with $u^\dagger$ replaced by $u_\epsilon^\dagger$ converges to the left hand side of (34). Obviously, the same holds true for the right hand sides.
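Since $Q$ is the Green's function of $-d^2/dx^2$ with homogeneous Dirichlet boundary conditions, the relation $u'' = -a$ can be verified numerically. The midpoint-rule discretization below is an illustrative check, not part of the analysis.

```python
import numpy as np

m = 400
x = (np.arange(m) + 0.5) / m                       # midpoint grid on [0, 1]
Q = np.where(x[:, None] <= x[None, :],
             x[:, None] * (1 - x[None, :]),
             x[None, :] * (1 - x[:, None]))        # Q(x, y), both branches

def apply_F(a_vals):
    """(F a)(x) = int_0^1 Q(x, y) a(y) dy by the midpoint rule."""
    return Q @ a_vals / m

a = np.sin(np.pi * x)
u = apply_F(a)                                     # exact solution: sin(pi x)/pi^2
u_xx = (u[2:] - 2 * u[1:-1] + u[:-2]) * m**2       # second difference quotient
print(np.max(np.abs(-u_xx - a[1:-1])))             # small -> consistent with u'' = -a
```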

The optimal rate $O(n^{-8/9})$ is achieved if the bandwidth parameter $h$ is chosen such that $h \sim n^{-1/9}$, e.g. using the methods discussed in section 4.1. Then assumption (6) holds with $\beta_n = n^{-4/9}$, and Theorem 3 yields

$\mathbf{E}\,\|\hat a_n - a^\dagger\|_{L^2([0,1])}^2 = O(n^{-4/9}).$

On the other hand, Mair & Ruymgaart [23] have proved the following minimax result: Let $\mathcal{A}_n$ denote the class of all estimators $\hat a_n : ([0,1] \times \mathbb{R})^n \to L^2([0,1])$ such that $\mathbf{E}\,\|\hat a_n((X_1, Y_1), \ldots, (X_n, Y_n))\|_{L^2([0,1])}^2 < \infty$. Then

$\inf_{\hat a_n \in \mathcal{A}_n} \sup_{a^\dagger \in R(F)} \mathbf{E}\,\|\hat a_n - a^\dagger\|_{L^2([0,1])}^2 \ge c\,n^{-4/9}$   (35)

for some constant $c > 0$. Therefore, our method converges at the optimal rate as $n \to \infty$ for this particular example.

Remark 3. As pointed out by a referee, an interesting problem for future research consists in studying more general source conditions of the form

$a^\dagger - a_0 = f(F'[a^\dagger]^* F'[a^\dagger])\,w, \qquad \|w\|_{L^2([0,1])} \le C,$   (36)

with a continuous function $f : [0, \|F'[a^\dagger]\|^2_{\mathcal{L}(L^2([0,1]), L^2([0,1]))}] \to \mathbb{R}$ instead of the source condition (19) in Theorem 3. Eq. (19) is equivalent to eq. (36) with $f(t) = \sqrt{t}$ and $C = L^{-1}$ (cf. [7, Proposition 2.18]). In the example above, smoothness information on $a^\dagger - a_0$ in terms of other Sobolev spaces is described by (36) with $f(t) = t^\nu$, $\nu > 0$. For the scattering problem discussed in subsection 4.3, eq. (36) with $f(t) = (-\ln t)^{-p}$, $p > 0$, essentially corresponds to the fact that $a^\dagger - a_0$ belongs to a Sobolev space (cf. [16]).

Acknowledgments:

The support of the Graduiertenkolleg "Identifikation in mathematischen Modellen: Synergie stochastischer und numerischer Methoden" is gratefully acknowledged. A. Munk gratefully acknowledges support of the DFG, grant Mu1230/8-1. We are grateful to two referees for their helpful comments, which led to an improved version.

References

[1] H. T. Banks and K. Kunisch. Estimation Techniques for Distributed Parameter Systems. Birkhäuser, Boston, 1989.
[2] A. B. Bakushinskii. Remarks on choosing a regularization parameter using the quasi-optimality and ratio criterion. USSR Comput. Math. Math. Phys., 24(4):181-182, 1984.
[3] D. Colton and R. Kreß. Inverse Acoustic and Electromagnetic Scattering Theory. Springer-Verlag, Berlin Heidelberg New York, second edition, 1997.
[4] H. Dette, A. Munk, and T. Wagner. Estimating the variance in nonparametric regression - what is a reasonable choice? J. Roy. Statist. Soc., Ser. B, 60:751-764, 1998.
[5] S. Efromovich. Nonparametric Curve Estimation. Methods, Theory, and Applications. Springer, New York, 1999.
[6] H. W. Engl, K. Kunisch and A. Neubauer. Convergence rates for Tikhonov regularization of nonlinear ill-posed problems. Inverse Problems, 5:523-540, 1989.

[7] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic Publishers, Dordrecht Boston London, 1996.
[8] S. N. Evans and P. B. Stark. Inverse problems as statistics. Inverse Problems, 18:R55-R97, 2002.
[9] J. Fan. Design-adaptive nonparametric regression. J. Amer. Statist. Assoc., 87:998-1004, 1992.
[10] J. Fan. Local linear regression smoothers and their minimax efficiencies. Ann. Statist., 21:196-216, 1993.
[11] J. Fan and I. Gijbels. Local Polynomial Modelling and its Applications. Chapman & Hall, London, 1996.
[12] J. Fan and Q. Yao. Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85:645-660, 1998.
[13] P. Hall and J. S. Marron. On variance estimation in nonparametric regression. Biometrika, 77:415-419, 1990.
[14] W. Härdle and J. S. Marron. Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13:1465-1481, 1985.
[15] E. Herrmann, M. P. Wand, J. Engel and T. Gasser. A bandwidth selector for bivariate kernel regression. J. Roy. Statist. Soc., Ser. B, 57:171-180, 1995.
[16] T. Hohage. Convergence rates of a regularized Newton method in sound-hard inverse scattering. SIAM J. Numer. Anal., 36:125-142, 1998.
[17] T. Hohage. Iterative Methods in Inverse Obstacle Scattering: Regularization Theory of Linear and Nonlinear Exponentially Ill-Posed Problems. PhD thesis, University of Linz, 1999.
[18] M. C. Jones, J. S. Marron and S. J. Sheather. A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc., 91:401-407, 1996.
[19] J. Kaipio and E. Somersalo. Computational and Statistical Methods for Inverse Problems. Springer, New York, 2004.
[20] M. Kohler. Universal consistency of local polynomial kernel regression estimates. Ann. Inst. Statist. Math., 54:879-899, 2002.
[21] C. R. Loader. Bandwidth selection: classical or plug-in? Ann. Statist., 27:415-438, 1999.
[22] M. A. Lucas. Asymptotic optimality of generalized cross-validation for choosing the regularization parameter. Numer. Math., 66:41-66, 1993.
[23] B. A. Mair and F. Ruymgaart. Statistical inverse estimation in Hilbert scales. SIAM J. Appl. Math., 56:1424-1444, 1996.
[24] E. Masry. Multivariate local polynomial regression for time series: uniform strong consistency and rates. J. Time Series Anal., 17:571-599, 1996.
[25] A. Munk. Testing the goodness of fit of parametric regression models with random Toeplitz forms. Scand. J. Statist., 29:501-535, 2002.
[26] A. Munk, N. Bissantz, T. Wagner and G. Freitag. On difference-based variance estimation in nonparametric regression when the covariate is high dimensional. J. Roy. Statist. Soc., Ser. B, to appear.
[27] M. H. Neumann. Fully data-driven nonparametric variance estimators. Statistics, 25:189-212, 1994.
[28] D. W. Nychka and D. Cox. Convergence rates for regularized solutions of integral equations from discrete noisy data. Ann. Statist., 17:556-572, 1989.
[29] G. R. Richter. An inverse problem for the steady state diffusion equation. SIAM J. Appl. Math., 41:210-221, 1981.
[30] D. Ruppert, S. J. Sheather, and M. P. Wand. An effective bandwidth selector for local least squares regression. J. Amer. Statist. Assoc., 90:1257-1270, 1995.
[31] D. Ruppert and M. P. Wand. Multivariate locally weighted least squares regression. Ann. Statist., 22:1346-1370, 1994.
[32] D. Ruppert, M. P. Wand, U. Holst and O. Hössjer. Local polynomial variance-function estimation. Technometrics, 39:262-273, 1997.
[33] R. Snieder. An extension of Backus-Gilbert theory to nonlinear inverse problems. Inverse Problems, 7:409-433, 1991.

[34] F. O'Sullivan. Convergence characteristics of method of regularization estimators for nonlinear operator equations. SIAM J. Numer. Anal., 27:1635-1649, 1990.
[35] G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1990.
[36] M. P. Wand. Error analyses for general multivariate kernel estimators. J. Nonparametr. Statist., 2:1-15, 1992.
[37] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman & Hall, London, 1995.
[38] J. Weese. A regularization method for nonlinear ill-posed problems. Comput. Phys. Commun., 77:429-440, 1993.
[39] J. Wloka. Partial Differential Equations. Cambridge University Press, 1987.
[40] Y. Xia and W. K. Li. Asymptotic behavior of bandwidth selected by the cross-validation method for local polynomial fitting. J. Multivariate Anal., 83:265-287, 2002.