arXiv:1710.09244v3 [math.NA] 3 Jul 2018

Higher order convergence rates for Bregman iterated variational regularization of inverse problems

Benjamin Sprung* Thorsten Hohage*

*Lotzestr. 16-18, 37083 Göttingen, Germany, b.sprung@math.uni-goettingen.de, [email protected]

September 27, 2018

Abstract

We study the convergence of variationally regularized solutions to linear ill-posed operator equations in Banach spaces as the noise in the right hand side tends to 0. The rate of this convergence is determined by abstract smoothness conditions on the solution called source conditions. For non-quadratic data fidelity or penalty terms, such conditions are often formulated in the form of variational inequalities. Such variational source conditions (VSCs) as well as other formulations of source conditions in Banach spaces have the disadvantage of yielding only low-order convergence rates. A first step towards higher order rates has been taken by Grasmair (2013), who obtained convergence rates up to the saturation of Tikhonov regularization. For even higher order convergence rates, iterated versions of variational regularization have to be considered. In this paper we introduce VSCs of arbitrarily high order which lead to optimal convergence rates in Hilbert spaces. For Bregman iterated variational regularization in Banach spaces with general data fidelity and penalty terms we derive convergence rates under third order VSCs. These results are further discussed for entropy regularization with elliptic pseudodifferential operators, where the VSCs are interpreted in terms of Besov spaces and optimality of the rates can be demonstrated. Our theoretical results are confirmed in numerical experiments.

Keywords: inverse problems, variational regularization, convergence rates, entropy regularization, variational source conditions
MSC: 65J20, 65J22

1 Introduction

We consider linear, ill-posed inverse problems in the form of operator equations

Tf = g^obs  (1)

with a bounded linear operator T : X → Y between Banach spaces X and Y. We will assume that T is injective, but that T^{−1} : T(X) → X is not continuous. The exact solution will be denoted by f†, and the noisy observed data by g^obs ∈ Y, assuming the standard deterministic noise model

‖g^obs − Tf†‖ ≤ δ  (2)

with noise level δ > 0. To obtain a stable estimator of f† from such data we will consider generalized Tikhonov regularization of the form

f̂_α ∈ argmin_{f∈X} { (1/α) S(Tf − g^obs) + R(f) }  (P1)

with a convex, lower semi-continuous penalty functional R : X → R ∪ {∞}, R ≢ ∞, a regularization parameter α > 0 and a data fidelity term

S(g) := (1/q) ‖g‖_Y^q  (3)

for some q > 1. The aim of regularization theory is to bound the reconstruction error ‖f̂_α − f†‖ in terms of the noise level δ. Classically, in a Hilbert space setting, conditions implying such bounds have been formulated in terms of the spectral calculus of the operator T*T,

f† − f0 ∈ (T*T)^{ν/2}(X)  (4)

for some ν > 0 and some initial guess f0 ∈ X. In fact, using spectral theory it is easy to show that (4) yields

‖f† − f̂_α‖ = O( δ^{ν/(ν+1)} )  (5)

for classical Tikhonov regularization (i.e. (P1) with R(f) = ‖f − f0‖² and S(g) = ‖g‖²_Y) if ν ∈ (0, 2] and α ∼ δ^{2/(ν+1)} (see e.g. [14]). The proof and even the formulation of the source condition (4) rely on spectral theory and have no straightforward generalizations to Banach space settings, and even in a Hilbert space setting the proof does not apply to frequently used nonquadratic functionals R and S.
As an alternative, starting from [23], source conditions in the form of variational inequalities have been used:

∀f ∈ X : ⟨f*, f† − f⟩ ≤ ½ Δ_R^{f*}(f, f†) + Φ( S(Tf − Tf†) ),  (6)

where f* ∈ ∂R(f†). Here Φ : [0, ∞) → [0, ∞) is an index function (i.e. Φ is continuous and increasing with Φ(0) = 0), and Δ_R^{f*} denotes the Bregman distance (see Section 2 for a definition). Under the noise model (2) the variational source condition (6) implies the convergence rate Δ_R^{f*}(f̂_α, f†) = O(Φ(δ²)), as shown in [20]. In contrast to spectral source conditions, the condition (6) is not only sufficient, but

even necessary for this rate of convergence in most cases (see [26]). Moreover, due to the close connection to conditional stability estimates, variational source conditions can be verified even for interesting nonlinear inverse problems [25].
However, it is easy to see that (6) with quadratic R and S can only hold true for Φ satisfying lim_{τ→0} Φ(τ)/√τ > 0 (except for the very special case f† ∈ argmin R), see [15, Prop. 12.10]. This implies that for quadratic Tikhonov regularization the condition (6) only covers spectral Hölder source conditions (4) with indices ν ∈ (0, 1]. Several alternatives to the formulation (6) of the source condition suffer from the same limitation: multiplicative variational source conditions [2, 30], approximate source conditions [15], and approximate variational source conditions [15]. Symmetrized versions of multiplicative variational source conditions (see [2, eq. (6)] and [1, §4]) cover a larger range of ν, but have no obvious generalization to Banach space settings or non-quadratic S or R.
As shown in the first paper [23], the limiting case Φ(τ) = c√τ is equivalent to the source condition

∃p ∈ Y* : T*p = f*  (7)

studied earlier in [5, 11]. To generalize also Hölder source conditions (4) with ν > 1 to the setting (P1), Grasmair [21] imposed a variational source condition on p, which turns out to be the solution of a Fenchel dual problem. Again the limiting case of this dual source condition, which we tag second order source condition, is equivalent to a simpler condition, Tω ∈ ∂S*(p), which was studied earlier in [34, 36, 37]. Hence, Grasmair's second order condition corresponds to the indices ν ∈ (1, 2] in (4).
The aim of this paper is to derive rates of convergence corresponding to indices ν > 2, i.e. faster than ‖f̂_α − f†‖ = O(δ^{2/3}), in a Banach space setting. By the well-known saturation effect for Tikhonov regularization [22] such rates can occur in quadratic Tikhonov regularization only for f† = 0.
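As an illustration of the classical rate (5), the following minimal sketch (not taken from the paper; a synthetic diagonal operator with singular values s_i = 1/i and a hypothetical source element) checks numerically that classical Tikhonov regularization with the a-priori choice α ∼ δ^{2/(ν+1)} converges at the order δ^{ν/(ν+1)}:

```python
import numpy as np

# Toy diagonal model T = diag(s_i) on R^n: the Hoelder source condition (4)
# with f0 = 0 reads f_true_i = s_i^nu * w_i, and Tikhonov acts mode by mode.
n = 3000
idx = np.arange(1, n + 1)
s = 1.0 / idx                      # decaying singular values (ill-posedness)
nu = 1.0
w = idx ** -0.55                   # square-summable source element (assumption)
f_true = s ** nu * w

def tikhonov_error(delta):
    alpha = delta ** (2.0 / (nu + 1.0))        # a-priori rule alpha ~ delta^{2/(nu+1)}
    k = int(np.argmax(s / (s ** 2 + alpha)))   # worst-case noise direction (s_k ~ sqrt(alpha))
    g_obs = s * f_true
    g_obs[k] += delta                          # noise of norm exactly delta
    f_hat = s * g_obs / (s ** 2 + alpha)       # classical Tikhonov with f0 = 0
    return np.linalg.norm(f_hat - f_true)

deltas = [1e-3, 1e-4, 1e-5, 1e-6]
errors = [tikhonov_error(d) for d in deltas]
rates = [np.log(errors[i] / errors[i + 1]) / np.log(deltas[i] / deltas[i + 1])
         for i in range(len(deltas) - 1)]
```

The observed rates approach ν/(ν+1) = 1/2 as δ → 0; for ν > 2 no parameter choice pushes a single Tikhonov step beyond the saturation rate, which is the motivation for the iteration introduced next.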
Therefore, we consider Bregman iterated Tikhonov regularization of the form

f̂_α^{(n)} ∈ argmin_{f∈X} { (1/α) S(Tf − g^obs) + Δ_R(f, f̂_α^{(n−1)}) },  (P_n)

for n ≥ 2, which reduces to iterated Tikhonov regularization if R(f) = ‖f‖²_X and S(g) = ‖g‖². There is a considerable literature on this type of iteration, from which we can only give a few references here. Note that for R(f) = ‖f‖²_X the iteration (P_n) can be interpreted as the proximal point method for minimizing 𝒯(f) := S(Tf − g^obs). In [6, 7, 10] generalizations of the proximal point method for general functions 𝒯 on R^d were studied, in which the quadratic term is replaced by some Bregman distance (also called D-function). For 𝒯(f) = S(Tf − g^obs) this leads to (P_n), and the references above discuss in particular the case of the entropy functions R considered below. In the context of total variation regularization of inverse problems, the iteration (P_n) was suggested in [35]. Low order convergence rates of this iterative method for quadratic data fidelity terms S and general penalty terms R were obtained in [4, 16, 17, 18]. We emphasize that in contrast to all the references above, we consider only a small fixed number

of iterations in (P_n) here to cope with the saturation effect. In particular, we study convergence in the limit α → 0, rather than n → ∞.
The main contributions of this paper are:
• The formulation of variational source conditions of arbitrarily high order for quadratic regularization in Hilbert spaces (3.1) and the derivation of optimal convergence rates under these conditions (Theorem 3.2).
• Optimal convergence rates of general Bregman iterated variational regularization (P_n) in Banach spaces under a variational source condition of order 3 (Theorem 4.5).
• Characterization of our new higher order variational source conditions in terms of Besov spaces for finitely smoothing operators, both for quadratic regularization (Corollary 5.3) and for maximum entropy regularization (Theorem 5.7).
The remainder of this paper is organized as follows: In the following section we review some basic properties of the Bregman iteration (P_n) and derive a general error bound. The two sections §3 and §4 contain our main abstract convergence results in Hilbert and Banach spaces, respectively. Section §5 is devoted to the interpretation of higher order variational source conditions. Our theoretical results are verified by numerical experiments for entropy regularization in §6, before we end the paper with some conclusions. Some results on duality mappings and consequences of the Xu–Roach inequality are collected in an appendix.
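For orientation, in the quadratic Hilbert space case R(f) = ½‖f‖²_X, S(g) = ½‖g‖²_Y each step of (P_n) solves (T*T + αI)f = T*g^obs + αf_prev, i.e. classical iterated Tikhonov regularization. A minimal sketch with synthetic data (toy operator and smoothness chosen only for illustration) shows a second step improving on the saturated first step for a very smooth solution:

```python
import numpy as np

def iterated_tikhonov(T, g_obs, alpha, m):
    """m Bregman steps of (Pn) for R(f) = 0.5*||f||^2, S(g) = 0.5*||g||^2.

    Each step minimizes (1/alpha)*S(T f - g_obs) + 0.5*||f - f_prev||^2,
    i.e. solves (T^T T + alpha I) f = T^T g_obs + alpha * f_prev."""
    A = T.T @ T + alpha * np.eye(T.shape[1])
    f = np.zeros(T.shape[1])
    for _ in range(m):
        f = np.linalg.solve(A, T.T @ g_obs + alpha * f)
    return f

rng = np.random.default_rng(1)
n = 40
T = np.diag(1.0 / (1.0 + np.arange(n)))                               # toy compact operator
f_true = np.linalg.matrix_power(T.T @ T, 2) @ rng.standard_normal(n)  # very smooth solution
g_obs = T @ f_true + 1e-10 * rng.standard_normal(n)
err1 = np.linalg.norm(iterated_tikhonov(T, g_obs, 1e-4, 1) - f_true)
err2 = np.linalg.norm(iterated_tikhonov(T, g_obs, 1e-4, 2) - f_true)
```

Here err2 is far smaller than err1: one quadratic Tikhonov step cannot exploit smoothness beyond its qualification, while the second Bregman step can.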

2 Bregman iterations

Let us first recall the definition of the Bregman distance for a convex functional R : X → (−∞, ∞]: Let f0, f ∈ X and assume that f0* ∈ X* belongs to the subdifferential of R at f0, f0* ∈ ∂R(f0) (see e.g. [12, §I.5]). Then we set

Δ_R^{f0*}(f, f0) := R(f) − R(f0) − ⟨f0*, f − f0⟩.

In the context of inverse problems Bregman distances were introduced in [5, 11]. If there is no ambiguity, we sometimes omit the superindex f0*. This is in particular the case if R is Gateaux differentiable, implying that ∂R(f0) = {R′[f0]}. In the case R(f) = ‖f‖²_X with a Hilbert space X, the Bregman distance is simply given by Δ_R(f, f0) = ‖f − f0‖²_X. In general, however, the Bregman distance is neither symmetric nor does it satisfy a triangle inequality. Later we will also use symmetric Bregman distances Δ_R^{sym,f1*,f2*}(f1, f2) := Δ_R^{f1*}(f2, f1) + Δ_R^{f2*}(f1, f2) for f1, f2 ∈ X and fj* ∈ ∂R(fj), which satisfy

Δ_R^{sym}(f1, f2) = ⟨f2* − f1*, f2 − f1⟩.
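The definitions above are cheap to check numerically. A small sketch (the entropy functional is only an illustrative choice here, anticipating the entropy regularization discussed later in the paper):

```python
import numpy as np

def bregman(R, gradR, f, f0):
    """Delta_R(f, f0) = R(f) - R(f0) - <grad R(f0), f - f0>."""
    return R(f) - R(f0) - gradR(f0) @ (f - f0)

# R(f) = ||f||^2 (Hilbert case): the Bregman distance equals ||f - f0||^2.
Rsq, gradRsq = lambda f: f @ f, lambda f: 2.0 * f

# R(f) = sum f_i log f_i - f_i (entropy, f > 0): a Kullback-Leibler-type distance.
Rent = lambda f: np.sum(f * np.log(f) - f)
gradRent = np.log

f  = np.array([0.1, 0.4, 0.5])
f0 = np.array([0.3, 0.3, 0.4])
d_quad    = bregman(Rsq, gradRsq, f, f0)     # = ||f - f0||^2
d_ent     = bregman(Rent, gradRent, f, f0)
d_ent_rev = bregman(Rent, gradRent, f0, f)   # differs: no symmetry in general
```

The entropy example makes the lack of symmetry explicit: d_ent and d_ent_rev are both positive but unequal.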

Under the same assumptions the following identity follows from Young's equality:

Δ_R^{f2*}(f1, f2) = Δ_{R*}^{f1}(f2*, f1*).  (8)
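A quick numerical confirmation of (8), using as an example the entropy functional R(f) = Σ_i (f_i log f_i − f_i) on positive vectors, for which ∇R(f) = log f, R*(p) = Σ_i e^{p_i} and ∇R*(p) = e^p (an illustrative choice, not the paper's general setting):

```python
import numpy as np

f1 = np.array([0.1, 0.4, 0.5])
f2 = np.array([0.3, 0.3, 0.4])
p1, p2 = np.log(f1), np.log(f2)    # fj* = grad R(fj)

# Delta_R^{f2*}(f1, f2) = sum f1 log(f1/f2) - f1 + f2
lhs = np.sum(f1 * np.log(f1 / f2) - f1 + f2)
# Delta_{R*}^{f1}(f2*, f1*) = R*(p2) - R*(p1) - <f1, p2 - p1>, with f1 = grad R*(p1)
rhs = np.sum(np.exp(p2) - np.exp(p1)) - f1 @ (p2 - p1)
```

Both sides agree to machine precision, as predicted by Young's equality.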

Let us show that Bregman iterations (P_n) are well-defined for general data fidelity terms of the form (3). To this end we impose the following conditions:

Assumption 2.1. Let X, Y be Banach spaces, and assume that Y is q-smooth and r-convex with 1 < q ≤ 2 ≤ r < ∞ (see Definition A.1). Moreover, consider an operator T ∈ L(X, Y), a convex, proper, lower semi-continuous functional R : X → (−∞, ∞] and S given by (3). Assume that the functional

f ↦ (1/α) S(Tf − g^obs) + Δ_R^{f0*}(f, f0)

has a unique minimizer for all (f0, f0*) ∈ X × X* such that f0* ∈ ∂R(f0).

Existence and uniqueness of minimizers has been shown in many cases under different assumptions in the literature. As the main focus of this work is convergence rates, we just assume this property here. We just mention that it can be shown by a standard argument from calculus of variations under the additional assumption that the sublevel sets {f ∈ X : Δ_R^{f0*}(f, f0) ≤ M} are weakly or weakly* sequentially compact for all M ∈ R. For R given by the cross entropy functional discussed in §5.3 such weak compactness of sublevel sets in L¹ was shown in [11, Lemma 2.3]. For total variation regularization Assumption 2.1 has been shown in [35, Prop. 3.1].
For a number s ∈ (1, ∞) we will denote by s* the conjugate number satisfying 1/s + 1/s* = 1. Recall that

S*(p) = (1/q*) ‖p‖_{Y*}^{q*}

and that (S(· − g^obs))*(p) = S*(p) + ⟨p, g^obs⟩. The initial step of the Bregman iteration is the Tikhonov minimization problem (P1). The Fenchel dual to (P1) is

p̂_α ∈ argmin_{p∈Y*} { (1/α) S*(−αp) − ⟨p, g^obs⟩ + R*(T*p) }.  (P1*)

By Theorem 4.1 in [12, Chap. III], p̂_α ∈ Y* exists, as the functional S is continuous everywhere. As Y is q-smooth, Y* is q*-convex, and hence p̂_α is unique. If f̂_α exists as well, we have strong duality for (P1), (P1*), and by [12, Chap. III, Prop. 4.1] the following extremal relations hold true:

T*p̂_α ∈ ∂R(f̂_α) and −αp̂_α ∈ ∂S(Tf̂_α − g^obs).  (9)

Using the Bregman distance R_2(f) := Δ_R^{T*p̂_α}(f, f̂_α) we can give a precise definition of the second step of the Bregman iteration (P_n):

f̂_α^{(2)} ∈ argmin_{f∈X} { (1/α) S(Tf − g^obs) + R_2(f) }.  (P2)
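In the quadratic case R(f) = ½‖f‖²_X, S(g) = ½‖g‖²_Y (q = 2), (P1) has the closed form f̂_α = (T*T + αI)^{−1}T*g^obs and the extremal relations (9) can be verified directly: −αp̂_α = Tf̂_α − g^obs and T*p̂_α = f̂_α. A minimal sketch with a synthetic diagonal operator (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
T = np.diag(1.0 / (1.0 + np.arange(n)))   # toy ill-conditioned operator
g_obs = rng.standard_normal(n)
alpha = 1e-3

# Primal solution of (P1), and the dual candidate from -alpha*p = grad S(T f - g_obs)
f_hat = np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ g_obs)
p_hat = (g_obs - T @ f_hat) / alpha
```

The checks below confirm T*p̂_α = f̂_α (first relation in (9), since ∂R(f̂_α) = {f̂_α} here) and that p̂_α solves the dual normal equation (αI + TT*)p̂_α = g^obs.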

Like this we can recursively prove well-definedness of the Bregman iteration (P_n) as follows:

Proposition 2.2. Suppose Assumption 2.1 holds true. Let f̂_α^{(1)} := f̂_α be the solution to (P1), and set R_1 := R. Then for n = 1, 2, ... the dual solutions

p̂_α^{(n)} ∈ argmin_{p∈Y*} { (1/α) S*(−αp) − ⟨p, g^obs⟩ + R_n*(T*p) }  (P_n*)

are well defined, and we have strong duality between (P_n) and (P_n*). Moreover,

f_n* := Σ_{k=1}^n T*p̂_α^{(k)} ∈ ∂R(f̂_α^{(n)}),

such that we can define R_{n+1}(f) := Δ_R^{f_n*}(f, f̂_α^{(n)}) as well as

f̂_α^{(n+1)} ∈ argmin_{f∈X} { (1/α) S(Tf − g^obs) + R_{n+1}(f) }.  (P_{n+1})

Proof. We need to prove the existence of p̂_α^{(n)} as well as the fact that the subdifferential ∂R(f̂_α^{(n)}) contains Σ_{k=1}^n T*p̂_α^{(k)}. The existence of p̂_α^{(n)} as well as strong duality for (P_n), (P_n*) follows once again from Theorem 4.1 in [12]. The second statement can be proved by induction. The base case was shown in (9). By strong duality we have

T*p̂_α^{(n)} ∈ ∂R_n(f̂_α^{(n)}) = ∂R(f̂_α^{(n)}) − Σ_{k=1}^{n−1} T*p̂_α^{(k)}.

Therefore, we have Σ_{k=1}^n T*p̂_α^{(k)} ∈ ∂R(f̂_α^{(n)}) and can define R_{n+1} in the way we claimed.

A useful fact about the penalty functionals R_n is that their corresponding Bregman distances coincide for all n ∈ N, as they only differ by an affine linear functional:

Lemma 2.3. Let f0 ∈ X, f0* ∈ ∂R(f0) and p̃ := Σ_{k=1}^{n−1} p̂_α^{(k)}. Then we have

Δ_{R_n}^{f0* − T*p̃}(f, f0) = Δ_R^{f0*}(f, f0).

Proof. By Proposition 2.2 we have T*p̃ ∈ ∂R(f̂_α^{(n−1)}), so f0* − T*p̃ ∈ ∂R_n(f0) and

Δ_{R_n}^{f0* − T*p̃}(f, f0) = Δ_R^{T*p̃}(f, f̂_α^{(n−1)}) − Δ_R^{T*p̃}(f0, f̂_α^{(n−1)}) − ⟨f0* − T*p̃, f − f0⟩
= R(f) − R(f0) − ⟨f0*, f − f0⟩ = Δ_R^{f0*}(f, f0).

The following lemma describes the first step towards our bounds on the error in the Bregman distance. All that is then left is to construct appropriate vectors f which approximately minimize the functional on the right hand side.

Lemma 2.4. Suppose Assumption 2.1 holds true and there exists p ∈ Y* such that T*p ∈ ∂R(f†). With the notation of Proposition 2.2 define s_α^{(n)} := p − Σ_{k=1}^{n−1} p̂_α^{(k)}. Then

Δ_R^{T*p}(f̂_α^{(n)}, f†) ≤ inf_{f∈X} { (1/α) S(Tf − g^obs) + ⟨s_α^{(n)}, Tf − g^obs⟩ + (1/α) S*(−α s_α^{(n)}) + Δ_R^{T*p}(f, f†) }.

Proof. Due to the minimizing property of f̂_α^{(n)} we have

(1/α) S(Tf̂_α^{(n)} − g^obs) + R_n(f̂_α^{(n)}) ≤ (1/α) S(Tf − g^obs) + R_n(f)

for all f ∈ X, which is equivalent to

R_n(f̂_α^{(n)}) − R_n(f) ≤ (1/α) S(Tf − g^obs) − (1/α) S(Tf̂_α^{(n)} − g^obs).  (10)

As T*s_α^{(n)} = T*p − f_{n−1}* ∈ ∂R(f†) − f_{n−1}* = ∂R_n(f†) by Proposition 2.2, it follows that

Δ_{R_n}^{T*s_α^{(n)}}(f̂_α^{(n)}, f†) = R_n(f̂_α^{(n)}) − R_n(f†) − ⟨T*s_α^{(n)}, f̂_α^{(n)} − f†⟩
≤ (1/α) S(Tf − g^obs) − (1/α) S(Tf̂_α^{(n)} − g^obs) − ⟨T*s_α^{(n)}, f̂_α^{(n)} − f†⟩ + R_n(f) − R_n(f†).

Due to the strong duality (Proposition 2.2) the extremal relation −αp̂_α^{(n)} ∈ ∂S(Tf̂_α^{(n)} − g^obs) holds true, and thus the generalized Young equality yields

−(1/α) S(Tf̂_α^{(n)} − g^obs) = (1/α) S*(−αp̂_α^{(n)}) + ⟨p̂_α^{(n)}, Tf̂_α^{(n)} − g^obs⟩
= (1/α) S*(−αs_α^{(n)}) + ⟨s_α^{(n)}, Tf̂_α^{(n)} − g^obs⟩ − (1/α) Δ_{S*}(−αs_α^{(n)}, −αp̂_α^{(n)})
≤ (1/α) S*(−αs_α^{(n)}) + ⟨s_α^{(n)}, Tf̂_α^{(n)} − g^obs⟩,

where we have used that the Bregman distance is non-negative. Combining this

gives

Δ_{R_n}^{T*s_α^{(n)}}(f̂_α^{(n)}, f†) ≤ (1/α) S(Tf − g^obs) + (1/α) S*(−αs_α^{(n)}) + ⟨s_α^{(n)}, Tf† − g^obs⟩ + R_n(f) − R_n(f†)
= (1/α) S(Tf − g^obs) + ⟨s_α^{(n)}, Tf − g^obs⟩ + (1/α) S*(−αs_α^{(n)}) + Δ_{R_n}^{T*s_α^{(n)}}(f, f†).

Now the identity Δ_{R_n}^{T*s_α^{(n)}}(f, f†) = Δ_R^{T*p}(f, f†) shown in Lemma 2.3 completes the proof.

3 Higher order variational source conditions in Hilbert spaces

We will now go back to the Hilbert space setting, where X, Y are Hilbert spaces and R(f) := ½‖f‖²_X, S(g) := ½‖g‖²_Y, to prove (5) using variational source conditions, which are defined as follows:

Definition 3.1 (Variational source condition VSC^l(f†, Φ)). Let f† ∈ X, let Φ be a concave index function, and let n ∈ N. Then the statement

∃ω^{(n−1)} ∈ X : f† = (T*T)^{n−1} ω^{(n−1)}
∧ ∀f ∈ X : ⟨ω^{(n−1)}, f⟩ ≤ ½‖f‖² + Φ(‖Tf‖²)  (11)

will be abbreviated by VSC^{2n−1}(f†, Φ), and the statement

∃p^{(n)} ∈ Y : f† = (T*T)^{n−1} T*p^{(n)}
∧ ∀p ∈ Y : ⟨p, p^{(n)}⟩ ≤ ½‖p‖² + Φ(‖T*p‖²)  (12)

will be abbreviated by VSC^{2n}(f†, Φ). VSC^l(f†, Φ) for l ∈ N will be referred to as a variational source condition of order l with index function Φ for (the true solution) f†.
Note that VSC^1(f†, Φ) is the classical variational source condition, and VSC^2(f†, Φ) coincides with the source condition from [21] up to the term ½‖p‖², which implies that VSC^2(f†, Φ) is formally weaker than the condition in [21]. It is well known that the spectral Hölder source condition (4) with ν ∈ (0, 1] implies VSC^1(f†, A id^{ν/(ν+1)}) for some A > 0 and id(t) := t (see [24]). Therefore, it is easy to see that for any l ∈ N and ν ∈ [0, 1] the implication

f† ∈ ran((T*T)^{(l−1+ν)/2})  ⇒  ∃A > 0 : VSC^l(f†, A id^{ν/(ν+1)})  (13)

holds true. The converse implication is false for ν ∈ (0, 1), as discussed in §5.1. For ν = 1 we have by [38, Prop. 3.35] that

f† ∈ ran((T*T)^{l/2})  ⇔  ∃A > 0 : VSC^l(f†, A√·).  (14)

The aim of this section is to prove error bounds for iterated Tikhonov regularization based on these source conditions:

Theorem 3.2. Let l, m ∈ N with m ≥ l/2, let Φ be an index function, and let ψ(s) := sup_{t≥0}[st + Φ(t)] denote the Fenchel conjugate of −Φ. If VSC^l(f†, Φ) holds true, there exists a constant C > 0 depending only on l and m such that

‖f̂_α^{(m)} − f†‖² ≤ C ( δ²/α + α^{l−1} ψ(−1/α) )  for all α, δ > 0.  (15)
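For the Hölder-type index function Φ(t) = A t^{ν/(ν+1)} the conjugate term in (15) can be evaluated explicitly; for A = 1, ν = 1 one finds ψ(−1/α) = sup_{t≥0}[−t/α + √t] = α/4, so α^{l−1}ψ(−1/α) scales like α^{l−1+ν}. A quick grid-based check of this value (parameters chosen only for illustration):

```python
import numpy as np

A, nu = 1.0, 1.0

def psi_at(s_abs):
    # psi(-s_abs) = sup_{t>=0} [ -s_abs*t + A*t^(nu/(nu+1)) ], evaluated on a grid
    t = np.linspace(0.0, 10.0, 2_000_001)
    return float(np.max(-s_abs * t + A * t ** (nu / (nu + 1.0))))

alphas = [0.1, 0.01]
vals = [psi_at(1.0 / a) for a in alphas]   # expect alpha/4 for A = 1, nu = 1
```

The two values scale linearly in α (ratio 10), confirming ψ(−1/α) ∼ α^ν for this Φ.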

Proof. We choose n ∈ N such that l = 2n or l = 2n − 1. Then m ≥ n. In the following C will denote a generic constant depending only on m and l. The proof proceeds in four steps.
Step 1: Reduction to the case m = n. By Proposition 2.2 and the definition of the Bregman distance we have for all k ≥ 2 and f ∈ X that

Δ_{R_k}(f̂_α^{(k)}, f) = R_k(f̂_α^{(k)}) − R_k(f) − ⟨f − Σ_{j=1}^{k−1} T*p̂_α^{(j)}, f̂_α^{(k)} − f⟩.

By the optimality condition Σ_{j=1}^{k−1} T*p̂_α^{(j)} = f̂_α^{(k−1)} and the minimizing property of f̂_α^{(k)} (10) we have

½‖f̂_α^{(k)} − f‖² = Δ_{R_k}(f̂_α^{(k)}, f) ≤ (1/(2α)) ( ‖Tf − g^obs‖² − ‖Tf̂_α^{(k)} − g^obs‖² )  (16)

− ⟨f − f̂_α^{(k−1)}, f̂_α^{(k)} − f⟩.

Choosing f = f† gives

½‖f̂_α^{(k)} − f†‖² ≤ (1/(2α)) ‖g† − g^obs‖² + ‖f̂_α^{(k−1)} − f†‖ ‖f̂_α^{(k)} − f†‖.

Multiplying by four, subtracting ‖f̂_α^{(k)} − f†‖² on both sides and completing the square we get

‖f̂_α^{(k)} − f†‖² ≤ 2δ²/α + 4 ‖f̂_α^{(k−1)} − f†‖².

So it is enough to prove (15) for m = n, as this will then also imply the claimed error bound for all m ≥ n by the above inequality.
Step 2: Error decomposition based on Lemma 2.4. Both Assumptions (11) and (12) imply that there exist p^{(1)}, ..., p^{(n−1)} ∈ Y and ω^{(1)}, ..., ω^{(n−1)} ∈ X such

that f† = (T*T)^{j−1} T*p^{(j)} and f† = (T*T)^j ω^{(j)} for j = 1, ..., n−1. In the following we will write p^{(1)} = p and ω^{(1)} = ω. We have ∂R(f†) = {f†} = {T*p}, so Lemma 2.4 yields

‖f̂_α^{(n)} − f†‖² ≤ (1/α) ( ‖Tf − g^obs‖² + 2α ⟨s_α^{(n)}, Tf − g^obs⟩ + ‖αs_α^{(n)}‖² ) + ‖f − f†‖²  (17)
= (1/α) ‖Tf − g^obs + α(p − Σ_{k=1}^{n−1} p̂_α^{(k)})‖² + ‖f − f†‖²

for s_α^{(n)} = p − Σ_{k=1}^{n−1} p̂_α^{(k)} and all f ∈ X.
We will choose f = nf† − αω − Σ_{k=1}^{n−1} f̂_α^{(k)}. As Tω = p and Tf̂_α^{(k)} − g^obs = −αp̂_α^{(k)} by strong duality, we have

‖f̂_α^{(n)} − f†‖² ≤ (n²/α) ‖g† − g^obs‖² + ‖(n−1)f† − αω − Σ_{k=1}^{n−1} f̂_α^{(k)}‖².  (18)

It remains to bound the second term, which does not look favourable at first sight (k) † (n) † as we know that fˆα f should converge to zero slower than fˆα f − − ˆ(k) for k

b_{k,j} + b_{k−1,j+1} = b_{k,j+1}.  (21)

Using (20) we can add zero in the form

0 = −αω + Σ_{l=1}^{n−1} Σ_{k+j=l+1} (−1)^j b_{k,j} α^l ω^{(l)} = −αω + Σ_{k=1}^{n−1} Σ_{j=1}^{n−k} (−1)^j b_{k,j} α^{k+j−1} ω^{(k+j−1)}

to find that

(n−1)f† − αω − Σ_{k=1}^{n−1} f̂_α^{(k)} = Σ_{k=1}^{n−1} ( f† − f̂_α^{(k)} + Σ_{j=1}^{n−k} (−1)^j b_{k,j} α^{k+j−1} ω^{(k+j−1)} ),

and by the triangle inequality this yields (19) with

σ_k := Σ_{j=1}^{n−k} (−1)^j b_{k,j} α^{k+j−1} ω^{(k+j−1)}, k ∈ N.

It will be convenient to set σ_0 := −f† and f̂_α^{(0)} := 0.
Step 3: Proof of (15) for the case l = 2n − 1. In view of (18) and (19) it suffices to prove by induction that given VSC^{2n−1}(f†, Φ) (11) we have

‖f̂_α^{(k)} − f† − σ_k‖² ≤ C ( δ²/α + α^{2n−2} ψ(−1/α) ),  k = 0, 1, ..., n−1.  (22)

For k = 0 this is trivial. Assume now that (22) holds true for k − 1 with k ∈ {1, ..., n−1}. Insert f = f† + σ_k in (16) to get

‖f̂_α^{(k)} − f† − σ_k‖² ≤ (1/α) ( ‖g† + Tσ_k − g^obs‖² − ‖Tf̂_α^{(k)} − g^obs‖² )

− 2 ⟨f† + σ_k − f̂_α^{(k−1)}, f̂_α^{(k)} − f† − σ_k⟩.

Now we add and subtract f̂_α^{(k−1)} − f† − σ_{k−1} in the first factor of the inner product to find

‖f̂_α^{(k)} − f† − σ_k‖² ≤ (1/α) ( ‖g† + Tσ_k − g^obs‖² − ‖Tf̂_α^{(k)} − g^obs‖² )

− 2 ⟨σ_k − σ_{k−1}, f̂_α^{(k)} − f† − σ_k⟩ + 2 ‖f̂_α^{(k−1)} − f† − σ_{k−1}‖ ‖f̂_α^{(k)} − f† − σ_k‖.

The last term, denoted by E := 2 ‖f̂_α^{(k−1)} − f† − σ_{k−1}‖ ‖f̂_α^{(k)} − f† − σ_k‖,

will be dealt with at the end of this step. Because of the identity T*Tω^{(l)} = ω^{(l−1)} and (21) we have

σ_k − σ_{k−1} = α^{k−1} ω^{(k−1)} + Σ_{j=1}^{n−k} (−1)^j (b_{k,j} + b_{k−1,j+1}) α^{k+j−1} ω^{(k+j−1)}
= −(1/α) T*T σ_k + (−1)^{n−k} b_{k,n−k+1} α^{n−1} ω^{(n−1)}

for k > 1, and it is easy to see that this also holds true for k = 1. Therefore,

⟨σ_k − σ_{k−1}, f̂_α^{(k)} − f† − σ_k⟩ = (1/α) ⟨Tσ_k, g† + Tσ_k − g^obs + g^obs − Tf̂_α^{(k)}⟩
+ ⟨(−1)^{n−k} b_{k,n−k+1} α^{n−1} ω^{(n−1)}, f̂_α^{(k)} − f† − σ_k⟩,

which yields

‖f̂_α^{(k)} − f† − σ_k‖² ≤ (1/α) ( ‖g† − g^obs‖² − ‖Tf̂_α^{(k)} − Tσ_k − g^obs‖² ) + E  (23)

− (−1)^{n−k} 2 b_{k,n−k+1} α^{n−1} ⟨ω^{(n−1)}, f̂_α^{(k)} − f† − σ_k⟩.

For brevity of notation denote b = 2b_{k,n−k+1}. Apply VSC^{2n−1}(f†, Φ) (11) with f = −(−1)^{n−k} (f̂_α^{(k)} − f† − σ_k)/(bα^{n−1}) and multiply by (bα^{n−1})² to obtain

− (−1)^{n−k} 2 b_{k,n−k+1} α^{n−1} ⟨ω^{(n−1)}, f̂_α^{(k)} − f† − σ_k⟩
≤ ½ ‖f̂_α^{(k)} − f† − σ_k‖² + (bα^{n−1})² Φ( (bα^{n−1})^{−2} ‖Tf̂_α^{(k)} − g† − Tσ_k‖² ).

Combining this bound with (23) yields

½ ‖f̂_α^{(k)} − f† − σ_k‖² ≤ (1/α) ( ‖g† − g^obs‖² − ‖Tf̂_α^{(k)} − Tσ_k − g^obs‖² ) + E
+ (bα^{n−1})² Φ( (bα^{n−1})^{−2} ‖Tf̂_α^{(k)} − g† − Tσ_k‖² ).

Then we have

½ ‖f̂_α^{(k)} − f† − σ_k‖² ≤ (1/α) ( ‖g† − g^obs‖² − ‖Tf̂_α^{(k)} − Tσ_k − g^obs‖² ) + E + (bα^{n−1})² Φ( (bα^{n−1})^{−2} ‖Tf̂_α^{(k)} − g† − Tσ_k‖² )
≤ δ²/α − (1/α) ‖Tf̂_α^{(k)} − Tσ_k − g†‖² + E + (bα^{n−1})² Φ( (bα^{n−1})^{−2} ‖Tf̂_α^{(k)} − g† − Tσ_k‖² )
≤ δ²/α + b² α^{2n−2} sup_{τ≥0} [ −τ/α − (−Φ(τ)) ] + E
= δ²/α + 4 b_{k,n−k+1}² α^{2n−2} ψ(−1/α) + E.

To get rid of E = 2 ‖f̂_α^{(k−1)} − f† − σ_{k−1}‖ ‖f̂_α^{(k)} − f† − σ_k‖, subtract the term ¼ ‖f̂_α^{(k)} − f† − σ_k‖² on both sides and use Young's inequality as well as the induction hypothesis (22).

Step 4: Proof of (15) for the case l = 2n. In view of (18) and (19) it suffices to prove by induction that given VSC^{2n}(f†, Φ) (12) we have

‖f̂_α^{(k)} − f† − σ_k‖² ≤ C ( δ²/α + α^{2n−1} ψ(−1/α) ),  k = 0, ..., n−1.  (24)

Again, the case k = 0 is trivial. Assume that (24) holds true for all j = 1, ..., k−1. Note that

‖f̂_α^{(k)} − f† − σ_k‖² = ⟨f̂_α^{(k)} − f† − σ_k, f̂_α^{(k)} − f† − σ_k⟩

≤ ⟨f̂_α^{(k)} − f† − σ_k, Σ_{j=1}^k (f̂_α^{(j)} − f† − σ_j)⟩ + Σ_{j=1}^{k−1} ‖f̂_α^{(k)} − f† − σ_k‖ ‖f̂_α^{(j)} − f† − σ_j‖.

Then Young's inequality together with the induction hypothesis (24) gives

½ ‖f̂_α^{(k)} − f† − σ_k‖² ≤ ⟨f̂_α^{(k)} − f† − σ_k, Σ_{j=1}^k (f̂_α^{(j)} − f† − σ_j)⟩ + C ( δ²/α + α^{2n−1} ψ(−1/α) ).  (25)

A simple computation (for example another induction) shows that

−αω − Σ_{j=1}^k σ_j = Σ_{j=1}^{n−k−1} (−1)^j b_{k,j} α^{k+j} ω^{(k+j)} =: σ̂_k.

By VSC^{2n}(f†, Φ) (12) we have σ_k ∈ ran T*, and by Proposition 2.2 we have T*p̂_α^{(k)} ∈ ∂R_k(f̂_α^{(k)}) = {f̂_α^{(k)} − Σ_{j=1}^{k−1} T*p̂_α^{(j)}} as well as −αp̂_α^{(j)} ∈ ∂S(Tf̂_α^{(j)} − g^obs) = {Tf̂_α^{(j)} − g^obs}, such that

⟨f̂_α^{(k)} − f† − σ_k, Σ_{j=1}^k (f̂_α^{(j)} − f† − σ_j)⟩
= ⟨Σ_{j=1}^k T*p̂_α^{(j)} − T*p − T*(T*^{−1}σ_k), Σ_{j=1}^k f̂_α^{(j)} − kf† + αω + σ̂_k⟩
= ⟨Σ_{j=1}^k p̂_α^{(j)} − p − (T*^{−1}σ_k), Σ_{j=1}^k (−αp̂_α^{(j)}) + αTω + Tσ̂_k + k(g^obs − g†)⟩
= α ⟨p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}, Σ_{j=1}^k p̂_α^{(j)} − p − Tσ̂_k/α⟩ + kE,

where E := ⟨p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}, g† − g^obs⟩. On the right hand side of the scalar product we now exchange Σ_{j=1}^k p̂_α^{(j)} − p by −(T*^{−1}σ_k) to find

⟨p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}, Σ_{j=1}^k p̂_α^{(j)} − p − Tσ̂_k/α⟩
= ⟨p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}, (T*^{−1}σ_k) − Tσ̂_k/α⟩ − ‖p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}‖²

and together with the identity

(T*^{−1}σ_k) − Tσ̂_k/α = Σ_{j=1}^{n−k} (−1)^j b_{k,j} α^{k+j−1} p^{(k+j)} − Σ_{j=1}^{n−k−1} (−1)^j b_{k,j} α^{k+j−1} p^{(k+j)}
= (−1)^{n−k} b_{k,n−k} α^{n−1} p^{(n)}

it follows that

⟨f̂_α^{(k)} − f† − σ_k, Σ_{j=1}^k (f̂_α^{(j)} − f† − σ_j)⟩ = −α ‖p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}‖²

+ b_{k,n−k} α^n (−1)^{n−k} ⟨p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}, p^{(n)}⟩ + kE,  (26)

so we are finally in a position to apply VSC^{2n}(f†, Φ) (12). For brevity of notation denote b̃ = 4b_{k,n−k} and p̃ = p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}. Choose p = (−1)^{n−k} p̃/(b̃α^{n−1}) and multiply the inequality by α(b̃α^{n−1})² to obtain

4 b_{k,n−k} α^n (−1)^{n−k} ⟨p + (T*^{−1}σ_k) − Σ_{j=1}^k p̂_α^{(j)}, p^{(n)}⟩
≤ (α/2) ‖p̃‖² + b̃² α^{2n−1} Φ( (b̃α^{n−1})^{−2} ‖T*p̃‖² ).  (27)

Now combine (25), (26) and (27) to find

2 ‖f̂_α^{(k)} − f† − σ_k‖² ≤ b̃² α^{2n−1} Φ( (b̃α^{n−1})^{−2} ‖f† + σ_k − f̂_α^{(k)}‖² )
+ (α/2) ‖p̃‖² − 4α ‖p̃‖² + 4kE + C ( δ²/α + α^{2n−1} ψ(−1/α) ).  (28)

Completing the square, we get

(α/2) ‖p̃‖² − 4α ‖p̃‖² + 4kE = −(7/2) α ‖p̃‖² + 4k ⟨p̃, g† − g^obs⟩ ≤ 8k²δ²/(7α).

Now we subtract ‖f̂_α^{(k)} − f† − σ_k‖² in (28) on both sides to find

‖f̂_α^{(k)} − f† − σ_k‖² ≤ b̃² α^{2n−1} Φ( (b̃α^{n−1})^{−2} ‖f† + σ_k − f̂_α^{(k)}‖² ) − ‖f̂_α^{(k)} − f† − σ_k‖² + C ( δ²/α + α^{2n−1} ψ(−1/α) )
≤ b̃² α^{2n−1} sup_{τ≥0} [ −τ/α − (−Φ(τ)) ] + C ( δ²/α + α^{2n−1} ψ(−1/α) )
= 16 b_{k,n−k}² α^{2n−1} ψ(−1/α) + C ( δ²/α + α^{2n−1} ψ(−1/α) ).

Note that under a spectral source condition as on the left hand side of the implication (13), the VSC on the right hand side of (13) and Theorem 3.2 yield the error bound C(δ²/α + α^{l−1+ν}). For the choice α ∼ δ^{2/(l+ν)} this leads to the optimal convergence rate ‖f̂_α^{(m)} − f†‖ = O( δ^{(l−1+ν)/(l+ν)} ). However, we have derived this rate under the weaker assumption VSC^l(f†, A id^{ν/(ν+1)}), using only variational, but no spectral arguments.
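The remark above can be checked numerically in a diagonal toy model (a sketch under synthetic assumptions, with spectral smoothness l − 1 + ν = 3, i.e. f† ∈ ran((T*T)^{3/2})): two iterated Tikhonov steps recover the rate δ^{3/4}, beyond the δ^{2/3} saturation of a single step:

```python
import numpy as np

n = 4000
idx = np.arange(1, n + 1)
s = 1.0 / idx                        # singular values of a diagonal toy operator
mu = 3.0                             # spectral smoothness l - 1 + nu = 3 (l = 3, nu = 1)
f_true = s ** mu * idx ** -0.55      # f† = (T*T)^{mu/2} w with square-summable w

def bregman_it_error(delta, m, alpha):
    k = int(np.argmax(s / (s ** 2 + alpha)))   # worst-case noise direction
    g_obs = s * f_true
    g_obs[k] += delta                          # noise of norm delta
    f = np.zeros(n)
    for _ in range(m):                         # m iterated Tikhonov steps
        f = (s * g_obs + alpha * f) / (s ** 2 + alpha)
    return np.linalg.norm(f - f_true)

deltas = [1e-4, 1e-6, 1e-8]
e1 = [bregman_it_error(d, 1, d ** (2 / 3)) for d in deltas]           # single step, optimal alpha
e2 = [bregman_it_error(d, 2, d ** (2 / (mu + 1))) for d in deltas]    # two steps, alpha ~ delta^{2/(mu+1)}
r1 = np.log(e1[0] / e1[-1]) / np.log(deltas[0] / deltas[-1])
r2 = np.log(e2[0] / e2[-1]) / np.log(deltas[0] / deltas[-1])
```

The observed orders are r1 ≈ 2/3 (saturation of one quadratic Tikhonov step) and r2 ≈ μ/(μ+1) = 3/4, matching the predicted higher order rate for m = 2.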

4 Higher order convergence rates in Banach spaces

In this section we will introduce a third order version of the variational source condition (6) in Banach spaces. Let us abbreviate (6) by VSC^1(f†, Φ, R, S) in the following. First we give a definition for the second order source condition in Banach spaces based on [21, (4.2)].

Definition 4.1 (Variational source condition VSC^2(f†, Φ, R, S)). Let Φ be an index function and R a proper, convex, lower-semicontinuous functional on X. We say that f† ∈ X satisfies the second order variational source condition VSC^2(f†, Φ, R, S) if there exist p ∈ Y* such that T*p ∈ ∂R(f†) and g̃ ∈ ∂S*(p) such that

∀p′ ∈ Y* : ⟨p′ − p, g̃⟩ ≤ ½ Δ_{S*}^{g̃}(p′, p) + Φ( Δ_{R*}^{f†}(T*p′, T*p) ).  (29)

Remark 4.2. Let Assumption 2.1 hold. Then ∂S*(p) = {J_{q*,Y*}(p)}, with J_{q*,Y*} being the duality mapping defined in the appendix. So VSC^2(f†, Φ, R, S) is equivalent to [21, (4.2)] up to the additional term ½ Δ_{S*}(p′, p). It is easy to see from the proof of [21, Theorem 4.4] that one can still conclude convergence rates

Δ_R(f̂_α, f†) ≤ α^{q*−1} (−Φ)*( −1/α^{q*−1} ) + D̃ δ^q/α,  (30)

with a slightly changed constant D̃ > 0.

Definition 4.3 (Variational source condition VSC^3(f†, Φ, R, S)). Let Φ be an index function and R a proper, convex, lower-semicontinuous functional on

15 . We say that f † satisfies the third order variational source condition VSCX 3(f †, Φ, , ) if∈ there X exist p ∗ and ω such that T ∗p ∂ (f †) and T ω ∂ ∗(p)RandS if there exist constants∈ Y β ∈0 X, µ > 1 and t >∈ 0 Ras well as f ∗ ∈∂ S(f † tω) for all 0 1 Rthe conditionS VSC3(f †, Φ) is equivalent to ≥ 1 f t> 0: ω,f f 2 +Φ Tf 2 + βt2µ−2, ∀ ∈ X ∀ h i≤ 2 k k k k   as the limit t 0 gives back the original inequality. Now we replace f by † f−f +tω → 2 t and multiply by t to see that this is equivalent to

⟨tω, f† − tω − f⟩ ≤ ½ ‖f − f† + tω‖² + t² Φ( ‖Tf − g† + tTω‖² / t² ) + βt^{2μ},

which is equivalent to VSC^3(f†, Φ, R_sq, S_sq).
We can now state the main result of this section:

Theorem 4.5. Suppose Assumption 2.1 holds and that VSC^3(f†, Φ, R, S) is satisfied with constants β, μ, and t̄, and that c^{−1}δ ≤ α^{q*−1} ≤ t̄ for some c > 0. Define Φ̃(s) := Φ(s^{q/r}). Then the error is bounded by

Δ_R(f̂_α^{(2)}, f†) ≤ C ( δ^q/α + α^{2(q*−1)} (−Φ̃)*( −C̃ (c + ‖Tω‖)^{q−r} α^{−(q*−1)} ) + βα^{2μ(q*−1)} )

with constants C, C̃ > 0 depending at most on q, r, c, and c_{q,Y} and c_{q*,Y*} from Lemma A.2.

The proof consists of the following three lemmas. First we show that Δ_R(f̂_α^{(2)}, f†) is related to the Bregman distance (1/α) Δ_{S*}(−αp̂_α, −αp), as we will later actually use VSC^3(f†, Φ, R, S) to prove convergence rates for p̂_α.

Lemma 4.6. If T*p ∈ ∂R(f†), then

Δ_R(f̂_α^{(2)}, f†) ≤ (2/α) ( S(g† − g^obs) + (1/c_{q*,Y*}) Δ_{S*}(−αp̂_α, −αp) ).

Proof. We apply Lemma 2.4 with f = f† to find

Δ_R(f̂_α^{(2)}, f†) ≤ (1/α) S(g† − g^obs) + ⟨p − p̂_α, g† − g^obs⟩ + (1/α) S*(−α(p − p̂_α)).

The generalized Young inequality applied to the middle term yields

Δ_R(f̂_α^{(2)}, f†) ≤ (2/α) ( S(g† − g^obs) + S*(−α(p − p̂_α)) ).

As Y is q-smooth, Y* is q*-convex, so we can apply Lemma A.2 to obtain

Δ_R(f̂_α^{(2)}, f†) ≤ (2/α) ( S(g† − g^obs) + c_{q*,Y*}^{−1} Δ_{S*}(−αp̂_α, −αp) ).

The next lemma shows convergence rates in the image space. Such rates have also been shown under a first order variational source condition on f† in [27, Theorem 2.3].

Lemma 4.7. Suppose there exist p ∈ Y* and ω ∈ X such that T*p ∈ ∂R(f†) and Tω ∈ ∂S*(p). Then there exists a constant C_q > 0 depending only on q such that

‖Tf̂_α − g†‖ ≤ C_q ( δ + α^{q*−1} ‖Tω‖ ).

Proof. From [38, Lemma 3.20] we get

(1/(2^{q−1} q)) ‖Tf̂_α − g†‖^q ≤ S(Tf̂_α − g^obs) + S(g† − g^obs).

By the minimizing property of f̂_α (10) we have

S(Tf̂_α − g^obs) − S(g† − g^obs) ≤ α ( R(f†) − R(f̂_α) ) = −α Δ_R(f̂_α, f†) − α ⟨T*p, f̂_α − f†⟩.

Now using the non-negativity of the Bregman distance we have

(1/(2^{q−1} q)) ‖Tf̂_α − g†‖^q ≤ 2 S(g† − g^obs) + α ‖p‖ ‖Tf̂_α − g†‖
≤ 2δ^q/q + (2^{−q}/q) ‖Tf̂_α − g†‖^q + ((2α)^{q*}/q*) ‖p‖^{q*},

where the last inequality follows from ‖g† − g^obs‖ ≤ δ as well as from the generalized Young inequality. Therefore we have

‖Tf̂_α − g†‖^q ≤ 2^q q ( 2δ^q/q + ((2α)^{q*}/q*) ‖p‖^{q*} ).

The claim then follows from taking the q-th root and noticing that we have ‖p‖^{q*} = ‖J_{q,Y}(Tω)‖^{q*} = ‖Tω‖^q (see (57)) as well as α^{q*/q} = α^{q*−1}.
The main part of the proof of Theorem 4.5 consists in the derivation of convergence rates for the dual problem:

Lemma 4.8. Suppose that Assumption 2.1 holds true and define α_q := α^{q*−1}, Φ̃(s) := Φ(s^{q/r}). Moreover, let VSC^3(f†, Φ, R, S) hold true with constants β, μ, and t̄. If α is chosen such that c^{−1}δ ≤ α_q ≤ t̄ for some c > 0, then

(1/(2α)) Δ_{S*}(−αp̂_α, −αp) ≤ C ( δ^q/α + α_q² (−Φ̃)*( −C̃ (c + ‖Tω‖)^{q−r} / α_q ) + βα_q^{2μ} ),

where C, C̃ > 0 depend at most on q, r, c, c_{q,Y}, and c_{q*,Y*}.

Proof. It follows from (58) and Tω ∈ ∂S*(p) that −α_q Tω ∈ ∂S*(−αp). Together with (9) we obtain

(1/α) Δ_{S*}^{sym}(−αp̂_α, −αp) = (1/α) ⟨−αp − (−αp̂_α), −α_q Tω − (Tf̂_α − g^obs)⟩
= ⟨T*p̂_α − T*p, f† − α_q ω − f̂_α⟩ + ⟨p̂_α − p, g^obs − g†⟩.

The second term E := ⟨p̂_α − p, g^obs − g†⟩ will be estimated later. Artificially adding zero in the form f*_{α_q} − f*_{α_q} with f*_{α_q} ∈ ∂R(f† − α_q ω), we find

(1/α) Δ_{S*}^{sym}(−αp̂_α, −αp) = ⟨f*_{α_q} − T*p, f† − α_q ω − f̂_α⟩ + E + ⟨T*p̂_α − f*_{α_q}, f† − α_q ω − f̂_α⟩.

In view of (9) the last term is the negative symmetric Bregman distance −Δ_R^{sym}(f̂_α, f† − α_q ω). The first term can be bounded using VSC^3(f†, Φ, R, S) by choosing f = f̂_α and t = α_q:

(1/α) Δ_{S*}^{sym}(−αp̂_α, −αp) ≤ α_q² Φ̃( α_q^{−r} ‖Tf̂_α − g† + α_q Tω‖^r ) + βα_q^{2μ}
+ Δ_R(f̂_α, f† − α_q ω) − Δ_R^{sym}(f̂_α, f† − α_q ω) + E
≤ α_q² Φ̃( α_q^{−r} ‖Tf̂_α − g† + α_q Tω‖^r ) + βα_q^{2μ} + E.


Now we use our joker. We subtract

(1/α) Δ_{S*}(−αp, −αp̂_α) = (1/α) Δ_S(Tf̂_α − g^obs, −α_q Tω)

(see (8)) from both sides, leading to

(1/α) Δ_{S*}(−αp̂_α, −αp) ≤ α_q² Φ̃( α_q^{−r} ‖Tf̂_α − g† + α_q Tω‖^r )
− (1/α) Δ_S(Tf̂_α − g^obs, −α_q Tω) + βα_q^{2μ} + E.  (31)

So we need to bound Δ := Δ_S(Tf̂_α − g^obs, −α_q Tω) from below. By Lemma A.2 we have (as q ≤ r)

Δ ≥ c_{q,Y} max( ‖α_q Tω‖, ‖Tf̂_α − g^obs + α_q Tω‖ )^{q−r} ‖Tf̂_α − g^obs + α_q Tω‖^r.

Moreover, it follows from Lemma 4.7 and the choice δ ≤ cα_q that

‖Tf̂_α − g^obs + α_q Tω‖ ≤ ‖Tf̂_α − g†‖ + ‖g† − g^obs‖ + α_q ‖Tω‖

$$\le C_q\,(\delta + \alpha_q\|T\omega\|) \le C_q\,\alpha_q\,(c + \|T\omega\|).$$
Therefore,

$$\max\left\{\|\alpha_q T\omega\|,\ \|T\hat f_\alpha - g^{\mathrm{obs}} + \alpha_q T\omega\|\right\} \le \alpha_q\max\left\{\|T\omega\|,\ C_q(c + \|T\omega\|)\right\} \le \alpha_q\max\{1, C_q\}\,(c + \|T\omega\|).$$
Hence, there exists a constant $\tilde C > 0$ depending on $q$, $c_{q,\mathcal{Y}}$, and $r$ such that

$$\frac{1}{\alpha}\Delta \ge 2^{r-1}\tilde C\,(c + \|T\omega\|)^{q-r}\,\alpha^{(q^*-1)(q-r)-1}\left\|T\hat f_\alpha - g^{\mathrm{obs}} + \alpha_q T\omega\right\|^r.$$

Note that $(q^*-1)(q-r) - 1 = -(r-1)(q^*-1)$. In order to replace $g^{\mathrm{obs}}$ by $g^\dagger$ on the right hand side we use the inequality
$$2^{1-r}\left\|T\hat f_\alpha - g^\dagger + \alpha_q T\omega\right\|^r - \left\|T\hat f_\alpha - g^{\mathrm{obs}} + \alpha_q T\omega\right\|^r \le \left\|g^\dagger - g^{\mathrm{obs}}\right\|^r$$

(see [38, Lemma 3.20]) leading to

$$-\frac{1}{\alpha}\Delta \le -\frac{\tilde C\,(c + \|T\omega\|)^{q-r}}{\alpha_q^{r-1}}\left(\left\|T\hat f_\alpha - g^\dagger + \alpha_q T\omega\right\|^r - 2^{r-1}\delta^r\right).$$

Inserting this into (31) yields
$$\begin{aligned}\frac{1}{\alpha}\Delta_{\mathcal{S}^*}(-\alpha\hat p_\alpha, -\alpha p) &\le \alpha_q^2\,\tilde\Phi\!\left(\alpha_q^{-r}\left\|T\hat f_\alpha - g^\dagger + \alpha_q T\omega\right\|^r\right) + \beta\alpha_q^{2\mu} + E\\
&\quad - \frac{\tilde C\,(c+\|T\omega\|)^{q-r}}{\alpha_q^{r-1}}\left(\left\|T\hat f_\alpha - g^\dagger + \alpha_q T\omega\right\|^r - 2^{r-1}\delta^r\right)\\
&\le \alpha_q^2\,\sup_{\tau\ge 0}\left[\tilde\Phi(\tau) - \tilde C\,(c+\|T\omega\|)^{q-r}\,\alpha_q^{-1}\,\tau\right]\\
&\quad + \tilde C\,(c+\|T\omega\|)^{q-r}\,2^{r-1}\,\frac{\delta^r}{\alpha_q^{r-1}} + E + \beta\alpha_q^{2\mu}.\end{aligned}$$

The supremum equals $(-\tilde\Phi)^*\!\big({-\tilde C(c+\|T\omega\|)^{q-r}\alpha_q^{-1}}\big)$ by the definition of the convex conjugate. To deal with $E$ we use the generalized Young inequality
$$\left\langle \left(\frac{c_{q^*,\mathcal{Y}^*}}{2\alpha}\right)^{1/q^*}(\alpha\hat p_\alpha - \alpha p),\ \left(\frac{c_{q^*,\mathcal{Y}^*}}{2\alpha}\right)^{-1/q^*}\frac{g^{\mathrm{obs}} - g^\dagger}{\alpha}\right\rangle \le \frac{c_{q^*,\mathcal{Y}^*}}{2\alpha}\,\frac{\|\alpha\hat p_\alpha - \alpha p\|^{q^*}}{q^*} + \frac{1}{q}\left(\frac{c_{q^*,\mathcal{Y}^*}}{2\alpha}\right)^{-q/q^*}\frac{\delta^q}{\alpha^q}$$
and apply Lemma A.2, using that $\mathcal{Y}^*$ is $q^*$-convex, to find
$$E = \left\langle\hat p_\alpha - p,\ g^{\mathrm{obs}} - g^\dagger\right\rangle \le \frac{1}{2\alpha}\Delta_{\mathcal{S}^*}(-\alpha\hat p_\alpha, -\alpha p) + \frac{1}{q}\left(\frac{c_{q^*,\mathcal{Y}^*}}{2}\right)^{-q/q^*}\frac{\delta^q}{\alpha}.$$
The assumption $\delta \le c\alpha_q$, or equivalently $\delta^{r-q} \le c^{r-q}\alpha_q^{r-q}$, implies $\frac{\delta^r}{\alpha_q^{r-1}} \le c^{r-q}\frac{\delta^q}{\alpha_q^{q-1}} = c^{r-q}\frac{\delta^q}{\alpha}$. Further $(c + \|T\omega\|)^{q-r} \le c^{q-r}$, hence there exists a constant $C > 0$ depending on $q$, $r$, $c$, $c_{q,\mathcal{Y}}$, and $c_{q^*,\mathcal{Y}^*}$ such that

$$\frac{1}{2\alpha}\Delta_{\mathcal{S}^*}(-\alpha\hat p_\alpha, -\alpha p) \le C\,\frac{\delta^q}{\alpha} + \alpha_q^2\,(-\tilde\Phi)^*\!\left(-\tilde C\,(c+\|T\omega\|)^{q-r}\,\alpha_q^{-1}\right) + \beta\alpha_q^{2\mu},$$
which completes the proof.

Now Theorem 4.5 is an immediate consequence of Lemma 4.6 and Lemma 4.8.
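The generalized Young inequality used above to split the data-error term $E$ can be spot-checked numerically. The following sketch (random Euclidean test vectors of my own choosing, not part of the paper) verifies $\langle x, y\rangle \le \|x\|^{q^*}/q^* + \|y\|^q/q$ for a pair of conjugate exponents:

```python
# Spot-check of the generalized Young inequality <x,y> <= ||x||^p/p + ||y||^q/q
# with conjugate exponents 1/p + 1/q = 1 (here p = q*).
import random, math

random.seed(0)
q = 1.5
q_star = q / (q - 1.0)          # conjugate exponent, 1/q + 1/q_star = 1
for _ in range(1000):
    x = [random.uniform(-2, 2) for _ in range(5)]
    y = [random.uniform(-2, 2) for _ in range(5)]
    inner = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Cauchy-Schwarz followed by scalar Young's inequality
    assert inner <= nx**q_star / q_star + ny**q / q + 1e-12
```

The proof applies this with the scaling factors shown in the display, so that the first Young term is absorbed into the Bregman distance via Lemma A.2.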

5 Verification of higher order variational source conditions

In this section we provide some examples of how higher order VSCs can be verified for specific inverse problems.

5.1 Hilbert spaces

In the following we introduce spaces $\mathcal{X}_\kappa$ which are defined by conditions due to Neubauer [33] and describe necessary and sufficient conditions for rates of convergence of spectral regularization methods. Let $E_\lambda^{T^*T} := 1_{[0,\lambda)}(T^*T)$, $\lambda \ge 0$, denote the spectral projections for the operator $T^*T$ with the characteristic function $1_{[0,\lambda)}$ of the interval $[0,\lambda)$. For an index function $\kappa$ we define

$$\mathcal{X}_\kappa^T := \left\{f \in \mathcal{X} : \|f\|_{\mathcal{X}_\kappa^T} < \infty\right\}, \qquad \|f\|_{\mathcal{X}_\kappa^T} := \sup_{\lambda>0}\frac{1}{\kappa(\lambda)}\left\|E_\lambda^{T^*T}f\right\|_{\mathcal{X}}. \qquad (32)$$
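For a diagonal toy operator the estimate behind part (i) of the proof of Theorem 5.1 below can be illustrated directly: if $f = (T^*T)^{l/2}\omega$, then the norm of $f$ in the $\tilde\kappa$-space, $\tilde\kappa(t) = \kappa(t)t^{l/2}$, is controlled by the norm of $\omega$ in the $\kappa$-space. The spectrum and smoothness parameters below are my own illustrative choices:

```python
# Toy diagonal check: spectral projections E_lambda truncate the eigenvector
# expansion, and f = (T*T)^(l/2) w implies ||f||_{X_kappa_tilde} <= ||w||_{X_kappa}.
import numpy as np

lam = np.logspace(0, -8, 300)               # eigenvalues of T*T, descending
w = np.random.default_rng(1).uniform(-1, 1, lam.size)
l, nu = 2, 0.5
kappa = lambda t: t**(nu / 2)
kappa_t = lambda t: kappa(t) * t**(l / 2)   # kappa_tilde from Theorem 5.1
f = lam**(l / 2) * w                        # f = (T*T)^(l/2) w

def xnorm(v, index_fn):
    # sup over lambda of ||E_lambda v|| / index_fn(lambda); on a discrete
    # spectrum it suffices to evaluate the sup at the eigenvalues
    return max(np.sqrt(np.sum(v[lam < t]**2)) / index_fn(t) for t in lam)

assert xnorm(f, kappa_t) <= xnorm(w, kappa) + 1e-12
```

The inequality holds because $\tilde\lambda^l \le \lambda^l$ on the range of $E_\lambda$, exactly as in the chain of estimates in the proof.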

The function $\kappa$ corresponds to the function in spectral source conditions $f^\dagger \in \mathrm{ran}(\kappa(T^*T))$. The corresponding convergence rate function is

$$\Phi_\kappa(t) := \kappa\!\left(\Theta_\kappa^{-1}(\sqrt{t})\right)^2, \qquad \Theta_\kappa(\lambda) := \sqrt{\lambda}\,\kappa(\lambda).$$
In particular, $\Phi_{\mathrm{id}^{\nu/2}} = \mathrm{id}^{\nu/(\nu+1)}$. The following theorem is a generalization of [26, Thm. 3.1] from the special case $l = 1$ to general $l \in \mathbb{N}$:

Theorem 5.1. Let $\kappa$ be an index function such that $t \mapsto \kappa(t)^2/t^{1-\mu}$ is decreasing for some $\mu \in (0,1)$, $\kappa\cdot\kappa$ is concave, and $\kappa$ is decaying sufficiently rapidly such that
$$C_\kappa := \sup_{0<\lambda\le\|T^*T\|}\frac{\sum_{k=0}^\infty \kappa(2^{-k}\lambda)^2}{\kappa(\lambda)^2} < \infty. \qquad (33)$$
Moreover, let $l \in \mathbb{N}$ and define $\tilde\kappa(t) = \kappa(t)\,t^{l/2}$. Then
$$f^\dagger \in \mathcal{X}_{\tilde\kappa}^T \quad\Leftrightarrow\quad \exists A > 0:\ \mathrm{VSC}^{l+1}(f^\dagger, A\Phi_\kappa). \qquad (34)$$
Note that condition (33) holds true for all power functions $\kappa(t) = t^\nu$ with $\nu > 0$, but not for logarithmic functions $\kappa(t) = (-\ln t)^{-p}$ with $p > 0$. The first two conditions on the other hand imply that $\kappa$ must not decay to $0$ too rapidly. They are both satisfied for power functions $\kappa(t) = t^{\nu/2}$ if and only if $\nu \in (0,1)$. We point out that for the case $l = 1$ the condition (33) is not required.

Proof. We first show for all $l \in \mathbb{N}$ that
$$f^\dagger \in \mathcal{X}_{\tilde\kappa}^T \quad\Leftrightarrow\quad \exists\,\omega^{(l/2)} \in \mathcal{X}_\kappa^T:\ f^\dagger = (T^*T)^{l/2}\omega^{(l/2)}, \qquad (35)$$
which together with the special case $l = 0$ from [26, Thm 3.1] already implies (34) for even $l$:

(i) Assume there exists $\omega^{(l/2)} \in \mathcal{X}_\kappa^T$ such that $f^\dagger = (T^*T)^{l/2}\omega^{(l/2)}$. Define $\int_0^{\lambda+} := \lim_{\varepsilon\searrow 0}\int_0^{\lambda+\varepsilon}$. Then,

$$\begin{aligned}\|f^\dagger\|^2_{\mathcal{X}_{\tilde\kappa}^T} &= \sup_{\lambda>0}\frac{1}{\tilde\kappa(\lambda)^2}\left\|E_\lambda^{T^*T}(T^*T)^{l/2}\omega^{(l/2)}\right\|^2 = \sup_{\lambda>0}\frac{1}{\tilde\kappa(\lambda)^2}\int_0^{\lambda+}\tilde\lambda^l\, d\left\|E_{\tilde\lambda}\,\omega^{(l/2)}\right\|^2\\
&\le \sup_{\lambda>0}\frac{\lambda^l}{\tilde\kappa(\lambda)^2}\int_0^{\lambda+} d\left\|E_{\tilde\lambda}\,\omega^{(l/2)}\right\|^2 = \sup_{\lambda>0}\frac{1}{\kappa(\lambda)^2}\int_0^{\lambda+} d\left\|E_{\tilde\lambda}\,\omega^{(l/2)}\right\|^2 = \left\|\omega^{(l/2)}\right\|^2_{\mathcal{X}_\kappa^T} < \infty.\end{aligned}$$

(ii) Now assume that $f^\dagger \in \mathcal{X}_{\tilde\kappa}^T$. It follows that
$$\begin{aligned}\frac{1}{\kappa(\lambda)^2}\int_0^{\lambda+}\tilde\lambda^{-l}\, d\|E_{\tilde\lambda}f^\dagger\|^2 &= \frac{1}{\kappa(\lambda)^2}\sum_{k=0}^\infty\int_{2^{-k-1}\lambda}^{2^{-k}\lambda+}\tilde\lambda^{-l}\, d\|E_{\tilde\lambda}f^\dagger\|^2\\
&\le \frac{1}{\kappa(\lambda)^2}\sum_{k=0}^\infty (2^{-k-1}\lambda)^{-l}\int_{2^{-k-1}\lambda}^{2^{-k}\lambda+} d\|E_{\tilde\lambda}f^\dagger\|^2\\
&\le \sum_{k=0}^\infty \frac{\kappa(2^{-k}\lambda)^2}{\kappa(\lambda)^2}\,\frac{1}{\kappa(2^{-k}\lambda)^2\,(2^{-k-1}\lambda)^l}\int_0^{2^{-k}\lambda+} d\|E_{\tilde\lambda}f^\dagger\|^2\\
&= 2^l\lim_{\varepsilon\searrow 0}\sum_{k=0}^\infty \frac{\kappa(2^{-k}\lambda)^2}{\kappa(\lambda)^2\,\tilde\kappa(2^{-k}\lambda)^2}\left\|E_{2^{-k}\lambda+\varepsilon}f^\dagger\right\|^2\\
&\le 2^l\sum_{k=0}^\infty \frac{\kappa(2^{-k}\lambda)^2}{\kappa(\lambda)^2}\,\|f^\dagger\|^2_{\mathcal{X}_{\tilde\kappa}^T} \le 2^l C_\kappa\,\|f^\dagger\|^2_{\mathcal{X}_{\tilde\kappa}^T}.\end{aligned}$$

This shows that $\omega^{(l/2)} := \int_0^\infty\tilde\lambda^{-l/2}\, dE_{\tilde\lambda}f^\dagger$ is well defined. Moreover, we have $\omega^{(l/2)} = (T^*T)^{-l/2}f^\dagger$ and $\|\omega^{(l/2)}\|_{\mathcal{X}_\kappa^T} < \infty$.

To prove the theorem in the case of odd $l$ we use the polar decomposition $T = U(T^*T)^{1/2}$ with a partial isometry $U$ satisfying $N(U) = N(T)$ and set $p^{((l+1)/2)} := U\omega^{(l/2)}$. As $U: \mathcal{X}_\kappa^T \to \mathcal{Y}_\kappa^{TT^*}$ is an isometry, (35) implies
$$f^\dagger \in \mathcal{X}_{\tilde\kappa}^T \quad\Leftrightarrow\quad \exists\, p^{((l+1)/2)} \in \mathcal{Y}_\kappa^{TT^*}:\ f^\dagger = (T^*T)^{(l-1)/2}\,T^*\,p^{((l+1)/2)}. \qquad (36)$$
Applying (34) for $l = 0$ from [26, Thm. 3.1] to $\mathcal{Y}$ and $TT^*$ yields (34) for the case of odd $l$.

The equivalence (34) together with the equivalence in [1, Prop. 4.1] also shows that in Hilbert spaces higher order variational source conditions are equivalent to certain symmetrized multiplicative variational source conditions. We have already seen at the end of Section 3 that $\mathrm{VSC}^l(f^\dagger, A\,\mathrm{id}^{\nu/(\nu+1)})$ implies the order optimal convergence rate $\|\hat f_\alpha^{(m)} - f^\dagger\| = \mathcal{O}\big(\delta^{(l-1+\nu)/(l+\nu)}\big)$ for an optimal choice of $\alpha$ and $m \ge l/2$. It follows from [26] and Theorem 5.1 that $\mathrm{VSC}^l\big(f^\dagger, A\,\mathrm{id}^{\nu/(\nu+1)}\big)$ with $\nu \in (0,1)$ is not only a sufficient condition for this rate of convergence, but in contrast to spectral Hölder source conditions also a necessary condition:

Corollary 5.2. Let $l \in \mathbb{N}$, $m \ge l/2$, and $\nu \in (0,1)$. Moreover, let $f^\dagger \ne 0$ and let $\hat f_\alpha^{(m)} = \hat f_\alpha^{(m)}(g^{\mathrm{obs}})$ denote the $m$-times iterated Tikhonov estimator. Then the following statements are equivalent:

1. $\exists A > 0:\ \mathrm{VSC}^l\big(f^\dagger, A\,\mathrm{id}^{\nu/(\nu+1)}\big)$

2.

$$\exists C > 0\ \forall\delta > 0:\quad \inf_{\alpha>0}\ \sup_{\|g^{\mathrm{obs}} - Tf^\dagger\|\le\delta}\left\|\hat f_\alpha^{(m)}(g^{\mathrm{obs}}) - f^\dagger\right\| \le C\delta^{(l-1+\nu)/(l+\nu)}.$$
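The rate function $\Phi_\kappa$ from the beginning of this subsection can be evaluated numerically by inverting $\Theta_\kappa$. The sketch below (the bisection bracket and tolerances are my own choices) confirms $\Phi_{\mathrm{id}^{\nu/2}} = \mathrm{id}^{\nu/(\nu+1)}$ for a power-type index function:

```python
# Evaluate Phi_kappa(t) = kappa(Theta_kappa^{-1}(sqrt(t)))^2 with
# Theta_kappa(lam) = sqrt(lam)*kappa(lam), and compare with the closed form
# t**(nu/(nu+1)) valid for kappa(lam) = lam**(nu/2).
import math

def phi_kappa(t, kappa, lam_hi=1e8):
    """Evaluate Phi_kappa(t) by inverting the increasing function Theta_kappa."""
    theta = lambda lam: math.sqrt(lam) * kappa(lam)
    target = math.sqrt(t)
    lo, hi = 0.0, lam_hi
    for _ in range(200):            # bisection; Theta_kappa is strictly increasing
        mid = 0.5 * (lo + hi)
        if theta(mid) < target:
            lo = mid
        else:
            hi = mid
    return kappa(0.5 * (lo + hi)) ** 2

nu = 0.7
kappa = lambda lam: lam ** (nu / 2)
for t in [1e-4, 1e-2, 1.0]:
    assert abs(phi_kappa(t, kappa) - t ** (nu / (nu + 1))) < 1e-6
```

The same routine works for any index function for which $\Theta_\kappa$ can be evaluated, e.g. logarithmic ones, where no closed-form rate exists.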

For operators which are a-times smoothing in the sense specified below, higher order variational source conditions can be characterized in terms of Besov spaces in analogy to first order variational source conditions (see [26]):

Corollary 5.3. Assume that $\mathcal{M}$ is a connected, smooth Riemannian manifold, which is complete, has injectivity radius $r > 0$ and a bounded geometry (see [41] for further discussions), and that $T: H^s(\mathcal{M}) \to H^{s+a}(\mathcal{M})$ is bounded and boundedly invertible for some $a > 0$ and all $s \in \mathbb{R}$. Then for all $f^\dagger \in L^2(\mathcal{M})$, all $l \in \mathbb{N}$ and all $\nu \in (0,1)$ we have
$$\exists A > 0:\ \mathrm{VSC}^l\big(f^\dagger, A\,\mathrm{id}^{\frac{\nu}{\nu+1}}\big) \quad\Leftrightarrow\quad f^\dagger \in B_{2,\infty}^{(l-1+\nu)a}(\mathcal{M}), \qquad (37a)$$
$$\exists A > 0:\ \mathrm{VSC}^l\big(f^\dagger, A\sqrt{\cdot}\,\big) \quad\Leftrightarrow\quad f^\dagger \in B_{2,2}^{la}(\mathcal{M}) = H^{la}(\mathcal{M}). \qquad (37b)$$

5.2 Non-quadratic smooth penalty terms

The results in this subsection do not require Hilbert spaces and are formulated in the more general setting of Assumption 2.1. In [21, Lemma 5.3] it is stated that under a smoothness condition on $\mathcal{R}$ the $\mathrm{VSC}^2(f^\dagger, \Phi, \mathcal{R}, \mathcal{S})$ follows from the benchmark condition $T^*p \in \partial\mathcal{R}(f^\dagger)$, $T\omega \in \partial\mathcal{S}^*(p)$. In the following proposition we generalize this result to rates corresponding to Hölder conditions with exponent $\nu \in (1,2)$ using the technique of [26, Thm. 2.1].

Proposition 5.4 (verification of $\mathrm{VSC}^2(f^\dagger, \Phi, \mathcal{R}, \mathcal{S})$). Suppose that Assumption 2.1 holds true. Let $\tilde{\mathcal{X}}$ be a Banach space continuously embedded in $\mathcal{X}$ and assume that $\mathcal{R}$ is continuously Fréchet-differentiable in a neighborhood of $f^\dagger$ with respect to $\|\cdot\|_{\tilde{\mathcal{X}}}$ and that $\mathcal{R}': \tilde{\mathcal{X}} \to \tilde{\mathcal{X}}^*$ is uniformly Lipschitz continuous with respect to $\|\cdot\|_{\tilde{\mathcal{X}}}$ in this neighborhood. Further assume that there exists $p \in \mathcal{Y}^*$ such that $T^*p \in \partial\mathcal{R}(f^\dagger)$, $\mathcal{R}'[f^\dagger] = (T^*p)|_{\tilde{\mathcal{X}}}$, and define $\tilde g := J_{q^*,\mathcal{Y}^*}(p)$. Suppose that there exists a family of operators $P_k \in L(\mathcal{Y})$ indexed by $k \in \mathbb{N}$ such that $P_k\tilde g \in T\tilde{\mathcal{X}}$ for all $k \in \mathbb{N}$, and let
$$\kappa_k := \|(I - P_k)\tilde g\|_{\mathcal{Y}}, \qquad \sigma_k := \max\left\{\|T^{-1}P_k\tilde g\|_{\tilde{\mathcal{X}}},\ 1\right\}. \qquad (38)$$
If $\lim_{k\to\infty}\kappa_k = 0$, then there exists $C > 0$ such that $\mathrm{VSC}^2(f^\dagger, \Phi, \mathcal{R}, \mathcal{S})$ holds true with the index function

$$\Phi(\tau) := C\inf_{k\in\mathbb{N}}\left[\sigma_k\,\tau^{1/2} + \kappa_k^q\right]. \qquad (39)$$
Proof. We show (29) for all $\bar p \in \mathcal{Y}^*$ by distinguishing three cases:

Case 1: $\bar p \in \mathcal{A} := \big\{\bar p \in \mathcal{Y}^* : \langle\bar p - p, \tilde g\rangle \le \frac{1}{A}\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2}\big\}$, with a constant $A > 0$ whose exact value will be chosen later. For these $\bar p$ the inequality thus holds with $\Phi(\tau) = \frac{1}{A}\sqrt{\tau}$, which is smaller than (39) for $C \ge 1/A$.

Case 2: $\bar p \in \mathcal{B} := \big\{\bar p \in \mathcal{Y}^* : \Delta_{\mathcal{S}^*}(\bar p, p)^{1-1/q^*} \ge 2c_{q^*,\mathcal{Y}^*}^{-1/q^*}\|\tilde g\|\big\}$. $\mathcal{Y}^*$ is $q^*$-convex, as $\mathcal{Y}$ is $q$-smooth, so by Lemma A.2 we have for all $\bar p \in \mathcal{B}$ that
$$\langle\bar p - p, \tilde g\rangle \le \|\bar p - p\|\,\|\tilde g\| \le \frac{1}{2}\Delta_{\mathcal{S}^*}(\bar p, p),$$
so (29) also holds for $\bar p \in \mathcal{B}$.

Case 3: $\bar p \in \mathcal{Y}^* \setminus (\mathcal{A}\cup\mathcal{B})$. By our regularity assumptions on $\mathcal{R}$, there exist

constants $C_{f^\dagger}, c > 0$ such that for all $f \in \tilde{\mathcal{X}}$ with $\|f - f^\dagger\|_{\tilde{\mathcal{X}}} \le C_{f^\dagger}$ we have the first order Taylor approximation

$$\mathcal{R}(f) \le \mathcal{R}(f^\dagger) + \left\langle T^*p, f - f^\dagger\right\rangle + \frac{c}{2}\|f - f^\dagger\|^2_{\tilde{\mathcal{X}}},$$
where $c$ is the Lipschitz constant of $\mathcal{R}'$. Applying Young's inequalities $\mathcal{R}(f) + \mathcal{R}^*(T^*\bar p) \ge \langle T^*\bar p, f\rangle$ and $\mathcal{R}(f^\dagger) + \mathcal{R}^*(T^*p) = \langle T^*p, f^\dagger\rangle$, we find
$$\mathcal{R}^*(T^*\bar p) \ge \mathcal{R}^*(T^*p) + \left\langle T^*(\bar p - p), f^\dagger\right\rangle + \left\langle T^*(\bar p - p), f - f^\dagger\right\rangle - \frac{c}{2}\|f - f^\dagger\|^2_{\tilde{\mathcal{X}}}$$

for all $\bar p \in \mathcal{Y}^*$ and for all $f \in \tilde{\mathcal{X}}$ with $\|f - f^\dagger\|_{\tilde{\mathcal{X}}} \le C_{f^\dagger}$, which is equivalent to

$$\left\langle T^*(\bar p - p),\ f - f^\dagger\right\rangle \le \Delta_{\mathcal{R}^*}(T^*\bar p, T^*p) + \frac{c}{2}\|f - f^\dagger\|^2_{\tilde{\mathcal{X}}} \qquad (40)$$

for all $\bar p \in \mathcal{Y}^*$ and for all $f \in \tilde{\mathcal{X}}$ with $\|f - f^\dagger\|_{\tilde{\mathcal{X}}} \le C_{f^\dagger}$. We decompose the left hand side of (29) as follows:

$$\langle\bar p - p, \tilde g\rangle = \langle\bar p - p, P_k\tilde g\rangle + \langle\bar p - p, (I - P_k)\tilde g\rangle.$$
Now for some small $\varepsilon > 0$ choose $f$ in (40) as $f = f^\dagger + \varepsilon T^{-1}P_k\tilde g$. Then we can conclude that

$$\langle\bar p - p, P_k\tilde g\rangle = \left\langle T^*(\bar p - p),\ T^{-1}P_k\tilde g\right\rangle \le \frac{1}{\varepsilon}\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p) + \frac{c\varepsilon}{2}\left\|T^{-1}P_k\tilde g\right\|^2_{\tilde{\mathcal{X}}}.$$

As $\bar p \notin \mathcal{B}$ we know that $\|\bar p - p\|$ is bounded, say $\|\bar p - p\| \le B$. Now choose $A$ from above as $A = \frac{C_{f^\dagger}}{B\|\tilde g\|}$. Then from $\bar p \notin \mathcal{A}$ we know that

$$\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2} \le A\,\|\bar p - p\|\,\|\tilde g\| \le AB\|\tilde g\| \le C_{f^\dagger},$$
so we can choose $\varepsilon = \Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2}/\sigma_k$, which ensures $\|f - f^\dagger\| \le C_{f^\dagger}$. Therefore we have

$$\langle\bar p - p, P_k\tilde g\rangle \le \left(1 + \frac{c}{2}\right)\sigma_k\,\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2}.$$
Combining everything and using Lemma A.2 with $\mathcal{Y}^*$ being $q^*$-convex we find
$$\begin{aligned}\langle\bar p - p, \tilde g\rangle &\le \left(1 + \frac{c}{2}\right)\sigma_k\,\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2} + \kappa_k\,\|\bar p - p\|\\
&\le \left(1 + \frac{c}{2}\right)\sigma_k\,\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2} + C_q\kappa_k^q + \frac{1}{2}\Delta_{\mathcal{S}^*}(\bar p, p)\\
&\le \frac{1}{2}\Delta_{\mathcal{S}^*}(\bar p, p) + C\left(\sigma_k\,\Delta_{\mathcal{R}^*}(T^*\bar p, T^*p)^{1/2} + \kappa_k^q\right)\end{aligned}$$
with $C_q = \frac{1}{q}\left(\frac{2}{q^*c_{q^*,\mathcal{Y}^*}}\right)^{q/q^*}$ and $C = \max\left\{\frac{2+c}{2},\ C_q,\ \frac{1}{A}\right\}$, which completes the proof.

Proposition 5.5 (verification of $\mathrm{VSC}^3(f^\dagger, \Phi, \mathcal{R}, \mathcal{S})$). Suppose that Assumption 2.1 holds true and let $\omega \in \mathcal{X}$ as in Definition 4.3 exist. Let $0 < \bar t \le 1$ and assume that for all $f^* \in \partial\mathcal{R}(f^\dagger)$ there exists some $\omega^* \in \mathcal{X}^*$ such that
$$\|f^* - f_t^* - t\omega^*\| \le C_\omega t^2 \qquad (41)$$
whenever $0 < t \le \bar t$ and $f_t^* \in \partial\mathcal{R}(f^\dagger - t\omega)$. This last assumption follows for example from $\mathcal{R}$ being two times Fréchet-differentiable in $\mathcal{X}$ in a neighborhood of $f^\dagger$ with $\mathcal{R}'': \mathcal{X} \to L(\mathcal{X}, \mathcal{X}^*)$ uniformly Lipschitz continuous in this neighborhood. Further assume

$$\Delta_{\mathcal{R}}(f_1, f_2) \ge C_\mu\,\|f_1 - f_2\|^\mu \qquad (42)$$
for some $\mu > 1$, $C_\mu > 0$ and all $f_1, f_2 \in \mathrm{dom}(\mathcal{R})$. We have:

1. If $\omega^* = T^*p^{(2)}$ for some $p^{(2)} \in \mathcal{Y}^*$, then $\mathrm{VSC}^3(f^\dagger, \Phi, \mathcal{R}, \mathcal{S})$ holds true with $\Phi(\tau) := \|p^{(2)}\|\,\tau^{1/q}$.

2. Suppose that $\mu \le 2$, $\frac{1}{\mu} + \frac{1}{\mu^*} = 1$, and that there exists a family of operators $P_k \in L(\mathcal{X}^*)$ indexed by $k \in \mathbb{N}$ such that $P_k\omega^* \in T^*\mathcal{Y}^*$ for all $k \in \mathbb{N}$, and let

$$\kappa_k := \|(I - P_k)\omega^*\|_{\mathcal{X}^*}, \qquad \sigma_k := \left\|(T^*)^{-1}P_k\omega^*\right\|_{\mathcal{Y}^*}. \qquad (43)$$
If $\lim_{k\to\infty}\kappa_k = 0$, then $\mathrm{VSC}^3(f^\dagger, \Phi, \mathcal{R}, \mathcal{S})$ holds true with the index function

$$\Phi(\tau) := \inf_{k\in\mathbb{N}}\left[\sigma_k\,\tau^{1/q} + \frac{\kappa_k^{\mu^*}}{\mu^*(C_\mu\mu)^{\mu^*/\mu}}\right]. \qquad (44)$$
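For geometrically decaying $\kappa_k$ and growing $\sigma_k$ — the situation that will arise in Section 5.3 — the infimum in (44) indeed behaves like a power of $\tau$. A small numerical sketch (all parameter values are my own illustrative choices) compares it with the exponent obtained by balancing the two terms:

```python
# Phi(tau) = inf_k [ sigma_k * tau**(1/q) + kappa_k**mu_star ] with
# sigma_k ~ 2**(k*c) and kappa_k ~ 2**(-k*b); balancing both terms predicts
# Phi(tau) ~ tau**expo with expo = (b*mu_star) / (q*(c + b*mu_star)).
import math

def phi(tau, q=2.0, mu_star=2.0, b=1.5, c=0.5, kmax=60):
    return min(2**(k * c) * tau**(1 / q) + (2**(-k * b))**mu_star
               for k in range(kmax))

expo = (1.5 * 2.0) / (2.0 * (0.5 + 1.5 * 2.0))   # = 3/7 for these parameters
for tau in [1e-3, 1e-6, 1e-9, 1e-12]:
    ratio = phi(tau) / tau**expo
    # the discrete infimum oscillates within a bounded band around tau**expo
    assert 1.0 <= ratio <= 2.5
```

This is the mechanism by which the dyadic approximation errors and operator norms of the $P_k$ translate into Hölder-type index functions in Theorem 5.7.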

Proof. Firstly, to prove that (41) is implied by two times differentiability with $\mathcal{R}''$ Lipschitz continuous, recall that $\partial\mathcal{R}(f) = \{\mathcal{R}'[f]\}$ if $\mathcal{R}$ is Fréchet-differentiable in $\mathcal{X}$. Then by the first order Taylor approximation of $t \mapsto \mathcal{R}'[f^\dagger - t\omega]$ at $t = 0$ we have
$$\left\|\mathcal{R}'[f^\dagger - t\omega] - \mathcal{R}'[f^\dagger] + t\,\mathcal{R}''[f^\dagger](\omega, \cdot)\right\|_{\mathcal{X}^*} \le Ct^2\|\omega\|^2_{\mathcal{X}}$$
for some $C > 0$. Thus (41) holds with $\omega^* = \mathcal{R}''[f^\dagger](\omega, \cdot)$ and $C_\omega = C\|\omega\|^2$. Now let (41) hold true; then we have for all $f_t^* \in \partial\mathcal{R}(f^\dagger - t\omega)$ that
$$\left\langle f_t^* - T^*p,\ f^\dagger - t\omega - f\right\rangle \le -t\left\langle\omega^*,\ f^\dagger - t\omega - f\right\rangle + C_\omega t^2\,\left\|f^\dagger - t\omega - f\right\|. \qquad (45)$$

Then using (42) and Young’s inequality, we find that

$$C_\omega t^2\left\|f^\dagger - t\omega - f\right\| \le \gamma t^2\,\Delta_{\mathcal{R}}\big(f, f^\dagger - t\omega\big)^{1/\mu} \le \frac{1}{\mu}\Delta_{\mathcal{R}}\big(f, f^\dagger - t\omega\big) + \beta t^{2\mu^*}$$
with $\gamma := C_\omega C_\mu^{-1/\mu}$ and $\beta := \frac{\gamma^{\mu^*}}{\mu^*}$. So we only need to bound the first term on the right hand side of (45), and this is done in two ways based on the two different assumptions:

1. If $\omega^* = T^*p^{(2)}$, then

$$-t\left\langle\omega^*,\ f^\dagger - t\omega - f\right\rangle = -t\left\langle p^{(2)},\ g^\dagger - tT\omega - Tf\right\rangle \le t^2\,\|p^{(2)}\|\left(t^{-q}\left\|Tf - g^\dagger + tT\omega\right\|^q\right)^{1/q}.$$
Hence Assumption 4.3 holds true with $\Phi(\tau) := \|p^{(2)}\|\,\tau^{1/q}$.

2. In the second case we have for all $k \in \mathbb{N}$ with $c_\mu := \frac{1}{\mu^*(C_\mu\mu)^{\mu^*/\mu}}$ that
$$\begin{aligned}-t\left\langle\omega^*,\ f^\dagger - t\omega - f\right\rangle &= -t\left\langle P_k\omega^*,\ f^\dagger - t\omega - f\right\rangle - t\left\langle(I - P_k)\omega^*,\ f^\dagger - t\omega - f\right\rangle\\
&\le t\sigma_k\left\|Tf - g^\dagger + tT\omega\right\| + t\kappa_k\left\|f^\dagger - t\omega - f\right\|\\
&\le t^2\left(\sigma_k\,t^{-1}\left\|Tf - g^\dagger + tT\omega\right\| + c_\mu t^{\mu^*-2}\kappa_k^{\mu^*}\right) + C_\mu\left\|f^\dagger - t\omega - f\right\|^\mu\\
&\le t^2\left(\sigma_k\,t^{-1}\left\|Tf - g^\dagger + tT\omega\right\| + c_\mu\kappa_k^{\mu^*}\right) + \Delta_{\mathcal{R}}\big(f, f^\dagger - t\omega\big)\end{aligned}$$
for $t \le \bar t \le 1$, as $\mu^* \ge 2$. Substituting $\tau = t^{-q}\|Tf - g^\dagger + tT\omega\|^q$ and taking the infimum over $k$ shows Assumption 4.3 with $\Phi$ given by (44). It

follows as in [26, Thm. 2.1] that $\Phi$ is an index function.

Example 5.6. Let $\Omega$ be some measurable space with a $\sigma$-finite measure. Let $\mathcal{Y} = L^q(\Omega)$ for $1 < q \le 2$ such that $\mathcal{Y}$ is $q$-smooth and $2$-convex. Moreover, let $\mathcal{X} = L^\mu(\Omega)$ with $\mu = 2$ or $\mu \ge 3$ such that the norm $\|\cdot\|_{\mathcal{X}}$ is two times differentiable with Lipschitz continuous second derivative, and choose $\mathcal{R}(\cdot) = \frac{1}{\mu}\|\cdot\|^\mu_{\mathcal{X}}$. Then (42) holds true by Lemma A.2. Assume that $p$, $\omega$, and $p^{(2)}$ are given as in Proposition 5.5, Case 1. Then Assumption 4.3 holds true, and Theorem 4.5 yields
$$\Delta_{\mathcal{R}}\big(\hat f_\alpha^{(2)}, f^\dagger\big) = \mathcal{O}\left(\frac{\delta^q}{\alpha} + \alpha_q^2\,(-\Phi)^*\big(-C\alpha_q^{-1}\big) + \alpha_q^{2\mu^*}\right), \qquad (46)$$
where we again use the notation $\alpha_q := \alpha^{q^*-1}$. Note that the Fenchel conjugate $(-\Phi)^*$ fulfills $(-\Phi)^*(-s) \sim 1/s$, such that
$$\inf_{\alpha>0}\Delta_{\mathcal{R}}\big(\hat f_\alpha^{(2)}, f^\dagger\big) = \inf_{\alpha>0}\mathcal{O}\left(\frac{\delta^q}{\alpha} + \alpha_q^3 + \alpha_q^{2\mu^*}\right) = \mathcal{O}\left(\delta^{\frac{q\min(3,2\mu^*)}{q-1+\min(3,2\mu^*)}}\right). \qquad (47)$$
The best known error bound for Tikhonov regularization (requiring the existence of $p$ and $\omega$ as above) is
$$\inf_{\alpha>0}\Delta_{\mathcal{R}}\big(\hat f_\alpha, f^\dagger\big) = \inf_{\alpha>0}\mathcal{O}\left(\frac{\delta^q}{\alpha} + \alpha_q^2\right) = \mathcal{O}\left(\delta^{\frac{2q}{q+1}}\right) \qquad (48)$$
(see [34]). As $\mu^* > 1$ we see that the bound (47) for $\hat f_\alpha^{(2)}$ is better than (48). In particular, if $\mu = 2$ or $\mu = 3$ and hence $\min(3, 2\mu^*) = 3$, the right hand side in (47) is $\mathcal{O}\big(\delta^{3q/(q+2)}\big)$, which for $q = 2$ yields the optimal bound $\|\hat f_\alpha^{(2)} - f^\dagger\| = \mathcal{O}\big(\delta^{3/4}\big)$ for the spectral source condition (4) with $\nu = 3$. However, for $\mu > 3$ the rate (47) is slower.
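The comparison of (47) and (48) reduces to arithmetic with the exponents. The following sketch (function names are mine) checks that the iterated bound is always the faster one and reproduces the $\delta^{3/4}$ example for $q = 2$:

```python
# Iterated bound: delta**(q*m/(q-1+m)) with m = min(3, 2*mu_star);
# Tikhonov bound: delta**(2*q/(q+1)). Since m > 2 and x -> q*x/(q-1+x) is
# increasing, the iterated exponent is strictly larger, i.e. a faster rate.
def iterated_exponent(q, mu):
    mu_star = mu / (mu - 1.0)        # conjugate exponent of mu
    m = min(3.0, 2.0 * mu_star)
    return q * m / (q - 1.0 + m)

def tikhonov_exponent(q):
    return 2.0 * q / (q + 1.0)

for q in (1.5, 2.0):
    for mu in (2.0, 3.0, 5.0):
        assert iterated_exponent(q, mu) > tikhonov_exponent(q)

# q = 2 and mu in {2, 3}: exponent 3q/(q+2) = 3/2 for the Bregman distance,
# i.e. ||f - f_dagger|| = O(delta**(3/4)) for the quadratic penalty mu = 2.
assert abs(iterated_exponent(2.0, 3.0) - 1.5) < 1e-12
```
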

5.3 Application to iterated maximum entropy regularization

In this subsection we will apply Propositions 5.4 and 5.5 to the case that the penalty term is chosen as a cross-entropy term given by the Kullback-Leibler divergence

$$\mathcal{R}(f) := \mathrm{KL}(f, f_0) := \int_{\mathcal{M}}\left[f\ln\frac{f}{f_0} - f + f_0\right]dx \qquad (49)$$

ϕ∗ ∆R (f, ϕ) = KL(f, ϕ)

if $\varphi^* \in \partial\mathcal{R}(\varphi)$, i.e. $\varphi^* = \ln(\varphi/f_0)$. Let $\mathcal{Y}$ be a Hilbert space and $T: L^1(\mathcal{M}) \to \mathcal{Y}$ linear and bounded. We want to approximate $f^\dagger \in \mathcal{C}$, where $\mathcal{C} \subset L^1(\mathcal{M})$ is closed and convex, from noisy data $g^{\mathrm{obs}} \in \mathcal{Y}$ with
$$\left\|g^\dagger - g^{\mathrm{obs}}\right\|_{\mathcal{Y}} \le \delta$$
and some a-priori guess $f_0 \in \mathcal{C}$ of $f^\dagger$. The set $\mathcal{C}$ may contain only probability densities or further a-priori information such as box-constraints. To this end we apply generalized Tikhonov regularization in the form of maximum entropy regularization

$$\hat f_\alpha \in \operatorname*{arg\,min}_{f\in\mathcal{C}}\left[\left\|Tf - g^{\mathrm{obs}}\right\|^2_{\mathcal{Y}} + \alpha\,\mathrm{KL}(f, f_0)\right].$$
This amounts to choosing $\mathcal{R}(f) := \mathrm{KL}(f, f_0) + \iota_{\mathcal{C}}(f)$ with the indicator function $\iota_{\mathcal{C}}(f) := 0$ if $f \in \mathcal{C}$ and $\iota_{\mathcal{C}}(f) := \infty$ else. If all iterates are in the interior of $\mathcal{C}$, Bregman iteration is given by

$$\hat f^{(n)}_\alpha \in \operatorname*{arg\,min}_{f\in\mathcal{C}}\left[\left\|Tf - g^{\mathrm{obs}}\right\|^2_{\mathcal{Y}} + \alpha\,\mathrm{KL}\big(f, \hat f^{(n-1)}_\alpha\big)\right],$$
otherwise the iteration formula may involve an element of the normal cone of $\mathcal{C}$ at $\hat f^{(n-1)}_\alpha$.

Theorem 5.7. Let $\mathbb{T} := \mathbb{R}/\mathbb{Z}$, let $\mathcal{M} := \mathbb{T}^d$ be the $d$-dimensional torus, $\mathcal{Y} = L^2(\mathbb{T}^d)$, and define $\mathcal{S}_{\mathrm{sq}}(g) = \frac{1}{2}\|g\|^2_{\mathcal{Y}}$. Suppose that $T$ and its $L^2$-adjoint $T^*$ are $a > 0$ times smoothing in the sense that $T, T^*: B^s_{p,q}(\mathbb{T}^d) \to B^{s+a}_{p,q}(\mathbb{T}^d)$ are

isomorphisms for all $s \in [0, 3a]$, $p \in [2, \infty]$, $q \in [2, \infty]$. Moreover, suppose there exists $\rho > 0$ such that

$$\rho \le \frac{f^\dagger}{f_0} \le \rho^{-1} \quad\text{a.e. in } \mathbb{T}^d \qquad (50)$$
and either

• $\mathcal{C} \subset \left\{f \in L^1 : f \ge 0,\ \int f\, dx = 1\right\}$; then we set $p := \infty$,

• or $\sup_{f\in\mathcal{C}}\|f\|_{L^\infty} < \infty$ and $a > d/2$; then we set $p := 2$.

We make a further case distinction:

1. Assume that

$$\frac{f^\dagger}{f_0} \in B^s_{p,\infty}(\mathbb{T}^d) \quad\text{for some } s \in (a, 2a).$$

Then there exists $C > 0$ such that $\mathrm{VSC}^2(f^\dagger, \Phi, \mathrm{KL}(\cdot, f_0), \mathcal{S}_{\mathrm{sq}})$ holds true with
$$\Phi(\tau) = C\tau^{\frac{s-a}{s}}.$$

2. Assume additionally to (50) that $f^\dagger \ge \rho$ and
$$f^\dagger, f_0 \in B^s_{p,\infty}(\mathbb{T}^d) \quad\text{for some } s \in \left(2a + \frac{d}{p},\ 3a\right). \qquad (51)$$
Then there exists $C > 0$ such that $\mathrm{VSC}^3(f^\dagger, \Phi, \mathrm{KL}(\cdot, f_0), \mathcal{S}_{\mathrm{sq}})$ holds true with $\mu = 2$ and
$$\Phi(\tau) = C\tau^{\frac{s-2a}{s-a}}.$$

In all four cases we obtain for the parameter choice $\alpha \sim \delta^{\frac{2a}{s+a}}$ the convergence rate

$$\mathrm{KL}\big(\hat f^{(2)}_\alpha, f^\dagger\big) = \mathcal{O}\big(\delta^{\frac{2s}{s+a}}\big), \qquad \delta \searrow 0. \qquad (52)$$
Proof. Note that due to (50) the functional $\mathcal{R}$ is Fréchet differentiable at $f^\dagger$ in $L^\infty(\mathbb{T}^d)$ and

$$\mathcal{R}'[f^\dagger](g) = \int_{\mathbb{T}^d}\ln\left(\frac{f^\dagger}{f_0}\right)g\, dx, \qquad \mathcal{R}''[f^\dagger](g, h) = \int_{\mathbb{T}^d}\frac{1}{f^\dagger}\,hg\, dx, \qquad (53)$$
$$\text{and}\qquad \mathcal{R}'''[f^\dagger](g, h, h) = -\int_{\mathbb{T}^d}\left(\frac{1}{f^\dagger}\right)^2 h^2 g\, dx.$$
Hence, under the given regularity assumption we have

$$T^*p = \ln\left(\frac{f^\dagger}{f_0}\right).$$

Local Lipschitz continuity of $\mathcal{R}'$ w.r.t. $\tilde{\mathcal{X}} := L^\infty(\mathbb{T}^d)$ follows from local boundedness of $\mathcal{R}''$ w.r.t. $\tilde{\mathcal{X}}$. We are going to verify the assumptions of Propositions 5.4 and 5.5 choosing $\mathcal{X} = L^{p^*}(\mathbb{T}^d)$ with the conjugate exponent $p^*$ of $p$. The operators $P_k$ for $k \in \mathbb{N}_0$ are chosen as quasi-interpolation operators onto level $k$ of a dyadic spline space of sufficiently high order as defined in [9, eq. (4.20)]. Then we have the following two inequalities, where $C > 0$ now and in the following will denote a generic constant. By [9, Thm. 4.5] we have for all $t > 0$ and all $h \in B^t_{p,\infty}(\mathbb{T}^d)$
$$\|(I - P_k)h\|_{L^p} \le C2^{-kt}\|h\|_{B^t_{p,\infty}}, \qquad (54)$$
where $C$ is independent of $h$ and $k$. And we have for all $q \in (1, p]$ and $t, r > 0$ with $t < r$ that

$$\|P_k\|_{B^t_{p,\infty}\to B^r_{q,2}} \le C2^{k(r-t)}, \qquad (55)$$
where again $C$ is independent of $k$. The second inequality can be established as follows: From the fact that $N_{t,p,q}(f) := \big(\sum_{l=0}^\infty\big[2^{lt}\|P_l f - P_{l-1}f\|_{L^p}\big]^q\big)^{1/q}$ (with $P_{-1} := 0$ and a change to the supremum norm if $q = \infty$) is an equivalent norm on $B^t_{p,q}(\mathbb{T}^d)$ ([9, Thm. 5.1]), we conclude that

$$\|P_k\|_{B^t_{p,\infty}\to B^r_{q,2}} \le C\sup_{f:\ N_{t,p,\infty}(f)\le 1} N_{r,q,2}(P_k f).$$

We have $P_l P_k = P_{\min(k,l)}$ and therefore $P_l P_k f - P_{l-1}P_k f = 0$ for $l > k$. From $N_{t,p,\infty}(f) = \sup_{l\in\mathbb{N}} 2^{lt}\|P_l f - P_{l-1}f\|_{L^p} \le 1$ we can conclude $\|P_l f - P_{l-1}f\|_{L^q} \le C2^{-lt}$ by the continuity of the embedding $L^p(\mathbb{T}^d) \hookrightarrow L^q(\mathbb{T}^d)$. Thus we find
$$\|P_k\|_{B^t_{p,\infty}\to B^r_{q,2}} \le C\left(\sum_{l=0}^k 2^{2lr}\,2^{-2lt}\right)^{1/2} \le C2^{k(r-t)}.$$
The proof works for both $p = 2$ and $p = \infty$ simultaneously, but we distinguish between the two smoothness assumptions.

Case 1: As $\rho \le f^\dagger/f_0 \le \rho^{-1}$ and $\ln$ restricted to $[\rho, \rho^{-1}]$ is infinitely smooth, it follows from the theorem in [31] that $\ln(f^\dagger/f_0) \in B^s_{p,\infty}$. As $T^*: B^{s-a}_{p,\infty}(\mathbb{T}^d) \to B^s_{p,\infty}(\mathbb{T}^d)$ is an isomorphism we have $p \in B^{s-a}_{p,\infty}$. $\mathcal{R}''[f] = 1/f$ is uniformly bounded in a small neighborhood of $f^\dagger \ge \rho$, so we can conclude from (54) that $\kappa_k \le C2^{-k(s-a)}\|p\|_{B^{s-a}_{p,\infty}}$. It follows from (55) that
$$\left\|T^{-1}P_k p\right\|_{L^p} \le C\left\|T^{-1}P_k p\right\|_{B^0_{p,2}} \le C\left\|T^{-1}\right\|_{B^a_{p,2}\to B^0_{p,2}}\left\|P_k\right\|_{B^{s-a}_{p,\infty}\to B^a_{p,2}}\|p\|_{B^{s-a}_{p,\infty}} \le C2^{k(2a-s)}\|p\|_{B^{s-a}_{p,\infty}},$$
so $\sigma_k \le \max\big\{1,\ C2^{k(2a-s)}\|p\|_{B^{s-a}_{p,\infty}}\big\}$. Then Proposition 5.4 and the choice $2^{-k} \sim \tau^{1/(2s)}$ show that $\mathrm{VSC}^2(f^\dagger, \Phi, \mathrm{KL}(\cdot, f_0), \mathcal{S}_{\mathrm{sq}})$ holds true with
$$\Phi(\tau) = C\inf_{k\in\mathbb{N}}\left[2^{-k(s-2a)}\sqrt{\tau} + 2^{-k(2s-2a)}\right] \le C\tau^{\frac{s-a}{s}}.$$

For the last statement we apply (30) with $q = 2$ and note that $(-\Phi)^*(x) = C(-x)^{(a-s)/a}$ for $x < 0$ and $(-\Phi)^*(x) = \infty$ else. Hence, $\mathrm{KL}\big(f^\dagger, \hat f^{(2)}_\alpha\big) \le C\big(\delta^2/\alpha + \alpha^{s/a}\big)$, and the choice $\alpha \sim \delta^{\frac{2a}{s+a}}$ leads to (52).

Case 2: Assumption (42) of Proposition 5.5 is satisfied with $\mu = 2$ due to the inequalities

2 2 KL(f ,f ) f f 1 if f 1 = f 1 =1 1 2 ≥k 1 − 2kL k 1kL k 2kL 2 4 1 2 f ∞ + f ∞ KL(f ,f ) f f 2 if f ∞ , f ∞ < 3k 1kL 3k 2kL 1 2 ≥ 2k 1 − 2kL k 1kL k 2kL ∞   for p = and p = 2, respectively (see [3, Prop. 2.3] and [28, Lemma 2.6]). By † ∞ 2 f ρ we have f0 ρ , so using the theorem in [31] and infinite smoothness ≥ ≥ 2 † s Td of F (x) := 1/x on [ρ , ) we obtain 1/f , 1/f0 Bp,∞( ). It then follows ∞ † s ∈Td † −1 from [29, Thm 6.6, case 1b] that f /f0 Bp,∞( ). As ρ f /f0 ρ and ln restricted to [ρ,ρ−1] is infinitely smooth,∈ it follows again≤ from [31]≤ that ln(f †/f ) Bs . Thus we have ln(f †/f ) = T ∗T ω for some ω and as 0 ∈ p,∞ 0 ∈ X T ∗T : Bs−2a(Td) Bs (Td) is an isomorphism, we obtain ω Bs−2a(Td). p,∞ → p,∞ ∈ p,∞ By our assumptions we have s 2a 1/p > 0 and hence ω L∞(Td) by the standard embedding theorem (see− [40,− Thm. 4.6.1]). In particular,∈ is Fr´echet- † ∞ Td ∗ R ′ † differentiable at f tω w.r.t. L ( ) for t< t := ρ/ ω with ft := [f tω] ∗ − † k k R − given by ft ,h = ln(f tω) ln f0,h (see (53)). Therefore, assumption (41) of Propositionh i 5.5h is satisfied− with− i ω ω∗ = . f † Again using [29, Thm 6.6, case 1b] we obtain

ω∗ Bs−2a(Td). ∈ p,∞ −k(s−2a) ∗ We conclude from (54) that κk C2 ω s−2a . Moreover, it follows ≤ k kBp,∞ from (55) that both for p = 2 and p = we have ∞ ∗ −1 ∗ −k(s−3a) ∗ σk (T ) Ba →L2 Pk Bs−2a→Ba ω Bs−2a C2 ω Bs−2a . ≤k k 2,2 k k p,∞ 2,2 k k p,∞ ≤ k k p,∞ Now Proposition 5.5 and the choice 2−k τ 1/(2s−2a) show that f † satisfies VSC3(f †, Φ, KL( ,f ), ) with µ = 2 and ∼ · 0 Ssq −k(s−3a) −k(2s−4a) s−2a Φ(τ) C inf 2 √τ +2 Cτ s−a . ≤ k∈N ≤ h i For the last statement we apply Theorem 4.5 with r = q = µ = 2 and note that Φ=Φ, ( Φ)∗(x) = C( x)(2a−s)/a for x < 0 and ( Φ)∗(x) = else. Hence, − − − 2a∞ † (2) 2 s/a 4 KL f , fˆα C(δ /α + α + βα ), and the choice α δ s+a leads to (52). e ≤ ∼   Remark 5.8. It can be shown in analogy to case 1 that the convergence rate (52) also holds true for s (0,a) if f †/f Bs (Td). To see this, first note ∈ 0 ∈ p,∞

30 in analogy to Proposition 5.5 that VSC1(f †, Φ, , ) (see (6)) holds true with Φ defined in (44) if after replacing ω∗ by f ∗ = ln(Rf †S/f ) in (43) we have κ 0 0 k → and σk well-defined and finite for all k N. Concerning the optimality of the rate∈ (52) we refer e.g. to [26]. The gap s Td [2a, 2a + d/2) in the Nikolskii scale B2,∞( ) in which we are not able to de- rive this rate may be due to technical difficulties. In the H¨older-Zygmund scale s Td B∞,∞( ) the only gaps are at integer multiples of a. In contrast, for quadratic regularization we had gaps only at integer multiples of a also in the Nikolskii scale (see Corollary 5.3).

6 Numerical results

In this section we give some numerical results for the iterated maximum entropy regularization. Test problem: We choose T : L1 (T) L2(T) to be the periodic convolution → operator (Tf)(x) := 1 k(x y)f(y) dy with kernel 0 − ∞ R 1 −1 2x 2 x 1 k(x)= exp( x j /2) = sinh cosh − ⌊ ⌋− , x R | − | 4 4 ∈ j=−∞ X   where x := max n Z: n x . Then integration by parts shows that T = 2 ⌊ ⌋ −1 { ∈ ≤ } ( ∂x + (1/4)I) , and hence T satisfies the assumptions of Theorem 5.7 with a−= 2. We choose f = 1 and the true solution f † such that f † 1 is the standard 0 − B-spline B5 of order 5 with supp(B5) = [0, 1] and equidistant knots. Then we † 5.5 T have f B2,∞( ), i.e. s = 5.5. (To see this note that piecewise constant ∈ 0.5 T functions belong to B2,∞( ) using the defintion of this space via the modulus of continuity.) Hence, according to Theorem 5.7 a third order variational source 3 † 3/7 condition condition VSC (f , Aτ , KL( , 1), sq) is satisfied for some A> 0. Implementation: The operator T is· discretizedS by sampling k and f on an equidistant grid with 480 points. Then matrix-vector multiplications with ∗ (2) T = T can be implemented efficiently by FFT. The minimizers fˆα and fˆα are computed by the Douglas-Rachford algorithm. To be consistent with our theory, we consider the constraint set := f L1(T): 0 f 5 a.e. . We checked that for none of the unconstrainedC { minimizers∈ the≤ bound≤ constraints} were active such that an explicit implementation of these constraints was not required for our test problem. To check the predicted convergence rates with respect to the noise level δ the regularization parameter α was chosen by an a-priori rule of the form α = cδσ with an optimal exponent σ > 0 and a constant c chosen to minimize the constants for the upper bound given in the figures. As we bound the worst case errors in our analysis we tried to approximate the worst case noise. Let † obs Gδ := g + δ sin(2πk ): k N . 
For each value of δ we found g Gδ such that the{ reconstruction· error∈ gets} maximal. This in particular yielde∈ d larger propagated data errors than discrete white noise.

31 100

10-2

10-4

10-6

10-8

10-10

10-12 10-7 10-6 10-5 10-4 10-3

Figure 1: Predicted and computed approximation error for standard and iter- ated maximum entropy regularization.

10-2

10-3

10-4

10-5

10-6

10-7

10-8 10-7 10-6 10-5 10-4 10-3

Figure 2: Predicted and computed convergence rates for standard and iterated maximum entropy regularization.

32 Discussion of the results: Figure 1 shows the approximation error as a func- † (2) tion of α, i.e. KL(fα,f ) where fα and fα , rsp., are the reconstructions for exact data gobs = g†. The two dashed lines indicate the corresponding asymp- totic convergence rates predicted by our theory, which are in good agreement with the empirical results. Note that the saturation effect limits the conver- (1) gence of the standard maximum entropy estimator fα = fα to the maximal † 2 rate KL(fα,f )= (α ). Iterating maximum entropy estimation yields a clear O (2) † s/a 11/4 improvement to KL(fα ,f )= (α )= (α ). Figure 2 displays the convergenceO ratesO with respect to the noise level δ for the a-priori choice rule of α described above. Of course, in practice one would rather use some a-posteriori stopping rule such as the Lepskii balancing principle, but this is not in the scope of this paper. Again, we observe very good † 4/3 agreement of the empirical rate KL(fˆα,f ) = (δ ) with the maximal rate for non-iterated maximum entropy regularization,O as well as agreement of the (2) † 2s/(s+a) 22/15 rate KL(fˆα ,f )= (δ )= (δ ) of the Bregman iterated estimator (2) O O fˆα with the rate predicted by Theorem 5.7.

7 Discussion and outlook

We have shown that variational source conditions can yield convergence rates of arbitrarily high order in Hilbert spaces. Furthermore we have used this approach to show third order convergence rates in Banach spaces for the first time. This naturally leads to the question about arbitrarily high order convergence rates in Banach spaces. There are some difficulties that prevented us from going to fourth order convergence rates. The approach in Section 4 relies on compar- ison with the convergence rates for the dual variables. As the dual problem to generalized Tikhonov regularization is again some form of generalized Tikhonov regularization, it has finite qualification. Therefore, it does not seem straightfor- ward to get to higher orders with this approach. For the approach in Section 3 (2) † † q∗−1 one needs some relation between ∆R(fˆα ,f ) and ∆R(fˆα,f α ω), which is established in (17) and (18) using the polarization identity.− However, this identity only has generalizations in the form of inequalities in Banach spaces. We hope that the tools provided in this paper will initialize a further de- velopment of regularization theory in Banach spaces concerning higher order convergence rates. Topics of future research may include other regularization methods (e.g. iterative methods), verifications of higher order variational source conditions for non-smooth penalty terms, stochastic noise models, more general data fidelity terms, or nonlinear forward operators.

33 A Duality mappings and an inequality by Xu and Roach

In this appendix we derive a lower bound on Bregman distances in terms of norm powers from more general inequalities by Xu and Roach. First recall the following definitions (see e.g. [32]):

Definition A.1. The modulus of convexity δY : (0, 2] [0, 1] of the space is defined by → Y

δ (ε) := inf 1 y +˜y /2 : y, y˜ , y = y˜ =1, y y˜ = ε . Y { −k k ∈ Y k k k k k − k } The modulus of smoothness ρ : (0, ) (0, ) of is defined by Y ∞ → ∞ Y ρ (τ) := sup ( y +˜y + y y˜ )/2 1 : y, y˜ , y =1, y˜ = τ . Y { k k k − k − ∈ Y k k k k } The space is called uniformly convex if δ (ε) > 0 for every ε> 0. It is called Y Y uniformly smooth if limτ→0 ρY (τ)/τ = 0. The space is called r-convex (or Y r convex of power type r) if there exists a constant K > 0 such that δY (ε) Kε for all ε > 0. Similarly, it is called s-smooth (or smooth of power type≥ s) if ρ (τ) Kτ s for all τ > 0. Y ≤ As an example we mention that Lp spaces with 1 Y1. By [8, Chap.1, Theorem 4.4] we have S q k·kY ∂ (y)= J (y), where J is the duality mapping given by S q,Y q,Y J (y) := ω ∗ : ω,y = ω y , ω = y q−1 . (57) q,Y ∈ Y h i k kk k k k k k n o J is (q 1)-homogeneous, i.e. for all λ R we have q,Y − ∈ J (λy) = sgn(λ) λ q−1J (y). (58) q,Y | | q,Y

We assume that is q-smooth, Jq,Y is single-valued [8, Chap.1, Corollary 4.5], and we can dropY superscripts in Bregman distances. Lemma A.2 (Xu-Roach). Let be an r-convex Banach space and = 1 q Y S q k·kY for some q > 1. Then there exist a constant cq,Y > 0 depending only on q and the space such that for all x, y we have Y ∈ Y q−r r cq,Y max y , x y x y if q r, ∆S (x, y) {k k k − k} k − k ≤ ≥ c y q−r x y r if q r. ( q,Y k k k − k ≥

34 Proof. Let q r. By [42, Theorem 1] there exists a constant C depending only on q and such≤ that Y 1 tr−1 ∆ (x, y) C max y , y + t(x y) q−r x y r dt. S ≥ 2r {k k k − k} k − k Z0 As max y , y + t(x y) 2max y , x y for all t [0, 1] and q r 0, we conclude{k k k − k} ≤ {k k k − k} ∈ − ≤

1 tr−1 ∆ (x, y) 2q−rC max y , x y q−r x y r dt. S ≥ {k k k − k} k − k 2r Z0 q−r 1 tr−1 This shows the lower bound with cq,Y := 2 C 0 2r dt > 0 in this case. If r < q we have max y , y + t(x y) q−r y q−r, so that the claim follows as above. {k k k − k} ≥k kR Acknowledgements. We would like to thank two anonymous referees for their comments, which helped to improve the paper. Financial support by Deutsche Forschungsgemeinschaft through grant CRC 755, project C09, and RTG 2088 is gratefully acknowledged.

References

[1] V. Albani, P. Elbau, M. V. de Hoop, and O. Scherzer. Optimal conver- gence rates results for linear inverse problems in hilbert spaces. Numerical and Optimization, 37(5):521–540, feb 2016. [2] Roman Andreev, Peter Elbau, Maarten V. de Hoop, Lingyun Qiu, and Otmar Scherzer. Generalized convergence rates results for linear inverse problems in hilbert spaces. Numerical Functional Analysis and Optimiza- tion, 36(5):549–566, mar 2015. [3] J. M. Borwein and A. S. Lewis. Convergence of best entropy estimates. SIAM J. Optim., 1(2):191–205, 1991. [4] M. Burger, E. Resmerita, and L. He. Error estimation for Bregman iter- ations and inverse scale space methods in image restoration. Computing, 81(2-3):109–135, 2007. [5] Martin Burger and Stanley Osher. Convergence rates of convex variational regularization. Inverse Problems, 20(5):1411–1421, 2004. [6] Y. Censor and S. A. Zenios. Proximal minimization algorithm with D- functions. J. Optim. Theory Appl., 73(3):451–464, 1992. [7] Gong Chen and Marc Teboulle. Convergence analysis of a proximal- like minimization algorithm using Bregman functions. SIAM J. Optim., 3(3):538–543, 1993.

[8] I. Cioranescu. Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems, volume 62 of Mathematics and its Applications. Kluwer Academic Publishers, 1990.

[9] Ronald A. DeVore and Vasil A. Popov. Interpolation of Besov spaces. Trans. Amer. Math. Soc., 305(1):397–414, 1988.

[10] Jonathan Eckstein. Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming. Math. Oper. Res., 18(1):202–226, 1993.

[11] P. P. B. Eggermont. Maximum entropy regularization for Fredholm integral equations of the first kind. SIAM J. Math. Anal., 24:1557–1576, 1993.

[12] Ivar Ekeland and Roger Temam. Convex Analysis and Variational Problems. North-Holland Publishing Company, Amsterdam, Oxford, 1976.

[13] H. W. Engl and G. Landl. Convergence rates for maximum entropy regularization. SIAM J. Numer. Anal., 30:1509–1536, 1993.

[14] Heinz W. Engl, Martin Hanke, and Andreas Neubauer. Regularization of Inverse Problems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1996.

[15] Jens Flemming. Generalized Tikhonov Regularization and Modern Convergence Rate Theory in Banach Spaces. Shaker Verlag, Aachen, 2012.

[16] K. Frick and M. Grasmair. Regularization of linear ill-posed problems by the augmented Lagrangian method and variational inequalities. Inverse Problems, 28(10):104005, 16, 2012.

[17] Klaus Frick, Dirk A. Lorenz, and Elena Resmerita. Morozov's principle for the augmented Lagrangian method applied to linear inverse problems. Multiscale Modeling & Simulation, 9(4):1528–1548, 2011.

[18] Klaus Frick and Otmar Scherzer. Regularization of ill-posed linear equations by the non-stationary augmented Lagrangian method. J. Integral Equations Appl., 22(2):217–257, 2010.

[19] G. Godefroy. Renormings in Banach spaces. In W. B. Johnson and J. Lindenstrauss, editors, Handbook of the Geometry of Banach Spaces, volume 1, pages 781–835. Elsevier Science, Amsterdam, 2001.

[20] Markus Grasmair. Generalized Bregman distances and convergence rates for non-convex regularization methods. Inverse Problems, 26:115014 (16pp), 2010.

[21] Markus Grasmair. Variational inequalities and higher order convergence rates for Tikhonov regularisation on Banach spaces. Journal of Inverse and Ill-Posed Problems, 21(3):379–394, 2013.

[22] C. W. Groetsch. The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman, Boston, 1984.

[23] B. Hofmann, B. Kaltenbacher, C. Pöschl, and O. Scherzer. A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators. Inverse Problems, 23(3):987–1010, 2007.

[24] B. Hofmann and M. Yamamoto. On the interplay of source conditions and variational inequalities for nonlinear ill-posed problems. Applicable Analysis, 89(11):1705–1727, 2010.

[25] Thorsten Hohage and Frederic Weidling. Verification of a variational source condition for acoustic inverse medium scattering problems. Inverse Problems, 31(7):075006, 14, 2015.

[26] Thorsten Hohage and Frederic Weidling. Characterizations of variational source conditions, converse results, and maxisets of spectral regularization methods. SIAM J. Numer. Anal., 55(2):598–620, 2017.

[27] Thorsten Hohage and Frank Werner. Convergence rates for inverse problems with impulsive noise. SIAM J. Numer. Anal., 52(3):1203–1221, 2014.

[28] Thorsten Hohage and Frank Werner. Inverse problems with Poisson data: statistical regularization theory, applications and algorithms. Inverse Problems, 32:093001, 56, 2016.

[29] Jon Johnsen. Pointwise multiplication of Besov and Triebel-Lizorkin spaces. Math. Nachr., 175:85–133, 1995.

[30] Barbara Kaltenbacher and Bernd Hofmann. Convergence rates for the iteratively regularized Gauss-Newton method in Banach spaces. Inverse Problems, 26(3):035007, 21, 2010.

[31] Djalil Kateb. Fonctions qui opèrent sur les espaces de Besov. Proc. Amer. Math. Soc., 128(3):735–743, 2000.

[32] J. Lindenstrauss and L. Tzafriri. Classical Banach Spaces II: Function Spaces, volume 97 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer-Verlag, 1979.

[33] Andreas Neubauer. On converse and saturation results for Tikhonov regularization of linear ill-posed problems. SIAM Journal on Numerical Analysis, 34(2):517–527, 1997.

[34] Andreas Neubauer, Torsten Hein, Bernd Hofmann, Stefan Kindermann, and Ulrich Tautenhahn. Improved and extended results for enhanced convergence rates of Tikhonov regularization in Banach spaces. Applicable Analysis, 89(11):1729–1743, 2010.

[35] Stanley Osher, Martin Burger, Donald Goldfarb, Jinjun Xu, and Wotao Yin. An iterative regularization method for total variation-based image restoration. Multiscale Modeling & Simulation, 4(2):460–489, 2005.

[36] Elena Resmerita. Regularization of ill-posed problems in Banach spaces: convergence rates. Inverse Problems, 21(4):1303–1314, 2005.

[37] Elena Resmerita and Otmar Scherzer. Error estimates for non-quadratic regularization and the relation to enhancement. Inverse Problems, 22(3):801–814, 2006.

[38] Otmar Scherzer, Markus Grasmair, Harald Grossauer, Markus Haltmeier, and Frank Lenzen. Variational Methods in Imaging, volume 167 of Applied Mathematical Sciences. Springer, New York, 2008.

[39] J. Skilling, editor. Maximum Entropy and Bayesian Methods. Kluwer, Dordrecht, 1989.

[40] Hans Triebel. Interpolation Theory, Function Spaces, Differential Operators, volume 18 of North-Holland Mathematical Library. North-Holland Publishing Co., Amsterdam-New York, 1978.

[41] Hans Triebel. Theory of Function Spaces II. Birkhäuser, Basel, 1992.

[42] Zong-Ben Xu and G. F. Roach. Characteristic inequalities of uniformly convex and uniformly smooth Banach spaces. Journal of Mathematical Analysis and Applications, 157(1):189–210, 1991.
