A Appendix: Linear Algebra and Functional Analysis

In this appendix, we have collected some results of linear algebra and functional analysis. Since the computational aspects are in the foreground in this book, we give separate proofs for the linear algebraic and functional analytic results, although finite-dimensional spaces are special cases of Hilbert spaces.

A.1 Linear Algebra

A general reference concerning the results in this section is [47]. The following theorem gives the singular value decomposition of an arbitrary real matrix.

Theorem A.1. Every matrix $A \in \mathbb{R}^{m\times n}$ allows a decomposition

$$A = U\Lambda V^{\mathsf{T}},$$

where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal matrices and $\Lambda \in \mathbb{R}^{m\times n}$ is diagonal with nonnegative diagonal elements $\lambda_j$ such that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_{\min(m,n)} \geq 0$.

Proof: Before going to the proof, we recall that a diagonal matrix is of the form

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots & \vdots & & \vdots \\ \vdots & & \ddots & & & & \\ 0 & \cdots & & \lambda_m & 0 & \cdots & 0 \end{bmatrix} = \left[\operatorname{diag}(\lambda_1,\dots,\lambda_m),\, 0\right]$$

if $m \leq n$, where $0$ denotes a zero matrix of size $m\times(n-m)$. Similarly, if $m > n$, $\Lambda$ is of the form

$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & \\ 0 & \cdots & & \lambda_n \\ 0 & \cdots & & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & & 0 \end{bmatrix} = \begin{bmatrix} \operatorname{diag}(\lambda_1,\dots,\lambda_n) \\ 0 \end{bmatrix},$$

where $0$ is a zero matrix of size $(m-n)\times n$. Briefly, we write $\Lambda = \operatorname{diag}(\lambda_1,\dots,\lambda_{\min(m,n)})$.

Let $\lambda_1 = \|A\|$, and assume that $\lambda_1 \neq 0$. Let $x \in \mathbb{R}^n$ be a unit vector with $\|Ax\| = \|A\|$, and let $y = (1/\lambda_1)Ax \in \mathbb{R}^m$, i.e., $y$ is also a unit vector. We pick vectors $v_2,\dots,v_n \in \mathbb{R}^n$ and $u_2,\dots,u_m \in \mathbb{R}^m$ such that $\{x, v_2,\dots,v_n\}$ is an orthonormal basis in $\mathbb{R}^n$ and $\{y, u_2,\dots,u_m\}$ is an orthonormal basis in $\mathbb{R}^m$, respectively. Writing

$$V_1 = [x, v_2,\dots,v_n] \in \mathbb{R}^{n\times n}, \quad U_1 = [y, u_2,\dots,u_m] \in \mathbb{R}^{m\times m},$$

we see that these matrices are orthogonal and it holds that

$$A_1 = U_1^{\mathsf{T}} A V_1 = \begin{bmatrix} y^{\mathsf{T}} \\ u_2^{\mathsf{T}} \\ \vdots \\ u_m^{\mathsf{T}} \end{bmatrix} [\lambda_1 y, Av_2,\dots,Av_n] = \begin{bmatrix} \lambda_1 & w^{\mathsf{T}} \\ 0 & B \end{bmatrix},$$

where $w \in \mathbb{R}^{n-1}$ and $B \in \mathbb{R}^{(m-1)\times(n-1)}$. Since

$$A_1 \begin{bmatrix} \lambda_1 \\ w \end{bmatrix} = \begin{bmatrix} \lambda_1^2 + \|w\|^2 \\ Bw \end{bmatrix},$$

it follows that

$$\left\| A_1 \begin{bmatrix} \lambda_1 \\ w \end{bmatrix} \right\| \geq \lambda_1^2 + \|w\|^2,$$

or

$$\|A_1\| \geq \sqrt{\lambda_1^2 + \|w\|^2}.$$

On the other hand, an orthogonal transformation preserves the matrix norm, so

$$\lambda_1 = \|A\| = \|A_1\| \geq \sqrt{\lambda_1^2 + \|w\|^2}.$$

Therefore $w = 0$. Now we continue inductively. Let $\lambda_2 = \|B\|$. We have

$$\lambda_2 \leq \|A_1\| = \|A\| = \lambda_1.$$

If $\lambda_2 = 0$, i.e., $B = 0$, we are finished. Assume that $\lambda_2 > 0$. Proceeding as above, we find orthogonal matrices $\widetilde{U}_2 \in \mathbb{R}^{(m-1)\times(m-1)}$ and $\widetilde{V}_2 \in \mathbb{R}^{(n-1)\times(n-1)}$ such that

$$\widetilde{U}_2^{\mathsf{T}} B \widetilde{V}_2 = \begin{bmatrix} \lambda_2 & 0 \\ 0 & C \end{bmatrix}$$

for some $C \in \mathbb{R}^{(m-2)\times(n-2)}$. By defining

$$U_2 = \begin{bmatrix} 1 & 0 \\ 0 & \widetilde{U}_2 \end{bmatrix}, \quad V_2 = \begin{bmatrix} 1 & 0 \\ 0 & \widetilde{V}_2 \end{bmatrix},$$

we get orthogonal matrices that have the property

$$U_2^{\mathsf{T}} U_1^{\mathsf{T}} A V_1 V_2 = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & C \end{bmatrix}.$$

Inductively, performing the same partial diagonalization by orthogonal transformations, we finally get the claim. $\Box$
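Numerically, the decomposition of Theorem A.1 is available in standard software. As an aside (an illustration only, not part of the text), the following Python sketch, assuming NumPy is available, verifies the properties of the factors for a random matrix:

import numpy as np

m, n = 5, 3
A = np.random.randn(m, n)

# Full SVD: U is m-by-m, Vt is n-by-n, lam holds the min(m, n) singular values
U, lam, Vt = np.linalg.svd(A, full_matrices=True)

# Assemble the m-by-n diagonal factor Lambda
Lam = np.zeros((m, n))
Lam[:min(m, n), :min(m, n)] = np.diag(lam)

# The singular values are nonnegative and nonincreasing ...
assert np.all(lam >= 0) and np.all(np.diff(lam) <= 0)
# ... U and V are orthogonal ...
assert np.allclose(U.T @ U, np.eye(m)) and np.allclose(Vt @ Vt.T, np.eye(n))
# ... and A = U Lambda V^T up to rounding error.
assert np.allclose(A, U @ Lam @ Vt)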

Theorem A.2. Let $A = U\Lambda V^{\mathsf{T}} \in \mathbb{R}^{m\times n}$ be the singular value decomposition as above, and assume that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p > \lambda_{p+1} = \cdots = \lambda_{\min(m,n)} = 0$. By denoting $U = [u_1,\dots,u_m]$, $V = [v_1,\dots,v_n]$, we have

$$\operatorname{Ker}(A) = \operatorname{sp}\{v_{p+1},\dots,v_n\} = \operatorname{Ran}(A^{\mathsf{T}})^{\perp},$$
$$\operatorname{Ker}(A^{\mathsf{T}}) = \operatorname{sp}\{u_{p+1},\dots,u_m\} = \operatorname{Ran}(A)^{\perp}.$$

Proof: By the orthogonality of $U$, we have $Ax = 0$ if and only if

$$\Lambda V^{\mathsf{T}} x = \begin{bmatrix} \lambda_1 v_1^{\mathsf{T}} x \\ \vdots \\ \lambda_p v_p^{\mathsf{T}} x \\ 0 \\ \vdots \\ 0 \end{bmatrix} = 0,$$

from where the claim can be seen immediately. Furthermore, we have $A^{\mathsf{T}} = V\Lambda^{\mathsf{T}} U^{\mathsf{T}}$, so a similar result holds for the transpose by interchanging $U$ and $V$. On the other hand, we see that $x \in \operatorname{Ker}(A)$ is equivalent to

$$(Ax)^{\mathsf{T}} y = x^{\mathsf{T}} A^{\mathsf{T}} y = 0 \quad \text{for all } y \in \mathbb{R}^m,$$

i.e., $x$ is perpendicular to the range of $A^{\mathsf{T}}$. $\Box$

In particular, if we divide $U$ and $V$ as

$$U = [[u_1,\dots,u_p],[u_{p+1},\dots,u_m]] = [U_1, U_2],$$

$$V = [[v_1,\dots,v_p],[v_{p+1},\dots,v_n]] = [V_1, V_2],$$

the orthogonal projectors $P : \mathbb{R}^n \to \operatorname{Ker}(A)$ and $\widetilde{P} : \mathbb{R}^m \to \operatorname{Ker}(A^{\mathsf{T}})$ assume the simple forms

$$P = V_2 V_2^{\mathsf{T}}, \quad \widetilde{P} = U_2 U_2^{\mathsf{T}}.$$
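These projectors are straightforward to form once the singular value decomposition is computed. A small sketch (again assuming NumPy; the rank-deficient test matrix is a made-up example):

import numpy as np

# A deliberately rank-deficient 4-by-3 matrix: the third column is the sum
# of the first two, so p = rank(A) = 2 almost surely.
A = np.random.randn(4, 2) @ np.array([[1.0, 0.0, 1.0],
                                      [0.0, 1.0, 1.0]])

U, lam, Vt = np.linalg.svd(A, full_matrices=True)
p = int(np.sum(lam > 1e-12 * lam[0]))   # numerical rank via a singular value cutoff

V2 = Vt.T[:, p:]                        # orthonormal basis of Ker(A)
U2 = U[:, p:]                           # orthonormal basis of Ker(A^T)
P = V2 @ V2.T                           # orthogonal projector onto Ker(A)
Pt = U2 @ U2.T                          # orthogonal projector onto Ker(A^T)

# A annihilates the projected vectors: A P x = 0 and A^T Pt y = 0.
x, y = np.random.randn(3), np.random.randn(4)
assert np.allclose(A @ (P @ x), 0.0) and np.allclose(A.T @ (Pt @ y), 0.0)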

A.2 Functional Analysis

We confine the discussion to Hilbert spaces.

Definition A.3. Let $A : H_1 \to H_2$ be a continuous linear operator between Hilbert spaces $H_1$ and $H_2$. The operator is said to be compact if the closures $\overline{A(U)} \subset H_2$ are compact for every bounded set $U \subset H_1$.

In the following simple lemmas, we denote by X⊥ the orthocomplement of the subspace X ⊂ H.

Lemma A.4. Let $X \subset H$ be a closed subspace of the Hilbert space $H$. Then $(X^{\perp})^{\perp} = X$.

Proof: Let $x \in X$. Then for every $y \in X^{\perp}$, $\langle x, y\rangle = 0$, i.e., $x \in (X^{\perp})^{\perp}$. To prove the converse inclusion, let $x \in (X^{\perp})^{\perp}$. Denote by $P : H \to X$ the orthogonal projection. We have $x - Px \in X^{\perp}$. On the other hand, $Px \in X \subset (X^{\perp})^{\perp}$, so $x - Px \in (X^{\perp})^{\perp}$. Therefore, $x - Px$ is orthogonal to itself and thus $x = Px \in X$. The proof is complete. $\Box$

Lemma A.5. Let $X \subset H$ be a subspace. Then $\overline{X}^{\perp} = X^{\perp}$.

The proof is rather obvious and therefore omitted.

Lemma A.6. Let A : H1 → H2 be a bounded linear operator. Then

$$\operatorname{Ker}(A^*) = \operatorname{Ran}(A)^{\perp}, \quad \operatorname{Ker}(A) = \operatorname{Ran}(A^*)^{\perp}.$$

Proof: We have x ∈ Ker(A∗) if and only if

$$0 = \langle A^* x, y\rangle = \langle x, Ay\rangle \quad \text{for all } y \in H_1,$$

proving the claim. $\Box$

Based on these three lemmas, the proof of part (i) of Proposition 2.1 follows. Indeed, $\operatorname{Ker}(A) \subset H_1$ is closed and we have

$$H_1 = \operatorname{Ker}(A) \oplus \operatorname{Ker}(A)^{\perp} = \operatorname{Ker}(A) \oplus (\operatorname{Ran}(A^*)^{\perp})^{\perp}$$

by Lemma A.6, and by Lemmas A.4 and A.5,

$$(\operatorname{Ran}(A^*)^{\perp})^{\perp} = (\overline{\operatorname{Ran}(A^*)}^{\perp})^{\perp} = \overline{\operatorname{Ran}(A^*)}.$$

Part (ii) of Proposition 2.1 is based on the fact that the spectrum of a compact operator is discrete and accumulates possibly only at the origin. If $A : H_1 \to H_2$ is compact, then $A^*A : H_1 \to H_1$ is a compact and positive operator. Denote the positive eigenvalues by $\lambda_n^2$, and let the corresponding normalized eigenvectors be $v_n \in H_1$. Since $A^*A$ is self-adjoint, the eigenvectors can be chosen mutually orthogonal. Defining

$$u_n = \frac{1}{\lambda_n} A v_n,$$

we observe that

$$\langle u_n, u_k\rangle = \frac{1}{\lambda_n \lambda_k}\langle v_n, A^*A v_k\rangle = \delta_{nk},$$

hence the vectors $u_n$ are also orthonormal. The triplet $(v_n, u_n, \lambda_n)$ is the singular system of the operator $A$.

Next, we prove the fixed point theorem (Proposition 2.8) used in Section 2.4.1.

Proof of Proposition 2.8: Let $T : H \to H$ be a mapping, $S \subset H$ an invariant closed set such that $T(S) \subset S$, and let $T$ be a contraction in $S$,

$$\|T(x) - T(y)\| < \kappa\|x - y\| \quad \text{for all } x, y \in S,$$

where $\kappa < 1$. Pick $x_1 \in S$ and define the iterates $x_{j+1} = T(x_j)$. For any $j > 1$, we have the estimate

$$\|x_{j+1} - x_j\| = \|T(x_j) - T(x_{j-1})\| < \kappa\|x_j - x_{j-1}\|,$$

and inductively

$$\|x_{j+1} - x_j\| < \kappa^{j-1}\|x_2 - x_1\|.$$

It follows that for any $n, k \in \mathbb{N}$, we have

$$\|x_{n+k} - x_n\| \leq \sum_{j=1}^{k}\|x_{n+j} - x_{n+j-1}\| < \sum_{j=1}^{k} \kappa^{n+j-2}\|x_2 - x_1\| \leq \frac{\kappa^{n-1}}{1-\kappa}\|x_2 - x_1\|$$

by the geometric sum formula. Therefore, $(x_j)$ is a Cauchy sequence and thus convergent, and the limit is in $S$ since $S$ is closed. $\Box$

We finish this section by giving without proof a version of the Riesz representation theorem.

Theorem A.7. Let $B : H\times H \to \mathbb{R}$ be a bilinear quadratic form satisfying

$$|B(x, y)| \leq C\|x\|\,\|y\| \quad \text{for all } x, y \in H,$$
$$B(x, x) \geq c\|x\|^2 \quad \text{for all } x \in H$$

for some constants $0 < c \leq C < \infty$. Then there is a bounded linear operator $T : H \to H$ with a bounded inverse such that $B(x, y) = \langle Tx, y\rangle$ for all $x, y \in H$.

The variant of this for Hilbert spaces with complex coefficients is known as the Lax–Milgram lemma. See [142] for a proof.
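Returning to the contraction argument in the proof of Proposition 2.8, the successive substitution scheme it analyzes is easy to demonstrate numerically. A minimal sketch, assuming NumPy; the particular map and tolerance are arbitrary choices, not from the text:

import numpy as np

# T(x) = 0.5 cos(x) maps the closed set S = [-1, 1] into itself and is a
# contraction there with kappa = 0.5, since |T'(x)| = 0.5 |sin(x)| <= 0.5.
def T(x):
    return 0.5 * np.cos(x)

x = 1.0                            # starting point x_1 in S
for j in range(100):               # the iterates x_{j+1} = T(x_j)
    x_new = T(x)
    if abs(x_new - x) < 1e-14:     # the increments shrink at least like kappa^j
        break
    x = x_new

print(x, abs(T(x) - x))            # the fixed point: T(x) = x up to rounding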

A.3 Sobolev Spaces

In this section, we review the central definitions and results concerning Sobolev spaces. For square integrable functions $f : \mathbb{R}^n \to \mathbb{R}$, define the $L^2$-norm over $\mathbb{R}^n$ as

$$\|f\| = \left(\int_{\mathbb{R}^n} |f(x)|^2\,dx\right)^{1/2} < \infty. \tag{A.1}$$

The linear space of square integrable functions is denoted by $L^2 = L^2(\mathbb{R}^n)$. The norm (A.1) is induced by the inner product

$$\langle f, g\rangle = \int_{\mathbb{R}^n} f(x)g(x)\,dx,$$

which gives $L^2$ a Hilbert space structure. The Fourier transform for integrable functions of $\mathbb{R}^n$ is defined as

$$\mathcal{F}f(\xi) = \widehat{f}(\xi) = \int_{\mathbb{R}^n} e^{-i\langle \xi, x\rangle} f(x)\,dx.$$

The Fourier transform can be extended to $L^2$, and it defines a bijective mapping $\mathcal{F} : L^2 \to L^2$, see, e.g., [108]. Its inverse transformation is then

$$\mathcal{F}^{-1}\widehat{f}(x) = f(x) = \left(\frac{1}{2\pi}\right)^{n}\int_{\mathbb{R}^n} e^{i\langle \xi, x\rangle}\widehat{f}(\xi)\,d\xi.$$

In $L^2$, the Fourier transform is an isometry up to a constant, and it satisfies the identities

$$\|\widehat{f}\| = (2\pi)^{n/2}\|f\| \quad \text{(Parseval)},$$

$$\langle \widehat{f}, \widehat{g}\rangle = (2\pi)^{n}\langle f, g\rangle \quad \text{(Plancherel)}.$$

Defining the convolution of functions as

$$f * g(x) = \int f(x - y)g(y)\,dy,$$

provided that the integral converges, we have

$$\mathcal{F}(f * g)(\xi) = \widehat{f}(\xi)\,\widehat{g}(\xi).$$
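The discrete counterpart of this convolution theorem is the basis of FFT convolution. A minimal sketch (assuming NumPy, with circular convolution on a finite grid standing in for the integral over $\mathbb{R}^n$):

import numpy as np

rng = np.random.default_rng(0)
N = 64
f, g = rng.standard_normal(N), rng.standard_normal(N)

# Circular convolution computed directly from the definition ...
direct = np.array([sum(f[(k - j) % N] * g[j] for j in range(N))
                   for k in range(N)])

# ... and via the convolution theorem: the transform of f * g is fhat * ghat.
via_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

assert np.allclose(direct, via_fft)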

Let $\alpha$ be a multiindex, $\alpha = (\alpha_1,\alpha_2,\dots,\alpha_n) \in \mathbb{N}^n$. Define the length of the multiindex, $|\alpha|$, as $|\alpha| = \alpha_1 + \cdots + \alpha_n$. We define

$$D^{\alpha} f(x) = (-i)^{|\alpha|}\frac{\partial^{\alpha_1}}{\partial x_1^{\alpha_1}}\frac{\partial^{\alpha_2}}{\partial x_2^{\alpha_2}}\cdots\frac{\partial^{\alpha_n}}{\partial x_n^{\alpha_n}} f(x) = (-i)^{|\alpha|}\frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\cdots\partial x_n^{\alpha_n}} f(x).$$

Similarly, for $x \in \mathbb{R}^n$, we define

$$x^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2}\cdots x_n^{\alpha_n}.$$

The Fourier transform has the property

$$\mathcal{F}(D^{\alpha} f)(\xi) = \xi^{\alpha}\widehat{f}(\xi). \tag{A.2}$$
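The identity (A.2) also underlies spectral differentiation. A minimal sketch (assuming NumPy, on a one-dimensional periodic grid, i.e., the case $\alpha = 1$):

import numpy as np

n, L = 256, 2 * np.pi
x = np.arange(n) * (L / n)
f = np.sin(3 * x)

xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)    # discrete frequencies
# By (A.2), D f = -i f' transforms to xi * fhat, so f' transforms to i xi * fhat
df = np.fft.ifft(1j * xi * np.fft.fft(f)).real

assert np.allclose(df, 3 * np.cos(3 * x), atol=1e-10)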

The classical Sobolev spaces $W^k$ over $\mathbb{R}^n$, $k \geq 0$, are defined as the linear spaces of those functions $f \in L^2$ for which

$$D^{\alpha} f \in L^2 \quad \text{for all multiindices } \alpha,\ |\alpha| \leq k.$$

This space is equipped with the norm

$$\|f\|_{W^k} = \left(\sum_{|\alpha| \leq k}\int_{\mathbb{R}^n} |D^{\alpha} f(x)|^2\,dx\right)^{1/2}. \tag{A.3}$$

By using identity (A.2), the definition of Sobolev spaces can be extended to noninteger smoothness indices. For arbitrary $s \in \mathbb{R}$, we define the space $H^s = H^s(\mathbb{R}^n)$ as the linear space of those functions¹ that satisfy

$$\|f\|_{H^s} = \left(\int_{\mathbb{R}^n} |\widehat{f}(\xi)|^2 (1 + |\xi|^2)^{s}\,d\xi\right)^{1/2} < \infty. \tag{A.4}$$
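As a computational aside (an illustration under stated assumptions, not from the text), a norm of the type (A.4) can be approximated on a uniform periodic grid with the discrete Fourier transform; the grid size and the smoothness indices below are arbitrary choices, assuming NumPy:

import numpy as np

def hs_norm(f, s, L=2 * np.pi):
    """Approximate an H^s-type norm of samples f on a uniform periodic grid of length L."""
    n = f.size
    xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # discrete frequencies
    fhat = np.fft.fft(f) * (L / n)                # Riemann-sum approximation of the Fourier integral
    return np.sqrt(np.sum(np.abs(fhat)**2 * (1 + xi**2)**s) / L)

x = np.linspace(0.0, 2 * np.pi, 256, endpoint=False)
f = np.sin(3 * x)
# The weight (1 + |xi|^2)^s penalizes high frequencies more as s grows,
# so the computed norms increase with s.
print(hs_norm(f, 0.0), hs_norm(f, 1.0), hs_norm(f, 2.0))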

It turns out that $H^k = W^k$ for $k \in \mathbb{N}$, and the norms (A.3) and (A.4) are equivalent. Also, from Parseval's identity we see that $H^0 = L^2$.

Let $\Omega \subset \mathbb{R}^n$ be an open domain and assume, for simplicity, that the boundary of $\Omega$ is smooth. The Sobolev space $H^s(\Omega)$ of functions defined in $\Omega$ is

$$H^s(\Omega) = \left\{ f \mid f = g|_{\Omega},\ g \in H^s(\mathbb{R}^n)\right\}.$$

The norm in $H^s(\Omega)$ is the infimum of the norms of all the functions $g$ in $H^s(\mathbb{R}^n)$ whose restriction to $\Omega$ is $f$. Evidently, $H^{s_1}(\Omega) \subset H^{s_2}(\Omega)$ for $s_1 > s_2$. The Sobolev spaces over bounded domains have the following important property.

Theorem A.8. Let $\Omega \subset \mathbb{R}^n$ be a smooth, bounded domain and let $s_1 > s_2$. Then the embedding

$$I : H^{s_1}(\Omega) \to H^{s_2}(\Omega), \quad f \mapsto f,$$

is compact, i.e., every bounded set in $H^{s_1}(\Omega)$ is relatively compact in $H^{s_2}(\Omega)$.

¹The term "function" refers here to tempered distributions. Also, the derivatives should be interpreted in the distributional sense. See [108] for the exact formulation.

In particular, every bounded sequence of $H^{s_1}$-functions contains a subsequence that converges in $H^{s_2}$. For a proof of this result, see, e.g., [2].

If $\Omega \subset \mathbb{R}^n$ has a smooth boundary, we may define Sobolev spaces of functions defined on the boundary $\partial\Omega$. If $U \subset \partial\Omega$ is an open set and $\psi : U \to \mathbb{R}^{n-1}$ is a smooth diffeomorphism, a compactly supported function $f : U \to \mathbb{C}$ is in the Sobolev space $H^s(\partial\Omega)$ if

$$f \circ \psi^{-1} \in H^s(\mathbb{R}^{n-1}).$$

More generally, by using partitions of unity on $\partial\Omega$, we may define the condition $f \in H^s(\partial\Omega)$ for functions whose supports are not restricted to sets that are diffeomorphic with $\mathbb{R}^{n-1}$. The details are omitted here.

We give without proof the following trace theorem of Sobolev spaces. The proof can be found, e.g., in [49].

Theorem A.9. The trace mapping $f \mapsto f|_{\partial\Omega}$, defined for continuous functions in $\mathbb{R}^n$, extends to a continuous mapping $H^s(\Omega) \to H^{s-1/2}(\partial\Omega)$, $s > 1/2$. Moreover, the mapping has a continuous right inverse.

B Appendix 2: Basics on Probability

In the statistical inversion theory, the fundamental notions are random variables and probability distributions. To fix the notations, a brief summary of the basic concepts is given in this appendix. In this summary, we avoid getting deep into the axiomatic foundations. The reader is recommended to consult standard textbooks of probability theory for proofs, e.g., [24], [32], [43].

B.1 Basic Concepts

Let $\Omega$ be an abstract space and $\mathcal{S}$ a collection of its subsets. We say that $\mathcal{S}$ is a σ-algebra of subsets of $\Omega$ if the following conditions are satisfied:

1. $\Omega \in \mathcal{S}$.
2. If $A \in \mathcal{S}$, then $A^{c} = \Omega\setminus A \in \mathcal{S}$.
3. If $A_i \in \mathcal{S}$, $i \in \mathbb{N}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{S}$.

Let $\mathcal{S}$ be a σ-algebra over a space $\Omega$. A mapping $\mu : \mathcal{S} \to \mathbb{R}$ is called a measure if the following conditions are met:

1. $\mu(A) \geq 0$ for all $A \in \mathcal{S}$.
2. $\mu(\emptyset) = 0$.
3. If the sets $A_i \in \mathcal{S}$, $i \in \mathbb{N}$, are disjoint (i.e., $A_i \cap A_j = \emptyset$ whenever $i \neq j$), then

$$\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty}\mu(A_i).$$

The last condition is often expressed by saying that $\mu$ is σ-additive. A σ-algebra is called complete if it contains all sets of measure zero, i.e., if $A \subset B$ and $B \in \mathcal{S}$ with $\mu(B) = 0$, then $A \in \mathcal{S}$. It is always possible to complete a σ-algebra by adding sets of measure zero to it.

A measure $\mu$ is called finite if $\mu(\Omega) < \infty$. The measure is called σ-finite if there is a sequence of subsets $A_n \in \mathcal{S}$, $n \in \mathbb{N}$, with the properties

$$\Omega = \bigcup_{n=1}^{\infty} A_n, \quad \mu(A_n) < \infty \text{ for all } n.$$

Of particular importance are the measures that satisfy

µ(Ω) = 1.

Such a measure is called a probability measure. Let $\mathcal{S}$ be a σ-algebra over $\Omega$ and $P$ a probability measure. The triple $(\Omega, \mathcal{S}, P)$ is called the probability space. The abstract space $\Omega$ is called the sample space, while $\mathcal{S}$ is the set of events, and $P(A)$ is the probability of the event $A \in \mathcal{S}$.

A particular example of a σ-algebra is the Borel σ-algebra $\mathcal{B} = \mathcal{B}(\mathbb{R}^n)$ over $\mathbb{R}^n$, defined as the smallest σ-algebra containing the open sets of $\mathbb{R}^n$. We say that the Borel σ-algebra is generated by the open sets of $\mathbb{R}^n$. In this book, $\mathbb{R}^n$ is always equipped with the Borel σ-algebra. The elements of the Borel σ-algebra are called Borel sets.

As another example of a σ-algebra, let $I$ be a discrete set, i.e., a set of either a finite or enumerable number of elements. A natural σ-algebra is the set $\mathcal{S} = \{A \mid A \subset I\}$, i.e., the power set of $I$ containing all subsets of $I$.

An important example of a σ-finite measure defined on the Borel σ-algebra of $\mathbb{R}^n$ is the Lebesgue measure $m$, assigning to a set its volume. For

$$Q = [a_1, b_1]\times\cdots\times[a_n, b_n], \quad a_j \leq b_j,$$

we have

$$m(Q) = \int_Q dx = (b_1 - a_1)\cdots(b_n - a_n).$$

In this book, we confine ourselves to considering only random variables taking values either in $\mathbb{R}^n$ or in discrete sets. Therefore, the particular σ-algebras above are essentially the only ones needed. It is possible to develop the theory in more general settings, but that would require more measure-theoretic considerations which go beyond the scope of this book.

A central concept in probability theory is independence. Let $(\Omega, \mathcal{S}, P)$ be a probability space. The events $A, B \in \mathcal{S}$ are independent if

P(A ∩ B)=P(A)P(B).

More generally, a family $\{A_t \mid t \in T\} \subset \mathcal{S}$ with $T$ an arbitrary index set is independent if

$$P\left(\bigcap_{i=1}^{m} A_{t_i}\right) = \prod_{i=1}^{m} P(A_{t_i})$$

for any finite sampling of the indices $t_i \in T$.

Given a probability space $(\Omega, \mathcal{S}, P)$, a random variable with values in $\mathbb{R}^n$ is a measurable mapping $X : \Omega \to \mathbb{R}^n$, i.e., $X^{-1}(B) \in \mathcal{S}$ for every open set $B \subset \mathbb{R}^n$. A random variable with values in a discrete set $I$ is defined similarly.

We adopt in this section, whenever possible, the common notational convention that random variables are denoted by capital letters and their realizations by lowercase letters. Hence, we may write $X(\omega) = x$, $\omega \in \Omega$. With a slight abuse of notation, we indicate the value set of a random variable $X : \Omega \to \mathbb{R}^n$ by using the shorthand notation $X \in \mathbb{R}^n$.

A random variable $X \in \mathbb{R}^n$ generates a probability measure $\mu_X$ in $\mathbb{R}^n$, equipped with its Borel σ-algebra, through the formula

$$\mu_X(B) = P(X^{-1}(B)), \quad B \in \mathcal{B}.$$

This measure is called the probability distribution of $X$. The distribution function $F : \mathbb{R}^n \to [0, 1]$ of a random variable $X = (X_1,\dots,X_n) \in \mathbb{R}^n$ is defined as

$$F(x) = P(X_1 \leq x_1, X_2 \leq x_2,\dots,X_n \leq x_n), \quad x = (x_1, x_2,\dots,x_n).$$

The expectation of a random variable is defined as

$$\mathrm{E}\{X\} = \int_{\Omega} X(\omega)\,dP(\omega) = \int_{\mathbb{R}^n} x\,d\mu_X(x),$$

provided that the integrals converge. Similarly, the correlation and covariance matrices of the variables are given as

$$\operatorname{corr}(X) = \mathrm{E}\{XX^{\mathsf{T}}\} = \int_{\Omega} X(\omega)X(\omega)^{\mathsf{T}}\,dP(\omega) = \int_{\mathbb{R}^n} xx^{\mathsf{T}}\,d\mu_X(x) \in \mathbb{R}^{n\times n}$$

and

$$\operatorname{cov}(X) = \operatorname{corr}(X - \mathrm{E}\{X\}) = \mathrm{E}\left\{(X - \mathrm{E}\{X\})(X - \mathrm{E}\{X\})^{\mathsf{T}}\right\}$$
$$= \int_{\Omega}(X(\omega) - \mathrm{E}\{X\})(X(\omega) - \mathrm{E}\{X\})^{\mathsf{T}}\,dP(\omega) = \int_{\mathbb{R}^n}(x - \mathrm{E}\{X\})(x - \mathrm{E}\{X\})^{\mathsf{T}}\,d\mu_X(x)$$

$$= \operatorname{corr}(X) - \mathrm{E}\{X\}(\mathrm{E}\{X\})^{\mathsf{T}},$$

assuming again that they exist.
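In computations, these moments are typically estimated from samples. The following sketch (illustrative only, assuming NumPy; the mean and covariance below are arbitrary choices) checks the identity $\operatorname{cov}(X) = \operatorname{corr}(X) - \mathrm{E}\{X\}(\mathrm{E}\{X\})^{\mathsf{T}}$ on simulated data:

import numpy as np

rng = np.random.default_rng(1)
# 100000 realizations of a two-dimensional Gaussian random variable
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]],
                            size=100000)

mean = X.mean(axis=0)                                # estimate of E{X}
corr = (X[:, :, None] * X[:, None, :]).mean(axis=0)  # estimate of corr(X) = E{X X^T}
cov = corr - np.outer(mean, mean)                    # the identity above

# Matches the (biased) sample covariance exactly, up to rounding error.
assert np.allclose(cov, np.cov(X.T, bias=True))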

Let {Xn} be a sequence of random variables, and let

$$S_n = \frac{1}{n}\sum_{j=1}^{n} X_j. \tag{B.1}$$

The following theorem is a version of the central limit theorem.

Theorem B.1. Assume that the random variables $X_n$ are independent and identically distributed, with

$$\mathrm{E}\{X_n\} = \mu, \quad \operatorname{cov}(X_n) = \sigma^2.$$

Then the random variables $S_n$ in (B.1) are asymptotically normally distributed in the following sense:

$$\lim_{n\to\infty} P\left(\frac{S_n - \mu}{\sigma/\sqrt{n}} < x\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt.$$
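A quick simulation illustrates Theorem B.1 (a sketch only, assuming NumPy; the uniform distribution and the sample sizes are arbitrary choices):

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, trials = 200, 20000
mu, sigma = 0.5, sqrt(1.0 / 12.0)          # mean and standard deviation of Uniform(0, 1)

# trials independent realizations of S_n = (1/n) sum_{j=1}^n X_j
S = rng.random((trials, n)).mean(axis=1)
Z = (S - mu) / (sigma / sqrt(n))           # the normalized variable of Theorem B.1

x = 1.0
empirical = np.mean(Z < x)                 # empirical P(Z < x)
gaussian = 0.5 * (1.0 + erf(x / sqrt(2)))  # standard normal distribution function at x
print(empirical, gaussian)                 # both approximately 0.84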

The following proposition is a version of the Radon–Nikodym theorem.

Proposition B.2. Let $\mu_1$ and $\mu_2$ be two σ-finite measures in $\mathbb{R}^n$ defined over the Borel σ-algebra such that $\mu_1$ is absolutely continuous with respect to $\mu_2$ (i.e., $\mu_2(B) = 0$ implies $\mu_1(B) = 0$), denoted $\mu_1 \prec \mu_2$. Then there exists a measurable function $\pi : \mathbb{R}^n \to \mathbb{R}$ such that

$$\mu_1(B) = \int_B \pi(x)\,d\mu_2(x)$$

for all $B \in \mathcal{B}$. The function $\pi$ is called the Radon–Nikodym derivative of $\mu_1$ with respect to $\mu_2$, denoted by

$$\pi = \frac{d\mu_1}{d\mu_2}.$$

It follows from the previous proposition that an absolutely continuous random variable $X$ defines a probability density $\pi_X$ by

$$P(X^{-1}(B)) = \mu_X(B) = \int_B \pi_X(x)\,dx, \quad B \in \mathcal{B}.$$

Let $X_1 \in \mathbb{R}^n$ and $X_2 \in \mathbb{R}^m$ be two random variables defined over the same probability space $\Omega$. The joint probability distribution is defined as

$$\mu_{X_1 X_2} : \mathcal{B}(\mathbb{R}^n)\times\mathcal{B}(\mathbb{R}^m) \to \mathbb{R}_+, \quad (B_1, B_2) \mapsto P(X_1^{-1}(B_1)\cap X_2^{-1}(B_2)).$$

In other words, the joint probability distribution is simply the probability distribution of the product random variable

$$X_1\times X_2 : \Omega \to \mathbb{R}^n\times\mathbb{R}^m, \quad \omega \mapsto (X_1(\omega), X_2(\omega)).$$

This definition can be extended to any number of random variables. We say that the random variables $X_1$ and $X_2$ are independent if

$$\mu_{X_1 X_2}(B_1, B_2) = \mu_{X_1}(B_1)\,\mu_{X_2}(B_2),$$

or, alternatively, if the events $X_j^{-1}(B_j) \in \mathcal{S}$ are independent for all Borel sets $B_j$. If the random variables are absolutely continuous, the independence can be expressed in terms of the probability densities as

$$\pi_{X_1 X_2}(x_1, x_2) = \pi_{X_1}(x_1)\,\pi_{X_2}(x_2)$$

for almost all $(x_1, x_2) \in \mathbb{R}^n\times\mathbb{R}^m$. Based on the Radon–Nikodym theorem stated in Proposition B.2, it would be possible to define the conditional probability distributions and densities of random variables. However, we postpone this general definition and consider first a more restricted class of distributions to keep the discussion more tractable.

B.2 Conditional Probabilities

A central role in statistical inversion theory is played by conditional probabilities. Therefore, we discuss this issue in slightly more detail.

The basic concept is the conditional probability of events. Hence, let $A, B \in \mathcal{S}$ and $P(B) > 0$. The conditional probability of $A$ with the condition $B$ is defined as

$$P(A \mid B) = \frac{P(A\cap B)}{P(B)}.$$

It holds that

$$P(A) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c), \quad B^c = \Omega\setminus B.$$

More generally, if $\Omega = \bigcup_{i\in I} B_i$ with $I$ enumerable and the sets $B_i$ disjoint, we have

$$P(A) = \sum_{i\in I} P(A \mid B_i)P(B_i).$$
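As a toy numerical check of these two formulas (the numbers are hypothetical, chosen only for illustration):

# Hypothetical numbers: a 95% sensitive and 90% specific test applied to
# a population in which the condition B has probability 0.02.
p_B = 0.02                    # P(B)
p_A_given_B = 0.95            # P(A | B)
p_A_given_Bc = 0.10           # P(A | B^c)

# Total probability: P(A) = P(A | B)P(B) + P(A | B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_Bc * (1.0 - p_B)

# Conditional probability the other way around: P(B | A) = P(A | B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A
print(p_A, p_B_given_A)       # 0.117 and approximately 0.162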

We want to extend the notion of conditional probability to random variables. The general definition can be done via the Radon–Nikodym theorem stated in Proposition B.2. However, to keep the ideas more transparent and better tractable, we limit the discussion first to the simple case of absolutely continuous variables with continuous probability densities. The general results are then briefly stated without detailed proofs.

Consider first two random variables $X_i : \Omega \to \mathbb{R}^{n_i}$, $i = 1, 2$. We adopt the following shorthand notation: if $B_i \subset \mathbb{R}^{n_i}$ are two Borel sets, we denote the joint probability of $B_1\times B_2$ as

$$\mu_{X_1 X_2}(B_1, B_2) = \mu(B_1, B_2) = P(X_1 \in B_1, X_2 \in B_2),$$

i.e., we suppress the subindex from the measure and use instead subindices for the sets to remind us which set refers to which random variable. Similarly, if $(X_1, X_2)$ is absolutely continuous, we write

$$\mu(B_1, B_2) = \int_{B_1\times B_2} \pi(x_1, x_2)\,dx_1 dx_2.$$

With these notations, we define first the marginal probability distribution of $X_1$ as

$$\mu(B_1) = \mu(B_1, \mathbb{R}^{n_2}) = P(X_1 \in B_1),$$

i.e., the probability of $X_1 \in B_1$ regardless of the value of $X_2$. Assume now that $(X_1, X_2)$ is absolutely continuous. Then we have

$$\mu(B_1) = \int_{B_1\times\mathbb{R}^{n_2}} \pi(x_1, x_2)\,dx_1 dx_2 = \int_{B_1} \pi(x_1)\,dx_1,$$

where the marginal probability density is given as

$$\pi(x_1) = \int_{\mathbb{R}^{n_2}} \pi(x_1, x_2)\,dx_2.$$

Analogously to the definition of conditional probabilities of events, we give the following definition.

Definition B.3. Assume that $B_i \subset \mathbb{R}^{n_i}$, $i = 1, 2$, are Borel sets with $B_2$ having positive marginal measure, i.e., $\mu(B_2) = P(X_2^{-1}(B_2)) > 0$. We define the conditional measure of $B_1$ conditioned on $B_2$ by

$$\mu(B_1 \mid B_2) = \frac{\mu(B_1, B_2)}{\mu(B_2)}.$$

In particular, if the random variables are absolutely continuous, we have

$$\mu(B_1 \mid B_2) = \frac{1}{\mu(B_2)}\int_{B_1\times B_2} \pi(x_1, x_2)\,dx_1 dx_2,$$

with

$$\mu(B_2) = \int_{B_2} \pi(x_2)\,dx_2 = \int_{\mathbb{R}^{n_1}\times B_2} \pi(x_1, x_2)\,dx_1 dx_2 > 0.$$

Notice that with a fixed $B_2$, the mapping $B_1 \mapsto \mu(B_1 \mid B_2)$ is a probability measure.

Remark: If the random variables $X_1$ and $X_2$ are independent, then

$$\mu(B_1 \mid B_2) = \mu(B_1), \quad \mu(B_2 \mid B_1) = \mu(B_2)$$

by definition, i.e., any restriction of one variable does not affect the distribution of the other.

Our goal is now to define the conditional measure conditioned on a fixed value, i.e., $X_2 = x_2$. The problem is that in general, we may have $P(X_2 = x_2) = 0$ and the conditioning as defined above fails. Thus, we have to go through a limiting process.

Lemma B.4. Assume that the random variables $X_1$ and $X_2$ are absolutely continuous with continuous densities and $x_2 \in \mathbb{R}^{n_2}$ is a point such that

$$\pi(x_2) = \int_{\mathbb{R}^{n_1}} \pi(x_1, x_2)\,dx_1 > 0.$$

Further, let $(B_2^{(j)})_{1\leq j<\infty}$ be a decreasing nested sequence of intervals in $\mathbb{R}^{n_2}$ such that $B_2^{(j)} \downarrow \{x_2\}$, i.e., $B_2^{(j+1)} \subset B_2^{(j)}$ and $\bigcap_j B_2^{(j)} = \{x_2\}$. Then the limit

$$\lim_{j\to\infty}\mu(B_1 \mid B_2^{(j)}) = \mu(B_1 \mid x_2)$$

exists and it can be evaluated as the integral

$$\mu(B_1 \mid x_2) = \frac{1}{\pi(x_2)}\int_{B_1} \pi(x_1, x_2)\,dx_1.$$

Proof: Let us denote again by $m(B)$ the Lebesgue measure of a measurable set $B$. We write

$$\mu(B_1 \mid B_2^{(j)}) = \frac{1}{\mu(B_2^{(j)})}\int_{B_1\times B_2^{(j)}} \pi(x_1, x_2)\,dx_1 dx_2$$
$$= \left(\frac{1}{m(B_2^{(j)})}\int_{B_2^{(j)}} \pi(x_2)\,dx_2\right)^{-1}\frac{1}{m(B_2^{(j)})}\int_{B_2^{(j)}}\int_{B_1} \pi(x_1, x_2)\,dx_1 dx_2.$$

By the assumption, the functions $x_2 \mapsto \pi(x_2)$ and $x_2 \mapsto \int_{B_1}\pi(x_1, x_2)\,dx_1$ are continuous, so

$$\lim_{j\to\infty}\frac{1}{m(B_2^{(j)})}\int_{B_2^{(j)}} \pi(x_2)\,dx_2 = \pi(x_2) > 0$$

and

$$\lim_{j\to\infty}\frac{1}{m(B_2^{(j)})}\int_{B_2^{(j)}}\int_{B_1} \pi(x_1, x_2)\,dx_1 dx_2 = \int_{B_1}\pi(x_1, x_2)\,dx_1,$$

and the claim follows. $\Box$

By the above lemma, we can extend the concept of a conditional measure to cover the case $B_2 = \{x_2\}$. Hence, we say that if $\pi(x_2) > 0$, the conditional probability of $B_1 \in \mathcal{B}(\mathbb{R}^{n_1})$ conditioned on $X_2 = x_2$ is the limit

$$\mu(B_1 \mid x_2) = \int_{B_1}\frac{\pi(x_1, x_2)}{\pi(x_2)}\,dx_1.$$

From the above lemma, we observe that the conditional probability measure $\mu(B_1 \mid x_2)$ is defined by the probability density

$$\pi(x_1 \mid x_2) = \frac{\pi(x_1, x_2)}{\pi(x_2)}, \quad \pi(x_2) > 0.$$

As a corollary, we obtain the useful formula for the joint probability distribution.

Corollary B.5. The joint probability density for (X1,X2) is

$$\pi(x_1, x_2) = \pi(x_1 \mid x_2)\,\pi(x_2) = \pi(x_2 \mid x_1)\,\pi(x_1).$$
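As an aside, the relations of Lemma B.4 and Corollary B.5 are easy to verify on a discretized density. A sketch (assuming NumPy; the bivariate Gaussian and the grid are arbitrary choices):

import numpy as np

# Discretize a bivariate Gaussian joint density pi(x1, x2) on a uniform grid
x1 = np.linspace(-5.0, 5.0, 201)
x2 = np.linspace(-5.0, 5.0, 201)
dx = x1[1] - x1[0]
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
rho = 0.8
joint = np.exp(-(X1**2 - 2*rho*X1*X2 + X2**2) / (2*(1 - rho**2)))
joint /= joint.sum() * dx * dx              # normalize so the density integrates to 1

marg1 = joint.sum(axis=1) * dx              # marginal density pi(x1)
marg2 = joint.sum(axis=0) * dx              # marginal density pi(x2)

cond12 = joint / marg2[None, :]             # pi(x1 | x2) = pi(x1, x2) / pi(x2)
cond21 = joint / marg1[:, None]             # pi(x2 | x1) = pi(x1, x2) / pi(x1)

# Each conditional slice is itself a probability density ...
assert np.allclose(cond12.sum(axis=0) * dx, 1.0)
# ... and both factorizations of Corollary B.5 recover the joint density.
assert np.allclose(cond12 * marg2[None, :], cond21 * marg1[:, None])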

The formula of Corollary B.5 is essentially the Bayes formula that constitutes the cornerstone of the Bayesian interpretation of inverse problems.

Finally, we briefly indicate how the conditional probability is defined in the more general setting. Let $B_1 \subset \mathbb{R}^{n_1}$ be a fixed measurable set with positive measure $\mu(B_1) = P(X_1^{-1}(B_1))$. Consider the mapping

$$\mathcal{B}(\mathbb{R}^{n_2}) \to \mathbb{R}_+, \quad B_2 \mapsto \mu(B_1, B_2).$$

This mapping is clearly a finite measure, and moreover

$$\mu(B_1, B_2) \leq \mu(B_2),$$

i.e., it is an absolutely continuous measure with respect to the marginal measure $B_2 \mapsto \mu(B_2)$. The Radon–Nikodym theorem asserts that there exists a measurable function $x_2 \mapsto \rho(B_1, x_2)$ such that

$$\mu(B_1, B_2) = \int_{B_2} \rho(B_1, x_2)\,d\mu(x_2).$$

It would be tempting to define the conditional probability $\mu(B_1 \mid x_2) = \rho(B_1, x_2)$. Basically, this is correct but not without further considerations. The problem here is that the mapping $x_2 \mapsto \rho(B_1, x_2)$ is defined only almost everywhere, i.e., there may be a set of $d\mu(x_2)$-measure zero where it is not defined. On the other hand, this set may depend on $B_1$. Since the number of different subsets $B_1$ is not countable, the union of these exceptional sets could well have a positive measure, in the worst case up to one. However, it is possible with careful measure-theoretic considerations to define a function

$$\mathcal{B}(\mathbb{R}^{n_1})\times\mathbb{R}^{n_2} \to \mathbb{R}_+, \quad (B_1, x_2) \mapsto \mu(B_1 \mid x_2),$$

such that

1. the mapping $B_1 \mapsto \mu(B_1 \mid x_2)$ is a probability measure for all $x_2$ except possibly in a set of $d\mu(x_2)$-measure zero;
2. the mapping $x_2 \mapsto \mu(B_1 \mid x_2)$ is measurable; and
3. the formula $\mu(B_1, B_2) = \int_{B_2}\mu(B_1 \mid x_2)\,\mu(dx_2)$ holds.

We shall not go into details here but refer to the literature.

References

1. Å. Björck. Numerical Methods for Least Squares Problems. SIAM, 1996.
2. R.A. Adams. Sobolev Spaces. Academic Press, 1975.
3. K.E. Andersen, S.P. Brooks and M.B. Hansen. Bayesian inversion of geoelectrical resistivity data. J. R. Stat. Soc. Ser. B, Stat. Methodol., 65:619–642, 2003.
4. B.D.O. Anderson and J.B. Moore. Optimal Filtering. Prentice-Hall, 1979.
5. S. Arridge and W.R.B. Lionheart. Nonuniqueness in diffusion-based optical tomography. Optics Lett., 1998.
6. S.R. Arridge. Optical tomography in medical imaging. Inv. Probl., 15:R41–R93, 1999.
7. K. Astala and L. Päivärinta. Calderón's inverse conductivity problem in the plane. Preprint, 2004.
8. A.C. Atkinson and A.N. Donev. Optimum Experimental Design. Oxford University Press, 1992.
9. S. Baillet, J.C. Mosher and R.M. Leahy. Electromagnetic brain mapping. IEEE Signal Proc. Mag., pp. 14–30, 2001.
10. D. Baroudi, J.P. Kaipio and E. Somersalo. Dynamical electric wire tomography: Time series approach. Inv. Probl., 14:799–813, 1998.
11. J. Barzilai and J.M. Borwein. Two-point step-size gradient methods. IMA J. Numer. Anal., 8:141–148, 1988.
12. M. Bertero and P. Boccacci. Introduction to Inverse Problems in Imaging. Institute of Physics, 1998.
13. J. Besag. Spatial interaction and the statistical analysis of lattice systems. J. Royal Statist. Soc., 36:192–236, 1974.
14. J. Besag. The statistical analysis of dirty pictures. Proc. Roy. Stat. Soc., 48:259–302, 1986.
15. G.E.P. Box and G.C. Tiao. Bayesian Inference in Statistical Analysis. Wiley, 1992 (1973).
16. S.C. Brenner and L.R. Scott. The Mathematical Theory of Finite Element Methods. Springer-Verlag, 1994.
17. P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer-Verlag, 1991.
18. A.P. Calderón. On an inverse boundary value problem. In W.H. Meyer and M.A. Raupp, editors, Seminar on Numerical Analysis and Its Applications to Continuum Physics, pp. 65–73, Rio de Janeiro, 1980. Brazilian Math. Society.

19. D. Calvetti, B. Lewis and L. Reichel. On the regularizing properties of the GMRES method. Numer. Math., 91:605–625, 2002.
20. D. Calvetti and L. Reichel. Tikhonov regularization of large linear problems. BIT, 43:263–283, 2003.
21. J. Carpenter, P. Clifford and P. Fearnhead. An improved particle filter for nonlinear problems. IEEE Proc. Radar Son. Nav., 146:2–7, 1999.
22. M. Cheney and D. Isaacson. Distinguishability in impedance imaging. IEEE Trans. Biomed. Eng., 39:852–860, 1992.
23. K.-S. Cheng, D. Isaacson, J.C. Newell and D.G. Gisser. Electrode models for electric current computed tomography. IEEE Trans. Biomed. Eng., 36:918–924, 1989.
24. Y.S. Chow and H. Teicher. Probability Theory: Independence, Interchangeability, Martingales. Springer-Verlag, 1978.
25. H. Cramér. Mathematical Methods of Statistics. Princeton University Press, 1946.
26. R.F. Curtain and H. Zwart. An Introduction to Infinite-Dimensional Linear Systems Theory. Springer-Verlag, 1995.
27. R. Dautray and J.-L. Lions. Mathematical Analysis and Numerical Methods for Science and Technology, volume 6. Springer-Verlag, 1993.
28. H. Dehghani, S.R. Arridge, M. Schweiger and D.T. Delpy. Optical tomography in the presence of void regions. J. Opt. Soc. Am., 17:1659–1670, 2000.
29. R. van der Merwe, A. Doucet, N. de Freitas and E. Wan. The unscented particle filter. http://www.cs.ubc.ca/~nando/papers/upfnips.ps, 2000.
30. D.C. Dobson and F. Santosa. Recovery of blocky images from noisy and blurred data. SIAM J. Appl. Math., 56(4):1181–1198, 1996.
31. D.L. Donoho, I.M. Johnstone, J.C. Hoch and A.S. Stern. Maximum entropy and the nearly black object. J. Roy. Statist. Soc. Ser. B, 54:41–81, 1992.
32. J.L. Doob. Stochastic Processes (Reprint of the 1953 original). Wiley-Interscience, 1990.
33. A. Doucet, J.F.G. de Freitas and N.J. Gordon. Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2000.
34. R.J. Elliot, L. Aggoun and J.B. Moore. Hidden Markov Models: Estimation and Control. Springer-Verlag, 1995.
35. H.W. Engl, M. Hanke and A. Neubauer. Regularization of Inverse Problems. Kluwer, 1996.
36. A.V. Fiacco and G.P. McCormick. Nonlinear Programming. SIAM, 1990.
37. M. Fink. Time-reversal of ultrasonic fields – Part I: Basic principles. IEEE Trans. Ultrason. Ferroelectr. Freq. Control, 39:555–567, 1992.
38. C. Fox and G. Nicholls. Sampling conductivity images via MCMC. In Proc. Leeds Annual Stat. Research Workshop 1997, pp. 91–100, 1997.
39. G.H. Golub, M. Heath and G. Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21:215–223, 1979.
40. A.E. Gelfand and A.F.M. Smith. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc., 85:398–409, 1990.
41. S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6:721–741, 1984.
42. D.B. Geselowitz. On the magnetic field generated outside an inhomogeneous volume conductor by internal current sources. IEEE Trans. Magn., 6:346–347, 1970.

43. I.I. Gikhman and A.V. Skorokhod. Theory of Stochastic Processes I. Springer-Verlag, 1980.
44. W.R. Gilks, S. Richardson and D.J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall, 1996.
45. E. Giusti. Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, 1984.
46. S.J. Godsill, A. Doucet and M. West. Maximum a posteriori sequence estimation using Monte Carlo particle filters. Ann. Inst. Stat. Math., 53:82–96, 2001.
47. G.H. Golub and C.F. van Loan. Matrix Computations. The Johns Hopkins University Press, 1989.
48. P.J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.
49. P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman Advanced Publishing, 1985.
50. C.W. Groetsch. Inverse Problems in the Mathematical Sciences. Vieweg, 1993.
51. M. Hämäläinen, H. Haario and M.S. Lehtinen. Inferences about sources of neuromagnetic fields using Bayesian parameter estimation. Technical Report TKK-F-A620, Helsinki University of Technology, 1987.
52. M. Hämäläinen, R. Hari, R.J. Ilmoniemi, J. Knuutila and O.V. Lounasmaa. Magnetoencephalography: theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys., 65:413–497, 1993.
53. M. Hämäläinen and R.J. Ilmoniemi. Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput., 32:35–42, 1994.
54. M. Hanke. Conjugate Gradient Type Methods for Ill-Posed Problems. Longman, 1995.
55. P.C. Hansen and D.P. O'Leary. The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput., 14:1487–1503, 1993.
56. P.C. Hansen. Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. SIAM, 1998.
57. W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.
58. J. Heino and E. Somersalo. Modelling error approach for optical anisotropies for solving the inverse problem in optical tomography. Preprint, 2004.
59. J. Heino and E. Somersalo. Estimation of optical absorption in anisotropic background. Inverse Problems, 18:559–573, 2002.
60. G.M. Henkin and R.G. Novikov. A multidimensional inverse problem in quantum and acoustic scattering. Inverse Problems, 4:103–121, 1988.
61. P.J. Huber. Robust Statistical Procedures. SIAM, 1977.
62. N. Hyvönen. Analysis of optical tomography with nonscattering regions. Proc. Edinburgh Math. Soc., 45:257–276, 2002.
63. D. Isaacson. Distinguishability of conductivities by electric current computed tomography. IEEE Trans. Med. Imaging, 5:91–95, 1986.
64. J.P. Kaipio, V. Kolehmainen, E. Somersalo and M. Vauhkonen. Statistical inversion and Monte Carlo sampling methods in electrical impedance tomography. Inv. Probl., 16:1487–1522, 2000.
65. J.P. Kaipio and E. Somersalo. Estimating anomalies from indirect observations. J. Comput. Phys., 181:398–406, 2002.
66. J.P. Kaipio, V. Kolehmainen, M. Vauhkonen and E. Somersalo. Inverse problems with structural prior information. Inv. Probl., 15:713–729, 1999.

67. J.P. Kaipio, A. Seppänen, E. Somersalo and H. Haario. Posterior covariance related optimal current patterns in electrical impedance tomography. Inv. Probl., 20:919–936, 2004.
68. J.P. Kaipio and E. Somersalo. Nonstationary inverse problems and state estimation. J. Inv. Ill-Posed Probl., 7:273–282, 1999.
69. R.E. Kalman and R.S. Bucy. New results in linear filtering and prediction theory. Trans. ASME J. Basic Eng., 83:95–108, 1961.
70. R.E. Kalman. A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng., 82D:35–45, 1960.
71. V. Kolehmainen, S.R. Arridge, W.R.B. Lionheart, M. Vauhkonen and J.P. Kaipio. Recovery of region boundaries of piecewise constant coefficients of an elliptic PDE from boundary data. Inv. Probl., 15:1375–1391, 1999.
72. V. Kolehmainen, S.R. Arridge, M. Vauhkonen and J.P. Kaipio. Simultaneous reconstruction of internal tissue region boundaries and coefficients in optical diffusion tomography. Phys. Med. Biol., 45:3267–3283, 2000.
73. V. Kolehmainen, S. Prince, S.R. Arridge and J.P. Kaipio. State-estimation approach to the nonstationary optical tomography problem. J. Opt. Soc. Am. A, 20(5):876–889, 2003.
74. V. Kolehmainen, S. Siltanen, S. Järvenpää, J.P. Kaipio, P. Koistinen, M. Lassas, J. Pirttilä and E. Somersalo. Statistical inversion for X-ray tomography with few radiographs II: Application to dental radiology. Phys. Med. Biol., 48:1465–1490, 2003.
75. V. Kolehmainen, M. Vauhkonen, J.P. Kaipio and S.R. Arridge. Recovery of piecewise constant coefficients in optical diffusion tomography. Opt. Express, 7(13):468–480, 2000.
76. V. Kolehmainen, A. Voutilainen and J.P. Kaipio. Estimation of non-stationary region boundaries in EIT: state estimation approach. Inv. Probl., 17:1937–1956, 2001.
77. K. Kurpisz and A.J. Nowak. Inverse Thermal Problems. Computational Mechanics Publications, Southampton, 1995.
78. J. Lampinen, A. Vehtari and K. Leinonen. Application of Bayesian neural network in electrical impedance tomography. In Proc. IJCNN'99, Washington, DC, USA, July 1999.
79. J. Lampinen, A. Vehtari and K. Leinonen. Using a Bayesian neural network to solve the inverse problem in electrical impedance tomography. In B.K. Ersboll and P. Johansen, editors, Proceedings of the 11th Scandinavian Conference on Image Analysis SCIA'99, pp. 87–93, Kangerlussuaq, Greenland, June 1999.
80. S. Lasanen. Discretizations of generalized random variables with applications to inverse problems. Ann. Acad. Sci. Fenn. Math. Dissertationes, 130:1–64, 2002.
81. M. Lassas and S. Siltanen. Can one use total variation prior for edge-preserving Bayesian inversion? Inv. Probl., 20:1537–1563, 2004.
82. M.S. Lehtinen. On statistical inversion theory. In H. Haario, editor, Theory and Applications of Inverse Problems. Longman, 1988.
83. M.S. Lehtinen, L. Päivärinta and E. Somersalo. Linear inverse problems for generalized random variables. Inv. Probl., 5:599–612, 1989.
84. D.V. Lindley. Bayesian Statistics: A Review. SIAM, 1972.
85. J.S. Liu. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, 2003.

86. A. Mandelbaum. Linear estimators and measurable linear transformations on a Hilbert space. Z. Wahrsch. Verw. Gebiete, 65:385–397, 1984.
87. K. Matsuura and Y. Okabe. Selective minimum-norm solution of the biomagnetic inverse problem. IEEE Trans. Biomed. Eng., 42:608–615, 1995.
88. J.L. Melsa and D.L. Cohn. Decision and Estimation Theory. McGraw-Hill, 1978.
89. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–1091, 1953.
90. K. Mosegaard. Resolution analysis of general inverse problems through inverse Monte Carlo sampling. Inv. Probl., 14:405–426, 1998.
91. K. Mosegaard and C. Rygaard-Hjalsted. Probabilistic analysis of implicit inverse problems. Inv. Probl., 15:573–583, 1999.
92. K. Mosegaard and A. Tarantola. Monte Carlo sampling of solutions to inverse problems. J. Geophys. Res. B, 100:12431–12447, 1995.
93. C. Müller. Grundprobleme der mathematischen Theorie elektromagnetischer Schwingungen. Springer-Verlag, 1957.
94. A. Nachman. Reconstructions from boundary measurements. Annals of Math., 128:531–576, 1988.
95. A. Nachman. Global uniqueness for a two-dimensional inverse boundary value problem. Annals of Math., 143:71–96, 1996.
96. F. Natterer. The Mathematics of Computerized Tomography. Wiley, 1986.
97. E. Nummelin. General Irreducible Markov Chains and Non-Negative Operators. Cambridge University Press, 1984.
98. B. Øksendal. Stochastic Differential Equations. Springer-Verlag, 1998.
99. R.L. Parker. Geophysical Inverse Theory. Princeton University Press, 1994.
100. J.-M. Perkkiö. Radiative transfer problem on Riemannian manifolds. Master's thesis, Helsinki University of Technology, 2003.
101. S. Prince, V. Kolehmainen, J.P. Kaipio, M.A. Franceschini, D. Boas and S.R. Arridge. Time-series estimation of biological factors in optical diffusion tomography. Phys. Med. Biol., 48:1491–1504, 2003.
102. A. Pruessner and D.P. O'Leary. Blind deconvolution using a regularized structured total least norm algorithm. SIAM J. Matrix Anal. Appl., 24:1018–1037, 2003.
103. E.T. Quinto. Singularities of the X-ray transform and limited data tomography in $\mathbb{R}^2$ and $\mathbb{R}^3$. SIAM J. Math. Anal., 24:1215–1225, 1993.
104. J. Radon. Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten. Berichte über die Verhandlungen der Sächsischen Akademie der Wissenschaften, 69:262–267, 1917.
105. A.G. Ramm. Multidimensional inverse problems and completeness of the products of solutions to PDE's. J. Math. Anal. Appl., 134:211–253, 1988.
106. A.G. Ramm and A.I. Katsevich. The Radon Transform and Local Tomography. CRC Press, 1996.
107. Yu.A. Rozanov. Markov Random Fields. Springer-Verlag, 1982.
108. W. Rudin. Functional Analysis. McGraw-Hill, second edition, 1991.
109. Y. Saad and M.H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.
110. J. Sarvas. Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Phys. Med. Biol., 32:11–22, 1987.

111. D.M. Schmidt, J.S. George and C.C. Wood. Bayesian inference applied to the electromagnetic inverse problem. Human Brain Mapping, 7:195–212, 1999.
112. A. Seppänen, L. Heikkinen, T. Savolainen, E. Somersalo and J.P. Kaipio. An experimental evaluation of state estimation with fluid dynamical models in process tomography. In 3rd World Congress on Industrial Process Tomography, Banff, Canada, pp. 541–546, 2003.
113. A. Seppänen, M. Vauhkonen, E. Somersalo and J.P. Kaipio. State space models in process tomography – approximation of state noise covariance. Inv. Probl. Eng., 9:561–585, 2001.
114. A. Seppänen, M. Vauhkonen, P.J. Vauhkonen, E. Somersalo and J.P. Kaipio. Fluid dynamical models and state estimation in process tomography: Effect due to inaccuracies in flow fields. J. Elect. Imaging, 10(3):630–640, 2001.
115. A. Seppänen, M. Vauhkonen, P.J. Vauhkonen, E. Somersalo and J.P. Kaipio. State estimation with fluid dynamical evolution models in process tomography – an application to impedance tomography. Inv. Probl., 17:467–484, 2001.
116. S. Siltanen, V. Kolehmainen, S. Järvenpää, J.P. Kaipio, P. Koistinen, M. Lassas, J. Pirttilä and E. Somersalo. Statistical inversion for X-ray tomography with few radiographs I: General theory. Phys. Med. Biol., 48:1437–1463, 2003.
117. A.F.M. Smith and G.O. Roberts. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. R. Statist. Soc. B, 55:3–23, 1993.
118. K.T. Smith and F. Keinert. Mathematical foundations of computed tomography. Appl. Optics, 24:3950–3957, 1985.
119. E. Somersalo, M. Cheney and D. Isaacson. Existence and uniqueness for electrode models for electric current computed tomography. SIAM J. Appl. Math., 52:1023–1040, 1992.
120. H.W. Sorenson. Parameter Estimation: Principles and Problems. Marcel Dekker, 1980.
121. P. Stefanov. Inverse problems in transport theory. In Gunther Uhlmann, editor, Inside Out, pp. 111–132. MSRI Publications, 2003.
122. S.M. Stigler. Laplace's 1774 memoir on inverse probability. Statistical Science, 3, 1986.
123. J. Sylvester and G. Uhlmann. A global uniqueness theorem for an inverse boundary value problem. Ann. Math., 125:153–169, 1987.
124. J. Tamminen. MCMC methods for inverse problems. Geophysical Publications 49, Finnish Meteorological Institute, 1999.
125. A. Tarantola. Inverse Problem Theory. Elsevier, 1987.
126. A. Tarantola and B. Valette. Inverse problems = quest for information. J. Geophys., pp. 159–170, 1982.
127. J.R. Thompson and R.A. Tapia. Nonparametric Function Estimation, Modeling, and Simulation. SIAM, 1990.
128. L. Tierney. Markov chains for exploring posterior distributions. Ann. Statistics, 22:1701–1762, 1994.
129. A.N. Tihonov. Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics: Doklady, 4:1035–1038, 1963.
130. A.N. Tikhonov and V.Y. Arsenin. Solution of Ill-Posed Problems. Wiley, 1977.
131. K. Uutela, M. Hämäläinen and E. Somersalo. Visualization of magnetoencephalographic data using minimum current estimates. NeuroImage, 10:173–180, 1999.

132. M. Vauhkonen, J.P. Kaipio, E. Somersalo and P.A. Karjalainen. Electrical impedance tomography with basis constraints. Inv. Probl., 13:523–530, 1997.
133. M. Vauhkonen, P.A. Karjalainen and J.P. Kaipio. A Kalman filter approach applied to the tracking of fast movements of organ boundaries. In Proc. 20th Ann. Int. Conf. IEEE Eng. Med. Biol. Soc., pp. 1048–1051, Hong Kong, China, October 1998.
134. M. Vauhkonen, P.A. Karjalainen and J.P. Kaipio. A Kalman filter approach to track fast impedance changes in electrical impedance tomography. IEEE Trans. Biomed. Eng., 45:486–493, 1998.
135. M. Vauhkonen, D. Vadász, P.A. Karjalainen, E. Somersalo and J.P. Kaipio. Tikhonov regularization and prior information in electrical impedance tomography. IEEE Trans. Med. Imaging, 17:285–293, 1998.
136. P.J. Vauhkonen, M. Vauhkonen, T. Mäkinen, P.A. Karjalainen and J.P. Kaipio. Dynamic electrical impedance tomography: phantom studies. Inv. Probl. Eng., 8:495–510, 2000.
137. T. Vilhunen, L.M. Heikkinen, T. Savolainen, P.J. Vauhkonen, R. Lappalainen, J.P. Kaipio and M. Vauhkonen. Detection of faults in resistive coatings with an impedance-tomography-related approach. Meas. Sci. Technol., 13:865–872, 2002.
138. T. Vilhunen, M. Vauhkonen, V. Kolehmainen and J.P. Kaipio. A source model for diffuse optical tomography. Preprint, 2003.
139. C.R. Vogel. Computational Methods for Inverse Problems. SIAM, 2002.
140. R.L. Webber, R.A. Horton, D.A. Tyndall and J.B. Ludlow. Tuned aperture computed tomography (TACT). Theory and application for three-dimensional dento-alveolar imaging. Dentomaxillofacial Radiol., 26:53–62, 1997.
141. A. Yagola and K. Dorofeev. Sourcewise representation and a posteriori error estimates for ill-posed problems. Fields Institute Comm., 25:543–550, 2000.
142. K. Yosida. Functional Analysis. Springer-Verlag, 1965.
