
arXiv:1705.10354v1 [stat.AP] 29 May 2017

Sparsity enforcing priors in inverse problems via Normal variance mixtures: model selection, algorithms and applications

Mircea Dumitru 1

1 Laboratoire des signaux et systèmes (L2S), CNRS – CentraleSupélec – Université Paris-Sud, CentraleSupélec, Plateau de Moulon, 91192 Gif-sur-Yvette, France

Abstract

The sparse structure of the solution for an inverse problem can be modelled using different sparsity enforcing priors when the Bayesian approach is considered. Analytical expressions for the unknowns of the model can be obtained by building hierarchical models based on sparsity enforcing distributions expressed via conjugate priors. We consider heavy tailed distributions with this property: the Student-t distribution, which is expressed as a Normal scale mixture, with the mixing distribution the Inverse Gamma distribution; the Laplace distribution, which can be expressed as a Normal scale mixture, with the mixing distribution the Exponential distribution, or as a Normal inverse scale mixture, with the mixing distribution the Inverse Gamma distribution; the Hyperbolic distribution, the Variance-Gamma distribution and the Normal-Inverse Gaussian distribution, all three expressed via conjugate distributions using the Generalized Hyperbolic distribution. For all distributions, iterative algorithms are derived based on hierarchical models that account for the uncertainties of the forward model. For estimation, Maximum A Posteriori (MAP) and Posterior Mean (PM) via variational Bayesian approximation (VBA) are used. The performances of the resulting algorithms are compared in applications in 3D computed tomography (3D-CT) and chronobiology. Finally, a theoretical study is developed for the comparison between the sparsity enforcing algorithms obtained via the Bayesian approach and the sparsity enforcing algorithms issued from regularization techniques, like LASSO and some others.

Keywords: inverse problems, sparsity enforcing priors, conjugate priors, Student-t prior model (StPM), Laplace prior model (LPM), uncertainties model, estimation, variational Bayesian approximation (VBA), 3D computed tomography, chronobiology

Contents

1 Introduction

2 Sparsity enforcing priors via conjugate priors
  2.1 General Perspective: from the forward model to inversion via a Bayesian approach
  2.2 Iterative algorithms sparsity mechanism
  2.3 Student-t prior: expressed via conjugate priors: Normal and Inverse Gamma
    2.3.1 Student-t distribution via Normal variance mixture
    2.3.2 Student-t distribution via the Generalized Hyperbolic distribution
  2.4 Hyperbolic prior: expressed via conjugate priors
    2.4.1 Hyperbolic prior: via Generalized Hyperbolic distribution
  2.5 Laplace prior: expressed via conjugate priors
    2.5.1 Laplace prior: Normal and Exponential
    2.5.2 Laplace prior: Normal and Inverse Gamma
    2.5.3 Laplace prior: via the Generalized Hyperbolic distribution
  2.6 Variance-Gamma distribution: expressed via conjugate priors
    2.6.1 Variance-Gamma: via Generalized Hyperbolic distribution
  2.7 Normal-Inverse Gaussian distribution: expressed via conjugate priors
    2.7.1 Normal-Inverse Gaussian distribution: via Generalized Hyperbolic distribution

3 From uncertainties models to likelihood models
  3.1 Stationary Gaussian Uncertainties Model
  3.2 Non-stationary Gaussian Uncertainties Model
  3.3 Stationary Student-t Uncertainties Model
  3.4 Non-stationary Student-t Uncertainties Model
  3.5 Stationary Laplace Uncertainties Model
  3.6 Non-stationary Laplace Uncertainties Model

4 Hierarchical Models with Student-t prior via StPM
  4.1 Student-t hierarchical model: stationary Gaussian uncertainties model, known uncertainties variance
  4.2 Student-t hierarchical model: non-stationary Gaussian uncertainties model, known uncertainties variances
  4.3 Student-t hierarchical model: stationary Gaussian uncertainties model, unknown uncertainties variance
  4.4 Student-t hierarchical model: non-stationary Gaussian uncertainties model, unknown uncertainties variances
  4.5 Student-t hierarchical model: stationary Student-t uncertainties model, unknown uncertainties variance
  4.6 Student-t hierarchical model: non-stationary Student-t uncertainties model, unknown uncertainties variances
    4.6.1 Joint MAP estimation
    4.6.2 Posterior Mean estimation via VBA, partial separability
    4.6.3 Posterior Mean estimation via VBA, full separability
  4.7 IS: Student-t hierarchical model: non-stationary Student-t uncertainties model, unknown uncertainties variances
    4.7.1 Joint MAP estimation
    4.7.2 Posterior Mean estimation via VBA, partial separability
    4.7.3 Posterior Mean estimation via VBA, full separability
  4.8 Student-t hierarchical model: stationary Laplace uncertainties model, unknown uncertainties variance
  4.9 Student-t hierarchical model: non-stationary Laplace uncertainties model, unknown uncertainties variances

5 Hierarchical Models with Laplace prior via LPM
  5.1 Laplace hierarchical model: stationary Gaussian uncertainties model, known uncertainties variance
  5.2 Laplace hierarchical model: non-stationary Gaussian uncertainties model, known uncertainties variances
  5.3 Laplace hierarchical model: stationary Gaussian uncertainties model, unknown uncertainties variance
  5.4 Laplace hierarchical model: non-stationary Gaussian uncertainties model, unknown uncertainties variances
  5.5 Laplace hierarchical model: stationary Student-t uncertainties model, unknown uncertainties variance
  5.6 Laplace hierarchical model: non-stationary Student-t uncertainties model, unknown uncertainties variances
    5.6.1 Joint MAP estimation
    5.6.2 Posterior Mean estimation via VBA, partial separability
    5.6.3 Posterior Mean estimation via VBA, full separability
  5.7 Laplace hierarchical model: stationary Laplace uncertainties model, unknown uncertainties variance
  5.8 Laplace hierarchical model: non-stationary Laplace uncertainties model, unknown uncertainties variances
    5.8.1 Joint MAP estimation
    5.8.2 Posterior Mean estimation via VBA, partial separability
    5.8.3 Posterior Mean estimation via VBA, full separability

1 Introduction

In many applications, the prior information concerning the unknown(s) of the model, namely the classical linear forward model, Equation (1),

g = Hf + ǫ (1)

used in inverse problems, can be translated as the sparse structure of the unknown(s), i.e. f in Equation (1). In particular, the linear forward model expressed in Equation (1) corresponds to many applications such as signal deconvolution, image restoration, Computed Tomography (CT) image reconstruction, Fourier Synthesis (FS) inversion, microwave imaging [NMD94] and [FDMD07], ultrasound echography, seismic imaging, radio astronomy [KTB04], fluorescence imaging, inverse scattering [CMD97], [FDMD05], [AMD10] and [?], Eddy current non-destructive testing [NMD96] or SAR imaging [AKZ06]. In all these examples the common inverse problem is to estimate f from the observations of g. In general, inverse problems are ill-posed [Had01], since the condition number of the matrix H is very high. This means that, in practice, the data g alone are not sufficient to define a unique and satisfactory solution. The interpretation of the linear forward model, Equation (1), is presented in Figure (1). When the

[Figure 1 diagram: g = Hf + ǫ, linking the observed data g, the unknown f (a signal, an image, ...), the transformation matrix H (Radon, Fourier, ...) and ǫ, accounting for model uncertainties, measurement errors and noise.]

Figure 1: Interpretation of the forward linear model, Equation (1)
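To make the ill-conditioning concrete, the following sketch (a hypothetical smoothing matrix H built with numpy, not one of the operators cited above) shows that even a tiny perturbation of g destroys the naive solution; all numerical values are illustrative only.

```python
import numpy as np

# Hypothetical ill-conditioned forward matrix: a smooth (Gaussian) kernel,
# typical of deconvolution-like problems.
rng = np.random.default_rng(0)
N = 50
t = np.linspace(0.0, 1.0, N)
H = np.exp(-((t[:, None] - t[None, :]) ** 2) / 0.02)

f_true = np.zeros(N)
f_true[[10, 30]] = 1.0                         # a sparse unknown f
g = H @ f_true                                 # noiseless data
g_noisy = g + 1e-8 * rng.standard_normal(N)    # tiny perturbation

print("condition number of H: %.2e" % np.linalg.cond(H))
f_naive = np.linalg.solve(H, g_noisy)          # naive inversion of Equation (1)
print("relative error of naive solution: %.2e"
      % (np.linalg.norm(f_naive - f_true) / np.linalg.norm(f_true)))
```

Even with a perturbation of the order of 1e-8, the naive solution is typically dominated by amplified noise, which is why prior information such as sparsity is needed.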

Bayesian approach is considered, one way to build hierarchical models that favour a sparse solution is to consider, for the prior, distributions that are known to enforce sparsity. Such an approach gives the possibility to estimate the hyperparameters of the hierarchical model, i.e. the associated variances for f and ǫ. A typical hierarchical model associated with the forward model, Equation (1), is presented in Figure (2).

[Figure 2 diagram: the hierarchical model for direct sparsity, built on g = Hf + ǫ: hyperparameter priors p(θf | ψθf) and p(θǫ | ψθǫ), the prior p(f | θf) and the uncertainties ǫ linking f to the data g through H.]

Figure 2: From the linear forward model, Equation (1), to the Hierarchical Model: Direct sparsity, i.e. f is sparse

However, Figure (2) presents a hierarchical

model for direct sparsity, i.e. a hierarchical model that assumes the sparse structure of f. In many applications, f is not sparse but can be expressed via a transformation D applied to a sparse structure z. Evidently, when considering the transformation on the sparse structure, the uncertainties and modelling errors have to be accounted for, Equation (2):

f = Dz + ξ (2)

In this case, a more general hierarchical model is presented in Figure (3).

[Figure 3 diagram: the hierarchical model for sparsity via a transformation, built on g = Hf + ǫ and f = Dz + ξ, with hyperparameter priors for θz, θξ and θǫ.]

Figure 3: From the linear forward model, Equation (1), to the Hierarchical Model: Sparsity via a transformation, i.e. Df is sparse

When referring to the strategy

used in the Bayesian approach for searching for sparse solutions in the inverse problem context, we have used the word favouring. It is important to mention that, generally, the linear forward model, Equation (1), may have an infinite number of solutions. Using a sparsity enforcing prior to model f results in algorithms selecting sparse solutions, but this is possible only when the linear forward model, Equation (1), allows such solutions. Therefore, for this type of algorithm there is no guarantee of the sparse structure of the solution.

In this work we present three classes of sparsity enforcing priors and show how a hierarchical model can be built using these kinds of priors. We then discuss the mechanism of sparsity enforcing and present the advantages of iterative algorithms using sparsity enforcing priors that can be expressed via conjugate priors. We place this work in the context of such heavy tailed distributions that in particular can be expressed via conjugate priors.

2 Sparsity enforcing priors via conjugate priors

This section presents some sparsity enforcing priors, namely heavy tailed distributions that can be expressed via conjugate priors. First, a brief presentation of the Bayesian approach in inverse problems is given in Subsection (2.1). The sparsity mechanism and the key factors to be considered when selecting a good sparsity enforcing prior are presented in Subsection (2.2). It is shown that not only the heavy-tailed form of the distribution is of great interest for enforcing sparsity, but also that the associated variance vector of the sparse structure plays a crucial role, for which we aim for a specific behaviour: small variances associated with the zero or close to zero values and important variances for the other values. Subsection (2.3) presents the Student-t distribution, expressed via a Normal variance mixture, using as the mixing distribution the Inverse Gamma distribution. In particular, it is shown that via those two conjugate priors a two-parameter version of the Student-t distribution is obtained when no condition is imposed on the scale and shape parameters of the Inverse Gamma distribution, for which the corresponding variance can be decreased to any positive value, which is a crucial fact in the sparsity mechanism. The same Normal variance mixture is obtained if the Student-t distribution is viewed as a Generalized Hyperbolic distribution. In Subsection (2.5) the Laplace distribution is expressed via two conjugate distributions as a Normal variance mixture where the mixing distribution is the Exponential distribution. The same Normal variance mixture is obtained if the Laplace distribution is viewed as a Generalized Hyperbolic distribution. Another way to express the Laplace distribution using conjugate distributions is via a Normal inverse-variance mixture where the mixing distribution is the Inverse Gamma distribution. In Subsection (2.6) the Variance-Gamma distribution is expressed via two conjugate distributions as a Normal mean-variance mixture where the mixing distribution is the Gamma distribution. The expression is obtained using the Generalized Hyperbolic distribution. In Subsection (2.7) the Normal-Inverse Gaussian distribution is expressed via two conjugate distributions as a Normal mean-variance mixture where the mixing distribution is the Inverse Gaussian distribution.

2.1 General Perspective: from the forward model to inversion via a Bayesian approach

The strategy adopted for doing the inversion in Equation (1) (or an equivalent one) is to build a hierarchical model, accounting for the available prior information (i.e. accounting for the sparsity when the prior information concerning f is its sparse structure) and also accounting for the particularities of the errors and uncertainties of the model, modelled via ǫ, based on a Bayesian approach. Considering the linear forward model, Equation (1), the inversion is based on the fundamental relation given by the Bayes rule:

p(f | g, θǫ, θf) = p(g | f, θǫ) p(f | θf) / p(g | θǫ, θf),   θ = (θǫ, θf)   (3)

where θ represents the hyperparameters appearing in the hierarchical model (namely the variances vf and vǫ associated with the unknowns of the linear forward model, f and ǫ). The Bayes rule can be interpreted as a proportionality relation between the posterior law and the product of the prior law (the prior information, sparsity in our case) and the likelihood:

p(f | g, θǫ, θf) ∝ p(g | f, θǫ) p(f | θf)   (4)

Generally, the likelihood is obtained via the linear forward model, Equation (1), and the distribution considered for modelling the errors and the uncertainties ǫ. More details on this are given in Section (3). An extension of Equation (4) is the general Bayesian inference, where the hyperparameters θ = (θǫ, θf) from Equation (4) are considered to be unknown and are to be estimated, along with the unknowns of the forward model, Equation (1):

p(f, θǫ, θf | g) ∝ p(g | f, θǫ) p(f | θf) p(θǫ | ψθǫ) p(θf | ψθf) p(ψθǫ) p(ψθf)   (5)

For the case when the sparsity appears via a transformation, the forward linear model Equation (1) and Equation (2) are considered (Figure (3)):

p(f, θǫ, θξ, θz | g) ∝ p(g | f, θǫ) p(f | z, θξ) p(z | θz) p(θǫ | ψθǫ) p(θξ | ψθξ) p(θz | ψθz) p(ψθǫ) p(ψθξ) p(ψθz)   (5bis)

Evidently, considering the general Bayesian Inference implies assigning distributions for the hyperparameters, i.e.,

assigning p(θ | ψ). The set of distributions assigned for the prior, the likelihood and the hyperparameters represents the hierarchical model, Figure (4). The choices of the distributions are made in accordance with the application and with the available prior information. In particular, when the prior information is the sparse structure of the unknown f of the forward model, Equation (1), sparsity enforcing priors will be used. From the hierarchical model the posterior distribution is obtained via Equation (5), from which the unknowns and the hyperparameters can be estimated. Another representation of a hierarchical model built on the linear forward model, Equation (1), is presented in Figure (4). We will see that when sparsity enforcing priors that can be expressed via conjugate distributions are considered, analytical expressions can be obtained for the unknowns of the hierarchical model. This is a great advantage and the fundamental reason for the great interest in such priors.
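As a toy illustration of Equation (4), the following sketch (a scalar model g = h·f + ǫ with made-up values, not taken from the paper) evaluates the unnormalized posterior of f on a grid by combining a Gaussian likelihood with a sparsity enforcing Laplace prior, and extracts the two estimators used later, the MAP and the Posterior Mean.

```python
import numpy as np

# Toy scalar version of Equation (4): p(f | g) is proportional to p(g | f) p(f).
# h, v_eps and b are illustrative values only.
h, v_eps, b = 1.0, 0.5, 0.3
g = 0.2                                                  # one observed datum

f_grid = np.linspace(-3.0, 3.0, 1001)
df = f_grid[1] - f_grid[0]
log_likelihood = -0.5 * (g - h * f_grid) ** 2 / v_eps    # Gaussian, up to a constant
log_prior = -np.abs(f_grid) / b                          # Laplace, up to a constant

log_post = log_likelihood + log_prior
post = np.exp(log_post - log_post.max())                 # unnormalized posterior
post /= post.sum() * df                                  # normalize on the grid

print("MAP estimate:", f_grid[np.argmax(post)])
print("PM  estimate: %.4f" % np.sum(f_grid * post * df))
```

The sparsity enforcing prior pulls both estimates toward zero compared with the likelihood maximum at g/h.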

2.2 Iterative algorithms sparsity mechanism

In this type of approach, the mechanism of sparsity is based not only on the heavy tailed property of the prior distribution (or its property to induce sparsity) but also on a particular behaviour of the associated variances. In such an approach a bivariate prior is set for the unknown of the model that needs to be estimated and for the corresponding variance. The resulting algorithm is an iterative one, updating at every iteration both the unknown of the model and the corresponding variance. In order to obtain a sparse solution for the unknowns, the structure of the variances must be sparse itself. In particular, the variances associated with the zero or close to zero points of the unknown of the model must be small, and the variances associated with the non-zero elements of the sparse unknown of the model must be significant.

[Figure 4 diagram: the sparsity prior p(f), the hyperparameter priors p(θ) and the likelihood p(g | f), obtained from p(ǫ), are the probability density functions composing the hierarchical model.]

Figure 4: Hierarchical Model: probability density functions assigned for the unknowns

[Figure 5 diagram: f sparse ↔ vf sparse; the sparsity enforcing prior on f is coupled with the distribution modelling the variance vf.]

Figure 5: Sparsity Mechanism: for direct sparsity applications

Therefore, the parameters of the distribution modelling the variance (for example the shape and scale parameters of the Inverse Gamma distribution appearing in the conjugate prior models for sparsity enforcing via Student-t or Laplace) must be chosen such that the variance vector vf is sparse, i.e. the expected value of its elements is close to zero, E[v_fj] ↘ 0. Furthermore, in a Bayesian approach, we may have prior knowledge of the numerical value associated with the variance of the distribution modelling v_fj, i.e. Var[v_fj] = w, where w is the numerical value obtained via prior knowledge. Evidently, the behaviour of the marginal, i.e. the sparsity enforcing prior distribution, depends on the parameters of the distribution modelling the variance. We note that in order to select a prior model that enforces sparsity, those parameters must be chosen such that the prior distribution expressed via conjugate prior distributions, modelling the unknown of the linear forward model, Equation (1), is concentrated around zero, i.e. its variance is very small, Var_prior[f_j] ↘ 0.
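The required behaviour of the variances can be checked directly on the Inverse Gamma distribution used in the next subsection: for v ~ IG(α, β) one has E[v] = β/(α − 1) (for α > 1) and Var[v] = β²/((α − 1)²(α − 2)) (for α > 2), so driving β toward zero drives E[v] toward zero. A minimal sketch with illustrative parameter values:

```python
from scipy.stats import invgamma

# E[v] and Var[v] for v ~ IG(alpha, beta): small beta concentrates the
# variance prior near zero, which is the sparsity enforcing regime.
for alpha, beta in [(3.0, 1.0), (3.0, 0.1), (3.0, 0.001)]:
    v = invgamma(a=alpha, scale=beta)
    print(f"alpha={alpha}, beta={beta}:  E[v]={v.mean():.4g}  Var[v]={v.var():.4g}")
```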

2.3 Student-t prior: expressed via conjugate priors: Normal and Inverse Gamma

In the following, the Student-t distribution is discussed. It is a sparsity enforcing prior because of its heavy-tailed form. It can be expressed via conjugate priors as the marginal of a Normal variance mixture, with the mixing distribution an Inverse Gamma distribution. The standard form, with one parameter ν representing the degrees of freedom, is obtained when the shape and scale parameters corresponding to the Inverse Gamma distribution are considered equal, α = β = ν/2. When this equality is not imposed, a two-parameter Student-t distribution is obtained, which is of great importance in the context of sparsity enforcing since in this case the corresponding variance (which generally needs to be small) can take any positive value. In probability and statistics, Student's t-distribution (or simply the t-distribution) is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. It was developed by William Sealy Gosset under the pseudonym Student. Whereas a Normal distribution describes a full population, t-distributions describe samples drawn from a full population; accordingly, the t-distribution for each sample size is different, and the larger the sample, the more the distribution resembles a Normal distribution. The t-distribution plays a role in a number of widely used statistical analyses, including Student's t-test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and in linear regression analysis. The Student's t-distribution also arises in the Bayesian analysis of data from a normal family.

A random variable has a Student-t St(x | ν) distribution if its probability density function is:

St(x | ν) = (Γ((ν+1)/2) / (√(πν) Γ(ν/2))) (1 + x²/ν)^(−(ν+1)/2),   ν ∈ R+, Γ representing the Gamma function;   (6)

The comparison between the standard Normal distribution N(x | 0, 1) and the standard Student-t distribution St(x | 0, 1) is presented in Figure (6).

[Figure 6 plots: (a) Normal vs. Student-t distribution; (b) heavy tailed property of the Student-t distribution.]

Figure 6: Comparison between the Normal distribution N(x | 0, 1) and the Student-t distribution St(x | 0, 1).

2.3.1 Student-t distribution via Normal variance mixture

For the linear model expressed in Equation (1), the Student-t distribution can be used as a prior via the Student-t Prior Model (StPM), Equation (7), which considers a zero-mean Normal distribution for f_j | v_fj, with the variance v_fj | α, β modelled as an Inverse Gamma distribution, with the corresponding shape and scale parameters α and β:

StPM:
p(f_j | 0, v_fj) = N(f_j | 0, v_fj) = (2π)^(−1/2) v_fj^(−1/2) exp{ −f_j² / (2 v_fj) }
p(v_fj | α, β) = IG(v_fj | α, β) = (β^α / Γ(α)) v_fj^(−α−1) exp{ −β / v_fj }   (7)

The expression of the joint probability distribution f_j, v_fj | α, β is a bivariate Normal-Inverse Gamma distribution, Equation (8),

p(f_j, v_fj | α, β) = N(f_j | 0, v_fj) IG(v_fj | α, β),   (8)

and the marginal p(f_j | α, β) is, Equation (9):

p(f_j | α, β) = (2π)^(−1/2) (β^α / Γ(α)) ∫ v_fj^(−1/2) exp{ −f_j²/(2 v_fj) } v_fj^(−α−1) exp{ −β/v_fj } dv_fj.   (9)

In Equation (9) an IG distribution can be identified inside the integral,

I = ∫ v_fj^(−(α+1/2)−1) exp{ −(β + f_j²/2) / v_fj } dv_fj = (Γ(α + 1/2) / (β + f_j²/2)^(α+1/2)) ∫ IG(v_fj | α + 1/2, β + f_j²/2) dv_fj,   (10)

where the last integral equals 1, so the marginal from Equation (9) is a two-parameter Student-t distribution:

p(f_j | α, β) = (2π)^(−1/2) (β^α / Γ(α)) Γ(α + 1/2) (β + f_j²/2)^(−(α+1/2)) = (2πβ)^(−1/2) (Γ(α + 1/2) / Γ(α)) (1 + f_j²/(2β))^(−(α+1/2))   (11)

If α = β = ν/2, from Equation (11) we can conclude that the marginal is the Student-t distribution, p(f_j | α = ν/2, β = ν/2) = St(f_j | ν):

p(f_j | α = ν/2, β = ν/2) = (Γ((ν+1)/2) / (√(πν) Γ(ν/2))) (1 + f_j²/ν)^(−(ν+1)/2) = St(f_j | ν)   (12)

Using the Student-t Prior Model (StPM), Equation (7), the Student-t distribution can be used as a sparsity enforcing prior for the forward linear model, Equation (1). The interest of expressing the prior distribution via prior models using conjugate priors is the possibility to compute analytical expressions for the unknown estimates. Section (4) presents the developments of hierarchical models based on the Student-t prior distribution for the linear forward models of Figure (2) and Figure (3).
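The marginalization of Equations (9)-(12) is easy to verify by simulation: drawing v_fj from the Inverse Gamma distribution and then f_j from the zero-mean Normal with that variance must reproduce the Student-t density. A minimal sketch, with an illustrative ν:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
nu = 3.0                                  # illustrative degrees of freedom
alpha = beta = nu / 2.0                   # StPM with alpha = beta = nu/2, Equation (12)

v = stats.invgamma(a=alpha, scale=beta).rvs(200_000, random_state=rng)
f = rng.normal(0.0, np.sqrt(v))           # f_j | v_fj ~ N(0, v_fj)

# Compare the empirical mixture density with the Student-t density.
x = np.array([0.0, 1.0, 2.0, 4.0])
emp = np.array([np.mean(np.abs(f - xi) < 0.05) / 0.1 for xi in x])
print("empirical :", np.round(emp, 3))
print("Student-t :", np.round(stats.t(df=nu).pdf(x), 3))
```

The two rows agree up to Monte Carlo error, confirming that the Normal-Inverse Gamma pair is an exact conjugate representation of the Student-t prior.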

2.3.2 Student-t distribution via the Generalized Hyperbolic distribution

In the following, a short introduction to the Generalized Hyperbolic distribution is given. The interest of this distribution is its heavy tailed form for different parameters and the fact that it can be expressed via conjugate priors, namely the Normal distribution and the generalized Inverse Gaussian distribution. The Generalized Hyperbolic distribution (GH), introduced by Ole Barndorff-Nielsen, is a continuous probability distribution defined as the Normal variance-mean mixture where the mixing distribution is the generalized Inverse Gaussian (GIG) distribution. Its probability density function is given in terms of the modified Bessel function of the second kind, K_λ. A random variable has a Generalized Hyperbolic GH(x | λ, α, β, δ, µ) distribution if its probability density function is:

GH(x | λ, α, β, δ, µ) = ((γ/δ)^λ / (√(2π) K_λ(δγ))) K_(λ−1/2)(α√(δ² + (x−µ)²)) / (√(δ² + (x−µ)²)/α)^(1/2−λ) exp{ β(x−µ) },   λ, α, β, δ, µ ∈ R, γ = √(α² − β²);   (13)

For the linear model expressed in Equation (1), the Generalized Hyperbolic Prior Model (GHPM), Equation (14),

considers a Normal distribution for f_j | v_fj, with mean µ + βv_fj and variance v_fj. The variance v_fj | γ², δ², λ is modelled as a generalized Inverse Gaussian distribution, with the corresponding parameters γ², δ² and λ:

GHPM:
p(f_j | µ, v_fj, β) = N(f_j | µ + βv_fj, v_fj) = (2π)^(−1/2) v_fj^(−1/2) exp{ −(f_j − µ − βv_fj)² / (2 v_fj) }
p(v_fj | γ², δ², λ) = GIG(v_fj | γ², δ², λ) = ((γ/δ)^λ / (2 K_λ(δγ))) v_fj^(λ−1) exp{ −(1/2)(γ² v_fj + δ² v_fj^(−1)) }   (14)

The expression of the joint f_j, v_fj | µ, β, γ², δ², λ is given by Equation (15),

p(f_j, v_fj | µ, β, γ², δ², λ) = N(f_j | µ + βv_fj, v_fj) GIG(v_fj | γ², δ², λ),   (15)

so the marginal is, Equation (16):

p(f_j | µ, β, γ², δ², λ) = ((γ/δ)^λ / (2√(2π) K_λ(δγ))) ∫ v_fj^(λ−3/2) exp{ −(1/2)( (f_j − µ − βv_fj)²/v_fj + γ² v_fj + δ² v_fj^(−1) ) } dv_fj   (16)

For solving Equation (16) we can identify a GIG distribution inside the integral:

I = exp{ β(f_j − µ) } ∫ v_fj^((λ−1/2)−1) exp{ −(1/2)[ (β² + γ²) v_fj + (δ² + (f_j − µ)²) v_fj^(−1) ] } dv_fj
  = exp{ β(f_j − µ) } 2 K_(λ−1/2)(√((β² + γ²)(δ² + (f_j − µ)²))) / ((β² + γ²)/(δ² + (f_j − µ)²))^((λ−1/2)/2) ∫ GIG(v_fj | λ − 1/2, β² + γ², δ² + (f_j − µ)²) dv_fj,   (17)

where the last integral equals 1. Introducing the notations

α² = β² + γ²;   q(f_j)² = δ² + (f_j − µ)²,   (18)

the marginal from Equation (16) can be written as:

p(f_j | µ, β, γ², δ², λ) = ((γ/δ)^λ / (√(2π) K_λ(δγ))) K_(λ−1/2)(α q(f_j)) / (q(f_j)/α)^(1/2−λ) exp{ β(f_j − µ) }   (19)

Introducing the parameter α in place of γ, considering δ instead of δ² and reordering the parameters, we obtain:

p(f_j | λ, α, β, δ, µ) = ((γ/δ)^λ / (√(2π) K_λ(δγ))) K_(λ−1/2)(α q(f_j)) / (q(f_j)/α)^(1/2−λ) exp{ β(f_j − µ) },   γ = √(α² − β²)   (20)

The interest of this distribution is that it has a very general form, being the superclass of, among others, the Student-t

(St) distribution for the particular set of parameters (λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ), the Hyperbolic (H) distribution for the particular set of parameters GH(x | λ = 1, α, β, δ, µ), Subsection (2.4.1), the Laplace (L) distribution for the particular set of parameters GH(x | λ = 1, α = b^(−1), β = 0, δ ↘ 0, µ), Subsection (2.5.3), the Variance-Gamma (VG) distribution for the particular set of parameters GH(x | λ, α, β, δ ↘ 0, µ), Subsection (2.6.1), and the Normal-Inverse Gaussian (NIG) distribution GH(x | λ = −1/2, α, β, δ, µ), Subsection (2.7.1). In the following we consider the Student-t distribution expressed via the Generalized Hyperbolic distribution:

X ∼ GH(x | λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ) has a Student's t-distribution with ν degrees of freedom, St(x | µ, ν).

Fixing the asymmetry parameter β = 0, as an immediate consequence γ = α and exp{ β(x − µ) } = 1. The particular case of the probability density function is:

GH(x | λ, α, β = 0, δ, µ) = ((α/δ)^λ / (√(2π) K_λ(αδ))) K_(λ−1/2)(αδ √(1 + (x−µ)²/δ²)) / (δ √(1 + (x−µ)²/δ²) / α)^(1/2−λ).   (21)

Fixing δ² = ν and considering α ↘ 0, the particular case of the probability density function is:

GH(x | λ, α ↘ 0, β = 0, δ = √ν, µ) = (1/√(2π)) (α/√ν)^(1/2) (1 + (x−µ)²/ν)^((λ−1/2)/2) K_(λ−1/2)(α√ν √(1 + (x−µ)²/ν)) / K_λ(α√ν).   (22)

Since we considered α ↘ 0, both arguments of the modified Bessel functions of the second kind appearing in Equation (22), α√ν √(1 + (x−µ)²/ν) and α√ν, tend to 0, so for computing the values of the two modified Bessel functions of the second kind we can use the asymptotic relations for small arguments:

K_λ(x) ∼ Γ(λ) 2^(λ−1) x^(−λ), as x ↘ 0 and λ > 0,
K_λ(x) ∼ Γ(−λ) 2^(−λ−1) x^λ, as x ↘ 0 and λ < 0.   (23)

We have (using the second relation, since both orders are negative here):

K_(λ−1/2)(α√ν √(1 + (x−µ)²/ν)) = Γ(1/2 − λ) 2^(−λ−1/2) (α√ν)^(λ−1/2) (1 + (x−µ)²/ν)^((λ−1/2)/2)
K_λ(α√ν) = Γ(−λ) 2^(−λ−1) (α√ν)^λ   (24)

Using Equation (24) in Equation (22), the particular case of the probability density function becomes:

GH(x | λ, α ↘ 0, β = 0, δ = √ν, µ) = (1/√(νπ)) (Γ(1/2 − λ) / Γ(−λ)) (1 + (x−µ)²/ν)^(λ−1/2)   (25)

Finally, fixing λ = −ν/2, Equation (25) becomes:

GH(x | λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ) = (Γ((ν+1)/2) / (√(νπ) Γ(ν/2))) (1 + (x−µ)²/ν)^(−(ν+1)/2) = St(x | µ, ν).   (26)

[Figure 7 plots: (a) St(x | µ, ν); (b) GH(x | λ = −ν/2, α = 0.001, β = 0, δ = √ν, µ); (c) St vs. GH; (d) −log St vs. −log GH.]

Figure 7: Four Student-t probability density functions with different means (µ = 0, blue and red, µ = 1, yellow, and µ = −1, violet) and different degrees of freedom (ν = 1, blue and yellow, ν = 1.5, red, and ν = 0.5, violet), Figure (7a), and the corresponding Generalized Hyperbolic probability density functions, with parameters set as in Equation (26), with α = 0.001, Figure (7b). Comparison between the distributions, St vs. GH, Figure (7c), and between the logarithms of the distributions, −log St vs. −log GH, Figure (7d).

Figure (7a) presents four Student-t probability density functions with different means (µ = 0, blue and red, µ = 1, yellow, and µ = −1, violet) and different degrees of freedom (ν = 1, blue and yellow, ν = 1.5, red, and ν = 0.5, violet). Figure (7b) presents the Generalized Hyperbolic probability density function for parameters GH(x | λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ) with ν and µ set to the same numerical values as the corresponding ones from Figure (7a). For α, the numerical value is 0.001. Figure (7c) presents the comparison between the four Student-t probability density functions and the corresponding Generalized Hyperbolic probability density functions, St vs. GH. In all four cases the probability density functions are superposed. Figure (7d) presents the comparison between the logarithms of the distributions, −log St vs. −log GH. Figure (8c) presents the comparison between the standard Student-t probability density function, St(x | µ = 0, ν = 1) (reported in Figure (8a)), and the Generalized

Hyperbolic density function for parameters GH(x | λ = −ν/2 = −1/2, α ↘ 0, β = 0, δ = √ν = 1, µ = 0) (reported in Figure (8b)), showing that the two probability density functions are almost superposed. Figure (8d) presents the comparison between the logarithms of the two distributions, −log St vs. −log GH. In this case, the numerical value for α is 0.01. The behaviour of the Generalized Hyperbolic density function GH(x | λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ = 0) depending on α is presented in Figure (9a): a comparison between the standard Student-t probability density function St(x | µ = 0, ν = 1) (in blue) and the Generalized Hyperbolic density function GH(x | λ = −1/2, α ↘ 0, β = 0, δ = 1, µ = 0) for α = 1 (in red), α = 0.1 (in yellow), α = 0.01 (in violet) and α = 0.001 (in green). The difference between those four Generalized Hyperbolic density functions and the Student-t density function is presented in Figure (9b).

[Figure 8 plots: (a) St(x | µ = 0, ν = 1); (b) GH(x | λ = −ν/2, α = 0.01, β = 0, δ = √ν, µ = 0); (c) St vs. GH; (d) −log St vs. −log GH.]

Figure 8: The standard Student-t distribution (ν = 1), Figure (8a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (26), Figure (8b). Comparison between the two distributions, St vs. GH, Figure (8c), and between the logarithms of the distributions, −log St vs. −log GH, Figure (8d).

[Figure 9 plots: (a) St vs. GH for α ∈ {1, 0.1, 0.01, 0.001}; (b) GH − St for α ∈ {1, 0.1, 0.01, 0.001}.]

Figure 9: The behaviour of the Generalized Hyperbolic density function GH(x | λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ = 0) depending on α.

Considering the Generalized Hyperbolic Prior Model (GHPM), Equation (14), for the structure of the parameters corresponding to the Student-t distribution, λ = −ν/2, α ↘ 0, β = 0, δ = √ν, µ, we obtain a Normal variance mixture (since β = 0) with the mixing distribution the Inverse Gamma distribution (since GIG(v_fj | γ² ↘ 0, ν, −ν/2) = IG(v_fj | ν/2, ν/2)). Indeed, if α ↘ 0, β = 0 and

γ = √(α² − β²), we obtain γ ↘ 0, so p(v_fj | γ², δ², λ) = p(v_fj | γ² ↘ 0, ν, −ν/2) = GIG(v_fj | γ² ↘ 0, ν, −ν/2). Since γ² ↘ 0, for the modified Bessel function of the second kind appearing in the expression of the generalized Inverse Gaussian distribution the asymptotic relation for small arguments can be used, K_λ(δγ) ∼ Γ(−λ) 2^(−λ−1) (δγ)^λ, so the generalized Inverse Gaussian can be written as an Inverse Gamma distribution with equal shape and scale parameters:

GIG(v_fj | γ² ↘ 0, ν, −ν/2) ∼ ((ν/2)^(ν/2) / Γ(ν/2)) v_fj^(−ν/2−1) exp{ −(ν/2) v_fj^(−1) } = IG(v_fj | ν/2, ν/2).

We obtain the same conjugate prior model for the Student-t distribution, the Normal variance mixture with the mixing distribution an Inverse Gamma distribution. If the condition δ = √ν is not imposed in the Generalized Hyperbolic distribution, the two-parameter Student-t distribution form is obtained.

2.4 Hyperbolic prior: expressed via conjugate priors

The interest is given by the fact that the Hyperbolic distribution is a heavy-tailed distribution and can therefore be successfully used as a sparsity enforcing prior. The Hyperbolic distribution is a continuous probability distribution for which the logarithm of the probability density function is a hyperbola; therefore the distribution decreases exponentially, which is more slowly than the Normal distribution. It is suitable to model phenomena where numerically large values are more probable than is the case for the Normal distribution. The origin of the distribution is the observation by Ralph Alger Bagnold that the logarithm of the histogram of the empirical size distribution of sand deposits tends to form a hyperbola. The Hyperbolic distribution has the following probability density function:

H(x | α, β, δ, µ) = (γ / (2αδ K_1(δγ))) exp{ −α√(δ² + (x−µ)²) + β(x−µ) },   γ = √(α² − β²)   (27)

2.4.1 Hyperbolic prior: via Generalized Hyperbolic distribution

X ∼ GH(x | λ = 1, α, β, δ, µ) has a Hyperbolic distribution, H(x | α, β, δ, µ).

The goal of this section is to derive the Hyperbolic distribution from the Generalized Hyperbolic distribution. For the particular case of the Generalized Hyperbolic distribution with λ = 1, the probability density function is:

GH(x | λ = 1, α, β, δ, µ) = ((γ/δ) / (√(2π) K_1(δγ))) (√(δ² + (x−µ)²) / α)^(1/2) K_(1/2)(α √(δ² + (x−µ)²)) exp{ β(x−µ) },   (28)

For λ = 1/2, the modified Bessel function of the second kind K_λ(x) can be stated explicitly:

K_(1/2)(x) = K_(−1/2)(x) = √(π/(2x)) exp{ −x },   x > 0,   (29)

so:

K_(1/2)(α √(δ² + (x−µ)²)) = √( π / (2α √(δ² + (x−µ)²)) ) exp{ −α √(δ² + (x−µ)²) }.   (30)

Plugging Equation (30) in Equation (28):

GH(x | λ = 1, α, β, δ, µ) = (γ / (2αδ K_1(δγ))) exp{ −α √(δ² + (x−µ)²) + β(x−µ) } = H(x | α, β, δ, µ).   (31)

Figure (10a) presents four Hyperbolic probability density functions H(x | α, β, δ, µ) with different parameters. Figure (10b) presents the corresponding Generalized Hyperbolic density functions for parameters GH(x | λ = 1, α, β, δ, µ). The Hyperbolic probability density functions and the corresponding Generalized Hyperbolic density functions are superposed, Figure (10c). Figure (10d) presents the comparison between the logarithms of the two distributions, −log H vs. −log GH.

[Figure 10 plots: (a) H(x | α, β, δ, µ); (b) GH(x | λ = 1, α, β, δ, µ); (c) H vs. GH; (d) −log H vs. −log GH.]

Figure 10: The Hyperbolic distribution, Figure (10a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (31), Figure (10b). Comparison between the two distributions, H vs. GH, Figure (10c), and between the logarithms of the distributions, −log H vs. −log GH, Figure (10d).

The Hyperbolic distribution can be expressed using the Generalized Hyperbolic Prior Model (GHPM),

Equation (14), by fixing λ = 1:

HypPM:
p(f_j | µ, v_fj, β) = N(f_j | µ + βv_fj, v_fj) = (2π)^(−1/2) v_fj^(−1/2) exp{ −(f_j − µ − βv_fj)² / (2 v_fj) }
p(v_fj | γ², δ², λ = 1) = GIG(v_fj | γ², δ², λ = 1) = ((γ/δ) / (2 K_1(δγ))) exp{ −(1/2)(γ² v_fj + δ² v_fj^(−1)) }   (32)
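The explicit form of K_(1/2), Equation (29), which drives this special case and the Normal-Inverse Gaussian one below, can be confirmed against scipy's implementation:

```python
import numpy as np
from scipy.special import kv

# Equation (29): K_{1/2}(x) = K_{-1/2}(x) = sqrt(pi / (2x)) * exp(-x), x > 0.
x = np.linspace(0.1, 10.0, 5)
closed_form = np.sqrt(np.pi / (2.0 * x)) * np.exp(-x)
print(np.allclose(kv(0.5, x), closed_form))    # True
print(np.allclose(kv(-0.5, x), closed_form))   # True
```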

2.5 Laplace prior: expressed via conjugate priors

In the following, we present how the Laplace distribution, a sparsity enforcing prior because of its heavy-tailed form, can be expressed via conjugate priors, namely the Normal distribution and the Exponential distribution. For this part of the work, the following resource was used:

[2] K. E. Themelis, A. A. Rontogiannis, A variational Bayes framework for sparse adaptive estimation, arXiv:1401.2771v1 [stat.ML], 13 Jan 2014.

In probability theory and statistics, the Laplace distribution is a continuous probability distribution named after Pierre-Simon Laplace. It is also sometimes called the double exponential distribution, because it can be thought of as two exponential distributions (with an additional location parameter) spliced together back-to-back, although the term 'double exponential distribution' is also sometimes used to refer to the Gumbel distribution. The difference between two independent identically distributed exponential random variables is governed by a Laplace distribution, as is a Brownian motion evaluated at an exponentially distributed random time. Increments of Laplace motion or a variance gamma process evaluated over the time scale also have a Laplace distribution.

A random variable has a Laplace(µ,b) distribution if its probability density function is:

p(x | µ, b) = (1/(2b)) exp{ −|x − µ|/b } = (1/(2b)) { exp{ −(µ−x)/b }, x < µ;  exp{ −(x−µ)/b }, x ≥ µ },   b > 0;   (33)

Figure (11) presents the comparison between the standard Normal distribution and the standard Laplace distribution.

[Figure 11 plots: (a) Normal vs. Laplace distribution; (b) heavy tailed property of the Laplace distribution.]

Figure 11: Comparison between the standard Normal distribution and the Laplace distribution.

We present two ways to obtain the Laplace distribution via conjugate priors. The first possibility is to consider a zero-mean Normal distribution for which the variance is modelled by an Exponential distribution. The second possibility considers an Inverse Gamma distribution for the inverse of the variance, with the shape parameter set to 1; a numerical check of both constructions is sketched below.
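λ and b below are illustrative values, and the reference Laplace scales (2λ)^(−1/2) and b^(−1/2) anticipate the results of Equations (42) and (51).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, lam, b = 200_000, 2.0, 4.0

# (i) Normal variance mixture, Exponential mixing (Subsection 2.5.1):
v1 = rng.exponential(scale=1.0 / lam, size=n)      # v ~ E(lambda)
f1 = rng.normal(0.0, np.sqrt(v1))                  # f | v ~ N(0, v)

# (ii) Normal on the inverse variance, Inverse Gamma mixing (Subsection 2.5.2):
v2 = stats.invgamma(a=1.0, scale=b / 2.0).rvs(n, random_state=rng)
f2 = rng.normal(0.0, np.sqrt(1.0 / v2))            # f | v ~ N(0, 1/v)

q = [0.75, 0.90, 0.99]
print(np.round(np.quantile(f1, q), 3), np.round(stats.laplace(scale=(2 * lam) ** -0.5).ppf(q), 3))
print(np.round(np.quantile(f2, q), 3), np.round(stats.laplace(scale=b ** -0.5).ppf(q), 3))
```

In both cases the empirical quantiles match the Laplace quantiles up to Monte Carlo error.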

2.5.1 Laplace prior: Normal and Exponential

We start with the linear model expressed in Equation (1). We consider the Laplace Prior Model (LPM), Equation (34), which considers a zero-mean Normal distribution for f_j | v_fj, while the variance v_fj | λ is modelled as an Exponential distribution, with the corresponding parameter λ:

LPM:
p(f_j | 0, v_fj) = N(f_j | 0, v_fj) = (2π)^(−1/2) v_fj^(−1/2) exp{ −f_j² / (2 v_fj) }
p(v_fj | λ) = E(v_fj | λ) = λ exp{ −λ v_fj }   (34)

The expression of the joint probability distribution f_j, v_fj | λ is given by Equation (35),

p(f_j, v_fj | λ) = N(f_j | 0, v_fj) E(v_fj | λ),   (35)

so the marginal is, Equation (36):

p(f_j | λ) = (2π)^(−1/2) λ ∫ v_fj^(−1/2) exp{ −(1/2)( 2λ v_fj + f_j²/v_fj ) } dv_fj   (36)

c (a/b) 2 1 (x a,b,c)= xc−1 exp ax + bx−1 , a,b 0, c R, (37) GIG | √ −2 ≥ ∈ 2 c( ab)   K  and c( ) is the modified Bessel function of the second kind. So, for solving Equation (36) we can identify a K ◦ GIG inside the integral:

− 1 1 1 f 2 2λ 4 1 − 2 j 2 2 I = v exp 2λvf + dvf =2 1 2λf j vf 2λ, f j , dvf f j j j 2 j j −2 vf K 2 f j GIG | 2 Z   j  q    Z   1 (38) | {z } So, the marginal from Equation (36) can be written as:

− 1 − 1 1 1 1 3 1 2 4 2 2 4 − 2 4 2 2 p(f j vf j ,b)=(2π) λ (2λ) f j 2 1 2λf j =2 π λ f j 1 2λf j (39) | K 2 K 2 q  q  The following equality stands:

− π 1 1 2 4 − 2 1 2λf j = (2λ) f j 2 exp 2λf j (40) K 2 2 − q  r  q  From Equation (39) and Equation (40) the marginal from Equation (36) becomes:

−1 −1 1 − 1 3 1 1 − 1 − 1 2 − 1 1 2 p(f j vf , λ)=2 4 π 2 λ 4 f j 2 π 2 2 2 2 4 λ 4 f j 2 exp 2λf j =2 2 λ 2 exp 2λf j (41) | j − −  q   q  − 1 So, we conclude that the marginal is a Laplace distribution, p(f j vf , λ)= f j 0 (2λ) 2 : | j L |   1 f j − 1 p(f j vf , λ)= exp | | = f j 0 (2λ) 2 (42) j − 1 − 1 | 2(2λ) 2 −(2λ) 2 L |     2.5.2 Laplace prior: Normal and Inverse Gamma

We start with the linear model expressed in Equation (1). We consider the Laplace Prior Model (LPM), Equation (43), which considers a zero-mean Normal distribution for f_j with variance v_fj^(−1), the inverse of v_fj. The variance v_fj | b is modelled as an Inverse Gamma distribution, with the corresponding shape parameter α equal to 1:

LPM:
p(f_j | 0, v_fj^(−1)) = N(f_j | 0, v_fj^(−1)) = (2π)^(−1/2) v_fj^(1/2) exp{ −(v_fj/2) f_j² }
p(v_fj | b) = IG(v_fj | 1, b/2) = (b/2) v_fj^(−2) exp{ −b/(2 v_fj) }   (43)

The expression of the joint probability distribution f_j, v_fj | b is given by Equation (44),

p(f_j, v_fj | b) = N(f_j | 0, v_fj^(−1)) IG(v_fj | 1, b/2),   (44)

so the marginal is, Equation (45):

p(f_j | b) = (2π)^(−1/2) (b/2) ∫ v_fj^(−3/2) exp{ −(1/2)( f_j² v_fj + b v_fj^(−1) ) } dv_fj   (45)

c (a/b) 2 1 (x a,b,c)= xc−1 exp ax + bx−1 , a,b 0, c R, (46) GIG | √ −2 ≥ ∈ 2 c( ab)   K  and c( ) is the modified Bessel function of the second kind. So, for solving Equation (45) we can identify a K ◦ GIG inside the integral:

2 − 1 −1 1 2 − 1 ( bf j ) 1 I = v 2 exp f 2v + bv−1 dv = K 2 v f 2, b, dv (47) f j j f j f j f j 1 f j j f j 2 − −2 f j p 4 GIG | −2 Z    b Z   1   So, the marginal from Equation (45) can be written as: | {z }

1 2 4 − 1 b − 1 −1 f j − 1 3 1 2 2 2 2 4 2 2 p(f j vf j ,b)=(2π) vf 2 − 1 bf j = (2π) b f j − 1 bf j (48) | 2 j b K 2 K 2   q  q  The following equality stands:

π −1 1 2 4 − 2 2 − 1 bf j = b f j exp bf j (49) K 2 2 − q  r  q  From Equation (48) and Equation (49) the marginal from Equation (45) becomes:

− 1 − 1 3 1 1 − 1 − 1 − 1 2 1 2 p(f j vf ,b)=2 2 π 2 b 4 f j 2 π 2 2 2 b 4 f j 2 exp bf j = √b exp bf j (50) | j − 2 −  q   q  − 1 So, we conclude that the marginal is a Laplace distribution, p(f j vf ,b)= f j 0, b 2 : | j L |   1 f j − 1 p(f j vf ,b)= exp | | = f j 0, b 2 (51) j − 1 − 1 | 2b 2 −b 2 L |     2.5.3 Laplace prior: via the Generalized Hyperbolic distribution

X ∼ GH(x | λ = 1, α = b^(−1), β = 0, δ ↘ 0, µ) has a Laplace distribution, L(x | µ, b).

The goal of this section is to derive the Laplace distribution from the Generalized Hyperbolic distribution. The Laplace distribution has the following probability density function:

L(x | µ, b) = (1/(2b)) exp{ −|x − µ|/b }   (52)

Considering λ = 1 in the expression of the Generalized Hyperbolic probability density function, a Hyperbolic distribution is obtained, Subsection (2.4.1). Fixing β = 0 implies γ = α, and the expression of the probability density function becomes:

GH(x | λ = 1, α, β = 0, δ, µ) = H(x | α, β = 0, δ, µ) = (1 / (2δ K_1(δα))) exp{ −α√(δ² + (x−µ)²) }   (53)

Further, considering δ ↘ 0, for the expression of the modified Bessel function of the second kind K_1(δα) the asymptotic relation for small arguments presented in Equation (23) can be used, obtaining:

K_1(δα) ∼ α^(−1) δ^(−1), for δ ↘ 0.   (54)

Using Equation (54) in Equation (53), the expression of the probability density function becomes:

GH(x | λ = 1, α, β = 0, δ ↘ 0, µ) = H(x | α, β = 0, δ ↘ 0, µ) = (1 / (2α^(−1))) exp{ −|x − µ| / α^(−1) }.   (55)

Finally, for α = b^(−1), the standard Laplace probability density function is obtained:

GH(x | λ = 1, α = b^(−1), β = 0, δ ↘ 0, µ) = H(x | α = b^(−1), β = 0, δ ↘ 0, µ) = (1/(2b)) exp{ −|x − µ|/b } = L(x | µ, b)   (56)

Figure (12a) presents four Laplace probability density functions L(x | µ, b) with different mean values (µ = 0, blue and red, µ = 1, yellow, and µ = −1, violet) and different scale parameter values (b = 1, blue and violet, b = 2, red, and b = 0.5, yellow). Figure (12b) presents the Generalized Hyperbolic probability density function for parameters GH(x | λ = 1, α = b^(−1), β = 0, δ ↘ 0, µ) with µ and b set to the same numerical values as the corresponding ones from Figure (12a). For δ, the numerical value is 0.001. Figure (12c) presents the comparison between the four Laplace probability density functions and the corresponding Generalized Hyperbolic probability density functions, L vs. GH. In all four cases the probability density functions are superposed. Figure (12d) presents the comparison between the logarithms of the distributions, −log L vs. −log GH. Figure (13c) presents the comparison between the standard Laplace probability density function, L(x | µ = 0, b = 1) (reported in Figure (13a)) and the Generalized Hyperbolic density function for parameters GH(x | λ = 1, α = b^(−1) = 1, β = 0, δ = 0.01, µ = 0) (presented in Figure (13b)), showing the two distributions superposed. Figure (13d) presents the comparison between the logarithms of the two distributions, −log L vs. −log GH. The behaviour of the Generalized Hyperbolic density function GH(x | λ = 1, α = b^(−1) = 1, β = 0, δ ↘ 0, µ = 0) depending on δ is presented in Figure (14a): a comparison between the Laplace probability density function (in blue) and the Generalized Hyperbolic density function GH(x | λ = 1, α = b^(−1) = 1, β = 0, δ ↘ 0, µ = 0) for δ = 1 (in red), δ = 0.1 (in yellow), δ = 0.01 (in violet) and δ = 0.001 (in green). The difference between those four Generalized Hyperbolic density functions and the Laplace density function is presented

[Figure 12 plots: (a) L(x | µ, b); (b) GH(x | λ = 1, α = b^(−1), β = 0, δ ↘ 0, µ); (c) L vs. GH; (d) −log L vs. −log GH.]

Figure 12: Four Laplace probability density functions with different mean values (µ = 0, blue and red, µ = 1, yellow, and µ = −1, violet) and different scale parameter values (b = 1, blue and violet, b = 2, red, and b = 0.5, yellow), Figure (12a), and the corresponding Generalized Hyperbolic probability density functions, with parameters set as in Equation (56), with δ = 0.001, Figure (12b). Comparison between the distributions, L vs. GH, Figure (12c), and between the logarithms of the distributions, −log L vs. −log GH, Figure (12d).

in Figure (14b). The Laplace distribution can be expressed using the Generalized Hyperbolic Prior Model (GHPM), Equation (14), by fixing λ = 1, α = b^(−1), β = 0, δ ↘ 0:

LPM:
p(f_j | µ, v_fj, β) = N(f_j | µ + βv_fj, v_fj) = (2π)^(−1/2) v_fj^(−1/2) exp{ −(f_j − µ − βv_fj)² / (2 v_fj) }
p(v_fj | γ², δ² ↘ 0, λ = 1) = GIG(v_fj | γ², δ² ↘ 0, λ = 1) = E(v_fj | 1/(2b²)) = (1/(2b²)) exp{ −v_fj/(2b²) }   (57)

2.6 Variance-Gamma distribution: expressed via conjugate priors

The Variance-Gamma distribution, or the generalized Laplace distribution or Bessel function distribution, is a continuous probability distribution that is defined as the Normal variance-mean mixture where the mixing density is the Gamma distribution. The tails of the distribution decrease more slowly than those of the Normal distribution. It is therefore suitable to model phenomena where numerically large values are more probable than is the case for the Normal distribution. In particular, it can be successfully used for modelling sparse phenomena. Examples are returns from financial assets and turbulent wind speeds. The distribution was introduced in the financial literature by Madan and Seneta (D.B. Madan and E. Seneta (1990): The variance gamma (V.G.) model for share market returns, Journal of Business, 63, pp. 511-524).

[Figure 13 plots: (a) L(x | µ = 0, b = 1); (b) GH(x | λ = 1, α = b^(−1) = 1, β = 0, δ = 0.01, µ = 0); (c) L vs. GH; (d) −log L vs. −log GH.]

Figure 13: The standard Laplace distribution, Figure (13a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (56), Figure (13b). Comparison between the two distributions, L vs. GH, Figure (13c), and between the logarithms of the distributions, −log L vs. −log GH, Figure (13d).

[Figure 14 plots: (a) L vs. GH for δ ∈ {1, 0.1, 0.01, 0.001}; (b) GH − L for δ ∈ {1, 0.1, 0.01, 0.001}.]

Figure 14: The behaviour of the Generalized Hyperbolic density function GH(x | λ = 1, α = b^(−1), β = 0, δ ↘ 0, µ) depending on δ.

The Variance-Gamma distributions form a subclass of the generalised Hyperbolic distributions. The fact that there is a simple expression for the moment generating function implies that simple expressions for all moments are available. The Variance-Gamma distribution has the following probability density function:

VG(x | α, β, λ, µ) = (γ^(2λ) |x − µ|^(λ−1/2) / (√π Γ(λ) (2α)^(λ−1/2))) K_(λ−1/2)(α|x − µ|) exp{ β(x−µ) },   λ > 0, γ = √(α² − β²);   (58)

2.6.1 Variance-Gamma: via Generalized Hyperbolic distribution

The goal of this subsection is to derive the Variance-Gamma distribution from the Generalized Hyperbolic distribution. X ∼ GH(x | λ, α, β, δ ↘ 0, µ) has a Variance-Gamma distribution, VG(x | α, β, λ, µ). Considering δ ↘ 0, the particular form of the Generalized Hyperbolic distribution becomes:

GH(x | λ, α, β, δ ↘ 0, µ) = (γ^λ |x − µ|^(λ−1/2) / (δ^λ √(2π) α^(λ−1/2) K_λ(δγ))) K_(λ−1/2)(α|x − µ|) exp{ β(x−µ) }   (59)

Since δ ↘ 0, for the expression of the modified Bessel function of the second kind K_λ(δγ) the asymptotic relation for small arguments presented in Equation (23) can be used, obtaining:

K_λ(δγ) ∼ Γ(λ) 2^(λ−1) (δγ)^(−λ), for δ ↘ 0.   (60)

Using Equation (60) in Equation (59), the particular form of the Generalized Hyperbolic distribution becomes:

GH(x | λ, α, β, δ ↘ 0, µ) = (γ^(2λ) |x − µ|^(λ−1/2) / (√π Γ(λ) (2α)^(λ−1/2))) K_(λ−1/2)(α|x − µ|) exp{ β(x−µ) } = VG(x | α, β, λ, µ)   (61)

Figure (15c) presents the comparison between the Variance-Gamma probability density function, VG(x | α, β, λ, µ) (presented in Figure (15a)), and the Generalized Hyperbolic density function for parameters GH(x | λ, α, β, δ ↘ 0, µ) (presented in Figure (15b)). Figure (15d) presents the comparison between the logarithms of the two distributions, −log VG vs. −log GH. The Variance-Gamma distribution can be expressed using the Generalized Hyperbolic Prior Model (GHPM), Equation (14), by fixing δ ↘ 0:

V-GPM:
p(f_j | µ, v_fj, β) = N(f_j | µ + βv_fj, v_fj) = (2π)^(−1/2) v_fj^(−1/2) exp{ −(f_j − µ − βv_fj)² / (2 v_fj) }
p(v_fj | γ², δ² ↘ 0, λ) = GIG(v_fj | γ², δ² ↘ 0, λ) = G(v_fj | λ, γ²/2) = ((γ²/2)^λ / Γ(λ)) v_fj^(λ−1) exp{ −(γ²/2) v_fj },   (62)

where G denotes the Gamma distribution.
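The δ ↘ 0 limit can be verified by simulation: drawing v_fj from the Gamma distribution G(λ, γ²/2) and then f_j from N(µ + βv_fj, v_fj) must reproduce the Variance-Gamma density of Equation (58). A sketch with illustrative parameter values:

```python
import numpy as np
from scipy.special import kv, gamma as gamma_fn

rng = np.random.default_rng(3)
alpha, beta, lam, mu = 2.0, 0.5, 1.5, 0.0      # illustrative, with alpha > |beta|
g2 = alpha**2 - beta**2                        # gamma^2

v = rng.gamma(shape=lam, scale=2.0 / g2, size=400_000)   # v ~ G(lam, g2 / 2)
f = rng.normal(mu + beta * v, np.sqrt(v))                # f | v ~ N(mu + beta*v, v)

def vg_pdf(x):
    """Variance-Gamma density, Equation (58)."""
    ax = np.abs(x - mu)
    c = g2**lam / (np.sqrt(np.pi) * gamma_fn(lam) * (2.0 * alpha) ** (lam - 0.5))
    return c * ax ** (lam - 0.5) * kv(lam - 0.5, alpha * ax) * np.exp(beta * (x - mu))

x = np.array([-2.0, -0.5, 0.5, 2.0])
emp = np.array([np.mean(np.abs(f - xi) < 0.05) / 0.1 for xi in x])
print("empirical :", np.round(emp, 3))
print("VG density:", np.round(vg_pdf(x), 3))
```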

2.7 Normal-Inverse Gaussian distribution: expressed via conjugate priors

The interest of the Normal-Inverse Gaussian distribution is given by the fact that it can be a heavy-tailed distribution and can therefore be successfully used as a sparsity enforcing prior. The Normal-Inverse Gaussian distribution is a continuous probability distribution, defined as the Normal variance-mean mixture where the mixing density is the Inverse Gaussian distribution. The parameters of the Normal-Inverse Gaussian distribution can be used to construct a heaviness and skewness plot called the NIG-triangle.

[Figure 15 plots: (a) VG(x | α, β, λ, µ); (b) GH(x | λ, α, β, δ ↘ 0, µ); (c) VG vs. GH; (d) −log VG vs. −log GH.]

Figure 15: The Variance-Gamma distribution, Figure (15a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (61), Figure (15b). Comparison between the two distributions, VG vs. GH, Figure (15c), and between the logarithms of the distributions, −log VG vs. −log GH, Figure (15d).

The Normal-Inverse Gaussian distribution has the following probability density function:

NIG(x | α, β, δ, µ) = (αδ/π) (K_1(α√(δ² + (x−µ)²)) / √(δ² + (x−µ)²)) exp{ γδ + β(x−µ) }   (63)

2.7.1 Normal-Inverse Gaussian distribution: via Generalized Hyperbolic distribution

The goal of this subsection is to derive the Normal-Inverse Gaussian distribution from the Generalized Hyperbolic distribution.

X ∼ GH(x | λ = −1/2, α, β, δ, µ) has a Normal-Inverse Gaussian distribution.

Considering λ = −1/2, the Generalized Hyperbolic distribution is:

GH(x | λ = −1/2, α, β, δ, µ) = ((γ/δ)^(−1/2) / (√(2π) K_(−1/2)(γδ))) (α / √(δ² + (x−µ)²)) K_1(α√(δ² + (x−µ)²)) exp{ β(x−µ) }   (64)

Using the fact that for λ = 1/2 the modified Bessel function of the second kind K_λ(x) can be stated explicitly, Equation (29), we express K_(−1/2)(γδ):

K_(−1/2)(γδ) = √(π/(2γδ)) exp{ −γδ }.   (65)

Plugging Equation (65) in Equation (64):

GH(x | λ = −1/2, α, β, δ, µ) = (αδ/π) (K_1(α√(δ² + (x−µ)²)) / √(δ² + (x−µ)²)) exp{ γδ + β(x−µ) } = NIG(x | α, β, δ, µ)   (66)

Figure (16c) presents the comparison between the Normal-Inverse Gaussian probability density function, NIG(x | α, β, δ, µ) (presented in Figure (16a)), and the Generalized Hyperbolic density function for parameters GH(x | λ = −1/2, α, β, δ, µ) (presented in Figure (16b)). Figure (16d) presents the comparison between the logarithms of the two distributions, −log NIG vs. −log GH.

[Figure 16 plots: (a) NIG(x | α, β, δ, µ); (b) GH(x | λ = −1/2, α, β, δ, µ); (c) NIG vs. GH; (d) −log NIG vs. −log GH.]

Figure 16: The Normal-Inverse Gaussian distribution, Figure (16a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (66), Figure (16b). Comparison between the two distributions, NIG vs. GH, Figure (16c), and between the logarithms of the distributions, −log NIG vs. −log GH, Figure (16d).

3 From uncertainties models to likelihood models

Depending on the application, the variances of the linear forward model might be unknown and have to be estimated. For the linear forward model, Equation (1), the unknowns are f, representing a signal or an image, and ǫ, which accounts for measurement errors and uncertainties. When the corresponding variances are also considered unknown, we want to estimate the associated variances, i.e. v_f and v_ǫ. Typically, the construction of the hierarchical models is based on general Bayesian inference, which gives the possibility to derive the posterior distribution from the prior and the likelihood. In this section, for the prior distribution we will use the Student-t distribution, in order to enforce sparsity on f. The likelihood is obtained from the distribution proposed for modelling the uncertainties of the model, ǫ. Different distributions can be proposed for modelling the uncertainties, resulting in different likelihoods.

3.1 Stationary Gaussian Uncertainties Model

A stationary Gaussian uncertainties model (sGUM) can be proposed, under the assumption that the associated uncertainties variances are equal, Equation (67):

v_{ǫ_i} = v_{ǫ_j} = v_ǫ.   (67)

The uncertainties vector elements are modelled using zero-mean Normal distributions, with the same associated variance, v_ǫ, Equation (68):

p(ǫ_i | v_ǫ) = N(ǫ_i | 0, v_ǫ), i ∈ {1, 2, ..., N},   (68)

leading to the sGUM, a multivariate normal distribution, Equation (69):

sGUM: p(ǫ | v_ǫ) = N(ǫ | 0, v_ǫ I)   (69)

The likelihood p(g | f, v_ǫ) is obtained via the linear forward model, Equation (1), and the stationary Gaussian uncertainties model (sGUM), Equation (69). The distribution modelling the likelihood is also a multivariate normal distribution, with covariance matrix v_ǫ I and mean Hf, the stationary Gaussian likelihood (sGL), Equation (70):

sGL: p(g | f, v_ǫ) = N(g | Hf, v_ǫ I)   (70)
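To make the sGUM/sGL pair concrete, a minimal sketch (with a synthetic H, f and a known variance v_ǫ as illustrative stand-ins) simulating data according to Equations (68)-(70) and evaluating the sGL log-density could read:

```python
# Illustrative sketch of the sGUM/sGL pair, Equations (68)-(70): all N uncertainty
# components share the variance v_eps, so g | f ~ N(Hf, v_eps * I).
import numpy as np

rng = np.random.default_rng(0)
N, M, v_eps = 50, 20, 0.1          # synthetic sizes and (known) variance
H = rng.standard_normal((N, M))    # stand-in forward operator
f = rng.standard_normal(M)         # stand-in signal
g = H @ f + rng.normal(0.0, np.sqrt(v_eps), size=N)  # Equation (1) with sGUM noise

def sgl_loglik(g, H, f, v_eps):
    """Stationary Gaussian log-likelihood, Equation (70), full log-density."""
    r = g - H @ f
    return -0.5 * (g.size * np.log(2 * np.pi * v_eps) + r @ r / v_eps)

print(sgl_loglik(g, H, f, v_eps))
```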

3.2 Non-stationary Gaussian Uncertainties Model

A non-stationary Gaussian uncertainties model (nsGUM) can be proposed when the assumption that the associated uncertainties variances are equal, Equation (67), is not imposed. The uncertainties vector elements are modelled using zero-mean Normal distributions like in the sGUM, but with a different associated variance v_{ǫ_i} for each uncertainties vector element ǫ_i, Equation (71):

p(ǫ_i | v_{ǫ_i}) = N(ǫ_i | 0, v_{ǫ_i}), i ∈ {1, 2, ..., N},   (71)

leading to the nsGUM, a multivariate normal distribution, Equation (72):

nsGUM: p(ǫ | v_ǫ) = N(ǫ | 0, V_ǫ),   (72)

where the following notations were used:

v_ǫ = [v_{ǫ_1}, v_{ǫ_2}, ..., v_{ǫ_N}]; V_ǫ = diag[v_ǫ].   (73)

The likelihood p(g | f, v_ǫ) is obtained via the linear forward model, Equation (1), and the non-stationary Gaussian uncertainties model (nsGUM), Equation (72). The distribution modelling the likelihood is also a multivariate normal distribution, with the same covariance matrix V_ǫ and mean Hf, the non-stationary Gaussian likelihood (nsGL), Equation (74):

nsGL: p(g | f, v_ǫ) = N(g | Hf, V_ǫ)   (74)

3.3 Stationary Student-t Uncertainties Model

A stationary Student-t uncertainties model (sStUM) can be proposed, under the assumption that the associated uncertainties variances are equal, Equation (67). The uncertainties vector elements, given the same associated variance v_ǫ, are modelled using zero-mean Normal distributions, ǫ_i | v_ǫ ∼ N(ǫ_i | 0, v_ǫ), and the variance v_ǫ is modelled using an Inverse Gamma distribution, Equation (75):

p(v_ǫ | α_ǫ, β_ǫ) = IG(v_ǫ | α_ǫ, β_ǫ) = \frac{β_ǫ^{α_ǫ}}{Γ(α_ǫ)} v_ǫ^{−α_ǫ−1} \exp\left\{−\frac{β_ǫ}{v_ǫ}\right\},   (75)

such that via the Student-t Prior Model, Equation (7), the error vector ǫ is modelled by a Student-t distribution, Equation (76):

p(ǫ_i | α_ǫ, β_ǫ) = (2πβ_ǫ)^{−1/2} \frac{Γ(α_ǫ + 1/2)}{Γ(α_ǫ)} \left(1 + \frac{ǫ_i^2}{2β_ǫ}\right)^{−(α_ǫ + 1/2)}.   (76)
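The marginalization behind Equations (75)-(76) can be verified by simulation: the sketch below (illustrative α_ǫ, β_ǫ) draws v_ǫ ∼ IG(α_ǫ, β_ǫ), then ǫ | v_ǫ ∼ N(0, v_ǫ), and compares the empirical histogram with the closed-form Student-t density of Equation (76).

```python
# Monte Carlo check of the Normal/Inverse-Gamma mixture, Equations (75)-(76):
# eps | v ~ N(0, v), v ~ IG(alpha, beta)  =>  eps ~ Student-t of Equation (76).
import numpy as np
from scipy.stats import invgamma
from scipy.special import gammaln

alpha, beta = 1.5, 1.0             # illustrative shape/scale parameters
rng = np.random.default_rng(1)
v = invgamma.rvs(alpha, scale=beta, size=200_000, random_state=rng)
eps = rng.standard_normal(v.size) * np.sqrt(v)

def student_pdf(x, alpha, beta):
    # Equation (76), evaluated in log-space for numerical stability
    logc = -0.5 * np.log(2 * np.pi * beta) + gammaln(alpha + 0.5) - gammaln(alpha)
    return np.exp(logc - (alpha + 0.5) * np.log1p(x**2 / (2 * beta)))

hist, edges = np.histogram(eps, bins=np.linspace(-6, 6, 61), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - student_pdf(mid, alpha, beta))))  # small (Monte Carlo error)
```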

The sStUM is represented by a multivariate Student-t distribution, Equation (77):

p(ǫ | α_ǫ, β_ǫ) = (2πβ_ǫ)^{−N/2} \left[\frac{Γ(α_ǫ + 1/2)}{Γ(α_ǫ)}\right]^N ∏_{i=1}^N \left(1 + \frac{ǫ_i^2}{2β_ǫ}\right)^{−(α_ǫ + 1/2)}   (77)

and is expressed by a multivariate Normal distribution and an Inverse Gamma distribution, Equation (78):

sStUM: { p(ǫ | v_ǫ) = N(ǫ | 0, v_ǫ I); p(v_ǫ | α_ǫ, β_ǫ) = IG(v_ǫ | α_ǫ, β_ǫ) }   (78)

The likelihood p(g | f, v_ǫ) is obtained via the linear forward model, Equation (1), and the stationary Student-t uncertainties model (sStUM), Equation (78). The distribution modelling the likelihood is also a multivariate normal distribution, with the same covariance matrix v_ǫ I and mean Hf, the stationary Student-t likelihood (sStL), Equation (79):

sStL: { p(g | f, v_ǫ) = N(g | Hf, v_ǫ I); p(v_ǫ | α_ǫ, β_ǫ) = IG(v_ǫ | α_ǫ, β_ǫ) }   (79)

3.4 Non-stationary Student-t Uncertainties Model

A non-stationary Student-t uncertainties model (nsStUM) can be proposed, when the assumption that the associated uncertainties variances are equal, Equation (67) is not imposed.

The uncertainties vector elements, given the associated variances v_{ǫ_i}, are modelled using zero-mean Normal distributions, ǫ_i | v_{ǫ_i} ∼ N(ǫ_i | 0, v_{ǫ_i}), and the variances v_{ǫ_i} are modelled using Inverse Gamma distributions having the same shape and scale parameters, Equation (80):

p(v_{ǫ_i} | α_ǫ, β_ǫ) = IG(v_{ǫ_i} | α_ǫ, β_ǫ) = \frac{β_ǫ^{α_ǫ}}{Γ(α_ǫ)} v_{ǫ_i}^{−α_ǫ−1} \exp\left\{−\frac{β_ǫ}{v_{ǫ_i}}\right\}, i ∈ {1, 2, ..., N},   (80)

such that via the Student-t Prior Model, Equation (7), every element of the uncertainties vector ǫ is modelled by a Student-t distribution, Equation (76). The nsStUM is represented by a multivariate Student-t distribution, Equation (77), and is expressed by a multivariate Normal distribution and a product of Inverse Gamma distributions, Equation (81):

nsStUM: { p(ǫ | v_ǫ) = N(ǫ | 0, V_ǫ); p(v_ǫ | α_ǫ, β_ǫ) = ∏_{i=1}^N IG(v_{ǫ_i} | α_ǫ, β_ǫ) }   (81)

The likelihood p(g | f, v_ǫ) is obtained via the linear forward model, Equation (1), and the non-stationary Student-t uncertainties model (nsStUM), Equation (81). The distribution modelling the likelihood is also a multivariate normal distribution, with the same covariance matrix V_ǫ and mean Hf, the non-stationary Student-t likelihood (nsStL), Equation (82):

nsStL: { p(g | f, v_ǫ) = N(g | Hf, V_ǫ); p(v_ǫ | α_ǫ, β_ǫ) = ∏_{i=1}^N IG(v_{ǫ_i} | α_ǫ, β_ǫ) }   (82)

3.5 Stationary Laplace Uncertainties Model

A stationary Laplace uncertainties model (sLUM) can be proposed, under the assumption that the associated uncertainties variances are equal, Equation (67). The uncertainties vector elements, given the same associated variance v_ǫ, are all modelled using zero-mean Normal distributions, ǫ_i | v_ǫ ∼ N(ǫ_i | 0, v_ǫ^{−1}), and the variance v_ǫ is modelled using an Inverse Gamma distribution for which the shape parameter is set to 1 and the scale parameter to b_ǫ/2, Equation (83):

p(v_ǫ | 1, b_ǫ/2) = IG(v_ǫ | 1, b_ǫ/2) = \frac{b_ǫ}{2} v_ǫ^{−2} \exp\left\{−\frac{b_ǫ}{2 v_ǫ}\right\},   (83)

such that via the Laplace Prior Model, Equation (43), the error vector ǫ is modelled by a Laplace distribution, Equation (84):

p(ǫ_i | b_ǫ) = \frac{1}{2 b_ǫ^{−1/2}} \exp\left\{−\frac{|ǫ_i|}{b_ǫ^{−1/2}}\right\}.   (84)
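Equations (83)-(84) can be checked the same way: with v_ǫ ∼ IG(1, b_ǫ/2) and ǫ | v_ǫ ∼ N(0, v_ǫ^{−1}), the samples below (illustrative b_ǫ) should follow the Laplace density of Equation (84), i.e. (√b_ǫ/2) exp{−√b_ǫ |ǫ|}.

```python
# Monte Carlo check of Equations (83)-(84): eps | v ~ N(0, 1/v), v ~ IG(1, b/2)
# marginalizes to the Laplace density (sqrt(b)/2) * exp(-sqrt(b) * |eps|).
import numpy as np
from scipy.stats import invgamma

b = 4.0                                   # illustrative parameter
rng = np.random.default_rng(2)
v = invgamma.rvs(1.0, scale=b / 2.0, size=200_000, random_state=rng)
eps = rng.standard_normal(v.size) / np.sqrt(v)   # variance v^{-1}

laplace_pdf = lambda x: 0.5 * np.sqrt(b) * np.exp(-np.sqrt(b) * np.abs(x))
hist, edges = np.histogram(eps, bins=np.linspace(-4, 4, 41), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - laplace_pdf(mid))))   # small, up to Monte Carlo error
```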

The sLUM is represented by a multivariate Laplace distribution, Equation (85):

p(ǫ | b_ǫ) = ∏_{i=1}^N p(ǫ_i | b_ǫ) = \frac{1}{2^N b_ǫ^{−N/2}} \exp\left\{−\frac{∑_{i=1}^N |ǫ_i|}{b_ǫ^{−1/2}}\right\}   (85)

and is expressed by a multivariate Normal distribution and an Inverse Gamma distribution, Equation (86):

sLUM: { p(ǫ | v_ǫ) = N(ǫ | 0, v_ǫ^{−1} I); p(v_ǫ | 1, b_ǫ/2) = IG(v_ǫ | 1, b_ǫ/2) }   (86)

The likelihood p(g | f, v_ǫ) is obtained via the linear forward model, Equation (1), and the stationary Laplace uncertainties model (sLUM), Equation (86). The distribution modelling the likelihood is also a multivariate normal distribution, with the same covariance matrix v_ǫ^{−1} I and mean Hf, the stationary Laplace likelihood (sLL), Equation (87):

sLL: { p(g | f, v_ǫ) = N(g | Hf, v_ǫ^{−1} I); p(v_ǫ | 1, b_ǫ/2) = IG(v_ǫ | 1, b_ǫ/2) }   (87)

3.6 Non-stationary Laplace Uncertainties Model

A non-stationary Laplace uncertainties model (nsLUM) can be proposed, when the assumption that the associated uncertainties variances are equal, Equation (67) is not imposed.

The uncertainties vector elements, given the associated variances v_{ǫ_i}, are modelled using zero-mean Normal distributions, ǫ_i | v_{ǫ_i} ∼ N(ǫ_i | 0, v_{ǫ_i}^{−1}), and the variances v_{ǫ_i} are modelled using Inverse Gamma distributions for which the shape parameter is set to 1 and the scale parameter to b_ǫ/2, Equation (88):

p(v_{ǫ_i} | 1, b_ǫ/2) = IG(v_{ǫ_i} | 1, b_ǫ/2) = \frac{b_ǫ}{2} v_{ǫ_i}^{−2} \exp\left\{−\frac{b_ǫ}{2 v_{ǫ_i}}\right\}, i ∈ {1, 2, ..., N},   (88)

such that via the Laplace Prior Model, Equation (43), every element of the uncertainties vector ǫ is modelled by a Laplace distribution, Equation (84). The nsLUM is represented by a multivariate Laplace distribution, Equation (85), and is expressed by a multivariate Normal distribution and a product of Inverse Gamma distributions, Equation (89):

nsLUM: { p(ǫ | v_ǫ) = N(ǫ | 0, V_ǫ^{−1}); p(v_ǫ | 1, b_ǫ/2) = ∏_{i=1}^N IG(v_{ǫ_i} | 1, b_ǫ/2) }   (89)

The likelihood p(g | f, v_ǫ) is obtained via the linear forward model, Equation (1), and the non-stationary Laplace uncertainties model (nsLUM), Equation (89). The distribution modelling the likelihood is also a multivariate normal distribution, with the same covariance matrix V_ǫ^{−1} and mean Hf, the non-stationary Laplace likelihood (nsLL), Equation (90):

nsLL: { p(g | f, v_ǫ) = N(g | Hf, V_ǫ^{−1}); p(v_ǫ | 1, b_ǫ/2) = ∏_{i=1}^N IG(v_{ǫ_i} | 1, b_ǫ/2) }   (90)

4 Hierarchical Models with Student-t prior via StPM

Considering the Student-t distribution for modelling the sparse structure of f in the linear forward model, Equation (1), expressed via the conjugate priors StPM, and depending on the model proposed for the uncertainties of the model, ǫ, and implicitly the corresponding likelihood, sGL or nsGL, different hierarchical models can be proposed:

4.1 Student-t hierarchical model: stationary Gaussian uncertainties model, known uncertainties variance

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaussian distribution is used under the following two assumptions:

a) each element of the uncertainties vector has the same variance, v_ǫ;

b) the variance v_ǫ is known;

Student-t sGL K:
Likelihood: sGL: p(g | f, v_ǫ) = N(g | Hf, v_ǫ I) ∝ v_ǫ^{−N/2} \exp{−\frac{1}{2 v_ǫ}‖g − Hf‖²}.
Prior: StPM: p(f | 0, v_f) = N(f | 0, V_f) ∝ det(V_f)^{−1/2} \exp{−\frac{1}{2}‖V_f^{−1/2} f‖²},
p(v_f | α_f, β_f) = ∏_{j=1}^M IG(v_{f_j} | α_f, β_f) ∝ ∏_{j=1}^M v_{f_j}^{−α_f−1} \exp{−∑_{j=1}^M β_f/v_{f_j}},
V_f = diag[v_f], v_f = [..., v_{f_j}, ...].   (91)

4.2 Student-t hierarchical model: non-stationary Gaussian uncertainties model, known uncertainties variances

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a non-stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaussian distribution is used under the following assumption:

a) the variance vector v_ǫ is known;

Student-t nsGL K:
Likelihood: nsGL: p(g | f, v_ǫ) = N(g | Hf, V_ǫ) ∝ ∏_{i=1}^N v_{ǫ_i}^{−1/2} \exp{−\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖²}, V_ǫ = diag[v_ǫ], v_ǫ = [..., v_{ǫ_i}, ...].
Prior: StPM: p(f | 0, v_f) = N(f | 0, V_f) ∝ det(V_f)^{−1/2} \exp{−\frac{1}{2}‖V_f^{−1/2} f‖²},
p(v_f | α_f, β_f) = ∏_{j=1}^M IG(v_{f_j} | α_f, β_f) ∝ ∏_{j=1}^M v_{f_j}^{−α_f−1} \exp{−∑_{j=1}^M β_f/v_{f_j}},
V_f = diag[v_f], v_f = [..., v_{f_j}, ...].   (92)

4.3 Student-t hierarchical model: stationary Gaussian uncertainties model, unknown uncertainties variance

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f as unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaussian distribution is used under the following two assumptions:

a) each element of the uncertainties vector has the same variance, v_ǫ;

b) the variance v_ǫ is unknown;

Student-t sGL UNK:
Likelihood: sGL: p(g | f, v_ǫ) = N(g | Hf, v_ǫ I) ∝ v_ǫ^{−N/2} \exp{−\frac{1}{2 v_ǫ}‖g − Hf‖²}.
Prior: StPM: p(f | 0, v_f) = N(f | 0, V_f) ∝ det(V_f)^{−1/2} \exp{−\frac{1}{2}‖V_f^{−1/2} f‖²},
p(v_f | α_f, β_f) = ∏_{j=1}^M IG(v_{f_j} | α_f, β_f) ∝ ∏_{j=1}^M v_{f_j}^{−α_f−1} \exp{−∑_{j=1}^M β_f/v_{f_j}},
V_f = diag[v_f], v_f = [..., v_{f_j}, ...].   (93)

4.4 Student-t hierarchical model: non-stationary Gaussian uncertainties model, unknown uncertainties variances

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f as unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a non-stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaussian distribution is used under the following assumption:

a) the variance vector v_ǫ is unknown;

Student-t nsGL UNK:
Likelihood: nsGL: p(g | f, v_ǫ) = N(g | Hf, V_ǫ) ∝ ∏_{i=1}^N v_{ǫ_i}^{−1/2} \exp{−\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖²}, V_ǫ = diag[v_ǫ], v_ǫ = [..., v_{ǫ_i}, ...].
Prior: StPM: p(f | 0, v_f) = N(f | 0, V_f) ∝ det(V_f)^{−1/2} \exp{−\frac{1}{2}‖V_f^{−1/2} f‖²},
p(v_f | α_f, β_f) = ∏_{j=1}^M IG(v_{f_j} | α_f, β_f) ∝ ∏_{j=1}^M v_{f_j}^{−α_f−1} \exp{−∑_{j=1}^M β_f/v_{f_j}},
V_f = diag[v_f], v_f = [..., v_{f_j}, ...].   (94)

4.5 Student-t hierarchical model: stationary Student-t uncertainties model, unknown uncertainties variance

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f as unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a stationary Student-t uncertainties model is proposed, i.e. a multivariate Student-t distribution expressed via the StPM is used under the following two assumptions:

a) each element of the uncertainties vector has the same variance, v_ǫ;

b) the variance v_ǫ is unknown;

Student-t sStL UNK:
Likelihood: sStL: p(g | f, v_ǫ) = N(g | Hf, v_ǫ I) ∝ v_ǫ^{−N/2} \exp{−\frac{1}{2 v_ǫ}‖g − Hf‖²},
p(v_ǫ | α_ǫ, β_ǫ) = IG(v_ǫ | α_ǫ, β_ǫ) ∝ v_ǫ^{−α_ǫ−1} \exp{−β_ǫ/v_ǫ}.
Prior: StPM: p(f | 0, v_f) = N(f | 0, V_f) ∝ det(V_f)^{−1/2} \exp{−\frac{1}{2}‖V_f^{−1/2} f‖²},
p(v_f | α_f, β_f) = ∏_{j=1}^M IG(v_{f_j} | α_f, β_f) ∝ ∏_{j=1}^M v_{f_j}^{−α_f−1} \exp{−∑_{j=1}^M β_f/v_{f_j}},
V_f = diag[v_f], v_f = [..., v_{f_j}, ...].   (95)

4.6 Student-t hierarchical model: non-stationary Student-t uncertainties model, unknown uncertainties variances

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f as unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a non-stationary Student-t uncertainties model is proposed, i.e. a multivariate Student-t distribution expressed via the StPM is used under the following assumption:

a) the variance vector v_ǫ is unknown;

Student-t nsStL UNK:
Likelihood: nsStL: p(g | f, v_ǫ) = N(g | Hf, V_ǫ) ∝ ∏_{i=1}^N v_{ǫ_i}^{−1/2} \exp{−\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖²},
p(v_ǫ | α_ǫ, β_ǫ) = ∏_{i=1}^N IG(v_{ǫ_i} | α_ǫ, β_ǫ) ∝ ∏_{i=1}^N v_{ǫ_i}^{−α_ǫ−1} \exp{−∑_{i=1}^N β_ǫ/v_{ǫ_i}},
V_ǫ = diag[v_ǫ], v_ǫ = [..., v_{ǫ_i}, ...].
Prior: StPM: p(f | 0, v_f) = N(f | 0, V_f) ∝ det(V_f)^{−1/2} \exp{−\frac{1}{2}‖V_f^{−1/2} f‖²},
p(v_f | α_f, β_f) = ∏_{j=1}^M IG(v_{f_j} | α_f, β_f) ∝ ∏_{j=1}^M v_{f_j}^{−α_f−1} \exp{−∑_{j=1}^M β_f/v_{f_j}},
V_f = diag[v_f], v_f = [..., v_{f_j}, ...].   (96)

The hierarchical model built over the linear forward model, Equation (1), using as a prior for f a Student-t distribution, expressed via the Student-t Prior Model (StPM), Equation (7), and modelling the uncertainties of the model ǫ using the non-stationary Student-t Uncertainties Model (nsStUM), Equation (81), is presented in Equation (96). The posterior

p(f, vf , vǫ g, αf ,βf , αǫ,βǫ) p (g f, vǫ) p (vǫ αǫβǫ) p(f 0, vf ) p(vf αf ,βf ) | ∝ | | | | N N N − 1 1 − 1 βǫ 2 V 2 g Hf −αǫ−1 vǫi exp ǫ ( ) vǫi exp ∝ −2k − k − vǫ i=1 i=1 ( i=1 i ) (97) Y   Y X M M M − 1 1 − 1 −α −1 βf v 2 exp V 2 f v f exp f i f f j −2k k − vf  j=1 j=1 j=1 j Y   Y  X  The goal is to estimate the unknowns of the hierarchical model, namely f, the main unknown of the linear forward model, Equation (1) which was suppose sparse, and consequently modelled via the Student-t distribution and the two variances appearing in the hierarchical model, Equation (96), the variance corresponding to the sparse structure f,

namely vf and the variance corresponding to uncertainties of model ǫ, namely vǫ.

4.6.1 Joint MAP estimation

First, the Joint Maximum A Posteriori (JMAP) estimation is considered: the unknowns are estimated on the basis of the available data g, by maximizing the posterior distribution:

(f̂, v̂_f, v̂_ǫ) = \arg\max_{(f, v_f, v_ǫ)} p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) = \arg\min_{(f, v_f, v_ǫ)} L(f, v_f, v_ǫ),   (98)

where for the second equality the criterion L(f, v_f, v_ǫ) is defined as:

L(f, v_f, v_ǫ) = −ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)   (99)

The MAP estimation corresponds to the solution minimizing the criterion L(f, v_f, v_ǫ). From the analytical expression of the posterior distribution, Equation (97), and the definition of the criterion L, Equation (99), we obtain:

L(f, v_ǫ, v_f) = −ln p(f, v_ǫ, v_f | g) = \frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖² + (α_ǫ + \frac{3}{2}) ∑_{i=1}^N ln v_{ǫ_i} + ∑_{i=1}^N \frac{β_ǫ}{v_{ǫ_i}} + \frac{1}{2}‖V_f^{−1/2} f‖² + (α_f + \frac{3}{2}) ∑_{j=1}^M ln v_{f_j} + ∑_{j=1}^M \frac{β_f}{v_{f_j}}   (100)

One of the simplest optimisation algorithms that can be used is an alternate optimization of the criterion L(f, v_ǫ, v_f) with respect to each unknown:

• With respect to f:

∂L(f, v_f, v_ǫ)/∂f = 0 ⇔ ∂/∂f (‖V_ǫ^{−1/2}(g − Hf)‖² + ‖V_f^{−1/2} f‖²) = −2 H^T V_ǫ^{−1}(g − Hf) + 2 V_f^{−1} f = 0
⇔ (H^T V_ǫ^{−1} H + V_f^{−1}) f = H^T V_ǫ^{−1} g
⇒ f̂ = (H^T V_ǫ^{−1} H + V_f^{−1})^{−1} H^T V_ǫ^{−1} g

• With respect to v_{f_j}, j ∈ {1, 2, ..., M}:

∂L(f, v_f, v_ǫ)/∂v_{f_j} = 0 ⇔ ∂/∂v_{f_j} [(α_f + \frac{3}{2}) ln v_{f_j} + (β_f + \frac{1}{2} f_j²) v_{f_j}^{−1}] = 0
⇔ (α_f + \frac{3}{2}) v_{f_j} − (β_f + \frac{1}{2} f_j²) = 0
⇒ v̂_{f_j} = \frac{β_f + \frac{1}{2} f_j²}{α_f + \frac{3}{2}}

• With respect to v_{ǫ_i}, i ∈ {1, 2, ..., N}:

First, we develop the norm ‖V_ǫ^{−1/2}(g − Hf)‖²:

‖V_ǫ^{−1/2}(g − Hf)‖² = g^T V_ǫ^{−1} g − 2 g^T V_ǫ^{−1} Hf + f^T H^T V_ǫ^{−1} Hf = ∑_{i=1}^N v_{ǫ_i}^{−1} g_i² − 2 ∑_{i=1}^N v_{ǫ_i}^{−1} g_i H_i f + ∑_{i=1}^N v_{ǫ_i}^{−1} f^T H_i^T H_i f,

where H_i denotes the line i of the matrix H, i ∈ {1, 2, ..., N}, i.e. H_i = [h_{i1}, h_{i2}, ..., h_{iM}].

∂L(f, v_f, v_ǫ)/∂v_{ǫ_i} = 0 ⇔ ∂/∂v_{ǫ_i} [(α_ǫ + \frac{3}{2}) ln v_{ǫ_i} + (β_ǫ + \frac{1}{2}(g_i² − 2 g_i H_i f + f^T H_i^T H_i f)) v_{ǫ_i}^{−1}] = 0
⇔ (α_ǫ + \frac{3}{2}) v_{ǫ_i} − (β_ǫ + \frac{1}{2}(g_i − H_i f)²) = 0
⇒ v̂_{ǫ_i} = \frac{β_ǫ + \frac{1}{2}(g_i − H_i f)²}{α_ǫ + \frac{3}{2}}

The iterative algorithm obtained via JMAP estimation is presented in Figure (17).
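A compact transcription of these alternating updates (the scheme summarized in Figure (17) and Algorithm 1) is sketched below; H, g and the hyperparameters α_f, β_f, α_ǫ, β_ǫ are placeholders to be supplied by the application.

```python
# Sketch of the alternating JMAP updates derived above (cf. Figure 17 / Algorithm 1).
# H, g, alpha_f, beta_f, alpha_eps, beta_eps are assumed given by the application.
import numpy as np

def jmap_student_t(H, g, alpha_f, beta_f, alpha_eps, beta_eps, n_iter=50):
    N, M = H.shape
    f = np.zeros(M)                                                  # f^(0) = 0
    for _ in range(n_iter):
        v_f = (beta_f + 0.5 * f**2) / (alpha_f + 1.5)                # update v_f_j
        v_eps = (beta_eps + 0.5 * (g - H @ f)**2) / (alpha_eps + 1.5)  # update v_eps_i
        iVe = 1.0 / v_eps                                            # V_eps^{-1} diagonal
        A = H.T @ (iVe[:, None] * H) + np.diag(1.0 / v_f)            # H^T V_eps^{-1} H + V_f^{-1}
        f = np.linalg.solve(A, H.T @ (iVe * g))                      # update f
    return f, v_f, v_eps
```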

4.6.2 Posterior Mean estimation via VBA, partial separability

In this subsection, the Posterior Mean (PM) estimation is considered. The Joint MAP computes the mode of the posterior distribution; the PM computes the mean of the posterior distribution. One of the advantages of this estimator is that it minimizes the Mean Square Error (MSE). Computing the posterior mean of any unknown requires high-dimensional integration. For example, the mean corresponding to f can be computed from the posterior distribution using Equation (101),

E_p{f} = ∫∫∫ f p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) df dv_f dv_ǫ.   (101)

In general, these computations are not easy. One way to obtain approximate estimates is to approximate p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) by a separable one, q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) = q_1(f) q_2(v_f) q_3(v_ǫ), then to compute the posterior means using the separability. The mean corresponding to f is computed using the corresponding separable distribution q_1(f), Equation (102),

E_{q_1}{f} = ∫ f q_1(f) df.   (102)

[Figure 17: Iterative algorithm corresponding to Joint MAP estimation for the Student-t hierarchical model, non-stationary Student-t uncertainties model: (a) update the estimate f̂; (b) update the estimated variances v̂_{f_j}; (c) update the estimated uncertainties variances v̂_{ǫ_i}.]

If the approximation of the posterior distribution with a separable one can be done in such a way that it conserves the mean, i.e. Equation (103),

E_q{x} = E_p{x},   (103)

for all the unknowns of the model, a great amount of computational cost is gained. In particular, for the proposed hierarchical model, Equation (96), the posterior distribution, Equation (97), is not a separable one, making the analytical computation of the PM very difficult. One way to compute the PM in this case is to first approximate the posterior

law p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) with a separable law q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ), Equation (104),

p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) ≈ q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) = q_1(f) q_2(v_f) q_3(v_ǫ)   (104)

where the notations from Equation (105) are used

q_2(v_f) = ∏_{j=1}^M q_{2j}(v_{f_j}); q_3(v_ǫ) = ∏_{i=1}^N q_{3i}(v_{ǫ_i})   (105)

Algorithm 1 Joint MAP - Student-t hierarchical model, non-stationary Student-t uncertainties model
Ensure: INITIALIZATION f^(0)   ▷ where f^(0) = f̂ via a basic estimation method, or f^(0) = 0
function JMAP(α_ǫ, β_ǫ, α_f, β_f, g, H, f^(0), M, N, NoIter)   ▷ α_ǫ, β_ǫ, α_f, β_f are set in the modelling phase, Eq. (96)
    for n = 0 to NoIter do
        for j = 1 to M do
            v̂_{f_j}^(n) = (β_f + (f_j^(n))²/2) / (α_f + 3/2)   ▷ computed using α_f, β_f, f^(n)
        end for
        V̂_f^(n) = diag[v̂_f^(n)]   ▷ diagonal matrix built using v̂_{f_j}^(n), j ∈ {1, 2, ..., M}
        for i = 1 to N do
            v̂_{ǫ_i}^(n) = (β_ǫ + (g_i − H_i f^(n))²/2) / (α_ǫ + 3/2)   ▷ computed using α_ǫ, β_ǫ, g_i, H_i, f^(n)
        end for
        V̂_ǫ^(n) = diag[v̂_ǫ^(n)]   ▷ diagonal matrix built using v̂_{ǫ_i}^(n), i ∈ {1, 2, ..., N}
        f^(n+1) = (H^T (V̂_ǫ^(n))^{−1} H + (V̂_f^(n))^{−1})^{−1} H^T (V̂_ǫ^(n))^{−1} g   ▷ new value f^(n+1)
    end for
    return f^(n+1), v̂_f^(n), v̂_ǫ^(n), n = NoIter
end function

by minimizing the Kullback-Leibler divergence, defined as:

KL(q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) : p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)) = ∫∫∫ q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) ln \frac{q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)}{p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)} df dv_ǫ dv_f   (106)

where the notations from Equation (107) are used

dv_f = ∏_{j=1}^M dv_{f_j}; dv_ǫ = ∏_{i=1}^N dv_{ǫ_i}.   (107)

Equation (105) is selecting a partial separability for the approximated posterior distribution

q(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ), in the sense that a total separability is imposed for the distributions corresponding to the two variances appearing in the hierarchical model, q_2(v_f) and q_3(v_ǫ), but not for the distribution corresponding to f. Evidently, a full separability can be imposed, by adding the supplementary condition q_1(f) = ∏_{j=1}^M q_{1j}(f_j) in Equation (105). This case is considered in Subsection (4.6.3). The minimization can be done via alternate optimization, resulting in the following proportionalities, Equations (108a), (108b) and (108c),

q_1(f) ∝ \exp⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_2(v_f) q_3(v_ǫ)},   (108a)

q_{2j}(v_{f_j}) ∝ \exp⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_{2−j}(v_{f_j}) q_3(v_ǫ)}, j ∈ {1, 2, ..., M},   (108b)

q_{3i}(v_{ǫ_i}) ∝ \exp⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_2(v_f) q_{3−i}(v_{ǫ_i})}, i ∈ {1, 2, ..., N},   (108c)

using the notations:

q_{2−j}(v_{f_j}) = ∏_{k=1, k≠j}^M q_{2k}(v_{f_k}); q_{3−i}(v_{ǫ_i}) = ∏_{k=1, k≠i}^N q_{3k}(v_{ǫ_k})   (109)

and also

⟨u(x)⟩_{v(y)} = ∫ u(x) v(y) dy.   (110)

Via Equation (99) and Equation (100), the analytical expression of the logarithm of the posterior distribution is obtained, Equation (111):

ln p(f, v_ǫ, v_f | g) = −\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖² − (α_ǫ + \frac{3}{2}) ∑_{i=1}^N ln v_{ǫ_i} − ∑_{i=1}^N \frac{β_ǫ}{v_{ǫ_i}} − \frac{1}{2}‖V_f^{−1/2} f‖² − (α_f + \frac{3}{2}) ∑_{j=1}^M ln v_{f_j} − ∑_{j=1}^M \frac{β_f}{v_{f_j}}   (111)

Computation of the analytical expression of q_1(f). The proportionality relation corresponding to q_1(f) is established in Equation (108a). In the expression of ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) all the terms free of f can be regarded as constants. Via Equation (111) the integral defined in Equation (110) becomes:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_2(v_f) q_3(v_ǫ)} = −\frac{1}{2}⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_3(v_ǫ)} − \frac{1}{2}⟨‖V_f^{−1/2} f‖²⟩_{q_2(v_f)}.   (112)

Introducing the notations:

ṽ_{f_j} = ⟨v_{f_j}^{−1}⟩_{q_{2j}(v_{f_j})}; ṽ_f = [ṽ_{f_1}, ..., ṽ_{f_j}, ..., ṽ_{f_M}]^T; Ṽ_f = diag[ṽ_f]
ṽ_{ǫ_i} = ⟨v_{ǫ_i}^{−1}⟩_{q_{3i}(v_{ǫ_i})}; ṽ_ǫ = [ṽ_{ǫ_1}, ..., ṽ_{ǫ_i}, ..., ṽ_{ǫ_N}]^T; Ṽ_ǫ = diag[ṽ_ǫ]   (113)

the integral from Equation (112) becomes:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_2(v_f) q_3(v_ǫ)} = −\frac{1}{2}‖Ṽ_ǫ^{1/2}(g − Hf)‖² − \frac{1}{2}‖Ṽ_f^{1/2} f‖².   (114)

Noting that ⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_2(v_f) q_3(v_ǫ)} is a quadratic criterion and considering the proportionality from Equation (108a), it can be concluded that q_1(f) is a multivariate Normal distribution. Minimizing the criterion leads to the analytical expression of the corresponding mean. The variance is obtained by identification:

q_1(f) = N(f | f̂, Σ̂), with f̂ = (H^T Ṽ_ǫ H + Ṽ_f)^{−1} H^T Ṽ_ǫ g, Σ̂ = (H^T Ṽ_ǫ H + Ṽ_f)^{−1}.   (115)

We note that both the expressions of the mean and the variance depend on expectancies corresponding to the two variances of the hierarchical model.

Computation of the analytical expression of q_{2j}(v_{f_j}). The proportionality relation corresponding to q_{2j}(v_{f_j}) is established in Equation (108b). In the expression of ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) all the terms free of v_{f_j} can be regarded as constants. Via Equation (111) the integral defined in Equation (110) becomes:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_{2−j}(v_{f_j}) q_3(v_ǫ)} = −\frac{1}{2}⟨‖V_f^{−1/2} f‖²⟩_{q_1(f) q_{2−j}(v_{f_j})} − (α_f + \frac{3}{2}) ln v_{f_j} − \frac{β_f}{v_{f_j}}   (116)

Introducing the notations:

Ṽ_{f_{−j}}^{−1} = diag[ṽ_{f_1}, ..., ṽ_{f_{j−1}}, v_{f_j}^{−1}, ṽ_{f_{j+1}}, ..., ṽ_{f_M}]   (117)

the integral ⟨‖V_f^{−1/2} f‖²⟩_{q_1(f) q_{2−j}(v_{f_j})} can be written:

⟨‖V_f^{−1/2} f‖²⟩_{q_1(f) q_{2−j}(v_{f_j})} = ⟨‖(Ṽ_{f_{−j}}^{−1})^{1/2} f‖²⟩_{q_1(f)}   (118)

Considering that q_1(f) is a multivariate Normal distribution, Equation (115):

⟨‖(Ṽ_{f_{−j}}^{−1})^{1/2} f‖²⟩_{q_1(f)} = ‖(Ṽ_{f_{−j}}^{−1})^{1/2} f̂‖² + Tr(Ṽ_{f_{−j}}^{−1} Σ̂) = C + v_{f_j}^{−1}(f̂_j² + Σ̂_{jj})   (119)

From Equation (116) and Equation (119):

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_{2−j}(v_{f_j}) q_3(v_ǫ)} = −(α_f + \frac{3}{2}) ln v_{f_j} − (β_f + \frac{f̂_j² + Σ̂_{jj}}{2}) v_{f_j}^{−1}   (120)

from which the proportionality corresponding to q_{2j}(v_{f_j}) can be established:

q_{2j}(v_{f_j}) ∝ v_{f_j}^{−(α_f + 3/2)} \exp\left\{−\frac{β_f + \frac{1}{2}(f̂_j² + Σ̂_{jj})}{v_{f_j}}\right\},   (121)

leading to the conclusion that q_{2j}(v_{f_j}) is an Inverse Gamma distribution with the following shape and scale parameters:

q_{2j}(v_{f_j}) = IG(v_{f_j} | α̂_{f_j}, β̂_{f_j}), with α̂_{f_j} = α_f + \frac{1}{2}, β̂_{f_j} = β_f + \frac{f̂_j² + Σ̂_{jj}}{2}   (122)

Computation of the analytical expression of q_{3i}(v_{ǫ_i}). The proportionality relation corresponding to q_{3i}(v_{ǫ_i}) is established in Equation (108c). In the expression of ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) all the terms free of v_{ǫ_i} can be regarded as constants. Via Equation (111) the integral defined in Equation (110) becomes:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_2(v_f) q_{3−i}(v_{ǫ_i})} = −\frac{1}{2}⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_1(f) q_{3−i}(v_{ǫ_i})} − (α_ǫ + \frac{3}{2}) ln v_{ǫ_i} − \frac{β_ǫ}{v_{ǫ_i}}   (123)

Introducing the notations:

Ṽ_{ǫ_{−i}}^{−1} = diag[ṽ_{ǫ_1}, ..., ṽ_{ǫ_{i−1}}, v_{ǫ_i}^{−1}, ṽ_{ǫ_{i+1}}, ..., ṽ_{ǫ_N}]   (124)

the integral ⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_1(f) q_{3−i}(v_{ǫ_i})} can be written:

⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_1(f) q_{3−i}(v_{ǫ_i})} = ⟨‖(Ṽ_{ǫ_{−i}}^{−1})^{1/2}(g − Hf)‖²⟩_{q_1(f)}   (125)

Considering that q_1(f) is a multivariate Normal distribution, Equation (115):

⟨‖(Ṽ_{ǫ_{−i}}^{−1})^{1/2}(g − Hf)‖²⟩_{q_1(f)} = ‖(Ṽ_{ǫ_{−i}}^{−1})^{1/2}(g − Hf̂)‖² + Tr(H^T Ṽ_{ǫ_{−i}}^{−1} H Σ̂)   (126)

and considering as constants all the terms free of v_{ǫ_i}:

‖(Ṽ_{ǫ_{−i}}^{−1})^{1/2}(g − Hf̂)‖² = C + v_{ǫ_i}^{−1}(g_i − H_i f̂)²; Tr(H^T Ṽ_{ǫ_{−i}}^{−1} H Σ̂) = C + v_{ǫ_i}^{−1} H_i Σ̂ H_i^T   (127)

where H_i is the line i of the matrix H, so we can conclude:

⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_1(f) q_{3−i}(v_{ǫ_i})} = C + (H_i Σ̂ H_i^T + (g_i − H_i f̂)²) v_{ǫ_i}^{−1}   (128)

From Equation (123) and Equation (128):

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_2(v_f) q_{3−i}(v_{ǫ_i})} = −(α_ǫ + \frac{3}{2}) ln v_{ǫ_i} − (β_ǫ + \frac{H_i Σ̂ H_i^T + (g_i − H_i f̂)²}{2}) v_{ǫ_i}^{−1}   (129)

from which the proportionality corresponding to q_{3i}(v_{ǫ_i}) can be established:

q_{3i}(v_{ǫ_i}) ∝ v_{ǫ_i}^{−(α_ǫ + 3/2)} \exp\left\{−\frac{β_ǫ + \frac{1}{2}(H_i Σ̂ H_i^T + (g_i − H_i f̂)²)}{v_{ǫ_i}}\right\},   (130)

leading to the conclusion that q_{3i}(v_{ǫ_i}) is an Inverse Gamma distribution with the following shape and scale parameters:

q_{3i}(v_{ǫ_i}) = IG(v_{ǫ_i} | α̂_{ǫ_i}, β̂_{ǫ_i}), with α̂_{ǫ_i} = α_ǫ + \frac{1}{2}, β̂_{ǫ_i} = β_ǫ + \frac{H_i Σ̂ H_i^T + (g_i − H_i f̂)²}{2}   (131)

Equations (115), (122) and (131) resume the distribution families and the corresponding parameters for q_1(f), a multivariate Normal distribution, and for q_{2j}(v_{f_j}), j ∈ {1, 2, ..., M} and q_{3i}(v_{ǫ_i}), i ∈ {1, 2, ..., N}, Inverse Gamma distributions. However, the parameters corresponding to the multivariate Normal distribution are expressed via Ṽ_ǫ and Ṽ_f (and by extension all the elements forming the two matrices, ṽ_{ǫ_i}, i ∈ {1, 2, ..., N} and ṽ_{f_j}, j ∈ {1, 2, ..., M}).

Computation of the analytical expressions of Ṽ_ǫ and Ṽ_f. For an Inverse Gamma distribution with shape and scale parameters α and β, IG(x | α, β), the following relation holds:

⟨x^{−1}⟩_{IG(x | α, β)} = \frac{α}{β}   (132)

The proof of the above relation is done by direct computation, using the analytical expression of the Inverse Gamma distribution:

⟨x^{−1}⟩_{IG(x | α, β)} = ∫ x^{−1} \frac{β^α}{Γ(α)} x^{−α−1} \exp\left\{−\frac{β}{x}\right\} dx = \frac{β^α}{Γ(α)} \frac{Γ(α+1)}{β^{α+1}} ∫ \frac{β^{α+1}}{Γ(α+1)} x^{−(α+1)−1} \exp\left\{−\frac{β}{x}\right\} dx = \frac{α}{β} ∫ IG(x | α+1, β) dx = \frac{α}{β}
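The identity (132) is easy to confirm numerically; a short Monte Carlo check (arbitrary α, β) could be:

```python
# Numerical check of Equation (132): E[1/X] = alpha/beta for X ~ IG(alpha, beta).
import numpy as np
from scipy.stats import invgamma

alpha, beta = 3.0, 2.0
x = invgamma.rvs(alpha, scale=beta, size=1_000_000,
                 random_state=np.random.default_rng(3))
print(np.mean(1.0 / x), alpha / beta)   # both ≈ 1.5
```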

Since q_{2j}(v_{f_j}), j ∈ {1, 2, ..., M} and q_{3i}(v_{ǫ_i}), i ∈ {1, 2, ..., N} are Inverse Gamma distributions, with the corresponding parameters α̂_{f_j} and β̂_{f_j}, j ∈ {1, 2, ..., M}, respectively α̂_{ǫ_i} and β̂_{ǫ_i}, i ∈ {1, 2, ..., N}, the expectancies ṽ_{f_j} = ⟨v_{f_j}^{−1}⟩ and ṽ_{ǫ_i} = ⟨v_{ǫ_i}^{−1}⟩ can be expressed via the parameters of the two Inverse Gamma distributions using Equation (132):

ṽ_{f_j} = \frac{α̂_{f_j}}{β̂_{f_j}}; ṽ_{ǫ_i} = \frac{α̂_{ǫ_i}}{β̂_{ǫ_i}}   (133)

Using the notation introduced in (113):

Ṽ_f = diag\left[\frac{α̂_{f_1}}{β̂_{f_1}}, ..., \frac{α̂_{f_j}}{β̂_{f_j}}, ..., \frac{α̂_{f_M}}{β̂_{f_M}}\right] = V̂_f^{−1}; Ṽ_ǫ = diag\left[\frac{α̂_{ǫ_1}}{β̂_{ǫ_1}}, ..., \frac{α̂_{ǫ_i}}{β̂_{ǫ_i}}, ..., \frac{α̂_{ǫ_N}}{β̂_{ǫ_N}}\right] = V̂_ǫ^{−1}   (134)

In Equation (134) other notations are introduced for Ṽ_f and Ṽ_ǫ, namely V̂_f^{−1} and V̂_ǫ^{−1}. Both values were expressed during the modelling via unknown expectancies, but via Equation (134) those values do not contain any more integrals to be computed. Therefore, the new notations represent the final analytical expressions used for expressing the density functions q_i. Using Equation (134) and Equations (115), (122) and (131), the final analytical expressions of the separable distributions q_i are presented in Equations (135a), (135b) and (135c).

q_1(f) = N(f | f̂, Σ̂), with f̂ = (H^T V̂_ǫ^{−1} H + V̂_f^{−1})^{−1} H^T V̂_ǫ^{−1} g, Σ̂ = (H^T V̂_ǫ^{−1} H + V̂_f^{−1})^{−1},   (135a)

q_{2j}(v_{f_j}) = IG(v_{f_j} | α̂_{f_j}, β̂_{f_j}), with α̂_{f_j} = α_f + \frac{1}{2}, β̂_{f_j} = β_f + \frac{f̂_j² + Σ̂_{jj}}{2}, j ∈ {1, 2, ..., M},   (135b)

q_{3i}(v_{ǫ_i}) = IG(v_{ǫ_i} | α̂_{ǫ_i}, β̂_{ǫ_i}), with α̂_{ǫ_i} = α_ǫ + \frac{1}{2}, β̂_{ǫ_i} = β_ǫ + \frac{H_i Σ̂ H_i^T + (g_i − H_i f̂)²}{2}, i ∈ {1, 2, ..., N}.   (135c)

Equation (135a) establishes the dependency between the parameters corresponding to the multivariate Normal distribution q_1(f) and the other parameters involved in the hierarchical model: the mean f̂ and the covariance matrix Σ̂ depend on V̂_ǫ^{−1} and V̂_f^{−1} which, via Equation (134), are defined using {α̂_{f_j}, β̂_{f_j}}, j ∈ {1, 2, ..., M} and {α̂_{ǫ_i}, β̂_{ǫ_i}}, i ∈ {1, 2, ..., N}. The dependency between the parameters of the multivariate Normal distribution q_1(f) and the parameters of the Inverse Gamma distributions q_{2j}(v_{f_j}), j ∈ {1, 2, ..., M} and q_{3i}(v_{ǫ_i}), i ∈ {1, 2, ..., N} is presented in Figure (18).

[Figure 18: Dependency between q_1(f) parameters and q_{2j}(v_{f_j}) and q_{3i}(v_{ǫ_i}) parameters: {α̂_{f_j}, β̂_{f_j}}, {α̂_{ǫ_i}, β̂_{ǫ_i}} → {f̂, Σ̂}.]

Equation (135b) establishes the dependency between the parameters corresponding to the Inverse Gamma distributions q_{2j}(v_{f_j}), j ∈ {1, 2, ..., M} and the other parameters involved in the hierarchical model: the shape and scale parameters {α̂_{f_j}, β̂_{f_j}}, j ∈ {1, 2, ..., M} depend on the mean f̂ and the covariance matrix Σ̂ of the multivariate Normal distribution q_1(f), Figure (19).

[Figure 19: Dependency between q_{2j}(v_{f_j}) parameters and q_1(f) parameters: {f̂, Σ̂} → {α̂_{f_j}, β̂_{f_j}}.]

Equation (135c) establishes the dependency between the parameters corresponding to the Inverse Gamma distributions q_{3i}(v_{ǫ_i}), i ∈ {1, 2, ..., N} and the other parameters involved in the hierarchical model: the shape and scale parameters {α̂_{ǫ_i}, β̂_{ǫ_i}}, i ∈ {1, 2, ..., N} depend on the mean f̂ and the covariance matrix Σ̂ of the multivariate Normal distribution q_1(f), Figure (20).

[Figure 20: Dependency between q_{3i}(v_{ǫ_i}) parameters and q_1(f) parameters: {f̂, Σ̂} → {α̂_{ǫ_i}, β̂_{ǫ_i}}.]

The iterative algorithm obtained via PM estimation is presented in Figure (21).

[Figure 21: Iterative algorithm corresponding to PM estimation via VBA - partial separability, for the Student-t hierarchical model, non-stationary Student-t uncertainties model: (a) update the estimated mean f̂ and covariance matrix Σ̂; (b) update the estimated IG parameters modelling the variances v_{f_j} (α̂_{f_j} = α_f + 1/2 is constant during iterations); (c) update the estimated IG parameters modelling the uncertainties variances v_{ǫ_i} (α̂_{ǫ_i} = α_ǫ + 1/2 is constant during iterations).]
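Put together, Equations (134)-(135c) define a fixed-point iteration that admits a direct transcription; the sketch below mirrors the updates of Figure (21), with placeholder inputs and an identity initialization for Σ̂ chosen purely for illustration.

```python
# Sketch of the VBA (partial separability) iteration, Equations (134)-(135c).
# Compared with JMAP, the v-updates use alpha + 1/2 and include the posterior
# covariance terms Sigma_jj and H_i Sigma H_i^T.
import numpy as np

def vba_student_t(H, g, alpha_f, beta_f, alpha_eps, beta_eps, n_iter=50):
    N, M = H.shape
    f, Sigma = np.zeros(M), np.eye(M)          # illustrative initialization
    for _ in range(n_iter):
        beta_f_hat = beta_f + 0.5 * (f**2 + np.diag(Sigma))            # Eq. (135b)
        beta_e_hat = beta_eps + 0.5 * (np.einsum('ij,jk,ik->i', H, Sigma, H)
                                       + (g - H @ f)**2)               # Eq. (135c)
        iVf = (alpha_f + 0.5) / beta_f_hat     # E[1/v_f_j], Eq. (134)
        iVe = (alpha_eps + 0.5) / beta_e_hat   # E[1/v_eps_i], Eq. (134)
        Sigma = np.linalg.inv(H.T @ (iVe[:, None] * H) + np.diag(iVf))  # Eq. (135a)
        f = Sigma @ (H.T @ (iVe * g))
    return f, Sigma, beta_f_hat, beta_e_hat
```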

4.6.3 Posterior Mean estimation via VBA, full separability

In this subsection, the Posterior Mean (PM) estimation is again considered, but via a fully separable approximation. The posterior distribution is approximated by a fully separable distribution q(f, v_f, v_ǫ), i.e. a supplementary condition is added in Equation (105):

q_1(f) = ∏_{j=1}^M q_{1j}(f_j); q_2(v_f) = ∏_{j=1}^M q_{2j}(v_{f_j}); q_3(v_ǫ) = ∏_{i=1}^N q_{3i}(v_{ǫ_i})   (136)

Algorithm 2 PM via VBA partial sep. - Student-t hierarchical model, non-stationary Student-t uncertainties model
Ensure: INITIALIZATION f̂^(0), Σ̂^(0)   ▷ where f̂^(0) = f̂ via a basic estimation method, or f̂^(0) = 0
function VBA-PM(α_ǫ, β_ǫ, α_f, β_f, g, H, f̂^(0), M, N, NoIter)   ▷ α_ǫ, β_ǫ, α_f, β_f are set in the modelling phase, Eq. (96)
    for n = 0 to NoIter do
        for j = 1 to M do
            β̂_{f_j}^(n) = β_f + ((f̂_j^(n))² + Σ̂_{jj}^(n))/2; α̂_{f_j} = α_f + 1/2   ▷ Eq. (135b)
        end for
        V̂_f^(n) = diag[β̂_{f_j}^(n)/α̂_{f_j}]
        for i = 1 to N do
            β̂_{ǫ_i}^(n) = β_ǫ + (H_i Σ̂^(n) H_i^T + (g_i − H_i f̂^(n))²)/2; α̂_{ǫ_i} = α_ǫ + 1/2   ▷ Eq. (135c)
        end for
        V̂_ǫ^(n) = diag[β̂_{ǫ_i}^(n)/α̂_{ǫ_i}]
        Σ̂^(n+1) = (H^T (V̂_ǫ^(n))^{−1} H + (V̂_f^(n))^{−1})^{−1}   ▷ Eq. (135a)
        f̂^(n+1) = Σ̂^(n+1) H^T (V̂_ǫ^(n))^{−1} g
    end for
    return f̂^(n+1), Σ̂^(n+1), {α̂_{f_j}, β̂_{f_j}^(n)}, {α̂_{ǫ_i}, β̂_{ǫ_i}^(n)}, n = NoIter
end function

As in Subsection (4.6.2), the approximation is done by minimizing the Kullback-Leibler divergence, Equation (106), via alternate optimization, resulting in the following proportionalities, Equations (137a), (137b) and (137c),

q_{1j}(f_j) ∝ \exp⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_{1−j}(f_j) q_2(v_f) q_3(v_ǫ)}, j ∈ {1, 2, ..., M},   (137a)

q_{2j}(v_{f_j}) ∝ \exp⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_{2−j}(v_{f_j}) q_3(v_ǫ)}, j ∈ {1, 2, ..., M},   (137b)

q_{3i}(v_{ǫ_i}) ∝ \exp⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_2(v_f) q_{3−i}(v_{ǫ_i})}, i ∈ {1, 2, ..., N},   (137c)

using the notations introduced in Equation (109), Equation (110) and Equation (138).

q_{1−j}(f_j) = ∏_{k=1, k≠j}^M q_{1k}(f_k)   (138)

The analytical expression of the logarithm of the posterior distribution, ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ), is obtained in Equation (111).

Computation of the analytical expression of q_{1j}(f_j). The proportionality relation corresponding to q_{1j}(f_j) is established in Equation (137a). In the expression of ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) all the terms free of f_j can be regarded as constants:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_{1−j}(f_j) q_2(v_f) q_3(v_ǫ)} = −\frac{1}{2}⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_{1−j}(f_j) q_3(v_ǫ)} − \frac{1}{2}⟨‖V_f^{−1/2} f‖²⟩_{q_{1−j}(f_j) q_2(v_f)}.   (139)

Using Equation (113), the integral from Equation (139) becomes:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_{1−j}(f_j) q_2(v_f) q_3(v_ǫ)} = −\frac{1}{2}⟨‖Ṽ_ǫ^{1/2}(g − Hf)‖²⟩_{q_{1−j}(f_j)} − \frac{1}{2}⟨‖Ṽ_f^{1/2} f‖²⟩_{q_{1−j}(f_j)}.   (140)

Considering all the terms free of f_j as constants, the first norm can be written:

‖Ṽ_ǫ^{1/2}(g − Hf)‖² = C + ‖Ṽ_ǫ^{1/2} H^j‖² f_j² − 2 H^{jT} Ṽ_ǫ (g − H^{−j} f^{−j}) f_j   (141)

where H^j represents the column j of the matrix H, H^{−j} represents the matrix H except the column j, and f^{−j} represents the vector f except the element f_j. Introducing the notation

f̃_j = ∫ f_j q_{1j}(f_j) df_j; f̃^{−j} = [f̃_1 ... f̃_{j−1} f̃_{j+1} ... f̃_M]^T   (142)

the expectancy of the first norm becomes:

⟨‖Ṽ_ǫ^{1/2}(g − Hf)‖²⟩_{q_{1−j}(f_j)} = C + ‖Ṽ_ǫ^{1/2} H^j‖² f_j² − 2 H^{jT} Ṽ_ǫ (g − H^{−j} f̃^{−j}) f_j   (143)

Considering all the terms free of f_j as constants, the expectancy of the second norm becomes:

⟨‖Ṽ_f^{1/2} f‖²⟩_{q_{1−j}(f_j)} = C + ṽ_{f_j} f_j²   (144)

From Equations (137a), (140), (143) and (144) the proportionality for q_{1j}(f_j) becomes:

q_{1j}(f_j) ∝ \exp\left\{−\frac{1}{2}\left[(‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}) f_j² − 2 H^{jT} Ṽ_ǫ (g − H^{−j} f̃^{−j}) f_j\right]\right\}   (145)

Considering the criterion J(f_j) = (‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}) f_j² − 2 H^{jT} Ṽ_ǫ (g − H^{−j} f̃^{−j}) f_j, which is quadratic, we conclude that q_{1j}(f_j) is a Normal distribution. For computing the mean of the Normal distribution, it is sufficient to compute the solution that minimizes the criterion J(f_j):

\frac{∂J(f_j)}{∂f_j} = 0 ⇔ f_j = \frac{H^{jT} Ṽ_ǫ (g − H^{−j} f̃^{−j})}{‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}}.   (146)

The variance can be obtained by identification. The analytical expressions for the mean and the variance corresponding to the Normal distributions q_{1j}(f_j) are presented in Equation (147):

q_{1j}(f_j) = N(f_j | f̃_j, var_j), with f̃_j = \frac{H^{jT} Ṽ_ǫ (g − H^{−j} f̃^{−j})}{‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}}, var_j = \frac{1}{‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}}, j ∈ {1, 2, ..., M}   (147)

Computation of the analytical expression of q_{2j}(v_{f_j}). The proportionality relation corresponding to q_{2j}(v_{f_j}) established in Equation (137b) refers to v_{f_j}, so in the expression of ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) all the terms free of v_{f_j} can be regarded as constants,

ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) = C − \frac{1}{2} ln v_{f_j} − \frac{1}{2}⟨f_j²⟩_{q_{1j}(f_j)} v_{f_j}^{−1} − (α_f + 1) ln v_{f_j} − β_f v_{f_j}^{−1},   (148)

so the integral of the logarithm becomes:

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_{2−j}(v_{f_j}) q_3(v_ǫ)} = C − (α_f + \frac{3}{2}) ln v_{f_j} − (β_f + \frac{f̃_j² + var_j}{2}) v_{f_j}^{−1}.   (149)

Equation (149) leads to the conclusion that q_{2j}(v_{f_j}) is an Inverse Gamma distribution. Equation (150) presents the analytical expressions for the shape and scale parameters corresponding to the Inverse Gamma distribution:

q_{2j}(v_{f_j}) = IG(v_{f_j} | α̂_{f_j}, β̂_{f_j}), with α̂_{f_j} = α_f + \frac{1}{2}, β̂_{f_j} = β_f + \frac{f̃_j² + var_j}{2}, j ∈ {1, 2, ..., M}   (150)

Computation of the analytical expression of q_{3i}(v_{ǫ_i}). The proportionality relation corresponding to q_{3i}(v_{ǫ_i}) established in Equation (137c) refers to v_{ǫ_i}, so in the expression of ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) all the terms free of v_{ǫ_i} can be regarded as constants:

ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ) = C − (α_ǫ + \frac{3}{2}) ln v_{ǫ_i} − (β_ǫ + \frac{1}{2}(g_i − H_i f)²) v_{ǫ_i}^{−1}.   (151)

Introducing the notation

f̃ = [f̃_1 ... f̃_j ... f̃_M]^T = ⟨f⟩_{q_1(f)} =: f̂; Σ̂ = diag[var_j]   (152)

the expectancy of the logarithm becomes

⟨ln p(f, v_f, v_ǫ | g, α_f, β_f, α_ǫ, β_ǫ)⟩_{q_1(f) q_2(v_f) q_{3−i}(v_{ǫ_i})} = C − (α_ǫ + \frac{3}{2}) ln v_{ǫ_i} − (β_ǫ + \frac{1}{2}(H_i Σ̂ H_i^T + (g_i − H_i f̂)²)) v_{ǫ_i}^{−1},   (153)

so the proportionality relation for q_{3i}(v_{ǫ_i}) from Equation (137c) can be written:

q_{3i}(v_{ǫ_i}) ∝ v_{ǫ_i}^{−(α_ǫ + 3/2)} \exp\left\{−(β_ǫ + \frac{1}{2}(H_i Σ̂ H_i^T + (g_i − H_i f̂)²)) v_{ǫ_i}^{−1}\right\}   (154)

Equation (154) shows that q_{3i}(v_{ǫ_i}) are Inverse Gamma distributions. The analytical expressions of the corresponding parameters are presented in Equation (155):

q_{3i}(v_{ǫ_i}) = IG(v_{ǫ_i} | α̂_{ǫ_i}, β̂_{ǫ_i}), with α̂_{ǫ_i} = α_ǫ + \frac{1}{2}, β̂_{ǫ_i} = β_ǫ + \frac{H_i Σ̂ H_i^T + (g_i − H_i f̂)²}{2}, i ∈ {1, 2, ..., N}   (155)

Since q_{2j}(v_{f_j}), j ∈ {1, 2, ..., M} and q_{3i}(v_{ǫ_i}), i ∈ {1, 2, ..., N} are Inverse Gamma distributions, it is easy to obtain analytical expressions for Ṽ_ǫ, defined in Equation (113), and ṽ_{f_j}, j ∈ {1, 2, ..., M}, obtaining the same expressions as in Equation (134). Using Equation (134) and Equations (147), (150) and (155), the final analytical expressions of the separable distributions q_i are presented in Equations (156a), (156b) and (156c):

q_{1j}(f_j) = N(f_j | f̃_j, var_j), with f̃_j = \frac{H^{jT} Ṽ_ǫ (g − H^{−j} f̃^{−j})}{‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}}, var_j = \frac{1}{‖Ṽ_ǫ^{1/2} H^j‖² + ṽ_{f_j}}, j ∈ {1, 2, ..., M},   (156a)

q_{2j}(v_{f_j}) = IG(v_{f_j} | α̂_{f_j}, β̂_{f_j}), with α̂_{f_j} = α_f + \frac{1}{2}, β̂_{f_j} = β_f + \frac{f̃_j² + var_j}{2}, j ∈ {1, 2, ..., M},   (156b)

q_{3i}(v_{ǫ_i}) = IG(v_{ǫ_i} | α̂_{ǫ_i}, β̂_{ǫ_i}), with α̂_{ǫ_i} = α_ǫ + \frac{1}{2}, β̂_{ǫ_i} = β_ǫ + \frac{H_i Σ̂ H_i^T + (g_i − H_i f̂)²}{2}, i ∈ {1, 2, ..., N}.   (156c)

Equations (156a), (156b) and (156c) establish dependencies between the parameters of the distributions, very similar to the ones presented in Figures (18), (19) and (20). The iterative algorithm obtained via PM estimation with full separability is presented in Figure (22).
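Under full separability the M × M matrix inversion of Equation (135a) disappears: each f_j is refreshed by the scalar formula of Equation (147). A sketch of one coordinate sweep (placeholder inputs; iVe and iVf stand for the expectancies ṽ_ǫ and ṽ_f) could read:

```python
# Sketch of one coordinate sweep of the fully separable VBA update, Equation (147):
# each f_j is refreshed by a scalar operation, avoiding the MxM matrix inversion.
import numpy as np

def vba_full_sep_sweep(H, g, f, var, iVe, iVf):
    """f, var: current means/variances of q_1j; iVe, iVf: E[1/v_eps], E[1/v_f]."""
    for j in range(f.size):
        hj = H[:, j]                          # column j of H
        r = g - H @ f + hj * f[j]             # g - H^{-j} f^{-j}
        denom = hj @ (iVe * hj) + iVf[j]      # ||Ve~^{1/2} H^j||^2 + v~_f_j
        f[j] = (hj @ (iVe * r)) / denom       # mean f~_j, Eq. (147)
        var[j] = 1.0 / denom                  # variance var_j, Eq. (147)
    return f, var
```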

4.7 IS: Student-t hierarchical model: non-stationary Student-t uncertainties model, unknown uncertainties variances

• the hierarchical model is using as a prior the Student-t distribution;

• the Student-t prior distribution is expressed via the StPM, Equation (7), considering the variance v_f as unknown;

• the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ;

• for the uncertainties vector ǫ a non-stationary Student-t uncertainties model is proposed, i.e. a multivariate Student-t distribution expressed via the StPM is used under the following assumption:

a) the variance vector v_ǫ is unknown;

[Figure 22: Iterative algorithm corresponding to PM estimation via VBA - full separability, for the Student-t hierarchical model, non-stationary Student-t uncertainties model: (a) update the estimated means f̃_j and the corresponding variances var_j; (b) update the estimated IG parameters modelling the variances v_{f_j}; (c) update the estimated IG parameters modelling the uncertainties variances v_{ǫ_i}.]

Student-t nsStL UNK:
Likelihood: nsStL: p(g | f, v_ǫ) = N(g | Hf, V_ǫ) ∝ ∏_{i=1}^N v_{ǫ_i}^{−1/2} \exp{−\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖²},
p(v_ǫ | α_ǫ, β_ǫ) = ∏_{i=1}^N IG(v_{ǫ_i} | α_ǫ, β_ǫ) ∝ ∏_{i=1}^N v_{ǫ_i}^{−α_ǫ−1} \exp{−∑_{i=1}^N β_ǫ/v_{ǫ_i}},
V_ǫ = diag[v_ǫ], v_ǫ = [..., v_{ǫ_i}, ...],
Likelihood: nsStL: p(f | z, v_ξ) = N(f | Dz, V_ξ) ∝ det(V_ξ)^{−1/2} \exp{−\frac{1}{2}‖V_ξ^{−1/2}(f − Dz)‖²},
p(v_ξ | α_ξ, β_ξ) = ∏_{j=1}^M IG(v_{ξ_j} | α_ξ, β_ξ) ∝ ∏_{j=1}^M v_{ξ_j}^{−α_ξ−1} \exp{−∑_{j=1}^M β_ξ/v_{ξ_j}},
V_ξ = diag[v_ξ], v_ξ = [..., v_{ξ_j}, ...],
Prior: StPM: p(z | 0, v_z) = N(z | 0, V_z) ∝ det(V_z)^{−1/2} \exp{−\frac{1}{2}‖V_z^{−1/2} z‖²},
p(v_z | α_z, β_z) = ∏_{j=1}^M IG(v_{z_j} | α_z, β_z) ∝ ∏_{j=1}^M v_{z_j}^{−α_z−1} \exp{−∑_{j=1}^M β_z/v_{z_j}},
V_z = diag[v_z], v_z = [..., v_{z_j}, ...].   (157)

The hierarchical model shown in Figure (3), which is built over the linear forward model, Equation (1) and Equation (2), using as a prior for z a Student-t distribution, expressed via the Student-t Prior Model (StPM), Equation (7), and modelling the uncertainties of the model, ǫ and ξ, using the non-stationary Student-t Uncertainties Model (nsStUM), Equation (81), is presented in Equation (157). The posterior distribution is obtained via the Bayes rule, Equation (158):

p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z) ∝ p(g | f, v_ǫ) p(v_ǫ | α_ǫ, β_ǫ) p(f | z, v_ξ) p(v_ξ | α_ξ, β_ξ) p(z | 0, v_z) p(v_z | α_z, β_z)
∝ ∏_{i=1}^N v_{ǫ_i}^{−1/2} \exp{−\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖²} ∏_{i=1}^N v_{ǫ_i}^{−α_ǫ−1} \exp{−∑_{i=1}^N β_ǫ/v_{ǫ_i}}
· det(V_ξ)^{−1/2} \exp{−\frac{1}{2}‖V_ξ^{−1/2}(f − Dz)‖²} ∏_{j=1}^M v_{ξ_j}^{−α_ξ−1} \exp{−∑_{j=1}^M β_ξ/v_{ξ_j}}
· det(V_z)^{−1/2} \exp{−\frac{1}{2}‖V_z^{−1/2} z‖²} ∏_{j=1}^M v_{z_j}^{−α_z−1} \exp{−∑_{j=1}^M β_z/v_{z_j}}   (158)

The goal is to estimate the unknowns of the hierarchical model, namely f, the main unknown of the linear forward model expressed in Equation (1), z, the main unknown of Equation (2), supposed sparse and consequently modelled using the Student-t distribution, and the three variances appearing in the hierarchical model, Equation (157): the variance corresponding to the sparse structure z, namely v_z, and the two variances corresponding to the uncertainties of the model, ǫ and ξ, namely v_ǫ and v_ξ.

4.7.1 Joint MAP estimation

First, the Joint Maximum A Posteriori (JMAP) estimation is considered: the unknowns are estimated on the basis of the available data g, by maximizing the posterior distribution:

(f̂, ẑ, v̂_ξ, v̂_ǫ, v̂_z) = \arg\max_{(f, z, v_ξ, v_ǫ, v_z)} p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z) = \arg\min_{(f, z, v_ξ, v_ǫ, v_z)} L(f, z, v_ξ, v_ǫ, v_z),   (159)

where for the second equality the criterion L(f, z, v_ξ, v_ǫ, v_z) is defined as:

L(f, z, v_ξ, v_ǫ, v_z) = −ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z)   (160)

The MAP estimation corresponds to the solution minimizing the criterion L(f, z, v_ξ, v_ǫ, v_z). From the analytical expression of the posterior distribution, Equation (158), and the definition of the criterion L, Equation (160), we obtain:

L(f, z, v_ξ, v_ǫ, v_z) = −ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z)
= \frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖² + (α_ǫ + \frac{3}{2}) ∑_{i=1}^N ln v_{ǫ_i} + ∑_{i=1}^N \frac{β_ǫ}{v_{ǫ_i}}
+ \frac{1}{2}‖V_ξ^{−1/2}(f − Dz)‖² + (α_ξ + \frac{3}{2}) ∑_{j=1}^M ln v_{ξ_j} + ∑_{j=1}^M \frac{β_ξ}{v_{ξ_j}}
+ \frac{1}{2}‖V_z^{−1/2} z‖² + (α_z + \frac{3}{2}) ∑_{j=1}^M ln v_{z_j} + ∑_{j=1}^M \frac{β_z}{v_{z_j}}   (161)

One of the simplest optimisation algorithms that can be used is an alternate optimization of the criterion L(f, z, v_ξ, v_ǫ, v_z) with respect to each unknown:

• With respect to f:

∂L(f, z, v_ξ, v_ǫ, v_z)/∂f = 0 ⇔ ∂/∂f (‖V_ǫ^{−1/2}(g − Hf)‖² + ‖V_ξ^{−1/2}(f − Dz)‖²) = 0
⇔ −H^T V_ǫ^{−1}(g − Hf) + V_ξ^{−1}(f − Dz) = 0
⇔ (H^T V_ǫ^{−1} H + V_ξ^{−1}) f = H^T V_ǫ^{−1} g + V_ξ^{−1} Dz
⇒ f̂ = (H^T V_ǫ^{−1} H + V_ξ^{−1})^{−1} (H^T V_ǫ^{−1} g + V_ξ^{−1} Dz)

• With respect to z:

∂L(f, z, v_ξ, v_ǫ, v_z)/∂z = 0 ⇔ ∂/∂z (‖V_ξ^{−1/2}(f − Dz)‖² + ‖V_z^{−1/2} z‖²) = 0
⇔ −D^T V_ξ^{−1}(f − Dz) + V_z^{−1} z = 0
⇔ (D^T V_ξ^{−1} D + V_z^{−1}) z = D^T V_ξ^{−1} f
⇒ ẑ = (D^T V_ξ^{−1} D + V_z^{−1})^{−1} D^T V_ξ^{−1} f

• With respect to v_{ǫ_i}, i ∈ {1, 2, ..., N}:

First, we develop the norm ‖V_ǫ^{−1/2}(g − Hf)‖²:

‖V_ǫ^{−1/2}(g − Hf)‖² = g^T V_ǫ^{−1} g − 2 g^T V_ǫ^{−1} Hf + f^T H^T V_ǫ^{−1} Hf = ∑_{i=1}^N v_{ǫ_i}^{−1} g_i² − 2 ∑_{i=1}^N v_{ǫ_i}^{−1} g_i H_i f + ∑_{i=1}^N v_{ǫ_i}^{−1} f^T H_i^T H_i f,

where H_i denotes the line i of the matrix H, i ∈ {1, 2, ..., N}, i.e. H_i = [h_{i1}, h_{i2}, ..., h_{iM}].

∂L(f, z, v_ξ, v_ǫ, v_z)/∂v_{ǫ_i} = 0 ⇔ (α_ǫ + \frac{3}{2}) v_{ǫ_i} − (β_ǫ + \frac{1}{2}(g_i − H_i f)²) = 0 ⇒ v̂_{ǫ_i} = \frac{β_ǫ + \frac{1}{2}(g_i − H_i f)²}{α_ǫ + \frac{3}{2}}

• With respect to v_{ξ_j}, j ∈ {1, 2, ..., M}:

First, we develop the norm ‖V_ξ^{−1/2}(f − Dz)‖²:

‖V_ξ^{−1/2}(f − Dz)‖² = f^T V_ξ^{−1} f − 2 f^T V_ξ^{−1} Dz + z^T D^T V_ξ^{−1} Dz = ∑_{j=1}^M v_{ξ_j}^{−1} f_j² − 2 ∑_{j=1}^M v_{ξ_j}^{−1} f_j D_j z + ∑_{j=1}^M v_{ξ_j}^{−1} z^T D_j^T D_j z,

where D_j denotes the line j of the matrix D, j ∈ {1, 2, ..., M}, i.e. D_j = [d_{j1}, d_{j2}, ..., d_{jM}].

∂L(f, z, v_ξ, v_ǫ, v_z)/∂v_{ξ_j} = 0 ⇔ (α_ξ + \frac{3}{2}) v_{ξ_j} − (β_ξ + \frac{1}{2}(f_j − D_j z)²) = 0 ⇒ v̂_{ξ_j} = \frac{β_ξ + \frac{1}{2}(f_j − D_j z)²}{α_ξ + \frac{3}{2}}

• With respect to v_{z_j}, j ∈ {1, 2, ..., M}:

∂L(f, z, v_ξ, v_ǫ, v_z)/∂v_{z_j} = 0 ⇔ (α_z + \frac{3}{2}) v_{z_j} − (β_z + \frac{1}{2} z_j²) = 0 ⇒ v̂_{z_j} = \frac{β_z + \frac{1}{2} z_j²}{α_z + \frac{3}{2}}

The iterative algorithm obtained via JMAP estimation is presented in Figure (23).

4.7.2 Posterior Mean estimation via VBA, partial separability

In this subsection, the Posterior Mean (PM) estimation is considered. The Joint MAP computes the mode of the posterior distribution; the PM computes the mean of the posterior distribution. One of the advantages of this estimator is that it minimizes the Mean Square Error (MSE). Computing the posterior mean of any unknown requires high-

[Figure 23: Indirect sparsity (via z) - Iterative algorithm corresponding to Joint MAP estimation for the Student-t hierarchical model, non-stationary Student-t uncertainties model: (a) update the estimate f̂; (b) update the estimate ẑ; (c) update the estimated variances v̂_{ξ_j}; (d) update the estimated uncertainties variances v̂_{ǫ_i}; (e) update the estimated variances v̂_{z_j}.]

dimensional integration. For example, the mean corresponding to f can be computed from the posterior distribution using Equation (162),

E_p{f} = ∫∫∫∫∫ f p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z) df dz dv_ξ dv_ǫ dv_z.   (162)

Algorithm 3 Joint MAP - Student-t hierarchical model, non-stationary Student-t uncertainties model
Ensure: INITIALIZATION f^(0), z^(0)
function JMAP(α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z, g, H, D, f^(0), z^(0), M, N, NoIter)
    for n = 0 to NoIter do
        for j = 1 to M do
            v̂_{ξ_j}^(n) = (β_ξ + (f_j^(n) − D_j z^(n))²/2) / (α_ξ + 3/2)
        end for
        for i = 1 to N do
            v̂_{ǫ_i}^(n) = (β_ǫ + (g_i − H_i f^(n))²/2) / (α_ǫ + 3/2)
        end for
        for j = 1 to M do
            v̂_{z_j}^(n) = (β_z + (z_j^(n))²/2) / (α_z + 3/2)
        end for
        f^(n+1) = (H^T diag[v̂_{ǫ_i}^(n)]^{−1} H + diag[v̂_{ξ_j}^(n)]^{−1})^{−1} (H^T diag[v̂_{ǫ_i}^(n)]^{−1} g + diag[v̂_{ξ_j}^(n)]^{−1} D z^(n))
        z^(n+1) = (D^T diag[v̂_{ξ_j}^(n)]^{−1} D + diag[v̂_{z_j}^(n)]^{−1})^{−1} D^T diag[v̂_{ξ_j}^(n)]^{−1} f^(n+1)
    end for
    return f^(n+1), z^(n+1), v̂_{ξ_j}^(n), v̂_{ǫ_i}^(n), v̂_{z_j}^(n), n = NoIter
end function

In general, these computations are not easy. One way to obtain approximate estimates is to approximate p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z) by a separable one, q(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z) = q_1(f) q_2(z) q_3(v_ξ) q_4(v_ǫ) q_5(v_z), then to compute the posterior means using the separability. The mean corresponding to f is computed using the corresponding separable distribution q_1(f), Equation (163),

E_{q_1}{f} = ∫ f q_1(f) df.   (163)

Eq x = Ep x , (164) { } { } for all the unknowns of the model, a great amount of computational cost is gained. In particular, for the proposed hier- archical model, Equation (157), the posterior distribution, Equation (158), is not a separable one, making the analytical computations of the PM very difficult. One way the compute the PM in this case is to first approximate the poste-

rior law p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) with a separable law q(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz), | | Equation (165),

p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) q(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) | ≈ | (165) = q1(f) q2(z) q3(vξ) q4(vǫ) q5(vz)

54 where the notations from Equation (166) are used

M N M

q3(vξ)= q2j (vξj ), ; q4(vǫ)= q3i(vǫi ); q5(vz)= q3i(vzj ) (166) j=1 i=1 j=1 Y Y Y by minimizing of the Kullback-Leibler divergence, defined as:

KL (q(f, z, vǫ, vξ, vz g, αǫ, βǫ, αξ, βξ , αz, βz) : p(f, z, vǫ, vξ, vz g, αǫ, βǫ, αξ, βξ, αz, βz)) = | | q(f, z, vǫ, vξ, vz g, αǫ, βǫ, αξ, βξ, αz, βz) = . . . q(f, z, vǫ, vξ, vz g, αǫ, βǫ, αξ , βξ, αz, βz) ln | dfdzdvξdvǫdvz, ZZ Z | p(f, z, vǫ, vξ, vz g, αǫ, βǫ, αξ, βξ, αz, βz ) | (167) where the notations from Equation (168) are used

M N M

dvξ = dvξj ; dvǫ = dvǫi ; dvz = dvzj . (168) j=1 i=1 j=1 Y Y Y Equation (166) is selecting a partial separability for the approximated posterior distribution q(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) in the sense that a total separability is imposed for the distributions | corresponding to the three variances appearing in the hierarchical model, q3 (vξ), q4 (vǫ) and q5 (vz) but not for the distribution corresponding to f and z. Evidently, a full separability can be imposed, by adding the supplementary M M conditions q1(f) = j=1 q1j (f j ) and q3(z) = j=1 q2j (zj ) in Equation (166). This case is considered in Subsection (4.7.3). The minimization can be done via alternate optimization resulting the following proportionalities from Equations (169a),(Q169b),(169c),(169d)and (169eQ ),

q_1(f) ∝ \exp⟨ln p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z)⟩_{q_2(z) q_3(v_ξ) q_4(v_ǫ) q_5(v_z)},   (169a)

q_2(z) ∝ \exp⟨ln p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z)⟩_{q_1(f) q_3(v_ξ) q_4(v_ǫ) q_5(v_z)},   (169b)

q_{3j}(v_{ξ_j}) ∝ \exp⟨ln p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z)⟩_{q_1(f) q_2(z) q_{3−j}(v_{ξ_j}) q_4(v_ǫ) q_5(v_z)}, j ∈ {1, 2, ..., M},   (169c)

q_{4i}(v_{ǫ_i}) ∝ \exp⟨ln p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z)⟩_{q_1(f) q_2(z) q_3(v_ξ) q_{4−i}(v_{ǫ_i}) q_5(v_z)}, i ∈ {1, 2, ..., N},   (169d)

q_{5j}(v_{z_j}) ∝ \exp⟨ln p(f, z, v_ǫ, v_ξ, v_z | g, α_ǫ, β_ǫ, α_ξ, β_ξ, α_z, β_z)⟩_{q_1(f) q_2(z) q_3(v_ξ) q_4(v_ǫ) q_{5−j}(v_{z_j})}, j ∈ {1, 2, ..., M},   (169e)

using the notations:

q_{3−j}(v_{ξ_j}) = ∏_{k=1, k≠j}^M q_{3k}(v_{ξ_k}); q_{4−i}(v_{ǫ_i}) = ∏_{k=1, k≠i}^N q_{4k}(v_{ǫ_k}); q_{5−j}(v_{z_j}) = ∏_{k=1, k≠j}^M q_{5k}(v_{z_k})   (170)

and also

⟨u(x)⟩_{v(y)} = ∫ u(x) v(y) dy.   (171)

Via Equation (160) and Equation (161), the analytical expression of the logarithm of the posterior distribution is obtained, Equation (172):

ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z) = −\frac{1}{2}‖V_ǫ^{−1/2}(g − Hf)‖² − (α_ǫ + \frac{3}{2}) ∑_{i=1}^N ln v_{ǫ_i} − ∑_{i=1}^N \frac{β_ǫ}{v_{ǫ_i}}
− \frac{1}{2}‖V_ξ^{−1/2}(f − Dz)‖² − (α_ξ + \frac{3}{2}) ∑_{j=1}^M ln v_{ξ_j} − ∑_{j=1}^M \frac{β_ξ}{v_{ξ_j}}
− \frac{1}{2}‖V_z^{−1/2} z‖² − (α_z + \frac{3}{2}) ∑_{j=1}^M ln v_{z_j} − ∑_{j=1}^M \frac{β_z}{v_{z_j}}   (172)

Computation of the analytical expression of q_1(f) (first part). The proportionality relation corresponding to q_1(f) is established in Equation (169a). In the expression of ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z) all the terms free of f can be regarded as constants. Via Equation (172) the integral defined in Equation (171) becomes:

⟨ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z)⟩_{q_2(z) q_3(v_ξ) q_4(v_ǫ)} = −\frac{1}{2}⟨‖V_ǫ^{−1/2}(g − Hf)‖²⟩_{q_4(v_ǫ)} − \frac{1}{2}⟨‖V_ξ^{−1/2}(f − Dz)‖²⟩_{q_2(z) q_3(v_ξ)}.   (173)

Introducing the notations:

ṽ_{ξ_j} = ⟨v_{ξ_j}^{−1}⟩_{q_{3j}(v_{ξ_j})}; ṽ_ξ = [ṽ_{ξ_1}, ..., ṽ_{ξ_j}, ..., ṽ_{ξ_M}]^T; Ṽ_ξ = diag[ṽ_ξ]
ṽ_{ǫ_i} = ⟨v_{ǫ_i}^{−1}⟩_{q_{4i}(v_{ǫ_i})}; ṽ_ǫ = [ṽ_{ǫ_1}, ..., ṽ_{ǫ_i}, ..., ṽ_{ǫ_N}]^T; Ṽ_ǫ = diag[ṽ_ǫ]   (174)

the integral from Equation (173) becomes:

⟨ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z)⟩_{q_2(z) q_3(v_ξ) q_4(v_ǫ)} = −\frac{1}{2}‖Ṽ_ǫ^{1/2}(g − Hf)‖² − \frac{1}{2}⟨‖Ṽ_ξ^{1/2}(f − Dz)‖²⟩_{q_2(z)}.   (175)

Evidently, ⟨ln p(f, z, v_ξ, v_ǫ, v_z | g, α_ξ, β_ξ, α_ǫ, β_ǫ, α_z, β_z)⟩_{q_2(z) q_3(v_ξ) q_4(v_ǫ)} is a quadratic form with respect to f. The proportionality from Equation (169a) leads to the following corollary:

Corollary 4.0.1. q1 (f) is a multivariate Normal distribution.

Computation of the analytical expression of q2(z). The proportionality relation corresponding to q2(z) is estab- lished in Equation (169b). In the expression of p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the terms free of z can be regarded as constants. Via Equation (172) the integral defined| in Equation (171) becomes: hi 1 1 − 2 2 p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = V ξ (f Dz) | q (f ) q (v ) q (v ) − 2 k − k q (f )q (v )   1 3 ξ 5 z   1 3 ξ (176) 1 1 − 2 2 V z z . − 2 k k  q5(vz)

56 Using the notations introduced in Equation (174) and introducing the notations:

−1 T vzj = vzj ; vz = vz1 ... vzj ... vzM ; V z = diag [vz] (177) q v   5j ( zj )   e the integral from Equatione (176) becomes: e e e e e

1 1 2 2 p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = V (f Dz) | − 2 k ξ − k  q2(z) q3(vξ) q4(vǫ)  q1(f ) (178) 1 1 e2 2 V z z . − 2k k The norm from the first term in Equation (178) can be developed as it follows: e

1 1 1 2 2 T T T T T 2 2 T T 2 2 V (f Dz) = f V ξf 2z D V ξf + z D V ξDz = V f 2z D V ξf + V Dz , (179) k ξ − k − k ξ k − k ξ k so Equatione (178) can be developede as it follows:e e e e e

1 1 1 2 2 2 2 T T 2 2 V (f Dz) = V f 2z D V ξ f + V Dz . (180) k ξ − k k ξ k − k ξ k  q1(f )  q1(f )  q1(f )

e e 1 e e Using Corollary (4.0.1), we can easily compute V 2 f 2 and f . For the multivariate Normal distribu- ξ q1(f ) k k q1(f ) h i tion f we will denote f the corresponding meanD and ΣE , obtaining: q1( ) e f

1 1 b 2 2b 2 2 Σ f = f ; V ξ f = V ξ f + Tr V ξ f (181) q (f ) k k q (f ) k k   1   1 h i From Equations (178), (180),(181) web obtain: e e b e b

1 1 1 2 2 Σ T T p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = V ξ f + Tr V ξ f + z D V ξf | q (z) q (v ) q (v ) − 2k k −2   2 3 ξ 4 ǫ h i 1 1 1 e 2 b 2 1 2e b2 e b V Dz V z z − 2k ξ k − 2k k (182) e e

Evidently, p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) . The proportionality from Equa- |  q2(z) q3(vξ) q4(vǫ) tion (169b) leads to the following corollary:

Corollary 4.0.2. q2 (z) is a multivariate Normal distribution. Minimizing the criterion

1 1 1 1 2 2 1 T T 1 2 2 1 2 2 J(z)= V f + Tr V ξΣ + z D V ξf V Dz V z z , (183) −2k ξ k −2 f − 2k ξ k − 2k k h i leads to the analytical expressione b of the correspondinge b mean, denotede bz: e e

1 1 ∂J(z) ∂ T T 1 ∂ 2 2 1 ∂ 2 2 T T =0 f V Dz V Dz V z z b=0 D V ξf D V ξDz V zz =0 ∂z ⇔ ∂z ξ − 2 ∂z k ξ k − 2 ∂z k k ⇔ − − −1 T T T T Db VeξD + V z z = De V ξf z = De V ξD + V z De bV ξf e e ⇔ ⇒     (184) e e e b b e e e b 57 The variance can be obtained by identification:

−1 T Σz = D V ξD + V z (185)   Finally, we conclude that q2(z) is a multivariateb Normale distributione with the following parameters:

−1 T T z = D V ξD + V z D V ξf, q2(z)= z z, Σz ,  (186) N |   −1  Te e e b    Σb z = D V ξD + V z . b b     b e e We note that both the expressions of the mean z and variance Σz depend on expectancies corresponding to the three variances of the hierarchical model. b b Computation of the analytical expression of q1(f) (second part). We come back for computing the parameters of the multivariate Normal distribution q1(f). Using the fact that q2(z) was proved a multivariate Normal distribution, the norm from the latter term in Equation (175) can be computed. The norm was developed in Equation (179) and the computation of the integral uses the particular form of q2(z):

1 1 1 V 2 (f Dz) 2 = V 2 f 2 2f T V T D z + V 2 Dz 2 k ξ − k k ξ k − ξ k ξ k  q2(z)  q2(z)  q2(z) (187) 1 1 2 2 T 2 2 T e = Ve f 2zDe V ξf + V Dz +eTr D V ξDΣz . k ξ k − k ξ k h i Minimizing the criterion e b e e b e b

1 1 1 1 2 2 1 2 2 T 1 2 2 1 T J(f)= V ǫ (g Hf) V f + zD V ξf V Dz Tr D V ξDΣz (188) −2k − k − 2k ξ k − 2k ξ k − 2 h i leads to the analyticale expression of the correspondinge meab n,e denoted z:e b e b

1 1 ∂J(f) 1 ∂ 2 2 1 ∂ 2 2 ∂ T T =0 V ǫ (g Hf) V f + z bD V ξf =0 ∂f ⇔−2 ∂f k − k − 2 ∂f k ξ k ∂f T T T H V ǫg e H V ǫHf V ξf + Veξ Dz =0 e ⇔ − − b −1 T T T T T T H V ǫH + V ξ f = V Dz + H V ǫg f = H V ǫH + V ξ V Dz + H V ǫg ⇔ e e ξe e b ⇒ ξ      (189) e e e b e b e e e b e The variance can be obtained by identification:

−1 T Σf = H V ǫH + V ξ (190)   Finally, we conclude that q1(f) is a multivariateb Normale distributione with the following parameters:

−1 T T T f = H V ǫH + V ξ V ξ Dz + H V ǫg , q1(f)= f f, Σf ,  (191) N |   −1   b T    Σf = H eV ǫH +eV ξ .e b e b b     b e e We note that both the expressions of the mean f and variance Σf depend on expectancies corresponding to the three variances of the hierarchical model. b b

58 Computation of the analytical expression of q3j (vξj ). The proportionality relation corresponding to q3j (vξj ) is presented in established in Equation (169c). In the expression of p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of vξj can be regarded as constants. Via Equation (172) the integral defined in Equation (171) becomes:

1 1 − 2 2 ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) = V ξ (f Dz) | q (f ) q (z) q − (v ) −2 k − k q (f ) q (z) q − (v )   1 2 3 j ξj   1 2 3 j ξj

3 −1 αξ + ln vξ βξv − 2 j − ξj   (192)

Introducing the notations:

T −1 −1 −1 −1 −1 −1 −1 v = v ... v v v ... v ; V − = diag v (193) ξ−j ξ1 ξj−1 ξj ξj+1 ξM ξ j ξ−j h i   1 e − 2 e e2 e e e e the integral V ξ (f Dz) can be written: k − k q (f ) q (z) q − (v )   1 2 3 j ξj

− 1 1 V 2 (f Dz) 2 = V 2 (f Dz) 2 (194) ξ ξ−j k − k q (f ) q (z) q − (v ) k − k q (f ) q (z)   1 2 3 j ξj   1 2 e Considering that q1(f) and q2(z) are multivariate Normal distributions, Equations (191) and (186) and considering the development of the norm Equations (179), we have:

1 1 1 2 2 2 2 T T 2 2 V (f Dz) = V f 2 z D V ξ− f + V Dz = k ξ−j − k k ξ−j k − j k ξ−j k  q1(f ) q2(z)  q1(f )  q1(f ) q2(z)  q2(z) 1 1 e 2 2 e Σ T T e 2 2 Te Σ = V f + Tr V ξ− f 2z D V ξ− f + V Dz + Tr D V ξ− D z = k ξ−j k j − j k ξ−j k j h i M hM i e −b1 2 −1e b −1 T T e b−1 e 2 −1 e b = C + v f + v Σf 2v zb D f j + v Dj z +bv djkdjiΣz ξj j ξj jj − ξj j ξj k k ξj ik k i=1 X=1 X b b b b b b (195)

where C denotes terms not containing vξj and Dj denotes the line j of the matrix D. From Equation (192) and Equation (195):

3 ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) = αξ + ln vξj | q (f ) q (z) q − (v ) − 2   1 2 3 j ξj   M M 1 2 T T 2 −1 βξ + f + Σf z D f j + Dj z + djkdjiΣz v − 2 j jj − j k k ik ξj " k i=1 #! X=1 X b b b b b b (196) from which it can establish the proportionality corresponding to q3j (vξj ):

1 2 zT DT D z 2 M M − α + 3 βξ + 2 f j + Σfjj 2 j f j + j + k=1 i=1 djkdjiΣzik ( ξ 2 ) − k k q3j (vξ ) v exp , (197) j ξj   ∝ − vξj P P   b b b b b b  leading to the following corollary: 

Corollary 4.0.3. q3j vξj is an Inverse Gamma distribution.  59 The shape and scale parameters are obtained by identification, from Equation (197):

3 αξj = αξ + 2

q3j (vξj )= vξj αξj , βξj , IG |  1 2 T T 2 M M  βbξj = βξ + f j + Σfjj 2z Dj f j + Dj z + djkdjiΣzik    2 − k k k=1 i=1 b b  P P (198)  b b b b b b b

Computation of the analytical expression of q4i(vǫi ). The proportionality relation corresponding to q4i(vǫi ) is presented in established in Equation (169d). In the expression of p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of vǫi can be regarded as constants. Via Equation (172) the integral defined in Equation (171) becomes:

1 1 − 2 2 ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) = V ǫ (g Hf) | q (f ) q − (v ) − 2 k − k q (f ) q − (v )   1 4 i ǫi   1 4 i ǫi 3 βǫ αǫ + ln vǫ − 2 i − v   ǫi (199)

Introducing the notations:

T v−1 v−1 ... v−1 v−1 v−1 ... v−1 V v−1 ǫ−i = ǫ1 ǫi−1 ǫi ǫi+1 ǫN ; ǫ−i = diag ǫ−i (200)   1   − 2 e e2 e e e e e the integral V ǫ (g Hf) can be written: k − k q (f ) q − (v )   1 4 i ǫi

1 1 − 2 2 2 2 V ǫ (g Hf) = V ǫ−i (g Hf) (201) k − k q (f ) q − (v ) k − k q (f )   1 4 i ǫi   1 e Considering that q1(f) is a multivariate Normal distribution, Equations (191) we have:

1 1 2 2 2 2 T Σ V ǫ−i (g Hf) = V ǫ−i g Hf + Tr H V ǫ−i H f k − k q (f ) k − k   1   h i (202) 2 e e −1 b −1 e Tb = C + v gi Hif + v HiΣHi ǫi − ǫi   where C denotes terms not containing vǫi and Hi denotes the line i ofb the matrix Hb . From Equation (199) and Equation (202):

3 ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) = C αǫ + ln vǫi | q (f ) q − (v ) − 2   1 4 i ǫi   2 1 T −1 βǫ + gi Hif + HiΣf Hi v − 2 − ǫi      (203) b b from which it can establish the proportionality corresponding to q4i(vǫi ):

2 β + 1 g H f + H Σ H T 3 ǫ 2 i i i f i −(αξ+ 2 ) − q4i(vǫi ) vǫi exp     , (204) ∝ − vǫi   b b    leading to the following corollary:  

60 Corollary 4.0.4. q4i (vǫi ) is an Inverse Gamma distribution. The shape and scale parameters are obtained by identification, from Equation (204):

3 αǫi = αǫ + 2 (205) q4i(vǫi )= vǫi αǫi , βǫi ,  2 IG | 1 Σ T  βbǫi = βǫ + gi Hif + Hi f Hi    2 − b b     b b b Computation of the analytical expression of q5j (vzj ). The proportionality relation corresponding to q5j (vzj ) is presented in established in Equation (169d). In the expression of p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of vzj can be regarded as constants. Via Equation (172) the integral defined in Equation (171) becomes:

1 1 − 2 2 ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) = V z z | q (z) q − (v ) − 2 k k q (z) q − (v )   2 5 j zj   2 5 j zj (206) 3 βf αz + ln vz − 2 j − v   zj Introducing the notations:

T v−1 v−1 ... v−1 v−1 v−1 ... v−1 V v−1 z−j = z1 zj−1 zj zj+1 zM ; z−j = diag z−j (207)     1 e − 2 e2 e e e e e the integral V z z can be written: k k q (z) q − (v )   2 5 j zj

1 1 − 2 2 2 2 V z z = V z−j z (208) k k q (z) q − (v ) k k q (z)   2 5 j zj   2 e Considering that q2(z) is a multivariate Normal distribution, Equations (186) we have:

1 1 V 2 2 V 2 2 V Σ −1 2 −1 z−j z = z−j z + Tr z−j z = C + vzj zj + vzj Σzjj (209) k k q (z) k k   2 h i From Equation (206) ande Equation (209): e b e b b

3 ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) = C αz + ln vzj | q (z) q − (v ) − 2   2 5 j zj   (210) 1 2 −1 βz + z + Σz v − 2 j jj zj  h i b from which it can establish the proportionality corresponding to q5j (vzj ):

1 2 3 βz + zj + Σzjj −(αz + 2 ) 2 q5j (vzj ) vzj exp , (211) ∝ − hvzj i  b  leading to the following corollary:  

Corollary 4.0.5. q5j vzj is an Inverse Gamma distribution. 

61 The shape and scale parameters are obtained by identification, from Equation (211):

3 αzj = αz + 2 (212) q5j (vzj )= vǫi αzj , βzj ,  IG | β = β + 1 z2 + Σ .    bzj z 2 j zjj b b h i Equations (191),(186),(198), (205) and (212) resume the distributionsb families andb the corresponding parameters for q (f), q (z) ( multivariate Normal distribution), q j (vξ ), j 1, 2,...,M , q i(vǫ ), i 1, 2,...,N and 1 2 3 j ∈ { } 4 i ∈ { } q j (vz ), j 1, 2,...,M (Inverse Gamma distributions). However, the parameters corresponding to the multivari- 5 j ∈{ } ate Normal distributions are expressed via V ξ, V ǫ and V z (and by extension all elements forming the three matrices vξ , j 1, 2,...,M , vǫ , i 1, 2,...,N , both defined in Equation (174) and vz , j 1, 2,...,M , defined in j ∈{ } i ∈{ } j ∈{ } Equation (177). e e e e e e Computation of the analytical expressions of V ξ, V ǫ and V z. For an Inverse Gamma distribution with scale and shape parameters α and β, (x α, β), the following relation holds: IG | e e e α x−1 = (213) β  IG(x|α,β) The prove of the above relation is done by direct computation, using the analytical expression of the Inverse Gamma Distribution: βα β βα Γ(α + 1) βα+1 β x−1 = x−1 x−α−1 exp dx = x−(α+1)−1 exp dx = Γ(α) − x Γ(α) βα+1 Γ(α + 1) − x  IG(x|α,β) Z   Z   α α = (x α +1,β) dx = β IG | β Z 1 | {z }

Since q j (vξ ), j 1, 2,...,M , q i(vǫ ), i 1, 2,...,N and q j (vz ), j 1, 2,...,M are Inverse Gamma 3 j ∈ { } 4 i ∈ { } 5 j ∈ { } distributions, with the corresponding parameters αξ and βξ , j 1, 2,...,M , αǫ and βǫ , j 1, 2,...,N j j ∈ { } j j ∈ { } respectively αz and βz , j 1, 2,...,M the expectancies vξ , vǫ and vz can be expressed via the parameters of j j ∈{ } j i j the two Inverse Gamma distributions using Equationb (213):b b b b b e e e αξj αǫi αzj vξj = ; vǫi = ; vzj = (214) βξj βǫi βzj b b b e e e Using the notation introduced in (174): b b b

αξ α αz 1 ... 0 ... 0 ǫ1 ... 0 ... 0 1 ... 0 ... 0 βξ1 βǫ βz1 b . . . b 1 b . . . b ...... b . . . . . b ......  . . . . .   ......   . . . . .  αξ α αz  0 ... j ... 0   0 ... ǫi ... 0   0 ... j ... 0  V ξ = β = V f ; V ǫ = β = V ǫ ; V z = β = V z (215)  bξj   ǫi   bzj     . b. .     . .. b. .. .   . .. b. .. .   . .. b. .. .  e  . . . . .  b e  . . . . .  b e  . . . . .  b  α   αǫ   α  0 ... 0 ... ξM  0 ... 0 ... N  0 ... 0 ... zM   β   βǫ   β   bξM   b N   bzM   b   b   b  In Equation (215) other notations are introduced for V ξ, V ǫ and V z. Both values were expressed during the model via unknown expectancies, but via Equation (215) those values don’t contain any more integrals to be computed. Therefore, the new notations represent the final analyticale eexpressionse used for expressing the density functions qi.

62 Using Equation (215) and Equations (191),(186),(198),(205), and (212), the final analytical expressions of the separable distributions qi are presented in Equations (216a),(216b),(216c),(216d) and(216e).

−1 T T T f = H V ǫH + V ξ V ξ Dz + H V ǫg , q1(f)= f f, Σf ,  , (216a) N |   −1   b T    Σf = H bV ǫH +bV ξ .b b b b b   −1  b T b b T z = D V ξD + V z D V ξf, q2(z)= z z, Σz ,  , (216b) N |   −1  T b    Σb z = D bV ξD +bV z . b b b    3  bαξj = αξ +b2 b (216c) q3j (vξj )= vξj αξj , βξj ,  2 , j 1, 2,...,M , IG | 1 Σ T ∈{ }  βbξj = βξ + f j Dj z + Dj zDj    2 − b b     b 3 b b  αǫi = αǫ + 2 b (216d) q4i(vǫi )= vǫi αǫi , βǫi ,  2 ,i 1, 2,...,N , IG | 1 Σ T ∈{ }  βbǫi = βǫ + gi Hif + Hi f Hi    2 − b b     b 3 b b αzj = αz + 2 (216e) q5j (vzj )= vǫi αzj , βzj ,  , j 1, 2,...,M . IG | β = β + 1 z2 + Σ . ∈{ }    bzj z 2 j zjj b b h i Equation (216a) establishes the dependency b between the parametersb corresponding to the multivariate Normal distri- bution q1(f) and the others parameters involved in the hierarchical model: the mean f and the covariance matrix Σf

depend on V ǫ, V ξ which, via Equation (215) are defined using αξj , βξj , j 1, 2,...,M and αǫi , βǫi ,i ∈ { b } b∈ 1, 2,...,N and z. The dependency between the parameters ofn the multivariateo Normal distributionnq1(f) ando the { } parametersb of theb Inverse Gamma distributions q j (vξ ), j 1, 2b,...,Mb and q i(vǫ ),i 1, 2,...,Nb b and the 3 j ∈ { } 4 i ∈ { } multivariate Normalb distribution q2(z) is presented in Figure (24). Equation (216b) establishes the dependency be-

✲ Σ z, αξj , βξj , αǫi , βǫi f , f n o n o b b b b b b b Figure 24: Dependency between q1(f) parameters and q2(z), q3j (vξj ) and q4i(vǫi ) parameters

tween the parameters corresponding to the multivariate Normal distribution q2(z) and the others parameters involved in the hierarchical model: the mean z and the covariance matrix Σz depend on V ξ, V z which, via Equation (215)

are defined using αǫi , βǫi ,i 1, 2,...,N and αzj , βzj , j 1, 2,...,M and f. The dependency between ∈{ b } b∈{ b} b the parameters ofn the multivariateo Normal distributionn q2(z) ando the parameters of the Inverse Gamma distributions b b b q4i(vǫi ),i 1, 2,...,Nb and q5j (vzj ), j 1, 2,...,Mb and the multivariate Normal distribution q1(f) is pre- sented in Figure∈ { (25). Equation(} 216c) establishes∈ { the dependencybetween} the parameters correspondingto the Inverse

✲ f, αǫi , βǫi , αzj , βzj z , Σz n o n o b b b b b b b Figure 25: Dependency between q2(z) parameters and q1(f), q4i(vǫi ) and q5j (vzj ) parameters

63 Gamma distributions q j (vξ ), j 1, 2,...,M and the others parameters involved in the hierarchical model: the 3 j ∈ { } shape and scale parameters αξ , βξ , j 1, 2,...,M depend on the element j of the mean f of the multivariate j j ∈{ } Normal distribution q (f),n i.e. f j ando the mean z and the covariance matrix Σz of the multivariate Normal distri- 1 b b bution q2(z), Figure (26). Equationb (216d) establishes the dependency between the parameters corresponding to the b b b Σ ✲ f j , z, z αξj , βξj n o b b b b b Figure 26: Dependency between q3j (vξj ) parameters and q1(f) and q2(z) parameters

Inverse Gamma distributions q i(vǫ ),i 1, 2,...,N and the others parameters involved in the hierarchical model: 4 i ∈{ } the shape and scale parameters αǫ , βǫ ,i 1, 2,...,N depend on the mean f and the covariance matrix Σf i i ∈ { } of the multivariate Normal distributionn q1o(f), Figure (27). Equation (216e) establishes the dependency between the b b b b

Σ ✲ f , f αǫi , βǫi n o b b b b Figure 27: Dependency between q4i(vǫi ) parameters and q1(f) parameters

parameters corresponding to the Inverse Gamma distributions q j (vz ), j 1, 2,...,M and the others parameters 5 j ∈{ } involved in the hierarchical model: the shape and scale parameters αz , βz , j 1, 2,...,M depend on the j j ∈ { } element j of the mean z and the element jj of the covariance matrixnΣz correspondingo to the multivariate Normal b distribution q2(z), Figure (28). b b b ✲ zj , Σzjj αzj , βzj n o b b b b Figure 28: Dependency between q5j (vzj ) parameters and q2(z) parameters

The iterative algorithm obtained via PM estimation is presented Figure (29). More formal, the iterative algorithm obtained via PM estimation is presented Algorithm (4).

64 Algorithm 4 PM via VBA partial sep. - Student-t hierarchical model, non-stationary Student-t uncertainties model Ensure: INITIALIZATION α(0), β(0), α(0), β(0), α(0),β(0) ξj ξj ǫi ǫi zj zj (0) (0) (0) (0) (0) (0) ⊲ Where α = αξ, β = βξ, αǫ = αǫ, βǫ = βǫ, αz = αz,βz = βz b b ξj ξj i i j j b b (0) αξ (0) αǫ (0) αz ⊲ Leads to V ξ = I, V ǫ = I, V z = I b βξb βǫ βz b ⊲ αξ,βξ, αǫ,βǫ,b αz,βz are set in modeling phase, Eq. (157) (0) 1: function PMVIAVBAPS(αξ ,βξ, αǫ,βǫ, αz,βz, g, H, D, z ,M,N,NoIterb ) b b 2: for n =0 to IterNumb do −1 (n) α + 3 α + 3 (n) Σ T ǫ 2 ξ 2 Σ 3: f = H diag (n) H + diag (n+1) ⊲ New value f βǫ β   i   ξj  b T b (n) α + 3 α + 3 b(n) Σ ξ 2 (n) T ǫ 2 b (n) 4: f = f diag (n+1) Dz + H diag (n) g ⊲ New value f β βǫ  ξj   i  ! b b 5: forb j =1b to M do b b 2 (n+1) (n+1) 1 (n) (n+1) T 6: β = βξ + f Dj z + Dj Σ D ξj 2 j − z j 7: end for    b b b 8: for i =1 to N do b 2 (n+1) (n+1) 1 (n) T 9: βǫ = βǫ + gi Hif + HiΣ Hi i 2 − f 10: end for    b b b 11: for j =1 to M do 2 (n+1) 1 (n+1) (n+1) 12: βzj = βz + 2 zj + Σzjj 13: end for    b b −1 (n+1) α + 3 α + 3 (n+1) Σ T ξ 2 z 2 Σ 14: z = D diag (n+1) D + diag (n+1) ⊲ New value z β βz   ξj   j  (n+1) b α + 3 b 15: zb(n+1) = Σ DT diag ξ 2 f (n) ⊲ New value bz(n+1) z β(n+1)  ξj  16: end for b b b (n) (n+1) b b return f (n), Σ , z(n+1), Σ , β(n+1), β(n+1), β(n+1) n = NoIter f z ξj ǫi zj 17: end function  b b b b b b b

65 Initialization

1 T − T T f = H V ǫH + V ξ V ξ Dz + H V ǫg    1  Σ T − b fb = bH V ǫHb + Vbξ b

(a) - update estimated f and covariance matrix Σf b b b 1 b T − T z = D V ξD + V z D V ξf   1 Σ T − b z = b D V ξbD + V z b b

(b) - update estimated z and covariance matrix Σz b b b b V ξ = diag [vξ] (f) 3 b b αξj = αξ + 2 2 β = β + 1 f D z + D Σ D T ξj ξ 2b j − j j z j    (c) - update estimated parameters corresponding to vξj b bIG b b V ǫ = diag [vǫ] (g) b b 3 αǫi = αǫ + 2 2 β = β + 1 g H f + H Σ H T ǫi ǫ 2b i − i i f i    (d) - update estimated parameters corresponding to vǫi b IG b b V z = diag [vz] (h) 3 αzj = αz + 2 b b 1 2 βz = βz + z + Σz j b 2 j jj

(e) - update estimated parametersh correspondingi to vzj b IG b

Figure 29: Indirect sparsity (via z) - Iterative algorithm corresponding to partial separability PM estimation for Student-t hierarchical model, non-stationary Student-t uncertainties model

66 4.7.3 Posterior Mean estimation via VBA, full separability In this subsection, the Posterior Mean (PM) estimation is again considered, but via a full separable approximation. The posterior distribution is approximated by a full separable distribution q (f, z, vξ, vǫ, vz), i.e. a supplementary condition is added in Equation (166):

M M M N M

q1(f)= q1j (f j ); q2(z)= q2j (zj ); q3(vξ)= q3j (vξj ); q4(vǫ)= q4i(vǫi ); q5(vz)= q5j (vzj ) j=1 j=1 j=1 i=1 j=1 Y Y Y Y Y (217) As in Subsection (4.7.2), the approximationis done by minimizing of the Kullback-Leibler divergence, Equation (167), via alternate optimization resulting the following proportionalities from Equations (218a),(218b), (218c),(218d) and (218e),

q j (f j ) exp ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) , 1 ∝ | * q1−j (f ) q2(z) q3(vξ) q4(vǫ) q5(vz )) j 1, 2 ...,M , (218a) ∈{ }

q j (zj ) exp ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) , 2 ∝ | * q1(f ) q2−j (z) q3(vξ) q4(vǫ) q5(vz)) j 1, 2 ...,M , (218b) ∈{ }

q3j (vξj ) exp ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) , ∝ | q (f ) q (z) q − (v ) q (v ) q (v ) *  1 2 3 j ξj 4 ǫ 5 z ) j 1, 2 ...,M , (218c) ∈{ }

q4i(vǫi ) exp ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) , ∝ | q (f ) q (z) q (v ) q − (v ) q (v ) *  1 2 3 ξ 4 i ǫi 5 z ) i 1, 2 ...,N , (218d) ∈{ }

q5j (vzj ) exp ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) , ∝ | q (f ) q (z) q (v ) q (v ) q − (v ) *  1 2 3 ξ 4 ǫ 5 j zj ) j 1, 2 ...,M , (218e) ∈{ } using the notations introduced in Equation (170), Equation (171) and Equation (219).

M M

q1−j (f j )= q1k(f k); q2−j (zj )= q2k(zk) (219) k ,k j k ,k j =1Y6= =1Y6= The analytical expression of logarithm of the posterior distribution ln p(f, z, vǫ, vξ, vz g, αǫ,βǫ, αξ,βξ, αz,βz) is obtained in Equation (172). |

Computation of the analytical expression of q1j (f j ). The proportionality relation corresponding to q1j (f j ) is presented in established in Equation (218a). In the expression of ln p (f, vf , vǫ g, αf ,βf , αǫ,βǫ) all the terms free of | f j can be regarded as constants:

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q − (f ) q (z) q (v ) q (v ) q (v )   1 j 2 3 ξ 4 ǫ 5 z (220) 1 1 1 − 2 2 1 − 2 2 = V ǫ (g Hf) V (f Dz) . −2 k − k − 2 k ξ − k  q1−j (f ) q4(vǫ)  q1−j (f )q2(z)q3(vξ)

67 Using the notations introduced in Equation (174), the integral from Equation (220) becomes:

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q − (f ) q (z) q (v ) q (v ) q (v )   1 j 2 3 ξ 4 ǫ 5 z (221) 1 1 1 2 2 1 2 2 = V ǫ (g Hf) V (f Dz) . −2 k − k − 2 k ξ − k  q1−j (f )  q1−j (f )q2(z) Introducing the notations e e

T f j = f j ; f −j = f = f 1 ... f j−1 f j f j+1 ... f M (222) q (f ) q − (f )   1j j   1 j h i e e e e e e T −j T −j −j f = f 1 ... f j−1 f j+1 ... f M ; f = f = f 1 ... f j−1 f j+1 ... f M (223)  q1−j (f )   h i and denoting Hj the column j of matrix H andeH−j the matrix H withoute columne j, thee developmente of the norm from the first term if Equation (232) is

1 1 1 2 2 2 2 T 2 2 V ǫ (g Hf) = V ǫ g 2g V ǫHf + V ǫ Hf . (224) k − k k k − k k Using the equality Hf H−j f −j Hj we establish the equalities from Equation (225) and Equation (226): = e + f j e e e T T −j −j j T −j −j T j g V ǫHf = g V ǫ H f + H f j = g V ǫH f + g V ǫH f j (225)

1 1 2 2 2 −j −j j 2 −jT  −jT jT −j −j j V ǫ Hf = V eǫ H f +e H f j = f H +eH f j V ǫ H ef + H f j = k k k k (226) −jT −jT −j −j jT −j −j jT j 2 = f H V ǫH f +2H V ǫH f f j + H V ǫH f j  e e e so the corresponding integrals are: e e e T T −j −j T j g V ǫHf = g V ǫH f + g V ǫH f j q − (f )   1 j (227) 1 1 1 2e 2 e 2 −je −j 2 e jT −j −j 2 j 2 2 V ǫ Hf = V ǫ H f +2H V ǫH f f j + V ǫ H f k k k k k k j  q1−j (f )  q1−j (f ) e e e e e Considering all the term that don’t contain f j as constants, using Equation (227) via Equation (224), the integral of the norm from the first term of Equation (232) is:

1 1 2 2 T j jT −j −j 2 j 2 2 V ǫ (g Hf) = C 2 g V ǫH H V ǫH f f j + V ǫ H f j (228) k − k q − (f ) − − k k   1 j   The norm correspondinge to the latter term of Equation (232e ) is developede consideringe all thee term that don’t contain f j as constants: M 1 2 2 2 2 V (f Dz) = vξ (f j Dj z) = C 2vξ Dj zf j + vξ f j . (229) k ξ − k j − − j j j=1 X Introducing the notationse e e e

T zj = zj ; z = z = z1 ... zj−1 zj zj+1 ... zM , (230)  q2j (zj )  q2j (z)   the integral of the norme from the latter terme of Equation (232) is:e e e e e

1 2 2 2 V (f Dz) = C 2vξ Dj zf j + vξ f j . (231) k ξ − k − j j  q1−j (f )q2(z) e e e e 68 From Equation (232) via Equation (228) and Equation (231):

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = C | −  q1−j (f ) q2(z) q3(vξ) q4(vǫ) q5(vz) (232) 1 T j jT −j −j 1 2 j 2 2 g V ǫH H V ǫH f + vξ Dj z f j + V ǫ H + vξ f j , − − j 2 k k j     e e e e e e e so ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) is a quadratic form with respect |  q1−j (f ) q2(z) q3(vξ) q4(vǫ) q5(vz) to f j . The proportionality from Equation (218a) leads to the following corollary:

Corollary 4.0.6. q1j (f j ) is a Normal distribution. Minimizing the criterion

1 T j jT −j −j 1 2 j 2 2 J(f j )= C g V ǫH H V ǫH f + vξ Dj z f j + V ǫ H + vξ f j , (233) − − j 2 k k j     leads to the mean of the Normale distribution q1ej (f j ): e e e e e

1 ∂J(f j ) T j jT −j −j 2 j 2 =0 g V ǫH H V ǫH f + vξ Dj z + V ǫ H + vξ f j =0 ∂f ⇔− − j k k j ⇒ j (234)  1 −1    2 j 2 jT e −j −j f j = V ǫ eH + vξ e H V ǫ g e H ef +evξ Dj z . e ⇒ k k j − j   h   i By identification, theb variancee can be easilye derived: e e e e

1 −1 2 j 2 Σf = V ǫ H + vξ . (235) jj k k j   Finally, we conclude that q1j (f j ) is a Normalb distributione with thee following parameters:

−1 jT j jT −j −j f j = H V ǫH + vξ H V ǫ g H f + vξ Dj z j − j q1j (f j )= f j f j , Σf ,  (236) N | jj   −h1   i  b jeT j e e    Σfjj = H V ǫH +e vξj e e b b     b e e Computation of the analytical expression of q2j (zj ). The proportionality relation corresponding to q2j (zj ) is presented in established in Equation (218b). In the expression of ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of zj can be regarded as constants:

lnp (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q (f ) q − (z) q (v ) q (v ) q (v )   1 2 j 3 ξ 4 ǫ 5 z (237) 1 1 1 − 2 2 1 − 2 2 = V (f Dz) V z z . −2 k ξ − k − 2 k k  q1(f ) q2−j (z) q3(vξ)  q2(z)q5(vz) Using the notations introduced in Equation (174) and Equation (177), the integral from Equation (237) becomes:

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q (f ) q − (z) q (v ) q (v ) q (v )   1 2 j 3 ξ 4 ǫ 5 z (238) 1 1 1 2 2 1 2 2 = V (f Dz) V z z . −2 k ξ − k − 2 k k  q1(f )q2−j (z)  q2−j (z) e e 69 Introducing the notations

T zj = zj ; z−j = z = z1 ... zj−1 zj zj+1 ... zM (239)  q2j (zj )  q2−j (z)   e e e e e e −j T −j −j T z = z1 ... zj−1 zj+1 ... zM ; z = z = z1 ... zj−1 zj+1 ... zM (240)  q2−j (z)     and denoting Dj the column j of matrix D andeD−j the matrix D withoute columne j, thee developmente of the norm from the first term if Equation (238) is

1 1 1 2 2 2 2 T 2 2 V (f Dz) = V f 2f V ξDz + V Dz . (241) k ξ − k k ξ k − k ξ k −j −j j Using the equality Dz = D ez + D zj we establishe the equalitiese from Equatione (242) and Equation (243):

T T −j −j j T −j −j T j f V ξDz = f V ξ D z + D zj = f V ξD z + f V ξD zj (242)

1 1 2 2 2 −j −j j 2 −jT  −jT jT −j −j j V Dz = Ve D z +e D zj = z D +eD zj V ξ D ez + D zj = k ξ k k ξ k (243) −jT −jT −j −j jT −j −j jT j 2 = z D V ξD z +2D V ξD z zj + D V ξD z  e e e j so the corresponding integrals are: e e e T T −j −j T j f V ξDz = f V ξD z + f V ξD zj q (f ) q − (z)   1 2 j (244) 1 1 1 e 2 2 e e 2 −je −j 2 e e jT −j −j 2 j 2 2 V Dz = V D z +2D V ξD z zj + V D z k ξ k k ξ k k ξ k j  q2−j (z)  q2−j (z) e e e e Considering all the term that don’t contain zj as constants, using Equation (244)e via Equation (241), the integral of the norm from the first term of Equation (238) is:

1 1 2 2 T j jT −j −j 2 j 2 2 V ξ (f Dz) = C 2 f V ξD D V ξD z zj + V ξ D zj (245) k − k q (f ) q − (z) − − k k   1 2 j   The norme corresponding to the latter term of Equation (238e ) is developede consideringe all thee term that don’t contain zj as constants: M 1 2 2 2 2 V z z = vz z = C + vz zj . (246) k k j j j j=1 X so the integral of the norm from the lattere term of Equatione (238) is: e

1 2 2 2 V z z = C + vz zj . (247) k k j  q2−j (z) From Equation (238) via Equation (245) ande Equation (247): e

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = C | −  q1(f ) q2−j (z) q3(vξ) q4(vǫ) q5(vz) (248) 1 T j jT −j −j 1 2 j 2 2 f V ξD D V ξD z zj + V D + vz zj , − − 2 k ξ k j     e e e e e so ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) . The proportionality from Equa- |  q1(f ) q2−j (z) q3(vξ) q4(vǫ) q5(vz) tion (218b) leads to the following corollary:

70 Corollary 4.0.7. q2j (zj ) is a Normal distribution. Minimizing the criterion

1 T j jT −j −j 1 2 j 2 2 J(zj )= C f V ξD D V ξD z zj + V D + vz zj , (249) − − 2 k ξ k j leads to the mean of the Normal distribution q (z ):    e 2j j e e e e ∂J(zj ) 1 =0 f T V Dj DjT V D−j z−j + V 2 Dj 2 + v z =0 ∂z ξ ξ ξ zj j j ⇔− − k k ⇒ (250)  1 −1    2 j 2 T j jT −j −j zj = V De + vz e f V eξD D eV ξD z e . ⇒ k ξ k j − By identification, the variance can be easily derived: h i b e e e e e 1 −1 2 j 2 Σz = V D + vz . (251) jj k ξ k j Finally, we conclude that q (z ) is a Normal distribution with the following parameters: 2j j b e e −1 jT j jT −j −j zj = D V ξD + vz D V ξ f D z j − q2j (zj )= zj zj , Σz ,  (252) N | jj   −h1 i  jeT j e    Σb zjj = D V ξD +e vzj e b b     b e e Computation of the analytical expression of q3j (vξj ). The proportionality relation corresponding to q3j (vξj ) is presented in established in Equation (218c). In the expression of ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of vξj can be regarded as constants:

lnp (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q (f ) q (z) q − (v ) q (v ) q (v )   1 2 3 j ξ 4 ǫ 5 z (253) 1 1 − 2 2 3 βξ = V ξ (f Dz) αξ + ln vξj . −2 k − k − 2 − vξ  q1(f ) q2(z) q3−j (vξ)   j Using the notations introduced in Equation (193), Corollary (4.0.6) and Corollary (4.0.7), the integral from Equa- tion (253) becomes:

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = |  q1(f ) q2(z) q3−j (vξ) q4(vǫ) q5(vz ) 1 3 β = v−1 (f D z)2 α + ln v ξ = ξj j j ξ ξj −2 − − 2 − vξ  q1j (f j )q2(z)   j 1 3 β (254) = v−1 f 2 2f D z + (D z)2 α + ln v ξ = ξj j j j j ξ ξj −2 − − 2 − vξ  q1j (f j )q2(z)   j 2 1 T βξ + f j Dj z + Σf + D ΣzDj 3 2 − jj j = αξ + ln vξ  . − 2 j −  v   b b ξj b b The proportionality from Equation (218c) via Equation (254) leads to the following corollary:

Corollary 4.0.8. q3j vξj is an Inverse Gamma distribution.

By identification we conclude  that q3j vξj is an Inverse Gamma distribution with the following parameters: 3  αξj = αξ + 2 (255) q3j vξj = vξj αξj , βξj ,  2 IG | 1 T Σ  βbξj = βξ + f j Dj z + Σfjj + Dj zDj     2 − b b     b b b b b 71 Computation of the analytical expression of q4i(vǫi ). The proportionality relation corresponding to q4j (vǫi ) is presented in established in Equation (218d). In the expression of ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of vξj can be regarded as constants:

lnp (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q (f ) q (z) q (v ) q − (v ) q (v )   1 2 j ξ 4 i ǫ 5 z (256) 1 1 − 2 2 3 βǫ = V ǫ (g Hf) αǫ + ln vǫi . −2 k − k − 2 − vǫ  q1(f ) q4−i(vǫ)   i Using the notations introduced in Equation (200) and Corollary (4.0.6), the integral from Equation (256) becomes:

lnp (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = |  q1(f ) q2(z) qj (vξ) q4−i(vǫ) q5(vz) 1 −1 2 3 βǫ = vǫi (gi Hif) αǫ + ln vǫi = −2 − − 2 − vǫ  q1(f )   i 1 3 β (257) −1 2 H z H f 2 ǫ = vǫi gi 2gi i + ( i ) αǫ + ln vǫi = −2 − − 2 − vǫ  q1(f )   i 2 1 T βǫ + gi Hif + H Σf Hi 3 2 − i = αǫ + ln vǫ  . − 2 i −  v    bǫi b The proportionality from Equation (218d) via Equation (257) leads to the following corollary:

Corollary 4.0.9. q4i (vǫi ) is an Inverse Gamma distribution.

By identification we conclude that q4i (vǫi ) is an Inverse Gamma distribution with the following parameters: 3 αǫi = αǫ + 2 (258) q4i (vǫi )= vǫi αǫi , βǫi ,  2 IG | 1 T Σ  βbǫi = βǫ + gi Hif + Hi f Hi    2 − b b     b b b Computation of the analytical expression of q5j (vzj ). The proportionality relation corresponding to q5j (vzj ) is presented in established in Equation (218e). In the expression of ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) all the | terms free of vzj can be regarded as constants:

lnp (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = | q (f ) q (z) q (v ) q (v ) q − (v )   1 2 3 ξ 4 ǫ 5 j z (259) 1 1 − 2 2 3 βz = V z z αz + ln vzj . −2 k k − 2 − vz  q2(z) q5−j (vz)   j Using the notations introduced in Equation (207) and Corollary (4.0.7), the integral from Equation (259) becomes:

ln p (f, z, vξ, vǫ, vz g, αξ,βξ, αǫ,βǫ, αz,βz) = |  q1(f ) q2(z) q3(vξ) q4(vǫ) q5−j (vz ) 1 −1 2 3 βz = vzj zj αz + ln vzj = −2 − 2 − vz  q2(z)   j (260) 1 −1 2 3 βz = vzj zj + Σzjj αz + ln vzj = −2 − 2 − vzj     b 1 2 b3 βz + 2 zj + Σzjj = αz + ln vzj . − 2 − vzj    b b 72 The proportionality from Equation (218e) via Equation (260) leads to the following corollary:

Corollary 4.0.10. q5j vξj is an Inverse Gamma distribution.  By identification we conclude that q5j vzj is an Inverse Gamma distribution with the following parameters:

 3 αzj = αz + 2

q5j vzj = vzj αzj , βzj , (261) IG |  1 2  βbzj = βz + zj + Σzjj     2 b b   Equations (236),(252),(255), (258) and (261) resume the distributionsb familiesb andb the corresponding parameters for q j (f j ), j 1, 2,...,M , q j (zj ), j 1, 2,...,M (Normal distribution), q j (vξ ), j 1, 2,...,M , 1 ∈ { } 2 ∈ { } 3 j ∈ { } q i(vǫ ), i 1, 2,...,N and q j (vz ), j 1, 2,...,M (Inverse Gamma distributions). However, the parameters 4 i ∈{ } 5 j ∈{ } corresponding to the Normal distributions are expressed via V ξ, V ǫ and V z (and by extension all elements forming the three matrices vξ , j 1, 2,...,M , vǫ , i 1, 2,...,N , both defined in Equation (174) and vz , j j ∈ { } i ∈ { } j ∈ 1, 2,...,M , defined in Equation (177). e e e { } e e e Computation of the analytical expressions of V ξ, V ǫ and V z. For an Inverse Gamma distribution with scale and shape parameters α and β, (x α, β), the following relation holds: IG | e e e α x−1 = (262) β  IG(x|α,β) The prove of the above relation is done by direct computation, using the analytical expression of the Inverse Gamma Distribution: βα β βα Γ(α + 1) βα+1 β x−1 = x−1 x−α−1 exp dx = x−(α+1)−1 exp dx = Γ(α) − x Γ(α) βα+1 Γ(α + 1) − x  IG(x|α,β) Z   Z   α α = (x α +1,β) dx = β IG | β Z 1 | {z }

Since q j (vξ ), j 1, 2,...,M , q i(vǫ ), i 1, 2,...,N and q j (vz ), j 1, 2,...,M are Inverse Gamma 3 j ∈ { } 4 i ∈ { } 5 j ∈ { } distributions, with the corresponding parameters αξ and βξ , j 1, 2,...,M , αǫ and βǫ , j 1, 2,...,N j j ∈ { } j j ∈ { } respectively αz and βz , j 1, 2,...,M the expectancies vξ , vǫ and vz can be expressed via the parameters of j j ∈{ } j i j the two Inverse Gamma distributions using Equationb (262):b b b b b e e e αξj αǫi αzj vξj = = vξj ; vǫi = = vǫi ; vzj = = vzj (263) βξj βǫi βzj b b b e b e b e b Using the notation introduced in (174b ): b b

αξ α αz 1 ... 0 ... 0 ǫ1 ... 0 ... 0 1 ... 0 ... 0 βξ β βz b 1 bǫ1 b 1 ......  b ......   b ......   b ......  αξ α αz  0 ... j ... 0   0 ... ǫi ... 0   0 ... j ... 0  V ξ = β = V f ; V ǫ = β = V ǫ ; V z = β = V z (264)  bξj   ǫi   bzj     . b. .     . .. b. .. .   . .. b. .. .   . .. b. .. .  e  . . . . .  b e  . . . . .  b e  . . . . .  b  α   αǫ   α  0 ... 0 ... ξM  0 ... 0 ... N  0 ... 0 ... zM   β   βǫ   β   bξM   b N   bzM   b   b   b  73 In Equation (264) other notations are introduced for V ξ, V ǫ and V z. Both values were expressed during the model via unknown expectancies, but via Equation (264) those values don’t contain any more integrals to be computed. There- fore, the new notations represent the final analyticale expree ssionse used for expressing the density functions qi. Using Equation (264) and Equations (236),(252), (255), (258) and (261), the final analytical expressions of the separable distributions qi are presented in Equations (265a),(265b), (265c),(265d)and (265e).

−1 α α jT j ξj jT −j −j ξj f j = H V ǫH + H V ǫ g H f + Dj z β β ξj − ξj   b   b  q1j (f j )= f j f j , Σfjj , b −1   b , N |  α  b jbT j ξj b b b    Σfjj = H V ǫH + βξ b b  b j   b  b b (265a) −1 α jT j zj jT −j −j zj = D V ξD + D V ξ f D z β zj −   b  (265b) q2j (zj )= zj zj , Σzjj , b −h1  i , N |  α  b jbT j zj b b b    Σzjj = D V ξD + βz b b  b j   b  b 3 b  αξj = αξ + 2 (265c) q3j (vξj )= vξj αξj , βξj ,  2 , j 1, 2,...,M , IG | 1 Σ T ∈{ }  βbξj = βξ + f j Dj z + Dj zDj    2 − b b     b 3 b b  αǫi = αǫ + 2 b (265d) q4i(vǫi )= vǫi αǫi , βǫi ,  2 ,i 1, 2,...,N , IG | 1 Σ T ∈{ }  βbǫi = βǫ + gi Hif + Hi f Hi    2 − b b     b 3 b b αzj = αz + 2 (265e) q5j (vzj )= vǫi αzj , βzj ,  , j 1, 2,...,M . IG | β = β + 1 z2 + Σ . ∈{ }    bzj z 2 j zjj b b h i  Equation (265a) establishes the dependency b between theb parametersb corresponding to the Normal distributions q1j (f j ) and the others parameters involved in the hierarchical model: the mean f j and the variance Σfjj depend on V ǫ, V ξ

which, via Equation (264) are defined using αξj , βξj , j 1, 2,...,M and αǫi , βǫi ,i 1, 2,...,N and z. ∈{ b} b∈{ }b b The dependency between the parameters of then Normalo distribution q1j (f j ) and then parameterso of the Inverse Gamma b b distributions q3j (vξj ), j 1, 2,...,M andb q4i(vǫi ),i 1, 2,...,N and theb Normal distributions q2j (zj ) bis presented in Figure (30).∈ Equation { (265b}) establishes the dependency∈ { between} the parameters corresponding to the

−j ✲ z, f , αξj , βξj , αǫi , βǫi f j , Σfjj n o b b b b b b b b Figure 30: Dependency between q1j (f j ) parameters and q2j (zj ), q3j (vξj ) and q4i(vǫi ) parameters

Normal distributions q2j (zj) and the others parameters involved in the hierarchical model: the mean zj and the

variance Σz depend on V ξ, V z which, via Equation (264) are defined using αǫ , βǫ ,i 1, 2,...,N and jj i i ∈ { } n o b αz , βz , j 1, 2,...,M and f. The dependency between the parameters of the Normal distribution q2j (zj ) j jb ∈ { b }b b b nand the parameterso of the Inverse Gamma distributions q4i(vǫ ),i 1, 2,...,N and q5j (vz ), j 1, 2,...,M i ∈ { } j ∈ { } andb theb Normal distributions q1j (f j ) isb presented in Figure (31). Equation (265c) establishes the dependency between

74 −j ✲ f, z , αǫi , βǫi , αzj , βzj zj , Σzjj n o b b b b b b b b Figure 31: Dependency between q2j (zj) parameters and q1j (f j ), q4i(vǫi ) and q5j (vzj ) parameters

the parameters corresponding to the Inverse Gamma distributions q j (vξ ), j 1, 2,...,M and the others parame- 3 j ∈{ } ters involved in the hierarchical model: the shape and scale parameters αξ , βξ , j 1, 2,...,M depend on the j j ∈{ }

of the Normal distribution q1j (f j ), i.e. f j and the mean zj and the variancen Σzjjoof the Normal distribution q2j (zj ), Figure (32). Equation (265d) establishes the dependency between the parametersb b corresponding to the Inverse Gamma b b Σ ✲ f j , z, z αξj , βξj

b b b b b Figure 32: Dependency between q3j (vξj ) parameters and q1j (f j ) and q2j (zj ) parameters

distributions q i(vǫ ),i 1, 2,...,N and the others parameters involved in the hierarchical model: the shape and 4 i ∈ { } scale parameters αǫ , βǫ ,i 1, 2,...,N depend on the mean f j and the variance Σf of the Normal distri- i i ∈ { } jj bution q1j (f j ), Figuren (33o). Equation (265e) establishes the dependency between the parameters corresponding to b b b b

Σ ✲ f , f αǫi , βǫi

b b b b Figure 33: Dependency between q4i(vǫi ) parameters and q1j (f j ) parameters

the Inverse Gamma distributions q j (vz ), j 1, 2,...,M and the others parameters involved in the hierarchical 5 j ∈ { } model: the shape and scale parameters αz , βz , j 1, 2,...,M depend on the mean zj and the variance Σz j j ∈{ } jj corresponding to the Normal distributionn q2j (zj ),o Figure (28). b b b b ✲ zj , Σzjj αzj , βzj

b b b b Figure 34: Dependency between q5j (vzj ) parameters and q2j (zj ) parameters

The iterative algorithm obtained via PM estimation is presented Figure (35). More formal, the iterative algorithm obtained via PM estimation is presented Algorithm (5).

4.8 Student-t hierarchical model: stationary Laplace uncertainties model, unknown uncer- tainties variance the hierarchical model is using as a prior the Student-t distribution; • the Student-t prior distribution is expressed via StPM, Equation (7), considering the variance vf as unknown; • the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ; • for the uncertainties vector ǫ a stationary Laplace uncertainties model is proposed, i.e. a multivariate Laplace • distribution expressed via LPM is used under the following two assumptions:

75 Algorithm 5 PM via VBA full sep. - Student-t hierarchical model, non-stationary Student-t uncertainties model Ensure: INITIALIZATION α(0), β(0), α(0), β(0), α(0),β(0) ξj ξj ǫi ǫi zj zj (0) (0) (0) (0) (0) (0) ⊲ Where α = αξ, β = βξ, αǫ = αǫ, βǫ = βǫ, αz = αz,βz = βz b b ξj ξj i i j j b b (0) αξ (0) αǫ (0) αz ⊲ Leads to V ξ = I, V ǫ = I, V z = I b βξb βǫ βz b ⊲ αξ,βξ, αǫ,βǫ,b αz,βz are set in modeling phase, Eq. (157) (0) (0) 1: function PMVIAVBAFS(αξ ,βξ, αǫ,βǫ, αz,βz, g, H, D, z , f ,M,N,NoIterb ) b b 2: for n =0 to IterNumb do 3: for j =1 to M do −1 (n+1) α + 3 α + 3 (n+1) 4: Σ = HjT diag ǫ 2 Hj + ξ 2 ⊲ New value Σ fjj (n) (n) fjj βǫ β i b ξj     (n) (n+1) b α + 3 b α + 3 5: fb (n+1) = Σ HjT diag ǫ 2 g H−j f −j + ξ 2 D z(n) ⊲ Newb value j fjj (n) (n) j βǫ β i − b ξj         f (n+1) b b j b b b b 6: end for 7: b for j =1 to M do −1 3 3 (n+1) jT αξ+ 2 j αz + 2 (n+1) 8: Σzjj = D diag (n) D + (n) ⊲ New value Σzjj β βz   ξi  b j  b 3 b n (n+1) (n+1) jT αξ+ 2 (n+1) −j −j ( ) (n+1) 9: zbj = Σzjj D diag (n) f D z ⊲ New value zbj βξ −   i   10: end for b    b b 11: for jb =1 to M do b b 2 (n+1) (n+1) 1 (n+1) (n+1) T 12: β = βξ + f Dj z + Dj Σ D ξj 2 j − z j 13: end for    b b b 14: for i =1 to N do b 2 (n+1) (n+1) 1 (n+1) T 15: βǫ = βǫ + gi Hif + HiΣ Hi i 2 − f 16: end for    b b b 17: for j =1 to M do 2 (n+1) 1 (n+1) (n+1) 18: βzj = βz + 2 zj + Σzjj 19: end for    b b 20: end for b (n+1) (n+1) return f (n+1), Σ , z(n+1), Σ ,, β(n+1), β(n+1), β(n+1) , n = NoIter f z ξj ǫi zj 21: end function  b b b b b b b

76 Initialization

1 jT j − jT j j f = H V H + v H V g H − f − + v D z j ǫ ξj ǫ − ξj j   h  1  i HjT V H j v − b b Σfjj e= b ǫ + ξj b e b

(a) - update estimated f and covariance matrixΣf b b e 1 b jT j − jT j j z = D V D + v D V f D− z− j ξ zj ξ −   h 1 i DjT V Dj v −  b Σzbjj = e ξ b+ zj b

(b) - update estimated z and covariance matrix Σz b b e b V ξ = diag [vξ] (f) 3 b b αξj = αξ + 2 2 β = β + 1 f D z + D Σ D T ξj ξ b 2 j − j j z j    (c) - update estimated parameters corresponding to vξj b IGb b b V ǫ = diag [vǫ] (g) b b 3 αǫi = αǫ + 2 2 β = β + 1 g H f + H Σ H T ǫi ǫ b 2 i − i i f i    (d) - update estimated parameters corresponding to vǫi b IG b b V z = diag [vz] (h) 3 αzj = αz + 2 b b 1 2 βz = βz + z + Σz j b 2 j jj

(e) - update estimated parametersh correspondingi to vzj b IG b

Figure 35: Indirect sparsity (via z) - Iterative algorithm corresponding to full separability PM estimation for Student-t hierarchical model, non-stationary Student-t uncertainties model

a) each element of the uncertainties vector has the same variance, vǫ;

77 b) the variance vǫ is unknown;

−1 p (g f, vǫ)= g Hf, vǫ I , Likelihood : sLL: | N | bǫ bǫ  p vǫ 1, = vǫ 1, .   | 2 IG | 2   1 1 − 1 − 2  p(f 0, vf )= (f 0, V f ) det V f 2 exp V f , | N | ∝ { } − 2 k f k  M M n−α −1 o M β P rior : StPM: f f  p(vf αf ,βf )= j=1 (vf j αf ,βf ) j=1 vf exp j=1 v ,  j fj  | IG | ∝ − Student-t sLL UNK  n o Q V fQ= diag [vf ] , vf = ...,P vf j ,... .     (266)  4.9 Student-t hierarchical model: non-stationary Laplace uncertainties model, unknown uncertainties variances the hierarchical model is using as a prior the Student-t distribution; • the Student-t prior distribution is expressed via StPM, Equation (7), considering the variance vf as unknown; • the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ; • for the uncertainties vector ǫ a non-stationary Laplace uncertainties model is proposed, i.e. a multivariate • Laplace distribution expressed via LPM is used under the following assumption: a) the variance vector vǫ is unknown;

−1 p (g f, vǫ)= g Hf, V , | N | ǫ N  bǫ  bǫ Likelihood : nsLL: p vǫ 1, 2 = i=1 vǫi 1, 2 ,  | IG |  Vǫ = diag[vǫ]Q, vǫ = [..., vǫi ,...] ,

 1 1  − 1 − 2  p(f 0, vf )= (f 0, V f ) det V f 2 exp V f , | N | ∝ { } − 2 k f k  M M n−αf −1 o M β P rior : StPM: p(v α ,β )= (v α ,β ) v exp f ,  f f f j=1 f j f f j=1 f j j=1 v  | IG | ∝ − fj Student-t nsLL UNK  n o Q V fQ= diag [vf ] , vf = ...,P vf j ,... .     (267)  5 Hierarchical Models with Laplace prior via LPM

Considering the Laplace distribution for modelling the sparse structure of f in linear forward model, Equation (1), expressed via the conjugate priors LPM, depending on the model proposed for the uncertainties of the model, ǫ, and implicitly the corresponding likelihood, sGL or nsGL, different hierarchical models can be proposed:

5.1 Laplace hierarchical model: stationary Gaussian uncertainties model, known uncer- tainties variance the hierarchical model is using as a prior the Laplace distribution; • the Laplace prior distribution is expressed via LPM, Equation (43), considering the variance vf unknown; • the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ; •

78 for the uncertainties vector ǫ a stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaus- • sian distribution is used under the following two assumptions: a) each element of the uncertainties vector has the same variance, vǫ; b) the variance vǫ is known;

N − 2 1 Likelihood : sGL: p (g f, vǫ)= (g Hf, vǫI) vǫ exp (g Hf) . | N | ∝ − 2vǫ k − k n o 1 1 −1 1 2 p(f 0, vf )= (f 0, V ) det V f 2 exp V f , | N | f ∝ { } − 2 k f k (268)  b n o b P rior : LPM: p(v b )= M (v 1, f ) M v−2 exp M f ,  f f j=1 f j 2 j=1 f j j=1 2v  | IG | ∝ − fj Laplace sGL K  n o Q V f =Qdiag [vf ] , vf = ...,P vf j ,... .     5.2 Laplace hierarchical model: non-stationary Gaussian uncertainties model, known un- certainties variances the hierarchical model is using as a prior the Laplace distribution; • the Laplace prior distribution is expressed via LPM, Equation (43), considering the variance vf unknown; • the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ; • for the uncertainties vector ǫ a stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaus- • sian distribution is used under the following assumption: a) the variance vector vǫ is known;

N 1 1 − 2 1 − 2 p (g f, vǫ)= (g Hf, V ǫ) vǫ exp V ǫ (g Hf) , Likelihood : nsGL: | N | ∝ i −2k − k  i=1    Y  V ǫ = diag [vǫ] , vǫ = [...vǫi ,...] .

 1 1 (269)  −1 2 1 2 p(f 0, vf )= (f 0, V ) det V f exp V f , | N | f ∝ { } − 2 k f k  b n o b LPM: M f M −2 M f P rior : p(vf bf )= (vf j 1, ) v exp ,

Laplace nsGL K  j=1 2 j=1 f j j=1 2v  | IG | ∝ − fj  n o Q V f =Qdiag [vf ] , vf = ...,P vf j ,... .     5.3 Laplace hierarchical model: stationary Gaussian uncertainties model, unknown uncer- tainties variance the hierarchical model is using as a prior the Laplace distribution; • the Laplace prior distribution is expressed via LPM, Equation (43), considering the variance vf as unknown; • the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ; • for the uncertainties vector ǫ a stationary Gaussian uncertainties model is proposed, i.e. a multivariate Gaus- • sian distribution is used under the following two assumptions: a) each element of the uncertainties vector has the same variance, vǫ; b) the variance vǫ is unknown;

79 . alc irrhclmdl o-ttoayGusa u Gaussian non-stationary model: hierarchical Laplace 5.4 . alc irrhclmdl ttoaySuettunce Student-t stationary model: hierarchical Laplace 5.5 • • • • • • • • Laplace nsGL UNK Laplace sStL UNK Laplace sGL UNK etdsrbto xrse via expressed distribution dent )tevariance the b) )ec lmn fteucranisvco a h same the has vector uncertainties the of element each a) o h netite vector uncertainties the for the o h netite vector uncertainties the for the h irrhclmdli sn sa as using is model hierarchical the h irrhclmdli sn sa as using is model hierarchical the h alc ro itiuini xrse via expressed is distribution prior Laplace the h alc ro itiuini xrse via expressed is distribution prior Laplace the )tevrac vector assumption: variance following the the a) under used is distribution sian netite variances uncertainties anisvariance tainties rior P Likelihood rior P Likelihood Likelihood rior P likelihood likelihood : : : : LPM: sdrvdfo h itiuinpooe o oeln the modelling for proposed distribution the from derived is sdrvdfo h itiuinpooe o oeln the modelling for proposed distribution the from derived is nsGL: : : v LPM: LPM: sStL: sGL: ǫ is unknown v                      ǫ                       is p p p p p p ( ( ǫ unknown p p p p ǫ v f ( ( ( ( ( ( ( ( a g g v g f a | v f v f 0 ; | ttoayGusa netite model uncertainties Gaussian stationary ǫ | | | StPM f f f ttoaySuettucranismodel uncertainties Student-t stationary | | b f f , | 0 0 α f | | v , b b , , , , = ) ǫ v f f f v v v v β , ǫ = ) prior prior = ) = ) ǫ ǫ f f = ) = ) = ) ; = ) = ) ǫ sue ne h olwn w assumptions: two following the under used is Q = ) N Q Q j M N N N =1 N N the the ( j M j M IG f =1 =1 ( ( ( ( ( g IG g g | f f Laplace Laplace 0 | LPM LPM | | IG IG ( H | | , H H v 0 0 ( v ǫ V , , f ( ( f | f f α v v V V , f − 80 j , , qain( Equation , qain( Equation , f f ǫ V | v v 1 f − f − β , 1 j j ǫ ǫ ) | | , 1 1 ǫ distribution; distribution; I I 1 1 ∝ ǫ ) ) ) , , ) ) variance b ) 2 V ∝ f ∝ ∝ ∝ ∝ ∝ det b b 2 2 ) V V f f f i Y det det ∝ ) ) =1 v v N v f f = { ǫ − ǫ − ǫ − ∝ ∝ V Q v V = = { { 43 43 α N N 2 2 diag ǫ − V V Q Q ǫ f i j M ǫ , − diag diag ,cnieigtevariance the considering ), ,cnieigtevariance the considering ), 2 =1 1 } v exp exp f f = j M j M 1 exp ǫ 2 1 =1 =1 } } [ ; v tite oe,ukonuncer- unknown model, rtainties v exp diag 2 2 1 1 exp [ [ f − n n f v v v v j  exp exp 2 ] f − f − − − f f cranismdl unknown model, ncertainties , j j n 2 2 − ] ] netite vector uncertainties netite vector uncertainties spooe,ie utvraeGaus- multivariate a i.e. proposed, is [ exp n 2 2 v , , v spooe,ie utvraeStu- multivariate a i.e. proposed, is β v 1 1 v v exp exp − 2 1 n n ǫ f ǫ ǫ ǫ ǫ v v ] k − − o k k 2 1 f f n , = V ( ( k , − 2 2 1 1 n n v = = g g V  ǫ − k k ǫ − − , . . . P − − V V   1 2 f [ = 2 1 , . . . , . . . P P ( f f f H H j M 2 1 2 1 g =1 k f f v , . . . j M j M − f f o =1 =1 f k k v v j ) ) 2 o o f f , . . . , H v b k k j j 2 2 v f f , , . . . , . . . , o o v v b b j ǫ v v f f f f f i o . , ǫ ǫ  j j . . . , f f ) ; ; o o . , k   sunknown; as sunknown; as  . . , , ] . 
, (270) (272) (271) 5.6 Laplace hierarchical model: non-stationary Student-t uncertainties model, unknown uncertainties variances the hierarchical model is using as a prior the Laplace distribution; • the Laplace prior distribution is expressed via LPM, Equation (43), considering the variance vf as unknown; • the likelihood is derived from the distribution proposed for modelling the uncertainties vector ǫ; • for the uncertainties vector ǫ a non-stationary Student-t uncertainties model is proposed, i.e. a multivariate • Student-t distribution expressed via StPM is used under the following assumption: a) the variance vector vǫ is unknown;

N 1 1 − 2 1 − 2 p (g f, vǫ)= (g Hf, V ǫ) vǫ exp V ǫ (g Hf) , | N | ∝ i −2k − k  i=1 Y   Likelihood : nsStL:  N N −αǫ−1 N βǫ  p (vǫ αǫ,βǫ)= (vǫ αǫ,βǫ) v exp ,  i=1 i i=1 ǫi i=1 vǫ  | IG | ∝ − i n o Q VQǫ = diag [vǫ] , vǫ = [...,P vǫi ,...] ,   1 1  −1 1 2  p(f 0, vf )= (f 0, V ) det V f 2 exp V f , | N | f ∝ { } − 2 k f k  M βf M −2 n Mo βf P rior : LPM: p(vf βf )= (vf 1, ) v exp ,  j=1 j 2 j=1 f j j=1 2vf Laplace nsStL UNK  | IG | ∝ − j  n o Q V f =Qdiag [vf ] , vf = ...,P vf j ,... .     (273) The hierarchical model build over the linear forward model, Equation (1), using as a prior for f a Laplace distribution, expressed via the Laplace Prior Model (LPM), Equation (43) and modelling the uncertainties of the model ǫ using the non-stationary Student Uncertainties Model (nsStUM), Equation (81), is presented in Equation (273). The posterior distribution is obtained via the Bayes rule, Equation (274):

p(f, vf , vǫ g,βf , αǫβǫ) p (g f, vǫ) p (vǫ βǫ) p(f 0, vf ) p(vf βf ) | ∝ | | | | N N N 1 1 − 2 1 − 2 −(αǫ+1) βǫ vǫi exp V ǫ (g Hf) vǫi exp ∝ −2k − k − vǫ i=1 i=1 ( i=1 i ) (274) Y   Y X M M M 1 1 1 1 βf v 2 exp V 2 f v−2 exp f i f f j −2k k −2 vf  j=1 j=1 j=1 j Y   Y  X  The goal is to estimate the unknowns of the hierarchical model, namely f, the main unknown of the linear forward model, Equation (1) which was suppose sparse, and consequently modelled via the Laplace distribution and the two variances appearing in the hierarchical model, Equation (273), the variance corresponding to the sparse structure f, namely vf and the variance corresponding to uncertainties of model ǫ, namely vǫ.

5.6.1 Joint MAP estimation First, the Joint Maximum A Posterior (JMAP) estimation is considered: the unknowns are estimated on the basis of the available data g, by maximizing the posterior distribution:

f, vf , vǫ = argmax p(f, vf , vǫ g, βf , αǫ, βǫ) = arg min (f, vf , vǫ), (275) f , v , v | f , v , v L   ( f ǫ) ( f ǫ) b b b where for the second equality the criterion (f, vf , vǫ) is defined as: L

(f, vf ,, vǫ)= ln p(f, vf , vǫ g, βf , αǫ, βǫ) (276) L − |

81 The MAP estimation correspondsto the solution minimizing the criterion (f, vf , vǫ). From the analytical expression of the posterior distribution, Equation (274) and the definition of the criterionL , Equation (276), we obtain: L N N 1 1 − 2 3 βǫ (f, vǫ, vf )= ln p(f, vǫ, vf g βf , αǫ, βǫ)= V ǫ (g Hf) + αǫ + ln vǫi + L − | 2k − k 2 vǫ i=1 i=1 i   X X (277) M M 1 1 2 3 1 βf + V f f + ln vf j + 2k k 2 2 vf j=1 j=1 j X X

One of the simplest optimisation algorithm that can be used is an alternate optimization of the criterion (f, vǫ, vf ) with respect to the each unknown: L With respect to f: •

∂ (f, vf , vǫ) ∂ − 1 1 T −1 L =0 V ǫ 2 (g Hf) + V f 2 f = H V ǫ (g Hf)+ V f f =0 ∂f ⇔ ∂f k − k k k − − T −1 T −1  H V ǫ H + V f f = H V ǫ g ⇔ T −1 −1 T −1 f = H V ǫ H +V f H V ǫ g ⇒  b

With respect to vf , j 1, 2,...,M : • ∈{ } ∂ (f, v , v ) ∂ f f =0 3 ln v + β v−1 + f 2v =0 L f j f f j j f j ∂vf j ⇔ ∂vf j 2 2  f j v +3vf βf =0 ⇔ f j j − 2 3 9 4f j βf vf j = − ± −2 ⇒ p2f j b With respect to vǫi , i 1, 2,...,N : • ∈{ − 1 } First, we develop the norm V ǫ 2 (g Hf) : k − k − 1 T −1 T −1 T T −1 V ǫ 2 (g Hf) = g V ǫ g 2g V ǫ Hf + H f V ǫ Hf k − k − N N N −1 2 −1 −1 T T = v gi 2 v giHif + v f Hi Hif, ǫi − ǫi ǫi i=1 i=1 i=1 X X X

where Hi denotes the line i of the matrix H, i 1, 2,...,N , i.e. Hi = [hi , hi ,..., hiM ]. ∈{ } 1 1

∂ (f, vf , vǫ) ∂ 3 1 2 T T −1 L =0 αǫ + ln vǫ + βǫ + gi 2giHif + f Hi Hif v =0 ∂v ⇔ ∂v 2 i 2 − ǫi ǫi ǫi      3 1 2  αǫ + vǫ βǫ + (gi Hif) =0 ⇔ 2 i − 2 −     1 2 βǫ + 2 (gi Hif) vǫi = −3 ⇒ αǫ + 2 b The iterative algorithm obtained via JMAP estimation is presented Figure (36).

82 Initialization

1 T 1 − T 1 f = H V ǫ− H + V f H V ǫ− g

 (a) - update estimated f b b b b V f = diag [vf ] (d) b b 2 3 √9 4fj βf v = − ± −2 f j 2f j

(b) - update estimated variances vf j b V ǫ = diag [vǫ] (e)

b 2 b 1 βǫ+ gi H if 2 ( − ) vǫi = 3 αǫ+ 2

(c) - update estimated uncertainties variances vǫi b

Figure 36: Iterative algorithm corresponding to Joint MAP estimation for Laplace hierarchical model, non-stationary Student-t uncertainties model

5.6.2 Posterior Mean estimation via VBA, partial separability In this subsection, the Posterior Mean (PM) estimation is considered. The Joint MAP computes the mod of the posterior distribution. The PM computes the mean of the posterior distribution. One of the advantages of this estimator is that it minimizes the Mean Square Error (MSE). Computing the posterior means of any unknown needs great dimensional integration. For example, the mean corresponding to f can be computed from the posterior distribution using Equation (278),

$$
E_p\{\mathbf{f}\} = \iiint \mathbf{f}\, p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)\, \mathrm{d}\mathbf{f}\, \mathrm{d}\mathbf{v}_f\, \mathrm{d}\mathbf{v}_\epsilon.
\tag{278}
$$

In general, these computations are not easy. One way to obtain approximate estimates is to approximate $p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$ by a separable law $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) = q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)$, then to compute the posterior means using the separability. The mean corresponding to $\mathbf{f}$ is computed using the corresponding separable distribution $q_1(\mathbf{f})$, Equation (279),

$$
E_{q_1}\{\mathbf{f}\} = \int \mathbf{f}\, q_1(\mathbf{f})\, \mathrm{d}\mathbf{f}.
\tag{279}
$$

If the approximation of the posterior distribution with a separable one can be done in such a way that it conserves the mean, i.e. Equation (280),

$$
E_q\{x\} = E_p\{x\},
\tag{280}
$$

for all the unknowns of the model, a great amount of computational cost is saved. In particular, for the proposed hierarchical model, Equation (96), the posterior distribution, Equation (274), is not separable, making the analytical computation of the PM very difficult. One way to compute the PM in this case is to first approximate the posterior law $p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$ with a separable law $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, Equation (281),

$$
p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \approx q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) = q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)
\tag{281}
$$

where the notations from Equation (282) are used,

$$
q_2(\mathbf{v}_f) = \prod_{j=1}^{M} q_{2j}(v_{f_j}); \qquad q_3(\mathbf{v}_\epsilon) = \prod_{i=1}^{N} q_{3i}(v_{\epsilon_i}),
\tag{282}
$$

by minimizing the Kullback-Leibler divergence, defined as:

$$
\mathrm{KL}\left( q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) : p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right) = \iiint q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \ln \frac{q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)}{p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)}\, \mathrm{d}\mathbf{f}\, \mathrm{d}\mathbf{v}_\epsilon\, \mathrm{d}\mathbf{v}_f
\tag{283}
$$

where the notations from Equation (284) are used,

$$
\mathrm{d}\mathbf{v}_f = \prod_{j=1}^{M} \mathrm{d}v_{f_j}; \qquad \mathrm{d}\mathbf{v}_\epsilon = \prod_{i=1}^{N} \mathrm{d}v_{\epsilon_i}.
\tag{284}
$$

Equation (282) selects a partial separability for the approximated posterior distribution $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, in the sense that a total separability is imposed for the distributions corresponding to the two variances appearing in the hierarchical model, $q_2(\mathbf{v}_f)$ and $q_3(\mathbf{v}_\epsilon)$, but not for the distribution corresponding to $\mathbf{f}$. Evidently, a full separability can be imposed, by adding the supplementary condition $q_1(\mathbf{f}) = \prod_{j=1}^{M} q_{1j}(f_j)$ in Equation (282). This case is considered in Subsection (5.6.3). The minimization can be done via alternate optimization, resulting in the following proportionalities, Equations (285a), (285b) and (285c):

$$
q_1(\mathbf{f}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} \right\},
\tag{285a}
$$
$$
q_{2j}(v_{f_j}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} \right\}, \quad j \in \{1, 2, \ldots, M\},
\tag{285b}
$$
$$
q_{3i}(v_{\epsilon_i}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} \right\}, \quad i \in \{1, 2, \ldots, N\},
\tag{285c}
$$

using the notations:

$$
q_{2-j}(v_{f_j}) = \prod_{k=1, k \neq j}^{M} q_{2k}(v_{f_k}); \qquad q_{3-i}(v_{\epsilon_i}) = \prod_{k=1, k \neq i}^{N} q_{3k}(v_{\epsilon_k}),
\tag{286}
$$

and also

$$
\langle u(x) \rangle_{v(y)} = \int u(x)\, v(y)\, \mathrm{d}y.
\tag{287}
$$

Via Equation (276) and Equation (277), the analytical expression of the logarithm of the posterior distribution is obtained, Equation (288):

$$
\ln p(\mathbf{f}, \mathbf{v}_\epsilon, \mathbf{v}_f \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) = -\frac{1}{2} \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g} - \mathbf{H}\mathbf{f}) \right\|^2 - \left( \alpha_\epsilon + \frac{3}{2} \right) \sum_{i=1}^{N} \ln v_{\epsilon_i} - \sum_{i=1}^{N} \frac{\beta_\epsilon}{v_{\epsilon_i}} - \frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 - \frac{3}{2} \sum_{j=1}^{M} \ln v_{f_j} - \frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}}
\tag{288}
$$
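The proportionalities (285a)-(285c) are the standard mean-field (VBA) fixed point. As a brief reminder of the step used implicitly here: fixing all factors but $q_1$ and regarding the divergence (283) as a functional of $q_1$ alone gives

$$
\mathrm{KL}(q : p) = \left\langle \ln q_1(\mathbf{f}) \right\rangle_{q_1} - \left\langle \left\langle \ln p \right\rangle_{q_2 q_3} \right\rangle_{q_1} + \mathrm{const},
$$

and this functional is minimized by $q_1(\mathbf{f}) \propto \exp\left\{ \langle \ln p \rangle_{q_2 q_3} \right\}$; the same argument applied to each $q_{2j}$ and $q_{3i}$ yields (285b) and (285c).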

Computation of the analytical expression of $q_1(\mathbf{f})$. The proportionality relation corresponding to $q_1(\mathbf{f})$ is established in Equation (285a). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, all the terms free of $\mathbf{f}$ can be regarded as constants. Via Equation (288), the integral defined in Equation (287) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_3(\mathbf{v}_\epsilon)} - \frac{1}{2} \left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_2(\mathbf{v}_f)}.
\tag{289}
$$

Introducing the notations:

$$
\tilde{v}_{f_j} = \left\langle v_{f_j} \right\rangle_{q_{2j}(v_{f_j})}; \quad \tilde{\mathbf{v}}_f = \left[ \tilde{v}_{f_1} \ldots \tilde{v}_{f_j} \ldots \tilde{v}_{f_M} \right]^T; \quad \tilde{\mathbf{V}}_f = \mathrm{diag}\left[ \tilde{\mathbf{v}}_f \right]; \qquad \tilde{v}_{\epsilon_i}^{-1} = \left\langle v_{\epsilon_i}^{-1} \right\rangle_{q_{3i}(v_{\epsilon_i})}; \quad \tilde{\mathbf{v}}_\epsilon^{-1} = \left[ \tilde{v}_{\epsilon_1}^{-1} \ldots \tilde{v}_{\epsilon_i}^{-1} \ldots \tilde{v}_{\epsilon_N}^{-1} \right]^T; \quad \tilde{\mathbf{V}}_\epsilon^{-1} = \mathrm{diag}\left[ \tilde{\mathbf{v}}_\epsilon^{-1} \right],
\tag{290}
$$

the integral from Equation (289) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 - \frac{1}{2} \left\| \tilde{\mathbf{V}}_f^{\frac{1}{2}} \mathbf{f} \right\|^2.
\tag{291}
$$

Noting that $\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \alpha_\epsilon, \beta_f, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)}$ is a quadratic criterion in $\mathbf{f}$ and considering the proportionality from Equation (285a), it can be concluded that $q_1(\mathbf{f})$ is a multivariate Normal distribution. Minimizing the criterion leads to the analytical expression of the corresponding mean; the covariance matrix is obtained by identification:

$$
q_1(\mathbf{f}) = \mathcal{N}\left(\mathbf{f} \mid \hat{\mathbf{f}}, \hat{\boldsymbol{\Sigma}}\right), \quad
\begin{cases}
\hat{\mathbf{f}} = \left( \mathbf{H}^T \tilde{\mathbf{V}}_\epsilon^{-1} \mathbf{H} + \tilde{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \tilde{\mathbf{V}}_\epsilon^{-1} \mathbf{g} \\[4pt]
\hat{\boldsymbol{\Sigma}} = \left( \mathbf{H}^T \tilde{\mathbf{V}}_\epsilon^{-1} \mathbf{H} + \tilde{\mathbf{V}}_f \right)^{-1}
\end{cases}
\tag{292}
$$

We note that both the expressions of the mean and of the covariance depend on expectations corresponding to the two variances of the hierarchical model.

Computation of the analytical expression of $q_{2j}(v_{f_j})$. The proportionality relation corresponding to $q_{2j}(v_{f_j})$ is established in Equation (285b). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, all the terms free of $v_{f_j}$ can be regarded as constants. Via Equation (288), the integral defined in Equation (287) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})} - \frac{3}{2} \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1}.
\tag{293}
$$

Introducing the notations:

$$
\tilde{\mathbf{v}}_{f_{-j}} = \left[ \tilde{v}_{f_1} \ldots \tilde{v}_{f_{j-1}}\; v_{f_j}\; \tilde{v}_{f_{j+1}} \ldots \tilde{v}_{f_M} \right]^T; \qquad \tilde{\mathbf{V}}_{f_{-j}} = \mathrm{diag}\left[ \tilde{\mathbf{v}}_{f_{-j}} \right],
\tag{294}
$$

the integral $\left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})}$ can be written:

$$
\left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})} = \left\langle \left\| \tilde{\mathbf{V}}_{f_{-j}}^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})}.
\tag{295}
$$

Considering that $q_1(\mathbf{f})$ is a multivariate Normal distribution, Equation (292):

$$
\left\langle \left\| \tilde{\mathbf{V}}_{f_{-j}}^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})} = \left\| \tilde{\mathbf{V}}_{f_{-j}}^{\frac{1}{2}} \hat{\mathbf{f}} \right\|^2 + \mathrm{Tr}\left( \tilde{\mathbf{V}}_{f_{-j}} \hat{\boldsymbol{\Sigma}} \right) = C + \left( \hat{f}_j^2 + \hat{\Sigma}_{jj} \right) v_{f_j}.
\tag{296}
$$

From Equation (293) and Equation (296):

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_\epsilon, \mathbf{v}_f \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} = -\frac{3}{2} \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1} - \frac{1}{2} \left( \hat{f}_j^2 + \hat{\Sigma}_{jj} \right) v_{f_j},
\tag{297}
$$

from which the proportionality corresponding to $q_{2j}(v_{f_j})$ can be established:

$$
q_{2j}(v_{f_j}) \propto v_{f_j}^{-\frac{3}{2}} \exp\left\{ -\frac{1}{2} \left[ \left( \hat{f}_j^2 + \hat{\Sigma}_{jj} \right) v_{f_j} + \beta_f v_{f_j}^{-1} \right] \right\},
\tag{298}
$$

leading to the conclusion that $q_{2j}(v_{f_j})$ is a generalized inverse Gaussian distribution (see Equation (46)) with the following parameters:

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \hat{\Sigma}_{jj} \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}
\tag{299}
$$

Computation of the analytical expression of $q_{3i}(v_{\epsilon_i})$. The proportionality relation corresponding to $q_{3i}(v_{\epsilon_i})$ is established in Equation (285c). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, all the terms free of $v_{\epsilon_i}$ can be regarded as constants. Via Equation (288), the integral defined in Equation (287) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})} - \left( \alpha_\epsilon + \frac{3}{2} \right) \ln v_{\epsilon_i} - \frac{\beta_\epsilon}{v_{\epsilon_i}}.
\tag{300}
$$

Introducing the notations:

$$
\tilde{\mathbf{v}}_{\epsilon_{-i}}^{-1} = \left[ \tilde{v}_{\epsilon_1}^{-1} \ldots \tilde{v}_{\epsilon_{i-1}}^{-1}\; v_{\epsilon_i}^{-1}\; \tilde{v}_{\epsilon_{i+1}}^{-1} \ldots \tilde{v}_{\epsilon_N}^{-1} \right]^T; \qquad \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-1} = \mathrm{diag}\left[ \tilde{\mathbf{v}}_{\epsilon_{-i}}^{-1} \right],
\tag{301}
$$

the integral $\left\langle \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})}$ can be written:

$$
\left\langle \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})} = \left\langle \left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})}.
\tag{302}
$$

Considering that $q_1(\mathbf{f})$ is a multivariate Normal distribution, Equation (292):

$$
\left\langle \left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})} = \left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-\frac{1}{2}} \left( \mathbf{g}-\mathbf{H}\hat{\mathbf{f}} \right) \right\|^2 + \mathrm{Tr}\left( \mathbf{H}^T \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-1} \mathbf{H} \hat{\boldsymbol{\Sigma}} \right),
\tag{303}
$$

and considering as constants all terms free of $v_{\epsilon_i}$:

$$
\left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-\frac{1}{2}} \left( \mathbf{g}-\mathbf{H}\hat{\mathbf{f}} \right) \right\|^2 = C + v_{\epsilon_i}^{-1} \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2; \qquad \mathrm{Tr}\left( \mathbf{H}^T \tilde{\mathbf{V}}_{\epsilon_{-i}}^{-1} \mathbf{H} \hat{\boldsymbol{\Sigma}} \right) = C + v_{\epsilon_i}^{-1} \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T,
\tag{304}
$$

where $\mathbf{H}_i$ is the line $i$ of the matrix $\mathbf{H}$, so we can conclude:

$$
\left\langle \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})} = C + \left[ \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right] v_{\epsilon_i}^{-1}.
\tag{305}
$$

From Equation (300) and Equation (305):

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} = -\left( \alpha_\epsilon + \frac{3}{2} \right) \ln v_{\epsilon_i} - \left[ \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right) \right] v_{\epsilon_i}^{-1},
\tag{306}
$$

from which the proportionality corresponding to $q_{3i}(v_{\epsilon_i})$ can be established:

$$
q_{3i}(v_{\epsilon_i}) \propto v_{\epsilon_i}^{-\left(\alpha_\epsilon + \frac{3}{2}\right)} \exp\left\{ -\left[ \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right) \right] v_{\epsilon_i}^{-1} \right\},
\tag{307}
$$

leading to the conclusion that $q_{3i}(v_{\epsilon_i})$ are Inverse Gamma distributions with the following parameters:

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{IG}\left( v_{\epsilon_i} \mid \hat{\alpha}_{\epsilon_i}, \hat{\beta}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{\alpha}_{\epsilon_i} = \alpha_\epsilon + \frac{1}{2} \\[4pt]
\hat{\beta}_{\epsilon_i} = \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right)
\end{cases}
\tag{308}
$$

Equations (292), (299) and (308) summarize the distribution families and the corresponding parameters: $q_1(\mathbf{f})$ is a multivariate Normal distribution, $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$, are generalized inverse Gaussian distributions and $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$, are Inverse Gamma distributions. However, the parameters corresponding to the multivariate Normal distribution are expressed via $\tilde{\mathbf{V}}_\epsilon^{-1}$ and $\tilde{\mathbf{V}}_f$ (and by extension all the elements forming the two diagonal matrices, $\tilde{v}_{\epsilon_i}^{-1}$, $i \in \{1, 2, \ldots, N\}$ and $\tilde{v}_{f_j}$, $j \in \{1, 2, \ldots, M\}$).

Computation of the analytical expression of $\tilde{\mathbf{V}}_f$. For a generalized inverse Gaussian distribution with parameters $a$, $b$ and $c = -\frac{1}{2}$, $\mathcal{GIG}(x \mid a, b, -\frac{1}{2})$, the following relation holds:

$$
\langle x \rangle_{\mathcal{GIG}\left(x \mid a, b, -\frac{1}{2}\right)} = \frac{\mathcal{K}_{\frac{1}{2}}\left(\sqrt{ab}\right)}{\mathcal{K}_{-\frac{1}{2}}\left(\sqrt{ab}\right)} \left( \frac{a}{b} \right)^{-\frac{1}{2}} = \sqrt{\frac{b}{a}},
\tag{309}
$$

where $\mathcal{K}_p$ represents the modified Bessel function of the second kind. To prove the above relation we consider the direct computation, using the analytical expression of the generalized inverse Gaussian distribution:

$$
\langle x \rangle = \int x\, \mathcal{GIG}\left(x \mid a, b, -\tfrac{1}{2}\right) \mathrm{d}x = \frac{\left(a/b\right)^{-\frac{1}{4}}}{2 \mathcal{K}_{-\frac{1}{2}}\left(\sqrt{ab}\right)} \int x^{\frac{1}{2}-1} \exp\left\{ -\frac{1}{2}\left( ax + bx^{-1} \right) \right\} \mathrm{d}x = \frac{\left(a/b\right)^{-\frac{1}{4}}}{2 \mathcal{K}_{-\frac{1}{2}}\left(\sqrt{ab}\right)} \cdot 2 \mathcal{K}_{\frac{1}{2}}\left(\sqrt{ab}\right) \left( \frac{b}{a} \right)^{\frac{1}{4}} = \sqrt{\frac{b}{a}}\, \frac{\mathcal{K}_{\frac{1}{2}}\left(\sqrt{ab}\right)}{\mathcal{K}_{-\frac{1}{2}}\left(\sqrt{ab}\right)},
$$

where the remaining integral was recognized, up to its normalization constant, as a $\mathcal{GIG}(x \mid a, b, \frac{1}{2})$ density, which integrates to one. Proving that the ratio of the two modified Bessel functions of the second kind equals $1$, i.e. $\mathcal{K}_{\frac{1}{2}}(\sqrt{ab}) = \mathcal{K}_{-\frac{1}{2}}(\sqrt{ab})$, comes from expressing the modified Bessel function of the second kind $\mathcal{K}_\alpha(x)$ via the modified Bessel function of the first kind $\mathcal{I}_\alpha(x)$:

$$
\mathcal{K}_\alpha(x) = \frac{\pi}{2} \frac{\mathcal{I}_{-\alpha}(x) - \mathcal{I}_{\alpha}(x)}{\sin(\alpha\pi)} = \frac{\pi}{2} \frac{\mathcal{I}_{\alpha}(x) - \mathcal{I}_{-\alpha}(x)}{\sin(-\alpha\pi)} = \mathcal{K}_{-\alpha}(x).
\tag{310}
$$

Since $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$, are generalized inverse Gaussian distributions with the corresponding parameters $\hat{a}_{f_j}$, $\hat{b}_{f_j}$ and $\hat{c}_{f_j} = -\frac{1}{2}$, the expectations $\tilde{v}_{f_j}$ can be expressed via the parameters of the generalized inverse Gaussian distributions using Equation (309):

$$
\tilde{v}_{f_j} = \sqrt{\frac{\hat{b}_{f_j}}{\hat{a}_{f_j}}}.
\tag{311}
$$

Using the notation introduced in (290):

$$
\tilde{\mathbf{V}}_f = \mathrm{diag}\left[ \sqrt{\hat{b}_{f_j} / \hat{a}_{f_j}} \right]_{j=1,\ldots,M} = \hat{\mathbf{V}}_f.
\tag{312}
$$

In Equation (312) another notation is introduced for $\tilde{\mathbf{V}}_f$. During the construction of the model this value was expressed via unknown expectations, but via Equation (312) it no longer contains any integrals to be computed. Therefore, the new notation represents the final analytical expression used for expressing the density functions $q_i$.
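Equation (309) can also be verified numerically. Below is a small sanity check in Python, assuming the GIG density parametrization used above, $f(x) \propto x^{c-1} \exp\{-\frac{1}{2}(ax + b/x)\}$; the parameter values $a = 2$, $b = 5$ are arbitrary illustrations.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import kv

# Numerical check of E[x] = sqrt(b/a) for GIG(a, b, c = -1/2) (Equation (309)).
a, b, lam = 2.0, 5.0, -0.5
norm = 2.0 * (b / a) ** (lam / 2) * kv(lam, np.sqrt(a * b))   # normalizing constant
pdf = lambda x: x ** (lam - 1) * np.exp(-(a * x + b / x) / 2.0) / norm
mass, _ = quad(pdf, 0, np.inf)                 # should be close to 1
mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)
print(mass, mean, np.sqrt(b / a))              # mean matches sqrt(b/a)
```

Both printed values agree with $\sqrt{b/a}$, consistent with the Bessel ratio in (309) being $1$ for $c = -\frac{1}{2}$.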

Computation of the analytical expression of $\tilde{\mathbf{V}}_\epsilon^{-1}$. For an Inverse Gamma distribution with shape and scale parameters $\alpha$ and $\beta$, $\mathcal{IG}(x \mid \alpha, \beta)$, the following relation holds:

$$
\left\langle x^{-1} \right\rangle_{\mathcal{IG}(x \mid \alpha, \beta)} = \frac{\alpha}{\beta}.
\tag{313}
$$

The proof of the above relation is done by direct computation, using the analytical expression of the Inverse Gamma distribution:

$$
\left\langle x^{-1} \right\rangle = \int x^{-1} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{-\alpha-1} \exp\left( -\frac{\beta}{x} \right) \mathrm{d}x = \frac{\beta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{\beta^{\alpha+1}} \underbrace{\int \frac{\beta^{\alpha+1}}{\Gamma(\alpha+1)} x^{-(\alpha+1)-1} \exp\left( -\frac{\beta}{x} \right) \mathrm{d}x}_{1} = \frac{\alpha}{\beta},
$$

where the remaining integral is the $\mathcal{IG}(x \mid \alpha+1, \beta)$ density, which integrates to one. Since $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$, are Inverse Gamma distributions with the corresponding parameters $\hat{\alpha}_{\epsilon_i}$ and $\hat{\beta}_{\epsilon_i}$, the expectations $\tilde{v}_{\epsilon_i}^{-1}$ can be expressed via the parameters of the Inverse Gamma distributions using Equation (313):

$$
\tilde{v}_{\epsilon_i}^{-1} = \frac{\hat{\alpha}_{\epsilon_i}}{\hat{\beta}_{\epsilon_i}}.
\tag{314}
$$

Using the notation introduced in (290):

$$
\tilde{\mathbf{V}}_\epsilon^{-1} = \mathrm{diag}\left[ \hat{\alpha}_{\epsilon_i} / \hat{\beta}_{\epsilon_i} \right]_{i=1,\ldots,N} = \hat{\mathbf{V}}_\epsilon^{-1}.
\tag{315}
$$

In Equation (315) another notation is introduced for $\tilde{\mathbf{V}}_\epsilon^{-1}$. During the construction of the model this value was expressed via unknown expectations, but via Equation (315) it no longer contains any integrals to be computed. Therefore, the new notation represents the final analytical expression used for expressing the density functions $q_i$.
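Relation (313) can likewise be checked by simulation; the snippet below is a Monte Carlo sanity check (the parameter values are arbitrary illustrations).

```python
import numpy as np
from scipy import stats

# Monte Carlo check of E[1/x] = alpha/beta for the Inverse Gamma (Equation (313)).
alpha, beta = 3.0, 2.0
x = stats.invgamma(alpha, scale=beta).rvs(size=200_000, random_state=0)
print(np.mean(1.0 / x), alpha / beta)   # both close to 1.5
```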

Using Equations (312) and (315) and Equations (292), (299) and (308), the final analytical expressions of the separable distributions $q_i$ are presented in Equations (316a), (316b) and (316c):

$$
q_1(\mathbf{f}) = \mathcal{N}\left(\mathbf{f} \mid \hat{\mathbf{f}}, \hat{\boldsymbol{\Sigma}}\right), \quad
\begin{cases}
\hat{\mathbf{f}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon^{-1} \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \hat{\mathbf{V}}_\epsilon^{-1} \mathbf{g} \\[4pt]
\hat{\boldsymbol{\Sigma}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon^{-1} \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1}
\end{cases}
\tag{316a}
$$

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \hat{\Sigma}_{jj} \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}, \quad j \in \{1, 2, \ldots, M\},
\tag{316b}
$$

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{IG}\left( v_{\epsilon_i} \mid \hat{\alpha}_{\epsilon_i}, \hat{\beta}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{\alpha}_{\epsilon_i} = \alpha_\epsilon + \frac{1}{2} \\[4pt]
\hat{\beta}_{\epsilon_i} = \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right)
\end{cases}, \quad i \in \{1, 2, \ldots, N\}.
\tag{316c}
$$

Equation (316a) establishes the dependency between the parameters corresponding to the multivariate Normal distribution $q_1(\mathbf{f})$ and the other parameters involved in the hierarchical model: the mean $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$ depend on $\hat{\mathbf{V}}_\epsilon^{-1}$ and $\hat{\mathbf{V}}_f$, which, via Equations (312) and (315), are defined using $\{\hat{a}_{f_j}, \hat{b}_{f_j}\}$, $j \in \{1, 2, \ldots, M\}$ and $\{\hat{\alpha}_{\epsilon_i}, \hat{\beta}_{\epsilon_i}\}$, $i \in \{1, 2, \ldots, N\}$. The dependency between the parameters of the multivariate Normal distribution $q_1(\mathbf{f})$ and the parameters of the distributions $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$ and $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$ is presented in Figure (37).

Figure 37: Dependency between $q_1(\mathbf{f})$ parameters and $q_{2j}(v_{f_j})$ and $q_{3i}(v_{\epsilon_i})$ parameters

Equation (316b) establishes the dependency between the parameters corresponding to the generalized inverse Gaussian distributions $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$ and the other parameters involved in the hierarchical model: the parameters $\{\hat{a}_{f_j}, \hat{b}_{f_j}\}$, $j \in \{1, 2, \ldots, M\}$ depend on the mean $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$ of the multivariate Normal distribution $q_1(\mathbf{f})$, Figure (38).

Figure 38: Dependency between $q_{2j}(v_{f_j})$ parameters and $q_1(\mathbf{f})$ and $q_{3i}(v_{\epsilon_i})$ parameters

Equation (316c) establishes the dependency between the parameters corresponding to the Inverse Gamma distributions $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$ and the other parameters involved in the hierarchical model: the shape and scale parameters $\{\hat{\alpha}_{\epsilon_i}, \hat{\beta}_{\epsilon_i}\}$, $i \in \{1, 2, \ldots, N\}$ depend on the mean $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$ of the multivariate Normal distribution $q_1(\mathbf{f})$, Figure (39).

Figure 39: Dependency between $q_{3i}(v_{\epsilon_i})$ parameters and $q_{2j}(v_{f_j})$ and $q_1(\mathbf{f})$ parameters

The iterative algorithm obtained via PM estimation is presented in Figure (40).

Initialization

(a) update estimated $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$: $\hat{\mathbf{f}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon^{-1} \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \hat{\mathbf{V}}_\epsilon^{-1} \mathbf{g}$; $\hat{\boldsymbol{\Sigma}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon^{-1} \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1}$

(b) update estimated $\mathcal{GIG}$ parameters modelling the variances $v_{f_j}$: $\hat{a}_{f_j} = \hat{f}_j^2 + \hat{\Sigma}_{jj}$; $\hat{b}_{f_j} = \beta_f$ (constant during the iterations)

(c) update estimated $\mathcal{IG}$ parameters modelling the uncertainties variances $v_{\epsilon_i}$: $\hat{\alpha}_{\epsilon_i} = \alpha_\epsilon + \frac{1}{2}$ (constant during the iterations); $\hat{\beta}_{\epsilon_i} = \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right)$

(d) $\hat{\mathbf{V}}_f = \mathrm{diag}\left[ \sqrt{\hat{b}_{f_j} / \hat{a}_{f_j}} \right]$

(e) $\hat{\mathbf{V}}_\epsilon^{-1} = \mathrm{diag}\left[ \hat{\alpha}_{\epsilon_i} / \hat{\beta}_{\epsilon_i} \right]$

Figure 40: Iterative algorithm corresponding to PM estimation via VBA - partial separability for Laplace hierarchical model, non-stationary Student-t uncertainties model
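The resulting fixed point can be sketched as the following Python loop, mirroring steps (a)-(e) of Figure 40; the function name, the initialization and the iteration budget are illustrative assumptions, not prescriptions of the original algorithm.

```python
import numpy as np

def vba_partial_laplace_student(H, g, beta_f, alpha_eps, beta_eps, n_iter=50):
    """Sketch of the VBA fixed point of Figure 40 (illustrative names)."""
    N, M = H.shape
    Vf = np.ones(M)     # current diagonal of V_f (expectations E[v_fj])
    Vei = np.ones(N)    # current diagonal of V_eps^{-1} (expectations E[1/v_eps_i])
    for _ in range(n_iter):
        # (a) Gaussian factor q1: covariance and mean
        Sigma = np.linalg.inv(H.T @ (Vei[:, None] * H) + np.diag(Vf))
        f = Sigma @ (H.T @ (Vei * g))
        # (b)+(d) GIG factors q2j: a_fj = f_j^2 + Sigma_jj, b_fj = beta_f,
        # and E[v_fj] = sqrt(b/a) for c = -1/2
        a_f = f**2 + np.diag(Sigma)
        Vf = np.sqrt(beta_f / a_f)
        # (c)+(e) IG factors q3i: E[1/v_eps_i] = alpha_hat/beta_hat,
        # with alpha_hat = alpha_eps + 1/2
        resid2 = (g - H @ f)**2 + np.einsum('ij,jk,ik->i', H, Sigma, H)
        Vei = (alpha_eps + 0.5) / (beta_eps + 0.5 * resid2)
    return f, Sigma
```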

5.6.3 Posterior Mean estimation via VBA, full separability

In this subsection, the Posterior Mean (PM) estimation is again considered, but via a fully separable approximation. The posterior distribution is approximated by a fully separable distribution $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)$, i.e. a supplementary condition is added in Equation (282):

$$
q_1(\mathbf{f}) = \prod_{j=1}^{M} q_{1j}(f_j); \qquad q_2(\mathbf{v}_f) = \prod_{j=1}^{M} q_{2j}(v_{f_j}); \qquad q_3(\mathbf{v}_\epsilon) = \prod_{i=1}^{N} q_{3i}(v_{\epsilon_i}).
\tag{317}
$$

As in Subsection (5.6.2), the approximation is done by minimizing the Kullback-Leibler divergence, Equation (283), via alternate optimization, resulting in the following proportionalities, Equations (318a), (318b) and (318c):

$$
q_{1j}(f_j) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} \right\}, \quad j \in \{1, 2, \ldots, M\},
\tag{318a}
$$
$$
q_{2j}(v_{f_j}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} \right\}, \quad j \in \{1, 2, \ldots, M\},
\tag{318b}
$$
$$
q_{3i}(v_{\epsilon_i}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} \right\}, \quad i \in \{1, 2, \ldots, N\},
\tag{318c}
$$

using the notations introduced in Equation (286), Equation (287) and Equation (319):

$$
q_{1-j}(f_j) = \prod_{k=1, k \neq j}^{M} q_{1k}(f_k).
\tag{319}
$$

The analytical expression of the logarithm of the posterior distribution, $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, was obtained in Equation (288).

Computation of the analytical expression of $q_{1j}(f_j)$. The proportionality relation corresponding to $q_{1j}(f_j)$ is established in Equation (318a). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$, all the terms free of $f_j$ can be regarded as constants:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_{1-j}(f_j)\, q_3(\mathbf{v}_\epsilon)} - \frac{1}{2} \left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)}.
\tag{320}
$$

Using Equation (290), the integral from Equation (320) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_{1-j}(f_j)} - \frac{1}{2} \left\langle \left\| \tilde{\mathbf{V}}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_{1-j}(f_j)}.
\tag{321}
$$

Considering all the terms free of $f_j$ as constants, the first norm can be written:

$$
\left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 = C + \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \mathbf{f}^{-j} \right) f_j,
\tag{322}
$$

where $\mathbf{H}^j$ represents the column $j$ of the matrix $\mathbf{H}$, $\mathbf{H}^{-j}$ represents the matrix $\mathbf{H}$ without the column $j$, and $\mathbf{f}^{-j}$ represents the vector $\mathbf{f}$ without the element $f_j$. Introducing the notation

$$
\tilde{f}_j = \int f_j\, q_{1j}(f_j)\, \mathrm{d}f_j; \qquad \tilde{\mathbf{f}}^{-j} = \left[ \tilde{f}_1 \ldots \tilde{f}_{j-1}\; \tilde{f}_{j+1} \ldots \tilde{f}_M \right]^T,
\tag{323}
$$

the expectation of the first norm becomes:

$$
\left\langle \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_{1-j}(f_j)} = C + \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right) f_j.
\tag{324}
$$

Considering all the terms free of $f_j$ as constants, the expectation of the second norm becomes:

$$
\left\langle \left\| \tilde{\mathbf{V}}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_{1-j}(f_j)} = C + \tilde{v}_{f_j} f_j^2.
\tag{325}
$$

From Equations (318a), (321), (324) and (325) the proportionality for $q_{1j}(f_j)$ becomes:

$$
q_{1j}(f_j) \propto \exp\left\{ -\frac{1}{2} \left[ \left( \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j} \right) f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right) f_j \right] \right\}.
\tag{326}
$$

Considering the criterion $J(f_j) = \left( \left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j} \right) f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right) f_j$, which is quadratic, we conclude that $q_{1j}(f_j)$ is a Normal distribution. For computing the mean of the Normal distribution, it is sufficient to compute the solution minimizing the criterion $J(f_j)$:

$$
\frac{\partial J(f_j)}{\partial f_j} = 0 \;\Leftrightarrow\; \hat{f}_j = \frac{\mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right)}{\left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j}}.
\tag{327}
$$

The variance can be obtained by identification. The analytical expressions for the mean and the variance corresponding to the Normal distributions $q_{1j}(f_j)$ are presented in Equation (328):

$$
q_{1j}(f_j) = \mathcal{N}\left( f_j \mid \hat{f}_j, \widehat{\mathrm{var}}_j \right), \quad
\begin{cases}
\hat{f}_j = \dfrac{\mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right)}{\left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j}} \\[10pt]
\widehat{\mathrm{var}}_j = \dfrac{1}{\left\| \tilde{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j}}
\end{cases}, \quad j \in \{1, 2, \ldots, M\}.
\tag{328}
$$

Computation of the analytical expression of $q_{2j}(v_{f_j})$. The proportionality relation corresponding to $q_{2j}(v_{f_j})$ established in Equation (318b) refers to $v_{f_j}$, so in the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$ all the terms free of $v_{f_j}$ can be regarded as constants,

$$
\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) = C + \frac{1}{2} \ln v_{f_j} - \frac{1}{2} f_j^2 v_{f_j} - 2 \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1},
\tag{329}
$$

so the integral of the logarithm becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} = C - \frac{3}{2} \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1} - \frac{1}{2} \left( \hat{f}_j^2 + \widehat{\mathrm{var}}_j \right) v_{f_j}.
\tag{330}
$$

Equation (330) leads to the conclusion that $q_{2j}(v_{f_j})$ is a generalized inverse Gaussian distribution. Equation (331) presents the analytical expressions of the corresponding parameters:

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \widehat{\mathrm{var}}_j \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}, \quad j \in \{1, 2, \ldots, M\}.
\tag{331}
$$

Computation of the analytical expression of $q_{3i}(v_{\epsilon_i})$. The proportionality relation corresponding to $q_{3i}(v_{\epsilon_i})$ established in Equation (318c) refers to $v_{\epsilon_i}$, so in the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon)$ all the terms free of $v_{\epsilon_i}$ can be regarded as constants:

$$
\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) = C - \left( \alpha_\epsilon + \frac{3}{2} \right) \ln v_{\epsilon_i} - \beta_\epsilon v_{\epsilon_i}^{-1} - \frac{1}{2} \left( g_i - \mathbf{H}_i \mathbf{f} \right)^2 v_{\epsilon_i}^{-1}.
\tag{332}
$$

Introducing the notation

$$
\left\langle \mathbf{f} \right\rangle_{q_1(\mathbf{f})} = \left[ \hat{f}_1 \ldots \hat{f}_j \ldots \hat{f}_M \right]^T \overset{\mathrm{Not}}{=} \hat{\mathbf{f}}; \qquad \hat{\boldsymbol{\Sigma}} = \mathrm{diag}\left[ \widehat{\mathrm{var}}_j \right],
\tag{333}
$$

the expectation of the logarithm becomes

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \alpha_\epsilon, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} = C - \left( \alpha_\epsilon + \frac{3}{2} \right) \ln v_{\epsilon_i} - \left[ \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right) \right] v_{\epsilon_i}^{-1},
\tag{334}
$$

so the proportionality relation for $q_{3i}(v_{\epsilon_i})$ from Equation (318c) can be written:

$$
q_{3i}(v_{\epsilon_i}) \propto v_{\epsilon_i}^{-\left(\alpha_\epsilon + \frac{3}{2}\right)} \exp\left\{ -\left[ \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right) \right] v_{\epsilon_i}^{-1} \right\}.
\tag{335}
$$

Equation (335) shows that $q_{3i}(v_{\epsilon_i})$ are Inverse Gamma distributions. The analytical expressions of the corresponding parameters are presented in Equation (336):

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{IG}\left( v_{\epsilon_i} \mid \hat{\alpha}_{\epsilon_i}, \hat{\beta}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{\alpha}_{\epsilon_i} = \alpha_\epsilon + \frac{1}{2} \\[4pt]
\hat{\beta}_{\epsilon_i} = \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right)
\end{cases}, \quad i \in \{1, 2, \ldots, N\}.
\tag{336}
$$

Since $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$, are generalized inverse Gaussian distributions and $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$, are Inverse Gamma distributions, it is easy to obtain analytical expressions for $\tilde{\mathbf{V}}_\epsilon^{-1}$, defined in Equation (290), and $\tilde{v}_{f_j}$, $j \in \{1, 2, \ldots, M\}$, obtaining the same expressions as in Equations (312) and (315). Using these and Equations (328), (331) and (336), the final analytical expressions of the separable distributions $q_i$ are presented in Equations (337a), (337b) and (337c):

$$
q_{1j}(f_j) = \mathcal{N}\left( f_j \mid \hat{f}_j, \widehat{\mathrm{var}}_j \right), \quad
\begin{cases}
\hat{f}_j = \dfrac{\mathbf{H}^{jT} \hat{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \hat{\mathbf{f}}^{-j} \right)}{\left\| \hat{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}} \\[10pt]
\widehat{\mathrm{var}}_j = \dfrac{1}{\left\| \hat{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}}
\end{cases}, \quad j \in \{1, 2, \ldots, M\},
\tag{337a}
$$

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \widehat{\mathrm{var}}_j \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}, \quad j \in \{1, 2, \ldots, M\},
\tag{337b}
$$

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{IG}\left( v_{\epsilon_i} \mid \hat{\alpha}_{\epsilon_i}, \hat{\beta}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{\alpha}_{\epsilon_i} = \alpha_\epsilon + \frac{1}{2} \\[4pt]
\hat{\beta}_{\epsilon_i} = \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right)
\end{cases}, \quad i \in \{1, 2, \ldots, N\}.
\tag{337c}
$$

Equations (337a), (337b) and (337c) establish dependencies between the parameters of the distributions, very similar to the ones presented in Figures (37), (38) and (39). The iterative algorithm obtained via PM estimation with full separability is presented in Figure (41).

Initialization

(a) update estimated $\hat{f}_j$ and the corresponding variance $\widehat{\mathrm{var}}_j$: $\hat{f}_j = \dfrac{\mathbf{H}^{jT} \hat{\mathbf{V}}_\epsilon^{-1} \left( \mathbf{g} - \mathbf{H}^{-j} \hat{\mathbf{f}}^{-j} \right)}{\left\| \hat{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}}$; $\widehat{\mathrm{var}}_j = \dfrac{1}{\left\| \hat{\mathbf{V}}_\epsilon^{-\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}}$

(b) update estimated $\mathcal{GIG}$ parameters modelling the variances $v_{f_j}$: $\hat{a}_{f_j} = \hat{f}_j^2 + \widehat{\mathrm{var}}_j$; $\hat{b}_{f_j} = \beta_f$ (constant during the iterations)

(c) update estimated $\mathcal{IG}$ parameters modelling the uncertainties variances $v_{\epsilon_i}$: $\hat{\alpha}_{\epsilon_i} = \alpha_\epsilon + \frac{1}{2}$ (constant during the iterations); $\hat{\beta}_{\epsilon_i} = \beta_\epsilon + \frac{1}{2} \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right)$

(d) $\hat{\mathbf{V}}_f = \mathrm{diag}\left[ \sqrt{\hat{b}_{f_j} / \hat{a}_{f_j}} \right]$

(e) $\hat{\mathbf{V}}_\epsilon^{-1} = \mathrm{diag}\left[ \hat{\alpha}_{\epsilon_i} / \hat{\beta}_{\epsilon_i} \right]$

Figure 41: Iterative algorithm corresponding to PM estimation via VBA - full separability for Laplace hierarchical model, non-stationary Student-t uncertainties model
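Under full separability the $\mathbf{f}$-update becomes coordinate-wise. A minimal Python sketch of the update (327)-(328) for a single coordinate follows, with illustrative names: `Ve_inv` holds the expectations $E[1/v_{\epsilon_i}]$ and `vf` the expectations $E[v_{f_j}]$.

```python
import numpy as np

# Coordinate-wise update of q1j(f_j), Equations (327)-(328); a sketch.
def update_fj(j, f, H, g, Ve_inv, vf):
    hj = H[:, j]
    r = g - H @ f + hj * f[j]             # g - H^{-j} f^{-j}
    denom = hj @ (Ve_inv * hj) + vf[j]    # ||Ve^{-1/2} H^j||^2 + E[v_fj]
    mean = hj @ (Ve_inv * r) / denom
    var = 1.0 / denom
    return mean, var
```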

5.7 Laplace hierarchical model: stationary Laplace uncertainties model, unknown uncertainties variance

• the hierarchical model uses as a prior the Laplace distribution;
• the Laplace prior distribution is expressed via LPM, Equation (43), considering the variance $\mathbf{v}_f$ as unknown;
• the likelihood is derived from the distribution proposed for modelling the uncertainties vector $\boldsymbol{\epsilon}$;
• for the uncertainties vector $\boldsymbol{\epsilon}$ a stationary Laplace uncertainties model is proposed, i.e. a multivariate Laplace distribution expressed via LPM is used under the following two assumptions: a) each element of the uncertainties vector has the same variance, $v_\epsilon$; b) the variance $v_\epsilon$ is unknown.

With these assumptions, the hierarchical model built over the linear forward model, Equation (1), using as a prior for $\mathbf{f}$ a Laplace distribution expressed via the Laplace Prior Model (LPM), Equation (43), and modelling the uncertainties of the model $\boldsymbol{\epsilon}$ via the stationary Laplace Uncertainties Model (sLUM), is presented in Equation (338):

$$
\text{Laplace sLL UNK:} \quad
\begin{cases}
\text{Likelihood, sLL:} & p(\mathbf{g} \mid \mathbf{f}, v_\epsilon) = \mathcal{N}\left( \mathbf{g} \mid \mathbf{H}\mathbf{f}, v_\epsilon^{-1}\mathbf{I} \right) \propto v_\epsilon^{\frac{N}{2}} \exp\left\{ -\frac{v_\epsilon}{2} \left\| \mathbf{g}-\mathbf{H}\mathbf{f} \right\|^2 \right\} \\[4pt]
& p(v_\epsilon \mid \beta_\epsilon) = \mathcal{IG}\left( v_\epsilon \mid 1, \frac{\beta_\epsilon}{2} \right) \propto v_\epsilon^{-2} \exp\left\{ -\frac{\beta_\epsilon}{2 v_\epsilon} \right\} \\[4pt]
\text{Prior, LPM:} & p(\mathbf{f} \mid \mathbf{0}, \mathbf{v}_f) = \mathcal{N}\left( \mathbf{f} \mid \mathbf{0}, \mathbf{V}_f^{-1} \right) \propto \det\left( \mathbf{V}_f \right)^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\}, \quad \mathbf{V}_f = \mathrm{diag}\left[ \mathbf{v}_f \right] \\[4pt]
& p(\mathbf{v}_f \mid \beta_f) = \prod_{j=1}^{M} \mathcal{IG}\left( v_{f_j} \mid 1, \frac{\beta_f}{2} \right) \propto \prod_{j=1}^{M} v_{f_j}^{-2} \exp\left\{ -\frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}} \right\}
\end{cases}
\tag{338}
$$

The posterior distribution is obtained via the Bayes rule:

$$
p(\mathbf{f}, \mathbf{v}_f, v_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \propto v_\epsilon^{\frac{N}{2}} \exp\left\{ -\frac{v_\epsilon}{2} \left\| \mathbf{g}-\mathbf{H}\mathbf{f} \right\|^2 \right\} v_\epsilon^{-2} \exp\left\{ -\frac{\beta_\epsilon}{2 v_\epsilon} \right\} \prod_{j=1}^{M} v_{f_j}^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\} \prod_{j=1}^{M} v_{f_j}^{-2} \exp\left\{ -\frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}} \right\}.
$$

The goal is to estimate the unknowns of the hierarchical model: $\mathbf{f}$, the main unknown of the linear forward model, Equation (1), which was supposed sparse and consequently modelled via the Laplace distribution, the variance $\mathbf{v}_f$ corresponding to the sparse structure $\mathbf{f}$, and the common variance $v_\epsilon$ corresponding to the uncertainties of the model $\boldsymbol{\epsilon}$.

5.8 Laplace hierarchical model: non-stationary Laplace uncertainties model, unknown uncertainties variances

• the hierarchical model uses as a prior the Laplace distribution;
• the Laplace prior distribution is expressed via LPM, Equation (43), considering the variance $\mathbf{v}_f$ as unknown;
• the likelihood is derived from the distribution proposed for modelling the uncertainties vector $\boldsymbol{\epsilon}$;
• for the uncertainties vector $\boldsymbol{\epsilon}$ a non-stationary Laplace uncertainties model is proposed, i.e. a multivariate Laplace distribution expressed via LPM is used under the following assumption: a) the variance vector $\mathbf{v}_\epsilon$ is unknown.

The hierarchical model built over the linear forward model, Equation (1), using as a prior for $\mathbf{f}$ a Laplace distribution expressed via the Laplace Prior Model (LPM), Equation (43), and modelling the uncertainties of the model $\boldsymbol{\epsilon}$ via the non-stationary Laplace Uncertainties Model (nsLUM), is presented in Equation (339):

$$
\text{Laplace nsLL UNK:} \quad
\begin{cases}
\text{Likelihood, nsLL:} & p(\mathbf{g} \mid \mathbf{f}, \mathbf{v}_\epsilon) = \mathcal{N}\left( \mathbf{g} \mid \mathbf{H}\mathbf{f}, \mathbf{V}_\epsilon^{-1} \right) \propto \det\left( \mathbf{V}_\epsilon \right)^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\}, \quad \mathbf{V}_\epsilon = \mathrm{diag}\left[ \mathbf{v}_\epsilon \right] \\[4pt]
& p(\mathbf{v}_\epsilon \mid \beta_\epsilon) = \prod_{i=1}^{N} \mathcal{IG}\left( v_{\epsilon_i} \mid 1, \frac{\beta_\epsilon}{2} \right) \propto \prod_{i=1}^{N} v_{\epsilon_i}^{-2} \exp\left\{ -\frac{\beta_\epsilon}{2} \sum_{i=1}^{N} \frac{1}{v_{\epsilon_i}} \right\} \\[4pt]
\text{Prior, LPM:} & p(\mathbf{f} \mid \mathbf{0}, \mathbf{v}_f) = \mathcal{N}\left( \mathbf{f} \mid \mathbf{0}, \mathbf{V}_f^{-1} \right) \propto \det\left( \mathbf{V}_f \right)^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\}, \quad \mathbf{V}_f = \mathrm{diag}\left[ \mathbf{v}_f \right] \\[4pt]
& p(\mathbf{v}_f \mid \beta_f) = \prod_{j=1}^{M} \mathcal{IG}\left( v_{f_j} \mid 1, \frac{\beta_f}{2} \right) \propto \prod_{j=1}^{M} v_{f_j}^{-2} \exp\left\{ -\frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}} \right\}
\end{cases}
\tag{339}
$$

The posterior distribution is obtained via the Bayes rule, Equation (340):

$$
p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \propto \prod_{i=1}^{N} v_{\epsilon_i}^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\} \prod_{i=1}^{N} v_{\epsilon_i}^{-2} \exp\left\{ -\frac{\beta_\epsilon}{2} \sum_{i=1}^{N} \frac{1}{v_{\epsilon_i}} \right\} \prod_{j=1}^{M} v_{f_j}^{\frac{1}{2}} \exp\left\{ -\frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\} \prod_{j=1}^{M} v_{f_j}^{-2} \exp\left\{ -\frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}} \right\}.
\tag{340}
$$

The goal is to estimate the unknowns of the hierarchical model: $\mathbf{f}$, which was supposed sparse and consequently modelled via the Laplace distribution, and the two variances appearing in the hierarchical model, Equation (339): the variance $\mathbf{v}_f$ corresponding to the sparse structure $\mathbf{f}$ and the variance $\mathbf{v}_\epsilon$ corresponding to the uncertainties of the model $\boldsymbol{\epsilon}$.
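As a quick illustration of the LPM mixture used throughout this section, the following Python snippet samples $f$ from the Normal scale mixture and checks the Laplace moment $E|f| = 1/\sqrt{\beta}$. It is a sketch assuming the parametrization reconstructed above, $v \sim \mathcal{IG}(1, \beta/2)$ on the variance, equivalently $1/v \sim \mathrm{Exp}(\beta/2)$, and $f \mid v \sim \mathcal{N}(0, 1/v)$.

```python
import numpy as np

# Sampling f ~ Laplace via the LPM Normal scale mixture (a sketch):
# u = 1/v ~ Exp(rate beta/2), f | u ~ N(0, u)  =>  f ~ Laplace(0, 1/sqrt(beta)).
rng = np.random.default_rng(0)
beta = 4.0
u = rng.gamma(shape=1.0, scale=2.0 / beta, size=100_000)  # Exp(beta/2) samples
f = rng.normal(0.0, np.sqrt(u))
print(np.mean(np.abs(f)), 1.0 / np.sqrt(beta))   # both close to 0.5
```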

b v ǫ ǫ v f f β f f j j ǫ v , ǫ j f ǫ H .Teposterior The ). i o o ǫ ; o  )  i sunknown; as , . . . , f . , . , ) ǫ k ] o sn the using , , (340) (339) (338) f , 5.8.1 Joint MAP estimation First, the Joint Maximum A Posterior (JMAP) estimation is considered: the unknowns are estimated on the basis of the available data g, by maximizing the posterior distribution:

$$
\left( \hat{\mathbf{f}}, \hat{\mathbf{v}}_f, \hat{\mathbf{v}}_\epsilon \right) = \arg\max_{(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)} p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) = \arg\min_{(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)} \mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon),
\tag{341}
$$

where for the second equality the criterion $\mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)$ is defined as:

$$
\mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon) = -\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon).
\tag{342}
$$

The MAP estimation corresponds to the solution minimizing the criterion $\mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)$. From the analytical expression of the posterior distribution, Equation (340), and the definition of the criterion $\mathcal{L}$, Equation (342), we obtain:

$$
\mathcal{L}(\mathbf{f}, \mathbf{v}_\epsilon, \mathbf{v}_f) = \frac{1}{2} \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 + \frac{3}{2} \sum_{i=1}^{N} \ln v_{\epsilon_i} + \frac{\beta_\epsilon}{2} \sum_{i=1}^{N} \frac{1}{v_{\epsilon_i}} + \frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 + \frac{3}{2} \sum_{j=1}^{M} \ln v_{f_j} + \frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}}
\tag{343}
$$

One of the simplest optimisation algorithms that can be used is an alternate optimization of the criterion $\mathcal{L}(\mathbf{f}, \mathbf{v}_\epsilon, \mathbf{v}_f)$ with respect to each unknown:

• With respect to $\mathbf{f}$:

$$
\frac{\partial \mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)}{\partial \mathbf{f}} = 0 \;\Leftrightarrow\; -\mathbf{H}^T \mathbf{V}_\epsilon (\mathbf{g}-\mathbf{H}\mathbf{f}) + \mathbf{V}_f \mathbf{f} = 0 \;\Leftrightarrow\; \left( \mathbf{H}^T \mathbf{V}_\epsilon \mathbf{H} + \mathbf{V}_f \right) \mathbf{f} = \mathbf{H}^T \mathbf{V}_\epsilon \mathbf{g} \;\Rightarrow\; \hat{\mathbf{f}} = \left( \mathbf{H}^T \mathbf{V}_\epsilon \mathbf{H} + \mathbf{V}_f \right)^{-1} \mathbf{H}^T \mathbf{V}_\epsilon \mathbf{g}
$$

• With respect to $v_{f_j}$, $j \in \{1, 2, \ldots, M\}$:

$$
\frac{\partial \mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)}{\partial v_{f_j}} = 0 \;\Leftrightarrow\; f_j^2 v_{f_j}^2 + 3 v_{f_j} - \beta_f = 0 \;\Rightarrow\; \hat{v}_{f_j} = \frac{-3 + \sqrt{9 + 4 f_j^2 \beta_f}}{2 f_j^2},
$$

retaining the positive root of the quadratic equation.

• With respect to $v_{\epsilon_i}$, $i \in \{1, 2, \ldots, N\}$: first, we develop the norm $\left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2$:

$$
\left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 = \sum_{i=1}^{N} v_{\epsilon_i} g_i^2 - 2 \sum_{i=1}^{N} v_{\epsilon_i} g_i \mathbf{H}_i \mathbf{f} + \sum_{i=1}^{N} v_{\epsilon_i} \mathbf{f}^T \mathbf{H}_i^T \mathbf{H}_i \mathbf{f},
$$

where $\mathbf{H}_i$ denotes the line $i$ of the matrix $\mathbf{H}$, $i \in \{1, 2, \ldots, N\}$, i.e. $\mathbf{H}_i = [h_{i1}, h_{i2}, \ldots, h_{iM}]$. Then:

$$
\frac{\partial \mathcal{L}(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)}{\partial v_{\epsilon_i}} = 0 \;\Leftrightarrow\; \left( g_i - \mathbf{H}_i \mathbf{f} \right)^2 v_{\epsilon_i}^2 + 3 v_{\epsilon_i} - \beta_\epsilon = 0 \;\Rightarrow\; \hat{v}_{\epsilon_i} = \frac{-3 + \sqrt{9 + 4 \left( g_i - \mathbf{H}_i \mathbf{f} \right)^2 \beta_\epsilon}}{2 \left( g_i - \mathbf{H}_i \mathbf{f} \right)^2},
$$

again retaining the positive root. The iterative algorithm obtained via JMAP estimation is presented in Figure (42).

Initialization

(a) update estimated $\hat{\mathbf{f}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{g}$

(b) update estimated variances $\hat{v}_{f_j} = \dfrac{-3 + \sqrt{9 + 4 \hat{f}_j^2 \beta_f}}{2 \hat{f}_j^2}$; $\hat{\mathbf{V}}_f = \mathrm{diag}\left[ \hat{\mathbf{v}}_f \right]$

(c) update estimated uncertainties variances $\hat{v}_{\epsilon_i} = \dfrac{-3 + \sqrt{9 + 4 \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \beta_\epsilon}}{2 \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2}$; $\hat{\mathbf{V}}_\epsilon = \mathrm{diag}\left[ \hat{\mathbf{v}}_\epsilon \right]$

Figure 42: Iterative algorithm corresponding to Joint MAP estimation for Laplace hierarchical model, non-stationary Laplace uncertainties model
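The only structural change with respect to Figure 36 is the $v_\epsilon$ update, now the positive root of a quadratic. A one-function Python sketch follows; the ridge term is an illustrative numerical guard, not part of the original update.

```python
import numpy as np

# v_eps update of Figure 42: positive root of r_i^2 v^2 + 3 v - beta_eps = 0,
# with r_i = g_i - H_i f (r and beta_eps assumed given; exact r -> 0 limit
# of the root is beta_eps / 3).
def update_v_eps(r, beta_eps, eps=1e-12):
    r2 = r**2
    return (-3.0 + np.sqrt(9.0 + 4.0 * r2 * beta_eps)) / (2.0 * r2 + eps)
```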

5.8.2 Posterior Mean estimation via VBA, partial separability

In this subsection, the Posterior Mean (PM) estimation is considered. The Joint MAP computes the mode of the posterior distribution, while the PM computes its mean. One of the advantages of this estimator is that it minimizes the Mean Square Error (MSE). Computing the posterior mean of any unknown requires a high-dimensional integration. For example, the mean corresponding to $\mathbf{f}$ can be computed from the posterior distribution using Equation (344),

$$
E_p\{\mathbf{f}\} = \iiint \mathbf{f}\, p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)\, \mathrm{d}\mathbf{f}\, \mathrm{d}\mathbf{v}_f\, \mathrm{d}\mathbf{v}_\epsilon.
\tag{344}
$$

In general, these computations are not easy. One way to obtain approximate estimates is to approximate $p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$ by a separable law $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) = q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)$, then to compute the posterior means using the separability. The mean corresponding to $\mathbf{f}$ is computed using the corresponding separable distribution $q_1(\mathbf{f})$, Equation (345),

$$
E_{q_1}\{\mathbf{f}\} = \int \mathbf{f}\, q_1(\mathbf{f})\, \mathrm{d}\mathbf{f}.
\tag{345}
$$

If the approximation of the posterior distribution with a separable one can be done in such a way that it conserves the mean, i.e. Equation (346),

$$
E_q\{x\} = E_p\{x\},
\tag{346}
$$

for all the unknowns of the model, a great amount of computational cost is saved. In particular, for the proposed hierarchical model, Equation (96), the posterior distribution, Equation (340), is not separable, making the analytical computation of the PM very difficult. One way to compute the PM in this case is to first approximate the posterior law $p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$ with a separable law $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, Equation (347),

$$
p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \approx q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) = q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)
\tag{347}
$$

where the notations from Equation (348) are used,

$$
q_2(\mathbf{v}_f) = \prod_{j=1}^{M} q_{2j}(v_{f_j}); \qquad q_3(\mathbf{v}_\epsilon) = \prod_{i=1}^{N} q_{3i}(v_{\epsilon_i}),
\tag{348}
$$

by minimizing the Kullback-Leibler divergence, defined as:

$$
\mathrm{KL}\left( q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) : p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right) = \iiint q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \ln \frac{q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)}{p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)}\, \mathrm{d}\mathbf{f}\, \mathrm{d}\mathbf{v}_\epsilon\, \mathrm{d}\mathbf{v}_f
\tag{349}
$$

where the notations from Equation (350) are used,

$$
\mathrm{d}\mathbf{v}_f = \prod_{j=1}^{M} \mathrm{d}v_{f_j}; \qquad \mathrm{d}\mathbf{v}_\epsilon = \prod_{i=1}^{N} \mathrm{d}v_{\epsilon_i}.
\tag{350}
$$

Equation (348) selects a partial separability for the approximated posterior distribution $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, in the sense that a total separability is imposed for the distributions corresponding to the two variances appearing in the hierarchical model, $q_2(\mathbf{v}_f)$ and $q_3(\mathbf{v}_\epsilon)$, but not for the distribution corresponding to $\mathbf{f}$. Evidently, a full separability can be imposed, by adding the supplementary condition $q_1(\mathbf{f}) = \prod_{j=1}^{M} q_{1j}(f_j)$ in Equation (348). This case is considered in Subsection (5.8.3). The minimization can be done via alternate optimization, resulting in the following proportionalities, Equations (351a), (351b) and (351c):

$$
q_1(\mathbf{f}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} \right\},
\tag{351a}
$$
$$
q_{2j}(v_{f_j}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} \right\}, \quad j \in \{1, 2, \ldots, M\},
\tag{351b}
$$
$$
q_{3i}(v_{\epsilon_i}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} \right\}, \quad i \in \{1, 2, \ldots, N\},
\tag{351c}
$$

using the notations:

$$
q_{2-j}(v_{f_j}) = \prod_{k=1, k \neq j}^{M} q_{2k}(v_{f_k}); \qquad q_{3-i}(v_{\epsilon_i}) = \prod_{k=1, k \neq i}^{N} q_{3k}(v_{\epsilon_k}),
\tag{352}
$$

and also

$$
\langle u(x) \rangle_{v(y)} = \int u(x)\, v(y)\, \mathrm{d}y.
\tag{353}
$$

Via Equation (342) and Equation (343), the analytical expression of the logarithm of the posterior distribution is obtained, Equation (354):

$$
\ln p(\mathbf{f}, \mathbf{v}_\epsilon, \mathbf{v}_f \mid \mathbf{g}, \beta_f, \beta_\epsilon) = -\frac{1}{2} \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 - \frac{3}{2} \sum_{i=1}^{N} \ln v_{\epsilon_i} - \frac{\beta_\epsilon}{2} \sum_{i=1}^{N} \frac{1}{v_{\epsilon_i}} - \frac{1}{2} \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 - \frac{3}{2} \sum_{j=1}^{M} \ln v_{f_j} - \frac{\beta_f}{2} \sum_{j=1}^{M} \frac{1}{v_{f_j}}
\tag{354}
$$

Computation of the analytical expression of $q_1(\mathbf{f})$. The proportionality relation corresponding to $q_1(\mathbf{f})$ is established in Equation (351a). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, all the terms free of $\mathbf{f}$ can be regarded as constants. Via Equation (354), the integral defined in Equation (353) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_3(\mathbf{v}_\epsilon)} - \frac{1}{2} \left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_2(\mathbf{v}_f)}.
\tag{355}
$$

Introducing the notations:

$$
\tilde{v}_{f_j} = \left\langle v_{f_j} \right\rangle_{q_{2j}(v_{f_j})}; \quad \tilde{\mathbf{v}}_f = \left[ \tilde{v}_{f_1} \ldots \tilde{v}_{f_j} \ldots \tilde{v}_{f_M} \right]^T; \quad \tilde{\mathbf{V}}_f = \mathrm{diag}\left[ \tilde{\mathbf{v}}_f \right]; \qquad \tilde{v}_{\epsilon_i} = \left\langle v_{\epsilon_i} \right\rangle_{q_{3i}(v_{\epsilon_i})}; \quad \tilde{\mathbf{v}}_\epsilon = \left[ \tilde{v}_{\epsilon_1} \ldots \tilde{v}_{\epsilon_i} \ldots \tilde{v}_{\epsilon_N} \right]^T; \quad \tilde{\mathbf{V}}_\epsilon = \mathrm{diag}\left[ \tilde{\mathbf{v}}_\epsilon \right],
\tag{356}
$$

the integral from Equation (355) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 - \frac{1}{2} \left\| \tilde{\mathbf{V}}_f^{\frac{1}{2}} \mathbf{f} \right\|^2.
\tag{357}
$$

Noting that $\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)}$ is a quadratic criterion in $\mathbf{f}$ and considering the proportionality from Equation (351a), it can be concluded that $q_1(\mathbf{f})$ is a multivariate Normal distribution. Minimizing the criterion leads to the analytical expression of the corresponding mean; the covariance matrix is obtained by identification:

$$
q_1(\mathbf{f}) = \mathcal{N}\left(\mathbf{f} \mid \hat{\mathbf{f}}, \hat{\boldsymbol{\Sigma}}\right), \quad
\begin{cases}
\hat{\mathbf{f}} = \left( \mathbf{H}^T \tilde{\mathbf{V}}_\epsilon \mathbf{H} + \tilde{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \tilde{\mathbf{V}}_\epsilon \mathbf{g} \\[4pt]
\hat{\boldsymbol{\Sigma}} = \left( \mathbf{H}^T \tilde{\mathbf{V}}_\epsilon \mathbf{H} + \tilde{\mathbf{V}}_f \right)^{-1}
\end{cases}
\tag{358}
$$

We note that both the expressions of the mean and of the covariance depend on expectations corresponding to the two variances of the hierarchical model.

Computation of the analytical expression of $q_{2j}(v_{f_j})$. The proportionality relation corresponding to $q_{2j}(v_{f_j})$ is established in Equation (351b). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, all the terms free of $v_{f_j}$ can be regarded as constants. Via Equation (354), the integral defined in Equation (353) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})} - \frac{3}{2} \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1}.
\tag{359}
$$

Introducing the notations:

$$
\tilde{\mathbf{v}}_{f_{-j}} = \left[ \tilde{v}_{f_1} \ldots \tilde{v}_{f_{j-1}}\; v_{f_j}\; \tilde{v}_{f_{j+1}} \ldots \tilde{v}_{f_M} \right]^T; \qquad \tilde{\mathbf{V}}_{f_{-j}} = \mathrm{diag}\left[ \tilde{\mathbf{v}}_{f_{-j}} \right],
\tag{360}
$$

the integral $\left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})}$ can be written:

$$
\left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})} = \left\langle \left\| \tilde{\mathbf{V}}_{f_{-j}}^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})}.
\tag{361}
$$

Considering that $q_1(\mathbf{f})$ is a multivariate Normal distribution, Equation (358):

$$
\left\langle \left\| \tilde{\mathbf{V}}_{f_{-j}}^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_1(\mathbf{f})} = \left\| \tilde{\mathbf{V}}_{f_{-j}}^{\frac{1}{2}} \hat{\mathbf{f}} \right\|^2 + \mathrm{Tr}\left( \tilde{\mathbf{V}}_{f_{-j}} \hat{\boldsymbol{\Sigma}} \right) = C + \left( \hat{f}_j^2 + \hat{\Sigma}_{jj} \right) v_{f_j}.
\tag{362}
$$

From Equation (359) and Equation (362):

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_\epsilon, \mathbf{v}_f \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} = -\frac{3}{2} \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1} - \frac{1}{2} \left( \hat{f}_j^2 + \hat{\Sigma}_{jj} \right) v_{f_j},
\tag{363}
$$

from which the proportionality corresponding to $q_{2j}(v_{f_j})$ can be established:

$$
q_{2j}(v_{f_j}) \propto v_{f_j}^{-\frac{3}{2}} \exp\left\{ -\frac{1}{2} \left[ \left( \hat{f}_j^2 + \hat{\Sigma}_{jj} \right) v_{f_j} + \beta_f v_{f_j}^{-1} \right] \right\},
\tag{364}
$$

leading to the conclusion that $q_{2j}(v_{f_j})$ is a generalized inverse Gaussian distribution (see Equation (46)) with the following parameters:

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \hat{\Sigma}_{jj} \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}
\tag{365}
$$

Computation of the analytical expression of $q_{3i}(v_{\epsilon_i})$. The proportionality relation corresponding to $q_{3i}(v_{\epsilon_i})$ is established in Equation (351c). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, all the terms free of $v_{\epsilon_i}$ can be regarded as constants. Via Equation (354), the integral defined in Equation (353) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})} - \frac{3}{2} \ln v_{\epsilon_i} - \frac{\beta_\epsilon}{2} v_{\epsilon_i}^{-1}.
\tag{366}
$$

Introducing the notations:

$$
\tilde{\mathbf{v}}_{\epsilon_{-i}} = \left[ \tilde{v}_{\epsilon_1} \ldots \tilde{v}_{\epsilon_{i-1}}\; v_{\epsilon_i}\; \tilde{v}_{\epsilon_{i+1}} \ldots \tilde{v}_{\epsilon_N} \right]^T; \qquad \tilde{\mathbf{V}}_{\epsilon_{-i}} = \mathrm{diag}\left[ \tilde{\mathbf{v}}_{\epsilon_{-i}} \right],
\tag{367}
$$

the integral $\left\langle \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})}$ can be written:

$$
\left\langle \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})} = \left\langle \left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})}.
\tag{368}
$$

Considering that $q_1(\mathbf{f})$ is a multivariate Normal distribution, Equation (358):

$$
\left\langle \left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})} = \left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{\frac{1}{2}} \left( \mathbf{g}-\mathbf{H}\hat{\mathbf{f}} \right) \right\|^2 + \mathrm{Tr}\left( \mathbf{H}^T \tilde{\mathbf{V}}_{\epsilon_{-i}} \mathbf{H} \hat{\boldsymbol{\Sigma}} \right),
\tag{369}
$$

and considering as constants all terms free of $v_{\epsilon_i}$:

$$
\left\| \tilde{\mathbf{V}}_{\epsilon_{-i}}^{\frac{1}{2}} \left( \mathbf{g}-\mathbf{H}\hat{\mathbf{f}} \right) \right\|^2 = C + v_{\epsilon_i} \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2; \qquad \mathrm{Tr}\left( \mathbf{H}^T \tilde{\mathbf{V}}_{\epsilon_{-i}} \mathbf{H} \hat{\boldsymbol{\Sigma}} \right) = C + v_{\epsilon_i} \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T,
\tag{370}
$$

where $\mathbf{H}_i$ is the line $i$ of the matrix $\mathbf{H}$, so we can conclude:

$$
\left\langle \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_1(\mathbf{f})\, q_{3-i}(v_{\epsilon_i})} = C + \left[ \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right] v_{\epsilon_i}.
\tag{371}
$$

From Equation (366) and Equation (371):

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} = -\frac{3}{2} \ln v_{\epsilon_i} - \frac{\beta_\epsilon}{2} v_{\epsilon_i}^{-1} - \frac{1}{2} \left[ \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right] v_{\epsilon_i},
\tag{372}
$$

from which the proportionality corresponding to $q_{3i}(v_{\epsilon_i})$ can be established:

$$
q_{3i}(v_{\epsilon_i}) \propto v_{\epsilon_i}^{-\frac{3}{2}} \exp\left\{ -\frac{1}{2} \left[ \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right) v_{\epsilon_i} + \beta_\epsilon v_{\epsilon_i}^{-1} \right] \right\},
\tag{373}
$$

leading to the conclusion that $q_{3i}(v_{\epsilon_i})$ is a generalized inverse Gaussian distribution with the following parameters:

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{GIG}\left( v_{\epsilon_i} \mid \hat{a}_{\epsilon_i}, \hat{b}_{\epsilon_i}, \hat{c}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{a}_{\epsilon_i} = \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \\
\hat{b}_{\epsilon_i} = \beta_\epsilon \\
\hat{c}_{\epsilon_i} = -\frac{1}{2}
\end{cases}
\tag{374}
$$

Equations (358), (365) and (374) summarize the distribution families and the corresponding parameters: $q_1(\mathbf{f})$ is a multivariate Normal distribution, and $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$ and $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$, are generalized inverse Gaussian distributions. However, the parameters corresponding to the multivariate Normal distribution are expressed

via $\tilde{\mathbf{V}}_\epsilon$ and $\tilde{\mathbf{V}}_f$ (and by extension all the elements forming the two diagonal matrices, $\tilde{v}_{\epsilon_i}$, $i \in \{1, 2, \ldots, N\}$ and $\tilde{v}_{f_j}$, $j \in \{1, 2, \ldots, M\}$).

Computation of the analytical expressions of $\tilde{\mathbf{V}}_\epsilon$ and $\tilde{\mathbf{V}}_f$. For a generalized inverse Gaussian distribution with parameters $a$, $b$ and $c = -\frac{1}{2}$, $\mathcal{GIG}(x \mid a, b, -\frac{1}{2})$, the following relation holds:

$$
\langle x \rangle_{\mathcal{GIG}\left(x \mid a, b, -\frac{1}{2}\right)} = \frac{\mathcal{K}_{\frac{1}{2}}\left(\sqrt{ab}\right)}{\mathcal{K}_{-\frac{1}{2}}\left(\sqrt{ab}\right)} \left( \frac{a}{b} \right)^{-\frac{1}{2}} = \sqrt{\frac{b}{a}},
\tag{375}
$$

where $\mathcal{K}_p$ represents the modified Bessel function of the second kind. The proof is identical to the one given for Equation (309): the direct computation reduces the integral, up to its normalization constant, to a $\mathcal{GIG}(x \mid a, b, \frac{1}{2})$ density integrating to one, and the ratio of the two modified Bessel functions of the second kind equals $1$, since expressing $\mathcal{K}_\alpha(x)$ via the modified Bessel function of the first kind $\mathcal{I}_\alpha(x)$ gives:

$$
\mathcal{K}_\alpha(x) = \frac{\pi}{2} \frac{\mathcal{I}_{-\alpha}(x) - \mathcal{I}_{\alpha}(x)}{\sin(\alpha\pi)} = \frac{\pi}{2} \frac{\mathcal{I}_{\alpha}(x) - \mathcal{I}_{-\alpha}(x)}{\sin(-\alpha\pi)} = \mathcal{K}_{-\alpha}(x).
\tag{376}
$$

101 Since q j (vf ), j 1, 2,...,M and q i(vǫ ), i 1, 2,...,N are generalized inverse Gaussian distributions, with 2 j ∈{ } 3 i ∈{ } the corresponding parameters af , hf and cf , j 1, 2,...,M respectively aǫ and hǫ and cǫ , i 1, 2,...,N j j j ∈{ } i i i ∈{ } the expectancies vf j and vǫi can be expressed via the parameters of the two generalized inverse Gaussian distributions using Equation (375): b e b b e b

e e hfj hǫi vf j = ; vǫi = (377) afj aǫi e e Using the notation introduced in (356): e e b b hf h 1 ... 0 ... 0 ǫ1 af a ... 0 ... 0 e 1 eǫ1  ......  . . . . . b . . . . .  b ...... 

 hf  hǫ V j V V  i  V (378) f =  0 ... a ... 0  = f ; ǫ =  0 ... a ... 0  = ǫ  efj  eǫi  . . .   . . . . .   . .. b. .. .   . . b. . .  e  . . . . .  b e  . . . . .  b     hf  hǫN   M  0 ... 0 ... 0 ... 0 ... a  aǫ   efM   e N       b  b In Equation (378) other notations are introduced for V f and V ǫ. Both values were expressed during the model via unknown expectancies, but via Equation (378) those values don’t contain any more integrals to be computed. There- fore, the new notations represent the final analytical expree ssionse used for expressing the density functions qi. Using Equation (378) and Equations (358),(365) and (374), the final analytical expressions of the separable distributions qi are presented in Equations (379a),(379b)and (379c).

$$
q_1(\mathbf{f}) = \mathcal{N}\left(\mathbf{f} \mid \hat{\mathbf{f}}, \hat{\boldsymbol{\Sigma}}\right), \quad
\begin{cases}
\hat{\mathbf{f}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{g} \\[4pt]
\hat{\boldsymbol{\Sigma}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1}
\end{cases}
\tag{379a}
$$

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \hat{\Sigma}_{jj} \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}, \quad j \in \{1, 2, \ldots, M\},
\tag{379b}
$$

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{GIG}\left( v_{\epsilon_i} \mid \hat{a}_{\epsilon_i}, \hat{b}_{\epsilon_i}, \hat{c}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{a}_{\epsilon_i} = \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \\
\hat{b}_{\epsilon_i} = \beta_\epsilon \\
\hat{c}_{\epsilon_i} = -\frac{1}{2}
\end{cases}, \quad i \in \{1, 2, \ldots, N\}.
\tag{379c}
$$

Equation (379a) establishes the dependency between the parameters corresponding to the multivariate Normal distribution $q_1(\mathbf{f})$ and the other parameters involved in the hierarchical model: the mean $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$ depend on $\hat{\mathbf{V}}_\epsilon$ and $\hat{\mathbf{V}}_f$, which, via Equation (378), are defined using $\{\hat{a}_{f_j}, \hat{b}_{f_j}\}$, $j \in \{1, 2, \ldots, M\}$ and $\{\hat{a}_{\epsilon_i}, \hat{b}_{\epsilon_i}\}$, $i \in \{1, 2, \ldots, N\}$. The dependency between the parameters of the multivariate Normal distribution $q_1(\mathbf{f})$ and the parameters of the generalized inverse Gaussian distributions $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$ and $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$ is presented in Figure (43).

Figure 43: Dependency between $q_1(\mathbf{f})$ parameters and $q_{2j}(v_{f_j})$ and $q_{3i}(v_{\epsilon_i})$ parameters

Equation (379b) establishes the dependency between the parameters corresponding to the generalized inverse Gaussian distributions $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$ and the other parameters involved in the hierarchical model: the parameters $\{\hat{a}_{f_j}, \hat{b}_{f_j}\}$, $j \in \{1, 2, \ldots, M\}$ depend on the mean $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$ of the multivariate Normal distribution $q_1(\mathbf{f})$, Figure (44).

Figure 44: Dependency between $q_{2j}(v_{f_j})$ parameters and $q_1(\mathbf{f})$ and $q_{3i}(v_{\epsilon_i})$ parameters

Equation (379c) establishes the dependency between the parameters corresponding to the generalized inverse Gaussian distributions $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$ and the other parameters involved in the hierarchical model: the parameters $\{\hat{a}_{\epsilon_i}, \hat{b}_{\epsilon_i}\}$, $i \in \{1, 2, \ldots, N\}$ depend on the mean $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$ of the multivariate Normal distribution $q_1(\mathbf{f})$, Figure (45).

Figure 45: Dependency between $q_{3i}(v_{\epsilon_i})$ parameters and $q_{2j}(v_{f_j})$ and $q_1(\mathbf{f})$ parameters

The iterative algorithm obtained via PM estimation is presented in Figure (46).

5.8.3 Posterior Mean estimation via VBA, full separability

In this subsection, the Posterior Mean (PM) estimation is again considered, but via a fully separable approximation. The posterior distribution is approximated by a fully separable distribution $q(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon)$, i.e. a supplementary condition is added in Equation (348):

$$
q_1(\mathbf{f}) = \prod_{j=1}^{M} q_{1j}(f_j); \qquad q_2(\mathbf{v}_f) = \prod_{j=1}^{M} q_{2j}(v_{f_j}); \qquad q_3(\mathbf{v}_\epsilon) = \prod_{i=1}^{N} q_{3i}(v_{\epsilon_i}).
\tag{380}
$$

As in Subsection (5.8.2), the approximation is done by minimizing the Kullback-Leibler divergence, Equation (349), via alternate optimization, resulting in the following proportionalities, Equations (381a), (381b) and (381c):

$$
q_{1j}(f_j) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} \right\}, \quad j \in \{1, 2, \ldots, M\},
\tag{381a}
$$
$$
q_{2j}(v_{f_j}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} \right\}, \quad j \in \{1, 2, \ldots, M\},
\tag{381b}
$$
$$
q_{3i}(v_{\epsilon_i}) \propto \exp\left\{ \left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} \right\}, \quad i \in \{1, 2, \ldots, N\},
\tag{381c}
$$

using the notations introduced in Equation (352), Equation (353) and Equation (382):

$$
q_{1-j}(f_j) = \prod_{k=1, k \neq j}^{M} q_{1k}(f_k).
\tag{382}
$$

The analytical expression of the logarithm of the posterior distribution, $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, was obtained in Equation (354).

Initialization

(a) update estimated $\hat{\mathbf{f}}$ and the covariance matrix $\hat{\boldsymbol{\Sigma}}$: $\hat{\mathbf{f}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1} \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{g}$; $\hat{\boldsymbol{\Sigma}} = \left( \mathbf{H}^T \hat{\mathbf{V}}_\epsilon \mathbf{H} + \hat{\mathbf{V}}_f \right)^{-1}$

(b) update estimated $\mathcal{GIG}$ parameters modelling the variances $v_{f_j}$: $\hat{a}_{f_j} = \hat{f}_j^2 + \hat{\Sigma}_{jj}$; $\hat{b}_{f_j} = \beta_f$ (constant during the iterations)

(c) update estimated $\mathcal{GIG}$ parameters modelling the uncertainties variances $v_{\epsilon_i}$: $\hat{a}_{\epsilon_i} = \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2$; $\hat{b}_{\epsilon_i} = \beta_\epsilon$ (constant during the iterations)

(d) $\hat{\mathbf{V}}_f = \mathrm{diag}\left[ \sqrt{\hat{b}_{f_j} / \hat{a}_{f_j}} \right]$

(e) $\hat{\mathbf{V}}_\epsilon = \mathrm{diag}\left[ \sqrt{\hat{b}_{\epsilon_i} / \hat{a}_{\epsilon_i}} \right]$

Figure 46: Iterative algorithm corresponding to PM estimation via VBA - partial separability for Laplace hierarchical model, non-stationary Laplace uncertainties model
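Since both variance factors are now generalized inverse Gaussian with $c = -\frac{1}{2}$, both expectation matrices use the same $\sqrt{b/a}$ rule. A minimal Python sketch of one sweep of Figure 46 follows (names illustrative, one sweep per call).

```python
import numpy as np

# One VBA sweep of Figure 46 (sketch): both v_f and v_eps factors are GIG
# with c = -1/2, so E[v] = sqrt(b/a) is used for V_f and V_eps alike.
def vba_sweep(H, g, f, Sigma, beta_f, beta_eps):
    a_f = f**2 + np.diag(Sigma)
    Vf = np.sqrt(beta_f / a_f)                           # E[v_fj]
    a_e = (g - H @ f)**2 + np.einsum('ij,jk,ik->i', H, Sigma, H)
    Ve = np.sqrt(beta_eps / a_e)                         # E[v_eps_i]
    Sigma = np.linalg.inv(H.T @ (Ve[:, None] * H) + np.diag(Vf))
    f = Sigma @ (H.T @ (Ve * g))
    return f, Sigma
```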

Computation of the analytical expression of $q_{1j}(f_j)$. The proportionality relation corresponding to $q_{1j}(f_j)$ is established in Equation (381a). In the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$, all the terms free of $f_j$ can be regarded as constants:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \mathbf{V}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_{1-j}(f_j)\, q_3(\mathbf{v}_\epsilon)} - \frac{1}{2} \left\langle \left\| \mathbf{V}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)}.
\tag{383}
$$

Using Equation (356), the integral from Equation (383) becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_{1-j}(f_j)\, q_2(\mathbf{v}_f)\, q_3(\mathbf{v}_\epsilon)} = -\frac{1}{2} \left\langle \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_{1-j}(f_j)} - \frac{1}{2} \left\langle \left\| \tilde{\mathbf{V}}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_{1-j}(f_j)}.
\tag{384}
$$

Considering all the terms free of $f_j$ as constants, the first norm can be written:

$$
\left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 = C + \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \mathbf{f}^{-j} \right) f_j,
\tag{385}
$$

where $\mathbf{H}^j$ represents the column $j$ of the matrix $\mathbf{H}$, $\mathbf{H}^{-j}$ represents the matrix $\mathbf{H}$ without the column $j$, and $\mathbf{f}^{-j}$ represents the vector $\mathbf{f}$ without the element $f_j$. Introducing the notation

$$
\tilde{f}_j = \int f_j\, q_{1j}(f_j)\, \mathrm{d}f_j; \qquad \tilde{\mathbf{f}}^{-j} = \left[ \tilde{f}_1 \ldots \tilde{f}_{j-1}\; \tilde{f}_{j+1} \ldots \tilde{f}_M \right]^T,
\tag{386}
$$

the expectation of the first norm becomes:

$$
\left\langle \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} (\mathbf{g}-\mathbf{H}\mathbf{f}) \right\|^2 \right\rangle_{q_{1-j}(f_j)} = C + \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right) f_j.
\tag{387}
$$

Considering all the terms free of $f_j$ as constants, the expectation of the second norm becomes:

$$
\left\langle \left\| \tilde{\mathbf{V}}_f^{\frac{1}{2}} \mathbf{f} \right\|^2 \right\rangle_{q_{1-j}(f_j)} = C + \tilde{v}_{f_j} f_j^2.
\tag{388}
$$

From Equations (381a), (384), (387) and (388) the proportionality for $q_{1j}(f_j)$ becomes:

$$
q_{1j}(f_j) \propto \exp\left\{ -\frac{1}{2} \left[ \left( \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j} \right) f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right) f_j \right] \right\}.
\tag{389}
$$

Considering the criterion $J(f_j) = \left( \left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j} \right) f_j^2 - 2 \mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right) f_j$, which is quadratic, we conclude that $q_{1j}(f_j)$ is a Normal distribution. For computing the mean of the Normal distribution, it is sufficient to compute the solution minimizing the criterion $J(f_j)$:

$$
\frac{\partial J(f_j)}{\partial f_j} = 0 \;\Leftrightarrow\; \hat{f}_j = \frac{\mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right)}{\left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j}}.
\tag{390}
$$

The variance can be obtained by identification. The analytical expressions for the mean and the variance corresponding to the Normal distributions $q_{1j}(f_j)$ are presented in Equation (391):

$$
q_{1j}(f_j) = \mathcal{N}\left( f_j \mid \hat{f}_j, \widehat{\mathrm{var}}_j \right), \quad
\begin{cases}
\hat{f}_j = \dfrac{\mathbf{H}^{jT} \tilde{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \tilde{\mathbf{f}}^{-j} \right)}{\left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j}} \\[10pt]
\widehat{\mathrm{var}}_j = \dfrac{1}{\left\| \tilde{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \tilde{v}_{f_j}}
\end{cases}, \quad j \in \{1, 2, \ldots, M\}.
\tag{391}
$$

Computation of the analytical expression of $q_{2j}(v_{f_j})$. The proportionality relation corresponding to $q_{2j}(v_{f_j})$ established in Equation (381b) refers to $v_{f_j}$, so in the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$ all the terms free of $v_{f_j}$ can be regarded as constants,

$$
\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) = C + \frac{1}{2} \ln v_{f_j} - \frac{1}{2} f_j^2 v_{f_j} - 2 \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1},
\tag{392}
$$

so the integral of the logarithm becomes:

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_{2-j}(v_{f_j})\, q_3(\mathbf{v}_\epsilon)} = C - \frac{3}{2} \ln v_{f_j} - \frac{\beta_f}{2} v_{f_j}^{-1} - \frac{1}{2} \left( \hat{f}_j^2 + \widehat{\mathrm{var}}_j \right) v_{f_j}.
\tag{393}
$$

Equation (393) leads to the conclusion that $q_{2j}(v_{f_j})$ is a generalized inverse Gaussian distribution. Equation (394) presents the analytical expressions of the corresponding parameters:

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \widehat{\mathrm{var}}_j \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}, \quad j \in \{1, 2, \ldots, M\}.
\tag{394}
$$

Computation of the analytical expression of $q_{3i}(v_{\epsilon_i})$. The proportionality relation corresponding to $q_{3i}(v_{\epsilon_i})$ established in Equation (381c) refers to $v_{\epsilon_i}$, so in the expression of $\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon)$ all the terms free of $v_{\epsilon_i}$ can be regarded as constants:

$$
\ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) = C - \frac{3}{2} \ln v_{\epsilon_i} - \frac{\beta_\epsilon}{2} v_{\epsilon_i}^{-1} - \frac{1}{2} \left( g_i - \mathbf{H}_i \mathbf{f} \right)^2 v_{\epsilon_i}.
\tag{395}
$$

Introducing the notation

$$
\left\langle \mathbf{f} \right\rangle_{q_1(\mathbf{f})} = \left[ \hat{f}_1 \ldots \hat{f}_j \ldots \hat{f}_M \right]^T \overset{\mathrm{Not}}{=} \hat{\mathbf{f}}; \qquad \hat{\boldsymbol{\Sigma}} = \mathrm{diag}\left[ \widehat{\mathrm{var}}_j \right],
\tag{396}
$$

the expectation of the logarithm becomes

$$
\left\langle \ln p(\mathbf{f}, \mathbf{v}_f, \mathbf{v}_\epsilon \mid \mathbf{g}, \beta_f, \beta_\epsilon) \right\rangle_{q_1(\mathbf{f})\, q_2(\mathbf{v}_f)\, q_{3-i}(v_{\epsilon_i})} = C - \frac{3}{2} \ln v_{\epsilon_i} - \frac{\beta_\epsilon}{2} v_{\epsilon_i}^{-1} - \frac{1}{2} \left[ \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right] v_{\epsilon_i},
\tag{397}
$$

so the proportionality relation for $q_{3i}(v_{\epsilon_i})$ from Equation (381c) can be written:

$$
q_{3i}(v_{\epsilon_i}) \propto v_{\epsilon_i}^{-\frac{3}{2}} \exp\left\{ -\frac{1}{2} \left[ \left( \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \right) v_{\epsilon_i} + \beta_\epsilon v_{\epsilon_i}^{-1} \right] \right\}.
\tag{398}
$$

Equation (398) shows that $q_{3i}(v_{\epsilon_i})$ are generalized inverse Gaussian distributions. The analytical expressions of the corresponding parameters are presented in Equation (399):

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{GIG}\left( v_{\epsilon_i} \mid \hat{a}_{\epsilon_i}, \hat{b}_{\epsilon_i}, \hat{c}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{a}_{\epsilon_i} = \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \\
\hat{b}_{\epsilon_i} = \beta_\epsilon \\
\hat{c}_{\epsilon_i} = -\frac{1}{2}
\end{cases}, \quad i \in \{1, 2, \ldots, N\}.
\tag{399}
$$

Since $q_{2j}(v_{f_j})$, $j \in \{1, 2, \ldots, M\}$ and $q_{3i}(v_{\epsilon_i})$, $i \in \{1, 2, \ldots, N\}$, are generalized inverse Gaussian distributions, it is easy to obtain analytical expressions for $\tilde{\mathbf{V}}_\epsilon$, defined in Equation (356), and $\tilde{v}_{f_j}$, $j \in \{1, 2, \ldots, M\}$, obtaining the same expressions as in Equation (378). Using Equation (378) and Equations (391), (394) and (399), the final analytical expressions of the separable distributions $q_i$ are presented in Equations (400a), (400b) and (400c):

$$
q_{1j}(f_j) = \mathcal{N}\left( f_j \mid \hat{f}_j, \widehat{\mathrm{var}}_j \right), \quad
\begin{cases}
\hat{f}_j = \dfrac{\mathbf{H}^{jT} \hat{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \hat{\mathbf{f}}^{-j} \right)}{\left\| \hat{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}} \\[10pt]
\widehat{\mathrm{var}}_j = \dfrac{1}{\left\| \hat{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}}
\end{cases}, \quad j \in \{1, 2, \ldots, M\},
\tag{400a}
$$

$$
q_{2j}(v_{f_j}) = \mathcal{GIG}\left( v_{f_j} \mid \hat{a}_{f_j}, \hat{b}_{f_j}, \hat{c}_{f_j} \right), \quad
\begin{cases}
\hat{a}_{f_j} = \hat{f}_j^2 + \widehat{\mathrm{var}}_j \\
\hat{b}_{f_j} = \beta_f \\
\hat{c}_{f_j} = -\frac{1}{2}
\end{cases}, \quad j \in \{1, 2, \ldots, M\},
\tag{400b}
$$

$$
q_{3i}(v_{\epsilon_i}) = \mathcal{GIG}\left( v_{\epsilon_i} \mid \hat{a}_{\epsilon_i}, \hat{b}_{\epsilon_i}, \hat{c}_{\epsilon_i} \right), \quad
\begin{cases}
\hat{a}_{\epsilon_i} = \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2 \\
\hat{b}_{\epsilon_i} = \beta_\epsilon \\
\hat{c}_{\epsilon_i} = -\frac{1}{2}
\end{cases}, \quad i \in \{1, 2, \ldots, N\}.
\tag{400c}
$$

Equations (400a), (400b) and (400c) establish dependencies between the parameters of the distributions, very similar to the ones presented in Figures (43), (44) and (45). The iterative algorithm obtained via PM estimation with full separability is presented in Figure (47).

Initialization

(a) update estimated $\hat{f}_j$ and the corresponding variance $\widehat{\mathrm{var}}_j$: $\hat{f}_j = \dfrac{\mathbf{H}^{jT} \hat{\mathbf{V}}_\epsilon \left( \mathbf{g} - \mathbf{H}^{-j} \hat{\mathbf{f}}^{-j} \right)}{\left\| \hat{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}}$; $\widehat{\mathrm{var}}_j = \dfrac{1}{\left\| \hat{\mathbf{V}}_\epsilon^{\frac{1}{2}} \mathbf{H}^j \right\|^2 + \hat{v}_{f_j}}$

(b) update estimated $\mathcal{GIG}$ parameters modelling the variances $v_{f_j}$: $\hat{a}_{f_j} = \hat{f}_j^2 + \widehat{\mathrm{var}}_j$; $\hat{b}_{f_j} = \beta_f$ (constant during the iterations)

(c) update estimated $\mathcal{GIG}$ parameters modelling the uncertainties variances $v_{\epsilon_i}$: $\hat{a}_{\epsilon_i} = \mathbf{H}_i \hat{\boldsymbol{\Sigma}} \mathbf{H}_i^T + \left( g_i - \mathbf{H}_i \hat{\mathbf{f}} \right)^2$; $\hat{b}_{\epsilon_i} = \beta_\epsilon$ (constant during the iterations)

(d) $\hat{\mathbf{V}}_f = \mathrm{diag}\left[ \sqrt{\hat{b}_{f_j} / \hat{a}_{f_j}} \right]$

(e) $\hat{\mathbf{V}}_\epsilon = \mathrm{diag}\left[ \sqrt{\hat{b}_{\epsilon_i} / \hat{a}_{\epsilon_i}} \right]$

Figure 47: Iterative algorithm corresponding to PM estimation via VBA - full separability for Laplace hierarchical model, non-stationary Laplace uncertainties model

List of Figures

1 Interpretation of the forward linear model, Equation (1)
2 From the linear forward model, Equation (1), to the Hierarchical Model: direct sparsity, i.e. $\mathbf{f}$ is sparse
3 From the linear forward model, Equation (1), to the Hierarchical Model: sparsity via a transformation, i.e. $\mathbf{D}\mathbf{f}$ is sparse
4 Hierarchical Model: probability density functions assigned for the unknowns
5 Sparsity Mechanism: for direct sparsity applications
6 Comparison between the Normal distribution $\mathcal{N}(x \mid 0, 1)$ and the Student-t distribution $\mathcal{S}t(x \mid 0, 1)$
7 Four Student-t probability density functions with different means ($\mu = 0$, blue and red, $\mu = 1$, yellow and $\mu = -1$, violet) and different degrees of freedom ($\nu = 1$, blue and yellow, $\nu = 1.5$, red and $\nu = 0.5$, violet), Figure (7a), and the corresponding Generalized Hyperbolic probability density functions, with parameters set as in Equation (26), with $\alpha = 0.001$, Figure (7b). Comparison between the distributions, $\mathcal{S}t$ vs. $\mathcal{GH}$, Figure (7c), and between the logarithms of the distributions, $\log \mathcal{S}t$ vs. $\log \mathcal{GH}$, Figure (7d)
8 The standard Student-t distribution ($\nu = 1$), Figure (8a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (26), Figure (8b). Comparison between the two distributions, $\mathcal{S}t$ vs. $\mathcal{GH}$, Figure (8c), and between the logarithms of the distributions, $\log \mathcal{S}t$ vs. $\log \mathcal{GH}$, Figure (8d)
9 The behaviour of the Generalized Hyperbolic density function $\mathcal{GH}(x \mid \lambda = -\nu/2, \alpha \searrow 0, \beta = 0, \delta = \sqrt{\nu}, \mu = 0)$ depending on $\alpha$
10 The Hyperbolic distribution, Figure (10a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (31), Figure (10b). Comparison between the two distributions, $\mathcal{H}$ vs. $\mathcal{GH}$, Figure (10c), and between the logarithms of the distributions, $\log \mathcal{H}$ vs. $\log \mathcal{GH}$, Figure (10d)
11 Comparison between the standard Normal distribution and the Laplace distribution
12 Four Laplace probability density functions with different mean values ($\mu = 0$, blue and red, $\mu = 1$, yellow and $\mu = -1$, violet) and different scale parameter values ($b = 1$, blue and violet, $b = 2$, red and $b = 0.5$, yellow), Figure (12a), and the corresponding Generalized Hyperbolic probability density functions, with parameters set as in Equation (56), with $\delta = 0.001$, Figure (12b). Comparison between the distributions, $\mathcal{L}$ vs. $\mathcal{GH}$, Figure (12c), and between the logarithms of the distributions, $\log \mathcal{L}$ vs. $\log \mathcal{GH}$, Figure (12d)
13 The standard Laplace distribution, Figure (13a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (56), Figure (13b). Comparison between the two distributions, $\mathcal{L}$ vs. $\mathcal{GH}$, Figure (13c), and between the logarithms of the distributions, $\log \mathcal{L}$ vs. $\log \mathcal{GH}$, Figure (13d)
14 The behaviour of the Generalized Hyperbolic density function $\mathcal{GH}(x \mid \lambda = 1, \alpha = b^{-1}, \beta = 0, \delta \searrow 0, \mu)$ depending on $\delta$
15 The Variance-Gamma distribution, Figure (15a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (61), Figure (15b). Comparison between the two distributions, $\mathcal{V}\text{-}\mathcal{G}$ vs. $\mathcal{GH}$, Figure (15c), and between the logarithms of the distributions, $\log \mathcal{V}\text{-}\mathcal{G}$ vs. $\log \mathcal{GH}$, Figure (15d)
16 The Normal-Inverse Gaussian distribution, Figure (16a), and the Generalized Hyperbolic distribution, with parameters set as in Equation (66), Figure (16b). Comparison between the two distributions, $\mathcal{NIG}$ vs. $\mathcal{GH}$, Figure (16c), and between the logarithms of the distributions, $\log \mathcal{NIG}$ vs. $\log \mathcal{GH}$, Figure (16d)
17 Iterative algorithm corresponding to Joint MAP estimation for Student-t hierarchical model, non-stationary Student-t uncertainties model
18 Dependency between $q_1(\mathbf{f})$ parameters and $q_{2j}(v_{f_j})$ and $q_{3i}(v_{\epsilon_i})$ parameters
19 Dependency between $q_{2j}(v_{f_j})$ parameters and $q_1(\mathbf{f})$ and $q_{3i}(v_{\epsilon_i})$ parameters
20 Dependency between $q_{3i}(v_{\epsilon_i})$ parameters and $q_{2j}(v_{f_j})$ and $q_1(\mathbf{f})$ parameters
21 Iterative algorithm corresponding to PM estimation via VBA - partial separability for Student-t hierarchical model, non-stationary Student-t uncertainties model
22 Iterative algorithm corresponding to PM estimation via VBA - full separability for Student-t hierarchical model, non-stationary Student-t uncertainties model
23 Indirect sparsity (via $\mathbf{z}$) - Iterative algorithm corresponding to Joint MAP estimation for Student-t hierarchical model, non-stationary Student-t uncertainties model
24 Dependency between $q_1(\mathbf{f})$ parameters and $q_2(\mathbf{z})$, $q_{3j}(v_{\xi_j})$ and $q_{4i}(v_{\epsilon_i})$ parameters
25 Dependency between $q_2(\mathbf{z})$ parameters and $q_1(\mathbf{f})$, $q_{4i}(v_{\epsilon_i})$ and $q_{5j}(v_{z_j})$ parameters
26 Dependency between $q_{3j}(v_{\xi_j})$ parameters and $q_1(\mathbf{f})$ and $q_2(\mathbf{z})$ parameters
27 Dependency between $q_{4i}(v_{\epsilon_i})$ parameters and $q_1(\mathbf{f})$ parameters
28 Dependency between $q_{5j}(v_{z_j})$ parameters and $q_2(\mathbf{z})$ parameters
29 Indirect sparsity (via $\mathbf{z}$) - Iterative algorithm corresponding to partial separability PM estimation for Student-t hierarchical model, non-stationary Student-t uncertainties model
30 Dependency between $q_{1j}(f_j)$ parameters and $q_{2j}(z_j)$, $q_{3j}(v_{\xi_j})$ and $q_{4i}(v_{\epsilon_i})$ parameters
31 Dependency between $q_{2j}(z_j)$ parameters and $q_{1j}(f_j)$, $q_{4i}(v_{\epsilon_i})$ and $q_{5j}(v_{z_j})$ parameters
32 Dependency between $q_{3j}(v_{\xi_j})$ parameters and $q_{1j}(f_j)$ and $q_{2j}(z_j)$ parameters
33 Dependency between $q_{4i}(v_{\epsilon_i})$ parameters and $q_{1j}(f_j)$ parameters
34 Dependency between $q_{5j}(v_{z_j})$ parameters and $q_{2j}(z_j)$ parameters
35 Indirect sparsity (via $\mathbf{z}$) - Iterative algorithm corresponding to full separability PM estimation for Student-t hierarchical model, non-stationary Student-t uncertainties model
36 Iterative algorithm corresponding to Joint MAP estimation for Laplace hierarchical model, non-stationary Student-t uncertainties model
37 Dependency between $q_1(\mathbf{f})$ parameters and $q_{2j}(v_{f_j})$ and $q_{3i}(v_{\epsilon_i})$ parameters
38 Dependency between $q_{2j}(v_{f_j})$ parameters and $q_1(\mathbf{f})$ and $q_{3i}(v_{\epsilon_i})$ parameters
39 Dependency between $q_{3i}(v_{\epsilon_i})$ parameters and $q_{2j}(v_{f_j})$ and $q_1(\mathbf{f})$ parameters
40 Iterative algorithm corresponding to PM estimation via VBA - partial separability for Laplace hierarchical model, non-stationary Student-t uncertainties model
41 Iterative algorithm corresponding to PM estimation via VBA - full separability for Laplace hierarchical model, non-stationary Student-t uncertainties model
42 Iterative algorithm corresponding to Joint MAP estimation for Laplace hierarchical model, non-stationary Laplace uncertainties model
43 Dependency between $q_1(\mathbf{f})$ parameters and $q_{2j}(v_{f_j})$ and $q_{3i}(v_{\epsilon_i})$ parameters
44 Dependency between $q_{2j}(v_{f_j})$ parameters and $q_1(\mathbf{f})$ and $q_{3i}(v_{\epsilon_i})$ parameters
45 Dependency between $q_{3i}(v_{\epsilon_i})$ parameters and $q_{2j}(v_{f_j})$ and $q_1(\mathbf{f})$ parameters
46 Iterative algorithm corresponding to PM estimation via VBA - partial separability for Laplace hierarchical model, non-stationary Laplace uncertainties model
47 Iterative algorithm corresponding to PM estimation via VBA - full separability for Laplace hierarchical model, non-stationary Laplace uncertainties model

