IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. ??, NO. ??, ?? 1 Bayesian Fusion of Multi-Band Images Qi Wei, Student Member, IEEE, Nicolas Dobigeon, Senior Member, IEEE, and Jean-Yves Tourneret, Senior Member, IEEE

Abstract—This paper presents a Bayesian fusion technique for images has been considered in fewer research works and is still remotely sensed multi-band images is presented. The observed a challenging problem because of the high dimension of the images are related to the high spectral and high spatial res- data to be processed. Indeed, the fusion of MS and HS differs olution image to be recovered through physical degradations, e.g., spatial and spectral blurring and/or subsampling defined from traditional MS or HS pansharpening by the fact that by the sensor characteristics. The fusion problem is formulated more spatial and spectral information is contained in multi- within a Bayesian estimation framework. An appropriate prior band images. This additional information can be exploited to distribution exploiting geometrical consideration is introduced. obtain a high spatial and spectral resolution image. In practice, To compute the Bayesian estimator of the scene of interest from the spectral bands of panchromatic images always cover the its posterior distribution, a Markov chain Monte Carlo algo- rithm is designed to generate samples asymptotically distributed visible and infra-red spectra. However, in several practical according to the target distribution. To efficiently sample from applications, the spectrum of MS data includes additional this high-dimension distribution, a Hamiltonian Monte Carlo step high-frequency spectral bands. For instance the MS data of is introduced in the Gibbs sampling strategy. The efficiency of WorldView-31 have spectral bands in the intervals [400 ∼ the proposed fusion method is evaluated with respect to several 1750]nm and [2145 ∼ 2365]nm whereas the PAN data are state-of-the-art fusion techniques. in the range [450 ∼ 800]nm. Another interesting example is Index Terms—Fusion, super-resolution, multispectral and hy- the HS+MS suite (called hyperspectral imager suite (HISUI)) perspectral images, deconvolution, Bayesian estimation, Hamil- that has been developed by the Japanese ministry of economy, tonian Monte Carlo algorithm. trade, and industry (METI) [14]. HISUI is the Japanese next- generation Earth-observing sensor composed of HS and MS I.INTRODUCTION imagers and will be launched by the H-IIA rocket in 2015 or HE problem of fusing a high spatial and low spectral res- later as one of mission instruments onboard JAXA’s ALOS-3 T olution image with an auxiliary image of higher spectral satellite. Some research activities have already been conducted but lower spatial resolution, also known as multi-resolution for this practical multi-band fusion problem [15]. Noticeably, a image fusion, has been explored for many years [2]. When lot of pansharpening methods, such as component substitution considering remotely sensed images, an archetypal fusion task [2], relative spectral contribution [16] and high-frequency is the pansharpening, which generally consists of fusing a high injection [17] are inapplicable or inefficient for the HS+MS spatial resolution panchromatic (PAN) image and low spatial fusion problem. To address the challenge raised by the high resolution multispectral (MS) image. Pansharpening has been dimensionality of the data to be fused, innovative methods addressed in the literature for several decades and still remains need to be developed, which is the main objective of this paper. an active topic [2]–[4]. More recently, hyperspectral (HS) As demonstrated in [18], [19], the fusion of HS and MS imaging, which consists of acquiring a same scene in several images can be conveniently formulated within a Bayesian hundreds of contiguous spectral bands, has opened a new range inference framework. Bayesian fusion allows an intuitive inter- of relevant applications, such as target detection, classification pretation of the fusion process via the posterior distribution. and spectral unmixing [5]. The visualization of HS images is Since the fusion problem is usually ill-posed, the Bayesian also interesting to be explored [6]. Naturally, to take advantage methodology offers a convenient way to regularize the prob- of the newest benefits offered by HS images, the problem of lem by defining appropriate prior distribution for the scene fusing HS and PAN images has been explored [7]–[9]. Capi- of interest. Following this strategy, Hardie et al. proposed talizing on decades of experience in MS pansharpening, most a Bayesian estimator for fusing co-registered high spatial- of the HS pansharpening approaches merely adapt existing resolution MS and high spectral-resolution HS images [18]. To algorithms for PAN and MS fusion [10], [11]. Other methods improve the denoising performance, Zhang et. al implemented are specifically designed to the HS pansharpening problem the estimator of [18] in the wavelet domain [19]. (see, e.g., [8], [12], [13]). Conversely, the fusion of MS and HS In this paper, a prior knowledge accounting for artificial Copyright (c) 2014 IEEE. Personal use of this material is permitted. constraints related to the fusion problem is incorporated within However, permission to use this material for any other purposes must be the model via the prior distribution assigned to the scene to be obtained from the IEEE by sending a request to [email protected]. estimated. Many strategies related to HS resolution enhance- Part of this work has been supported by the Hypanema ANR Project n◦ANR-12-BS03-003 and by ANR-11-LABX-0040-CIMI within the program ment have been proposed to define this prior distribution. For ANR-11-IDEX-0002-02 within the thematic trimester on image processing. instance, in [3], the highly resolved image to be estimated Part of this work was presented during the IEEE ICASSP 2014 [1]. is a priori modeled by an in-homogeneous Gaussian Markov Qi Wei, Nicolas Dobigeon and Jean-Yves Tourneret are with Uni- versity of Toulouse, IRIT/INP-ENSEEIHT, 2 rue Camichel, BP 7122, 31071 Toulouse cedex 7, France (e-mail: {qi.wei, nicolas.dobigeon, jean- 1http://www.satimagingcorp.com/satellite-sensors/WorldView3-DS-WV3- yves.tourneret }@.fr). Web.pdf 2 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. ??, NO. ??, ?? random field (IGMRF). The parameters of this IGMRF are The paper is organized as follows. Section II formulates the empirically estimated from a panchromatic image in the first fusion problem in a Bayesian framework, with a particular step of the analysis. In [18] and related works [20], [21], a attention to the forward model that exploits physical consider- multivariate Gaussian distribution is proposed as prior dis- ations. Section III derives the hierarchical Bayesian model to tribution for the unobserved scene. The resulting conditional obtain the joint posterior distribution of the unknown image, mean and covariance matrix can then be inferred using a its parameters and hyperparameters. In Section IV, the hybrid standard clustering technique [18] or using a stochastic mixing Gibbs sampler based on Hamiltonian MCMC is introduced model [20], [21], incorporating spectral mixing constraints to to sample the desired posterior distribution. Simulations are improve spectral accuracy in the estimated high resolution conducted in Section V and conclusions are finally reported image. In this paper, we propose to explicitly exploit the in Section VI. acquisition process of the different images. More precisely, the sensor specifications (i.e., spectral or spatial responses) are II.PROBLEM FORMULATION exploited to properly design the spatial or spectral degradations suffered by the image to be recovered [22]. Moreover, to A. Notations and observation model define the prior distribution assigned to this image, we resort Let Z1,..., ZP denote a set of P multi-band images to geometrical considerations well admitted in the HS imaging acquired by different optical sensors for a same scene X. literature devoted to the linear unmixing problem [23]. In These measurements can be of different natures, e.g., PAN, MS particular, the high spatial resolution HS image to be estimated and HS, with different spatial and/or spectral resolutions. The is assumed to live in a lower dimensional subspace, which is observed data Zp, p = 1,...,P , are supposed to be degraded a suitable hypothesis when the observed scene is composed of versions of the high-spectral and high-spatial resolution scene a finite number of macroscopic materials. X, according to the following observation model Within a Bayesian estimation framework, two statistical es- timators are generally considered. The minimum mean square Zp = Fp (X) + Ep. (1) error (MMSE) estimator is defined as the mean of the posterior In (1), Fp (·) is a linear or nonlinear transformation that distribution. Its computation generally requires intractable models the degradation operated on X. As previously assumed multidimensional integrations. Conversely, the maximum a in numerous works (see for instance [3], [19], [24], [28], posteriori (MAP) estimator is defined as the mode of the [29] among some recent contributions), these degradations posterior distribution and is usually associated with a penalized may include spatial blurring, spatial decimation and spectral maximum likelihood approach. Mainly due to the complexity mixing which can all be modeled by linear transformations. of the integration required by the computation of the MMSE In what follows, the remotely sensed images Zp and the estimator (especially in high-dimension data space), most of unobserved scene X are assumed to be pixelated images of the Bayesian estimators have proposed to solve the HS and sizes nx,p ×ny,p ×nλ,p and mx ×my ×mλ, respectively, where MS fusion problem using a MAP formulation [18], [19], [24]. ·x and ·y refer to both spatial dimensions of the images, and ·λ However, optimization algorithms designed to maximize the is for the spectral dimension. Moreover, in the right-hand side posterior distribution may suffer from the presence of local of (1), Ep stands for an additive error term that both reflects extrema, that prevents any guarantee to converge towards the the mismodeling and the observation noise. actual maximum of the posterior. In this paper, we propose Classically, the observed image Zp can be lexicographi- to compute the MMSE estimator of the unknown scene by cally ordered to build the Np × 1 vector zp, where Np = using samples generated by a Markov chain Monte Carlo nx,pny,pnλ,p is the total number of measurements in the (MCMC) algorithm. The posterior distribution resulting from observed image Zp. For writing convenience, but without the proposed forward model and the a priori modeling is any loss of generality, the band interleaved by pixel (BIP) defined in a high dimensional space, which makes difficult vectorization scheme (see [30, pp. 103–104] for a more de- the use of any conventional MCMC algorithm, e.g., the Gibbs tailed description of these data format conventions) is adopted sampler or the Metropolis-Hastings sampler [25]. To overcome in what follows (see paragraph III-B1). Considering a linear this difficulty, a particular MCMC scheme, called Hamiltonian degradation, the observation equation (1) can be easily rewrit- Monte Carlo (HMC) algorithm, is derived [26], [27]. It differs ten as follows from the standard Metropolis-Hastings algorithm by exploiting zp = Fpx + ep (2) Hamiltonian evolution dynamics to propose states with higher M Np acceptance ratio, reducing the correlation between successive where x ∈ R and ep ∈ R are ordered versions of samples. Thus, the main contributions of this paper are two- the scene X (with M = mxmymλ) and the noise term fold. First, the paper presents a new hierarchical Bayesian Ep, respectively. In this work, the noise vector ep will be assumed to be a band-dependent Gaussian sequence, i.e., fusion model whose parameters and hyperparameters have to  be estimated from the observed images. This model is defined ep ∼ N 0Np , Λp where 0Np is an Np × 1 vector made by the likelihood, the priors and the hyper-priors detailed in of zeros and Λp = Inx,pny,p ⊗ Sp is an Np × Np matrix nx,pny,p×nx,pny,p the following sections. Second, a hybrid Gibbs sampler based where Inx,pny,p ∈ R is the identity matrix, n ×n on a Hamiltonian MCMC method is introduced to sample the ⊗ is the Kronecker product and Sp ∈ R λ,p λ,p is a diagonal matrix containing the noise variances, i.e., S = desired posterior distribution. These samples are subsequently h i p s2 , ··· , s2 used to approximate the MMSE estimator of the fused image. diag p,1 p,nλ,p . DOBIGEON et al.: BAYESIAN FUSION OF MULTI-BAND IMAGES 3

In (2), Fp is an Np×M matrix that reflects the spatial and/or TABLE I NOTATIONS spectral degradation Fp (·) operated on x. As in [18], Fp (·) can represent a spatial decimating operation. For instance, Notation Definition Size when applied to a single-band image (i.e., nλ,p = mλ = 1) X unobserved scene/target image mx × my × mλ with a decimation factor d in both spatial dimensions, it is x vectorization of X mxmymλ × 1 easy to show that Fp is an nx,pny,p × mxmy block diagonal xi ith spectral vector of x mλ × 1 matrix with mx = dnx,p and my = dny,p [31]. Another u vectorized image in subspace mxmymλ × 1 example of degradation frequently encountered in the signal e ui ith spectral vector of u mλ × 1 and image processing literature is spatial blurring [19], where e µ¯ u prior mean of u mxmyme λ × 1 Fp (·) usually represents a 2-dimensional convolution by a ¯ Σu prior covariance of u mxmyme λ × mxmyme λ kernel κp. Similarly, when applied to a single-band image, Fp µ prior mean of ui mλ × 1 ui e is an nxny × nxny Toeplitz matrix. The problem addressed in Σu prior covariance of ui mλ × mλ this paper consists of recovering the high-spectral and high- i e e P number of multi-band images 1 spatial resolution scene x by fusing the various spatial and/or Zp pth remotely sensed images nx,p × ny,p × nλ,p spectral information provided by all the observed images zp vectorization of Zp nx,pny,pnλ,p × 1 z = {z1,..., zP }. z set of P observation zp nx,pny,pnλ,pP × 1

B. Bayesian estimation of x In this work, we propose to estimate the unknown scene where |Λp| is the determinant of the matrix Λp. As mentioned x within a Bayesian estimation framework. In this statistical in the previous section, the collected measurements z may estimation scheme, the fused highly-resolved image x is have been acquired by different (possibly heterogeneous) inferred through its posterior distribution f (x|z). Given the sensors. Therefore, the observed vectors z1,..., zP can be observed data, this target distribution can be derived from the generally assumed to be independent, conditionally upon the likelihood function f (z|x) and the prior distribution f (x) by unobserved scene x and the noise covariances Λ1,..., Λp. As using the Bayes formula f (x|z) ∝ f (z|x) f (x), where ∝ a consequence, the joint likelihood function of the observed data is P means “proportional to”. Based on the posterior distribution, Y f (z|x, Λ) = f (z |x, Λ ) several estimators of the scene x can be investigated. For p p (4) instance, maximizing f (x|z) leads to the MAP estimator p=1 T xˆMAP = arg maxx f (x|z). This estimator has been widely with Λ = (Λ1,..., ΛP ) . exploited for HS image enhancement (see for instance [18], [20], [21] or more recently [3], [19]). This work proposes to B. Prior distributions focus on the first moment of the posterior distribution f (x|z), which is known as the posterior mean estimator or the MMSE The unknown parameters are the scene x to be recovered and the noise covariance matrix Λ relative to each observation. estimator xˆMMSE. This estimator is defined as In this section, prior distributions are introduced for these Z R xf (z|x) f (x) dx parameters. xˆMMSE = xf (x|z) dx = R . (3) f (z|x) f (x) dx 1) Scene prior: Following a BIP strategy, the vectorized h iT In order to compute (3), we propose a flexible and relevant x x = xT , xT , ··· , xT image can be decomposed as 1 2 mxmy , statistical model to solve the fusion problem. Deriving the T where xi = [xi,1, xi,2, ··· , xi,mλ ] is the mλ × 1 vector cor- corresponding Bayesian estimators xˆMMSE defined in (3), responding to the ith spatial location (with i = 1, ··· , mxmy). requires the definition of the likelihood function f (z|x) and Since adjacent HS bands are known to be highly correlated, the prior distribution f (x). These quantities are detailed in the HS vector xi usually lives in a subspace whose dimension the next section. To facilitate reading, notations have been is much smaller than the number of bands mλ [32], i.e., summarized in Table I. T xi = V ui (5) III.HIERARCHICAL BAYESIAN MODEL where ui is the projection of the vector xi onto the subspace A. Likelihood function spanned by the columns of VT ∈ Rmλ×me λ . Note that T The statistical properties of the noise vectors ep (p = V is possibly known a priori from the scene or can be 1,...,P ) allow one to state that the observed vector zp is learned from the HS data. In the proposed framework, we normally distributed with mean vector Fpx and covariance exploit the dimensionality reduction (DR) as prior information matrix Λp. Consequently, the likelihood function, that repre- instead of reducing the dimensionality of HS data directly. sents a data fitting term relative to the observed vector zp, can Another motivation for DR is that the dimension of the be easily derived leading to subspace me λ is generally much smaller than the number of bands, i.e., mλ  mλ. As a consequence, inferring Np nx,pny,p e − 2 − m ×1 f (zp|x, Λp) = (2π) |Λp| 2 in the subspace R e λ greatly decreases the computational  1  burden of the fusion algorithm. Note that the DR technique × exp − (z − F x)T Λ−1 (z − F x) 2 p p p p p defined by (5) has been used in some related HS analysis 4 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. ??, NO. ??, ?? references, e.g., [23], [32]. More experimental justifications for C. Hyperparameter priors the necessity of DR can be found in [33]. Using the notation h iT The hyperparameter vector associated with the parameter u = uT , uT , ··· , uT u = Vx V ¯ ¯ 1 2 mxmy , we have , where is priors defined above includes µu, Σu and γ. The quality of the fusion algorithm investigated in this paper depends on an Mf×M block-diagonal matrix whose blocks are equal to V the values of the hyperparameters that need to be adjusted and Mf = mxmymλ. Instead of assigning a prior distribution e carefully. Instead of fixing all these hyperparameters a priori, to the vectors xi, we propose to define a prior for the projected we propose to estimate some of them from the data using a vectors ui (i = 1, ··· , mxmy) hierarchical Bayesian algorithm [37, Chap. 8]. Specifically, we  ui|µui , Σui ∼ N µui , Σui . (6) propose to fix µ¯ u as the interpolated HS image in the subspace of interest following the strategy in [18]. Similarly, to reduce As u is a linear transformation of x , the Gaussian prior i i the number of statistical parameters to be estimated, all the assigned to u leads to a Gaussian prior for x , which allows i i covariance matrices are assumed to equal, i.e., Σ = Σ the ill-posed problem (2) to be regularized. The covariance ui u (for i = 1, ··· , m m ). Thus, the hyperparameter vector to matrix Σ is designed to explore the correlations between x y ui be estimated jointly with the parameters of interest is Φ = the different spectral bands after projection in the subspace of {Σ , γ}. The prior distributions for these two hyperparameters interest. Also, the mean µ¯ of the whole image u as well as u u are defined below. its covariance matrix Σ¯ u can be constructed from µ and ui 1) Hyperparameter Σ : Assigning a conjugate a priori Σ as follows u ui inverse-Wishart distribution to the covariance matrix of a T µ¯ = µT , ··· , µT  Gaussian vector has provided interesting results in the signal u u1 umxmy ¯   and image processing literature [38]. Following these works, Σu = diag Σu1 , ··· , Σum m . x y we have chosen the following prior for Σu The Gaussian prior assigned to u implies that the target image −1 Σu ∼ W (Ψ, η) (9) u is a priori not too far from the mean vector µ¯ u, whereas the covariance matrix Σ¯ u tells us how much confidence we whose density is have for the prior (the choice of the hyperparameters µ¯ u η 2 |Ψ| − η+mfλ+1 1 −1 Σ¯ 2 − tr(ΨΣu ) and u will be discussed later in Section III-C). Choosing f(Σu|Ψ, η) = |Σu| e 2 . ηmfλ η a Gaussian prior for the vectors ui is also motivated by the 2 2 Γ ( ) me λ 2 fact that this kind of prior has been used successfully in several Again, the hyper-hyperparameters Ψ and η will be fixed to works related to the fusion of multiple degraded images, provide a non-informative prior. including [20], [34], [35]. Note finally that the Gaussian prior 2) Hyperparameter γ: To reflect the absence of prior has the interest of being a conjugate distribution relative to knowledge regarding the mean noise level, a non-informative the statistical model (4). As it will be shown in Section IV, Jeffreys prior is assigned to the hyperparameter γ coupling this Gaussian prior distribution with the Gaussian likelihood function leads to simpler estimators constructed 1 f (γ) ∝ 1R+ (γ) (10) from the posterior distribution f (u|z). γ + where 1R+ (·) is the indicator function defined on R ( 2) Noise variance priors: Inverse-gamma distributions are 1, if u ∈ R+, 2 chosen as prior distributions for the noise variances s (i = 1R+ (u) = p,i 0, otherwise. 1, . . . , nλ,p, p = 1,...,P ) ν γ  The use of the improper distribution (10) is classical and s2 |ν, γ ∼ IG , . (7) p,i 2 2 can be justified by different means (e.g., see [37, Chap. 1]), providing that the corresponding full posterior distribution is statistically well defined, which is the case for the proposed The inverse-gamma distribution is a very flexible distribu- fusion model. tion whose shape can be adjusted by its two parameters. For simplicity, we propose to fix the hyperparameter ν whereas the hyperparameter γ will be estimated from the data. This D. Inferring the highly-resolved HS image from the posterior strategy is very classical for scale parameters (e.g., see [36]). distribution of its projection u Note that the inverse-gamma distribution (7) is conjugate Following the parametrization in the prior model (5), the for the statistical model (4), which will allow closed-form unknown parameter vector θ = u, s2 is composed of the expressions to be obtained for the conditional distributions projected scene u and the noise variance vector s2. The joint 2  f sp,i|z of the noise variances. By assuming the variances posterior distribution of the unknown parameters and hyperpa- 2  2 s = sp,i (∀p, i) are a priori independent, the joint prior rameters can be computed following the hierarchical structure 2 distribution of the noise variance vector s is f (θ, Φ|z) ∝ f (z|θ) f (θ|Φ) f (Φ). By assuming prior in- P nλ,p dependence between the hyperparameters Σu and γ and the 2  Y Y 2  2 f s |ν, γ = f sp,i|ν, γ . (8) parameters u and s conditionally upon (Σu, γ), the following p=1 i=1 2  results can be obtained f (θ|Φ) = f (u|Σu) f s |γ and DOBIGEON et al.: BAYESIAN FUSION OF MULTI-BAND IMAGES 5

f (Φ) = f (Σu) f (γ). Note that f (z|θ), f (u|Σu) and f s2|γ have been defined in (4), (6) and (8). ALGORITHM 1: The posterior distribution of the projected target image u, Hybrid Gibbs sampler required to compute the Bayesian estimators (3), is obtained by for t = 1 to NMC do marginalizing out the hyperparameter vector Φ and the noise % Sampling the image variances - see paragraph IV-A 2 ˜ (t) variances s from the joint posterior distribution f (θ, Φ|z) Sample Σu according to the conditional distribution (13) Z % Sampling the high-resolved image - see paragraph IV-B 2 2 Sample u˜(t) using an HMC algorithm detailed in Algorithm 2 f (u|z) ∝ f (θ, Φ|z) dΦds1,1, . . . , dsP,n . (11) λ,P % Sampling the noise variances - see paragraph IV-C p = 1 P The posterior distribution (11) is too complex to obtain closed- for to do for i = 1 to nλ,p do form expressions of the MMSE and MAP estimators uˆMMSE 2(t) Sample s˜p,i from the conditional distribution (18) and uˆMAP. As an alternative, this paper proposes to use an end for MCMC algorithm to generate a collection of NMC samples end for U = u˜(1),..., u˜(NMC) that are asymptotically distributed end for according to the posterior of interest f (u|z). These samples will be used to compute the Bayesian estimators of u. More precisely, the MMSE estimator of u will be approximated by an empirical average of the generated samples 2  B. Sampling u according to f u|Σu, s , z NMC 1 X (t) uˆMMSE ≈ u˜ (12) NMC − Nbi t=Nbi+1 Choosing the conjugate distribution (6) as prior distribution for the projected unknown image u leads to the following where Nbi is the number of burn-in iterations. Once the MMSE conditional posterior distribution for u estimate uˆMMSE has been computed, the highly-resolved HS image can be computed as xˆ = VT uˆ . Sam-   MMSE MMSE u|Σ , s2, z ∼ N µ , Σ (14) pling directly according to the marginal posterior distribution u u|z u|z f (u|z) is not straightforward. Instead, we propose to sample with 2  according to the joint posterior f u, s , Σu|z (hyperparam- h i−1 ¯ −1 PP T −1 T eter γ has been marginalized) by using a Metropolis-within- Σu|z = Σu + p=1 VFp Λp FpV h i Gibbs sampler, which can be easily implemented since all µ = Σ PP VFT Λ−1z + Σ¯ −1µ¯ . 2  u|z u|z p=1 p p p u u the conditional distributions associated with f u, s , Σu|z are relatively simple. The resulting hybrid Gibbs sampler is Sampling directly according to this multivariate Gaussian detailed in the following section. distribution requires the inversion of an Mf×Mf matrix, which is impossible in most fusion problems. An alternative would consist of sampling each element ui (i = 1,..., Mf) of u con- YBRID IBBS AMPLER 2  IV. H G S ditionally upon the others according to f ui|u−i, s , Σu, z , where u is the vector u whose ith component has been The Gibbs sampler has received a considerable attention −i removed. However, this alternative would require to sample u in the statistical community to solve Bayesian estimation by using M Gibbs moves, which is time demanding and leads problems [25]. The interesting property of this Monte Carlo f to poor mixing properties. algorithm is that it only requires to determine the conditional distributions associated with the distribution of interest. These The efficient strategy adopted in this work relies on a conditional distributions are generally easier to simulate than particular MCMC method, called Hamiltonian Monte Carlo the joint target distribution. The block Gibbs sampler that we (HMC) method (sometimes referred to as hybrid Monte Carlo 2  use to sample according to f u, s , Σu|z is defined by a method), which is considered to generate vectors u directly. 3-step procedure reported in Algorithm 1. The distribution More precisely, we consider the HMC algorithm initially involved in this algorithm are detailed below. proposed by Duane et al. for simulating the lattice field theory in [26]. As detailed in [39], this technique allows mixing property of the sampler to be improved, especially in a high- 2  A. Sampling Σu according to f Σu|u, s , z dimensional problem. It exploits the gradient of the distri- bution to be sampled by introducing auxiliary “momentum” Standard computations yield the following inverse-Wishart variables m ∈ RMf. The joint distribution of the unknown distribution as conditional distribution for the covariance ma- parameter vector u and the momentum is defined as trix Σu of the scene to be recovered 2  2  2 f u, m|s , Σu, z = f u|s , Σu, z f (m) Σu|u, s , z ∼ mxmy ! where f (m) is the normal probability density function (pdf) −1 X T W Ψ + (ui − µui ) (ui − µui ), mxmy + η . with zero mean and identity covariance matrix. The Hamilto- i=1 nian of the considered system is defined by taking the negative (13) 2  logarithm of the posterior distribution f u, m|s , µu, Σu, z 6 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. ??, NO. ??, ?? to be sampled, i.e., solve super resolution problems. Similarly, Orieux et al. have proposed a perturbation approach to sample high-dimensional H (u, m) = − log f u, m|s2, µ , Σ , z u u (15) Gaussian distributions for general linear inverse problems [41]. = U (u) + K (m) However, these techniques rely on additional optimization where U (u) is the potential energy function defined by schemes included within the Monte Carlo algorithm, which 2  implies that the generated samples are only approximately the negative logarithm of f u|s , Σu, z and K (m) is the corresponding kinetic energy distributed according to the target distribution. Conversely, the HMC strategy proposed here ensures asymptotic conver- 2  U (u) = − log f u|s , Σu, z gence of the generated samples to the posterior distribution. (16) 1 T Moreover, the HMC method is very flexible and can be K (m) = 2 m m. easily extended to handle non-Gaussian posterior distributions The parameter space where (u, m) lives is explored following contrary to the methods investigated in [40], [41]. the scheme detailed in Algo 2. At iteration t of the Gibbs sampler, a so-called leap-frogging procedure composed of 2 2  Nleapfrog iterations is achieved to propose a move from the C. Sampling s according to f s |u, Σu, z current state u˜(t), m˜ (t) to the state u˜(?), m˜ (?) with step 2 The conditional pdf of the noise variance sp,i (i = Mf Mf size ε. This move is operated in R × R in a direction 1, . . . , nλ,p, p = 1,...,P ) is given by the gradient of the energy function 2  f sp,i|u, Σu, z ∝ P n n x,p y,p +1 X T −1 T  −1 ! 2 T 2 ! (17) ∇uU (u) = − VFp Λp zp − FpV u +Σu (u−µ¯ u). 1 (zp − FpV u)i p=1 2 exp − 2 sp,i 2sp,i ρ = Then, the new state is accepted with probability t T min {1,A } where (zp − FpV u)i contains the elements of the ith t where 2 band. Generating samples sp,i distributed according to (?) (?) 2  f u˜ , m˜ |s , Σu, z f s2 |u, Σ ,  A = p,i u z is classically achieved by drawing samples t (t) (t) 2  f u˜ , m˜ |s , Σu, z from the following inverse-gamma distribution h    i (t) (t) (?) (?) T 2 ! = exp H u˜ , m˜ − H u˜ , m˜ . nx,pny,p (zp − FpV u)i s2 |u, z ∼ IG , . (18) p,i 2 2 In practice, if the noise variances are known a priori, we ALGORITHM 2: simply assign the noise variances to be known values and Hybrid Monte Carlo algorithm remove the sampling of the noise variances.

% Momentum initialization Sample m˜ (?) ∼ N 0 , I  Mf Mf D. Complexity Analysis Set m˜ (t) ← m˜ (?) % Leapfrogging The MCMC method can be computationally costly com- for j = 1 to NL do pared with optimization methods. The complexity of the (?) (?) ε  (?) Set m˜ ← m˜ − 2 ∇uU u˜ proposed Gibbs sampler is mainly due to the Hamiltonian Set u˜(?) ← u˜(?) + εm˜ (?) Monte Carlo method. The complexity of the Hamiltonian   Set m˜ (?) ← m˜ (?) − ε ∇ U u˜(?) O((m )3) + O((m m m )2) 2 u MCMC method is fλ fλ x y , which is end for highly expensive as mλ increases. Generally the number of % Accept/reject procedure, See (IV-B) pixels mxmy cannot be reduced significantly. Thus, projecting Sample w ∼ U ([0, 1]) the high-dimensional mλ ×1 vectors to a low-dimension space if w < ρt then u˜(t+1) ← u˜(?) to form mfλ×1 vectors decreases the complexity while keeping else most important information. u˜(t+1) ← u˜(t) end if Set x˜(t+1) = VT u˜(t+1) V. SIMULATION RESULTS Run Algorithm 3 to update stepsize This section studies the performance of the proposed Bayesian fusion algorithm. The reference image, considered here as the high spatial and high spectral image, is an hyper- This accept/reject procedure ensures that the simulated spectral image acquired over Moffett field, CA, in 1994 by (t) (t) vectors (u˜ , m˜ ) are asymptotically distributed according the JPL/NASA airborne visible/infrared imaging spectrometer to the distribution of interest. The way the parameters ε and (AVIRIS)2. This image was initially composed of 224 bands NL have been adjusted will be detailed in Section V. that have been reduced to 177 bands (mλ = nλ,1 = 177) after To sample according to a high-dimension Gaussian distri- removing the water vapor absorption bands. 2  bution such as f u|Σu, s , z , one might think of using other simulation techniques such as the method proposed in [40] to 2http://aviris.jpl.nasa.gov/ DOBIGEON et al.: BAYESIAN FUSION OF MULTI-BAND IMAGES 7

A. Fusion of HS and MS images We propose to reconstruct the reference HS image from two lower resolved images. First, a high-spectral low-spatial reso- lution image z1, denoted as HS image, has been generated by applying a 5×5 averaging filter on each band of the reference image. Besides, an MS image z2 is obtained by successively averaging the adjacent bands according to realistic spectral responses. More precisely, the reference image is filtered using the LANDSAT-like spectral responses depicted in the top of 3 Fig. 1, to obtain a 7-band (nλ,2 = 7) MS image . Note here that the observation models F1 and F2 corresponding to the HS and MS images are perfectly known. In addition to the blurring and spectral mixing, the HS and MS images have been both contaminated by zero-mean additive Gaussian 2 noise. The noise power sp,i depends on the signal to noise ratio SNRp,i (i = 1, ··· , nλ,p, p = 1, 2) defined by SNRp,i = Fig. 2. AVIRIS dataset: (Top left) HS Image. (Top right) MS Image. (Middle  k(F x) k2  p i F left) MAP [18]. (Middle right) Wavelet MAP [19]. (Bottom left) Hamiltonian 10 log10 n n s2 , where k.kF is the Frobenius norm. x,p y,p p,i MCMC. (Bottom right) Reference. Our simulations have been conducted with SNR1,· = 35dB for the first 127 bands and SNR1,· = 30dB for the remaining 50 bands of the HS image. For the MS image, SNR is 30dB T 2,· z = z , z , ··· , z  . The sample covariance for all bands. A composite color image, formed by selecting 1,i 1,i,1 1,i,2 1,i,nλ,1 matrix of the HS image z is diagonalized leading to the red, green and blue bands of the high-spatial resolution 1 HS image (the reference image) is shown in the right bottom WT ΥW = D (19) of Fig. 2. The noise-contaminated HS and MS images are where W is an m × m orthogonal matrix (WT = W−1) depicted in the top left and top right of Fig. 2. λ λ and D is a diagonal matrix whose diagonal elements are the

1 ordered eigenvalues of Υ denoted as d1 ≥ d2 ≥ ... ≥ dmλ . The dimension of the projection subspace me λ is defined as the Pme λ Pmλ R 0.5 minimum integer satisfying the condition i=1 di/ i=1 di ≥ 0.99. The matrix V is then constructed as the eigenvectors 0 associated with the m largest eigenvalues of Υ. As an 20 40 60 80 100 120 140 160 e λ Band illustration, the eigenvalues of the sample covariance matrix

1 Υ for the Moffett field image are displayed in Fig. 3. For this example, the m = 10 eigenvectors contain 99.93% of the noise e λ + 0.5 R information. To conclude this part, we invite the readers to

0 consult the technical report [33] containing additional results 20 40 60 80 100 120 140 160 Band obtained for this image with and without PCA (illustrating the importance of DR). Fig. 1. LANDSAT spectral responses. (Top) without noise. (Bottom) with an additive Gaussian noise with FSNR = 8dB.

1) Subspace learning: Learning the matrix V in (5) is a 0 10 preprocessing step, which can be solved by different strategies. −2 10 A lot of DR methods might be exploited, such as locally Eigenvalue −4 10 linear embedding (LLE) [42], independent component analysis 0 50 100 150 Principal Component Number (ICA) [43], hyperspectral signal subspace identification by minimum error (HySime) [32], minimum change rate devia- Fig. 3. Eigenvalues of Υ for the HS image. tion (MCRD) [44] and so on. In this work, we propose to use the principal component analysis (PCA), which is a classical 2) Hyper-hyperparameter selection: In our experiments, DR technique used in HS imagery. It maps the original data fixed hyper-hyperparameters have been chosen as follows: into a lower dimensional subspace while preserving most Ψ = I , η = m + 3. These choices can be motivated by me λ e λ information about the original data. Note that the bases of the following arguments this subspace are the columns of the transformation matrix VT , which are exactly the same for all pixels (or spectral • The identity matrix assigned to Ψ ensures a non- vectors). As in paragraph III-B1, the vectorized HS image z1 informative prior. h iT • T T T Setting the inverse gamma parameters to η = me λ + 3 can be written as z1 = z , z , ··· , z , where 1,1 1,2 1,nx,1ny,1 also leads to a non-informative prior [36]. • The parameter ν disappears when the joint posterior is 3Complementary results obtained on another dataset with an alternative multispectral sensor are available in [33]. integrated out with respect to parameter γ. 8 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. ??, NO. ??, ??

B. Stepsize and Leapfrog Steps b) SAM: The spectral angle mapper (SAM) measures the The performance of the HMC method is mainly governed spectral distortion between the actual and estimated images. The SAM of two spectral vectors x and xˆ is defined as by the stepsize ε and the number of leapfrog steps NL. n n  hxn,xˆni  SAM(xn, xˆn) = arccos . The average SAM is As pointed out in [27], a too large stepsize will result in kxnk2kxˆnk2 a very low acceptance rate and a too small stepsize yields finally obtained by averaging the SAMs of all image pixels. high computational complexity. In order to adjust the stepsize Note that SAM value is expressed in radians and thus belongs π π parameter ε, we propose to monitor the statistical acceptance to [− 2 , 2 ]. The smaller the absolute value of SAM, the less Na,t ratio ρˆt defined as ρˆt = where NW is the length of the important the spectral distortion. NW counting window (in our experiment, the counting window c) UIQI: The universal image quality index (UIQI) was at time t contains the vectors x˜(t−NW+1), x˜(t−NW), ··· , x˜(t) proposed in [48] for evaluating the similarity between two with NW = 50) and Na,t is the number of accepted samples single band images. It is related to the correlation, luminance in this window at time t. As explained in [45], the adaptive distortion and contrast distortion of the estimated image to tuning should adapt less and less as the algorithm proceeds the reference image. The UIQI between a = [a1, a2, ··· , aN ] to guarantee that the generated samples form a stationary and aˆ = [ˆa1, aˆ2, ··· , aˆN ] is defined as UIQI(a, aˆ) = 4σ2 µ µ Markov chain. In the proposed implementation, the parameter aaˆ a aˆ 2 2 2 2 2 2 , where µa, µaˆ, σa, σaˆ are the sample means (σa+σaˆ )(µa+µaˆ ) ε is adjusted as in Algorithm 3. The thresholds have been 2 and variances of a and aˆ, and σaaˆ is the sample covariance of fixed to (αd, αu) = (0.3, 0.9) and the scale parameters are (a, aˆ). The range of UIQI is [−1, 1] and UIQI= 1 when a = aˆ. (βd, βu) = (1.1, 0.9) (these parameters were adjusted by For multi-band image, the UIQI is obtained band-by-band and cross-validation). Note that the initial value of ε should not be averaged over all bands. too large to ‘blow up’ the leapfrog trajectory [27]. Generally, d) ERGAS: The relative dimensionless global error in the stepsize converges after some iterations of Algorithm 3. synthesis (ERGAS) calculates the amount of spectral distortion in the image [49]. This measure of fusion quality is defined r   ALGORITHM 3: 1 1 Pmλ RMSE(i) 2 as ERGAS = 100 × 2 , where 1/d is d mλ i=1 µi Adjusting Stepsize the ratio between the pixel sizes of the MS and HS images, Na,t Update ρˆt with Na,t : ρˆt = µi is the mean of the ith band of the HS image, and mλ is NW % Burn-in (t ≤ NMC): the number of HS bands. The smaller ERGAS, the smaller the if ρˆt > αu then spectral distortion. Set ε = βuε e) DD: The degree of distortion (DD) between two else if ρˆ < α then t d images X and Xˆ is defined as DD(X, Xˆ ) = 1 kvec(X) − Set ε = βdε M end if vec(Xˆ )k1. The smaller DD, the better the fusion. % After Burn in (t > NMC): if ρˆt > αu then Set ε = [1 − (1 − βu)exp(−0.01 × (t − Nbi))]ε, D. Comparison with other Bayesian models else if ρˆt < αd then The Bayesian model proposed here differs from previous Set ε = [1 − (1 − βd)exp(−0.01 × (t − Nbi))]ε, Bayesian models [18], [19] in three-fold. First, in addition to end if the target image x, the hierarchical Bayesian model allows the (t = Nbi + 1, ··· ,NMC) distributions of the noise variances s2 and the hyperparameter Σu to be inferred. The hierarchical inference structure makes this Bayesian model more general and flexible. Second, the Regarding the number of leapfrogs, setting the trajectory covariance matrix Σ is assumed to be block diagonal, which length N by trial and error is necessary [27]. To avoid the u L allows us to exploit the correlations between spectral bands. potential resonance, N is randomly chosen from a uniform L Third, the proposed method takes advantage of the relation distribution from N to N . After some preliminary runs min max between the MS image and the target image by introducing and tests, Nmin = 50 and Nmax = 55 have been selected. a forward model F2. This paragraph compares the proposed Bayesian fusion method with the two state-of-the-art fusion C. Evaluation of the Fusion Quality algorithms of [18] [19] for HS+MS fusion. The MMSE To evaluate the quality of the proposed fusion strategy, estimator of the image using the proposed Bayesian method is different image quality measures can be investigated. Referring obtained from (12). In this simulation, NMC = 500 and Nbi = to [19], we propose to use RSNR, SAM, UIQI, ERGAS and 500. The fusion results obtained with different algorithms DD as defined below. These measures have been widely used are depicted in Fig. 2. Graphically, the proposed algorithm in the HS image processing community and are appropriate performs competitively with the state-of-the-art methods. This for evaluating the quality of the fusion in terms of spectral result is confirmed quantitatively in Table II which shows the and spatial resolutions [18], [46], [47]. RSNR, UIQI, SAM, ERGAS and DD for the three methods. a) RSNR: The reconstruction SNR (RSNR) is related Note that the HMC method provides slightly better results in to the difference between the actual and fused images terms of image restoration than the other methods. However,  kxk2  RSNR(x, xˆ) = 10 log10 2 . The larger RSNR, the the proposed method allows the image covariance matrix and kx−xˆk2 better the fusion quality and vice versa. the noise variances to be estimated. The samples generated by DOBIGEON et al.: BAYESIAN FUSION OF MULTI-BAND IMAGES 9 the MCMC method can also be used to compute confidence 27 intervals for the estimators (e.g., see error bars in Fig. 4). 26 25 MAP Wavelet

RSNR(dB) 24 TABLE II HMC 23 PERFORMANCEOF HS+MS FUSIONMETHODSINTERMSOF:RSNR(DB), 5 10 15 20 25 30 FSNR(dB) UIQI,SAM(DEG),ERGAS AND DD(×10−2) (AVIRIS DATASET).

Methods RSNR UIQI SAM ERGAS DD Time(s) Fig. 5. Reconstruction errors of the different fusion methods versus FSNR. MAP 23.33 0.9913 5.05 4.21 4.87 1.6 Wavelet 25.53 0.9956 3.98 3.95 3.89 31 proposed method always outperforms the MAP and wavelet- Proposed 26.74 0.9966 3.40 3.77 3.33 530 based MAP methods. Thus, the proposed method is quite robust with respect to the imperfect knowledge of f2.

E. Estimation of the noise variances G. Application to pansharpening The proposed Bayesian method allows noise variances s2 p,i The proposed algorithm can also be used for pansharpening, (i = 1, ··· , n , p = 1, ··· ,P ) to be estimated from the λ,p which is an important and popular application in the area of re- samples generated by the Gibbs sampler. The MMSE estima- mote sensing. In this section, we focus on fusing panchromatic tors of s2 and s2 are illustrated in Fig. 4. Graphically, 1,(·) 2,(·) and hyperspectral images (HS+PAN), which is the extension of the estimations can track the variations of the noise powers conventional pansharpening (MS+PAN). The reference image within tolerable discrepancy. considered in this section (the high spatial and high spectral

−2 image) is a 128 × 64 × 93 HS image with very high spatial 10 Estimation resolution of 1.3 m/pixel) acquired by the Reflective Optics Actual −3 10 System Imaging Spectrometer (ROSIS) optical sensor over the urban area of the University of Pavia, Italy. The flight was

Noise Variances −4 10 operated by the Deutsches Zentrum fur¨ Luft- und Raumfahrt 20 40 60 80 100 120 140 160 HS bands (DLR, the German Aerospace Agency) in the framework of

−2 the HySens project, managed and sponsored by the European 10 Estimation Union. This image was initially composed of 115 bands that Actual −3 10 have been reduced to 93 bands after removing the water vapor absorption bands (with spectral range from 0.43 to 0.86 µm).

Noise Variances −4 10 This image has received a lot of attention in the remote 1 2 3 4 5 6 7 MS bands sensing literature [50]. The HS blurring kernel is the same as in paragraph V-A whereas the PAN image was obtained by Fig. 4. Noise variances and their MMSE estimates. (Top) HS image. (Bottom) MS image. averaging all the high resolution HS bands. The SNR of the PAN image is 30dB. Apart from [18], [19], we also compare the results with the method of [51], which proposes a popular pansharpening method. The results are displayed in Fig. 6 and F. Robustness with respect to the knowledge of F2 the quantitative results are reported in Table III. The proposed The sampling algorithm summarized in Algorithm 2 re- Bayesian method still provides interesting results. quires the knowledge of the spectral response F2. However, this knowledge can be partially known in some practical ap- TABLE III plications. As the spectral response is the same for each vector PERFORMANCEOF HS+PAN FUSIONMETHODSINTERMSOF:RSNR −2 xi (i = 1, ··· , mxmy), F2 is a block diagonal matrix whose (DB),UIQI,SAM(DEG),ERGAS AND DD(×10 )(ROSIS DATASET). blocks are f2 of size nλ,2 × mλ, i.e., F2 = diag[f2, ··· , f2]. This paragraph is devoted to testing the robustness of the Methods RSNR UIQI SAM ERGAS DD Time(s) proposed algorithm to the imperfect knowledge of f2. In AIHS [51] 16.69 0.9176 7.23 4.24 9.99 7.7 order to analyze this robustness, a zero-mean white Gaussian MAP [18] 17.54 0.9177 6.55 3.78 8.78 1.4 error has been added to any non-zero component of f2 as Wavelet [19] 18.03 0.9302 6.08 3.57 8.33 26 shown in the bottom of Fig. 1. Of course, the level of Proposed 18.23 0.9341 6.05 3.49 8.20 387 uncertainty regarding f2 is controlled by the variance of the 2 error denoted as σ2. The corresponding FSNR is defined as  kf k2  2 F VI.CONCLUSIONS FSNR = 10 log10 2 to adjust the knowledge of mλnλ,2s2 f2.The larger FSNR, the more knowledge we have about This paper proposed a hierarchical Bayesian model to f2. The RSNRs between the reference and estimated images fuse multiple multi-band images with various spectral and are displayed in Fig. 5 as a function of FSNR. Obviously, spatial resolutions. The image to be recovered was assumed the performance of the proposed Bayesian fusion algorithm to be degraded according to physical transformations included decreases as the uncertainty about f2 increases. However, within a forward model. An appropriate prior distribution, as long as the FSNR is above 8dB, the performance of the that exploited geometrical concepts encountered in spectral 10 IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, VOL. ??, NO. ??, ??

[5] C.-I Chang, Hyperspectral Imaging: Techniques for Spectral detection and classification. New York: Kluwer, 2003. [6] K. Kotwal and S. Chaudhuri, “A Bayesian approach to visualization- oriented hyperspectral image fusion,” Information Fusion, vol. 14, no. 4, pp. 349–360, 2013. [7] M. Cetin and N. Musaoglu, “Merging hyperspectral and panchromatic image data: qualitative and quantitative analysis,” Int. J. Remote Sens., vol. 30, no. 7, pp. 1779–1804, 2009. [8] G. A. Licciardi, M. M. Khan, J. Chanussot, A. Montanvert, L. Condat, and C. Jutten, “Fusion of hyperspectral and panchromatic images using multiresolution analysis and nonlinear PCA band reduction,” EURASIP J. Adv. Signal Process., vol. 2012, no. 1, pp. 1–17, 2012. [9] L. Loncan, L. B. Almeida, J. M. Bioucas-Dias, X. Briottet, J. Chanus- sot, N. Dobigeon, S. Fabre, W. Liao, G. Licciardi, M. Simoes, J.- Y. Tourneret, M. Veganzones, G. Vivone, Q. Wei, and N. Yokoya, “Introducing hyperspectral pansharpening,” IEEE Geosci. Remote Sens. Mag., 2015, submitted. [10] M. Moeller, T. Wittman, and A. L. Bertozzi, “A variational approach to hyperspectral image fusion,” in Proc. SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2009, p. 73341E. Fig. 6. ROSIS dataset: (Top left) Reference. (Top right) PAN Image. (Middle left) Adaptive IHS [51]. (Middle right) MAP [18]. (Bottom left) Wavelet MAP [11] Z. Chen, H. Pu, B. Wang, and G.-M. Jiang, “Fusion of hyperspectral [19]. (Bottom right) Hamiltonian MCMC. and multispectral images: A novel framework based on generalization of pan-sharpening methods,” IEEE Geosci. and Remote Sensing Lett., vol. 11, no. 8, pp. 1418–1422, Aug 2014. [12] M. E. Winter and E. Winter, “Resolution enhancement of hyperspectral unmixing problems was proposed. The resulting posterior data,” in Proc. IEEE Aerospace Conference, 2002, pp. 3–1523. distribution was efficiently sampled thanks to a Hamiltonian [13] G. Chen, S.-E. Qian, J.-P. Ardouin, and W. Xie, “Super-resolution of hyperspectral imagery using complex ridgelet transform,” Int. J. Monte Carlo algorithm. Simulations conducted on pseudo-real Wavelets, Multiresolution Inf. Process., vol. 10, no. 03, 2012. data showed that the proposed method competed with the [14] N. Ohgi, A. Iwasaki, T. Kawashima, and H. Inada, “Japanese hyper- state-of-the-art techniques to fuse MS and HS images. These multi spectral mission,” in Proc. IEEE Int. Conf. Geosci. Remote Sens. (IGARSS), Honolulu, Hawaii, USA, July 2010, pp. 3756–3759. experiments also illustrated the robustness of the proposed [15] N. Yokoya and A. Iwasaki, “Hyperspectral and multispectral data fusion method with respect to the misspecification of the forward mission on hyperspectral imager suite (HISUI),” in Proc. IEEE Int. Conf. model. Future work includes the estimation of the parameters Geosci. Remote Sens. (IGARSS), Melbourne, Australia, July 2013, pp. 4086–4089. involved in the forward model (e.g., the spatial and spectral [16] J. Zhou, D. Civco, and J. Silander, “A wavelet transform method to responses of the sensors) to obtain a fully unsupervised fusion merge Landsat TM and SPOT panchromatic data,” Int. J. Remote Sens., algorithm. The incorporation of spectral mixing constraints for vol. 19, no. 4, pp. 743–757, 1998. a possible improved spectral accuracy and the generalization [17] M. Gonzalez-Aud´ ´ıcana, J. L. Saleta, R. G. Catalan,´ and R. Garc´ıa, “Fusion of multispectral and panchromatic images using improved to nonlinear degradations would also deserve some attention. IHS and PCA mergers based on wavelet decomposition,” IEEE Trans. Finally, a comparison with very recent fusion methods [47], Geosci. and Remote Sens., vol. 42, no. 6, pp. 1291–1299, 2004. [52] would be clearly interesting. [18] R. C. Hardie, M. T. Eismann, and G. L. Wilson, “MAP estimation for hyperspectral image resolution enhancement using an auxiliary sensor,” IEEE Trans. Image Process., vol. 13, no. 9, pp. 1174–1184, Sept. 2004. [19] Y. Zhang, S. De Backer, and P. Scheunders, “Noise-resistant wavelet- CKNOWLEDGMENTS A based Bayesian fusion of multispectral and hyperspectral images,” IEEE The authors thank Dr. Paul Scheunders and Dr. Yifan Trans. Geosci. and Remote Sens., vol. 47, no. 11, pp. 3834 –3843, Nov. 2009. Zhang for sharing the codes of [19] and Jordi Inglada, from [20] M. T. Eismann and R. C. Hardie, “Application of the stochastic mixing Centre National d’Etudes´ Spatiales (CNES), for providing the model to hyperspectral resolution enhancement,” IEEE Trans. Geosci. LANDSAT spectral responses used in the experiments. The and Remote Sens., vol. 42, no. 9, pp. 1924–1933, Sept. 2004. [21] ——, “Hyperspectral resolution enhancement using high-resolution mul- authors also acknowledge Prof. Jose´ M. Bioucas Dias for tispectral imagery with arbitrary response functions,” IEEE Trans. Image valuable discussions about this work that were handled during Process., vol. 43, no. 3, pp. 455–465, March 2005. his visit in Toulouse within the CIMI Labex. [22] X. Otazu, M. Gonzalez-Audicana, O. Fors, and J. Nunez, “Introduction of sensor spectral response into image fusion methods. Application to wavelet-based methods,” IEEE Trans. Geosci. and Remote Sens., vol. 43, REFERENCES no. 10, pp. 2376–2385, 2005. [23] N. Dobigeon, S. Moussaoui, M. Coulon, J.-Y. Tourneret, and A. O. [1] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, “Bayesian fusion of hyper- Hero, “Joint Bayesian endmember extraction and linear unmixing for spectral and multispectral images,” in Proc. IEEE Int. Conf. Acoust., hyperspectral imagery,” IEEE Trans. Signal Process., vol. 57, no. 11, Speech, and Signal Processing (ICASSP), Florence, Italy, May 2014. pp. 4355–4368, 2009. [2] I. Amro, J. Mateos, M. Vega, R. Molina, and A. K. Katsaggelos, [24] M. V. Joshi, L. Bruzzone, and S. Chaudhuri, “A model-based approach to “A survey of classical methods and new trends in pansharpening of multiresolution fusion in remotely sensed images,” IEEE Trans. Geosci. multispectral images,” EURASIP J. Adv. Signal Process., vol. 2011, and Remote Sens., vol. 44, no. 9, pp. 2549–2562, Sept. 2006. no. 1, pp. 1–22, 2011. [25] C. P. Robert and G. Casella, Monte Carlo statistical methods. New [3] M. Joshi and A. Jalobeanu, “MAP estimation for multiresolution fusion York, NY, USA: Springer-Verlag, 2004. in remotely sensed images using an IGMRF prior model,” IEEE Trans. [26] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, “Hybrid Geosci. and Remote Sens., vol. 48, no. 3, pp. 1245–1255, March 2010. Monte Carlo,” Lett. B, vol. 195, no. 2, pp. 216–222, Sept. 1987. [4] X. Ding, Y. Jiang, Y. Huang, and J. Paisley, “Pan-sharpening with [27] R. M. Neal, “MCMC using Hamiltonian dynamics,” Handbook of a Bayesian nonparametric dictionary learning model,” in Proc. Int. Markov Chain Monte Carlo, vol. 54, pp. 113–162, 2010. Conf. Artificial Intelligence and Statistics (AISTATS), Reykjavik, Iceland, [28] D. Fasbender, D. Tuia, P. Bogaert, and M. Kanevski, “Support-based 2014, pp. 176–184. implementation of Bayesian data fusion for spatial enhancement: Appli- DOBIGEON et al.: BAYESIAN FUSION OF MULTI-BAND IMAGES 11

cations to ASTER thermal images,” IEEE Geosci. and Remote Sensing Qi Wei (S’13) was born in Shanxi, China, in 1989. Lett., vol. 5, no. 4, pp. 598–602, Oct. 2008. He received the B.Sc degree in electrical engineering [29] M. Elbakary and M. Alam, “Superresolution construction of multispec- from Beihang University (BUAA), Beijing, China tral imagery based on local enhancement,” IEEE Geosci. and Remote in 2010. From February to August of 2012, he Sensing Lett., vol. 5, no. 2, pp. 276–279, April 2008. was an exchange master student with the Signal [30] J. B. Campbell, Introduction to remote sensing, 3rd ed. New-York, NY: Processing and Communications Group in the De- Taylor & Francis, 2002. partment of Signal Theory and Communications [31] R. Schultz and R. Stevenson, “A Bayesian approach to image expansion (TSC), Universitat Politecnica` de Catalunya (UPC). for improved definition,” IEEE Trans. Image Process., vol. 3, no. 3, pp. Since September of 2012, he has been a PhD student 233–242, May 1994. with the National Polytechnic Institute of Toulouse [32] J. M. Bioucas-Dias and J. M. Nascimento, “Hyperspectral subspace (University of Toulouse, INP-ENSEEIHT). He is identification,” IEEE Trans. Geosci. and Remote Sens., vol. 46, no. 8, also with the Signal and Communications Group of the IRIT Laboratory. pp. 2435–2445, 2008. His research has been focused on statistical signal processing, especially on [33] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, “Bayesian fusion of multi- inverse problems in image processing. band images – Complementary results and supporting materials,” Univ. of Toulouse, IRIT/INP-ENSEEIHT, Tech. Rep., 2014. [34] R. C. Hardie, K. J. Barnard, and E. E. Armstrong, “Joint MAP registration and high-resolution image estimation using a sequence of undersampled images,” IEEE Trans. Image Process., vol. 6, no. 12, pp. 1621–1633, Dec. 1997. [35] N. A. Woods, N. P. Galatsanos, and A. K. Katsaggelos, “Stochastic methods for joint registration, restoration, and interpolation of multiple Nicolas Dobigeon (S’05–M’08–SM’13) was born undersampled images,” IEEE Trans. Image Process., vol. 15, no. 1, pp. in Angouleme,ˆ France, in 1981. He received the 201–213, Jan. 2006. Engineering degree in electrical engineering from [36] E. Punskaya, C. Andrieu, A. Doucet, and W. Fitzgerald, “Bayesian curve ENSEEIHT, Toulouse, France, and the M.Sc. degree fitting using MCMC with applications to signal segmentation,” IEEE in signal processing from the INP Toulouse, both Trans. Signal Process., vol. 50, no. 3, pp. 747–758, March 2002. in 2004, and the Ph.D. degree and Habilitation [37] C. P. Robert, The Bayesian Choice: from Decision-Theoretic Motiva- Diriger des Recherches in signal processing from tions to Computational Implementation, 2nd ed., ser. Springer Texts in INP Toulouse in 2007 and 2012, respectively. From Statistics. New York, NY, USA: Springer-Verlag, 2007. 2007 to 2008, he was a Post-Doctoral Research As- [38] M. Bouriga and O. Feron,´ “Estimation of covariance matrices based on sociate with the Department of Electrical Engineer- hierarchical inverse-Wishart priors,” J. of Stat. Planning and Inference, ing and Computer Science, University of Michigan, 2012. Ann Arbor. [39] R. M. Neal, “Probabilistic inference using Markov chain Monte Carlo Since 2008, he has been at INP Toulouse, University of Toulouse, where he methods,” Dept. of Computer Science, University of Toronto, Tech. Rep. is currently an Associate Professor. He conducts his research within the Signal CRG-TR-93-1, Sept. 1993. and Communications Group, IRIT Laboratory, and he is also an Affiliated [40] H. Zhang, Y. Zhang, H. Li, and T. S. Huang, “Generative Bayesian Faculty Member of the TeSA Laboratory. His recent research activities have image super resolution with natural image prior,” IEEE Trans. Image been focused on statistical signal and image processing, with a particular Process., vol. 21, no. 9, pp. 4054–4067, 2012. interest in Bayesian inverse problems with applications to remote sensing, [41] F. Orieux, O. Feron,´ and J.-F. Giovannelli, “Sampling high-dimensional biomedical imaging, and genomics. Gaussian distributions for general linear inverse problems,” IEEE Signal Process. Lett., vol. 19, no. 5, pp. 251–254, 2012. [42] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000. [43] J. Wang and C.-I. Chang, “Independent component analysis-based dimensionality reduction with applications in hyperspectral image anal- ysis,” IEEE Trans. Geosci. and Remote Sens., vol. 44, no. 6, pp. 1586– Jean-Yves TOURNERET (SM’08) received the 1600, 2006. ingenieur´ degree in electrical engineering from EN- [44] R. Dianat and S. Kasaei, “Dimension reduction of optical remote sensing SEEIHT, Toulouse in 1989 and the Ph.D. degree images via minimum change rate deviation method,” IEEE Trans. from the INP Toulouse in 1992. He is currently a Geosci. and Remote Sens., vol. 48, no. 1, pp. 198–206, 2010. professor in the University of Toulouse (ENSEEIHT) [45] G. O. Roberts and J. S. Rosenthal, “Coupling and ergodicity of adaptive and a member of the IRIT laboratory (UMR 5505 Markov Chain Monte Carlo algorithms,” J. of Appl. Probability, vol. 44, of the CNRS). His research activities are centered no. 2, pp. pp. 458–475, 2007. around statistical signal and image processing with [46] Y. Zhang, A. Duijster, and P. Scheunders, “A Bayesian restoration a particular interest to Bayesian and Markov chain approach for hyperspectral images,” IEEE Trans. Geosci. and Remote Monte Carlo (MCMC) methods. Sens., vol. 50, no. 9, pp. 3453 –3462, Sep. 2012. He has been involved in the organization of sev- [47] N. Yokoya, T. Yairi, and A. Iwasaki, “Coupled nonnegative matrix eral conferences including the European conference on signal processing EU- factorization unmixing for hyperspectral and multispectral data fusion,” SIPCO’02 (program chair), the international conference ICASSP’6 (plenaries), IEEE Trans. Geosci. and Remote Sens., vol. 50, no. 2, pp. 528–537, the statistical signal processing workshop SSP’12 (international liaisons), the 2012. International Workshop on Computational Advances in Multi-Sensor Adap- [48] Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE tive Processing CAMSAP 2013 (local arrangements), the statistical signal Signal Process. Lett., vol. 9, no. 3, pp. 81–84, 2002. processing workshop SSP’2014 (special sessions), the workshop on machine [49] L. Wald, “Quality of high resolution synthesised images: Is there a learning for signal processing MLSP2014 (special sessions). He has been the simple criterion?” in Proc. Int. Conf. Fusion of Earth Data, , France, general chair of the CIMI workshop on optimization and statistics in image Jan 2000, pp. 99–103. processing hold in Toulouse in 2013 (with F. Malgouyres and D. Kouame)´ and [50] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. Benediktsson, “SVM- and of the International Workshop on Computational Advances in Multi-Sensor MRF-based method for accurate classification of hyperspectral images,” Adaptive Processing CAMSAP 2015 (with P. Djuric). He has been a member IEEE Trans. Geosci. and Remote Sens., vol. 7, no. 4, pp. 736–740, 2010. of different technical committees including the Signal Processing Theory and [51] S. Rahmani, M. Strait, D. Merkurjev, M. Moeller, and T. Wittman, Methods (SPTM) committee of the IEEE Signal Processing Society (2001- “An adaptive IHS pan-sharpening method,” IEEE Geosci. and Remote 2007, 2010-present). He has been serving as an associate editor for the IEEE Sensing Lett., vol. 7, no. 4, pp. 746–750, 2010. Transactions on Signal Processing (2008-2011) and for the EURASIP journal [52] Q. Wei, J. M. Bioucas-Dias, N. Dobigeon, and J.-Y. Tourneret, “Hyper- on Signal Processing (since July 2013). spectral and multispectral image fusion based on a sparse representa- tion,” IEEE Trans. Geosci. and Remote Sens., 2015, to appear.