Nonlinear Regression Using Smooth Bayesian Estimation

Abderrahim Halimi, Corinne Mailhes, Jean-Yves Tourneret

To cite this version:

Abderrahim Halimi, Corinne Mailhes, Jean-Yves Tourneret. Nonlinear regression using smooth Bayesian estimation. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), Apr 2015, South Brisbane, QLD, Australia. pp. 2634-2638. hal-01485021

HAL Id: hal-01485021 https://hal.archives-ouvertes.fr/hal-01485021 Submitted on 8 Mar 2017

This is an author-deposited version published in: http://oatao.univ-toulouse.fr/ (Eprints ID: 17108)

The contribution was presented at ICASSP 2015: http://icassp2015.org/

To cite this version: Halimi, Abderrahim and Mailhes, Corinne and Tourneret, Jean-Yves. Nonlinear regression using smooth Bayesian estimation. (2015) In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), 19 April 2015 - 24 April 2015 (South Brisbane, QLD, Australia).

NONLINEAR REGRESSION USING SMOOTH BAYESIAN ESTIMATION

Abderrahim Halimi, Corinne Mailhes and Jean-Yves Tourneret

(1) University of Toulouse, IRIT/INP-ENSEEIHT/TéSA, Toulouse, France e-mail: {Abderrahim.Halimi, Corinne.Mailhes, Jean-Yves.Tourneret}@enseeiht.fr.

ABSTRACT

This paper proposes a new Bayesian strategy for the estimation of smooth parameters from nonlinear models. The observed signal is assumed to be corrupted by an independent and non-identically distributed (colored) Gaussian noise. A prior enforcing a smooth temporal evolution of the model parameters is considered, and the joint posterior distribution of the unknown parameter vector is derived. A Gibbs sampler coupled with a Hamiltonian Monte Carlo algorithm is proposed to generate samples distributed according to the posterior of interest and to estimate the unknown model parameters and hyperparameters from these samples. Simulations conducted with synthetic and real satellite altimetric data show the potential of the proposed Bayesian model and of the corresponding estimation algorithm for nonlinear regression with smooth estimated parameters.

Index Terms: Bayesian algorithm, Hamiltonian Monte Carlo, MCMC, parameter estimation, radar altimetry.

1. INTRODUCTION

In many applications, the observed data are well described by a nonlinear function of a vector of parameters [1–4]. This paper aims at estimating these parameters from the observed data and a nonlinear regression model. This can be achieved by using maximum likelihood based methods [5] or, equivalently in the case of Gaussian noise, by using nonlinear least squares algorithms such as the Levenberg-Marquardt algorithm [6] and the natural gradient algorithm [7]. However, the resulting estimated parameters may be noisy and inconvenient for physical interpretation. This is particularly true when the estimation procedure is applied to signals acquired at consecutive time instants and when the parameters of these signals vary only slightly from one instant to another. In this case, considerable effort has been devoted to methods that smooth the estimated parameters. This smoothing is generally achieved by adding a time-correlation prior, resulting in the so-called Bayesian smoothing algorithms [8]. Many kinds of numerical approximations have also been proposed in the literature to handle the model nonlinearity while smoothing the parameters. For example, we can mention the extended and unscented Kalman filters [9, 10] and the Markov chain Monte Carlo (MCMC) simulation-based methods that will be considered in this paper (see [8] for more details about these methods).

The main contribution of this paper is the elaboration of a hierarchical Bayesian model that allows smooth estimation of parameters associated with different temporal signals. The observed signals are assumed to be corrupted by an additive, independent and colored Gaussian noise. The parameters of interest are assigned a prior enforcing a smooth evolution between consecutive signals, which improves their estimation. This prior is defined from the discrete Laplacian of the different parameters. Priors of this type have attracted increasing interest for many problems such as image deconvolution [11, 12], hyperspectral unmixing [13], medical imaging [14] and spectroscopy applications [15]. An algorithm is then proposed for estimating the unknown model parameters. However, the minimum mean square error (MMSE) and maximum a posteriori (MAP) estimators cannot be easily computed from the obtained joint posterior. The proposed algorithm alleviates this problem by generating samples distributed according to this posterior using MCMC methods. More precisely, we use a Hamiltonian Monte Carlo (HMC) algorithm, since it has shown good mixing properties for high-dimensional vectors [16].

The proposed estimation strategy is validated using synthetic signals as well as real satellite altimetric echoes. Altimetric radar echoes are defined as a nonlinear function of physical parameters [2, 4]: the epoch $\tau$, related to the distance between the satellite and the observed scene, the significant wave height SWH and the signal amplitude $P_u$. Moreover, the noise corrupting these echoes is known to have an approximately Gaussian distribution, which has been exploited to derive unweighted least squares (ULS) techniques [17–19] for parameter estimation. Note finally that the parameters of altimetric signals are often estimated echo by echo, independently. However, recent works [20, 21] have shown the interest of accounting for the correlation between echoes, which motivates the study of the proposed Bayesian approach.

The paper is organized as follows. Section 2 presents the hierarchical Bayesian model for the joint estimation of parameters varying smoothly from one observed signal to another, while Section 3 details the proposed estimation algorithm. Simulation results obtained with synthetic and real signals are presented in Sections 4 and 5. Conclusions and future work are finally reported in Section 6.

2. HIERARCHICAL BAYESIAN MODEL

2.1. Observation model

In this work, we consider $M$ successive signals $Y = (y_1, \dots, y_M)$ defined as noisy nonlinear functions of unknown parameters $\Theta = (\Theta_1^T, \dots, \Theta_M^T)^T$ following the model

$$y_m = s_m(\Theta_m) + e_m, \quad \text{with} \quad e_m \sim \mathcal{N}\left(\mu_m 1_K, \Sigma_m\right) \tag{1}$$

where $y_m$ and $s_m$ are $(K \times 1)$ vectors representing the $m$th observed and noiseless signals, $\Theta_m = [\theta_1(m), \dots, \theta_H(m)]$ is a $1 \times H$ vector containing the $H$ parameters of the $m$th signal, and $e_m$ is a Gaussian noise vector with mean $\mu_m 1_K$, where $1_K$ is a $(K \times 1)$ vector of ones, and diagonal covariance matrix $\Sigma_m = \mathrm{diag}(\sigma_m^2)$, with $\sigma_m^2 = (\sigma_{m1}^2, \dots, \sigma_{mK}^2)^T$ a $(K \times 1)$ vector. The proposed nonlinear regression method aims at estimating both the signal and noise parameters, with smoothness constraints, using the observation model (1).

2.2. Likelihood

The observation model (1) and the Gaussian properties of the noise sequence $e_m$ yield

$$f(y_m \mid \Theta_m, \mu_m, \Sigma_m) \propto \frac{1}{\sqrt{\prod_{k=1}^{K} \sigma_{mk}^2}} \exp\left(-\frac{1}{2} x_m^T \Sigma_m^{-1} x_m\right) \tag{2}$$

where $\propto$ means "proportional to", $x_m = y_m - s_m - \mu_m 1_K$, and $s_m(\Theta_m)$ has been denoted by $s_m$ for brevity. Assuming independence between the observations leads to

$$f(Y \mid \Theta, \mu, \Lambda) \propto \prod_{m=1}^{M} f(y_m \mid \Theta_m, \mu_m, \Sigma_m). \tag{3}$$

The unknown parameters of the observation model (1) include the noise means, gathered in the $(M \times 1)$ vector $\mu = (\mu_1, \dots, \mu_M)^T$, the $(K \times M)$ matrix $\Lambda = [\sigma_1^2, \dots, \sigma_M^2]$ containing the noise variances associated with the $M$ considered signals, and the $(M \times H)$ matrix $\Theta = [\theta_1, \dots, \theta_H]$ gathering the $H$ parameters of the $M$ signals.

2.3. Prior for signal parameters

The prior used for each parameter $\theta_i \in \Theta$ enforces smoothness of the time evolution of this parameter. This can be done by constraining the derivative of this parameter to be small. In this paper, we propose to assign a Gaussian prior distribution to the second derivative of $\theta_i$ as follows

$$f(\theta_i \mid \epsilon_i^2) \propto \left(\frac{1}{\epsilon_i^2}\right)^{M/2} \exp\left(-\frac{1}{2\epsilon_i^2}\|D\theta_i\|^2\right) \tag{4}$$

for $i \in \{1, \dots, H\}$, where $\epsilon_i^2$ is a hyperparameter, $\|\cdot\|$ denotes the standard $\ell_2$ norm such that $\|x\|^2 = x^T x$, and $D$ is the discrete Laplacian operator. This prior has been referred to as the simultaneous autoregression (SAR) or conditional autoregression (CAR) model when used for image deconvolution [11, 12]. It has also been used for the spectral unmixing of hyperspectral images [13] and for medical imaging applications [14].

2.4. Prior for the noise parameters

The absence of knowledge about the noise mean is taken into account by choosing the Jeffreys prior $f(\mu) \propto 1$, i.e., a flat prior on $\mathbb{R}^M$. Considering the noise variances, one could estimate a diagonal matrix $\Sigma_m$ for each observed signal. However, for the sake of simplicity, we assume that $r$ consecutive signals have the same variances, i.e., $\sigma_{(n-1)r+1,k}^2 = \dots = \sigma_{nr,k}^2$ for $n \in \{1, \dots, N\}$, with $N = M/r$ (the general case is obtained with $r = 1$). Assuming prior independence between the noise variances $\sigma_{nr,k}^2$, the Jeffreys prior of $\Lambda$ is defined as

$$f(\Lambda) = \prod_{n=1}^{N} \prod_{k=1}^{K} \frac{1}{\sigma_{nr,k}^2} I_{\mathbb{R}^+}\left(\sigma_{nr,k}^2\right) \tag{5}$$

where $I_A(\cdot)$ is the indicator function of the set $A$.

2.5. Hyperparameter priors

The hyperparameters $\epsilon_i^2$, $i \in \{1, \dots, H\}$, are assigned a Jeffreys prior given by

$$f(\epsilon_i^2) = \frac{1}{\epsilon_i^2} I_{\mathbb{R}^+}\left(\epsilon_i^2\right) \tag{6}$$

which reflects the absence of knowledge about these coefficients [22]. Moreover, these hyperparameters are assumed to be a priori independent, leading to

$$f(\epsilon^2) = \prod_{i=1}^{H} f(\epsilon_i^2) \tag{7}$$

with $\epsilon^2 = (\epsilon_1^2, \dots, \epsilon_H^2)$.

2.6. Posterior distribution

The proposed Bayesian model depends on the parameters $\Theta$, $\mu$, $\Lambda$ and on the hyperparameters $\epsilon^2$. The joint posterior distribution of the unknown parameters and hyperparameters can be computed from the following hierarchical structure

$$f(\Theta, \mu, \Lambda, \epsilon^2 \mid Y) \propto f(Y \mid \Theta, \mu, \Lambda)\, f(\Theta, \mu, \Lambda \mid \epsilon^2)\, f(\epsilon^2) \tag{8}$$

with $f(\Theta, \mu, \Lambda \mid \epsilon^2) = f(\mu)\, f(\Lambda) \prod_{i=1}^{H} f(\theta_i \mid \epsilon_i^2)$, after assuming a priori independence between the model parameters. The MMSE and MAP estimators associated with the posterior (8) are not easy to determine, mainly because of the nonlinearity of the observation model. The next section presents an MCMC algorithm that can be used to approximate these MMSE and MAP estimators.

3. ESTIMATION ALGORITHM

The principle of the Gibbs sampler is to sample according to the conditional distributions of the posterior of interest [23]. In this paper, we use this principle to sequentially sample the parameters $\Theta$, $\mu$, $\Lambda$ and $\epsilon^2$. When a conditional distribution cannot be sampled directly, sampling techniques such as the HMC algorithm can be applied. This algorithm has shown better mixing properties than independent or random-walk Metropolis-Hastings moves, especially for high-dimensional problems [16, 24]. It is therefore used in the present paper, since the variables to be sampled are of size $(M \times 1)$. The interested reader is invited to consult [16, 24] for more details about the HMC algorithm.

3.1. Sampling Θ

Using the likelihood (3) and the prior (4) leads to the following conditional distribution

$$f(\theta_i \mid Y, \Omega_{\setminus i}) \propto \exp\left(-\sum_{m=1}^{M} \frac{x_m^T \Sigma_m^{-1} x_m}{2} - \frac{\|D\theta_i\|^2}{2\epsilon_i^2}\right) \tag{9}$$

where $\Omega_{\setminus i} = \{\theta_1, \dots, \theta_{i-1}, \theta_{i+1}, \dots, \theta_H, \mu, \Lambda, \epsilon_i^2\}$. The conditional distribution (9) has a complex form, mainly because of the nonlinearity of the model with respect to the parameters $\Theta$. It is therefore sampled using an HMC algorithm.

3.2. Noise parameters

Using (3) and the Jeffreys prior for the noise mean $\mu$ defined in Section 2.4, it can be easily shown that the conditional distribution of $\mu_m$ is the following Gaussian distribution

$$\mu_m \mid y_m, \Theta_m, \Sigma_m \sim \mathcal{N}\left(\bar{\mu}_m, \frac{1}{\sum_{k=1}^{K} \sigma_{mk}^{-2}}\right) \tag{10}$$

with $\bar{\mu}_m = \dfrac{\sum_{k=1}^{K} (y_{mk} - s_{mk})\, \sigma_{mk}^{-2}}{\sum_{k=1}^{K} \sigma_{mk}^{-2}}$. Similarly, using (3) and (5), it can be shown that

$$f(\Lambda \mid Y, \Theta, \mu) = \prod_{n=1}^{N} \prod_{k=1}^{K} f\left(\sigma_{nr,k}^2 \mid Y_{:k}, \Theta, \mu\right) \tag{11}$$

and that $\sigma_{nr,k}^2 \mid Y_{:k}, \Theta, \mu$ is distributed according to the following inverse-gamma distribution

$$\sigma_{nr,k}^2 \mid Y_{:k}, \Theta, \mu \sim \mathcal{IG}\left(\frac{r}{2}, \beta\right) \tag{12}$$

with $\beta = \sum_{m=(n-1)r+1}^{nr} \frac{x_{mk}^2}{2}$. Note finally that the distributions (10) and (12) are easy to sample.

3.3. Hyperparameters

The conditional distribution of each hyperparameter $\epsilon_i^2$ is the inverse-gamma distribution

$$\epsilon_i^2 \mid \theta_i \sim \mathcal{IG}\left(\frac{M}{2}, \frac{\|D\theta_i\|^2}{2}\right)$$

which is easy to sample.

4. SIMULATION RESULTS

4.1. Altimetric signals

This section first considers synthetic altimetric signals generated with the physical Brown model [2], defined by

$$s_k = \frac{P_u}{2}\left[1 + \mathrm{erf}\left(\frac{kT - \tau T - \alpha\sigma_c^2}{\sqrt{2}\,\sigma_c}\right)\right] \times \exp\left[-\alpha\left(kT - \tau T - \frac{\alpha\sigma_c^2}{2}\right)\right] \tag{13}$$

where $s_k = s(kT)$ is the $k$th data sample of the received signal, $\sigma_c^2 = \left(\frac{\mathrm{SWH}}{2c}\right)^2 + \sigma_p^2$, $\mathrm{erf}(t) = \frac{2}{\sqrt{\pi}} \int_0^t e^{-z^2}\, dz$ is the Gaussian error function, $\alpha$ and $\sigma_p$ are two known satellite parameters, $\tau$ is the epoch expressed in samples (1 sample ≈ 46 cm), $c$ is the speed of light and $T$ is the time resolution. Note that the discrete altimetric echo is gathered in the $(K \times 1)$ vector $s = (s_1, \dots, s_K)^T$, where $K = 128$ samples.

The altimetric echoes are corrupted by speckle noise, whose influence is reduced by averaging (on board the satellite) a sequence of $L$ consecutive echoes. By the central limit theorem, and since the averaging is conducted over a large number of echoes, the resulting noise sequence is approximated by a Gaussian distribution. This approximation is widely adopted in the altimetric community, as shown in [25–27] and in the well-known LS estimation algorithms used in [18, 20, 21, 28]. Therefore, the observation model (1) and the proposed estimation strategy are well adapted to the processing of altimetric echoes. Note that we consider $H = 3$, $M = 500$ echoes and $\Theta_m = [\mathrm{SWH}(m), \tau(m), P_u(m)]$, and that the parameters generally belong to the following intervals of realistic values: $\mathrm{SWH} \in [0, 50]$ m, $\tau \in [5, 70]$ and $P_u > 0$. Note finally that we consider the same noise covariance for $r = 20$ successive echoes. Indeed, after averaging $L$ echoes, the altimeter delivers 20 averaged echoes per second, which share the same noise covariance matrix.

4.2. Results on synthetic data

The proposed strategy (denoted by SBMC, for smooth Bayesian MC) is first studied on 500 correlated altimetric echoes. This correlation is introduced by considering a smooth evolution of the altimetric parameters. More precisely, the synthetic parameters have been chosen as follows: $\mathrm{SWH}(m) = 2.5 + 2\cos(0.07m)$, $\tau(m) = 27 + 0.02m$ if $m < 250$ and $\tau(m) = 32 - 0.02m$ if $m \geq 250$, and $P_u(m) = 158 + 0.05\sin(0.1m)$, where $m$ denotes the echo number. The synthetic echoes are corrupted by a speckle noise resulting from the averaging of $L = 90$ echoes. SBMC is compared to the state-of-the-art ULS algorithm described in [17, 18]. Fig. 1 shows the actual parameter values and those estimated by the ULS and SBMC algorithms for 100 echoes. SBMC provides better results than ULS owing to the smoothness constraints enforced by the prior (4). Table 1 confirms this result, since the proposed approach yields smaller standard deviations (STDs). Note finally that SBMC shows a reduced bias (except for $P_u$), mainly because, contrary to ULS, it exploits the fact that the noise is colored.

Fig. 1. Comparison of the actual parameters (black dashed line) with the estimated ones using the ULS algorithm (red line) and the proposed SBMC algorithm (blue line).

Table 1. Parameter biases and STDs on synthetic data (500 echoes).

| Statistic | Algorithm | SWH (cm) | τ (cm) | $P_u$ |
|---|---|---|---|---|
| Bias | ULS | 3.18 | 1.11 | -0.01 |
| Bias | SBMC | -0.02 | -0.13 | -0.19 |
| STD | ULS | 44.7 | 6.1 | 1.91 |
| STD | SBMC | 5.74 | 1.8 | 0.52 |

5. RESULTS ON REAL JASON-2 DATA

This section illustrates the performance of the proposed SBMC algorithm when applied to a real Jason-2 dataset. The considered data last 36 minutes and consist of 43000 real echoes. Fig. 2 shows the parameters estimated by the ULS and SBMC algorithms for 500 echoes. The ULS estimates suffer from the noise corrupting the echoes, while SBMC provides smoother estimates that are physically more consistent. Moreover, SBMC appears to be more robust to outliers, as illustrated by the $P_u$ estimate #390. Note that the SWH estimated by SBMC is slightly larger than that estimated by ULS. This difference is mainly due to the i.i.d. noise assumption used in ULS, which is not consistent with the noise correlations, as already discussed in [21, 27]. Table 2 shows good agreement between the means of the parameters estimated by ULS and SBMC (except for SWH). Moreover, the STDs obtained with SBMC are smaller than those obtained with ULS, which is of great importance for many practical applications related to oceanography such as bathymetry.

Fig. 2. Estimated parameters using the ULS (red line) and SBMC (blue line) algorithms.

Table 2. Parameter means and STDs for real Jason-2 data (43000 echoes).

| Statistic | Algorithm | SWH (cm) | τ (cm) | $P_u$ |
|---|---|---|---|---|
| Mean | ULS | 237 | 14.67 | 167.81 |
| Mean | SBMC | 270 | 14.69 | 167.53 |
| STD | ULS | 53.4 | 10.41 | 5.9 |
| STD | SBMC | 4.3 | 5.22 | 4.82 |

6. CONCLUSIONS

This paper introduced a Bayesian model for the smooth estimation of parameters associated with nonlinear models. The proposed model considers an appropriate prior distribution enforcing a smooth temporal evolution of the parameters of interest. Due to the complexity of the resulting joint posterior distribution, an MCMC procedure (based on a hybrid Gibbs sampler) was investigated to sample the posterior of interest and to approximate the Bayesian estimators of the unknown parameters using the generated samples. The proposed SBMC algorithm showed good performance and improved the quality of the estimated parameters when applied to both synthetic and real altimetric signals. It was also shown to be robust to parameter outliers. Future work includes the consideration of optimization algorithms for solving the proposed nonlinear regression problem with a reduced computational cost.

7. REFERENCES

[1] B. W. Hapke, "Bidirectional reflectance spectroscopy. I. Theory," J. Geophys. Res., vol. 86, pp. 3039–3054, 1981.
[2] G. Brown, "The average impulse response of a rough surface and its applications," IEEE Trans. Antennas and Propagation, vol. 25, no. 1, pp. 67–74, Jan. 1977.
[3] A. Halimi, Y. Altmann, N. Dobigeon, and J.-Y. Tourneret, "Nonlinear unmixing of hyperspectral images using a generalized bilinear model," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 11, pp. 4153–4162, 2011.
[4] A. Halimi, C. Mailhes, J.-Y. Tourneret, P. Thibaut, and F. Boy, "Parameter estimation for peaky altimetric waveforms," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 3, pp. 1568–1577, March 2013.
[5] H. L. Van Trees, Detection, Estimation, and Modulation Theory: Part I, Wiley, New York, 1968.
[6] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Massachusetts, 1995.
[7] S. Amari and S. C. Douglas, "Why natural gradient?," in Proc. IEEE ICASSP-98, May 1998, vol. 2, pp. 1213–1216.
[8] S. Särkkä, Bayesian Filtering and Smoothing, Cambridge University Press, Cambridge, UK, 2013.
[9] S. Haykin, Kalman Filtering and Neural Networks, John Wiley & Sons, Inc., New York, USA, 2001.
[10] F. Gustafsson and G. Hendeby, "Some relations between extended and unscented Kalman filters," IEEE Trans. Signal Process., vol. 60, no. 2, pp. 545–555, Feb. 2012.
[11] P. Campisi and K. Egiazarian, Blind Image Deconvolution: Theory and Applications, Taylor & Francis, Boca Raton, FL, 2007.
[12] R. Molina, J. Mateos, and A. K. Katsaggelos, "Blind deconvolution using a variational approach to parameter, image, and blur estimation," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3715–3727, Dec. 2006.
[13] J. Sigurdsson, M. O. Ulfarsson, and J. R. Sveinsson, "Hyperspectral unmixing with lq regularization," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11, pp. 6793–6806, Nov. 2014.
[14] L. Chaari, T. Vincent, F. Forbes, M. Dojat, and P. Ciuciu, "Fast joint detection-estimation of evoked brain activity in event-related fMRI using a variational approach," IEEE Trans. Image Process., vol. 32, no. 5, pp. 821–837, May 2013.
[15] V. Mazet, "Joint Bayesian decomposition of a spectroscopic signal sequence," IEEE Signal Process. Lett., vol. 18, no. 3, pp. 181–184, March 2011.
[16] S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, Handbook of Markov Chain Monte Carlo, ser. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. Taylor & Francis, 2011.
[17] L. Amarouche, P. Thibaut, O. Z. Zanife, J.-P. Dumont, P. Vincent, and N. Steunou, "Improving the Jason-1 ground retracking to better account for attitude effects," Marine Geodesy, vol. 27, no. 1-2, pp. 171–197, Aug. 2004.
[18] A. Halimi, C. Mailhes, J.-Y. Tourneret, P. Thibaut, and F. Boy, "A semi-analytical model for delay/Doppler altimetry and its estimation algorithm," IEEE Trans. Geosci. Remote Sens., vol. 52, no. 7, pp. 4248–4258, July 2014.
[19] A. Halimi, C. Mailhes, J.-Y. Tourneret, T. Moreau, and F. Boy, "Including antenna mispointing in a semi-analytical model for delay/Doppler altimetry," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 2, pp. 598–608, Feb. 2015.
[20] S. Maus, C. M. Green, and J. D. Fairhead, "Improved ocean-geoid resolution from retracked ERS-1 satellite altimeter waveforms," Geophys. J. Int., vol. 134, no. 1, pp. 243–253, Feb. 1998.
[21] D. T. Sandwell and W. H. F. Smith, "Retracking ERS-1 altimeter waveforms for optimal gravity field recovery," Geophys. J. Int., vol. 163, no. 1, pp. 79–89, Oct. 2005.
[22] N. Dobigeon, S. Moussaoui, M. Coulon, J.-Y. Tourneret, and A. O. Hero, "Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery," IEEE Trans. Signal Process., vol. 57, no. 11, pp. 4355–4368, Nov. 2009.
[23] C. P. Robert, The Bayesian Choice: from Decision-Theoretic Motivations to Computational Implementation, Springer Texts in Statistics. Springer-Verlag, New York, 2nd edition, 2007.
[24] A. Halimi, N. Dobigeon, and J.-Y. Tourneret, "Unsupervised unmixing of hyperspectral images accounting for endmember variability," in ArXiv e-prints, Jun. 2014.
[25] C. Martin-Puig and G. Ruffini, "SAR altimeter retracker performance bound over water surfaces," in Proc. IEEE Int. Conf. Geosci. Remote Sens. (IGARSS), Cape Town, South Africa, July 12-17, 2009, pp. 449–452.
[26] O. Germain and G. Ruffini, "A revisit to the GNSS-R code range precision," in Proc. GNSS-R, Noordwijk, The Netherlands, June 14-15, 2006.
[27] A. Halimi, C. Mailhes, and J.-Y. Tourneret, "Cramér-Rao bounds and estimation algorithms for delay/Doppler and conventional altimetry," in Proc. EUSIPCO, Marrakech, Morocco, Sept. 9-13, 2013.
[28] C. Martin-Puig, P. Berry, R. Smith, C. Gommenginger, G. Ruffini, P. Cipollini, L. Stenseng, O. Andersen, P. D. Cotton, J. Benveniste, and S. Dinardo, "SAR altimetry over water surfaces," in Oceans from Space, Venice, Italy, April 2010.
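The Brown model (13) and the synthetic parameter trajectories of Section 4.2 can be illustrated with the minimal Python sketch below. It is not the authors' code. The instrument constants `T_RES`, `ALPHA` and `SIGMA_P` are assumed placeholder values (the paper only states that $\alpha$ and $\sigma_p$ are known), SWH is assumed to be in metres, samples are indexed $k = 1, \dots, K$, and the multiplicative gamma speckle in the hypothetical helper `add_speckle` is an assumption that the paper does not state.

```python
import math
import random

C = 299_792_458.0  # speed of light c (m/s)

# ASSUMED instrument constants, for illustration only (not given in the paper):
T_RES = 3.125e-9          # time resolution T (s); c*T/2 is about 0.47 m per sample
ALPHA = 2.0e6             # decay parameter alpha (1/s), order of magnitude only
SIGMA_P = 0.513 * T_RES   # width sigma_p of the point-target response (s)


def brown_echo(swh, tau, pu, K=128, alpha=ALPHA, sigma_p=SIGMA_P, T=T_RES):
    """Noiseless Brown echo (13) at k = 1..K; swh in metres (assumed), tau in samples."""
    sigma_c = math.sqrt((swh / (2.0 * C)) ** 2 + sigma_p ** 2)
    echo = []
    for k in range(1, K + 1):
        t = k * T - tau * T  # delay relative to the epoch (s)
        rise = 1.0 + math.erf((t - alpha * sigma_c ** 2) / (math.sqrt(2.0) * sigma_c))
        decay = math.exp(-alpha * (t - alpha * sigma_c ** 2 / 2.0))
        echo.append(0.5 * pu * rise * decay)
    return echo


def synthetic_parameters(m):
    """SWH, epoch tau and amplitude Pu of echo m, following Section 4.2."""
    swh = 2.5 + 2.0 * math.cos(0.07 * m)
    tau = 27.0 + 0.02 * m if m < 250 else 32.0 - 0.02 * m
    pu = 158.0 + 0.05 * math.sin(0.1 * m)
    return swh, tau, pu


def add_speckle(echo, L, rng):
    """ASSUMPTION: multiplicative gamma speckle with shape L and unit mean (L averaged echoes)."""
    return [s * rng.gammavariate(L, 1.0 / L) for s in echo]


if __name__ == "__main__":
    rng = random.Random(1)
    echoes = [add_speckle(brown_echo(*synthetic_parameters(m)), L=90, rng=rng)
              for m in range(500)]
    print(len(echoes), len(echoes[0]))
```

Running the module builds 500 speckled echoes of 128 samples. With the assumed `T_RES`, $cT/2 \approx 0.47$ m per sample, roughly in line with the "1 sample ≈ 46 cm" quoted for $\tau$.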
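The smoothness prior (4) and the inverse-gamma update of $\epsilon_i^2$ given in Section 3.3 take only a few lines once $D$ is fixed. The sketch below is illustrative and uses hypothetical function names. It assumes a replicated-end-point boundary rule for the discrete Laplacian, so that $D$ is $M \times M$, because the paper does not specify how $D$ treats the first and last signals. The draw relies on the identity that if $X \sim \mathrm{Gamma}(\text{shape } a, \text{scale } 1/b)$ then $1/X \sim \mathcal{IG}(a, b)$, using Python's `random.gammavariate(alpha, beta)`, whose second argument is a scale.

```python
import math
import random


def apply_D(theta):
    """Discrete Laplacian (second difference) of a length-M sequence.

    ASSUMPTION: end points are replicated, so D is M x M; the paper does not
    state which boundary rule it uses.
    """
    ext = [theta[0]] + list(theta) + [theta[-1]]
    return [ext[m] - 2.0 * ext[m + 1] + ext[m + 2] for m in range(len(theta))]


def log_smooth_prior(theta, eps2):
    """log f(theta_i | eps_i^2) from (4), up to an additive constant."""
    d = apply_D(theta)
    return -0.5 * len(theta) * math.log(eps2) - sum(x * x for x in d) / (2.0 * eps2)


def sample_eps2(theta, rng):
    """Gibbs draw eps_i^2 ~ IG(M/2, ||D theta_i||^2 / 2), as in Section 3.3."""
    b = 0.5 * sum(x * x for x in apply_D(theta))
    if b == 0.0:
        raise ValueError("||D theta||^2 is zero; the inverse-gamma conditional is degenerate")
    # If X ~ Gamma(shape a, scale 1/b) then 1/X ~ IG(a, b).
    return 1.0 / rng.gammavariate(len(theta) / 2.0, 1.0 / b)


if __name__ == "__main__":
    rng = random.Random(0)
    tau = [27.0 + 0.02 * m for m in range(500)]  # linear epoch trend of Section 4.2 (m < 250 branch)
    print(sample_eps2(tau, rng))
```

Under this boundary rule, a linear trend such as $\tau(m) = 27 + 0.02m$ has zero second differences in the interior, so only the two end rows contribute and $\|D\theta\|^2 = 2 \times 0.02^2 = 0.0008$. A rougher sequence yields a larger $\|D\theta\|^2$ and therefore a lower prior value (4).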
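The two closed-form noise updates of Section 3.2 can be sketched as follows: the Gaussian draw (10) for each noise mean $\mu_m$, and the inverse-gamma draw (12) for the variances shared by $r$ consecutive signals. This is an illustrative pure-Python version with hypothetical function names, not the authors' implementation. Signals are indexed from 0 here, so block $n$ groups signals $nr, \dots, nr + r - 1$.

```python
import math
import random


def sample_mu(y_m, s_m, var_m, rng):
    """Draw mu_m from the Gaussian conditional (10) (flat Jeffreys prior on mu)."""
    prec = sum(1.0 / v for v in var_m)  # sum_k sigma_mk^-2
    mean = sum((y - s) / v for y, s, v in zip(y_m, s_m, var_m)) / prec
    return rng.gauss(mean, math.sqrt(1.0 / prec))


def sample_variances(Y, S, mu, r, rng):
    """Draw the N x K variances from the inverse-gamma conditionals (12).

    Y and S are lists of M observed / noiseless signals (each of length K) and mu
    holds the M noise means; with 0-based indexing, block n groups signals
    n*r .. n*r + r - 1, which share one variance per sample index k.
    """
    M, K = len(Y), len(Y[0])
    if M % r:
        raise ValueError("M must be a multiple of r")
    variances = []
    for n in range(M // r):
        block = range(n * r, (n + 1) * r)
        row = []
        for k in range(K):
            # beta = sum over the block of x_mk^2 / 2, with x_mk = y_mk - s_mk - mu_m
            beta = 0.5 * sum((Y[m][k] - S[m][k] - mu[m]) ** 2 for m in block)
            # 1 / Gamma(shape r/2, scale 1/beta) ~ IG(r/2, beta)
            row.append(1.0 / rng.gammavariate(r / 2.0, 1.0 / beta))
        variances.append(row)
    return variances


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_mu([1.3] * 128, [1.0] * 128, [1.0] * 128, rng))
```

Both draws are exact, which is why only the parameters $\theta_i$ require an HMC move. With $r = 20$, as in Section 4.1, each variance is informed by 20 residuals, so the draws (12) are considerably less dispersed than they would be with $r = 1$.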
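The HMC move used to sample the conditional (9) can be sketched as a generic leapfrog HMC transition with an identity mass matrix. The paper does not give its step size, number of leapfrog steps or mass matrix, so these are free inputs here. Because the gradient of the Brown model is lengthy, the sketch substitutes a hypothetical toy model $s_k(\theta_m) = \exp(-\theta_m t_k)$ with one parameter per signal ($H = 1$). It also assumes a replicated-end-point Laplacian; for this choice, $D$ is symmetric, so $D^T D\theta = D(D\theta)$. In a full Gibbs sampler, $\mu$, $\Lambda$ and $\epsilon^2$ would be redrawn between HMC moves, whereas the demo below holds them fixed.

```python
import math
import random


# Hypothetical toy model (NOT the Brown model): one parameter per signal (H = 1),
# s_k(theta_m) = exp(-theta_m * t_k), chosen because its derivative is short.
def model(theta_m, t):
    return [math.exp(-theta_m * tk) for tk in t]


def model_deriv(theta_m, t):
    return [-tk * math.exp(-theta_m * tk) for tk in t]


def apply_D(v):
    """Discrete Laplacian with replicated end points (assumed); this D is symmetric."""
    ext = [v[0]] + list(v) + [v[-1]]
    return [ext[m] - 2.0 * ext[m + 1] + ext[m + 2] for m in range(len(v))]


def log_target(theta, Y, t, mu, var, eps2):
    """Log of the conditional (9) up to an additive constant."""
    total = 0.0
    for m, th in enumerate(theta):
        s = model(th, t)
        total -= sum((y - sk - mu[m]) ** 2 / (2.0 * v) for y, sk, v in zip(Y[m], s, var[m]))
    return total - sum(d * d for d in apply_D(theta)) / (2.0 * eps2)


def grad_log_target(theta, Y, t, mu, var, eps2):
    """Gradient of log_target; the prior term is -D^T D theta / eps2 = -D(D theta) / eps2."""
    prior = apply_D(apply_D(theta))
    grad = []
    for m, th in enumerate(theta):
        s, ds = model(th, t), model_deriv(th, t)
        lik = sum((y - sk - mu[m]) * dk / v for y, sk, dk, v in zip(Y[m], s, ds, var[m]))
        grad.append(lik - prior[m] / eps2)
    return grad


def hmc_step(q0, log_p, grad_log_p, step, n_leapfrog, rng):
    """One HMC transition (identity mass matrix, leapfrog integrator, Metropolis test)."""
    p0 = [rng.gauss(0.0, 1.0) for _ in q0]
    q = list(q0)
    p = [pi + 0.5 * step * gi for pi, gi in zip(p0, grad_log_p(q))]  # initial half step
    for j in range(n_leapfrog):
        q = [qi + step * pi for qi, pi in zip(q, p)]
        g = grad_log_p(q)
        scale = step if j < n_leapfrog - 1 else 0.5 * step  # final half step
        p = [pi + scale * gi for pi, gi in zip(p, g)]
    h0 = -log_p(q0) + 0.5 * sum(x * x for x in p0)
    h1 = -log_p(q) + 0.5 * sum(x * x for x in p)
    if math.log(1.0 - rng.random()) < h0 - h1:  # 1 - U lies in (0, 1]
        return q, True
    return list(q0), False


if __name__ == "__main__":
    rng = random.Random(0)
    M, K = 20, 16
    t = [k / K for k in range(1, K + 1)]
    truth = [1.0 + 0.5 * math.sin(0.2 * m) for m in range(M)]
    Y = [[sk + rng.gauss(0.0, 0.05) for sk in model(th, t)] for th in truth]
    mu, var, eps2 = [0.0] * M, [[0.05 ** 2] * K for _ in range(M)], 0.01
    lp = lambda q: log_target(q, Y, t, mu, var, eps2)
    glp = lambda q: grad_log_target(q, Y, t, mu, var, eps2)
    theta, accepted = [1.0] * M, 0
    for _ in range(100):
        theta, ok = hmc_step(theta, lp, glp, step=0.01, n_leapfrog=10, rng=rng)
        accepted += ok
    err = sum(abs(a - b) for a, b in zip(theta, truth)) / M
    print(f"acceptance rate {accepted / 100:.2f}, mean |theta - truth| {err:.3f}")
```

The whole vector $\theta_i$ of length $M$ is updated jointly, which is the situation where the mixing advantage of HMC over random-walk moves cited in Section 3 is expected. The step size and number of leapfrog steps would need tuning for the real model.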