Signal Processing Manuscript Draft

Manuscript Number: SIGPRO-D-14-00375

Title: Optimum Linear Regression in Additive Cauchy-Gaussian Noise

Article Type: Fast Communication

Keywords: Impulsive noise, Cauchy distribution, Gaussian distribution, mixture noise, Voigt profile, maximum likelihood estimator, pseudo-Voigt function, M-estimator

Abstract: In this paper, we study the estimation problem of linear regression in the presence of a new impulsive noise model, which is the sum of Cauchy and Gaussian random variables in the time domain. The probability density function (PDF) of this mixture noise, referred to as the Voigt profile, is derived from the convolution of the Cauchy and Gaussian PDFs. To determine the linear regression parameters, the maximum likelihood estimator is first developed. Since the Voigt profile suffers from a complicated analytical form, an M-estimator with the pseudo-Voigt function is also derived. In our algorithm development, both scenarios of known and unknown density parameters are considered. In the unknown scenario, the density parameters need to be estimated prior to applying the proposed estimators, by utilizing the empirical characteristic function and the characteristic function. Simulation results show that the performance of both proposed methods can attain the Cramér-Rao lower bound.

Highlights (for review):

- An additive mixture noise model is studied and the corresponding noise PDF, i.e., the Voigt function, is derived.

- To determine the parameters of a linear regression model, the maximum likelihood estimator (MLE) is developed, where both scenarios of known and unknown density parameters are considered.

- To reduce the computational complexity of the MLE, an M-estimator with the pseudo-Voigt function is presented.

- Both presented estimators attain the CRLB.


Optimum Linear Regression in Additive Cauchy-Gaussian Noise

Yuan Chen∗,1, Ercan Engin Kuruoglu2, Hing Cheung So1

1 Department of Electronic Engineering, City University of Hong Kong, Hong Kong SAR, China
2 ISTI-CNR (Italian National Council of Research), Pisa, Italy

Abstract: In this paper, we study the estimation problem of linear regression in the presence of a new impulsive noise model, which is the sum of Cauchy and Gaussian random variables in the time domain. The probability density function (PDF) of this mixture noise, referred to as the Voigt profile, is derived from the convolution of the Cauchy and Gaussian PDFs. To determine the linear regression parameters, the maximum likelihood estimator is first developed. Since the Voigt profile suffers from a complicated analytical form, an M-estimator with the pseudo-Voigt function is also derived. In our algorithm development, both scenarios of known and unknown density parameters are considered. In the unknown scenario, the density parameters need to be estimated prior to applying the proposed estimators, by utilizing the empirical characteristic function and the characteristic function. Simulation results show that the performance of both proposed methods can attain the Cramér-Rao lower bound.

Indexing terms: Impulsive noise, Cauchy distribution, Gaussian distribution, mixture noise, Voigt profile, maximum likelihood estimator, pseudo-Voigt function, M-estimator

1 Introduction

Impulsive noise is encountered in a variety of applications such as wireless communications, radar, sonar and image processing [1]. Unlike Gaussian noise, impulsive noise belongs to a family of heavy-tailed noise distributions. Popular models in the literature for impulsive noise are divided into two categories, namely, single processes and hybrid processes mixed in the probability density function (PDF) domain.
Typical single distributions are the Student's t-distribution [2], the α-stable process [3] and the generalized Gaussian (GG) process [4], while the mixture models include the Gaussian mixture (GM) [5] and the Cauchy-Gaussian mixture (CGM) [6]. Nevertheless, these models alone may not be able to represent all varieties of impulsive noise in the real world, such as the case where the measured noise is the sum of two separate time series: one is an intrinsic Gaussian noise due to the electronic devices in the receiver, and the other is environmental noise which can be non-Gaussian, in particular impulsive. For example, in some schemes in frequency-hopping spread spectrum (FH-SS) radio communication networks [7], binary transmission systems [8] and multiple-input multiple-output (MIMO) systems [9], the multiple access interference is modeled as α-stable distributed and the environmental noise as Gaussian distributed. Similarly, in astrophysical imaging [10], the cosmic microwave background radiation is contaminated with Gaussian noise from the satellite beam and α-stable distributed radiation

∗Corresponding Author (Email: [email protected]; Fax: (852) 2788 7791)

from galaxies and stars. In these potential applications, the disturbance components can be combined into a new mixture model which is the sum of two different random processes in the time domain.

To demonstrate the applicability of this model, we consider the linear regression problem and take the sum of a symmetric Cauchy distribution with dispersion γ and a zero-mean Gaussian distribution with variance σ² as an illustrative example. This mixture model belongs to Middleton's Class B [11], which is a classical impulsive noise model that has been employed for decades. The PDF of the mixture has an analytical form, known as the Voigt function [12], which is obtained via the convolution of the PDFs of these two processes. When the density parameters, namely, γ and σ², are known, the PDF of the mixture is readily determined, and the maximum likelihood estimator (MLE), which is a special case of the M-estimator, can be directly applied to find the parameters of interest. The class of M-estimators introduced by Huber [13] generalizes the MLE by replacing the negative logarithm of the likelihood function by an arbitrary ρ-function; that is, the MLE is recovered by letting ρ = −log(f(y)) with f(y) denoting the likelihood function. However, when γ and σ² are unknown, they should be estimated through the relationship between the empirical characteristic function (ECF) and the characteristic function (CF) prior to employing the MLE. Although the MLE has the best performance in the sense of attaining the Cramér-Rao lower bound (CRLB), it suffers from a highly complex analytical form because of the Faddeeva function that appears in the PDF of the mixture noise.
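Both the Cauchy-Gaussian convolution and the Faddeeva function mentioned above are available in standard numerical libraries. As a minimal sketch, assuming SciPy is available (our choice of tooling, not part of the paper), the mixture PDF can be evaluated and sanity-checked:

```python
import numpy as np
from scipy.special import wofz  # Faddeeva function w(z)

def voigt_pdf(e, gamma, sigma):
    """PDF of the sum of a zero-location Cauchy variable (dispersion gamma)
    and a zero-mean Gaussian variable (variance sigma^2):
    Re{w(z)} / (sigma*sqrt(2*pi)) with z = (e + i*gamma) / (sigma*sqrt(2))."""
    z = (e + 1j * gamma) / (sigma * np.sqrt(2.0))
    return wofz(z).real / (sigma * np.sqrt(2.0 * np.pi))

# Sanity check: the density integrates to ~1 (the heavy Cauchy tail makes
# the truncated integral converge slowly).
e = np.linspace(-500.0, 500.0, 500001)
p = voigt_pdf(e, gamma=1.0, sigma=1.0)
area = np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(e))  # trapezoidal rule
```

For γ → 0 the expression reduces to the Gaussian PDF and for σ → 0 to the Cauchy PDF, which provides a quick consistency check of the implementation.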
Therefore, in order to keep the high accuracy of the MLE and to reduce the computational complexity, a new M-estimator with the loss function chosen as the negative logarithm of the pseudo-Voigt function is employed, which is referred to as the MEPV.

The rest of this paper is organized as follows. The proposed methods, namely, the MLE and MEPV, are presented in Section 2, where both cases of known and unknown density parameters are investigated. The CRLB is derived in Section 3. Computer simulations are provided in Section 4 to evaluate the accuracy and complexity of the MLE and MEPV. Finally, conclusions are drawn in Section 5.

2 Proposed Algorithms

Without loss of generality, the observed data vector y = [y_1 ⋯ y_N]^T is modeled as:

    y_n = s_n(θ) + e_n,  n = 1, 2, ..., N,  (1)

where s_n(θ) denotes the noise-free signal with θ being the parameter vector of interest, and e_n = p_n + q_n is the mixture noise which is the sum of two independent and identically distributed (i.i.d.) processes p_n and q_n, whose PDFs are f_P and f_Q, respectively.

The PDF of e_n can be obtained from the convolution of f_P and f_Q:

    f_E = f_P ∗ f_Q,  (2)

where ∗ stands for the convolution operator.

Considering the simplest case of the linear regression model, i.e., s_n(θ) = s_n([A B]^T) = An + B, where A and B are the unknown parameters, the data model can be rewritten in vector form as:

    y = Hθ + e,  (3)

where

    H = [1 1; 2 1; ⋯ ; N 1]  (the nth row of H is h_n = [n 1]),  θ = [A B]^T,  (4)

and e = [e_1 ⋯ e_N]^T with e_n = c_n + g_n denoting the additive Cauchy-Gaussian (ACG) noise, which is the sum of i.i.d. Cauchy noise c_n with dispersion γ and i.i.d. zero-mean Gaussian noise g_n with variance σ². Although we only study this simple model, our analysis can be extended to the general linear data model [14], that is, H ∈ R^{N×M} where N ≥ M is known and θ ∈ R^M is unknown. It is noteworthy that (3)-(4) are also a common signal model for kick detection in oil drilling [15]. The PDFs of the Cauchy and Gaussian distributions are:

    f_C(c_n; γ) = γ / (π(c_n² + γ²)),  (5)

    f_G(g_n; σ²) = (1/(√(2π)σ)) exp(−g_n²/(2σ²)).  (6)

Then the PDF of e_n is calculated based on (2):

    f_E(e_n; γ, σ²) = ∫_{−∞}^{∞} [γ/(π((e_n − τ)² + γ²))] · [1/(√(2π)σ)] e^{−τ²/(2σ²)} dτ.  (7)

The result of (7) is called the Voigt function, which can be represented as [12]

    f_E(e_n; γ, σ²) = Re{w} / (σ√(2π)),  (8)

where

    w = exp(−((e_n + iγ)/(σ√2))²) (1 + (2i/√π) ∫_0^{(e_n + iγ)/(σ√2)} exp(t²) dt)  (9)

is called the Faddeeva function, with Re{·} denoting the real part.

To estimate the parameter vector θ, we can utilize the M-estimator [13], whose cost function is:

    J(θ) = Σ_{n=1}^{N} ρ_n = −Σ_{n=1}^{N} log(f(y_n, θ; γ, σ²)),  (10)

where ρ_n is an arbitrary function [13]. Note that the M-estimator coincides with the MLE when f(y_n, θ; γ, σ²) is the ACG PDF f_E(y_n, θ; γ, σ²). In the following, we introduce two choices of this function, namely, the Voigt function and its approximation, which is referred to as the pseudo-Voigt function.

2.1 Maximum Likelihood Estimator

We first use the MLE to find the unknown parameters, assuming the scenario of known γ and σ². The study is then extended to the case of unknown density parameters.

In the first scenario, the PDF of the mixture noise is known and the PDF of y is:

    f_E(y, θ; γ, σ²) = Π_{n=1}^{N} Re{w_n} / (σ√(2π)),  (11)

where

    w_n = exp(−((y_n − h_nθ + iγ)/(σ√2))²) (1 + (2i/√π) ∫_0^{(y_n − h_nθ + iγ)/(σ√2)} exp(t²) dt)  (12)

with h_n = [n 1].

In the case that γ and σ² are unknown, the PDF cannot be constructed directly; therefore, γ and σ² are first estimated using the ECF. For the ACG noise, the CF of the observed data y_n is:

    φ(t) = E{exp(iy_n t)} = exp(it(An + B) − γ|t| − (t²/2)σ²),  (13)

where E{·} stands for expectation, and the magnitude of φ(t) is:

    |φ(t)| = exp(−γ|t| − (t²/2)σ²).  (14)

Taking the logarithm on both sides of (14) yields:

    Φ(t) = −log(|φ(t)|) = γ|t| + (t²/2)σ².  (15)

On the other hand, the ECF, denoted by ψ(t), is

    ψ(t) = (1/N) Σ_{n=1}^{N} e^{iy_n t}.  (16)

The error distribution between the ECF and CF is unknown; here, an ℓ1-norm estimator is employed. Let Ψ(t) = −log(|ψ(t)|); then γ and σ² can be estimated if t is chosen on a grid t ∈ [t_1, t_K] [16]:

    x̂ = [γ̂ σ̂²]^T = arg min_{γ̃, σ̃²} ||Ψ − Fx||_1,  (17)

where

    F = [|t_1| t_1²/2; |t_2| t_2²/2; ⋯ ; |t_K| t_K²/2],  x = [γ σ²]^T,  Ψ = [Ψ(t_1) ⋯ Ψ(t_K)]^T.  (18)

Since (17) is not differentiable, the subgradient method [17] is employed to update x̂:

    x̂^{(ℓ+1)} = x̂^{(ℓ)} − η_ℓ g^{(ℓ)},  (19)

where (ℓ) stands for the ℓth iteration, g^{(ℓ)} = −F^T sign(Ψ − Fx̂^{(ℓ)}) and η_ℓ = 1/||g^{(ℓ)}||_2. We employ the least squares solution of ||Ψ − Fx||_2² as x̂^{(0)} and update (19) until the relative error ||x̂^{(ℓ+1)} − x̂^{(ℓ)}|| / ||x̂^{(ℓ+1)}|| < ε is reached, where ε > 0 is the tolerance. After γ and σ² have been estimated by (19), the PDF of y, namely, f_E(y, θ; γ̂, σ̂²), is calculated by (11).

The MLE of θ is the minimizer of the cost function in (10):

    θ̂ = arg min_{θ̃} J_1(θ),  (20)

where J_1(θ) = −Σ_{n=1}^{N} log(Re{w_n}).

The MLE cost function in (20) is multimodal for the linear regression model. In our study, the estimator is realized by Newton's method, which is a local search algorithm with a quadratic rate of convergence [18]; as a result, global convergence depends on the initialization.

The update procedure of Newton's method is:

    θ̂^{(ℓ+1)} = θ̂^{(ℓ)} − (∇²J_1(θ̂^{(ℓ)}))^{−1} ∇J_1(θ̂^{(ℓ)}),  (21)

where

    ∇J_1(θ) = [Σ_{n=1}^{N} n v_n;  Σ_{n=1}^{N} v_n],  (22)

    ∇²J_1(θ) = [Σ_{n=1}^{N} n²(u_n − v_n²)   Σ_{n=1}^{N} n(u_n − v_n²);
                Σ_{n=1}^{N} n(u_n − v_n²)    Σ_{n=1}^{N} (u_n − v_n²)]  (23)

with

    v_n = (1/σ²) (1/Re{w_n}) Re{(y_n − h_nθ + iγ) w_n},  (24)

    u_n = (1/σ²) (1/Re{w_n}) Re{(2((y_n − h_nθ + iγ)/(σ√2))² − 1) w_n + 2γ/(√(2π)σ)}.  (25)
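Putting Section 2.1 together, the sketch below generates ACG-contaminated data, estimates the density parameters from the ECF of robustly detrended residuals, and then minimizes J_1(θ) numerically. It is a hedged illustration only: SciPy is assumed, the ℓ1 subgradient iteration (19) is replaced by an ordinary least-squares fit of (15), the median-based detrending is our own choice, and a derivative-free simplex search stands in for the Newton iteration (21)-(25).

```python
import numpy as np
from scipy.special import wofz
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic linear-regression data y_n = A*n + B + e_n, cf. eqs. (1)-(3).
N, A, B, gamma, sigma = 5000, 1.0, 0.5, 1.0, 1.0
n = np.arange(1, N + 1)
y = A * n + B + gamma * rng.standard_cauchy(N) + sigma * rng.standard_normal(N)

# Robust detrending (median of lagged slopes), so that the magnitude of the
# residuals' ECF estimates |phi(t)| of the noise alone, cf. eqs. (14)-(16).
m = N // 2
a0 = np.median((y[m:] - y[:-m]) / m)
b0 = np.median(y - a0 * n)
r = y - a0 * n - b0

# Fit Psi(t) = gamma*|t| + (sigma^2/2)*t^2, eqs. (15)-(18); plain least
# squares here instead of the paper's l1/subgradient iteration (19).
t = np.linspace(0.1, 1.0, 1000)
Psi = -np.log(np.abs(np.exp(1j * np.outer(t, r)).mean(axis=1)))
F = np.column_stack((np.abs(t), 0.5 * t**2))
gamma_hat, sigma2_hat = np.linalg.lstsq(F, Psi, rcond=None)[0]

# MLE: minimize J1(theta) = -sum_n log Re{w_n}, eqs. (10)-(12) and (20),
# with a simplex search standing in for the Newton recursion (21)-(25).
def J1(theta):
    res = y - theta[0] * n - theta[1]
    z = (res + 1j * gamma_hat) / np.sqrt(2.0 * sigma2_hat)
    return -np.sum(np.log(wofz(z).real))

theta_hat = minimize(J1, x0=[a0, b0], method="Nelder-Mead").x
```

In our runs the estimates land close to the true values (γ, σ², A, B) = (1, 1, 1, 0.5); starting the local search from a robust (median-based) initial point matters because the cost in (20) is multimodal.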

In this study, the weighted median method is utilized for algorithm initialization, that is, θ̂^{(0)} = MED{y} [19], and the stopping criterion follows that of (19). According to our simulation results, the MLE is able to find the global solution with this initialization.

2.2 M-estimator with Pseudo-Voigt Function

Although the MLE is maximally efficient, in the sense that its variance asymptotically achieves the CRLB, it suffers from high computational complexity because of the integral in the Faddeeva function, i.e., in the likelihood function. To reduce the computational cost, we choose ρ_n as the negative logarithm of the pseudo-Voigt function; the scenarios of known and unknown density parameters are again both discussed.

When σ² and γ are known, it has been proved that the Voigt function can be approximated by a weighted sum of Cauchy and Gaussian shaped functions, which is called the pseudo-Voigt function [20]:

    f(e_n; γ, σ²) = μ_a f_1(e_n; γ) + (1 − μ_a) f_2(e_n; σ²),  (26)

where f_1 and f_2 are Cauchy and Gaussian shaped functions, respectively, which are different from f_C and f_G:

    f_1(e_n; γ) = C_a ξ_a / (√π (e_n² + ξ_a²)),  (27)

    f_2(e_n; σ²) = (C_a/(√π ξ_a)) exp(−log(2) (e_n/ξ_a)²)  (28)

with

    μ_a = (C_a − √(log(2)/π)) / (C_a (1 − √(π log(2)))),  C_a = b_{1/2}(a) e^{a²} (1 − erf(a)),  ξ_a = √2 σ b_{1/2}(a),  (29)

    b_{1/2}(a) = a + √(log(2)) exp(−0.6055a + 0.0718a² − 0.0049a³ + 0.000136a⁴)  (30)

and a = γ/(√2σ), with erf(·) denoting the error function.

The model in (26) is different from Swami's model [6]. In that paper, the density parameters are γ and σ², which appear explicitly in the PDF model, whereas the density parameter in (26) is ξ_a, which is a nonlinear function of γ and σ². That is, in Swami's work the problem is to minimize a function of the form g(γ, σ²), while here we deal with g(h(γ, σ²)), which is more complicated.

In the unknown density parameter scenario, similar to the MLE, we first estimate the density parameters γ and σ² by (19). After γ̂ and σ̂² are obtained, f(y, θ; γ̂, σ̂²) can be constructed.

Then θ is estimated by minimizing the cost function according to (10):

    θ̂ = arg min_{θ̃} J_2(θ),  (31)

where J_2(θ) = −Σ_{n=1}^{N} log(f(y_n, θ; γ, σ²)) with

    f(y_n, θ; γ, σ²) = μ_a f_1(y_n, θ; γ) + (1 − μ_a) f_2(y_n, θ; σ²)  (32)
                    = μ_a C_a ξ_a / (√π ((y_n − h_nθ)² + ξ_a²)) + (1 − μ_a) (C_a/(√π ξ_a)) exp(−log(2) ((y_n − h_nθ)/ξ_a)²).  (33)

To find the minimum of (31), we apply Newton's method:

    θ̂^{(ℓ+1)} = θ̂^{(ℓ)} − (∇²J_2(θ̂^{(ℓ)}))^{−1} ∇J_2(θ̂^{(ℓ)}),  (34)

where

    ∇J_2(θ) = [Σ_{n=1}^{N} n V_n;  Σ_{n=1}^{N} V_n],  (35)

    ∇²J_2(θ) = [Σ_{n=1}^{N} n²(U_n + W_n − V_n²)   Σ_{n=1}^{N} n(U_n + W_n − V_n²);
                Σ_{n=1}^{N} n(U_n + W_n − V_n²)    Σ_{n=1}^{N} (U_n + W_n − V_n²)]  (36)

with

    V_n = (2/f(y_n, θ; γ, σ²)) [μ_a ((y_n − h_nθ)/((y_n − h_nθ)² + ξ_a²)) f_1(y_n, θ; γ) + (1 − μ_a) (log(2)(y_n − h_nθ)/ξ_a²) f_2(y_n, θ; σ²)],  (37)

    U_n = (2/f(y_n, θ; γ, σ²)) [μ_a (((y_n − h_nθ)² − ξ_a²)/((y_n − h_nθ)² + ξ_a²)²) f_1(y_n, θ; γ) − (1 − μ_a) (log(2)/ξ_a²) f_2(y_n, θ; σ²)],  (38)

    W_n = (4/f(y_n, θ; γ, σ²)) [μ_a ((y_n − h_nθ)/((y_n − h_nθ)² + ξ_a²))² f_1(y_n, θ; γ) + (1 − μ_a) (log(2)(y_n − h_nθ)/ξ_a²)² f_2(y_n, θ; σ²)].  (39)

In this method, θ̂ is updated by (34), and the initialization and stopping criterion are the same as for the MLE in (21).
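The accuracy of a pseudo-Voigt approximation of the form (26)-(28) can be checked numerically. As an illustration (not the paper's construction), the sketch below fits the weight μ and the common half-width ξ by least squares against SciPy's exact voigt_profile, rather than evaluating the closed-form coefficients (29)-(30):

```python
import numpy as np
from scipy.special import voigt_profile
from scipy.optimize import curve_fit

gamma, sigma = 1.0, 1.0  # illustrative density parameters

def pseudo_voigt(e, mu, xi):
    """mu * Lorentzian + (1 - mu) * Gaussian, both normalized and sharing
    the half-width xi, mirroring the shapes of (27)-(28)."""
    lorentz = xi / (np.pi * (e**2 + xi**2))
    gauss = np.sqrt(np.log(2.0) / np.pi) / xi * np.exp(-np.log(2.0) * (e / xi)**2)
    return mu * lorentz + (1.0 - mu) * gauss

e = np.linspace(-30.0, 30.0, 2001)
exact = voigt_profile(e, sigma, gamma)  # note the (x, sigma, gamma) argument order
(mu_hat, xi_hat), _ = curve_fit(pseudo_voigt, e, exact,
                                p0=[0.5, gamma + sigma],
                                bounds=([0.0, 1e-6], [1.0, np.inf]))
rel_err = np.max(np.abs(pseudo_voigt(e, mu_hat, xi_hat) - exact)) / exact.max()
```

In line with [20], such a two-parameter fit stays within a few percent of the peak value while avoiding the Faddeeva integral entirely, which is exactly the saving exploited by the MEPV.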

2.3 Complexity of the Proposed Methods

The computational complexities of the proposed methods are roughly examined as follows. The Faddeeva function in the PDF can be realized according to [21]. At each iteration of Newton's method, the numbers of flops required by the MLE and MEPV are O(N³ + N² log N) and O(N), respectively.

3 Cramér-Rao Lower Bound

In the scenario that the density parameters are known, the CRLB [14] of θ̂ can be calculated from the diagonal elements of the inverse of the Fisher information matrix I:

    I = −E{Σ_{n=1}^{N} ∂² log f_E(y_n, θ; γ, σ²)/∂θ∂θ^T} = E{Σ_{n=1}^{N} (∂ log f_E(y_n, θ; γ, σ²)/∂θ)(∂ log f_E(y_n, θ; γ, σ²)/∂θ)^T}.  (40)

Based on (11), we have

    ∂ log f_E(y_n, θ; γ, σ²)/∂θ = (1/σ²) [n Re{(y_n − An − B + iγ)w_n}/Re{w_n};  Re{(y_n − An − B + iγ)w_n}/Re{w_n}].  (41)

It is hard to derive a closed-form expression for the expectation of (41); therefore, we replace the expectation by the average over a sufficient number of independent runs.

For unknown γ and σ², the parameter vector is α = [A B γ σ²]^T. The CRLB of θ̂ in this case corresponds to the (1,1) and (2,2) entries of the inverse of I, whose (k,l) element is written as:

    I_{k,l} = E{Σ_{n=1}^{N} (∂ log f_E(y_n, θ; γ, σ²)/∂α_k)(∂ log f_E(y_n, θ; γ, σ²)/∂α_l)},  k, l = 1, 2, 3, 4,  (42)

where

    ∂ log f_E(y_n, θ; γ, σ²)/∂α =
    [ (n/σ²) Re{(y_n − An − B + iγ)w_n}/Re{w_n};
      (1/σ²) Re{(y_n − An − B + iγ)w_n}/Re{w_n};
      −((1/σ²) Re{i(y_n − An − B + iγ)w_n} + 2/(√(2π)σ)) / Re{w_n};
      (1/(2σ²)) (((1/σ²) Re{(y_n − An − B + iγ)² w_n} + 2γ/(√(2π)σ)) / Re{w_n} − 1) ].  (43)

4 Simulation Results

To evaluate the performance of the MLE and MEPV, computer simulations have been conducted. The mean square errors (MSEs), E{(Â − A)²} and E{(B̂ − B)²}, are employed as the performance measure. The signal is generated according to (3) with A = 1 and B = 0.5, and the noise e_n is generated as the sum of i.i.d. Cauchy samples with dispersion γ and i.i.d. zero-mean Gaussian samples with variance σ². Following the setup in [22], the interval of t in (18) is [0.1, 1] with 1000 uniform grid points. Under such mixture noise, the signal-to-noise ratio is hard to define; therefore, we fix σ² = 10 and vary σ²/γ to produce different noise conditions. Comparisons with the ℓ1-norm estimator solved by least absolute deviation [23], the MM-estimator with breakdown point 0.85 [24] and the CRLB are provided. The ℓ1-norm estimator is included because it is a robust and suboptimum estimator for Cauchy noise, while the MM-estimator, which is a double-stage M-estimator, is considered since this estimator

is robust for the linear regression model. All results are based on 1500 independent Monte Carlo runs with a data length of N = 60.

First, we examine the difference between the Voigt and pseudo-Voigt functions. The dispersion γ is set to 10, and Figure 1 shows the comparison on a logarithmic scale. This approximation has been studied in detail in [20], and our experiments also verify a good match between the two curves. The closeness of the match depends on the ratio σ²/γ. For the normalized curves in Figure 1, the mismatch measured by the area between the two curves can be less than 0.5%, which agrees with the analysis in [20].

Next, we examine the scenario of known density parameters. It is seen in Figures 2 and 3 that when γ and σ² are available, the MSEs of both the MLE and MEPV attain the CRLB for σ²/γ ∈ [0, 30] dB. The performance of the MLE and MEPV is almost the same because they optimize two highly similar functions. Furthermore, both are superior to the ℓ1-norm estimator and MM-estimator in the case of ACG noise. Figure 4 shows the average computational cost versus the data length N for γ = 10, measured with a stopwatch timer. It is demonstrated that the complexity of the MEPV is significantly lower than that of the MLE. The computational cost of the MLE increases rapidly with N, while for the MEPV it grows approximately sublinearly for small N (N ∈ [50, 2550]) and linearly for large N (N > 2550), since fewer iterations are required to converge.

Finally, we investigate the scenario of unknown γ and σ². The ECF is first employed to estimate the density parameters γ and σ². Figures 5 and 6 show that the MSEs of the MLE and MEPV achieve the CRLB and that they significantly outperform the ℓ1-norm estimator and MM-estimator.

5 Conclusions

We investigate the ACG process in this paper, which is modeled as the sum of Cauchy and Gaussian variables in the time domain. The PDF of the ACG noise, known as the Voigt profile, is calculated by the convolution of the PDFs of these two components. To estimate the parameters of a linear regression in the ACG noise environment, the MLE and MEPV are developed for both the known and unknown density parameter cases. Computer simulation results show that the MSE performance of both estimators achieves the CRLB for σ²/γ ∈ [0, 30] dB and is superior to the ℓ1-norm and MM-estimators. It is also seen that the MEPV has a much lower computational cost than the MLE, which suffers from the complicated analytical form of its PDF. Moreover, this study of the ACG process can be extended to autoregressive models and nonlinear parameter estimation problems.

References

[1] A.M. Zoubir, V. Koivunen, Y. Chakhchoukh and M. Muma, "Robust estimation in signal processing: A tutorial-style treatment of fundamental concepts," IEEE Signal Processing Magazine, vol. 29, no. 4, pp. 61-80, Jul. 2012.

[2] J. Pfanzagl and O. Sheynin, "Studies in the history of probability and statistics XLIV: A forerunner of the t-distribution," Biometrika, vol. 83, no. 4, pp. 891-898, Apr. 1996.

[3] C.L. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications, John Wiley & Sons, New York, 1995.

[4] J.J. Shynk, Probability, Random Variables, and Random Processes: Theory and Signal Processing Applications, Hoboken, NJ: Wiley, 2013.

[5] D.A. Reynolds, "Gaussian mixture models," Encyclopedia of Biometrics, pp. 659-663, 2009.

[6] A. Swami, "Non-Gaussian mixture models for detection and estimation in heavy tailed noise," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, Jun. 2000, vol. 6, pp. 3802-3805.

[7] J. Ilow, D. Hatzinakos and A.N. Venetsanopoulos, "Performance of FH SS radio networks with interference modeled as a mixture of Gaussian and alpha-stable noise," IEEE Transactions on Communications, vol. 46, no. 4, pp. 509-520, Apr. 1998.

[8] S. Ambike, J. Ilow and D. Hatzinakos, "Detection for binary transmission in a mixture of Gaussian noise and impulsive noise modeled as an alpha-stable process," IEEE Signal Processing Letters, vol. 1, no. 3, pp. 55-57, Mar. 1994.

[9] A.X. Li, Y.Z. Wang, W.Y. Xu and Z.C. Zhou, "Receiver design of MIMO systems in a mixture of Gaussian noise and impulsive noise," in Proceedings of the 60th IEEE Vehicular Technology Conference, Los Angeles, CA, Sept. 2004, vol. 24, pp. 1493-1497.

[10] D. Herranz, E.E. Kuruoglu and L. Toffolatti, "An α-stable approach to the study of the P(D) distribution of unresolved point sources in CMB sky maps," Astronomy and Astrophysics, vol. 424, no. 3, pp. 1081-1096, 2004.

[11] Y. Kim and G.T.
Zhou, "The Middleton class B model and its mixture representation," Center for Signal and Image Processing, Technical Report CSIP TR-98-01, Georgia Institute of Technology, Atlanta, GA 30332-0250, May 1998.

[12] F.W.J. Olver, D.M. Lozier and R.F. Boisvert, NIST Handbook of Mathematical Functions, Cambridge University Press, Cambridge, pp. 167-168, 2010.

[13] P.J. Huber, Robust Statistics, 2nd ed., New York: Wiley, 2009.

[14] S.M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Englewood Cliffs, NJ: Prentice-Hall, 1993.

[15] T. Burgess, A.A. Starkey and D. White, "Improvements for kick detection," Oil Review, vol. 2, no. 1, pp. 43-51, 1990.

[16] R. Brcich and A.M. Zoubir, "Estimation and detection in a mixture of symmetric alpha stable and Gaussian interference," in Proceedings of the IEEE Signal Processing Workshop on Higher-Order Statistics, Caesarea, Jun. 1999, pp. 219-223.

[17] S. Boyd and L. Vandenberghe, Convex Optimization, New York: Cambridge University Press, 2004.

[18] J. Nocedal and S.J. Wright, Numerical Optimization, 2nd ed., New York: Springer, 2006.

[19] R.W. Hawley and N.C. Gallagher Jr., "On Edgeworth's method for minimum absolute error linear regression," IEEE Transactions on Signal Processing, vol. 42, no. 8, pp. 2045-2054, Aug. 1994.

[20] H.O. Dirocco and A. Cruzado, "The Voigt profile as a sum of a Gaussian and a Lorentzian functions, when the coefficient depends only on the widths ratio," Acta Physica Polonica A, vol. 122, no. 4, pp. 666-669, 2012.

[21] J.A.C. Weideman, "Computation of the complex error function," SIAM Journal on Numerical Analysis, vol. 31, no. 5, pp. 1497-1518, Oct. 1994.

[22] S.M. Kogon and D.B. Williams, "On the characterization of impulsive noise with α-stable distributions using Fourier techniques," in Proceedings of the 29th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 1995, vol. 2, pp. 787-791.

[23] Y. Li and G. Arce, "A maximum likelihood approach to least absolute deviation regression," EURASIP Journal on Applied Signal Processing, no. 12, pp. 1762-1769, Sept. 2004.

[24] C. Croux, G. Dhaene and D. Hoorelbeke, "Robust standard errors for robust estimators," DTEW Research Report 0367, K.U. Leuven, Jan. 2004.

Figure 1: Comparison between the Voigt and pseudo-Voigt functions (f(e_n) in dB versus e_n)

Figure 2: Mean square error of A versus σ²/γ (dB) with known γ and σ²

Figure 3: Mean square error of B versus σ²/γ (dB) with known γ and σ²

Figure 4: Computational complexity (time in seconds) versus N

Figure 5: Mean square error of A versus σ²/γ (dB) with unknown γ and σ²

Figure 6: Mean square error of B versus σ²/γ (dB) with unknown γ and σ²
