
Hindawi Publishing Corporation EURASIP Journal on Advances in Signal Processing Volume 2010, Article ID 312989, 19 pages doi:10.1155/2010/312989

Research Article A Generalized Cauchy Distribution Framework for Problems Requiring Robust Behavior

Rafael E. Carrillo, Tuncer C. Aysal (EURASIP Member), and Kenneth E. Barner

Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA

Correspondence should be addressed to Rafael E. Carrillo, [email protected]

Received 8 February 2010; Revised 27 May 2010; Accepted 7 August 2010

Academic Editor: Igor Djurović

Copyright © 2010 Rafael E. Carrillo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Statistical modeling is at the heart of many engineering problems. The importance of statistical modeling emanates not only from the desire to accurately characterize stochastic events, but also from the fact that distributions are the central models utilized to derive sample processing theories and methods. The generalized Cauchy distribution (GCD) family has a closed-form pdf expression across the whole family, as well as algebraic tails, which makes it suitable for modeling many real-life impulsive processes. This paper develops a GCD theory-based approach that allows challenging problems to be formulated in a robust fashion. Notably, the proposed framework subsumes generalized Gaussian distribution (GGD) family-based developments, thereby guaranteeing performance improvements over traditional problem formulation techniques. This robust framework can be adapted to a variety of applications in signal processing. As examples, we formulate four practical applications under this framework: (1) filtering for power line communications, (2) estimation in sensor networks with noisy channels, (3) reconstruction methods for compressed sensing, and (4) fuzzy clustering.

1. Introduction

Traditional signal processing and communications methods are dominated by three simplifying assumptions: (1) the systems under consideration are linear; the signal and noise processes are (2) stationary and (3) Gaussian distributed. Although these assumptions are valid in some applications and have significantly reduced the complexity of the techniques developed, over the last three decades practitioners in various branches of statistics, signal processing, and communications have become increasingly aware of the limitations these assumptions pose in addressing many real-world applications. In particular, it has been observed that the Gaussian distribution is too light-tailed to model signals and noise that exhibit impulsive and nonsymmetric characteristics [1]. A broad spectrum of applications exists in which such processes emerge, including wireless communications, teletraffic, hydrology, geology, atmospheric noise compensation, economics, and image and video processing (see [2, 3] and references therein). The need to describe impulsive data, coupled with computational advances that enable processing of models more complicated than the Gaussian distribution, has thus led to the recent dynamic interest in heavy-tailed models.

Robust statistics—the theory of robust statistical procedures—systematically investigates how deviations from modeling assumptions affect estimators [4]. Maximum likelihood (ML) type estimators (or, more generally, M-estimators) developed in the theory of robust statistics are of great importance in robust signal processing techniques [5]. M-estimators can be described by a cost function-defined optimization problem or by its first derivative, the latter yielding an implicit equation (or set of equations) that is proportional to the influence function. In the location estimation case, the properties of the influence function describe the estimator robustness [4]. Notably, ML location estimation forms a special case of M-estimation, with the observations taken to be independent and identically distributed and the cost function set proportional to the logarithm of the common density function.

To address as wide an array of problems as possible, modeling and processing theories tend to be based on density families that exhibit a broad range of characteristics.

Signal processing methods derived from the generalized Gaussian distribution (GGD), for instance, are popular in the literature and include works addressing heavy-tailed processes [2, 3, 6–8]. The GGD is a family of closed form densities, with varying tail parameter, that effectively characterizes many signal environments. Moreover, the closed form nature of the GGD yields a rich set of distribution optimal error norms (L1, L2, and Lp), and estimation and filtering theories, for example, linear filtering, weighted median filtering, fractional lower order moment (FLOM) operators, and so forth [3, 6, 9–11]. However, a limitation of the GGD model is the tail decay rate—GGD distribution tails decay exponentially rather than algebraically. Such light tails do not accurately model the prevalence of outliers and impulsive samples common in many of today's most challenging statistical signal processing and communications problems [3, 12, 13].

As an alternative to the GGD, the α-stable density family has gained recent popularity in addressing heavy-tailed problems. Indeed, symmetric α-stable processes exhibit algebraic tails and, in some cases, can be justified from first principles (Generalized Central Limit Theorem) [14–16]. The index of stability parameter, α ∈ (0, 2], provides flexibility in impulsiveness modeling, with distributions ranging from light-tailed Gaussian (α = 2) to extremely impulsive (α → 0). With the exception of the limiting Gaussian case, α-stable distributions are heavy-tailed with infinite variance and algebraic tails. Unfortunately, the Cauchy distribution (α = 1) is the only algebraic-tailed α-stable distribution that possesses a closed form expression, limiting the flexibility and performance of methods derived from this family of distributions. That is, the single distribution Cauchy methods (Lorentzian norm, weighted myriad) are the most commonly employed α-stable family operators [12, 17–19].

The Cauchy distribution, while intersecting the α-stable family at a single point, is generalized by the introduction of a varying tail parameter, thereby forming the generalized Cauchy density (GCD) family. The GCD has a closed form pdf across the whole family, as well as algebraic tails that make it suitable for modeling real-life impulsive processes [20, 21]. Thus the GCD combines the advantages of the GGD and α-stable distributions in that it possesses (1) heavy, algebraic tails (like α-stable distributions) and (2) closed form expressions (like the GGD) across a flexible family of densities defined by a tail parameter, p ∈ (0, 2]. Previous GCD family development focused on the particular p = 2 (Cauchy distribution) and p = 1 (meridian distribution) cases, which lead to the myriad and meridian estimators [13, 22], respectively. (It should be noted that the original authors derived the myriad filter starting from α-stable distributions, noting that there are only two closed-form expressions for α-stable distributions [12, 17, 18].) These estimators provide a robust framework for heavy-tail signal processing problems.

In yet another approach, the generalized-t model is shown to provide excellent fits to different types of atmospheric noise [23]. Indeed, Hall introduced the family of generalized-t distributions in 1966 as an empirical model for atmospheric radio noise [24]. The distribution possesses algebraic tails and a closed form pdf. Like the α-stable family, the generalized-t model contains the Gaussian and the Cauchy distributions as special cases, depending on the degrees of freedom parameter. It is shown in [18] that the myriad estimator is also optimal for the generalized-t family of distributions. Thus we focus on the GCD family of operators, as their performance also subsumes that of generalized-t approaches.

In this paper, we develop a GCD-based theoretical approach that allows challenging problems to be formulated in a robust fashion. Within this framework, we establish a statistical relationship between the GGD and GCD families. The proposed framework subsumes GGD-based developments (e.g., least squares, least absolute deviation, FLOM, Lp norms, k-means clustering, etc.), thereby guaranteeing performance improvements over traditional problem formulation techniques. The developed theoretical framework includes robust estimation and filtering methods, as well as robust error metrics. A wide array of applications can be addressed through the proposed framework, including, among others, robust regression, robust detection and estimation, clustering in impulsive environments, spectrum sensing when signals are corrupted by heavy-tailed noise, and robust compressed sensing (CS) and reconstruction methods. As illustrative and evaluation examples, we formulate four particular applications under this framework: (1) filtering for power line communications, (2) estimation in sensor networks with noisy channels, (3) reconstruction methods for compressed sensing, and (4) fuzzy clustering.

The organization of the paper is as follows. In Section 2, we present a brief review of M-estimation theory and the generalized Gaussian and generalized Cauchy density families. A statistical relationship between the GGD and GCD is established, and the ML location estimate from GCD statistics is derived. An M-type estimator, coined M-GC estimator, is derived in Section 3 from the cost function emerging in GCD-based ML estimation. Properties of the proposed estimator are analyzed, and a weighted filter structure is developed. Numerical algorithms for multiparameter estimation are also presented. A family of robust metrics derived from the GCD are detailed in Section 4, and their properties are analyzed. Four illustrative applications of the proposed framework are presented in Section 5. Finally, we conclude in Section 6 with closing thoughts and future directions.

2. Distributions, Optimal Filtering, and M-Estimation

This section presents M-estimates, a generalization of maximum likelihood (ML) estimates, and discusses optimal filtering from an ML perspective. Specifically, it discusses statistical models of observed samples obeying generalized Gaussian statistics and relates the filtering problem to maximum likelihood estimation. Then, we present the generalized Cauchy distribution, and a relation between GGD and GCD random variables is introduced. The ML estimators for GCD statistics are also derived.
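The relation between GGD and GCD random variables reviewed below can be checked empirically in its best-known instance: the ratio of two independent zero-mean Gaussian (GGD, k = 2) variables follows a standard Cauchy (GCD, p = 2) distribution, whose quartiles lie at -1 and +1. A minimal Monte Carlo sketch (the helper names are ours, not from the paper):

```python
import math, random

random.seed(7)

def gauss_ratio_samples(n):
    # ratio of two independent zero-mean, unit-variance Gaussian variables
    out = []
    while len(out) < n:
        u, v = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
        if v != 0.0:
            out.append(u / v)
    return out

s = sorted(gauss_ratio_samples(20000))
median = s[len(s) // 2]
q1, q3 = s[len(s) // 4], s[3 * len(s) // 4]
print(median, q1, q3)  # approximately 0, -1, +1 for a standard Cauchy
```

The empirical median and quartiles match the standard Cauchy values, consistent with the GGD-to-GCD ratio relationship formalized in Lemma 1 below.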

2.1. M-Estimation. In M-estimation theory the objective is to estimate a deterministic but unknown parameter θ ∈ R (or set of parameters) of a real-valued signal s(i; θ) corrupted by additive noise. Suppose that we have N observations, yielding the following parametric signal model:

x(i) = s(i; θ) + n(i)   (1)

for i = 1, 2, ..., N, where {x(i)}_{i=1}^N and {n(i)}_{i=1}^N denote the observations and noise components, respectively. Let θ̂ be an estimate of θ. Then any estimate that solves a minimization problem of the form

θ̂ = arg min_θ Σ_{i=1}^N ρ(x(i); θ)   (2)

or, equivalently, satisfies an implicit equation

Σ_{i=1}^N ψ(x(i); θ̂) = 0   (3)

is called an M-estimate (or maximum likelihood type estimate). Here ρ(x; θ) is an arbitrary cost function to be designed, and ψ(x; θ) = (∂/∂θ)ρ(x; θ). Note that ML estimators are a special case of M-estimators with ρ(x; θ) = −log f(x; θ), where f(·) is the probability density function of the observations. In general, M-estimators do not necessarily relate to probability density functions.

In the following we focus on the location estimation problem. This is well founded, as location estimators have been successfully employed as moving window type filters [3, 5, 9]. In this case, the signal model in (1) becomes x(i) = θ + n(i), and the minimization problem in (2) becomes

θ̂ = arg min_θ Σ_{i=1}^N ρ(x(i) − θ)   (4)

or

Σ_{i=1}^N ψ(x(i) − θ̂) = 0.   (5)

For M-estimates it can be shown that the influence function is proportional to ψ(x) [4, 25], meaning that we can derive the robustness properties of an M-estimator, namely, efficiency and bias in the presence of outliers, if ψ is known.

2.2. Generalized Gaussian Distribution. The statistical behavior of a wide range of processes can be modeled by the GGD, such as DCT and wavelet coefficients and pixel differences [2, 3]. The GGD pdf is given by

f(x) = (kα / (2Γ(1/k))) exp(−(α|x − θ|)^k),   (6)

where Γ(·) is the gamma function Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt, θ is the location parameter, and α is a constant related to the standard deviation σ, defined as α = σ^{−1}(Γ(3/k)/Γ(1/k))^{1/2}. In this form, α is an inverse scale parameter, and k > 0, sometimes called the shape parameter, controls the tail decay rate. The GGD model contains the Laplacian and Gaussian distributions as special cases, that is, for k = 1 and k = 2, respectively. Conceptually, the lower the value of k, the more impulsive the distribution. The ML location estimate for GGD statistics is reviewed in the following. Detailed derivations of these results are given in [3].

Consider a set of N independent observations each obeying the GGD with common location parameter, common shape parameter k, and different scale parameter σ_i. The ML estimate of location is given by

θ̂ = arg min_θ [Σ_{i=1}^N (1/σ_i^k) |x(i) − θ|^k].   (7)

There are two special cases of the GGD family that are well studied: the Gaussian (k = 2) and the Laplacian (k = 1) distributions, which yield the well known weighted mean and weighted median estimators, respectively. When all samples are identically distributed, for these special cases the mean and median estimators are the resulting operators. These estimators are formally defined in the following.

Definition 1. Consider a set of N independent observations each obeying the Gaussian distribution with different variance σ_i^2. The ML estimate of location is given by

θ̂ = (Σ_{i=1}^N h_i x(i)) / (Σ_{i=1}^N h_i) = mean(h_i · x(i) |_{i=1}^N),   (8)

where h_i = 1/σ_i^2 and · denotes the (multiplicative) weighting operation.

Definition 2. Consider a set of N independent observations each obeying the Laplacian distribution with common location and different scale parameter σ_i. The ML estimate of location is given by

θ̂ = median(h_i ◇ x(i) |_{i=1}^N),   (9)

where h_i = 1/σ_i and ◇ denotes the replication operator defined as

h_i ◇ x(i) = (x(i), x(i), ..., x(i))  (h_i times).   (10)

Through arguments similar to those above, the k ≠ 1, 2 cases yield the fractional lower order moment (FLOM) estimation framework [9]. For k < 1, the resulting estimators are selection type. A drawback of FLOM estimators for 1 < k < 2 is that their computation is, in general, nontrivial, although suboptimal (for k > 1) selection-type FLOM estimators have been introduced to reduce computational costs [6].

2.3. Generalized Cauchy Distribution. The GCD family was proposed by Rider in 1957 [20], rediscovered by Miller and Thomas in 1972 with a different parametrization [21], and has been used in several studies of impulsive radio noise [3, 12, 17, 21, 22]. The GCD pdf is given by

f_GC(z) = aσ(σ^p + |z − θ|^p)^{−2/p}   (11)

with a = pΓ(2/p)/(2(Γ(1/p))^2). In this representation, θ is the location parameter, σ is the scale parameter, and p is the tail constant. The GCD family contains the Meridian [13] and Cauchy distributions as special cases, that is, for p = 1 and p = 2, respectively. For p < 2, the tail of the pdf decays slower than in the Cauchy distribution case, resulting in a heavier-tailed distribution.

The flexibility and closed-form nature of the GCD make it an ideal family from which to derive robust estimation and filtering techniques. As such, we consider the location estimation problem that, as in the previous case, is approached from an ML estimation framework. Thus consider a set of N i.i.d. GCD distributed samples with common scale parameter σ and tail constant p. The ML estimate of location is given by

θ̂ = arg min_θ [Σ_{i=1}^N log(σ^p + |x(i) − θ|^p)].   (12)

Next, consider a set of N independent observations each obeying the GCD with common tail constant p, but possessing unique scale parameter ν_i. The ML estimate is formulated as θ̂ = arg max_θ Π_{i=1}^N f_GC(x(i); ν_i). Inserting the GCD distribution for each sample, taking the natural log, and utilizing basic properties of the argmax and log functions yield

θ̂ = arg min_θ Σ_{i=1}^N log(ν_i^p + |x(i) − θ|^p).   (13)

It is well known that the ratio of two independent zero-mean Gaussian distributed random variables (GGD k = 2) yields a Cauchy random variable. Recently, Aysal and Barner showed that this relationship also holds for the Laplacian and Meridian distributions [13], that is, the ratio of two independent Laplacian (GGD k = 1) random variables yields a Meridian (GCD p = 1) random variable. In the following, we extend this finding to the complete set of GGD and GCD families.

Lemma 1. The random variable formed as the ratio of two independent zero-mean GGD distributed random variables U and V, with tail constant β and scale parameters α_U and α_V, respectively, is a GCD random variable with tail parameter λ = β and scale parameter ν = α_U/α_V.

Proof. See Appendix A.

3. Generalized Cauchy-Based Robust Estimation and Filtering

In this section we use the GCD ML location estimate cost function to define an M-type estimator. First, robustness and properties of the derived estimator are analyzed, and the filtering problem is then related to M-estimation. The proposed estimator is extended to a weighted filtering structure. Finally, practical algorithms for the multiparameter case are developed.

3.1. Generalized Cauchy-Based M-Estimation. The cost function associated with the GCD ML estimate of location derived in the previous section is given by

ρ(x) = log(σ^p + |x|^p), σ > 0, 0 < p ≤ 2,   (14)

and, given the observation set {x(i)}_{i=1}^N, the M-GC estimate of location is defined as the global minimizer θ̂ of the objective function

Q(θ) = Σ_{i=1}^N log(σ^p + |x(i) − θ|^p).   (15)
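The GCD-based ML location estimate (12) can be illustrated with a small numerical sketch; a simple grid search over the sample range stands in for the fixed-point routines developed in Section 3.4, and all function names are ours:

```python
import math

def mgc_objective(theta, x, sigma=1.0, p=1.0):
    # Q(theta) = sum_i log(sigma^p + |x(i) - theta|^p)
    return sum(math.log(sigma ** p + abs(xi - theta) ** p) for xi in x)

def mgc_location(x, sigma=1.0, p=1.0, grid=2000):
    # grid search over [min(x), max(x)]; the estimate lies in the sample range
    lo, hi = min(x), max(x)
    cands = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    return min(cands, key=lambda t: mgc_objective(t, x, sigma, p))

x = [0.1, -0.2, 0.05, 0.0, 100.0]      # one gross outlier
est = mgc_location(x, sigma=1.0, p=1.0)
sample_mean = sum(x) / len(x)
print(est, sample_mean)  # the M-GC estimate stays near 0; the mean is dragged toward 20
```

The log-based cost heavily discounts the outlier at 100, so the estimate remains near the cluster of inliers, while the sample mean (the Gaussian ML estimate) does not.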

All local extrema of Q(θ) lie in the interval [x[1], x[N]], where x[1] ≤ x[2] ≤ ··· ≤ x[N] denote the order statistics of the input samples, and, if 0 < p ≤ 1, the M-GC estimate is one of the input samples.

The M-GC estimator has two adjustable parameters, σ and p. The tail constant, p, depends on the heaviness of the underlying distribution. Notably, when p ≤ 1 the estimator behaves as a selection type filter, and, as p → 0, it becomes increasingly robust to outlier samples. For p > 1, the location estimate is in the range of the input samples and is readily computed. Figure 1 shows a typical sketch of the M-GC objective function, in this case for p ∈ {0.5, 1, 1.5, 2} and σ = 1.

Figure 1: Typical M-GC objective functions for different values of p ∈ {0.5, 1, 1.5, 2} (from bottom to top, respectively). Input samples are x = [4.9, 0, 6.5, 10.0, 9.5, 1.7, 1] and σ = 1.

The following properties detail the M-GC estimator behavior as σ goes to either 0 or ∞. Importantly, the results show that the M-GC estimator subsumes other classical estimator families.

Property 1. Given a set of input samples {x(i)}_{i=1}^N, the M-GC estimate converges to the ML GGD estimate (Lp norm as cost function) as σ → ∞:

lim_{σ→∞} θ̂ = arg min_θ Σ_{i=1}^N |x(i) − θ|^p.   (16)

Proof. See Appendix C.

Intuitively, this result is explained by the fact that |x(i) − θ|^p/σ^p becomes negligible as σ grows large compared to 1. This, combined with the fact that log(1 + x) ≈ x when x ≪ 1, which is an equality in the limit, yields the resulting cost function behavior. The importance of this result is that M-GC estimators include M-estimators with Lp norm (0 < p ≤ 2) cost functions. Thus M-GC (GCD-based) estimators should be at least as powerful as GGD-based estimators (linear FIR, median, FLOM) in light-tailed applications, while the untapped algebraic tail potential of GCD methods should allow them to substantially outperform in heavy-tailed applications.

In contrast to the equivalence with Lp norm approaches for large σ, M-GC estimators become more resistant to impulsive noise as σ decreases. In fact, as σ → 0 the M-GC yields a mode type estimator with particularly strong impulse rejection.

Property 2. Given a set of input samples {x(i)}_{i=1}^N, the M-GC estimate converges to a mode type estimator as σ → 0. That is,

lim_{σ→0} θ̂ = arg min_{x(j)∈M} Π_{i: x(i)≠x(j)} |x(i) − x(j)|,   (17)

where M is the set of most repeated values.

Proof. See Appendix D.

3.2. Robustness and Analysis of M-GC Estimators. To formally evaluate the robustness of M-GC estimators, we consider the influence function, which, if it exists, is proportional to ψ(x) and determines the effect of contamination on the estimator. For the M-GC estimator,

ψ(x) = p|x|^{p−1} sgn(x) / (σ^p + |x|^p),   (18)

where sgn(·) denotes the sign operator. Figure 2 shows the M-GC estimator influence function for p ∈ {0.5, 1, 1.5, 2}. To further characterize M-estimates, it is useful to list the desirable features of a robust influence function [4, 25].

(i) B-Robustness. An estimator is B-robust if the supremum of the absolute value of the influence function is finite.

(ii) Rejection Point. The rejection point, defined as the distance from the center of the influence function to the point where the influence function becomes negligible, should be finite. The rejection point measures whether the estimator rejects outliers and, if so, at what distance.

The M-GC estimate is B-robust and has a finite rejection point that depends on the scale parameter σ and the tail parameter p. As p → 0, the influence function has a higher decay rate; that is, as p → 0 the M-GC estimator becomes more robust to outliers. Also of note is that lim_{x→±∞} ψ(x) = 0, that is, the influence function

is asymptotically redescending, and the effect of outliers monotonically decreases with an increase in magnitude [25].

Figure 2: Influence functions of the M-GC estimator for different values of p: (black) p = 0.5, (blue) p = 1, (red) p = 1.5, and (cyan) p = 2.

The M-GC estimator also possesses the following important properties.

Property 3 (outlier rejection). For σ < ∞,

lim_{x(N)→±∞} θ̂(x(1), ..., x(N)) = θ̂(x(1), ..., x(N − 1)).   (19)

Property 4 (no undershoot/overshoot). The output of the M-GC estimator is always bounded by

x[1] < θ̂ < x[N].   (20)

Proof of Property 5 follows from the fact that the M-GC estimator influence function is odd, bounded, and continuous (except at the origin, which is a set of measure zero); argument details parallel those in [4].

Notably, M-estimators have asymptotic normal behavior [4]. In fact, it can be shown that

√N (θ̂_N − θ) → Z   (22)

in distribution, where Z ∼ N(0, v) and

v = E_F[ψ²(X − θ)] / (E_F[ψ′(X − θ)])².   (23)

The expectation is taken with respect to F, the underlying distribution of the data. The last expression is the asymptotic variance of the estimator. Hence, the variance of θ̂_N decreases as N increases, meaning that M-GC estimates are asymptotically efficient.

3.3. Weighted M-GC Estimators. A filtering framework cannot be considered complete until an appropriate weighting operation is defined. Filter weights, or coefficients, are extremely important for applications in which signal correlations are to be exploited. Using the ML estimator under independent, but not identically distributed, GCD statistics (expression (13)), the M-GC estimator is extended to include weights. Let h = [h1, ..., hN] denote a vector of nonnegative weights. The weighted M-GC (WM-GC) estimate is defined as

θ̂ = arg min_θ Σ_{i=1}^N log(σ^p + h_i |x(i) − θ|^p).   (24)

The filtering structure defined in (24) is an M-smoother.
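A hedged sketch of the WM-GC estimate (24), again using a simple grid search rather than an optimized solver (the function names and the toy data are ours):

```python
import math

def wmgc(x, h, sigma=1.0, p=1.0, grid=2000):
    # theta_hat = argmin_theta sum_i log(sigma^p + h_i |x(i) - theta|^p), cf. (24)
    lo, hi = min(x), max(x)
    cands = [lo + (hi - lo) * k / grid for k in range(grid + 1)]
    def cost(t):
        return sum(math.log(sigma ** p + hi_ * abs(xi - t) ** p)
                   for xi, hi_ in zip(x, h))
    return min(cands, key=cost)

x = [0.0, 0.1, -0.1, 5.0]
print(wmgc(x, [1.0, 1.0, 1.0, 1.0]))       # uniform weights: estimate near 0
print(wmgc(x, [0.01, 0.01, 0.01, 10.0]))   # heavy weight on the last sample pulls it toward 5
```

Larger weights make the corresponding samples more influential, mirroring the roles of the per-sample scale parameters in (13).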

Require: Data set {x(i)}_{i=1}^N and tolerances ε1, ε2, ε3.
(1) Initialize σ(0) and θ(0).
(2) while |θ(m) − θ(m−1)| > ε1, |σ(m) − σ(m−1)| > ε2, and |p(m) − p(m−1)| > ε3 do
(3)   Estimate p(m) as the solution of (30).
(4)   Estimate θ(m) as the solution of (28).
(5)   Estimate σ(m) as the solution of (29).
(6) end while
(7) return θ̂, σ̂, and p̂.

Algorithm 1: Multiparameter estimation algorithm.
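The alternating structure of Algorithm 1 can be sketched as follows, assuming the GCD likelihood (27); for simplicity this illustration substitutes 1-D grid searches for the paper's fixed-point recursions, and every helper name is ours:

```python
import math, random

def loglik(x, theta, sigma, p):
    # GCD log-likelihood from the density (11): f(z) = a*sigma*(sigma^p + |z - theta|^p)^(-2/p)
    a = p * math.gamma(2.0 / p) / (2.0 * math.gamma(1.0 / p) ** 2)
    return sum(math.log(a * sigma)
               - (2.0 / p) * math.log(sigma ** p + abs(xi - theta) ** p)
               for xi in x)

def icm_fit(x, iters=5):
    # Cyclic 1-D maximizations in the spirit of Algorithm 1, with the
    # median / MAD initialization suggested in the text.
    xs = sorted(x)
    theta = xs[len(xs) // 2]                          # sample median
    dev = sorted(abs(xi - theta) for xi in x)
    sigma = dev[len(dev) // 2] or 1.0                 # MAD initialization
    p = 1.0
    for _ in range(iters):
        p = max([0.1 * k for k in range(1, 21)],      # search space (0, 2]
                key=lambda q: loglik(x, theta, sigma, q))
        theta = max([theta + sigma * (k - 150) / 50.0 for k in range(301)],
                    key=lambda t: loglik(x, t, sigma, p))
        sigma = max([sigma * (0.5 + 0.025 * k) for k in range(61)],
                    key=lambda s: loglik(x, theta, s, p))
    return theta, sigma, p

random.seed(3)
# standard Cauchy data, i.e., GCD with theta = 0, sigma = 1, p = 2
data = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(500)]
theta_hat, sigma_hat, p_hat = icm_fit(data)
print(theta_hat, sigma_hat, p_hat)
```

With Cauchy-distributed input, the recovered parameters approach (0, 1, 2), in line with the behavior reported in Table 1.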

3.4. Multiparameter Estimation. The location estimation problem defined by the M-GC filter depends on the parameters σ and p. Thus to solve the optimal filtering problem, we consider multiparameter M-estimates [26]. The applied approach utilizes a small set of signal samples to estimate σ and p and then uses these values in the filtering process (although a fully adaptive filter can also be implemented using this scheme).

Let {x(i)}_{i=1}^N be a set of independent observations from a common GCD with deterministic but unknown parameters θ, σ, and p. The joint estimates are the solutions to the following maximization problem:

(θ̂, σ̂, p̂) = arg max_{θ,σ,p} g(x; θ, σ, p),   (26)

where

g(x; θ, σ, p) = Π_{i=1}^N aσ(σ^p + |x(i) − θ|^p)^{−2/p},   (27)

with a = pΓ(2/p)/(2(Γ(1/p))^2), where Γ(x) is the gamma function. The solution to this optimization problem is obtained by solving a set of simultaneous equations given by first-order optimality conditions. Differentiating the log-likelihood function, log g(x; θ, σ, p), with respect to θ, σ, and p and performing some algebraic manipulations yields a set of simultaneous equations, the first of which is

∂g/∂θ = Σ_{i=1}^N −p|x(i) − θ|^{p−1} sgn(x(i) − θ) / (σ^p + |x(i) − θ|^p) = 0,   (28)

with the corresponding conditions for σ and p denoted (29) and (30). It can be noticed that (28) is the implicit equation for the M-GC estimator with ψ as defined in (18), implying that the location estimate has the same properties derived above.

Of note is that g(x; θ, σ, p) has a unique maximum in σ for fixed θ and p, and also a unique maximum in p for fixed θ and σ with p ∈ (0, 2]. In the following, we provide an algorithm to iteratively solve the above set of equations.

Multiparameter Estimation Algorithm. For a given set of data {x(i)}_{i=1}^N, we propose to find the optimal joint parameter estimates by the iterative algorithm detailed in Algorithm 1, with the superscript denoting iteration number.

The algorithm is essentially an iterated conditional modes (ICM) algorithm [27]. Additionally, it resembles the expectation maximization (EM) algorithm [28] in the sense that, instead of optimizing all parameters at once, it finds the optimal value of one parameter given that the other two are fixed; it then iterates. While the algorithm converges to a local minimum, experimental results show that initializing θ as the sample median and σ as the median absolute deviation (MAD), and then computing p as a solution to (30), accelerates the convergence and most often yields globally optimal results. In the classical literature, fixed-point algorithms are successfully used in the computation of M-estimates [3, 4]. Hence, in the following, we solve items 3–5 in Algorithm 1 using fixed-point search routines.

Fixed-Point Search Algorithms. Recall that when 0 < p

Table 1: Multiparameter estimation results for a GCD process with length N and (θ, σ, p) = (0, 1, 2).

N      10       100            1000
θ̂      0.0035   −0.0009        −0.0002
MSE    0.0302   2.4889 × 10−3  1.7812 × 10−4
σ̂      0.9563   1.0224         1.0186
MSE    0.0016   1.7663 × 10−5  1.1911 × 10−6
p̂      1.5816   1.8273         1.9569
MSE    0.0519   0.0109         1.5783 × 10−6

Similarly, for (29) the recursion can be written as

σ(j+1) = (Σ_{i=1}^N b_i(σ(j)) |x(i) − θ̂|^p / Σ_{i=1}^N b_i(σ(j)))^{1/p}   (32)

with b_i(σ(j)) = 1/(σ(j)^p + |x(i) − θ̂|^p). The algorithm terminates when |σ(j+1) − σ(j)| < δ2 for δ2 a small positive number. Since the objective function has only one minimum for fixed θ and p, the recursion converges to the global result.

The parameter p recursion is given by

p(j+1) = (2/N) Σ_{i=1}^N [ (2/p(j)) Ψ(2/p(j)) − (1/p(j)) Ψ(1/p(j)) + log(σ^{p(j)} + |x(i) − θ̂|^{p(j)}) + p(j)(σ^{p(j)} log σ − |x(i) − θ̂|^{p(j)} log |x(i) − θ̂|) / (σ^{p(j)} − |x(i) − θ̂|^{p(j)}) ],   (33)

where Ψ(·) denotes the digamma function. Noting that the search space is the interval I = (0, 2], the function g in (27) can be evaluated for a finite set of points P ∈ I, keeping the value that maximizes g and setting it as the initial point for the search.

As an example, simulations illustrating the developed multiparameter estimation algorithm are summarized in Table 1, for p = 2, θ = 0, and σ = 1 (standard Cauchy distribution). Results are shown for varying sample lengths: 10, 100, and 1000. The experiments were run 1000 times for each block length, with the presented results the average over the trials. Mean final θ̂, σ̂, and p̂ estimates are reported, as well as the resulting MSE. To illustrate that the algorithm converges in a few iterations, given the proposed initialization, consider an experiment utilizing data drawn from a GCD distribution with θ = 0, σ = 1, and p = 1.5. Figure 3 reports the θ̂, σ̂, and p̂ estimate MSE curves. As in the previous case, 100 trials are averaged. Only the first five iteration points are shown, as the algorithms are convergent at that point.

Figure 3: Multiparameter estimation MSE iteration evolution for a GCD process with (θ, σ, p) = (0, 1, 1.5).

To conclude this section, we consider the computational complexity of the proposed multiparameter estimation algorithm. The algorithm in total has a higher computational complexity than the FLOM, median, meridian, and myriad operators, since Algorithm 1 requires initial estimates of the location and the scale parameters. However, it should be noted that the proposed method estimates all the parameters of the model, thus providing an advantage over the aforementioned methods, which require a priori parameter tuning. It is straightforward to show that the computational complexity of the proposed method is O(N²), assuming the practical case in which the number of fixed-point iterations is ≪ N. The dominating N² term is the cost of selecting the input sample that minimizes the objective function, that is, the cost of evaluating the objective function N times. However, if faster methods that avoid evaluation of the objective function for all samples (e.g., subsampling methods) are employed, the computational cost is lowered.

4. Robust Distance Metrics

This section presents a family of robust GCD-based error metrics. Specifically, the cost function of the M-GC estimator defined in Section 3.1 is extended to define a quasinorm over R^m and a semimetric for the same space—the development is analogous to the Lp norms emanating from the GGD family. We denote these semimetrics the log-Lp (LLp) norms. (Note that for the σ = 1 and p = 1 case, this metric defines the log-L space in Banach space theory.)

Definition 5. Let u ∈ R^m; then the LLp norm of u is defined as

‖u‖_{LLp,σ} = Σ_{i=1}^m log(1 + |u_i|^p/σ^p), σ > 0.   (34)

The LLp norm is not a norm in the strictest sense, since it does not meet the positive homogeneity and subadditivity properties. However, it obeys positive definiteness and a scale invariance property.

Proposition 2. Let c ∈ R, u, v ∈ R^m, and p, σ > 0. The following statements hold:

(i) ‖u‖_{LLp,σ} ≥ 0, with ‖u‖_{LLp,σ} = 0 if and only if u = 0;

(ii) ‖cu‖_{LLp,σ} = ‖u‖_{LLp,δ}, where δ = σ/|c|;

(iii) ‖u + v‖_{LLp,σ} = ‖v + u‖_{LLp,σ};

(iv) let Cp = 2^{p−1}; then

‖u + v‖_{LLp,σ} ≤ ‖u‖_{LLp,σ} + ‖v‖_{LLp,σ} for 0 < p ≤ 1, and ‖u + v‖_{LLp,σ} ≤ ‖u‖_{LLp,σ} + ‖v‖_{LLp,σ} + m log Cp for p > 1.   (35)

Proof. Statement (i) follows from the fact that log(1 + a) ≥ 0 for all a ≥ 0, with equality if and only if a = 0. Statement (ii) follows from

Σ_{i=1}^m log(1 + |cu_i|^p/σ^p) = Σ_{i=1}^m log(1 + |u_i|^p/(σ/|c|)^p).   (36)

Statement (iii) follows directly from the definition of the LLp norm. Statement (iv) follows from the well-known relation |a + b|^p ≤ Cp(|a|^p + |b|^p), a, b ∈ R, where Cp is a constant that depends only on p. Indeed, for 0 < p ≤ 1 we have Cp = 1, whereas for p > 1 we have Cp = 2^{p−1} (for further details see, e.g., [29]). Using this result and properties of the log function, we have

‖u + v‖_{LLp,σ} = Σ_{i=1}^m log(1 + |u_i + v_i|^p/σ^p)
≤ Σ_{i=1}^m [log Cp + log(1/Cp + (|u_i|^p + |v_i|^p)/σ^p)]
≤ Σ_{i=1}^m [log Cp + log(1 + (|u_i|^p + |v_i|^p)/σ^p)]
≤ Σ_{i=1}^m [log Cp + log(1 + |u_i|^p/σ^p + |v_i|^p/σ^p + |u_i|^p|v_i|^p/σ^{2p})]
= Σ_{i=1}^m log[(1 + |u_i|^p/σ^p)(1 + |v_i|^p/σ^p)] + m log Cp
= ‖u‖_{LLp,σ} + ‖v‖_{LLp,σ} + m log Cp.   (37)

The LLp norm defines a robust metric that does not heavily penalize large deviations, with the robustness depending on the scale parameter σ and the exponent p. The following lemma constructs a relationship between the Lp norms and the LLp norms.

Lemma 2. For every u ∈ R^m, 0 < p ≤ 2, and σ > 0, the following relations hold:

σ^p ‖u‖_{LLp,σ} ≤ ‖u‖_p^p ≤ σ^p m (e^{‖u‖_{LLp,σ}} − 1).   (38)

Proof. The first inequality comes from the relation log(1 + x) ≤ x, for all x ≥ 0. Setting x_i = |u_i|^p/σ^p and summing over i yields the result. The second inequality follows from

‖u‖_{LLp,σ} = Σ_{i=1}^m log(1 + |u_i|^p/σ^p) ≥ max_i log(1 + |u_i|^p/σ^p) = log(1 + ‖u‖_∞^p/σ^p).   (39)

Noting that ‖u‖_∞ ≤ σ(e^{‖u‖_{LLp,σ}} − 1)^{1/p} and ‖u‖_p ≤ m^{1/p} ‖u‖_∞ for all p > 0 gives the desired result.

The particular case p = 2 yields the well-known Lorentzian norm. The Lorentzian norm has desirable robust error metric properties. It is convex near the origin (0 ≤ ‖u‖ ≤ σ), behaving similarly to an L2 cost function for small variations, and large deviations are not heavily penalized, as in the L1 or L2 norm cases, leading to a more robust error metric when the deviations contain gross errors.

Contour plots of select norms are shown in Figure 4 for the two-dimensional case. Figures 4(a) and 4(c) show the L2 and L1 norms, respectively, while the LL2 (Lorentzian) and LL1 norms (for σ = 1) are shown in Figures 4(b) and 4(d), respectively. It can be seen from Figure 4(b) that the Lorentzian norm tends to behave like the L2 norm for points within the unitary L2 ball. Conversely, it gives the same penalization to large sparse deviations as to smaller clustered deviations. In a similar fashion, Figure 4(d) shows that the LL1 norm behaves like the L1 norm for points in the unitary L1 ball.
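The LLp norm (34) and the sandwich relation of Lemma 2 can be verified numerically (a small sketch; the function names are ours):

```python
import math

def ll_norm(u, p=2.0, sigma=1.0):
    # ||u||_{LLp,sigma} = sum_i log(1 + |u_i|^p / sigma^p), cf. (34)
    return sum(math.log(1.0 + abs(ui) ** p / sigma ** p) for ui in u)

u = [3.0, -0.5, 0.0, 12.0]
p, sigma = 1.5, 0.8
m = len(u)
llp = ll_norm(u, p, sigma)
lp_p = sum(abs(ui) ** p for ui in u)          # ||u||_p^p
lower = sigma ** p * llp
upper = sigma ** p * m * math.expm1(llp)
print(lower, lp_p, upper)  # lower <= ||u||_p^p <= upper, as in Lemma 2
```

The scale invariance of Proposition 2(ii) also holds directly: scaling u by c is equivalent to shrinking σ by |c|.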
log 1+ p + p + 2p i=1 σ σ σ 5. Illustrative Application Areas + m log Cp  ! ! This section presents four practical problems developed m | |p | |p = ui vi under the proposed framework: (1) robust filtering for log 1+ p 1+ p i=1 σ σ power line communications, (2) robust estimation in sensor networks with noisy channels, (3) robust reconstruction + m log Cp methods for compressed sensing, and (4) robust fuzzy = u + v + m log C . clustering. Each problem serves to illustrate the capabilities LLp,σ LLp,σ p and performance of the proposed methods. (37)

5.1. Robust Filtering. The use of existing power lines for The LLp norm defines a robust metric that does not heav- transmitting data and voice has been receiving recent interest ily penalize large deviations, with the robustness depending [30, 31]. The advantages of power line communications on the scale parameter σ and the exponent p. The following (PLCs) are obvious due to the ubiquity of power lines lemma constructs a relationship between the Lp norms and and power outlets. The potential of power lines to deliver the LLp norms. broadband services, such as fast internet access, telephone, 10 EURASIP Journal on Advances in Signal Processing
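Before turning to the applications, the norm properties stated above can be checked numerically. The sketch below is illustrative only (plain Python, not part of the paper); it implements the definition (35) and can be used to verify the scale relation (ii), the quasi-triangle inequality (iv), and the first bound in (38).

```python
import math

def llp_norm(u, p=2.0, sigma=1.0):
    """LLp norm (35): sum_i log(1 + |u_i|^p / sigma^p)."""
    return sum(math.log(1.0 + (abs(ui) / sigma) ** p) for ui in u)

u = [1.0, -2.0, 3.0]
# scale relation (ii): ||c*u||_{LLp,sigma} equals ||u||_{LLp,sigma/|c|}
# quasi-triangle (iv) and the Lp bound in (38) follow the same pattern
```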


Figure 4: Contour plots of different metrics for two dimensions: (a) L2, (b) LL2 (Lorentzian), (c) L1, and (d) LL1 norms.

5.1. Robust Filtering. The use of existing power lines for transmitting data and voice has been receiving recent interest [30, 31]. The advantages of power line communications (PLCs) are obvious due to the ubiquity of power lines and power outlets. The potential of power lines to deliver broadband services, such as fast internet access, telephone, fax services, and home networking, is emerging in new communications industry technology. However, there remain considerable challenges for PLCs, such as communications channels that are hampered by the presence of large-amplitude noise superimposed on top of traditional white Gaussian noise. The overall interference is appropriately modeled as an algebraic-tailed process, with α-stable often chosen as the parent distribution [31].

While the M-GC filter is optimal for GCD noise, it is also robust in general impulsive environments. To compare the robustness of the M-GC filter with other robust filtering schemes, experiments for symmetric α-stable noise corrupted PLCs are presented. Specifically, signal enhancement for the power line communication problem with 4-ASK signaling and the equiprobable alphabet v = {−2, −1, 1, 2} is considered. The noise is taken to be white, zero-location, α-stable distributed with γ = 1 and α ranging from 0.2 to 2 (very impulsive to Gaussian noise). The filtering process employs length-nine sliding windows to remove the noise and enhance the signal. The M-GC parameters were determined using the multiparameter estimation algorithm described in Section 3.4. This optimization was applied to the first 50 samples, yielding p = 0.756 and σ = 0.896. The M-GC filter is compared to the FLOM, median, myriad, and meridian operators. The meridian tunable parameter was also set using the multiparameter optimization procedure, but without estimating p. The myriad filter tuning parameter was set according to the α–k curve established in [18].
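The M-GC filtering operation described above (each output is the value minimizing the GCD-derived cost over a sliding window) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the minimizer is approximated by evaluating the cost at the window samples plus a fine grid, and the tuned values p = 0.756 and σ = 0.896 quoted in the text are used as defaults.

```python
import numpy as np

def mgc_cost(theta, window, p, sigma):
    """GCD-derived cost over a window: sum_i log(sigma^p + |x_i - theta|^p)."""
    return np.sum(np.log(sigma ** p + np.abs(window - theta) ** p))

def mgc_filter(x, width=9, p=0.756, sigma=0.896):
    """Sliding-window M-GC filter sketch. Because the local minima of the
    cost lie at or near the samples, the candidate set combines the window
    samples themselves with a fine grid between the window extremes."""
    half = width // 2
    xp = np.pad(np.asarray(x, dtype=float), half, mode='edge')
    y = np.empty(len(x), dtype=float)
    for i in range(len(x)):
        w = xp[i:i + width]
        cand = np.concatenate([w, np.linspace(w.min(), w.max(), 200)])
        costs = [mgc_cost(t, w, p, sigma) for t in cand]
        y[i] = cand[int(np.argmin(costs))]
    return y
```

Note how a single gross outlier inside the window is ignored by the output, unlike a mean (FIR) filter, which would be pulled toward it.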

Figure 5: Power line communication enhancement. MSE for different filtering structures as a function of the tail parameter α.

The normalized MSE values for the outputs of the different filtering structures are plotted, as a function of α, in Figure 5. The results show that the various methods perform somewhat similarly in the less demanding light-tailed noise environments, but that the more robust methods, in particular the M-GC approach, significantly outperform in the heavy-tailed, impulsive environments. The time-domain results are presented in Figure 6, which clearly show that the M-GC is more robust than the other operators, yielding a cleaner signal with fewer outliers and well-preserved signal (symbol) transitions. The M-GC filter benefits from the optimization of the scale and tail parameters and therefore performs at least as well as the myriad and meridian filters. Similarly, the M-GC filter performs better than the FLOM filter, which is widely used for processing stable processes [9].

5.2. Robust Blind Decentralized Estimation. Consider next a set of K distributed sensors, each making observations of a deterministic source signal θ. The observations are quantized with one bit (binary observations), and then these binary observations are transmitted through a noisy channel to a fusion center where θ is estimated (see [32, 33] and references therein). The observations are modeled as x = θ + n, where n are sensor noise samples assumed to be zero-mean, spatially uncorrelated, independent, and identically distributed. Thus the quantized binary observations are

  b_k = 1{x_k ∈ (τ, +∞)}  (40)

for k = 1, 2, ..., K, where τ is a real-valued constant and 1{·} is the indicator function. The observations received at the fusion center are modeled by

  y = (2b − 1) + w = m + w,  (41)

where w are zero-mean independent channel noise samples and the transformation m_k = 2b_k − 1 is made to adopt a binary phase shift keying (BPSK) scheme.

The channel noise density function is denoted by w_k ∼ f_w(u). When this noise is impulsive (e.g., atmospheric noise or underwater acoustic noise), traditional Gaussian-based methods (e.g., least squares) do not perform well. We extend the blind decentralized estimation method proposed in [33], modeling the channel corruption as GCD noise and deriving a robust estimation method for impulsive channel noise scenarios. The sensor noise, n, is modeled as zero-mean additive white Gaussian noise with variance σ_n², while the channel noise, w, is modeled as zero-location additive white GCD noise with scale parameter σ_w and tail constant p.

A realistic approach to the estimation problem in sensor networks assumes that the noise pdf is known but that the values of some parameters are unknown [33]. In the following, we consider the estimation problem when the sensor noise parameter σ_n is known and the channel noise tail constant p and scale parameter σ_w are unknown. Instrumental to the scheme presented is the fact that b_k is a Bernoulli random variable with parameter

  ψ(θ) ≜ Pr{b_k = +1} = 1 − F_n(τ − θ),  (42)

where F_n(·) is the cumulative distribution function of n_k. The pdf of the noisy observations received at the fusion center is given by

  f_y(y) = ψ(θ) f_w(y − 1) + [1 − ψ(θ)] f_w(y + 1).  (43)

Note that the resulting pdf is a GCD mixture with mixing parameters ψ and [1 − ψ]. To simplify the problem, we first estimate ψ = ψ(θ) and then utilize the invariance of the ML estimate to determine θ using (42). Using the log-likelihood function, the ML estimate of ψ ∈ (0, 1) reduces to

  ψ = arg max_ψ Σ_{k=1}^K log( ψ f_w(y_k − 1) + [1 − ψ] f_w(y_k + 1) ).  (44)

The unknown parameter set for the estimation problem is {ψ, σ_w, p}. We address this problem utilizing the well-known EM algorithm [28] and a variation of Algorithm 1 in Section 3.4. The following are the E- and M-steps for the considered sensor network application.

E-Step. Let the parameters estimated at the j-th iteration be marked by a superscript (j) and Γ^{(j)} = (σ_w^{(j)}, p^{(j)}). The posterior probabilities are computed as

  q_k = ψ^{(j)} f_w(y_k − 1 | Γ^{(j)}) / [ ψ^{(j)} f_w(y_k − 1 | Γ^{(j)}) + (1 − ψ^{(j)}) f_w(y_k + 1 | Γ^{(j)}) ].  (45)

M-Step. The ML estimates {ψ^{(j+1)}, Γ^{(j+1)}} are given by

  ψ^{(j+1)} = (1/K) Σ_{k=1}^K q_k,   Γ^{(j+1)} = arg max_Γ Λ(Γ),  (46)


Figure 6: Power line communication enhancement. (a) Transmitted signal; (b) received signal corrupted by α-stable noise, α = 0.4. Filtering results with: (c) mean, (d) median, (e) FLOM (p = 0.25), (f) myriad, (g) meridian, (h) M-GC.

where

  Λ(Γ) = Σ_{k=1}^K [ q_k Υ(y_k − 1; Γ) + (1 − q_k) Υ(y_k + 1; Γ) ],  (47)

with Υ(u; Γ) = log a(p) + log σ_w − (2/p) log(σ_w^p + |u|^p) and a(p) = pΓ(2/p) / (2[Γ(1/p)]²). We use a suboptimal estimate of p in this case, choosing the value from P = {0.5, 1, 1.5, 2} that maximizes (46).

Numerical results comparing the derived GCD method, coined maximum likelihood with unknown generalized Cauchy channel parameters (MLUGC), with the Gaussian channel-based method derived in [33], referred to as maximum likelihood with unknown Gaussian channel parameter (MLUG), are presented in Figure 7. The MSE is used as a comparison metric. As a reference, the MSEs of the binary estimator (BE) and the clairvoyant estimator (CE) (estimators under perfect transmission) are also included.

A sensor network with the following parameters is used: θ = 1, τ = 0, σ_n = 1, and K = 1000, and the results are averaged over 200 independent realizations. For the channel noise we use two models: contaminated p-Gaussian and α-stable distributions.
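The EM recursion in (45) and (46) can be sketched as follows. This is a simplified illustration, not the paper's full algorithm: the channel parameters Γ = (σ_w, p) are held fixed rather than re-estimated via Λ(Γ), so only the E-step and the closed-form ψ update are exercised.

```python
import math

def gcd_pdf(u, sigma, p):
    """Zero-location generalized Cauchy density:
    f(u) = a(p) * sigma / (sigma^p + |u|^p)^(2/p),
    with a(p) = p*Gamma(2/p) / (2*Gamma(1/p)^2)."""
    a = p * math.gamma(2.0 / p) / (2.0 * math.gamma(1.0 / p) ** 2)
    return a * sigma / (sigma ** p + abs(u) ** p) ** (2.0 / p)

def em_mixture_psi(y, sigma_w, p, iters=50):
    """EM for the two-component GCD mixture (43), estimating only the
    mixing parameter psi; (sigma_w, p) are assumed known here."""
    psi = 0.5
    for _ in range(iters):
        # E-step (45): posterior that sample k came from the '+1 bit' component
        q = [psi * gcd_pdf(yk - 1.0, sigma_w, p)
             / (psi * gcd_pdf(yk - 1.0, sigma_w, p)
                + (1.0 - psi) * gcd_pdf(yk + 1.0, sigma_w, p)) for yk in y]
        # M-step, first part of (46): closed-form update of psi
        psi = sum(q) / len(q)
    return psi
```

With p = 2 the density reduces to the ordinary Cauchy distribution, which gives a quick sanity check of the normalizing constant.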


Figure 7: Sensor network example with parameters θ = 1, τ = 0, σ_n = 1, and K = 1000. Comparison of MLUGC, MLUG, BE, and CE. (a) Channel noise contaminated p-Gaussian distributed with σ_w² = 0.5; MSE as a function of the contamination parameter, p. (b) Channel noise α-stable distributed with σ_w = 0.5; MSE as a function of the tail parameter, α.
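For reproducing experiments of this kind, a contaminated-Gaussian noise generator can be sketched as below. The exact contamination model used in the paper is not given in this excerpt; the standard ε-contaminated form is assumed, with an assumed variance-inflation factor κ (the text's contamination percentage p corresponds to `eps` here).

```python
import numpy as np

def contaminated_gaussian(n, eps=0.01, sigma2=1.0, kappa=100.0, seed=0):
    """Epsilon-contaminated Gaussian sketch: with probability `eps` a sample
    is drawn from a kappa-times-larger-variance Gaussian component.
    NOTE: kappa and the mixture form are assumptions, not taken from the paper."""
    rng = np.random.default_rng(seed)
    mask = rng.random(n) < eps                      # which samples are outliers
    std = np.where(mask, np.sqrt(kappa * sigma2), np.sqrt(sigma2))
    return rng.normal(0.0, 1.0, n) * std
```

The resulting process has variance (1 − eps)·σ² + eps·κσ², so even a few percent contamination inflates the variance dramatically, which is what defeats Gaussian-derived estimators.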

Figure 7(a) shows results for contaminated p-Gaussian noise with the variance set as σ_w² = 0.5 and varying p (the percentage of contamination) from 10^{−3} to 0.2. The results show a gain of at least an order of magnitude over the Gaussian-derived method. Results for α-stable distributed noise are shown in Figure 7(b), with scale parameter σ_w = 0.5 and the tail parameter, α, varying from 0.2 to 2 (very impulsive to Gaussian noise). It can be observed that the GCD-derived method has a gain of at least an order of magnitude for all α. Furthermore, the MLUGC method has a nearly constant MSE over the entire range. It is of note that the MSE of the MLUGC method is comparable to that obtained by the MLUG (Gaussian-derived) method for the special case α = 2 (the Gaussian case), meaning that the GCD-derived method is robust under both heavy-tailed and light-tailed environments.

5.3. Robust Reconstruction Methods for Compressed Sensing. As a third example, consider compressed sensing, a recently introduced framework that goes against the traditional data acquisition paradigm [34]. Take a set of m sensors making observations of a signal x_0 ∈ R^n. Suppose that x_0 is s-sparse in some orthogonal basis Ψ, and let {φ_i}_{i=1}^m be a set of measurement vectors that are incoherent with the sparsity basis. Each sensor takes measurements projecting x_0 onto {φ_i}_{i=1}^m and communicates its observation to the fusion center over a noisy channel. The measurement process can be modeled as y = Φx_0 + z, where Φ is an m × n matrix with the vectors φ_i as rows and z is white additive noise (with possibly impulsive behavior). The problem is how to estimate x_0 from the noisy measurements y.

A range of different algorithms and methods have been developed that enable approximate reconstruction of sparse signals from noisy compressive measurements [35–39]. Most such algorithms provide bounds for the L2 reconstruction error based on the assumption that the corrupting noise is bounded, Gaussian, or, at a minimum, has finite variance. Recent works have begun to address the reconstruction of sparse signals from measurements corrupted by outliers, for example, due to missing data in the measurement process or transmission problems [40, 41]. These works rely on the sparsity of the measurement error pattern to first estimate the error and then estimate the true signal, in an iterative process. A drawback of this approach is that the reconstruction relies on the error sparsity to first estimate the error; if the sparsity condition is not met, the performance of the algorithm degrades.

Using the arguments above, we propose to use a robust metric derived in Section 4 to penalize the residual and address the impulsive sampling noise problem. Utilizing the strong theoretical guarantees of basis pursuit (BP) L1 minimization for sparse recovery of underdetermined systems of equations (see [34]), we propose the following nonlinear optimization problem to estimate x_0 from y:

  min_{x ∈ R^n} ||x||_1 subject to ||y − Φx||_{LL2,γ} ≤ ε.  (48)

The following result presents an upper bound for the reconstruction error of the proposed estimator and is based on restricted isometry properties (RIPs) of the matrix Φ (see [34, 42] and references therein for more details on RIPs).

Theorem 1 (see [42]). Assume the matrix Φ meets an RIP; then for any s-sparse signal x_0 and observation noise z with ||z||_{LL2,γ} ≤ ε, the solution to (48), denoted x*, obeys

  ||x* − x_0||_2 ≤ C_s · 2γ · √( m(e^ε − 1) ),  (49)

where C_s is a small constant.

Notably, γ controls the robustness of the employed norm and ε the radius of the LL2-ball feasibility set. Let Z be a

Cauchy random variable with scale parameter σ and location parameter zero. Assuming a Cauchy model for the noise vector yields E||z||_{LL2,γ} = m E{log(1 + γ^{−2}Z²)} = 2m log(1 + γ^{−1}σ). We use this value for ε and set γ as MAD(y).

Debiasing is achieved through robust regression on a subset of the indexes of x using the Lorentzian norm. The subset is set as I = {i : |x̃_i| > α}, α = λ max_i |x̃_i|, where 0 < λ < 1 and x̃ denotes the solution of (48). Thus x̂_I ∈ R^d is defined as

  x̂_I = arg min_{x ∈ R^d} ||y − Φ_I x||_{LL2,σ},  (50)

where d = |I|. The final reconstruction after the regression, x̂, is defined as x̂_I for indexes in the subset I and zero outside I. The reconstruction algorithm composed of solving (48) followed by the debiasing step is referred to as Lorentzian basis pursuit (BP) [42].

Experiments evaluating the robustness of Lorentzian BP under different impulsive sampling noises are presented, comparing performance with the traditional CS reconstruction algorithms orthogonal matching pursuit (OMP) [38] and basis pursuit denoising (BPD) [34]. The signals are synthetic sparse signals.

5.4. Robust Fuzzy Clustering. The noise clustering (NC) approach was proposed to robustify the clustering of noisy data in [43, 44]. The NC approach is successful in improving the robustness of a variety of prototype-based clustering methods. This method considers the noise as a separate class and represents it by a prototype that has a constant distance δ.

Let X = {x_j}_{j=1}^N, x_j ∈ R^n, be a finite data set and C the given number of clusters. NC partitions the data set by minimizing the following function proposed in [43]:

  J(Z) = Σ_{i=1}^C Σ_{j=1}^N u_{ij}^m d(x_j, z_i) + δ Σ_{j=1}^N ( 1 − Σ_{i=1}^C u_{ij} )^m,  (51)

where Z = [z_1; ...; z_C] is a matrix whose rows are the cluster centers, m ∈ (1, ∞) is a weighting exponent, and d(x_j, z_i) is the squared L2 distance from a data point x_j to the center z_i. U = [u_{ij}] is a C × N matrix, called a constrained fuzzy partition of X, which satisfies [43]

  u_{ij} ∈ [0, 1] for all i, j,   0 < Σ_{j=1}^N u_{ij} < N for all i.  (52)


Figure 8: Comparison of Lorentzian BP with BPD and OMP for impulsive contaminated samples. (a) Contaminated p-Gaussian noise, σ² = 0.01; R-SNR as a function of the contamination parameter, p. (b) α-stable noise, σ = 0.1; R-SNR as a function of the tail parameter, α.
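The advantage of the Lorentzian-norm constraint in (48) over an L2 constraint can be seen with a small numeric sketch: a single gross outlier dominates the L2 norm of the residual but contributes only logarithmically to the Lorentzian norm, so the feasibility set is not blown up by impulsive measurement noise. The sketch below is illustrative only.

```python
import math

def lorentzian_norm(u, gamma):
    """||u||_{LL2,gamma} = sum_i log(1 + u_i^2 / gamma^2), the p = 2 case of (35)."""
    return sum(math.log(1.0 + (ui / gamma) ** 2) for ui in u)

def l2_norm(u):
    return math.sqrt(sum(ui * ui for ui in u))

# residual with small errors vs. the same residual with one gross outlier
clean = [0.1] * 100
dirty = clean[:]
dirty[0] = 1000.0   # impulsive corruption of a single measurement
```

Here `l2_norm(dirty)` is about a thousand times `l2_norm(clean)`, while the Lorentzian norm grows only by roughly log(1 + (1000/γ)²).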

Require: cluster number C, weighting parameter m, δ, maximum number of iterations, termination parameter ε.
(1) Initialize cluster centers.
(2) while ||z_s(t + 1) − z_s(t)|| > ε and the maximum number of iterations has not been reached do
(3)   Compute the fuzzy partition U using (53).
(4)   Update the cluster centers using (55).
(5) end while
(6) return cluster centroids Z = [z_1; ...; z_C].
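A minimal sketch of an LLp-based noise-clustering iteration of this kind is given below. It is illustrative only: equations (53) and (55) fall outside this excerpt, so standard NC-style memberships (with an extra noise class at constant distance δ) and a simple m-weighted-mean center update are used in their place, and a farthest-point initialization stands in for the k-means initialization described in the text.

```python
import numpy as np

def llp_dist(X, z, p=2.0, sigma=1.0):
    """LLp distance of each row of X to a center z: sum_l log(1 + |x_l - z_l|^p / sigma^p)."""
    return np.sum(np.log(1.0 + np.abs(X - z) ** p / sigma ** p), axis=-1)

def llp_nc(X, C, m=2.0, delta=5.0, p=2.0, sigma=1.0, iters=30):
    # farthest-point initialization (stand-in for the k-means init in the text)
    centers = [X[0].astype(float)]
    for _ in range(1, C):
        d = np.min(np.stack([llp_dist(X, z, p, sigma) for z in centers]), axis=0)
        centers.append(X[int(np.argmax(d))].astype(float))
    Z = np.stack(centers)
    for _ in range(iters):
        # distances to each center, plus the constant-distance noise class
        D = np.stack([llp_dist(X, z, p, sigma) for z in Z] + [np.full(len(X), delta)])
        W = (1.0 / np.maximum(D, 1e-12)) ** (1.0 / (m - 1.0))
        U = W / W.sum(axis=0)              # fuzzy memberships, incl. noise class
        for i in range(C):                 # m-weighted-mean center update
            w = U[i] ** m
            Z[i] = (w[:, None] * X).sum(axis=0) / w.sum()
    return Z
```

Points far from every prototype receive most of their membership in the noise class, so they barely influence the center updates; this is the soft outlier rejection the NC formulation is designed for.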

Algorithm 2: LLp-based noise clustering algorithm.

  w_j(t) = [ u_{sj}^m p |x_{jl} − z_{sl}(t)|^{p−2} ] / [ σ^p + |x_{jl} − z_{sl}(t)|^p ],  (56)

where t denotes the iteration number. The recursion is terminated when ||z_s(t + 1) − z_s(t)||_2 < ε for some given ε > 0. This method is used to find the update of the cluster centers. Alternation of (53) and (55) gives an algorithm that finds cluster centers converging to a local minimum of the cost function.

In the NC approach, m = 1 corresponds to crisp memberships, and increasing m represents increased fuzziness and soft rejection of outliers. When m is too large, spurious clusters may exist. The choice of the constant distance δ also influences the fuzzy memberships: if it is too small, then we cannot distinguish good clusters from outliers, and if it is too large, the result diverges from the basic FCM. Based on [43], we set δ = (λ/N²) Σ_{i ≠ j} ||x_i − x_j||_{LLp,σ}, where λ is a scale parameter. In order to reduce the local minima caused by initialization of the NC approach, we use classical k-means on a small subset of the data to initialize the set of cluster centers. The proposed algorithm is summarized in Algorithm 2 and is coined the LLp-based Noise Clustering (LLp-NC) algorithm.

Experimental results show that for multigroup heavy-tailed processes, the LLp-based method generally converges to the global minimum. However, to address the problem of local minima, the clustering algorithm is performed multiple times with different random initializations (subsets randomly sampled) and with a fixed small number of iterations. The best result is selected as the final solution.

Simulations to validate the performance of the GCD-based clustering algorithm (LLp-NC) in heavy-tailed environments are carried out and summarized in Table 2. The experiment uses three synthetic data sets of 400 points each, with different distributions and 100 points in each cluster. The locations of the centers for the three sets are [−6, 2], [−2, −2], [2, 4], and [3, 0]. The first set has Cauchy distributed clusters (GCD, p = 2) with σ = 1 and is shown in Figure 9. The second has the meridian distribution (GCD, p = 1), with σ = 1. The meridian is a very impulsive distribution. The third set has a two-dimensional α-stable distribution with α = 0.9

and γ = 1, which is also a very impulsive case. The algorithm was run 200 times for each set with different initializations, setting the maximum number of iterations to 50, ε = 0.0001, and λ = 0.1.

To evaluate the results, we calculate the MSE, the mean absolute deviation (MAD), and the LLp distance between the solutions and the true cluster centers, averaging the results over 200 trials. The LLp-NC approach is compared with classical NC employing the L1 distance and with the similarity-based method in [45]. The average L2 distance between all points in the set (AD) is shown as a reference for each sample set. As the results show, GCD-based clustering outperforms both traditional NC and similarity-based methods in heavy-tailed environments. Of note is the meridian case, a very impulsive distribution, for which the GCD clustering results are significantly more accurate than those obtained by the other approaches.

Figure 9: Data set for clustering example 1. Cauchy distributed samples with cluster centers [−6, 2], [−2, −2], [2, 4], and [3, 0].

Table 2: Clustering results for GCD processes and α-stable processes.

Noise      Method            MSE      MAD      LLp      AD
Cauchy     LLp-NC            0.34987  0.62897  0.0968   15.39
           L1-NC             1.8186   1.8361   0.1262
           Similarity-based  1.6513   1.136    0.18236
Meridian   LLp-NC            0.85197  0.9283   0.1521   50.363
           L1-NC             5.887    2.7311   0.5573
           Similarity-based  5.2309   2.4627   1.8416
α-stable   LLp-NC            0.50408  0.73618  0.1896   44.435
           L1-NC             3.2105   2.7684   0.2174
           Similarity-based  1.7578   1.6322   1.0112

6. Concluding Remarks

This paper presents a GCD-based theoretical approach that allows the formulation of challenging problems in a robust fashion. Within this framework, we establish a statistical relationship between the GGD and GCD families. The proposed framework, due to its flexibility, subsumes GGD-based developments, thereby guaranteeing performance improvements over traditional problem formulation techniques. Properties of the derived techniques are analyzed. Four particular applications are developed under this framework: (1) robust filtering for power line communications, (2) robust estimation in sensor networks with noisy channels, (3) robust reconstruction methods for compressed sensing, and (4) robust fuzzy clustering. Results from the applications show that the proposed GCD-derived methods provide a robust framework in impulsive heavy-tailed environments, with performance comparable to existing methods in less demanding light-tailed environments.

Appendices

A. Proof of Lemma 1

Let X be the RV formed as the ratio of two RVs, U and V: X = U/V. In the case where U and V are independent, the pdf of the RV X, f_X(·), is given by [46]

  f_X(x) = ∫_{−∞}^{∞} |v| f_U(xv) f_V(v) dv,  (A.1)

where f_U(·) and f_V(·) denote the pdfs of U and V, respectively. Replacing the GGD in (A.1) and manipulating the obtained expression yield

  f_X(x) = C(α_U, β) C(α_V, β) ∫_{−∞}^{∞} |v| exp( −(|xv|/α_U)^β ) exp( −(|v|/α_V)^β ) dv,  (A.2)

where C(α, β) ≜ β/(2αΓ(1/β)). Noting that |ab| = |a||b| and dividing the integral give

  f_X(x) = C(α_U, β) C(α_V, β) [ ∫_{v>0} v exp( −v^β K(α_U, α_V, β, x) ) dv − ∫_{v≤0} v exp( −(−v)^β K(α_U, α_V, β, x) ) dv ],  (A.3)

where

  K(α_U, α_V, β, x) ≜ |x|^β/α_U^β + 1/α_V^β.  (A.4)

Consider first

  I_1(v) ≜ ∫_{v>0} v exp( −v^β K(α_U, α_V, β, x) ) dv.  (A.5)

Letting z = v^β K(α_U, α_V, β, x), after some manipulations, yields

  I_1(v) = [ 1 / ( β K^{2/β}(α_U, α_V, β, x) ) ] ∫_{z>0} z^{(2/β)−1} exp(−z) dz.  (A.6)

Noting that

  ∫_{z>0} z^{(2/β)−1} exp(−z) dz = Γ(2/β)  (A.7)

gives

  I_1(v) = Γ(2/β) / ( β K^{2/β}(α_U, α_V, β, x) ).  (A.8)

Consider next

  I_2(v) ≜ ∫_{v≤0} v exp( −(−v)^β K(α_U, α_V, β, x) ) dv.  (A.9)

The change of variables v → −v shows that I_2(v) = −I_1(v). Substituting I_1 and I_2 into (A.3) and simplifying yield the GCD pdf with tail constant p = β and scale parameter σ = α_U/α_V, which completes the proof.

B. Proof of the Local Extrema Property

(1) For θ < x[1], sgn(x(i) − θ) = 1 for all i and Q'(θ) < 0, so the function is strictly decreasing in that interval. For θ > x[N], sgn(x(i) − θ) = −1 for all i and Q'(θ) > 0, showing that the function is strictly increasing in that interval.

(2) From (1) we see that Q'(θ) ≠ 0 if θ ∉ [x[1], x[N]]; hence all local extrema of Q(θ) lie in the interval [x[1], x[N]].

(3) Let x[k] < θ < x[k+1], k ∈ {1, 2, ..., N − 1}, and denote this interval by I_k. Then

  Q(θ) = Σ_{i=1}^k log( σ^p + (θ − x(i))^p ) + Σ_{i=k+1}^N log( σ^p + (x(i) − θ)^p ).  (B.2)

The second derivative with respect to θ is

  Q''(θ) = Σ_{i=1}^k [ p(p − 1)(θ − x(i))^{p−2} σ^p − p(θ − x(i))^{2p−2} ] / ( σ^p + (θ − x(i))^p )²
         + Σ_{i=k+1}^N [ p(p − 1)(x(i) − θ)^{p−2} σ^p − p(x(i) − θ)^{2p−2} ] / ( σ^p + (x(i) − θ)^p )².  (B.3)

From (B.3) it can be seen that if 0 < p ≤ 1, every term is negative and the function is concave in I_k. Since the function is not differentiable at the input samples {x(i)}_{i=1}^N (critical points), the only possible local minima of the objective function are the input samples.

(4) Consider the ith term in Q(θ) and define

  q_i(θ) = log( σ^p + |x(i) − θ|^p ).  (B.4)

Clearly, each q_i(θ) has a unique minimum at θ = x(i). Also, it can be easily shown that q_i(θ) is convex in the interval [x(i) − a, x(i) + a], where a = σ(p − 1)^{1/p}, and concave outside this interval (for 1 < p ≤ 2). Suppose first that |x[K] − x[1]| < a for a sample set of size K; then Q(θ) is convex in [x[1], x[K]], since it is the sum of convex functions in that interval, and it has only one global minimum. Now suppose that |x[K] − x[1]| ≥ a, and also suppose that there are at most 2K − 1 local extrema points. Let x(K + 1) be a new sample in the data set, and without loss of generality assume that x(K + 1) > x[K]. The new sample adds at most two local extrema points (one local maximum and one local minimum) in the interval (x[K], x(K + 1)]: the local maximum is an inflection point in (x[K], x(K + 1)), and the local minimum lies in the neighborhood of x(K + 1). Therefore, the total number of extrema points for N = K + 1 is at most 2K − 1 + 2 = 2(K + 1) − 1, which is the claim of the statement. This concludes the proof.

C. Proof of Property 1

Using the properties of the arg min function, the M-GC estimator can be expressed as

  θ = arg min_θ Σ_{i=1}^N log( 1 + |x(i) − θ|^p / σ^p ).  (C.1)

Let δ = σ^p. Since multiplying by a constant does not affect the result of the arg min operator, we can rewrite (C.1) as

  θ = arg min_θ Σ_{i=1}^N δ log( 1 + |x(i) − θ|^p / δ ).  (C.2)

Using the fact that a log b = log b^a and taking the limit as δ → ∞ yield

  lim_{δ→∞} θ = lim_{δ→∞} arg min_θ Σ_{i=1}^N δ log( 1 + |x(i) − θ|^p / δ ) = arg min_θ Σ_{i=1}^N |x(i) − θ|^p,  (C.3)

where the last step follows since

  lim_{δ→∞} δ log( 1 + u/δ ) = u.  (C.4)

D. Proof of Property 2

The M-GC estimator can be expressed as

  θ = arg min_θ Σ_{i=1}^N log( 1 + |x(i) − θ|^p / σ^p ) = arg min_θ log{ Π_{i=1}^N ( 1 + |x(i) − θ|^p / σ^p ) }.  (D.1)

Define

  H_σ(θ; x) = Π_{i=1}^N ( 1 + |x(i) − θ|^p / σ^p ).  (D.2)

Since the log function is monotone nondecreasing, the M-GC estimator can be reformulated as

  θ = arg min_θ H_σ(θ; x).  (D.3)

It can be checked that when σ is very small,

  H_σ(θ; x) = O( (1/σ^p)^{N − r(θ)} ),  (D.4)

where r(θ) is the number of times the value θ is repeated in the sample set and O denotes the asymptotic order as σ → 0. In the limit, the exponent N − r(θ) must be minimized for H_σ(θ; x) to be minimum. Therefore, θ will be one of the most repeated values in the input set. Define r = max_j r(x(j)) and let M be the set of most repeated values; then, for x(j) ∈ M, expanding the product in (D.2) gives

  H_σ( x(j); x ) = Π_{i, x(i) ≠ x(j)} ( |x(i) − x(j)|^p / σ^p ) + O( (1/σ^p)^{N − r − 1} ).  (D.5)

Since the first term in (D.5) is O((1/σ^p)^{N − r}), the second term is negligible for small σ. Then, in the limit, θ can be computed as

  lim_{σ→0} θ = arg min_{x(j) ∈ M} H_σ( x(j); x ) = arg min_{x(j) ∈ M} Π_{i, x(i) ≠ x(j)} ( |x(i) − x(j)|^p / σ^p ) = arg min_{x(j) ∈ M} Π_{i, x(i) ≠ x(j)} |x(i) − x(j)|.  (D.6)

Acknowledgment

This paper was supported in part by NSF under Grant no. 0728904.

References

[1] E. E. Kuruoglu, "Signal processing with heavy-tailed distributions," Signal Processing, vol. 82, no. 12, pp. 1805–1806, 2002.
[2] K. E. Barner and G. R. Arce, Nonlinear Signal and Image Processing: Theory, Methods and Applications, CRC Press, Boca Raton, Fla, USA, 2003.
[3] G. R. Arce, Nonlinear Signal Processing: A Statistical Approach, John Wiley & Sons, New York, NY, USA, 2005.
[4] P. Huber, Robust Statistics, John Wiley & Sons, New York, NY, USA, 1981.
[5] S. A. Kassam and H. V. Poor, "Robust techniques for signal processing: a survey," Proceedings of the IEEE, vol. 73, no. 3, pp. 433–481, 1985.
[6] J. Astola and Y. Neuvo, "Optimal median type filters for exponential noise distributions," Signal Processing, vol. 17, no. 2, pp. 95–104, 1989.
[7] L. Yin, R. Yang, M. Gabbouj, and Y. Neuvo, "Weighted median filters: a tutorial," IEEE Transactions on Circuits and Systems II, vol. 43, no. 3, pp. 157–192, 1996.
[8] G. R. Arce, "A general weighted median filter structure admitting negative weights," IEEE Transactions on Signal Processing, vol. 46, no. 12, pp. 3195–3205, 1998.
[9] M. Shao and C. L. Nikias, "Signal processing with fractional lower order moments: stable processes and their applications," Proceedings of the IEEE, vol. 81, no. 7, pp. 986–1010, 1993.
[10] K. E. Barner and T. C. Aysal, "Polynomial weighted median filtering," IEEE Transactions on Signal Processing, vol. 54, no. 2, pp. 636–650, 2006.
[11] T. C. Aysal and K. E. Barner, "Hybrid polynomial filters for Gaussian and non-Gaussian noise environments," IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4644–4661, 2006.
[12] J. G. Gonzalez, Robust techniques for wireless communications in non-Gaussian environments, Ph.D. dissertation, ECE Department, University of Delaware, 1997.
[13] T. C. Aysal and K. E. Barner, "Meridian filtering for robust signal processing," IEEE Transactions on Signal Processing, vol. 55, no. 8, pp. 3949–3962, 2007.
[14] V. Zolotarev, One-Dimensional Stable Distributions, American Mathematical Society, Providence, RI, USA, 1986.
[15] J. P. Nolan, Stable Distributions: Models for Heavy Tailed Data, Birkhäuser, Boston, Mass, USA, 2005.
[16] R. F. Brcich, D. R. Iskander, and A. M. Zoubir, "The stability test for symmetric alpha-stable distributions," IEEE Transactions on Signal Processing, vol. 53, no. 3, pp. 977–986, 2005.
[17] J. G. Gonzalez and G. R. Arce, "Optimality of the myriad filter in practical impulsive-noise environments," IEEE Transactions on Signal Processing, vol. 49, no. 2, pp. 438–441, 2001.
[18] J. G. Gonzalez and G. R. Arce, "Statistically-efficient filtering in impulsive environments: weighted myriad filters," EURASIP Journal on Applied Signal Processing, vol. 2002, no. 1, pp. 4–20, 2002.
[19] T. C. Aysal and K. E. Barner, "Myriad-type polynomial filtering," IEEE Transactions on Signal Processing, vol. 55, no. 2, pp. 747–753, 2007.
[20] P. R. Rider, "Generalized Cauchy distributions," Annals of the Institute of Statistical Mathematics, vol. 9, no. 1, pp. 215–223, 1957.
[21] J. Miller and J. Thomas, "Detectors for discrete-time signals in non-Gaussian noise," IEEE Transactions on Information Theory, vol. 18, no. 2, pp. 241–250, 1972.
[22] T. C. Aysal, Filtering and estimation theory: first-order, polynomial and decentralized signal processing, Ph.D. dissertation, ECE Department, University of Delaware, 2007.
[23] D. Middleton, "Statistical-physical models of electromagnetic interference," IEEE Transactions on Electromagnetic Compatibility, vol. 19, no. 3, pp. 106–127, 1977.
[24] H. M. Hall, "A new model for impulsive phenomena: application to atmospheric-noise communication channels," Tech. Rep. 3412 and 7050-7, Stanford Electronics Laboratories, Stanford University, Stanford, Calif, USA, 1966.
[25] F. Hampel, E. Ronchetti, P. Rousseeuw, and W. Stahel, Robust Statistics: The Approach Based on Influence Functions, Wiley, New York, NY, USA, 1986.
[26] R. E. Carrillo, T. C. Aysal, and K. E. Barner, "Generalized Cauchy distribution based robust estimation," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '08), pp. 3389–3392, April 2008.
[27] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, Series B, vol. 48, no. 3, pp. 259–302, 1986.
[28] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, New York, NY, USA, 1997.
[29] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Cambridge Mathematical Library, Cambridge University Press, Cambridge, UK, 1988.
[30] M. Zimmermann and K. Dostert, "Analysis and modeling of impulsive noise in broad-band powerline communications," IEEE Transactions on Electromagnetic Compatibility, vol. 44, no. 1, pp. 249–258, 2002.
[31] Y. H. Ma, P. L. So, and E. Gunawan, "Performance analysis of OFDM systems for broadband power line communications under impulsive noise and multipath effects," IEEE Transactions on Power Delivery, vol. 20, no. 2, pp. 674–682, 2005.
[32] T. C. Aysal and K. E. Barner, "Constrained decentralized estimation over noisy channels for sensor networks," IEEE Transactions on Signal Processing, vol. 56, no. 4, pp. 1398–1410, 2008.
[33] T. C. Aysal and K. E. Barner, "Blind decentralized estimation for bandwidth constrained wireless sensor networks," IEEE Transactions on Wireless Communications, vol. 7, no. 5, Article ID 4524301, pp. 1466–1471, 2008.
[34] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling: a sensing/sampling paradigm that goes against the common knowledge in data acquisition," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, 2008.
[35] D. L. Donoho, M. Elad, and V. N. Temlyakov, "Stable recovery of sparse overcomplete representations in the presence of noise," IEEE Transactions on Information Theory, vol. 52, no. 1, pp. 6–18, 2006.
[36] J. Haupt and R. Nowak, "Signal reconstruction from noisy random projections," IEEE Transactions on Information Theory, vol. 52, no. 9, pp. 4036–4048, 2006.
[37] E. J. Candès, J. K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Communications on Pure and Applied Mathematics, vol. 59, no. 8, pp. 1207–1223, 2006.
[38] J. A. Tropp and A. C. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Transactions on Information Theory, vol. 53, no. 12, pp. 4655–4666, 2007.
[39] D. Needell and J. A. Tropp, "CoSaMP: iterative signal recovery from incomplete and inaccurate samples," Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 301–321, 2009.
[40] E. J. Candès and P. A. Randall, "Highly robust error correction by convex programming," IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 2829–2840, 2008.
[41] B. Popilka, S. Setzer, and G. Steidl, "Signal recovery from incomplete measurements in the presence of outliers," Inverse Problems and Imaging, vol. 1, no. 4, pp. 661–672, 2007.
[42] R. E. Carrillo, K. E. Barner, and T. C. Aysal, "Robust sampling and reconstruction methods for sparse signals in the presence of impulsive noise," IEEE Journal on Selected Topics in Signal Processing, vol. 4, no. 2, pp. 392–408, 2010.
[43] R. N. Dave, "Characterization and detection of noise in clustering," Pattern Recognition Letters, vol. 12, no. 11, pp. 657–664, 1991.
[44] R. N. Dave and R. Krishnapuram, "Robust clustering methods: a unified view," IEEE Transactions on Fuzzy Systems, vol. 5, no. 2, pp. 270–293, 1997.
[45] M.-S. Yang and K.-L. Wu, "A similarity-based robust clustering method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 4, pp. 434–448, 2004.
[46] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, NY, USA, 1984.