<<

Robust Nonlinear Transform based on -

David L. Donoho & Thomas P.Y. Yu Department of Stanford University Stanford, CA 945305, USA [email protected] [email protected]

Abstract distribution; and so some form of homogeneous treatment (i.e. thresholding all coefficients in a standard way) is plau-

It is well known that wavelet transforms can be de- sible.

n

(z )

rived from stationary linear refinement subdivision schemes Now consider (1) in cases when the white i

i=1 [2, 3]. In this paper we discuss a special nonlinear re- is highly non-Gaussian. An example of such setting arises finement subdivision scheme Ð median-interpolation. It is a in target tracking in [9, 13], in which one encounters

nonlinear cousin of Deslauriers-Dubuc interpolation and of contaminated by impulsive glint noise. As a model,

n

(z )

average-interpolation. The refinement scheme is based on we consider i to be i.i.d. Cauchy distributed. The

i=1

R

`

` = f (x)dx

constructing polynomials which interpolate median func- Cauchy distribution has no moments x for

; 2;::: tionals of the underlying object. The refinement scheme can 1 , in particular neither nor . It therefore

be deployed in multiresolution fashion to construct nonlin- offers an extreme example of non-Gaussian data.

n

z ) ear pyramid schemes and associated forward and inverse (

Under this model, typical noise realizations i con-

i=1

transforms. In this paper we discuss the basic properties tain a few astonishingly large observations: the largest

(n)

of this transform and its possible use in wavelet de-noising observation is of size O . (In comparison, for Gaus-

p

( log (n)) schemes for badly non-Gaussian data. Analytic and com- sian noise, the largest observation is of size O .) putational results are presented to show that the nonlinear Moreover, the of i.i.d. Cauchy noise does pyramid has very different performance compared to tradi- not result in independent, nor identically distributed wavelet tional when coping with non-Gaussian data. coefficients. In fact, coefficients at larger scales are more likely to be affected by the perturbing influence of the few large noise values, and so have a systematically larger dis- persion at large scales. Invariance of distribution under or-

1. Introduction p

( log(n)) thogonal transforms and O behavior of maxima A number of recent theoretical studies [6, 5] have found are fundamental to the results of [5, 6]. This extremely non- that the orthogonal wavelet transform offers a promising ap- Gaussian situation lacks key quantitative properties which proach to noise removal. They assume that one has noisy were used in de-noising in the Gaussian case. In this paper we study a wavelet-like approach to deal samples of an underlying function f

with such extremely non-Gaussian data which is based

= f (t )+z ;i=1;:::;n;

y on abandoning linear orthogonal transforms and using in-

i i i (1)

stead a kind of nonlinear wavelet transform called Median-

n

(z )

where i is a Gaussian . In this setting, Interpolating Pyramid Transform (MIPT.) The nonlinear

i=1 they show that one removes a noise successfully by apply- transforms we propose combine ideas from the theory of ro- ing a wavelet transform, then applying thresholding in the bust estimation [10, 8] and ideas from wavelets and the the- wavelet domain, then inverting the transform. ory of interpolating subdivisions [1, 7]. The forward trans- A key principle underlying this approach is the fact that form (FMIPT) is based on deploying the median in a mul- an orthogonal transform takes Gaussian white noise into tiresolution pyramid; this gives an analysis method which is Gaussian white noise. It follows from this, and the Gaussian highly robust against extreme observations. The transform

assumption on the noise z , that in the transform domain, all itself is based on computation of “block ” over tri- wavelet coefficients suffer noise with the same probability adic cells, and the recording of block medians in a “r«esum«e- detail” array. The inverse transform (IMIPT) is based on Unlike Lagrange interpolation or average-interpolation, deploying median-interpolation in a multiresolution fash- median-interpolation is a nonlinear procedure and we are ion. At its center is the notion of “median-interpolating re- not aware of any algorithms available in the literature. In finement scheme”, a two-scale refinement scheme based on the following we briefly describe our attempt in solving this

imputing behavior at finer scales from coarse-scale infor- problem. =0

mation. D . It is the simplest by far; in that case one is fitting

 (t) = C onst A = 0

Our method of building transforms from refinement a constant function j;k . Hence , and

 (t)=m : j;k

schemes is similar to the way interpolating wavelet (2) becomes j;k The imputation step (3) then

m = m l =0; 1; 2:

+1;3k +l j;k transforms are built from Deslauriers-Dubuc interpolating yields j for Hence refinement

schemes in [2] and in the way biorthogonal wavelet trans- proceeds by imputing a constant behavior at finer scales.

D > 0 I = [i 1;i] x = i 1=2 i = i

forms are built from average-interpolating refinement in [3]. . Let i , ,

D +1 D +1

;:::;(D +1) F : R ! R

However, despite structural similarities there are important 1 . Define the map by

(F (v )) = med( jI )  i

differences: i where is the polynomial specified

D +1

2 R by the vector v and the Lagrange interpolation

 both the forward and inverse transforms can be nonlin-

ear; basis,

D +1 D +1

X Y



x x )

the transforms are based on a triadic pyramid and a 3- (

j

 (x)= v l (x) l (x)= :

i i

to-1 decimation scheme; i where (4)

(x x )

i j

i=1

j =1;j 6=i

n 

 the transforms are expansive (they map data into

F

=2n 3 coefficients). We have the following bisection algorithm to apply to a

given vector v .

It would be more accurate to call the approach a nonlinear

= v tol Algorithm m BlockMedian( , ):

pyramid transform.

a+b

+

) = max( jI ) m = min( jI ) m =  (

1. m , , ,

2 v

2. Median-Interpolating Refinement where  is formed from as in (4).

(x) m

In this section, we describe the mechanism of median- 2. Calculate all the roots of  , collect the ones

a; b)

interpolating refinement, which is the major building block that are (i) real, (ii) inside ( and (iii) with odd

fr ;:::;r g k

of MIPT. multiplicities into the ordered array 1 (i.e.

r

i+1 0 k +1

i ). Let and .

I med (f jI )

Given a function f on an , let de-

P

f I

note a median of for the interval , for definiteness we +

 (a)  m M = r r M =

i1

3. If , i and

i odd

P P

follow John Tukey’s suggestion and use the lowest median,

M = r r r r

i i1 i1

i ; else and

i odd i even

P

med (f jI ) = inf f : m(t 2 I : f (t)  ) 

defined by +

M = r r

i1

i .

i even

(t 2 I : f (t)  )g:

m Now suppose we are given a tri-

j

3 1

+ +

jM M j < tol M >M +

fm g

adic array j;k of numbers representing the medi- 4. If , terminate, else if

k =0

+ +

j j

tol m = m; m =(m + m )=2 m =

f I =[k 3 ; (k +1)3 )

ans of on the triadic intervals j;k : , , goto 2 ; else

j

m; m =(m + m )=2

m = med(f jI ) 0  k  3 : j;k j;k The goal of median- , goto 2.

interpolating refinement is to use the data at scale j to infer

To search for a degree D polynomial with prescribed +1

behavior at the finer scale j , obtaining imputed medi-

fm g fI g i

block median i at , we need to solve the nonlin-

f I

+1;k ans of on intervals j . Obviously we are missing the information to impute perfectly; nevertheless we can try to ear system

do a reasonable job.

(v )=m: F (5) We will employ polynomial-imputation. Starting from a

fixed even integer D , it involves two steps. There are general nonlinear solvers that can be used to 5 I solve ( ). We recommend below a more tailored method

[M1] (Interpolation) For each interval j;k , find a polyno-

D = 2A  which is based on fixed point iteration. mial j;k of degree satisfying the median- interpolation condition: The main empirical observation is that median-

interpolation can be well approximated by Lagrange inter-

med( jI )=m A  l  A:

j;k +l j;k +l

j;k for (2)

 j[a; b])   ((a +

polation. We first observe that med(

)=2)

[M2] (Imputation) Obtain (pseudo-) medians at the finer b . The above approximation is exactly, for example,

[a; b] [a; b] scale by setting when  is monotonic on , which is the case when

is away from the vicinity of the extremum points of  . The

m~ = med( jI ) l =0; 1; 2:

+1;3k +l j;k j;3k +l j for (3) linear approximation can be combined with the idea of fixed =2 point iteration to give the following iterative algorithm for In case D , we have no closed-form expression for

median-interpolation. the result. The operator R is nonlinear, and proving the ex-

~

f v = (m; tol ) h ! 1

+h Algorithm MedianInterp : istence of a limit j as requires more than a

direct application of theory in linear refinement schemes. m

1. Input. - vector of prescribed median at intervals ~

f f ()g j

In [14], it was shown that j defined above converges

I =[i 1;i] i =1;:::;(D +1) i , . always to a uniformly continuous limit. Moreover, the limit

0 is H¬older continuous and a lower bound of the H¬older ex-

= m 2. Initialization. Set v .

ponent was derived.



k +1 k k

= v + m F (v )

3. Iteration. Iterate v until

k

jjm F (v )jj < tol 1 . 3. Median-Interpolating Pyramid Transform

4. Output. v - solution of (5). We now apply the refinement scheme to construct a suggest that the above algorithm has an ex- nonlinear pyramid and associated nonlinear multiresolution ponentially fast convergence rate, which is an expectable analysis. phenomenon due to the contraction mapping theorem. While it is equally possible to construct pyramids for

However it requires more work to obtain a complete proof

f (t) y

decomposition of functions or of sequence data i ,we of convergence.

keep an eye on applications and concentrate attention on

= 2 D : A surprise. Although algorithm

the sequence case . So we assume we are given a discrete

D = 2; 4; 6;:::

MedianInterp can be applied to any , J

y ;i = 0;:::;n 1 n = 3

dataset i where , is a triadic

it suffers from two pitfalls. First, it is an iterative proce-

; 2187; 6561;::: number, such as 729 . We aim to use the dure and is harder to implement in practical setting (e.g. in nonlinear refinement scheme to decompose and reconstruct DSP hardware.) Second, more analysis on the convergence such sequences.

of the algorithm is needed to be carried out. These prob- =2 lems go away completely in the special case of D , be- Algorithm FMIPT: Pyramid Decomposition. cause closed-form solution of median-interpolating refine-

ment can be carried out in the quadratic case. The reason

D 2 0; 2; 4;::: j  0

1. Initialization. Fix and 0 . Set

of our success in obtaining such closed-form expressions is

= J j . due to the fact that quadratic polynomials all have a simple

shape, namely, each of them has a unique extremum point

m =

2. Formation of block medians. Calculate j;k

and is symmetric and monotonic on each side of the ex-

med(y : i=n 2 I ): j;k

tremum. This geometric property happens to make manip- i

m = ulation with medians much easier, and hence a closed-form ~

3. Formation of Refinements. Calculate j;k

((m ))

solution can be derived. For the formulae and details and R

1;k j using refinement operators of the previ- proofs, see [14]. ous section.

For the rest of the paper, we will mainly focus on the

=

D =2

case, although most of the results can be extended 4. Formation of Detail corrections. Calculate j;k

m m~ : j;k

to higher degrees. j;k

j = j +1 m =med(y : i=n 2

j ;k i 5. Iteration.If 0 , set

2.1 Multiscale Refinement 0

j = j 1 I ) ;k

j and terminate the algorithm, else set 0

The two-scale refinement scheme described in the last and goto 2.

fm~ g = ;k section applied to an initial median sequence j

0 Algorithm IMIPT: Pyramid Reconstruction.

fm g ;k

j implicitly defines a (generally nonlinear) refine-

0

R = RR(fm~ g)=fm~ g j 

j;k j +1;k

ment operator MI

j = j +1 D 2 0; 2; 4;:::

1. Initialization. Set 0 . Fix and

(~m ) j :

j;k k

0 We can associate resulting sequences to

j  0

0 , as in the decomposition algorithm.

~

f () =

piecewise constant functions on the line via j

P

1

j  j () m~ 1

0 I

j;k for . We can recursively ap-

j;k

k =1

(m ) =

2. Reconstruction by Refinement. j;k

ply this operator, and associate the sequence of piecewise

R ((m )) + ( )

1;k j;k k

constant functions defined on successively finer and finer j

= J j = j +1

meshes. 3. Iteration.Ifj goto 4, else set and goto

~

D = 0 h  0; f = f

+h j

In case ,wehave j for all 0 0 2.

so the result is just a piecewise constant object taking value

m I y = m ; i =0;:::;n 1

;k j ;k i J;i

j on . 4. Termination. Set

0 0

y = z i = i

An implementation is described in [14]. Important de- P5. Cauchy Noise. Suppose that i ,

0;:::;n 1 z tails for the implementation concern the treatment of bound- , and that i is i.i.d Cauchy white noise. Then

ary effects and efficient calculation of block medians.

2

p

 0

Gather the outputs of the Pyramidal Decomposition al- 0

J j

P ( 3 j j )  C  exp(C )

j;k

1 2 2

gorithm into the sequence 

p

0

J j

 =(((m ) ; ( ) ; ( ) ;:::( ) ):

0    3 C > 0

j ;k k j +1;k k j +2;k k J;k k

0 0

0 where and the are absolute con- i stants.

We call  the Median-Interpolating Pyramid Transform of For typical wavelet transforms, the coefficients of

 = MIP T (y ) y and we write . Applying the Pyramidal Cauchy noise have Cauchy distributions, and such exponen-

Reconstruction algorithm to  gives an array which we call tial bounds cannot hold. In contrast, it is known that the me-

1

= MIP T ( ) the inverse transform, and we write y . dian of a sample of Cauchy random variables obeys expo-

The reader may wish to check that nential inequalities over a wide of arguments. Again,

1

(MIP T (y )) = y y MIP T for every sequence . see [14] for proof of P5.

4. Properties 5. De-Noising via MIPT Thresholding

The pyramid just described has several basic properties.

We now consider applications of pyramid transforms to

P1. Coefficient Localization. The coefficient j;k in the multi-scale de-noising. In general, we act as we would in pyramid only depends on block medians of blocks at scale

the wavelet de-noising case.

j 1 j I

and which cover or abut the interval j;k .

j 0

P2. Expansionism. There are 3 r«esum«e coefficients

 = MIP T (y )

 Pyramid decomposition. Calculate .

j

(m )  3 ( ) j

;k j;k k

j in , and coefficients at each level . 0

Hence

  (y ) = y  1

t

y j>tg

Apply Thresholding. Let fj

j j +1 J

0 0

Dim( )=3 +3 + ::: +3 :

^ =

be the hard thresholding function and let 

J

t ((m ) ; ( ( )) ; ::::)

Dim( )=3 (1 + 1=3+1=9+:::)  3=2  n

j j ;k k t +1;k k

j Here the is a se-

0 j +1

It follows that . 0 0

The transform is about 50% expansionist. quence of threshold levels.

y =

P3. Coefficient Decay. Suppose that the data i

1

^ ^

f = MIP T ( ) 

(i=n) f 2

f are noiseless samples of a continuous function Pyramid reconstruction. Calculate .

_

0   1 jf (s) f (t)jC js tj

C , , i.e. for a fixed t

In this approach, coefficients smaller than j are judged

MIP T D =0 2 C . Then for or ,wehave

negligible, as noise rather than signal. Hence the thresholds

0 j

t

j

jC C 3 : j control the degree of noise rejection, but also of valid

j;k (6)

signal rejection. One hopes, in analogy with the orthogonal

r + (r ) (r )

_

C r = 1 2 jf (s) f (t)j 

If f is for or , i.e. transform case studied in [4], to set thresholds which are

js tj C 0 <  1

C , for some fixed and , . Then, for small, but which are very likely to exceed every coefficient =2 MIP T D , in case of a pure noise signal. If the MIPT is very success-

ful, the thresholds can be set ‘as if’ the noise is Gaussian

0 j (r + )

j jC C 3 :

j;k (7) and the transform is the AIPT, even when the noise is very

non-Gaussian. This would mean that the use of the median

y = z i = i

P4. Gaussian Noise. Suppose that i , cancels out many of the bad effects of impulsive noise.

0;:::;n 1 z

, and that i is i.i.d Gaussian white noise. Then 2

p 5.1 Choice of Thresholds



J j

P ( 3 j j )  C  exp(C )

j;k 1 2

2

 2

Motivated by P4 and P5, we work with the “L -

p

C > 0

J j

where the i are absolute constants.

 = 3 j;k normalized” coefficients j;k in this section.

While P1 and P2 are rather obvious, P3 and P4 require

ft g

In order to choose thresholds j which are very likely

proof. See [14]. to exceed every coefficient in case of a pure noise signal, J

These properties are things we naturally expect of linear

t P (j  j > t )  c  3 =J

j;k j we find j satisfying where

pyramid transforms, such as those of Adelson and Burt, and p

2

J j

3 L

j;k is the -normalized MIPT coefficients of the

P1, P3, and P4 we expect also of wavelet transforms. J

3 1

(X ) X  F

i i:i:d:

pure noise signal i , . Then we have =0

A key property of MIPT but not linear wavelet trans- i

j

P P

3 1 J

1

J

P (9(j; k ) j  j > t )  c  3 ! j

forms is the following: s.t. j;k

k =0 j =j

J 0

J !1: F 0 as It was derived in [14] that, when is a sym- (a) Doppler+Gaussian White Noise (b) Heaviside+Cauchy White Noise

metric law, a valid choice is 10 5000

0 1

s

2 

 5 0

J j

p

3

1 1 1

F 1

J j

@ A

t := t = 3 F + 1 :

j (8) j

J 0 −5000

2 2 2J 3

−5 −10000 We claim that the thresholding rule (8) does not de-

pend much on F despite its appearance. For simplicity we −10 −15000 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1

will consider only distributions that are symmetric. Since (c): MIPT coef. of (a) (d): MIPT coef. of (b)

J j

=3

1 −2 −2 j increases very rapidly to 1 as decreases, (8) sug- gests that as long as we are away from the finest scales, the

−4 −4 t

magnitude of j is pretty much governed by the behavior of

1

1=2 F at . This observation can be turned into the follow- −6 −6 ing thoerem. Triad Triad

−8 −8

1 0

(M; ) := fF : (F ) (1=2) =

Theorem 1 Let C

p

1 00

−10 −10

; j(F ) (p)j  M 81=2   p  1=2+  g

2 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1

0 0 < <1=2

where M> and are absolute constants. For (e) MIPT denoisingt of (a) (f) MIPT denoisingt of (b) 

 5 5

0  2 (0; 1) J = J (;  ; M ;  )

any > and , there exists

F F



1 2

J  J max jt t j  

bJ c

such that if then j ,

j j

8F ;F 2C(M; ):

1 2 0 0 Proof. See [14]. It was also shown that alpha stable laws and, in par- ticular, Cauchy and Gaussian distributions, are members

−5 −5

(M; ) M  of C (with and appropriately calibrated.) The 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 following figure shows a Doppler signal contaminated by Gaussian noise and Cauchy noise in Panel (a) and (b) re- [4] D. L. Donoho. De-noising by soft-thresholding. IEEE Trans. spectively. Panel (c) and (d) depict their MIPT coefficients on Information Theory., 41(3):613Ð27, 1995. plotted scale by scale. Notice the similar distributions in [5] D. L. Donoho and I. M. Johnstone. Adapting to unknown magnitude when comparing Panel (c) and (d) in the three smoothness by wavelet shrinkage. Preprint Department of corasest scales; and the distinguish differences when com- Statistics, Stanford University, 1992. paring the two finest ones Ð a phenomenon described math- [6] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Pi- ematically by Theorem 1. In Panel (e) and (f) we show the card. Wavelet shrinkage: Asymptopia? Journ. Roy. Stat.

Soc., B, 1995. t denoised results by MIPT thresholding, the thresholds j in both cases were set by (8). The performance in the Gaussian [7] N. Dyn, J. Gregory, and D. Levin. Analysis of uniform bi- nary subdivision schemes for curve design. Constr. Approx., case (Panel (a) and (c)) is comparable to results obtained by 7:127Ð147, 1991. orthonormal wavelet thresholding. However, in the Cauchy [8] F. R. Hampel. : the approach based on case there is no hope to obtain results comparable to that influence functions. Wiley, 1986. in Panel (d) using a simple thresholding procedure in linear [9] G. A. Hewer, R. D. Martin, and J. Zeh. Robust prepocessing (orthogonal or bi-orthogonal) wavelet transform domains. for kalman filtering of glint noise. IEEE Trans. on Aerospace and Electron Systems, AES-23(1):120Ð128, Jan. 1987. [10] P. Huber. Robust Statistics. Wiley, 1981. References [11] C. L. Nikias and M. Shao. with Alpha- Stable Distributions and Applications. Wiley, 1995. [1] I. Daubechies. Ten Lectures on Wavelets. Number 61 [12] G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian in CBMS-NSF Series in Applied Mathematics. SIAM, Random Process: Stochastic Models with Infinite Variance. Philadelphia, 1992. Chapman & Hall, 1994. [2] D. L. Donoho. Interpolating wavelet transform. [13] W.-R. Wu. Target tracking with glint noise. IEEE Trans. on Technical Report 408, Department of Statis- Aerospace and Electron Systems, 29(1):174Ð185, Jan. 1993. tics, Stanford University, 1992. Available at [14] T. P.-Y. Yu. New Developments in Interpolating ftp://stat.stanford.edu/reports/donoho/interpol.ps.Z. Wavelet Transforms. PhD thesis, Program of Scien- [3] D. L. Donoho. Smooth wavelet decompositions with blocky tific Computing and Computational Mathematics, Stan- coefficient kernels. In L. Schumaker and G. Webb, edi- ford University, August 1997. Available at http://www- tors, Recent Advances in Wavelet Analysis, pages 259Ð308. sccm.stanford.edu/Students/yu/thesis.ps.gz. Boston: Academic Press, 1993.