Lecture 12
Robust Estimation
Prof. Dr. Svetlozar Rachev
Institute for Statistics and Mathematical Economics
University of Karlsruhe
Financial Econometrics, Summer Semester 2007
Copyright
These lecture notes cannot be copied and/or distributed without permission. The material is based on the textbook:
Financial Econometrics: From Basics to Advanced Modeling Techniques
(Wiley-Finance, Frank J. Fabozzi Series) by Svetlozar T. Rachev, Stefan Mittnik, Frank Fabozzi, Sergio M. Focardi, Teo Jašić.
Outline
Robust statistics. Robust estimators of regressions. Illustration: robustness of the corporate bond yield spread model.
Robust Statistics
Robust statistics addresses the problem of making estimates that are insensitive to small changes in the basic assumptions of the statistical models employed.
The concepts and methods of robust statistics originated in the 1950s; however, the concepts of robust statistics had been used much earlier. Robust statistics:
1. assesses the changes in estimates due to small changes in the basic assumptions;
2. creates new estimates that are insensitive to small changes in some of the assumptions.
Robust statistics is also useful for separating the contribution of the tails from the contribution of the body of the data.
Robust Statistics
Peter Huber observed that robust, distribution-free, and nonparametric are actually not closely related properties. Example: the sample mean and the sample median are nonparametric estimates of the mean and the median, but the mean is not robust to outliers. In fact, changes in one single observation can have unbounded effects on the mean, while the median is insensitive to changes of up to half the sample.
Robust methods assume that there are indeed parameters in the distributions under study and attempt to minimize the effects of outliers, as well as of erroneous assumptions on the shape of the distribution.
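A quick numerical sketch of Huber's point about the mean and the median (the numbers are hypothetical, chosen only to make the contrast visible):

```python
import statistics

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = sample[:-1] + [5000.0]   # corrupt a single observation

# The mean is dragged arbitrarily far by one outlier...
print(statistics.mean(sample), statistics.mean(corrupted))      # 3.0 1002.0
# ...while the median does not move at all.
print(statistics.median(sample), statistics.median(corrupted))  # 3.0 3.0
```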
Robust Statistics: Qualitative and Quantitative Robustness
Estimators are functions of the sample data. Given an N-sample of data X = (x_1, . . . , x_N)′ from a population with a cdf F(x) depending on a parameter ϑ_∞, an estimator for ϑ_∞ is a function ϑ̂ = ϑ_N(x_1, . . . , x_N).
Consider those estimators that can be written as functions of the cumulative empirical distribution function:

F_N(x) = (1/N) Σ_{i=1}^{N} I(x_i ≤ x)

where I is the indicator function. For these estimators we can write

ϑ̂ = ϑ_N(F_N)
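The empirical distribution function above is straightforward to compute; a minimal sketch (the sample values are hypothetical):

```python
import numpy as np

def empirical_cdf(data, x):
    """F_N(x) = (1/N) * sum over i of I(x_i <= x)."""
    data = np.asarray(data)
    return float(np.mean(data <= x))

data = [2.0, 4.0, 6.0, 8.0]
print(empirical_cdf(data, 5.0))  # 0.5: two of the four observations are <= 5
```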
Robust Statistics: Qualitative and Quantitative Robustness
Most estimators, in particular the ML estimators, can be written in this way with probability 1.
In general, when N → ∞, F_N(x) → F(x) and ϑ̂_N → ϑ_∞ in probability. The estimator ϑ̂_N is a random variable that depends on the sample.
Under the distribution F, it will have a probability distribution L_F(ϑ_N).
Statistics defined as functionals of a distribution are robust if they are continuous with respect to the distribution.
Robust Statistics: Qualitative and Quantitative Robustness
In 1968, Hampel introduced a technical definition of qualitative robustness based on metrics of the functional space of distributions.
It states that an estimator is robust for a given distribution F if small deviations from F in the given metric result in small deviations from L_F(ϑ_N) in the same metric or, eventually, in some other metric, for any sequence of samples of increasing size.
The definition of robustness can be made quantitative by assessing quantitatively how changes in the distribution F affect the distribution L_F(ϑ_N).
Robust Statistics: Resistant Estimators
An estimator is called resistant if it is insensitive to changes in one single observation.
Given an estimator ϑ̂ = ϑ_N(F_N), we want to understand what happens if we add a new observation of value x to a large sample. To this end we define the influence curve (IC), also called the influence function.
The IC is a function of x given ϑ and F, and is defined as follows:

IC_{ϑ,F}(x) = lim_{s→0} [ϑ((1 − s)F + s δ_x) − ϑ(F)] / s

where δ_x denotes a point mass 1 at x.
Robust Statistics: Resistant Estimators
As we can see from the previous definition, the IC is a function of the size of the single observation that is added. In other words, the IC measures the influence of a single observation x on a statistic ϑ for a given distribution F.
In practice, the influence curve is generated by plotting the value of the computed statistic, with a single observation of value x added to the sample, against that x value. Example: the IC of the mean is a straight line.
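The limit defining the IC can be approximated on a finite sample by the sensitivity curve, (N + 1)(ϑ_{N+1} − ϑ_N): add one observation of value x and rescale the change in the statistic. A small sketch, assuming simulated standard normal data:

```python
import numpy as np

def sensitivity_curve(stat, data, x):
    """Finite-sample analogue of the IC: (N + 1) * (stat(sample with x added) - stat(sample))."""
    data = np.asarray(data, dtype=float)
    return (len(data) + 1) * (stat(np.append(data, x)) - stat(data))

rng = np.random.default_rng(0)
data = rng.normal(size=200)

# The mean's curve grows linearly in x (unbounded influence)...
print([round(sensitivity_curve(np.mean, data, x), 1) for x in (-100, 0, 100)])
# ...while the median's curve stays bounded however extreme x becomes.
print([round(sensitivity_curve(np.median, data, x), 1) for x in (-100, 0, 100)])
```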
Robust Statistics: Resistant Estimators
Several aspects of the influence curve are of particular interest:
Is the curve "bounded" as the x values become extreme? A robust statistic should be bounded; that is, it should not be unduly influenced by a single extreme point.
What is the general behavior as the observation x becomes extreme? For example, is it smoothly down-weighted as the values become extreme?
What is the influence if the point x is in the "center" of the observations?
Robust Statistics: Breakdown Bound
The breakdown (BD) bound or point is the largest possible fraction of observations for which there is a bound on the change of the estimate when that fraction of the sample is altered without restrictions.
Example: We can change up to 50% of the sample points without provoking unbounded changes of the median. On the contrary, changes of one single observation might have unbounded effects on the mean.
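The contrast in breakdown points can be checked numerically (a sketch on hypothetical data: 100 clean observations, just under half of which are then contaminated):

```python
import numpy as np

data = np.arange(1.0, 101.0)      # 100 clean observations: 1, 2, ..., 100

# Replace 49 of the 100 observations (just under half) by a huge value.
contaminated = data.copy()
contaminated[:49] = 1e9

print(np.median(data), np.median(contaminated))  # 50.5 99.5 -- still inside the clean range
print(np.mean(data), np.mean(contaminated))      # the mean explodes
```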
Robust Statistics: Rejection Point
The rejection point is defined as the point beyond which the IC becomes zero.
Note: The observations beyond the rejection point make no contribution to the final estimate except, possibly, through the auxiliary scale estimate.
Estimators that have a finite rejection point are said to be redescending and are well protected against very large outliers. However, a finite rejection point usually results in the underestimation of scale.
Robust Statistics: Main Concepts
The gross error sensitivity expresses asymptotically the maximum effect that a contaminated observation can have on the estimator. It is the maximum absolute value of the IC.
The local shift sensitivity measures the effect of the removal of a mass at y and its reintroduction at x. For continuous and differentiable IC, the local shift sensitivity is given by the maximum absolute value of the slope of IC at any point.
Winsor’s principle states that all distributions are normal in the middle.
Robust Statistics: M-Estimators
M-estimators are those estimators that are obtained by minimizing a function of the sample data.
Suppose that we are given an N-sample of data X = (x_1, . . . , x_N)′. The estimator T(x_1, . . . , x_N) is called an M-estimator if it is obtained by solving the following minimum problem:

T = arg min_t { J = Σ_{i=1}^{N} ρ(x_i, t) }

where ρ(x_i, t) is an arbitrary function.
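As a concrete sketch of such a minimization, the following assumes the Huber ρ (quadratic in the middle, linear in the tails, so ψ(r) = max(−c, min(c, r)) is bounded) and a MAD-based auxiliary scale; the tuning constant c = 1.345 is a conventional choice. The minimum is found by iteratively reweighted averaging:

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of location: minimizes sum of rho((x_i - t)/S) with
    rho quadratic for |r| <= c and linear beyond, via reweighted averaging."""
    x = np.asarray(x, dtype=float)
    t = np.median(x)                                # robust starting point
    s = np.median(np.abs(x - t)) / 0.6745           # MAD-based auxiliary scale estimate
    for _ in range(max_iter):
        r = (x - t) / s                             # scaled residuals
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # w = psi(r)/r
        t_new = np.sum(w * x) / np.sum(w)
        if abs(t_new - t) < tol:
            break
        t = t_new
    return t

data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
print(round(huber_location(data), 3))   # stays near the bulk of the data, unlike the mean
```

The outlier at 100 receives a small weight w = c/|r|, so it is down-weighted rather than discarded outright.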
Robust Statistics: M-Estimators
Alternatively, if ρ(xi , t) is a smooth function, we can say that T is an M-estimator if it is determined by solving the equations:
Σ_{i=1}^{N} ψ(x_i, t) = 0

where

ψ(x_i, t) = ∂ρ(x_i, t)/∂t
Robust Statistics: M-Estimators
When the M-estimator is equivariant, that is T(x1 + a, . . . , xN + a) = T(x1, . . . , xN) + a, ∀a ∈ R, we can write ψ and ρ in terms of the residuals x − t.
Also, in general, an auxiliary scale estimate, S, is used to obtain the scaled residuals r = (x − t)/S. If the estimator is also equivariant to changes of scale, we can write
ψ(x, t) = ψ((x − t)/S) = ψ(r)

ρ(x, t) = ρ((x − t)/S) = ρ(r)
Robust Statistics: M-Estimators
ML estimators are M-estimators with ρ = − log f, where f is the probability density.
The name M-estimator stands for "maximum likelihood-type" estimator. LS estimators are also M-estimators.
The IC of M-estimators has a particularly simple form. In fact, it can be demonstrated that the IC is proportional to the function ψ:
IC = Constant × ψ
Robust Statistics: L-Estimators
Consider an N-sample (x_1, . . . , x_N)′. Order the sample so that x_(1) ≤ x_(2) ≤ · · · ≤ x_(N). The i-th element x_(i) of the ordered sample is called the i-th order statistic.
L-estimators are estimators obtained as a linear combination of order statistics:

L = Σ_{i=1}^{N} a_i x_(i)

where the a_i are fixed constants, typically normalized so that

Σ_{i=1}^{N} a_i = 1

An important example of an L-estimator is the trimmed mean: a mean formed after excluding a given fraction of the highest and/or lowest order statistics.
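A minimal sketch of the trimmed mean as an L-estimator (the data are hypothetical; the weights a_i are zero on the trimmed tails and equal on the rest):

```python
import numpy as np

def trimmed_mean(x, alpha=0.1):
    """L-estimator: drop the floor(alpha*N) smallest and largest order
    statistics, then average the remaining ones with equal weights."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(alpha * len(x))
    return float(np.mean(x[k:len(x) - k]))

data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(np.mean(data), trimmed_mean(data, alpha=0.2))   # 22.0 3.0
```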
Robust Statistics: R-Estimators
R-estimators are obtained by minimizing the sum of residuals weighted by functions of the rank of each residual. The functional to be minimized is the following:
arg min { J = Σ_{i=1}^{N} a(R_i) r_i }

where R_i is the rank of the i-th residual r_i and a is a nondecreasing score function that satisfies the condition

Σ_{i=1}^{N} a(R_i) = 0
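As an illustration, a sketch assuming Wilcoxon scores a(R) = R − (N + 1)/2 (which sum to zero) and hypothetical data. Note that for a pure location parameter the residual ranks do not change with t, so the example minimizes the functional over a regression slope, where the ranks do depend on the parameter; the minimization itself is a naive grid search:

```python
import numpy as np

def rank_dispersion(b, x, y):
    """J(b) = sum over i of a(R_i) * r_i for residuals r_i = y_i - b * x_i,
    with Wilcoxon scores a(R) = R - (N + 1)/2."""
    r = np.asarray(y, dtype=float) - b * np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(r)) + 1        # residual ranks (assumes no ties)
    scores = ranks - (len(r) + 1) / 2.0
    return float(np.sum(scores * r))

# Hypothetical data: y = 2x with one contaminated response.
x = np.arange(1.0, 11.0)
y = 2.0 * x
y[7] = 50.0                                      # gross outlier

grid = np.linspace(0.0, 4.0, 2001)               # naive grid minimization
b_hat = grid[np.argmin([rank_dispersion(b, x, y) for b in grid])]
print(round(b_hat, 2))                           # recovers the slope 2.0 despite the outlier
```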