Lecture 12
Robust Estimation
Prof. Dr. Svetlozar Rachev
Institute for Statistics and Mathematical Economics
University of Karlsruhe
Financial Econometrics, Summer Semester 2007
Copyright
These lecture notes cannot be copied and/or distributed without permission. The material is based on the textbook:
Financial Econometrics: From Basics to Advanced Modeling Techniques
(Wiley-Finance, Frank J. Fabozzi Series) by Svetlozar T. Rachev, Stefan Mittnik, Frank Fabozzi, Sergio M. Focardi, Teo Jašić.
Outline
Robust statistics. Robust estimators of regressions. Illustration: robustness of the corporate bond yield spread model.
Robust Statistics
Robust statistics addresses the problem of making estimates that are insensitive to small changes in the basic assumptions of the statistical models employed.
The concepts and methods of robust statistics originated in the 1950s; however, the concepts of robust statistics had been used much earlier. Robust statistics:
1. assesses the changes in estimates due to small changes in the basic assumptions;
2. creates new estimates that are insensitive to small changes in some of the assumptions.
Robust statistics is also useful for separating the contribution of the tails from the contribution of the body of the data.
Robust Statistics
Peter Huber observed that robust, distribution-free, and nonparametric are actually not closely related properties. Example: the sample mean and the sample median are nonparametric estimates of the mean and the median, but the mean is not robust to outliers. In fact, changes in one single observation can have unbounded effects on the mean, while the median is insensitive to changes of up to half the sample.
Robust methods assume that there are indeed parameters in the distributions under study and attempt to minimize the effects of outliers, as well as of erroneous assumptions on the shape of the distribution.
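A quick numerical sketch of Huber's point about the mean and the median (the numbers are hypothetical, chosen only to make the contrast visible):

```python
import statistics

sample = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = sample[:-1] + [5000.0]   # corrupt a single observation

# The mean is dragged arbitrarily far by one outlier...
print(statistics.mean(sample), statistics.mean(corrupted))      # 3.0 1002.0
# ...while the median does not move at all.
print(statistics.median(sample), statistics.median(corrupted))  # 3.0 3.0
```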
Robust Statistics: Qualitative and Quantitative Robustness
Estimators are functions of the sample data. Given an N-sample of data X = (x_1, . . . , x_N)′ from a population with a cdf F(x) depending on a parameter ϑ_∞, an estimator for ϑ_∞ is a function ϑ̂ = ϑ_N(x_1, . . . , x_N).
Consider those estimators that can be written as functions of the cumulative empirical distribution function:

F_N(x) = (1/N) Σ_{i=1}^{N} I(x_i ≤ x)

where I is the indicator function. For these estimators we can write

ϑ̂ = ϑ_N(F_N)
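The empirical distribution function above is straightforward to compute; a minimal sketch (the sample values are hypothetical):

```python
import numpy as np

def empirical_cdf(data, x):
    """F_N(x) = (1/N) * sum over i of I(x_i <= x)."""
    data = np.asarray(data)
    return float(np.mean(data <= x))

data = [2.0, 4.0, 6.0, 8.0]
print(empirical_cdf(data, 5.0))  # 0.5: two of the four observations are <= 5
```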
Robust Statistics: Qualitative and Quantitative Robustness
Most estimators, in particular the ML estimators, can be written in this way with probability 1.
In general, when N → ∞, F_N(x) → F(x) and ϑ̂_N → ϑ_∞ in probability. The estimator ϑ̂_N is a random variable that depends on the sample.
Under the distribution F, it will have a probability distribution L_F(ϑ_N).
Statistics defined as functionals of a distribution are robust if they are continuous with respect to the distribution.
Robust Statistics: Qualitative and Quantitative Robustness
In 1968, Hampel introduced a technical definition of qualitative robustness based on metrics of the functional space of distributions.
It states that an estimator is robust for a given distribution F if small deviations from F in the given metric result in small deviations from L_F(ϑ_N) in the same metric or, eventually, in some other metric, for any sequence of samples of increasing size.
The definition of robustness can be made quantitative by assessing quantitatively how changes in the distribution F affect the distribution L_F(ϑ_N).
Robust Statistics: Resistant Estimators
An estimator is called resistant if it is insensitive to changes in one single observation.
Given an estimator ϑ̂ = ϑ_N(F_N), we want to understand what happens if we add a new observation of value x to a large sample. To this end we define the influence curve (IC), also called the influence function.
The IC is a function of x given ϑ and F, and is defined as follows:

IC_{ϑ,F}(x) = lim_{s→0} [ϑ((1 − s)F + s δ_x) − ϑ(F)] / s

where δ_x denotes a point mass 1 at x.
Robust Statistics: Resistant Estimators
As we can see from the previous definition, the IC is a function of the size of the single observation that is added. In other words, the IC measures the influence of a single observation x on a statistic ϑ for a given distribution F.
In practice, the influence curve is generated by plotting the value of the computed statistic, with a single observation of value x added to the sample, against that x value. Example: the IC of the mean is a straight line.
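The limit defining the IC can be approximated on a finite sample by the sensitivity curve, (N + 1)(ϑ_{N+1} − ϑ_N): add one observation of value x and rescale the change in the statistic. A small sketch, assuming simulated standard normal data:

```python
import numpy as np

def sensitivity_curve(stat, data, x):
    """Finite-sample analogue of the IC: (N + 1) * (stat(sample with x added) - stat(sample))."""
    data = np.asarray(data, dtype=float)
    return (len(data) + 1) * (stat(np.append(data, x)) - stat(data))

rng = np.random.default_rng(0)
data = rng.normal(size=200)

# The mean's curve grows linearly in x (unbounded influence)...
print([round(sensitivity_curve(np.mean, data, x), 1) for x in (-100, 0, 100)])
# ...while the median's curve stays bounded however extreme x becomes.
print([round(sensitivity_curve(np.median, data, x), 1) for x in (-100, 0, 100)])
```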
Robust Statistics: Resistant Estimators
Several aspects of the influence curve are of particular interest:
Is the curve "bounded" as the x values become extreme? A robust statistic should be bounded; that is, it should not be unduly influenced by a single extreme point.
What is the general behavior as the observation x becomes extreme? For example, is it smoothly down-weighted as the values become extreme?
What is the influence if the point x is in the "center" of the observations?
Robust Statistics: Breakdown Bound
The breakdown (BD) bound or point is the largest possible fraction of observations for which there is a bound on the change of the estimate when that fraction of the sample is altered without restrictions.
Example: We can change up to 50% of the sample points without provoking unbounded changes of the median. On the contrary, changes of one single observation might have unbounded effects on the mean.
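The contrast in breakdown points can be checked numerically (a sketch on hypothetical data: 100 clean observations, just under half of which are then contaminated):

```python
import numpy as np

data = np.arange(1.0, 101.0)      # 100 clean observations: 1, 2, ..., 100

# Replace 49 of the 100 observations (just under half) by a huge value.
contaminated = data.copy()
contaminated[:49] = 1e9

print(np.median(data), np.median(contaminated))  # 50.5 99.5 -- still inside the clean range
print(np.mean(data), np.mean(contaminated))      # the mean explodes
```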
Robust Statistics: Rejection Point
The rejection point is defined as the point beyond which the IC becomes zero.
Note: The observations beyond the rejection point make no contribution to the final estimate except, possibly, through the auxiliary scale estimate.
Estimators that have a finite rejection point are said to be redescending and are well protected against very large outliers. However, a finite rejection point usually results in the underestimation of scale.
Robust Statistics: Main Concepts
The gross error sensitivity expresses asymptotically the maximum effect that a contaminated observation can have on the estimator. It is the maximum absolute value of the IC.
The local shift sensitivity measures the effect of the removal of a mass at y and its reintroduction at x. For continuous and differentiable IC, the local shift sensitivity is given by the maximum absolute value of the slope of IC at any point.
Winsor’s principle states that all distributions are normal in the middle.
Robust Statistics: M-Estimators
M-estimators are those estimators that are obtained by minimizing a function of the sample data.
Suppose that we are given an N-sample of data X = (x_1, . . . , x_N)′. The estimator T(x_1, . . . , x_N) is called an M-estimator if it is obtained by solving the following minimum problem:

T = arg min_t { J = Σ_{i=1}^{N} ρ(x_i, t) }

where ρ(x_i, t) is an arbitrary function.
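As a concrete sketch of such a minimization, the following assumes the Huber ρ (quadratic in the middle, linear in the tails, so ψ(r) = max(−c, min(c, r)) is bounded) and a MAD-based auxiliary scale; the tuning constant c = 1.345 is a conventional choice. The minimum is found by iteratively reweighted averaging:

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of location: minimizes sum of rho((x_i - t)/S) with
    rho quadratic for |r| <= c and linear beyond, via reweighted averaging."""
    x = np.asarray(x, dtype=float)
    t = np.median(x)                                # robust starting point
    s = np.median(np.abs(x - t)) / 0.6745           # MAD-based auxiliary scale estimate
    for _ in range(max_iter):
        r = (x - t) / s                             # scaled residuals
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # w = psi(r)/r
        t_new = np.sum(w * x) / np.sum(w)
        if abs(t_new - t) < tol:
            break
        t = t_new
    return t

data = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
print(round(huber_location(data), 3))   # stays near the bulk of the data, unlike the mean
```

The outlier at 100 receives a small weight w = c/|r|, so it is down-weighted rather than discarded outright.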
Robust Statistics: M-Estimators
Alternatively, if ρ(xi , t) is a smooth function, we can say that T is an M-estimator if it is determined by solving the equations:
Σ_{i=1}^{N} ψ(x_i, t) = 0

where

ψ(x_i, t) = ∂ρ(x_i, t)/∂t
Robust Statistics: M-Estimators
When the M-estimator is equivariant, that is T(x1 + a, . . . , xN + a) = T(x1, . . . , xN) + a, ∀a ∈ R, we can write ψ and ρ in terms of the residuals x − t.
Also, in general, an auxiliary scale estimate, S, is used to obtain the scaled residuals r = (x − t)/S. If the estimator is also equivariant to changes of scale, we can write
ψ(x, t) = ψ((x − t)/S) = ψ(r)

ρ(x, t) = ρ((x − t)/S) = ρ(r)
Robust Statistics: M-Estimators
ML estimators are M-estimators with ρ = − log f, where f is the probability density.
The name M-estimator stands for "maximum likelihood-type" estimator. LS estimators are also M-estimators.
The IC of M-estimators has a particularly simple form. In fact, it can be demonstrated that the IC is proportional to the function ψ:
IC = Constant × ψ
Robust Statistics: L-Estimators
Consider an N-sample (x_1, . . . , x_N)′. Order the sample so that x_(1) ≤ x_(2) ≤ · · · ≤ x_(N). The i-th element x_(i) of the ordered sample is called the i-th order statistic.
L-estimators are estimators obtained as a linear combination of order statistics:

L = Σ_{i=1}^{N} a_i x_(i)

where the a_i are fixed constants, typically normalized so that

Σ_{i=1}^{N} a_i = 1

An important example of an L-estimator is the trimmed mean: a mean formed after excluding a given fraction of the highest and/or lowest order statistics.
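A minimal sketch of the trimmed mean as an L-estimator (the data are hypothetical; the weights a_i are zero on the trimmed tails and equal on the rest):

```python
import numpy as np

def trimmed_mean(x, alpha=0.1):
    """L-estimator: drop the floor(alpha*N) smallest and largest order
    statistics, then average the remaining ones with equal weights."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(alpha * len(x))
    return float(np.mean(x[k:len(x) - k]))

data = [1.0, 2.0, 3.0, 4.0, 100.0]
print(np.mean(data), trimmed_mean(data, alpha=0.2))   # 22.0 3.0
```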
Robust Statistics: R-Estimators
R-estimators are obtained by minimizing the sum of residuals weighted by functions of the rank of each residual. The functional to be minimized is the following:
arg min { J = Σ_{i=1}^{N} a(R_i) r_i }

where R_i is the rank of the i-th residual r_i and a is a nondecreasing score function that satisfies the condition

Σ_{i=1}^{N} a(R_i) = 0
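As an illustration, a sketch assuming Wilcoxon scores a(R) = R − (N + 1)/2 (which sum to zero) and hypothetical data. Note that for a pure location parameter the residual ranks do not change with t, so the example minimizes the functional over a regression slope, where the ranks do depend on the parameter; the minimization itself is a naive grid search:

```python
import numpy as np

def rank_dispersion(b, x, y):
    """J(b) = sum over i of a(R_i) * r_i for residuals r_i = y_i - b * x_i,
    with Wilcoxon scores a(R) = R - (N + 1)/2."""
    r = np.asarray(y, dtype=float) - b * np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(r)) + 1        # residual ranks (assumes no ties)
    scores = ranks - (len(r) + 1) / 2.0
    return float(np.sum(scores * r))

# Hypothetical data: y = 2x with one contaminated response.
x = np.arange(1.0, 11.0)
y = 2.0 * x
y[7] = 50.0                                      # gross outlier

grid = np.linspace(0.0, 4.0, 2001)               # naive grid minimization
b_hat = grid[np.argmin([rank_dispersion(b, x, y) for b in grid])]
print(round(b_hat, 2))                           # recovers the slope 2.0 despite the outlier
```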