<<

Schneeweiss, Nittner: Estimating A Polynomial Regression With Measurement Errors In The Structural And In The Functional Case - A Comparison

Sonderforschungsbereich 386, Paper 197 (2000)

Online unter: http://epub.ub.uni-muenchen.de/

Projektpartner

ESTIMATING A POLYNOMIAL

REGRESSION WITH MEASUREMENT

ERRORS IN THE STRUCTURAL AND

IN THE FUNCTIONAL CASE A

COMPARISON

HANS SCHNEEWEISS AND THOMAS NITTNER

UNIVERSITY OF MUNICH

SUMMARY

Two metho ds of estimating the of a p olynomial regression with mea

surement errors in the regressor variable are compared to each other with resp ect to

their relative eciency and robustness One of the two estimators SLS is valid for

the structural variant of the mo del and uses the assumption that the true regressor

also its small sample variable is normally distributed while the other one ALS and

mo dication MALS do es not need any assumption on the regressor distribution SLS

turns out to react rather strongly on violations of the normality assumption as far as

its bias is concerned but is quite robust with resp ect to its MSE It is more ecient

than ALS or MALS whenever the normality assumption holds true

Keywords Polynomial regression measurement errors eciency robustness

INTRODUCTION

It is well known that if a regressor variable of a is measured with

errors the ordinary OLS or naive estimator of the corresp onding

slop e will be biased the bias usually being such that it will attenuate

the true value of the slop e parameter Cheng and Van Ness Fuller

Schneeweiss and Mittag Assuming that the error is a random variable with

exp ectation zero and indep endent of the true regressor variable and that the value of

its is known one can use this known error variance to construct an adjusted

least squares ALS estimator whichdoesnothaveany bias but is in fact consistent

The construction principle involved is that of the corrected score function metho d

which can be applied not only to the linear but also to a large class of nonlinear

mo dels Nakamura Buonaccorsi In this pap er it will b e applied to the

p olynomial regression mo del just as in Cheng and Schneeweiss see also Chan

and Mak and Stefanski

The ALS estimator do es not use any further information beyond the knowledge

of the error variance Supp ose however that even though the time regressor variable

cannot b e observed its distribution were known then this additional knowledge could

b e used to construct a p ossibly sup erior estimator by using another principle than that

for the ALS The idea is to start from a function in the true regressor variables

as the original mo del supplemented byavariance function and to transform it into a

mean function mo del in the observable regressor by taking conditional exp ectations

Again this principle can b e applied to a large class of mo dels including the p olynomial

regression mo del Thamerus Carrol et al As the case of a known

regressor distribution corresp onds to what is usually called the structural variant of

a measurement error mo del this new estimator will b e denoted by the same name a

structural least squares estimator SLS It can b e most easily constructed if a normal

distribution is assumed for the regressor variable

In such a case SLS will presumably be b etter than ALS in the sense of having

a smaller asymptotic variance both estimators b eing consistent However if the

normality assumption is not valid the SLS estimator may lo ose its sup eriority In

deed the ALS estimator is more robust than the SLS estimator owing to the fact

that it do es not dep end on any particular distribution of the regressor variable In

fact the true regressor can even be thought of as being nonsto chastic i e just an

unknown constant for each observation a case which is called the functional variant

of a measurement error mo del Thus ALS is a metho d go o d for the functional variant

but can also b e used in the structural variant case whereas SLS explicitly makes use

of the distributional assumption of the structural variant of the measurement error

mo del and do es dep end on the validityof this assumption

In the present pap er we want to compare these two estimation metho ds rstly

when the distribution of the true regressor variable is correctly sp ecied as Gaussian

and secondly when it is nonGaussian but incorrectly assumed to be Gaussian One

may exp ect the ALS estimator not to be eected very much by the shap e of the

regressor distribution and thus it will behave similarly whether the distribution is

correctly sp ecied or not but the SLS estimator will clearly dep end on the correct

sp ecication of the regressor distribution The question is to what extent do es the

SLS estimator react to a missp ecication of the regressor distribution When will its

prop erties deteriorate so much that it will b ecome inferior to the more robust ALS

estimator

The comparison will be done by way of a simulation study and will thus cover

small sample prop erties of the estimators As with small samples ALS do es not

behave very nicely the estimates b ecoming very unstable ALS has to be mo died

so that its small sample variations b ecome more stable and it can b e compared more

easily with SLS which apparently needs no mo dication The mo died metho d is

called MALS in this pap er The idea of mo dication stems from Fuller for a

linear mo del and has b een adapted to the p olynomial measurement error mo del by

Cheng et al

In a recent pap er Kuha and Temple carried out a similar study trying to

answer the same question as in this pap er They do however not assume the error

variance to be known the usual assumption but rather the noisetosignalratio

Asymptotically these two approaches do not dier but in small samples there may

b e dierences Kuha and Temple also do not go b eyond the quadratic mo del On the

other hand they study some other estimation metho ds as well

In the next section a brief exp osition is given of the estimation metho ds ALS and

SLS involved Section then describ es the simulation study and presents its results

The nal section has some concluding remarks

ESTIMATION METHODS

Adjusted least squares ALS and MALS

The mo del that is investigated is a p olynomial regression in a latent variable that

can only b e measured with a measurement error

k

y

i  i k i

i

x i n

i i i

x b eing the observed regressor variable We assume the errors to be iid

i i

 

Gaussian indep endent of the s with and and covariance

i

r

x of degree r such that Et x It is then p ossible to construct p olynomials t

r r i

i

Let H be a k k matrix with elements H t x rs k

i i rs r s i

and h a k vector with elements h y t x r k then the

i i r i r i

unmo died ALS estimator of is given as the solution of

ALS k

H h

ALS

P



where the bar denotes averages e g H H for details see Cheng and

i

n

Schneeweiss

This estimator is consistent and asymptotically normal For small samples how

ever it can give rise to large estimation errors at least o ccasionally and in particular

 

if the noisetosignalratio is large say larger than A mo dication of ALS

is available which reduces the estimators variance considerably without intro ducing

any conceivable bias For this dene the vector t t x t x and the

i i k i

is given as the solution of matrix V t t H Then the MALS estimator of

i i i

i

tt aV h

MALS

where



if

n n

a

 n

if

n n

b eing the smallest p ositive ro ot which always exists of



y yt

det

V

ty tt

where k The estimator is an adaptation of Fullers small sample

improvement of the parameter estimates in a linear mo del with measurement errors

For details see Cheng et al

Structural least squares SLS



In order to intro duce SLS assume iid N write the regression mo del

i

as a conditional meanvariance mo del see Caroll et al

k

Ey j

 k



Vy j

and nd a new meanvariance mo del in the observable variable x by taking conditional

exp ectations given x

k

X

Ey j x x

j j

j 

k k

X X



Vy j x f x x xg

j l j l j l

j 

l 

r

where x E j x The conditional moments are easily computed using the

r



with fact that the conditional distribution of given x is N x

 

x x

x x

x

   

x

 r

Let Ef x g j x b e the r th conditional central momentof given x then

r

if r is o dd



r

r

r if r is even

is given by and x

r

r

X

r

 r j

x x

r

j

j

j 



and can be estimated by their empirical counterparts

x

x

n

X

 

x x x

x i

x

n

i

 

If these are substituted for and in to estimates of and x

x

x

and nally of x arise Replacing the x in and by their estimates and

r r

substituting the observable values x for the variable x we nally get a meanvariance

i

mo del for the observable with mean and variance functions

k

X

Ey j x x x

i j j i

j 

k k

X X



Vy j x x f x x x g

i j l j l i j i l i

j 

l 

One can derive estimates for the s for this mo del by an iteratively reweighted



least squares metho d where in each step s an estimate for has to be up dated

using the residuals of the previous step s

n k

X X

s

s 

  

y f n k x g

i j i

j

i j 

For details see Thamerus and for the general metho d Carroll et al

It mightbementioned that an approximate metho d exists where Ey jx is approxi

mated by replacing in with E jx This is the regression calibration metho d

Carrol et al which however can only reduce the measurement error bias not

remove it A more elab orate expanded regression calibration metho d is also avail

able and is in fact used by Kuha and Temple in their simulation study In

the quadratic mo del it coincides with the metho d used here but not in higher order

p olynomials where it is only appro ximately unbiased

SIMULATIONS

In order to compare the p erformance of ALS b oth unmo died and mo died with

that of SLS a simulation study was run Several p olynomial mo dels were studied

which diered in the degree of the p olynomial k or and in the distribution

of The parameter values were and for k

 

 

In all cases the error variances were taken to be A much





smaller error variance was also exp erimented with but in this case the

results of the various estimation metho ds did not dier very much For larger error



variances like the results b ecame rather unstable The sample size was

xed at n Three distributions for were chosen the Gaussian distribution



N the uniform distribution where the were xed at the equidistant p oints

i



i

i and the exp onential distribution Exp shifted to the left



p

 

by the amount with Clearly In all three cases E and Var



for the uniform distribution which here is taken to b e nonsto chastic E andVar

P

 

  

holds only have to b e replaced with and s and the equality s

i

n 

approximately

For these altogether mo dels articial samples were generated and the three

estimation metho ds ALS MALS and SLS were applied The simulations were run

with N replications Bias and of the estimates were computed

and can be compared between mo dels and between estimation metho ds The naive

OLS estimator was also computed but is not repro duced here It clearly showed a

signicant bias in almost all cases as was to b e exp ected

In table the simulation results are presented for k and in table for



k Note that the noisetosignal ratio Var is rather high so that a

noticeable bias for the naive estimator not shown here results The SLS estimator

is consistent if is actually normally distributed The other estimators ALS and

MALS are always consistent whatever the distribution of Nevertheless they may

show some bias in small or medium sized samples where n may b e considered

medium sized It is for this reason that the bias is shown in Tables and even

though it turns out to be rather small and often insignican t for all the consistent

estimators

From table it is seen that in the quadratic case ALS and MALS estimators

hardly dier at all so that one might think the small sample mo dication of ALS was

not necessary There are however alb eit rare cases where the ALS estimate has an

extremely high estimation error which is then greatly reduced by the MALS metho d

Apparently such a case did not come up in the presentsimulation study Nevertheless

the MALS metho d should always be used if only for precautionary reasons

The necessity of using MALS instead of ALS is seen most clearly in Table

For the cubic regression the MALS estimator has always a conspicuously smaller

While the standard deviation of MALS is rather mo dest that of

ALS is often extremely large rendering the ALS metho d almost useless in this case

Letusnow compare MALS to SLS The standard deviations of the SLS estimators

are always smaller than those of MALS regardless of the distribution of However

if we consider the bias it is seen that on the whole though not always SLS has

a smaller bias than MALS if is normally distributed but a signicantly higher

bias if the distribution of deviates from the normal one the dierence being most

prominentinthe case of the exp onential distribution

It should b e noted that the values for the bias are only estimated values A rough

p

ruleofthumb condence interval for the true bias is given by B N

is the estimated bias of the estimated regression co ecient as shown where B

in the tables is the corresp onding estimated standard deviation and N

p

Bias values with one asterisk dier from zero by N and with two asterisks by

p

N

A simple and comprehensive measure of precision is the overall MSE which here

is dened as the sum of the MSEs for to k or This measure is shown

k

in Table for the six mo dels

It is seen that the SLS estimators have smaller overall MSE than the MALS

estimators in mo dels with a normal and uniform distribution of For the exp onential

distribution however the overall MSE of the SLS estimators is larger in the quadratic

regression k but still smaller in the cubic regression k although only

slightly so

CONCLUSION

Several conclusions can b e drawn from the results of this simulation exp eriment

in this pap er are rather stable and do not dier The estimators considered

to o much in the quadratic mo del On the other hand due to the high mul

ticollinearity all the estimators b ecome rather unstable in the cubic case and

dier considerably with regard to their variances

In particular the ALS estimator a simple adjustment of the naive estimator

although b eing consistent has very bad small sample prop erties for the cubic

regression Here a mo dication of ALS viz MALS greatly reduces the instabil

ity of the estimator giving rise to reasonable standard errors In the quadratic



mo del ALS and MALS hardly dier This changes however when increases

e g to Then ALS b ecomes unstable also for the quadratic case

While MALS just like ALS is a consistent metho d whatever the distribution

of another estimation pro cedure develop ed for the structural variant of the

measurement error mo del viz SLS dep ends heavily on the assumption of nor

mally distributed variables As long as this assumption is true SLS is sup erior

to MALS b oth with regard to bias and to the standard error

When the distribution of deviates from the normal distribution SLS b ecomes

strongly biased the more so the farther away the distribution of gets from

normality However the standard error of the SLS estimator is still rather small

indeed so small that the overall MSE of SLS is smaller than that of MALS in

most cases except for the cubic regression with an exp onential distribution of

This MSE behavior of SLS will certainly change when either the sample size is



b ecomes smaller In these cases the overall increased or the error variance

MSE will typically be always larger for the SLS estimators whenever the dis

tribution of is nonnormal This is testied by the results of table They



show that for the overall MSE of SLS is always considerably larger

than the MSE of MALS except for the case of normal

To sum up SLS is always sup erior to MALS when the assumption of normality

of is valid Whenever deviates from normality SLS b ecomes biased and as

far as one is solely concerned with the bias MALS should be prefered to SLS

The picture is not so clear when one takes the MSE as a precision criterion

With resp ect to this measure SLS is rather robust at least for not to o large

 

samples and if is large enough For small and for large sample size SLS

deteriorates with resp ect to its overall MSE as compared to MALS

ACKNOWLEDGEMENT

We thank T Augustin for some useful comments on an earlier draft of the pap er

Normal

ALS MALS SLS

 

   Bias



  



  

   Standard deviation



  



Uniform

ALS MALS SLS

 

   Bias



  



  

   Standard deviation



  



Exp onential

ALS MALS SLS

  

   Bias



  





   Standard deviation



  



Table Bias and standard error of three estimators in three dierentmodelswithk

Normal

ALS MALS SLS



   Bias



  



 



 

    Standard deviation



  



 



Uniform

ALS MALS SLS

  

   Bias



  



  



  

   Standard deviation



  



   



Exp onential

ALS MALS SLS

 

   Bias



  



  



 

  Standard deviation



  



  



Table Bias and standard error of three estimators in three dierentmodelswithk

distribution of

degree estimator normal uniform exp onential

k  MALS

SLS

k MALS

SLS



Table Overall MSE of the estimators in six dierent mo dels

distribution of

degree estimator normal uniform exp onential

k MALS

SLS

k MALS

SLS



Table Overall MSE of the estimators in six dierentmodels

REFERENCES

Buonaccorsi JP A Mo died Estimating Equation Approach to Correcting

for Measurement Error in Regression Biometrika

Caroll RJ Ruppert D and Stefanski LA Measurement Error in Nonlinear

Models Chapman and Hall London

ChanLKMak TK On the Polynomial Functional Relationship JRSSB

Cheng CL and van Ness JW Statistical Regression with Measurement Error

Arnold London

Cheng CL and Schneeweiss H The Polynomial Regression with Errors

in the Variables JRSSB

Schneeweiss H and Thamerus M A Small Sample Estimator Cheng CL

for a Polynomial Regression with Errors in the Variables Discussion Pap er

Sonderforschungsb ereich University of Munich

FullerW A Measurement Error Models WileyNew York

Kuha JT Temple J Covariate Measurement Error in Quadratic Regression

Pap er presented at the nd Session of the International Statistical Institute Helsinki

Nakamura T Corrected Score Functions for ErrorsinVariables Mo dels

Metho dology and Application to Generalized Linear Mo dels Biometrika

Schneewei H and Mittag H J LineareModel le mit fehlerbehafteten Daten Phys

ica Heidelb erg

Stefanski L A Unbiased estimation of a nonlinear function of a normal

mean with application to measuremen t error mo dels Communications in

Part A Theory and Metho ds

Thamerus M Dierent Mo dels with Incorrectly Ob

served Covariates In R Galata and H Kuchenho Eds in Theory and

Practice Festschrift for Hans Schneewei Physica Heidelb erg New York