Median Unbiased Estimation of Coefficient Variance in a Time-Varying Parameter Model


James H. STOCK and Mark W. WATSON

This article considers inference about the variance of coefficients in time-varying parameter models with stationary regressors. The Gaussian maximum likelihood estimator (MLE) has a large point mass at 0. We thus develop asymptotically median unbiased estimators and asymptotically valid confidence intervals by inverting quantile functions of regression-based parameter stability test statistics, computed under the constant-parameter null. These estimators have good asymptotic relative efficiencies for small to moderate amounts of parameter variability. We apply these results to an unobserved components model of trend growth in postwar U.S. per capita gross domestic product. The MLE implies that there has been no change in the trend growth rate, whereas the upper range of the median-unbiased point estimates implies that the annual trend growth rate has fallen by 0.9% per annum since the 1950s.

KEY WORDS: Stochastic coefficient model; Structural time series model; Unit moving average root; Unobserved components.

1. INTRODUCTION

Since its introduction in the early 1970s by Cooley and Prescott (1973a,b, 1976), Rosenberg (1972, 1973), and Sarris (1973), the time-varying parameter (TVP), or "stochastic coefficients," regression model has been used extensively in empirical work, especially in forecasting applications. Chow (1984), Engle and Watson (1987), Harvey (1989), Nichols and Pagan (1985), Pagan (1980), and Stock and Watson (1996) have provided references and discussion of this model. The appeal of the TVP model is that by permitting the coefficients to evolve stochastically over time, it can be applied to time series models with parameter instability.

The TVP model considered in this article is

    y_t = β_t' X_t + u_t,                                  (1)

    β_t = β_{t-1} + v_t,                                   (2)

    a(L) u_t = ε_t,                                        (3)

and

    v_t = τ ζ_t,  where  ζ_t = B(L) η_t,                   (4)

where {(y_t, X_t), t = 1, ..., T} are observed, X_t is an exogenous k-dimensional regressor, β_t is a k × 1 vector of unobserved time-varying coefficients, τ is a scalar, a(L) is a scalar lag polynomial, B(L) is a k × k matrix lag polynomial, and ε_t and η_t are serially and mutually uncorrelated mean 0 random disturbances. (Additional technical conditions used for the asymptotic results are given in Section 2, where we also discuss restrictions on B(L) and E(η_t η_t') that are sufficient to identify the scale factor τ.) An important special case of this model is when X_t = 1 and B(L) = 1; following Harvey (1985), we refer to this case as the "local-level" unobserved components model.

We consider the problem of estimation of the scale parameter τ. If (as is common) ε_t and η_t are assumed to be jointly normal and independent of {X_t, t = 1, ..., T}, then the parameters of (1)-(4) can be estimated by maximum likelihood implemented by the Kalman filter. However, the maximum likelihood estimator (MLE) has the undesirable property that if τ is small, then it has point mass at 0. In the case X_t = 1, this is related to the so-called pile-up problem in the first-order moving average [MA(1)] model with a unit root (Sargan and Bhargava 1983; Shephard and Harvey 1990). In the general TVP model (1)-(4), the pile-up probability depends on the properties of X_t and can be large. The pile-up probability is a particular problem when τ is small and thus is readily mistaken for 0. Arguably, small values of τ are appropriate for many empirical applications; indeed, if τ is large, then the distribution of the MLE can be approximated by conventional T^{1/2}-asymptotic normality, but Monte Carlo evidence suggests that this approximation is poor in many cases of empirical interest. (See Davis and Dunsmuir 1996 and Shephard 1993 for discussions in the case of X_t = 1.)

We thus focus on the estimation of τ when it is small. In particular, we consider the nesting

    τ = λ/T.                                               (5)
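To make the pile-up phenomenon concrete in the local-level case (X_t = 1, B(L) = 1, a(L) = 1), the following sketch simulates data with a small τ and maximizes the Gaussian likelihood, computed by the Kalman filter, over τ ≥ 0. It is a minimal illustration, not the authors' code: the sample size, true λ, replication count, the unit error variances, and the likelihood conditioned on the first observation are all assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def kalman_loglik(tau, y):
    """Gaussian log-likelihood of the local-level model
    y_t = mu_t + e_t, mu_t = mu_{t-1} + tau*eta_t, e_t, eta_t ~ iid N(0,1),
    conditional on y_0 (a diffuse prior on mu_0 gives mu ~ N(y_0, 1))."""
    mu, P, ll = y[0], 1.0, 0.0
    for obs in y[1:]:
        P += tau ** 2                # state prediction variance
        F = P + 1.0                  # innovation variance
        v = obs - mu                 # one-step-ahead forecast error
        ll -= 0.5 * (np.log(2 * np.pi * F) + v * v / F)
        K = P / F                    # Kalman gain
        mu += K * v
        P *= 1.0 - K
    return ll

rng = np.random.default_rng(0)
T, lam, nrep = 200, 5.0, 200         # tau = lam/T is small, per nesting (5)
pileup = 0
for _ in range(nrep):
    beta = np.cumsum((lam / T) * rng.standard_normal(T))
    y = beta + rng.standard_normal(T)
    res = minimize_scalar(lambda t: -kalman_loglik(t, y),
                          bounds=(0.0, 1.0), method="bounded")
    pileup += res.x < 1e-3           # MLE effectively at the boundary tau = 0
print(f"fraction of MLEs piled up at zero: {pileup / nrep:.2f}")
```

With λ of this magnitude a nontrivial fraction of replications delivers an estimate at the boundary, which is the behavior motivating the median-unbiased procedures developed below.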
Order of magnitude calculations suggest that this might be an appropriate nesting for certain empirical problems of interest, such as estimating stochastic variation in the trend component in the growth rate of U.S. real gross domestic product (GDP), as we discuss in Section 4. This is also the nesting used to obtain local asymptotic power functions of tests of τ = 0, suggesting that if the researcher is in a region in which tests yield ambiguous conclusions about the null hypothesis τ = 0, then the nesting (5) is appropriate.

The main contribution of this article is the development of asymptotically median unbiased estimators of λ and asymptotically valid confidence intervals for λ in the model (1)-(5). These are obtained by inverting asymptotic quantile functions of statistics that test the hypothesis λ = 0. The test statistics are based on generalized least squares (GLS) residuals, which are readily computed under the null. As part of the calculations, we obtain asymptotic representations for a family of tests under the local alternative (5). These representations can be used to compute local asymptotic power functions against nonzero values of λ. Section 2 presents these theoretical results.

Section 3 provides numerical results for the special case of the univariate local-level model. Properties of the median unbiased estimators are compared to two MLEs, which alternatively maximize the marginal and the profile (or concentrated) likelihoods; these MLEs differ in their treatment of the initial value for β_t. Both MLEs are biased and have large pile-ups at λ = 0. When λ is small, the median unbiased estimators are more tightly concentrated around the true value of λ than either MLE.

Section 4 presents an application to the estimation of a long-run stochastic trend for the growth rate of postwar real per capita GDP in the United States. Point estimates from the median unbiased estimators suggest a slowdown in the average trend rate of growth; the largest point estimate suggests a slowdown of approximately .9% per annum from the 1950s to the 1990s. The MLEs suggest a much smaller decline, with point estimates ranging from 0 to .2%. Section 5 concludes.

[Author note] James H. Stock is Professor of Political Economy, Kennedy School of Government, Harvard University, Cambridge, MA 02138, and Research Associate at the National Bureau of Economic Research (NBER). Mark W. Watson is Professor of Economics and Public Affairs, Woodrow Wilson School, Princeton University, Princeton, NJ 08544, and Research Associate at the NBER. The authors thank Bruce Hansen, Andrew Harvey, and two referees for comments on an earlier draft, and Lewis Chan and Jonathan Wright for research assistance. This research was supported in part by National Science Foundation grant SBR-9409629.

© 1998 American Statistical Association, Journal of the American Statistical Association, March 1998, Vol. 93, No. 441, Theory and Methods.

2. THEORETICAL RESULTS

We assume that a(L) has known finite order p and thus consider statistics based on feasible GLS. Specifically, (a) y_t is regressed on X_t by ordinary least squares (OLS), producing residuals û_t; (b) a univariate AR(p) is estimated by OLS regression of û_t on (1, û_{t-1}, ..., û_{t-p}), yielding â(L); (c) ŷ_t = â(L) y_t is regressed on X̂_t = â(L) X_t, yielding the GLS estimator β̂ = (T^{-1} Σ_{t=1}^T X̂_t X̂_t')^{-1} T^{-1} Σ_{t=1}^T X̂_t ŷ_t, residuals ê_t, and moment matrix V̂:

    ê_t = ŷ_t − β̂' X̂_t                                    (6)

and

    V̂ = (T^{-1} Σ_{t=1}^T X̂_t X̂_t')^{-1} σ̂²,              (7)

where σ̂² = (T − k)^{-1} Σ_{t=1}^T ê_t². If a(L) = 1, then steps (a) and (b) are omitted and the OLS and GLS regressions of y_t on X_t are equivalent.

Two test statistics are considered: Nyblom's (1989) L_T statistic (modified to use GLS residuals) and the sequential GLS Chow F statistics, F_T(s) (0 < s < 1), which test for a break at date [Ts], where [·] denotes the greatest lesser integer function:

    F_T(s) = (SSR_{1,T} − SSR_{1,[Ts]} − SSR_{[Ts]+1,T}) / [k(SSR_{1,[Ts]} + SSR_{[Ts]+1,T})/(T − k)],   (9)

where SSR_{i,j} denotes the sum of squared residuals from the regression over observations i through j. (For other tests in versions of this model, see Franzini and Harvey 1983; Harvey and Streibel 1997; King and Hillier 1985; Nabeya and Tanaka 1988; Nyblom 1989; Reinsel and Tam 1996; Shively 1988.)

The F_T statistic is an empirical process, and inference is performed using one-dimensional functionals of F_T. We consider three such functionals: the maximum F_T statistic (the Quandt [1960] likelihood ratio statistic), QLR_T = sup_{s∈(s_0,s_1)} F_T(s); the mean Wald statistic of Andrews and Ploberger (1994) and Hansen (1992), MW_T = ∫_{s_0}^{s_1} F_T(s) ds; and the Andrews–Ploberger (1994) exponential Wald statistic, EW_T = ln{∫_{s_0}^{s_1} exp(½ F_T(s)) ds}, where 0 < s_0 < s_1 < 1.
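Given the regression residuals, constructing F_T(s) and its three functionals is mechanical. The sketch below takes the a(L) = 1 case, where steps (a) and (b) are omitted and GLS reduces to OLS; the trimming values s_0 = .15 and s_1 = .85 are an assumption of this sketch rather than a prescription from the article, and the degrees-of-freedom term follows (9) as printed above.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

def break_test_stats(y, X, s0=0.15, s1=0.85):
    """Sequential Chow F_T(s) per eq. (9), with a(L) = 1 so GLS = OLS,
    and the QLR, mean-Wald, and exponential-Wald functionals."""
    T, k = X.shape
    ssr_full = ssr(y, X)
    dates = range(int(s0 * T), int(s1 * T) + 1)
    s = np.array([t / T for t in dates])
    f = np.empty(len(s))
    for i, t in enumerate(dates):
        ssr1, ssr2 = ssr(y[:t], X[:t]), ssr(y[t:], X[t:])
        f[i] = (ssr_full - ssr1 - ssr2) / (k * (ssr1 + ssr2) / (T - k))
    w = np.diff(s)                                    # trapezoid weights
    qlr = f.max()                                     # QLR_T
    mw = np.sum(0.5 * (f[1:] + f[:-1]) * w)           # MW_T = ∫ F_T(s) ds
    ew = np.log(np.sum(0.5 * (np.exp(f[1:] / 2)       # EW_T = ln ∫ e^{F/2} ds
                              + np.exp(f[:-1] / 2)) * w))
    return qlr, mw, ew

# Example: constant regressor (the local-level case) with a stable mean.
rng = np.random.default_rng(1)
y = rng.standard_normal(300)
X = np.ones((300, 1))
print(break_test_stats(y, X))
```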
Three assumptions are used to obtain the asymptotic results. For a stationary process z_t, let c_{i_1...i_n}(r_1, ..., r_{n-1}) denote the nth joint cumulant of z_{i_1 t_1}, ..., z_{i_n t_n}, where r_j = t_j − t_n, j = 1, ..., n − 1 (Brillinger 1981), and let c̄(r_1, ..., r_{n-1}) = sup_{i_1,...,i_n} |c_{i_1...i_n}(r_1, ..., r_{n-1})|.

Assumption A. X_t is stationary with eighth-order cumulants that satisfy Σ_{r_1,...,r_7} c̄(r_1, ..., r_7) < ∞.

Assumption B. {X_t, t = 1, ..., T} is independent of {u_t, v_t, t = 1, ..., T}.

Assumption C. (ε_t, η_t')' is a (k + 1) × 1 vector of iid errors with mean 0 and four moments; ε_t and η_t are independent; a(L) has finite order p; and B(L) is one-summable with B(1) ≠ 0.

Assumption A requires that X_t have bounded moments or, if nonstochastic, that it not exhibit a trend. The assumption of stationarity is made for convenience in the proofs and could be relaxed somewhat. However, the requirement that X_t not be integrated of order 1 (I(1)) or higher is essential for our results.

Assumption B requires X_t to be strictly exogenous. This assumption permits estimation of (1), under the null β_t = β_0, by GLS.

The assumption that a(L) has finite order p in Assumption C is made to simplify estimation by feasible GLS. The assumption that ε_t and η_t are independent ensures that u_t and v_t have a zero cross-spectral density matrix. This is a basic identifying assumption of the TVP model (Harvey 1989). To construct the Gaussian MLE, ε_t and η_t are modeled as ...
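To connect these assumptions back to the article's main proposal: the median-unbiased estimator of λ inverts the quantile functions of a parameter-stability statistic evaluated under the local alternative (5). Where the article derives and tabulates asymptotic quantile functions, the sketch below instead approximates the 5%, 50%, and 95% quantile functions by finite-T Monte Carlo in the local-level case, using the standard form of Nyblom's statistic for a constant mean, and then inverts them by interpolation. All tuning constants (T, grid, replication count, the illustrative observed statistic) are assumptions of this illustration, and the simulated quantiles stand in for the asymptotic ones.

```python
import numpy as np

def nyblom_L(y):
    """Nyblom's L statistic for the local-level case (regression on X_t = 1)."""
    e = y - y.mean()
    s2 = e @ e / (len(e) - 1)
    cs = np.cumsum(e)
    return (cs @ cs) / (len(e) ** 2 * s2)

rng = np.random.default_rng(2)
T, nrep = 200, 1000
grid = np.linspace(0.0, 30.0, 31)           # grid of lambda values
q05, q50, q95 = [], [], []
for lam in grid:                            # quantile functions of L under (5)
    stats = [nyblom_L(np.cumsum((lam / T) * rng.standard_normal(T))
                      + rng.standard_normal(T)) for _ in range(nrep)]
    q05.append(np.quantile(stats, 0.05))
    q50.append(np.quantile(stats, 0.50))
    q95.append(np.quantile(stats, 0.95))

# Enforce monotonicity of the simulated quantile functions, then invert
# them at an observed statistic by linear interpolation.
q05, q50, q95 = (np.maximum.accumulate(q) for q in (q05, q50, q95))
L_obs = 0.6                                 # illustrative value, not from the paper
lam_hat = np.interp(L_obs, q50, grid)       # median-unbiased point estimate
lam_lo = np.interp(L_obs, q95, grid)        # 90% CI: lambda solving q95(lam) = L_obs
lam_hi = np.interp(L_obs, q05, grid)        # 90% CI: lambda solving q05(lam) = L_obs
print(f"lambda_hat = {lam_hat:.1f}, 90% CI = [{lam_lo:.1f}, {lam_hi:.1f}]")
```

A median-unbiased estimate of τ then follows from the nesting (5) as τ̂ = λ̂/T.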