Estimation of Inefficiency in Stochastic Frontier Models: A Bayesian Kernel Approach

Guohua Feng†
Department of Economics, University of North Texas, Denton, Texas 76203, USA

Chuan Wang‡
Wenlan School of Business, Zhongnan University of Economics and Law, Wuhan, Hubei 430073, China

Xibin Zhang§
Department of Econometrics and Business Statistics, Monash University, Caulfield East, VIC 3145, Australia

April 21, 2018

Abstract

We propose a kernel-based Bayesian framework for the analysis of stochastic frontiers and efficiency measurement. The primary feature of this framework is that the unknown distribution of inefficiency is approximated by a transformed Rosenblatt-Parzen kernel density estimator. To justify the kernel-based model, we conduct a Monte Carlo study and also apply the model to a panel of U.S. large banks. Simulation results show that the kernel-based model is capable of providing more precise estimation and prediction results than the commonly-used exponential stochastic frontier model. The Bayes factor also favors the kernel-based model over the exponential model in the empirical application.

JEL classification: C11; D24; G21.
Keywords: Kernel Density Estimation; Efficiency Measurement; Stochastic Distance Frontier; Markov Chain Monte Carlo.

We would like to thank Professor Cheng Hsiao at the University of Southern California for helpful discussion.
† E-mail: [email protected]; Phone: +1 940 565 2220; Fax: +1 940 565 4426.
‡ E-mail: [email protected]; Phone: +61 3 9903 4539; Fax: +61 3 9903 2007.
§ E-mail: [email protected]; Phone: +61 3 9903 2130; Fax: +61 3 9903 2007.

1. Introduction

Beginning with the seminal works of Aigner et al. (1977) and Meeusen and van den Broeck (1977), stochastic frontier models have been commonly used in evaluating the productivity and efficiency of firms (see Greene, 2008). A typical stochastic frontier model involves the estimation of a specific parameterized efficient frontier with a composite error term consisting of non-negative inefficiency and noise components. The parametric frontier can be specified as a production, cost, profit, or distance frontier, depending on the type of data available and the issue under investigation. For example, a production (or output distance) frontier specifies maximum outputs for given sets of inputs and existing production technologies, and a cost frontier defines minimum costs given output levels, input prices and the existing production technology. In practice, it is unlikely that all (or possibly any) firms will operate at the frontier. The deviation from the frontier is a measure of inefficiency and is the focus of interest in many applications. Econometrically, this deviation is captured by the non-negative inefficiency error term.

Despite the wide applications of stochastic frontier models, there is no consensus on the distribution to be used for the one-sided inefficiency term. The earliest two distributions in the literature are the half-normal distribution adopted by Aigner et al. (1977) and the exponential distribution adopted by Meeusen and van den Broeck (1977). However, both of these two distributions are criticized for being restrictive.
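For concreteness, the composite-error structure described above, together with the two classical inefficiency distributions, can be written as follows. This is a generic panel formulation in our own notation rather than the paper's exact specification (the paper's empirical model is a distance frontier):

```latex
% Generic composite-error stochastic frontier (illustrative notation only):
% noise v, non-negative inefficiency u.
\begin{align*}
  y_{it} &= x_{it}'\beta + v_{it} - u_{it}, \qquad u_{it} \ge 0, \qquad
            v_{it} \sim N(0, \sigma_v^2), \\
  u_{it} &\sim N^{+}(0, \sigma_u^2) \ \text{(half-normal, Aigner et al., 1977)}
            \quad \text{or} \quad
            u_{it} \sim \operatorname{Exp}(\lambda) \ \text{(Meeusen and van den Broeck, 1977)}, \\
  \mathrm{TE}_{it} &= \exp(-u_{it}) \in (0, 1].
\end{align*}
```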
Specifically, the half-normal distribution is criticized in that the zero mean is an unnecessary restriction, while the exponential distribution is criticized on the ground that the probability of a firm's efficiency falling in a certain interval is always strictly less than one if the interval does not have 0 or 1 as one of the endpoints. These limitations have led some researchers to consider more general parametric distributions, such as the truncated normal distribution proposed by Stevenson (1980) and the Gamma distribution proposed by Greene (1990). Schmidt and Sickles (1984) go one step further by estimating the inefficiency term without making distributional assumptions on the inefficiency term and noise term.¹

¹ Schmidt and Sickles (1984) note that, with panel data, time-invariant inefficiency can be estimated without making distributional assumptions on the inefficiency term and noise term. In their model, only the intercept varies over firms, and differences in the intercepts are interpreted as differing efficiency levels. The Schmidt and Sickles (1984) model can be estimated using the traditional panel data methods of fixed-effects estimation (dummy variables) or error-components estimation.

More recently, other researchers have extended this literature further by modeling the inefficiency term non-parametrically. For example, Griffin and Steel (2004) use a nonparametric, Dirichlet-process-based technique to model the distribution of the inefficiency term, while keeping the frontier part parametric. Using data on U.S. hospitals, Griffin and Steel (2004) demonstrate that, compared with the Dirichlet stochastic frontier model, the commonly-used exponential parametric model underestimates the probability of a firm's efficiency falling in the efficiency interval [0.6, 0.8]. In addition, they show that the Gamma parametric model misses the mass in the region of efficiencies above 0.95 and generally underestimates the probabilities on high efficiencies (above 0.80) and overestimates those on low efficiencies (especially under 0.6).

The purpose of this paper is to contribute to this literature by proposing a new nonparametric methodology for flexible modeling of the inefficiency distribution within a Bayesian framework. Specifically, we use a kernel density estimator to estimate the probability density of the inefficiency term. There is a growing number of studies that use kernel density estimators to approximate error terms. These studies include, but are not limited to, Yuan and de Gooijer (2007), Jaki and West (2008), and Zhang et al. (2014). It is worth noting that all these studies have used kernel density estimators to approximate idiosyncratic errors in non-stochastic-frontier models. To the best of our knowledge, this is the first study that uses a kernel density estimator to approximate the inefficiency term.²

² We acknowledge that this approach is conceptually equivalent to that of Zhang et al. (2014). We also note that another difference between Zhang et al. (2014) and our study is that their methodology is proposed in a cross-sectional setting, while ours is proposed in a panel data setting.

However, we cannot use standard kernel density estimators to approximate the density of the inefficiency term, which has a bounded support on [0, ∞). This is because the bias of these estimators at the boundary has a different representation, and is of a different order, from that at interior points. To avoid this problem, we follow Wand et al. (1991) by using “the transformed kernel estimation approach”. This approach involves three steps: i) transform an original data set that has a bounded support using a transformation function that is capable of transforming the original data into the interval (−∞, ∞); ii) calculate the density of the transformed data by use of classical kernel density estimators; and iii) obtain the estimator of the density of the original data set by “back-transforming” the estimate of the density of the transformed data. A crucial step of the transformed kernel estimation approach is the choice of transformation function. In our case, we use the log transformation, a special case of the Box-Cox transformation suggested by Wand et al. (1991), to transform the non-negative inefficiency term into an unbounded variable. Because a kernel density estimator is used to estimate the density of the inefficiency term, we refer to our model as the “kernel-based semi-parametric stochastic frontier model”.

Our kernel-based semi-parametric stochastic frontier model is estimated within a sampling framework. In doing so, we pay particular attention to the possible identification problem between the intercept and the inefficiency term. As is well known, this identification problem is very likely to arise when the inefficiency term is modeled in a flexible manner, as in this paper. A consequence of this identification problem is slow convergence in MCMC. To overcome this problem, we implement the idea of hierarchical centering, which was introduced by Gelfand et al. (1995) in the context of normal linear mixed effects models in order to improve the behavior of maximization and sampling-based algorithms. In the context of stochastic frontier models, hierarchical centering involves reparameterizing these models by replacing the inefficiency term with the sum of the intercept and the inefficiency term. This reparameterization is capable of overcoming the identification problem, because both the intercept and the inefficiency term have an additive effect and thus the model is naturally informative about the sum of the intercept and the inefficiency term. In this paper, we use a hybrid sampler, which randomly mixes updates from the centered and the uncentered parameterizations.

We conduct a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In doing so, we use two benchmark models for comparison: (1) the exponential stochastic frontier model; and (2) the Dirichlet stochastic frontier model. Our simulation results indicate that when the number of firms is large enough (≥ 200), the kernel-based model outperforms the exponential parametric model on all three measures we use, namely, the Euclidean distance between the estimated and true vectors of technical efficiencies, the Spearman rank correlation coefficient between the estimated and true vectors of technical efficiencies, and the coverage probability of the credible interval.
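The three comparison measures can be computed from posterior draws roughly as in the sketch below. This is our reading of the text; the function name, the use of posterior means as point estimates, and the equal-tailed credible interval are assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr

def comparison_measures(te_true, te_draws, alpha=0.05):
    """Sketch of the three measures comparing estimated and true efficiencies.

    te_true  : (N,)  true technical efficiencies of the N firms
    te_draws : (S, N) posterior draws of the technical efficiencies
    """
    te_hat = te_draws.mean(axis=0)                      # point estimates (assumed: posterior means)
    euclid = np.sqrt(np.sum((te_hat - te_true) ** 2))   # Euclidean distance
    rho, _ = spearmanr(te_hat, te_true)                 # Spearman rank correlation
    lo = np.quantile(te_draws, alpha / 2, axis=0)       # equal-tailed 95% credible interval
    hi = np.quantile(te_draws, 1 - alpha / 2, axis=0)
    coverage = np.mean((te_true >= lo) & (te_true <= hi))
    return euclid, rho, coverage
```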
Our simulation results also suggest that the kernel-based model outperforms the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the Dirichlet model on the other measure (i.e., the Spearman rank correlation coefficient). Finally, we apply a kernel-based semi-parametric stochastic distance frontier (SDF) model to panel data on 292 large banks in the U.S. over the period 2000–2005. Our Bayes factor analysis strongly suggests that the kernel SDF model outperforms the exponential parametric SDF model.
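As a closing illustration of the transformed kernel estimation approach described above, the following is a minimal sketch of a log-transformed Rosenblatt-Parzen density estimator for a non-negative sample. It is not the authors' implementation: in the paper the estimator is embedded in a Bayesian MCMC sampler, whereas this sketch simply plugs in a rule-of-thumb bandwidth.

```python
import numpy as np

def transformed_kde(u, grid, h=None):
    """Log-transformed Rosenblatt-Parzen kernel density estimate for a
    non-negative sample u, evaluated at the (positive) points in `grid`.

    Steps mirror the transformed kernel approach of Wand et al. (1991):
    (i) z = log(u) maps [0, inf) onto (-inf, inf); (ii) an ordinary Gaussian
    kernel estimate is formed on the transformed scale; (iii) the estimate is
    back-transformed, which multiplies by the Jacobian 1/u.
    """
    z = np.log(u)
    n = len(z)
    if h is None:
        # Rule-of-thumb bandwidth -- an assumption for this sketch; in the
        # paper the bandwidth is treated as a parameter within the sampler.
        h = 1.06 * z.std(ddof=1) * n ** (-1.0 / 5.0)
    t = np.log(grid)
    diffs = (t[:, None] - z[None, :]) / h
    f_z = np.exp(-0.5 * diffs ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
    return f_z / grid        # back-transform: f_u(u) = f_z(log u) / u

# Example with synthetic inefficiencies (exponential, mean 0.3):
rng = np.random.default_rng(0)
u = rng.exponential(scale=0.3, size=500)
grid = np.linspace(0.01, 2.0, 200)
density = transformed_kde(u, grid)
```

The division by `grid` in the last line of the function is the Jacobian of the log transformation; it is what makes the back-transformed estimate a proper density on [0, ∞) and avoids the boundary-bias problem of an untransformed estimator.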