
Estimation of Inefficiency in Stochastic Frontier Models: A Bayesian Kernel Approach

Guohua Feng†
Department of Economics

University of North Texas

Denton, Texas 76203

USA

Chuan Wang‡
Wenlan School of Business

Zhongnan University of Economics and Law

Wuhan, Hubei 430073

China

Xibin Zhang§
Department of Econometrics and Business Statistics

Monash University

Caulfield East, VIC 3145

Australia

April 21, 2018

Abstract

We propose a kernel-based Bayesian framework for the analysis of stochastic frontiers and efficiency measurement. The primary feature of this framework is that the unknown distribution of inefficiency is approximated by a transformed Rosenblatt-Parzen kernel density estimator. To justify the kernel-based model, we conduct a Monte Carlo study and also apply the model to a panel of U.S. large banks. Simulation results show that the kernel-based model is capable of providing more precise estimation and prediction results than the commonly-used exponential stochastic frontier model. The Bayes factor also favors the kernel-based model over the exponential model in the empirical application.

JEL classification: C11; D24; G21.

Keywords: Kernel Density Estimation; Efficiency Measurement; Stochastic Distance Frontier.

We would like to thank Professor Cheng Hsiao at the University of Southern California for helpful discussion. †E-mail: [email protected]; Phone: +1 940 565 2220; Fax: +1 940 565 4426. ‡E-mail: [email protected]; Phone: +61 3 9903 4539; Fax: +61 3 9903 2007. §E-mail: [email protected]; Phone: +61 3 9903 2130; Fax: +61 3 9903 2007.

1. Introduction

Beginning with the seminal works of Aigner et al. (1977) and Meeusen and van den Broeck (1977), stochastic frontier models have been commonly used in evaluating the productivity and efficiency of firms (see Greene, 2008). A typical stochastic frontier model involves the estimation of a specific parameterized efficient frontier with a composite error term consisting of non-negative inefficiency and noise components. The parametric frontier can be specified as a production, cost, profit, or distance frontier, depending on the type of data available and the issue under investigation. For example, a production (or output distance) frontier specifies maximum outputs for given sets of inputs and existing production technologies, and a cost frontier defines minimum costs given output levels, input prices and the existing production technology. In practice, it is unlikely that all (or possibly any) firms will operate at the frontier. The deviation from the frontier is a measure of inefficiency and is the focus of interest in many applications. Econometrically, this deviation is captured by the non-negative inefficiency error term.

Despite the wide applications of stochastic frontier models, there is no consensus on the distribution to be used for the one-sided inefficiency term. The earliest two distributions in the literature are the half-normal distribution adopted by Aigner et al. (1977) and the exponential distribution adopted by Meeusen and van den Broeck (1977). However, both of these distributions are criticized for being restrictive. Specifically, the half-normal distribution is criticized in that the zero mode is an unnecessary restriction, while the exponential distribution is criticized on the ground that the probability of a firm's efficiency falling in a certain interval is always strictly less than one if the interval does not have 0 or 1 as one of the endpoints. These limitations have led some researchers to consider more general parametric distributions, such as the truncated normal distribution proposed by Stevenson (1980) and the Gamma distribution proposed by Greene (1990). Schmidt and Sickles (1984) go one step further by estimating the inefficiency term without making distributional assumptions on the inefficiency term and noise term.¹

¹Schmidt and Sickles (1984) note that, with panel data, time-invariant inefficiency can be estimated without making distributional assumptions on the inefficiency term and noise term. In their model, only the intercept varies over firms, and differences in the intercepts are interpreted as differing efficiency levels. The Schmidt and Sickles (1984) model can be estimated using the traditional panel data methods of fixed-effects estimation (dummy variables) or error-components estimation.

More recently, other researchers have extended this literature further by modeling the inefficiency term non-parametrically. For example, Griffin and Steel (2004) use a nonparametric, Dirichlet-process-based technique to model the distribution of the inefficiency term, while keeping the frontier part parametric. Using data on U.S. hospitals, Griffin and Steel (2004) demonstrate that, compared with the Dirichlet stochastic frontier model, the commonly-used exponential model underestimates the probability of a firm's efficiency falling in the efficiency interval [0.6, 0.8]. In addition, they show that the Gamma parametric model misses the mass in the region of efficiencies above 0.95, and generally underestimates the probabilities on high efficiencies (above 0.80) and overestimates those on low efficiencies (especially under 0.6).

The purpose of this paper is to contribute to this literature by proposing a new nonparametric methodology for flexible modeling of the inefficiency distribution within a Bayesian framework. Specifically, we use a kernel density estimator to estimate the probability density of the inefficiency term. There is a growing number of studies that use kernel density estimators to approximate error terms. These studies include, but are not limited to, Yuan and de Gooijer (2007), Jaki and West (2008), and Zhang et al. (2014). It is worth noting that all these studies have used kernel density estimators to approximate idiosyncratic errors in non-stochastic-frontier models. To the best of our knowledge, this is the first study that uses a kernel density estimator to approximate the inefficiency term.²

However, we cannot use standard kernel density estimators to approximate the density of the inefficiency term, which has a bounded support on [0, ∞). This is because the bias of these estimators at the boundary has a different representation and is of a different order than at interior points. To avoid this problem, we follow Wand et al. (1991) by using "the transformed kernel estimation approach". This approach involves three steps: i) transform an original data set that has a bounded support using a transformation function that is capable of transforming the original data into the interval (−∞, ∞); ii) calculate the density of the transformed data by use of classical kernel density estimators; and iii) obtain the estimator of the density of the original data set by "back-transforming" the estimate of the density of the transformed data.

A crucial step of the transformed kernel estimation approach is the choice of transformation function. In our case, we use the log transformation, a special case of the Box-Cox transformation suggested by Wand et al. (1991), to transform the non-negative inefficiency term into an unbounded variable. Because a kernel density estimator is used to estimate the density of the inefficiency term, we refer to our model as the "kernel-based semi-parametric stochastic frontier model".

Our kernel-based semi-parametric stochastic frontier model is estimated within a Bayesian framework. In doing so, we pay particular attention to the possible identification problem between the intercept and the inefficiency term. As is well known, this identification problem is very likely to arise when the inefficiency term is modeled in a flexible manner, as in this paper. A consequence of this identification problem is slow convergence in MCMC. To overcome this problem, we implement the idea of hierarchical centering, which was introduced by Gelfand et al. (1995) in the context of normal linear mixed effects models in order to improve the behavior of maximization and sampling-based algorithms. In the context of stochastic frontier models, hierarchical centering involves reparameterizing these models by replacing the inefficiency term by the sum of the intercept and the inefficiency term. This reparameterization is capable of overcoming the identification problem, because both the intercept and the inefficiency term have an additive effect, and thus the model is naturally informative on the sum of the intercept and the inefficiency term. In this paper, we use a hybrid sampler, which randomly mixes updates from the centered and the uncentered parameterizations.

²We acknowledge that this is conceptually equivalent to Zhang et al. (2014). We also note that another difference between Zhang et al. (2014) and our study is that their methodology is proposed in a cross-sectional setting, while ours is proposed in a panel data setting.
We conduct a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In doing so, we use two benchmark models for comparison: (1) the exponential stochastic frontier model; and (2) the Dirichlet stochastic frontier model. Our simulation results indicate that when the number of firms is large enough (≥ 200), the kernel-based model outperforms the exponential parametric model on all three measures we use, namely, the average Euclidean distance between the estimated and true vectors of technical efficiencies, the Spearman rank correlation coefficient between the estimated and true vectors of technical efficiencies, and the coverage probability of the credible interval. Our simulation results also suggest that the kernel model outperforms the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the Dirichlet model on the other measure (i.e., the Spearman rank correlation coefficient).

Finally, we apply a kernel-based semi-parametric stochastic distance frontier (SDF) model to a panel of 292 large banks in the U.S. over the period 2000–2005. Our Bayes factor strongly suggests that the kernel SDF model outperforms the exponential parametric SDF model. Our analysis of the posterior predictive efficiency density function of unobserved banks shows that predicting the efficiency of an unobserved bank on the basis of the exponential SDF model is misleading. Our further analysis of the posterior efficiency distributions of observed banks shows that efficiency inference on observed banks based on the exponential SDF model is also different from what we conclude on the basis of the kernel SDF model. In addition, we find that the kernel model and the Dirichlet model produce very similar firm-specific posterior efficiency densities and posterior predictive efficiency distributions.

The rest of the paper is organized as follows. In Section 2, we present the kernel stochastic frontier model. In Section 3, we discuss the Bayesian procedure for estimating the kernel stochastic frontier model. Section 4 conducts a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In Section 5, we apply a kernel stochastic distance frontier (SDF) model to large banks in the U.S. Section 6 concludes the paper.

2. The Kernel-based Semi-parametric Stochastic Frontier Model

Let firms be indexed by $i = 1, 2, \ldots, N$, and time by $t = 1, 2, \ldots, T$. A typical production stochastic frontier model is written as

$$y_{it} = \alpha + x_{it}'\beta - u_i + v_{it}, \quad (1)$$

where $y_{it}$ represents the logarithm of output, $\alpha$ is an intercept, $x_{it}$ is a $k$-dimensional vector of inputs (e.g., logarithms of inputs and squared logarithms of inputs), $\beta$ is the corresponding vector of coefficients, and $v_{it}$ represents statistical noise and is assumed to be independently and identically distributed as

$$v_{it} \overset{i.i.d.}{\sim} N\!\left(0, \sigma^2\right), \quad (2)$$

and $u_i$ is a non-negative disturbance which accounts for the time-invariant inefficiency of firm $i$. This model can also represent other economic frontiers. For example, it can represent a stochastic cost frontier, with $y_{it}$ being the logarithm of cost and $x_{it}$ being a vector of output quantities and input prices, by changing "$-u_i$" to "$+u_i$". To give another example, the model can also represent an output distance frontier, as shown in Section 5. In this latter model, $y_{it}$ is the negative logarithm of one of the outputs, $x_{it}$ is a vector of output ratios and inputs, and $u_i$ has a positive sign.
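To fix ideas, a panel satisfying (1)–(2) can be simulated in a few lines of NumPy. The function name, the uniform input data, and the exponential inefficiency draws below are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_frontier_panel(N, T, alpha, beta, sigma, u_draw):
    """Simulate a panel from the production frontier (1)-(2):
    y_it = alpha + x_it' beta - u_i + v_it, with i.i.d. noise
    v_it ~ N(0, sigma^2) and time-invariant inefficiency u_i."""
    k = len(beta)
    x = rng.uniform(0.0, 1.0, size=(N, T, k))   # illustrative input data
    u = u_draw(N)                               # non-negative inefficiencies
    v = rng.normal(0.0, sigma, size=(N, T))     # statistical noise
    y = alpha + x @ beta - u[:, None] + v
    return y, x, u

y, x, u = simulate_frontier_panel(
    N=50, T=6, alpha=1.0, beta=np.array([0.6, 0.3]), sigma=0.2,
    u_draw=lambda n: rng.exponential(0.2, size=n))
```

Because each firm's $u_i$ is drawn once and reused for all $T$ periods, the simulated inefficiency is time-invariant, matching the assumption discussed below.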

We assume that $u_i$ is time-invariant in order to exploit the panel structure of the data. This assumption can be restrictive, especially when $T$ is large, and a number of ways have been proposed in the literature to make $u_i$ vary over time (see, for example, Cornwell et al., 1990; Battese and Coelli, 1992). In this paper, we will not allow $u_i$ to vary over time, as we will allow for a flexible treatment of the distribution of the inefficiency term $u_i$, which requires substantial data information. Consequently, we only deal with short panels. However, given sufficient data, this framework can be extended to time-varying inefficiencies to accommodate longer panels, as discussed at the end of this section.

As discussed in the Introduction, we propose using a kernel density estimator to approximate the unknown density of $u_i$, which has a bounded support on $[0, \infty)$. It is not appropriate to use standard kernel density estimators here because these estimators suffer from the well-known "boundary effects" or "edge effects"³ (see, for example, Chen, 2000). Generally, there are two approaches that can overcome this problem in kernel density estimation. The first approach, due to Chen (2000), is to use a Gamma kernel estimator, because this estimator is free of boundary bias, always non-negative, and achieves the optimal rate of convergence in the mean integrated

³The bias of these estimators has a different representation at the boundary than at interior points. In addition, these estimators also have a different convergence rate at the boundary than at interior points.

square error within the class of non-negative kernel density estimators. The second approach involves transforming the original bounded data into the interval $(-\infty, \infty)$, estimating the density of the transformed data through the Gaussian kernel, and then "back-transforming" the resulting density estimator of the transformed data to derive a density estimator of the original data. For example, Wand et al. (1991) propose to use the Box-Cox transformation or shifted power transformation to transform the original data, while Buch-Larsen et al. (2005) suggest using the Champernowne transformation. For notational simplicity, we refer to the second approach as "the transformation

approach", which we use to approximate the unknown density of the non-negative disturbances, $u_i$, for $i = 1, 2, \ldots, N$, in this paper. Before discussing how we approximate the unknown density of $u_i$, it is helpful to illustrate the

basic idea behind the transformation approach. Let $X_i$, $i = 1, 2, \ldots, \tilde N$, denote generic positive random variables with an unknown density $p(x)$. To estimate $p(x)$, we transform the data with a transformation function, $G(\cdot)$, such that the transformed data (denoted by $Y_i$, for $i = 1, 2, \ldots, \tilde N$) are unbounded:

$$Y_i = G(X_i), \quad \text{for } i = 1, 2, \ldots, \tilde N.$$

Examples of $G(\cdot)$ include the Box-Cox transformation and the shifted power transformation. With the transformed data, we can calculate the Rosenblatt-Parzen kernel density estimator:

$$\hat p_{\mathrm{trans}}(y) = \frac{1}{\tilde N}\sum_{i=1}^{\tilde N}\frac{1}{b}\,K\!\left(\frac{Y_i - y}{b}\right),$$

where $b$ is a bandwidth and $K(\cdot)$ is a kernel function, which is often chosen to be a Gaussian kernel. The density estimator of the original data, $X_i$, for $i = 1, 2, \ldots, \tilde N$, can then be obtained by "back-transforming" the density estimator of the transformed data:

$$\hat p(x) = \hat p_{\mathrm{trans}}(G(x))\,\frac{dG(x)}{dx} = \frac{1}{\tilde N}\sum_{i=1}^{\tilde N}\frac{1}{b}\,K\!\left(\frac{G(X_i) - G(x)}{b}\right)\frac{dG(x)}{dx}.$$
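As a concrete illustration, the back-transformed estimator can be implemented in a few lines of NumPy for the log choice $G(x) = \ln x$. The function name, the exponential sample, the grid, and the bandwidth value are our own illustrative assumptions:

```python
import numpy as np

def transformed_kde(data, grid, bandwidth):
    """Transformed kernel density estimate for data supported on [0, inf).

    Step (i): log-transform the data so its support is unbounded.
    Step (ii): apply a Gaussian Rosenblatt-Parzen estimator to the logs.
    Step (iii): back-transform with the Jacobian dG(x)/dx = 1/x.
    """
    logs = np.log(data)                                    # step (i)
    z = (logs[None, :] - np.log(grid)[:, None]) / bandwidth
    gauss = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K
    p_trans = gauss.mean(axis=1) / bandwidth               # step (ii)
    return p_trans / grid                                  # step (iii)

# Illustration on exponential draws, which live on [0, inf)
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=2000)
grid = np.linspace(0.05, 4.0, 80)
density = transformed_kde(sample, grid, bandwidth=0.3)
```

Because the Gaussian kernel is applied on the log scale, the back-transformed estimate places no mass on negative values, which is exactly what avoids the boundary-bias problem discussed above.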

b b Buch-Larsen et al. (2005) show that the biase of p(x) is of an order of O(b2) and the

b 8 p(x) has an order of O 1 . Under the assumption that b 0 and Nb as N , Nb ! ! 1 ! 1   p(x) p(x) = op(1) ande Nb (p(x) E [p(x)]) N (0;R(K) G0(x) p(x)), where N(:; :) b ! j e j e stands for the normal distributionp and R(K) = K2(u)du. b e b b In our model, we choose the logarithm transformationR for G( ) when approximating the un-  known density of ui. The log-transformation is a special case of the Box-Cox transformation suggested by Wand et al. (1991) and thus is capable of transforming the non-negative inefficiency

term, $u_i$, for $i = 1, \ldots, N$, into a scalar variable with an unbounded support.

Let $\lambda_i = \ln u_i$, for $i = 1, 2, \ldots, N$, $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)'$, and $\lambda_{(i)} = (\lambda_1, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_N)'$. As $\lambda_i$ has an unbounded support, the density of $\lambda_i$, denoted by $p(\lambda_i)$, can be approximated by the following Rosenblatt-Parzen kernel density estimator:

$$\hat p\!\left(\lambda_i \mid \lambda_{(i)}, \zeta^2\right) = \frac{1}{N-1}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\lambda_i - \lambda_j}{\zeta}\right),$$

where $\phi(\cdot)$ is the standard Gaussian density function and $\zeta$ is the bandwidth. The purpose of using the leave-one-out version is to exclude $\phi(0/\zeta)/\zeta$, which can be made arbitrarily large when $\zeta$ is arbitrarily small.

Assuming that $\lambda_1, \lambda_2, \ldots, \lambda_N$ are independent of each other, we define the density estimator of $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)'$ as

$$\hat p\!\left(\lambda \mid \zeta^2\right) = \prod_{i=1}^{N}\hat p\!\left(\lambda_i \mid \lambda_{(i)}, \zeta^2\right).$$

Applying the Jacobian of the log transformation to $\hat p(\lambda \mid \zeta^2)$ gives the following density estimator of the inefficiency vector $u = (u_1, u_2, \ldots, u_N)'$:

$$\hat p\!\left(u \mid \zeta^2\right) = \hat p\!\left(\ln u \mid \zeta^2\right) \cdot \left|\det(J)\right|, \quad (3)$$

where $J$ is a diagonal matrix with its $i$th diagonal element being $1/u_i$. Note that the independence

of $\lambda_1, \lambda_2, \ldots, \lambda_N$ implies the independence of $u_1, u_2, \ldots, u_N$. The density estimator of $u_i$ can be

obtained through (3) as

$$\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right) = \frac{1}{(N-1)\,u_i}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln u_i - \ln u_j}{\zeta}\right) I(u_i > 0), \quad (4)$$

for $i = 1, 2, \ldots, N$, where $u_{(i)} = (u_1, u_2, \ldots, u_{i-1}, u_{i+1}, \ldots, u_N)'$, and $I(\cdot)$ is an indicator function whose value is one for a true argument and zero otherwise. Hence, we obtain a kernel-based semi-parametric stochastic production frontier model defined by (1), (2) and (4). It is straightforward to show that for the kernel-based semi-parametric stochastic frontier model, technical efficiency (TE) can be computed as

$$TE_i = \exp(-u_i), \quad (5)$$

for $i = 1, 2, \ldots, N$.

While we focus on the case of time-invariant inefficiencies in this paper, the above framework can be extended to allow for time-varying inefficiencies. A straightforward way is to assume that $u_{it}$ is i.i.d. over both $i$ and $t$. With this assumption, the density estimator of the inefficiency term becomes

$$\hat p\!\left(u_{it} \mid u_{(it)}, \zeta^2\right) = \frac{1}{(NT-1)\,u_{it}}\sum_{(j,r)\neq(i,t)}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln u_{it} - \ln u_{jr}}{\zeta}\right) I(u_{it} > 0),$$

where $u_{(it)}$ is the inefficiency vector without the $it$th element.

In addition, the above framework can also be extended to allow for inefficiency heterogeneity, an important issue in the stochastic frontier analysis literature (Greene, 2004). Specifically, one can

follow Terrell and Scott (1992) and let the bandwidth differ across firms (i.e., $\zeta_i$, $i = 1, 2, \ldots, N$), thus allowing the inefficiency of each firm to be approximated by a different density estimator. Our Bayesian algorithm given in Section 3 can be modified slightly to accommodate firm-specific

bandwidths ($\zeta_i$). Due to space limitations, we will not dwell on this extension here, but leave it for future research.
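For concreteness, the leave-one-out estimator in (4) can be evaluated as follows. The function name, the toy Gamma-distributed inefficiency draws, and the bandwidth value are our own illustrative assumptions:

```python
import numpy as np

def leave_one_out_density(u, i, zeta):
    """Evaluate the estimator (4) of p(u_i | u_(i), zeta^2): a mixture of
    Gaussian kernels on the log scale, divided by u_i (the Jacobian of
    the log transformation), with the indicator I(u_i > 0)."""
    if u[i] <= 0:
        return 0.0                                     # indicator I(u_i > 0)
    diffs = np.log(u[i]) - np.log(np.delete(u, i))     # leave observation i out
    kernels = np.exp(-0.5 * (diffs / zeta) ** 2) / (np.sqrt(2.0 * np.pi) * zeta)
    return float(kernels.sum() / ((len(u) - 1) * u[i]))

rng = np.random.default_rng(1)
u = rng.gamma(shape=2.0, scale=0.1, size=50)           # toy inefficiency draws
value = leave_one_out_density(u, 0, zeta=0.5)
```

Excluding observation $i$ itself avoids the $\phi(0/\zeta)/\zeta$ term that diverges as the bandwidth shrinks, mirroring the motivation given for the leave-one-out form above.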

3. Bayesian Inference

In this section, we discuss a Bayesian procedure for estimating the kernel-based semi-parametric stochastic frontier model. Since the exponentially-distributed stochastic frontier model and the Dirichlet stochastic frontier model are used as benchmark models in Sections 4 and 5, we also briefly discuss the Bayesian inference procedures for the latter two models. For notational simplicity, we refer to the first model as the "kernel model" and the latter two models as the "exponential model" and the "Dirichlet model", respectively.

3.1. Bayesian Inference for the Kernel Model

We first introduce some matrix notation: $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $y = (y_1', y_2', \ldots, y_i', \ldots, y_N')'$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$, $x = (x_1', x_2', \ldots, x_i', \ldots, x_N')'$, $\tilde x_{it} = (1, x_{it,1}, x_{it,2}, \ldots, x_{it,k})'$, $\tilde x_i = (\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{it}, \ldots, \tilde x_{iT})'$, and $\tilde x = (\tilde x_1', \tilde x_2', \ldots, \tilde x_i', \ldots, \tilde x_N')'$.

For successful inference on our kernel-based stochastic frontier model, we implement the idea of hierarchical centering, as discussed in the Introduction. Such an implementation involves a reparameterization

from $(\alpha, u)$ to $(\alpha, \eta)$, where $\eta = (\eta_1, \eta_2, \ldots, \eta_N)'$ and $\eta_i = \alpha - u_i$. This reparameterization is necessary because our model (i.e., (1)) implies that the data are naturally informative on $\alpha - u_i$, and thus identification of $\alpha$ and $u_i$ separately relies on the distribution of $u_i$. If the distribution of $u_i$ is quite flexible (as in our case), the commonly-used uncentered parameterization can result in very slow convergence. Ritter and Simar (1997) discuss the problems caused by this identification problem in the context of Gamma-distributed stochastic frontier models. To avoid these problems, we follow Griffin and Steel (2004) and use a hybrid sampler, which randomly mixes updates from the centered and uncentered parameterizations. In what follows, we will discuss the centered and uncentered parameterizations in detail.
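The bookkeeping behind the hybrid sampler can be sketched as follows, writing the centered parameter as $\eta_i = \alpha - u_i$ (our notation). The update functions are placeholders standing in for the model-specific Gibbs/Metropolis-Hastings steps, so this is a structural sketch rather than the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

def to_centered(alpha, u):
    """Map (alpha, u) to (alpha, eta) with eta_i = alpha - u_i."""
    return alpha, alpha - u

def to_uncentered(alpha, eta):
    """Inverse map; u_i = alpha - eta_i is non-negative iff alpha >= eta_i."""
    return alpha, alpha - eta

def hybrid_sweep(alpha, u, update_centered, update_uncentered, p_centered=0.5):
    """One sweep of the hybrid sampler: with probability p_centered run the
    centered update on (alpha, eta), otherwise the uncentered update on
    (alpha, u)."""
    if rng.random() < p_centered:
        alpha, eta = update_centered(*to_centered(alpha, u))
        alpha, u = to_uncentered(alpha, eta)
    else:
        alpha, u = update_uncentered(alpha, u)
    return alpha, u

# Round-trip check of the reparameterization
alpha, u = 1.0, np.array([0.20, 0.05, 0.40])
a2, u2 = to_uncentered(*to_centered(alpha, u))  # recovers (alpha, u) exactly
```

Randomly mixing the two parameterizations lets the chain escape the slow mixing that either parameterization alone can exhibit when the inefficiency distribution is flexible.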

3.1.1 Uncentered parameterization

For the uncentered parameterization, we need to specify priors for the following parameters: $(\beta, h, \zeta^2, u)$, where $\beta = (\alpha, \beta_1, \ldots, \beta_k)'$ includes the intercept. Specifically, for $\beta$ and $h$, we follow O'Donnell and Coelli (2005)

and use the following priors:

$$p(\beta) \propto 1, \quad (6)$$
$$p(h) \propto \frac{1}{h}, \quad \text{where } h = \frac{1}{\sigma^2}. \quad (7)$$

The density of ui is approximated by (4), which we copy here for convenience:

$$\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right) = \frac{1}{(N-1)\,u_i}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln u_i - \ln u_j}{\zeta}\right) I(u_i > 0). \quad (8)$$

For the bandwidth, $\zeta$, we treat it as a parameter in this paper. In the context of kernel density estimation based on direct observations, there exist some investigations involving a similar treatment (see, for example, Brewer, 2000; Gangopadhyay and Cheung, 2002; de Lima and Atuncar, 2011). In nonparametric and semi-parametric regression models, bandwidths are also treated as parameters (Härdle, Hall and Ichimura, 1993; Rothe, 2009, among others). Following Zhang et al. (2014), we choose an inverse Gamma density as the prior of $\zeta^2$:

$$p\!\left(\zeta^2\right) = \frac{\gamma_2^{\,\gamma_1}}{\Gamma(\gamma_1)}\left(\frac{1}{\zeta^2}\right)^{\gamma_1 + 1}\exp\!\left(-\frac{\gamma_2}{\zeta^2}\right), \quad (9)$$

where the shape parameter $\gamma_1 = 1$ and the scale parameter $\gamma_2 = 1/0.05$. The reason for choosing an inverse Gamma prior is that $\zeta^2$ is the variance of each Gaussian component in (8), and the prior of the variance of a Gaussian distribution is usually chosen as an inverse Gamma density (Geweke, 2009, pp. 38–39). The likelihood function is shown to be

$$L\!\left(y \mid \beta, h, \zeta^2, u\right) = \prod_{i=1}^{N}\frac{h^{T/2}}{(2\pi)^{T/2}}\exp\!\left\{-\frac{h}{2}\left(y_i - \tilde x_i\beta + u_i\iota_T\right)'\left(y_i - \tilde x_i\beta + u_i\iota_T\right)\right\}, \quad (10)$$

where $\iota_T$ is a $T \times 1$ vector of ones. The joint posterior density $p\!\left(\beta, h, \zeta^2, u \mid y\right)$ is the product of

the priors in (6)–(9) and the likelihood function in (10). In order to simulate from the joint posterior density, we follow previous studies (e.g., Koop, 2003) and use a Gibbs sampler with data augmentation. The expression "data augmentation" comes from the fact that it is convenient to augment the observed data by drawing observations of

$u_i$. The Gibbs sampler with data augmentation involves drawing sequentially from the full conditional posterior densities of the parameters in the model under study. In our case, the full conditional posterior densities are

$$p\!\left(\beta \mid y, h, \zeta^2, u\right) \propto f_{\mathrm{Normal}}\!\left(\beta \mid \bar\beta, \bar V\right), \quad (11)$$
$$p\!\left(h \mid y, \beta, u, \zeta^2\right) \propto f_{\mathrm{Gamma}}\!\left(h \mid v, s^2\right), \quad (12)$$
$$p\!\left(\zeta^2 \mid y, \beta, h, u\right) \propto \left(\frac{1}{\zeta^2}\right)^{\gamma_1 + 1}\exp\!\left(-\frac{\gamma_2}{\zeta^2}\right)\prod_{i=1}^{N}\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right), \quad (13)$$
$$p\!\left(u \mid y, \beta, h, \zeta^2\right) \propto \exp\!\left\{-\frac{h}{2}\sum_{i=1}^{N}\varepsilon_i'\varepsilon_i\right\}\prod_{i=1}^{N}\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right), \quad (14)$$

where $f_{\mathrm{Normal}}$ and $f_{\mathrm{Gamma}}$ denote the normal and Gamma distributions, respectively, $\bar V = \left[h\sum_{i=1}^{N}\tilde x_i'\tilde x_i\right]^{-1}$, $\bar\beta = \bar V\left[h\sum_{i=1}^{N}\tilde x_i'\left(y_i + u_i\iota_T\right)\right]$, $v = TN/2$, $s^2 = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - \tilde x_i\beta + u_i\iota_T\right)'\left(y_i - \tilde x_i\beta + u_i\iota_T\right)$, and $\varepsilon_i = y_i - \tilde x_i\beta + u_i\iota_T$.

3.1.2 Centered parameterization

p( ) 1; (15) / p ( ) 1: (16) /

The prior of $\eta_i$ can be derived from the prior of $u_i$ given by (8):

$$p\!\left(\eta_i \mid \eta_{(i)}, \alpha, \zeta^2\right) \propto \frac{1}{(N-1)(\alpha - \eta_i)}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln(\alpha - \eta_i) - \ln(\alpha - \eta_j)}{\zeta}\right) I(\alpha > \eta_i). \quad (17)$$

The likelihood function after the reparameterization becomes

$$L\!\left(y \mid \alpha, \beta, h, \zeta^2, \eta\right) = \prod_{i=1}^{N}\frac{h^{T/2}}{(2\pi)^{T/2}}\exp\!\left\{-\frac{h}{2}\left(y_i - x_i\beta - \eta_i\iota_T\right)'\left(y_i - x_i\beta - \eta_i\iota_T\right)\right\}. \quad (18)$$

Accordingly, the joint posterior density $p\!\left(\alpha, \beta, h, \zeta^2, \eta \mid y\right)$ is the product of the priors in (7), (9), (15), (16) and (17), and the likelihood in (18). The full conditional posteriors for $\alpha$, $\beta$, $h$, $\zeta^2$, and $\eta$ become, respectively,

$$p\!\left(\alpha \mid y, \beta, \zeta^2, h, \eta\right) \propto p\!\left(\eta \mid \alpha, \zeta^2\right), \quad (19)$$
$$p\!\left(\beta \mid y, \alpha, h, \zeta^2, \eta\right) \propto f_{\mathrm{Normal}}\!\left(\beta \mid \bar\beta^{*}, \bar V^{*}\right), \quad (20)$$
$$p\!\left(h \mid y, \alpha, \beta, \zeta^2, \eta\right) \propto f_{\mathrm{Gamma}}\!\left(h \mid v, s^{*2}\right), \quad (21)$$
$$p\!\left(\zeta^2 \mid y, \alpha, \beta, h, \eta\right) \propto \left(\frac{1}{\zeta^2}\right)^{\gamma_1 + 1}\exp\!\left(-\frac{\gamma_2}{\zeta^2}\right) p\!\left(\eta \mid \alpha, \zeta^2\right), \quad (22)$$
$$p\!\left(\eta \mid y, \alpha, \beta, h, \zeta^2\right) \propto \exp\!\left\{-\frac{h}{2}\sum_{i=1}^{N}\varepsilon_i^{*\prime}\varepsilon_i^{*}\right\} p\!\left(\eta \mid \alpha, \zeta^2\right), \quad (23)$$

where $p\!\left(\eta \mid \alpha, \zeta^2\right) \propto \prod_{i=1}^{N}\frac{1}{(N-1)(\alpha - \eta_i)}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln(\alpha - \eta_i) - \ln(\alpha - \eta_j)}{\zeta}\right) I(\alpha > \max(\eta))$, $\bar V^{*} = \left[h\sum_{i=1}^{N}x_i'x_i\right]^{-1}$, $\bar\beta^{*} = \bar V^{*}\left[h\sum_{i=1}^{N}x_i'\left(y_i - \eta_i\iota_T\right)\right]$, $s^{*2} = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - x_i\beta - \eta_i\iota_T\right)'\left(y_i - x_i\beta - \eta_i\iota_T\right)$, and $\varepsilon_i^{*} = y_i - x_i\beta - \eta_i\iota_T$. Sampling from (19)–(23) can be implemented using the Metropolis-Hastings algorithm with constraints.
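A minimal sketch of such a constrained Metropolis-Hastings update, of the kind needed for the draw of the intercept subject to the constraint $\alpha > \max(\eta)$, is given below. The random-walk proposal, step size, and the toy truncated-normal target are illustrative assumptions, not the conditionals of our model:

```python
import numpy as np

rng = np.random.default_rng(3)

def rw_metropolis_constrained(x0, log_target, lower, n_iter=5000, step=0.5):
    """Random-walk Metropolis-Hastings with a hard lower bound: proposals
    at or below the bound are rejected outright, which enforces the
    constraint while leaving the target on (lower, inf) invariant."""
    x, logp = x0, log_target(x0)
    draws = np.empty(n_iter)
    for s in range(n_iter):
        proposal = x + step * rng.standard_normal()
        if proposal > lower:                      # enforce the constraint
            logp_prop = log_target(proposal)
            if np.log(rng.random()) < logp_prop - logp:
                x, logp = proposal, logp_prop
        draws[s] = x
    return draws

# Toy target: a N(2, 0.5^2) conditional truncated to (1.5, inf)
log_target = lambda a: -0.5 * ((a - 2.0) / 0.5) ** 2
draws = rw_metropolis_constrained(2.0, log_target, lower=1.5)
```

Rejecting out-of-bound proposals is equivalent to assigning them zero posterior density, so the chain never leaves the constrained region.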

3.2. Bayesian Inference for the Exponential Model

When $u_i$ is exponentially distributed, (1) becomes an exponential stochastic frontier model. For this model, we consider only the uncentered parameterization because the exponential distribution is not flexible, and thus a reparameterization from $(\alpha, u)$ to $(\alpha, \eta)$ is not required. Consequently,

we need to specify priors for the following parameters: $(\beta, h, \theta^{-1}, u)$. To facilitate comparison of the simulation and empirical results between this model and the kernel model in Sections 4 and 5, we use the same priors for $\beta$ and $h$ as in the kernel model, which are given by (6) and (7). Since the

exponential distribution is a special case of the Gamma distribution, the prior of ui is

$$p\!\left(u_i \mid \theta^{-1}\right) = f_{\mathrm{Gamma}}\!\left(u_i \mid 1, \theta^{-1}\right). \quad (24)$$

According to Fernandez et al. (1997), in order to obtain a proper posterior we need a proper prior for the parameter $\theta^{-1}$. Therefore, we choose

$$p\!\left(\theta^{-1}\right) = f_{\mathrm{Gamma}}\!\left(\theta^{-1} \mid 1, -\ln \tau^{*}\right), \quad (25)$$

where $\tau^{*}$ is the prior median of the efficiency distribution. For our application to U.S. large commercial banks in Section 5, we experiment with various values of $\tau^{*}$ ranging from 0.50 to 0.99 and

find that the results reported in Section 5 are very robust to large changes in $\tau^{*}$. Other studies that use the exponential inefficiency distribution have reported similar findings. For example, Koop et al. (1997) investigate the efficiency of U.S. hospitals during 1987–1991 and find that their empirical results are extremely robust to enormous changes in $\tau^{*}$; Feng and Zhang (2012) examine the technical efficiency of large banks in the U.S. during 1997–2006 and find that their results are very robust to large changes in $\tau^{*}$. With regard to the likelihood function, it is the same as in the uncentered parameterization for

the kernel model, which is (10). Therefore, the joint posterior density $p\!\left(\beta, h, \theta^{-1}, u \mid y\right)$ becomes the product of the priors in (6), (7), (24), and (25) and the likelihood function in (10).

It is straightforward to show that the full conditional posterior densities for $\beta$, $h$, $\theta^{-1}$ and $u$ are

$$p\!\left(\beta \mid y, h, \theta^{-1}, u\right) \propto f_{\mathrm{Normal}}\!\left(\beta \mid \bar\beta, \bar V\right), \quad (26)$$
$$p\!\left(h \mid y, \beta, u, \theta^{-1}\right) \propto f_{\mathrm{Gamma}}\!\left(h \mid v, s^2\right), \quad (27)$$
$$p\!\left(\theta^{-1} \mid y, \beta, h, u\right) \propto f_{\mathrm{Gamma}}\!\left(\theta^{-1} \mid N + 1, u'\iota_N - \ln \tau^{*}\right), \quad (28)$$
$$p\!\left(u_i \mid y, \beta, h, \theta^{-1}\right) \propto f_{\mathrm{Normal}}\!\left(u_i \mid z_i\beta - \bar q_i - (Th)^{-1}\theta^{-1}, (Th)^{-1}\right) I(u_i > 0), \quad (29)$$

where $\bar V$, $\bar\beta$, $v$, and $s^2$ are defined as above, $\bar q_i = \sum_{t=1}^{T} q_{it}/T$, and $z_i$ is a matrix containing the average value of each explanatory variable for individual $i$. Under very mild assumptions (Tierney, 1994), these draws will converge to draws from the joint posterior density function.
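The full conditionals (28) and (29) are standard distributions, so the corresponding Gibbs draws are easy to sketch. Writing $\theta^{-1}$ for the exponential-distribution parameter, the function names, the rejection-sampling scheme for the truncation, and the toy inputs below are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_theta_inv(u, tau_star):
    """Draw the exponential-model parameter from its Gamma full
    conditional (28): shape N + 1, rate u'iota_N - ln(tau_star)."""
    return rng.gamma(shape=len(u) + 1.0,
                     scale=1.0 / (u.sum() - np.log(tau_star)))

def draw_u_i(mean, var):
    """Draw u_i from the truncated-normal full conditional (29) by
    simple rejection sampling against the I(u_i > 0) constraint."""
    while True:
        candidate = mean + np.sqrt(var) * rng.standard_normal()
        if candidate > 0.0:
            return candidate

u = rng.exponential(0.2, size=100)      # toy current inefficiency draws
theta_inv = draw_theta_inv(u, tau_star=0.85)
ui = draw_u_i(mean=0.1, var=0.01)
```

Rejection sampling is adequate here as a sketch; a production sampler would typically use a dedicated truncated-normal generator when the conditional mean is far below zero.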

3.3. Bayesian Inference for the Dirichlet Model

In this model, the distribution of ui is modeled non-parametrically through a Dirichlet process prior. Formally,

$$u_i \mid F \sim F, \qquad F \sim DP(M, H), \quad (30)$$

where $F$ is a random probability measure; $DP(\cdot)$ is a Dirichlet process; $M$ is a positive scaling parameter; and $H$ is a base distribution. The realizations of $F$ are discrete distributions (Ferguson, 1973), and there are $K \leq N$ distinct values of the inefficiencies for the $N$ firms $(u_1, u_2, \ldots, u_N)$. As in Griffin and Steel (2004), we call these distinct values $(u_{(1)}, u_{(2)}, \ldots, u_{(K)})$ and let $n_i$ be the number

of times $u_{(i)}$ occurs. With this notation, the prediction for the inefficiency of a new firm is written as follows:

$$F\!\left(u_{N+1} \mid u_{(1)}, u_{(2)}, \ldots, u_{(K)}, n_1, n_2, \ldots, n_K\right) = \sum_{i=1}^{K}\frac{n_i}{M+N}\,\delta_{u_{(i)}} + \frac{M}{M+N}\,H, \quad (31)$$

where $\delta_{u_{(i)}}$ is a point mass (indicator) that takes the value one if $u_{N+1} = u_{(i)}$ (i.e., $u_{N+1}$ belongs to the

$i$th cluster). Equation (31) indicates that the probability of $u_{N+1}$ taking a value from the existing cluster $i$ is $\frac{n_i}{M+N}$, and the probability of $u_{N+1}$ taking a new value from the centring distribution $H$ is $\frac{M}{M+N}$.

Following Griffin and Steel (2004), we use the following priors for the three parameters ($M$, $H$ and $\theta$) related to the inefficiency term:

$$M \sim \text{inverted-Be}(a, b), \quad (32)$$
$$H = \mathrm{Ga}(1, \theta), \quad (33)$$
$$\theta \sim \mathrm{Ga}(1, -\ln r^{*}), \quad (34)$$

where $\text{Be}(\cdot, \cdot)$ denotes a Beta distribution; $\mathrm{Ga}(\cdot, \cdot)$ denotes a Gamma distribution; and $r^{*}$ is the implied prior median efficiency. Priors for the other parameters are also chosen following Griffin and Steel (2004). Due to space limitations, the full conditional posterior densities for this model are omitted. Interested readers are referred to Griffin and Steel (2004) for more details.
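The Pólya-urn predictive (31) is straightforward to simulate. The function below is an illustrative sketch with an exponential stand-in for the base distribution $H$; the cluster values, counts, and value of $M$ are toy inputs rather than estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_new_inefficiency(clusters, counts, M, base_draw):
    """Draw u_{N+1} from the predictive (31): pick the existing cluster
    value u_(i) with probability n_i / (M + N), or a fresh draw from the
    base (centring) distribution H with probability M / (M + N)."""
    N = sum(counts)
    probs = np.array(counts + [M], dtype=float) / (M + N)
    k = rng.choice(len(probs), p=probs)
    if k < len(clusters):
        return clusters[k]            # join existing cluster k
    return base_draw()                # new value drawn from H

# K = 2 clusters among N = 10 firms; exponential stand-in for H
clusters, counts = [0.10, 0.35], [6, 4]
u_new = draw_new_inefficiency(clusters, counts, M=2.0,
                              base_draw=lambda: rng.exponential(0.2))
```

Larger values of the scaling parameter $M$ make new clusters more likely, which is how the Dirichlet process controls the flexibility of the inefficiency distribution.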

4. Monte Carlo Simulation Study

In this section, we perform a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In doing so, we use two benchmark models for comparison: (1) the exponential stochastic frontier model; and (2) the Dirichlet stochastic frontier model of Griffin and Steel (2004). The former benchmark model is chosen because it is arguably the most commonly-used parametric stochastic frontier model (O'Donnell and Coelli, 2005; Koop et al., 1997). The latter benchmark model is chosen because it is arguably the best-known semi-parametric stochastic frontier model in which the inefficiency term is modeled nonparametrically. The data generating process is given by

$$y_{it} = 1 + x_{it} - u_i + v_{it}, \quad (35)$$

where $x_{it}$ is a scalar generated from a uniform distribution, $x_{it} \sim U(0, 1)$, and the measurement error, $v_{it}$, is generated from $v_{it} \sim N(0, 0.04)$, as in Koop (2003). For the non-negative inefficiency term, $u_i$, we consider two distributions: (1) a generalized Gamma distribution, $GGa(2, 2, 20)$; and (2) an exponential distribution with parameter $-\log(0.85)$. The reason for using the generalized Gamma distribution is that, to the best of our knowledge, this distribution is by far the most general parametric distribution used in the literature (see, for example, Griffin and Steel, 2008) and nests many commonly used parametric distributions, such as the Gamma distribution, the exponential distribution, and the half-normal distribution, as special cases. The reason for using the exponential distribution is to evaluate the performance of our kernel model in the extreme situation where the exponential model has full advantages.

In our Monte Carlo study, we consider $N = 100$, $200$, $300$, $400$, and $T = 6$ (a short panel is used to ensure the time-invariance assumption on $u_i$). For each experiment, 500 repetitions are conducted. As mentioned in the Introduction, our assessment of the performance of the kernel model against the exponential model is based on three different measures. The first measure is the average Euclidean distance between the estimated vector of technical efficiencies (denoted as TE_est) and the true vector of technical efficiencies (denoted as TE_true):

$$d = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{TE\_est}_i - \mathrm{TE\_true}_i\right)^2}.$$

This measure summarizes the performance of alternative estimators in calculating technical efficiencies. The second measure is the Spearman rank correlation coefficient between the estimated vector of technical efficiencies and the true vector of technical efficiencies:

\rho = 1 - \frac{6 \sum_{i=1}^{N} (Rank_{i1} - Rank_{i2})^2}{N (N^2 - 1)},

where Rank_{i1} is the rank of firm i in terms of estimated technical efficiency and Rank_{i2} is the rank of the same firm in terms of true technical efficiency. This measure summarizes the performance of alternative estimators in ranking firms in terms of their efficiencies. The closer it is to 1, the higher the rank correlation between the estimated vector of technical efficiencies and the true vector of technical efficiencies, and thus the better the estimator. The third measure is the coverage probability of the credible interval, i.e., the probability that the estimated credible interval contains the true technical

efficiency value in 500 repetitions. This latter measure gauges the performance of alternative estimators in terms of the accuracy of their credible intervals. Table 1a summarizes the simulation results for the case where the true inefficiency is generated from the generalized Gamma distribution. We first compare the kernel model and the exponential parametric model. As can be seen from Panel A of this table, when N = 100 and T = 6, the kernel model outperforms the exponential parametric model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the latter on the other measure (i.e., the Spearman rank correlation coefficient). However, when N is large enough (i.e., N ≥ 200), the kernel model outperforms the exponential parametric model on all three measures. We then compare the kernel model and the Dirichlet model. For all four combinations of N and T, the kernel model outperforms the Dirichlet model on two out of the three measures, namely, the average Euclidean efficiency distance and the coverage probability. Taking the case where N = 400 and T = 6 for example, the average Euclidean efficiency distance for the kernel model is 0.004, whereas that for the Dirichlet model is 0.005, suggesting that on average estimates of technical efficiency obtained from the kernel model are more accurate than those obtained from the Dirichlet model. The coverage probability for the kernel model is 0.910, indicating that for most observations, the credible interval obtained from the kernel model contains the true efficiency value. In contrast, the coverage probability for the Dirichlet model is lower, being 0.820. That said, we note that the kernel model underperforms the Dirichlet model on the third measure (i.e., the Spearman rank correlation coefficient). For example, when N = 400 and T = 6, the Spearman correlation coefficient is 0.710 for the kernel model and 0.756 for the Dirichlet model.
This suggests that the efficiency ranking based on the kernel model has a lower correlation with the true efficiency ranking than does the efficiency ranking based on the Dirichlet model. Table 1b presents the simulation results for the case where the true inefficiency is generated from the exponential distribution. As expected, both the kernel and Dirichlet stochastic frontier models underperform the exponential stochastic frontier model under this setting. Furthermore, it is interesting to note that, as in the case of the generalized Gamma setting, the kernel model

still outperforms the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the Dirichlet model on the other measure (i.e., the Spearman rank correlation coefficient). In summary, the simulation results suggest that the kernel model outperforms the exponential model on all three measures as long as N is large enough (i.e., N ≥ 200). In addition, the kernel model performs better than the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability).
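As a concrete illustration of the design above, the following sketch simulates the data generating process in (35) with exponential inefficiency and computes the three comparison measures. The reading of Exp(-log(0.85)) as an exponential with mean -log(0.85), the seed, and all numerical settings are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_panel(N=100, T=6):
    """Simulate y_it = 1 + x_it - u_i + v_it with time-invariant u_i."""
    x = rng.uniform(0.0, 1.0, size=(N, T))            # x_it ~ U(0, 1)
    v = rng.normal(0.0, np.sqrt(0.04), size=(N, T))   # v_it ~ N(0, 0.04)
    u = rng.exponential(scale=-np.log(0.85), size=N)  # mean inefficiency -log(0.85)
    y = 1.0 + x - u[:, None] + v
    return y, x, np.exp(-u)                           # TE_i = exp(-u_i)

def euclidean_distance(te_est, te_true):
    """d = sqrt((1/N) * sum_i (TE_est_i - TE_true_i)^2)."""
    return np.sqrt(np.mean((np.asarray(te_est) - np.asarray(te_true)) ** 2))

def spearman_rho(te_est, te_true):
    """rho = 1 - 6 * sum_i d_i^2 / (N (N^2 - 1)), d_i the rank difference."""
    n = len(te_est)
    d = stats.rankdata(te_est) - stats.rankdata(te_true)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1.0))

def coverage_probability(lower, upper, te_true):
    """Share of credible intervals that contain the true efficiency."""
    return np.mean((lower <= te_true) & (te_true <= upper))
```

A perfect estimator would give d = 0, rho = 1, and coverage close to the nominal credibility level.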

5. An Application to U.S. Large Commercial Banks

In this section we apply a kernel-based semi-parametric stochastic distance frontier (SDF) model to analyze the efficiency of large banks in the U.S. over the period 2000–2005. In doing so, we mainly use the corresponding exponential parametric SDF model as a benchmark. In addition, we also briefly compare the kernel-based SDF model with the corresponding Dirichlet-process-based semi-parametric SDF model at the end of this section. The data used are obtained from the Reports of Income and Condition (Call Reports) published by the Federal Reserve Bank of Chicago. We examine only continuously operating large banks to avoid the impact of entry and exit and to focus on the performance of a sample of healthy, surviving institutions during the sample period. In this application, large banks are defined to be those with assets of at least $1 billion (in 2000 dollars) in the last three sample years. This gives a total of 292 banks over 6 years. To select the relevant variables, we follow the commonly-accepted intermediation approach proposed by Sealey and Lindley (1977), whereby banks collect purchased funds and use labor and capital to transform these funds into loans and other assets. On the input side, three inputs are included: i) the quantity of labor; ii) the quantity of purchased funds and deposits; and iii) the quantity of physical capital, which includes premises and other fixed assets. On the output side, three outputs are specified: i) consumer loans; ii) securities, which includes all non-loan financial assets (i.e., all financial assets minus the sum of all loans, securities, and equity); and iii) non-consumer loans, which is composed of industrial, commercial, and real estate loans. All the quantities are constructed by following the data construction method in Berger and Mester

(2003). These quantities are also deflated by the GDP deflator to the base year 2000, except for the quantity of labor. Following the common practice in the literature, all variables are normalized by their sample means, as this normalization can facilitate the evaluation of the monotonicity and curvature conditions below (e.g., O'Donnell and Coelli, 2005; Malikov et al., 2015).

5.1 The Kernel-based Semi-parametric SDF Model and the Exponential SDF Model

To allow for multiple outputs, we use a translog output distance function to represent the production technology of commercial banks in the U.S. As can be seen below, this output distance function can be transformed into an estimable equation that has the standard form of the stochastic frontier model as described in Section 2. Specifically, the translog output distance function is written as

\ln D_o(y, x, t) = a_0 + \sum_{m=1}^{M} a_m \ln y_m + \frac{1}{2} \sum_{m=1}^{M} \sum_{p=1}^{M} a_{mp} \ln y_m \ln y_p
  + \sum_{l=1}^{L} b_l \ln x_l + \frac{1}{2} \sum_{l=1}^{L} \sum_{j=1}^{L} b_{lj} \ln x_l \ln x_j + \delta_t t + \frac{1}{2} \delta_{tt} t^2
  + \sum_{l=1}^{L} \sum_{m=1}^{M} g_{lm} \ln x_l \ln y_m + \sum_{m=1}^{M} \theta_m t \ln y_m + \sum_{l=1}^{L} \phi_l t \ln x_l,  (36)

where t denotes a time trend serving as a proxy for technical change. The usual symmetry restrictions require a_{mp} = a_{pm} and b_{lj} = b_{jl}. Moreover, to ensure linear homogeneity of the output distance function in y, the following restrictions are imposed

\sum_{m=1}^{M} a_m = 1; \quad \sum_{p=1}^{M} a_{mp} = 0 \ (m = 1, \ldots, M); \quad \sum_{m=1}^{M} g_{lm} = 0 \ (l = 1, \ldots, L); \quad \sum_{m=1}^{M} \theta_m = 0.  (37)

The output distance function in (36) can be easily transformed into a standard stochastic frontier model by exploiting the linear homogeneity restrictions in (37). Specifically, we follow Lovell et al. (1994) and O'Donnell and Coelli (2005) and impose the linear homogeneity by normalizing

(36) by one of the outputs (say, output M). This normalization gives us

-\ln y_M = \ln D_o\left( \frac{y}{y_M}, x, t \right) + u,  (38)

where u \equiv -\ln D_o(y, x, t) \ge 0 is the usual inefficiency term. After adding an idiosyncratic error, v, (38) can be further written as

-\ln y_M = \ln D_o\left( \frac{y}{y_M}, x, t \right) + u + v,  (39)

which is an estimable equation in the form of a standard stochastic frontier model as described in Section 2.
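For completeness, the homogeneity step invoked in the normalization can be spelled out: since D_o has degree-one homogeneity in outputs,

```latex
D_o\!\left(\frac{y}{y_M}, x, t\right) = \frac{D_o(y, x, t)}{y_M}
\quad\Longrightarrow\quad
\ln D_o\!\left(\frac{y}{y_M}, x, t\right) = \ln D_o(y, x, t) - \ln y_M ,
```

so substituting the inefficiency term u = -ln D_o(y, x, t) and rearranging gives -ln y_M = ln D_o(y/y_M, x, t) + u, which is (38).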

Noting that \ln D_o(y/y_M, x, t) has a translog functional form, the stochastic distance frontier model in (39) can be written more explicitly as

-\ln y_M = a_0 + \sum_{m=1}^{M-1} a_m \ln\left(\frac{y_m}{y_M}\right) + \frac{1}{2} \sum_{m=1}^{M-1} \sum_{p=1}^{M-1} a_{mp} \ln\left(\frac{y_m}{y_M}\right) \ln\left(\frac{y_p}{y_M}\right)
  + \sum_{l=1}^{L} b_l \ln x_l + \frac{1}{2} \sum_{l=1}^{L} \sum_{j=1}^{L} b_{lj} \ln x_l \ln x_j + \delta_t t + \frac{1}{2} \delta_{tt} t^2
  + \sum_{l=1}^{L} \sum_{m=1}^{M-1} g_{lm} \ln x_l \ln\left(\frac{y_m}{y_M}\right) + \sum_{m=1}^{M-1} \theta_m t \ln\left(\frac{y_m}{y_M}\right) + \sum_{l=1}^{L} \phi_l t \ln x_l + u + v.  (40)

In matrix notation, equation (40) can be written compactly as

q_{it} = z_{it}' \gamma + u_i + v_{it},  (41)

where q_{it} = -\ln y_{M,it}, z_{it} is a vector comprising all the variables appearing on the right-hand side of (40), and \gamma refers to the corresponding vector of coefficients of the translog function (including the intercept).
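To illustrate how the regressor vector in (41) is assembled from the translog terms in (40), here is a minimal sketch; the ordering of regressors, the folding of the 1/2 factors into the coefficients, and the function name are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from itertools import combinations_with_replacement as cwr

def translog_row(y, x, t):
    """Return (q, z) for one observation: q = -ln y_M and the translog
    regressors of eq. (40), with output M used for normalization.
    Symmetry is imposed by keeping each cross-product once, so the 1/2
    factors are absorbed into the estimated coefficients."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    ly = np.log(y[:-1] / y[-1])                   # ln(y_m / y_M), m = 1..M-1
    lx = np.log(x)                                # ln x_l, l = 1..L
    z = [1.0]                                     # intercept a_0
    z += ly.tolist()                              # a_m terms
    z += [ly[m] * ly[p] for m, p in cwr(range(ly.size), 2)]  # a_mp terms
    z += lx.tolist()                              # b_l terms
    z += [lx[l] * lx[j] for l, j in cwr(range(lx.size), 2)]  # b_lj terms
    z += [t, t * t]                               # trend terms
    z += [lx[l] * ly[m] for l in range(lx.size)
                        for m in range(ly.size)]  # g_lm terms
    z += [t * v for v in ly]                      # output-trend interactions
    z += [t * v for v in lx]                      # input-trend interactions
    return -np.log(y[-1]), np.array(z)
```

With M = 3 outputs and L = 3 inputs, as in the application, this yields a 28-element regressor vector before any further restrictions are imposed.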

If the density of ui is approximated by the kernel density function in (4), we obtain a kernel- based semi-parametric stochastic distance frontier model. If ui is assumed to be distributed as an exponential distribution, we obtain an exponential parametric stochastic distance frontier model. If

u_i is approximated by the Dirichlet process, we obtain a Dirichlet-process-based semi-parametric stochastic distance frontier model. For notational convenience, we refer to the first model as the "kernel SDF model" and the latter two models as the "exponential SDF model" and the "Dirichlet SDF model", respectively.

5.2 Statistical Comparison of the Kernel and Exponential SDF Models

Before proceeding to model comparison, we first check whether the monotonicity and curvature conditions implied by microeconomic theory are satisfied for the two SDF models. Monotonicity requires that

k_l = \partial \ln D_o(y, x, t) / \partial \ln x_l = b_l + \sum_{j=1}^{L} b_{lj} \ln x_j + \sum_{m=1}^{M} g_{lm} \ln y_m + \phi_l t \le 0

for l = 1, 2, 3, and

r_m = \partial \ln D_o(y, x, t) / \partial \ln y_m = a_m + \sum_{p=1}^{M} a_{mp} \ln y_p + \sum_{l=1}^{L} g_{lm} \ln x_l + \theta_m t \ge 0

for m = 1, 2, 3. To simplify these nonlinear constraints, we follow O'Donnell and Coelli (2005) and deflate the sample data so that all output and input variables have a sample mean of one (as mentioned above) and the time trend has a sample mean of zero. When evaluated at these variable means, \partial \ln D_o(y, x, t) / \partial \ln x_l and \partial \ln D_o(y, x, t) / \partial \ln y_m collapse to b_l and a_m respectively, and the monotonicity conditions can therefore be expressed as b_l \le 0 and a_m \ge 0. As for curvature, it requires that D_o(y, x, t) be quasi-convex in inputs and convex in outputs (O'Donnell and Coelli, 2005; Feng and Serletis, 2010). We first estimate both SDF models without imposing the monotonicity and curvature conditions. These regularity conditions are then checked by evaluating the posterior means of k_l, r_m, the bordered Hessian matrix associated with the quasi-convexity in inputs, and the Hessian matrix associated with the convexity in outputs.4 We then calculate the proportions of regularity violations relative to the total number of observations, and present the results in Table 2. For the exponential

SDF model, Table 2 shows that only two (k_2 and r_1) of the six monotonicity conditions are satisfied at all the observations and that both curvature conditions are violated, with the convexity in outputs being violated at all observations. For the kernel SDF model, all the regularity conditions are violated. In addition, compared with the exponential SDF model, the kernel SDF model has more violations for each of the regularity conditions. Since monotonicity and curvature are not attained in the unconstrained models, we reestimate the two models with these conditions imposed, by following the Bayesian procedure discussed in O'Donnell and Coelli (2005).

4 Due to space limitations, these two matrices are not presented here. For more details, see Feng and Serletis (2010).

The estimated parameters and their associated 95% credible intervals for the two models are reported in Tables 3 and 4, respectively. To check the convergence performance of the sampling algorithms, we also report the simulation inefficiency factor (SIF) for each coefficient of the two models. The SIF can be interpreted as the number of successive iterations needed to obtain independent draws (see, for example, Kim et al., 1998). In our experience, a sampler achieves reasonable mixing performance when the resulting SIF value is below 100. Looking at the SIF values for the kernel SDF model in Table 3, we see that all of them are less than 70, suggesting that the sampler has converged. This is also true for the exponential SDF model, because the SIF values for this latter model (shown in Table 4) are all below 60. In addition, we note that the use of the centering parameterization has significantly improved the convergence of the kernel SDF model. We now turn to comparing the estimation performance of the kernel and exponential SDF models. In doing so, we compute the Bayes factor (Kass and Raftery, 1995), a commonly-used

Bayesian method for model comparison. Letting M_I and M_J denote two competing models, the

Bayes factor is defined as the ratio of the posterior odds of M_J to M_I divided by the prior odds of M_J to M_I. When both models have equal prior probability, the Bayes factor becomes

B_{JI} = \frac{\Pr(D \mid M_J)}{\Pr(D \mid M_I)},

where \Pr(D \mid M_J) and \Pr(D \mid M_I) are the marginal likelihoods of the data under M_J and M_I, respectively. The Bayes factor summarizes "the evidence provided by the data in favor of one scientific theory, represented by a statistical model, as opposed to another" (Kass and Raftery, 1995).

Kass and Raftery (1995) suggest using the Schwarz criterion,

S = l(D \mid M_J) - l(D \mid M_I) - \frac{1}{2} (d_J - d_I) \log n \approx \ln B_{JI},

for evaluating evidence, where l(\cdot) is the maximized log marginal likelihood, d_J (d_I) is the number of parameters of model J (model I), and n is the sample size. This gives an approximation to the Bayes factor without specification of an explicit prior. The quantity 2S
can then be used with the following table to judge which model is preferred by the data

2 ln B_JI    Evidence against M_I
0 to 2       Not worth more than a bare mention
2 to 6       Positive
6 to 10      Strong
> 10         Very strong

For our particular case, the value of 2S of the kernel SDF model against the exponential SDF model is 13.7, strongly suggesting that the former outperforms the latter. This result further confirms our previous conclusion from the Monte Carlo simulations that the kernel-based semi-parametric stochastic frontier model outperforms the exponential stochastic frontier model.
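The Schwarz approximation and the Kass-Raftery evidence table can be wrapped in a short helper; the function names and the log-likelihood inputs here are hypothetical, for illustration only.

```python
import math

def schwarz_2s(loglik_J, loglik_I, d_J, d_I, n):
    """2S = 2 [ l(D|M_J) - l(D|M_I) - (1/2)(d_J - d_I) log n ] ~ 2 ln B_JI."""
    return 2.0 * (loglik_J - loglik_I - 0.5 * (d_J - d_I) * math.log(n))

def evidence_against_I(two_ln_b):
    """Verbal scale of Kass and Raftery (1995) for 2 ln B_JI."""
    if two_ln_b <= 2.0:
        return "not worth more than a bare mention"
    if two_ln_b <= 6.0:
        return "positive"
    if two_ln_b <= 10.0:
        return "strong"
    return "very strong"
```

For example, `evidence_against_I(13.7)` returns "very strong", matching the kernel-versus-exponential comparison above.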

5.3 Comparison of Efficiency Estimates from the Kernel and Exponential SDF Models

Having established the superiority of the kernel SDF model using the Bayes factor, it is of interest to compare the efficiency estimates from the two SDF models. In particular, in what follows we will compare the two SDF models in terms of the posterior efficiency distributions of observed (within-sample) banks and the posterior predictive efficiency density function of an unobserved (out-of-sample) bank. We first compare posterior efficiency density functions for some observed banks obtained from the two SDF models. Figure 1 presents the posterior efficiency densities for five banks: the banks corresponding to the minimum, maximum and quartiles of the efficiency distribution, as measured by the posterior mean efficiencies in the exponential parametric model.5 We see that for the worst, best, and third-quartile banks, there is a big difference between the efficiency inferences resulting from the two SDF models. Specifically, the best and third-quartile banks see their efficiency estimates artificially overestimated due to the use of the exponential functional form of the parametric model, while the worst bank sees its efficiency estimate artificially underestimated due to the use of

5The posterior efficiency density of each bank is obtained by applying a kernel density estimator to simulated values of efficiency resulting from the estimation of the corresponding SDF model, with bandwidths selected by likelihood cross-validation.

the exponential model. This finding is consistent with Griffin and Steel (2004), who suggest that firm-specific efficiency distributions obtained from the exponential parametric model are substantially different from those obtained from semi-parametric stochastic frontier models. With regard to the median and first-quartile banks, the difference between the efficiency inferences resulting from the two SDF models is also noticeable, though not as pronounced.6 Next, we turn to comparing the posterior predictive efficiency density function corresponding to an unobserved bank produced by the two SDF models. Theoretically, the posterior predictive efficiency density function can be calculated by using the following formula:

p(TE_{N+1} \mid y_1, y_2, \ldots, y_N) = \int p(TE_{N+1} \mid \theta) \, p(\theta \mid y_1, y_2, \ldots, y_N) \, d\theta,

where \theta denotes all the parameters and latent variables (including the bandwidth h and the inefficiencies u) in the model under consideration. Given draws \theta^{(s)}, s = 1, \ldots, M, from the posterior density function p(\theta \mid y_1, y_2, \ldots, y_N), the posterior predictive efficiency density function can be approximated by M^{-1} \sum_{s=1}^{M} p(TE_{N+1} \mid \theta^{(s)}). The posterior predictive efficiency density functions obtained from the two SDF models are plotted in Figure 2. As can be seen from this figure, the exponential SDF model is too restrictive to capture the data information. In particular, the kernel SDF model places a lot of probability mass in the interval (0.6, 0.8), which cannot be accommodated by the parametric model based on an exponential inefficiency distribution. The marked difference between the kernel and exponential SDF models is also reflected in their estimates of technical change, returns to scale, and productivity growth, as evidenced by Tables 5, 6, and 7, respectively. Here, technical change (TC), returns to scale (RTS), and productivity growth (TFPG) are defined as in Caves et al. (1982) and Färe and Grosskopf (1994, p. 103):

TC = -\partial \ln D_o(y, x, t) / \partial t;

RTS = \sum_{l=1}^{L} \varepsilon_l, \quad \text{where } \varepsilon_l = -\partial \ln D_o(y, x, t) / \partial \ln x_l;

TFPG = TC + SC + EC, \quad \text{where } SC = (RTS - 1) \sum_{l=1}^{L} \frac{\varepsilon_l}{RTS} \dot{x}_l;

6 We find that the difference is much larger for the median and first-quartile banks when the theoretical monotonicity and curvature conditions implied by microeconomic theory are not imposed. This suggests that for the latter two banks, the relatively smaller difference between the efficiency inferences resulting from the two SDF models is caused by the imposition of the theoretical regularity conditions.

where \dot{x}_l is the growth rate of input l (i.e., \dot{x}_l = d \ln x_l / dt), SC is the scale effect, and EC is efficiency change, which is zero because inefficiency is assumed to be constant over time in this paper. Table 5 shows that the estimates of the average annual technical change obtained from the kernel SDF model range from -1.19% to 4.18%, whereas those obtained from the exponential SDF model vary within a narrower range (0.030% to 3.98%). We also note that while both SDF models suggest that technical change declines over time, the kernel model shows a faster declining rate. Turning to the estimates of returns to scale displayed in Table 6, we see that both the point estimates and 95% credible intervals produced by the kernel SDF model are larger than one (ranging from 1.0291 to 1.0361), indicating that large banks in the U.S. show increasing returns to scale during the sample period. In contrast, all the 95% credible intervals produced by the exponential SDF model contain one, indicating that large banks in the U.S. show almost constant returns to scale. Finally, looking at Table 7, we see that compared with the estimates of average annual productivity growth resulting from the exponential SDF model, those resulting from the kernel SDF model vary within a wider range and also decline at a faster rate over time. Although the estimates of technical change and productivity growth obtained from the two models are quantitatively different, they are qualitatively similar to those reported by previous studies. For example, Feng and Zhang (2012) apply a Bayesian true random-effects translog stochastic distance frontier model to a panel of 350 large banks over the period 1997–2006 and find that technical change (productivity growth) declines from 2.53% to 0.05% (from 2.41% to 1.64%). Finally, we briefly compare the efficiency ranking of individual banks based on the kernel SDF model with that based on the exponential SDF model.
It is well known that efficiency estimates are very sensitive to the parametric distribution assumed for the inefficiency term, whereas efficiency rankings are highly robust to different parametric distributions (Greene, 2008). Thus, it is of interest to examine whether rankings are still robust when a parametric exponential distribution is assumed for the inefficiency term in one model, while a nonparametric kernel density estimator is used to approximate the inefficiency density in the other model. For this purpose, we calculate both the Pearson correlation coefficient and the Spearman rank correlation coefficient between the vector of technical efficiencies obtained from the exponential SDF model and that obtained from the kernel

SDF model. We find that both correlation coefficients are quite high, with the former being 0.928 and the latter being 0.929, suggesting that the efficiency ranking based on the kernel SDF model has a high correlation with that based on the exponential SDF model. This result indicates that rankings are not only highly robust to different parametric distributions, but also quite robust when the inefficiency term is modeled nonparametrically.
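The mixture approximation M^{-1} \sum_s p(TE_{N+1} | \theta^{(s)}) used for the posterior predictive efficiency density can be sketched as follows; the exponential conditional density and the posterior draws shown are illustrative placeholders, not output from the paper's sampler.

```python
import numpy as np

def predictive_te_density(grid, draws, cond_density):
    """Average the conditional efficiency density over posterior draws:
    p(TE_{N+1} | data) ~ (1/M) * sum_s cond_density(grid, theta_s)."""
    dens = np.zeros_like(grid, dtype=float)
    for theta in draws:
        dens += cond_density(grid, theta)
    return dens / len(draws)

def exp_te_density(grid, lam):
    """If u ~ Exponential(mean lam), then TE = exp(-u) has density
    (1/lam) * te^(1/lam - 1) on (0, 1) by change of variables."""
    return (1.0 / lam) * grid ** (1.0 / lam - 1.0)

grid = np.linspace(0.001, 0.999, 1000)
draws = [0.12, 0.15, 0.18]          # hypothetical posterior draws of lam
dens = predictive_te_density(grid, draws, exp_te_density)
```

Because each conditional density integrates to one, the mixture does as well, so the approximation is itself a proper density over the efficiency interval.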

5.4 Comparison of Efficiency Estimates from the Kernel and Dirichlet SDF Models

It would also be of interest to briefly compare the estimates of technical efficiency obtained from the kernel SDF model with those obtained from the Dirichlet SDF model. Before that, we compute the Bayes factor between these two models; the value of 2S of the kernel SDF model against the Dirichlet SDF model is 3.8, suggesting that the former slightly outperforms the latter. Figure 3 plots the posterior efficiency densities obtained using the Dirichlet SDF model, together with those obtained using the kernel SDF model. As can be seen, the firm-specific posterior efficiency densities obtained from the two models largely overlap with each other. This is especially true for the third-quartile bank, where the posterior efficiency densities obtained from the two models almost coincide. Figure 4 plots the posterior predictive efficiency density functions obtained from the two models.7 This latter figure shows that the posterior predictive efficiency distributions from the two models are very close to each other, both placing a lot of probability mass in the interval (0.6, 0.8). Thus, in general we find that the kernel and Dirichlet SDF models produce quite similar results.

5.5 Robustness Check

In this subsection, we examine the robustness of our findings regarding technical efficiency to alternative priors for the bandwidth parameter. Specifically, following Zhang et al. (2009, 2006), we consider the following four priors for the bandwidth parameter: (1) the exponential density, which is a special case of the inverse Gamma density in (9); (2) the half-Cauchy density, which belongs to the half-t family and is a special case of the folded non-central t distribution (Johnson and

7Note that as in Griffin and Steel (2004), the posterior predictive density function with the Dirichlet model is approximated using a piecewise linear interpolation of the posterior predictive distribution function which, when differentiated, gives a piecewise constant estimate of the density function.

Kotz, 1972); (3) the lognormal density; and (4) the improper prior proportional to the inverse of the squared bandwidth (Geweke, 2005), which is flat (or nearly flat) over the range of the parameter space in which the likelihood function is concentrated. Figure 5 plots the posterior efficiency densities for the third-quartile bank, obtained using the four alternative priors respectively. Due to space limitations, the posterior efficiency densities for the other four banks are not presented here. As can be seen from this figure, the kernel SDF model still assigns most mass to the interval (0.6, 0.8) for each of the four alternative priors of the bandwidth parameter. Consequently, the third-quartile bank still sees its efficiency estimates artificially boosted due to the use of the exponential parametric model. This finding is consistent with that obtained using the inverse Gamma distribution. Figure 6 plots the posterior predictive efficiency density function for each of the four alternative priors of the bandwidth parameter, together with that obtained using the exponential parametric model. As can be seen from this figure, the kernel SDF model still places a lot of probability mass in the interval (0.6, 0.8), which cannot be accommodated by the parametric model based on an exponential inefficiency distribution. This latter finding is also consistent with our previous finding obtained using the inverse Gamma distribution. Hence, we conclude that our findings regarding technical efficiency are robust to alternative priors for the bandwidth parameter.
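The four alternative bandwidth priors can be written down compactly for a robustness run; the scale parameters below are illustrative placeholders rather than the paper's settings, and the fourth entry is one reading of the improper, nearly-flat prior.

```python
import numpy as np
from scipy import stats

# Log-densities of four candidate priors for the bandwidth h > 0.
# Scale parameters are illustrative placeholders, not the paper's values.
BANDWIDTH_LOG_PRIORS = {
    "exponential":        lambda h: stats.expon.logpdf(h, scale=1.0),
    "half_cauchy":        lambda h: stats.halfcauchy.logpdf(h, scale=1.0),
    "lognormal":          lambda h: stats.lognorm.logpdf(h, s=1.0),
    "improper_1_over_h2": lambda h: -2.0 * np.log(h),  # p(h) proportional to 1/h^2
}

def log_prior(h, kind="exponential"):
    """Evaluate the chosen log-prior at bandwidth h."""
    return BANDWIDTH_LOG_PRIORS[kind](h)
```

Swapping `kind` in the sampler's acceptance ratio is then all that changes between the robustness runs.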

6. Conclusion

In this paper, we propose a semiparametric modelling framework for stochastic frontier models, where the inefficiency term is approximated by a log-transformed Rosenblatt-Parzen kernel density estimator. We present a sampling algorithm for the kernel-based semiparametric stochastic frontier model. This sampling algorithm has two important features. First, it provides a data-driven solution to the problem of estimating the bandwidth for the kernel estimator of the density of the inefficiency term simultaneously with the model parameters. Second, it uses a hybrid sampling procedure, which randomly mixes updates from the centered and uncentered parameterizations, in order to identify the intercept and the inefficiency term separately. Our Monte Carlo study shows that the kernel-based semiparametric stochastic frontier model outperforms the commonly-used exponential stochastic frontier model under three measures, namely the Euclidean distance between the estimated and true vectors of technical

efficiencies, the Spearman rank correlation coefficient between the estimated and true vectors of technical efficiencies, and the coverage probability of the credible interval. We apply a kernel-based semiparametric stochastic distance frontier (SDF) model to a sample of large banks in the U.S. from 2000 to 2005. Our Bayes factor analysis suggests that the kernel SDF model is preferred over the exponential SDF model. Our analysis of the posterior predictive efficiency density function shows that predicting the efficiency of an unobserved bank on the basis of the exponential SDF model is misleading. Our further analysis of the posterior efficiency distributions of observed banks shows that the efficiency inference on observed banks based on the exponential SDF model is also very different from what we conclude on the basis of the kernel SDF model. In particular, we find that the exponential SDF model tends to substantially overestimate efficiency. In addition, we find that the kernel model and the Dirichlet model produce very similar firm-specific posterior efficiency densities and posterior predictive efficiency distributions.

30 References

Aigner, D., Lovell, C.A.K., Schmidt, P. (1977) Formulation and estimation of stochastic frontier production function models. Journal of Econometrics 6: 21–37.

Battese, G.E., Coelli, T.J. (1988) Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. Journal of Econometrics 38: 387-399.

Berger, A.N., Mester, L.J. (2003) Explaining the dramatic changes in the performance of U.S. banks: Technological change, deregulation, and dynamic changes in competition. Journal of Financial Intermediation 12: 57-95.

Brewer, M.J. (2000) A Bayesian model for local smoothing in kernel density estimation. Statistics and Computing 10(4): 299–309.

Buch-Larsen, T., Nielsen, J.P., Guillen, M., Bolance, C. (2005) Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics 39: 503-518.

Caves, D.W., Christensen, L.R., Diewert, W.E. (1982) The economic theory of index numbers and the measurement of input, output, and productivity. Econometrica 50: 1393–1414.

Chen, S.X. (2000) Probability density function estimation using gamma kernels. Annals of the Institute of Statistical Mathematics 52: 471-480.

Cornwell, C., Schmidt, P., Sickles, R.C. (1990) Production frontiers with cross-sectional and time-series variation in efficiency levels. Journal of Econometrics 46(1-2): 185-200.

de Lima, M.S., Atuncar, G.S. (2011) A Bayesian method to estimate the optimal bandwidth for multivariate kernel estimator. Journal of Nonparametric Statistics 23(1): 137-148.

Färe, R., Grosskopf, S. (1994) Cost and revenue constrained production, Springer, New York.

Feng, G., Serletis, A. (2010) Efficiency, technical change, and returns to scale in large US banks: Panel data evidence from an output distance function satisfying theoretical regularity. Journal of Banking and Finance 34: 127-138.

Feng, G., Zhang, X. (2012) Productivity and efficiency at large and community banks in the U.S.: A Bayesian true random effects stochastic distance frontier analysis. Journal of Banking and Finance 36: 1883-1895.

Ferguson, T. (1973) A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1: 209-230.

Fernandez, C., Osiewalski, J., Steel, M.F.J. (1997) On the use of panel data in stochastic frontier models with improper priors. Journal of Econometrics 79: 169-193.

Gangopadhyay, A.K., Cheung, K. (2002) A Bayesian approach to the kernel density estimation. Journal of Nonparametric Statistics 14: 655-664.

Gelfand, A.E., Sahu, S., Carlin, B. (1995) Efficient parametrization for normal linear mixed effects models. Biometrika 82: 479-488.

Geweke, J.F. (2005) Contemporary Bayesian econometrics and statistics. John Wiley and Sons, Canada.

Geweke, J.F. (2009) Complete and incomplete econometric models. Princeton University Press, New Jersey.

Greene, W. (1990) A gamma-distributed stochastic frontier model. Journal of Econometrics 46: 141–163.

Greene, W. (2004) Distinguishing between heterogeneity and inefficiency: Stochastic frontier analysis of the World Health Organization's panel data on national health care systems. Health Economics 13: 959-980.

Greene, W. (2008) The econometric approach to efficiency analysis. In Fried, H.O., Knox Lovell, C.A., and Schmidt, P. (eds), The Measurement of Productive Efficiency. Oxford University Press, New York and Oxford.

Griffin, J., Steel, M.F.J. (2004) Semiparametric Bayesian inference for stochastic frontier models. Journal of Econometrics 123: 121–152.

Griffin, J., Steel, M.F.J. (2008) Flexible mixture modelling of stochastic frontiers. Journal of Productivity Analysis 29: 33-45.

Härdle, W., Hall, P., Ichimura, H. (1993) Optimal smoothing in single-index models. The Annals of Statistics 21(1): 157–178.

Jaki, T., West, R.W. (2008) Maximum kernel likelihood estimation. Journal of Computational and Graphical Statistics 17: 976-993.

Johnson, N., Kotz, S. (1972) Distributions in statistics: Continuous multivariate distributions. Wi- ley, New York.

Kass, R.E., Raftery A.E. (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.

Kim, S., Shephard, N., Chib, S. (1998) Stochastic volatility: Likelihood inference and comparison with ARCH models. The Review of Economic Studies 65: 361–393.

Koop, G. (2003) Bayesian econometrics. Wiley, Chichester.

Koop, G., Osiewalski, J., Steel, M. (1997) Bayesian efficiency analysis through individual effects: Hospital cost frontiers. Journal of Econometrics 76: 77–105.

Lovell, C.A., Pastor, J.T., Turner, J.A. (1994) Measuring macroeconomic performance in the OECD: A comparison of European and Non-European countries. European Journal of Operational Research 87: 507-518.

Malikov, E., Kumbhakar, S., Tsionas, E. (2015) A cost system approach to the stochastic directional technology distance function with undesirable outputs: The case of US banks in 2001-2010. Journal of Applied Econometrics. DOI: 10.1002/jae.2491.

Meeusen, W., Van Den Broeck, J. (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review 18: 435–444.

O’Donnell, C.J., Coelli, T.J. (2005) A Bayesian approach to imposing curvature on distance functions. Journal of Econometrics 126: 493–523.

Ritter, C., Simar, L. (1997) Pitfalls of normal-gamma stochastic frontier models. Journal of Productivity Analysis 8(2): 167–182.

Rothe, C. (2009) Semiparametric estimation of binary response models with endogenous regressors. Journal of Econometrics 153(1): 51–64.

Sealey, C. W., Lindley, J. T. (1977) Inputs, outputs, and a theory of production and cost at depository financial institutions. The Journal of Finance 32(4): 1251–1266.

Schmidt, P., Sickles, R.C. (1984) Production frontiers and panel data. Journal of Business & Economic Statistics 2(4): 367–374.

Stevenson, R. (1980) Likelihood functions for generalized stochastic frontier estimation. Journal of Econometrics 13: 57–66.

Tierney, L. (1994) Markov chains for exploring posterior distributions. The Annals of Statistics 22: 1701–1728.

Terrell, G.R., Scott, D.W. (1992) Variable kernel density estimation. The Annals of Statistics 20: 1236–1265.

Wand, M.P., Marron, J.S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association 86: 343–353.

Yuan, A., de Gooijer, J.G. (2007) Semiparametric regression with kernel error model. Scandinavian Journal of Statistics 34(4): 841–869.

Zhang, X., Brooks, R., King, M.L. (2009) A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation. Journal of Econometrics 153(1): 21–32.

Zhang, X., King, M.L., Hyndman, R. (2006) A Bayesian approach to bandwidth selection for multivariate kernel density estimation. Computational Statistics and Data Analysis 50(11): 3009–3031.

Zhang, X., King, M.L., Shang, H.L. (2014) A sampling algorithm for bandwidth estimation in a regression model with a flexible error density. Computational Statistics and Data Analysis 78: 218–234.

TABLE 1A

MONTE CARLO SIMULATION RESULTS (ui IS GENERATED FROM GENERALIZED GAMMA DISTRIBUTION)

Panel A: N = 100, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.010            0.015              0.011
Spearman Rank Correlation Coefficient       0.559            0.607              0.660
Coverage Probability                        0.618            0.430              0.563

Panel B: N = 200, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.005            0.011              0.007
Spearman Rank Correlation Coefficient       0.610            0.583              0.643
Coverage Probability                        0.685            0.460              0.628

Panel C: N = 300, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.004            0.009              0.006
Spearman Rank Correlation Coefficient       0.667            0.601              0.701
Coverage Probability                        0.810            0.633              0.785

Panel D: N = 400, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.004            0.008              0.005
Spearman Rank Correlation Coefficient       0.710            0.579              0.756
Coverage Probability                        0.910            0.710              0.820

TABLE 1B

MONTE CARLO SIMULATION RESULTS (ui IS GENERATED FROM EXPONENTIAL DISTRIBUTION)

Panel A: N = 100, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.008            0.006              0.016
Spearman Rank Correlation Coefficient       0.809            0.879              0.776
Coverage Probability                        0.780            0.950              0.690

Panel B: N = 200, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.006            0.004              0.011
Spearman Rank Correlation Coefficient       0.609            0.820              0.718
Coverage Probability                        0.775            0.940              0.665

Panel C: N = 300, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.008            0.003              0.009
Spearman Rank Correlation Coefficient       0.623            0.794              0.743
Coverage Probability                        0.743            0.937              0.533

Panel D: N = 400, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.007            0.003              0.009
Spearman Rank Correlation Coefficient       0.515            0.782              0.682
Coverage Probability                        0.615            0.953              0.490
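The three comparison metrics reported in Tables 1A and 1B can be computed along the following lines. This is a minimal sketch under our own assumed definitions (the paper's exact formulas are not reproduced here, and the function names are ours): the average Euclidean distance is the distance between the true and estimated efficiency vectors averaged over the N firms, the Spearman coefficient measures rank agreement between the two vectors, and the coverage probability is the share of firms whose true efficiency falls inside the 95% credible interval.

```python
import numpy as np

def avg_euclidean_distance(eff_true, eff_est):
    # Euclidean distance between the true and estimated efficiency
    # vectors, averaged over firms (one plausible definition).
    eff_true, eff_est = np.asarray(eff_true), np.asarray(eff_est)
    return float(np.linalg.norm(eff_true - eff_est) / len(eff_true))

def spearman_rank_corr(x, y):
    # Pearson correlation of the ranks (assumes no ties, which holds
    # almost surely for continuous efficiency draws).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx**2) * np.sum(ry**2)))

def coverage_probability(eff_true, lower, upper):
    # Share of firms whose true efficiency lies inside the estimated
    # 95% credible interval [lower, upper].
    eff_true = np.asarray(eff_true)
    inside = (np.asarray(lower) <= eff_true) & (eff_true <= np.asarray(upper))
    return float(np.mean(inside))
```

Higher Spearman correlations and coverage probabilities, and lower average distances, indicate a model that recovers the simulated inefficiencies more faithfully, which is how the panels above are read.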

TABLE 2

REGULARITY VIOLATIONS (UNCONSTRAINED MODELS).

Regularity condition            Exponential Parametric SDF Model   Kernel SDF Model

Monotonicity
  k̄1 ≤ 0                                 11.34%                        29.02%
  k̄2 ≤ 0                                  0%                            0.07%
  k̄3 ≤ 0                                 69.73%                        82.18%
  r̄1 ≥ 0                                  0%                            0.93%
  r̄2 ≥ 0                                  6.88%                        12.11%
  r̄3 ≥ 0                                  0.37%                         0.53%

Curvature
  Quasi-convex in inputs                 100%                          100%
  Convex in outputs                       16.23%                        27.45%
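The percentages in Table 2 are the shares of sample observations at which a fitted model violates a given regularity condition. As an illustrative sketch (the helper below is ours, not the paper's code; the condition labels follow the table), such a share can be computed from the per-observation posterior-mean elasticities:

```python
import numpy as np

def violation_share(values, violated):
    # Fraction of observations at which a regularity condition is
    # violated; `violated` is a boolean test for violation.
    return float(np.mean(violated(np.asarray(values))))

# Example: monotonicity in input 1 is violated where the posterior-mean
# input elasticity k1 is non-positive (the "k1 <= 0" row of Table 2).
k1 = np.array([0.21, -0.03, 0.18, 0.07])
rate = violation_share(k1, lambda k: k <= 0)  # 0.25, i.e. 25% of observations
```

The same helper applies to the output-elasticity rows (violation where r̄ ≥ 0 fails to hold with the required sign) and, with a suitable test on the Hessian, to the curvature rows.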

TABLE 3

PARAMETER ESTIMATES FOR THE KERNEL SDF MODEL

Parameter   Estimate    95% Credible Interval   SIF
a0          −0.0655     (−0.0854, −0.0497)       54
a1           0.3495     (0.3330, 0.3680)         57
a2           0.1185     (0.1107, 0.1303)         60
a3           0.5320     (0.5250, 0.5355)         60
a11          0.0722     (0.0621, 0.0876)         57
a12          0.0136     (0.0090, 0.0180)         60
a13         −0.0858     (−0.0941, −0.0777)       61
a22          0.0090     (0.0057, 0.0116)         63
a23         −0.0226     (−0.0325, −0.0193)       58
a33          0.1084     (0.0999, 0.1108)         57
b1          −0.1979     (−0.2662, −0.1072)       69
b2          −0.7402     (−0.8419, −0.6641)       70
b3          −0.0871     (−0.1072, −0.0643)       56
b11         −0.1161     (−0.1589, −0.0300)       68
b12          0.1138     (0.0548, 0.1540)         68
b13          0.0060     (−0.0091, 0.0194)        47
b22         −0.0692     (−0.1056, −0.0353)       63
b23         −0.0319     (−0.0444, −0.0159)       50
b33          0.0035     (−0.0070, 0.0157)        36
g11         −0.0664     (−0.0794, −0.0433)       60
g21          0.0975     (0.0798, 0.1132)         59
g31         −0.0097     (−0.0165, −0.0033)       43
g12          0.0232     (0.0138, 0.0313)         57
g22         −0.0098     (−0.0178, −0.0021)       44
g32         −0.0031     (−0.0070, 0.0012)        44
g13          0.0432     (0.0384, 0.0509)         58
g23         −0.0877     (−0.0928, −0.0785)       58
g33          0.0128     (0.0121, 0.0174)         60
δτ          −0.0706     (−0.0802, −0.0596)       53
δττ          0.0129     (0.0097, 0.0161)         54
δ1          −0.0111     (−0.0148, 0.0069)        58
δ2          −0.0031     (−0.0048, 0.0009)        58
δ3           0.0142     (0.0130, 0.0199)         56
ρ1           0.0056     (−0.0042, 0.0134)        62
ρ2          −0.0074     (−0.0146, 0.0028)        63
ρ3           0.0021     (−0.0005, 0.0047)        35

TABLE 4

PARAMETER ESTIMATES FOR THE EXPONENTIAL PARAMETRIC SDF MODEL

Parameter   Estimate    95% Credible Interval   SIF
a0          −0.0412     (−0.0845, −0.0037)       57
a1           0.3313     (0.3139, 0.3482)         46
a2           0.1216     (0.1072, 0.1339)         55
a3           0.5471     (0.5434, 0.5500)         44
a11          0.0849     (0.0709, 0.0995)         50
a12          0.0096     (0.0038, 0.0141)         49
a13         −0.0945     (−0.0963, −0.0895)       35
a22          0.0102     (0.0084, 0.0122)         40
a23         −0.0198     (−0.0230, −0.0141)       45
a33          0.1143     (0.1114, 0.1206)         23
b1          −0.3539     (−0.4012, −0.3086)       55
b2          −0.5573     (−0.5979, −0.5139)       53
b3          −0.0921     (−0.1164, −0.0678)       52
b11         −0.1990     (−0.2305, −0.1671)       40
b12          0.1862     (0.1522, 0.2323)         53
b13          0.0065     (−0.0081, 0.0208)        27
b22         −0.1372     (−0.1865, −0.0911)       53
b23         −0.0344     (−0.0539, −0.0152)       44
b33          0.0088     (−0.0060, 0.0256)        43
g11         −0.0593     (−0.0864, −0.0411)       53
g21          0.0876     (0.0685, 0.1189)         56
g31          0.0135     (−0.0235, −0.0028)       42
g12          0.0161     (0.0041, 0.0283)         54
g22          0.0020     (−0.0076, 0.0161)        56
g32         −0.0048     (−0.0085, −0.0011)       29
g13          0.0432     (0.0367, 0.0442)         34
g23         −0.0896     (−0.0971, −0.0815)       55
g33          0.0183     (0.0106, 0.0270)         56
δτ          −0.0615     (−0.0707, −0.0511)       42
δττ          0.0101     (0.0072, 0.0126)         42
δ1          −0.0077     (−0.0111, −0.0046)       42
δ2          −0.0038     (−0.0052, −0.0020)       44
δ3           0.0115     (0.0040, 0.0127)         40
ρ1           0.0125     (0.0055, 0.0199)         40
ρ2          −0.0122     (−0.0195, −0.0075)       47
ρ3           0.0016     (−0.0014, 0.0049)        28

TABLE 5

TECHNICAL CHANGE

        A. Kernel SDF Model                  B. Exponential Parametric SDF Model
Year    Estimate   95% Credible Interval     Estimate   95% Credible Interval
2001     0.0418    (0.0370, 0.0458)           0.0398    (0.0350, 0.0446)
2002     0.0288    (0.0254, 0.0316)           0.0289    (0.0261, 0.0326)
2003     0.0158    (0.0117, 0.0191)           0.0185    (0.0154, 0.0220)
2004     0.0018    (−0.0048, 0.0079)          0.0078    (0.0028, 0.0132)
2005    −0.0119    (−0.0216, −0.0028)        −0.0030    (−0.0103, 0.0049)

TABLE 6

RETURNS TO SCALE

        A. Kernel SDF Model                  B. Exponential Parametric SDF Model
Year    Estimate   95% Credible Interval     Estimate   95% Credible Interval
2000     1.0292    (1.0200, 1.0414)           1.0020    (0.9898, 1.0146)
2001     1.0291    (1.0208, 1.0406)           1.0014    (0.9907, 1.0128)
2002     1.0294    (1.0209, 1.0395)           1.0013    (0.9913, 1.0127)
2003     1.0307    (1.0217, 1.0414)           1.0017    (0.9920, 1.0143)
2004     1.0341    (1.0240, 1.0472)           1.0032    (0.9929, 1.0182)
2005     1.0361    (1.0248, 1.0503)           1.0040    (0.9927, 1.0208)

TABLE 7

PRODUCTIVITY GROWTH

        A. Kernel SDF Model                  B. Exponential Parametric SDF Model
Year    Estimate   95% Credible Interval     Estimate   95% Credible Interval
2001     0.0455    (0.0405, 0.0497)           0.0404    (0.0363, 0.0445)
2002     0.0304    (0.0271, 0.0333)           0.0296    (0.0269, 0.0329)
2003     0.0179    (0.0142, 0.0212)           0.0188    (0.0158, 0.0222)
2004     0.0047    (−0.0011, 0.0106)          0.0082    (0.0033, 0.0135)
2005    −0.0092    (−0.0180, −0.0003)        −0.0026    (−0.0099, 0.0053)

Figure 1: Posterior Efficiency Distributions for Five Banks (solid line: Kernel SDF Model; dashed line: Exponential Parametric SDF Model)

Figure 2: Posterior Predictive Efficiency Densities (solid line: Kernel SDF Model; dashed line: Exponential Parametric SDF Model)

Figure 3: Posterior Efficiency Distributions for Five Banks (solid line: Kernel SDF Model; dashed line: Dirichlet SDF Model)

Figure 4: Posterior Predictive Efficiency Densities (solid line: Kernel SDF Model; dashed line: Dirichlet SDF Model)

Figure 5: Posterior Efficiency Distributions for the Third Quantile Bank

Figure 6: Posterior Predictive Efficiency Densities