
Estimation of Inefficiency in Stochastic Frontier Models: A Bayesian Kernel Approach

Guohua Feng†
Department of Economics

University of North Texas

Denton, Texas 76203

USA

Chuan Wang‡
Wenlan School of Business

Zhongnan University of Economics and Law

Wuhan, Hubei 430073

China

Xibin Zhang§
Department of Econometrics and Business Statistics

Monash University

Caulfield East, VIC 3145

Australia

April 21, 2018

Abstract

We propose a kernel-based Bayesian framework for the analysis of stochastic frontiers and efficiency measurement. The primary feature of this framework is that the unknown distribution of inefficiency is approximated by a transformed Rosenblatt-Parzen kernel density estimator. To justify the kernel-based model, we conduct a Monte Carlo study and also apply the model to a panel of U.S. large banks. Simulation results show that the kernel-based model is capable of providing more precise estimation and prediction results than the commonly-used exponential stochastic frontier model. The Bayes factor also favors the kernel-based model over the exponential model in the empirical application.

JEL classification: C11; D24; G21.

Keywords: Kernel Density Estimation; Efficiency Measurement; Stochastic Distance Frontier.

We would like to thank Professor Cheng Hsiao at the University of Southern California for helpful discussion. †E-mail: [email protected]; Phone: +1 940 565 2220; Fax: +1 940 565 4426. ‡E-mail: [email protected]; Phone: +61 3 9903 4539; Fax: +61 3 9903 2007. §E-mail: [email protected]; Phone: +61 3 9903 2130; Fax: +61 3 9903 2007.

1. Introduction

Beginning with the seminal works of Aigner et al. (1977) and Meeusen and van den Broeck (1977), stochastic frontier models have been commonly used in evaluating the productivity and efficiency of firms (see Greene, 2008). A typical stochastic frontier model involves the estimation of a specific parameterized efficient frontier with a composite error term consisting of non-negative inefficiency and noise components. The parametric frontier can be specified as a production, cost, profit, or distance frontier, depending on the type of data available and the issue under investigation. For example, a production (or output distance) frontier specifies maximum outputs for given sets of inputs and existing production technologies, and a cost frontier defines minimum costs given output levels, input prices and the existing production technology. In practice, it is unlikely that all (or possibly any) firms will operate at the frontier. The deviation from the frontier is a measure of inefficiency and is the focus of interest in many applications. Econometrically, this deviation is captured by the non-negative inefficiency error term.

Despite the wide applications of stochastic frontier models, there is no consensus on the distribution to be used for the one-sided inefficiency term. The earliest two distributions in the literature are the half-normal distribution adopted by Aigner et al. (1977) and the exponential distribution adopted by Meeusen and van den Broeck (1977). However, both of these distributions are criticized for being restrictive. Specifically, the half-normal distribution is criticized in that the zero mode is an unnecessary restriction, while the exponential distribution is criticized on the ground that the probability of a firm's efficiency falling in a certain interval is always strictly less than one if the interval does not have 0 or 1 as one of the endpoints. These limitations have led some researchers to consider more general parametric distributions, such as the truncated normal distribution proposed by Stevenson (1980) and the Gamma distribution proposed by Greene (1990). Schmidt and Sickles (1984) go one step further by estimating the inefficiency term without making distributional assumptions on the inefficiency term and noise term.¹

¹Schmidt and Sickles (1984) note that, with panel data, time-invariant inefficiency can be estimated without making distributional assumptions on the inefficiency term and noise term. In their model, only the intercept varies over firms, and differences in the intercepts are interpreted as differing efficiency levels. The Schmidt and Sickles (1984) model can be estimated using the traditional panel data methods of fixed-effects estimation (dummy variables) or error-components estimation.

More recently, other researchers have extended this literature further by modeling the inefficiency term non-parametrically. For example, Griffin and Steel (2004) use a nonparametric, Dirichlet-process-based technique to model the distribution of the inefficiency term, while keeping the frontier part parametric. Using data on U.S. hospitals, Griffin and Steel (2004) demonstrate that, compared with the Dirichlet stochastic frontier model, the commonly-used exponential model underestimates the probability of a firm's efficiency falling in the efficiency interval [0.6, 0.8]. In addition, they show that the Gamma parametric model misses the mass in the region of efficiencies above 0.95, and generally underestimates the probabilities on high efficiencies (above 0.80) and overestimates those on low efficiencies (especially under 0.6).

The purpose of this paper is to contribute to this literature by proposing a new nonparametric methodology for flexible modeling of the inefficiency distribution within a Bayesian framework. Specifically, we use a kernel density estimator to estimate the probability density of the inefficiency term. There is a growing number of studies that use kernel density estimators to approximate error terms. These studies include, but are not limited to, Yuan and de Gooijer (2007), Jaki and West (2008), and Zhang et al. (2014). It is worth noting that all these studies have used kernel density estimators to approximate idiosyncratic errors in non-stochastic-frontier models. To the best of our knowledge, this is the first study that uses a kernel density estimator to approximate the inefficiency term.²

However, we cannot use standard kernel density estimators to approximate the density of the inefficiency term, which has a bounded support on [0, ∞). This is because the bias of these estimators at the boundary has a different representation and is of a different order than at interior points. To avoid this problem, we follow Wand et al. (1991) by using "the transformed kernel estimation approach". This approach involves three steps: i) transform an original data set that has a bounded support using a transformation function that is capable of transforming the original data into the interval (−∞, ∞); ii) calculate the density of the transformed data by use of classical kernel density estimators; and iii) obtain the estimator of the density of the original data set by "back-transforming" the estimate of the density of the transformed data.

A crucial step of the transformed kernel estimation approach is the choice of transformation function. In our case, we use the log transformation, a special case of the Box-Cox transformation suggested by Wand et al. (1991), to transform the non-negative inefficiency term into an unbounded variable. Because a kernel density estimator is used to estimate the density of the inefficiency term, we refer to our model as the "kernel-based semi-parametric stochastic frontier model".

Our kernel-based semi-parametric stochastic frontier model is estimated within a Bayesian framework. In doing so, we pay particular attention to the possible identification problem between the intercept and the inefficiency term. As is well known, this identification problem is very likely to arise when the inefficiency term is modeled in a flexible manner, as in this paper. A consequence of this identification problem is slow convergence in MCMC. To overcome this problem, we implement the idea of hierarchical centering, which was introduced by Gelfand et al. (1995) in the context of normal linear mixed effects models in order to improve the behavior of maximization and sampling-based algorithms. In the context of stochastic frontier models, hierarchical centering involves reparameterizing these models by replacing the inefficiency term by the sum of the intercept and the inefficiency term. This reparameterization is capable of overcoming the identification problem, because both the intercept and the inefficiency term have an additive effect, and thus the model is naturally informative on the sum of the intercept and the inefficiency term. In this paper, we use a hybrid sampler, which randomly mixes updates from the centered and the uncentered parameterizations.

²We acknowledge that this is conceptually equivalent to Zhang et al. (2014). We also note that another difference between Zhang et al. (2014) and our study is that their methodology is proposed in a cross-sectional setting, while ours is proposed in a panel data setting.
We conduct a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In doing so, we use two benchmark models for comparison: (1) the exponential stochastic frontier model; and (2) the Dirichlet stochastic frontier model. Our simulation results indicate that when the number of firms is large enough (≥ 200), the kernel-based model outperforms the exponential parametric model on all three measures we use, namely, the average Euclidean distance between the estimated and true vectors of technical efficiencies, the Spearman rank correlation coefficient between the estimated and true vectors of technical efficiencies, and the coverage probability of the credible interval. Our simulation results also suggest that the kernel model outperforms the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the Dirichlet model on the other measure (i.e., the Spearman rank correlation coefficient).

Finally, we apply a kernel-based semi-parametric stochastic distance frontier (SDF) model to a panel of 292 large banks in the U.S. over the period 2000–2005. Our Bayes factor strongly suggests that the kernel SDF model outperforms the exponential parametric SDF model. Our analysis of the posterior predictive efficiency density function of unobserved banks shows that predicting the efficiency of an unobserved bank on the basis of the exponential SDF model is misleading. Our further analysis of the posterior efficiency distributions of observed banks shows that efficiency inference on observed banks based on the exponential SDF model is also different from what we conclude on the basis of the kernel SDF model. In addition, we find that the kernel model and the Dirichlet model produce very similar firm-specific posterior efficiency densities and posterior predictive efficiency distributions.

The rest of the paper is organized as follows. In Section 2, we present the kernel stochastic frontier model. In Section 3, we discuss the Bayesian procedure for estimating the kernel stochastic frontier model. Section 4 conducts a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In Section 5, we apply a kernel stochastic distance frontier (SDF) model to large banks in the U.S. Section 6 concludes the paper.

2. The Kernel-based Semi-parametric Stochastic Frontier Model

Let firms be indexed by $i = 1, 2, \ldots, N$, and time by $t = 1, 2, \ldots, T$. A typical production stochastic frontier model is written as

$$y_{it} = \alpha + x_{it}'\beta - u_i + v_{it}, \quad (1)$$

where $y_{it}$ represents the logarithm of output, $\alpha$ is an intercept, $x_{it}$ is a $k$-dimensional vector of inputs (e.g., logarithms of inputs and squared logarithms of inputs), $\beta$ is the corresponding vector of coefficients, and $v_{it}$ represents statistical noise and is assumed to be independently and identically distributed as

$$v_{it} \overset{i.i.d.}{\sim} N\!\left(0, \sigma^2\right), \quad (2)$$

and $u_i$ is a non-negative disturbance which accounts for the time-invariant inefficiency of firm $i$. This model can also represent other economic frontiers. For example, it can represent a stochastic cost frontier, with $y_{it}$ being the logarithm of cost and $x_{it}$ being a vector of output quantities and input prices, by changing "$-u_i$" to "$+u_i$". To give another example, the model can also represent an output distance frontier, as shown in Section 5. In this latter model, $y_{it}$ is the negative logarithm of one of the outputs, $x_{it}$ is a vector of output ratios and inputs, and $u_i$ has a positive sign.
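To fix ideas, a panel satisfying (1)–(2) can be simulated in a few lines of NumPy. The function name, the uniform input data, and the exponential inefficiency draws below are illustrative assumptions, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_frontier_panel(N, T, alpha, beta, sigma, u_draw):
    """Simulate a panel from the production frontier (1)-(2):
    y_it = alpha + x_it' beta - u_i + v_it, with i.i.d. noise
    v_it ~ N(0, sigma^2) and time-invariant inefficiency u_i."""
    k = len(beta)
    x = rng.uniform(0.0, 1.0, size=(N, T, k))   # illustrative input data
    u = u_draw(N)                               # non-negative inefficiencies
    v = rng.normal(0.0, sigma, size=(N, T))     # statistical noise
    y = alpha + x @ beta - u[:, None] + v
    return y, x, u

y, x, u = simulate_frontier_panel(
    N=50, T=6, alpha=1.0, beta=np.array([0.6, 0.3]), sigma=0.2,
    u_draw=lambda n: rng.exponential(0.2, size=n))
```

Because each firm's $u_i$ is drawn once and reused for all $T$ periods, the simulated inefficiency is time-invariant, matching the assumption discussed below.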

We assume that $u_i$ is time-invariant in order to exploit the panel structure of the data. This assumption can be restrictive, especially when $T$ is large, and a number of ways have been proposed in the literature to make $u_i$ vary over time (see, for example, Cornwell et al., 1990; Battese and Coelli, 1992). In this paper, we will not allow $u_i$ to vary over time, as we will allow for a flexible treatment of the distribution of the inefficiency term $u_i$, which requires substantial data information. Consequently, we only deal with short panels. However, given sufficient data, this framework can be extended to time-varying inefficiencies to accommodate longer panels, as discussed at the end of this section.

As discussed in the Introduction, we propose using a kernel density estimator to approximate the unknown density of $u_i$, which has a bounded support on $[0, \infty)$. It is not appropriate to use standard kernel density estimators here because these estimators suffer from the well-known "boundary effects" or "edge effects"³ (see, for example, Chen, 2000). Generally, there are two approaches that can overcome this problem in kernel density estimation. The first approach, due to Chen (2000), is to use a Gamma kernel estimator, because this estimator is free of boundary bias, always non-negative, and achieves the optimal rate of convergence in the mean integrated

³The bias of these estimators has a different representation at the boundary than at interior points. In addition, these estimators also have a different convergence rate at the boundary than at interior points.

square error within the class of non-negative kernel density estimators. The second approach involves transforming the original bounded data into the interval $(-\infty, \infty)$, estimating the density of the transformed data through the Gaussian kernel, and then "back-transforming" the resulting density estimator of the transformed data to derive a density estimator of the original data. For example, Wand et al. (1991) propose to use the Box-Cox transformation or shifted power transformation to transform the original data, while Buch-Larsen et al. (2005) suggest using the Champernowne transformation. For notational simplicity, we refer to the second approach as "the transformation

approach", which we use to approximate the unknown density of the non-negative disturbances, $u_i$, for $i = 1, 2, \ldots, N$, in this paper. Before discussing how we approximate the unknown density of $u_i$, it is helpful to illustrate the

basic idea behind the transformation approach. Let $X_i$, $i = 1, 2, \ldots, \tilde N$, denote generic positive random variables with an unknown density $p(x)$. To estimate $p(x)$, we transform the data with a transformation function, $G(\cdot)$, such that the transformed data (denoted by $Y_i$, for $i = 1, 2, \ldots, \tilde N$) are unbounded:

$$Y_i = G(X_i), \quad \text{for } i = 1, 2, \ldots, \tilde N.$$

Examples of $G(\cdot)$ include the Box-Cox transformation and the shifted power transformation. With the transformed data, we can calculate the Rosenblatt-Parzen kernel density estimator:

$$\hat p_{\mathrm{trans}}(y) = \frac{1}{\tilde N}\sum_{i=1}^{\tilde N}\frac{1}{b}\,K\!\left(\frac{Y_i - y}{b}\right),$$

where $b$ is a bandwidth and $K(\cdot)$ is a kernel function, which is often chosen to be a Gaussian kernel. The density estimator of the original data, $X_i$, for $i = 1, 2, \ldots, \tilde N$, can then be obtained by "back-transforming" the density estimator of the transformed data:

$$\hat p(x) = \hat p_{\mathrm{trans}}(G(x))\,\frac{dG(x)}{dx} = \frac{1}{\tilde N}\sum_{i=1}^{\tilde N}\frac{1}{b}\,K\!\left(\frac{G(X_i) - G(x)}{b}\right)\frac{dG(x)}{dx}.$$
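As a concrete illustration, the back-transformed estimator can be implemented in a few lines of NumPy for the log choice $G(x) = \ln x$. The function name, the exponential sample, the grid, and the bandwidth value are our own illustrative assumptions:

```python
import numpy as np

def transformed_kde(data, grid, bandwidth):
    """Transformed kernel density estimate for data supported on [0, inf).

    Step (i): log-transform the data so its support is unbounded.
    Step (ii): apply a Gaussian Rosenblatt-Parzen estimator to the logs.
    Step (iii): back-transform with the Jacobian dG(x)/dx = 1/x.
    """
    logs = np.log(data)                                    # step (i)
    z = (logs[None, :] - np.log(grid)[:, None]) / bandwidth
    gauss = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel K
    p_trans = gauss.mean(axis=1) / bandwidth               # step (ii)
    return p_trans / grid                                  # step (iii)

# Illustration on exponential draws, which live on [0, inf)
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=2000)
grid = np.linspace(0.05, 4.0, 80)
density = transformed_kde(sample, grid, bandwidth=0.3)
```

Because the Gaussian kernel is applied on the log scale, the back-transformed estimate places no mass on negative values, which is exactly what avoids the boundary-bias problem discussed above.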

b b Buch-Larsen et al. (2005) show that the biase of p(x) is of an order of O(b2) and the

b 8 p(x) has an order of O 1 . Under the assumption that b 0 and Nb as N , Nb ! ! 1 ! 1   p(x) p(x) = op(1) ande Nb (p(x) E [p(x)]) N (0;R(K) G0(x) p(x)), where N(:; :) b ! j e j e stands for the normal distributionp and R(K) = K2(u)du. b e b b In our model, we choose the logarithm transformationR for G( ) when approximating the un-  known density of ui. The log-transformation is a special case of the Box-Cox transformation suggested by Wand et al. (1991) and thus is capable of transforming the non-negative inefficiency

term, $u_i$, for $i = 1, \ldots, N$, into a scalar variable with an unbounded support.

Let $\lambda_i = \ln u_i$, for $i = 1, 2, \ldots, N$, $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)'$, and $\lambda_{(i)} = (\lambda_1, \ldots, \lambda_{i-1}, \lambda_{i+1}, \ldots, \lambda_N)'$. As $\lambda_i$ has an unbounded support, the density of $\lambda_i$, denoted by $p(\lambda_i)$, can be approximated by the following Rosenblatt-Parzen kernel density estimator:

$$\hat p\!\left(\lambda_i \mid \lambda_{(i)}, \zeta^2\right) = \frac{1}{N-1}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\lambda_i - \lambda_j}{\zeta}\right),$$

where $\phi(\cdot)$ is the standard Gaussian density function and $\zeta$ is the bandwidth. The purpose of using the leave-one-out version is to exclude $\phi(0/\zeta)/\zeta$, which can be made arbitrarily large when $\zeta$ is arbitrarily small.

Assuming that $\lambda_1, \lambda_2, \ldots, \lambda_N$ are independent of each other, we define the density estimator of $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_N)'$ as

$$\hat p\!\left(\lambda \mid \zeta^2\right) = \prod_{i=1}^{N}\hat p\!\left(\lambda_i \mid \lambda_{(i)}, \zeta^2\right).$$

Applying the Jacobian of the log transformation to $\hat p(\lambda \mid \zeta^2)$ gives the following density estimator of the inefficiency vector $u = (u_1, u_2, \ldots, u_N)'$:

$$\hat p\!\left(u \mid \zeta^2\right) = \hat p\!\left(\ln u \mid \zeta^2\right) \cdot \left|\det(J)\right|, \quad (3)$$

where $J$ is a diagonal matrix with its $i$th diagonal element being $1/u_i$. Note that the independence

of $\lambda_1, \lambda_2, \ldots, \lambda_N$ implies the independence of $u_1, u_2, \ldots, u_N$. The density estimator of $u_i$ can be

obtained through (3) as

$$\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right) = \frac{1}{(N-1)\,u_i}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln u_i - \ln u_j}{\zeta}\right) I(u_i > 0), \quad (4)$$

for $i = 1, 2, \ldots, N$, where $u_{(i)} = (u_1, u_2, \ldots, u_{i-1}, u_{i+1}, \ldots, u_N)'$, and $I(\cdot)$ is an indicator function whose value is one for a true argument and zero otherwise. Hence, we obtain a kernel-based semi-parametric stochastic production frontier model defined by (1), (2) and (4). It is straightforward to show that for the kernel-based semi-parametric stochastic frontier model, technical efficiency (TE) can be computed as

$$TE_i = \exp(-u_i), \quad (5)$$

for $i = 1, 2, \ldots, N$.

While we focus on the case of time-invariant inefficiencies in this paper, the above framework can be extended to allow for time-varying inefficiencies. A straightforward way is to assume that $u_{it}$ is i.i.d. over both $i$ and $t$. With this assumption, the density estimator of the inefficiency term becomes

$$\hat p\!\left(u_{it} \mid u_{(it)}, \zeta^2\right) = \frac{1}{(NT-1)\,u_{it}}\sum_{(j,r)\neq(i,t)}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln u_{it} - \ln u_{jr}}{\zeta}\right) I(u_{it} > 0),$$

where $u_{(it)}$ is the inefficiency vector without the $it$th element.

In addition, the above framework can also be extended to allow for inefficiency heterogeneity, an important issue in the stochastic frontier analysis literature (Greene, 2004). Specifically, one can

follow Terrell and Scott (1992) and let the bandwidth differ across firms (i.e., $\zeta_i$, $i = 1, 2, \ldots, N$), thus allowing the inefficiency of each firm to be approximated by a different density estimator. Our Bayesian algorithm given in Section 3 can be modified slightly to accommodate firm-specific

bandwidths ($\zeta_i$). Due to space limitations, we will not dwell on this extension here, but leave it for future research.
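For concreteness, the leave-one-out estimator in (4) can be evaluated as follows. The function name, the toy Gamma-distributed inefficiency draws, and the bandwidth value are our own illustrative assumptions:

```python
import numpy as np

def leave_one_out_density(u, i, zeta):
    """Evaluate the estimator (4) of p(u_i | u_(i), zeta^2): a mixture of
    Gaussian kernels on the log scale, divided by u_i (the Jacobian of
    the log transformation), with the indicator I(u_i > 0)."""
    if u[i] <= 0:
        return 0.0                                     # indicator I(u_i > 0)
    diffs = np.log(u[i]) - np.log(np.delete(u, i))     # leave observation i out
    kernels = np.exp(-0.5 * (diffs / zeta) ** 2) / (np.sqrt(2.0 * np.pi) * zeta)
    return float(kernels.sum() / ((len(u) - 1) * u[i]))

rng = np.random.default_rng(1)
u = rng.gamma(shape=2.0, scale=0.1, size=50)           # toy inefficiency draws
value = leave_one_out_density(u, 0, zeta=0.5)
```

Excluding observation $i$ itself avoids the $\phi(0/\zeta)/\zeta$ term that diverges as the bandwidth shrinks, mirroring the motivation given for the leave-one-out form above.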

3. Bayesian Inference

In this section, we discuss a Bayesian procedure for estimating the kernel-based semi-parametric stochastic frontier model. Since the exponentially-distributed stochastic frontier model and the Dirichlet stochastic frontier model are used as benchmark models in Sections 4 and 5, we also briefly discuss the Bayesian inference procedures for the latter two models. For notational simplicity, we refer to the first model as the "kernel model" and the latter two models as the "exponential model" and the "Dirichlet model", respectively.

3.1. Bayesian Inference for the Kernel Model

We first introduce some matrix notation: $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})'$, $y = (y_1', y_2', \ldots, y_i', \ldots, y_N')'$, $x_i = (x_{i1}, x_{i2}, \ldots, x_{iT})'$, $x = (x_1', x_2', \ldots, x_i', \ldots, x_N')'$, $\tilde x_{it} = (1, x_{it,1}, x_{it,2}, \ldots, x_{it,k})'$, $\tilde x_i = (\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{it}, \ldots, \tilde x_{iT})'$, and $\tilde x = (\tilde x_1', \tilde x_2', \ldots, \tilde x_i', \ldots, \tilde x_N')'$.

For successful inference on our kernel-based stochastic frontier model, we implement the idea of hierarchical centering, as discussed in the Introduction. Such an implementation involves a reparameterization

from $(\alpha, u)$ to $(\alpha, \eta)$, where $\eta = (\eta_1, \eta_2, \ldots, \eta_N)'$ and $\eta_i = \alpha - u_i$. This reparameterization is necessary because our model (i.e., (1)) implies that the data are naturally informative on $\alpha - u_i$, and thus identification of $\alpha$ and $u_i$ separately relies on the distribution of $u_i$. If the distribution of $u_i$ is quite flexible (as in our case), the commonly-used uncentered parameterization can result in very slow convergence. Ritter and Simar (1997) discuss the problems caused by this identification problem in the context of Gamma-distributed stochastic frontier models. To avoid these problems, we follow Griffin and Steel (2004) and use a hybrid sampler, which randomly mixes updates from the centered and uncentered parameterizations. In what follows, we will discuss the centered and uncentered parameterizations in detail.
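The bookkeeping behind the hybrid sampler can be sketched as follows, writing the centered parameter as $\eta_i = \alpha - u_i$ (our notation). The update functions are placeholders standing in for the model-specific Gibbs/Metropolis-Hastings steps, so this is a structural sketch rather than the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)

def to_centered(alpha, u):
    """Map (alpha, u) to (alpha, eta) with eta_i = alpha - u_i."""
    return alpha, alpha - u

def to_uncentered(alpha, eta):
    """Inverse map; u_i = alpha - eta_i is non-negative iff alpha >= eta_i."""
    return alpha, alpha - eta

def hybrid_sweep(alpha, u, update_centered, update_uncentered, p_centered=0.5):
    """One sweep of the hybrid sampler: with probability p_centered run the
    centered update on (alpha, eta), otherwise the uncentered update on
    (alpha, u)."""
    if rng.random() < p_centered:
        alpha, eta = update_centered(*to_centered(alpha, u))
        alpha, u = to_uncentered(alpha, eta)
    else:
        alpha, u = update_uncentered(alpha, u)
    return alpha, u

# Round-trip check of the reparameterization
alpha, u = 1.0, np.array([0.20, 0.05, 0.40])
a2, u2 = to_uncentered(*to_centered(alpha, u))  # recovers (alpha, u) exactly
```

Randomly mixing the two parameterizations lets the chain escape the slow mixing that either parameterization alone can exhibit when the inefficiency distribution is flexible.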

3.1.1 Uncentered parameterization

For the uncentered parameterization, we need to specify priors for the following parameters: $(\beta, h, \zeta^2, u)$, where $\beta = (\alpha, \beta_1, \ldots, \beta_k)'$ includes the intercept. Specifically, for $\beta$ and $h$, we follow O'Donnell and Coelli (2005)

and use the following priors:

$$p(\beta) \propto 1, \quad (6)$$
$$p(h) \propto \frac{1}{h}, \quad \text{where } h = \frac{1}{\sigma^2}. \quad (7)$$

The density of ui is approximated by (4), which we copy here for convenience:

$$\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right) = \frac{1}{(N-1)\,u_i}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln u_i - \ln u_j}{\zeta}\right) I(u_i > 0). \quad (8)$$

For the bandwidth, $\zeta$, we treat it as a parameter in this paper. In the context of kernel density estimation based on direct observations, there exist some investigations involving a similar treatment (see, for example, Brewer, 2000; Gangopadhyay and Cheung, 2002; de Lima and Atuncar, 2011). In nonparametric and semi-parametric regression models, bandwidths are also treated as parameters (Härdle, Hall and Ichimura, 1993; Rothe, 2009, among others). Following Zhang et al. (2014), we choose an inverse Gamma density as the prior of $\zeta^2$:

$$p\!\left(\zeta^2\right) = \frac{\gamma_2^{\,\gamma_1}}{\Gamma(\gamma_1)}\left(\frac{1}{\zeta^2}\right)^{\gamma_1 + 1}\exp\!\left(-\frac{\gamma_2}{\zeta^2}\right), \quad (9)$$

where the shape parameter $\gamma_1 = 1$ and the scale parameter $\gamma_2 = 1/0.05$. The reason for choosing an inverse Gamma prior is that $\zeta^2$ is the variance of each Gaussian component in (8), and the prior of the variance of a Gaussian distribution is usually chosen as an inverse Gamma density (Geweke, 2009, pp. 38–39). The likelihood function is shown to be

$$L\!\left(y \mid \beta, h, \zeta^2, u\right) = \prod_{i=1}^{N}\frac{h^{T/2}}{(2\pi)^{T/2}}\exp\!\left\{-\frac{h}{2}\left(y_i - \tilde x_i\beta + u_i\iota_T\right)'\left(y_i - \tilde x_i\beta + u_i\iota_T\right)\right\}, \quad (10)$$

where $\iota_T$ is a $T \times 1$ vector of ones. The joint posterior density $p\!\left(\beta, h, \zeta^2, u \mid y\right)$ is the product of

the priors in (6)–(9) and the likelihood function in (10). In order to simulate from the joint posterior density, we follow previous studies (e.g., Koop, 2003) and use a Gibbs sampler with data augmentation. The expression "data augmentation" comes from the fact that it is convenient to augment the observed data by drawing observations of

$u_i$. The Gibbs sampler with data augmentation involves drawing sequentially from the full conditional posterior densities of the parameters in the model under study. In our case, the full conditional posterior densities are

$$p\!\left(\beta \mid y, h, \zeta^2, u\right) \propto f_{\mathrm{Normal}}\!\left(\beta \mid \bar\beta, \bar V\right), \quad (11)$$
$$p\!\left(h \mid y, \beta, u, \zeta^2\right) \propto f_{\mathrm{Gamma}}\!\left(h \mid v, s^2\right), \quad (12)$$
$$p\!\left(\zeta^2 \mid y, \beta, h, u\right) \propto \left(\frac{1}{\zeta^2}\right)^{\gamma_1 + 1}\exp\!\left(-\frac{\gamma_2}{\zeta^2}\right)\prod_{i=1}^{N}\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right), \quad (13)$$
$$p\!\left(u \mid y, \beta, h, \zeta^2\right) \propto \exp\!\left\{-\frac{h}{2}\sum_{i=1}^{N}\varepsilon_i'\varepsilon_i\right\}\prod_{i=1}^{N}\hat p\!\left(u_i \mid u_{(i)}, \zeta^2\right), \quad (14)$$

where $f_{\mathrm{Normal}}$ and $f_{\mathrm{Gamma}}$ denote the normal and Gamma distributions, respectively, $\bar V = \left[h\sum_{i=1}^{N}\tilde x_i'\tilde x_i\right]^{-1}$, $\bar\beta = \bar V\left[h\sum_{i=1}^{N}\tilde x_i'\left(y_i + u_i\iota_T\right)\right]$, $v = TN/2$, $s^2 = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - \tilde x_i\beta + u_i\iota_T\right)'\left(y_i - \tilde x_i\beta + u_i\iota_T\right)$, and $\varepsilon_i = y_i - \tilde x_i\beta + u_i\iota_T$.

3.1.2 Centered parameterization

p( ) 1; (15) / p ( ) 1: (16) /

The prior of $\eta_i$ can be derived from the prior of $u_i$ given by (8):

$$p\!\left(\eta_i \mid \eta_{(i)}, \alpha, \zeta^2\right) \propto \frac{1}{(N-1)(\alpha - \eta_i)}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln(\alpha - \eta_i) - \ln(\alpha - \eta_j)}{\zeta}\right) I(\alpha > \eta_i). \quad (17)$$

The likelihood function after the reparameterization becomes

$$L\!\left(y \mid \alpha, \beta, h, \zeta^2, \eta\right) = \prod_{i=1}^{N}\frac{h^{T/2}}{(2\pi)^{T/2}}\exp\!\left\{-\frac{h}{2}\left(y_i - x_i\beta - \eta_i\iota_T\right)'\left(y_i - x_i\beta - \eta_i\iota_T\right)\right\}. \quad (18)$$

Accordingly, the joint posterior density $p\!\left(\alpha, \beta, h, \zeta^2, \eta \mid y\right)$ is the product of the priors in (7), (9), (15), (16) and (17), and the likelihood in (18). The full conditional posteriors for $\alpha$, $\beta$, $h$, $\zeta^2$, and $\eta$ become, respectively,

$$p\!\left(\alpha \mid y, \beta, \zeta^2, h, \eta\right) \propto p\!\left(\eta \mid \alpha, \zeta^2\right), \quad (19)$$
$$p\!\left(\beta \mid y, \alpha, h, \zeta^2, \eta\right) \propto f_{\mathrm{Normal}}\!\left(\beta \mid \bar\beta^{*}, \bar V^{*}\right), \quad (20)$$
$$p\!\left(h \mid y, \alpha, \beta, \zeta^2, \eta\right) \propto f_{\mathrm{Gamma}}\!\left(h \mid v, s^{*2}\right), \quad (21)$$
$$p\!\left(\zeta^2 \mid y, \alpha, \beta, h, \eta\right) \propto \left(\frac{1}{\zeta^2}\right)^{\gamma_1 + 1}\exp\!\left(-\frac{\gamma_2}{\zeta^2}\right) p\!\left(\eta \mid \alpha, \zeta^2\right), \quad (22)$$
$$p\!\left(\eta \mid y, \alpha, \beta, h, \zeta^2\right) \propto \exp\!\left\{-\frac{h}{2}\sum_{i=1}^{N}\varepsilon_i^{*\prime}\varepsilon_i^{*}\right\} p\!\left(\eta \mid \alpha, \zeta^2\right), \quad (23)$$

where $p\!\left(\eta \mid \alpha, \zeta^2\right) \propto \prod_{i=1}^{N}\frac{1}{(N-1)(\alpha - \eta_i)}\sum_{j=1, j\neq i}^{N}\frac{1}{\zeta}\,\phi\!\left(\frac{\ln(\alpha - \eta_i) - \ln(\alpha - \eta_j)}{\zeta}\right) I(\alpha > \max(\eta))$, $\bar V^{*} = \left[h\sum_{i=1}^{N}x_i'x_i\right]^{-1}$, $\bar\beta^{*} = \bar V^{*}\left[h\sum_{i=1}^{N}x_i'\left(y_i - \eta_i\iota_T\right)\right]$, $s^{*2} = \frac{1}{2}\sum_{i=1}^{N}\left(y_i - x_i\beta - \eta_i\iota_T\right)'\left(y_i - x_i\beta - \eta_i\iota_T\right)$, and $\varepsilon_i^{*} = y_i - x_i\beta - \eta_i\iota_T$. Sampling from (19)–(23) can be implemented using the Metropolis-Hastings algorithm with constraints.
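A minimal sketch of such a constrained Metropolis-Hastings update, of the kind needed for the draw of the intercept subject to the constraint $\alpha > \max(\eta)$, is given below. The random-walk proposal, step size, and the toy truncated-normal target are illustrative assumptions, not the conditionals of our model:

```python
import numpy as np

rng = np.random.default_rng(3)

def rw_metropolis_constrained(x0, log_target, lower, n_iter=5000, step=0.5):
    """Random-walk Metropolis-Hastings with a hard lower bound: proposals
    at or below the bound are rejected outright, which enforces the
    constraint while leaving the target on (lower, inf) invariant."""
    x, logp = x0, log_target(x0)
    draws = np.empty(n_iter)
    for s in range(n_iter):
        proposal = x + step * rng.standard_normal()
        if proposal > lower:                      # enforce the constraint
            logp_prop = log_target(proposal)
            if np.log(rng.random()) < logp_prop - logp:
                x, logp = proposal, logp_prop
        draws[s] = x
    return draws

# Toy target: a N(2, 0.5^2) conditional truncated to (1.5, inf)
log_target = lambda a: -0.5 * ((a - 2.0) / 0.5) ** 2
draws = rw_metropolis_constrained(2.0, log_target, lower=1.5)
```

Rejecting out-of-bound proposals is equivalent to assigning them zero posterior density, so the chain never leaves the constrained region.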

3.2. Bayesian Inference for the Exponential Model

When $u_i$ is exponentially distributed, (1) becomes an exponential stochastic frontier model. For this model, we consider only the uncentered parameterization because the exponential distribution is not flexible, and thus a reparameterization from $(\alpha, u)$ to $(\alpha, \eta)$ is not required. Consequently,

we need to specify priors for the following parameters: $(\beta, h, \theta^{-1}, u)$. To facilitate comparison of the simulation and empirical results between this model and the kernel model in Sections 4 and 5, we use the same priors for $\beta$ and $h$ as in the kernel model, which are given by (6) and (7). Since the

exponential distribution is a special case of the Gamma distribution, the prior of ui is

$$p\!\left(u_i \mid \theta^{-1}\right) = f_{\mathrm{Gamma}}\!\left(u_i \mid 1, \theta^{-1}\right). \quad (24)$$

According to Fernandez et al. (1997), in order to obtain a proper posterior we need a proper prior for the parameter $\theta^{-1}$. Therefore, we choose

$$p\!\left(\theta^{-1}\right) = f_{\mathrm{Gamma}}\!\left(\theta^{-1} \mid 1, -\ln \tau^{*}\right), \quad (25)$$

where $\tau^{*}$ is the prior median of the efficiency distribution. For our application to U.S. large commercial banks in Section 5, we experiment with various values of $\tau^{*}$ ranging from 0.50 to 0.99 and

find that the results reported in Section 5 are very robust to large changes in $\tau^{*}$. Other studies that use the exponential inefficiency distribution have reported similar findings. For example, Koop et al. (1997) investigate the efficiency of U.S. hospitals during 1987–1991 and find that their empirical results are extremely robust to enormous changes in $\tau^{*}$; Feng and Zhang (2012) examine the technical efficiency of large banks in the U.S. during 1997–2006 and find that their results are very robust to large changes in $\tau^{*}$. With regard to the likelihood function, it is the same as in the uncentered parameterization for

the kernel model, which is (10). Therefore, the joint posterior density $p\!\left(\beta, h, \theta^{-1}, u \mid y\right)$ becomes the product of the priors in (6), (7), (24), and (25) and the likelihood function in (10).

It is straightforward to show that the full conditional posterior densities for $\beta$, $h$, $\theta^{-1}$ and $u$ are

$$p\!\left(\beta \mid y, h, \theta^{-1}, u\right) \propto f_{\mathrm{Normal}}\!\left(\beta \mid \bar\beta, \bar V\right), \quad (26)$$
$$p\!\left(h \mid y, \beta, u, \theta^{-1}\right) \propto f_{\mathrm{Gamma}}\!\left(h \mid v, s^2\right), \quad (27)$$
$$p\!\left(\theta^{-1} \mid y, \beta, h, u\right) \propto f_{\mathrm{Gamma}}\!\left(\theta^{-1} \mid N + 1, u'\iota_N - \ln \tau^{*}\right), \quad (28)$$
$$p\!\left(u_i \mid y, \beta, h, \theta^{-1}\right) \propto f_{\mathrm{Normal}}\!\left(u_i \mid z_i\beta - \bar q_i - (Th)^{-1}\theta^{-1}, (Th)^{-1}\right) I(u_i > 0), \quad (29)$$

where $\bar V$, $\bar\beta$, $v$, and $s^2$ are defined as above, $\bar q_i = \sum_{t=1}^{T} q_{it}/T$, and $z_i$ is a matrix containing the average value of each explanatory variable for individual $i$. Under very mild assumptions (Tierney, 1994), these draws will converge to draws from the joint posterior density function.
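The full conditionals (28) and (29) are standard distributions, so the corresponding Gibbs draws are easy to sketch. Writing $\theta^{-1}$ for the exponential-distribution parameter, the function names, the rejection-sampling scheme for the truncation, and the toy inputs below are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_theta_inv(u, tau_star):
    """Draw the exponential-model parameter from its Gamma full
    conditional (28): shape N + 1, rate u'iota_N - ln(tau_star)."""
    return rng.gamma(shape=len(u) + 1.0,
                     scale=1.0 / (u.sum() - np.log(tau_star)))

def draw_u_i(mean, var):
    """Draw u_i from the truncated-normal full conditional (29) by
    simple rejection sampling against the I(u_i > 0) constraint."""
    while True:
        candidate = mean + np.sqrt(var) * rng.standard_normal()
        if candidate > 0.0:
            return candidate

u = rng.exponential(0.2, size=100)      # toy current inefficiency draws
theta_inv = draw_theta_inv(u, tau_star=0.85)
ui = draw_u_i(mean=0.1, var=0.01)
```

Rejection sampling is adequate here as a sketch; a production sampler would typically use a dedicated truncated-normal generator when the conditional mean is far below zero.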

3.3. Bayesian Inference for the Dirichlet Model

In this model, the distribution of ui is modeled non-parametrically through a Dirichlet process prior. Formally,

$$u_i \mid F \sim F, \qquad F \sim DP(M, H), \quad (30)$$

where $F$ is a random probability measure; $DP(\cdot)$ is a Dirichlet process; $M$ is a positive scaling parameter; and $H$ is a base distribution. The realizations of $F$ are discrete distributions (Ferguson, 1973), and there are $K \leq N$ distinct values of the inefficiencies for the $N$ firms $(u_1, u_2, \ldots, u_N)$. As in Griffin and Steel (2004), we call these distinct values $(u_{(1)}, u_{(2)}, \ldots, u_{(K)})$ and let $n_i$ be the number

of times $u_{(i)}$ occurs. With this notation, the prediction for the inefficiency of a new firm is written as follows:

$$F\!\left(u_{N+1} \mid u_{(1)}, u_{(2)}, \ldots, u_{(K)}, n_1, n_2, \ldots, n_K\right) = \sum_{i=1}^{K}\frac{n_i}{M+N}\,\delta_{u_{(i)}} + \frac{M}{M+N}\,H, \quad (31)$$

where $\delta_{u_{(i)}}$ is a point mass (indicator) that takes the value one if $u_{N+1} = u_{(i)}$ (i.e., $u_{N+1}$ belongs to the

$i$th cluster). Equation (31) indicates that the probability of $u_{N+1}$ taking a value from the existing cluster $i$ is $\frac{n_i}{M+N}$, and the probability of $u_{N+1}$ taking a new value from the centring distribution $H$ is $\frac{M}{M+N}$.

Following Griffin and Steel (2004), we use the following priors for the three parameters ($M$, $H$ and $\theta$) related to the inefficiency term:

$$M \sim \text{inverted-Be}(a, b), \quad (32)$$
$$H = \mathrm{Ga}(1, \theta), \quad (33)$$
$$\theta \sim \mathrm{Ga}(1, -\ln r^{*}), \quad (34)$$

where $\text{Be}(\cdot, \cdot)$ denotes a Beta distribution; $\mathrm{Ga}(\cdot, \cdot)$ denotes a Gamma distribution; and $r^{*}$ is the implied prior median efficiency. Priors for the other parameters are also chosen following Griffin and Steel (2004). Due to space limitations, the full conditional posterior densities for this model are omitted. Interested readers are referred to Griffin and Steel (2004) for more details.
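The Pólya-urn predictive (31) is straightforward to simulate. The function below is an illustrative sketch with an exponential stand-in for the base distribution $H$; the cluster values, counts, and value of $M$ are toy inputs rather than estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

def draw_new_inefficiency(clusters, counts, M, base_draw):
    """Draw u_{N+1} from the predictive (31): pick the existing cluster
    value u_(i) with probability n_i / (M + N), or a fresh draw from the
    base (centring) distribution H with probability M / (M + N)."""
    N = sum(counts)
    probs = np.array(counts + [M], dtype=float) / (M + N)
    k = rng.choice(len(probs), p=probs)
    if k < len(clusters):
        return clusters[k]            # join existing cluster k
    return base_draw()                # new value drawn from H

# K = 2 clusters among N = 10 firms; exponential stand-in for H
clusters, counts = [0.10, 0.35], [6, 4]
u_new = draw_new_inefficiency(clusters, counts, M=2.0,
                              base_draw=lambda: rng.exponential(0.2))
```

Larger values of the scaling parameter $M$ make new clusters more likely, which is how the Dirichlet process controls the flexibility of the inefficiency distribution.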

4. Monte Carlo Simulation Study

In this section, we perform a Monte Carlo study to investigate the performance of the kernel-based semi-parametric stochastic frontier model in terms of its ability to recover true inefficiencies. In doing so, we use two benchmark models for comparison: (1) the exponential stochastic frontier model; and (2) the Dirichlet stochastic frontier model of Griffin and Steel (2004). The former benchmark model is chosen because it is arguably the most commonly-used parametric stochastic frontier model (O'Donnell and Coelli, 2005; Koop et al., 1997). The latter benchmark model is chosen because it is arguably the best-known semi-parametric stochastic frontier model in which the inefficiency term is modeled nonparametrically. The data generating process is given by

$$y_{it} = 1 + x_{it} - u_i + v_{it}, \quad (35)$$

where $x_{it}$ is a scalar generated from a uniform distribution, $x_{it} \sim U(0, 1)$, and the measurement error, $v_{it}$, is generated from $v_{it} \sim N(0, 0.04)$, as in Koop (2003). For the non-negative inefficiency term, $u_i$, we consider two distributions: (1) a generalized Gamma distribution, $GGa(2, 2, 20)$; and (2) an exponential distribution with parameter $-\log(0.85)$. The reason for using the generalized Gamma distribution is that, to the best of our knowledge, this distribution is by far the most general parametric distribution used in the literature (see, for example, Griffin and Steel, 2008) and nests many commonly used parametric distributions, such as the Gamma distribution, the exponential distribution, and the half-normal distribution, as special cases. The reason for using the exponential distribution is to evaluate the performance of our kernel model in the extreme situation where the exponential model has full advantages.

In our Monte Carlo study, we consider $N = 100$, $200$, $300$, $400$, and $T = 6$ (a short panel is used to ensure the time-invariance assumption on $u_i$). For each experiment, 500 repetitions are conducted. As mentioned in the Introduction, our assessment of the performance of the kernel model against the exponential model is based on three different measures. The first measure is the average Euclidean distance between the estimated vector of technical efficiencies (denoted as TE_est) and the true vector of technical efficiencies (denoted as TE_true):

$$d = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{TE\_est}_i - \mathrm{TE\_true}_i\right)^2}.$$

This measure summarizes the performance of alternative estimators in calculating technical efficiencies. The second measure is the Spearman rank correlation coefficient between the estimated vector of technical efficiencies and the true vector of technical efficiencies:

\rho = 1 - \frac{6 \sum_{i=1}^{N} (Rank_{i1} - Rank_{i2})^2}{N (N^2 - 1)},

where Rank_{i1} is the rank of firm i in terms of estimated technical efficiency and Rank_{i2} is the rank of the same firm in terms of true technical efficiency. This measure summarizes the performance of alternative estimators in ranking firms in terms of their efficiencies. The closer it is to 1, the higher the rank correlation between the estimated vector of technical efficiencies and the true vector of technical efficiencies, and thus the better the estimator. The third measure is the coverage probability of the credible interval, i.e., the probability that the estimated credible interval contains the true technical

efficiency value in 500 repetitions. This latter measure gauges the performance of alternative estimators in terms of the accuracy of their credible intervals. Table 1a summarizes the simulation results for the case where the true inefficiency is generated from the generalized Gamma distribution. We first compare the kernel model and the exponential parametric model. As can be seen from Panel A of this table, when N = 100 and T = 6, the kernel model outperforms the exponential parametric model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the latter on the other measure (i.e., the Spearman rank correlation coefficient). However, when N is large enough (i.e., N ≥ 200), the kernel model outperforms the exponential parametric model on all three measures. We then compare the kernel model and the Dirichlet model. For all four combinations of N and T, the kernel model outperforms the Dirichlet model on two out of the three measures, namely, the average Euclidean efficiency distance and the coverage probability. Taking the case where N = 400 and T = 6 for example, the average Euclidean efficiency distance for the kernel model is 0.004, whereas that for the Dirichlet model is 0.005, suggesting that on average estimates of technical efficiency obtained from the kernel model are more accurate than those obtained from the Dirichlet model. The coverage probability for the kernel model is 0.910, indicating that for most observations, the credible interval obtained from the kernel model contains the true efficiency value. In contrast, the coverage probability for the Dirichlet model is lower, being 0.820. That said, we note that the kernel model underperforms the Dirichlet model on the third measure (i.e., the Spearman rank correlation coefficient). For example, when N = 400 and T = 6, the Spearman correlation coefficient is 0.710 for the kernel model and 0.756 for the Dirichlet model.
This suggests that the efficiency ranking based on the kernel model has a lower correlation with the true efficiency ranking than does the efficiency ranking based on the Dirichlet model. Table 1b presents the simulation results for the case where the true inefficiency is generated from the exponential distribution. As expected, both the kernel and Dirichlet stochastic frontier models underperform the exponential stochastic frontier model under this setting. Furthermore, it is interesting to note that, as in the case of the generalized Gamma setting, the kernel model

still outperforms the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability), but underperforms the Dirichlet model on the other measure (i.e., the Spearman rank correlation coefficient). In summary, the simulation results suggest that the kernel model outperforms the exponential model on all three measures as long as N is large enough (i.e., N ≥ 200). In addition, the kernel model performs better than the Dirichlet model on two of the three measures (namely, the average Euclidean efficiency distance and the coverage probability).
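As a concrete illustration of the design above, the following sketch simulates the data generating process in (35) with exponential inefficiency and computes the three comparison measures. The reading of Exp(-log(0.85)) as an exponential with mean -log(0.85), the seed, and all numerical settings are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_panel(N=100, T=6):
    """Simulate y_it = 1 + x_it - u_i + v_it with time-invariant u_i."""
    x = rng.uniform(0.0, 1.0, size=(N, T))            # x_it ~ U(0, 1)
    v = rng.normal(0.0, np.sqrt(0.04), size=(N, T))   # v_it ~ N(0, 0.04)
    u = rng.exponential(scale=-np.log(0.85), size=N)  # mean inefficiency -log(0.85)
    y = 1.0 + x - u[:, None] + v
    return y, x, np.exp(-u)                           # TE_i = exp(-u_i)

def euclidean_distance(te_est, te_true):
    """d = sqrt((1/N) * sum_i (TE_est_i - TE_true_i)^2)."""
    return np.sqrt(np.mean((np.asarray(te_est) - np.asarray(te_true)) ** 2))

def spearman_rho(te_est, te_true):
    """rho = 1 - 6 * sum_i d_i^2 / (N (N^2 - 1)), d_i the rank difference."""
    n = len(te_est)
    d = stats.rankdata(te_est) - stats.rankdata(te_true)
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1.0))

def coverage_probability(lower, upper, te_true):
    """Share of credible intervals that contain the true efficiency."""
    return np.mean((lower <= te_true) & (te_true <= upper))
```

A perfect estimator would give d = 0, rho = 1, and coverage close to the nominal credibility level.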

5. An Application to U.S. Large Commercial Banks

In this section we apply a kernel-based semi-parametric stochastic distance frontier (SDF) model to analyze the efficiency of large banks in the U.S. over the period 2000–2005. In doing so, we mainly use the corresponding exponential parametric SDF model as a benchmark. In addition, we also briefly compare the kernel-based SDF model with the corresponding Dirichlet-process-based semi-parametric SDF model at the end of this section. The data used are obtained from the Reports of Income and Condition (Call Reports) published by the Federal Reserve Bank of Chicago. We examine only continuously operating large banks to avoid the impact of entry and exit and to focus on the performance of a sample of healthy, surviving institutions during the sample period. In this application, large banks are defined to be those with assets of at least $1 billion (in 2000 dollars) in the last three sample years. This gives a total of 292 banks over 6 years. To select the relevant variables, we follow the commonly-accepted intermediation approach proposed by Sealey and Lindley (1977), whereby banks collect purchased funds and use labor and capital to transform these funds into loans and other assets. On the input side, three inputs are included: i) the quantity of labor; ii) the quantity of purchased funds and deposits; and iii) the quantity of physical capital, which includes premises and other fixed assets. On the output side, three outputs are specified: i) consumer loans; ii) securities, which includes all non-loan financial assets (i.e., all financial assets minus the sum of all loans, securities, and equity); and iii) non-consumer loans, which is composed of industrial, commercial, and real estate loans. All the quantities are constructed by following the data construction method in Berger and Mester

(2003). These quantities are also deflated by the GDP deflator to the base year 2000, except for the quantity of labor. Following the common practice in the literature, all variables are normalized by their sample means, as this normalization can facilitate the evaluation of the monotonicity and curvature conditions below (e.g., O'Donnell and Coelli, 2005; Malikov et al., 2015).

5.1 The Kernel-based Semi-parametric SDF Model and the Exponential SDF Model

To allow for multiple outputs, we use a translog output distance function to represent the production technology of commercial banks in the U.S. As can be seen below, this output distance function can be transformed into an estimable equation that has the standard form of the stochastic frontier model as described in Section 2. Specifically, the translog output distance function is written as

\ln D_o(y, x, t) = a_0 + \sum_{m=1}^{M} a_m \ln y_m + \frac{1}{2} \sum_{m=1}^{M} \sum_{p=1}^{M} a_{mp} \ln y_m \ln y_p
  + \sum_{l=1}^{L} b_l \ln x_l + \frac{1}{2} \sum_{l=1}^{L} \sum_{j=1}^{L} b_{lj} \ln x_l \ln x_j + \delta_t t + \frac{1}{2} \delta_{tt} t^2
  + \sum_{l=1}^{L} \sum_{m=1}^{M} g_{lm} \ln x_l \ln y_m + \sum_{m=1}^{M} \theta_m t \ln y_m + \sum_{l=1}^{L} \phi_l t \ln x_l,  (36)

where t denotes a time trend serving as a proxy for technical change. The usual symmetry restrictions require a_{mp} = a_{pm} and b_{lj} = b_{jl}. Moreover, to ensure linear homogeneity of the output distance function in y, the following restrictions are imposed

\sum_{m=1}^{M} a_m = 1; \quad \sum_{p=1}^{M} a_{mp} = 0 \ (m = 1, \ldots, M); \quad \sum_{m=1}^{M} g_{lm} = 0 \ (l = 1, \ldots, L); \quad \sum_{m=1}^{M} \theta_m = 0.  (37)

The output distance function in (36) can be easily transformed into a standard stochastic frontier model by exploiting the linear homogeneity restrictions in (37). Specifically, we follow Lovell et al. (1994) and O'Donnell and Coelli (2005) and impose the linear homogeneity by normalizing

(36) by one of the outputs (say, output M). This normalization gives us

-\ln y_M = \ln D_o\left( \frac{y}{y_M}, x, t \right) + u,  (38)

where u \equiv -\ln D_o(y, x, t) \ge 0 is the usual inefficiency term. After adding an idiosyncratic error, v, (38) can be further written as

-\ln y_M = \ln D_o\left( \frac{y}{y_M}, x, t \right) + u + v,  (39)

which is an estimable equation in the form of a standard stochastic frontier model as described in Section 2.
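For completeness, the homogeneity step invoked in the normalization can be spelled out: since D_o has degree-one homogeneity in outputs,

```latex
D_o\!\left(\frac{y}{y_M}, x, t\right) = \frac{D_o(y, x, t)}{y_M}
\quad\Longrightarrow\quad
\ln D_o\!\left(\frac{y}{y_M}, x, t\right) = \ln D_o(y, x, t) - \ln y_M ,
```

so substituting the inefficiency term u = -ln D_o(y, x, t) and rearranging gives -ln y_M = ln D_o(y/y_M, x, t) + u, which is (38).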

Noting that \ln D_o(y/y_M, x, t) has a translog functional form, the stochastic distance frontier model in (39) can be written more explicitly as

-\ln y_M = a_0 + \sum_{m=1}^{M-1} a_m \ln\left(\frac{y_m}{y_M}\right) + \frac{1}{2} \sum_{m=1}^{M-1} \sum_{p=1}^{M-1} a_{mp} \ln\left(\frac{y_m}{y_M}\right) \ln\left(\frac{y_p}{y_M}\right)
  + \sum_{l=1}^{L} b_l \ln x_l + \frac{1}{2} \sum_{l=1}^{L} \sum_{j=1}^{L} b_{lj} \ln x_l \ln x_j + \delta_t t + \frac{1}{2} \delta_{tt} t^2
  + \sum_{l=1}^{L} \sum_{m=1}^{M-1} g_{lm} \ln x_l \ln\left(\frac{y_m}{y_M}\right) + \sum_{m=1}^{M-1} \theta_m t \ln\left(\frac{y_m}{y_M}\right) + \sum_{l=1}^{L} \phi_l t \ln x_l + u + v.  (40)

In matrix notation, equation (40) can be written compactly as

q_{it} = z_{it}' \gamma + u_i + v_{it},  (41)

where q_{it} = -\ln y_{M,it}, z_{it} is a vector comprising all the variables appearing on the right-hand side of (40), and \gamma refers to the corresponding vector of coefficients of the translog function (including the intercept).
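To illustrate how the regressor vector in (41) is assembled from the translog terms in (40), here is a minimal sketch; the ordering of regressors, the folding of the 1/2 factors into the coefficients, and the function name are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from itertools import combinations_with_replacement as cwr

def translog_row(y, x, t):
    """Return (q, z) for one observation: q = -ln y_M and the translog
    regressors of eq. (40), with output M used for normalization.
    Symmetry is imposed by keeping each cross-product once, so the 1/2
    factors are absorbed into the estimated coefficients."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    ly = np.log(y[:-1] / y[-1])                   # ln(y_m / y_M), m = 1..M-1
    lx = np.log(x)                                # ln x_l, l = 1..L
    z = [1.0]                                     # intercept a_0
    z += ly.tolist()                              # a_m terms
    z += [ly[m] * ly[p] for m, p in cwr(range(ly.size), 2)]  # a_mp terms
    z += lx.tolist()                              # b_l terms
    z += [lx[l] * lx[j] for l, j in cwr(range(lx.size), 2)]  # b_lj terms
    z += [t, t * t]                               # trend terms
    z += [lx[l] * ly[m] for l in range(lx.size)
                        for m in range(ly.size)]  # g_lm terms
    z += [t * v for v in ly]                      # output-trend interactions
    z += [t * v for v in lx]                      # input-trend interactions
    return -np.log(y[-1]), np.array(z)
```

With M = 3 outputs and L = 3 inputs, as in the application, this yields a 28-element regressor vector before any further restrictions are imposed.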

If the density of ui is approximated by the kernel density function in (4), we obtain a kernel- based semi-parametric stochastic distance frontier model. If ui is assumed to be distributed as an exponential distribution, we obtain an exponential parametric stochastic distance frontier model. If

u_i is approximated by the Dirichlet process, we obtain a Dirichlet-process-based semi-parametric stochastic distance frontier model. For notational convenience, we refer to the first model as the "kernel SDF model" and the latter two models as the "exponential SDF model" and the "Dirichlet SDF model", respectively.

5.2 Statistical Comparison of the Kernel and Exponential SDF Models

Before proceeding to model comparison, we first check whether the monotonicity and curvature conditions implied by microeconomic theory are satisfied for the two SDF models. Monotonicity requires that

k_l = \partial \ln D_o(y, x, t) / \partial \ln x_l = b_l + \sum_{j=1}^{L} b_{lj} \ln x_j + \sum_{m=1}^{M} g_{lm} \ln y_m + \phi_l t \le 0

for l = 1, 2, 3, and

r_m = \partial \ln D_o(y, x, t) / \partial \ln y_m = a_m + \sum_{p=1}^{M} a_{mp} \ln y_p + \sum_{l=1}^{L} g_{lm} \ln x_l + \theta_m t \ge 0

for m = 1, 2, 3. To simplify these nonlinear constraints, we follow O'Donnell and Coelli (2005) and deflate the sample data so that all output and input variables have a sample mean of one (as mentioned above) and the time trend has a sample mean of zero. When evaluated at these variable means, \partial \ln D_o(y, x, t) / \partial \ln x_l and \partial \ln D_o(y, x, t) / \partial \ln y_m collapse to b_l and a_m respectively, and the monotonicity conditions can therefore be expressed as b_l \le 0 and a_m \ge 0. As for curvature, it requires that D_o(y, x, t) be quasi-convex in inputs and convex in outputs (O'Donnell and Coelli, 2005; Feng and Serletis, 2010). We first estimate both SDF models without imposing the monotonicity and curvature conditions. These regularity conditions are then checked by evaluating the posterior means of k_l, r_m, the bordered Hessian matrix associated with the quasi-convexity in inputs, and the Hessian matrix associated with the convexity in outputs.4 We then calculate the proportions of regularity violations relative to the total number of observations, and present the results in Table 2. For the exponential

SDF model, Table 2 shows that only two (k_2 and r_1) of the six monotonicity conditions are satisfied at all the observations and that both curvature conditions are violated, with the convexity in outputs being violated at all observations. For the kernel SDF model, all the regularity conditions are violated. In addition, compared with the exponential SDF model, the kernel SDF model has more violations for each of the regularity conditions. Since monotonicity and curvature are not attained in the unconstrained models, we reestimate the two models with these conditions imposed, by following the Bayesian procedure discussed in O'Donnell and Coelli (2005).

4 Due to space limitations, these two matrices are not presented here. For more details, see Feng and Serletis (2010).

The estimated parameters and their associated 95% credible intervals for the two models are reported in Tables 3 and 4, respectively. To check the convergence performance of the sampling algorithms, we also report the simulation inefficiency factor (SIF) for each coefficient of the two models. The SIF can be interpreted as the number of successive iterations needed to obtain independent draws (see, for example, Kim et al., 1998). In our experience, a sampler achieves reasonable mixing performance when the resulting SIF value is below 100. Looking at the SIF values for the kernel SDF model in Table 3, we see that all of them are less than 70, suggesting that the sampler has converged. This is also true for the exponential SDF model, because the SIF values for this latter model (shown in Table 4) are all below 60. In addition, we note that the use of the centering parameterization has significantly improved the convergence of the kernel SDF model. We now turn to comparing the estimation performance of the kernel and exponential SDF models. In doing so, we compute the Bayes factor (Kass and Raftery, 1995), a commonly-used

Bayesian method for model comparison. Letting M_I and M_J denote two competing models, the

Bayes factor is defined as the ratio of the posterior odds of M_J to M_I divided by the prior odds of M_J to M_I. When both models have equal prior probability, the Bayes factor becomes

B_{JI} = \frac{\Pr(D \mid M_J)}{\Pr(D \mid M_I)},

where \Pr(D \mid M_J) and \Pr(D \mid M_I) are the marginal likelihoods of the data under M_J and M_I, respectively. The Bayes factor summarizes "the evidence provided by the data in favor of one scientific theory, represented by a statistical model, as opposed to another" (Kass and Raftery, 1995).

Kass and Raftery (1995) suggest using the Schwarz criterion,

S = l(D \mid M_J) - l(D \mid M_I) - \frac{1}{2} (d_J - d_I) \log n \approx \ln B_{JI},

for evaluating evidence, where l(\cdot) is the maximized log marginal likelihood, d_J (d_I) is the number of parameters of model J (model I), and n is the sample size. This gives an approximation to the Bayes factor without specification of an explicit prior. The quantity 2S
can then be used with the following table to judge which model is preferred by the data

2 ln B_JI    Evidence against M_I
0 to 2       Not worth more than a bare mention
2 to 6       Positive
6 to 10      Strong
> 10         Very strong

For our particular case, the value of 2S of the kernel SDF model against the exponential SDF model is 13.7, strongly suggesting that the former outperforms the latter. This result further confirms our previous conclusion from the Monte Carlo simulations that the kernel-based semi-parametric stochastic frontier model outperforms the exponential stochastic frontier model.
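The Schwarz approximation and the Kass-Raftery evidence table can be wrapped in a short helper; the function names and the log-likelihood inputs here are hypothetical, for illustration only.

```python
import math

def schwarz_2s(loglik_J, loglik_I, d_J, d_I, n):
    """2S = 2 [ l(D|M_J) - l(D|M_I) - (1/2)(d_J - d_I) log n ] ~ 2 ln B_JI."""
    return 2.0 * (loglik_J - loglik_I - 0.5 * (d_J - d_I) * math.log(n))

def evidence_against_I(two_ln_b):
    """Verbal scale of Kass and Raftery (1995) for 2 ln B_JI."""
    if two_ln_b <= 2.0:
        return "not worth more than a bare mention"
    if two_ln_b <= 6.0:
        return "positive"
    if two_ln_b <= 10.0:
        return "strong"
    return "very strong"
```

For example, `evidence_against_I(13.7)` returns "very strong", matching the kernel-versus-exponential comparison above.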

5.3 Comparison of Efficiency Estimates from the Kernel and Exponential SDF Models

Having established the superiority of the kernel SDF model using the Bayes factor, it is of interest to compare the efficiency estimates from the two SDF models. In particular, in what follows we will compare the two SDF models in terms of the posterior efficiency distributions of observed (within-sample) banks and the posterior predictive efficiency density function of an unobserved (out-of-sample) bank. We first compare posterior efficiency density functions for some observed banks obtained from the two SDF models. Figure 1 presents the posterior efficiency densities for five banks: the banks corresponding to the minimum, maximum and quartiles of the efficiency distribution, as measured by the posterior mean efficiencies in the exponential parametric model.5 We see that for the worst, best, and third-quartile banks, there is a big difference between the efficiency inferences resulting from the two SDF models. Specifically, the best and third-quartile banks see their efficiency estimates artificially overestimated due to the use of the exponential functional form of the parametric model, while the worst bank sees its efficiency estimate artificially underestimated due to the use of

5The posterior efficiency density of each bank is obtained by applying a kernel density estimator to simulated values of efficiency resulting from the estimation of the corresponding SDF model, with bandwidths selected by likelihood cross-validation.

the exponential model. This finding is consistent with Griffin and Steel (2004), who suggest that firm-specific efficiency distributions obtained from the exponential parametric model are substantially different from those obtained from semi-parametric stochastic frontier models. With regard to the median and first-quartile banks, the difference between the efficiency inferences resulting from the two SDF models is also noticeable, though not as pronounced.6 Next, we turn to comparing the posterior predictive efficiency density function corresponding to an unobserved bank produced by the two SDF models. Theoretically, the posterior predictive efficiency density function can be calculated by using the following formula:

p(TE_{N+1} \mid y_1, y_2, \ldots, y_N) = \int p(TE_{N+1} \mid \theta) \, p(\theta \mid y_1, y_2, \ldots, y_N) \, d\theta,

where \theta denotes all the parameters and latent variables (including the bandwidth h and the inefficiencies u) in the model under consideration. Given draws \theta^{(s)}, s = 1, \ldots, M, from the posterior density function p(\theta \mid y_1, y_2, \ldots, y_N), the posterior predictive efficiency density function can be approximated by M^{-1} \sum_{s=1}^{M} p(TE_{N+1} \mid \theta^{(s)}). The posterior predictive efficiency density functions obtained from the two SDF models are plotted in Figure 2. As can be seen from this figure, the exponential SDF model is too restrictive to capture the data information. In particular, the kernel SDF model places a lot of probability mass in the interval (0.6, 0.8), which cannot be accommodated by the parametric model based on an exponential inefficiency distribution. The marked difference between the kernel and exponential SDF models is also reflected in their estimates of technical change, returns to scale, and productivity growth, as evidenced by Tables 5, 6, and 7, respectively. Here, technical change (TC), returns to scale (RTS), and productivity growth (TFPG) are defined as in Caves et al. (1982) and Färe and Grosskopf (1994, p. 103):

TC = -\partial \ln D_o(y, x, t) / \partial t;

RTS = \sum_{l=1}^{L} \varepsilon_l, \quad \text{where } \varepsilon_l = -\partial \ln D_o(y, x, t) / \partial \ln x_l;

TFPG = TC + SC + EC, \quad \text{where } SC = (RTS - 1) \sum_{l=1}^{L} \frac{\varepsilon_l}{RTS} \dot{x}_l;

6 We find that the difference is much larger for the median and first-quartile banks when the theoretical monotonicity and curvature conditions implied by microeconomic theory are not imposed. This suggests that for the latter two banks, the relatively smaller difference between the efficiency inferences resulting from the two SDF models is caused by the imposition of the theoretical regularity conditions.

where \dot{x}_l is the growth rate of input l (i.e., \dot{x}_l = d \ln x_l / dt), SC is the scale effect, and EC is efficiency change, which is zero because inefficiency is assumed to be constant over time in this paper. Table 5 shows that the estimates of the average annual technical change obtained from the kernel SDF model range from -1.19% to 4.18%, whereas those obtained from the exponential SDF model vary within a narrower range (0.030% to 3.98%). We also note that while both SDF models suggest that technical change declines over time, the kernel model shows a faster declining rate. Turning to the estimates of returns to scale displayed in Table 6, we see that both the point estimates and 95% credible intervals produced by the kernel SDF model are larger than one (ranging from 1.0291 to 1.0361), indicating that large banks in the U.S. show increasing returns to scale during the sample period. In contrast, all the 95% credible intervals produced by the exponential SDF model contain one, indicating that large banks in the U.S. show almost constant returns to scale. Finally, looking at Table 7, we see that compared with the estimates of average annual productivity growth resulting from the exponential SDF model, those resulting from the kernel SDF model vary within a wider range and also decline at a faster rate over time. Although the estimates of technical change and productivity growth obtained from the two models are quantitatively different, they are qualitatively similar to those reported by previous studies. For example, Feng and Zhang (2012) apply a Bayesian true random-effects translog stochastic distance frontier model to a panel of 350 large banks over the period 1997–2006 and find that technical change (productivity growth) declines from 2.53% to 0.05% (from 2.41% to 1.64%). Finally, we briefly compare the efficiency ranking of individual banks based on the kernel SDF model with that based on the exponential SDF model.
It is well known that efficiency estimates are very sensitive to the parametric distribution assumed for the inefficiency term, whereas efficiency rankings are highly robust to different parametric distributions (Greene, 2008). Thus, it is of interest to examine whether rankings are still robust when a parametric exponential distribution is assumed for the inefficiency term in one model, while a nonparametric kernel density estimator is used to approximate the inefficiency density in the other model. For this purpose, we calculate both the Pearson correlation coefficient and the Spearman rank correlation coefficient between the vector of technical efficiencies obtained from the exponential SDF model and that obtained from the kernel

SDF model. We find that both correlation coefficients are quite high, with the former being 0.928 and the latter being 0.929, suggesting that the efficiency ranking based on the kernel SDF model has a high correlation with that based on the exponential SDF model. This result indicates that rankings are not only highly robust to different parametric distributions, but also quite robust when the inefficiency term is modeled nonparametrically.
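The mixture approximation M^{-1} \sum_s p(TE_{N+1} | \theta^{(s)}) used for the posterior predictive efficiency density can be sketched as follows; the exponential conditional density and the posterior draws shown are illustrative placeholders, not output from the paper's sampler.

```python
import numpy as np

def predictive_te_density(grid, draws, cond_density):
    """Average the conditional efficiency density over posterior draws:
    p(TE_{N+1} | data) ~ (1/M) * sum_s cond_density(grid, theta_s)."""
    dens = np.zeros_like(grid, dtype=float)
    for theta in draws:
        dens += cond_density(grid, theta)
    return dens / len(draws)

def exp_te_density(grid, lam):
    """If u ~ Exponential(mean lam), then TE = exp(-u) has density
    (1/lam) * te^(1/lam - 1) on (0, 1) by change of variables."""
    return (1.0 / lam) * grid ** (1.0 / lam - 1.0)

grid = np.linspace(0.001, 0.999, 1000)
draws = [0.12, 0.15, 0.18]          # hypothetical posterior draws of lam
dens = predictive_te_density(grid, draws, exp_te_density)
```

Because each conditional density integrates to one, the mixture does as well, so the approximation is itself a proper density over the efficiency interval.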

5.4 Comparison of Efficiency Estimates from the Kernel and Dirichlet SDF Models

It would also be of interest to briefly compare the estimates of technical efficiency obtained from the kernel SDF model with those obtained from the Dirichlet SDF model. Before that, we compute the Bayes factor between these two models; the value of 2S of the kernel SDF model against the Dirichlet SDF model is 3.8, suggesting that the former slightly outperforms the latter. Figure 3 plots the posterior efficiency densities obtained using the Dirichlet SDF model, together with those obtained using the kernel SDF model. As can be seen, the firm-specific posterior efficiency densities obtained from the two models largely overlap with each other. This is especially true for the third-quartile bank, where the posterior efficiency densities obtained from the two models almost coincide. Figure 4 plots the posterior predictive efficiency density functions obtained from the two models.7 This latter figure shows that the posterior predictive efficiency distributions from the two models are very close to each other, both placing a lot of probability mass in the interval (0.6, 0.8). Thus, in general we find that the kernel and Dirichlet SDF models produce quite similar results.

5.5 Robustness Check

In this subsection, we examine the robustness of our findings regarding technical efficiency to alternative priors for the bandwidth parameter. Specifically, following Zhang et al. (2009, 2006), we consider the following four priors for the bandwidth parameter: (1) the exponential density, which is a special case of the inverse Gamma density in (9); (2) the half-Cauchy density, which belongs to the half-t family and is a special case of the folded non-central t distribution (Johnson and

7Note that as in Griffin and Steel (2004), the posterior predictive density function with the Dirichlet model is approximated using a piecewise linear interpolation of the posterior predictive distribution function which, when differentiated, gives a piecewise constant estimate of the density function.

Kotz, 1972); (3) the lognormal density; and (4) the improper prior proportional to the inverse of the squared bandwidth (Geweke, 2005), which is flat (or nearly flat) over the range of the parameter space in which the likelihood function is concentrated. Figure 5 plots the posterior efficiency densities for the third-quartile bank, obtained using the four alternative priors respectively. Due to space limitations, the posterior efficiency densities for the other four banks are not presented here. As can be seen from this figure, the kernel SDF model still assigns most mass to the interval (0.6, 0.8) for each of the four alternative priors of the bandwidth parameter. Consequently, the third-quartile bank still sees its efficiency estimates artificially boosted due to the use of the exponential parametric model. This finding is consistent with that obtained using the inverse Gamma distribution. Figure 6 plots the posterior predictive efficiency density function for each of the four alternative priors of the bandwidth parameter, together with that obtained using the exponential parametric model. As can be seen from this figure, the kernel SDF model still places a lot of probability mass in the interval (0.6, 0.8), which cannot be accommodated by the parametric model based on an exponential inefficiency distribution. This latter finding is also consistent with our previous finding obtained using the inverse Gamma distribution. Hence, we conclude that our findings regarding technical efficiency are robust to alternative priors for the bandwidth parameter.
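The four alternative bandwidth priors can be written down compactly for a robustness run; the scale parameters below are illustrative placeholders rather than the paper's settings, and the fourth entry is one reading of the improper, nearly-flat prior.

```python
import numpy as np
from scipy import stats

# Log-densities of four candidate priors for the bandwidth h > 0.
# Scale parameters are illustrative placeholders, not the paper's values.
BANDWIDTH_LOG_PRIORS = {
    "exponential":        lambda h: stats.expon.logpdf(h, scale=1.0),
    "half_cauchy":        lambda h: stats.halfcauchy.logpdf(h, scale=1.0),
    "lognormal":          lambda h: stats.lognorm.logpdf(h, s=1.0),
    "improper_1_over_h2": lambda h: -2.0 * np.log(h),  # p(h) proportional to 1/h^2
}

def log_prior(h, kind="exponential"):
    """Evaluate the chosen log-prior at bandwidth h."""
    return BANDWIDTH_LOG_PRIORS[kind](h)
```

Swapping `kind` in the sampler's acceptance ratio is then all that changes between the robustness runs.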

6. Conclusion

In this paper, we propose a semiparametric modelling framework for stochastic frontier models, where the inefficiency term is approximated by a log-transformed Rosenblatt-Parzen kernel density estimator. We present a sampling algorithm for the kernel-based semiparametric stochastic frontier model. This sampling algorithm has two important features. First, it provides a data-driven solution to the problem of estimating the bandwidth for the kernel estimator of the density of the inefficiency term simultaneously with the model parameters. Second, it uses a hybrid sampling procedure, which randomly mixes updates from the centered and uncentered parameterizations, in order to identify the intercept and the inefficiency term separately. Our Monte Carlo study shows that the kernel-based semiparametric stochastic frontier model outperforms the commonly-used exponential stochastic frontier model under three measures, namely the Euclidean distance between the estimated and true vectors of technical

efficiencies, the Spearman rank correlation coefficient between the estimated and true vectors of technical efficiencies, and the coverage probability of the credible interval. We apply a kernel-based semiparametric stochastic distance frontier (SDF) model to a sample of large banks in the U.S. from 2000 to 2005. Our Bayes factor analysis suggests that the kernel SDF model is preferred over the exponential SDF model. Our analysis of the posterior predictive efficiency density function shows that predicting the efficiency of an unobserved bank on the basis of the exponential SDF model is misleading. Our further analysis of the posterior efficiency distributions of observed banks shows that the efficiency inference on observed banks based on the exponential SDF model is also very different from what we conclude on the basis of the kernel SDF model. In particular, we find that the exponential SDF model tends to substantially overestimate efficiency. In addition, we find that the kernel model and the Dirichlet model produce very similar firm-specific posterior efficiency densities and posterior predictive efficiency distributions.

30 References

Aigner, D., Lovell, C.A.K., Schmidt, P. (1977) Formulation and estimation of stochastic frontier production function models. Journal of Econometrics 6: 21–37.

Battese, G.E., Coelli, T.J. (1988) Prediction of firm-level technical efficiencies with a generalized frontier production function and panel data. Journal of Econometrics 38: 387-399.

Berger, A.N., Mester, L.J. (2003) Explaining the dramatic changes in the performance of U.S. banks: Technological change, deregulation, and dynamic changes in competition. Journal of Financial Intermediation 12: 57-95.

Brewer, M.J. (2000) A Bayesian model for local smoothing in kernel density estimation. Statistics and Computing 10(4): 299–309.

Buch-Larsen, T., Nielsen, J.P., Guillen, M., Bolance, C. (2005) Kernel density estimation for heavy-tailed distributions using the Champernowne transformation. Statistics 39: 503-518.

Caves, D.W., Christensen, L.R., Diewert, W.E. (1982) The economic theory of index numbers and the measurement of input, output, and productivity. Econometrica 50: 1393–1414.

Chen, S.X. (2000) Probability density function estimation using gamma kernels. Annals of the Institute of Statistical Mathematics 52: 471-480.

Cornwell, C., Schmidt, P., Sickles, R.C. (1990) Production frontiers with cross-sectional and time-series variation in efficiency levels. Journal of Econometrics 46(1-2): 185-200.

de Lima, M.S., Atuncar, G.S. (2011) A Bayesian method to estimate the optimal bandwidth for multivariate kernel estimator. Journal of Nonparametric Statistics 23(1): 137-148.

Färe, R., Grosskopf, S. (1994) Cost and revenue constrained production, Springer, New York.

Feng, G., Serletis, A. (2010) Efficiency, technical change, and returns to scale in large US banks: Panel data evidence from an output distance function satisfying theoretical regularity. Journal of Banking and Finance 34: 127-138.

Feng, G., Zhang, X. (2012) Productivity and efficiency at large and community banks in the U.S.: A Bayesian true random effects stochastic distance frontier analysis. Journal of Banking and Finance 36: 1883-1895.

Ferguson, T. (1973) A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1: 209-230.

Fernandez, C., Osiewalski, J., Steel, M.F.J. (1997) On the use of panel data in stochastic frontier models with improper priors. Journal of Econometrics 79: 169-193.

Gangopadhyay, A.K., Cheung, K. (2002) A Bayesian approach to the kernel density estimation. Journal of Nonparametric Statistics 14: 655-664.

Gelfand, A.E., Sahu, S., Carlin, B. (1995) Efficient parametrization for normal linear mixed effects models. Biometrika 82: 479-488.

Geweke, J.F. (2005) Contemporary Bayesian econometrics and statistics. John Wiley and Sons, Canada.

Geweke, J.F. (2009) Complete and incomplete econometric models. Princeton University Press, New Jersey.

Greene, W. (1990) A gamma-distributed stochastic frontier model. Journal of Econometrics 46: 141–163.

Greene, W. (2004) Distinguishing between heterogeneity and inefficiency: Stochastic frontier analysis of the World Health Organization's panel data on national health care systems. Health Economics 13: 959-980.

Greene, W. (2008) The econometric approach to efficiency analysis. In Fried, H.O., Knox Lovell, C.A., and Schmidt, P. (eds), The Measurement of Productive Efficiency. Oxford University Press, New York and Oxford.

Griffin, J., Steel, M.F.J. (2004) Semiparametric Bayesian inference for stochastic frontier models. Journal of Econometrics 123: 121–152.

Griffin, J., Steel, M.F.J. (2008) Flexible mixture modelling of stochastic frontiers. Journal of Productivity Analysis 29: 33-45.

Härdle, W., Hall, P., Ichimura, H. (1993) Optimal smoothing in single-index models. The Annals of Statistics 21(1): 157–178.

Jaki, T., West, R.W. (2008) Maximum kernel likelihood estimation. Journal of Computational and Graphical Statistics 17: 976-993.

Johnson, N., Kotz, S. (1972) Distributions in statistics: Continuous multivariate distributions. Wi- ley, New York.

Kass, R.E., Raftery A.E. (1995) Bayes factors. Journal of the American Statistical Association 90: 773-795.

Kim, S., Shephard, N., Chib, S. (1998) Stochastic volatility: Likelihood inference and comparison with ARCH models. The Review of Economic Studies 65: 361–393.

Koop, G. (2003) Bayesian econometrics. Wiley, Chichester.

Koop, G., Osiewalski, J., Steel, M. (1997) Bayesian efficiency analysis through individual effects: Hospital cost frontiers. Journal of Econometrics 76: 77–105.

Lovell, C.A., Pastor, J.T., Turner, J.A. (1994) Measuring macroeconomic performance in the OECD: A comparison of European and Non-European countries. European Journal of Operational Research 87: 507-518.

Malikov, E., Kumbhakar, S., Tsionas, E. (2015) A cost system approach to the stochastic directional technology distance function with undesirable outputs: The case of US banks in 2001-2010. Journal of Applied Econometrics. DOI: 10.1002/jae.2491.

Meeusen, W., Van Den Broeck, J. (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review 18: 435–444.

O’Donnell, C.J., Coelli, T.J. (2005) A Bayesian approach to imposing curvature on distance functions. Journal of Econometrics 126: 493–523.

Ritter, C., Simar, L. (1997) Pitfalls of normal-gamma stochastic frontier models. Journal of Productivity Analysis 8(2): 167–182.

Rothe, C. (2009) Semiparametric estimation of binary response models with endogenous regressors. Journal of Econometrics 153(1): 51–64.

Sealey, C. W., Lindley, J. T. (1977) Inputs, outputs, and a theory of production and cost at depository financial institutions. The Journal of Finance 32(4): 1251–1266.

Schmidt, P., Sickles, R.C. (1984) Production frontiers and panel data. Journal of Business & Economic Statistics 2(4): 367–374.

Stevenson, R. (1980) Likelihood functions for generalized stochastic frontier estimation. Journal of Econometrics 13: 57–66.

Tierney, L. (1994) Markov chains for exploring posterior distributions. The Annals of Statistics 22: 1701–1728.

Terrell, G.R., Scott, D.W. (1992) Variable kernel density estimation. The Annals of Statistics 20: 1236–1265.

Wand, M.P., Marron, J.S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association 86: 343–353.

Yuan, A., de Gooijer, J.G. (2007) Semiparametric regression with kernel error model. Scandinavian Journal of Statistics 34(4): 841–869.

Zhang, X., Brooks, R., King, M.L. (2009) A Bayesian approach to bandwidth selection for multivariate kernel regression with an application to state-price density estimation. Journal of Econometrics 153(1): 21–32.

Zhang, X., King, M.L., Hyndman, R. (2006) A Bayesian approach to bandwidth selection for multivariate kernel density estimation. Computational Statistics and Data Analysis 50(11): 3009–3031.

Zhang, X., King, M.L., Shang, H.L. (2014) A sampling algorithm for bandwidth estimation in a regression model with a flexible error density. Computational Statistics and Data Analysis 78: 218–234.

TABLE 1A

MONTE CARLO SIMULATION RESULTS (ui IS GENERATED FROM GENERALIZED GAMMA DISTRIBUTION)

Panel A: N = 100, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.010            0.015              0.011
Spearman Rank Correlation Coefficient       0.559            0.607              0.660
Coverage Probability                        0.618            0.430              0.563

Panel B: N = 200, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.005            0.011              0.007
Spearman Rank Correlation Coefficient       0.610            0.583              0.643
Coverage Probability                        0.685            0.460              0.628

Panel C: N = 300, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.004            0.009              0.006
Spearman Rank Correlation Coefficient       0.667            0.601              0.701
Coverage Probability                        0.810            0.633              0.785

Panel D: N = 400, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.004            0.008              0.005
Spearman Rank Correlation Coefficient       0.710            0.579              0.756
Coverage Probability                        0.910            0.710              0.820

TABLE 1B

MONTE CARLO SIMULATION RESULTS (ui IS GENERATED FROM EXPONENTIAL DISTRIBUTION)

Panel A: N = 100, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.008            0.006              0.016
Spearman Rank Correlation Coefficient       0.809            0.879              0.776
Coverage Probability                        0.780            0.950              0.690

Panel B: N = 200, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.006            0.004              0.011
Spearman Rank Correlation Coefficient       0.609            0.820              0.718
Coverage Probability                        0.775            0.940              0.665

Panel C: N = 300, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.008            0.003              0.009
Spearman Rank Correlation Coefficient       0.623            0.794              0.743
Coverage Probability                        0.743            0.937              0.533

Panel D: N = 400, T = 6

                                        Kernel model   Exponential model   Dirichlet model
Average Euclidean Distance                  0.007            0.003              0.009
Spearman Rank Correlation Coefficient       0.515            0.782              0.682
Coverage Probability                        0.615            0.953              0.490
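The three comparison metrics reported in Tables 1A and 1B can be computed along the following lines. This is a minimal sketch under our own assumed definitions (the paper's exact formulas are not reproduced here, and the function names are ours): the average Euclidean distance is the distance between the true and estimated efficiency vectors averaged over the N firms, the Spearman coefficient measures rank agreement between the two vectors, and the coverage probability is the share of firms whose true efficiency falls inside the 95% credible interval.

```python
import numpy as np

def avg_euclidean_distance(eff_true, eff_est):
    # Euclidean distance between the true and estimated efficiency
    # vectors, averaged over firms (one plausible definition).
    eff_true, eff_est = np.asarray(eff_true), np.asarray(eff_est)
    return float(np.linalg.norm(eff_true - eff_est) / len(eff_true))

def spearman_rank_corr(x, y):
    # Pearson correlation of the ranks (assumes no ties, which holds
    # almost surely for continuous efficiency draws).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.sum(rx * ry) / np.sqrt(np.sum(rx**2) * np.sum(ry**2)))

def coverage_probability(eff_true, lower, upper):
    # Share of firms whose true efficiency lies inside the estimated
    # 95% credible interval [lower, upper].
    eff_true = np.asarray(eff_true)
    inside = (np.asarray(lower) <= eff_true) & (eff_true <= np.asarray(upper))
    return float(np.mean(inside))
```

Higher Spearman correlations and coverage probabilities, and lower average distances, indicate a model that recovers the simulated inefficiencies more faithfully, which is how the panels above are read.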

TABLE 2

REGULARITY VIOLATIONS (UNCONSTRAINED MODELS).

Regularity condition            Exponential Parametric SDF Model   Kernel SDF Model

Monotonicity
  k̄1 ≤ 0                                 11.34%                        29.02%
  k̄2 ≤ 0                                  0%                            0.07%
  k̄3 ≤ 0                                 69.73%                        82.18%
  r̄1 ≥ 0                                  0%                            0.93%
  r̄2 ≥ 0                                  6.88%                        12.11%
  r̄3 ≥ 0                                  0.37%                         0.53%

Curvature
  Quasi-convex in inputs                 100%                          100%
  Convex in outputs                       16.23%                        27.45%
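The percentages in Table 2 are the shares of sample observations at which a fitted model violates a given regularity condition. As an illustrative sketch (the helper below is ours, not the paper's code; the condition labels follow the table), such a share can be computed from the per-observation posterior-mean elasticities:

```python
import numpy as np

def violation_share(values, violated):
    # Fraction of observations at which a regularity condition is
    # violated; `violated` is a boolean test for violation.
    return float(np.mean(violated(np.asarray(values))))

# Example: monotonicity in input 1 is violated where the posterior-mean
# input elasticity k1 is non-positive (the "k1 <= 0" row of Table 2).
k1 = np.array([0.21, -0.03, 0.18, 0.07])
rate = violation_share(k1, lambda k: k <= 0)  # 0.25, i.e. 25% of observations
```

The same helper applies to the output-elasticity rows (violation where r̄ ≥ 0 fails to hold with the required sign) and, with a suitable test on the Hessian, to the curvature rows.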

TABLE 3

PARAMETER ESTIMATES FOR THE KERNEL SDF MODEL

Parameter   Estimate    95% Credible Interval   SIF
a0          −0.0655     (−0.0854, −0.0497)       54
a1           0.3495     (0.3330, 0.3680)         57
a2           0.1185     (0.1107, 0.1303)         60
a3           0.5320     (0.5250, 0.5355)         60
a11          0.0722     (0.0621, 0.0876)         57
a12          0.0136     (0.0090, 0.0180)         60
a13         −0.0858     (−0.0941, −0.0777)       61
a22          0.0090     (0.0057, 0.0116)         63
a23         −0.0226     (−0.0325, −0.0193)       58
a33          0.1084     (0.0999, 0.1108)         57
b1          −0.1979     (−0.2662, −0.1072)       69
b2          −0.7402     (−0.8419, −0.6641)       70
b3          −0.0871     (−0.1072, −0.0643)       56
b11         −0.1161     (−0.1589, −0.0300)       68
b12          0.1138     (0.0548, 0.1540)         68
b13          0.0060     (−0.0091, 0.0194)        47
b22         −0.0692     (−0.1056, −0.0353)       63
b23         −0.0319     (−0.0444, −0.0159)       50
b33          0.0035     (−0.0070, 0.0157)        36
g11         −0.0664     (−0.0794, −0.0433)       60
g21          0.0975     (0.0798, 0.1132)         59
g31         −0.0097     (−0.0165, −0.0033)       43
g12          0.0232     (0.0138, 0.0313)         57
g22         −0.0098     (−0.0178, −0.0021)       44
g32         −0.0031     (−0.0070, 0.0012)        44
g13          0.0432     (0.0384, 0.0509)         58
g23         −0.0877     (−0.0928, −0.0785)       58
g33          0.0128     (0.0121, 0.0174)         60
δτ          −0.0706     (−0.0802, −0.0596)       53
δττ          0.0129     (0.0097, 0.0161)         54
δ1          −0.0111     (−0.0148, 0.0069)        58
δ2          −0.0031     (−0.0048, 0.0009)        58
δ3           0.0142     (0.0130, 0.0199)         56
ρ1           0.0056     (−0.0042, 0.0134)        62
ρ2          −0.0074     (−0.0146, 0.0028)        63
ρ3           0.0021     (−0.0005, 0.0047)        35

TABLE 4

PARAMETER ESTIMATES FOR THE EXPONENTIAL PARAMETRIC SDF MODEL

Parameter   Estimate    95% Credible Interval   SIF
a0          −0.0412     (−0.0845, −0.0037)       57
a1           0.3313     (0.3139, 0.3482)         46
a2           0.1216     (0.1072, 0.1339)         55
a3           0.5471     (0.5434, 0.5500)         44
a11          0.0849     (0.0709, 0.0995)         50
a12          0.0096     (0.0038, 0.0141)         49
a13         −0.0945     (−0.0963, −0.0895)       35
a22          0.0102     (0.0084, 0.0122)         40
a23         −0.0198     (−0.0230, −0.0141)       45
a33          0.1143     (0.1114, 0.1206)         23
b1          −0.3539     (−0.4012, −0.3086)       55
b2          −0.5573     (−0.5979, −0.5139)       53
b3          −0.0921     (−0.1164, −0.0678)       52
b11         −0.1990     (−0.2305, −0.1671)       40
b12          0.1862     (0.1522, 0.2323)         53
b13          0.0065     (−0.0081, 0.0208)        27
b22         −0.1372     (−0.1865, −0.0911)       53
b23         −0.0344     (−0.0539, −0.0152)       44
b33          0.0088     (−0.0060, 0.0256)        43
g11         −0.0593     (−0.0864, −0.0411)       53
g21          0.0876     (0.0685, 0.1189)         56
g31          0.0135     (−0.0235, −0.0028)       42
g12          0.0161     (0.0041, 0.0283)         54
g22          0.0020     (−0.0076, 0.0161)        56
g32         −0.0048     (−0.0085, −0.0011)       29
g13          0.0432     (0.0367, 0.0442)         34
g23         −0.0896     (−0.0971, −0.0815)       55
g33          0.0183     (0.0106, 0.0270)         56
δτ          −0.0615     (−0.0707, −0.0511)       42
δττ          0.0101     (0.0072, 0.0126)         42
δ1          −0.0077     (−0.0111, −0.0046)       42
δ2          −0.0038     (−0.0052, −0.0020)       44
δ3           0.0115     (0.0040, 0.0127)         40
ρ1           0.0125     (0.0055, 0.0199)         40
ρ2          −0.0122     (−0.0195, −0.0075)       47
ρ3           0.0016     (−0.0014, 0.0049)        28

TABLE 5

TECHNICAL CHANGE

        A. Kernel SDF Model                  B. Exponential Parametric SDF Model
Year    Estimate   95% Credible Interval     Estimate   95% Credible Interval
2001     0.0418    (0.0370, 0.0458)           0.0398    (0.0350, 0.0446)
2002     0.0288    (0.0254, 0.0316)           0.0289    (0.0261, 0.0326)
2003     0.0158    (0.0117, 0.0191)           0.0185    (0.0154, 0.0220)
2004     0.0018    (−0.0048, 0.0079)          0.0078    (0.0028, 0.0132)
2005    −0.0119    (−0.0216, −0.0028)        −0.0030    (−0.0103, 0.0049)

TABLE 6

RETURNS TO SCALE

        A. Kernel SDF Model                  B. Exponential Parametric SDF Model
Year    Estimate   95% Credible Interval     Estimate   95% Credible Interval
2000     1.0292    (1.0200, 1.0414)           1.0020    (0.9898, 1.0146)
2001     1.0291    (1.0208, 1.0406)           1.0014    (0.9907, 1.0128)
2002     1.0294    (1.0209, 1.0395)           1.0013    (0.9913, 1.0127)
2003     1.0307    (1.0217, 1.0414)           1.0017    (0.9920, 1.0143)
2004     1.0341    (1.0240, 1.0472)           1.0032    (0.9929, 1.0182)
2005     1.0361    (1.0248, 1.0503)           1.0040    (0.9927, 1.0208)

TABLE 7

PRODUCTIVITY GROWTH

        A. Kernel SDF Model                  B. Exponential Parametric SDF Model
Year    Estimate   95% Credible Interval     Estimate   95% Credible Interval
2001     0.0455    (0.0405, 0.0497)           0.0404    (0.0363, 0.0445)
2002     0.0304    (0.0271, 0.0333)           0.0296    (0.0269, 0.0329)
2003     0.0179    (0.0142, 0.0212)           0.0188    (0.0158, 0.0222)
2004     0.0047    (−0.0011, 0.0106)          0.0082    (0.0033, 0.0135)
2005    −0.0092    (−0.0180, −0.0003)        −0.0026    (−0.0099, 0.0053)

Figure 1: Posterior Efficiency Distributions for Five Banks (solid line: Kernel SDF Model; dashed line: Exponential Parametric SDF Model)

Figure 2: Posterior Predictive Efficiency Densities (solid line: Kernel SDF Model; dashed line: Exponential Parametric SDF Model)

Figure 3: Posterior Efficiency Distributions for Five Banks (solid line: Kernel SDF Model; dashed line: Dirichlet SDF Model)

Figure 4: Posterior Predictive Efficiency Densities (solid line: Kernel SDF Model; dashed line: Dirichlet SDF Model)

Figure 5: Posterior Efficiency Distributions for the Third Quantile Bank

Figure 6: Posterior Predictive Efficiency Densities