Journal of Agricultural and Applied Economics, 42,2(May 2010):303–319 Ó 2010 Southern Agricultural Economics Association
A Flexible Parametric Family for the Modeling and Simulation of Yield Distributions
Octavio A. Ramirez, Tanya U. McDonald, and Carlos E. Carpio
The distributions currently used to model and simulate crop yields are unable to accom- modate a substantial subset of the theoretically feasible mean-variance-skewness-kurtosis (MVSK) hyperspace. Because these first four central moments are key determinants of shape, the available distributions might not be capable of adequately modeling all yield distributions that could be encountered in practice. This study introduces a system of distributions that can span the entire MVSK space and assesses its potential to serve as a more comprehensive parametric crop yield model, improving the breadth of distributional choices available to researchers and the likelihood of formulating proper parametric models.
Key Words: risk analysis, parametric methods, yield distributions, yield modeling and simulation, yield nonnormality
JEL Classifications: C15, C16, C46, C63
Agricultural economists have long recognized Shonkwiler (1993), Ramirez, Moss, and Boggess that the choice of an appropriate probability (1994), Coble et al. (1996), and Ramirez (1997). distribution to represent crop yields is critical These authors have provided strong statistical for an accurate measurement of the risks as- evidence of nonnormality and heteroskedasticity sociated with crop production. Anderson in crop yield distributions—specifically, the ex- (1974) first emphasized the importance of ac- istence of kurtosis and negative skewness in counting for nonnormality in crop yield distri- a variety of cases. The possibility of positive butions for the purpose of economic risk anal- skewness has been documented as well (Ramı´rez, ysis. Since then, numerous authors have Misra, and Field, 2003). focused on this issue, including Gallagher The three general statistical procedures that (1987), Nelson and Preckel (1989), Moss and have been used for the modeling and simulating of crop yield distributions are the parametric, the nonparametric, and semiparametric. All have Octavio A. Ramirez is professor and head, Department of Agricultural and Applied Economics, University of distinct advantages and disadvantages. The para- Georgia, Athens, GA. Tanya U. McDonald is research metric procedures assume that the data-generat- specialist, Department of Agricultural Economics and ing process can be adequately represented by a Agricultural Business, New Mexico State University, particular parametric probability distribution Las Cruces, NM. Carlos E. Carpio is assistant pro- fessor, Department of Applied Economics and Statis- function. For this reason, the main disadvantage tics, Clemson University, Clemson, SC. of this method is the potential error resulting This study was supported by the National Research from assuming a probability distribution that is Initiative of the Cooperative State Research, Education and not flexible enough to properly represent the Extension Service, USDA, Grant #2004-35400-14194, and by the Agricultural Experiment Stations of the Uni- data. The main advantage of this method is that, versity of Georgia and New Mexico State University. if the assumed distribution can adequately 304 Journal of Agricultural and Applied Economics, May 2010 represent the data-generating process, it performs Coble (2003) proposed a semiparametric model relatively well even in small sample applications. based on the Normal and Beta densities. Em- This is important because crop yield datasets do pirical comparisons of leading parametric not often span long periods. Distributions that models have been recently attempted (Norwood, have been used as a basis for parametric pro- Roberts, and Lusk, 2004). Despite such ad- cedures include the Normal, the Log-normal, the vances, the potential of different probability Logistic, the Weibull, the Beta, the Gamma, and density functions (pdfs) to serve as suitable yield the Inverse Hyperbolic Sine. and price distribution models has not been The nonparametric approaches also have assessed in the context of a rigorous theoretical strengths and weaknesses. Because these framework, and this critical research methods methods are free of a functional form assump- issue remains unsettled. tion, they are generally more flexible. However, According to basic statistical theory (Mood, they can be inefficient relative to parametric Graybill, and Boes, 1974), the first four central procedures under certain conditions. Specifi- moments of a pdf are the main descriptors of its cally, according to Ker and Coble (2003), ‘‘it is shape. Although there are other means for possible, perhaps likely, for very small samples characterizing and comparing distributions, this such as those corresponding to farm-level yield suggests that the flexibility of a pdf to accom- data, that an incorrect parametric form—say modate a variety of empirical shapes (i.e., data- Normal—is more efficient than the standard generating processes) should be closely related to nonparametric kernel estimator.’’ Other authors the Mean-Variance-Skewness-Kurtosis (MVSK) cite theoretical complexity and intensive com- combinations that are allowed by it. putational requirements as another disadvantage Unfortunately, in this regard, all of the pdfs of nonparametric procedures (Yatchew, 1998). that have been used as a basis for parametric Semiparametric methods show significant po- methods suffer from two significant range re- tential because they encapsulate the advantages strictions: (1) they can only accommodate lim- of the parametric and nonparametric approaches ited subsets of all of the theoretically feasible while mitigating their disadvantages (Ker and Skewness-Kurtosis (SK) combinations and, Coble, 2003; Norwood, Roberts, and Lusk, therefore, might only be capable of adequately 2004). Because the semiparametric approach is modeling underlying data distributions where based on nonparametrically ‘‘correcting’’ a par- third and fourth central moments are within that ticular parametric estimate, the availability of subset; and (2) their variance, skewness, and more flexible distributions such as the ones ad- kurtosis are controlled by only two parameters vanced in this study should improve the potential and, therefore, are arbitrarily interrelated (i.e., efficiency of semiparametric procedures as well. only two of these three key moments are free to Extensive efforts have been devoted to the vary independently). Although the expanded issue of the most appropriate probability dis- form of the SU advanced by Ramı´rez, Misra, and tribution to be used as a basis for parametric or Field (2003) allows for any mean and variance semiparametric methods. Gallagher (1987) used to be freely associated with the SK values per- the well-known Gamma density as a parametric mitted by the original parameterization of this model for soybean yields. Nelson and Preckel family, it is far from encompassing all theoret- (1989) proposed a conditional Beta distribution ically feasible SK combinations. to model corn yields. Taylor (1990) estimated This research contributes to the yield and multivariate nonnormal densities through a con- price distribution modeling literature through ditional distribution approach based on the the following measures:
Hyperbolic Tangent transformation. Ramirez 1. Introduce a family of distributions (SB) that (1997) introduced a modified Inverse Hyper- perfectly complements the SU in its coverage bolic Sine transformation (also known in the of the SK space, and that spans significant statistics literature as the SU Distribution) as regions of the space not covered by any other a possible multivariate nonnormal and hetero- distribution that has been used for the mod- skedastic crop yield distribution model. Ker and eling and simulation of crop yields. Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 305
2. Derive expanded forms of this SB family and outline a reparameterization technique that of the Beta distribution, which are analogous expands any probability distribution by two to Ramı´rez, Misra, and Field’s (2003) repar- parameters that specifically and uniquely con-
ameterization of the SU, so that they, too, can trol the mean and variance without affecting the allow for any mean and variance to be freely range of skewness and kurtosis values that can associated with their possible SK values. be accommodated. The expanded distribution 3. Assess the importance of MVSK coverage in obtained through this reparameterization can determining a model’s flexibility to adequately therefore model any conceivable mean and represent a wide range of distributional shapes. variance in conjunction with the set of SK combinations allowed by the original distribu- The SU-SB System tion. For the purposes of this study, the tech-
nique is first applied to the SU and SB families. Johnson (1949) introduced the SU and the SB This yields a system that can model all theo- families of distributions and showed that, to- retically feasible MVSK combinations. The gether, they span the entire SK space. Figure 1 reparameterization begins with the original is constructed on the basis of the formulas for two-parameter families (Johnson, 1949): the skewness and kurtosis of the SU and SB 1 distributions, which were also first derived by (1) Z 5 g 1 d sinh Y for the SU distribution Johnson (1949). Specifically, SK pairs are (2) Z 5 g 1 d ln½Y=ð1 YÞ for the S computed and plotted for very fine grids of the B distribution parameter spaces corresponding to these two distributions. The same procedure is followed where Y is a nonnormally distributed random for the Beta, Gamma, and Log-normal (SL). It variable based on a standard normal variable is observed that the lower bound of the SB is at (Z). In other words, the original SU and SB the boundary for the theoretically feasible SK distributions are derived from transformations space (K 5 S2 22). This plotting illustrates of a standard normal density. Their pdfs, which Johnson’s claim of a comprehensive coverage are also provided by Johnson (1949), are of the theoretically feasible SK space by the obtained by substituting Z in Equation (1) (for
SU-SB families. the SU) or Equation (2) (for the SB) into However, in the parameterizations proposed a standard normal density and multiplying the by Johnson, each of those SK combinations is resulting equation by the derivative of Equation arbitrarily associated with a fixed set of mean- (1) (for the SU) or Equation (2) (for the SB) with variance values. Ramirez and McDonald (2006) respect to Y. In the mathematical statistics
Figure 1. SU,SL,SB, Beta, and Gamma Distributions in the SK Plane; the SB Distribution Allows all SK Combinations in the Beta and Gamma Areas as Well 306 Journal of Agricultural and Applied Economics, May 2010
literature, this is commonly known as the allowed by the original SU and SB families. The transformation technique for deriving pdfs. final step in the reparameterization process is to Note that from Equations (1) and (2) it fol- expand the Y S distributions so that, instead of lows that: being zero and one, their means and variances Z g can be controlled by parameters or by para- Y 5 sinh 5 sinhðNÞ for the metric functions of explanatory variables. This (3) d is accomplished by multiplying Y S times the SU distribution parameter or parametric function representing Z g exp the variance and then subtracting the parametric d eN Y5 5 for the function representing the mean: (4) Z g ð1 1 eN Þ 1 1 exp d F S S (7) Yt 5stY Mt5ðZtsÞY ðXtbÞ SB distribution where N is a normal random variable with where Y S is as defined in Equations (5) and (6) g 1 mean and variance . The above equations for the SU and SB distributions, t 5 1, ...,T d d2 express the SU and SB random variables (Y) as denotes the observations on the explanatory F a function of a normal, and can be used for variables, and Yt represents the final random simulating draws from their probability distri- variables of interest. From Equation (7), note butions. Johnson (1949) also provides the for- that for both reparameterized variables: mulas for computing their means and variances, F F 2 2 which will be denoted by FSU and FSB (for the (8) E½Yt 5Mt5Xtb and V½Yt 5 st 5 ðZtsÞ means) and GSU and GSB (for the variances). These formulas are simple but lengthy trigono- where Xt and Zt represent vectors of explanatory metric functions of g and d. The skewness and variables believed to affect the means and vari- kurtosis of the SU and SB distributions are ances of the distributions, and b and s are con- functions of g and d as well. All formulas and formable parameter vectors. That is, the mean a Gauss program to compute the first four cen- and variance of the reparameterized SU and SB F tral moments of both distributions for given random variables (Yt ) are uniquely controlled 2 values for g and d are available from the authors. by Mt and st ,whileg and d determine their The random variables (Y) corresponding to skewness and kurtosis according to the formulas each of the two distributions given in Equations (3) provided by Johnson (1949) for the original SU and (4) are then standardized by subtracting their and SB distributions. Therefore, the reparame- means and dividing by their standard deviations: terized SU-SB system can accommodate any theoretically possible MVSK combination. sinhðNÞ F (5) YS5 SU for the S distribution As noted previously, Figure 1 illustrates the 1=2 U GSU SK regions covered by the SU and SB, as well eN as three other commonly used distributions. FSB 1 1 eN The SL or Log-normal distribution, which is S5 (6) Y 1=2 for the SB distribution also a part of the original Johnson system, only GSB spans the curvilinear boundary between the SU Note that the standardized SU and SB variables and SB. The Gamma distribution only spans (YS) will always have a mean of zero and a curvilinear segment on the upper right a variance of one, i.e., the parameters g and d no quadrant of the SK plane as well. Like the SL, longer affect the mean or the variance of their the Gamma can be adapted to cover the mirror distributions. However, because standardization image of this segment on the upper left quad- only involves subtracting from and dividing the rant. However, the combinations of SK values original random variables (Y) by constants, the allowed by it are still extremely limited. distributions corresponding to these standard- Although the Beta covers a nonnegligible S ized variables (Y ) can still accommodate the area of the SK plane, note that the SB can ac- same sets of skewness-kurtosis combinations commodate all SK combinations allowed by it. Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 307
ffiffiffiffiffiffi In fact, the Beta region is quite narrower than the XT p GB LLB5 ln 1 n ln GðÞd 1 l SB’s (i.e., the Beta only covers a subset of the SK 2 t51 st area spanned by the S ). Thus, in general, one B n ln GðÞ d n ln GðÞl might expect the SB to be a more applicable ! XT model than the Beta. However, because higher- (11) 1 ðd 1Þ ln Pt order moments also affect distributional shape, t51 ! it is possible that, in some applications, a simi- XT larly parameterized Beta would provide for 1 ðÞl 1 lnðÞ 1 Pt , t51 a better model than the S . Another difference ffiffiffiffiffi B F p d ðYt MtÞ GB that could affect their relative performance in where Pt 5 1 2 and G repre- d 1 l st a particular case is the fact that the Beta is sent the Gamma function. As in the case of the F F a bounded distribution, whereas the SB is not. expanded SU and SB, E[Yt ] 5 Mt and V[Yt ] 5 2 2 In short, even if the Beta distribution is rep- st and Mt and st can be specified as linear arameterized using the previously discussed functions of relevant explanatory variables. technique, because it only spans a relatively small subset of the empirically possible SK Estimation of the S -S System space, it may not serve as an acceptable model in U B some cases. However, it is also possible that it Estimation of the S -S system can also be would provide for a better model than the S and U B B accomplished by maximum likelihood. Since the S in other applications. Considering its U both originate from normal random variables significant coverage of the SK space and the (N), the transformation technique (Mood, potential role of the higher-order moments and Graybill, and Boes 1974) can be applied to de- support characteristics of a distribution on rive their probability distribution functions. goodness-of-fit, the Beta is selected as the alter- According to this technique, the pdf of the native candidate model for the comparative F transformed random variable (Yt ) is given by: evaluation of the SU-SB system conducted in this study. The expanded parameterization of the Beta 1 F F @ðq ðYt ÞÞ 1 F distribution is derived in the following section. PðYt Þ 5 F Pðq ðYt ÞÞ (12) @Yt F 1 F The Expanded Beta Distribution 5 JðYt Þ Pðq ðYt ÞÞ, where q21(Y F) is the inverse of the trans- An expanded parameterization of the Beta dis- t formation of N into Y F (i.e., the function re- tribution that can accommodate any mean and t lating N to Y F, P(q21(Y F)) is the pdf of an variance in conjunction with all SK combina- t t independently and identically distributed nor- tions allowed by the original Beta is obtained by mal random variable N with mean ð gÞ and applying the same technique used for the S and d U variance d22 evaluated at q21(Y F), and J(Y F) S . In the case of the Beta distribution: t t B is the Jacobian of the transformation—that is, 21 F F E½Y 5d=ðÞd 1 l 5 FB and the derivative of q (Yt ) with respect to Yt . 1 F (9) dl Specifically, for the SU, N 5 qSUðYt Þ is found V½Y 5 5 GB ðÞd 1 l 1 1 ðÞd 1 l 2 by substituting Equation (5) into Equation (7) and solving for N: Thus, the transformation from the original Beta- distributed variable (Y) into the random variable 1 F 1 F N5qSUðYt Þ5 sinh fRSUtg where exhibiting the expanded Beta distribution (Yt )is: (13) F 1=2 ðYt XtbÞ GSU F 1=2 RSUt5 1 FSU, (10) Yt 5stðÞY FB =GB 1 Mt Zts
F The pdf for Yt is obtained through a and FSU and GSU are as defined previously. The straightforward application of the transformation Jacobian is obtained by taking the absolute technique, which leads to the following log- value of the derivative of Equation (13) with F likelihood function: respect to Yt , which yields: 308 Journal of Agricultural and Applied Economics, May 2010
1=2 and positive skewness, and both families ap- F GSU (14) JSUðY Þ5 . t 2 1=2 proach a normal distribution as u goes to zero. Zts ð1 1 R Þ SUt This also allows for testing the null hypothesis The S pdf is then obtained by substituting U of normality as Ho: u 5 m 5 0. In addition, for Equation (13) into a normal density with mean the purposes of estimation, the following pa- ð gÞ and variance d22 and premultiplying the d rameter range restrictions are recommended result by Equation (14). based on the authors’ experience: 0 < u < 1.5 Analogously, for the S , Equation (6) is B and 210 < m < 10 for the S ;0