Journal of Agricultural and Applied Economics, 42,2(May 2010):303–319 Ó 2010 Southern Agricultural Economics Association

A Flexible Parametric Family for the Modeling and Simulation of Yield Distributions

Octavio A. Ramirez, Tanya U. McDonald, and Carlos E. Carpio

The distributions currently used to model and simulate crop yields are unable to accom- modate a substantial subset of the theoretically feasible -variance-skewness- (MVSK) hyperspace. Because these first four central moments are key determinants of shape, the available distributions might not be capable of adequately modeling all yield distributions that could be encountered in practice. This study introduces a system of distributions that can span the entire MVSK space and assesses its potential to serve as a more comprehensive parametric crop yield model, improving the breadth of distributional choices available to researchers and the likelihood of formulating proper parametric models.

Key Words: risk analysis, parametric methods, yield distributions, yield modeling and simulation, yield nonnormality

JEL Classifications: C15, C16, C46, C63

Agricultural economists have long recognized Shonkwiler (1993), Ramirez, Moss, and Boggess that the choice of an appropriate probability (1994), Coble et al. (1996), and Ramirez (1997). distribution to represent crop yields is critical These authors have provided strong statistical for an accurate measurement of the risks as- evidence of nonnormality and heteroskedasticity sociated with crop production. Anderson in crop yield distributions—specifically, the ex- (1974) first emphasized the importance of ac- istence of kurtosis and negative skewness in counting for nonnormality in crop yield distri- a variety of cases. The possibility of positive butions for the purpose of economic risk anal- skewness has been documented as well (Ramı´rez, ysis. Since then, numerous authors have Misra, and Field, 2003). focused on this issue, including Gallagher The three general statistical procedures that (1987), Nelson and Preckel (1989), Moss and have been used for the modeling and simulating of crop yield distributions are the parametric, the nonparametric, and semiparametric. All have Octavio A. Ramirez is professor and head, Department of Agricultural and Applied Economics, University of distinct advantages and disadvantages. The para- Georgia, Athens, GA. Tanya U. McDonald is research metric procedures assume that the data-generat- specialist, Department of Agricultural Economics and ing process can be adequately represented by a Agricultural Business, New Mexico State University, particular parametric Las Cruces, NM. Carlos E. Carpio is assistant pro- fessor, Department of Applied Economics and Statis- function. For this reason, the main disadvantage tics, Clemson University, Clemson, SC. of this method is the potential error resulting This study was supported by the National Research from assuming a probability distribution that is Initiative of the Cooperative State Research, Education and not flexible enough to properly represent the Extension Service, USDA, Grant #2004-35400-14194, and by the Agricultural Experiment Stations of the Uni- data. The main advantage of this method is that, versity of Georgia and New Mexico State University. if the assumed distribution can adequately 304 Journal of Agricultural and Applied Economics, May 2010 represent the data-generating process, it performs Coble (2003) proposed a semiparametric model relatively well even in small sample applications. based on the Normal and Beta densities. Em- This is important because crop yield datasets do pirical comparisons of leading parametric not often span long periods. Distributions that models have been recently attempted (Norwood, have been used as a basis for parametric pro- Roberts, and Lusk, 2004). Despite such ad- cedures include the Normal, the Log-normal, the vances, the potential of different probability Logistic, the Weibull, the Beta, the Gamma, and density functions (pdfs) to serve as suitable yield the Inverse Hyperbolic Sine. and price distribution models has not been The nonparametric approaches also have assessed in the context of a rigorous theoretical strengths and weaknesses. Because these framework, and this critical research methods methods are free of a functional form assump- issue remains unsettled. tion, they are generally more flexible. However, According to basic statistical theory (Mood, they can be inefficient relative to parametric Graybill, and Boes, 1974), the first four central procedures under certain conditions. Specifi- moments of a pdf are the main descriptors of its cally, according to Ker and Coble (2003), ‘‘it is shape. Although there are other for possible, perhaps likely, for very small samples characterizing and comparing distributions, this such as those corresponding to farm-level yield suggests that the flexibility of a pdf to accom- data, that an incorrect parametric form—say modate a variety of empirical shapes (i.e., data- Normal—is more efficient than the standard generating processes) should be closely related to nonparametric kernel estimator.’’ Other authors the Mean-Variance-Skewness-Kurtosis (MVSK) cite theoretical complexity and intensive com- combinations that are allowed by it. putational requirements as another disadvantage Unfortunately, in this regard, all of the pdfs of nonparametric procedures (Yatchew, 1998). that have been used as a basis for parametric Semiparametric methods show significant po- methods suffer from two significant range re- tential because they encapsulate the advantages strictions: (1) they can only accommodate lim- of the parametric and nonparametric approaches ited subsets of all of the theoretically feasible while mitigating their disadvantages (Ker and Skewness-Kurtosis (SK) combinations and, Coble, 2003; Norwood, Roberts, and Lusk, therefore, might only be capable of adequately 2004). Because the semiparametric approach is modeling underlying data distributions where based on nonparametrically ‘‘correcting’’ a par- third and fourth central moments are within that ticular parametric estimate, the availability of subset; and (2) their variance, skewness, and more flexible distributions such as the ones ad- kurtosis are controlled by only two vanced in this study should improve the potential and, therefore, are arbitrarily interrelated (i.e., efficiency of semiparametric procedures as well. only two of these three key moments are free to Extensive efforts have been devoted to the vary independently). Although the expanded issue of the most appropriate probability dis- form of the SU advanced by Ramı´rez, Misra, and tribution to be used as a basis for parametric or Field (2003) allows for any mean and variance semiparametric methods. Gallagher (1987) used to be freely associated with the SK values per- the well-known Gamma density as a parametric mitted by the original parameterization of this model for soybean yields. Nelson and Preckel family, it is far from encompassing all theoret- (1989) proposed a conditional Beta distribution ically feasible SK combinations. to model corn yields. Taylor (1990) estimated This research contributes to the yield and multivariate nonnormal densities through a con- price distribution modeling literature through ditional distribution approach based on the the following measures:

Hyperbolic Tangent transformation. Ramirez 1. Introduce a family of distributions (SB) that (1997) introduced a modified Inverse Hyper- perfectly complements the SU in its coverage bolic Sine transformation (also known in the of the SK space, and that spans significant statistics literature as the SU Distribution) as regions of the space not covered by any other a possible multivariate nonnormal and hetero- distribution that has been used for the mod- skedastic crop yield distribution model. Ker and eling and simulation of crop yields. Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 305

2. Derive expanded forms of this SB family and outline a reparameterization technique that of the Beta distribution, which are analogous expands any probability distribution by two to Ramı´rez, Misra, and Field’s (2003) repar- parameters that specifically and uniquely con-

ameterization of the SU, so that they, too, can trol the mean and variance without affecting the allow for any mean and variance to be freely range of skewness and kurtosis values that can associated with their possible SK values. be accommodated. The expanded distribution 3. Assess the importance of MVSK coverage in obtained through this reparameterization can determining a model’s flexibility to adequately therefore model any conceivable mean and represent a wide range of distributional shapes. variance in conjunction with the set of SK combinations allowed by the original distribu- The SU-SB System tion. For the purposes of this study, the tech-

nique is first applied to the SU and SB families. Johnson (1949) introduced the SU and the SB This yields a system that can model all theo- families of distributions and showed that, to- retically feasible MVSK combinations. The gether, they span the entire SK space. Figure 1 reparameterization begins with the original is constructed on the basis of the formulas for two- families (Johnson, 1949): the skewness and kurtosis of the SU and SB 1 distributions, which were also first derived by (1) Z 5 g 1 d sinh Y for the SU distribution Johnson (1949). Specifically, SK pairs are (2) Z 5 g 1 d ln½Y=ð1 YÞ for the S computed and plotted for very fine grids of the B distribution parameter spaces corresponding to these two distributions. The same procedure is followed where Y is a nonnormally distributed random for the Beta, Gamma, and Log-normal (SL). It variable based on a standard normal variable is observed that the lower bound of the SB is at (Z). In other words, the original SU and SB the boundary for the theoretically feasible SK distributions are derived from transformations space (K 5 S2 22). This plotting illustrates of a standard normal density. Their pdfs, which Johnson’s claim of a comprehensive coverage are also provided by Johnson (1949), are of the theoretically feasible SK space by the obtained by substituting Z in Equation (1) (for

SU-SB families. the SU) or Equation (2) (for the SB) into However, in the parameterizations proposed a standard normal density and multiplying the by Johnson, each of those SK combinations is resulting equation by the derivative of Equation arbitrarily associated with a fixed set of mean- (1) (for the SU) or Equation (2) (for the SB) with variance values. Ramirez and McDonald (2006) respect to Y. In the mathematical statistics

Figure 1. SU,SL,SB, Beta, and Gamma Distributions in the SK Plane; the SB Distribution Allows all SK Combinations in the Beta and Gamma Areas as Well 306 Journal of Agricultural and Applied Economics, May 2010

literature, this is commonly known as the allowed by the original SU and SB families. The transformation technique for deriving pdfs. final step in the reparameterization process is to Note that from Equations (1) and (2) it fol- expand the Y S distributions so that, instead of lows that: being zero and one, their means and variances Z g can be controlled by parameters or by para- Y 5 sinh 5 sinhðNÞ for the metric functions of explanatory variables. This (3) d is accomplished by multiplying Y S times the SU distribution parameter or parametric function representing Z g exp the variance and then subtracting the parametric d eN Y5 5 for the function representing the mean: (4) Z g ð1 1 eN Þ 1 1 exp d F S S (7) Yt 5stY Mt5ðZtsÞY ðXtbÞ SB distribution where N is a normal random variable with where Y S is as defined in Equations (5) and (6) g 1 mean and variance . The above equations for the SU and SB distributions, t 5 1, ...,T d d2 express the SU and SB random variables (Y) as denotes the observations on the explanatory F a function of a normal, and can be used for variables, and Yt represents the final random simulating draws from their probability distri- variables of interest. From Equation (7), note butions. Johnson (1949) also provides the for- that for both reparameterized variables: mulas for computing their means and variances, F F 2 2 which will be denoted by FSU and FSB (for the (8) E½Yt 5Mt5Xtb and V½Yt 5 st 5 ðZtsÞ means) and GSU and GSB (for the variances). These formulas are simple but lengthy trigono- where Xt and Zt represent vectors of explanatory metric functions of g and d. The skewness and variables believed to affect the means and vari- kurtosis of the SU and SB distributions are ances of the distributions, and b and s are con- functions of g and d as well. All formulas and formable parameter vectors. That is, the mean a Gauss program to compute the first four cen- and variance of the reparameterized SU and SB F tral moments of both distributions for given random variables (Yt ) are uniquely controlled 2 values for g and d are available from the authors. by Mt and st ,whileg and d determine their The random variables (Y) corresponding to skewness and kurtosis according to the formulas each of the two distributions given in Equations (3) provided by Johnson (1949) for the original SU and (4) are then standardized by subtracting their and SB distributions. Therefore, the reparame- means and dividing by their standard deviations: terized SU-SB system can accommodate any theoretically possible MVSK combination. sinhðNÞF (5) YS5 SU for the S distribution As noted previously, Figure 1 illustrates the 1=2 U GSU SK regions covered by the SU and SB, as well eN as three other commonly used distributions. FSB 1 1 eN The SL or Log-, which is S5 (6) Y 1=2 for the SB distribution also a part of the original Johnson system, only GSB spans the curvilinear boundary between the SU Note that the standardized SU and SB variables and SB. The Gamma distribution only spans (YS) will always have a mean of zero and a curvilinear segment on the upper right a variance of one, i.e., the parameters g and d no quadrant of the SK plane as well. Like the SL, longer affect the mean or the variance of their the Gamma can be adapted to cover the mirror distributions. However, because standardization image of this segment on the upper left quad- only involves subtracting from and dividing the rant. However, the combinations of SK values original random variables (Y) by constants, the allowed by it are still extremely limited. distributions corresponding to these standard- Although the Beta covers a nonnegligible S ized variables (Y ) can still accommodate the area of the SK plane, note that the SB can ac- same sets of skewness-kurtosis combinations commodate all SK combinations allowed by it. Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 307

ffiffiffiffiffiffi In fact, the Beta region is quite narrower than the XT p GB LLB5 ln 1 n ln GðÞd 1 l SB’s (i.e., the Beta only covers a subset of the SK 2 t51 st area spanned by the S ). Thus, in general, one B n ln GðÞd n ln GðÞl might expect the SB to be a more applicable ! XT model than the Beta. However, because higher- (11) 1 ðd 1Þ ln Pt order moments also affect distributional shape, t51 ! it is possible that, in some applications, a simi- XT larly parameterized Beta would provide for 1 ðÞl 1 lnðÞ 1 Pt , t51 a better model than the S . Another difference ffiffiffiffiffi B F p d ðYt MtÞ GB that could affect their relative performance in where Pt 5 1 2 and G repre- d 1 l st a particular case is the fact that the Beta is sent the Gamma function. As in the case of the F F a bounded distribution, whereas the SB is not. expanded SU and SB, E[Yt ] 5 Mt and V[Yt ] 5 2 2 In short, even if the Beta distribution is rep- st and Mt and st can be specified as linear arameterized using the previously discussed functions of relevant explanatory variables. technique, because it only spans a relatively small subset of the empirically possible SK Estimation of the S -S System space, it may not serve as an acceptable model in U B some cases. However, it is also possible that it Estimation of the S -S system can also be would provide for a better model than the S and U B B accomplished by maximum likelihood. Since the S in other applications. Considering its U both originate from normal random variables significant coverage of the SK space and the (N), the transformation technique (Mood, potential role of the higher-order moments and Graybill, and Boes 1974) can be applied to de- support characteristics of a distribution on rive their probability distribution functions. goodness-of-fit, the Beta is selected as the alter- According to this technique, the pdf of the native candidate model for the comparative F transformed random variable (Yt ) is given by: evaluation of the SU-SB system conducted in this study. The expanded parameterization of the Beta 1 F F @ðq ðYt ÞÞ 1 F distribution is derived in the following section. PðYt Þ 5 F Pðq ðYt ÞÞ (12) @Yt F 1 F The Expanded Beta Distribution 5 JðYt Þ Pðq ðYt ÞÞ, where q21(Y F) is the inverse of the trans- An expanded parameterization of the Beta dis- t formation of N into Y F (i.e., the function re- tribution that can accommodate any mean and t lating N to Y F, P(q21(Y F)) is the pdf of an variance in conjunction with all SK combina- t t independently and identically distributed nor- tions allowed by the original Beta is obtained by mal random variable N with mean ð gÞ and applying the same technique used for the S and d U variance d22 evaluated at q21(Y F), and J(Y F) S . In the case of the Beta distribution: t t B is the Jacobian of the transformation—that is, 21 F F E½Y5d=ðÞd 1 l 5 FB and the derivative of q (Yt ) with respect to Yt . 1 F (9) dl Specifically, for the SU, N 5 qSUðYt Þ is found V½Y5 5 GB ðÞd 1 l 1 1 ðÞd 1 l 2 by substituting Equation (5) into Equation (7) and solving for N: Thus, the transformation from the original Beta- distributed variable (Y) into the random variable 1 F 1 F N5qSUðYt Þ5 sinh fRSUtg where exhibiting the expanded Beta distribution (Yt )is: (13) F 1=2 ðYt XtbÞGSU F 1=2 RSUt5 1 FSU, (10) Yt 5stðÞY FB =GB 1 Mt Zts

F The pdf for Yt is obtained through a and FSU and GSU are as defined previously. The straightforward application of the transformation Jacobian is obtained by taking the absolute technique, which leads to the following log- value of the derivative of Equation (13) with F : respect to Yt , which yields: 308 Journal of Agricultural and Applied Economics, May 2010

1=2 and positive skewness, and both families ap- F GSU (14) JSUðY Þ5 . t 2 1=2 proach a normal distribution as u goes to zero. Zts ð1 1 R Þ SUt This also allows for testing the null hypothesis The S pdf is then obtained by substituting U of normality as Ho: u 5 m 5 0. In addition, for Equation (13) into a normal density with mean the purposes of estimation, the following pa- ð gÞ and variance d22 and premultiplying the d rameter range restrictions are recommended result by Equation (14). based on the authors’ experience: 0 < u < 1.5 Analogously, for the S , Equation (6) is B and 210 < m < 10 for the S ;0 0; and GSU, RSUt, GSB, and are specified as second- and first-degree poly- RSBt are as defined previously. nomial functions of time:

An adjustment that facilitates estimation 2 Mt5Xtb5b 1 b t 1 b t and st5ðZtsÞ and interpretation is redefining the distribu- (20) 0 1 2 5s 1 s t;t51, ...,T tional shape parameters as follows: for the 0 1

SU g 52m, for the SB g 5 m, and for both All nonnormal models initially include seven families d 5 1/u. Then, in all cases m <0,m 5 parameters (b0, b1, b2, s0, s1, u, and m in the 0, and m > 0 are associated with negative, zero, case of the SU-SB system and b0, b1, b2, s0, s1, Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 309 d, and l in the case of the Beta). Normal Although this is not a formal statistical test for models with the same mean and standard de- the superiority of a nonnormal model over an- viation specifications are also estimated and other, it has been routinely used to rank fit included in the comparison. Select statistics (Norwood, Roberts, and Lusk, 2004). Following about those models are presented in Table 1. this criterion, five of the 26 underlying distri-

The null hypothesis of yield normality is butions are categorized as Normal, six as SU, tested through likelihood ratio tests of the nor- seven as SB, and eight as Beta (Table 1). mal model versus the nonnormal model with the For the purpose of the analysis, the corre- highest maximum log-likelihood function value sponding SU,SB, and Beta models are assumed (MLLFV). This is possible because the Normal to be the true distributions and used to generate model is nested to all three nonnormal models. If the data for the next phase. Specifically, 21 this hypothesis cannot be rejected (a 5 0.10), it datasets of 100,000 observations each are sim- is assumed that the underlying pdf is Normal. ulated on the basis of the six SU, seven SB, and Otherwise, it is assumed that the true distribu- eight estimated Beta models. The fact that the tion is the nonnormal with the highest MLLFV. distributional shapes used in this evaluation are

Table 1. Select Statistics for Illinois Farm-Level Corn Yield Models Based on SU,SB, Beta, and Normal Distributions Likelihood

Farm Sample SU SB Beta Normal Ratio Test Final Label Size MLLFV MLLFV MLLFV MLLFV Statistica Model

3 A442183.62 2186.67 2187.24 2191.64 16.03 SU 3 B322123.81 2123.81 2126.39 2134.94 22.27 SB 3 C442186.38 2182.15 2185.00 2187.61 10.91 SB 2 D432189.23 2189.39 2189.54 2192.55 6.63 SU E252108.09 2108.00 2107.72 2112.23 9.012 Beta F272128.31 2127.08 2127.55 2128.98 3.810 N G312133.58 2133.57 2133.26 2140.68 14.833 Beta H342161.15 2160.20 2160.93 2161.80 3.200 N 2 I432181.27 2184.84 2184.94 2185.62 8.71 SU 2 J322145.96 2145.94 2146.56 2149.20 6.53 SB 3 K272120.75 2118.66 2118.98 2126.11 14.90 SB L292132.56 2132.49 2132.55 2132.56 0.130 N M372169.08 2169.00 2168.95 2171.97 6.022 Beta 1 N452197.46 2195.15 2196.37 2197.47 4.64 SB 3 O422189.54 2188.40 2188.55 2194.36 11.92 SB 1 P422195.34 2195.28 2195.31 2197.77 4.97 SB Q402174.07 2173.55 2172.74 2178.18 10.883 Beta 3 R332145.36 2145.47 2145.67 2150.09 9.46 SU 1 S402181.77 2182.35 2182.50 2184.12 4.70 SU T292131.07 2131.05 2129.44 2133.79 8.692 Beta U442201.83 2201.21 2200.55 2204.01 6.912 Beta V292127.78 2126.34 2125.69 2131.64 11.913 Beta W292131.22 2131.24 2131.20 2132.56 2.710 N 3 X20293.45 293.96 294.00 298.42 9.94 SU Y292135.14 2135.00 2134.35 2136.90 5.083 Beta Z302143.92 2143.26 2143.37 2144.92 3.320 N

Note: MLLFV refers to the maximum log-likelihood function value. a The likelihood ratio test statistic compares the nonnormal model with the highest MLLFV with the normal model. The superscripts 1, 2, and 3 denote rejection of the null hypothesis of normality at the 10%, 5%, and 1% levels, respectively, according to the likelihood ratio test, while the superscript 0 indicates nonrejection at the 10% level. If the null hypothesis of normality is rejected at the 10% level, the final model is the one with the highest MLLFV, otherwise the final model is the normal. 310 Journal of Agricultural and Applied Economics, May 2010 empirically motivated (i.e., derived from para- In the case of the six sets of models corre- metric models that have been estimated on the sponding to the SU-generated datasets (Table basis of actual yield data) enhances the credi- 4), both the SB and the Beta models yield bility of the analysis. MLLFVs that are substantially lower than those

The following simulation formulas are of the SU models. On average, the MLLFVs are based on Equations (5), (6), and (7) for the 11.71 units lower in the SB, 12.60 units lower in proposed system and Equation (10) for the Beta the Beta, and 13.94 units lower in the Normal distribution: models. n In short, the MLLFV comparisons suggest SSU5Mt 1 st½sinhðufZ 1 mgÞFSU o that the SU model is not a close substitute for (21) pffiffiffiffiffiffiffiffi either the SB or the Beta, and that the SB and the 4u GSU n Beta models are poor surrogates for the SU.In

SSSB5Mt 1 st expðu½Z mÞ contrast, it appears that the SB and the Beta (22) pffiffiffiffiffiffiffiffi o models could be acceptable substitutes for each 4 GSB½1 1 expðu½Z mÞ FSB other, with the S being a better surrogate for nopffiffiffiffiffiffi B (23) SB5Mt 1 st ðB FBÞ4 GB the Beta than the Beta is for the SB. These findings support the hypothesis that distribu- where Z is a draw from a standard Normal and tions with similar MVSK coverage are better B is a draw from a Beta distribution with pa- able to ‘‘substitute’’ for each other. However, rameters d and l. the question remains as to how well these

Next, a second round of SU,SB, Beta, and nonnormal models can substitute for each Normal models are estimated on the basis of other. each of the 21 simulated datasets. Key statistics To answer this question, the cumulative about those models are presented in Table 2 distribution functions (cdfs) implied by the

(data-generating process 5 SB), Table 3 (data- second-round SU,SB, Beta, and Normal models generating process 5 Beta), and Table 4 (data- are obtained for each of the 21 cases, also generating process 5 SU). As expected, in each through simulation. The ‘‘true’’ cdfs are also of the 21 cases, the model with the highest plotted using the correct distribution and the MLLFV is the one based on the probability exact parameters underlying each of the 21 distribution used to simulate the data. data-generating processes. Two main statistics In the case of the seven sets of models related to those cdfs are also presented in Ta- corresponding to the SB-generated datasets bles 2, 3, and 4. AD is the average of 125 (Table 2), the MLLFVs of the Beta models are vertical percentage distances between the true relatively close to those of the SB models, with and the estimated cdf. Distances are computed the differences averaging 1.02 units. At 2.53 for yield values ranging from 25% to 150% of units, the average MLLFV difference between the average yields at equal 1% intervals (cdf the SB and SU models is considerably larger. values outside of that range are negligible in all The Normal models show substantially lower cases). MD represents the maximum of those MLLFVs than any of the three nonnormal 125 vertical distances. Cdf values are expressed models in all cases. to range from zero to 100%, instead of zero to In the case of the eight sets of models cor- one. responding to the Beta-generated datasets (Ta- Table 2 contains the statistics for the seven ble 3), with differences averaging 0.39 units, cases when the underlying data-generating the MLLFVs of the SB models are very close to process is SB. As expected, the estimated SB those of the Beta models. At 1.26 units, the cdfs are very close to the cdfs obtained on the average MLLFV difference between the Beta basis of the true models (AD of 0.03% and MD and the SU models is over four times larger. As of 0.17%, on average). When SU models are before, the Normal models show much lower used to approximate the SB, across the seven MLLFVs than any of the three nonnormal cases the AD averages 1.62% and the MD av- models. erages 4.50%. As anticipated, because the SK aie,MDnl,adCri:Feil il itiuinModels Distribution Yield Flexible Carpio: and McDonald, Ramirez,

Table 2. Key Statistics about the S U ,SB , Beta, and Normal Models Estimated on the Basis of the Seven S B -Simulated Datasets

FARM B (DGP 5 S B ) FARM C (DGP 5 S B ) FARM J (DGP 5 S B )

ModelU S B Beta Normal S U SB Beta Normal S U SB Beta Normal MLLFV 2156.70 2156.69 2158.02 2172.58 2168.81 2163.41 2164.79 2173.64 2181.44 2181.44 2181.97 2190.54 Skewness 23.25 22.77 21.46 0 23.24 20.63 20.71 0 22.01 21.93 21.19 0 Kurtosis 23.27 14.25 3.04 0 23.15 20.74 20.30 0 7.92 7.04 2.08 0 AD 0.07% 0.02% 0.95% 3.54% 2.65% 0.02% 1.13% 2.67% 0.04% 0.01% 0.78% 3.44% MD 0.31% 0.22% 4.58% 14.28% 7.67% 0.25% 4.83% 10.21% 0.14% 0.07% 2.77% 10.78% B ) FARM N (DGP 5 S B ) FARM O (DGP 5 S B ) FARM K (DGP 5 S ModelU S B Beta Normal S U SB Beta Normal S U SB Beta Normal MLLFV 2180.46 2176.62 2178.09 2192.06 2177.98 2171.56 2173.61 2178.09 2180.70 2178.92 2179.29 2185.33 Skewness 27.42 21.05 21.50 0 0.36 20.09 20.16 0 22.13 20.81 20.75 0 Kurtosis 176.36 0.17 2.62 0 0.23 21.21 21.09 0 9.02 20.06 0.01 0 AD 4.20% 0.04% 1.36% 5.04% 2.19% 0.04% 0.74% 2.18% 1.60% 0.02% 0.53% 2.94% MD 11.26% 0.29% 3.80% 14.06% 6.41% 0.14% 2.51% 6.70% 4.23% 0.13% 1.82% 8.92% B ) AVERAGES (DGP 5 S B ) FARM P (DGP 5 S ModelU S SB Beta Normal S U SB Beta Normal MLLFV 2185.85 2185.56 2185.59 2187.83 2175.99 2173.46 2174.48 2182.87 Skewness 20.96 20.66 20.67 0 22.66 21.13 20.92 0.00 Kurtosis 1.70 0.19 0.32 0 34.52 2.81 0.95 0.00 AD 0.62% 0.04% 0.21% 2.13% 1.62% 0.03% 0.81% 3.13% MD 1.49% 0.07% 0.57% 5.56% 4.50% 0.17% 2.98% 10.07% Notes: DGP is the data-generating process; MLLFV refers to the maximum log-likelihood function value; AD is the average of 125 vertical percentage distances between the true and the estimated cdfs; and MD represents the maximum of those 125 vertical distances. Distances are computed for yield values ranging from 25% to 150% of the mean yields at equal 1% intervals. Cumulative yield probability values outside of that range are negligible in all cases. 311 312

Table 3. Key Statistics about the S U ,SB , Beta, and Normal Models Estimated on the Basis of the Eight Beta-Simulated Datasets FARM E (DGP 5 Beta) FARM G (DGP 5 Beta) FARM M (DGP 5 Beta)

Model SU SB Beta Normal S U SB Beta Normal S U SB Beta Normal MLLFV 2181.38 2181.00 2180.71 2190.63 2172.60 2172.16 2171.79 2182.69 2183.60 2183.20 2183.14 2186.53 Skewness 22.78 21.71 21.56 0 23.16 21.83 21.59 0 21.24 20.84 20.81 0 Kurtosis 16.32 3.94 3.56 0 21.83 4.55 3.62 0 2.86 0.51 0.54 0 AD 0.83% 0.49% 0.05% 3.44% 0.86% 0.49% 0.00% 2.91% 0.58% 0.16% 0.02% 1.93% MD 2.88% 1.79% 0.24% 10.73% 3.46% 2.11% 0.04% 11.22% 1.75% 0.62% 0.08% 6.36% FARM Q (DGP 5 Beta) FARM T (DGP 5 Beta) FARM U (DGP 5 Beta) U SB Beta Normal S U SB Beta Normal S U SB Beta Normal

Model S 2010 May Economics, Applied and Agricultural of Journal MLLFV 2173.72 2173.09 2172.41 2186.76 2178.86 2178.21 2177.71 2191.80 2183.79 2181.38 2181.12 2186.05 Skewness 23.98 22.09 21.80 0 4.45 22.00 21.79 0 21.41 20.61 20.55 0 Kurtosis 37.64 5.75 4.64 0 49.11 5.09 4.66 0 3.72 20.50 20.53 0 AD 0.98% 0.71% 0.01% 3.46% 1.52% 0.97% 0.13% 4.56% 2.02% 0.51% 0.01% 2.83% MD 3.85% 2.97% 0.04% 14.16% 4.99% 3.15% 0.55% 13.74% 4.34% 1.23% 0.05% 6.89% FARM V (DGP 5 Beta) FARM Y (DGP 5 Beta) AVERAGES (DGP 5 Beta) U SB Beta Normal S U SB Beta Normal S U SB Beta Normal Model S MLLFV 2173.22 2172.52 2172.02 2186.98 2186.69 2184.53 2184.06 2191.69 2179.23 2178.26 2177.87 2187.89 Skewness 24.74 22.05 21.85 0 22.41 20.95 20.84 0 21.91 21.51 21.35 0.00 Kurtosis 57.24 5.30 5.00 0 11.79 0.22 0.07 0 25.06 3.11 2.70 0.00 AD 1.11% 0.66% 0.03% 3.09% 1.76% 0.58% 0.05% 2.80% 1.21% 0.57% 0.04% 3.13% MD 5.01% 3.23% 0.18% 14.11% 5.00% 1.95% 0.30% 9.14% 3.91% 2.13% 0.19% 10.79% Notes: DGP is the data-generating process; MLLFV refers to the maximum log-likelihood function value; AD is the average of 125 vertical percentage distances between the true and the estimated cdfs; and MD represents the maximum of those 125 vertical distances. Distances are computed for yield values ranging from 25% to 150% of the mean yields at equal 1% intervals. Cumulative yield probability values outside of that range are negligible in all cases. aie,MDnl,adCri:Feil il itiuinModels Distribution Yield Flexible Carpio: and McDonald, Ramirez,

Table 4. Key Statistics about the S U ,SB , Beta, and Normal Models Estimated on the Basis of the Six S U -Simulated Datasets

FARM A (DGP 5 S U ) FARM D (DGP 5 S U ) FARM I (DGP 5 S U )

ModelU S B Beta Normal S U SB Beta Normal S U SB Beta Normal MLLFV 2165.79 2174.33 2176.38 2177.93 2175.40 2176.05 2176.76 2178.60 2167.28 2185.23 2185.23 2185.23 Skewness 23.94 20.31 20.21 0 21.18 20.58 20.35 0 20.83 0 0 0 Kurtosis 58.33 0.14 0.03 0 3.44 0.60 0.15 0 369.63 0 0 0 AD 0.06% 2.96% 3.74% 3.91% 0.02% 0.95% 1.42% 2.03% 0.08% 5.04% 5.04% 5.04% MD 0.25% 11.34% 12.92% 14.02% 0.09% 3.11% 4.61% 6.73% 0.18% 15.60% 15.60% 15.60% U ) FARM S (DGP 5 S U ) FARM X (DGP 5 S U ) FARM R (DGP 5 S ModelU S B Beta Normal S U SB Beta Normal S U SB Beta Normal MLLFV 2176.92 2178.85 2180.59 2184.67 2181.43 2183.89 2184.68 2185.28 2186.37 2225.13 2225.13 2225.13 Skewness 22.11 20.73 20.45 0 21.39 20.25 20.13 0 229.46 0 0 0 Kurtosis 10.14 0.95 0.27 0 5.71 0.09 20.01 0 10369.3 0 0 0 AD 0.10% 1.86% 2.57% 3.80% 0.06% 1.56% 1.93% 2.45% 0.04% 9.25% 9.25% 9.25% MD 0.35% 5.75% 7.56% 10.69% 0.17% 5.42% 6.47% 7.68% 0.18% 24.85% 24.85% 24.85% U ) AVERAGES (DGP 5 S ModelU S SB Beta Normal MLLFV 2175.53 2187.24 2188.13 2189.47 Skewness 26.49 20.31 20.19 0.00 Kurtosis 1802.76 0.30 0.07 0.00 AD 0.06% 3.60% 3.99% 4.41% MD 0.20% 11.01% 12.00% 13.26% Notes: DGP is the data-generating process; MLLFV refers to the maximum log-likelihood function value; AD is the average of 125 vertical percentage distances between the true and the estimated cdfs; and MD represents the maximum of those 125 vertical distances. Distances are computed for yield values ranging from 25% to 150% of the mean yields at equal 1% intervals. Cumulative yield probability values outside of that range are negligible in all cases. 313 314 Journal of Agricultural and Applied Economics, May 2010

regions spanned by the SU and the SB do not In contrast, when SB models are used to overlap (Figure 1), in general, the SU is a poor approximate the Beta, AD averages 0.57% and surrogate for the SB. However, the SU approx- MD averages 2.13%. As anticipated, because imation of the SB is much better in two of the the SB spans all of the SK area covered by the cases when the SK combination is near the SU- Beta (Figure 1), it provides for a relatively SB boundary (Figure 2). satisfactory approximation of that distribution. Alternatively, when Beta models are used to Figure 3 provides a visual cue of the closeness approximate the SB, AD averages 0.81% and with which the typical SB model can replicate MD averages 2.98%. As anticipated, because a true Beta cdf. All vertical differences in the the Beta spans only part of the SK region lower one-third of the cdf are in fact less than covered by the SB (Figure 1) and two of the SB 1.1%. That is, the SB model can predict cu- data-generating processes exhibit SK values mulative probability at any point within the that cannot be accommodated by the Beta lower third of the true cdf with a margin of (Figure 2), this distribution does not generally error of 1.1% or less. This is particularly provide a good approximation of the SB.On noteworthy because in most cases the lower average, however, the Beta is a substantially (left) tail is the relevant segment of the cdf for better surrogate for the SB than the SU. With the purposes of risk analysis. With average ADs average ADs of 3.13% and MDs of 10.07%, the of 3.13% and MDs of 10.79%, the normal cdfs normal cdfs are by far the worst models for an are by far the worst models for an underlying underlying SB process. Beta process as well. Table 3 presents statistics for the eight in- The statistics for the six cases when the stances when the data-generating process is data-generating process is SU are shown in Beta. The estimated Beta cdfs are again very Table 4. The estimated SU cdfs are again very close to the cdfs obtained on the basis of the close to the cdfs obtained on the basis of the true models (AD of 0.04% and MD of 0.19%, true models (AD of 0.06% and MD of 0.20%, on average). When SU models are used to ap- on average). In addition, as expected from the proximate the Beta, the AD averages 1.21% fact that the SB and the Beta do not span any of and the MD averages 3.91%. Because the SK the SK space covered by the SU, with ADs regions spanned by the Beta and the SU do not averaging 3.60% and 3.99%, respectively, and overlap and are in fact separated by a widening MDs averaging 11.01% and 12.00%, re- segment of the SK space (Figure 1), the SU is spectively, they are not good surrogates for the consistently a poor surrogate for the Beta. SU. Particularly large cdf approximation errors

Figure 2. Skewness-Kurtosis Combinations of Estimated Nonnormal Models Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 315

Figure 3. Estimated SU,SB, and Normal versus True (Beta) cdf

are observed in the two cases when the SU underlying distribution are more consistent with process is characterized by a very high kurto- the Beta’s, a model based on this density would sis-to-skewness ratio, which are not shown in be expected to provide for a somewhat better fit

Figure 2 due to scale limitations. The normal than the SB. Likewise, under analogous condi- cdfs are again the worst models. tions, the Gamma, Log-normal, and other dis-

In short, the results presented in Tables 2, 3, tributions could improve fit in relation to the SB. and 4 support the working hypothesis that a However, these expected gains are only assured distribution’s capacity to provide for an adequate to materialize under large sample conditions. approximation of another is closely related to Therefore, when working with small samples, it its ability to accommodate the underlying might be best to simply consider the two most

MVSK value. These results also suggest that general alternatives (i.e., the SU and the SB)as researchers can achieve a relatively small candidate models. specification error by using parametric models based on distributions which, as a whole, span Economic Relevance the entire MVSK space. Specifically, given that there are no other distributions spanning the A final issue of interest is the economic rele- middle-upper regions of this space (the green vance of using a more suitable probability dis- area in Figure 1) and that several empirically tribution model for risk management decisions. occurring yield distributions seem to exhibit This issue is explored on the basis of the results

SK values well into this area, the SU should from the previous section. Specifically, the cdfs always be considered as a candidate model. In implied by the second-round SU,SB,Beta,and addition, the results indicate that the SB is Normal models are used to compute the values, a generally better alternative than the Beta for expressed as percentages of the distributions’ underlying distributions with SK values on the means, that correspond to the 5, 10, and 20 cdf surrounding regions of the SK space (the blue, percentiles (Table 5). In the case of farm E, for yellow, pink, and red areas in Figure 1) because example, the Normal, SU,SB,andBetacdfs the Beta can only partially cover these reach their 5th percentile at 72%, 63%, 64%, and remaining regions. 66% of their respective means. However, if those SK values are in the Beta The economic relevance of this information (i.e., yellow) area and the higher-order moments can be related to crop insurance. Assume that and support characteristics of the true a farm manager wants to know the coverage Table 5. Coverage Levels Implied by True Distribution versus Three Alternatives for Three Different Risk Tolerance Levels 316 Risk % Coverage % Coverage % Coverage % Coverage Risk % Coverage % Coverage % Coverage % Coverage TD/Farm Tolerance Normal S U S B Beta TD/Farm Tolerance Normal S U S B Beta Beta/E 5% 0.72 0.63 0.64 0.66 Sb/J 5% 0.72 0.67 0.67 0.70 10% 0.78 0.77 0.76 0.77 10% 0.78 0.79 0.79 0.79 20% 0.85 0.90 0.88 0.88 20% 0.86 0.89 0.89 0.88 Beta/G 5% 0.77 0.70 0.70 0.72 Sb/K 5% 0.63 0.35 0.52 0.55 10% 0.82 0.82 0.81 0.81 10% 0.71 0.65 0.64 0.69 20% 0.88 0.92 0.91 0.90 20% 0.81 0.89 0.81 0.84 Beta/M 5% 0.73 0.68 0.69 0.69 Sb/N 5% 0.72 0.71 0.73 0.73 10% 0.79 0.78 0.77 0.77 10% 0.79 0.78 0.76 0.77 20% 0.87 0.88 0.87 0.87 20% 0.86 0.86 0.82 0.84 Beta/Q 5% 0.76 0.69 0.67 0.70 Sb/O 5% 0.69 0.59 0.62 0.64

10% 0.81 0.81 0.79 0.80 10% 0.76 0.73 0.71 0.73 2010 May Economics, Applied and Agricultural of Journal 20% 0.87 0.93 0.91 0.90 20% 0.84 0.87 0.83 0.84 Beta/T 5% 0.67 0.54 0.54 0.60 Sb/P 5% 0.65 0.60 0.60 0.61 10% 0.74 0.73 0.71 0.73 10% 0.73 0.72 0.70 0.71 20% 0.83 0.89 0.87 0.87 20% 0.82 0.84 0.82 0.83 Beta/U 5% 0.59 0.49 0.52 0.53 Su/A 5% 0.73 0.73 0.74 0.72 10% 0.68 0.65 0.63 0.64 10% 0.79 0.84 0.80 0.78 20% 0.79 0.82 0.77 0.77 20% 0.86 0.93 0.88 0.86 Beta/V 5% 0.79 0.71 0.71 0.74 Su/D 5% 0.74 0.71 0.71 0.72 10% 0.83 0.83 0.82 0.83 10% 0.79 0.80 0.79 0.79 20% 0.89 0.94 0.92 0.92 20% 0.86 0.89 0.87 0.87 Beta/Y 5% 0.70 0.60 0.62 0.65 Su/I 5% 0.68 0.77 0.68 0.68 10% 0.77 0.74 0.72 0.73 10% 0.75 0.86 0.75 0.75 20% 0.85 0.88 0.85 0.85 20% 0.84 0.93 0.84 0.84 Sb/B 5% 0.78 0.75 0.75 0.78 Su/R 5% 0.67 0.64 0.67 0.67 10% 0.83 0.84 0.84 0.85 10% 0.75 0.76 0.76 0.75 20% 0.89 0.93 0.93 0.92 20% 0.83 0.88 0.86 0.85 Sb/C 5% 0.74 0.61 0.70 0.72 Su/S 5% 0.71 0.68 0.72 0.72 10% 0.80 0.76 0.75 0.79 10% 0.77 0.78 0.79 0.78 20% 0.87 0.90 0.85 0.87 20% 0.85 0.88 0.87 0.86 Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 317

level that he/she would need to purchase in order to be protected from yield losses that are likely to occur every 20, 10, and 5 years (i.e., for three different levels of risk tolerance: 5%, 10%, and 20%). Although actual coverage levels are limited to 65%, 70%, 75%, 80%, and 85% of the ‘‘average proven yield’’ (i.e., the estimated mean of the yield distribution), exact levels are computed and used for the purposes the four columns). of this evaluation. ed to achieve a particular level of risk In the case of farm E, for example, the true underlying distribution is Beta and, therefore,

Normal Su Sb Beta the farm manager should choose 66%, 77%, or 88% coverage levels depending on his/her risk

a tolerance. If this decision was being made on

the basis of an estimated SB distribution, the selected levels would be 64%, 76%, and 88%

Average (Table 5). In general, in the eight instances Differences when the true underlying distribution is Beta,

the coverage levels suggested by the SB model are fairly close to the correct ones. The average of the absolute differences between the correct

Beta levels and those implied by the SB distribution

% Coverage across the 24 cases (eight farms and three risk- tolerance levels) is only 1.2% (Table 5). Given the actual coverage level choices (0.65%, 0.70%, 0.75%, 0.80%, and 0.85%), differences

Sb of this magnitude are unlikely to cause an in- correct selection in most cases and, therefore, % Coverage could be considered relatively unimportant from an economic standpoint. In the case of farm C, the true underlying

distribution is SB and, therefore, the farm Su manager should choose the 70%, 75%, or 85% coverage level depending on his/her risk tol- % Coverage erance. If this decision was made on the basis of an estimated Beta model, the selected levels would be 72%, 79%, and 87%. In general, in the seven instances when the true underlying

Normal distribution is SB, the coverage levels implied % Coverage by the Beta are somewhat different from the correct ones (Table 5). The average absolute difference in this case is 1.9%. Differences of this magnitude are more likely to cause in- Risk 10%20% 0.70 0.80 0.88 0.97correct 0.70 coverage 0.80 selection 0.70 0.80 in Sb some Su cases 0.036 and 0.063 0.030 0.000 0.000 0.057 0.019 0.061 Tolerance could therefore be considered somewhat im-

Continued portant from an economic standpoint. Finally, in the six instances when the true

underlying distribution is SU, the coverage Averages of the absolute differences between the % coverage levels implied by the true distributions (in the three rows) and the four alternatives (in Table 5. TD/Farm Notes: TD/Farm is the true underlying distribution and thea farm associated with it. % Coverage refers to the coverage level that would have to be select protection (i.e., tolerance) according to each of the four distributions being evaluated. Su/X 5% 0.61levels 0.76 implied by 0.61 the Beta or 0.61 the SB model Beta are 0.031 0.021 0.012 0.000 318 Journal of Agricultural and Applied Economics, May 2010 generally quite different from the correct ones reduce the specification error risk that has long (Table 5). In the case of the Beta, the average of been associated with parametric methods, per- the absolute differences is 6.1%. Differences of haps to an acceptable level in most applica- this magnitude are likely to cause major errors tions. This conclusion, however, should be in coverage selection in most cases and, strengthened by further testing the performance therefore, are economically important. of the SU-SB family versus other parametric In short, the conclusions from the economic distributions, and does not preclude their con- relevance evaluation are consistent with those sideration as candidate models. A particularly of the previous section. If the true distribution promising alternative for future testing is the underlying the yield data are Beta and man- multivariate normal mixture, which could also agement decisions are made on the basis of an be reparameterized to span all MVSK combi- estimated SB model, the degree of error and its nations. It is also recognized that the relative economic implications are relatively minor. If complexity of the proposed family versus the the true distribution is SB and decisions are most commonly used alternatives could affect made using a Beta, the errors are somewhat its widespread applicability. higher and more likely to be economically A final caveat on the recommended distri- significant in some cases. Finally, if the un- butions is that they are continuous in nature. derlying distribution is SB or Beta and de- Thus, econometric modeling allowances must cisions are made on the basis of an SU, or if the be made when working with data discontinu- true distribution is SU and decisions are made ities such as censored yield observations due to using an SB or a Beta model, the degree of error droughts or flooding. It is also recognized that and its economic significance are likely to be a statistically reliable use of the procedures substantial. discussed in this article requires at least mod- erate sample sizes (30–50 observations), which are often not available at the individual farm Conclusion level. However, multivariate extensions of these procedures that can pool information A first general observation is that although the from several farms to estimate the skewness yield data used in the analyses is from the same and kurtosis parameters are feasible and state and crop, the SK combinations implied by straightforward, and county, state, and country the best fitting models are scattered over a large level data now span several decades. Finally, as region of the SK plane corresponding to both sample sizes grow, these procedures will be- the SU and the SB/Beta distributions, which come more usable at the single farm level as suggests a need for candidate model alterna- well. tives that comprehensively span the SK space. In addition, it is concluded that substantial, economically relevant errors in model fit References should be expected if such alternatives are not considered and the assumed distribution is in- Anderson, J.R. ‘‘Simulation: Methodology and consistent with the SK profile of the true dis- Application in Agricultural Economics.’’ Re- tribution underlying the data. Alternatively, if view of Marketing and Agricultural Economics the assumed distribution is capable of accom- 42(1974):3–55. modating the underlying MVSK values, errors Coble, K.H., T.O. Knight, R.D. Pope, and J.R. due to discrepancies in higher-order moments Williams. ‘‘Modeling Farm-level Crop In- or in the support characteristics of the assumed surance Demand with Panel Data.’’ American Journal of Agricultural Economics 78(1996): versus the true distribution appear to be rela- 439–47. tively minor. Gallagher, P. ‘‘U.S. Soybean Yields: Estimation Following the recommended strategy of al- and Forecasting with Nonsymmetric Distur- ways considering the SU and SB distributions as bances.’’ American Journal of Agricultural potential candidate models could substantially Economics 69(1987):796–803. Ramirez, McDonald, and Carpio: Flexible Yield Distribution Models 319

Johnson, N.L. ‘‘System of Frequency Curves skedastic, Correlated, Non-Normal Random Generated by Method of Translation.’’ Bio- Variables: The Case of Corn-Belt Corn, Soy- metrika 36(1949):149–76. beans and Wheat Yields.’’ American Journal of Ker, A.P., and K. Coble. ‘‘Modeling Conditional Agricultural Economics 79(1997):191–205. Yield Densities.’’ American Journal of Agri- Ramirez, O.A. and T.U. McDonald. ‘‘Ranking cultural Economics 85(2003):291–304. Crop Yield Models: A Comment.’’ American Mood, A.M., F.A. Graybill, and D.C. Boes. In- Journal of Agricultural Economics 88(2006): troduction to the Theory of Statistics.3rd ed. 1105–10. New York: McGraw-Hill, 1974. Ramı´rez, O.A., S.K. Misra, and J.E. Field. ‘‘Crop Moss, C.B., and J.S. Shonkwiler. ‘‘Estimating Yield Yield Distributions Revisited.’’ American Distributions Using a Stochastic Trend Model Journal of Agricultural Economics 85(2003): and Non-Normal Errors.’’ American Journal of 108–20. Agricultural Economics 75(1993):1056–62. Ramirez, O.A., C.B. Moss, and W.G. Boggess. Nelson, C.H., and P.V. Preckel. ‘‘The Conditional ‘‘Estimation and Use of the Inverse Hyperbolic Beta Distribution as a Stochastic Production Sine Transformation to Model Non-Normal Function.’’ American Journal of Agricultural Correlated Random Variables.’’ Journal of Economics 71(1989):370–78. Applied Statistics 21(1994):289–305. Norwood, B., M.C. Roberts, and J.L. Lusk. Taylor, C.R. ‘‘Two Practical Procedures for Esti- ‘‘Ranking Crop Yield Models Using Out-of- mating Multivariate Non-Normal Probability Sample Likelihood Functions.’’ American Density Functions.’’ American Journal of Ag- Journal of Agricultural Economics 86(2004): ricultural Economics 72(1990):210–17. 1032–43. Yatchew, A. ‘‘Nonparametric Regression Tech- Ramirez, O.A. ‘‘Estimation of a Multivariate niques in Economics.’’ Journal of Economic Parametric Model for Simulating Hetero- Literature 36(1998):669–721.