
The empirical Bayes estimators of the mean and variance parameters of the normal distribution with a conjugate normal-inverse-gamma prior by the moment method and the MLE method

Ying-Ying Zhang, Teng-Zhong Rong, and Man-Man Li

Department of Statistics and Actuarial Science, College of Mathematics and Statistics, Chongqing University, Chongqing, China

ABSTRACT
Most of the samples in the real world are from the normal distributions with unknown mean and variance, for which it is common to assume a conjugate normal-inverse-gamma prior. We calculate the empirical Bayes estimators of the mean and variance parameters of the normal distribution with a conjugate normal-inverse-gamma prior by the moment method and the Maximum Likelihood Estimation (MLE) method in two theorems. After that, we illustrate the two theorems for the monthly simple returns of the Shanghai Stock Exchange Composite Index.

ARTICLE HISTORY
Received 22 February 2017; Accepted 20 March 2018

KEYWORDS
Empirical Bayes estimators; moment method; maximum likelihood estimation (MLE) method; normal distribution with normal-inverse-gamma prior; non-standardized Student-t distribution

2010 MATHEMATICS SUBJECT CLASSIFICATIONS 62C12; 62F10; 62F15

1. Introduction

The empirical Bayes analysis is based on a perception of imprecision over the prior information, at a pragmatic level. It relies on a conjugate prior modeling, where the hyper-parameters are estimated from the observations and this "estimated prior" is then used as a regular prior in the subsequent inference. See Berger (1985), Maritz and Lwin (1989), Carlin and Louis (2000a), and the references therein. Robbins (1955, 1964, 1983) introduces the empirical Bayes method. Given $(n+1)$ independent observations $x_1, \ldots, x_{n+1}$ with densities $f(x_i \mid \theta_i)$, it is required to draw an inference on $\theta_{n+1}$, under the additional assumption that the $\theta_i$'s have all been generated according to the same unknown prior distribution $g$. From a Bayesian point of view, this means that the sampling distribution is known, and the prior distribution is not. The marginal distribution can then be used to recover the prior distribution from the observations. Deely and Lindley (1981) compare the empirical Bayes approach and the hierarchical Bayes approach. Morris (1983) investigates the theory and applications of the parametric empirical Bayes inference. Carlin and Louis (2000b) give a review of the past, present, and future of the empirical Bayes method.

CONTACT Ying-Ying Zhang [email protected] Department of Statistics and Actuarial Science, College of Mathematics and Statistics, Chongqing University, Chongqing 401331, China.

Sarhan (2003) estimates the reliability measures in an exponential reliability model using the empirical Bayes procedure. Chang and Li (2006) use a set of unlabeled samples to establish an empirical Bayes decision rule to classify defective items. Ehsanes-Saleh et al. (2006) address the problem of heterogeneity among various studies to be combined in a meta-analysis; they adopt a quasi-empirical Bayes methodology to predict the odds ratios for each study. Coram and Tang (2007) propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. Efron (2011) investigates the merits and limitations of the empirical Bayes approach for correcting selection bias. van Houwelingen (2014) reviews and discusses the role of empirical Bayes methodology in medical statistics in the last 50 years. Ghosh, Kubokawa, and Kawakubo (2015) develop hierarchical empirical Bayes and benchmarked hierarchical empirical Bayes estimators of positive small area means under multiplicative models. Satagopan et al. (2016) propose shrinkage-type estimators based on Bayesian penalization methods to estimate the effects of the risk factors using the themes deriving from biological considerations.

Most of the samples in the real world are from the normal distributions with unknown mean and variance, for which it is common to assume a conjugate normal-inverse-gamma prior. For the normal model (Equation 1), our main goals are to calculate the empirical Bayes estimators of the mean and variance parameters of the model by the moment method and the Maximum Likelihood Estimation (MLE) method; the results are summarized in Theorems 2 and 3.

The rest of the paper is organized as follows. In Section 2, we prove three theorems and four lemmas for the normal model (Equation 1). In Section 3, we calculate the empirical Bayes estimators of the mean and variance parameters of the normal model (Equation 1) by the moment method and the MLE method for the monthly simple returns of the Shanghai Stock Exchange (SSE) Composite Index. Section 4 concludes.

2. Main results

Suppose that the observations $X_1, X_2, \ldots, X_n$ are from the normal distribution with a normal-inverse-gamma prior:

$$X_i \mid (\mu, \theta) \overset{\text{iid}}{\sim} N(\mu, \theta), \quad i = 1, 2, \ldots, n, \qquad \mu \mid \theta \sim N(\mu_0, \theta/\kappa_0), \qquad \theta \sim IG\left(\nu_0/2,\ \nu_0\sigma_0^2/2\right), \tag{1}$$

where $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0 > 0$ are known hyper-parameters; $N(\mu, \theta)$ is a normal distribution with an unknown mean $\mu$ and an unknown variance $\theta$; the conditional distribution of $\mu$ given $\theta$ is $N(\mu_0, \theta/\kappa_0)$, a normal distribution with a known mean $\mu_0$ and an unknown variance $\theta/\kappa_0$; and the marginal conjugate prior distribution of $\theta$ is $IG(\nu_0/2, \nu_0\sigma_0^2/2)$, an inverse gamma distribution with a known shape parameter $\nu_0/2$ and a known rate parameter $\nu_0\sigma_0^2/2$. Specifically, the posterior distribution of $\mu$ and $\theta$ with the joint conjugate prior $\pi(\mu, \theta) \sim N\text{-}IG(\mu_0, \kappa_0, \nu_0, \sigma_0^2)$, which is the normal-inverse-gamma distribution, was studied in Example 1.5.1 (p. 20) of Mao and Tang (2012) and Part I (pp. 69–70) of Chen (2014).
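For concreteness, model (Equation 1) is easy to simulate; the following R sketch (the hyper-parameter values are illustrative, not from the paper) draws $\theta$ from the inverse gamma prior, $\mu$ from its conditional normal prior, and then the sample:

```r
# Minimal simulation sketch of model (1); hyper-parameter values are illustrative
set.seed(1)
n <- 120
mu0 <- 0; kappa0 <- 1; nu0 <- 20; sigma0 <- 0.05
# theta ~ IG(nu0/2, nu0*sigma0^2/2): reciprocal of a Gamma(shape, rate) draw
theta <- 1 / rgamma(1, shape = nu0 / 2, rate = nu0 * sigma0^2 / 2)
# mu | theta ~ N(mu0, theta/kappa0)
mu <- rnorm(1, mean = mu0, sd = sqrt(theta / kappa0))
# X_i | (mu, theta) ~ N(mu, theta), i = 1, ..., n
x <- rnorm(n, mean = mu, sd = sqrt(theta))
```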

The marginal distribution of $X$ and the Bayes estimators of the mean and variance parameters in model (Equation 1) under the squared error loss function are characterized in the following theorem, whose proof can be found in the Appendix.

Theorem 1. The marginal distribution of X in model (Equation 1) is a non-standardized Student-t distribution, that is,

$$X \sim t_{\nu_0}\left(\mu_0,\ \frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right),$$

where $\mu_0$ is a location parameter, $\sigma$ with $\sigma^2 = \sigma_0^2(1+\kappa_0)/\kappa_0 > 0$ is a scale parameter, and $\nu_0 > 0$ is a degrees of freedom parameter. The Bayes estimators of the mean and variance parameters in model (Equation 1) under the squared error loss function are

$$\delta_2^{\pi,\mu}(x) = \frac{n\bar{x} + \kappa_0\mu_0}{n + \kappa_0} \qquad \text{and} \qquad \delta_2^{\pi,\theta}(x) = \frac{\beta^*}{\alpha^* - 1}, \tag{2}$$

where

$$\alpha^* = \frac{\nu_0 + n}{2}, \qquad \beta^* = \frac{1}{2}\left[\nu_0\sigma_0^2 + (n-1)s^2 + \frac{n\kappa_0}{n+\kappa_0}(\bar{x} - \mu_0)^2\right],$$

$\bar{x}$ is the sample mean, and $s^2$ is the sample variance.
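As a sketch, the two estimators in Equation (2) are immediate to compute once the four hyper-parameters are given; the helper function below (an illustration, with the hypothetical name bayes_est) can be applied to the sample generated above:

```r
# Bayes estimators of Equation (2) under squared error loss (illustrative sketch)
bayes_est <- function(x, mu0, kappa0, nu0, sigma0) {
  n <- length(x); xbar <- mean(x); s2 <- var(x)  # var() uses the (n - 1) divisor
  alpha_star <- (nu0 + n) / 2
  beta_star  <- (nu0 * sigma0^2 + (n - 1) * s2 +
                 n * kappa0 / (n + kappa0) * (xbar - mu0)^2) / 2
  c(mean = (n * xbar + kappa0 * mu0) / (n + kappa0),
    variance = beta_star / (alpha_star - 1))
}
bayes_est(x, mu0, kappa0, nu0, sigma0)
```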

To prove Theorem 2, which calculates the empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method, we need the following lemmas. Lemma 1, whose proof can be found in the Appendix, is about the high-order moments of the normal distribution.

Lemma 1. Let $X \sim N(\mu, \sigma^2)$. Then the first six moments of X are:

$$EX = \mu, \qquad EX^2 = \mu^2 + \sigma^2, \qquad EX^3 = \mu^3 + 3\mu\sigma^2,$$
$$EX^4 = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4, \qquad EX^5 = \mu^5 + 10\mu^3\sigma^2 + 15\mu\sigma^4, \qquad EX^6 = \mu^6 + 15\mu^4\sigma^2 + 45\mu^2\sigma^4 + 15\sigma^6.$$

The following lemma, whose proof can be found in the Appendix, is about the first three moments of the inverse gamma distribution.

Lemma 2. Let $\theta \sim IG(a, b)$ follow an inverse gamma distribution with shape parameter $a > 0$ and rate parameter $b > 0$, whose density is given by

$$f(\theta \mid a, b) = \frac{b^a}{\Gamma(a)}\left(\frac{1}{\theta}\right)^{a+1}\exp\left(-\frac{b}{\theta}\right).$$

Then,

$$E\theta = \frac{b}{a-1}, \quad \text{if } a > 1; \qquad \mathrm{Var}(\theta) = \frac{b^2}{(a-1)^2(a-2)}, \quad \text{if } a > 2;$$
$$E\theta^2 = \frac{b^2}{(a-1)(a-2)}, \quad \text{if } a > 2; \qquad E\theta^3 = \frac{b^3}{(a-1)(a-2)(a-3)}, \quad \text{if } a > 3.$$

The following lemma, whose proof can be found in the Appendix, relates a non-standardized Student-t distribution to a mixture distribution obtained by compounding a normal distribution having mean $\mu$ and unknown variance with an inverse gamma distribution placed over the variance with parameters $a = \nu/2$ and $b = \nu\sigma^2/2$.

Lemma 3. Let

$$X \mid \theta \sim N(\mu, \theta), \qquad \theta \sim IG\left(a = \nu/2,\ b = \nu\sigma^2/2\right),$$

where $-\infty < \mu < \infty$, $\nu > 0$, and $\sigma > 0$ are known hyper-parameters. Then the marginal distribution of X is a non-standardized Student-t distribution, that is,

$$X \sim t_\nu(\mu, \sigma^2),$$

where $\mu$ is a location parameter, $\sigma > 0$ is a scale parameter, and $\nu > 0$ is a degrees of freedom parameter.

Combining Lemmas 1, 2, and 3, we can prove the following lemma, which calculates the first six moments of a non-standardized Student-t distribution $t_\nu(\mu, \sigma^2)$. The proof of the lemma can be found in the Appendix.

Lemma 4. Let $X \sim t_\nu(\mu, \sigma^2)$ be a non-standardized Student-t distribution. Then the first six moments of X are:

$$EX = \mu,$$
$$EX^2 = \mu^2 + \frac{\nu\sigma^2}{\nu-2}, \quad \text{if } \nu > 2,$$
$$EX^3 = \mu^3 + 3\mu\frac{\nu\sigma^2}{\nu-2}, \quad \text{if } \nu > 2,$$
$$EX^4 = \mu^4 + 6\mu^2\frac{\nu\sigma^2}{\nu-2} + 3\frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)}, \quad \text{if } \nu > 4,$$
$$EX^5 = \mu^5 + 10\mu^3\frac{\nu\sigma^2}{\nu-2} + 15\mu\frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)}, \quad \text{if } \nu > 4,$$
$$EX^6 = \mu^6 + 15\mu^4\frac{\nu\sigma^2}{\nu-2} + 45\mu^2\frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)} + 15\frac{\nu^3\sigma^6}{(\nu-2)(\nu-4)(\nu-6)}, \quad \text{if } \nu > 6.$$
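Lemmas 3 and 4 are easy to check numerically; a minimal Monte Carlo sketch (parameter values are illustrative) compounds the normal with an inverse gamma over the variance and compares two empirical moments with the closed forms:

```r
# Monte Carlo check of Lemmas 3 and 4 (illustrative values; nu > 4 needed for EX^4)
set.seed(2)
mu <- 1; nu <- 8; sigma <- 2; N <- 1e6
theta <- 1 / rgamma(N, shape = nu / 2, rate = nu * sigma^2 / 2)  # IG(nu/2, nu*sigma^2/2)
x <- rnorm(N, mean = mu, sd = sqrt(theta))   # marginally t_nu(mu, sigma^2) by Lemma 3
c(empirical = mean(x^2), theory = mu^2 + nu * sigma^2 / (nu - 2))
c(empirical = mean(x^4),
  theory = mu^4 + 6 * mu^2 * nu * sigma^2 / (nu - 2) +
           3 * nu^2 * sigma^4 / ((nu - 2) * (nu - 4)))
```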

The hyper-parameters of model (Equation 1) are $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0 > 0$. However, we cannot directly obtain the estimators of the four hyper-parameters of model (Equation 1) by the moment method. Let

$$u = \frac{\sigma_0^2(1+\kappa_0)}{\kappa_0},$$

which is the square of the scale parameter of a non-standardized Student-t distribution. Since $\kappa_0$ and $\sigma_0$ appear together in $u$, we cannot directly obtain the estimators of $\kappa_0$ and $\sigma_0$ by the moment method, but we can obtain the estimator of $u$. If we take $\kappa_0 = 1$ for simplicity, we can then obtain the estimators of $\kappa_0$ and $\sigma_0$ by the moment method. The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method are summarized in the following theorem, whose proof can be found in the Appendix.

Theorem 2. The estimators of the hyper-parameters of model (Equation 1) by the moment method are

$$\tilde{\mu}_0 = A_1, \qquad \tilde{\nu}_0 = \frac{-\frac{14}{3}A_1^4 + 4A_1^2A_2 + 2A_2^2 - \frac{4}{3}A_4}{-\frac{2}{3}A_1^4 + A_2^2 - \frac{1}{3}A_4}, \qquad \tilde{u} = \frac{-\frac{1}{3}\left(A_2 - A_1^2\right)\left(5A_1^4 - 6A_1^2A_2 + A_4\right)}{-\frac{7}{3}A_1^4 + 2A_1^2A_2 + A_2^2 - \frac{2}{3}A_4},$$

where

$$A_k = \frac{1}{n}\sum_{i=1}^n X_i^k, \quad k = 1, 2, \ldots,$$

is the sample kth moment of X. For simplicity, take $\tilde{\kappa}_0 = 1$. Then the estimators of the hyper-parameters of model (Equation 1) by the moment method become

$$\tilde{\mu}_0 = A_1, \qquad \tilde{\kappa}_0 = 1, \qquad \tilde{\nu}_0 = \frac{-\frac{14}{3}A_1^4 + 4A_1^2A_2 + 2A_2^2 - \frac{4}{3}A_4}{-\frac{2}{3}A_1^4 + A_2^2 - \frac{1}{3}A_4}, \qquad \tilde{\sigma}_0^2 = \frac{\tilde{u}}{2} = \frac{-\frac{1}{6}\left(A_2 - A_1^2\right)\left(5A_1^4 - 6A_1^2A_2 + A_4\right)}{-\frac{7}{3}A_1^4 + 2A_1^2A_2 + A_2^2 - \frac{2}{3}A_4}.$$

The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method are given by Equation (2) with the four hyper-parameters estimated by $(\tilde{\mu}_0, \tilde{\kappa}_0, \tilde{\nu}_0, \tilde{\sigma}_0)$.
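A direct transcription of Theorem 2 into R might look as follows (a sketch; the function name moment_est is hypothetical):

```r
# Moment-method hyper-parameter estimators of Theorem 2 (sketch; kappa0 fixed at 1)
moment_est <- function(x) {
  A1 <- mean(x); A2 <- mean(x^2); A4 <- mean(x^4)    # sample moments A_1, A_2, A_4
  nu0 <- (-14/3 * A1^4 + 4 * A1^2 * A2 + 2 * A2^2 - 4/3 * A4) /
         (-2/3 * A1^4 + A2^2 - 1/3 * A4)
  u   <- (-1/3 * (A2 - A1^2) * (5 * A1^4 - 6 * A1^2 * A2 + A4)) /
         (-7/3 * A1^4 + 2 * A1^2 * A2 + A2^2 - 2/3 * A4)
  c(mu0 = A1, kappa0 = 1, nu0 = nu0, sigma0 = sqrt(u / 2))  # sigma0^2 = u / 2
}
```

Plugging the resulting hyper-parameter estimates into the Bayes estimators of Equation (2) then gives the empirical Bayes estimates.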

The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the MLE method are summarized in the following theorem, whose proof can be found in the Appendix.

Theorem 3. The estimators of the hyper-parameters of model (Equation 1) by the MLE method are

$$\hat{\mu}_0 = \bar{x}, \qquad \hat{\kappa}_0 = \infty, \qquad \hat{\nu}_0 = \infty, \qquad \hat{\sigma}_0^2 = \frac{(n-1)s^2}{n},$$

where $\bar{x}$ is the sample mean and $s^2$ is the sample variance. The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the MLE method are given by Equation (2) with the four hyper-parameters estimated by $(\hat{\mu}_0, \hat{\kappa}_0, \hat{\nu}_0, \hat{\sigma}_0)$.
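Since $\hat{\kappa}_0 = \hat{\nu}_0 = \infty$, plugging the MLE hyper-parameters into Equation (2) amounts to taking limits: with $\mu_0 = \bar{x}$ the mean estimator reduces to $\bar{x}$ for any $\kappa_0$, and as $\nu_0 \to \infty$ the variance estimator $\beta^*/(\alpha^* - 1) = [\nu_0\hat{\sigma}_0^2 + (n-1)s^2]/(\nu_0 + n - 2)$ tends to $\hat{\sigma}_0^2 = (n-1)s^2/n$. A sketch in R (the function name mle_eb_est is hypothetical):

```r
# Empirical Bayes estimators by the MLE method (Theorem 3): limiting form of (2)
mle_eb_est <- function(x) {
  n <- length(x)
  # The mean estimator reduces to the sample mean; the variance estimator
  # reduces to (n - 1) * s^2 / n, i.e. the MLE of the variance
  c(mean = mean(x), variance = (n - 1) * var(x) / n)
}
```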

3. Real data example

Normal distributions with unknown mean and variance exist everywhere. In this section, we exploit data from finance. The R package quantmod (Ryan 2016) is exploited to download the data 000001.SS (the SSE Composite Index) from 2007-01-08 to 2017-01-13 from "finance.yahoo.com." It is commonly believed that the monthly simple returns of index data or stock data are normally distributed, and it is easy to check that the SSE monthly simple returns follow the normal model (Equation 1). The estimators of the hyper-parameters of model (Equation 1) by the moment method are calculated as
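A sketch of the data step with quantmod is given below (the column choice and current Yahoo Finance availability are assumptions; the authors' exact 2017 download may differ):

```r
# Sketch of the data step; Yahoo availability and columns may vary over time
library(quantmod)
getSymbols("000001.SS", src = "yahoo", from = "2007-01-08", to = "2017-01-13")
sse <- na.omit(get("000001.SS"))
r <- monthlyReturn(Ad(sse), type = "arithmetic")  # monthly simple returns
x <- as.numeric(r)
moment_est(x)   # moment-method hyper-parameter estimates, as in this section
mle_eb_est(x)   # MLE-method empirical Bayes estimates
```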

$$\tilde{\mu}_0 = 0.004689882, \qquad \tilde{\kappa}_0 = 1, \qquad \tilde{\nu}_0 = 22.01677, \qquad \tilde{\sigma}_0 = 0.05556403.$$

The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method under the squared error loss are

$$\delta_2^{\pi,\mu,\text{moment}}(x) = 0.004689882 \qquad \text{and} \qquad \delta_2^{\pi,\theta,\text{moment}}(x) = 0.006313047.$$

The estimators of the hyper-parameters of model (Equation 1) by the MLE method are calculated as

$$\hat{\mu}_0 = 0.004689882, \qquad \hat{\kappa}_0 = \infty, \qquad \hat{\nu}_0 = \infty, \qquad \hat{\sigma}_0 = 0.08241164.$$

The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the MLE method under the squared error loss are

$$\delta_2^{\pi,\mu,\text{MLE}}(x) = 0.004689882 \qquad \text{and} \qquad \delta_2^{\pi,\theta,\text{MLE}}(x) = 0.006791678.$$

Note that the two empirical Bayes estimators of the mean by the moment method and the MLE method are the same, since they are both equal to $\bar{x}$.

4. Conclusion

For the normal distribution with a conjugate normal-inverse-gamma prior, we find that the marginal distribution of X in model (Equation 1) is a non-standardized Student-t distribution, and obtain the Bayes estimators of the mean and variance parameters of model (Equation 1) under the squared error loss function in Theorem 1. To prove Theorem 2, which calculates the empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method, we prove four lemmas. Lemma 1 calculates the first six moments of the normal distribution. Lemma 2 calculates the first three moments of the inverse gamma distribution. Lemma 3 relates a non-standardized Student-t distribution to a mixture distribution by mixing the variance parameter of the normal distribution with an inverse gamma distribution. Lemma 4 calculates the first six moments of a non-standardized Student-t distribution. Moreover, we calculate the empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the MLE method in Theorem 3. Finally, in the real data section, we calculate the empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method and the MLE method for the monthly simple returns of the SSE Composite Index, which follow the normal model (Equation 1).

Acknowledgments

The authors are grateful to the editor, the associate editor, and the anonymous referees for their valuable comments and suggestions to improve this article.

Funding

The research was supported by the Fundamental Research Funds for the Central Universities (CQDXWL-2012-004; 106112016CDJXY100002), China Scholarship Council (201606055028), National Natural Science Foundation of China (11671060), and MOE project of Humanities and Social Sciences on the west and the border area (14XJC910001).

References

Abramowitz, M., and I. A. Stegun. 1970. Handbook of mathematical functions. 9th printing ed. New York: United States Government Printing Office.
Berger, J. O. 1985. Statistical decision theory and Bayesian analysis. 2nd ed. New York: Springer.
Carlin, B. P., and T. A. Louis. 2000a. Bayes and empirical Bayes methods for data analysis. 2nd ed. London: Chapman & Hall.
Carlin, B. P., and T. A. Louis. 2000b. Empirical Bayes: Past, present and future. Journal of the American Statistical Association 95:1286–90.
Casella, G., and R. L. Berger. 2002. Statistical inference. 2nd ed. Belmont, CA: Duxbury Press.
Chang, S. C., and T. F. Li. 2006. Empirical Bayes decision rule for classification on defective items in Weibull distribution. Applied Mathematics and Computation 182:425–33.
Chen, M. H. 2014. Lecture. Changchun, China: Statistics Graduate Summer School, School of Mathematics and Statistics, Northeast Normal University.
Coram, M., and H. Tang. 2007. Improving population-specific allele frequency estimates by adapting supplemental data: An empirical Bayes approach. The Annals of Applied Statistics 1:459–79.
Deely, J. J., and D. V. Lindley. 1981. Bayes empirical Bayes. Journal of the American Statistical Association 76:833–41.
Efron, B. 2011. Tweedie's formula and selection bias. Journal of the American Statistical Association 106:1602–14.
Ehsanes-Saleh, A. K. M., K. M. Hassanein, R. S. Hassanein, and H. M. Kim. 2006. Quasi-empirical Bayes methodology for improving meta-analysis. Journal of Biopharmaceutical Statistics 16:77–90.
Ghosh, M., T. Kubokawa, and Y. Kawakubo. 2015. Benchmarked empirical Bayes methods in multiplicative area-level models with risk evaluation. Biometrika 102:647–59.
Jackman, S. 2009. Bayesian analysis for the social sciences. New York: Wiley.
Mao, S. S., and Y. C. Tang. 2012. Bayesian statistics. 2nd ed. Beijing, China: China Statistics Press.
Maritz, J. S., and T. Lwin. 1989. Empirical Bayes methods. 2nd ed. London: Chapman & Hall.
Morris, C. 1983. Parametric empirical Bayes inference: Theory and applications. Journal of the American Statistical Association 78:47–65.
R Core Team. 2017. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Robbins, H. 1955. An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Berkeley, CA: University of California Press.
Robbins, H. 1964. The empirical Bayes approach to statistical decision problems. Annals of Mathematical Statistics 35:1–20.
Robbins, H. 1983. Some thoughts on empirical Bayes estimation. Annals of Statistics 11:713–23.
Ryan, J. A. 2016. quantmod: Quantitative financial modelling framework. R package version 0.4-7.
Sarhan, A. M. 2003. Empirical Bayes estimates in exponential reliability model. Applied Mathematics and Computation 135:319–32.
Satagopan, J. M., A. Sen, Q. Zhou, Q. Lan, N. Rothman, H. Langseth, and L. S. Engel. 2016. Bayes and empirical Bayes methods for reduced rank regression models in matched case-control studies. Biometrics 72:584–95.
van Houwelingen, H. C. 2014. The role of empirical Bayes methodology as a leading principle in modern medical statistics. Biometrical Journal 56:919–32.

Appendix

The proofs of the lemmas and the theorems are given here.

The Proof of Theorem 1. Let $\eta_0 = (\mu_0, \kappa_0, \nu_0, \sigma_0^2)'$ be the hyper-parameters. We first compute the marginal distribution of $x = (x_1, \ldots, x_n)'$ in model (Equation 1), which is

$$m(x \mid \eta_0) = \frac{f(x \mid \mu, \theta)\,\pi(\mu, \theta \mid \eta_0)}{\pi(\mu, \theta \mid x, \eta_0)}.$$

To lighten notation, $\eta_0$ will be dropped in the densities. Some of the following derivations are quoted from Example 1.5.1 (p. 20) of Mao and Tang (2012). By the Bayes theorem, the joint posterior distribution of $\mu$ and $\theta$ is

$$\pi(\mu, \theta \mid x) \propto f(x \mid \mu, \theta)\,\pi(\mu, \theta).$$

The joint conjugate prior distribution of $\mu$ and $\theta$ is decomposed as $\pi(\mu, \theta) = \pi(\mu \mid \theta)\,\pi(\theta)$, which is a normal-inverse-gamma distribution. Hence,

$$\pi(\mu, \theta \mid x) \propto f(x \mid \mu, \theta)\,\pi(\mu \mid \theta)\,\pi(\theta).$$

It is easy to see that

$$\pi(\theta) \propto \left(\frac{1}{\theta}\right)^{\frac{\nu_0}{2}+1}\exp\left(-\frac{\nu_0\sigma_0^2}{2}\frac{1}{\theta}\right), \quad \theta > 0,\ \nu_0, \sigma_0 > 0,$$

$$\pi(\mu \mid \theta) \propto \theta^{-\frac{1}{2}}\exp\left(-\frac{\kappa_0}{2\theta}(\mu - \mu_0)^2\right), \quad -\infty < \mu < \infty,\ \kappa_0 > 0,\ -\infty < \mu_0 < \infty,$$

and

$$f(x \mid \mu, \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\sqrt{\theta}}\exp\left(-\frac{(x_i - \mu)^2}{2\theta}\right) \propto \left(\frac{1}{\theta}\right)^{\frac{n}{2}}\exp\left(-\frac{1}{2\theta}\sum_{i=1}^n (x_i - \mu)^2\right).$$

Thus,

$$\pi(\mu, \theta \mid x) \propto \theta^{-\frac{1}{2}}\,\theta^{-\left(\frac{\nu_0+n}{2}+1\right)}\exp\left\{-\frac{1}{2\theta}\left[\sum_{i=1}^n (x_i - \mu)^2 + \kappa_0(\mu - \mu_0)^2 + \nu_0\sigma_0^2\right]\right\}. \tag{A1}$$

Focusing on the expression in the square brackets of Equation (A1), we obtain

$$\sum_{i=1}^n (x_i - \mu)^2 + \kappa_0(\mu - \mu_0)^2 + \nu_0\sigma_0^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 + \kappa_0(\mu - \mu_0)^2 + \nu_0\sigma_0^2 = (n-1)s^2 + n(\bar{x} - \mu)^2 + \kappa_0(\mu - \mu_0)^2 + \nu_0\sigma_0^2, \tag{A2}$$

where $\bar{x}$ is the sample mean and $s^2$ is the sample variance. The sum of the middle two terms in Equation (A2) can be recombined as

$$\begin{aligned} n(\bar{x} - \mu)^2 + \kappa_0(\mu - \mu_0)^2 &= n\mu^2 - 2n\bar{x}\mu + n\bar{x}^2 + \kappa_0\mu^2 - 2\kappa_0\mu_0\mu + \kappa_0\mu_0^2 \\ &= (n + \kappa_0)\mu^2 - 2\mu(n\bar{x} + \kappa_0\mu_0) + n\bar{x}^2 + \kappa_0\mu_0^2 \\ &= (n + \kappa_0)\left[\mu^2 - 2\mu\frac{n\bar{x} + \kappa_0\mu_0}{n + \kappa_0} + \left(\frac{n\bar{x} + \kappa_0\mu_0}{n + \kappa_0}\right)^2\right] - \frac{(n\bar{x} + \kappa_0\mu_0)^2}{n + \kappa_0} + n\bar{x}^2 + \kappa_0\mu_0^2 \\ &= (n + \kappa_0)\left(\mu - \frac{n\bar{x} + \kappa_0\mu_0}{n + \kappa_0}\right)^2 + \frac{n\kappa_0(\bar{x} - \mu_0)^2}{n + \kappa_0}. \end{aligned}$$

Denote

$$\kappa_n = \kappa_0 + n, \qquad \mu_n = \frac{n\bar{x} + \kappa_0\mu_0}{n + \kappa_0}, \qquad \nu_n = \nu_0 + n, \qquad \nu_n\sigma_n^2 = \nu_0\sigma_0^2 + (n-1)s^2 + \frac{n\kappa_0}{n + \kappa_0}(\bar{x} - \mu_0)^2. \tag{A3}$$

Thus,

$$\sum_{i=1}^n (x_i - \mu)^2 + \kappa_0(\mu - \mu_0)^2 + \nu_0\sigma_0^2 = \kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2.$$

It is easy to calculate

$$\begin{aligned} f(x \mid \mu, \theta)\,\pi(\mu, \theta) &= f(x \mid \mu, \theta)\,\pi(\mu \mid \theta)\,\pi(\theta) \\ &= (2\pi)^{-\frac{n}{2}}\theta^{-\frac{n}{2}}\exp\left(-\frac{1}{2\theta}\sum_{i=1}^n (x_i - \mu)^2\right) \times \frac{1}{\sqrt{2\pi}\sqrt{\theta/\kappa_0}}\exp\left(-\frac{\kappa_0}{2\theta}(\mu - \mu_0)^2\right) \\ &\quad \times \frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}\left(\frac{1}{\theta}\right)^{\frac{\nu_0}{2}+1}\exp\left(-\frac{\nu_0\sigma_0^2}{2}\frac{1}{\theta}\right) \\ &= (2\pi)^{-\frac{n+1}{2}}\sqrt{\kappa_0}\,\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}\,\theta^{-\frac{1}{2}}\,\theta^{-\left(\frac{\nu_n}{2}+1\right)}\exp\left\{-\frac{1}{2\theta}\left[\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2\right]\right\}. \end{aligned}$$

Let

$$C_1 = (2\pi)^{-\frac{n+1}{2}}\sqrt{\kappa_0}\,\frac{\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}}{\Gamma\left(\frac{\nu_0}{2}\right)}.$$

Referring back to Equation (A1), we have

$$\pi(\mu, \theta \mid x) \propto \theta^{-\frac{1}{2}}\,\theta^{-\left(\frac{\nu_n}{2}+1\right)}\exp\left\{-\frac{1}{2\theta}\left[\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2\right]\right\}.$$

It is shown that $\pi(\mu, \theta \mid x)$ is a normal-inverse-gamma distribution as follows. The joint posterior distribution $\pi(\mu, \theta \mid x)$ can be written as $\pi(\mu \mid \theta, x)\,\pi(\theta \mid x)$, where

$$\pi(\mu \mid \theta, x) \propto \left(\frac{\theta}{\kappa_n}\right)^{-\frac{1}{2}}\exp\left(-\frac{\kappa_n}{2\theta}(\mu - \mu_n)^2\right), \qquad \pi(\theta \mid x) \propto \theta^{-\left(\frac{\nu_n}{2}+1\right)}\exp\left(-\frac{\nu_n\sigma_n^2}{2}\frac{1}{\theta}\right).$$

That is,

$$\mu \mid \theta, x \sim N(\mu_n, \theta/\kappa_n), \qquad \theta \mid x \sim IG\left(\alpha^* = \frac{\nu_n}{2},\ \beta^* = \frac{\nu_n\sigma_n^2}{2}\right).$$

Therefore, the joint posterior distribution $\pi(\mu, \theta \mid x)$ is

$$\begin{aligned} \pi(\mu, \theta \mid x) &= \pi(\mu \mid \theta, x)\,\pi(\theta \mid x) \\ &= \frac{1}{\sqrt{2\pi}\sqrt{\theta/\kappa_n}}\exp\left(-\frac{\kappa_n}{2\theta}(\mu - \mu_n)^2\right) \times \frac{\left(\frac{\nu_n\sigma_n^2}{2}\right)^{\frac{\nu_n}{2}}}{\Gamma\left(\frac{\nu_n}{2}\right)}\,\theta^{-\left(\frac{\nu_n}{2}+1\right)}\exp\left(-\frac{\nu_n\sigma_n^2}{2}\frac{1}{\theta}\right) \\ &= \sqrt{\frac{\kappa_n}{2\pi}}\,\frac{\left(\frac{\nu_n\sigma_n^2}{2}\right)^{\frac{\nu_n}{2}}}{\Gamma\left(\frac{\nu_n}{2}\right)}\,\theta^{-\frac{1}{2}}\,\theta^{-\left(\frac{\nu_n}{2}+1\right)}\exp\left\{-\frac{1}{2\theta}\left[\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2\right]\right\}. \end{aligned}$$

Let

$$C_2 = \sqrt{\frac{\kappa_n}{2\pi}}\,\frac{\left(\frac{\nu_n\sigma_n^2}{2}\right)^{\frac{\nu_n}{2}}}{\Gamma\left(\frac{\nu_n}{2}\right)}.$$

Consequently,

$$m(x \mid \eta_0) = \frac{f(x \mid \mu, \theta)\,\pi(\mu, \theta \mid \eta_0)}{\pi(\mu, \theta \mid x, \eta_0)} = \frac{C_1}{C_2} = \frac{(2\pi)^{-\frac{n}{2}}\sqrt{\frac{\kappa_0}{\kappa_n}}\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}\Gamma\left(\frac{\nu_n}{2}\right)}{\left(\frac{\nu_n\sigma_n^2}{2}\right)^{\frac{\nu_n}{2}}\Gamma\left(\frac{\nu_0}{2}\right)}.$$

When $n = 1$ in the above expression, we obtain the marginal distribution of $x$ as

$$m(x \mid \eta_0) = \frac{(2\pi)^{-\frac{1}{2}}\sqrt{\frac{\kappa_0}{\kappa_1}}\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}\Gamma\left(\frac{\nu_0+1}{2}\right)}{\left(\frac{\nu_1\sigma_1^2}{2}\right)^{\frac{\nu_0+1}{2}}\Gamma\left(\frac{\nu_0}{2}\right)}, \qquad \nu_1\sigma_1^2 = \nu_0\sigma_0^2 + \frac{\kappa_0}{1+\kappa_0}(x - \mu_0)^2,$$

so that

$$m(x \mid \eta_0) \propto \left[\nu_0\sigma_0^2 + \frac{\kappa_0}{1+\kappa_0}(x - \mu_0)^2\right]^{-\frac{\nu_0+1}{2}} \propto \left[1 + \frac{1}{\nu_0}\,\frac{(x - \mu_0)^2}{\sigma_0^2(1+\kappa_0)/\kappa_0}\right]^{-\frac{\nu_0+1}{2}} \sim t_{\nu_0}\left(\mu_0,\ \frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right),$$

which is a non-standardized Student-t distribution, where $\mu_0$ is a location parameter, $\sigma = \sigma_0\sqrt{(1+\kappa_0)/\kappa_0} > 0$ is a scale parameter, and $\nu_0 > 0$ is a degrees of freedom parameter.

The marginal posterior distribution of $\mu$ is

$$\pi(\mu \mid x) = \int_0^\infty \pi(\mu, \theta \mid x)\,d\theta \propto \int_0^\infty \theta^{-\frac{1}{2}}\,\theta^{-\left(\frac{\nu_n}{2}+1\right)}\exp\left\{-\frac{1}{2\theta}\left[\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2\right]\right\}d\theta$$
$$= \int_0^\infty \theta^{-\left(\frac{\nu_n+1}{2}+1\right)}\exp\left(-\frac{\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2}{2}\frac{1}{\theta}\right)d\theta = \frac{\Gamma(\alpha_n)}{\beta_n^{\alpha_n}}\int_0^\infty f(y)\,dy,$$

where $f(y)$ is the pdf of

$$Y \sim IG\left(\alpha_n = \frac{\nu_n+1}{2},\ \beta_n = \frac{\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2}{2}\right).$$

Due to $\int_0^\infty f(y)\,dy = 1$, we have

$$\pi(\mu \mid x) \propto \beta_n^{-\alpha_n} \propto \left[\kappa_n(\mu - \mu_n)^2 + \nu_n\sigma_n^2\right]^{-\frac{\nu_n+1}{2}} \propto \left[1 + \frac{1}{\nu_n}\,\frac{(\mu - \mu_n)^2}{\sigma_n^2/\kappa_n}\right]^{-\frac{\nu_n+1}{2}} \sim t_{\nu_n}\left(\mu_n,\ \sigma_n^2/\kappa_n\right),$$

which is a non-standardized Student-t distribution with degrees of freedom $\nu_n$, location parameter $\mu_n$, and scale parameter $\sigma_n/\sqrt{\kappa_n}$. The Bayes estimator of $\mu$ under the squared error loss function is given by

$$\delta_2^{\pi,\mu}(x) = E(\mu \mid x) = \mu_n = \frac{n\bar{x} + \kappa_0\mu_0}{n + \kappa_0}.$$

The Bayes estimator of $\theta$ under the squared error loss function is given by

$$\delta_2^{\pi,\theta}(x) = E(\theta \mid x) = \frac{\beta^*}{\alpha^* - 1},$$

where

$$\alpha^* = \frac{\nu_0 + n}{2}, \qquad \beta^* = \frac{1}{2}\left[\nu_0\sigma_0^2 + (n-1)s^2 + \frac{n\kappa_0}{n + \kappa_0}(\bar{x} - \mu_0)^2\right].$$

The proof of the theorem is complete.

The Proof of Lemma 1. The proof of the lemma exploits Stein's lemma (see Lemma 3.6.5 in Casella and Berger 2002): if $X \sim N(\mu, \sigma^2)$ and $g$ is a differentiable function with $E|g'(X)| < \infty$, then $E[g(X)(X - \mu)] = \sigma^2 E[g'(X)]$. The first two moments of $X$ are familiar to all. The calculation of $EX^3$ can be found in Example 3.6.6 in Casella and Berger (2002) and thus it is omitted. Now we calculate $EX^4$. We have

$$EX^4 = E\left[X^3(X - \mu + \mu)\right] = E\left[X^3(X - \mu)\right] + \mu EX^3 \quad (g(x) = x^3,\ g'(x) = 3x^2)$$
$$= 3\sigma^2 EX^2 + \mu EX^3 = 3\sigma^2(\mu^2 + \sigma^2) + \mu(\mu^3 + 3\mu\sigma^2) = \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4.$$

Similarly,

$$EX^5 = E\left[X^4(X - \mu + \mu)\right] = E\left[X^4(X - \mu)\right] + \mu EX^4 \quad (g(x) = x^4,\ g'(x) = 4x^3)$$
$$= 4\sigma^2 EX^3 + \mu EX^4 = 4\sigma^2(\mu^3 + 3\mu\sigma^2) + \mu(\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4) = \mu^5 + 10\mu^3\sigma^2 + 15\mu\sigma^4,$$

and

$$EX^6 = E\left[X^5(X - \mu + \mu)\right] = E\left[X^5(X - \mu)\right] + \mu EX^5 \quad (g(x) = x^5,\ g'(x) = 5x^4)$$
$$= 5\sigma^2 EX^4 + \mu EX^5 = 5\sigma^2(\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4) + \mu(\mu^5 + 10\mu^3\sigma^2 + 15\mu\sigma^4) = \mu^6 + 15\mu^4\sigma^2 + 45\mu^2\sigma^4 + 15\sigma^6.$$

The proof of the lemma is complete.

The Proof of Lemma 2. The expectation and variance of the inverse gamma distribution can be found in Definition B.35 in Jackman (2009) and thus they are omitted. It is easy to calculate

$$E\theta^2 = \mathrm{Var}(\theta) + (E\theta)^2 = \frac{b^2}{(a-1)^2(a-2)} + \left(\frac{b}{a-1}\right)^2 = \left(\frac{b}{a-1}\right)^2\left(\frac{1}{a-2} + 1\right) = \frac{b^2}{(a-1)^2}\,\frac{a-1}{a-2} = \frac{b^2}{(a-1)(a-2)}.$$

Now we calculate $E\theta^3$. Since $f(\theta \mid a, b)$ is a density, after some rearrangement, we have

$$\int_0^\infty \left(\frac{1}{\theta}\right)^{a+1}\exp\left(-\frac{b}{\theta}\right)d\theta = \frac{\Gamma(a)}{b^a}. \tag{A4}$$

Therefore, for $a > 3$, we have

$$E\theta^3 = \int_0^\infty \theta^3\,\frac{b^a}{\Gamma(a)}\left(\frac{1}{\theta}\right)^{a+1}\exp\left(-\frac{b}{\theta}\right)d\theta = \frac{b^a}{\Gamma(a)}\int_0^\infty \left(\frac{1}{\theta}\right)^{(a-3)+1}\exp\left(-\frac{b}{\theta}\right)d\theta = \frac{b^a}{\Gamma(a)}\,\frac{\Gamma(a-3)}{b^{a-3}} = \frac{b^3}{(a-1)(a-2)(a-3)}.$$

The proof of the lemma is complete.

The Proof of Lemma 3. By straightforward calculations, the density of the marginal distribution of X is

$$\begin{aligned} m(x) &= \int_0^\infty f(x, \theta)\,d\theta = \int_0^\infty f(x \mid \theta)\,\pi(\theta)\,d\theta \\ &= \int_0^\infty \frac{1}{\sqrt{2\pi}\sqrt{\theta}}\exp\left(-\frac{(x-\mu)^2}{2\theta}\right)\frac{\left(\frac{\nu\sigma^2}{2}\right)^{\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)}\left(\frac{1}{\theta}\right)^{\frac{\nu}{2}+1}\exp\left(-\frac{\nu\sigma^2}{2}\frac{1}{\theta}\right)d\theta \\ &= \frac{1}{\sqrt{2\pi}}\,\frac{\left(\frac{\nu\sigma^2}{2}\right)^{\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)}\int_0^\infty \left(\frac{1}{\theta}\right)^{\frac{\nu+1}{2}+1}\exp\left(-\frac{1}{\theta}\,\frac{(x-\mu)^2 + \nu\sigma^2}{2}\right)d\theta. \end{aligned}$$

Now recognizing the integrand of the above integral as a kernel of the inverse gamma distribution

$$IG\left(a = \frac{\nu+1}{2},\ b = \frac{(x-\mu)^2 + \nu\sigma^2}{2}\right)$$

and applying Equation (A4), we have

$$m(x) = \frac{1}{\sqrt{2\pi}}\,\frac{\left(\frac{\nu\sigma^2}{2}\right)^{\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)}\,\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\left(\frac{(x-\mu)^2+\nu\sigma^2}{2}\right)^{\frac{\nu+1}{2}}} = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu\sigma^2}}\left[1 + \frac{1}{\nu}\,\frac{(x-\mu)^2}{\sigma^2}\right]^{-\frac{\nu+1}{2}} \sim t_\nu(\mu, \sigma^2).$$

The proof of the lemma is complete.

The Proof of Lemma 4. We will use the iterated expectation identity and Lemmas 1, 2, and 3. Note that $-\infty < \mu < \infty$, $\nu > 0$, and $\sigma > 0$ are known parameters. By Lemmas 1 and 3, we have

$$E(X \mid \theta) = \mu, \qquad E(X^2 \mid \theta) = \mu^2 + \theta, \qquad E(X^3 \mid \theta) = \mu^3 + 3\mu\theta,$$
$$E(X^4 \mid \theta) = \mu^4 + 6\mu^2\theta + 3\theta^2, \qquad E(X^5 \mid \theta) = \mu^5 + 10\mu^3\theta + 15\mu\theta^2, \qquad E(X^6 \mid \theta) = \mu^6 + 15\mu^4\theta + 45\mu^2\theta^2 + 15\theta^3.$$

By Lemmas 2 and 3, we have

$$E\theta = \frac{b}{a-1} = \frac{\nu\sigma^2/2}{\nu/2 - 1} = \frac{\nu\sigma^2}{\nu-2}, \quad \text{if } \nu > 2,$$
$$E\theta^2 = \frac{b^2}{(a-1)(a-2)} = \frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)}, \quad \text{if } \nu > 4,$$
$$E\theta^3 = \frac{b^3}{(a-1)(a-2)(a-3)} = \frac{\nu^3\sigma^6}{(\nu-2)(\nu-4)(\nu-6)}, \quad \text{if } \nu > 6.$$

Therefore, we have

$$EX = E[E(X \mid \theta)] = E[\mu] = \mu,$$
$$EX^2 = E[E(X^2 \mid \theta)] = E[\mu^2 + \theta] = \mu^2 + \frac{\nu\sigma^2}{\nu-2},$$
$$EX^3 = E[E(X^3 \mid \theta)] = E[\mu^3 + 3\mu\theta] = \mu^3 + 3\mu\frac{\nu\sigma^2}{\nu-2},$$
$$EX^4 = E[E(X^4 \mid \theta)] = E[\mu^4 + 6\mu^2\theta + 3\theta^2] = \mu^4 + 6\mu^2\frac{\nu\sigma^2}{\nu-2} + 3\frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)},$$
$$EX^5 = E[E(X^5 \mid \theta)] = E[\mu^5 + 10\mu^3\theta + 15\mu\theta^2] = \mu^5 + 10\mu^3\frac{\nu\sigma^2}{\nu-2} + 15\mu\frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)},$$
$$EX^6 = E[E(X^6 \mid \theta)] = E[\mu^6 + 15\mu^4\theta + 45\mu^2\theta^2 + 15\theta^3] = \mu^6 + 15\mu^4\frac{\nu\sigma^2}{\nu-2} + 45\mu^2\frac{\nu^2\sigma^4}{(\nu-2)(\nu-4)} + 15\frac{\nu^3\sigma^6}{(\nu-2)(\nu-4)(\nu-6)}.$$

The proof of the lemma is complete.

The Proof of Theorem 2. The hyper-parameters of model (Equation 1) are $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0 > 0$. By Theorem 1, we know that the marginal distribution of X in model (Equation 1) is a non-standardized Student-t distribution, that is,

$$X \sim t_{\nu_0}\left(\mu_0,\ \frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right).$$

Since there are four hyper-parameters, if we want to obtain the estimators of the hyper-parameters of model (Equation 1) by the moment method, we need to calculate at least the first four moments of X. By Lemma 4, we obtain the first six population moments of X as follows. Setting the population moments equal to the sample moments, we obtain

$$EX = \mu_0 = A_1,$$
$$EX^2 = \mu_0^2 + \frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0} = A_2, \quad \text{if } \nu_0 > 2,$$
$$EX^3 = \mu_0^3 + 3\mu_0\frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0} = A_3, \quad \text{if } \nu_0 > 2,$$
$$EX^4 = \mu_0^4 + 6\mu_0^2\frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0} + 3\frac{\nu_0^2}{(\nu_0-2)(\nu_0-4)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^2 = A_4, \quad \text{if } \nu_0 > 4,$$
$$EX^5 = \mu_0^5 + 10\mu_0^3\frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0} + 15\mu_0\frac{\nu_0^2}{(\nu_0-2)(\nu_0-4)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^2 = A_5, \quad \text{if } \nu_0 > 4,$$
$$EX^6 = \mu_0^6 + 15\mu_0^4\frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0} + 45\mu_0^2\frac{\nu_0^2}{(\nu_0-2)(\nu_0-4)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^2 + 15\frac{\nu_0^3}{(\nu_0-2)(\nu_0-4)(\nu_0-6)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^3 = A_6, \quad \text{if } \nu_0 > 6.$$

From the first moment of X, we obtain the moment estimator of μ0 as

$$\tilde{\mu}_0 = A_1.$$

Let

$$u_1 = \frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}, \qquad u_2 = \frac{\nu_0^2}{(\nu_0-2)(\nu_0-4)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^2, \qquad u_3 = \frac{\nu_0^3}{(\nu_0-2)(\nu_0-4)(\nu_0-6)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^3.$$

From the second moment of X, we obtain the moment estimator of $u_1$ as

$$\tilde{u}_1 = A_2 - A_1^2.$$

From the third moment of X, we obtain the moment estimator of $u_1$ as

$$\tilde{u}_1 = \frac{A_3 - A_1^3}{3A_1}.$$

Obviously, the two moment estimators of $u_1$ are not equal. Therefore, we choose one of them as the moment estimator of $u_1$. For simplicity, we use the moment estimator of $u_1$ calculated from the second moment of X, and ignore the third equation involving $EX^3$. Similarly, the equations involving $EX^4$ and $EX^5$ both have $u_1$ and $u_2$. For simplicity, we use the equation involving $EX^4$, and ignore the equation involving $EX^5$. To have four equations, we will use the equation involving $EX^6$. Therefore, the moment equations become

$$EX = \mu_0 = A_1,$$
$$EX^2 = \mu_0^2 + u_1 = A_2,$$
$$EX^4 = \mu_0^4 + 6\mu_0^2 u_1 + 3u_2 = A_4,$$
$$EX^6 = \mu_0^6 + 15\mu_0^4 u_1 + 45\mu_0^2 u_2 + 15u_3 = A_6.$$

Solving the above moment equations, we obtain

$$\tilde{\mu}_0 = A_1,$$
$$\tilde{u}_1 = \frac{\nu_0}{\nu_0-2}\,\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0} = A_2 - A_1^2 \triangleq B_1,$$
$$\tilde{u}_2 = \frac{\nu_0^2}{(\nu_0-2)(\nu_0-4)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^2 = \frac{1}{3}\left(A_4 - 6A_1^2A_2 + 5A_1^4\right) \triangleq B_2,$$
$$\tilde{u}_3 = \frac{\nu_0^3}{(\nu_0-2)(\nu_0-4)(\nu_0-6)}\left[\frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}\right]^3 = \frac{1}{15}\left(-61A_1^6 + 75A_2A_1^4 - 15A_4A_1^2 + A_6\right) \triangleq B_3.$$

Our objective is to calculate the moment estimators of the four hyper-parameters. Let

$$u = \frac{\sigma_0^2(1+\kappa_0)}{\kappa_0}.$$

The moment equations involving $\tilde{u}_1$, $\tilde{u}_2$, and $\tilde{u}_3$ become

$$\frac{\nu_0}{\nu_0-2}\,u = B_1, \qquad \frac{\nu_0^2}{(\nu_0-2)(\nu_0-4)}\,u^2 = B_2, \qquad \frac{\nu_0^3}{(\nu_0-2)(\nu_0-4)(\nu_0-6)}\,u^3 = B_3.$$

Since there are three equations and only two parameters $\nu_0$ and $u$, for simplicity, we will only use the first two equations, and ignore the third equation. Solving the above first two equations for $\nu_0$ and $u$, we obtain the moment estimators of $\nu_0$ and $u$ as

$$\tilde{\nu}_0 = \frac{2B_1^2 - 4B_2}{B_1^2 - B_2}, \qquad \tilde{u} = \frac{-B_1B_2}{B_1^2 - 2B_2}.$$

Substituting the expressions of $B_1$ and $B_2$, and after some algebra, we obtain the expressions of $\tilde{\nu}_0$ and $\tilde{u}$ in terms of the $A_i$ as

$$\tilde{\nu}_0 = \frac{-\frac{14}{3}A_1^4 + 4A_1^2A_2 + 2A_2^2 - \frac{4}{3}A_4}{-\frac{2}{3}A_1^4 + A_2^2 - \frac{1}{3}A_4}, \qquad \tilde{u} = \frac{-\frac{1}{3}\left(A_2 - A_1^2\right)\left(5A_1^4 - 6A_1^2A_2 + A_4\right)}{-\frac{7}{3}A_1^4 + 2A_1^2A_2 + A_2^2 - \frac{2}{3}A_4}.$$

For simplicity, take $\tilde{\kappa}_0 = 1$. Then the estimators of the hyper-parameters of model (Equation 1) by the moment method become

$$\tilde{\mu}_0 = A_1, \qquad \tilde{\kappa}_0 = 1, \qquad \tilde{\nu}_0 = \frac{-\frac{14}{3}A_1^4 + 4A_1^2A_2 + 2A_2^2 - \frac{4}{3}A_4}{-\frac{2}{3}A_1^4 + A_2^2 - \frac{1}{3}A_4}, \qquad \tilde{\sigma}_0^2 = \frac{\tilde{u}}{2} = \frac{-\frac{1}{6}\left(A_2 - A_1^2\right)\left(5A_1^4 - 6A_1^2A_2 + A_4\right)}{-\frac{7}{3}A_1^4 + 2A_1^2A_2 + A_2^2 - \frac{2}{3}A_4}.$$

The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the moment method are given by Equation (2) with the four hyper-parameters estimated by $(\tilde{\mu}_0, \tilde{\kappa}_0, \tilde{\nu}_0, \tilde{\sigma}_0)$. The proof of the theorem is complete.

The Proof of Theorem 3. The hyper-parameters are $-\infty < \mu_0 < \infty$, $\kappa_0 > 0$, $\nu_0 > 0$, and $\sigma_0^2 > 0$. Let $\gamma_0 = \sigma_0^2$ and $\eta_0 = (\mu_0, \kappa_0, \nu_0, \gamma_0)'$. By Theorem 1, we obtain the likelihood function of $\eta_0$:

$$L(\eta_0 \mid x) = m(x \mid \eta_0) = \frac{(2\pi)^{-\frac{n}{2}}\kappa_0^{\frac{1}{2}}\left(\frac{\nu_0\sigma_0^2}{2}\right)^{\frac{\nu_0}{2}}\Gamma\left(\frac{\nu_n}{2}\right)}{\kappa_n^{\frac{1}{2}}\left(\frac{\nu_n\sigma_n^2}{2}\right)^{\frac{\nu_n}{2}}\Gamma\left(\frac{\nu_0}{2}\right)}.$$

By (A3), the log-likelihood function of $\eta_0$ becomes

$$\begin{aligned} \log L(\eta_0 \mid x) &= -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log\kappa_0 - \frac{1}{2}\log\kappa_n + \frac{\nu_0}{2}\log\frac{\nu_0\sigma_0^2}{2} - \frac{\nu_n}{2}\log\frac{\nu_n\sigma_n^2}{2} + \log\Gamma\left(\frac{\nu_n}{2}\right) - \log\Gamma\left(\frac{\nu_0}{2}\right) \\ &= -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log\kappa_0 - \frac{1}{2}\log(\kappa_0+n) + \frac{\nu_0}{2}\left[\log\nu_0 + \log\gamma_0 - \log 2\right] \\ &\quad - \frac{\nu_0+n}{2}\left\{\log\left[\nu_0\gamma_0 + (n-1)s^2 + \frac{n\kappa_0}{n+\kappa_0}(\bar{x}-\mu_0)^2\right] - \log 2\right\} + \log\Gamma\left(\frac{\nu_0+n}{2}\right) - \log\Gamma\left(\frac{\nu_0}{2}\right). \end{aligned}$$
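As a sketch, this log-likelihood can be transcribed into R directly (the function name loglik is hypothetical):

```r
# Log-likelihood of the hyper-parameters (sketch, transcribing m(x | eta0))
loglik <- function(mu0, kappa0, nu0, gamma0, x) {
  n <- length(x); xbar <- mean(x); s2 <- var(x)
  kn <- kappa0 + n; nun <- nu0 + n
  nun_sn2 <- nu0 * gamma0 + (n - 1) * s2 +
             n * kappa0 / (n + kappa0) * (xbar - mu0)^2
  -n/2 * log(2 * pi) + 0.5 * log(kappa0) - 0.5 * log(kn) +
    nu0/2 * log(nu0 * gamma0 / 2) - nun/2 * log(nun_sn2 / 2) +
    lgamma(nun / 2) - lgamma(nu0 / 2)
}
```

Evaluating this function along increasing $\kappa_0$ or $\nu_0$, with the other hyper-parameters fixed at their profile values, numerically illustrates the conclusion below that the supremum is attained only in the limit.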

First, we calculate the MLE of $\mu_0$ by solving

$$\frac{\partial}{\partial\mu_0}\log L(\eta_0 \mid x) = -\frac{\nu_0+n}{2}\,\frac{\frac{n\kappa_0}{n+\kappa_0}\,2(\bar{x}-\mu_0)(-1)}{\nu_0\gamma_0 + (n-1)s^2 + \frac{n\kappa_0}{n+\kappa_0}(\bar{x}-\mu_0)^2} = 0.$$

It is easy to obtain

$$\hat{\mu}_0 = \bar{x}.$$

The log-likelihood function of $\eta_0$ evaluated at $\mu_0 = \bar{x}$ becomes

$$\begin{aligned} \log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x}} &= -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log\kappa_0 - \frac{1}{2}\log(\kappa_0+n) + \frac{\nu_0}{2}\left[\log\nu_0 + \log\gamma_0 - \log 2\right] \\ &\quad - \frac{\nu_0+n}{2}\left\{\log\left[\nu_0\gamma_0 + (n-1)s^2\right] - \log 2\right\} + \log\Gamma\left(\frac{\nu_0+n}{2}\right) - \log\Gamma\left(\frac{\nu_0}{2}\right). \end{aligned}$$

Second, we calculate the MLE of $\kappa_0$ by observing

$$\frac{\partial}{\partial\kappa_0}\log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x}} = \frac{1}{2}\,\frac{1}{\kappa_0} - \frac{1}{2}\,\frac{1}{\kappa_0+n} = \frac{1}{2}\,\frac{n}{\kappa_0(\kappa_0+n)} > 0,$$

which means that $\log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x}}$ is an increasing function of $\kappa_0$, and thus

$$\hat{\kappa}_0 = \infty.$$

The log-likelihood function of $\eta_0$ evaluated at $\mu_0 = \bar{x}$ and $\kappa_0 = \infty$ becomes

$$\begin{aligned} \log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty} &= -\frac{n}{2}\log(2\pi) + \frac{\nu_0}{2}\left[\log\nu_0 + \log\gamma_0 - \log 2\right] \\ &\quad - \frac{\nu_0+n}{2}\left\{\log\left[\nu_0\gamma_0 + (n-1)s^2\right] - \log 2\right\} + \log\Gamma\left(\frac{\nu_0+n}{2}\right) - \log\Gamma\left(\frac{\nu_0}{2}\right). \end{aligned}$$

Third, we calculate the MLE of $\gamma_0$ by solving

$$\frac{\partial}{\partial\gamma_0}\log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty} = \frac{\nu_0}{2}\,\frac{1}{\gamma_0} - \frac{\nu_0+n}{2}\,\frac{\nu_0}{\nu_0\gamma_0+(n-1)s^2} = \frac{\nu_0\left[(n-1)s^2 - n\gamma_0\right]}{2\gamma_0\left[\nu_0\gamma_0+(n-1)s^2\right]} = 0.$$

It is easy to obtain

$$\hat{\gamma}_0 = \hat{\sigma}_0^2 = \frac{(n-1)s^2}{n}.$$

The log-likelihood function of $\eta_0$ evaluated at $\mu_0 = \bar{x}$, $\kappa_0 = \infty$, and $\gamma_0 = (n-1)s^2/n$ becomes

$$\begin{aligned} \log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty,\,\gamma_0=(n-1)s^2/n} &= -\frac{n}{2}\log(2\pi) + \frac{\nu_0}{2}\left[\log\nu_0 + \log\frac{(n-1)s^2}{n} - \log 2\right] \\ &\quad - \frac{\nu_0+n}{2}\left\{\log\left[\nu_0\frac{(n-1)s^2}{n} + (n-1)s^2\right] - \log 2\right\} + \log\Gamma\left(\frac{\nu_0+n}{2}\right) - \log\Gamma\left(\frac{\nu_0}{2}\right). \end{aligned}$$

It is easy to see that

$$\nu_0\frac{(n-1)s^2}{n} + (n-1)s^2 = (n-1)s^2\left(\frac{\nu_0}{n} + 1\right) = (\nu_0+n)\frac{(n-1)s^2}{n}.$$

Therefore,

$$\begin{aligned} \log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty,\,\gamma_0=(n-1)s^2/n} &= -\frac{n}{2}\log(2\pi) + \frac{\nu_0}{2}\left[\log\nu_0 + \log\frac{(n-1)s^2}{n} - \log 2\right] \\ &\quad - \frac{\nu_0+n}{2}\left[\log(\nu_0+n) + \log\frac{(n-1)s^2}{n} - \log 2\right] + \log\Gamma\left(\frac{\nu_0+n}{2}\right) - \log\Gamma\left(\frac{\nu_0}{2}\right). \end{aligned}$$

Fourth, we calculate the MLE of $\nu_0$ by examining

$$\begin{aligned} f(\nu_0) &= \frac{\partial}{\partial\nu_0}\log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty,\,\gamma_0=(n-1)s^2/n} \\ &= \frac{1}{2}\left[\log\nu_0 + \log\frac{(n-1)s^2}{n} - \log 2\right] + \frac{\nu_0}{2}\,\frac{1}{\nu_0} - \frac{1}{2}\left[\log(\nu_0+n) + \log\frac{(n-1)s^2}{n} - \log 2\right] - \frac{\nu_0+n}{2}\,\frac{1}{\nu_0+n} \\ &\quad + \frac{1}{2}\psi\left(\frac{\nu_0+n}{2}\right) - \frac{1}{2}\psi\left(\frac{\nu_0}{2}\right) \\ &= \frac{1}{2}\left[\log\nu_0 - \log(\nu_0+n) + \psi\left(\frac{\nu_0+n}{2}\right) - \psi\left(\frac{\nu_0}{2}\right)\right] \\ &= \frac{1}{2}\left\{\left[\psi\left(\frac{\nu_0+n}{2}\right) - \log(\nu_0+n)\right] - \left[\psi\left(\frac{\nu_0}{2}\right) - \log\nu_0\right]\right\}, \end{aligned}$$

where

$$\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)} = \frac{d\log\Gamma(z)}{dz}$$

is the digamma function, and $\Gamma(z)$ is the gamma function. In R software (R Core Team 2017), the function digamma(z) calculates $\psi(z)$. A plot of $f(\nu_0)$ on $(0, 10]$ when $n = 10$ is given in Figure A1; from the figure we see that $f(\nu_0)$ is decreasing, and it is positive for all $\nu_0 > 0$.

Figure A1. The graph of $f(\nu_0)$ on $(0, 10]$ when $n = 10$.

We want to prove that $f(\nu_0) > 0$ for all $\nu_0 > 0$. Let

$$g(x) = \psi\left(\frac{x}{2}\right) - \log x.$$

Then

$$f(\nu_0) = \frac{1}{2}\left[g(\nu_0+n) - g(\nu_0)\right] > 0 \iff g(\nu_0+n) > g(\nu_0) \impliedby g'(x) = \frac{1}{2}\psi'\left(\frac{x}{2}\right) - \frac{1}{x} > 0, \ \text{for } x > 0 \iff \psi'\left(\frac{x}{2}\right) > \frac{2}{x}, \ \text{for } x > 0 \iff \psi'(u) > \frac{1}{u}, \ \text{for } u > 0 \quad \left(\text{let } u = \frac{x}{2}\right). \tag{A5}$$

We will use formula (6.4.10) in Abramowitz and Stegun (1970) (p. 260):

$$\psi^{(n)}(z) = (-1)^{n+1}\,n!\sum_{k=0}^\infty (z+k)^{-n-1}, \qquad z \neq 0, -1, -2, \ldots \tag{6.4.10}$$

Letting $n = 1$ in Equation (6.4.10) of Abramowitz and Stegun (1970), we obtain

$$\psi'(z) = \sum_{k=0}^\infty \frac{1}{(z+k)^2},$$

which is larger than

$$\int_0^\infty \frac{1}{(z+k)^2}\,dk = -\frac{1}{z+k}\bigg|_{k=0}^\infty = \frac{1}{z},$$

since $1/(z+k)^2$ is decreasing in $k$, so the left-endpoint sum exceeds the integral; a sketched figure helps to see this. Therefore, Equation (A5) is correct and

$$f(\nu_0) = \frac{\partial}{\partial\nu_0}\log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty,\,\gamma_0=(n-1)s^2/n} > 0,$$

which means that $\log L(\eta_0 \mid x)\big|_{\mu_0=\bar{x},\,\kappa_0=\infty,\,\gamma_0=(n-1)s^2/n}$ is an increasing function of $\nu_0$, and thus
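The curve in Figure A1 is straightforward to reproduce with R's built-in digamma function; a minimal sketch:

```r
# Reproduce Figure A1: f(nu0) = (g(nu0 + n) - g(nu0)) / 2, g(x) = digamma(x/2) - log(x)
g <- function(x) digamma(x / 2) - log(x)
f <- function(nu0, n = 10) (g(nu0 + n) - g(nu0)) / 2
nu0 <- seq(0.01, 10, by = 0.01)
plot(nu0, f(nu0), type = "l",
     xlab = expression(nu[0]), ylab = expression(f(nu[0])))
abline(h = 0, lty = 2)  # f stays above 0: the log-likelihood increases in nu0
```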

$$\hat{\nu}_0 = \infty.$$

The empirical Bayes estimators of the mean and variance parameters of model (Equation 1) by the MLE method are given by Equation (2) with the four hyper-parameters estimated by $(\hat{\mu}_0, \hat{\kappa}_0, \hat{\nu}_0, \hat{\sigma}_0)$. The proof of the theorem is complete.