Bayes estimation for the location parameter of a Student-t density

Jean-François Angers∗†

CRM-2642 February 2000

∗Dép. de mathématiques et de statistique; Université de Montréal; C.P. 6128, Succ. "Centre-ville"; Montréal, Québec, H3C 3J7; [email protected]
†This research has been partially funded by NSERC, Canada

Abstract

Student-t densities play an important role in Bayesian inference. For example, suppose that an estimate of the mean of a normal population with unknown variance is desired; then the marginal posterior density of the mean is often a Student-t density. In this paper, estimation of the location parameter of a Student-t density is considered when its prior is also a Student-t density. It is shown that the posterior mean and variance can be written as ratios of finite sums when the numbers of degrees of freedom of both the likelihood and the prior are odd. When one of them (or both) is even, approximations for the posterior mean and variance are given. The behavior of the posterior mean is also investigated in the presence of outlying observations. When robustness is achieved, second order approximations of the estimator and its posterior expected loss are given.

Mathematics Subject Classification: 62C10, 62F15, 62F35.

Keywords: Robust estimator, Fourier transform, Convolution of Student-t densities.

1 Introduction

Heavy-tailed priors play an important role in Bayesian analysis. They can be viewed as an alternative to noninformative priors, since they lead to estimators which are insensitive to misspecification of the prior parameters, while still allowing the use of prior information when it is available. Because of its heavier tails, the Student-t density is a "robust" alternative to the normal density when large observations are expected. (Here, robustness means that the prior information is ignored when it conflicts with the information contained in the data.) This density is also encountered when the data come from a normal population with unknown variance.

In this paper, the problem of estimating the location parameter of a Student-t density, under squared-error loss, is considered. To obtain an estimator which ignores the prior information when it conflicts with the likelihood information, the prior density proposed in this paper is another Student-t density. However, it has fewer degrees of freedom than the likelihood. Consequently, the prior tails are heavier than those of the likelihood, resulting in an estimator which is insensitive to prior misspecification (cf. O'Hagan, 1979). This problem has been previously studied by Fan and Berger (1990), Angers and Berger (1991), Angers (1992) and Fan and Berger (1992). However, some conditions have to be imposed on the degrees of freedom in order to obtain an analytic expression for the estimator. A statistical motivation of the importance of this problem can be found in Fan and Berger (1990).

In Section 2 of this paper, it is assumed that the degrees of freedom of both the prior and the likelihood are odd. Using Angers (1996a), an alternative form (which is sometimes easier to use (cf. Angers, 1996b)) for the estimator is also proposed in this section. In Section 3, it is shown that the effect of large observations on the proposed estimator is limited. In the last section, using Saleh (1994), an approximation is considered for the case where the number of degrees of freedom of the likelihood function is even.

2 Development of the estimator—odd degrees of freedom

Let us consider the following model:

$$X \mid \theta \sim T_{2k+1}(\theta, \sigma),$$

$$\theta \sim T_{2\kappa+1}(\mu, \tau),$$

where $\sigma$, $\mu$ and $\tau$ are known and both $k$ and $\kappa$ are in $\mathbb{N}$. The notation $T_m(\eta, \nu)$ denotes the Student-t density with $m$ degrees of freedom and location and scale parameters respectively given by $\eta$ and $\nu$, that is,
$$f_m(x \mid \eta, \nu) = \frac{\Gamma([m+1]/2)}{\nu\sqrt{m\pi}\,\Gamma(m/2)}\left(1 + \frac{(x-\eta)^2}{m\nu^2}\right)^{-[m+1]/2}. \qquad (1)$$

Since the parameters $\sigma$, $\mu$ and $\tau$ are assumed to be known, we suppose, without loss of generality, that $\mu = 0$ and $\sigma = 1$. The general case can be obtained by replacing $X$ by $\sigma X + \mu$ and $\theta$ by $\theta + \mu$ in Theorems 2 and 3.
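As a quick sanity check of equation (1) (a sketch, not part of the paper; the helper name and the use of SciPy are my own), the density above is exactly the location-scale Student-t density implemented in `scipy.stats`:

```python
# Sketch: numerical check that equation (1) matches the standard
# location-scale Student-t density as implemented in SciPy.
import numpy as np
from scipy import stats
from scipy.special import gammaln

def student_t_pdf(x, m, eta=0.0, nu=1.0):
    """Student-t density T_m(eta, nu) as written in equation (1)."""
    logc = (gammaln((m + 1) / 2) - gammaln(m / 2)
            - 0.5 * np.log(m * np.pi) - np.log(nu))
    return np.exp(logc) * (1 + (x - eta) ** 2 / (m * nu ** 2)) ** (-(m + 1) / 2)

x = np.linspace(-5, 5, 11)
print(np.allclose(student_t_pdf(x, 5, 1.0, 2.0),
                  stats.t.pdf(x, df=5, loc=1.0, scale=2.0)))  # True
```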

In Angers (1996a), the following theorem is proved.

Theorem 1. If $X \mid \theta \sim g(x - \theta)$ and if $\theta \mid \tau \sim \tau^{-1}h(\theta/\tau)$, then

$$m(x) = \text{marginal density of } X \text{ evaluated at } x = I_0(x), \qquad (2)$$
$$\hat\theta(x) = \text{posterior mean of } \theta = x - i\,\frac{I_1(x)}{I_0(x)}, \qquad (3)$$
$$\rho(x) = \text{posterior variance of } \theta = \left(\frac{I_1(x)}{I_0(x)}\right)^2 - \frac{I_2(x)}{I_0(x)}, \qquad (4)$$

where $i = \sqrt{-1}$, $I_j(x) = \mathcal{F}^{-1}\{\hat h(\tau s)\,\hat g^{(j)}(s);\, x\}$, $\hat h(s)$ denotes the Fourier transform of $h(x)$, $\hat g^{(j)}(s)$ the $j$th derivative of the Fourier transform of $g(x)$, and $\mathcal{F}^{-1}\{\hat f;\, x\}$ represents the inverse Fourier transform of $\hat f$ evaluated at $x$.

Applications of this theorem to several models can be found in Leblanc and Angers (1995) and Angers (1996a, 1996b).
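A minimal numerical sketch of Theorem 1 follows (not from the paper; it assumes the conventions $\hat g(s) = \int g(t)e^{-ist}\,dt$ and $\mathcal{F}^{-1}\{\hat f; x\} = \frac{1}{2\pi}\int \hat f(s)e^{isx}\,ds$). Taking $g$ and $h$ normal, where the posterior mean $\tau^2 x/(1+\tau^2)$ is known in closed form, gives an easy correctness check:

```python
# Sketch: Theorem 1 evaluated by quadrature, checked on the normal-normal
# model where theta_hat(x) = tau^2 x / (1 + tau^2) is known exactly.
import numpy as np
from scipy.integrate import quad

tau = 2.0
ghat  = lambda s: np.exp(-s**2 / 2)        # FT of the N(0,1) likelihood shape
ghat1 = lambda s: -s * np.exp(-s**2 / 2)   # first derivative of ghat
hhat  = lambda s: np.exp(-s**2 / 2)        # FT of the N(0,1) prior shape

def I(j, x):
    """I_j(x) = (1/2pi) * integral of hhat(tau*s) ghat^{(j)}(s) e^{isx} ds."""
    g = ghat if j == 0 else ghat1
    re = quad(lambda s: hhat(tau*s) * g(s) * np.cos(s*x), -20, 20)[0]
    im = quad(lambda s: hhat(tau*s) * g(s) * np.sin(s*x), -20, 20)[0]
    return (re + 1j * im) / (2 * np.pi)

x = 1.7
theta_hat = x - (1j * I(1, x) / I(0, x)).real   # equation (3)
print(theta_hat, tau**2 * x / (1 + tau**2))     # both approx. 1.36
```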

In order to compute equations (2), (3) and (4), the Fourier transform of a Student-t density is needed. It is given, along with its first two derivatives, in the following proposition. Since the proof is mostly technical, it is omitted.

Proposition 1. If $X \sim T_m(0, \sigma)$, then
$$\hat f_m(s) = \frac{(\sqrt{m}\,\sigma|s|)^{m/2}}{2^{[m-2]/2}\,\Gamma(m/2)}\,K_{m/2}(\sqrt{m}\,\sigma|s|),$$
$$\hat f{}'_m(s) = -\sqrt{m}\,\sigma\,\mathrm{sign}(s)\,\frac{(\sqrt{m}\,\sigma|s|)^{m/2}}{2^{[m-2]/2}\,\Gamma(m/2)}\,K_{[m-2]/2}(\sqrt{m}\,\sigma|s|),$$
$$\hat f{}''_m(s) = m\sigma^2\,\hat f_m(s) - \frac{m(m-1)\,\sigma^2\,(\sqrt{m}\,\sigma|s|)^{[m-2]/2}}{2^{[m-2]/2}\,\Gamma(m/2)}\,K_{[m-2]/2}(\sqrt{m}\,\sigma|s|),$$

where $K_{m/2}(\cdot)$ denotes the modified Bessel function of the second kind of order $m/2$.
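The first formula of Proposition 1 is the well-known characteristic function of the Student-t density and is easy to verify numerically (a sketch, not from the paper; it assumes the convention $\hat f(s) = \int f(x)e^{-isx}\,dx$, under which $\hat f$ is real and even):

```python
# Sketch: check fhat_m of Proposition 1 against a direct numerical
# (cosine) Fourier transform of the Student-t density.
import numpy as np
from scipy.integrate import quad
from scipy.special import kv, gamma
from scipy import stats

def fhat_m(s, m, sigma=1.0):
    z = np.sqrt(m) * sigma * abs(s)
    if z == 0:
        return 1.0
    return z ** (m / 2) * kv(m / 2, z) / (2 ** ((m - 2) / 2) * gamma(m / 2))

m, s = 5, 0.7
numeric = quad(lambda x: stats.t.pdf(x, df=m) * np.cos(s * x),
               -np.inf, np.inf)[0]
print(fhat_m(s, m), numeric)   # agree to quadrature accuracy
```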

Note that if $m = 2k+1$ where $k \in \mathbb{N}$, then, using Gradshteyn and Ryzhik (1980, equation (8.468)), we have that
$$K_{m/2}(\sqrt{m}\,\sigma|s|) = K_{k+1/2}(\sqrt{2k+1}\,\sigma|s|) = \frac{\sqrt{\pi}\,e^{-\sqrt{2k+1}\,\sigma|s|}}{(2\sigma\sqrt{2k+1}\,|s|)^{k+1/2}} \sum_{p=0}^{k} \frac{(2k-p)!}{p!\,(k-p)!}\,(2\sigma\sqrt{2k+1}\,|s|)^p. \qquad (5)$$
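Since equation (5) is reconstructed here from a garbled source, a spot check against SciPy's Bessel function is worthwhile (a sketch, not from the paper):

```python
# Sketch: verify the half-integer-order closed form (5), written for a
# generic argument z, against scipy.special.kv.
from math import factorial, pi, exp, sqrt
from scipy.special import kv

def K_half(k, z):
    """K_{k+1/2}(z) as the finite sum in equation (5)."""
    s = sum(factorial(2*k - p) / (factorial(p) * factorial(k - p)) * (2*z)**p
            for p in range(k + 1))
    return sqrt(pi) * exp(-z) * s / (2*z) ** (k + 0.5)

k, z = 3, 1.3
print(K_half(k, z), kv(k + 0.5, z))   # identical up to rounding
```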

To obtain the marginal density of $x$ and the posterior mean and variance of $\theta$, we need to compute $\mathcal{F}^{-1}\{\hat f_{2\kappa+1}(\tau s)\,\hat f^{(j)}_{2k+1}(s);\, x\}$ for $j = 0, 1$ and $2$. Hence, the following two integrals need to be evaluated:
$$A_{k,l}(x) = \int_0^\infty \cos(|x|s)\,s^{k+\kappa-l+1}\,K_{k-l+1/2}(\sqrt{2k+1}\,s)\,K_{\kappa+1/2}(\sqrt{2\kappa+1}\,\tau s)\,ds, \qquad (6)$$
for $l = 0$ and $1$, and
$$B_k(x) = \int_0^\infty \sin(|x|s)\,s^{k+\kappa+1}\,K_{k-1/2}(\sqrt{2k+1}\,s)\,K_{\kappa+1/2}(\sqrt{2\kappa+1}\,\tau s)\,ds. \qquad (7)$$

Using Angers (1997), we can also show the following theorem.

Theorem 2.
$$m_{2k+1}(x) = \frac{(2k+1)^{[2k+1]/4}\,(2\kappa+1)^{[2\kappa+1]/4}\,\tau^{[2\kappa+1]/2}}{2^{k+\kappa-1}\,\pi\,\Gamma(k+1/2)\,\Gamma(\kappa+1/2)}\,A_{k,0}(x),$$
$$\hat\theta_{2k+1}(x) = x - \sqrt{2k+1}\,\mathrm{sign}(x)\,\frac{B_k(x)}{A_{k,0}(x)},$$
$$\rho_{2k+1}(x) = (2k+1)\left[\frac{2k}{\sqrt{2k+1}}\,\frac{A_{k,1}(x)}{A_{k,0}(x)} - \left(1 + \left(\frac{B_k(x)}{A_{k,0}(x)}\right)^2\right)\right].$$

In order to compute equations (6) and (7), we need the following lemma, which can be proven using Gradshteyn and Ryzhik (1980, equations (3.944.5) and (3.944.6)).

Lemma 1.
$$\int_0^\infty s^a \cos(xs)\, e^{-bs}\, ds = \frac{\Gamma(a+1)}{(b^2+x^2)^{[a+1]/2}}\,\cos\!\big([a+1]\tan^{-1}(x/b)\big),$$
$$\int_0^\infty s^a \sin(xs)\, e^{-bs}\, ds = \frac{\Gamma(a+1)}{(b^2+x^2)^{[a+1]/2}}\,\sin\!\big([a+1]\tan^{-1}(x/b)\big).$$
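These are standard tabulated integrals; a one-line quadrature spot check (a sketch, not from the paper) confirms the first identity:

```python
# Sketch: spot-check Lemma 1 (cosine case) by direct quadrature.
import numpy as np
from scipy.integrate import quad
from math import gamma, atan, cos

a, b, x = 2.5, 1.2, 0.8
lhs = quad(lambda s: s**a * np.cos(x*s) * np.exp(-b*s), 0, np.inf)[0]
rhs = gamma(a + 1) / (b**2 + x**2) ** ((a + 1) / 2) * cos((a + 1) * atan(x / b))
print(lhs, rhs)   # equal up to quadrature error
```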

Using Lemma 1 and equation (5), the functions $A_{k,l}(x)$ and $B_k(x)$ can be easily evaluated; they are given in the following theorem.

Theorem 3.
$$A_{k,l}(x) = \frac{\pi}{2^{k-l+\kappa+1}}\,\frac{1}{(2k+1)^{[2(k-l)+1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}} \sum_{p=0}^{k-l}\sum_{q=0}^{\kappa} \frac{(2[k-l]-p)!}{(k-l-p)!}\,\frac{(2\kappa-q)!}{(\kappa-q)!}\,\binom{p+q}{q}\, \frac{2^{p+q}\,(2k+1)^{p/2}\,([2\kappa+1]\tau^2)^{q/2}}{([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2)^{[p+q+1]/2}}\, \cos\!\left([p+q+1]\tan^{-1}\!\left[\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right]\right), \qquad (8)$$

$$B_k(x) = \frac{\pi}{2^{k+\kappa}}\,\frac{1}{(2k+1)^{[2k-1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}} \sum_{p=0}^{k-1}\sum_{q=0}^{\kappa} \frac{(2[k-1]-p)!}{(k-1-p)!}\,\frac{(2\kappa-q)!}{(\kappa-q)!}\,(p+q+1)\binom{p+q}{q}\, \frac{2^{p+q}\,(2k+1)^{p/2}\,([2\kappa+1]\tau^2)^{q/2}}{([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2)^{[p+q+2]/2}}\, \sin\!\left([p+q+2]\tan^{-1}\!\left[\frac{|x|}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right]\right). \qquad (9)$$
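Because the constants in (8) and (9) are easy to garble in transcription, the following sketch (not from the paper; parameter values are illustrative) implements the finite sums as reconstructed here and cross-checks the resulting Theorem 2 estimator against direct quadrature of the posterior mean:

```python
# Sketch: Theorems 2-3 as finite sums, cross-checked by quadrature.
import numpy as np
from scipy.integrate import quad
from scipy import stats
from math import factorial, comb, sqrt, atan, pi, cos, sin

def A(k, l, x, kappa, tau):
    b = sqrt(2*k + 1) + tau * sqrt(2*kappa + 1)
    pref = pi / (2 ** (k - l + kappa + 1)
                 * (2*k + 1) ** ((2*(k - l) + 1) / 4)
                 * ((2*kappa + 1) * tau**2) ** ((2*kappa + 1) / 4))
    tot = 0.0
    for p in range(k - l + 1):
        for q in range(kappa + 1):
            coef = (factorial(2*(k - l) - p) / factorial(k - l - p)
                    * factorial(2*kappa - q) / factorial(kappa - q)
                    * comb(p + q, q) * 2 ** (p + q)
                    * (2*k + 1) ** (p / 2)
                    * ((2*kappa + 1) * tau**2) ** (q / 2))
            tot += (coef * cos((p + q + 1) * atan(abs(x) / b))
                    / (b**2 + x**2) ** ((p + q + 1) / 2))
    return pref * tot

def B(k, x, kappa, tau):
    b = sqrt(2*k + 1) + tau * sqrt(2*kappa + 1)
    pref = pi / (2 ** (k + kappa)
                 * (2*k + 1) ** ((2*k - 1) / 4)
                 * ((2*kappa + 1) * tau**2) ** ((2*kappa + 1) / 4))
    tot = 0.0
    for p in range(k):          # p = 0, ..., k-1
        for q in range(kappa + 1):
            coef = (factorial(2*(k - 1) - p) / factorial(k - 1 - p)
                    * factorial(2*kappa - q) / factorial(kappa - q)
                    * (p + q + 1) * comb(p + q, q) * 2 ** (p + q)
                    * (2*k + 1) ** (p / 2)
                    * ((2*kappa + 1) * tau**2) ** (q / 2))
            tot += (coef * sin((p + q + 2) * atan(abs(x) / b))
                    / (b**2 + x**2) ** ((p + q + 2) / 2))
    return pref * tot

k, kappa, tau, x = 2, 0, 1.5, 4.0
theta_closed = x - sqrt(2*k + 1) * np.sign(x) * B(k, x, kappa, tau) / A(k, 0, x, kappa, tau)

# Direct quadrature for E[theta | x] under X|theta ~ T_{2k+1}(theta, 1),
# theta ~ T_{2kappa+1}(0, tau).
post = lambda t: stats.t.pdf(x - t, df=2*k + 1) * stats.t.pdf(t, df=2*kappa + 1, scale=tau)
num = quad(lambda t: t * post(t), -np.inf, np.inf)[0]
den = quad(post, -np.inf, np.inf)[0]
print(theta_closed, num / den)   # should agree if (8)-(9) are reconstructed correctly
```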

Using Theorems 2 and 3, the posterior mean and the posterior variance can be computed using only ratios of finite sums. In Section 4, the case where the likelihood function is a Student-t density with an even number of degrees of freedom is considered. In this situation, the posterior quantities cannot be written using finite sums, although they can be expressed as ratios of infinite series (cf. Angers, 1997). However, using an approximation for the Student-t density (cf. Saleh, 1994), $\hat\theta_{2k}(x)$ and $\rho_{2k}(x)$ can be approximated accurately. Before doing so, we first discuss two limiting cases, that is, when $|x|$ is large and when $\tau \to \infty$.

3 Special cases

The main advantage of using a heavy-tailed prior is that the resulting Bayes estimator, under squared-error loss, is insensitive to the choice of the prior when there is a conflict between the prior and the likelihood information. This situation is considered in the next subsection.

3.1 Behavior of $\hat\theta_{2k+1}(x)$ for large $|x|$

In order to study the behavior of $\hat\theta_{2k+1}(x)$ for large values of $|x|$, it should first be noted that
$$\tan^{-1}\!\left(\frac{x}{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}\right) = \cos^{-1}\!\left(\frac{\sqrt{2k+1}+\tau\sqrt{2\kappa+1}}{\sqrt{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}}\right) = \sin^{-1}\!\left(\frac{x}{\sqrt{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}}\right).$$

Using these last equalities in equations (8) and (9), the following theorem can be proven.

Theorem 4.
$$A_{k,l}(x) = \frac{\pi}{2^{k-l+\kappa+1}}\,\frac{1}{(2k+1)^{[2(k-l)+1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}}\left[\frac{c_l}{([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2)^{2}} + O(|x|^{-6})\right],$$
$$B_k(x) = \frac{\pi}{2^{k+\kappa}}\,\frac{1}{(2k+1)^{[2k-1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}}\left[\frac{4c_1\,|x|}{([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2)^{3}} + O(|x|^{-6})\right],$$
where

$$c_l = \frac{(2[k-l])!\,(2\kappa)!}{(k-l)!\,\kappa!}\,\big[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}\big]\bigg\{2\big[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}\big] - 3\bigg(\frac{2[\kappa-1]}{2\kappa-1}\,(2\kappa+1)\tau^2\,\mathcal{I}_2(\kappa) + \tau\sqrt{2k+1}\,\sqrt{2\kappa+1}\,\mathcal{I}_1(\kappa)\,\mathcal{I}_1(k-l) + 2\,\frac{2[k-l-1]}{2[k-l]-1}\,(2k+1)\,\mathcal{I}_2(k-l)\bigg)\bigg\},$$
$$\mathcal{I}_a(b) = \begin{cases} 1 & \text{if } b \in \{a, a+1, \ldots\}, \\ 0 & \text{otherwise.} \end{cases}$$

Using this theorem, it can be shown that if $|x| \to \infty$, then

$$\hat\theta_{2k+1}(x) = \left(1 - \frac{8c_1}{c_0}\,\frac{2k+1}{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}\right)x + O(|x|^{-2}),$$
$$\rho_{2k+1}(x) = (2k+1)\left(4k\,\frac{c_1}{c_0} - 1\right) + O(|x|^{-2}).$$

Note that, as expected, $\hat\theta_{2k+1}(x)$ collapses to $x$ when a conflict occurs between the prior and the likelihood information.
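This collapse is easy to see numerically, even without the closed forms (a sketch, not from the paper; degrees of freedom and grid are illustrative):

```python
# Sketch: as |x| grows, the posterior mean under the heavier-tailed prior
# approaches x, i.e. the prior is ignored under prior-likelihood conflict.
import numpy as np
from scipy.integrate import quad
from scipy import stats

k, kappa, tau = 2, 0, 1.0    # T_5 likelihood, Cauchy (T_1) prior

def theta_hat(x):
    post = lambda t: stats.t.pdf(x - t, df=2*k + 1) * stats.t.pdf(t, df=2*kappa + 1, scale=tau)
    num = quad(lambda t: t * post(t), -60, 80, points=[0.0, x], limit=200)[0]
    den = quad(post, -60, 80, points=[0.0, x], limit=200)[0]
    return num / den

for x in (2.0, 5.0, 10.0, 20.0):
    print(x, theta_hat(x))   # x - theta_hat(x) shrinks as |x| increases
```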

3.2 Behavior of $\hat\theta_{2k+1}(x)$ for large $\tau$

If the prior scale $\tau$ is large, the resulting Bayes estimator should be close to the one obtained using a uniform prior on $\theta$ (i.e., $\pi(\theta) \equiv 1$). In this subsection, the behavior of $\hat\theta_{2k+1}(x)$ and $\rho_{2k+1}(x)$ is considered when $\tau \to \infty$. If $\tau$ is large, $\tan^{-1}\big(x/[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]\big) = x/[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}] + O(\tau^{-3})$. Consequently,

" x #! cos [p + q + 1] tan−1 √ √ = 1 + O(τ −2), (10) 2k + 1 + τ 2κ + 1 " x #! (p + q + 2)x sin [p + q + 2] tan−1 √ √ = √ √ 2k + 1 + τ 2κ + 1 2k + 1 + τ 2κ + 1 + O(τ −3). (11)

Substituting equations (10) and (11) in equations (8) and (9), we obtain the following theorem.

Theorem 5.

$$A_{k,l}(x) = \frac{\pi}{2^{k-l+\kappa+1}}\,\frac{1}{(2k+1)^{[2(k-l)+1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}}\,\frac{(2[k-l])!}{(k-l)!}\left[\frac{S_0}{\sqrt{[\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2}} + O(\tau^{-2})\right],$$
$$B_k(x) = \frac{\pi}{2^{k+\kappa}}\,\frac{1}{(2k+1)^{[2k-1]/4}\,([2\kappa+1]\tau^2)^{[2\kappa+1]/4}}\,\frac{(2[k-1])!}{(k-1)!}\left[\frac{S_1\,x}{\tau\sqrt{2\kappa+1}\,([\sqrt{2k+1}+\tau\sqrt{2\kappa+1}]^2+x^2)} + O(\tau^{-4})\right],$$
where
$$S_0 = \sum_{q=0}^{\kappa} \frac{(2\kappa-q)!}{(\kappa-q)!}\,2^q, \qquad S_1 = \sum_{q=0}^{\kappa} \frac{(2\kappa-q)!}{(\kappa-q)!}\,(q+1)(q+2)\,2^q.$$

Table 1: Maximum error for k = 1, 2, ..., 15

  k                   1           2           3           4           5
  η*                  2.737       2.400       2.299       2.253       2.226
  max_η |e_k(η)|      5.2 × 10^-6   1.2 × 10^-7   9.6 × 10^-9   1.4 × 10^-9   2.9 × 10^-10
  η**                 3.049       2.515       2.391       2.330       2.294
  max_η |η e_k(η)|    1.5 × 10^-5   3.0 × 10^-7   2.2 × 10^-8   3.2 × 10^-9   6.6 × 10^-10

  k                   6           7           8           9           10
  η*                  2.208       2.195       2.185       2.177       2.172
  max_η |e_k(η)|      8.0 × 10^-11  2.6 × 10^-11  9.9 × 10^-12  4.1 × 10^-12  1.8 × 10^-12
  η**                 2.272       2.256       2.244       2.234       2.227
  max_η |η e_k(η)|    1.8 × 10^-10  5.8 × 10^-11  2.2 × 10^-11  9.1 × 10^-12  4.1 × 10^-12

  k                   11          12          13          14          15
  η*                  2.167       2.163       2.159       2.156       2.154
  max_η |e_k(η)|      9.2 × 10^-13  4.8 × 10^-13  2.6 × 10^-13  1.5 × 10^-13  8.7 × 10^-14
  η**                 2.221       2.216       2.212       2.208       2.205
  max_η |η e_k(η)|    2.0 × 10^-12  1.0 × 10^-12  5.7 × 10^-13  3.2 × 10^-13  1.9 × 10^-13

Using the previous theorem, it can be shown that

$$\hat\theta_{2k+1}(x) = \left(1 - \frac{2k+1}{2k-1}\,\frac{S_1}{S_0}\,\frac{1}{\tau\sqrt{2\kappa+1}\,\sqrt{(\sqrt{2k+1}+\tau\sqrt{2\kappa+1})^2+x^2}}\right)x + O(\tau^{-3}),$$
$$\rho_{2k+1}(x) = \frac{2k+1}{2k-1} + O(\tau^{-1}).$$
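As a numerical illustration of this limit (a sketch, not from the paper; quadrature in place of the closed forms), the posterior variance indeed tends to $(2k+1)/(2k-1)$, the posterior variance of the $T_{2k+1}$ likelihood under the flat prior:

```python
# Sketch: for large tau the posterior variance approaches (2k+1)/(2k-1),
# matching the generalized Bayes answer under pi(theta) = 1.
import numpy as np
from scipy.integrate import quad
from scipy import stats

k, kappa, x = 2, 0, 1.0
for tau in (5.0, 50.0, 500.0):
    post = lambda t, tau=tau: stats.t.pdf(x - t, df=2*k + 1) * stats.t.pdf(t, df=2*kappa + 1, scale=tau)
    m0 = quad(post, -np.inf, np.inf)[0]
    mean = quad(lambda t: t * post(t), -np.inf, np.inf)[0] / m0
    var = quad(lambda t: (t - mean)**2 * post(t), -np.inf, np.inf)[0] / m0
    print(tau, var)   # tends to (2k+1)/(2k-1) = 5/3 here
```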

Hence, $\hat\theta_{2k+1}(x)$ has the desired behavior. In the next section, approximations for $\hat\theta_{2k}(x)$ and $\rho_{2k}(x)$ are given.

4 Even number of degrees of freedom for the likelihood function

In Saleh (1994), it is shown that

$$f_{2k}(x) = \frac{2k-1}{4k}\,f_{2k-1}(x) + \frac{2k+1}{4k}\,f_{2k+1}(x) + e_k(x), \qquad (12)$$
where $f_m(x)$ is given by equation (1) with $\eta = 0$ and $\nu = 1$, and the term $e_k(x)$ represents an error term. (Note that other approximations for the Student-t density are discussed in Saleh (1994).)

Using Mathematica, the maximum of $|e_k(\eta)|$ is approximately given by $\max_{\eta\in\mathbb{R}} |e_k(\eta)| \approx 1.158 \times 10^{-5}/k^{6.782}$. In Table 1, we computed $\max_\eta |e_k(\eta)|$ for $k = 1$ to $15$, along with the values of $\eta$, denoted by $\eta^*$, where the maximum occurs. It can be seen that the maximum error becomes negligible as $k$ increases.
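The error term in (12) is simple to reproduce on a grid (a sketch, not from the paper; the grid is illustrative), and the result can be compared against the first column of Table 1:

```python
# Sketch: measure the error e_k of the two-point mixture approximation (12)
# on a grid, for comparison with Table 1.
import numpy as np
from scipy import stats

k = 1
eta = np.linspace(0, 10, 2001)
approx = ((2*k - 1) / (4*k)) * stats.t.pdf(eta, df=2*k - 1) \
       + ((2*k + 1) / (4*k)) * stats.t.pdf(eta, df=2*k + 1)
err = stats.t.pdf(eta, df=2*k) - approx
i = np.argmax(np.abs(err))
print(eta[i], abs(err[i]))   # approx. 2.74 and 5.2e-06, as in Table 1
```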

Using equation (12), the Fourier transform of $f_{2k}(x)$ given in Proposition 1 can also be approximated by
$$\hat f_{2k}(s) \approx \frac{2k-1}{4k}\,\hat f_{2k-1}(s) + \frac{2k+1}{4k}\,\hat f_{2k+1}(s) = \frac{2k-1}{4k}\,\frac{(\sqrt{2k-1}\,|s|)^{k-1/2}}{2^{k-3/2}\,\Gamma(k-1/2)}\,K_{k-1/2}(\sqrt{2k-1}\,|s|) + \frac{2k+1}{4k}\,\frac{(\sqrt{2k+1}\,|s|)^{k+1/2}}{2^{k-1/2}\,\Gamma(k+1/2)}\,K_{k+1/2}(\sqrt{2k+1}\,|s|).$$

Using this approximation, the following theorem can be proved.

Theorem 6. If $X \mid \theta \sim T_{2k}(\theta, 1)$ and $\theta \sim T_{2\kappa+1}(0, \tau)$, then

$$\hat\theta_{2k}(x) \approx w_k(x)\,\hat\theta_{2k-1}(x) + (1 - w_k(x))\,\hat\theta_{2k+1}(x), \qquad (13)$$
$$\rho_{2k}(x) \approx w_k(x)\,\rho_{2k-1}(x) + (1 - w_k(x))\,\rho_{2k+1}(x) + w_k(x)\,(1 - w_k(x))\big(\hat\theta_{2k+1}(x) - \hat\theta_{2k-1}(x)\big)^2,$$
where
$$w_k(x) = \frac{(2k-1)^{[2k+7]/4}\,A_{k-1,0}(x)}{(2k-1)^{[2k+7]/4}\,A_{k-1,0}(x) + (2k+1)^{[2k+5]/4}\,A_{k,0}(x)}.$$

In order to see if the approximation given in Theorem 6 is accurate, let $\tilde\theta_{2k}(x)$ denote the exact Bayes estimator of $\theta$ (cf. Angers, 1997) and $\hat\theta_{2k}(x)$ its approximation using equation (13). Then, it can be shown that
$$|\tilde\theta_{2k}(x) - \hat\theta_{2k}(x)| = \frac{|E_1(x) - E_0(x)\,(x - \hat\theta_{2k}(x))|}{|m(x) - E_0(x)|},$$
where $E_i(x) = \int_{-\infty}^{\infty} \eta^i\, e_k(\eta)\,\pi(\eta - x)\,d\eta$ for $i = 0, 1$, and $m(x)$ represents the marginal density of $x$ using the approximation given by equation (12). Using Mathematica, we also tabulated in Table 1 the values of $\max_{\eta\in\mathbb{R}} |\eta\, e_k(\eta)|$ for $k = 1$ to $15$, along with the values of $\eta$, denoted by $\eta^{**}$, for which the maximum occurs.

Fitting a log-log regression model, we obtain that
$$\max_{\eta\in\mathbb{R}} |\eta\, e_k(\eta)| \approx \frac{3.058 \times 10^{-5}}{k^{6.862}}.$$
The approximation error (in absolute value), that is $|\tilde\theta_{2k}(x) - \hat\theta_{2k}(x)|$, is also plotted in Figure 1 for $2k = 4$, $6$ and $10$, with $\kappa = 0$ and $0 \le x \le 10$. (Note that $\tilde\theta_{2k}(x)$ has been computed using equation (3), the $I_j(x)$ integrals being evaluated by Monte Carlo integration.) The marginal density ($10\,m_3(x)$) is also plotted in Figure 1 to indicate which values of $x$ have a large likelihood. From Figure 1, it can be seen that the maximum error occurs around $x = 5$ and that it decreases as $k$ increases. For small values of $x$ (values for which the marginal of $X$ is maximal), the error does not depend much on $k$; for large values of $x$, the approximation is better for larger $k$.
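Theorem 6 is, at heart, a mixture-of-posteriors construction: under the mixture likelihood (12), the posterior is a mixture of the two odd-degree posteriors with weights proportional to the component marginals. The following sketch (not from the paper; it computes the weights from the component marginals by quadrature rather than from Theorem 3) checks equation (13) against the exact even-degree posterior mean:

```python
# Sketch: Theorem 6 as a mixture of posteriors, checked by quadrature.
import numpy as np
from scipy.integrate import quad
from scipy import stats

k, kappa, tau, x = 2, 0, 1.5, 3.0
prior = lambda t: stats.t.pdf(t, df=2*kappa + 1, scale=tau)

def marg_and_mean(df):
    """Marginal density m(x) and posterior mean for a T_df likelihood."""
    post = lambda t: stats.t.pdf(x - t, df=df) * prior(t)
    m = quad(post, -np.inf, np.inf)[0]
    return m, quad(lambda t: t * post(t), -np.inf, np.inf)[0] / m

m_lo, th_lo = marg_and_mean(2*k - 1)
m_hi, th_hi = marg_and_mean(2*k + 1)
w = ((2*k - 1) / (4*k) * m_lo
     / ((2*k - 1) / (4*k) * m_lo + (2*k + 1) / (4*k) * m_hi))
approx = w * th_lo + (1 - w) * th_hi    # equation (13)
_, exact = marg_and_mean(2*k)           # exact T_{2k} likelihood
print(approx, exact)                    # discrepancy = error plotted in Figure 1
```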

5 Conclusion

In this paper, we provide an exact (and closed form) solution for the estimation of a Student-t location parameter when the prior is also a Student-t density and both numbers of degrees of freedom are odd. This estimator is also shown to be insensitive to misspecification of the prior location and scale parameters. It also corresponds to the generalized Bayes estimator (based on $\pi(\theta) \equiv 1$) when $\tau$ is large. When the number of degrees of freedom of the likelihood function is even, the previous estimator does not apply; however, based on this estimator, an approximation of $\hat\theta_{2k}(x)$ is proposed. This approach can be easily generalized to the cases where either the prior or the likelihood, or both, are Student-t densities with an even number of degrees of freedom.


Figure 1: Approximation error for $2k = 4$ (top curve), $2k = 6$ (middle curve) and $2k = 10$ (bottom curve), with $\kappa = 0$.

References

[1] Angers, J.-F. (1992). Use of Student-t prior for the estimation of normal means: A computational approach. Bayesian Statistics IV (J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, eds.), Oxford University Press, 567–575.

[2] Angers, J.-F. (1996a). Fourier transform and Bayes estimator of a location parameter. Statistics & Probability Letters 29, 353–359.

[3] Angers, J.-F. (1996b). Protection against outliers using a symmetric stable law prior. In IMS Lecture Notes - Monograph Series, 29, 273–283.

[4] Angers, J.-F. (1997). Bayesian estimator of the location parameter of a Student-t density. Technical Report 97-07, University of Nottingham, Nottingham University Statistics Group.

[5] Angers, J.-F. and J. O. Berger (1991). Robust hierarchical Bayes estimation of exchangeable means. The Canadian Journal of Statistics 19, 39–56.

[6] Fan, T. H. and J. O. Berger (1990). Exact convolution of t distributions, with applications to Bayesian inference for a normal mean with t prior distributions. Journal of Statistical Computation and Simulation 36, 209–228.

[7] Fan, T. H. and J. O. Berger (1992). Behaviour of the posterior distribution and inferences for a normal mean with t prior distributions. Statistics & Decisions 10, 99–120.

[8] Gradshteyn, I. S. and I. M. Ryzhik (1980). Table of Integrals, Series and Products. New York: Academic Press.

[9] Leblanc, A. and J.-F. Angers (1995). Fast Fourier transforms and Bayesian estimation of location parameters. Technical Report DMS-380, Université de Montréal, Département de mathématiques et de statistique.

[10] O’Hagan, A. (1979). On outlier rejection phenomena in Bayes inference. Journal of the Royal Statistical Society Ser. B 41, 358–367.

[11] Saleh, A. A. (1994). Approximating the characteristic function of the Student's t distribution. The Egyptian Statistical Journal 39(2), 177–195.
