arXiv:1409.6276v1 [math.ST] 1 Sep 2014 ocnrto nqaiisfrUiait Distribution Univariate for Inequalities Concentration 3 ieiodRtoMethod Ratio Likelihood 2 Introduction 1 Contents og,L 01,UA mi:[email protected] Electrical Email: of USA; Department 70813, the LA and Rouge, USA, 70803, LA Rouge, Baton ∗ .2LpaeDsrbto ...... Distribution . . . . . Laplace ...... Distribution 3.12 ...... Binomial . . . Negative ...... Lagrangian ...... 3.11 . . . . Distribution ...... Logarithmic ...... Lagrangian . . . Distribution . . . . . 3.10 . Gaussian . . . . . Inverse ...... Distribution . . . 3.9 . Gamma . . . . . Inverse ...... 3.8 . Distribution . . . Gumbel ...... 3.7 . . Distribution . . . Geeta ...... 3.6 Distribution . . . Consul ...... 3.5 Distribution . Borel ...... 3.4 . Distribution . Beta-Prime . Distribution . Binomial 3.3 Negative . Beta . . 3.2 Distribution Beta 3.1 . eea rnil ...... Distributions . Parameterized . of Construction . . 2.2 Principle General 2.1 h uhri fflce ihteDprmn fEetia Engi Electrical of Department the with afflicted is author The ocnrto nqaiisfo ieiodRatio Likelihood from Inequalities Concentration etainieulte o ievreyo nvraean univariate of variety wide a for inequalities centration nqaiisfrvrosdsrbtosaedvlpdwith developed functions. are distributions various for inequalities .. egtFnto ...... Restriction Parameter . . 2.2.2 Function Weight 2.2.1 eepoeteapiain forpeiul salse l established previously our of applications the explore We etme 2014 September Method ijaChen Xinjia Abstract 1 ern n optrSinea oiin tt University State Louisiana at Science Computer and neering niern,Suhr nvriyadAMClee Baton College, A&M and University Southern Engineering, utvraedsrbtos e concentration New distributions. multivariate d u h dao iiiigmmn generating moment minimizing of idea the out ∗ klho-ai ehdfrdrvn con- deriving for method ikelihood-ratio s 5 ...... 10 . 10 ...... 10 ...... 9 ...... 7 ...... 11 ...... 8 ...... 9 8 . . . 8 ...... 7 ...... 9 ...... 4 ...... 6 . . . . . 5 . . 7 4 3 , 3.13 ...... 11 3.14LognormalDistribution ...... 11 3.15NakagamiDistribution...... 12 3.16ParetoDistribution...... 12 3.17 Power-LawDistribution ...... 12 3.18 Stirling Distribution ...... 13 3.19 Snedecor’s F-Distribution ...... 13 3.20 Student’s t-Distribution ...... 13 3.21 Truncated ...... 14 3.22 Uniform Distribution ...... 14 3.23 ...... 14

4 Concentration Inequalities for Multivariate Distributions 15 4.1 Dirichlet-Compound ...... 15 4.2 Inverse Matrix ...... 15 4.3 Multivariate ...... 16 4.4 Multivariate ...... 16

5 Conclusion 17

A Proofs of Univariate Inequalities 17 A.1 ProofofTheorem2 ...... 17 A.2 ProofofTheorem3 ...... 18 A.3 ProofofTheorem4 ...... 20 A.4 ProofofTheorem5...... 21 A.5 ProofofTheorem6...... 22 A.6 ProofofTheorem7...... 23 A.7 ProofofTheorem8...... 24 A.8 ProofofTheorem9...... 24 A.9 ProofofTheorem10 ...... 26 A.10ProofofTheorem11 ...... 27 A.11ProofofTheorem12 ...... 28 A.12ProofofTheorem13 ...... 29 A.13ProofofTheorem14 ...... 29 A.14ProofofTheorem15 ...... 30 A.15ProofofTheorem16 ...... 31 A.16ProofofTheorem17 ...... 32 A.17ProofofTheorem18 ...... 33 A.18ProofofTheorem19 ...... 34 A.19ProofofTheorem20 ...... 34 A.20ProofofTheorem21 ...... 35 A.21ProofofTheorem22 ...... 36 A.22ProofofTheorem23 ...... 36 A.23ProofofTheorem24 ...... 37 A.24ProofofTheorem25 ...... 38

2 B Proofs of Multivariate Inequalities 39 B.1 ProofofTheorem26 ...... 39 B.2 ProofofTheorem27 ...... 40 B.3 ProofofTheorem28 ...... 41 B.4 ProofofTheorem29 ...... 42

1 Introduction

Bounds for of random events play important roles in many areas of engineering and sciences. Formally, let E be an event defined in space (Ω, Pr, F ), where Ω is the sample space, Pr denotes the probability measure, and F is the σ-algebra. A frequent problem is to obtain simple bounds as tight as possible for Pr E . In general, the event E can be expressed in terms of a matrix-valued { } X . In particular, X can be a random vector or scalar. Clearly, the event E can be represented as X E , where E is a certain set of deterministic matrices. In , a { ∈ } conventional approach for deriving inequalities for Pr E is to bound the indicator function I X E by a { } { ∈ } family of random variables having finite expectation and minimize the expectation. The central idea of this approach is to seek a family of bounding functions w(X , ϑ) of X , parameterized by ϑ Θ, such that ∈ I X E w(X , ϑ) for all ϑ Θ. (1) { ∈ } ≤ ∈ I X Here, the notion of inequality (1) is that the inequality X (ω) E w( (ω), ϑ) holds for every ω Ω. As { ∈ } ≤ ∈ a consequence of the monotonicity of the mathematical expectation E[.],

Pr E = E[I X E ] E[w(X , ϑ)] for all ϑ Θ. (2) { } { ∈ } ≤ ∈ Minimizing the upper bound in (2) with respect to ϑ Θ yields ∈ Pr E inf E[w(X , ϑ)]. (3) { }≤ ϑ Θ ∈ Classical concentration inequalities such as Chebyshev inequality and Chernoff bounds [2] can be derived by this approach with various bounding functions w(X , ϑ), where X is a scalar random variable. We call this technique of deriving probabilistic inequalities as the mathematical expectation (ME) method, in view of the crucial role played by the mathematical expectation of bounding functions. For the ME method to be successful, the mathematical expectation E[w(X , ϑ)] of the family of bounding functions w(X , ϑ), ϑ ∈ Θ must be convenient for evaluation and minimization. The ME method is a very general approach. However, it has two drawbacks. First, in some situations, the mathematical expectation E[w(X , ϑ)] may be intractable. Second, the ME method may not fully exploit the information of the underlying distribution, since the mathematical expectation is only a quantity of summary for the distribution. Recently, we have proposed in [3, 4, 5] a more general approach for deriving probabilistic inequalities, aiming at overcoming the drawbacks of the ME method. Let f(.) denote the probability density function (pdf) or probability mass function (pmf) of X . The primary idea of the proposed approach is to seek a family of pdf or pmf g(., ϑ), parameterized by ϑ Θ, and a deterministic function Λ(ϑ) of ϑ Θ such ∈ ∈ that for all ϑ Θ, the indicator function I X E is bounded from above by the product of Λ(ϑ) and the ∈ X { ∈ } likelihood ratio g( ,ϑ) . Then, the probability Pr X E is bounded from above by the infimum of Λ(ϑ) f(X ) { ∈ } with respect to ϑ Θ. Due to the central role played by the likelihood ratio, this technique of deriving ∈ probabilistic inequalities is referred to as the likelihood ratio (LR) method. It has been demonstrated in [4] that the ME method is actually a special technique of the LR method.

3 In this paper, we shall apply the LR method to investigate the concentration phenomenon of random variables. Our goal is to derive simple and tight concentration inequalities for various distributions. The remainder of the paper is organized as follows. In Section 2, we introduce the fundamentals of the LR method. In Section 3, we apply the LR method to the development of concentration inequalities for univariate distributions. In Section 4, we apply the LR method to establish concentration inequalities for multivariate distributions. Section 5 is the conclusion. Most proofs are given in Appendices.

Throughout this paper, we shall use the following notations. Let IE denote the indicator function such I I t that E = 1 if E is true and E = 0 otherwise. We use the notation k to denote a generalized combinatoric number in the sense that  t k (t ℓ + 1) Γ(t + 1) t = ℓ=1 − = , =1, k k! Γ(k +1) Γ(t k + 1) 0   Q −   where t is a real number and k is a non-negative integer. We use Xn to denote the average of random Pn X variables X , ,X , that is, X = i=1 i . The notation denotes the transpose of a matrix. The trace 1 ··· n n n ⊤ of a matrix is denoted by tr. We use pdf and pmf to represent probability density function and probability mass function, respectively. The other notations will be made clear as we proceed.

2 Likelihood Ratio Method

In this section, we shall introduce the LR method for deriving probabilistic inequalities.

2.1 General Principle

Let E be an event which can be expressed in terms of matrix-valued random variable X , where X is defined on the sample space Ω and σ-algebra F such that the true probability measure is one of two measures Pr and Pϑ. Here, the measure Pr is determined by pdf or pmf f(.). The measure Pϑ is determined by pdf or pmf g(., ϑ), which is parameterized by ϑ Θ. The subscript in Pϑ is used to indicate the dependence on ∈ the parameter ϑ. Clearly, there exists a set, E , of deterministic matrices of the same size as X such that E = X E . The LR method for obtaining an upper bound for the probability Pr E is based on the { ∈ } { } following general result.

Theorem 1 Assume that there exists a function Λ(ϑ) of ϑ Θ such that ∈

f(X ) I X E Λ(ϑ) g(X , ϑ) for all ϑ Θ. (4) { ∈ } ≤ ∈ Then,

Pr E inf Λ(ϑ) Pϑ E inf Λ(ϑ). (5) { }≤ ϑ Θ { }≤ ϑ Θ ∈ ∈ In particular, if the infimum of Λ(ϑ) is attained at ϑ∗ Θ, then ∈

Pr E P ∗ E Λ(ϑ∗). (6) { }≤ ϑ { } X I X The notion of the inequality in (4) is that f( (ω)) X (ω) E Λ(ϑ) g( (ω), ϑ) for every ω Ω. { ∈ } ≤ ∈ The function Λ(ϑ) in (4) is referred to as likelihood-ratio bounding function. Theorem 1 asserts that the probability of event E is no greater than the likelihood ratio bounding function.

4 2.2 Construction of Parameterized Distributions

In the sequel, we shall introduce two approaches for constructing parameterized distributions g(., ϑ) which are essential for the application of the LR method.

2.2.1 Weight Function

A natural approach to construct parameterized distribution g(., ϑ) is to modify the pdf or pmf f(.) by multiplying it with a parameterized function and performing a normalization. Specifically, let w(., ϑ) be a non-negative function with parameter ϑ Θ such that E[w(X , θ)] < for all ϑ Θ, where the ∈ ∞ ∈ expectation is taken under the probability measure Pr determined by f(.). Define a family of distributions as w(X , ϑ) f(X ) g(X , ϑ)= E[w(X , ϑ)] for ϑ Θ and X in the range of X . In view of its role in the modification of f(.) as g(., ϑ), the function ∈ w(., ϑ) is called a weight function. Note that

f(X ) w(X , ϑ)= E[w(X , ϑ)] g(X , ϑ) for all ϑ Θ. (7) ∈ For simplicity, we choose the weight function such that the condition (1) is satisfied. Combining (1) and (7) yields f(X ) I X E f(X ) w(X , ϑ)= E[w(X , ϑ)] g(X , ϑ) for all ϑ Θ. { ∈ } ≤ ∈ Thus, the likelihood ratio bounding function can be taken as

Λ(ϑ)= E[w(X , ϑ)] for ϑ Θ. ∈ It follows from Theorem 1 that

Pr E inf Λ(ϑ) Pϑ E inf Λ(ϑ). { }≤ ϑ Θ { }≤ ϑ Θ ∈ ∈ Thus, we have demonstrated that the ME method is actually a special technique of the LR method. By constructing a family of parameterized distributions and making use of the LR method, we have obtained the following result.

Theorem 2 Let X be a random variable with moment generating function φ(.). Let X , ,X be i.i.d. 1 ··· n samples of X. Let CBE be the absolute constant in the Berry-Essen inequality. Then,

1 zτ n Pr X z + ∆ e− φ(τ) , { n ≥ }≤ 2     where 3 2 4 1 C φ(τ)[φ′′′′(τ) 4zφ′′′(τ)] + 3[φ′′(τ)] ∆ = min , BE − 3 2 √n [φ (τ) z2φ(τ)]2 − (  ′′ −  ) ′ φ (τ) with τ satisfying φ(τ) = z.

See Appendix A.1 for a proof. Note that ∆ 0 as n . So, for large n, the above bound is twice → → ∞ tighter than the classical Chernoff bound.

5 2.2.2 Parameter Restriction

In many situations, the pdf or pmf f(.) of X comes from a family of distributions parameterized by θ Θ. ∈ If so, then the parameterized distribution g(., ϑ) can be taken as the subset of pdf or pmf with parameter ϑ contained in a subset Θ of parameter space Θ. By appropriately choosing the subset Θ, the deterministic function Λ(ϑ) may be readily obtained. As an illustrative example, consider the normal distribution. A random variable X is said to have a normal distribution with µ and σ2 if it possesses a probability density function 1 x µ 2 f (x)= exp | − | . X √ − 2σ2 2πσ   Let X , ,X be i.i.d. samples of the random variable X. The following well-known inequalities hold 1 ··· n true. 1 n(z µ)2 Pr X z exp − for z µ, (8) { n ≤ }≤ 2 − 2σ2 ≤   1 n(z µ)2 Pr X z exp − for z µ. (9) { n ≥ }≤ 2 − 2σ2 ≥   1 It should be noted that the factor 2 in these inequalities cannot be obtained by using conventional tech- niques of Chernoff bounds. By virtue of the LR method, we can provide an easy proof for inequalities (8) and (9). We proceed as follows. Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n 2 1 i=1(xi µ) fX (x)= exp − . √ n − 2σ2 ( 2πσ)  P  To apply the LR method to show (8), we construct a family of probability density functions

n 2 1 i=1(xi ϑ) gX (x, ϑ)= exp − (√2πσ)n − 2σ2  P  for ϑ ( ,z] with z µ. It can be checked that ∈ −∞ ≤ n fX (x) 2(ϑ µ)x + µ2 ϑ2 = exp − n − . gX (x, ϑ) − 2σ2    For any ϑ ( ,z], we have ϑ z µ and thus ∈ −∞ ≤ ≤ n fX (x) 2(ϑ µ)z + µ2 ϑ2 exp − − ϑ ( ,z] for xn z. gX (x, ϑ) ≤ − 2σ2 ∀ ∈ −∞ ≤    This implies that X X I X X f ( ) Xn z Λ(ϑ) g ( , ϑ) ϑ ( ,z], { ≤ } ≤ ∀ ∈ −∞ where 2(ϑ µ)z + µ2 ϑ2 n Λ(ϑ)= exp − − . − 2σ2    By differentiation, it can be readily shown that the infimum of Λ(ϑ) with respect to ϑ ( ,z] is equal ∈ −∞ to n(z µ)2 exp − , − 2σ2  

6 which is attained at ϑ = z. By symmetry, it can be shown that 1 P X z = . z{ n ≤ } 2 Using these facts and invoking (6) of Theorem 1, we have

Pr X z P X z Λ(z) for z µ. { n ≤ }≤ z{ n ≤ } ≤ This implies that inequality (8) holds. In a similar manner, we can show inequality (9).

3 Concentration Inequalities for Univariate Distributions

In this section, we shall apply the LR method to derive bounds for tail probabilities for univariate distri- butions. Such bounds are referred to as concentration inequalities.

3.1

A random variable X is said to have a beta distribution if it possesses a probability density function

1 α 1 β 1 f(x)= x − (1 x) − , 0 0, β> 0, (α, β) − B where (α, β)= Γ(α)Γ(β) . Let X , ,X be i.i.d. samples of the random variable X. Making use of the B Γ(α+β) 1 ··· n LR method, we have shown the following results.

E α βz α(1 z) Theorem 3 Let z (0, 1) and µ = [X]= α+β . Define α = 1 z and β = z− . Then, ∈ − (α, β) zα n Pr X z B b for 0

3.2 Beta Negative

A random variable X is said to have a beta distribution if it possesses a probability mass function n + x 1 Γ(α + n)Γ(β + x)Γ(α + β) f(x)=Pr X = x = − , x =0, 1, 2, { } x Γ(α + β + n + x)Γ(α)Γ(β) ···   where α> 1 and β > 0 and n> 1. By virtue of the LR method, we have obtained the following results.

nβ Theorem 4 Let z be a nonnegative integer no greater than α 1 . Then, − αz z αz z Γ( n− ) Γ(β + z) Γ(α + n− + n + z) Pr X z αz z . { ≤ }≤ Γ(β) Γ( n− + z) Γ(α + β + n + z) See Appendix A.3 for a proof.

7 3.3 Beta-Prime Distribution

A random variable X is said to have a beta-prime distribution if it possesses a probability density function

xα 1(1 + x) α β f(x)= − − − , x> 0, α> 0, β> 0. (α, β) B Let X , ,X be i.i.d. samples of the random variable X. Making use of the LR method, we have 1 ··· n obtained the following results.

α Theorem 5 Assume that β > 1 and 0

3.4

A random variable X is said to possess a Borel distribution if it has a probability mass function

(θx)x 1e θx f(x)=Pr X = x = − − , x =1, 2, , { } x! ··· where 0 <θ< 1. Let X , ,X be i.i.d. samples of the random variable X. Making use of the LR 1 ··· n method, we have obtained the following result.

Theorem 6 z 1 n eθz − θz 1 Pr X z e− for 1

3.5 Consul Distribution

A random variable X is said to have a Consul distribution if it possesses a probability mass function

x 1 1 mx θ − f(x)=Pr X = x = (1 θ)mx, x =1, 2, { } x x 1 1 θ − ···  −   −  where 0 θ < 1, 1 m < 1 . See, e.g., [7], for an introduction of this distribution. Let X , ,X be ≤ ≤ θ 1 ··· n i.i.d. samples of the random variable X. Making use of the LR method, we have obtained the following result.

Theorem 7

z 1 n θ − mz 1 θ (1 θ) 1 − − Pr Xn z  z 1  for 1 z < . (16) { ≤ }≤ z 1  − z 1 mz ≤ 1 mθ 1 z−+mz (1 mz− ) −  − −      See Appendix A.6 for a proof.

8 3.6 Geeta Distribution

A random variable X is said to have a Geeta distribution if it possesses a probability mass function

1 βx 1 x 1 βx x f(x)=Pr X = x = − θ − (1 θ) − , x =1, 2, { } βx 1 x − ··· −   1 where 0 <θ< 1 and 1 <β< θ . Making use of the LR method, we have obtained the following result.

Theorem 8 n z 1 βz z θ − (1 θ) − 1 θ Pr Xn z  z 1 − βz z  for 1 z − . (17) { ≤ }≤ z 1 − z 1 − ≤ ≤ 1 βθ βz− 1 1 βz− 1 −  − − −       See Appendix A.7 for a proof.

3.7

A random variable X is said to have a Gumbel distribution if it possesses a probability density function

1 µ x µ x f(x)= exp − exp − , 0 and <µ< . Let X , ,X be i.i.d. samples of random variable X. By virtue of the −∞ ∞ 1 ··· n LR method, we have obtained the following result.

Theorem 9 µ z µ z n Pr X z exp − +1 exp − (18) { n ≤ }≤ β − β     for z µ. ≤ See Appendix A.8 for a proof.

3.8 Inverse Gamma Distribution

A random variable X is said to have an inverse gamma distribution if it possesses a probability density function α β α 1 β f(x)= x− − exp , x> 0, α> 0, β> 0. Γ(α) − x   Let X , ,X be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained 1 ··· n the following results.

Theorem 10 n β β α+1 Γ( + 1) z z − β Pr X z z for 0

9 3.9 Inverse Gaussian Distribution

A random variable X is said to have an inverse Gaussian distribution if it possesses a probability density function λ 1/2 λ(x θ)2 f(x)= exp − , x> 0 2πx3 − 2θ2x     where λ> 0 and θ> 0. Let X , ,X be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained 1 ··· n the following result.

Theorem 11 λ λ λz n Pr X z exp for 0

3.10 Lagrangian Logarithmic Distribution

A random variable X is said to have a Lagrangian logarithmic distribution if it possesses a probability mass function x x(β 1) θ (1 θ) − Γ(βx) f(x)=Pr X = x = − − , x =1, 2, { } Γ(x + 1)Γ(βx x + 1) ln(1 θ) ··· − − where 0 < θ θβ < 1. Let X , ,X be i.i.d. samples of random variable X. By virtue of the LR ≤ 1 ··· n method, we have obtained the following result.

Theorem 12

z z(β 1) n θ 1 θ − ln(1 ϑ) θ Pr X z − − for 0

3.11 Lagrangian Negative Binomial Distribution

A random variable X is said to have a Lagrangian logarithmic distribution if it possesses a probability mass function

β αx + β x β+αx x f(x, θ)=Pr X = x = θ (1 θ) − , x =0, 1, 2, { } αx + β x − ···   where 0 <θ< 1, θ<αθ< 1 and β > 0. Let X , ,X be i.i.d. samples of random variable X. By 1 ··· n virtue of the LR method, we have obtained the following result.

Theorem 13 z β+αz z n θ 1 θ − βθ Pr X z − for 0 z , (23) { n ≤ }≤ ϑ 1 ϑ ≤ ≤ 1 αθ "   −  # − z where ϑ = β+αz .

See Appendix A.12 for a proof.

10 3.12

A random variable X is said to have a Lagrangian logarithmic distribution if it possesses a probability density function 1 x α f(x)= exp | − | , 0. Let X , ,X be i.i.d. samples of random variable X. By virtue of the −∞ ∞ 1 ··· n LR method, we have obtained the following results.

Theorem 14 z α z α n Pr X z − exp 1 − for z α + β, (24) { n ≥ }≤ β − β ≥    α z α z n Pr X z − exp 1 − for z α β. (25) { n ≤ }≤ β − β ≤ −    See Appendix A.13 for a proof.

3.13 Logarithmic Distribution

A random variable X is said to have a logarithmic distribution if it possesses a probability mass function qx f(x)= , x =1, 2, x ln p ··· − where p (0, 1) and q = 1 p. Let X , ,X be i.i.d. samples of random variable X. By virtue of the ∈ − 1 ··· n LR method, we have obtained the following result.

Theorem 15 ln(1 q) q z n q Pr Xn z − for 0

3.14 Lognormal Distribution

A random variable X is said to have a lognormal distribution if it possesses a probability density function

1 1 f(x)= exp (ln x µ)2 , x> 0, <µ< , σ> 0. √ −2σ2 − −∞ ∞ x 2πσ   Let X , ,X be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained 1 ··· n the following result.

Theorem 16 n µ ln z 2 Pr X z exp − for 0

11 3.15

A random variable X is said to have a Nakagami distribution if it possesses a probability density function 2 x2m 1 x2 f(x)= − exp , x> 0 Γ(m) σ2m −σ2   where m 1 and σ > 0. Let X , ,X be i.i.d. samples of random variable X. By virtue of the LR ≥ 2 1 ··· n method, we have obtained the following results.

Theorem 17 2(m ϑ) n Γ(ϑ + 1 ) Γ(ϑ) Γ(ϑ + 1 ) − Pr X 2 σ 2 for 0 < ϑ m, (28) n ≤ Γ(ϑ) ≤ Γ(m) Γ(ϑ) ≤   (   ) n z2 m z2 Pr X z exp m for z √mσ. (29) { n ≥ }≤ mσ2 − σ2 ≥     See Appendix A.16 for a proof.

3.16 Pareto Distribution

A random variable X is said to have a Pareto distribution if it possesses a probability density function θ a θ+1 f(x)= , x>a> 0, θ> 1. a x Let X , ,X be i.i.d. samples of random  variable X. By virtue of the LR method, we have obtained 1 ··· n the following result.

Theorem 18 n θ 1 θ ρθ 1 1 1 Pr X ρµ eθ − ln for 1 <ρ 1 exp , (30) { n ≤ }≤ ρθ θ 1 − θ ≤ − θ θ "    − #     E θa where µ = [X]= θ 1 . − See Appendix A.17 for a proof.

3.17 Power-Law Distribution

A random variable X is said to have a power-law distribution if it possesses a probability density function x α f(x)= − , 1 x β, C(α) ≤ ≤ where β > 1, α R and 1− ∈ 1 β α −α 1 for α =1, C(α)= − 6 ln β for α =1  Let X , ,X be i.i.d. samples of random variable X. By virtue of the LR method, we have obtained 1 ··· n  the following result.

−1 θ 1 βθ β Theorem 19 Let θ α> 1 and z = θ−2 βθ−1−1 . Then, ≥ − − 1 θ n α 1 1 β − θ α Pr X z − − z − . (31) n ≤ ≤ θ 1 1 β1 α  − − −  See Appendix A.18 for a proof.

12 3.18 Stirling Distribution

A random variable is said to have a Stirling distribution if it possesses a probability mass function m! s(x, m) θx Pr X = x = | | , 0 <θ< 1, x = m,m +1, , { } x![ ln(1 θ)]m ··· − − where s(x, m) is the Stirling number of the first kind, with arguments x and m. Let X , ,X be i.i.d. 1 ··· n samples of random variable X. By virtue of the LR method, we have obtained the following result.

Theorem 20 ln(1 ϑ) nm θ nz mθ Pr X z − for z , (32) n ≤ ≤ ln(1 θ) ϑ ≤ (θ 1) ln(1 θ)  −    − −  mϑ where ϑ (0,θ] is the unique number such that z = (ϑ 1) ln(1 ϑ) . ∈ − − See Appendix A.19 for a proof.

3.19 Snedecor’s F-Distribution

If random variable X has a probability density function of the form

n+m m m/2 (m 2)/2 Γ( 2 )( n ) x − f(x)= m n m (n+m)/2 , for 0

Theorem 21 n + m (n+m)/2 Pr X z zm/2 for z 1 (33) { ≥ }≤ n + mz ≥   n + m (n+m)/2 Pr X z zm/2 for 0

3.20 Student’s t-Distribution

If random variable X has a probability density function of the form

n+1 Γ( 2 ) f(x)= 2 , for

Theorem 22 n +1 (n+1)/2 Pr X z z for z 1, (35) {| |≥ }≤ n + z2 ≥   n +1 (n+1)/2 Pr X z z for 0

13 3.21 Truncated Exponential Distribution

A random variable X is said to have a truncated exponential distribution if it possesses a probability density function θeθx f(x)= , θ =0, 0

Theorem 23 ϑ n θ e 1 (θ ϑ)z 1 1 1 Pr X z − e − for 0 0. (38) n ≤ 2 ≤ eθ 1    −  See Appendix A.22 for a proof.

3.22 Uniform Distribution

Let X be a random variable uniformly distributed over interval [0, 1]. Let X , ,X be i.i.d. samples of 1 ··· n the random variable X. By virtue of the LR method, we have obtained the following results.

Theorem 24 n eϑ 1 1 2 1 Pr X z − exp 6n z for 1 >z> , (39) { n ≥ }≤ ϑeϑz ≤ − − 2 2     ! 1 1 where ϑ is a positive number such that z =1+ eϑ 1 ϑ . Similarly, − − n eϑ 1 1 2 1 Pr X z − exp 6n z for 0

3.23 Weibull Distribution

A random variable X is said to have a Weibull distribution if it possesses a probability density function

β 1 β f(x)= αβx − exp αx , x> 0, α> 0, β> 0. − Let X , ,X be i.i.d. samples of the random variable X. By virtue of the LR method, we have obtained 1 ··· n the following results.

Theorem 25

n Pr X z αzβ exp(1 αzβ) for αzβ 1 and β < 1, (41) { n ≤ }≤ − ≤ β β n β Pr Xn z αz exp(1 αz ) for αz 1 and β > 1. (42) { ≥ }≤ − ≥ See Appendix A.24 for a proof.  

14 4 Concentration Inequalities for Multivariate Distributions

In this section, we shall apply the LR method to derive concentration inequalities for the joint distributions of multiple random variables.

4.1 Dirichlet-Compound Multinomial Distribution

Random variables X , ,X are said to have a Dirichlet-compound multinomial distribution if they 1 ··· k possess a probability mass function k n Γ( k α ) Γ(x + α ) f(x)= ℓ=0 ℓ ℓ ℓ , x k Γ(α ) Γ(n + ℓ=0 αℓ) ℓ   P ℓY=0 where P n n! x = [x , x , , x ]⊤, = 0 1 ··· k x k   ℓ=0 xℓ! and k Q xℓ = n Xℓ=0 with x 0 and α > 0 for ℓ = 0, 1, , k. Based on the LR method, we have obtained the following ℓ ≥ ℓ ··· result.

Theorem 26 Assume that 0

4.2 Inverse Matrix Gamma Distribution

A positive-definite X is said to have an inverse matrix gamma distribution [9] if it possesses a probability density function α Ψ α (p+1)/2 1 1 f(x)= | | x − − exp tr(Ψx− ) , βpαΓ (α) | | −β p   where β > 0 is the scale parameter, Ψ is a positive-definite real matrix of size p p. Here x is a positive- × definite matrix of size p p, and Γ (.) is the multivariate gamma function. The inverse matrix gamma × p n 4 distribution reduces to the with β =2, α = 2 . Let denote the relationship of two matrices A and B of the same size such that A 4 B implies that B A is positive definite. By virtue of − the LR method, we have obtained the following result. Theorem 27 1 p 1 Pr X 4 ρΥ exp 1 (2α p 1) for 0 <ρ< 1, (44) { }≤ ρpα −2 ρ − − −     E 2 Ψ where Υ = [X]= β 2α p 1 is the expectation of X. − − See Appendix B.2 for a proof.

15 4.3 Multivariate Normal Distribution

A random vector X is said to have a multivariate normal distribution if it possesses a probability density function k/2 1/2 1 1 f(x)=(2π)− Σ − exp (x µ)⊤Σ− (x µ) , | | −2 − −   where k is the dimension of X, x is a vector of k elements, µ is the expectation of X, and Σ is the covariance matrix of X. Let X , , X be i.i.d. samples of X. Define 1 ··· n n X X = i=1 i . n n P Let < denote the relationship of two vectors A = [a , ,a ] and B = [b , ,b ] such that A < B implies 1 ··· k 1 ··· k a b , ℓ =1, , k. By virtue of the LR method, we have obtained the following result. ℓ ≥ ℓ ··· Theorem 28 n 1 1 1 1 Pr X < z exp µ⊤Σ− z [z⊤Σ− z + µ⊤Σ− µ] (45) { n }≤ − 2    1 1 provided that Σ− z < Σ− µ.

See Appendix B.3 for a proof.

4.4 Multivariate Pareto Distribution

Random variables X , ,X are said to have a multivariate Pareto distribution if they possess a proba- 1 ··· k bility density function

k k (α+k) α + i 1 x − f(x , , x )= − 1 k + i , x >β > 0, α> 0. 1 ··· k β − β i i i=1 i ! i=1 i ! Y X

Let X = [X , ,X ]⊤. Let z = [z , ,z ]⊤. Let X , , X be i.i.d. samples of random vector X. 1 ··· k 1 ··· k 1 ··· n Define n X X = i=1 i . n n P Let the notation “ ” denote the relationship of two vectors A = [a , ,a ]⊤ and B = [b , ,b ]⊤ such  1 ··· k 1 ··· k that A B a b ℓ =1, , k.  ℓ ≤ , ··· By virtue of the LR method, we have the following results.

Theorem 29 Let z >β , ℓ =1, , k. The following statements hold true. ℓ ℓ ··· (I): The inequality

θ α n k α + i 1 k z − Pr X z − 1 k + i (46) { n  }≤  θ + i 1 − β  i=1 ! i=1 i ! Y − X   holds for any θ > α. (II): The inequality (46) holds for θ such that

k 1 k − 1 z = ln 1 k + i (47) θ + ℓ − β i=1 i ! Xℓ=0 X

16 provided that k 1 k − 1 z > ln 1 k + i . (48) α + ℓ − β i=1 i ! Xℓ=0 X (III): The inequality (46) holds for 1 θ =1+ (49) 1 k zi 1 k i=1 βi −  P  provided that α> 1 and 1 k zi < α . k i=1 βi α 1 − See Appendix B.4 for aP proof.

5 Conclusion

We have investigated the concentration phenomenon of random variables based on the likelihood ratio method. A wide variety of concentration inequalities for various distributions are developed without using moment generating functions. The new inequalities are generally simple, insightful and fairy tight.

A Proofs of Univariate Inequalities

A.1 Proof of Theorem 2

Let f(.) denote the pmf or pdf of random variable X. Let Θ be the set of non-negative real number such that the moment generating function φ(.) of X exists. Define

f(x)eϑx g(x, ϑ)= , ϑ Θ. φ(ϑ) ∈

Then, g(x, ϑ) is a family of pmf or pdf, which contains f(x) = g(x, 0). Let X = [X1, ,Xn] and n ··· x = [x , , x ]. The joint pmf or pdf of X is fX (x)= f(x ), which is contained in the family 1 ··· n i=1 i 1 n n Q gX (x, ϑ)= f(x ) exp( ϑx ), ϑ Θ. φ(ϑ) i − i ∀ ∈ i=1   Y It can be checked that fX (x) n = [φ(ϑ)exp( ϑxn)] , ϑ Θ, gX (x, ϑ) − ∀ ∈ n Pi=1 xi where xn = n . Hence,

fX (x) z n φ(ϑ)e− , ϑ Θ provided that xn z. gX (x, ϑ) ≤ ∀ ∈ ≥   This implies that fX (X ) I Λ(ϑ) gX (X , ϑ), Xn z { ≥ } ≤ z n where Λ(ϑ) = [φ(ϑ)e− ] . By differentiation, it can be shown that the infimum of Λ(ϑ) with respect to ′ ϑ Θ is attained at τ Θ such that φ (τ) = z. It follows from (6) of Theorem 1 that ∈ ∈ φ(τ) zτ n Pr X z Λ(τ) P X z Λ(τ)= φ(τ)e− . (50) { n ≥ }≤ τ { n ≥ }≤   17 Now we evaluate P X z . Let E [.] denote the expectation of a function of random variable X τ { n ≥ } τ having pmf or pdf g(x, τ). Note that

xf(x)eτx 1 φ (τ) E [X]= dx = xf(x)eτxdx = ′ = z. τ φ(τ) φ(τ) φ(τ) Z Z Similarly, φ (τ) φ (τ) φ (τ) E [X2]= ′′ , E [X3]= ′′′ , E [X4]= ′′′′ τ φ(τ) τ φ(τ) τ φ(τ) So, φ (τ) E [ X z 2]= E [X2] z2 = ′′ z2. τ | − | τ − φ(τ) − Note that (X z)4 = X4 4zX3 +6z2X2 4z3X + z4. Hence, − − −

4 1 2 4 E [(X z) ]= φ′′′′(τ) 4zφ′′′(τ)+6z φ′′(τ) 3z φ(τ) . τ − φ(τ) − −   From Berry-Essen’s inequality [1, 8], we have

3 2 4 4 1 C φ′′′′(τ) 4zφ′′′(τ)+6z φ′′(τ) 3z φ(τ) P X z + BE − − τ { n ≥ } ≤ 2 √n (φ (τ) z2φ(τ))2/φ(τ) (  ′′ −  ) 3 2 4 1 C φ(τ)[φ′′′′(τ) 4zφ′′′(τ)] + 3[φ′′(τ)] = + BE − 3 . 2 √n [φ (τ) z2φ(τ)]2 − (  ′′ −  ) Making use of the above inequalities and (50) completes the proof of the theorem.

A.2 Proof of Theorem 3

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n α 1 n β 1 1 − − fX (x)= x (1 x ) . [ (α, β)]n i − i i=1 ! "i=1 # B Y Y To apply the LR method, we construct a family of probability density functions

n ϑ 1 n β 1 1 − − gX (x, ϑ)= x (1 x ) [ (ϑ, β)]n i − i i=1 ! "i=1 # B Y Y for ϑ (0, α]. It can be checked that ∈ n n α ϑ fX (x) (ϑ, β) − = B xi . gX (x, ϑ) (α, β) i=1 ! B  Y Since the geometric mean is no greater than the arithmetic mean, we have

n x (x )n , i ≤ n i=1 Y n Pi=1 xi where xn = n . Hence, n fX (x) (ϑ, β) α ϑ B (xn) − gX (x, ϑ) ≤ (α, β) B 

18 and it follows that n fX (x) (ϑ, β) α ϑ B z − ϑ (0, α] provided that xn z. gX (x, ϑ) ≤ (α, β) ∀ ∈ ≤ B  Consequently, X X I X X f ( ) X n z Λ(ϑ) g ( , ϑ) ϑ (0, α], { ≤ } ≤ ∀ ∈ where n (ϑ, β) α ϑ Λ(ϑ)= B z − . (α, β) B  It follows from Theorem 1 that n 1 α ϑ Pr Xn z inf (ϑ, β)z − for 0

n α 1 n ϑ 1 1 − − gX (x, ϑ)= x (1 x ) [ (α, ϑ)]n i − i i=1 ! "i=1 # B Y Y for ϑ (0,β]. It can be checked that ∈ β ϑ n n − n fX (x) (α, ϑ) (α, ϑ) β ϑ = B (1 xi) B (1 z) − ϑ (0,β] provided that xn z. gX (x, ϑ) (α, β) − ≤ (α, β) − ∀ ∈ ≥ "i=1 # B  Y B  Hence, X X I X X f ( ) Xn z Λ(ϑ) g ( , ϑ) ϑ (0,β], { ≥ } ≤ ∀ ∈ where n (α, ϑ) β ϑ Λ(ϑ)= B (1 z) − . (α, β) − B  It follows from Theorem 1 that n 1 β ϑ Pr Xn z inf (α, ϑ)(1 z) − for 0

19 Consider function w(ϑ) = ln α ln ϑ + (α ϑ) ln z. Note that the first and second derivatives are w′(ϑ)= 1 1 − − 1 ϑ ln z and w′′(ϑ)= ϑ2 , respectively. By the assumption that 0

A.3 Proof of Theorem 4

To apply the LR method, we construct a family of probability mass functions n + x 1 Γ(α + n)Γ(ϑ + x)Γ(α + ϑ) g(x, ϑ)= − , x =0, 1, 2, x Γ(α + ϑ + n + x)Γ(α)Γ(ϑ) ···   for ϑ (0,β]. Define ∈ f(x) L(x, ϑ)= , x =0, 1, 2, . g(x, ϑ) ··· Then, Γ(ϑ) Γ(β + x) Γ(α + ϑ + n + x) L(x, ϑ)= , x =0, 1, 2, . Γ(β) Γ(ϑ + x) Γ(α + β + n + x) ··· It can be checked that L(x +1, ϑ) β + x α + ϑ + n + x = 1, x =0, 1, 2, L(x, ϑ) ϑ + x α + β + n + x ≥ ··· for ϑ (0,β]. This implies that for any non-negative integer z, ∈ L(x, ϑ) L(z, ϑ), ϑ (0,β] ≤ ∀ ∈ for any non-negative integer x no greater than z. Hence, f(x) Λ(ϑ), ϑ (0,β] g(x, ϑ) ≤ ∀ ∈ for any non-negative integer x no greater than z, where Γ(ϑ) Γ(β + z) Γ(α + ϑ + n + z) Λ(ϑ)= . Γ(β) Γ(ϑ + z) Γ(α + β + n + z) Consequently, I f(X) X z Λ(ϑ) g(X, ϑ) ϑ (0,β]. { ≤ } ≤ ∀ ∈ By virtue of Theorem 1, we have Pr X z inf Λ(ϑ). { ≤ }≤ ϑ (0,β] ∈ E nβ Since z [X]= α 1 , we have ≤ − αz z 0 − β ≤ n ≤ and thus αz z αz z αz z Γ( − ) Γ(β + z) Γ(α + − + n + z) Pr X z Λ − = n n . { ≤ }≤ n Γ(β) Γ( αz z + z) Γ(α + β + n + z)   n− This completes the proof of the theorem.

20 A.4 Proof of Theorem 5

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n α 1 α β i=1[xi − (1 + xi)− − ] fX (x)= . [ (α, β)]n Q B To apply the LR method to show (13), we construct a family of probability density functions

n ϑ 1 ϑ β i=1[xi − (1 + xi)− − ] gX (x, ϑ)= , ϑ (0, α]. [ (ϑ, β)]n ∈ Q B It can be checked that

n n α ϑ fX (x) (ϑ, β) x − = B i , ϑ (0, α]. gX (x, ϑ) (α, β) 1+ x ∈ i=1 i B  Y   x By differentiation, it can be shown that ln 1+x is a concave function of x > 0. As a consequence of this fact, we have n α ϑ n(α ϑ) x − x − i n ϑ (0, α], 1+ x ≤ 1+ x ∀ ∈ i=1 i n Y     n Pi=1 xi x where xn = n . Since 1+x is an increasing function of x> 0, it follows that

n α ϑ n(α ϑ) x − z − i ϑ (0, α] provided that 0 x z. 1+ x ≤ 1+ z ∀ ∈ ≤ n ≤ i=1 i Y     Therefore, we have established that

X X I X X f ( ) Xn z Λ(ϑ) g ( , ϑ) ϑ (0, α], { ≤ } ≤ ∀ ∈ where α ϑ n (ϑ, β) z − Λ(ϑ)= B . (α, β) 1+ z "B   # Invoking Theorem 1, we have

Pr Xn z inf Λ(ϑ). ≤ ≤ ϑ (0,α] ∈ α As a consequence of β > 1 and 0

n α 1 α ϑ i=1[xi − (1 + xi)− − ] gX (x, ϑ)= , ϑ [β, ). [ (α, ϑ)]n ∈ ∞ Q B It can be seen that n n fX (x) (α, ϑ) ϑ β = B (1 + xi) − , ϑ [β, ). gX (x, ϑ) (α, β) ∈ ∞ i=1 B  Y

21 By differentiation, it can be shown that ln(1 + x) is a concave function of x> 0. As a consequence of this fact, we have n ϑ β n(ϑ β) n(ϑ β) (1 + x ) − (1 + x ) − (1 + z) − , ϑ [β, ) i ≤ n ≤ ∈ ∞ i=1 Y x X X I X X provided that 0 n z. Hence, we have that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [β, ), ≤ ≤ { ≤ } ≤ ∈ ∞ where n (α, ϑ) ϑ β Λ(ϑ)= B (1 + z) − . (α, β) B  α Making use of Theorem 1, we have Pr Xn z infϑ β Λ(ϑ). As a consequence of 0 < z β 1 , we ≤ ≤ ≥ ≤ − have 1 + α β. Hence, z ≥  α n α (α, 1+ z ) 1+ α β α Pr X z Λ 1+ = B (1 + z) z − for 0

A.5 Proof of Theorem 6

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n

n xi 1 θxi (θxi) − e− fX (x)= . x ! i=1 i Y To apply the LR method to show (15), we construct a family of probability mass functions

n xi 1 ϑxi (ϑxi) − e− gX (x, ϑ)= , ϑ (0,θ]. x ! ∈ i=1 i Y It can be seen that n xn 1 n fX (x) θ − ϑ = exp((ϑ θ)xn) = exp ((ln θ θ ln ϑ + ϑ) xn) , gX (x, ϑ) ϑ − θ − − "  #   

n where x = Pi=1 xi . Noting that ln x x is increasing with respect to x (0, 1), we have that n n − ∈ ln θ θ ln ϑ + ϑ 0 − − ≥ as a consequence of 0 < ϑ θ. It follows that ≤ ϑ n ϑ n exp ((ln θ θ ln ϑ + ϑ) x ) exp ((ln θ θ ln ϑ + ϑ) z) ϑ (0,θ] θ − − n ≤ θ − − ∀ ∈       provided that x z. Hence, n ≤ fX (x) Λ(ϑ) ϑ (0,θ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where ϑ n Λ(ϑ)= exp ((ln θ θ ln ϑ + ϑ) z) . θ − −   

22 X X I X X Hence, we have that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, { ≤ } ≤ ∈ we have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with ≤ ≤ ∈ respective to ϑ (0,θ] is attained at ϑ =1 1 . Therefore,  ∈ − z z 1 n 1 eθz − θz 1 Pr X z Λ 1 = e− for 1

A.6 Proof of Theorem 7

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n

n xi 1 1 mxi θ − mx fX (x)= (1 θ) i . x x 1 1 θ − i=1 i i Y  −   −  To apply the LR method to show (16), we construct a family of probability mass functions

n xi 1 1 mxi ϑ − mx gX (x, ϑ)= (1 ϑ) i , ϑ (0,θ]. x x 1 1 ϑ − ∈ i=1 i i Y  −   − 

It can be verified that n xn 1 mxn fX (x) θ(1 ϑ) − 1 θ = − − , gX (x, ϑ) ϑ(1 θ) 1 ϑ ( −   −  ) n Pi=1 xi where xn = n . Define function x h(x) = ln + m ln(1 x) 1 x − − for x (0, 1). Then, ∈ n fX (x) ϑ(1 θ) = − exp(xn[h(θ) h(ϑ)]) . gX (x, ϑ) θ(1 ϑ) −  −  1 1 Note that the first derivative of h(x) is h′(x) = 1 x x m , which is positive for x (0, 1). Hence, − − ∈ h(θ) h(ϑ) 0 for ϑ (0,θ]. It follows that − ≥ ∈  fX (x) Λ(ϑ) ϑ (0,θ] provided that 1 xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ ≤ where ϑ(1 θ) n Λ(ϑ)= − exp(z[h(θ) h(ϑ)]) . θ(1 ϑ) −  −  This implies that fX (X ) I Λ(ϑ) gX (X , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, we Xn z { ≤ } ≤ ∈ have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with ∈ ≤ ≤ z 1 respective to ϑ (0,θ] is attained at ϑ = − . So,  ∈ mz z 1 n θ − mz z 1 1 θ (1 θ) 1 − − Pr Xn z Λ − =  z 1  for 1 z < . { ≤ }≤ mz z 1  − z 1 mz ≤ 1 mθ   1 z−+mz (1 mz− ) −  − −     This completes the proof of the theorem.

23 A.7 Proof of Theorem 8

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n n 1 βxi 1 xi 1 (β 1)xi fX (x)= − θ − (1 θ) − . βx 1 x − i=1 i i Y −   To apply the LR method to show (17), we construct a family of probability mass functions

n 1 βxi 1 xi 1 (β 1)xi gX (x, ϑ)= − ϑ − (1 ϑ) − , ϑ (0,θ]. βx 1 x − ∈ i=1 i i Y −  

It can be verified that n xn 1 (β 1)xn fX (x) θ − 1 θ − = − , gX (x, ϑ) ϑ 1 ϑ (   −  ) n Pi=1 xi where xn = n . Define function h(x) = ln x + (β 1) ln(1 x) − − for x (0, 1). Then, ∈ n fX (x) ϑ = exp(xn[h(θ) h(ϑ)]) . gX (x, ϑ) θ −   1 βx 1 Note that the first derivative of h(x) is h′(x)= x(1− x) , which is positive for x (0, β ). Hence, h(θ) h(ϑ) − ∈ − ≥ 0 for ϑ (0,θ]. It follows that ∈ fX (x) Λ(ϑ) ϑ (0,θ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where ϑ n Λ(ϑ)= exp(z[h(θ) h(ϑ)]) . θ −   X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, we { ≤ } ≤ ∈ have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with ∈ ≤ ≤ z 1 respective to ϑ (0,θ] is attained at ϑ = βz− 1 . Therefore,  ∈ − n z 1 βz z z 1 θ − (1 θ) − 1 θ Pr Xn z Λ − =  z 1 − βz z  for 1 z − . { ≤ }≤ βz 1 z 1 − z 1 − ≤ ≤ 1 βθ  −  βz− 1 1 βz− 1 −  − − −       This completes the proof of the theorem.

A.8 Proof of Theorem 9

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n 1 µ xi µ xi fX (x)= exp − exp − . β β − β i=1 Y    To apply the LR method to show (18), we construct a family of probability density functions

n 1 ϑ xi ϑ xi gX (x, ϑ)= exp − exp − , ϑ ( ,µ]. β β − β ∈ −∞ i=1 Y   

24 Note that n fX (x) µ ϑ ϑ x µ x = exp − + exp − i exp − i gX (x, ϑ) β β − β i=1      Y n µ ϑ n ϑ µ x = exp − exp exp exp exp i . β β − β − β ( i=1 )         X   Observing that for ϑ ( ,µ], ∈ −∞ ϑ µ x exp exp exp β − β −β        is a concave function of x, we have that n fX (x) µ ϑ ϑ µ x exp − exp n exp exp exp n , gX (x, ϑ) ≤ β β − β − β            n where x = Pi=1 xi . In view of the fact that for ϑ ( ,µ], n n ∈ −∞ ϑ µ x exp exp exp β − β −β        is also an increasing function of x, we have that

fX (x) Λ(ϑ) ϑ ( ,µ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ −∞ ≤ where µ ϑ n ϑ µ z Λ(ϑ)= exp − exp n exp exp exp . β β − β −β            This implies that fX (X ) I Λ(ϑ) gX (X , ϑ) holds for any ϑ ( ,µ]. By virtue of Theorem 1, Xn z { ≤ } ≤ ∈ −∞ we have Pr Xn z inf ϑ ( ,µ] Λ(ϑ). Note that ≤ ≤ ∈ −∞  µ µ z n Λ(ϑ)= exp w(ϑ)+ exp − , β − β     where ϑ ϑ z w(ϑ)= − + exp exp , ϑ ( ,µ]. β β −β ∈ −∞     It can be checked that the first and second derivatives of w(ϑ) are 1 1 ϑ z 1 ϑ z w′(ϑ)= − + exp exp , w′′(ϑ)= exp exp . β β β −β β2 β −β         Obviously, 1 w′(z)=0, w′′(z)= > 0. β2 It follows that inf Λ(ϑ)=Λ(z). ϑ ( ,µ] ∈ −∞ Therefore, µ z µ z n Pr X z Λ(z)= exp − +1 exp − { n ≤ }≤ β − β     for z µ. This completes the proof of the theorem. ≤

25 A.9 Proof of Theorem 10

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n α β α 1 β fX (x)= x− − exp . Γ(α) i −x i=1 i Y   To apply the LR method to show (19), we construct a family of probability density functions

n ϑ β ϑ 1 β gX (x, ϑ)= x− − exp , ϑ [α, ). Γ(ϑ) i −x ∈ ∞ i=1 i Y   Clearly,

n fX (x) Γ(ϑ) α ϑ ϑ α = β − xi − gX (x, ϑ) Γ(α) i=1 Y ϑ α n n − Γ(ϑ) α ϑ = β − x Γ(α) i   i=1 ! n Y Γ(ϑ) α ϑ n(ϑ α) β − (x ) − , ≤ Γ(α) n   n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ [α, ) provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ∞ ≤ where ϑ α n Γ(ϑ) z − Λ(ϑ)= . Γ(α) β "   # X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [α, ). By virtue of Theorem 1, we { ≤ } ≤ ∈ β ∞ β have Pr Xn z infϑ [α, ) Λ(ϑ). As a consequence of 0 < z α 1 , we have z +1 α. It follows ≤ ≤ ∈ ∞ ≤ − ≥ that  n β β α+1 β Γ( + 1) z z − β Pr X z Λ +1 = z for 0

n α ϑ α 1 ϑ gX (x, ϑ)= x− − exp , ϑ (0,β]. Γ(α) i −x ∈ i=1 i Y   It can be seen that nα n fX (x) β ϑ β = exp − . gX (x, ϑ) ϑ x i=1 i !   X ϑ β Observing that for ϑ (0,β], − is a concave function of x> 0, we have that ∈ x α n fX (x) β ϑ β exp − . gX (x, ϑ) ≤ ϑ x    n 

26 It follows that fX (x) Λ(ϑ) ϑ (0,β] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where n β α ϑ β Λ(ϑ)= exp − . ϑ z     X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,β]. By virtue of Theorem 1, we { ≤ } ≤ ∈ have Pr Xn z infϑ (0,β] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with ≤ ≤ ∈ respect to ϑ (0,β] is attained at ϑ = αz as long as 0

A.10 Proof of Theorem 11

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n 1/2 2 λ λ(xi θ) fX (x)= exp − . 2πx3 − 2θ2x i=1 i i Y     To apply the LR method to show (21), we construct a family of probability density functions

n 1/2 2 λ λ(xi ϑ) gX (x, ϑ)= exp − , ϑ (0,θ]. 2πx3 − 2ϑ2x ∈ i=1 i i Y    

It can be verified that n fX (x) λ λ λ λ = exp + xn , gX (x, ϑ) θ − ϑ 2ϑ2 − 2θ2      n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ (0,θ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where λ λ λ λ n Λ(ϑ)= exp + z . θ − ϑ 2ϑ2 − 2θ2      X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, we { ≤ } ≤ ∈ have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that the infimum of Λ(ϑ) with ≤ ≤ ∈ respect to ϑ (0,θ] is attained at ϑ = z as long as 0

27 A.11 Proof of Theorem 12

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n

n xi xi(β 1) θ (1 θ) − Γ(βxi) fX (x)= − − . Γ(x + 1)Γ(βx x + 1) ln(1 θ) i=1 i i i Y − − To apply the LR method to show (22), we construct a family of probability mass functions

n xi xi(β 1) ϑ (1 ϑ) − Γ(βxi) gX (x, ϑ)= − − , ϑ (0,θ]. Γ(x + 1)Γ(βx x + 1) ln(1 ϑ) ∈ i=1 i i i Y − −

It can be seen that n xn (β 1)xn fX (x) θ 1 θ − ln(1 ϑ) = − − , gX (x, ϑ) ϑ 1 ϑ ln(1 θ) "   −  − # n Pi=1 xi where xn = n . Define function

h(x) = ln x + (β 1) ln(1 x) − − for x (0, 1). Then, we can write ∈ n fX (x) ln(1 ϑ) = exp ([h(θ) h(ϑ)]xn) − . gX (x, ϑ) − ln(1 θ)  −  Note that the first derivative of h(x) is 1 βx h′(x)= − , x(1 x) − which is positive for x (0, 1 ). Hence, ∈ β

fX (x) Λ(ϑ) ϑ (0,θ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where n z z(β 1) n ln(1 ϑ) θ 1 θ − ln(1 ϑ) Λ(ϑ)= exp ([h(θ) h(ϑ)]z) − = − − . − ln(1 θ) ϑ 1 ϑ ln(1 θ)  −  "   −  − # X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, { ≤ } ≤ ∈ we have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z ≤ ≤ ∈ ≤ θ , the infimum of Λ(ϑ) with respect to ϑ (0,θ] is attained at ϑ such that z = ϑ . (βθ 1) ln(1 θ) (βϑ 1) ln(1 ϑ) − − ∈ ϑ −1 − Such number ϑ is unique because the first derivative of (βϑ 1) ln(1 ϑ) with respective to ϑ (0, β ) is equal − − ∈ to 1 1 βϑ ln(1 ϑ) ϑ − , [(1 βϑ) ln(1 ϑ)]2 − − − 1 ϑ − −  −  which is no less than (β 1)ϑ2 − > 0. (1 ϑ)[(1 βϑ) ln(1 ϑ)]2 − − − This completes the proof of the theorem.

28 A.12 Proof of Theorem 13

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n n β αxi + β xi β+αxi xi fX (x)= θ (1 θ) − . αx + β x − i=1 i i Y   To apply the LR method to show (23), we construct a family of probability mass functions

n β αxi + β xi β+αxi xi gX (x, ϑ)= ϑ (1 ϑ) − , ϑ (0,θ]. αx + β x − ∈ i=1 i i Y  

It can be seen that n xn β+(α 1)xn fX (x) θ 1 θ − = − , gX (x, ϑ) ϑ 1 ϑ "   −  # n Pi=1 xi where xn = n . Define function

h(x) = ln x + (α 1) ln(1 x) − − for x (0, 1). Then, we can write ∈ β n fX (x) 1 θ = exp ([h(θ) h(ϑ)]xn) − . gX (x, ϑ) − 1 ϑ "  −  # Note that the first derivative of h(x) is 1 αx h′(x)= − , x(1 x) − which is positive for x (0, 1 ). Hence, ∈ α fX (x) Λ(ϑ) ϑ (0,θ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where β n z β+(α 1)z n 1 θ θ 1 θ − Λ(ϑ)= exp ([h(θ) h(ϑ)]z) − = − . − 1 ϑ ϑ 1 ϑ "  −  # "   −  # X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, we { ≤ } ≤ ∈ βθ have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z 1 αθ , ≤ ≤ ∈ ≤ − the infimum of Λ(ϑ) with respect to ϑ (0,θ] is attained at ϑ = z . This completes the proof of the  ∈ β+αz theorem.

A.13 Proof of Theorem 14

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n 1 xi α fX (x)= exp | − | . 2β − β i=1 Y   To apply the LR method to show (24), we construct a family of probability density functions

n 1 xi α gX (x, ϑ)= exp | − | , ϑ [β, ). 2ϑ − ϑ ∈ ∞ i=1 Y  

29 It can be seen that for ϑ [β, ), ∈ ∞ n n fX (x) ϑ 1 1 = exp xi α gX (x, ϑ) β ϑ − β | − | " i=1 #     X ϑ n 1 1 n exp (x α) ≤ β ϑ − β i − " i=1 #     X ϑ n 1 1 n = exp (x α) , β ϑ − β n −       n x where x = Pi=1 i . Since 1 1 0 for ϑ [β, ), it follows that n n ϑ − β ≤ ∈ ∞

fX (x) Λ(ϑ) ϑ [β, ) provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ∞ ≥ where ϑ 1 1 n Λ(ϑ)= exp (z α) . β ϑ − β −     X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [β, ). By virtue of Theorem 1, we { ≥ } ≤ ∈ ∞ have Pr Xn z infϑ [β, ) Λ(ϑ). By differentiation, it can be shown that, as long as z α + β, the ≥ ≤ ∈ ∞ ≥ infimum of Λ(ϑ) with respect to ϑ [β, ) is attained at ϑ = z α. This proves (24).  ∈ ∞ − To show (25), note that for ϑ [β, ), ∈ ∞ n n fX (x) ϑ 1 1 = exp xi α gX (x, ϑ) β ϑ − β | − | " i=1 #     X ϑ n 1 1 n exp (α x ) ≤ β ϑ − β − i " i=1 #     X ϑ n 1 1 n = exp (α x ) . β ϑ − β − n       Since 1 1 0 for ϑ [β, ), it follows that ϑ − β ≤ ∈ ∞

fX (x) Λ(ϑ) ϑ [β, ) provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ∞ ≤ where ϑ 1 1 n Λ(ϑ)= exp (α z) . β ϑ − β −     X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [β, ). By virtue of Theorem 1, we { ≤ } ≤ ∈ ∞ have Pr Xn z infϑ [β, ) Λ(ϑ). By differentiation, it can be shown that, as long as z α β, the ≤ ≤ ∈ ∞ ≤ − infimum of Λ(ϑ) with respect to ϑ [β, ) is attained at ϑ = α z. This proves (25). The proof of the  ∈ ∞ − theorem is thus completed.

A.14 Proof of Theorem 15

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n n qxi fX (x)= . x ln 1 i=1 i 1 q Y −

30 To apply the LR method to show (26), we construct a family of probability mass functions

n ϑxi gX (x, ϑ)= , ϑ (0, q]. x ln 1 ∈ i=1 i 1 ϑ Y − Clearly,

n fX (x) ln(1 q) q xn = − , gX (x, ϑ) ln(1 ϑ) ϑ  −    n Pi=1 xi where xn = n . Hence,

fX (x) Λ(ϑ) ϑ (0, q] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where ln(1 q) q z n Λ(ϑ)= − . ln(1 ϑ) ϑ  −    X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0, q]. By virtue of Theorem 1, we { ≤ } ≤ ∈ q have Pr Xn z infϑ (0,q] Λ(ϑ). By differentiation, it can be shown that, as long as z (1 q)ln 1 , ≤ ≤ ∈ ≤ − 1−q ϑ the infimum of Λ( ϑ) with respect to ϑ (0, q] is attained at ϑ (0, q] such that z = (1 ϑ)ln 1 . Such ∈ ∈ − 1−ϑ ϑ number ϑ is unique because the function (1 ϑ)ln 1 is increasing with respect to ϑ (0, 1). The proof of 1−ϑ ∈ the theorem is thus completed. −

A.15 Proof of Theorem 16

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n 2 1 1 µ ln xi fX (x)= exp − . √ −2 σ i=1 xi 2πσ " # Y   To apply the LR method to show (27), we construct a family of probability density functions

n 2 1 1 ϑ ln xi gX (x, ϑ)= exp − , ϑ (0,µ]. √ −2 σ ∈ i=1 xi 2πσ " # Y   It can be seen that

n fX (x) ϑ µ µ + ϑ = exp − ln xi . gX (x, ϑ) σ2 2 − " i=1 # X   It can be readily shown that for ϑ (0,µ], ∈ ϑ µ µ + ϑ − ln x σ2 2 −   is a concave function of x> 0. Hence,

2 n fX (x) ϑ µ µ + ϑ exp − ln xn for ϑ (0,µ], gX (x, ϑ) ≤ σ2 2 − ∈ ( "   #)

31 n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ (0,µ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where n ϑ µ µ + ϑ 2 Λ(ϑ)= exp − ln z . σ2 2 − ( "   #) X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,µ]. By virtue of Theorem 1, we { ≤ } ≤ ∈ µ have Pr Xn z infϑ (0,µ] Λ(ϑ). By differentiation, it can be shown that, as long as 0 < z e , the ≤ ≤ ∈ ≤ infimum of Λ(ϑ) with respect to ϑ (0,µ] is attained at ϑ = ln z. Therefore,  ∈ n µ ln z 2 Pr X z Λ(ln z) = exp − for 0

A.16 Proof of Theorem 17

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n 2m 1 2 2 xi − xi fX (x)= exp . Γ(m) σ2m −σ2 i=1 Y   To apply the LR method to show (28), we construct a family of probability density functions

n 2ϑ 1 2 2 xi − xi gX (x, ϑ)= exp , ϑ (0,m]. Γ(ϑ) σ2ϑ −σ2 ∈ i=1 Y   Clearly, for ϑ (0,m], ∈ 2(m ϑ) n n − n fX (x) Γ(ϑ) 2(ϑ m) Γ(ϑ) 2(ϑ m) 2n(m ϑ) = σ − xi σ − (xn) − , gX (x, ϑ) Γ(m) ≤ Γ(m) i=1 !   Y   n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ (0,m] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where Γ(ϑ) z 2(m ϑ) n Λ(ϑ)= − . Γ(m) σ     X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,m]. By virtue of Theorem 1, we { ≤ } ≤ 1 ∈ Γ(ϑ+ 2 ) have Pr Xn z infϑ (0,µ] Λ(ϑ). Letting z = σ leads to (28). ≤ ≤ ∈ Γ(ϑ) To apply the LR method to show (29), we construct a family of probability density functions n 2m 1 2 2 xi − xi gX (x, ϑ)= exp , ϑ [σ, ). Γ(m) ϑ2m −ϑ2 ∈ ∞ i=1 Y   It can be seen that 2mn n fX (x) ϑ 1 1 2 = exp xi . gX (x, ϑ) σ ϑ2 − σ2 " i=1 #     X

32 1 1 2 Observing that for ϑ [σ, ), 2 2 x is a concave function of x> 0, we have that ∈ ∞ ϑ − σ 2m n fX (x) ϑ 1 1 2 exp (xn) , ϑ [σ, ). gX (x, ϑ) ≤ σ ϑ2 − σ2 ∀ ∈ ∞ (    ) It follows that fX (x) Λ(ϑ) ϑ [σ, ) provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ∞ ≥ where n ϑ 2m z2 z2 Λ(ϑ)= exp . σ ϑ2 − σ2 "   # X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [σ, ). By virtue of Theorem 1, we { ≥ } ≤ ∈ ∞ have Pr Xn z infϑ [σ, ) Λ(ϑ). By differentiation, it can be shown that, as long as z √mσ, the ≥ ≤ ∈ ∞ ≥ infimum of Λ(ϑ) with respect to ϑ [σ, ) is attained at ϑ = z . Therefore,  ∈ ∞ √m n z z2 m z2 Pr X z Λ = exp m for z √mσ. { n ≥ }≤ √m mσ2 − σ2 ≥       This establishes (29) and completes the proof of the theorem.

A.17 Proof of Theorem 18

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n θ a θ+1 fX (x)= . a x i=1 i Y   To apply the LR method to show (30), we construct a family of probability density functions

n ϑ a ϑ+1 gX (x, ϑ)= , ϑ [θ, ). a x ∈ ∞ i=1 i Y   Clearly, ϑ θ n n n − ϑ θ fX (x) θ θ xn − = xi , gX (x, ϑ) ϑ ≤ ϑ a i=1 ! " #   Y   n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ [θ, ) provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ∞ ≤ where θ z ϑ θ n Λ(ϑ)= − . ϑ a     X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [θ, ). By virtue of Theorem 1, we { ≤ } ≤ ∈ ∞ have Pr Xn z infϑ [θ, ) Λ(ϑ). Hence, for γ > 1, ≤ ≤ ∈ ∞  n θ ϑ θ n Pr Xn γa inf γ − = θ inf exp[n w(ϑ)], { ≤ }≤ ϑ θ ϑ ϑ θ ≥   ≥ where w(ϑ)= ln ϑ + (ϑ θ) ln γ. − −

33 Now consider the minimization of w(ϑ) subject to ϑ θ. Note that the first and second derivatives of 1 1 ≥ 1 w(ϑ) are w′(ϑ) = ϑ + ln γ and w′′(ϑ) = ϑ2 , respectively. Hence, the minimum is achieved at ϑ∗ = ln γ − 1/θ provided that 1 <γ e . Accordingly, w(ϑ∗) = 1 + ln ln γ θ ln γ and ≤ − eθ n Pr X γa ln γ for 1 <γ e1/θ. { n ≤ }≤ γθ ≤   θa ρµ Note that the mean of X is µ = θ 1 . Letting γ = a yields − n θ 1 θ ρθ 1 1 1 Pr X ρµ eθ − ln for 1 <ρ 1 exp( ). { n ≤ }≤ ρθ θ 1 − θ ≤ − θ θ "    − #   This establishes (30) and completes the proof of the theorem.

A.18 Proof of Theorem 19

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n α xi− fX (x)= . C(α) i=1 Y To apply the LR method to show (31), we construct a probability density functions

n ϑ xi− gX (x, ϑ)= , ϑ [α, ). C(ϑ) ∈ ∞ i=1 Y Clearly, ϑ α n n − n n fX (x) C(ϑ) C(ϑ) ϑ α = xi (xn) − Λ(ϑ) gX (x, ϑ) C(α) ≤ C(α) ≤ i=1 !   Y   h i n x provided that x z, where x = Pi=1 i and n ≤ n n n C(ϑ) ϑ α Λ(ϑ)= z − . C(α)   X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ [α, ). By virtue of Theorem 1, we { ≤ } ≤ ∈ ∞ have Pr X z Λ(ϑ). This completes the proof of the theorem. n ≤ ≤  A.19 Proof of Theorem 20

Let X = [X , ,X ] and x = [x , , x ]. The joint probability mass function of X is 1 ··· n 1 ··· n

n xi m! s(xi,m) θ fX (x)= | | . x ![ ln(1 θ)]m i=1 i Y − − To apply the LR method to show (32), we construct a family of probability mass functions

n xi m! s(xi,m) ϑ gX (x, ϑ)= | | , ϑ (0,θ]. x ![ ln(1 ϑ)]m ∈ i=1 i Y − − Clearly, n nm xn fX (x) ln(1 ϑ) θ = − , gX (x, ϑ) ln(1 θ) ϑ  −  "  #

34 n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ (0,θ] provided that xn z, gX (x, ϑ) ≤ ∀ ∈ ≤ where n ln(1 ϑ) nm θ z Λ(ϑ)= − . ln(1 θ) ϑ  −     X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ (0,θ]. By virtue of Theorem 1, we { ≤ } ≤ ∈ mθ have Pr Xn z infϑ (0,θ] Λ(ϑ). By differentiation, it can be shown that, as long as z (θ 1) ln(1 θ) , ≤ ≤ ∈ ≤ − − the infimum of Λ(ϑ) with respective to ϑ (0,θ] is attained at a number ϑ such that z = mϑ .  (ϑ 1) ln(1 ϑ) ϑ ∈ − − Such number ϑ is unique because (ϑ 1) ln(1 ϑ) is an increasing function of ϑ (0, 1). This completes the − − ∈ proof of the theorem.

A.20 Proof of Theorem 21

To apply the LR method, we introduce a family of probability density functions 1 x g(x, ϑ)= f , ϑ> 0. ϑ ϑ   Clearly, (n+m)/2 f(x) f(x) n + mx (n+m)/2 1 1 = = ϑm/2 ϑ = ϑm/2 1+ ϑ − . g(x, ϑ) 1 f( x ) n + mx 1+ n ϑ ϑ    mx  To show inequality (33), note that f(x) is decreasing with respect to x> 0 for ϑ 1. Hence, g(x,ϑ) ≥ f(x) Λ(ϑ) ϑ [1, ) provided that x z, g(x, ϑ) ≤ ∀ ∈ ∞ ≥ where (n+m)/2 1 1 Λ(ϑ)= ϑm/2 1+ ϑ − . (54) 1+ n  mz  I This implies that f(X) X z Λ(ϑ) g(X, ϑ) holds for any ϑ [1, ). By virtue of Theorem 1, we have { ≥ } ≤ ∈ ∞ n + m (n+m)/2 Pr X z inf Λ(ϑ)=Λ(z)= zm/2 for z 1. { ≥ }≤ ϑ [1, ) n + mz ≥ ∈ ∞   To show inequality (34), note that f(x) is increasing with respect to x> 0 for 0 < ϑ 1. Hence, g(x,ϑ) ≤ f(x) Λ(ϑ) ϑ (0, 1] provided that x z, g(x, ϑ) ≤ ∀ ∈ ≤ I where Λ(ϑ) is defined by (54). This implies that f(X) X z Λ(ϑ) g(X, ϑ) holds for any ϑ (0, 1]. By { ≤ } ≤ ∈ virtue of Theorem 1, we have

n + m (n+m)/2 Pr X z inf Λ(ϑ)=Λ(z)= zm/2 for 0

35 A.21 Proof of Theorem 22

To apply the LR method, we introduce a family of probability density functions 1 x g(x, ϑ)= f , ϑ> 0. ϑ ϑ   Clearly, x 2 (n+1)/2 1 (n+1)/2 f(x) f(x) n + ( ϑ ) ϑ2 1 = 1 x = ϑ 2 = ϑ 1+ n − . g(x, ϑ) f( ) n + x 2 +1 ϑ ϑ    x  To show inequality (35), note that f(x) is decreasing with respect to x for ϑ 1. Hence, g(x,ϑ) | | ≥ f(x) Λ(ϑ) ϑ [1, ) provided that x z, g(x, ϑ) ≤ ∀ ∈ ∞ | |≥ where 1 (n+1)/2 ϑ2 1 Λ(ϑ)= ϑ 1+ n − . (55) 2 +1  z  I This implies that f(X) X z Λ(ϑ) g(X, ϑ) holds for any ϑ [1, ). By virtue of Theorem 1, we have {| |≥ } ≤ ∈ ∞ n +1 (n+1)/2 Pr X z inf Λ(ϑ)=Λ(z)= z for z 1. { ≥ }≤ ϑ [1, ) n + z2 ≥ ∈ ∞   This proves inequality (35). To show inequality (36), note that f(x) is increasing with respect to x for ϑ (0, 1]. Hence, g(x,ϑ) | | ∈ f(x) Λ(ϑ) ϑ (0, 1] provided that x z, g(x, ϑ) ≤ ∀ ∈ | |≤ I where Λ(ϑ) is defined by (55). This implies that f(X) X z Λ(ϑ) g(X, ϑ) holds for any ϑ (0, 1]. By {| |≤ } ≤ ∈ virtue of Theorem 1, we have

n +1 (n+1)/2 Pr X z inf Λ(ϑ)=Λ(z)= z for 0

A.22 Proof of Theorem 23

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is 1 ··· n 1 ··· n n θ θx fX (x)= e i . eθ 1 i=1 Y − To apply the LR method to show (37), we construct a family of probability density functions

n ϑ ϑx gX (x, ϑ)= e i , ϑ ( ,θ], ϑ =0. eϑ 1 ∈ −∞ 6 i=1 Y − Clearly, n fX (x) θ eϑ 1 = − exp [n(θ ϑ)xn] , gX (x, ϑ) ϑ eθ 1 −  − 

36 n Pi=1 xi where xn = n . It follows that

fX (x) Λ(ϑ) ϑ ( ,θ], ϑ = 0 provided that xn z, gX (x, ϑ) ≤ ∀ ∈ −∞ 6 ≤ where n θ eϑ 1 Λ(ϑ)= − exp [n(θ ϑ)z] . ϑ eθ 1 −  −  X X I X X This implies that f ( ) Xn z Λ(ϑ) g ( , ϑ) holds for any ϑ ( ,θ], ϑ = 0. By virtue of { ≤ } ≤ ∈ −∞ 6 Theorem 1, we have Pr Xn z infϑ ( ,θ] Λ(ϑ). By differentiation, it can be shown that, as long as ≤ ≤ ∈ −∞ 0

A.23 Proof of Theorem 24

Let X = [X , ,X ] and x = [x , , x ]. The joint probability density function of X is fX (x)=1. To 1 ··· n 1 ··· n apply the LR method to show (39), we construct a family of probability density functions

n ϑ ϑx gX (x, ϑ)= e i , ϑ> 0. eϑ 1 i=1 Y − Clearly, n fX (x) eϑ 1 = − exp( ϑ xn) , gX (x, ϑ) ϑ −   n Pi=1 xi where xn = n . It follows that

\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \Lambda(\vartheta) \quad \forall \vartheta > 0 \text{ provided that } \overline{x}_n \geq z,
\]
where
\[
\Lambda(\vartheta) = \left( \frac{e^{\vartheta} - 1}{\vartheta e^{\vartheta z}} \right)^n.
\]
This implies that $f_X(X)\, \mathbb{I}\{\overline{X}_n \geq z\} \leq \Lambda(\vartheta)\, g_X(X, \vartheta)$ holds for any $\vartheta > 0$. By virtue of Theorem 1, we have $\Pr\{\overline{X}_n \geq z\} \leq \inf_{\vartheta > 0} \Lambda(\vartheta)$. By differentiation, it can be shown that, as long as $1 > z > \frac{1}{2}$, the infimum of $\Lambda(\vartheta)$ with respect to $\vartheta > 0$ is attained at a positive number $\vartheta^\star$ such that $z = 1 + \frac{1}{e^{\vartheta^\star} - 1} - \frac{1}{\vartheta^\star}$. Such a number is unique because
\[
\lim_{\vartheta \downarrow 0} \left( 1 + \frac{1}{e^{\vartheta} - 1} - \frac{1}{\vartheta} \right) = \frac{1}{2}
\]
and $1 + \frac{1}{e^{\vartheta} - 1} - \frac{1}{\vartheta}$ is increasing with respect to $\vartheta > 0$. Therefore, we have shown that
\[
\Pr\{\overline{X}_n \geq z\} \leq \Lambda(\vartheta^\star) \quad \text{for } 1 > z > \frac{1}{2}.
\]
Observe that
\[
\Lambda(\vartheta^\star) = \inf_{s > 0} \left( e^{-zs}\, \mathbb{E}[e^{sX}] \right)^n,
\]
where $X$ denotes a random variable uniformly distributed over $[0, 1]$.

To establish an upper bound on $\Lambda(\vartheta^\star)$, we can use the following inequality due to Chen [6, Appendix H]:
\[
\mathbb{E}[e^{sX}] < \exp\left( \frac{s^2}{24} + \frac{s}{2} \right), \qquad \forall s \in (-\infty, \infty).
\]
By differentiation, it can be shown that
\[
\Pr\{\overline{X}_n \geq z\} \leq \Lambda(\vartheta^\star) \leq \inf_{s > 0} \left[ \exp\left( \frac{s^2}{24} + \frac{s}{2} - zs \right) \right]^n = \exp\left( -6 n \left( z - \frac{1}{2} \right)^2 \right), \qquad 1 > z > \frac{1}{2}.
\]
This establishes (39). By a similar argument, we can show (40). This completes the proof of the theorem.
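Since the final bound is an explicit sub-Gaussian expression, it is easy to check by simulation; the sample size below is an arbitrary assumption.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 25
means = rng.random((200000, n)).mean(axis=1)   # means of n Uniform(0,1) draws
for z in [0.55, 0.6, 0.7]:
    emp = np.mean(means >= z)
    bound = np.exp(-6 * n * (z - 0.5) ** 2)    # the bound established above
    print(f"z={z:.2f}: empirical = {emp:.6f} <= bound = {bound:.6f}")
\end{verbatim}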

A.24 Proof of Theorem 25

Let $X = [X_1, \cdots, X_n]$ and $x = [x_1, \cdots, x_n]$. The joint probability density function of $X$ is
\[
f_X(x) = \prod_{i=1}^{n} \alpha \beta x_i^{\beta - 1} \exp\left( -\alpha x_i^{\beta} \right).
\]
To apply the LR method, we construct a family of probability density functions
\[
g_X(x, \vartheta) = \prod_{i=1}^{n} \vartheta \beta x_i^{\beta - 1} \exp\left( -\vartheta x_i^{\beta} \right), \qquad \vartheta \in (0, \infty).
\]
Clearly,
\[
\frac{f_X(x)}{g_X(x, \vartheta)} = \left( \frac{\alpha}{\vartheta} \right)^n \exp\left[ (\vartheta - \alpha) \sum_{i=1}^{n} x_i^{\beta} \right].
\]
To show inequality (41) under the condition that $\alpha z^{\beta} \leq 1$ and $0 < \beta \leq 1$, we restrict $\vartheta$ to be no less than $\alpha$. As a consequence of $0 < \beta \leq 1$ and $\vartheta \geq \alpha$, we have that $(\vartheta - \alpha) x^{\beta}$ is a concave function of $x > 0$. By virtue of such concavity, we have

\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \left[ \frac{\alpha}{\vartheta} \exp\left\{ (\vartheta - \alpha) (\overline{x}_n)^{\beta} \right\} \right]^n, \qquad \forall \vartheta \in [\alpha, \infty),
\]
where $\overline{x}_n = \frac{\sum_{i=1}^{n} x_i}{n}$. It follows that
\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \Lambda(\vartheta) \quad \forall \vartheta \in [\alpha, \infty) \text{ provided that } \overline{x}_n \leq z,
\]
where
\[
\Lambda(\vartheta) = \left[ \frac{\alpha}{\vartheta} \exp\left\{ (\vartheta - \alpha) z^{\beta} \right\} \right]^n. \tag{56}
\]

This implies that $f_X(X)\, \mathbb{I}\{\overline{X}_n \leq z\} \leq \Lambda(\vartheta)\, g_X(X, \vartheta)$ holds for any $\vartheta \in [\alpha, \infty)$. By virtue of Theorem 1, we have $\Pr\{\overline{X}_n \leq z\} \leq \inf_{\vartheta \in [\alpha, \infty)} \Lambda(\vartheta)$. By differentiation, it can be shown that, as long as $\alpha z^{\beta} \leq 1$, the infimum of $\Lambda(\vartheta)$ with respect to $\vartheta \in [\alpha, \infty)$ is attained at $\vartheta = z^{-\beta}$. Therefore,
\[
\Pr\{\overline{X}_n \leq z\} \leq \Lambda(z^{-\beta}) = \left[ \alpha z^{\beta} \exp\left( 1 - \alpha z^{\beta} \right) \right]^n \quad \text{for } \alpha z^{\beta} \leq 1 \text{ and } 0 < \beta \leq 1.
\]
This proves inequality (41).
To show inequality (42) under the condition that $\alpha z^{\beta} \geq 1$ and $\beta > 1$, we restrict $\vartheta$ to be a positive number less than $\alpha$. As a consequence of $\beta > 1$ and $0 < \vartheta < \alpha$, we have that $(\vartheta - \alpha) x^{\beta}$ is a concave function of $x > 0$. By virtue of such concavity, we have

\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \left[ \frac{\alpha}{\vartheta} \exp\left\{ (\vartheta - \alpha) (\overline{x}_n)^{\beta} \right\} \right]^n, \qquad \forall \vartheta \in (0, \alpha).
\]
It follows that
\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \Lambda(\vartheta) \quad \forall \vartheta \in (0, \alpha) \text{ provided that } \overline{x}_n \geq z,
\]
where $\Lambda(\vartheta)$ is defined by (56). This implies that $f_X(X)\, \mathbb{I}\{\overline{X}_n \geq z\} \leq \Lambda(\vartheta)\, g_X(X, \vartheta)$ holds for any $\vartheta \in (0, \alpha)$. By virtue of Theorem 1, we have $\Pr\{\overline{X}_n \geq z\} \leq \inf_{\vartheta \in (0, \alpha)} \Lambda(\vartheta)$. By differentiation, it can be shown that, as long as $\alpha z^{\beta} \geq 1$, the infimum of $\Lambda(\vartheta)$ with respect to $\vartheta \in (0, \alpha)$ is attained at $\vartheta = z^{-\beta}$. Therefore,
\[
\Pr\{\overline{X}_n \geq z\} \leq \Lambda(z^{-\beta}) = \left[ \alpha z^{\beta} \exp\left( 1 - \alpha z^{\beta} \right) \right]^n \quad \text{for } \alpha z^{\beta} \geq 1 \text{ and } \beta > 1.
\]
This proves inequality (42). The proof of the theorem is thus completed.
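A hedged Monte Carlo check of inequality (41) with assumed parameters; NumPy's standard Weibull generator is rescaled so that the samples have density $\alpha\beta x^{\beta-1}e^{-\alpha x^{\beta}}$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, n, z = 2.0, 0.7, 15, 0.2
assert alpha * z ** beta <= 1 and 0 < beta <= 1   # conditions of (41)

# X = alpha^{-1/beta} * W, where W is standard Weibull(beta)
x = alpha ** (-1 / beta) * rng.weibull(beta, size=(300000, n))
emp = np.mean(x.mean(axis=1) <= z)
bound = (alpha * z ** beta * np.exp(1 - alpha * z ** beta)) ** n
print(f"empirical Pr(mean <= {z}) = {emp:.6f} <= bound = {bound:.6f}")
\end{verbatim}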

B Proofs of Multivariate Inequalities

B.1 Proof of Theorem 26

To apply the LR method to show (43), we introduce a family of probability mass functions

\[
g(x, \vartheta) = \binom{n}{x} \frac{\Gamma\left( \sum_{\ell=0}^{k} \vartheta_\ell \right)}{\Gamma\left( n + \sum_{\ell=0}^{k} \vartheta_\ell \right)} \prod_{\ell=0}^{k} \frac{\Gamma(x_\ell + \vartheta_\ell)}{\Gamma(\vartheta_\ell)}, \quad \text{with } \vartheta_0 = \alpha_0 \text{ and } 0 < \vartheta_\ell \leq \alpha_\ell, \ \ell = 1, \cdots, k,
\]
where $\vartheta = [\vartheta_0, \vartheta_1, \cdots, \vartheta_k]^\top$. Clearly,
\[
\frac{f(x)}{g(x, \vartheta)} = \frac{\Gamma\left( \sum_{\ell=0}^{k} \alpha_\ell \right)}{\Gamma\left( \sum_{\ell=0}^{k} \vartheta_\ell \right)} \, \frac{\Gamma\left( n + \sum_{\ell=0}^{k} \vartheta_\ell \right)}{\Gamma\left( n + \sum_{\ell=0}^{k} \alpha_\ell \right)} \prod_{\ell=1}^{k} \frac{\Gamma(x_\ell + \alpha_\ell)\, \Gamma(\vartheta_\ell)}{\Gamma(x_\ell + \vartheta_\ell)\, \Gamma(\alpha_\ell)}.
\]
For simplicity of notations, define
\[
L(x, \vartheta) = \frac{f(x)}{g(x, \vartheta)}.
\]

Let $y = [y_0, y_1, \cdots, y_k]^\top$ be a vector such that $y_i = x_i + 1$ for some $i \in \{1, \cdots, k\}$ and that $y_\ell = x_\ell$ for all $\ell \in \{1, \cdots, k\}$ except $\ell = i$. Then,
\[
\frac{L(y, \vartheta)}{L(x, \vartheta)} = \frac{x_i + \alpha_i}{x_i + \vartheta_i} \geq 1.
\]

Making use of this observation and by an inductive argument, we have that for $z = [z_0, z_1, \cdots, z_k]^\top$ such that $x_\ell \leq z_\ell$ for $\ell = 1, \cdots, k$, it must be true that
\[
\frac{L(z, \vartheta)}{L(x, \vartheta)} \geq 1.
\]

It follows that
\[
\frac{f(x)}{g(x, \vartheta)} \leq \Lambda(\vartheta) \quad \forall \vartheta \in \Theta \text{ provided that } x_\ell \leq z_\ell, \ \ell = 1, \cdots, k,
\]
where
\[
\Lambda(\vartheta) = \frac{\Gamma\left( \sum_{\ell=0}^{k} \alpha_\ell \right)}{\Gamma\left( \sum_{\ell=0}^{k} \vartheta_\ell \right)} \, \frac{\Gamma\left( n + \sum_{\ell=0}^{k} \vartheta_\ell \right)}{\Gamma\left( n + \sum_{\ell=0}^{k} \alpha_\ell \right)} \prod_{\ell=1}^{k} \frac{\Gamma(z_\ell + \alpha_\ell)\, \Gamma(\vartheta_\ell)}{\Gamma(z_\ell + \vartheta_\ell)\, \Gamma(\alpha_\ell)},
\]
and $\Theta$ is the set of vectors $\vartheta = [\vartheta_0, \vartheta_1, \cdots, \vartheta_k]^\top$ such that $\vartheta_0 = \alpha_0$ and $0 < \vartheta_\ell \leq \alpha_\ell$, $\ell = 1, \cdots, k$. This implies that

\[
f(X)\, \mathbb{I}\{X \preccurlyeq z\} \leq \Lambda(\vartheta)\, g(X, \vartheta) \qquad \forall \vartheta \in \Theta,
\]
where $X = [X_0, X_1, \cdots, X_k]^\top$ and $X \preccurlyeq z$ means $X_\ell \leq z_\ell$, $\ell = 1, \cdots, k$. By virtue of Theorem 1, we have $\Pr\{X_\ell \leq z_\ell, \ \ell = 1, \cdots, k\} = \Pr\{X \preccurlyeq z\} \leq \inf_{\vartheta \in \Theta} \Lambda(\vartheta)$.
As a consequence of the assumption that $0 < z_\ell \leq \frac{n \alpha_\ell}{\sum_{i=0}^{k} \alpha_i}$ for $\ell = 1, \cdots, k$, we have

\[
\theta_\ell = \frac{\alpha_0 z_\ell}{n - \sum_{i=1}^{k} z_i} \leq \frac{\alpha_0 \, \frac{n \alpha_\ell}{\sum_{i=0}^{k} \alpha_i}}{n - \sum_{\ell=1}^{k} \frac{n \alpha_\ell}{\sum_{i=0}^{k} \alpha_i}} = \alpha_\ell
\]
for $\ell = 1, \cdots, k$. Define $\theta = [\theta_0, \theta_1, \cdots, \theta_k]^\top$ with $\theta_0 = \alpha_0$. Then, $\theta \in \Theta$ and

\[
\Pr\{X_\ell \leq z_\ell, \ \ell = 1, \cdots, k\} \leq \inf_{\vartheta \in \Theta} \Lambda(\vartheta) \leq \Lambda(\theta).
\]
This completes the proof of the theorem.
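The bound can be evaluated with log-gamma functions and compared against a direct simulation of the Dirichlet-compound multinomial model ($p \sim \mathrm{Dirichlet}(\alpha)$, $X \sim \mathrm{Multinomial}(n, p)$); all parameter values below are assumed for illustration.
\begin{verbatim}
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(3)
alpha = np.array([2.0, 1.0, 1.5])   # [alpha_0, alpha_1, ..., alpha_k], k = 2
n = 100
z = np.array([5.0, 10.0])           # requires z_l <= n*alpha_l/sum(alpha)

# feasible point from the proof: theta_0 = alpha_0,
# theta_l = alpha_0 * z_l / (n - sum(z))
theta = np.concatenate(([alpha[0]], alpha[0] * z / (n - z.sum())))

def log_Lambda(th):
    val = (gammaln(alpha.sum()) - gammaln(th.sum())
           + gammaln(n + th.sum()) - gammaln(n + alpha.sum()))
    val += np.sum(gammaln(z + alpha[1:]) - gammaln(z + th[1:])
                  + gammaln(th[1:]) - gammaln(alpha[1:]))
    return val

ps = rng.dirichlet(alpha, size=50000)
counts = np.array([rng.multinomial(n, p) for p in ps])
emp = np.mean(np.all(counts[:, 1:] <= z, axis=1))
print(f"empirical = {emp:.5f} <= Lambda(theta) = {np.exp(log_Lambda(theta)):.5f}")
\end{verbatim}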

B.2 Proof of Theorem 27

To apply the LR method to show inequality (44), we introduce a family of probability density functions

\[
g(x, \vartheta) = \frac{|\vartheta|^{\alpha}}{\beta^{p\alpha}\, \Gamma_p(\alpha)} \, |x|^{-\alpha - (p+1)/2} \exp\left( -\frac{1}{\beta} \mathrm{tr}(\vartheta x^{-1}) \right),
\]
where $\vartheta$ is a positive-definite real matrix of size $p \times p$ such that $\vartheta \preccurlyeq \Psi$. Note that
\[
\frac{f(x)}{g(x, \vartheta)} = \frac{|\Psi|^{\alpha}}{|\vartheta|^{\alpha}} \exp\left( -\frac{1}{\beta} \mathrm{tr}([\Psi - \vartheta] x^{-1}) \right).
\]
For positive-definite matrices $x$ and $z$ such that $x \preccurlyeq z$, we have

\[
\mathrm{tr}([\Psi - \vartheta] x^{-1}) \geq \mathrm{tr}([\Psi - \vartheta] z^{-1})
\]
as a consequence of $\vartheta \preccurlyeq \Psi$. It follows that

\[
\frac{f(x)}{g(x, \vartheta)} \leq \Lambda(\vartheta)
\]
for $\vartheta \preccurlyeq \Psi$ and $x \preccurlyeq z$, where

\[
\Lambda(\vartheta) = \frac{|\Psi|^{\alpha}}{|\vartheta|^{\alpha}} \exp\left( -\frac{1}{\beta} \mathrm{tr}([\Psi - \vartheta] z^{-1}) \right).
\]
Hence,

\[
f(X)\, \mathbb{I}\{X \preccurlyeq z\} \leq \Lambda(\vartheta)\, g(X, \vartheta) \quad \text{provided that } \vartheta \preccurlyeq \Psi.
\]

By virtue of Theorem 1, we have $\Pr\{X \preccurlyeq z\} \leq \inf_{\vartheta \preccurlyeq \Psi} \Lambda(\vartheta)$. In particular, taking $z = \rho\, \mathbb{E}[X] = \frac{2\rho\, \Psi}{\beta (2\alpha - p - 1)}$ and $\vartheta = \frac{\beta}{2} (2\alpha - p - 1) z$, we have
\begin{align*}
\Pr\{X \preccurlyeq \rho \Upsilon\} = \Pr\{X \preccurlyeq z\}
&\leq \frac{|\Psi|^{\alpha}}{\left| \frac{\beta}{2} (2\alpha - p - 1) z \right|^{\alpha}} \exp\left( -\frac{1}{\beta} \mathrm{tr}\left( \left[ \Psi - \frac{\beta}{2} (2\alpha - p - 1) z \right] z^{-1} \right) \right) \\
&= \frac{\exp\left( \frac{p}{2} (2\alpha - p - 1) \right) |\Psi|^{\alpha}}{\left[ \frac{\beta}{2} (2\alpha - p - 1) \right]^{p\alpha} |z|^{\alpha}} \exp\left( -\frac{1}{\beta} \mathrm{tr}(\Psi z^{-1}) \right) \\
&= \frac{1}{\rho^{p\alpha}} \exp\left( -\frac{p}{2} \left( \frac{1}{\rho} - 1 \right) (2\alpha - p - 1) \right).
\end{align*}
This completes the proof of the theorem.
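A Monte Carlo sketch of the final bound; the correspondence between the density used here and SciPy's inverse Wishart (df $= 2\alpha$, scale $= 2\Psi/\beta$, mean $2\Psi/(\beta(2\alpha - p - 1))$) is a derived assumption, as are the parameter values.
\begin{verbatim}
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(4)
p, alpha, beta, rho = 2, 4.0, 1.0, 0.3
Psi = np.array([[1.0, 0.3], [0.3, 2.0]])

mean_X = 2 * Psi / (beta * (2 * alpha - p - 1))  # E[X] in this parameterization
z = rho * mean_X
samples = invwishart.rvs(df=2 * alpha, scale=2 * Psi / beta,
                         size=100000, random_state=rng)

# X <= z in the Loewner order iff z - X is positive semidefinite
eigs = np.linalg.eigvalsh(z - samples)
emp = np.mean(np.all(eigs >= 0, axis=1))
bound = rho ** (-p * alpha) * np.exp(-(p / 2) * (1 / rho - 1)
                                     * (2 * alpha - p - 1))
print(f"empirical = {emp:.6f} <= bound = {bound:.6f}")
\end{verbatim}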

B.3 Proof of Theorem 28

For simplicity of notations, let
\[
X = [X_1, \cdots, X_n], \qquad x = [x_1, \cdots, x_n],
\]
where $x_1, \cdots, x_n$ are vectors of dimension $k$. Since $X_1, \cdots, X_n$ are independent and identically distributed, the joint probability density of $X$ is

\[
f_X(x) = \prod_{i=1}^{n} (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) \right).
\]
To apply the LR method to show (45), we introduce a family of probability density functions

\[
g_X(x, \vartheta) = \prod_{i=1}^{n} (2\pi)^{-k/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (x_i - \vartheta)^\top \Sigma^{-1} (x_i - \vartheta) \right),
\]
where $\vartheta$ is a vector of dimension $k$ such that $\Sigma^{-1} \vartheta \succcurlyeq \Sigma^{-1} \mu$. It can be checked that

\[
\frac{f_X(x)}{g_X(x, \vartheta)} = \prod_{i=1}^{n} \exp\left( (\mu^\top - \vartheta^\top) \Sigma^{-1} x_i + \frac{1}{2} \left[ \vartheta^\top \Sigma^{-1} \vartheta - \mu^\top \Sigma^{-1} \mu \right] \right)
= \exp\left( n \left\{ (\mu^\top - \vartheta^\top) \Sigma^{-1} \overline{x}_n + \frac{1}{2} \left[ \vartheta^\top \Sigma^{-1} \vartheta - \mu^\top \Sigma^{-1} \mu \right] \right\} \right),
\]
where
\[
\overline{x}_n = \frac{\sum_{i=1}^{n} x_i}{n}.
\]
As a consequence of $\Sigma^{-1} \vartheta \succcurlyeq \Sigma^{-1} \mu$, we have that

\[
(\mu^\top - \vartheta^\top) \Sigma^{-1} u \leq (\mu^\top - \vartheta^\top) \Sigma^{-1} v
\]
for arbitrary vectors $u$ and $v$ such that $u \succcurlyeq v$. This implies that for $\vartheta$ such that $\Sigma^{-1} \vartheta \succcurlyeq \Sigma^{-1} \mu$,

\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \Lambda(\vartheta) \quad \text{provided that } \overline{x}_n \succcurlyeq z,
\]

where
\[
\Lambda(\vartheta) = \exp\left( n \left\{ (\mu^\top - \vartheta^\top) \Sigma^{-1} z + \frac{1}{2} \left[ \vartheta^\top \Sigma^{-1} \vartheta - \mu^\top \Sigma^{-1} \mu \right] \right\} \right).
\]
It follows that $f_X(X)\, \mathbb{I}\{\overline{X}_n \succcurlyeq z\} \leq \Lambda(\vartheta)\, g_X(X, \vartheta)$ holds for any $\vartheta$ such that $\Sigma^{-1} \vartheta \succcurlyeq \Sigma^{-1} \mu$. By virtue of Theorem 1, we have

\[
\Pr\{\overline{X}_n \succcurlyeq z\} \leq \inf_{\Sigma^{-1} \vartheta \succcurlyeq \Sigma^{-1} \mu} \Lambda(\vartheta).
\]

By differentiation, it can be shown that the infimum is attained at $\vartheta = z$. Hence,
\[
\Pr\{\overline{X}_n \succcurlyeq z\} \leq \Lambda(z) = \exp\left( -\frac{n}{2} (z - \mu)^\top \Sigma^{-1} (z - \mu) \right).
\]
This completes the proof of the theorem.
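Since $\overline{X}_n$ is itself normal with covariance $\Sigma/n$, the bound at the minimizer is easy to check by simulation; the mean, covariance and threshold below are assumed, with $z$ chosen so that $\Sigma^{-1} z \succcurlyeq \Sigma^{-1} \mu$.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
n, z = 10, np.array([0.5, 1.8])

means = rng.multivariate_normal(mu, Sigma / n, size=300000)  # law of the sample mean
emp = np.mean(np.all(means >= z, axis=1))
d = z - mu
bound = np.exp(-n * d @ np.linalg.solve(Sigma, d) / 2)       # Lambda(z)
print(f"empirical = {emp:.6f} <= Lambda(z) = {bound:.6f}")
\end{verbatim}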

B.4 Proof of Theorem 29

Let $X$ denote the random matrix of size $k \times n$ such that the $j$-th column is $X_j$. Let $x$ denote a matrix of size $k \times n$ such that the $j$-th column is $[x_{1j}, x_{2j}, \cdots, x_{kj}]^\top$. Then, the joint probability density function of $X$ is
\[
f_X(x) = \prod_{j=1}^{n} \left( \prod_{i=1}^{k} \frac{\alpha + i - 1}{\beta_i} \right) \left( 1 - k + \sum_{i=1}^{k} \frac{x_{ij}}{\beta_i} \right)^{-(\alpha + k)}.
\]
To apply the LR method, we introduce a family of probability density functions

\[
g_X(x, \vartheta) = \prod_{j=1}^{n} \left( \prod_{i=1}^{k} \frac{\vartheta + i - 1}{\beta_i} \right) \left( 1 - k + \sum_{i=1}^{k} \frac{x_{ij}}{\beta_i} \right)^{-(\vartheta + k)}, \qquad \vartheta > \alpha.
\]
Clearly,
\[
\frac{f_X(x)}{g_X(x, \vartheta)} = \left( \prod_{i=1}^{k} \frac{\alpha + i - 1}{\vartheta + i - 1} \right)^n \prod_{j=1}^{n} \left( 1 - k + \sum_{i=1}^{k} \frac{x_{ij}}{\beta_i} \right)^{\vartheta - \alpha}.
\]
Using the fact that the geometric mean does not exceed the arithmetic mean, we have that

\[
\prod_{j=1}^{n} \left( 1 - k + \sum_{i=1}^{k} \frac{x_{ij}}{\beta_i} \right) \leq \left( 1 - k + \sum_{i=1}^{k} \frac{u_i}{\beta_i} \right)^n,
\]
where
\[
u_i = \frac{1}{n} \sum_{j=1}^{n} x_{ij}.
\]
Hence,
\[
\frac{f_X(x)}{g_X(x, \vartheta)} \leq \left( \prod_{i=1}^{k} \frac{\alpha + i - 1}{\vartheta + i - 1} \right)^n \left( 1 - k + \sum_{i=1}^{k} \frac{u_i}{\beta_i} \right)^{n (\vartheta - \alpha)}.
\]
This implies that
\[
f_X(X)\, \mathbb{I}\{\overline{X}_n \preccurlyeq z\} \leq \Lambda(\vartheta)\, g_X(X, \vartheta), \qquad \forall \vartheta > \alpha,
\]
where
\[
\Lambda(\vartheta) = \left( \prod_{i=1}^{k} \frac{\alpha + i - 1}{\vartheta + i - 1} \right)^n \left( 1 - k + \sum_{i=1}^{k} \frac{z_i}{\beta_i} \right)^{n (\vartheta - \alpha)}.
\]
By virtue of Theorem 1, we have $\Pr\{\overline{X}_n \preccurlyeq z\} \leq \Lambda(\theta)$ for any $\theta > \alpha$. This proves the first statement.
Note that if (48) holds, then $\theta > \alpha$ for $\theta$ satisfying (47). Moreover, by differentiation, it can be shown that $\Pr\{\overline{X}_n \preccurlyeq z\} \leq \inf_{\vartheta > \alpha} \Lambda(\vartheta) = \Lambda(\theta)$. This proves statement (II).
For $\theta$ satisfying (49), it must be true that $\theta > \alpha$ as a consequence of the assumption that $\alpha > 1$ and $\frac{1}{k} \sum_{i=1}^{k} \frac{z_i}{\beta_i} < \frac{\alpha}{\alpha - 1}$. This proves statement (III). The proof of the theorem is thus completed.
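The infimum over $\vartheta > \alpha$ can be located numerically; the sketch below (with assumed parameters) minimizes $\log \Lambda(\vartheta)$ on a grid and verifies the stationarity condition $\sum_{i=1}^{k} 1/(\vartheta + i - 1) = \ln\big( 1 - k + \sum_{i=1}^{k} z_i/\beta_i \big)$ obtained by differentiation.
\begin{verbatim}
import numpy as np

alpha, n, k = 2.0, 10, 3
beta = np.array([1.0, 2.0, 1.5])
z = np.array([1.3, 2.5, 2.0])        # assumed thresholds with z_i >= beta_i
w = 1 - k + np.sum(z / beta)

i = np.arange(1, k + 1)
grid = np.linspace(alpha + 1e-6, alpha + 50, 20001)
log_vals = n * (np.sum(np.log((alpha + i - 1) / (grid[:, None] + i - 1)), axis=1)
                + (grid - alpha) * np.log(w))
jstar = log_vals.argmin()
print(f"theta ~= {grid[jstar]:.4f}, bound ~= {np.exp(log_vals[jstar]):.6f}")
# stationarity check: sum_i 1/(theta+i-1) should be close to ln(w)
print(np.sum(1.0 / (grid[jstar] + i - 1)), np.log(w))
\end{verbatim}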

References

[1] A. C. Berry, "The accuracy of the Gaussian approximation to the sum of independent variates," Trans. Amer. Math. Soc., vol. 49, no. 1, pp. 122–139, 1941.

[2] H. Chernoff, "A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations," Ann. Math. Statist., vol. 23, pp. 493–507, 1952.

[3] X. Chen, “New probabilistic inequalities from monotone likelihood ratio property,” arXiv:1010.3682v1 [math.PR], October 2010.

[4] X. Chen, “A likelihood ratio approach for probabilistic inequalities,” arXiv:1308.4123 [math.PR], August 2013.

[5] X. Chen, "Probabilistic inequalities with applications to machine learning," Proceedings of SPIE Conference, Baltimore, USA, May 2014.

[6] X. Chen, “New optional stopping theorems and maximal inequalities on stochastic processes,” arXiv:1207.3733 [math.PR], July 2012.

[7] P. C. Consul, Generalized Poisson Distributions, Dekker, 1989.

[8] C. G. Esseen, "On the Liapunoff limit of error in the theory of probability," Ark. Mat. Astron. Fys., vol. A28, no. 9, pp. 1–19, 1942.

[9] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions, Chapman and Hall/CRC, 1999.
