arXiv:1408.2754v2 [math.PR] 14 Nov 2014 The Introduction 1 deviations large function variation weig some of show series to variables. conce only are random is which literature Bernoulli variables paper symmetric The random this of of Cram´er [2]). goal transform the A (see variables 3]). the [4, random of monographs contexts i.i.d. general of more sequence a of X oooia pcs(e 5 r[].Let [1]). or [5] (see spaces topological domain f 2011/01/B/ST1/03838. endby defined scle the called is X ∗ : 1 7→ e od:Rdmce eis rme rnfr,Legendre Cram´er transform, series, Rademacher words: Key 00MteaisSbetClassification: Subject 2010 h rme rnfr sthe Cram´er is The transform h uhri upre ytePls ainlSineCne,Gra Center, Science National Polish the by supported is author The X t ulsae By space. dual its Cram´er transform edn ymti enul admvrals(Rademacher variables random Bernoulli symmetric pendent R 7→ f {∞} ∪ of ∗∗ aitoa oml o h rme rnfr fsre of series of Cram´er transform the for formula variational A frv ewl edtegnrlnto fteLgnr-ece tr Legendre-Fenchel the of notion general the need will We r.v. of f R ( ∗ x f ( {∞} ∪ sup = ) i.e. , x eedeFnhltransform Legendre-Fenchel ∗ sup = ) endby defined rme rnfr fRdmce series Rademacher of Cram´er transform x ∗ ∈ nttt fMteais nvriyo Bialystok of University Mathematics, of Institute D x X eafnto nonidentically function a be ∈ ( ∗ X f {h {h = ) ensa defines kdmca2 527Baytk Poland Bialystok, 15-267 2, Akademicka h· ,x x, ,x x, , ·i ∗ { − i ∗ ednt h aoia arn between pairing canonical the denote we x − i ryzo Zajkowski Krzysztof ag eito principles deviation large ∈ [email protected] aefunction rate f eedeFnhltransformLegendre-Fenchel f X ∗ ( ( x x : ∗ ) } ) } Abstract X f sup = sup = ( x eara oal ovxHudr pc and space Hausdorff convex locally real a be ( x ovxconjugate convex 1 ) x ∈D ∗ ∈D < fthe of ( 41,60F10 44A15, f ) ( ∞} {h f ∗ ∞ ) ,x x, {h function A . ag deviations large By . ,x x, ∗ − i ∗ 1 − i svr at(e o instance for (see vast very is D f ( ( of ) f fthe of x f ednt the denote we ) ) ∗ } ( eis sgiven. is series) x f f ∗ ) n function a and uuatgenerating cumulant egtd inde- weighted, ∗ Fnhltransform, -Fenchel ( } o miia means empirical for x td independent hted, : ∗ X X ∈ ( tn.DEC- no. nt ∗ x lfruafor formula al X and 7→ ∈ nn much rning ∗ nfr in ansform ) X R X effective ) {∞} ∪ ∗ Let . f ∗∗ : is called the convex biconjugate of f. The functions f ∗ and f ∗∗ are convex and lower semicontinuous in the weak* and weak topology on X∗ and X, respectively. Moreover, the biconjugate theorem states that the function f : X 7→ R ∪ {∞} not identically equal to +∞ is convex and lower semicontinuous if and only if f = f ∗∗. Let I be a countable set and (ǫi)i∈I be a Bernoulli sequence, i.e. a sequence of i.i.d. 2 2 symmetric r.v’s taking values ±1. For t =(ti)i∈I ∈ ℓ (I) ≡ ℓ the series

Xt := tiǫi Xi∈I converges a.s.. Notice that for t ∈ ℓ1

|Xt| ≤ |ti| = ktk1, Xi∈I i.e. Xt is a bounded r.v. and we can define its cumulant generating function on whole R that is sXt ψt(s) = ln Ee for every s ∈ R. Because (ǫi)i∈I is i.i.d. Bernoulli sequence then

stiǫi ψt(s) = ln Ee Yi∈I esti + e−sti = ln = ln cosh(st ). 2 i Yi∈I Xi∈I Observe that ′ ψt(s)= ti tanh(sti). Xi∈I ∗ We can not derive an evident form of ψt by using the classical Legendre transform ′ because we can not solve (inverse the derivative ψt) the equation

′ ψt(s)= α (1) and find ∗ ψt (α)= αsα − ψt(sα), where sα is a solution of the equation (1). ∗ The following theorem shows some variational expression on ψt .

1 Theorem 1.1. Let (ǫi)i∈I be a Bernoulli sequence and t =(ti)i∈I ∈ ℓ (I). The Cram´er transform of a variable Xt = i∈I tiǫi is given by the following variational formula P ∗ ∗ b ψt (α) = min∗ ψ1( ) b∈D(ψ1 ) Pi∈I tibi=α

2 for α ∈ (−ktk1, ktk1) and +∞ otherwise, where 1 ψ∗(b)= 1+ b ln 1+ b + 1 − b ln 1 − b 1 2 i i i i Xi∈I h    i

1 Xt is the of a functional ψ1 : ℓ 7→ R of the form ψ1(t) = ln Ee and ∗ D(ψ1) ⊂ ℓ∞(I) denotes its effective domain. Remark 1.2. Presented in the next section proof techniques are similar, but not the same, to methods used by Ostaszewska and Zajkowski in [6, 7].

2 Proof of Theorem 1.1

We begin with an observation on the of the cumulant generating func- 1 tion: |ψt(s)|≤|s|ktk1. A parameter t may be an arbitrary element of ℓ . Formally we can define a function ψ of two variables:

sXt 1 ψ(s, t)= ψt(s) = ln Ee for (s, t) ∈ R × ℓ .

Fixing t or s we write ψ(s, t)= ψt(s) or ψ(s, t)= ψs(t), respectively. First we derive ∗ ∗ ∗ ψs and next we show how ψt is expressed by ψs . 1 In a standard way one can check the convexity of ψs for every s ∈ R. Let t, u ∈ ℓ and λ ∈ (0, 1) then

s Pi∈I (λti+(1−λ)ui)ǫi ψs(λt + (1 − λ)u) = ln Ee λ 1−λ = ln E es Pi∈I tiǫi es Pi∈I uiǫi .     Using the H¨older inequality for exponents 1/λ and 1/(1 − λ) we get

λ 1−λ λ 1−λ E es Pi∈I tiǫi es Pi∈I uiǫi ≤ Ees Pi∈I tiǫi Ees Pi∈I uiǫi       and, in consequence,

s Pi∈I tiǫi s Pi∈I uiǫi ψs(λt + (1 − λ)u) ≤ λ ln Ee + (1 − λ) ln Ee

= λψs(t)+(1 − λ)ψs(u).

1 1 ∗ Because ψs : ℓ 7→ R and (ℓ ) ≃ ℓ∞ then ∗ R ψs : ℓ∞ 7→ ∪{+∞}.

Let a =(ai)i∈I ∈ ℓ∞. By the definition of the convex conjugate we have

∗ a t a ψs ( ) = sup h , i − ln cosh(sti) , (2) t 1 ∈ℓ n Xi∈I o

3 t a where h , i = i∈I tiai. Note that forPs = 0 we have 0 if a = 0, ψ∗(a)= 0  +∞ otherwise. Assume now that s =6 0. An expression in the curly bracket of (2), denote it by w, is 1 concave and its partial derivatives along vector of basis ei = (δij)j∈I in ℓ (δij is the Kronecker delta) equal ∂ ∂ w(t)= tiai − ln cosh(sti) = ai − s tanh(sti). ∂ti ∂ti  Xi∈I Xi∈I 

The expression w is a sum of functions with separated variables (ti)i∈I . Concavity of each of these functions implies that the gradient ∇w(t)=(ai − s tanh(sti))i∈I belongs to the subgradient ∂w(t) since

∀u∈ℓ1 w(t) − w(u) ≤ (ti − ui)[ai − s tanh(sti)] = ht − u, ∇w(t)i . Xi∈I The w attained its maximum (global) at the point t if and only if 0 ∈ ∂w(t). It suffices that

∀i∈I ai − s tanh(sti)=0.

1 1+x Because arc tanh(x)= 2 ln 1−x for |x| < 1 then the partial derivatives equal zero when

ai 1 1+ s ai ti = ln ai for < 1. 2s 1 − s s

Substituting the above values of ti’s into (2) we get

∗ a 1 ai ai ai ai ai ψs ( )= 1+ ln 1+ + 1 − ln 1 − for < 1. 2 h s s s s i s Xi∈I     ∗ Look a bit closely at the effective domain of ψs that is at the set

∗ a ∗ a D(ψs )= ∈ l∞ : ψs ( ) < ∞ . n o The function f(x)=(1+ x) ln(1 + x)+(1 − x) ln(1 − x) is even and f(0) = 0. Since lim|x|→1− = 2 ln 2 we can extend its domain to the interval [−1, 1]. One can check that (1 + x) ln(1 + x)+(1 − x) ln(1 − x) ≥ x2. It follows that a a a a 1 1+ i ln 1+ i + 1 − i ln 1 − i ≥ a2 s s s s s2 i Xi∈I h    i Xi∈I

4 and |ai|≤|s|. Let B∞(0; r) denote of the closed ball at the center 0 and radius r in the space ℓ∞. The properties of f gives that ∗ 0 2 D(ψs ) ⊂ B∞( ; |s|) ∩ ℓ . ∗ a ∗ a ∗ Let us note that D(ψs ) is a symmetric set that is ∈D(ψs ) if and only if − ∈D(ψs ). Moreover it is symmetric with respect to each coordinates ai of a. Return to the function ψt. Let us observe that

′ |ψt(s)| = ti tanh(sti) < ktk1

Xi∈I

′ ∗ ′ and lims→±∞ ψt(s)= ±ktk1. It follows D(ψt )= ψt(R)=(−ktk1, ktk1). Because ψt is convex and continuous on R then, by the biconjugate theorem, we get

∗∗ ∗ ψt(s)= ψt (s) = sup αs − ψt (α) . t t α∈(−k k1,k k1)  On the other hand

1 ai ai ai ai ψt(s)= ψs(t) = sup ht, ai − 1+ ln 1+ + 1 − ln 1 − . a ψ∗ 2 s s s s ∈D( s ) n Xi∈I h    io a b ∗ b ∗ b b ∗ If we take = s then ψs (s )= ψ1( ) with ∈ D(ψ1 ). It means that we can rewrite the above variational principle as follows 1 ψt(s) = sup s ht, bi − 1+ bi ln 1+ bi + 1 − bi ln 1 − bi . (3) b ψ∗ 2 ∈D( 1 ) n Xi∈I h    io Take now α = ht, bi. Recall that

sup ht, bi = ktk1. b∈B∞(0;1)

We show that every number in (−ktk1, ktk1) is taken by the inner product ht, bi over ∗ b the set D(ψ1). Observe that a vector = i∈J r(sgn ti)ei, where J is some finite ∗ subset of I and r ∈ [−1, 1], belongs to D(ψ1P) (only finite number of nonzero terms). For this vector we have ht, bi = r |ti|. Xi∈J t b ∗ It follows that the inner product h , i attains over the set D(ψ1) any number belonging to the interval (−ktk1, ktk1). t 1 ∗ For a fixed ∈ ℓ , intersect D(ψ1) ⊂ ℓ∞ with a family of hyperplains

b ∈ ℓ∞ : ht, bi = α . n oα∈(−ktk1,ktk1) 5 Now we can divide the supremum of (3) into two parts and get

t b ∗ b ψt(s) = sup sup s h , i − ψ1 ( ) t t b ∗ α∈(−k k1,k k1) ∈D(ψ1 ) n o ht,bi=α = sup sα − inf ψ∗(b) . (4) b ∗ 1 α∈(−ktk1,ktk1) n ∈D(ψ1 ) o ht,bi=α

Define a function ∗ t b ϕ (α) = inf ∗ ψ1 ( ). b∈D(ψ1 ) ht,bi=α ∗ We prove that in the above definition of function ϕt an infimum over the set D(ψ1)∩ {b ∈ ℓ∞ : ht, bi = α} is attained and we can replace it by a minimum over this set that is we prove ∗ t b ϕ (α) = min∗ ψ1( ) (5) b∈D(ψ1 ) ht,bi=α for α ∈ (−ktk1, ktk1) and +∞ otherwise. 1 ∗ By Banach-Alaoglu theorem the closed (unit) ball B∞(0;1) ⊂ ℓ∞ ≃ (ℓ ) is weak* compact and for each t and α ∈ (−ktk1, ktk1) the hyperplain Ht,α = {b ∈ ℓ∞ : ht, bi = α} is closed in this topology. We have that an intersection B∞(0;1) ∩ Ht,α is weak* compact. Let ℓ0 be the space of sequences with finite support. Obviously 0 ∗ ℓ0 ∩ B∞( ;1) ⊂D(ψ1) and Ht,α ∩ ℓ0 =6 ∅. We have

1 ∗ 0 ∀t∈ℓ ∀α∈(−ktk1,ktk1) D(ψ1) ∩ Ht,α ⊃ B∞( ;1) ∩ Ht,α ∩ ℓ0 =6 ∅.

∗ Recall that the function ψ1 is nonegative and lower semicontinuous in the weak* topol- ∗ 0 ogy. By Weierstrass Theorem ψ1 attains its minimum in the compact set B∞( ;1) ∩ ∗ Ht,α. Because an intersection of this set with the effective domain of ψ1 is nonempty ∗ then it means that a nonegative infimum is attained at some element in D(ψ1). It follows that in the definition of ϕt we can replace the infimum by minimum and the formula (5) holds. The formula (4) means that ψt is the convex conjugate of ϕt. To prove an equality ∗ ϕt = ψt we should show that ϕt is convex and lower semicontinuous. First we check the convexity of ϕt. Take α1, α2 ∈ (−ktk1, ktk1). If α1 or α2 do not belong to the interval (−ktk1, ktk1) then the value of ϕt at such αk equals ∞ and the b ∗ condition of convexity is trivially satisfied. Let k (k =1, 2) be vectors in D(ψ1)∩Ht,αk such that ∗ ∗ t b b ϕ (αk) = min∗ ψ1( )= ψ1 ( k). b∈D(ψ1 ) ht,bi=αk

6 Observe that for λ ∈ (0, 1)

ht,λb1 + (1 − λ)b2i = λ ht, b1i + (1 − λ) ht, b2i = λα1 + (1 − λ)α2, b b ∗ that is λ 1 + (1 − λ) 2 ∈ Ht,λα1+(1−λ)α2 . The above and convexity of ψ1 gives ∗ b b ϕt(λα1 + (1 − λ)α2) ≤ ψ1 (λ 1 + (1 − λ) 2) ∗ b ∗ b ≤ λψ1 ( 1)+(1 − λ)ψ1( 2)= λϕt(α1)+(1 − λ)ϕt(α2). ∗ Now we prove the lower semicontinuity of ϕt. Recall that ψ1 is convex and lower semicontinuous in the weak* topology on ℓ∞. It means that for any c ∈ R the set b ∗ b { ∈ ℓ∞ : ψ1( ) ≤ c} (6) ∗ is weak* closed. Since ψ1 ≥ 0 we can assume that c ≥ 0. Because the above set is 0 ∗ contained in weak* compact unit ball B∞( ;1) ⊃D(ψ1) then it is also compact in this topology. Consider a range of the set (6) by the functional lt := ht, ·i, i.e.

∗ b lt ψ1( ) ≤ c . (7)   1 Since for each t ∈ ℓ the linear functional lt is continuous on ℓ∞ (also in the weak* topology), by the intermediate and extreme value theorems we get that the set (7) is a closed interval. By symmetry of the set (6) and linearity of the functional lt we get the existence of a α such that ∗ b lt ψ1( ) ≤ c = [−α,α].   We show that −1 ϕt ((−∞,c]) = [−α,α]. −1 ∗ b Let β ∈ ϕt ((−∞,c]). Since ψ1 is lower semicontinuous, there exists β such that ∗ ∗ t b b c ≥ ϕ (β) = min∗ ψ1( )= ψ1( β). b∈D(ψ1 ) ht,bi=β

That is ht, bβi = β ∈ [−α,α]. Conversely, let β ∈ [−α,α]. Since lt = ht, ·i is continuous ∗ b b′ ∗ b on the connected set {ψ1( ) ≤ c}, there is β ∈{ψ1 ( ) ≤ c} such that t b′ , β = β.

Note that ∗ ∗ ′ t b b ϕ (β) = min∗ ψ1 ( ) ≤ ψ1( β) ≤ c, b∈D(ψ1 ) ht,bi=β −1 that is β ∈ ϕt ((−∞,c]). ∗ Because ϕt is convex and lower semicontinuos then ψt = ϕt, which completes the proof.

7 Remark 2.1. The result of Theorem 1.1 is similar to those obtained by the contraction principle (see for instance [3]) but let us emphasize that we used the space of parameters ℓ1 to generate the convex conjugate of the investigated function and we did not consider any probability distribution on it.

Remark 2.2. Let us stress that the proof of Theorem 1.1 contains some scheme which allow us to generate, under some assumptions of course, variational formulas on the Cram´er transform for another series of random variables.

References

[1] V. Barbu, T. Precupanu, Convexity and Optimization in Banach Spaces, 4th ed., Springer Monographs in Mathematics, Springer, Dordrecht, 2012.

[2] H. Cram´er, Sur un nouveau th´eor`eme-limite de la th´eorie des probabilit´es, Actualit´es Scientifiques et Industrielles 736 (1938), 5-23. Colloque consacr´e `ala th´eorie des probabilit´es, Vol. 3, Hermann, Paris.

[3] A. Dembo, O. Zeitouni, Large Deviations Techniques and Applications. Corrected reprints of the second (1998) edition, Stochastic Modeling and Applied Probability, 38, Springer-Verlag, Berlin, 2010.

[4] J. D. Deuschel, D. W. Stroock. Large Deviations. Pure and Applied Mathematics, 137, Academic Press, Inc., Boston, 1989.

[5] I. Ekeland, R. T´emam, and Variational Problems, Translated from French. Corrected reprint of the 1976 English edition. Classics in Applied Mathe- matics, 28, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1999.

[6] U. Ostaszewska, K. Zajkowski, Cram´er transform and t-entropy, Positivity 18 (2014), no. 2, 347-358.

[7] K. Zajkowski, Convex conjugates of analytic functions of logarithmically convex functional, J. Convex Anal. 20 (2013), no. 1, 243-252.

8