arXiv:1806.09062v3 [math.FA] 12 Jun 2019

A SIMPLIFIED AND UNIFIED GENERALIZATION OF SOME MAJORIZATION RESULTS

SHIRIN MOEIN$^{1,2}$, RAJESH PEREIRA$^{2}$, AND SARAH PLOSKER$^{3,2}$

Abstract. We consider positive, integral-preserving linear operators acting on $L^1$ space, known as stochastic operators or Markov operators. We show that, on finite-dimensional spaces, any stochastic operator can be approximated by a sequence of stochastic integral operators (such operators arise naturally when considering matrix majorization in $L^1$). We collect a number of results for vector-valued functions on $L^1$, simplifying some proofs found in the literature. In particular, matrix majorization and multivariate majorization are related in $\mathbb{R}^n$. In $\mathbb{R}$, these are also equivalent to convex function inequalities.

2010 Mathematics Subject Classification. 15B51, 26B25, 26D15, 47B65.

Key words and phrases. matrix majorization; multivariate majorization; sublinear functionals; convex functionals; stochastic operators; Markov operators; vector-valued measurable functions.

$^1$ Department of Mathematical Sciences, Isfahan University of Technology, Isfahan 84156-83111, Iran.
$^2$ Department of Mathematics and Statistics, University of Guelph, Guelph, ON N1G 2W1, Canada.
$^3$ Department of Mathematics and Computer Science, Brandon University, Brandon, MB R7A 6A9, Canada.

1. Introduction

In this work, we connect several generalizations of majorization in the literature, notably, matrix majorization, multivariate majorization, mixing distance, f-divergence, and coarse graining. While some results are known, they appear rather obscure in the literature; we also simplify arguments when possible.

We first recall the definition of (vector) majorization: if $x, y \in \mathbb{R}^n$, we say that $x$ is majorized by $y$, denoted $x \prec y$, if
\[ \sum_{j=1}^{k} x_j^{\downarrow} \le \sum_{j=1}^{k} y_j^{\downarrow} \quad \forall k \in \{1, 2, \dots, n-1\}, \]
with equality when $k = n$, where $x^{\downarrow}$ denotes $x$ reordered so that $x_1^{\downarrow} \ge x_2^{\downarrow} \ge \cdots \ge x_n^{\downarrow}$ (and similarly for $y^{\downarrow}$). A well-known theorem of Hardy, Littlewood, and Pólya states that $x \prec y$ is equivalent to the existence of a doubly stochastic matrix $S$ such that $x = Sy$ [11, Theorem 8].

Consider now two matrices $R \in M_{m \times n}(\mathbb{R})$ and $T \in M_{p \times n}(\mathbb{R})$. We say that $R$ is matrix majorized by $T$, denoted $R \prec T$ (where it is clear from context that this is matrix majorization rather than vector majorization, although matrix majorization is sometimes denoted $\prec_S$ to distinguish it from vector majorization), if there exists a column stochastic matrix $S \in M_{m \times p}(\mathbb{R})$ such that $R = ST$. For more information on matrix majorization, see [8]; when we restrict ourselves to the special case where $m = p$ and $S$ is doubly stochastic we get a more restrictive ordering called multivariate majorization, see [18, Chapter 15]. Matrix majorization has recently been generalized to quantum majorization between bipartite states [10].
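As an informal illustration of the vector majorization recalled above, the following Python sketch checks the partial-sum inequalities directly and tests them on $x = Sy$ for one doubly stochastic matrix. The function names, the tolerance, and the sample matrix are our own illustrative choices and do not come from the paper.

```python
def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y: the partial sums of the decreasing
    rearrangements satisfy the inequalities, with equality for the full sum."""
    if len(x) != len(y):
        return False
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    partial_x = partial_y = 0.0
    for k, (a, b) in enumerate(zip(xs, ys), start=1):
        partial_x += a
        partial_y += b
        # Inequality is required for k = 1, ..., n-1 only.
        if k < len(xs) and partial_x > partial_y + tol:
            return False
    # Equality is required for k = n.
    return abs(partial_x - partial_y) <= tol


def apply_matrix(S, y):
    """Compute the matrix-vector product S y for a matrix given as a list of rows."""
    return [sum(s * v for s, v in zip(row, y)) for row in S]


# Hypothetical doubly stochastic matrix: every row and every column sums to 1.
S = [[0.50, 0.50, 0.00],
     [0.25, 0.25, 0.50],
     [0.25, 0.25, 0.50]]
y = [4.0, 1.0, 1.0]
x = apply_matrix(S, y)

print(x, is_majorized(x, y), is_majorized(y, x))
```

In this example $x = Sy$ is majorized by $y$ but not conversely, in line with the Hardy, Littlewood, and Pólya theorem.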
We denote by $L^1(X,\mu)$, or simply $L^1(X)$ if the measure $\mu$ is clear from context, the set of all measurable functions $f$ satisfying $\int_X |f|\,d\mu < \infty$. If $f \in L^1(X)$, the distribution function of $f$ is defined by $d_f(s) = \mu(\{x : f(x) > s\})$ for all real $s$, and the decreasing rearrangement of $f$ is defined by
\[ f^{\downarrow}(t) = \inf\{s : d_f(s) \le t\} = \sup\{s : d_f(s) > t\}, \quad 0 \le t \le \mu(X). \]

We are now in the position to define continuous majorization. Typically the word "continuous" is dropped as it is clear from context.

Definition 1.1. Let $(X,\mu)$ and $(Y,\nu)$ be finite measure spaces for which $a = \mu(X) = \nu(Y)$. If $f \in L^1(X,\mu)$ and $g \in L^1(Y,\nu)$ satisfy
\[ \int_0^t f^{\downarrow}\,dx \le \int_0^t g^{\downarrow}\,dx \quad \forall t : 0 \le t \le a \qquad \text{and} \qquad \int_0^a f^{\downarrow}\,dx = \int_0^a g^{\downarrow}\,dx, \]
where the integration is with respect to Lebesgue measure, then we say that $f$ is majorized by $g$, denoted $f \prec g$.

Following [8], we define the positive homogeneous subadditive functionals on $\mathbb{R}^n$, also called sublinear functionals, to be all functionals $\psi$ satisfying $\psi(\lambda x) = \lambda\psi(x)$ and $\psi(x + y) \le \psi(x) + \psi(y)$ for all $x, y \in \mathbb{R}^n$ and $\lambda \ge 0$. Part of [8, Theorem 3.3] shows that if $R \in M_{m \times n}(\mathbb{R})$ and $T \in M_{p \times n}(\mathbb{R})$, then $R \prec T$ is equivalent to $\sum_{j=1}^{m} \psi(r_j) \le \sum_{j=1}^{p} \psi(t_j)$ for all sublinear functionals $\psi$, where $r_j$ is the $j$th row of the matrix $R$, and similarly for $t_j$.

Given a measure space $(X,\mu)$, let $L^1(X,\mu,\mathbb{R}^n)$, or simply $L^1(X,\mathbb{R}^n)$, denote the set of all measurable functions $f$ from $(X,\mu)$ to $\mathbb{R}^n$ that satisfy $\int_X |f|\,d\mu < \infty$, where $|f|(x) = \sum_{k=1}^{n} |f_k(x)|$.

The notion of a stochastic matrix was generalized to a stochastic operator on $L^1([0,1])$ in [23]; we provide the corresponding definition for a stochastic operator from $L^1(Y,\nu)$ to $L^1(X,\mu)$. Such an operator is sometimes referred to as a Markov operator in the literature [16].

Definition 1.2. Let $(X,\mu)$ and $(Y,\nu)$ be σ-finite measure spaces. A linear operator $S : L^1(Y) \to L^1(X)$ is called a stochastic operator if
(1) $S$ is positive (that is, $S$ takes positive elements to positive elements), and
(2) $\int_X Sf\,d\mu = \int_Y f\,d\nu$ for all $f \in L^1(Y)$.
Moreover, if in addition to the two conditions above, $\mu(X) = \nu(Y) < \infty$ and $S1 = 1$, then $S$ is called a doubly stochastic operator.

We have the following lemma which will be useful later on.

Lemma 1.3. Let $(X,\mu)$ and $(Y,\nu)$ be σ-finite measure spaces. Let $f \in L^1(Y)$ and let $S : L^1(Y) \to L^1(X)$ be a stochastic operator. Then
\[ \int_X |(Sf)(t)|\,d\mu(t) \le \int_Y |f(y)|\,d\nu(y). \]

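Proof. Let $f_+(y) = \max(f(y), 0)$ and $f_-(y) = \max(-f(y), 0)$, so that $f = f_+ - f_-$ and $|f| = f_+ + f_-$. Since $S$ is linear and positive, $|Sf| = |S(f_+) - S(f_-)| \le S(f_+) + S(f_-)$. Then
\[ \int_X |Sf(x)|\,d\mu(x) \le \int_X S(f_+)(x)\,d\mu(x) + \int_X S(f_-)(x)\,d\mu(x) = \int_Y f_+(y)\,d\nu(y) + \int_Y f_-(y)\,d\nu(y) = \int_Y |f(y)|\,d\nu(y). \]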

Note that the absolute value function is a nonnegative sublinear functional on $\mathbb{R}$; we will later show that a similar inequality holds for all nonnegative sublinear functionals on $\mathbb{R}^n$. We first need to describe how $S$ acts on an element of $L^1(Y,\mathbb{R}^n)$. If $f = (f_1, f_2, \dots, f_n) \in L^1(Y,\mathbb{R}^n)$, then $S$ acts componentwise on $f$; that is, $Sf = (Sf_1, Sf_2, \dots, Sf_n)$.
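To make Lemma 1.3 and the componentwise action concrete, here is a minimal Python sketch in the simplest setting we could think of: finite sets with counting measure, where a stochastic operator is just a nonnegative matrix whose columns sum to 1. The matrix and the test functions are hypothetical examples, not data from the paper.

```python
def apply_stochastic(S, f):
    """Apply the operator (Sf)(x) = sum_y S[x][y] * f[y] on finite sets."""
    return [sum(s * v for s, v in zip(row, f)) for row in S]


def l1_norm(f):
    """L^1 norm with respect to counting measure."""
    return sum(abs(v) for v in f)


# Hypothetical stochastic operator from L^1(Y) to L^1(X) with |Y| = 3, |X| = 2:
# entries are nonnegative and each column sums to 1 (integral preservation).
S = [[0.5, 0.0, 1.0],
     [0.5, 1.0, 0.0]]

f = [3.0, -2.0, -1.0]
Sf = apply_stochastic(S, f)

# Vector-valued case: S acts on each component separately.
F = [[3.0, -2.0, -1.0], [1.0, 1.0, 1.0]]  # components f_1, f_2
SF = [apply_stochastic(S, component) for component in F]

print(Sf, l1_norm(Sf), l1_norm(f))
```

Here the integral is preserved while the $L^1$ norm drops, because positive and negative parts of $f$ are allowed to cancel under $S$.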

Definition 1.4. Let $(X,\mu)$ be a σ-finite measure space and let $P = \{E_i\}_{i \in \mathbb{N}}$ be a partition of $X$ into disjoint measurable sets of finite measure. We define $M_P$ to be the operator which maps every $f \in L^1(X)$ to $\sum_{i \in \mathbb{N}} a_i \chi_{E_i}$, where
\[ a_i = \frac{1}{\mu(E_i)} \int_{E_i} f(x)\,d\mu(x) \]
if $E_i$ has positive measure and $a_i = 0$ if $E_i$ is of measure zero.

We note that it is easy to verify that $M_P$ in Definition 1.4 is a stochastic operator on $L^1(X)$; in fact, it maps $1 \mapsto 1$ and is therefore a doubly stochastic operator.

Let $K$ be a convex set. A function $f : K \to K$ is affine if $f(\lambda x + (1 - \lambda)y) = \lambda f(x) + (1 - \lambda)f(y)$ for all $x, y \in K$ and all $\lambda \in (0, 1)$. We note that affine functions on the nonnegative face of the unit ball of $L^1$ are exactly the stochastic operators. Affine transformations on measure spaces are used to define coarse graining, a relation on the measurement statistics coming from two positive operator valued measures [2, 3, 13, 14, 25]. A stochastic operator is an affine transformation between nonnegative faces of the unit balls in the respective measure spaces. Note that the $L^1$ norm is one of the few norms where the nonnegative elements of the unit ball form a face.

The following theorem is a combination of two well-known results in the literature.

Theorem 1.5. If $f \in L^1(X,\mu)$, $g \in L^1(Y,\nu)$ where $\mu(X) = \nu(Y) < \infty$, then the following are equivalent:
(1) $f \prec g$, as in Definition 1.1.
(2) For all convex functions $\varphi : \mathbb{R} \to \mathbb{R}$,
\[ \int_X \varphi(f)\,d\mu \le \int_Y \varphi(g)\,d\nu. \]
(3) There exists a doubly stochastic operator $D : L^1(Y) \to L^1(X)$ such that $f = Dg$.

Proof. Chong [7, Theorem 2.5] proved the equivalence of (1) and (2), and Day [9, Theorem 4.9] proved the equivalence of (2) and (3).
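The averaging operator $M_P$ of Definition 1.4 gives an easy instance of the implication (3) ⇒ (2) in Theorem 1.5. The sketch below works on a finite set with positive weights standing in for the measure; the weights, the partition, and the two convex test functions are illustrative assumptions of ours.

```python
def averaging_operator(f, mu, blocks):
    """Discrete analogue of M_P: replace f on each block by its mu-weighted mean."""
    out = [0.0] * len(f)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        mean = sum(f[i] * mu[i] for i in block) / mass if mass > 0 else 0.0
        for i in block:
            out[i] = mean
    return out


def integral(values, mu):
    """Integral of a function on a finite set with weights mu."""
    return sum(v * w for v, w in zip(values, mu))


mu = [1.0, 2.0, 1.0, 4.0]          # hypothetical finite measure on 4 points
blocks = [[0, 1], [2, 3]]          # hypothetical partition P
g = [3.0, 0.0, 5.0, 0.0]
f = averaging_operator(g, mu, blocks)            # f = M_P g
ones = averaging_operator([1.0] * 4, mu, blocks)  # M_P 1

square = [v * v for v in f], [v * v for v in g]
shifted_abs = [abs(v - 2.0) for v in f], [abs(v - 2.0) for v in g]

print(f, integral(square[0], mu), integral(square[1], mu))
```

In this example $f = M_P g$ is constant, the total integral is unchanged, and both convex integrals are smaller for $f$ than for $g$.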

Of particular interest are the integral operators which are stochastic or doubly stochastic.

Definition 1.6. A stochastic kernel is a measurable function $S : X \times Y \to [0, \infty)$ such that $\int_X S(x, y)\,d\mu(x) = 1$ for almost all $y \in Y$. A doubly stochastic kernel is a stochastic kernel with the additional property that $\int_Y S(x, y)\,d\nu(y) = 1$ for almost all $x \in X$.

Definition 1.7. An integral operator $M$ from $L^1(Y)$ to $L^1(X)$ given by
\[ (Mg)(x) = \int_Y S(x, y)g(y)\,d\nu(y) \]
is said to be a stochastic integral operator (resp. doubly stochastic integral operator) if $S(x, y)$ is a stochastic kernel (resp. doubly stochastic kernel).

All stochastic integral operators are stochastic operators, and all doubly stochastic integral operators are doubly stochastic operators. However, the converse of either statement is false in general. Indeed, consider the identity operator (for instance on $L^1([0,1])$ with Lebesgue measure), which is a doubly stochastic operator but is neither a doubly stochastic integral operator nor a stochastic integral operator.

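2. Convex function inequalities

We now discuss some properties of convex and sublinear functionals.

Proposition 2.1. Let $K$ be a convex cone of $\mathbb{R}^n$. Let $\varphi : K \to \mathbb{R}$ be a convex functional. Then $\psi(v, x) := x\varphi(\frac{v}{x})$ is a sublinear functional on the cone $K \times (0, \infty)$.

Proof. For $\lambda > 0$, we have $\psi(\lambda(v, x)) = \lambda x\varphi(\frac{\lambda v}{\lambda x}) = \lambda x\varphi(\frac{v}{x}) = \lambda\psi(v, x)$. Next, let $v_1, v_2 \in K$ and $x_1, x_2 \in (0, \infty)$. Then by convexity of $\varphi$ we have
\[ \varphi\left(\frac{v_1 + v_2}{x_1 + x_2}\right) = \varphi\left(\frac{x_1}{x_1 + x_2}\cdot\frac{v_1}{x_1} + \frac{x_2}{x_1 + x_2}\cdot\frac{v_2}{x_2}\right) \le \frac{x_1}{x_1 + x_2}\,\varphi\left(\frac{v_1}{x_1}\right) + \frac{x_2}{x_1 + x_2}\,\varphi\left(\frac{v_2}{x_2}\right). \]
Multiplying both sides by $x_1 + x_2$ gives $\psi(v_1 + v_2, x_1 + x_2) \le \psi(v_1, x_1) + \psi(v_2, x_2)$. It follows that $\psi$ is a sublinear functional on $K \times (0, \infty)$.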
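A quick numerical sanity check of Proposition 2.1 in the scalar case $K = \mathbb{R}$ may help. The sketch below builds $\psi(v, x) = x\varphi(v/x)$ for one convex function and tests positive homogeneity and subadditivity on a small grid; the choice $\varphi(v) = v^2 + 1$, the grid, and the helper name `perspective` are ours.

```python
def perspective(phi):
    """Return psi(v, x) = x * phi(v / x), defined for x > 0."""
    def psi(v, x):
        if x <= 0:
            raise ValueError("x must be positive")
        return x * phi(v / x)
    return psi


def phi(v):
    """A convex function on the real line, chosen only for illustration."""
    return v * v + 1.0


psi = perspective(phi)

# Small grid of test points (v, x) in R x (0, infinity).
points = [(v, x) for v in (-2.0, -1.0, 0.0, 1.0, 3.0) for x in (0.5, 1.0, 2.0)]

homogeneous = all(
    abs(psi(2.5 * v, 2.5 * x) - 2.5 * psi(v, x)) < 1e-9 for v, x in points
)
subadditive = all(
    psi(v1 + v2, x1 + x2) <= psi(v1, x1) + psi(v2, x2) + 1e-9
    for v1, x1 in points
    for v2, x2 in points
)

print(homogeneous, subadditive)
```

A finite grid proves nothing, of course; it only illustrates the two properties established in the proof.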

Due to issues with convergence, we have avoided considering $x = 0$ in the above proposition, whence the restriction to $(0, \infty)$. However, one can consider $x \to 0^+$ to obtain the recession function $\varphi_\infty$. Note that the converse of the above proposition is immediate; any sublinear functional on a convex set is automatically convex on that set.

Proposition 2.2. Let $K$ be a convex subset of $\mathbb{R}^n$. For any continuous convex nonnegative functional $\varphi$ on $K$, there exists an increasing sequence of Lipschitz convex nonnegative functionals $\{\varphi_k\}_{k=1}^{\infty}$ that converges pointwise to it on $K$. If further, $K$ is a convex cone and $\varphi$ is sublinear, then $\{\varphi_k\}_{k=1}^{\infty}$ can be taken to be sublinear.

If $K$ is a closed convex cone in $\mathbb{R}^n$, then in fact every continuous sublinear functional $\varphi : K \to \mathbb{R}$ is Lipschitz.

To prove Theorem 3.3, we require the generalization of Jensen's inequality to the multivariate case; see [18, Proposition 16.C.1].

Theorem 2.3 (multivariate Jensen's inequality). Let $(X,\mu)$ be a probability measure space. Let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $f \in L^1(X,\mathbb{R}^n)$. Then
\[ \varphi\left(\int_X f(x)\,d\mu(x)\right) \le \int_X \varphi(f(x))\,d\mu(x). \]

If $\varphi$ is a sublinear function, we no longer require the measure to be a probability measure.
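The difference between the convex and the sublinear case can be seen on a finite set with weights that do not sum to 1. The following sketch is illustrative only: the Euclidean norm as the sublinear functional, the weights, and the sample values are all our own assumptions.

```python
import math


def weighted_integral(values, mu):
    """Componentwise integral of an R^n-valued function on a finite set."""
    n = len(values[0])
    return [sum(w * v[k] for v, w in zip(values, mu)) for k in range(n)]


def euclidean_norm(v):
    """A sublinear functional on R^2."""
    return math.hypot(v[0], v[1])


# Hypothetical non-probability measure (total mass 5.5) and R^2-valued function.
mu = [0.5, 3.0, 2.0]
f = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]

sublinear_lhs = euclidean_norm(weighted_integral(f, mu))
sublinear_rhs = sum(w * euclidean_norm(v) for v, w in zip(f, mu))

# For a convex function that is not sublinear, the inequality can fail
# once the total mass differs from 1: one point of mass 2, f = 1, phi(t) = t^2.
convex_lhs = (2.0 * 1.0) ** 2
convex_rhs = 2.0 * (1.0 ** 2)

print(sublinear_lhs <= sublinear_rhs, convex_lhs <= convex_rhs)
```

The first comparison holds and the second fails, which is why the probability assumption can be dropped only for sublinear functionals.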

Theorem 2.4 (Roselli-Willem inequality). [21, Theorem 6] Let $(X,\mu)$ be an arbitrary measure space. Let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be a sublinear function and let $f \in L^1(X,\mathbb{R}^n)$. Then
\[ \varphi\left(\int_X f(x)\,d\mu(x)\right) \le \int_X \varphi(f(x))\,d\mu(x). \]

The related concept of f-divergence (which we call φ-divergence since $\varphi$ is convex) was introduced by Csiszár [6] and was studied extensively by statisticians [5, Chapter 2] and [4, 15, 17, 20, 24]. Similar concepts with different names have also appeared in the Physics literature [12, Chapter 6] and [19, 25].

Definition 2.5. Let $(X,\mu)$ be a measure space and $V$ be a real vector space. Let $\varphi$ be a real valued convex function on $V$. Let $f : X \to V$ and let $h : X \to (0, \infty)$ with all functions being measurable. Then the φ-divergence of $f$ with respect to $h$ is $\int_X h\,\varphi(\frac{f}{h})\,d\mu$.

The following result is useful in relating φ-divergence inequalities with sublinear functional integral inequalities:

Theorem 2.6. Let $(X,\mu)$ and $(Y,\nu)$ be measure spaces, $f \in L^1(X,K)$, $g \in L^1(Y,K)$, $h \in L^1(X,(0,\infty))$ and $k \in L^1(Y,(0,\infty))$, where $K$ is a convex cone. Define $F : X \to K \times \mathbb{R}$ and $G : Y \to K \times \mathbb{R}$ as $F(x) = (f(x), h(x))$ and $G(y) = (g(y), k(y))$. The following are equivalent.
(1) The φ-divergence of $f$ with respect to $h$ is less than or equal to that of $g$ with respect to $k$; that is, $\int_X \varphi(\frac{f}{h})h\,d\mu \le \int_Y \varphi(\frac{g}{k})k\,d\nu$, for all convex functions $\varphi : K \to \mathbb{R}$.
(2) $\int_X \psi(F(x))\,d\mu(x) \le \int_Y \psi(G(y))\,d\nu(y)$ for all sublinear functionals $\psi : K \times (0, \infty) \to \mathbb{R}$.

Proof. (1) ⇒ (2): Let $\psi(v, t) : K \times (0, \infty) \to \mathbb{R}$ be a sublinear functional. Let $\varphi(v) = \psi(v, 1)$ for all $v \in K$. Then $\varphi$ is a real valued convex function on $K$. By the positive homogeneity of $\psi$ we have $\psi(v, t) = t\,\varphi(\frac{v}{t})$, and hence
\[ \int_X \psi(F(x))\,d\mu(x) = \int_X \varphi\left(\frac{f(x)}{h(x)}\right)h(x)\,d\mu(x) \le \int_Y \varphi\left(\frac{g(y)}{k(y)}\right)k(y)\,d\nu(y) = \int_Y \psi(G(y))\,d\nu(y). \]
(2) ⇒ (1): This implication is a straightforward application of Proposition 2.1. Let $\varphi$ be any real-valued convex functional on $K$. Then $\psi(v, x) = x\varphi(\frac{v}{x})$ is a sublinear functional on $K \times (0, \infty)$, and we have
\[ \int_X \varphi\left(\frac{f(x)}{h(x)}\right)h(x)\,d\mu(x) = \int_X \psi(f(x), h(x))\,d\mu(x) \le \int_Y \psi(g(y), k(y))\,d\nu(y) = \int_Y \varphi\left(\frac{g(y)}{k(y)}\right)k(y)\,d\nu(y). \]

3. A generalization of matrix majorization

With the notion of a stochastic operator from $L^1(Y,\nu)$ to $L^1(X,\mu)$, we can generalize the definition of matrix majorization:

Definition 3.1. Let $(X,\mu)$ and $(Y,\nu)$ be measure spaces. Let $f = (f_1, f_2, \dots, f_n) \in L^1(X,\mathbb{R}^n)$ and $g = (g_1, g_2, \dots, g_n) \in L^1(Y,\mathbb{R}^n)$. Then we say that $f$ is matrix majorized by $g$, denoted $f \prec_M g$, if there exists a stochastic operator $S$ such that $f = S(g)$; i.e., $f_k = Sg_k$ for all $k = 1, \dots, n$.

It is straightforward to check that matrix majorization between measurable functions in $L^1$ is a reflexive, transitive relation and therefore is a preorder, which generalizes the same result in [8, Theorem 3.3] for matrix majorization on matrices. The term matrix majorization was coined by Dahl [8]. To see that our formulation is a generalization of Dahl's, we now restrict ourselves to the special case where $X = \{1, 2, \dots, m\}$ and $Y = \{1, 2, \dots, p\}$ are finite sets and $\mu$ and $\nu$ are counting measures. We can represent each function $f = (f_1, f_2, \dots, f_n) \in L^1(X,\mathbb{R}^n)$ as an $n$ by $m$ matrix $A_f$ whose $k$th column is $f(k)$. We can see that $f$ is matrix majorized by $g$ precisely when there exists a row stochastic matrix $S$ such that $A_f = A_g S$; the latter is Dahl's formulation of matrix majorization on matrices. Part of [8, Theorem 3.3] can now be rephrased as follows:

Theorem 3.2. Let $X = \{1, 2, \dots, m\}$, $Y = \{1, 2, \dots, p\}$, $f \in L^1(X,\mathbb{R}^n)$ and $g \in L^1(Y,\mathbb{R}^n)$. Then $f$ is matrix majorized by $g$ if and only if $\sum_{k=1}^{m} \varphi(f(k)) \le \sum_{k=1}^{p} \varphi(g(k))$ for all sublinear functionals $\varphi : \mathbb{R}^n \to \mathbb{R}$.

This suggests the following one-sided extension to the general case.

Theorem 3.3. Let $(X,\mu)$ and $(Y,\nu)$ be measure spaces, $f \in L^1(X,\mathbb{R}^n)$ and $g \in L^1(Y,\mathbb{R}^n)$. If there exists a stochastic kernel $S(x, y) : X \times Y \to [0, \infty)$ such that $f(x) = \int_Y S(x, y)g(y)\,d\nu(y)$, then $\int_X \varphi(f(x))\,d\mu(x) \le \int_Y \varphi(g(y))\,d\nu(y)$ for all sublinear functionals $\varphi : \mathbb{R}^n \to \mathbb{R}$.

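Proof. Suppose there exists a stochastic kernel $S(x, y) : X \times Y \to [0, \infty)$ such that $f(x) = \int_Y S(x, y)g(y)\,d\nu(y)$. Let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be sublinear. Using Theorem 2.4, the positive homogeneity of $\varphi$ (recall that $S(x, y) \ge 0$), and Fubini's Theorem, we obtain
\begin{align*}
\int_X \varphi(f(x))\,d\mu(x) &= \int_X \varphi\left(\int_Y S(x, y)g(y)\,d\nu(y)\right)d\mu(x) \\
&\le \int_X \int_Y \varphi(S(x, y)g(y))\,d\nu(y)\,d\mu(x) \\
&= \int_X \int_Y S(x, y)\varphi(g(y))\,d\nu(y)\,d\mu(x) \\
&= \int_Y \left(\int_X S(x, y)\,d\mu(x)\right)\varphi(g(y))\,d\nu(y) \\
&= \int_Y \varphi(g(y))\,d\nu(y),
\end{align*}
as desired.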
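The chain of inequalities above can be replayed on finite sets, where integrals become weighted sums. The sketch below is a toy instance under our own assumptions: the weights `mu` and `nu`, the kernel `S`, the function `g`, and the two sublinear functionals are hypothetical choices, picked so that $\sum_x S(x, y)\mu(x) = 1$ for every $y$.

```python
def integral_operator(S, g, nu):
    """f(x) = sum_y S[x][y] * g[y] * nu[y], for an R^n-valued g on a finite set."""
    n = len(g[0])
    return [
        tuple(sum(S[x][y] * g[y][k] * nu[y] for y in range(len(g))) for k in range(n))
        for x in range(len(S))
    ]


def integrate(phi, values, weights):
    """Weighted sum of phi over the values of a function on a finite set."""
    return sum(w * phi(v) for v, w in zip(values, weights))


mu = [2.0, 1.0]            # weights on X (2 points)
nu = [1.0, 0.5, 2.0]       # weights on Y (3 points)
S = [[0.25, 0.5, 0.0],     # kernel with sum_x S[x][y] * mu[x] == 1 for each y
     [0.50, 0.0, 1.0]]
g = [(1.0, -1.0), (2.0, 0.0), (-1.0, 3.0)]
f = integral_operator(S, g, nu)

kernel_is_stochastic = all(
    abs(sum(S[x][y] * mu[x] for x in range(len(mu))) - 1.0) < 1e-12
    for y in range(len(nu))
)


def l1(v):
    """Sublinear functional: the l^1 norm on R^2."""
    return abs(v[0]) + abs(v[1])


def pos_max(v):
    """Sublinear functional: max(v_1, v_2, 0)."""
    return max(v[0], v[1], 0.0)


for phi in (l1, pos_max):
    print(phi.__name__, integrate(phi, f, mu), integrate(phi, g, nu))
```

For both functionals the integral over $X$ of $\varphi(f)$ does not exceed the integral over $Y$ of $\varphi(g)$, as Theorem 3.3 predicts.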

Note that, if we take $X = Y = [0, 1]$ in Theorem 3.3, then this is nearly the definition of mixing distance [22, Definition 1a], except that the authors of [22] take $\varphi$ to be any convex function, or a member of certain subsets thereof, whereas we are working with sublinear (positively homogeneous convex) functionals in accordance with [8]. This similarity hints at a connection between matrix majorization and the mixing distance.

Lemma 3.4. Let $(X,\mu)$ be a σ-finite measure space and let $f \in L^1(X)$. Then there exists a sequence of partitions $\{P_n\}_{n=1}^{\infty}$ of $X$ into disjoint sets of finite measure such that $\{M_{P_n} f\}_{n=1}^{\infty}$ converges to $f$ in the $L^1$ norm.

Proof. Let $P_n = \{E_k\}_{k=0}^{2n^2+1}$ where $E_0 = \{x \in X : f(x) < -n\}$, $E_j = \{x \in X : f(x) \in [-n + \frac{j-1}{n}, -n + \frac{j}{n})\}$ if $1 \le j \le 2n^2$, and $E_{2n^2+1} = \{x \in X : f(x) \ge n\}$. If $E_k$ has infinite measure for some $k \in \{0, \dots, 2n^2 + 1\}$, then, since $X$ is σ-finite, $E_k$ is a countable disjoint union of sets of finite measure. Replace every such $E_k$ in the partition with the sets of finite measure. It is then easy to verify that $\{M_{P_n} f\}_{n=1}^{\infty}$ converges to $f$ in the $L^1$ norm.
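The level-set partitions used in this proof are easy to experiment with on a finite set. In the sketch below (our own discrete stand-in, with hypothetical weights and sample values) every point with $|f| < n$ sits in a block on which $f$ varies by less than $1/n$, so the $L^1$ error of the block average is at most $\mu(X)/n$ once $n$ exceeds the largest value of $|f|$.

```python
import math


def level_set_blocks(f, n):
    """Group indices by the level sets E_0, ..., E_{2n^2+1} from the proof."""
    blocks = {}
    for i, v in enumerate(f):
        if v < -n:
            j = 0
        elif v >= n:
            j = 2 * n * n + 1
        else:
            # v lies in [-n + (j-1)/n, -n + j/n); the min guards against rounding.
            j = min(math.floor((v + n) * n) + 1, 2 * n * n)
        blocks.setdefault(j, []).append(i)
    return list(blocks.values())


def averaging_operator(f, mu, blocks):
    """Discrete analogue of M_P: replace f on each block by its mu-weighted mean."""
    out = [0.0] * len(f)
    for block in blocks:
        mass = sum(mu[i] for i in block)
        mean = sum(f[i] * mu[i] for i in block) / mass if mass > 0 else 0.0
        for i in block:
            out[i] = mean
    return out


def l1_distance(f, g, mu):
    """Weighted L^1 distance between two functions on a finite set."""
    return sum(w * abs(a - b) for a, b, w in zip(f, g, mu))


mu = [0.1] * 50                                   # hypothetical weights
f = [2.5 * math.sin(3 * i) for i in range(50)]    # hypothetical sample, |f| <= 2.5

errors = {
    n: l1_distance(averaging_operator(f, mu, level_set_blocks(f, n)), f, mu)
    for n in (3, 5, 10, 50)
}
print(errors)
```

The dictionary `errors` records the $L^1$ distance between $M_{P_n} f$ and $f$ for a few values of $n$; each stays below the bound $\mu(X)/n$.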

We note that if $P = \{E_i\}_{i \in \mathbb{N}}$ and $Q = \{F_j\}_{j \in \mathbb{N}}$ are two partitions of $X$ into disjoint sets of finite measure, then we can form the intersection partition $P \cap Q = \{E_i \cap F_j : i, j \in \mathbb{N}\}$.

Lemma 3.5. Let $(X,\mu)$ be a σ-finite measure space and let $V$ be a finite dimensional subspace of $L^1(X)$. Then there exists a sequence of partitions $\{P_n\}_{n=1}^{\infty}$ of $X$ into disjoint sets of finite measure such that $\{M_{P_n} f\}_{n=1}^{\infty}$ converges to $f$ in the $L^1$ norm for all $f \in V$.

Proof. The proof is by induction on the dimension of $V$, with the base case being Lemma 3.4. Now suppose the induction hypothesis holds for dimension $d$ and let $V$ be a subspace of dimension $d + 1$. Let $S$ be a subspace of $V$ of dimension $d$; by the induction hypothesis there exists a sequence of partitions $\{P_n\}_{n=1}^{\infty}$ of $X$ into disjoint sets of finite measure such that $\{M_{P_n} f\}_{n=1}^{\infty}$ converges to $f$ in the $L^1$ norm for all $f \in S$. Now let $g \in V$ with $g \notin S$; then by Lemma 3.4 there exists a sequence of partitions $\{Q_n\}_{n=1}^{\infty}$ into disjoint sets of finite measure such that $\{M_{Q_n} g\}_{n=1}^{\infty}$ converges to $g$ in the $L^1$ norm. Let $R_n = P_n \cap Q_n$. Since $R_n$ refines $P_n$, we have $M_{R_n} M_{P_n} = M_{P_n}$, and since $M_{R_n}$ is a contraction on $L^1$ by Lemma 1.3, $\|M_{R_n} f - f\|_1 \le 2\|M_{P_n} f - f\|_1$; the same estimate holds with $Q_n$ and $g$. By linearity, the sequence of intersection partitions $\{R_n\}_{n=1}^{\infty}$ satisfies the property that $\{M_{R_n} f\}_{n=1}^{\infty}$ converges to $f$ in the $L^1$ norm for all $f \in V$.

Theorem 3.6. Let $(X,\mu)$ and $(Y,\nu)$ be σ-finite measure spaces. Let $S : L^1(Y) \to L^1(X)$ be a stochastic operator and let $V$ be a finite dimensional subspace of $L^1(Y)$. Then there exists a sequence of stochastic integral operators from $L^1(Y)$ to $L^1(X)$ which converges to $S$ on $V$.

Proof. Let $P = \{E_i\}_{i \in \mathbb{N}}$ be a partition of $X$ into disjoint sets of finite measure. Then $M_P S$ is a stochastic operator from $L^1(Y)$ to $L^1(X)$; we will show that it is a stochastic integral operator. Fix $x \in X$. Then there exists a unique $k$ such that $x \in E_k$. Define the functional $g_x(f) = (M_P S f)(x)$; note that $g_x$ depends only on the block $E_k$ containing $x$. Then
\begin{align*}
|g_x(f)| &= \left|\frac{1}{\mu(E_k)} \int_{E_k} (Sf)(t)\,d\mu(t)\right| \\
&\le \frac{1}{\mu(E_k)} \int_X |(Sf)(t)|\,d\mu(t) \\
&\le \frac{1}{\mu(E_k)} \int_Y |f(y)|\,d\nu(y) \quad \text{by Lemma 1.3} \\
&= \frac{1}{\mu(E_k)} \|f\|_1.
\end{align*}
Hence $g_x$ is a bounded linear functional on $L^1(Y)$. So by the Riesz representation theorem, there exists a nonnegative function $h_x \in L^\infty(Y)$ such that $g_x(f) = \int_Y f(y)h_x(y)\,d\nu(y)$. Now let $K_P(x, y) = h_x(y)$ for all $x \in X$ and all $y \in Y$. Since $K_P(x, y) = \sum_{i \in \mathbb{N}} \chi_{E_i}(x)h_x(y)$, $K_P(x, y)$ is measurable. Then $(M_P S f)(x) = \int_Y K_P(x, y)f(y)\,d\nu(y)$.

Since $M_P S$ is a stochastic operator, we have
\[ \int_X ((M_P S)f)(x)\,d\mu(x) = \int_Y f(y)\,d\nu(y) \quad \forall f \in L^1(Y), \]
and by using Fubini's Theorem, we find
\begin{align*}
\int_Y f(y)\,d\nu(y) &= \int_X ((M_P S)f)(x)\,d\mu(x) \\
&= \int_X \int_Y K_P(x, y)f(y)\,d\nu(y)\,d\mu(x) \\
&= \int_Y f(y)\left(\int_X K_P(x, y)\,d\mu(x)\right)d\nu(y).
\end{align*}
Therefore $\int_X K_P(x, y)\,d\mu(x) = 1$ for almost all $y \in Y$. Since $V$ is a finite dimensional subspace of $L^1(Y)$, the forward image $S(V)$ is a finite dimensional subspace of $L^1(X)$. The result now follows from Lemma 3.5 applied to $S(V)$: it provides partitions $P_n$ for which the stochastic integral operators $M_{P_n} S$ converge to $S$ on $V$.
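The kernel constructed in this proof can also be written out explicitly. The following is only a sketch under an assumption that is not stated in the source: we write $S^* : L^\infty(X) \to L^\infty(Y)$ for the Banach-space adjoint of $S$ (our notation), using the standard identification of the dual of $L^1$ with $L^\infty$ on σ-finite spaces.

```latex
% For x in a block E_k with \mu(E_k) > 0 and f \in L^1(Y):
\[
(M_P S f)(x)
  = \frac{1}{\mu(E_k)} \int_X \chi_{E_k}\,(Sf)\,d\mu
  = \int_Y \frac{(S^*\chi_{E_k})(y)}{\mu(E_k)}\, f(y)\,d\nu(y),
\]
% so one may take
\[
K_P(x,y) = \sum_{i \,:\, \mu(E_i) > 0} \chi_{E_i}(x)\,
           \frac{(S^*\chi_{E_i})(y)}{\mu(E_i)} .
\]
% Integral preservation of S reads S^* 1 = 1, hence for almost all y
\[
\int_X K_P(x,y)\,d\mu(x)
  = \sum_{i \,:\, \mu(E_i) > 0} (S^*\chi_{E_i})(y)
  = (S^* 1)(y) = 1 .
\]
```

Read this way, positivity of $S$ gives $S^*\chi_{E_i} \ge 0$, which matches the nonnegativity of $h_x$ in the proof, and the column condition of a stochastic kernel is just a restatement of integral preservation.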

We also have a doubly stochastic version of this theorem:

Theorem 3.7. Let $(X,\mu)$ and $(Y,\nu)$ be finite measure spaces. Let $D : L^1(Y) \to L^1(X)$ be a doubly stochastic operator and let $V$ be a finite dimensional subspace of $L^1(Y)$. Then $D$ can be approximated on $V$ by doubly stochastic integral operators.

Proof. Let $P = \{E_i\}_{i=1}^{n}$ be a partition of $X$ into disjoint sets of finite measure. The operator $M_P : L^1(X) \to L^1(X)$ from Definition 1.4 is a doubly stochastic operator and, since the composition of doubly stochastic operators is a doubly stochastic operator, $M_P D$ is a doubly stochastic operator. By a proof similar to that of Theorem 3.6, for all $f \in L^1(Y)$ we have $(M_P D f)(x) = \int_Y K_P(x, y)f(y)\,d\nu(y)$, where $\int_X K_P(x, y)\,d\mu(x) = 1$ for almost all $y \in Y$. Now suppose $f = 1$. Since $M_P D 1 = 1$, we get $\int_Y K_P(x, y)\,d\nu(y) = 1$ for almost all $x \in X$, and hence $M_P D$ is a doubly stochastic integral operator.

Theorem 3.8. Let $(X,\mu)$ and $(Y,\nu)$ be two σ-finite measure spaces, $f \in L^1(X,\mathbb{R}^n)$ and $g \in L^1(Y,\mathbb{R}^n)$. If $f$ is matrix majorized by $g$, then
\[ \int_X \varphi(f(x))\,d\mu(x) \le \int_Y \varphi(g(y))\,d\nu(y) \]
for all nonnegative sublinear functionals $\varphi : \mathbb{R}^n \to [0, \infty)$.

Proof. Let $g = (g_1, \dots, g_n)$ and $V = \mathrm{span}\{g_1, \dots, g_n\}$. Since $V$ is a finite dimensional subspace of $L^1(Y)$, by using Theorem 3.6, there exists a sequence of stochastic integral operators $\{S_k\}_{k=1}^{\infty}$ which converges on $V$ to the stochastic operator $S$ coming from Definition 3.1. Now by using Theorem 3.3, for each $k \in \mathbb{N}$ we obtain
\[ \int_X \varphi(S_k g)\,d\mu(x) \le \int_Y \varphi(g)\,d\nu(y) \]
for all sublinear functionals $\varphi$. Since $\varphi$ is sublinear on $\mathbb{R}^n$, it is Lipschitz; denote its Lipschitz constant by $c \ge 0$. Then we have
\[ \int_X |\varphi(S_k g) - \varphi(Sg)|\,d\mu(x) \le c \sum_{j=1}^{n} \int_X |S_k g_j - S g_j|\,d\mu(x) \quad \forall k \in \mathbb{N}. \]
Since $\lim_{k \to \infty} S_k g_j = S g_j$ in $L^1$ for all $j$, the left hand side must go to zero, which means that $\lim_{k \to \infty} \int_X \varphi(S_k g)\,d\mu(x) = \int_X \varphi(Sg)\,d\mu(x)$. Therefore we have
\[ \int_X \varphi(f)\,d\mu(x) = \int_X \varphi(Sg)\,d\mu(x) \le \int_Y \varphi(g)\,d\nu(y). \]

We note that a special case of this result is a slight generalization of a theorem of Alberti; when $X = Y$ and $\mu = \nu$, Theorem 3.8 reduces to one direction of the following result, which was proved using methods from the theory of von Neumann algebras.

Theorem 3.9. [1, Theorem 1] Let $(X,\mu)$ be a σ-finite measure space and $f, g \in L^1(X,\mathbb{R}^n)$. Then $f$ is matrix majorized by $g$ if and only if
\[ \int_X \varphi(f(x))\,d\mu(x) \le \int_X \varphi(g(x))\,d\mu(x) \]
for all nonnegative sublinear functionals $\varphi : \mathbb{R}^n \to [0, \infty)$.

We do not know if the converse to Theorem 3.8 holds for arbitrary measures.

We now consider a generalization of majorization known as multivariate majorization. In the setting of $\mathbb{R}^n$, we can show (Theorem 3.11) that matrix majorization and multivariate majorization are strongly related.

Definition 3.10. Let $(X,\mu)$ and $(Y,\nu)$ be finite measure spaces, $f \in L^1(X,\mathbb{R}^n)$, and $g \in L^1(Y,\mathbb{R}^n)$. Then $f$ is multivariate majorized by $g$ if there exists a doubly stochastic operator $D : L^1(Y) \to L^1(X)$ such that $f = Dg$.

Theorem 3.11. Let $(X,\mu)$ and $(Y,\nu)$ be finite measure spaces, $f \in L^1(X,\mu,\mathbb{R}^n)$, $g \in L^1(Y,\nu,\mathbb{R}^n)$, $h \in L^1(X,\mu,(0,\infty))$, and $k \in L^1(Y,\nu,(0,\infty))$. The following are equivalent:
(1) $(f_1, f_2, \dots, f_n, h)$ is matrix majorized by $(g_1, g_2, \dots, g_n, k)$; i.e., there exists a stochastic operator $S : L^1(Y,\nu) \to L^1(X,\mu)$ such that $Sg_i = f_i$ for all $i = 1, \dots, n$ and $Sk = h$.
(2) $(\frac{f_1}{h}, \frac{f_2}{h}, \dots, \frac{f_n}{h})$ is multivariate majorized by $(\frac{g_1}{k}, \frac{g_2}{k}, \dots, \frac{g_n}{k})$ with respect to measures $\alpha$ and $\beta$, where the measures $\alpha$ and $\beta$ are defined by $\alpha = h\,d\mu$ and $\beta = k\,d\nu$; i.e., there exists a doubly stochastic operator $D : L^1(Y,\beta) \to L^1(X,\alpha)$ such that $D\frac{g_i}{k} = \frac{f_i}{h}$ for all $i = 1, \dots, n$.

Proof. Let $T_s$ denote the multiplication operator which maps any function $f$ to the product $sf$.

(1) ⇒ (2): Suppose there exists a stochastic operator $S : L^1(Y,\nu) \to L^1(X,\mu)$ such that $Sg_i = f_i$ for all $i = 1, \dots, n$ and $Sk = h$. We now show that $(\frac{f_1}{h}, \frac{f_2}{h}, \dots, \frac{f_n}{h}) \in L^1(X,\alpha)$ is multivariate majorized by $(\frac{g_1}{k}, \frac{g_2}{k}, \dots, \frac{g_n}{k}) \in L^1(Y,\beta)$. The multiplication operator $T_{1/h}$ is a stochastic operator from $L^1(X,\mu)$ to $L^1(X,\alpha)$. Note that for all $i = 1, \dots, n$, $f_i \in L^1(X,\mu)$ and $T_{1/h}(f_i) = \frac{f_i}{h}$. Similarly, $T_k$ is a stochastic operator from $L^1(Y,\beta)$ to $L^1(Y,\nu)$, and for all $i = 1, \dots, n$, $\frac{g_i}{k} \in L^1(Y,\beta)$ and $T_k(\frac{g_i}{k}) = g_i$. Construct $D = T_{1/h} S T_k : L^1(Y,\beta) \to L^1(X,\alpha)$, which is a stochastic operator since it is a product of stochastic operators. Furthermore, $D1 = T_{1/h} S k = T_{1/h} h = 1$; therefore $D$ is a doubly stochastic operator that maps $\frac{g_i}{k}$ to $\frac{f_i}{h}$.

(2) ⇒ (1): Assume there exists a doubly stochastic $D : L^1(Y,\beta) \to L^1(X,\alpha)$ such that $\frac{f_i}{h} = D\frac{g_i}{k}$. The multiplication operator $T_h : L^1(X,\alpha) \to L^1(X,\mu)$ is a stochastic operator, and for all $i = 1, \dots, n$, $\frac{f_i}{h} \in L^1(X,\alpha)$ and $T_h(\frac{f_i}{h}) = f_i$. Similarly, $T_{1/k} : L^1(Y,\nu) \to L^1(Y,\beta)$ is a stochastic operator, and for all $i = 1, \dots, n$, $g_i \in L^1(Y,\nu)$ and $T_{1/k}(g_i) = \frac{g_i}{k}$. Construct $S = T_h D T_{1/k} : L^1(Y,\nu) \to L^1(X,\mu)$. Since $D$ is a (doubly) stochastic operator and the product of stochastic operators is a stochastic operator, $S$ is a stochastic operator such that $Sg_i = f_i$ for all $i = 1, \dots, n$ and $Sk = T_h D 1 = T_h 1 = h$.

In the setting of $\mathbb{R}$ we can show that matrix majorization, multivariate majorization, and the convex function inequalities are all strongly related. The following theorem can be viewed as a simplified version of the result on mixing distance in [22].

Theorem 3.12. Let $(X,\mu)$ and $(Y,\nu)$ be finite measure spaces, $f \in L^1(X,\mathbb{R})$, $g \in L^1(Y,\mathbb{R})$, $h \in L^1(X,(0,\infty))$, $k \in L^1(Y,(0,\infty))$, with $\int_X h\,d\mu = \int_Y k\,d\nu$. The following are equivalent:
(1) There exists a stochastic operator $S : L^1(Y,\nu) \to L^1(X,\mu)$ such that $Sg = f$ and $Sk = h$.
(2) For all real valued convex functions $\varphi$ on $\mathbb{R}$,
\[ \int_X \varphi\left(\frac{f}{h}\right)h\,d\mu \le \int_Y \varphi\left(\frac{g}{k}\right)k\,d\nu. \]
(3) There exists a doubly stochastic $D : L^1(Y,\beta) \to L^1(X,\alpha)$ such that $D\frac{g}{k} = \frac{f}{h}$, where the measures $\alpha$ and $\beta$ are defined by $\alpha = h\,d\mu$ and $\beta = k\,d\nu$.

Proof. Both (2) and (3) are equivalent to $\frac{f}{h} \prec \frac{g}{k}$ (with respect to the measures $\alpha$ and $\beta$) by Theorem 1.5. The equivalence of (1) and (3) follows from Theorem 3.11 with $n = 1$.
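The construction $D = T_{1/h} S T_k$ from the proof of Theorem 3.11 becomes a one-line matrix formula on finite sets. The sketch below assumes counting measure on both sets, so a stochastic operator is a column stochastic matrix `A`; the matrix, the vectors `k` and `g`, and the convex test function are hypothetical choices of ours.

```python
def apply(A, u):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * v for a, v in zip(row, u)) for row in A]


def build_D(A, k):
    """Return D = T_{1/h} A T_k and h = A k, i.e. D[x][y] = A[x][y] * k[y] / h[x]."""
    h = apply(A, k)
    D = [[A[x][y] * k[y] / h[x] for y in range(len(k))] for x in range(len(A))]
    return D, h


# Hypothetical column stochastic matrix (counting measure on X and Y).
A = [[0.50, 0.25],
     [0.25, 0.25],
     [0.25, 0.50]]
k = [2.0, 4.0]          # positive weight function on Y
g = [1.0, -2.0]         # real-valued function on Y

D, h = build_D(A, k)    # h = A k plays the role of S k
f = apply(A, g)         # f = A g plays the role of S g

ratio_f = [fx / hx for fx, hx in zip(f, h)]   # f / h on X
ratio_g = [gy / ky for gy, ky in zip(g, k)]   # g / k on Y

# D is doubly stochastic from L^1(Y, beta) to L^1(X, alpha), alpha = h, beta = k.
row_sums = [sum(row) for row in D]                               # D 1 = 1
weighted_cols = [sum(h[x] * D[x][y] for x in range(len(h)))      # integral preservation
                 for y in range(len(k))]

# Convex function inequality of Theorem 3.12 (2) for phi(t) = t^2.
lhs = sum(hx * r * r for hx, r in zip(h, ratio_f))
rhs = sum(ky * r * r for ky, r in zip(k, ratio_g))

print(apply(D, ratio_g), ratio_f, lhs, rhs)
```

In this toy case `D` sends $g/k$ to $f/h$, its rows sum to 1, and the weighted column sums recover $k$, mirroring the three facts used in the proof.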

Acknowledgements

S.M. was supported by a travel grant from the Iranian Ministry of Science, Research and Technology. She gratefully acknowledges the University of Guelph and Brandon University for the time she spent at the respective universities during the course of this work. R.P. was supported by NSERC Discovery Grant number 400550. S.P. was supported by NSERC Discovery Grant number 1174582, the Canada Foundation for Innovation (CFI) grant number 35711, and the Canada Research Chairs (CRC) Program grant number 231250. The authors thank the anonymous referee for their helpful comments.

References

[1] P.M. Alberti, A note on stochastic operators on L1-spaces and convex functions, J. Math. Anal. Appl. 130(2) (1988), pp. 556-563.
[2] P. Busch and R. Quadt, Concepts of coarse graining in quantum mechanics, Int. J. Theor. Phys. 32(12) (1993), pp. 2261-2269.
[3] P. Busch and R. Quadt, On Ruch's principle of decreasing mixing distance in classical statistical physics, J. Stat. Phys. 61(1/2) (1990), pp. 311-328.
[4] P. Cerone and S.S. Dragomir, Approximation of the integral mean divergence and f-divergence via mean results, Math. Comput. Modelling 42(1-2) (2005), pp. 207-219.
[5] J.E. Cohen, J.H.B. Kempermann, and G. Zbăganu, Comparisons of stochastic matrices with applications in information theory, statistics, economics and population. Birkhäuser: Boston, 1998.
[6] I. Csiszár, Information measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2 (1967), pp. 299-318.
[7] K.M. Chong, Some extensions of a theorem of Hardy, Littlewood and Pólya and their applications, Canadian Journal of Mathematics 26 (1974), pp. 1321-1340.
[8] G. Dahl, Matrix majorization, Lin. Alg. Appl. 288 (1999), pp. 53-73.
[9] P.W. Day, Decreasing rearrangements and doubly stochastic operators, Trans. Amer. Math. Soc. 178 (1973), pp. 383-392.
[10] G. Gour, Quantum majorization and a complete set of entropic conditions for quantum thermodynamics, arXiv preprint arXiv:1708.04302 (2017).
[11] G.H. Hardy, J.E. Littlewood, and G. Pólya, Some simple inequalities satisfied by convex functions, Messenger Math. 58 (1929), pp. 145-152.
[12] M. Hayashi, S. Ishizaka, A. Kawachi, G. Kimura, and T. Ogawa, Introduction to quantum information science. Springer-Verlag: Berlin, 2014.
[13] T. Heinonen, Optimal measurements in quantum mechanics, Phys. Lett. A 346 (2005), pp. 77-86.
[14] T. Heinosaari and M. Ziman, The mathematical language of quantum theory: from uncertainty to entanglement. Cambridge Univ. Press: New York, 2011.
[15] K.C. Jain and P. Chhabra, New information inequalities on new generalized f-divergence and applications, Le Matematiche 70(2) (2015), pp. 271-281.
[16] A. Lasota and M.C. Mackey, Chaos, fractals and noise: stochastic aspects of dynamics, 2nd ed. Springer: New York, 1994.
[17] F. Liese and I. Vajda, On divergences and informations in statistics and information theory, IEEE Trans. Information Theory 52(10) (2006), pp. 4394-4412.
[18] A.W. Marshall, I. Olkin, and B.C. Arnold, Inequalities: theory of majorization and its applications, 2nd ed. Springer: New York, 2011.
[19] T. Morimoto, Markov processes and the H-theorem, J. Phys. Soc. Jpn. 18 (1963), pp. 328-331.
[20] M.S. Moslehian and M. Kian, Non-commutative f-divergence functional, Math. Nachr. 286(14-15) (2013), pp. 1514-1529.
[21] P. Roselli and M. Willem, A convexity inequality, Amer. Math. Monthly 109 (2002), pp. 64-70.
[22] E. Ruch, R. Schranner, and T.H. Seligman, The mixing distance, J. Chem. Phys. 69(1) (1978), pp. 386-392.
[23] E. Ruch, R. Schranner, and T.H. Seligman, Generalization of a theorem by Hardy, Littlewood, and Pólya, J. Math. Anal. Appl. 76 (1980), pp. 222-229.
[24] I. Sason and S. Verdú, f-divergence inequalities, IEEE Trans. Information Theory 62 (2016), pp. 5973-6006.
[25] S. Zanzinger, On informational divergences for general statistical theories, Int. J. Theor. Phys. 37(1) (1998), pp. 357-363.