arXiv:1205.3409v2 [quant-ph] 20 Feb 2014

The entropy power inequality for quantum systems

Robert König and Graeme Smith

Abstract—When two independent analog signals, X and Y, are added together giving Z = X + Y, the entropy of Z, H(Z), is not a simple function of the entropies H(X) and H(Y), but rather depends on the details of X's and Y's distributions. Nevertheless, the entropy power inequality (EPI), which states that e^{2H(Z)} ≥ e^{2H(X)} + e^{2H(Y)}, gives a very tight restriction on the entropy of Z. This inequality has found many applications in information theory and statistics. The quantum analogue of adding two random variables is the combination of two independent bosonic modes at a beam splitter. The purpose of this work is to give a detailed outline of the proof of two separate generalizations of the entropy power inequality in the quantum regime. Our proofs are similar in spirit to standard classical proofs of the EPI, but some new quantities and ideas are needed in the quantum setting. Specifically, we find a new quantum de Bruijin identity relating entropy production under diffusion to a divergence-based quantum Fisher information. Furthermore, this Fisher information exhibits certain convexity properties in the context of beam splitters.

R. König is with the Institute for Quantum Computing and the Department of Applied Mathematics, University of Waterloo, Waterloo, ON N2L 3G1, Canada, email [email protected].
G. Smith is with IBM TJ Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA.

CONTENTS

I    Introduction
     I-A   The classical entropy power inequality: formulation
     I-B   The quantum entropy power inequality: formulation
     I-C   The classical entropy power inequality: proof sketch
     I-D   The quantum entropy power inequality: proof elements
II   Basic definitions
     II-A  Continuous-variable quantum information
     II-B  Quantum addition using the beamsplitter: definition
III  A quantum diffusion equation
     III-A Compatibility with the beamsplitter
     III-B Scaling of the entropy as t → ∞
IV   Divergence-based quantum Fisher information
V    The quantum de Bruijin identity
VI   Fisher information inequalities for the beamsplitter
     VI-A  Compatibility of the beam splitter with translations
     VI-B  Stam inequality and convexity inequality for Fisher information
VII  Quantum entropy power inequalities
Appendix
References

I. INTRODUCTION

A. The classical entropy power inequality: formulation
In his 1948 paper [1], Shannon identified the (differential) entropy

H(X) = − ∫ f_X(x) log f_X(x) d^n x   (1)

of a random variable X with probability density function f_X with support on R^n as the relevant measure for its Shannon information content. Using differential entropies, he obtained an expression for the capacity of a noisy channel involving continuous signals. As a corollary, he explicitly computed the capacity of the white thermal noise channel. These are among his most well-known contributions to continuous-variable classical information theory. However, Shannon also recognized another fundamental property of differential entropy called the entropy power inequality (EPI).

Consider a situation where an input signal (or random variable) X is combined with an independent signal Y, resulting in an output Z = X + Y with probability density function given by the convolution

f_{X+Y}(z) = ∫ f_X(z − x) f_Y(x) dx   for z ∈ R^n .   (2)

We may think of the operation

(f_X, f_Y) ↦ f_{X+Y}

as an 'addition law' on the space of probability density functions. Shannon's entropy power inequality relates the entropies of the input and output variables in this process. It is commonly stated as

e^{2H(X+Y)/n} ≥ e^{2H(X)/n} + e^{2H(Y)/n}   for all X, Y .   (3)

It should be stressed that inequality (3) holds without assumptions on the particular form of the probability density functions f_X, f_Y, beyond certain regularity conditions. Specialized

to the case where one of the arguments in (2), say, fY , is a The set of inequalities (cEPI′) can be shown [6], [7] to be fixed probability density function (e.g., a Gaussian), inequal- equivalent to (cEPI). We will refer to both (cEPI), (cEPI′) as ity (3) immediately gives a lower bound on the output entropy entropy power inequalities; both express a type of concativity of the so-called additive noise channel X X + Y (e.g., the under the addition law (4). white thermal noise channel) in terms of the7→ entropy H(X) of the input signal. Given this fact and the fundamental nature of B. The quantum entropy power inequality: formulation the operation (2), it is is hardly suprising that the EPI (3) has figured prominently in an array of information-theoretic works Here we formulate and subsequently outline a proof of a since its introduction in 1948. For example, it has been used quantum version of the entropy power inequality. It is phrased to obtain converses for the capacity of the Gaussian broadcast in terms of the von Neumann entropy channel, and to show convergence in relative entropy for the S(X)= tr(ρX log ρX ) central limit theorem [2]. Other uses of the entropy power − inequality in multi-terminal may be found of a state ρX on a separable infinite-dimensional Hilbert space e.g., in [3]. describing n bosonic degrees of freedom. In quantum optics The expression entropy power stems from the fact that terminology, it gives a relation between the entropies of states a centered normal (or Gaussian) distribution fX (x) = going in and coming out of a beamsplitter. Central to this 1 x 2/(2σ2) Rn 2 program is the identification of the correct definitions applying (2πσ2)n/2 e−k k on with variance σ has en- n 2 1 2H(X)/n to the quantum case. This is what we consider our main tropy H(X)= 2 log 2πeσ . Hence the quantity 2πe e is the variance of X, commonly referred to as its power contribution. or average energy. For a general Z, the Our ‘quantum addition law’ generalizes (4) and is most con- 1 2H(Z)/n veniently described for the case of two single-mode (n = 1) quantity 2πe e is the power of a Gaussian variable with the same entropy as Z. This fact has been connected [4] to states ρX and ρY (although the generalization to n> 1 modes Brunn-Minkowski-type inequalities bounding the volume of is straightforward, see Section II-B). It takes the following the set-sum of two sets in terms of their spherical counterparts, form: the Gaussian unitary Uλ corresponding to a beam- which in turn inspired generalizations to free probability splitter with transmissivity 0 < λ < 1 is applied to the 1 two input modes. This is followed by tracing out the second theory [5]. In the following, we will omit the factor 2πe and refer to e2H(Z)/n as the entropy power of Z. mode. Formally, we have (see Section II-B for more precise The entropy power inequality (3) can be reeexpressed as a definitions) kind of ‘concativity’ property of the entropy power under a (ρX ,ρY ) ρX⊞λY = λ(ρX ρY ) slightly modified ‘addition rule’ (X, Y ) X ⊞ Y . To 7→ E ⊗ (5) 1/2 where λ(ρ)=tr2 UλρUλ† . define the latter, let µX denote the random7→ variable with E 1 x Rn This choice of ‘addition law’ is motivated by the fact that probability density fµX (x) = µ fX ( µ ), x for any ⊞ ∈ the resulting position- and momentum operators (Q, P ) of the scalar µ > 0. We define X 1/2 Y as the result of rescaling both X and Y by 1/√2 and subsequently taking their sum output mode in the Heisenberg picture are given by

(see Eq. (4) below). Because differential entropies satisfy the Q = √λQ1 + √1 λQ2 scaling property H(µX) = H(X)+ n log µ for µ > 0, this − (6) P = √λP1 + √1 λP2 leads to the following reformulation of the entropy power − inequality (3): in terms of the original operators of the first (Q1, P1) and second (Q2, P2) mode. Hence (6) indeed mimics the classical ⊞ 1 1 e2H(X 1/2Y )/n e2H(X)/n + e2H(Y )/n . (cEPI) addition rule (4) in phase space. ≥ 2 2 Having defined an addition law, we can state the general- Observe that the rhs. is the equal-weight average of the entropy izations of (cEPI) and (cEPI′) to the quantum setting. They powers of X and Y . are It turns out that the entropy H(X) itself satisfies a concativ- ⊞ 1 1 eS(X 1/2Y )/n eS(X)/n + eS(Y )/n , (qEPI) ity inequality analogous to (cEPI) (see (cEPI′) below). More ≥ 2 2 generally, introduce a parameter 0 <λ< 1 controlling the S(X ⊞ Y ) λS(X)+(1 λ)S(Y ) for λ [0, 1] . λ ≥ − ∈ relative weight of X and Y and consider the addition rule (qEPI′)
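As an illustrative aside (not part of the paper's argument), the two quantum inequalities just stated can be spot-checked for single-mode thermal states, where everything reduces to the entropy formula S = g(N) with g(N) = (N+1) log(N+1) − N log N and the fact that the beamsplitter maps a pair of thermal states with symplectic eigenvalues ν_X, ν_Y to a thermal state with eigenvalue λν_X + (1−λ)ν_Y (both facts are collected later, in Eqs. (38) and (47)). The following NumPy sketch, with helper names of our own choosing, assumes exactly this Gaussian reduction; the general case is what the paper proves.

```python
import numpy as np

def g(N):
    """Entropy (in nats) of a thermal state with mean photon number N, cf. Eq. (38)."""
    return (N + 1) * np.log(N + 1) - N * np.log(N) if N > 0 else 0.0

def S_thermal(nu):
    """von Neumann entropy of a one-mode Gaussian state with symplectic eigenvalue nu."""
    return g((nu - 1) / 2)

rng = np.random.default_rng(0)
for _ in range(1000):
    nu_x, nu_y = 1 + 10 * rng.random(2)     # random symplectic eigenvalues >= 1
    lam = rng.random()
    S_x, S_y = S_thermal(nu_x), S_thermal(nu_y)
    # beamsplitter output: thermal with nu_out = lam*nu_x + (1-lam)*nu_y, cf. Eq. (47)
    S_out = S_thermal(lam * nu_x + (1 - lam) * nu_y)
    assert S_out >= lam * S_x + (1 - lam) * S_y - 1e-12                       # (qEPI')
    S_half = S_thermal(0.5 * (nu_x + nu_y))
    assert np.exp(S_half) >= 0.5 * np.exp(S_x) + 0.5 * np.exp(S_y) - 1e-12    # (qEPI), n = 1
print("(qEPI) and (qEPI') hold on all sampled thermal-state pairs")
```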

⊞ ⊞ Observe that (qEPI) differs from (cEPI) by the absence of a (fX ,fY ) fX λY where fX λY = f√λX+√1 λY . (4) 7→ − factor 2 in the exponents. This may be attributed to the fact that Note that this map is covariance-preserving when applied to the phase space of n bosonic modes is 2n-dimensional. We two random-variables X and Y whose first moments and emphasize, however, that neither (qEPI) nor (qEPI′) appear covariances coincide, a property not shared by convolution to follow in a straightforward manner from their classical without rescaling (i.e., (2)). counterparts. A further distinction from the classical case is The concativity inequality for the entropy takes the form that, to the best of our knowledge, Eq. (qEPI) and (qEPI′) do not appear to be equivalent. Note also that Guha estab- H(X ⊞ Y ) λH(X)+(1 λ)H(Y ) for 0 <λ< 1 . lished (qEPI′) for the special case λ = 1/2 using different λ ≥ − (cEPI′) techniques [8, Section 5.4.2]. 3
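For readers who want to experiment with the classical statements, here is a minimal grid-based sketch (our own construction, assuming NumPy) of the rescaled addition rule (4): it builds f_{X ⊞_λ Y} by rescaling and convolving two simple densities and then compares both sides of (cEPI′), as well as of the λ-weighted entropy-power form that follows from (3) together with the scaling property of the entropy.

```python
import numpy as np

def entropy(f, dx):
    """Differential entropy -∫ f log f dx (nats) on a uniform grid."""
    mask = f > 1e-300
    return -np.sum(f[mask] * np.log(f[mask])) * dx

def rescale(f, x, a):
    """Density of a*X on the grid x, given the density f of X."""
    return np.interp(x / a, x, f, left=0.0, right=0.0) / a

x = np.linspace(-40, 40, 8001)
dx = x[1] - x[0]
f_X = np.where(np.abs(x) < 2, 0.25, 0.0)        # uniform on [-2, 2]
f_Y = 0.5 * np.exp(-np.abs(x - 1))              # shifted Laplace density

lam = 0.3
# density of sqrt(lam) X + sqrt(1-lam) Y, i.e. f_{X boxplus_lam Y} of Eq. (4)
f_out = np.convolve(rescale(f_X, x, np.sqrt(lam)),
                    rescale(f_Y, x, np.sqrt(1 - lam))) * dx

H_X, H_Y, H_out = entropy(f_X, dx), entropy(f_Y, dx), entropy(f_out, dx)
print(H_out, lam * H_X + (1 - lam) * H_Y)       # (cEPI'): left-hand value dominates
# weighted entropy-power form, a consequence of (3) and the scaling property:
print(np.exp(2 * H_out), lam * np.exp(2 * H_X) + (1 - lam) * np.exp(2 * H_Y))
```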

N We will not discuss implications of these inequalities here with initial condition f0(x)= fX (x) for all x R . In (but see [9] for an upper bound on classical capacities), but particular, diffusion acts as a one-parameter semigroup∈ expect them to be widely applicable. We point out that the im- on the set of probability density functions, i.e., portance of a quantum analog of the entropy power inequality X = (X ) for all t ,t 0 . (9) in the context of additive noise channels was recognized in t1+t2 t1 t2 1 2 ≥ ealier work by Guha et al. [10]. These authors proposed a Time evolution under (8) smoothes out initial spatial different generalization of (cEPI), motivated by the fact that inhomogenities. In the limit t , the distribution → ∞ the average photon number tr(a†aρ) of a state ρ is the natural of Xt becomes ‘simpler’: it approaches a Gaussian, and analog of the variance of a (centered) probability distribution. the asymptotic scaling of its entropy H(X ) g(t) is t ∼ In analogy to the definition of (classical) entropy power, they independent of the initial distribution fX . This is crucial defined the photon number of a state ρ as the average photon in the proof of the EPI. number of a Gaussian state σ with identical entropy. Their A second essential feature of the evolution (8) is that it conjectured photon number inequality states that this quantity commutes with convolution: we have (for any 0 <λ< 1 is concave under the addition rule (X, Y ) X ⊞λ Y . The and t> 0) photon number inequality has been shown to7→ hold in the case X ⊞ Y (X ⊞ Y ) (10) of Gaussian states [8]. t λ t ≡ λ t We find that our more literal generalization (qEPI) (for for the time-evolved versions (Xt, Yt) of two indepen- λ = 1/2) allows us to translate a proof of the classical dent random variables (X, Y ). entropy power inequality to the quantum setting. The for- (ii) de Bruijin’s identity: A third property of the diffusion mulation (qEPI) also yields tight bounds on classical capaci- process (7) concerns the rate of entropy increase for ties [9] for thermal noise associated with the transmissivity- infinitesimally small times, t 0. de Bruijin’s identity 1/2-beamsplitter. We conjecture that the quantity eS(X)/n is relates this rate to a well-known→ estimation-theoretic concave with respect to the addition rule ⊞λ for all 0 <λ< 1 quantity. It states that (a fact which is trivial in the classical case due to the scaling d √ 1 property of entropy). If shown to be true, this establishes tight dt H(X + tZ)= 2 J(X) where t=0 bounds on the classical capacity of thermal noise channels, for J(X)= ( f(x))T ( f(x)) 1 dN x . ∇ ∇ · f(x) all transmissivities. (11) R The quantity J(X) is the J(X) = (θ) (θ) C. The classical entropy power inequality: proof sketch J(f ; θ) θ=0 associated with the family f θ R of distributions| { } ∈ It is instructive to review the basic elements of the proof of the classical entropy power inequality. Roughly, there are f (θ)(x)= f(x θ ~e) for all x Rn . (12) − · ∈ two different proof methods: the first one relies on Young’s where ~e = (1,..., 1). In other words, the random inequality in the version given by Beckner [11], and Brascamp variables X(θ) are obtained by translating X by an and Lieb [12]. The second one relies on the relation between θ unknown amount{ } θ in each direction, and J(X) relates to entropy and Fisher information, known as de Bruijin’s identity. the problem of estimating θ (by the Cram´er-Rao bound). 
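de Bruijin's identity (11) lends itself to a quick numerical check. The sketch below (our own illustration, assuming NumPy; accuracy is limited by the grid and the finite diffusion times) evaluates the Fisher information integral J(X) for a one-dimensional Gaussian mixture and compares it with twice the initial rate of entropy increase under Gaussian perturbation (7).

```python
import numpy as np

x = np.linspace(-30, 30, 12001)
dx = x[1] - x[0]

def gauss(x, mu, s2):
    return np.exp(-(x - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

f = 0.6 * gauss(x, -1.5, 0.5) + 0.4 * gauss(x, 2.0, 1.2)   # smooth, non-Gaussian density

def entropy(f):
    return -np.sum(f * np.log(f + 1e-300)) * dx

def heat(f, t):
    """Density of X_t = X + sqrt(t) Z: convolution with a centered Gaussian of variance t."""
    return np.convolve(f, gauss(x, 0.0, t), mode="same") * dx

# Fisher information (11):  J(X) = integral of f'(x)^2 / f(x)
J = np.sum(np.gradient(f, dx) ** 2 / (f + 1e-300)) * dx

t1, t2 = 1e-3, 2e-3
slope = (entropy(heat(f, t2)) - entropy(heat(f, t1))) / (t2 - t1)
print(slope, 0.5 * J)      # de Bruijin's identity: the two values nearly agree
```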
For a detailed account of the general form of these two ap- More importantly, the quantity J(X) inherits various proaches, we refer to [13]. While [13] gives a good overview, it general properties of Fisher information, generally de- is worth mentioning that more recently, Rioul has found proofs fined as of the EPI that manage to rely entirely on the use of ‘standard’ 2 information measures such as mutual information [14], [15]. ∂ J(f (θ); θ) = log f (θ)(x) f (θ)(x)dx θ=θ0 His work also involves ideas from the second approach. ∂θ θ=θ0 Z   Here we focus on the second type of proof, originally (13) introduced by Stam [16] and further simplified and made for a parameterized family f (θ) of probability distri- more rigorous by Blachman [17] and others [2], [13]. Its main butions on R (this is replaced{ by} a matrix in the case components can be summarized as follows: n> 1). (i) Gaussian perturbations: For a random variable X on The link between entropy and Fisher information ex- Rn define pressed by de Bruijin’s identity is appealing in itself. Its usefulness in the proof of the EPI stems from yet another X = X + √tZ t 0 (7) t ≥ compatibility property with convolution. Translations as defined by (12) can be equivalently applied before or where Z be a standard normal with variance σ2 =1. It after applying addition (convolution), that is, we have is well-known that the family of distributions ft = fXt (θ) (√λθ) (√1 λθ) satisfies the heat or diffusion equation (X ⊞λ Y ) = X ⊞λ Y − (14)

N 2 for 0 <λ< 1 and θ R. In other words, adding (con- ∂ ∂ ∈ ft = ∆ft , ∆= (8) volving) X and Y and then translating by θ is equivalent ∂t ∂x2 j=1 j (√λθ) (√1 λθ) X to adding up translated versions X , Y − . 4
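The compatibility (10) of diffusion with the addition rule (4) can also be seen directly on a grid. The following sketch (our own, assuming NumPy; "boxplus" and "heat" are illustrative names) diffuses two densities and adds them, in both orders, and prints the maximal pointwise difference, which vanishes up to discretization error.

```python
import numpy as np

x = np.linspace(-40, 40, 8001)
dx = x[1] - x[0]

def gauss(x, s2):
    return np.exp(-x ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

def rescale(f, a):                      # density of a*X on the same grid
    return np.interp(x / a, x, f, left=0.0, right=0.0) / a

def boxplus(fX, fY, lam):               # density of sqrt(lam) X + sqrt(1-lam) Y, Eq. (4)
    return np.convolve(rescale(fX, np.sqrt(lam)),
                       rescale(fY, np.sqrt(1 - lam)), mode="same") * dx

def heat(f, t):                         # diffusion (7)/(8) for time t
    return np.convolve(f, gauss(x, t), mode="same") * dx

fX = np.where(np.abs(x) < 2, 0.25, 0.0)            # uniform on [-2, 2]
fY = 0.5 * np.exp(-np.abs(x))                      # Laplace density

lam, t = 0.4, 0.8
lhs = boxplus(heat(fX, t), heat(fY, t), lam)       # X_t boxplus_lam Y_t
rhs = heat(boxplus(fX, fY, lam), t)                # (X boxplus_lam Y)_t
print(np.max(np.abs(lhs - rhs)))                   # ~ 0 up to discretization error
```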

(iii) Convexity inequality for Fisher information: This inequality (cEPI′) for (Xt, Yt) is equivalent to the state- shows that Fisher information behaves in a ‘convex’ ment δ(t) 0. Since (X , Y ) (X, Y ), the entropy power ≥ 0 0 ≡ manner under the addition law (4), i.e., inequality (cEPI′) amounts to the inequality J(X ⊞ Y ) λJ(X)+(1 λ)J(Y ) . (15) δ(0) 0 . (19) λ ≤ − ≥ A related inequality for the addition law when λ =1/2 To show (19), observe that in the infinite-diffusion limit t is known as Stam’s inequality; it takes the form , we have → ∞ 2 1 1 lim δ(t)=0 (20) ⊞ + . (16) t J(X 1/2 Y ) ≥ J(X) J(Y ) →∞ as all entropies H(X ) H(Y ) H(X ⊞ Y ) have the Proofs of such inequalities have been given in [16], [17], t t t λ t same asymptotic scaling.∼ Here we used∼ the compatibility (10) [4]. An insightful proof was provided by Zamir [18], re- of diffusion and the addition rule (4). de Bruijin’s identity (i) lying only on basic properties of Fisher information. The and the convexity inequality for Fisher information (ii) imply most fundamental of those is the information-processing that the function δ(t) is non-increasing, i.e, inequality, which states that Fisher information is non- increasing under the application of a channel (stochastic δ˙(t) 0 for all t 0 . (21) map) , ≤ ≥ E Indeed, we can use the semigroup property (9) of diffusion to J( (f (θ)); θ) J(f (θ); θ) . (17) show that it is sufficient to consider infinitesimal perturbations, θ=θ0 θ=θ0 E ≤ i.e., the derivative of δ(t) at t =0. But the condition The Fisher information inequalities (15) and (16) are

simple consequences of this data-processing inequality δ˙(0) 0 and the compatibility (14) of convolution with trans- ≤ is simply the Fisher information inequality (15), thanks to de lations. For example, the quantities on the two sides Bruijin’s identity (11) and the compatibility (10) of diffusion of (15) are the Fisher informations of the two families of with convolution. Hence one obtains (21) which together with distributions X(θ) and (X(√λθ), Y (√1 λθ)) , and θ − θ Eq. (20) implies the claim (19). are directly associated{ } with{ both sides of the compatiblity} identity (14). D. The quantum entropy power inequality: proof elements The entropy inequalities (cEPI) and (cEPI′) follow immedi- ately from (i)–(iii). Concretely, let us describe the argument of Here we argue that a careful choice of corresponding Blachman [17] adapted to the proof of (cEPI′) (The derivation quantum-mechanical concepts guarantees that the remarkable of (cEPI) is identical in spirit, but slightly more involved. compatibility properties between convolution, diffusion, trans- It uses (16) instead of (15).). Note that a similar proof lation, entropy and Fisher information all have analogous which avoids the use of asymptotically divergent quantities quantum formulations. This yields a proof of the quantum is provided in [13], yet we find that the following version is entropy power inequalities (qEPI) and (qEPI′) that follows more amenable to a quantum generalization. exactly the structure of the classical proof, yet involves fun- The core idea is to consider perturbed or diffused versions damentally quantum-mechanical objects. In this section, we briefly motivate and discuss the choices √ Xt = X + tZ1 required. Yt = Y + √tZ2 , (i) A quantum diffusion process: For a state ρ of n modes, of the orignal random variables (X, Y ), where t 0 and we define ≥ t where Z1,Z2 are independent random variables with standard ρt = e L(ρ) t 0 (22) normal distribution. This is useful because asymptotically ≥ as t , the entropy power inequality is satisfied trivially as the result of evolving for time t under the one- → ∞ t for the pair by . We call this the infinite-diffusion parameter semigroup e L t 0 generated by the Marko- (Xt, Yt) (i) { } ≥ limit. de Bruijin’s identity (ii) and the Fisher information vian Master equation inequality (iii) provide an understanding of the involved d dt ρt = (ρt) entropies in the opposite limit, where the diffusion only acts 1 n L (23) ( )= j=1 ([Qj , [Qj, ]] + [Pj , [Pj , ]]) . for infinitesimally short times. This allows one to extend the L · − 4 · · This choice ofP Liouvillean is motivated (as in [19]) by result from infinitely large times down to all values of t L including 0. In the latter case, there is no diffusion and we the fact that it is the natural quantum counterpart of the are dealing with the original random variables (X, Y ). (classical) Laplacian (cf. (8)) n Sketch proof of (cEPI′): In more detail, introduce the 1 quantity ∆( )= ( qj, qj , + pj , pj, ) · 4 { { ·}} { { ·}} j=1 δ(t) := H(X ⊞ Y ) λH(X ) (1 λ)H(Y ) (18) X t λ t − t − − t acting on probability density function on the symplectic R2n 1 measuring the amount of potential violation of the entropy phase space of n classical particles (the factor 4 is power inequality for the pair (Xt, Yt). The entropy power introduced here for mathematical convenience). Indeed, 5

is obtained from the Dirac correspondence , on the mode operators, L1 [ , ] between the Poisson bracket and the commutator,{· ·} 7→ i~ eiθPj Q e iθPj = Q + θ as well· · as phase space variables and canonical operators, j − j iθQj iθQj (q ,p ) (Q , P ). e− Pj e = Pj + θ , j j 7→ j j Alternatively, one can argue that the Wigner func- in analogy to the addition law (6). tions W of the states ρ defined by the evolu- t t In the remainder of this paper, we verify that these definitions tion (22){ obey} the heat equation{ } (8) on R2n. indeed satisfy the necessary properties to provide a proof Hall [19] has previously examined the relation between of the quantum inequalities (qEPI) and (qEPI ). A word of the Master equation (23) and classical Fisher infor- ′ mathematical caution is in order here: several times we will mation. He argued that the position- and momentum need to interchange a derivative with an integral or infinite density functions f (x)= x ρ x and f (p)= p ρ p t t t t sum, typically taking a derivative inside a trace. This can obey the heat equation (8).h Correspondingly,| | i he obtainedh | | i be justified as long as the functions involved are sufficiently de Bruijin-type identities for the measurement-entropies smooth. In order to focus on the main ideas in our argument, when measuring ρ in the (overcomplete) position- and t rather than pursue a detailed justification of the interchange momentum eigenbases, respectively. These results are, of limits we will simply restrict our attention to families however, insufficient for our purposes as they refer to of states that are sufficiently smooth. This involves little purely classical objects (derived from quantum states). loss of generality since we are interested in the entropy (ii) Divergence-based quantum Fisher information: The of states satisfying an energy bound. On this set of states, information processing inequality (17) is well-known to entropy is continuous, so one can hope to approximate non- be the defining property of classical Fisher information: smooth functions with smooth ones to obtain the unrestricted A theorem by Cencovˇ [20], [21] singles the Fisher result1. As a result, smoothness requirements are not usually information metric as the unique monotone metric (under considered in proofs of the classical EPI [16], [17], [14], [7] or stochastic maps) on the manifold of (parametrized) prob- studies of related quantum ideas [24], [25]. Developing a more ability density functions. Unfortunately, such a charac- mathematically thorough version of these arguments appears terization does not hold in the quantum setting: Petz [22] to be a formidable task (even in the classical case, see [2]) established a one-to-one correspondence between mono- which is left to future work. tone metrics for quantum states and the set of matrix- monotone functions. It turns out that the correct choice of quantum Fisher II. BASIC DEFINITIONS information in this context is given by the second deriva- A. Continuous-variable quantum information tive of the relative entropy or divergence S(ρ σ) = A quantum state of n modes has 2n canonical degrees of k tr(ρ log ρ ρ log σ), that is, freedom. Let Q and P be the “position” and “momentum” − k k d2 operators of the k-th mode, for k = 1,...,n, acting on the (θ) (θ0) (θ) (24) J(ρ ; θ) = 2 S(ρ ρ ) . Hilbert space n associated with the system. 
Defining R = θ=θ0 dθ k θ=θ0 H (Q1, P1,...,Qn, Pn), these operators obey the commutation This choice is guided by the fact that the classical relations Fisher information (13) can also be written in this n 2 (θ) d (θ0) (θ) 0 1 ⊕ form, that is, J(f ; θ) = 2 D(f f ) , θ=θ0 dθ [Rk, Rℓ]= iJkℓ where J = , k θ=θ0 1 0 where D(f g)= f(x) log f(x) is the classical relative −  g(x) n entropy. Wek call the quantity (24) the divergence-based with ⊕ denoting the n-fold direct sum. The Weyl displacement (quantum) FisherR information. operators are defined by iξ JR 2n Lesniewski and Ruskai [23] have studied metrics ob- D(ξ)= e · for ξ R . tained by differentiation of a ‘contrast functional’ S( ) ∈ in a similar manner as (24). They found that every mono-·k· These are unitary operators satisfying the relations tone Fisher information arises from a quasi-entropy in i ξ Jη D(ξ)D(η)= e− 2 · D(ξ + η) this way. In particular, the data-processing inequality (17) is a consequence of their general result. We will discuss and an independent elementary proof of the data processing D(ξ)R D(ξ)† = R + ξ I or (25) inequality for the divergence-based Fisher information in k k k Section IV. [D(ξ), Rk]= ξkD(ξ) . (26) (iii) Translations of quantum states: These are simply This explains the terminology. Any state ρ takes the form translations in phase space. We use translated versions n 2n of a state ρ displaced in each direction, ρ = (2π)− χ (ξ)D( ξ)d ξ , (27) ρ − (θ,Q ) iθP iθP Z ρ j = e j ρe− j 1Alternatively, one could introduce a high-photon-number cutoff (well (θ,P ) iθQ iθQ ρ j = e− j ρe j for j =1, . . . , n . higher than the energy bound of the states under consideration) to make the state space finite, where these problems disappear, and take the limit as the Clearly, this choice is dictated by the Heisenberg action cutoff grows. 6 where the state takes the form n β n (28) e− k k χρ(ξ) = tr(D(ξ)ρ) † (36) USρUS = β n . tr e− k k is the characteristic function of ρ. The characteristic func- Ok=1 tion χρ is well-defined for any trace-class operator, but the In this expression, the inverse temperatures βk are defined map ρ χρ can be extended to all Hilbert-Schmidt class in terms of the symplectic eigenvalues as operators7→ρ by continuity. The map is an isometry between (eβk 1) 1 = (ν 1)/2 := N(ν ) for k =1, . . . , n . the Hilbert space of these operators and the space 2 R2n of − k k L ( ) − − (37) square-integrable functions on the phase space: The quantity (37) is the mean photon number, i.e., the expec- n tr(ρ†σ)=(2π)− χρ(ξ)χσ(ξ) for all ρ, σ . (29) tation value tr(USρUS† ak† ak). Since the number operators have Z the spectral decomposition nk = n 0 n n n with integer ≥ | ih | Observe that for Hermitian ρ = ρ†, we have χ (ξ)= χ ( ξ) eigenvalues, the entropy S(ρ) = tr(ρ log ρ) of a Gaussian ρ ρ − −P since D(ξ)† = D( ξ) state ρ can be evaluated from (36) and (37) as − A Gaussian state ρ is uniquely characterized by the vector n 2n d = (d1,...,d2n) R of first moments S(ρ)= g(N(νk)) (38) ∈ k=1 d = tr(ρR ) (30) X k k where g(N) := (N + 1) log(N + 1) N log N. Note that this − and its covariance matrix γ with entries quantity does not depend on the displacement vector d. A completely positive trace-preserving map (CPTPM) γkℓ = tr (ρ Rk dk, Rℓ dℓ ) , (31) on is called Gaussian if it preserves the set of GaussianE { − − } H where A, B = AB + BA. Its characteristic function is states, that is, the state ρ′ := (ρ) is Gaussian for all { } Gaussian input states ρ. 
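The entropy formula (38) is easy to evaluate numerically from a covariance matrix. The sketch below (our own illustration, assuming NumPy) obtains the Williamson eigenvalues (34) as the moduli of the eigenvalues of iJγ, a standard linear-algebra shortcut that is not spelled out in the text, and confirms on a two-mode squeezed thermal state that the entropy depends only on the symplectic spectrum.

```python
import numpy as np

def symplectic_form(n):
    return np.kron(np.eye(n), np.array([[0.0, 1.0], [-1.0, 0.0]]))

def symplectic_eigenvalues(gamma):
    """Williamson eigenvalues of a covariance matrix, via the spectrum of i*J*gamma."""
    n = gamma.shape[0] // 2
    ev = np.abs(np.linalg.eigvals(1j * symplectic_form(n) @ gamma))
    return np.sort(ev)[::2]             # each nu_k appears twice (as a +/- pair)

def g(N):                               # cf. Eq. (38)
    return (N + 1) * np.log(N + 1) - N * np.log(N) if N > 1e-12 else 0.0

def gaussian_entropy(gamma):
    return sum(g((nu - 1) / 2) for nu in symplectic_eigenvalues(gamma))

# example: a two-mode squeezed thermal state (two-mode squeezing applied to nu*I)
nu, r = 1.5, 0.7
c, s = np.cosh(2 * r), np.sinh(2 * r)
Z = np.diag([1.0, -1.0])
gamma = nu * np.block([[c * np.eye(2), s * Z], [s * Z, c * np.eye(2)]])
print(symplectic_eigenvalues(gamma))                    # both equal nu = 1.5
print(gaussian_entropy(gamma), 2 * g((nu - 1) / 2))     # the two entropies agree
```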
By definition,E the set of Gaussian op- χ(ξ) = exp iξ Jd ξ J T γJξ/4 . (32) erations is closed under composition. A Gaussian operation · − · is completely specified by its (Heisenberg) action on modeE The covariance matrixγ of a quantum state satisfies the operators. The action of on Gaussian states is determined operator inequality by a triple (X,Y,ξ), whereE ξ Rn is an arbitrary vector ∈ γ iJ , (33) and X, Y are real 2n 2n matrices satisfying Y T = Y and ≥ Y +i(J XJXT ) 0×. For a Gaussian state ρ with covariance which is Heisenberg’s uncertainty relation. Conversely, any matrix γ−and displacement≥ vector d, the Gaussian state (ρ) R2n matrix γ satisfying (33) and vector d define a Gaussian is described by E quantum state via (32). Complex conjugating∈ (33) and adding T it to itself shows that the covariance matrix γ 0 is γ γ′ := XγX + Y (39) ≥ 7→ positive definite. According to Williamson’s theorem [26], d d′ := Xd + ξ . (40) there is a symplectic matrix S Sp(2n, R), i.e., a matrix 7→ More generally, the action on a general state ρ is determined satisfying SJST = J, such that ∈ by the action T SγS = diag(ν1,ν1,ν2,ν2,...,νn,νn) with νj 0 (34) 1 ξ Y ξ ≥ χρ(ξ) χ (ρ)(ξ)= χρ(Xξ)e− 4 · . (41) 7→ E is diagonal. Eq. (34) is called the Williamson normal form on characteristic functions. In the special case where (ρ)= of γ, and νj j are referred to as the Williamson eigenvalues. E R { } UρU † is unitary, we have Y = 0 and X =: S Sp(2n, ). Symplectic transformations S are in one-to-one correspon- We call such a unitary U Gaussian and sometimes∈ write U = dence with linear transformations R R := SR preserving ′ U to indicate it is defined by S (cf. (35)). the canonical commutation relations.7→ Each S Sp(2n, R) S ∈ A general Gaussian operation on n has a Stinespring defines a unitary US on n which realizes this map by E H R H dilation with a Gaussian unitary US, S Sp(2(n + m), ) conjugation, i.e., ∈ acting on the Hilbert space n m of n + m modes (for 2n some m) and a Gaussian ancillaryH ⊗ H state ρ on , i.e., it is B Hm US† RkUS = SkℓRℓ =: Rk′ . (35) of the form Xℓ=1 (ρ)=trB(US(ρ ρB)US† ) . (42) The unitary US is unique up to a global phase. E ⊗ If S Sp(2n, R) brings the covariance matrix of a Gaussian Since the operations of taking the tensor product and the partial ∈ state ρ into Williamson normal form as in (34), then Rk′ k trace correspond to taking the direct sum or a submatrix on the are the eigenmodes of ρ. In terms of the creation, annihilation{ } level of covariance matrices, identities (39) and (42) translate and number operators into T T 1 1 S(γ γB)S = XγX + Y, a† = (Qk′ + iPk′ ) ak = (Qk′ iPk′ ) ⊗ 2n k √2 √2 − where [ ]2n denotes the leading principle 2n 2n submatrix nk = a† ak , for k =1, . . . , n , · × k and γB is the covariance matrix of ρB. It is again convenient 7

to express this in terms of characteristic functions. First to (41) and (44), the state U (ρ ρ )U † has characteristic λ X ⊗ Y consider the partial trace: if ρn+m has characteristic function function R2n R2m χρn+m (ξ, ξ′), where ξ , ξ′ , then the partial † √ √ ∈ ∈ χUλ(ρX ρY )U (ξX , ξY )= χX ( λξX + 1 λξY ) trace ρn = trm ρn+m has characteristic function ⊗ − √ √ (48) 2m χY ( 1 λξX λξY ) . χρn (ξ)= χρn+m (ξ, 0 ) . · − − It follows that λ(ρ ρ) has characteristic function With (41), we get the transformation rule E ⊗ √ √ 1 2m 2m χ λ(ρX ρY )(ξ)= χX ( λξ) χY ( 1 λξ) . (49) 2m 4 (ξ,0 ) Y (ξ,0 ) E ⊗ · − χρ χ (ρ)(ξ) := (χ χB)(S(ξ, 0 ))e− · , 7→ E ⊗ (43) III. A QUANTUM DIFFUSION EQUATION where χB is the characteristic function of the Gaussian ancil- Consider the diffusion Liouvillean n lary state ρB. Here (χ χB)(ξ, ξ′) := χ(ξ) χB(ξ′) is the 1 ⊗ · (ρ)= ([Qj, [Qj , ]] + [Pj , [Pj , ]]) characteristic function of the product state ρ ρB . L −4 · · ⊗ j=1 X 1 2n B. Quantum addition using the beamsplitter: definition = [R , [R ,ρ]] . (50) −4 j j In this section, we specify the quantum addition opera- j=1 X tion (5) in more detail. It takes takes two n-mode states ρX ,ρY defined on n modes (cf. (23)). We first establish the relevant and outputs an n-mode state ρ ⊞ = λ(ρX ρY ), where t X λY E ⊗ properties of the one-parameter semigroup e L t 0, where 0 <λ< 1. t { } ≥ the CPTP map e L describes evolution under the Markovian To define the CPTP map λ, consider the transmissivity λ- master equation beam splitter. This is a GaussianE unitary U , whose action λ,n d on 2n modes is defined by the symplectic matrix ρ(t)= (ρ(t)) . (51) dt L √λIn √1 λIn for time t. We call this the diffusion semigroup. We will Sλ,n = − I2 , (44) √1 λIn √λIn ⊗ subsequently show that the maps et are compatible with  − −  L beamsplitters (Section III-A), and analyze the asymptotic scal- where we use a tensor product eX ,...,eX ,eY ,...,eY 1 n 1 1 ing of solutions t of (51) (Section III-B). q,p to represent{ the two sets}⊗ of S(ρ(t)) ρ(t)= e L(ρ) The following is well-known (see e.g., [27] and [28], {modes} (qX ,pX ,...,qX ,pX ) and (qY ,pY ,...,qY ,pY ). 1 1 n n 1 1 n n where is given in terms of creation- and annihilation Note that the symplectic matrix takes the form In J, with L ⊗ operators). 0 1 J = . Lemma III.1. Let be the Liouvillean (50). Then 1 0 L −  (i) The Liouvillean is Hermitian with respect to the X X Y Y L With respect to the n pairs (qj ,pj ), (qj , qj ) of modes, Hilbert-Schmidt inner product, i.e., n for j = 1,...,n, we have Sλ,n = Sλ,⊕1 , hence Uλ,n = n tr(ρ (σ)) = tr( (ρ)σ) (52) Uλ,⊗1 corresponds to independently applying a beam-splitter L L to each pair of modes. In the following, we will omit the for all states ρ, σ. Q1 P1 Qn Pn subscript n, and simply write ξX = (ξX , ξX ,...,ξX , ξX ) (ii) Let ρ0 be a state with characteristic function χρ0 (ξ), Q1 P1 Qn Pn t and ξY = (ξY , ξY ,...,ξY , ξY ) for the collection of phase and let ρ(t)= e L(ρ0) denote the solution of the Master t space variables associated with first and second set of n modes. equation (51) with initial value ρ(0) = ρ0. Then e L(ρ0) The map λ is defined by conjugation with Uλ and tracing has characteristic function out the secondE set of modes, i.e., it is χ(t)(ξ)= χ (ξ)exp ξ 2t/4 , (53) ρ0 −k k λ(ρXY )=trY UλρXY Uλ† . (45) 2 2n 2 E where ξ = j=1 ξj .  k k t Clearly, this is a Gaussian map. 
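To make the Gaussian action of the beamsplitter concrete, here is a small sketch (not from the paper; function name and test matrices are our own, assuming NumPy) of E_λ on the moments of a product of two single-mode Gaussian states: it builds the symplectic matrix (44), conjugates the joint covariance matrix and displacement vector, discards the second mode by keeping the leading block, and recovers the closed form stated in (47) below.

```python
import numpy as np

def E_lambda(gamma_X, d_X, gamma_Y, d_Y, lam):
    """Action of the beamsplitter map (45) on the moments of a product of two one-mode states."""
    a, b = np.sqrt(lam), np.sqrt(1 - lam)
    S = np.kron(np.array([[a, b], [-b, a]]), np.eye(2))     # Eq. (44) for n = 1
    gamma = np.block([[gamma_X, np.zeros((2, 2))],
                      [np.zeros((2, 2)), gamma_Y]])         # product state: no X-Y correlations
    d = np.concatenate([d_X, d_Y])
    # conjugate by U_lambda (gamma -> S gamma S^T, d -> S d), then trace out the
    # second mode by keeping the leading 2x2 block / first two entries
    return (S @ gamma @ S.T)[:2, :2], (S @ d)[:2]

rng = np.random.default_rng(1)
lam = 0.35
gamma_X = np.diag([2.0, 0.6])          # squeezed thermal state (det >= 1, hence valid)
gamma_Y = 3.0 * np.eye(2)              # thermal state
d_X, d_Y = rng.normal(size=2), rng.normal(size=2)

g_out, d_out = E_lambda(gamma_X, d_X, gamma_Y, d_Y, lam)
print(np.allclose(g_out, lam * gamma_X + (1 - lam) * gamma_Y))          # matches Eq. (47)
print(np.allclose(d_out, np.sqrt(lam) * d_X + np.sqrt(1 - lam) * d_Y))  # matches Eq. (47)
```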
A bipartite Gaussian state ρ (iii) For any t 0, the CPTPM e L is a Gaussian map acting XY ≥ P with covariance matrix and displacement vector on covariance matrices and displacement vectors by

γX γXY γ γ′ = γ + tI2n γ = , d = (dX , dY ) (46) 7→ (54) γ γ d d′ = d .  Y X Y  7→ Proof: If is the characteristic function of a state , gets mapped into a Gaussian state λ(ρXY ) with χρ(ξ) ρ E we can express the commutator [Rj ,ρ] as γ′ = λγX + (1 λ)γY + λ(1 λ)(γXY + γY X ) − − 1 2 d′ = √λdX + √1 λdY [R ,ρ]= χ (ξ)ξ D( ξ)d ξ . p − j (2π)n ρ j − (47) Z by using (27) and the displacement property (26). Iterating This completely determines the action of on Gaussian λ this argument gives inputs, but we will also be interested in moreE general inputs 1 of product form. To get an explicit expression, consider two [R , [R ,ρ]] = χ (ξ)ξ2D( ξ)d2ξ , j j (2π)n ρ j − states ρX ,ρY with characteristic functions χX ,χY . According Z 8 hence Theorem III.3 (Entropy scaling under diffusion). Let ρ be an arbitrary (not necessarily Gaussian) state, whose covariance 1 2 2 (ρ)= χρ(ξ) ξ D( ξ)d ξ . (55) matrix has symplectic eigenvalues ν ,...,ν . Let t> 2. Then L −4(2π)n k k − 1 n Z n t In other words, the operator (ρ) has characteristic function ng(N(t 1)) S(e L(ρ)) g(N(t + νk)) , (57) 1 2 L − ≤ ≤ χ (ρ)(ξ) = χρ(ξ) ξ . Together with (29), this immedi- k=1 L − 4 k k X ately implies (52). where g(N) = (N + 1) log(N + 1) N log N is the entropy Let ρ(t) be the operator with characteristic function (53). of one-mode state with mean photon− number N and N(ν)= By commuting derivative and integration, we have (ν 1)/2. − d d 1 2 2 Proof: The lower bound is essentially the lower bound ρ(t)= χρ (ξ) exp( ξ t/4)D( ξ)d ξ dt dt (2π)n 0 −k k − on the minimum output entropy from [24, (33)], generalized Z 1 to several modes. We recall the necessary definitions in the = χ (ξ) ξ 2 exp( ξ 2t/4)D( ξ)d2ξ −4(2π)n ρ0 k k −k k − Appendix. Any state can be written in the form Z 1 2 2 2n = χρ(t)(ξ) ξ D( ξ)d ξ ρ = Q(η)σ(η)d η with −4(2π)n k k − Z Z 1 ξ 2/4 iξ Jη 2n Combined with (55), this shows that ρ(t) is indeed the solution σ(η)= ek k e · D( ξ)d ξ , t (2π)n − to the Liouville equation, i.e., ρ(t)= e L(ρ). This proves (ii). Z Finally, property (iii) follows from (ii) since multiplying a where Q is a probability distribution, i.e., Q(η) 0 and Gaussian characteristic function by a Gaussian preserves its Q(η)d2nη = 1. The operators σ(η) are generally≥ not t Gaussianity. quantum states. However, by linearity of the superoperator e L, weR can compute

t 1 (t 1)/4 ξ 2 iξ Jη 2n A. Compatibility with the beamsplitter e L(σ(η)) = e− − ·k k e · D( ξ)d ξ (2π)n − In the same way as convolution is compatible with the heat Z t equation (cf. (10)), beamsplitters are compatible with diffusion according to Lemma III.1, which shows that e L(σ(η)) is a displaced thermal state with covariance matrix (t 1)I if defined by the Liouvillean . This can be understood as a − 2n consequence of the fact that,L on the level of Wigner functions t> 2. Because entropy is concave, we therefore get

(i.e., the Fourier transform of the characteristic function), both t t 2n t S(e L(ρ)) S(e L(σ(η)))d η = ng(N(t 1)) , the beam-splitter map λ and the diffusion e L are described ≥ − E Z by a convolution. Here we give a compact proof based on the and the claim follows. fact that these maps are Gaussian. For the proof of the upper bound in (57), we can assume

Lemma III.2. Let tX ,tY 0, 0 <λ< 1 and let be the without loss of generality that ρ is Gaussian. This is because diffusion Liouvillean acting≥ on n modes. Then L the maximum entropy principle [30] states that for any state ρ, we have tX tY t λ (e L e L) e L λ (56) E ◦ ⊗ ≡ ◦E S(ρ) S(ρG) , (58) ≤ where t = λt + (1 λ)t . X − Y where ρG is a Gaussian state with covariance matrix identical Proof: According to Sections II-B and III, both λ to ρ. The remainder of the proof closely follows arguments t E and e L are Gaussian maps, hence this is true also for both in [29]. It is a well-known fact that the entropy of a state ρ sides of identity (56). This means it suffices to verify that they can be computed by taking the limit agree on all Gaussian input states ρXY described by (46). The 1 S(ρ) = lim log tr ρp . (59) claim follows immediately from (47) and (54): Indeed, both p 1+ 1 p t t t → (e X L e Y L)(ρ ) and e L( (ρ )) are described by − Eλ ⊗ XY Eλ XY For a Gaussian state ρ whose covariance matrix has symplectic

γ′ = λγX + (1 λ)γY + tI eigenvalues ν = (ν1,...,νn), the expression on the rhs. is − equal to [31] d′ = √λdX + √1 λdY . n − 2p 2pn tr ρp = := and where fp(νj ) Fp(ν) k=1 Y p p fp(ν) = (ν + 1) (ν 1) (60) B. Scaling of the entropy as t − − → ∞ for any p 1. Weak submajorization, written x w y for two The following result will be essential for the proof of the vectors x,≥ y Rn, is defined by the condition ≺ entropy power inequality. It derives from a combination of ∈ k k arguments from [24] and [29], as well as the maximum entropy w x y x↓ y↓ for k =1, . . . , n , principle [30]. ≺ ⇔ j ≤ j j=1 j=1 X X 9

where x1↓ x2↓ xn↓ and y1↓ y2↓ yn↓ are for t , for some constant ν depending on ρ. The claim decreasing≥ rearrangements≥ ··· ≥ of x and y≥, respectively.≥ ··· ≥ In [29], follows→ from ∞ this. ∗ t it is argued that the function ν Fp(ν) respects the partial A simple consequence of Corollary III.4 is that e L(ρ) order imposed by weak submajorization7→ on Rn, i.e., it has the converges in relative entropy to a Gaussian state: we have property t t G lim S(e L(ρ) e L(ρ) )=0 , (64) w t F (x) F (y) if x y . (61) →∞ k p ≥ p ≺ where σG denotes the ‘Gaussification’ of σ, i.e., the Gaussian Furthermore, [29, Theorem 1] states that for any 2n 2n real state with identical first and second moments. Indeed, because positive symmetric matrices A and B, the vector of symplectic× et is a Gaussian map (Lemma III.1), we have et (ρ)G = eigenvalues of their sum A + B is submajorized by the sum L L et (ρG) for any state ρ. Therefore, of the corresponding vectors of A and B, respectively, i.e., L t t G t t G S(e L(ρ) e L(ρ) )= S(e L(ρ) e L(ρ )) ν(A + B) w (ν(A)+ ν(B)) . (62) k t k t t G ≺ = S(e L(ρ)) tr(e L(ρ) log e L(ρ )) − − Combining (59), (60), (61) and (62) gives the following t t G t G = S(e L(ρ)) tr(e L(ρ) log e L(ρ )) statement. Let ρ[γ] denote the centered Gaussian state with − − t t G t G covariance matrix γ. Let γA > 0 and γB > 0 be positive = S(e L(ρ)) tr(e L(ρ ) log e L(ρ )) A B − t − t G covariance matrices with symplectic eigenvalues ν ,ν . Then = S(e L(ρ)) + S(e L(ρ )) . − n t G In the third identity, we used the fact that log e L(ρ ) is S(ρ[γ + γ ]) S ρ (νA + νB )I . (63) A B ≤   j j 2 quadratic in the mode operators. Statement (64) now follows j=1 M from Corollary III.4. The upper bound (57) follows  immediately from (58) and (63) applied to γA = γ and γB = tI2n (a valid covariance IV. DIVERGENCE-BASED QUANTUM FISHER matrix for t 1). This is because for an initial state ρ INFORMATION with covariance≥ matrix γ, the time-evolved state et (ρ) has L In this section, we introduce the divergence-based Fisher covariance matrix γ + tI according to Lemma III.1. 2n information and establish its main properties. We restrict our Using Theorem III.3, we can show that the asymptotic focus to what is needed for the proof of the entropy power scaling of the entropy (or the entropy power) is independent inequality. We refer to the literature for substantially more of the initial state ρ, in the following sense. general results concerning quasi-entropies and quantum Fisher t Corollary III.4. The entropy and the entropy power of e L(ρ) information (see e.g., [22], [23], [32], [33]). grow logarithmically respectively linearly as t , with an Recall that the relative entropy or divergence of two (in- → ∞ asymptotic time-dependence independent of the initial state ρ. vertible) density operators ρ, σ is defined as S(ρ σ) = More precisely, we have tr ρ(log ρ log σ). The divergence is nonnegative and faithful,k − t i.e., S(e L(ρ))/n (1 log 2 + log t) O(1/t) − − ≤ 1 tL e S(ρ σ) 0 and S(ρ σ)=0 if and only if ρ = σ . eS(e (ρ))/n O(1/t) . k ≥ k t − 2 ≤ (65) where the constants in O( ) depend on ρ. · Furthermore, it is monotone, i.e., non-increasing under the application of a CPTPM , Proof: From the Taylor series expansions E log(1 + ǫ)= ǫ + O(ǫ2) S( (ρ) (σ)) S(ρ σ) . (66) E kE ≤ k e (1 + ǫ)1/ǫ = e ǫ + O(ǫ2) Properties (65) and (66) tell us that S( ) may be thought 2 − of as a measure of closeness. 
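The state-independent asymptotics of Corollary III.4 are easy to visualize for Gaussian initial states, where e^{tL} simply shifts the covariance matrix to γ + tI (Lemma III.1) and the entropy follows from (38). The sketch below (our own illustration, assuming NumPy; the particular covariance matrix is an arbitrary squeezed-thermal example) prints two differences that both tend to zero as t grows, at rate O(1/t).

```python
import numpy as np

def g(N):
    return (N + 1) * np.log(N + 1) - N * np.log(N)

def entropy(gamma):                     # Eq. (38)
    n = gamma.shape[0] // 2
    J = np.kron(np.eye(n), np.array([[0.0, 1.0], [-1.0, 0.0]]))
    nus = np.sort(np.abs(np.linalg.eigvals(1j * J @ gamma)))[::2]
    return sum(g((nu - 1) / 2) for nu in nus)

gamma0 = np.diag([6.0, 0.4, 2.5, 0.9])  # two squeezed thermal modes (block determinants >= 1)
n = 2
for t in [10.0, 100.0, 1000.0, 10000.0]:
    S_t = entropy(gamma0 + t * np.eye(4))           # gamma -> gamma + t*I under e^{tL}
    print(t,
          S_t / n - (1 - np.log(2) + np.log(t)),    # -> 0 as t grows
          np.exp(S_t / n) / t - np.e / 2)           # -> 0 as t grows
```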
A third important·k· property of we get divergence is its additivity under tensor products, g(N) = log N +1+ O(1/N) 1 1 S(ρ ρ σ σ )= S(ρ σ )+ S(ρ σ ) . (67) eg(N) = + N e + O( ) for N . 1 ⊗ 2k 1 ⊗ 2 1k 1 2k 2 2 N → ∞ n (θ) Replacing the rhs of (57) with the upper bound g(N(t+ Consider a smooth one-parameter family θ ρ of states. k=1 Using divergence to quantify how much these7→ states change as νk)) n maxk g(N(t + νk)) =: ng(N(t + ν )) and using ≤ ∗ R N(ν) = (ν 1)/2 therefore gives P we deviate from a basepoint θ0 , it is natural to consider (θ0) (θ) ∈ − the function θ S(ρ ρ ) in the neighborhood of θ0. In t t 7→ k log( 1)+1 O(1/t) S(e L(ρ))/n the following, we assume that it is twice differentiable at θ0 2 − − ≤ (but see comment below). According to (65), this function t + ν 1 log( ∗ − )+1+ O(1/t) vanishes for θ = θ0, and is nonnegative everywhere. This ≤ 2 implies that its first derivative vanishes, i.e., and d tL (θ0) (θ) t 1 1 S(e (ρ))/n t ν∗ 1 S(ρ ρ ) =0 . (68) 2 e 2 e O( t ) e 2 e + 2 e + O( t ) dθ k θ=θ0 − − ≤ ≤

10

One is therefore led to consider the second derivative, which Let us argue that an analogous identity holds for the fam- (θ) θ0 θ we will denote by ily (ρ ) θ. Define gθ0 (θ) = S( (ρ ) (ρ )). We know from{E (65), (68)} and the monotonicityE (66)kE that d2 J(ρ(θ); θ) = S(ρ(θ0) ρ(θ)) . (69) θ=θ0 2 0 g (θ) f (θ) for all θ and | dθ k θ=θ0 ≤ θ0 ≤ θ0 d (74) We call the quantity (69) the divergence-based Fisher infor- gθ0 (θ0)= fθ0 (θ0)= fθ0 (θ) =0 . mation of the family ρ(θ) . Its properties are as follows: dθ θ=θ0 { }θ Therefore, the derivative of g is Lemma IV.1 (Reparametrization formula). For any constant θ0 c, we have d gθ0 (θ0 + ǫ) gθ0 (θ0) gθ0 (θ) = lim − dθ θ=θ0 ǫ 0 ǫ (cθ) 2 (θ) → J(ρ ; θ) θ=θ0 = c J(ρ ; θ) θ=θ0 | | gθ0 (θ0 + ǫ) (θ+c) (θ) = lim 0 . J(ρ ; θ) θ=θ0 = J(ρ ; θ) θ=θ0+c . ǫ 0 ǫ ≥ | | → On the other hand, (74) also gives Proof: This is immediate from Definition (69).
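As noted above, the classical Fisher information can itself be written as a second derivative of the relative entropy along a translation family. The following grid-based sketch (our own, assuming NumPy; the mixture density and step sizes are arbitrary choices) compares the finite-difference second derivative of D(f‖f^θ) at θ = 0 with the integral ∫ f′(x)²/f(x) dx of (11), which is the classical template for definition (69).

```python
import numpy as np

x = np.linspace(-30, 30, 12001)
dx = x[1] - x[0]
f = (0.6 * np.exp(-(x + 1.5) ** 2 / 1.0) / np.sqrt(np.pi * 1.0)
     + 0.4 * np.exp(-(x - 2.0) ** 2 / 2.4) / np.sqrt(np.pi * 2.4))   # Gaussian mixture

def shift(f, theta):                    # density of X + theta (translation family (12))
    return np.interp(x - theta, x, f, left=0.0, right=0.0)

def D(p, q):                            # classical relative entropy (nats)
    mask = p > 1e-30
    return np.sum(p[mask] * np.log(p[mask] / (q[mask] + 1e-300))) * dx

J = np.sum(np.gradient(f, dx) ** 2 / (f + 1e-300)) * dx     # Eq. (11)

eps = 0.05                              # second derivative at theta = 0 by central differences
second_derivative = (D(f, shift(f, eps)) + D(f, shift(f, -eps))) / eps ** 2
print(second_derivative, J)             # the two values nearly coincide
```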

d gθ0 (θ0 + ǫ) fθ0 (θ0 + ǫ) Lemma IV.2 (Non-negativity). The Fisher information satis- gθ0 (θ) = lim lim dθ θ=θ0 ǫ 0 ǫ ǫ 0 ǫ fies → ≤ → d = f (θ) =0 , (θ) θ0 J(ρ ; θ) 0. (70) dθ θ=θ0 θ=θ0 ≥ hence (θ0) (θ) Proof: Define fθ (θ)= S (ρ ρ ). We know that 0 k d gθ0 (θ) =0 . fθ (θ) 0 for all θ and fθ (θ0)=0 . dθ θ=θ0 0 ≥ 0 Similarly, writing the second derivative of g as a limit as The claim follows from this by writing the second derivative θ0 in (71) and using (74) gives as a limit d2 d2 d2 d2 (θ0) (θ) 0 2 gθ0 (θ) 2 fθ0 (θ) . 2 S(ρ ρ ) = 2 fθ0 (θ) ≤ dθ θ=θ0 ≤ dθ θ=θ0 dθ k θ=θ0 dθ θ=θ0 In summary, the function g has vanishing first and bounded fθ0 (θ0 + ǫ) 2fθ0 (θ0)+ fθ0 (θ0 ǫ) θ0 = lim − − . ǫ 0 ǫ2 second derivative, which shows that → (71) (θ0) (θ) 1 (θ) 2 2 S( (ρ ) (ρ )) = J( (ρ ); θ) (θ θ0) + o( θ θ0 ) E kE 2 E θ=θ0· − | − | (75) (θ) (θ) Lemma IV.3 (Additivity). For a family ρA ρB θ of bipartite product states, we have { ⊗ } The claim of the theorem now follows from (73), (75) and the data processing inequality (66). (θ) (θ) (θ) (θ) J(ρA ρB ; θ) = J(ρA ; θ) + J(ρB ; θ) . One may worry about the differentiability of the function ⊗ θ=θ0 θ=θ0 θ=θ0 for an arbitrary smooth one-parameter fam- (72) θ S(ρθ0 ρθ) ily7→ρ(θ) onk an infinite-dimensional Hilbert space. However, { }θ Proof: This directly follows from the additivity (67) of we do not need this full generality here. Throughout, we are divergence. only concerned with covariant families of states Most importantly, the divergence-based Fisher information (θ) i(θ θ0)H (θ0) i(θ θ0)H ρ = e − ρ e− − for all θ,θ R (76) satisfies the data processing inequality. 0 ∈ Theorem IV.4 (Data processing for Fisher information). Let generated by a Hamiltonian H. For a family of the form (76), be an arbitrary CPTPM. Then we can easily compute the derivative E d J( (ρ(θ)); θ) J(ρ(θ); θ) . S(ρ(θ0) ρ(θ)) = tr ρ(θ0)[iH, log ρ(θ0)] , (77) θ=θ0 θ=θ0 E ≤ dθ k θ=θ0 −   Proof: The proof proceeds in the same way as a well- In accordance with (68), this can be seen to vanish by inserting known proof for the information-processing property of clas- the spectral decomposition of ρ(θ0). We can also give an sical Fisher information (see e.g., [34]). We show here that our explicit expression for the second derivative, the divergence- (θ0) (θ) assumption that fθ0 (θ)= S(ρ ρ ) is twice differentiable based Fisher information. It will be convenient to state this as k at θ = θ0 is sufficient to give the desired inequality. From (68) a lemma. and the definition of Fisher information, we conclude that Lemma IV.5. Let ρ(θ) be a covariant family of states as { }θ (θ0) (θ) 1 (θ) 2 2 in (76). Then S(ρ ρ )= J(ρ ; θ) (θ θ0) + o( θ θ0 ) . k 2 θ=θ0 · − | − | (73) J(ρ(θ); θ) = tr(ρ(θ0)[H, [H, log ρ(θ0)]]) . (78) θ=θ0 | 11

i(θ θ0)H Proof: Define U(θ)= e − . Using U(θ)HU(θ)† = VI. FISHERINFORMATIONINEQUALITIESFORTHE H, we calculate BEAMSPLITTER A. Compatibility of the beam splitter with translations d (θ) (θ0) log ρ = [iH,U(θ)(log ρ )U(θ)†] dθ We first establish a slight generalization of the compat- (θ0) = U(θ)[iH, log ρ ]U(θ)† , ibility condition (14) between the addition rule defined by the CPTPM λ (cf. (45)) and translations DR in phase hence space (cf. (79)).E 2 d (θ) (θ ) 0 Lemma VI.1. Let θ, wX , wY R and consider the CPTP 2 log ρ = [iH,U(θ)[iH, log ρ ]U(θ)†] . dθ maps ∈ Therefore (ρ)= 2 2 F d (θ0) (θ) (θ0) d (θ) S(ρ ρ )= tr(ρ log ρ ) (D (w θ) D (w θ)) ρ (D (w θ) D (w θ))† dθ2 k − dθ2 Eλ R X ⊗ R Y R X ⊗ R Y (θ0) (θ0)   = tr(ρ iH,U(θ)[iH, log ρ ]U(θ)† ) and − h i and evaluating this at θ = θ gives the claim. (ρ)= D (wθ) (ρ)D (wθ)† 0 G R Eλ R

where w = √λwX + √1 λwY . Then . In particular, V. THEQUANTUMDE BRUIJIN IDENTITY − F ≡ G (ρ(wX θ,R) ρ(wY θ,R)) = ( (ρ ρ ))(wθ,R) The quantum analog of the translation rule (12) is defined Eλ X ⊗ Y Eλ X ⊗ Y in terms of phase space translations: For R Qj, Pj define for any two n-mode states ρ and ρ , where ρ ρ(θ,R) is ∈{ } X Y the displacement operator in the direction R as defined by (80). 7→

iθPj e if R = Qj Proof: Since both and are Gaussian (as compositions DR(θ)= (79) F G iθQj of Gaussian operations), it suffices to show that they agree (e− if R = Pj . on Gaussian inputs. Note that ρ D (θ)ρD (θ)† leaves the 7→ R R For a state ρ, consider the family of translated states covariance matrix ρ invariant while adding d′ = d+θ eR to the · displacement vector, where eR is the standard basis vector in (θ,R) R ρ = DR(θ)ρDR(θ)† θ (80) the -direction. This, together with (47), immediately implies ∈ R that . and its Fisher information (θ,R) . With a slight abuse J(ρ ; θ) θ=0 F ≡ G of terminology, we will call the quantity B. Stam inequality and convexity inequality for Fisher infor- 2n mation (θ,Rk) (81) J(ρ) := J(ρ ; θ) θ=0 k=1 The following Fisher information inequality for the quan- X tity J(ρ) (cf. (81)) is a straightforward consequence of the obtained by summing over all phase space directions the compatibility of beam-splitters with translations, and the data Fisher information of ρ. The quantum version of de Bruijin’s processing inequality for Fisher information. The arguments identity (11) then reads as follows: used here are identical to those applied in Zamir’s proof [18] Theorem V.1 (Quantum de Bruijin). Let be the Liouvil- (described transparently in [36] for the special case of interest lean (50). The rate of entropy increase whenL evolving under here) of Stam’s inequality. The following statement implies a t the one-parameter semigroup e L t 0 from the initial state ρ quantum version of the latter and a convexity inequality of the is given by { } ≥ form (15). d 1 Theorem VI.2 (Quantum Fisher information inequality). Let S(e t(ρ)) = J(ρ) . (82) L t=0 w , w R and 0 <λ< 1. Let ρ ,ρ be two n-mode dt 4 X Y ∈ X Y states. Then Proof: We use the expression (see e.g.,[35]) w2J( (ρ ρ )) w2 J(ρ )+ w2 J(ρ ) d λ X Y X X Y Y t (83) E ⊗ ≤ S(e L(ρ)) t=0 = tr( (ρ) log ρ) . dt − L where w = √λwX + √1 λwY . − for the rate of entropy increase under a one-parameter semi- n t Proof: Let R Qj, Pj j=1 and consider the fam- group e L. Because is Hermitian (Lemma III.1), this is equal ∈ {(wX θ,R)} (wY θ,R) L ily of product states ρX ρY θ and the fam- to (wθ,R{ ) ⊗ } ily ( λ(ρX ρY )) θ. By Lemma VI.1, the latter is d t { E ⊗ } S(e L(ρ)) = tr(ρ (log ρ)) , (84) obtained by data processing from the former. By the data dt t=0 − L processing inequality for Fisher information, this implies The claim now follows from the definition (50) of the Liou- (wθ,R) (wX θ,R) (wY θ,R) villean and Lemma IV.5. J ( λ(ρX ρY )) J(ρX ρY ; θ) , L E ⊗ θ=0≤ ⊗ θ=0  

12

t and by the properties of Fisher information, this is equivalent Because e L t is a semigroup, we get to { } d t t 2 (wθ,R) 4 δ(t)= J(e L( λ(ρX ρY ))) λJ(e L(ρX )) w J ( λ(ρX ρY )) dt E ⊗ − E ⊗ θ=0 t (1 λ)J(e L(ρY )) ,  2 (θ,R) 2 (θ,R) − − wX J (ρX ) θ=0 + wY J(ρY ; θ) . ≤ | θ=0by the quantum de Bruijin identity (Theorem V.1). Because t The claim is obtained by summing over all phase space of the compatibility of λ with e L (Lemma III.2), we can E directions R. rewrite this as The following convexity inequality for Fisher information d t t t 4 δ(t)= J( (e L(ρ ) e L(ρ ))) λJ(e L(ρ )) is an immediate consequence of Theorem VI.2, obtained by dt Eλ X ⊗ Y − X t setting wX = √λ and wY = √1 λ. (1 λ)J(e L(ρ )) . − − − Y Corollary VI.3 (Convexity inequality for Fisher information). Using the convexity inequality for Fisher information (Corol- Let ρX ,ρY be n-mode states and 0 <λ< 1. Then lary VI.3), the inequality (87) follows. Specializing Theorem VII.1 to Gaussian states, we obtain J( (ρ ρ )) λJ(ρ )+(1 λ)J(ρ ) . (85) Eλ X ⊗ Y ≤ X − Y a concativity inequality for the entropy with respect to the We also obtain the following quantum analog of the classical covariance matrix. That is, denoting by ρ[γ] the centered Stam inequality (16). Gaussian state with covariance matrix γ, we have Corollary VI.4 (Quantum Stam inequality). Let ρ ,ρ be S(λρ[γ ]+(1 λ)ρ[γ ]) λS(ρ[γ ]) + (1 λ)S(ρ[γ ]) X Y 1 − 2 ≥ 1 − 2 arbitrary states and consider the 50 : 50-beamsplitter map (88) = . Then E E1/2 for all 0 <λ< 1. 2 1 1 Finally, we can obtain the quantum analog (qEPI) of the + . J( (ρX ρY )) ≥ J(ρX ) J(ρY ) classical entropy power inequality (cEPI) for the 50 : 50- E ⊗ beamsplitter, again mimicing a known classical proof. Proof: This follows from Theorem VI.2 by setting (λ = 1/2) and Theorem VII.2 (Entropy power inequality for 50:50 beam- splitter). Consider the map associated with a 50:50 J(ρ ) 1 = 1/2 X − and E E wX = 1 1 beamsplitter. Let ρX ,ρY be two n-mode states. Then J(ρX )− + J(ρY )− 1 S( (ρ ρ ))/n 1 S(ρ )/n 1 S(ρ )/n J(ρY )− e E X ⊗ Y e X + e Y . (89) wY = . ≥ 2 2 J(ρ ) 1 + J(ρ ) 1 X − Y − Proof: The proof follows Blachman [17], and is repro- duced here for the reader’s convenience. Define the functions F VII. QUANTUMENTROPYPOWERINEQUALITIES F EX (F ) := exp S(e L(ρX ))/n 7→ G G EY (G) := exp S(e L(ρY ))/n (90) Having introduced the right quantum counterparts of the 7→ H  H E (H) := exp S(e L( (ρ ρ )))/n relevant classical concepts, the proof of the quantum entropy 7→ Z E X ⊗ Y  power inequalities is straightforward. We first provide a proof expressing the entropy powers of the states obtained by letting of the version (qEPI ) for the transmissivity -beamsplitter. diffusion act on ρX ,ρY and (ρX ρY ) for times F, G ′ λ E ⊗ It closely follows an argument given by Dembo, Cover and and H 0, respectively. According to Corollary III.4, these ≥ Thomas in [13] for the classical statement (cEPI′), but relies functions have identical scaling on a slight adaptation that allows us to generalize it to the E(T ) const. + Te/2 for T . (91) quantum setting. ∼ → ∞ Moreover, since EX and EY are continuous functions on Theorem VII.1 (Quantum entropy power inequality for trans- the positive real axis, Peano’s existence theorem shows that missivity λ). Let 0 <λ< 1 and let be the map (45) Eλ the initial value problems associated with a beamsplitter of transmissivity λ. Let ρX F˙ (t) = E (F (t)) , F (0) = 0 and ρY be arbitrary n-mode states. Then X (92) G˙ (t) = EY (G(t)) , G(0) = 0 S( λ(ρX ρY )) λS(ρX )+(1 λ)S(ρY ) . 
(86) E ⊗ ≥ − have (not necessarily unique) solutions F ( ), G( ). Fix a pair Proof: Define of solutions and set · · t t δ(t)= S(e L( (ρ ρ ))) λS(e L(ρ )) Eλ X ⊗ Y − X H(t) = (F (t)+ G(t))/2 . (93) t (1 λ)S(e L(ρ )) . − − Y Since We want to show that δ(0) 0. Since limt δ(t)=0 by g(F (t) 1) g(G(t) 1) ≥ →∞ F˙ (t) e − > 0 and G˙ (t) e − > 0 the asymptotic scaling of these entropies (Corollary III.4), it ≥ ≥ suffices to show that according to Theorem III.3, these functions diverge, i.e., d δ(t) 0 for all t 0 . (87) lim F (t) = lim G(t) = lim H(t)= . (94) t t t dt ≤ ≥ →∞ →∞ →∞ ∞ 13
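For Gaussian inputs, both the concavity statement (88) and the 50:50 inequality (89) reduce, via the covariance rule (47) and the entropy formula (38), to elementary statements about symplectic eigenvalues. The following sketch (our own spot-check, assuming NumPy; random covariance matrices are generated as AAᵀ + I, which guarantees γ ≥ iJ) verifies both on a sample of random two-mode Gaussian states.

```python
import numpy as np

n = 2                                               # modes per input
J = np.kron(np.eye(n), np.array([[0.0, 1.0], [-1.0, 0.0]]))
rng = np.random.default_rng(5)

def g(N):
    return (N + 1) * np.log(N + 1) - N * np.log(N) if N > 1e-12 else 0.0

def entropy(gamma):                                 # Eq. (38)
    nus = np.sort(np.abs(np.linalg.eigvals(1j * J @ gamma)))[::2]
    return sum(g((nu - 1) / 2) for nu in nus)

def random_covariance():
    A = rng.normal(size=(2 * n, 2 * n))
    return A @ A.T + np.eye(2 * n)                  # gamma >= I, hence gamma >= iJ (valid state)

for _ in range(200):
    gX, gY = random_covariance(), random_covariance()
    lam = rng.random()
    SX, SY = entropy(gX), entropy(gY)
    # concavity of the Gaussian entropy with respect to the covariance matrix, cf. (88)
    assert entropy(lam * gX + (1 - lam) * gY) >= lam * SX + (1 - lam) * SY - 1e-9
    # (89) for Gaussian inputs: the 50:50 output is Gaussian with covariance (gX + gY)/2, cf. (47)
    S_out = entropy(0.5 * (gX + gY))
    assert np.exp(S_out / n) >= 0.5 * np.exp(SX / n) + 0.5 * np.exp(SY / n) - 1e-9
print("checked (88) and (89) on 200 random pairs of Gaussian states")
```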

Consider the function where we inserted the definition (92) of F and G. Replac- ing JZ (H) by the upper bound (99) and further suppressing EX (F (t)) + EY (G(t)) δ(t)= . dependencies finally gives 2EZ (H(t)) With the initial conditions (92), it follows that the claim (89) ˙ 2 2 2 JX JY 8nδ EX JX + EY JY (EX + EY ) is equivalent to ≥ − JX + JY 2 (EX JX EY JY ) δ(0) 1 . (95) = − , ≥ JX + JY Inequality (95) follows from two claims: first, we have hence (96) follows from the nonnegativity of the Fisher information. lim δ(t)=1 , t →∞ as follows immediately from the asymptotic scaling (91) of APPENDIX the entropy powers E, as well as (94) and the choice (93) of H(t). Second, we claim that the derivative of δ satisfies Define the Q-function in terms of the characteristic function d by the expression δ(t) 0 for all t 0 . (96) dt ≥ ≥ 1 iJξ η η 2/4 2n Q(ξ)= e− · χ(η)e−k k d η . (100) Define, in analogy to (90), the Fisher informations of the (2π)2n Z states ρX ,ρY and (ρX ρY ) after diffusion for times F, G and H 0 as E ⊗ Identity (100) can be inverted to give ≥ F JX (F ) := J e L(ρX ) η 2/4 iη Jξ 2n G χ(η)= ek k e · Q(ξ)d ξ . (101) JY (G) := J e L(ρY ) (97) H  Z JZ (H) := J e L( (ρX ρY )) . E ⊗  Inserting the definition (28) of the characteristic function T Using the fact that e L T is a semigroup, we can restate the into (100) gives quantum Bruijin identity{ } (Theorem V.1) as 1 1 iη Jξ η 2/4 2n 1 Q(ξ)= tr ρ e · D( η)e−k k d η E˙ (T )= E (T )J (T ) where A X,Y,Z (2π)n (2π)n − A 4n A A ∈{ }  Z  (98) Comparing this with (32), we conclude that Note also that J (F ),J (G) and J (H) are related by X Y Z 1 Stam’s inequality if H = (F + G)/2 is the average (93) of F Q(ξ)= n ξ ρ ξ , and G: because (2π) h | | i

H F G where ξ = D(ξ) 0 is a coherent state. e L( (ρX ρY )) = (e L(ρX ) e L(ρY )) , | i | i E ⊗ E ⊗ This shows that Q(ξ) 0 for all ξ R2n. Furthermore, by the compatibility of the beamsplitter with diffusion since the coherent states satisfy≥ the completeness∈ relation (Lemma III.2), we obtain 1 2n 2JX (F )JY (G) I = n ξ ξ d ξ , JZ (H) (99) (2π) | ih | ≤ JX (F )+ JY (G) Z 2n using the quantum Stam inequality (Corollary VI.4) and the we have Q(ξ)d ξ =1, i.e., Q is a probability density. The state ρ can be expressed in terms of Q using (101) nonnegativity of divergence-based Fisher information. Com- R puting the derivative of δ, and (27): we have

E˙ (F (t))F˙ (t)+ E˙ (G(t))G˙ (t) 2n δ˙(t)= X Y ρ = Q(η)σ(η)d η where 2EZ (H(t)) Z 1 ξ 2/4 iξ Jη 2n EX (F (t)) + EY (G(t)) (102) ˙ ˙ σ(η)= n ek k e · D( ξ)d ξ . 2 EZ (H(t))H(t) , (2π) − − 2EZ (H(t)) · Z inserting (98) and suppressing the δ-dependence leads to ACKNOWLEDGMENTS 8nδ˙ = EX (F )JX (F )F˙ + EY (G)JY (G)G˙ The authors thank Mark Wilde for his comments on the (E (F )+ E (G)) E (H)J (H)H˙ X Y Z Z manuscript. G.M. acknowledges support by DARPA QUEST −2 2 · = EX (F ) JX (F )+ EY (G) JY (G) program under contract no. HR0011-09-C-0047. R.K. ac- 1 knowledges support by NSERC and IBM Watson research, (E (F )+ E (G))2J (H) , − 2 X Y Z where most of this work was done. 14

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, October 1948.
[2] A. R. Barron, "Entropy and the central limit theorem," Ann. Probab., vol. 14, no. 1, pp. 336–342, 1986.
[3] A. E. Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2012.
[4] M. H. M. Costa and T. M. Cover, "On the similarity of the entropy power inequality and the Brunn-Minkowski inequality," IEEE Transactions on Information Theory, vol. 30, no. 6, pp. 837–839, 1984.
[5] S. Szarek and D. Voiculescu, "Volumes of restricted Minkowski sums and the free analogue of the entropy power inequality," Communications in Mathematical Physics, vol. 178, pp. 563–570, 1996. [Online]. Available: http://dx.doi.org/10.1007/BF02108815
[6] E. H. Lieb, "Proof of an entropy conjecture of Wehrl," Comm. Math. Phys., vol. 62, no. 1, pp. 35–41, 1978.
[7] S. Verdu and D. Guo, "A simple proof of the entropy-power inequality," Information Theory, IEEE Transactions on, vol. 52, no. 5, pp. 2165–2166, May 2006.
[8] S. Guha, "Multiple-user quantum information theory for optical communication channels," Ph.D. dissertation, MIT, Dept. of Electrical Engineering and Computer Science, 2008.
[9] R. König and G. Smith, "Limits on classical communication from quantum entropy power inequalities," Nature Photonics, vol. 7, pp. 142–146, January 2013.
[10] S. Guha, B. Erkmen, and J. Shapiro, "The entropy photon-number inequality and its consequences," in Information Theory and Applications Workshop, 2008, pp. 128–130.
[11] W. Beckner, "Inequalities in Fourier analysis," The Annals of Mathematics, vol. 102, no. 1, pp. 159–182, 1975. [Online]. Available: http://www.jstor.org/stable/1970980
[12] H. Brascamp and E. Lieb, "Best constants in Young's inequality, its converse and its generalization to more than three functions," Adv. in Math., vol. 20, pp. 151–172, 1976.
[13] A. Dembo, T. Cover, and J. Thomas, "Information theoretic inequalities," Information Theory, IEEE Transactions on, vol. 37, no. 6, pp. 1501–1518, Nov. 1991.
[14] O. Rioul, "Two proofs of the Fisher information inequality via data processing arguments," April 2006, arXiv:cs/0604028v1.
[15] ——, "Information theoretic proofs of entropy power inequalities," Information Theory, IEEE Transactions on, vol. 57, no. 1, pp. 33–55, Jan. 2011.
[16] A. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Information and Control, vol. 2, no. 2, pp. 101–112, 1959. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0019995859903481
[17] N. Blachman, "The convolution inequality for entropy powers," Information Theory, IEEE Transactions on, vol. 11, no. 2, pp. 267–271, Apr. 1965.
[18] R. Zamir, "A proof of the Fisher information inequality via a data processing argument," Information Theory, IEEE Transactions on, vol. 44, no. 3, pp. 1246–1250, May 1998.
[19] M. J. W. Hall, "Quantum properties of classical Fisher information," Phys. Rev. A, vol. 62, p. 012107, Jun 2000. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevA.62.012107
[20] N. N. Čencov, "Statistical decision rules and optimal inference," Transl. Math. Monographs, Amer. Math. Soc., Providence, RI, vol. 53, 1981.
[21] L. L. Campbell, "An extended Čencov characterization of the information metric," Proceedings of the American Mathematical Society, vol. 98, no. 1, pp. 135–141, 1986.
[22] D. Petz, "Monotone metrics on matrix spaces," Linear Algebra Appl., vol. 244, pp. 81–96, 1996.
[23] A. Lesniewski and M. B. Ruskai, "Monotone Riemannian metrics and relative entropy on noncommutative probability spaces," J. Math. Phys., vol. 40, no. 5702, 1999.
[24] V. Giovannetti, S. Guha, S. Lloyd, L. Maccone, and J. H. Shapiro, "Minimum output entropy of bosonic channels: A conjecture," Phys. Rev. A, vol. 70, p. 032315, Sep 2004. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevA.70.032315
[25] V. Giovannetti, A. S. Holevo, S. Lloyd, and L. Maccone, "Generalized minimal output entropy conjecture for one-mode Gaussian channels: definitions and some exact results," J. Phys. A, vol. 43, no. 032315, 2010.
[26] J. Williamson, "On the algebraic problem concerning the normal forms of linear dynamical systems," American Journal of Mathematics, vol. 58, no. 1, pp. 141–163, 1936. [Online]. Available: http://www.jstor.org/stable/2371062
[27] P. Vanheuverzwijn, "Generators for quasi-free completely positive semi-groups," Annales de l'institut Henri Poincaré (A) Physique théorique, vol. 29, no. 1, pp. 123–138, 1978. [Online]. Available: http://eudml.org/doc/75994
[28] S. Lloyd, V. Giovannetti, L. Maccone, N. J. Cerf, S. Guha, R. Garcia-Patron, S. Mitter, S. Pirandola, M. Ruskai, J. Shapiro, and H. Yuan, "The bosonic minimum output entropy conjecture and Lagrangian minimization," 2009, arXiv:0906.2758.
[29] T. Hiroshima, "Additivity and multiplicativity properties of some Gaussian channels for Gaussian inputs," Phys. Rev. A, vol. 73, p. 012330, Jan 2006. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevA.73.012330
[30] M. M. Wolf, G. Giedke, and J. I. Cirac, "Extremality of Gaussian quantum states," Phys. Rev. Lett., vol. 96, p. 080502, Mar 2006. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevLett.96.080502
[31] A. Holevo, M. Sohma, and O. Hirota, "Error exponents for quantum channels with constrained inputs," Reports on Mathematical Physics, vol. 46, no. 3, pp. 343–358, 2000. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0034487700900053
[32] D. Petz and C. Ghinea, "Introduction to quantum Fisher information," 2010, arXiv:1008.2417.
[33] F. Hiai, M. Mosonyi, D. Petz, and C. Beny, "Quantum f-divergences and error correction," 2011.
[34] M. Raginsky, "Divergence in everything: Cramér-Rao from data processing," Information Structuralist, blog infostructuralist.wordpress.com, July 2011.
[35] H. Spohn, "Entropy production for quantum dynamical semigroups," J. Math. Phys., vol. 19, no. 1227, 1978.
[36] A. Kagan and T. Yu, "Some inequalities related to the Stam inequality," Applications of Mathematics, vol. 53, pp. 195–205, 2008. [Online]. Available: http://dx.doi.org/10.1007/s10492-008-0004-2