
arXiv:0812.3832v4 [quant-ph] 27 Mar 2009

Distinguishability measures between ensembles of quantum states

Ognyan Oreshkov and John Calsamiglia
Grup de Física Teòrica, Universitat Autònoma de Barcelona, 08193 Bellaterra (Barcelona), Spain
(Dated: June 4, 2018)

A quantum ensemble {(p_x, ρ_x)} is a set of quantum states, each occurring randomly with a given probability. Quantum ensembles are necessary to describe situations with incomplete a priori information, such as the output of a stochastic quantum channel (generalized measurement), and play a central role in quantum communication. In this paper, we propose measures of distance and fidelity between two quantum ensembles. We consider two approaches: the first one is based on the ability to mimic one ensemble given the other one as a resource and is closely related to the Monge-Kantorovich optimal transportation problem; the second one uses the idea of extended-Hilbert-space (EHS) representations, which introduce auxiliary pointer (or flag) states. Both types of measures enjoy a number of desirable properties. The Kantorovich measures, albeit monotonic under deterministic quantum operations, are not monotonic under generalized measurements. In contrast, the EHS measures are. We present operational interpretations for both types of measures. We also show that the EHS fidelity between pure-state ensembles provides a novel interpretation of the fidelity between mixed states: the latter is equal to the maximum of the fidelity between all pure-state ensembles whose averages are equal to the mixed states being compared. We finally use the new measures to define distance and fidelity for stochastic quantum channels and positive operator-valued measures (POVMs). These quantities may be useful in the context of tomography of stochastic quantum channels and quantum detectors.

I. INTRODUCTION

A fundamental difference between classical and quantum systems is that, while any two classical states can be faithfully distinguished, generic quantum states cannot be distinguished with arbitrary precision by operational means. A natural measure that quantifies the similarity of any two pure quantum states |ψ⟩ and |φ⟩ is the transition probability between them, i.e., the probability with which one of the states would yield the same unique outcome under a measurement for which the other state yields that outcome with certainty. This quantity is symmetric with respect to the two states and is given by the square of their overlap, |⟨ψ|φ⟩|².

In the case of mixed states there is no straightforward analogue of the transition probability, since there is no measurement for which a mixed state yields a particular unique outcome with certainty. A generalization of the concept of transition probability to mixed states was proposed by Uhlmann [1] and is given by the maximum of the transition probability between purifications of the states, taken over all possible purifications. The square root of this quantity between two density matrices ρ and σ, which is given by the simple expression

F(ρ, σ) = Tr √(√σ ρ √σ),    (1)

is known as the square root fidelity (or fidelity for short) and has proven extremely useful in quantum information theory [2]. From the fidelity one can define various distances between states, such as the Bures distance B(ρ, σ) = √(2 − 2F(ρ, σ)) [3, 4] or the Bures angle A(ρ, σ) = arccos F(ρ, σ) [2, 5], which can be regarded as measures of the difference between two states.
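For concreteness, these quantities are easy to evaluate numerically. The following minimal Python sketch (the two example states are arbitrary illustrative choices, not taken from the text) computes the square root fidelity of Eq. (1) and the Bures angle and Bures distance derived from it:

# Minimal numerical sketch: square root fidelity F(rho, sigma) = Tr sqrt(sqrt(sigma) rho sqrt(sigma))
# and the derived Bures angle and Bures distance.
import numpy as np
from scipy.linalg import sqrtm

def sqrt_fidelity(rho, sigma):
    s = sqrtm(sigma)
    return np.real(np.trace(sqrtm(s @ rho @ s)))

def bures_angle(rho, sigma):
    return np.arccos(np.clip(sqrt_fidelity(rho, sigma), 0.0, 1.0))

def bures_distance(rho, sigma):
    return np.sqrt(max(2.0 - 2.0 * sqrt_fidelity(rho, sigma), 0.0))

# Example: a pure qubit state and the maximally mixed qubit state.
rho = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
sigma = np.eye(2) / 2                       # I/2
print(sqrt_fidelity(rho, sigma))            # ~0.707
print(bures_angle(rho, sigma), bures_distance(rho, sigma))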
In addition to fidelity-based measures, various other distance measures have been proposed (see, e.g., Refs. [6, 7, 8, 9, 10, 11]). The trace distance [6],

Δ(ρ, σ) = (1/2) ‖ρ − σ‖,    (2)

where ‖O‖ = Tr √(O†O) is the trace norm of an operator O, is widely used due to its simple form, its various useful properties, and its operational meaning related to the maximum probability with which the two states can be distinguished by a measurement.

The problem of distinguishing two quantum states bears strong similarity to the classical problem of distinguishing two probability distributions by looking at the value of a random variable sampled from one of them. Unless the supports of the two distributions have no overlap, the probability of correctly guessing the ensemble from which the variable was drawn is strictly smaller than unity. In the classical case, however, the two probability distributions concern the outcomes of only one observable, the random variable. In the quantum case, there is a continuum of possible observations that one can perform on a single system and a continuum of corresponding random variables.

Different quantum measurements establish different correspondences between quantum states and probability distributions. This suggests a natural approach to defining distinguishability measures between quantum states. For instance, the fidelity is equal to the minimum statistical overlap between the probability distributions generated by all possible measurements performed on the states [12]. The statistical overlap in question is the Bhattacharyya coefficient Σ_x √(P(x) Q(x)) between classical probability distributions P(x) and Q(x) (here x is a classical random variable), where, for example, P(x) and Q(x) are the outcome distributions of a given measurement performed on ρ and σ, respectively. Similarly, the trace distance (2) can be obtained by maximizing over all possible measurements the Kolmogorov distance (1/2) Σ_x |P(x) − Q(x)| between the corresponding outcome probability distributions. As expected, in the limit of commuting density matrices, both the fidelity and the trace distance reduce to their classical counterparts, i.e., to the Bhattacharyya overlap and the Kolmogorov distance, respectively.
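The reduction to the classical quantities in the commuting case is simple to verify numerically. The sketch below (with arbitrarily chosen distributions) computes the trace distance of Eq. (2) from the eigenvalues of ρ − σ and, for diagonal density matrices, compares it with the Kolmogorov distance between the eigenvalue distributions:

# Trace distance Delta(rho, sigma) = (1/2) ||rho - sigma||_1, and its reduction to the
# classical Kolmogorov distance (1/2) sum_x |P(x) - Q(x)| for commuting (diagonal) states.
import numpy as np

def trace_distance(rho, sigma):
    eigs = np.linalg.eigvalsh(rho - sigma)   # rho - sigma is Hermitian
    return 0.5 * np.sum(np.abs(eigs))

P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.4, 0.4, 0.2])
rho, sigma = np.diag(P), np.diag(Q)

print(trace_distance(rho, sigma))            # 0.3
print(0.5 * np.sum(np.abs(P - Q)))           # 0.3, the Kolmogorov distance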
As is manifested in these examples, density matrices can be thought of as generalizations of classical probability distributions, which include the latter as a special case. However, in many scenarios one often deals with an even more general concept, which is a hybrid between the quantum and classical cases. This is the concept of a probabilistic ensemble of quantum states, i.e., a classical probability distribution of density matrices. Ensembles of quantum states describe situations in which a quantum system can take a number of different states at random according to some probability distribution. Such a situation is, for example, the outcome of a quantum measurement. A quantum measurement can be regarded as a stochastic quantum channel that outputs different quantum states with probabilities that depend on the input state according to the generalized Born rule [2]. When the measurement is projective, the possible output states are orthogonal and the output ensemble can be regarded as a classical one. However, in the case of generalized measurements the states need not be orthogonal, and the output of the channel is a genuine quantum ensemble.

A quantum state is said to "...capture the best information available about how a quantum system will react in this or that experimental situation" [13]. Accordingly, a quantum ensemble gives the best information available about how a quantum system will react in this or that experimental situation when the choice of experiment can be made conditional on some classical side information. The uses or applications of the quantum system will depend strongly on the particular quantum states that appear in the ensemble and on their probabilities.

It should be noted that in the context of resource theory [14], a protocol consisting of allowed transformations generally involves measurements, and the resource available after a measurement is given by the average resource of the resulting ensemble. For example, the restriction to local operations and classical communication (LOCC) naturally gives rise to entanglement as a resource, which is quantified by an entanglement monotone, a function which does not increase on average under LOCC [15, 16]. In this sense, entanglement can be thought of as a function defined on ensembles. Ensembles of quantum states have various other applications in quantum information theory, with particularly notable ones in quantum communication, e.g., for representing sources of quantum states used for communication [17, 18], or for describing "static resources" of shared classical-quantum correlations in multi-partite systems [19].

Even though various measures of distance and fidelity between quantum states have been studied, similar measures for ensembles of states have been lacking. With the development of quantum technology, it becomes important to be able to rigorously compare different experimental schemes and assess the degree to which they differ from ideal ones. The existing measures of distance and fidelity between quantum states are sufficient for this purpose when the system of interest at a given stage of the experiment is described by a single quantum state. These measures can also be used to define distance and fidelity between deterministic quantum operations, i.e., completely positive trace-preserving (CPTP) maps [20]. However, in many situations an experiment may involve states obtained randomly according to some probability distribution, such as the states obtained during the process of entanglement concentration [21], or the states resulting from the measurement of an error syndrome during an error-correction protocol [22], or simply a source of quantum states used for communication. It is therefore important to have a distinguishability measure between two ensembles of states. Furthermore, the tools of quantum information involve not only CPTP maps but also stochastic quantum operations (generalized measurements), and a figure of merit comparing two such operations (e.g., a real one with an ideal one) would require a quantitative comparison between their output ensembles. Rigorous measures that compare generalized measurements would be useful, in particular, for assessing the performance of quantum detectors, which can now be characterized experimentally [26] through quantum detector tomography [23, 24, 25].
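To make the notion of a measurement-generated ensemble concrete, the following sketch applies a generalized measurement, specified here by an assumed pair of Kraus operators describing an unsharp qubit measurement along z, to an input state and returns the resulting output ensemble {(p_i, ρ_i)} according to the generalized Born rule:

# A generalized measurement as a stochastic channel mapping an input state to an
# output ensemble {(p_i, rho_i)}: p_i = Tr(M_i rho M_i^dag), rho_i = M_i rho M_i^dag / p_i.
import numpy as np

def measurement_ensemble(rho, kraus_ops):
    ensemble = []
    for M in kraus_ops:
        unnormalized = M @ rho @ M.conj().T
        p = np.real(np.trace(unnormalized))
        if p > 1e-12:
            ensemble.append((p, unnormalized / p))
    return ensemble

# Unsharp two-outcome qubit measurement along z (eps controls the unsharpness),
# applied to the input state |+><+|.
eps = 0.1
M0 = np.diag([np.sqrt(1 - eps), np.sqrt(eps)])
M1 = np.diag([np.sqrt(eps), np.sqrt(1 - eps)])
plus = np.full((2, 2), 0.5)
for p, state in measurement_ensemble(plus, [M0, M1]):
    print(np.round(p, 3), np.round(state, 3))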
The purpose of this paper is to propose measures of distance and fidelity between ensembles of quantum states and use them to define distance and fidelity between generalized measurements. The rest of the paper is organized as follows. In Sec. II, we review the concept of an ensemble of quantum states and establish nomenclature. In Sec. III, we discuss some basic properties that we expect a measure of distinguishability between ensembles to have, and rule out several naive candidates. In Sec. IV, we propose measures of distance and fidelity of a Kantorovich type and study their properties. We first introduce the measure of distance on the basis of intuitive considerations concerning the ability of states obtained randomly from one ensemble to mimic states obtained randomly from the other ensemble. The measure is based on the trace distance between states and satisfies a number of desirable properties. In addition to the standard distance properties, it is jointly convex, monotonic under averaging of the ensembles and under CPTP maps. When the ensembles are discrete, the measure is equivalent to a linear program and can be computed efficiently in the size of the set of states participating in the ensembles. We show that for simple limiting cases, the distance between ensembles reduces to intuitive expressions involving the trace distance between states. We introduce a measure of fidelity between ensembles in a similar fashion. The

fidelity satisfies properties analogous to those of the dis- a set of pairs (px,ρx) of probabilities px (px 0, { } ≥ tance and also can be computed as a linear program. px = 1) and distinct density matrices ρx ( ) x ∈ B H We provide operational interpretations of both quanti- (ρx > 0, Tr(ρx) = 1), ρx = ρy for x = y. For simplicity, ties. We show that for the case when the measures are Pwe will assume that the set6 of states6 participating in an based on the trace distance and the standard fidelity, ensemble is discrete (i.e., the index x runs over a count- the measures are not monotonic under generalized mea- able set), although we expect that our considerations ex- surements. We explain why this is natural considering tend to non-discrete ensembles as well. We will use the the operational interpretations of the quantities and de- concept of ensemble of states to describe situations in rive necessary and sufficient conditions which the basic which a system takes a state ρx at random with proba- measures of distance or fidelity between states have to bility px. The statement that a system takes the state ρx satisfy in order for the corresponding Kantorovich mea- means that there exists classical information about the sures to be monotonic under measurements. In Sec. V, identity of the state. This is to be distinguished from the we propose measures of distance and fidelity which make situation in which no information about the identity of use of the extended-Hilbert-space (EHS) representation the state exists or can be obtained. In the latter case, for of ensembles [19]. We argue that to every ensemble of all practical purposes, the average of the quantum states there is a corresponding class of valid ensemble, ρ = x pxρx, provides a complete description EHS representations and provide a rigorous definition of of the state of the system. this class. We then define the measures as a minimum An exampleP of an ensemble of states is the output of (maximum) of the distance (fidelity) between all possible a non-destructive generalized measurement. Under the EHS representations of the ensembles being compared. most general type of quantum measurement, a density We show that these definitions can be simplified and are matrix ρ transforms as equivalent to convex optimization problems. We also pro- vide equivalent formulations without reference to an ex- i(ρ) ρ ρi = M , with probability pi = Tr i(ρ), tended Hilbert space. These quantities are based on the → Tr i(ρ) M trace distance and the square root fidelity and inherit all M (3) their celebrated properties such as joint convexity in the where i( ) = j Mij ( )Mij† is the measurement su- case of the trace distance or strong concavity in the case peroperatorM · corresponding· to measurement outcome i. P of the fidelity. In addition, they are monotonic under (The operators Mij satisfy the completeness relation averaging of the ensembles, as well as under generalized i,j Mij† Mij = I.) Note that different measurement out- measurements. The latter property can be regarded as comes do not necessarily yield different output states. a generalization of the monotonicity under CPTP maps ForP example, both outcomes of a measurement on a of the trace distance and the square root fidelity. The qubit system with measurement superoperators ( )= EHS measures are upper (lower) bounded by the Kan- M1 · 0 0 ( ) 0 0 and 2( ) = 0 1 ( ) 1 0 leave the sys- torovich distance (fidelity). 
We provide operational in- tem| ih in| · the| ih state| 0 M0 , although· | ih they| · | provideih | information terpretations for the EHS measures too. In Sec. VI, we | ih | about the input state. If ρx is the set of distinct output present a novel interpretation of the standard fidelity be- { } states, each occurring with probability px = pi, tween mixed states as a maximum of the fidelity between i: ρi=ρx the ensemble of post-measurement states resulting from all pure-state ensembles from which the mixed states be- P the stochastic transformation (3) is (px,ρx) . ing compared can be constructed. The fidelity between { } Let (p ,ρ ) be an ensemble of density matrices over pure-state ensembles used in this definition is of the EHS x x a Hilbert{ space} . IfΩ is the set of all density matrices type but can be expressed without any reference to fi- 1 ρ that participateH in the ensemble, we can equivalently delity between mixed states and has a form which can be x represent the ensemble as a probability distribution P (ρ), regarded as a generalization of the Bhattacharyya over- ρ Ω (P (ρ ) p ), over the set Ω . Consider a sec- lap. In Sec. VII, we use the measures between ensembles 1 x x 1 ond∈ ensemble, ≡Q(σ), σ Ω , where the set Ω is not of quantum states to define distance and fidelity between 2 2 necessarily equal to Ω . We∈ can think of the two ensem- generalized measurements. We consider two definitions— 1 bles as corresponding to probability distributions over one based on the Jamiolkowski isomorphism [27] and an- the same set, by extending the definitions of P (ρ) and other based on worst-case comparison—and discuss their Q(σ) to the larger set Ω = Ω Ω through assigning properties. We also propose distance and fidelity be- 1 2 zero probabilities to those states∪ that do not participate tween positive operator-valued measures (POVMs). In in the respective ensembles. Therefore, without loss of Sec. VIII, we conclude. generality, we will treat the ensembles that we compare as probability distributions P (ρ) and Q(ρ) over the same set Ω. (Sometimes, when it is clear from the context, we II. ENSEMBLES OF QUANTUM STATES will denote the ensembles we compare simply by P and Q.) Most generally, the set Ω can be taken to be the full Let ( ) denote the set of linear operators on a finite- set of density matrices over , but in this paper we will H dimensionalB H Hilbert space . For the purposes of this assume that Ω is discrete. paper, a (probabilistic) ensembleH of quantum states is The fact that P (ρ) and Q(ρ) are valid probability dis- 4 tributions is expressed in the conditions what properties we expect such measures to have. The answer to this question will depend on the operational P (ρ′)=1, P (ρ) 0, ρ Ω, (4) ≥ ∀ ∈ context in which we want to compare the ensembles. ρ′ Ω X∈ We could ask, for example, how different on aver- Q(ρ′)=1,Q(ρ) 0, ρ Ω. (5) age two states drawn randomly from the two ensem- ≥ ∀ ∈ ρ′ Ω bles are. Given a measure of distance d(ρ, σ) between X∈ states, the average distance in that sense would be If our world is ultimately quantum, it is natural to ex- P (ρ)Q(σ)d(ρ, σ). This quantity obviously could pect that an ensemble of quantum states must have a ρ Ωσ Ω ∈ ∈ description in terms of the state of a (possibly larger) beP non-zeroP even when the two ensembles are identical. quantum system. 
Indeed, there is a correspondence be- Similarly, we could look at the average fidelity which can tween an ensemble of the form (px,ρx) and a state of be smaller than 1 for identical ensembles. Thus even the form { } though these quantities have a well defined meaning, they are not good measures of distinguishability. ρ = pxρx x x , (6) Another possibility is to look at a distance d(ρP , ρQ) x ⊗ | ih | X between the average density matrices ρP = P (ρ)ρ ρ Ω where the pointerb (or flag) states x are an orthonor- ∈ mal set in the Hilbert space of an{| auxiliaryi} system of a and ρQ = Q(ρ)ρ of the two ensembles, orP the fi- ρ Ω sufficiently large dimension [19]. The pointer states can ∈ delity F (ρP ,PρQ) between them. Obviously, for identical be thought of as carrying the classical information about ensembles the distance is equal to 0 and the fidelity is which particular state from the ensemble we are given— equal to 1. However, these quantities cannot discrimi- a measurement of the classical system yields the quan- nate between different ensembles that have the same av- tum state ρx with probability px, which is equivalent to erage density matrices. Imagine, for example, that an drawing a state randomly from the ensemble. Reversely, experimentalist has at her disposal two devices. The first if we are given a state drawn randomly from the ensem- 00 + 11 00 11 one produces the two-qubit Bell states | i | i , | i−| i , ble, we can record our knowledge about the identity of √2 √2 01 + 10 01 10 | i | i , | i−| i , each occurring with probability 1/4, the state in a ‘classical’ pointer attached to it and forget √2 √2 the information about the state since this information is together with a classical indicator specifying which state stored in the pointer and can always be retrieved. After is produced. The second device produces the two-qubit the latter operation, the state of the original system plus product states 00 , 01 , 10 , 11 , each occurring with | i | i | i | i the pointer system is described by pxρx x x . This probability 1/4, again with an indicator of the identity of x ⊗| ih | representation is referred to as an extended-Hilbert-space the state. Although the average states in the two cases (EHS) representation of an ensembleP [19]. For simplicity are the same, the ensembles produced by the two devices and in order to distinguish the system storing the clas- have very different properties. In the first case, the av- sical memory from the quantum system, we will use the erage entanglement between the two qubits is maximal, following notation for the pointers: whereas in the second case it is zero. Therefore, in or- der to capture the difference between two ensembles, we [x] x x . (7) ≡ | ih | would like our measure of distance (fidelity) to be 0 (1) In this notation, the state (6) reads if and only if P (ρ)= Q(ρ), ρ Ω. Measures of distance and∀ fidelity∈ which satisfy the lat- ρ = pxρx [x]. (8) ter requirement could be any measures of distance and ⊗ x fidelity between probability distributions which treat ρ X b as a classical variable. Consider, for example, the Kol- In terms of the description of an ensemble as a proba- 1 bility distribution P (ρ) over a set of states Ω, an EHS mogorov distance 2 P (ρ) Q(ρ) . 
Note that this dis- ρ Ω| − | representation of this type can be written as tance is precisely equalP∈ to the trace distance between the EHS representations of the two ensembles of type (9), ρP = P (ρ)ρ [ρ], (9) ⊗ ρ Ω 1 ∈ X ∆(ρP , ρQ)= P (ρ)ρ [ρ] Q(σ)σ [σ] 2 k ⊗ − ⊗ k where [ρ] is anb orthonormal set of pure pointer states ρ Ω σ Ω ρ ρ ,{ each} of which is associated with a unique density X∈ X∈ b b 1 |matrixih | ρ Ω. We will develop this concept further in = P (ρ) Q(ρ) . (10) 2 | − | ∈ ρ Ω Sec. V. X∈ In a similar manner, we could look at the Bhat- tacharyya overlap P (ρ)Q(ρ), which is equal to the III. NAIVE CANDIDATES ρ Ω ∈ fidelity between theP twop EHS representations of type (9). Before we propose distinguishability measures between Such measures, however, do not take into account the two ensembles of quantum states, let us first consider quantum-mechanical aspect of the variables ρ. If the two 5 distributions P and Q have supports on non-overlapping ities imposes the conditions T (ρ σ) 0, ρ, σ Ω, and subsets of Ω, the above distance (fidelity) would be max- T (ρ σ) = 1, σ Ω. | ≥ ∀ ∈ ρ Ω | ∀ ∈ imal (minimal), but as we mentioned earlier, two distinct ∈ density matrices are not necessarily distinguishable (they PIn order to measure how much the state σ fails to often behave as if they are the same state) and we would mimic the state ρ, we can use any measure of distance like our distance and fidelity to capture this property. In between states. In this paper, we will concentrate on the particular, in the special case where each of the two en- case of the trace distance, ∆(ρ, σ) (Eq. (2)). To mea- sure the degree to which a map T (ρ σ) from one ensem- sembles consists of a single state, we would like the mea- | sures between the two ensembles to be equal to the dis- ble to the other fails to mimic the latter, we propose tance or fidelity between the respective states. If we used to use the average distance between the actual states the above distance (fidelity) between classical probability and those that they mimic: ρ,σ Ω T (ρ σ)Q(σ)∆(ρ, σ). We can write this expression in∈ an| explicitly sym- distributions in this case, we would obtain a maximum P (minimum) value even if the two states are very similar. metric form by introducing the joint probability dis- tribution Π(ρ, σ) T (ρ σ)Q(σ) which satisfies the At the same time, it is natural to expect that a distance ≡ | marginal conditions σ Ω Π(ρ, σ)= P (ρ), ρ Ω, and between ensembles would reduce to a distance between ∈ ∀ ∈ classical probability distributions when the states partic- ρ Ω Π(ρ, σ)= Q(σ), σ Ω: ∈ P ∀ ∈ ipating in the ensembles are orthogonal. P DΠ(P,Q)= Π(ρ, σ)∆(ρ, σ). (11) ρ,σ Ω X∈ IV. DISTANCE AND FIDELITY OF A Clearly, different choices of the map T (ρ σ) (or equiva- KANTOROVICH TYPE lently, of Π(ρ, σ)) can yield different values| for the quan- tity (11). Therefore, we define the distance between the A. Motivating the definitions two ensembles as the minimum of the quantity (11) over all possible choices of Π(ρ, σ), i.e., we choose the optimal The above examples suggest that distinguishability mimicking strategy. measures with the desired properties may have to be non- Definition 1 (Kantorovich distance). Let P (ρ) trivial functions of the probability distributions and the and Q(ρ), ρ Ω, be two ensembles (probability distri- ∈ set of states participating in the ensembles. Heuristically, butions over Ω), which we denote by P and Q for short. 
a distance (fidelity) between two quantum states can be Then regarded as a measure of the extent to which the two K states do not (do) behave as if they are the same state D (P,Q) = min Π(ρ, σ)∆(ρ, σ), (12) Π(ρ,σ) ρ,σ Ω (the precise meaning of this statement depends on the op- X∈ erational meaning of the distance (fidelity) in question). In a similar manner, we would expect a distance (fidelity) where minimum is taken over all joint probability between two ensembles of quantum states to compare the distributions Π(ρ, σ) with marginals σ Ω Π(ρ, σ) = P (ρ), ρ Ω, and Π(ρ, σ)= Q(σ), ∈ σ Ω. extent to which the two ensembles do not (do) “behave” ∀ ∈ ρ Ω ∀ ∈ The quantity (12) is∈ of the same formP as the Kan- as if they are the same ensemble. Since the ensemble P is a statistical concept which describes the situation of torovich formulation of the optimal transportation prob- having particular states with particular probabilities, we lem [28], which is a relaxation of a problem studied in would like to compare the extent to which states drawn 1781 by Monge. In 1975, Kantorovich received the Nobel randomly from one ensemble can be used to mimic states Prize in Economics, together with Koopmans, for their drawn randomly from the other ensemble. contributions to the theory of optimum allocation of re- When states drawn randomly from the ensemble sources, and he is considered to be one of the fathers of (Q(σ), σ) are used to mimic states drawn from the en- linear programming. The optimal transportation prob- semble{ (P}(ρ),ρ) , a given state σ obtained according to lem can be cast in the spirit of its original formulations the distribution{ Q}(σ) most generally can be taken with as follows: different probabilities to pass off as different states ρ from Assume you have to transport the coal produced in (P (ρ),ρ) . In other words, the process of mimicking some mines X to the factories Y . The amounts pro- one{ ensemble} using the other one as a resource can be de- duced in each mine P1, P2,... as well as the needs for each factory Q ,Q{,... are given.} There is a cost per scribed by a transition probability matrix whose elements { 1 2 } T (ρ σ), ρ, σ Ω, describe the probabilities with which unit of mass c(x, y) to move coal from mine x to fac- the| state σ sampled∈ from the distribution Q(σ) is taken tory y. The problem is to find the optimal transportation plan or transportation map T (y x), i.e., for every mine x to pass off as the state ρ sampled from P (ρ). The require- | ment that under this simulation the probabilities are con- determine how much material has to be carried to every sistent with the probabilities P (ρ) and Q(σ), respectively, factory y so as to minimize the overall cost. is expressed in the condition T (ρ σ)Q(σ) = P (ρ). The analogy with the above definition (12) is straight- σ Ω | forward: mines and factories play the role of the quantum The fact that T (ρ σ) describe validP∈ transition probabil- states ρ and σ in each ensemble respectively, and the cost | 6 function is given by the trace distance. Kantorovich’s for- if and only if the supports of P and Q are orthogonal sets mulation extended also to non-discrete probability mea- of states. sures [29] and was one of the first infinite-dimensional Proof. Since ∆(ρ, σ) 1, then for any given Π(ρ, σ) linear programming problems to be considered. If the we have Π(ρ, σ)∆(≤ ρ, σ) Π(ρ, σ) = 1. 
ρ,σ Ω ≤ ρ,σ Ω probability measures are defined over a metric space and Furthermore,∈ ∆(ρ, σ) = 1 if and only∈ if ρ and σ are the cost function is taken to be the corresponding dis- orthogonal.P Observe that the onlyP non-zero values tance function, the optimal average cost is known as Π(ρ, σ) of a joint probability distribution that respects the Kantorovich distance (also referred to as Wasserstein the marginal distributions P and Q are those for which distance [30]). The optimal transportation problem is ρ is in the support of P and σ is in the support of Q. now an active field of research with tight connections Therefore, if P and Q have supports on sets of density with problems in geometry, probability theory, differen- matrices which are orthogonal, every non-zero compo- tial equations, fluid mechanics, economics and image or nent Π(ρ, σ) in the sum on the right-hand side of Eq. (12) data processing. will be multiplied by ∆(ρ, σ) = 1, which implies that Based on the same idea we can define a fidelity between K D (P,Q) = 1. Inversely, since ρ,σ Ω Π(ρ, σ) = 1 if ∈ two ensembles, which we will refer to as the Kantorovich DK (P,Q) = 1, then every non-zero Π(ρ, σ) on the right- P fidelity. hand side of Eq. (12) must be multiplied by 1, which Definition 2 (Kantorovich fidelity). The Kan- implies that P and Q must have supports on orthogonal torovich fidelity between the ensembles P (ρ) and Q(ρ), sets. ρ Ω, is ∈ Property 3 (Symmetry). K K F K (P,Q)= max Π(ρ, σ)F (ρ, σ), (13) D (P,Q)= D (Q, P ), (18) Π(ρ,σ) ρ,σ Ω P,Q . X∈ ∀ ∈PΩ where F (ρ, σ) is the square root fidelity between ρ and Proof. The symmetry follows from the definition (12) σ (Eq. (1)), and maximum is taken over all joint proba- and the symmetry of ∆(ρ, σ). Property 4 (Triangle inequality). bility distributions Π(ρ, σ) that satisfy σ Ω Π(ρ, σ) = ∈ K K K P (ρ), ρ Ω, and ρ Ω Π(ρ, σ)= Q(σ), σ Ω. D (P, R) D (P,Q)+ D (Q, R), (19) ∀ ∈ ∈ P ∀ ∈ ≤ P P,Q,R Ω. ∀ ∈P B. Properties of the Kantorovich distance Proof. Let ΠP Q(ρ, σ) and ΠQR(ρ, σ) be the two joint probability distributions which achieve the minimum in Let Ω denote the set of probability distributions over Eq. (12) for the pairs of distributions (P,Q) and (Q, R), a set ofP density matrices Ω. respectively. Consider the quantity Property 1 (Positivity). 1 Π˜ PR(ρ, σ)= ΠP Q(ρ,κ) ΠQR(κ, σ), ρ,σ Ω K Q(κ) ∈ D (P,Q) 0, (14) κ Ω ≥ X∈ P,Q , (20) ∀ ∈PΩ where for Q(κ) = 0, we define with equality P Q 1 QR Π (ρ,κ) Q(κ) Π (κ, σ) = 0 (note that if Q(κ) = 0, DK (P,Q) = 0 iff P (ρ)= Q(ρ), ρ Ω. (15) then ΠP Q(ρ,κ) = ΠQR(κ, σ) = 0, ρ, σ Ω). One ∀ ∈ can readily verify that this is a valid∀ joint∈ probability Proof. Since all terms in Eq. (12) are non-negative, distribution with marginals P and R. Therefore, we the distance DK (P,Q) is also non-negative. Obviously, if have P (ρ)= Q(ρ), ρ Ω, we obtain DK (P,Q) = 0 by choos- ∀ ∈ DK (P, R) Π˜ PR(ρ, σ)∆(ρ, σ) ing the joint probability distribution Π(ρ, σ)= δρ,σP (ρ). ≤ K ρ,σ Ω Inversely, assume that D (P,Q) = 0. This means that X∈ 1 all terms in Eq. (12) must be zero, which can happen only = ΠP Q(ρ,κ) ΠQR(κ, σ)∆(ρ, σ) if Π(ρ, σ) δρ,σ. From the condition for the marginal Q(κ) ρ,σ,κ Ω probability∝ distributions, we see that Π(ρ, σ)= δ P (ρ) X∈ ρ,σ 1 and P (ρ)= Q(ρ). ΠP Q(ρ,κ) ΠQR(κ, σ)∆(ρ,κ) ≤ Q(κ) Property 2 (Normalization). 
ρ,σ,κ Ω X∈ 1 DK (P,Q) 1, (16) + ΠP Q(ρ,κ) ΠQR(κ, σ)∆(κ, σ) ≤ Q(κ) P,Q , ρ,σ,κ Ω ∀ ∈PΩ X∈ = ΠP Q(ρ,κ)∆(ρ,κ)+ ΠQR(κ, σ)∆(κ, σ) with equality ρ,κ Ω σ,κ Ω X∈ X∈ DK (P,Q)=1 (17) = DK (P,Q)+ DK (Q, R), (21) 7 where in the second inequality we have used the triangle such that ρ′ = (ρx). Similarly, Q′(σ′) = P (σy), E y inequality for ∆. where the sum is over all σy Ω such that σ′ = (σy). Property 5 (Joint convexity). Therefore, we have that ∈ P E

K K D (pP1 + (1 p)P2,pQ1 + (1 p)Q2) (22) D (M (P ),M (Q)) Π′(ρ′, σ′)∆(ρ′, σ′) − − E E K K ≤ ′ ′ pD (P ,Q )+(1 p)D (P ,Q ), ρ ,σ ΩE ≤ 1 1 − 2 2 X∈ P1, P2,Q1,Q2 Ω, p [0, 1]. = Π(ρ, σ)∆( (ρ), (σ)) Π(ρ, σ)∆(ρ, σ) ∀ ∈P ∀ ∈ E E ≤ ρ,σ Ω ρ,σ Ω 1 2 X∈ X∈ Proof. Let Π (ρ, σ) and Π (ρ, σ) be two joint = DK (P,Q), (26) probability distributions which achieve the minimum in Eq. (12) for the pairs of distributions (P1,Q1) and where the last inequality follows from the monotonicity (P2,Q2), respectively. It is immediately seen that of ∆(ρ, σ) under CPTP maps [32]. Corollary (Invariance under unitary maps). ˜ 12 1 2 Π (ρ, σ)= pΠ (ρ, σ)+(1 p)Π (ρ, σ) (23) For all unitary maps , − U K K is a joint probability distribution with marginals pP1 + D (P,Q)= D (M (P ),M (Q)). (27) U U (1 p)P2 and pQ1 + (1 p)Q2. Therefore, − − The property follows from the fact that unitary maps are K D (pP1 + (1 p)P2,pQ1 + (1 p)Q2) reversible CPTP maps. − − Property 7 (Monotonicity under averaging). Π˜ 12(ρ, σ)∆(ρ, σ) ≤ Let P denote the singleton ensemble consisting of the ρ,σ Ω X∈ average state of P (ρ), ρP = P (ρ)ρ. Then = p Π1(ρ, σ)∆(ρ, σ)+(1 p) Π2(ρ, σ)∆(ρ, σ) ρ Ω − ∈ ρ,σ Ω ρ,σ Ω P X∈ X∈ DK (P,Q) DK (P, Q). (28) = pDK (P ,Q )+(1 p)DK (P ,Q ). (24) ≥ 1 1 − 2 2 Proof. Let Π(σ, ρ) be a joint probability distribution Property 6 (Monotonicity under CPTP maps). for which the minimum in the definition (12) of D(P,Q) Let : ( ) ( ′), where and ′ generally can is attained. Since ∆(ρ, σ) is jointly convex [2], we have have differentE B H dimensions,→B H be a completelyH H positive trace- preserving (CPTP) map. (Any such map can be writ- DK (P,Q)= Π(ρ, σ)∆(ρ, σ) ten in the Kraus form (ρ) = i MiρMi†, ρ ( ) ρ,σ Ω [31]). Denote the set ofE density matrices consisting∀ ∈ B H of X∈ P ∆( Π(ρ, σ)ρ, Π(ρ, σ)σ) (ρ), with ρ Ω, by Ω . If we apply the same CPTP ≥ E ∈ E ρ,σ Ω ρ,σ Ω map to every state in an ensemble P (ρ), ρ Ω, we X∈ X∈ E ∈ obtain another ensemble P ′(ρ′), ρ′ Ω . Note that gen- = ∆( P (ρ)ρ, Q(σ)σ) ∈ E erally P (ρ) = P ′( (ρ)), because the map may be such ρ Ω σ Ω 6 E E X∈ X∈ that it takes two or more different states from Ω to one K = ∆(ρP , ρQ)= D (P, Q). (29) and the same state in Ω , e.g., (ρ1) = (ρ2), ρ1 Ω, ρ Ω, ρ = ρ . (The oppositeE obviouslyE E cannot happen∈ 2 ∈ 1 6 2 (For the last equality, see Eq. (48) below.) because every state ρ in Ω is mapped to a unique state Corollary. If two distributions are close, their average (ρ) Ω .) Thus the operation induces a map from states are also close, i.e., Ethe set∈ ofE probability distributionsE over Ω to the set of probability distributions over Ω . Denote this map by if DK (P,Q) ε, then ∆(ρ , ρ ) DK (P,Q) ε. E ≤ P Q ≤ ≤ M : Ω ΩE . (30) E P →P Now we can state the property of monotonicity under Property 8 (Continuity of the average of a con- CPTP maps as follows: For all CPTP maps , tinuous function). Let h(ρ) be a bounded function, E which is continuous with respect to the distance ∆. Then DK (P,Q) DK (M (P ),M (Q)), (25) E E the ensemble average of h(ρ), hP = P (ρ)h(ρ), is con- ≥ ρ Ω K ∈ where M : Ω ΩE is the map induced by . tinuous with respect to D . P E P →P E Proof. Let Π(ρ, σ) be a joint probabil- Proof. The proof is presented in Appendix A. ity distribution for which the minimum in Comment. Property 8 naturally reflects the idea of the definition (12) of DK (P,Q) is attained. states as resources. 
Assuming that a resource is a con- Observe that ρ,σ Ω Π(ρ, σ)∆( (ρ), (σ)) = tinuous function of the state, if two ensembles are close, ∈ E E ′ ′ Π (ρ , σ )∆(ρ , σ ), where Π (ρ , σ ) is a joint their corresponding average resources must also be close. ρ ,σ ΩE ′ ′ ′ ′ ′ ′ ′ ′ probability∈ distributionP over Ω Ω with marginals Example (Continuity of the Holevo informa- P E × E P ′(ρ′) and Q(ρ′). This can be seen from the fact that tion). A function of ensembles, which is of great sig- P ′(ρ′) = P (ρx), where the sum is over all ρx Ω nificance in quantum information theory, is the Holevo x ∈ P 8 information [18] Comment. This inequality is based on a Fannes-type inequality for the von Neumann entropy due to Aude- χ(P )= S(ρ) pxS(ρx). (31) naert [35], which is stronger than the original inequal- − x ity by Fannes [36] and provides the sharpest continuity X bound for the von Neumann entropy based on ∆ and d. Here ρ = x pxρx is the average density matrix of the Proof. In Ref. [35], it was shown that ensemble (px,ρx) which we denote by P for short, and S(ρ) = {Tr(P ρ log ρ}) is the von Neumann entropy. This S(ρ) S(σ) ∆ log2(d 1)+ H((∆, 1 ∆)). (36) function− gives an upper bound to the amount of infor- | − |≤ − − mation about the index x extractable through measure- The theorem follows from Lemma 1 and the fact that the ments on a state obtained randomly from the ensemble right-hand side of Eq. (36) is a concave function of ∆. and is used to define the classical capacity of a quantum Corollary (Continuity bound for the Holevo in- channel under independent uses of the channel [33, 34]. formation). The term S(ρ) in the expression (31) for The second term in the expression (31) is the average of the Holevo information is not an average of a func- the von Neumann entropy over the ensemble, while the tion, but according to the Corollary of Property 7, first term is the von Neumann entropy of the average. ∆(σ, ρ) DK (P,Q). The right-hand side of Eq. (36) Since S(ρ) is a continuous function, from Property 8 and is monotonically≤ increasing in the interval 0 ∆ the Corollary of Property 7 one can easily see that the (d 1)/d and monotonically decreasing in the≤ interval≤ Holevo information is a continuous function of the ensem- (d − 1)/d < ∆ 1. Therefore, we can write ble with respect to the Kantorovich distance. It would be − ≤ K K K interesting, however, to obtain an explicit bound of that S(σ) S(ρ) D log2(d 1)+ H((D , 1 D )) | − |≤ − − continuity. For this purpose, we will need the following for 0 DK (d 1)/d. (37) lemma. ≤ ≤ − Lemma 1. If a function h(ρ) satisfies the continuity Combining Eq. (35) and Eq. (37), we obtain property K K K χ(Q) χ(P ) 2D log2(d 1)+2H((D , 1 D )) h(ρ) h(σ) g[∆(ρ, σ)] (32) | − |≤ − − | − |≤ for 0 DK (d 1)/d. (38) ≤ ≤ − for some function g[x] that is concave in x [0, 1], then ∈ For the interval (d 1)/d

g Π(ρ, σ)∆(ρ, σ) = g[DK (P,Q)]. (34) DK (P R,Q R)= DK (P,Q). (40) ≤   ⊗ ⊗ ρ,σ Ω X∈   Comment. The physical meaning of this property Theorem 1 (A Fannes-type inequality for the is that unrelated ensembles do not affect the value of ensemble average of the von Neumann entropy). DK (P,Q). Even though this may seem as a natural prop- For any two ensembles P and Q of density matrices over erty to expect from a distance, it does not hold in general a d-dimensional Hilbert space, even for distance measures between states. For example, the Hilbert-Schmidt distance Tr(ρ σ)2, which has a K K K − SP SQ D log2(d 1)+ H((D , 1 D )), (35) well-defined operational meaning [7], is not stable. | − |≤ − − Proof. Let p where DK is the Kantorovich distance between the K K ensembles P and Q, and H((D , 1 D )) = DK (P R,Q R)= K K K K − ⊗ ⊗ D log2(D ) (1 D ) log2(1 D ) is the Shannon − − − − K Π(ρ τ ′, σ κ′)∆(ρ τ ′, σ κ′), (41) entropy of the binary probability distribution (D , 1 ⊗ ⊗ ⊗ ⊗ K − ρ,σ Ω;τ ′,κ′ Ω′ D ). ∈ X ∈ 9 where Π(ρ τ ′, σ κ′) has left and right marginals Limiting case 1 (Two singleton ensembles). If ⊗ ⊗ P (ρ)R(τ ′) and Q(σ)R(κ′), respectively. From the mono- P (ρ)= δρτ , ρ, τ Ω and Q(ρ)= δρσ, ρ, σ Ω, i.e., each tonicity of ∆ under partial tracing it follows that of the ensembles∈P and Q consists of only∈ a single state, then the distance between the ensembles is equal to the K D (P R,Q R) Π′(ρ, σ)∆(ρ, σ), (42) distance between the respective states, ⊗ ⊗ ≥ ρ,σ Ω X∈ DK (P,Q) = ∆(τ, σ). (48) where Proof. Obviously, the only joint probability distribu- Π′(ρ, σ)= Π(ρ τ ′, σ κ′) (43) tion with marginals P and Q in this case is Π(κ, τ) = ⊗ ⊗ τ ′,κ′ Ω′ δκσδτρ, so the property follows. X∈ Limiting case 2 (One singleton ensemble). If the is a joint probability distribution with left and right ensemble Q(ρ) consists of only one state σ, i.e., Q(ρ) = marginals P (ρ) and Q(σ), respectively. Therefore, δρσ, ρ, σ Ω, then the distance between P (ρ) and Q(ρ) is equal to∈ the average distance between a state drawn DK (P R,Q R) DK (P,Q). (44) ⊗ ⊗ ≥ from the ensemble P (ρ) and the state σ,

But by choosing Π(ρ τ , σ κ ) = Π(ρ, σ)R(τ )δ ′ ′ , ′ ′ ′ τ κ DK (P,Q)= P (ρ)∆(ρ, σ). (49) where Π(ρ, σ) is a joint⊗ distribution⊗ which attains the ρ Ω minimum in the definition (12) of DK (P,Q), and using X∈ the stability of ∆, the equality in Eq. (44) is attained. Proof. The property follows from the fact that the This completes the proof. only joint probability distribution with marginals P and Property 10 (Linear programming). The task of Q in this case is Π(κ,ρ)= δσκP (ρ). finding the optimal Π(σ, ρ) in Eq.(12) is a linear program Limiting case 3 (Classical distributions). If the and can be solved efficiently in the cardinality of Ω. set Ω consists of perfectly distinguishable density matri- K Proof. If the cardinality of Ω is N, we can think ces, i.e., ∆(ρ, σ)=1 δρσ, ρ, σ Ω, then D (P,Q) of ∆(ρ, σ), ρ, σ Ω as the components cµ, µ = (ρ, σ), reduces to the Kolmogorov− distance∀ ∈ between the classi- 2 ∈ of an N -component vector which we will denote by c. cal probability distributions P and Q, The joint probability distribution Π(ρ, σ) over which we 1 want to minimize the expression on the right-hand side of DK (P,Q)= P (ρ) Q(ρ) . (50) Eq. (12) can similarly be thought of as an N 2-component 2 | − | ρ Ω vector x with components xµ, µ = (ρ, σ). Thus the task X∈ of finding the optimal Π(ρ, σ) can be expressed in the Proof. Since in this case the set Ω consists of orthog- compact form onal states, we can write the right-hand side of Eq. (12) as Minimize cT x. (45) min Π(ρ, σ) 1+ Π(ρ,ρ) 0 Π(ρ,σ) × × The constraints σ Ω Π(ρ, σ) = P (ρ), ρ Ω, and ρ,σ Ω,ρ=σ ρ Ω ∈ ∀ ∈ ∈X 6 X∈ ρ Ω Π(ρ, σ) = Q(σ), σ Ω, can also be expressed in a∈ compact matrixP forms∀ as∈ = min (1 Π(ρ,ρ)), (51) Π(ρ,σ) − P ρ Ω X∈ Ax = a, where the equality follows from the fact that Bx = b, (46)

2 Π(ρ, σ)+ Π(ρ,ρ)=1. (52) where A is an N N matrix with components Aκµ = δκρ ρ,σ Ω,ρ=σ ρ Ω where µ = (ρ, σ)× is a double index, B is an N N 2 matrix ∈X 6 X∈ × with components B(κ,µ) = δκσ (µ = (ρ, σ)), and a and The minimum in Eq. (51) is achieved when ρ Ω Π(ρ,ρ) b are N-component vectors with elements aκ = P (κ), is maximal, which in turn is achieved when each∈ of the κ Ω, and bκ = Q(κ), κ Ω, respectively. In addition, P ∈ ∈ terms Π(ρ,ρ) is maximal. Since the maximum value of the positivity of the quantities Π(ρ, σ) amounts to the Π(ρ,ρ) is min(P (ρ),Q(ρ)), we obtain constraint DK (Q, P )=(1 min(Q(ρ), P (ρ))) x 0. (47) − ρ Ω ≥ X∈ 1 Eqs. (45)-(47) are the canonical form of a linear program, = Q(ρ) P (ρ) . (53) 2 2 | − | which can be solved efficiently in the length N of the ρ Ω vector x. This completes the proof. X∈ It is natural to ask about the properties of the dis- Comment. Note that we can distinguish two limits tance in certain simple limiting cases. We consider the which can be interpreted as comparing classical proba- following three cases. bility distributions. One is Limiting case 3—probability 10 distributions over a set of orthogonal states. The other is Corollary. If two distributions are close, their average the case where each of the two ensembles consists of a sin- states are also close, i.e., gle state (two singleton ensembles) and the two states are K diagonal in the same basis. In both limits, the distance if F (P,Q) 1 ε, then F (ρP , ρQ) 1 ε. (62) K ≥ − ≥ − D (Q, P ) reduces to the Kolmogorov distance between Property 6 (Stability). Let P (ρ) and Q(ρ) be two classical distributions. ensembles of states in Ω and R(σ′) be an ensemble of states in Ω′. Then, C. Properties of the Kantorovich fidelity F K (P R,Q R)= F K(P,Q). (63) ⊗ ⊗ Property 7 (Linear programming). The task of The following properties of the Kantorovich fidelity finding the optimal Π(ρ, σ) in Eq.(13) is a linear program (13) can be proven similarly to the corresponding proper- and can be solved efficiently in the cardinality of Ω. ties of the Kantorovich distance, which is why we present Limiting case 1 (Two singleton ensembles). If them without proof. P (ρ)= δ , ρ, τ Ω and Q(ρ)= δ , ρ, σ Ω, i.e., each Property 1 (Positivity and normalization). ρτ ρσ of the ensembles∈P and Q consists of only∈ a single state, then the fidelity between the ensembles is equal to the 0 F K (P,Q) 1, (54) ≤ ≤ fidelity between the respective states, P,Q Ω, ∀ ∈P F K (P,Q)= F (τ, σ). (64) with Limiting case 2 (One singleton ensemble). If the F K(P,Q) = 1 iff P (ρ)= Q(ρ), ρ Ω, (55) ensemble Q(ρ) consists of only one state σ, i.e., Q(ρ) = ∀ ∈ δρσ, ρ, σ Ω, then the fidelity between P (ρ) and Q(ρ) is ∈ and equal to the average fidelity between a state drawn from the ensemble P (ρ) and the state σ, F K (P,Q)=0 (56) F K (P,Q)= P (ρ)F (ρ, σ). (65) if and only if the supports of P and Q are orthogonal sets ρ Ω X∈ of states. Property 2 (Symmetry). Limiting case 3 (Classical distributions). If the set Ω consists of perfectly distinguishable density matri- K F K (P,Q)= F K (Q, P ), (57) ces, i.e., F (ρ, σ)= δρσ, ρ, σ Ω, then F (P,Q) reduces to the following overlap∀ between∈ the classical probability P,Q . ∀ ∈PΩ distributions over the set Ω: 1 Property 3 (Joint concavity). F K (P,Q)= min(P (ρ),Q(ρ))=1 P (ρ) Q(ρ) . −2 | − | ρ Ω ρ Ω F K (pP + (1 p)P ,pQ + (1 p)Q ) (58) X∈ X∈ 1 − 2 1 − 2 (66) K K pF (P1,Q1)+(1 p)F (P2,Q2), Comment. 
As pointed out earlier, there are two lim- ≥ − its which can be interpreted as corresponding to classi- P1, P2,Q1,Q2 Ω, p [0, 1]. ∀ ∈P ∀ ∈ cal probability distributions—Limiting case 3 (probabil- Property 4 (Monotonicity under CPTP maps). ity distributions over a set of orthogonal states), and the For all CPTP maps , limit of two singleton ensembles where the two states E are diagonal in the same basis. Here, these two lim- F K (P,Q) F K (M (P ),M (Q)), (59) its yield different results. In the first case, we obtain ≤ E E Eq. (66) which is a particular type of overlap between where M : Ω ΩE is the map induced by . classical probability distributions. In the second case, if CorollaryE P (Invariance→P under unitary maps).E For P (ρ) and Q(ρ) are the spectra of the two density ma- all unitary maps , trices, the fidelity reduces to the Bhattacharyya over- U lap P (ρ),Q(ρ) which upper bounds expression (66). K K ρ Ω F (P,Q)= F (M (P ),M (Q)), (60) ∈ U U ThisP reflectsp the fact that the way F K treats the overlap between the ‘classical aspect’ of the probability distribu- where M : Ω ΩU is the map induced by . PropertyU P 5→P (Monotonicity under averaging).U tion P (ρ) is not a special case of the way it treats the Let P denote the singleton ensemble consisting of the overlap between two quantum states. We will show in average state of P (ρ), ρ = P (ρ)ρ. Then subsection E, that this property is intimately related to P K ρ Ω the fact that F is not monotonic under measurements. ∈ P The fidelity which we propose in Sec. V is monotonic un- F K (P,Q) F K (P, Q). (61) der measurements and both its classical limits coincide. ≤ 11
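As an illustration of the linear-programming formulation discussed above, the following sketch evaluates the Kantorovich distance of Eq. (12) for two small, arbitrarily chosen qubit ensembles with a generic LP solver; the pairwise trace distances supply the cost vector, and the marginal conditions on the coupling Π(ρ, σ) supply the equality constraints:

# Kantorovich distance between two discrete ensembles, computed as a linear program.
import numpy as np
from scipy.optimize import linprog

def trace_distance(rho, sigma):
    return 0.5 * np.sum(np.abs(np.linalg.eigvalsh(rho - sigma)))

def kantorovich_distance(P, states_P, Q, states_Q):
    m, n = len(P), len(Q)
    cost = np.array([trace_distance(r, s) for r in states_P for s in states_Q])
    # Marginal constraints: sum_sigma Pi(rho, sigma) = P(rho), sum_rho Pi(rho, sigma) = Q(sigma).
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([P, Q])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

# Example: equal-weight ensembles of z-basis and x-basis qubit states.
z0, z1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
x_plus = np.full((2, 2), 0.5)
x_minus = np.array([[0.5, -0.5], [-0.5, 0.5]])
P, states_P = np.array([0.5, 0.5]), [z0, z1]
Q, states_Q = np.array([0.5, 0.5]), [x_plus, x_minus]
print(kantorovich_distance(P, states_P, Q, states_Q))   # 1/sqrt(2) ~ 0.707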

D. Operational interpretations of the Kantorovich the states, measures

F (τ,υ) = min Tr(Eiτ) Tr(Eiυ), (68) Ei { } i To further develop our understanding of the meaning Xp p of the Kantorovich measures, it is useful to illustrate their where minimum is taken over all positive operators Ei interpretation in the spirit of game theory. Let us con- { } that form a positive operator-valued measure ( Ei = I). sider the Kantorovich distance first. i The trace distance is related to the maximum average Then we can modify the game as follows. AfterP sending one of the two states to Bob, Alice does not throw away probability pmax(ρ, σ) with which two equally probable states ρ and σ can be distinguished by a measurement as the other state, but waits for Bob to tell her the type 1 1 of measurement he performs on his state, and she per- follows: pmax(ρ, σ)= 2 + 2 ρ σ [6]. This naturally suggests the following gamek scenario.− k Imagine that Alice forms the same measurement on her state. They record has access to two ensembles of quantum states P (ρ) and their results under many repetitions, and at the end they Q(ρ), ρ Ω. More precisely, we will assume that she calculate the average of the statistical overlap between has at her∈ disposal two sufficiently large pools of states the resulting distributions of measurement outcomes for in which the relative frequencies of different states are every pair of states. Bob’s task is to minimize this quan- approximately equal to the corresponding probabilities tity by appropriately choosing his measurements for ev- for these states within a satisfactory precision. Alice has ery pair of states, while Alice’s goal is again to make to pick one state from one pool and another state from Bob’s task as difficult as possible by choosing the pairs the other pool and choose randomly (with equal proba- of states in a suitable manner. bility) whether to send the first state to Bob and throw the other away, or vice versa. She has to tell Bob which is E. Non-monotonicity under generalized the pair of states drawn from the two ensembles. Bob’s measurements task is to distinguish, by performing any operation on the received state, from which ensemble the state he re- ceives has been drawn. This is repeated until the two The trace distance and the fidelity (as well as all pools are depleted (the two pools are assumed to have fidelity-based distance measures between states) are equal numbers of states). Bob’s success is measured in monotonic under CPTP maps [2, 20, 32]. This prop- terms of the average number of times he guesses correctly erty, also known as contractivity, can be understood as the ensemble from which the state he receives has been an expression of the fact that the distinguishability be- drawn. Alice’s goal, on the other hand, is to choose the tween states described by these measures cannot be in- pairs of states from the two ensembles in such a way as creased by performing any operation on the states. One to make Bob’s task as difficult as possible. may wonder if, when going to the realm of ensembles, we should expect a measure of distinguishability between If every time Bob employs the optimal measurement ensembles to be monotonic under the more general class strategy for distinguishing which state he has been sent, of stochastic operations, i.e., generalized measurements. it is obvious that the optimal strategy for Alice is to pair After all, these are operations that transform ensembles the states according to the joint probability distribution into ensembles. 
We will show that this is not satisfied Π(σ, ρ) which minimizes the right-hand side of Eq. (12), by the Kantorovich distance and fidelity. We will also that is, minimizes the average probability of correctly relate this property to the fact that the Kantorovich fi- distinguishing the two states in each pair by an optimal delity yields two different results in the two ‘classical’ measurement. The Kantorovich distance can then be un- limits since a necessary condition for a Kantorovich mea- derstood as sure to be monotonic under measurements is that both its classical limits are the same. This condition, however, K Bob D (P,Q)=2pmax(P,Q) 1, (67) is not sufficient, as shown by the case of the Kantorovich − distance.

Bob Note, however, that our definitions of the Kantorovich where pmax(P,Q) is Bob’s maximal probability of success measures were based on the trace distance and the square when Alice chooses her strategy optimally. root fidelity. In an analogous manner, one can define The fidelity F K (P,Q) can be given a similar opera- Kantorovich measures based on any other distance or fi- tional interpretation, although a bit more artificial. The delity between states. Non-monotonicity under general- difference is that Bob’s task and corresponding measure ized measurements is not a problem per se and we will see of success have to be chosen so that they are given by that there is no reason why we should expect it, consider- the fidelity between the two states which Bob wants to ing the operational meaning of the Kantorovich measures distinguish at every round. For this purpose, we can use based on the trace distance and the square root fidelity. Fuchs’ operational interpretation of the fidelity [12] as the Nevertheless, it would be useful to have measures such minimum Bhattacharyya overlap between the statistical that the distinguishability between ensembles that they distributions generated by all possible measurements on describe cannot be increased by any possible operation 12

Let us first formulate precisely what we mean by monotonicity under generalized measurements. As pointed out earlier, under the most general type of quantum measurement, the state of a system transforms as in Eq. (3).

Definition 3 (Monotonicity under generalized measurements). Consider a measurement M with measurement superoperators {M_i}. Denote the set of distinct density matrices among all possible outcomes M_i(ρ)/Tr M_i(ρ), over all possible inputs ρ ∈ Ω, by Ω_M. If we apply the same generalized measurement (3) to every state in an ensemble P(ρ), ρ ∈ Ω, we obtain another ensemble P′(ρ′), ρ′ ∈ Ω_M. Thus the generalized measurement (3) induces a map from the set of probability distributions over Ω to the set of probability distributions over Ω_M. Denote this map by M : P_Ω → P_{Ω_M}. When we say that a distance function D(Q, P) between ensembles of states Q and P is monotonically decreasing (or simply monotonic) under generalized measurements, we mean that for any generalized measurement (3),

D(M(P), M(Q)) ≤ D(P, Q),   (69)

where M : P_Ω → P_{Ω_M} is the map induced by the measurement. Similarly, monotonicity of a fidelity F(Q, P) means

F(M(P), M(Q)) ≥ F(P, Q)   (70)

for any generalized measurement.

Property. The Kantorovich distance based on the trace distance (Eq. (12)) and the Kantorovich fidelity based on the square root fidelity (Eq. (13)) are not monotonic under generalized measurements.

Proof. The proof is presented in Appendix B.

The lack of monotonicity of the Kantorovich measures is something that should not be surprising considering the operational interpretations we discussed in the previous subsection. Generally, monotonicity under certain types of operations means that the type of distinguishability described by the measures cannot be increased under these operations. However, from the above game scenarios we see that the distinguishability concerns Bob's ability to distinguish which of a pair of states Alice has sent to him, in the case where Alice has chosen the way she pairs the states in an optimal way. Certainly, by applying a measurement on the state he receives, Bob cannot improve his chances of guessing correctly beyond what he would obtain by doing the optimal measurement. However, the question of monotonicity we are asking concerns applying the same measurement to all states in the original ensembles before Alice has chosen her optimal strategy. There is no reason to expect that, after applying a measurement on all of the states in the original ensembles, the optimal strategy that Alice can employ for the resulting ensembles can only be better than her optimal strategy for the original ensembles. Indeed, as shown in Appendix B, this is not the case when the figure of merit is based on the trace distance or the square root fidelity.

We now provide necessary and sufficient conditions that a measure of distance or fidelity between states has to satisfy in order for the Kantorovich measure based on it to be monotonic under measurements. We will denote by D_d^K the Kantorovich distance based on a distance d(ρ, σ) between states, which is defined as in Eq. (12) with d in the place of ∆. Similarly, by F_f^K we will denote the Kantorovich fidelity based on a fidelity f(ρ, σ) between states.

Theorem 2 (Conditions for monotonicity of the Kantorovich measures under generalized measurements). Let d(ρ, σ) and f(ρ, σ) be normalized distance and fidelity between states which are monotonic under CPTP maps and jointly convex (concave). The Kantorovich distance D_d^K(P, Q) or fidelity F_f^K(P, Q) based on d(ρ, σ) and f(ρ, σ), respectively, is monotonic under generalized measurements if and only if for every two states of the form Σ_i p_i ρ_i ⊗ |i⟩⟨i| and Σ_i q_i σ_i ⊗ |i⟩⟨i|, where {|i⟩} is an orthonormal set of states, the distance and fidelity satisfy

d( Σ_i p_i ρ_i ⊗ |i⟩⟨i|, Σ_i q_i σ_i ⊗ |i⟩⟨i| ) = Σ_i [ min(p_i, q_i) ∆(ρ_i, σ_i) + ½ |p_i − q_i| ],   (71)

and

f( Σ_i p_i ρ_i ⊗ |i⟩⟨i|, Σ_i q_i σ_i ⊗ |i⟩⟨i| ) = Σ_i min(p_i, q_i) F(ρ_i, σ_i),   (72)

respectively.

Proof. The proof is presented in Appendix C.

Comment 1. This theorem is a statement regarding the relation between the values of a given measure (distance or fidelity) between states over Hilbert spaces of different dimensions. Note that if a measure has a well-defined operational interpretation formulated without reference to the dimension of the Hilbert space (to the best of our knowledge, this is the case for all known measures of distance and fidelity between states), that measure is automatically defined for any dimension. The property of monotonicity that we are interested in is also dimension-independent. We remark that the above theorem concerns distance and fidelity measures between states which are monotonic under CPTP maps without the restriction that the CPTP maps preserve the dimension of the Hilbert space, since we are interested in proving monotonicity under the most general type of quantum operations. One can easily see that monotonicity under CPTP maps that can increase the dimension is equivalent to monotonicity under dimension-preserving CPTP

maps plus the stability condition d(ρ, σ)= d(ρ κ, σ κ) of the form ρ = x pxρx [x] (Eq. (8)). When only a and f(ρ, σ) = f(ρ κ, σ κ) for all ρ, σ ⊗( )⊗ and single ensemble is involved,⊗ this representation is suffi- ⊗ ⊗ ∈ B H P κ ( ′) where and ′ are arbitrary Hilbert spaces. cient and itb is not important what the pointer (or flag) Similarly,∈B H monotonicityH underH CPTP maps that can de- states [x] x x are, as long as they form an orthonor- crease the dimension is equivalent to monotonicity under mal set and≡ | eachih | [x] is unambiguously associated with dimension-preserving CPTP maps plus monotonicity un- ρx. However, if we want to use the EHS idea to compare der partial tracing. two ensembles, we need to go beyond this simple formu- Comment 2. The third Jozsa axiom states that a lation. In Sec. III, we already saw one example where fidelity function should satisfy [37] a naive application of this idea fails. Namely, we argued that if we represent two ensembles P (ρ) and Q(ρ), ρ Ω, f(ρ, ψ ψ )= ψ ρ ψ . (73) by the states P (ρ)ρ [ρ] and Q(ρ)ρ [∈ρ], a | ih | h | | i ρ Ω ρ Ω distance or fidelity∈ between⊗ these EHS representations∈ ⊗ is The square root fidelity we have considered above satis- equivalent to aP distance or fidelity betweenP the probabil- fies a modified version of that axiom, namely, ity distributions P (ρ) and Q(ρ) in which ρ is treated as a classical variable. Such a measure does not capture the F (ρ, ψ ψ )= ψ ρ ψ . (74) | ih | h | | i idea of closeness between different quantum states. In But one can see that if the fidelityp f satisfies Eq. (72), it this section, we will provide a generalized formulation of must satisfy an EHS representation of an ensemble, which will allow us to define measures of distance and fidelity between ensembles that possess all properties that we would like f( pj ρj j j , ψ ψ i i ) ⊗ | ih | | ih | ⊗ | ih | such measures to have. j X For this purpose, it is convenient to introduce the no- = pif(ρi i i , ψ ψ i i ), (75) ⊗ | ih | | ih | ⊗ | ih | tion of a ‘classical’ system whose states live in a ’classical’ space which we define to be a fixed set ΩC of orthogonal which can be only consistent with Eq. (73) and not with pure states [c], Tr([c][c ]) = δ ′ , where we use the nota- Eq. (74). This rules out a class of possible fidelity func- ′ cc tion [c] c c to distinguish the states of the ‘classical’ tions. system≡ from | ih the| states of the quantum system. Gener- A natural question to ask is whether there actually ex- ally, the classical space can consist of infinitely many dif- ist measures of distance or fidelity between states that ferent states, but later we will see that it suffices to con- satisfy the conditions of the theorem and thereby would sider a classical space of cardinality ΩC = Ω 2, where give rise to Kantorovich measures that are monotonic Ω is the cardinality of the set Ω of| density| | | matrices under generalized measurements. We leave this problem |participating| in the ensembles. open for future investigation. Instead, in the next sec- Given the classical system described by the classical tion we propose distance and fidelity between ensembles C which are based on the trace distance and the square root space Ω and a set Ω of states of a quantum system, we can ask what are the most general states of the quantum- fidelity but are not of the Kantorovich type and satisfy the desired monotonicity. classical system that represent an ensemble P (ρ), ρ Ω, consistently with our notion of ensemble. 
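For concreteness, the simple pointer representation ρ̂ = Σ_x p_x ρ_x ⊗ [x] recalled above (Eq. (8)) is easy to construct explicitly. The following minimal numpy sketch is ours (illustrative only; it assumes a finite ensemble given as a list of density matrices) and builds this quantum-classical state with orthonormal flag states [x] = |x⟩⟨x|.

```python
# Minimal sketch (not from the paper): the pointer representation of Eq. (8),
#   rho_hat = sum_x p_x * (rho_x ⊗ [x]),  with orthogonal flags [x] = |x><x|.
import numpy as np

def ehs_representation(probs, states):
    n, d = len(states), states[0].shape[0]
    rep = np.zeros((d * n, d * n), dtype=complex)
    for x, (p_x, rho_x) in enumerate(zip(probs, states)):
        flag = np.zeros((n, n)); flag[x, x] = 1.0      # pointer state [x]
        rep += p_x * np.kron(rho_x, flag)              # rho_x ⊗ [x]
    return rep

# Example: the qubit ensemble {(1/2, |0><0|), (1/2, |+><+|)}.
ket0 = np.array([1.0, 0.0]); ketp = np.array([1.0, 1.0]) / np.sqrt(2)
rho_hat = ehs_representation([0.5, 0.5], [np.outer(ket0, ket0), np.outer(ketp, ketp)])
print(np.trace(rho_hat).real)   # 1.0: a valid density matrix on the extended space
```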
As we pointed∈ out, the information about the identity of a quantum V. DISTANCE AND FIDELITY BASED ON THE state from the ensemble must be stored in the classi- EXTENDED-HILBERT-SPACE cal system in a way which allows one to unambiguously REPRESENTATION OF ENSEMBLES identify the state by measuring the state of the classical system. If we take this to be the definition of a valid EHS A. Motivating the definitions representation, then we should allow for the possibility that several flag states [ci(ρ)] point at the same quan- tum state as long as every{ flag} state is associated with a In this section, we adopt a different approach to single quantum state and, of course, each quantum state defining measures between ensembles of quantum states, ρ still appears with the correct total probability. More which is based on the extended-Hilbert-space (EHS) rep- succinctly, the most general EHS representation should resentation of ensembles that we briefly touched upon in allow for mixed flag states, i.e., Sec. II. As we pointed out, an ensemble describes states occurring randomly according to some probability distri- bution, but an indispensable part of the ensemble is the ρP = P (ρ)ρ pi(ρ)[ci(ρ)] . (76) ⊗ classical side information about the identity of the given ρ Ω i ! state. The idea behind the EHS representation is that X∈ X b the classical system storing that information is ultimately Having a quantum-classical state of this form is equiv- quantum and therefore it must be possible to describe it alent to having the ensemble (P (ρ),ρ) because by mea- in the language of . In the original suring the state of the classical{ system, we} can infer which formulation of the EHS representation [19], an ensemble state from the ensemble we are given, and given a state of the form (px,ρx) is represented in terms of a state drawn randomly from the ensemble we can always pre- { } 14 pare the state (76) by attaching the corresponding classi- where F is the square root fidelity (Eq. (1)), and max- cal state and discarding any additional information. Note imum is taken over all EHS representations ρ and σ of that in the expression (76) we have written the classi- P (ρ) and Q(ρ), respectively. cal states as [ci(ρ)], explicitly indicating which classical Before we proceed with studying the propertiesb of theseb states are associated with the quantum state ρ, but it measures, it is convenient to present two equivalent for- is convenient to express the condition that every pointer mulations of the above definitions. state is associated with a unique ρ Ω as a condition on Lemma 2 (Equivalent form of the EHS dis- a general state of the quantum-classical∈ system. tance). The EHS distance (80) is equivalent to

Definition 4 (EHS representation of an ensem- EHS ble). An EHS representation of an ensemble P (ρ), D (P,Q) = (82) ρ Ω, is a quantum-classical state of the form min ∆( P (ρ, σ)ρ [ρσ], Q(ρ, σ)σ [ρσ]), ∈ P (ρ,σ), ⊗ ⊗ ρ,σ Ω ρ,σ Ω ρ = P (ρ, [c])ρ [c], (77) Q(ρ,σ) X∈ X∈ ⊗ ρ Ω [c] ΩC X∈ X∈ where minimum is taken over pairs of joint probabil- b e for which the non-negative quantities P (ρ, [c]) satisfy ity distributions P (ρ, σ) and Q(ρ, σ) such that the left marginal of P (ρ, σ) is equal to P (ρ) and the right P (ρ, [c]) = P (ρ), ρ Ω, (78) marginal of Q(ρ, σ) is equal to Q(σ). The set of pointer ∀ e∈ [c] ΩC states [ρσ] is fixed and has cardinality equal to the square X∈ of the cardinality of Ω. P (ρ, [c])P (σ, [c])e = 0, ρ, σ Ω ρ = σ, [c] ΩC . Proof. First, observe that for any two EHS represen- ∀ ∈ | 6 ∀ ∈ (79) tations ρ and σ of P and Q, the distance ∆(ρ, σ) has the e e Equation (78) ensures that every quantum state ρ Ω form ∈ occurs with the correct probability P (ρ) and Eq. (79) ex- b b b b presses the fact that a given pointer state [c] in ΩC cannot ∆(ρ, σ) = (83) be associated with more than one state in Ω. In other ∆( P (ρ, [c])ρ [c], Q(ρ, [c])ρ [c]), C ⊗ ⊗ words, there exists an injective function ζ : Ω Ω ρ Ω [c] ΩC b bρ Ω [c] ΩC → X∈ X∈ X∈ X∈ which specifies the pointer states associated with a given e e 1 ρ Ω, and P (ρ, [c]) = 0 if ζ− ([c]) = ρ. It is impor- where P (ρ, [c]) and Q(ρ, [c]) are consistent with Defini- tant∈ to note that a given ensemble can6 be encoded us- tion 4. It can generally happen that one and the same ing many differente injections. If two ensembles P and pointere [c] is attachede to a state ρ from the first ensem- Q are encoded using injections ζP and ζQ which map ble and to a state σ from the second ensemble, that the space Ω to two non-overlapping subsets of ΩC , the is, P (ρ, [c]) = 0 and Q(σ, [c]) = 0. However, having a corresponding EHS representations of the two ensembles pair of states6 ρ and σ from the6 first and second ensem- would be completely orthogonal and therefore perfectly bles,e respectively, attachede simultaneously to more than distinguishable. However, if the sets of quantum states one pointer, does not help in attaining the minimum in participating in the two ensembles are not orthogonal, Eq. (80). This follows from the fact that we could replace one can always chose two EHS representations of the two the second pointer by the first one, which would result ensembles which have a non-zero overlap because one can in valid EHS representations of the two ensembles. But assign one and the same pointer to two non-overlapping the latter operation also corresponds to a CPTP map states from the two ensembles. At the same time, unless on the states in the extended Hilbert space, and since the two ensembles are identical, their EHS representa- ∆ is monotonic under CPTP maps, the resultant rep- tions cannot be made identical. This suggests a way of resentations will be closer. Therefore, without loss of defining distance and fidelity between ensembles based generality, we can assume that every pair of states ρ and on an optimal choice of their EHS representations. σ from the first and second ensemble, respectively, is as- Definition 5 (EHS distance between ensembles). sociated with a single pointer state, which we will label The EHS distance between the ensembles P (ρ) and Q(ρ), by [ρσ]. This implies that the minimum in Eq. 
(80) can ρ Ω, is be taken over EHS representations of P and Q of the ∈ EHS form ρ,σ Ω P (ρ, σ)ρ [ρσ] and ρ,σ Ω Q(ρ, σ)σ [ρσ], D (P,Q) = min ∆(ρ, σ), (80) ∈ ⊗ ∈ ⊗ ρ,b σb where the condition of consistency with the original dis- tributionsP P and Q amounts to conditionsP on the left and where ∆ is the trace distance (Eq. (2)), and minimum is b b right marginals of P (ρ, σ) and Q(ρ, σ), respectively: taken over all EHS representations ρ and σ of P (ρ) and Q(ρ), respectively. P (ρ, σ)= P (ρ), (84) Definition 6 (EHS fidelity between ensembles). b b σ The EHS fidelity between the ensembles P (ρ) and Q(ρ), X ρ Ω, is Q(ρ, σ)= Q(σ). (85) ∈ ρ F EHS(P,Q) = max F (ρ, σ), (81) X ρ,b σb This completes the proof. b b 15

Lemma 3 (Equivalent form of the EHS fidelity). The EHS fidelity (81) is equivalent to

F^EHS(P, Q) = max_{P(ρ,σ), Q(ρ,σ)} F( Σ_{ρ,σ∈Ω} P(ρ,σ) ρ ⊗ [ρσ], Σ_{ρ,σ∈Ω} Q(ρ,σ) σ ⊗ [ρσ] ),   (86)

where the maximum is taken over pairs of joint probability distributions P(ρ,σ) and Q(ρ,σ) such that the left marginal of P(ρ,σ) is equal to P(ρ) and the right marginal of Q(ρ,σ) is equal to Q(σ). The set of pointer states [ρσ] is fixed and has cardinality equal to the square of the cardinality of Ω.

Proof. The proof is analogous to the proof of Lemma 2.

Corollary (Formulation without reference to an extended Hilbert space). Considering the explicit forms of the trace distance and the square root fidelity, one can see that Eqs. (82) and (86) can be written without reference to the classical pointer system:

D^EHS(P, Q) = ½ min_{P(ρ,σ), Q(ρ,σ)} Σ_{ρ,σ∈Ω} ‖ P(ρ,σ) ρ − Q(ρ,σ) σ ‖,   (87)

F^EHS(P, Q) = max_{P(ρ,σ), Q(ρ,σ)} Σ_{ρ,σ∈Ω} √( P(ρ,σ) Q(ρ,σ) ) F(ρ, σ),   (88)

where the optimization is taken over all joint distributions P(ρ,σ) with left marginal P(ρ) and Q(ρ,σ) with right marginal Q(σ).

B. Properties of the EHS distance

Property 1 (Positivity).

D^EHS(P, Q) ≥ 0,   (89)

∀P, Q ∈ P_Ω, with equality

D^EHS(P, Q) = 0 iff P(ρ) = Q(ρ), ∀ρ ∈ Ω.   (90)

Proof. The EHS distance is obviously non-negative since ∆(ρ, σ) ≥ 0. If both ensembles are the same, P(ρ) = Q(ρ), ∀ρ ∈ Ω, clearly D^EHS(P, Q) = 0, because we can choose identical EHS representations for both ensembles. Reversely, if D^EHS(P, Q) = 0, this means that the EHS representations of P and Q must be identical, which means that P and Q must be the same.

Property 2 (Normalization).

D^EHS(P, Q) ≤ 1,   (91)

∀P, Q ∈ P_Ω, with equality

D^EHS(P, Q) = 1   (92)

if and only if the supports of P and Q are orthogonal sets of states.

Proof. Since ∆(ρ, σ) ≤ 1, obviously D^EHS(P, Q) ≤ 1. If P and Q have supports on orthogonal sets of states, then all of their EHS representations will also be orthogonal, which implies D^EHS(P, Q) = 1. Reversely, if D^EHS(P, Q) = 1, this means that the EHS states for which the minimum in Eq. (80) is achieved must be orthogonal. But unless P and Q have supports on orthogonal sets of states, it is always possible to find EHS representations of P and Q which have a non-zero overlap, because we can assign one and the same pointer to two non-overlapping states from the two different ensembles.

Property 3 (Symmetry).

D^EHS(P, Q) = D^EHS(Q, P),   (93)

∀P, Q ∈ P_Ω.

Proof. The symmetry follows from the definition of the EHS distance and the symmetry of ∆(ρ, σ).

Property 4 (Triangle inequality).

D^EHS(P, R) ≤ D^EHS(P, Q) + D^EHS(Q, R),   (94)
∀P, Q, R ∈ P_Ω.   (95)

Proof. The proof is presented in Appendix D.

Property 5 (Joint convexity).

D^EHS(pP_1 + (1−p)P_2, pQ_1 + (1−p)Q_2) ≤ p D^EHS(P_1, Q_1) + (1−p) D^EHS(P_2, Q_2),   (96)

∀P_1, P_2, Q_1, Q_2 ∈ P_Ω, ∀p ∈ [0, 1].

Proof. Let

D^EHS(P_1, Q_1) = ∆( Σ_{ρ,σ∈Ω} P_1(ρ,σ) ρ ⊗ [ρσ], Σ_{ρ,σ∈Ω} Q_1(ρ,σ) σ ⊗ [ρσ] )   (97)

and

D^EHS(P_2, Q_2) = ∆( Σ_{ρ,σ∈Ω} P_2(ρ,σ) ρ ⊗ [ρσ], Σ_{ρ,σ∈Ω} Q_2(ρ,σ) σ ⊗ [ρσ] ),   (98)

where the joint distributions P_1(ρ,σ) and P_2(ρ,σ) have left marginals P_1(ρ) and P_2(ρ), respectively, and the joint distributions Q_1(ρ,σ) and Q_2(ρ,σ) have right marginals Q_1(σ) and Q_2(σ), respectively. Since ∆ is jointly convex, we have

p D^EHS(P_1, Q_1) + (1−p) D^EHS(P_2, Q_2) ≥ ∆( Σ_{ρ,σ∈Ω} (pP_1(ρ,σ) + (1−p)P_2(ρ,σ)) ρ ⊗ [ρσ], Σ_{ρ,σ∈Ω} (pQ_1(ρ,σ) + (1−p)Q_2(ρ,σ)) σ ⊗ [ρσ] ).   (99)

But obviously pP1(ρ, σ)+(1 p)P2(ρ, σ) is a joint dis- Proof. Let tribution with left marginal pP− (ρ)+(1 p)P (ρ), and 1 2 EHS pQ (ρ, σ) +(1 p)Q (ρ, σ) is a joint distribution− with D (P,Q)= 1 − 2 right marginal pQ1(σ) +(1 p)Q2(σ). Therefore, the ∆( P (ρ, σ)ρ [ρσ], Q(ρ, σ)σ [ρσ]). (104) quantity on the right-hand− side of Eq. (99) is greater ⊗ ⊗ ρ,σ Ω ρ,σ Ω than or equal to DEHS(pP + (1 p)P ,pQ + (1 p)Q ), X∈ X∈ 1 − 2 1 − 2 which completes the proof. Observe that Property 6 (Monotonicity under generalized EHS measurements). D (P,Q) is monotonic under gen- ρP = TrC ( P (ρ, σ)ρ [ρσ]) (105) ⊗ eralized measurements in the sense of Definition 3, ρ,σ Ω X∈ DEHS(P,Q) DEHS(M(P ),M(Q)). and Proof. Let≥ P (ρ) and Q(ρ), ρ Ω be two ensembles of quantum states, and let ∈ ρ = TrC ( Q(ρ, σ)σ [ρσ]), (106) Q ⊗ EHS ρ,σ Ω D (P,Q)= X∈ ∆( P (ρ, σ)ρ [ρσ], Q(ρ, σ)σ [ρσ]). (100) where TrC denotes partial tracing over the subsystem ⊗ ⊗ ρ,σ Ω ρ,σ Ω containing the classical pointers [ρσ] . On the other X∈ X∈ hand, ∆(ρ, σ)= DEHS(P, Q) (see Eq.{ (116)} below). Since ∆(ρ, σ) is monotonic under partial tracing (which is a Let i , i(ρ) = j Mij ρMij† , be the measure- ment{M superoperators} M of a generalized measurement M, CPTP map), the property follows. P Corollary. If two distributions are close, their average M † Mij = I. Consider the following CPTP map: i,j ij states are also close, i.e., P (ρ) i(ρ) [i], (101) EHS M → M ⊗ if D (P,Q) ε, then ∆(ρP , ρQ) ε. (107) i ≤ ≤ X Property 8 (Continuity of the average of a con- where [i] is an orthonormal set of pure states in the tinuous function). Let h(ρ) be a bounded function, Hilbert{ space} of some additional system. Since ∆ is which is continuous with respect to the distance ∆. Then monotonic under CPTP maps, we have the ensemble average of h(ρ), hP = P (ρ)h(ρ), is con- EHS ρ Ω D (P,Q) EHS ∈ ≥ tinuous with respect to D . P ∆( ( P (ρ, σ)ρ [ρσ]), ( Q(ρ, σ)σ [ρσ])) Proof. The proof is presented in Appendix E. M ⊗ M ⊗ ρ,σ Ω ρ,σ Ω Comment. Again, as we pointed out in relation to the X∈ X∈ Kantorovich distance, Property 8 naturally reflects the i(ρ) = ∆( P (ρ, σ)Tr( i(ρ)) M [ρσi], idea of states as resources—if a resource is a continuous M Tr( i(ρ)) ⊗ ρ,σ Ω i function of the state, when two ensembles are close, their X∈ X M i(σ) average resources must also be close. Q(ρ, σ)Tr( i(σ)) M [ρσi]) Property 9 (The EHS distance is upper M Tr( i(σ)) ⊗ ρ,σ Ω i X∈ X M bounded by the Kantorovich distance). DEHS(M(P ),M(Q)), (102) ≥ DEHS(P,Q) DK (P,Q). (108) ≤ where M : Ω ΩM is the map in- duced by the measurementP → P as explained in Defini- Proof. Let Π(ρ, σ) be a joint probability distribution with left and right marginals P (ρ) and Q(σ) for which the tion 3. The last inequality follows from the fact K i(ρ) minimum in the definition (12) of D (P,Q) is attained. that P (ρ, σ)Tr( i(ρ)) M [ρσi] and ρ,σ Ω i Tr( i(ρ)) ∈ M M ⊗ Obviously, the minimum in Eq. (82) satisfies i(σ) ρ,σ i Q(ρ, σ)Tr( i(σ)) M [ρσi] are EHS PΩ P M Tr( i(σ)) ⊗ EHS representations∈ of the new ensemblesM M(P ) and M(Q). D (P,Q) P P ≤ Corollary (Monotonicity under CPTP maps ∆( Π(ρ, σ)ρ [ρσ], Π(ρ, σ)σ [ρσ]) ⊗ ⊗ and invariance under unitary maps). Property 6 ob- ρ,σ Ω ρ,σ Ω viously implies monotonicity under CPTP maps, which X∈ X∈ K can be regarded as a special type of generalized mea- = Π(ρ, σ)∆(ρ, σ)= D (P,Q). (109) ρ,σ Ω surements. This in turn implies invariance under unitary X∈ maps since the latter are reversible CPTP maps. Property 10 (Stability). Let P (ρ) and Q(ρ) be two Property 7 (Monotonicity under averaging). 
ensembles of states in Ω and R(σ ) be an ensemble of Let P denote the singleton ensemble consisting of the ′ states in Ω′, where Ω and Ω′ are sets of states of two average state of P (ρ), ρP = P (ρ)ρ. Then ρ Ω different systems. Then, ∈ P EHS EHS DEHS(P,Q) DEHS(P, Q). (103) D (P R,Q R)= D (P,Q). (110) ≥ ⊗ ⊗ 17

Proof. Let tP (ρ, σ) + (1 t)P (ρ, σ). Similarly, if Q (ρ, σ) and 1 − 2 1 Q2(ρ, σ) have right marginals equal to Q(ρ), so does DEHS(P R,Q R)= ⊗ ⊗ tQ1(ρ, σ)+(1 t)Q2(ρ, σ). Since the marginal conditions on x are linear,− the problem of finding x which minimizes ∆( Π(ρ τ ′, σ κ′)ρ τ ′ [ρτ ′σκ′], ⊗ ⊗ ⊗ ⊗ ξ(x) subject to these constraints is a convex optimization ρ,σ Ω τ ′,κ′ Ω′ X∈ X∈ problem, for which efficient numerical techniques exist. J(ρ τ ′, σ κ′)σ κ′ [ρτ ′σκ′]), (111) Limiting case 1 (Two singleton ensembles). If ⊗ ⊗ ⊗ ⊗ ρ,σ Ω τ ′,κ′ Ω′ P (ρ)= δ , ρ, τ Ω and Q(ρ)= δ , ρ, σ Ω, i.e., each X∈ X∈ ρτ ρσ of the ensembles∈P and Q consists of only∈ a single state, where Π(ρ τ ′, σ κ′) has left marginal P (ρ)R(τ ′) and ⊗ ⊗ then the distance between the ensembles is equal to the J(ρ τ ′, σ κ′) has right marginal Q(σ)R(τ ′). distance between the respective states, One⊗ can⊗ readily see that the monotonicity of ∆ under partial tracing implies DEHS(P,Q) = ∆(τ, σ). (116)

DEHS(P R,Q R) DEHS(P,Q). (112) Proof. Due to the monotonicity of ∆ under partial ⊗ ⊗ ≥ tracing over the pointer system, we have that D(P,Q) Using the stability of ∆, we see that if we choose ∆(τ, σ). But clearly, equality is achievable because≥ Π(ρ τ ′, σ κ′)= P (ρ, σ)R(τ ′)δτ ′κ′ and J(ρ τ ′, σ κ′)= we can choose the probability distributions in Eq. (82) ⊗ ⊗ ⊗ ⊗ Q(ρ, σ)R(τ ′)δτ ′κ′ , where P (ρ, σ) and Q(ρ, σ) are two P (κ,ρ)= Q(κ,ρ)= δκτ δρσ. joint distributions for which the minimum in Eq. (82) Limiting case 2 (One singleton ensemble). Un- is attained, we obtain like the Kantorovich distance, when the ensemble Q(ρ) consists of only one state σ, i.e., Q(ρ) = δρσ, ρ, σ Ω, DEHS(P R,Q R) DEHS(P,Q), (113) ∈ ⊗ ⊗ ≤ the EHS distance between P (ρ) and Q(ρ) is generally not equal to the average distance between a state drawn from which together with Eq. (112) implies Eq. (110). the ensemble P (ρ) and the state σ, This property can also be seen to follow from Property 6 because one can go from P and Q to P R and Q R, EHS ⊗ ⊗ D (P,Q) = P (ρ)∆(ρ, σ). (117) respectively, and vice versa, via stochastic operations. 6 ρ Ω Property 11 (Convex optimization). The task of X∈ finding the optimal P (ρ, σ) and Q(ρ, σ) in Eq.(82) is a Proof. We provide a proof by counterexample. Let convex optimization problem. the singleton ensemble consist of the sate σ =σ ˜ 0 0 0 ⊗ | ih | Proof. We can think of P (ρ, σ) and Q(ρ, σ) as the and let the other ensemble consist of two states, ρ0 = 2 components of a vector x of dimension 2N , where N is ρ˜0 0 0 and ρ1 =ρ ˜1 1 1 , with probabilities p0 and 2 ⊗ | ih | ⊗ | ih | the cardinality of the set Ω. The first N components p1 =1 p0, respectively. The average distance between 2 − of the vector are equal to P (ρ, σ) and the second N the state σ0 and the states from the other ensemble is components are equal to Q(ρ, σ). The convexity of the function ∆ave = p0∆(ρ0, σ0)+ p1∆(ρ1, σ0) = p0∆(˜ρ0, σ˜0)+ p1. (118) ξ(x) ∆( P (ρ, σ)ρ [ρσ], Q(ρ, σ)σ [ρσ]) ≡ ⊗ ⊗ ρ,σ Ω ρ,σ Ω However, if we choose the joint distributions P (ρ, σ) = X∈ X∈ (114) p0δρρ0 +p1δρρ1 and Q(ρ, σ)= δρ0σ0 , we see from Eq. (87) that can be seen from the fact that for any x , x , and t, 1 2 1 1 0 t 1, we have DEHS(P,Q) p ρ σ + p ≤ ≤ ≤ 2 k 0 0 − 0 k 2 1 ≤ ξ(tx1 + (1 t)x2)= p0 1 1 − ρ0 σ0 + (1 p0) σ0 + p1 = 2 k − k 2 − k k 2 ∆( (tP (ρ, σ)+(1 t)P (ρ, σ))ρ [ρσ], 1 − 2 ⊗ 1 ρ,σ Ω p0 ρ˜0 σ˜0 +p1 = ∆ave. (119) X∈ 2 k − k (tQ1(ρ, σ)+(1 t)Q2(ρ, σ))σ [ρσ]) − ⊗ ≤ For an appropriate choice ofρ ˜0 andσ ˜0, the second in- ρ,σ Ω X∈ equality can be made strict, which completes the proof. t∆( P (ρ, σ)ρ [ρσ], Q (ρ, σ)σ [ρσ])+ Limiting case 3 (Classical distributions). If the 1 ⊗ 1 ⊗ ρ,σ Ω ρ,σ Ω set Ω consists of perfectly distinguishable density matri- X∈ X∈ EHS ces, i.e., ∆(ρ, σ)=1 δρσ, ρ, σ Ω, then D (P,Q) (1 t)∆( P2(ρ, σ)ρ [ρσ], Q2(ρ, σ)σ [ρσ]) − ∀ ∈ − ⊗ ⊗ reduces to the trace distance ∆(ρP , ρQ) between the den- ρ,σ Ω ρ,σ Ω X∈ X∈ sity matrices ρP = ρ Ω P (ρ)ρ and ρQ = ρ Ω Q(ρ)ρ, = tξ(x1)+(1 t)ξ(x2), (115) which is equal to the Kolmogorov∈ distance between∈ the − classical probabilityP distributions P and Q, DPEHS(P,Q)= due to the joint convexity of ∆. Notice that if P1(ρ, σ) 1 2 P (ρ) Q(ρ) . and P2(ρ, σ) have left marginals equal to P (ρ), so does ρ Ω| − | P∈ 18

Proof. The property follows from the fact that via Property 7 (Stability). Let P (ρ) and Q(ρ) be two CPTP maps one can go back and forth between any EHS ensembles of states in Ω and R(σ′) be an ensemble of representations of the ensembles P (ρ) and Q(ρ), ρ Ω, states in Ω′, where Ω and Ω′ are sets of states of two ∈ and the states ρP and ρQ. different systems. Then,

F EHS(P R,Q R)= F EHS(P,Q). (128) C. Properties of the EHS fidelity ⊗ ⊗

The properties of the EHS fidelity (86) can be proven Property 8 (Convex optimization). The task of analogously to the properties of the EHS distance, which finding the optimal P (ρ, σ) and Q(ρ, σ) in Eq.(86) is a is why we present them without proof. convex optimization problem. Property 1 (Positivity and normalization). Limiting case 1 (Two singleton ensembles). Let 0 F EHS(P,Q) 1, (120) P (ρ)= δρτ , ρ, τ Ω and Q(ρ)= δρσ, ρ, σ Ω, i.e., each ≤ ≤ of the ensembles∈P and Q consists of only∈ a single state. P,Q , ∀ ∈PΩ Then the fidelity between the ensembles is equal to the with fidelity between the respective states,

EHS F (P,Q) = 1 iff P (ρ)= Q(ρ), ρ Ω, (121) EHS ∀ ∈ F (P,Q)= F (τ, σ). (129) and F EHS(P,Q)=0 (122) Limiting case 2 (One singleton ensemble). Un- like the Kantorovich fidelity, when the ensemble Q(ρ) if and only if the supports of P and Q are orthogonal sets consists of only one state σ, i.e., Q(ρ) = δρσ, ρ, σ Ω, of states. the EHS fidelity between P (ρ) and Q(ρ) is generally∈not Property 2 (Symmetry). equal to the average fidelity between a state drawn from the ensemble P (ρ) and the state σ, F EHS(P,Q)= F EHS(Q, P ), (123)

P,Q Ω. ∀ ∈P F EHS(P,Q) = P (ρ)F (ρ, σ). (130) 6 Property 3 (Strong concavity). ρ Ω X∈ F EHS(pP + (1 p)P ,qQ + (1 q)Q ) (124) 1 − 2 1 − 2 EHS EHS Limiting case 3 (Classical distributions). If the √pqF (P1,Q1)+ (1 q)(1 p)F (P2,Q2), ≥ − − set Ω consists of perfectly distinguishable density ma- P1, P2,Q1,Q2 Ω, p, q [0, 1]. EHS p trices, i.e., F (ρ, σ) = δρσ, ρ, σ Ω, then F (P,Q) ∀ ∈P ∀ ∈ ∀ ∈ Property 4 (Monotonicity under generalized reduces to the fidelity F (ρP , ρQ) between the density ma- EHS measurements). F (P,Q) is monotonic under gen- trices ρP = ρ Ω P (ρ)ρ and ρQ = ρ Ω Q(ρ)ρ, which ∈ ∈ eralized measurements in the sense of Definition 3, is equal to the Bhattacharyya overlap between the clas- P P EHS F EHS(P,Q) F EHS(M(P ),M(Q)). sical probability distributions P and Q, F (P,Q) = Corollary≤ (Monotonicity under CPTP maps P (ρ)Q(ρ). EHS ρ Ω and invariance under unitary maps). F (P,Q) is ∈ P p monotonic under CPTP maps and invariant under uni- Comment. Unlike the Kantorovich fidelity, here both tary maps. ‘classical’ limits are the same. Property 5 (Monotonicity under averaging). Let P denote the singleton ensemble consisting of the average state of P (ρ), ρP = P (ρ)ρ. Then ρ Ω ∈ P D. Operational interpretations of the EHS F EHS(P,Q) F EHS(P, Q). (125) measures ≤ Corollary. If two distributions are close, their average Similarly to the Kantorovich measures, we can under- states are also close, i.e., stand the meaning of the EHS measures from an opera- EHS tional point of view. However, we present an interpreta- if F (P,Q) 1 ε, then F (ρP , ρQ) 1 ε. (126) ≥ − ≥ − tion in the spirit of Sec. IV.D only for the EHS distance. Property 6 (The EHS fidelity is lower bounded For the EHS fidelity, we present an interpretation of a by the Kantorovich fidelity). different type, in which an ensemble of density matrices is looked upon as the output of a stochastic quantum F EHS(P,Q) F K (P,Q). (127) ≥ channel with a pure-state input. 19
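Similarly to Property 8, the maximization in Eq. (88) is a concave program: √(P(ρ,σ)Q(ρ,σ)) is a geometric mean and hence jointly concave in the two joint distributions. A minimal cvxpy sketch (ours, for illustration; it again assumes both ensembles live on the same finite set Ω of states):

```python
# Sketch (not from the paper): the concave maximization behind Eq. (88),
#   F^EHS(P,Q) = max sum_{rho,sigma} sqrt(P(rho,sigma) Q(rho,sigma)) * F(rho,sigma).
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

def root_fidelity(rho, sigma):
    """Square root fidelity F(rho, sigma) = Tr sqrt(sqrt(sigma) rho sqrt(sigma)) (Eq. (1))."""
    s = sqrtm(sigma)
    return float(np.real(np.trace(sqrtm(s @ rho @ s))))

def ehs_fidelity(p, q, states):
    n = len(states)
    F = np.array([[root_fidelity(states[i], states[j]) for j in range(n)] for i in range(n)])
    P = cp.Variable((n, n), nonneg=True)
    Q = cp.Variable((n, n), nonneg=True)
    # sqrt(P_ij * Q_ij) written as a geometric mean keeps the objective DCP-concave.
    objective = sum(F[i, j] * cp.geo_mean(cp.hstack([P[i, j], Q[i, j]]))
                    for i in range(n) for j in range(n))
    constraints = [cp.sum(P, axis=1) == np.asarray(p), cp.sum(Q, axis=0) == np.asarray(q)]
    problem = cp.Problem(cp.Maximize(objective), constraints)
    problem.solve()
    return problem.value
```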

Bob 1. The EHS distance where pmax(P,Q) is Bob’s maximal probability of success when Alice chooses her strategy optimally. Observe that Eq. (87) can be written as P (ρ, σ)+ Q(ρ, σ) DEHS(P,Q) = min 2. The EHS fidelity P (ρ,σ),Q(ρ,σ) 2 × ρ,σ Ω X∈ P (ρ, σ) Q(ρ, σ) For the EHS fidelity, we propose an interpretation ρ σ . (131) which is similar to the one proposed for the square root k P (ρ, σ)+ Q(ρ, σ) − P (ρ, σ)+ Q(ρ, σ) k fidelity in Ref. [38], It is not difficult to see that F (ρ, σ) = max ψ φ , (134) P (ρ, σ) Q(ρ, σ) ρ σ |h | i| k P (ρ, σ)+ Q(ρ, σ) − P (ρ, σ)+ Q(ρ, σ) k where maximization is taken over all pure states ψ and | i =2pmax(ρ, σ) 1, (132) φ such that ρ = ( ψ ψ ) and σ = ( φ φ ) for some − |CPTPi map . AccordingE | ih to| this interpretation,E | ih | if ρ and σ where pmax(σ, ρ) is the maximum average probability are the outputsE of a deterministic quantum channel with with which the two states σ and ρ, each occurring with P (ρ,σ) Q(ρ,σ) pure-state inputs, the square root fidelity is an upper prior probability P (ρ,σ)+Q(ρ,σ) and P (ρ,σ)+Q(ρ,σ) , respec- bound on the overlap between the input states. It turns tively, can be distinguished by a measurement [6]. In the out that the EHS fidelity provides a generalization of this case when each of the states ρ and σ is equally likely, the idea to stochastic quantum channels. quantity (132) reduces to 1 ρ σ . When a generalized measurement M with measure- 2 k − k Imagine that Alice is given two ensembles P (ρ) and ment superoperators i is applied to a given state Q(ρ), ρ Ω, which are also known to Bob. With prob- σ, it gives rise to an{M ensemble} P (ρ), ρ Ω, with ∈ ability 1/2, she chooses one of the two ensembles and p , where p = Tr( (σ)) are∈ the prob- P (ρ)= i: σi=ρ i i i draws a random state from it. Let us say that she draws abilities for the different measurementM outcomes, and P the state ρ from the first ensemble. She then sends this σi = i(σ)/pi are their corresponding output states. state to Bob but tells him that she is sending either the In otherM words, M can be viewed as a stochastic quan- state ρ drawn from the first ensemble or the state σ drawn tum channel which for a given input state outputs an from the second ensemble, where Alice can choose to say ensemble of states. We will use the short-cut notation a particular σ depending on the ρ she actually got. Bob’s M(σ) to denote the ensemble of states resulting from the task is to distinguish from which ensemble the state he re- action of the channel M on the state σ. ceives has been drawn, and the figure of merit of his suc- Theorem 3 (Channel-based interpretation of cess is the average number of times he guesses correctly. the EHS fidelity). Let P (ρ) and Q(ρ), ρ Ω, be two Alice’s goal is to make Bob’s task as difficult as possible, ensembles of density matrices on S. Then,∈ with the caveat that, although she is free to choose her H strategy, she has to reveal it to Bob. Alice’s strategy is F EHS(P,Q) = max ψ φ , (135) |h | i| described by the probabilities T1(ρ σ) with which, when having drawn state ρ from the first| ensemble, she will where maximization is taken over all pure states ψ S S | i ∈ tell Bob that the state is either ρ from the first ensem- and φ such that M( ψ ψ ) = (P (ρ),ρ) , H | i ∈ H | ih | { } ble or σ from the second ensemble, and the probabilities ρ Ω, and M( φ φ ) = (Q(ρ),ρ) , ρ Ω, for some ∈ | ih | { } ∈ T2(ρ σ) with which, when having drawn state σ from the stochastic channel M. second| ensemble, she will say that the sate is either σ Proof. 
From the monotonicity of the EHS fidelity from the second ensemble or ρ from the first ensemble. under generalized measurements it follows that for any In other words, Bob is aware of the joint probabilities generalized measurement M and two states ψ and φ , | i | i P (ρ, σ) = P (ρ)T1(ρ σ) and Q(ρ, σ) = T2(ρ σ)Q(σ). Ob- | | F EHS(M( ψ ψ ), M( φ φ )) ψ φ . (136) viously, the probability that Bob will be told that the | ih | | ih | ≥ |h | i| state he receives is either ρ from the first ensemble or σ P (ρ,σ)+Q(ρ,σ) Therefore, we only have to show that there exist states from the second ensemble is equal to , and 2 ψ , φ S and a generalized measurement M, for the prior probability that in such a case the state is ρ is |whichi | i equality ∈ H is attained. P (ρ,σ) , while the prior probability that the state P (ρ,σ)+Q(ρ,σ) Let P (ρ, σ) and Q(ρ, σ) be two joint probability dis- Q(ρ,σ) is σ is P (ρ,σ)+Q(ρ,σ) . Then assuming that Bob performs tributions which achieve the maximum in Eq. (88) for the optimal measurement to distinguish these states with the pair of probability distribution P (ρ) and Q(ρ). From these prior probabilities, the optimal strategy for Alice Uhlmann’s theorem [1] we know that for any pair (ρ, σ) SB S B ∈ is to choose T1(ρ σ) and T2(ρ σ) (or equivalently, P (ρ, σ) Ω Ω, there exist purifications ψρ,σ and | | × SB S B | i ∈ H ⊗ H and Q(ρ, σ)) that minimize the quantity (131). The EHS φρ,σ of ρ and σ, respectively, such that | i ∈ H ⊗ H SB distance can then be understood as F (ρ, σ) = ψρ,σ φρ,σ . The second system B can be h | i EHS Bob chosen to have the same dimension as that of S. Let D (P,Q)=2pmax(P,Q) 1, (133) us introduce a third system with a Hilbert space E of − H 20 dimension N 2, where N is the cardinality of the set Ω. P (ψ, φ)Q(ψ, φ) between the probabilities P (ψ, φ) and Let (ρ, σ) E , (ρ, σ) Ω Ω, be an orthonormal basis Q(ψ, φ) is modified by the factor ψ φ . Heuristi- of {|E. Fromi } Eq. (88)∈ one× can readily see that the pure pcally, we could think that the probabilities|h | i| of the two statesH distributions are of a quantum nature, i.e., instead of P (ψ, φ) and Q(ψ, φ) at a given point (ψ, φ), we have SBE SB E P = P (ρ, σ) ψρ,σ (ρ, σ) , (137) P (ψ, φ) ψ ψ and Q(ψ, φ) φ φ , whose overlap is given | i | i | i ρ,σ Ω | ih | | ih | X∈ p by P (ψ, φ)Q(ψ, φ) ψ φ . Note that expression (143) SBE SB E |h | i| Q = Q(ρ, σ) φρ,σ (ρ, σ) , (138) is formulated without any reference to mixed-state fi- | i | i | i p ρ,σ Ω delity. X∈ p Theorem 4. The square root fidelity F (ρ, σ) = by construction satisfy Tr √σρ√σ is equal to the maximum of the fidelity (143) between all possible pure-state ensembles whose average P Q SBE = F EHS(P,Q). (139) p h | i density matrices are equal to ρ and σ, i.e., Notice that there exists a unitary transformation U EHS S B E ∈ F (ρ, σ) = max F (P,Q), (144) ( ) such that P,Q B H ⊗ H ⊗ H U ψ S 0 BE = P SBE , (140) where maximization is taken over all P = | i | i | i (P (ψ), ψ ψ ) and Q = (Q(φ), φ φ ) , such U φ S 0 BE = Q SBE, (141) { | ih | } { | ih | } | i | i | i that BE B E S where 0 is some state in , and ψ and P (ψ) ψ ψ = ρ, (145) S | i S H ⊗ H | i | ih | φ are states in . Since unitary operations preserve ψ |thei overlap betweenH states, X Q(φ) φ φ = σ. (146) S SBE EHS | ih | ψ φ = P Q = F (P,Q). 
(142) φ h | i h | i X But from the states P SBE and Q SBE we can ob- More succinctly, tain the ensembles (P| (iρ),ρ) and| (iQ(ρ),ρ) , respec- tively, by performing{ a destructive} measurement{ } on sub- F (ρ, σ)= max P (ψ, φ)Q(ψ, φ) ψ φ , P (ψ,φ),Q(ψ,φ), |h | i| E E ψ,φ Ω system in the basis (ρ, σ) and tracing out sub- Ω ∈ HB {| i } S X p system . Therefore, starting from the two states ψ (147) and φ HS we can obtain the ensembles (P (ρ),ρ) |andi (Q(|ρ)i,ρ) by appending the state 0 BE{ , applying} the where maximization is taken over all sets of pure states unitary{ operation} U, measuring in the| i basis (ρ, σ) E Ω and joint distributions P (ψ, φ) and Q(ψ, φ), ψ, φ Ω, ∈ and discarding system B. This operation is equivalent{| i to} such that a generalized measurement M on system S. This com- P (ψ, φ) ψ ψ = ρ, (148) pletes the proof. | ih | ψ,φ Ω X∈ Q(ψ, φ) φ φ = σ. (149) VI. AN ENSEMBLE-BASED | ih | ψ,φ Ω INTERPRETATION OF THE SQUARE ROOT X∈ FIDELITY Proof. From the monotonicity of F EHS(P,Q) under averaging, it follows that As we pointed out in Sec. V, the EHS fidelity can be formulated without reference to an extended Hilbert F (ρ, σ) max P (ψ, φ)Q(ψ, φ) ψ φ . ≥ P (ψ,φ),Q(ψ,φ), |h | i| space (Eq. (88)). In the case when the set Ω consists of ψ,φ Ω Ω X∈ p pure states, the quantity (88) can be written as (150)

EHS F (P,Q)= max P (ψ, φ)Q(ψ, φ) ψ φ , To prove that there are pure-state ensembles for which P (ψ,φ),Q(ψ,φ) |h | i| ψ,φ Ω X∈ p equality is achieved, we will make use of Uhlmann’s the- (143) orem [1] according to which where optimization is taken over all joint distributions F (ρ, σ)= max ψ φ , (151) P (ψ, φ) with left marginal P (ψ), and Q(ψ, φ) with ψe , φe |h | i| right marginal Q(φ). Notice that for fixed P (ψ, φ) and | i | i e e Q(ψ, φ), the quantity P (ψ, φ)Q(ψ, φ) ψ φ where maximization is taken over all possible purifica- ψ,φ Ω |h | i| can be thought of as a generalization∈ of the Bhat- tions ψ and φ of ρ and σ, respectively. Let ψ0 P p | i | i | i tacharyya overlap between classical probability distri- and φ0 be two purifications for which the maximum butions over the variable (ψ, φ), where the overlap in Eq.| e (151)i is attained.e Choose an orthonormal basise e 21

i in the auxiliary system needed for the purification. Before we propose distinguishability measures between We{| i} can write stochastic quantum operations, let us discuss what we mean when we say that two such operations are different. ψ0 = αi ψi i , (152) For the purposes of the present paper, we will identify a | i | i| i i stochastic M (or a generalized mea- X e surement) with an ensemble (mi, ¯ i) , mi 0, of dif- ferent completely positive measurement{ M } superoperators≥ φ0 = βi φi i . (153) ¯ ¯ ¯ † normalized | i | i| i i( )= j Mij ( )Mij which are as i M · · X P e Tr( M¯ † M¯ ij )= d, i, (156) The overlap between these states can be written as ij ∀ j X ψ0 φ0 = α∗βi ψi φi α∗βi ψi φi . (154) and satisfy |h | i| | i h | i| ≤ | i h | i| i i X X ¯ ¯ e e miMij† Mij = I. (157) Notice that if we change arbitrarily the phases of αi and i,j βi in Eqs. (152) and (153), we obtain valid (although not X necessarily optimal) purifications of ρ and σ. If we choose The unnormalized measurement superoperators i M the phases such that each of the quantities αi∗βi ψi φi which appear in the usual description of a measurement have the same phase, then equality in Eq. (154)h is| at-i (Eq. (3)) are related to the normalized ones via tained. Therefore, for optimal purifications we have ¯ i = i/mi, (158) M M ψ φ = αi βi ψi φi . (155) mi = Tr( M † Mij )/d. (159) |h 0| 0i| | || ||h | i| ij i j X X e e 2 Notice that the ensembles αi , ψi ψi and Notice that the weights mi satisfy i mi = 1, i.e., {| | | ih |} ¯ β 2, φ φ are such that their averages give they can be thought of as ‘probabilities’ and (mi, i) i i i P { M } rise{| | to| ρ ihand|}σ, i.e., they are among those ensembles can be thought of as a ‘probabilistic’ ensemble of nor- ¯ over which maximization in Eq. (144) is taken. But malized superoperators i. Note, however, that mi are not equal to the probabilitiesM of the measurement out- i αi βi ψi φi is exactly of the form on the right- hand| side|| ||h of| Eq.i| (147), i.e., equality in Eq. (150) is comes which generally depend on the input state ρ and ¯ Pattained by α 2, ψ ψ and β 2, φ φ . This are given by pi = miTr( i(ρ)). i i i i i i M completes the{| proof.| | ih |} {| | | ih |} The reason why we associate different outcomes with Clearly, all interpretations of the fidelity must be normalized superoperators is that we want our descrip- equivalent, but they provide different intuitive ways of tion to explicitly emphasize the fact that measurement understanding the same quantity. Theorem 4 gives an outcomes whose measurement superoperators differ from interpretation based on the pure-state ensembles from each other only by a factor are not considered different. which a mixed state can be prepared by averaging and This is because for us a generalized measurement is not thus reflects the common intuition of mixed states as de- a characterization of a particular physical device (which scribing mixtures of pure states. could produce classical readings not necessarily related to the quantum system of interest), but the most abstract characterization of an operation on the state of the quan- VII. DISTANCE AND FIDELITY BETWEEN tum system, which includes information extraction as STOCHASTIC QUANTUM OPERATIONS well as state transformation. Clearly, two measurement superoperators which differ from each other by a factor In practice, it often makes sense to ask how close two do not provide any different information about the state quantum processes are. 
For example, we may want to of the system prior to the measurement (according to compare an ideal quantum operation which we would like Bayes’s rule) nor give rise to different post-measurement to implement, with an imperfect operation that we are states. Note that when we say that two normalized measurement superoperators ¯ i( )= M¯ ij ( )M¯ † and able to implement. Distance measures between deter- M · j · ij ¯ ¯ ¯ ministic quantum operations (CPTP maps) have been k( ) = k Nkl( )Nkl† are the same,P we compare them defined, e.g., in Ref. [20]. However, a similar treatment Nas completely· positive· maps, i.e., irrespectively of their P for stochastic quantum operations (generalized measure- operator-sum representations. In other words, ¯ i = ¯k ments) has been missing. Stochastic operations are an if and only if there exists a unitary matrix withM compo-N ¯ ¯ important tool for quantum information processing with nents Ujl, such that Mij = l UjlNkl, j [2]. In that applications in various areas, such as quantum control, sense, if two measurements are described∀ by identical en- state estimation, entanglement manipulation, and error sembles of normalized measurementP superoperators, they correction, to name a few. Identifying such measures are the same measurement. Conversely, if two measure- could thus be very useful. ments are described by different ensembles of normalized 22 measurement superoperators, they should be considered i.e, Tr(ρ ) = 1. However, not all density matrices on different because they either give rise to different output A MS correspond to CPTP maps, but only those ensembles for some input, or provide different informa- Hwhose⊗ H reduced density matrix on subsystem A is the tion about the input state, or both. Therefore, we will maximally mixed state I/d. It is easy to see that most specify a generalized measurement M by the correspon- generally, a density matrix on A S corresponds to a H ⊗¯ H ¯ ¯ dence completely positive superoperator ( ) = i Mi( )Mi†, which is normalized as M · · M (mi, ¯ i) . (160) P ↔{ M } ¯ ¯ Tr( Mi†Mi)= d. (162) There are many possible ways in which one can define i distance between two quantum operations. The following X desirable properties for a distance D between determin- The reverse is also true—every completely positive su- istic quantum operations and were pointed out and peroperator on ( S), which satisfies Eq. (162), gives B H AS discussed in Ref. [20]: (1)EmetricF —the measure should rise to a density matrix when applied to Φ Φ . We | ih | be positive, symmetric, satisfy the triangle inequality, therefore see that there is an isomorphism and vanish if and only if the two operations are identical; ¯ (mi, i) (mi,ρ ¯ i ) (163) (2) computability—it should be possible to evaluate D { M }↔{ M } in a direct manner; (3) measurability—there should be between ensembles of normalized completely positive su- an achievable experimental procedure for determining D; peroperators and ensembles of density matrices. Of (4) physical interpretation—the distance should have course, just like not every completely positive map cor- a well motivated physical interpretation; (5) stability— responding to a density matrix is trace preserving, not D( , )= D( , ), which means that unrelated I⊗E I⊗F E F every ensemble (mi, i) , i mi = 1, forms a gen- physical systems should not affect the value of D; (6) { M } eralized measurement ( miM¯ † M¯ ij = I). 
But since chaining—D( 2 1, 2 1) D( 1, 1)+D( 2, 2), i,j P ij E ⊗E F ⊗F ≤ E F E F the reverse is true, we can use the isomorphism to define i.e., for a process composed of several steps, the total er- P ror should be less than the sum of the errors in the indi- distance and fidelity between generalized measurements vidual steps. We will consider the same requirements for through the distance and fidelity between ensembles of a distance between stochastic quantum operations. In states. the deterministic case, in view of the above desiderata, Definition 7 (Distance between generalized two main approaches to distinguishing quantum opera- measurements based on the Jamio lkowski isomor- tions stand out—comparison based on the Jamiolkowski phism). Let M and N be two generalized measurements acting on ( S). Then, isomorphism and worst-case comparison. We will adopt B H the same approaches here. D (M, N) Since many of the properties for the following measures iso ≡ A S AS A S AS and their proofs are similar to those discussed in Ref. [20], D M ( Φ Φ ), N ( Φ Φ ) , (164) I ⊗ | ih | I ⊗ | ih | we will only comment on them briefly. In what follows, where A MS and A NS denote the generalized we will use D and F to denote distance and fidelity be- I ⊗ I ⊗ tween ensembles, which can be either of the Kantorovich measurements M and N applied locally on subsystem S and Φ AS = j A j S/√d is a maximally entangled or of the EHS type. We will use M(ρ) to denote the en- | i j | i | i semble of output states that results from the action of a state on A S. P stochastic quantum operation M on an input state ρ. PropertyH ⊗ 1 H (Metric). It follows from the metric properties of D. Property 2 (Computability). It follows from the A. Measures based on the Jamio lkowski computability of D which is either a linear program (in isomorphism the Kantorovich case) or a convex-optimization problem (in the EHS case). The Jamiolkowski isomorphism [27] is a one-to-one cor- Property 3 (Measurability). As in the determin- respondence between completely positive maps (super- istic case, Diso can be determined by doing full process operators) : ( S) ( S) and positive operators tomography [41, 42]. ρ ( MA B SH), where→ B dim(H A) = dim( S ) = d. Property 4 (Physical interpretation). In addition M ∈ B H ⊗ H H H The correspondence is established via to the obvious meaning of Diso following from its defini- tion, it was pointed out in Ref. [20] that in the determin- A S AS 1 ρ = ( Φ Φ ), (161) istic case, Diso( , ) ∆( ( x x ), ( x x )), M I ⊗M | ih | d x where the sum isE overF a≥ set of orthonormalE | ih | basisF | ih states| where Φ AS = j A j S /√d is a maximally entangled P | i j | i | i x which can be thought of as the different instances of state on A S (here j A and j S are orthonor- |a computationali problem. In a similar manner, it can be H ⊗ HPA {|S i } {| i } 1 mal bases of and , respectively). Notice that if seen that Diso(M, N) d x ∆(M( x x ), N( x x )). the completelyH positiveH map is trace-preserving, the Property 5 (Stability).≥ It follows| ih from| the| stabilityih | corresponding positive operatorMρ is a density matrix, of D. P M 23
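To make Eqs. (156)-(163) concrete: given the Kraus operators {M_ij} of a generalized measurement, grouped by outcome i, the weights m_i and normalized superoperators of Eqs. (158)-(159) and the corresponding Jamiolkowski density matrices of Eq. (161) can be computed directly. The following sketch is ours (illustrative only; the unsharp qubit measurement used as an example is hypothetical).

```python
# Sketch (not from the paper): normalized measurement superoperators (Eqs. (158)-(159))
# and the Jamiolkowski state of Eq. (161) for each normalized superoperator.
import numpy as np

def normalize_measurement(kraus_by_outcome, d):
    """Return [(m_i, normalized Kraus list)], with m_i = Tr(sum_j M_ij^dag M_ij) / d."""
    ensemble = []
    for kraus in kraus_by_outcome:
        m_i = sum(np.trace(M.conj().T @ M).real for M in kraus) / d   # Eq. (159)
        ensemble.append((m_i, [M / np.sqrt(m_i) for M in kraus]))     # Eq. (158)
    return ensemble

def jamiolkowski_state(kraus, d):
    """rho_M = (I ⊗ M)(|Phi><Phi|) for the CP map with the given Kraus operators."""
    phi = np.eye(d).reshape(d * d) / np.sqrt(d)      # |Phi> = sum_j |j>_A |j>_S / sqrt(d)
    proj = np.outer(phi, phi.conj())
    return sum(np.kron(np.eye(d), M) @ proj @ np.kron(np.eye(d), M).conj().T for M in kraus)

# Example: an unsharp qubit measurement in the computational basis.
d = 2
E0 = np.diag([0.8, 0.2]); E1 = np.eye(2) - E0
kraus_by_outcome = [[np.sqrt(E0)], [np.sqrt(E1)]]    # element-wise sqrt is fine: E0, E1 diagonal
ensemble = normalize_measurement(kraus_by_outcome, d)
choi = [(m_i, jamiolkowski_state(kraus, d)) for m_i, kraus in ensemble]
print([m for m, _ in choi], [np.trace(c).real for _, c in choi])   # weights; unit traces (Eq. (162))
```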

Property 6 (Chaining). The proof of this property and therefore computable. By a similar argument it can assumes monotonicity of D under generalized measure- be seen that for stochastic quantum operations, finding ments and therefore it holds for the EHS distance. Simi- the maximum in Eq. (166) is also a convex optimization larly to the deterministic case [20], it can be shown that problem. D satisfies D (M M , N N ) D (M , N )+ Property 3 (Measurability). Here too, the value of iso iso 2 ◦ 1 2 ◦ 1 ≤ iso 2 2 Diso(M1, N1), provided that N1 is a unital measurement, Dmax can be determined using quantum process tomog- ¯ ¯ i.e., j n1j 1j (I) = I, where (n1j , 1j ) is the en- raphy [41, 42]. semble of normalizedN measurement{ superoperatorsN } cor- Property 4 (Physical interpretation). The phys- P responding to N1. ical meaning of Dmax follows directly from its definition Definition 8 (Fidelity between generalized mea- and the physical meaning of D. surements based on the Jamio lkowski isomor- Property 5 (Stability). The proof goes along phism). Let M and N be two generalized measurements the same lines as the proof for the deterministic case acting on ( S). Then, (Ref. [20])—all one needs to show is that the quantity B H (166) is independent of the dimension of system A, as F (M, N) iso ≡ long as this dimension is greater than or equal to d. This F A MS( Φ Φ AS ), A NS( Φ Φ AS ) . (165) follows from the observation that an input state which I ⊗ | ih | I ⊗ | ih | achieves the maximum in Eq. (166) can have at most d The fidelity satisfies similar properties to those of the Schmidt coefficients, which implies that there is a sub- distance, except for the triangle inequality. space of A with dimension d such that the maximum can be achievedH by maximization inside that subspace. Property 6 (Chaining). The chaining property fol- B. Measures based on worst-case comparison lows from the triangle inequality and the monotonicity of D under generalized measurements, i.e., it holds for the Definition 9 (Distance between generalized EHS distance. measurements based on the worst case). Let Definition 10 (Fidelity between generalized M and N be two generalized measurements acting on measurements based on the worst case). Let ( S), dim( S) = d. Introduce an ancillary system A M and N be two generalized measurements acting on withB H a HilbertH space A, dim( A)= d. Then, ( S), dim( S) = d. Introduce an ancillary system A H H withB H a HilbertH space A, dim( A)= d. Then, D (M, N) H H max ≡ A S A S F (M, N) max D M ( ψ ψ ), N ( ψ ψ ) , (166) min ≡ ψ I ⊗ | ih | I ⊗ | ih | A S A S | i min F M ( ψ ψ ), N ( ψ ψ ) , (167)  ψ I ⊗ | ih | I ⊗ | ih | where maximum is taken over all ψ A S. | i | i ∈ H ⊗ H  The definition is based on a maximization over states where minimum is taken over all ψ A S. | i ∈ H ⊗ H in an extended Hilbert space in order to guarantee sta- The fidelity Fmin satisfies properties analogous to those bility of the distance, as it is known that without this of Dmax with the exception of the triangle inequality. extension even the analogously defined distance between CPTP maps based on the trace distance is not stable [39]. Note that this definition takes maximum over pure-state C. Distance and fidelity between POVMs inputs. As we saw in Sec. 
IV.E, a generalized measure- ment can be defined to act on ensembles of mixed states A very useful concept in quantum information is that so that it most generally transforms ensembles of density of a positive operator-valued measure (POVM)—a set matrices into ensembles of density matrices. However, of positive operators Ei , Ei > 0, which sum up to it is easy to see that one cannot obtain a larger value { } the identity, i Ei = I. A POVM provides the most by maximizing over mixed states or ensembles of mixed general description of a quantum measurement in situa- states. This follows from the joint convexity of D with tions where oneP is not interested in the post-measurement respect to ensembles and from the joint convexity of ∆ state. In terms of the measurement superoperators i, with respect to mixed states. M the POVM elements are given by Ei = j Mij† Mij , i.e., Property 1 (Metric). It follows from the metric there is no unique generalized measurement which corre- properties of D. (The fact that the distance between sponds to a given POVM. Similarly to theP case of gener- different measurements is non-zero follows from the fact alized measurements, we can express a POVM as an en- that for the input state Φ SA, different measurements semble of normalized POVM elements, (mi, E¯i) , where yield different output ensembles.)| i { } mi = Tr(Ei)/d, E¯i = Ei/mi. Notice that the operators Property 2 (Computability). We already pointed ¯ out that the measure D for any particular pair of ensem- ρE¯ Ei/d (168) bles is computable. In Ref. [20] it was argued that in the i ≡ case of deterministic operations, the corresponding opti- are density matrices (Tr(ρE¯i ) = 1), i.e., there is a one-to- mization in Eq. (166) is a convex optimization problem one correspondence between POVMs and ensembles of 24

density matrices (mi,ρE¯i ) which satisfy i miρE¯i = problem is whether measures of distance and fidelity that I/d. Therefore,{ we can compare} POVMs directly us- satisfy the conditions of Theorem 2 exist. P ing the distinguishability measures between ensembles of The second type of measures is based on the notion states. of an extended-Hilbert-space (EHS) representation of an Definition 11 (Distance between POVMs). Let ensemble. We showed that for every ensemble there is a Ei and Gj be two POVMs and let PE (mi,ρE¯ ) class of valid EHS representations and defined the mea- { } { } ≡{ i } (mi = Tr(Ei)/d, ρE¯ = Ei/(mid)) and PG (nj ,ρG¯ ) sures as a minimum (maximum) of the trace distance i ≡{ j } (nj = Tr(Gj )/d, ρG¯j = Gj /(njd)) be the ensembles of (square root fidelity) between all EHS representations of density matrices that correspond to them. Then, the ensembles being compared. These measures, which are monotonic under generalized measurements, can be DPOVM( Ei , Gj ) D(PE, PG). (169) computed as convex optimization problems. We provided { } { } ≡ operational interpretations for the measures and showed Definition 12 (Fidelity between POVMs). Let that the EHS fidelity is an upper bound of the overlap

Ei and Gj be two POVMs and let PE (mi,ρE¯i ) between all possible pure-state inputs that could give rise { } { } ≡{ } (mi = Tr(Ei)/d, ρE¯i = Ei/(mid)) and PG (nj ,ρG¯j ) to the two ensembles being compared under the action of ≡{ } (nj = Tr(Gj )/d, ρG¯j = Gj /(njd)) be the ensembles of a stochastic quantum operation. We also used the EHS density matrices that correspond to them. Then, fidelity between ensembles to provide a novel interpreta- tion of the square root fidelity between density matrices. FPOVM( Ei , Gj ) F (PE , PG). (170) We showed that the square root fidelity is equal to the { } { } ≡ minimum fidelity between all possible pure-state ensem- The properties of these measures can be obtained in bles from which the density matrices being compared can a straightforward manner from the properties of the dis- be obtained. tance and fidelity between states. We only remark that An interesting question is whether any of the measures the ensemble of states PE = (mi,ρ ¯ ) that corresponds { Ei } between ensembles that we introduced can be used to to a given POVM Ei has the following operational { } define a Riemannian metric on the space of ensembles, meaning—it is the ensemble of states of system A that which endows the space with geometrical notions such as we obtain from the maximally entangled state Φ AS if | i volume or geodesics. Clearly, the measures based on the we perform the destructive POVM Ei on subsystem S, { } trace distance would not induce a Riemannian metric be- AS A A S AS cause the trace distance is known not to be Riemannian Φ Φ ρ ¯ = TrS(I E Φ Φ )/mi, (171) | ih | → Ei ⊗ i | ih | [40]. The Kantorovich fidelity is not a good candidate A S AS S with probability mi = Tr(I Ei Φ Φ ) = Tr(Ei ). either because in one of the classical limits it reduces to ⊗ | ih | a function of the Kolmogorov distance. However, we can As quantum detector tomography is now within the define an EHS distance which is a generalization of the reach of experimental technology [26], it becomes relevant Bures distance between density matrices, BEHS(P,Q) = to ask how much a real quantum detector differs from an 1 F EHS(P,Q), or an EHS angle which is a generaliza- ideal one. The distance and fidelity between POVMs in- tion− of the Bures angle, AEHS(P,Q) = arccos F EHS(P,Q). troduced in this section provide rigorous means of quan- pIt is known that the Bures distance and angle induce a tifying such difference. Riemannian metric, and it would be interesting to see if their EHS generalizations induce such a metric on the space of ensembles. This problem is left open for future VIII. CONCLUSION investigation. Finally, based on the measures between ensembles, In this paper we defined measures of distance and fi- we defined two types of distinguishability measures be- delity between probabilistic ensembles of quantum states tween generalized measurements. The first one is based and used them to define measures of distance and fidelity on the Jamiolkowski isomorphism and the second one between stochastic quantum operations. We proposed on the worst-case comparison. These measures are gen- two types of measures between ensembles. eralizations of the distance and fidelity between CPTP The first one is based on the ability of one ensemble to maps proposed in Ref. [20] and similarly to them sat- mimic another and leads to measures of a Kantorovich isfy the desiderata outlined in Ref. [20]. 
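As an illustration of Eq. (168) and Definition 11 (our sketch, not the authors' code; it reuses the hypothetical kantorovich_distance helper from the earlier linear-programming sketch as the ensemble measure D), a POVM can be mapped to its ensemble of density matrices and two POVMs can then be compared directly.

```python
# Sketch (not from the paper): Eq. (168) and Definition 11 in practice.
import numpy as np

def povm_to_ensemble(povm, d):
    """Map {E_i} to {(m_i, rho_i)} with m_i = Tr(E_i)/d and rho_i = E_i / Tr(E_i)."""
    weights = [np.trace(E).real / d for E in povm]
    states = [E / np.trace(E).real for E in povm]
    return weights, states

# Example: an ideal projective qubit measurement vs. a noisy version of it.
d = 2
ideal = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
noisy = [np.diag([0.9, 0.1]), np.diag([0.1, 0.9])]
w1, s1 = povm_to_ensemble(ideal, d)
w2, s2 = povm_to_ensemble(noisy, d)
# D_POVM of Definition 11, instantiated with the Kantorovich distance as the ensemble measure:
print(kantorovich_distance(w1, s1, w2, s2))
```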
One of the de- type, which appear in the context of optimal transporta- sired properties—the chaining property—is satisfied only tion and can be computed as linear programs. However, by the measures based on the EHS distance and fidelity when based on the trace distance or the square root fi- since this property requires monotonicity under gener- delity, these measures are not monotonic under general- alized measurements of the corresponding measures be- ized measurements. We derived necessary and sufficient tween ensembles of states. In addition to generalized conditions that the basic measures of distance and fidelity measurements, we also defined distinguishability mea- between states have to satisfy in order for the correspond- sures between POVMs. The proposed measures may find ing Kantorovich distance and fidelity to be monotonic various applications as they provide a rigorous general under measurements (Theorem 2). An interesting open tool for assessing the performance of non-destructive and 25 destructive measurement schemes. On the other hand, we have

Appendix A: CONTINUITY OF THE AVERAGE OF A CONTINUOUS FUNCTION WITH RESPECT TO THE KANTOROVICH DISTANCE

Let h(ρ) be a bounded function which is continuous with respect to the distance ∆, i.e., for every δ > 0, there exists ε > 0, such that for all ρ and σ for which

∆(ρ, σ) ≤ ε,   (A1)

we have

|h(ρ) − h(σ)| ≤ δ/2.   (A2)

(The factor of 1/2 in front of δ is chosen for convenience.) Let h_P denote the average of the function h(ρ) over the ensemble P(ρ), ρ ∈ Ω,

h_P = Σ_{ρ∈Ω} P(ρ) h(ρ).   (A3)

We will prove that for every δ > 0, there exists ε′ > 0, such that for all P, Q ∈ P_Ω for which

D^K(P, Q) ≤ ε′,   (A4)

we have

|h_P − h_Q| ≤ δ.   (A5)

Assume that D^K(P, Q) ≤ ε′. Let Π(ρ, σ) be a joint distribution for which the minimum in the definition (12) of D^K(P, Q) is achieved, i.e.,

D^K(P, Q) = Σ_{ρ,σ∈Ω} Π(ρ, σ) ∆(ρ, σ) ≤ ε′.   (A6)

Define the sets Ω_{>ε} and Ω_{≤ε} as the sets of all pairs of states (ρ, σ) for which ∆(ρ, σ) > ε and ∆(ρ, σ) ≤ ε, respectively. The sum in Eq. (A6) can then be split in two sums,

Σ_{Ω>ε} Π(ρ, σ) ∆(ρ, σ) + Σ_{Ω≤ε} Π(ρ, σ) ∆(ρ, σ) ≤ ε′.   (A7)

The first sum obviously can be bounded as follows,

Σ_{Ω>ε} Π(ρ, σ) ε ≤ Σ_{Ω>ε} Π(ρ, σ) ∆(ρ, σ) ≤ ε′,   (A8)

which implies that

Σ_{Ω>ε} Π(ρ, σ) ≤ ε′/ε.   (A9)

On the other hand, we have

|h_P − h_Q| = | Σ_{ρ∈Ω} P(ρ) h(ρ) − Σ_{σ∈Ω} Q(σ) h(σ) |
            = | Σ_{ρ,σ∈Ω} Π(ρ, σ) h(ρ) − Σ_{ρ,σ∈Ω} Π(ρ, σ) h(σ) |
            ≤ Σ_{ρ,σ∈Ω} Π(ρ, σ) |h(ρ) − h(σ)|
            = Σ_{Ω>ε} Π(ρ, σ) |h(ρ) − h(σ)| + Σ_{Ω≤ε} Π(ρ, σ) |h(ρ) − h(σ)|.   (A10)

Since h(ρ) is bounded, there exists a constant h_max > 0 such that |h(ρ) − h(σ)| ≤ h_max for all ρ and σ. Using this fact, together with Eq. (A9) and the assumption that for all (ρ, σ) ∈ Ω_{≤ε}, |h(ρ) − h(σ)| ≤ δ/2, we can upper bound the last line in Eq. (A10) as follows:

Σ_{Ω>ε} Π(ρ, σ) |h(ρ) − h(σ)| + Σ_{Ω≤ε} Π(ρ, σ) |h(ρ) − h(σ)| ≤ (ε′/ε) h_max + Σ_{Ω≤ε} Π(ρ, σ) δ/2 ≤ (ε′/ε) h_max + δ/2.   (A11)

Therefore, we see that by choosing

ε′ ≤ δε / (2 h_max),   (A12)

we obtain

|h_P − h_Q| ≤ δ.   (A13)

Since δ was arbitrarily chosen, the property follows.

Appendix B: NON-MONOTONICITY UNDER GENERALIZED MEASUREMENTS OF THE KANTOROVICH MEASURES

To show that the Kantorovich distance is not monotonic under measurements, let us look at a particular example. Consider the case of two singleton ensembles consisting of the states Σ_i p_i ρ_i ⊗ |i⟩⟨i| and Σ_i q_i σ_i ⊗ |i⟩⟨i|, respectively, where the states {|i⟩} are an orthonormal set, ⟨i|j⟩ = δ_ij. Imagine that we apply a nondestructive projective measurement on the second subsystem in the basis {|i⟩}. This measurement yields the ensembles {(p_i, ρ_i ⊗ |i⟩⟨i|)} and {(q_i, σ_i ⊗ |i⟩⟨i|)}, which we will denote by p and q for short. Observe that the Kantorovich distance between the resulting ensembles, as defined in Eq. (12), is equal to

D^K(p, q) = (1/2) Σ_i [ min(p_i, q_i) ||ρ_i − σ_i|| + |p_i − q_i| ].   (B1)

This follows from the observation that for any joint probability distribution Π(ρ_i ⊗ |i⟩⟨i|, σ_j ⊗ |j⟩⟨j|), the quantity in Eq. (11) reads

D_Π(p, q) = (1/2) Σ_i Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) ||ρ_i − σ_i|| + Σ_{i≠j} Π(ρ_i ⊗ |i⟩⟨i|, σ_j ⊗ |j⟩⟨j|)   (B2)

because ⟨i|j⟩ = δ_ij. Since Σ_i Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) + Σ_{i≠j} Π(ρ_i ⊗ |i⟩⟨i|, σ_j ⊗ |j⟩⟨j|) = 1, and (1/2)||ρ_i − σ_i|| ≤ 1, if each of the terms Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) is equal to its maximal possible value consistent with the marginal conditions, then the value of D_Π(p, q) would be minimal and it would be equal to the Kantorovich distance D^K(p, q). The maximum possible value of Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) consistent with the marginal probability distributions is min(p_i, q_i) because if, say, min(p_i, q_i) = p_i and Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) > p_i, then Σ_j Π(ρ_i ⊗ |i⟩⟨i|, σ_j ⊗ |j⟩⟨j|) would be strictly larger than p_i, while by definition it has to be equal to p_i. Each of these values is achievable because there exist joint probability distributions Π(ρ_i ⊗ |i⟩⟨i|, σ_j ⊗ |j⟩⟨j|) with the correct marginals that satisfy

Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) = min(p_i, q_i),  ∀i.   (B3)

The latter can be seen from the fact that Π(ρ_i ⊗ |i⟩⟨i|, σ_j ⊗ |j⟩⟨j|) describes a transportation plan which tells us what probability weights taken from p and q come together as we transport one distribution on top of the other. The condition Π(ρ_i ⊗ |i⟩⟨i|, σ_i ⊗ |i⟩⟨i|) = min(p_i, q_i) simply specifies how to pair certain parts of the two distributions, each having a total weight of Σ_i min(p_i, q_i). Since the remaining parts of the two distributions have equal weights, 1 − Σ_i min(p_i, q_i), there certainly exists a transportation plan according to which one can be mapped on top of the other. Therefore, the Kantorovich distance between p and q is given by Eq. (B1).

However, the Kantorovich distance between the original singleton ensembles is equal to the trace distance between the two states,

(1/2) || Σ_i p_i ρ_i ⊗ |i⟩⟨i| − Σ_j q_j σ_j ⊗ |j⟩⟨j| || = (1/2) Σ_i || p_i ρ_i − q_i σ_i ||.   (B4)

Assume that for a given i, min(p_i, q_i) = p_i. We can write

|| p_i ρ_i − q_i σ_i || = p_i || ρ_i − (q_i/p_i) σ_i ||.   (B5)

But from the triangle inequality we have

|| ρ_i − (q_i/p_i) σ_i || ≤ || ρ_i − σ_i || + || σ_i − (q_i/p_i) σ_i || = || ρ_i − σ_i || + (q_i/p_i − 1),   (B6)

i.e.,

|| p_i ρ_i − q_i σ_i || ≤ p_i ( || ρ_i − σ_i || + (q_i/p_i − 1) ) = p_i || ρ_i − σ_i || + (q_i − p_i) = min(p_i, q_i) || ρ_i − σ_i || + | p_i − q_i |.   (B7)

Since we arbitrarily assumed which of the two values p_i and q_i is the smaller one, the inequality (B7) must hold for every i. Comparing Eq. (B1) and Eq. (B4), we see that

(1/2) || Σ_i p_i ρ_i ⊗ |i⟩⟨i| − Σ_j q_j σ_j ⊗ |j⟩⟨j| || ≤ D^K(p, q).   (B8)

For most choices of ρ_i and σ_i, the inequality (B8) is strict since the triangle inequality used in Eq. (B6) is generally strict. Thus we see that the Kantorovich distance is not monotonically decreasing under measurements. Obviously, it is not monotonically increasing either because it decreases under CPTP maps (Property 6, Sec. IV.B).

For the Kantorovich fidelity, we already observed that its values in the two classical limits are not the same: the fidelity between the two singleton ensembles consisting of the states ρ = Σ_i p_i |i⟩⟨i| and σ = Σ_i q_i |i⟩⟨i|, where {|i⟩} is an orthonormal set, is equal to

F^K(P, Q) = F(ρ, σ) = Σ_i √(p_i q_i),   (B9)

whereas the fidelity between the ensembles {(p_i, |i⟩⟨i|)} and {(q_i, |i⟩⟨i|)} is equal to Σ_i min(p_i, q_i), which is strictly smaller than F(ρ, σ) unless p_i = q_i, ∀i. The latter pair of ensembles are exactly the ensembles that result from a measurement in the {|i⟩} basis applied to the states ρ and σ. Therefore, the Kantorovich fidelity can decrease under measurements. Clearly, it is not always decreasing because it increases under CPTP maps (Property 4, Sec. IV.C).

We can now see that the difference of the values of the Kantorovich fidelity in the two 'classical' limits discussed earlier can be linked to its lack of monotonicity under measurements. Obviously, through a projective measurement and averaging, we can go back and forth between these two limits. Since the Kantorovich fidelity is monotonic under averaging, if it were also monotonic under measurements, it would have to remain invariant under these operations since they are reversible. By the same token, any measure of distinguishability between ensembles which is monotonic both under measurements and averaging of the ensembles would have to have the same values in the two classical limits. As we saw for the case of the Kantorovich distance, however, the latter property by itself is not a guarantee for monotonicity.
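As a concrete numerical check of the argument above, the following sketch (our own; the qubit states and weights are arbitrary) evaluates the left-hand side of Eq. (B8) through Eq. (B4) and the right-hand side through the closed form of Eq. (B1), exhibiting a case in which the projective measurement strictly increases the trace-distance-based Kantorovich distance.

```python
# Minimal sketch (ours) of the Appendix B example: the Kantorovich distance
# based on the trace distance can increase under the measurement in the {|i>} basis.
import numpy as np

def trace_norm(X):
    return np.sum(np.abs(np.linalg.eigvalsh(X)))

# Weights and qubit states of the two original singleton ensembles,
# sum_i p_i rho_i (x) |i><i|  and  sum_i q_i sigma_i (x) |i><i|.
p = np.array([0.5, 0.5])
q = np.array([0.7, 0.3])
rho = [np.array([[1.0, 0.0], [0.0, 0.0]]), 0.5 * np.ones((2, 2))]                 # |0><0|, |+><+|
sigma = [np.array([[0.0, 0.0], [0.0, 1.0]]), np.array([[1.0, 0.0], [0.0, 0.0]])]  # |1><1|, |0><0|

# Before the measurement: trace distance between the block-diagonal states, Eq. (B4).
d_before = 0.5 * sum(trace_norm(p[i] * rho[i] - q[i] * sigma[i]) for i in range(2))

# After the measurement: closed form of Eq. (B1).
d_after = 0.5 * sum(min(p[i], q[i]) * trace_norm(rho[i] - sigma[i]) + abs(p[i] - q[i])
                    for i in range(2))

print(d_before, d_after)   # d_before <= d_after, strictly for generic choices, cf. Eq. (B8)
```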

Appendix C: PROOF OF THEOREM 2

From the proof of Property 7 in Sec. IV.B, it can be seen that if the distance (fidelity) between states is jointly convex (concave), the corresponding Kantorovich measure would be monotonic under averaging of the ensembles. The necessity of the conditions in Theorem 2 follows from the observation that if we apply a measurement on the second subsystem in the basis {|i⟩}, we obtain the ensembles {(p_i, ρ_i ⊗ |i⟩⟨i|)} and {(q_i, σ_i ⊗ |i⟩⟨i|)}, and if we follow the measurement by an averaging of the ensembles, we obtain the original states. If the Kantorovich measures are monotonic both under measurements and averaging, they must be invariant during the process. By an argument analogous to the one following Eq. (B1), it can be seen that a Kantorovich distance D^K_d between ensembles of the form {(p_i, ρ_i ⊗ |i⟩⟨i|)} and {(q_i, σ_i ⊗ |i⟩⟨i|)} is equal to (1/2) Σ_i [ min(p_i, q_i) d(ρ_i, σ_i) + |p_i − q_i| ]. Similarly, a Kantorovich fidelity F^K_f between ensembles of the form {(p_i, ρ_i ⊗ |i⟩⟨i|)} and {(q_i, σ_i ⊗ |i⟩⟨i|)} is equal to Σ_i min(p_i, q_i) f(ρ_i, σ_i).

To prove the sufficiency of condition (71), consider two ensembles of states P(ρ) and Q(σ). Let Π(ρ, σ) be a joint probability distribution that attains the minimum in Eq. (12), i.e.,

D^K_d(P, Q) = Σ_{ρ,σ∈Ω} Π(ρ, σ) d(ρ, σ).   (C1)

According to condition (71),

D^K_d(P, Q) = d( Σ_{ρ,σ∈Ω} Π(ρ, σ) ρ ⊗ |ρσ⟩⟨ρσ|, Σ_{ρ,σ∈Ω} Π(ρ, σ) σ ⊗ |ρσ⟩⟨ρσ| ),   (C2)

where {|ρσ⟩} is a set of orthonormal states, ⟨ρσ|ρ′σ′⟩ = δ_{ρρ′} δ_{σσ′}. Let {M_i}, M_i(ρ) = Σ_j M_ij ρ M_ij†, be a set of completely positive maps that form a generalized measurement, Σ_{i,j} M_ij† M_ij = I. Consider the following CPTP map:

M(ρ) = Σ_i M_i(ρ) ⊗ |i⟩⟨i|,   (C3)

where {|i⟩} is an orthonormal set of states in the Hilbert space of some additional system (this map is not dimension-preserving). From the monotonicity of d(ρ, σ) under CPTP maps and property (71), it follows that

D^K_d(P, Q) = d( Σ_{ρ,σ∈Ω} Π(ρ, σ) ρ ⊗ |ρσ⟩⟨ρσ|, Σ_{ρ,σ∈Ω} Π(ρ, σ) σ ⊗ |ρσ⟩⟨ρσ| )
            ≥ d( Σ_{ρ,σ∈Ω} Σ_i Π(ρ, σ) M_i(ρ) ⊗ |ρσ⟩⟨ρσ| ⊗ |i⟩⟨i|, Σ_{ρ,σ∈Ω} Σ_i Π(ρ, σ) M_i(σ) ⊗ |ρσ⟩⟨ρσ| ⊗ |i⟩⟨i| )
            = Σ_{ρ,σ∈Ω} Σ_i min( Π(ρ, σ) p_i(ρ), Π(ρ, σ) p_i(σ) ) d(ρ_i, σ_i) + (1/2) Σ_{ρ,σ∈Ω} Σ_i | Π(ρ, σ) p_i(ρ) − Π(ρ, σ) p_i(σ) |,   (C4)

where p_i(ρ) = Tr(M_i(ρ)), ρ_i = M_i(ρ)/p_i(ρ). Now observe that there exists a joint probability distribution Π̆(ρ_i, σ_j) that satisfies

Π̆(ρ_i, σ_i) = min( Π(ρ, σ) p_i(ρ), Π(ρ, σ) p_i(σ) )   (C5)

and has marginals

Σ_{σ∈Ω} Σ_j Π̆(ρ_i, σ_j) = P(ρ) p_i(ρ),   (C6)

Σ_{ρ∈Ω} Σ_i Π̆(ρ_i, σ_j) = Q(σ) p_j(σ).   (C7)

This is because condition (C5) is compatible with the marginal conditions (C6) and (C7), which follows from an argument analogous to the one in the paragraph after Eq. (B3). For this distribution, we can write

Σ_{ρ,σ∈Ω} Σ_{i,j} Π̆(ρ_i, σ_j) d(ρ_i, σ_j) = Σ_{ρ,σ∈Ω} Σ_i min( Π(ρ, σ) p_i(ρ), Π(ρ, σ) p_i(σ) ) d(ρ_i, σ_i) + Σ_{ρ,σ∈Ω} Σ_{i≠j} Π̆(ρ_i, σ_j) d(ρ_i, σ_j).   (C8)

But we have that

d(ρ_i, σ_j) ≤ 1   (C9)

and

Σ_{ρ,σ∈Ω} Σ_{i≠j} Π̆(ρ_i, σ_j) = 1 − Σ_{ρ,σ∈Ω} Σ_i min( Π(ρ, σ) p_i(ρ), Π(ρ, σ) p_i(σ) ),   (C10)

from which we obtain that the second sum on the right-hand side of Eq. (C8) satisfies

Σ_{ρ,σ∈Ω} Σ_{i≠j} Π̆(ρ_i, σ_j) d(ρ_i, σ_j) ≤ 1 − Σ_{ρ,σ∈Ω} Σ_i min( Π(ρ, σ) p_i(ρ), Π(ρ, σ) p_i(σ) ) = (1/2) Σ_{ρ,σ∈Ω} Σ_i | Π(ρ, σ) p_i(ρ) − Π(ρ, σ) p_i(σ) |.   (C11)

Combining Eqs. (C8) and (C11), we see that the expression on the right-hand side of the last equality in Eq. (C4) is greater than or equal to

Σ_{ρ,σ∈Ω} Σ_{i,j} Π̆(ρ_i, σ_j) d(ρ_i, σ_j) ≡ D_{d,Π̆}(M(P), M(Q)).   (C12)

But notice that the quantity (C12) is greater than or equal to D^K_d(M(P), M(Q)), where M: P_Ω → P_{Ω_M} is the map on the original probability distributions induced by the measurement M with measurement superoperators {M_i}. This is because Π̆(ρ_i, σ_j) is a joint probability distribution with marginals P(ρ)p_i(ρ) and Q(σ)p_j(σ), which are consistent with the distributions M(P) and M(Q) over Ω_M, and therefore the quantity in Eq. (C12) is among those quantities over which the minimum in the definition of D^K_d(M(P), M(Q)) is taken. Therefore, we have shown that for an arbitrary generalized measurement,

D^K_d(P, Q) ≥ D^K_d(M(P), M(Q)).   (C13)

This completes the proof of the sufficiency of Eq. (71). The proof of the sufficiency of Eq. (72) follows in a similar manner, and we do not present it here.
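For readers who prefer a concrete instance of the flagged channel in Eq. (C3), here is a minimal sketch (our own; the Kraus operators are arbitrary, with a single Kraus operator per outcome) of M(ρ) = Σ_i M_i(ρ) ⊗ |i⟩⟨i| for a two-outcome unsharp qubit measurement, together with the outcome probabilities p_i(ρ) = Tr(M_i(ρ)) that enter Eq. (C4).

```python
# Minimal sketch (ours) of the flagged-measurement channel of Eq. (C3):
# M(rho) = sum_i M_i(rho) (x) |i><i|, here with M_i(rho) = K_i rho K_i^dagger.
import numpy as np

K = [np.diag([np.sqrt(0.9), np.sqrt(0.2)]),   # K_0 (arbitrary choice)
     np.diag([np.sqrt(0.1), np.sqrt(0.8)])]   # K_1
assert np.allclose(sum(k.conj().T @ k for k in K), np.eye(2))  # sum_i K_i^dag K_i = I

def flagged_channel(rho):
    """Return M(rho) on the joint (system x pointer) space."""
    out = np.zeros((4, 4), dtype=complex)
    for i, k in enumerate(K):
        flag = np.zeros((2, 2)); flag[i, i] = 1.0
        out += np.kron(k @ rho @ k.conj().T, flag)   # M_i(rho) (x) |i><i|
    return out

rho = 0.5 * np.ones((2, 2))                          # |+><+|
out = flagged_channel(rho)
print(np.isclose(np.trace(out), 1.0))                # the map is trace preserving
p_i = [np.trace(k @ rho @ k.conj().T).real for k in K]
print(p_i)                                           # outcome probabilities p_i(rho)
```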

Appendix D: TRIANGLE INEQUALITY FOR THE EHS DISTANCE

Let

D^EHS(P, Q) = ∆( Σ_{ρ,σ∈Ω} P(ρ, σ) ρ ⊗ [ρσ], Σ_{ρ,σ∈Ω} Q(ρ, σ) σ ⊗ [ρσ] )   (D1)

and

D^EHS(Q, R) = ∆( Σ_{κ,σ∈Ω} Q′(κ, σ) σ ⊗ [κσ], Σ_{κ,σ∈Ω} R′(κ, σ) κ ⊗ [κσ] ).   (D2)

Here, the joint probability distributions P(ρ, σ), Q(ρ, σ), Q′(ρ, σ), R′(ρ, σ) are such that the maxima for D^EHS(P, Q) and D^EHS(Q, R) in Eq. (82) are achieved. (The left marginals of P(ρ, σ) and R′(ρ, σ) are P(ρ) and R(ρ), respectively, and the right marginals of Q(ρ, σ) and Q′(ρ, σ) are equal to Q(σ).)

Note that Q(ρ, σ) and Q′(ρ, σ) are generally different, and we cannot directly use the triangle inequality of ∆ to prove Eq. (95). This is why we will construct two CPTP maps, M and M′, which map the states Σ_{ρ,σ∈Ω} Q(ρ, σ) σ ⊗ [ρσ] and Σ_{κ,σ∈Ω} Q′(κ, σ) σ ⊗ [κσ], respectively, to the same state, while at the same time transforming the states Σ_{ρ,σ∈Ω} P(ρ, σ) ρ ⊗ [ρσ] and Σ_{κ,σ∈Ω} R′(κ, σ) κ ⊗ [κσ], respectively, to valid EHS representations of the ensembles P(ρ) and R(ρ). Then, using the monotonicity of ∆ under CPTP maps, it will follow that

D^EHS(P, Q) + D^EHS(Q, R) ≥ ∆( M(Σ_{ρ,σ∈Ω} P(ρ, σ) ρ ⊗ [ρσ]), M(Σ_{ρ,σ∈Ω} Q(ρ, σ) σ ⊗ [ρσ]) ) + ∆( M′(Σ_{κ,σ∈Ω} Q′(κ, σ) σ ⊗ [κσ]), M′(Σ_{κ,σ∈Ω} R′(κ, σ) κ ⊗ [κσ]) )
≥ ∆( M(Σ_{ρ,σ∈Ω} P(ρ, σ) ρ ⊗ [ρσ]), M′(Σ_{κ,σ∈Ω} R′(κ, σ) κ ⊗ [κσ]) ) = ∆(ρ̂, κ̂) ≥ D^EHS(P, R),   (D3)

where ρ̂ and κ̂ are EHS representations of P(ρ) and R(ρ). What remains to be shown is that maps M and M′ with the above properties exist.

The maps that we propose act on the pointer space as follows:

M([ρσ]) = Σ_κ T_σ(κ|ρ) [κρσ],   (D4)

M′([κσ]) = Σ_ρ T′_σ(ρ|κ) [κρσ],   (D5)

where for every σ, T_σ(κ|ρ) and T′_σ(ρ|κ) describe transition probabilities from ρ to κ and from κ to ρ, respectively, such that

T_σ(κ|ρ) Q(ρ, σ) = T′_σ(ρ|κ) Q′(κ, σ) ≡ J_σ(κ, ρ).   (D6)

The fact that such transition probabilities exist follows from the fact that for every σ, Σ_ρ Q(ρ, σ) = Σ_κ Q′(κ, σ) = Q(σ), i.e., for every fixed σ, Q(ρ, σ) and Q′(κ, σ) describe (unnormalized) distributions of ρ and κ that have the same weight and therefore can be mapped one on top of the other via stochastic matrices that map ρ to κ or κ to ρ.

By construction, we have

M( Σ_{ρ,σ∈Ω} Q(ρ, σ) σ ⊗ [ρσ] ) = M′( Σ_{κ,σ∈Ω} Q′(κ, σ) σ ⊗ [κσ] ) = Σ_{κ,ρ,σ} J_σ(κ, ρ) σ ⊗ [κρσ].   (D7)

Let us now verify that M and M′ applied to Σ_{ρ,σ∈Ω} P(ρ, σ) ρ ⊗ [ρσ] and Σ_{κ,σ∈Ω} R′(κ, σ) κ ⊗ [κσ], respectively, give rise to valid EHS representations of P and R. From the definition of the maps (D4) and (D5), one immediately obtains

M( Σ_{ρ,σ∈Ω} P(ρ, σ) ρ ⊗ [ρσ] ) = Σ_{κ,ρ,σ} T_σ(κ|ρ) P(ρ, σ) ρ ⊗ [κρσ]   (D8)

and

M′( Σ_{κ,σ∈Ω} R′(κ, σ) κ ⊗ [κσ] ) = Σ_{κ,ρ,σ} T′_σ(ρ|κ) R′(κ, σ) κ ⊗ [κρσ].   (D9)

The fact that these are EHS representations of the ensembles P and R follows from two observations. The first one is that from the pointer [κρσ] one can unambiguously determine the state ρ or κ in the ensemble P or R. The second one is that the joint probability distributions T_σ(κ|ρ)P(ρ, σ) and T′_σ(ρ|κ)R′(κ, σ) have the correct marginals,

Σ_{κ,σ} T_σ(κ|ρ) P(ρ, σ) = Σ_σ ( Σ_κ T_σ(κ|ρ) ) P(ρ, σ) = Σ_σ P(ρ, σ) = P(ρ),   (D10)

Σ_{ρ,σ} T′_σ(ρ|κ) R′(κ, σ) = Σ_σ ( Σ_ρ T′_σ(ρ|κ) ) R′(κ, σ) = Σ_σ R′(κ, σ) = R(κ).   (D11)

This completes the proof.
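The existence argument below Eq. (D6) is constructive: for each fixed σ, any transport plan between the equal-weight distributions Q(ρ, σ) and Q′(κ, σ) yields valid transition probabilities. The sketch below (our own; the weights are arbitrary and the northwest-corner plan is just one convenient choice) builds such a coupling J_σ(κ, ρ) and checks Eq. (D6) together with the stochasticity of T_σ(κ|ρ) and T′_σ(ρ|κ).

```python
# Minimal sketch (ours) of the construction below Eq. (D6).
import numpy as np

def coupling(q_rho, q_kappa):
    """Greedy (northwest-corner) transport plan between two equal-weight vectors."""
    q_rho, q_kappa = q_rho.astype(float).copy(), q_kappa.astype(float).copy()
    J = np.zeros((len(q_rho), len(q_kappa)))
    i = j = 0
    while i < len(q_rho) and j < len(q_kappa):
        m = min(q_rho[i], q_kappa[j])
        J[i, j] = m
        q_rho[i] -= m
        q_kappa[j] -= m
        if q_rho[i] <= 1e-12:
            i += 1
        else:
            j += 1
    return J

q_rho = np.array([0.3, 0.1, 0.2])    # Q(rho, sigma) for a fixed sigma
q_kappa = np.array([0.25, 0.35])     # Q'(kappa, sigma) for the same sigma (equal total 0.6)
J = coupling(q_rho, q_kappa)
T = J / q_rho[:, None]               # T_sigma(kappa|rho): rows sum to 1
T_prime = J / q_kappa[None, :]       # T'_sigma(rho|kappa): columns sum to 1
print(np.allclose(T.sum(axis=1), 1.0), np.allclose(T_prime.sum(axis=0), 1.0))
print(np.allclose(T * q_rho[:, None], T_prime * q_kappa[None, :]))   # Eq. (D6)
```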

Appendix E: CONTINUITY OF THE AVERAGE OF A CONTINUOUS FUNCTION WITH RESPECT TO THE EHS DISTANCE

Let h(ρ) be a bounded function which is continuous with respect to the distance ∆, i.e., for every δ > 0, there exists ε > 0, such that for all ρ and σ for which

∆(ρ, σ) ≤ ε,   (E1)

we have

|h(ρ) − h(σ)| ≤ δ/2.   (E2)

Let h_P denote the average of the function h(ρ) over the ensemble P(ρ), ρ ∈ Ω,

h_P = Σ_{ρ∈Ω} P(ρ) h(ρ).   (E3)

We will show that for every δ > 0, there exists ε′ > 0, such that for all P, Q ∈ P_Ω for which

D^EHS(P, Q) ≤ ε′,   (E4)

we have

|h_P − h_Q| ≤ δ.   (E5)

Assume that D^EHS(P, Q) ≤ ε′. Let P(ρ, σ) and Q(ρ, σ) be two joint distributions for which the minimum in Eq. (87) is attained. We then have

D^EHS(P, Q) = (1/2) Σ_{ρ,σ∈Ω} || P(ρ, σ) ρ − Q(ρ, σ) σ || ≤ ε′.   (E6)

Define the sets Ω_{>ε} and Ω_{≤ε} as the sets of all pairs of states (ρ, σ) for which ∆(ρ, σ) > ε and ∆(ρ, σ) ≤ ε, respectively. The sum in Eq. (E6) can then be split in two sums,

(1/2) Σ_{Ω>ε} || P(ρ, σ) ρ − Q(ρ, σ) σ || + (1/2) Σ_{Ω≤ε} || P(ρ, σ) ρ − Q(ρ, σ) σ || ≤ ε′.   (E7)

The first sum obviously can be bounded from above as

(1/2) Σ_{Ω>ε} || P(ρ, σ) ρ − Q(ρ, σ) σ || ≤ ε′.   (E8)

Notice also that since the trace distance is monotonic under tracing, we have

(1/2) Σ_{ρ,σ∈Ω} | P(ρ, σ) − Q(ρ, σ) | ≤ (1/2) Σ_{ρ,σ∈Ω} || P(ρ, σ) ρ − Q(ρ, σ) σ || ≤ ε′.   (E9)

Therefore,

(1/2) Σ_{Ω>ε} | P(ρ, σ) − Q(ρ, σ) | ≤ ε′,   (E10)

and

(1/2) Σ_{Ω≤ε} | P(ρ, σ) − Q(ρ, σ) | ≤ ε′.   (E11)

On the other hand, we have

Σ_{Ω>ε} P(ρ, σ) ε ≤ Σ_{Ω>ε} (1/2) P(ρ, σ) || ρ − σ || ≤ Σ_{Ω>ε} (1/2) || P(ρ, σ) ρ − Q(ρ, σ) σ || + (1/2) Σ_{Ω>ε} | Q(ρ, σ) − P(ρ, σ) | ≤ ε′ + ε′ = 2ε′,   (E12)

where the second inequality follows from the triangle inequality for the trace distance and the third inequality follows from Eqs. (E8) and (E10). This implies

Σ_{Ω>ε} P(ρ, σ) ≤ 2ε′/ε.   (E13)

Let us now look at the difference between the average functions over the two ensembles:

|h_P − h_Q| = | Σ_{ρ∈Ω} P(ρ) h(ρ) − Σ_{σ∈Ω} Q(σ) h(σ) |
            = | Σ_{ρ,σ∈Ω} P(ρ, σ) h(ρ) − Σ_{ρ,σ∈Ω} Q(ρ, σ) h(σ) |
            ≤ Σ_{ρ,σ∈Ω} | P(ρ, σ) h(ρ) − Q(ρ, σ) h(σ) |
            ≤ Σ_{ρ,σ∈Ω} ( P(ρ, σ) |h(ρ) − h(σ)| + |Q(ρ, σ) − P(ρ, σ)| |h(σ)| )
            = Σ_{Ω>ε} P(ρ, σ) |h(ρ) − h(σ)| + Σ_{Ω≤ε} P(ρ, σ) |h(ρ) − h(σ)| + Σ_{ρ,σ∈Ω} |Q(ρ, σ) − P(ρ, σ)| |h(σ)|.   (E14)

Since h(ρ) is bounded, there exists a constant h_max > 0 such that |h(ρ) − h(σ)| ≤ h_max and |h(ρ)| ≤ h_max for all ρ and σ. Using this fact, together with Eqs. (E13) and (E9) and the assumption that for all (ρ, σ) ∈ Ω_{≤ε}, |h(ρ) − h(σ)| ≤ δ/2, we can upper bound the last line in Eq. (E14) as follows:

Σ_{Ω>ε} P(ρ, σ) |h(ρ) − h(σ)| + Σ_{Ω≤ε} P(ρ, σ) |h(ρ) − h(σ)| + Σ_{ρ,σ∈Ω} |Q(ρ, σ) − P(ρ, σ)| |h(σ)| ≤ (2ε′/ε) h_max + Σ_{Ω≤ε} P(ρ, σ) δ/2 + 2ε′ h_max ≤ (2ε′/ε) h_max + δ/2 + 2ε′ h_max.   (E15)

Therefore, we see that by choosing

ε′ ≤ δε / ( 4 h_max (1 + ε) ),   (E16)

we obtain

|h_P − h_Q| ≤ δ.   (E17)

Since δ was arbitrarily chosen, the property follows.

ACKNOWLEDGMENTS

The authors thank Emili Bagan, Ramón Muñoz-Tapia, Oriol Romero-Isart, Igor Devetak, and Nathan K. Langford for helpful discussions. This work was supported by the Spanish MICINN through the Ramón y Cajal program (JC), contract FIS2008-01236/FIS, and project QOIT (CONSOLIDER2006-00019), and by the Generalitat de Catalunya through CIRIT 2005SGR-00994.

[1] A. Uhlmann, Rep. Math. Phys. 9, 273 (1976).
[2] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, 2000).
[3] D. Bures, Trans. Am. Math. Soc. 135, 199 (1969).
[4] M. Hübner, Phys. Lett. A 163, 239 (1992).
[5] A. Uhlmann, in Quantum Groups and Related Topics: Proceedings of the First Max Born Symposium, edited by R. Gielerak, J. Lukierski, and Z. Popovicz (Kluwer Academic Publishers, 1992), pp. 267-274.
[6] C. W. Helstrom, Quantum Detection and Estimation Theory (Academic, New York, 1976).
[7] J. Lee, M. S. Kim, and C. Brukner, Phys. Rev. Lett. 91, 087902 (2003).
[8] K. M. R. Audenaert, J. Calsamiglia, R. Munoz-Tapia, E. Bagan, Ll. Masanes, A. Acin, and F. Verstraete, Phys. Rev. Lett. 98, 160501 (2007).
[9] J. Calsamiglia, R. Munoz-Tapia, Ll. Masanes, A. Acin, and E. Bagan, Phys. Rev. A 77, 032311 (2008).
[10] O. Oreshkov, Phys. Rev. A 77, 032333 (2008).
[11] I. Bengtsson and K. Życzkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement (Cambridge University Press, Cambridge, 2006).
[12] C. A. Fuchs, Ph.D. thesis, University of New Mexico, Albuquerque, NM, 1996, e-print arXiv:quant-ph/9601020.
[13] S. L. Braunstein, C. A. Fuchs, and H. J. Kimble, J. Mod. Opt. 47, 267 (2000).
[14] G. Gour and R. W. Spekkens, New J. Phys. 10, 033023 (2008).
[15] G. Vidal, J. Mod. Opt. 47, 255 (2000).
[16] O. Oreshkov and T. A. Brun, Phys. Rev. A 73, 042314 (2006).
[17] L. B. Levitin, "On the quantum measure of the amount of information", in Proceedings of the Fourth All-Union Conference on Information Theory, Tashkent (1969), pp. 111, in Russian.
[18] A. S. Holevo, "Bounds for the quantity of information transmitted by a quantum communication channel", Probl. Peredachi Inf. 9(3), 3-11 (1973) [in Russian; English translation in Probl. Inf. Transm. (USSR) 9, 177-183 (1973)].
[19] I. Devetak and A. Winter, e-print arXiv:quant-ph/0304196 (2003).
[20] A. Gilchrist, N. K. Langford, and M. A. Nielsen, Phys. Rev. A 71, 062310 (2005).
[21] C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher, Phys. Rev. A 53, 2046 (1996).
[22] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, Caltech, 1997, e-print arXiv:quant-ph/9705052 (1997).
[23] A. Luis and L. L. Sanchez-Soto, Phys. Rev. Lett. 83, 3573 (1999).
[24] J. Fiurasek, Phys. Rev. A 64, 024102 (2001).
[25] G. M. D'Ariano, L. Maccone, and P. Lo Presti, Phys. Rev. Lett. 93, 250407 (2004).
[26] J. S. Lundeen, A. Feito, H. Coldenstrodt-Ronge, K. L. Pregnell, Ch. Silberhorn, T. C. Ralph, J. Eisert, M. B. Plenio, and I. A. Walmsley, Nature Phys. 5, 27 (2009).
[27] A. Jamiołkowski, Rep. Math. Phys. 3, 275 (1972).
[28] L. V. Kantorovich, Dokl. Akad. Nauk SSSR 37, No. 7-8, 227-229 (1942).
[29] C. Villani, Optimal Transport: Old and New (Springer, Berlin, 2009).
[30] L. N. Vasershtein, "Markov processes on a countable product space, describing large systems of automata", Problemy Peredachi Informatsii 5, No. 3, 64-73 (1969).
[31] K. Kraus, States, Effects, and Operations: Fundamental Notions of Quantum Theory, Lecture Notes in Physics Vol. 190 (Springer-Verlag, Berlin, 1983).
[32] M. B. Ruskai, Rev. Math. Phys. 6, 1147 (1994).
[33] A. S. Holevo, IEEE Trans. Inf. Theory 44, 269 (1998).
[34] B. Schumacher and M. D. Westmoreland, Phys. Rev. A 56, 131 (1997).
[35] K. M. R. Audenaert, J. Phys. A: Math. Theor. 40, 8127 (2007).
[36] M. Fannes, Commun. Math. Phys. 31, 291 (1973).
[37] R. Jozsa, J. Mod. Opt. 41, 2315 (1994).
[38] J. L. Dodd and M. A. Nielsen, e-print arXiv:quant-ph/0111053 (2001).
[39] D. Aharonov, A. Kitaev, and N. Nisan, in Proceedings of the 30th Annual ACM Symposium on Theory of Computation (STOC) (1998), pp. 20-30.
[40] H.-J. Sommers and K. Życzkowski, J. Phys. A: Math. Gen. 36, 10083 (2003).
[41] I. L. Chuang and M. A. Nielsen, J. Mod. Opt. 44, 2455 (1997).
[42] J. F. Poyatos, J. I. Cirac, and P. Zoller, Phys. Rev. Lett. 78, 390 (1997).