<<

Quantifying the magic of quantum channels

Xin Wang,1, 2, ∗ Mark M. Wilde,3, † and Yuan Su1, 4, ‡ 1Joint Center for and Computer Science, University of Maryland, College Park, Maryland 20742, USA 2Institute for , Baidu Research, Beijing 100193, China 3Hearne Institute for Theoretical Physics, Department of Physics and Astronomy, Center for Computation and Technology, Louisiana State University, Baton Rouge, Louisiana 70803, USA 4Department of Computer Science, Institute for Advanced Computer Studies, University of Maryland, College Park, USA (Dated: October 9, 2019) To achieve universal quantum computation via general fault-tolerant schemes, stabilizer operations must be supplemented with other non-stabilizer quantum resources. Motivated by this necessity, we develop a resource theory for magic quantum channels to characterize and quantify the quantum “magic” or non-stabilizerness of noisy quantum circuits. For qu- dit quantum computing with odd dimension d, it is known that quantum states with non- negative Wigner function can be efficiently simulated classically. First, inspired by this ob- servation, we introduce a resource theory based on completely positive-Wigner-preserving quantum operations as free operations, and we show that they can be efficiently simulated via a classical algorithm. Second, we introduce two efficiently computable magic measures for quantum channels, called the mana and thauma of a . As applica- tions, we show that these measures not only provide fundamental limits on the distillable magic of quantum channels, but they also lead to lower bounds for the task of synthesizing non-. Third, we propose a classical algorithm for simulating noisy quantum circuits, whose sample complexity can be quantified by the mana of a quantum channel. We further show that this algorithm can outperform another approach for simulating noisy quantum circuits, based on channel robustness. Finally, we explore the threshold of non- stabilizerness for basic quantum circuits under depolarizing noise.

Contents

I. Introduction 2 A. Background 2 B. Overview of results3

II. Preliminaries 4 A. The stabilizer formalism4 arXiv:1903.04483v2 [quant-ph] 8 Oct 2019 B. Discrete Wigner function4 C. Stabilizer channels and beyond7 D. Magic measures of quantum states7

III. Quantifying the non-stabilizerness of a quantum channel 7 A. Completely Positive-Wigner-Preserving operations7 B. Quantum (CPWP) superchannels 10 C. Logarithmic negativity (mana) of a quantum channel 12

∗Electronic address: [email protected] †Electronic address: [email protected] ‡Electronic address: [email protected] 2

D. Generalized thauma of a quantum channel 15 E. Max-thauma of a quantum channel 19

IV. Distilling magic from quantum channels 24 A. Amortized magic 24 B. Distillable magic of a quantum channel 24 C. Injectable quantum channel 27

V. Magic cost of a quantum channel 29 A. Magic cost of exact channel simulation 29 B. Magic cost of approximate channel simulation 31

VI. Classical simulation of quantum channels 32 A. Classical algorithm for simulating noisy quantum circuits 32 B. Comparison of classical simulation algorithms for noisy quantum circuits 34

VII. Examples 37 A. Non-stabilizerness under depolarizing noise 37 B. Werner-Holevo channel 37

VIII. Conclusion 39

Acknowledgements 39

A. On completely positive maps with non-positive mana 39

B. Data processing inequality for the Wigner trace norm 39

References 40

I. INTRODUCTION

A. Background

One of the main obstacles to physical realizations of quantum computation is decoherence that occurs during the execution of quantum algorithms. Fault-tolerant quantum computation (FTQC) [20, 78] provides a framework to overcome this difficulty by encoding quantum information into quantum error-correcting codes, and it allows reliable quantum computation when the physical error rate is below a certain threshold value. The fault-tolerant approach to quantum computation allows for a limited set of transversal, or manifestly fault-tolerant, operations, which are usually taken to be the stabilizer operations. However, the stabilizer operations alone do not enable universality because they can be simu- lated efficiently on a classical computer, a result known as the Gottesman-Knill theorem [1, 39]. The addition of non-stabilizer quantum resources, such as non-stabilizer operations, can lead to universal quantum computation [12]. With this perspective, it is natural to consider the resource- theoretic approach [23] to quantify and characterize non-stabilizer quantum resources, including both quantum states and channels. One solution for the above scenario is to implement a non-stabilizer operation via state in- jection [100] of so-called “magic states,” which are costly to prepare via [12] (see also [11, 19, 21, 45, 46, 53, 56]). The usefulness of such magic states also motivates the 3 resource theory of magic states [13, 48, 60, 85, 86, 93], where the free operations are the stabilizer operations and the free states are the stabilizer states (abbreviated as “Stab”). On the other hand, since a key step of fault-tolerant quantum computing is to implement non-stabilizer operations, a natural and fundamental problem is to quantify the non-stabilizerness or “magic” of quantum operations. As we are at the stage of Noisy Intermediate-Scale Quantum (NISQ) technology, a resource theory of magic for noisy quantum operations is desirable both to exploit the power and to identify the limitations of NISQ devices in fault-tolerant quantum computation.

B. Overview of results

In this paper, we develop a framework for the resource theory of magic quantum channels, based on qudit systems with odd prime dimension d. Related work on this topic has appeared recently [74], but the set of free operations that we take in our resource theory is larger, given by the completely positive-Wigner-preserving operations as we detail below. We note here that d- level fault-tolerant quantum computation based on qudits with prime d is of considerable interest for both theoretical and practical purposes [4, 18, 28, 40, 50]. Our paper is structured as follows:

• In SectionII, we first review the stabilizer formalism [39] and the discrete Wigner function [43, 44, 99]. We further review various magic measures of quantum states and introduce various classes of free operations, including the stabilizer operations and beyond.

• In Section III, we introduce and characterize the completely positive-Wigner-preserving (CPWP) operations. We then introduce two efficiently computable magic measures for quantum channels. The first is the mana of quantum channels, whose state version was introduced in [86]. The second is the max-thauma of quantum channels, inspired by the magic state measure in [93]. We prove several desirable properties of these two measures, including reduction to states, faithfulness, additivity for tensor products of channels, sub- additivity for serial composition of channels, an amortization inequality, and monotonicity under CPWP superchannels.

• In SectionIV, we explore the ability of quantum channels to generate magic states. We first introduce the amortized magic of a quantum channel as the largest amount of magic that can be generated via a quantum channel. Furthermore, we introduce an information- theoretic notion of the distillable magic of a quantum channel. In particular, we show that both the amortized magic and distillable magic of a quantum channel can be bounded from above by its mana and max-thauma.

• In SectionV, we apply our magic measures for quantum channels in order to evaluate the magic cost of quantum channels, and we explore further applications in quantum gate syn- thesis. In particular, we show that at least four T gates are required to perfectly implement a controlled-controlled-NOT gate.

• In SectionVI, we propose a classical algorithm, inspired by [65], for simulating quantum circuits, which is relevant for the broad class of noisy quantum circuits that are currently being run on NISQ devices. This algorithm has sample complexity that scales with respect to the mana of a quantum channel. We further show by concrete examples that the new al- gorithm can outperform a previous approach for simulating noisy quantum circuits, based on channel robustness [74]. 4

II. PRELIMINARIES

A. The stabilizer formalism

For most known fault-tolerant schemes, the restricted set of quantum operations is the sta- bilizer operations, consisting of preparation and measurement in the computational basis and a restricted set of unitary operations. Here we review the basic elements of the stabilizer states and operations for systems with a dimension that is a product of odd primes. Throughout this paper, a Hilbert space implicitly has an odd dimension, and if the dimension is not prime, it should be understood to be a tensor product of Hilbert spaces each having odd prime dimension. Let Hd denote a Hilbert space of dimension d, and let {|ji}j=0,··· ,d−1 denote the standard computational basis. For a prime number d, we define the unitary boost and shift operators X,Z ∈ L(Hd) in terms of their action on the computational basis:

X|ji = |j ⊕ 1i, (1) Z|ji = ωj|ji, ω = e2πi/d, (2) where ⊕ denotes addition modulo d. We define the Heisenberg–Weyl operators as

−a1a2 a1 a2 Tu = τ Z X , (3)

(d+1)πi/d where τ = e , u = (a1, a2) ∈ Zd × Zd. For a system with composite Hilbert space HA ⊗ HB, the Heisenberg–Weyl operators are the tensor product of the subsystem Heisenberg–Weyl operators:

TuA⊕uB = TuA ⊗ TuB , (4) where uA ⊕ uB is an element of ZdA × ZdA × ZdB × ZdB . The Clifford operators Cd are defined to be the set of unitary operators that map Heisenberg– Weyl operators to Heisenberg–Weyl operators under unitary conjugation up to phases:

0 † iθ U ∈ Cd iff ∀u, ∃θ, u , s.t. UTuU = e Tu0 . (5)

These operators form the Clifford group. The pure stabilizer states can be obtained by applying Clifford operators to the state |0i:

† {Sj} = {U|0ih0|U : U ∈ Cd}. (6)

A state is defined to be a magic or non-stabilizer state if it cannot be written as a convex com- bination of pure stabilizer states.

B. Discrete Wigner function

The discrete Wigner function [43, 44, 99] was used to show the existence of bound magic states [85]. For an overview of discrete Wigner functions, we refer to [85, 86] for more details. See also [33] for a review of quasi-probability representations in quantum theory, with applica- tions to quantum information science. For each point u ∈ Zd × Zd in the discrete phase space, there is a corresponding operator Au, and the value of the discrete Wigner representation of a state ρ at this point is given by 1 W (u) := Tr[A ρ], (7) ρ d u 5

where d is the dimension of the Hilbert space and {Au}u are the phase-space point operators:

1 X A = T ,A = T A T †. (8) 0 d u u u 0 u u

The discrete Wigner function can be defined more generally for a Hermitian operator X acting on a space of dimension d via the same formula: 1 W (u) = Tr[A X]. (9) X d u For the particular case of a measurement operator E satisfying 0 ≤ E ≤ 1, the discrete Wigner representation is defined as   W (E|u) := Tr EAu , (10) i.e., without the prefactor 1/d. The reason for this will be clear in a moment and is related to the distinction between a frame and a dual frame [34, 35, 65]. Some nice properties of the set {Au}u are listed as follows:

1. Au is Hermitian; P 1 2. u Au/d = ; 0 3. Tr[AuAu0 ] = d δ(u, u );

4. Tr[Au] = 1; P 5. ρ = u Wρ(u)Au; T 6. {Au}u = {Au }u. From the second property above and the definition in (7), we conclude the following equality for a quantum state ρ: X Wρ(u) = 1. (11) u

For this reason, the discrete Wigner function is known as a quasi-probability distribution. More generally, for a Hermitian operator X, we have that X WX (u) = Tr[X], (12) u P so that for a subnormalized state ω, satisfying ω ≥ 0 and Tr[ω] ≤ 1, we have that u Wω(u) ≤ 1. Following the convention in (10) for measurement operators, we find the following for a posi- x x P x 1 tive operator-valued measure (POVM) {E }x (satisfying E ≥ 0 ∀x and x E = ): X W (Ex|u) = 1, (13) x so that the quasi-probability interpretation is retained for a POVM. That is, W (Ex|u) can be inter- preted as the conditional quasi-probability of obtaining outcome x given input u. 6

We can quantify the amount of negativity in the discrete Wigner function of a state ρ via the sum negativity, which is equal to the absolute sum of the negative elements of the Wigner func- tion [86]: ! ! X 1 X 1 X 1 sn(ρ) := |W (u)| = |W (u)| − W (u) = |W (u)| − . (14) ρ 2 ρ ρ 2 ρ 2 u:Wρ(u)<0 u u By definition, we find that sn(ρ) ≥ 0. The mana of a state ρ is defined as [86] ! X M(ρ) := log |Wρ(u)| = log(2 · sn(ρ) + 1) ≥ 0. (15) u We define the mana more generally, as in [93], for a positive semi-definite operator X via the formula !     X X M(X) := log |WX (u)| = log2  |WX (u)| + Tr[X] . (16)

u u:WX (u)<0

We denote the set of quantum states with a non-negative Wigner function by W+ (Wigner polytope), i.e.,

W+ := {ρ : ∀u,Wρ(u) ≥ 0, ρ ≥ 0, Tr[ρ] = 1}. (17) It is known that quantum states with non-negative Wigner function are classically simulable and thus are useless in magic state distillation [85], which can be seen as the analog of states with positive partial transpose (PPT) in entanglement distillation [47, 66]. Motivated by the Rains bound [71] and its variants [32, 82, 83, 88–91] in entanglement theory, the set of sub-normalized states with non-positive mana was introduced as follows [93] to explore the resource theory of magic states: ( ) X W := σ : |Wσ(u)| ≤ 1, σ ≥ 0 = {σ : M(σ) ≤ 0, σ ≥ 0} . (18) u It follows from definitions and the triangle inequality that Tr[σ] ≤ 1 if σ ∈ W (alternatively one can conclude this by inspecting the right-hand side of (16)). Furthermore, we define Wc+ to be the set of Hermitian operators with non-negative Wigner function:

Wc+ := {V : ∀u,WV (u) ≥ 0}. (19) The Wigner trace norm and Wigner spectral norm of an Hermitian operator V are defined as follows, respectively: X X kV kW,1 := |WV (u)| = | Tr[AuV ]/d|, (20) u u

kV kW,∞ := d max |WV (u)| = max | Tr[AuV ]|. (21) u u The Wigner trace and spectral norms are dual to each other in the following sense:

kV kW,1 := max{| Tr[VC]| : kCkW,∞ ≤ 1}, (22) C

kV kW,∞ := max{| Tr[VC]| : kCkW,1 ≤ 1}, (23) C with C ranging over Hermitian operators within the same space. 7

Measures Acronym Definition P Mana [86] M(ρ) log u | Tr Auρ|/d ρ+rσ Robustness of magic [48] R(ρ) inf{2r + 1 : 1+r = τ, σ, τ ∈ Stab} Relative entropy of magic [86] RM(ρ) infσ∈Stab D(ρkσ) ∞ ∞ ⊗n Regularized relative entropy of magic [86] RM(ρ) limn→∞ RM(ρ )/n Max-thauma [93] θmax(ρ) infσ∈W Dmax(ρkσ)

Thauma [93] θ(ρ) infσ∈W D(ρkσ) ∞ ⊗n Regularized thauma [93] θ (ρ) limn→∞ θ(ρ )/n

Min-thauma [93] θmin(ρ) infσ∈W D0(ρkσ)

TABLE I: Partial zoo of magic measures.

C. Stabilizer channels and beyond

A stabilizer operation (SO) consists of the following types of quantum operations: Clifford op- erations, tensoring in stabilizer states, partial trace, measurements in the computational basis, and post-processing conditioned on these measurement results. Any quantum protocol composed of these quantum operations can be written in terms of the following Stinespring dilation represen- † tation: E(ρ) = TrE[U(ρ ⊗ ρE)U ], where U is a Clifford unitary and the ancilla ρE is a stabilizer state. The authors of [2] generalized the set of stabilizer operations to stabilizer-preserving opera- tions, which are those that transform stabilizer states to stabilizer states and which form the largest set of physical operations that can be considered free for the resource theory of non-stabilizerness. More recently, Ref. [74] introduced the completely stabilizer-preserving operations (CSPO); i.e., a quantum operation Π is called completely stabilizer-preserving if for any reference system R,

∀ρRA ∈ Stab, (idR ⊗ΠA→B)(ρRA) ∈ Stab. (24)

D. Magic measures of quantum states

We review some of the magic measures of quantum states in TableI. In particular, the max- thauma of a quantum state ρ is defined as follows [93]:

h λ i θmax(ρ) := min Dmax(ρkσ) := min min{λ : ρ ≤ 2 σ} (25) σ∈W σ∈W

= log2 min {kV kW,1 : ρ ≤ V } , (26) where the max-relative entropy Dmax(ρkσ) was defined in [27].

III. QUANTIFYING THE NON-STABILIZERNESS OF A QUANTUM CHANNEL

A. Completely Positive-Wigner-Preserving operations

A consisting of an initial quantum state, unitary evolutions, and measure- ments, each having non-negative Wigner functions, can be classically simulated [65]. It is thus nat- ural to consider free operations to be those that completely preserve the positivity of the Wigner function. Indeed, any such quantum operations are proved to be efficiently simulated via classi- cal algorithms in SectionVI and thus become reasonable free operations for the resource theory of magic. 8

CPWPO CSPO

SO

FIG. 1: Relationship between stabilizer operations, completely stabilizer-preserving operations, and com- pletely PWP operations.

Definition 1 (Completely PWP operation) A Hermiticity-preserving linear map Π is called com- pletely positive Wigner preserving (CPWP) if for any system R with odd dimension, the following holds

∀ρRA ∈ W+, (idR ⊗ΠA→B)(ρRA) ∈ W+ . (27)

Figure1 depicts the relationship between stabilizer operations, completely stabilizer- preserving operations, and completely PWP operations. We now recall the definition of the discrete Wigner function of a quantum channel from [60], which is strongly related to the Wigner function of a quantum channel as defined in [5, Eq. (95)].

Definition 2 (Discrete Wigner function of a quantum channel) Given a quantum channel NA→B, its discrete Wigner function is defined as

1 u T v N WN (v|u) := Tr[((AA) ⊗ AB)JAB] (28) dB 1 v u = Tr[ABN (AA)]. (29) dB N P Here JAB = ij |iihj|A ⊗ N (|iihj|A0 ) denotes the Choi-Jamiołkowski matrix [24, 51] of the channel N , where {|iiA}i and {|iiA0 }i are orthonormal bases on isomorphic Hilbert spaces HA and HA0 , respectively. More generally, the discrete Wigner function of a Hermiticity-preserving linear map PA→B can be defined using the same formula in (29), by substituting N therein with P. From the definition above and the properties recalled in SectionIIB, it follows for a quantum channel NA→B that X WN (v|u) = 1 ∀u, (30) v because " ! # X X X Av W (v|u) = Tr[Av N (Au )]/d = Tr B N (Au ) (31) N B A B d A v v v B u u u = Tr[IBN (AA)] = Tr[N (AA)] = Tr[AA] = 1, (32) where the penultimate equality follows from the fact that N is trace preserving (in fact here we did not require complete positivity or even linearity). Due to the normalization in (30), WN (v|u) can be interpreted as a conditional quasi-probability distribution. Furthermore, the discrete Wigner function of a channel allows one to determine the output Wigner function from the input Wigner function by propagating the quasi-probability distribi- tions, just as one does in the classical case. When there is no reference system, such a statement was proved in [60]. Here we slightly extend this result to the case with a reference system in the following lemma. 9

Lemma 1 For an input state ρAR and a quantum channel NA→B with respective Wigner functions

WρAR (u, y) and WN (v|u), the Wigner function WN (ρAR)(v, y) of the output state NA→B(ρAR) is given by X WNA→B (ρAR)(v, y) = WN (v|u) WρAR (u, y). (33) u Proof. The proof is straightforward:

1 v y WN (ρAR)(v, y) = Tr[(AB ⊗ AR)NA→B(ρAR)] (34) dBdR 1 X = Tr[(Av ⊗ Ay )N (W (u, w)Au ⊗ Aw)] (35) d d B R A→B ρAR A R B R u,w 1 X = W (u, w) Tr[Av N (Au ) ⊗ Ay Aw)] (36) d d ρAR B A→B A R R B R u,w 1 X = W (u, w) Tr[Av N (Au )]d δ(y, w) (37) d d ρAR B A→B A R B R u,w X = WN (v|u) WρAR (u, y). (38) u All steps follow from definitions and the properties of the phase-space point operators recalled P v w in SectionIIB. In particular, we made use of the fact that ρAR = v,w WρAR (v, w)AA ⊗ AR in the second equality. 

Theorem 2 The following statements about CPWP operations are equivalent: 1. The quantum channel N is CPWP;

2. The discrete Wigner function of the Choi-Jamiołkowski matrix JN is non-negative;

3. WN (v|u) is non-negative for all u and v (i.e., WN (v|u) is a conditional probability distribution or classical channel).

Proof. 1 → 2: Let us first apply the (stabilizer) qudit controlled-NOT gate CNOTd to the stabilizer state |+i ⊗ |0i to prepare the maximally entangled state Φd ∈ W+. Since N completely preserves the positivity of the Wigner function, it follows that

JN /d = (idR ⊗N )(Φd) ∈ W+. (39) 2 → 3: We find that u T v u0 v WN (v|u) = Tr[JN ((AA) ⊗ AB)]/dB = Tr[JN (AA ⊗ AB)]/dB ≥ 0. (40) u0 u T 0 u In the last inequality, we note that AA = (AA) and we can always find such u since {AA}u = u T {(AA) }u. The fact that WN (v|u) is a conditional probability distribution follows from the in- equality in (40) and the constraint in (30). 3 → 1: If the channel N has a non-negative Wigner function, then for an input state ρAR such that ρAR ∈ W+, it follows from Lemma1 that X WN (ρAR)(v, y) = WN (v|u) WρAR (u, y) ≥ 0, (41) u concluding the proof.  We remark here that the equivalence between 2 and 3 above was proved in [60], and our con- tribution is to show the equivalence between 2, 3, and the completely positive Wigner preserving property, which considers information processing in the presence of reference systems. 10

B. Quantum (CPWP) superchannels

A superchannel Ξ(A→B)→(C→D) is a quantum-physical evolution of a quantum channel NA→B [22, 42], which leads to an output channel KC→D as

KC→D = Ξ(A→B)→(C→D)(NA→B). (42)

The output channel KC→D taking system C to system D can be denoted by Ξ(N ) for short. The key property of a quantum superchannel is that the output map  idR ⊗Ξ(A→B)→(C→D) (NRA→RB) (43) is a legitimate quantum channel for all input bipartite channels NRA→RB, where the reference system R is arbitrary and idR denotes the identity superchannel. A superchannel Ξ(A→B)→(C→D) has a physical realization in terms of a pre-processing channel EC→AM and a post-processing channel DBM→D [22, 42], so that

Ξ(A→B)→(C→D)(NA→B) = DBM→D ◦ NA→B ◦ EC→AM . (44)

The superchannel Ξ(A→B)→(C→D) is in one-to-one correspondence with a bipartite channel PCB→AD, defined as

PCB→AD := DBM→D ◦ EC→AM . (45)

0 Related to this, an arbitrary bipartite channel PCB→AD is in one-to-one correspondence with a 0 superchannel Ξ(A→B)→(C→D) as long as it obeys the following non-signaling constraint [68, The- orem 4]:

0 0 π TrD ◦PCB→AD = TrD ◦PCB→AD ◦ RB, (46) π π where RB is a replacer channel, defined as RB(ωB) = Tr[ωB]πB with πB the maximally mixed state. That is, the non-signaling constraint implies that a trace out of system D has the effect of tracing and replacing system B, thus preventing B from signaling to A. The Choi operator of a quantum superchannel Ξ(A→B)→(C→D) is given by the Choi operator of its corresponding bipartite channel PCB→AD [22, 42]:

Ξ X 0 0 0 0 JCBAD := |iihj|C ⊗ |i ihj |B ⊗ PC0B0→AD(|iihj|C0 ⊗ |i ihj |B0 ), (47) i,j,i0,j0 where systems B0 and C0 are isomorphic to B and C, respectively. It obeys the following con- straints:

Ξ JCBAD ≥ 0, (48) Ξ TrAD[JCBAD] = ICB, (49) Ξ Ξ TrD[JCBAD] = TrDB[JCBAD] ⊗ πB, (50) which correspond respectively to complete positivity of the corresponding bipartite channel PCB→AD, trace preservation of the bipartite channel PCB→AD, and the B 6→ A non-signaling constraint. Conversely, any operator obeying the three constraints above is a bipartite channel corresponding to a superchannel. One can employ the following propagation rule [22, 42] to de- termine the Choi operator of the output channel in (42):

K h N T  Ξ i JCD = TrAB JAB ⊗ ICD JCBAD , (51) 11 where the superscript T denotes the transpose operation. By employing Definition2, we define the discrete Wigner function of a quantum superchan- nel Ξ(A→B)→(C→D), and we do so by means of its corresponding bipartite channel PCB→AD. That is, since PCB→AD is a channel, it has a discrete Wigner function

1 uA vD  uC vB  WΞ(uA, vD|uC , vB) = Tr[ AA ⊗ AD PCB→AD AC ⊗ AB ], (52) dAdD where we use the subscript Ξ to indicate its assocation with the superchannel Ξ and the choice of letters u and v are made in the above way because, in what follows, we will link up the discrete Wigner function WN (vB|uA) of a quantum channel NA→B with WΞ and the notation given above is more convenient for doing so. In addition to obeying the following property X WΞ(uA, vD|uC , vB) = 1, (53) uA,vD so that WΞ(uA, vD|uC , vB) is a conditional quasi-probability distribution, there is an extra con- straint imposed on WΞ(uA, vD|uC , vB) related to the non-signaling constraint B 6→ A in (46). To see this, let X WΞ(uA|uC , vB) := WΞ(uA, vD|uC , vB). (54) vD

By employing the non-signaling constraint in (46) and properties of the phase-space point oper- ators, it is straightforward to conclude that the non-signaling constraint B 6→ A is equivalent to the following condition on the discrete Wigner function WΞ(uA, vD|uC , vB):

0 0 WΞ(uA|uC , vB) = WΞ(uA|uC , vB) ∀vB, vB, (55) so that we can write

WΞ(uA|uC ) := WΞ(uA|uC , vB). (56)

This can be interpreted as indicating that the output phase-space point uA is independent of vB if system D is not available (i.e., has been marginalized). We note here that the conditions in (55) represent a direct generalization of non-signaling constraints for classical probability distributions to quasi-probability distributions. Furthermore, we also observe that the super-quasi-probability distribution in (52) represents a generalization of the classical superchannels discussed in [61, 62]. By employing the propagation rule in (51) and a sequence of steps similar to those given in the proof of Lemma1, we conclude that the discrete Wigner function of the output channel KC→D = Ξ(A→B)→(C→D)(NA→B) is given by X WΞ(N )(vD|uC ) = WΞ(uA, vD|uC , vB)WN (vB|uA)., (57) uA,vB again generalizing the fully classical case from [61, 62]. We now define CPWP superchannels as free superchannels that extend the notion of CPWP channels:

Definition 3 (CPWP superchannel) A superchannel Ξ(A→B)→(C→D) is completely CPWP preserv- ing (CPWP superchannel for short) if, for all CPWP channels NRA→RB, the output channel Ξ(A→B)→(C→D)(NRA→RB) is CPWP, where R is an arbitrary reference system. 12

We then have the following theorem as a generalization of Theorem2 (its proof is very similar and so we omit it):

Theorem 3 The following statements about CPWP superchannels are equivalent:

1. The quantum superchannel Ξ is CPWP;

Ξ 2. The discrete Wigner function of the Choi matrix JCBAD is non-negative;

3. The discrete Wigner function WΞ(uA, vD|uC , vB) is non-negative for all uA, vD, uC , and vB (i.e., WΞ(uA, vD|uC , vB) is a conditional probability distribution or classical bipartite channel with a non-signaling constraint).

An interesting consequence of the third part of the above theorem is that every CPWP super- channel has a non-unique realization in terms of pre- and post-processing CPWP channels. This follows from the fact that every non-signaling classical bipartite channel can be realized in terms of pre- and post-processing classical channels (see the discussion surrounding [61, Eq. (7)]), and these pre- and post-processing classical channels can be identified as the discrete Wigner functions of pre- and post-processing CPWP channels.

C. Logarithmic negativity (mana) of a quantum channel

To quantify the magic of quantum channels, we introduce the mana (or logarithmic negativity) of a quantum channel NA→B:

Definition 4 (Mana of a quantum channel) The mana of a quantum channel NA→B is defined as

u M(NA→B) := log max kNA→B(AA)kW,1 (58) u X 1 v u = log max |Tr[ABNA→B(AA)]| (59) u d v B X 1 u v N = log max Tr[(AA ⊗ AB)JAB] (60) u d v B X = log max |WN (v|u)|. (61) u v

More generally, we define the mana of a Hermiticity-preserving linear map PA→B via the same formula above, but substituting N with P.

In the following, we are going to show that the mana of a quantum channel has many desirable properties, such as

1. Reduction to states: M(N ) = M(σ) when the channel N is a replacer channel, acting as N (ρ) = Tr[ρ]σ for an arbitrary input state ρ, with σ a state.

2. Additivity under tensor products (Proposition5): M(N1 ⊗ N2) = M(N1) + M(N2).

3. Subadditivity under serial composition of channels (Proposition6): M(N2◦N1) ≤ M(N1)+ M(N2).

4. Faithfulness (Proposition7): M(N ) ≥ 0 and M(N ) = 0 if and only if N ∈ CPWP. 13

5. Amortization inequality (Proposition8): ∀ρRA, M((idR ⊗N )(ρRA)) − M(ρRA) ≤ M(N ).

6. Monotonicity under CPWP superchannels (Proposition9), which implies monotonicity un- der completely stabilizer-preserving superchannels. Proposition 4 (Reduction to states) Let N be a replacer channel, acting as N (ρ) = Tr[ρ]σ for an arbi- trary input state ρ, with σ a state. Then M(N ) = M(σ). (62)

Proof. Applying definitions and the fact that Tr[Au] = 1 for a phase-space point operator Au, we find that

M(N ) = log max kN (Au)kW,1 = log max k Tr[Au]σkW,1 = log kσkW,1 = M(σ), (63) u u concluding the proof. 

Proposition 5 (Additivity) For quantum channels N1 and N2, the following additivity identity holds

M(N1 ⊗ N2) = M(N1) + M(N2). (64)

More generally, the same additivity identity holds if N1 and N2 are Hermiticity-preserving linear maps. Proof. The proof relies on basic properties of the Wigner 1-norm and composite phase-space point operators, i.e.,

M(N1 ⊗ N2) = log max k(N1 ⊗ N2)(Au1 ⊗ Au2 )kW,1 (65) u1,u2

= log max [kN1(Au1 )kW,1 · kN2(Au2 )kW,1] (66) u1,u2

= log max kN1(Au1 )kW,1 + log max kN2(Au2 )kW,1 (67) u1 u2

= M(N1) + M(N2). (68)

This concludes the proof. 

Proposition 6 (Subadditivity) For quantum channels N1 and N2, the following subadditivity inequal- ity holds

M(N2 ◦ N1) ≤ M(N1) + M(N2). (69)

More generally, the same subadditivity inequality holds if N1 and N2 are Hermiticity-preserving linear maps.

Proof. Consider the following for an arbitrary phase-space point operator Au:

k(N2 ◦ N1)(Au)kW,1 log k(N2 ◦ N1)(Au)kW,1 = log + log kN1(Au)kW,1 (70) kN1(Au)kW,1 P 0  N 0 W ( )A 0 2 u N1(Au) u u W,1 = log P 0 + log kN1(Au)kW,1 (71) u0 |WN1(Au)(u )| 0 X |WN1(Au)(u )| 0 ≤ log P 0 kN2 (Au )kW,1 + log kN1(Au)kW,1 (72) 0 |WN (A )(u )| u0 u 1 u

≤ log max kN2 (Au0 )kW,1 + log max kN1(Au)kW,1 (73) u0 u = M(N1) + M(N2). (74)

Since the chain of inequalities holds for an arbitrary phase-space point operator Au, we conclude the statement of the proposition.  14

Proposition 7 (Faithfulness) Let NA→B be a quantum channel. Then the mana of the channel N satis- fies M(N ) ≥ 0, and M(N ) = 0 if and only if N ∈ CPWP.

Proof. To see the first claim, from the assumption that N is a quantum channel and (30), we find that   X X |WN (v|u)| = 2  |WN (v|u)| + 1 ≥ 1 ∀u. (75)

v v:WN (v|u)<0

Taking a maximization over u and applying a logarithm leads to the conclusion that M(N ) ≥ 0 for all channels N . Now suppose that N ∈ CPWP. Then by Theorem2, it follows that WN (v|u) is a conditional P P probability distribution, so that v |WN (v|u)| = v WN (v|u) = 1 for all u. It then follows from the definition that M(N ) = 0. P Finally, suppose that M(N ) = 0. By definition, this implies that maxu v |WN (v|u)| = 1. However, consider that the rightmost inequality in (75) holds for all channels. So our assumption P |W ( | )| = 0 W ( | ) ≥ and this inequality imply that v:WN (v|u)<0 N v u for all u, which means that N v u 0 for all u, v. By Theorem2, it follows that N ∈ CPWP. 

Proposition 8 (Amortization inequality) For any quantum channel NA→B, the following inequality holds

sup[M(N (ρA)) − M(ρA)] ≤ M(N ). (76) ρA

Furthermore, we have that

sup[M((idR ⊗N )(ρRA)) − M(ρRA)] ≤ M(N ). (77) ρRA

Proof. The inequality in (76) is a direct consequence of reduction to states (Proposition4) and subadditivity of mana with respect to serial compositions (Proposition6). Indeed, letting N 0 be a replacer channel that prepares the state ρA, we find that

0 0 M(N (ρA)) = M(N ◦ N ) ≤ M(N ) + M(N ) = M(N ) + M(ρA). (78) for all input states ρA, from which we conclude (76). By applying the inequality in (78) with the substitution N → id ⊗N , the additivity of the mana of a channel from Proposition5, and the fact that the identity channel is free (and thus has mana equal to zero), we finally conclude that

M((idR ⊗N )(ρRA)) − M(ρRA) ≤ M(idR ⊗N ) (79)

= M(idR) + M(N ) (80) = M(N ), (81) from which we conclude (77). 

CPWP Theorem 9 (Monotonicity) Let NA→B be a quantum channel, and let Ξ be a CPWP superchannel as given in Definition3. Then M(N ) is a channel magic measure in the sense that

M(N ) ≥ M(ΞCPWP(N )). (82) 15

Proof. Recalling the definition of the channel mana in terms of the discrete Wigner function (see (61)) and abbreviating ΞCPWP as Ξ, consider that

CPWP X M(Ξ (N )) = log max WΞ(N )(vD|uC ) (83) uC vD

X X = log max WΞ(uA, vD|uC , vB)WN (vB|uA) (84) uC vD uA,vB X ≤ log max |WΞ(uA, vD|uC , vB)WN (vB|uA)| (85) uC vD,uA,vB X = log max WΞ(uA, vD|uC , vB) |WN (vB|uA)| (86) uC vD,uA,vB X = log max WΞ(uA|uC , vB) |WN (vB|uA)| (87) uC uA,vB X = log max WΞ(uA|uC ) |WN (vB|uA)| . (88) uC uA,vB

The second equality follows from (57). The first inequality follows from the triangle inequal- ity. The third equality follows from the assumption that the superchannel Ξ is CPWP, so that its discrete Wigner function is non-negative (see Theorem3). The fourth equality follows from marginalizing WΞ over vD. The fifth equality follows from the non-signaling constraint in (55). Continuing, we find that X X Eq. (88) = log max WΞ(uA|uC ) |WN (vB|uA)| (89) uC uA vB " # X X ≤ log max WΞ(uA|uC ) max |WN (vB|uA)| (90) uC uA uA vB X = log max |WN (vB|uA)| (91) uA vB = M(N ). (92)

The first equality follows from rearranging sums. The inequality follows from bounding P |W (v |u )| u vB N B A in terms of its maximum value (so that it is no longer dependent on A). The P W (u |u ) = 1 penultimate equality follows because uA Ξ A C , and the final one follows by defini- tion. 

Remark 1 We note here that the monotonicity inequality in (82) holds more generally if N is a completely positive map that is not necessarily trace preserving.

D. Generalized thauma of a quantum channel

In this section, we define a rather general measure of magic for a quantum channel, called the generalized thauma, which extends to channels the definition from [93] for states. To define it, recall that a generalized divergence D(ρkσ) is any function of a quantum state ρ and a positive semi-definite operator σ that obeys data processing [69, 77], i.e., D(ρkσ) ≥ D(N (ρ)kN (σ)) where N is a quantum channel. Examples of generalized divergences, in addition to the trace distance and relative entropy, include the Petz–Renyi´ relative entropies [67], the sandwiched Renyi´ relative 16 entropies [63, 98], the Hilbert α-divergences [17], and the χ2 divergences [81]. One can then define the generalized channel divergence [57], as a way of quantifying the distinguishability of two quantum channels NA→B and PA→B, as follows:

D(N kP) := sup D(NA→B(ψRA)kPA→B(ψRA)), (93) ψRA where the optimization is with respect to all pure states ψRA such that system R is isomorphic to the channel input system A (note that one does not achieve a higher value of D(N kP) by allowing for an optimization over mixed states ρRA with an arbitrarily large reference system [57], as a consequence of purification, the Schmidt decomposition theorem, and data processing). More generally, PA→B can be a completely positive map in the definition in (93). Interestingly, the generalized channel divergence is monotone under the action of a superchannel Ξ:

D(N kP) ≥ D(Ξ(N )kΞ(P)), (94) as shown in [42, Section V-A]. We then define generalized thauma as follows:

Definition 5 (Generalized thauma of a quantum channel) The generalized thauma of a quantum channel NA→B is defined as

θ(N ) := inf D(N kE), (95) E:M(E)≤0 where the optimization is with respect to all completely positive maps E having mana M(E) ≤ 0.

It is clear that the above definition extends the generalized thauma of a state [93], which we recall is given by

θ(ρ) := inf D(ρkσ). (96) σ≥0:M(σ)≤0

We now prove that the generalized thauma of a quantum channel reduces to the state measure whenever the channel N is a replacer channel:

Proposition 10 (Reduction to states) Let N be a replacer channel, acting as N (ρ) = Tr[ρ]σ for an arbitrary input state ρ, where σ is a state. Then

θ(N ) = θ(σ). (97)

Proof. First, denoting the maximally mixed state by π, consider that

θ(N ) = inf sup D((idR ⊗N )(ψRA)k(idR ⊗E)(ψRA)) (98) E:M(E)≤0 ψRA

≥ inf D(πR ⊗ N (πA)kπR ⊗ E(πA)) (99) E:M(E)≤0

= inf D(σBkE(π)) (100) E:M(E)≤0

= inf D(σBkω) (101) ω:M(ω)≤0 = θ(σ). (102)

The first equality follows from the definition. The inequality follows by choosing the input state suboptimally to be πR ⊗ πA. The second equality follows because the generalized divergence 17 is invariant with respect to tensoring in the same state for both arguments. The third equality follows because π is a free state with non-negative Wigner function and E is a completely positive map with M(E) ≤ 0. Since one can reach all and only the operators ω ∈ W, the equality follows. Then the last equality follows from the definition. To see the other inequality, consider that E(ρ) = Tr[ρ]ω, for ω ∈ W, is a particular completely positive map satisfying M(E) = M(ω) ≤ 0, so that

θ(N ) = inf sup D((idR ⊗N )(ψRA)k(idR ⊗E)(ψRA)) (103) E:M(E)≤0 ψRA

≤ inf D(ψR ⊗ σBkψR ⊗ ωB) (104) ω:M(ω)≤0

= inf D(σBkωB) (105) ω:M(ω)≤0 = θ(σ). (106)

This concludes the proof. 

That the generalized thauma of channels proposed in (95) is a good measure of magic for quantum channels is a consequence of the following proposition:

CPWP Theorem 11 (Monotonicity) Let NA→B be a quantum channel, and let Ξ be a CPWP superchan- nel as given in Definition3. Then θ(N ) is a channel magic measure in the sense that

CPWP θ(NA→B) ≥ θ(Ξ (NA→B)) . (107)

Proof. The idea is to utilize the generalized divergence and its basic property of data processing. In more detail, consider that

θ(NA→B) = inf D(NA→BkE) (108) E:M(E)≤0 CPWP CPWP ≥ inf D(Ξ (NA→B)kΞ (E)) (109) E:M(E)≤0 CPWP ≥ inf D(Ξ (NA→B)kEb) (110) Eb:M(Eb)≤0 CPWP = θ(Ξ (NA→B)). (111)

The first inequality follows from the fact that the generalized divergence of channels is monotone under the action of a superchannel [42, Section V-A]. The second inequality follows from the monotonicity of M(N ) given in Theorem9 (and which extends more generally to completely positive maps as stated in Remark1). This monotonicity implies that M(E) ≥ M(ΞCPWP(E)) and leads to the second inequality. 

A generalized divergence is called strongly faithful [8] if for a state ρA and a subnormalized state σA, we have D(ρAkσA) ≥ 0 in general and D(ρAkσA) = 0 if and only if ρA = σA.

Proposition 12 (Faithfulness) Let D be a strongly faithful generalized divergence. Then the generalized thauma θ(N ) of a channel N defined through D is non-negative and it is equal to zero if N ∈ CPWP. If the generalized divergence is furthermore continuous and θ(N ) = 0, then N ∈ CPWP.

Proof. From Lemma 29 in AppendixA, it follows that any completely positive map E subject to the constraint M(E) ≤ 0 is trace non-increasing on the set W+. It thus follows that EA→B(ψRA) is subnormalized for any input state ψRA ∈ W+. By restricting the maximization to such input 18

states in W+, applying the faithfulness assumption, and applying the definition of generalized thauma, we conclude that θ(N ) ≥ 0. Suppose that N ∈ CPWP. Then by Proposition7, M(N ) = 0 and so we can set E = N in the definition of generalized thauma and conclude from the faithfulness assumption that θ(N ) = 0. Finally, suppose that θ(N ) = 0. By the assumption of continuity, this means that there exists a completely positive map E satisfying D(N kE) = 0. By Lemma 29 in AppendixA and the faithfulness assumption, this in turn means that NA→B(ΦRA) = EA→B(ΦRA) for the maximally entangled state ΦRA ∈ W+, which implies that NA→B = EA→B. However, we have that M(E) ≤ 0, implying that M(N ) = 0, since N is a channel and M(N ) ≥ 0 for all channels. By Proposition7, we conclude that N ∈ CPWP. 

As discussed in [57, 76], a generalized divergence possesses the direct-sum property on classical-quantum states if the following equality holds: !

X x X x X x x D pX (x)|xihx|X ⊗ ρ pX (x)|xihx|X ⊗ σ = pX (x)D(ρ kσ ), (112) x x x

x x where pX is a probability distribution, {|xi}x is an orthonormal basis, and {ρ }x and {σ }x are sets of states. We note that this property holds for trace distance, quantum relative entropy [84], and the Petz-Renyi´ [67] and sandwiched Renyi´ [63, 98] quasi-entropies sgn(α−1) Tr ρασ1−α and 1−α 1−α α sgn(α − 1) Tr[(σ 2α ρσ 2α ) ], respectively. For such generalized divergences, which are additionally continuous, as well as convex in the second argument, we find that an exchange of the minimization and the maximization in the definition of the generalized thauma is possible:

Proposition 13 (Minimax) Let D be a generalized divergence that is continuous, obeys the direct-sum property in (112), and is convex in the second argument. Then the following exchange of min and max is possible in the generalized thauma:

θ(N ) = sup inf D(NA→B(ψRA)kEA→B(ψRA)). (113) ψRA E:M(E)≤0

1 2 Proof. Let E be a fixed completely positive map such that M(E) ≤ 0. Let ψRA and ψRA be input states to consider for the maximization. Due to the unitary freedom of purifications and invariance of generalized divergence with respect to unitaries, we can equivalently consider the maximization to be over the convex set of density operators acting on the channel input system A. Define

λ 1 2 ρA = λψA + (1 − λ)ψA, (114) for λ ∈ [0, 1]. Then the state √ √ λ 1 2 |φ iR0RA := λ|0iR0 |ψ iRA + 1 − λ|1iR0 |ψ iRA (115)

λ λ λ purifies ρA and is related to a purification |ψ iRA of ρA by an isometry. It then follows that

λ λ D(NA→B(ψRA)kEA→B(ψRA)) λ λ = D(NA→B(φR0RA)kEA→B(φR0RA)) (116) 1 1 ≥ λD(|0ih0|R0 ⊗ NA→B(ψRA)k|0ih0|R0 ⊗ EA→B(ψRA)) 2 2 + (1 − λ)D(|1ih1|R0 ⊗ NA→B(ψRA)k|1ih1|R0 ⊗ EA→B(ψRA)) (117) 19

1 1 2 2 = λD(NA→B(ψRA)kEA→B(ψRA)) + (1 − λ)D(NA→B(ψRA)kEA→B(ψRA)). (118) The inequality follows from data processing, by applying a completely dephasing channel to the register R0. The last equality again follows from data processing. So the objective function is concave in the argument being maximized (again thinking of the maximization being performed over density operators on A rather than pure states on RA). By assumption, for a fixed input state ψRA, the objective function is convex in the second argument and the set of completely positive maps E satisfying M(E) is convex. Then the Sion minimax theorem [79] applies, and we conclude the statement of the proposition.  Remark 2 Examples of generalized divergences to which Proposition 13 applies include the quantum rel- ative entropy [84], the sandwiched R´enyirelative entropy [63, 98], and the Petz–R´enyirelative entropy [67]. The proposition applies to the latter two by working with the corresponding quasi-entropies and then lifting the result to the actual relative entropies.

E. Max-thauma of a quantum channel

As a particular case of the generalized thauma of a quantum channel defined in (95), we con- sider the max-thauma of a quantum channel, which is the max-relative entropy divergence be- tween the channel and the set of completely positive maps with non-positive mana. Specifically, for a given quantum channel NA→B, the max-thauma of NA→B is defined by

θmax(N ) := min Dmax(N kE), (119) E:M(E)≤0 where the minimum is taken with respect to all completely positive maps E satisfying M(E) ≤ 0 and

Dmax(N kE) := sup Dmax(NA→B(ψRA)kEA→B(ψRA)) (120) ψRA is the max-divergence of channels [26]. (More generally, N and E could be arbitrary completely positive maps in (120).) Note that it is known that [8, 31]

N E Dmax(N kE) = Dmax(NA→B(ΦRA)kEA→B(ΦRA)) = log min{t : JAB ≤ tJAB}, (121) N where ΦRA is the maximally entangled state and JAB is the Choi-Jamiołkowski matrix of the E channel NA→B and similarly for JAB. Due to the properties of max-relative entropy, it follows that Theorem 11 and Propositions 10, 12, and 13 apply to the max-thauma of a channel, implying reduction to states, that it is monotone with respect to completely CPWP superchannels, faithful, and obeys a minimax theorem, so that

θmax(R) = θmax(σ), if R(ρ) = Tr[ρ]σ (122) CPWP θmax(N ) ≥ θmax(Ξ (N )), (123)

θmax(N ) ≥ 0 and θmax(N ) = 0 if and only if N ∈ CPWP, (124)

θmax(N ) = min max Dmax(NA→B(ψRA)kEA→B(ψRA)) (125) E:M(E)≤0 ψRA

= max min Dmax(NA→B(ψRA)kEA→B(ψRA)), (126) ψRA E:M(E)≤0 where N is a quantum channel and ΞCPWP is a CPWP superchannel. We can alternatively express the max-thauma of a channel as the following SDP: 20

Proposition 14 (SDP for max-thauma) For a given quantum channel NA→B, its max-thauma θmax(N ) can be written as the following SDP:

θmax(N ) = log min t s.t. J N ≤ Y AB AB (127) X u v | Tr[(AA ⊗ AB)YAB]|/dB ≤ t, ∀u, v N where JAB is the Choi-Jamiołkowski matrix of the channel NA→B. Moreover, the dual SDP to the above is as follows: N θmax(N ) = log max Tr[JABVAB] X s.t. bv ≤ 1 v X v u 0 ≤ VAB ≤ (cv,u − fv,u)AA ⊗ AB/dB, (128) v,u

cv,u + fv,u ≤ bv, ∀u, v,

cv,u ≥ 0, fv,u ≥ 0, ∀u, v. Proof. Consider the following chain of equalities:

θmax(N ) = log min Dmax(N kE) (129) E:M(E)≤0 n N E0 0 o = log min t : JAB ≤ tJAB, M(E ) ≤ 0 (130) ( ) N E0 X 0 u v = log min t : JAB ≤ tJAB, | Tr[E (AA)AB]|/dB ≤ 1, ∀u (131) v ( ) N E X u v = log min t : JAB ≤ JAB, | Tr[E(AA)AB]|/dB ≤ t, ∀u (132) v ( ) N X u v = log min t : JAB ≤ YAB, | Tr[(AA ⊗ AB)YAB]|/dB ≤ t, ∀u , (133) v where the second equality follows from (121) and the last from the fact that E is completely pos- itive and thus in one-to-one correspondence with positive semi-definite bipartite operators. We further rewrite the absolute-value constraint in (133) and arrive at the following SDP: ( ) N X u v θmax(N ) = log min t : JAB ≤ YAB, −t ≤ Tr[(AA ⊗ AB)YAB]/dB ≤ t, ∀u , (134) v Then we use the Lagrangian method to obtain the dual SDP: N θmax(N ) = log max Tr[JABVAB] X s.t. bv ≤ 1 v X v u 0 ≤ VAB ≤ (cv,u − fv,u)AA ⊗ AB/dB, (135) v,u

cv,u + fv,u ≤ bv, ∀u, v,

cv,u ≥ 0, fv,u ≥ 0, ∀u, v. This concludes the proof.  21

Corollary 15 (Max-thauma vs. mana) For a quantum channel NA→B, its max-thauma does not exceed its mana:

θmax(N ) ≤ M(N ). (136)

N Proof. The proof is a direct consequence of the primal formulation in (127). By setting YAB = JAB, we find that

X u v θmax(N ) ≤ log min{t : max | Tr[(AA ⊗ AB)JAB]|/dB ≤ t} = M(N ), (137) u v where the last equality follows from (60). 

Proposition 16 (Additivity) For two given quantum channels N1 and N2, the max-thauma is additive in the following sense:

θmax(N1 ⊗ N2) = θmax(N1) + θmax(N2). (138)

Proof. The idea of the proof is to utilize the primal and dual SDPs of θmax(N ) from Proposition 14. On the one hand, suppose that the optimal solutions to the primal SDPs for θmax(N1) and θmax(N2) are {R1, bv1 } and {R2, bv2 }, respectively. It is then easy to verify that {R1 ⊗ R2, bv1 bv2 } is a feasible solution to the SDP of θmax(N1 ⊗ N2). Thus,

θ (N ⊗ N ) ≥ log Tr[(J N1 ⊗ J N2 )(R ⊗ R )] = θ (N ) + θ (N ). max 1 2 A1B1 A2B2 1 2 max 1 max 2 (139)

On the other hand, considering Eq. (119), suppose that the optimal solutions for N1 and N2 are E1 and E2, respectively. Noting that M(E1 ⊗ E2) = M(E1) + M(E2) ≤ 0, and employing (121) and the additivity of the max-relative entropy, we find that

θmax(N1 ⊗ N2) ≤ Dmax(N1 ⊗ N2kE1 ⊗ E2) (140)

= θmax(N1) + θmax(N2). (141)

This concludes the proof. 

The following lemma is essential to establishing subadditivity of max-thauma of channels with respect to serial composition, as stated in Proposition 18 below. We suspect that Lemma 17 will find wide use in general resource theories beyond the magic resource theory considered in this paper. For example, it leads to an alternative proof of [8, Proposition 17].

1 Lemma 17 (Subadditivity of max-divergence of channels) Given completely positive maps NA→B, 2 1 2 NB→C , EA→B, and EB→C , the following subadditivity inequality, with respect to serial compositions, holds for the max-channel divergence of (120)–(121):

Dmax(N2 ◦ N1kE2 ◦ E1) ≤ Dmax(N1kE1) + Dmax(N2kE2), (142)

1 2 1 2 where we have made the abbreviations N1 ≡ NA→B, N2 ≡ NB→C , E1 ≡ EA→B, and E2 ≡ EB→C .

Proof. Recall the “data-processed triangle inequality” from [25]:

Dmax(P(ρ)kω) ≤ Dmax(ρkσ) + Dmax(P(σ)kω), (143) 22 which holds for P a positive map and ρ, ω, and σ positive semi-definite operators. Note that one can in fact see this as a consequence of the submultiplicativity of the operator norm and the data-processing inequality of max-relative entropy for positive maps:

−1/2 1/2 Dmax(P(ρ)kω) = 2 log ω [P(ρ)] (144) ∞

= 2 log ω−1/2[P(σ)]1/2[P(σ)]−1/2[P(ρ)]1/2 (145) ∞

≤ 2 log ω−1/2[P(σ)]1/2 · [P(σ)]−1/2[P(ρ)]1/2 (146) ∞ ∞

= 2 log ω−1/2[P(σ)]1/2 + 2 log [P(σ)]−1/2[P(ρ)]1/2 (147) ∞ ∞ = Dmax(P(ρ)kP(σ)) + Dmax(P(σ)kω) (148)

≤ Dmax(ρkσ) + Dmax(P(σ)kω). (149) Let us pick

P = id ⊗N2, ρ = (id ⊗N1)(Φ), σ = (id ⊗E1)(Φ), ω = (id ⊗E2)(σ), (150) where Φ denotes the maximally entangled state. We find that

Dmax(N2 ◦ N1kE2 ◦ E1)

= Dmax((id ⊗(N2 ◦ N1))(Φ)k(id ⊗(E2 ◦ E1))(Φ)) (151)

≤ Dmax((id ⊗N1)(Φ)k(id ⊗E1)(Φ)) + Dmax((id ⊗(N2 ◦ E1))(Φ)k(id ⊗(E2 ◦ E1))(Φ)) (152)

≤ Dmax(N1kE1) + Dmax(N2kE2). (153) The first equality follows from (121). The first inequality follows from (143) with the choices in (150). The second inequality follows because Dmax((id ⊗N1)(Φ)k(id ⊗E1)(Φ)) = Dmax(N1kE1), as a consequence of (121), and the channel divergence Dmax(N2kE2) involves an optimization over all bipartite input states, one of which is (id ⊗E1)(Φ).  Remark 3 The proof above applies to any divergence that obeys the data-processed triangle inequality, which includes the Hilbert α-divergences of [17], as discussed in [8, Appendix A].

Proposition 18 (Subadditivity) For two given quantum channels N1 and N2, the max-thauma is sub- additive in the following sense:

θmax(N2 ◦ N1) ≤ θmax(N1) + θmax(N2). (154)

Proof. This is a direct consequence of Lemma 17 above. Let Ei be the completely positive map satisfying M(Ei) ≤ 0 and that is optimal for Ni with respect to the max-thauma θmax, for i ∈ {1, 2}. Then applying Lemma 17, we find that

Dmax(N2 ◦ N1kE2 ◦ E1) ≤ Dmax(N1kE1) + Dmax(N2kE2) (155)

= θmax(N1) + θmax(N2), (156)

The equality follows from the assumption that Ei is the completely positive map satisfying M(Ei) ≤ 0, which is optimal for Ni with respect to the max-thauma θmax, for i ∈ {1, 2}. Given that, by assumption, M(Ei) ≤ 0 for i ∈ {1, 2}, it follows from Proposition6 that M(E2 ◦ E1) ≤ 0. Since the max-thauma involves an optimization over all completely posi- tive maps E satsifying M(E) ≤ 0, we conclude that

θmax(N2 ◦ N1) ≤ Dmax(N2 ◦ N1kE2 ◦ E1) ≤ θmax(N1) + θmax(N2), (157) which is the statement of the proposition.  23

Proposition 19 (Amortization inequality) For any quantum channel NA→B, the following inequality holds

sup[θmax(NA→B(ρA)) − θmax(ρA)] ≤ θmax(N ), (158) ρA with the optimization performed over input states ρA. Moreover, the following inequality also holds

sup[θmax((idR ⊗N )(ρRA)) − θmax(ρRA)] ≤ θmax(NA→B). (159) ρRA

Proof. The inequality in (158) is a direct consequence of reduction to states (Proposition 122) and subadditivity of max-thauma with respect to serial compositions (Proposition 18). Indeed, letting 0 N be a replacer channel that prepares the state ρA, we find that

0 0 θmax(N (ρA)) = θmax(N ◦ N ) ≤ θmax(N ) + θmax(N ) = θmax(N ) + θmax(ρA). (160) for all input states ρA, from which we conclude (158). To arrive at the inequality in (159), we make the substitution N → id ⊗N , apply the above reasoning, the additivity in Proposition 16, and the fact that the identity channel is free (CPWP), to conclude that the following holds for all input states ρRA

θmax((idR ⊗NA→B)(ρRA)) − θmax(ρRA) ≤ θmax(id ⊗N ) = θmax(id) + θmax(N ) = θmax(N ), (161) from which we conclude (159). 

To summarize, the properties of θmax(N ) are as follows:

1. Reduction to states: θmax(N ) = θmax(σ) when the channel N is a replacer channel, acting as N (ρ) = Tr[ρ]σ for an arbitrary input state ρ, where σ is a state.

2. Monotonicity of θmax(N ) under CPWP superchannels (including completely stabilizer- preserving superchannels).

3. Additivity under tensor products of channels: θmax(N1 ⊗ N2) = θmax(N1) + θmax(N2).

4. Subadditivity under serial composition of channels: θmax(N2 ◦ N1) ≤ θmax(N1) + θmax(N2).

5. Faithfulness: N ∈ CPWP if and only if θmax(N ) = 0.

sup θ ((id ⊗N )(ρ )) − θ (ρ ) ≤ θ (N ) 6. Amortization inequality: ρRA max R RA max RA max .

Remark 4 Due to the subadditivity inequality in Proposition 18, the additivity identity in Proposition 16, and faithfulness in (124), the following identities hold

θmax(N1) = sup θmax([id ⊗N1] ◦ N2) − θmax(N2), (162) N2

θmax(N1) = sup θmax(N2 ◦ [id ⊗N1]) − θmax(N2), (163) N2

θmax(N1) = sup θmax(N2 ◦ [id ⊗N1] ◦ N3) − θmax(N2) − θmax(N3), (164) N2,N3 which have the interpretation that amortization in terms of arbitrary pre- and post-processing does not increase the max-thauma of a quantum channel. 24

IV. DISTILLING MAGIC FROM QUANTUM CHANNELS

A. Amortized magic

Since many physical tasks relate to quantum channels and time evolution rather than directly to quantum states, it is of interest to consider the non-stabilizer properties of quantum channels. Now having established suitable measures to quantify the magic of quantum channels, it is nat- ural to figure out the ability of a quantum channel to generate magic from input quantum states. Let us begin by defining the amortized magic of a quantum channel:

Definition 6 (Amortized magic) The amortized magic of a quantum channel NA→B is defined relative to a magic measure m(·) via the following formula:

A m (N ) := sup m((idR ⊗N )(ρRA)) − m(ρRA). (165) ρRA

The strict amortized magic of a quantum channel is defined as

A me (N ) := sup m((idR ⊗N )(ρRA)). (166) ρRA∈Stab

That is, the amortized magic is defined as the largest increase in magic that a quantum channel can realize after it acts on an arbitrary input quantum state. The strict amortized magic is defined by finding the largest amount of magic that a quantum channel can realize when a stabilizer state is given to it as an input. Such amortized measures of resourcefulness of quantum channels were previously studied in the resource theories of quantum coherence (e.g., [6, 31, 36, 59]) and (e.g., [7, 55, 58, 80, 92]). They have been considered in the context of an arbitrary resource theory in [55, Section 7].

Proposition 20 Given a quantum channel NA→B, the following inequalities hold

A M (N ) := sup M((idR ⊗NA→B)(ρRA)) − M(ρRA) ≤ M(N ), (167) ρRA A θmax(N ) := sup θmax((idR ⊗NA→B)(ρRA)) − θmax(ρRA) ≤ θmax(N ). (168) ρRA

Proof. These statements are an immediate consequence of the amortization inequality for M(ρ) and θmax(ρ) given in Propositions8 and 19, respectively. 

B. Distillable magic of a quantum channel

The most general protocol for distilling some resource by means of a quantum channel N employs n invocations of the channel N interleaved by free channels [55, Section 7]. In our case, the resource of interest is magic, and here we take the free channels to be the CPWP channels discussed in Section III A. In such a protocol, the instances of the channel N are invoked one at a time, and we can integrate all CPWP channels between one use of N and the next into a single CPWP channel, since the CPWP channels are closed under composition. The goal of such a protocol is to distill magic states from the channel. In more detail, the most general protocol for distilling magic from a quantum channel pro- R A ρ(1) ceeds as follows: one starts by preparing the systems 1 1 in a state R1A1 with non-negative 25

A B A B An Bn      n+1 ω R R R n

FIG. 2: The most general protocol for distilling magic from a quantum channel.

(1) Wigner function, by employing a free CPWP channel F , then applies the channel NA →B , ∅→R1A1 1 1 F (2) followed by a CPWP channel R1B1→R2A2 , resulting in the state ρ(2) := F (2) ((id ⊗N )(ρ(1) )). R2A2 R1B1→R2A2 R1 A1→B1 R1A1 (169) ρ(i) i − 1 Continuing the above steps, given state RiAi after the action of invocations of the channel NA→B and interleaved CPWP channels, we apply the channel NAi→Bi and the CPWP channel F (i+1) RiBi→Ri+1Ai+1 , obtaining the state

ρ(i+1) := F (i+1) ((id ⊗N )(ρ(i) )). Ri+1Ai+1 RiBi→Ri+1Ai+1 Ri Ai→Bi RiAi (170)

n N F (n+1) After invocations of the channel A→B have been made, the final free CPWP channel RnBn→S produces a state ωS on system S, defined as

ω := F (n+1) (ρ(n) ). S RnBn→S RnAn (171) Such a protocol is depicted in Figure2. Fix ε ∈ [0, 1] and k ∈ N. The above procedure is an (n, k, ε) ψ-magic distillation protocol with rate k/n and error ε, if the state ωS has a high fidelity with k copies of the target magic state ψ, ⊗k ⊗k hψ| ωS|ψi ≥ 1 − ε. (172) A rate R is achievable for ψ-magic state distillation from the channel N , if for all ε ∈ (0, 1], δ > 0, and sufficiently large n, there exists an (n, n(R − δ), ε) ψ-magic state distillation protocol of the above form. The ψ-distillable magic of the channel N is defined to be the supremum of all achievable rates and is denoted by Cψ(N ). A common choice for a non-Clifford gate is the T -gate. The T gate [49] is given by   ξ 0 0   T = 0 1 0  , (173) 0 0 ξ−1 where ξ = e2πi/9 is a primitive ninth root of unity. The T gate leads to the T magic state |T i := T |+i, (174) by inputting the stabilizer state |+i to the T gate. Furthermore, by the method of state injection [41, 100], one can generate a T gate by acting with stabilizer operations on the T state |T i. In what follows, we use quantum hypothesis testing to establish an upper bound on the rate at which one can distill qutrit T states. The proof follows the general method in [7, Theorem 1] and [6, Theorem 1], which was later generalized to an arbitrary resource theory in [55, Section 7]. 26

Proposition 21 Given a quantum channel N , the following upper bound holds for the rate R = k/n of an (n, k, ε) T -magic distillation protocol:

1  log(1/[1 − ε]) R ≤ θ (N ) + . (175) log(1 + 2 sin(π/18)) max n

Consequently, the following upper bound holds for the T -distillable magic of a quantum channel N :

θmax(N ) C (N ) ≤ . (176) T log(1 + 2 sin(π/18))

Proof. Consider an arbitrary (n, k, ε) T -magic state distillation protocol of the form described n ρ(1) previously. Such a protocol uses the channel times, starting from the state R1A1 with non- ρ(2) , . . . , ρ(n) , ω negative Wigner function and generating R2A2 RnAn and S step by step along the way, ⊗k such that the final state ωS has fidelity 1 − ε with |T i , where |T i = T |+i is the corresponding magic state of the T gate. By assumption, it follows that

⊗k Tr[|T ihT | ωS] ≥ 1 − ε, (177) while the result in [93] implies that

⊗k −k Tr[|T ihT | σS] ≤ (1 + 2 sin(π/18)) (178) for all σS ∈ W with the same dimension as ωS. Applying the data processing inequality for the max-relative entropy, with respect to the measurement channel

(·) → Tr[|T ihT |⊗k(·)]|0ih0| + Tr[(I⊗k − |T ihT |⊗k)(·)]|1ih1|, (179) we find that

k θmax(ωS) ≥ log[(1 − ε)(1 + 2 sin(π/18)) ] (180) ≥ log(1 − ε) + k log(1 + 2 sin(π/18)). (181)

(n+1) Moreover, by labeling ωS as ρ , we find that

n (n+1) X (j+1) (j) θmax(ρ ) = [θmax(ρ ) − θmax(ρ )] (182) j=1 n X (j+1) (j) (j) = [θmax((F ◦ [id ⊗N ])(ρ )) − θmax(ρ )] (183) j=1 n X (j) (j) ≤ [θmax([id ⊗N ](ρ )) − θmax(ρ )] (184) j=1

≤ nθmax(N ). (185)

(1) The first equality follows because θmax(ρ ) = 0 and by adding and subtracting terms. The first inequality follows because the max-thauma of a state does not increase under the action of a CPWP channel. The last inequality follows from applying Proposition 19. Hence,

nθmax(N ) ≥ log(1 − ε) + k log(1 + 2 sin(π/18)), (186) 27 which implies that

1  log(1/[1 − ε]) R = k/n ≤ θ (N ) + . (187) log(1 + 2 sin(π/18)) max n

This concludes the proof. 

We note here that one could also use the subadditivity inequality in Proposition 18 to establish the above result. We further note here that similar results in terms of max-relative entropies have been found in the context of other resource theories. Namely, a channel’s max-relative entropy of entanglement is an upper bound on its distillable secret key when assisted by LOCC channels [25], the max-Rains information of a quantum channel is an upper bound on its distillable entangle- ment when assisted by completely PPT preserving channels [9], and the max-k-unextendibility of a quantum channel is an upper bound on its distillable entanglement when assisted by k- extendible channels [54].

C. Injectable quantum channel

In any resource theory of quantum channels, it tends to simplify for those channels that can be implemented by the action of a free channel on the tensor product of the channel input state and a resourceful state [55, Section 7] and [97, Section 6]. The situation is no different for the resource theory of magic channels. In fact, particular channels with the aforementioned structure have been considered for a long time in the context of magic states, via the method of state injection [41, 100]. Here we formally define an injectable channel as follows:

Definition 7 (Injectable channel) A quantum channel N is called injectable with associated resource state ωC if there exists a CPWP channel ΛAC→B such that the following equality holds for all input states ρA:

NA→B(ρA) = ΛAC→B(ρA ⊗ ωC ). (188)

The notion of a resource-seizable channel was introduced in [8, 97], and here we consider the application of this notion in the context of magic resource theory:

Definition 8 (Resource-seizable channel) Let NA→B be an injectable channel with associated resource pre state ωC . The channel N is resource-seizable if there exists a free state κRA with non-negative Wigner post function and a post-processing free CPWP channel FRB→C such that

post pre FRB→C (NA→B(κRA)) = ωC . (189)

In the above sense, one seizes the resource state ωC by employing free pre- and post-processing of the channel NA→B.

An interesting and prominent example of an injectable channel that is also resource seizable is the channel T corresponding to the T gate. This channel T has the following action T (ρ) := T ρT † on an input state ρ. This channel is injectable with associated resource state ωC = |T ihT |, since one can use the method of circuit injection [100] to obtain the channel T by acting on |T ihT | with stabilizer operations. It is resource seizable because one can act on the free state |+ih+| with the channel T in order to seize the underlying resource state |T ihT | = T (|+ih+|). 28

As a generalization of the T channel example above, consider the channel ∆p ◦ T , where ∆p is a dephasing channel of the form

p † 2 2 † ∆ (ρ) = p0ρ + p1ZρZ + p2Z ρ(Z ) (190) where p = (p0, p1, p2), p0, p1, p2 ≥ 0, and p0 + p1 + p2 = 1. The channel is injectable with resource state ∆p(|T ihT |), because the same method of circuit injection leads to the channel ∆p ◦ T when acting on the resource state ∆p(|T ihT |). Furthermore, the channel ∆p ◦ T is resource seizable because one recovers the resource state ∆p(|T ihT |) by acting with ∆p ◦ T on the free state |+ih+|. For such injectable channels, the resource theory of magic channels simplifies in the following sense:

Proposition 22 Let N be an injectable channel with associated resource state ωC . Then the following inequalities hold

M(N ) ≤ M(ωC ), θ(N ) ≤ θ(ωC ), (191) where θ denotes the generalized thauma measures from Section III D. If N is also resource seizable, then the following equalities hold

M(N ) = M(ωC ), θ(N ) = θ(ωC ). (192)

Proof. We first prove the first inequality in (191). Consider that

u M(N ) = log max kNA→B(AA)kW,1 (193) u u = log max kΛAC→B(AA ⊗ ωC )kW,1 (194) u u ≤ log max kAA ⊗ ωC kW,1 (195) u u = log max kAAkW,1 + log kωC kW,1 (196) u = log kωC kW,1 (197)

= M(ωC ). (198)

The first two equalities follow from definitions. The inequality follows from Lemma 30 in the appendix. The third equality follows because the Wigner trace norm is multiplicative for tensor- u product operators. The fourth equality follows because kAAkW,1 = 1 for any phase-space point operator Au. We now prove the second inequality in (191):

θ(N ) = inf sup D(NA→B(ψRA)kEA→B(ψRA)) (199) E:M(E)≤0 ψRA

= inf sup D(ΛAC→B(ψRA ⊗ ωC )kEA→B(ψRA)) (200) E:M(E)≤0 ψRA

≤ inf sup D(ΛAC→B(ψRA ⊗ ωC )kΛAC→B(ψRA ⊗ σC )) (201) σC ≥0:M(σC )≤0 ψRA

≤ inf sup D(ψRA ⊗ ωC kψRA ⊗ σC ) (202) σC ≥0:M(σC )≤0 ψRA

= inf D(ωC kσC ) (203) σC ≥0:M(σC )≤0

= θ(ωC ). (204)

The first two equalities follow from definitions. The first inequality follows because the com- pletely positive map E = ΛAC→B(· ⊗ σC ) with σC ∈ W is a special kind of completely positive 29 map such that M(E) ≤ 0, due to the first inequality in (191). The second inequality follows from data processing under the channel ΛAC→B. The third equality follows because the generalized divergence is invariant under tensoring its two arguments with the same state ψRA (again a con- sequence of data processing [98]). The final equality follows from the definition in (96). The inequalities in (192) are a direct consequence of the definition of a resource-seizable chan- nel, the fact that both the mana and the generalized thauma are monotone under the action of post pre a CPWP superchannel (Theorems9 and 11, respectively), and with FRB→C (NA→B(κRA)) under- stood as a particular kind of superchannel that manipulates NA→B to the state ωC . Furthermore, it is the case that the channel measures reduce to the state measures when evaluated for preparation channels that take as input a trivial one-dimensional system, for which the only possible “state” is the number one, and output a state on the output system (see Proposition4 and (122)). 

Applying Proposition 22 to the channel T and applying some of the results in [93], we find that

θmax(T ) = θ(T ) = θmax(|T ihT |) = θ(|T ihT |) = log(1 + 2 sin(π/18)). (205)

The notion of an injectable channel also improves the upper bounds on the distillable magic of a quantum channel:

Proposition 23 Given an injectable quantum channel N with associated resource state ωC , the following upper bound holds for the rate R = k/n of an (n, k, ε) T -magic distillation protocol:   1 h2(ε) R ≤ θ(ω ) + , (206) log(1 + 2 sin(π/18))(1 − ε) C n where h2(ε) := −ε log2 ε − (1 − ε) log2(1 − ε). Consequently, the following upper bound holds for the T -distillable magic of the injectable quantum channel N :

θ(ωC ) C (N ) ≤ . (207) T log(1 + 2 sin(π/18))

Proof. Consider an arbitrary (n, k, ε) T -magic state distillation protocol of the form described previously. Due to the injection property, it follows that such a protocol is equivalent to a CPWP ⊗n channel acting on the resource state ωC (see Figure 5 of [55]). So the channel distillation problem reduces to a state distillation problem. Applying Proposition 4 of [93] and standard inequalities for the hypothesis testing relative entropy from [87], we conclude the bound in (206). Then taking limits, we arrive at (207). 

V. MAGIC COST OF A QUANTUM CHANNEL

A. Magic cost of exact channel simulation

Beyond magic distillation via quantum channels, the magic measures of quantum channels can also help us investigate the magic cost in quantum gate synthesis. In the past two decades, tremendous progress has been accomplished in the area of gate synthesis for (e.g., [3, 10, 14, 38, 52, 73, 75, 96]) and qudits (e.g., [15, 16, 29, 30, 64]). Elementary two-qudit gates include the controlled-increment gate [15] and the generalized controlled-X gate [29, 30]. More recently, the synthesis of single-qutrit gates was studied in [37, 70]. Of particular interest is to study exact gate synthesis of multi-qudit unitary gates from elements of the Clifford group supplemented by T gates. More generally, a fundamental question is to 30

A’  ’ B’ A’ ’ B’ A’ n ’ B’n A B      n+1 R R R n

FIG. 3: The most general protocol for exact synthesis of a channel NA→B starting from n uses of another quantum channel N 0, along with free CPWP channels F i, for i ∈ {1, . . . , n}.

determine how many instances of a given quantum channel N 0 are required simulate another quantum channel N , when supplemented with CPWP channels. That is, such a channel synthesis protocol has the following form:

n+1 0 n 2 0 1 NA→B = F 0 ◦ N 0 0 ◦ F 0 0 ◦ · · · ◦ F 0 0 ◦ N 0 0 ◦ F 0 , (208) RnBn→B An→Bn Rn−1Bn−1→RnAn R1B1→R2A2 A1→B1 A→R1A1

0 as depicted in Figure3. Let SN 0 (N ) denote the smallest number of N channels required to im- plement the quantum channel N exactly. Note that it might not always be possible to have an exact simulation of the channel N when starting from another channel N 0. For example, if N is a unitary channel and N 0 is a noisy depolarizing channel, then this is not possible. In this case, we define SN 0 (N ) = ∞. In the following, we establish lower bounds on gate synthesis by employing the channel mea- sures of magic introduced previously.

Proposition 24 For any qudit quantum channel N , the number of channels N 0 required to implement it is bounded from below as follows:   M(N ) θmax(N ) 0 SN (N ) ≥ max 0 , 0 . (209) M(N ) θmax(N )

If the channel N is injectable with associated resource state ωC , then the following bound holds   M(N ) θmax(N ) SN 0 (N ) ≥ max , . (210) M(ωC ) θmax(ωC )

Proof. Suppose that the simulation of N is realized as in (208). Applying Proposition6 iteratively, we find that

n+1 X M(N ) ≤ nM(N 0) + M(F i) = nM(N 0), (211) i=1 where the equality follows from Proposition7 and the assumption that each F i is a CPWP chan- M(N ) nel. Then n ≥ M(N 0) . Since this inequality holds for an arbitrary channel synthesis protocol, we M(N ) find that SN 0 (N ) ≥ M(N 0) . θmax(N ) S 0 (N ) ≥ 0 Applying Propositions 18 and 12 in a similar way, we conclude that N θmax(N ) . If the channel is injectable, then the upper bounds in (191) apply, from which we conclude (210).  31

40

35

30

25

20

15

10

5

0 0 0.1 0.2 0.3 0.4 0.5 0.6

FIG. 4: Lower bound on the number of noisy T gates required to implement a low-noise CCX gate.

As a direct application, we investigate gate synthesis of elementary gates. In the following, we prove that four T gates are necessary to synthesize a controlled-controlled-X qutrit gate (CCX gate) exactly.

Proposition 25 To implement a controlled-controlled-X qutrit gate, at least four qutrit T gates are re- quired.

Proof. By direct numerical evaluation, we find that M(CCX) 2.1876 S (CCX) ≥ ≥ ≥ 3.2861, (212) T M(|T ihT |) 0.6657 which means that four qutrit T gates are necessary to implement a qutrit CCX gate. 

For NISQ devices, it is natural to consider gate synthesis under realistic quantum noise. One common noise model in quantum information processing is the depolarizing channel: p X D (ρ) = (1 − p)ρ + XiZjρ(XiZj)†. (213) p d2 − 1 0≤i,j≤d−1 (i,j)6=(0,0)

Suppose that a T gate is not available, but instead only a noisy version Dp ◦ T of it is. Then it is reasonable to consider the number of noisy T gates required to implement a low-noise CCX gate, and the resulting lower bound is depicted in Figure4. Considering the depolarizing noise (p = 0.01) and applying Proposition 24, the lower bound is given by

M(D⊗3 ◦ CCX) 0.01 . (214) M(Dp ◦ T )

B. Magic cost of approximate channel simulation

Here we consider the magic cost of approximate channel simulation, which allows for a small error in the simulation process. To be specific, we establish the following proposition: 32

Proposition 26 For any qudit channel N (with odd dimensions), the following lower bound for the num- ber of channels N 0 required to implement it with error tolerance ε:

ε 0 1 S 0 (N ) ≥ min{k : kM(N ) ≥ M(Ne), kNe − N k♦ ≤ ε, Ne ∈ CPTP}, (215) N 2 k · k := sup sup k(· ⊗ 1 )(X)k where ♦ k∈N kXk1≤1 k 1 denotes the diamond norm. To get this bound, we minimize the lower bound of exact magic cost in Proposition 24 over the quantum channels that are ε-close to the target channel in terms of diamond norm. Using the SDP form of the diamond norm [94] the above lower bound can be computed via the following SDP:

log min t M(N 0) s.t. t2 ≥ kNe(Au)kW,1, ∀u, (216) Tr Y ≤ ε1 ,Y ≥ 0,Y ≥ J − J , B AB A N Ne J ≥ 0, Tr J = 1 . Ne B Ne A Moreover, one could also replace mana with thauma in the above resource estimation. Also, it is possible and interesting to exactly characterize the minimum error of channel simulation under CPWP bipartite channels. We leave these for future study.

VI. CLASSICAL SIMULATION OF QUANTUM CHANNELS

A. Classical algorithm for simulating noisy quantum circuits

An operational meaning associated with mana is that it quantifies the rate at which a quan- tum circuit can be simulated on a classical computer. Inspired by [65], we propose an algorithm for simulating quantum circuits in which the operations can potentially be noisy. We show that the complexity of this algorithm scales with the mana (the logarithmic negativity) of quantum channels, establishing mana as a useful measure for measuring the cost of classical simulation of a (noisy) quantum circuit. For recent independent and related work, see [72]. ⊗n Let Hd be the Hilbert space of an n-qudit system. Consider an evolution that consists of the L sequence {Nl}l=1 of channels acting on an input state ρ. Then the probability of observing the POVM measurement outcome E, where 0 ≤ E ≤ I, can be computed according to the Born rule as

L   X Y Tr E(NL ◦ · · · ◦ N1)(ρ) = W (E|uL) WNl (ul|ul−1)Wρ(u0), (217) −→ u l=1 −→ where u = (u0,..., uL) represents a vector in the discrete phase space and W (E|·) is the discrete Wigner function of the measurement operator E (cf., Eq. (10)). For the base case L = 1, this follows from the properties of the discrete Wigner function:

X 0 0 X  1  1   W (E|u )W (u |u)W (u) = Tr EA 0 Tr A 0 N (A ) Tr A ρ (218) N ρ u d u u d u u,u0 u,u0 X  1   = Tr EA 0 Tr A 0 N (ρ) (219) u d u u0 = TrEN (ρ). (220) 33

The case of general L follows by induction.   Our goal is to estimate Tr E(NL ◦· · ·◦N1)(ρ) with additive error. In what follows, we assume that the input state is ρ = |0nih0n| and the desired outcome is |0ih0|. This assumption is without loss of generality, since we can reformulate both the state preparation and the measurement as quantum channels

N1(σ) = Tr(σ)ρ, NL(σ) = Tr(Eσ)|0ih0| + Tr((1 − E)σ)|1ih1|. (221)

Consequently, we have

L  n n  X Y n n Tr |0ih0|(NL ◦ · · · ◦ N1)(|0 ih0 |) = W (|0ih0||uL) WNl (ul|ul−1)W|0 ih0 |(u0). (222) −→ u l=1 To describe the simulation algorithm, we define the negativity of quantum states and channels as X Mρ := kρkW,1 = |Wρ(u)|, u X 0 MN (u) := kN (Au)kW,1 = |WN (u |u)|, (223) u0 M(N ) MN := 2 = max MN (u). u

L Then a noisy circuit comprised of the channels {Nl}l=1 can be simulated as follows. We sample the initial phase point u0 according to the distribution |W|0nih0n|(u0)|/M|0nih0n| and, for each l = 1,...,L, we sample a phase point ul according to the conditional distribution

|WNl (ul|ul−1)|/MNl (ul−1), after which we output the estimate

L   Y   n n n n M|0 ih0 |Sign W|0 ih0 |(u0) MNl (ul−1)Sign WNl (ul|ul−1) W (|0ih0||uL). (224) l=1 This gives an unbiased estimate of the output probability since

L h   Y   i n n n n E M|0 ih0 |Sign W|0 ih0 |(u0) MNl (ul−1)Sign WNl (ul|ul−1) W (|0ih0||uL) l=1 L X |W|0nih0n|(u0)| Y |WN (ul|ul−1)| = l n n −→ M|0 ih0 | MNl (ul−1) u l=1 L   Y   (225) n n n n × M|0 ih0 |Sign W|0 ih0 |(u0) MNl (ul−1)Sign WNl (ul|ul−1) W (|0ih0||uL) l=1 L X Y n n = W (|0ih0||uL) WNl (ul|ul−1)W|0 ih0 |(u0) −→ u l=1  n n  = Tr |0ih0|(NL ◦ · · · ◦ N1)(|0 ih0 |) .

n Note that M|0nih0n| = 1 since |0 i is trivially a stabilizer state. Also for any stabilizer POVM   {Ek}, we have Tr EkAu ≥ 0 and X     Tr EkAu = Tr Au = 1, (226) k 34 which implies that     max |W (Ek|u)| = max Tr EkAu = max Tr EkAu ≤ 1. (227) u u u Therefore, the estimate that we output has absolute value bounded from above by

L Y M→ = MNl . (228) l=1 By the Hoeffding inequality, it suffices to take 2 2 M2 log (229) 2 → δ samples to estimate the probability of a fixed measurement outcome with accuracy  and success probability 1 − δ. In the description of the above algorithm, we have used the discrete Wigner representation of quantum states, channels, and measurement operators. However, the algorithm can be general- ized using the frame and dual frame representation along the lines of the work [65]. Specifically, for any frame {F (λ) : λ ∈ Λ} and its dual frame {G(λ) : λ ∈ Λ} on a d-dimensional Hilbert space, we define the corresponding quasiprobability representation of a state, a channel, and a measurement operator respectively as

Wρ(λ) = Tr[F (λ)ρ], 0 0 WN (λ |λ) = Tr[F (λ )N (G(λ))], (230) W (E|λ) = TrEG(λ).

Then, we have similar rules for computing the measurement probability

X 0 0   W (E|λ )WN (λ |λ)Wρ(λ) = Tr EN (ρ) (231) λ,λ0 and the above discussion carries through without any essential change. For simplicity, we omit the details here. Note that for the discrete Wigner function representation that we used in our paper, the correspondence is F (λ) = Au/d and G(λ) = Au.

B. Comparison of classical simulation algorithms for noisy quantum circuits

Recently, the channel robustness and the magic capacity were introduced to quantify the magic of multi- noisy circuits [74]. To be specific, given a quantum channel N , the channel robust- ness is defined as

R∗(N ) := min {2p + 1 : (1 + p)Λ+ − pΛ− = N , p ≥ 0} , (232) Λ±∈CSPO and the magic capacity is defined as

C(N ) := max R[(id ⊗N )(|φihφ|)]. (233) |φi∈Stab

They are related by the inequality [74]

R(ΦN ) ≤ C(N ) ≤ R∗(N ), (234) 35

where R(·) is the robustness of magic (cf. SectionII) and ΦN is the normalized Choi-Jamiołkowski operator of N . The authors of [74] further developed two matching simulation algorithms that scale quadratically with these channel measures. Here, we compare their approach with the one described in SectionVIA for simulating noisy qudit circuits. Note that neither the proof of (234) nor the static Monte Carlo algorithm of [74] depend on the dimensionality of the underlying system, so those results can be generalized to any qudit system with odd prime dimension. ⊗n Thus, we consider an n-qudit system with the underlying Hilbert space Hd , where d is an L odd prime. Consider a noisy circuit consisting of the sequence {Nl}l=1 of channels acting on the initial state |0ni, after which a computational basis measurement is performed. To describe the simulation algorithm based on channel robustness [74], we assume each Nj has the optimal decomposition with respect to the set of CSPOs

Nj = (1 + pj)Nj,0 − pjNj,1, (235)

~ L where R∗(Nj) = 2pj + 1. For any k ∈ Z2 , define Y Y X Y p~k = (1 + pj) (−pj) kpk1 = |p~k| = R∗(Nj) = R∗. (236) kj =0 kj =1 ~k j

~ L We sample a k ∈ Z2 from the distribution |p~k|/kpk1 and simulate the evolution NL,kL ◦ · · · ◦ N2,k2 ◦

N1,k1 [101]. To achieve accuracy  and success probability 1 − δ, it suffices to take 2 2 R2 log (237) 2 ∗ δ samples. To compare it with the mana-based simulation algorithm, we first prove that that the expo- nentiated mana of a quantum channel is always smaller than or equal to the channel robustness. To establish the separation, we introduce the robustness of magic with respect to non-negative Wigner function as follows.

Definition 9 Given a quantum state ρ, the robustness of magic with respect to non-negative Wigner function is defined as

RW+ (ρ) := min {2p + 1 : ρ = (1 + p)σ − pω, ω, σ ∈ W+} . (238)

Since Stab ⊂ W+, we have

RW+ (ΦN ) ≤ R(ΦN ) ≤ C(N ) ≤ R∗(N ). (239)

Proposition 27 Given a quantum channel N , the following inequality holds

M(N ) 2 = MN ≤ R∗(N ), (240) and the inequality can be strict.

Proof. Suppose {p, Λ±} is the optimal solution to Eq. (232) of R∗(N ). Then, we have

MN = max kN (Au)kW,1 (241) u

= max k(1 + p)Λ+(Au) − pΛ−(Au)kW,1 (242) u 36

1.8

1.75

1.7

1.65

1.6

3.5 4 4.5 5 5.5 6

FIG. 5: Comparison between MUθ and RW+ (ΦUθ ) for π ≤ θ ≤ 2π. The gap indicates that MUθ is strictly smaller than C(Uθ) and R∗(Uθ).

≤ max [(1 + p)kΛ+(Au)kW,1 + pkΛ−(Au)kW,1] (243) u = 2p + 1 (244)

= R∗(N ). (245) The inequality in (243) follows due to the triangle inequality. The equality in (244) follows since Λ± ∈ CSPO and then kΛ±(Au)kW,1 = 1 for any u. M(N ) Furthermore, we demonstrate the strict separation between 2 and R∗(N ) via the follow- ing example. Let us consider the diagonal unitary   eiθ/9 0 0   Uθ =  0 1 0  . (246) 0 0 e−iθ/9

Note that the T gate is a special case, given by U2π. Due to Eq. (239), the separation between

M(Uθ) and log RW+ (ΦUθ ) in Figure5 indicates that the mana of a channel can be strictly smaller than the channel robustness and magic capacity, i.e.,

MUθ < RW+ (ΦUθ ) ≤ C(Uθ) ≤ R∗(Uθ). (247) This concludes the proof.  L Applying Proposition 27 to the channels {Nl}l=1, we find that Y Y M→ = MNj ≤ R∗(Nj) = R∗. (248) j j Thus for an n-qudit system with odd prime dimension, the sample complexity of the mana-based approach is never worse than the algorithm of [74] based on channel robustness. Furthermore, the separation demonstrated in (247) indicates that the mana-based algorithm can be strictly faster for certain quantum circuits. The above example shows that the mana of a quantum channel can be smaller than its magic capacity [74], due to (247), but it is not clear whether this relation holds for general quantum channels. 37

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0 0 0.2 0.4 0.6 0.8 1

FIG. 6: Non-stabilizerness of T gate after depolarizing noise. The solid red line quantifies the classical simulation cost of noisy circuits Dp ◦ T . The dashed blue line gives the upper bound of magic generating capacity of Dp ◦ T .

VII. EXAMPLES

A. Non-stabilizerness under depolarizing noise

For near term quantum technologies, certain physical noise may occur during quantum infor- mation processing. One common quantum noise model is given by the depolarizing channel: p X D (ρ) = (1 − p)ρ + XiZjρ(XiZj)†, (249) p d2 − 1 0≤i,j≤d−1 (i,j)6=(0,0) where p ∈ [0, 1] and X,Z are the generalized Pauli operators. Let us suppose that depolarizing noise occurs after the implementation of a T gate. From Figure6, we find that if the depolarizing noise parameter p is higher than or equal to 0.62, then the channel Dp ◦ T cannot generate any non-stabilizerness. That is, the channel Dp ◦ T becomes CPWP after this cutoff. Another interesting case is the CCX gate. Let us suppose that depolarizing noise occurs in ⊗3 parallel after the implementation of the CCX gate. The mana of Dp ◦ CCX is plotted in Figure7, where we see that it decreases linearly and becomes equal to zero at around p ≈ 0.75.

B. Werner-Holevo channel

An interesting qutrit channel is the qutrit Werner-Holevo channel [95]: 1 N (V ) = [(Tr V )1 − V T ]. (250) WH 2 In what follows, we find that the Werner-Holevo channel maps any quantum state to a free state in W+ (state with non-negative Wigner function), while its amortized magic is given by our channel measure. This also indicates that the ancillary reference system is necessary to consider in the study of the resource theory of magic channels. 38

2

1.5

1

0.5

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

FIG. 7: Non-stabilizerness of CCX gate after depolarizing noise. The solid line also quantifies the classical ⊗3 simulation cost of the noisy circuit Dp ◦ CCX.

Proposition 28 For the qutrit Werner-Holevo channel NWH,

NWH(ρ) ∈ W+, (251) for any input state ρ (which is restricted to be a state of the channel input system), while 5 MA(N ) = M(N ) = , (252) WH WH 3 5 θA (N ) = θ (N ) = . (253) max WH max WH 3 Proof. On the one hand, for any input state ρ, we find that 1 1 1 W (u) = Tr A (1 − ρT ) = (1 − Tr A ρT ) ≥ (1 − kA k ) ≥ 0, (254) NWH(ρ) 6 u 6 u 6 u ∞ † where the first inequality follows because kAuk∞ = kA0k∞ = 1, since Au = TuA0Tu, the matrix Tu is unitary, and A0 for a qutrit is explicitly given by the following unitary transformation:   1 0 0   A0 = 0 0 1 . (255) 0 1 0

On the other hand, we can set ρRA to be the maximally entangled state, and we find from numerical calculations that 5 M((id ⊗N )(ρ )) − M(ρ ) = . (256) R WH RA RA 3 Meanwhile, we find from numerical calculations that M(NWH) = 5/3, which by Proposition8 means that 5 sup[M((idR ⊗NWH)(ρRA)) − M(ρRA)] = M(NWH) = . (257) ρRA 3 Similarly, we find from numerical calculations that 5 sup[θmax((idR ⊗NWH)(ρRA)) − θmax(ρRA)] = θmax(NWH) = . (258) ρRA 3 This concludes the proof.  39

VIII. CONCLUSION

We have introduced two efficiently computable magic measures of quantum channels to quan- tify and characterize the non-stabilizer resource possessed by quantum channels. These two channel measures have application in evaluating magic generating capability, gate synthesis, and classical simulation of noisy quantum circuits. More generally, our work establishes fundamen- tal limitations on the processing of quantum magic using noisy quantum circuits, opening new perspectives for the investigation of the resource theory of quantum channels in fault-tolerant quantum computation. One future direction is to explore tighter evaluations of the distillable magic of quantum chan- nels. We think that it would also be interesting to explore other applications of our channel mea- sures and generalize our approach to the multi-qubit case.

Acknowledgements

We are grateful to the anonymous referees for several comments that helped to improve our paper. XW acknowledges support from the Department of Defense. MMW acknowledges sup- port from NSF under grant no. 1714215. YS acknowledges support from the Army Research Office (MURI award W911NF-16-1-0349), the Canadian Institute for Advanced Research, the National Science Foundation (grants 1526380 and 1813814), and the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Quantum Algorithms Teams and Quantum Testbed Pathfinder programs.

Appendix A: On completely positive maps with non-positive mana

Lemma 29 Let E be a completely positive map. If M(E) ≤ 0, then E is trace non-increasing on the set W+ (quantum states with non-negative Wigner function).

P Proof. The assumption M(E) ≤ 0 is equivalent to maxu v |WE (v|u)| ≤ 1. For ρ ∈ W+, then consider that

X Tr[E(ρ)] = |Tr[E(ρ)]| = WE (v|u)Wρ(u) (A1) v,u X X X ≤ |WE (v|u)| |Wρ(u)| ≤ |Wρ(u)| = Wρ(u) = 1. (A2) v,u u u

The first equality follows from the assumption that E is completely positive and ρ is positive semi- definite. The second equality follows from Lemma1. The first inequality follows from the triangle inequality and the second from the assumption M(E) ≤ 0. The final two equalities follow from the assumption ρ ∈ W+ and (11). 

Appendix B: Data processing inequality for the Wigner trace norm

Lemma 30 For any operator Q and CPWP channel Π, the following inequality holds

kΠ(Q)kW,1 ≤ kQkW,1. (B1) 40

P P Proof. Let us suppose that Q = v qvAv. Then we have kQkW,1 = v |qv|. Furthermore,

X kΠ(Q)kW,1 = qvΠ(Av) (B2) v W,1

X X = qv Tr[AuΠ(Av)] (B3) u v

X X = qvWΠ(u|v) (B4) u v X X ≤ |qv|WΠ(u|v) (B5) u v X = |qv| = kQkW,1. (B6) v

This concludes the proof. 

[1] Scott Aaronson and Daniel Gottesman, Improved simulation of stabilizer circuits, Physical Review A 70 (2004), no. 5, 052328, arXiv:quant-ph/0406196. [2] Mehdi Ahmadi, Hoan Bui Dang, Gilad Gour, and Barry C. Sanders, Quantification and manipulation of magic states, Physical Review A 97 (2018), no. 6, 062332, arXiv:1706.03828. [3] Matthew Amy, Dmitri Maslov, and Michele Mosca, Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33 (2014), no. 10, 1476–1489, arXiv:1303.2042. [4] Hussain Anwar, Earl T. Campbell, and Dan E. Browne, Qutrit magic state distillation, New Journal of Physics 14 (2012), no. 6, 063006, arXiv:1202.2326. [5] Stephen D. Bartlett, Terry Rudolph, and Robert W. Spekkens, Reconstruction of Gaussian quantum mechanics from Liouville mechanics with an epistemic restriction, Physical Review A 86 (2012), no. 1, 012103, arXiv:1111.5057. [6] Khaled Ben Dana, Mar´ıa Garc´ıa D´ıaz, Mohamed Mejatty, and Andreas Winter, Resource theory of coherence: Beyond states, Physical Review A 95 (2017), no. 6, 062327, arXiv:1704.03710. [7] Charles H. Bennett, Aram W. Harrow, Debbie W. Leung, and John A. Smolin, On the capacities of bipartite Hamiltonians and unitary gates, IEEE Transactions on Information Theory 49 (2003), no. 8, 1895–1911, arXiv:quant-ph/0205057. [8] Mario Berta, Christoph Hirche, Eneet Kaur, and Mark M. Wilde, Amortized channel divergence for asymptotic quantum channel discrimination, (2018), arXiv:1808.01498. [9] Mario Berta and Mark M. Wilde, Amortization does not enhance the max-Rains information of a quantum channel, New Journal of Physics 20 (2018), 053044, arXiv:1709.04907, arXiv:1709.04907. [10] Alex Bocharov, Martin Roetteler, and Krysta M Svore, Efficient synthesis of universal repeat-until-success quantum circuits, Physical Review Letters 114 (2015), no. 8, 080502, arXiv:1404.5320. [11] Sergey Bravyi and Jeongwan Haah, Magic-state distillation with low overhead, Physical Review A 86 (2012), no. 5, 052329, arXiv:1209.2426. [12] Sergey Bravyi and Alexei Kitaev, Universal quantum computation with ideal Clifford gates and noisy an- cillas, Physical Review A 71 (2005), no. 2, 022316, arXiv:quant-ph/0403025. [13] Sergey Bravyi, Graeme Smith, and John Smolin, Trading classical and quantum computational resources, Physical Review X 6 (2015), no. 2, 021043, arXiv:1506.01396. [14] Michael J. Bremner, Christopher M. Dawson, Jennifer L. Dodd, Alexei Gilchrist, Aram W. Harrow, Duncan Mortimer, Michael A. Nielsen, and Tobias J. Osborne, Practical scheme for quantum computa- tion with any two-qubit entangling gate, Physical Review Letters 89 (2002), no. 24, 247902, arXiv:quant- ph/0207072. 41

[15] Gavin K. Brennen, Stephen S. Bullock, and Dianne P. O’Leary, Efficient circuits for exact-universal com- putations with qudits, Quantum Information and Computation 6 (2006), 436, arXiv:quant-ph/0509161. [16] Stephen S. Bullock, Dianne P. O’Leary, and Gavin K. Brennen, Asymptotically optimal quantum circuits for d-level systems, Physical Review Letters 94 (2004), no. 23, 230502, arXiv:quant-ph/0410116. [17] Francesco Buscemi and Gilad Gour, Quantum relative Lorenz curves, Physical Review A 95 (2017), no. 1, 012110, arXiv:1607.05735. [18] Earl T. Campbell, Enhanced fault-tolerant quantum computing in d-level systems, Physical Review Letters 113 (2014), no. 23, 230501, arXiv:1406.3055. [19] Earl T. Campbell and Mark Howard, Magic state parity-checker with pre-distilled components, Quantum 2 (2018), 56, arXiv:1709.02214. [20] Earl T. Campbell, Barbara M. Terhal, and Christophe Vuillot, Roads towards fault-tolerant universal quantum computation, Nature 549 (2017), no. 7671, 172–179. [21] Christopher Chamberland and Andrew W. Cross, Fault-tolerant magic state preparation with flag qubits, Quantum 3 (2019), 143, arXiv:1811.00566. [22] Giulio Chiribella, Giacomo Mauro D’Ariano, and Paolo Perinotti, Transforming quantum operations: Quantum supermaps, Europhysics Letters 83 (2008), no. 3, 30004, arXiv:0804.0180. [23] Eric Chitambar and Gilad Gour, Quantum resource theories, (2018), arXiv:1806.06107. [24] Man-Duen Choi, Completely positive linear maps on complex matrices, Linear Algebra and its Applica- tions 10 (1975), no. 3, 285–290. [25] Matthias Christandl and Alexander Muller-Hermes,¨ Relative entropy bounds on quantum, pri- vate and repeater capacities, Communications in Mathematical Physics 353 (2017), no. 2, 821–852, arXiv:1604.03448, arXiv:1604.03448. [26] Tom Cooney, Milan´ Mosonyi, and Mark M. Wilde, Strong converse exponents for a quantum channel discrimination problem and quantum-feedback-assisted communication, Communications in Mathematical Physics 344 (2014), no. 3, 797–829, arXiv:1408.3373. [27] Nilanjana Datta, Min- and max-relative entropies and a new entanglement monotone, IEEE Transactions on Information Theory 55 (2009), no. 6, 2816–2826. [28] Hillary Dawkins and Mark Howard, Qutrit magic state distillation tight in some directions, Physical Review Letters 115 (2015), no. 3, 030501, arXiv:1504.05965. [29] Yao-Min Di and Hai-Rui Wei, Synthesis of multivalued quantum logic circuits by elementary gates, Physi- cal Review A 87 (2013), no. 1, 012325, arXiv:1302.0056. [30] Yao-Min Di and Hai-Rui Wei, Optimal synthesis of multivalued quantum circuits, Physical Review A 92 (2015), no. 6, 062317, arXiv:1506.04394. [31] Mar´ıa Garc´ıa D´ıaz, Kun Fang, Xin Wang, Matteo Rosati, Michalis Skotiniotis, John Calsamiglia, and Andreas Winter, Using and reusing coherence to realize quantum processes, Quantum 2 (2018), 100, arXiv:1805.04045. [32] Kun Fang, Xin Wang, Marco Tomamichel, and Runyao Duan, Non-asymptotic entanglement distillation, IEEE Transactions on Information Theory 65 (2019), no. 10, 6454–6465, arXiv:1706.06221. [33] Christopher Ferrie, Quasi-probability representations of quantum theory with applications to quantum in- formation science, Reports on Progress in Physics 74 (2011), no. 11, 116001, arXiv:1010.2701. [34] Christopher Ferrie and Joseph Emerson, Frame representations of quantum mechanics and the necessity of negativity in quasi-probability representations, Journal of Physics A: Mathematical and Theoretical 41 (2008), no. 35, 352001, arXiv:0711.2658. [35] Christopher Ferrie and Joseph Emerson, Framed Hilbert space: hanging the quasi-probability pictures of quantum theory, New Journal of Physics 11 (2009), no. 6, 063040, arXiv:0903.4843. [36] Mar´ıa Garc´ıa-D´ıaz, Dario Egloff, and Martin B Plenio, A note on coherence power of N- dimensional unitary operators, Quantum Information & Computation 16 (2015), no. 15-16, 1282–1294, arXiv:1510.06683. [37] Andrew N. Glaudell, Neil J. Ross, and Jacob M. Taylor, Canonical forms for single-qutrit Clifford+T operators, arXiv preprint arXiv:1803.05047 (2018), 1–19, arXiv:1803.05047. [38] David Gosset, Vadym Kliuchnikov, Michele Mosca, and Vincent Russo, An algorithm for the T-count, Quantum Information and Computation 14 (2014), no. 15-16, 1261–1276, arXiv:1308.4134. [39] Daniel Gottesman, Stabilizer codes and , PhD thesis (1997), arXiv:quant- ph/9705052. [40] Daniel Gottesman, Fault-tolerant quantum computation with higher-dimensional systems, Chaos, Solitons 42

& Fractals 10 (1999), no. 10, 1749–1758, arXiv:quant-ph/9802007. [41] Daniel Gottesman and Isaac L. Chuang, Demonstrating the viability of universal quantum computa- tion using teleportation and single-qubit operations, Nature 402 (1999), no. 6760, 390–393, arXiv:quant- ph/9908010. [42] Gilad Gour, Comparison of quantum channels with superchannels, (2018), arXiv:1808.02607. [43] David Gross, Hudson’s theorem for finite-dimensional quantum systems, Journal of Mathematical Physics 47 (2006), no. 12, 122107, arXiv:quant-ph/0602001. [44] David Gross, Non-negative Wigner functions in prime dimensions, Applied Physics B 86 (2007), no. 3, 367–370, arXiv:quant-ph/0702004. [45] Jeongwan Haah, Matthew B. Hastings, D. Poulin, and D. Wecker, Magic state distillation with low space overhead and optimal asymptotic input count, Quantum 1 (2017), 31, arXiv:1703.07847. [46] Matthew B. Hastings and Jeongwan Haah, Distillation with sublogarithmic overhead, Physical Review Letters 120 (2018), no. 5, 050504, arXiv:1709.03543. [47] Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki, Mixed-state entanglement and distilla- tion: is there a “bound” entanglement in nature?, Physical Review Letters 80 (1998), no. 24, 5239–5242, arXiv:quant-ph/9801069. [48] Mark Howard and Earl Campbell, Application of a resource theory for magic states to fault-tolerant quan- tum computing, Physical Review Letters 118 (2017), no. 9, 090501, arXiv:1609.07488. [49] Mark Howard and Jiri Vala, Qudit versions of the qubit pi/8 gate, Physical Review A 86 (2012), no. 2, 022316, arXiv:1206.1598. [50] Mark Howard, Joel J. Wallman, Victor Veitch, and Joseph Emerson, Contextuality supplies the magic for quantum computation, Nature 510 (2014), 351—-355, arXiv:1401.4174. [51] Andrzej Jamiołkowski, Linear transformations which preserve trace and positive semidefiniteness of opera- tors, Reports on Mathematical Physics 3 (1972), no. 4, 275–278. [52] Cody Jones, Low-overhead constructions for the fault-tolerant Toffoli gate, Physical Review A 87 (2013), no. 2, 022328, arXiv:1212.5069. [53] Cody Jones, Multilevel distillation of magic states for quantum computing, Physical Review A 87 (2013), no. 4, 042305, arXiv:1303.3066. [54] Eneet Kaur, Siddhartha Das, Mark M. Wilde, and Andreas Winter, Extendibility limits the performance of quantum processors, Physical Review Letters 123 (2018), 070502, arXiv:1803.10710. [55] Eneet Kaur and Mark M. Wilde, Amortized entanglement of a quantum channel and approximately teleportation-simulable channels, Journal of Physics A 51 (2018), no. 3, 035303, arXiv:1707.07721. [56] Anirudh Krishna and Jean-Pierre Tillich, Towards low overhead magic state distillation, arXiv:1811.08461 (2018), 1–7, arXiv:1811.08461. [57] Felix Leditzky, Eneet Kaur, Nilanjana Datta, and Mark M. Wilde, Approaches for approximate ad- ditivity of the Holevo information of quantum channels, Physical Review A 97 (2018), no. 1, 012332, arXiv:1709.01111. [58] Matthew S. Leifer, Leah Henderson, and Noah Linden, Optimal entanglement generation from quantum operations, Physical Review A 67 (2003), no. 1, 012306, arXiv:quant-ph/0205055. [59] Azam Mani and Vahid Karimipour, Cohering and decohering power of quantum channels, Physical Re- view A 92 (2015), no. 3, 032331, arXiv:1506.02304. [60] Andrea Mari and Jens Eisert, Positive Wigner functions render classical simulation of quantum computation efficient, Physical Review Letters 109 (2012), no. 23, 230503, arXiv:1208.3660. [61] William Matthews, A linear program for the finite block length converse of Polyanskiy–Poor–Verd´u via nonsignaling codes, IEEE Transactions on Information Theory 58 (2012), no. 12, 7036–7044, arXiv:1109.5417. [62] William Matthews, Converses from non-signalling codes and their relationship to converses from hypothesis testing, International Zurich Seminar on Communications, pp. 203 – 207 203 – 207 203 – 207 203–207, 2016. [63] Martin Muller-Lennert,¨ Fred´ eric´ Dupuis, Oleg Szehr, Serge Fehr, and Marco Tomamichel, On quan- tum R´enyientropies: a new generalization and some properties, Journal of Mathematical Physics 54 (2013), no. 12, 122203, arXiv:1306.3142. [64] Ashok Muthukrishnan and C. R. Stroud, Multi-valued logic gates for quantum computation, Physical Review A 62 (2000), no. 5, 052309, arXiv:quant-ph/0002033. [65] Hakop Pashayan, Joel J. Wallman, and Stephen D. Bartlett, Estimating outcome probabilities of quantum 43

circuits using quasiprobabilities, Physical Review Letters 115 (2015), no. 7, 070501, arXiv:1503.07525. [66] Asher Peres, Separability criterion for density matrices, Physical Review Letters 77 (1996), no. 8, 1413– 1415, arXiv:quant-ph/9604005. [67]D enes´ Petz, Quasi-entropies for finite quantum systems, Reports in Mathematical Physics 23 (1986), 57– 65. [68] Marco Piani, Michal Horodecki, Pawel Horodecki, and Ryszard Horodecki, Properties of quantum nonsignaling boxes, Physical Review A 74 (2006), no. 1, 012305. [69] Yury Polyanskiy and Sergio Verdu, Arimoto channel coding converse and Renyi divergence, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1327–1333, IEEE, sep 2010. [70] Shiroman Prakash, Akalank Jain, Bhakti Kapur, and Shubhangi Seth, Normal form for single-qutrit Clifford+T operators and synthesis of single-qutrit gates, Physical Review A 98 (2018), no. 3, 032304, arXiv:1803.03228. [71] Eric M. Rains, A semidefinite program for distillable entanglement, IEEE Transactions on Information Theory 47 (2001), no. 7, 2921–2933, arXiv:quant-ph/0008047. [72] Patrick Rall, Daniel Liang, Jeremy Cook, and William Kretschmer, Simulation of qubit quantum circuits via Pauli propagation, Physical Review A 99 (2019), 062337, arXiv:1901.09070. [73] Neil J. Ross and Peter Selinger, Optimal ancilla-free Clifford+T approximation of z-rotations, Quantum Information and Computation 11-12 (2016), 901–953, arXiv:1403.2975. [74] James R. Seddon and Earl Campbell, Quantifying magic for multi-qubit operations, Proceedings of Eoyal Society A 475 (2019), no. 2227, arXiv:1901.03322. [75] Peter Selinger, Quantum circuits of T-depth one, Physical Review A 87 (2012), no. 4, 042302, arXiv:1210.0974. [76] Kunal Sharma, Mark M. Wilde, Sushovit Adhikari, and Masahiro Takeoka, Bounding the energy- constrained quantum and private capacities of phase-insensitive Gaussian channels, New Journal of Physics 20 (2018), 063025, arXiv:1708.07257. [77] Naresh Sharma and Naqueeb Ahmad Warsi, On the strong converses for the quantum channel capacity theorems, (2012), arXiv:1205.1712. [78] Peter W. Shor, Fault-tolerant quantum computation, Proceedings of 37th Conference on Foundations of Computer Science, pp. 56–65, IEEE Comput. Soc. Press, 1996, arXiv:quant-ph/9605011. [79] M. Sion, On general minimax theorems, Pacific Journal of Mathematics 8 (1958), no. 171. [80] Masahiro Takeoka, Saikat Guha, and Mark M. Wilde, The squashed entanglement of a quantum channel, IEEE Transactions on Information Theory 60 (2013), no. 8, 4987–4998, arXiv:1310.0129. [81] Kristan Temme, Michael J. Kastoryano, Mary B. Ruskai, Michael M. Wolf, and Frank Verstraete, The χ2-divergence and mixing times of quantum Markov processes, Journal of Mathematical Physics 51 (2010), no. 12, 122201, arXiv:1005.2358. [82] Marco Tomamichel, Mario Berta, and Joseph M Renes, Quantum coding with finite resources, Nature Communications 7 (2015), 11419, arXiv:1504.04617. [83] Marco Tomamichel, Mark M. Wilde, and Andreas Winter, Strong converse rates for quantum communi- cation, IEEE Transactions on Information Theory 63 (2014), no. 1, 715–727, arXiv:1406.2946. [84] Hisaharu Umegaki, Conditional expectation in an operator algebra. IV. Entropy and information, Kodai Mathematical Seminar Reports 14 (1962), no. 2, 59–85. [85] Victor Veitch, Christopher Ferrie, David Gross, and Joseph Emerson, Negative quasi-probability as a resource for quantum computation, New Journal of Physics 14 (2012), no. 11, 113011, arXiv:1201.1256. [86] Victor Veitch, S. A. Hamed Mousavian, Daniel Gottesman, and Joseph Emerson, The resource theory of stabilizer quantum computation, New Journal of Physics 16 (2014), no. 1, 013009, arXiv:1307.7171. [87] Ligong Wang and Renato Renner, One-shot classical-quantum capacity and hypothesis testing, Physical Review Letters 108 (2012), no. 20, 200501. [88] Xin Wang and Runyao Duan, A semidefinite programming upper bound of quantum capacity, 2016 IEEE International Symposium on Information Theory (ISIT), vol. 2016-Augus, pp. 1690–1694, IEEE, July 2016. [89] Xin Wang and Runyao Duan, Improved semidefinite programming upper bound on distillable entanglement, Physical Review A 94 (2016), no. 5, 050301, arXiv:1601.07940. [90] Xin Wang and Runyao Duan, Nonadditivity of Rains’ bound for distillable entanglement, Physical Review A 95 (2016), no. 6, 062322, arXiv:1605.00348. 44

[91] Xin Wang, Kun Fang, and Runyao Duan, Semidefinite programming converse bounds for quantum com- munication, IEEE Transactions on Information Theory 65 (2019), no. 4, 2583–2592, arXiv:1709.00200. [92] Xin Wang and Mark M. Wilde, Exact entanglement cost of quantum states and channels under PPT- preserving operations, arXiv:1809.09592 (2018), 1–54, arXiv:1809.09592. [93] Xin Wang, Mark M. Wilde, and Yuan Su, Efficiently computable bounds for magic state distillation, arXiv:1812.10145 (2018), arXiv:1812.10145. [94] John Watrous, Semidefinite Programs for Completely Bounded Norms, Theory of Computing 5 (2009), no. 1, 217–238. [95] Reinhard F. Werner and Alexander S. Holevo, Counterexample to an additivity conjecture for output purity of quantum channels, Journal of Mathematical Physics 43 (2002), no. 9, 4353–4357, arXiv:quant- ph/0203003. [96] Nathan Wiebe and Martin Roetteler, Quantum arithmetic and numerical analysis using repeat-until- success circuits, Quantum Information & Computation 16 (2016), 134–178, arXiv:1406.2040. [97] Mark M. Wilde, Entanglement cost and quantum channel simulation, Physical Review A 98 (2018), no. 4, 042320, arXiv:1807.11939. [98] Mark M. Wilde, Andreas Winter, and Dong Yang, Strong converse for the of entanglement-breaking and Hadamard channels via a sandwiched R´enyirelative entropy, Communications in Mathematical Physics 331 (2014), no. 2, 593–622. [99] William K. Wootters, A Wigner-function formulation of finite-state quantum mechanics, Annals of Physics 176 (1987), no. 1, 1–21. [100] Xinlan Zhou, Debbie W. Leung, and Isaac L. Chuang, Methodology for construction, Physical Review A 62 (2000), no. 5, 052316, arXiv:quant-ph/0002039. [101] Strictly speaking, this evolution can be directly simulated only when the circuit is noiseless. Oth-

erwise, we need to further decompose each Nj,kj as a finite combination of stabilizer-preserving operations, each of which has a single Kraus operator. This does not affect the sample complexity of the simulator [74].