The Critical Points of Coherent Information on the Manifold of Positive Definite Matrices

by Alireza Tehrani

A Thesis presented to The University of Guelph

In partial fulfilment of requirements for the degree of Master of Science in Mathematics

Guelph, Ontario, Canada

© Alireza Tehrani, January, 2020

ABSTRACT

THE CRITICAL POINTS OF COHERENT INFORMATION ON THE MANIFOLD OF POSITIVE DEFINITE MATRICES

Alireza Tehrani
University of Guelph, 2020

Advisors:
Dr. Bei Zeng
Dr. Rajesh Pereira

The coherent information of quantum channels plays an important role in quantum information theory, as it can be used to calculate the quantum capacity of a channel. However, it is a non-linear, non-differentiable optimization problem. This thesis shows that by restricting to the space of positive definite density matrices and restricting the class of quantum channels to be strictly positive, the coherent information becomes differentiable. This allows the computation of the Riemannian gradient and Hessian of the coherent information. It will be shown that the maximally mixed state is a critical point for the n-shot coherent information of the Pauli, dephrasure and Pauli-erasure channels. In addition, the classification of the maximally mixed state as a local maximum/minimum or saddle point will be solved for the one-shot coherent information. The hope of this work is to provide a new avenue to explore the quantum capacity problem.

Dedication

To those who have inspired or encouraged me: Brother, Father, Randy, Farnaz and Paul.

Acknowledgements

I want to express my gratitude and thanks to Dr. Zeng for the guidance, advice and direction that was provided, and for giving me the opportunity to work in theory and to be part of IQC and the Peng Cheng Laboratory. It has been an inspiring experience to witness the level of enthusiasm with respect to science that a person could have. In addition, I thank Dr. Pereira, who made sure I was on the right path, answering any questions and providing new suggestions. I'm thankful for the time spent doing research, and words can never be enough to express the kindness that was shown. I couldn't ask for better office mates: David, Eric, Katrina, Momina and Ningping, thanks for the time spent together.

Contents

Abstract ii

Dedication iii

Acknowledgements iv

List of Figures viii

1 Introduction 1
1.1 Introduction to Problem ...... 1
1.1.1 Contributions ...... 2
1.2 Overview of Chapters ...... 3

2 Coherent Information 5
2.1 von Neumann Entropy ...... 5
2.1.1 Classical Entropy ...... 6
2.1.2 Density Matrices ...... 7
2.1.3 Quantum/von Neumann Entropy ...... 8
2.2 Coherent Information ...... 11
2.2.1 Quantum Channels ...... 11
2.2.2 Motivation ...... 15
2.2.3 Coherent Information of a channel ...... 17
2.3 Properties ...... 19
2.3.1 Data Processing and Error-Correction ...... 19
2.3.2 Quantum Capacity ...... 20
2.3.3 Lipschitz ...... 21
2.3.4 Conjecture: Optima on Positive Definite Density Matrices ...... 22
2.4 Known Results ...... 22
2.4.1 Amplitude-Damping Channel ...... 23
2.4.2 Pauli Channel ...... 23
2.4.3 Dephrasure Channel ...... 24
2.4.4 Degradable and Antidegradable Quantum Channels ...... 25

3 Introduction to Manifold Theory and The Gradient/Hessian 27
3.1 Preliminaries to Manifold Theory ...... 28
3.1.1 Manifolds ...... 28
3.1.2 Smooth Maps ...... 33
3.1.3 Tangent Space ...... 36
3.1.4 Differential ...... 37
3.1.5 Tangent Bundle ...... 43
3.1.6 Submanifolds ...... 44
3.1.7 Riemannian Manifolds ...... 46
3.2 Riemannian Gradient and Hessian ...... 48
3.2.1 Euclidean Gradient ...... 49
3.2.2 Riemannian Gradient ...... 50
3.2.3 Hessian ...... 53

4 Gradient/Hessian of Coherent Information on Positive Definite Matrices 59
4.1 Strictly Positive Quantum Channels ...... 60
4.1.1 Definition ...... 60
4.1.2 Dense ...... 63
4.1.3 Examples ...... 64
4.2 Differentiability of Coherent Information ...... 64
4.2.1 Differentiability of Entropy Exchange and Coherent Information ...... 65
4.3 Gradient and Hessian ...... 66
4.3.1 Gradient ...... 66
4.3.2 Hessian ...... 71

5 Critical Points and Local Maximas/Minimas 76
5.1 n-shot Coherent Information ...... 77
5.1.1 Critical Points of Product States ...... 77
5.2 Unital Channels ...... 80
5.3 Pauli Channel ...... 83
5.3.1 Dephasing Channel ...... 87
5.3.2 Depolarizing Channel ...... 88
5.4 Dephrasure Channel ...... 95
5.5 Pauli Erasure Channel ...... 103

6 Conclusion And Further Work 107
6.1 Further Work ...... 109

Bibliography 112

A Matrix Calculus 115
A.1 Matrix Functions ...... 115
A.2 Frechet and Gateaux Derivatives ...... 117
A.3 Matrix Exponential and Logarithm ...... 118
A.4 Power Series ...... 120

List of Figures

2.1 The initial model and assumptions of the coherent information...... 16

5.1 Coherent Information of Depolarizing Channel Evaluated at the Maximally Mixed State ...... 91
5.2 Maximum of Coherent Information of Depolarizing Channel ...... 92
5.3 von Neumann entropy of the optimal solution of maximum of coherent information ...... 93
5.4 The eigenvalue of the Hessian at the Maximally Mixed state of coherent information ...... 94

Chapter 1

Introduction

1.1 Introduction to Problem

One of the most challenging problems of Quantum Information is the evaluation of the quantum capacity Q(N) of a channel N, the maximal amount of quantum information that can be sent through N. It was shown in [22], [11], and [28], a result known as the LSD theorem, that it can be calculated as

Q(N) = lim_{n→∞} I_c(N^{⊗n}) / n,

where I_c(N) = max_{ρ∈D} [S(N(ρ)) − S(N^c(ρ))] is the coherent information of the quantum channel N, N^c is the complementary channel, and D is the space of all density matrices. The term I_c(N^{⊗n}) is called the n-shot coherent information. It is the quantum analogue of mutual information found in classical information theory. Unlike the classical case, the quantum capacity does not have a single-letter expression [27]. This is due to the existence of superadditivity,

I_c(N^{⊗n}) > n I_c(N), for some channel use n. Note that in general, for all n the following inequality holds: I_c(N^{⊗n}) ≥ n I_c(N). It was shown in 2005 by Devetak and Shor [10] that if the quantum channel is degradable, then the quantum capacity has the single-letter formula Q(N) = I_c(N). This is due to the fact that for degradable channels the coherent information is additive for all n: I_c(N^{⊗n}) = n I_c(N). If the quantum channel is antidegradable, then the quantum capacity is known to be zero. The notion of approximately degradable quantum channels was introduced in 2017 and bounds on the quantum capacities were found [32], which were further studied in [18].

Computing the quantum capacity is generally an extremely difficult problem. It was shown in 2015 that an unbounded number of channel uses may be required when optimizing the coherent information in order to find non-zero quantum capacity [7]. Due to this result, the problem shifted to understanding examples of when superadditivity occurs. This is further studied in the papers [21] (2017) and [19] (2018), respectively. The overall goal of this thesis is to investigate the critical points (ie points where the gradient is zero) of the n-shot coherent information in a coordinate-invariant way. This is done by first investigating when the coherent information becomes a smooth map on the space of positive definite density matrices D++. The reasoning behind this restriction is that the global optimum is conjectured to generically be positive definite. Additionally, the set of positive definite density matrices is dense inside the space of all density matrices and it has a smooth manifold structure. This allows the calculation of the gradient and Hessian of the coherent information. These have the advantage of being coordinate invariant, and it will be shown that critical points can be completely studied on the space of positive definite matrices M++ (rather than D++).

1.1.1 Contributions

The contributions of this thesis are as follows.

• The restriction of the coherent information to the manifold of positive definite matrices, and the conditions on the quantum channel N and the complementary channel N^c needed to ensure smoothness of the coherent information, are explored. This is heavily tied to the notion of a strictly positive map [4]. This is discussed in chapter four.

• The Riemannian gradient of the coherent information has been solved within the dense class of strictly positive quantum channels. It will be shown that a positive definite density matrix ρ is a critical point of the coherent information I_c of a strictly positive quantum channel N iff

−N†(log(N(ρ))) + (N^c)†(log(N^c(ρ))) ∈ span{I},

where log is the matrix logarithm, N^c is the complementary channel, N† is the adjoint of the channel and I is the identity matrix. This is the statement of Theorem 4.3.2.

• It will be shown that the coherent information I_c(N, ρ) can be written using the gradient term above, ie

I_c(N, ρ) = ⟨grad(I_c(N, ρ))|ρ⟩,

where ⟨A|B⟩ = Tr(A†B) is the Frobenius inner product and the gradient grad(I_c(N, ρ)) is −N†(log(N(ρ))) + (N^c)†(log(N^c(ρ))) on the manifold of positive definite matrices. It will be shown that the coherent information is a positively homogeneous function of degree zero.

• In addition, the Riemannian Hessian Hess(I_c(N, ρ)) of the coherent information I_c at a positive definite density matrix ρ is solved. It is a linear map from the trace zero Hermitian matrices H_0 to itself such that

Hess(I_c(N, ρ))[V] = P[ −N†(dlog_{N(ρ)}(N(V))) + (N^c)†(dlog_{N^c(ρ)}(N^c(V))) ],

where V is a trace zero Hermitian matrix, P is the orthogonal projection from the Hermitian matrices H onto the trace zero Hermitian matrices H_0, and dlog_X : H → H is the differential of the matrix logarithm at X. This is the statement of Theorem 4.3.5.

• It will be shown that critical points ρ of the k-shot coherent information give rise to critical points of the nk-shot coherent information via the product state ρ^{⊗n}. This is the topic of section 5.1.

• The maximally mixed state is shown to always be a critical point for the n-shot coherent information of the Pauli and Pauli-erasure channels, indicating the role it has as a bifurcation point. This is explored in chapter five.

• In addition, the eigenvalues of the Hessian at the maximally mixed state are solved, indicating when the maximally mixed state is a local maximum/minimum or saddle point for the single-shot coherent information. The results for the dephasing and dephrasure channels match those in the literature, and the results for the depolarizing channel match computational results.

1.2 Overview of Chapters

Chapter two introduces the coherent information of a quantum channel. It will begin with a discussion of the von Neumann entropy, followed by the coherent information and its known properties. It will finish with known results on the coherent information with respect to the amplitude-damping channel and the dephrasure channel.

Chapter three first introduces the basics of smooth manifold theory needed to show that the set of positive definite density matrices D++ is a manifold and that its tangent space, the space of all directional derivatives, is isomorphic to the space of trace zero Hermitian matrices. Finally, the minimal requirements of Riemannian manifold theory are presented in order to formally define the gradient and Hessian, with emphasis placed on submanifolds that are isometrically embedded into a Euclidean space. All examples illustrate applications pertaining to the von Neumann entropy as a smooth map on the manifold D++.

Chapter four first introduces the conditions needed on the quantum channel N and the complementary channel N^c such that the coherent information becomes a smooth map on the manifold D++. It will be shown that the class of strictly positive quantum channels, ie channels that preserve positive definiteness, ensures that the channel entropy S(N) is smooth. Additionally, it will be shown that strictly positive quantum channels are dense within the class of quantum channels, a result known before in [35]. The complementary channel N^c is going to be related to the Gram matrix, which will then be shown to be strictly positive when the Kraus operators {A_i} of N are linearly independent. Finally, the chapter ends with the calculation of the gradient and Hessian of the coherent information, alongside a discussion of their properties pertaining to critical points and local maxima/minima. It will be shown that the gradient is invariant under positive scalar multiplication of density matrices and that the coherent information can be written in terms of the gradient, linking it to Euler's Homogeneous Function Theorem.

Chapter five applies the results of the previous chapter to different classes of quantum channels. It will begin with a discussion of the n-shot coherent information and how critical points can be formed via product states of critical points of 'lower' shot coherent information. The main result is that the maximally mixed state is always a critical point for the n-shot coherent information of the Pauli, dephrasure and Pauli-erasure channels. The eigenvalues of the Hessian of these channels will be solved at the maximally mixed state for the one-shot coherent information. The results obtained for the dephrasure channel will match the results obtained in the paper [19], validating the formulas for the gradient and Hessian. For the depolarizing channel with channel parameter p, it will be shown that for the one-shot coherent information the maximally mixed state is a local maximum when p ∈ [0, 0.07055), a saddle point when p = 0.07055, and a local minimum when p ∈ (0.07055, 1/3].

Chapter 2

Coherent Information

In classical information theory, Claude Shannon's famous paper [27] introduced the notions of information and entropy, the latter being the expectation value of the information content of a probability distribution. In turn, he defined the mutual information of two probability distributions in order to quantify the notion of channel capacity, a measure of how much information passes through a classical communication channel. The quantum analogue of mutual information is known as the coherent information of a channel. Similar to mutual information, it can be used to define the quantum analogue of channel capacity, known as the quantum capacity. This measures the maximal amount of quantum information that passes through a quantum communication channel.

This chapter is about the definition and properties of the coherent information. Section 2.1 introduces the quantum analogue of entropy alongside its properties. After doing so, the coherent information of a channel is motivated and defined in section 2.2. The coherent information has applications to familiar concepts from classical information theory, such as the data-processing inequality, error-correction, and the quantum capacity. The coherent information will be shown to be a Lipschitz function which satisfies Levy's lemma in section 2.3. The last section, 2.4, introduces examples of quantum channels where results on either the quantum capacity or the coherent information are known; these serve as examples used in the later chapters of this thesis.

2.1 von Neumann Entropy

This section introduces the notion of classical entropy as the measure of disorder of a probability distribution. The properties of classical entropy and the joint classical entropy will be presented. The section will conclude with the quantum analogue of entropy, known as the von Neumann entropy. Its properties and joint entropy will also be introduced, along with its connection to quantum entanglement. For the purposes of this thesis, only discrete random variables and finite-dimensional Hilbert spaces are presented.

2.1.1 Classical Entropy

Let X : E → R be a discrete random variable over a finite set of outcomes E. Suppose p_X(X(e)) is a probability distribution over the random variable X that gives the probability of an event e in E occurring. For simplicity, we will denote the probability of event e occurring as p_X(e) instead of p_X(X(e)).

The information content i_X(e) of an event e of the random variable X measures the amount of information in bits needed to represent e. Claude Shannon showed in [27] that the information content is given by i_X(e) = −log_2(p_X(e)), where we identify the logarithm base two of zero to be zero.

The classical entropy H(X) of a discrete random variable X is defined to be the expectation value E_X[i_X] of the information content over X. This is precisely H(X) = −Σ_{e∈E} p_X(e) log_2(p_X(e)). The classical entropy H(X) can be thought of as a measure of the disorder of X, or the average number of bits needed to represent the random variable X. The following is a series of properties that the classical entropy has. Proofs of these are found in [34].
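The definition above can be sketched numerically. This is a minimal illustration (the helper name `shannon_entropy` is ours, not from the thesis), with the convention that zero-probability events contribute nothing:

```python
import numpy as np

def shannon_entropy(p):
    """H(X) = -sum_e p(e) * log2(p(e)); events with p(e) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))
```

A fair coin carries exactly one bit of entropy, and a deterministic outcome carries none, matching properties 3 and 4 below.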

Theorem 2.1.1. Consider the classical entropy H(X) over a discrete random variable X with probability distribution p_X. The following properties hold for the classical entropy:

1. The classical entropy is non-negative over all discrete random variables X.

2. The classical entropy is concave over any two discrete random variables X1 and X2, ie for α ∈ [0, 1],

αH(X1) + (1 − α)H(X2) ≤ H(αX1 + (1 − α)X2).

3. The classical entropy is zero only when the probability distribution p_X is one for a single event e and zero everywhere else.

4. The maximum of the classical entropy occurs when all events e have the same probability, ie p_X(e) = 1/|E|, where |E| is the cardinality of the set of all outcomes E.

5. The classical entropy is a continuous function over the space of all probability distributions on the events E.

Given two discrete random variables X and Y, the joint probability distribution is denoted as p_{X,Y}. The classical entropy H(X,Y) of two random variables is defined to be the expectation value of the information content of the joint probability distribution. This is referred to as the joint entropy of the two random variables X and Y.

The conditional probability p_{X|Y=y}(x) is the probability of event x over the random variable X given that the event y for the random variable Y occurred. The conditional entropy H(X|Y = y) over the realization Y = y is the entropy of the conditional probability distribution p_{X|Y=y}. The conditional entropy H(X|Y) is the following (Definition 10.2.1 in [34]):

H(X|Y) = −Σ_x Σ_y p_{X,Y}(x, y) log(p_{X|Y=y}(x)).
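The definition above can be checked against the chain rule H(X|Y) = H(X,Y) − H(Y). A minimal sketch (helper names are ours; `p_xy[x, y]` is assumed to hold the joint table):

```python
import numpy as np

def joint_entropy(p):
    """H of any flattened probability table, -sum p log2 p with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float).ravel()
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def conditional_entropy(p_xy):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), with p_xy[x, y] the joint table."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_y = p_xy.sum(axis=0)                  # marginal distribution of Y
    h = 0.0
    for x in range(p_xy.shape[0]):
        for y in range(p_xy.shape[1]):
            if p_xy[x, y] > 0:
                h -= p_xy[x, y] * np.log2(p_xy[x, y] / p_y[y])
    return h
```

For independent variables H(X|Y) = H(X); for perfectly correlated variables H(X|Y) = 0.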

2.1.2 Density Matrices

Density operators are the analogue of probability distributions on quantum states. |ψ⟩⟨ψ| is the density operator with probability one that |ψ⟩ is obtained. Every density operator ρ can be written as a convex sum of pure states |ψ_i⟩⟨ψ_i| ∈ L(H). In other words, ρ = Σ_i λ_i |ψ_i⟩⟨ψ_i| such that λ_i ≥ 0 and Σ_i λ_i = 1. The space of all density matrices D is the convex hull of all quantum states in H, ie

D = conv{ |ψ⟩⟨ψ| : || |ψ⟩ || = 1 } = { Σ_i c_i |ψ_i⟩⟨ψ_i| : c_i ≥ 0, Σ_i c_i = 1 and || |ψ_i⟩ || = 1 }.

We have the following characterization of density operators which can be found in any standard quantum information book [34].

Theorem 2.1.2. An operator ρ ∈ L(H) is a density operator iff it is a trace-one, positive semidefinite Hermitian operator.

A density operator ρ that is rank one is a pure state, ie ρ = |ψ⟩⟨ψ|. A density operator that is rank k can be written as a convex sum of k pure states, and no fewer. A maximal-rank density operator ρ is one whose rank equals the dimension of H.
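The characterization in Theorem 2.1.2 is easy to test numerically. A minimal sketch (the helper name and tolerance are ours):

```python
import numpy as np

def is_density_matrix(rho, tol=1e-10):
    """Theorem 2.1.2: trace one, Hermitian, and positive semidefinite."""
    rho = np.asarray(rho, dtype=complex)
    trace_one = abs(np.trace(rho) - 1.0) < tol
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    psd = hermitian and np.min(np.linalg.eigvalsh(rho)) > -tol
    return bool(trace_one and hermitian and psd)
```

The maximally mixed state I/2 and the pure state |+⟩⟨+| pass; a trace-two or indefinite Hermitian matrix fails.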

2.1.3 Quantum/von Neumann Entropy

Let H be a k-dimensional Hilbert space whose unit vectors represent the quantum states that can model a single particle. Define a discrete random variable X : H → R over the Hilbert space such that only a finite set in H has non-zero measure.

This defines an ensemble of quantum states, ie a set {(p_X(i), |ψ_i⟩)} where p_X(i) is the probability of obtaining the quantum state |ψ_i⟩. The density matrix ρ of this ensemble can be defined to be the expectation value over all quantum states |ψ_i⟩, each represented as a bounded linear operator |ψ_i⟩⟨ψ_i| in L(H) (via the Riesz representation theorem), ie

ρ = E_X[|ψ_i⟩⟨ψ_i|] = Σ_i p_X(i) |ψ_i⟩⟨ψ_i|.

The von Neumann entropy S(ρ) of the density matrix ρ of this ensemble is defined to be the classical entropy of the probability distribution p_X, ie S(ρ) = −Σ_i p_X(i) log_2(p_X(i)). Similar to the classical entropy, this measures how "mixed/disordered" the ensemble is.

This construction is equivalent to the formal definition given below. Suppose one has a density matrix ρ inside the space of linear bounded operators L(H) on a Hilbert space H, ie a trace-one, positive semidefinite Hermitian matrix. The unitary diagonalization of Hermitian matrices is needed:

Theorem 2.1.3 (Unitary Diagonalization of Hermitian Matrices). Let X be a Hermitian matrix. Then X admits a spectral/unitary decomposition as X = UΣU †, where U is a unitary matrix and Σ is a diagonal matrix with real entries.

From this theorem, one has a diagonalization of the form ρ = UΣU†, where U is a unitary operator on H whose columns represent the eigenvectors |ψ_i⟩, and Σ is a diagonal matrix with eigenvalues λ_i. The trace of a density matrix is the sum of its eigenvalues, and being a positive semidefinite matrix requires that all eigenvalues are greater than or equal to zero. Hence, each eigenvalue λ_i of ρ is a number between zero and one, and the sum over all eigenvalues is one. Each λ_i exactly represents the probability of obtaining the eigenvector state |ψ_i⟩: the density matrix ρ represents the ensemble {(λ_i, |ψ_i⟩)} of quantum states, with probability λ_i of having |ψ_i⟩. The von Neumann entropy of ρ is then the classical entropy of the eigenvalue distribution obtained from its unitary diagonalization. The previous result motivates the following definition of quantum entropy.

Definition 2.1.1 (von Neumann entropy). Given a density matrix ρ in L(H) for some Hilbert space H, consider the diagonalization ρ = UΣU† of ρ, with eigenvalues λ_i.

8 The von Neumann entropy S : D → R over the space of all density matrices D is defined to be the classical entropy of its diagonalization,

S(ρ) = H({λi}). (2.1)

Alternatively, the von Neumann entropy can be defined using the trace-operator and the logarithm of the matrix as follows,

S(ρ) = −Tr(ρ log(ρ)), (2.2)

where the logarithm of a matrix, log(ρ), is defined via the unitary diagonalization of ρ as a matrix function. In other words, the matrix logarithm is defined as log(ρ) = log(UΣU†) = U log(Σ)U†, where UΣU† is the unitary diagonalization of ρ and log(Σ) is the diagonal matrix where the logarithm base two is applied to each diagonal entry. It is assumed that the logarithm of zero is zero, ie log(0) = 0.

The alternative definition (2.2) is equivalent to (2.1) by the following calculation.

S(ρ) = −Tr(ρ log(ρ))  (Definition)
= −Tr((UΣU†) log(UΣU†))  (Diagonalization)
= −Tr(UΣU† U log(Σ)U†)  (Definition of log)
= −Tr(UΣ log(Σ)U†)  (Definition of unitary)
= −Tr(U†UΣ log(Σ))  (Cyclic property of trace)
= −Tr(Σ log(Σ))  (Definition of unitary)
= −Σ_i λ_i log(λ_i)  (Trace is sum of eigenvalues)
= H({λ_i})
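The equivalence of the two definitions can be checked numerically. A sketch (helper names are ours; the matrix logarithm is built spectrally, with the log(0) := 0 convention):

```python
import numpy as np

def vn_entropy_spectral(rho):
    """Definition (2.1): the classical entropy H({lambda_i}) of the eigenvalues."""
    evals = np.linalg.eigvalsh(np.asarray(rho, dtype=complex))
    nz = evals[evals > 1e-12]
    return float(-np.sum(nz * np.log2(nz)))

def vn_entropy_tracelog(rho):
    """Definition (2.2): -Tr(rho log2(rho)), with log2 applied spectrally."""
    rho = np.asarray(rho, dtype=complex)
    evals, U = np.linalg.eigh(rho)
    logs = np.zeros_like(evals)
    mask = evals > 1e-12
    logs[mask] = np.log2(evals[mask])       # log(0) := 0 convention
    log_rho = U @ np.diag(logs) @ U.conj().T
    return float(np.real(-np.trace(rho @ log_rho)))
```

Both give 1 bit for the maximally mixed qubit state I/2 and 0 for a pure state.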

Similar to the classical entropy, the von Neumann entropy shares many of the same properties, proofs of which can be found in [34].

Theorem 2.1.4. The von Neumann entropy S : D → R over the space of density matrices D has the following properties for every density matrix ρ.

1. The von Neumann entropy is non-negative over all density matrices, S(ρ) ≥ 0.

2. The von Neumann entropy is concave: let σ be another density matrix and β ∈ [0, 1]; then

S(βρ + (1 − β)σ) ≥ βS(ρ) + (1 − β)S(σ).

3. The maximum of the von Neumann entropy occurs at the maximally mixed state I/d, where d is the dimension of H.

4. Consider a metric on D based on the trace norm,

||ρ − σ|| = Tr( √((ρ − σ)†(ρ − σ)) ).

The von Neumann entropy is continuous with respect to the topology on D induced from that metric.

The last point actually holds for all norms on the space of bounded linear operators L(H) of the Hilbert space H. This is because the Hilbert spaces considered here are finite-dimensional, and hence so is L(H). It is known that all norms are equivalent on finite-dimensional spaces (see Theorem 5.10.6 in [25]).

Joint Quantum Entropy

A quantum state representing n particles is described as a unit vector of the tensor product of single-particle Hilbert spaces, ie H ⊗ · · · ⊗ H := ⊗_{i=1}^n H := H^{⊗n}. Similarly, the n-particle density operator is described as a trace one, positive semidefinite and Hermitian operator in L(H^{⊗n}). Choosing a basis representation on H induces a basis representation on L(H^{⊗n}), from which one obtains an n-particle density matrix ρ_{1···n}.

Consider the following in order to obtain a description of k particles from an n-particle description. The partial trace Tr_j over the jth particle is defined to be a function from the n-particle description L(H^{⊗n}) to the (n − 1)-particle description L(⊗_{i=1, i≠j}^n H). The partial trace has the following action, where {|l⟩}_{l=1}^K is an orthonormal basis of the jth particle's K-dimensional Hilbert space H:

Tr_j(ρ_{1···n}) = Σ_{l=1}^K ( I^{⊗(j−1)} ⊗ ⟨l| ⊗ I^{⊗(n−j)} ) ρ_{1···n} ( I^{⊗(j−1)} ⊗ |l⟩ ⊗ I^{⊗(n−j)} ).
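The partial trace above can be sketched with a tensor reshape; this is a generic numerical implementation (the helper name and 0-indexing convention are ours):

```python
import numpy as np

def partial_trace(rho, dims, j):
    """Trace out the j-th subsystem (0-indexed) of a density matrix acting on
    a tensor product with local dimensions dims."""
    rho = np.asarray(rho, dtype=complex)
    n = len(dims)
    t = rho.reshape(list(dims) + list(dims))   # one axis per ket/bra factor
    t = np.trace(t, axis1=j, axis2=n + j)      # contract the j-th ket/bra pair
    d = int(np.prod([dims[i] for i in range(n) if i != j]))
    return t.reshape(d, d)
```

For a product state |0⟩⟨0| ⊗ I/2, tracing out the second factor recovers |0⟩⟨0|; for a Bell state, either reduced state is the maximally mixed state I/2.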

The joint quantum entropy of n particles is defined to be the quantum entropy of ρ_{1···n} as defined earlier. Furthermore, the conditional quantum entropy over k particles from n particles is defined to be the von Neumann entropy of Tr_{k···n}(ρ_{1···n}), where (n − k) particles are traced out. The following are properties of the joint quantum entropy; proofs can be found in [34].

Theorem 2.1.5 (Properties of Joint Quantum Entropy). Let H_A and H_B be two finite-dimensional Hilbert spaces.

1. Let ρ ∈ L(HA) and σ ∈ L(HB) be two density matrices. Then the von Neumann entropy S is additive on ρ ⊗ σ, ie

S(ρ ⊗ σ) = S(ρ) + S(σ).

2. Let |φ_AQ⟩ be a unit vector in H_A ⊗ H_Q. Consider its partial traces over system Q and system A, respectively:

ρA = T rQ(|φihφ|AQ)

ρQ = T rA(|φihφ|AQ)

Then S(ρA) = S(ρQ) while S(|φihφ|AQ) = 0.
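Property 2 can be verified numerically for the Bell state (an assumed example; the `entropy` helper and reshape-based partial traces are ours):

```python
import numpy as np

def entropy(rho):
    ev = np.linalg.eigvalsh(np.asarray(rho, dtype=complex))
    nz = ev[ev > 1e-12]
    return float(-np.sum(nz * np.log2(nz)))

# |phi>_{AQ} = (|00> + |11>)/sqrt(2), a pure state on H_A (x) H_Q
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_AQ = np.outer(phi, phi)

t = rho_AQ.reshape(2, 2, 2, 2)            # axes (a, q, a', q')
rho_A = np.trace(t, axis1=1, axis2=3)     # trace out Q
rho_Q = np.trace(t, axis1=0, axis2=2)     # trace out A
```

The joint state is pure, so S(|φ⟩⟨φ|_AQ) = 0, while both marginals equal I/2 and carry one bit of entropy each.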

2.2 Coherent Information

This section will first introduce and motivate the coherent information of a channel with respect to an input quantum state. The final subsection will then introduce the coherent information of a channel and show that it is always a non-negative quantity. Before doing so, the notions of a quantum channel, isometric extensions and their adjoints need to be formally addressed.

2.2.1 Quantum Channels

Denote H to be a finite-dimensional Hilbert space. Denote L(HA, HB) to be the space of bounded linear operators between HA and HB. This subsection is largely based on [34]. Quantum channels are a type of model for the evolution of density matrices.

Definition 2.2.1 (Axiomatic Definition of Quantum Channels). A map N : L(H_A) → L(H_B) is said to be a quantum channel if it satisfies the following:

1. It is linear, ie N(λρ + σ) = λN(ρ) + N(σ).

2. It is trace-preserving, ie Tr(N(ρ)) = Tr(ρ).

3. It is completely positive, ie the map (I_R ⊗ N) maps positive semidefinite operators to positive semidefinite operators for all reference systems R.

The next result gives a more concrete representation of a quantum channel.

Definition 2.2.2 (Kraus Representation of a Quantum Channel). A map N : L(H_A) → L(H_B) is a quantum channel iff there exist operators {A_i ∈ L(H_A, H_B)} such that

N(ρ) = Σ_i A_i ρ A_i†,

where ρ ∈ L(H_A) and Σ_i A_i† A_i = I, with I the identity map on H_A. Each operator A_i is called a Kraus operator. The minimal number of Kraus operators is called the Choi rank.
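The Kraus representation can be sketched with a small worked example. The dephasing channel N(ρ) = (1 − p)ρ + pZρZ is used here as an assumed illustration (it appears later in the thesis as the channel of section 5.3.1, but this particular code is ours):

```python
import numpy as np

# Dephasing channel: Kraus operators A0 = sqrt(1-p) I and A1 = sqrt(p) Z.
p = 0.25
Z = np.diag([1.0, -1.0])
kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]

def apply_channel(kraus_ops, rho):
    """N(rho) = sum_i A_i rho A_i^dagger."""
    rho = np.asarray(rho, dtype=complex)
    return sum(A @ rho @ A.conj().T for A in kraus_ops)

# Completeness sum_i A_i^dagger A_i = I is what guarantees trace preservation.
completeness = sum(A.conj().T @ A for A in kraus)
```

Applying the channel damps the off-diagonal elements by a factor (1 − 2p) while preserving the trace.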

The analogue of purification of a quantum state is the isometric extension.

Theorem 2.2.1 (Isometric Extension). Let N : L(H_A) → L(H_B) be a quantum channel. Let H_E be the environment Hilbert space, whose dimension is no higher than the Choi rank. The isometric extension of N is an isometry operator U : H_A → H_B ⊗ H_E (ie U†U = I) such that

N(ρ) = Tr_E(UρU†)

holds for all ρ ∈ L(H_A).

Alternatively, given a isometric extension U, then the Kraus operators Ai can be defined as

Ai := (IB ⊗|iihi|E)U, where IB is the identity operator on L(HB) and {|iiE} is a orthonormal

basis on L(HE).

The isometric extension can be used to define the complementary channel, which is a mapping of the system to the environment.

Definition 2.2.3 (Complementary Channel). Let N : L(H_A) → L(H_B) be a quantum channel. Let H_E be the environmental system whose dimension is no larger than the Choi rank of N. The complementary channel is a map N^c : L(H_A) → L(H_E) such that for any isometric extension U : H_A → H_B ⊗ H_E the following holds:

N^c(ρ) = Tr_B(UρU†)

for all ρ ∈ L(H_A).

The complementary channel has a simple form when the Kraus operators are {A_i}. Choose an isometric extension U = Σ_i A_i ⊗ |i⟩_E; then

N^c(ρ) = Tr_B(UρU†)
= Tr_B( Σ_ij (A_i ⊗ |i⟩_E) ρ (A_j† ⊗ ⟨j|_E) )
= Tr_B( Σ_ij (A_i ρ A_j†) ⊗ |i⟩⟨j|_E )
= Σ_k (⟨k| ⊗ I_E) ( Σ_ij (A_i ρ A_j†) ⊗ |i⟩⟨j|_E ) (|k⟩ ⊗ I_E)
= Σ_ij Tr(A_i ρ A_j†) |i⟩⟨j|_E.

Hence, in the basis representation |i⟩_E, the complementary channel is the matrix N^c(ρ) with entries N^c_{ij} = Tr(A_i ρ A_j†). This matrix is called the entropy exchange matrix (in this thesis, 'entropy exchange' will sometimes refer to the entropy of the entropy exchange matrix, S(N^c(ρ))).
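The entropy exchange matrix W_ij = Tr(A_i ρ A_j†) can be computed directly from the Kraus operators. A sketch for the dephasing channel at the maximally mixed state (an assumed example; names are ours):

```python
import numpy as np

p = 0.25
Z = np.diag([1.0, -1.0])
kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]

def entropy_exchange(kraus_ops, rho):
    """Complementary-channel output: W_ij = Tr(A_i rho A_j^dagger)."""
    k = len(kraus_ops)
    rho = np.asarray(rho, dtype=complex)
    return np.array([[np.trace(kraus_ops[i] @ rho @ kraus_ops[j].conj().T)
                      for j in range(k)] for i in range(k)])

W = entropy_exchange(kraus, np.eye(2) / 2)
```

Since Tr(W) = Tr(N(ρ)) = 1, the output is itself a density matrix on the environment; for this example the off-diagonal entries vanish because Tr(Z) = 0, giving W = diag(1 − p, p).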

Definition 2.2.4 (Adjoint of a quantum channel). Let N : L(H_A) → L(H_B) be a quantum channel. The adjoint N† of the channel is the unique linear map from L(H_B) to L(H_A) such that

⟨N†(σ)|ρ⟩ = ⟨σ|N(ρ)⟩

for all ρ ∈ L(H_A) and σ ∈ L(H_B).

The adjoint of a quantum channel N with Kraus operators {A_i} can readily be seen to be N†(σ) = Σ_i A_i† σ A_i. The adjoint of the complementary channel is a little harder to characterize.

Theorem 2.2.2 (Adjoint of the complementary channel). Let N : L(H_A) → L(H_B) be a quantum channel and let N^c : L(H_A) → L(H_E) be its complementary channel. Then the adjoint N^{c†} : L(H_E) → L(H_A) of N^c is

N^{c†}(σ) = Σ_ij ⟨i|σ|j⟩ A_i† A_j.

Proof. The proof will be done straight from the definition.

⟨σ|N^c(ρ)⟩ = Tr( σ† Σ_ij Tr(A_i ρ A_j†) |i⟩⟨j|_E )
= Σ_ij Tr(A_i ρ A_j†) Tr(σ† |i⟩⟨j|_E)  (trace is linear and Tr(A_i ρ A_j†) is a number)
= Σ_ij Tr(A_i ρ A_j†) ⟨j|σ†|i⟩  (cyclic property of trace)
= Tr( Σ_ij ⟨j|σ†|i⟩ A_i ρ A_j† )  (⟨j|σ†|i⟩ is a number; trace is linear)
= Tr( Σ_ij ⟨j|σ†|i⟩ A_j† A_i ρ )  (cyclic property of trace)
= Tr( ( Σ_ij ⟨i|σ|j⟩ A_i† A_j )† ρ )  (conjugating the coefficients)
= ⟨N^{c†}(σ)|ρ⟩.
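The adjoint relation of Theorem 2.2.2 can be spot-checked numerically on arbitrary Hermitian test operators, again using the dephasing Kraus operators as an assumed example (names are ours):

```python
import numpy as np

p = 0.25
Z = np.diag([1.0, -1.0])
kraus = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * Z]
k = len(kraus)

def comp_channel(rho):
    """N^c(rho)_{ij} = Tr(A_i rho A_j^dagger)."""
    return np.array([[np.trace(kraus[i] @ rho @ kraus[j].conj().T)
                      for j in range(k)] for i in range(k)])

def comp_adjoint(sigma):
    """N^{c dagger}(sigma) = sum_ij <i|sigma|j> A_i^dagger A_j."""
    return sum(sigma[i, j] * kraus[i].conj().T @ kraus[j]
               for i in range(k) for j in range(k))

rng = np.random.default_rng(7)
M = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
rho = M + M.conj().T                     # arbitrary Hermitian test operator
Ms = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
sigma = Ms + Ms.conj().T

lhs = np.trace(sigma.conj().T @ comp_channel(rho))   # <sigma | N^c(rho)>
rhs = np.trace(comp_adjoint(sigma).conj().T @ rho)   # <N^{c dagger}(sigma) | rho>
```

The two Frobenius inner products agree, as the theorem states.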

The next result shows what the adjoint of the channel is in terms of the isometric extension.

Theorem 2.2.3 (Adjoint Channel via the Isometric Extension). Let N : L(H_A) → L(H_B) be a quantum channel and let U : H_A → H_B ⊗ H_E be an isometric extension of N. Then the adjoint channel N† : L(H_B) → L(H_A) is

N†(σ) = U†(σ ⊗ I_E)U,

where I_E : H_E → H_E is the identity operator on H_E. Similarly, the adjoint of the complementary channel (N^c)† : L(H_E) → L(H_A) is

(N^c)†(σ) = U†(I_B ⊗ σ)U,

where σ ∈ L(H_E) and I_B is the identity operator on H_B.

Proof. Let σ ∈ L(H_B) and ρ ∈ L(H_A), and let {|i⟩} form an orthonormal basis for the environment system H_E. From Theorem 2.2.1, the channel satisfies N(ρ) = Tr_E(UρU†). Starting from the definition of the adjoint channel,

⟨N†(σ)|ρ⟩ = ⟨σ|N(ρ)⟩
= ⟨σ|Tr_E(UρU†)⟩
= Σ_i ⟨σ| (I_B ⊗ ⟨i|_E) UρU† (I_B ⊗ |i⟩_E) ⟩  (definition of partial trace)
= Σ_i ⟨U†(I_B ⊗ |i⟩_E) σ (I_B ⊗ ⟨i|_E)U | ρ⟩  (trace is cyclic and linear)
= ⟨U† Σ_i (σ ⊗ |i⟩⟨i|_E) U | ρ⟩  (σ is in L(H_B))
= ⟨U†(σ ⊗ I_E)U | ρ⟩.  (definition of the identity)

Hence, the adjoint channel is N†(σ) = U†(σ ⊗ I_E)U. The same proof, replacing the partial trace over system E with system B, shows the claim for the adjoint of the complementary channel.

2.2.2 Motivation

Consider the following scenario, first motivated in [26]. There are three systems: two internal systems, A and Q, which are accessible to the experimenter, and a third, external system E modeling the environment, which is inaccessible to the experimenter. Each one of these systems is modeled as a Hilbert space: H_A, H_Q and H_E, respectively. Lastly, consider a noisy quantum channel N_A : L(H_A) → L(H_A) that acts on system A, but not system Q. From Theorem 2.2.1, every quantum channel can be associated to an isometry operator U_AE : H_A ⊗ H_E → H_A ⊗ H_E acting on the larger system AE that includes the system A together with the environment system E [31].

Suppose initially the experimenter constructs a pure state |ψ⟩_AQ ∈ H_A ⊗ H_Q of systems A and Q, and suppose the environment starts off in a pure state |0⟩_E with respect to some basis element |0⟩ of H_E. The total initial description of all systems A, Q and E is |ψ⟩_AQ ⊗ |0⟩_E ∈ H_A ⊗ H_Q ⊗ H_E. This is reflected in Figure 2.1.

The coherent information of N with respect to |ψ⟩_AQ is a measure of how well the entanglement between system A and system Q from |ψ⟩_AQ is preserved when system A goes through the noisy channel N_A.

Figure 2.1: The initial model and assumptions of the coherent information. (The diagram shows |ψ_AQ⟩ and |0_E⟩ fed into the isometry U_AE, producing (I_Q ⊗ U_AE)(|ψ_AQ⟩ ⊗ |0_E⟩).)

Since von Neumann entropy is one way of measuring entanglement, the coherent information can be formulated by taking the von Neumann entropy of system A from |ψ_AQ⟩⟨ψ_AQ| after going through the noisy channel N_A (by taking the partial trace with respect to system Q), and comparing it to the von Neumann entropy of the AQ system |ψ_AQ⟩⟨ψ_AQ| after going through the channel (I_Q ⊗ N_A). Denoting the initial state of system A as ρ_A = Tr_Q(|ψ_AQ⟩⟨ψ_AQ|), it is given by the following formula,

S(N_A(ρ_A)) − S((I_Q ⊗ N_A)(|ψ_AQ⟩⟨ψ_AQ|)). (2.3)

Since one of the assumptions is that the environment starts off in a pure state, the entire system AQE is the pure state |ψ⟩_AQ ⊗ |0⟩_E. This allows the use of the von Neumann entropy property from Theorem 2.1.5. Hence, the entropy of the last term of formula 2.3 can be written as:

S((I_Q ⊗ N_A)(|ψ_AQ⟩⟨ψ_AQ|)) = S(Tr_A(U_AE(ρ_A ⊗ |0_E⟩⟨0_E|)U_AE†)).

The term Tr_A(U_AE(ρ_A ⊗ |0_E⟩⟨0_E|)U_AE†) is precisely the definition of the complementary channel N^c, where N^c is the mapping from L(H_A) to the environment system L(H_E) with the action ρ_A ↦ Tr_A(U_AE(ρ_A ⊗ |0_E⟩⟨0_E|)U_AE†) (see definition 2.2.3). Thus, formula 2.3 can be written in the more compact form:

S(N_A(ρ_A)) − S(N^c(ρ_A)). (2.4)

Remark 1. It is worthwhile to note that the coherent information does not depend on the choice of purification of system A into system Q. Furthermore, it does not depend on the choice of isometric extension of the channel N_A: the complementary channel shown here was constructed from a particular isometric extension U_AE, but the resulting coherent information is invariant under that choice [34].

2.2.3 Coherent Information of a channel

The previous discussion showed that, based on the assumptions of the model, there are two equivalent formulas for the coherent information of a channel with respect to system A. One is based on density matrices on system A and the other on purifications of system A with reference system Q. Both were shown to be equivalent to one another and are presented below as formal definitions.

Definition 2.2.5 (Coherent Information of a channel at a density matrix). The coherent information Ic(N, ρ) of a channel N : L(H_A) → L(H_B) mapping system A to system B at the density matrix ρ is defined to be

Ic(N, ρ) = S(N(ρ)) − S(N^c(ρ)),

where N^c : L(H_A) → L(H_E) is the complementary channel mapping system A to the environment system E.
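Definition 2.2.5 can be evaluated numerically from a Kraus representation {A_i}, using the matrix form of the complementary channel, N^c(ρ) = Σ_ij Tr(A_i ρ A_j†)|i⟩⟨j|_E, from definition 2.2.3. This is a sketch; the amplitude-damping Kraus operators and γ = 0.2 are illustrative choices:

```python
import numpy as np

def entropy(rho):
    # von Neumann entropy S(rho) = -Tr(rho log2 rho), via eigenvalues
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def coherent_info(kraus, rho):
    """I_c(N, rho) = S(N(rho)) - S(N^c(rho)) from a Kraus representation.

    N(rho)   = sum_i A_i rho A_i^dagger
    N^c(rho) = sum_ij Tr(A_i rho A_j^dagger) |i><j|_E
    """
    out = sum(A @ rho @ A.conj().T for A in kraus)
    comp = np.array([[np.trace(Ai @ rho @ Aj.conj().T) for Aj in kraus]
                     for Ai in kraus])
    return entropy(out) - entropy(comp)

# Illustrative channel: amplitude damping with gamma = 0.2
g = 0.2
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
         np.array([[0, np.sqrt(g)], [0, 0]])]

rho_mix = np.eye(2) / 2
print(coherent_info(kraus, rho_mix))       # strictly positive here

rho_pure = np.array([[1.0, 0.0], [0.0, 0.0]])
print(coherent_info(kraus, rho_pure))      # zero for any pure input
```

The pure-input value is zero because UρU† is then pure, so the output and environment entropies coincide; this is exactly the computation in the proof of the nonnegativity of Ic(N) below.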

Definition 2.2.6 (Coherent Information of pure states). The coherent information Ic(N, |ψ⟩_AQ) of a channel N : L(H_A) → L(H_B) mapping system A to system B at the pure state |ψ⟩_AQ of system A with purification system Q is defined to be:

Ic(N, |ψ⟩_AQ) = S(N(Tr_Q(|ψ⟩⟨ψ|_AQ))) − S((I_Q ⊗ N)(|ψ⟩⟨ψ|_AQ)).

The space of all density matrices D forms a compact, connected set. The coherent information Ic(N, ·), being a difference of continuous functions (Theorem 2.1.4), is also continuous. By the extreme value theorem and the intermediate value theorem, the image of Ic(N, ·) is respectively a compact and a connected set in R. Hence, the image forms a closed interval and the minimum/maximum exist. This motivates the next definition, which uses the maximum of the coherent information.

Definition 2.2.7 (Coherent Information of a Channel). Given a quantum channel N : L(H_A) → L(H_B) from system A to system B, let N^c : L(H_A) → L(H_E) denote the complementary channel of N to the environment system E.

The coherent information of a channel Ic(N) is defined to be the maximum of the coherent information of the channel over all density matrices:

Ic(N) = max_{ρ∈D} Ic(N, ρ) = max_{ρ∈D} [S(N(ρ)) − S(N^c(ρ))].

The space of unit vectors in H_A ⊗ H_Q also forms a compact set. Hence, the same argument shows that maximization over unit vectors using definition 2.2.6 is equivalent to maximization over density matrices (definition 2.2.7). The coherent information of a channel provides a way of measuring the maximum amount of entanglement that is preserved between two systems after passing through the channel. The important properties that motivate its use as a key measure in quantum information theory will be discussed in a later section. Here, one property is presented.

Corollary 2.2.1 (Coherent Information of a channel is nonnegative). The coherent information of a channel N : L(H_A) → L(H_B) is always nonnegative:

Ic(N) ≥ 0.

Proof. This proof shows that if the density matrix is a pure state, then the coherent information is zero. This matches the intuition of coherent information measuring how well the channel preserves entanglement: if system A is in a pure state, then the system AQ is also in a pure state, and hence there is no entanglement to begin with.

Let ρ be a density matrix in L(H_A) representing a pure state, i.e., ρ_A = |ψ⟩⟨ψ|_A where |ψ⟩_A ∈ H_A.

A purification of a density state ρ ∈ L(H_A) is a pure state |φ⟩_AR ∈ H_A ⊗ H_R for some Hilbert space H_R such that Tr_R(|φ⟩⟨φ|_AR) = ρ. Consider the simple purification of |ψ⟩⟨ψ| of system A into systems A and Q given by |0⟩_Q ⊗ |ψ⟩_A ∈ H_Q ⊗ H_A.

According to definition 2.2.6, the coherent information with respect to |0⟩_Q ⊗ |ψ⟩_A is zero:

Ic(N, |0⟩_Q ⊗ |ψ⟩_A) = S(N(ρ_A)) − S(|0⟩⟨0|_Q ⊗ N(|ψ⟩⟨ψ|_A))
                     = S(N(|ψ⟩⟨ψ|_A)) − S(|0⟩⟨0|_Q) − S(N(|ψ⟩⟨ψ|_A))   (Theorem 2.1.5)
                     = S(N(|ψ⟩⟨ψ|_A)) − S(N(|ψ⟩⟨ψ|_A))                 (pure states have zero entropy)
                     = 0.

Since the coherent information of a channel is defined as a maximum and pure states always give a value of zero, the value of the coherent information at the maximum is at least zero.

2.3 Properties

One property of the coherent information of a channel was shown earlier: it is always bounded below by zero. This section introduces other properties that coherent information possesses.

2.3.1 Data Processing and Error-Correction

Coherent information was initially proposed in the 1996 paper [26] due to two nice properties: the data-processing inequality and its role as a measure of error-correction. Motivated by classical communication theory, the data-processing inequality arises from the following scenario. Suppose there are three systems A, B and C, and two channels between them, N1 : A → B and N2 : B → C. Consider their concatenation:

A →(N1) B →(N2) C.

Any measure f of information passing through the channels must satisfy that concatenating additional channels cannot increase the amount of information going through, i.e.,

f(N2 ◦ N1) ≤ f(N1).

The authors in [26] showed that coherent information of a channel Ic(N , ·) has this property.

Theorem 2.3.1 (Data Processing inequality). Given two quantum channels N1 : L(H_A) → L(H_B) and N2 : L(H_B) → L(H_C) between systems A, B and C, the coherent information of the channel N2 ◦ N1 is less than or equal to the coherent information of N1 for all density operators ρ ∈ L(H_A):

Ic(N2 ◦ N1, ρ) ≤ Ic(N1, ρ).

The final property shown in [26] is the error-correction property. Let N : L(H_A) → L(H_B) be a quantum channel and suppose there is an additional decoding channel D : L(H_B) → L(H_A). Error-correction occurs on the density state ρ in system A if the decoding channel can recover ρ after going through N, i.e., (D ◦ N)(ρ) = ρ. Coherent information was shown to characterize perfect error-correction.

Theorem 2.3.2 (Perfect-Error Correction). Let N : L(H_A) → L(H_B) and D : L(H_B) → L(H_A) be two quantum channels and let ρ be a density operator in L(H_A). Error-correction occurs, (D ◦ N)(ρ) = ρ, iff the coherent information of the channel D ◦ N with respect to the density operator ρ is equal to the von Neumann entropy of ρ:

Ic(D ◦ N, ρ) = S(ρ).

Furthermore, coherent information of any channel N with respect to any density state ρ is upper bounded by the entropy of ρ:

Ic(N , ρ) ≤ S(ρ).
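A minimal numerical check of Theorem 2.3.2 in the trivial case where D inverts N exactly, so that D ◦ N is the identity channel: its single Kraus operator is I, its complementary channel outputs a one-dimensional (zero-entropy) environment, and the coherent information equals S(ρ). The state ρ below is an arbitrary illustrative choice:

```python
import numpy as np

def entropy(rho):
    # von Neumann entropy in bits
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log2(w)).sum())

def coherent_info(kraus, rho):
    # I_c = S(N(rho)) - S(N^c(rho)), with N^c in its matrix form
    out = sum(A @ rho @ A.conj().T for A in kraus)
    comp = np.array([[np.trace(Ai @ rho @ Aj.conj().T) for Aj in kraus]
                     for Ai in kraus])
    return entropy(out) - entropy(comp)

# If D undoes N exactly, D∘N is the identity channel with Kraus set {I};
# its complementary output is the 1x1 matrix [Tr(rho)] = [1], of entropy 0.
identity_kraus = [np.eye(2)]
rho = np.array([[0.7, 0.2], [0.2, 0.3]])   # an arbitrary mixed state
assert np.isclose(coherent_info(identity_kraus, rho), entropy(rho))
```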

2.3.2 Quantum Capacity

One of the most important applications of coherent information is its role in the computation of the quantum channel capacity. Informally, this quantity is the amount of "quantum information" that can be transmitted through a given channel N. It is denoted as Q(N). This definition can be found in [9].

Definition 2.3.1 (Quantum Capacity). Let N : L(H_A) → L(H_B) be a quantum channel. An (n, ε) code is defined to be an encoder E : L(H) → L(H_A^⊗n) and a decoder D : L(H_B^⊗n) → L(H) such that

min_{|φ⟩∈H} F(|φ⟩, (D ◦ N^⊗n ◦ E)(|φ⟩⟨φ|)) ≥ 1 − ε,

where F(ρ, σ) = Tr(√(√ρ σ √ρ))² is the fidelity of two density states ρ and σ in L(H). The rate of the code is R = (1/n) log dim(H). The quantum capacity Q(N) is defined to be the supremum over all achievable rates at which qubits can be reliably transmitted through the channel.

It was shown in [22], [28] and [11], in what is known as the (Lloyd, Shor, Devetak) LSD coding Theorem, that the regularized coherent information can be used to calculate the quantum capacity. It states that the quantum capacity Q(N) is the limit of the regularized coherent information Ic(N^⊗n)/n as the number of channel uses n goes to infinity:

Q(N) = lim_{n→∞} Ic(N^⊗n)/n = lim_{n→∞} max_{ρ∈D^⊗n} Ic(N^⊗n, ρ)/n.

Unfortunately, evaluating this expression is difficult and it is known that it does not have a simple closed form. In 2008, it was shown that the parallel concatenation of two channels N1 ⊗ N2 can have positive quantum capacity Q(N1 ⊗ N2) even when the individual capacities Q(N1) and Q(N2) are both zero [30]. It was shown in the paper [7] in 2015 that it can take an unbounded number of channel uses just to detect positive quantum capacity. One particular explanation of the difficulty of evaluating the quantum capacity based on the single-letter coherent information is the notion of superadditivity.

Definition 2.3.2 (Superadditivity). Coherent information is said to be superadditive for channel use n if

Ic(N^⊗n) > n Ic(N).

Given the difficulty of evaluating quantum capacity, much of the discussion has shifted to understanding superadditivity and when it occurs (see [19] and references within). Superadditivity was shown to be related to degenerate quantum codes [29].

2.3.3 Lipschitz

A function f : (X, d_X) → (Y, d_Y) between two metric spaces is said to be Lipschitz with Lipschitz constant C iff there exists a real constant C such that for all points x and y in X,

d_Y(f(x), f(y)) ≤ C · d_X(x, y).

Lipschitz functions are particularly important when the metric space X is the hyper-sphere. This is due to Levy's lemma, stated in the book [3].

Lemma 2.3.3 (Levy's Lemma). Let f : S^{n−1} → R be a real-valued function on the unit hyper-sphere S^{n−1} := {x ∈ R^n : ||x|| = 1}, and suppose f is Lipschitz with Lipschitz constant K. Then for every ε > 0, the probability over the hyper-sphere S^{n−1} of f(x) deviating from its average ⟨f⟩ is exponentially upper-bounded:

P_{x∈S^{n−1}}[|f(x) − ⟨f⟩| > ε] ≤ 4 e^{−ε²/(2K²)}.
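The concentration phenomenon behind Levy's lemma is easy to observe by sampling. The sketch below uses the 1-Lipschitz function f(x) = x₁ on S^{n−1}, whose average is zero by symmetry; the dimension, sample count and ε are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples, eps = 400, 5000, 0.25

# Normalized Gaussian vectors are uniform on the sphere S^{n-1}
x = rng.standard_normal((samples, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)

# f(x) = x_1 is 1-Lipschitz; its coordinate variance is 1/n, so for
# large n almost every sample lies within eps of the mean 0.
f = x[:, 0]
deviation_prob = np.mean(np.abs(f) > eps)
print(deviation_prob)   # vanishingly small for n = 400
```

This is only an empirical illustration of the concentration of measure; the lemma's constants are not being tested here.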

The coherent information of a channel with respect to pure states is a Lipschitz function with Lipschitz constant 1 and thus satisfies Levy's lemma. It was shown in the book [34] that coherent information is Lipschitz with Lipschitz constant 1 with respect to density states. Since all norms are equivalent on finite-dimensional vector spaces, it is easy to show that the distance function on H can be upper bounded by the distance function on D. Levy's lemma then implies that random initial guesses will concentrate around the mean. This implies that random initial guesses are not sufficient for optimization of coherent information, and that better parameterization methods are required.

2.3.4 Conjecture: Optima on Positive Definite Density Matrices

The majority of this thesis relies on the following conjecture, which at the time could not be resolved for simple non-qubit channels. The coherent information of a channel Ic(N), if non-zero, is conjectured by the author to achieve its maximum on the interior of the space of density matrices D for most quantum channels. The interior is precisely the space of positive definite density matrices, i.e., the density matrices whose eigenvalues are all strictly positive. This trivially holds for qubit channels, as the only two possible ranks are rank 1 and rank 2, where rank 1 always gives zero coherent information (see corollary 2.2.1). The intuition is that for most general quantum channels, the more entangled systems A and Q are, the more information can be sent through the channel (following Theorem 2.3.2). Regardless of whether the conjecture is true, the coherent information is a continuous function and the set of positive definite density matrices is dense in the set of all density matrices. In other words, denoting D++ as the space of positive definite density matrices,

max_{ρ∈D} Ic(N, ρ) = sup_{ρ∈D++} Ic(N, ρ).

2.4 Known Results

The following is a list of results on the computation of coherent information and quantum capacity for various types of quantum channels. In addition, properties of Pauli matrices and representations of qubit channels are presented.

2.4.1 Amplitude-Damping Channel

The amplitude-damping channel N with damping parameter γ > 0 has the following Kraus operators:

A_0 = [[1, 0], [0, √(1−γ)]],    A_1 = [[0, √γ], [0, 0]].

The single-letter coherent information Ic(N) of the amplitude-damping channel is found in [34] to be

Ic(N) = max_{r∈[0,1]} [H(r(1−γ)) − H(rγ)],

where H(x) = −x log(x) − (1−x) log(1−x) is the binary entropy. The quantum capacity of the amplitude-damping channel is stated in the survey paper [15]; it is in fact equal to the single-letter coherent information and for completeness is

Q(N) = max_{r∈[0,1]} [H(r(1−γ)) − H(rγ)].
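The one-parameter maximization can be carried out by a simple grid search, with the convention Ic(N) = max_r [H(r(1−γ)) − H(rγ)] (nonnegative for γ ≤ 1/2) and H the binary entropy in bits. A sketch; the grid resolution is an arbitrary choice:

```python
import numpy as np

def H(x):
    # binary entropy in bits, with the convention H(0) = H(1) = 0
    x = np.clip(np.asarray(x, dtype=float), 1e-300, 1 - 1e-16)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def ic_amplitude_damping(gamma, grid=200001):
    # maximize H(r(1-gamma)) - H(r*gamma) over r in [0, 1]
    r = np.linspace(0.0, 1.0, grid)
    return float(np.max(H(r * (1 - gamma)) - H(r * gamma)))

print(ic_amplitude_damping(0.0))   # 1.0: noiseless limit, one qubit per use
print(ic_amplitude_damping(0.5))   # 0.0: the capacity vanishes at gamma = 1/2
```

The grid search is adequate here because the objective is a smooth function of a single variable on [0, 1].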

2.4.2 Pauli Channel

The Pauli channel N with parameters p_x, p_y, p_z, and writing p_I = 1 − p_x − p_y − p_z, has the following Kraus operators:

A_0 = √p_I [[1, 0], [0, 1]],  A_1 = √p_x [[0, 1], [1, 0]],  A_2 = √p_y [[0, −i], [i, 0]],  A_3 = √p_z [[1, 0], [0, −1]].

The individual matrices, without the square-root coefficients and excluding the identity matrix, are known as the Pauli matrices and are denoted respectively as {X, Y, Z}. The next theorem lists properties of the Pauli matrices.

Theorem 2.4.1 (Pauli Matrices and Properties). The Pauli matrices {X, Y, Z}, a subset of the space of 2 × 2 complex matrices C^{2×2}, have the following properties: 1. The trace of each Pauli matrix is zero, i.e., Tr(X) = Tr(Y) = Tr(Z) = 0.

2. Each Pauli matrix is unitary.

3. Each Pauli matrix is Hermitian.

4. The set {I, X, Y, Z} forms a basis for the real vector space H_2 of all 2 × 2 Hermitian matrices. Furthermore, the set {X, Y, Z} forms a basis for the real subspace of all trace-zero Hermitian matrices, denoted H⁰_2.

5. The set {I, X, Y, Z}^⊗n := {P_{i_1} ⊗ ··· ⊗ P_{i_n} : P_{i_k} ∈ {I, X, Y, Z}} forms a basis for the space H_{2^n} := ⊗_{i=1}^n H_2 of 2^n × 2^n Hermitian matrices. Similarly, the non-identity elements {I, X, Y, Z}^⊗n \ {I^⊗n} form a basis for the trace-zero subspace H⁰_{2^n}.

Proof. Only the proof of (4) will be shown, due to its importance in this thesis. The proof of (5) follows from (4) and the definition of the tensor product. It should be clear that {I, X, Y, Z} is a subset of H_2.

By definition, every 2 × 2 Hermitian matrix M can be written as [[a, b − ic], [b + ic, d]] for real-valued coefficients a, b, c, d ∈ R. To show that {I, X, Y, Z} is a basis, one must find real-valued coefficients c_0, c_1, c_2, c_3 ∈ R such that M = c_0 I + c_1 X + c_2 Y + c_3 Z. This is equivalent to solving the linear system

c_0 + c_3 = a,  c_0 − c_3 = d,  c_1 = b,  c_2 = c,

which has the unique solution c_3 = (a − d)/2, c_0 = (a + d)/2, c_1 = b and c_2 = c. This shows that {I, X, Y, Z} is a basis for H_2.

For the trace-zero matrices H⁰_2, the trace map Tr : H_2 → R is a linear functional taking the sum of the diagonal entries of a matrix. The kernel of this linear map is a subspace of H_2 and is precisely H⁰_2. Furthermore, the image is a subspace of R and can be at most dimension 1; any matrix with non-zero trace shows that the rank of the image im(Tr) is one. Hence, by the rank-nullity theorem, the dimension of H⁰_2 is dim(H_2) − dim(im(Tr)) = 4 − 1 = 3. Since {X, Y, Z} is a linearly independent subset of H⁰_2 of size three, it forms a basis for H⁰_2.
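The basis property in (4) gives an explicit expansion: since Tr(P_j P_k) = 2δ_jk for P_j, P_k ∈ {I, X, Y, Z}, the coefficients of a Hermitian M are c_k = Tr(P_k M)/2, and they are automatically real. A short numerical illustration (the random matrix is an arbitrary choice):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

# A random 2x2 Hermitian matrix
rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
M = A + A.conj().T

# Orthogonality Tr(P_j P_k) = 2*delta_jk gives c_k = Tr(P_k M) / 2,
# real exactly because M is Hermitian.
c = [np.trace(P @ M) / 2 for P in paulis]
assert all(abs(ck.imag) < 1e-12 for ck in c)
recon = sum(ck.real * P for ck, P in zip(c, paulis))
assert np.allclose(recon, M)

# A traceless Hermitian matrix needs only {X, Y, Z}: its I-coefficient is 0
M0 = M - (np.trace(M) / 2) * I2
assert abs(np.trace(I2 @ M0) / 2) < 1e-12
```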

It is stated in [34] that the quantum capacity of a Pauli channel is lower bounded by the hashing bound, which is 1 − H(p_I, p_x, p_y, p_z) = 1 + p_I log(p_I) + p_x log(p_x) + p_y log(p_y) + p_z log(p_z).

2.4.3 Dephrasure Channel

Let H_2 be a two-dimensional Hilbert space and H_3 a three-dimensional Hilbert space, and let p, q be two numbers between 0 and 1/2. The dephrasure channel N : L(H_2) → L(H_3)

has the following action

ρ ↦ (1 − p)(1 − q)ρ + p(1 − q)ZρZ + q Tr(ρ)|e⟩⟨e|,

where |e⟩⟨e| is orthogonal to the image of H_2 under the embedding i : H_2 ↪ H_3 of H_2 into H_3. It has the following Kraus operators:

1 0 1 0  0 0 √ p   p     A0 = (1 − p)(1 − q) 0 1 ,A1 = p(1 − q) 0 −1 ,A2 = q 0 0 , 0 0 0 0 1 0

0 0 √   A3 = q 0 0 . 0 1 The coherent information and super-additivity were discussed in [19].

The authors in [19] have shown that the single-letter coherent information Ic(N) of the dephrasure channel is maximized by the maximally mixed state when the channel parameters p, q satisfy

q < [1 − 2p − 2p(1−p) ln((1−p)/p)] / [2 − 4p − 2p(1−p) ln((1−p)/p)].

Moreover, the region of channel parameters p, q where the single-letter coherent information is zero satisfies

q > (1 − 2p)² / (1 + (1 − 2p)²).

2.4.4 Degradable and Antidegradable Quantum Channels

A quantum channel N is said to be degradable if there exists a quantum channel D such that the complementary channel can be recovered as N^c = D ◦ N. A quantum channel is said to be antidegradable if the complementary channel N^c is degradable. The importance of degradable channels is due to the fact that the coherent information becomes additive, Ic(N^⊗n) = n Ic(N) [10]. Further, it becomes a concave function [37]. The quantum capacity is then just the single-shot coherent information. Additionally, antidegradable quantum channels have zero quantum capacity. It was shown in [36] that qubit channels with Choi rank less than or equal to two are either degradable or antidegradable. It was further shown in [8] that no channel with qubit output whose Choi rank is larger than two can be degradable.

Chapter 3

Introduction to Manifold Theory and The Gradient/Hessian

Calculus on nonlinear spaces requires the theory of smooth manifolds. This chapter introduces the preliminary concepts of manifold theory needed to cast the coherent information of a quantum channel as a differentiable function on a smooth manifold. Unfortunately, the space of interest, the set of all density matrices, does not have a smooth manifold structure. However, it will be shown that its interior, the set of positive definite density matrices, does have a manifold structure and that the von Neumann entropy is a smooth function defined on it. In the final section, the Riemannian gradient and Hessian will be discussed and applied to the von Neumann entropy. First, the concepts from vector calculus are generalized to smooth manifolds. Section 3.1 introduces the basic concepts of manifold theory, i.e., the tangent space, differential, submanifolds and Riemannian manifolds. Section 3.2 introduces the Riemannian gradient and Hessian. A theoretical reference for manifold theory is [20] and an applied reference, catered to matrix manifolds, is [1]. It is always assumed that the manifolds and vector spaces here are finite dimensional. The main theorems of this section show that the space of positive definite matrices M++ is diffeomorphic to the vector space of Hermitian matrices H. It is then shown that the space of positive definite density matrices D++ is a submanifold of M++, and its tangent structure is characterized. It will be shown that the gradient and Hessian can be computed in the Euclidean space H using standard matrix calculus techniques and then projected to become the gradient on D++.

3.1 Preliminaries to Manifold Theory

One of the purposes of smooth manifold theory is to generalize calculus from vector spaces to nonlinear surfaces. There are two classes of manifolds, topological and smooth manifolds. Both will be discussed in this section, with emphasis placed on smooth manifolds and smooth functions between manifolds. After discussing examples of manifolds, the tangent space of a smooth manifold at a point will be introduced. This is the best linear approximation to the manifold at that point and is defined to be the set of all directional derivatives centered at that point. This allows one to define the differential of a function between manifolds as a mapping between tangent spaces that models the change of that function along various directions. Finally, different types of smooth manifolds, such as embedded submanifolds and Riemannian manifolds, will be introduced.

3.1.1 Manifolds

Here, the notions of a topological manifold and a smooth manifold will be introduced. Both are, in a general sense, spaces that locally "look" like Euclidean space.

Topological Manifolds

There are different definitions of what constitutes a manifold. The intuitive and easy-to-construct definition via atlases and charts will be used here. A particular example to keep in mind is the earth embedded into three-dimensional space. A chart is intuitively a portion of the earth drawn onto a two-dimensional map, and a collection of charts that covers the earth is an atlas. Note that a portion of the earth can be covered by more than one chart. From a local perspective the earth looks linear, but from the viewpoint of space the earth is nonlinear. Topological manifolds are always assumed to be Hausdorff and second countable; these conditions guarantee that the space is locally metrizable.

Definition 3.1.1 (Topological Manifolds). A topological space (X, τ) that is Hausdorff and second countable is a topological manifold if it locally "looks" like Euclidean space, i.e., for every point x in X there exists an open set U around x and a homeomorphism φ_U from U to the Euclidean space R^m.

The tuple (U, φ_U) is called a (coordinate) chart on X. The homeomorphism φ_U : U ⊆ X → R^m is called the coordinate map on the domain U. Note that every coordinate map can be written as a concatenation φ_U(p) = (x¹(p), ···, x^m(p)) of component functions x^i : U → R. The collection of charts {(U_i, φ_{U_i})} such that the set {U_i | U_i is open in X} covers the topological space X, i.e., X = ∪_i U_i, is called the atlas of the topological manifold X.

In summary, an n-dimensional topological manifold (X, {(U_i, φ_{U_i})}) can be defined to be a Hausdorff, second countable topological space X such that X = ∪_i U_i, where each U_i is open and φ_{U_i} is a homeomorphism of U_i to R^n.

Smooth Manifolds

Recall from calculus that a function between Euclidean spaces

f : R^m → R^n : x ↦ (f₁(x), ···, f_n(x))

is a smooth function if for each point x there is an open set U around x such that all of the component functions f_i have continuous partial derivatives ∂^n f_i / (∂x_{i_1} ··· ∂x_{i_n}) for every positive integer n. The space of all smooth functions f : R^m → R^n between Euclidean spaces of dimensions m and n is denoted C^∞(R^m, R^n). To define an analogous notion on manifolds, being smooth at a point needs to be independent of the chart that surrounds the point. This is equivalent to requiring that the function

φ_U ◦ φ_V^{−1} : φ_V(U ∩ V) ⊆ R^n → φ_U(U ∩ V) ⊆ R^n

is a smooth function between Euclidean spaces, where (U, φ_U) and (V, φ_V) are any two charts around the point x. Two charts that satisfy this requirement are said to be smoothly compatible, and an atlas whose charts are pairwise smoothly compatible is said to be a smooth atlas. This motivates the definition of smooth manifolds.

Definition 3.1.2 (Smooth Manifolds). A topological manifold (M, {(U, φ_U)}) is a smooth manifold if the atlas A = {(U, φ_U)} is a smooth atlas and it is maximal, in the sense that there is no other smooth atlas A′ for M that contains A. A smooth, maximal atlas is said to induce a smooth structure on the manifold.

Examples of Smooth Manifolds

Example 1 (Euclidean Space). Consider the Euclidean space (R^m, ||·||₂) with the standard l₂-norm. Since it is a separable metric space, it is second countable and Hausdorff. A trivial chart to use is the identity map Id_{R^m} : R^m → R^m : x ↦ x. This is clearly a homeomorphism, and thus {(R^m, Id_{R^m})} forms an atlas for R^m with only one chart. An atlas with a single chart trivially satisfies being smoothly compatible. Denote A to be the set of all charts that are smoothly compatible with Id_{R^m}. This shows that (R^m, A) is an m-dimensional smooth manifold.

Similarly, the space C^m with the l₂-norm can be identified with R^{2m} with the l₂-norm using the mapping a + ib ↦ (a, b) applied coordinatewise. This shows that C^m is a 2m-dimensional smooth manifold.

The previous example can be generalized to any normed finite-dimensional vector space (V, ||·||). This can be seen by noting that any normed finite-dimensional vector space is isomorphic to Euclidean space with the standard Euclidean norm (l₂-norm). Denote by e_i the vector in R^m whose ith coordinate is one and whose remaining coordinates are zero, i.e., e_i = (0, ···, 1, ···, 0). The set {e_i}_{i=1}^m is known as the standard basis of R^m.

Example 2 (Normed Finite-Dimensional Vector Space). Consider a normed finite-dimensional vector space (V, ||·||) with M basis vectors {l_i}_{i=1}^M. Every vector v can be written as a linear combination of basis vectors, v = Σ_{i=1}^M c_i l_i. The vector (c₁, ···, c_M) of coefficients is known as the coordinate vector of v. Denote by L : V → R^M the map sending any vector v ∈ V to its coordinate vector. The map L is clearly linear, maps the basis set {l_i} to the standard basis set {e_i}, and has trivial kernel. By the rank-nullity theorem, L is an isomorphism. Furthermore, the linear isomorphism L is a homeomorphism. This can be seen by using the operator norm ||L||_∞ := sup_{||v||=1} ||L(v)||₂ and the following inequality:

||L(v)||₂ ≤ ||L||_∞ ||v||_V. Using this inequality, the ε-δ definition of continuity can be satisfied:

||L(v) − L(w)||₂ = ||L(v − w)||₂ ≤ ||L||_∞ ||v − w||_V ≤ ||L||_∞ δ =: ε.

Thus, the basis set {l_i} induces an atlas A giving a smooth manifold structure on V of dimension M. Similarly, a different basis {m_i}_{i=1}^M will induce a smooth structure Ã. This is the same smooth structure as the one induced from A, due to the fact that the change-of-coordinates formula L̃ ◦ L^{−1} is clearly infinitely differentiable: writing l_i = Σ_j d_j^i m_j,

L̃ ◦ L^{−1}(c₁, ···, c_M) = L̃(Σ_i c_i l_i) = L̃(Σ_i c_i Σ_j d_j^i m_j) = L̃(Σ_ij c_i d_j^i m_j) = (Σ_i c_i d_1^i, ···, Σ_i c_i d_M^i).

A finite-dimensional normed vector space is called a linear manifold. The next example shows another linear manifold, which is the main vector space of this thesis. Recall from linear algebra that a linear transformation L ∈ L(V, W) between two finite-dimensional complex inner product spaces (V, ⟨·|·⟩_V) and (W, ⟨·|·⟩_W) induces a matrix representation M and a corresponding adjoint matrix M† : W → V such that for all vectors w in W and v in V,

⟨M†(w)|v⟩_V = ⟨w|M(v)⟩_W.

A matrix M is said to be Hermitian if it is a square matrix whose adjoint equals itself, M† = M. The set of all n × n Hermitian matrices forms a vector space over the field of real numbers with dimension n². However, it does not form a vector space over the field of complex numbers.

Example 3 (Set of Hermitian Matrices). Denote by H the vector space of all Hermitian matrices on C^n, i.e., H := {X ∈ L(C^n, C^n) | X† = X}. This is a linear manifold of dimension n². An inner product structure can be added on H by defining the Frobenius inner product ⟨A|B⟩ = Tr(AB).

The following is an example of a nonlinear manifold and an example of a submanifold, which will be discussed later in subsection 3.1.6. Recall that if a function f : R^n → R^m between Euclidean spaces is infinitely differentiable, then for any open set U of R^n, f restricted to U is also infinitely differentiable.

Example 4 (Open submanifolds). Let U be an open set of a smooth manifold M. The open set U can be given a manifold structure induced from the manifold structure of M. Consider all charts (V, φ_V) of M such that the coordinate domain V has non-empty intersection with U. The restriction of the coordinate map φ_V to U forms a chart (U ∩ V, φ_V|_U) for U. Hence, {(U ∩ V, φ_V|_U)} forms an atlas for U. This atlas is smooth because the restriction of an infinitely differentiable function to an open set is infinitely differentiable, and it is maximal because the original atlas is maximal for M. Smooth manifolds constructed from restriction to open sets are called open submanifolds.

The next example will show that the space of positive definite matrices is an open submanifold of the space of Hermitian matrices.

Definition 3.1.3 (Positive Semidefinite Matrix). Let H_n be the space of n × n Hermitian matrices on C^n with the Frobenius inner product ⟨X|Y⟩ = Tr(X†Y). An n × n Hermitian matrix X is said to be positive semidefinite if any of the following hold:

1. ⟨x|X|x⟩ ≥ 0 for all vectors x ∈ C^n.

2. All eigenvalues of X are greater than or equal to zero.

3. All of the principal minors of X have determinant greater than or equal to zero.

A Hermitian matrix X is said to be positive definite if the conditions above hold with strict inequality; in particular, Sylvester's criterion states that X is positive definite iff its leading principal minors M_1, ···, M_n all have positive determinant det(M_i) > 0. The space of positive definite n × n Hermitian matrices is denoted M++_n and, similarly, the positive semidefinite ones M+_n. Since every positive definite matrix is also positive semidefinite, it is clear that M++ is a subset of M+. Both of these spaces are also subsets of the vector space of Hermitian matrices H. The following theorem shows that M++ is an open set of H, and hence it is also an open submanifold of the same dimension.

Theorem 3.1.1. The set of positive definite matrices M++ is an open set of the vector space of n × n Hermitian matrices H with the Frobenius inner product.

Proof. Consider the determinant det_n : H_n → R from the space of n × n Hermitian matrices to the real line R. The determinant is a polynomial in the matrix entries, and since polynomials are continuous, the determinant is a continuous function. One characterization of positive definite matrices is that their leading principal minors are all positive (Sylvester's criterion, as above). In other words, every positive definite matrix X is characterized by having leading principal minors M_i, i = 1, ···, n, with positive determinant det(M_i) > 0. Denote the mapping of a matrix to the tuple of its leading principal minors as

Φ : H_n → H_1 × ··· × H_n : X ↦ (M_1, ···, M_{n−1}, X).

It should be clear that this is a continuous function using the product topology on the codomain H_1 × ··· × H_n. Sylvester's criterion implies that

M++ = ((det_1 × ··· × det_n) ◦ Φ)^{−1}((0, ∞)^n).

The composition of continuous functions is continuous, and the set (0, ∞) × ··· × (0, ∞) is an open set in R^n with the product topology. This completes the proof, since the preimage of an open set under a continuous function is open.

The proof of the Theorem can also be used to show that $\mathcal{M}^{+}$ forms a closed set; in fact, it is the closure of $\mathcal{M}^{++}$. The next example formally states that $\mathcal{M}^{++}$ is an open submanifold of $\mathcal{H}$.

Example 5 (Positive Definite, Hermitian Matrices). The set of all positive definite, Hermitian matrices $\mathcal{M}^{++}$ is an open submanifold of the set of all Hermitian matrices $\mathcal{H}$, of the same dimension.
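The equivalent characterizations of positive definiteness in Definition 3.1.3 can be tested against each other numerically. The following is an illustrative sketch (not part of the thesis); the random test matrix and sample vectors are arbitrary choices, and only NumPy is used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random Hermitian matrix, then shift it to force positive definiteness.
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2
X = H + (abs(np.linalg.eigvalsh(H).min()) + 1.0) * np.eye(4)

# 1. Quadratic form <x|X|x> > 0 on a sample of random complex vectors.
xs = rng.normal(size=(100, 4)) + 1j * rng.normal(size=(100, 4))
quad_ok = all(np.vdot(x, X @ x).real > 0 for x in xs)

# 2. All eigenvalues strictly positive.
eig_ok = bool(np.all(np.linalg.eigvalsh(X) > 0))

# 3. Sylvester's criterion: leading principal minors with positive determinant.
minors_ok = all(np.linalg.det(X[:k, :k]).real > 0 for k in range(1, 5))

print(quad_ok, eig_ok, minors_ok)  # True True True
```

All three conditions agree on the shifted matrix, as the definition requires.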

3.1.2 Smooth Maps

This subsection introduces the notion of smooth real-valued functions on manifolds and of smooth functions between manifolds. Let $M$ be a smooth $n$-dimensional manifold. First, the definition of a function being smooth at a point is presented, followed by the notion of being smooth over all points. A function $f : M \to \mathbb{R}^m$ is said to be smooth at a point $p$ if for every chart $(U, \varphi)$ that covers the point $p$, the function $f \circ \varphi^{-1} : \varphi(U) \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is smooth in the Euclidean sense. The function $f \circ \varphi^{-1}$ is called the coordinate representation of $f$. This is motivated by the fact that $f$ can be represented locally as $(f \circ \varphi^{-1})(x_1, \dots, x_n) = (f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n))$. A real-valued function on a manifold is said to be smooth if it is smooth at every point.

Definition 3.1.4 (Smooth Real-Valued Map on Manifold). A map $f : M \to \mathbb{R}^m$ from a smooth $n$-dimensional manifold $M$ to Euclidean space $\mathbb{R}^m$ is smooth if for all charts $(U, \varphi_U)$, the coordinate representation $f \circ \varphi_U^{-1} : \varphi_U(U) \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is smooth in the Euclidean sense.

The space of all smooth real-valued functions on a manifold M is denoted as C∞(M). This can be extended to smooth maps between manifolds M and N.

Definition 3.1.5 (Smooth Map between Manifolds). Suppose $M$ and $N$ are smooth manifolds of dimension $m$ and $n$, respectively. A map $f : M \to N$ between $M$ and $N$ is smooth if for every point $p \in M$ there exists a chart $(U, \varphi_U)$ around $p$ and a chart $(V, \varphi_V)$ around $f(p)$ such that $f(U) \subseteq V$ and $\varphi_V \circ f \circ \varphi_U^{-1} : \varphi_U(U) \subseteq \mathbb{R}^m \to \mathbb{R}^n$ is smooth in the Euclidean sense.

The function $\varphi_V \circ f \circ \varphi_U^{-1}$ is called the coordinate representation of $f$ based at the point $p$.

A smooth bijective map whose inverse is also smooth is called a diffeomorphism. Two manifolds are called diffeomorphic if there exists a diffeomorphism between them. Note that a diffeomorphism can exist between the same space $X$ equipped with two different smooth structures, so diffeomorphism is a weaker notion of equivalence between smooth manifolds. In fact, two different atlases on the manifold $X$ generate the same smooth structure if and only if the identity map on $X$ is a diffeomorphism. From the definition of smooth manifolds, the coordinate maps $\varphi : U \to \mathbb{R}^n$ coming from a chart $(U, \varphi)$ are by definition homeomorphisms. The next corollary shows that they are in fact diffeomorphisms, where we identify $U$ and $\varphi(U)$ with their open submanifold structures. The proof follows from the definition of smoothness and of smoothly compatible charts.

Corollary 3.1.1. Let M be a smooth manifold and (U, φ) be a smooth chart. Then φ is a diffeomorphism from U to its image φ(U).

Examples of Smooth Functions

Example 6 (Trace). The trace $\mathrm{Tr} : \mathcal{H}_n \to \mathbb{R}$ is defined by $\mathrm{Tr}(X) = \sum_{i=1}^n a_{ii}$, the sum of the diagonal elements $a_{ii}$ of the Hermitian matrix $X$. Note that the diagonal elements of a Hermitian matrix are real by definition. The space of $n \times n$ Hermitian matrices $\mathcal{H}_n$ has a single chart $\mathrm{vec} : \mathcal{H}_n \to \mathbb{R}^{n^2}$, known as the vectorization map, defined as

$$\mathrm{vec}\begin{pmatrix} a_{11} & \cdots & a_{1n} + i b_{1n} \\ \vdots & \ddots & \vdots \\ a_{1n} - i b_{1n} & \cdots & a_{nn} \end{pmatrix} = \big(a_{11},\; a_{12},\; b_{12},\; \dots,\; b_{(n-1)n},\; a_{nn}\big)^T.$$

To show smoothness of the trace functional, one has to show that the coordinate representation $\mathrm{Tr} \circ \mathrm{vec}^{-1} : \mathbb{R}^{n^2} \to \mathbb{R}$ is smooth in the Euclidean sense. Since it is a polynomial in the matrix entries, this is immediate. More explicitly, taking derivatives with respect to each matrix entry shows that the coordinate representation is smooth in the Euclidean sense:

$$\frac{\partial(\mathrm{Tr} \circ \mathrm{vec}^{-1})}{\partial a_{ij}} = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & \text{else} \end{cases} \qquad \text{and} \qquad \frac{\partial^k(\mathrm{Tr} \circ \mathrm{vec}^{-1})}{\partial a_{i_1 j_1} \cdots \partial a_{i_k j_k}} = 0 \quad \forall k \geq 2.$$
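The vectorization chart can be made concrete in a short script. This is a sketch (not from the thesis); the exact ordering of coordinates below is one arbitrary convention consistent with the display above.

```python
import numpy as np

def vec(X):
    """Chart vec : H_n -> R^{n^2}: diagonal entries plus the real and
    imaginary parts of the strictly upper-triangular entries."""
    n = X.shape[0]
    coords = []
    for i in range(n):
        coords.append(X[i, i].real)        # a_ii (real, since X is Hermitian)
        for j in range(i + 1, n):
            coords.append(X[i, j].real)    # a_ij
            coords.append(X[i, j].imag)    # b_ij
    return np.array(coords)                # length n + n(n-1) = n^2

def vec_inv(c, n):
    """Inverse chart: rebuild the Hermitian matrix from its coordinates."""
    X = np.zeros((n, n), dtype=complex)
    k = 0
    for i in range(n):
        X[i, i] = c[k]; k += 1
        for j in range(i + 1, n):
            X[i, j] = c[k] + 1j * c[k + 1]
            X[j, i] = c[k] - 1j * c[k + 1]
            k += 2
    return X

rng = np.random.default_rng(1)
def rand_herm(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

X, Y = rand_herm(3), rand_herm(3)
c = vec(X)
dim_ok = c.size == 9                             # vec lands in R^{n^2}
bijective_ok = np.allclose(vec_inv(c, 3), X)     # round trip recovers X
linear_ok = np.allclose(vec(2 * X + 3 * Y), 2 * vec(X) + 3 * vec(Y))
print(dim_ok, bijective_ok, linear_ok)  # True True True
```

The chart is linear and bijective onto $\mathbb{R}^{n^2}$, so the trace, being linear in these coordinates, is trivially smooth.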

The previous example can be generalized to show that any linear map between linear manifolds is smooth.

Example 7 (Linear Maps). Any linear map $L : V \to W$ between normed, finite-dimensional vector spaces over the real field is smooth. Suppose $V$ has finite dimension $n$; continuity was shown earlier in Example 2. For smoothness, denote a basis of $V$ as $\{v_i\}_{i=1}^n$. This induces a chart $(V, \varphi)$ such that the coordinate map $\varphi$ sends any vector $v$ to its coefficient vector $(c_1, \dots, c_n)$ in $\mathbb{R}^n$. The coordinate representation $L \circ \varphi^{-1} : \mathbb{R}^n \to W : (c_1, \dots, c_n) \mapsto \sum_i c_i L(v_i)$ has partial derivatives in each standard basis direction $e_i$:

$$\frac{\partial (L \circ \varphi^{-1})}{\partial e_i} = L(v_i) \qquad \text{and} \qquad \frac{\partial^k (L \circ \varphi^{-1})}{\partial e_{i_1} \cdots \partial e_{i_k}} = 0 \quad \forall k \geq 2.$$

Hence the coordinate representation has continuous partial derivatives of all orders, and thus $L$ is a smooth function between linear manifolds.

Example 8 (Matrix Exponential). The matrix exponential $\exp : \mathcal{H}_n \to \mathcal{H}_n$ was introduced in the appendix. Based on the power series definition, it is clearly a smooth function. Furthermore, Theorem A.4.3 states that the matrix exponential restricted to Hermitian matrices $\mathcal{H}_n$ is a diffeomorphism onto its image, the positive definite Hermitian matrices $\mathcal{M}^{++}$. This provides a maximal smooth atlas for $\mathcal{M}^{++}$ with the single chart $\{(\mathcal{M}^{++}, L \circ \exp^{-1})\}$, where $L$ is the coordinate map based on some basis for $\mathcal{H}_n$. The inverse of $\exp$ on $\mathcal{M}^{++}$ is precisely the matrix logarithm $\log : \mathcal{M}^{++} \to \mathcal{H}_n$, introduced in Definition A.3.2 and discussed next.
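This diffeomorphism can be illustrated numerically (an assumed example, not from the thesis), using the functional calculus for Hermitian matrices rather than any particular library routine.

```python
import numpy as np

def funm_h(f, H):
    """Apply f to a Hermitian matrix through its eigendecomposition."""
    w, U = np.linalg.eigh(H)
    return U @ np.diag(f(w)) @ U.conj().T

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (A + A.conj().T) / 2              # a Hermitian matrix

P = funm_h(np.exp, H)                 # exp(H)
pd_ok = bool(np.all(np.linalg.eigvalsh(P) > 0))  # exp(H) is positive definite
inv_ok = np.allclose(funm_h(np.log, P), H)       # log inverts exp on H_n
print(pd_ok, inv_ok)  # True True
```

Since $\exp$ acts on the eigenvalues, the image always has strictly positive spectrum, and the logarithm recovers the original Hermitian matrix.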

Example 9 (Logarithm of Positive Definite Matrices). The previous example shows that the matrix logarithm, defined as the inverse of the matrix exponential restricted to positive definite Hermitian matrices, is also a diffeomorphism and hence a smooth map.

Example 10 (Entropy). The von Neumann entropy $S_n : \mathcal{M}^{++}_n \to \mathbb{R} : X \mapsto -\mathrm{Tr}(X \log(X))$ on the space of all $n \times n$ positive definite Hermitian matrices $\mathcal{M}^{++}_n$ is a smooth function. The proof follows from the fact that the trace, matrix multiplication and the matrix logarithm are all smooth functions on $\mathcal{M}^{++}$.

3.1.3 Tangent Space

This subsection generalizes the notion of a directional derivative (or Gateaux derivative) to an abstract, algebraic definition called a derivation. The set of all derivations of a manifold based at a point is called the tangent space, which can be thought of as the best linear approximation of the manifold at that point.

From vector calculus, let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be two normed vector spaces and let $f : V \to W$ be some function between them. Let $p$ and $v$ be vectors in $V$. The directional derivative $\frac{\partial f}{\partial v}\big|_p$ at $p$ in direction $v$ measures the rate of change of $f$ centered at $p \in V$ in the direction $v \in V$, i.e.

$$\frac{\partial f}{\partial v}\Big|_p = \lim_{h \to 0} \frac{f(p + hv) - f(p)}{h}.$$

The directional derivative $\frac{\partial}{\partial v}\big|_p$ at $p$ in direction $v$ satisfies the following key properties. It is a linear functional from the space of all smooth functions $C^\infty(V)$ on the normed vector space $V$ to the real line $\mathbb{R}$. In addition, it satisfies the product rule, $\frac{\partial (fg)}{\partial v}\big|_p = f(p)\frac{\partial g}{\partial v}\big|_p + \frac{\partial f}{\partial v}\big|_p\, g(p)$. These properties are what generalize the notion of a directional derivative.
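These two properties can be checked with finite differences. This is a numerical sketch; the test functions, point, direction and step size are my own arbitrary choices.

```python
import numpy as np

def ddir(f, p, v, h=1e-6):
    """Central-difference directional derivative of f at p in direction v."""
    return (f(p + h * v) - f(p - h * v)) / (2 * h)

f = lambda x: np.sin(x[0]) * x[1]
g = lambda x: x[0] ** 2 + np.exp(x[1])

p = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])

# Product rule: d(fg)/dv|_p = f(p) dg/dv|_p + df/dv|_p g(p).
lhs = ddir(lambda x: f(x) * g(x), p, v)
rhs = f(p) * ddir(g, p, v) + ddir(f, p, v) * g(p)

# Linearity in the function argument: d(f+g)/dv|_p = df/dv|_p + dg/dv|_p.
lin = ddir(lambda x: f(x) + g(x), p, v)

print(np.isclose(lhs, rhs), np.isclose(lin, ddir(f, p, v) + ddir(g, p, v)))
```

Both properties hold up to finite-difference error, which is exactly the behavior a derivation abstracts.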

Definition 3.1.6 (Derivation at a point). A derivation Xp at a point p in a smooth manifold M is a linear function from the set of smooth functions C∞(M) of M to the real line R such that it satisfies the product rule, ie

∞ Xp(fg) = f(p)Xp(g) + Xp(f)g(p) ∀f, g ∈ C (M).

The next definition is the set of all possible derivations at a point p in a smooth manifold M.

Definition 3.1.7 (Tangent Space at a point). Suppose M is a smooth manifold and p is a point inside M. The tangent space at p is the set of all derivations (directional derivatives)

based at p, denoted as Tp(M).

It was stated earlier that the tangent space Tp(M) is the best linear approximation of M centered at p. The next Theorem shows that this is a vector space.

Theorem 3.1.2 (Tangent Space is a Vector Space). Let $M$ be an $n$-dimensional manifold.

The tangent space Tp(M) at a point p of M is a vector space over the real numbers under the following addition and scalar multiplication operator respectively,

(Xp + Yp)(f) := Xp(f) + Yp(f)

(λXp)(f) := λ(Xp(f)).

The zero vector is the linear functional that maps C∞(M) to zero.

Proof. The space of linear functionals $L(C^\infty(M), \mathbb{R})$ on the vector space $C^\infty(M)$ forms a vector space, and $T_p(M)$ is a subset of it. It is therefore only required to show that $T_p(M)$ is closed under addition and scalar multiplication; hence, only the product rule needs to be verified to complete the proof. Let $f$ and $g$ be smooth real-valued functions on $M$ from $C^\infty(M)$. For addition, the product rule is applied to each derivation, followed by a simple rearrangement:

$$(X_p + Y_p)(fg) = X_p(fg) + Y_p(fg) = f(p)X_p(g) + g(p)X_p(f) + f(p)Y_p(g) + g(p)Y_p(f)$$
$$= f(p)\big(X_p(g) + Y_p(g)\big) + g(p)\big(X_p(f) + Y_p(f)\big) = f(p)\big((X_p + Y_p)(g)\big) + g(p)\big((X_p + Y_p)(f)\big).$$

For scalar multiplication, it is clear that the product rule is satisfied.

3.1.4 Differential

Vector Calculus

A Euclidean function $f : \mathbb{R}^n \to \mathbb{R}^m$ is said to be differentiable at a point $p$ if there exists an open set $U$ containing $p$ such that the partial derivatives (or Gateaux derivatives) of $f$ restricted to $U$ all exist and are continuous in every direction. Furthermore, given a basis $\{e_i\}$ in $\mathbb{R}^n$ and any direction $v = \sum_i c_i e_i$ written in this basis, the directional derivative can be written as

$$\frac{\partial f}{\partial v}\Big|_p = \Big(\frac{\partial f}{\partial e_1}, \dots, \frac{\partial f}{\partial e_n}\Big) \cdot (c_1, \dots, c_n).$$

This shows that $\{\frac{\partial}{\partial e_i}\big|_p\}_{i=1}^n$ forms a basis for the tangent space $T_p(\mathbb{R}^n)$, which is hence a finite-dimensional vector space. The total derivative (or Fréchet derivative, see Appendix A.2.2) models a differentiable Euclidean function $f$ as its best linear approximation at a point.

Definition 3.1.8 (Total Derivative). Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be two normed vector spaces and let $F$ be a function between $V$ and $W$. Then $F$ is differentiable at a point $v$ if there exists a linear map

$$DF(v) : V \to W$$

such that for any direction $h \in V$,

$$\lim_{t \to 0} \frac{\|F(v + th) - F(v) - t\,DF(v)[h]\|_W}{t} = 0.$$

In a sense, the map $DF(v)$ sends directions to directions in a way that best approximates the difference in the output of the function. $DF(x)$, considered over all points $x$, is called the total derivative of $F$. Its connection to directional derivatives is that for any direction $h$, the vector $DF(x)[h]$ is the rate of change of $F$ at $x$ in the direction $h$.

Manifold Theory

The analogue in manifold theory is the differential of a smooth map $f : M \to N$ between manifolds $M$ and $N$. The tangent space at a point $p$ in $M$ can be thought of as a linear approximation to a neighborhood around $p$, as it was shown to be a vector space. The differential of $f$ at a point $p$ then corresponds to a linear map between the tangent space

Tp(M) and the tangent space Tf(p)(N). This in effect defines a linear approximation to f at some point.

Definition 3.1.9 (Differential of Smooth Map Between Manifolds). Suppose M and N are smooth manifolds and f : M → N is a smooth map between them. Let p be a point in

M. The differential dfp : Tp(M) → Tf(p)(N) is defined to be the linear function between the tangent space Tp(M) and the tangent space Tf(p)(N) that has the following action,

∞ dfp[V ][g] = V (g ◦ f) ∀V ∈ Tp(M) and ∀g ∈ C (N).

The last condition of the definition ensures that a directional derivative $V$ on the manifold $M$ centered at $p$ maps to the directional derivative $V(\cdot \circ f)$ on the manifold $N$ centered at $f(p)$. The next Theorem states the important properties of the differential.

Theorem 3.1.3 (Properties of Differential). Suppose M and N are smooth manifolds and f : M → N is a smooth map between them. Let p be a point in M. The differential dfp : Tp(M) → Tf(p)(N) of f satisfies the following properties.

1. The differential dfp is a linear function.

2. The differential satisfies the chain rule: let $W$ be a manifold and $g : W \to M$ be

a smooth function, and let $V$ be a derivation in $T_p(W)$ for some point $p$ in the manifold $W$. Then the differential $d(f \circ g)_p : T_p(W) \to T_{(f \circ g)(p)}(N)$ satisfies

$$d(f \circ g)_p[V] = df_{g(p)} \circ dg_p[V].$$

3. If f is a diffeomorphism, then the differential dfp is an isomorphism.

Proof. The proof of (1) follows directly from the definition. Let $g \in C^\infty(N)$ be a smooth real-valued function on $N$. For addition,

dfp(V + W )[g] = (V + W )(g ◦ f) = V (g ◦ f) + W (g ◦ f) = dfp(V )[g] + dfp(W )[g].

Scalar multiplication follows similarly. Hence, the differential is a linear map from $T_p(M)$ to $T_{f(p)}(N)$.

For the proof of (2), let $h \in C^\infty(N)$ and consider $d(f \circ g)_p[V](h) = V(h \circ f \circ g)$. Since $h \circ f$ is a smooth real-valued function on $M$, $V(h \circ f \circ g) = dg_p[V](h \circ f)$. Furthermore, since $h$ is a smooth real-valued function on $N$ and $dg_p[V] \in T_{g(p)}(M)$, $dg_p[V](h \circ f) = df_{g(p)}[dg_p[V]](h) = (df_{g(p)} \circ dg_p)[V](h)$.

For the proof of (3), let $f$ be a diffeomorphism and $f^{-1}$ its inverse, so that $f^{-1} \circ f = \mathrm{Id}_M$. Using the chain rule,

$$d(f^{-1} \circ f)_p[V](h) = (df^{-1}_{f(p)} \circ df_p)[V](h) = d(\mathrm{Id}_M)_p[V](h).$$

The differential of the identity on the right side has a simple action: $d(\mathrm{Id}_M)_p[V](h) = V(h \circ \mathrm{Id}_M) = V(h)$. In other words, the differential of the identity map is the identity map $\mathrm{Id}_{T_p(M)}$ on the tangent space $T_p(M)$. Hence we can simplify the right side as

$$(df^{-1}_{f(p)} \circ df_p)[V](h) = \mathrm{Id}_{T_p(M)}[V](h).$$

This implies that $df_p$ is injective and $df^{-1}_{f(p)}$ is surjective. Using a similar argument on $f \circ f^{-1} = \mathrm{Id}_N$, we obtain that $df_p$ is surjective and $df^{-1}_{f(p)}$ is injective. This completes the proof, as the differential is a bijective linear map.

Implications

The implication of point (3) in Theorem 3.1.3 is that the tangent space always forms a finite-dimensional vector space. Furthermore, every chart induces a canonical basis on the tangent space. This can be seen as follows. Let $p$ be a point in an $n$-dimensional manifold $M$ and let $(U, \varphi)$ be a chart around $p$. The coordinate map $\varphi$ can be thought of as a diffeomorphism from the open submanifold $U$ onto its image in the linear manifold $\mathbb{R}^n$, so its differential $d\varphi_p : T_p(U) \to T_{\varphi(p)}(\mathbb{R}^n)$ is an isomorphism. It was mentioned before that $T_{\varphi(p)}(\mathbb{R}^n)$ is isomorphic to $\mathbb{R}^n$. Hence, we obtain that $T_p(U)$ is isomorphic to $\mathbb{R}^n$. Since an open submanifold's tangent space agrees with that of the larger manifold, $T_p(U)$ is isomorphic to $T_p(M)$. In addition to showing that $T_p(M)$ is isomorphic to $\mathbb{R}^n$, this shows that its vector space dimension equals the manifold dimension $n$.

The canonical basis can be seen by first writing the coordinate map $\varphi : U \to \mathbb{R}^n$ in its component functions $x^i : U \to \mathbb{R}$, so that $\varphi(p) = (x^1(p), \dots, x^n(p))$. Since the differential of the chart $d\varphi_p : T_p(M) \to T_{\varphi(p)}(\mathbb{R}^n)$ is an isomorphism, choosing the canonical basis $\{\frac{\partial}{\partial x^i}\big|_{\varphi(p)}\}_{i=1}^n$ on the codomain $T_{\varphi(p)}(\mathbb{R}^n)$ induces a basis on the domain $T_p(M)$ via the inverse $d\varphi_p^{-1}$. The derivation $\frac{\partial}{\partial x^i}\big|_{\varphi(p)}$ is the directional derivative along the $i$-th coordinate direction based at $\varphi(p)$,

$$\frac{\partial}{\partial x^i}\Big|_{\varphi(p)} : C^\infty(\mathbb{R}^n) \to \mathbb{R} : f \mapsto \lim_{t \to 0} \frac{f(\varphi(p) + e_i t) - f(\varphi(p))}{t}.$$

The action of $d\varphi_p^{-1}$ on this basis is

$$d\varphi_p^{-1}\Big[\frac{\partial}{\partial x^i}\Big|_{\varphi(p)}\Big] : C^\infty(M) \to \mathbb{R} : f \mapsto \frac{\partial}{\partial x^i}\Big|_{\varphi(p)}(f \circ \varphi^{-1}) = \lim_{t \to 0} \frac{f(\varphi^{-1}(\varphi(p) + e_i t)) - f(p)}{t}.$$

Since $f$ is smooth, its coordinate representation $f \circ \varphi^{-1}$ is differentiable by definition, so this expression is well-defined. This allows us to define a basis on $T_p(M)$ with basis elements

$$\frac{\partial}{\partial x^i}\Big|_p := d\varphi_p^{-1}\Big[\frac{\partial}{\partial x^i}\Big|_{\varphi(p)}\Big].$$

It can be thought of as the directional derivative of the coordinate representation $f \circ \varphi^{-1}$ along $x^i$, based at the coordinate point $\varphi(p)$.

Examples of the Differential

Just like the total derivative, the differential of a linear map between linear manifolds is essentially the map itself.

Theorem 3.1.4 (Differential of Linear Maps). Let $M$ and $N$ be linear manifolds of dimension $m$ and $n$, respectively, and let $f$ be a linear map between $M$ and $N$. Then the differential $df_p : T_p(M) \to T_{f(p)}(N)$ at a point $p \in M$ has the following simple action:

$$df_p[V] = (\Phi_N \circ f \circ \Phi_M)(V),$$

where $\Phi_M$ is a linear isomorphism from $T_p(M)$ to $M$ and $\Phi_N$ is a linear isomorphism from $N$ to $T_{f(p)}(N)$.

If the linear isomorphisms $\Phi_M$ and $\Phi_N$ are clear from context, we will write this more compactly as $df_p[V] = f(V)$.

Proof. Let $M$ and $N$ be linear manifolds and let $L : M \to N$ be a linear function between them. Denote the basis for $M$ as $\{m_i\}_{i=1}^m$ and the basis for $N$ as $\{n_j\}_{j=1}^n$, chosen such that $L(m_i) = n_{j_i}$ for all $i \in \{1, \dots, m\}$. Let $p$ be a point in $M$. It was shown that $T_p(M)$ has dimension $m$ and $T_{L(p)}(N)$ has dimension $n$. Since finite-dimensional vector spaces of the same dimension are isomorphic, we can identify $M$ with $T_p(M)$ (and similarly $N$ with $T_{L(p)}(N)$). Denote the isomorphism from $T_p(M)$ to $M$ as $\Phi_M$, mapping $\frac{\partial}{\partial m_i}\big|_p$ to $m_i$. Similarly, denote by $\Phi_N$ the mapping from $N$ to $T_{L(p)}(N)$ that sends the basis vector $n_i$ to the derivation $\frac{\partial}{\partial n_i}\big|_{L(p)}$. The proof amounts to showing that the following diagram commutes.

$$\begin{array}{ccc} M & \xrightarrow{\;L\;} & N \\ {\scriptstyle \Phi_M} \big\uparrow & & \big\downarrow {\scriptstyle \Phi_N} \\ T_p(M) & \xrightarrow{\;dL_p\;} & T_{L(p)}(N) \end{array}$$

However, two linear maps are equal if and only if they agree on a basis. Hence, the proof amounts to showing that the following holds for all $i \in \{1, \dots, m\}$:

$$dL_p\Big[\frac{\partial}{\partial m_i}\Big|_p\Big] = \frac{\partial}{\partial n_{j_i}}\Big|_{L(p)}.$$

Let $g \in C^\infty(N)$ be a smooth function on $N$. Then

$$dL_p\Big[\frac{\partial}{\partial m_i}\Big|_p\Big](g) = \frac{\partial}{\partial m_i}\Big|_p (g \circ L) \qquad \text{(definition of the differential)}$$
$$= \lim_{t \to 0} \frac{(g \circ L)(p + t m_i) - (g \circ L)(p)}{t} \qquad \text{(directional derivative from the chart)}$$
$$= \lim_{t \to 0} \frac{g(L(p) + t n_{j_i}) - g(L(p))}{t} \qquad \text{($L$ is linear)}$$
$$= \frac{\partial}{\partial n_{j_i}}\Big|_{L(p)}(g).$$

Here, examples of differentials of functions that will be used in the later chapters are introduced.

Example 11 (Trace). Let $\mathcal{H}_n$ be the set of $n \times n$ Hermitian matrices and let $\mathrm{Tr} : \mathcal{H}_n \to \mathbb{R}$ be the trace functional, $\mathrm{Tr}(X) = \sum_{i=1}^n X_{ii}$. Following Theorem 3.1.4, the differential of the trace functional is itself, i.e.

$$d(\mathrm{Tr})_X[V] = \mathrm{Tr}(V) \qquad \forall V \in T_X(\mathcal{H}_n) \simeq \mathcal{H}_n.$$

Example 12 (Matrix Logarithm). Recall the matrix logarithm from Example 9. It is a smooth mapping from the space of $n \times n$ positive definite matrices to the $n \times n$ Hermitian matrices, $\log : \mathcal{M}^{++}_n \to \mathcal{H}_n$. The differential of the matrix logarithm is a function from $\mathcal{H}_n$ to $\mathcal{H}_n$ satisfying

$$d(\log)_X : V \mapsto \int_0^1 (sX + (1 - s)I)^{-1}\, V\, (sX + (1 - s)I)^{-1}\, ds. \tag{3.1}$$

If the direction $V \in \mathcal{H}_n$ and the point $X \in \mathcal{M}^{++}$ commute, this reduces to the simpler formula

$$d(\log)_X[V] = X^{-1} V. \tag{3.2}$$

These can be found in the appendix, starting with equation A.2.
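Formula (3.1) can be checked against a finite difference of the matrix logarithm. This is a numerical sketch; the Gauss-Legendre quadrature, the random test matrices and the step size are my own choices, not part of the thesis.

```python
import numpy as np

def logm_h(P):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(w)) @ U.conj().T

rng = np.random.default_rng(3)
def rand_herm(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

n = 3
B = rand_herm(n)
X = B @ B + np.eye(n)     # a positive definite point (eigenvalues >= 1)
V = rand_herm(n)          # a tangent direction

# Right-hand side of (3.1) by Gauss-Legendre quadrature mapped to [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(40)
s, w = (nodes + 1) / 2, weights / 2
dlog = sum(wi * np.linalg.inv(si * X + (1 - si) * np.eye(n)) @ V
           @ np.linalg.inv(si * X + (1 - si) * np.eye(n))
           for si, wi in zip(s, w))

# Central finite difference of log(X + tV).
t = 1e-5
fd = (logm_h(X + t * V) - logm_h(X - t * V)) / (2 * t)

match = bool(np.allclose(dlog, fd, atol=1e-6))
print(match)  # True
```

The integrand has no singularities on $[0, 1]$ when $X$ is positive definite, so a fixed-order quadrature already matches the finite difference closely.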

Example 13 (von Neumann Entropy). The von Neumann entropy $S : \mathcal{M}^{++} \to \mathbb{R}$ is defined by $S(X) = -\mathrm{Tr}(X \log(X))$ for $X \in \mathcal{M}^{++}$. The differential of the entropy $dS_X : T_X(\mathcal{M}^{++}) \simeq \mathcal{H}_n \to \mathbb{R}$ at a point $X \in \mathcal{M}^{++}$ in the direction $V \in \mathcal{H}_n$ is derived as follows:

$$dS_X[V] = -d(\mathrm{Tr}(X \log(X)))_X[V] \qquad \text{(definition of entropy)}$$
$$= -\mathrm{Tr}\big(d(X \log(X))_X[V]\big) \qquad \text{(trace is linear)}$$
$$= -\mathrm{Tr}\big(d(X)_X[V] \log(X) + X\, d(\log(X))_X[V]\big) \qquad \text{(product rule)}$$
$$= -\mathrm{Tr}(V \log(X) + X X^{-1} V) \qquad \text{(differential of log, equation 3.2)}$$
$$= -\mathrm{Tr}\big((\log(X) + I_n)V\big) = \langle -(\log(X) + I_n) \,|\, V \rangle.$$

Note that, due to the cyclic property of the trace, equation 3.2 can be used inside the trace even when $X$ and $V$ do not commute. Here, $\langle X|Y\rangle = \mathrm{Tr}(X^\dagger Y)$ is the Frobenius inner product defined on $\mathcal{H}_n$. Since $dS_X$ is a linear functional, the Riesz representation Theorem gives it the unique dual vector $-(\log(X) + I_n)$.
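The closed form $dS_X[V] = -\mathrm{Tr}((\log(X) + I_n)V)$ can likewise be checked against a finite difference of the entropy. This is a sketch with arbitrarily chosen test matrices, not part of the thesis.

```python
import numpy as np

def logm_h(P):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, U = np.linalg.eigh(P)
    return U @ np.diag(np.log(w)) @ U.conj().T

def entropy(X):
    """S(X) = -Tr(X log X), computed through the eigenvalues of X."""
    w = np.linalg.eigvalsh(X)
    return float(-(w * np.log(w)).sum())

rng = np.random.default_rng(4)
def rand_herm(n):
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (A + A.conj().T) / 2

n = 3
B = rand_herm(n)
X = B @ B + np.eye(n)     # positive definite point
V = rand_herm(n)          # tangent direction

analytic = float(-np.trace((logm_h(X) + np.eye(n)) @ V).real)
t = 1e-6
fd = (entropy(X + t * V) - entropy(X - t * V)) / (2 * t)
match = bool(np.isclose(analytic, fd, rtol=1e-4))
print(match)  # True
```

The agreement confirms that $-(\log(X) + I_n)$ is the Riesz dual vector of the entropy differential in the Frobenius inner product.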

3.1.5 Tangent Bundle

Each tangent space $T_p(M)$ can be "attached" to the point $p$ along the manifold $M$ to produce a bundle of tangent spaces called the tangent bundle. This provides a convenient representation of tangent spaces which will be used in the later sections. The tangent bundle is itself a manifold.

Definition 3.1.10 (Tangent Bundle). Let $M$ be an $n$-dimensional manifold. The tangent bundle $TM$ is defined to be the disjoint union of the tangent spaces $T_p(M)$ of $M$, i.e.

TM = {(p, v) | p ∈ M and v ∈ Tp(M)}.

The tangent bundle can be thought of as having one axis being the manifold and the other axis being the tangent space. It is not surprising, then, that it is itself a smooth manifold.

Theorem 3.1.5 (Proposition 3.18 [20]). The tangent bundle $TM$ of an $n$-dimensional manifold $M$ is a $2n$-dimensional manifold.

The formulation of the tangent bundle makes it easy to define what a vector field is for smooth manifolds.

Definition 3.1.11 (Vector Field). Let $M$ be a smooth manifold and let $TM$ be the tangent bundle of $M$. A vector field

Φ: M → TM

is a smooth mapping from M to the tangent bundle TM such that Φ(x) ∈ Tx(M).

3.1.6 Submanifolds

The goal of this subsection is to prove that the space of positive definite density matrices $\mathcal{D}^{++}$ is an embedded submanifold of $\mathcal{H}_n$ and $\mathcal{M}^{++}$. Not all subsets of manifolds are smooth manifolds: consider a subset with corner points (which break smoothness) or self-intersections (which break the locally Euclidean property). The previous section already showed a simple kind of submanifold structure, the open submanifolds, which inherit their smooth structure from the ambient manifold. Here, we take the definition of an embedded submanifold from Lee's book [20, Chapter 5].

Definition 3.1.12 (Embedded Submanifold). Let $M$ be a smooth $n$-dimensional manifold. A subset $S$ of $M$ with the subspace topology is called an embedded submanifold if the inclusion map $i : S \hookrightarrow M$ is a smooth embedding. In other words, $i$ is a homeomorphism onto its image and its differential $di_p$ has constant rank and is injective for all points $p$ in $S$.

The following Theorem provides an easy way to find embedded submanifolds [20, Corollary 5.13].

Theorem 3.1.6. Let $f : M \to N$ be a smooth map between two smooth manifolds $M$ and $N$. If the differential $df_p$ of $f$ has constant rank and is surjective at every point $p$ in $M$, then each level set of $f$ is an embedded submanifold of $M$ whose codimension equals the dimension of $N$. The function $f$ is called a defining map of each such level set $S$.

The defining map can also be used to characterize the tangent space of $S$. The following Theorem is from Lee's book [20, Proposition 5.38]. Here, we use the fact that every defining map is a local defining map.

Theorem 3.1.7. Let $S$ be an embedded submanifold of a smooth manifold $M$ and let $f : M \to N$ be a defining map of $S$. The tangent space $T_p(S)$ at a point $p$ in $S$ is equal to the kernel of the differential $df_p : T_p(M) \to T_{f(p)}(N)$ of the defining map $f$.

Examples of Submanifolds

Example 14 (Trace-$k$ Submanifolds (Affine)). Recall that the trace map $\mathrm{Tr} : \mathcal{H}_n \to \mathbb{R}$ is a smooth linear functional and that its differential $d(\mathrm{Tr})_X : \mathcal{H}_n \to \mathbb{R}$ is equivalent to $\mathrm{Tr}$. Here we show, using Theorem 3.1.6, that the space of all trace-$k$ Hermitian matrices is an embedded submanifold of dimension $n^2 - 1$.

To show this, the differential $d(\mathrm{Tr})_X$ needs to have rank 1 over all Hermitian matrices $X$. This can be seen by choosing the matrix $\tilde{V}$ with top-left diagonal element 1 and all other elements zero. This matrix is Hermitian and has trace one, so the trace map over the span of $\tilde{V}$ covers all of $\mathbb{R}$; hence the differential has rank 1 and is surjective. The conditions of Theorem 3.1.6 are therefore satisfied, and each level set $\mathrm{Tr}^{-1}(k) = \{X \in \mathcal{H}_n \mid \mathrm{Tr}(X) = k\}$ is an embedded submanifold of dimension $n^2 - 1$. The kernel of $d(\mathrm{Tr})_X$ is the set of all trace-zero Hermitian matrices. Hence, the tangent space $T_X(\mathrm{Tr}^{-1}(k))$ at a matrix $X \in \mathrm{Tr}^{-1}(k)$ is equal to $\mathcal{H}_n^0 = \{V \in \mathcal{H}_n \mid \mathrm{Tr}(V) = 0\}$.

Using the previous example, we can now show that the space of positive definite, trace-one matrices, denoted $\mathcal{D}^{++}_n$, is an embedded submanifold.

Example 15 (Positive Definite Density Matrices). The previous example shows that the space of trace-one Hermitian matrices, denoted here as $\mathrm{Tr}^{-1}(1)$, is an embedded submanifold of $\mathcal{H}_n$. Recall that the $n \times n$ positive definite matrices $\mathcal{M}^{++}_n$ form an open submanifold of the space of $n \times n$ Hermitian matrices $\mathcal{H}_n$ (Example 5). Additionally, embedded submanifolds carry the subspace topology inherited from the larger manifold $\mathcal{H}_n$. Thus the open set $\mathcal{M}^{++}_n$ intersected with the space of trace-one matrices $\mathrm{Tr}^{-1}(1)$ is open within the subspace topology of $\mathrm{Tr}^{-1}(1)$. This gives it an open submanifold structure of dimension $n^2 - 1$ with the same tangent spaces.

Alternatively, the differential of the trace map restricted to $\mathcal{M}^{++}_n$ is equivalent to the original trace map over $\mathcal{H}_n$, since $\mathcal{M}^{++}_n$ is an open submanifold of $\mathcal{H}_n$. Following the previous example, this shows that $\mathrm{Tr}^{-1}(1)$ in $\mathcal{M}^{++}$ is a submanifold of dimension $n^2 - 1$.

This shows that the space of positive definite, trace-one matrices $\mathcal{D}^{++}_n$ is an embedded submanifold of dimension $n^2 - 1$, with tangent space $T_\rho(\mathcal{D}^{++}_n)$ isomorphic to the space of trace-zero matrices $\mathcal{H}_n^0 = \{V \in \mathcal{H}_n \mid \mathrm{Tr}(V) = 0\}$.

Remark 2. The manifold structure of positive definite density matrices has been investigated before in the literature. The paper [14] discusses the complete geometric structure of density matrices, showing that only qubit density matrices form a smooth manifold with boundary. See [12], [13], [24], and [6] for further references on density matrices as manifolds.

3.1.7 Riemannian Manifolds

Recall from calculus that the gradient of a smooth function is the direction of steepest ascent. For abstract smooth manifolds, there is no intrinsic way of comparing directions to one another. This can be remedied by endowing the tangent spaces of the manifold with an inner product. This inner product defines a notion of the size of a derivation and, furthermore, of the angle between derivations; this will help in generalizing the gradient to manifolds. If one further requires that the inner product be a smooth map with respect to the manifold, then the inner product is called a Riemannian metric. A manifold equipped with a Riemannian metric is called a Riemannian manifold.

Definition 3.1.13 (Riemannian Metric). Let $M$ be a smooth manifold. A Riemannian metric $g : TM \times TM \to \mathbb{R}$ is a smooth function on the tangent bundle $TM$ of $M$ such that $g$ restricted to $T_p(M) \times T_p(M)$ is an inner product on $T_p(M)$ for all $p \in M$. A manifold $M$ together with a Riemannian metric $g$ is called a Riemannian manifold, denoted $(M, g)$. The metric $g$ restricted to $T_p(M) \times T_p(M)$ is denoted $g_p$ or $\langle\cdot|\cdot\rangle_p$.

The next Theorem gives an easy characterization of the Riemannian metric on embedded submanifolds. The proof follows from the fact that the tangent space of an embedded submanifold is a subspace of the tangent space of the larger manifold.

Theorem 3.1.8 (Induced Riemannian Metric on a Submanifold). Let $(M, g)$ be a Riemannian manifold and $S$ an embedded submanifold. Then a Riemannian metric can be defined on $S$ by restricting the Riemannian metric $g$ to act on the tangent sub-bundle $T(S) \times T(S)$ rather than the tangent bundle $T(M) \times T(M)$.

Proof. Let $S$ be an embedded submanifold of a Riemannian manifold $(M, \langle\cdot|\cdot\rangle)$. Consider the inclusion map $i : S \hookrightarrow M$ and its differential $di_p : T_p(S) \hookrightarrow T_p(M)$. The tangent space $T_p(S)$ is a subspace of $T_p(M)$ by the following argument. Let $f$ be a smooth function on $M$ and let $V$ be a derivation in $T_p(S)$. Then the differential of the inclusion applied to the derivation $V$, acting on $f$, is

$$di_p(V)(f) = V(f \circ i) = V(f|_S).$$

Suppose $di_p(V)$ is the zero derivation on $M$. Then $V(f|_S)$ is zero for all smooth functions $f$ on $M$, which is only true if $V$ is the zero derivation; hence $di_p$ is an injective linear map. Since $T_p(S)$ is a subspace of $T_p(M)$, the inner product $\langle\cdot|\cdot\rangle_p$ restricted to $T_p(S)$ is an inner product. What is left is to show smoothness of the Riemannian metric on the sub-bundle $T(S) \times T(S)$. This can be seen by defining the inclusion $\tilde{i} : T(S) \times T(S) \hookrightarrow T(M) \times T(M)$ and noting that $g \circ \tilde{i}$ is the desired Riemannian metric on $S$.

This implies that the Hermitian matrices $\mathcal{H}_n$ together with the Frobenius inner product $\langle X|Y\rangle = \mathrm{Tr}(X^\dagger Y)$ form a Riemannian linear manifold. Furthermore, both $\mathcal{M}^{++}_n$ and $\mathcal{D}^{++}_n$ are Riemannian submanifolds with the metric induced from the inner product on $\mathcal{H}_n$.

Orthogonal Complement and Projections

Recall from linear algebra that two vectors $v, w$ from a finite-dimensional inner product space $(V, \langle\cdot|\cdot\rangle)$ are orthogonal if $\langle v|w\rangle = 0$. Every subspace $W$ has an orthogonal complement $W^\perp$ such that all vectors in $W$ are orthogonal to all vectors in the complement $W^\perp$. Furthermore, the direct sum $W \oplus W^\perp$ is equal to $V$, and there exist projection operators $P_W$ and $P_{W^\perp}$, defined as follows based on the decomposition $W \oplus W^\perp = V$:

$$P_W : W \oplus W^\perp \to W : (v, w) \mapsto v, \qquad P_{W^\perp} : W \oplus W^\perp \to W^\perp : (v, w) \mapsto w.$$

Let $p$ be a point in a Riemannian submanifold $(S, \langle\cdot|\cdot\rangle_p)$ of a larger Riemannian manifold $(M, \langle\cdot|\cdot\rangle_p)$. The tangent space $T_p(S)$ is a subspace of the tangent space $T_p(M)$. Hence there exists an orthogonal subspace $T_p(S)^\perp$ such that

$$T_p(S) \oplus T_p(S)^\perp = T_p(M).$$

Example 16 (Positive Definite Density Matrices). The space of positive definite density matrices $\mathcal{D}^{++}_n$ has tangent space $T_\rho(\mathcal{D}^{++}_n)$ equal to the span of the trace-zero Hermitian matrices $\mathcal{H}_n^0$, which is a subspace of the set of all $n \times n$ Hermitian matrices $\mathcal{H}_n$. It can be shown that the orthogonal complement $(\mathcal{H}_n^0)^\perp$ is precisely the span of the identity. Let $V \in \mathcal{H}_n^0$ and $W = aI_n \in \mathrm{span}\{I_n\}$; then the inner product between them is zero:

$$\langle V|W\rangle = \mathrm{Tr}(VW) = a\,\mathrm{Tr}(V I_n) = a\,\mathrm{Tr}(V) = 0.$$

The dimension of a direct sum of subspaces is the sum of their dimensions, and since $\mathcal{H}_n^0 \oplus \mathrm{span}\{I_n\}$ is a subspace of $\mathcal{H}_n$, its dimension is at most that of $\mathcal{H}_n$. Since the dimension of $\mathrm{span}\{I_n\}$ is one, the dimension of $\mathcal{H}_n^0$ is $n^2 - 1$ and the dimension of $\mathcal{H}_n$ is $n^2$, we clearly have

$$\mathcal{H}_n^0 \oplus \mathrm{span}\{I_n\} = \mathcal{H}_n,$$

showing that $(\mathcal{H}_n^0)^\perp = \mathrm{span}\{I_n\}$.

The projection operator onto $\mathcal{H}_n^0$ also has a nice characterization.

Theorem 3.1.9. Define the function $P_\rho : \mathcal{H}_n \to \mathcal{H}_n^0$ from the space of $n \times n$ Hermitian matrices to the space of $n \times n$ trace-zero Hermitian matrices as

$$P_\rho : V \mapsto V - \frac{\mathrm{Tr}(V)}{n} I_n,$$

where $I_n$ is the identity matrix. Then $P_\rho$ is a projection from $T_\rho(\mathcal{M}^{++})$ onto $T_\rho(\mathcal{D}^{++})$. Furthermore, viewing the projection as a map $P : \mathcal{D}^{++} \to L(\mathcal{H}_n, \mathcal{H}_n^0) : \rho \mapsto P_\rho$ into the space of bounded linear functions $L(\mathcal{H}_n, \mathcal{H}_n^0)$, it becomes a constant map.

Proof. Let $\rho$ be a positive definite density matrix. The function $P_\rho$ maps to a trace-zero matrix, as

$$\mathrm{Tr}(P_\rho(V)) = \mathrm{Tr}\Big(V - \frac{\mathrm{Tr}(V)}{n} I_n\Big) = \mathrm{Tr}(V) - \mathrm{Tr}(V) = 0.$$

Furthermore, the function is clearly linear and is a projection map, since $P_\rho^2 = P_\rho$. Since $P_\rho$ does not depend on $\rho$, the map

$$P : \mathcal{D}^{++} \to L(T_\rho(\mathcal{M}^{++}), T_\rho(\mathcal{D}^{++})) \simeq L(\mathcal{H}_n, \mathcal{H}_n^0) : \rho \mapsto P_\rho$$

is clearly a constant map.
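The properties used in this proof are easy to verify numerically. This is a sketch; the random test matrix is an arbitrary choice.

```python
import numpy as np

def proj(V):
    """P(V) = V - (Tr V / n) I: projection onto trace-zero Hermitians."""
    n = V.shape[0]
    return V - (np.trace(V) / n) * np.eye(n)

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
V = (A + A.conj().T) / 2

PV = proj(V)
traceless = bool(np.isclose(np.trace(PV), 0))            # lands in H_n^0
idempotent = bool(np.allclose(proj(PV), PV))             # P^2 = P
orthogonal = bool(np.isclose(np.trace(PV.conj().T @ np.eye(4)), 0))
print(traceless, idempotent, orthogonal)  # True True True
```

The last check confirms that the range of the projection is orthogonal to $\mathrm{span}\{I_n\}$ in the Frobenius inner product, consistent with Example 16.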

3.2 Riemannian Gradient and Hessian

This section introduces the notion of the gradient and Hessian of a smooth function defined on a Riemannian manifold. The Euclidean gradient is discussed first, followed by the Riemannian gradient. The gradient of the von Neumann entropy is then calculated, and the section finishes with a presentation of the Hessian. Throughout this section, let $(M, g)$ be a Riemannian manifold with Riemannian metric $g$, and let $f : M \to \mathbb{R}$ be a real-valued smooth function defined on $M$.

3.2.1 Euclidean Gradient

Suppose $M$ is a linear Riemannian manifold (i.e. a finite-dimensional inner product vector space). The total derivative $D(f)_p : M \to \mathbb{R}$ is the best linear approximation to the real-valued function $f$ at the point $p$ in $M$. In other words, it satisfies, informally, $D(f)_p[v] \approx f(p + v) - f(p)$ for all small enough vectors $v$, and formally as a limit,

$$\lim_{v \to 0} \frac{|f(p + v) - f(p) - D(f)_p[v]|}{\|v\|} = 0.$$

Consider the set of all unit vectors of M. It is clearly a compact set due to the Heine-

Borel property and since D(f)p is a linear function, then by the extreme value Theorem, sup||v||=1 |Dp(f)(v)| exists. The gradient grad(f, p) of function f at point p is defined to be direction of steepest ascent of the total derivative D(f)p, ie the best direction v with size one that makes f(x + v) − f(x) large as possible. This is equivalent to the following,

$$\mathrm{grad}(f, p) = \arg\sup_{\|V\|=1} |D(f)_p[V]|.$$

Furthermore, a basis $\{v_i\}_{i=1}^n$ on the linear manifold M induces a chart (M, Φ), where Φ maps any vector v in M to its coefficients in $\mathbb{R}^n$. This also induces a basis on $T_p(M)$, the partial derivatives with respect to the basis coordinates $\left[\frac{\partial}{\partial v_1}, \cdots, \frac{\partial}{\partial v_n}\right]$. Then any derivation $V \in T_p(M)$ can be written as $\sum_{i=1}^n c_i \frac{\partial}{\partial v_i}$. The previous definition of the gradient implies,

$$\begin{aligned}
\sup_{\|V\|=1} |D(f)_p[V]| &= \sup_{\|V\|=1} \Big|\sum_i c_i\, D(f)_p[v_i]\Big| \\
&= \sup_{\|V\|=1} |\langle d \,|\, c\rangle_{\mathbb{R}}| && (c = (c_1, \cdots, c_n)^T \text{ and } d = (D(f)_p[v_1], \cdots, D(f)_p[v_n])^T) \\
&= \sup_{\|V\|=1} \|d\| \cdot \|c\|\, |\cos(\theta)| && \text{(Cauchy-Schwarz inequality)} \\
&= \sup_{\|V\|=1} \|d\|\, |\cos(\theta)| && (\|c\| = 1 \text{ since } \|V\| = 1) \\
&= \|d\|. && \text{(Choose } V \text{ such that } \cos(\theta) = 1.)
\end{aligned}$$

This clearly shows that the gradient is the vector $d = (D(f)_p[v_1], \cdots, D(f)_p[v_n])^T$ of directional derivatives along the basis.
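This steepest-ascent characterization can be checked numerically; the sketch below (numpy, with an illustrative quadratic test function not taken from the thesis) samples random unit directions and confirms that no directional derivative exceeds the gradient norm, which is attained along the gradient direction:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
b = rng.standard_normal(n)
p = rng.standard_normal(n)

# f(x) = x^T A x + b^T x has Euclidean gradient 2 A x + b
grad = 2 * A @ p + b
df = lambda v: grad @ v          # directional derivative D(f)_p[v] = <grad, v>

# |D(f)_p[v]| <= ||grad|| over unit vectors, with equality at v = grad/||grad||
vals = [abs(df(v / np.linalg.norm(v))) for v in rng.standard_normal((2000, n))]
assert max(vals) <= np.linalg.norm(grad) + 1e-12
assert np.isclose(df(grad / np.linalg.norm(grad)), np.linalg.norm(grad))
```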

3.2.2 Riemannian Gradient

For general Riemannian manifolds, the gradient grad(f, p) can be defined using the Riemannian metric, which supplies the notion of a direction of steepest ascent.

Definition 3.2.1 (Gradient on Riemannian Manifolds). Let (M, g) be a Riemannian man- ifold and let f : M → R be a smooth real-valued function on M. The gradient grad(f): M → T (M) of f on M is defined to be a vector field such that

$$\mathrm{grad}(f)(p) = \arg\max_{\substack{V \in T_p(M) \\ \|V\|=1}} |df_p[V]|$$

The Riemannian gradient grad(f)(p) also has a better characterization for computation purposes at the expense of intuition.

Theorem 3.2.1. Let (M, g) be a Riemannian manifold and let f : M → R be a smooth real-valued function on M. The gradient grad(f)(p) is the unique tangent vector such that

$$\langle \mathrm{grad}(f)(p) \,|\, V \rangle_p = df_p[V],$$

where $\langle \cdot | \cdot \rangle_p$ is the inner product on $T_p(M) \times T_p(M)$ induced from g.

Proof. Since f is a real-valued function, its differential $df_p$ at a point p is a linear functional on $T_p(M)$. By the Riesz representation Theorem, there exists a unique vector $A \in T_p(M)$ such that for all derivations $V \in T_p(M)$,

$$\langle A \,|\, V \rangle_p = df_p[V], \tag{3.3}$$

where $\langle \cdot | \cdot \rangle_p = g_p$ is the inner product on $T_p(M) \times T_p(M)$ from the Riemannian metric g. The proof is completed by showing that A is equal to grad(f)(p). The argument is analogous to the Euclidean case. The point p is surrounded by a chart (U, φ) that induces a basis $\{V_i\}_{i=1}^N$ on $T_p(M)$. Hence every derivation V can be written as $\sum_i c_i V_i$ with coefficient vector $c = (c_1, \cdots, c_n)^T$.

$$\begin{aligned}
\max_{\substack{V \in T_p(M)\\ \|V\|=1}} |df_p[V]| &= \max_{\substack{V \in T_p(M)\\ \|V\|=1}} \Big|\sum_i c_i\, df_p(V_i)\Big| \\
&= \max_{\substack{V \in T_p(M)\\ \|V\|=1}} |\langle (df_p(V_1), \cdots, df_p(V_n))^T \,|\, c \rangle_p| \\
&= \max_{\substack{V \in T_p(M)\\ \|V\|=1}} \|(df_p(V_1), \cdots, df_p(V_n))^T\| \cdot \|c\|\, |\cos(\theta)| \\
&= \|(df_p(V_1), \cdots, df_p(V_n))^T\| \max_{\substack{V \in T_p(M)\\ \|V\|=1}} \|c\|\, |\cos(\theta)| \\
&= \|(df_p(V_1), \cdots, df_p(V_n))^T\|
\end{aligned}$$

Now this satisfies equation 3.3 as

$$\langle A \,|\, V \rangle_p = df_p[V] = \sum_i c_i\, df_p(V_i) = \langle (df_p(V_1), \cdots, df_p(V_n))^T \,|\, c \rangle_p.$$

The result follows from uniqueness of A.

Gradient on Riemannian Submanifolds

Suppose one has the gradient of a smooth real-valued function f : M → R on the bigger Riemannian manifold (M, g) and wants the gradient instead on the embedded submanifold S. How can this be accomplished?

On any given point p in the submanifold S, the tangent space $T_p(S)$ of S at p is a subspace of the tangent space $T_p(M)$ of the "bigger" manifold. Denote by $P_p$ the orthogonal projection operator from $T_p(M)$ to $T_p(S)$. Here we will prove that the gradient $\mathrm{grad}^S(f)(p)$ on the submanifold is the projection under $P_p$ of the gradient $\mathrm{grad}^M(f)(p)$ on the larger manifold M, ie

$$\mathrm{grad}^S(f)(p) = P_p\left(\mathrm{grad}^M(f)(p)\right) \tag{3.4}$$

Proof. Recall that the tangent space has the decomposition $T_p(M) = T_p(S) \oplus T_p(S)^\perp$, and the projection operator based on this decomposition is $P_p : T_p(M) \to T_p(S) : (v, w) \mapsto v$. Then the gradient $\mathrm{grad}^M(f)(p) \in T_p(M)$ at a point p can be decomposed into (v, w), where $v \in T_p(S)$ and $w \in T_p(S)^\perp$. The smooth real-valued function f can be defined on the submanifold S via its inclusion map $i : S \hookrightarrow M$ to obtain $f \circ i : S \to \mathbb{R}$. Using the chain rule, the differential of $f \circ i$ is just $df_p$ restricted to $T_p(S)$, since

$$d(f \circ i)_p = df_p \circ di_p = df_p\big|_{T_p(S)}.$$

It is easy to see now that for any derivation V in $T_p(M)$,

$$df_p[P_p(V)] = d(f \circ i)_p[P_p(V)]. \tag{3.5}$$

Recall the definition of the gradient on M and S respectively.

$$\begin{aligned}
\langle \mathrm{grad}^M(f)(p) \,|\, V \rangle &= df_p[V] && \text{(for all } V \in T_p(M)) \\
\langle \mathrm{grad}^S(f)(p) \,|\, W \rangle &= d(f \circ i)_p[W] && \text{(for all } W \in T_p(S))
\end{aligned}$$

Plugging equation (3.5) into the gradient formula for M:

$$\begin{aligned}
\langle \mathrm{grad}^M(f)(p) \,|\, P_p(V) \rangle &= df_p[P_p(V)] \\
\langle \mathrm{grad}^M(f)(p) \,|\, P_p(V) \rangle &= d(f \circ i)_p[P_p(V)] && \text{(equation (3.5))} \\
\langle P_p(\mathrm{grad}^M(f)(p)) \,|\, V \rangle &= d(f \circ i)_p[P_p(V)] && (P_p \text{ is an orthogonal projection})
\end{aligned}$$

Restricting V to lie in the tangent space $T_p(S)$ recovers the gradient formula on the submanifold S. Hence, by the uniqueness of the dual vector from the Riesz representation Theorem, the proof is now complete, ie

$$P_p(\mathrm{grad}^M(f)(p)) = \mathrm{grad}^S(f)(p).$$

Remark 3. The Riemannian manifold of Hermitian matrices $\mathcal{H}$ is diffeomorphic to the open set of positive definite Hermitian matrices $\mathcal{M}^{++}$. The space of positive definite trace-one matrices $\mathcal{D}^{++}$ is a submanifold of $\mathcal{M}^{++}$. The above Theorem states that the gradient of any differentiable function $f : \mathcal{D}^{++} \to \mathbb{R}$ on positive definite density matrices can be computed by first computing the gradient on $\mathcal{H}$ via matrix calculus and then projecting the gradient onto the tangent space of $\mathcal{D}^{++}$. The next chapter shows how to apply this to the coherent information of a channel.

Example 17 (von Neumann Entropy on $\mathcal{D}^{++}$). Recall that the gradient of the von Neumann entropy $S : \mathcal{M}^{++} \to \mathbb{R}$ at a point $X \in \mathcal{M}^{++}$ is $-(\log(X) + I_n)$ (example 13). Then the gradient of the von Neumann entropy on $\mathcal{D}^{++}$ at $\rho \in \mathcal{D}^{++}$ is

$$\mathrm{grad}^{\mathcal{D}^{++}}(S)(\rho) = P_\rho\left(\mathrm{grad}^{\mathcal{M}^{++}}(S)(\rho)\right) = -(\log(\rho) + I_n) + \frac{\mathrm{Tr}(\log(\rho) + I_n)}{n} I_n.$$

This can be simplified further by canceling out the identity matrix to get

$$\mathrm{grad}^{\mathcal{D}^{++}}(S)(\rho) = -\log(\rho) + \frac{\mathrm{Tr}(\log(\rho))}{n} I_n.$$

The von Neumann entropy is concave and maximized by the maximally mixed state $\frac{I_n}{n}$. The gradient at the maximally mixed state vanishes, so it is a critical point, by the following formulas:

$$\log(\rho) = \log\left(\frac{I_n}{n}\right) = -\log(n)\, I_n \quad \text{and} \quad \mathrm{Tr}(\log(\rho)) = -n\log(n).$$

Plugging these into the gradient, the zero matrix/derivation is obtained,

$$\log(n)\, I_n - \frac{n\log(n)}{n}\, I_n = 0.$$

This is the only critical point, for the following reasons. A point ρ is a critical point if the gradient is zero, ie

$$0 = -\log(\rho) + \frac{\mathrm{Tr}(\log(\rho))}{n} I_n \implies \log(\rho) = \frac{\mathrm{Tr}(\log(\rho))}{n} I_n.$$

Taking the matrix exponential on both sides (see appendix A for properties of matrix functions):

$$\rho = \exp\left(\frac{\mathrm{Tr}(\log(\rho))}{n} I_n\right) = e^{\mathrm{Tr}(\log(\rho))/n}\, I_n.$$

Using the fact that $\mathrm{Tr}(\rho) = 1$, this is only true if $e^{\mathrm{Tr}(\log(\rho))/n} = \frac{1}{n}$. Furthermore, it implies $\mathrm{Tr}(\log(\rho)) = n \ln\left(\frac{1}{n}\right)$. Since the trace is the sum of the eigenvalues and log is a matrix function, this only happens when the point is the maximally mixed state.
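The computations of this example can be reproduced numerically. The sketch below (numpy; the helper names are illustrative, with the matrix logarithm implemented via eigendecomposition) implements the projected gradient $-\log(\rho) + \frac{\mathrm{Tr}(\log \rho)}{n} I_n$ and confirms that it vanishes at the maximally mixed state but not at a generic positive definite state:

```python
import numpy as np

def herm_logm(X):
    """Matrix logarithm of a positive definite Hermitian matrix via eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.conj().T

def grad_S(rho):
    """Projected gradient of the von Neumann entropy on D^{++}:
    -log(rho) + (Tr log(rho)/n) I."""
    n = rho.shape[0]
    L = herm_logm(rho)
    return -L + (np.trace(L).real / n) * np.eye(n)

n = 3
rho_mm = np.eye(n) / n                       # maximally mixed state
assert np.allclose(grad_S(rho_mm), 0)        # critical point

# a generic positive definite density state is NOT a critical point
rho = np.diag([0.5, 0.3, 0.2])
assert not np.allclose(grad_S(rho), 0)
```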

3.2.3 Hessian

Knowing that a point p is a critical point is not enough; one generally needs the eigenvalues of the Hessian to know whether p is a local maximum or minimum. This subsection introduces the Hessian defined on a Riemannian manifold. The section will conclude by showing that, similar to before, one can extrinsically define the Hessian and project it onto the submanifold. For a detailed reference, refer to [1].

Euclidean Riemannian Connection

The Hessian of a real-valued function on a linear manifold measures the change of the gradient as one moves along the manifold. However, to do so in the manifold context requires the ability to compare derivations V in $T_p(M)$ at a point p to derivations W in $T_{p'}(M)$ at some other point p'. The function that accomplishes this smoothly is known as an affine connection. Although there are infinitely many ways to define an affine connection, there is a standard one on Riemannian manifolds. Both are presented below, and the definitions can be found in [1].

Definition 3.2.2 (Affine and Riemannian Connection). Let M be a smooth manifold and denote by $\mathcal{X}(M)$ the space of all vector fields on M. An affine connection ∇ is a mapping,

∇ : X (M) × X (M) → X (M)

which is written as $(\chi, \xi) \mapsto \nabla_\chi \xi$ and satisfies the following properties:

1. It is C∞(M)-linear in component χ, ie

$$\nabla_{f\chi_1 + g\chi_2}\,\xi = f\,\nabla_{\chi_1}\xi + g\,\nabla_{\chi_2}\xi$$

for all $f, g \in C^\infty(M)$ and $\chi_1, \chi_2, \xi \in \mathcal{X}(M)$.

2. It is R-linear in component ξ, ie

$$\nabla_\chi(a\xi_1 + b\xi_2) = a\,\nabla_\chi\xi_1 + b\,\nabla_\chi\xi_2$$

for all a, b ∈ R and χ, ξ1, ξ2 ∈ X (M).

3. It satisfies the product rule, ie

$$\nabla_\chi(f\xi) = (\chi(f))\,\xi + f\,\nabla_\chi\xi$$

for all $f \in C^\infty(M)$ and $\chi, \xi \in \mathcal{X}(M)$. Note that $\chi(f)$ is a smooth function on M given by the action $\chi(f)(p) = \chi_p(f)$, where $\chi_p \in T_p(M)$ acts on f as a derivation.

If M is a Riemannian manifold with Riemannian metric $\langle \cdot | \cdot \rangle$, then there is a unique affine connection satisfying the following properties:

1. It is symmetric (torsion-free), ie

$$\nabla_\chi \xi - \nabla_\xi \chi = [\chi, \xi]$$

where $[\chi, \xi](f) = \chi(\xi(f)) - \xi(\chi(f))$ is the Lie bracket and $f \in C^\infty(M)$.

2. $\eta\langle \chi | \xi \rangle = \langle \nabla_\eta \chi \,|\, \xi \rangle + \langle \chi \,|\, \nabla_\eta \xi \rangle$ for all $\eta, \chi, \xi \in \mathcal{X}(M)$. Note that $\langle \xi | \chi \rangle$ is a function from M to the real line via $\langle \xi | \chi \rangle(p) = \langle \xi_p | \chi_p \rangle_p$. This connection is called the Riemannian connection.

The following result (Equation 5.15 in [1]) characterizes the Riemannian connection of a submanifold when the "larger" manifold is a Riemannian linear manifold.

Proposition 3.2.1.1. Let M be a Riemannian submanifold of a Riemannian linear manifold E. Let ∇ be a Riemannian connection on M. Then for all vector fields χ, ξ ∈ X (M) on M,

∇χξ = Pp(dξp[χ])

where $P_p$ is the projection of the tangent space $T_p(E)$ onto the tangent space $T_p(M)$, and $d\xi_p[\chi]$ is a vector field such that $d\xi_p[\chi](p') = d\xi_{p'}[\chi(p')]$. Note that since $\xi : M \to TM$ is a vector field and the tangent bundle is a manifold, its differential is $d\xi_p : T_p(M) \to T_p(M)$.

Hessian

The following is the abstract definition of the Riemannian Hessian based on the Riemannian connection ∇.

Definition 3.2.3 (Riemannian Hessian. Definition 5.5.1 in [1]). Let M be a Riemannian manifold with the Riemannian connection ∇. Let f be a smooth real-valued function on M.

The Riemannian Hessian, denoted Hess f(x), at $x \in M$ is the mapping from $T_x(M)$ to $T_x(M)$ such that

$$\mathrm{Hess}(f(x))[V_x] = \nabla_{V_x} \mathrm{grad}(f)$$

for all derivations $V_x \in T_x(M)$.

The next proposition applies when the Riemannian manifold M arises as a submanifold of a Riemannian linear manifold; it combines the definition above with proposition 3.2.1.1.

Proposition 3.2.1.2 (Hessian on Submanifolds of Riemannian Linear Manifolds). Let M be a Riemannian manifold that is an embedded submanifold of a Riemannian linear manifold E. Let f be a smooth real-valued function on M and p be a point in M.

Then the Hessian is

$$\mathrm{Hess}(f(p))[V_p] = \nabla_{V_p} \mathrm{grad}(f) = P_p\left(d(\mathrm{grad}(f))_p[V_p]\right),$$

where Vp is a derivation in Tp(M).

Let M be an embedded Riemannian submanifold of a Riemannian linear manifold E such that the Riemannian metric on M is induced from E. Let f be a smooth real-valued function on E and denote its restriction to M by $f^M$. Denote by $\mathrm{grad}^M(f)$ and $\mathrm{grad}^E(f)$ the gradients of f with respect to M and E, respectively. It was shown in the previous subsection that $\mathrm{grad}^M(f)$ at a point $p \in M$ is equal to $P_p(\mathrm{grad}^E(f)(p))$. The following shows a similar result for the Hessian, based on a technique discussed in the 2013 paper "Extrinsic view of the Riemannian Hessian" [2]. Using the Hessian formula from 3.2.1.2, let x be a point in M and $V_x$ a derivation in $T_x(M)$:

$$\mathrm{Hess}(f^M(x)) = P_x\left(d(\mathrm{grad}^M(f))_x\right).$$

Using the fact that $\mathrm{grad}^M(f) = P(\mathrm{grad}^E(f))$,

$$\mathrm{Hess}(f^M(x)) = P_x\left(d(P(\mathrm{grad}^E(f)))_x\right).$$

The technique used in [2] is to note that $P : M \to L(TE, TM) : x \mapsto P_x$ is a mapping from M to the linear maps $P_x$ between $T_x(E)$ and $T_x(M)$, and that $\mathrm{grad}^E(f)$ is a mapping from M to derivations in $T_x(E)$. Once an orthonormal basis is fixed, $P_x$ is a matrix-valued function and $\mathrm{grad}^E(f)(x)$ is a vector-valued function (since E is a vector space). Hence, the usual product rule can be used:

$$\mathrm{Hess}(f^M(x)) = P_x\left(d(P)_x\,\mathrm{grad}^E(f)(x) + P_x\, d(\mathrm{grad}^E(f))_x\right).$$

Using the fact that $P_x^2 = P_x$, it can be reduced to:

$$\mathrm{Hess}(f^M(x))[V_x] = P_x\left(d(P)_x[V_x]\,\mathrm{grad}^E(f)(x)\right) + P_x\, d(\mathrm{grad}^E(f))_x[V_x],$$

where $d(P)_x$ can be thought of as the differential of $P : M \to L(TE, TM)$ as an operator-valued function.

Since P as an operator-valued function is constant on the space of positive definite density matrices $\mathcal{D}^{++}$, the Hessian can be simplified further. The following example shows this.

Example 18 (Riemannian Hessian of Any Smooth Function on $\mathcal{D}^{++}$). Let $f : \mathcal{H}_n \to \mathbb{R}$ be any smooth real-valued function on the linear manifold $\mathcal{H}_n$. Denote by $f^D$ the restriction of f to $\mathcal{D}^{++}$. The Hessian of $f^D$ has an easy characterization based on the Euclidean Hessian of f.

$$\begin{aligned}
\mathrm{Hess}\,f^D(\rho)[V] &= \nabla_V\, \mathrm{grad}(f^D)(\rho) \\
&= P_\rho\, d\big(\mathrm{grad}(f^D)\big)_\rho[V] && \text{(Proposition 3.2.1.1)} \\
&= P_\rho\, d\big(P\,\mathrm{grad}(f)\big)_\rho[V] && \text{(definition of extrinsic gradient)} \\
&= P_\rho\big(d(P)_\rho[V]\,\mathrm{grad}(f)(\rho)\big) + P_\rho\, d\big(\mathrm{grad}(f)\big)_\rho[V] && \text{(product rule, } P_\rho^2 = P_\rho) \\
&= P_\rho\big(d(\mathrm{grad}(f))_\rho[V]\big) && (d(P)_\rho = 0\text{, Theorem 3.1.9})
\end{aligned}$$

Using this, the next example calculates the Hessian of the von Neumann entropy on $\mathcal{D}^{++}$.

Example 19 (Riemannian Hessian of the von Neumann Entropy). Recall that the gradient of the von Neumann entropy S has the following form:

$$\mathrm{grad}^{\mathcal{D}^{++}}(S)(\rho) = -P_\rho\big(\log(\rho)\big),$$

where $P_\rho(X) = X - \frac{\mathrm{Tr}(X)}{n} I_n$ is the orthogonal projection from $\mathcal{H}$ to $\mathcal{H}^0$. Using the previous example 18, the Hessian on $\mathcal{D}^{++}$ is

$$\mathrm{Hess}(S)(\rho) = -P_\rho(d(\log)_\rho), \tag{3.6}$$

which is a linear map from $\mathcal{H}^0$ to $\mathcal{H}^0$. It is known from the first chapter that the von Neumann entropy is a concave function. Concavity implies that the Hessian Hess(S)(ρ) is negative semidefinite at all points ρ, ie $\mathrm{Tr}(V\, \mathrm{Hess}(S)(\rho)[V]) \leq 0$ for all non-zero trace-zero Hermitian matrices $V \in \mathcal{H}^0$. Here, it will be shown that equation (3.6) is in fact negative definite at every $\rho \in \mathcal{D}^{++}$. Let V be a non-zero trace-zero Hermitian matrix.

−1 −T r(VPρ(d(log)ρ)[V ]) = −T r(VPρ(ρ V )) (3.7)

57 † −1 = −T r(Pρ (V )(ρ V ) (3.8) = −T r(V ρ−1V ) (3.9) = −T r(V 2ρ−1) (3.10)

Equation (3.6) follows from the differential of the logarithm, equation 3.2, since the trace causes ρ−1 and V to commute. Equation (3.7) follows from the adjoint of a linear map. Equation (3.8) follows from the fact that V is already a trace zero Hermitian matrix. The last line follows from the cyclic property of the trace. To show that this value is always less than zero is to show that T r(V 2ρ−1) > 0 for all trace zero Hermitian matrices V . Here, the von Neumann trace inequality Theorem will be used here. The proof can be found in page 340-341 in the book [23].

Theorem 3.2.2 (Trace Inequality). Let A and B be two n × n Hermitian matrices with eigenvalues λ1 ≥ · · · ≥ λn for A and eigenvalues β1 ≥ · · · ≥ βn for B, then

$$\sum_{i=1}^n \lambda_i \beta_{n-i+1} \leq \mathrm{Tr}(AB)$$

Note that the eigenvalues of $V^2$ are nonnegative (and not all zero, since V is a non-zero matrix), and the eigenvalues of $\rho^{-1}$ are strictly positive, since ρ is positive definite. Then by the trace inequality Theorem,

$$0 < \sum_{i=1}^n \lambda_i \beta_{n-i+1} \leq \mathrm{Tr}(V^2 \rho^{-1}),$$

where $\lambda_i$ and $\beta_i$ are the eigenvalues of $V^2$ and $\rho^{-1}$ respectively. Equality to zero never occurs because V is a non-zero matrix and $\rho^{-1}$ is positive definite, hence the sum has at least one strictly positive term.
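The strict negativity of the Hessian quadratic form can be spot-checked numerically. The following sketch (numpy; variable names are illustrative) draws a random positive definite density matrix ρ and a random non-zero trace-zero Hermitian direction V, and verifies $\mathrm{Tr}(V^2\rho^{-1}) > 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# random positive definite density matrix
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rho = A @ A.conj().T + 0.1 * np.eye(n)
rho /= np.trace(rho).real

# random non-zero trace-zero Hermitian direction
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
V = (B + B.conj().T) / 2
V -= (np.trace(V) / n) * np.eye(n)

rho_inv = np.linalg.inv(rho)
quad = np.trace(V @ rho_inv @ V).real     # = Tr(V^2 rho^{-1}) by cyclicity
assert quad > 0                           # so -Tr(V^2 rho^{-1}) < 0
```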

Chapter 4

Gradient/Hessian of Coherent Information on Positive Definite Matrices

In the previous chapter, it was shown that the space of positive definite density matrices $\mathcal{D}^{++}$ has a Riemannian manifold structure. Additionally, the von Neumann entropy S(ρ) is a smooth function from $\mathcal{D}^{++}$ to the real line $\mathbb{R}$. Its gradient has a simple expression, obtained by solving the problem within the bigger linear manifold $\mathcal{H}$ and projecting the answer onto the desired space $\mathcal{D}^{++}$. The goal of this chapter is to apply the results from the previous chapter to the coherent information $I_c(\mathcal{N}, \rho) : \mathcal{D}^{++} \to \mathbb{R}$ of the channel $\mathcal{N}$ with respect to a positive definite density matrix $\rho \in \mathcal{D}^{++}$. This has the representation

$$I_c(\mathcal{N}, \rho) = S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho)).$$

In order to accomplish this, we must first characterize when the coherent information is a smooth function on $\mathcal{D}^{++}$, and furthermore calculate its gradient and Hessian. The first section of this chapter introduces the class of quantum channels called strictly positive quantum channels. These are exactly the channels that map positive definite density matrices to positive definite density matrices, ie $\mathcal{N}(\mathcal{D}^{++}) \subseteq \mathcal{D}^{++}$. One setting in which strictly positive channels can be characterized is when the dimension of the input of the quantum channel matches the dimension of the output. It will be shown that in such a scenario, if at least one Kraus operator $A_i$ is invertible, then the channel is strictly positive. Lastly, it will be shown that strictly positive quantum channels are in fact dense within the class of all quantum channels (see 4.1.3). The following section will characterize when the complementary channel is strictly positive. It will be shown that if the Kraus operators are linearly independent, then the complementary channel is always strictly positive; based on Choi's Theorem, this can always be guaranteed. Together, these conditions on the quantum channel give what is needed for the coherent information to be a smooth function on $\mathcal{D}^{++}$, allowing the computation of the Riemannian gradient and Hessian. The final section will compute the gradient and the Hessian of the coherent information of a channel. The gradient will first be solved on the space of positive definite matrices $\mathcal{M}^{++}$ and then projected onto $\mathcal{D}^{++}$ (see section 4.3). Several different characterizations of when a state $\rho \in \mathcal{D}^{++}$ is a critical point (Theorem 4.3.2), and furthermore a local maximum/minimum (Theorem 4.3.6), will be presented.

4.1 Strictly Positive Quantum Channels

The goal of this section is to introduce quantum channels N that map positive definite density matrices D++ to positive definite density matrices D++.

4.1.1 Definition

A linear map $\Phi : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$ is said to be strictly positive if whenever A is positive definite, $\Phi(A)$ is positive definite. For a detailed reference on strictly positive linear maps, see [4].

Definition 4.1.1 (Strictly Positive Quantum Channel). Let $\mathcal{H}_A$ and $\mathcal{H}_B$ be two finite-dimensional Hilbert spaces. Let $\mathcal{N} : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$ be a quantum channel between system A and system B. The quantum channel $\mathcal{N}$ is said to be strictly positive (or positive definite invariant) if it maps positive definite matrices in $L(\mathcal{H}_A)$ to positive definite matrices in $L(\mathcal{H}_B)$.

The next Theorem shows that when $\dim(\mathcal{H}_A)$ is equal to $\dim(\mathcal{H}_B)$, there is a simple condition to check for positive definite invariance of quantum channels.

Theorem 4.1.1 (Kraus Condition for Strictly Positive Channels). Let $\mathcal{H}_A$ and $\mathcal{H}_B$ be two finite-dimensional Hilbert spaces such that $\dim(\mathcal{H}_A) = \dim(\mathcal{H}_B)$. Let $\mathcal{N} : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$ be a quantum channel between system A and system B with Kraus operators $\{A_i : \mathcal{H}_A \to \mathcal{H}_B\}_{i=1}^M$. If at least one Kraus operator is invertible, then $\mathcal{N}$ is positive definite invariant.

Before proving it, we will need two auxiliary results that give a lower bound on the rank of $\mathcal{N}(\rho)$ in terms of the rank of the density state ρ.

Lemma 4.1.2. Given two positive semidefinite matrices A, B, the following relations hold:

$$\begin{aligned}
\min(\mathrm{rank}(A), \mathrm{rank}(B)) &\leq \mathrm{rank}(A + B) && (4.1) \\
\max(\mathrm{rank}(A), \mathrm{rank}(B)) &\leq \mathrm{rank}(A + B) && (4.2) \\
\mathrm{rank}(A) &\leq \mathrm{rank}(A + B) && (4.3) \\
\mathrm{rank}(B) &\leq \mathrm{rank}(A + B) && (4.4)
\end{aligned}$$

Note that if (4.2) holds, then the rest all trivially hold.

Proof. Based on the note, only equation (4.2) will be proved. Recall that the rank of a matrix $A : V \to W$ is the dimension of its image $\mathrm{im}(A)$ as a linear map. Additionally, the rank-nullity Theorem says that $\dim(V) = \dim(\mathrm{im}(A)) + \dim(\ker(A))$. This implies that equation (4.2) is equivalent to showing that $\dim(\ker(A + B)) \leq \min\{\dim(\ker(A)), \dim(\ker(B))\}$. For the sum of two positive semidefinite matrices,

$$\ker(A + B) = \ker(A) \cap \ker(B). \tag{4.5}$$

This is indeed so, as $\ker(A) \cap \ker(B) \subset \ker(A + B)$ is trivial to show. For the other inclusion, let $|\psi\rangle \in \ker(A + B)$, so that

$$\langle\psi|(A + B)|\psi\rangle = \langle\psi|A|\psi\rangle + \langle\psi|B|\psi\rangle = 0.$$

Since $\langle\psi|A|\psi\rangle \geq 0$ and $\langle\psi|B|\psi\rangle \geq 0$, both must equal zero. This implies that $|\psi\rangle \in \ker(A)$ (and likewise $\ker(B)$): every positive semidefinite matrix has a square root such that $A = L^\dagger L$, so $(\langle\psi|L^\dagger)(L|\psi\rangle) = 0$ implies $|\psi\rangle \in \ker(L)$, and the kernel of L equals the kernel of A. Since $\ker(A + B) = \ker(A) \cap \ker(B)$ and an intersection of subspaces is a subspace of both, one has $\dim(\ker(A + B)) \leq \dim(\ker(A))$ and $\dim(\ker(A + B)) \leq \dim(\ker(B))$, thus proving expression (4.2).

All three remaining expressions (4.1), (4.3), (4.4) follow from the fact that expression (4.2) holds.
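Inequality (4.2), and hence the others, can be tested numerically on random positive semidefinite matrices (a numpy sketch; the dimensions and rank values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def rand_psd(n, r):
    """Random n x n positive semidefinite matrix of rank (at most) r."""
    L = rng.standard_normal((n, r))
    return L @ L.T

for _ in range(100):
    A = rand_psd(6, int(rng.integers(1, 6)))
    B = rand_psd(6, int(rng.integers(1, 6)))
    rA, rB, rAB = (np.linalg.matrix_rank(M) for M in (A, B, A + B))
    assert max(rA, rB) <= rAB   # inequality (4.2); (4.1), (4.3), (4.4) follow
```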

Using the lemma above we get a stronger statement,

Proposition 4.1.2.1. Let $\mathcal{H}_A$ and $\mathcal{H}_B$ be two finite-dimensional Hilbert spaces such that $\dim(\mathcal{H}_A) = \dim(\mathcal{H}_B)$. Let $\mathcal{N} : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$ be a quantum channel between system A and system B with Kraus operators $\{A_i : \mathcal{H}_A \to \mathcal{H}_B\}_{i=1}^M$. If at least one Kraus operator is injective, then for any rank-k density matrix ρ the inequality

$$\mathrm{rank}(\mathcal{N}(\rho)) \geq \mathrm{rank}(\rho)$$

holds.

Proof.

$$\begin{aligned}
\mathrm{rank}(\mathcal{N}(\rho)) &= \mathrm{rank}\Big(\sum_i A_i \rho A_i^\dagger\Big) \\
&\geq \max\{\mathrm{rank}(A_1 \rho A_1^\dagger), \cdots, \mathrm{rank}(A_M \rho A_M^\dagger)\} && \text{((4.2) from Lemma 4.1.2)} \\
&= \max\{\mathrm{rank}(\rho), \mathrm{rank}(A_2 \rho A_2^\dagger), \cdots, \mathrm{rank}(A_M \rho A_M^\dagger)\} && (A_1 \text{ injective; } \mathrm{rank}(A^\dagger) = \mathrm{rank}(A)) \\
&= \mathrm{rank}(\rho) && (\mathrm{rank}(A_i \rho A_i^\dagger) \leq \mathrm{rank}(\rho))
\end{aligned}$$

Using these results, the proof of Theorem 4.1.1 is immediate.

Proof of Theorem 4.1.1. The proof follows from the fact that $\dim(\mathcal{H}_A)$ is equal to $\dim(\mathcal{H}_B)$ and the rank of $\mathcal{N}(\rho)$ is at most $\dim(\mathcal{H}_B)$. Using proposition 4.1.2.1, for a rank-$\dim(\mathcal{H}_A)$ (ie positive definite) density matrix ρ, the inequality

$$\dim(\mathcal{H}_A) = \mathrm{rank}(\rho) \leq \mathrm{rank}(\mathcal{N}(\rho)) \leq \dim(\mathcal{H}_B) = \dim(\mathcal{H}_A)$$

holds, so $\mathcal{N}(\rho)$ has full rank and is positive definite.

It would be nice to have a similar result for larger output spaces, ie $\dim(\mathcal{H}_B) > \dim(\mathcal{H}_A)$. This is the case for the dephrasure channel (introduced in subsection 2.4.3), as the last row is independent of any density matrix $\rho \in L(\mathcal{H}_A)$. In particular, in the standard basis, $\mathcal{N}(\rho)$ of the dephrasure channel will have the form

$$\mathcal{N}(\rho) = \begin{bmatrix} \sum_i A_i \rho A_i^\dagger & 0 \\ 0 & q \end{bmatrix}$$

4.1.2 Dense

Given any two quantum channels $\mathcal{N}_1, \mathcal{N}_2 : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$, any convex combination $\lambda\mathcal{N}_1 + (1 - \lambda)\mathcal{N}_2$ between them with $\lambda \in [0, 1]$ is also a quantum channel. Denote the space of all quantum channels between $L(\mathcal{H}_A)$ and $L(\mathcal{H}_B)$ by $\mathcal{C}(A, B)$. This is a convex subset of the vector space of bounded linear operators $L(L(\mathcal{H}_A), L(\mathcal{H}_B))$ with the operator norm.

Theorem 4.1.3 (Positive Definite Invariant Channels are Dense). The space of strictly positive quantum channels is dense and convex inside the space of all quantum channels C(A, B).

Proof. Let $\mathcal{N}$ be a quantum channel in $\mathcal{C}(A, B)$. Consider the following map, depending on a positive integer $n \in \mathbb{N}$:

$$\frac{n}{1+n}\mathcal{N} + \frac{1}{1+n}\mathcal{I},$$

where $\mathcal{I}$ is the completely depolarizing map that sends ρ to the maximally mixed state $\frac{I}{\dim(\mathcal{H}_B)}$ in system B. Denote $d_B = \dim(\mathcal{H}_B)$. This is a quantum channel, as it is a convex sum of two quantum channels. This map is strictly positive due to Lemma 4.1.2, in particular

max{rank(A), rank(B)} ≤ rank(A + B).

Let $\rho \in L(\mathcal{H}_A)$ be a positive definite quantum state. Denote $A = \frac{n}{1+n}\mathcal{N}(\rho)$ and $B = \frac{1}{1+n}\frac{I}{d_B}$. Since $\frac{I}{d_B}$ has maximal rank, it must be that $\mathrm{rank}(A + B) = \mathrm{rank}(B) = d_B$. Putting any norm on $L(L(\mathcal{H}_A), L(\mathcal{H}_B))$ (since it is finite-dimensional), then

$$\left\|\frac{n}{1+n}\mathcal{N} + \frac{1}{1+n}\mathcal{I} - \mathcal{N}\right\| = \frac{1}{1+n}\left\|\mathcal{I} - \mathcal{N}\right\| \xrightarrow{\,n \to \infty\,} 0.$$

This implies that as n increases, the channel $\frac{n}{1+n}\mathcal{N} + \frac{1}{1+n}\mathcal{I}$ approaches $\mathcal{N}$. Hence the space of all strictly positive quantum channels is dense inside $\mathcal{C}(A, B)$.

To see convexity, let $\lambda \in [0, 1]$ and consider a convex combination $\lambda\mathcal{N}_1 + (1 - \lambda)\mathcal{N}_2$ of strictly positive channels $\mathcal{N}_1$ and $\mathcal{N}_2$. This is also a strictly positive channel: let ρ be a positive definite matrix and consider $\lambda\mathcal{N}_1(\rho) + (1 - \lambda)\mathcal{N}_2(\rho)$. This is a positive definite matrix because it is a convex combination of positive definite matrices, as can be seen from the inner product definition of positive definiteness. Hence, the space of strictly positive quantum channels is also a convex subset.

4.1.3 Examples

Using Theorem 4.1.1, a large number of quantum channels can be shown to be strictly positive. The following are all examples of strictly positive quantum channels.

1. Amplitude-damping channel (introduced in subsection 2.4.1). If the damping param- eter is not equal to one, then using Theorem 4.1.1 the amplitude damping channel is strictly positive.

2. Pauli Channel (introduced in subsection 2.4.2). Since the Kraus operator corresponding to the identity matrix is invertible, the Pauli channel is strictly positive.

3. Dephrasure Channel (introduced in subsection 2.4.3). Let $p, q \in [0, \frac{1}{2}]$ be the probability of a phasing error and an erasure error respectively. The channel has the form

$$\mathcal{N}(\rho) = \begin{bmatrix} (1-q)(1-p)\rho + p(1-q)Z\rho Z & 0 \\ 0 & q \end{bmatrix}$$

where the last row is independent of ρ and the first block is the dephasing channel acting on ρ. This is an example of a positive definite invariant channel where the output has a higher dimension than the input.

4. Pauli-erasure channel. This generalizes the dephrasure channel: instead of the first block being the dephasing channel, it is the Pauli channel. This is clearly a strictly positive channel.
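For instance, the strict positivity of the Pauli channel in item 2 can be confirmed numerically (a numpy sketch; the chosen probabilities and test state are arbitrary, not from the thesis):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def apply_channel(kraus, rho):
    """Apply a channel in Kraus form: rho -> sum_i K_i rho K_i^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

# Pauli channel: K_0 = sqrt(p_0) I is invertible whenever p_0 > 0,
# so by Theorem 4.1.1 the channel is strictly positive
p = [0.7, 0.1, 0.1, 0.1]
kraus = [np.sqrt(pi) * P for pi, P in zip(p, [I2, X, Y, Z])]
assert np.allclose(sum(K.conj().T @ K for K in kraus), I2)   # trace preserving

rho = np.array([[0.9, 0.2], [0.2, 0.1]], dtype=complex)      # positive definite state
assert min(np.linalg.eigvalsh(rho)) > 0

out = apply_channel(kraus, rho)
assert np.isclose(np.trace(out).real, 1.0)
assert min(np.linalg.eigvalsh(out)) > 0      # output stays positive definite
```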

4.2 Differentiability of Coherent Information

Recall that the coherent information has two alternative, equivalent definitions (Definition 2.2.5 and 2.2.6). The difference is how the last term, $S(\mathcal{N}^c(\rho))$ or $S((\mathcal{I} \otimes \mathcal{N})|\psi\rangle\langle\psi|)$, is written. The last subsection showed that $S(\mathcal{N}(\rho))$ is smooth on $\mathcal{D}^{++}$ when $\mathcal{N}$ is strictly positive. This section will characterize when the complementary channel $\mathcal{N}^c$ is strictly positive, so that $S(\mathcal{N}^c(\rho))$ is smooth on $\mathcal{D}^{++}$.

4.2.1 Differentiability of Entropy Exchange and Coherent Information

The complementary channel $\mathcal{N}^c(\rho)$ of a quantum channel $\mathcal{N}$ is defined to be the matrix with (i, j)th entries $\mathrm{Tr}(A_i \rho A_j^\dagger)$ (defined in definition 2.2.3), where $\{A_i\}$ are the Kraus operators for $\mathcal{N}$.

Theorem 4.2.1 (Smoothness of Entropy Exchange). Let $\mathcal{N} : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$ be a quantum channel with Kraus operators $\{A_i : \mathcal{H}_A \to \mathcal{H}_B\}_{i=1}^M$. Let $\mathcal{H}_E$ be the Hilbert space of the environment system, with dimension no larger than the Choi rank. The complementary channel $\mathcal{N}^c : L(\mathcal{H}_A) \to L(\mathcal{H}_E)$ of $\mathcal{N}$ is strictly positive if the Kraus operators form a linearly independent set inside the space of bounded linear operators $L(\mathcal{H}_A, \mathcal{H}_B)$.

Proof. Suppose that ρ is positive definite; then it has a unique square root $\sqrt{\rho}$ such that $(\sqrt{\rho})^2 = \rho$, and since ρ is positive definite, its square root is too. One can rewrite the entries of the complementary channel on ρ as $(\mathrm{Tr}(A_i \sqrt{\rho}\, \sqrt{\rho}\, A_j^\dagger))_{ij}$. The inner product on $L(\mathcal{H}_A, \mathcal{H}_B)$ is the Frobenius inner product, $\langle A | B \rangle = \mathrm{Tr}(A^\dagger B)$, so the entropy exchange (the matrix representation of the complementary channel) is precisely the Gram matrix with respect to the set $M := \{A_i \sqrt{\rho}\}$. It is well known that the Gram matrix is positive definite iff the set M is linearly independent in $L(\mathcal{H}_A, \mathcal{H}_B)$, ie

$$\left(c_1 A_1 + \cdots + c_M A_M\right)\sqrt{\rho} = 0 \iff c_i = 0 \;\forall i.$$

Since $\sqrt{\rho}$ is positive definite, it is invertible, and thus the above is equivalent to

$$c_1 A_1 + \cdots + c_M A_M = 0 \iff c_i = 0 \;\forall i.$$
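The Gram-matrix argument in this proof can be illustrated numerically for the dephasing channel, whose two Kraus operators are linearly independent (a numpy sketch; helper names and parameter values are illustrative):

```python
import numpy as np

def gram_matrix(ops, sqrt_rho):
    """Gram matrix of {A_i sqrt(rho)} under the Frobenius inner product
    <A|B> = Tr(A^dagger B); its entries reproduce the entropy exchange."""
    Ms = [A @ sqrt_rho for A in ops]
    return np.array([[np.trace(Mi.conj().T @ Mj) for Mj in Ms] for Mi in Ms])

sqrt_rho = np.diag(np.sqrt([0.6, 0.4])).astype(complex)

# dephasing channel: Kraus operators {sqrt(1-p) I, sqrt(p) Z} are independent
kraus = [np.sqrt(0.8) * np.eye(2, dtype=complex),
         np.sqrt(0.2) * np.diag([1.0, -1.0]).astype(complex)]
G = gram_matrix(kraus, sqrt_rho)
assert min(np.linalg.eigvalsh(G)) > 0          # positive definite Gram matrix

# a linearly dependent set gives a singular Gram matrix instead
G_dep = gram_matrix([np.eye(2, dtype=complex), 2.0 * np.eye(2, dtype=complex)], sqrt_rho)
assert min(np.linalg.eigvalsh(G_dep)) < 1e-12
```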

The next Theorem combines the previous results to formally tell when the coherent information is smooth on D++.

Theorem 4.2.2 (Differentiability of Coherent Information). Let $\mathcal{N} : L(\mathcal{H}_A) \to L(\mathcal{H}_B)$ be a quantum channel with Kraus operators $\{A_i\}$. The coherent information of the channel with respect to a state $\rho \in \mathcal{D}^{++}$,

$$I_c(\mathcal{N}, \rho) = S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho)),$$

is a smooth function on $\mathcal{D}^{++}$ if $\mathcal{N}$ is a strictly positive quantum channel and its Kraus operators $\{A_i\}$ form a linearly independent set in $L(\mathcal{H}_A, \mathcal{H}_B)$.

Proof. Let $\mathcal{D}_A^{++}$ be the positive definite density states in $L(\mathcal{H}_A)$, and similarly $\mathcal{D}_B^{++}$ for $L(\mathcal{H}_B)$. It was shown that the von Neumann entropy S is a smooth function on $\mathcal{D}_B^{++}$ (Example 9), and $\mathcal{N}$ and $\mathcal{N}^c$ are positive definite invariant, ie

$$\mathcal{N}(\mathcal{D}_A^{++}) \subset \mathcal{D}_B^{++} \quad \text{and} \quad \mathcal{N}^c(\mathcal{D}_A^{++}) \subset \mathcal{D}_E^{++}.$$

The proof then follows from the fact that compositions and sums of smooth functions are smooth.

4.3 Gradient and Hessian

Having shown when the coherent information $I_c$ is a smooth function on $\mathcal{D}^{++}$, this section will compute its gradient and Hessian.

4.3.1 Gradient

Recall that if $f : M \to \mathbb{R}$ is a real-valued function on a smooth Riemannian manifold M, its gradient grad(f)(p) is the unique vector such that

$$\langle \mathrm{grad}(f)(p) \,|\, V \rangle_p = df_p[V]$$

holds for all derivations $V \in T_p(M)$ (introduced in Theorem 3.2.1).

Theorem 4.3.1. Let $\mathcal{N}$ be a strictly positive quantum channel with linearly independent Kraus operators. Let $I_c$ be the coherent information of $\mathcal{N}$, let $\mathcal{N}^c$ be the complementary channel, and let ρ be a positive definite density state. The gradient of the coherent information is

$$\mathrm{grad}(I_c(\mathcal{N}, \rho)) = P\left(-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big)\right) \tag{4.6}$$

where P is the projection of the Hermitian matrices $\mathcal{H}$ (tangent space of $\mathcal{M}^{++}$) onto the space of trace-zero Hermitian matrices $\mathcal{H}^0$ (tangent space of $\mathcal{D}^{++}$).

Proof. The coherent information is $I_c(\mathcal{N}, \rho) = S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho))$, where $S : \mathcal{D}^{++} \to \mathbb{R}$ is the von Neumann entropy. The projection operator $P_\rho$ is the mapping from the tangent space $T_\rho(\mathcal{M}^{++})$ to $T_\rho(\mathcal{D}^{++})$ (introduced in Theorem 3.1.9). Let $d_A$, $d_B$ and $d_E$ be the dimensions of $\mathcal{H}_A$, $\mathcal{H}_B$ and $\mathcal{H}_E$ respectively. The goal here is to compute the gradient $\mathrm{grad}^{\mathcal{M}^{++}}(I_c)$ of $I_c$ as a smooth function on $\mathcal{M}^{++}$, then project it onto the tangent space of $\mathcal{D}^{++}$ using the projection operator P. Recall that this holds from equation (3.4), which says

$$\mathrm{grad}(I_c(\mathcal{N}, \rho)) = P\left(\mathrm{grad}^{\mathcal{M}^{++}}(I_c(\mathcal{N}, \rho))\right). \tag{4.7}$$

The differential $dS_X[V]$ of S was shown to be $-\mathrm{Tr}((\log(X) + I_n)V)$ (in example 13). Let ρ be a positive definite density state; here we compute the differential of $S \circ \mathcal{N}$ at ρ. Let V be a derivation in $T_\rho(\mathcal{M}^{++})$.

$$\begin{aligned}
d(S \circ \mathcal{N})_\rho[V] &= dS_{\mathcal{N}(\rho)} \circ d(\mathcal{N})_\rho[V] && \text{(chain rule)} \\
&= dS_{\mathcal{N}(\rho)}[\mathcal{N}(V)] && (\mathcal{N} \text{ is linear}) \\
&= -\mathrm{Tr}\big((\log(\mathcal{N}(\rho)) + I_{d_B})\,\mathcal{N}(V)\big) && \text{(differential } dS_X) \\
&= \langle -(\log(\mathcal{N}(\rho)) + I_{d_B}) \,|\, \mathcal{N}(V) \rangle && \text{(Frobenius inner product)} \\
&= \langle -\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho)) + I_{d_B}\big) \,|\, V \rangle && \text{(adjoint/dual channel)}
\end{aligned}$$

Hence the gradient $\mathrm{grad}((S \circ \mathcal{N})^{\mathcal{M}^{++}})$ of $S \circ \mathcal{N}$ on $\mathcal{M}^{++}$ is

$$-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho)) + I_{d_B}\big) = -\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) - I_{d_A},$$

where $\mathcal{N}^\dagger : L(\mathcal{H}_B) \to L(\mathcal{H}_A)$ is the adjoint/dual channel introduced in the second chapter as definition 2.2.4, and $I_{d_A}$ is the identity matrix on $\mathcal{H}_A$ (similarly $I_{d_B}$ on $\mathcal{H}_B$). Note that adjoint channels of trace-preserving channels are unital linear maps. The same reasoning shows that the gradient for the complementary channel $\mathcal{N}^c$ is

$$\mathrm{grad}((S \circ \mathcal{N}^c)^{\mathcal{M}^{++}}) = -\mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big) - I_{d_A}.$$

Using equation 4.7, and canceling out the identity operator gives the final result:   †  c† c  grad(Ic(N , ρ)) = Pρ − N log(N (ρ)) + N log(N (ρ)) ,

M++ denote the inner-term inside Pρ as grad (Ic(N , ρ)). .
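The gradient formula can be verified against a finite-difference derivative. The sketch below (numpy; written for the dephasing channel with illustrative parameters, matrix logarithm via eigendecomposition, and helper names of my choosing) checks that the Frobenius pairing of the un-projected gradient with a trace-zero Hermitian direction V matches $\frac{d}{dt} I_c(\rho + tV)$; for trace-zero V, the projection does not affect the pairing:

```python
import numpy as np

def herm_logm(Xm):
    w, U = np.linalg.eigh(Xm)
    return (U * np.log(w)) @ U.conj().T

def entropy(Xm):
    w = np.linalg.eigvalsh(Xm)
    return float(-(w * np.log(w)).sum())

# dephasing channel on one qubit: K0 = sqrt(1-p) I, K1 = sqrt(p) Z
p = 0.2
K = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * np.diag([1.0, -1.0])]

def chan(rho):                 # N(rho) = sum_i K_i rho K_i^dagger
    return sum(Ki @ rho @ Ki.conj().T for Ki in K)

def comp(rho):                 # complementary channel, (i,j) -> Tr(K_i rho K_j^dagger)
    return np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in K] for Ki in K])

def chan_adj(sig):             # adjoint channel N^dagger
    return sum(Ki.conj().T @ sig @ Ki for Ki in K)

def comp_adj(sig):             # adjoint of the complementary channel
    M = len(K)
    return sum(sig[i, j] * K[i].conj().T @ K[j] for i in range(M) for j in range(M))

def Ic(rho):
    return entropy(chan(rho)) - entropy(comp(rho))

def grad_Ic(rho):              # un-projected gradient (identity terms cancel)
    return -chan_adj(herm_logm(chan(rho))) + comp_adj(herm_logm(comp(rho)))

rho = np.array([[0.7, 0.1], [0.1, 0.3]], dtype=complex)
V = np.array([[1.0, 0.5j], [-0.5j, -1.0]])            # trace-zero Hermitian direction

t = 1e-6
fd = (Ic(rho + t * V) - Ic(rho - t * V)) / (2 * t)    # central finite difference
assert np.isclose(fd, np.trace(grad_Ic(rho) @ V).real, atol=1e-5)
```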

The set of points at which the gradient of f vanishes is called the set of critical points of f. The next Theorem provides different ways in which a density matrix $\rho \in \mathcal{D}^{++}$ can be a critical point.

Theorem 4.3.2. Let $\mathcal{N}$ be a strictly positive quantum channel with linearly independent Kraus operators $\{A_i\}$. Let $I_c$ be the coherent information of $\mathcal{N}$ on the space of positive definite density matrices $\mathcal{D}^{++}$, let $\mathcal{N}^c$ be the complementary channel, and let ρ be a positive definite density state. Let C be a real number, and let U be the isometric extension of $\mathcal{N}$ (and, by definition, of $\mathcal{N}^c$). The positive definite density matrix $\rho \in \mathcal{D}^{++}$ is a critical point of $I_c$ if any of the following hold.

1. $-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big) = C I_{d_A}$

2. $-\sum_i A_i^\dagger \log(\mathcal{N}(\rho)) A_i + \sum_{ij} \big(\log(\mathcal{N}^c(\rho))\big)_{ij}\, A_j^\dagger A_i = C I_{d_A}$

3. $U^\dagger\big({-\log(\mathcal{N}(\rho))} \oplus \log(\mathcal{N}^c(\rho))\big)U = C I_{d_A}$, where ⊕ is the Kronecker sum.

4. From satisfying item (1), one of two cases must follow: either $-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big)$ and $\mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big)$ are both in $\mathrm{span}(I_{d_A})$, or both are not in $\mathrm{span}(I_{d_A})$.

Proof. Since $\mathcal{N}$ is strictly positive with linearly independent Kraus operators, by Theorem 4.2.2 the coherent information is smooth on the manifold $\mathcal{D}^{++}$. Suppose ρ is a critical point, ie equation (4.6) at ρ is zero. The proof of (1) is as follows. P is the orthogonal projection operator from $\mathcal{H}$ onto the trace-zero Hermitian matrices $\mathcal{H}^0$, and the orthogonal complement of $\mathcal{H}^0$ is the span of the identity (proven in 2.4.1). Hence the set of Hermitian matrices $X \in \mathcal{H}$ whose projection is zero, ie $P(X) = 0$, is precisely the span of the identity. For the proof of (2), the definition of the adjoint channel in terms of Kraus representations (found in 2.2.4) and the definition of the adjoint of the complementary channel (found in the second chapter as theorem 2.2.2) are substituted into item (1). For the proof of (3), Theorem 2.2.3 shows how to represent the adjoint channels $\mathcal{N}^\dagger$ and $\mathcal{N}^{c\dagger}$ in terms of the isometric extension $U : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$. Precisely,

† † N (σB) = U (σB ⊗ IE)U ∀σB ∈ L(HB), (4.8)

68 c† † N (σE) = U (IB ⊗ σE)U ∀σE ∈ L(HE). (4.9)

Plugging these expressions in item (1) to get,

† † c −U (log(N (ρ)) ⊗ IE)U + U (IB ⊗ log(N (ρ))U

Factoring out the isometry operator U.   † c U − log(N (ρ)) ⊗ IE + IB ⊗ log(N (ρ)) U.

This completes the proof due to the kronecker sum A ⊕ B is (A ⊗ I) + (I ⊗ B). The proof of (4) just follows from (1). If one of the terms (either −N †(log(N (ρ))) or N c†(log(N c(ρ)))) on the left side is in the span of the identity, and since the right side is in the span of the identity, then the other term from the left side must be in the span of the identity. Same reasoning holds if they are both not in the span of the identity but sum to be in the span of the identity.

The next chapter will show applications of this Theorem to various quantum channels (the Pauli channel and the dephrasure channel). In particular, condition (1) will primarily be used to show that the maximally mixed state is a critical point. The next Theorem shows that the inner gradient formula (the left-hand side of condition (1) above) does not depend on positive scalar multiples of the density state ρ.

Theorem 4.3.3. Denote [ρ] := {λρ : λ > 0}, the ray of positive definite density matrices through ρ. The gradient of the coherent information on M^{++} (the left-hand side of condition (1) above) does not depend on λ, i.e.,

$$\mathrm{grad}^{M^{++}}(I_c(\mathcal{N}, [\rho])) = \mathrm{grad}^{M^{++}}(I_c(\mathcal{N}, \rho)),$$

where $\mathrm{grad}^{M^{++}}(I_c(\mathcal{N}))(\rho) = -\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big)$.

Proof. Let ρ be a positive definite density state and λ > 0. Then the gradient of the channel entropy acting on λρ is

$$-\mathcal{N}^\dagger\big(\log(\lambda\, \mathcal{N}(\rho))\big) = -\mathcal{N}^\dagger\big(\log(\lambda)\, I + \log(\mathcal{N}(\rho))\big),$$

using the fact that if A and B commute then log(AB) = log(A) + log(B). Applying the adjoint (which is unital) gives

$$-\log(\lambda)\, I - \mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big).$$

The gradient of the entropy exchange acting on λρ is similarly

$$\log(\lambda)\, I + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big).$$

Hence, putting them together, the log(λ)I terms cancel out, giving the original formula (1) in Theorem 4.3.2.
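Theorem 4.3.3 is easy to check numerically. The sketch below is a minimal illustration, assuming a Pauli channel with arbitrarily chosen parameters and natural logarithms; it evaluates the gradient formula at a random positive definite state ρ and at a scaled copy λρ and confirms the two agree.

```python
import numpy as np

def logm_h(A):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.conj().T

def grad_Ic(kraus, rho):
    """-N†(log N(rho)) + N^c†(log N^c(rho)), the formula of Theorem 4.3.2 (1)-(2)."""
    k = len(kraus)
    Nrho = sum(A @ rho @ A.conj().T for A in kraus)
    Ncrho = np.array([[np.trace(Ai @ rho @ Aj.conj().T) for Aj in kraus] for Ai in kraus])
    L = logm_h(Ncrho)
    return (-sum(A.conj().T @ logm_h(Nrho) @ A for A in kraus)
            + sum(L[i, j] * kraus[i].conj().T @ kraus[j] for i in range(k) for j in range(k)))

# Pauli channel with illustrative parameters
px, py, pz = 0.1, 0.05, 0.2
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
kraus = [np.sqrt(p) * P for p, P in zip([1 - px - py - pz, px, py, pz], [I2, X, Y, Z])]

# a random positive definite density matrix
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# the gradient is invariant under positive scaling of rho
assert np.allclose(grad_Ic(kraus, rho), grad_Ic(kraus, 3.7 * rho))
```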

This Theorem says that understanding the gradient formula on a single fixed-trace slice, Tr^{-1}(k) intersected with the positive definite Hermitian matrices M^{++}, determines the gradient of the coherent information over all of M^{++}.

The next Theorem shows that one can rewrite the coherent information Ic in terms of the gradient of the coherent information on the manifold M++.

Theorem 4.3.4. Let N be a strictly positive quantum channel with independent Kraus operators and denote by N^c the complementary channel. Let ρ be a positive definite density matrix.

The coherent information $I_c(\mathcal{N}, \rho) = S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho))$ can be written as

$$I_c(\mathcal{N}, \rho) = \big\langle \mathrm{grad}^{M^{++}} I_c(\mathcal{N}, \rho) \,\big|\, \rho \big\rangle,$$

where $\mathrm{grad}^{M^{++}}(I_c(\mathcal{N}, \rho)) = -\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big)$.

Proof. The proof just follows from the definition of the adjoint of a channel.

$$\begin{aligned}
I_c(\mathcal{N}, \rho) &= S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho)) \\
&= -\mathrm{Tr}\big(\mathcal{N}(\rho)\log(\mathcal{N}(\rho))\big) + \mathrm{Tr}\big(\mathcal{N}^c(\rho)\log(\mathcal{N}^c(\rho))\big) && \text{(definition of entropy)} \\
&= -\mathrm{Tr}\big(\rho\, \mathcal{N}^\dagger(\log(\mathcal{N}(\rho)))\big) + \mathrm{Tr}\big(\rho\, \mathcal{N}^{c\dagger}(\log(\mathcal{N}^c(\rho)))\big) && \text{(definition of adjoint channel)} \\
&= \mathrm{Tr}\Big(\rho\big(-\mathcal{N}^\dagger(\log(\mathcal{N}(\rho))) + \mathcal{N}^{c\dagger}(\log(\mathcal{N}^c(\rho)))\big)\Big) \\
&= \mathrm{Tr}\big(\rho\, \mathrm{grad}(I_c(\mathcal{N}, \rho))\big) \\
&= \big\langle \mathrm{grad}(I_c(\mathcal{N}, \rho)) \,\big|\, \rho \big\rangle.
\end{aligned}$$

Theorem 4.3.4 together with Theorem 4.3.3 implies that the gradient of the coherent information is homogeneous of degree zero, and hence, by Euler's identity applied to Theorem 4.3.4, that the coherent information itself is homogeneous of degree one on the cone M^{++}.
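Theorem 4.3.4 can likewise be verified numerically. A minimal sketch, again assuming a Pauli channel with illustrative parameters and natural logarithms:

```python
import numpy as np

def logm_h(A):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.conj().T

def entropy(A):
    """von Neumann entropy (natural log) of a positive definite matrix."""
    w = np.linalg.eigvalsh(A)
    return -np.sum(w * np.log(w))

# Pauli channel with illustrative parameters
px, py, pz = 0.1, 0.05, 0.2
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
kraus = [np.sqrt(p) * P for p, P in zip([1 - px - py - pz, px, py, pz], [I2, X, Y, Z])]

# a random positive definite density matrix
rng = np.random.default_rng(2)
G = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
rho = G @ G.conj().T
rho /= np.trace(rho).real

Nrho = sum(A @ rho @ A.conj().T for A in kraus)                     # N(rho)
Ncrho = np.array([[np.trace(Ai @ rho @ Aj.conj().T)                 # N^c(rho)
                   for Aj in kraus] for Ai in kraus])
L = logm_h(Ncrho)
grad = (-sum(A.conj().T @ logm_h(Nrho) @ A for A in kraus)          # -N†(log N(rho))
        + sum(L[i, j] * kraus[i].conj().T @ kraus[j]                # +N^c†(log N^c(rho))
              for i in range(4) for j in range(4)))

Ic = entropy(Nrho) - entropy(Ncrho)          # S(N(rho)) - S(N^c(rho))
assert np.isclose(np.trace(rho @ grad).real, Ic)   # <grad | rho> = I_c
```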

4.3.2 Hessian

The space of positive definite density matrices D^{++} is an embedded submanifold of the linear manifold of Hermitian matrices H, with the Riemannian metric induced from H. The result from Example 18 then shows that the Hessian of a real-valued function f^D : D^{++} → R is just the projection of the differential of the gradient of f^H : H → R onto the tangent space of D^{++}, i.e.

$$\mathrm{Hess}(f^D)(\rho) = P_\rho\big(d(\mathrm{grad}(f^H))_\rho\big). \tag{4.10}$$

The previous section derived the gradient of the coherent information on M^{++} and consequently on D^{++}. The next Theorem does the same for the Hessian.

Theorem 4.3.5 (Hessian of the Coherent Information). Let N be a strictly positive quantum channel with independent Kraus operators. Let I_c be the coherent information of N. Let N^c be the complementary channel. Let ρ be a positive definite density state.

The Hessian of I_c at ρ is the map from T_ρ(D^{++}) to T_ρ(D^{++}) with the following action on a derivation V ∈ T_ρ(D^{++}):

$$V \mapsto P\Big(-\mathcal{N}^\dagger\big(d(\log(\mathcal{N}(\rho)))_\rho[V]\big) + \mathcal{N}^{c\dagger}\big(d(\log(\mathcal{N}^c(\rho)))_\rho[V]\big)\Big), \tag{4.11}$$

where the differential of the logarithm was introduced in Example 12 and P is the orthogonal projection of H onto H^0 (the subscript ρ is dropped since P is a constant map), introduced in Theorem 3.1.9.

Proof. The proof first computes the Hessian of S(N) on M^{++} and then uses equation 4.10 to compute the Hessian on D^{++}. Let ρ be a positive definite density matrix and let V be a derivation in T_ρ(M^{++}).

$$\begin{aligned}
d(\mathrm{grad}(S \circ \mathcal{N}))_\rho[V] &= -d\big(\mathcal{N}^\dagger(\log(\mathcal{N}) + I)\big)_\rho[V] \\
&= -\mathcal{N}^\dagger\big(d(\log(\mathcal{N}) + I)_\rho[V]\big) && (\mathcal{N}^\dagger \text{ is linear}) \\
&= -\mathcal{N}^\dagger\big(d(\log \circ\, \mathcal{N})_\rho[V]\big) && (d(I)_\rho = 0) \\
&= -\mathcal{N}^\dagger\big(d(\log)_{\mathcal{N}(\rho)}\big[d(\mathcal{N})_\rho[V]\big]\big) && \text{(chain rule)} \\
&= -\mathcal{N}^\dagger\big(d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)]\big) && (\mathcal{N} \text{ is linear})
\end{aligned}$$

The differential of the logarithm is

$$d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)] = \int_0^1 \big[s\mathcal{N}(\rho) + (1-s)I\big]^{-1}\, \mathcal{N}(V)\, \big[s\mathcal{N}(\rho) + (1-s)I\big]^{-1}\, ds,$$

which reduces to $\mathcal{N}(\rho)^{-1}\mathcal{N}(V)$ when $\mathcal{N}(\rho)$ and $\mathcal{N}(V)$ commute.

The same reasoning gives the differential of the gradient of the entropy exchange S ∘ N^c:

$$d(\mathrm{grad}(S \circ \mathcal{N}^c))_\rho[V] = -\mathcal{N}^{c\dagger}\big(d(\log)_{\mathcal{N}^c(\rho)}[\mathcal{N}^c(V)]\big).$$

Finally, putting them together, the Hessian of the coherent information on D^{++} is

$$\mathrm{Hess}(I_c(\mathcal{N}, \rho))[V] = P\Big(-\mathcal{N}^\dagger\big(d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)]\big) + \mathcal{N}^{c\dagger}\big(d(\log)_{\mathcal{N}^c(\rho)}[\mathcal{N}^c(V)]\big)\Big).$$

Suppose ρ ∈ D^{++} is a critical point. The Hessian Hess(I_c(N, ρ)) of I_c at ρ is a linear operator on H^0. Determining whether ρ is a local maximum (or local minimum) amounts to showing that the linear operator Hess(I_c(N, ρ)) is negative definite (or positive definite), i.e. that all of its eigenvalues are negative (or positive). One approach is to write the Hessian in a matrix representation and solve for the eigenvalues of that matrix. Recall from Theorem 2.4.1 that the tensor combinations of the Pauli operators {I, X, Y, Z}, excluding the identity term, form a basis for the space of 2^n × 2^n trace-zero matrices H^0_{2^n}. When working with qubit density matrices, writing out the Hessian in this basis provides a matrix representation; this is done in the next chapter.

Theorem 4.3.2 gives conditions for critical points of $\mathrm{grad}(I_c(\mathcal{N}, \rho)) = P(\mathrm{grad}(f^{M^{++}}))$ on the manifold D^{++} by providing conditions on the inner term $\mathrm{grad}(f^{M^{++}})$ on the manifold M^{++}. The next Theorem shows a similar result for the Hessian.

Theorem 4.3.6. The Hessian $\mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))$ of the coherent information on M^{++} is positive definite (or negative definite) on H^0 if and only if the Hessian

$$\mathrm{Hess}(I_c(\mathcal{N}, \rho)) = P\big(\mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))\big)$$

of the coherent information on D^{++} is positive definite (or negative definite) on H^0.

Proof. Recall that a linear map L : V → V is positive definite if for all non-zero vectors v,

$$\langle v | L | v \rangle > 0.$$

Suppose that $\mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))$ is positive definite. Let V be a non-zero trace-zero matrix in H^0.

$$\begin{aligned}
\big\langle V \,\big|\, P\big(\mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))\big) \,\big|\, V \big\rangle &= \mathrm{Tr}\big(V\, P(\mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))[V])\big) \\
&= \mathrm{Tr}\big(P^\dagger(V)\, \mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))[V]\big) \\
&= \mathrm{Tr}\big(P(V)\, \mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))[V]\big) \\
&= \mathrm{Tr}\big(V\, \mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho))[V]\big) \\
&= \big\langle V \,\big|\, \mathrm{Hess}(I_c^{M^{++}}(\mathcal{N}, \rho)) \,\big|\, V \big\rangle > 0.
\end{aligned}$$

The proof relies on the facts that the projection map P : H → H^0 is an orthogonal projection, i.e. P^† = P (used in the third line), and that P(V) = V on H^0 (used in the fourth line). The proof of the reverse direction is identical.

Rather than computing the integral form of the differential of the matrix logarithm, Theorem A.3.4 in the appendix states that the differential $d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)]$ is, in the eigenbasis of N(ρ),

$$\begin{pmatrix}
\frac{1}{\lambda_1} & \frac{\log(\lambda_1) - \log(\lambda_2)}{\lambda_1 - \lambda_2} & \cdots & \frac{\log(\lambda_1) - \log(\lambda_n)}{\lambda_1 - \lambda_n} \\
\vdots & \ddots & & \vdots \\
\frac{\log(\lambda_n) - \log(\lambda_1)}{\lambda_n - \lambda_1} & \cdots & \cdots & \frac{1}{\lambda_n}
\end{pmatrix} \circ \mathcal{N}(V), \tag{4.12}$$

where {λ_1, ..., λ_n} are the eigenvalues of N(ρ) and ∘ is the Schur/Hadamard product. Note that equation (4.12) is the special case when the eigenvalues are all distinct. For the general form, see Theorem A.3.4. Denote the Loewner matrix of the logarithm by $\log^{[1]}(\mathcal{N}(\rho))$ (in this case the left matrix in equation (4.12)). This implies that the Hessian can be written (ignoring the projection term) as

$$\begin{aligned}
\mathrm{Hess}(I_c(\mathcal{N}, \rho))[V] &= -\mathcal{N}^\dagger\big(\log^{[1]}(\mathcal{N}(\rho)) \circ \mathcal{N}(V)\big) + \mathcal{N}^{c\dagger}\big(\log^{[1]}(\mathcal{N}^c(\rho)) \circ \mathcal{N}^c(V)\big) && (4.13) \\
&= -\sum_i A_i^\dagger\big(\log^{[1]}(\mathcal{N}(\rho)) \circ \mathcal{N}(V)\big) A_i + \sum_{ij}\big(\log^{[1]}(\mathcal{N}^c(\rho))\big)_{ij}\, \mathrm{Tr}(A_i V A_j^\dagger)\, A_i^\dagger A_j, && (4.14)
\end{aligned}$$

where A_i are the Kraus operators of N.

Concavity

The previous chapter showed that the von Neumann entropy S is strictly concave by showing that its Riemannian Hessian is negative definite. It is known that S ∘ N is concave since N is linear. It will now be shown that the Hessian of S ∘ N is negative semidefinite on D^{++}. Let V be a non-zero, trace-zero Hermitian matrix and ρ a positive definite density matrix. Assuming for the moment that N(ρ) and N(V) commute,

$$\begin{aligned}
\langle V | \mathrm{Hess}(S \circ \mathcal{N})(\rho)[V] \rangle &= -\mathrm{Tr}\big(V\, P\big(\mathcal{N}^\dagger(d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)])\big)\big) \\
&= -\mathrm{Tr}\big(P^\dagger(V)\, \mathcal{N}^\dagger(d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)])\big) \\
&= -\mathrm{Tr}\big(V\, \mathcal{N}^\dagger(d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)])\big) \\
&= -\mathrm{Tr}\big(\mathcal{N}(V)\, d(\log)_{\mathcal{N}(\rho)}[\mathcal{N}(V)]\big) \\
&= -\mathrm{Tr}\big(\mathcal{N}(V)\, \mathcal{N}(\rho)^{-1}\, \mathcal{N}(V)\big) \\
&= -\mathrm{Tr}\big(\mathcal{N}(V)^2\, \mathcal{N}(\rho)^{-1}\big).
\end{aligned}$$

The eigenvalues of N(V)^2 are always greater than or equal to zero, and the eigenvalues of N(ρ)^{-1} are always positive and non-zero (since ρ is positive definite and N is strictly positive). Using the trace inequality (Theorem 3.2.2),

$$0 \le \sum_i \lambda_i \beta_{n-i+1} \le \mathrm{Tr}\big(\mathcal{N}(V)^2\, \mathcal{N}(\rho)^{-1}\big),$$

where λ_i are the eigenvalues of N(V)^2 and β_i are the eigenvalues of N(ρ)^{-1}, both ordered in decreasing order. Thus $\langle V | \mathrm{Hess}(S \circ \mathcal{N})(\rho)[V] \rangle \le 0$, which proves that S ∘ N is concave. (When N(ρ) and N(V) do not commute, the integral representation gives $\langle V | \mathrm{Hess}(S \circ \mathcal{N})(\rho)[V] \rangle = -\int_0^1 \mathrm{Tr}\big((M_s^{1/2}\, \mathcal{N}(V)\, M_s^{1/2})^2\big)\, ds \le 0$ with $M_s = [s\mathcal{N}(\rho) + (1-s)I]^{-1}$, so the conclusion is unchanged.)

Remark 4. It is worth remarking that S is strictly concave, as its Hessian is negative definite, whereas S ∘ N is merely concave, as its Hessian is negative semidefinite. This is because N may map a non-zero trace-zero Hermitian matrix V to the zero matrix. Furthermore, the coherent information is concave if, for all positive definite density matrices ρ,

$$\mathrm{Tr}\big(\mathcal{N}^c(V)^2\, \mathcal{N}^c(\rho)^{-1}\big) \le \mathrm{Tr}\big(\mathcal{N}(V)^2\, \mathcal{N}(\rho)^{-1}\big)$$

holds for all V ∈ H^0, where N^c is the complementary channel of N.
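The concavity of S ∘ N can also be observed directly from its definition. A minimal numerical sketch, assuming a Pauli channel with illustrative parameters:

```python
import numpy as np

def entropy(A):
    """von Neumann entropy (natural log) of a positive definite matrix."""
    w = np.linalg.eigvalsh(A)
    return -np.sum(w * np.log(w))

# Pauli channel with illustrative parameters
px, py, pz = 0.1, 0.05, 0.2
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
kraus = [np.sqrt(p) * P for p, P in zip([1 - px - py - pz, px, py, pz], [I2, X, Y, Z])]
channel = lambda rho: sum(A @ rho @ A.conj().T for A in kraus)

def random_state(rng):
    G = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

# concavity of S o N: S(N(t r1 + (1-t) r2)) >= t S(N(r1)) + (1-t) S(N(r2))
rng = np.random.default_rng(1)
for _ in range(50):
    r1, r2 = random_state(rng), random_state(rng)
    t = rng.uniform()
    lhs = entropy(channel(t * r1 + (1 - t) * r2))
    rhs = t * entropy(channel(r1)) + (1 - t) * entropy(channel(r2))
    assert lhs >= rhs - 1e-10
```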

Chapter 5

Critical Points and Local Maxima/Minima

The last chapter showed that strictly positive quantum channels with independent Kraus operators make the coherent information a smooth function on the manifold of positive definite matrices D^{++}. Consequently, the gradient and Hessian of the coherent information were derived and their properties discussed. The main concern of this chapter is to use the gradient and Hessian to find critical points and local maxima/minima of the coherent information of various channels. The first subsection examines under what conditions a product state ρ^{⊗n} is a critical point for the n-shot coherent information when ρ is a critical point for the single-shot coherent information. The class of unital quantum channels is discussed next. The following three sections show that the maximally mixed state is always a critical point of the n-shot coherent information for the Pauli, dephrasure and Pauli-erasure channels. Afterwards, the eigenvalues of the Hessian of the single-shot coherent information of each of those channels are solved at the maximally mixed state, showing in which parameter regions it is a local maximum/minimum or saddle point.

Throughout this section, assume that all quantum channels N mentioned in generality are strictly positive and that their Kraus operators {A_i} form an independent set inside the space of bounded linear operators. Denote by N^c the complementary channel, so that N^c(ρ) is the matrix with (i, j)-th entry $\mathrm{Tr}(A_i \rho A_j^\dagger)$.

5.1 n-shot Coherent Information

Recall the definition of the quantum capacity:

$$Q(\mathcal{N}) = \lim_{n \to \infty} \max_{\rho \in \mathcal{D}_n} \frac{I_c(\mathcal{N}^{\otimes n}, \rho)}{n}.$$

It is clear that as n increases, the space of density matrices becomes larger and one would expect more critical points. Denote by n-shot coherent information the coherent information $I_c(\mathcal{N}^{\otimes n}, \cdot)$ of the channel tensored n times, and denote by $\mathcal{D}_n^{++}$ the space of positive definite density matrices corresponding to the input space of $\mathcal{N}^{\otimes n}$. This section is concerned with connecting critical points of some n-shot coherent information to larger multiples of n.

5.1.1 Critical Points of Product States

If ρ is a critical point for the k-shot coherent information, is the product state ρ^{⊗n} a critical point for the nk-shot coherent information? The next Theorem shows this for the single-shot case (k = 1).

Theorem 5.1.1. Suppose that $\rho \in \mathcal{D}_1^{++}$ is a critical point for the single-shot coherent information $I_c(\mathcal{N}, \cdot)$. Then $\rho^{\otimes n} \in \mathcal{D}_n^{++}$ is a critical point for the n-shot coherent information $I_c(\mathcal{N}^{\otimes n}, \cdot)$ for any positive integer n.

The proof relies on the logarithm property log(X ⊗ X) = log(X) ⊕ log(X), which holds for all X ∈ M^{++}, where ⊕ is the Kronecker sum. This can be seen as follows: let X = UΣU^† be the spectral decomposition of X; then

$$\begin{aligned}
\log(X \otimes X) &= \log(U\Sigma U^\dagger \otimes U\Sigma U^\dagger) \\
&= \log\big((U \otimes U)(\Sigma \otimes \Sigma)(U^\dagger \otimes U^\dagger)\big) && \text{(tensor property)} \\
&= (U \otimes U)\, \log(\Sigma \otimes \Sigma)\, (U^\dagger \otimes U^\dagger) && \text{(definition of log)} \\
&= (U \otimes U)\, \log\big((\Sigma \otimes I)(I \otimes \Sigma)\big)\, (U^\dagger \otimes U^\dagger) \\
&= (U \otimes U)\, \big(\log(\Sigma) \oplus \log(\Sigma)\big)\, (U^\dagger \otimes U^\dagger) && (\Sigma \otimes I \text{ commutes with } I \otimes \Sigma) \\
&= (U \otimes U)(\log(\Sigma) \otimes I)(U^\dagger \otimes U^\dagger) + (U \otimes U)(I \otimes \log(\Sigma))(U^\dagger \otimes U^\dagger) \\
&= \log(X) \otimes I + I \otimes \log(X) \\
&= \log(X) \oplus \log(X).
\end{aligned}$$

This property generalizes to any higher tensor power X^{⊗n}: $\log(X^{\otimes n}) = \bigoplus_{i=1}^n \log(X)$.
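The identity log(X ⊗ X) = log(X) ⊕ log(X) is easy to confirm numerically for a random positive definite X:

```python
import numpy as np

def logm_h(A):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.conj().T

rng = np.random.default_rng(1)
M = rng.standard_normal((2, 2))
X = M @ M.T + 2 * np.eye(2)          # random positive definite X
I = np.eye(2)

lhs = logm_h(np.kron(X, X))
rhs = np.kron(logm_h(X), I) + np.kron(I, logm_h(X))   # Kronecker sum log(X) + log(X)
assert np.allclose(lhs, rhs)
```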

Proof. Since $\rho \in \mathcal{D}_1^{++}$ is a critical point for the single-shot coherent information, then for some constant C ∈ R,

$$-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big) = CI$$

holds (Theorem 4.3.2), where I is the identity matrix. The previous paragraph shows that

$$\log\big(\mathcal{N}^{\otimes n}(\rho^{\otimes n})\big) = \bigoplus_{i=1}^n \log(\mathcal{N}(\rho)) \quad \text{and} \quad \log\big(\mathcal{N}^{c\otimes n}(\rho^{\otimes n})\big) = \bigoplus_{i=1}^n \log(\mathcal{N}^c(\rho)),$$

where the Kronecker sum for more summands is A ⊕ B ⊕ C = (A ⊗ I ⊗ I) + (I ⊗ B ⊗ I) + (I ⊗ I ⊗ C). Applying the adjoint channels $\mathcal{N}^{\dagger\otimes n}$ and $\mathcal{N}^{c\dagger\otimes n}$ respectively gives

$$\bigoplus_{i=1}^n \mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) \quad \text{and} \quad \bigoplus_{i=1}^n \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big),$$

where it is noted that the adjoint channels are always unital since the channels are trace preserving. Putting them all together,

$$\begin{aligned}
\mathrm{grad}\big(I_c(\mathcal{N}^{\otimes n}, \rho^{\otimes n})\big) &= -\bigoplus_{i=1}^n \mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \bigoplus_{i=1}^n \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big) \\
&= \bigoplus_{i=1}^n \Big(-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big)\Big) \\
&= \bigoplus_{i=1}^n CI = nC\tilde{I},
\end{aligned}$$

where Ĩ is the larger identity matrix acting on the domain of $\mathcal{N}^{\otimes n}$. Since this is in the span of the identity, the criterion for ρ^{⊗n} to be a critical point of the n-shot coherent information is satisfied (condition (1) in Theorem 4.3.2).
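Theorem 5.1.1 can be checked numerically for n = 2. A minimal sketch, assuming a Pauli channel with illustrative parameters, for which the maximally mixed state is a critical point (see Theorem 5.3.1):

```python
import numpy as np

def logm_h(A):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.conj().T

def grad_Ic(kraus, rho):
    """-N†(log N(rho)) + N^c†(log N^c(rho)) of Theorem 4.3.2."""
    k = len(kraus)
    Nrho = sum(A @ rho @ A.conj().T for A in kraus)
    Ncrho = np.array([[np.trace(Ai @ rho @ Aj.conj().T) for Aj in kraus] for Ai in kraus])
    L = logm_h(Ncrho)
    return (-sum(A.conj().T @ logm_h(Nrho) @ A for A in kraus)
            + sum(L[i, j] * kraus[i].conj().T @ kraus[j] for i in range(k) for j in range(k)))

# single-shot Pauli channel; I/2 is a critical point
px, py, pz = 0.1, 0.05, 0.2
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
kraus1 = [np.sqrt(p) * P for p, P in zip([1 - px - py - pz, px, py, pz], [I2, X, Y, Z])]
g1 = grad_Ic(kraus1, I2 / 2)
assert np.allclose(g1, g1[0, 0] * I2)          # gradient is a multiple of the identity

# two-shot channel: the Kraus operators of N tensor N are all pairwise tensor products
kraus2 = [np.kron(A, B) for A in kraus1 for B in kraus1]
g2 = grad_Ic(kraus2, np.eye(4) / 4)
assert np.allclose(g2, g2[0, 0] * np.eye(4))   # rho tensor rho is critical for N tensor N
```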

There is no reason this Theorem cannot be generalized further, from the k-shot coherent information to the nk-shot coherent information.

Theorem 5.1.2. If $\rho \in \mathcal{D}_k^{++}$ is a critical point for the k-shot coherent information $I_c(\mathcal{N}^{\otimes k}, \cdot)$, then $\rho^{\otimes n} \in \mathcal{D}_{kn}^{++}$ is a critical point for the kn-shot coherent information $I_c(\mathcal{N}^{\otimes kn}, \cdot)$ for all positive integers n ∈ N.

Proof. The proof is nearly identical to that of Theorem 5.1.1; it follows from noting the property $\mathcal{N}^{\otimes kn} = (\mathcal{N}^{\otimes k})^{\otimes n}$.

Furthermore, if one has m density states ρ_i such that each ρ_i is a critical point for the coherent information of the quantum channel $\mathcal{N}^{\otimes n_i}$, then the product state $\bigotimes_{i=1}^m \rho_i$ is a critical point for the coherent information of the quantum channel $\mathcal{N}^{\otimes n}$, where n = n_1 + ⋯ + n_m. The proof follows from decomposing $\mathcal{N}^{\otimes n}$ into $\mathcal{N}^{\otimes n_1} \otimes \mathcal{N}^{\otimes n_2} \otimes \cdots \otimes \mathcal{N}^{\otimes n_m}$.

5.2 Unital Channels

A quantum channel from system A to system B is unital if it maps the identity matrix of system A to the identity matrix of system B, i.e. N(I_A) = I_B. Note that adjoints of quantum channels are always unital, since channels are trace preserving. The next Theorem shows that the maximally mixed state is always a critical point for the channel entropy of unital channels.

Theorem 5.2.1. Let N : L(H_A) → L(H_B) be a unital quantum channel. Denote by d_A the dimension of H_A. The maximally mixed state I/d_A ∈ D^{++} is a critical point for the channel entropy S(N).

Proof. It suffices to show that the gradient of S(N) is in the span of the identity. This can be seen directly:

$$\mathcal{N}^\dagger\Big(\log\Big(\mathcal{N}\Big(\frac{I}{d_A}\Big)\Big)\Big) = \mathcal{N}^\dagger\Big(\log\Big(\frac{I}{d_A}\Big)\Big) = \mathcal{N}^\dagger\big(-\log(d_A)\, I\big) \quad \text{(the spectral decomposition of } I \text{ is trivial)}$$

$$= -\log(d_A)\, I \quad \text{(adjoint channels are unital).}$$

This implies, based on condition (4) of Theorem 4.3.2, that if I/d_A is a critical point of the coherent information, then $\mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(I/d_A))\big)$ must be in the span of the identity. The Hessian for unital channels has a much simpler form. This is due to the fact that N(I/d_A) always commutes with N(V) for any derivation V ∈ H^0.

Proposition 5.2.1.1. Let N : L(H_A) → L(H_B) be a unital quantum channel. Denote by d_A the dimension of H_A. Then the Hessian $\mathrm{Hess}(S(\mathcal{N}(\frac{I}{d_A})))$ has the following action:

$$V \mapsto d_A\, P\big(-\mathcal{N}^\dagger(\mathcal{N}(V))\big). \tag{5.1}$$

Proof. Since $\mathcal{N}(\frac{I}{d_A}) = \frac{I}{d_A}$, it always commutes with N(V), and hence

$$d(\log)_{\mathcal{N}(I/d_A)}[\mathcal{N}(V)] = \mathcal{N}\Big(\frac{I}{d_A}\Big)^{-1} \mathcal{N}(V) = d_A\, I\, \mathcal{N}(V) = d_A\, \mathcal{N}(V).$$

Plugging this into the Hessian formula 4.11 (keeping only the S ∘ N term) completes the proof.

The next proposition is a re-statement for the case where the complementary channel N^c is also unital. It is very unlikely that the complementary channel is unital: it would have to satisfy $\mathrm{Tr}(A_1^\dagger A_1) = \cdots = \mathrm{Tr}(A_m^\dagger A_m)$ with the Kraus operators orthogonal in L(H_A, H_B). Together with the trace-preserving condition, this implies

$$\mathrm{Tr}\Big(\sum_i A_i^\dagger A_i\Big) = m\, \mathrm{Tr}(A_1^\dagger A_1) = d_A \implies \mathrm{Tr}(A_1^\dagger A_1) = \frac{d_A}{m},$$

where d_A is the dimension of H_A. A particular example of this is the Pauli channel whose probabilities are all equal. This is a measure-zero set inside the parameter space.

Proposition 5.2.1.2. If there exists a ρ ∈ D^{++} such that

$$\mathcal{N}(\rho) = \frac{I}{d_A} \quad \text{and} \quad \mathcal{N}^c(\rho) = \frac{I}{d_E},$$

then ρ is a critical point for the coherent information. Similarly, the Hessian Hess(I_c(N, ρ)) on D^{++} has the simple action

$$V \mapsto P\big(-d_A\, \mathcal{N}^\dagger(\mathcal{N}(V)) + d_E\, \mathcal{N}^{c\dagger}(\mathcal{N}^c(V))\big).$$

Proof. The proof is immediate, since the adjoint channels N^† and N^{c†} are unital and the logarithm of the maximally mixed state is $\log(\frac{I}{d}) = -\log(d)\, I$. Hence both the gradient of the channel entropy S(N) and the gradient of the entropy exchange S(N^c) are in the span of the identity, satisfying condition (4) in Theorem 4.3.2. The Hessian result follows analogously to the previous proposition: since N(ρ) and N^c(ρ) are maximally mixed states, they always commute with N(V) and N^c(V), respectively, for any trace-zero matrix V.

Mixed-Unitary

A quantum channel from H_A to H_A is said to be mixed-unitary if there exists a collection of unitary operators {U_i : H_A → H_A} such that $\mathcal{N}(\rho) = \sum_i p_i\, U_i \rho U_i^\dagger$, where $\sum_i p_i = 1$. In other words, the set $\{\sqrt{p_i}\, U_i\}$ are the Kraus operators. It should be clear that mixed-unitary channels are unital. When the dimension is two, every unital channel is mixed-unitary [33]. Since mixed-unitary channels are unital, the maximally mixed state is a critical point for the channel entropy. Furthermore, if the unitaries are orthogonal in the sense $\mathrm{Tr}(U_i^\dagger U_j) = d_A\, \delta_{ij}$, then the maximally mixed state is also a critical point for the entropy of the complementary channel N^c. This can be seen by noting that $\mathcal{N}^c(\frac{I}{d_A})$ is the diagonal matrix with entries p_i, whose logarithm is the diagonal matrix with entries log(p_i). Denote this matrix by D; the complementary adjoint applied to D is

$$\mathcal{N}^{c\dagger}(D) = \sum_{ij} D_{ij}\, \sqrt{p_i p_j}\, U_i^\dagger U_j = \sum_i \log(p_i)\, p_i\, U_i^\dagger U_i = \Big(\sum_i p_i \log(p_i)\Big) I = -H(\{p_i\})\, I,$$

where H({p_i}) is the entropy of the probabilities p_i. This proves the following.

Theorem 5.2.2. The maximally mixed state is always a critical point for the n-shot coherent information of mixed-unitary channels whose Kraus operators {U_i} are unitary and orthogonal to one another with respect to the Frobenius inner product, i.e. $\mathrm{Tr}(U_i^\dagger U_j) = 0$ for i ≠ j.

The Hessian of a mixed-unitary quantum channel whose Kraus operators are unitary and orthogonal has the following description (ignoring the projection term):

$$\begin{aligned}
\mathrm{Hess}\Big(I_c\Big(\mathcal{N}, \frac{I}{d_A}\Big)\Big)[V] &= -d_A \sum_{ij} p_i p_j\, U_i^\dagger U_j V U_j^\dagger U_i + \sum_{ij} \big(\log^{[1]}(\mathcal{N}^c(I/d_A))\big)_{ij}\, p_i p_j\, \mathrm{Tr}(U_i V U_j^\dagger)\, U_i^\dagger U_j \\
&= -d_A \sum_{ij} p_i p_j\, U_i^\dagger U_j V U_j^\dagger U_i + \sum_{i \ne j} p_i p_j\, f^{[1]}(p_i, p_j)\, \mathrm{Tr}(U_j^\dagger U_i V)\, U_i^\dagger U_j,
\end{aligned}$$

where $f^{[1]}(p_i, p_j) = \frac{\log(p_i) - \log(p_j)}{p_i - p_j}$ when p_i ≠ p_j and $f^{[1]}(p_i, p_j) = \frac{1}{p_i}$ when p_i = p_j. The diagonal (i = j) terms of the second sum vanish because $\mathrm{Tr}(U_i V U_i^\dagger) = \mathrm{Tr}(V) = 0$. This follows from equation 4.14 after plugging in the various components discussed above.

5.3 Pauli Channel

Recall that the Pauli channel (introduced in 2.4.2) with error parameters [px, py, pz] is

N (ρ) = (1 − px − py − pz)ρ + pxXρX + pyY ρY + pzZρZ,

where we write $p_i = 1 - p_x - p_y - p_z$. Since this channel maps a space to one of the same dimension and has an invertible Kraus operator, it is strictly positive by Theorem 4.1.1. The set {I, X, Y, Z} forms a basis for H, and thus its Kraus operators form a linearly independent set. This implies, by Theorem 4.2.2, that the coherent information is a smooth function on D^{++}. Since Pauli channels are mixed-unitary and their Kraus operators are orthogonal to one another, the last section shows that the maximally mixed state is a critical point of the coherent information. Here is a more concrete proof.

Theorem 5.3.1. Let $\mathcal{N}^{\otimes n}$ be the Pauli channel tensored n times. The maximally mixed state $\frac{I}{2^n}$ is always a critical point for the n-shot coherent information.

Proof. Since $\mathcal{N}^{\otimes n}$ is always a unital channel, by Theorem 5.2.1 and condition (4) in Theorem 4.3.2 it remains to show that the gradient $(\mathcal{N}^{c\otimes n})^\dagger\big(\log(\mathcal{N}^{c\otimes n}(\frac{I}{2^n}))\big)$ of the complementary channel is in the span of the identity. By Theorem 5.1.1, it only needs to be shown for n = 1.

The Pauli matrices P_i ∈ {X, Y, Z} have the properties that $P_i^2 = I$ and that $P_i P_j$ is proportional to $P_k$ for distinct i, j, k (see Theorem 2.4.1). More importantly, their trace is zero. Hence the entropy exchange $\mathcal{N}^c(\frac{I}{2})$, with entries $\mathcal{N}^c_{ij} = \mathrm{Tr}(A_i \frac{I}{2} A_j^\dagger)$, at the maximally mixed state is

$$\mathcal{N}^c\Big(\frac{I}{2}\Big) = \mathrm{diag}\big(1 - p_x - p_y - p_z,\; p_x,\; p_y,\; p_z\big).$$

The logarithm of this matrix is trivial, since it is already diagonal:

$$\log\Big(\mathcal{N}^c\Big(\frac{I}{2}\Big)\Big) = \mathrm{diag}\big(\log(1 - p_x - p_y - p_z),\; \log(p_x),\; \log(p_y),\; \log(p_z)\big).$$

The adjoint $\mathcal{N}^{c\dagger}(\sigma) = \sum_{ij} (\sigma)_{ij}\, A_j^\dagger A_i$ applied to this diagonal matrix lies in the span of the identity, because $A_j^\dagger A_j = p_j I$ for the Pauli Kraus operators, where $p_j \in \{p_i, p_x, p_y, p_z\}$. Putting them together, the result is the identity matrix times the constant

$$(1 - p_x - p_y - p_z)\log(1 - p_x - p_y - p_z) + p_x \log(p_x) + p_y \log(p_y) + p_z \log(p_z).$$

This completes the proof, as the gradient of the entropy of the complementary channel is in the span of the identity.

The next Theorem computes the eigenvalues of the Hessian at the maximally mixed state.

Theorem 5.3.2. The Hessian of the single-shot coherent information of the Pauli channel with channel parameters (p_i, p_x, p_y, p_z) at the maximally mixed state is a diagonal matrix (in the Pauli basis) with entries/eigenvalues

$$\begin{aligned}
&-2(1 - 2p_y - 2p_z)^2 + 4\big(p_i p_x\, f^{[1]}(p_i, p_x) + p_y p_z\, f^{[1]}(p_y, p_z)\big), \\
&-2(1 - 2p_x - 2p_z)^2 + 4\big(p_i p_y\, f^{[1]}(p_i, p_y) + p_x p_z\, f^{[1]}(p_x, p_z)\big), \\
&-2(1 - 2p_x - 2p_y)^2 + 4\big(p_i p_z\, f^{[1]}(p_i, p_z) + p_x p_y\, f^{[1]}(p_x, p_y)\big),
\end{aligned}$$

where $f^{[1]}(x, y) = \frac{1}{x}$ when x = y and $f^{[1]}(x, y) = \frac{\log(x/y)}{x - y}$ when x ≠ y. The eigenvectors are, respectively, X, Y, Z.

Proof. Choose {X, Y, Z} as a basis for H^0. The matrix representation of the Hessian is obtained by first computing the Hessian of the channel entropy S(N) and then the Hessian of the entropy exchange term S(N^c), where N^c is the complementary channel. Since Pauli channels are unital, the Hessian of the channel entropy has the simple form of Proposition 5.2.1.1:

$$\begin{aligned}
\mathcal{N}(X) = (1 - 2p_y - 2p_z)X &\implies \mathcal{N}^\dagger(\mathcal{N}(X)) = (1 - 2p_y - 2p_z)^2\, X, \\
\mathcal{N}(Y) = (1 - 2p_x - 2p_z)Y &\implies \mathcal{N}^\dagger(\mathcal{N}(Y)) = (1 - 2p_x - 2p_z)^2\, Y, \\
\mathcal{N}(Z) = (1 - 2p_x - 2p_y)Z &\implies \mathcal{N}^\dagger(\mathcal{N}(Z)) = (1 - 2p_x - 2p_y)^2\, Z.
\end{aligned}$$

Thus the matrix representation of the Hessian of the channel entropy S(N) at the maximally mixed state is

$$\mathrm{Hess}\Big(S(\mathcal{N}), \frac{I}{2}\Big) = -2 \begin{pmatrix} (1 - 2p_y - 2p_z)^2 & 0 & 0 \\ 0 & (1 - 2p_x - 2p_z)^2 & 0 \\ 0 & 0 & (1 - 2p_x - 2p_y)^2 \end{pmatrix}.$$

For the entropy exchange, the Loewner matrix of the logarithm is needed. Recall that the entropy exchange at the maximally mixed state is diagonal, so its Loewner matrix can be written down directly (for now assuming p_i, p_x, p_y, p_z are pairwise distinct):

$$\mathcal{N}^c(I/2) = \mathrm{diag}(p_i, p_x, p_y, p_z) \implies \log^{[1]}(\mathcal{N}^c(I/2)) = \begin{pmatrix}
\frac{1}{p_i} & \frac{\log(p_i/p_x)}{p_i - p_x} & \frac{\log(p_i/p_y)}{p_i - p_y} & \frac{\log(p_i/p_z)}{p_i - p_z} \\
\frac{\log(p_i/p_x)}{p_i - p_x} & \frac{1}{p_x} & \frac{\log(p_x/p_y)}{p_x - p_y} & \frac{\log(p_x/p_z)}{p_x - p_z} \\
\frac{\log(p_i/p_y)}{p_i - p_y} & \frac{\log(p_x/p_y)}{p_x - p_y} & \frac{1}{p_y} & \frac{\log(p_y/p_z)}{p_y - p_z} \\
\frac{\log(p_i/p_z)}{p_i - p_z} & \frac{\log(p_x/p_z)}{p_x - p_z} & \frac{\log(p_y/p_z)}{p_y - p_z} & \frac{1}{p_z}
\end{pmatrix}.$$

The entropy exchange applied to the basis {X, Y, Z} gives

$$\mathcal{N}^c(X) = \begin{pmatrix} 0 & 2\sqrt{p_i p_x} & 0 & 0 \\ 2\sqrt{p_i p_x} & 0 & 0 & 0 \\ 0 & 0 & 0 & -2i\sqrt{p_y p_z} \\ 0 & 0 & 2i\sqrt{p_y p_z} & 0 \end{pmatrix}, \quad
\mathcal{N}^c(Y) = \begin{pmatrix} 0 & 0 & 2\sqrt{p_i p_y} & 0 \\ 0 & 0 & 0 & 2i\sqrt{p_x p_z} \\ 2\sqrt{p_i p_y} & 0 & 0 & 0 \\ 0 & -2i\sqrt{p_x p_z} & 0 & 0 \end{pmatrix},$$

$$\mathcal{N}^c(Z) = \begin{pmatrix} 0 & 0 & 0 & 2\sqrt{p_i p_z} \\ 0 & 0 & -2i\sqrt{p_x p_y} & 0 \\ 0 & 2i\sqrt{p_x p_y} & 0 & 0 \\ 2\sqrt{p_i p_z} & 0 & 0 & 0 \end{pmatrix}.$$

Hence, the differential of the logarithm, using equation 4.12, multiplies each non-zero entry of $\mathcal{N}^c(P)$ by the corresponding divided difference; for example,

$$\log^{[1]}(\mathcal{N}^c(I/2)) \circ \mathcal{N}^c(X) = \begin{pmatrix}
0 & 2\sqrt{p_i p_x}\, \frac{\log(p_i/p_x)}{p_i - p_x} & 0 & 0 \\
2\sqrt{p_i p_x}\, \frac{\log(p_i/p_x)}{p_i - p_x} & 0 & 0 & 0 \\
0 & 0 & 0 & -2i\sqrt{p_y p_z}\, \frac{\log(p_y/p_z)}{p_y - p_z} \\
0 & 0 & 2i\sqrt{p_y p_z}\, \frac{\log(p_y/p_z)}{p_y - p_z} & 0
\end{pmatrix},$$

and analogously for $\mathcal{N}^c(Y)$ and $\mathcal{N}^c(Z)$. Applying the adjoint $\mathcal{N}^{c\dagger}$ to each of these matrices gives

$$\begin{aligned}
\mathcal{N}^{c\dagger}\big(\log^{[1]}(\mathcal{N}^c(I/2)) \circ \mathcal{N}^c(X)\big) &= \frac{4\big(p_i p_x (p_y - p_z)\log(p_i/p_x) + p_y p_z (p_i - p_x)\log(p_y/p_z)\big)}{(p_i - p_x)(p_y - p_z)}\, X, \\
\mathcal{N}^{c\dagger}\big(\log^{[1]}(\mathcal{N}^c(I/2)) \circ \mathcal{N}^c(Y)\big) &= \frac{4\big(p_i p_y (p_x - p_z)\log(p_i/p_y) + p_x p_z (p_i - p_y)\log(p_x/p_z)\big)}{(p_i - p_y)(p_x - p_z)}\, Y, \\
\mathcal{N}^{c\dagger}\big(\log^{[1]}(\mathcal{N}^c(I/2)) \circ \mathcal{N}^c(Z)\big) &= \frac{4\big(p_i p_z (p_x - p_y)\log(p_i/p_z) + p_x p_y (p_i - p_z)\log(p_x/p_y)\big)}{(p_i - p_z)(p_x - p_y)}\, Z.
\end{aligned}$$

Choosing the ordered basis {X, Y, Z}, the total Hessian matrix Hess(I_c(N, I/2)) is just the diagonal matrix with entries

$$\mathrm{Hess}(I_c(\mathcal{N}, I/2))_{11} = -2(1 - 2p_y - 2p_z)^2 + 4\Big(\frac{p_i p_x \log(p_i/p_x)}{p_i - p_x} + \frac{p_y p_z \log(p_y/p_z)}{p_y - p_z}\Big), \tag{5.2}$$

$$\mathrm{Hess}(I_c(\mathcal{N}, I/2))_{22} = -2(1 - 2p_x - 2p_z)^2 + 4\Big(\frac{p_i p_y \log(p_i/p_y)}{p_i - p_y} + \frac{p_x p_z \log(p_x/p_z)}{p_x - p_z}\Big), \tag{5.3}$$

$$\mathrm{Hess}(I_c(\mathcal{N}, I/2))_{33} = -2(1 - 2p_x - 2p_y)^2 + 4\Big(\frac{p_i p_z \log(p_i/p_z)}{p_i - p_z} + \frac{p_x p_y \log(p_x/p_y)}{p_x - p_y}\Big). \tag{5.4}$$

For the general case, when some of p_i, p_x, p_y, p_z may be equal to one another, the eigenvalues become

$$\begin{aligned}
\mathrm{Hess}(I_c(\mathcal{N}, I/2))_{11} &= -2(1 - 2p_y - 2p_z)^2 + 4\big(p_i p_x\, f^{[1]}(p_i, p_x) + p_y p_z\, f^{[1]}(p_y, p_z)\big), \\
\mathrm{Hess}(I_c(\mathcal{N}, I/2))_{22} &= -2(1 - 2p_x - 2p_z)^2 + 4\big(p_i p_y\, f^{[1]}(p_i, p_y) + p_x p_z\, f^{[1]}(p_x, p_z)\big), \\
\mathrm{Hess}(I_c(\mathcal{N}, I/2))_{33} &= -2(1 - 2p_x - 2p_y)^2 + 4\big(p_i p_z\, f^{[1]}(p_i, p_z) + p_x p_y\, f^{[1]}(p_x, p_y)\big),
\end{aligned}$$

where $f^{[1]}(x, y) = \frac{1}{x}$ when x = y and $f^{[1]}(x, y) = \frac{\log(x/y)}{x - y}$ when x ≠ y.

5.3.1 Dephasing Channel

The dephasing channel corresponds to p_x = p_y = 0 and p_i = 1 − p_z, where p_z ∈ [0, 1]. The eigenvalues {e_1, e_2, e_3} of the Hessian of the coherent information of the dephasing channel become

$$e_1 = -2(1 - 2p_z)^2, \quad e_2 = -2(1 - 2p_z)^2, \quad e_3 = -2 + 4(1 - p_z)p_z\, f^{[1]}(1 - p_z, p_z).$$

The first two eigenvalues are never positive, so the third eigenvalue dictates when the maximally mixed state is a local maximum or local minimum.

Consider the case p_z ≠ 1/2, so that p_z ≠ 1 − p_z. The maximally mixed state is a local maximum when

$$\begin{aligned}
-2 + 4(1 - p_z)p_z\, f^{[1]}(1 - p_z, p_z) &< 0 \\
(1 - p_z)p_z\, \frac{\log\big(\frac{1 - p_z}{p_z}\big)}{1 - 2p_z} &< \frac{1}{2} \\
\frac{\log\big(\frac{1 - p_z}{p_z}\big)}{1 - 2p_z} &< \frac{1}{2(1 - p_z)p_z}.
\end{aligned}$$

Suppose that p_z < 1/2. This implies the denominator 1 − 2p_z is positive and 1 − p_z > p_z, so that $\log(\frac{1 - p_z}{p_z}) > 0$. Putting it all together,

$$\log\Big(\frac{1 - p_z}{p_z}\Big) < \frac{1 - 2p_z}{2(1 - p_z)p_z} \iff 1 < \frac{p_z}{1 - p_z}\, e^{\frac{1 - 2p_z}{2(1 - p_z)p_z}}.$$

Graphing the right-hand side shows that on (0, 1/2) it is greater than 1, and equal to 1 at p_z = 1/2. Hence, when p_z < 1/2, the maximally mixed state is a local maximum.

Suppose that p_z > 1/2. This implies the denominator 1 − 2p_z is negative and 1 − p_z < p_z, so that $\log(\frac{1 - p_z}{p_z}) < 0$. Multiplying through by the negative denominator flips the inequality, and the local-maximum condition e_3 < 0 becomes

$$\log\Big(\frac{1 - p_z}{p_z}\Big) > \frac{1 - 2p_z}{2(1 - p_z)p_z}, \tag{5.5}$$

$$1 > \frac{p_z}{1 - p_z}\, e^{\frac{1 - 2p_z}{2(1 - p_z)p_z}}. \tag{5.6}$$

Substituting u = 1 − p_z ∈ (0, 1/2), inequality (5.5) is equivalent to $\log(\frac{1 - u}{u}) < \frac{1 - 2u}{2(1 - u)u}$, which is exactly the inequality already verified on (0, 1/2). Hence e_3 < 0 for p_z > 1/2 as well, and the maximally mixed state is again a local maximum; indeed e_3 is symmetric under p_z ↔ 1 − p_z, consistent with the symmetry $\mathcal{N}_{1 - p_z}(\rho) = Z\, \mathcal{N}_{p_z}(\rho)\, Z$ of the dephasing channel. When p_z = 1/2 all three eigenvalues are zero, so the second-order test is inconclusive at that point.

This result is as expected, since the coherent information of the dephasing channel is positive when p_z < 1/2 and maximized by the maximally mixed state [21]. Due to symmetry, this result will hold with p_x or p_y as the non-zero parameter.
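A quick numerical check of the third eigenvalue (natural logarithms; the sign pattern is independent of the base):

```python
import numpy as np

def f1(x, y):
    """First divided difference of log (the Loewner-matrix entry)."""
    return 1.0 / x if np.isclose(x, y) else (np.log(x) - np.log(y)) / (x - y)

def e3(pz):
    """Third Hessian eigenvalue of the dephasing channel at the maximally mixed state."""
    return -2 + 4 * (1 - pz) * pz * f1(1 - pz, pz)

# negative on (0, 1/2): the maximally mixed state is a local maximum there
for pz in np.linspace(0.05, 0.45, 9):
    assert e3(pz) < 0
assert abs(e3(0.5)) < 1e-12          # all eigenvalues vanish at pz = 1/2
assert np.isclose(e3(0.3), e3(0.7))  # symmetric under pz <-> 1 - pz
```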

5.3.2 Depolarizing Channel

The depolarizing channel corresponds to the Pauli parameters p_x, p_y, p_z all being equal to one another. The channel parameter p_x then ranges from 0 to 1/3, and p_i = 1 − 3p_x. The three eigenvalues {e_1, e_2, e_3} of the Hessian of the coherent information at the maximally mixed state are

$$e_1 = e_2 = e_3 = -2(1 - 4p_x)^2 + 4\big((1 - 3p_x)p_x\, f^{[1]}(p_i, p_x) + p_x\big).$$

Note that all eigenvalues are equal to one another. First suppose p_x = 1/4, so that p_i = p_x.

Re-arranging and simplifying the eigenvalue (using $f^{[1]}(p_x, p_x) = \frac{1}{p_x}$ and $1 - 3p_x = p_x$):

$$\begin{aligned}
-2(1 - 4p_x)^2 + 4p_x + 4p_x &= -2(1 - 8p_x + 16p_x^2) + 8p_x \\
&= -2 + 16p_x - 32p_x^2 + 8p_x \\
&= -2 + 24p_x - 32p_x^2.
\end{aligned}$$

When p_x = 1/4 this value is 2; thus the eigenvalues are all positive and the maximally mixed state is a local minimum.

Now suppose p_x ≠ 1/4. Re-arranging the condition that the eigenvalues be negative:

$$\begin{aligned}
-2(1 - 4p_x)^2 + 4(1 - 3p_x)p_x\, \frac{\log\big(\frac{1 - 3p_x}{p_x}\big)}{1 - 4p_x} + 4p_x &< 0 && (5.7) \\
-2 + 20p_x - 32p_x^2 + 4(1 - 3p_x)p_x\, \frac{\log\big(\frac{1 - 3p_x}{p_x}\big)}{1 - 4p_x} &< 0 && (5.8) \\
(1 - 3p_x)p_x\, \frac{\log\big(\frac{1 - 3p_x}{p_x}\big)}{1 - 4p_x} &< 8p_x^2 - 5p_x + \frac{1}{2} && (5.9) \\
\frac{\log\big(\frac{1 - 3p_x}{p_x}\big)}{1 - 4p_x} &< \frac{8p_x^2 - 5p_x + \frac{1}{2}}{(1 - 3p_x)p_x} && (5.10)
\end{aligned}$$

Suppose that p_x < 1/4. Then 1 − 4p_x is positive and 1 − 3p_x > p_x, so that $\log(\frac{1 - 3p_x}{p_x})$ is positive. Equation (5.10) can be reduced to

$$\begin{aligned}
\log\Big(\frac{1 - 3p_x}{p_x}\Big) &< \frac{8(p_x - \frac{1}{2})(p_x - \frac{1}{8})(1 - 4p_x)}{(1 - 3p_x)p_x} && (5.11) \\
1 &< \frac{p_x}{1 - 3p_x}\, e^{\frac{8(p_x - \frac{1}{2})(p_x - \frac{1}{8})(1 - 4p_x)}{(1 - 3p_x)p_x}} && (5.12)
\end{aligned}$$

Equation (5.12) is satisfied when 0 < p_x < 0.07055, and hence the maximally mixed state is a local maximum in that region. On (0.07055, 0.25) the opposite holds, and the maximally mixed state is a local minimum. The point at which the maximally mixed state is a saddle-point is p_x ≈ 0.07055.

On the other hand, suppose p_x > 1/4, so that 1 − 4p_x is negative. Equation (5.10) is then satisfied when

$$1 \ge \frac{p_x}{1 - 3p_x}\, e^{\frac{8(p_x - \frac{1}{2})(p_x - \frac{1}{8})(1 - 4p_x)}{(1 - 3p_x)p_x}}. \tag{5.13}$$

This is never satisfied when p_x > 1/4; hence there are no negative eigenvalues in this region. The eigenvalues are positive when 1/4 < p_x < 1/3, which implies that in this region the maximally mixed state is a local minimum. Altogether, we have the following theorem.

Theorem 5.3.3. Let p_x be a number between 0 and 1/3. The following conditions on p_x characterize when the maximally mixed state is a local maximum/minimum or saddle point for the one-shot coherent information of the depolarizing channel.

1. If 0 ≤ p_x < 0.07055, then the maximally mixed state is a local maximum.

2. If p_x = 0.07055, then the maximally mixed state is a saddle-point.

3. If 0.07055 < p_x ≤ 1/3, then the maximally mixed state is a local minimum.

This is expected, as it is known that the quantum capacity of the depolarizing channel is zero when p_x > 0.25 [5], and thus the one-shot coherent information cannot be positive there. The quantum capacity for p_x < 0.25 is not known [18].
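The threshold p_x ≈ 0.07055 can be recovered numerically by bisecting the (triply degenerate) eigenvalue for its sign change. A minimal sketch with natural logarithms (the sign, and hence the threshold, is base-independent):

```python
import numpy as np

def f1(x, y):
    """First divided difference of log (Loewner-matrix entry)."""
    return 1.0 / x if np.isclose(x, y) else (np.log(x) - np.log(y)) / (x - y)

def eigenvalue(px):
    """The (triply degenerate) Hessian eigenvalue of the depolarizing channel's
    one-shot coherent information at the maximally mixed state."""
    pi = 1 - 3 * px
    return -2 * (1 - 4 * px) ** 2 + 4 * (pi * px * f1(pi, px) + px)

assert eigenvalue(0.05) < 0    # local-maximum region
assert eigenvalue(0.10) > 0    # local-minimum region

# bisect for the sign change
lo, hi = 0.01, 0.2
for _ in range(60):
    mid = (lo + hi) / 2
    if eigenvalue(mid) < 0:
        lo = mid
    else:
        hi = mid
assert abs((lo + hi) / 2 - 0.07055) < 1e-4
```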

Computation Results

Here, various plots of the coherent information of the depolarizing channel against the channel parameter p_x are shown. The optimization is done via the GitHub software written by the author (GitHub/PyQCodes). Positive definite matrices are parameterized via the Cholesky-style factorization $L^\dagger L / \mathrm{Tr}(L^\dagger L)$ and optimized using the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) algorithm. The x-axis is the channel parameter p_x ∈ [0.01, 0.333]. The Hessian was calculated using a finite difference of the integral representation of the differential of the logarithm with a step size of 0.001. The matrix representation of the Hessian was found by solving for the coefficients with respect to the Pauli basis, and the eigenvalues were then computed using 'np.linalg.eigvalsh' from the NumPy package.
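The parameterization step can be sketched as follows. This is a minimal illustration, not the PyQCodes implementation (the transpose convention in the factorization is immaterial); the channel parameter 0.02 is an arbitrary choice, and natural logarithms are used:

```python
import numpy as np

def params_to_rho(theta, d=2):
    """Cholesky-style parameterization: rho = L L† / Tr(L L†) is a unit-trace
    positive semidefinite matrix for any real parameter vector theta of length 2 d^2."""
    L = theta[:d * d].reshape(d, d) + 1j * theta[d * d:].reshape(d, d)
    rho = L @ L.conj().T
    return rho / np.trace(rho).real

def entropy(A):
    w = np.linalg.eigvalsh(A)
    return -np.sum(w * np.log(w))

def coherent_info_depolarizing(rho, px):
    """One-shot coherent information of the depolarizing channel (natural log)."""
    I2 = np.eye(2)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])
    kraus = [np.sqrt(p) * P for p, P in zip([1 - 3 * px, px, px, px], [I2, X, Y, Z])]
    Nrho = sum(A @ rho @ A.conj().T for A in kraus)
    Ncrho = np.array([[np.trace(Ai @ rho @ Aj.conj().T) for Aj in kraus] for Ai in kraus])
    return entropy(Nrho) - entropy(Ncrho)

# any real parameter vector maps to a valid density matrix
rng = np.random.default_rng(3)
rho = params_to_rho(rng.standard_normal(8))
assert np.isclose(np.trace(rho).real, 1.0)
assert np.all(np.linalg.eigvalsh(rho) > 0)

# at the maximally mixed state the value matches the closed form log(2) - H({p})
px = 0.02
p = np.array([1 - 3 * px, px, px, px])
assert np.isclose(coherent_info_depolarizing(np.eye(2) / 2, px),
                  np.log(2) + np.sum(p * np.log(p)))
```

An optimizer such as L-BFGS can then maximize `coherent_info_depolarizing(params_to_rho(theta), px)` over the unconstrained real vector theta, which is the design choice the parameterization enables.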

Figure 5.1: Coherent Information of Depolarizing Channel Evaluated at the Maximally Mixed State.

Figure 5.2: Maximum of Coherent Information of Depolarizing Channel.

Figure 5.3: von Neumann entropy of the optimal solution of the maximum of coherent information.

Figure 5.4: The eigenvalues of the Hessian of coherent information at the Maximally Mixed state.

5.4 Dephrasure Channel

The dephrasure channel was introduced in section 2.4.3. It is a mapping where the Hilbert space of the input, the qubit space $\mathcal{H}_2$, is two-dimensional and the Hilbert space of the output, the qutrit space $\mathcal{H}_3$, is three-dimensional. It has the following action,

\[
\mathcal{N}(\rho) = (1-p)(1-q)\rho + p(1-q)Z\rho Z + q|e\rangle\langle e|\,\mathrm{Tr}(\rho),
\]

where $p, q \in [0, \frac{1}{2}]$. The dephrasure channel is clearly strictly positive when $q \neq 0$. The Kraus operators trivially form an independent set. Just like before, the maximally mixed state is always a critical point for the dephrasure channel.

Theorem 5.4.1. The maximally mixed state is always a critical point for the n-shot coherent information of the dephrasure channel on $\mathcal{D}^{++}$.

Proof. The proof shows that at the maximally mixed state, condition (1) in Theorem 4.3.2 is satisfied, i.e., both the gradient of the channel entropy and the gradient of the entropy exchange are in the span of the identity. Theorem 5.1.1 then extends this to the n-shot coherent information.

\[
\mathcal{N}\Big(\frac{I}{2}\Big) = \begin{pmatrix} \frac{(1-p)(1-q)+p(1-q)}{2} & 0 & 0 \\ 0 & \frac{(1-p)(1-q)+p(1-q)}{2} & 0 \\ 0 & 0 & q \end{pmatrix} = \begin{pmatrix} \frac{1-q}{2} & 0 & 0 \\ 0 & \frac{1-q}{2} & 0 \\ 0 & 0 & q \end{pmatrix}
\]

The logarithm base two of this matrix is trivially,

\[
\log\mathcal{N}\Big(\frac{I}{2}\Big) = \begin{pmatrix} \log(1-q) - \log(2) & 0 & 0 \\ 0 & \log(1-q) - \log(2) & 0 \\ 0 & 0 & \log(q) \end{pmatrix} = \begin{pmatrix} \log_2(1-q) - 1 & 0 & 0 \\ 0 & \log_2(1-q) - 1 & 0 \\ 0 & 0 & \log_2(q) \end{pmatrix}
\]

The adjoint $\mathcal{N}^\dagger$ applied to this matrix gives the following,

\begin{align*}
\mathcal{N}^\dagger\Big(\log\mathcal{N}\Big(\frac{I}{2}\Big)\Big) &= \sum_{i=1}^{4} A_i^\dagger \big(\log(\mathcal{N}(I/2))\big) A_i \\
&= (1-p)(1-q)\, I\big(\log(\mathcal{N}(I/2))\big)I + p(1-q)\, Z\big(\log(\mathcal{N}(I/2))\big)Z \\
&\quad + q\,|0\rangle\langle e|\big(\log(\mathcal{N}(I/2))\big)|e\rangle\langle 0| + q\,|1\rangle\langle e|\big(\log(\mathcal{N}(I/2))\big)|e\rangle\langle 1| \\
&= I\Big[(1-p)(1-q)(\log_2(1-q) - 1) + p(1-q)(\log_2(1-q) - 1) + q\log_2(q)\Big]
\end{align*}

Hence, the gradient of the channel entropy is in the span of the identity. The next step is to show that the gradient of the entropy exchange is also in the span of the identity. The complementary channel $\mathcal{N}^c$ at the maximally mixed state is

\[
\mathcal{N}^c\Big(\frac{I}{2}\Big) = \begin{pmatrix} (1-p)(1-q) & 0 & 0 & 0 \\ 0 & p(1-q) & 0 & 0 \\ 0 & 0 & \frac{q}{2} & 0 \\ 0 & 0 & 0 & \frac{q}{2} \end{pmatrix}.
\]

The logarithm $\log(\mathcal{N}^c(\frac{I}{2}))$ is clearly the diagonal matrix with entries $[\log((1-p)(1-q)),\ \log(p) + \log(1-q),\ \log(q) - 1,\ \log(q) - 1]$. The adjoint of the complementary channel $\mathcal{N}^{c\dagger}$ applied to $\log(\mathcal{N}^c(\frac{I}{2}))$ is

\begin{align*}
\mathcal{N}^{c\dagger}\Big(\log\Big(\mathcal{N}^c\Big(\frac{I}{2}\Big)\Big)\Big) &= \sum_{i=1}^{4} \Big(\log\mathcal{N}^c\Big(\frac{I}{2}\Big)\Big)_{ii} A_i^\dagger A_i \\
&= (1-p)(1-q)\log((1-p)(1-q))\,I + p(1-q)\log(p(1-q))\,I \\
&\quad + q\log(q/2)\,|0\rangle\langle 0| + q\log(q/2)\,|1\rangle\langle 1| \\
&= I\Big[(1-p)(1-q)\log((1-p)(1-q)) + p(1-q)\log(p(1-q)) + q\log(q/2)\Big].
\end{align*}

Thus, it was shown that the maximally mixed state $\frac{I}{2}$ is a critical point for the single-shot coherent information. Due to Theorem 5.1.1, the maximally mixed state $(\frac{I}{2})^{\otimes n} = \frac{I}{2^n}$ is also a critical point for the n-shot coherent information of the dephrasure channel.
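The criticality condition above can also be checked numerically. Below is a minimal NumPy sketch (helper names are my own and this is not part of PyQCodes) that builds the four dephrasure Kraus operators as 3×2 matrices, applies the adjoint maps to the logarithms of the channel output and of the entropy-exchange matrix, and lets one confirm that at $\rho = I/2$ both results are multiples of the identity:

```python
import numpy as np

def dephrasure_kraus(p, q):
    """Dephrasure Kraus operators as 3x2 matrices (|e> = third basis vector)."""
    emb = np.zeros((3, 2)); emb[0, 0] = emb[1, 1] = 1.0    # qubit -> qutrit embedding
    Zemb = np.zeros((3, 2)); Zemb[0, 0], Zemb[1, 1] = 1.0, -1.0
    A3 = np.zeros((3, 2)); A3[2, 0] = 1.0                  # |e><0|
    A4 = np.zeros((3, 2)); A4[2, 1] = 1.0                  # |e><1|
    return [np.sqrt((1 - p) * (1 - q)) * emb,
            np.sqrt(p * (1 - q)) * Zemb,
            np.sqrt(q) * A3, np.sqrt(q) * A4]

def logm_pd(M):
    """Matrix logarithm of a Hermitian positive definite matrix via eigh."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.log(w)) @ U.conj().T

def gradient_terms(p, q, rho):
    """N^dag(log N(rho)) and N^{c dag}(log N^c(rho)) for the dephrasure channel."""
    K = dephrasure_kraus(p, q)
    out = sum(A @ rho @ A.conj().T for A in K)
    # Entropy-exchange matrix W_jk = Tr(A_j rho A_k^dagger).
    W = np.array([[np.trace(A @ rho @ B.conj().T) for B in K] for A in K])
    L_out, L_W = logm_pd(out), logm_pd(W)
    g_ch = sum(A.conj().T @ L_out @ A for A in K)
    g_ex = sum(L_W[j, k] * K[j].conj().T @ K[k]
               for j in range(4) for k in range(4))
    return g_ch, g_ex
```

At $\rho = I/2$ both returned matrices are diagonal with equal entries, i.e., proportional to $I$, matching condition (1) of Theorem 4.3.2.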

The next theorem gives the conditions on $p, q$ for the maximally mixed state to be a local maximum or a local minimum for the single-shot coherent information. This is identical to the results obtained in the paper [19] and stated in section 2.4.3.

Theorem 5.4.2. Let $p, q \in [0, \frac{1}{2}]$ be the channel parameters for the dephrasure channel.

The Hessian $\mathrm{Hess}(I_c(\mathcal{N}, \frac{I}{2}))$ of the coherent information of the dephrasure channel at the maximally mixed state satisfies the following conditions on $p, q$.

• If the channel parameters satisfy
\[
q < \frac{1 - 2p + 2p(1-p)\log\big(\frac{p}{1-p}\big)}{2 - 4p + 2p(1-p)\log\big(\frac{p}{1-p}\big)} \quad\text{and}\quad q < \frac{(1-2p)^2}{1 + (1-2p)^2},
\]

then the maximally mixed state is a local maxima.

• If it only satisfies
\[
q < \frac{(1-2p)^2}{1 + (1-2p)^2},
\]
then the maximally mixed state is a saddle point with two negative eigenvalue directions and one positive.

• If it doesn't satisfy any of the conditions above, then it is a local minimum.

Proof. Recall the dephrasure channel at the maximally mixed state,

\[
\mathcal{N}\Big(\frac{I}{2}\Big) = \begin{pmatrix} \frac{1-q}{2} & 0 & 0 \\ 0 & \frac{1-q}{2} & 0 \\ 0 & 0 & q \end{pmatrix} \quad\text{and}\quad \mathcal{N}\Big(\frac{I}{2}\Big)^{-1} = \begin{pmatrix} \frac{2}{1-q} & 0 & 0 \\ 0 & \frac{2}{1-q} & 0 \\ 0 & 0 & \frac{1}{q} \end{pmatrix}.
\]

Choose the basis of $\mathcal{H}_2^0$ to be the Pauli basis $\{X, Y, Z\}$. The goal is to get a matrix representation of the Hessian of the channel entropy and of the entropy exchange separately.

\[
\mathcal{N}(X) = \begin{pmatrix} 0 & (2p-1)(q-1) & 0 \\ (2p-1)(q-1) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \mathcal{N}(Y) = \begin{pmatrix} 0 & -i(2p-1)(q-1) & 0 \\ i(2p-1)(q-1) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
\[
\mathcal{N}(Z) = \begin{pmatrix} 1-q & 0 & 0 \\ 0 & q-1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = (1-q)\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

It should be clear that $\mathcal{N}(\frac{I}{2})$ commutes with $\mathcal{N}(X)$, $\mathcal{N}(Y)$ and $\mathcal{N}(Z)$, and hence we can use the simpler formula for the differential of the logarithm.

This implies that the Hessian of the channel entropy $S(\mathcal{N})$ is

\[
\mathrm{Hess}\Big(S(\mathcal{N}), \frac{I}{2}\Big)[V] = -\mathcal{N}^\dagger\Big(\mathcal{N}(I/2)^{-1}\mathcal{N}(V)\Big)
\]

for all $V \in \{X, Y, Z\}$. Computing the inner term gives:

\[
\mathcal{N}(I/2)^{-1}\mathcal{N}(X) = \begin{pmatrix} 0 & 2-4p & 0 \\ 2-4p & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \mathcal{N}(I/2)^{-1}\mathcal{N}(Y) = \begin{pmatrix} 0 & i(4p-2) & 0 \\ i(2-4p) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
\[
\mathcal{N}(I/2)^{-1}\mathcal{N}(Z) = 2\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]

Taking the adjoint channel on each of these matrices,

\[
\mathcal{N}^\dagger\big(\mathcal{N}(I/2)^{-1}\mathcal{N}(X)\big) = \begin{pmatrix} 0 & (1-2p)(4p-2)(q-1) \\ (1-2p)(4p-2)(q-1) & 0 \end{pmatrix}
\]
\[
\mathcal{N}^\dagger\big(\mathcal{N}(I/2)^{-1}\mathcal{N}(Y)\big) = \begin{pmatrix} 0 & i(2p-1)(4p-2)(q-1) \\ -i(2p-1)(4p-2)(q-1) & 0 \end{pmatrix}
\]
\[
\mathcal{N}^\dagger\big(\mathcal{N}(I/2)^{-1}\mathcal{N}(Z)\big) = \begin{pmatrix} 2-2q & 0 \\ 0 & 2q-2 \end{pmatrix}
\]

This gives a matrix representation in the $\{X,Y,Z\}$-basis for the Hessian of the channel entropy $S(\mathcal{N})$:

\[
\mathrm{Hess}(S(\mathcal{N}), \rho) = \begin{pmatrix} (1-2p)(4p-2)(q-1) & 0 & 0 \\ 0 & (1-2p)(4p-2)(q-1) & 0 \\ 0 & 0 & 2-2q \end{pmatrix}.
\]

Now the goal is to compute the Hessian of the entropy exchange term. Recall that $\mathcal{N}^c(I/2)$ is

\[
\mathcal{N}^c\Big(\frac{I}{2}\Big) = \begin{pmatrix} (1-p)(1-q) & 0 & 0 & 0 \\ 0 & p(1-q) & 0 & 0 \\ 0 & 0 & \frac{q}{2} & 0 \\ 0 & 0 & 0 & \frac{q}{2} \end{pmatrix} \quad\text{and}\quad \mathcal{N}^c\Big(\frac{I}{2}\Big)^{-1} = \begin{pmatrix} \frac{1}{(1-p)(1-q)} & 0 & 0 & 0 \\ 0 & \frac{1}{p(1-q)} & 0 & 0 \\ 0 & 0 & \frac{2}{q} & 0 \\ 0 & 0 & 0 & \frac{2}{q} \end{pmatrix}.
\]

Evaluating the exchange term on each of the basis elements,
\[
\mathcal{N}^c(X) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & q \\ 0 & 0 & q & 0 \end{pmatrix}, \qquad \mathcal{N}^c(Y) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -iq \\ 0 & 0 & iq & 0 \end{pmatrix},
\]
\[
\mathcal{N}^c(Z) = \begin{pmatrix} 0 & 2\sqrt{p(1-p)}(1-q) & 0 & 0 \\ 2\sqrt{p(1-p)}(1-q) & 0 & 0 & 0 \\ 0 & 0 & q & 0 \\ 0 & 0 & 0 & -q \end{pmatrix}.
\]

It should be clear that N c(I/2) commutes with N c(X) and N c(Y ) but not N c(Z).

\[
\big(\mathcal{N}^c(I/2)^{-1}\mathcal{N}^c(X)\big) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 2 & 0 \end{pmatrix} \quad\text{and}\quad \big(\mathcal{N}^c(I/2)^{-1}\mathcal{N}^c(Y)\big) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -2i \\ 0 & 0 & 2i & 0 \end{pmatrix}.
\]

The adjoint channel $\mathcal{N}^{c\dagger}$ on each of these matrices gives

\[
\mathcal{N}^{c\dagger}\big(\mathcal{N}^c(I/2)^{-1}\mathcal{N}^c(X)\big) = \begin{pmatrix} 0 & 2q \\ 2q & 0 \end{pmatrix} \quad\text{and}\quad \mathcal{N}^{c\dagger}\big(\mathcal{N}^c(I/2)^{-1}\mathcal{N}^c(Y)\big) = \begin{pmatrix} 0 & -2qi \\ 2iq & 0 \end{pmatrix}.
\]

This implies the partial matrix representation of the Hessian of the entropy exchange $S(\mathcal{N}^c)$ at the maximally mixed state is

\[
\begin{pmatrix} 2q & 0 & ? \\ 0 & 2q & ? \\ 0 & 0 & ? \end{pmatrix}.
\]

What's left is to find the action on the $Z$ basis element.

\[
[s\mathcal{N}^c(I/2) + (1-s)I] = \begin{pmatrix} s(1-p)(1-q) + (1-s) & 0 & 0 & 0 \\ 0 & sp(1-q) + (1-s) & 0 & 0 \\ 0 & 0 & \frac{sq}{2} + (1-s) & 0 \\ 0 & 0 & 0 & \frac{sq}{2} + (1-s) \end{pmatrix}
\]
\[
[s\mathcal{N}^c(I/2) + (1-s)I]^{-1} = \begin{pmatrix} \frac{1}{s(1-p)(1-q)+(1-s)} & 0 & 0 & 0 \\ 0 & \frac{1}{sp(1-q)+(1-s)} & 0 & 0 \\ 0 & 0 & \frac{2}{sq+2(1-s)} & 0 \\ 0 & 0 & 0 & \frac{2}{sq+2(1-s)} \end{pmatrix}
\]

Multiplying to get the integrand,

\[
[s\mathcal{N}^c(I/2) + (1-s)I]^{-1}\mathcal{N}^c(Z)[s\mathcal{N}^c(I/2) + (1-s)I]^{-1} = \begin{pmatrix} 0 & \frac{2\sqrt{p(1-p)}(q-1)}{(ps(q-1)+s-1)(s(p-1)(q-1)-s+1)} & 0 & 0 \\ \frac{2\sqrt{p(1-p)}(q-1)}{(ps(q-1)+s-1)(s(p-1)(q-1)-s+1)} & 0 & 0 & 0 \\ 0 & 0 & \frac{q}{(\frac{sq}{2}-s+1)^2} & 0 \\ 0 & 0 & 0 & \frac{-q}{(\frac{sq}{2}-s+1)^2} \end{pmatrix}
\]

Integrating each term over $s$ from zero to one,

\[
\int_0^1 [s\mathcal{N}^c(I/2)+(1-s)I]^{-1}\mathcal{N}^c(Z)[s\mathcal{N}^c(I/2)+(1-s)I]^{-1}\, ds = \begin{pmatrix} 0 & \frac{2\sqrt{p(1-p)}}{2p-1}\log\big(\frac{p}{1-p}\big) & 0 & 0 \\ \frac{2\sqrt{p(1-p)}}{2p-1}\log\big(\frac{p}{1-p}\big) & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -2 \end{pmatrix}.
\]

Using the complementary channel adjoint to get,

\[
\begin{pmatrix} \frac{4p(p-1)(q-1)\log\big(\frac{p}{1-p}\big) + 2q(2p-1)}{2p-1} & 0 \\ 0 & \frac{-4p(p-1)(q-1)\log\big(\frac{p}{1-p}\big) - 2q(2p-1)}{2p-1} \end{pmatrix}.
\]

Hence, the matrix representation for the entropy exchange $S(\mathcal{N}^c)$ is

\[
\mathrm{Hess}(S(\mathcal{N}^c), \rho) = \begin{pmatrix} 2q & 0 & 0 \\ 0 & 2q & 0 \\ 0 & 0 & \frac{4p(p-1)(q-1)(\log(p) - \log(1-p)) + 2q(2p-1)}{2p-1} \end{pmatrix}.
\]

The matrix representation of the total Hessian $\mathrm{Hess}(I_c(\mathcal{N}, \rho))$ is the sum of the negative of the Hessian of the channel entropy $S(\mathcal{N})$ and the Hessian of the entropy exchange $S(\mathcal{N}^c)$, which is the diagonal matrix with the following entries,

\begin{align*}
\mathrm{Hess}(I_c(\mathcal{N}, \rho))_{11} &= -(1-2p)(4p-2)(q-1) + 2q \\
\mathrm{Hess}(I_c(\mathcal{N}, \rho))_{22} &= -(1-2p)(4p-2)(q-1) + 2q \\
\mathrm{Hess}(I_c(\mathcal{N}, \rho))_{33} &= 2q - 2 + \frac{4p(p-1)(q-1)\log\big(\frac{p}{1-p}\big) + 2q(2p-1)}{2p-1}.
\end{align*}

The eigenvalues of a diagonal matrix are its diagonal elements. Thus, this matrix is negative definite when all of the diagonal elements are negative. The first two eigenvalues are the same, hence they are negative when

\begin{align*}
q &< (1-2p)(2p-1)(q-1) = (1-2p)^2(1-q) \\
q + (1-2p)^2 q &< (1-2p)^2 \\
q(1 + (1-2p)^2) &< (1-2p)^2 \\
q &< \frac{(1-2p)^2}{1 + (1-2p)^2}.
\end{align*}

The last eigenvalue is negative when

\[
2q - 2 + \frac{4p(p-1)(q-1)\log\big(\frac{p}{1-p}\big) + 2q(2p-1)}{2p-1} < 0.
\]

Multiplying through by $2p - 1 < 0$ flips the inequality:

\begin{align*}
4p(p-1)(q-1)\log\Big(\tfrac{p}{1-p}\Big) + 2q(2p-1) &\geq (2-2q)(2p-1) \\
q\Big(4p(p-1)\log\Big(\tfrac{p}{1-p}\Big)\Big) + 2q(2p-1) &\geq (2-2q)(2p-1) + 4p(p-1)\log\Big(\tfrac{p}{1-p}\Big) \\
q\Big(4p(p-1)\log\Big(\tfrac{p}{1-p}\Big) + 2(2p-1)\Big) &\geq 4p - 2 - 4qp + 2q + 4p(p-1)\log\Big(\tfrac{p}{1-p}\Big) \\
q\Big(4p(p-1)\log\Big(\tfrac{p}{1-p}\Big) + 2(2p-1)\Big) + 4qp - 2q &\geq 4p - 2 + 4p(p-1)\log\Big(\tfrac{p}{1-p}\Big) \\
q\Big(4p(p-1)\log\Big(\tfrac{p}{1-p}\Big) + 8p - 4\Big) &\geq 4p - 2 + 4p(p-1)\log\Big(\tfrac{p}{1-p}\Big).
\end{align*}

Dividing by $4p(p-1)\log\big(\frac{p}{1-p}\big) + 8p - 4$, which is negative, flips the inequality once more:

\[
q < \frac{4p - 2 + 4p(p-1)\log\big(\frac{p}{1-p}\big)}{8p - 4 + 4p(p-1)\log\big(\frac{p}{1-p}\big)} = \frac{1 - 2p + 2p(1-p)\log\big(\frac{p}{1-p}\big)}{2 - 4p + 2p(1-p)\log\big(\frac{p}{1-p}\big)}.
\]

This completes the proof.
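The closed-form diagonal entries derived in this proof can be transcribed directly and the boundary cases checked numerically. A small sketch (natural logarithm; function names are my own):

```python
import numpy as np

def dephrasure_hessian_diag(p, q):
    """Diagonal entries (X, Y, Z directions) of Hess(I_c) at I/2 for the
    dephrasure channel, transcribed from Theorem 5.4.2's proof (natural log)."""
    h_xy = -(1 - 2*p) * (4*p - 2) * (q - 1) + 2*q
    L = np.log(p / (1 - p))
    h_z = 2*q - 2 + (4*p*(p - 1)*(q - 1)*L + 2*q*(2*p - 1)) / (2*p - 1)
    return np.array([h_xy, h_xy, h_z])

def thresholds(p):
    """The two critical values of q appearing in Theorem 5.4.2."""
    L = np.log(p / (1 - p))
    q_xy = (1 - 2*p)**2 / (1 + (1 - 2*p)**2)
    q_z = (1 - 2*p + 2*p*(1 - p)*L) / (2 - 4*p + 2*p*(1 - p)*L)
    return q_xy, q_z
```

At $p = 0.1$ the thresholds are $q_z \approx 0.336$ and $q_{xy} \approx 0.390$: below both, all entries are negative (local maximum); between them, only the Z entry is positive (saddle with two negative directions); and each entry vanishes exactly at its own threshold.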

Example of Non-Identity Critical Point

It is hinted by evidence in [19] that superadditivity occurs when the maximally mixed state is a saddle point for the dephrasure channel. In particular, they showed that given the Bloch vector parameterization $(x,y,z) \mapsto \rho = \frac{I}{2} + \frac{xX + yY + zZ}{2}$, the global optimum of the one-shot coherent information lies on the axis $(0,0,z)$. The gradient of the channel entropy at $\rho = (0,0,z)$ gives the following:

\[
-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) = \begin{pmatrix} q\log q + (1-q)\log\big(\frac{(1+z)(1-q)}{2}\big) & 0 \\ 0 & (1-q)\log\big(\frac{(1-q)(1-z)}{2}\big) + q\log(q) \end{pmatrix}.
\]

This is in the span of the identity when $z = 0$, where $(0,0,0)$ corresponds to the maximally mixed state. Recall that critical points of the coherent information must satisfy that the gradient of the channel entropy $S(\mathcal{N})$ and the gradient of the entropy exchange $S(\mathcal{N}^c)$ are either both in the span of the identity or both not in the span of the identity. Hence, the global optimum in the saddle-point region provides a critical point whose gradient of channel entropy and gradient of entropy exchange are not in the span of the identity. Another observation is that when the maximally mixed state is a saddle point, the eigenvector corresponding to the positive eigenvalue direction is precisely the $Z$ matrix, which may explain why the global optimum has the form $(0,0,z)$. Further study is required.

5.5 Pauli Erasure Channel

Here, the Hessian will be solved for the Pauli erasure channel (replacing the dephasing channel in the dephrasure channel with the Pauli channel). The proof that the maximally mixed state is always a critical point is similar to before, and the focus in this section is strictly on the Hessian. Suppose (for now) that $p_i$, $p_x$, $p_y$, $p_z$ and $q$ are pairwise distinct.

\[
\mathcal{N}\Big(\frac{I}{2}\Big) = \mathrm{diag}\Big(\frac{1-q}{2}, \frac{1-q}{2}, q\Big) \implies \log^{[1]}\Big(\mathcal{N}\Big(\frac{I}{2}\Big)\Big) = \begin{pmatrix} \frac{2}{1-q} & \frac{2}{1-q} & \frac{2\log(\frac{1-q}{2q})}{1-3q} \\ \frac{2}{1-q} & \frac{2}{1-q} & \frac{2\log(\frac{1-q}{2q})}{1-3q} \\ \frac{2\log(\frac{1-q}{2q})}{1-3q} & \frac{2\log(\frac{1-q}{2q})}{1-3q} & \frac{1}{q} \end{pmatrix}
\]

The channel evaluated at the Pauli basis $\{X,Y,Z\}$ gives:

\[
\mathcal{N}(X) = \begin{pmatrix} 0 & (1-q)(1-2p_y-2p_z) & 0 \\ (1-q)(1-2p_y-2p_z) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
\[
\mathcal{N}(Y) = \begin{pmatrix} 0 & -i(1-q)(1-2p_x-2p_z) & 0 \\ i(1-q)(1-2p_x-2p_z) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
\[
\mathcal{N}(Z) = \begin{pmatrix} (1-q)(1-2p_x-2p_y) & 0 & 0 \\ 0 & -(1-q)(1-2p_x-2p_y) & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

Hence, the differential of the logarithm at $\mathcal{N}(\frac{I}{2})$ evaluated at each of these matrices is:

\[
\log^{[1]}\Big(\mathcal{N}\Big(\frac{I}{2}\Big)\Big) \circ \mathcal{N}(X) = \begin{pmatrix} 0 & 2(1-2p_y-2p_z) & 0 \\ 2(1-2p_y-2p_z) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
\[
\log^{[1]}\Big(\mathcal{N}\Big(\frac{I}{2}\Big)\Big) \circ \mathcal{N}(Y) = \begin{pmatrix} 0 & -2i(1-2p_x-2p_z) & 0 \\ 2i(1-2p_x-2p_z) & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
\[
\log^{[1]}\Big(\mathcal{N}\Big(\frac{I}{2}\Big)\Big) \circ \mathcal{N}(Z) = \begin{pmatrix} 2(1-2p_x-2p_y) & 0 & 0 \\ 0 & -2(1-2p_x-2p_y) & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]

Applying the adjoint $\mathcal{N}^\dagger$ to each of these matrices:

\[
\mathcal{N}^\dagger\Big(\log^{[1]}\big(\mathcal{N}(I/2)\big) \circ \mathcal{N}(X)\Big) = \begin{pmatrix} 0 & 2(1-q)(2p_y+2p_z-1)^2 \\ 2(1-q)(2p_y+2p_z-1)^2 & 0 \end{pmatrix}
\]
\[
\mathcal{N}^\dagger\Big(\log^{[1]}\big(\mathcal{N}(I/2)\big) \circ \mathcal{N}(Y)\Big) = \begin{pmatrix} 0 & -2i(1-q)(2p_x+2p_z-1)^2 \\ 2i(1-q)(2p_x+2p_z-1)^2 & 0 \end{pmatrix}
\]
\[
\mathcal{N}^\dagger\Big(\log^{[1]}\big(\mathcal{N}(I/2)\big) \circ \mathcal{N}(Z)\Big) = \begin{pmatrix} 2(1-q)(2p_x+2p_y-1)^2 & 0 \\ 0 & -2(1-q)(2p_x+2p_y-1)^2 \end{pmatrix}
\]

Hence, the matrix representation of the Hessian of the channel entropy $S(\mathcal{N})$ in the basis $\{X,Y,Z\}$ is:

\[
\mathrm{Hess}\Big(S\Big(\mathcal{N}\Big(\frac{I}{2}\Big)\Big)\Big) = (1-q)\begin{pmatrix} -2(2p_y+2p_z-1)^2 & 0 & 0 \\ 0 & -2(2p_x+2p_z-1)^2 & 0 \\ 0 & 0 & -2(2p_x+2p_y-1)^2 \end{pmatrix}.
\]

What is left is to solve for the Hessian of the entropy of the complementary channel N c.

\[
\mathcal{N}^c\Big(\frac{I}{2}\Big) = \mathrm{diag}\Big((1-q)p_i,\ (1-q)p_x,\ (1-q)p_y,\ (1-q)p_z,\ \frac{q}{2},\ \frac{q}{2}\Big).
\]

This implies that $\log^{[1]}(\mathcal{N}^c(I/2))$ is the divided-difference matrix based on the eigenvalues $\{(1-q)p_i, (1-q)p_x, (1-q)p_y, (1-q)p_z, q/2, q/2\}$.

The complementary channel evaluated at the Pauli basis gives:

\[
\mathcal{N}^c(X) = \begin{pmatrix} 0 & 2\sqrt{p_i p_x}(1-q) & 0 & 0 & 0 & 0 \\ 2\sqrt{p_i p_x}(1-q) & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -2i\sqrt{p_y p_z}(1-q) & 0 & 0 \\ 0 & 0 & 2i\sqrt{p_y p_z}(1-q) & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & q \\ 0 & 0 & 0 & 0 & q & 0 \end{pmatrix}
\]
\[
\mathcal{N}^c(Y) = \begin{pmatrix} 0 & 0 & 2\sqrt{p_i p_y}(1-q) & 0 & 0 & 0 \\ 0 & 0 & 0 & 2i\sqrt{p_x p_z}(1-q) & 0 & 0 \\ 2\sqrt{p_i p_y}(1-q) & 0 & 0 & 0 & 0 & 0 \\ 0 & -2i\sqrt{p_x p_z}(1-q) & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & -iq \\ 0 & 0 & 0 & 0 & iq & 0 \end{pmatrix}
\]
\[
\mathcal{N}^c(Z) = \begin{pmatrix} 0 & 0 & 0 & 2\sqrt{p_i p_z}(1-q) & 0 & 0 \\ 0 & 0 & -2i\sqrt{p_x p_y}(1-q) & 0 & 0 & 0 \\ 0 & 2i\sqrt{p_x p_y}(1-q) & 0 & 0 & 0 & 0 \\ 2\sqrt{p_i p_z}(1-q) & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & q & 0 \\ 0 & 0 & 0 & 0 & 0 & -q \end{pmatrix}
\]

Finally, the adjoint on the differential of the matrix logarithm based on N c(I/2) is:

\[
\mathcal{N}^{c\dagger}\Big(\log^{[1]}\Big(\mathcal{N}^c\Big(\frac{I}{2}\Big)\Big) \circ \mathcal{N}^c(X)\Big) = \Bigg(\frac{4p_i p_x(1-q)\log\big(\frac{p_i}{p_x}\big)}{1-2p_x-p_y-p_z} + \frac{4p_y p_z(1-q)\log\big(\frac{p_y}{p_z}\big)}{p_y-p_z} + 2q\Bigg) X
\]
\[
\mathcal{N}^{c\dagger}\Big(\log^{[1]}\Big(\mathcal{N}^c\Big(\frac{I}{2}\Big)\Big) \circ \mathcal{N}^c(Y)\Big) = \Bigg(\frac{4p_i p_y(1-q)\log\big(\frac{p_i}{p_y}\big)}{1-p_x-2p_y-p_z} + \frac{4p_x p_z(1-q)\log\big(\frac{p_x}{p_z}\big)}{p_x-p_z} + 2q\Bigg) Y
\]
\[
\mathcal{N}^{c\dagger}\Big(\log^{[1]}\Big(\mathcal{N}^c\Big(\frac{I}{2}\Big)\Big) \circ \mathcal{N}^c(Z)\Big) = \Bigg(\frac{4p_i p_z(1-q)\log\big(\frac{p_i}{p_z}\big)}{1-p_x-p_y-2p_z} + \frac{4p_x p_y(1-q)\log\big(\frac{p_x}{p_y}\big)}{p_x-p_y} + 2q\Bigg) Z
\]

This completes the computation of the Hessian of the entropy of the complementary channel $S(\mathcal{N}^c)$. Putting it all together, the Hessian $\mathrm{Hess}(I_c(\mathcal{N}, \frac{I}{2}))$ of the coherent information of the Pauli erasure channel at the maximally mixed state is a diagonal matrix (in the Pauli basis $\{X,Y,Z\}$) with entries:

\begin{align*}
\mathrm{Hess}\Big(I_c\Big(\mathcal{N}, \frac{I}{2}\Big)\Big)_{11} &= -2(1-q)(2p_y+2p_z-1)^2 + \frac{4p_i p_x(1-q)\log\big(\frac{p_i}{p_x}\big)}{1-2p_x-p_y-p_z} + \frac{4p_y p_z(1-q)\log\big(\frac{p_y}{p_z}\big)}{p_y-p_z} + 2q \\
\mathrm{Hess}\Big(I_c\Big(\mathcal{N}, \frac{I}{2}\Big)\Big)_{22} &= -2(1-q)(2p_x+2p_z-1)^2 + \frac{4p_i p_y(1-q)\log\big(\frac{p_i}{p_y}\big)}{1-p_x-2p_y-p_z} + \frac{4p_x p_z(1-q)\log\big(\frac{p_x}{p_z}\big)}{p_x-p_z} + 2q \\
\mathrm{Hess}\Big(I_c\Big(\mathcal{N}, \frac{I}{2}\Big)\Big)_{33} &= -2(1-q)(2p_x+2p_y-1)^2 + \frac{4p_i p_z(1-q)\log\big(\frac{p_i}{p_z}\big)}{1-p_x-p_y-2p_z} + \frac{4p_x p_y(1-q)\log\big(\frac{p_x}{p_y}\big)}{p_x-p_y} + 2q
\end{align*}

The results on the dephrasure channel are recovered when $p_x = p_y = 0$. The results above assume that $p_i$, $p_x$, $p_y$, $p_z$ are pairwise distinct. Generalizing this so that the probabilities can be equal to one another, the diagonal Hessian matrix with eigenvalues $e_1, e_2, e_3$ corresponding to the eigenvectors $\{X, Y, Z\}$ respectively is:

\begin{align*}
e_1 &= -2(1-q)(2p_y+2p_z-1)^2 + 4p_i p_x(1-q)f^{[1]}(p_i, p_x) + 4p_y p_z(1-q)f^{[1]}(p_y, p_z) + 2q \\
e_2 &= -2(1-q)(2p_x+2p_z-1)^2 + 4p_i p_y(1-q)f^{[1]}(p_i, p_y) + 4p_x p_z(1-q)f^{[1]}(p_x, p_z) + 2q \\
e_3 &= -2(1-q)(2p_x+2p_y-1)^2 + 4p_i p_z(1-q)f^{[1]}(p_i, p_z) + 4p_x p_y(1-q)f^{[1]}(p_x, p_y) + 2q
\end{align*}

where $f^{[1]}(a,b)$ is the first divided difference of the logarithm, $f^{[1]}(a,b) = \frac{\log(a)-\log(b)}{a-b}$ for $a \neq b$ and $f^{[1]}(a,a) = \frac{1}{a}$.
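These eigenvalue formulas can be checked against the dephrasure results by sending $p_x, p_y \to 0$. A small numerical sketch (natural logarithm; function names are my own):

```python
import numpy as np

def f1(a, b):
    """First divided difference of the natural logarithm."""
    return 1.0 / a if np.isclose(a, b) else (np.log(a) - np.log(b)) / (a - b)

def pauli_erasure_eigs(pi, px, py, pz, q):
    """Hessian eigenvalues e1, e2, e3 at I/2 in the X, Y, Z directions."""
    e1 = (-2*(1 - q)*(2*py + 2*pz - 1)**2
          + 4*pi*px*(1 - q)*f1(pi, px) + 4*py*pz*(1 - q)*f1(py, pz) + 2*q)
    e2 = (-2*(1 - q)*(2*px + 2*pz - 1)**2
          + 4*pi*py*(1 - q)*f1(pi, py) + 4*px*pz*(1 - q)*f1(px, pz) + 2*q)
    e3 = (-2*(1 - q)*(2*px + 2*py - 1)**2
          + 4*pi*pz*(1 - q)*f1(pi, pz) + 4*px*py*(1 - q)*f1(px, py) + 2*q)
    return np.array([e1, e2, e3])
```

With $p_x = p_y \approx 0$, $p_z = p$ and $p_i = 1 - p$, the first and third eigenvalues numerically reproduce the X/Y and Z diagonal Hessian entries of the dephrasure channel from Theorem 5.4.2.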

Chapter 6

Conclusion And Further Work

"In science, if you know what you're doing you shouldn't be doing it."

- Richard Hamming, Art of Doing Science and Engineering.

"Everybody got a plan until they get punched in the face."

- Mike Tyson, Interview before fighting Holyfield.

The goal of this work is to study the complexity of the quantum capacity $Q(\mathcal{N})$, defined as

\[
Q(\mathcal{N}) = \lim_{n\to\infty} \frac{\max_{\rho\in\mathcal{D}} I_c(\mathcal{N}^{\otimes n}, \rho)}{n},
\]

by studying the critical points of the coherent information $I_c(\mathcal{N}^{\otimes n}, \rho) = S(\mathcal{N}^{\otimes n}(\rho)) - S((\mathcal{N}^c)^{\otimes n}(\rho))$. This was accomplished by restricting to the space of positive definite density matrices $\mathcal{D}^{++}$. This is a reasonable restriction since $I_c(\mathcal{N}, \rho) \leq S(\rho)$ and since $\mathcal{D}^{++}$ is dense inside the space of all density matrices $\mathcal{D}$, with $I_c$ being continuous. For qubit channels and $n = 1$, it is trivially known that the coherent information is maximized by a state in $\mathcal{D}^{++}$, as rank-one density matrices have zero coherent information.

Chapter three discussed the basics of manifold theory needed to study $\mathcal{D}^{++}$ and its manifold structure. It was shown that the von Neumann entropy $S(X) = -\mathrm{Tr}(X\log(X))$ is a smooth function on $\mathcal{D}^{++}$ and, more generally, on $\mathcal{M}^{++}$, the space of positive definite matrices. The gradient and Hessian of $S$ were solved using techniques from the Riemannian geometry/manifold optimization communities: solving in the larger manifold $\mathcal{M}^{++}$ using

matrix calculus and then projecting (using $P$) the gradient onto $\mathcal{D}^{++}$, i.e.

\[
\mathrm{grad}(S^{\mathcal{D}^{++}}) = P(\mathrm{grad}(S^{\mathcal{M}^{++}})).
\]

Furthermore, the Hessian has a similar form on $\mathcal{D}^{++}$, obtained by taking the differential of the gradient on $\mathcal{M}^{++}$:

\[
\mathrm{Hess}(S^{\mathcal{D}^{++}}) = P(d(\mathrm{grad}(S^{\mathcal{M}^{++}}))_\rho),
\]

where $df_\rho$ is the differential of $f$ at $\rho \in \mathcal{D}^{++}$.

Chapter four applied the results of the previous chapter to the coherent information $I_c(\mathcal{N}, \rho) = S(\mathcal{N}(\rho)) - S(\mathcal{N}^c(\rho))$. However, it needed to be shown that the coherent information is a smooth function on $\mathcal{D}^{++}$ and $\mathcal{M}^{++}$. This is done by introducing the concept of strictly positive quantum channels, i.e., channels that leave the set of positive definite matrices invariant. It was then shown that these are dense inside the space of all quantum channels (see 4.1.3) and that the entropy of the complementary channel can always be guaranteed to be strictly positive. Using the same tactics as chapter three, the gradient and Hessian were solved for the coherent information (theorems 4.3.1 and 4.3.5). It was shown that the study of critical points can be completely understood by studying the coherent information as a function on $\mathcal{M}^{++}$ rather than on the density matrices $\mathcal{D}^{++}$ (theorems 4.3.2, 4.3.3 and 4.3.6). It was shown in theorem 4.3.4 that the coherent information can be re-written using its gradient on $\mathcal{M}^{++}$ and the Frobenius inner product as:

\[
I_c(\mathcal{N}, \rho) = \langle \mathrm{grad}(I_c^{\mathcal{M}^{++}})(\rho) \,|\, \rho\rangle.
\]

The final chapter applied the results of the previous chapter to particular examples of quantum channels: the Pauli, dephrasure and Pauli-erasure channels. It was proven that if $\rho$ is a critical point for the k-shot coherent information, then the product state $\rho^{\otimes n}$ is a critical point for the nk-shot coherent information (see 5.1.1). The maximally mixed state was shown to always be a critical point for the three channels above (see 5.3.1 and 5.4.1), and the eigenvalues of its Hessian at the maximally mixed state with $n = 1$ were completely solved (see 5.3.2 and 5.4.2). The results match those of the literature and the author's own computations, indicating evidence for the validity of this work.
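The gradient identity from theorem 4.3.4 can be verified numerically. Assuming the standard matrix-calculus gradient $\mathrm{grad}(S^{\mathcal{M}^{++}})(X) = -(\log X + I)$ (natural log), the gradient of $I_c$ on $\mathcal{M}^{++}$ expressed through the adjoint channels is $-\mathcal{N}^\dagger(\log\mathcal{N}(\rho) + I) + \mathcal{N}^{c\dagger}(\log\mathcal{N}^c(\rho) + I)$, and pairing it with $\rho$ under the Frobenius inner product recovers $I_c(\rho)$. A sketch for a Pauli channel (helper names are my own):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def logm_pd(M):
    """Matrix logarithm of a Hermitian positive definite matrix."""
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.log(w)) @ U.conj().T

def grad_and_ic(rho, kraus):
    """grad(I_c^{M++})(rho) via the adjoint channels, and I_c(rho) in nats."""
    out = sum(A @ rho @ A.conj().T for A in kraus)
    W = np.array([[np.trace(A @ rho @ B.conj().T) for B in kraus]
                  for A in kraus])
    L_out, L_W = logm_pd(out), logm_pd(W)
    ic = (-np.trace(out @ L_out) + np.trace(W @ L_W)).real
    n = len(kraus)
    grad = -sum(A.conj().T @ (L_out + np.eye(len(out))) @ A for A in kraus) \
        + sum((L_W + np.eye(n))[j, k] * kraus[j].conj().T @ kraus[k]
              for j in range(n) for k in range(n))
    return grad, ic

# Random full-rank qubit state through a Pauli channel.
rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = M @ M.conj().T
rho /= np.trace(rho).real
kraus = [np.sqrt(w) * P for w, P in zip([0.7, 0.1, 0.1, 0.1], [I2, X, Y, Z])]
grad, ic = grad_and_ic(rho, kraus)
```

The Frobenius pairing $\mathrm{Tr}(\mathrm{grad}(\rho)\,\rho)$ then agrees with $I_c(\rho)$: the identity-shift terms cancel because the adjoint of a trace-preserving map is unital.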

6.1 Further Work

Local Maxima of Product States on n-shot Coherent Information

It was shown that if $\rho$ is a critical point for the k-shot coherent information, then for every positive integer $n$, $\rho^{\otimes n}$ is a critical point for the nk-shot coherent information. The next question is: if $\rho$ is a local maximum/minimum, is $\rho^{\otimes n}$ also a local maximum/minimum for the nk-shot coherent information? If not, when does it hold? So far the best that can be said is that the following must hold for $\rho^{\otimes n}$ to be a local maximum,

\[
\mathrm{Tr}\Big(\mathcal{N}^{c\otimes n}(V)^2 \big(\mathcal{N}^c(\rho)^{-1}\big)^{\otimes n}\Big) \leq \mathrm{Tr}\Big(\mathcal{N}^{\otimes n}(V)^2 \big(\mathcal{N}(\rho)^{-1}\big)^{\otimes n}\Big)
\]

for all V ∈ (H0)⊗n.

Global-Optima Criteria

The next goal would be to investigate under what conditions a local maximum becomes a global maximum. The following conjecture is supported by evidence from the Pauli and dephrasure channels.

Conjecture 6.1.1. If the maximally mixed state is a local maximum of the coherent information $I_c$, and $I_c$ at the maximally mixed state is positive, then it is a global maximum of the coherent information for the Pauli and Pauli-erasure channels.

This generalizes, geometrically, the property that the degradable and anti-degradable quantum channels have on the coherent information. An alternative way of showing a local maximum is a global maximum would be to find when the coherent information becomes concave.

Conjecture: Superadditivity and gradient condition

There is evidence from the dephrasure channel that if $\rho$ is a global optimum and the gradient terms at $\rho$ satisfy

\[
-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) \notin \mathrm{span}\{I\} \quad\text{and}\quad -(\mathcal{N}^c)^\dagger\big(\log(\mathcal{N}^c(\rho))\big) \notin \mathrm{span}\{I\},
\]

then it is somehow related to the notion of superadditivity.

Stratum-Preserving Channels

One approximation that was made was to restrict to positive definite matrices. This was due to the fact that the set of all density matrices $\mathcal{D}$ does not have a global manifold structure but rather a stratified manifold structure [14]. A stratified manifold is one that can be broken into smooth manifold pieces. In particular,

\[
\mathcal{D} = \mathcal{D}^1 \cup \mathcal{D}^2 \cup \cdots \cup \mathcal{D}^{++},
\]

where $\mathcal{D}^i$ is the smooth manifold of all rank-$i$ density matrices. Only for qubit density matrices does $\mathcal{D}$ have a smooth manifold structure with boundary. It was further shown in the paper [6] that the special linear group $SL(\mathbb{C}, n) = \{X \in \mathbb{C}^{n\times n} : \det(X) = 1\}$ of determinant-one $n \times n$ matrices has a group action on $\mathcal{D}^i$ that recovers the space. A quantum channel is stratum preserving if, for all integers $i$ up to the full rank, it maps rank-$i$ density matrices $\mathcal{D}^i$ to rank-$k$ density matrices $\mathcal{D}^k$. In particular, this is true if it maps the positive semidefinite matrices of rank $i$ intersected with the special linear group to positive semidefinite matrices of rank $k$ intersected with $SL(\mathbb{C}, n)$. It is well-known that when $i = k$ for all integers $i$, the quantum channel is rank preserving and has a single Kraus operator $\{U\}$, where $U$ is unitary. Further understanding of this kind of characterization may help with extending this work to all density matrices. However, for reasons stated in chapter two, it may be unnecessary.

Counting the number of critical points

Having the ability to count the number of states $\rho$ that satisfy

\[
-\mathcal{N}^\dagger\big(\log(\mathcal{N}(\rho))\big) + \mathcal{N}^{c\dagger}\big(\log(\mathcal{N}^c(\rho))\big) \in \mathrm{span}\{I_n\}
\]

may be of virtue. Theorem 5.1.1 is an example of critical points at n = 1 extending to larger n’s. This may lead to the ability to understand how the number of critical points relates to the complexity of quantum capacity,

\[
Q(\mathcal{N}) = \lim_{n\to\infty} \frac{I_c(\mathcal{N}^{\otimes n})}{n}.
\]

Critical Points Through Computation and $\frac{I}{d}$ as a Bifurcation Point

The author of this thesis wrote a software package, Github/PyQCodes, that allows the optimization of the coherent information by plugging in only the Kraus operators of the quantum channel. It also allows the use of sparse matrices and exploits the Lipschitz properties of coherent information in order to optimize. Using this software package together with the formulas for the gradient and Hessian may lead to a better understanding of the complexity of the n-shot coherent information.

For example, consider the dephrasure channel at a particular channel parameter $(p^\star, q^\star)$ such that the maximally mixed state is a saddle point. The gradients of the channel entropy and entropy exchange are both diagonal but not individually in the span of the identity for $n = 1, 2, 3$ (recall (4) in theorem 4.3.2). The optimal density matrix has the following very interesting pattern. The optimal density matrix at $n = 1$ is a diagonal matrix that is not the maximally mixed state. The optimal $\rho$ at $n = 2$ has tensor decomposition

\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 2a & 0 \\ 0 & 2b \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 2b & 0 \\ 0 & 2a \end{pmatrix}.
\]

The optimal density matrix at $n = 3$ has tensor decomposition

\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \left[\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 4a & 0 \\ 0 & 4b \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 4b & 0 \\ 0 & 4a \end{pmatrix}\right] + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \left[\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 4a & 0 \\ 0 & 4b \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes \begin{pmatrix} 4b & 0 \\ 0 & 4a \end{pmatrix}\right].
\]

This was also noted by the authors themselves in the paper [19]. In summary, using computation to find the optimal density matrices, together with the work in this thesis such as the gradient and Hessian, will help identify why certain kinds of structure are optimal for the n-shot coherent information, hopefully revealing why superadditivity occurs. It was shown in this thesis that this is heavily involved with the maximally mixed state being a bifurcation point. Using the Hessian at the maximally mixed state and the eigenvector directions with positive eigenvalues may lead to a better understanding of this. One explanation as to why, for the single-shot coherent information of the dephrasure channel, the optimal density matrix is diagonal is that the eigenvector with the positive eigenvalue is the $Z$ matrix at the bifurcation point (shown in chapter five). The author suspects that this holds for higher $n$'s. In other words, the optimal density matrices at $n = 2, 3, \ldots$ listed above have that structure precisely because those are the eigenvector directions with positive (possibly largest) eigenvalues.

Bibliography

[1] Absil, P.-A., Mahony, R., and Sepulchre, R. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.

[2] Absil, P.-A., Mahony, R., and Trumpf, J. An extrinsic look at the Riemannian Hessian. In International Conference on Geometric Science of Information (2013), Springer, pp. 361–368.

[3] Aubrun, G., and Szarek, S. J. Alice and Bob meet Banach, vol. 223. American Mathematical Soc., 2017.

[4] Bhatia, R. Positive definite matrices, vol. 24. Princeton university press, 2009.

[5] Bruß, D., DiVincenzo, D. P., Ekert, A., Fuchs, C. A., Macchiavello, C., and Smolin, J. A. Optimal universal and state-dependent quantum cloning. Physical Review A 57, 4 (1998), 2368.

[6] Chruściński, D., Ciaglia, F. M., Ibort, A., Marmo, G., and Ventriglia, F. Stratified manifold of quantum states, actions of the complex special linear group. Annals of Physics 400 (2019), 221–245.

[7] Cubitt, T., Elkouss, D., Matthews, W., Ozols, M., Perez-Garc´ ´ıa, D., and Strelchuk, S. Unbounded number of channel uses may be required to detect quantum capacity. Nature Communications 6 (2015), 6739.

[8] Cubitt, T. S., Ruskai, M. B., and Smith, G. The structure of degradable quantum channels. Journal of Mathematical Physics 49, 10 (2008), 102104.

[9] Devetak, I. The private classical capacity and quantum capacity of a quantum chan- nel. IEEE Transactions on Information Theory 51, 1 (2005), 44–55.

[10] Devetak, I., and Shor, P. W. The capacity of a quantum channel for simultaneous transmission of classical and quantum information. Communications in Mathematical Physics 256, 2 (2005), 287–303.

[11] Devetak, I., and Winter, A. Classical data compression with quantum side infor- mation. Physical Review A 68, 4 (2003), 042301.

[12] Dittmann, J. On the Riemannian metric on the space of density matrices. Reports on Mathematical Physics 36, 2-3 (1995), 309–315.

[13] Gibilisco, P., and Isola, T. Monotone metrics on statistical manifolds of density matrices by geometry of non-commutative l2-spaces. In AIP Conference Proceedings (2001), vol. 553, AIP, pp. 129–140.

[14] Grabowski, J., Ku´s, M., and Marmo, G. Geometry of quantum systems: density states and entanglement. Journal of Physics A: Mathematical and General 38, 47 (2005), 10217.

[15] Gyongyosi, L., Imre, S., and Nguyen, H. V. A survey on quantum channel capacities. IEEE Communications Surveys & Tutorials 20, 2 (2018), 1149–1205.

[16] Higham, N. J. Functions of matrices: theory and computation, vol. 104. SIAM, 2008.

[17] Hilgert, J., and Neeb, K.-H. Structure and geometry of Lie groups. Springer, 2012.

[18] Leditzky, F., Leung, D., and Smith, G. Quantum and private capacities of low- noise channels. In 2017 IEEE Information Theory Workshop (ITW) (2017), IEEE, pp. 484–488.

[19] Leditzky, F., Leung, D., and Smith, G. Dephrasure channel and superadditivity of coherent information. Physical review letters 121, 16 (2018), 160501.

[20] Lee, J. M. Smooth manifolds. Springer, 2013.

[21] Leung, D., and Watrous, J. On the complementary quantum capacity of the depolarizing channel. Quantum 1 (2017), 28.

[22] Lloyd, S. Capacity of the noisy quantum channel. Physical Review A 55, 3 (1997), 1613.

[23] Marshall, A. W., Olkin, I., and Arnold, B. C. Inequalities: theory of majoriza- tion and its applications, vol. 143. Springer, 1979.

[24] Naudts, J. Quantum statistical manifolds. Entropy 20, 6 (2018), 472.

[25] Naylor, A. W., and Sell, G. R. Linear operator theory in engineering and science. Springer Science & Business Media, 2000.

[26] Schumacher, B., and Nielsen, M. A. Quantum data processing and error correc- tion. Physical Review A 54, 4 (1996), 2629.

[27] Shannon, C. E. A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423, 623–656.

[28] Shor, P. W. The quantum channel capacity and coherent information. In lecture notes, MSRI Workshop on Quantum Computation (2002).

[29] Smith, G., and Smolin, J. A. Degenerate quantum codes for Pauli channels. Physical Review Letters 98, 3 (2007), 030501.

[30] Smith, G., and Yard, J. Quantum communication with zero-capacity channels. Science 321, 5897 (2008), 1812–1815.

[31] Stinespring, W. F. Positive functions on C*-algebras. Proceedings of the American Mathematical Society 6, 2 (1955), 211–216.

[32] Sutter, D., Scholz, V. B., Winter, A., and Renner, R. Approximate degrad- able quantum channels. IEEE Transactions on Information Theory 63, 12 (2017), 7832– 7844.

[33] Watrous, J. The theory of quantum information. Cambridge University Press, 2018.

[34] Wilde, M. M. Quantum information theory. Cambridge University Press, 2017.

[35] Wolf, M. M. Quantum channels & operations: Guided tour. Lecture notes, available at http://www-m5.ma.tum.de/foswiki/pub, 2012.

[36] Wolf, M. M., and Perez-Garcia, D. Quantum capacities of channels with small environment. Physical Review A 75, 1 (2007), 012303.

[37] Yard, J., Hayden, P., and Devetak, I. Capacity theorems for quantum multiple- access channels: Classical-quantum and quantum-quantum capacity regions. IEEE Transactions on Information Theory 54, 7 (2008), 3091–3113.

Appendix A

Matrix Calculus

This appendix introduces the basic notions of matrix calculus. The reference for this section is the book [16]. Denote $\mathbb{C}^{n\times n}$ to be the space of all $n\times n$ matrices with complex entries. Denote $\mathcal{H}_n \subset \mathbb{C}^{n\times n}$ to be the set of $n\times n$ Hermitian matrices.

A.1 Matrix Functions

Let f : C → C be a scalar function. The goal of this section is to extend the scalar function f to matrix-valued function f˜ : Cn×n → Cn×n on the space of complex-valued matrices such that f˜ has the same continuity/differentiability properties of f. This is done by using the Jordan canonical form of any matrix (definition 1.2 in [16]).

Definition A.1.1 (Matrix function via Jordan canonical form). Let $f : \mathbb{C} \to \mathbb{C}$ be an $l$-times continuously differentiable ($C^l$) function. Then $f$ can be extended to a matrix-valued function $\tilde{f} : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$ as follows. Any matrix $X \in \mathbb{C}^{n\times n}$ has a unique Jordan canonical form $X = ZJZ^{-1}$ such that $J = \mathrm{diag}(J_1, \cdots, J_m)$ is a block matrix with blocks $J_k$,

\[
J_k = \begin{pmatrix} \lambda_k & 1 & & \\ & \lambda_k & 1 & \\ & & \ddots & 1 \\ & & & \lambda_k \end{pmatrix} \in \mathbb{C}^{m_k\times m_k},
\]

where $Z$ is invertible, $\sum_{i=1}^{m} m_i = n$ and $m_k \leq l$ for all $k$.

$f(X)$ is defined to be $Zf(J)Z^{-1} = Z\,\mathrm{diag}(f(J_1), \cdots, f(J_m))Z^{-1}$, where

\[
f(J_k) = \begin{pmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \frac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{pmatrix}.
\]
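The Toeplitz formula for $f(J_k)$ can be illustrated for $f = \exp$, whose derivatives are all $\exp$ itself, and compared against a general-purpose matrix exponential. A small sketch (assuming SciPy is available; function names are my own):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def f_jordan_block(derivs, lam, m):
    """Apply f to an m x m Jordan block with eigenvalue lam.
    derivs(j) must return the j-th derivative of f evaluated at lam."""
    F = np.zeros((m, m))
    for i in range(m):
        for j in range(i, m):
            # The k-th superdiagonal carries f^(k)(lam) / k!.
            F[i, j] = derivs(j - i) / factorial(j - i)
    return F

lam, m = 0.5, 4
J = lam * np.eye(m) + np.diag(np.ones(m - 1), k=1)   # single Jordan block
F = f_jordan_block(lambda j: np.exp(lam), lam, m)    # every derivative of exp is exp
```

Here `F` agrees with `expm(J)`, since $e^{J} = e^{\lambda}(I + N + N^2/2! + \cdots)$ for the nilpotent part $N$.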

Given any matrix function $\tilde{f}$, its restriction to the space of Hermitian matrices $X$ has a simpler form due to the unitary diagonalization $X = UDU^\dagger$. The Jordan canonical form is equivalent to the spectral diagonalization and all Jordan blocks become size $1\times 1$ matrices.

This can be seen by noting that $J = \mathrm{diag}(\lambda_1, \cdots, \lambda_n)$, where $\lambda_i \in \mathbb{R}$, and $Z = U$, where $U$ is unitary. Thus, matrix functions on Hermitian matrices are written as $f(X) = Uf(D)U^\dagger$, where $f(D) = \mathrm{diag}(f(\lambda_1), \cdots, f(\lambda_n))$. Matrix functions defined via the spectrum have the following properties.
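For Hermitian matrices this reduces to an eigendecomposition, which is how one would implement $f(X)$ in practice. A minimal sketch (the function name is my own):

```python
import numpy as np

def herm_fun(f, X):
    """f(X) = U f(D) U^dagger for Hermitian X, via the spectral decomposition."""
    w, U = np.linalg.eigh(X)
    return U @ np.diag(f(w)) @ U.conj().T
```

For example, `herm_fun(np.exp, X)` and `herm_fun(np.log, ...)` are mutually inverse on Hermitian `X`, and `f(X)` commutes with `X` (property 1 of Theorem A.1.1 below).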

Theorem A.1.1 (Properties of Matrix Functions ([16], Theorem 1.13)). Let f be a scalar function. Its extension to a matrix function f : Cn×n → Cn×n has the following properties for all X ∈ Cn×n :

1. f(X) commutes with X.

2. f(XT ) = f(X)T . If f as a scalar function is analytic, then f(X†) = f(X)†.

3. If X commutes with A, then X commutes with f(A).

4. If $X = \mathrm{diag}(A_{11}, \cdots, A_{mm})$ is a block matrix with blocks $A_{ii} \in \mathbb{C}^{m_i\times m_i}$, then $f(X) = \mathrm{diag}(f(A_{11}), \cdots, f(A_{mm}))$.

5. f(I_m ⊗ X) = I_m ⊗ f(X), where ⊗ is the Kronecker product.

6. f(X ⊗ I_m) = f(X) ⊗ I_m.

The next theorem shows that extensions of scalar functions to matrix functions preserve continuity under certain conditions.

Theorem A.1.2 (Continuity, Theorem 1.19 in [16]). Let D be an open set in R or C and let f be an (n − 1)-times continuously differentiable scalar function on D. Then the matrix function extension f(X) is a continuous function of matrices X ∈ C^{n×n} with spectrum in D.

A.2 Fréchet and Gâteaux Derivatives

This section introduces the notions of the total (Fréchet) derivative and the directional (Gâteaux) derivative found in vector calculus. The Gâteaux derivative is the derivative along a given direction.

Definition A.2.1 (Gâteaux Derivative). Let f : C → C be a scalar function. The Gâteaux derivative G(X, E) of f as a matrix-valued function at X ∈ C^{n×n} in direction E ∈ C^{n×n} is

\[
G(X,E) = \lim_{t \to 0} \frac{f(X + tE) - f(X)}{t} = \left. \frac{d}{dt} f(X + tE) \right|_{t=0}.
\]

The Fréchet derivative of a matrix function is analogous to the total derivative in vector calculus.

Definition A.2.2 (Fréchet Derivative). Let f : C → C be a scalar function. The Fréchet derivative L(X, ·) of the matrix function f at X ∈ C^{n×n} is a linear operator on C^{n×n} such that the following approximation is satisfied:

\[
\lim_{t \to 0} \frac{f(X + tE) - f(X) - t\,L(X,E)}{t} = 0 \qquad \forall E \in \mathbb{C}^{n \times n}. \tag{A.1}
\]

If the dependence on the function f needs to be made explicit, the derivative is written as L_f.

The Fréchet derivative, if it exists, is unique. This can be seen by noting that if two Fréchet derivatives L and L̃ exist at a point X ∈ C^{n×n}, then along any direction E ∈ C^{n×n} subtracting the two copies of equation (A.1) gives

\[
\lim_{t \to 0} \frac{t\,L(X,E) - t\,\tilde{L}(X,E)}{t} = L(X,E) - \tilde{L}(X,E) = 0,
\]

so L(X,E) = L̃(X,E). Moreover, if the Fréchet derivative L(X,E) exists at X in direction E, then it equals the Gâteaux derivative G(X,E) of f in direction E; this is seen by rearranging equation (A.1) to obtain G(X,E) = L(X,E). The converse holds only if the Gâteaux derivative is linear in the parameter E and continuous in the parameter X [16]. The Fréchet derivative satisfies the following properties.

Theorem A.2.1 (Fréchet Derivative Properties). If f and g are Fréchet differentiable at a point X ∈ C^{n×n}, then:

• (Linearity) the Fréchet derivative L_{f+g} of f + g at X is L_f + L_g.

• (Product Rule) the Fréchet derivative L_{fg} of fg at X satisfies

\[
L_{fg}(X,E) = f(X)\,L_g(X,E) + L_f(X,E)\,g(X).
\]

• (Chain Rule) the Fréchet derivative L_{f∘g} of f ∘ g at X satisfies

\[
L_{f \circ g}(X,E) = L_f\bigl(g(X), L_g(X,E)\bigr).
\]

The next theorem (Theorem 3.8 in [16]) gives a condition for when the Fréchet derivative is continuous.

Theorem A.2.2. Let f be a scalar function that is (2n − 1)-times continuously differentiable on an open set D in R or C. For any X ∈ C^{n×n} with spectrum in D, the Fréchet derivative L(X,E) exists and is continuous in both variables X and E.
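The defining limit can be checked numerically against a difference quotient. A minimal sketch, assuming NumPy and SciPy (whose expm_frechet routine computes the Fréchet derivative of the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(1)
X = 0.5 * rng.standard_normal((3, 3))
E = 0.5 * rng.standard_normal((3, 3))

# Fréchet derivative L(X, E) of f = exp at X in direction E.
expX, L = expm_frechet(X, E)

# Central-difference approximation of the limit in (A.1):
# (f(X + tE) - f(X - tE)) / (2t) -> L(X, E) as t -> 0.
t = 1e-6
fd = (expm(X + t * E) - expm(X - t * E)) / (2 * t)
assert np.allclose(fd, L, atol=1e-6)
```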

A.3 Matrix Exponential and Logarithm

The two most important matrix functions for this thesis are the matrix exponential and the matrix logarithm. Both are defined from their scalar counterparts on the complex or real plane via the Jordan canonical form (definition A.1.1).

Definition A.3.1 (Matrix Exponential). Let e : C → C be the complex-valued exponential function. The matrix exponential exp : C^{n×n} → C^{n×n} is defined to be exp(X) = Z exp(J) Z^{-1}, where Z is invertible and J is the Jordan canonical form of X, analogous to definition A.1.1.

The matrix exponential satisfies the following properties found in [16].

Theorem A.3.1 (Properties of Matrix Exponential). Let exp be the matrix exponential and let A, B ∈ C^{n×n}. The matrix exponential satisfies the following properties.

1. Let n ≥ 2. Then

\[
\exp((A + B)t) = \exp(At)\exp(Bt) \text{ for all } t \in \mathbb{C} \iff AB = BA.
\]

2. exp(A) exp(B) = exp(B) exp(A) ⇐⇒ AB = BA

3. exp(A ⊗ I) = exp(A) ⊗ I and similarly exp(I ⊗ A) = I ⊗ exp(A), where ⊗ is the Kronecker product.

4. exp(A ⊕ B) = exp(A) ⊗ exp(B), where A ⊕ B := A ⊗ I + I ⊗ B is the Kronecker sum.
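Property 4 follows from properties 1 and 3, since A ⊗ I and I ⊗ B commute. A quick numerical sketch of the identity, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))

# Kronecker sum A ⊕ B = A ⊗ I + I ⊗ B.
ksum = np.kron(A, np.eye(3)) + np.kron(np.eye(2), B)

# exp(A ⊕ B) = exp(A) ⊗ exp(B).
assert np.allclose(expm(ksum), np.kron(expm(A), expm(B)))
```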

The complex-valued logarithm is a multi-valued function, so a choice of branch cut is needed. The branch cut is chosen to be the negative real line including zero, and the logarithm is restricted so that its values lie in {z | −π < Im(z) < π}. The matrix logarithm could be defined analogously to the matrix exponential via the Jordan canonical form; however, the integral definition will be used, as in [16].

Definition A.3.2 (Matrix Logarithm). Let log : C → C be the complex-valued logarithm whose branch cut is chosen on the negative real line and whose values are restricted to satisfy −π < Im(z) < π. Let A ∈ C^{n×n} be a matrix with no negative eigenvalues. The matrix logarithm of A is

\[
\log(A) = \int_0^1 (A - I)\bigl(s(A - I) + I\bigr)^{-1}\, ds. \tag{A.2}
\]

If A is Hermitian and positive definite, then the matrix logarithm has the following series representation:

\[
\log(A) = -2 \sum_{k=0}^{\infty} \frac{1}{2k+1} \bigl((I - A)(I + A)^{-1}\bigr)^{2k+1}. \tag{A.3}
\]

The Fréchet derivative of the matrix logarithm also has an integral representation, which follows from the definition above and the definition of the Fréchet derivative.

Theorem A.3.2 (Fréchet Derivative of Matrix Logarithm). Let log be the matrix logarithm and let A ∈ C^{n×n} be a matrix with no negative eigenvalues. The Fréchet derivative of log at A in direction E ∈ C^{n×n} is

\[
L(A, E) = \int_0^1 \bigl(s(A - I) + I\bigr)^{-1} E \bigl(s(A - I) + I\bigr)^{-1}\, ds.
\]

Proof (sketch). One shows that the Fréchet condition (A.1) holds for the integral representation of the matrix logarithm; see [16].
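The integral representation can be verified numerically by comparing a quadrature of the integrand against a finite difference of the matrix logarithm. A sketch assuming NumPy and SciPy; the 40-point Gauss–Legendre rule is an arbitrary choice:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)  # positive definite: no negative eigenvalues
E = rng.standard_normal((3, 3))
I = np.eye(3)

# Gauss-Legendre quadrature of the integral representation over [0, 1].
nodes, weights = np.polynomial.legendre.leggauss(40)
s = (nodes + 1) / 2   # map [-1, 1] -> [0, 1]
w = weights / 2
L = sum(wi * np.linalg.solve(si * (A - I) + I,
                             E @ np.linalg.inv(si * (A - I) + I))
        for si, wi in zip(s, w))

# Compare with a central finite difference of logm.
t = 1e-5
fd = (logm(A + t * E) - logm(A - t * E)) / (2 * t)
assert np.allclose(L, fd, atol=1e-6)
```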

The matrix logarithm has properties similar to those of the matrix exponential; see [16].

Theorem A.3.3 (Matrix Logarithm Properties). The matrix logarithm satisfies the following properties for all matrices A with no negative eigenvalues.

1. log(A ⊗ I) = log(A) ⊗ I and log(I ⊗ A) = I ⊗ log(A).

2. log(A ⊗ B) = log(A) ⊕ log(B), where ⊕ is the Kronecker sum and B is a matrix with no negative eigenvalues.

3. If A is a Hermitian and positive definite matrix, then

\[
\|L(A)\|_F := \max_{E \neq 0} \frac{\|L(A,E)\|_F}{\|E\|_F} = \max_{\lambda \in \Delta(A)} |\lambda^{-1}| = \|A^{-1}\|_2,
\]

where ‖·‖_F is the Frobenius norm, ‖·‖_2 is the spectral norm and Δ(A) is the spectrum of A.

A simple description will now be presented for the differential d(log)_X[V] of the logarithm. The Loewner matrix f^{[1]}(X) of a continuously differentiable function f ∈ C^1(I) at a matrix X ∈ H with eigenvalues λ_i is the matrix with entries

\[
\bigl[f^{[1]}(X)\bigr]_{ij} =
\begin{cases}
\dfrac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j} & \text{if } \lambda_i \neq \lambda_j, \\[2mm]
f'(\lambda_i) & \text{if } \lambda_i = \lambda_j,
\end{cases}
\]

where λ_i are the eigenvalues of X. The next theorem states that the differential d(log)_X[V] is precisely log^{[1]}(X) ∘ V, where ∘ is the Schur (or Hadamard) product, i.e. the entrywise product (X ∘ V)_{ij} = X_{ij} V_{ij}.

Theorem A.3.4 (Theorem 5.3.1 in [4]). Let f ∈ C^1(I) be a continuously differentiable function on an interval I and let X be a Hermitian matrix with eigenvalues in I. Then the differential Df_X[V] of f at X in direction V is

\[
Df_X[V] = f^{[1]}(X) \circ V,
\]

where ∘ is the Schur/Hadamard product, taken in a basis in which X is diagonal.
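This Daleckii–Krein-type formula is easy to test numerically; note that the Schur product is taken in the eigenbasis of X. A sketch for f = log, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
X = M @ M.T + 4 * np.eye(4)  # Hermitian with eigenvalues in I = (0, inf)
V = rng.standard_normal((4, 4))
V = (V + V.T) / 2            # Hermitian direction

lam, U = np.linalg.eigh(X)

# Loewner matrix f^[1](X) of f = log at the eigenvalues of X.
L1 = np.empty((4, 4))
for i in range(4):
    for j in range(4):
        if np.isclose(lam[i], lam[j]):
            L1[i, j] = 1.0 / lam[i]  # f'(lam_i) for f = log
        else:
            L1[i, j] = (np.log(lam[i]) - np.log(lam[j])) / (lam[i] - lam[j])

# In the eigenbasis of X: Df_X[V] = f^[1](X) ∘ (U† V U).
DfV = U @ (L1 * (U.T @ V @ U)) @ U.T

# Compare with a central finite difference of logm.
t = 1e-5
fd = (logm(X + t * V) - logm(X - t * V)) / (2 * t)
assert np.allclose(DfV, fd, atol=1e-6)
```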

A.4 Power Series

One method of extending scalar functions to matrix functions is to extend their power series, rather than using the Jordan canonical form. This representation makes it easy to show that a matrix function is infinitely differentiable, and requires the notion of a Banach algebra.

Definition A.4.1 (Banach Algebra). A set (A, +, ·, ‖·‖, F) is a Banach algebra over a field F if the tuple (A, +, ‖·‖, F) forms a Banach space (a complete normed vector space). Additionally, the multiplication · : A × A → A is bilinear and associative, and the norm is sub-multiplicative with respect to it, i.e.

||a · b|| ≤ ||a|| · ||b|| ∀a, b ∈ A.

If there exists an element I such that a · I = I · a = a for all a ∈ A, then A is said to be a unital Banach algebra. If its underlying vector space is finite-dimensional, then A is said to be a finite-dimensional Banach algebra.

We have the following lemma regarding the extension and differentiability of power series, from [17].

Lemma A.4.1. Let c_n ∈ F and r > 0 be such that \(\sum_{n=0}^{\infty} |c_n| r^n\) converges. Let A be a finite-dimensional unital Banach algebra and let B_r(0) be the open ball of radius r around the zero vector 0. Then the function

\[
f : B_r(0) \to A : x \mapsto \sum_{n=0}^{\infty} c_n x^n
\]

is a smooth function with total derivative

\[
df(x) = \sum_{n=0}^{\infty} c_n\, dp_n(x),
\]

where p_n(x) = x^n and its derivative is dp_n(x)(y) = x^{n-1} y + x^{n-2} y x + ⋯ + x y x^{n-2} + y x^{n-1}.

The exponential e : C → C has the power series representation \(e^x = \sum_{k=0}^{\infty} x^k / k!\) with infinite radius of convergence. Hence, by the previous lemma, it extends to a smooth function on all of any finite-dimensional Banach algebra.

Lemma A.4.2 ([17]). The exponential function exp : C^{n×n} → C^{n×n} is smooth, and for commuting x and y (xy = yx) the total derivative satisfies

\[
d\exp(x)(y) = \exp(x)\, y = y\, \exp(x).
\]
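The commuting case of this lemma can be checked numerically by taking y to be a polynomial in x, so that xy = yx automatically. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)
x = 0.3 * rng.standard_normal((3, 3))
y = x @ x + x  # a polynomial in x, so xy = yx

# Central-difference approximation of d exp(x)(y).
t = 1e-5
fd = (expm(x + t * y) - expm(x - t * y)) / (2 * t)

assert np.allclose(fd, expm(x) @ y, atol=1e-8)
assert np.allclose(fd, y @ expm(x), atol=1e-8)
```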

The matrix logarithm also has a power series within the ball of radius one around the identity, B_1(I): for ‖X‖ < 1 it is given by

\[
\log(I + X) = \sum_{k=1}^{\infty} (-1)^{k+1} \frac{X^k}{k},
\]

and its differential, under the requirements ‖X‖ < 1 and XY = YX, is

\[
d\log(I + X)(Y) = (I + X)^{-1} Y.
\]
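A partial sum of this series can be compared against SciPy's matrix logarithm for a small perturbation of the identity. A minimal sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(5)
X = 0.1 * rng.standard_normal((3, 3))  # ||X|| < 1 so the series converges
I = np.eye(3)

# Partial sum of log(I + X) = sum_{k>=1} (-1)^{k+1} X^k / k.
S = np.zeros((3, 3))
P = I
for k in range(1, 60):
    P = P @ X
    S += (-1) ** (k + 1) * P / k

assert np.allclose(S, logm(I + X), atol=1e-10)
```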

It is worth noting that, given a power series, the Jordan canonical form definition A.1.1 agrees with the power series definition. We now have the following theorem.

Theorem A.4.3 (Exponential is a diffeomorphism [17]). Let M_{++} be the set of positive definite Hermitian matrices. The matrix exponential map exp restricted to Hermitian matrices is a diffeomorphism onto M_{++}, and the matrix logarithm is its inverse.
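The bijection between Hermitian matrices and M_{++} is easy to illustrate numerically: exp of a Hermitian matrix is positive definite, and log recovers the original matrix. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2  # Hermitian

P = expm(H)  # lies in M_{++}
assert np.all(np.linalg.eigvalsh(P) > 0)  # positive definite
assert np.allclose(logm(P), H)            # log inverts exp on M_{++}
```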
