
Quantum Computation (CMU 15-859BB, Fall 2015)
Lecture 22: Introduction to Quantum Tomography
November 23, 2015
Lecturer: John Wright                    Scribe: Tom Tseng

1 Motivation

When conducting research in quantum physics, it is essential that, once we create or are otherwise in possession of a quantum state, we can determine what the state is with high accuracy. For instance, researchers have conducted various experiments that support that the quantum teleportation procedure, which we discussed in lecture 3, works. These experiments involve sending quantum states over long distances. But how does the receiving party verify that the state that they receive is the same as the one that was sent by the other party? This is a problem with significant practical importance, and it is precisely what quantum state tomography aims to solve.

2 The problem

2.1 Definition

Roughly speaking, tomography is the problem in which we are given an unknown mixed state $\rho \in \mathbb{C}^{d\times d}$ and our goal is to "learn" what $\rho$ is. We would like to take the input $\rho$ and put it through some tomography algorithm that will output the classical $d \times d$ matrix describing $\rho$.

classical state   quantum state ρ1,1 . . . ρ1,d Tomography ρ  . .. .  = ρ algorithm  . . .  ρd,1 . . . ρd,d

This is similar to hypothesis testing, which we have been discussing during the past few lectures. The difference is that we have no prior information as to what $\rho$ is.

Actually, there is a problem with the above description: the goal is impossible to achieve. If we really had an algorithm that could completely learn everything about $\rho$, then we could replicate $\rho$, contradicting the no-cloning theorem. What can we say about the state $\rho$, then? Not much. Suppose we measured $\rho$ with respect to some orthonormal basis $\{|v_1\rangle, |v_2\rangle, \ldots, |v_m\rangle\}$ and observed $|v_i\rangle$. All we learn is that $\langle v_i|\rho|v_i\rangle$, the probability of observing $|v_i\rangle$, is non-zero, and maybe we can further guess that $\langle v_i|\rho|v_i\rangle$ is somewhat large. This tells us little useful information about $\rho$.

Although no perfect tomography algorithm exists, we can make some adjustments to the conditions of the problem to make tomography more tractable. We saw last lecture that when using the Pretty Good Measurement for the hidden subgroup problem, we could generate several copies of the coset state in order to obtain measurement probabilities that were more favorable. That gives our first adjustment: we can stipulate that we are given $n \in \mathbb{N}$ many copies of $\rho$. In addition, the requirement that we have to output exactly $\rho$ is too restrictive, so our second adjustment will be that we only have to output some $\tilde\rho \in \mathbb{C}^{d\times d}$ that approximates $\rho$ with high probability.

[Diagram: $n$ copies of the quantum state $\rho$ are fed into the tomography algorithm, which outputs a classical state $\tilde\rho$, i.e. the matrix of entries $\tilde\rho_{1,1}, \ldots, \tilde\rho_{1,d}; \ldots; \tilde\rho_{d,1}, \ldots, \tilde\rho_{d,d}$.]

This situation is more sensible. With $n$ copies of $\rho$, perhaps we can learn enough information to make a fairly precise guess as to what $\rho$ is. Experiments that test quantum teleportation are verified in this way. In 2012, a group of researchers sent a quantum state between two islands that were 143 kilometers apart. The result was analyzed by applying quantum tomography to the state, which was sent many times over [MHS+12]. The above discussion motivates the following definition of tomography.

Definition 2.1. In the problem of quantum state tomography, the goal is to estimate an unknown $d$-dimensional mixed state $\rho$ when given $\rho^{\otimes n}$, where $n, d \in \mathbb{N}$. Explicitly, we want a process with the following input and output.

Input: a tensor power $\rho^{\otimes n}$ of the unknown mixed state $\rho \in \mathbb{C}^{d\times d}$, and an error margin $\varepsilon > 0$.

Output: a state $\tilde\rho \in \mathbb{C}^{d\times d}$ such that $d_{\mathrm{tr}}(\rho, \tilde\rho) \le \varepsilon$ with high probability.
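For reference, the trace distance appearing in the output condition is $d_{\mathrm{tr}}(\rho, \tilde\rho) = \frac{1}{2}\|\rho - \tilde\rho\|_1$, i.e. half the sum of the absolute eigenvalues of $\rho - \tilde\rho$ (the standard normalization, consistent with Fact 3.1 below). The following is a minimal numerical sketch of this computation; the two example density matrices are made up for illustration.

\begin{verbatim}
import numpy as np

def trace_distance(rho, sigma):
    """d_tr(rho, sigma) = (1/2) * sum of |eigenvalues of rho - sigma|."""
    eigs = np.linalg.eigvalsh(rho - sigma)   # rho - sigma is Hermitian
    return 0.5 * np.sum(np.abs(eigs))

# Hypothetical example: a single-qubit state and an estimate of it.
rho       = np.array([[0.70, 0.10], [0.10, 0.30]], dtype=complex)
rho_tilde = np.array([[0.65, 0.05], [0.05, 0.35]], dtype=complex)
print(trace_distance(rho, rho_tilde))   # small value: rho_tilde is a good estimate
\end{verbatim}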

2.2 Optimizing the input

If $\rho$ were given to us in the form $\rho^{\otimes n}$ where $n$ is unimaginably large, we could estimate $\rho$ with great accuracy. However, in reality, preparing so many copies of $\rho$ is expensive. The main question, then, is: given the constraint on the output, how small can $n$ be? Here are a couple of results that shed some light on this question.

Fact 2.2. To solve the tomography problem, it is necessary to have $n \in \Omega(d^2/\varepsilon^2)$ and sufficient to have $n \in \Theta(d^2/\varepsilon^2)$.

Fact 2.3. If $\rho$ is pure, then it is sufficient to have $n \in \Theta(d/\varepsilon^2)$.

If $\rho$ is pure, then it is likely also necessary to have $n \in \Omega(d/\varepsilon^2)$. However, the best known lower bound is $n \in \Omega(d/(\varepsilon^2 \log(d/\varepsilon)))$ [HHJ+15]. It makes sense that $n$ need not be as great if $\rho$ is pure, since $\rho$ is then a rank-1 matrix that can be fully described as the outer product of a $d$-dimensional vector with itself.

At first glance, this looks promising: quadratic and linear complexity with respect to $d$ seems efficient. However, an $m$-qubit state will have dimension $d = 2^m$. Therefore, in the case of Fact 2.2, we would need $n \in \Theta(4^m/\varepsilon^2)$. This fast exponential growth in the size of the input is the biggest obstacle to quantum tomography at present.

In the remainder of this lecture, we will focus on pure state tomography and prove Fact 2.3. The general bound from Fact 2.2 is considerably more difficult to prove, so we will not discuss it. For more detail, consult the recent paper written by the instructors of this course [OW15].

3 Pure state tomography

3.1 Relation to hypothesis testing

In pure state tomography, we are given $|\psi\rangle \in \mathbb{C}^d$ in the form $|\psi\rangle^{\otimes n}$, and we want to output $|\tilde\psi\rangle \in \mathbb{C}^d$ such that
$$d_{\mathrm{tr}}\big(|\psi\rangle\langle\psi|,\, |\tilde\psi\rangle\langle\tilde\psi|\big) \le \varepsilon.$$
When looking at pure states and their trace distance, we have the following fact.

Fact 3.1. For $|\psi\rangle, |\tilde\psi\rangle \in \mathbb{C}^d$, the trace distance of $|\psi\rangle$ and $|\tilde\psi\rangle$ satisfies the following equality:
$$d_{\mathrm{tr}}\big(|\psi\rangle\langle\psi|,\, |\tilde\psi\rangle\langle\tilde\psi|\big) = \sqrt{1 - |\langle\psi|\tilde\psi\rangle|^{2}}.$$

Thus the goal is to make our guess $|\tilde\psi\rangle$ have large inner product with $|\psi\rangle$. How do we achieve that?

This problem resembles worst-case hypothesis testing. In worst-case hypothesis testing, we are given some state $|\psi\rangle^{\otimes n}$ from some set $\{|v\rangle^{\otimes n}\}_{|v\rangle}$, and we try to guess $|\psi\rangle$. The difference in tomography, of course, is that the set of possibilities is infinitely large and that we have some margin of error. This problem is also similar to average-case hypothesis testing, which, as we saw last lecture, was related to worst-case hypothesis testing through the minimax theorem. In tomography, we are not given a distribution, so it is similar to the situation where we try to guess $|\psi\rangle$ given $|\psi\rangle^{\otimes n}$ for a uniformly random $|\psi\rangle \sim \{|v\rangle^{\otimes n}\}_{\text{unit } |v\rangle}$. We saw that if we have an optimal algorithm for average-case hypothesis testing, then it is closely related to the optimal algorithm for worst-case hypothesis testing. Thus, to devise the algorithm for tomography, we can get away with just solving the average-case situation on the hardest distribution: the uniform distribution.
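As a quick numerical sanity check of Fact 3.1, one can compare the trace distance of two pure states (computed from eigenvalues) with the formula $\sqrt{1 - |\langle\psi|\tilde\psi\rangle|^2}$. This is only a sketch with randomly generated states:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vector(d):
    """Haar-random unit vector in C^d: complex Gaussian entries, then normalize."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def trace_distance(rho, sigma):
    eigs = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * np.sum(np.abs(eigs))

d = 4
psi, psi_t = random_unit_vector(d), random_unit_vector(d)
lhs = trace_distance(np.outer(psi, psi.conj()), np.outer(psi_t, psi_t.conj()))
rhs = np.sqrt(1 - abs(np.vdot(psi, psi_t))**2)
print(lhs, rhs)   # the two values agree up to floating-point error
\end{verbatim}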

3.2 Infinitesimals

The strategy for tomography is that we will merely pretend we are given a probability distribution. With input $|\psi\rangle^{\otimes n}$, we will treat $|\psi\rangle$ as a uniformly random unit vector in $\mathbb{C}^d$. What does it mean to have a uniformly random unit vector? Since the state is uniformly random, the probability of measuring any particular $|\psi\rangle \in \mathbb{C}^d$ is infinitesimally small. We will write this as $\Pr[|\psi\rangle] = d\psi$.

We need to choose a POVM suitable for tackling this probability distribution. Last lecture, we saw that a good way to deal with average-case hypothesis testing was to run the Pretty Good Measurement, in which, given an ensemble $\{(\sigma_i, p_i)\}_i$ of density matrices and probabilities, we would choose a POVM whose elements resembled the states we knew were in the ensemble. Namely, we took each POVM element $E_i$ to be $p_i\sigma_i$ adjusted by a couple of $S^{-1/2}$ factors (where $S = \sum_i p_i\sigma_i$). We will reappropriate that technique for our situation. The version of the Pretty Good Measurement for tomography is a POVM with elements resembling

$$E_{|\psi\rangle} \approx d\psi \cdot |\psi\rangle^{\otimes n}\langle\psi|^{\otimes n}.$$

This gives an infinite number of POVM elements, which means that the typical constraint that $\sum_i E_i = I$ needs to be rethought. We instead consider a new constraint that
$$\int_{|\psi\rangle} E_{|\psi\rangle} = I,$$
where $I$ is the identity operator. This talk of infinitesimals and integrals might seem non-rigorous, but it is really just informal shorthand that will be notationally convenient for us. A more rigorous analysis could be achieved by instead taking finite-sized POVMs that approximate this infinite scheme and then considering the limit as you make the finite POVMs more and more accurate (and consequently larger and larger in the number of elements).

(A natural question to ask is, "how would we actually implement measurement with a POVM of infinitely many elements?" We would have to take a finite-sized POVM that approximates the infinite POVM. This is not a big sacrifice; we do a similar discretization of randomness in classical algorithms whenever we implement a randomized algorithm that calls for a real number from the uniform distribution over [0, 1].)
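As an aside, a uniformly random unit vector in $\mathbb{C}^d$ is easy to sample classically (for instance, when simulating or discretizing such a scheme numerically): draw independent complex Gaussian entries and normalize; rotation invariance of the Gaussian makes the result uniform on the unit sphere. A minimal numpy sketch:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vector(d):
    """Haar-random unit vector in C^d: i.i.d. complex Gaussian entries, then normalize."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

print(np.linalg.norm(random_unit_vector(4)))   # 1.0 up to floating-point error
\end{verbatim}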

3.3 The algorithm

Now we are ready to present the algorithm for pure state tomography that will prove Fact 2.3.

Pure State Tomography Algorithm

Input: We are given $|\psi\rangle^{\otimes n}$, where $|\psi\rangle \in \mathbb{C}^d$. We know $n$ and $d$, which are natural numbers.

Strategy: Measure with the POVM consisting of $\{E_{|v\rangle}\}_{\text{unit vectors } |v\rangle \in \mathbb{C}^d}$, where
$$E_{|v\rangle} = \binom{n+d-1}{d-1}\, |v\rangle^{\otimes n}\langle v|^{\otimes n}\, dv.$$

Our output: The measurement will give some vector $|\tilde\psi\rangle \in \mathbb{C}^d$, which is what we will output.

Let's do a quick sanity check by making sure our output is sensible. We have
$$\Pr[|\tilde\psi\rangle] = \mathrm{tr}\big(E_{|\tilde\psi\rangle}\, |\psi\rangle^{\otimes n}\langle\psi|^{\otimes n}\big) = \binom{n+d-1}{d-1}\, \mathrm{tr}\big(|\tilde\psi\rangle^{\otimes n}\langle\tilde\psi|^{\otimes n}\, |\psi\rangle^{\otimes n}\langle\psi|^{\otimes n}\big)\, d\tilde\psi = \binom{n+d-1}{d-1}\, |\langle\psi|\tilde\psi\rangle|^{2n}\, d\tilde\psi \qquad (1)$$
where the last equality follows from the cyclic property of the trace operator. Notice that $|\langle\psi|\tilde\psi\rangle|$ is 1 if $|\tilde\psi\rangle$ is equal to $|\psi\rangle$ (up to phase) and less than 1 otherwise. This signals that the probability of outputting a particular vector is greater if that output vector has large inner product with the input state, which is what we want.

Before we analyze the accuracy of the algorithm, we need to first verify that $\int_{|\psi\rangle} E_{|\psi\rangle} = I$.

3.4 The integral constraint

The presence of that binomial coefficient in the POVM elements might seem strange, but it is merely a scaling factor that makes the constraint $\int_{|\psi\rangle} E_{|\psi\rangle} = I$ hold. Why does this particular coefficient appear, though? Notice that the input is guaranteed to be of a certain form, namely, a tensor power with specific dimensions. Naturally, we want to understand the space in which these inputs live.

Definition 3.2. The vector space $\mathrm{Sym}(n, d)$ is $\mathrm{span}\{|v\rangle^{\otimes n} : |v\rangle \in \mathbb{C}^d\}$.

The notation stands for "symmetric" since vectors of the form $|v\rangle^{\otimes n}$ are, in a way, symmetric as a tensor product. As an abbreviation, we will simply refer to this space as Sym, with the $n$ and $d$ parameters being implied. The binomial coefficient comes from the dimension of this space.

Lemma 3.3. The dimension of $\mathrm{Sym}(n, d)$ is $\binom{n+d-1}{d-1}$.

We will not prove this lemma. (It might show up as a homework problem.)
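Although we skip the proof, Lemma 3.3 is easy to check numerically for small $n$ and $d$: the rank of a matrix whose columns are many random vectors of the form $|v\rangle^{\otimes n}$ should equal $\binom{n+d-1}{d-1}$. A small sketch (numpy; the sample size 50 is an arbitrary choice that comfortably exceeds the dimension):

\begin{verbatim}
import numpy as np
from math import comb
from functools import reduce

rng = np.random.default_rng(1)

def random_unit_vector(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def tensor_power(v, n):
    return reduce(np.kron, [v] * n)

n, d = 3, 2
# Columns span Sym(n, d) with probability 1 once there are enough of them.
cols = np.column_stack([tensor_power(random_unit_vector(d), n) for _ in range(50)])
print(np.linalg.matrix_rank(cols), comb(n + d - 1, d - 1))   # both are 4 here
\end{verbatim}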

We promised that this would make our POVM satisfy the integral constraint, so of course we must now check that
$$\int_{|v\rangle} E_{|v\rangle} = \binom{n+d-1}{d-1} \int_{|v\rangle} |v\rangle^{\otimes n}\langle v|^{\otimes n}\, dv \stackrel{?}{=} I$$
holds. (Here, the integral is only over unit vectors, and this will be true of all the integrals we consider in the remainder of this lecture.) In order to show this, we will use the following proposition.

Proposition 3.4. Let
$$M = \int_{|v\rangle} |v\rangle^{\otimes n}\langle v|^{\otimes n}\, dv = \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[|v\rangle^{\otimes n}\langle v|^{\otimes n}\big]$$
and let
$$\Pi = \binom{n+d-1}{d-1} M.$$
Then $\Pi$ projects onto Sym.

Proof. Let $P_{\mathrm{Sym}}$ be the projector onto Sym. We will start by showing that $M = C P_{\mathrm{Sym}}$ for some constant $C$. To show this, we only need to show that $M$ zeroes out any vector perpendicular to Sym and that $M$ scales vectors in Sym by a constant factor. This is sufficient since it is easy to see that $M$ is linear.

The first part is easy. Because $|v\rangle^{\otimes n} \in \mathrm{Sym}$, we have that for any $|u\rangle \in \mathrm{Sym}^{\perp}$,

$$M|u\rangle = \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\Big[\,|v\rangle^{\otimes n}\underbrace{\langle v|^{\otimes n}|u\rangle}_{0}\,\Big] = 0,$$
as desired.

For the second part, take any $|u\rangle \in \mathrm{Sym}$. Since Sym is spanned by vectors of the form $|w\rangle^{\otimes n}$ and $M$ is linear, it suffices to consider $|u\rangle = |w\rangle^{\otimes n}$ for some $|w\rangle$ in $\mathbb{C}^d$. We will illustrate the case where $n = 1$. (The proof for the general case where $n > 1$ is about the same, except more annoying. For that reason, we will skip it.) Since $M$ is linear, we can assume that $|w\rangle$ is a unit vector without loss of generality. We have

$$M|w\rangle = \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[\,|v\rangle\langle v|w\rangle\,\big].$$

For arbitrary $|v\rangle$, decompose $|v\rangle$ into
$$|v\rangle = \alpha|w\rangle + |w^{\perp}\rangle,$$
where $|w^{\perp}\rangle$ is some vector that is perpendicular to $|w\rangle$. Because $|v\rangle$ is random, the coefficient $\alpha$ is random as well. In addition, because $|v\rangle$ is uniformly random, $|v\rangle$ appears with the same probability as $\alpha|w\rangle - |w^{\perp}\rangle$. We use this to break up the expectation before combining it again with linearity of expectation, getting
$$\begin{aligned}
M|w\rangle &= \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[\,|v\rangle\langle v|w\rangle\,\big] \\
&= \tfrac{1}{2}\mathop{\mathbb{E}}_{\alpha}\big[(\alpha|w\rangle + |w^{\perp}\rangle)(\alpha^{\dagger}\langle w| + \langle w^{\perp}|)\,|w\rangle\big] + \tfrac{1}{2}\mathop{\mathbb{E}}_{\alpha}\big[(\alpha|w\rangle - |w^{\perp}\rangle)(\alpha^{\dagger}\langle w| - \langle w^{\perp}|)\,|w\rangle\big] \\
&= \tfrac{1}{2}\mathop{\mathbb{E}}_{\alpha}\big[(\alpha|w\rangle + |w^{\perp}\rangle)\,\alpha^{\dagger}\langle w|w\rangle\big] + \tfrac{1}{2}\mathop{\mathbb{E}}_{\alpha}\big[(\alpha|w\rangle - |w^{\perp}\rangle)\,\alpha^{\dagger}\langle w|w\rangle\big] &&(\text{since } \langle w^{\perp}|w\rangle = 0) \\
&= \tfrac{1}{2}\mathop{\mathbb{E}}_{\alpha}\big[(\alpha|w\rangle + |w^{\perp}\rangle)\,\alpha^{\dagger} + (\alpha|w\rangle - |w^{\perp}\rangle)\,\alpha^{\dagger}\big] &&(\text{since } \langle w|w\rangle = 1) \\
&= \mathop{\mathbb{E}}_{\alpha}\big[\alpha\alpha^{\dagger}\big]\,|w\rangle = \mathop{\mathbb{E}}_{\alpha}\big[|\alpha|^{2}\big]\,|w\rangle.
\end{aligned}$$
It is not hard to see that $\mathop{\mathbb{E}}_{\alpha}\big[|\alpha|^{2}\big]$ is independent of $|w\rangle$, i.e. is a constant, thanks to the symmetry induced by the uniform randomness of $|v\rangle$. Thus $M|w\rangle = C|w\rangle$ where we let $C$ be $\mathop{\mathbb{E}}_{\alpha}\big[|\alpha|^{2}\big]$.

All this implies that $M = C P_{\mathrm{Sym}}$ for some constant $C$. We can find the actual value of the constant by taking the trace of both sides of the equality and performing some manipulations:
$$\binom{n+d-1}{d-1} C = C \dim \mathrm{Sym} = C\, \mathrm{tr}(P_{\mathrm{Sym}}) = \mathrm{tr}(C P_{\mathrm{Sym}}) = \mathrm{tr}(M) = \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[\mathrm{tr}\big(|v\rangle^{\otimes n}\langle v|^{\otimes n}\big)\big] = \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[\langle v|v\rangle^{n}\big] = 1,$$
where the first equality uses Lemma 3.3, the penultimate equality uses the cyclic property of the trace operator, and the last equality uses the observation that $|v\rangle$ is a unit vector. Dividing through by $\binom{n+d-1}{d-1}$ gives
$$C = \frac{1}{\binom{n+d-1}{d-1}},$$
and therefore
$$\Pi = \frac{1}{C}\, M = P_{\mathrm{Sym}}.$$
This proves that $\Pi$ indeed projects onto Sym.

This proposition tells us that $\int_{|v\rangle} E_{|v\rangle} = \binom{n+d-1}{d-1} \int_{|v\rangle} |v\rangle^{\otimes n}\langle v|^{\otimes n}\, dv = \Pi$ projects onto Sym, and consequently it is the identity within Sym. Since Sym is the only subspace our POVM operates upon, this shows $\int_{|v\rangle} E_{|v\rangle} = I$ as desired.
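Proposition 3.4 can also be checked numerically with a Monte Carlo estimate of $M$: averaging $|v\rangle^{\otimes n}\langle v|^{\otimes n}$ over many random unit vectors and scaling by $\binom{n+d-1}{d-1}$ should give (approximately) a matrix that fixes vectors in Sym and kills vectors orthogonal to it. A sketch for $n = d = 2$, where $(|01\rangle - |10\rangle)/\sqrt{2}$ is orthogonal to Sym; the sample size is an arbitrary choice:

\begin{verbatim}
import numpy as np
from math import comb
from functools import reduce

rng = np.random.default_rng(2)

def random_unit_vector(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def tensor_power(v, n):
    return reduce(np.kron, [v] * n)

n, d, samples = 2, 2, 200_000
M = np.zeros((d**n, d**n), dtype=complex)
for _ in range(samples):                      # Monte Carlo estimate of E_v[|v><v|^{tensor n}]
    w = tensor_power(random_unit_vector(d), n)
    M += np.outer(w, w.conj())
M /= samples
Pi = comb(n + d - 1, d - 1) * M               # should approximate the projector onto Sym

psi = tensor_power(random_unit_vector(d), n)    # lies in Sym
u = np.array([0, 1, -1, 0]) / np.sqrt(2)        # (|01> - |10>)/sqrt(2), orthogonal to Sym
print(np.linalg.norm(Pi @ psi - psi))   # close to 0, up to Monte Carlo error
print(np.linalg.norm(Pi @ u))           # close to 0, up to Monte Carlo error
\end{verbatim}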

3.5 Algorithm analysis

We have proven that the POVM in our algorithm is valid, and now we will show that our algorithm is accurate. The previous proposition gives the following corollary.

Corollary 3.5. For any unit $|\psi\rangle \in \mathbb{C}^d$, the equality
$$\mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[|\langle v|\psi\rangle|^{2n}\big] = \frac{1}{\binom{n+d-1}{d-1}}$$
holds.

Proof. We know that $\Pi$ projects onto Sym and that $|\psi\rangle^{\otimes n} \in \mathrm{Sym}$. Thus $\Pi|\psi\rangle^{\otimes n} = |\psi\rangle^{\otimes n}$, which gives
$$\mathrm{tr}\big(\Pi\,|\psi\rangle^{\otimes n}\langle\psi|^{\otimes n}\big) = \mathrm{tr}\big(|\psi\rangle^{\otimes n}\langle\psi|^{\otimes n}\big) = \mathrm{tr}\big(\langle\psi|\psi\rangle^{n}\big) = 1.$$
We also have
$$\mathrm{tr}\big(\Pi\,|\psi\rangle^{\otimes n}\langle\psi|^{\otimes n}\big) = \mathrm{tr}\big(\langle\psi|^{\otimes n}\,\Pi\,|\psi\rangle^{\otimes n}\big) = \langle\psi|^{\otimes n}\,\Pi\,|\psi\rangle^{\otimes n} = \binom{n+d-1}{d-1} \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[\langle\psi|^{\otimes n}|v\rangle^{\otimes n}\langle v|^{\otimes n}|\psi\rangle^{\otimes n}\big] = \binom{n+d-1}{d-1} \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[|\langle v|\psi\rangle|^{2n}\big].$$
Putting these two equalities together, we get
$$\binom{n+d-1}{d-1} \mathop{\mathbb{E}}_{\text{unit }|v\rangle}\big[|\langle v|\psi\rangle|^{2n}\big] = 1.$$
Dividing both sides by $\binom{n+d-1}{d-1}$ gives the result.

The next theorem gives an upper bound on the error of our algorithm.

Theorem 3.6. Given an input $|\psi\rangle^{\otimes n}$, the pure state tomography algorithm outputs $|\tilde\psi\rangle$ with the property that
$$\mathop{\mathbb{E}}_{|\tilde\psi\rangle}\Big[d_{\mathrm{tr}}\big(|\psi\rangle, |\tilde\psi\rangle\big)\Big] \le \sqrt{\frac{d-1}{n+d}}.$$

Proof. We have
$$\mathop{\mathbb{E}}_{|\tilde\psi\rangle}\Big[d_{\mathrm{tr}}\big(|\psi\rangle, |\tilde\psi\rangle\big)\Big] = \mathop{\mathbb{E}}_{|\tilde\psi\rangle}\left[\sqrt{1 - |\langle\psi|\tilde\psi\rangle|^{2}}\right] \le \sqrt{1 - \mathop{\mathbb{E}}_{|\tilde\psi\rangle}\big[|\langle\psi|\tilde\psi\rangle|^{2}\big]}.$$
The equality uses Fact 3.1 from earlier in the lecture. The inequality uses Jensen's inequality on the concavity of the square root, together with linearity of expectation. Looking only at the inner term, we have
$$\mathop{\mathbb{E}}_{|\tilde\psi\rangle}\big[|\langle\psi|\tilde\psi\rangle|^{2}\big] = \int_{|\tilde\psi\rangle} |\langle\psi|\tilde\psi\rangle|^{2}\,\Pr\big[|\tilde\psi\rangle\big]
= \binom{n+d-1}{d-1}\int_{|\tilde\psi\rangle} |\langle\psi|\tilde\psi\rangle|^{2}\,|\langle\psi|\tilde\psi\rangle|^{2n}\, d\tilde\psi
= \binom{n+d-1}{d-1}\int_{|\tilde\psi\rangle} |\langle\psi|\tilde\psi\rangle|^{2(n+1)}\, d\tilde\psi
= \frac{\binom{n+d-1}{d-1}}{\binom{n+d}{d-1}} = 1 - \frac{d-1}{n+d},$$
where the second equality uses equation (1) from earlier in the lecture and where the penultimate equality uses Corollary 3.5 (applied with $n+1$ in place of $n$). This finally gives
$$\mathop{\mathbb{E}}_{|\tilde\psi\rangle}\Big[d_{\mathrm{tr}}\big(|\psi\rangle, |\tilde\psi\rangle\big)\Big] \le \sqrt{1 - \mathop{\mathbb{E}}_{|\tilde\psi\rangle}\big[|\langle\psi|\tilde\psi\rangle|^{2}\big]} = \sqrt{1 - \Big(1 - \frac{d-1}{n+d}\Big)} = \sqrt{\frac{d-1}{n+d}},$$
which is the desired result.

In particular, if we choose $n \in \Theta(d/\varepsilon^2)$ as in Fact 2.3, we get
$$\mathop{\mathbb{E}}_{|\tilde\psi\rangle}\Big[d_{\mathrm{tr}}\big(|\psi\rangle, |\tilde\psi\rangle\big)\Big] \le \sqrt{\frac{d-1}{n+d}} \le \sqrt{d/n} \in \Theta(\varepsilon).$$

This proves that the bound given in Fact 2.3, $n \in \Theta(d/\varepsilon^2)$, is indeed sufficient.
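The key identity in the proof, $\mathbb{E}\big[|\langle\psi|\tilde\psi\rangle|^2\big] = 1 - \frac{d-1}{n+d}$, can also be checked numerically. Since the algorithm's output density with respect to the uniform measure is proportional to $|\langle\psi|\tilde\psi\rangle|^{2n}$ (equation (1)), a self-normalized importance-sampling estimate with a uniform proposal works. This is only a sketch; the values of $d$, $n$, and the sample size are arbitrary choices:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)

def random_unit_vector(d):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

d, n, samples = 2, 5, 200_000
psi = random_unit_vector(d)

# Draw uniform proposals; reweight by |<psi|v>|^{2n}, the (unnormalized) output density.
overlaps_sq = np.array([abs(np.vdot(psi, random_unit_vector(d)))**2 for _ in range(samples)])
weights = overlaps_sq**n
estimate = np.sum(weights * overlaps_sq) / np.sum(weights)

print(estimate)                 # Monte Carlo estimate of E[|<psi|psi_tilde>|^2]
print(1 - (d - 1) / (n + d))    # exact value from the proof of Theorem 3.6
\end{verbatim}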

References

[HHJ+15] Jeongwan Haah, Aram W. Harrow, Zhengfeng Ji, Xiaodi Wu, and Nengkun Yu. Sample-optimal tomography of quantum states. 2015.

[MHS+12] Xiao-Song Ma, Thomas Herbst, Thomas Scheidl, Daqing Wang, Sebastian Kropatschek, William Naylor, Alexandra Mech, Bernhard Wittmann, Johannes Kofler, Elena Anisimova, Vadim Makarov, Thomas Jennewein, Rupert Ursin, and Anton Zeilinger. Quantum teleportation using active feed-forward between two Canary Islands. Nature, 489(7415):269–273, 2012.

[OW15] Ryan O’Donnell and John Wright. Efficient quantum tomography. 2015.
