
The Concept of Entropy: Quantum Communication

Advanced Quantum Mechanics: Final Report

Medha Goyal

PHYS 243 at the University of Chicago

1 Introduction: What is Entropy?

Information theory is the study of how to quantify, store, and communicate information [1]. It is concerned with transmitting data compactly, efficiently, and without error. In the modern world, where we all rely on the internet for information, where increasing amounts of data have to be stored without issue, and where we want to know we can transmit information reliably, the basic ideas of information theory should resonate with us deeply.

The founding of information theory as a discipline is largely attributed to Claude Shannon's seminal paper from 1948, "A Mathematical Theory of Communication" [2], in which he proposed several ideas we take for granted today. The ideas he put forth have made a significant impact in fields as diverse as statistics, computer science, electrical engineering, cryptography, and even linguistics! The conception of the internet is often traced back to this paper. Among the ideas he put forth are a conception of error-free communication, of maximizing the "information content" per message sent, and even of naming a unit of information a "bit". While a lot of these ideas may be familiar to us, there is one concept he introduced that has seeped into the language of the lay person, but which is thrown around carelessly and often misused, so much so that it has been taken completely out of context to describe the depiction of chaos in art by academics in the field of art history! That concept is 'entropy'.

Physicists should find this word familiar: in thermodynamics it relates to the progression of time through irreversible processes [3], and in statistical mechanics it describes the statistical uncertainty in the state of a physical system based on the number of states it can be in and their relative probabilities. While neither of these two definitions is exactly how entropy is defined in the context of information theory, the second comes very close. This paper starts with the basic ideas of discrete and quantum data transmission, touches on very simple ideas of quantum error correction and redundancy, and builds up to an exploration of entropy in both classical and quantum contexts.

2 Long-Distance Communication: When Shouting Just Won't Work

2.1 The Process of Data Transmission

The process of data transmission starts with a sender, who picks a message mi from a set of messages M and transmits a signal corresponding to this message through a communication channel [4]. The original message is considered discrete, but since physical signals are continuous in the real world, the message to be transmitted must first be converted into an analogue representation. The process of turning the message into a vector of real numbers is called encoding, and the process of then choosing the corresponding analogue waveform is called modulation. The message reaches a receiver, who then decides on what the message must have been by observing the output of the channel and finding the message amidst (a) deterministic distortions and (b) random noise. The process of deciding upon the received message, by minimizing the probability of an error, is called detection.

Data Encoding and Modulation: Each message mi is turned into a symbol by a vector encoder, represented by a real vector xi. Each possible message maps to a vector with a different value. A modulator then converts each xi into a continuous waveform xi(t). The channel then distorts the continuous waveform into yi(t).

Data Detection: A demodulator converts yi(t) into a vector yi (analogous to xi). The vector is then decoded to get the message m̂i. Hence the probability of error is defined as Pe ≡ P(m̂i ≠ mi).

Code: We use the vector xk to denote a vector sent at time t = k. A code C is a set of one or more indexed sequences, or codewords xk, which are formed by concatenating symbols from the output of the encoder. Each codeword in the code has a one-to-one mapping with encoder-input messages.
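To make the pipeline concrete, here is a minimal Python sketch under simplifying assumptions of my own (two messages, one-dimensional antipodal codewords, additive Gaussian noise standing in for both modulation and the channel, and minimum-distance detection). None of these specific choices come from [4]; they are just the simplest instance of encode, transmit, detect, and estimate Pe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder: message index -> real codeword vector (antipodal signalling).
codebook = {0: np.array([+1.0]), 1: np.array([-1.0])}

def channel(x, noise_std=0.5):
    """Toy channel: modulation is skipped; distortion is just additive Gaussian noise."""
    return x + rng.normal(0.0, noise_std, size=x.shape)

def detect(y):
    """Minimum-distance detection: pick the codeword closest to the received vector."""
    return min(codebook, key=lambda m: np.linalg.norm(y - codebook[m]))

# Estimate the probability of error Pe = P(m_hat != m) by simulation.
trials = 100_000
messages = rng.integers(0, 2, size=trials)
errors = sum(detect(channel(codebook[m])) != m for m in messages)
print(f"estimated Pe ~ {errors / trials:.4f}")
```

Increasing noise_std degrades the channel and drives the estimated Pe up; the detection rule here is exactly the "minimize the probability of an error" step described above.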

2.2 Interlude: When Things Go Wrong in Quantum Data Transmission

Errors due to noisy transmission are not limited to classical information transmission.¹ A budding subfield of quantum information theory is quantum error correction, which involves building circuits to correct errors that occur during transmission. One of the most insurmountable engineering challenges of building a quantum computer is the fact that quantum information will interact with its environment, leading to decoherence (a loss of information). This is one source by which error is introduced to the quantum message.

A quantum error correcting code (QECC) aims to recover the original message by mapping k qubits into n qubits (a map from a Hilbert space of dimension 2^k to one of dimension 2^n, where n > k) [5]. The k qubits represent the message we want to encode, so we add redundancy in the form of the remaining n − k qubits to minimize the chance that errors are made on the k important qubits. In addition to classical errors, called bit-flip errors, where |0⟩ ↔ |1⟩, there are also phase errors in the quantum case, such as |0⟩ → |0⟩, |1⟩ → −|1⟩. However, quantum errors are continuous, so really a bit flip or phase shift could be by any intermediate angle between 0 and 360 degrees. This makes the task of quantum error correction non-trivial, and there are many codes out there to correct quantum errors. It would be beyond the scope of this paper to go through all possible QECCs, but to illustrate this particular application of quantum entropy, we will calculate the entropy of one of the simplest QECCs: the 3-qubit bit-flip code [6]. This code only considers qubit flips, and not phase changes, so it is not a full quantum code, but it is sufficient for our purposes.

We start with the two basis states |000⟩ and |111⟩. We can map any arbitrary single-qubit state α|0⟩ + β|1⟩ to α|000⟩ + β|111⟩ using the quantum circuit shown in Fig. 1.

¹ The process just described relates to sending classical information. Quantum data transmission has been done as well, but the process is less standardized (it differs more significantly between experiments), so we will skip a discussion of equivalent quantum methods of data transmission.

Fig. 1. 3-Qubit Bit-Flip Circuit: Mapping α|0⟩ + β|1⟩ to α|000⟩ + β|111⟩ [6]

If |Ψ⟩ = α|0⟩, then the input state is α|000⟩, and the application of the CNOT gates means that nothing happens to the second and third qubits (they stay 0). If the input is |Ψ⟩ = β|1⟩, then the input state is β|100⟩, but the CNOT gates flip both of the second two qubits and the output state is β|111⟩.

To correct errors using this code, we add two ancilla qubits that extract information about possible errors. The circuit shown in Fig. 2 includes both the encoding component of Fig. 1 and a correction component. For the sake of simplicity, we consider no errors to occur during encoding (those CNOT gates are sound), and only between the encoding and correction steps.

Fig. 2. 3-Qubit Bit-Flip Circuit with Ancilla Qubits for Measurement and Error Correction [6]

The first ancilla is connected by a CNOT gate to the input |Ψ⟩. If |Ψ⟩ = α|0⟩, then the ancilla stays |0⟩. If |Ψ⟩ = β|1⟩, then the ancilla flips from |0⟩ to |1⟩. Next, a CNOT gate connects the same ancilla to the second qubit of the original code. If the original |Ψ⟩ = α|0⟩ and there was no error, then the second qubit would still be α|0⟩, and the first ancilla would stay |0⟩. If there was an error, the second qubit would be β|1⟩ and the ancilla would flip to |1⟩. It is then measured at the end of the circuit. A similar process occurs if the original |Ψ⟩ = β|1⟩, and with the second ancilla. The results can be summarized in the following table.

Error Location | Final State |data⟩|ancilla⟩
No Error       | α|000⟩|00⟩ + β|111⟩|00⟩
Qubit 1 Flip   | α|100⟩|11⟩ + β|011⟩|11⟩
Qubit 2 Flip   | α|010⟩|10⟩ + β|101⟩|10⟩
Qubit 3 Flip   | α|001⟩|01⟩ + β|110⟩|01⟩

Note that the ancilla combination is different for each possible scenario. Knowing what the ancilla values are, we can now apply a "correction" to the qubit with a bit-flip error by applying an X gate to that qubit. So, if we measure the ancilla values to be |11⟩, we then know to apply an X gate to qubit 1. Unfortunately, this QECC only works for a maximum of one qubit error. If we get a bit-flip error in qubits 1 and 2, then the ancilla measurement becomes |01⟩, and the assumption is that qubit 3 is erroneous. The X gate will be applied to qubit 3, and in fact all three qubits will have been flipped, meaning that our final result is the exact opposite of what we wanted it to be.
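To make the table concrete, the following Python sketch simulates the bookkeeping of the 3-qubit bit-flip code (not the actual circuit of Fig. 2): it encodes α|0⟩ + β|1⟩, injects a single bit-flip error, computes the two parities that the ancilla measurements would reveal (using the same syndrome convention as the table above), and applies the correcting X gate. The function names are mine, and the amplitudes 0.6 and 0.8 are an arbitrary normalized choice.

```python
def encode(alpha, beta):
    """Map alpha|0> + beta|1> to alpha|000> + beta|111> (the Fig. 1 encoding)."""
    return {"000": alpha, "111": beta}

def bit_flip(state, qubit):
    """Apply an X (bit-flip) error to one qubit (0-indexed) in every branch of the superposition."""
    flipped = {}
    for bits, amp in state.items():
        b = list(bits)
        b[qubit] = "1" if b[qubit] == "0" else "0"
        flipped["".join(b)] = amp
    return flipped

def syndrome(state):
    """The two parities the ancillas record, (q1 xor q2, q1 xor q3), as in the table above.
    Both branches give the same parities, so measuring the ancillas does not collapse the data."""
    q = [int(c) for c in next(iter(state))]
    return (q[0] ^ q[1], q[0] ^ q[2])

def correct(state):
    """Look up which qubit (if any) to flip back, then apply the correcting X gate."""
    lookup = {(0, 0): None, (1, 1): 0, (1, 0): 1, (0, 1): 2}
    qubit = lookup[syndrome(state)]
    return state if qubit is None else bit_flip(state, qubit)

alpha, beta = 0.6, 0.8                          # an arbitrary normalized choice of amplitudes
noisy = bit_flip(encode(alpha, beta), qubit=0)  # a single bit-flip error on the first qubit
print(syndrome(noisy))                          # (1, 1): the |11> ancilla pattern from the table
print(correct(noisy))                           # {'000': 0.6, '111': 0.8}: encoded state recovered
```

Flipping two qubits instead of one reproduces the failure mode described above: the syndrome points at the wrong qubit and the "correction" flips all three.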

2.3 The Uncertainty Inherent to Data Transmission

Based on our description in sections 2.1 and 2.2 of the way data transmission works, we see that since a communication channel may distort and add noise to messages, there is a lot of uncertainty associated with the process. Our only conception of what the original message may have been relies on the fidelity of our decoding process, which gets us a message m̂ that may not equal the original message m. If we were electrical engineers, we could discuss different demodulation processes and the corresponding probability of error Pe. However, as physicists we are more interested in finding a physical limit to the associated error. We want to understand the uncertainty inherent to each original message.

We encountered the term entropy in statistical mechanics as a measure of how much uncertainty there is in a physical system. In the field of information theory, entropy in a classical sense measures the amount of uncertainty inherent to a random variable X before we learn the value of X. We can also rethink this definition, and instead consider classical entropy to be the average amount of information we gain upon learning the value of X [7].

Let us say that the random variable X represents the throw of a die hidden within a black box that we cannot see, only feel. We roll the die once (take a measurement) and get a value of six (measure the outcome xi). Then the best we can suppose is that 6 (X = xi) is the only outcome of X. When we roll the die again (measure again), if we get a different value 4 (X = xj), we must re-evaluate our assumption and decide that 6 (X = xi) and 4 (X = xj) are equally probable. Upon measuring an infinite number of times, we get a distribution that is closer to the actual probabilities of each outcome of the random variable X (X = 1, 2, 3, 4, 5, 6, each with P(X) = 1/6). So with each new measurement we get more information on the possible outcomes of X. You can think of it as an average. If we choose a measurement at random, or if we continue to take measurements n times, then as n approaches infinity, our best guess for the information content associated with that measurement would be the average information content we gain upon learning the value of X, which is just the definition of entropy.
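As a sketch of this black-box picture (illustrative code of my own, not part of the formal development), we can simulate repeated rolls of a hidden fair die, estimate the outcome probabilities from the observed frequencies, and watch the average information per roll settle toward log2 6 ≈ 2.58 bits, the entropy defined formally in the next section.

```python
import numpy as np

rng = np.random.default_rng(1)

# Roll the hidden fair die n times and estimate each face's probability from the counts.
for n in (10, 1_000, 100_000):
    rolls = rng.integers(1, 7, size=n)
    counts = np.bincount(rolls, minlength=7)[1:]     # counts for faces 1..6
    p_hat = counts[counts > 0] / n                   # empirical distribution (skip unseen faces)
    avg_info = -np.sum(p_hat * np.log2(p_hat))       # average information per roll, in bits
    print(f"n = {n:>6}: estimated entropy ~ {avg_info:.3f} bits")

print(f"true entropy of a fair die: {np.log2(6):.3f} bits")
```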

3 The Definition of Classical Entropy

3.1 Shannon’s Rules for Calculating Entropy

In the example of the entropy associated with rolling a die, our random variable X was hidden in a black box. This was a useful example for our data transmission model because, like the receiver who has received message m̂ and has to assume that it is equal to message m (or else try to figure out how different m̂ and m could be), we do not know what the random variable X is; we do not know what it looks like or what it was intended to be, only what its outcome is, which is an integer between 1 and 6 inclusive. It is this inability to see the die (the noisiness of our communication channel) that was our source of uncertainty in data transmission. We want to make sure it is clear that we are discussing the entropy inherent to a random variable.

To reframe this discussion a little, let us use the language of C. E. Shannon, one of the founding fathers of information theory, in his landmark 1948 paper "A Mathematical Theory of Communication". He refers to entropy both as the level of uncertainty of an outcome and as the degree of "choice" involved in the selection of the event. When constructing a mathematical function for the entropy associated with a measurement outcome in terms of the probability of each outcome, Shannon set up a few stipulations, two of which are given below.

1. The entropy should be continuous with respect to the probabilities of each possible outcome of X.
2. In the case that each possible outcome xi of X has an equal likelihood 1/n, each additional outcome increases the uncertainty of what the outcome could be; in other words, we have an increasing degree of choice when selecting a possible outcome. So H should be a monotonically increasing function of the number of outcomes, n.

3.2 Definition of Shannon Entropy

With these stipulations, Shannon came up with a definition for entropy which is as follows [2]: for a random variable X with n possible outcomes x1, x2, ..., xn, the classical entropy, called the Shannon entropy, is defined in Eq. 1, where the logarithm is base 2.

H(X) ≡ − Σ_{i=1}^{n} p(x_i) log p(x_i)   (1)

To calculate the entropy of a series of the same random variable repeated m times, we would use the formula for H(X^m) given in Eq. 2.

H(X^m) ≡ − Σ_{i_1=1}^{n} Σ_{i_2=1}^{n} ... Σ_{i_m=1}^{n} p(x_{i_1}) p(x_{i_2}) ... p(x_{i_m}) log[ p(x_{i_1}) p(x_{i_2}) ... p(x_{i_m}) ]   (2)

Since the logarithm of a product is the sum of the logarithms, each term in Eq. 2 splits into m pieces, and because the probabilities in each copy sum to 1, the sums over the remaining indices collapse. H(X^m) therefore simplifies to Eq. 3.

H(X^m) = −m Σ_{i=1}^{n} p(x_i) log p(x_i) = m H(X)   (3)
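A small Python helper implementing Eq. 1, together with a numerical check of Eq. 3: the joint distribution of m independent repetitions is the product of the marginal probabilities, and its entropy comes out exactly m times larger. The example distribution below is an arbitrary choice of mine.

```python
import itertools
import numpy as np

def shannon_entropy(p):
    """Eq. 1: H(X) = -sum_i p_i log2 p_i, with 0 log 0 taken to be 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

p = [0.5, 0.25, 0.25]                    # an arbitrary example distribution
m = 3

# Joint distribution of m independent copies of X: products of the marginal probabilities.
joint = [np.prod(combo) for combo in itertools.product(p, repeat=m)]

print(shannon_entropy(p))                # H(X)   = 1.5 bits
print(shannon_entropy(joint))            # H(X^m) = 4.5 bits = m * H(X), matching Eq. 3
```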

3.3 Some Related Definitions

1. Joint Entropy: H(X, Y) ≡ − Σ_{x,y} p(x, y) log p(x, y). Measures the total uncertainty in the joint outcome of the random variables X and Y.

2. Conditional Entropy: H(X|Y) ≡ H(X, Y) − H(Y). Measures the uncertainty in the outcome of X given that we know the outcome of Y.

3. Mutual Information of X and Y: H(X : Y) ≡ H(X) + H(Y) − H(X, Y), which can equivalently be written H(X : Y) = H(X) − H(X|Y) = H(Y) − H(Y|X). Measures the shared information content of X and Y, i.e. how many bits of information about X are revealed by knowing the value of Y.
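These three definitions can be checked numerically from any joint distribution p(x, y); the sketch below uses a small made-up 2x2 joint distribution and verifies that the different expressions for the mutual information agree.

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; zero-probability entries are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Illustrative joint distribution p(x, y): rows index X, columns index Y.
p_xy = np.array([[0.25, 0.25],
                 [0.40, 0.10]])

H_xy = H(p_xy)                                   # joint entropy H(X, Y)
H_x = H(p_xy.sum(axis=1))                        # marginal entropy H(X)
H_y = H(p_xy.sum(axis=0))                        # marginal entropy H(Y)

H_x_given_y = H_xy - H_y                         # conditional entropy H(X|Y)
mutual_info = H_x + H_y - H_xy                   # mutual information H(X : Y)

print(H_x_given_y, mutual_info)
print(np.isclose(mutual_info, H_x - H_x_given_y))   # the alternative form gives the same value
```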

3.4 An Illustration: Approximating the Entropy of English

Zeroth Order Approximation Using Eq. 1, we can calculate the entropy of a single letter in the English alphabet. That is, we calculate how many classical bits are needed to convey the information of a single letter. Call the act of choosing a letter the random variable A; the number of possible outcomes is n = 26. If we were to choose the letters without bias, something which is very difficult for people to do without the aid of a computer, then each letter would have an equal probability of being chosen, so p(A = a) = p(A = b) = ... = p(A = z) = 1/26 ≈ 0.0385. Substituting this value into Eq. 1 reveals that the entropy of a single letter is −26 × (1/26) log2(1/26) ≈ 4.7, so you would need 4.7 bits to convey a single letter of the alphabet. And by Eq. 3, the entropy of n letters in a row is equal to n times 4.7 bits.

When conveying a message that has the potential to be more than one word, however, we must include a space as the ’27th letter’. The total entropy is then 4.76 bits per letter. This is the zeroth order approximation.

First Order Approximation However, the frequency of each letter in the English alphabet is not the same, so if we were to calculate the entropy of the word 'textbook' in bits, the entropy is not simply equal to H(A^8) with a probability of 1/26 for each letter. Rather, the probability for 'e' will be higher than for 'x'.

Using a list of frequencies of letters in the English alphabet, the first-order approximation of the entropy of a letter was found to be about 4.03 bits.
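A hedged sketch of both estimates: the zeroth-order figure is just log2 of the alphabet size, while the first-order figure weights each letter by its relative frequency. The frequency table below is a rough set of commonly quoted percentages for the 26 letters (it omits the space and is not Shannon's exact table), so the printed first-order value lands near, but not exactly at, the 4.03 bits quoted above.

```python
import numpy as np

# Zeroth order: every symbol is equally likely.
print(np.log2(26))      # entropy per letter for 26 equiprobable letters (the 4.7 bits above)
print(np.log2(27))      # the same estimate with the space counted as a 27th symbol

# First order: weight each letter by an approximate relative frequency (in percent).
# These are rough, commonly quoted values, NOT Shannon's exact table, and the space is omitted.
freq = {'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7, 's': 6.3,
        'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8, 'u': 2.8, 'm': 2.4,
        'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0, 'p': 1.9, 'b': 1.5, 'v': 1.0,
        'k': 0.8, 'j': 0.15, 'x': 0.15, 'q': 0.1, 'z': 0.07}

p = np.array(list(freq.values()))
p = p / p.sum()                       # normalize to a probability distribution
print(-np.sum(p * np.log2(p)))        # first-order estimate, in the neighbourhood of 4 bits
```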

Second Order Approximation The 27 possible outcomes when choosing the next letter to encode are not all mutually independent. For example, 'h' is more likely to come after 't' than after 'z'. So to calculate the entropy of the English language to a second-order approximation, we can't just use the frequency of individual letters; instead we use a probability for each subsequent letter that depends on the previous letter. This is called a 'digram' structure. By this approximation, the entropy is 3.32 bits per letter.
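The digram idea can be sketched as follows: estimate bigram counts from a sample text and compute the entropy of the next letter conditioned on the previous one. The sample string here is a stand-in of my own; with such a tiny sample the estimate is crude, and only a large English corpus would approach the ~3.3 bits quoted above.

```python
from collections import Counter
import numpy as np

def digram_entropy(text):
    """H(next letter | previous letter), estimated from bigram counts in `text`."""
    text = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    pairs = Counter(zip(text, text[1:]))          # counts of adjacent symbol pairs
    singles = Counter(text[:-1])                  # counts of the conditioning (previous) symbol
    total = sum(pairs.values())
    h = 0.0
    for (prev, _), n in pairs.items():
        p_pair = n / total                        # p(prev, next)
        p_cond = n / singles[prev]                # p(next | prev)
        h -= p_pair * np.log2(p_cond)
    return h

sample = "the quick brown fox jumps over the lazy dog " * 50
print(digram_entropy(sample))   # a rough estimate; a large English corpus approaches ~3.3 bits
```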

Higher Order Approximations Each nth-order approximation depends on the probability of the nth letter occurring with respect to the last n−1 letters. Continuing this way, you can approach the true entropy of the English language. However, as the text length increases, calculating the probability dependence of each letter with respect to the previous n−1 letters becomes computationally intense. Since tables of relative frequencies for higher-order approximations did not seem to exist (at least at the time Shannon was writing), Shannon instead consulted word frequency tables [8], and found an entropy per word of 11.82 bits. Given that English words have on average 4.5 letters, that leaves us with 2.62 bits per letter, still lower than our previous calculations. Using a complicated thought experiment, involving subjects who guess letters of a text, and then a second round where an identical twin of each subject guesses letters based on the first subject's guesses (instead of the original text), and so on, Shannon constructed an 'N-gram predictor'. The details of this exercise can be perused in Shannon's 1951 paper "Prediction and Entropy of Printed English" [8], but his final result was about 1.3 bits per letter.

The Surprising Benefits of Redundancy We usually think of the word redundant in a negative light: if something is redundant, it is not needed. However, when it comes to error prevention, it is highly desirable to add redundant bits to data transmission so that if errors occur, they are less likely to hit the message itself and more likely to hit the redundant bits that separate the bits carrying our message. The fact that the entropy per letter of the English language that we originally calculated using the zeroth-order approximation is so much higher than the entropy we calculated using the frequency of words suggests that there is some redundancy built into the English language. That is, there is room for compression; not all of the letters are necessary.

You know this already, of course, if you are a fluent English speaker. For example, we use the word 'a' in front of words that start with consonants, like 'textbook' and 'qubit', whereas we append an additional 'n' to 'a' when announcing words that start with vowels, such as in 'an apple'. The additional 'n' gives us information that the word to come starts with a vowel rather than a consonant. However, when we then transmit the second word 'apple', we receive the information again that 'apple' starts with a vowel, and in fact we are even told that the vowel it starts with is 'a'. So the entire message 'an apple' tells us twice over that the second word starts with a vowel. In other words, there is redundant information in an English message, and that gives us the power to compress the code into something more compact, something with a higher information efficiency. So how do we figure out the extent to which we can compress this information? Luckily, Shannon already answered this question with his noiseless channel coding theorem. You can read more about it in the textbook by Nielsen and Chuang cited under references [7].

4 Quantum Entropy: Von Neumann Entropy

There is a quantum analogue to the Shannon entropy, defined by Von Neumann as shown in Eq.4 [7]:

S(ρ) ≡ −tr(ρ log ρ)   (4)

In Eq. 4, ρ is a density matrix, and the logarithm is base 2. A calculation of the Von Neumann entropy tells us how many bits are needed to convey some quantum information. If we diagonalize ρ and define the eigenvalues as λ_x, then we can rewrite the Von Neumann entropy as shown in Eq. 5.

S(ρ) = − Σ_x λ_x log λ_x   (5)

If we consider the eigenvalues λx in Eq.5 to be analogues of the probability of each outcome, then we have reduced the definition to the classical definition of entropy, or the Shannon entropy.
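A minimal numpy sketch of Eq. 4 and Eq. 5: diagonalize ρ and sum −λ log2 λ over the nonzero eigenvalues. The two example density matrices (a pure state and a maximally mixed qubit) are illustrative choices of mine.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Eq. 5: S(rho) = -sum_x lambda_x log2 lambda_x over the nonzero eigenvalues."""
    lam = np.linalg.eigvalsh(rho)            # rho is Hermitian
    lam = lam[lam > 1e-12]                   # 0 log 0 is taken to be 0
    return float(max(0.0, -np.sum(lam * np.log2(lam))))

plus = np.array([1.0, 1.0]) / np.sqrt(2)     # (|0> + |1>)/sqrt(2)
rho_pure = np.outer(plus, plus)              # pure-state density matrix
rho_mixed = np.eye(2) / 2                    # maximally mixed qubit

print(von_neumann_entropy(rho_pure))         # 0.0: a pure state carries zero entropy
print(von_neumann_entropy(rho_mixed))        # 1.0: one full bit of uncertainty
```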

In analogy with the classical case, if you have a string of information for which you want to calculate the entropy, take the tensor product of the density matrices of each unit of information (e.g. each quantum letter): ρ ⊗ ... ⊗ ρ = ρ^{⊗n}.

S(ρ^{⊗n}) = −tr(ρ^{⊗n} log ρ^{⊗n})

Using the identity log(A ⊗ B) = log A ⊗ I + I ⊗ log B, the logarithm of the n-fold tensor product splits into a sum of n terms, each with log ρ acting on a single copy:

log ρ^{⊗n} = Σ_{k=1}^{n} I ⊗ ... ⊗ log ρ ⊗ ... ⊗ I   (log ρ in the k-th slot)

Tracing ρ^{⊗n} against each of these terms gives tr(ρ)^{n−1} tr(ρ log ρ) = tr(ρ log ρ), since tr(ρ) = 1. Therefore

S(ρ^{⊗n}) = −n tr(ρ log ρ) = n S(ρ)
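This additivity is easy to check numerically, with np.kron standing in for the tensor product; the single-qubit mixed state below is an arbitrary choice of mine.

```python
import numpy as np

def S(rho):
    """Von Neumann entropy in bits, from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])            # an arbitrary single-qubit mixed state
rho_2 = np.kron(rho, rho)               # rho ⊗ rho
rho_3 = np.kron(rho_2, rho)             # rho ⊗ rho ⊗ rho

print(S(rho), S(rho_2), S(rho_3))       # S, 2S, 3S: the entropy scales linearly with n
```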

4.1 An Illustration: The Entropy of a Spin-1/2 Particle

Let us prepare a very simple state that represents a spin-1/2 particle: (1/√2)(|0⟩ + |1⟩). In order to calculate its entropy, we find its density operator by multiplying the state by its conjugate:

ρ = (1/2)(|0⟩ + |1⟩)(⟨0| + ⟨1|) = (1/2)(|0⟩⟨0| + |0⟩⟨1| + |1⟩⟨0| + |1⟩⟨1|)

This yields the following density operator ρ, which has eigenvalues of 0 and 1:

[ 1/2  1/2 ]
[ 1/2  1/2 ]

We can substitute these eigenvalues into Eq.5:

S(ρ) = − Σ_x λ_x log λ_x

= −0 log 0 − 1 log 1

= 0 − 1log1 = 0

An entropy of 0 means that there is no information to be gained by selecting an outcome: any additional measurement or selection yields no new information. Now let us repeat this calculation more generally, for a quantum system in a superposition of two states with unequal probabilities. Then we can describe its wavefunction like this:

|ψ⟩ = α|0⟩ + β|1⟩

ρ = α²|0⟩⟨0| + αβ|0⟩⟨1| + βα|1⟩⟨0| + β²|1⟩⟨1|   (taking α and β to be real)

The density matrix then looks like this:

[ α²  αβ ]
[ βα  β² ]

Then the eigenvalues λ are 0 and α² + β². Since the total probability of all states added together is 1, and the probability of each state in a wavefunction is the square of its coefficient, β² = 1 − α². Substituting this into the entropy equation, we get, for any value of α:

S(ρ) = −0 log 0 − (α² + β²) log(α² + β²)

S(ρ) = 0 − (α² + 1 − α²) log(α² + 1 − α²)

S(ρ) = 0 − 1 log 1 = 0

So regardless of the probability of each state, the entropy of any quantum superposition of two states in a pure-state system is 0. We can reason it out in the following way. The particle exists in nature as a superposition of two possible spin states, and while we may get a value every time we measure its spin, there is no guarantee we get the same value the second time we measure the same spin. By the Copenhagen interpretation, every measurement we take of a quantum state causes the wavefunction to collapse into one of its states. Hence, even when the probabilities are not equal, every new measurement is a 'reset' in a sense. The second measurement is not only independent of the first, but gives us no more information than we already had. The classical concept of entropy as the average rate of information gained over time can no longer be applied in that way. It turns out, in fact, that the entropy of any pure state (not just a two-state system like the one described in this section) is 0. Now what if the quantum system is in a mixed state?

4.2 Another Illustration: Entropy of a Bell State

It is well known that a Bell state is one of the simplest entangled states. On its own it is a pure state, and we can directly calculate the entropy to verify this for one of the Bell states. The Bell state (1/√2)(|00⟩ + |11⟩) corresponds to the following density matrix (in the basis {|00⟩, |01⟩, |10⟩, |11⟩}):

        [ 1 0 0 1 ]
ρ = 1/2 [ 0 0 0 0 ]
        [ 0 0 0 0 ]
        [ 1 0 0 1 ]

We can substitute this into the Von Neumann equation. The power series definition of the logarithm of a matrix is reproduced below:

log(ρ) = (ρ − I) − (ρ − I)²/2 + (ρ − I)³/3 − ...

Because ρ is a projector onto the Bell state, ρ(ρ − I) = ρ² − ρ = 0, so every term of ρ log ρ vanishes. Hence ρ log ρ = 0, its trace is 0, and the Von Neumann entropy is 0: the Bell state is a pure state. However, we are not satisfied by this, and still want to look at the entropy of one of the qubits of the Bell state. So instead we look at the reduced density matrix of one of its qubits. To get the reduced density matrix rather than the whole density matrix, we trace out the other qubit, which here amounts to removing the cross terms, and we end up with:

(1/2)(|0⟩⟨0| + |1⟩⟨1|)

This gets us the reduced density matrix ρA:

[ 0.5   0  ]
[  0   0.5 ]

So substituting this into the Von Neumann entropy should get us log 2, which equals 1. So in total, although the whole Bell state itself is a pure state, it is composed of two qubits, each of which is in a mixed state. Hence the entropy of each individual qubit is nonzero. We earlier suggested that the entropy of pure states, and only pure states, is 0. The fact that this individual qubit has a nonzero entropy suggests that it is not pure. We can check this by taking the trace and the trace of the square of the reduced density matrix.

tr(ρA) = 1
tr(ρA²) = 1/2

Only for pure states is tr(ρ²) = 1; otherwise tr(ρ²) < 1. So the qubit is in a mixed state, and our previous statement is supported. Making sense of this insight, we see that while the average information rate of taking measurements of the Bell state is 0, the average information rate of repeatedly taking measurements of the state of one of the qubits is actually non-zero. We are getting information out of measuring the state of one of two entangled qubits, but not out of measuring the total Bell state.

How is it possible that measuring one qubit gives us information, but measuring the whole system does not? Well, we know that a Bell state describes an entanglement of two qubits. The measurement of one influences the other, so measuring both qubits at the same time yields meaningless results. And before being measured, both qubits can be in either of their two states. But once one of the qubits is measured, we know with perfect certainty what the other qubit is. In this way, the measurement of one qubit gives us information about the other; hence it has a non-zero entropy. So by considering the entropy of pure states, mixed states, and entangled states, we can approach quantum concepts that are hard to wrap your brain around from a different perspective!
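The whole subsection can be reproduced in a few lines of numpy (a sketch of my own, with the partial trace done by reshaping the 4x4 matrix into a rank-4 tensor and contracting the second qubit's indices): the full Bell state has entropy 0, while the reduced state of either qubit has entropy 1 and tr(ρA²) = 1/2.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(max(0.0, -np.sum(lam * np.log2(lam))))

# Bell state (|00> + |11>)/sqrt(2) in the basis {|00>, |01>, |10>, |11>}.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(bell, bell)

# Partial trace over the second qubit: reshape to (2, 2, 2, 2) and contract its two indices.
rho_A = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

print(entropy(rho))                 # 0.0: the full Bell state is pure
print(entropy(rho_A))               # 1.0: each qubit on its own is maximally mixed
print(np.trace(rho_A @ rho_A))      # 0.5: tr(rho_A^2) < 1 confirms a mixed state
```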

References

1. Wikipedia. "Information Theory". Last modified 14 March 2020.
2. C. E. Shannon. "A Mathematical Theory of Communication". The Bell System Technical Journal, 1948. doi:10.1002/j.1538-7305.1948.tb01338.x.
3. Wikipedia. "Entropy (classical thermodynamics)". Last modified 14 November 2018.
4. J. Cioffi. "Signal Processing and Detection". https://ee.stanford.edu/~cioffi/doc/book/chap1.pdf
5. J. Preskill. "Quantum Information Theory". http://www.theory.caltech.edu/people/preskill/ph229/notes/chap7.pdf
6. S. J. Devitt, W. J. Munro, and K. Nemoto. "Quantum Error Correction for Beginners". arXiv, 2013.
7. M. A. Nielsen and I. L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
8. C. E. Shannon. "Prediction and Entropy of Printed English". The Bell System Technical Journal, vol. 30, no. 1, pp. 50-64, Jan. 1951.