A Basic Probability Theory

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. R. Wolf, Quantum Key Distribution, Lecture Notes in Physics 988, https://doi.org/10.1007/978-3-030-73991-1

In order to follow many of the arguments in these notes, especially when talking about entropies, it is necessary to have some basic knowledge of probability theory. Therefore, we review here the most important tools of probability theory that are used.

One of the basic notions of probability theory that also frequently appears throughout these notes is that of a discrete random variable. A random variable X can take one of several values, the so-called realizations x, given by the alphabet $\mathcal{X}$. The probability that a certain realization $x \in \mathcal{X}$ occurs is given by the probability distribution $p_X(x)$. We usually use upper case letters to denote the random variable, lower case letters to denote realizations thereof, and calligraphic letters to denote the alphabet.

Suppose we have two random variables X and Y, which may depend on each other. We can then define the joint probability distribution $p_{X,Y}(x, y)$ of X and Y, which gives the probability that X = x and Y = y. This notion (and the following definitions) can be extended to n random variables, but we restrict ourselves to the case of pairs X, Y here to keep the notation simple. Given the joint probability distribution of the pair X, Y, we can derive the marginal distribution $p_X(x)$ by

$$p_X(x) = \sum_{y \in \mathcal{Y}} p_{X,Y}(x, y) \quad \forall x \in \mathcal{X}, \tag{A.1}$$

and analogously for $p_Y(y)$. The two random variables X and Y are said to be independent if

$$p_{X,Y}(x, y) = p_X(x)\, p_Y(y). \tag{A.2}$$

Furthermore, we can define the conditional probability that Y takes the value $y \in \mathcal{Y}$, given that X takes the value $x \in \mathcal{X}$:

$$p_{Y|X}(y|x) = \frac{p_{X,Y}(x, y)}{p_X(x)}. \tag{A.3}$$

To avoid complications, we use the convention that $p_{Y|X}(y|x) = 0$ if $p_X(x) = 0$. If X and Y are independent, then $p_{Y|X}(y|x) = p_Y(y)$ for all $y \in \mathcal{Y}$.
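As a concrete illustration of these definitions, the following sketch computes marginals and conditionals from a small joint distribution. The distribution, alphabets, and helper names are invented for illustration and do not appear in the text:

```python
# Marginals, conditionals, and an independence check for a small,
# hypothetical joint distribution p_{X,Y} (values chosen arbitrarily).

p_xy = {
    (0, 'a'): 0.1, (0, 'b'): 0.3,
    (1, 'a'): 0.2, (1, 'b'): 0.4,
}
alphabet_x = {0, 1}
alphabet_y = {'a', 'b'}

def marginal_x(x):
    """p_X(x) = sum over y of p_{X,Y}(x, y), Eq. (A.1)."""
    return sum(p_xy[(x, y)] for y in alphabet_y)

def marginal_y(y):
    """p_Y(y), the analogous marginal over x."""
    return sum(p_xy[(x, y)] for x in alphabet_x)

def conditional_y_given_x(y, x):
    """p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x), Eq. (A.3);
    set to 0 by convention when p_X(x) = 0."""
    px = marginal_x(x)
    return p_xy[(x, y)] / px if px > 0 else 0.0

def independent():
    """Check the factorization of Eq. (A.2) for every pair (x, y)."""
    return all(abs(p_xy[(x, y)] - marginal_x(x) * marginal_y(y)) < 1e-12
               for x in alphabet_x for y in alphabet_y)

print(round(marginal_x(0), 6))                  # 0.4
print(round(conditional_y_given_x('a', 0), 6))  # 0.25
print(independent())                            # False
```

Here $p_{X,Y}(0, a) = 0.1$ differs from $p_X(0)\,p_Y(a) = 0.4 \cdot 0.3 = 0.12$, so this particular X and Y are not independent.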
Using the definition of the conditional probability, (A.1) can be rewritten as

$$p_X(x) = \sum_{y \in \mathcal{Y}} p_{X|Y}(x|y)\, p_Y(y) \quad \forall x \in \mathcal{X}. \tag{A.4}$$

In this form it is also called the law of total probability. Another important rule that relates different conditional probabilities is Bayes' rule:

$$p_{X|Y}(x|y) = p_{Y|X}(y|x)\, \frac{p_X(x)}{p_Y(y)}. \tag{A.5}$$

This rule can be proved as follows: note that (A.3) can be rewritten as

$$p_{X,Y}(x, y) = p_{Y|X}(y|x)\, p_X(x). \tag{A.6}$$

It follows that

$$p_{X|Y}(x|y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} = p_{Y|X}(y|x)\, \frac{p_X(x)}{p_Y(y)}. \tag{A.7}$$

B Calderbank–Shor–Steane Codes

Calderbank–Shor–Steane (CSS) codes are a large class of quantum error correction codes that exploit ideas from classical linear error correction codes. In entanglement-based QKD protocols, they can be used to correct errors that occur during the distribution of entangled states.

B.1 Classical Linear Codes

Before we can understand CSS codes, we need to make a short detour into the theory of classical linear codes. A linear code C that encodes k bits into an n-bit code space (with n > k) is a set of $2^k$ codewords, where each codeword is a binary vector of length n. We call such a code an [n, k] code. It is specified by an $n \times k$ generator matrix G with entries in {0, 1}. G maps messages to their equivalents in the code space: a k-bit message x (represented by a column vector) is encoded as y = Gx. Note that all arithmetic operations (especially multiplications and additions) are done modulo 2. As a simple example, consider the [3, 1] repetition code that encodes a 1-bit message into three copies of it: 0 is mapped to $(0, 0, 0)^T$ and 1 is mapped to $(1, 1, 1)^T$. Hence, the generator matrix G is

$$G = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \tag{B.1}$$

To connect this definition of classical codes to error correction, we have to introduce a different formulation of linear codes, the parity check matrices.
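The generator-matrix encoding y = Gx modulo 2 can be sketched in a few lines; the helper name `encode` and the list-of-lists matrix representation are our own choices, not from the text:

```python
# Encoding with a generator matrix over GF(2): y = G x (mod 2).
# G is the generator matrix of the [3, 1] repetition code, Eq. (B.1).

G = [[1],
     [1],
     [1]]

def encode(G, x):
    """Multiply the n x k matrix G by the length-k message x, mod 2."""
    n, k = len(G), len(G[0])
    return [sum(G[i][j] * x[j] for j in range(k)) % 2 for i in range(n)]

print(encode(G, [0]))  # [0, 0, 0]
print(encode(G, [1]))  # [1, 1, 1]
```

For a general [n, k] code the same function maps each of the $2^k$ messages to its n-bit codeword.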
In this formulation, an [n, k] code is defined as the set of all vectors x of length n with entries in {0, 1} such that

$$Hx = 0, \tag{B.2}$$

where H is an $(n - k) \times n$ matrix with entries in {0, 1} called the parity check matrix. To construct the parity check matrix H from a generator matrix G, one has to pick n − k linearly independent vectors orthogonal to the columns of G. The corresponding parity check matrix for the [3, 1] repetition code with G given in (B.1) is then

$$H = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}. \tag{B.3}$$

In the language of parity check matrices, it is quite easy to see how error detection and correction work. Suppose we have a message x that we encode as y = Gx. If an error e occurs, the codeword y is transformed into the corrupted codeword y' = y + e. Because Hy = 0 for all codewords y, it follows that Hy' = Hy + He = He. This quantity is called the error syndrome. If the syndrome is 0, we know that no error has occurred. Otherwise, it contains information about the error because of the way the parity check matrix H was constructed.

In the example of the [3, 1] repetition code, every codeword has a length of 3 bits. Therefore, errors can occur at three different positions. Denote by $e_i$ an error in the i-th bit, i.e., a vector with a 1 at position i and 0 elsewhere. Then for all codewords y we have $Hy' = He_i$; hence, the three different syndromes are

$$He_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad He_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad He_3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{B.4}$$

This makes it possible to read off the position of the error from the syndrome. Note that this procedure is only successful if at most one bit is in error. Hence, the [3, 1] repetition code can correct one error. More general linear error correction codes can be obtained using the concept of the Hamming distance.
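The syndrome-based correction just described can be sketched as follows; the syndrome lookup table and function names are illustrative choices, not from the text:

```python
# Syndrome computation and single-error correction for the [3, 1]
# repetition code, using the parity check matrix H of Eq. (B.3).

H = [[1, 1, 0],
     [0, 1, 1]]

def syndrome(H, v):
    """Compute Hv mod 2."""
    return [sum(row[i] * v[i] for i in range(len(v))) % 2 for row in H]

# Syndromes of the three single-bit errors e_1, e_2, e_3, cf. Eq. (B.4).
errors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
syndrome_table = {tuple(syndrome(H, e)): e for e in errors}

def correct(H, v):
    """Correct v, assuming at most one bit is in error."""
    s = tuple(syndrome(H, v))
    if s == (0, 0):
        return v                        # syndrome 0: no error detected
    e = syndrome_table[s]               # look up which bit was flipped
    return [(vi + ei) % 2 for vi, ei in zip(v, e)]

corrupted = [1, 0, 1]                   # codeword (1,1,1) with bit 2 flipped
print(syndrome(H, corrupted))           # [1, 1], the syndrome He_2
print(correct(H, corrupted))            # [1, 1, 1]
```

The syndrome (1, 1) matches $He_2$ in (B.4), so the decoder flips bit 2 and recovers the codeword.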
The Hamming distance d(x, y) between two binary vectors x and y is defined as the number of positions in which the two bit strings differ. For example, $d((1, 1, 0, 0)^T, (1, 0, 0, 1)^T) = 2$, because the vectors differ in the 2nd and 4th positions. Error correction now works as follows: suppose we have a codeword y = Gx that is corrupted such that the resulting vector is y' = y + e. If the probability that an error occurs in each bit is less than 1/2, the most likely codeword to have been encoded is the one that minimizes the Hamming distance to y', i.e., d(y, y'), since this is the one requiring the fewest bit flips.

How many errors can such a code correct? This can also be analysed in terms of the Hamming distance: we define the distance of a code C to be the minimum Hamming distance between any two of its codewords:

$$d(C) = \min_{\substack{x, y \in C \\ x \neq y}} d(x, y). \tag{B.5}$$

We use the notation d = d(C) and call C an [n, k, d] code. With a little bit of thinking one can see that a code with distance 2t + 1 for some integer t can be used to correct up to t errors, simply by decoding the corrupted message y' as the unique codeword y that satisfies d(y, y') ≤ t. If more than t errors occur, this codeword is no longer unique and therefore errors cannot be reliably detected and corrected.

The last concept we need from classical linear codes is duality. Suppose we have a linear [n, k] code C with generator matrix G and parity check matrix H. We can then construct another code, the dual code $C^\perp$ of C, which consists of all codewords that are orthogonal to every codeword in C. Hence, the generator matrix of the dual code is $H^T$ and its parity check matrix is $G^T$.

B.2 Quantum Error Correction

In the quantum case, the situation is a bit more complicated.
Whereas in the classical case only one type of error is possible (namely the bit flip), a qubit can undergo three different types of errors: a bit flip, which changes $|0\rangle$ to $|1\rangle$ and $|1\rangle$ to $|0\rangle$; a phase error, which maps $|1\rangle$ to $-|1\rangle$ but leaves $|0\rangle$ unchanged; and a combination of the two, which maps $|0\rangle \to -|1\rangle$ and $|1\rangle \to |0\rangle$.

The Calderbank–Shor–Steane (CSS) code is now defined as follows: suppose we have two classical linear error correction codes, an $[n, k_1]$ code $C_1$ and an $[n, k_2]$ code $C_2$, such that $C_2 \subset C_1$ and both $C_1$ and $C_2^\perp$ correct up to t errors. Using these two classical codes we can define a quantum error correction code, the CSS code of $C_1$ over $C_2$, denoted $\mathrm{CSS}(C_1, C_2)$. It is an $[n, k_1 - k_2]$ quantum code that is capable of correcting errors on up to t qubits. The construction works as follows: for any codeword $x \in C_1$, we define the quantum state

$$|x + C_2\rangle = \frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} |x + y\rangle, \tag{B.6}$$

where + is bitwise addition modulo 2 and $|C_2|$ denotes the cardinality of $C_2$ (which is $2^{k_2}$, since this is the number of codewords of $C_2$).
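The state (B.6) is a uniform superposition over the coset $x + C_2$. A minimal sketch, under the toy assumption that $C_1$ is all of $\{0,1\}^3$ (an [3, 3] code) and $C_2$ is the [3, 1] repetition code {000, 111}, so that $C_2 \subset C_1$ (the representation of a state as a dict of amplitudes is our own illustrative choice):

```python
import math

# Build the amplitudes of the CSS basis state |x + C2> of Eq. (B.6):
# each basis string in the coset x + C2 carries amplitude 1/sqrt(|C2|).

C2 = [(0, 0, 0), (1, 1, 1)]  # the [3, 1] repetition code as subcode

def add_mod2(x, y):
    """Bitwise addition modulo 2 of two bit tuples."""
    return tuple((xi + yi) % 2 for xi, yi in zip(x, y))

def css_state(x, C2):
    """Map each basis string of the coset x + C2 to its amplitude."""
    amp = 1.0 / math.sqrt(len(C2))
    return {add_mod2(x, y): amp for y in C2}

state = css_state((1, 0, 0), C2)
for basis, amplitude in sorted(state.items()):
    print(basis, round(amplitude, 4))
```

The coset of (1, 0, 0) is {(1, 0, 0), (0, 1, 1)}, each with amplitude $1/\sqrt{2}$; codewords of $C_1$ lying in the same coset of $C_2$ yield the same state, which is why $\mathrm{CSS}(C_1, C_2)$ encodes $k_1 - k_2$ qubits.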