
Classical and Quantum Information Theory from Discrete to Continuous Variables

Nicolas J. Cerf
Centre for Quantum Information and Communication (QuIC), Ecole Polytechnique, Université Libre de Bruxelles, Belgium

QUANTUM INFORMATION, COMPUTATION, AND COMPLEXITY
Institut Henri Poincaré (IHP), Paris, January 4 -- April 7, 2006


Shannon's information theory

Claude E. Shannon, 1916-2001
A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379-423 and 623-656.
Communication theory of secrecy systems, Bell System Technical Journal 28 (1949) 656-715.

This work has laid out the entire foundation of today's information technology era.

1. Every kind of information (text, image, sound) can be associated with an information content, which quantifies how efficiently it can be represented with 0's and 1's.
2. Any imperfect communication channel (telephone line, radio channel, satellite link) has a capacity, which quantifies how much information can be transmitted reliably over the channel.


Source coding theorem (noiseless coding)

Q: What is the highest possible data compression rate?
A: Shannon's entropy H = f(source statistics).
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0 1 1 0 0 1 -> ... suppressing redundancy: "source code"

Channel coding theorem (noisy channel coding)

Q: What is the highest possible transmission rate?
A: Shannon's capacity C = f(noise statistics).
0 1 0 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 -> ... adding redundancy: "error-correcting code"


Example of source coding

Source: random variable X distributed as p(x).

Entropy: H(X) \equiv -\sum_{x=1}^{S} p(x) \log_2 p(x), where S is the alphabet size.

p(x) = 1/2 if x = "A", 1/4 if x = "B", 1/8 if x = "C", 1/8 if x = "D".

Simple code: "A" -> 00, "B" -> 01, "C" -> 10, "D" -> 11, i.e. L = 2 bits/symbol.

The average code length obeys L \ge H(X), and here
H(X) = 1/2 + 2/4 + 3/8 + 3/8 = 7/4 bits < 2 bits.

Clever code: "A" -> 0, "B" -> 10, "C" -> 110, "D" -> 111, i.e. L = 7/4 bits/symbol.


Typical sequences

Entropy H(X) is thus the average length of the shortest description of X.
But can we always reach this irreducible compression? YES, using "typical" sequences of size n -> \infty.

For a sequence (X_1, ..., X_n) of independent random variables, each distributed as p(x),
-\frac{1}{n} \log_2 p(x_1, ..., x_n) = -\frac{1}{n} \log_2 \prod_{i=1}^{n} p(x_i) = -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \simeq \langle -\log_2 p(X) \rangle_{X \sim p(x)} \equiv H(X)

\Rightarrow p(x_1, ..., x_n) \simeq 2^{-nH(X)} for typical sequences.

Normalization: \sum_{\bar{x} \in S_{typ}} p(\bar{x}) \simeq 1 \Rightarrow |S_{typ}| \simeq 2^{nH(X)}

About nH(X) bits are sufficient, on average, to code a typical n-symbol sequence, so that L \simeq H(X) bits/symbol.


Example: binary source

p(0) = 1 - p = 3/4 and p(1) = p = 1/4, e.g. the n = 20-bit string 00001010000100110000.

A typical n-bit sequence contains about n(1-p) = 15 zeros and np = 5 ones.

Probability of emission:
p(x_1, ..., x_n) = (1-p)^{\#0's} \, p^{\#1's} \simeq (1-p)^{n(1-p)} \, p^{np} = 2^{n(1-p)\log_2(1-p) + np\log_2 p} \equiv 2^{-nH(X)}

Number of typical sequences (using n! \simeq 2^{n\log_2 n}):
|S_{typ}| \simeq \binom{n}{np} = \frac{n!}{(np)!\,(n(1-p))!} \simeq 2^{n\log_2 n - np\log_2(np) - n(1-p)\log_2(n(1-p))} = 2^{nH(X)}

Probability of emitting any typical sequence: \sim 2^{-nH(X)} \times 2^{nH(X)} \sim 1.

Note: the most frequent sequence, 00000000000000000000, is generally not typical!
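
As a quick numerical check of the four-symbol example above (my own sketch in Python, not part of the original slides), the snippet below computes the entropy H(X) and the average length of the "clever" prefix code; both come out to 7/4 bits/symbol, so the code meets the bound L \ge H(X) with equality.

    # Entropy and average code length for the four-symbol source of the slides;
    # the dictionaries below are illustrative helpers, not notation from the lecture.
    from math import log2

    p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}          # source statistics p(x)
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}  # "clever" prefix code

    H = -sum(px * log2(px) for px in p.values())          # H(X) = 1.75 bits
    L = sum(p[x] * len(code[x]) for x in p)               # average length = 1.75 bits/symbol

    print(H, L)   # 1.75 1.75: the prefix code saturates L >= H(X)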
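
The counting argument for the binary source can also be checked numerically. The sketch below (again my own illustration, not from the slides) compares (1/n) \log_2 \binom{n}{np}, i.e. the number of bits per symbol needed to index the typical set, with the binary entropy H_2(p) for p = 1/4; the agreement is rough at n = 20 and tightens as n grows.

    # Size of the typical set of an n-bit string with about np ones, versus 2^{nH(X)}.
    from math import comb, log2

    p = 1/4
    H2 = -(1 - p) * log2(1 - p) - p * log2(p)   # binary entropy, about 0.811 bits

    for n in (20, 100, 1000, 10000):
        k = round(n * p)                        # typical number of ones
        rate = log2(comb(n, k)) / n             # bits/symbol needed to index S_typ
        print(n, round(rate, 4), round(H2, 4))
    # The rate climbs toward H2(p) ~ 0.811, so about nH(X) bits suffice per typical sequence.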
Remarkably, S_{typ} is only an exponentially small part of the set of all sequences, while it carries arbitrarily large probability. Since 0 \le H(X) \le \log_2 S, generally
|S_{typ}| \simeq 2^{nH(X)} \ll 2^{n\log_2 S} = S^n

- The typical set is sufficiently large to contain most of the probability:
\forall \epsilon > 0, \delta > 0, \exists n and S_{typ} such that |S_{typ}| < 2^{n[H(X)+\delta]} and \sum_{\bar{x} \in S_{typ}} p(\bar{x}) > 1 - \epsilon

- The typical set is the smallest set that contains most of the probability:
\forall \epsilon > 0, \delta > 0, \forall n: |S| < 2^{n[H(X)-\delta]} \Rightarrow \sum_{\bar{x} \in S} p(\bar{x}) < \epsilon


Instantaneous (variable-length) codes

Code C: symbol x -> codeword C(x) of length l(x).
C is an instantaneous code if no C(x) is a prefix of another C(x') with x \ne x'.

"A" -> 0, "B" -> 10, "C" -> 110, "D" -> 111: the stream 10, 0, 0, 110, 0, 111, 10, ... can be decoded symbol by symbol as it arrives.

Kraft inequality: \sum_{x=1}^{S} 2^{-l(x)} \le 1

[Decoding tree: "A" uses 2^{-1} = 1/2 of the tree, "B" uses 2^{-2} = 1/4, "C" uses 2^{-3} = 1/8, and "D" uses 2^{-3} = 1/8.]

The Kraft inequality is a necessary and sufficient condition for the existence of an instantaneous code.


Optimal instantaneous code

Minimize \sum_{x=1}^{S} p(x) l(x) over the lengths l(x), with the constraint \sum_{x=1}^{S} 2^{-l(x)} = 1. Using a Lagrange multiplier,
\mathcal{L} = \sum_{x} p(x) l(x) + \lambda \sum_{x} 2^{-l(x)}

\forall x: \frac{\partial \mathcal{L}}{\partial l(x)} = 0 \Rightarrow p(x) - \lambda \, 2^{-l(x)} \ln 2 = 0 \Rightarrow 2^{-l(x)} = \frac{p(x)}{\lambda \ln 2}

\Rightarrow l_{opt}(x) = -\log_2 p(x) (assuming it is an integer)
\Rightarrow L_{opt} = \sum_x p(x) l_{opt}(x) = -\sum_x p(x) \log_2 p(x) \equiv H(X)

Block coding: typical size-n sequences become almost equiprobable, p(x_1, ..., x_n) \simeq 2^{-nH(X)}, so
l_{opt}(x_1, ..., x_n) \simeq nH(X) \simeq integer and L_{opt} \simeq nH(X) bits/block.


No further compression?

Relative entropy (Kullback "distance"):
D(p \| q) = \sum_{x=1}^{S} p(x) \log_2 \frac{p(x)}{q(x)}, with p(x), q(x) probability distributions.

D(p \| q) = \left\langle -\log_2 \frac{q(x)}{p(x)} \right\rangle_{X \sim p(x)}
\ge -\log_2 \left\langle \frac{q(x)}{p(x)} \right\rangle_{X \sim p(x)}   (concavity of the log)
= -\log_2 \sum_x p(x) \frac{q(x)}{p(x)} = -\log_2 \sum_x q(x) = 0

\Rightarrow D(p \| q) \ge 0, with D(p \| q) = 0 iff \forall x: p(x) = q(x).

L - H(X) = \sum_x p(x) l(x) + \sum_x p(x) \log_2 p(x)
= \sum_x p(x) \log_2 \frac{p(x)}{2^{-l(x)}}
= \sum_x p(x) \log_2 \left[ \frac{p(x)}{q(x)} \times \frac{1}{\sum_y 2^{-l(y)}} \right], with q(x) = \frac{2^{-l(x)}}{\sum_y 2^{-l(y)}} a normalized probability distribution
= D(p \| q) - \log_2 \sum_y 2^{-l(y)}
\ge 0, since D(p \| q) \ge 0 and \sum_y 2^{-l(y)} \le 1 (Kraft).

The average length L of any instantaneous code cannot be lower than H(X).
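
To make the instantaneous-decoding property above concrete, here is a tiny decoder (an illustrative sketch of mine, not code from the lecture) for the prefix code "A" -> 0, "B" -> 10, "C" -> 110, "D" -> 111: since no codeword is a prefix of another, each symbol can be emitted as soon as its codeword is recognized, with no look-ahead.

    # Decoding a prefix (instantaneous) code on the fly.
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}
    inv = {w: s for s, w in code.items()}     # codeword -> symbol

    def decode(bits: str) -> str:
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inv:                    # no codeword is a prefix of another,
                out.append(inv[buf])          # so a match can be output immediately
                buf = ""
        return "".join(out)

    print(decode("100011001110"))             # BAACADA, recovered symbol by symbol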
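
The identity L - H(X) = D(p\|q) - \log_2 \sum_y 2^{-l(y)} derived just above is easy to verify numerically. The sketch below (my own illustration, not from the slides) does so for the four-symbol source with the "simple" two-bit code: the Kraft sum equals 1, so the redundancy of 0.25 bits/symbol is exactly D(p\|q).

    # Check L - H(X) = D(p||q) - log2(sum_y 2^{-l(y)}) for the fixed-length code.
    from math import log2

    p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
    l = {"A": 2, "B": 2, "C": 2, "D": 2}                  # codeword lengths (simple code)

    kraft = sum(2 ** -l[x] for x in l)                    # Kraft sum, must be <= 1
    q = {x: 2 ** -l[x] / kraft for x in l}                # normalized distribution q(x)

    H = -sum(px * log2(px) for px in p.values())
    L = sum(p[x] * l[x] for x in p)
    D = sum(p[x] * log2(p[x] / q[x]) for x in p)          # relative entropy D(p||q)

    print(L - H, D - log2(kraft))    # 0.25 0.25: the redundancy equals D(p||q) here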
Noisy channel coding

X -> [channel p(y|x)] -> Y, with X = input symbol, Y = output symbol, p(y|x) = transition probability.

Naive picture: to reduce the error probability P_e, either increase the signal power S or decrease the noise power N.

Shannon theory: as soon as the rate R \le C, an arbitrarily low P_e is achievable in the limit of large blocks n -> \infty.

Capacity C = B \log_2(1 + S/N) (B = bandwidth).

[Block diagram: ENC -> X_1 ... X_n -> CHANNEL -> Y_1 ... Y_n -> DEC]


Conditional and mutual entropy

H(X) \equiv -\sum_x p(x) \log_2 p(x) = uncertainty about X

H(X|Y) \equiv \sum_y p(y) H(X|Y=y) = uncertainty about X once Y is known, with H(X|Y=y) = -\sum_x p(x|y) \log_2 p(x|y).

\Rightarrow H(X|Y) = -\sum_{x,y} p(y) p(x|y) \log_2 \frac{p(x,y)}{p(y)}
= -\sum_{x,y} p(x,y) \log_2 p(x,y) + \sum_y p(y) \log_2 p(y)
= H(X,Y) - H(Y)

H(X|Y) = H(X,Y) - H(Y) = loss of information over the channel X -> Y:
for a perfect channel, H(X|Y) = 0; for a completely random channel, H(X|Y) = H(X).

I(X:Y) \equiv H(X) - H(X|Y) = mutual information = H(X) + H(Y) - H(X,Y)

[Venn diagram: H(X) and H(Y) overlap in I(X:Y); the remaining parts are H(X|Y) and H(Y|X).]

I(X:Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)} = D\big(p(x,y) \,\|\, p(x)p(y)\big) \ge 0

X and Y independent \Leftrightarrow I(X:Y) = 0, so I(X:Y) measures the statistical dependence between X and Y.

Subadditivity of entropies:
I(X:Y) = H(X) + H(Y) - H(X,Y) \ge 0, thus H(X,Y) \le H(X) + H(Y).

Strong subadditivity of entropies:
I(X:Y|Z) = \sum_z p(z) \, I(X:Y|Z=z) \ge 0, with I(X:Y|Z=z) = \sum_{x,y} p(x,y|z) \log_2 \frac{p(x,y|z)}{p(x|z)p(y|z)}

I(X:Y|Z) = \sum_{x,y,z} p(x,y,z) \log_2 \frac{p(x,y,z)\,p(z)}{p(x,z)\,p(y,z)} = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)

thus H(X,Y,Z) + H(Z) \le H(X,Z) + H(Y,Z).


Interpretation of strong subadditivity

H(X,Y,Z) + H(Z) \le H(X,Z) + H(Y,Z)
Adding H(X) to both sides and rearranging,
H(X) + H(Z) - H(X,Z) \le H(X) + H(Y,Z) - H(X,Y,Z)
\Rightarrow I(X:Z) \le I(X:Y,Z)

Discarding the variable Y can only decrease the mutual information with X.


Shannon capacity

C = \max_{p(x)} I(X:Y) for a given channel p(y|x)

Binary symmetric channel: 0 -> 0 and 1 -> 1 with probability 1-p, 0 -> 1 and 1 -> 0 with probability p (p = bit-flip probability).

I(X:Y) = H(Y) - H(Y|X)
= H(Y) - \sum_x p(x) H(Y|X=x)
= H(Y) - \sum_x p(x) H_2[p]
= H(Y) - H_2[p]
with H_2[p] = -(1-p)\log_2(1-p) - p\log_2 p.

For p(x) = {1/2, 1/2} we get p(y) = {1/2, 1/2}, hence H(Y) = 1, and
C = 1 - H_2[p] bits per use of the channel.

[Plot: capacity C (bits) of the binary symmetric channel versus bit-flip probability p, from p = 0 to p = 1.]


Quantum information theory

What if information is encoded into quantum states?

Bits b \in {0, 1} --> quantum bits {|0\rangle, |1\rangle}: a quantum state in a 2-dimensional Hilbert space, regardless of the physical support (e.g., spin 1/2, photon polarization).

Unique properties (compared to classical information):

- Superposition principle:
\alpha|0\rangle + \beta|1\rangle with |\alpha|^2 + |\beta|^2 = 1
Computational basis {|0\rangle, |1\rangle} (convention); dual basis |\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle).

- Non-orthogonal states are not perfectly distinguishable
\Rightarrow quantum data compression, accessible information, no-cloning principle.

- Non-classical correlations (entanglement):
|EPR\rangle = \frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle) = \frac{1}{\sqrt{2}}(|+\rangle|+\rangle + |-\rangle|-\rangle)
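
As a small numerical aside (my own sketch using numpy, not part of the slides), the two expressions for the EPR state written above can be checked to coincide: expanding (|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2} in the dual basis reproduces (|+\rangle|+\rangle + |-\rangle|-\rangle)/\sqrt{2}.

    # Verify |EPR> = (|00> + |11>)/sqrt(2) = (|++> + |-->)/sqrt(2).
    import numpy as np

    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])
    ketp = (ket0 + ket1) / np.sqrt(2)          # |+>
    ketm = (ket0 - ket1) / np.sqrt(2)          # |->

    epr_comp = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
    epr_dual = (np.kron(ketp, ketp) + np.kron(ketm, ketm)) / np.sqrt(2)

    print(np.allclose(epr_comp, epr_dual))     # True: the same entangled state in both bases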
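
Returning for a moment to the classical capacity derived earlier, the curve C = 1 - H_2[p] of the binary symmetric channel is easy to tabulate; the short sketch below (illustration only, the helper h2 is mine and not from the slides) prints a few values, showing C = 1 bit at p = 0 or p = 1 and C = 0 at p = 1/2.

    # Capacity of the binary symmetric channel, C = 1 - H2(p).
    from math import log2

    def h2(p: float) -> float:
        """Binary entropy H2[p] in bits, with the convention 0*log2(0) = 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -(1 - p) * log2(1 - p) - p * log2(p)

    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        print(f"p = {p:4.2f}   C = {1 - h2(p):.3f} bits/use")
    # C falls from 1 bit at p = 0 to 0 at p = 1/2, then rises back to 1 at p = 1
    # (a channel that always flips the bit is as good as a noiseless one).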