
Classical and Quantum Information Theory from Discrete to Continuous Variables

Nicolas J. Cerf
Centre for Quantum Information and Communication (QuIC), Ecole Polytechnique, Université Libre de Bruxelles, Belgium

QUANTUM INFORMATION, COMPUTATION, AND COMPLEXITY
Institut Henri Poincaré (IHP), Paris, January 4 -- April 7, 2006


Shannon's information theory

Claude E. Shannon, 1916-2001
A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379-423 and 623-656.
Communication theory of secrecy systems, Bell System Technical Journal 28 (1949) 656-715.

This work has laid out the entire foundation of today's information technology era.

1. Every kind of information (text, image, sound) can be associated with an information content, which quantifies how efficiently it can be represented with 0's and 1's.
2. Any imperfect communication channel (telephone line, radio channel, satellite link) has a capacity, which quantifies how much information can be transmitted reliably over the channel.


Source coding theorem (noiseless coding)

Q: What is the highest possible data compression rate?
A: Shannon's entropy H = f(source statistics).
0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 0 1 0 1 1 0 0 1 -> ... suppressing redundancy: "source code"

Channel coding theorem (noisy channel coding)

Q: What is the highest possible transmission rate?
A: Shannon's capacity C = f(noise statistics).
0 1 0 1 0 1 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0 0 1 -> ... adding redundancy: "error-correcting code"


Example of source coding

Source: random variable X distributed as p(x).

Entropy: H(X) \equiv -\sum_{x=1}^{S} p(x) \log_2 p(x), where S is the alphabet size.

p(x) = 1/2 if x = "A", 1/4 if x = "B", 1/8 if x = "C", 1/8 if x = "D".

Simple code: "A" -> 00, "B" -> 01, "C" -> 10, "D" -> 11, i.e. L = 2 bits/symbol.

The average code length obeys L \ge H(X), and here
H(X) = 1/2 + 2/4 + 3/8 + 3/8 = 7/4 bits < 2 bits.

Clever code: "A" -> 0, "B" -> 10, "C" -> 110, "D" -> 111, i.e. L = 7/4 bits/symbol.


Typical sequences

Entropy H(X) is thus the average length of the shortest description of X.
But can we always reach this irreducible compression? YES, using "typical" sequences of size n -> \infty.

For a sequence (X_1, ..., X_n) of independent random variables, each distributed as p(x),
-\frac{1}{n} \log_2 p(x_1, ..., x_n) = -\frac{1}{n} \log_2 \prod_{i=1}^{n} p(x_i) = -\frac{1}{n} \sum_{i=1}^{n} \log_2 p(x_i) \simeq \langle -\log_2 p(X) \rangle_{X \sim p(x)} \equiv H(X)

\Rightarrow p(x_1, ..., x_n) \simeq 2^{-nH(X)} for typical sequences.

Normalization: \sum_{\bar{x} \in S_{typ}} p(\bar{x}) \simeq 1 \Rightarrow |S_{typ}| \simeq 2^{nH(X)}

About nH(X) bits are sufficient, on average, to code a typical n-symbol sequence, so that L \simeq H(X) bits/symbol.


Example: binary source

p(0) = 1 - p = 3/4 and p(1) = p = 1/4, e.g. the n = 20-bit string 00001010000100110000.

A typical n-bit sequence contains about n(1-p) = 15 zeros and np = 5 ones.

Probability of emission:
p(x_1, ..., x_n) = (1-p)^{\#0's} \, p^{\#1's} \simeq (1-p)^{n(1-p)} \, p^{np} = 2^{n(1-p)\log_2(1-p) + np\log_2 p} \equiv 2^{-nH(X)}

Number of typical sequences (using n! \simeq 2^{n\log_2 n}):
|S_{typ}| \simeq \binom{n}{np} = \frac{n!}{(np)!\,(n(1-p))!} \simeq 2^{n\log_2 n - np\log_2(np) - n(1-p)\log_2(n(1-p))} = 2^{nH(X)}

Probability of emitting any typical sequence: \sim 2^{-nH(X)} \times 2^{nH(X)} \sim 1.

Note: the most frequent sequence, 00000000000000000000, is generally not typical!
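
As a quick numerical check of the four-symbol example above (my own sketch in Python, not part of the original slides), the snippet below computes the entropy H(X) and the average length of the "clever" prefix code; both come out to 7/4 bits/symbol, so the code meets the bound L \ge H(X) with equality.

    # Entropy and average code length for the four-symbol source of the slides;
    # the dictionaries below are illustrative helpers, not notation from the lecture.
    from math import log2

    p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}          # source statistics p(x)
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}  # "clever" prefix code

    H = -sum(px * log2(px) for px in p.values())          # H(X) = 1.75 bits
    L = sum(p[x] * len(code[x]) for x in p)               # average length = 1.75 bits/symbol

    print(H, L)   # 1.75 1.75: the prefix code saturates L >= H(X)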
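
The counting argument for the binary source can also be checked numerically. The sketch below (again my own illustration, not from the slides) compares (1/n) \log_2 \binom{n}{np}, i.e. the number of bits per symbol needed to index the typical set, with the binary entropy H_2(p) for p = 1/4; the agreement is rough at n = 20 and tightens as n grows.

    # Size of the typical set of an n-bit string with about np ones, versus 2^{nH(X)}.
    from math import comb, log2

    p = 1/4
    H2 = -(1 - p) * log2(1 - p) - p * log2(p)   # binary entropy, about 0.811 bits

    for n in (20, 100, 1000, 10000):
        k = round(n * p)                        # typical number of ones
        rate = log2(comb(n, k)) / n             # bits/symbol needed to index S_typ
        print(n, round(rate, 4), round(H2, 4))
    # The rate climbs toward H2(p) ~ 0.811, so about nH(X) bits suffice per typical sequence.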
Remarkably, S_{typ} is only an exponentially small part of the set of all sequences, while it carries arbitrarily large probability. Since 0 \le H(X) \le \log_2 S, generally
|S_{typ}| \simeq 2^{nH(X)} \ll 2^{n\log_2 S} = S^n

- The typical set is sufficiently large to contain most of the probability:
\forall \epsilon > 0, \delta > 0, \exists n and S_{typ} such that |S_{typ}| < 2^{n[H(X)+\delta]} and \sum_{\bar{x} \in S_{typ}} p(\bar{x}) > 1 - \epsilon

- The typical set is the smallest set that contains most of the probability:
\forall \epsilon > 0, \delta > 0, \forall n: |S| < 2^{n[H(X)-\delta]} \Rightarrow \sum_{\bar{x} \in S} p(\bar{x}) < \epsilon


Instantaneous (variable-length) codes

Code C: symbol x -> codeword C(x) of length l(x).
C is an instantaneous code if no C(x) is a prefix of another C(x') with x \ne x'.

"A" -> 0, "B" -> 10, "C" -> 110, "D" -> 111: the stream 10, 0, 0, 110, 0, 111, 10, ... can be decoded symbol by symbol as it arrives.

Kraft inequality: \sum_{x=1}^{S} 2^{-l(x)} \le 1

[Decoding tree: "A" uses 2^{-1} = 1/2 of the tree, "B" uses 2^{-2} = 1/4, "C" uses 2^{-3} = 1/8, and "D" uses 2^{-3} = 1/8.]

The Kraft inequality is a necessary and sufficient condition for the existence of an instantaneous code.


Optimal instantaneous code

Minimize \sum_{x=1}^{S} p(x) l(x) over the lengths l(x), with the constraint \sum_{x=1}^{S} 2^{-l(x)} = 1. Using a Lagrange multiplier,
\mathcal{L} = \sum_{x} p(x) l(x) + \lambda \sum_{x} 2^{-l(x)}

\forall x: \frac{\partial \mathcal{L}}{\partial l(x)} = 0 \Rightarrow p(x) - \lambda \, 2^{-l(x)} \ln 2 = 0 \Rightarrow 2^{-l(x)} = \frac{p(x)}{\lambda \ln 2}

\Rightarrow l_{opt}(x) = -\log_2 p(x) (assuming it is an integer)
\Rightarrow L_{opt} = \sum_x p(x) l_{opt}(x) = -\sum_x p(x) \log_2 p(x) \equiv H(X)

Block coding: typical size-n sequences become almost equiprobable, p(x_1, ..., x_n) \simeq 2^{-nH(X)}, so
l_{opt}(x_1, ..., x_n) \simeq nH(X) \simeq integer and L_{opt} \simeq nH(X) bits/block.


No further compression?

Relative entropy (Kullback "distance"):
D(p \| q) = \sum_{x=1}^{S} p(x) \log_2 \frac{p(x)}{q(x)}, with p(x), q(x) probability distributions.

D(p \| q) = \left\langle -\log_2 \frac{q(x)}{p(x)} \right\rangle_{X \sim p(x)}
\ge -\log_2 \left\langle \frac{q(x)}{p(x)} \right\rangle_{X \sim p(x)}   (concavity of the log)
= -\log_2 \sum_x p(x) \frac{q(x)}{p(x)} = -\log_2 \sum_x q(x) = 0

\Rightarrow D(p \| q) \ge 0, with D(p \| q) = 0 iff \forall x: p(x) = q(x).

L - H(X) = \sum_x p(x) l(x) + \sum_x p(x) \log_2 p(x)
= \sum_x p(x) \log_2 \frac{p(x)}{2^{-l(x)}}
= \sum_x p(x) \log_2 \left[ \frac{p(x)}{q(x)} \times \frac{1}{\sum_y 2^{-l(y)}} \right], with q(x) = \frac{2^{-l(x)}}{\sum_y 2^{-l(y)}} a normalized probability distribution
= D(p \| q) - \log_2 \sum_y 2^{-l(y)}
\ge 0, since D(p \| q) \ge 0 and \sum_y 2^{-l(y)} \le 1 (Kraft).

The average length L of any instantaneous code cannot be lower than H(X).
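
To make the instantaneous-decoding property above concrete, here is a tiny decoder (an illustrative sketch of mine, not code from the lecture) for the prefix code "A" -> 0, "B" -> 10, "C" -> 110, "D" -> 111: since no codeword is a prefix of another, each symbol can be emitted as soon as its codeword is recognized, with no look-ahead.

    # Decoding a prefix (instantaneous) code on the fly.
    code = {"A": "0", "B": "10", "C": "110", "D": "111"}
    inv = {w: s for s, w in code.items()}     # codeword -> symbol

    def decode(bits: str) -> str:
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in inv:                    # no codeword is a prefix of another,
                out.append(inv[buf])          # so a match can be output immediately
                buf = ""
        return "".join(out)

    print(decode("100011001110"))             # BAACADA, recovered symbol by symbol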
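
The identity L - H(X) = D(p\|q) - \log_2 \sum_y 2^{-l(y)} derived just above is easy to verify numerically. The sketch below (my own illustration, not from the slides) does so for the four-symbol source with the "simple" two-bit code: the Kraft sum equals 1, so the redundancy of 0.25 bits/symbol is exactly D(p\|q).

    # Check L - H(X) = D(p||q) - log2(sum_y 2^{-l(y)}) for the fixed-length code.
    from math import log2

    p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
    l = {"A": 2, "B": 2, "C": 2, "D": 2}                  # codeword lengths (simple code)

    kraft = sum(2 ** -l[x] for x in l)                    # Kraft sum, must be <= 1
    q = {x: 2 ** -l[x] / kraft for x in l}                # normalized distribution q(x)

    H = -sum(px * log2(px) for px in p.values())
    L = sum(p[x] * l[x] for x in p)
    D = sum(p[x] * log2(p[x] / q[x]) for x in p)          # relative entropy D(p||q)

    print(L - H, D - log2(kraft))    # 0.25 0.25: the redundancy equals D(p||q) here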
Noisy channel coding

X -> [channel p(y|x)] -> Y, with X = input symbol, Y = output symbol, p(y|x) = transition probability.

Naive picture: to reduce the error probability P_e, either increase the signal power S or decrease the noise power N.

Shannon theory: as soon as the rate R \le C, an arbitrarily low P_e is achievable in the limit of large blocks n -> \infty.

Capacity C = B \log_2(1 + S/N) (B = bandwidth).

[Block diagram: ENC -> X_1 ... X_n -> CHANNEL -> Y_1 ... Y_n -> DEC]


Conditional and mutual entropy

H(X) \equiv -\sum_x p(x) \log_2 p(x) = uncertainty about X

H(X|Y) \equiv \sum_y p(y) H(X|Y=y) = uncertainty about X once Y is known, with H(X|Y=y) = -\sum_x p(x|y) \log_2 p(x|y).

\Rightarrow H(X|Y) = -\sum_{x,y} p(y) p(x|y) \log_2 \frac{p(x,y)}{p(y)}
= -\sum_{x,y} p(x,y) \log_2 p(x,y) + \sum_y p(y) \log_2 p(y)
= H(X,Y) - H(Y)

H(X|Y) = H(X,Y) - H(Y) = loss of information over the channel X -> Y:
for a perfect channel, H(X|Y) = 0; for a completely random channel, H(X|Y) = H(X).

I(X:Y) \equiv H(X) - H(X|Y) = mutual information = H(X) + H(Y) - H(X,Y)

[Venn diagram: H(X) and H(Y) overlap in I(X:Y); the remaining parts are H(X|Y) and H(Y|X).]

I(X:Y) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)p(y)} = D\big(p(x,y) \,\|\, p(x)p(y)\big) \ge 0

X and Y independent \Leftrightarrow I(X:Y) = 0, so I(X:Y) measures the statistical dependence between X and Y.

Subadditivity of entropies:
I(X:Y) = H(X) + H(Y) - H(X,Y) \ge 0, thus H(X,Y) \le H(X) + H(Y).

Strong subadditivity of entropies:
I(X:Y|Z) = \sum_z p(z) \, I(X:Y|Z=z) \ge 0, with I(X:Y|Z=z) = \sum_{x,y} p(x,y|z) \log_2 \frac{p(x,y|z)}{p(x|z)p(y|z)}

I(X:Y|Z) = \sum_{x,y,z} p(x,y,z) \log_2 \frac{p(x,y,z)\,p(z)}{p(x,z)\,p(y,z)} = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)

thus H(X,Y,Z) + H(Z) \le H(X,Z) + H(Y,Z).


Interpretation of strong subadditivity

H(X,Y,Z) + H(Z) \le H(X,Z) + H(Y,Z)
Adding H(X) to both sides and rearranging,
H(X) + H(Z) - H(X,Z) \le H(X) + H(Y,Z) - H(X,Y,Z)
\Rightarrow I(X:Z) \le I(X:Y,Z)

Discarding the variable Y can only decrease the mutual information with X.


Shannon capacity

C = \max_{p(x)} I(X:Y) for a given channel p(y|x)

Binary symmetric channel: 0 -> 0 and 1 -> 1 with probability 1-p, 0 -> 1 and 1 -> 0 with probability p (p = bit-flip probability).

I(X:Y) = H(Y) - H(Y|X)
= H(Y) - \sum_x p(x) H(Y|X=x)
= H(Y) - \sum_x p(x) H_2[p]
= H(Y) - H_2[p]
with H_2[p] = -(1-p)\log_2(1-p) - p\log_2 p.

For p(x) = {1/2, 1/2} we get p(y) = {1/2, 1/2}, hence H(Y) = 1, and
C = 1 - H_2[p] bits per use of the channel.

[Plot: capacity C (bits) of the binary symmetric channel versus bit-flip probability p, from p = 0 to p = 1.]


Quantum information theory

What if information is encoded into quantum states?

Bits b \in {0, 1} --> quantum bits {|0\rangle, |1\rangle}: a quantum state in a 2-dimensional Hilbert space, regardless of the physical support (e.g., spin 1/2, photon polarization).

Unique properties (compared to classical information):

- Superposition principle:
\alpha|0\rangle + \beta|1\rangle with |\alpha|^2 + |\beta|^2 = 1
Computational basis {|0\rangle, |1\rangle} (convention); dual basis |\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle).

- Non-orthogonal states are not perfectly distinguishable
\Rightarrow quantum data compression, accessible information, no-cloning principle.

- Non-classical correlations (entanglement):
|EPR\rangle = \frac{1}{\sqrt{2}}(|0\rangle|0\rangle + |1\rangle|1\rangle) = \frac{1}{\sqrt{2}}(|+\rangle|+\rangle + |-\rangle|-\rangle)
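
As a small numerical aside (my own sketch using numpy, not part of the slides), the two expressions for the EPR state written above can be checked to coincide: expanding (|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2} in the dual basis reproduces (|+\rangle|+\rangle + |-\rangle|-\rangle)/\sqrt{2}.

    # Verify |EPR> = (|00> + |11>)/sqrt(2) = (|++> + |-->)/sqrt(2).
    import numpy as np

    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])
    ketp = (ket0 + ket1) / np.sqrt(2)          # |+>
    ketm = (ket0 - ket1) / np.sqrt(2)          # |->

    epr_comp = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
    epr_dual = (np.kron(ketp, ketp) + np.kron(ketm, ketm)) / np.sqrt(2)

    print(np.allclose(epr_comp, epr_dual))     # True: the same entangled state in both bases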
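
Returning for a moment to the classical capacity derived earlier, the curve C = 1 - H_2[p] of the binary symmetric channel is easy to tabulate; the short sketch below (illustration only, the helper h2 is mine and not from the slides) prints a few values, showing C = 1 bit at p = 0 or p = 1 and C = 0 at p = 1/2.

    # Capacity of the binary symmetric channel, C = 1 - H2(p).
    from math import log2

    def h2(p: float) -> float:
        """Binary entropy H2[p] in bits, with the convention 0*log2(0) = 0."""
        if p in (0.0, 1.0):
            return 0.0
        return -(1 - p) * log2(1 - p) - p * log2(p)

    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        print(f"p = {p:4.2f}   C = {1 - h2(p):.3f} bits/use")
    # C falls from 1 bit at p = 0 to 0 at p = 1/2, then rises back to 1 at p = 1
    # (a channel that always flips the bit is as good as a noiseless one).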