Appendix: Axiomatic Information Theory
The logic of secrecy was the mirror-image of the logic of information.
Colin Burke, 1994

Perfect security was promised at all times by the inventors of cryptosystems, particularly of crypto machines (Bazeries: "je suis indéchiffrable"). In 1949, Claude E. Shannon gave, in the framework of his information theory, a clean definition of what could be meant by perfect security. We show in the following that it is possible to introduce the cryptologically relevant part of information theory axiomatically.

Shannon was in contact with cryptanalysis, since he worked 1936-1938 in the team of Vannevar Bush, who developed the COMPARATOR for determination of character coincidences. His studies in the Bell Laboratories, going back to the year 1940, led to a confidential report (A Mathematical Theory of Cryptography) dated Sept. 1, 1945, containing, apart from the definition of Shannon entropy (Sect. 16.5), the basic relations to be discussed in this appendix. The report was published in 1949: Communication Theory of Secrecy Systems, Bell System Technical Journal 28, 656-715 (1949).

A.1 Axioms of an Axiomatic Information Theory

It is expedient to begin with events, i.e., sets X, Y, Z of elementary events, and with the uncertainty on events, a real number. More precisely, H_Y(X) denotes the uncertainty on X, provided Y is known. H(X) = H_∅(X) denotes the uncertainty on X, provided nothing is known.

A.1.1 Intuitively patent axioms for the real-valued binary set function H are

(0)  0 ≤ H_Y(X)   ("Uncertainty is nonnegative.")
     For 0 = H_Y(X) we say "Y uniquely determines X."

(1)  H_{Y∪Z}(X) ≤ H_Z(X)   ("Uncertainty decreases, if more is known.")
     For H_{Y∪Z}(X) = H_Z(X) we say "Y says nothing about X."

The critical axiom on additivity is

(2)  H_Z(X ∪ Y) = H_{Y∪Z}(X) + H_Z(Y)

This says that uncertainty can be built up additively over events.
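In the classical stochastic model introduced below (entropies of discrete random variables), the three axioms can be checked numerically. The following is a minimal sketch; the joint distribution of (X, Y, Z) is an arbitrary illustrative assumption, not taken from the text.

```python
from math import log2

# Arbitrary strictly positive joint distribution p_{X,Y,Z} over {0,1}^3
# (an illustrative assumption only).
raw = {(0, 0, 0): 4, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 3,
       (1, 0, 0): 1, (1, 0, 1): 5, (1, 1, 0): 2, (1, 1, 1): 2}
total = sum(raw.values())
joint = {k: v / total for k, v in raw.items()}

def marginal(indices):
    """Marginal distribution of the variables with the given indices."""
    m = {}
    for k, p in joint.items():
        key = tuple(k[i] for i in indices)
        m[key] = m.get(key, 0.0) + p
    return m

def H(ev, given=()):
    """Conditional entropy H_given(ev); ev and given are index tuples."""
    idx = tuple(sorted(set(ev) | set(given)))
    pj = marginal(idx)
    pg = marginal(tuple(sorted(given)))
    return -sum(p * log2(p / pg[tuple(k[idx.index(i)] for i in sorted(given))])
                for k, p in pj.items() if p > 0)

X, Y, Z = (0,), (1,), (2,)
assert H(X, given=Y) >= 0                                  # axiom (0)
assert H(X, given=Y + Z) <= H(X, given=Z) + 1e-12          # axiom (1)
assert abs(H(X + Y, given=Z)
           - (H(X, given=Y + Z) + H(Y, given=Z))) < 1e-12  # axiom (2)
```

Axiom (2) is exactly the chain rule of entropy in this model, which is why it holds with equality rather than merely approximately.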
The classical stochastic model for this axiomatic information theory is based on p_X(a) = Pr[X = a], the probability that the random variable X assumes the value a, and defines

H_∅({X}) = − Σ_{s: p_X(s)>0} p_X(s) · ld p_X(s)

H_∅({X} ∪ {Y}) = − Σ_{s,t: p_{X,Y}(s,t)>0} p_{X,Y}(s,t) · ld p_{X,Y}(s,t)

H_{Y}({X}) = − Σ_{s,t: p_{X/Y}(s/t)>0} p_{X,Y}(s,t) · ld p_{X/Y}(s/t)

(ld denotes the logarithm to base 2), where p_{X,Y}(a,b) =def Pr[(X = a) ∧ (Y = b)] and p_{X/Y}(a/b) obeys Bayes' rule for conditional probabilities:

p_{X,Y}(s,t) = p_Y(t) · p_{X/Y}(s/t) , thus  −ld p_{X,Y}(s,t) = −ld p_Y(t) − ld p_{X/Y}(s/t) .

A.1.2 From the axioms (0), (1), and (2), all the other properties usually derived for the classical model can be obtained. For Y = ∅, (2) yields

(2a)  H_Z(∅) = 0   ("There is no uncertainty on the empty event set.")

(1) and (2) imply

(3a)  H_Z(X ∪ Y) ≤ H_Z(X) + H_Z(Y)   ("Uncertainty is subadditive.")

(0) and (2) imply

(3b)  H_Z(Y) ≤ H_Z(X ∪ Y)   ("Uncertainty increases with a larger event set.")

From (2) and the commutativity of .∪. follows

(4)  H_Z(X) − H_{Y∪Z}(X) = H_Z(Y) − H_{X∪Z}(Y)

(4) suggests the following definition: The mutual information of X and Y under knowledge of Z is defined as

I_Z(X,Y) =def H_Z(X) − H_{Y∪Z}(X) .

Thus, the mutual information I_Z(X,Y) is a symmetric (and because of (1) nonnegative) function of the events X and Y. From (2),

I_Z(X,Y) = H_Z(X) + H_Z(Y) − H_Z(X ∪ Y) .

Because of (4), "Y says nothing about X" and "X says nothing about Y" are equivalent and are expressed by I_Z(X,Y) = 0. Another way of saying this is that under knowledge of Z, the events X and Y are mutually independent. In the classical stochastic model, this situation is given if and only if X, Y are independent random variables: p_{X,Y}(s,t) = p_X(s) · p_Y(t). I_Z(X,Y) = 0 is equivalent with the additivity of H under knowledge of Z:

(5)  I_Z(X,Y) = 0 if and only if H_Z(X) + H_Z(Y) = H_Z(X ∪ Y) .
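Property (5) can likewise be checked in the stochastic model. A minimal sketch, with illustrative example distributions (an assumption, not from the text):

```python
from math import log2

def H(p):
    """Shannon entropy -sum p · ld p of a distribution given as a dict."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def I(pXY, pX, pY):
    """Mutual information I(X,Y) = H(X) + H(Y) - H(X ∪ Y), as derived from (2)."""
    return H(pX) + H(pY) - H(pXY)

# Independent random variables: p_{X,Y}(s,t) = p_X(s) · p_Y(t)
pX = {0: 0.25, 1: 0.75}
pY = {'a': 0.5, 'b': 0.3, 'c': 0.2}
pXY = {(s, t): pX[s] * pY[t] for s in pX for t in pY}
# (5): independence is equivalent to H(X) + H(Y) = H(X ∪ Y), i.e. I(X,Y) = 0
assert abs(I(pXY, pX, pY)) < 1e-12

# A dependent pair (Y a function of X): here I(X,Y) = H(X) > 0
pXY2 = {(0, 'a'): 0.25, (1, 'b'): 0.75}
pY2 = {'a': 0.25, 'b': 0.75}
assert abs(I(pXY2, pX, pY2) - H(pX)) < 1e-12
```

The second pair shows the other extreme: when Y uniquely determines X and vice versa, the mutual information equals the full uncertainty H(X).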
A.2 Axiomatic Information Theory of Cryptosystems

For a cryptosystem X, events in the sense of abstract information theory are sets of finite texts over Z_m as an alphabet. Let P be a plaintext(-event), C a cryptotext(-event), K a keytext(-event).¹ The uncertainties H(K), H_C(K), H_P(K), H(C), H_P(C), H_K(C), H(P), H_K(P), H_C(P) are now called equivocations.

¹ Following a widespread notational misusage, in the sequel we replace {X} by X and {X} ∪ {Y} by X, Y; we also omit ∅ as subscript.

A.2.1 First of all, from (1) one obtains

H(K) ≤ H_P(K) ,  H(C) ≤ H_P(C) ,
H(C) ≤ H_K(C) ,  H(P) ≤ H_K(P) ,
H(P) ≤ H_C(P) ,  H(K) ≤ H_C(K) .

A.2.1.1 If X is functional, then C is uniquely determined by P and K; thus

(CRYPT)  H_{P,K}(C) = 0 , i.e., I_K(P,C) = H_K(C) ,  I_P(K,C) = H_P(C)
("Plaintext and keytext together allow no uncertainty on the cryptotext.")

A.2.1.2 If X is injective, then P is uniquely determined by C and K; thus

(DECRYPT)  H_{C,K}(P) = 0 , i.e., I_C(K,P) = H_C(P) ,  I_K(C,P) = H_K(P)
("Cryptotext and keytext together allow no uncertainty on the plaintext.")

A.2.1.3 If X is Shannon, then K is uniquely determined by C and P; thus

(SHANN)  H_{C,P}(K) = 0 , i.e., I_P(C,K) = H_P(K) ,  I_C(P,K) = H_C(K)
("Cryptotext and plaintext together allow no uncertainty on the keytext.")

A.2.2 From (4) follows immediately

H_K(C) + H_{C,K}(P) = H_K(P) + H_{P,K}(C) ,
H_P(C) + H_{C,P}(K) = H_P(K) + H_{P,K}(C) ,
H_C(P) + H_{P,C}(K) = H_C(K) + H_{C,K}(P) .

With (0) this gives

Theorem 1:
(CRYPT) implies H_K(C) ≤ H_K(P) ,  H_P(C) ≤ H_P(K) ,
(DECRYPT) implies H_C(P) ≤ H_C(K) ,  H_K(P) ≤ H_K(C) ,
(SHANN) implies H_P(K) ≤ H_P(C) ,  H_C(K) ≤ H_C(P) .

A.2.3 In a cryptosystem, X is normally injective, i.e., (DECRYPT) holds. In Fig. 163, the resulting numerical relations are shown graphically.

Fig. 163. Numerical equivocation relations for injective cryptosystems

In the classical professional cryptosystems, there are usually no homophones and the Shannon condition (2.6.4) holds. Monoalphabetic simple substitution and transposition are trivial, and VIGENERE, BEAUFORT, and in particular VERNAM are serious examples of such classical cryptosystems.

The conjunction of any two of the three conditions (CRYPT), (DECRYPT), (SHANN) has far-reaching consequences in view of the antisymmetry of the numerical relations:

Theorem 2:
(CRYPT) ∧ (DECRYPT) implies H_K(C) = H_K(P)
("Uncertainty on the cryptotext under knowledge of the keytext equals uncertainty on the plaintext under knowledge of the keytext."),
(DECRYPT) ∧ (SHANN) implies H_C(P) = H_C(K)
("Uncertainty on the plaintext under knowledge of the cryptotext equals uncertainty on the keytext under knowledge of the cryptotext."),
(CRYPT) ∧ (SHANN) implies H_P(K) = H_P(C)
("Uncertainty on the keytext under knowledge of the plaintext equals uncertainty on the cryptotext under knowledge of the plaintext.")

In Fig. 164, the resulting numerical relations for classical cryptosystems with (CRYPT), (DECRYPT), and (SHANN) are shown graphically.

Fig. 164. Numerical equivocation relations for classical cryptosystems

A.3 Perfect and Independent Key Cryptosystems

A.3.1 A cryptosystem is called a perfect cryptosystem if plaintext and cryptotext are mutually independent: I(P,C) = 0. This is equivalent to H(P) = H_C(P) and to H(C) = H_P(C) ("Without knowing the keytext: knowledge of the cryptotext does not change the uncertainty on the plaintext, and knowledge of the plaintext does not change the uncertainty on the cryptotext") and is, according to (5), equivalent to H(P,C) = H(P) + H(C).

A.3.2 A cryptosystem is called an independent key cryptosystem if plaintext and keytext are mutually independent: I(P,K) = 0.
This is equivalent to H(P) = H_K(P) and to H(K) = H_P(K) ("Without knowing the cryptotext: knowledge of the keytext does not change the uncertainty on the plaintext, and knowledge of the plaintext does not change the uncertainty on the keytext") and, according to (5), is equivalent to H(K,P) = H(K) + H(P).

Fig. 165. Numerical equivocation relations for classical cryptosystems, with properties perfect and independent key

A.3.3 Shannon also proved a pessimistic inequality.

Theorem 3: In a perfect classical cryptosystem (Fig. 165), H(P) ≤ H(K) and H(C) ≤ H(K).

Proof:
H(P) ≤ H_C(P)   (perfect)
H_C(P) ≤ H_C(K)   ((DECRYPT), Theorem 1)
H_C(K) ≤ H(K)   (1).
Analogously with (CRYPT) for H(C).

Thus, in a perfect classical cryptosystem, the uncertainty about the key is not smaller than the uncertainty about the plaintext, and not smaller than the uncertainty about the cryptotext.

From (SHANN) ∧ (DECRYPT) with Theorem 1 we find H_C(P) = H_C(K); after adding H(C) on both sides, according to (2) we get H(P,C) = H(K,C).
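A single character of VERNAM over Z_2 with a uniform key chosen independently of the plaintext is the standard example of a perfect cryptosystem. The sketch below (the plaintext distribution is an arbitrary illustrative assumption) checks I(P,C) = 0, the inequalities of Theorem 3, and the final identity H(P,C) = H(K,C):

```python
from math import log2

def H(p):
    """Shannon entropy of a distribution given as a dict."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

# One character of VERNAM over Z_2: C = P + K mod 2, key uniform and
# independent of the plaintext; pP is an illustrative assumption.
pP = {0: 0.9, 1: 0.1}
pK = {0: 0.5, 1: 0.5}
pC, pPC, pKC = {}, {}, {}
for s, ps in pP.items():
    for k, pk in pK.items():
        c = (s + k) % 2
        pC[c] = pC.get(c, 0.0) + ps * pk
        pPC[(s, c)] = pPC.get((s, c), 0.0) + ps * pk
        pKC[(k, c)] = pKC.get((k, c), 0.0) + ps * pk

# Perfect: I(P,C) = H(P) + H(C) - H(P ∪ C) = 0
assert abs(H(pP) + H(pC) - H(pPC)) < 1e-12
# Theorem 3: H(P) <= H(K) and H(C) <= H(K)
assert H(pP) <= H(pK) + 1e-12 and H(pC) <= H(pK) + 1e-12
# (SHANN) ∧ (DECRYPT): H_C(P) = H_C(K), hence H(P,C) = H(K,C)
assert abs(H(pPC) - H(pKC)) < 1e-12
```

Note that the cryptotext comes out exactly uniform, H(C) = H(K) = 1 bit, regardless of how biased the plaintext distribution is; this is precisely why the cryptotext reveals nothing about the plaintext.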