Vigenère Cipher – Friedman's Attack
Statistics 116 - Fall 2002
Breaking the Code
Lecture #5 – Vigenère Cipher – Friedman’s Attack
Jonathan Taylor

1 Index of Coincidence

In this lecture, we will discuss a more “automated” approach to guessing the keyword length in the Vigenère cipher. The procedure is based on what we will call the index of coincidence of two strings of text $S_1$ and $S_2$ of a given length $n$. The index of coincidence is just the proportion of positions in the strings where the corresponding characters of the two strings match. For the strings $S_1 = (s_{11}, s_{12}, \dots, s_{1n})$ and $S_2 = (s_{21}, s_{22}, \dots, s_{2n})$ it is defined as

\[ I(S_1, S_2) = \frac{1}{n} \sum_{i=1}^{n} \delta(s_{1i}, s_{2i}) \]

where the function

\[ \delta(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise.} \end{cases} \]

Now, in our situation we have to imagine that the string we observe is “random”. Suppose, then, that $S_1$ and $S_2$ are two random strings generated by choosing letters at random with the frequencies $(p_0, \dots, p_{25})$ and $(q_0, \dots, q_{25})$, respectively. If the strings are random we can talk about the “average” index of coincidence of the two strings $S_1$ and $S_2$. In statistics, if we have a random quantity $X$ we write $E(X)$ for the “expectation” or “average” of $X$.

It is not difficult to show that if $S_1$ and $S_2$ are strings of random letters with frequencies $(p_0, \dots, p_{25})$ and $(q_0, \dots, q_{25})$, respectively, then the average index of coincidence of these two strings is given by

\[ \sum_{i=0}^{25} p_i q_i. \tag{1} \]

This is because, for any of the letters in the string, the probability that the two characters match is the probability that they are both A’s, both B’s, ...
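The two quantities defined in this section can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names `index_of_coincidence` and `expected_index` are hypothetical, and the sample strings and the uniform frequencies are made-up inputs.

```python
from typing import Sequence


def index_of_coincidence(s1: str, s2: str) -> float:
    """Proportion of positions at which s1 and s2 carry the same character."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length n")
    if not s1:
        raise ValueError("strings must be non-empty")
    # The sum of delta(s1[i], s2[i]) over all positions, divided by n.
    matches = sum(1 for a, b in zip(s1, s2) if a == b)
    return matches / len(s1)


def expected_index(p: Sequence[float], q: Sequence[float]) -> float:
    """Average index of coincidence, the sum of p_i * q_i from formula (1)."""
    if len(p) != len(q):
        raise ValueError("frequency vectors must have the same length")
    return sum(pi * qi for pi, qi in zip(p, q))


# Illustrative inputs: the first three of five positions match.
print(index_of_coincidence("HELLO", "HELPS"))  # 0.6

# Assumed uniform frequencies over 26 letters give a value close to 1/26.
uniform = [1 / 26] * 26
print(expected_index(uniform, uniform))
```

With uniform letter frequencies the expected value is 1/26, about 0.0385, which is the value formula (1) gives for two strings of completely random letters.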