The Index of Coincidence

REF ID: A64687

CONFIDENTIAL
Modified Handling Authorized

NOTICE: This material contains information affecting the National Defense of the United States within the meaning of the espionage laws, Title 18, U. S. C., Sections 793 and 794, the transmission or the revelation of which in any manner to an unauthorized person is prohibited by law.

NATIONAL SECURITY AGENCY
WASHINGTON 25, D. C.
JANUARY 1955

FOREWORD

This edition is a revision of the paper published in 1946 by the Chief of Naval Operations, which was itself a revision of a pamphlet called "Coincidence", first published in 1929 and revised in 1930, 1939, and 1940 by the Navy Department.

HOWARD H. CAMPAIGNE
Assistant Chief for Research
Office of Research and Development
12 January 1955
TABLE OF CONTENTS

Section
I     Simple Monographic Comparison
II    Polygraphic Comparison
III   Theoretical Recapitulation
IV    Examples of Use
      (A) To Determine Whether Two Messages are in the Same Key
      (B) To Determine Where Two Messages Overlap
V     The Roughness of a Single Sample
VI    Examination of Cipher Alphabets and Cipher Texts and Coherence
VII   The Standard Deviation
VIII  The Cross I. C.
IX    The Coincidence Test used to Align Secondary Alphabets into a Primary Alphabet
X     The I. C. of the Modular Sum of Two Streams
XI    The Roughness of Mixed Texts
XII   The Relation between Chi-Square and the Gamma I. C.
XIII  The I. C. of a Rectangular Array
XIV   Conclusion and Problems

APPENDICES
I     The Relation between the Monographic and Digraphic I. C.
II    A Proof that the Expected Cross I. C. is 1
III   The Standard Deviation of the I. C.
IV    Proof of the Formula of Section X
V     Proof of the Mixture Formula
VI    I. C. Distribution and Tables
      1. Distribution of the γ or δ I. C. (χ² method)
      2. Distribution of the Cross I. C.
      3. Distribution of the i I. C.
      4. Tables of I. C.
      5. Short Cumulative Poisson Table
      6. Short Table of Logarithms of Factorials

INTRODUCTION

The subject of this pamphlet is coincidence. "Coincidence" as the term is used here may be defined as a recurrence of a letter in the same place, or in a corresponding place, as when two texts are lined up one under the other, letter for letter. Mathematical evaluation assists the cryptanalyst first in preparing his material for attack, and later in the actual attack itself. It assists specifically in answering the following questions.

1) How much like random, or how different from random, is this text?
2) How similar are these texts?
3) How significant is this variation from random?
4) How significant is this similarity?

I. Simple Monographic Comparison.

When examining cipher text for a break, the cryptanalyst keeps in mind "random" text as a standard for comparison.
This is text which has no meaning or system behind it, in which each letter can appear as often as any other, in which each digraph can appear as often as any other, and in which generally no significant pattern can appear except in small samples as the result of chance deviations. Text which appears to meet these conditions is sometimes described as "flat". Text which fails to be random in some way is called "rough".

A test has been devised which measures whether two texts are rough in the same way. This test is performed by writing the texts one above the other and counting the occasions on which the same letters come together, such as an E over an E; each such occasion is called a "coincidence". The ratio of the number of coincidences to the number of coincidences expected in random text is called the "index of coincidence", abbreviated as I. C.:

I. C. = actual coincidences / expected coincidences

If the two texts were random then a coincidence would occur once in 26 trials (for a 26 letter alphabet), or 3.85% of the time. If the two texts were English then there would be more coincidences, almost 7%. The percentage found divided by 3.85% is the I. C. Most European languages have an I. C. of about 2. For random text the I. C. is 1.

The expected I. C. for English can be computed as follows: Take two pages of English text. Make a chance selection from each page. There are about 130 chances in 1,000 of the first letter's being an "E" (see table following). There are about 130 chances in 1,000 of the second letter's being an "E". There are about 16,900 chances in 1,000,000 of both letters' being an "E". Likewise, there are 8,464 chances in 1,000,000 of both being "T", 6,400 chances in 1,000,000 of both being "N", etc.

Table (Telegraphic Text)

Text         Chances in 1,000     Chances in 1,000     Chances in 1,000,000
letter       of 1st letter's      of 2nd letter's      of both letters'
             being this letter    being this letter    being this letter

E                   130                  130                 16,900
O                    75                   75                  5,625
A                    74                   74                  5,476
I                    73                   73                  5,329
N                    80                   80                  6,400
R                    76                   76                  5,776
S                    61                   61                  3,721
T                    92                   92                  8,464
D                    42                   42                  1,764
H                    34                   34                  1,156
L                    36                   36                  1,296
C                    31                   31                    961
M                    25                   25                    625
U                    26                   26                    676
P                    27                   27                    729
F                    28                   28                    784
G                    16                   16                    256
Y                    19                   19                    361
B                    10                   10                    100
V                    15                   15                    225
W                    16                   16                    256
K                     4                    4                     16
J                     2                    2                      4
Q                     2                    2                      4
X                     5                    5                     25
Z                     1                    1                      1
Any letter        1,000                1,000                 66,930

Finally, there are 66,930 chances in 1,000,000 (the sum of the chances for the individual letters) of both letters' being the same letter in a chance selection. Therefore, if we select many pairs of plain text letters, the average number of identical letters to be expected "in the long run" will be 6.69% (about 1/15) of the total number of possible coincidences. We may call this number the expected coincidences in English text.

The expected I. C. for English (or monoalphabetic cipher text) is .0669/.0385 = 1.73.

The actual I. C. of unknown cipher text may take almost any value, but in practice on small samples the range will generally extend from about .80 to about 2.00 (simple monographic Index of Coincidence).

The value of the index of coincidence for a given English text will depend on the distribution of letters in that text. Repetitions in short texts will increase the index of coincidence. Text with few repetitions will give an I. C. approaching the theoretical 1.73. As the expected number of chance coincidences is based on a flat frequency (where each cipher letter is ultimately used the same number of times), any cipher text that differs radically from such a frequency distribution will have a correspondingly higher I. C. This is especially noticeable in short cipher texts where the frequencies have not had an opportunity to "flatten out". The monographic I. C. of English text will increase with small amounts of text to 1.80-2.00 (as compared with the theoretical 1.73), and small amounts of random text will give I. C.'s of 1.10-1.20 (as compared with the theoretical 1.00).
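The two computations described in this section can be sketched in Python as follows. This is an illustrative sketch only, not part of the pamphlet: the names FREQ_PER_1000, expected_ic and coincidence_ic are hypothetical, and the frequencies are the per-1,000 figures of the telegraphic-text table. Computed without intermediate rounding, 26 x .06693 comes to about 1.74; the text's figure of 1.73 comes from the rounded quotient .0669/.0385.

```python
# Letter frequencies per 1,000 letters, taken from the telegraphic-text table.
FREQ_PER_1000 = {
    "E": 130, "O": 75, "A": 74, "I": 73, "N": 80, "R": 76, "S": 61,
    "T": 92, "D": 42, "H": 34, "L": 36, "C": 31, "M": 25, "U": 26,
    "P": 27, "F": 28, "G": 16, "Y": 19, "B": 10, "V": 15, "W": 16,
    "K": 4, "J": 2, "Q": 2, "X": 5, "Z": 1,
}


def expected_ic(freq):
    """Expected I. C. for text with the given letter frequencies.

    The chance that two independent selections give the same letter is
    the sum of the squared letter probabilities; dividing by the random
    rate 1/n (n = alphabet size) gives the index of coincidence.
    """
    total = sum(freq.values())
    p_same = sum((count / total) ** 2 for count in freq.values())
    return p_same * len(freq)


def coincidence_ic(text_a, text_b, alphabet_size=26):
    """I. C. of two texts written one above the other.

    Non-letters are dropped from each text first (an assumption of this
    sketch); the texts are then compared position by position over the
    length of the shorter one.
    """
    a = [c for c in text_a.upper() if "A" <= c <= "Z"]
    b = [c for c in text_b.upper() if "A" <= c <= "Z"]
    trials = min(len(a), len(b))
    if trials == 0:
        raise ValueError("no letter positions to compare")
    actual = sum(1 for x, y in zip(a, b) if x == y)
    expected = trials / alphabet_size  # coincidences expected in random text
    return actual / expected
```

For example, coincidence_ic("ABCD", "ABXY") compares four positions, finds two coincidences against 4/26 expected, and so returns an I. C. of 13; samples this short illustrate how far the index can stray from its long-run value.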
