Cryptography
Prof. Dr. Carsten Damm, Dr. Henrik Brosenne
University of Goettingen, Institute of Computer Science
Winter 2013/2014

Table of Contents
- Elementary Cryptanalysis
- Classification of Cryptanalytic Attacks
- Stochastic structure of natural language - Part 1
- Cryptanalysis by Frequency Analysis
- Breaking the Vigenere cipher
- Statistical Measures
- Cryptanalysis of Transposition Ciphers

Starring
- Alice = first person in all protocols (initiator)
- Bob = second person in all protocols
- Eve = an eavesdropper, i.e., a passive attacker
- Mallory = a malicious active attacker

In this chapter we study passive attacks. Eve tries to get information about the plaintext while observing only the ciphertext messages in a cryptographic protocol. All attacks rely on a fixed cryptosystem $(E, D)$.

Ciphertext-only attack
The ciphertext-only attack is the type of attack we will study in this chapter.
- given: ciphertexts $C_1 = E_K(M_1), \dots, C_t = E_K(M_t)$ of several messages, all generated by the same cipher $E_K$
- wanted: an algorithm to infer $M_{t+1}$ from $C_{t+1} = E_K(M_{t+1})$
- weaker: recover some information about $M_1, \dots, M_t$
- stronger: recover the key $K$ (or at least information about it)

Known plaintext attack
- additionally given: $M_1, \dots, M_t$
- scenario: disclosure of formerly classified documents

Chosen plaintext attack
- instead given: (limited) access to the cipher $E_K$, so that the analyst can choose $M_1, \dots, M_t$ and generate the corresponding ciphertexts $C_1 = E_K(M_1), \dots, C_t = E_K(M_t)$
- scenario: a spy who is able to plant some specially prepared messages on the Enigma operator

Adaptive-chosen-plaintext attack
- special variant of the chosen plaintext attack: the attacker does not need to fix the chosen plaintexts in advance, but can watch the outcome of chosen plaintext encryptions and choose the next one(s) based on that
- scenario: before World War II, Polish cryptanalysts were in possession of a copy of the Enigma machine (http://en.wikipedia.org/wiki/Biuro Szyfrow)

Stochastic structure of natural language - Part 1

Published Worksheet
Published worksheet 04 stochastic structure of natural languages part1.

Simple observations
Well known: each language (English, German, ...) has statistical characteristics that can be used to differentiate between various text sources.
- frequencies of letters and words
- frequencies of pairs, triples, ..., n-grams, or more general patterns
- starting/ending letters of words, starting/ending words of sentences
- lengths of words/sentences
- ...
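
As an illustration of the first two kinds of statistics listed above, here is a minimal Python sketch that computes relative frequencies of letter n-grams. The function name `ngram_frequencies` and the sample sentence are assumptions made for this example, not taken from the lecture or its worksheet. The sketch drops all non-letters, so n-grams may span word boundaries.

```python
from collections import Counter

def ngram_frequencies(text, n=1):
    """Return relative frequencies of letter n-grams in `text`.

    Case is ignored and everything outside A-Z is dropped, so n-grams
    may run across word boundaries (a simplification of this sketch).
    """
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}

# Illustrative sample only; real statistics need a much longer text.
sample = "The quick brown fox jumps over the lazy dog. The end."
letter_freq = ngram_frequencies(sample, 1)
pair_freq = ngram_frequencies(sample, 2)

# Letters sorted by decreasing frequency (a "frequency ordered alphabet").
ordered = sorted(letter_freq, key=letter_freq.get, reverse=True)
```

On a sample this short the ordering is not meaningful; the stability of such statistics for long texts is what makes them usable for cryptanalysis.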

Letter frequencies of typical English text samples (in percent)

  A    B    C    D    E     F    G    H    I    J    K    L    M
  7.3  0.9  3.0  4.4  13.0  2.8  1.6  3.5  7.4  0.2  0.3  3.5  2.5

  N    O    P    Q    R    S    T    U    V    W    X    Y    Z
  7.8  7.4  2.7  0.3  7.7  6.3  9.3  2.7  1.3  1.6  0.5  1.9  0.1

- heavy vowels: {E, I, O, A} = more than 1/3
- heavy consonants: {T, N, R, S} = almost 1/3
- low frequency symbols: {J, K, Q, X, Z} = less than 2/100

Popular frequency ordered alphabets (cited from F.L.Bauer: Entzifferte Geheimnisse)
English (various sources)
- etaoins(h)r dlucmfwypvbgkqjxz (1884)
- etoanirs hdlcufmpywgbvkxjqz (1893)
- etaoinsr hldcumfpgwybvkxjqz (1982)
German (various sources)
- enrisdutaghlobmfzkcwvjpqxy (1840)
- enirsahtudlcgmwfbozkpjvqxy (1863)
- enisratduhglcmwobfzkvpjqxy (1955)

Artificial text samples
- one can generate random text by drawing symbols according to the symbol frequencies in genuine text sources (0th order Markov source)
- better: Shannon's method (gives a 1st order Markov source)
  1. take a large text sample (typical of the language)
  2. select a random cursor position, σ = symbol at cursor
  3. output σ, select a random cursor position
  4. locate the first occurrence of σ after the cursor
  5. set σ to the character following that occurrence
  6. go back to 3, or STOP
- see published worksheet for illustration
- can be extended to 2nd, 3rd, ... order sources

Law of large numbers
- wanted: a suitable mathematical model for plaintext sources
- a stochastic source over an alphabet $A$ is a device that randomly emits "infinite texts" $X = X_1 X_2 \dots \in A^\omega$
- the source is called memoryless if for every symbol $a$ the probability $P(X_n = a) =: p_a$ is independent of $n$ and of all previous or future symbols emitted
- let $N_n(a; X)$ denote the number of positions $i \le n$ with $X_i = a$ in the prefix $X_1 X_2 \dots X_n$, and let $f_n(a; X) := N_n(a; X)/n$ be the relative frequency of $a$ in that prefix

Theorem
If $X$ is a random emission from a memoryless source with symbol probabilities $(p_a)_{a \in A}$, then with probability 1
$\lim_{n \to \infty} f_n(a; X) = p_a$.

- this law also holds for the relative frequencies of pairs, triples, ..., and more general "patterns" in the prefix
- important: the longer the text sample, the more stable its stochastic features are in terms of pattern frequency

Ergodic sources
- a source is called stationary if the probability of occurrence of arbitrary "patterns" at position $n$ of $X$ is independent of $n$
- generalization of memoryless sources: a source is called ergodic if it is stationary and the law of large numbers holds for arbitrary patterns
- natural language sources are "close to" ergodic sources
- one feature is that for an ergodic source the (infinite) emission is "almost surely typical" (where typicality has a precise mathematical meaning that we will discuss later)

Exercise 11
1. Implement a digram counter for text data and try to find some important "heavy pairs" by testing various text samples.
2. Extend this to triples (going much further probably doesn't make much sense for cryptanalysis).

Cryptanalysis by Frequency Analysis

Published Worksheet
Published worksheet 04 cryptanalysis by frequency analysis.

Breaking a simple substitution cipher
Ciphertext from a simple substitution cipher:

QWMMPQDVKUVFDTXJQVDBOPIDUHDQQUGDLAMWJGXBGURRBPBURMKULDVX
OOKUJUOVDJQDGBWHLDJQQMUODQUBIMWBOVWUVXPBUBIOKUBGXBGURROK
UJUOVDJQVPWMMDJOUQDVKDBVKDCDAQXEDFKXOKLPWBIQVKDQDOWJXVAP
TVKDQAQVDHXQURMKULDVXOOKUJUOVDJQVKDJDTPJDVKDVPVURBWHLDJP
TCDAQXQPTDBPJHPWQQXEDBDNDJVKDRDQQFDFXRRQDDVKUVQXHMRDQWLQ
VXVWVXPBXQNDJAQWQODMVXLRDVPOJAMVUBURAVXOUVVUOCQ

- the most frequent cipher symbols are D, V, Q, U, O, J, K, B (conjecture: these correspond to the heavy symbols)
- it looks like the cipher takes $E \mapsto D$ and $T \mapsto V$ or $T \mapsto Q$
- the rarest are E, N, S, Y, Z (conjecture: these correspond to the low frequency symbols)

Exercise 12
1. Complete the analysis of this ciphertext.
Hint: It is useful to write recovered plaintext letters in lower case in the ciphertext, e.g. substituting e for D gives
QWMMPQeVKUVFeTXJQVeBOPIeUHeQQUGeLAMWJGXBGURRBPBURMKULeVX...
Once several letters have been identified, it may help to first ignore the unidentified ones (as in the fictitious example below) and make a good guess.
t.etopo.t.et.reetreesisato..o.oneo.t.reetree.

1. Using a brute force attack is an option for Caesar ciphers. Suggest a method to avoid it. Implement it in Sage.
2. Using a brute force attack is an option for affine ciphers. Suggest a method to avoid it. Try to implement it in Sage.
3. Implement a digram counter for text data and try to find some important "heavy pairs" by testing various text samples.

Analysis of Vigenère Ciphers
- we consider Vigenère ciphers as synonymous with periodic substitution ciphers on the standard alphabet with a "short period"
- the methods apply in principle to any periodic substitution cipher, but are probably not powerful enough to break the Enigma or similar ciphers

The column trick
If $(E, D)$ is a polyalphabetic cryptosystem and, for a specific key, the cipher $E_K$ has period $\ell$, then each of the "plaintext columns"

$M^{(1)} = M_1 M_{1+\ell} M_{1+2\ell} \dots$
$M^{(2)} = M_2 M_{2+\ell} M_{2+2\ell} \dots$
$\dots$
$M^{(\ell)} = M_\ell M_{2\ell} M_{3\ell} \dots$

is enciphered by a single monoalphabetic cipher, i.e. all symbols within one column are enciphered by the same substitution.
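
The column trick can be checked on a toy example. The following Python sketch is only an illustration under stated assumptions: the helper names `vigenere_encrypt` and `columns`, the key `KEY`, and the sample plaintext are all hypothetical and not part of the lecture. Note that the code counts columns from 0, while the slides count from 1.

```python
import string

ALPHABET = string.ascii_uppercase

def vigenere_encrypt(plaintext, key):
    """Toy Vigenere cipher on A-Z: letter i is shifted by key[i mod len(key)]."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)

def columns(text, period):
    """Split text into `period` columns; column k holds positions k, k+period, ..."""
    return [text[k::period] for k in range(period)]

plaintext = "ATTACKATDAWNATTACKATDUSK"  # hypothetical sample text
key = "KEY"                             # hypothetical key, so the period is 3
ciphertext = vigenere_encrypt(plaintext, key)

for k, (m_col, c_col) in enumerate(zip(columns(plaintext, 3), columns(ciphertext, 3))):
    # Collect the shifts applied inside column k. A single value means the
    # whole column was enciphered by one Caesar (monoalphabetic) cipher.
    shifts = {(ALPHABET.index(c) - ALPHABET.index(m)) % 26
              for m, c in zip(m_col, c_col)}
    assert shifts == {ALPHABET.index(key[k])}
```

In a real attack only the ciphertext columns are available, and the period has to be estimated first.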
- the corresponding ciphertext columns $C^{(1)}, \dots, C^{(\ell)}$ can be deciphered as simple substitution ciphers
- in particular:
  - the symbol distributions in the columns are permuted versions of the source language symbol distribution
  - the symbol distributions, sorted in decreasing order, are all very similar

Frequency analysis of periodic ciphers
Observation: periodic ciphers destroy the stochastic structure of the source language; the distribution looks "more random" than that of normal source language.
- the first task for the cryptanalyst is to determine the period
- there are several methods of estimating the period
- often a combination has to be applied

Decimation of a sequence
- given a sequence $S = s_0 s_1 s_2 \dots$ of symbols and a positive integer $\ell$ (the period)
- for $0 \le k < \ell$ the $k$-th decimation of $S$ is the sequence $S_k^{(\ell)} := s_k s_{k+\ell} s_{k+2\ell} \dots$
- decimating a sequence is a kind of downsampling

Idea
If $m$ is a candidate period, consider and compare the decimated symbol distributions:
- compare them to "typical" source language distributions
- compare the decimations among each other (e.g., by bar charts, if you have no other idea)
- more efficient: compare numerical parameters of the distributions, such as expectation of rank, variance of rank, entropy, index of coincidence (see below)

Reminder on entropy
- binary entropy: $h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$, maximum at $p = 0.5$ (uniform distribution)
- general entropy: $H(p_1, \dots, p_N) = -\sum_{i=1}^{N} p_i \log_2 p_i$, maximum at $p_1 = p_2 = \dots = p_N = \frac{1}{N}$ (uniform distribution)

Fact. The "more uncertain" a distribution, the larger the entropy.

Remark
Symbol distributions of natural languages (or programming source code or ...) are pretty predictable, i.e. they should have small entropy values.

Kasiski's method
Kasiski (1805-1881) was a Prussian officer (http://en.wikipedia.org/wiki/Friedrich