
EE 595 (PMP) Advanced Topics in Communication Theory Handout #1

Introduction to Cryptography. Symmetric Encryption.1 Wednesday, January 13, 2016

Tamara Bonaci Department of Electrical Engineering University of Washington, Seattle

Outline:

1. Review - Security goals
2. Terminology
3. Secure communication
   – Symmetric vs. asymmetric setting
4. Background: Modular arithmetic
5. Classical cryptosystems
   – The shift cipher
   – The substitution cipher
6. Background: The Euclidean algorithm
7. More classical cryptosystems
   – The affine cipher
   – The Vigenère cipher
   – The Hill cipher
   – The permutation cipher
8. Cryptanalysis
   – Kerckhoffs’s principle
   – Types of cryptographic attacks
   – Cryptanalysis of the shift cipher
   – Cryptanalysis of the affine cipher
   – Cryptanalysis of the Vigenère cipher
   – Cryptanalysis of the Hill cipher

Review - Security goals

Last lecture, we introduced the following security goals:

1. Confidentiality - ability to keep information secret from all but authorized users.
2. Data integrity - property ensuring that messages to and from a user have not been corrupted by communication errors or unauthorized entities on their way to a destination.
3. Identity authentication - ability to confirm the unique identity of a user.
4. Message authentication - ability to undeniably confirm message origin.
5. Authorization - ability to check whether a user has permission to conduct some action.
6. Non-repudiation - ability to prevent the denial of previous commitments or actions (think of a contract).
7. Certification - endorsement of information by a trusted entity.

In the rest of today’s lecture, we will focus on three of these goals, namely confidentiality, integrity and authentication (CIA). In doing so, we begin by introducing the necessary terminology.

1 We thank Professors Radha Poovendran and Andrew Clark for the help in preparing this material.

1 Terminology

1.1 Cryptology, cryptography and cryptanalysis

Cryptology is an all-inclusive term for the study of communication over insecure and unreliable communication channels. Cryptography is the algorithmic process of designing communication systems capable of realizing secure communication, and cryptanalysis is the process of analyzing cryptosystems for the purpose of breaking the secrecy of the communication.

1.2 Communication channel and communicating parties

A communication channel is a physical medium over which communication occurs. It can be either wired (e.g. copper wire, optic fiber) or wireless (e.g. radio). Communicating parties are entities wishing to (secretly) communicate. In the case of two-party communication, they are often referred to as (A)lice and (B)ob. Communicating parties can employ different modes of communication, such as:
– Unicast: One-to-one or point-to-point communication.
– Multicast: One-to-many or point-to-multipoint communication.
– Broadcast: One-to-any or point-to-any point communication.2

1.3 Attacker

An attacker, adversary or opponent is an entity from which the communicating parties wish to conceal information. We typically differentiate between two types of attackers:
– Eavesdropper - an attacker passively observing the communication channel (referred to as Eve).
– Active attacker - an attacker actively trying to manipulate (or decrypt) the communicated information (referred to as Mallory).

1.4 Plaintext and Ciphertext Plaintext is any information that Alice may wish to communicate to Bob. It can be text, numerical data, or anything else. Ciphertext is defined as a message that is transmitted over an insecure channel after the plaintext has been encrypted. Decryption of the ciphertext using the correct decryption algorithm and decryption key should produce the corresponding plaintext.

1.5 Encryption and Decryption

Encryption is defined as the process of creating a ciphertext from a plaintext by using an encryption key K and following an encryption rule (algorithm) eK. Similarly, decryption is the process of obtaining the plaintext from a ciphertext by using a decryption key K and following a decryption rule (algorithm) dK. The encryption/decryption key is a secret shared by the communicating parties that is used in cryptographic operations. Thus, a cryptographic system can formally be described as follows.

1.6 Formal description of a cryptosystem

A cryptosystem is a 5-tuple (P, C, K, E, D).
1. P is the set of possible plaintexts.
2. C is the set of possible ciphertexts.
3. K is the set of possible keys.
4. E is the encryption rule set.
5. D is the decryption rule set.

2 Multicast can be seen as a special case of broadcast communication.

Let x ∈ P, K ∈ K. Encryption is a rule eK ∈ E, and decryption is a rule dK ∈ D. In order to have a well-defined cryptosystem, we require:

dK(eK(x)) = x. (1)

In words, decryption should recover the original plaintext.

2 Secure Communication

Let’s now assume that Alice wants to send a message to Bob over an insecure channel, and that neither Alice nor Bob want this information to be readable by any other parties.

1. What does Alice do?
   (a) Alice takes plaintext x = x1x2 . . . xn for some n ≥ 1, where ∀i, xi ∈ P, and encrypts it with a key K ∈ K, using the encryption rule eK to generate ciphertext (cipher) y = y1y2 . . . yn.
   (b) She then transmits the cipher y over the insecure channel.

2. What does Bob do?
   (a) Bob knows the key K and the decryption algorithm dK.
   (b) He receives the cipher y and runs decryption.
   (c) Bob recovers the plaintext x = dK(y).

2.1 Symmetric vs. Asymmetric Setting

Encryption/decryption algorithms can broadly be divided into two groups: symmetric key and public key.

In symmetric key algorithms, the encryption and decryption keys are known to both Alice and Bob. For example, the encryption and the decryption key might be the same, or the encryption key is shared and the decryption key is easily calculated from it.

In public key algorithms (also known as asymmetric key algorithms), the encryption key is made public, but it is computationally infeasible to find the decryption key without information known only to the party intended to receive the ciphertext.

A simple (non-mathematical) way of thinking about public key communication is the following: Bob sends Alice a box and an unlocked padlock. Alice puts her message in the box, locks Bob’s lock on it, and sends the box back to Bob. Now only Bob can open the box and read the message.

3 Background: Modular Arithmetic

In many cryptographic systems, the communicated messages are represented by numerical values prior to being encrypted and transmitted. For example, the English alphabet consists of 26 letters. As shown in Table 1, we can represent each letter of the alphabet by a corresponding number.

A   B   C   D   E   F   G   H   I   J   K   L   M
0   1   2   3   4   5   6   7   8   9   10  11  12

N   O   P   Q   R   S   T   U   V   W   X   Y   Z
13  14  15  16  17  18  19  20  21  22  23  24  25

Table 1. Mapping of letters to numbers

The encryption processes can now be thought of as mathematical operations that turn the input numerical values into output numerical values. Building, analyzing and attacking such cryptosystems requires mathematical tools, and the most important of these is number theory.

Definition 1. Let a and b be integers, a, b ∈ Z, and let m be a positive integer, m ∈ Z+. If m divides (a − b), we can write:

a ≡ b (mod m) (2)

or

m|(a − b) (3)

The operator ≡ is called congruence and a ≡ b (mod m) is read: “a is congruent to b modulo m.” The positive integer m is known as the modulus.

3.1 Properties of modulo arithmetic

Let Zm denote the set of integers {0, 1, 2, . . . , m − 1}, with addition and multiplication performed modulo m.
1. a ≡ b (mod m) if and only if a (mod m) = b (mod m), i.e. the remainders of a and b modulo m are equal.
2. Addition is closed: for any a, b ∈ Zm, a + b ∈ Zm.
3. Addition is commutative: for any a, b ∈ Zm, a + b = b + a.
4. Addition is associative: for any a, b, c ∈ Zm, (a + b) + c = a + (b + c).
5. 0 is an additive identity: for any a ∈ Zm, a + 0 = 0 + a = a.
6. The additive inverse of any a ∈ Zm is m − a: that is a + (m − a) = (m − a) + a = 0, ∀a ∈ Zm.
7. Multiplication is closed: for any a, b ∈ Zm, ab ∈ Zm.
8. Multiplication is commutative: for any a, b ∈ Zm, ab = ba.
9. Multiplication is associative: for any a, b, c ∈ Zm, (ab)c = a(bc).
10. 1 is the multiplicative identity: for any a ∈ Zm, a × 1 = 1 × a = a.
11. The distributive property is satisfied: for any a, b, c ∈ Zm, (a + b)c = (ac) + (bc) and a(b + c) = (ab) + (ac).

Properties 2 and 4-6 say that Zm forms a group with respect to addition. Since property 3 also holds, the group is called an abelian group. Properties 2-11 make Zm a ring.

We can also define subtraction in Zm as (a − b) mod m.
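As a quick illustration of Definition 1 and the properties above, the following Python sketch checks congruence and computes additive inverses and differences in Zm. The helper names (is_congruent, additive_inverse) are illustrative assumptions and not part of the handout. Note that Python's % operator already returns a representative in {0, ..., m − 1} when m is positive.

```python
def is_congruent(a: int, b: int, m: int) -> bool:
    """Return True when a ≡ b (mod m), i.e. when m divides (a - b)."""
    return (a - b) % m == 0


def additive_inverse(a: int, m: int) -> int:
    """Return the element of Z_m that sums with a to 0 modulo m."""
    return (-a) % m


print(is_congruent(27, 1, 26))   # True, since 26 divides 27 - 1
print(additive_inverse(3, 26))   # 23, since 3 + 23 = 26 ≡ 0 (mod 26)
print((5 - 11) % 26)             # 20, subtraction in Z_26
```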

4 Classical Cryptosystems

Methods of making communicated messages unintelligible to attackers have been important throughout history. In this section, we cover some of the older cryptosystems that were primarily used before the advent of computers. In doing so, we will make use of number theory, especially the modular arithmetic we just reviewed. We start with the shift cipher.

4.1 The Shift Cipher

The shift cipher is one of the oldest known cryptosystems, often attributed to Julius Caesar. The idea used in this cryptosystem is to replace each letter of the alphabet by another letter at a distance K from it.

Formally, let’s associate each letter A, B, ..., Z with an integer 0,..., 25. If we allow the key K to be any integer with 0 ≤ K ≤ 25, the shift cipher can be defined as:

P = C = K = Z26. For 0 ≤ K ≤ 25,

y = eK (x) = (x + K) mod 26, (4)

x = dK (y) = (y − K) mod 26. (5)

Example: Let K = 3 and let the plaintext be shift. Assume each letter is shifted right (or left) by 3 places. We then get VKLIW as the cipher for the right shift, or PEFCQ for the left shift.
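The example above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the input contains only the letters a-z; the function names are illustrative and not part of the handout. A left shift is modelled as a negative key.

```python
def shift_encrypt(plaintext: str, key: int) -> str:
    """Apply e_K(x) = (x + K) mod 26 to each letter; output is uppercase."""
    return "".join(
        chr((ord(ch) - ord("a") + key) % 26 + ord("A")) for ch in plaintext.lower()
    )


def shift_decrypt(ciphertext: str, key: int) -> str:
    """Apply d_K(y) = (y - K) mod 26 to each letter; output is lowercase."""
    return "".join(
        chr((ord(ch) - ord("A") - key) % 26 + ord("a")) for ch in ciphertext.upper()
    )


print(shift_encrypt("shift", 3))    # VKLIW (right shift by 3)
print(shift_encrypt("shift", -3))   # PEFCQ (left shift by 3)
print(shift_decrypt("VKLIW", 3))    # shift
```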

Is the Shift Cipher Secure? NO.

Let’s try a brute force attack: Assume Eve knows a shift cipher algorithm is used for encryption, and she observes the ciphertext VKLIW. Given the small cardinality of the key space, Eve can try all 26 possible shifts, undoing the encryption by shifting to the left. Upon shifting, the following candidate plaintexts are obtained:

vkliw −→ ujkhv (1st left shift) −→ tijgu (2nd left shift) −→ shift (3rd left shift), and so on.

Since “shift” is the only dictionary word in the list of 26 possible words, Eve can assume that it is indeed the plaintext that was encrypted. Thus, Eve not only recovers the plaintext, but also infers the original key K = 3.
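The exhaustive search described here is easy to sketch in Python. The function name shift_candidates is an assumption made for illustration; recognizing which candidate is a dictionary word is left to the reader (or to a word list), since the handout does not specify how Eve checks this.

```python
def shift_candidates(ciphertext: str) -> list:
    """Return (key, candidate plaintext) for every key K in Z_26."""
    candidates = []
    for key in range(26):
        plain = "".join(
            chr((ord(ch) - ord("A") - key) % 26 + ord("a")) for ch in ciphertext.upper()
        )
        candidates.append((key, plain))
    return candidates


for key, plain in shift_candidates("VKLIW")[:4]:
    print(key, plain)
# 0 vkliw
# 1 ujkhv
# 2 tijgu
# 3 shift
```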

4.2 The Substitution Cipher

In the shift cipher cryptosystem, each letter of the plaintext is replaced with the letter at a fixed distance determined by the key K. Given the keyspace K = Z26, there are only 26 possible keys in this cipher. The substitution cipher overcomes this limitation, and provides a much larger keyspace. The idea of the substitution cipher is to replace each letter of the plaintext with a letter at an arbitrary distance.

Formally, we can describe this cryptosystem as follows. Let P = C = Z26. The keyspace K includes all possible permutations of the 26 symbols, 0, 1,..., 25. For each permutation π ∈ K:

y = eπ(x) = π(x), (6)
dπ(y) = π^{-1}(y). (7)

π^{-1} denotes the inverse permutation of π.

Is the Substitution Cipher Secure?

Brute force attack: Since a key consists of a permutation of the 26 letters, the keyspace is very large (26! ≈ 4.0 × 10^26). Hence, the key space of the substitution cipher is much larger than the key space of the shift cipher, and a brute force attack (exhaustive search) will take a long time. However, other attacks are feasible against the substitution cipher. For example, frequency analysis may allow us to break this cipher, as we will shortly show.
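A minimal Python sketch of the substitution cipher follows. It assumes lowercase a-z input and represents the permutation π as a dictionary; the helper names and the seed value are hypothetical choices made for repeatability, not part of the handout.

```python
import math
import random
import string

ALPHABET = string.ascii_lowercase


def random_key(seed: int) -> dict:
    """Build a random permutation π of the alphabet as a letter -> letter map."""
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def substitute(text: str, key: dict) -> str:
    """Apply the permutation letter by letter."""
    return "".join(key[ch] for ch in text)


key = random_key(seed=2016)                     # hypothetical seed
inverse_key = {y: x for x, y in key.items()}    # the inverse permutation
ciphertext = substitute("shift", key)
print(substitute(ciphertext, inverse_key))      # shift
print(math.factorial(26))                       # size of the key space, 26!
```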

5 Background: The Euclidean Algorithm(s)

Many of the cryptosystems covered in this course involve finding the multiplicative inverse of an integer a, denoted as a^{-1}, under modulo arithmetic with base integer b. The Euclidean algorithm and its extended version come in handy for finding those inverses. We will first review the basic Euclid’s algorithm for finding the greatest common divisor (gcd)3 of two integers a, b, with the assumption a > b. We will then state the condition for the equation ax ≡ 1 (mod b) to have a solution. We will then present the extended Euclidean algorithm that helps us find a^{-1} under modulo arithmetic with base b.

Lemma 1. Let a and b be positive integers. There exists a unique positive integer d satisfying the following properties:

1. d|a and d|b 2. If c is another integer such that c|a and c|b, then c|d.

d is defined to be the greatest common divisor (gcd) of a and b.

The Euclidean algorithm can be used to find the gcd of two integers. This algorithm finds the gcd through repeated integer division. First, r0 = a is divided by r1 = b and the remainder r2 is found. In the next step, r1 = b is divided by r2 and the remainder r3 is found. The process continues until the remainder of rm−1 divided by rm is zero. The gcd(a, b) = gcd(r0, r1) is the last non-zero remainder, namely rm. The steps of the algorithm are as follows:

3 The greatest common divisor of two numbers a and b is the largest positive integer dividing both a and b.

r_0 = q_1 r_1 + r_2
r_1 = q_2 r_2 + r_3
r_2 = q_3 r_3 + r_4
···
r_{m−2} = q_{m−1} r_{m−1} + r_m
r_{m−1} = q_m r_m

The terms ri are the remainders at each step of the equations. The terms qi are the quotients. Now consider the equation ri = qi+1ri+1 + ri+2. The relationship between the divisor ri+1 and the remainder ri+2 is given by 0 ≤ ri+2 < ri+1. We also assumed that r0 > r1. Hence, we can write r0 > r1 > r2 > ··· > rm.

EUCLIDEAN ALGORITHM
Input: Positive integers a and b
Output: Greatest common divisor d of a and b
    r_0 ← a
    r_1 ← b
    m ← 1
    while r_m ≠ 0
        q_m ← ⌊r_{m−1} / r_m⌋
        r_{m+1} ← r_{m−1} − q_m r_m
        m ← m + 1
    end while
    d ← r_{m−1}
    return d

Fig. 1. The Euclidean algorithm. Finds the greatest common divisor of a and b, where a > b.

Example: Let a = 87 and b = 24. Then we have:

87 = 3(24) + 15 24 = 1(15) + 9 15 = 1(9) + 6 9 = 1(6) + 3 6 = 2(3)

Therefore gcd (87, 24) = 3.
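The algorithm of Fig. 1 and the example above can be sketched in Python as follows. The function name euclid_gcd and the choice to return the whole remainder sequence alongside the gcd are illustrative assumptions, made so that the intermediate steps can be inspected.

```python
def euclid_gcd(a: int, b: int):
    """Return (gcd(a, b), remainder sequence r_0, r_1, ...) for positive a, b."""
    remainders = [a, b]
    while remainders[-1] != 0:
        q = remainders[-2] // remainders[-1]                     # quotient q_m
        remainders.append(remainders[-2] - q * remainders[-1])   # r_{m+1}
    return remainders[-2], remainders                            # last non-zero remainder


d, seq = euclid_gcd(87, 24)
print(d)     # 3
print(seq)   # [87, 24, 15, 9, 6, 3, 0]
```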

Some Properties of the Euclidean Algorithm

1. Algorithm terminates in finite steps. 2. rm is the gcd(a, b) = gcd(r0, r1).

Proof: The remainder sequence ri is non-negative and strictly decreasing. The first term r0 is finite. Since each remainder is an integer, the difference between any two adjacent remainders is at least one. Hence, the sequence must reach the value 0 in a finite number of steps. In the worst case, it will take r0 steps to terminate.

To show that rm is the gcd(a, b), let’s denote d = gcd(a, b). Then d|a and d|b, i.e. d|r0 and d|r1. Since r2 = r0 − q1r1, it follows that d|r2. In addition, since d|r1, d|r2, and r1 = q2r2 + r3, we can conclude d|r3. By induction, let’s assume that d|ri for all values of i < j. Then rj−2 = qj−1rj−1 + rj implies that d|rj. Hence, by induction, d divides all the remainders. In particular, d|rm, the last non-zero remainder.

On the other hand, rm|rm−1 at the last step. Looking up one step above the last step, we have rm−2 = qm−1rm−1 + rm. Since rm divides the right hand side, rm|rm−2. Continuing this way up, by induction, let’s assume that rm|rm−l for l < j. Then looking at rm−j = qm−(j−1)rm−(j−1) + rm−(j−2), the right hand side is divisible by rm. Hence, rm|rm−j. Therefore, by induction, we have that rm|b and rm|a. Hence, rm is a common divisor of a, b. Since d = gcd(a, b), by definition (Lemma 1), rm|d. We now have rm|d and d|rm. Hence, d = rm = gcd(a, b).

When gcd(a, b) = 1, the Euclidean algorithm also allows us to find the multiplicative inverse of a modulo b. The following lemma is key to finding the inverses.

Lemma 2. Let a and b be positive integers, and let d = gcd (a, b). Then there exist integers x and y such that

ax + by = d (8)

Question: Suppose we have such integers x and y. How can we use them to find the inverse of a modulo b?

Answer: We have seen that a has an inverse mod b if and only if gcd (a, b) = 1. By Lemma 2, there exist x and y such that ax + by = 1 (9) Then we can write 1 − ax = by, which is the same as b|(ax − 1). Finally, by definition: ax ≡ 1 (mod b) (10) Thus, if we can find x and y satisfying Eq. (9), we can invert a modulo b. The algorithm for finding x and y is called the extended Euclidean algorithm:

Example: Let a = 7, m = 26. Find a^{-1} mod m. First, let’s look at the Euclidean algorithm.

26 = 3(7) + 5 (11)
7 = 1(5) + 2 (12)
5 = 2(2) + 1 (13)

Now, let’s rewrite the last equation to put the gcd (which is 1) on the left-hand side of the equation.

1 = 5 − 2(2) (14)

From Eq. (12), we have

2 = 7 − 5 (15)

Substituting Eq. (15) into Eq. (14) yields

1 = 5 − 2(7 − 5) = 3(5) − 2(7) (16)

We’re almost there; the last step is to use Eq. (11), as follows:

5 = 26 − 3(7) (17)

so that

1 = 3(26 − 3(7)) − 2(7) = 3(26) − 11(7) (18)

And so 7^{-1} mod 26 = −11 mod 26 = 15.

The math behind the computation of x, y, has its own update equations as shown below. The main statement is the following:

EXTENDED EUCLIDEAN ALGORITHM
Input: Positive integers a and b
Output: Integers r, s, and t such that r = gcd (a, b) and sa + tb = r
    a0 ← a
    b0 ← b
    t0 ← 0
    t ← 1
    s0 ← 1
    s ← 0
    q ← ⌊a0 / b0⌋
    r ← a0 − q b0
    while r > 0
        temp ← t0 − q t
        t0 ← t
        t ← temp
        temp ← s0 − q s
        s0 ← s
        s ← temp
        a0 ← b0
        b0 ← r
        q ← ⌊a0 / b0⌋
        r ← a0 − q b0
    end while
    r ← b0
    return (r, s, t)

Fig. 2. The extended Euclidean algorithm.
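A direct Python transcription of Fig. 2 is sketched below, followed by a small helper that turns the result into a modular inverse. The names extended_euclid and mod_inverse are illustrative assumptions; the variable names inside follow the pseudocode.

```python
def extended_euclid(a: int, b: int):
    """Return (r, s, t) with r = gcd(a, b) and s*a + t*b = r, for positive a, b."""
    a0, b0 = a, b
    t0, t = 0, 1
    s0, s = 1, 0
    q = a0 // b0
    r = a0 - q * b0
    while r > 0:
        t0, t = t, t0 - q * t     # update the coefficient of b
        s0, s = s, s0 - q * s     # update the coefficient of a
        a0, b0 = b0, r
        q = a0 // b0
        r = a0 - q * b0
    return b0, s, t


def mod_inverse(a: int, m: int) -> int:
    """Return the inverse of a modulo m, or raise ValueError when gcd(a, m) != 1."""
    r, s, _ = extended_euclid(a, m)
    if r != 1:
        raise ValueError(f"{a} has no inverse modulo {m}")
    return s % m


print(extended_euclid(7, 26))   # (1, -11, 3), i.e. (-11)*7 + 3*26 = 1
print(mod_inverse(7, 26))       # 15
```

Reducing s modulo m at the end maps a negative coefficient such as −11 to its representative 15 in Z26.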

Lemma 3. Let r0 = a and r1 = b be positive integers, and let d = gcd (a, b). If the sequence ri, i > 1, satisfies the division algorithm such that ri−2 = qi−1ri−1 + ri, then there exist integers xi and yi such that

axi + byi = ri. (19)

The logical proof goes via induction as follows: We know from r0 = q1r1 +r2 that r2 = r0 −q1r1 = a−q1b. Similarly, from r1 = q2r2 + r3, we have r3 = r1 − q2r2 = b − q2(a − q1b) = (1 + q1q2)b − q2a. Hence, letting x2 = 1, y2 = −q1, x3 = −q2, y3 = (1 + q1q2), we have r2 = ax2 + by2 and r3 = ax3 + by3. Let this be true for all values of i < j. Then we can write rj = rj−2 − qj−1rj−1. But by induction, we then have rj = (axj−2 + byj−2) − qj−1(axj−1 + byj−1) leading to rj = a(xj−2 − qj−1xj−1) + b(yj−2 − qj−1yj−1). Letting xj = xj−2 − qj−1xj−1 and yj = yj−2 − qj−1yj−1 leads to rj = axj + byj. Hence by induction, at the final step of the division algorithm, we have rm = d = gcd(a, b) = axm + bym. Setting the final values to x = xm and y = ym leads to d = ax + by as desired.

5.1 The Affine Cipher

The idea of the affine cipher is to first scale and then shift, which is known as an affine transformation.

y = eK(x) = (ax + b) mod 26, (20)
dK(y) = a^{-1}(y − b) mod 26. (21)

In this scheme, the pair (a, b) denotes the cryptographic key K used for encryption/decryption. Here we need to know which pairs (a, b) are valid keys that yield an injective encryption function. Note that we need to know a^{-1} for decryption. Also, if a = 1, the affine cipher becomes identical to the shift cipher.

Example: Let a = 9 and b = 3. Let the plaintext be d that corresponds to the numerical value 3, based on Table 1. eK (d) = (9 × 3 + 3) mod 26 = 4. (22)

For the decryption part,

dK(4) = a^{-1}(4 − b) mod 26 = 9^{-1}(4 − 3) mod 26 = 9^{-1} mod 26 = 3, (23)

where 3 is the multiplicative inverse of 9 (mod 26), i.e. 9 × 3 ≡ 1 (mod 26).
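The encryption and decryption rules of the affine cipher can be sketched on numerical values as follows. This is a minimal illustration, assuming Python 3.8 or later, where the built-in pow(a, -1, 26) returns the modular inverse; the function names are not part of the handout.

```python
def affine_encrypt(x: int, a: int, b: int) -> int:
    """e_K(x) = (a*x + b) mod 26."""
    return (a * x + b) % 26


def affine_decrypt(y: int, a: int, b: int) -> int:
    """d_K(y) = a^{-1} (y - b) mod 26; requires gcd(a, 26) = 1."""
    a_inv = pow(a, -1, 26)   # raises ValueError when a has no inverse mod 26
    return (a_inv * (y - b)) % 26


y = affine_encrypt(3, 9, 3)       # plaintext letter d corresponds to 3
print(y)                          # 4, the letter E
print(affine_decrypt(y, 9, 3))    # 3, the letter d
```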

Decryption of the Affine Cipher

Definition 2. The modular multiplicative inverse of an integer a ∈ Zm modulo m, denoted as a^{-1} mod m, is an element a′ ∈ Zm such that aa′ ≡ a′a ≡ 1 (mod m).

If m is prime, every non-zero element of Zm has a multiplicative inverse. The modular multiplicative inverse of an integer a ∈ Zm can be found using the Extended Euclidean Algorithm. Given the multiplicative inverse, the congruence y ≡ ax + b (mod 26) can be solved for x as follows.

ax ≡ y − b (mod 26), (24)
a^{-1}(ax) ≡ a^{-1}(y − b) (mod 26), (25)
a^{-1}(ax) ≡ (a^{-1}a)x ≡ 1x ≡ x (mod 26), (26)
x = a^{-1}(y − b) mod 26. (27)

Problem with the Choice of a

Not all choices of a have a multiplicative inverse. As an example, consider the case where a = 13 and b = 3. Assume the plaintext is the word busted. Using Table 1, we can compute the cipher for busted as follows.

eK (1) = (13 × 1 + 3) mod 26 = 16 = Q. (28)

eK (20) = (13 × 20 + 3) mod 26 = 3 = D. (29)

eK (18) = (13 × 18 + 3) mod 26 = 3 = D. (30)

eK (19) = (13 × 19 + 3) mod 26 = 16 = Q. (31)

eK (4) = (13 × 4 + 3) mod 26 = 3 = D. (32)

eK (3) = (13 × 3 + 3) mod 26 = 16 = Q. (33)

i.e. busted −→ QDDQDQ.

Since multiple plaintexts will result in this ciphertext (for instance, the word dealer also encrypts to QDDQDQ), no unique decryption is possible here. This is due to the fact that a = 13 does not have a multiplicative inverse in Z26. For your interest, you can also work out the example for a = 2, and see that the affine cipher does not work in that case either. It is thus important to characterize the integers that have multiplicative inverses mod 26, and in doing so, we have to reconsider the concept of the greatest common divisor.4
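The collision described above can be checked with a short Python sketch. The function name affine_encrypt_text is an illustrative assumption, and the input is assumed to contain only lowercase letters a-z.

```python
def affine_encrypt_text(plaintext: str, a: int, b: int) -> str:
    """Encrypt lowercase letters with e_K(x) = (a*x + b) mod 26; output is uppercase."""
    return "".join(
        chr((a * (ord(ch) - ord("a")) + b) % 26 + ord("A")) for ch in plaintext
    )


print(affine_encrypt_text("busted", 13, 3))   # QDDQDQ
print(affine_encrypt_text("dealer", 13, 3))   # QDDQDQ

# With a = 13 the whole alphabet collapses onto two ciphertext letters.
images = sorted({affine_encrypt_text(ch, 13, 3) for ch in "abcdefghijklmnopqrstuvwxyz"})
print(images)                                 # ['D', 'Q']
```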

Theorem 1. If gcd(a, m) = 1, then ax ≡ y (mod m) has a unique solution x ∈ Zm for every y ∈ Zm.

First, note that an integer a has an inverse mod m if and only if there exist p and q such that ap + mq = 1. We have 1 = ap + mq ≡ ap mod m, which implies that a has multiplicative inverse p mod m. On the other hand, r ≡ 1 mod m if and only if r + bm = 1 for some b, implying that ap ≡ 1 mod m if and only if ap + mq = 1 for some q. This, in turn, can only happen when gcd (a, m) = 1. To see why, let c = gcd (a, m) and suppose c > 1. Then there exist positive integers α, β satisfying a = αc and m = βc. If ap + mq = 1 for some p, q, then pcα + qcβ = 1, hence c(pα + qβ) = 1. This is a contradiction, since c > 1 cannot divide 1.

4 Given two integers a and b, the greatest common divisor of a and b (denoted gcd (a, b)) is equal to the largest integer c that divides both a and b

The other direction is also true: if gcd (a, m) = 1, then there exist integers p, q satisfying ap + mq = 1. These integers can be computed using the extended Euclidean algorithm. The integer p will be the multiplicative inverse of a mod m.

Example: Given m = 26, for a = 13 we have gcd(13, 26) = 13 ≠ 1. Also, if a = 2 then gcd(2, 26) = 2 ≠ 1.

But for a = 9, gcd(9, 26) = 1 and hence the affine cipher works. Similarly, for a = 1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25 we have gcd(a, 26) = 1. Hence, a can take a total of 12 values with unique inverses in Z26, and b can take any of the 26 values in Z26. Therefore the key space is limited to 12 × 26 = 312 values for K, and a brute force attack (exhaustive search) is possible.
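The count of valid keys can be confirmed by enumeration. The sketch below uses math.gcd from the Python standard library; the variable name valid_a is an illustrative choice.

```python
from math import gcd

# All a in Z_26 that are relatively prime to 26, i.e. that have an inverse.
valid_a = [a for a in range(26) if gcd(a, 26) == 1]

print(valid_a)             # [1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25]
print(len(valid_a) * 26)   # 312 possible keys (a, b)
```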

Computation of the Cardinality of the Key Space for the Affine Cipher

Definition 3. An integer p > 1 is prime if it has no positive divisors other than 1 and p.

Theorem 2 (Unique Factorization). For any integer m > 1, there exists an integer n, a set of distinct primes p1, . . . , pn, and a set of positive integers e1, . . . , en satisfying

m = p1^{e1} p2^{e2} ··· pn^{en} (34)

Furthermore, the sequences p1, . . . , pn and e1, . . . , en are unique up to reordering of the pi’s.

Example: For m = 432,

432 = 2^4 × 3^3. (35)

This factorization is unique up to a rearrangement of the terms on the right hand side (i.e., we can write 3^3 × 2^4 instead).

Definition 4. Two integers a ≥ 1 and m ≥ 2 are said to be relatively prime if gcd(a, m) = 1. The number of integers in Zm that are relatively prime to m is known as the Euler-phi function, denoted by φ(m).

Theorem 3. Let

m = ∏_{i=1}^{n} pi^{ei}, (36)

where pi are distinct primes and ei > 0, 1 ≤ i ≤ n. Then

φ(m) = ∏_{i=1}^{n} (pi^{ei} − pi^{ei−1}). (37)

Based on Theorem 3, the cardinality of the key space for the affine cipher over Zm is mφ(m): there are φ(m) valid choices for a and m choices for b.

Example: For m = 60,

60 = 2^2 × 3^1 × 5^1, (38)

and

φ(m) = (4 − 2) × (3 − 1) × (5 − 1) = 16. (39)

The cardinality of the key space is 60 × 16 = 960 keys.
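The formula for φ(m) can be sketched in Python by first factoring m. The trial-division routine below is an assumption made for illustration (it is adequate only for small m), and the names factorize and euler_phi are not part of the handout.

```python
def factorize(m: int) -> dict:
    """Return {prime: exponent} for an integer m >= 2 by trial division."""
    factors = {}
    p = 2
    while p * p <= m:
        while m % p == 0:
            factors[p] = factors.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:                      # whatever remains is a prime factor
        factors[m] = factors.get(m, 0) + 1
    return factors


def euler_phi(m: int) -> int:
    """phi(m) as the product over prime powers of (p^e - p^(e-1))."""
    result = 1
    for p, e in factorize(m).items():
        result *= p**e - p ** (e - 1)
    return result


print(factorize(60))          # {2: 2, 3: 1, 5: 1}
print(euler_phi(60))          # 16
print(60 * euler_phi(60))     # 960 affine keys over Z_60
print(26 * euler_phi(26))     # 312 affine keys over Z_26
```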

5.2 The Vigenère Cipher

The idea behind this cryptosystem is to use a vector of m keys, i.e., K = (K1, K2, .., Km). Here P = C = K = (Z26)^m, where (Z26)^m is the set of m-tuples over Z26. The difference between the Vigenère cipher and the shift, substitution, and affine ciphers is that in the Vigenère cipher a given letter is not always mapped to the same ciphertext letter; the result depends on the position of the letter in the plaintext.

y = eK (x1, x2, .., xm) = (x1 + K1, x2 + K2, .., xm + Km) mod 26, (40)

dK (y1, y2, .., ym) = (y1 − K1, y2 − K2, .., ym − Km) mod 26. (41)

Example: Let the plaintext be vector, and let m = 4, K = (2, 4, 6, 7). From the correspondence table (Table 1) we have x = (21, 4, 2, 19, 14, 17), and the cipher is shown below.

PLAINTEXT:  21   4   2  19  14  17
KEY:         2   4   6   7   2   4
CIPHER:     23   8   8   0  16  21
             X   I   I   A   Q   V

To decrypt, we use the same keyword, but modulo subtraction is performed instead of modulo addition. The number of possible keywords of length m is 26^m, so even for small m an exhaustive search by hand requires a long time (for m = 5 there are already 26^5 ≈ 1.1 × 10^7 keywords).
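The Vigenère example can be reproduced with the following Python sketch. It assumes lowercase a-z plaintext and a key given as a tuple of integers in Z26; itertools.cycle repeats the key along the message. The function names are illustrative.

```python
from itertools import cycle


def vigenere_encrypt(plaintext: str, key: tuple) -> str:
    """Add the repeating key to the plaintext letters modulo 26; output is uppercase."""
    return "".join(
        chr((ord(ch) - ord("a") + k) % 26 + ord("A"))
        for ch, k in zip(plaintext, cycle(key))
    )


def vigenere_decrypt(ciphertext: str, key: tuple) -> str:
    """Subtract the repeating key from the ciphertext letters modulo 26."""
    return "".join(
        chr((ord(ch) - ord("A") - k) % 26 + ord("a"))
        for ch, k in zip(ciphertext, cycle(key))
    )


key = (2, 4, 6, 7)
print(vigenere_encrypt("vector", key))   # XIIAQV
print(vigenere_decrypt("XIIAQV", key))   # vector
print(26 ** len(key))                    # 456976 possible keywords of length 4
```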

5.3 The Hill Cipher

Consider the affine cipher, where e(a,b)(x) = ax + b mod m, and suppose that b = 0, so that encryption becomes e(a,0)(x) = ax mod m, i.e. multiplication by the secret key a modulo m. Decryption is then given by dK(y) = a^{-1}y mod m, provided that gcd (a, m) = 1.

Question: How can we generalize this from x corresponding to a single letter to x corresponding to a string of letters?

Answer: The idea is to choose an integer m > 0, and then to define an m × m matrix K.

Example: Let us consider an example where m = 2. We can define K as:

K = [ 2  3 ]
    [ 5  7 ].   (42)

In this cryptosystem, the plaintext is written as row vectors. For example, if the plaintext is test, we write it as:

(19 4), (43)

(18 19). (44)

Encryption of te is:

(19 4) K = (38 + 20, 57 + 28) = (58 85) ≡ (6 7) mod 26. (45)

Encryption of st is:

(18 19) K = (36 + 95, 54 + 133) = (131 187) ≡ (1 5) mod 26. (46)

Hence, the cipher is:

(6 7 1 5), (47)

which is GHBF. To decrypt, we will use K^{-1} as the decryption key. This raises the following question.

Question: What does it mean for a matrix to be invertible mod 26?

Answer: Much like with numbers, there is an identity matrix over the integers mod n. The m × m identity matrix mod n (denoted Im) has 1’s along the diagonal and 0’s elsewhere. As with the reals, for any m × m matrix K, we have K Im = Im K = K. A matrix K is invertible mod n when there exists a matrix K^{-1} such that K K^{-1} = K^{-1} K = Im. Recall that a matrix K is invertible over the real numbers when its determinant is non-zero (see Stinson, 3rd ed, pg 16 for a definition of the determinant). Analogously, K is invertible over Zn when det K is invertible mod n, i.e. when gcd (det K, n) = 1.

Question: How do we compute K−1 mod 26?

Theorem 4. Let K be a matrix such that gcd (det K, n) = 1. Then

K^{-1} mod n ≡ (det K)^{-1} K* mod n (48)

where the (i, j)-th entry of K* is equal to (−1)^{i+j} det Kji, and Kji is obtained by deleting the j-th row and i-th column of K.

Example: When K is equal to the above encryption matrix, we have

K = [ 2  3 ]
    [ 5  7 ]   (49)

and

K* = [  7  −3 ]  ≡  [  7  23 ]  mod 26. (50)
     [ −5   2 ]     [ 21   2 ]

Furthermore, det K = 2 × 7 − 3 × 5 = −1 ≡ 25 mod 26, so we have

(det K)^{-1} mod 26 ≡ 25^{-1} mod 26 ≡ 25 mod 26. (51)

Hence

K^{-1} mod 26 = (det K)^{-1} K* mod 26 = 25 [  7  23 ]  mod 26 (52)
                                            [ 21   2 ]

              = [ 175  575 ]  mod 26 = [ 19   3 ]  mod 26. (53)
                [ 525   50 ]           [  5  24 ]

To decrypt with the Hill cipher, we multiply the ciphertext by K−1. We leave it as an exercise to verify that yK−1 is equal to the original plaintext in this case.
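The verification can be sketched in Python for the 2 × 2 case. This is a minimal illustration, assuming Python 3.8 or later for pow(det, -1, n); the helper names mat_vec and inverse_2x2 are hypothetical, and the adjugate formula used here is written out only for 2 × 2 matrices rather than for general m.

```python
def mat_vec(x, K, n=26):
    """Row vector times matrix: return x K mod n."""
    cols = len(K[0])
    return [sum(x[i] * K[i][j] for i in range(len(x))) % n for j in range(cols)]


def inverse_2x2(K, n=26):
    """Return the inverse of a 2 x 2 matrix K mod n via (det K)^{-1} K*."""
    det = (K[0][0] * K[1][1] - K[0][1] * K[1][0]) % n
    det_inv = pow(det, -1, n)                           # fails if gcd(det K, n) != 1
    adj = [[K[1][1], -K[0][1]], [-K[1][0], K[0][0]]]    # adjugate K*
    return [[(det_inv * entry) % n for entry in row] for row in adj]


K = [[2, 3], [5, 7]]
K_inv = inverse_2x2(K)
print(K_inv)                 # [[19, 3], [5, 24]]
y = mat_vec([19, 4], K)      # plaintext block "te"
print(y)                     # [6, 7], i.e. GH
print(mat_vec(y, K_inv))     # [19, 4], i.e. te
```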

Stated formally, the Hill cipher has P = C = (Z26)^m, where m ≥ 2. K = {set of all m × m invertible matrices over Z26}. For K ∈ K:

eK(x) = xK, (54)
dK(y) = yK^{-1}. (55)

5.4 The Permutation Cipher

The idea of the permutation cipher (also known as the transposition cipher) is to generate the ciphertext by altering the positions of the characters in the plaintext, i.e. to rearrange the letters using a permutation. In contrast to the substitution cipher, there is no replacement of characters (it is similar to just scrambling the letters of a word). Formally, we describe the permutation cipher cryptosystem as follows.

Let P = C = (Z26)^m, where m is a positive integer. K includes all permutations of {1, ..., m}. For each permutation π ∈ K:

y = eπ(x1, ..., xm) = (x_{π(1)}, ..., x_{π(m)}) (56)

x = dπ(y1, ..., ym) = (y_{π^{-1}(1)}, ..., y_{π^{-1}(m)}). (57)

π^{-1} denotes the inverse permutation of π.

Example: For illustration, let’s consider m = 6, and permutation (the key) π is as follows:

j      1  2  3  4  5  6
π(j)   3  5  1  6  4  2

To obtain π^{-1}, interchange the rows, and sort the columns such that the first row is in ascending order. We obtain:

j          1  2  3  4  5  6
π^{-1}(j)  3  6  1  5  2  4

For encryption, if the plaintext is followashore, we first partition the plaintext into groups of six letters as: follow | ashore. Using the above key π, we re-arrange each group of six letters as: LOFWLO | HRAEOS. Similarly, the ciphertext can be decrypted using the inverse permutation π^{-1}.
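The permutation cipher example can be sketched in Python as follows. The permutation is stored as a list of 1-based positions, matching the tables above; the function names are illustrative, and the sketch assumes the text length is a multiple of m (no padding is handled).

```python
def permute_block(block: str, pi: list) -> str:
    """Return the letters of block in the order pi(1), ..., pi(m) (1-based)."""
    return "".join(block[p - 1] for p in pi)


def invert(pi: list) -> list:
    """Return the inverse permutation of pi (1-based)."""
    inverse = [0] * len(pi)
    for j, p in enumerate(pi, start=1):
        inverse[p - 1] = j
    return inverse


def apply_cipher(text: str, pi: list) -> str:
    """Split text into blocks of len(pi) letters and permute each block."""
    m = len(pi)
    return "".join(permute_block(text[i:i + m], pi) for i in range(0, len(text), m))


pi = [3, 5, 1, 6, 4, 2]
pi_inv = invert(pi)
print(pi_inv)                                      # [3, 6, 1, 5, 2, 4]
ciphertext = apply_cipher("followashore", pi).upper()
print(ciphertext)                                  # LOFWLOHRAEOS
print(apply_cipher(ciphertext, pi_inv).lower())    # followashore
```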

Note: The permutation cipher is a special case of the Hill Cipher. Consider the above encryption rule π(x). It can be written as a Hill encryption matrix Kπ as follows:

     [ 0 0 1 0 0 0 ]
     [ 0 0 0 0 0 1 ]
Kπ = [ 1 0 0 0 0 0 ]   (58)
     [ 0 0 0 0 1 0 ]
     [ 0 1 0 0 0 0 ]
     [ 0 0 0 1 0 0 ]

and the decryption matrix is:

       [ 0 0 1 0 0 0 ]
       [ 0 0 0 0 1 0 ]
Kπ^T = [ 1 0 0 0 0 0 ]   (59)
       [ 0 0 0 0 0 1 ]
       [ 0 0 0 1 0 0 ]
       [ 0 1 0 0 0 0 ]

Note that the decryption matrix is the transpose of the encryption matrix, i.e. we obtain the decryption matrix by interchanging the rows and columns of the encryption matrix.

6 Cryptanalysis

Now that we have defined some simple classical cryptosystems, we might be interested in how secure these cryptosystems are (or how one could go about breaking them). In doing so, we turn to cryptanalysis, and start by considering one of the most important assumptions in modern cryptography, namely Kerckhoffs’s principle.

6.1 Kerckhoffs’s Principle

Kerckhoffs’s principle was introduced in 1883 by A. Kerckhoffs, and it states that in assessing the security of a cryptosystem, one should always assume that an attacker knows the details of the cryptosystem being used. In other words, an attacker knows the tuple (P, C, K, E, D) defining the cryptosystem. Therefore, the security of the system should always be based on the secrecy of the key, and not on the obscurity of the cryptographic algorithm.

6.2 Attack models

An attacker can have different goals when attacking a channel between communicating parties. For example, an attacker may wish to:

1. Read one specific message. 2. Find the encryption/decryption key, and thus read all of the exchanged messages. 3. Corrupt Alice’s message into another message in such a way that Bob thinks that Alice has sent the altered message. 4. Masquerade as Alice in order to communicate with Bob such that Bob believes he is communicating with Alice.

For each of these goals, there are four main types of attacks that an attacker can use, and those types differ in the amount of information an attacker has available when trying to determine the key. Those four attack types are as follows.

Type of attack             Description
Ciphertext only attack     Eve only observes the ciphertext y.
Known plaintext attack     Eve knows the ciphertext y corresponding to plaintext x.
Chosen plaintext attack    Eve has temporary access to an encryption box. The encryption box takes as input any chosen plaintext x and outputs the ciphertext y.
Chosen ciphertext attack   Eve has temporary access to a decryption box. The decryption box takes as input any chosen ciphertext y and outputs the plaintext x.

Based on these models, we can analyze the security of every cryptosystem.

6.3 Cryptanalysis of the Shift Cipher

– Ciphertext only: Let K = 3 and the plaintext be shift. We then get VKLIW as the cipher (for a right shift). Assume Eve knows only the ciphertext VKLIW. Eve also knows that a shift cipher algorithm is used for encryption. Given the small cardinality of the key space, Eve can try all 26 possible shifts, undoing the encryption by shifting to the left. Upon shifting, the following candidate plaintexts are obtained:

vkliw −→ ujkhv (1st left shift) −→ tijgu (2nd left shift) −→ shift (3rd left shift), and so on.

Since “shift” is the only dictionary word in the list of 26 possible words, Eve assumes that it is indeed the plaintext that was encrypted. Therefore, Eve can also infer the original key K = 3.

– Known plaintext: If Eve knows a (plaintext, ciphertext) pair, then Eve can find the key by subtracting the plaintext from the ciphertext mod 26. For instance, if Eve knows that plaintext b corresponds to ciphertext E, then Eve can determine that K = 4 − 1 = 3.

– Chosen plaintext: Choose letter a as plaintext; the resulting ciphertext will be the key. For example, if the ciphertext is P then K = 15.

– Chosen cipher: Choose A as the ciphertext. The plaintext is then the negative of the key K.
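The attacks on the shift cipher described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are our own, and the code assumes messages made of the 26 English letters with no spaces or punctuation.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_decrypt(ciphertext, key):
    """Undo a right shift by `key` positions (letters only, no spaces)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - key) % 26] for c in ciphertext.lower()
    )


def brute_force(ciphertext):
    """Ciphertext-only attack: candidate plaintext for each of the 26 keys."""
    return {key: shift_decrypt(ciphertext, key) for key in range(26)}


def key_from_pair(plain_letter, cipher_letter):
    """Known-plaintext attack: K = y - x (mod 26) for one letter pair."""
    x = ALPHABET.index(plain_letter.lower())
    y = ALPHABET.index(cipher_letter.lower())
    return (y - x) % 26


candidates = brute_force("VKLIW")
print(candidates[1], candidates[2], candidates[3])  # ujkhv tijgu shift
print(key_from_pair("b", "E"))  # 3
print(key_from_pair("a", "P"))  # 15 (chosen plaintext 'a' exposes the key)
print(shift_decrypt("A", 3))    # x, i.e. 23 = -3 mod 26 (chosen ciphertext)
```

In practice Eve would still have to pick the meaningful candidate out of the 26 outputs, for example by checking them against a dictionary.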

6.4 Remarks on Letter Distribution of the English Language

In English language text, different letters of the alphabet occur with different frequencies. Estimates of the relative frequencies (probabilities) of the 26 letters are given in Table 2. Note that the letter e has the maximum relative frequency of 0.127.

Table 2. Probabilities of occurrence of the 26 letters of the English language alphabet.

A     B     C     D     E     F     G     H     I     J     K     L     M
0.082 0.015 0.028 0.043 0.127 0.022 0.020 0.061 0.070 0.002 0.008 0.040 0.024

N     O     P     Q     R     S     T     U     V     W     X     Y     Z
0.067 0.075 0.019 0.001 0.060 0.063 0.091 0.028 0.010 0.023 0.001 0.020 0.001

Similarly, we can define frequencies of digrams, trigrams, initial letters, final letters, etc. More generally, we can then use the statistical properties of the English language to perform cryptanalysis. A key observation here is that the vowels “a, e, i, o” and the consonants “t, n, s, h, d” have a relatively high probability of appearance in the English language. Table 3 gives the rank order of the vowels based on their frequencies, and Table 4 the rank order of the consonants “t, n, s, h, d” based on their frequencies.

Table 3. Rank order of the probabilities of occurrence of the vowels.

E 0.127
A 0.082
O 0.075
I 0.070
U 0.028

Table 4. Probabilities of most frequently occurring consonants.

T 0.091
N 0.067
S 0.063
H 0.061
D 0.043

6.5 Cryptanalysis of the Affine Cipher

– Ciphertext only attack: Let’s assume that Eve has intercepted the following ciphertext:

FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH

The most frequent letters are R with 8 occurrences, D with 7, E, H, K with 5 each, and F, V with 4 each. The first guess is that R = e and D = t, since e and t are the two most frequent letters in English. Given the encryption function

e_K(x) = ax + b (mod 26), (60)

and the numerical values e = 4, R = 17, t = 19, D = 3, we get the following linear system:

4a + b = 17 (mod 26), (61)
19a + b = 3 (mod 26). (62)

Solving the system we obtain the unique solution a = 6, b = 19 (note that a solution must be in Z26). But for the affine cipher, a has to be relatively prime to 26. Given that gcd(6, 26) = 2, the pair a = 6, b = 19 is not a valid key. The second guess is R = e and E = t. Solving the linear system yields a = 13, which again is not a legal key. The third guess is R = e and K = t, which yields a = 3 and b = 5. Since this is a valid key, we decrypt the entire ciphertext to see if we get meaningful English text:

algorithms are quite general definitions of arithmetic processes

Note: Besides the statistical analysis, Eve could have tried all 12 × 26 = 312 possible pairs (a, b) that constitute a valid key for the affine cipher.
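The guess-and-check procedure above can be sketched in Python as follows. This is an illustrative sketch rather than a general tool: the helper names are our own, the ciphertext is written as 57 letters (one per letter of the decrypted plaintext), and solving the system relies on the difference of the two guessed plaintext letters being invertible mod 26, which holds for e and t. It needs Python 3.8 or later for the modular inverse `pow(x, -1, 26)`.

```python
from collections import Counter
from math import gcd

# 57 letters, one per letter of the decrypted plaintext.
CIPHERTEXT = "FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH"


def num(letter):
    """Map a letter to its number, a = 0, ..., z = 25."""
    return ord(letter.lower()) - ord("a")


def solve_key(p1, c1, p2, c2):
    """Solve a*p + b = c (mod 26) for two guessed (plaintext, ciphertext) pairs.

    Assumes (p1 - p2) is invertible mod 26, as it is for e and t.
    """
    dp = (num(p1) - num(p2)) % 26
    dc = (num(c1) - num(c2)) % 26
    a = (dc * pow(dp, -1, 26)) % 26
    b = (num(c1) - a * num(p1)) % 26
    return a, b


def affine_decrypt(ciphertext, a, b):
    """Apply d_K(y) = a^(-1) * (y - b) mod 26 to every letter."""
    a_inv = pow(a, -1, 26)
    return "".join(
        chr((a_inv * (num(c) - b)) % 26 + ord("a")) for c in ciphertext
    )


print(Counter(CIPHERTEXT).most_common(2))  # [('R', 8), ('D', 7)]

# Keep R = e fixed and try D, E, K in turn as the encryption of t.
for guess_for_t in "DEK":
    a, b = solve_key("e", "R", "t", guess_for_t)
    if gcd(a, 26) != 1:
        print(guess_for_t, (a, b), "rejected: a is not coprime to 26")
        continue
    print(guess_for_t, (a, b), affine_decrypt(CIPHERTEXT, a, b))
```

Only the third guess survives the gcd test, and its decryption is the readable sentence shown above.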

– Known plaintext attack: Suppose Eve knows that the plaintext uw = (20, 22) has ciphertext KQ = (10, 16). She can then set up the following system of linear equations:

10 = 20a + b (mod 26), (63)
16 = 22a + b (mod 26). (64)

Subtracting Equation 63 from Equation 64 gives 6 = 2a (mod 26), i.e., 2a = q × 26 + 6 for some integer q, so a = 3 or a = 16. But gcd(16, 26) ≠ 1, so a = 3. From Equation 63 we can now get b as follows:

10 = 20 × 3 + b (mod 26), (65)
i.e., −50 = b (mod 26), (66)
i.e., b = q × 26 + (−50), and q = 2 gives b = 2. (67)

Hence Eve only needs to know two (plaintext, ciphertext) pairs.

– Chosen plaintext: If Eve chooses ab = (0, 1) as plaintext, the ciphertext will be:

0 × a + b ≡ b (mod 26), (68)
1 × a + b ≡ a + b (mod 26), (69)

so the first ciphertext letter is b, and subtracting it from the second gives a. Eve can thus easily find the key K = (a, b).

– Chosen ciphertext: Eve chooses AB = (0, 1) as ciphertext and proceeds as above: the difference of the two resulting plaintext letters is a^{-1} mod 26, from which a and then b follow.
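As a rough check of the known plaintext and chosen plaintext attacks on the affine cipher, the sketch below simply filters the 312 valid keys against the known pairs instead of solving the congruences by hand. The function names are hypothetical, and the chosen-plaintext input (2, 5) is an assumed example: it is what the key (3, 2) would produce for the plaintext ab.

```python
from math import gcd


def keys_consistent_with(pairs):
    """Return every valid affine key (a, b) with a*x + b = y (mod 26) for all pairs."""
    return [
        (a, b)
        for a in range(26)
        if gcd(a, 26) == 1  # a must be invertible mod 26
        for b in range(26)
        if all((a * x + b) % 26 == y for x, y in pairs)
    ]


def key_from_chosen_plaintext(y_a, y_b):
    """Chosen plaintext ab = (0, 1): the ciphertext is (b, a + b) mod 26."""
    return (y_b - y_a) % 26, y_a


# One known pair (u -> K) still leaves one candidate b for each of the 12 valid a.
print(len(keys_consistent_with([(20, 10)])))       # 12
# Two known pairs (u -> K, w -> Q) pin the key down.
print(keys_consistent_with([(20, 10), (22, 16)]))  # [(3, 2)]
# Assumed example: under key (3, 2), plaintext ab encrypts to CF = (2, 5).
print(key_from_chosen_plaintext(2, 5))             # (3, 2)
```

The exhaustive filter sidesteps the case analysis needed when the coefficient of a (here 2) is not invertible mod 26.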

6.6 Cryptanalysis of the Vigenère Cipher

– Known plaintext attack: If Eve knows at least m consecutive (plaintext, ciphertext) letter pairs, then by subtracting the plaintext from the ciphertext mod 26 she obtains the vector of m key letters.

– Chosen plaintext attack: Eve can simply choose x = aa...a (m letters) as plaintext, and she readily gets K as ciphertext:

  a   a   a   ...  a
  0   0   0   ...  0
+ K1  K2  K3  ...  Km
= K1  K2  K3  ...  Km

Note: Eve does not have to choose x = aa...a as plaintext, as any known plaintext of at least m letters will reveal the key K.

– Chosen ciphertext attack: Eve chooses y = AA...A (m letters) as ciphertext, and the plaintext obtained is then the negative of the key K:

  A   A   A   ...  A
  0   0   0   ...  0
− K1  K2  K3  ...  Km
= −K1 −K2 −K3 ... −Km

Again, Eve does not need to choose AA...A as the ciphertext, as any chosen ciphertext of at least m letters will do.
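The key-recovery step for the Vigenère cipher can be sketched as below. This is a minimal sketch assuming that the key length m is already known and that the known pairs start at the first key position. The key "cipher" and the sample plaintext are arbitrary examples chosen for illustration, and the function names are our own.

```python
def vigenere_encrypt(plaintext, key):
    """Add the key letters to the plaintext letters mod 26; output in upper case."""
    return "".join(
        chr((ord(x) - 97 + ord(key[i % len(key)]) - 97) % 26 + 65)
        for i, x in enumerate(plaintext.lower())
    )


def recover_vigenere_key(plaintext, ciphertext, m):
    """Known-plaintext attack: K_i = y_i - x_i (mod 26) for the first m letters."""
    key = [
        (ord(y) - ord(x)) % 26
        for x, y in zip(plaintext.lower()[:m], ciphertext.lower()[:m])
    ]
    return "".join(chr(k + ord("a")) for k in key)


key = "cipher"  # hypothetical key, m = 6
plaintext = "thiscryptosystemisnotsecure"
ciphertext = vigenere_encrypt(plaintext, key)

# Any known plaintext of at least m letters reveals the key.
print(recover_vigenere_key(plaintext, ciphertext, 6))  # cipher
# Chosen plaintext aa...a: the ciphertext is the key itself.
print(vigenere_encrypt("aaaaaa", key))                 # CIPHER
```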

6.7 Cryptanalysis of the Hill Cipher

– Known plaintext attack: The Hill cipher is difficult to break with a ciphertext only attack, but a known plaintext attack can be easily launched. Assume that Eve knows the key length, m, and that she has a sufficient number of (plaintext, ciphertext) pairs, namely at least m pairs of m-letter blocks whose plaintext rows form a matrix X that is invertible mod 26. Then she can write the matrix equation Y = XK and solve for K by inverting the matrix X, so that K = X^{-1}Y.

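Example: Let’s assume that Eve knows that the key length is m = 2 and that the plaintext friday (5 17 8 3 0 24) yields the ciphertext PQCFKU (15 16 2 5 10 20). Using the first two blocks, (5, 17) → (15, 16) and (8, 3) → (2, 5), she can form the following plaintext matrix:

$$X = \begin{pmatrix} 5 & 17 \\ 8 & 3 \end{pmatrix} \tag{70}$$

and its inverse mod 26:

$$X^{-1} = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix}. \tag{71}$$

She can now compute the key K as:

$$K = X^{-1} Y = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix} \begin{pmatrix} 15 & 16 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 7 & 19 \\ 8 & 3 \end{pmatrix} \pmod{26}. \tag{72}$$

The third block confirms the result: (0, 24) K = (10, 20) mod 26, which is KU.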

Note: If m is unknown, Eve can proceed using trial and error for different values of m.
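The computation in the example can be reproduced with a short Python sketch. It is restricted to the m = 2 case, the helper names are our own, and it needs Python 3.8 or later for the modular inverse `pow(det, -1, 26)`.

```python
def inverse_2x2_mod26(m):
    """Invert a 2x2 matrix mod 26 via the adjugate formula."""
    (a, b), (c, d) = m
    det = (a * d - b * c) % 26
    det_inv = pow(det, -1, 26)  # raises ValueError if det is not invertible
    return [
        [(det_inv * d) % 26, (det_inv * -b) % 26],
        [(det_inv * -c) % 26, (det_inv * a) % 26],
    ]


def matmul_mod26(p, q):
    """Multiply two matrices with 2 columns / 2 rows respectively, mod 26."""
    return [
        [sum(row[k] * q[k][j] for k in range(2)) % 26 for j in range(2)]
        for row in p
    ]


X = [[5, 17], [8, 3]]    # plaintext blocks "fr", "id"
Y = [[15, 16], [2, 5]]   # ciphertext blocks "PQ", "CF"

X_inv = inverse_2x2_mod26(X)
K = matmul_mod26(X_inv, Y)
print(X_inv)  # [[9, 1], [2, 15]]
print(K)      # [[7, 19], [8, 3]]

# Check the recovered key on the third block: "ay" = (0, 24) should give "KU".
print(matmul_mod26([[0, 24]], K))  # [[10, 20]]
```

If the chosen blocks give a matrix X that is not invertible mod 26, the inverse step fails and Eve would have to pick a different set of m blocks.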

Sources for Today’s Lecture:

1. Douglas R. Stinson, Cryptography: Theory and Practice, 3rd edition. CRC Press, 2005, pp. 1–39 and 48–54.
2. Wade Trappe and Lawrence C. Washington, Introduction to Cryptography with Coding Theory. Prentice Hall, 2002, pp. 1–26 and 59–95.
3. Neil Daswani, Christoph Kern, and Anita Kesavan, Foundations of Security: What Every Programmer Needs to Know. Apress, 2007, pp. 203–221.
