
CS 241 Data Organization

March 22, 2018

• In cryptography, a cipher (or cypher) is an algorithm for performing encryption or decryption.
• When using a cipher, the original information is known as plaintext, and the encrypted form as ciphertext.
• The encrypting procedure of the cipher usually depends on a piece of auxiliary information, called a key.
• A key must be selected before using a cipher to encrypt a message.
• Without knowledge of the key, it should be difficult, if not nearly impossible, to decrypt the resulting ciphertext into readable plaintext.

• In cryptography, a substitution cipher is a method of encryption by which units of plaintext are replaced with ciphertext according to a regular system.
• Example: case-insensitive substitution cipher using a shifted alphabet with keyword "zebras":
  • Plaintext alphabet:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
  • Ciphertext alphabet: ZEBRASCDFGHIJKLMNOPQTUVWXY

  flee at once. we are discovered!
Enciphers to
  SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!

Other substitution ciphers
• Shift the alphabet by a fixed amount. (Caesar apparently used 3.)
• ROT13: replace letters with those 13 away. Used to hide spoilers on newsgroups.
• Replace letters with symbols.

Substitution Cipher: Encipher Example

OENp(ENTE#X@EN#zNp(ENCL]pEnN7p-pE;8N]LN}
dnEdNp#Nz#duN-Nu#dENXEdzE9pNCL]#L8NE;p-b
@];(N0G;p]9E8N]L;GdENn#uE;p]9Nld-L/G]@]p
_8NXd#|]nENz#dNp(EN9#uu#LNnEzEL;E8NXd#u#
pENp(ENQELEd-@NOE@z-dE8N-LnN;E9GdENp(EN^
@E;;]LQ;N#zN<]bEdp_Np#N#Gd;E@|E;N-LnN#Gd
NT#;pEd]p_8Nn#N#dn-]LN-LnNE;p-b@];(Np(];
N5#L;p]pGp]#LNz#dNp(ENCL]pEnN7p-pE;N#zN)
uEd]9-D

Breaking a Substitution Cipher
In English,
• The most common character is the space: " ".
• Letters in order of frequency are: ETAONRISHDLFCMUGYPWBVKXJQZ
• Letter pairs in order of frequency are: TH HE AN RE ER IN ON AT ND ST ES EN OF TE ED OR TI HI AS TO
• Doubled letters in order of frequency are: LL EE SS OO TT FF RR NN PP CC MM

Substitution Cipher: plaintext & Map
We the People of the United States, in Order to form a
more perfect Union, establish Justice, insure domestic
Tranquility, provide for the common defense, promote
the general Welfare, and secure the Blessings of Liberty
to ourselves and our Posterity, do ordain and establish
this Constitution for the United States of America.

Character counts in the plaintext: ' '=51, e=39, t=28, o=24

[ N] [!%] ["Z] [#1] [$f] [%=] [&r] ['I] [( ] [)U] [*,] [+a] [,8] [-m] [.D] [/y]
[0P] [1'] [2\] [33] [4h] [5?] [6t] [7K] [8"] [9W] [:.] [;c] [<:] [=o] [>F] [?{]
[@R] [A)] [B^] [C5] [Dj] [EA] [Fv] [GM] [H$] [IY] [J0] [Ke] [L<] [Mq] [NH] [O}]
[PT] [Q+] [R`] [S7] [Tl] [UC] [Vx] [WO] [X&] [Y[] [Z2] [[g] [\>] []s] [^J] [_!]
[`V] [a-] [bb] [c9] [dn] [eE] [fz] [gQ] [h(] [i]] [j4] [ki] [l@] [mu] [nL] [o#]
[pX] [q/] [rd] [s;] [tp] [uG] [v|] [wS] [x*] [y_] [z6] [{k] [|B] [}w]

Substitution Map Generator: Globals

#include <stdio.h>

#define ASCII_START 32
#define ASCII_END 126
#define MAP_SIZE 94

char map[MAP_SIZE];

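• Program that creates a substitution map based on the first letter of the plaintext message.
• The program creates a one-to-one map of the printable ASCII characters (94 entries, codes 32 through 125).

Substitution Map Generator: buildMap
• seed: first number in the pseudo-random sequence.
• m, a, c: constants chosen to give pseudo-random behaviour.

void buildMap(char seed)
{ int i;
  int m=94;
  int a=189;
  int c=53;
  int n=seed;
  for (i=0; i<MAP_SIZE; i++)
  { /* Loop body lost in extraction; reconstructed so that it
       reproduces the map printed above for seed 'W'. */
    n = (a*n + c) % m;
    map[i] = n + ASCII_START;
  }
}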

void printMap(void)
{ int i;
  for (i=0; i<MAP_SIZE; i++)
  { /* Start of loop body lost in extraction; the "[%c%c] " format
       is inferred from the printed map. */
    printf("[%c%c] ", i+ASCII_START, map[i]);
    if ((i % 16) == 15) printf("\n");
  }
  printf("\n");
}

Substitution Map Generator: main

int main(void)
{ int c=getchar();   /* int, not char, so the comparison with EOF is reliable */
  buildMap(c);
  printMap();
  int i=0;
  while (c != EOF)
  { if (c < ASCII_START) break;
    if (c - ASCII_START >= MAP_SIZE) break;   /* '~' (126) has no entry in the 94-element map */
    if (i % 40 == 0) printf("\n");
    printf("%c", map[c-ASCII_START]);
    c=getchar();
    i++;
  }
  printf("\n");
  return 0;
}

A book cipher is a cipher in which the key is some aspect of a book or other piece of text. A message is typically encoded by three numbers for each letter:
• Page
• Line
• Word
where the encoded letter is the first letter of the specified word.
It is typically essential that both correspondents not only have the same book, but the same edition.

One-time pad

• Random key (or "pad") is at least as long as the text.
• Add plaintext to key (modulo 26) to encrypt.
• Subtract key from ciphertext (modulo 26) to decrypt.
• Destroy the key after use.
• Problems:
  • Truly random key is hard to produce.
  • How to exchange the key securely?
  • Never ever use the key again.
• Modern stream ciphers mimic the one-time pad.

Modern Cipher Categories
Stream Ciphers: Applied to a continuous stream of symbols. Algorithms applied to small blocks (1 to 16 bytes) are often still called stream ciphers.
Block Ciphers: Applied to blocks of symbols.
Symmetric Key Algorithms: The same key is used for both encryption and decryption. For example: RC4 (used in Secure Sockets Layer (SSL)).
Asymmetric Key Algorithms: Different keys are used for encryption and decryption. For example: RSA, which uses public/private key pairs.

RSA: Key Generation

1. Choose two distinct prime numbers p and q. The primes should be chosen uniformly at random and should be of similar bit-length.
2. Compute the divisor, n = pq, to be used in the modulus operation.
3. Compute φ(pq) = (p − 1)(q − 1). The totient, φ, of a positive integer n is defined to be the number of positive integers less than or equal to n that are coprime to n.
4. Choose an integer e such that 1 < e < φ(pq), and e and φ(pq) are coprime.
5. Find d such that (ed − 1) is evenly divisible by φ(pq) = (p − 1)(q − 1).
6. The public key consists of the modulus n and e.
7. The private key consists of the modulus n and d.

Linear Congruential Generator (LCG)
A Linear Congruential Generator is one of the oldest and best known pseudorandom number generator algorithms:

X_{n+1} = (a X_n + c) mod m

where X_n is the sequence of pseudorandom values, and
• Modulus: m, 0 < m
• Multiplier: a, 0 < a < m
• Increment: c, 0 ≤ c < m
• Seed: X_0, 0 ≤ X_0 < m

LCG example
Example: X_{n+1} = (7 X_n + 11) mod 18, X_0 = 0.
X_1 = (7(0) + 11) mod 18 = 11
X_2 = (7(11) + 11) mod 18 = 16
X_3 = (7(16) + 11) mod 18 = 15
X_4 = (7(15) + 11) mod 18 = 8

LCG with Pseudorandom Behavior

X_{n+1} = (a X_n + c) mod m

• All of the values, a, c, m, and X_n used in an LCG are integers.
• The sequence of integers in an LCG can never have a period greater than m. Why?
• Some values of the modulus, multiplier and increment yield a sequence with maximum period and with good pseudorandom behavior.

LCG with Pseudorandom Behavior
An LCG will have a full period if and only if:
• c and m are coprime (have no common factor > 1),
• a − 1 is divisible by all prime factors of m,
• a − 1 is a multiple of 4 if m is a multiple of 4.

LCGs in Common Use

Source                                                           m          a                          c
Numerical Recipes                                                2^32       1664525                    1013904223
Borland C/C++                                                    2^32       22695477                   1
glibc (used by GCC)                                              2^31       1103515245                 12345
ANSI C: Watcom, Digital Mars, CodeWarrior, IBM VisualAge C/C++   2^31       1103515245                 12345
Borland Delphi, Virtual Pascal                                   2^32       134775813                  1
Microsoft Visual/Quick C/C++                                     2^32       214013 (hex 343FD)         2531011 (hex 269EC3)
Microsoft Visual Basic (6 and earlier)                           2^24       1140671485 (hex 43FD43FD)  12820163 (hex C39EC3)
RtlUniform from Native API                                       2^31 − 1   2147483629 (hex 7FFFFFED)  2147483587 (hex 7FFFFFC3)
Apple CarbonLib                                                  2^31 − 1   16807                      0
MMIX by Donald Knuth                                             2^64       6364136223846793005        1442695040888963407
Newlib                                                           2^64       6364136223846793005        1
VAX's MTH$RANDOM, old versions of glibc                          2^32       69069                      1
Java's java.util.Random                                          2^48       25214903917                11
LC53 in Forth                                                    2^32 − 5   2^32 − 333333333           0

Note: LCGs do not always return all of the bits in the values they produce. Example: Java produces 48 bits, but only returns the 32 most significant.
table from http://en.wikipedia.org/wiki/Linear_congruential_generator

Linear Congruential Generator in C

#include <stdio.h>
int main (void)
{ unsigned long m = 1UL << 31;   /* 1 << 31 would overflow a 32-bit int */
  unsigned long a = 1103515245;
  unsigned long c = 12345;
  unsigned long x = 123456;
  int i;
  for (i=1; i<=100; i++)
  {
    x = (a*x + c) % m;
    printf ("%20lu\n", x);
  }
  return 0;
}

Output shown on the slide (first 29 values). It was produced with m written as 1 << 31; that expression overflows a 32-bit int, so the modulus was not 2^31 and the values exceed 2^31. With 1UL << 31 every value is below 2147483648.
     136235578099065
15519887047211639486
13838650204930187039
 3346717808905085548
 3049859088482533429
 7398829542172257482
15952872801263783483
 9897407542008592728
12436645181455133361
 8967167257519066006
15551416184959193879
14392011937321128708
 7012146815830884589
  837494824685357858
 9089108537697995187
 4486637365379619696
10706175307392302825
 3333036225721595758
15798734497693109775
 1184521088365346460
  762097951296315557
14053214745497771130
13752776674615655467
 4036971838066272392
13168213029112046113
 2176075275402511942
11251131393832824839
 9338128740141843764
 7951965562090585949

Toy Linear Congruential Generator

#include <stdio.h>
int main (void)
{ int m = 18;
  int a=7;
  int c=1;
  int x=0;
  int i;
  for (i=1; i<=28; i++)
  {
    printf ("%4d\n", x);
    x = (a*x + c) % m;
  }
  return 0;
}

Output: 0 1 8 3 4 11 6 7 14 9 10 17 12 13 2 15 16 5 0 1 8 3 4 11 6 7 14 9

Poor Linear Congruential Generator
This LCG does NOT have a full period. How can this be known before computing it?

int main(void)
{ int m = 18;
  int a=6;
  int c=1;
  int x=0;
  int i;
  for (i=1; i<=14; i++)
  {
    printf ("%4d\n", x);
    x = (a*x + c) % m;
  }
  return 0;
}

Output: 0 1 7 7 7 7 7 7 7 7 7 7 7 7
Once x = 7: (6 × 7 + 1) mod 18 = (42 + 1) mod 18 = 43 mod 18 = 7

Question: Linear Congruential Generator

#include <stdio.h>
int main(void)
{ int m = 10;
  int a=2;
  int c=1;
  int x=7;
  int i;
  for (i=1; i<=4; i++)
  { printf ("%d", x);
    x = (a*x + c) % m;
  }
  printf ("\n");
  return 0;
}

The output is:
A: 7513
B: 7574
C: 7567
D: 7526
E: 7537

XOR Cipher Lab
The cipher used in this project:
• Encrypts plain text messages consisting solely of the printable ASCII characters.
• Decrypts ciphertext consisting solely of the printable ASCII characters.
• The printable ASCII characters are 8-bit codes in the range of values from 32 to 126 (00100000 through 01111110). See http://en.wikipedia.org/wiki/ASCII
• Any 8-bit sequence outside of this range constitutes invalid data.

Cipher Record Format

Each cipher record must be of the form:

  Action   lcg_m       ,        lcg_c       ,        Data            \n
  1 char   1-20 char   1 char   1-20 char   1 char   any # of char   1 char

Action: Must be either 'e' or 'd', specifying encryption or decryption respectively.
lcg_m: Specifies a 64-bit positive integer used for m of a Linear Congruential Generator. Must be decimal digits.
lcg_c: Specifies a 64-bit positive integer used for c of a Linear Congruential Generator. Must be decimal digits.
Data: Printable ASCII character data to be encrypted or decrypted. Can be empty or arbitrarily long.
Note: with the given algorithm there will be no need to keep the full line of data in memory.

Cipher Algorithm: Summary

1. Determine the Linear Congruential Generator specified by the given key. This is done only once per line of input.
2. Read 1 byte of data, b.
3. Generate random value x.
4. Compute the encrypted byte as b XOR (x mod 128).
5. Deal with any non-printable ASCII characters.
6. Print the resulting character(s) to the standard output stream.
7. Return to step 2 and continue until the end of the line.

Cipher Algorithm: Initialization
Given a 128-bit symmetric key consisting of:
• m: a 64-bit positive integer used as an LCG modulus,
• c: a 64-bit positive integer used as an LCG increment.
Define a Linear Congruential Generator:

X_{n+1} = (a X_n + c) mod m

where
• X_0 = c
• a = 1 + 2p if 4 is a factor of m; otherwise, a = 1 + p.
• p = product of m's unique prime factors

Reading Data Bytes

• Within each record, the data portion is the set of characters between the second comma and '\n'.
• When encrypting: Use getchar() to read 1 byte of data to encrypt.
• When decrypting: Reading in an encrypted byte may require reading 2 bytes from standard input since some character codes have 2-byte ciphertext representations.
• Any character in the data segment not in the range of printable ASCII characters (32 to 126) is an error.

Non-printable ASCII characters
This cipher algorithm can generate target bytes that are outside the range of printable ASCII.
• When encrypting, if a generated byte e is:
    < 32    Replace with two bytes: '*' and '?'+e.
    = 127   Replace with two bytes: '*' and '!'.
    = '*'   Replace with two bytes: '*' and '*'.
• When decrypting, if a generated byte p is a non-printing ASCII character, then print the specified error message and read to the end of line.

Output Format

• For every line of input, one line of output must be sent to the standard output stream.
• If the input line is invalid, then the output has the form:
    ("%5d) %s Error\n", inputLineNumber, trash)
  where trash is any string no longer than twice the length of the data part of the line.
• If the input line is valid, then the output has the form:
    ("%5d) %s\n", inputLineNumber, outStr)
  where outStr is the encrypted or decrypted data.

Example: Read Line
Input: e126,25,Byte\n
• Given Key: m = 126 and c = 25
• Calculate: a = 43 (2 × 3 × 7 + 1, since m = 126 = 2 × 3 × 3 × 7)
• LCG: X_{n+1} = (43 X_n + 25) mod 126 with X_0 = 25
• Data: Byte
    B   0 1 0 0 0 0 1 0
    y   0 1 1 1 1 0 0 1
    t   0 1 1 1 0 1 0 0
    e   0 1 1 0 0 1 0 1

LCG Example: Generating random values

You'll be generating one value at a time in your program; I'm just demonstrating the sequence here.

  i   X_i   X_i mod 128
  0    25    25
  1    92    92
  2    75    75
  3   100   100
  4    41    41
  5    24    24
  6    49    49
  7   116   116
  8    99    99

In this example, we have X_i mod 128 = X_i since m = 126 < 128. This will not be the case with larger LCG values.

Example: Encrypting

  Data:         B    66   0 1 0 0 0 0 1 0
  X_i mod 128        25   0 0 0 1 1 0 0 1
  encrypted     [    91   0 1 0 1 1 0 1 1

  Data:         y   121   0 1 1 1 1 0 0 1
  X_i mod 128        92   0 1 0 1 1 1 0 0
  encrypted     %    37   0 0 1 0 0 1 0 1

  Data:         t   116   0 1 1 1 0 1 0 0
  X_i mod 128        75   0 1 0 0 1 0 1 1
  encrypted     ?    63   0 0 1 1 1 1 1 1

  Data:         e   101   0 1 1 0 0 1 0 1
  X_i mod 128       100   0 1 1 0 0 1 0 0
  encrypted     *@    1   0 0 0 0 0 0 0 1

Byte encrypts to [%?*@

Finding Unique Prime Factors - Algorithm

1. Let n be the number of which to find the prime factors.
2. Start with 2 as a test divisor.
3. If the test divisor squared is greater than the current n, then the current n is either 1 or prime. Save it if prime and return.
4. If the remainder of n divided by the test divisor is zero, then:
   a. The test divisor is a prime factor of n. Save it.
   b. Replace n with n divided by the test divisor.
   c. Repeat b until the test divisor is no longer a factor of n.
5. If, in step 4, the remainder of n divided by the test divisor was not zero, then increment the test divisor.
6. Loop to step 3.

How Many Prime Factors Can n have?
Write a function that finds the unique prime factors of a number, n, and stores each factor in an array. How big does the array need to be?
• If n is a 64-bit integer then the largest number it can hold is 2^64 − 1.
• Since 2 is the smallest prime, the largest 64-bit number must have fewer than 64 prime factors.
• But only unique prime factors are needed.
• The product of the first 15 primes is: 2×3×5×7×11×13×17×19×23×29×31×37×41×43×47 = 614,889,782,588,491,410. Times 53 gives 32,589,158,477,190,044,730 > 2^64 − 1, so an array of 15 entries is enough.