Prof. Dr. Carsten Damm Dr. Henrik Brosenne

University of Goettingen, Institute of Computer Science

Winter 2013/2014 Table of Contents

Stream Ciphers
- Types of Stream Ciphers
- Pseudorandomness
- Linear Feedback Shift Register sequences
- Linear Complexity
- RC4 stream cipher

Published worksheet 06 general considerations.

Setup

- recall: stream ciphers encipher plaintext symbols/blocks one at a time, but the cipher transformation (the key) varies with time
  - examples: Vigenere, running key, auto-key, Vernam
- the cipher is determined by the current state $S$ (which requires some memory to be stored) of a key stream generator $G$
- $G$ is basically determined by a state function $f_k$ : current state $\mapsto$ next state
- in the narrow sense "stream cipher" refers to bit-based encryption

Synchronous stream ciphers

- in synchronous stream ciphers the key stream is generated independently of plaintext and ciphertext
- given key $k$ and initial state $S_0$, the encryption process is defined by
  - key stream function: $k_i = g_k(S_i)$
  - output function: $c_i = h(k_i, m_i)$
  - next state function: $S_{i+1} = f_k(S_i)$
- the stream ciphers we study are additive binary stream ciphers, i.e., $h$ = XOR
- (repeating) XOR cipher = Vigenere cipher on the binary alphabet

Properties of synchronous stream ciphers

- synchronization: while processing the $i$-th symbol, sender and receiver must be in the same state and process corresponding plaintext/ciphertext symbols
- error propagation: none
- attacks: an active adversary can use insertion/deletion/replay of ciphertext and observe the result (chosen plaintext attack)

Self-synchronizing ciphers

- other name: asynchronous stream cipher
- requirement: the keystream is a function of the key and a fixed number $\ell$ of previous ciphertext characters
- ⇒ state $i$ can be encoded as the vector of the last $\ell$ cipher symbols, stored in a shift register
  - initial state: $S_0 = (c_{-\ell}, \ldots, c_{-1})$
  - state $i$: $S_i = (c_{i-\ell}, \ldots, c_{i-1})$
- key stream function and output function as in the synchronous case
  - $k_i = g_k(S_i)$
  - $c_i = h(k_i, m_i)$

Properties of self-synchronizing ciphers

- synchronization: insertion/deletion/change of ciphertext symbols results in the loss of a fixed number of blocks; afterwards the cipher self-synchronizes in the following sense:
  - after the change is "shifted out", the receiver again observes the unchanged cipher bits from which the key stream is derived
- error propagation: ciphertext transmission errors affect at most $\ell$ deciphered plaintext symbols
- attacks: modifications of the ciphertext are more easily detected (because of error propagation); insertion/deletion/replay are harder to detect (because of self-synchronization); better diffusion properties compared to synchronous stream ciphers
- remark: similar to CFB mode of block ciphers (OFB mode, by contrast, behaves like a synchronous stream cipher)

Example 1: One-time pad

- key = keystream, determined by true randomness from the real world
  - e.g., last digits of the time difference between keyboard strokes or mouse events
- Linux machines maintain a special device /dev/random that outputs random bits "distilled" by hashing many system states (see man urandom for more information); this is a rather slow device
- quick & dirty alternative: echo $RANDOM, which uses the shell's built-in pseudorandom generator and is not designed for security applications
- the one-time pad is very secure and very impractical, as the huge key has to be transferred in advance and kept secret

Example 2: the binary XOR cipher

- = Vigenere cipher on the binary alphabet
- Example: the 7 bit key 1110011 defines the keystream 1110011 1110011 1110011 1110011 ...
- vulnerable to known plaintext attacks (key = ciphertext ⊕ known plaintext)
- vulnerable to ciphertext-only statistical attacks
- very practical but very insecure

Exercise 24

see published worksheet 06 exercise 24

Serious and practical stream ciphers

- generate a key stream $k \mapsto k_0 k_1 k_2 \ldots$ that "looks random" but is easily reconstructed from a short key
- most generators can be considered as feedback shift registers:
  - state = last $\ell$ output values
  - initial state determined by key
  - current output = function of current state
  - next state = current state shifted by one place (oldest output thrown away) and current output appended

[Figure: feedback shift register with cells $b_{n-1}, \ldots, b_1, b_0$, a shift clock, the output bit stream, and the feedback function $f : \{0,1\}^n \to \{0,1\}$]

feedback shift register

- if the generator has $\ell$ memory cells, each able to store one out of $m$ symbols, the system has $N = m^\ell$ states
- obvious consequence: the key stream period
  $d = \min\{t \ge 1 \mid \forall i : k_i = k_{i+t}\}$
  satisfies $d \le N$

Example

Linear congruential generators (LCG)
- the bits will be derived from the sequence
  $k_n = (a \cdot k_{n-1} + c) \pmod{m}$
- the initial state $k_0$ and $a$, $c$ are part of the key
- $a$, $c$, $m$ are called multiplier, increment and modulus of the LCG
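The LCG recurrence above can be sketched in a few lines of Python. The function names `lcg` and `recover_lcg` and the parameter values are illustrative assumptions, not taken from the lecture or its worksheet. The second function hints at why such a generator is predictable: if the modulus is known and the difference of two consecutive values is invertible modulo $m$, three consecutive values determine $a$ and $c$.

```python
def lcg(a, c, m, k0, count):
    """Return the next `count` values k_1, k_2, ... of k_n = (a*k_{n-1} + c) mod m."""
    values = []
    k = k0
    for _ in range(count):
        k = (a * k + c) % m
        values.append(k)
    return values


def recover_lcg(x0, x1, x2, m):
    """Recover (a, c) from three consecutive values, assuming m is known.

    From x1 = a*x0 + c and x2 = a*x1 + c follows x2 - x1 = a*(x1 - x0) (mod m).
    Raises ValueError if (x1 - x0) is not invertible modulo m.
    """
    inverse = pow((x1 - x0) % m, -1, m)  # modular inverse, Python 3.8+
    a = ((x2 - x1) * inverse) % m
    c = (x1 - a * x0) % m
    return a, c


if __name__ == "__main__":
    # Illustrative parameters only (assumption): odd a and c, modulus 2^16.
    a, c, m, k0 = 25173, 13849, 2 ** 16, 42
    values = lcg(a, c, m, k0, 6)
    print(values)
    print(recover_lcg(values[0], values[1], values[2], m))
```

With odd $a$ and $c$ and a power-of-two modulus, consecutive values alternate in parity, so their difference is always invertible and the recovery step applies.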

Pseudorandomness

Published worksheet 06 pseudorandomness.

What is a random bit sequence?

A bit sequence is
1. truly random if the next bit is unpredictable by whatever means (no precise mathematical definition)
2. Kolmogorov random if it is "not compressible", i.e., the string has no significantly shorter description than itself
   - a precise description can be given but is somewhat cumbersome
   - main drawback: it is provably impossible to prove that a given string is Kolmogorov random (at the same time it is easy to prove that Kolmogorov random strings exist)
3. statistically random if it passes any statistical test of randomness
   - frequency tests, autocorrelation tests, graphical tests, ...
   - difficult to specify
4. computationally random: restricts this to polynomial time computable statistical tests (the precise definition is somewhat cumbersome; the input consists of the data to be tested and the confidence level of the test)

random bit sequence

- truly random or Kolmogorov random bit sequences cannot be algorithmically generated from a short key
- we can only generate pseudorandom bit sequences, i.e., sequences that "look random" to any observer who does not know the initial state and parameters of the generation process (the seed)
- to be precise, we require computational randomness

Cryptographic requirements

- we need a good trade-off between "quality of randomness" and practicability
- in particular we need:
  - C1: good expansion = ratio period/key length
  - C2: simple and fast generation of bits
  - C3: computation of the next bit given all previous bits takes more resources than an attacker is willing/able to invest
- the linear congruential generator satisfies C1 and C2, but not C3 (see Exercise 25)

Quadratic generator (Blum, Blum, Shub 1986)

- let $p$, $q$ be Blum primes = primes that are congruent to 3 modulo 4 (there are infinitely many Blum primes)
- let $n = p \cdot q$ and $k \in \mathbb{Z}_n^*$ (this means: $\gcd(k, n) = 1$)
- based on these values we define a sequence of residues
  $a_0 = k^2 \pmod{n}$
  $a_{i+1} = a_i^2 \pmod{n}$
  and a sequence of bits $x_i = a_i \pmod{2}$
- obviously C1 and C2 are satisfied, and under the following complexity theoretic assumption C3 is satisfied as well

Quadratic residue

Quadratic residue assumption (QRA): There is no polynomial time algorithm that, given only $n$ (not its factorization into $p \cdot q$) and $a \in \mathbb{Z}_n$, decides whether $a$ is a quadratic residue, i.e., whether there is some $b$ such that $b^2 = a \pmod{n}$.

Remark: The obvious brute force attack of testing all possible $b$ is an exponential time algorithm! Running time is measured in dependence of the input length in bits (which is $O(\log n)$).

Theorem: If the quadratic residue assumption (QRA) holds, then the quadratic generator generates a computationally random bit sequence.

Exercise 25

1. Consider an LCG with modulus $2^{31}$ and three consecutive generated values 1403686589, 4653678, 1890276371. Compute the next three values.
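The quadratic generator defined earlier in this section ($a_0 = k^2$, $a_{i+1} = a_i^2$, $x_i = a_i \bmod 2$) can be sketched as follows. The function name `bbs_bits` and the toy parameters are assumptions made for illustration. The sketch only checks the congruence condition and the gcd condition, it does not test primality, and the primes used here are far too small for any real use.

```python
from math import gcd


def bbs_bits(p, q, k, count):
    """Return the bits x_0, ..., x_{count-1} of the quadratic generator.

    p and q are assumed to be primes congruent to 3 modulo 4 (Blum primes);
    primality itself is NOT verified here.
    """
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must be congruent to 3 modulo 4")
    n = p * q
    if gcd(k, n) != 1:
        raise ValueError("k must be coprime to n")
    a = pow(k, 2, n)          # a_0 = k^2 mod n
    bits = []
    for _ in range(count):
        bits.append(a % 2)    # x_i = a_i mod 2
        a = pow(a, 2, n)      # a_{i+1} = a_i^2 mod n
    return bits


if __name__ == "__main__":
    # Toy parameters (assumption): p = 11, q = 23, seed k = 3.
    print(bbs_bits(11, 23, 3, 6))
```

For these toy values the residues are 9, 81, 236, 36, 31, 202, so the first six bits are 1, 1, 0, 0, 1, 0.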

Linear Feedback Shift Register sequences

Published worksheet 06 LFSR sequences.

Definition

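- a linear feedback shift register sequence of $n$ registers is the output sequence of a binary FSR with linear output function
  $f(x_0, \ldots, x_{n-1}) = \sum_{i=0}^{n-1} c_i \cdot x_i \pmod{2}$
  where the $c_i$ are the feedback coefficients

[Figure: linear feedback shift register with cells $b_{n-1}, \ldots, b_1, b_0$ and feedback coefficients $c_{n-1}, \ldots, c_i, \ldots, c_0$]

- $2^n$ states, period $\le 2^n - 1$ (because the state $0 = (0, \ldots, 0)$ is stable)

linear feedback shift register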

- the characteristic polynomial of the LFSR with feedback coefficients $c_0, \ldots, c_{n-1}$ is
  $f^*(X) = c_0 + c_1 X + \ldots + c_{n-1} X^{n-1} + X^n \in GF(2)[X]$
- $f^*(X)$ is called primitive if
  - $f^*(X)$ is irreducible (has no non-trivial divisors) AND
  - for every $m$ with $1 \le m < 2^n - 1$: $f^*(X)$ does not divide $X^m - 1$

linear feedback shift register

Theorem: A linear feedback shift register (LFSR) with characteristic polynomial
$f^*(X) = c_0 + c_1 X + \ldots + c_{n-1} X^{n-1} + X^n$
has period $2^n - 1$ (which is maximal) if and only if $f^*(X)$ is primitive.
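The theorem can be checked experimentally on small registers. The sketch below is plain Python rather than the Sage worksheet code; the names `lfsr_bits` and `lfsr_period` are assumptions. It uses the convention $x_{k+n} = \sum_i c_i x_{k+i} \pmod 2$, so the coefficient list `[1, 1, 0, 0]` corresponds to $f^*(X) = 1 + X + X^4$ (primitive) and `[1, 1, 1, 1]` to $1 + X + X^2 + X^3 + X^4$ (irreducible but not primitive).

```python
from itertools import product


def lfsr_bits(coeffs, state, count):
    """Return `count` bits of the sequence x_{k+n} = sum_i c_i * x_{k+i} (mod 2).

    `state` holds the first n bits x_0, ..., x_{n-1}.
    """
    window = list(state)
    out = []
    for _ in range(count):
        out.append(window[0])
        new_bit = sum(c * x for c, x in zip(coeffs, window)) % 2
        window = window[1:] + [new_bit]
    return out


def lfsr_period(coeffs, state):
    """Smallest d >= 1 after which the start state recurs, or None if it never does."""
    start = list(state)
    window = list(start)
    for d in range(1, 2 ** len(coeffs) + 1):
        new_bit = sum(c * x for c, x in zip(coeffs, window)) % 2
        window = window[1:] + [new_bit]
        if window == start:
            return d
    return None


if __name__ == "__main__":
    primitive = [1, 1, 0, 0]       # 1 + X + X^4
    not_primitive = [1, 1, 1, 1]   # 1 + X + X^2 + X^3 + X^4
    for state in product([0, 1], repeat=4):
        print(state, lfsr_period(primitive, state), lfsr_period(not_primitive, state))
```

For the primitive polynomial every non-zero start state gives period $15 = 2^4 - 1$, while the second polynomial only reaches period 5.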

Obvious consequence: In case $f^*(X)$ is primitive, every initial state $\ne 0$ leads to a maximum period sequence.

LFSR sequences for cryptographic use

- Fact 1: the number of primitive polynomials of degree $n$ is
  $\varphi(2^n - 1) / n$
  where $\varphi(N)$ = number of integers in $\{1, \ldots, N\}$ that are coprime to $N$ (Euler totient function, will be considered in more detail later)
  - just keep in mind: there are many maximal LFSR sequences for any number of registers
  - coefficients + initial state could be used as key for a stream cipher
- Fact 2: maximal LFSR sequences "look random", as seen below

Pseudorandom periodic sequences

- by definition: periodic sequences are not random
- what is a "random looking" bit sequence of period $d$?
- consider a weak version of the requirements for statistical randomness, using the following notions/notations:
  - $t$-block = subsequence of shape $0\underbrace{11\ldots1}_{t}0$
  - $t$-gap = subsequence of shape $1\underbrace{00\ldots0}_{t}1$
- a $d$-periodic bit sequence $x = x_0, \ldots, x_{N-1}$ is Golomb-random if for every length $d$ subsequence $\tilde{x}$ of $x$ the following holds:
  - G1 (balance): the number of 1's and the number of 0's are both $\approx d/2$
  - G2: for small enough $t$, the number of $t$-blocks and the number of $t$-gaps are both $\approx d/2^{t+2}$
  - G3 (autocorrelation): the relative number of agreements MINUS the relative number of disagreements between $\tilde{x}$ and any non-zero cyclical shift of $\tilde{x}$ is about $\pm 1/d$
- Remark: For $d$ independent randomly picked bits the above figures are exactly the expectations of these numbers.

Maximal LFSR sequences are Golomb-random

Theorem: Any maximal LFSR sequence on $n$ registers (period $2^n - 1$) has the following properties:
- it contains $2^{n-1}$ ones and $2^{n-1} - 1$ zeroes
- for any $t$ with $1 \le t \le n - 2$, the number of $t$-blocks equals the number of $t$-gaps equals $2^{n-2-t}$
- the autocorrelation between the sequence and its cyclical shift by $t$ places is $-\frac{1}{2^n - 1}$ for $t \ne 0$
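The three properties of the theorem can be verified on a small example. The following sketch assumes the recurrence convention $x_{k+n} = \sum_i c_i x_{k+i} \pmod 2$ and uses the primitive polynomial $1 + X + X^4$; the helper names `lfsr_bits`, `count_cyclic_pattern` and `autocorrelation` are hypothetical and not part of the worksheet. Patterns are counted cyclically over one period.

```python
from fractions import Fraction


def lfsr_bits(coeffs, state, count):
    """Return `count` bits of x_{k+n} = sum_i c_i * x_{k+i} (mod 2), starting with `state`."""
    window = list(state)
    out = []
    for _ in range(count):
        out.append(window[0])
        new_bit = sum(c * x for c, x in zip(coeffs, window)) % 2
        window = window[1:] + [new_bit]
    return out


def count_cyclic_pattern(seq, pattern):
    """Count the cyclic positions of one period `seq` at which `pattern` occurs."""
    d = len(seq)
    return sum(
        all(seq[(i + j) % d] == bit for j, bit in enumerate(pattern))
        for i in range(d)
    )


def autocorrelation(seq, shift):
    """(agreements - disagreements) / d between seq and its cyclic shift."""
    d = len(seq)
    agreements = sum(seq[i] == seq[(i + shift) % d] for i in range(d))
    return Fraction(2 * agreements - d, d)


if __name__ == "__main__":
    n = 4
    period = lfsr_bits([1, 1, 0, 0], [1, 0, 0, 0], 2 ** n - 1)
    print("ones:", sum(period), "zeroes:", period.count(0))
    for t in range(1, n - 1):
        block = [0] + [1] * t + [0]
        gap = [1] + [0] * t + [1]
        print(t, count_cyclic_pattern(period, block), count_cyclic_pattern(period, gap))
    print({autocorrelation(period, s) for s in range(1, 2 ** n - 1)})
```

For $n = 4$ this gives 8 ones and 7 zeroes, two 1-blocks and two 1-gaps, one 2-block and one 2-gap, and autocorrelation $-1/15$ for every non-zero shift.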

Conclusion
- there are many pseudorandom LFSR sequences
- sequences can be generated very fast
- expansion is exponential
- BUT stream ciphers with LFSR keystream are cryptographically weak

Known plaintext attack against LFSR stream ciphers

Theorem: A maximum period LFSR sequence of period $2^n - 1$ can be reconstructed from $2n$ consecutive bits.

Proof.

- let $c_0, \ldots, c_{n-1}$ be the feedback coefficients, then the following equation holds for the consecutive bits $x_r, x_{r+1}, \ldots, x_{r+2n-1}$:

$$
\begin{pmatrix} x_{r+n} \\ x_{r+n+1} \\ \vdots \\ x_{r+2n-1} \end{pmatrix}
=
\begin{pmatrix}
x_r & x_{r+1} & \ldots & x_{r+n-1} \\
x_{r+1} & x_{r+2} & \ldots & x_{r+n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{r+n-1} & x_{r+n} & \ldots & x_{r+2n-2}
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{pmatrix}
$$

- the left-hand side and the matrix are the known parts of the equation; the coefficients $c_i$ of the characteristic polynomial are the unknowns
- the matrix is invertible iff its rows generate all binary vectors, which is the case for maximum period LFSRs

Consequence: Stream ciphers based on LFSRs as keystream generators are insecure against known-plaintext attacks.
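The proof translates directly into a small attack sketch: build the matrix from $2n$ known keystream bits (in a known-plaintext attack these are ciphertext XOR plaintext) and solve the system over GF(2). The function names `solve_gf2`, `recover_coefficients` and `lfsr_bits` are assumptions for this illustration, and plain Gauss-Jordan elimination is used instead of any library routine.

```python
def lfsr_bits(coeffs, state, count):
    """Return `count` bits of x_{k+n} = sum_i c_i * x_{k+i} (mod 2), starting with `state`."""
    window = list(state)
    out = []
    for _ in range(count):
        out.append(window[0])
        new_bit = sum(c * x for c, x in zip(coeffs, window)) % 2
        window = window[1:] + [new_bit]
    return out


def solve_gf2(matrix, rhs):
    """Solve matrix * c = rhs over GF(2) for a square, invertible matrix."""
    n = len(matrix)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] == 1), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col and rows[r][col] == 1:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def recover_coefficients(bits, n):
    """Recover c_0, ..., c_{n-1} from 2n consecutive keystream bits."""
    matrix = [bits[i:i + n] for i in range(n)]   # row i: x_{r+i}, ..., x_{r+i+n-1}
    rhs = [bits[i + n] for i in range(n)]        # x_{r+n}, ..., x_{r+2n-1}
    return solve_gf2(matrix, rhs)


if __name__ == "__main__":
    secret = [1, 1, 0, 0]                        # 1 + X + X^4, illustrative choice
    keystream = lfsr_bits(secret, [1, 0, 0, 0], 30)
    found = recover_coefficients(keystream[:8], 4)
    print(found, lfsr_bits(found, keystream[:4], 30) == keystream)
```

Once the coefficients are known, the first $n$ observed bits serve as the state, so the attacker can regenerate the whole keystream.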

Linear Complexity

Published worksheet 06 linear complexity

Berlekamp-Massey-Algorithm

- Fact: solving a linear equation system in $n$ unknowns generally takes $O(n^3)$ time; this is the attacker's cost in a known plaintext attack
- the above equation system has a very special shape: entries along minor diagonals are constant (Hankel matrix), which indicates the existence of more efficient solution algorithms
- indeed the Berlekamp-Massey algorithm solves the equation system in time $O(n^2)$, making use of polynomial algebra over finite fields

Linear complexity

- more precisely: the Berlekamp-Massey algorithm finds the shortest LFSR (i.e., minimum number of registers) producing a given periodic bit sequence $s$
- the number of registers (= degree of the characteristic polynomial) is called the linear complexity of $s$
- good key stream generators have large linear complexity despite low memory
- as we have seen, large linear complexity requires non-linear FSRs (NLFSR)

Example: The shrinking generator

- the shrinking generator consists of two LFSR generators (A, S):
  - both generators are clocked simultaneously
  - A produces bits and S selects the bits to output
  - only if the S-bit is 1, the A-bit is output, otherwise it is discarded
  - thus S acts like a "decimating operator" in an irregular way
- Golomb properties of A and S ensure good pseudorandom properties of the shrinking generator (A, S)
- up to now no practical attacks against shrinking generators are known
- on the other hand, the generator lacks a satisfying mathematical theory to judge its security

Exercise 26

1. The published worksheet 06 linear complexity demonstrates a shrinking generator stream cipher. The worksheet also shows the generated key stream (by encoding the period-long 0-string). But the implementation is not very intuitive, because the wrapping into a cryptosystem demands a lot of formal overhead. Improve the implementation to a single function call ShrinkingGenerator((p1,IS1,p2,IS2),n) that, similar to the command lfsr_sequence(...,n), returns the sequence of the first n generated bits.
   Hint: You may take a look into the source code of ShrinkingGeneratorCryptosystem by entering ShrinkingGeneratorCryptosystem??
2. Determine the linear complexity of the shrinking generator from the worksheet using berlekamp_massey.
3. Read about the A5/1 stream cipher and implement it in Sage (http://en.wikipedia.org/wiki/A5/1).
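The notion of linear complexity from this section can be made concrete with a plain Python sketch of the Berlekamp-Massey algorithm over GF(2). This is a textbook formulation written for illustration, not the Sage function used in the worksheet; the name `berlekamp_massey_gf2` and the return format are assumptions. The feedback coefficients are returned in the convention $x_{k+L} = \sum_i c_i x_{k+i} \pmod 2$ used for LFSRs above.

```python
def berlekamp_massey_gf2(s):
    """Return (L, coeffs): the linear complexity L of the bit list `s` and
    feedback coefficients c_0, ..., c_{L-1} of a shortest LFSR generating it."""
    n = len(s)
    c = [1] + [0] * n      # current connection polynomial C(x)
    b = [1] + [0] * n      # copy of C(x) from before the last length change
    L, m = 0, -1           # current LFSR length, index of the last length change
    for i in range(n):
        # discrepancy between s[i] and the prediction of the current LFSR
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d == 1:
            previous = c[:]
            shift = i - m
            for j in range(n + 1 - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, previous
    # C(x) encodes s_i = sum_{j=1..L} c[j] * s_{i-j}; reverse into c_0, ..., c_{L-1}
    return L, [c[L - i] for i in range(L)]


if __name__ == "__main__":
    # One and a half periods of the sequence generated by 1 + X + X^4 (illustrative input)
    bits = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1] * 2
    print(berlekamp_massey_gf2(bits))
```

On this input the sketch reports linear complexity 4 with coefficients `[1, 1, 0, 0]`; the result is only guaranteed to describe the whole sequence when at least $2L$ bits are supplied.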

RC4 stream cipher

RC4 is an often used stream cipher designed by Ron Rivest in 1987. RC4 was kept as a trade secret by RSA Data Security until it leaked out in 1994. Today, RC4 is used in the SSL/TLS protocol, the WEP protocol and its successor, the TKIP protocol, as well as in many other protocols and applications.

Overview

RC4 consists of an internal state and two algorithms.

- The internal state of RC4 consists of a permutation $S = (S_0, \ldots, S_{255})$ of $(0, \ldots, 255)$ and two pointers $i$, $j$ to elements of $S$.
- The RC4 Key Scheduling Algorithm (RC4-KSA) transfers a key of length 1 to 256 bytes into an internal state of RC4.
- The RC4 Pseudo Random Generator Algorithm (RC4-PRGA) can be used to generate a key stream of arbitrary length after the internal state has been initialized. With every output byte produced by the RC4-PRGA, the internal state of RC4 is updated.

RC4-KSA

- To begin, the entries of the internal state $S = (S_0, \ldots, S_{255})$ are set equal to the values from 0 through 255 in ascending order; that is, $S_0 = 0, S_1 = 1, \ldots, S_{255} = 255$.
- A temporary vector $T$ is also created. If the length of the key is 256 bytes, then the key is transferred to $T$. Otherwise, the key is repeatedly copied to $T$ as many times as necessary to fill out $T$.
- see published worksheet 06.

RC4-PRGA

Once the internal state $S$ is initialized, the input key is no longer used. Stream generation involves cycling through all the elements of $S$ and, for each $S_i$, swapping $S_i$ with another byte in $S$ according to a scheme dictated by the current configuration of $S$.

After $S_{255}$ is reached, the process continues, starting over again at $S_0$.

see published worksheet 06 rc4.

encrypt/decrypt

To encrypt, XOR the next byte of the key stream with the next byte of plaintext. To decrypt, XOR the next byte of the key stream with the next byte of ciphertext.

Strength of RC4

A number of papers have been published analyzing methods of attacking RC4. None of these approaches is practical against RC4 with a reasonable key length, such as 128 bits. The WEP protocol, which uses RC4 and is intended to provide confidentiality on 802.11 wireless LAN networks, is vulnerable to a particular attack approach. In essence, the problem is not with RC4 itself but with the way in which keys are generated for use as input to RC4. This particular problem does not appear to be relevant to other applications using RC4 and can be remedied in WEP by changing the way in which keys are generated. This problem points out the difficulty in designing a secure system that involves both cryptographic functions and protocols that make use of them.
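The KSA, PRGA and XOR steps described in the RC4 slides above can be sketched in plain Python. The slides leave the permutation and output steps to the worksheet, so the index updates below follow the commonly published description of RC4 and are an assumption relative to this text; the function names `rc4_ksa`, `rc4_prga` and `rc4_crypt` are hypothetical. The sketch is for study only and should not be used to protect real data.

```python
def rc4_ksa(key):
    """Key scheduling: turn a key of 1 to 256 bytes into the initial permutation S."""
    if not 1 <= len(key) <= 256:
        raise ValueError("key length must be between 1 and 256 bytes")
    S = list(range(256))                          # S_0 = 0, ..., S_255 = 255
    T = [key[i % len(key)] for i in range(256)]   # key repeated to fill T
    j = 0
    for i in range(256):
        j = (j + S[i] + T[i]) % 256
        S[i], S[j] = S[j], S[i]
    return S


def rc4_prga(S):
    """Generate key stream bytes; the state is updated with every output byte."""
    S = list(S)            # work on a copy so the caller's state is untouched
    i = j = 0
    while True:
        i = (i + 1) % 256  # cycles through S_0, ..., S_255 and wraps around
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]


def rc4_crypt(key, data):
    """Encrypt or decrypt: XOR each data byte with the next key stream byte."""
    keystream = rc4_prga(rc4_ksa(key))
    return bytes(byte ^ next(keystream) for byte in data)


if __name__ == "__main__":
    ciphertext = rc4_crypt(b"Key", b"Plaintext")
    print(ciphertext.hex())
    print(rc4_crypt(b"Key", ciphertext))
```

Because encryption and decryption are the same XOR operation, a single function serves both directions.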