
06-20008 The University of Birmingham Autumn Semester 2012 School of Computer Science Eike Ritter 25 October, 2012 Handout 5 Summary of this handout: Stream Ciphers — RC4 — Linear Feedback Shift Registers — CSS — A5/1

II.2 Stream Ciphers

A stream cipher is a symmetric cipher that encrypts the plaintext units one at a time and that varies the transformation of successive units during the encryption. In practice, the units are typically single bits or bytes. In contrast to a block cipher, which encrypts one block and then starts over again, a stream cipher encrypts plaintext streams continuously and therefore needs to maintain an internal state in order to avoid obvious duplication of encryptions. This internal state is the main difference between stream ciphers and block ciphers. Recall that we have already seen how block ciphers avoid duplication and can also be transformed into stream ciphers via modes of operation. Basic stream ciphers are similar to the OFB and CTR modes for block ciphers in that they produce a continuous keystream, which is used to encipher the plaintext. However, modern stream ciphers are computationally far more efficient and easier to implement in both hardware and software than block ciphers and are therefore preferred in applications where speed really matters, such as real-time audio and video encryption.

Before we look at the technical details of how stream ciphers work, we recall the One Time Pad, which we briefly discussed in Handout 1. The One Time Pad is essentially an unbreakable cipher that depends on the fact that the key

1. is as long as the message, and

2. is a truly random sequence of letters that cannot be guessed.

Both points make the One Time Pad impractical: one has to constantly exchange new keys, and true randomness is difficult to obtain in practice.

39. Pseudo-random Generators

One idea to overcome this problem is to use keys that are not fully random but only look random. A relatively short string which is truly random is used to compute a longer string which, while of course not truly random, is as good as random for practical purposes. This long string is called a pseudo-random string, and it can be used to replace the random key in the One Time Pad. Algorithms that produce pseudo-random strings are called pseudo-random generators (PRG). The short string that initialises a pseudo-random generator is called a seed and takes the place of the secret key for stream ciphers. In overview a stream cipher works like this:

[Figure: the key/seed initialises the pseudo-random generator, which produces the keystream; plaintext ⊕ keystream = ciphertext.]
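The overview above can be sketched in a few lines of Python. This is only an illustration under a loud assumption: `random.Random` stands in for the pseudo-random generator because it is seedable and built in, but it is not cryptographically secure and must not be used for real encryption. The function names are chosen for this sketch.

```python
import random

def keystream(seed: int, length: int) -> bytes:
    """Expand a short seed into `length` pseudo-random bytes.

    Assumption: random.Random is a stand-in PRG for illustration only.
    It is NOT cryptographically secure.
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))

def xor_with_keystream(seed: int, data: bytes) -> bytes:
    """Encrypt or decrypt: xor every data byte with the matching keystream byte."""
    ks = keystream(seed, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

seed = 2012                      # plays the role of the secret key
plaintext = b"attack at dawn"
ciphertext = xor_with_keystream(seed, plaintext)
recovered = xor_with_keystream(seed, ciphertext)
print(ciphertext.hex())
print(recovered)
```

Since xor is its own inverse, the same function serves for encryption and decryption, as long as both sides start the generator from the same seed.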

40. Getting True Randomness

The seed for a pseudo-random generator, and keys for symmetric encryption schemes in general, should be as random as possible. One can use, for example, physical random number generators to get good randomness. There are some physical sources that are supposed to produce good randomness, but the resulting bits may have a certain bias or some correlation. One usually circumvents this by taking the xor of bits obtained from different such sources. Typical physical sources of randomness include:

• Thermal noise in various electric circuits,

• Radioactive decay,

• Atmospheric noise.

In practice, more easily available are events in computer hardware such as

• measurement of times between user key-strokes, and

• time needed to access different sectors on the hard-disk drive (the air turbulence caused by the spinning disk is supposed to be random).

41. Properties of Pseudo-random Generators

One of the most important misuses of stream ciphers is to re-use the seed, and therefore the keystream, i.e., to encrypt several messages with the same key. Assume Eve intercepts two encryptions C1 = K ⊕ M1 and C2 = K ⊕ M2 of two messages M1, M2 with the same key K. Then she can simply compute the xor of C1 and C2, yielding:

C1 ⊕ C2 = (M1 ⊕ K) ⊕ (M2 ⊕ K) = M1 ⊕ M2

Thus re-using the key leaks the xor of the actual plaintexts. Assuming that both messages contain ordinary text, Eve can use frequency analysis to recover the plaintexts M1 and M2 from M1 ⊕ M2. Thus one has to be careful not to re-use a key when using stream ciphers. There are mainly two methods to realise this:

• One might use successive parts of the output stream to encrypt successive messages. This requires synchronisation of the sender's and the receiver's streams by some means, usually by transmitting the position in the stream along with the encrypted message. This has disadvantages if the order of messages is changed in the transmission line or by the protocol.

• One might create a new seed for each message that needs to be encrypted and transmit the seed along with the message. Of course, the seed has to be transmitted secretly somehow. This can be done by combining the stream cipher with a block cipher: the seed is enciphered with the block cipher and transmitted before the actual ciphertext, which is encrypted with the stream cipher.

As a consequence it is important that the keystream of a stream cipher appears random (which can be checked with statistical methods) and has a long period, i.e. a large number of bits can be produced before the same keystream is produced again. In general, determining more of the sequence from a part of it should be computationally infeasible. Ideally, even if one knows the first one billion bits of the keystream sequence, the probability of guessing the next bit correctly should be no better than one half. We now have a look at several pseudo-random generators.
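The key re-use problem from the start of this paragraph can be checked directly. The following is a minimal sketch; the keystream bytes and the two messages are made-up values for illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise xor of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

k = bytes.fromhex("5a13c4e90b77a1d2386f")  # hypothetical 10 byte keystream K
m1 = b"send money"
m2 = b"keep quiet"

c1 = xor_bytes(m1, k)   # C1 = M1 xor K
c2 = xor_bytes(m2, k)   # C2 = M2 xor K

leak = xor_bytes(c1, c2)
print(leak == xor_bytes(m1, m2))   # the keystream has cancelled out
print(xor_bytes(leak, m1))         # knowing or guessing M1 reveals M2
```

Eve never needs K: once she knows or guesses one plaintext, the other follows immediately, and with two natural-language plaintexts she can work on both guesses at the same time.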

II.2.1 RC4

RC4 is a stream cipher invented by Ron Rivest in 1987 for RSA Security, which also holds the trademark for it. The source code was originally not released to the public because it was a trade secret, but it was posted anonymously to a newsgroup in 1994; people therefore referred to this version as alleged RC4. Today it is known that alleged RC4 indeed equals RC4. While RC4 does not hold up to most randomness tests, it is considered secure from a practical point of view if one takes certain precautions. It works on bytes instead of bits and can therefore be implemented very efficiently. It is used in many protocols such as SSL/TLS and 802.11b WEP.

RC4 consists of two phases: an initialisation phase, which can also be understood as a key schedule, and a keystream generation phase. Its main data structure is an array S of 256 bytes. The array is initialised to the identity before any output is generated, i.e., the first cell is initialised with 0, the second with 1 and so on. Then the cells are permuted using a swap operation that depends on the current state and the chosen key K. The key K can be of variable size, typically between 5 and 16 bytes. The key length is a constant that is used during the initialisation algorithm. In pseudo code, the RC4 initialisation phase works as follows:

    for i := 0 to 255 do
        S[i] := i
    end
    j := 0
    for i := 0 to 255 do
        j := (j + S[i] + K[i mod keylength]) mod 256
        swap(S[i], S[j])
    end

After initialisation has been completed, the following procedure computes the pseudo-random sequence. For each output byte, new values of the variables i, j are calculated, the corresponding cells are swapped, and the content of a third cell is output. The algorithm looks as follows:

    i := 0
    j := 0
    while GeneratingOutput:
        i := (i + 1) mod 256
        j := (j + S[i]) mod 256
        swap(S[i], S[j])
        output S[(S[i] + S[j]) mod 256]
    end

In the while loop the first line makes sure every array element is used once after 256 iterations; the second line makes the output depend non-linearly on the array; the third line makes sure the array is evolved and modified as the iteration continues; and the fourth line makes sure the output sequence reveals little about the internal state of the array. The generated keystream is then xor-ed with the plaintext byte by byte.

[Figure: graphical depiction of RC4. The K in the figure stands for the generated keystream byte and not for the initial key. Source: Wikipedia]

Nevertheless we can see that the first output byte depends on the content of only 3 cells. This property can be used to launch attacks against the cipher, so one usually discards the first 256 bytes of output generated by this algorithm to prevent these attacks.
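The two phases of RC4 translate almost line by line into Python. The sketch below follows the pseudo code of this section; the function names and the `drop` parameter (the number of initial keystream bytes to discard) are choices made for this example. The key and plaintext in the last line are a widely published RC4 test vector.

```python
def rc4_keystream(key: bytes, drop: int = 0):
    """Yield RC4 keystream bytes, discarding the first `drop` bytes."""
    # Initialisation phase (key schedule): start from the identity permutation.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Keystream generation phase.
    i = j = 0
    produced = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out = S[(S[i] + S[j]) % 256]
        produced += 1
        if produced > drop:
            yield out

def rc4(key: bytes, data: bytes, drop: int = 0) -> bytes:
    """Encrypt or decrypt `data` by xor-ing it with the RC4 keystream."""
    ks = rc4_keystream(key, drop)
    return bytes(b ^ next(ks) for b in data)

print(rc4(b"Key", b"Plaintext").hex())   # prints bbf316e8d940af0ad3
```

Calling `rc4(key, data, drop=256)` applies the precaution mentioned above; sender and receiver must of course agree on the number of discarded bytes.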

II.2.2 LFSR

A Linear Feedback Shift Register (LFSR) is a pseudo-random generator that is used as a building block for many modern stream ciphers. LFSRs can be implemented very efficiently in both hardware and software and constitute a very fast way to produce pseudo-random bit sequences. An LFSR consists of a shift register, which is a group of single bit cells that shift by one cell at every clock cycle, together with a linear function f, called the feedback function, that determines the new incoming bit for the shift register. The function f generally uses some of the bits in the shift register to determine the new input bit. For instance, below we have a 4 bit shift register, and the feedback function uses bits 1 and 4 to compute the new input.

[Figure: 4 bit shift register whose feedback function uses bits 1 and 4.]

The process of taking certain bits, but not all bits, from a shift register is referred to as tapping. Thus the feedback function f above taps the bits 1 and 4. The sequence of bits tapped from a shift register by a feedback function is also referred to as the tapping sequence. Since normally the feedback function f is a simple xor of the tapped bits, the behaviour of an LFSR is determined by the tapping sequence together with the initial content of the shift register, which serves as seed for the PRG, and thereby as the key for the cipher. As an example take the following 16 bit LFSR with tapping sequence [11, 13, 14, 16]:

[Figure: 16 bit LFSR with tapping sequence [11, 13, 14, 16]. Source: Wikipedia]
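A shift register with an xor feedback function is easy to simulate. The sketch below makes one assumption about conventions: cells are numbered from 1, the last cell is the output, every cell moves up by one position per clock cycle, and cell 1 receives the xor of the tapped cells. The function names are chosen for this example.

```python
def step(state, taps):
    """Advance the register by one clock cycle and return the new state."""
    feedback = 0
    for t in taps:
        feedback ^= state[t - 1]          # cells are numbered from 1
    return (feedback,) + state[:-1]       # shift, new bit enters at cell 1

def output_bits(seed, taps, count):
    """Return the first `count` output bits (the content of the last cell)."""
    state = tuple(seed)
    bits = []
    for _ in range(count):
        bits.append(state[-1])
        state = step(state, taps)
    return bits

def period(seed, taps):
    """Number of clock cycles until the register returns to its seed state.

    Assumes the last cell is tapped, so that every state lies on a cycle.
    """
    start = tuple(seed)
    state = step(start, taps)
    cycles = 1
    while state != start:
        state = step(state, taps)
        cycles += 1
    return cycles

print(period((1, 0, 0, 0), taps=(1, 4)))
print(period((1,) + (0,) * 15, taps=(11, 13, 14, 16)))
```

For both tapping sequences the period is the largest possible one for a non-zero seed, namely 2^n - 1 for an n bit register: the all-zero state can never be part of such a cycle, since it always maps to itself.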

42. Properties of LFSRs

The main property one is interested in for an LFSR is of course the length of its period, i.e. how long it takes before the same keystream is reproduced. Since the seeds are random, the only thing we can choose when constructing an LFSR is the tapping sequence, so we have to examine the effect of particular tapping sequences on the set of all possible initial states. This is done mathematically via computations in F2. I will only sketch this process here. Let L be an LFSR; then we define

1. the state vector of L for the shift register s with n cells as $s = [s_1, s_2, \ldots, s_n]$, where $s_n$ is the highest bit, i.e., the next output bit, and $s_1$ is the lowest bit, the one that will be filled with the result of the feedback function in the next step.

2. the connection polynomial $c(x) \in \mathbb{F}_2[x]$ as $c(x) = c_n x^n + \ldots + c_1 x + 1$, where $c_i$ is 1 if the ith cell in s is tapped and 0 if it is not tapped. For instance the connection polynomial for the above 16 bit LFSR is $x^{16} + x^{14} + x^{13} + x^{11} + 1$. Observe that the trailing 1 does not correspond to a cell in the shift register as we start counting the cells from 1 (i.e., $x^1$) rather than from 0 (i.e., $x^0$).

3. a matrix

$$M = \begin{pmatrix} c_1 & c_2 & \cdots & c_{n-1} & c_n \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}$$

We can now easily express the transitions of the state vector s for L by $s' = M s$, where s is read as a column vector and all arithmetic is done in $\mathbb{F}_2$, and observe the behaviour of L given different seeds for s. The properties of L depend strongly on the algebraic properties of the connection polynomial c(x), and reasoning about the security of L or how it could be attacked is done via examining c(x). We will not go into any detail and only have a brief look at two examples of how to generate sequences of states.

Example: Let L be a 4 bit register with connection polynomial $c(x) = x^3 + x + 1$. Thus we get

$$M = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

We denote the single states s of L by their corresponding integer value and recall that the highest bit is right and the lowest left, e.g., s = [1, 0, 0, 0] is 1 and s = [0, 0, 0, 1] is 8. Observe that for state 0 we will always get the 0 state as a result. We then get the following sequences of states:

[Figure: state transition diagram, which reads as follows.]

    cycle:               3 → 7 → 14 → 13 → 10 → 4 → 9 → 3
    entering the cycle:  1 → 3,  11 → 7,  15 → 14,  12 → 9,  2 → 4,  5 → 10,  6 → 13
    zero state:          8 → 0 → 0

The sequence can be easily computed using all 16 possible values for s. In the example for s = 8 we get

$$M s = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \cdot 0 + 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 \\ 1 \cdot 0 + 0 \cdot 0 + 0 \cdot 0 + 0 \cdot 1 \\ 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 0 + 0 \cdot 1 \\ 0 \cdot 0 + 0 \cdot 0 + 1 \cdot 0 + 0 \cdot 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

Example: Let L be a 4 bit register with connection polynomial $c(x) = x^4 + x + 1$. Thus we get

$$M = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

and the following sequences of states:

    cycle:       1 → 3 → 7 → 15 → 14 → 13 → 10 → 5 → 11 → 6 → 12 → 9 → 2 → 4 → 8 → 1
    zero state:  0 → 0
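The state sequences of the two examples can be reproduced by building the matrix M from the coefficients of the connection polynomial and multiplying it with every possible state vector over F2. The following sketch uses plain Python lists; the function names are chosen for this illustration.

```python
def transition_matrix(coeffs):
    """Build M from the coefficients [c1, ..., cn] of the connection polynomial."""
    n = len(coeffs)
    rows = [list(coeffs)]                       # first row: c1 ... cn
    for i in range(n - 1):                      # remaining rows: shifted identity
        rows.append([1 if j == i else 0 for j in range(n)])
    return rows

def mat_vec(M, s):
    """Compute M s with all arithmetic in F2."""
    return [sum(m * x for m, x in zip(row, s)) % 2 for row in M]

def to_int(s):
    """Integer value of a state; s[0] is the lowest bit s1."""
    return sum(bit << i for i, bit in enumerate(s))

def to_state(value, n):
    """State vector [s1, ..., sn] of an integer value."""
    return [(value >> i) & 1 for i in range(n)]

def successors(coeffs):
    """Map every state (as an integer) to its successor state."""
    n = len(coeffs)
    M = transition_matrix(coeffs)
    return {v: to_int(mat_vec(M, to_state(v, n))) for v in range(2 ** n)}

first = successors([1, 0, 1, 0])    # c(x) = x^3 + x + 1
second = successors([1, 0, 0, 1])   # c(x) = x^4 + x + 1
print(first)
print(second)
```

In the first example several states share a successor (for instance 1 and 9 both lead to 3), so some states are never reached again once they have been left. In the second example all 15 non-zero states lie on a single cycle, which is the behaviour one wants from an LFSR.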

43. Combining multiple LFSRs

While LFSRs are easy to build, they are very insecure in practice when used on their own. Even for large shift registers with good security properties, there are fairly efficient algorithms (such as the Berlekamp–Massey algorithm) to compute the connection polynomial from a sufficiently long keystream. Thus in practice stream ciphers are built by combining multiple LFSRs in a non-linear fashion (i.e., with functions other than simple xor). The picture below illustrates how the output keystreams of n LFSRs are combined with a function F to produce the eventual keystream K. As an example of a non-linear function F consider F(x1, x2, x3) = x1·x2 ⊕ x1·x3, where the xi are output bits of three different LFSRs and · is bit-wise and, i.e. multiplication in F2. We will discuss two examples of stream ciphers using multiple LFSRs below.

[Figure: the outputs of n LFSRs are combined by a function F into the keystream K. Source: Wikipedia]

Normally LFSRs are clocked regularly, i.e., for each bit of output they perform one shift. One way of introducing non-linearity is to make shifts dependent on the output of a second LFSR or, when combining several LFSRs, to express the clock of each LFSR as a function of selected bits of the other LFSRs. We will discuss this in more detail for the A5/1 stream cipher.
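The combination of three regularly clocked LFSRs with the example function F can be sketched as follows. The register sizes, seeds and tapping sequences are hypothetical toy values chosen for illustration; the registers use the convention that cells are numbered from 1 and the last cell is the output.

```python
from itertools import islice

def lfsr_bits(seed, taps):
    """Yield the output bits of a small LFSR (last cell is the output)."""
    state = list(seed)
    while True:
        yield state[-1]
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]

def F(x1, x2, x3):
    """Non-linear combiner from the text: x1*x2 xor x1*x3 over F2."""
    return (x1 & x2) ^ (x1 & x3)

def combined_keystream(gen1, gen2, gen3):
    """Combine the output bits of three LFSRs with F, one bit per clock cycle."""
    for x1, x2, x3 in zip(gen1, gen2, gen3):
        yield F(x1, x2, x3)

# Hypothetical toy registers, far too small for any real use.
r1 = lfsr_bits([1, 0, 1], taps=[2, 3])
r2 = lfsr_bits([1, 0, 0, 0], taps=[1, 4])
r3 = lfsr_bits([0, 1, 1, 0, 1], taps=[2, 5])

bits = list(islice(combined_keystream(r1, r2, r3), 20))
print(bits)
```

The product terms make the output depend on the register contents in a way that cannot be written as a single xor of state bits, which is what "non-linear" means here.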

II.2.3 CSS

The content scrambling system (CSS) was the old proprietary standard to encrypt multimedia DVDs for copy protection. It is a stream cipher with a 40 bit key, built from two combined LFSRs, and it is used together with fairly complex authentication and key extraction protocols. DVD players contain a CSS decryption module that performs key extraction and unscrambling. In a first step the DVD and the player authenticate each other and check their regional compatibility. (There are 8 different regions for DVDs and players in the world. DVDs released in a region will typically only play on players sold in that region, e.g., a DVD released in the UK “Region 2” would not play on a player sold in the US “Region 1”.) Each player has roughly 400 Player Keys that are used to extract a Disk Key from the DVD, which is in turn used to extract a title key for each individual track. The title key is then used to extract for each single sector a 5 byte sector key, which is stored in bytes 80–84 of a DVD sector. The 40 bits of the sector key are used as seed for the two LFSRs to start the actual decryption process.

Each sector key is of the form K0 K1 K2 K3 K4, where each Ki has 8 bits. The pseudo-random generator consists of:

• a 17 bit LFSR with connection polynomial $x^{15} + x + 1$ and seed 1 K0 K1,

• a 25 bit LFSR with connection polynomial $x^{23} + x^{20} + x^{19} + x^{11}$ and seed 1 K2 K3 K4,

• a combination function that adds the 8 bit outputs of the two LFSRs modulo 256, taking the carry bit from the previous addition into account.
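The combination function in the last item can be sketched on its own. The example below takes the bytes produced by the two LFSRs as given and only shows the addition modulo 256 with carry; further details of real CSS (such as the order in which register bits form a byte) are not described here and are left out. The function name is chosen for this illustration.

```python
def add_with_carry(bytes_a, bytes_b):
    """Add two byte streams modulo 256, feeding each carry into the next addition."""
    carry = 0
    out = []
    for a, b in zip(bytes_a, bytes_b):
        total = a + b + carry
        out.append(total % 256)   # keystream byte
        carry = total // 256      # 1 if the addition overflowed, else 0
    return bytes(out)

# 200 + 100 = 300 gives byte 44 and carry 1; then 1 + 1 + 1 = 3.
print(add_with_carry(bytes([200, 1]), bytes([100, 1])).hex())   # prints 2c03
```

The carry links successive keystream bytes, so the combination is not a simple xor of the two register outputs.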

[Figure: the seeds 1 K0 K1 and 1 K2 K3 K4 initialise the 17 bit and the 25 bit LFSR; each LFSR delivers 8 bits at a time, and the two bytes are added modulo 256 to form the keystream.]

44. Security and Legal Issues

In 1999 CSS was successfully reverse-engineered and is effectively insecure. Copies of the DeCSS libraries started appearing on the Internet and Open Source solutions to play back CSS encrypted DVDs were created. However, the CSS licensing agency obtained court injunctions against both people developing DeCSS tools and web sites distributing the code. Eventually in 2003 the California Supreme Court threw out the last lawsuit on grounds that DeCSS falls under “freedom of speech”. According to the court’s decision it is not legal to offer binary tools to enable illegal copying of protected content; however, it is perfectly legal to develop alternative means to decrypt CSS for legal purposes, such as the replay of legally purchased DVD content, as well as to offer source code for others to read and to work on.

In 2003 the European Parliament passed the European Directive on Copyright and Related Rights Regulations in which the development, provision, and possession of tools,

“the sole intended purpose of which is to facilitate the unauthorised removal or circumvention of the technical device”

is prohibited. As this definition is fairly vague, different member states implemented the directive differently in their respective national legislation. Some went so far as to even outlaw the discussion of how such tools could be built, as it is part of the development process. In the UK legislation the concept of “fair dealing” prevents these extremes, as working with protected content

“for the purposes . . . of research for a non-commercial purpose. . . the purpose of criticism or review . . . does not infringe any copyright in the work provided it is accompanied by a sufficient acknowledgement”

Some of the debated issues on how digital copyright differs from normal copyright are:

Fair Use goes further than just fair dealing. It also allows the use of copyright protected material in a commercial context for a number of purposes, such as criticism, parody and education.

Lifetime of Copyrights is normally limited in order to protect the copyright owner and enable them to earn proceeds for a finite time before their intellectual property will become public domain. An effective, enforceable digital content protection makes copyright effectively infinite.

Liability Issues The focus on who commits a copyright infringement shifts from the actual perpetrator (i.e. the person that illegally copies content) to the tool-maker (i.e. the person providing software to unscramble content). For other copyrighted material, this is not the case: for example, a photocopier can easily be abused to mass-copy a copyrighted book. However, no-one would sue the manufacturer of the photocopier. DeCSS is a library that can of course be used for ripping DVDs illegally, but also for playing back legally purchased content.

45. Illegal Primes

Since any data on a computer is essentially only a string of binary digits, it can be represented as a single number. Therefore, any program can simply be published as a single number as well. The C implementation of DeCSS for the Linux operating system was probably the first program to be published as a single, executable decimal number in 2001. Interestingly enough, it turned out that the number representing the program was a prime number. Since it was argued at the time that DeCSS was an illegal means of copyright infringement, DeCSS was the first known instance of a number whose possession was deemed illegal by some, or simply an illegal prime number.

Illegal primes are special instances of illegal numbers that represent some secret which is illegal to possess or distribute. These not only include implementations such as DeCSS but also software product keys, etc. As already mentioned for the AACS controversy, the question is whether it is possible to patent a number and enforce that patent by stopping others from using and, in particular, publishing such a number, as it might infringe a patent or copyright. Since I can represent any number in different bases (binary, octal, decimal, hexadecimal, . . . ) as well as a combination of other numbers using mathematical operations, would that imply that all representations are illegal? In the particular case of AACS, it can be used with millions of keys. Are all those illegal numbers, and can one be prevented from using these numbers?

Thinking this through further, illegal numbers have even more implications: one can mathematically show that you can produce an infinite number of primes that represent the “compressed” version of a program. (I put compressed in quotation marks since clearly some of the compressed programs are larger than the uncompressed ones.) This would mean that there is a potentially infinite number of illegal numbers.
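That any program can be written down as a single number, and that the same number has many representations, is easy to try out. In the sketch below the program text is a hypothetical stand-in, not DeCSS, and the big-endian byte order is one arbitrary choice among several.

```python
# Hypothetical stand-in for a program's source text.
source = b"int main(void) { return 0; }\n"

# Read the bytes as one big integer (most significant byte first).
number = int.from_bytes(source, "big")

# The same number in different bases.
as_decimal = str(number)
as_hex = hex(number)

# Turning the number back into the program text.
length = (number.bit_length() + 7) // 8
restored = number.to_bytes(length, "big")

print(as_decimal)
print(as_hex)
print(restored == source)
```

The conversion back works here because the text does not start with a zero byte; leading zero bytes would not survive the round trip without storing the length separately.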

II.2.4 A5/1

A5/1 is a stream cipher used in GSM mobile phone communication. It was developed in 1987 in Europe and, while it was initially kept secret, it became public knowledge through leaks and reverse engineering. It is built from three irregularly clocked LFSRs, which are initialised with a 64 bit secret key (actually only 54 bits are relevant as 10 bits are fixed to 0) and a 22 bit publicly available frame number. In detail the pseudo-random generator consists of:

• a 19 bit LFSR with connection polynomial $x^{19} + x^5 + x^2 + x + 1$ and clock bit 9,

• a 22 bit LFSR with connection polynomial $x^{22} + x + 1$ and clock bit 11,

• a 23 bit LFSR with connection polynomial $x^{23} + x^{15} + x^2 + x + 1$ and clock bit 11.

The output of the LFSRs is simply xor-ed together to compute the keystream. The idea of the clock bits is that a shift register is only shifted if its clock bit is the same as the majority of the three clock bits of the three LFSRs taken together. For example, if two clock bits are 0 and one is 1, then only those two registers with clock bit 0 will be shifted. Thus in each cycle either two or three registers are shifted.

[Figure: the three A5/1 registers with their taps and clock bits. The registers are enumerated starting with 0, thus the clock bits are labelled 8 and 10. Source: Wikipedia]

The shift registers are initialised by xor-ing the 64 bit key and the 22 bit frame number step-wise into the initially empty (all-zero) registers. This takes 86 cycles. The next 100 cycles are then computed and discarded. Afterwards communication can begin. Call transmission consists of a sequence of bursts; one burst is sent every 4.615 milliseconds and contains 114 bits of information. For each burst A5/1 produces a 114 bit keystream, which is xor-ed with the 114 bits of the burst before the digital signal is transformed into an audio signal.
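The majority clocking rule can be sketched in isolation. The code below is not a complete A5/1 implementation: key and frame number loading are left out, and the tap positions (counted from 0) are an assumption taken from commonly published descriptions of the cipher rather than from this handout. The clock bits 8, 10 and 10 are the 0-based versions of the clock bits 9, 11 and 11 above.

```python
# Register parameters, all bit positions counted from 0.
# Assumption: tap positions follow commonly published descriptions of A5/1.
REGISTERS = [
    {"length": 19, "clock": 8,  "taps": (13, 16, 17, 18)},
    {"length": 22, "clock": 10, "taps": (20, 21)},
    {"length": 23, "clock": 10, "taps": (7, 20, 21, 22)},
]

def bit(value, position):
    return (value >> position) & 1

def majority(a, b, c):
    """Value taken by at least two of the three bits."""
    return 1 if a + b + c >= 2 else 0

def shift(value, spec):
    """Shift one register by one cell; the xor of its taps enters at bit 0."""
    feedback = 0
    for t in spec["taps"]:
        feedback ^= bit(value, t)
    mask = (1 << spec["length"]) - 1
    return ((value << 1) & mask) | feedback

def clock(state):
    """One cycle: shift the registers that agree with the majority, then output."""
    clock_bits = [bit(v, spec["clock"]) for v, spec in zip(state, REGISTERS)]
    maj = majority(*clock_bits)
    new_state = [
        shift(v, spec) if cb == maj else v
        for v, spec, cb in zip(state, REGISTERS, clock_bits)
    ]
    out = 0
    for v, spec in zip(new_state, REGISTERS):
        out ^= bit(v, spec["length"] - 1)   # xor of the three highest bits
    return new_state, out

def keystream_bits(state, count):
    bits = []
    for _ in range(count):
        state, out = clock(state)
        bits.append(out)
    return bits

# Hypothetical register contents, standing in for a properly initialised state.
print(keystream_bits([0x12345, 0x2ABCDE, 0x1F00FF], 114))
```

Because a register is skipped whenever its clock bit is in the minority, the number of shifts each register has performed after a given number of cycles depends on the contents of all three registers.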

Cryptography Glossary 5

A5/1: A stream cipher used in GSM mobile phone communication.

Connection Polynomial: Polynomial used to describe and analyse an LFSR based stream cipher.
Content Scrambling System: A proprietary standard to encrypt multimedia DVDs for copy protection.
CSS: Short for Content Scrambling System.

Feedback Function: A function that determines a new bit for a shift register by tapping the content of some of the cells of the register.

Illegal Number: A number that represents some secret which is illegal to possess or distribute.
Illegal Prime: An illegal number that happens to be a prime. The first known instance of an illegal prime was a number representing the C source code of DeCSS.

Keystream: A stream of bits used for encryption by xor-ing it with the plaintext in a stream cipher.

LFSR: Short for Linear Feedback Shift Register.
Linear Feedback Shift Register: A shift register with a linear feedback function that is a building block for many modern stream ciphers.

PRG: Short for Pseudo-random Generator.
Pseudo-random Generator: An algorithm that produces a bit string that looks like a random sequence.
Pseudo-random String: A bit string that looks like a random sequence but was generated by a pseudo-random generator.

RC4: A stream cipher used in many protocols such as SSL/TLS and 802.11b WEP.

Seed: The initial value for a pseudo-random generator.
Shift Register: A group of single bit cells that shift by one cell at every clock cycle.
Stream Cipher: A symmetric cipher that encrypts plaintext continuously.

Tapping: Taking certain, but not all, bits from a shift register.
Tapping Sequence: The sequence of bits tapped from a shift register by its feedback function.