Dissertation Docteur De L'université Du Luxembourg

Home , Shrinking generator

PhD-FSTC-2011-21 The Faculty of Sciences, Technology and Communication

DISSERTATION

Defense held on 25/11/2011 in Luxembourg

to obtain the degree of

DOCTEUR DE L’UNIVERSITÉ DU LUXEMBOURG

EN INFORMATIQUE

Deike PRIEMUTH-SCHMID Born on 22 May 1981 in Oranienburg (Germany)

ANALYSIS OF RESYNCHRONIZATION MECHANISMS OF STREAM CIPHERS

Dissertation defense committee Dr. Alex Biryukov, dissertation supervisor A-Professor, Université du Luxembourg

Dr. Volker Müller, Chairman A-Professor, Université du Luxembourg

Dr. Frederik Armknecht A-Professor, Universität Mannheim

Dr. Ralf-Philipp Weinmann, Vice Chairman Université du Luxembourg

Abstract

Stream ciphers are cryptographic primitives belonging to symmetric key cryptography to ensure data confidentiality of messages sent through an insecure communication channel. This thesis presents attacks on several stream ciphers, especially against their initialization methods. The first targets are the stream ciphers Salsa20 and Trivium. For both ciphers slid pairs are described. Salsa20 can be distinguished from a random function using only the slid pair relation. When a slid pair is given for Salsa20 both secret keys can be recovered immediately if the nonces and counters are known. Also an efficient search for a hidden slid pair in a large list of ciphertexts is shown. The efficiency of the birthday attack can be increased twice using slid pairs. For the cipher Trivium a large related-key class which produces identical keystreams up to a shift is presented. Then the resynchronization mechanism of the stream ciphers SNOW 3G and SNOW 2.0 is analyzed. Both ciphers are simplified by replacing all additions modulo 232 with XORs. A known IV key-recovery attack is presented for SNOW 3G⊕ and SNOW 2.0⊕ where both ciphers have no feedback from the FSM. This attack works for any amount of initialization clocks. Then in both ciphers the feedback from the FSM is restored and the number of 33 initialization clocks is reduced. Chosen IV key-recovery attacks on SNOW 3G⊕ with 12 to 16 initialization clocks and SNOW 2.0⊕ with 12 to 18 initialization clocks are shown. In a similar way versions of the stream cipher K2 are attacked. This cipher is simplified by replacing all additions modulo 232 with XORs as well. Chosen IV key-recovery attacks on versions with reduced initialization clocks from five to seven out of 24 are presented. For the version with seven initialization clocks also a chosen IV distinguishing attack is shown. The last part deals with a linear key-IV setup and known feedback polynomials of the shrinking generator. It is shown that this linear initialization results in a very weak cipher as only a few known IVs are required to recover the secret key. The original design of the shrinking generator does not include any initialization method so the initial state was assumed to be the secret key.

Acknowledgment

Many people have contributed to this thesis. I am grateful to have the op- portunity to thank all of them and some people in particular. First and foremost, I wish to thank my supervisor Alex Biryukov for always having time for my problems and for being contactable at almost any time. His deep knowledge and encouragement were an ongoing motivation for me through all these years. Special thanks go to my coauthors Bin Zhang and Ralf-Philipp Weinmann for our inspiring discussions on various cryptography topics. In particular I am grateful for Ralf’s comments and suggestions and primarily for proof reading my thesis. I would like to thank Volker Müller, Frederik Armknecht and Stefan Lucks for having agreed to be members of my thesis jury. Without Stefan’s exciting lectures about cryptography during my studies I would probably never have thought about writing my thesis in this scientific area. I heartily thank all my colleagues at LACS, especially Dmitry, Alexander, Ivica, Ilja, David, Johann and Jean-François, as well as Barbara, Ton, Sasa and Hugo of the SaToSS group. Lunch breaks would have been much less fun and relaxing without you. I would also like to express my thanks and appreciation to our secretaries Ragga, Mireille and Fabienne for always being supportive with all the administrative matters and for never being too busy for an enjoyable talk. Additional thanks go to Kay Dittner for proof reading parts of my thesis and correcting my sometimes inscrutable English phrases. As she did not proof read this acknowledgment I hope I found a matching English expression here. And although we have never met I would like to thank Martin Sievers for his helpful advice concerning LATEX questions. My boundless gratitude go to my husband Oliver for his help and encouragement through my whole PhD time. While proof reading my thesis he pointed out several incomprehensible sentences and errors. Any errors that still remain are completely my own responsibility. Oliver and our daughter Freya were a continuous impulse to overcome all difficulties to finalize this thesis. They are the best that happened to me and I am looking forward to the years to come.

To my husband Oliver

Contents

1 Introduction 1 1.1 What is Cryptography? ...... 1 1.2 Security Goals ...... 2 1.3 Abridgment of Cryptographic History ...... 3 1.4 Thesis Outline ...... 7

2 Stream Ciphers 11 2.1 Basic Concept of Stream Ciphers ...... 11 2.2 Building Blocks ...... 15 2.2.1 Linear Feedback Shift Register ...... 15 2.2.2 Non-Linear Feedback Shift Register ...... 16 2.2.3 Boolean Function ...... 16 2.2.4 S-box ...... 18 2.2.5 Finite State Machine ...... 19 2.2.6 Additional Components ...... 19 2.3 Stream Ciphers Analyzed in this Thesis ...... 20

3 Survey of Attacks 23 3.1 Brute Force Attack ...... 24 3.2 Time/Memory Tradeoﬀs ...... 24 3.3 Correlation Attacks ...... 24 3.4 Diﬀerential Cryptanalysis ...... 25 3.5 Algebraic Attacks ...... 25 3.6 Cube Attacks ...... 27 3.7 Side-Channel Attacks ...... 27

4 Slid Pairs in Salsa20 and Trivium 29 4.1 Slid Pairs in Salsa20 ...... 31 4.1.1 Brief Description of Salsa20 ...... 31 4.1.2 Slid Pairs ...... 32 4.1.3 Sliding State Recovery Attack on the Davies-Meyer Mode ...... 34 4.1.4 Related Key Key-Recovery Attack on Salsa20 ..... 36

I II Contents

4.1.5 A Generalized Related Key Attack on Salsa20 ..... 37 4.1.6 Time-Memory Tradeoﬀ Attacks on Salsa ...... 41 4.2 Slid Pairs in Trivium ...... 42 4.2.1 Brief Description of Trivium ...... 42 4.2.2 Slid Pairs ...... 43 4.2.3 Systems of Equations ...... 44 4.3 Summary ...... 46

5 Analysis of SNOW 3G⊕ and SNOW 2.0⊕ Resynchronization Mechanism 49 5.1 Description of both SNOW Ciphers ...... 50 5.1.1 Description of SNOW 3G and SNOW 3G⊕ ...... 50 5.1.2 Description of SNOW 2.0 and SNOW 2.0⊕ ...... 52 5.2 Known and Chosen IV Attacks on Versions of SNOW 3G⊕ .. 53 5.2.1 Difference Propagation through the S-boxes ...... 53 5.2.2 Known IV Attack without FSM to LFSR Feedback .. 55 5.2.3 Differential Chosen IV Attacks ...... 58 5.3 Known and Chosen IV Attacks on Versions of SNOW 2.0⊕ .. 63 5.3.1 Known IV Attack without FSM to LFSR Feedback .. 64 5.3.2 Differential Chosen IV Attacks ...... 66 5.4 Summary ...... 70

6 Attacks on Simplified Versions of K2 73 6.1 Description of K2 and K2⊕ ...... 73 6.2 Differential Chosen IV Attack with Key-Recovery ...... 76 6.2.1 Difference Propagation through the S-boxes ...... 76 6.2.2 Reduced Initialization of 5 Clocks ...... 78 6.2.3 Reduced Initialization of 6 Clocks ...... 81 6.2.4 Reduced Initialization of 7 Clocks ...... 83 6.3 Chosen IV Distinguishing Attack ...... 87 6.4 Summary ...... 93

7 Cryptanalysis of the Shrinking Generator with Linear Ini- tialization 95 7.1 Description of the Shrinking Generator ...... 96 7.2 Known IV Key-Recovery Attack ...... 96 7.3 Summary ...... 101

8 Conclusion 103

Bibliography 105

A Salsa20 System of Equations for a Slid Pair 117

B SNOW 3G Simplify the Equation with two S-boxes 119

Chapter 1

Introduction

1.1 What is Cryptography?

The notation “cryptography” originates from the Greek words κρυπτoς´ = krypt´os which means secret or hidden and γραϕειν´ = gr´aphein which means writing. Thus, it is the science of secret transmission, encryption, decryption, code developing, code breaking, etc. A famous paraphrase of the quintessence of cryptography was given by a well-known cryptographer: “Cryptography is about communication in the presence of adversaries.” Ron Rivest This describes in a very clear way one of the main goals of cryptography where two persons want to communicate and hide the shared information from unwanted persons who can listen to their conversation. The symmetric-key cryptography deals with the protection of communication based on Shannon’s model in Figure 1.1.

secret channel key key plaintext plaintext

encryption ciphertext decryption

public channel sender receiver

Figure 1.1: Shannon’s model

1 2 Chapter 1. Introduction

The sender and receiver have access to a secret channel and a public channel. Messages going through the secret channel can not be monitored, altered or erased by an adversary. Though the secret channel is limited in such a way that it is only established at a certain point in time, has a slow transfer rate, can only handle small messages or has other restrictions. The public channel is always available without limitations but messages can be eavesdropped by an intruder. Thus, the sender and receiver have to use a method to encrypt and decrypt their messages. The sender encrypts the message called plaintext to a ciphertext using an encryption algorithm together with a secret key. This key is transmitted via the secret channel and the ciphertext is sent through the public channel. Then the receiver uses the same key to decrypt the ciphertext yielding the plaintext and is able to read the message. The eavesdropper can listen to the ciphertext in the public channel but he can not recover the original message as he does not have the right key. The symmetric-key encryption algorithms can be divided into stream ciphers, which are the topic of this thesis, and block ciphers. Stream ciphers encrypt the plaintext one digit at a time whereas block ciphers take a “block” of plaintext digits at once and encrypt it. Another branch of cryptography dealing with the protection of communication is the public-key cryptography. The sender and receiver do not share the same key anymore. Instead the receiver creates a pair of diﬀerent keys with one key kept secret and the other one broadcasted over the public channel. The so-called public key is used by the sender to encrypt the message which is transmitted over the public channel. Only the receiver knows the secret key connected to the public key and can decrypt the message. There are many other areas of cryptography such as hash functions, digital signatures, etc., each one invented to achieve another speciﬁc goal.

1.2 Security Goals

Over the years the development of cryptography has advanced faster and faster, especially since the invention of computers and even more with the growing importance of inter-computer communication. Not only the methods for cryptography have changed, also the goals for cryptography have altered to meet the requirements of modern communication systems. Nowa- days, it is not suﬃcient anymore to only hide the information of a message from an unwanted person. In modern cryptography four diﬀerent security goals can be distinguished.

Data conﬁdentiality. This is probably the most widespread goal. Two or more parties want to communicate over an open channel and hide the information from unwanted persons. This can be done by using a cipher 1.3. Abridgment of Cryptographic History 3 to encrypt the message. The sender encrypts a message with a key so that it usually looks random. Only the receiver who is intended to read the message and therefore has the right key can decrypt the message and get the information. Any person who does not have the key should not be able to get the information from the encrypted message.

User authentication. This goal is also best-known nowadays, since nearly every electronic device which stores and deals with personal information requires a password from the user for the login. The identiﬁcation of the user to another party which can be another person, a computer or any system with restricted access can be achieved by proving that he knows a secret (password, code), owns a unique object (ID card, token) or has a biometric characteristic (ﬁngerprint, eyes).

Data integrity. With the development of the internet and especially of e-mail communication and ﬁle transfer this goal becomes more and more important. When a person sends a message or ﬁle to another person then data integrity means that this message is not altered during its way through the communication channel. One way to ensure integrity is to use a Message Authentication Code.

Other goals of cryptography are non-repudiation, access control, anonymity, revocation, certiﬁcation, timestamping etc.

1.3 Abridgment of Cryptographic History

Although they were probably no form of secret communication, exceptional hieroglyphs chiseled into ancient Egyptian monuments around 1900 BC can be considered the ﬁrst evidence of classic cryptography [CYC06]. In the sixth century BC Hebrew scholars began to use monoalphabetic substitution ciphers like Atbash or Albam to replace letters based on their order in the alphabet [Coh95]. Though questioned by Kelly [Kel98] the Scytale (σκυταλη´ ) usually is considered a cryptographic tool developed in ancient Greece, supposedly used by Spartan commanders to transmit secret messages. The ﬁrst transposition cipher was made possible by winding e.g., a strip of parchment around a baton and writing the message onto it (see Figure 1.2). Once unwound the original text was “unreadable” due to the rearranged order of the letters and could be decrypted by using a device of the same diameter. The probably best-known cipher of ancient history is a substitution cipher known as Caesar Cipher. The additive cipher, that replaced each letter by another obtained through shifting in the alphabet, was named after the famous Roman statesman. Concerning the English alphabet each letter is 4 Chapter 1. Introduction

Figure 1.2: A parchment wound around a Skytale [Wik07]

substituted by the letter with an additional value of an integer constant c modulo 26, in the case of Julius Caesar this value was three. After several centuries lacking noteworthy advances in both cryptography and cryptanalysis, the first documented description of frequency analysis to break monoalphabetic substitution ciphers was penned by al-Kindi in the ninth century AD [Sin99]. In his “Manuscript for the Deciphering Cryp- tographic Messages” (“Risalah fi Istikhraj al-Mu’amma”) he also provided cryptanalysis techniques for both mono- and polyalphabetic ciphers. Another important medieval cryptographer was Leon Battista Alberti, the author of a treatise on ciphers in 1467 entitled “De Cifris” who was addressed as the “Father of Western Cryptography” by cryptography histo- rian David Kahn [Kah67a]. The Italian invented the Alberti Cipher Disk that allowed a polyalphabetic substitution with mixed alphabets and variable period and thus made frequency analysis ineffective due to the masked frequency distribution. In 1586 the French diplomat Blaise de Vigenère took credit for an idea originally described in the book “La cifra del Sig. Giovan Battista Bel[l]aso”. In 1553 Bellaso had introduced a method that used a square table, the so- called “tabula recta”1, and a key termed “countersign” to switch the cipher alphabets for each letter periodically [Kah67a]. The repetition of the usually short key was the starting point for attacks, as guessing the right key length n allowed to split the cipher into n Caesar Ciphers. Nevertheless a version of this method named Vigenère Cipher was considered indecipherable until Friedrich Kasiski published a successful general attack on the cipher in his book “Secret Writing and the Art of Deciphering” (“Die Geheimschriften und die Dechiffri[e]r-Kunst”) in 1863 [Kas63].2

1The device is also known as Vigenère square or Vigenère table and can be used for both en- and decryption. 2Charles Babbage might have been the ﬁrst to break the Vigenère Cipher but Kasiski was the ﬁrst cryptanalyst to publish his results. 1.3. Abridgment of Cryptographic History 5

The second important publication of the late nineteenth century was Auguste Kerckhoffs’s article “La Cryptographie Militaire” in 1883 [Ker83] that contained some pioneering ideas whereof some are still fundamental for today’s cryptography. A part of this specification known as Kerckhoffs’ principle will be explained in Section 2.1. The beginning of the twentieth century brought several innovative con- cepts to the cryptographic world, in fact one of them actually dated back several years as the basic concept of the One-Time Pad originally was deliv- ered in 1882 [Bel11]. The cipher whose perfect secrecy was later proven by Claude E. Shannon is described more detailed in Section 2.1, likewise. It was World War II to usher in a new era of cryptography. At that time mechanical and electromechanical devices were frequently used for military communication and manual techniques only remained in use where these machines were not suitable. In consequence of these developments the new century was affected by several fundamental changes. Now it were no longer the linguists but the mathematicians working on cryptanalysis and thus trying to develop new methods for encryption, too. And due to the complexity of these new machines the development of elaborate cryptanalysis techniques became inevitable as manually executed brute force attacks were no longer feasible. The Enigma (from the Greek word α´ινιγµα meaning “riddle”) is probably the best-known example for these electromechanical machines. The three- rotor, single-notched Enigma (see Figure 1.3) combined the functionality of a plugboard and three rotating wheels with further setting options to a the- oretical amount of 3.3×10114 possible configurations which equates 380 bits. Even if the architecture of this far most common model was known, there were still 1.1 × 1023 possible cryptovariables left when trying to determine the daily key, which is comparable to a 76-bit key [Mil01]. Nevertheless, the Enigma Code was cracked due to some of its weaknesses, in particular by exploiting the facts that the plugboard connections were reciprocal3 and no letter was encrypted to itself [Mah45]. For this purpose the British Govern- ment Code and Cypher School had recruited scientists at Bletchley Park, mainly mathematicians like Alan Turing, usually referred to as the “Father of Computer Science”. With the deployment of Colossus Mark 1 helping to crack the Enigma Code the digital age had begun in February 1944. In the late forties Claude E. Shannon published the often-cited article “A Mathematical Theory of Communication” [Sha48] and the fundamental paper “Communication Theory of Secrecy Systems” [Sha49]. The latter contained the already mentioned model of communication and the important proof of the One-Time Pad’s perfect secrecy based on results developed during WWII and described in a classified report [Sha45].

3Reciprocity implicates that if a certain state would encrypt O to S it would encrypt S to O in the same manner. 6 Chapter 1. Introduction

Figure 1.3: A three-wheel Enigma machine [Wik05]

After World War II both cryptography and cryptanalysis was almost ex- clusively studied by military and intelligence until the scientific community got the chance to participate actively. Two requests for proposals by the National Bureau of Standards in 1973 and 1974 led to the Data Encryp- tion Standard (DES) that was developed by researchers of IBM and finally published as Federal Information Processing Standard (FIPS) in 1977. DES and its successors are symmetric-key block ciphers with original DES having only a 56-bit key due to NSA demands4 contrary to its predecessor Lucifer already using 128-bit keys [Joh09]. Another moment of high importance in the seventies is known as Diffie- Hellman key exchange, an algorithm that keeps a shared key secret to out- siders although the communication to generate this key can be eavesdropped. Whitfield Diffie and Martin Hellman proposed this innovative method for distributing keys in their article “New Directions in Cryptography” [DH76]. They proposed an asymmetric algorithm combining a so-called “private key” that has to be kept secret with a so-called “public key” that can be made available to anybody and thus especially be communicated via an insecure channel. By computing the shared key based on exponentiation of a generator of a group and the utilization of both private and public key the result

4Originally the National Security Agency had even wanted a key length of 48 bits, presumably for “security reasons”. 1.4. Thesis Outline 7 stays exclusive to both participants of the key exchange. Two years later Ron Rivest, Adi Shamir and Leonard Adleman published an article covering the same topic presenting a public key cryptography algorithm known as RSA [RSA78] that also was patented in the United States [RSA83]. Although proven insecure being vulnerable to brute force attacks taking 56 hours in 1998 DES was retained as standard until the Advanced En- cryption Standard (AES) became eﬀective as FIPS in May 2002. Six months earlier the block cipher Rijndael developed by the Belgians Joan Daemen and Vincent Rijmen had been selected as replacement by the National Institute of Standards and Technology5. Since then several components developed for AES have been reused in other cryptography algorithms, amongst them the S-box and the MixColumns operation which both will appear in this thesis repeatedly.

1.4 Thesis Outline

Chapter 1 characterizes cryptography in general and gives a description of its goals. Afterwards an abridgement of the history of cryptography is given. Chapter 2 states the general concept of a stream cipher. It starts with defining a cipher and then introduces the well-known one-time pad cipher which was the motivation for stream ciphers. Afterwards the commonly used building blocks (LFSR, NLFSR, Boolean function, S-box, FSM) are explained. Chapter 3 contains a survey of cryptanalytic attacks. Before describing several attacks we define the goal of an attack as well as the knowledge of the attacker and which phase (keystream generation or initialization) he strikes. The first three chapters are influenced to some extent by some books [MvOV96, LN97], PhD theses [Arm06, Hel07, Max06, Ekd03, Lan06, Fis08] and internet sources like Wikipedia [Wik11]. Chapter 4 describes slid pairs for the stream ciphers Salsa20 and Trivium. Both ciphers are in the final portfolio of the eSTREAM project. Given such a slid pair for Salsa20 we can recover both secret keys immediately if the nonces and counters are known. We can also find efficiently a hidden slid pair in a large list of ciphertexts and recover both unknown keys. The slid pairs can also be used to increase the efficiency of the birthday attack twice. For Trivium we show a large related key-class which produces identical keystreams up to a shift. Our results on slid pairs in Salsa20 and Trivium are presented in [PSB08]. Chapter 5 presents attacks against simplified versions of the stream cipher SNOW 3G and its predecessor SNOW 2.0. The two ciphers are simplified by replacing all additions modulo 232 with XORs. For both SNOW 3G and SNOW 2.0 we show a known IV key-recovery attack without feedback from

5Until 1988 known as National Bureau of Standards. 8 Chapter 1. Introduction the FSM working for any amount of initialization clocks on the one hand and chosen IV key-recovery attacks on reduced numbers of initialization clocks on the other hand. Our results on SNOW 3G are presented in [BPSZ10]. Chapter 6 describes attacks against simplified versions of the stream cipher K2. The simplification is received by replacing all additions modulo 232 with XORs. We present chosen IV key-recovery attacks on versions with reduced initialization clocks from five to seven out of 24 initialization clocks and a chosen IV distinguishing attack on the version with seven initialization clocks. The attack on the reduced version with five initialization clocks and the distinguishing attack will appear in [PS11]. The extension of the chosen IV key-recovery attack to six and seven clocks was obtained later. Chapter 7 shows that the shrinking generator with a linear key-IV setup and known feedback polynomials can easily be broken using a few known IVs. Originally, the shrinking generator was published without any initialization mode and the initial state was assumed to be the secret key. We studied a linear initialization with key and IV and known feedback polynomials. In this case we only needed a few known IVs to receive the secret key. This work is part of a paper currently under submission. Chapter 8 gives some final conclusions.

List of publications

During my time as PhD student I have published the following papers which are part of this thesis:

• D. Priemuth-Schmid and A. Biryukov. Slid Pairs in Salsa20 and Trivium. In D. R. Chowdhury, V. Rijmen, and A. Das (eds.), IN- DOCRYPT, volume 5365 of Lecture Notes in Computer Science, pages 1-14. Springer, 2008.

• A. Biryukov, D. Priemuth-Schmid, and B. Zhang. Analysis of SNOW 3G; Resynchronization Mechanism. In S. K. Katsikas and P. Samarati (eds.), SECRYPT, pages 327-333. SciTePress, 2010.

• D. Priemuth-Schmid. Attacks on Simpliﬁed Versions of K2. To appear in P. Bouvry, M. A. Kłopotek, F. Leprevost, M. Marciniak, A. Mykowiecka and H. Rybiński, SIIS, volume 7053 of Lecture Notes in Computer Science, pages 117-127. Springer, 2011.

Currently under submission is the following paper:

• A. Biryukov, R.-P. Weinmann and D. Priemuth-Schmid. Cryptanalysis of the Shrinking Generator and the Alternating Step Generator using Diﬀerentially Related States 1.4. Thesis Outline 9

During my PhD time I have also co-authored the following paper which is not included in this thesis:

• A. Biryukov, D. Priemuth-Schmid and B. Zhang. Multiset Collision Attacks on Reduced-Round SNOW 3G and SNOW 3G⊕ In J. Zhou and M. Yung (eds.), ACNS, volume 6123 of Lecture Notes in Computer Science, pages 139-153. Springer, 2010. 10 Chapter 1. Introduction Chapter 2

Stream Ciphers

Stream ciphers are cryptographic primitives belonging to symmetric key ciphers. A block cipher in a stream cipher mode can simulate a stream cipher so why do we need “real” stream ciphers? In the cryptographic community the opinion prevails that well constructed stream ciphers can have a better performance and / or smaller hardware requirements then block ciphers in a stream cipher mode. The eSTREAM project [eST08] confirms that opinion. eSTREAM was a project by ECRYPT [ECR08] to detect new promising stream ciphers conducted from 2004 to 2008. There are seven stream ciphers in the final portfolio classified into two profiles, a hardware and a software profile. The demand for the hardware profile was that stream ciphers must have less hardware requirements than AES [DR02] whereas the software profile required a better software performance than AES in a stream cipher mode.

2.1 Basic Concept of Stream Ciphers

We start with the definition of a symmetric key cipher. Definition 2.1: A symmetric key cipher is given by three sets • a finite set of plaintexts P • a finite set of ciphertexts C • a finite set of keys K and two algorithms • the encryption algorithm E with E : K × P → C • the decryption algorithm D with D : K × C → P with the property D(k, E(k, p)) = p for each k ∈ K and all p ∈ P.

A fundamental principle which should guide the construction of a cipher is Kerckhoﬀs’ principle [Ker83] which states that a cipher should be secure even if everything except the key is known to the attacker. Thus, the security

11 12 Chapter 2. Stream Ciphers of a cipher should not rely on the secrecy of the algorithm rather on a secret key. A key can be kept secure much easier than a whole system, which might be used by several parties and / or implemented on several devices, and a key can easily be changed if it is jeopardized or revealed. The well-known one-time pad cipher illustrates this principle in an im- pressive manner since everything except the secret key is known to the attacker. The one-time pad is an advancement of the Vernam cipher [Ver26]. Vernam did not insist on a random key as according to the text of U.S. Patent 1,310,719 [Ver19] he left the choice to the users but suggested the key to be “[. . . ] preferably selected at random [. . . ]”. Mauborgne [Kah67b] identiﬁed the advantage that an endless and senseless key would make the cryptanalysis more diﬃcult. The one-time pad is a Vernam cipher with completely random keys which are used only once.

Deﬁnition 2.2: The one-time pad is a cipher with the following properties: • the key is completely random • the key is at least as long as the plaintext • each key is used only once for encryption. As encryption function the XOR or a modulo addition can be used. Ta- ble 2.1 gives an example of the encryption with the one-time pad. A number from 0 to 25 is dedicated each letter of the alphabet, e.g., a=0, b=1, . . . , z=25. Then the encryption function is the addition modulo 26.

Plaintext: thisismytexttoencrypt 19 7 8 18 8 18 12 24 19 4 23 19 19 14 4 13 2 17 24 15 19 Encryption with random key: hkoromgigcwbzflxwvqmy + 7 10 14 17 14 12 6 8 6 2 22 1 25 5 11 23 22 21 16 12 24 Ciphertext: arwjwesgzgtustpkymobr = 0 17 22 9 22 4 18 6 25 6 19 20 18 19 15 10 24 12 14 1 17

Table 2.1: Example for encryption with the one-time pad

In 2011 Bellovin [Bel11] discovered that Frank Miller invented the one- time pad 35 years earlier. The one-time pad was proven to be unconditionally secure by Shannon [Sha49]. Unconditionally secure means that even an attacker with unlimited resources (computing power, time, memory requirements, etc.) is unable to recover the unknown plaintext or the secret key from a ciphertext. This advantage of a security proof is confronted with the fact that this cipher is infeasible for most practical applications. The constant need of truly random keys is a problem as the generation of random 2.1. Basic Concept of Stream Ciphers 13 keys is extensive due to a required source of true randomness (for example radioactive decay). Also the safe storage or transmission of the secret keys which are at least as long as the plaintexts is diﬃcult. This is the motivation for stream ciphers which take a short secret key and produce long pseudo-random sequences, so-called keystreams, to encrypt the plaintexts. Thus, stream ciphers only imitate the one-time pad, the unconditional security does not hold anymore. Instead, the security of stream ciphers is described as conditionally secure which means that in principle the cipher can be broken but the attacker would need an unrealistic amount of resources (computing power, time or memory requirements, etc.). The simplest way to break a stream cipher is the brute force attack where the attacker exhaustively tries all keys until he has found the right one. The complexity of this exhaustive key search is based on the amount of possible keys. With an increasing amount of potential keys the complexity grows exponentially. Therefore, the quantity of possible keys should be big enough to make the brute force attack infeasible in practice. Consequently, a stream cipher is called secure if no attack exists with a smaller complexity than the brute force attack. The algorithm to compute the keystream is called a keystream generator.

Definition 2.3: A KeyStream Generator (KSG) is a Finite State Machine (FSM) which consists of • a finite set of keys K • a finite set of keystreams Z • an internal state S • an update function g() • an output function f(). First the FSM is loaded with the key. Then for each clock the update function updates the internal state and the output function produces the element of the keystream.

Deﬁnition 2.4: A stream cipher is a symmetric key cipher which encrypts the elements of the plaintext one at a time. It consists of a KSG which generates the keystream and a combining function which combines the keystream and the plaintext elements.

The combining function is usually an XOR but also the modulo addition is used. Stream ciphers are typically classiﬁed into synchronous and self- synchronizing ones but there are also stream ciphers which do not belong to any of these categories, for example Phelix [WSLM05] which has a built-in message authentication code functionality.

Deﬁnition 2.5: A synchronous stream cipher generates the keystream independently from the previous ciphertext elements. 14 Chapter 2. Stream Ciphers

The notation “synchronous” means that the sender and receiver must be synchronized for a successful decryption. If a ciphertext element is deleted or inserted during its way through the communication channel then synchro- nization is lost and all the following ciphertext elements can not be decrypted correctly. If a ciphertext element is modified only the corresponding plaintext element is changed and no other plaintext elements are affected. Such a modification of one element might not be detected. Message authentication is needed to prevent both incidents.

Deﬁnition 2.6: In a self-synchronizing (asynchronous) cipher a ﬁxed number of previous ciphertext elements are involved in the keystream generation.

The notation “self-synchronizing” means that the cipher will synchronize itself after a ﬁxed number of ciphertexts if an error occurs.

Stream ciphers must use a new random key for each new plaintext. As described, the generation of random keys is expensive so the designers looked for a possibility to use one key for more than one plaintext. The solution is the use of an initial value (IV) in addition to the key. Nearly all new cipher constructions use a key together with an IV. This IV is typically publicly known and should be a random nonce (number used once) or a counter. The combination of the same key and IV together must never be used again. An initialization phase is needed to have a proper mixing of the unknown key and the known IV before the keystream is produced. This initialization phase should be non-linear, unequal if procurable to the update function used during the keystream generation as well as long enough to counteract attacks against the initialization method. Figure 2.1 shows a synchronous stream cipher which takes a key and an IV to produce the keystream.

key IV

initialization update function

internal output keystream state function

plaintext combining ciphertext function

Figure 2.1: Synchronous stream cipher with key and IV 2.2. Building Blocks 15

2.2 Building Blocks

Stream ciphers can be constructed in many diﬀerent ways. We describe the widely used building blocks. There are several other constructions.

2.2.1 Linear Feedback Shift Register One of the most frequently used building block of stream ciphers is the Linear Feedback Shift Register (LFSR). It is a FSM operating on some finite field Fq. This finite field is often a binary field with q = 2 or an extension of the binary field with q = 2e where 1 < e ∈ N. The register consists of a number of cells which gives the length of the register denoted with l. Each cell can contain one element of Fq. The content of the register is called the state and the content at the beginning of the computation is called initial state or starting state. Figure 2.2 shows a picture of an LFSR.

...

cl cl−1 cl−2 ... c2 c1 c0

sl−1 sl−2 ... s2 s1 s0

Figure 2.2: General composition of a Linear Feedback Shift Register

When the LFSR is clocked (the time is denoted with t) the content of the last (rightmost) cell becomes the output, for all other cells the content is shifted one cell to the right and the content of the ﬁrst (leftmost) cell is computed via the linear recurrence relation

Xl−1 st+l = cl · ci · st+i c0, c1, . . . cl ∈ Fq i=0 where all computations are done in Fq. The constants c0, c1, . . . , cl are the feedback coeﬃcients with c0 6= 0 and cl 6= 0. The so-called tap positions are the connections at ci 6= 0 for i = 0, . . . , l−1 and the connection at cl is called the feedback position.

Deﬁnition 2.7: The feedback polynomial of an LFSR is the polynomial

0 1 l g(x) = clx − cl−1x − · · · − c0x in the ring of polynomials Fq[x]. 16 Chapter 2. Stream Ciphers

There are some important properties which the feedback polynomial must fulﬁl in order to achieve a long period.

Deﬁnition 2.8: A polynomial g(x) ∈ Fq[x] with degree l is called irreducible if it can not be factored into two polynomials with positive degrees smaller than l in Fq.

Deﬁnition 2.9: An irreducible polynomial g(x) ∈ Fq[x] with degree l is called primitive if it divides the polynomial xql−1 − 1 but no polynomial xu − 1 with u < ql − 1.

Theorem 2.10: If an LFSR has a primitive feedback polynomial g(x) ∈ l Fq[x] with degree l, then the period of the LFSR is T = q − 1 for every non-zero starting state. For the proof we refer to [LN97].

Therefore, an LFSR with a primitive feedback polynomial is called a maximum length LFSR and its sequence is called maximum length sequence.

Deﬁnition 2.11: The linear complexity of a sequence s = s0, s1, . . . , si ∈ Fq, denoted with LC(s), is the length of the shortest LFSR that can generate this sequence.

If the sequence s is the all-zero sequence then LC(s) = 0. If the sequence s is truly random then LC(s) = ∞ because no LFSR can produce such a sequence. The Berlekamp-Massey algorithm [Mas69] can eﬃciently ﬁnd the shortest LFSR for a sequence s and hence its linear complexity if at least 2 LC(s) elements of s are given.

2.2.2 Non-Linear Feedback Shift Register One solution to solve the linearity problem of an LFSR is the usage of a Non-Linear Feedback Shift Register (NLFSR). It is built in the same way as an LFSR but the feedback function is non-linear. Thus, it preserves the advantage to be easily implemented. In addition, NLFSRs have high linear complexity which makes the execution of algebraic attacks much harder or infeasible. One disadvantage is that balancedness is hard to achieve. Another problem is the little knowledge of the mathematical background of NLFSRs contrary to LFSRs. Thus, no general criteria for maximum periods or good statistical properties are known.

2.2.3 Boolean Function n Deﬁnition 2.12: A Boolean function is a mapping f : F2 → F2

f(x1, x2, . . . , xn) = y with x1, x2, . . . , xn, y ∈ F2. 2.2. Building Blocks 17

There are 22n diﬀerent Boolean functions with n input variables. We denote this set of Boolean functions with Bn. Boolean functions can be represented in diﬀerent ways.

Deﬁnition 2.13: The truth table (TT) of a Boolean function f ∈ Bn is a binary string of length 2n

TT (f) = [f(0, 0,..., 0), f(1, 0,..., 0), f(0, 1,..., 0), . . . , f(1, 1,..., 1)] .

The truth table is the easiest way if the number n of inputs is not too big.

Deﬁnition 2.14: The algebraic normal form (ANF) of a Boolean function f ∈ Bn is a polynomial over F2 M M f(x1, x2, . . . , xn) = c0 ⊕ cixi ⊕ cijxixj ⊕ · · · ⊕ c12...nx1x2 . . . xn 1≤i≤n 1≤i

This representation as a polynomial is unique.

Deﬁnition 2.15: Let f ∈ Bn and x, w ∈ F2 with x = (x1, x2, . . . , xn) and w = (w1, w2, . . . , wn) and x · w = x1w1 ⊕ x2w2 ⊕ · · · ⊕ xnwn. Then the Walsh n transform of f(x) is a real valued function over F2 deﬁned as X f(x)⊕x·w Wf (w) = (−1) . n x∈F2

The Walsh spectrum is the integer valued vector Wf .

For cryptographic use some properties like algebraic degree, balancedness, non-linearity, correlation immunity, algebraic immunity are important.

Deﬁnition 2.16: The algebraic degree of f ∈ Bn denoted as deg(f) is the number of variables in the highest order term with a non-zero coeﬃcient in the ANF.

Definition 2.17: A function f ∈ Bn is called affine if it has no term with degree > 1 in the ANF. The set of all affine functions with n input variables is denoted as An. A function f ∈ An is called linear if it has a constant term equal to zero.

For some properties we need the Hamming weight and the Hamming distance so we deﬁne them as well. 18 Chapter 2. Stream Ciphers

Deﬁnition 2.18: The Hamming weight of a binary sequence s denoted as wH (s) is the number of ones in s. For two binary sequences s1 and s2 with the same length the Hamming distance denoted as dH (s1, s2) is the number of positions where both sequences diﬀer. Thus, the equation dH (s1, s2) = wH (s1 ⊕ s2) holds.

Deﬁnition 2.19: A function f ∈ Bn is called balanced if its truth table has an equal number of ones and zeros, i.e.,

n−1 wH (TT (f)) = 2 ⇔ Wf (0) = 0 .

Deﬁnition 2.20: The non-linearity of f ∈ Bn denoted as nl(f) is the minimum distance to the set An, i.e., 1 nl(f) = min (d (f, g)) ⇔ nl(f) = 2n−1 − max |W (w)| . H n f g∈An 2 w∈F2

th Deﬁnition 2.21: A function f ∈ Bn is called m order correlation immune (CI) if and only if its Walsh transform satisﬁes

Wf (w) = 0 ∀ w with 1 ≤ wH (w) ≤ m .

If f is also balanced then it is called m-resilient.

Deﬁnition 2.22: The algebraic immunity for a function f ∈ Bn denoted as AI(f) is the minimum degree of g ∈ Bn with f · g = 0 or (f ⊕ 1) · g = 0.

2.2.4 S-box

n m A Substitution Box (S-box) is a mapping f : F2 → F2 . It substitutes a n m vector x = (x1, x2, . . . , xn) ∈ F2 by a vector y = (y1, y2, . . . , yn) ∈ F2 in a non-linear manner. The mapping f can be divided into m Boolean functions. Then we can write y = f(x) as

y1 = f1(x1, x2, . . . , xn), y2 = f2(x1, x2, . . . , xn), . . ym = fm(x1, x2, . . . , xn).

Therefore, the S-box can be deﬁned by the truth tables or the ANF of its Boolean functions. The Boolean functions are also used to deﬁne some properties of the S-box.

Deﬁnition 2.23: An S-box f = (f1, f2, . . . , fm) is balanced if all non-zero linear combinations of its Boolean functions are balanced. 2.2. Building Blocks 19

Deﬁnition 2.24: The algebraic degree of an S-box f is deﬁned as the minimum algebraic degree of all non-zero linear combinations of its Boolean functions, i.e. Ã ! Xm deg(f) = min deg cifi . (c ,c ,...,c ) ∈ Fm\{0} 1 2 m 2 i=1

Deﬁnition 2.25: The non-linearity of an S-box f is deﬁned as minimum non-linearity of all non-zero linear combinations of its Boolean functions, i.e. Ã ! Xm nl(f) = min nl cifi . (c ,c ,...,c ) ∈ Fm\{0} 1 2 m 2 i=1

There are two kinds of S-boxes, static ones and varying ones. The static S-box is just a fixed mapping which can be designed to have special properties and which is assumed to be known to the attacker. The varying S-box is key dependent and thus varies during the computation. The design of good S-boxes is a difficult task. This might be the reason why a good S-box is often reused by designers of other ciphers, for example the 8-bit to 8-bit S-box of the AES block cipher [DR02]. The input is mapped to its multiplicative inverse in the finite field, followed by an affine transformation. Amongst others, the stream ciphers SNOW 2.0 [EJ02], SNOW 3G [ETS06a] and K2 [KTS07] use this S-box.

2.2.5 Finite State Machine The main intention of using a Finite State Machine (FSM) is to increase non- linearity in a stream cipher construction. Usually an FSM is used together with one or more LFSRs. It consists of some extra memory and a non-linear update function. A non-linear operation is mostly more expensive then a linear operation. Thus, stream ciphers have to ﬁnd a tradeoﬀ between a large state which can be updated fast and easily and non-linearity. The FSM is one solution for this problem. A typical construction of an FSM has some memory cells which are updated by S-boxes and one or more inputs from the LSFR(s). The output produced by the FSM is combined with the output of the LFSR(s) to yield the keystream. During the initialization phase the output of the FSM is often fed back in the LFSR(s) to achieve a non-linear initialization phase and a non-linear state of the LFSR(s).

2.2.6 Additional Components The previously described building blocks need to be connected to result in a stream cipher construction. Nowadays, usually non-linear functions are used to combine several building blocks and to compute the keystream. 20 Chapter 2. Stream Ciphers

The non-linear combining function uses the output of several (N)LFSRs to compute the output of the stream cipher in a non-linear way, whereas the non-linear ﬁlter function uses several elements of only one (N)LFSR to compute the output of the stream cipher in a non-linear way. Figure 2.3 shows both functions which are usually Boolean functions if Fq = F2. The irregular clocking is another method used to increase the non-linearity. There are diﬀerent kinds of irregular clocking. One uses the outputs or some elements of one or more (N)LFSRs to determine when and how often which of the (N)LFSRs are clocked. An example is the A5/1 algorithm [BGW99] where for each cycle two or three out of its three registers are clocked. An- other way to achieve an irregular clocking is the use of regular clocked (N)LFSR(s) but decimate the output yielding a keystream with unknown clocking decisions. Two examples for such a decimation are the shrinking generator [CKM93] and the self-shrinking generator [MS94]. This enumeration is far from being complete as there are many more elements and / or methods to build a stream cipher.

non-linear combining function non-linear ﬁlter function

(N)LFSR 1 (N)LFSR zt (N)LFSR 2 f . . f

(N)LFSR n zt

Figure 2.3: Two functions to increase non-linearity

2.3 Stream Ciphers Analyzed in this Thesis

All ciphers analyzed in this thesis are synchronous stream ciphers which use a key together with an IV except for the last cipher. Our first topic is Salsa20 [Ber05], a stream cipher inspired by a block cipher in counter mode. It consists of a 4×4 - matrix of 32-bit words together with a non-linear update function. This cipher has no initialization phase. The keystream is produced by loading the key, a nonce and a counter in the matrix and performing the update function several times. As long as keystream is needed this procedure is repeated with an increased counter. The second stream cipher named Trivium [dCP05] is a simple but ele- gant design of one NLFSR over F2 with three non-linear update functions having three different feedback positions and a linear filter function for the 2.3. Stream Ciphers Analyzed in this Thesis 21 keystream computation. The update of the internal state during the initialization phase of 4·288 clocks is the same as during the keystream generation. The ciphers Salsa20 and Trivium are analyzed in Chapter 4. Both SNOW [EJ02, ETS06a] ciphers use an LFSR with sixteen 32-bit words, an FSM containing either three memory words connected with two S-boxes or two memory words connected with one S-box and a non-linear keystream function. After loading the key and IV in the LFSR both ciphers perform an initialization phase of 32 clocks where the output of the FSM is fed back in the LFSR resulting in a non-linear initialization phase. Then both ciphers switch to the keystream generation phase but the first keystream word is discarded. Key-recovery attacks on simplified versions with and without feedback from the FSM in the LFSR during the initialization are presented in Chapter 5. The next cipher named K2 [KTS07] is special as it uses a dynamic feedback controller which dynamically chooses between four different linear feedback functions for one FSR consisting of eleven 32-bit words. The other building blocks are another FSR of five 32-bit words, an FSM with four memory words and four S-boxes as well as a non-linear keystream function combining words from all three building blocks to produce two keystream words each clock. During the initialization phase the two keystream words are fed back in the FSRs one in each. Key-recovery attacks as well as a distinguishing attack on simplified versions are shown in Chapter 6. The last one is the shrinking generator [CKM93], an old design which uses two LFSRs over F2 where the output bit of the first LFSR decides whether the output bit of the second LFSR becomes the keystream bit or is discarded. This construction is given without any key-IV loading or initialization method. Thus the initial states are assumed to be the secret key which sometimes includes the feedback polynomials of the LFSRs, too. A version with a linear key-IV setup and known feedback polynomials is studied in Chapter 7.

In this chapter we gave some basics about stream ciphers. We introduced the one-time pad and its relation to stream ciphers. This is followed by a short overview of the most used building blocks. We also gave a classiﬁcation of the stream ciphers analyzed in this thesis. In the next chapter we will give a survey of attacks applicable to stream ciphers. 22 Chapter 2. Stream Ciphers Chapter 3

Survey of Attacks

Prior to the description of diﬀerent attacks on stream ciphers we have to deﬁne the goal of an attack, the knowledge of the attacker and which phase (initialization or keystream generation) the attacker strikes.

There are two main goals in attacking a stream cipher. The more important one is to recover the secret key. Consequently, an attack with this goal is called a key recovery attack. The other goal is to distinguish the keystream computed by the stream cipher from a truly random stream. These attacks are called distinguishing attacks. A key recovery attack is much more dangerous as the attacker is able to decrypt all ciphertexts which were encrypted with this key. If an attacker can recover the secret key he can also distinguish the produced keystream from a random stream. There are also other goals possible, for example state recovery.

The knowledge of an attacker can be divided into four categories. In a ciphertext only attack he knows only the ciphertext. The known plaintext attack gives the attacker access to known plaintext as well as the corresponding ciphertext. In the stream cipher case this means that the attacker knows the keystream. In a chosen plaintext attack the attacker is allowed to choose the plaintexts and get the ciphertexts. The other way around works the chosen ciphertext attack where the attacker chooses the ciphertexts and gets the decrypted plaintexts. It is also possible that the attacker has additional knowledge, for example that two keys are related somehow.

As most stream ciphers execute an initialization phase before they produce the keystream the attacker can decide which of these two phases he likes to strike. The attacks on the initialization phase can be divided into known IV attacks and chosen IV attacks depending on whether the attacker only knows the IVs or is able to choose them. An attack on the keystream phase usually assumes a known-keystream scenario (i.e. known plaintext attack).

23 24 Chapter 3. Survey of Attacks

3.1 Brute Force Attack

This kind of attack is always possible as the attacker can always try all potential keys exhaustively. He does not exploit any weakness the stream cipher might have. As mentioned in Section 2.1 the complexity is based on the amount of possible keys and grows exponentially with an increasing amount of possible keys. If keys of size n bits are used then the amount of possible keys is 2n which leads to 2n attempts to ﬁnd the right key. The complexity of the brute force attack is only an upper bound because the search for the right key can be parallelized.

3.2 Time/Memory Tradeoﬀs

Time/memory tradeoff attacks on stream ciphers have been independently described by Babbage [Bab95] and Golic [Gol97] adapting Hellman’s preced- ing work on block ciphers [Hel80]. Their results were improved by Biryukov and Shamir [BS00]. Usually, the attacks are based on the findings of the birthday paradox stating the astonishingly high probability of two persons of a relatively small group having the same birthday – in a group of 23 people the probability is 50.7 % that this phenomenon occurs. The time/memory tradeoff attack consists of two phases, a preprocessing phase and a real-time phase. During the first phase the general structure of the cipher is analyzed and the results are collected in large tables. In the second phase the preparatory work is used in connection with data produced from a particular unknown key to recover it. Thus, the attack can be regarded as a brute force attack on a smaller key space.

3.3 Correlation Attacks

An important kind of attacks are correlation attacks. They were introduced by Siegenthaler [Sie84, Sie85] and target stream ciphers which combine several LFSRs with a non-linear Boolean function to compute the keystream bit. These attacks use linear correlations between the keystream bits and the output bits of one or a small number of LFSRs to recover the secret key. If the stream cipher consists of n LFSRs eachQ having length li with i = 1, . . . n n li then the brute force attack would need i=1 2 tries to recover the initial states of the LFSRs and thus the secret key. With correlations between the keystream and single LFSRs the initialP states of the LFSRs can be recovered n li one by one which would need only i=1 2 tries which is a huge step down. The correlation immunity of a Boolean function can be computed via the Walsh transform. If a function is mth order correlation immune then m + 1 LFSRs must be considered together. 3.4. Diﬀerential Cryptanalysis 25

The approach to approximate the keystream by a linear combination of the output of some LFSRs is related to coding theory. The keystream is considered as a polluted linear combination of the output bits of the LFSRs. Therefore, methods from coding theory have been used to improve correlation attacks [MG90, Pen96]. Fast correlation attacks [MS89, CS91, JJ99, JJ00] not only use weaknesses in the Boolean functions but also exploit the linearity of the LFSRs’ update functions. These attacks search during a preprocessing step for parity check equations and use them in the processing step to recover the initial state of the LFSRs.

3.4 Diﬀerential Cryptanalysis

Biham and Shamir [BS90] introduced the differential cryptanalysis in 1990. The main target for differential attacks are block ciphers but this method can also be used to attack hash functions and stream ciphers. The idea is to find differences in the plaintext so that the differences in the ciphertext have some recognizable characteristics. For stream ciphers the differences are placed in the IV and the keystream is checked for special differences. Depending on the capabilities of the attacker we can talk about known or chosen IV attacks. The goal of differential attacks is either to distinguish the keystream from a random function or to recover the key. Typically the differences are computed via an XOR. The differences in the IVs are traced through the clocks of the initialization phase. For all linear building blocks the differences can be calculated directly. For the non-linear parts the probability that a given input difference leads to a special output difference must be computed. In the case of an S-box the behavior of the differences can be represented by its difference distribution table. This table of size 2n × 2m contains as rows the input differences and as columns the output differences. At the intersection of a row and a column is a number which shows how often this input difference results in this output difference. The differential attacks on stream ciphers have been studied in [ALP04].

3.5 Algebraic Attacks

The general idea for algebraic attacks is to model the known keystream as a system of equations in the unknown key variables and then solve this system to derive the secret key. The idea for such a method was brought up by Shannon [Sha49]. A system of linear equations can be solved quite easily using Gaussian elimination whereas the general solution of a system of non-linear equations over a ﬁnite ﬁeld is an NP-complete problem. 26 Chapter 3. Survey of Attacks

There are several methods to solve systems of non-linear equations. The easiest one to solve a system of non-linear equations is to replace each monomial with a degree higher than one by a new variable which is treated independently from the already existing ones. The result is a system where all equations are linear but the number of variables has increased considerably. So the bottleneck is to get enough linear independent equations that Gaussian elimination is applicable, more precisely the number of equations should be greater or equal to the number of variables. Another widely-used technique is Buchberger’s algorithm [Buc65] which computes Gröbner bases to solve a system of non-linear equations. The theory of Gröbner bases is also used by Faugère who developed the algorithms F4 [Fau99] and F5 [Fau02]. These algorithms are the most efficient methods known so far to solve non-linear equations over a finite field. The drawback for the algorithms based on Gröbner bases is the difficult estimation of the complexity. An algorithm to solve an overdefined system of quadratic equations over F2 was presented by Kipnis and Shamir [KS99]. This relinearization algorithm was analyzed and strengthen by Courtois et al. [CKPS00] to handle higher degree multivariate systems. This improved version was called XL algorithm where XL comes from eXtended Linearization was used to attack the stream cipher Toyocrypt [Cou02]. The criteria for success and complexity of the XL algorithm are unsettled as well as for another improvement the XSL algorithm [CP02]. In 2003 two papers [CM03, AK03] started a surge on research on algebraic attacks. One year later, Meier et al. [MPC04] introduced the algebraic immunity of a Boolean function which was a generalization of [CM03].

Deﬁnition 3.1: The algebraic immunity of a Boolean function f, denoted with AI(f), is the smallest degree of a Boolean function g which satisﬁes

g · f = 0 or g · (f ⊕ 1) = 0 .

Then the function g is called an annihilator for f.

An algorithm to ﬁnd the algebraic immunity for a Boolean function f with n variables is also presented in [MPC04]. The algorithm runs with n 3 a complexity of O(( d ) ) with d being the algebraic immunity. Two years later Armknecht et al. [ACG+06] gave an improved algorithm with complex- n 2 ity O(( d ) ). A lot of research concerning algebraic attacks and algebraic immunity followed (e.g., [BL05, DGM05, Car06, FM07]). Fast algebraic attacks are an extension of algebraic attacks and were introduced by Courtois [Cou03] and improved by Armknecht [Arm04]. The idea is to reduce the degree of the system of equations in a precomputation step by eliminating all high degree monomials which are independent of the keystream bits. 3.6. Cube Attacks 27

Carlet and Feng [CF08] showed an inﬁnitive class of Boolean functions with good properties regarding balancedness, non-linearity, algebraic immunity and immunity against fast algebraic attacks.

3.6 Cube Attacks

A very new kind of attack are cube attacks introduced in 2009 by Dinur and Shamir [DS09]. They consider the stream cipher as a tweakable polynomial in the public variables coming from the IV and the secret variables coming from the key. The attacker is allowed to tweak the polynomial by choosing diﬀerent values for the public variables. During a precomputation step several systems of equations with low degree are constructed by varying the public variables for all possible values. Then these systems are solved to get the solution in the secret variables following the secret key. Cube attacks are the currently best known attack against Trivium [DS09] and Grain-128 [DS11].

3.7 Side-Channel Attacks

Instead of looking for weaknesses of algorithms with cryptanalysis, side- channel attacks exploit leaked information gained from the physical imple- mentation of cryptographic devices. Side-channel attack can be divided into two categories passive attacks, which only monitor the behavior of the devices performing a cryptographic computation, and active attacks, which try to influence the device during the computation by physical means to cause errors or irregular behavior. Active attacks are usually called fault attacks. Passive attacks can observe different sources of side-channel information. Timing attacks are based on data dependent differences in execution time measured while a computation occurs. Kocher [Koc96] explained how to analyze timing information on various public key algorithms, amongst others Diffie-Hellman [DH76] and RSA [RSA78]. Differences in processing zeros and ones permit correlation analyses and hence the discovery of the secret key. Power monitoring attacks or power analyses utilize slight variations in power consumption of the operating hardware for the retrieval of information. Kocher et al. [KJJ99] describe two approaches. In a simple power analysis (SPA) the power trace of a cryptographic device is observed over time, allowing to identify specific mathematical operations and even the key. A differential power analysis (DPA) uses statistical methods for the inter- pretation of these measurement results. By involving signal processing and error correction functions e.g., to filter out noise, relevant information can be extracted even better than with SPA. 28 Chapter 3. Survey of Attacks

Electromagnetic attacks take advantage of electromagnetic waves that are emanated by electronic devices and hence are comparable to the power analysis. When the methods described above are adapted to the exploitation of electromagnetic radiation, they are called SEMA or DEMA [Rec04]. There also exist further forms of attacks like acoustic attacks that make use of the sounds emitted e.g., by CPUs or thermal imaging attacks based on infrared images of a processor. The countermeasures for side-channel attacks are as diverse as the attacks themselves. To prevent timing attacks, the duration of data dependent operations can be changed e.g., by equalization of computation time depending on the data to process or inserting random delays. In general, the masking or blinding of data is suitable to counteract multiple forms of side-channel attacks. To avoid these attacks the conditional execution paths for the pro- cessed data must not depend on secret but on public information.

In this chapter we gave a survey of existing attacks against stream ciphers. This list is far from being complete several other techniques exist. Chapter 4

Slid Pairs in Salsa20 and Trivium

In 2005 Bernstein [Ber05] submitted the stream cipher Salsa20 to the eSTREAM project [eST08]. Original Salsa20 has 20 rounds, later 8 and 12 rounds versions Salsa20/8 and Salsa20/12 were also proposed. Now Salsa20/12 is in the software profile in the eSTREAM portfolio. It has a 512-bit state which is initialized by copying into it a 128- or a 256-bit key, a 64-bit nonce and counter and a 128-bit constant. Previous attacks on Salsa used differential cryptanalysis exploiting a truncated differential over three or four rounds. The first attack was presented by Crowley [Cro06] which could break the 5 round version of Salsa20 within claimed 2165 trials. Later, a four round differential was exploited by Fischer et al. [FMB+06] to break 6 rounds in 2177 trials and by Tsnunoo et al. [TSK+07] to break 7 rounds in about 2190 trials. The currently best attack by Aumasson et al. [AFK+08] covers an 8 round version of Salsa20 with an estimated complexity of 2251. The stream cipher Trivium was submitted by de Cannière and Pre- neel [dCP05] to the eSTREAM project [eST08] in 2005. Analog to Salsa20/12 Trivium is in the eSTREAM portfolio, even though it is in the hardware profile. Trivium uses an 80-bit key and an 80-bit initial value (IV) to generate up to 264 bits of keystream. This synchronous cipher has a state size of 288 bits. The interesting part of Trivium is the non-linear update function of second degree. In [Rad06] Raddum presented and attacked simplified versions of Trivium called Bivium but the attack on Trivium had a complexity higher than the exhaustive key search. Bivium was completely broken by Maximov and Biryukov [MB07] and an attack on Trivium with complexity about 2100 was presented which showed that the key size of Trivium can not be increased just by loading longer keys into the state. In [MCP07], McDonald et al. attacked Bivium using SatSolvers. Another approach that gained attention recently is to reduce the key setup of Trivium as done by Turan and Kara [TK07] with 288 initialization rounds and Vielhaber [Vie07]

29 30 Chapter 4. Slid Pairs in Salsa20 and Trivium with 576 initialization rounds. Fischer, Khazaei and Meier [FKM08] showed an attack with complexity 255 on a version with 672 initialization rounds. The best attack for Trivium known so far is the cube attack published by Dinur and Shamir [DS09] which can break a version with 767 initialization rounds with time complexity 245. So far, no attack faster than an exhaustive key search was shown for Trivium with full 1152 initialization rounds. A summary of attacks on Trivium is given in Table 4.1.

Attack complexity authors linearization technique to reduce the quadratic system c · 283.5 factor c to solve the system of linear equations Maximov and Biryukov [MB07] linear approximation with bias 2−31 288 initialization rounds Turan and Kara [TK07] algebraic IV diﬀerential attack negligible 576 initialization rounds, 47 recovered key bits Vielhaber [Vie07] chosen IV statistical analysis key-recovery attack 255 672 initialization rounds Fischer, Khazei and Meier [FKM08] cube attack 245 767 initialization rounds Dinur and Shamir [DS09]

Table 4.1: Attacks on Trivium

In this chapter we show a sliding property for the stream ciphers Salsa20 and Trivium. Slide attacks were introduced by Biryukov and Wagner [BW99, BW00] mainly to attack block ciphers but the method is applicable on stream ciphers as well. We start with our investigation of Salsa20 followed by a description of the attacks. We show that the following observation holds: Suppose that two black boxes are given, one with Salsa20 and one with a random mapping. The attacker is allowed to choose a relation F for a pair of inputs, after which a secret initial input x is chosen and a pair (x, F(x)) is encrypted either by Salsa20 or by a random mapping. We stress that only the relation F is known to the attacker and no restrictions are made on the initial input. The goal of the attacker is given a pair of ciphertexts to tell whether they were encrypted by Salsa20 or by a random mapping. It is clear that for a truly random mapping no useful relation F would exist. In contrast, Salsa20 can easily be distinguished from random if F is a carefully selected function 4.1. Slid Pairs in Salsa20 31 related to the round-structure of Salsa20. If the initial input satisfies the Salsa20 conditions (secret key and known nonce, counter and constant), it is not only a distinguishing but also a complete key-recovery attack. To give the attacker a hard time the pair may be hidden in a large collection of other ciphertexts. For a random function there is no way of checking a large list except for checking all the pairs or doing a birthday attack. We describe for Salsa20 how to find such a hidden pair in a large list efficiently. Our attacks are independent of the number of rounds in Salsa and thus work for all three versions of Salsa. We also show a general birthday attack on 256-bit key Salsa20 with complexity 2192 which can be further sped up twice using sliding observations. Although as Bernstein [Ber08] mentioned there are better attacks on Salsa20 using parallelization, we wanted to point out that the sliding property is useful is this general birthday attack scenario. In the second part of this chapter we describe our results on Trivium which show a large related key class (239 out of 280 keys) which produce identical keystreams up to a shift. We solve the resulting non-linear sliding equations using Magma and present several examples of such slid key-IV pairs. The interesting observation is that 24 key bits do not appear in the equations for a shift of 111 clocks and thus for a fixed IV, there is a 224 freedom of choice for the key that the key-IV pair may have a sliding property.

4.1 Slid Pairs in Salsa20

In this section the symbol “+” denotes the addition modulo 232, the other two symbols work at the level of the bits with “⊕” as XOR-addition and “≪” as a shift of bits.

4.1.1 Brief Description of Salsa20 The Salsa20 encryption function uses the Salsa20 core function in a counter mode. The internal state of Salsa20 is a 4 × 4 - matrix of 32-bit words which is denoted by   y0 y1 y2 y3  y y y y  Y =  4 5 6 7  .  y8 y9 y10 y11  y12 y13 y14 y15

A so-called quarterround transforms an arbitrary vector (ya, yb, yc, yd) of four words into a vector (za, zb, zc, zd) by calculating

zb = yb ⊕ ((ya + yd) ≪ 7) zc = yc ⊕ ((zb + ya) ≪ 9) zd = yd ⊕ ((zc + zb) ≪ 13) za = ya ⊕ ((zd + zc) ≪ 18) . 32 Chapter 4. Slid Pairs in Salsa20 and Trivium

This non-linear operation is the basic part of the columnround and the rowround. The columnround converts the matrix column by column using the quarterround by slightly shifting the entries of the columns as input for the quarterround. The shifted columns as input vectors for the quarterround are (y0, y4, y8, y12), (y5, y9, y13, y1), (y10, y14, y2, y6) and (y15, y3, y7, y11). Simi- larly, the rowround transforms the matrix row by row using the quarterround by slightly shifting the entries of the rows as input for the quarterround. The shifted rows as input vectors for the quarterround are (y0, y1, y2, y3), (y5, y6, y7, y4), (y10, y11, y8, y9) and (y15, y12, y13, y14). A so-called doubleround consists of a columnround followed by a rowround. The doubleround function of Salsa20 is repeated 10 times. If Y denotes the matrix, a keystream block is deﬁned by Z = Y + doubleround10(Y ) . The quarterround has 12 word operations. One columnround as well as one rowround has 4 quarterrounds which means 48 word operations for each one. Salsa20 consists of 10 doublerounds which altogether gives 960 word operations. The feedforward at the end of Salsa20 has 16 word operations which concludes 976 word operations in total for one encryption. The cipher takes as input a 256-bit key (k0, . . . , k7), a 64-bit nonce (n0, n1) and a 64-bit counter (c0, c1). The remaining four words have been set by the designer to ﬁxed publicly known constants

σ0 = 0x61707865 σ1 = 0x3320646e σ2 = 0x79622d32 σ3 = 0x6b206574 , given in hexadecimal numbers. Then a starting state S is given by the matrix   σ0 k0 k1 k2  k σ n n  S =  3 1 0 1  .  c0 c1 σ2 k4  k5 k6 k7 σ3 A 128-bit key version of Salsa20 copies the 128-bit key twice and has slightly diﬀerent constants. In this chapter we mainly concentrate on the 256-bit key version.

4.1.2 Slid Pairs The structure of a doubleround can be rewritten as columnround, then a matrix transposition, another columnround followed by a second transposition. We deﬁne F to be a function which consists of a columnround followed by a transposition. Now the 10 doublerounds can be transferred into 20 times the function F. If we have two triples (k, n, c) and (k0, n0, c0) with keys k, k0, nonces n, n0 and counters c, c0 so that h i F 1st starting state (k, n, c) = 2nd starting state (k0, n0, c0) 4.1. Slid Pairs in Salsa20 33 then this property holds for each point during the round computation of Salsa20, especially for the end of the round computation. Please note that the feedforward at the end of Salsa20 destroys this property. We call such a pair of a 1st and 2nd starting state a slid pair and show their relation in Figure 4.1.

X S F 19 × F Z

X0 S0 19 × F F Z0

Figure 4.1: Relation of a slid pair for Salsa20

In a starting state four words are constants and 12 words can be chosen freely which leads to a total amount of 2384 possible starting states. If we want a starting state – after applying function F – to result in a 2nd starting state, we obtain four wordwise equations. This means we can choose eight words of the 1st starting state freely whereas the other four words are determined by the equations as well as the words for the 2nd starting state. This leads to a total amount of 2256 possible slid pairs. For the 128-bit key version no such slid pair exists due to the additional constraints of four fewer words freedom in the 1st starting state and four more wordwise equations in the 2nd starting state. With function F we get two equations S0 = F(S) and X0 = F(X). The words for these matrices we denote as

   0 0 0  σ0 k0 k1 k2 σ0 k0 k1 k2    0 0 0   k3 σ1 n0 n1  0  k3 σ1 n0 n1  S =   S =  0 0 0  c0 c1 σ2 k4 c0 c1 σ2 k4 0 0 0 k5 k6 k7 σ3 k5 k6 k7 σ3    0 0 0 0  x0 x1 x2 x3 x0 x1 x2 x3    0 0 0 0   x4 x5 x6 x7  0  x4 x5 x6 x7  X =   X =  0 0 0 0  x8 x9 x10 x11 x8 x9 x10 x11 0 0 0 0 x12 x13 x14 x15 x12 x13 x14 x15    0 0 0 0  z0 z1 z2 z3 z0 z1 z2 z3    0 0 0 0   z4 z5 z6 z7  0  z4 z5 z6 z7  Z =   Z =  0 0 0 0  . z8 z9 z10 z11 z8 z9 z10 z11 0 0 0 0 z12 z13 z14 z15 z12 z13 z14 z15 The set up of the system of equations for a whole Salsa20 computation is too complicated but the equations for the computation of F are very clear. For a complete description of the equations see Appendix A. The 34 Chapter 4. Slid Pairs in Salsa20 and Trivium structure of both systems of equations coming from the relation F is the same, especially all the known variables are at the same place. Due to the eight words freedom we have in a 1st or 2nd starting state, there are some relations in the 12 non-ﬁxed words. For the 2nd starting state these relations are very clear as they only deal with words

0 0 0 0 0 0 0 0 0 = k2 + k1 0 = k3 + n1 0 = c1 + c0 and 0 = k7 + k6 . (4.1)

If we have an arbitrary starting state and the equations (4.1) are satisﬁed we know this starting state is a 2nd starting state. In contrast, for the 1st starting state the conditions to be a 1st starting state depend on the single bits and not on words anymore. Thus, the conditions for a 1st starting state are more complicated and not obvious, hence we omit them. Sliding by the function F is applicable to any version of Salsa20/r where r is even. For odd r there would be no transposition at the end of the round computation. Consequentially the equations are a bit diﬀerent, though still solvable.

4.1.3 Sliding State Recovery Attack on the Davies-Meyer Mode In this section we consider a general state-recovery slide attack on a Davies- Meyer construction. We demonstrate it with an example of a Davies-Meyer feedforward used with the iterative permutation of Salsa20. The feedforward breaks the sliding property and makes slide attack more complicated to mount. We consider the following scenario:

1. The oracle chooses a secret 512-bit state S (here we assume that there is no restriction of 128-bit diagonal constants and the full 512 bits can be chosen at random).

2. The oracle computes F(S) = S0.

3. The oracle computes Z = Salsa20(S), Z0 = Salsa20(S0) and gives them to the attacker.

4. The goal of the attacker is to recover the secret state S.

Due to the weak diﬀusion of F we can write separate systems of equations for each column of S. If we combine for one column the quarterround coming from S0 = F(S), the corresponding quarterround from X0 = F(X) and the feedforward, we get a system with 16 equations shown below. We assume all 16 variables are unknown.

0 0 s1 = s4 ⊕ ((s0 + s12) ≪ 7) x1 = x4 ⊕ ((x0 + x12) ≪ 7) 0 0 0 0 s2 = s8 ⊕ ((s1 + s0) ≪ 9) x2 = x8 ⊕ ((x1 + x0) ≪ 9) 0 0 0 0 0 0 s3 = s12 ⊕ ((s2 + s1) ≪ 13) x3 = x12 ⊕ ((x2 + x1) ≪ 13) 0 0 0 0 0 0 s0 = s0 ⊕ ((s3 + s2) ≪ 18) x0 = x0 ⊕ ((x3 + x2) ≪ 18) 4.1. Slid Pairs in Salsa20 35

0 0 0 z0 = x0 + s0 z0 = x0 + s0 0 0 0 z4 = x4 + s4 z1 = x1 + s1 0 0 0 z8 = x8 + s8 z2 = x2 + s2 0 0 0 z12 = x12 + s12 z3 = x3 + s3 This system can be reduced to four equations. h ³ í 0 z1 = (z4 − s4) ⊕ [(z0 − s0) + (z12 − s12)] ≪ 7 h ³ í (4.2) + s4 ⊕ (s0 + s12) ≪ 7 h ³ í 0 0 0 z2 = (z8 − s8) ⊕ [(z1 − s1) + (z0 − s0)] ≪ 9 h ³ í (4.3) 0 + s8 ⊕ (s1 + s0) ≪ 9 h ³ í 0 0 0 0 0 z3 = (z12 − s12) ⊕ [(z2 − s2) + (z1 − s1)] ≪ 13 h ³ í (4.4) 0 0 + s12 ⊕ (s2 + s1) ≪ 13 h ³ í 0 0 0 0 0 z0 = (z0 − s0) ⊕ [(z3 − s3) + (z2 − s2)] ≪ 18 h ³ í (4.5) 0 0 + s0 ⊕ (s3 + s2) ≪ 18

In (4.2) we guess two variables out of s4, s0, s12 and calculate the third one. 0 With these three variables we calculate s1 and solve (4.3) to get s8. Now we 0 0 0 can compute s2, s3, s0 and check our guess with (4.4) and (4.5). Therefore, the system of equations for one column with completely unknown variables can be solved by guessing only two variables. Similarly, we solve the systems of equations for the other three columns. With the four guesses of 264 steps each we can completely recover the 512-bit secret state S. This shows that Salsa20 without the diagonal constants is easily distinguishable from a random function, for which a similar task would require about 2511 steps. Now we add the diagonal constants which reduces the ﬂexibility of the oracle in a choice of the initial states to 2256 but the attack works even better:

1. The oracle chooses a starting state S0 with key k0, nonce n0 and counter c0 satisfying equations (4.1). The attacker does not know this state.

2. The oracle applies F −1(S0) to compute the related key k, nonce n and counter c for the starting state S.

3. The oracle computes Z = Salsa20(S), Z0 = Salsa20(S0) and gives them to the attacker.

4. The goal of the attacker is to recover the secret state S. 36 Chapter 4. Slid Pairs in Salsa20 and Trivium

The knowledge of the diagonal makes the previous attack even faster and allows the full 384-bit (256-bit entropy) state recovery with complexity of 4·232 because in the system of equations for each column only one variable has to be guessed. If the attacker chooses the nonce and the counter n0, c0 (160-bit entropy) then the complexity drops to 2 · 232. Furthermore, if nonce and counter n, c are known (128-bit entropy left), the state can be recovered immediately (with or without knowing nonce and counter n0, c0) as shown in the following section. Table 4.2 shows the time complexities for the described attacks, memory complexity is negligible.

known words of the starting state sliding on Salsa20 random oracle nothing 266 2511 only diagonal 234 2255 diagonal, n0, c0 233 2159 diagonal, n, c directly 2127 diagonal, n, c and n0, c0 directly 263

Table 4.2: Time complexities for state-recovery attacks

4.1.4 Related Key Key-Recovery Attack on Salsa20

Let us assume that we know the two ciphertexts and the corresponding nonces and counters of a slid pair. We do not know both keys but we know that both starting states diﬀer by function F which gives the relation shown in Figure 4.2 and the fact that the 2nd starting state conforms to the initial state format of Salsa20 (i.e. has proper 128-bit constant on the diagonal). In this section we show that this information is suﬃcient to completely recover the two related 256-bit secret keys of Salsa20 immediately.

0 0 0 0 0 0 σ0 k0 k1 k2 σ0 k3 c0 k5 σ0 k0 k1 k2 column 0 0 0 trans- 0 0 0 k3 σ1 n0 n1 k0 σ1 c1 k6 k3 σ1 n0 n1 → 0 0 0 → 0 0 0 c0 c1 σ2 k4 round k1 n0 σ2 k7 position c0 c1 σ2 k4 0 0 0 0 0 0 k5 k6 k7 σ3 k2 n1 k4 σ3 k5 k6 k7 σ3

Figure 4.2: Relation of the 1st and 2nd starting state (The grey squares are the known words.)

With the knowledge of both nonces and counters we are able to recover four words of the ﬁrst unknown key and two words of the second unknown 4.1. Slid Pairs in Salsa20 37 key (marked by bold letters in the following) by solving the equations1

0 0 0 0 0 k3 = −n1 k0 = k3 ⊕ ((n1 + n0) ≪ 13) 0 0 0 0 0 k6 = n1 ⊕ ((n0 + σ1) ≪ 9) k4 = −c0 + ((c1 ⊕ n0) ≫ 13) 0 0 0 k1 = c0 ⊕ ((k4 + σ2) ≪ 9) k7 = k4 ⊕ ((σ2 + n0) ≪ 7) . At this point we still have not used the similar equations from the bottom of the round computation. If parts of the nonce / counter are unknown these equations can be used to check our guesses. Then two similar systems of equations are left to get the other 10 key words. For the ﬁrst one we solve the equation

0 0 0 0 z2 = [(z8 − c0) ⊕ ((z1 − k0 + z0 − σ0) ≪ 9)] + [c0 ⊕ ((k0 + σ0) ≪ 9)] , (4.6)

0 whereas some words of the ciphertexts are used and only k0 is unknown. 0 With the solution of k0 we get four more words

0 0 0 0 k1 = c0 ⊕ ((k0 + σ0) ≪ 9) k2 = −k1 0 0 0 0 k5 = k2 ⊕ ((k1 + k0) ≪ 13) k3 = k0 ⊕ ((σ0 + k5) ≪ 7) .

0 For the second system we solve a similar equation to get k5

0 0 0 0 z13 = [(z7−n1)⊕((z12−k5+z15−σ3) ≪ 9)]+[n1⊕((k5+σ3) ≪ 9)] . (4.7)

0 With the solution of k5 we get the last four words

0 0 0 0 k6 = n1 ⊕ ((k5 + σ3) ≪ 9) k7 = −k6 0 0 0 0 k4 = k7 ⊕ ((k6 + k5) ≪ 13) k2 = k5 ⊕ ((σ3 + k4) ≪ 7) .

Equation (4.6) (similarly (4.7)) can be solved by just checking all 232 possi- 0 0 bilities for k0 (respectively k5 for (4.7)), however it can be done much more 8 0 0 eﬃciently with less then 2 steps by guessing k0 (or k5) gradually and checking the bitwise equations. We solved 16 equations but used 26 of the 64 equations in Appendix A to get these 16 equations. The complexity to get all the 16 key words is approximately 56 word operations which is much faster than a single Salsa20 encryption (976 word operations). We compute Salsa20 for both keys and compare the calculated ciphertexts to the given ones. If they match we have found the right keys.

4.1.5 A Generalized Related Key Attack on Salsa20 Suppose we are given a (possibly large) list of ciphertexts with the corresponding nonces and counters and we are told that the slid pair is hidden in this list. The question is: Can we eﬃciently ﬁnd slid pairs in a large list of

1The complete system of equations is shown in Appendix A, here we only give the rearranged equations. 38 Chapter 4. Slid Pairs in Salsa20 and Trivium ciphertexts? As we saw in the previous section, it is easy to compute both keys given such slid ciphertext pairs. The task is made more diﬃcult by the feedforward of Salsa20 which destroys the sliding relationship. Never- theless, in this section we show that given a list of ciphertexts of size O(2l) it is possible to detect a slid pair with memory and time complexity of just O(2l).2 The naive approach, which would require to check the equations from function F for each possible pair, will have complexity O(22l) which is too expensive. Our idea is to reduce the amount of potential pairs by sorting them by eight precomputed words, so that only elements where these eight words match have the possibility to yield a slid pair. After decreasing the number of possible pairs in that way we can check the remaining pairs using additional constraints coming from the sliding equations. For the sorting we use bucket sort because each word has only 232 possibilities. The number of words we sort by is equal to the number of runs of bucket sort. We have a set M of ciphertexts with corresponding nonces and counters. Each ciphertext can be either a 1st starting state or a 2nd starting state. To st regard this, the set is stored twice, ﬁrst under M1 to check for possible 1 nd starting states and second under M2 to check for possible 2 starting states.

Step 1: Sort the ﬁrst list For each element in set M1, undo the feedforward for the four words on the diagonal and for x9 = z9 − c1. Then sort M1 by the speciﬁed eight words x0, x5, x10, x15, x9 and c1, z1, z13. Step 2: Sort the second list3 0 0 Select only elements of M2 that satisfy equation 0 = c0 + c1 since only such an entry can be a 2nd starting state. For each element undo the feedforward 0 0 for the four words on the diagonal and for x6, . . . , x9 because nonces and counters are known. Then for each element compute the words marked in bold in the equations

0 0 0 0 0 x0 = x0 ⊕ ((z2 + z3) ≪ 18) k3 = −n1 0 0 0 0 0 0 0 x5 = x5 ⊕ ((z4 + n1 + x7) ≪ 18) x10 = x10 ⊕ ((x9 + x8) ≪ 18) 0 0 0 0 0 0 0 x15 = x15 ⊕ ((z13 + z14) ≪ 18) x1 = (z4 + n1) ⊕ ((x7 + x6) ≪ 13) 0 0 0 0 x9 = x6 ⊕ ((x5 + x1) ≪ 7) k0 = k3 ⊕ ((n1 + n0) ≪ 13) 0 c1 = n0 ⊕ ((σ1 + k0) ≪ 7) z1 = x1 + k0 0 0 0 0 k6 = n1 ⊕ ((n0 + σ1) ≪ 9) z13 = (k6 + x7) ⊕ ((x6 + x5) ≪ 9) .

2Sorting is done via bucket sort so we save the logarithmic factor l in complexity. 3If the number of rounds of Salsa is odd then such simple sorting would not be possible since Salsa equations are easier to solve in reverse direction. In our approach we know two words at the input and three words at the output of the columnround which is easier to solve than the opposite (three words at the input vs. two at the output). Nevertheless, the system is still solvable. 4.1. Slid Pairs in Salsa20 39

0 During this computation we calculate three key words k0, k6 and k3. Sort the set M2 by the calculated eight words x0, x5, x10, x15, x9 and c1, z1, z13 for the potential 1st starting states.

Step 3: Check each possible pair Cross check all the possible pairs that match in these eight words and thus satisfy the 256-bit ﬁltering. For the conforming pairs we can continue the check using the equations below. If a test condition is wrong this pair can not be a slid pair. For each pair undo the feedforward for the word x6 = z6 − n0 of the ciphertext of the 1st starting state. Then compute the bold variables and check the conditions for the three equations

0 0 0 compute k4 = −c0 + ((n0 ⊕ c1) ≫ 13) 0 0 0 x11 = −x8 + ((x6 ⊕ x9) ≫ 13) 0 0 0 check z11 = x11 + k4 0 0 compute k1 = c0 ⊕ ((k4 + σ2) ≪ 9) 0 0 x2 = x8 ⊕ ((x11 + x10) ≪ 9) check z2 = x2 + k1 0 compute k7 = k4 ⊕ ((σ2 + n0) ≪ 7) 0 x14 = x11 ⊕ ((x10 + x6) ≪ 7) check z14 = x14 + k7 .

0 During this computation we calculate the three key words k1, k7 and k4. For the remaining pairs we have two further systems of equations to 0 check. For the ﬁrst one we solve (4.6) and if there is no solution for k0 this 0 pair can not be a slid pair. Otherwise, we use k0 to compute four additional key words while we check two more conditions

0 0 compute k1 = c0 ⊕ ((k0 + σ0) ≪ 9) 0 0 k2 = −k1 0 0 0 k5 = k2 ⊕ ((k1 + k0) ≪ 13) 0 0 0 x1 = z1 − k0 0 0 0 0 0 x12 = (z3 − k2) ⊕ ((z2 − k1 + x1) ≪ 13) check z12 = x12 + k5 0 compute k3 = k0 ⊕ ((σ0 + k5) ≪ 7) 0 x4 = x1 ⊕ ((x0 + x12) ≪ 7) check z4 = x4 + k3 .

For the second system we similarly solve (4.7) and again if there is no 0 0 solution for k5 this pair can not be a slid pair. Otherwise, we use k5 to 40 Chapter 4. Slid Pairs in Salsa20 and Trivium compute the rest of the key words while we check two more conditions

0 0 compute k6 = n1 ⊕ ((k5 + σ3) ≪ 9) 0 0 k7 = −k6 0 0 0 k4 = k7 ⊕ ((k6 + k5) ≪ 13) 0 0 0 x12 = z12 − k5 0 0 0 0 0 x11 = (z14 − k7) ⊕ ((z13 − k6 + x12) ≪ 13) check z11 = x11 + k4 0 compute k2 = k5 ⊕ ((σ3 + k4) ≪ 7) 0 x3 = x12 ⊕ ((x15 + x11) ≪ 7) check z3 = x3 + k2 .

We have nine extra test conditions to check the potential slid pairs. We would only expect seven conditions but due to the diﬀerent arithmetic operations the dependencies of the equations are not clear. In total, we have at least ﬁltering power of 32 · 7 bits. Thus, we expect that only the correct slid pairs survive this check. The remaining pairs are the correct slid pairs for which we completely know both keys.

Complexity. Assuming we are given a list of 2l ciphertexts with corresponding nonces and counters. Instead of storing the list twice we use two kinds of pointers, one kind for the potential 1st starting states and the other one for the potential nd l l 2 starting states. For the pointers we need 32 · 2 words of memory. A summary for the complexity of diﬀerent lists is given in Table 4.3. The larger the list of the random states in which our target is hidden – the larger would be the complexity of the attack. However, the time complexity of the attack grows only linearly with the size of the list.

list size memory in words time in Salsa20 clocks 2128 28 · 2128 2122 2192 32 · 2192 2186 2256 36 · 2256 2250

Table 4.3: Complexities for diﬀerent list sizes

The number of slid pairs is 2256 which gives the probability of 2−128 for a random starting state to be a 1st or a 2nd starting state. Via the birthday paradox we expect the amount of 2256 random ciphertexts to contain both stating states for one slid pair. We have described how to search in a big list eﬃciently for a slid pair and recover both secret keys. 4.1. Slid Pairs in Salsa20 41

4.1.6 Time-Memory Tradeoff Attacks on Salsa Salsa20 has 2384 possible starting states. We notice that the square root of 2384 is less than the keyspace size for keys longer than 192 bits. Thus, a trivial birthday attack on 256-bit key Salsa20 would proceed as follows: During the preprocessing stage, we generate a list of 2192 randomly chosen starting states and run Salsa20 for each of them to get a sample of ciphertexts. Afterwards, we sort this list by the ciphertexts. During the online stage, we capture 2192 ciphertexts for which we want to find the keys. We do not have to store these ciphertexts and can check each of them immediately for a match with the sorted array of precomputed ciphertexts. If we have a match we retrieve the corresponding key from our table. Of course, due to very high memory complexity this attack can only be viewed as a certificational weakness. If we can choose the nonce or the counter there are only 2320 different starting states reducing the attack to precomputation and memory complexity of 2160 and – if we can choose both – the state space drops to 2256 and the attack complexity drops to 2128. Similar reasoning for 128-bit key Salsa20 would yield an attack with 264 complexity. Thus, it is crucial for the security of Salsa20 that nonces are chosen at random for each new key and the counter is not stuck at some fixed value (like 0, for example). The complexities are summarized in Table 4.4 where R stands for a complete run of Salsa20 and M for a matrix of Salsa (16 words).

precom- captured attack memory time putation ciphertext chosen nonce and counter R · 2128 2M · 2128 2128 2128 chosen nonce or counter R · 2160 2M · 2160 2160 2160 general R · 2192 2M · 2192 2192 2192 using sliding property R · 2192 2.5M · 2192 2192 2191

Table 4.4: Complexities for the Birthday attack

Improved Birthday Using the Sliding Property. We can use the sliding property to increase the eﬃciency of the birthday attack twice (which can be translated into the reduction of memory or time, or the increase of success probability of the birthday attack). Salsa20 has 2384 possible starting states in total and the sliding property reduces the number of possible starting states to 2257 (a slid pair has two starting states). Hence, a random starting state has the probability of 2−127 to be a starting state for a slid pair (either 1st or 2nd one). 42 Chapter 4. Slid Pairs in Salsa20 and Trivium

During the preprocessing stage we generate a sample of 2192 2nd starting states by using (4.1) of page 34 and choose the remaining eight words at random. We compute the corresponding ciphertexts for these states as well as the eight speciﬁed words for the corresponding 1st starting states as mentioned in step 2 of Section 4.1.5. We use two kinds of pointers to sort this generated list by the ciphertexts for the 2nd starting states and by the eight words for the corresponding 1st starting state. We capture ciphertext from the keystream where we also know the nonce and the counter. The amount of 2191 captured ciphertexts contains about either 264 1st starting states or 264 2nd starting states. We check if the ciphertext is a correct one for a 2nd starting state from our list (direct birthday) or matches the eight words for a 1st starting state for one of the states from our collection (then proceed with step 3 as described in Section 4.1.5 to check the remaining eight words) (indirect birthday). In both cases we obtain the key for this ciphertext.

4.2 Slid Pairs in Trivium

4.2.1 Brief Description of Trivium

The designers introduced the stream cipher Trivium with a state size of 288 bits. This internal state can be split into three registers as shown in Figure 4.3. The ﬁrst register which we call A has a length of 93, the second one called B has a length of 84 and the last register named C has 111 bits. The internal state is denoted in the following way:

A: (s1, s2, . . . , s93) B: (s94, s95, . . . , s177) C: (s178, s279, . . . , s288) . 1 69 93 66

t3 s t1 s s s 94 177 162 t1 171 t2 s s s s

zi 288 243 178 t2 264 t3 s s s s

Figure 4.3: Trivium with three registers 4.2. Slid Pairs in Trivium 43

Update and Keystream Production. The non-linear update function of second degree uses 15 bits of the internal state to compute three new bits, each for one register, and the keystream bit zi is calculated by summing up only 6 of these 15 bits. In the following pseudo-code all computations are over GF(2).

t1 ←− s66 + s93 t2 ←− s162 + s177 t3 ←− s243 + s288

zi ←− t1 + t2 + t3

t1 ←− t1 + s91 · s92 + s171 t2 ←− t2 + s175 · s176 + s264 t3 ←− t3 + s286 · s287 + s69

A: (s1, s2, ..., s93) ←− (t3, s1, . . . , s92) B: (s94, s95, . . . , s177) ←− (t1, s94, . . . , s176) C: (s178, s279, . . . , s288) ←− (t2, s178, . . . , s287)

Key and IV setup. In register A the key of 80 bits is loaded and the last 13 bits are set to zero. The IV is loaded into the ﬁrst 80 bits of register B and the last 4 bits are set to zero. In register C all positions are set to zero except for the last three bits which are set to one.

A: (s1, s2, ..., s93) ←− (K80,...,K1, 0,..., 0) B: (s94, s95, . . . , s177) ←− (IV80,...,IV1, 0,..., 0) C: (s178, s279, . . . , s288) ←− (0,..., 0, 1, 1, 1) In this chapter we refer to this state with key, IV and 128 ﬁxed positions as starting state. After the registers are initialized in the described way, the cipher is clocked 4 · 288 times using the update function without producing any keystream bits. This will ﬁnish the key setup. Now each following clock will produce a keystream bit.

4.2.2 Slid Pairs We start with the observation made by Jin Hong on the eSTREAM forum [Hon05] that it is possible to produce sliding states in Trivium. We searched for pairs of key and IV which produce another starting state after a few clocks. If we have a key and IV pair (K1,IV1) which produces another starting state with a key and IV (K2,IV2), the keystream created by (K2,IV2) will be the same as the one created by (K1,IV1) except for a shift of some bits. The number of shifted bits is equal to the number of clocks needed to get from the 1st to the 2nd starting state. We call such a pair of 44 Chapter 4. Slid Pairs in Salsa20 and Trivium two key and IV pairs a slid pair and denote this with [(K1,IV1), (K2,IV2), c] whereas c stands for the number of clock-shifts. Due to the special structure of the third register with 108 zeros and the last three ones, the ﬁrst possibility of a 2nd starting state to occur is after 111 clocks. Each following clock gives the chance for a 2nd starting state. Table 4.5 shows two examples for slid pairs with diﬀerent shifts written in hexadecimal numbers. The bits for keys and IVs are ordered from 1 to 80 but in the key-IV setup they are used the other way around.

[(K1,IV1), (K2,IV2), 111]

K1 : 70011000001E00000000 IV1 : AF9D635BCEF9AE376CF7 keystream4: 2E7338CB404272ABEE3F7BEC2F8D 55E27536D29AFFFF15DFDFD711AECC78D13D7B61 ...

K2 : 780000001DA2000003C1 IV2 : 1DF35CF6D4FFF4E3A6C0 keystream: 55E27536D29AFFFF15DFDFD711AECC78D13D7B61 ...

[(K3,IV3), (K4,IV4), 112]

K3 : 02065B9C001730000000 IV3: 609FC141828705160A3C keystream: A48BCA9143685F03DE646F83AB52 88BC9542798983349A959503E63BBF29C4755DE6 ...

K4 : B98000003E96E70005CE IV4 : 2B7C1483BC476A62E4CB keystream: 88BC9542798983349A959503E63BBF29C4755DE6 ...

Table 4.5: Two examples for slid pairs with diﬀerent shifts

4.2.3 Systems of Equations We describe the 2nd starting state as polynomial equations in the 80 key and 80 IV variables of the 1st pair. The 128 ﬁxed positions in a starting state yield a system of equations with 160 variables and 128 equations. We have more variables than equations which results in 232 solutions. To solve these systems we tried the F4 algorithm implemented in the computer algebra system Magma to get a Gröbner basis and the solutions for (K1,IV1) but gave up after c = 115 because of the long computation time. A more brute force approach – guessing a part of the variables, checking this guess and outputting the solution – which we implemented for each individual c worked

4The shift is c = 111 which means the first 111 bits are a prefix. When rewriting this prefix from hexadecimal to binary numbers the leading zero must be omitted because 111 is not a multiple of 4. 4.2. Slid Pairs in Trivium 45 much better. To get the 2nd key and IV we can use the systems of equations which describe the key and IV in the 2nd starting state just inserting the known values of the 1st key and IV pair or simply clock Trivium c times starting from (K1,IV1) to get (K2,IV2).

Some Facts about these Systems. The system of equations for the ﬁrst instance which appears after 111 clocks contains only 136 variables because the last 24 bits of the key do not occur in this system. Furthermore, 16 bits are given a priori due to the 13 zeros in register A and 3 ones in register C. The degree of the monomials in the equations is raised from 1 to 3. Due to the 24 key bits missing in the equations, these bits can be chosen arbitrarily. This leads us to 224 diﬀerent keys for one IV in the 1st key and IV pair of a slid pair. Table 4.6 collects some facts for the systems which we solved with our brute force approach.

clock-shift c 111 112 113 114 115 116 ··· 124 variables in equations 136 137 138 139 140 141 ··· 149 last key bits 24 23 22 21 20 19 ··· 11 not in the equation a priori given bits 16 15 14 13 13 13 ··· 13 computing time 2.5 2.5 10 32.5 64 – – – for Magma in days guess bits for Magma5 0 4 6 8 10 – – –

Table 4.6: Some facts for the systems of equations

We found that we sometimes have slightly less but mostly have slightly more solutions that we would have expected as shown in Table 4.7. The higher the clock-shift will be, the more complicated the systems of equations will become. For each clock-shift another system of equations is needed but for every step of c most of the equations are the same or related. Due to the length of register C which deﬁnes the occurrence of a second starting state we have at least 111 clock-shifts. Thus, we expect a minimum 111 · 232 ≈ 239 slid pairs just within a shift of 221 bits of each other. There are much more slid pairs for longer shifts but the equations would be much more complicated.

Non-existence of Special Slid Pairs. We searched for slid pairs with additional constraints. The ﬁrst type applies when the keys in both key and IV pairs are the same for any clock-shift c: ([(K,IV1), (K,IV2)], c), and the second type applies when both times the

5We guessed these bits to get a solution from Magma in a reasonable amount of time. 46 Chapter 4. Slid Pairs in Salsa20 and Trivium

c expected real solutions 111 28 28 + 84 112 29 29 – 72 113 210 210 – 176 114 211 211 + 29 115 212 212 – 28 116 213 213 – 27 117 214 214 – 29 118 215 215 + 212 + 210 + 29 119 216 216 + 213 120 217 217 – 212 121 218 218 + 214 + 212 + 211 122 219 219 + 214 + 213 + 212 + 210 123 220 220 + 215 124 221 221 + 213 + 210 + 29

Table 4.7: Some facts for solutions of diﬀerent systems of equations

same IV is used for any clock-shift c: ([(K1,IV ), (K2,IV )], c). In both cases the fixed 2nd key or IV leads to 80 additional equations which account for the occurrence of all 80-bit of key or IV resulting in overdefined systems with 208 equations and 160 variables. For both types the systems are not likely to be solvable for any reasonably small amount of shift. As a result of the 48 extra equations the chance for such a system to have a solution is about 2−48. We computed that for the first 31 instances (clock-shifts 111 up to 142) these systems have no solution.

4.3 Summary

In this chapter we have described sliding properties of Salsa20 and Trivium which lead to distinguishing, key-recovery and related key attacks on these ciphers. We have also shown that Salsa20 does not offer 256-bit security due to a simple birthday attack on its 384-bit state. Since the likelihood of falling in our related key classes by chance is relatively low (2256 out of 2384 for Salsa20, 239 out of 280 for Trivium), these attacks do not threaten most of the real-life usage of these ciphers. However, designers of protocols which would use these primitives should definitely be aware of these non- randomness properties which can be exploited in certain scenarios. In 2011 Bernstein [Ber11] published a modified version of Salsa20 called XSalsa20 which uses a 192-bit nonce. XSalsa20 uses the Salsa20 matrix and the Salsa20 computations but performs one step prior to the keystream generation. For this step the matrix is loaded with the constants, the key and the first 128 bits of the 192-bit nonce which are placed at the positions of the 4.3. Summary 47 former 64-bit nonce and 64-bit counter. Thereafter, Salsa20 is performed on this matrix without the feedforward at the end. Eight words are taken from the output matrix, four words at the diagonal and four words at the positions of the former 64-bit nonce and 64-bit counter. This concludes the prior step and these eight words are used as key for the keystream generation. Now XSalsa20 behaves like Salsa20 and the matrix is loaded with the constants, the eight specified words as key, the last 64 bits of the 192-bit nonce as 64-bit nonce and the 64-bit counter. The sliding property of Salsa20 is preserved for the keystream generation of XSalsa20 but not for the prior step. Thus, we can compute the starting matrix for the keystream generation but not the key itself. 48 Chapter 4. Slid Pairs in Salsa20 and Trivium Chapter 5

Analysis of SNOW 3G⊕ and SNOW 2.0⊕ Resynchronization Mechanism

The stream cipher SNOW 3G is the core of the 3GPP confidentiality and integrity algorithms UEA2 and UIA2, published in 2006 by the 3GPP Task Force [ETS06a]. Compared to its predecessor SNOW 2.0 [EJ02], SNOW 3G adopts a Finite State Machine (FSM) of three 32-bit words and two S- boxes to increase the resistance against the algebraic attacks by Billet and Gilbert [BG05]. Full evaluation of the design by the consortium is not public but a survey of this evaluation is given in [ETS06b]. Furthermore, SNOW 3G⊕ (in which the two modular additions are replaced by XORs) is defined and evaluated in [ETS06b]. The designers and external reviewers show that SNOW 3G has remarkable resistance against linear distinguishing attacks [NW06, WBdC03], while SNOW 3G⊕ offers much better resistance against algebraic attacks than SNOW 2.0⊕. A fault analysis of SNOW 3G is presented by Debraize [DC09] in 2009 using only 22 fault injections to reveal the secret key. In 2010 Brumley et al. [BHNS10] show a cache-timing attack on SNOW 3G retrieving the complete internal state from empirical timing data in a matter of seconds, without known keystream and only observation of a small number of cipher clocks. The stream cipher SNOW 2.0 was published in 2002 by Ekdahl and Johansson [EJ02]. Previous attacks on SNOW 2.0 used linear distinguishers to distinguish the keystream of SNOW 2.0 from a truly random sequence. A distinguishing attack with linear masking was applied by Watan- abe et al. [WBdC03] in 2003. It requires 2225 words of output and has a complexity of 2225. This attack was later improved by Nyberg and Wal- lén [NW06] in 2006 to a linear distinguisher which needs 2179 bits of keystream and 2174 operations. A completely different kind of attack was proposed by Billet and Gilbert [BG05] in 2005. They showed that a slightly

49 50 Chapter 5. Analysis of SNOW Resynchronization Mechanism simplified version of SNOW 2.0 can be broken using a linearization attack with a complexity of 250 and only 1,000 words of keystream to get the initial state of the Linear Feedback Shift Register (LFSR). The extension of this attack to the actual SNOW 2.0 leads to a overdetermined system of quadratic equations for which the solving complexity remains unknown so far. In this chapter we present the first attempt of cryptanalysis of SNOW 3G⊕ in public literature. We show that the feedback from the FSM to the LFSR during the initialization phase is vital for the security of this cipher, since we can break a version without such a feedback with two known IV’s in 256.1 time, 233 data and 225 memory complexity, and for any amount of initialization rounds! We then restore the feedback and study SNOW 3G⊕ against differential chosen IV attacks. We show attacks on SNOW 3G⊕ with 14, 15 and 16 rounds of initialization with time complexities 242.1, 291.3 and 2123.3 respectively. Afterwards, we perform the same kind of attacks on SNOW 2.0⊕. The known IV attack on a version without the feedback from the FSM to the LFSR during the initialization phase requires 236.5 time, 62 data and 230 memory complexities, and works for any amount of initialization rounds, too. For SNOW 2.0⊕ with the feedback we found differential chosen IV attacks on 14 up to 18 rounds of initialization with 236.4 time and small memory complexities up to 2122 time and 2123 memory complexities. This chapter is organized as follows: We give a description of SNOW 3G and SNOW 2.0 as well as their simplified versions SNOW 3G⊕ and SNOW 2.0⊕ in Section 5.1. The known and chosen IV attacks on SNOW 3G⊕ are presented in Section 5.2 and the similar attacks on SNOW 2.0⊕ are shown in Section 5.3. Finally, some conclusions are given in Section 5.4.

5.1 Description of both SNOW Ciphers

5.1.1 Description of SNOW 3G and SNOW 3G⊕ SNOW 3G is a word-oriented synchronous stream cipher with a 128-bit key and a 128-bit initial value (IV), each considered as four 32-bit words. It consists of a Linear Feedback Shift Register (LFSR) of sixteen 32-bit words and a Finite State Machine (FSM) with three 32-bit words, shown in Figure 5.1. The symbol ’⊕’ denotes the bitwise XOR and ’¢’ denotes the addition modulo 232. The feedback word of the LFSR is recursively computed as

t+1 −1 t t t s15 = α · s11 ⊕ s2 ⊕ α · s0 , where α is the root of the irreducible GF (28)[x] polynomial

x4 + β23x3 + β245x2 + β48x + β239 with β being the root of the irreducible GF (2)[x] polynomial

x8 + x7 + x5 + x3 + 1 . 5.1. Description of both SNOW Ciphers 51

α−1 α

s15 s11 s5 s2 s0

FSM F zt

R1 S1 R2 S2 R3

Figure 5.1: Keystream generation of SNOW 3G

t t The FSM has two input words s15 and s5 from the LFSR and computes at ﬁrst the output word F as

t t t t F = (s15 ¢ R1) ⊕ R2 .

Afterwards, the FSM is updated as

t+1 t t t t+1 t t+1 t R1 = R2 ¢ (R3 ⊕ s5) R2 = S1(R1) R3 = S2(R2) , where S1 and S2 are 32-bit to 32-bit S-boxes defined as compositions of four parallel applications of two 8-bit to 8-bit small S-boxes SR and SQ, with a linear diffusion layer respectively. Here SR is the well-known AES S-box and SQ is defined as

9 13 15 33 41 45 47 49 SQ(y) = y ⊕ y ⊕ y ⊕ y ⊕ y ⊕ y ⊕ y ⊕ y ⊕ y ⊕ 0x25 for y ∈ GF (28) deﬁned by x8 + x6 + x5 + x3 + 1. If we decompose a 32-bit word w into four bytes w = w(0)kw(1)kw(2)kw(3) with w(0) being the most and w(3) the least signiﬁcant bytes, then the S-boxes are

h iT (0) (1) (2) (3) S1(w) = MC1 · SR(w ),SR(w ),SR(w ),SR(w ) h iT (0) (1) (2) (3) S2(w) = MC2 · SQ(w ),SQ(w ),SQ(w ),SQ(w ) ,

8 where MC1 is the AES MixColumns operation for S1 over GF (2 ) deﬁned 8 4 3 8 by x + x + x + x + 1 and MC2 is the similar operation for S2 over GF (2 ) deﬁned by x8 + x6 + x5 + x3 + 1. SNOW 3G is initialized with the four word key K = (k0, k1, k2, k3) and the four word IV = (IV0,IV1,IV2,IV3). Let 1 be the all-one word, then the 52 Chapter 5. Analysis of SNOW Resynchronization Mechanism

LFSR is loaded as follows:

s15 = k3 ⊕ IV0 s14 = k2 s13 = k1 s12 = k0 ⊕ IV1 s11 = k3 ⊕ 1 s10 = k2 ⊕ 1 ⊕ IV2 s9 = k1 ⊕ 1 ⊕ IV3 s8 = k0 ⊕ 1 s7 = k3 s6 = k2 s5 = k1 s4 = k0 s3 = k3 ⊕ 1 s2 = k2 ⊕ 1 s1 = k1 ⊕ 1 s0 = k0 ⊕ 1

The FSM is initialized with R1 = R2 = R3 = 0. Then the cipher runs 32 clocks in the initialization mode doing:

1. compute the output F t of the FSM,

2. update the FSM,

3. clock the LFSR with F t XORed to the feedback of the LFSR.

After this, the cipher is switched to the keystream generation mode. As long as keystream is needed the cipher does:

1. compute the output F t of the FSM,

2. update the FSM,

t t t 3. compute the keystream word z = s0 ⊕ F , 4. clock the LFSR.

Please note that the ﬁrst keystream word is discarded. Hence, there are 33 initialization rounds. If we replace the two modulo additions in SNOW 3G by XORs we get SNOW 3G⊕. In each clock SNOW 3G⊕ executes two S-boxes, two multiplications and seven XORs. We will measure the time complexity of our attacks in SNOW 3G⊕ clocks.

5.1.2 Description of SNOW 2.0 and SNOW 2.0⊕

The only diﬀerence between SNOW 3G and SNOW 2.0 is the absence of the second S-box S2 and the third word R3 of the FSM as shown in Figure 5.2. All other equations, the initialization mode and the keystream generation are the same for SNOW 2.0. Two key sizes are allowed for SNOW 2.0 a 128-bit key and a 256-bit key but for the larger key the mixing and loading step is slightly diﬀerent. A more detailed description is given in [EJ02]. In our attacks we concentrate on the 128-bit key version. If we replace the two modulo additions in SNOW 2.0 by XORs we get SNOW 2.0⊕. In each clock SNOW 2.0⊕ executes one S-box, two multiplications and six XORs. Analog to SNOW 3G⊕ we will measure the time complexity of our attacks on SNOW 2.0⊕ in clocks of SNOW 2.0⊕. 5.2. Known and Chosen IV Attacks on Versions of SNOW 3G⊕ 53

α−1 α

s15 s11 s5 s2 s0

FSM F zt

R1 S1 R2

Figure 5.2: Figure of the keystream generation of SNOW 2.0

5.2 Known and Chosen IV Attacks on Versions of SNOW 3G⊕

In this section we will present a known IV attack on SNOW 3G⊕ without the FSM to LFSR feedback during the initialization phase. Afterwards, we restore the feedback and show a differential chosen IV attack on SNOW 3G⊕ with a reduced number of initialization rounds. In both attacks the attacker has access to two keystreams corresponding to (K,IVa) and (K,IVb), where IVa and IVb are either arbitrary known IVs or IVs chosen by the attacker. The goal is to recover the internal state and thus the secret key from the known or chosen differences. All attacks have a similar structure, they begin with a starting step, which is different for each attack, followed by a reduction step and then finish with an insertion step. In the starting step we only use the differences in the LFSR and the keystream to deduce potential differences in the FSM words exploiting the non-linearity from the S-boxes. In the reduction step we switch from the differences in the FSM words to the individual values of the FSM words to reduce the amount of potential differences exploiting the non-linearity of the S-boxes again. In the insertion step we simply insert the collected individual values for the FSM words in one system of keystream equations.

5.2.1 Diﬀerence Propagation through the S-boxes

To recover the initial state from the diﬀerences we need to know how the differences propagate through the S-boxes. Let va, vb, wa, wb, xa, xb be arbitrary 54 Chapter 5. Analysis of SNOW Resynchronization Mechanism words satisfying the following equations for the ﬁrst S-box

∆v = va ⊕ vb wa = S1(va) wb = S1(vb)

∆w = wa ⊕ wb = S1(va) ⊕ S1(vb) . We deﬁne a new notation for the last row

out def. ∆ S1(∆v) = S1(va) ⊕ S1(vb) . Similarly, we have for the second S-box

∆w = wa ⊕ wb xa = S2(wa) xb = S2(wb)

∆x = xa ⊕ xb = S2(wa) ⊕ S2(wb) . Likewise, we deﬁne a new notation for the last row

out def. ∆ S2(∆w) = S2(wa) ⊕ S2(wb) .

In the same way we define the output difference for the small S-box SR as out out ∆ SR(∆y) for a byte ∆y and ∆ SQ(∆y) for the small S-box SQ. For a fixed input difference ∆y we get at most 127 possible output differences for the 28 small S-box SR. This yields approximately 2 possible output differences for the S-box S1 for a fixed input difference ∆v. Also, we get at most 127 28 possible output differences for the small S-box SQ resulting in about 2 possible output differences for the S-box S2. It is obvious that for a zero input difference, the output difference will be zero as well for all S-boxes. If we can fix the input difference ∆v to a word with three bytes equal to zero out 7 and only one non-zero byte, we would get 2 possibilities for ∆ S1(∆v) as out 7 well as 2 possibilities for ∆ S2(∆v). The knowledge of an input-output difference ∆v → S1 → ∆w with fixed differences ∆v and ∆w for the first S-box allows us to recover on average 126 1 4 (2 · 127 + 4 · 127 ) ≈ 16.51 pairs of individual values for (va, vb). If we know 126 1 the pair (va, vb) we know (wa, wb), too. The number of 2 · 127 + 4 · 127 ≈ 2.02 pairs comes from the difference propagation of the small S-box SR. We have computed that the 256 pairs of individual values for each input difference unequal to zero yield exactly 127 output differences. In 126 cases the output difference is the same for two pairs and once four pairs result in the same output difference. Thus, in the 16.51 possible pairs of individual values we have around 8.255 times the case of two pairs (va, vb) and (vb, va). Now we can apply the second S-box to the individual values wa and wb and yield values xa = S2(wa) and xb = S2(wb). The 16.51 possible pairs for the individual values result in 8.255 possibilities for ∆x due to the fact that each two pairs (va, vb) and (vb, va) give two pairs (xa, xb) and (xb, xa) yielding the same difference ∆x. If we can fix this difference as well, yielding a known input-output difference sequence ∆v → S1 → ∆w → S2 → ∆x, only two pairs of individual values will conform to this difference sequence, namely (va, vb) and (vb, va). 5.2. Known and Chosen IV Attacks on Versions of SNOW 3G⊕ 55

5.2.2 Known IV Attack without FSM to LFSR Feedback In this section we consider a known IV attack on SNOW 3G⊕ without the FSM to LFSR feedback, in which the attacker has access to two keystreams corresponding to (K,IVa) and (K,IVb), where IVa and IVb are arbitrary known IVs. This attack works for any number of initialization rounds. During the keystream generation, we have the following equations for the diﬀerences at clock t:

out t t t t t t t−1 ∆z = ∆s15 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆s0 ∆R2 = ∆ S1(∆R1 ) out t t−1 t−1 t−1 t t−1 ∆R1 = ∆R2 ⊕ ∆R3 ⊕ ∆s5 ∆R3 = ∆ S2(∆R2 ) . The differences in the LFSR depend on the known difference of the IVs and are completely predictable due to the linearity. t The main procedures of our attack are: Assume that ∆R1 = 0 is given at time t. Deduce potential differences in the FSM words at different times from the linear evolution of the difference in the LFSR and the keystream difference equations. Knowing the input-output difference for the S-boxes, the few possibilities for the individual values of the FSM words are deduced. Combine the knowledge of the FSM state with that of the keystream to get linear equations on the LFSR state. Collect enough equations to get a solvable linear system which will recover the state of the LFSR. By the invertibility of the cipher run it backwards to find the 128-bit secret key K.

t Assume ∆R1 = 0. If this is not true, just take the next clock and so on. If we try this step 232 times, then it will happen with a good probability. t 1 2 Denote the time when ∆R1 = 0 by t = 1, then ∆R1 = 0 implicates ∆R2 = 0 3 and ∆R3 = 0 due to the diﬀerence propagation of the S-boxes. From the 1 2 keystream equation at t = 1 we know ∆R2. Similarly, we know ∆R1 from 1 which we can derive ∆R3 via the update equation. Hereafter, we denote the known diﬀerence values by ∆ki and yield the state of the FSM as follows:

t t t clock t ∆R1 ∆R2 ∆R3 1 0 ∆k0 ∆k2 2 ∆k1 0 3 0

Starting step: For clock 3 we take the keystream equation and the 3 update equation of ∆R1, which are

3 3 3 3 3 ∆z = ∆s15 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆s0 , 3 2 2 2 ∆R1 = ∆R2 ⊕ ∆R3 ⊕ ∆s5 , and combine them to one equation

2 3 3 3 2 3 ∆R3 ⊕ ∆R2 = ∆z ⊕ ∆s15 ⊕ ∆s5 ⊕ ∆s0 . 56 Chapter 5. Analysis of SNOW Resynchronization Mechanism

We know the right part completely and denote it with ∆k3. By the notations introduced in Section 5.2.1 we get

out out ∆ S2(∆k1) ⊕ ∆ S1(∆k2) = ∆k3 . (5.1)

228·228 24 We have 232 = 2 pairs satisfying this equation. To enumerate the possible pairs we ﬁrst rewrite (5.1) as     out (0) out (0)    (0)  ∆ SR(∆k ) ∆ SQ(∆k ) msb ∆k 2 1 p0 3  out (1)   out (1)   (1)   ∆ S (∆k )   ∆ S (∆k )   pmsb   ∆k   R 2  =  Q 1  ⊕  1  ⊕ MC−1 · 3 ,  out (2)   out (2)   msb  1  (2)  p2  ∆k   ∆ SR(∆k2 )   ∆ SQ(∆k1 )  3 msb (3) out (3) out (3) p3 ∆k ∆ SR(∆k2 ) ∆ SQ(∆k1 ) 3

msb where pi (i = 0, 1, 2, 3) denotes a byte polynomial which contains only out the most significant bits of all four values ∆ SQ. For a detailed explanation please see Appendix B. Thus, we can fulfill the enumeration byte by byte. out (0) 7 For the first row we need the value of ∆ SQ(∆k1 ), which has 2 possibilities, msb and three more bits for p0 . Then we check whether the value computed out (0) at the right side of the equation is a correct value for ∆ SR(∆k2 ). We will obtain 29 solutions for this byte equation. For the next three equations we already know the leading bits, which gives only 26 possibilities left in each byte equation resulting in 25 solutions. To get the solution of the word equation we have to combine the corresponding byte solutions and get 29 · 25 · 25 · 25 = 224 solutions, which needs about 2 · 224 = 225 words of memory. The time complexity of this step is negligible as we only have a −1 few XORs on bytes – the multiplication with MC1 has to be done only once prior we try all the byte possibilities. Now the states of the FSM are as follows: t t t clock t ∆R1 ∆R2 ∆R3 1 0 ∆k0 ∆k2 24 2 ∆k1 0 (2 ) 3 (224) (224) 0 4

3 4 Each possible value of ∆R2 results in a possible value of ∆R1. At clock 4 we have 3 4 4 4 3 4 ∆R2 ⊕ ∆R2 = ∆z ⊕ ∆s15 ⊕ ∆s5 ⊕ ∆s0 . 4 Replacing the diﬀerence ∆R2 with the notation of Section 5.2.1 and denoting the known right part with ∆k4 we receive

3 out 3 ∆R2⊕ ∆ S1(∆R1) = ∆k4 . 5.2. Known and Chosen IV Attacks on Versions of SNOW 3G⊕ 57

3 (0) (1) (2) (3) 3 (0) (1) (2) (3) Let ∆R1 = v kv kv kv and ∆R2 = w kw kw kw . Expanding this equation to the byte form we get  out    S (v(0))  (0)  (0) ∆ R w ∆k4  out   (1)   (1)   (1)   ∆ SR(v )  −1  w  −1  ∆k4  out = MC · ⊕ MC ·  .  (2)  1  w(2)  1  (2)   ∆ SR(v )  ∆k4 out (3) (3) (3) w ∆k ∆ SR(v ) 4 24 3 3 We have to insert all 2 possible pairs (∆R1, ∆R2) and verify the values out 24 1 23 ∆ SR for the single bytes. This results in a time complexity of 2 · 2 = 2 224·228 20 clocks. There are 232 = 2 entries satisfying this equation. This means 20 2 3 3 4 4 that we have 2 sequences (∆R3, ∆R1, ∆R2, ∆R1, ∆R2) left which gives the following state of the FSM: t t t clock t ∆R1 ∆R2 ∆R3 1 0 ∆k0 ∆k2 20 2 ∆k1 0 (2 ) 3 (220) (220) 0 4 (220) (220) The naive approach would be to continue checking the diﬀerences with the keystream equation at clock 5. The combined equation is

4 out 3 out 4 ∆R2⊕ ∆ S2(∆R2)⊕ ∆ S1(∆R1) = ∆k5 . We would have to insert all 220 possibilities for the sequences and check 220·228·228 44 two S-boxes. In total 232 = 2 sequences would satisfy this equation. Thus, we would get more and more sequences with each clock. Instead of increasing the number of difference sequences we reduce them exploiting the non-linearity of the S-boxes. We do no longer consider only the differences of the FSM values. We rather switch to the individual values of the FSM words. Reduction step: For each of the 220 sequences we know the input- 2 3 output difference of S1 at clock 2 and 3: ∆R1 → S1 → ∆R2. As explained 2 2 in Section 5.2.1 we can recover 16.51 pairs (R1,a,R1,b) on average. This 4 means that we have 8.255 possible values for ∆R3. Looking at clock 5 we have 4 4 out 4 ∆R2 ⊕ ∆R3⊕ ∆ S1(∆R1) = ∆k5 . 4 (0) (1) (2) (3) 4 (0) (1) (2) (3) 4 Let ∆R1 = v kv kv kv , ∆R2 = w kw kw kw and ∆R3 = x(0)kx(1)kx(2)kx(3). We can rewrite this equation into byte form  out    S (v(0))  (0)   (0)  (0) ∆ R w x ∆k5  out   (1)   (1)   (1)   (1)   ∆ SR(v ) −1 w  −1 x  −1 ∆k5  out = MC · ⊕ MC · ⊕ MC · .  (2)  1 w(2)  1 x(2)  1  (2)   ∆ SR(v ) ∆k5 out (3) (3) (3) (3) w x ∆k ∆ SR(v ) 5 58 Chapter 5. Analysis of SNOW Resynchronization Mechanism

20 4 4 We have to insert all 2 possible pairs (∆R1, ∆R2) with the corresponding out 4 8.255 possible values of ∆R3 and verify the values ∆ SR for the single bytes. 220·8.255·228 19.05 There are 232 ≈ 2 possible sequences left and the time complexity 20 1 22.05 is 2 · 8.255 · 2 ≈ 2 clocks. This identiﬁcation of the individual values in the FSM for both keystreams has to be repeated for the next nine clocks to get enough individual values to compute the complete internal state. Each check will have a lower time complexity than the one before and will reduce the possible number of diﬀerences. The time complexity for all ten checks together in SNOW 3G⊕ clocks is Ã ! X10 1 220 · 8.255i · (2−4)i−1 · ≈ 223.1 . 2 i=1

During this reduction step the memory requirements will not exceed 222 words. The number of sequences left is 220 · 8.25510 · (2−4)10 ≈ 210.5. Insertion step: Now we insert the individual values of the FSM into the keystream generation equations and the FSM update equations to get a i linear system of the LFSR initial state. For each diﬀerence ∆R1 we have two i i i i pairs (R1,a,R1,b) and (R1,b,R1,a) for i = 2,..., 12. Thus, we have to insert i i both values R1,a or R1,b but we only need the system of equations for one keystream. This would need a time complexity of 210.5 · 210 ≈ 220.5 clocks. We can check with the keystream equation of clock 1 whether we took the i i right value of the pairs (R1,a,R1,b). Then we clock backwards to receive the secret key K. Results: The overall time complexity in SNOW 3G⊕ clocks is ¡ ¢ 232 · 223 + 223.1 + 220.5 ≈ 256.1 .

The memory requirements are 225 words and the keystream is of length 233 words, especially 232 consecutive words for both keystreams.

5.2.3 Differential Chosen IV Attacks Now we look at SNOW 3G⊕ with the FSM to LFSR feedback during the initialization but reduced number of initialization rounds. We show attacks on 12 to 16 initialization rounds. All attacks are similar having the starting step, which is slightly different for each number of initialization rounds, followed by the reduction step and then finish with the insertion step. We consider a differential chosen IV attack scenario. Assume that we have two 128-bit IVs differing only in the most significant word IV0, which gives the difference in s15 of the LFSR. As we will see below for the 13 and 14 rounds case we can restrict the difference to a single byte of IV0 in order to reduce the complexity of our attacks. We denote this difference by ∆d. Until round 11 this difference will not affect the FSM. After round 11 the known 5.2. Known and Chosen IV Attacks on Versions of SNOW 3G⊕ 59

∆d enters the FSM word R1. In this section we denote the ﬁrst displayed keystream word with z0 and increase the clock for the following keystream words. The previous initialization rounds will have negative clocks. As described in Section 5.1.1 the ﬁrst computed keystream word is discarded. Hence, clock −1 will have no feedback from the FSM to the LFSR and no keystream word.

Reduced Initialization of 12 Rounds.

Since all the differences in the FSM are zero, there are no differences fed back into the LFSR. Thus, the differences in the LFSR are all known. Our knowledge of the differences in the FSM is shown in the following table:

t t t round clock t ∆R1 ∆R2 ∆R3 12 −1 ∆d 0 0 0 ∆d 0 1

The starting step is very easy because from the ﬁrst keystream equation 0 0 1 1 with ∆R1 = ∆d we get ∆R2, which gives us immediately ∆R1 and also ∆R2 from the next keystream equation. Thus, we only have one known sequence −1 −1 −1 0 0 0 1 1 (∆R1 = ∆d, ∆R2 = ∆R3 = 0, ∆R1 = ∆d, ∆R2, ∆R3 = 0, ∆R1, ∆R2). We perform the reduction step in order to get the individual values for the words of the FSM. Starting with the known input-output diﬀerence −1 0 ∆R1 = ∆d → S1 → ∆R2 we proceed as explained in the reduction step on 1 5.4 page 57. The time complexity is 10 · 8.255 · 2 ≈ 2 clocks. We have one i i i i i sequence left and for each ∆R1 only two pairs (R1,a,R1,b) and (R1,b,R1,a) with i = −1,..., 8. Afterwards, we execute the insertion step as explained on page 58 with a time complexity of 210 clocks. We use the keystream equation of clock 12 to check the candidates. Results: The total time complexity in SNOW 3G⊕ clocks is

25.4 + 210 ≈ 210.1 , the memory requirements are small and the needed keystream is only 12 consecutive words for each IV.

Reduced Initialization of 13 Rounds.

Here we extend the attack above by one more round. In the 13 round case, since until now all the differences in the FSM are either zero or the known ∆d, no unknown difference was fed back into the LFSR. Thus, the differences 60 Chapter 5. Analysis of SNOW Resynchronization Mechanism in the LFSR values are known. We know the differences in the FSM as shown in the following table:

t t t round clock t ∆R1 ∆R2 ∆R3 12 −2 ∆d 0 0 13 −1 ∆d 0 0

Starting step: From the ﬁrst keystream equation and the update equations we receive

−1 0 0 0 −1 0 ∆R2 ⊕ ∆R2 = ∆z ⊕ ∆s15 ⊕ ∆s5 ⊕ ∆s0 . Then we replace the diﬀerences at the left side with the notation introduced in Section 5.2.1, denote the known part at the right side with k0 and obtain the equation out out ∆ S1(∆d) ⊕ ∆ S1(∆d) = ∆k0 . −1 Multiplying by MC1 we get the byte form equation

 out   out    S (∆d(0)) S (∆d(0)) (0) ∆ R ∆ R ∆k0  out   out   (1)   (1)   (1)   ∆ SR(∆d )   ∆ SR(∆d )  −1  ∆k0  out ⊕ out = MC ·  .  (2)   (2)  1  (2)   ∆ SR(∆d )   ∆ SR(∆d )  ∆k0 out out (3) (3) (3) ∆k ∆ SR(∆d ) ∆ SR(∆d ) 0 We can check these four byte equations in a negligible time complexity as 228·228 24 we only have a few XORs. The number of solutions will be 232 = 2 −1 0 25 pairs of (∆R2 , ∆R2) which has memory requirements of 2 words. We 24 −2 −2 −2 −1 −1 have 2 sequences (∆R1 = ∆d, ∆R2 = ∆R3 = 0, ∆R1 = ∆d, ∆R2 , −1 0 0 ∆R3 = 0, ∆R1, ∆R2). Again, we go through the reduction step as explained on page 57 to reduce the number of sequences and to get the individual values for the FSM. −2 −1 We start with the input-output³ diﬀerence ∆R1 = ∆d → ´S1 → ∆R2 . The P10 24 i −4 i−1 1 27.1 time complexity of this step is i=1 2 · 8.255 · (2 ) · 2 ≈ 2 clocks. At the end we have 224 · 8.25510 · (2−4)10 ≈ 214.5 diﬀerence sequences left. The memory requirements will not exceed 226 words. Now we perform the insertion step as explained on page 58, which would require a time complexity of 214.5 · 210 ≈ 224.5 clocks. Results: The overall time complexity in SNOW 3G⊕ clocks is

227.1 + 224.5 ≈ 227.3 .

The memory requirements are 226 words and the keystream is of length 12 consecutive words for each IV. 5.2. Known and Chosen IV Attacks on Versions of SNOW 3G⊕ 61

Restriction: If we restrict the known arbitrary diﬀerence ∆d to a word with three bytes equal to zero and only one non-zero byte, we can reduce our attack complexity considerably. After the starting step we only have −1 0 one pair (∆R2 , ∆R2) of diﬀerences left, just as in the attack on 12 rounds explained on page 59. This way we will have the same time complexity of 210.1 clocks and the memory requirements are small. The keystream will be of 12 consecutive words for each IV.

Reduced Initialization of 14 Rounds. After 14 initialization rounds, we have one unknown difference in the LFSR −2 since the difference ∆R2 was fed back into the LFSR in round 13. All other differences, which were fed back, are either zero or the known ∆d. −3 32 We guess the individual value R1,a for the first pair (K,IVa) which has 2 −3 −3 −3 possibilities. From the value R1,a we get with ∆R1 = ∆d the value R1,b for −2 −2 the second pair (K,IVb). Furthermore, we obtain the two pairs (R2,a,R2,b ) −1 −1 −2 and (R3,a,R3,b ). We denote the now known difference ∆R2 with ∆k0, the −1 −1 linear dependent ∆R1 with ∆k1 and ∆R3 with ∆k2. This way we can compute all differences of the LFSR because we know all differences coming from the FSM. We have the following knowledge of differences for the FSM:

t t t round clock t ∆R1 ∆R2 ∆R3 12 −3 ∆d 0 0 13 −2 ∆d ∆k0 0 14 −1 ∆k1 ∆k2 0

Starting step: From the ﬁrst keystream equation and the update equa- 0 0 tions for ∆R1 and ∆R2 we receive

−1 0 0 0 −1 0 ∆R2 ⊕ ∆R2 = ∆z ⊕ ∆s15 ⊕ ∆k2 ⊕ ∆s5 ⊕ ∆s0 , which gives, with the notation of Section 5.2.1 and the known right part denoted as ∆k3, the equation

out out ∆ S1(∆d)⊕ ∆ S1(∆k1) = ∆k3 .

−1 We multiply the equation with MC1 and rewrite it in byte notation as

 out   out    S (∆d(0)) S (∆k(0)) (0) ∆ R ∆ R 1 ∆k3  out   out   (1)   (1)   (1)   ∆ SR(∆d )   ∆ SR(∆k1 )  −1  ∆k3  out ⊕ out = MC ·  .  (2)   (2)  1  ∆k(2)   ∆ SR(∆d )   ∆ SR(∆k1 )  3 out out (3) (3) (3) ∆k ∆ SR(∆d ) ∆ SR(∆k1 ) 3 62 Chapter 5. Analysis of SNOW Resynchronization Mechanism

Then we check this equation line by line for each byte in a negligible time 228·228 24 −1 0 complexity. The number of solutions will be 232 = 2 pairs (∆R2 , ∆R2) which need 225 words of memory. As expected we start the reduction step with the input-output diﬀe- −2 −1 24 rence ∆R1 → S1 → ∆R2 . Since we start with 2 sequences, we have exactly the same procedure as in the attack on 13 rounds on page 59 and thus the same complexities. The same applies to the insertion step. Results: The overall time complexity is the same as in the 13 round −3 case for each guess of R1,a, which gives ¡ ¢ 232 · 227.1 + 224.5 ≈ 259.3

SNOW 3G⊕ clocks. The memory requirements are 226 words and the keystream is of length 12 words for each IV.

Restriction: If we restrict the known difference ∆d to one non-zero byte only, we can reduce our attack complexity. We would start with guessing −3 32 the individual value R1,a for the first pair (K,IVa) with 2 possibilities, as explained above, and yield the same known differences of the FSM. Then we proceed with the starting step as described. Due to the re- 27·228 3 −1 0 striction only 232 = 2 pairs (∆R2 , ∆R2) satisfy the equation, which require 24 words of memory. During the reduction step the amount of sequences decreases to one. −2 −1 We start with the input-output difference ∆R1 → S1 → ∆R2 and get the individual values for 10 clocks in total. The time complexity is 26.5 clocks and the memory will not exceed 26 words. The insertion step is the same as in the 12 round case explained on page 59 with a time complexity of 210 clocks. Results: The total time complexity for this attack with the restriction is ¡ ¢ 232 · 26.5 + 210 ≈ 242.1 SNOW 3G⊕ clocks. The memory requirements are 26 words and the keystream is of length 12 words for each IV.

Reduced Initialization of 15 Rounds and 16 Rounds. −3 −2 In the 15 round case two unknown differences ∆R2 and ∆R2 were fed back −4 −3 into the LFSR. We guess the individual values of R1,a and R1,a for the first 64 −4 −4 pair (K,IVa) with complexity of 2 . From the value R1,a and ∆R1 = ∆d −4 −3 −3 −2 −2 we get the value of R1,b and following the pairs (R2,a,R2,b ) and (R3,a,R3,b ). −3 −2 −2 Denote the known differences ∆R2 by ∆k0, ∆R1 by ∆k1 and ∆R3 by −3 −3 −3 ∆k2. From R1,a and ∆R1 = ∆d we get the value of R1,b and following −2 −2 −1 −1 the pairs (R2,a,R2,b ) and (R3,a,R3,b ). Again, we denote the now known 5.3. Known and Chosen IV Attacks on Versions of SNOW 2.0⊕ 63

−2 −1 −1 diﬀerences ∆R2 by ∆k3, ∆R1 by ∆k4 and ∆R3 by ∆k5. This gives the following diﬀerences for the FSM:

t t t round clock t ∆R1 ∆R2 ∆R3 12 −4 ∆d 0 0 13 −3 ∆d ∆k0 0 14 −2 ∆k1 ∆k3 ∆k2 15 −1 ∆k4 ∆k5 0

We now have the same starting point as in the attack on 14 initialization rounds described on page 61. For all three steps we proceed in the same way as explained there. Results: Since we guessed one more word in the beginning of the attack the time complexity becomes

232 · 259.3 ≈ 291.3

SNOW 3G⊕ clocks. The memory and keystream requirements remain. out A restriction of the ∆d is no longer useful because the value ∆ S1(∆d) is not computed anymore.

In the 16 round case we guess one more word and then proceed as in the attack on 15 rounds. The time complexity in SNOW 3G⊕ clocks is

232 · 291.3 ≈ 2123.3 and the memory and keystream requirements remain.

The summary of our results is given in Table 5.1 with time in SNOW 3G⊕ clocks and keystream and memory in words.

5.3 Known and Chosen IV Attacks on Versions of SNOW 2.0⊕

In this section we will show a known IV attack on SNOW 2.0⊕ without the FSM to LFSR feedback during the initialization phase. Afterwards, we restore the feedback and present a chosen IV attack on SNOW 2.0⊕ with a reduced number of initialization rounds. Both attacks are similar to the attacks on SNOW 3G⊕ described in Section 5.2. In both attacks on SNOW 2.0⊕ the attacker has access to two keystreams corresponding to (K,IVa) and (K,IVb), where IVa and IVb are either arbitrary known IVs or IVs chosen by the attacker. The goal is to recover the internal state and thus the secret key from the known or chosen diﬀerences. 64 Chapter 5. Analysis of SNOW Resynchronization Mechanism

target keystream time memory SNOW 3G⊕ without feedback 233 256.1 225 SNOW 3G⊕ with feedback 12 initialization rounds 24 210.1 small 13 initialization rounds 24 227.3 226 with restriction of ∆d to 1 byte 24 210.1 small 14 initialization rounds 24 259.3 226 with restriction of ∆d to 1 byte 24 242.1 26 15 initialization rounds 24 291.3 226 16 initialization rounds 24 2123.3 226

Table 5.1: Summary of our known and chosen IV attacks on SNOW 3G⊕

For the propagation of the differences through the S-box we refer to Sec- tion 5.2.1 considering only the first S-box there. All attacks have a similar structure, they begin with a starting step, which is different for each attack, followed by a reduction step and then finish with an insertion step. In the starting step we either know or guess the first difference occurring in the FSM and deduce the potential differences for the FSM words using the next keystream equation. In the reduction step we reduce the amount of possible differences to one using the keystream and update equations for the next clocks always performing the same check. In the insertion step we compute the individual values for the differences in the FSM words, insert them into one system of keystream equations and check for the right insertion.

5.3.1 Known IV Attack without FSM to LFSR Feedback In this section we consider a known IV attack on SNOW 2.0⊕ without the FSM to LFSR feedback, in which the attacker has access to two keystreams corresponding to (K,IVa) and (K,IVb), where IVa and IVb are arbitrary known IVs. This attack works for any number of initialization rounds. During the keystream generation we have the following equations for the diﬀerences at clock t:

t t t t t ∆z = ∆s15 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆s0 t t−1 t−1 ∆R1 = ∆R2 ⊕ ∆s5 out t t−1 ∆R2 = ∆ S1(∆R1 ) .

The diﬀerences in the LFSR depend on the known diﬀerence of the IVs and are completely predictable due to the linearity. 5.3. Known and Chosen IV Attacks on Versions of SNOW 2.0⊕ 65

t The main procedures of our attack are: Guess ∆R1 at time t. Deduce potential differences in the FSM words at different times from the linear evolution of the difference in the LFSR and the keystream difference equations. Knowing the input-output difference for the S-box, the few possibilities for the individual values of the FSM words are deduced. Combine the knowledge of the FSM state with that of the keystream to get linear equations on the LFSR state. Collect enough equations to get a solvable linear system which will recover the state of the LFSR. By the invertibility of the cipher run it backwards to find the 128-bit secret key K.

1 Starting step: We guess at an arbitrary clock t = 1 the value of ∆R1 1 which gives us directly ∆R2 via the keystream equation of clock 1. Fur- 2 1 thermore, we get ∆R1 via the update equation of ∆R2 and with the next out 2 2 1 keystream equation ∆R2. On the other hand, we know ∆R2 =∆ S1(∆R1) 1 as explained in Section 5.2.1. Thus, we can check our guess for ∆R1 immediately by combining the keystream equations of clock 1 and 2 with the 2 2 update equations for ∆R1 and ∆R2 yielding

out 1 1 2 2 1 1 1 1 2 ∆ S1(∆R1) ⊕ ∆R1 = ∆z ⊕ ∆s15 ⊕ ∆z ⊕ ∆s15 ⊕ ∆s0 ⊕ ∆s5 ⊕ ∆s0 .

We know the right part of this equation completely and denote it with ∆k0. 1 (0) (1) (2) (3) −1 Let ∆R1 = v kv kv kv . We multiply the whole equation with MC1 and receive it in byte form

 out    S (v(0))  (0)  (0) ∆ R v ∆k0  out   (1)   (1)   (1)   ∆ SR(v )  −1  v  −1  ∆k0  out = MC · ⊕ MC ·  .  (2)  1  v(2)  1  (2)   ∆ SR(v )  ∆k0 out (3) (3) (3) v ∆k ∆ SR(v ) 0

out The values ∆ SR for the small S-box can be checked bytewise. We have 32 1 to insert all 2 possibilities for ∆R1 which results in a time complexity of out 32 1 28 2 1 2 clocks. For each ∆R1 we have 2 possibilities for ∆R2 =∆ S1(∆R1) 232·228 28 1 2 and therefore 232 = 2 pairs (∆R1, ∆R2) remain after the check. The memory requirements to store these pairs are 229 words. With the linear de- 28 1 1 2 2 pendencies of the diﬀerences we have 2 sequences (∆R1, ∆R2, ∆R1, ∆R2). Reduction step: For clock 3 we take the keystream equation and the update equations for the FSM and obtain

2 3 3 3 2 3 ∆R2 ⊕ ∆R2 = ∆z ⊕ ∆s15 ⊕ ∆s5 ⊕ ∆s0 .

Denoting the known right part with ∆k1 and using the notation of Sec- tion 5.2.1 we get 2 out 2 ∆R2⊕ ∆ S1(∆R1) = ∆k1 . 66 Chapter 5. Analysis of SNOW Resynchronization Mechanism

2 (0) (1) (2) (3) 2 (0) (1) (2) (3) Let ∆R1 = w kw kw kw and ∆R2 = x kx kx kx . We can rewrite this equation into byte form

 out    S (w(0))  (0)  (0) ∆ R x ∆k1  out   (1)   (1)   (1)   ∆ SR(w )  −1  x  −1  ∆k1  out = MC · ⊕ MC ·  .  (2)  1  x(2)  1  (2)   ∆ SR(w )  ∆k1 out (3) (3) (3) x ∆k ∆ SR(w ) 1

28 2 2 We have to insert all 2 possibilities for (∆R1, ∆R2), which requires a time out 28 complexity of 2 clocks, and can check the values for ∆ SR as explained 228·228 24 2 2 3 before. The amount of 232 = 2 triples (∆R1, ∆R2, ∆R2) remains after the check. With the linear dependencies of the differences we have 224 se- 1 1 2 2 3 3 quences (∆R1, ∆R2, ∆R1, ∆R2, ∆R1, ∆R2). We can repeat this check until we only have one sequence left. Since every check will reduce the number of possibilities as well as the time complexity for the next check by 2−4, we will need six more clocks. The time complexity for all seven checks together is P6 28 −4i 28.1 i i i=0 2 · 2 ≈ 2 . At the end, we only have one sequence (∆R1, ∆R2) 10 10 left with i = 1,..., 9. For this sequence we need to compute (∆R1 , ∆R2 ). Insertion step: Now we can switch from the differences to the individual i i+1 i values for (R1,R2 ), i = 1,..., 9. Each fixed input-output difference ∆R1 → i+1 i i S1 → ∆R2 has about 16.51 pairs (R1,a,R1,b) with i = 1,..., 9 as explained in Section 5.2.1. We need the system of equations for one keystream only, and insert always the first value of each pair. The clocks 1 to 9 are sufficient to get 16 consecutive values for the LFSR. Altogether, we get 16.519 ≈ 236.4 possible insertions. Thus, we use 22 consecutive keystream equations starting with clock 10 to verify which of the 236.4 possible insertions was correct. Consequently, this insertion step has a time complexity of 236.4 clocks. Results: In summary, our attack will have a time complexity of

232 + 232.1 + 236.4 ≈ 236.5

SNOW 2.0⊕ clocks. We need 229 words of memory and 9 + 22 = 31 consecutive words for both keystreams.

5.3.2 Differential Chosen IV Attacks Now we look at SNOW 2.0⊕ with the FSM to LFSR feedback during the initialization but reduced number of initialization rounds. We show attacks on 12 to 18 initialization rounds. All attacks are similar having the starting step, which is slightly different for each number of initialization rounds, followed by the reduction step and then finish with the insertion step. We consider a differential chosen IV attack scenario. Assume that we have two 128-bit IVs differing only in the most significant word IV0, which gives the difference in s15 of the LFSR. We denote the difference in the IV word 5.3. Known and Chosen IV Attacks on Versions of SNOW 2.0⊕ 67

IV0 by ∆d. As we will see below we can restrict this difference to a single byte of ∆d in order to reduce the complexity of our attacks. Until round 11 the difference will not affect the FSM. After round 11 the known ∆d enters the FSM word R1. In this section we denote the first displayed keystream word with z0 and increase the clock for the following keystream words. The previous initialization rounds will have negative clocks. As described in Sec- tion 5.1.2 the first computed keystream word is discarded. Hence, clock −1 will have no feedback from the FSM to the LFSR and no keystream word.

Reduced Initialization of 12 Rounds.

The differences in the LFSR values are known because so far all differences in the FSM are zero and therefore no unknown difference was fed back into the LFSR. 0 In the starting step we know ∆R1 = ∆d for the first keystream equa- 0 t t tion. Therefore, we know ∆R2 and consequentially ∆R1 and ∆R2 for all clocks. We do not need the reduction step because we already have only one se- 10 10 quence of differences. We simply compute all differences up to (∆R1 , ∆R2 ) via the update and keystream equations. Then we perform the insertion step of the individual values for R1 and R2 as explained in the attack in Section 5.3.1. Results: Our attack has a time complexity of 236.4 SNOW 2.0⊕ clocks, memory requirements are small and we need 31 consecutive words for both keystreams.

Reduced Initialization of 13 Rounds.

We extend the attack above by one more round. All differences in the FSM are zero or the known ∆d until now. Therefore, no unknown difference was fed back into the LFSR and all differences of the LFSR values are known. Our knowledge of the differences in the FSM is shown in the following table:

t t round clock t ∆R1 ∆R2 12 −2 ∆d 0 13 −1 ∆d 0

Starting step: From the ﬁrst keystream equation and the update equa- 0 0 tions for ∆R1 and ∆R2 with notation of Section 5.2.1 we get the equation

out out 0 0 −1 0 ∆ S1(∆d) ⊕ ∆ S1(∆d) = ∆z ⊕ ∆s15 ⊕ ∆s5 ⊕ ∆s0 . 68 Chapter 5. Analysis of SNOW Resynchronization Mechanism

We denote the known part at the right side of this equation as ∆k0, multiply −1 the whole equation with MC1 and gain the byte form equation

We can check these four byte equations in a negligible time complexity as we 228·228 24 only have a few XORs. The number of solutions will be 232 = 2 pairs −1 0 25 (∆R2 , ∆R2) requiring 2 words of memory. Reduction step: We start the check with the keystream and update 24 0 0 equations for clock 1 and have to insert all 2 pairs (∆R1, ∆R2) similarly to the reduction step in Section 5.3.1. Then we proceed for the next five clocks i i until we only have one sequence (∆R1, ∆R2) left with i = −1,..., 6. For 8 8 this sequence we compute the differences up to (∆R1, ∆R2) via the update and keystream equations. The time complexity for all six checks together is P5 24 −4i 24.1 i=0 2 · 2 ≈ 2 . Insertion step: Our starting point for the computation of the individual −1 0 values for the FSM is the input-output difference ∆R1 → S1 → ∆R2. Then we proceed as explained in the insertion step in Section 5.3.1. Results: Altogether, the time complexity in SNOW 2.0⊕ clocks is

224.1 + 236.4 ≈ 236.4 .

The memory requirements are 225 words and we need 31 consecutive words for both keystreams.

Restriction: If we would restrict the known diﬀerence ∆d to a word which has three bytes equal to zero and only one non-zero byte, we would get only one pair after the starting step. Thus, we have no reduction step and the same insertion step as in the attack on 12 rounds described on page 67. There would be no inﬂuence on the time complexity and the needed keystream because the insertion step is the expensive one, but we could reduce the memory requirements to a small amount.

Reduced Initialization of 14 Rounds.

After 14 initialization rounds we have one unknown difference in the LFSR. All other differences in the LFSR values are either zero or the known ∆d. We out −2 28 guess this unknown difference ∆R2 =∆ S1(∆d) which has 2 possibilities. 5.3. Known and Chosen IV Attacks on Versions of SNOW 2.0⊕ 69

−2 −1 We denote the now known ∆R2 with ∆k0 and the linear dependent ∆R1 with ∆k1. This gives the following diﬀerences for the FSM: t t round clock t ∆R1 ∆R2 12 −3 ∆d 0 13 −2 ∆d k0 14 −1 k1 0 With this knowledge of the FSM diﬀerences we are at the same starting point as in the attack on 13 rounds on page 67. Results: Thus, we would have the same memory and keystream requirements and the time complexity is 228 · 236.4 ≈ 264.4 SNOW 2.0⊕ clocks due to the extra guess.

Consideration with more memory: If we consider the 228 extra possibil- 52 −1 0 ities immediately in the starting step we would get 2 pairs (∆R2 , ∆R2) which would need 253 words of memory. During the reduction step we decrease this amount to one using 13 checks until clock 14. Afterwards, we perform the insertion step with this single sequence. Then the time complexity for the starting step is 237, for the reduction step it is 252.1, and for the insertion step it remains. This concludes to a total time complexity of 252.1 clocks. The memory requirements are 253 words and the needed keystream remains.

Restriction: With the restriction of ∆d to a word with only one non- 3 −1 0 zero byte we would have 2 pairs (∆R2 , ∆R2) after the starting step. Thus, the time complexity for the starting step and the reduction step is negligible, and for the insertion step it remains. This concludes to a total time complexity of 236.4 SNOW 2.0⊕ clocks. The memory requirements are small and the needed keystream remains.

Reduced Initialization up to 18 Rounds. We can pursue these attacks for the next four rounds until initialization round 18. Prior to the starting step we would have to guess all unknown diﬀerences which were fed back into the LFSR. Then we are always at the same starting point as described in the 14 round case before. The restriction of ∆d to a word with only one non-zero byte reduces the complexities in all these cases. The consideration with more memory, as explained in the 14 round case before, is applicable with or without the restriction for all cases. 70 Chapter 5. Analysis of SNOW Resynchronization Mechanism

The summary of our results is given in Table 5.2 with time in SNOW 2.0⊕ clocks and keystream and memory in words. If one of the requirements exceeds 2128 the attempt is not considered as a promising attack anymore.

target keystream time memory SNOW 2.0⊕ without feedback 62 236.5 230 SNOW 2.0⊕ with feedback 12 initialization rounds 62 236.4 small 13 initialization rounds 62 236.4 225 with restriction of ∆d to 1 byte 62 236.4 small 14 initialization rounds 62 264.4 225 with more memory 62 252.1 253 with restriction of ∆d to 1 byte 62 236.4 small 15 initialization rounds 62 292.4 225 with more memory 62 280.1 281 with restriction of ∆d to 1 byte 62 250.4 225 with restriction and more memory 62 238.4 239 16 initialization rounds with more memory 62 2108.1 2109 with restriction of ∆d to 1 byte 62 278.4 225 with restriction and more memory 62 266 267 17 initialization rounds with restriction of ∆d to 1 byte 62 2106.4 225 with restriction and more memory 62 294 295 18 initialization rounds with restriction and more memory 62 2122 2123

Table 5.2: Summary of our known and chosen IV attacks on SNOW 2.0⊕

5.4 Summary

In the ﬁrst half of this chapter we have shown known IV and chosen IV resynchronization attacks on SNOW 3G⊕. We can attack any amount of initialization rounds of SNOW 3G⊕ if there is no feedback from FSM to LFSR. With such feedback we show key-recovery attacks up to 16 rounds of initialization and use only a few keystream words. Our results indicate that about half of the initialization rounds of SNOW 3G⊕ might succumb to chosen IV resynchronization attacks. However, the remaining security margin is quite signiﬁcant and thus these attacks pose no threat to the security of SNOW 3G itself. In the second half we have shown known IV and chosen IV resynchronization attacks on SNOW 2.0⊕. Likewise, we can attack 5.4. Summary 71 any amount of initialization rounds of SNOW 2.0⊕ if there is no feedback from FSM to LFSR. With such feedback we are able to mount key-recovery attacks up to 18 rounds of initialization using only a few keystream words. This shows that more than half of the initialization rounds of SNOW 2.0⊕ succumb to chosen IV resynchronization attacks. However, the remaining security margin is big enough and thus these attacks pose no threat to the security of SNOW 2.0 itself. 72 Chapter 5. Analysis of SNOW Resynchronization Mechanism Chapter 6

Attacks on Simpliﬁed Versions of K2

The stream cipher K2 was proposed by Kiyomoto, Tanaka and Sakurai at SECRYPT 2007 [KTS07]. Currently it is submitted to the International Organization for Standardization (ISO) for standardization. A security evaluation is given in [BPR11] with the conclusion that no weaknesses were found. The attack idea of [BHNS10] on SNOW 3G is applicable to K2 as well. Some side-channel attacks are applied on K2 in [HYY+10] showing that K2 offers reasonable resistance to side-channel attacks even without countermeasures. In this chapter we present two attacks on simplified versions of K2. We show a differential chosen IV attack with key-recovery on simplified versions with five, six and seven initialization clocks with time complexity of 210.05, 218.05 and 247.05 clocks. The needed keystream amounts to 28 words for the attacks on five and six initialization clocks and 24 words for the attack on seven initialization clocks. For a simplified version with seven initialization clocks we show a distinguishing attack with a time complexity of 227.6 clocks, needed keystream of 232 words and negligible memory requirements. This chapter is organized as follows. We give a description of the cipher K2 and its simplification K2⊕ in Section 6.1. The differential chosen IV attack with key-recovery on K2⊕ with five, six and seven initialization clocks is presented in Section 6.2. In Section 6.3 we describe the distinguishing attack on K2⊕ with seven initialization clocks. Some conclusions are given in Section 6.4.

6.1 Description of K2 and K2⊕

Kiyomoto, Tanaka and Sakurai proposed the stream cipher K2 at SECRYPT 2007 [KTS07]. It consists of two Feedback Shift Registers (FSR), a Dy- namic Feedback Controller (DFC), which dynamically chooses between four

73 74 Chapter 6. Attacks on Simpliﬁed Versions of K2 diﬀerent feedback functions, and a Non-Linear Function (NLF) as shown in Figure 6.1.

α0 FSR-A 0 1 2 3 4

DFC

1 α1 or or α3 α2 FSR-B 10 9 8 7 6 5 4 3 2 1 0

L2 R2

Sub Sub Sub Sub

L1 R1

NLF

zH zL

Figure 6.1: Keystream generation of K2

The ﬁrst FSR, called FSR−A, has ﬁve words (a4, . . . , a0) each of size 32 bit. The feedback function is

t t−1 t−1 a4 = α0a0 ⊕ a3 where the multiplier α0 is a constant chosen as the root of an irreducible polynomial of fourth degree in GF (28)[x]. The second FSR, called FSR−B, has eleven words (b10, . . . , b0) each of size 32 bit. The feedback function of FSR−B is selected by the DFC. This controller has two clock control bits cl1 and cl2 which are described as

t t cl1t = a2[30] and cl2t = a2[31] 6.1. Description of K2 and K2⊕ 75

t t t with a2[30] being the second most signiﬁcant bit (smsb) of a2 and a2[31] t being the most signiﬁcant bit (msb) of a2. Then the feedback function of the FSR−B is

t cl1t−1 1−cl1t−1 t−1 t−1 t−1 cl2t−1 t−1 b10 = (α1 + α2 − 1)b0 ⊕ b1 ⊕ b6 ⊕ α3 b8 where the multipliers α(1,2,3) are constants each one chosen as the root of a different irreducible polynomial of fourth degree in GF (28)[x]. The NLF has four words memory (L1,L2,R1,R2) and four times a Sub function. This Sub function operates on a word and uses the 8-bit AES S-box [DR02] and the AES MixColumns operation over GF (28) defined by x8 +x4 +x3 +x+1. The exact work flow for a word w = w(0)kw(1)kw(2)kw(3) with w(0) being the most and w(3) being the least significant bytes is h iT Sub(w) = MC · S(w(0)),S(w(1)),S(w(2)),S(w(3)) with S denoting the AES S-box and MC the MixColumns operation. With H L each clock the output of the NLF is the keystream of two words (zt , zt ) computed as H t t t t zt = (b10 ¢ L2 ) ⊕ L1 ⊕ a0 L t t t t zt = (b0 ¢ R2 ) ⊕ R1 ⊕ a4 . The symbol ’⊕’ denotes the bitwise XOR and the symbol ’¢’ denotes the addition modulo 232. To update the memory words of the NLF compute t t−1 t−1 t t−1 L1 = Sub(R2 ¢ b4 ) L2 = Sub(L1 ) t t−1 t−1 t t−1 R1 = Sub(L2 ¢ b9 ) R2 = Sub(R1 ) .

The K2 cipher uses a four word key K = [K0,K1,K2,K3] and a four word IV = [IV0,IV1,IV2,IV3]. For the loading step two intermediate results are computed using the Sub function

S1 = Sub[(K3 ¿ 8) ⊕ (K3 À 24)] ⊕ 0x01000000

S2 = Sub[((K0 ⊕ K1 ⊕ K2 ⊕ K3 ⊕ S1) ¿ 8) ⊕((K0 ⊕ K1 ⊕ K2 ⊕ K3 ⊕ S1) À 24)] ⊕ 0x02000000 , the constants at the end are given in hexadecimal numbers. Afterwards, the FSRs are loaded

a0 = K0 ⊕ S1 b3 = IV1 a1 = K3 b4 = K0 ⊕ S1 ⊕ S2 a2 = K2 b5 = K1 ⊕ S2 a3 = K1 b6 = IV2 a4 = K0 b7 = IV3 b0 = K0 ⊕ K2 ⊕ S1 ⊕ S2 b8 = K0 ⊕ K1 ⊕ K2 ⊕ K3 ⊕ S1 b1 = K1 ⊕ K3 ⊕ S2 b9 = K0 ⊕ K1 ⊕ S1 b2 = IV0 b10 = K0 ⊕ K1 ⊕ K2 ⊕ S1 . The four memory words of the NLF are initialized with zero. Then, during the initialization the cipher is clocked 24 times doing 76 Chapter 6. Attacks on Simpliﬁed Versions of K2

H L 1. get the output from the NLF (zt , zt ), 2. update the NLF, H 3. update FSR−B with zt is XORed to the new word of FSR−B, L 4. update FSR−A with zt is XORed to the new word of FSR−A. After this initialization the K2 cipher produces the keystream and the FSRs are updated without feedback from the NLF. For the rest of this chapter we consider a simpliﬁed version of K2 where all additions modulo 232 are replaced with XOR and denote this version with K2⊕. We measure the time complexity for our attacks in K2⊕ clocks.

6.2 Diﬀerential Chosen IV Attack with Key-Recovery

Considering a differential chosen IV scenario, we choose two different IVs IVa and IVb. We know the keystream from both pairs (K,IVa) and (K,IVb) with unknown key K. Both IVs only differ in IV word IV1 which takes the longest until it enters the NLF. We denote this chosen difference with ∆d. The DFC always takes the msb and smsb of a2. Thus, we do not want to have a difference there. Accordingly, we choose the starting difference ∆d with msb and smsb equal to zero. Our goal is to recover the whole internal state right after the loading step which means we get the unknown key K. H L We denote the first displayed keystream words with z0 and z0 and increase the clock for the following keystream words. The previous initialization clocks will have negative clocks.

6.2.1 Difference Propagation through the S-boxes From the differences in the keystream and the partially known differences in the FSRs we compute the differences in the words of the NLF. We then need to know how the differences in the NLF words propagate through the Sub function. Let va, vb, wa, wb be arbitrary words at different times t with the following equations:

∆v = va ⊕ vb wa = Sub(va) wb = Sub(vb)

∆w = wa ⊕ wb = Sub(va) ⊕ Sub(vb) . We deﬁne a new notation

out def. ∆ Sub(∆v) = Sub(va) ⊕ Sub(vb) . In the same way we deﬁne a new notation for the AES S-box S. Let xa, xb, ya, yb be arbitrary bytes at diﬀerent times t with the following equations: ∆x = xa ⊕ xb ya = S(xa) yb = S(xb)

∆y = ya ⊕ yb = S(xa) ⊕ S(xb) . 6.2. Diﬀerential Chosen IV Attack with Key-Recovery 77

Then the new notation is

out def. ∆ S(∆x) = S(xa) ⊕ S(xb) .

During the keystream generation we have the following equations for the diﬀerences at clock t:

H t t t t ∆zt = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 L t t t t ∆zt = ∆b0 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆a4

out out t t−1 t−1 t t−1 t−1 ∆L1 = ∆ Sub (∆R2 ⊕ ∆b4 ) ∆R1 = ∆ Sub (∆L2 ⊕ ∆b9 ) out out ∆L2t = ∆ Sub (∆L1t−1) ∆R2t = ∆ Sub (∆R1t−1) .

We deﬁne two notations for the XORed sums in the Sub functions at time t − 1

t−1 def. t−1 t−1 t−1 def. t−1 t−1 ∆Rb = (∆R2 ⊕ ∆b4 ) and ∆Lb = (∆L2 ⊕ ∆b9 ) .

Any fixed input difference ∆Rbt−1 6= 0 results in nearly 228 possible output differences for ∆L1t, because any input difference in the small 8-bit AES S-box S results in 127 possible output differences. If we know or fix the input-output difference of Sub meaning ∆Rbt−1 → Sub → ∆L1t, we 126 1 4 can recover on average (2 · 127 + 4 · 127 ) ≈ 16.51 pairs of individual values t−1 t−1 (Rba , Rbb ) for Sub. Due to the difference propagation in the small S- 126 1 box S, we have 2 · 127 + 4 · 127 ≈ 2.02 pairs on byte level. From the pair t−1 t−1 t t t+1 t+1 (Rba , Rbb ) we obtain (L1a,L1b) and (L2a ,L2b ), too, after applying 16.51 the Sub function once and twice. This means that we have 2 ≈ 8.255 possible values for ∆L2t+1. If we know or fix this difference as well, we have only two pairs of individual values left which satisfy the sequence ∆Rbt−1 → t t+1 t−1 t−1 t−1 t−1 Sub → ∆L1 → Sub → ∆L2 , namely (Rba , Rbb ) and (Rbb , Rba ). For a fixed input-output difference ∆Lbt−1 → Sub → ∆R1t we can t−1 t−1 recover on average 16.51 pairs of individual values (Lba , Lbb ) for Sub in the same way as described above resulting in 8.255 possible values for ∆R2t+1. Similarly, with this difference known or fixed we have only two pairs of individual values remaining which satisfy the sequence ∆Lbt−1 → t t+1 t−1 t−1 t−1 t−1 Sub → ∆R1 → Sub → ∆R2 , namely (Lba , Lbb ) and (Lbb , Lba ). If we have collected enough individual values for the NLF, we can derive t t−1 t−1 some words for FSR−B from the update equations L1 = Sub(R2 ⊕b4 ) t t−1 t−1 and R1 = Sub(L2 ⊕ b9 ) of the NLF. The insertion of the individual values of the NLF together with some words of FSR−B in the keystream equations yields some words for FSR−A. At the end, we know enough words of the NLF, FSR−B and FSR−A to clock backwards and reveal the secret key. 78 Chapter 6. Attacks on Simplified Versions of K2

6.2.2 Reduced Initialization of 5 Clocks

We reduce the number of initialization clocks of K2⊕ to five. In the 5th initialization clock the starting difference ∆d enters the word R1 of the NLF. During the keystream computation there is no more feedback from the NLF to the FSRs anymore. Therefore, we know all differences of FSR−A as ∆d enters it in clock 4 and then propagates linearly. For FSR−B the computation of the differences only depends on the unknown bits of the DFC which select the multipliers for the FSR−B feedback. The propagation of the chosen difference ∆d during the five initialization clocks is shown in Table 6.1. We omit the zero differences and write a question mark for unknown differences. The work flow of K2 has two steps, first computing the keystream words, then updating the internal state. The first row in Table 6.1 only shows the starting state. Then each row shows the updated state of K2⊕. Thus, the first keystream words are computed from the values in the last row of this table.

clock FSR-B FSR-A NLF ini 10 9 8 7 6 5 4 3 2 1 0 4 3 2 1 0 L1 L2 R1 R2 d 1 d 2 d 3 d d 4 ? d d 5 ?? d d d ?

Table 6.1: Propagation of the diﬀerence d through K2⊕ during the ﬁve initialization clocks

The diﬀerences of the keystream words for clock 0 are

H 0 0 0 0 ∆z0 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 L 0 0 0 0 ∆z0 = ∆b0 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆a4 ,

0 0 0 0 0 where the differences in ∆L1 , ∆L2 , ∆a0, ∆b0, ∆R2 are all equal to zero 0 0 0 and ∆a4 = ∆d. Hence, we know ∆b10 and ∆R1 . The update equation of 0 −1 0 0 1 FSR−B implies ∆b10 = ∆b10 = ∆b9 and ∆b10 = ∆b9. Furthermore, we −1 t−1 t−1 0 know ∆Lb = ∆L2 ⊕ ∆b9 = 0 ⊕ ∆d and ∆R1 which fixes the input- output difference ∆Lb−1 → Sub → ∆R10. As explained in Section 6.2.1 we can recover on average 16.51 pairs of individual values for Sub resulting in 8.255 possible values for ∆R21. 6.2. Differential Chosen IV Attack with Key-Recovery 79

H 1 2 Clock 1 gives us ∆z1 = ∆b10 = ∆b9 because all other diﬀerences are L zero. For ∆z1 we can rewrite the keystream equation in the following way:

L 1 1 1 1 ∆z1 = ∆b0 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆a4 1 1 1 1 L ⇔ ∆R1 = ∆b0 ⊕ ∆R2 ⊕ ∆a4 ⊕ ∆z1 out 0 0 1 L ⇔ ∆ Sub(∆L2 ⊕ ∆b9) = 0 ⊕ ∆R2 ⊕ ∆d ⊕ ∆z1 out 0 1 ⇔ ∆ Sub(0 ⊕ ∆b9) = ∆R2 ⊕ ∆k0 ,

L so that we know all diﬀerences at the right side with ∆k0 = ∆d ⊕ ∆z1 . Let 0 (0) (1) (2) (3) 1 (0) (1) (2) (3) ∆b9 = v kv kv kv and ∆R2 = w kw kw kw . We can undo the MixColumns and yield the equation in byte form

 out    S(v(0))  (0)  (0) ∆ w ∆k0  out   (1)   (1)   (1)   ∆ S(v )  −1  w  −1  ∆k0  out = MC · ⊕ MC ·  .  (2)  1  w(2)  1  (2)   ∆ S(v )  ∆k0 out w(3) (3) ∆ S(v(3)) ∆k0

Then we insert all 8.255 possibilities of ∆R21 and check byte by byte whether the computed value on the right side is a valid difference for the left side. The time complexity for this check is two clocks and only one pair (∆R11, ∆R21) 228·8.255 1 will remain due to 232 < 1. The known difference ∆R2 fixes the difference sequence ∆Lb−1 → Sub → ∆R10 → Sub → ∆R21 leaving only two pairs of individual values satisfying it. Furthermore, we know 0 0 0 1 ∆Lb = ∆L2 ⊕ ∆b9 = 0 ⊕ ∆d and ∆R1 which fixes the input-output difference ∆Lb0 → Sub → ∆R11. Thus, we can recover nearly 16.51 pairs of individual values following 8.255 possible differences for ∆R22. 2 In clock 2 ∆b10 has two possibilities because in its update equation

2 cl11 1−cl11 1 1 1 cl21 1 ∆b10 = (α1 + α2 − 1)∆b0 ⊕ ∆b1 ⊕ ∆b6 ⊕ α3 ∆b8 ,

1 all diﬀerences are equal to zero except for ∆b8. The DFC takes cl21 to 1 select the multiplier. We have a zero in the msb of ∆a2 = ∆d and following 2 1 2 1 ∆b10 = ∆b8 or ∆b10 = α3∆b8. We can rewrite the keystream equation in the following way:

H 2 2 2 2 ∆z2 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 2 2 2 2 H ⇔ ∆L1 = ∆b10 ⊕ ∆L2 ⊕ ∆a0 ⊕ ∆z2 out 1 1 2 H ⇔ ∆ Sub(∆R2 ⊕ ∆b4) = ∆b10 ⊕ 0 ⊕ 0 ⊕ ∆z2 out 1 2 H ⇔ ∆ Sub(∆R2 ⊕ 0) = ∆b10 ⊕ ∆z2 , so that we know all differences at the right side. We insert both possibilities 2 for ∆b10, undo the MixColumns operation and check bytewise whether the computed value on the right side is a valid difference for the left side. The 80 Chapter 6. Attacks on Simplified Versions of K2

2 2 time complexity for this check is a half clock and only one pair (∆b10, ∆L1 ) 1 1 1 1 2 will remain. We know ∆Rb = ∆R2 ⊕ ∆b4 = ∆R2 ⊕ 0 and ∆L1 which ﬁxes the input-output diﬀerence ∆Rb1 → Sub → ∆L12. Therefore, we have nearly 16.51 pairs of individual values following 8.255 possibilities for ∆L23. L L For ∆z2 we do exactly the same as described for ∆z1 at clock 1 receiving known values for ∆R11 and ∆R22 as well as 8.255 possibilities for ∆R23. 3 For clock 3 we have two possibilities for ∆b10 and 8.255 possibilities for ∆L23. Thus, we rewrite the equation

H 3 3 3 3 ∆z3 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 3 3 3 3 H ⇔ ∆L1 = ∆b10 ⊕ ∆L2 ⊕ ∆a0 ⊕ ∆z3 out 2 2 3 3 H ⇔ ∆ Sub(∆R2 ⊕ ∆b4) = ∆b10 ⊕ ∆L2 ⊕ ∆d ⊕ ∆z3 out 2 3 3 H ⇔ ∆ Sub(∆R2 ⊕ 0) = ∆b10 ⊕ ∆L2 ⊕ ∆d ⊕ ∆z3 , insert all listed possibilities, undo the MixColumns operation and check byte by byte. The time complexity for this check is four clocks and only one triple 3 3 3 228·2·8.255 (∆b10, ∆L1 , ∆L2 ) will remain due to 232 ≈ 1. Furthermore, we know 2 2 2 2 3 ∆Rb = ∆R2 ⊕ ∆b4 = ∆R2 ⊕ 0 and ∆L1 which fixes the input-output difference ∆Rb2 → Sub → ∆L13. Thus, we can recover 8.255 possibilities for ∆L24. In addition, the known difference ∆L23 fixes the difference sequence ∆Rb1 → Sub → ∆L12 → Sub → ∆L23 yielding only two pairs of individual L L values satisfying it. For ∆z3 we do exactly the same as described for ∆z1 at clock 1 receiving known values for ∆R12 and ∆R23 as well as 8.255 possibilities for ∆R24. In clock 4, 5 and 6 we do exactly the same as described for clock 3. Now we have collected enough individual values for the NLF. So far, the 1 time complexity is 2 + 2 + 2 + 4 · (4 + 2) = 28.5 clocks. For each key-IV pair we have a system of keystream and update equations. We select one of them for the insertion of these individual values. Since we collected 20 pairs of individual values, always two pairs for one difference, we take the first value of a pair to be inserted in the system of equations. Wrong allocations will have contradictions somewhere in the equations or at least in the keystream for clock 7 and are dispelled this way. This would result in a time complexity of 210 clocks. For the right allocation at clock 6 we can clock backwards and receive the secret key with time complexity of six clocks. The overall time complexity in K2⊕ clocks is

28.5 + 210 + 6 ≈ 210.05 .

The needed keystream amounts to two words per clock for seven clocks for each pair (K,IVa) and (K,IVb) which yields 28 keystream words. The memory requirements are roughly 30 words. 6.2. Diﬀerential Chosen IV Attack with Key-Recovery 81

6.2.3 Reduced Initialization of 6 Clocks We extend the previous attack by one more clock. In the 5th initialization clock the starting difference ∆d enters the word R1 of the NLF resulting in an unknown difference ∆R1−1, which is fed back into FSR−A. We guess out out −1 −1 −2 −2 ∆R1 with ∆R1 =∆ Sub(∆L2 ⊕ ∆b9 ) =∆ Sub(0 ⊕ ∆d). Thus, we know the input difference and have 228 possibilities for the output difference ∆R1−1. We can only employ those ∆R1−1 which have msb and smsb equal to zero because otherwise we will have differences in the bits for the DFC. This reduces the possibilities for ∆R1−1 to 226 but we have to try the whole attack 22 times to let the case msb=smsb=0 happen. We denote the now −1 known ∆R1 with ∆k0. During the keystream computation there is no more feedback from the NLF to the FSRs anymore. Therefore, we know all differences of FSR−A as ∆d enters it in clock 4, ∆k0 enters it in clock 6 and both propagate linearly. For FSR−B the computation of the differences only depends on the unknown bits of the DFC which select the multipliers for the FSR−B −2 −1 feedback. We have to guess the difference ∆b10 = ∆b10 because we need this difference for the update equations of the NLF as well as for the feedback −2 −1 function of FSR−B. The difference ∆b10 = ∆b10 has only two possibilities due to the fact that in its update equation all differences are zero except for −3 ∆b0 = ∆d. For this difference we need to guess the choice of the multiplier −3 −2 −1 which depends on the smsb of a2 . We denote the now known ∆b10 = ∆b10 with ∆k1. Table 6.2 shows the propagation of ∆d and all guessed differences. Zero differences are omitted and unknown differences are noted with question marks. clock FSR-B FSR-A NLF ini 10 9 8 7 6 5 4 3 2 1 0 4 3 2 1 0 L1 L2 R1 R2 d 1 d 2 d 3 d d 4 k1 d d 5 k1 k1 d d d k0 6 ? k1 k1 d k0 d d ??

⊕ Table 6.2: Propagation of the diﬀerences ∆d, ∆k0 and ∆k1 through K2 during the six initialization clocks

The diﬀerences of the keystream words for clock 0 are H 0 0 0 0 ∆z0 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 L 0 0 0 0 ∆z0 = ∆b0 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆a4 , 82 Chapter 6. Attacks on Simpliﬁed Versions of K2

0 0 0 0 0 where the differences in ∆L1 , ∆L2 , ∆a0, ∆b0 are zero, and ∆a4 = ∆k0. 0 −1 Hence, we know ∆b10. As ∆R1 = ∆k0 is fixed we know the input-output −2 −1 difference ∆Lb → Sub → ∆R1 = ∆k0. Thus, we can recover on average 16.51 pairs of individual values for this sequence resulting in 8.255 0 L possibilities for ∆R2 . For ∆z0 we can rewrite the keystream equation in the following way: L 0 0 0 0 ∆z0 = ∆b0 ⊕ ∆R1 ⊕ ∆R2 ⊕ ∆a4 0 0 0 0 L ⇔ ∆R1 = ∆b0 ⊕ ∆R2 ⊕ ∆a4 ⊕ ∆z0 out −1 −1 0 L ⇔ ∆ Sub(∆L2 ⊕ ∆b9 ) = 0 ⊕ ∆R2 ⊕ ∆k0 ⊕ ∆z0 out 0 L ⇔ ∆ Sub(0 ⊕ ∆k1) = ∆R2 ⊕ ∆k0 ⊕ ∆z0 . Then we can insert all 8.255 possibilities for ∆R20, undo the MixColumns operation and check bytewise whether the value on the right side is a valid difference for the left side. The time complexity for this check is two clocks 0 0 228·8.255 and only one pair (∆R1 , ∆R2 ) will survive due to 232 < 1. The ∆R20 leaves only two pairs of individual values which fulfills the sequence −2 −1 0 0 ∆Lb → Sub → ∆R1 = ∆k0 → Sub → ∆R2 . The ∆R1 fixes the input-output difference ∆Lb−1 → Sub → ∆R10 resulting in nearly 16.51 pairs of individual values following 8.255 possible differences for ∆R21. 1 In clock 1 ∆b10 has two possibilities because in its update equation 1 cl10 1−cl10 0 0 0 cl20 0 ∆b10 = (α1 + α2 − 1)∆b0 ⊕ ∆b1 ⊕ ∆b6 ⊕ α3 ∆b8 , 0 1 0 all differences are equal to zero except for ∆b8. Hence, we have ∆b10 = ∆b8 1 0 0 or ∆b10 = α3∆b8 because we have a zero in the msb of ∆a2 = ∆d. We can rewrite the keystream equation in the following way: H 1 1 1 1 ∆z1 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 1 1 1 1 H ⇔ ∆L1 = ∆b10 ⊕ ∆L2 ⊕ ∆a0 ⊕ ∆z1 out 0 0 1 H ⇔ ∆ Sub(∆R2 ⊕ ∆b4) = ∆b10 ⊕ 0 ⊕ ∆d ⊕ ∆z1 out 0 1 H ⇔ ∆ Sub(∆R2 ⊕ 0) = ∆b10 ⊕ ∆d ⊕ ∆z1 , so that we know all differences at the right side. We insert both possibilities 1 for ∆b10, undo the MixColumns operation and check bytewise whether the value on the right side is a valid one for the left side. The time complexity 1 1 for this check is a half clock and only one pair (∆b10, ∆L1 ) will remain. The ∆L11 fixes the input-output difference ∆Rb0 → Sub → ∆L11 which results in nearly 16.51 pairs of individual values following 8.255 possibilities 2 L L for ∆L2 . For ∆z1 we do exactly the same as described for ∆z0 at clock 0. 2 For clock 2 we have two possibilities for ∆b10 and 8.255 possibilities for ∆L22. Thus, we rewrite the equation H 2 2 2 2 ∆z2 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 2 2 2 2 H ⇔ ∆L1 = ∆b10 ⊕ ∆L2 ⊕ ∆a0 ⊕ ∆z2 out 1 1 2 2 H ⇔ ∆ Sub(∆R2 ⊕ ∆b4) = ∆b10 ⊕ ∆L2 ⊕ ∆d ⊕ ∆z2 out 1 2 2 H ⇔ ∆ Sub(∆R2 ⊕ 0) = ∆b10 ⊕ ∆L2 ⊕ ∆d ⊕ ∆z2 , 6.2. Differential Chosen IV Attack with Key-Recovery 83 insert all listed possibilities, undo the MixColumns operation and check byte by byte. The time complexity for this check is four clocks and only one triple 2 2 2 228·2·8.255 2 (∆b10, ∆L1 , ∆L2 ) will remain due to 232 ≈ 1. The difference ∆L1 fixes the sequence ∆Rb1 → Sub → ∆L12 which results in 8.255 possibilities for ∆L23. The known difference ∆L22 fixes the pair of individual values 0 1 2 L for the sequence ∆Rb → Sub → ∆L1 → Sub → ∆L2 . For ∆z2 we do L exactly the same as described for ∆z0 at clock 0. For the clocks 3, 4 and 5 we do exactly the same as described for clock 2. Now we have collected enough individual values for the NLF. So far, the 1 time complexity is 2+ 2 +2+4·(4+2) = 28.5 clocks. It is the same as for the computations for five initialization clocks as we executed similar equations. For each key-IV pair we have a system of keystream and update equations. We select one of them for the insertion of these individual values. Since we collected 20 pairs of individual values, always two pairs for one difference, we take the first value of a pair to be inserted in the system of equations. Wrong allocations will have contradictions somewhere in the equations or at least in the keystream for clock 7 and are dispelled this way. This would result in a time complexity of 210 clocks. For the right allocation at clock 6 we can clock backwards and receive the secret key with a time complexity of six clocks.

−1 26 −2 At the beginning we guessed ∆R1 with 2 possibilities and ∆b10 with two possibilities. Since we have the constraint on ∆R1−1 with msb=smsb=0, we have to execute our attack 22 times (for example with diﬀerent ∆d) that this case happens. The overall time complexity in K2⊕ clocks is 22 · 226 · 2 · (28.5 + 210 + 6) ≈ 239.05 . The needed keystream amounts to two words per clock for seven clocks for each pair (K,IVa) and (K,IVb) which yields 28 keystream words. The memory requirements are roughly 30 words. If we would restrict the diﬀerence ∆d to a word with three bytes equal to zero and one non-zero byte, we would get a time complexity of 22 · 25 · 2 · (28.5 + 210 + 6) ≈ 218.05 K2⊕ clocks. The keystream and memory requirements remain.

6.2.4 Reduced Initialization of 7 Clocks Now we want to extend the attack to seven initialization clocks. The big problem is the feedback from the NLF to FSR−A and the DFC. Table 6.3 shows the propagation of difference ∆d and all unknown differences as question marks. Before we can start with any computation we have to guess some differences to get rid of some question marks in this table. 84 Chapter 6. Attacks on Simplified Versions of K2

clock FSR-B FSR-A NLF ini 10 9 8 7 6 5 4 3 2 1 0 4 3 2 1 0 L1 L2 R1 R2 d 1 d 2 d 3 d d 4 ? d d 5 ?? d d d ? 6 ??? d ? d d ?? 7 ???? d ?? d d ???

Table 6.3: Propagation of the diﬀerence ∆d through K2⊕ during the seven initialization clocks

−3 For FSR−B we need to guess ∆b10 of initialization clock 4. In its update −4 equation all differences are zero except for ∆b0 = ∆d, for which we need to −4 guess the choice of the multiplier which depends on the smsb of a2 . Thus, −3 −3 we have two possibilities for ∆b10 . We denote the guessed ∆b10 with ∆k0. −2 −3 From the update equations we derive ∆b10 = ∆b10 = ∆k0. Similarly, we −1 have to guess ∆b10 with two possibilities and denote it with ∆k1. For the NLF we need to guess ∆R1−2 because it is fed back into FSR−A. out out −2 −3 −3 We have ∆R1 =∆ Sub(∆L2 ⊕ ∆b9 ) =∆ Sub(0 ⊕ ∆d). Thus, we know the input difference and have 228 possibilities for the output difference ∆R1−2. We can only employ those values of ∆R1−2 which have msb and smsb equal to zero because otherwise we will have differences in the bits for the DFC. This reduces the possibilities for ∆R1−2 to 226 but we have to try the whole attack 22 times to let the case msb=smsb=0 −2 happen. We denote the guessed ∆R1 with ∆k2. This fixes the input- −3 −2 output difference ∆Lb → Sub → ∆R1 = ∆k2 yielding 16.51 pairs of individual values and following 8.255 possibilities for ∆R2−1. Further- more, we guess ∆R1−1 with 228 possibilities due to the known input dif- out out −1 −2 −2 ference in ∆R1 =∆ Sub(∆L2 ⊕ ∆b9 ) =∆ Sub(0 ⊕ ∆k0). The sum ∆R1−1 ⊕ ∆R2−1 must have a zero in the msb and smsb because we do not want to have a difference in the bits for the DFC. This reduces the possibilities for ∆R1−1 to 226 but we have to execute our attack 22 more times. −1 We denote the guessed ∆R1 with ∆k3. This determines the input-output −2 −1 difference ∆Lb → Sub → ∆R1 = ∆k3 yielding 16.51 pairs of individual values and 8.255 possibilities for ∆R20. We update Table 6.3 with all guessed differences and get Table 6.4. Now we have a look at the keystream equations for clock 0. The difference 0 ∆b10 has two possibilities because in its update equation we need to guess −1 −1 the multiplier for ∆b8 depending on the msb of a2 . We can rewrite the 6.2. Differential Chosen IV Attack with Key-Recovery 85

clock FSR-B FSR-A NLF ini 10 9 8 7 6 5 4 3 2 1 0 4 3 2 1 0 L1 L2 R1 R2 d 1 d 2 d 3 d d 4 k0 d d 5 k0 k0 d d d k2 6 k1 k0 k0 d k2 d d k3 ? 7 ? k1 k0 k0 d ? k2 d d ???

Table 6.4: Propagation of the known or guessed diﬀerences through K2⊕ during the seven initialization clocks

H equation for ∆z0 in the following way:

H 0 0 0 0 ∆z0 = ∆b10 ⊕ ∆L1 ⊕ ∆L2 ⊕ ∆a0 0 0 0 0 H ⇔ ∆L1 = ∆b10 ⊕ ∆L2 ⊕ ∆a0 ⊕ ∆z0 out −1 −1 0 H ⇔ ∆ Sub(∆R2 ⊕ ∆b4 ) = ∆b10 ⊕ 0 ⊕ 0 ⊕ ∆z0 out −1 0 H ⇔ ∆ Sub(∆R2 ⊕ 0) = ∆b10 ⊕ ∆z0 .

We insert the 8.255 possibilities for ∆R2−1 and the two possibilities for 0 ∆b10. Afterwards, we undo the MixColumns operation and check bytewise whether the value at the right side is a valid one for the left side. The time 0 −1 0 complexity is about four clocks and only one triple (∆L1 , ∆R2 , ∆b10) 228·2·8.255 0 will remain due to 232 ≈ 1. The ∆L1 fixes the input-output difference ∆Rb−1 → Sub → ∆L10 which results in 8.255 possibilities for ∆L21. The ∆R2−1 determines the two pairs of individual values for the sequence ∆Lb−3 → Sub → ∆R1−2 → Sub → ∆R2−1. Also, ∆R2−1 fixes 0 −1 −1 −1 L ∆a4 = ∆a3−1 ⊕ ∆R2 ⊕ ∆R1 ⊕ ∆a4 . For ∆z0 we know the input out out 0 −1 −1 difference for ∆R1 =∆ Sub(∆L2 ⊕ ∆b9 ) =∆ Sub(0 ⊕ ∆k0) as well as 0 0 the 8.255 possibilities for ∆R2 and ∆a4. Therefore, we can do exactly the L same as described for ∆z0 at clock 0 in the attack on six initialization clocks on page 82. From now on all differences in FSR−A are known and propagate linearly because there is no more feedback from the NLF to FSR−A. For the clocks from 1 to 4 we perform the same computations as for clock 2 in the attack on six initialization clocks described on page 82. For H clock 5 we discard ∆z5 because we might have msb6=0 and / or smsb6=0 for 4 L ∆a2 which results in differences in the DFC. For ∆z5 we do the same as in the previous clocks. Now we have collected enough individual values for the NLF. So far, the time complexity for the computations is 4 + 4 · (4 + 2) + 2 = 30 clocks. 86 Chapter 6. Attacks on Simplified Versions of K2

For each key-IV pair we have a system of keystream and update equations. We select one of them for the insertion of these individual values. Since we collected 20 pairs of individual values, always two pairs for one diﬀerence, we take the ﬁrst value of a pair to be inserted in the system of equations. Wrong allocations will have contradictions somewhere in the equations or at least in the keystream for clock 6 and are dispelled this way. This would result in a time complexity of 210 clocks. For the right allocation at clock 0 we can clock backwards and receive the secret key with time complexity of seven clocks.

−3 −2 At the beginning we guessed ∆b10 and ∆b10 each with two possibilities. Furthermore, we guessed ∆R1−2 and ∆R1−1 each with 226 possibilities. Since we have the constraint on ∆R1−2 with msb=smsb=0 and the constraint on ∆R1−1, we have to execute our attack 22 · 22 times (for example with diﬀerent ∆d) that these cases happen. The overall time complexity in K2⊕ clocks is

2 · 2 · 226 · 226 · 22 · 22 · (30 + 210 + 7) ≈ 268.05 .

The needed keystream amounts to two words per clock for six clocks for each pair (K,IVa) and (K,IVb) which yields 24 keystream words. The memory requirements are roughly 30 words. If we would restrict the diﬀerence ∆d to a word with three bytes equal to zero and one non-zero byte, we would get a time complexity of

2 · 2 · 25 · 226 · 22 · 22 · (30 + 210 + 7) ≈ 247.05

K2⊕ clocks. The keystream and memory requirements remain.

The summary of our results is given in Table 6.5 with time in K2⊕ clocks and keystream and memory in words. We were not able to extend this attack to eight initialization clocks because we have not found a way to handle the unknown differences from the NLF fed back into FSR−A and FSR−B. Furthermore, non-zero differences in the bits of the DFC will occur which makes the computation of the differences of FSR−B impossible.

diﬀerential chosen IV attack on K2⊕ keystream time memory 5 initialization clocks 28 210.05 30 6 initialization clocks 28 239.05 30 with restriction of ∆d to 1 byte 28 218.05 30 7 initialization clocks 24 268.05 30 with restriction of ∆d to 1 byte 24 247.05 30

Table 6.5: Summary of our results on K2⊕ 6.3. Chosen IV Distinguishing Attack 87

6.3 Chosen IV Distinguishing Attack

We now consider K2⊕ with the number of initialization clocks reduced to seven. To distinguish the cipher from a random function, we build a multiset over all possible 232 values of one word using 232 diﬀerent key-IV pairs H and check whether the XORed sum over all ﬁrst keystream words z0 is equal to zero which holds with probability one. For a random function the probability is 2−32 that this sum is zero.

For all 232 key-IV pairs we take the same unknown key K. The four words of the IV = [IV0,IV1,IV2,IV3] are loaded in the FSR−B, where the word IV1 loaded in b3 takes the longest time until it enters the NLF. For 32 this word IV1 we make a multiset in a way that all values [0, 2 − 1] occur exactly once. We emphasize that we need the multiset in ascending order starting with zero. Then we know that all IV1 values in the first half have msb equal to zero whereas all IV1 values in the second half have msb equal to one. We will use this fact about the msb later. The multiset in IV1 yields 32 i 32 2 different IVs IV = [IV0, i, IV2,IV3], i = 0,..., 2 − 1 with arbitrary i ⊕ words IV(0,2,3). For each pair (K,IV ) we run the K2 cipher with seven i H initialization clocks and get the first keystream word z0 . Now we explain why the XORed sum over all keystream words is equal to zero. Our goal is to prove the multiset propagation through K2⊕ as shown in Table 6.6. clock FSR-B FSR-A NLF ini 10 9 8 7 6 5 4 3 2 1 0 4 3 2 1 0 L1 L2 R1 R2 0 Ms 1 Ms 2 Ms 3 Ms Ms 4 S0 Ms Ms 5 S0 S0Ms MsMs M1 6 S0 S0 S0Ms S0MsMs ?M2 7 S0 S0 S0 S0Ms ? S0MsMs M3 ??

Table 6.6: Propagation of the multiset through K2⊕ during the seven initialization clocks

In this table we only show the multiset and its behavior. All values which are the same in all 232 key-IV pairs are omitted (empty in the table). We put the ’?’ for those sets we do not know anything about. With the symbol ’Ms’ we denote the multiset in the starting order. The symbols ’M1’, ’M2’ and ’M3’ denote different multisets. In each of them each value [0, 232 −1] occurs exactly once but we do not know in which order. Thus, also the XORed sums 88 Chapter 6. Attacks on Simplified Versions of K2 over these multisets are zero. The symbol ’S0’ denotes a multiset, where the characteristic that each value [0, 232 − 1] occurs exactly once is lost, but the feature that it sums up (XORed) to zero still remains. To prove this feature t t for the ’S0’ multisets it is sufficient to consider the sets in b10 and a4. The i t update equation for the XORed sum over all b10 is

2X32−1 2X32−1 ³ i t i t−1 i t−1 i t−1 i t−1 i t−1 i t−1 b10 = msmsb b0 ⊕ b1 ⊕ b6 ⊕ mmsb b8 i=0 i=0 ´ i t−1 i t−1 i t−1 i t−1 ⊕ b10 ⊕ L2 ⊕ L1 ⊕ a0 . (6.1)

i t−1 Here, the variable mmsb denotes the multiplier depending on the msb of i t−1 i t−1 i t−1 a2 . In particular, mmsb = α3 if the msb of a2 is equal to one and i t−1 i t−1 mmsb = 1 otherwise. Likewise, the variable msmsb denotes the multiplier i t−1 i t−1 depending on the smsb of a2 . In particular, msmsb = α1 if the smsb of i t−1 i t−1 a2 is equal to one and msmsb = α2 otherwise. The update equation for i t the XORed sum over all a4 is

2X32−1 2X32−1 ³ i t i t−1 i t−1 a4 = α0 a0 ⊕ a3 i=0 i=0 ´ i t−1 i t−1 i t−1 i t−1 ⊕ b0 ⊕ R2 ⊕ R1 ⊕ a4 . (6.2)

From now on, we always mean XORed sum when we write sum or the sigma sign. For clock 1 and 2 our starting multiset is only shifted in the FSR−B. We do not mention the shifts of a multiset in the next clocks anymore. At clock 3 i 3 we compute the sum over all b10. We see from Table 6.6 that the choice 2 for the multipliers is constant due to the constant value of a2. Thus, we do not know which multiplier is chosen but we know it is the same for all 232 key-IV pairs. Table 6.6 also shows that nearly all summands are constants, too, which means their sums are zero. We omit all zero sums reducing (6.1) to

2X32−1 2X32−1 2X32−1 2X32−1 i 3 i 2 i 0 b10 = b1 = b3 = i = 0 , (6.3) i=0 i=0 i=0 i=0 because it is the sum of our starting multiset and therefore zero. For clock 4 the choice for the multipliers in (6.1) is constant as well. We leave all zero 6.3. Chosen IV Distinguishing Attack 89 sums out (empty values in Table 6.6) and yield

2X32−1 2X32−1 2X32−1 i 4 3 i 3 i 3 b10 = msmsb b0 ⊕ b10 i=0 i=0 i=0 2X32−1 2X32−1 3 i 0 i 0 = msmsb b3 ⊕ b3 i=0 i=0 2X32−1 2X32−1 3 = msmsb i ⊕ i = 0 (6.4) i=0 i=0 due to (6.3). At this clock our starting multiset enters the FSR−A because after omitting all zero sums (6.2) gives

2X32−1 2X32−1 2X32−1 2X32−1 i 4 i 3 i 0 a4 = b0 = b3 = i = 0 . (6.5) i=0 i=0 i=0 i=0 Looking at clock 5 the choice for the multipliers is constant due to the 4 constant value of a2. Table 6.6 shows that nearly all sums are zero which reduces (6.1) with (6.4) to

2X32−1 2X32−1 i 5 i 4 b10 = b10 = 0 . (6.6) i=0 i=0 For FSR−A we get from (6.2) and (6.5)

2X32−1 2X32−1 i 5 i 4 a4 = a4 = 0 . (6.7) i=0 i=0 At this clock our starting multiset also enters the NLF in word R1 yielding with (6.3)

2X32−1 2X32−1 i 5 ¡i 4 i 4¢ R1 = Sub L2 ⊕ b9 i=0 i=0 2X32−1 ¡i 4 i 3 ¢ = Sub L2 ⊕ b10 i=0 32 2X−1 ¡ ¢ = Sub iL24 ⊕ i = 0 (6.8) i=0 due to the constant value of L24. Thus, the order of our starting multiset is destroyed but the feature that each value [0, 232 − 1] occurs exactly once remains which results in a zero sum. 90 Chapter 6. Attacks on Simpliﬁed Versions of K2

For clock 6 the choice for the multipliers in (6.1) is constant as well. Thus, we can lower (6.1) with (6.3) and (6.6) to

2X32−1 2X32−1 2X32−1 i 6 5 i 5 i 5 b10 = mmsb b8 ⊕ b10 i=0 i=0 i=0 2X32−1 2X32−1 5 i 3 i 5 = mmsb b10 ⊕ b10 i=0 i=0 2X32−1 2X32−1 5 i 5 = mmsb i ⊕ b10 = 0 . (6.9) i=0 i=0

For FSR−A we can reduce (6.2) with (6.5), (6.8) and (6.7) to

2X32−1 2X32−1 2X32−1 2X32−1 i 6 i 5 i 5 i 5 a4 = a3 ⊕ R1 ⊕ a4 i=0 i=0 i=0 i=0 2X32−1 2X32−1 2X32−1 i 4 i 5 i 5 = a4 ⊕ R1 ⊕ a4 = 0 . (6.10) i=0 i=0 i=0

At this clock we have the ﬁrst unknown set in the NLF because with (6.6) and (6.4) we get

2X32−1 2X32−1 i 6 ¡i 6 i 6¢ R1 = Sub L2 ⊕ b9 i=0 i=0 2X32−1 ¡i 6 i 5 ¢ = Sub L2 ⊕ b10 i=0 2X32−1 ¡i 6 3 ¢ = Sub L2 ⊕ msmsbi ⊕ i , (6.11) i=0 where we do not know how the summands in the Sub function behave. There- fore, we have no clue what outcome the entire sum might have. The word R26 = Sub(R15) is a multiset, because R15 is a multiset, in which each value [0, 232 −1] occurs exactly once, and the Sub function preserves this property. Thus, we do not know the order of R26 but we know that the sum is zero. At clock 7 we see that we have only three zero sums in Table 6.6. This reduces (6.1) to

2X32−1 2X32−1 2X32−1 2X32−1 i 7 i 6 i 6 i 6 i 6 i 6 b10 = msmsb b0 ⊕ mmsb b8 ⊕ b10 . (6.12) i=0 i=0 i=0 i=0 6.3. Chosen IV Distinguishing Attack 91

6 The ﬁrst two summands depend on the value of a2. From the update equation 6 4 for a2 = a4 we know

2X32−1 2X32−1 ³ ´ i 4 i 3 i 3 i 3 i 3 i 3 i 3 a4 = α0 a0 ⊕ a3 ⊕ b0 ⊕ R2 ⊕ R1 ⊕ a4 . i=0 i=0

3 3 3 3 3 The values for a3, a0,R2 ,R1 and a4 remain constant for all pairs. We 3 3 3 3 3 denote the sum of them with C = α0a0 ⊕ a3 ⊕ R2 ⊕ R1 ⊕ a4 which does not depend on i and is unknown to us. With this simpliﬁcation and (6.5), we obtain 2X32−1 2X32−1 ³ ´ i 4 a4 = i ⊕ C . (6.13) i=0 i=0 As a result of the multiset we know that in each bit of i, the number of ones and zeros occurring is exactly 231 which is an even number. This means that 4 31 31 the smsb of a4 has 2 ones and 2 zeros. Taking this and the fact that the 6 value of b0 remains constant we obtain for the ﬁrst summand of (6.12)

2X32−1 2X31−1 2X31−1 i 6 i 6 6 6 msmsb b0 = α1b0 ⊕ α2b0 = 0 . (6.14) i=0 i=0 i=0

Altogether, (6.12) simpliﬁes with (6.9) and (6.14) to

2X32−1 2X32−1 2X32−1 i 7 i 6 i 6 i 6 i 4 b10 = mmsb b8 = mmsb b10 , (6.15) i=0 i=0 i=0

6 4 where the choice of the multiplier depends on the msb of the value a2 = a4 4 and the value of b10 does not remain constant. At the beginning we empha- sized the ascending order of our multiset. This means that the ﬁrst half of our multiset has msb zero and the second half has msb one. Accordingly, i 4 31 the msb of all a4 with i = 0,..., 2 − 1 is the msb of the unknown constant i 4 31 32 C shown in (6.13), whereas the msb of all a4 with i = 2 ,..., 2 − 1 is the opposite of the msb of the unknown constant C. Thus, we divided the sum into a ﬁrst and a second half. We need to check whether we can also divide i 4 the set of b10. From (6.4) we know

2X32−1 2X32−1 2X32−1 i 4 3 b10 = msmsb i ⊕ i . i=0 i=0 i=0

The choice of the multipliers is constant but unknown to us. We can divide both sums into a first and a second half. Together with the two possibilities 4 of the msb of a4 (first half zero and second half one, or vice versa) we can 92 Chapter 6. Attacks on Simplified Versions of K2 compute two values for (6.15)     2X31−1 2X31−1 2X32−1 2X32−1  3   3  X1 = msmsb i ⊕ i ⊕ α3 msmsb i ⊕ i i=0 i=0 i=231 i=231     2X31−1 2X31−1 2X32−1 2X32−1  3   3  X2 = α3 msmsb i ⊕ i ⊕ msmsb i ⊕ i . i=0 i=0 i=231 i=231

For the sums over exactly one ordered half of the multiset the property of summing up to zero is preserved. Accordingly, the values X1 and X2 are both zero which results in

2X32−1 2X32−1 i 7 i 7 b10 = X1 = 0 or b10 = X2 = 0 . (6.16) i=0 i=0 The unknown sum in R16 gives us an unknown sum in R27 = Sub(R16) and 7 7 an unknown sum in a4 due to (6.2). In R1 we get an unknown sum for a similar reason as for R16 in (6.11). We also get a multiset in L17 due to

2X32−1 2X32−1 i 7 ¡i 6 i 6¢ L1 = Sub R2 ⊕ b4 = 0 , (6.17) i=0 i=0

i 6 7 where b4 remains constant for all i. Thus, we have a multiset in L1 in which each value [0, 232 − 1] occurs exactly once, because the Sub function preserves this feature of the multiset in R26. Consequently, we have proven the assumed multiset propagation through K2⊕ in Table 6.6.

The keystream is produced at clock zero after the initialization. For the internal states we write the number of the initialization clock as superscript. i H We know about the sum of all words z0

2X32−1 2X32−1 ³ ´ i H i 7 i 7 i 7 i 7 z0 = b10 ⊕ L2 ⊕ L1 ⊕ a0 . i=0 i=0

32 7 7 For all 2 key-IV pairs the values for L2 and a0 remain constant for all pairs meaning the sum over them is zero. The values iL17 form a multiset i 7 which sums up to zero and (6.16) shows the zero sum for b10. Thus, we have

2X32−1 i H z0 = 0 . (6.18) i=0 We have shown that we can distinguish the K2⊕ from a random function with probability 1 − 2−32. We need the ﬁrst keystream word for all 232 pairs 6.4. Summary 93

(key, IV i) with (i = 0,..., 232−1). The time complexity is 232 times an XOR of a word which is roughly 227.6 clocks of K2⊕. The memory requirements are negligible. We have not found a way to extend this attack to eight initialization clocks because in L18 we will get a set we do not know anything about coming from R27 and therefore resulting in a random sum.

6.4 Summary

We have shown a differential chosen IV attack with key-recovery on K2⊕ with five, six and seven initialization clocks. The time complexities for the three attacks are 210.05, 218.05 and 247.05 clocks of K2⊕. All three attacks have the same memory requirements of 30 words. The attacks on five and six clocks need 28 words of keystream, and for seven clocks 24 keystream words are needed. A chosen IV distinguishing attack on K2⊕ with seven initialization clocks is also presented. With a multiset and its predictable propagation through these seven clocks we can distinguish K2⊕ from a random function with a probability of 1−2−32. The complexity for this attack is 227.6 clocks of K2⊕, negligible memory requirements and needed keystream of 232 words. We can only attack reduced versions of K2⊕ with seven initialization clocks out of 24. Thus, the security margin is large enough and these attacks do not pose a threat to the K2 stream cipher. 94 Chapter 6. Attacks on Simplified Versions of K2 Chapter 7

Cryptanalysis of the Shrinking Generator with Linear Initialization

The shrinking generator [CKM93] is a classic stream cipher construction that is specified without any key loading or resynchronization mechanisms; the initial state is considered to be the secret key. This is probably one of the reasons why even though these schemes have been very intensively studied in the academic literature they are not widely used in practice. In this chapter we show that a linear initialization with known feedback polynomials results in a very weak cipher. This coincides with similar findings for filter generators and combiners with memory [ALP04] and with chosen IV attacks on irregularly clocked ciphers A5/1 [EJ03, BB05] and DECT [NTW10] amongst others. We assume that the attacker has access to output streams of multiple states that are differentially related, i.e. not only does the attacker know the 0 output sequence (zi ) for a secret initial state σ0 but also output sequences j (zi ) for states σj = σ0 ⊕ ∆j together with the corresponding differences ∆j. This can occur if the internal state of the stream cipher is not only initialized with a key but also with an initial value (IV) that is transmitted in clear, an approach that is commonly used when the communication secured by the stream cipher is packetized. Then the attacker can use the known differences to determine the internal state and thus the secret key. The chapter is organized as follows: In Section 7.1 we briefly describe the shrinking generator. In Section 7.2 we show the known IV key-recovery attack on the shrinking generator with linear initialization and known feedback polynomials. The summary is given in Section 7.3.

95 96 Chapter 7. The Shrinking Generator with Linear Initialization

7.1 Description of the Shrinking Generator

The shrinking generator is a stream cipher construction proposed by Copper- smith, Krawczyk and Mansour [CKM93] in 1993. Two LFSRs A and S are used to generate a so-called “shrunken sequence” by conditionally emitting the output bit of A only if the output bit of the selector S in a given clock equals 1; if the output bit of S equals 0, no output is produced. The output sequence of the shrinking generator will be denoted by (zi). We denote by (si) the output sequence and by nS the length of LFSR S. Similarly, (ai) denotes the output sequence and nA the length of LFSR A. For optimal security, gcd(nS, nA) = 1 should hold; in this case the period of the generator is (2nA − 1) · 2nS −1.

ai zj LFSR A selection clock i rule si LFSR S

Figure 7.1: Shrinking Generator

In the original description the initial state and sometimes the feedback polynomials as well are assumed to be the secret key. The use of an IV is not mentioned nor is a method for loading and / or initialization given. We consider the shrinking generator with a linear key-IV setup and known feedback polynomials. The shrinking generator is initialized with a key K and an IV V by sequentially clocking the bits of K and V into both LFSRs. We have no restriction whether the key or the IV is clocked into the LFSRs ﬁrst. After the key and IV loading a number of blank rounds can be executed, i.e. the shrinking generator is clocked without producing output.

7.2 Known IV Key-Recovery Attack

In this section we demonstrate a known IV key-recovery attack against the shrinking generator with a linear key-IV setup and known feedback j polynomials. The attacker has access to v different keystreams (zi) with j = 1, . . . , v which are produced by v ≥ 2 known IVs and always the same j j secret key. Of course, the corresponding sequences (si) and (ai) produced by the LFSRs S and A are unknown to the attacker. We can compute all differences between the known IVs and denote their Pv−1 amount with w = i=1 i. With each IV difference we start the shrinking generator to produce the difference sequences for the LFSRs S denoted with h h ∆(si) and the difference sequences for the LFSRs A denoted with ∆(ai) where h = 1, . . . , w. 7.2. Known IV Key-Recovery Attack 97

j We can determine the ﬁrst nS output bits of the unknown sequences (si) h h sequentially bit by bit using the knowledge of ∆(si) , ∆(ai) and the diﬀer- ences in the given keystreams. This allows us to recover the initial states of the corresponding LFSRs S. Subsequently, we can recover the initial state of each LFSR A using only the corresponding LFSR S and the keystream. With the recovered LFSR S we can assign the clock to each keystream bit when it was produced by the LFSR A. This gives us a system of linear equations in the unknown bits of the initial state of LFSR A. We need nA bits of the keystream to yield enough equations. This corresponds to 2 nA clocks because roughly half of the output bits of LFSR A are discarded. We solve 3 2 the system in O(nA) time and O(nA) memory complexity and yield the initial state of the LFSR A. Now we know the initial state of the LFSRs S and A and therefore the secret key.

j To determine the output bits of the unknown sequences (si) we consider each clock separately. At each clock t we compute all differences between the j h first bits of the sequences (zi) and denote them with ∆z1 with h = 1, . . . , w. h Now we look at the ∆st and can distinguish three cases where each case has different probabilities. h j The first case occurs when all ∆st = 0 which means all unknown st have the same value, either zero or one. For all h = 1, . . . , w we check h h h h whether ∆at = ∆z1 . If at least one h exists with ∆at 6= ∆z1 then we j conclude that all unknown st must be zero because the differences in the output of the LFSRs A do not match the differences in the keystream. If all differences of the output and the keystream match we can not decide whether this happens by coincidence or because all st = 1. h The second case occurs when v − 1 out of w of the ∆st equal one j j which means that one of the st has the opposite value than the other st . h We consider only these h for which ∆st = 0 and check for these h whether h h h h ∆at = ∆z1 . If among these h at least one exists with ∆at 6= ∆z1 then we j conclude that the corresponding st must be zero because the differences in the output of the LFSRs A do not match the differences in the keystream. If all these differences match we can not decide whether this happens coin- cidentally or because the corresponding st = 1. h The third case occurs when the ∆st can be divided into three subsets. h j The first subset contains ∆st = 0 so that the corresponding st all have h the same value. The second subset contains ∆st = 0 as well so that the j corresponding st all have the same value but this value is opposite to the j value of the st belonging to the first subset. Consequentially, the third h subset contains all ∆st = 1. For the first and second subset we check the h h corresponding ∆at = ∆z1 . If for one subset all differences match and for the other subset at least one inequation occurs then we can determine that j all st belonging to the subset with the inequation must be zero and the j st belonging to the other subset are one. If all differences match we can 98 Chapter 7. The Shrinking Generator with Linear Initialization

j j not decide whether the st belonging to the first subset are zero and the st belonging to the second subset are one or the other way around. It can also happen that for both subsets at least one inequation occurs which means the j j st belonging to the first subset must be zero as well as the st belonging to h the second subset. This is a contradiction because the ∆st constitute that j not all st can have the same value. This contradiction can only appear if at j a previous clock u we had to guess su equals zero or one. So we have the possibility to detect wrong guesses. If we had no guess in all previous clocks we will have a decision. For two or three IVs we can not eliminate wrong guesses because the case for a contradiction never occurs. The amount of IVs is too small to yield a division into three subsets as needed for the third case. j At each clock t we can either decide which st is zero and which is one j or we can not decide and therefore have to guess which st equals zero or j j one. If st = 1 the first bit of the corresponding keystream sequence (zi) is discarded and the next bit becomes the first one.

Table 7.1 shows the frequency of occurrence for the different cases for two up to nine IVs. The variable v denotes the number of IVs, p the amount of possibilities, g the guesses, d the decisions and c the contradictions. The probability to have a contradiction is given by r. The last column itemizes as j subcategories the different cases with the division of the st into subsets. The value m denotes how often this division can occur and “α | β” describes the j exact division of the st into subsets. Please note that α ≤ β and α + β = v. j For example for six IVs shows “2 | 4” the division where two of the st have j the same value and the remaining four st have the same value but this one is opposite to the value of the previous two. The first subcategory contains h j the first case when all ∆st = 0 thus all st have the same value which gives the division “0 | v”. The second subcategory contains the second case when j j one of the st has the opposite value than the other st yielding the division “1 | v − 1”. The remaining subcategories contain the different possibilities for the third case where each subset given by the division contains at least j two st . The more IVs are provided the better the attack works due to the in- j creased probability for determining st and more cases to eliminate wrong guesses by a contradiction. For an arbitrary amount v of IVs the value p can be computed with p = 22(v−1). We were not able to find a single equation to compute the sum d and c depending on the amount of IVs but a recursive description for the separate entries in Table 7.1. For d and c as well as g and m we define a new notation where the superscript gives the number of j IVs and the bottom index denotes the division of the st into subsets. For 3 example d1|2 = 2 is the number of decisions for three IVs in the second case j where one st has the opposite value than the other two. 7.2. Known IV Key-Recovery Attack 99 . . . ··· ··· ··· ··· ··· ··· ··· ··· ··· β | 4 | 5 4 | 4 α × . . . × — — × 35 126 β m | 3 | 3 3 | 4 3 | 5 3 | 6 α . . . × × × × ———— × 10 35 56 84 β m | 2 | 3 2 | 4 2 | 5 2 | 6 2 | 7 α 2 | 2 . . . × × × × × —————— × × 3 10 15 21 28 36 Cases itemized β m | α 1 | 1 1 | 2 1 | 3 1 | 4 1 | 5 1 | 6 1 | 7 1 | 8 . . . × × × × × × × × × 1 3 4 5 6 7 8 9 β m | α 0 | 2 0 | 3 0 | 4 0 | 5 0 | 6 0 | 7 0 | 8 0 | 9 . . . × × × × × × × × × 1 1 1 1 1 1 1 1 m g/d/c g/d/c g/d/c g/d/c g/d/c 1 / 1 / 0 2 /1 / 0 3 / 0 / 0 2 /1/7/0 2 / 0 2/6/01 / 15 / 0 2 / 2/4/2 141 / / 0 31 / 0 2 / 2 8 / 30 / / 0 6 2 / 16 / 14 2 / 12 / 18 1 / 255 / 0 2 / 254 / 0 2 / 128 / 126 2 / 68 / 186 2 / 44 / 210 1 / 63 / 0 2 / 621 / / 127 0 / 0 2 / 2 / 32 126 / 0 / 30 2 / 64 2 / / 20 / 62 42 2 / 36 / 90 2 / 28 / 98 c p P . . . = . . . c r P . . . d P . . . 3 1 0 0.0 % 7 9 0 0.0 % g 15 4331 16563 6 9.4 % 60 571 23.4 % 390 38.1 % 511 18,405 46,620 71.1 % 127 1,869255 2,100 51.3 5,923 % 10,206 62.3 % P 2 4 6 8 . . . 10 16 12 14 2 2 2 2 2 2 2 2 . . . 2 3 4 5 6 9 7 8 v p IVs Poss. Sums Prob.

Table 7.1: Probability of contradictions 100 Chapter 7. The Shrinking Generator with Linear Initialization

For the multiplier m we have the following recursive equations:

The number of guesses g can easily be extracted from the table:

v for the ﬁrst subcategory g0|v = 1 v for all other entries gα|β = 2 P v v equation for the sum gα|β = 2 − 1

For the number of decisions d we have the following recursive equations:

v v−1 P v−1 for the ﬁrst subcategory d0|v = 2 − 1 = gα|β v v−1 v−1 for the second subcategory d1|v−1 = 2 − 2 = 2 · d0|v−1 v α−1 v−1 v−α−1 v−1 for all other entries dα|β = 2 + dα−1|β = 2 + dα|β−1

For the number of contradictions c the recursive equations are:

The sum for d and c can be computed with these equations.

To mount the known IV attack with two or three IVs is useless because j we would have to guess too many bits of the sequences (si) . Even with four IVs the probability of getting a guess with 23.4 % is much higher than the probability of getting a contradiction with 9.4 % which results in an j exponential time complexity to determine the bits of the sequences (si) . For ﬁve IVs we have a probability of 23.4 % to get a contradiction which is nearly twice the probability of 12.1 % to get a guess. Therefore, we conclude j that the time complexity to recover the bits of the sequences (si) would be in O(nS) if at least ﬁve IVs are provided. We assume both LFSRs having nearly the same length. Then the over- 3 all time complexity for our attack is dominated by O(nA) as well as the 2 memory requirements by O(nA). The needed keystream is roughly 2nA bits for each IV. 7.3. Summary 101

Note on the Self-Shrinking Generator with Known Feedback Polynomials The self-shrinking generator was presented by Meier and Staﬀelbach [MS94] in 1994. The generator uses only one LFSR L and is speciﬁed without any key loading or resynchronization mechanisms. The initial state is considered to be the secret key. The output bits of the LFSR are grouped to pairs (li+1, li) where the bit li+1 decides whether the bit li is an output bit of the self-shrinking generator or not. If li+1 = 1 the bit li is the output bit of the self-shrinking generator. If li+1 = 0 the bit li is discarded. We can determine the initial state of the self-shrinking generator in a similar way as for the shrinking generator. By dividing the sequence produced by the LFSR L into a “selector” sequence and a “non-shrunken” sequence, we are in the same place as for the shrinking generator. We can recover the bits of the “selector” sequence one by one. The keystream contains roughly half of the bits of the “non-shrunken” sequence and with the recovered “selector” bits we also know the clock for each bit. Altogether we can determine roughly 3 4 of the bits in the sequence produced by L. Therefore we need to know 4 the bits for 3 nL clocks on average to yield a system of nL linear equations in the unknown initial state variables. The time complexity to solve this 3 2 system is in O(nL) and the memory requirements are in O(nL). Compared to these complexities the needed time and memory for the determination of 1 the “selector” bits are negligible. The needed keystream is roughly 3 nL bits for each IV.

7.3 Summary

In this chapter we described a known IV key-recovery attack on the shrinking generator with known linear feedback polynomials and a linear key-IV setup. 3 Despite the irregular clocking we are able to break the cipher in O(nA) time 2 complexity and O(nA) memory requirements using at least ﬁve IVs. The design of the self-shrinking generator is related to the design of the shrinking generator. Therefore, this cipher can also be broken with at least ﬁve IVs. 3 The time complexities is in O(nL) and the memory requirements are in 2 O(nL). 102 Chapter 7. The Shrinking Generator with Linear Initialization Chapter 8

Conclusion

Even the most successful attacks started with pointing out a small weakness. based loosely on Confucius This thesis presents cryptanalysis of several stream ciphers, especially known or chosen IV key-recovery attacks against their initialization methods. If the update of the internal state is the same during initialization and keystream generation or if a cipher does not have an initialization method then it is worth to consider a slide attack. We describe slid pairs for Salsa20 and Trivium. The special structure of their starting states limits the amount of slid pairs so that the probability of getting such a slid pair by chance is too small to pose a real threat on these ciphers. For both SNOW ciphers the addition modulo 232 is replaced by an XOR to retain only the FSM as a non-linear part. If this non-linear part has no influence on the linear part during the initialization the construction is vulnerable. When the FSM affects the LFSR we are able to attack roughly half of the initialization clocks. The K2 cipher has a new design aspect to increase non-linearity. The feedback function of one FSR is dynamically chosen by the other FSR among four different linear feedback functions. This makes differential cryptanalysis even more difficult than only dealing with the non-linearity introduced by the FSM. We are only able to attack seven out of 24 initialization clocks for a version where all additions modulo 232 are replaced by XORs. A rather old design is the shrinking generator where the output bit of one LSFR decides whether the output bit of the second LFSR becomes the keystream. This results in an irregularly clocked keystream sequence. We consider known feedback polynomials and a linear initialization with an IV and are able to break the shrinking generator despite the irregular clocking using only a few known IVs. Apart from the shrinking generator the analyzed stream ciphers are rather recent designs. We point out some weaknesses but are not able to break these ciphers. Time will tell whether they withstand cryptanalytic attacks or get broken.

103 104 Chapter 8. Conclusion Bibliography

[ACG+06] F. Armknecht, C. Carlet, P. Gaborit, S. Künzli, W. Meier, and O. Ruatta. Eﬃcient Computation of Algebraic Immunity for Al- gebraic and Fast Algebraic Attacks. In S. Vaudenay (ed.), EURO- CRYPT, volume 4004 of Lecture Notes in Computer Science, pages 147–164. Springer, 2006.

[AFK+08] J.-P. Aumasson, S. Fischer, S. Khazaei, W. Meier, and C. Rech- berger. New Features of Latin Dances: Analysis of Salsa, ChaCha, and Rumba. In K. Nyberg (ed.), FSE, volume 5086 of Lecture Notes in Computer Science, pages 470–488. Springer, 2008.

[AK03] F. Armknecht and M. Krause. Algebraic Attacks on Combiners with Memory. In Boneh [Bon03], pages 162–175.

[ALP04] F. Armknecht, J. Lano, and B. Preneel. Extending the Resynchro- nization Attack. In H. Handschuh and M. A. Hasan (eds.), Selected Areas in Cryptography, volume 3357 of Lecture Notes in Computer Science, pages 19–38. Springer, 2004.

[Arm04] F. Armknecht. Improving Fast Algebraic Attacks. In B. K. Roy and W. Meier (eds.), FSE, volume 3017 of Lecture Notes in Computer Science, pages 65–82. Springer, 2004.

[Arm06] F. Armknecht. Algebraic Attacks on Certain Stream Ciphers. PhD thesis, University of Mannheim, 2006.

[Bab95] S. Babbage. A Space/Time Trade-Oﬀ in Exhaustive Search At- tacks on Stream Ciphers. In European Convention on Security and Detection, number 408. IEEE Conference Publication, 1995.

[BB05] E. Barkan and E. Biham. Conditional Estimators: An Eﬀective Attack on A5/1. In Preneel and Tavares [PT06], pages 1–19.

[Bel11] S. M. Bellovin. Frank Miller: Inventor of the One-Time Pad. Cryptologia, 35(3):203–222, 2011.

105 106 Bibliography

[Ber05] D. J. Bernstein. Salsa20. Report 2005/025, eSTREAM, ECRYPT Stream Cipher Project, Apr. 2005. Retrieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/ salsa20_p3.zip.

[Ber08] D. J. Bernstein. Response to “Slid Pairs in Salsa20 and Trivium”. pdf ﬁle released on cr.yp.to, 2008. Retrieved Oct. 23, 2011 from: http://cr.yp.to/snuffle/reslid-20080925.pdf.

[Ber11] D. J. Bernstein. Extending the Salsa20 nonce. In Symmetric Key Encryption Workshop (SKEW), Feb. 2011. Retrieved Oct. 23, 2011 from: http://skew2011.mat.dtu.dk/proceedings/Extending% 20the%20Salsa20%20nonce.pdf.

[BG05] O. Billet and H. Gilbert. Resistance of SNOW 2.0 Against Al- gebraic Attacks. In A. Menezes (ed.), CT-RSA, volume 3376 of Lecture Notes in Computer Science, pages 19–28. Springer, 2005.

[BGW99] M. Briceno, I. Goldberg, and D. Wagner. A Pedagogical Imple- mentation of A5/1, 1999. Retrieved Oct. 23, 2011 from: http: //cryptome.org/jya/a51-pi.htm.

[BHNS10] B. B. Brumley, R. M. Hakala, K. Nyberg, and S. Sovio. Consecu- tive S-box Lookups: A Timing Attack on SNOW 3G. In M. Soriano, S. Qing, and J. López (eds.), ICICS, volume 6476 of Lecture Notes in Computer Science, pages 171–185. Springer, 2010.

[BL05] A. Braeken and J. Lano. On the (Im)Possibility of Practical and Secure Nonlinear Filters and Combiners. In Preneel and Tavares [PT06], pages 159–174.

[Bon03] D. Boneh (ed.). Advances in Cryptology – CRYPTO 2003, 23rd Annual International Cryptology Conference, Santa Barbara, Cal- ifornia, USA, August 17-21, 2003, Proceedings, volume 2729 of Lecture Notes in Computer Science. Springer, 2003.

[BPR11] A. Bogdanov, B. Preneel, and V. Rijmen. Security Evalua- tion of the K2 Stream Cipher, Mar. 2011. Retrieved Oct. 23, 2011 from: http://www.cryptrec.go.jp/estimation/techrep_ id2010_2.pdf.

[BPSZ10] A. Biryukov, D. Priemuth-Schmid, and B. Zhang. Analysis of SNOW 3G; Resynchronization Mechanism. In S. K. Katsikas and P. Samarati (eds.), SECRYPT, pages 327–333. SciTePress, 2010.

[BS90] E. Biham and A. Shamir. Diﬀerential Cryptanalysis of DES- like Cryptosystems. In A. Menezes and S. A. Vanstone (eds.), Bibliography 107

CRYPTO, volume 537 of Lecture Notes in Computer Science, pages 2–21. Springer, 1990. [BS00] A. Biryukov and A. Shamir. Cryptanalytic Time/Memory/Data Tradeoffs for Stream Ciphers. In T. Okamoto (ed.), ASIACRYPT, volume 1976 of Lecture Notes in Computer Science, pages 1–13. Springer, 2000. [Buc65] B. Buchberger. Ein Algorithmus zum Auffinden der Basiselemente des Restklassenringes nach einem nulldimensionalen Polynomideal. PhD thesis, University of Innsbruck, 1965. [BW99] A. Biryukov and D. Wagner. Slide Attacks. In L. R. Knudsen (ed.), FSE, volume 1636 of Lecture Notes in Computer Science, pages 245–259. Springer, 1999. [BW00] A. Biryukov and D. Wagner. Advanced Slide Attacks. In Preneel [Pre00], pages 589–606. [Car06] C. Carlet. On the Higher Order Nonlinearities of Algebraic Immune Functions. In C. Dwork (ed.), CRYPTO, volume 4117 of Lecture Notes in Computer Science, pages 584–601. Springer, 2006. [CF08] C. Carlet and K. Feng. An Infinite Class of Balanced Functions with Optimal Algebraic Immunity, Good Immunity to Fast Alge- braic Attacks and Good Nonlinearity. In J. Pieprzyk (ed.), ASI- ACRYPT, volume 5350 of Lecture Notes in Computer Science, pages 425–440. Springer, 2008. [CKM93] D. Coppersmith, H. Krawczyk, and Y. Mansour. The Shrinking Generator. In D. R. Stinson (ed.), CRYPTO, volume 773 of Lecture Notes in Computer Science, pages 22–39. Springer, 1993. [CKPS00] N. Courtois, A. Klimov, J. Patarin, and A. Shamir. Efficient Algo- rithms for Solving Overdefined Systems of Multivariate Polynomial Equations. In Preneel [Pre00], pages 392–407. [CM03] N. Courtois and W. Meier. Algebraic Attacks on Stream Ciphers with Linear Feedback. In E. Biham (ed.), EUROCRYPT, volume 2656 of Lecture Notes in Computer Science, pages 345–359. Springer, 2003. [Coh95] F. Cohen. A Short History of Cryptography, 1995. Retrieved Oct. 23, 2011 from: http://all.net/edu/curr/ip/Chap2-1.html. [Cou02] N. Courtois. Higher Order Correlation Attacks, XL Algorithm and Cryptanalysis of Toyocrypt. In P. J. Lee and C. H. Lim (eds.), ICISC, volume 2587 of Lecture Notes in Computer Science, pages 182–199. Springer, 2002. 108 Bibliography

[Cou03] N. Courtois. Fast Algebraic Attacks on Stream Ciphers with Linear Feedback. In Boneh [Bon03], pages 176–194.

[CP02] N. Courtois and J. Pieprzyk. Cryptanalysis of Block Ciphers with Overdeﬁned Systems of Equations. In Y. Zheng (ed.), ASI- ACRYPT, volume 2501 of Lecture Notes in Computer Science, pages 267–287. Springer, 2002.

[Cro06] P. Crowley. Truncated diﬀerential cryptanalysis of ﬁve rounds of Salsa20, 2006. Retrieved Oct. 23, 2011 from: http://www.ecrypt. eu.org/stream/papersdir/073.pdf.

[CS91] V. V. Chepyzhov and B. J. M. Smeets. On A Fast Correlation Attack on Certain Stream Ciphers. In EUROCRYPT, pages 176– 185, 1991.

[CYC06] CYCOM Cypher Research Laboratories. A Brief History of Cryp- tography. Last update: January 24, 2006. Retrieved Oct. 23, 2011 from: http://www.cypher.com.au/crypto_history.htm.

[DC09] B. Debraize and I. M. Corbella. Fault Analysis of the Stream Ci- pher Snow 3G. In L. Breveglieri, I. Koren, D. Naccache, E. Oswald, and J.-P. Seifert (eds.), FDTC, pages 103–110. IEEE Computer Society, 2009.

[dCP05] C. de Cannière and B. Preneel. Trivium – A Stream Cipher Construction Inspired by Block Cipher Design Principles. Re- port 2005/030, eSTREAM, ECRYPT Stream Cipher Project, Apr. 2005. Retrieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/ stream/p3ciphers/trivium/trivium_p3.pdf.

[DGM05] D. K. Dalai, K. C. Gupta, and S. Maitra. Cryptographically Sig- niﬁcant Boolean Functions: Construction and Analysis in Terms of Algebraic Immunity. In H. Gilbert and H. Handschuh (eds.), FSE, volume 3557 of Lecture Notes in Computer Science, pages 98–111. Springer, 2005.

[DH76] W. Diﬃe and M. E. Hellman. New Directions in Cryptography. IEEE Transactions on Information Theory, 22(6):644–654, 1976.

[DR02] J. Daemen and V. Rijmen. The Design of Rijndael: AES – The Advanced Encryption Standard. Springer, 2002.

[DS09] I. Dinur and A. Shamir. Cube Attacks on Tweakable Black Box Polynomials. In A. Joux (ed.), EUROCRYPT, volume 5479 of Lecture Notes in Computer Science, pages 278–299. Springer, 2009. Bibliography 109

[DS11] I. Dinur and A. Shamir. Breaking Grain-128 with Dynamic Cube Attacks. In A. Joux (ed.), FSE, volume 6733 of Lecture Notes in Computer Science, pages 167–187. Springer, 2011.

[ECR08] ECRYPT: Network of Excellence in Cryptology, 2004-2008. Retrieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/ ecrypt1/.

[EJ02] P. Ekdahl and T. Johansson. A New Version of the Stream Cipher SNOW. In K. Nyberg and H. M. Heys (eds.), Selected Areas in Cryptography, volume 2595 of Lecture Notes in Computer Science, pages 47–61. Springer, 2002.

[EJ03] P. Ekdahl and T. Johansson. Another Attack on A5/1. IEEE Transactions on Information Theory, 49(1):284–289, 2003.

[Ekd03] P. Ekdahl. On LFSR based Stream Ciphers – Analysis and Design. PhD thesis, Lund University, 2003.

[eST08] eSTREAM: The ECRYPT Stream Cipher Project, 2004-2008. Re- trieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/stream/.

[ETS06a] ETSI/SAGE. Specification of the 3GPP Confidentiality and In- tegrity Algorithms UEA2 & UIA2. Document 2: SNOW 3G Spec- ification, version 1.1, Sept. 2006. Retrieved Oct. 23, 2011 from: http://cryptome.org/uea2-uia2/snow_3g_spec.pdf.

[ETS06b] ETSI/SAGE. Speciﬁcation of the 3GPP Conﬁdentiality and In- tegrity Algorithms UEA2 & UIA2. Document 5: Design and Evaluation Report, version 1.1, Sept. 2006. Retrieved Oct. 23, 2011 from: http://cryptome.org/uea2-uia2/uea2_design_ evaluation.pdf.

[Fau99] J.-C. Faugère. A new eﬃcient algorithm for computing Gröbner basis (F4). Journal of Pure and Applied Algebra, 139:61–88, June 1999.

[Fau02] J.-C. Faugère. A new eﬃcient algorithm for computing Gröbner bases without reduction to zero (F5). pages 75–83, New York, NY, USA, 2002. ACM Press.

[Fis08] S. Fischer. Analysis of Lightweight Stream Ciphers. PhD thesis, École Polytechnique Fédérale de Lausanne, 2008.

[FKM08] S. Fischer, S. Khazaei, and W. Meier. Chosen IV Statistical Anal- ysis for Key Recovery Attacks on Stream Ciphers. In S. Vaudenay (ed.), AFRICACRYPT, volume 5023 of Lecture Notes in Computer Science, pages 236–245. Springer, 2008. 110 Bibliography

[FM07] S. Fischer and W. Meier. Algebraic Immunity of S-Boxes and Aug- mented Functions. In A. Biryukov (ed.), FSE, volume 4593 of Lec- ture Notes in Computer Science, pages 366–381. Springer, 2007.

[FMB+06] S. Fischer, W. Meier, C. Berbain, J.-F. Biasse, and M. J. B. Robshaw. Non-randomness in eSTREAM Candidates Salsa20 and TSC-4. In R. Barua and T. Lange (eds.), INDOCRYPT, volume 4329 of Lecture Notes in Computer Science, pages 2–16. Springer, 2006.

[Gol97] J. D. Golic. Cryptanalysis of Alleged A5 Stream Cipher. In Ad- vances in Cryptology – EUROCRYPT’97, volume 1233 of Lecture Notes in Computer Science, pages 239–255. Springer-Verlag, 1997.

[Hel80] M. E. Hellman. A Cryptanalytic Time-Memory Trade-Oﬀ. IEEE Transactions on Information Theory, 26(4):401–406, July 1980.

[Hel07] M. Hell. On the Design and Analysis of Stream Ciphers. PhD thesis, Lund University, 2007.

[Hon05] J. Hong. certain pairs of key-IV pairs for Trivium. ECRYPT Discussion Forum, article created at 05:11PM on September 13, 2005. Retrieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/ stream/phorum/read.php?1,152.

[HYY+10] M. Henricksen, W.-S. Yap, C. H. Yian, S. Kiyomoto, and T. Tanaka. Side-Channel Analysis of the K2 Stream Cipher. In R. Steinfeld and P. Hawkes (eds.), ACISP, volume 6168 of Lecture Notes in Computer Science, pages 53–73. Springer, 2010.

[JJ99] T. Johansson and F. Jönsson. Fast Correlation Attacks Based on Turbo Code Techniques. In Wiener [Wie99], pages 181–197.

[JJ00] T. Johansson and F. Jönsson. Fast Correlation Attacks through Re- construction of Linear Polynomials. In M. Bellare (ed.), CRYPTO, volume 1880 of Lecture Notes in Computer Science, pages 300–315. Springer, 2000.

[Joh09] T. R. Johnson. American Cryptology during the Cold War, 1945- 1989. Book III: Retrenchment and Reform, 1972-1980, page 232. File released on September 13, 2009. Retrieved Oct. 23, 2011 from: http://cryptome.org/0001/nsa-meyer.htm.

[Kah67a] D. Kahn. The Codebreakers – The Story of Secret Writing. Macmillan Publishing, ﬁrst edition, 1967.

[Kah67b] D. Kahn. The Codebreakers – The Story of Secret Writing. Macmillan Publishing, ﬁrst edition, 1967. Bibliography 111

[Kas63] F. W. Kasiski. Die Geheimschriften und die Dechiﬀrir-Kunst (sic.). E. S. Mittler und Sohn, Berlin, 1863.

[Kel98] T. Kelly. The Myth of the Skytale. Cryptologia, 22(3):244–260, 1998.

[Ker83] A. Kerckhoﬀs. La cryptographie militaire. Journal des sciences militaires, IX:5–83, Jan. 1883.

[KJJ99] P. C. Kocher, J. Jaﬀe, and B. Jun. Diﬀerential Power Analysis. In Wiener [Wie99], pages 388–397.

[Koc96] P. C. Kocher. Timing Attacks on Implementations of Diﬃe- Hellman, RSA, DSS, and Other Systems. In N. Koblitz (ed.), CRYPTO, volume 1109 of Lecture Notes in Computer Science, pages 104–113. Springer, 1996.

[KS99] A. Kipnis and A. Shamir. Cryptanalysis of the HFE Public Key Cryptosystem by Relinearization. In Wiener [Wie99], pages 19–30.

[KTS07] S. Kiyomoto, T. Tanaka, and K. Sakurai. K2: A Stream Ci- pher Algorithm using Dynamic Feedback Control. In J. Hernando, E. Fernández-Medina, and M. Malek (eds.), SECRYPT, pages 204– 213. INSTICC Press, 2007.

[Lan06] J. Lano. Cryptanalysis and Design of Synchronous Stream Ciphers. PhD thesis, Katholieke Universiteit Leuven, 2006.

[LN97] R. Lidl and H. Niederreiter. Finite Fields, volume 20 of Encyclo- pedia of Mathematics and its Applications. Cambridge University Press, second edition, 1997.

[Mah45] A. P. Mahon. The History of Hut Eight 1939 - 1945, 1945. Re- trieved Oct. 23, 2011 from: http://www.ellsbury.com/hut8/ hut8-000.htm.

[Mas69] J. L. Massey. Shift-Register Synthesis and BCH Decoding. IEEE Transactions on Information Theory, 15(1):122–127, 1969.

[Max06] A. Maximov. Some Words on Cryptanalysis of Stream Ciphers. PhD thesis, Lund University, 2006.

[MB07] A. Maximov and A. Biryukov. Two Trivial Attacks on Trivium. In C. M. Adams, A. Miri, and M. J. Wiener (eds.), Selected Areas in Cryptography, volume 4876 of Lecture Notes in Computer Science, pages 36–55. Springer, 2007. 112 Bibliography

[MCP07] C. McDonald, C. Charnes, and J. Pieprzyk. Attacking Bivium with MiniSat. Report 2007/040, eSTREAM, ECRYPT Stream Cipher Project, Apr. 2007. Retrieved Oct. 23, 2011 from: http://www. ecrypt.eu.org/stream/papersdir/2007/040.pdf.

[MG90] M. J. Mihaljevic and J. D. Golic. A Fast Iterative Algorithm For A Shift Register Initial State Reconstruction Given The Noisy Out- put Sequence. In J. Seberry and J. Pieprzyk (eds.), AUSCRYPT, volume 453 of Lecture Notes in Computer Science, pages 165–175. Springer, 1990.

[Mil01] A. R. Miller. The Cryptographic Mathematics of Enigma. Technical report, National Security Agency, 2001. Re- trieved Oct. 23, 2011 from: http://www.nsa.gov/about/ _files/cryptologic_heritage/publications/wwii/engima_ cryptographic_mathematics.pdf.

[MPC04] W. Meier, E. Pasalic, and C. Carlet. Algebraic Attacks and De- composition of Boolean Functions. In C. Cachin and J. Camenisch (eds.), EUROCRYPT, volume 3027 of Lecture Notes in Computer Science, pages 474–491. Springer, 2004.

[MS89] W. Meier and O. Staﬀelbach. Fast Correlation Attacks on Certain Stream Ciphers. Journal of Cryptology, 1(3):159–176, 1989.

[MS94] W. Meier and O. Staﬀelbach. The Self-Shrinking Generator. In A. D. Santis (ed.), EUROCRYPT, volume 950 of Lecture Notes in Computer Science, pages 205–214. Springer, 1994.

[MvOV96] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Hand- book of Applied Cryptography. Publisher CRC Press, ﬁrst edition, 1996.

[NTW10] K. Nohl, E. Tews, and R.-P. Weinmann. Cryptanalysis of the DECT Standard Cipher. In S. Hong and T. Iwata (eds.), FSE, volume 6147 of Lecture Notes in Computer Science, pages 1–18. Springer, 2010.

[NW06] K. Nyberg and J. Wallén. Improved Linear Distinguishers for SNOW 2.0. In M. J. B. Robshaw (ed.), FSE, volume 4047 of Lec- ture Notes in Computer Science, pages 144–162. Springer, 2006.

[Pen96] W. T. Penzhorn. Correlation Attacks on Stream Ciphers: Comput- ing Low-Weight Parity Checks Based on Error-Correcting Codes. In D. Gollmann (ed.), FSE, volume 1039 of Lecture Notes in Com- puter Science, pages 159–172. Springer, 1996. Bibliography 113

[Pre00] B. Preneel (ed.). Advances in Cryptology – EUROCRYPT 2000, International Conference on the Theory and Application of Cryp- tographic Techniques, Bruges, Belgium, May 14-18, 2000, Proceed- ing, volume 1807 of Lecture Notes in Computer Science. Springer, 2000.

[PS11] D. Priemuth-Schmid. Attacks on Simpliﬁed Versions of K2. In P. Bouvry, M. A. Kłopotek, F. Leprevost, M. Marciniak, A. Mykowiecka, and H. Rybiński (eds.), SIIS, volume 7053 of Lec- ture Notes in Computer Science, pages 117–127. Springer, 2011.

[PSB08] D. Priemuth-Schmid and A. Biryukov. Slid Pairs in Salsa20 and Trivium. In D. R. Chowdhury, V. Rijmen, and A. Das (eds.), INDOCRYPT, volume 5365 of Lecture Notes in Computer Science, pages 1–14. Springer, 2008.

[PT06] B. Preneel and S. E. Tavares (eds.). Selected Areas in Cryptog- raphy, 12th International Workshop, SAC 2005, Kingston, ON, Canada, August 11-12, 2005, Revised Selected Papers, volume 3897 of Lecture Notes in Computer Science. Springer, 2006.

[Rad06] H. Raddum. Cryptanalytic Results on Trivium. Report 2006/039, eSTREAM, ECRYPT Stream Cipher Project, Mar. 2006. Re- trieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/stream/ papersdir/2006/039.ps.

[Rec04] C. Rechberger. Side-Channel Analysis of Stream Ciphers, 2004.

[RSA78] R. L. Rivest, A. Shamir, and L. M. Adleman. A Method for Ob- taining Digital Signatures and Public-Key Cryptosystems. Com- munications of the ACM, 21(2):120–126, 1978.

[RSA83] R. L. Rivest, A. Shamir, and L. M. Adleman. Cryptographic Com- munications System and Method. U.S. Patent 4,405,829, 1983. Re- trieved Oct. 23, 2011 from: http://www.google.com/patents? vid=4405829.

[Sha45] C. E. Shannon. A Mathematical Theory of Cryptography. Memo- randum MM 45-110-02, Bell Laboratories, 114 pp. + 25 ﬁgs. Clas- siﬁed report, superseded by the following paper, Sept. 1945.

[Sha48] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379–423 and 623–656, July and Oct. 1948.

[Sha49] C. E. Shannon. Communication Theory of Secrecy Systems. Bell System Technical Journal, 28(4):656–715, 1949. 114 Bibliography

[Sie84] T. Siegenthaler. Correlation-Immunity of Nonlinear Combining Functions for Cryptographic Applications. IEEE Transactions on Information Theory, 30(5):776–780, 1984. [Sie85] T. Siegenthaler. Decrypting a Class of Stream Ciphers Using Ciphertext Only. IEEE Transactions on Computers, 34(1):81–85, 1985. [Sin99] S. Singh. The Code Book. Fourth Estate, first edition, 1999. [TK07] M. S. Turan and O. Kara. Linear Approximations for 2-round Trivium. In SASC 2007 – The State of the Art of Stream Ciphers, 2007. Retrieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/ stream/papersdir/2007/008.pdf. [TSK+07] Y. Tsunoo, T. Saito, H. Kubo, T. Suzaki, and H. Nakashima. Differential Cryptanalysis of Salsa20/8. In SASC 2007 – The State of the Art of Stream Ciphers, 2007. Retrieved Oct. 23, 2011 from: http://www.ecrypt.eu.org/stream/papersdir/2007/010.pdf. [Ver19] G. S. Vernam. SECRET SIGNALING SYSTEM. U.S. Patent 1,310,719, 1919. Retrieved Oct. 23, 2011 from: http://www. google.com/patents?vid=1310719. [Ver26] G. S. Vernam. Cipher Printing Telegraph Systems For Secret Wire and Radio Telegraphic Communications. Journal of American In- stitute of Electrical Engineers, 45:109–115, 1926. [Vie07] M. Vielhaber. Breaking ONE.Fivium by AIDA an Algebraic IV Differential Attack. Report 2007/413, Cryptology ePrint Archive, Oct. 2007. Retrieved Oct. 23, 2011 from: http://eprint.iacr. org/2007/413.pdf. [WBdC03] D. Watanabe, A. Biryukov, and C. de Cannière. A Distinguishing Attack of SNOW 2.0 with Linear Masking Method. In M. Matsui and R. J. Zuccherato (eds.), Selected Areas in Cryptography, volume 3006 of Lecture Notes in Computer Science, pages 222–233. Springer, 2003. [Wie99] M. J. Wiener (ed.). Advances in Cryptology – CRYPTO ’99, 19th Annual International Cryptology Conference, Santa Barbara, Cal- ifornia, USA, August 15-19, 1999, Proceedings, volume 1666 of Lecture Notes in Computer Science. Springer, 1999. [Wik05] Wikimedia Commons. Enigma Machine at the Imperial War Mu- seum, London. Image photographed and published by Karsten Sperling, 2005. Retrieved Oct. 23, 2011 from: http://commons. wikimedia.org/wiki/File:EnigmaMachineLabeled.jpg. Bibliography 115

[Wik07] Wikimedia Commons. Skytale. Image published by user Luringen, 2007. Retrieved Oct. 23, 2011 from: http://commons.wikimedia. org/wiki/File:Skytale.png.

[Wik11] Wikipedia, The Free Encyclopedia. English and German texts concerning miscellaneous topics, 2011. Available Oct. 23, 2011 at: http://en.wikipedia.org/wiki/Main_Page and http://de. wikipedia.org/wiki/Hauptseite.

[WSLM05] D. Whiting, B. Schneier, S. Lucks, and F. Muller. Phelix – Fast Encryption and Authentication in a Single Cryptographic Prim- itive. Report 2005/020, eSTREAM, ECRYPT Stream Cipher Project, Apr. 2005. Retrieved Oct. 23, 2011 from: http://www. ecrypt.eu.org/stream/p2ciphers/phelix/phelix_p2.pdf. 116 Bibliography Appendix A

Salsa20 System of Equations for a Slid Pair

Word equations given by the equations S0 = F (S) and X0 = F (X).

0 0 k0 = k3 ⊕ ((σ0 + k5) ≪ 7) x1 = x4 ⊕ ((x0 + x12) ≪ 7) 0 0 0 0 k1 = c0 ⊕ ((k0 + σ0) ≪ 9) x2 = x8 ⊕ ((x1 + x0) ≪ 9) 0 0 0 0 0 0 k2 = k5 ⊕ ((k1 + k0) ≪ 13) x3 = x12 ⊕ ((x2 + x1) ≪ 13) 0 0 0 0 0 σ0 = σ0 ⊕ ((k2 + k1) ≪ 18) x0 = x0 ⊕ ((x3 + x2) ≪ 18) 0 0 n0 = c1 ⊕ ((σ1 + k0) ≪ 7) x6 = x9 ⊕ ((x5 + x1) ≪ 7) 0 0 0 0 n1 = k6 ⊕ ((n0 + σ1) ≪ 9) x7 = x13 ⊕ ((x6 + x5) ≪ 9) 0 0 0 0 0 0 k3 = k0 ⊕ ((n1 + n0) ≪ 13) x4 = x1 ⊕ ((x7 + x6) ≪ 13) 0 0 0 0 0 σ1 = σ1 ⊕ ((k3 + n1) ≪ 18) x5 = x5 ⊕ ((x4 + x7) ≪ 18) 0 0 k4 = k7 ⊕ ((σ2 + n0) ≪ 7) x11 = x14 ⊕ ((x10 + x6) ≪ 7) 0 0 0 0 c0 = k1 ⊕ ((k4 + σ2) ≪ 9) x8 = x2 ⊕ ((x11 + x10) ≪ 9) 0 0 0 0 0 0 c1 = n0 ⊕ ((c0 + k4) ≪ 13) x9 = x6 ⊕ ((x8 + x11) ≪ 13) 0 0 0 0 0 σ2 = σ2 ⊕ ((c1 + c0) ≪ 18) x10 = x10 ⊕ ((x9 + x8) ≪ 18) 0 0 k5 = k2 ⊕ ((σ3 + k4) ≪ 7) x12 = x3 ⊕ ((x15 + x11) ≪ 7) 0 0 0 0 k6 = n1 ⊕ ((k5 + σ3) ≪ 9) x13 = x7 ⊕ ((x12 + x15) ≪ 9) 0 0 0 0 0 0 k7 = k4 ⊕ ((k6 + k5) ≪ 13) x14 = x11 ⊕ ((x13 + x12) ≪ 13) 0 0 0 0 0 σ3 = σ3 ⊕ ((k7 + k6) ≪ 18) x15 = x15 ⊕ ((x14 + x13) ≪ 18)

117 118 Appendix A. Salsa20 System of Equations for a Slid Pair

Word equations given by the feedforward for the ﬁrst keystream words and for the second keystream words.

0 0 z0 = x0 + σ0 z0 = x0 + σ0 0 0 0 z1 = x1 + k0 z1 = x1 + k0 0 0 0 z2 = x2 + k1 z2 = x2 + k1 0 0 0 z3 = x3 + k2 z3 = x3 + k2 0 0 0 z4 = x4 + k3 z4 = x4 + k3 0 0 z5 = x5 + σ1 z5 = x5 + σ1 0 0 0 z6 = x6 + n0 z6 = x6 + n0 0 0 0 z7 = x7 + n1 z7 = x7 + n1 0 0 0 z8 = x8 + c0 z8 = x8 + c0 0 0 0 z9 = x9 + c1 z9 = x9 + c1 0 0 z10 = x10 + σ2 z10 = x10 + σ2 0 0 0 z11 = x11 + k4 z11 = x11 + k4 0 0 0 z12 = x12 + k5 z12 = x12 + k5 0 0 0 z13 = x13 + k6 z13 = x13 + k6 0 0 0 z14 = x14 + k7 z14 = x14 + k7 0 0 z15 = x15 + σ3 z15 = x15 + σ3 Appendix B

SNOW 3G Simplify the Equation with two S-boxes

We want to simplify the equation

out out ∆ S2(∆k1) ⊕ ∆ S1(∆k2) = ∆k4 .

The main difficulty is that S1 and S2 use the same MixColumns matrix but over two different fields GF (28). We start with rewriting this equation in byte notation as

 out   out    S (∆k(0)) S (∆k(0)) (0) ∆ Q 1 ∆ R 2 ∆k4  out   out   (1)   (1)   (1)   ∆ SQ(∆k1 )   ∆ SR(∆k2 )   ∆k4  MC2 · out ⊕ MC1 · out =   .  (2)   (2)   ∆k(2)   ∆ SQ(∆k1 )   ∆ SR(∆k2 )  4 out (3) out (3) ∆k(3) ∆ SQ(∆k1 ) ∆ SR(∆k2 ) 4

−1 Then multiplying this equation with the inverse matrix MC1 , we get

  out   out    S (∆k(0)) S (∆k(0)) (0) ∆ Q 1 ∆ R 2 ∆k4   out   out    (1)   (1)   (1)  −1   ∆ SQ(∆k1 )  ∆ SR(∆k2 ) −1 ∆k4  MC · MC2 · out ⊕ out = MC · . 1   (2)   (2)  1 ∆k(2)    ∆ SQ(∆k1 )  ∆ SR(∆k2 ) 4 out (3) out (3) ∆k(3) ∆ SQ(∆k1 ) ∆ SR(∆k2 ) 4

If we expand the matrix multiplications and have a look at the byte vectors it shows that the first entry of the first vector contains the byte out 0 ∆ SQ(∆k1) and a byte polynomial containing only the most significant out msb bits of all four values ∆ SQ. We denote this polynomial with p0 . The msb other three rows have similar structures but with different polynomials pi

119 120 Appendix B. SNOW 3G Simplify the Equation with two S-boxes

(i = 1, 2, 3). Therefore, we can rewrite the equation to     out (0) out (0)    (0)  ∆ SR(∆k ) ∆ SQ(∆k ) msb ∆k 2 1 p0 4  out (1)   out (1)   (1)   ∆ S (∆k )   ∆ S (∆k )   pmsb   ∆k   R 2  =  Q 1  ⊕  1  ⊕ MC−1 · 4 .  out (2)   out (2)   msb  1  (2)  p2  ∆k   ∆ SR(∆k2 )   ∆ SQ(∆k1 )  4 msb (3) out (3) out (3) p3 ∆k ∆ SR(∆k2 ) ∆ SQ(∆k1 ) 4

out 0 By m0 we denote the most signiﬁcant bit of the value ∆ SQ(∆k1) and out 1 with m1 the most signiﬁcant bit of the value ∆ SQ(∆k1) as well as m2 for out out 2 3 msb ∆ SQ(∆k1) and m3 for ∆ SQ(∆k1). Then the polynomials pi i = 0,..., 3 are

msb 7 6 5 4 p0 = (m1 ⊕ m3)x + (m0 ⊕ m1)x + (m2 ⊕ m3)x + (m1 ⊕ m2)x 2 + (m0 ⊕ m2)x + (m1 ⊕ m2)x + (m0 ⊕ m1 ⊕ m2 ⊕ m3)

msb 7 6 5 4 p1 = (m0 ⊕ m2)x + (m1 ⊕ m2)x + (m0 ⊕ m3)x + (m2 ⊕ m3)x 2 + (m1 ⊕ m3)x + (m2 ⊕ m3)x + (m0 ⊕ m1 ⊕ m2 ⊕ m3)

msb 7 6 5 4 p2 = (m1 ⊕ m3)x + (m2 ⊕ m3)x + (m0 ⊕ m1)x + (m0 ⊕ m3)x 2 + (m0 ⊕ m2)x + (m0 ⊕ m3)x + (m0 ⊕ m1 ⊕ m2 ⊕ m3)

msb 7 6 5 4 p3 = (m0 ⊕ m2)x + (m0 ⊕ m3)x + (m1 ⊕ m2)x + (m0 ⊕ m1)x 2 + (m1 ⊕ m3)x + (m0 ⊕ m1)x + (m0 ⊕ m1 ⊕ m2 ⊕ m3) .