Towards Post-Quantum Bitcoin
Side-Channel Analysis of Bimodal Lattice Signatures

Leon Groot Bruinderink
Email: [email protected]
Student-ID: 0682427

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Industrial and Applied Mathematics

Supervisors:
prof.dr. Tanja Lange (TU/e)
dr. Andreas Hülsing (TU/e)
dr. Lodewijk Bonebakker (ING)

January 2016

Acknowledgements

This thesis is the result of many months of work, both at my internship company ING and at my university, TU/e. Before we rush into the contents of this thesis, I would like to take a moment to thank the people who made it possible. First of all, I would like to thank Tanja Lange (TU/e) and Lodewijk Bonebakker (ING) for supervising this project and introducing me to the fascinating aspects of their work. I am very grateful for the freedom and guidance they offered me throughout this period. Second, I would like to thank my other TU/e supervisor, Andreas Hülsing. It was always possible for me to ask questions and discuss my thesis. When I was stuck, he inspired me to continue my search for answers. I would also like to thank these three people, together with Jan Draisma, for taking part in my graduation committee. I am also very thankful to Daniel J. Bernstein for attending the meetings with Tanja and Andreas, and for sharing his knowledge. Also thanks to Thijs Laarhoven and Benne de Weger for discussing the unknowns of this thesis. Last but not least, I would like to thank my girlfriend, my family and my friends for their personal support and trust, during this period and all the years before. It was never hard to clear my head and just enjoy spending time with them.

Abstract

In this thesis, we investigate Bitcoin's long-term vision for the cryptographic protocols it relies on. The biggest threat in the near future is a large quantum computer, able to forge the digital signatures used by Bitcoin to secure transactions. When a large quantum computer arises, Bitcoin has to switch to post-quantum signature schemes, among which the Bimodal Lattice Signature Scheme (BLISS) seems most promising. However, it is unclear whether these signatures are vulnerable to side-channel attacks, which are mountable on actual implementations. An important step in BLISS is sampling a discrete-Gaussian-distributed integer, which is not straightforward to do. We investigated the two sampling algorithms most used in practice, which both rely on table look-ups. We show that both methods are vulnerable to cache-attacks that lead to extraction of the secret key, and we provide experimental results as verification. This means we need to re-invent ways to sample a discrete Gaussian, or implement current methods more securely, before the scheme is ready for real-world deployment.

Contents

List of Algorithms

List of Tables

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Our Contributions
  1.3 Roadmap

2 The Security of the Blockchain
  2.1 Introduction to Bitcoin
  2.2 One-Way Functions and Hash-Functions
  2.3 Proof-of-Work and Hash-Chaining
  2.4 Double Spending and 51% Computational Security
  2.5 Adjustments for Post-Quantum Security
  2.6 Conclusion

3 Digital Signature Schemes
  3.1 Public-Key Cryptography
  3.2 Properties of Digital Signature Schemes
  3.3 RSA Signatures
  3.4 Elliptic Curve Signatures
  3.5 Factorization and Discrete Log with Shor's Algorithm

4 Hash-Based Signature Schemes
  4.1 One-Time Signature Schemes
  4.2 Merkle Signature Schemes
  4.3 Practicality Issues

5 Introduction to Lattices
  5.1 Notations
  5.2 Definitions and Bases
  5.3 Lattice Basis Reduction
  5.4 Hard Lattice-Problems
  5.5 A First Attempt of Lattice-Based Signatures

6 Lattice-Based Signatures In Practice
  6.1 More Hard Lattice Problems: SIS and LWE
  6.2 BLISS: Bimodal Lattice Signature Scheme
  6.3 Gaussian Sampling
    6.3.1 The Discrete Gaussian Distribution
    6.3.2 Rejection Sampling
    6.3.3 Cumulative Distribution Table
  6.4 Lattice Implementations Via NTRU Lattices
  6.5 Parameter Suggestions For BLISS

7 Side-Channel Attacks
  7.1 Introduction
  7.2 Timing Attacks
  7.3 Cache-Attacks
  7.4 Countermeasures

8 Cache-Attacks on BLISS
  8.1 Intuition behind the Cache-Attacks
  8.2 Cache-Attack Model
  8.3 Cache-Attack 1: CDT Sampling
    8.3.1 Modified CDT Sampling with Acceleration Table
    8.3.2 Cache-Attack Weaknesses
    8.3.3 Exploiting the Weakness
    8.3.4 Extracting the Secret Key
    8.3.5 Complexity Analysis
  8.4 Cache-Attack 2: Rejection Sampling
    8.4.1 Modified Rejection Sampling with Exponential Table
    8.4.2 Cache-Attack Weakness and Exploitation
    8.4.3 Extracting the Secret Key
    8.4.4 Complexity Analysis
  8.5 Experiments
  8.6 Countermeasures
  8.7 A Short Note on Timing Attacks

9 Summary
  9.1 Conclusions
  9.2 Future Work

Bibliography

Appendix

A Cache Weaknesses for Suggested Parameter Sets

List of Algorithms

1 RSA Key Generation
2 RSA Signing
3 RSA Verification
4 Elliptic Curve Key Generation
5 Elliptic Curve DSA Signing
6 Elliptic Curve DSA Verification
7 LOTSS Key Generation
8 LOTSS Signing
9 LOTSS Verification
10 Merkle Key Generation
11 Merkle Signing
12 Merkle Verification
13 LLL Lattice Basis Reduction
14 GGH Key Generation
15 GGH Signing
16 GGH Verification
17 BLISS Key Generation
18 BLISS Signing
19 BLISS Verification
20 Basic Rejection Sampling
21 CDT Sampling
22 Square-and-Multiply Algorithm
23 CDT Sampling with Acceleration Table
24 Cache-Attack on BLISS with CDT Sampling
25 Rejection Sampling with Exponential Table
26 Cache-Attack on BLISS with Rejection Sampling

List of Tables

1 Parameter Suggestions for BLISS
2 Visualization of Intersection Weakness
3 Visualization of Jump Weakness
4 Experimental Results Cache-Attacks on BLISS
5 Table of Cache-Line Analysis BLISS-0
6 Cache Weaknesses for BLISS-0
7 Table of Cache-Line Analysis BLISS-I
8 Cache Weaknesses for BLISS-I
9 Table of Cache-Line Analysis BLISS-II
10 Cache Weaknesses for BLISS-II
11 Table of Cache-Line Analysis BLISS-III
12 Cache Weaknesses for BLISS-III
13 Table of Cache-Line Analysis BLISS-IV
14 Cache Weaknesses for BLISS-IV

List of Figures

1 Secp256k1
2 Merkle Tree
3 Merkle Signature
4 Merkle Signature Verification
5 Lattice Spanned By Two Vectors
6 Lattice With Two Different Bases
7 Visualization of LLL basis reduction
8 The Shortest Vector Problem
9 The Closest Vector Problem
10 Attack on GGH Signature Scheme
11 Discrete Gaussian distribution
12 Visualization of Cache Memory
13 Prime + Probe cache attack
14 Visualization of Cache-Attack on RSA
15 CDT Sampling with Acceleration Table
16 Biased Requirement
17 Weight Function Rejection Sampling

1 Introduction

1.1 Motivation

With the introduction of Bitcoin in 2009 [19], by somebody using the pseudonym Satoshi Nakamoto, a new alternative payment system was offered to our society. It is completely decentralized: no trusted third party, like a bank, is needed to validate transactions. Anyone with a computer and internet access is able to set up an account and start making payments. By now, many banks [13] are also experimenting with this new, decentralized system for handling their international payments. Setting up a new financial system for banks, however, takes a lot of time. It requires a thorough analysis of security threats, as well as many political agreements. This is the reason why these systems are not changed often, and have in essence not changed since the first digital version of the transaction system. We will examine future security threats for Bitcoin, as this can add value to the long-term vision that all banks have to agree upon.

The reason why decentralized payment systems like Bitcoin are called crypto-currencies is that without cryptography these systems would not exist. An important aspect is the use of digital signature schemes: they provide authentication of users and integrity of transactions. The security of these signature schemes is based on hard mathematical problems, like factorization or the discrete logarithm problem. However, researchers believe it is very likely that in 10 to 15 years there will be a large quantum computer, able to solve these problems quickly and thus break the signature schemes based on them. This means we have to base the security of signature schemes on other mathematical problems, which remain hard for quantum computers. Post-quantum cryptography refers to cryptosystems which are thought to be secure against an attack by a quantum computer. This thesis will focus on lattice-based signature schemes, and in particular the Bimodal Lattice Signature Scheme (BLISS) [8], a highly optimized scheme introduced by Ducas et al. However, it is not well understood how the security of lattice-based signature schemes is affected by so-called side-channel attacks (SCA). These attacks use physical information, like power consumption, timing information or memory access patterns, to break the security of cryptographic implementations. The motivation to examine the possibilities of these attacks is to narrow the gap between theoretically secure post-quantum cryptography and practically secure implementations thereof. This is the main objective of this thesis.

1.2 Our Contributions

This thesis slowly builds up to the final section on side-channel attacks on BLISS. There are two main contributions, in the shape of two concrete side-channel attacks on an important step in the BLISS signature scheme: the discrete Gaussian sampler. These attacks exploit cache-memory access patterns and are therefore called cache-attacks. Both attacks are capable of extracting the secret key and thus breaking the signature scheme. The results include an experiment section on a modeled version of the attacks. Last, we discuss the possibility of remote timing attacks and we conjecture them to be possible.

1.3 Roadmap

The thesis begins with a journey through the cryptographic aspects of the Bitcoin protocol. The "common denominator" in these first two chapters is the (lack of) security against quantum computers. First, chapter 2 describes the security mechanisms of Bitcoin's Blockchain: the public ledger of all transactions. It explains the main building block for decentralization of trust (hash-functions) and the influence of Grover's algorithm, one of the two major results of quantum computing related to cryptography. In chapter 3, digital signatures are introduced and their importance in Bitcoin is explained. Two widely-used digital signature schemes, RSA and elliptic curve DSA, are introduced, and a brief explanation is given why these systems are broken by Shor's algorithm, the other major result of quantum computing related to cryptography. From this point on, it will be clear that we need to switch to post-quantum signature schemes. Chapter 4 gives the first example of such a scheme: hash-based signature schemes. We also explain briefly why these schemes have some practical issues as a replacement for the digital signatures used in Bitcoin. Chapter 5 introduces the concept of lattices, with two hard mathematical lattice problems. These problems can be used as the basis for a digital signature scheme. However, at the end of this chapter we show that it is not that straightforward to use them. In chapter 6, two additional hard lattice problems are given, which are practical for lattice-based signature schemes. BLISS is also introduced in this chapter, together with the discrete Gaussian sampler step. The remaining part of the thesis focuses on the main objective and contributions. In chapter 7, side-channel attacks are introduced. Two possible side-channel attacks against RSA are given, based on timing and cache information. We end the chapter with some general countermeasures. In chapter 8, our two main contributions are given. We briefly summarize two practical algorithms for a discrete Gaussian sampler and show their weaknesses against cache-attacks. Experimental results and countermeasures are discussed, and we briefly discuss the possibility of remote timing attacks. Finally, in chapter 9 we end the thesis with conclusions, recommendations and future work. Some open questions are discussed as well.

2 The Security of the Blockchain

Bitcoin has proven to be a very robust system, as it remains unbroken. We will briefly introduce the security mechanisms of the transaction ledger, the Blockchain, in this chapter. The chapter ends with a discussion of how this security is affected by quantum computers.

2.1 Introduction to Bitcoin

Bitcoin is a decentralized payment system, enabling secure payments between users without using a trusted third party (a bank). Thousands of special Bitcoin users, so-called miners, take care of the security of the whole system and (possibly) get rewarded for doing so. The miners also agree on which transactions are to be included in the Blockchain: the all-time transaction ledger. All transactions ever made are saved in the Blockchain and it is nearly impossible for a malicious entity to make changes. Each Bitcoin miner holds a copy of this Blockchain and communicates with other miners about changes. The reason why we call Bitcoin a crypto-currency is that the security of transactions and the Blockchain is provided by cryptographic techniques. The idea behind the security mechanism of the Blockchain is explained in this section.

2.2 One-Way Functions and Hash-Functions

To understand why Bitcoin is a secure, decentralized payment system, one must know about so-called one-way functions: a function H for which, given input x, it is very easy to calculate the output H(x), but given an output H(x) it is very hard to find any x with that output. The functions used in cryptography and in Bitcoin are special one-way functions, called (cryptographic) hash-functions, where input x can be any string, of any length, and output H(x) is a fixed-length string of bits. All users of Bitcoin use the hash-function SHA-256. A secure hash-function H has the following properties:

Pre-Image Resistance A hash-function is pre-image resistant if it is a one-way function: given output H(x), it is hard to find any x with output H(x).

Second Pre-Image Resistance A hash-function is second pre-image resistant if, given any x, it is hard to find any y ≠ x with the same hash-output: H(x) = H(y).

Collision Resistance A hash-function is collision resistant if it is hard to find any pair x ≠ y with the same hash-output: H(x) = H(y).

These properties are very important for Bitcoin and will be explained later. Let us assume from now on that the input and output of a hash-function are bits. In cryptography, we assume a (cryptographic) hash-function behaves like a random function, so that is what we will use in this analysis. Suppose we put (partial) restrictions on the output H(x) and ask to find an x which gives a correct output. This is asking for a (partial) pre-image. For instance, we want output H(x) to start with a zero bit and we do not care about the remaining part: H(x) = 0⋯

The easiest way of finding such an x is by brute-force search: start with any x1 and check if H(x1) starts with a zero. If not, pick a different x2 and check again. Continue like this until you have a correct x where H(x) starts with a zero. On average, you would only have to check 2 different x's to find a matching one! To see why, remember we may assume H behaves like a random function. This means that we can model H such that for each position in the output sequence, we flip a coin and pick bit 1 if it is heads and 0 if it is tails. On average, one has to flip a coin twice to see tails, which is why on average we have to check 2 different x's to find an H(x) that starts with a zero. To make this pre-image search harder, we ask to find x such that H(x) starts with many zeros: 10 zeros, or 20! On average we would have to check 2^10, or 2^20, different inputs x to find a correct one in those cases: we need 10 or 20 tails in a row. For larger numbers it will take some time to find a suitable x, even for a modern computer.
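As a small illustration, here is a minimal Python sketch of this brute-force search (our own example, not code from the Bitcoin project; it uses SHA-256 from hashlib and tries 8-byte counters as candidate inputs):

```python
import hashlib
from itertools import count

def find_partial_preimage(n_zero_bits: int) -> bytes:
    """Brute-force search for an input whose SHA-256 digest starts
    with n_zero_bits zero bits (a partial pre-image)."""
    for i in count():
        x = i.to_bytes(8, "big")
        digest = hashlib.sha256(x).digest()
        # Leading zero bits mean the digest, read as an integer,
        # lies below 2^(256 - n_zero_bits).
        if int.from_bytes(digest, "big") < (1 << (256 - n_zero_bits)):
            return x

x = find_partial_preimage(16)          # expected ~2^16 = 65536 trials
print(hashlib.sha256(x).hexdigest())   # starts with four hex zeros
```

This comparison of the digest against an integer bound is exactly the "output below a target value T" formulation that Bitcoin miners use, as described in the next section.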

2.3 Proof-of-Work and Hash-Chaining

This partial pre-image search for inputs, with an output that starts with many zeros, is exactly what Bitcoin miners are doing. In practice, the miners search for an output that, when interpreted as an integer, is below a target value T they all agree on, but this has the same success probability as finding an output that starts with many zeros. When a miner has found a matching input, he/she broadcasts the input to the network to show it is valid: everyone can check it easily by just inserting the input into the hash-function and checking that the output starts with a lot of zeros. The input is enough to prove that a miner has found a correct one: a proof-of-work. Finding this pre-image is so hard that doing it on your own will take, on average, many years. It is like winning the lottery. However, since there are thousands of miners, the chance that someone finds a pre-image quickly becomes higher and higher: someone will eventually win the lottery. This is the basic idea behind the functionality of Bitcoin's Blockchain.

As a payment system, users of Bitcoin are able to make transactions to each other. For now, assume users are able to make transactions securely and that they broadcast their transactions to the Bitcoin network. The Bitcoin miners, the ones that search for a specific pre-image of a hash-function, collect all these transactions. If we assume the Blockchain, the ledger containing previous valid transactions, is secure, the miners need a way to add new transactions while keeping the Blockchain secure. A valid new block, agreed to by the Bitcoin community, contains a set of new transactions, some space for random input and, very importantly, the hash of the previous block. Miners are asked for a partial pre-image: they pick random inputs for this free space until the hash of the new block starts with many zeros, and broadcast the input to the network once they find it. The Bitcoin network appends this newly found block to the Blockchain and the new transactions become valid. By including the hash of the previous block, this previous block becomes better protected. Changes to content in the previous block will change the hash of that block, but also the hash-output of all blocks after it. This is why we call it the Bitcoin Blockchain: the complete set of all transactions ever done is chained together using hash-functions.

Now suppose two miners find different valid blocks and broadcast them to the network. The blocks cannot be chained after each other, because they both used the same hash of the last block. It means there is a fork in the chain: there are now two valid last blocks. Miners can pick either of them as the last block and try to build a new one. Having fork after fork is very unlikely, which means eventually one fork will be the "longest". Longest, in this case, means the chain with the highest number of blocks. The Bitcoin community will always follow the longest chain known in the network: it is a universal consensus rule everyone follows. So eventually, the shorter part of the fork will be dropped.

2.4 Double Spending and 51% Computational Security

The biggest accomplishment of Satoshi Nakamoto was to solve the double-spending problem in a trustless network. When do I actually know, or have enough trust, that I have received my money and am able to spend it again, without a bank or trusted third party telling me so? When a transaction is included in a valid block, which is included in the Blockchain, it becomes known to the whole network as a valid transaction. Now suppose a malicious Bitcoin participant wants to undo a transaction he made earlier: he wants to double-spend his money. To do this, he has to make changes to the block containing his transaction, and therefore change the hash-output of that block. So he has to find a new valid block, removing his previous transaction. But since the hash-output of the new block changed, all blocks after it also changed. He has to find a completely new chain of blocks to remove his previous transaction. While the malicious user was trying to find a new block without his earlier transaction, the honest Bitcoin users were also looking for new blocks to append to the current chain. The malicious user has to outrun the honest chain by finding valid blocks faster, since the network always follows the longest chain. This requires more computational power than all honest Bitcoin users together. To be more precise: when a malicious user, or group of users, has 51% of the total computational power of the network, it is highly likely that they can outrun the honest Bitcoin chain and make fraudulent transactions. However, since there are thousands of miners with very expensive equipment, chances are very small that one group is able to get this amount of power. Furthermore, there is an economic incentive to be an honest user when you have this amount of computational power. When miners find a new block, they are rewarded with bitcoins. A user with 51% of the computational power will find about half of all new blocks found by the whole network, which means he is rewarded with a lot of bitcoins. These rewards are probably higher than the gains from fraudulent transactions.

2.5 Adjustments for Post-Quantum Security

Grover's algorithm [11] is one of the two big breakthroughs of quantum computing but, as far as is known, it is not a very severe threat to cryptography. Roughly speaking, this algorithm is able to brute-force search a solution space in time proportional to the square root of its size. For example: when there are M words in a dictionary, Grover's algorithm is able to find any word in about √M steps. This means that a miner with a quantum computer would be much faster than those without: he will find a pre-image with N leading zeros in the same time another miner finds one with N/2 leading zeros. A possible thing for the community to do would be to switch to bigger hash-functions, like SHA-512, and increase the amount of proof-of-work. This would make it harder for a quantum computer as well. But the best thing to do would be to switch to a different consensus scheme. There are many alternative coins and Blockchains with other consensus algorithms that do not rely on this 51% computational security. Other consensus schemes, like Ripple [24], rely even more on digital signature schemes, which are introduced in the next chapter.

2.6 Conclusion

To conclude this chapter: the current consensus algorithm of Bitcoin would, in theory, allow a quantum computer to falsify the Blockchain. Other consensus schemes might be better to use in this case. The hash-functions, however, remain unbroken: finding a complete pre-image of SHA-256 would still be an infeasible task, even for a quantum computer. This is why this thesis focuses on the other cryptographic part of Bitcoin: the threats posed by the other breakthrough of quantum computing are much higher than those of Grover's algorithm. This will be explained in the next chapter.

3 Digital Signature Schemes

In this chapter we describe the fundamentals of digital signatures. These signatures are of significant importance for Bitcoin. We give two examples, widely used on the internet and in Bitcoin itself. At the end of this chapter, we briefly discuss Shor's algorithm, the first and most influential breakthrough of quantum computing, and explain why these systems are broken once a quantum computer is large enough. The remaining part of this thesis is devoted to post-quantum digital signatures.

3.1 Public-Key Cryptography

One of the fundamental parts of Bitcoin is the usage of public-key cryptography. Cryptography is mostly known for encryption: hiding a message from everyone except the holders of a secret key. Until 1976, all cryptographic schemes were symmetric: both the sender and receiver of the message had to use the same key for encryption and decryption. Diffie and Hellman published the first asymmetric cryptosystem [6], which could be used to agree on a symmetric key based on public information: public-key cryptography. A public-key cryptographic scheme requires two keys: a secret, or private, key S and a public key P, which are mathematically connected to each other. This mathematical connection is based upon a trapdoor function, also called a trapdoor one-way function. Recall that a one-way function H has the property that it is easy to compute H(x) given x, but it is hard to compute x given H(x). A trapdoor function f has the additional property that, given some secret y, it is easy to compute x given f(x). These functions are the basis of public-key encryption. For example, an encryption algorithm can be viewed as a trapdoor function which, on input of message m and public key P, returns an encrypted version of m. The decryption can be viewed as the inverse function that, given the encrypted version and secret key S, returns m. Similar arguments hold for the construction of digital signature schemes, which will be introduced in the next section. The one-way functions beneath these schemes are based on hard mathematical problems: solving the math problem means breaking the cryptosystem.

3.2 Properties of Digital Signature Schemes

After introducing asymmetric cryptography, Diffie and Hellman also described an additional feature of public-key cryptography: a digital signature scheme. A digital signature scheme fulfills three purposes: authentication, integrity and non-repudiation. The importance of each of these properties in Bitcoin is emphasized after introducing the property itself:

Authentication; proving authorship of a message. When ownership of a key-pair is bound to a specific user, a valid signature shows the message was sent by that user. Bitcoin account-numbers are based on the public key of a user and transactions are sent to these account numbers. The user has to prove he is the owner of this account, before he is able to perform any transaction. All transactions in Bitcoin, and also all validated transactions in the Blockchain, contain a valid digital signature of a user.

Integrity; proving the message has not been altered during transmission. Although encryption hides the contents of a message, a man-in-the-middle may easily change random parts of the message. A valid signature shows the message has not been altered during transmission.

Bitcoin transaction data are public and not even encrypted, which means any user who captures a transaction could easily change data. For instance, he could change the receiving account-number to a number he owns. But this altered transaction does not have a valid signature, and is thus rejected by the network.

Non-repudiation; an entity that has signed some information cannot later deny having signed it. A Bitcoin user who broadcasts a transaction with a valid signature is not capable of getting his money back: he cannot deny having signed it and thus cannot deny having spent the money. This is one of the key concepts of Bitcoin, which can be seen as both a positive and negative property of the system: there is no refund mechanism.

There are many more use-cases of digital signatures, but we will focus on the usage for Bitcoin. A digital signature scheme typically consists of three algorithms:

Key generation algorithm It selects a random secret key S from a set of possible secret keys. The algorithm outputs the secret key and a corresponding public key P. We denote this by a key-pair (S, P).

Signing algorithm Given a message m and a private key S, it produces a signature µ.

Verification algorithm Given a message m, a signature µ and a public key P , it either accepts or rejects the signature as a valid one.

These algorithms are at the basis of making transactions in the Bitcoin network. First, users create a valid key-pair, where the public key P is hashed to form an account number. This account number is shared with all other users. Second, to make a transaction, the required data (sending/receiving account number, amount, meta-data, etc.) is assembled. All this data is signed by the user holding the sending account number (who holds the corresponding secret key). The valid signature is appended to the transaction. This transaction, together with public key P, is then shared with the Bitcoin network. Each miner checks whether the transaction data is valid and verifies the signature of the transaction. If it is valid, the transaction can be added to a new block in the Blockchain. If it is invalid, the transaction is discarded immediately. Using digital signature schemes in practice comes with some minor changes to these schemes. For instance, a message m is often hashed before it is signed. This has the following advantages:

• Security: given a message m and a valid signature µ, in some signature schemes it is easy to forge a valid signature for an expanded message m′ = m‖x or an algebraically modified message m′ = k·m. However, by first hashing and then signing, an attacker would have to find hashes with these relations in order to do such a signature forgery.

• Efficiency: since a hash-function has a fixed-length output, arbitrarily large messages (documents) are signed more efficiently by first hashing them. The running time of signing will only increase by the running time of hashing.

From now on, we always assume we hashed a message before it is signed, using a secure hash-function. We omit the notation of H(m), and simply use m to denote a hashed message.

In the following section, two concrete examples of these algorithms are given, which are widely used on the internet today: RSA and Elliptic Curve (ECC) signatures. For each example, an explanation is given why the signature works. Note that the examples are school-book versions, which are not necessarily secure in practice.

3.3 RSA Signatures

Although Diffie and Hellman were the first to introduce the notion of digital signatures, they only conjectured that such schemes could exist. One year later, Ronald Rivest, Adi Shamir and Len Adleman invented the RSA algorithm [22], which could be used for encryption but also to produce digital signatures. To this day, it is still the most commonly used public-key cryptosystem on the internet. It is based upon the hard problem of factorization.

Algorithm 1 RSA Key Generation
Output: A valid key-pair (S, P)
1: Generate two large primes p, q. Set N = pq.
2: Generate integers e, d such that ed ≡ 1 mod (p − 1)(q − 1).
3: Output (S, P) = (d, (N, e))

An important consequence of this key generation is that for any x < N:

x^(ed) ≡ x^1 ≡ x mod N

To sign a message m < N, the following algorithm is used:

Algorithm 2 RSA Signing
Input: Secret key d, message m.
Output: A valid signature µ for message m.
1: Output µ ≡ m^d mod N

Finally, the RSA message-signature pair is validated by:

Algorithm 3 RSA Verification
Input: Public key (N, e), message m and signature µ.
Output: Accept or Reject
1: Accept if m ≡ µ^e mod N
2: Reject otherwise.

A valid signature will be accepted since:

µ^e mod N ≡ m^(ed) mod N ≡ m

Remember that m is hashed before it is signed; otherwise there are other ways of forging a signature. Now, in order to forge a signature, one needs a value d with the property:

ed ≡ 1 mod (p − 1)(q − 1)

If we knew p or q, one of the factors of N, this d could easily be calculated using the extended Euclidean algorithm. Knowing d is the trapdoor used in RSA: factoring an RSA modulus N = p·q is very hard. This is why we say that RSA's security is based on the hard problem of factorization.
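To make the scheme concrete, below is a toy Python sketch of school-book RSA signing and verification (our own illustration with tiny primes and no padding; real RSA uses primes of at least 1024 bits, m stands for an already-hashed message, and Python 3.8+ is assumed for pow(x, -1, n)):

```python
from math import gcd

p, q = 61, 53                  # toy primes; far too small for real use
N, phi = p * q, (p - 1) * (q - 1)
e = 17
assert gcd(e, phi) == 1
d = pow(e, -1, phi)            # d = e^(-1) mod phi (extended Euclid)

m = 1234 % N                   # the (hashed) message
mu = pow(m, d, N)              # sign:   mu = m^d mod N
assert pow(mu, e, N) == m      # verify: mu^e = m^(ed) = m mod N
```

An attacker who can factor N recovers phi = (p − 1)(q − 1) and hence d by the same modular inversion, which is exactly why factoring breaks the scheme.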

3.4 Elliptic Curve Signatures

Bitcoin uses a different public-key cryptosystem, based on the discrete log problem in elliptic curves. Explaining an elliptic curve is easiest by looking at a concrete example:

y^2 = x^3 + ax + b

We say a point Q = (x, y) is on this curve if it satisfies the curve's equation. Below is a picture of this curve with a = 0, b = 7, which is used by Bitcoin and is called Secp256k1:

Figure 1: Secp256k1 over R, used by Bitcoin

An elliptic curve over F_q is defined as the set of points (x, y) satisfying the above equation in F_q, together with a unity element O, also called the point at infinity. This means for any point Q on the curve: Q + O = Q = O + Q and O + O = O. The negated point −Q of a point Q = (x, y) is (x, −y). Let Q1 = (x1, y1) and Q2 = (x2, y2) be points on an elliptic curve E. Then the sum Q3 of Q1 and Q2 is defined by:

Q3 = O    if Q1 = −Q2
Q3 = −R   if x1 ≠ x2 or Q1 = Q2

Here, R is the third intersection point of curve E with the line through Q1 and Q2 (for Q1 = Q2, the tangent line at Q1). Let dQ denote the addition of Q to itself, d − 1 times. Elliptic curve cryptography is based on the hardness of finding a scalar d such that P = dQ, for given points P ∈ ⟨Q⟩ and Q on curve E. Finding d given P is called the Discrete Logarithm Problem (DLP) on an elliptic curve. The cryptographic system fixes a curve E and a point Q, where Q has a large order n, meaning nQ = O. The following algorithms are used in digital signature schemes based on elliptic curves, and are used by each user and for each transaction in Bitcoin.

Algorithm 4 Elliptic Curve Key Generation
Input: Elliptic curve E over F_q and a point Q on E. Point Q has order n.
Output: A valid key-pair (S, P)
1: Generate a large, random scalar d ∈ [1, n − 1]
2: Calculate point P′ = dQ
3: Output (S, P) = (d, P′).

To sign a message m ∈ F_q, the following algorithm is used:

Algorithm 5 Elliptic Curve DSA Signing
Input: Secret key d, message m, point Q on E of order n.
Output: A valid signature µ for message m.
1: Select a random integer k ∈ [1, n − 1].
2: Calculate point kQ = (x1, y1).
3: Take r = x1 mod n. If r = 0, return to step 1.
4: Compute s = k^(−1)(m + dr) mod n. If s = 0, return to step 1.
5: Output signature µ = (r, s).

Finally, the elliptic curve message-signature pair is validated by:

Algorithm 6 Elliptic Curve DSA Verification
Input: Public key P = dQ, message m and signature µ = (r, s).
Output: Accept or Reject
1: Verify that r, s are both in the interval [1, n − 1]. Reject otherwise.
2: Compute u = ms^(−1) mod n and v = rs^(−1) mod n.
3: Calculate point S = uQ + vP = (x2, y2). Reject if S = O.
4: Accept if r ≡ x2 mod n.
5: Reject otherwise.

A valid signature µ = (r, s) of m will be accepted, since:

uQ + vP = uQ + v·dQ
        = (u + vd)Q
        = (ms^(−1) + rs^(−1)d)Q
        = s^(−1)(m + rd)Q
        = (k^(−1)(m + dr))^(−1)(m + dr)Q
        = k(m + dr)^(−1)(m + dr)Q
        = kQ

This means r ≡ x2 mod n, as required. Forging a signature requires knowledge of d, which can be obtained by calculating the discrete log of P = dQ and, as said earlier, this is very hard to do. This is the trapdoor of elliptic curve cryptography.
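The following self-contained Python sketch runs these three algorithms on a toy curve (our own illustration: the textbook curve y^2 = x^3 + 2x + 2 over F_17 with base point Q = (5, 1) of order n = 19; real ECDSA uses secp256k1 with ~256-bit parameters, a secure random k, and the r, s ≠ 0 checks we skip for brevity; Python 3.8+):

```python
p, a, n = 17, 2, 19            # field prime, curve parameter a, order of Q
Q = (5, 1)                     # base point on y^2 = x^3 + 2x + 2 over F_17
O = None                       # point at infinity

def add(P1, P2):               # elliptic curve point addition
    if P1 is O: return P2
    if P2 is O: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return O               # P2 = -P1
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p         # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(d, P):                 # double-and-add scalar multiplication
    R = O
    while d:
        if d & 1: R = add(R, P)
        P = add(P, P); d >>= 1
    return R

d = 7                          # secret key
P = mul(d, Q)                  # public key

def sign(m, k):                # Algorithm 5 (assuming r, s nonzero here)
    r = mul(k, Q)[0] % n
    s = pow(k, -1, n) * (m + d * r) % n
    return (r, s)

def verify(m, r, s):           # Algorithm 6
    u, v = m * pow(s, -1, n) % n, r * pow(s, -1, n) % n
    S = add(mul(u, Q), mul(v, P))
    return S is not O and r == S[0] % n

assert verify(10, *sign(10, k=13))
```

Note that k must be fresh and secret for every signature: reusing k across two messages lets anyone solve for d, which is one reason side channels that leak information about per-signature secrets are so damaging.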

3.5 Factorization and Discrete Log with Shor's Algorithm

Shor's algorithm [25] will change the field of cryptography completely. The fastest classical computer still takes many centuries to solve the mathematical problems discussed in this chapter, but a large quantum computer implementing Shor's algorithm will solve these problems in seconds. This means these digital signature schemes are completely broken afterwards. Understanding Shor's algorithm requires understanding quantum mechanics, which is out of the scope of this thesis. However, we will give a general idea why factorization is broken by this algorithm. Shor's algorithm is capable of finding the order of a number a lot faster than traditional computers. Given a number a and a modulus N = pq, the order of a is the smallest number r such that:

a^r ≡ 1 mod N

Although it is not immediately clear how this might help, we can write it in a different way:

(a^(r/2) − 1)(a^(r/2) + 1) ≡ 0 mod N

With high probability, the order r is divisible by 2 and both (a^(r/2) − 1) and (a^(r/2) + 1) are not multiples of N. What this actually means is that each of (a^(r/2) − 1) and (a^(r/2) + 1) shares a non-trivial factor with N, which can be extracted with a gcd computation. By trying enough random numbers a, calculating the order r and checking whether this indeed yields a factor of N, one solves the factorization problem. Traditional computers, however, take a lot of time finding the order r, but as said, a quantum computer with Shor's algorithm finds this order very fast. To use the analogy with the hardness of finding a pre-image in mining: traditional computers take (sub)exponential time to factorize (not using the above method, but another algorithm [16]), while Shor's algorithm takes polynomial time. Similar arguments can be made to show that Shor's algorithm also solves the discrete logarithm problem, the basis of elliptic curve signatures, in polynomial time, whilst it takes exponential time for conventional computers. This means we cannot continue using signatures based on the factorization problem and the discrete logarithm problem. Any attacker with a large enough quantum computer running Shor's algorithm can easily find the secret key given the public key. For Bitcoin, this means the attacker can impersonate anyone in the community: the attacker can forge signatures and therefore spend another user's money. The financial system would be completely broken. Luckily, there are alternative schemes which do provide the necessary security.
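The classical post-processing step, from order r to a factor of N, is simple enough to show in a few lines of Python (our own illustration; the naive order-finding loop below is the part a quantum computer replaces):

```python
from math import gcd

N, a = 15, 7
# A quantum computer finds the order r quickly; classically we search naively.
r = next(r for r in range(1, N) if pow(a, r, N) == 1)   # here r = 4
assert r % 2 == 0
f = gcd(pow(a, r // 2) - 1, N)   # gcd(a^(r/2) - 1, N) = gcd(48, 15) = 3
assert 1 < f < N and N % f == 0
print(f, N // f)                 # prints the factors 3 and 5
```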

4 Hash-Based Signature Schemes

In the previous chapter we concluded that, once a large quantum computer implementing Shor's algorithm arises, the signature schemes used today are completely broken. The remaining part of this thesis will focus on post-quantum signature schemes. The first example of such a scheme is based on hash-functions. Recall that a hash-function is a one-way function H with some security properties, such as pre-image resistance: given H(x), it is very hard to find x. This property can be used to build signature schemes whose security relies only on the security of the hash-function used. These functions are considered safe against quantum computers, which is why these signature schemes are safe too. However, these schemes have some practicality issues, which will be described at the end of this chapter. This is the main reason why this thesis focuses on another scheme secure against quantum computers.

4.1 One-Time Signature Schemes

The first hash-based signature scheme was introduced by Lamport [15] and is therefore called the Lamport One-Time Signature Scheme (LOTSS). As the name suggests, it can only be used once; it will become clear why when we introduce the signature algorithms. Let (x‖y) denote the concatenation (appending) of string y to string x.

Algorithm 7 LOTSS Key Generation
Input: Message length k.
Output: A valid key-pair (S, P)
1: Generate 2k random numbers X_{i,j}, 1 ≤ i ≤ k, j ∈ {0, 1}, where X_{i,j} ∈ {0, 1}^ℓ for some bit-size ℓ of the numbers.
2: For each i, j, compute Y_{i,j} = H(X_{i,j}), using some secure hash-function H.
3: Output (S, P) = ({X_{i,j}}, {Y_{i,j}})

Note that constructing X_{i,j} from Y_{i,j} is hard due to the pre-image security of H. To sign a message of bit-length k, one needs 2k random values and hash-values; it will become clear why when we introduce signing and verification. Let m_i be the i'th bit of message m. Then the signing algorithm is as follows:

Algorithm 8 LOTSS Signing
Input: Secret values {X_{i,j}}, message m of bit-length k.
Output: A valid signature µ for m.
1: For each 1 ≤ i ≤ k:
2:   µ_i = X_{i,m_i}.
3: Return µ = (µ_1‖...‖µ_k).

It means that, depending on the i'th bit of m, one picks µ_i = X_{i,0} if m_i = 0 or µ_i = X_{i,1} if m_i = 1. Finally, the LOTSS-signature is validated by:

Algorithm 9 LOTSS Verification
Input: Public key {Y_{i,j}}, signature µ for m.
Output: Accept or Reject
1: Accept if H(µ_i) = Y_{i,m_i} for all 1 ≤ i ≤ k
2: Reject otherwise.

A valid signature is accepted since:

H(µ_i) = H(X_{i,m_i}) = Y_{i,m_i}

The reason why this scheme is called a one-time signature scheme is that a signature leaks half of the secret values X_{i,j}, since µ_i = X_{i,m_i}. In the worst case, this system is completely broken after two signatures. Let m′ be another message with the property m′_i = 1 − m_i. Then after signing both m and m′, all secret values are exposed. This is why these key-pairs should only be used once. However, in the next section we will explain how to build schemes which are able to create more signatures out of these one-time signatures.
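A complete toy implementation of LOTSS fits in a few lines of Python (our own sketch: SHA-256 as H, ℓ = 256-bit secrets from os.urandom, and the message hashed first so that k = 256):

```python
import hashlib, os

k = 256
H = lambda x: hashlib.sha256(x).digest()

def keygen():
    X = [[os.urandom(32), os.urandom(32)] for _ in range(k)]  # secret X_ij
    Y = [[H(x0), H(x1)] for x0, x1 in X]                      # public Y_ij
    return X, Y

def bits(m):                         # hash the message, take its k bits
    digest = int.from_bytes(H(m), "big")
    return [(digest >> i) & 1 for i in range(k)]

def sign(X, m):
    return [X[i][b] for i, b in enumerate(bits(m))]   # reveals half of X

def verify(Y, m, sig):
    return all(H(s) == Y[i][b] for i, (s, b) in enumerate(zip(sig, bits(m))))

X, Y = keygen()
sig = sign(X, b"pay 1 BTC to Bob")
assert verify(Y, b"pay 1 BTC to Bob", sig)
```

The signature consists of 256 values of 256 bits each, 65536 bits in total, which already illustrates the size issue discussed in section 4.3.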

4.2 Merkle Signature Schemes

The Merkle signature scheme [18] can be used to sign a limited number of messages with one public key, using multiple one-time signature key-pairs such as Lamport's signatures. The number of messages must be a power of 2, so let N = 2^n be the number of messages one wants to sign with public key P. The signature algorithms are given as follows:

Algorithm 10 Merkle Key Generation
Input: N = 2^n, the number of possible messages to sign.
Output: A valid key-pair (S, P)
1: Generate N key-pairs (X_i, Y_i) for N one-time signatures (for instance, key-pairs for LOTSS).
2: Build a special binary tree (the Merkle tree) of depth n, with nodes a_{i,j} and leaves a_{0,j} = H(Y_j) for 0 ≤ j ≤ N − 1, using secure hash-function H.
3: The value of an inner node of the tree is the hash value of the concatenation of its children. So a_{i+1,j} = H(a_{i,2j} ‖ a_{i,2j+1}).
4: Output S = {X_i}, P = a_{n,0}. In other words: the secret key consists of all secret keys of the N one-time signatures and the public key is the root of the Merkle tree.

A visualization of the Merkle tree is given in figure 2.

Figure 2: Example of a Merkle tree with n = 3. Each inner node is the hash value of the concatenation of its children. The root a_{3,0}, in orange, is the public key.

Now, to validate a Merkle signature, one needs to calculate the path (the authentication path) from one of the leaves (the hash of the used one-time signature public key) to the root of the Merkle tree (the public key). For this, one needs the sibling of every node on the path, since each inner node is the hash of its two children. These values are part of the signature of message m. To get this path, one can either compute the whole path using the secret values (building the tree bottom-up) or store the tree. Either way, we assume we have it for the signing algorithm:

Algorithm 11 Merkle Signing
Input: Secret keys S = {X_i} of one-time signatures, message m.
Output: A valid signature µ for m.
1: Pick any unused secret key X_i to construct a one-time signature µ_i.
2: Let A = (A_0, A_1, ..., A_n) denote the n + 1 nodes on the path in the Merkle tree between leaf a_{0,i} = H(Y_i) and root a_{n,0} (the root is excluded from the signature, because it is public).
3: Let B = (B_0, B_1, ..., B_{n−1}) denote the siblings of A, such that A_{s+1} = H(A_s ‖ B_s) or A_{s+1} = H(B_s ‖ A_s), depending on the path, for 0 ≤ s ≤ n − 1.
4: Output µ = (µ_i ‖ Y_i ‖ B_0 ‖ ... ‖ B_{n−1})

Figure 3: Example of a Merkle signature with i = 0. X_0 is the secret key used for the one-time signature. The nodes marked in blue are the siblings B_s included in the signature.

The verification of this algorithm simply reconstructs the path from a_{0,i} to a_{n,0} using the siblings B_s:

Algorithm 12 Merkle Verification
Input: Root of the Merkle tree (public) a_{n,0}, signature µ for m.
Output: Accept or Reject
1: Verify the one-time signature of m using µ_i and one-time signature public key Y_i.
2: Rebuild the path from A_0 = a_{0,i} = H(Y_i) to a_{n,0} by computing A_{s+1} = H(A_s ‖ B_s) or A_{s+1} = H(B_s ‖ A_s), with the B_s given in µ.
3: Accept if A_n = a_{n,0} (the root of the Merkle tree). Reject otherwise.

Figure 4: Verifying the signature (in blue) by reconstructing the path (in red) from H(Y_0) to the root a_{n,0}, which is also the public key.
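The path-rebuilding step of Algorithm 12 is compact in code. Below is a Python sketch (our own illustration; SHA-256 plays the role of H, and the bits of the leaf index determine whether the current node is hashed on the left or on the right):

```python
import hashlib

def H(data):
    return hashlib.sha256(data).digest()

def root_from_path(leaf, index, siblings):
    """Rebuild A_n from leaf a_{0,index} and its siblings B_0..B_{n-1}."""
    node = leaf
    for sibling in siblings:
        if index & 1:                  # current node is a right child
            node = H(sibling + node)
        else:                          # current node is a left child
            node = H(node + sibling)
        index >>= 1
    return node

# Usage on a hand-built 4-leaf tree (n = 2):
leaves = [H(bytes([i])) for i in range(4)]
level1 = [H(leaves[0] + leaves[1]), H(leaves[2] + leaves[3])]
root = H(level1[0] + level1[1])
# Authenticate leaf 2 with siblings (leaves[3], level1[0]):
assert root_from_path(leaves[2], 2, [leaves[3], level1[0]]) == root
```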

It is very important to use a different one-time signature each time you construct a Merkle signature: using a one-time signature twice will again jeopardize security. This is why this scheme is called stateful: the user needs to remember which one-time signatures have been used. There is a stateless, optimized alternative called SPHINCS [4], which uses multiple Merkle trees together with random numbers.

4.3 Practicality Issues

All you need to build hash-based signature schemes is a secure hash-function. This is already required anyway, because messages are hashed before they are signed. As we have seen in chapter 2, the security of hash-functions is minimally impacted by quantum computers. However, if we go into more detail, there are some disadvantages to these schemes which make them unsuitable in the situation of this thesis.

The first and most important issue is the signature size. The size of a LOTSS-signature depends on the k secret values X_{i,j} it reveals. To achieve a security level of, say, 128 bits, one needs secret values of at least 256 bits due to the birthday paradox. Since, also in this case, messages are hashed before being signed, we have k = 256 bits too. This means a LOTSS-signature has size 256 · 256 = 65536 bits. That is 30 times more than RSA signatures on the internet and 256 times more than the elliptic curve signatures used in Bitcoin! The signature size would dominate the size of blocks in the Blockchain. There is a more efficient scheme, called the Winternitz signature scheme [26], which can reduce the size of LOTSS-signatures. However, to achieve a linear decrease in signature size, the signing time becomes exponentially bigger. This could make the scheme more practical, but the signatures are still only usable once. It means that Bitcoin users would have to be very careful to use an account only once!

Using many-time signature schemes, like Merkle signatures, has another practicality issue. One has to keep track of the so-called state of the scheme: remember which one-time signatures have already been used. This is very hard to do in practice, especially in the use-case of Bitcoin. Chances are high that a Bitcoin user wants to store keys in many places, allowing him to spend money using multiple devices. But all these devices need to track the state of the signatures in order to maintain security. This issue is hard to deal with. The stateless alternative SPHINCS [4], which is highly optimized, does not have this issue, but it has signature sizes of 41000 bytes, which means that for all these schemes the signature size dominates the whole Blockchain.

Despite these practicality issues, it is advised [2] to start using these signature schemes for applications such as Bitcoin. The reason for this is simple: the security of hash-based signatures is well understood, and deploying them does not require significant software updates, since hash-functions are already implemented for other purposes. Another way of constructing post-quantum digital signatures will be introduced in the next chapters. These schemes do not have the practicality issues just described, but their security is not yet as well understood.

5 Introduction to Lattices

Several cryptosystems secure against quantum computers are based on problems in lattices. These systems are considered very promising for practical use. Lattice-based signature schemes are fast (comparable to the speed of elliptic curve signatures), have practical signature sizes (comparable to the size of RSA signatures) and have relatively small keys (comparable to the size of RSA keys) [8]. The name suggests these systems are based on hard problems in lattices, and to understand them, we need to understand lattices. The theory of lattices is very broad and complex, which is why we only introduce the parts which are relevant. As an example, we end the chapter with the first, but broken, signature scheme related to a hard lattice problem.

5.1 Notations

The theory of lattices relies heavily on linear algebra, which is why we first introduce some important notation. Let R^n be the n-dimensional Euclidean vector space over R with its usual set of rules, and let Z^n be the space of vectors with only integer coefficients. We define the rounding function

⌊x⌉ : R → Z, ⌊x⌉ = ⌊x + 0.5⌋

to be the integer closest to x. Column vectors are denoted by bold letters x, and a matrix B has column vectors b_i. We say a lattice is of rank m if it is spanned by m linearly independent vectors. We use the Euclidean inner product of two vectors:

⟨x, y⟩ = Σ_{i=1}^{n} x_i y_i

and consider the L1, L2 and L∞ norms:

‖x‖_1 = |x_1| + |x_2| + ⋯ + |x_n|
‖x‖_2 = √(x_1^2 + x_2^2 + ⋯ + x_n^2)
‖x‖_∞ = max_{1≤i≤n} |x_i|

However, when we do not give any subscript 1, 2 or ∞ with a norm, we always refer to the L2 norm. The distance between two vectors is denoted by d(x, y) = ‖x − y‖, and the distance between an element x ∈ R^n and a set E ⊂ R^n is:

d(x, E) = min_{y∈E} ‖x − y‖

5.2 Definitions and Bases

We can define an (integer) lattice Λ of dimension n, spanned by vectors {b_1, ..., b_m} ⊂ Z^n, where b_1, ..., b_m are linearly independent vectors over Z, as the set of all linear combinations of the b_i's with integer coefficients:

Λ(b_1, ..., b_m) := { Σ_{i=1}^{m} x_i b_i | x_1, ..., x_m ∈ Z }

We call the vectors {b_1, ..., b_m} a basis of the lattice. One of the easiest examples of a lattice is the set Z^n, spanned by the n standard unit vectors e_i. As with an ordinary collection of vectors, we can also identify a lattice by the matrix B ∈ Z^{n×m} whose columns are the vectors b_j, viewed as vectors in Z^n, and the lattice can be written as:

Λ = { Bx | x ∈ Z^m }

In this case, the rank of B equals m. We say that Λ is of full rank if the rank m of the lattice equals the dimension n of the space. In the remaining part of this thesis, all lattices introduced will be of full rank unless stated otherwise. It is easiest to visualize a lattice in the case of 2 dimensions, as in figure 5.

Figure 5: Visualization of a lattice spanned by two vectors, b_1 and b_2. Each lattice point can be expressed as a linear combination of these two vectors.

An important aspect of a lattice is that there are infinitely many bases to choose from, which all span the same lattice. Another way of saying this is that we can apply any uni-modular transformation matrix U ∈ Z^{n×n}, that is, a matrix with det(U) = ±1, to a basis B, and the new basis will span the same lattice:

Λ(B) = { (BU)x | x ∈ Z^n } = Λ(BU)

Below is an example of the previous lattice, with another basis in blue:

Figure 6: Visualization of a lattice with two different bases. We call basis B = {b_1, b_2} a good basis and R = {r_1, r_2} a bad basis.

Note that the first basis is nearly orthogonal, while the second basis is more angular. Furthermore, the second basis has longer vectors than the first. Therefore, we call basis {b_1, b_2} a good basis of the lattice and {r_1, r_2} a bad basis. This difference between bases is what makes lattices usable in cryptography: a lot of lattice problems, introduced later, are easy to solve when you know a good basis, but very hard to solve when you only know a bad basis. This makes them suitable as a trapdoor. To make this difference more precise, we define the notion of the orthogonality defect. Let Λ ⊂ Z^n be a lattice spanned by {b_1, ..., b_n}, and let B be the matrix whose columns are the b_i. Then we define the orthogonality defect δ as:

δ(B) = (Π_{i=1}^{n} ‖b_i‖_2) / |det(B)|

We call |det(B)| the volume vol(Λ(B)) of Λ. Note that the volume of Λ is independent of the basis, which follows immediately from the calculation |det(BU)| = |det(B)| · |det(U)| = |det(B)| · |±1| = |det(B)|. From the definition of the orthogonality defect, one can derive that δ(B) ≥ 1, and in the case of the bases in figure 6 we can also conclude:

δ({r_1, r_2}) > δ({b_1, b_2})

Furthermore, we denote the length of the shortest non-zero vector in Λ by:

λ_1(Λ) = min_{y ∈ Λ\{0}} ‖y‖

Naturally, bases which have vectors with lengths close to this value are better than bases with only large vectors.
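As a quick numerical illustration of the orthogonality defect (our own example, using numpy, with basis vectors as matrix columns):

```python
import numpy as np

def defect(B):
    """Orthogonality defect: product of column norms over |det(B)|."""
    return np.prod(np.linalg.norm(B, axis=0)) / abs(np.linalg.det(B))

good = np.array([[2, 0],
                 [0, 3]])          # orthogonal basis: defect exactly 1
U = np.array([[1, 3],
              [1, 4]])             # unimodular: det(U) = 1
bad = good @ U                     # same lattice, columns (2,3) and (6,12)
print(defect(good), defect(bad))   # 1.0 and roughly 8.06
```

Multiplying by the unimodular U leaves the volume unchanged but makes the basis vectors longer and less orthogonal, which is exactly what the defect measures.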

5.3 Lattice Basis Reduction

As introduced in the previous section, we have good bases and bad bases for a lattice, where a good basis is as orthogonal as possible. Lattice basis reduction methods try to transform a bad basis into a good basis. There are many reduction algorithms, but we will focus on the Lenstra-Lenstra-Lovász algorithm (LLL) [17]. This algorithm runs in polynomial time in the dimension of the lattice, and many other algorithms are modifications of it. To understand the main principles behind this method, we first describe how orthogonalization is done in linear algebra over R: the Gram-Schmidt Orthogonalization method. Given a basis b_1, ..., b_n, we can define the Gram-Schmidt Orthogonalization (GSO) of the basis by an iterative process:

b*_1 := b_1,
b*_2 := b_2 − µ_{21} b*_1,
b*_i := b_i − Σ_{j=1}^{i−1} µ_{ij} b*_j,

where the GSO-coefficients are given by:

µ_{ij} = ⟨b_i, b*_j⟩ / ‖b*_j‖^2,  i > j.

The meaning of these GSO-coefficients is that µ_{ij} b*_j is the projection of b_i onto b*_j. By removing all these projections, the vectors become orthogonal. So after this process, the GSO-coefficients µ_{ij} are 0 and the vectors b*_1, ..., b*_n form an orthogonal basis. Now note that the µ_{ij} are not necessarily in Z, which means we cannot use these values directly when we reduce a lattice. However, we can simply round these values to the nearest integer ⌊µ_{ij}⌉ and use those instead. This rounded version of GSO is called a Size-Reduction step and is the main step in LLL.

After a Size-Reduction, we know that |µ_{ij}| ≤ 1/2, since that is the biggest error one can make by rounding. This means that b_i is already close to orthogonal to b*_j for all j < i. Afterwards, b*_i will be used to reduce the b_j's for j > i, but not the other way around! So after a Size-Reduction step, one can swap b_i with b_{i+1}, and another Size-Reduction step might reduce b_i even further. So by a process of Size-Reductions and swaps, one can reduce the lattice basis further and further. But this would mean an exponential running time, unless we stop after a certain condition. The LLL algorithm uses the Lovász condition as a stopping criterion, which makes the algorithm run in polynomial time while the resulting basis still has some nice properties. The Lovász condition, with factor δ ∈ (1/4, 1), is defined as:

δ ‖b*_i‖^2 ≤ ‖b*_{i+1} + µ_{i+1,i} b*_i‖^2

for all 1 ≤ i < n. Another way of viewing this condition is that if the Lovász condition does not hold, swapping b_i and b_{i+1} significantly reduces the norm of b*_i. It means there is still a lot of progress to be made in reducing the lattice. A summary of LLL is given below, where δ ∈ (1/4, 1) is used in the Lovász condition:

Algorithm 13 LLL Lattice Basis Reduction
Input: Lattice basis B = {b_1, ..., b_n}
Output: A reduced lattice basis B*
1: Size-Reduce b_1, ..., b_n (using rounded GSO-coefficients ⌊µ_{ij}⌉).
2: If there exists an index i which violates the Lovász condition:
3:   Swap b_i, b_{i+1} and go to step 1.
4: Output B* = {b*_1, ..., b*_n}
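A compact, deliberately inefficient Python sketch of Algorithm 13 follows (our own illustration: exact rational arithmetic via fractions, δ = 3/4, basis vectors stored as rows of a list of lists, and the GSO recomputed from scratch at every step for simplicity rather than speed):

```python
from fractions import Fraction

def dot(u, v):
    return sum(Fraction(a) * Fraction(b) for a, b in zip(u, v))

def gso(B):
    """Gram-Schmidt over the rationals: returns B* and coefficients mu."""
    n = len(B)
    Bs, mu = [], [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in B[i]]
        for j in range(i):
            mu[i][j] = dot(B[i], Bs[j]) / dot(Bs[j], Bs[j])
            v = [x - mu[i][j] * y for x, y in zip(v, Bs[j])]
        Bs.append(v)
    return Bs, mu

def lll(B, delta=Fraction(3, 4)):
    B = [list(b) for b in B]
    n, k = len(B), 1
    while k < n:
        for j in range(k - 1, -1, -1):        # size-reduce b_k against b_j
            mu = gso(B)[1]                    # recomputed for simplicity
            q = round(mu[k][j])
            if q:
                B[k] = [x - q * y for x, y in zip(B[k], B[j])]
        Bs, mu = gso(B)
        # Lovász condition with i = k - 1:
        # delta*||b*_{k-1}||^2 <= ||b*_k + mu_{k,k-1} b*_{k-1}||^2
        lhs = delta * dot(Bs[k - 1], Bs[k - 1])
        rhs = dot(Bs[k], Bs[k]) + mu[k][k - 1] ** 2 * dot(Bs[k - 1], Bs[k - 1])
        if lhs <= rhs:
            k += 1
        else:
            B[k - 1], B[k] = B[k], B[k - 1]   # swap and step back
            k = max(k - 1, 1)
    return B

print(lll([[201, 37], [1648, 297]]))          # much shorter basis vectors
```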

One can prove that this algorithm terminates in time polynomial in the lattice rank m = n, the vector space dimension n and the maximal bit-length of the bᵢ. Furthermore, after LLL the basis B∗ has the following property:

‖b₁∗‖ ≤ ( 1 / (δ − 1/4) )^{(n−1)/2} · λ₁(Λ)

However, implementations of LLL perform much better than this bound predicts and find much smaller vectors. Similar bounds can be constructed for the other basis vectors of B∗, which means there is an upper bound β for which ‖bᵢ∗‖ ≤ β for all 1 ≤ i ≤ n. A visualization of LLL and β is given below.

Figure 7: Visualization of LLL lattice basis reduction, with input basis {r₁, r₂}. LLL will find a basis with smaller orthogonality defect and vectors shorter than the bound β.

5.4 Hard Lattice-Problems
As with RSA and elliptic-curve cryptography, we need a trapdoor function based on a hard mathematical problem. In the previous section we already mentioned that there are lattice problems which are very hard to solve given a bad basis, but easy given a good basis. This means an instance of such a problem can be used as a trapdoor function, and is thus usable in building a cryptosystem. However, as we will see at the end of this chapter, it is not that straightforward to construct a secure system. Two classical lattice problems are easily recognized by their names: the Shortest Vector Problem (SVP) and the Closest Vector Problem (CVP). As their names suggest, both involve finding a vector with a certain property. The Shortest Vector Problem is the problem of: given a basis B of a lattice Λ, find a non-zero vector x ∈ Λ\{0} which has minimal length in the lattice:

‖x‖ = min_{y ∈ Λ\{0}} ‖y‖ = λ₁(Λ)

Note that this vector is not unique: −x, and possibly more vectors, are also solutions.

Figure 8: An instance of the Shortest Vector Problem, with input basis {r₁, r₂}. Find a vector v with minimal length: ‖v‖ = λ₁(Λ).

The Closest Vector Problem is the problem of: given a basis B of a lattice Λ and a target vector v (not necessarily in the lattice), find the vector x ∈ Λ closest to v:

d(v, Λ) = ‖x − v‖

Figure 9: An instance of the Closest Vector Problem, with input basis {r₁, r₂} and target vector v. Find the vector x closest to v.

One can show that if you are able to solve CVP, then you can also solve SVP. The best way to attack these problems is to use a lattice reduction algorithm, like LLL, and proceed from there. LLL produces a relatively short basis in polynomial time, after which one can continue with Enumeration [9] or Sieving [1] algorithms to solve these problems exactly. This last step, however, takes exponential time. Given a good basis of the lattice, both problems are solved easily. A signature scheme based on CVP is given in the next section.

5.5 A First Attempt at Lattice-Based Signatures
One of the first cryptosystems based on hard lattice problems is named after its inventors Goldreich, Goldwasser and Halevi (GGH) [10]. It uses the Closest Vector Problem as the key ingredient for both an encryption algorithm and a signature scheme, together with a good basis (secret key) and a bad basis (public key). The algorithms for the signature scheme are as follows:

Algorithm 14 GGH Key Generation
Output: A valid key-pair (S, P).
1: Generate a good basis B ∈ Zⁿˣⁿ.
2: Generate a bad basis R = BU, for some unimodular matrix U.
3: Output (S, P) = (B, R).

That the GGH signature scheme is built upon CVP will become clear from the signing algorithm. We model a message m as a vector m, not necessarily in the lattice. Then signing works as follows:

Algorithm 15 GGH Signing
Input: Secret key B, message m.
Output: A valid signature µ for message m.
1: Calculate the lattice point z = ⌊mB⁻¹⌉ B.
2: Output µ = z.

Note that m is not an element of the lattice spanned by B, but it is very close to the lattice point z. This is the reason why the scheme is based on CVP: to forge a signature for a message m, one needs to find a lattice point close to m. Verification only checks that µ ∈ Λ and that µ is close to m:

Algorithm 16 GGH Verification
Input: Public key R, message m, signature µ.
Output: Accept or Reject.
1: Verify that µ = z ∈ Λ(R) and that ‖z − m‖ is small.
2: Reject if either of the above is false.

In other words: one verifies that the signer found a vector close to the message m in the lattice.
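For intuition, here is a toy two-dimensional Python sketch of GGH signing (Babai rounding) and verification. All numbers, including the closeness bound, are made up for illustration; real GGH used high-dimensional bases, and the scheme is broken regardless, as explained next.

    import numpy as np

    B = np.array([[7, 1], [2, 8]])          # good (secret) basis, rows are vectors
    U = np.array([[3, 1], [2, 1]])          # unimodular matrix: det(U) = 1
    R = U @ B                               # bad (public) basis of the same lattice

    def sign(m, B):
        coords = np.rint(m @ np.linalg.inv(B))   # round coordinates in the good basis
        return coords.astype(int) @ B

    def verify(m, sig, R, bound=10.0):
        coords = sig @ np.linalg.inv(R)
        in_lattice = np.allclose(coords, np.rint(coords))
        return in_lattice and np.linalg.norm(sig - m) <= bound

    m = np.array([13.4, 25.1])              # message, modeled as a vector
    sig = sign(m, B)
    print(sig, verify(m, sig, R))           # [13 25] True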

The reason this scheme is broken is that every signature leaks secret information. With every signature, an element v = m − z is revealed, which lies in the parallelepiped spanned by the secret vectors b₁, ..., bₙ. When one captures enough vectors v, which does not require that many signatures, one knows the whole parallelepiped and can therefore recover the vectors spanning it as a secret basis. A visualization of this attack is given below:

Figure 10: Step-by-step attack on the GGH signature scheme. Each time a signature is given, an element inside the parallelepiped (in grey) of a small basis is revealed. After enough signatures (steps (a), (b), (c)), one has enough elements to compute the small basis from them (step (d)).

6 Lattice-Based Signatures In Practice

In the previous section we have seen how the GGH signature scheme is easily broken, because it leaks information about the secret key with each signature. In this section, we first discuss two other hard lattice problems and introduce the Bimodal Lattice Signature Scheme (BLISS) [8]. We zoom in on the discrete Gaussian sampler, which is needed for sampling the noise vector in the signature scheme. We end the section with an efficient way of implementing lattice-based cryptography, as well as parameter suggestions from the authors of BLISS.

6.1 More Hard Lattice Problems: SIS and LWE
Lattice-based signatures used in practice do not rely on SVP or CVP directly, but on different problems. Two other problems which are hard to solve are the Short Integer Solution (SIS) and the Learning With Errors (LWE) problems. Unlike SVP and CVP, these are not problems with a direct geometric meaning in lattices, but their hardness relies on the hardness of the lattice problems discussed before. This means that one can transform SIS and LWE into problems in lattices. They are considered hard on average, which means a random key will most likely be a secure one. Unlike SVP and CVP, both SIS and LWE concern vectors modulo a prime q:

Z_qⁿ = { x = (x₁, ..., xₙ) : xᵢ ∈ Z_q ∀i }

Any linear combination of vectors in Z_qⁿ is again an element of Z_qⁿ, so we can define a lattice in this space, spanned by a matrix A ∈ Z_qⁿˣⁿ:

Λ_q(A) = { Ax mod q | x ∈ Zⁿ }

Short Integer Solution is the problem of: given a matrix A ∈ Z_qⁿˣⁿ, find a short, non-zero vector x ∈ Z_qⁿ such that:

Ax ≡ 0 mod q

The hardness of this problem depends on how short the vector x must be. In other words: find a linear combination of the given n column vectors which results in the zero vector in Z_qⁿ. A more general version of SIS, called the inhomogeneous SIS, is the problem of: given A ∈ Z_qⁿˣᵐ and a target vector v, find x such that:

Ax ≡ v ∈ Z_qⁿ

To see the analogy with a lattice problem, we can define the space of all solutions of the homogeneous equation:

Λ⊥(A) = { x ∈ Zᵐ | Ax ≡ 0 mod q }

This is called the dual lattice, and SIS is the Shortest Vector Problem in this dual lattice. This is why we call SIS a lattice problem: there is an analogy between the hardness of SIS and that of SVP.

The Learning With Errors (LWE) problem is also defined on Λ_q(A) and has resulted in many cryptosystems, mainly encryption schemes. As the name suggests, this problem involves an unknown error vector e ∈ Zⁿ. Learning With Errors is the problem of: given a public matrix A ∈ Z_qⁿˣⁿ and a public vector b satisfying

bᵀ ≡ sᵀA + eᵀ ∈ Z_qⁿ

for some unknown error vector e with a fixed distribution D, find the secret vector s ∈ Z_qⁿ. Note that this error vector is very important: if e were known, one would have the equation

bᵀ − eᵀ = sᵀA

which is easily solvable for s by a linear solver. Not all probability distributions D are suitable, but multiple suffice to build a secure system. We can again define a (regular) lattice of solutions by:

Λ∗(A) = { xᵀ ≡ sᵀA mod q }

LWE is then the problem of CVP in this lattice: b is close to a vector in Λ∗(A), and one has to find this vector. So the hardness of LWE is similar to the hardness of CVP; a small sketch of an LWE instance is given below. Afterwards, we will see how to build a secure signature scheme on SIS.
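The following toy Python construction illustrates what an LWE instance looks like; the parameters and the error distribution are made-up stand-ins for illustration only.

    import numpy as np

    rng = np.random.default_rng(1)
    n, q = 8, 97
    A = rng.integers(0, q, size=(n, n))
    s = rng.integers(0, q, size=n)          # secret vector
    e = rng.integers(-2, 3, size=n)         # small error from some distribution D
    b = (s @ A + e) % q                     # public vector b^T = s^T A + e^T mod q

    # Without e, recovering s is plain linear algebra mod q; with e, the
    # attacker faces a close-vector problem in the lattice Lambda*(A).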

6.2 BLISS: Bimodal Lattice Signature Scheme
There exist many signature schemes based on the hardness of SIS and LWE, but one that is particularly optimized for practical use is the Bimodal Lattice Signature Scheme (BLISS) [8]. The scheme is based on the hardness of SIS or LWE, depending on the parameters. The algorithms of BLISS given below are simplified, but they suffice to understand the basic building blocks of the scheme; the real scheme given in [8] is a bit more complex. Later in this chapter, we will explain in more detail how lattice-based cryptosystems are implemented via NTRU lattices.

Algorithm 17 BLISS Key Generation
Input: Modulus q, lattice dimension n.
Output: Key-pair (S, P).
1: Divide the dimension n = ℓ' + n' into two parts, with ℓ' > n'.
2: Sample a random, sparse matrix S' ∈ Z_{2q}^{ℓ'×n'} with coefficients S'ᵢⱼ ∈ {0, 1, −1}.
3: Set S = ( S' ; I_{n'} ) ∈ Z_{2q}^{n×n'}, i.e. S' stacked on top of I_{n'}.
4: Sample a random matrix A' ∈ Z_q^{n'×ℓ'}.
5: Set A = ( 2A' | qI_{n'} − 2A'S' ) ∈ Z_{2q}^{n'×n}.
6: Output (S, P) = (S, A).

Here, I_{n'} is the n'×n' identity matrix. By construction, the secret S is a (matrix) solution of the general version of SIS:

AS ≡ qI_{n'} mod 2q

and thus

AS ≡ O mod q

where O is the all-zero matrix. We will see shortly why we need the first equation to hold. To sign a message m, we need a different hash function than usual. We will not go into the details of how to construct it, but there are several methods to construct a hash function whose output is a vector with certain constraints. BLISS uses a hash function H which outputs a vector c ∈ Z^{n'} with ‖c‖₁ = κ and cᵢ ∈ {0, 1}. Here, κ ≪ n denotes the sparsity of c. In other words: it is a vector consisting of precisely κ coefficients equal to 1, while the remaining coefficients are 0. As we will see shortly, every signature has a part which consists of a sum of κ secret vectors of the key S. We hide this secret part using a noise vector y, where it is important that this noise vector is sampled according to a discrete Gaussian distribution D_σ [12]. By doing this, the average-case hardness of the scheme is just as hard as the worst-case hardness. Sampling a discrete Gaussian in practice is not straightforward, and we zoom in on this distribution and how to sample from it in the next chapters. Like a regular Gaussian distribution, it has a standard deviation σ and a mean µ, but we only use centered Gaussians with µ = 0. We denote the centered discrete Gaussian distribution with standard deviation σ by D_σ. Signing a message goes as follows:

Algorithm 18 BLISS Signing
Input: Secret key S, public key A and message m.
Output: A valid signature µ for m.
1: Sample y ∈ D_σⁿ (discrete Gaussian).
2: Compute c = H(Ay mod 2q ∥ m), with c ∈ {0, 1}^{n'} and ‖c‖₁ = κ.
3: Choose a random bit b ∈ {0, 1}.
4: Set z = y + (−1)ᵇ Sc.
5: Go to step 1 with probability 1 − ρ∗(z, c).
6: Output µ = (z, c).

Note that in step 5 we restart the whole signature creation process with probability 1 − ρ∗ depending on z, c. The reason we do this is that z is distributed according to D_{σ,Sc}: a discrete Gaussian centered around Sc. This means that after a couple of signatures, we would have a lot of information about Sc: all signature values z are centered around that value. The scheme would then have the same weakness as GGH. The probability 1 − ρ∗ is chosen such that, in the long run, z is distributed according to D_σ. In other words: by occasionally rejecting a signature, we make sure that we do not leak any secret information. To be precise:

ρ∗ = 1 / ( M · exp( −‖Sc‖² / (2σ²) ) · cosh( ⟨z, Sc⟩ / σ² ) )

where M > 1 is chosen such that ρ∗ ≤ 1 in any case, for any z, c. The bit b is sampled such that we either add or subtract the secret part Sc from y. The authors of BLISS introduced this bit b because it allows a smaller constant M than without it, so that fewer signatures are rejected on average, which optimizes the speed. The verification algorithm is as follows:

Algorithm 19 BLISS Verification
Input: Public key A, signature µ = (z, c) for m, bound β₂.
Output: Accept or Reject.
1: Reject if ‖z‖₂ > β₂ or ‖z‖∞ > q/4.
2: Accept if c = H(Az + qc mod 2q ∥ m).

The first two conditions (step 1) make sure that the vector z is small; the bound β₂ checks that z is consistent with a sample from a centered discrete Gaussian distribution. A valid signature will be accepted, since

Az + qc = A(y + (−1)ᵇ Sc) + qc = Ay + (−1)ᵇ qc + qc ≡ Ay mod 2q

because of the way A and S are constructed. In the signing algorithm, the discrete Gaussian sample y ∈ D_σⁿ is used to hide the secret relation Sc. This step is actually the main speed bottleneck of the algorithm: sampling a discrete Gaussian is quite hard. But as with LWE, this step is very important, since knowing y means you have the relation:

z − y = (−1)ᵇ Sc

where only the bit b and the secret key S are unknown. How to exploit this relation will be shown in the last chapter of this thesis.
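As a small numeric illustration of the rejection step in algorithm 18, the acceptance probability ρ∗ can be computed directly. The function name is mine, and σ, M and the vectors are placeholders, not actual BLISS parameters.

    import numpy as np

    def accept_probability(z, Sc, sigma, M):
        # rho* = 1 / ( M * exp(-||Sc||^2 / (2 sigma^2)) * cosh(<z, Sc> / sigma^2) )
        norm_sq = float(Sc @ Sc)
        inner = float(z @ Sc)
        return 1.0 / (M * np.exp(-norm_sq / (2 * sigma**2))
                        * np.cosh(inner / sigma**2))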

6.3 Gaussian Sampling
In this section we zoom in on how to actually sample a discrete Gaussian, since it is a very important step and it slows the algorithm down significantly.

6.3.1 The Discrete Gaussian Distribution
The discrete Gaussian distribution is easily visualized: it is the distribution defined over the integers, with values proportional to those of the corresponding Gaussian distribution over the reals:

Figure 11: The (centered) discrete Gaussian distribution D_σ(x), with σ ∈ {10, 20, 30}.

The probability distribution of a discrete Gaussian with mean µ and standard deviation σ is given by:

D_{µ,σ}(x) = p_{µ,σ}(x) / Σ_{y=−∞}^{+∞} p_{µ,σ}(y)

for x ∈ Z, where p_{µ,σ}(x) = exp( −(x − µ)² / (2σ²) ) is the regular Gaussian weight. Note that the sum in the denominator is necessary to make sure that D_{µ,σ} is actually a probability distribution.

This distribution is defined over all integers Z, but in practice we do not need big numbers. This is why we use a so-called tail-cut and only sample integers smaller than this cut. To be more precise, we pick a number τ and only sample Gaussian-distributed integers between −τσ and +τσ. This suffices to hide the secret part: the tail probability is negligible for the security loss. In practice, τ is often the square root of the security level; what this means will be shown with the parameter suggestions. Furthermore, in lattice-based cryptography we always choose µ = 0, so we further denote D_{0,σ} = D_σ. And since the Gaussian is symmetric around 0, we can focus on the non-negative part of the distribution and pick a random sign with probability 1/2. In this method, we must discard half of the zeros we sample, since 0 would otherwise be counted twice (once per sign).
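For concreteness, a short Python sketch that tabulates the one-sided distribution D_σ with a tail-cut; σ and τ below are example values in the spirit of the parameter suggestions, not fixed by the text.

    import math

    def discrete_gaussian_probs(sigma, tau):
        bound = int(tau * sigma)
        weights = [math.exp(-x * x / (2 * sigma**2)) for x in range(bound + 1)]
        total = weights[0] + 2 * sum(weights[1:])     # count both signs for x != 0
        return [w / total for w in weights]           # probs[x] = D_sigma(x), x >= 0

    probs = discrete_gaussian_probs(sigma=215, tau=13)
    print(probs[0] + 2 * sum(probs[1:]))              # ~1, up to the cut-off tail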

6.3.2 Rejection Sampling
One of the earliest sampling methods for discrete Gaussians is based on rejection sampling. The basic rejection method is straightforward: pick a random integer x and accept the sample with probability D_σ(x).

Algorithm 20 Basic Rejection Sampling
Input: Standard deviation σ, tail-cut τ.
Output: Gaussian sample y ∈ D_σ.
1: Sample an integer x ∈ {0, ..., τσ} uniformly at random.
2: Compute a random bit r₁ with probability D_σ(|x|) of being 1.
3: If r₁ = 0, go to step 1.
4: If x = 0: compute a bit r₂ with probability 1/2. If r₂ = 0, go to step 1.
5: Compute a random sign bit b with probability 1/2.
6: Output (−1)ᵇx.

So in the long run, these integers are Gaussian distributed. There are modifications of this algorithm which speed it up. However, in practice it is much easier to use a so-called Cumulative Distribution Table, which will be introduced next.
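Before moving on, a direct Python transcription of algorithm 20 (slow in practice, since most candidates are rejected; the function name is mine):

    import math, random

    def sample_dgauss_rejection(sigma, tau):
        bound = int(tau * sigma)
        while True:
            x = random.randint(0, bound)
            if random.random() >= math.exp(-x * x / (2 * sigma**2)):
                continue                               # steps 2-3: reject
            if x == 0 and random.random() < 0.5:
                continue                               # step 4: halve P[0]
            return x if random.getrandbits(1) else -x  # steps 5-6: random sign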

6.3.3 Cumulative Distribution Table
As the name says, this method uses a Cumulative Distribution Table (CDT): a table of values of the Cumulative Distribution Function F of the discrete Gaussian, used to invert F. For any number u ∈ [1/2, 1) there is a unique x ∈ Z⁺ such that

F(x − 1) = Σ_{0 ≤ i ≤ x−1} D_σ(i) < u ≤ Σ_{0 ≤ i ≤ x} D_σ(i) = F(x)

By storing the values F(x), we can map a random number u ∈ [1/2, 1) to a correctly distributed integer, and pick a sign at the end.

Algorithm 21 CDT Sampling
Input: Standard deviation σ, tail-cut τ. Cumulative Distribution Table F for the integers in {0, ..., τσ}.
Output: Gaussian sample y ∈ D_σ.
1: Compute a random number u ∈ [1/2, 1).
2: Find x ∈ Z such that F(x) < u ≤ F(x + 1).
3: Compute a random sign bit b with probability 1/2.
4: Output (−1)ᵇx.

Here, F(0) should be reduced so that there is no doubled probability of sampling a zero. Note that for u very close to 1, the algorithm may not find a sample x because of the tail-cut; we assume the probability of this happening to be negligible. This method is the easiest and fastest way of sampling from the discrete Gaussian, but it needs a large table to store all the values. It is the method most often used, since the large table is no issue for modern CPUs; it is only a problem for small devices, which can use rejection sampling instead.
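A minimal Python sketch of CDT sampling, building the one-sided table once and inverting it with a binary search. The halving of the weight at 0 and the mapping of the cumulative values into (1/2, 1] follow the remarks above; the rare u beyond the last entry corresponds to the cut tail.

    import bisect, math, random

    def build_cdt(sigma, tau):
        bound = int(tau * sigma)
        w = [math.exp(-x * x / (2 * sigma**2)) for x in range(bound + 1)]
        w[0] /= 2                                    # no doubled probability for 0
        total = sum(w)
        cdt, acc = [], 0.0
        for x in range(bound + 1):
            acc += w[x] / total
            cdt.append(0.5 + acc / 2)                # cumulative values in (1/2, 1]
        return cdt

    def sample_cdt(cdt):
        u = random.uniform(0.5, 1.0)
        x = bisect.bisect_left(cdt, u)               # the binary search
        return x if random.getrandbits(1) else -x

    cdt = build_cdt(sigma=215, tau=13)               # table built once, then reused
    print(sample_cdt(cdt))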

6.4 Lattice Implementations Via NTRU Lattices
A major bottleneck of quantum-secure systems, such as code- and hash-based signatures, but also general lattice-based signature schemes, is the huge key size. Luckily, in the case of lattices we can apply a small trick which, from what is known so far, does not influence security. By choosing so-called cyclic lattices, we can reduce the key sizes of lattice-based cryptosystems by a significant amount. Instead of using a whole set of vectors as a basis, we can use only one vector and rotate it enough times to form a basis. An easy example is the identity matrix, which corresponds to the set of all rotations of the first unit vector. Using cyclic lattices means we can represent both the public and the private key by a single vector each.

But if we would like to perform operations, we would need to recompute the whole lattice out of this single vector every time we want to use it. However, these cyclic bases can be represented by a polynomial in a specific mathematical structure: one can show a correspondence between operations on cyclic lattices over Z_q and operations in the ring R = Z_q[x]/(xⁿ + 1). Let f, g ∈ Z_q[x]/(xⁿ + 1) be two polynomials with coefficients fᵢ, gᵢ for 0 ≤ i ≤ n − 1. Then for the product h = f · g we have:

h_k = Σ_{i+j ≡ k mod n} fᵢ gⱼ · (−1)^{⌊(i+j)/n⌋} mod q

The sign (−1)^{⌊(i+j)/n⌋} accounts for the reduction mod (xⁿ + 1), which means xⁿ ≡ −1. The multiplication can also be written as a matrix-vector multiplication over Z_q: we rotate f to form the matrix F, with entries Fᵢⱼ satisfying:

Fᵢⱼ = f_{(i+j) mod n} · (−1)^{⌊(i+j)/n⌋}

Let g be the vector with coefficients gᵢ. Then f · g over R equals Fg over Z_q. So instead of computing the whole lattice, we model the single vector as a polynomial in R and perform the operations in this ring. This greatly improves practicality, and it is therefore widely used and already standardized. The term NTRU lattice is used whenever we talk about these polynomials that represent cyclic lattices. So in real-life implementations, this is the system being used, but the underlying security relies on lattices.
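A schoolbook Python sketch of this ring multiplication, directly following the coefficient formula above (quadratic time; fast implementations use number-theoretic transforms instead):

    def ring_mul(f, g, q):
        # h = f * g in Z_q[x]/(x^n + 1), given coefficient lists of length n.
        n = len(f)
        h = [0] * n
        for i in range(n):
            for j in range(n):
                sign = -1 if i + j >= n else 1      # x^n = -1
                h[(i + j) % n] = (h[(i + j) % n] + sign * f[i] * g[j]) % q
        return h

    # (1 + x^3) * x = x + x^4 = x - 1 mod (x^4 + 1):
    print(ring_mul([1, 0, 0, 1], [0, 1, 0, 0], q=17))   # -> [16, 1, 0, 0]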

6.5 Parameter Suggestions For BLISS
In the paper on BLISS [8], the authors also give parameter suggestions. One of the tasks of a cryptographer is quantifying the security of a cryptographic system. In other words: how strong is a cryptographic algorithm? In an attempt to quantify this, often a security level λ is used, expressed in bits: it means that it takes at least 2^λ operations to break the system, based on the best known theoretical attack on that algorithm. Unfortunately, lattice-based cryptography is a fairly new area, which means the security levels decrease with every new cryptanalytic algorithm. Table 1 below shows the numbers from [8], but it is possible that these numbers should be different by now. The authors of BLISS introduce five parameter sets, each for different applications. The parameters we focus on are: security level λ, dimension n, modulus q, Gaussian standard deviation σ and sparsity κ. We also give the signature size in kilobits and the signing speed in milliseconds, for the optimized signing algorithm.

Parameter Set   Optimized For   Size (kb)   Speed (ms)   λ      n     q       σ     κ
BLISS-0 (Toy)   Fun             3.3         0.241        ≤ 60   256   7681    100   12
BLISS-I         Speed           5.6         0.124        128    512   12289   215   23
BLISS-II        Size            5           0.480        128    512   12289   107   23
BLISS-III       Security        6           0.203        160    512   12289   250   30
BLISS-IV        Security        6.5         0.375        192    512   12289   271   39

Table 1: Parameter suggestions for BLISS from [8]. The resulting signature size is given in kilobits and the signing speed in milliseconds. There are more parameters to consider for implementations; details are in [8].

We end with a comparison of the signature sizes and signing speeds with the currently widely used signature schemes RSA and elliptic curves, at security level λ = 128 bits. RSA signatures require a signature size of at least 4 kb, which is about the same as BLISS; RSA signing speed in this case is about 8.6 milliseconds, far slower than BLISS. Elliptic-curve signatures are very small, about 0.5 kb, while their speed is about the same as BLISS: 0.106 milliseconds. This means that BLISS signatures come close to the practical signatures we already use today, although elliptic-curve signature sizes remain out of reach. In the next two chapters, we start exploring the possibilities of side-channel attacks against this scheme.

7 Side-Channel Attacks

7.1 Introduction
To break public-key cryptography, it is not always necessary to break the underlying hard mathematical problem (mathematical cryptanalysis). So-called Side-Channel Attacks (SCA) use information leakage from the actual implementation of a cryptosystem, such as a digital signature scheme. Physical features of the implementation, such as power consumption, (cache-)memory usage and timing, can be abused to retrieve secret information, which allows recovering the secret key. Side-channel attacks have proven very effective in breaking real-world security, such as the widely used internet protocol SSL/TLS [5]. These attacks must always be considered when implementing cryptography.

7.2 Timing Attacks
One of the first examples, by Kocher [14], uses timing information in the modular exponentiation of the RSA signature scheme:

m^d mod N

where only d is unknown. In practice, this is a time-consuming operation and is therefore implemented using a combined Square-and-Multiply exponentiation method:

Algorithm 22 Square-and-Multiply Algorithm
Input: Base a, exponent x, (big) modulus N.
Output: y = a^x mod N.
1: Let x = Σ_{i=0}^{w−1} xᵢ 2ⁱ, with xᵢ the bits of x.
2: Set y = 1.
3: For k = w − 1 down to 0:
4:   Set y = y² mod N.
5:   If x_k = 1: set y = y · a mod N.
6: Return y.

In the case of RSA, step 5 of the above algorithm (the multiplication) is executed only when bit x_k = d_k of the secret key d = Σ_{i=0}^{w−1} dᵢ 2ⁱ is set, and it is more time consuming than the squaring alone (step 4). By gathering enough messages m and signature timings, one is able to retrieve bit d_k from this timing information. Doing this bit by bit, one can retrieve the secret key d. We will not go into the details of how to actually do this, but it is important to note that timing information from such a small part of the algorithm can mean a significant security breach. In [5] the authors showed that it is indeed possible to mount such an attack.
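The key-dependent branch is easy to see in a direct Python transcription of algorithm 22; the function name is mine, and the assert is just a sanity check against Python's built-in pow.

    def square_and_multiply(a, x, N):
        y = 1
        for bit in bin(x)[2:]:           # scan exponent bits, MSB first
            y = (y * y) % N              # squaring: always executed
            if bit == '1':
                y = (y * a) % N          # multiplication: only for 1-bits of x
        return y

    assert square_and_multiply(5, 117, 391) == pow(5, 117, 391)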

7.3 Cache-Attacks
A second, broad class of side-channel attacks is based on cache-memory mechanisms. Although this requires access to CPU resources, it has been shown that these attacks can even be mounted remotely. For instance, in [20] the authors show that it is possible to mount them in regular browsers using JavaScript. A cache memory is a small piece of fast memory that bridges the gap between processor speed (fast) and working-memory (RAM) speed (slow). The cache is shared between all threads: they all have to compete for the same resources. It aims at keeping the CPU as busy as possible by minimizing load/store latency. Most CPUs have two or three levels of cache memory, called L1, L2 and L3. These caches are inclusive: data in L1 is also contained in L2, which is contained in L3, and all data is contained in the main memory. The sizes of these caches differ per CPU, but they are much smaller than main memory (on the order of 3 MB). A cache is divided into multiple small cache-lines, typically of 64 bytes; memory data is likewise divided into blocks of 64 bytes. Data in main memory is assigned to a certain cache-set (a set of cache-lines) by an address tag and is associated with a cache-line via a mapping. The type of mapping depends on the CPU being used, and there are two extreme scenarios: direct mapping and the fully associative cache. Direct mapping means a memory location, assigned to a cache-set, has only one cache-line where it can be stored. This means that the look-up will be quick, but different parts of memory will compete for the same cache-line while other lines remain empty. In a fully associative cache, a memory location can be stored anywhere in the cache: look-ups are slower, but the number of collisions is reduced. Figure 12 summarizes the story so far:

Figure 12: Cache memory is fast memory close to the CPU and is shared among threads. In this example, three threads currently have data in the cache and are competing for the same memory. Cache memory is divided into cache-lines, so data in main memory is also divided into blocks of the size of these cache-lines.

Now when data is requested, it is first checked to which cache-set it belongs. Then, depending on the mapping, it is checked whether the data is inside one of the cache-lines it is associated with. This is done for every level of cache memory until the data is found, where each higher level causes more latency. Retrieving data from main memory, however, causes significant latency, resulting in a so-called cache-miss. There are three types of cache-misses:

• Cold-start misses: they occur when data is requested for the first time.

• Capacity misses: they occur when the size of the data exceeds the size of the cache.

• Conflict misses: they occur when data from an earlier access has meanwhile been evicted from the cache.

In this thesis, we will not consider capacity misses, since the contents of the tables do not exceed the cache size. However, there are several attacks based on cold-start and conflict misses. The reason we can take advantage of cache-misses is that the cache memory is shared among all applications, processes and threads. Even if every process is executed in a sandbox (no unwanted data sharing) and is therefore protected from malicious processes, cache memory can still be manipulated to retrieve secret information. There are several types of attacks:

• Evict+Time. The attacker measures the time it takes to execute a piece of victim code. The attacker then flushes the cache with his own memory, and times the victim code again. The difference in timing reveals whether the victim uses that part of the cache.

• Prime + Probe. The attacker fills the cache with his own memory and waits for the victim to execute his code. Afterwards, the attacker measures the time it takes to access the memory he placed in the cache before. If a data access is slow, the victim needed that part of the cache, and that reveals something about what the victim did.

• Flush + Reload. This attack uses the fact that processes often share memory. The attacker first flushes a shared memory address, waits for the victim to execute his code, and then measures the time it takes to access that address. The access time tells the attacker whether the victim placed the address in question back into the cache by accessing it.

Note the subtle but important difference between Evict+Time and the other attacks: with Evict+Time, the attacker times the execution of the victim's code, while with the other attacks he times his own memory accesses. This means Evict+Time has more noise in its timings. There are different ways of flushing the cache, but the general concept is to use eviction sets. An eviction set is a set of memory locations which, when accessed, occupy the cache-lines mapping to the same cache-set as the victim's code. By accessing all locations in the eviction set, all victim data is removed (flushed) from the cache, and the attack can begin.

Figure 13: Visualization of the Prime + Probe cache-attack. The attacker fills the cache with his own data (step (a): Prime) and waits for the victim to perform cryptographic operations. When the attacker notices that the victim is putting data into the cache (step (b): Probe), his own data has been evicted, causing delays in his memory accesses. Because there is a fixed mapping from main memory to cache-lines, carefully learning the victim's access patterns tells the attacker which entries of the victim's table have been used.

Note that the most one can learn from these attacks is whether a certain cache-line was used by the victim process. If multiple, different variables map to the same cache-line, we are unable to tell them apart. This is why, in general, one does not learn the exact data the victim accessed, but a range of possibilities. However, it has been shown that this is enough to retrieve a secret RSA key (as visualized in figure 14) or AES's symmetric key. By carefully monitoring the cache (Prime + Probe) and looking for activity from the victim, Percival [21] was able to track the activity of the Square-and-Multiply method (algorithm 22), combined with the Chinese Remainder Theorem, as used by RSA in an old OpenSSL version:

Figure 14: A visualization of a cache-attack by monitoring the cache, using Prime + Probe. The shading of each block indicates the number of CPU cycles needed to access all the cache-lines in a cache-set, where darker blocks mean more cycles (picture from [21]).

7.4 Countermeasures
When developing cryptographic systems, one always has to take these side-channel attacks into account. An obvious prevention of timing attacks is a constant-time implementation, but this has a few downsides in general: to make an implementation constant-time, everything has to run at worst-case speed, which can greatly reduce practicality. It has also been shown by Bernstein [3] that even an implementation using table look-ups has time dependencies.

Another countermeasure against timing attacks is so-called masking: the randomization of secret operations. A nice masking example is that of RSA. Instead of performing the signing operation m^d mod N directly, one takes a random integer r and computes (rm)^d = r^d m^d mod N. The signature is then obtained by dividing by r^d. Any timing information gained from signing is randomized by r and therefore useless. However, this increases the complexity of the implementation, and an attacker could still ask for a signature of the same message twice, which might reveal the necessary information.

Countermeasures against cache-attacks are not straightforward. A simple technique might be to pre-load all potentially needed tables before a cryptographic operation. Besides the practicality issue, since the tables are loaded before each operation and increase the operation time, this might not even work. Modern CPUs often allow multi-threading: the sharing of CPU resources between threads. In this case, the cache is even shared between operations. This means that after the whole table (or parts of it) has been loaded, an attacker with a malicious thread can still evict the table from the cache before the secret cryptographic operations are done, and the cache-attack remains possible. The best way is to deal with these attacks by design.

After this brief introduction to side-channel attacks, we move on to the final chapter, in which we examine possibilities of side-channel attacks on BLISS.

8 Cache-Attacks on BLISS

The main result of this Master's thesis is a practical side-channel attack against BLISS. The bottleneck of this signature scheme is sampling discrete-Gaussian-distributed values for the noise vector. We revisit the two algorithms considered most practical and used by BLISS, one using a big table and one using a very small table. While the first method is the fastest and most practical way of sampling, the second method is more suitable for small devices. Both of these Gaussian sampling methods contain a weakness we can exploit using a cache-attack.

8.1 Intuition behind the Cache-Attacks
The setting of the attacks given next is the following. We use a simplified version of BLISS: the victim has two keys A, S ∈ Z_{2q}ⁿˣⁿ, such that AS ≡ qIₙ mod 2q, where the public key A and the secret key S represent full-rank lattices. We assume an attacker has access to the cache of the victim and can mount cache-attacks (Prime + Probe). The victim signs multiple messages/transactions, and the attacker collects these signatures together with cache information from the signing. Using this additional information, the attacker wants to extract the victim's secret key. The cache-attacks target the noise vector y ∈ D_σⁿ required in the BLISS signature scheme. The signature µ = (z, c) hides a system of n equations over Z:

(z₁, ..., zₙ)ᵀ = (y₁, ..., yₙ)ᵀ + (−1)ᵇ · S · (c₁, ..., cₙ)ᵀ
  (signature)     (noise)        (sign) (secret lattice) (signature)

Here, y ∈ D_σⁿ, b ∈ {0, 1} and S ∈ Z_qⁿˣⁿ are unknown. But since we target an implementation of the scheme, the secret lattice S = {sᵢ} is an NTRU lattice, meaning it consists of cyclic rotations of one vector s. The part S · c can therefore be modeled as a polynomial multiplication. However, instead of rotating s into the lattice S, we can also rotate the signature vector c into a lattice C, using the relation:

Sc = Cs

This means the hidden relation of the signatures becomes:

(z₁, ..., zₙ)ᵀ = (y₁, ..., yₙ)ᵀ + (−1)ᵇ · C · (s₁, ..., sₙ)ᵀ
  (signature)     (noise)        (sign) (signature lattice) (secret)

Here, the rows cᵢ of C are rotations of c, and we can write this equation as z = y + (−1)ᵇCs. Now suppose we could determine any noise vector y ∈ D_σⁿ from cache information. If we collect one signature, then only b and s are unknown in the equation above, and we can write it as:

(−1)ᵇ(z − y) = Cs

We can use a linear solver twice (once for each value of b) and extract the secret vector s easily; we can verify the correctness of s with the public key. So this case is easy, but being able to determine an entire noise vector from cache information seems a bit too optimistic. So let us be more restrictive: suppose we can determine, from cache information, whether a coordinate yᵢ of y lies in some small set G, where we know G in advance. We make no errors in this determination, and if yᵢ is indeed in G, we can determine its value. Since G is sparse, we probably need more than one signature, because for some coordinates of y we cannot determine the value. In total, we need to acquire (at least) n linear relations, where n is the dimension of s. Suppose we need N signatures before n coordinates are known to be in the set G. We get signatures µ_j = (z_j, c_j) with z_j = y_j + (−1)^{b_j} C_j s for 1 ≤ j ≤ N. Here, C_j has rows c_{ji}, where the c_{ji} are rotations of the signature vector c_j, and z_{ji}, y_{ji} are the coordinates of z_j, y_j for 1 ≤ i ≤ n. We can use the above vector equations, zoom in on the coordinate-wise equations, and get:

z_{ji} = y_{ji} + (−1)^{b_j} ⟨c_{ji}, s⟩

So suppose, from cache information, we know that coordinate y_{ji} is in set G and have determined its value. We can again write the equation as:

(−1)^{b_j} (z_{ji} − y_{ji}) = ⟨c_{ji}, s⟩

In this equation, bit b_j and secret vector s are unknown, so we would like to save this rotated vector c_{ji} and the value z_{ji} − y_{ji} for our linear solver. So let us call ζ_k = c_{ji}, with z_k − y_k = z_{ji} − y_{ji} as corresponding value. We can acquire n of these equations using multiple signatures and form the following system:

( (−1)^{b₁}(z₁ − y₁), ..., (−1)^{bₙ}(zₙ − yₙ) )ᵀ = M · s,  where M has rows ζ₁, ..., ζₙ

Unfortunately, all n bits b_k are unknown. This means we cannot use a linear solver directly, or we would have to try the linear solver 2^N times, where N is the number of signatures. For large N, this is useless. But we can apply a restriction to the equation we obtained earlier:

z_{ji} = y_{ji} + (−1)^{b_j} ⟨c_{ji}, s⟩

We can require that z_{ji} equals y_{ji}, verified against the signature vector z_j, before we use c_{ji} as one of the n vectors ζ_k in the system we want to solve in the last step. By doing this, we eliminate bit b_j:

(−1)^{b_j} (z_{ji} − y_{ji}) = 0 = ⟨c_{ji}, s⟩

If we collect n vectors ζ_k = c_{ji} satisfying the above equation, we end up with the system:

Ms = 0

where M is the matrix whose rows are the (possibly rotated) vectors ζ_k = c_{ji} extracted from multiple signatures, and 0 is the all-zero vector. We can simply compute the kernel of M and search for the secret key s in the kernel space.
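A sketch of this kernel step: if the collected rows really satisfy ⟨ζ_k, s⟩ = 0, an exact kernel computation already exposes the candidate secrets. The matrix entries below are made-up stand-ins for collected ζ_k vectors, not real attack data.

    from sympy import Matrix

    M = Matrix([[1, -1, 0, 1],
                [0, 1, 1, -1],
                [1, 0, 1, 0]])           # stand-in rows zeta_k
    for candidate in M.nullspace():       # exact rational kernel basis
        print(candidate.T)                # each vector is a candidate direction for s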

As a final extension of this analysis, suppose we make an error, with very low probability α ∈ (0, 1), in the determination whether coordinate y_{ji} is in set G. If we make an error, then it was not y_{ji} that was sampled but y_{ji} ± 1, so we make an error of size at most 1. We can then apply the same method as above, but Ms is no longer the zero vector; rather, it is a small vector in the lattice spanned by the signature vectors ζ_k = c_{ji} of M. Intuitively, we can use LLL to search for small vectors in this lattice. We will show that this is indeed possible. The above overview of the cache-attack is explained more thoroughly in the remaining part of this section.

8.2 Cache-Attack Model
In the previous chapter, we introduced several building blocks of the cache and the types of cache-attacks. For a cache-attack on BLISS, we make the following assumptions about the CPU model:

• Cache-lines are 64 bytes, whereas each entry of the tables used in BLISS is 8 bytes (data type LONG). This means there are 8 entries per cache-line.

• Subsequent table entries are in subsequent cache-lines. That is, the entry at position i of the table is in cache-line ⌊i/8⌋.

• Memory is mapped via direct mapping: each cache-line is mapped to the same location in the cache each time.

• We implemented a modeled version of the Prime + Probe cache-attack. This means that in the actual experiments we assume we get information about the position of the victim's cache activity. Furthermore, we assume we are able to map this cache activity to specific cache-lines of the tables used by the victim. In other words: for each table look-up at position i, we get the cache-line position ⌊i/8⌋ used by the victim.

These are quite strong assumptions, but it is realistic that a cache-attack can obtain this information without many errors; it requires a more sophisticated implementation to actually measure the cache activity for the Prime + Probe part. Lastly, we assume we know the parameters of the system, which is a mild assumption; some of the parameters are also required for verification.

8.3 Cache-Attack 1: CDT Sampling

8.3.1 Modified CDT Sampling with Acceleration Table
The first and easiest way to sample from a discrete Gaussian distribution is to construct a large table with all values of the (inverse) cumulative distribution and to sample with random values between 1/2 and 1, as described in section 6.3.3. However, actually retrieving the correct integer for a given random value means we need to search the table. The most common way is a binary search, which takes O(log N) steps, where N = τσ is the size of the table. Since the precision of the table entries needs to be quite high, each step of the binary search also requires many bit comparisons. Taking all this into account, such a method is still rather slow. There is a faster CDT sampling method, relying on two speed-ups. First, instead of doing a binary search over the whole table each time, the method first selects an interval in the table and then performs a binary search only on that interval. This requires a second table containing the intervals, which is called an Acceleration Table AT. The easiest implementation uses an acceleration table with 256 entries, where the intervals grow larger towards the end: one can simply sample one byte and immediately pick the corresponding table entry. Using such a table reduces the number of steps needed in the binary search. Note that the interval is sampled uniformly at random, while in the end we want a discrete-Gaussian-distributed value, so we need to take the uniformity of the interval selection into account. What this means in practice is that some of the intervals partially overlap: AT[i] ∩ AT[j] ≠ ∅ for some i, j ∈ {0, ..., 255}. In the end, we want for each integer x:

Σ_{i=0}^{255} P[AT[i]] · P[X = x | X ∈ AT[i]] = D_σ(x)

We call this equation the Probability Requirement. In other words: the total probability for integer x, divided over the intervals, which in the end is the probability of sampling x, should equal the probability according to the discrete Gaussian. The search must therefore be arranged so that it satisfies this requirement. We can model each step of the binary search within an interval AT[i] with a probability pᵢ of going to the left child and (1 − pᵢ) of going to the right child. Each step contributes to the probability P[X = x | X ∈ AT[i]] of sampling value x in AT[i]; all these contributions together must satisfy the Probability Requirement. We use this property later in our attack.

Figure 15: The values of the large Cumulative Distribution Table are divided over intervals. First, an interval is selected uniformly at random; in this example, interval 9 is picked. Then the correct value is found via a binary search, whose branch probabilities pᵢ are such that the Probability Requirement is satisfied. The total probability of hitting x₂ in interval 9 is in this case P[X = x₂ | X ∈ AT[9]] = p₁ · (1 − p₂).

Second, instead of using the full precision to compare the random value to a table entry, the method does byte-per-byte comparisons and draws more random bytes only when more precision is needed. This is done via the binary search: most table entries are separated enough to determine which value is to be sampled from the first few bytes, and this effect grows when comparing only within the same interval. When it is clear which sample is to be retrieved, one can stop the search and output it. So we assume the table entries are sequences of some fixed number of bytes, which we compare byte by byte until the correct value is reached. The fastest algorithm, using a big Cumulative Distribution Table together with an Acceleration Table, is roughly as follows:

Algorithm 23 CDT Sampling with Acceleration Table
Input: Cumulative Distribution Table CDT with standard deviation σ, tail-cut τ. Acceleration table AT containing intervals. Each entry CDT[x] is a sequence of bytes and represents a floating-point approximation of the inverse Cumulative Distribution Function values of D_σ.
Output: Discrete-Gaussian-distributed sample y ∈ D_σ.
1: Pick a random byte r₁ ∈ {0, ..., 255}.
2: Let interval I = AT[r₁] ⊂ {0, ..., τσ} be the interval to search for a sample.
3: Set j = 1.
4: Pick a random byte r₂.
5: Perform a binary search in I, using table look-ups in table CDT, and try to find x ∈ Z such that CDT[x] < r₂ ≤ CDT[x + 1] for byte j of the table entries.
6: If the binary search fails to find such an x ∈ Z for this comparison, set j ← j + 1 and go to step 4.
7: Compute a random sign bit b with probability 1/2.
8: Output y = (−1)ᵇx.

For most values of r2, step 5 will find the sample for j = 1.

8.3.2 Cache-Attack Weaknesses
In the cache-attack model, we assumed we get the indices of the cache-lines of the tables used. For the sampling method of algorithm 23, this means we obtain the following cache information for each coordinate yᵢ of y:

• The cache-line of the interval table AT: a range of 8 adjacent intervals R = {AT[i], ..., AT[i+7]}, i = 0, 8, 16, .... We can use the inverse of the table to retrieve the correct i, so ultimately we know r₁ ∈ {i, ..., i+7}.

• The cache-line of every look-up in the CDT table needed during the binary search: a range of 8 adjacent values T = {CDT[j], ..., CDT[j+7]}, j = 0, 8, 16, .... We can use the inverse of the table to retrieve the correct j, for which we know that the look-up x ∈ {j, ..., j+7} and CDT[x] ∈ T.

There are two types of cache weaknesses in this sampling method, which we denote by the Intersection weakness and the Jump weakness.

Intersection Weakness. We exploit the fact that we use two tables and obtain the cache-line in both cases. Given these two cache-lines, we can intersect the possible intervals from cache-line R with the possible CDT look-ups of cache-line T. This can give an interval J with the property:

J ∈ R and ∀x ∈ J : x ∈ T

[Table: values 0-11, split over CDT cache-lines 0 (values 0-7) and 1 (values 8-11), and over interval-table cache-lines 0 and 1.]

Table 2: Visualization of an Intersection Weakness. The cache-lines observed for a table look-up are given for both the CDT and the interval table. By intersecting these two cache-lines, we get an interval with the possible sampled values.

Jump Weakness. We exploit the fact that for bigger intervals, the binary-search part in the CDT becomes larger and larger. For some integers at the boundary of these intervals, the table look-up lands in a different cache-line than for all other integers in the same interval. That is, we have a set G of possible values for the table entries, with:

G ⊂ AT[i] and ∀x ∈ G : x ∈ T₂ and ∀y ∈ AT[i] \ G : y ∈ T₁

where 0 ≤ i ≤ 255. So G is a subset of AT[i] whose elements lie in a different cache-line than the remaining set AT[i] \ G; here T₁, T₂ are different cache-lines of the CDT look-ups. This means there is a jump in the cache-lines used for certain values of the interval AT[i]. So for a certain cache-access pattern we know that the corresponding Gaussian sample satisfies |y| ∈ G.

[Table: values 0-11, CDT cache-lines 0 (values 0-7) and 1 (values 8-11), intervals 0, 1 and 2.]

Table 3: Visualization of a Jump Weakness. In this example, interval AT[1] = {5, 6, 7, 8, 9} is divided over two cache-lines of the CDT, line T₀ = {0, 1, 2, 3, 4, 5, 6, 7} and T₁ = {8, 9, 10, 11, ...}. Since the binary search begins in the middle of the interval, at value 7, cache-line T₀ is always requested; only for the values 8 and 9 is cache-line T₁ requested in addition. So when both cache-lines T₀ and T₁ are requested, we get a set G = {8, 9} of possible values.

In appendix A, there is a table of weaknesses for every parameter set advised by the authors of BLISS. These two types of weaknesses can give a range of possible values for a coordinate yᵢ of the noise vector y ∈ D_σⁿ. However, if this range is too large, we do not gain enough information to find the secret key. In the next section, we describe two ways of restricting ourselves, which help us recover more information.

8.3.3 Exploiting the Weakness
For each standard deviation σ, we can identify the possible Intersection and Jump weaknesses in the Gaussian sampling method. Moreover, we can restrict ourselves to only those weaknesses with additional properties, which will help us in the offline part of the attack. Denote the size of an Intersection weakness by the size |J| of the interval J, and the size of a Jump weakness by the size |G| of the set G, where J, G are as defined in the previous section. For the CDT sampling method, we have the following additional requirements:

Size Requirement. The weaknesses derived in the previous section can have any size, but are at least of size two (unless an interval is totally unique). The reason for this is simple: the binary search is a search in a binary tree, so the last step always returns one of two values. We restrict ourselves to weaknesses of size exactly two. Then we know from the cache-line analysis that the coordinate must satisfy |yᵢ| ∈ {g₁, g₁ + 1} or |yᵢ| ∈ {g₁, g₁ − 1} for some value g₁ ∈ {0, ..., τσ}; from now on, we denote these possibilities by |yᵢ| ∈ {g₁, g₁ ± 1}. Since we have narrowed the possibilities down to two values, we make an error of size at most 1 if we assume either of them to be true.

Biased Requirement. Remember that the intervals can partially overlap and must satisfy the Probability Requirement:

Σ_{i=0}^{255} P[AT[i]] · P[X = x | X ∈ AT[i]] = D_σ(x)

Each step in the binary search can be modeled with a probability pᵢ of going to the left child and (1 − pᵢ) of going to the right child. By using cache weaknesses combined with the Size Requirement, we have narrowed the possibilities for a coordinate down to two: |yᵢ| ∈ {g₁, g₁ ± 1}. This means we know the path in the binary search tree up to the last step. For some samples, however, due to the Probability Requirement and the probabilities pᵢ, we can furthermore determine that if |yᵢ| ∈ {g₁, g₁ ± 1}, then only a fraction α of the time it will be g₁ (or g₁ ± 1). In other words: the last step of the binary search within an interval is very biased towards one value. We only take those weaknesses which have a small α, and thus are very biased. By assuming the value with the highest probability (1 − α) to be true, we make an error of size at most 1 with low probability α.

Figure 16: Example of a weakness satisfying the Biased Requirement. In this example, both x₁ and x₂ are part of a cache weakness. So when the left part of interval 9 is requested, the attacker knows the search path up to the last step in the tree. When this is the case, with probability (1 − α) the sample is x₂. For small α (≈ 0), this gives the attacker additional information: he can assume x₂ to be the sample, and this is correct with high probability (1 − α). This behavior is possible because of the Probability Requirement.

Given both the Size and the Biased Requirement, we narrow the number of exploitable weaknesses a little, but this helps us a lot in the offline part. In appendix A, there is a table of weaknesses satisfying the Size and Biased Requirements for every parameter set advised by the authors of BLISS.

8.3.4 Extracting the Secret Key
Note that in the previous section we concluded that we can learn that the absolute value of a coordinate yᵢ of the noise vector is one of two values: |yᵢ| ∈ {g₁, g₁ ± 1}. This leaves open the question which sign the coordinate yᵢ has. However, we assume we can learn it by looking at the sign of zᵢ = yᵢ + (−1)ᵇ⟨cᵢ, s⟩, because sign(zᵢ) ≠ sign(yᵢ) is possible only if |⟨cᵢ, s⟩| ≥ |zᵢ| + |yᵢ|. Since both c and s are sparse and small, we assume this possibility to be negligible. So when we learn |yᵢ|, we learn its sign by looking at the sign of zᵢ.

In section 8.1, when discussing the intuition behind the cache-attacks, we ended with a scenario in which we were able to determine whether a coordinate yᵢ is in some sparse set G, making an error with low probability α ∈ (0, 1). For these coordinates yᵢ, we can determine their values up to an error of size 1. To link this with the previous section: cache weaknesses satisfying the Size Requirement give us a sparse set G where we can determine the coordinate yᵢ up to an error of size 1, and the Biased Requirement with small α gives the low probability of making an error. In this section, we finalize the attack using LLL.

We assume an attacker Eve has access to the cache of victim Alice and can mount cache-attacks. Alice has a BLISS key-pair (A, S) = (a, s) and signs multiple messages/transactions. Eve collects information from the cache, together with the signatures µ_j = (z_j, c_j) with z_j = y_j + (−1)^{b_j} C_j s for 1 ≤ j ≤ N, where the number of signatures N is high enough to obtain enough linear relations to extract the secret key. Here, C_j has rows c_{ji}, where the c_{ji} are rotations of the signature vector c_j, and z_{ji}, y_{ji} are the coordinates of z_j, y_j for 1 ≤ i ≤ n. We can use the above vector equations, zoom in on the coordinate-wise equations, and get:

z_{ji} = y_{ji} + (−1)^{b_j} ⟨c_{ji}, s⟩

So suppose, from cache information, Eve knows that coordinate y_{ji} is in set G and has determined its value, making an error of size at most 1 with probability α. Eve requires that z_{ji} = y_{ji} before collecting ζ_k = c_{ji} as a row of a matrix M, because in that case:

⟨ζ_k, s⟩ ∈ {0, 1, −1}

and with probability (1 − α): ⟨ζ_k, s⟩ = 0. After collecting n of these vectors ζ_k, we have the following information about Ms:

E[ ‖Ms‖₂² ] = αn

This means that for small α, the vector Ms is a small vector in the lattice spanned by the rows ζ_k of M. We can apply the LLL basis reduction algorithm to M, to get an LLL-reduced version M_R of M and a unimodular transformation matrix U with:

MU = M_R

We cannot verify the correctness of the vector Ms, so we cannot search for it in M_R directly. However, we can try all columns u_k of U and test whether one of them gives the secret key, by verifying against the public key: we rotate the vector u_k into an NTRU lattice P and check whether AP ≡ qI mod 2q. If this holds, we have found the secret key (or its negative)!

Note that we used a vague notion of "short vector" for Ms, because it is unclear under which conditions it is short enough that a basis reduction algorithm like LLL finds it. We do not (yet) have a proof of when LLL finds it, but in practice we have an easy way to make sure it does: randomize the process. Instead of waiting for n vectors ζ_k = c_{ji}, gather more than n, for instance 2n vectors, and pick a random subset of n vectors as input for LLL. Experiments (section 8.5) confirm that this method works and succeeds in finding the secret key (or its negative). The cache-attack is summarized as follows:

Algorithm 24 Cache-Attack on BLISS with CDT Sampling
Input: Signer Alice with key-pair (A, S). Malicious Eve with access to Alice's cache patterns. Input parameters n, σ, q, κ of BLISS. Eve has access to signature vectors (z, c) from Alice. Alice uses CDT sampling with acceleration table AT for the noise vector y.
Output: Eve extracts secret key S of Alice.
1: Let k = 0 be the number of vectors gained by Eve and let L = [] be an empty list of vectors.
2: While k < 2n:
3:   Alice creates a signature (z, c). Eve collects this signature, together with cache information for each coordinate yᵢ of the noise vector y. Let cᵢ be a rotation of the vector c.
4:   For each i = 1, ..., n:
5:     If Eve can determine coordinate yᵢ (with error probability α) from the cache information, and if zᵢ = yᵢ: include the vector ζ_k = cᵢ in L and set k = k + 1.
6: End While.
7: Set boolean KeyFound = False.
8: While not KeyFound:
9:   Take a random subset of n vectors of L and construct matrix M.
10:  Perform LLL basis reduction on M to get MU = M_R, where U is a unimodular transformation matrix and M_R is LLL-reduced.
11:  For each i = 1, ..., n:
12:    Construct NTRU lattice P by rotating column vector uᵢ of U.
13:    If AP ≡ qI mod 2q, set KeyFound = True.
14: Return secret key S = P of Alice.

8.3.5 Complexity Analysis There are two things that determine the speed of this side-channel attack: the number of signatures needed to get enough signatures and the number of LLL lattice basis reductions needed to find the secret key. We will give a formula for the first complexity, but for the second part we first need to understand why LLL can be used to find the key, which is still an open question at this moment. Experimental results suggest that about 2 LLL computations are sufficient to find the secret key. We do know that we need Ms to be a small vector, otherwise it will not be in the reduced version of M. This is satisfied, because we restricted the sizes with the size and biased requirements. From the cache weaknesses we get a set G, for which we can determine if yi G and we ∈ make an error up to size 1 with probability α. It means that for yi G, there is a cache access ∈ pattern wi with a weakness, satisfying the size and biased requirement. Let W be the set of cache-access patterns for set G. Then if wi W , we know that yi G and can determine its ∈ ∈ value. Then step 5 of the algorithm is satisfied when wi W and zi = yi. The probabilities of ∈ these events are independent of each-other, which means:

P[zi = yi, wi ∈ W] = P[zi = yi] · P[wi ∈ W]

The first factor we can write in a different way:

P[zi = yi] = P[⟨s, ci⟩ = 0]

The right-hand side can be calculated, because we know the distributions of s and ci. The second factor, P[wi ∈ W], can be calculated using knowledge about the interval table AT and the probability requirement. We use a heuristic approach and calculate this probability by simulation. In total, the expected number of signatures N needed to acquire the 2n vectors ζk used in the algorithm equals:

E[N] = 2n / (n · P[zi = yi, wi ∈ W]) = 2 / (P[⟨s, ci⟩ = 0] · P[wi ∈ W])

In the Experiments section (Section 8.5), this expected number is given together with the experimental values.
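The factor P[⟨s, ci⟩ = 0] can also be estimated by simulation. Below is a minimal Monte Carlo sketch; as a simplification (an assumption of ours, since the full BLISS secret also contains ±2 entries), s is modeled with d1 random coefficients equal to ±1 and c with κ random coefficients equal to 1. All names are hypothetical.

import random

def estimate_prob_inner_zero(n, kappa, d1, trials=100_000):
    # Monte Carlo estimate of P[<s, c> = 0] under the simplified model.
    hits = 0
    for _ in range(trials):
        s = {i: random.choice((-1, 1)) for i in random.sample(range(n), d1)}
        c_pos = random.sample(range(n), kappa)
        if sum(s.get(i, 0) for i in c_pos) == 0:
            hits += 1
    return hits / trials

# With an estimate p_W of P[wi in W] obtained by simulating the sampler's
# cache behaviour, the formula above becomes:
#   E[N] = 2 / (estimate_prob_inner_zero(n, kappa, d1) * p_W)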

8.4 Cache-Attack 2: Rejection Sampling

8.4.1 Modified Rejection Sampling with Exponential Table

The huge downside of CDT sampling is the big look-up table. Section 6.3.2 introduced rejection sampling, but this method has significant practicality issues: the number of rejections and the cost of calculating the rejection probability. In the same paper [8] in which BLISS was introduced, the authors describe an improved rejection sampling algorithm relying on two speedups. First, instead of trying to sample from Dσ directly, break this distribution into pieces that are nicer to sample from, using the following property:

Dσ = K · Dσ₂ + U(0, K − 1)

where K = ⌈σ/σ₂⌉ + 1 is the number of pieces, Dσ₂ is the discrete binary Gaussian distribution with standard deviation σ₂ = √(1/(2 ln 2)), and U(0, K − 1) is the discrete uniform distribution between 0 and K − 1. This means that, instead of sampling from Dσ, one needs to draw one sample from Dσ₂ and one sample from U(0, K − 1). There is an efficient way of sampling from Dσ₂ using random bits [8], and sampling from a discrete uniform distribution is easy. If samples x₁ ∈ Dσ₂ and x₂ ∈ U(0, K − 1) are drawn accordingly, then y = K · x₁ + x₂ is distributed according to Dσ if we accept with probability exp(−x₂(x₂ + 2Kx₁)/(2σ²)). The number of rejections is far smaller than with plain rejection sampling (Section 6.3.2), so by breaking the distribution into smaller pieces the sampling time decreases significantly.

The second speedup is based on the fact that accepting a sample with an exponentially valued probability is hard to do in practice. These exponential values need high precision, and hence take some time to compute on the fly. An easy improvement is simply storing these exponential values and looking them up in a table, but this would again require a big look-up table, which is what we want to avoid in the first place. However, one can use the binary representation of the sample and perform a rejection step for each non-zero bit of it. This means that, instead of needing N table entries, one needs only log N. Combining this method with the previous speedup, the table size is log N for N = (K − 1)(K − 1 + 2Kτσ₂), with tail-cut τ. Using both speedups, the modified rejection sampling algorithm with an exponential table is summarized below:

Algorithm 25 Rejection Sampling with Exponential Table
Input: Standard deviation σ, tail-cut τ. Values σ₂ = √(1/(2 ln 2)), K = ⌈σ/σ₂⌉ + 1. Small exponential table ET with values ET[i] = exp(−2^i/(2σ²)) for i ∈ {0, ..., log N}, where N = (K − 1)(K − 1 + 2Kτσ₂).
Output: Gaussian sample y ∈ Dσ.
1: Sample x₁ ∈ Dσ₂ via rejection sampling using random bits.
2: Sample x₂ ∈ U(0, K − 1) and set y = Kx₁ + x₂.
3: Set r₁ = x₂(x₂ + 2Kx₁).
4: For each non-zero bit i of r₁, compute a bit that is 1 with probability ET[i]. If any of these bits is zero, go to step 1.
5: If y = 0: compute bit r₂ with probability 1/2. If r₂ = 0, go to step 1.
6: Compute random sign bit b with probability 1/2.
7: Output (−1)^b · y.
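For concreteness, the following Python sketch implements Algorithm 25 with two simplifications that are assumptions of ours rather than part of the algorithm: the binary Gaussian is sampled by plain rejection instead of the random-bit method of [8], and the acceptance probability is evaluated with floating-point exp instead of the bitwise table ET. The sketch therefore reproduces the output distribution, not the table-access behaviour.

import math
import random

SIGMA2 = math.sqrt(1.0 / (2.0 * math.log(2.0)))  # sigma_2 = sqrt(1/(2 ln 2))

def sample_binary_gaussian(tail=16):
    # P[x] proportional to 2^(-x^2) on {0, ..., tail}; plain rejection from
    # a uniform proposal (a simplification of the random-bit method of [8]).
    while True:
        x = random.randint(0, tail)
        if random.random() < 2.0 ** (-(x * x)):
            return x

def sample_gaussian(sigma):
    # Algorithm 25 with exp() evaluated directly. The decomposition is exact
    # when sigma = K * sigma2 (the BLISS parameters are chosen this way);
    # otherwise it is an approximation.
    K = int(math.ceil(sigma / SIGMA2)) + 1
    while True:
        x1 = sample_binary_gaussian()                       # step 1
        x2 = random.randint(0, K - 1)                       # step 2
        y = K * x1 + x2
        r1 = x2 * (x2 + 2 * K * x1)                         # step 3
        if random.random() >= math.exp(-r1 / (2.0 * sigma * sigma)):
            continue                                        # step 4 (reject)
        if y == 0 and random.getrandbits(1) == 0:
            continue                                        # step 5
        return y if random.getrandbits(1) == 0 else -y      # steps 6-7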

8.4.2 Cache-Attack Weakness and Exploitation

For this second cache-attack to work, we assume that we get the following information for each sample yi in vector y:

• The activity of table ET during the last trial of the sampling algorithm, that is: whether there has been a table look-up in ET for the last, non-rejected value, which ultimately becomes the value of yi.

This information is based on step 4 of the algorithm, the only step with possible table look-ups in ET. In particular, when x₂ = 0 there are no look-ups in ET at all, since r₁ = 0. Let Weight(z) of a number z be defined as the Hamming weight of z, that is, the number of non-zero coefficients in the binary representation of z. Then step 4 of the algorithm uses no table look-up precisely when Weight(r₁) = 0 with r₁ = x₂(x₂ + 2Kx₁):

[Figure: Hamming weight (vertical axis, 2-10) of r₁ = x₂(x₂ + 2Kx₁), plotted against y = Kx₁ + x₂ (horizontal axis, 50-200).]

Figure 17: The weight of the numbers r₁ = x₂(x₂ + 2Kx₁) for x₂ ∈ {0, 1, ..., K − 1} and x₁ ∈ {0, 1, 2, 3}, with K = 50. These weights equal the number of table look-ups in table ET in the rejection sampling algorithm.

So when there is no table look-up for yi, we know that |yi| ∈ {0, K, 2K, ...} = G. Again we assume to learn the sign of yi by looking at the corresponding coordinate zi of signature vector z. For this to determine a unique yi, we need |⟨s, ci⟩| ≤ κ < K. Furthermore, since κ < K, we know the exact value of yi whenever no table look-up was used to sample it. Verifying that κ < K is easily done using the public parameters.
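The observation behind Figure 17 is easy to check by enumeration; a small sketch (function name hypothetical):

def et_lookups(x1, x2, K):
    # Number of ET look-ups in Algorithm 25: the Hamming weight of
    # r1 = x2 (x2 + 2 K x1).
    return bin(x2 * (x2 + 2 * K * x1)).count("1")

# Zero look-ups happen exactly when x2 = 0, i.e. when y = K*x1 is a
# multiple of K:
K = 50
assert all((et_lookups(x1, x2, K) == 0) == (x2 == 0)
           for x1 in range(4) for x2 in range(K))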

8.4.3 Extracting the Secret Key

We can use the same method as in the first cache-attack to combine the cache information with information from signature vector z. In the intuition of the cache-attacks (Section 8.1), we discussed an attack scenario in which we can determine, from cache information, whether a coordinate yi of y lies in some sparse set G known in advance. We make no errors in determining whether yi is in G, and if yi is indeed in G, we know its value. The above weakness satisfies this scenario: if there is no table look-up in ET for coordinate yi, we know the value of yi without making any errors.

Analogously to the previous attack: Alice has a BLISS key-pair (A, S) and signs multiple transactions/messages. Eve collects cache information together with the signatures µj = (zj, cj), with zj = yj + (−1)^{bj} Cj s for 1 ≤ j ≤ N, where the number of signatures N is high enough to obtain enough linear relations to extract the secret key. Here Cj has rows cji, where the cji are rotations of signature vector cj, and zji, yji are the coordinates of zj, yj for 1 ≤ i ≤ n. Again using coordinate-wise equations:

zji = yji + (−1)^{bj} ⟨cji, s⟩.

So suppose that, from the cache information, Eve knows that coordinate yji ∈ {0, ±K, ±2K, ...} and has determined its exact value. Eve requires that zji = yji before including ζk = cji in a matrix M whose rows are the vectors ζk, because in that case:

⟨ζk, s⟩ = 0

When Eve collects n of these vectors ζk satisfying the above equation, she has the following system:

Ms = 0

It means that s is a kernel vector of M. One can show that n vectors collected uniformly at random are very likely to be linearly independent, in which case M would have only the trivial kernel. Since we know by construction that Ms = 0, the kernel is non-trivial, so it is very likely to be spanned by exactly the secret vector! Calculating the kernel space of a matrix is easily done. If the kernel space nevertheless does not contain the secret vector, one can collect more vectors and repeat. In total, the cache-attack works as follows:

Algorithm 26 Cache-Attack on BLISS with Rejection Sampling
Input: Signer Alice with key-pair (A, S). Malicious Eve with access to cache-patterns of Alice. Input parameters n, σ, q, κ of BLISS. Eve has access to signature vectors (z, c) from Alice. Alice uses rejection sampling with table ET for noise vector y.
Output: Eve extracts secret key S of Alice.
1: Let k = 0 be the number of vectors gained by Eve and let L = [] be an empty list of vectors.
2: While (k < n):
3:   Alice creates signature (z, c). Eve collects this signature, together with cache information for each coordinate yi of noise vector y. Let ci denote the rotations of vector c.
4:   For each i = 1, ..., n:
5:     If Eve can determine coordinate yi (with no error) from cache information and if zi = yi: then include vector ζk = ci in L and set k = k + 1.
6: End While.
7: Calculate the kernel space of L; this gives a matrix U such that LU = 0, where 0 is the all-zero matrix.
8: For each column ui of U:
9:   Construct NTRU lattice P by rotating column vector ui of U.
10:  Check if AP ≡ qI mod 2q. If this is the case, return secret key S = P of Alice.
11: Goto step 2.
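Steps 7-10 are straightforward to realize. A minimal sketch of the kernel computation using sympy is given below (function name hypothetical); each resulting candidate would then be passed to the same AP ≡ qI mod 2q check sketched for the first attack.

from sympy import Matrix, lcm

def kernel_candidates(rows):
    # rows: the collected integer vectors zeta_k with <zeta_k, s> = 0.
    # Returns the kernel basis of the matrix they span, with each rational
    # kernel vector scaled to a primitive integer vector; by construction
    # +/-s should appear among the candidates.
    L = Matrix(rows)
    candidates = []
    for v in L.nullspace():
        scale = lcm([term.q for term in v])   # clear all denominators
        candidates.append([int(term * scale) for term in v])
    return candidates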

8.4.4 Complexity Analysis

As with Cache-Attack 1, the complexity is determined by the number of signatures needed and by the offline part, which we account as the running time of finding the kernel space divided by the probability that the secret vector is in the kernel. For simplicity we take the probability that the secret vector is part of the kernel to be the probability that n random vectors are linearly independent, which is approximately 1. This means that the offline part is negligible and that the algorithm will essentially always terminate.

Estimating the probability that we can determine yi is easier now and can be calculated exactly, because it is the probability that yi ∈ {0, ±K, ±2K, ...}. Again, when zi = yi, we know that ⟨c, s⟩ = 0. So in total, the expected number of signatures is:

E[N] = 1 / (P[yi ∈ {0, ±K, ±2K, ...}] · P[⟨s, c⟩ = 0])

In the next section, this expected number is given together with the experimental values.
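The first probability follows directly from the Gaussian weights ρσ(x) = exp(−x²/(2σ²)); a small sketch (function name hypothetical, exact up to the tail cut):

import math

def prob_multiple_of_K(sigma, K, tail=20):
    # Probability that a D_sigma sample is a multiple of K, i.e. that no
    # ET look-up happens in Algorithm 25; sums are truncated at the tail.
    rho = lambda x: math.exp(-x * x / (2.0 * sigma * sigma))
    top = sum(rho(j * K) for j in range(-tail, tail + 1))
    bot = sum(rho(x) for x in range(-tail * K, tail * K + 1))
    return top / bot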

8.5 Experiments

The authors of BLISS provided a research-oriented implementation of the signature scheme on their web-page [7]. This implementation is not optimized, but it sufficed to show experimental results for the above cache-attacks. The implementation of the signature scheme was tweaked to provide the cache information. For each of the parameter sets eventually recommended by the authors, the cache-attack succeeds in finding the secret key and breaking the scheme. The following table states the expected number of signatures E[N], based on the complexity analysis, and the average experimental values N for the required number of signatures.

Sampling Method      Parameter Set BLISS   E[N]   N      Running Time Offline Part
CDT Sampling         BLISS-0 (Toy)         5036   5084   3.535 (1.69)
                     BLISS-I               900    880    57.792 (1.73)
                     BLISS-II              4039   4032   72.951 (2.18)
                     BLISS-III             1859   1895   45.272 (1.18)
                     BLISS-IV              2377   2402   66.875 (1.64)
Rejection Sampling   BLISS-0 (Toy)         1102   1113   0.839 (1.0)
                     BLISS-I               1694   1671   14.709 (1.0)
                     BLISS-II              839    824    14.437 (1.0)
                     BLISS-III             2970   3018   15.951 (1.0)
                     BLISS-IV              4154   4223   18.103 (1.0)

Table 4: Experimental results of cache-attacks 1 and 2 on BLISS. For each parameter set and sampling method, we ran 100 experiments, all of which succeeded in finding the secret key. The expected number of necessary signatures E[N] and the average experimental value of N are given. Time is given in seconds; between parentheses is the average number of LLL reductions or kernel calculations.

The experimental numbers of signatures are close to the expected numbers. An important end note is that the offline part is influenced by the dimension n, but for CDT sampling it also highly depends on the number of LLL computations one has to perform. Since this number is still not well understood, the timings are merely an indication of a possible running time. The experiments always succeeded in finding the secret key, and the average running times are quite low. In the case of rejection sampling we do know that the kernel will be found with probability about 1, which is confirmed by the experiments.

8.6 Countermeasures

We discuss some countermeasures against the cache-attacks given above. The CDT sampling method uses two tables, which give rise to two different cache weaknesses. The jump weaknesses can be countered by simply iterating over all values inside an interval: this always causes a jump whenever an interval is divided over multiple cache-lines. The intersection weakness, however, is not that simple to avoid. The best countermeasure would be to take this weakness into account when both tables are constructed, that is, to prevent leaking information when the cache-lines of both tables are given. However, it is unclear how this can be done efficiently.

Tweaking the construction of the tables could also help prevent the offline part of the particular attacks described in this section. By making sure that no interval contains a biased value, the attacker cannot construct the small vector to search for with LLL, and therefore does not know how to choose the linear relations based on the signature vector z. But also in this case it is unclear how to tweak the table such that all intervals are unbiased.

Finally, for the rejection sampling algorithm the easiest countermeasure is to always randomly sample a fixed number (one or two) of values from the exponential table. This means there is always cache-activity and no way of distinguishing between the values, while adding only minimal extra time; see also the sketch below.
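A related, stronger variant of this idea (not the fixed-number-of-dummy-reads countermeasure itself) is to touch every entry of ET on each look-up and select the wanted value arithmetically, so the access pattern is independent of the secret index. The sketch below only illustrates the access pattern: Python gives no real constant-time guarantees, and a production implementation would have to be written in low-level constant-time code. It assumes the table stores integer (fixed-point) values.

def uniform_access_lookup(table, idx):
    # Read every entry so the set of touched cache lines does not depend
    # on idx; select the wanted entry with a mask instead of a branch on
    # secret data.
    out = 0
    for i, v in enumerate(table):
        mask = -(i == idx)      # -1 (all ones) if i == idx, else 0
        out |= v & mask
    return out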

8.7 A Short Note on Timing Attacks

We began the side-channel analysis of this thesis by looking at timing attacks. This type of attack is very powerful, because it can easily be mounted remotely. A timing attack measures the total time to perform a cryptographic operation (signing, decryption) and tries to use differences in these times to extract the secret key. Most discrete Gaussian sampling methods have a timing issue: one can link the average sampling time to the size of a sample. This is also noted in [23]. However, how to turn this timing issue into an exploit remains unclear. There are several major issues to cope with:

• Only the total signature time is given, but one needs the time per coordinate. Assume that the attacker gets the time as a number of clock cycles (an integer). Only when all sampling times of the coordinates are relatively prime would one be able to reconstruct the times of the individual coordinates and map them to the correct places. However, this is very unlikely to hold, and repetitions are possible as well. Otherwise, the attacker faces a subset-sum problem, which in general is hard. Assuming that the remaining part of the algorithm runs in constant time is also highly unrealistic: many factors introduce noise, making the subset-sum problem even harder.

• Even if on average one is able to link a sampling time to the size of a sample, it is unclear how to use this when the attacker is given just a single sampling time. In most timing and cache-timing attacks, one can use many running times and average out the noise. This is not possible for timing attacks on the Gaussian sampling: each signature uses a fresh noise vector, so an attacker cannot collect multiple sampling times for the same coordinate and average out the noise. The attacker is given one time and has to decide, from that time alone, which sample it is, because a coordinate of the noise vector is used only once. He could construct a confidence interval for each sample and, when the measured time falls inside one of these intervals, decide it is the corresponding sample. This only works when all confidence intervals are non-overlapping, and for the sampling methods with big tables the times are nearly identical. The attacker is then very likely to make many errors.

For the cache-attacks described in the previous sections, no timing information is used to determine the samples. This means that, once the cache-line activity is given, the attack is completely deterministic.

9 Summary

We started this thesis with the cryptographic aspects of the Bitcoin protocol, both the Blockchain and digital signatures, and showed that we need to adapt them to make the protocol post-quantum secure. The most important part to change is the digital signatures, since the Blockchain's security may also rely on the security of these signatures. As an introduction to post-quantum signature schemes, hash-based signatures were discussed. Although these schemes have some practicality issues, certainly in the use-case of Bitcoin, it is advisable to switch to these schemes first when quantum computers arise: their security is well understood and no major hardware/software changes are necessary. However, lattice-based signature schemes are more practical in this case, which is why we focused on them. The security level of lattice-based signature schemes is still uncertain, and it was certainly not clear whether they are robust against side-channel attacks. These attacks have to be considered before lattice-based signatures are used for serious applications such as Bitcoin. To understand the security, we introduced the theory of lattices and lattice-basis reduction. We focused on a highly optimized scheme, called BLISS, and examined a crucial step, the discrete Gaussian sampler, in more depth.

In the last chapter we showed two cache-attacks on BLISS, which resulted in breaking the scheme. The first exploits weaknesses that arise when sampling a discrete Gaussian using a cumulative distribution table, combined with an acceleration table. The sampling algorithm has two potential cache-attack weaknesses, called intersection and jump weaknesses. For the offline part to work, we restricted which of these weaknesses we use, at the expense of needing more samples. When both the size and biased requirements are satisfied, one can construct a small vector inside a lattice spanned by vectors extracted from signatures. By using the LLL lattice-basis reduction algorithm, one is able to find the secret key in the unimodular transformation matrix of the reduction. It is, however, still unclear why this works, and it remains an open question. The second cache-attack applies when the discrete Gaussian sampling is implemented with a rejection sampling algorithm, combined with a small table of exponential values. For certain outputs of the sampler there is no look-up in this table, which significantly reduces the number of possible values whenever this is observed. This resulted in a concrete offline attack, where the secret vector is part of the kernel of the integer matrix spanned by signature vectors.

9.1 Conclusions

The goal of this thesis was to examine side-channel attacks on BLISS, an optimized lattice-based signature scheme. We started with the possibilities of timing attacks. Some people had expressed the belief that such attacks would be easy to mount, because sampling a discrete Gaussian in constant time is not doable with current methods. Despite this, we found no way of mounting such a timing attack. The main problem is that only the global execution time is retrievable, whereas one needs the sampling time per coordinate. Our results on concrete side-channel attacks on BLISS, based on the discrete Gaussian sampler, are new as far as we know. To put this in a broader perspective: we think this work helps narrow the gap between lattice-based cryptography in theory and in practice. Using a big look-up table for the discrete Gaussian sampler invites cache-attacks, but we also showed that the alternative, based on rejection sampling, has a cache-attack weakness. This means we need to re-invent ways to sample a discrete Gaussian, or implement current methods more securely, before the scheme is ready for implementation in the real world.

9.2 Future Work

This thesis leaves several open questions. The most important one is why the offline part of the first cache-attack finds the secret key. If one were to list all vectors of the same size as the secret in the lattice we created, then it would seem highly unlikely that we find exactly the secret key. Yet experiments confirm that the secret key is found with high probability. It could be that there are not many short vectors in the lattice we create, making it likely that we find the secret key. Another possibility is that the weakness actually lies in the part where the sparse signature vector is created by hashing. This is a vital step in our cache-attack and we will investigate it further in the future. The second important question is how we can adapt the CDT sampler with acceleration table to make it robust against cache-attacks. We think it should be robust by design, which means the tables should be constructed in such a way that the attacks are not possible.

Possible extensions of this thesis include examining cache-attacks on the discrete Gaussian sampler in other cryptographic algorithms that use it, such as an LWE encryption scheme; there, too, the cache-attacks should provide linear relations involving the secret key. Furthermore, other side-channel attacks might also be possible, such as power analysis, which is out of the scope of this thesis.

References

[1] Miklós Ajtai, Ravi Kumar, and D. Sivakumar. A sieve algorithm for the shortest lattice vector problem. In Jeffrey Scott Vitter, Paul G. Spirakis, and Mihalis Yannakakis, editors, STOC, pages 601–610. ACM, 2001.

[2] Daniel Augot, Lejla Batina, Daniel J. Bernstein, Joppe Bos, Johannes Buchmann, Wouter Castryck, Orr Dunkelman, Tim Güneysu, Shay Gueron, Andreas Hülsing, Tanja Lange, Mohamed Saied Emam Mohamed, Christian Rechberger, Peter Schwabe, Nicolas Sendrier, Frederik Vercauteren, and Bo-Yin Yang. Initial recommendations of long-term secure post-quantum systems. Available at http://pqcrypto.eu.org/docs/initial-recommendations.pdf, 2015.

[3] Daniel J. Bernstein. Cache-timing attacks on AES. https://cr.yp.to/antiforgery/cachetiming-20050414.pdf, 2005.

[4] Daniel J. Bernstein, Daira Hopwood, Andreas Hülsing, Tanja Lange, Ruben Niederhagen, Louiza Papachristodoulou, Michael Schneider, Peter Schwabe, and Zooko Wilcox-O'Hearn. SPHINCS: practical stateless hash-based signatures. In Elisabeth Oswald and Marc Fischlin, editors, EUROCRYPT, volume 9056 of Lecture Notes in Computer Science, pages 368–397. Springer, 2015.

[5] David Brumley and Dan Boneh. Remote timing attacks are practical. Computer Networks, 48(5):701–716, 2005.

[6] Whitfield Diffie and Martin E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, 1976.

[7] Léo Ducas, Alain Durmus, Tancrède Lepoint, and Vadim Lyubashevsky. BLISS: Bimodal Lattice Signature Schemes. http://bliss.di.ens.fr/, 2013.

[8] Léo Ducas, Alain Durmus, Tancrède Lepoint, and Vadim Lyubashevsky. Lattice signatures and bimodal Gaussians. In Ran Canetti and Juan A. Garay, editors, CRYPTO, volume 8042 of Lecture Notes in Computer Science, pages 40–56. Springer, 2013.

[9] Ulrich Fincke and Michael Pohst. Improved methods for calculating vectors of short length in a lattice, including a complexity analysis. Mathematics of Computation, 44(170):463–471, 1985.

[10] Oded Goldreich, Shafi Goldwasser, and Shai Halevi. Public-key cryptosystems from lattice reduction problems. In Burton S. Kaliski Jr., editor, CRYPTO, volume 1294 of Lecture Notes in Computer Science, pages 112–131. Springer, 1997.

[11] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Gary L. Miller, editor, STOC, pages 212–219. ACM, 1996.

[12] Tim Güneysu, Vadim Lyubashevsky, and Thomas Pöppelmann. Practical lattice-based cryptography: A signature scheme for embedded systems. In Emmanuel Prouff and Patrick Schaumont, editors, CHES, volume 7428 of Lecture Notes in Computer Science, pages 530–547. Springer, 2012.

[13] Jemima Kelly. Nine of world's biggest banks join to form blockchain partnership. http://www.reuters.com/article/us-banks-blockchain-idUSKCN0RF24M20150915, 2015.

[14] Paul C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Neal Koblitz, editor, CRYPTO, volume 1109 of Lecture Notes in Computer Science, pages 104–113. Springer, 1996.

[15] Leslie Lamport. Constructing digital signatures from a one-way function. Technical Report CSL-98, SRI International, Palo Alto, 1979.

[16] Arjen K. Lenstra, Hendrik W. Lenstra Jr., Mark S. Manasse, and John M. Pollard. The number field sieve. In Harriet Ortiz, editor, STOC, pages 564–572. ACM, 1990.

[17] Arjen K. Lenstra, Hendrik W. Lenstra Jr., and László Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen, 261(4):515–534, 1982.

[18] Ralph C. Merkle. A certified digital signature. In Gilles Brassard, editor, CRYPTO, volume 435 of Lecture Notes in Computer Science, pages 218–238. Springer, 1989.

[19] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf, 2008.

[20] Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, and Angelos D. Keromytis. The spy in the sandbox: Practical cache attacks in JavaScript and their implications. In Indrajit Ray, Ninghui Li, and Christopher Kruegel, editors, ACM SIGSAC, pages 1406–1418. ACM, 2015.

[21] Colin Percival. Cache missing for fun and profit. http://css.csail.mit.edu/6.858/2011/readings/ht-cache.pdf, 2005.

[22] Ronald L. Rivest, Adi Shamir, and Leonard M. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM, 21(2):120–126, 1978.

[23] Markku-Juhani O. Saarinen. Gaussian sampling precision and information leakage in lattice cryptography. IACR Cryptology ePrint Archive, 2015:953, 2015.

[24] David Schwartz, Noah Youngs, and Arthur Britto. The Ripple protocol consensus algorithm. Ripple Labs Inc White Paper, 2014.

[25] Peter W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Review, 41(2):303–332, 1999.

[26] Robert S. Winternitz. Producing a one-way hash function from DES. In David Chaum, editor, CRYPTO, pages 203–207. Plenum Press, New York, 1983.

A Cache Weaknesses for Suggested Parameter Sets

In this appendix, we present the analysis part of cache-attack 1. For each parameter set suggested by the authors of BLISS, two tables are given: an overview table with all intervals and the corresponding cache-lines of the interval table (AT) and of the table CDT, and a table listing both intersection and jump weaknesses, with the cache patterns satisfying the size requirement. Lastly, after also restricting with the biased requirement, we end up with a set of values G with an error rate of α.

BLISS-0 Overview Cache-Line Analysis

Intervals                                                              Cache-line AT   Cache-line CDT
[[0, 2], [1, 3], [2, 4], [3, 5]]                                       0    0
[[4, 6], [5, 7], [6, 8]]                                               1    0
[[7, 9]]                                                               1    1
[[8, 10], [9, 11], [10, 12], [11, 13]]                                 2    1
[[12, 14], [13, 15], [14, 16]]                                         3    1
[[15, 17]]                                                             3    2
[[16, 18], [17, 19], [18, 20], [19, 21]]                               4    2
[[20, 22], [21, 23], [22, 24]]                                         5    2
[[23, 25]]                                                             5    3
[[24, 26], [25, 27], [26, 28], [27, 29]]                               6    3
[[28, 30], [29, 31], [30, 32]]                                         7    3
[[31, 33]]                                                             7    4
[[32, 34], [33, 35], [34, 36], [35, 37]]                               8    4
[[36, 38], [37, 39], [38, 40]]                                         9    4
[[39, 41]]                                                             9    5
[[40, 42], [41, 43], [42, 44], [43, 45], [44, 46]]                     10   5
[[45, 47], [46, 48]]                                                   11   5
[[47, 49], [48, 50]]                                                   11   6
[[49, 51], [50, 52], [51, 53], [52, 54]]                               12   6
[[53, 55], [54, 56]]                                                   13   6
[[55, 57], [56, 58], [57, 59]]                                         13   7
[[58, 60], [59, 61], [60, 62], [61, 63], [62, 64]]                     14   7
[[63, 65], [64, 66], [65, 67], [66, 68], [67, 69]]                     15   8
[[68, 70], [69, 71], [70, 72]]                                         16   8
[[71, 73], [72, 74]]                                                   16   9
[[73, 75], [74, 76], [75, 77], [76, 78], [77, 79]]                     17   9
[[78, 80]]                                                             18   9
[[79, 81], [80, 82], [81, 83], [82, 84]]                               18   10
[[83, 85], [84, 86], [85, 87], [86, 88]]                               19   10
[[87, 89], [88, 90]]                                                   19   11
[[89, 91], [90, 92], [91, 93], [92, 94], [93, 95], [94, 96]]           20   11
[[95, 97], [96, 98], [97, 99], [98, 100], [99, 101], [100, 102]]       21   12
[[101, 103], [102, 104]]                                               22   12
[[103, 105], [104, 106], [105, 107], [106, 108], [107, 109]]           22   13
[[108, 110], [109, 111], [110, 112]]                                   23   13
[[111, 113], [112, 114], [113, 115], [114, 116]]                       23   14
[[115, 117], [116, 118], [117, 119], [118, 120]]                       24   14
[[119, 121], [120, 122], [121, 123], [122, 124]]                       24   15
[[123, 125], [124, 126], [125, 127], [126, 129]]                       25   15
[[126, 129], [128, 130], [129, 131], [130, 132], [131, 133]]           25   16
[[132, 134], [133, 135], [134, 137]]                                   26   16
[[134, 137], [136, 138], [137, 139], [138, 140], [139, 142], [141, 143]]  26   17
[[142, 144]]                                                           27   17
[[143, 146], [145, 147], [146, 149], [148, 150], [149, 152]]           27   18
[[151, 153], [152, 155]]                                               27   19
[[154, 156], [155, 158], [157, 160]]                                   28   19
[[159, 161], [160, 163], [162, 165], [164, 167], [166, 169]]           28   20
[[166, 169]]                                                           28   21
[[168, 171], [170, 173], [172, 175], [174, 178]]                       29   21
[[174, 178], [177, 180], [179, 182], [181, 185]]                       29   22
[[181, 185], [184, 188]]                                               29   23
[[187, 191], [190, 194]]                                               30   23
[[190, 194], [193, 197], [196, 200]]                                   30   24
[[199, 204], [203, 208]]                                               30   25
[[207, 212], [211, 217]]                                               30   26
[[211, 217]]                                                           30   27
[[216, 222], [221, 228]]                                               31   27
[[221, 228], [227, 235]]                                               31   28
[[227, 235], [234, 243]]                                               31   29
[[234, 243], [242, 254]]                                               31   30
[[242, 254], [253, 268]]                                               31   31
[[253, 268]]                                                           31   32
[[253, 268], [267, 290]]                                               31   33
[[267, 290]]                                                           31   34
[[267, 290]]                                                           31   35
[[267, 290], [289, 1202]]                                              31   36
[[289, 1202]]                                                          31   37-149

Table 5: Cache-line analysis of BLISS-0. For each interval created, the corresponding AT and CDT cache-lines are given. These two cache-lines are the basis for the weaknesses exploited on CDT sampling with the acceleration table. Note that the possible values inside an interval do not include the upper bound.

Cache Weaknesses

Weakness Type   Values     Cache-line Interval   Cache-line Pattern CDT
Intersection    7, 8       1                     1
Intersection    15, 16     3                     2
Intersection    23, 24     5                     3
Intersection    31, 32     7                     4
Intersection    39, 40     9                     5
Intersection    78, 79     18                    9
Intersection    142, 143   27                    17
Jump            127, 128   25                    15, 16
Jump            135, 136   26                    16, 17
Jump            167, 168   28                    20, 21
Jump            174, 175   29                    22, 21
Jump            183, 184   29                    22, 23
Jump            190, 191   30                    24, 23
Jump            215, 216   30                    26, 26, 27

Table 6: List of cache weaknesses for BLISS-0. For each type, the associated values and cache-line patterns are given.

The set of values G, with a cache-weakness satisfying both the size and biased requirements, is:

G = {127}

with α = 0.09.

BLISS-I Overview Cache-Line Analysis

Intervals Cache-line Interval Cache-line CDT [[0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6, 8]] 0 0 [[7, 9]] 0 1 [[8, 11], [10, 12], [11, 13], [12, 14], [13, 15], [14, 16]] 1 1 [[15, 17], [16, 18]] 1 2 [[17, 19], [18, 20], [19, 21], [20, 22], [21, 23], [22, 24]] 2 2 [[23, 25], [24, 26]] 2 3 [[25, 27], [26, 29], [28, 30], [29, 31], [30, 32]] 3 3 [[31, 33], [32, 34], [33, 35]] 3 4 [[34, 36], [35, 37], [36, 38], [37, 39], [38, 40]] 4 4 [[39, 41], [40, 42], [41, 44]] 4 5 [[43, 45], [44, 46], [45, 47], [46, 48]] 5 5 [[47, 49], [48, 50], [49, 51], [50, 52]] 5 6 [[51, 53], [52, 54], [53, 55], [54, 57]] 6 6 [[54, 57], [56, 58], [57, 59], [58, 60], [59, 61]] 6 7 [[60, 62], [61, 63], [62, 64]] 7 7 [[63, 65], [64, 66], [65, 68], [67, 69], [68, 70]] 7 8 [[69, 71], [70, 72]] 8 8 [[71, 73], [72, 74], [73, 75], [74, 76], [75, 78], [77, 79]] 8 9

68 [[78, 80]] 9 9 [[79, 81], [80, 82], [81, 83], [82, 84], [83, 85], [84, 87], [86, 88]] 9 10 [[87, 89], [88, 90], [89, 91], [90, 92], [91, 94], [93, 95], [94, 96]] 10 11 [[95, 97]] 10 12 [[96, 98], [97, 99], [98, 101], [100, 102], [101, 103], [102, 104]] 11 12 [[103, 105], [104, 106]] 11 13 [[105, 108], [107, 109], [108, 110], [109, 111], [110, 112]] 12 13 [[111, 114], [113, 115], [114, 116]] 12 14 [[115, 117], [116, 119], [118, 120]] 13 14 [[119, 121], [120, 122], [121, 123], [122, 125], [124, 126]] 13 15 [[125, 127], [126, 128]] 14 15 [[127, 130], [129, 131], [130, 132], [131, 134], [133, 135], [134, 136]] 14 16 [[135, 137], [136, 139], [138, 140], [139, 141], [140, 143], [142, 144]] 15 17 [[143, 145], [144, 147]] 15 18 [[146, 148], [147, 149], [148, 151], [150, 152]] 16 18 [[151, 153], [152, 155], [154, 156], [155, 157]] 16 19 [[156, 159], [158, 160]] 17 19 [[159, 161], [160, 163], [162, 164], [163, 166], [165, 167], [166, 168]] 17 20 [[167, 170], [169, 171], [170, 173], [172, 174], [173, 176]] 18 21 [[175, 177], [176, 179], [178, 180]] 18 22 [[179, 182], [181, 183], [182, 185]] 19 22 [[182, 185], [184, 186], [185, 188], [187, 189], [188, 191], [190, 192]] 19 23 [[191, 194], [193, 196], [195, 197], [196, 199], [198, 200]] 20 24 [[199, 202], [201, 204], [203, 205]] 20 25 [[204, 207], [206, 209]] 21 25 [[206, 209], [208, 210], [209, 212], [211, 214], [213, 215], [214, 217]] 21 26 [[214, 217], [216, 219]] 21 27 [[218, 221], [220, 222], [221, 224]] 22 27 [[223, 226], [225, 228], [227, 230], [229, 232]] 22 28 [[231, 233]] 22 29 [[232, 235], [234, 237], [236, 239], [238, 241]] 23 29 [[238, 241], [240, 243], [242, 245], [244, 247], [246, 249]] 23 30 [[246, 249]] 23 31 [[248, 251], [250, 253], [252, 255], [254, 258]] 24 31 [[254, 258], [257, 260], [259, 262], [261, 264]] 24 32 [[263, 266]] 24 33 [[265, 269], [268, 271], [270, 273]] 25 33 [[270, 273], [272, 276], [275, 278], [277, 280]] 25 34 [[279, 283], [282, 285]] 25 35 [[284, 288]] 26 35 [[287, 290], [289, 293], [292, 296]] 26 36 [[295, 298], [297, 301], [300, 304]] 26 37 [[303, 307]] 26 38 [[306, 310], [309, 313]] 27 38 [[309, 313], [312, 316], [315, 319], [318, 322]] 27 39 [[318, 322], [321, 325], [324, 329]] 27 40 [[324, 329], [328, 332]] 27 41

69 [[331, 335], [334, 339]] 28 41 [[334, 339], [338, 343], [342, 346]] 28 42 [[342, 346], [345, 350], [349, 354]] 28 43 [[349, 354], [353, 358], [357, 363]] 28 44 [[357, 363]] 28 45 [[362, 367], [366, 371]] 29 45 [[366, 371], [370, 376]] 29 46 [[375, 381], [380, 386]] 29 47 [[380, 386], [385, 391], [390, 397]] 29 48 [[390, 397], [396, 403]] 29 49 [[396, 403]] 29 50 [[402, 409]] 30 50 [[402, 409], [408, 415], [414, 422]] 30 51 [[414, 422], [421, 430]] 30 52 [[421, 430], [429, 438]] 30 53 [[429, 438], [437, 446]] 30 54 [[437, 446], [445, 455]] 30 55 [[445, 455], [454, 466]] 30 56 [[454, 466]] 30 57 [[454, 466]] 30 58 [[465, 477]] 31 58 [[465, 477], [476, 490]] 31 59 [[476, 490]] 31 60 [[476, 490], [489, 505]] 31 61 [[489, 505]] 31 62 [[489, 505], [504, 523]] 31 63 [[504, 523]] 31 64 [[504, 523], [522, 545]] 31 65 [[522, 545]] 31 66 [[522, 545]] 31 67 [[522, 545], [544, 575]] 31 68 [[544, 575]] 31 69 [[544, 575]] 31 70 [[544, 575], [574, 624]] 31 71 [[574, 624]] 31 72 [[574, 624]] 31 73 [[574, 624]] 31 74 [[574, 624]] 31 75 [[574, 624]] 31 76 [[574, 624]] 31 77 [[623, 2588]] 31 78-322

Table 7: Cache-line analysis of BLISS-I. For each interval created, the corresponding AT and CDT cache-lines are given. These two cache-lines are the basis for the weaknesses exploited on CDT sampling with the acceleration table. Note that the possible values inside an interval do not include the upper bound.

Cache Weaknesses

Weakness Type   Values     Cache-line Interval   Cache-line Pattern CDT
Intersection    7, 8       0                     1
Intersection    78, 79     9                     9
Intersection    95, 96     10                    12
Intersection    231, 232   22                    29
Jump            55, 56     6                     6, 7
Jump            183, 184   19                    22, 23
Jump            207, 208   21                    25, 26
Jump            215, 216   21                    26, 27
Jump            239, 240   23                    29, 30
Jump            247, 248   23                    30, 31
Jump            254, 255   24                    32, 31
Jump            271, 272   25                    33, 34
Jump            311, 312   27                    38, 39
Jump            318, 319   27                    40, 39
Jump            327, 328   27                    40, 40, 41
Jump            334, 335   28                    42, 41
Jump            342, 343   28                    43, 42
Jump            366, 367   29                    46, 45
Jump            390, 391   29                    49, 48
Jump            407, 408   30                    50, 50, 51
Jump            414, 415   30                    52, 52, 51

Table 8: List of cache weaknesses for BLISS-I. For each type, the associated values and cache-line patterns are given.

The set of values G, with a cache-weakness satisfying both the size and biased requirements, is:

G = {8, 55, 207, 255, 327, 335, 390, 415}

with α ≥ 0.10.

BLISS-II Overview Cache-Line Analysis

71 Intervals Cache-line Interval Cache-line CDT [[0, 2], [1, 3], [2, 4], [3, 5]] 0 0 [[4, 6], [5, 7], [6, 8]] 1 0 [[7, 9]] 1 1 [[8, 10], [9, 11], [10, 12], [11, 13], [12, 14]] 2 1 [[13, 15], [14, 16]] 3 1 [[15, 17], [16, 18]] 3 2 [[17, 19], [18, 20], [19, 21], [20, 22]] 4 2 [[21, 23], [22, 24]] 5 2 [[23, 25], [24, 26]] 5 3 [[25, 27], [26, 28], [27, 29], [28, 30], [29, 31]] 6 3 [[30, 32]] 7 3 [[31, 33], [32, 34], [33, 35]] 7 4 [[34, 36], [35, 37], [36, 38], [37, 39], [38, 40]] 8 4 [[39, 41], [40, 42], [41, 43], [42, 44]] 9 5 [[43, 45], [44, 46], [45, 47], [46, 48]] 10 5 [[47, 49]] 10 6 [[48, 50], [49, 51], [50, 52], [51, 53]] 11 6 [[52, 54], [53, 55], [54, 56]] 12 6 [[55, 57], [56, 58]] 12 7 [[57, 59], [58, 60], [59, 61], [60, 62], [61, 63]] 13 7 [[62, 64]] 14 7 [[63, 65], [64, 66], [65, 67], [66, 68]] 14 8 [[67, 69], [68, 70], [69, 71], [70, 72]] 15 8 [[71, 73]] 15 9 [[72, 74], [73, 75], [74, 76], [75, 77], [76, 78], [77, 79]] 16 9 [[78, 80]] 17 9 [[79, 81], [80, 82], [81, 83], [82, 84]] 17 10 [[83, 85], [84, 86], [85, 87], [86, 88]] 18 10 [[87, 89], [88, 90]] 18 11 [[89, 91], [90, 92], [91, 93], [92, 94], [93, 95], [94, 96]] 19 11 [[95, 97], [96, 98], [97, 99], [98, 100], [99, 101], [100, 102]] 20 12 [[101, 103], [102, 104]] 21 12 [[103, 105], [104, 106], [105, 107], [106, 108], [107, 109]] 21 13 [[108, 110], [109, 111], [110, 112]] 22 13 [[111, 113], [112, 114], [113, 115], [114, 116]] 22 14 [[115, 117], [116, 118], [117, 119], [118, 120]] 23 14 [[119, 121], [120, 122], [121, 123], [122, 124]] 23 15 [[123, 125], [124, 126], [125, 127], [126, 128]] 24 15 [[127, 129], [128, 130], [129, 132], [131, 133]] 24 16 [[132, 134], [133, 135], [134, 136]] 25 16 [[135, 137], [136, 138], [137, 140], [139, 141], [140, 142]] 25 17 [[141, 143], [142, 145]] 26 17 [[142, 145], [144, 146], [145, 147], [146, 149], [148, 150], [149, 151], [150, 153]] 26 18 [[150, 153]] 26 19 [[152, 154], [153, 156], [155, 157], [156, 159], [158, 160]] 27 19

72 [[159, 162], [161, 163], [162, 165]] 27 20 [[164, 167], [166, 169]] 28 20 [[166, 169], [168, 170], [169, 172], [171, 174], [173, 176]] 28 21 [[175, 178], [177, 180]] 28 22 [[179, 183], [182, 185]] 29 22 [[182, 185], [184, 187], [186, 190], [189, 192]] 29 23 [[191, 195], [194, 197], [196, 200]] 29 24 [[199, 203], [202, 207], [206, 210]] 30 25 [[206, 210], [209, 214], [213, 218]] 30 26 [[213, 218], [217, 222], [221, 226]] 30 27 [[221, 226], [225, 231]] 30 28 [[230, 237]] 31 28 [[230, 237], [236, 244]] 31 29 [[236, 244], [243, 251]] 31 30 [[243, 251], [250, 260]] 31 31 [[250, 260], [259, 271]] 31 32 [[259, 271], [270, 286]] 31 33 [[270, 286]] 31 34 [[270, 286], [285, 310]] 31 35 [[285, 310]] 31 36 [[285, 310]] 31 37 [[285, 310], [309, 1284]] 31 38 [[309, 1284]] 31 39-159 Table 9: Table of cache-line analysis of BLISS-II. For each interval created, the corresponding AT and CDT cache-lines are given. These two cache-lines are at the basis for the weaknesses exploited on CDT sampling with acceleration table. Note that the possible values inside an interval are not including the upper bound.

Cache Weaknesses

Weakness Type   Values     Cache-line Interval   Cache-line Pattern CDT
Intersection    7, 8       1                     1
Intersection    30, 31     7                     3
Intersection    47, 48     10                    6
Intersection    62, 63     14                    7
Intersection    71, 72     15                    9
Intersection    72, 78     16                    9
Intersection    78, 79     17                    9
Jump            143, 144   26                    17, 18
Jump            151, 152   26                    18, 19
Jump            167, 168   28                    20, 21
Jump            183, 184   29                    22, 23
Jump            206, 207   30                    26, 25

Table 10: List of cache weaknesses for BLISS-II. For each type, the associated values and cache-line patterns are given.

The set of values G, with a cache-weakness satisfying both the size and biased requirements, is:

G = {143}

with α ≥ 0.07.

BLISS-III Overview Cache-Line Analysis

Intervals Cache-line Interval Cache-line CDT [[0, 2], [1, 3], [2, 5], [4, 6], [5, 7], [6, 8]] 0 0 [[7, 10], [9, 11]] 0 1 [[10, 12], [11, 13], [12, 14], [13, 16]] 1 1 [[15, 17], [16, 18], [17, 19], [18, 21]] 1 2 [[20, 22], [21, 23], [22, 24]] 2 2 [[23, 26], [25, 27], [26, 28], [27, 29], [28, 31]] 2 3 [[30, 32]] 3 3 [[31, 33], [32, 34], [33, 35], [34, 37], [36, 38], [37, 39], [38, 40]] 3 4 [[39, 42], [41, 43], [42, 44], [43, 45], [44, 47], [46, 48]] 4 5 [[47, 49], [48, 50]] 4 6 [[49, 52], [51, 53], [52, 54], [53, 55], [54, 57]] 5 6 [[54, 57], [56, 58], [57, 59], [58, 60]] 5 7 [[59, 62], [61, 63], [62, 64]] 6 7 [[63, 65], [64, 67], [66, 68], [67, 69], [68, 71]] 6 8 [[70, 72]] 7 8 [[71, 73], [72, 74], [73, 76], [75, 77], [76, 78], [77, 80]] 7 9 [[79, 81]] 7 10 [[80, 82], [81, 83], [82, 85], [84, 86], [85, 87], [86, 89]] 8 10 [[86, 89], [88, 90], [89, 91]] 8 11 [[90, 93], [92, 94], [93, 95], [94, 96]] 9 11 [[95, 98], [97, 99], [98, 100], [99, 102]] 9 12 [[101, 103], [102, 104]] 10 12 [[103, 106], [105, 107], [106, 108], [107, 110], [109, 111], [110, 113]] 10 13 [[110, 113]] 10 14 [[112, 114], [113, 115], [114, 117], [116, 118], [117, 119], [118, 121]] 11 14 [[118, 121], [120, 122], [121, 123]] 11 15 [[122, 125], [124, 126], [125, 128]] 12 15 [[127, 129], [128, 130], [129, 132], [131, 133], [132, 135]] 12 16 [[134, 136]] 13 16 [[135, 137], [136, 139], [138, 140], [139, 142], [141, 143], [142, 145]] 13 17 [[142, 145], [144, 146]] 13 18

74 [[145, 148], [147, 149], [148, 150], [149, 152]] 14 18 [[151, 153], [152, 155], [154, 156], [155, 158]] 14 19 [[157, 159], [158, 161]] 15 19 [[158, 161], [160, 162], [161, 164], [163, 165], [164, 167], [166, 168]] 15 20 [[167, 170]] 15 21 [[169, 172], [171, 173], [172, 175], [174, 176]] 16 21 [[175, 178], [177, 179], [178, 181], [180, 183]] 16 22 [[182, 184]] 17 22 [[183, 186], [185, 187], [186, 189], [188, 191], [190, 192]] 17 23 [[191, 194], [193, 196]] 17 24 [[195, 197], [196, 199], [198, 201]] 18 24 [[198, 201], [200, 202], [201, 204], [203, 206], [205, 207], [206, 209]] 18 25 [[206, 209]] 18 26 [[208, 211], [210, 213], [212, 214], [213, 216]] 19 26 [[215, 218], [217, 220], [219, 221], [220, 223]] 19 27 [[222, 225]] 20 27 [[222, 225], [224, 227], [226, 229], [228, 231], [230, 233]] 20 28 [[230, 233], [232, 234], [233, 236], [235, 238]] 20 29 [[237, 240]] 21 29 [[239, 242], [241, 244], [243, 246], [245, 248]] 21 30 [[247, 250], [249, 252], [251, 254]] 21 31 [[253, 256]] 22 31 [[255, 258], [257, 260], [259, 262], [261, 264]] 22 32 [[263, 267], [266, 269], [268, 271]] 22 33 [[270, 273]] 23 33 [[270, 273], [272, 275], [274, 278], [277, 280]] 23 34 [[279, 282], [281, 285], [284, 287], [286, 289]] 23 35 [[286, 289]] 23 36 [[288, 292], [291, 294], [293, 296]] 24 36 [[295, 299], [298, 301], [300, 304]] 24 37 [[303, 307], [306, 309]] 24 38 [[308, 312]] 25 38 [[311, 314], [313, 317], [316, 320]] 25 39 [[319, 323], [322, 325], [324, 328]] 25 40 [[327, 331]] 25 41 [[330, 334], [333, 337]] 26 41 [[333, 337], [336, 340], [339, 343], [342, 346]] 26 42 [[342, 346], [345, 350], [349, 353]] 26 43 [[349, 353], [352, 356]] 26 44 [[355, 360]] 27 44 [[359, 363], [362, 367], [366, 370]] 27 45 [[366, 370], [369, 374], [373, 378]] 27 46 [[373, 378], [377, 381], [380, 385]] 27 47 [[380, 385]] 27 48 [[384, 389], [388, 394]] 28 48 [[388, 394], [393, 398], [397, 402]] 28 49 [[397, 402], [401, 407], [406, 411]] 28 50

75 [[406, 411], [410, 416]] 28 51 [[415, 421]] 28 52 [[420, 426]] 29 52 [[420, 426], [425, 431], [430, 437]] 29 53 [[430, 437], [436, 442]] 29 54 [[436, 442], [441, 448]] 29 55 [[447, 454], [453, 461]] 29 56 [[453, 461], [460, 468]] 29 57 [[460, 468]] 29 58 [[467, 475]] 30 58 [[467, 475], [474, 482]] 30 59 [[474, 482], [481, 490]] 30 60 [[481, 490], [489, 499]] 30 61 [[489, 499], [498, 508]] 30 62 [[498, 508], [507, 518]] 30 63 [[507, 518], [517, 529]] 30 64 [[517, 529]] 30 65 [[517, 529], [528, 541]] 30 66 [[528, 541]] 30 67 [[540, 554]] 31 67 [[540, 554]] 31 68 [[540, 554], [553, 569]] 31 69 [[553, 569]] 31 70 [[553, 569], [568, 586]] 31 71 [[568, 586]] 31 72 [[568, 586], [585, 607]] 31 73 [[585, 607]] 31 74 [[585, 607], [606, 633]] 31 75 [[606, 633]] 31 76 [[606, 633]] 31 77 [[606, 633]] 31 78 [[606, 633], [632, 667]] 31 79 [[632, 667]] 31 80 [[632, 667]] 31 81 [[632, 667]] 31 82 [[632, 667], [666, 724]] 31 83 [[666, 724]] 31 84 [[666, 724]] 31 85 [[666, 724]] 31 86 [[666, 724]] 31 87 [[666, 724]] 31 88 [[666, 724]] 31 89 [[666, 724], [723, 3006]] 31 90 [[723, 3006]] 31 91-374

Table 11: Cache-line analysis of BLISS-III. For each interval created, the corresponding AT and CDT cache-lines are given. These two cache-lines are the basis for the weaknesses exploited on CDT sampling with the acceleration table. Note that the possible values inside an interval do not include the upper bound.

Cache Weaknesses

Weakness Type   Values     Cache-line Interval   Cache-line Pattern CDT
Intersection    30, 31     3                     3
Intersection    70, 71     7                     8
Intersection    79, 80     7                     10
Intersection    134, 135   13                    16
Intersection    182, 183   17                    22
Jump            55, 56     5                     6, 7
Jump            87, 88     8                     10, 11
Jump            111, 112   10                    13, 14
Jump            119, 120   11                    14, 15
Jump            143, 144   13                    17, 18
Jump            159, 160   15                    19, 20
Jump            199, 200   18                    24, 25
Jump            207, 208   18                    25, 26
Jump            223, 224   20                    27, 28
Jump            231, 232   20                    28, 29
Jump            271, 272   23                    33, 34
Jump            287, 288   23                    35, 36
Jump            335, 336   26                    41, 42
Jump            342, 343   26                    43, 42
Jump            351, 352   26                    43, 44
Jump            366, 367   27                    46, 45
Jump            383, 384   27                    47, 47, 48
Jump            406, 407   28                    51, 50

Table 12: List of cache weaknesses for BLISS-III. For each type, the associated values and cache-line patterns are given.

The set of values G, with a cache-weakness satisfying both the size and biased requirements, is:

G = {87, 111, 199, 231}

with α ≥ 0.10.

BLISS-IV Overview Cache-Line Analysis

77 Intervals Cache-line Interval Cache-line CDT [[0, 2], [1, 4], [3, 5], [4, 6], [5, 8]] 0 0 [[7, 9], [8, 10], [9, 12]] 0 1 [[11, 13], [12, 14], [13, 16]] 1 1 [[15, 17], [16, 18], [17, 20], [19, 21], [20, 22]] 1 2 [[21, 24]] 2 2 [[23, 25], [24, 26], [25, 28], [27, 29], [28, 30], [29, 32]] 2 3 [[31, 33]] 2 4 [[32, 34], [33, 36], [35, 37], [36, 38], [37, 40]] 3 4 [[39, 41], [40, 42], [41, 44]] 3 5 [[43, 45], [44, 46], [45, 48]] 4 5 [[47, 49], [48, 51], [50, 52], [51, 53], [52, 55]] 4 6 [[54, 56]] 5 6 [[55, 57], [56, 59], [58, 60], [59, 61], [60, 63], [62, 64]] 5 7 [[63, 65]] 5 8 [[64, 67], [66, 68], [67, 70], [69, 71], [70, 72]] 6 8 [[71, 74], [73, 75], [74, 76]] 6 9 [[75, 78], [77, 79], [78, 81]] 7 9 [[78, 81], [80, 82], [81, 83], [82, 85], [84, 86], [85, 88]] 7 10 [[87, 89], [88, 90], [89, 92], [91, 93], [92, 95], [94, 96]] 8 11 [[95, 97], [96, 99]] 8 12 [[98, 100], [99, 102], [101, 103], [102, 105]] 9 12 [[102, 105], [104, 106], [105, 107], [106, 109], [108, 110]] 9 13 [[109, 112]] 10 13 [[111, 113], [112, 115], [114, 116], [115, 118], [117, 119], [118, 121]] 10 14 [[118, 121], [120, 122]] 10 15 [[121, 123], [122, 125], [124, 126], [125, 128]] 11 15 [[127, 129], [128, 131], [130, 132], [131, 134]] 11 16 [[133, 135], [134, 137]] 12 16 [[134, 137], [136, 138], [137, 140], [139, 141], [140, 143], [142, 144]] 12 17 [[143, 146]] 12 18 [[145, 148], [147, 149], [148, 151], [150, 152]] 13 18 [[151, 154], [153, 155], [154, 157], [156, 158]] 13 19 [[157, 160]] 14 19 [[159, 162], [161, 163], [162, 165], [164, 166], [165, 168]] 14 20 [[167, 170], [169, 171]] 14 21 [[170, 173], [172, 174], [173, 176]] 15 21 [[175, 178], [177, 179], [178, 181], [180, 183], [182, 184]] 15 22 [[183, 186], [185, 188], [187, 189], [188, 191], [190, 193]] 16 23 [[190, 193], [192, 194], [193, 196], [195, 198]] 16 24 [[197, 200]] 17 24 [[199, 201], [200, 203], [202, 205], [204, 207], [206, 208]] 17 25 [[207, 210], [209, 212]] 17 26 [[211, 214], [213, 216]] 18 26 [[215, 217], [216, 219], [218, 221], [220, 223], [222, 225]] 18 27

78 [[222, 225], [224, 227]] 18 28 [[226, 229], [228, 230], [229, 232]] 19 28 [[231, 234], [233, 236], [235, 238], [237, 240]] 19 29 [[239, 242]] 19 30 [[241, 244], [243, 246], [245, 248]] 20 30 [[247, 250], [249, 252], [251, 254], [253, 256]] 20 31 [[255, 258]] 20 32 [[257, 260], [259, 263], [262, 265]] 21 32 [[262, 265], [264, 267], [266, 269], [268, 271], [270, 273]] 21 33 [[270, 273], [272, 275]] 21 34 [[274, 278], [277, 280]] 22 34 [[279, 282], [281, 285], [284, 287], [286, 289]] 22 35 [[286, 289], [288, 291], [290, 294]] 22 36 [[293, 296]] 23 36 [[295, 299], [298, 301], [300, 304]] 23 37 [[303, 306], [305, 309], [308, 311], [310, 314]] 23 38 [[310, 314]] 23 39 [[313, 316], [315, 319], [318, 322]] 24 39 [[318, 322], [321, 324], [323, 327], [326, 330]] 24 40 [[326, 330], [329, 332], [331, 335]] 24 41 [[334, 338]] 25 41 [[334, 338], [337, 341], [340, 344]] 25 42 [[343, 347], [346, 350], [349, 353]] 25 43 [[349, 353], [352, 356], [355, 359]] 25 44 [[358, 362]] 26 44 [[358, 362], [361, 366], [365, 369]] 26 45 [[365, 369], [368, 372], [371, 376]] 26 46 [[375, 379], [378, 383], [382, 386]] 26 47 [[382, 386]] 26 48 [[385, 390], [389, 394]] 27 48 [[389, 394], [393, 398], [397, 401]] 27 49 [[397, 401], [400, 405], [404, 410]] 27 50 [[404, 410], [409, 414], [413, 418]] 27 51 [[413, 418]] 27 52 [[417, 422], [421, 427]] 28 52 [[421, 427], [426, 431], [430, 436]] 28 53 [[430, 436], [435, 441]] 28 54 [[435, 441], [440, 446], [445, 451]] 28 55 [[445, 451], [450, 456]] 28 56 [[455, 462], [461, 468]] 29 57 [[461, 468], [467, 474]] 29 58 [[467, 474], [473, 480]] 29 59 [[479, 486], [485, 493]] 29 60 [[485, 493], [492, 500]] 29 61 [[492, 500], [499, 507]] 29 62 [[499, 507]] 29 63 [[506, 515]] 30 63

79 [[506, 515], [514, 523]] 30 64 [[514, 523], [522, 532]] 30 65 [[522, 532], [531, 541]] 30 66 [[531, 541], [540, 551]] 30 67 [[540, 551], [550, 562]] 30 68 [[550, 562]] 30 69 [[550, 562], [561, 574]] 30 70 [[561, 574], [573, 586]] 30 71 [[573, 586]] 30 72 [[573, 586]] 30 73 [[585, 601]] 31 73 [[585, 601]] 31 74 [[585, 601], [600, 617]] 31 75 [[600, 617]] 31 76 [[600, 617], [616, 636]] 31 77 [[616, 636]] 31 78 [[616, 636], [635, 658]] 31 79 [[635, 658]] 31 80 [[635, 658]] 31 81 [[635, 658], [657, 686]] 31 82 [[657, 686]] 31 83 [[657, 686]] 31 84 [[657, 686], [685, 724]] 31 85 [[685, 724]] 31 86 [[685, 724]] 31 87 [[685, 724]] 31 88 [[685, 724]] 31 89 [[685, 724], [723, 785]] 31 90 [[723, 785]] 31 91 [[723, 785]] 31 92 [[723, 785]] 31 93 [[723, 785]] 31 94 [[723, 785]] 31 95 [[723, 785]] 31 96 [[723, 785]] 31 97 [[723, 785], [784, 3261]] 31 98 [[784, 3261]] 31 99-406 Table 13: Table of cache-line analysis of BLISS-IV. For each interval created, the corresponding AT and CDT cache-lines are given. These two cache-lines are at the basis for the weaknesses exploited on CDT sampling with acceleration table. Note that the possible values inside an interval are not including the upper bound.

Cache Weaknesses

Weakness Type   Values     Cache-line Interval   Cache-line(s) CDT
Intersection    31, 32     2                     4
Intersection    54, 55     5                     6
Intersection    55, 63     5                     7
Intersection    63, 64     5                     8
Jump            79, 80     7                     9, 10
Jump            103, 104   9                     12, 13
Jump            119, 120   10                    14, 15
Jump            135, 136   12                    16, 17
Jump            191, 192   16                    23, 24
Jump            223, 224   18                    27, 28
Jump            263, 264   21                    32, 33
Jump            271, 272   21                    33, 34
Jump            287, 288   22                    35, 36
Jump            310, 311   23                    39, 38
Jump            318, 319   24                    40, 39
Jump            326, 327   24                    41, 40
Jump            334, 335   25                    42, 41
Jump            351, 352   25                    43, 44
Jump            358, 359   26                    45, 44
Jump            367, 368   26                    45, 46
Jump            382, 383   26                    48, 47
Jump            399, 400   27                    49, 50
Jump            439, 440   28                    54, 54, 55

Table 14: List of cache weaknesses for BLISS-IV. For each type, the associated values and cache-line patterns are given.

The set of values G, with a cache-weakness satisfying both the size and biased requirements, is:

G = {79, 103, 119, 263}

with α ≥ 0.10.
