
BOAZ BARAK

AN INTENSIVE INTRODUCTION TO CRYPTOGRAPHY

LECTURE NOTES. AVAILABLE ON HTTPS://INTENSECRYPTO.ORG Text available on https://github.com/boazbk/crypto - please post any issues there - thank you!

This version was compiled on Thursday 23rd September, 2021 13:32

Copyright © 2021 Boaz Barak

This work is licensed under a Creative Commons “Attribution-NonCommercial-NoDerivatives 4.0 International” license.

If you can just get your mind together

Then come on across to me

We’ll hold hands, and then we’ll watch the sun rise

From the bottom of the sea

Jimi Hendrix, Are You Experienced?

Contents

Foreword and Syllabus 9

I Preliminaries 15

0 Mathematical Background 17

1 Introduction 39

II Private key cryptography 59

2 Computational Security 61

3 Pseudorandomness 83

4 Pseudorandom functions 103

5 Pseudorandom functions from pseudorandom generators and CPA security 117

6 Chosen Ciphertext Security 137

7 Hash Functions and Random Oracles 153

8 Key derivation, protecting passwords, slow hashes, Merkle trees 167

9 Private key crypto recap 177

III Public key cryptography 179

10 Public key cryptography 181

11 Concrete candidates for public key crypto 211

12 Chosen ciphertext security for public key encryption 229


13 Lattice based cryptography 231

14 Establishing secure connections over insecure channels 247

IV Advanced topics 257

15 Zero knowledge proofs 259

16 Fully homomorphic encryption: Introduction and bootstrapping 273

17 Fully homomorphic encryption: Construction 285

18 Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler 299

19 Multiparty secure computation II: Construction using Fully Homomorphic Encryption 315

20 Quantum computing and cryptography I 325

21 Quantum computing and cryptography II 337

22 Software Obfuscation 347

23 More obfuscation, exotic encryptions 357

24 Anonymous communication 363

V Conclusions 365

25 Ethical, moral, and policy dimensions to cryptography 367

26 Course recap 373

Contents (detailed)

Foreword and Syllabus 9 0.1 Syllabus ...... 10 0.1.1 Prerequisites ...... 12 0.2 Why is cryptography hard? ...... 12

I Preliminaries 15

0 Mathematical Background 17 0.1 A quick overview of mathematical prerequisites . . . . 17 0.2 Mathematical Proofs ...... 19 0.2.1 Example: The existence of infinitely many primes. 20 0.3 Probability and Sample spaces ...... 22 0.3.1 Random variables ...... 24 0.3.2 Distributions over strings ...... 26 0.3.3 More general sample spaces...... 27 0.4 Correlations and independence ...... 27 0.4.1 Independent random variables ...... 29 0.4.2 Collections of independent random variables. . . 30 0.5 Concentration and tail bounds ...... 31 0.5.1 Chebyshev’s Inequality ...... 32 0.5.2 The Chernoff bound ...... 33 0.6 Exercises ...... 34 0.7 Exercises ...... 35

1 Introduction 39 1.1 Some history ...... 39 1.2 Defining encryptions ...... 41 1.3 Defining security of encryption ...... 43 1.3.1 Generating randomness in actual cryptographic systems ...... 44 1.4 Defining the secrecy requirement...... 46 1.5 Perfect Secrecy ...... 49 1.5.1 Achieving perfect secrecy ...... 52 1.6 Necessity of long keys ...... 54

1.6.1 Amplifying success probability ...... 57 1.7 Bibliographical notes ...... 58

II Private key cryptography 59

2 Computational Security 61 2.0.1 Proof by reduction ...... 65 2.1 The asymptotic approach ...... 66 2.1.1 Counting number of operations...... 68 2.2 Our first conjecture ...... 70 2.3 Why care about the cipher conjecture? ...... 71 2.4 Prelude: Computational Indistinguishability ...... 71 2.5 The Length Extension Theorem or Stream Ciphers . . . 76 2.5.1 Appendix: The computational model ...... 80

3 Pseudorandomness 83 3.0.1 Unpredictability: an alternative approach for proving the length extension theorem ...... 88 3.1 Stream ciphers ...... 90 3.2 What do pseudorandom generators actually look like? . 92 3.2.1 Attempt 0: The counter generator ...... 92 3.2.2 Attempt 1: The linear checksum / linear feedback shift register (LFSR) ...... 92 3.2.3 From insecurity to security ...... 94 3.2.4 Attempt 2: Linear Congruential Generators with dropped bits ...... 95 3.3 Successful examples ...... 96 3.3.1 Case Study 1: Subset Sum Generator ...... 96 3.3.2 Case Study 2: RC4 ...... 97 3.3.3 Case Study 3: Blum, Blum and Shub ...... 98 3.4 Non-constructive existence of pseudorandom generators 99

4 Pseudorandom functions 103 4.1 One time passwords (e.g. Google Authenticator, RSA ID, etc.) ...... 107 4.1.1 How do pseudorandom functions help in the login problem? ...... 108 4.1.2 Modifying input and output lengths of PRFs . . . 111 4.2 Message Authentication Codes ...... 112 4.3 MACs from PRFs ...... 114 4.4 Arbitrary input length extension for MACs and PRFs . 115 4.5 Aside: natural proofs ...... 115

5 Pseudorandom functions from pseudorandom generators and CPA security 117 5.1 Securely encrypting many messages - chosen plaintext security ...... 122 5.2 Pseudorandom permutations / block ciphers ...... 125 5.3 Encryption modes ...... 130 5.4 Optional, Aside: Broadcast Encryption ...... 131 5.5 Reading comprehension exercises ...... 134

6 Chosen Ciphertext Security 137 6.1 Short recap ...... 137 6.2 Going beyond CPA ...... 138 6.2.1 Example: The Wired Equivalent Privacy (WEP) 138 6.2.2 Chosen ciphertext security ...... 140 6.3 Constructing CCA secure encryption ...... 143 6.4 (Simplified) GCM encryption ...... 148 6.5 Padding, chopping, and their pitfalls: the “buffer overflow” of cryptography ...... 149 6.6 Chosen ciphertext attack as implementing metaphors . 150 6.7 Reading comprehension exercises ...... 150

7 Hash Functions and Random Oracles 153 7.1 The “Bitcoin” Problem ...... 153 7.1.1 The Currency Problem ...... 153 7.1.2 Bitcoin Architecture ...... 154 7.2 The Bitcoin Ledger ...... 155 7.2.1 From Proof of Work to Consensus on Ledger . . . 158 7.3 Collision Resistance Hash Functions and Creating Short “Unique” Identifiers ...... 160 7.4 Practical Constructions of Cryptographic Hash Functions ...... 161 7.4.1 Practical Random-ish Functions ...... 162 7.4.2 Some History ...... 163 7.4.3 The NSA and Hash Functions ...... 164 7.4.4 Cryptographic vs Non-Cryptographic Hash Functions ...... 164 7.5 Reading comprehension exercises ...... 165

8 Key derivation, protecting passwords, slow hashes, Merkle trees 167 8.1 Keys from passwords ...... 167 8.2 Merkle trees and verifying storage...... 170 8.3 Proofs of Retrievability ...... 171 8.4 Entropy extraction ...... 172 8.4.1 Forward and backward secrecy ...... 175

9 Private key crypto recap 177 9.0.1 Attacks on private key ...... 178

III Public key cryptography 179

10 Public key cryptography 181 10.1 Private key crypto recap ...... 183 10.2 Public Key Encryptions: Definition ...... 185 10.2.1 The obfuscation paradigm ...... 186 10.3 Some concrete candidates: ...... 188 10.3.1 Diffie-Hellman Encryption (aka El-Gamal) . . . . 189 10.3.2 Sampling random primes ...... 193 10.3.3 A little bit of group theory...... 194 10.3.4 Digital Signatures ...... 195 10.3.5 The Digital Signature Algorithm (DSA) ...... 196 10.4 Putting everything together - security in practice. . . . . 200 10.5 Appendix: An alternative proof of the density of primes 204 10.6 Additional Group Theory Exercises and Proofs . . . . . 205 10.6.1 Solved exercises: ...... 206

11 Concrete candidates for public key crypto 211 11.1 Some number theory...... 211 11.1.1 Primality testing ...... 212 11.1.2 Fields ...... 213 11.1.3 Chinese remainder theorem ...... 214 11.1.4 The RSA and Rabin functions ...... 215 11.1.5 Abstraction: trapdoor permutations ...... 216 11.1.6 Public key encryption from trapdoor permutations ...... 217 11.1.7 Digital signatures from trapdoor permutations . 220 11.2 Hardcore bits and security without random oracles . . 222 11.2.1 Extending to more than one hardcore bit . . . . . 226

12 Chosen ciphertext security for public key encryption 229

13 Lattice based cryptography 231 13.0.1 Quick linear algebra recap ...... 233 13.1 A world without Gaussian elimination ...... 235 13.2 Security in the real world...... 237 13.3 Search to decision ...... 238 13.4 An LWE based encryption scheme ...... 239 13.5 But what are lattices? ...... 244 13.6 Ring based lattices ...... 245

14 Establishing secure connections over insecure channels 247

14.1 Cryptography’s obsession with adjectives...... 247 14.2 Basic protocol ...... 249 14.3 Authenticated key exchange ...... 250 14.3.1 Bleichenbacher’s attack on RSA PKCS V1.5 and SSL V3.0 ...... 250 14.4 Chosen ciphertext attack security for public key cryptography ...... 251 14.5 CCA secure public key encryption in the Random Oracle Model ...... 253 14.5.1 Defining secure authenticated key exchange . . . 255 14.5.2 The compiler approach for authenticated key exchange ...... 255 14.6 Password authenticated key exchange...... 256 14.7 Client to client key exchange for secure text messaging - ZRTP, OTR, TextSecure ...... 256 14.8 Heartbleed and logjam attacks ...... 256

IV Advanced topics 257

15 Zero knowledge proofs 259 15.1 Applications for zero knowledge proofs...... 260 15.1.1 Nuclear disarmament ...... 260 15.1.2 Voting ...... 261 15.1.3 More applications ...... 261 15.2 Defining and constructing zero knowledge proofs . . . 261 15.3 Defining zero knowledge ...... 265 15.4 Zero knowledge proof for Hamiltonicity...... 268 15.4.1 Why is this interesting? ...... 271 15.5 Parallel repetition and turning zero knowledge proofs to signatures...... 271 15.5.1 “Bonus features” of zero knowledge ...... 272

16 Fully homomorphic encryption: Introduction and bootstrapping 273 16.1 Defining fully homomorphic encryption ...... 276 16.1.1 Another application: fully homomorphic encryption for verifying computation ...... 277 16.2 Example: An XOR homomorphic encryption ...... 278 16.2.1 Abstraction: A trapdoor pseudorandom generator ...... 280 16.3 From linear homomorphism to full homomorphism . . 282 16.4 Bootstrapping: Fully Homomorphic “escape velocity” . 282 16.4.1 Radioactive legos analogy ...... 283 16.4.2 Proving the bootstrapping theorem ...... 283

17 Fully homomorphic encryption: Construction 285 17.1 Prelude: from vectors to matrices ...... 286 17.2 Real world partially homomorphic encryption . . . . . 288 17.3 Noise management via encoding ...... 289 17.4 Putting it all together ...... 290 17.5 Analysis of our scheme ...... 291 17.5.1 Correctness ...... 292 17.5.2 CPA Security ...... 293 17.5.3 Homomorphism ...... 293 17.5.4 Shallow decryption circuit ...... 293 17.6 Advanced topics: ...... 296 17.6.1 Fully homomorphic encryption for approximate computation over the real numbers: CKKS . . . . 296 17.6.2 Bandwidth efficient fully homomorphic encryption GH ...... 297 17.6.3 Using fully homomorphic encryption to achieve private information retrieval...... 298

18 Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler 299 18.1 Ideal vs. Real Model Security...... 300 18.2 Formally defining secure multiparty computation . . . 301 18.2.1 First attempt: a slightly “too ideal” definition . . 301 18.2.2 Allowing for aborts ...... 302 18.2.3 Some comments: ...... 304 18.3 Example: Second price auction using bitcoin ...... 306 18.3.1 Another example: distributed and threshold cryptography ...... 307 18.4 Proving the fundamental theorem: ...... 308 18.5 Malicious to honest but curious reduction ...... 309 18.5.1 Handling probabilistic strategies: ...... 313

19 Multiparty secure computation II: Construction using Fully Homomorphic Encryption 315 19.1 Constructing 2 party honest but curious computation from fully homomorphic encryption ...... 316 19.2 Achieving circuit privacy in a fully homomorphic encryption ...... 319 19.2.1 Bottom line: A two party secure computation protocol ...... 321 19.3 Beyond two parties ...... 322

20 Quantum computing and cryptography I 325 20.0.1 Quantum computing and computation - an executive summary...... 327

20.1 Quantum 101 ...... 328 20.1.1 Physically realizing quantum computation . . . . 332 20.1.2 Bra-ket notation ...... 333 20.1.3 Bell’s Inequality ...... 333 20.2 Grover’s Algorithm ...... 335

21 Quantum computing and cryptography II 337 21.1 From order finding to factoring and discrete log . . . . 337 21.2 Finding periods of a function: Simon’s Algorithm . . . 338 21.3 From Simon to Shor ...... 340 21.3.1 The Fourier transform over ℤ푚 ...... 340 21.3.2 Fast Fourier Transform...... 341 21.3.3 Quantum Fourier Transform over ℤ푚 ...... 342 21.4 Shor’s Order-Finding Algorithm...... 343 21.4.1 Analysis: the case that 푟|푚 . . . . ...... 344 21.4.2 The general case ...... 344 21.5 Rational approximation of real numbers ...... 345 21.5.1 ...... 346

22 Software Obfuscation 347 22.1 Witness encryption ...... 348 22.2 Deniable encryption ...... 349 22.3 Functional encryption ...... 349 22.4 The software patch problem ...... 350 22.5 Software obfuscation ...... 351 22.6 Applications of obfuscation ...... 352 22.7 Impossibility of obfuscation ...... 352 22.7.1 Proof of impossibility of VBB obfuscation . . . . 352 22.8 Indistinguishability obfuscation ...... 355

23 More obfuscation, exotic encryptions 357 23.1 Slower, weaker, less securer ...... 357 23.2 How to get IBE from pairing based assumptions. . . . . 358 23.3 Beyond pairing based cryptography ...... 361

24 Anonymous communication 363 24.1 ...... 363 24.2 Anonymous routing ...... 364 24.3 Tor ...... 364 24.4 Telex ...... 364 24.5 Riposte ...... 364

V Conclusions 365

25 Ethical, moral, and policy dimensions to cryptography 367

25.1 Reading prior to lecture: ...... 369 25.2 Case studies...... 369 25.2.1 The Snowden revelations ...... 369 25.2.2 FBI vs Apple case ...... 370 25.2.3 Juniper backdoor case and the OPM break-in . . 371

26 Course recap 373 26.1 Some things we did not cover ...... 375 26.2 What I hope you learned ...... 376

Foreword and Syllabus

“Human ingenuity cannot concoct a cipher which human ingenuity cannot resolve.” Edgar Allan Poe, 1841

Cryptography - the art or science of “secret writing” - has been around for several millennia. For almost all that time, Edgar Allan Poe’s quote above held true. Indeed, the history of cryptography is littered with the figurative corpses of cryptosystems believed secure and then broken, and sometimes with the actual corpses of those who have mistakenly placed their faith in these cryptosystems. Yet, something changed in the last few decades. New cryptosystems have been found that have not been broken despite being subjected to immense efforts involving both human ingenuity and computational power on a scale that completely dwarfs the “crypto breakers” of Poe’s time. Even more amazingly, these cryptosystems are not only seemingly unbreakable, but they also achieve this under much harsher conditions. Not only do today’s attackers have more computational power, but they also have more data to work with. In Poe’s age, an attacker would be lucky if they got access to more than a few ciphertexts with known plaintexts. These days attackers might have massive amounts of data - terabytes or more - at their disposal. In fact, with public key encryption, an attacker can generate as many ciphertexts as they wish.

These new types of cryptosystems, both more secure and more versatile, have enabled many applications that in the past were not only impossible but in fact unimaginable. These include secure communication without sharing a secret, electronic voting without a trusted authority, anonymous digital cash, and many more. Cryptography now supplies crucial infrastructure without which much of the modern “communication economy” could not function.

This course is about the story of this cryptographic revolution. However, beyond the cool applications and the crucial importance of cryptography to our society, it also contains intellectual and mathematical beauty.
To understand these often paradoxical notions of cryptography, you need to think differently, adapting the point of view of an attacker, and (as we will see) sometimes adapting the


points of view of other hypothetical entities. More than anything, this course is about this cryptographic way of thinking. It may not be immediately applicable to protecting your credit card information or to building a secure system, but learning a new way of thinking is its own reward.

0.1 SYLLABUS

In this fast-paced course, I plan to start from the very basic notions of cryptography and by the end of the term reach some of the exciting advances that happened in the last few years, such as the construction of fully homomorphic encryption, a notion that Brian Hayes called “one of the most amazing magic tricks in all of computer science”, and indistinguishability obfuscators, which are even more amazing. To achieve this, our focus will be on ideas rather than implementations and so we will present cryptographic notions in their pedagogically simplest form – the one that best illustrates the underlying concepts – rather than the one that is most efficient, widely deployed, or conforms to Internet standards. We will discuss some examples of practical systems and attacks, but only when these serve to illustrate a conceptual point. Depending on time, I plan to cover the following notions:

• Part I: Introduction

1. How do we define security for encryption? Arguably the most important step in breaking out of the “build-break-tweak” cycle that Poe’s quote described has been the idea that we can have a mathematically precise definition of security, rather than relying on fuzzy notions that allow us only to determine with certainty that a system is broken but never give us a chance of proving that a system is secure.

2. Perfect security and its limitations: Showing the possibility (and the limitations) of encryptions that are perfectly secure regardless of the attacker’s computational resources.

3. Computational security: Bypassing the above limitations by restricting to computationally efficient attackers. Proofs of security by reductions.

• Part II: Private Key Cryptography

1. Pseudorandom generators: The basic building block of cryptography, which also provided a new twist on the age-old philosophical and scientific question of the nature of randomness.

2. Pseudorandom functions, permutations, block ciphers: Block ciphers are the workhorse of crypto.

3. Authentication and active attacks: Authentication turns out to be as crucial, if not more so, to security than secrecy, and is often a precondition to the latter. We’ll talk about notions such as Message Authentication Codes and Chosen-Ciphertext-Attack secure encryption, as well as real-world examples of why these notions are necessary.

4. Hash functions and the “Random Oracle Model”: Hash functions are used everywhere in crypto, including for verifying integrity, entropy distillation, and many other cases.

5. Building pseudorandom generators from one-way permutations (optional): Justifying our “axiom” of pseudorandom generators by deriving it from a weaker assumption.

• Part III: Public key encryption

1. Public key cryptography and the obfuscation paradigm: How did Diffie, Hellman, Merkle, and Ellis even dare to imagine the possibility of public key encryption?

2. Constructing public key encryption: Factoring, discrete log, and lattice based systems: We’ll discuss several variants for constructing public key systems, including those that are widely deployed such as RSA, Diffie-Hellman, and the elliptic curve variants. We’ll also discuss some variants of lattice based cryptosystems that have the advantage of not being broken by quantum computers and being more versatile. The former’s weakness to quantum computers is the reason why the NSA has advised people to transition to lattice-based cryptosystems in the not too far future.

3. Signature schemes: These are the public key versions of authentication, though interestingly they are in some sense easier to construct than the latter.

4. Active attacks for encryption: Chosen ciphertext attacks for public key encryption.

• Part IV: Advanced notions

1. Fully homomorphic encryption: Computing on encrypted data.

2. Multiparty secure computation: An amazing construction that enables applications such as playing poker over the net without trusting the server, privacy preserving data mining, electronic auctions without a trusted auctioneer, and electronic elections without a trusted central authority.

3. Zero knowledge proofs: Prove a statement without revealing the reason why it is true.

4. Quantum computing and cryptography: Shor’s algorithm to break RSA and friends. On “quantum resistant” cryptography.

5. Indistinguishability obfuscation: Construction of indistinguishability obfuscators, the potential “master tool” for crypto.

6. Practical protocols: Techniques for constructing practical protocols for particular tasks as opposed to general (and often inefficient) feasibility proofs.

7. Cryptocurrencies: Hash chains and Merkle trees, proofs of work, achieving consensus on a ledger via “majority of cycles”, smart contracts, achieving anonymity via zero knowledge proofs.

0.1.1 Prerequisites The main prerequisite is the ability to read, write (and even enjoy!) mathematical proofs. In addition, familiarity with algorithms, basic probability theory and basic linear algebra will be helpful. We’ll only use fairly basic concepts from all these areas: e.g. Oh-notation (e.g., 푂(푛) running time) from algorithms, notions such as events, random variables, and expectation from probability theory, and notions such as matrices, vectors, and eigenvectors from linear algebra. Mathematically mature students should be able to pick up the needed notions on their own. See the “mathematical background” handout for more details. No programming knowledge is needed. If you’re interested in the course but are not sure if you have sufficient background, or you have any other questions, please don’t hesitate to contact me.

0.2 WHY IS CRYPTOGRAPHY HARD?

Cryptography is a hard topic. Over the course of history, many brilliant people have stumbled in it, and did not realize subtle attacks on their ciphers. Even today it is frustratingly easy to get crypto wrong, and often system security is compromised because developers used crypto schemes in the wrong, or at least suboptimal, way. Why is this topic (and this course) so hard? Some of the reasons include:

• To argue about the security of a cryptographic scheme, you have to think like an attacker. This requires a very different way of thinking than what we are used to when developing algorithms or systems, and arguing that they perform well.

• To get robust assurances of security you need to argue about all possible attacks. The only way I know to analyze this infinite set is via mathematical proofs. Moreover, these types of mathematical proofs tend to be rather different than the ones most mathematicians typically work with. Because the proof itself needs to take the viewpoint of the attacker, these often tend to be proofs by contradiction and involve several twists of logic that take some getting used to.

• As we’ll see in this course, even defining security is a highly nontrivial task. Security definitions often get subtle and require quite a lot of creativity. For example, the way we model in general a statement such as “An attacker Eve does not get more information from observing a system above what she knew a-priori” is that we posit a “hypothetical alter ego” of Eve called Lilith who knows everything Eve knew a-priori but does not get to observe the actual interaction in the system. We then want to prove that anything that Eve learned could also have been learned by Lilith. If this sounds confusing, it is. But it is also fascinating, and leads to ways to argue mathematically about knowledge, as well as beautiful generalizations of the notion of encryption and protecting communication into schemes for protecting computation.

If cryptography is so hard, is it really worth studying? After all, given this subtlety, a single course in cryptography is no guarantee of using (let alone inventing) crypto correctly. In my view, regardless of its immense and growing practical importance, cryptography is worth studying for its intellectual content. There are many areas of science where we achieve goals once considered to be science fiction. But cryptography is an area where current achievements are so fantastic that in the thousands of years of secret writing people did not even dare imagine them. Moreover, cryptography may be hard because it forces you to think differently, but it is also rewarding because it teaches you to think differently. And once you pass this initial hurdle, and develop a “cryptographer’s mind”, you might find that this point of view is useful in areas that seem to have nothing to do with crypto.

I PRELIMINARIES

0 Mathematical Background

This is a brief review of some mathematical tools, and especially probability theory, that we will use in this course. See also the mathematical background and probability lectures in my Notes on Introduction to Theoretical Computer Science, which share much of the following text.

At Harvard, much of this material (and more) is taught in Stat 110 “Introduction to Probability”, CS20 “Discrete Mathematics”, and AM107 “Graph Theory and Combinatorics”. Some good sources for this material are the lecture notes by Papadimitriou and Vazirani (see the home page of Umesh Vazirani), Lehman, Leighton and Meyer from MIT Course 6.042 “Mathematics For Computer Science” (Chapters 1-2 and 14 to 19 are particularly relevant), and the Berkeley course CS 70. The mathematical tool we use most often is discrete probability. The “Probabilistic Method” book by Alon and Spencer is a great resource in this area. Also, the books of Mitzenmacher and Upfal and Motwani and Raghavan cover probability from a more algorithmic perspective.

Although knowledge of algorithms is not strictly necessary, it would be quite useful. Students who did not take an algorithms class such as CS 124 might want to look at the books by (1) Cormen, Leiserson, Rivest and Stein, (2) Dasgupta, Papadimitriou and Vazirani, or (3) Kleinberg and Tardos. We do not require prior knowledge of complexity or computability, but some basic familiarity could be useful. Students who did not take a theory of computation class such as CS 121 might want to look at my lecture notes or the first 2 chapters of my book with Arora.

0.1 A QUICK OVERVIEW OF MATHEMATICAL PREREQUISITES

The main notions we will use in this course are the following:


• Proofs: First and foremost, this course will involve a heavy dose of formal mathematical reasoning, which includes mathematical definitions, statements, and proofs.

• Sets and functions: We will assume familiarity with basic notions of sets and operations on sets such as union (denoted ∪), intersection (denoted ∩), and set subtraction (denoted ⧵). We denote by |퐴| the size of the set 퐴. We also assume familiarity with functions, and notions such as one-to-one (injective) functions and onto (surjective) functions. If 푓 is a function from a set 퐴 to a set 퐵, we denote this by 푓 ∶ 퐴 → 퐵. If 푓 is one-to-one then this implies that |퐴| ≤ |퐵|. If 푓 is onto then |퐴| ≥ |퐵|. If 푓 is a permutation/bijection (i.e., one-to-one and onto) then this implies that |퐴| = |퐵|.

• Big Oh notation: If 푓, 푔 are two functions from ℕ to ℕ, then (1) 푓 = 푂(푔) if there exists a constant 푐 such that 푓(푛) ≤ 푐 ⋅ 푔(푛) for every sufficiently large 푛, (2) 푓 = Ω(푔) if 푔 = 푂(푓), (3) 푓 = Θ(푔) if 푓 = 푂(푔) and 푔 = 푂(푓), (4) 푓 = 표(푔) if for every 휖 > 0, 푓(푛) ≤ 휖 ⋅ 푔(푛) for every sufficiently large 푛, and (5) 푓 = 휔(푔) if 푔 = 표(푓). To emphasize the input parameter, we often write 푓(푛) = 푂(푔(푛)) instead of 푓 = 푂(푔), and use similar notation for 표, Ω, 휔, Θ. While this is only an imprecise heuristic, when you see a statement of the form 푓(푛) = 푂(푔(푛)) you can often replace it in your mind by the statement 푓(푛) ≤ 1000푔(푛), while the statement 푓(푛) = Ω(푔(푛)) can often be thought of as 푓(푛) ≥ 0.001푔(푛).

• Logical operations: The operations AND, OR, and NOT (∧, ∨, ¬) and the quantifiers “exists” and “forall” (∃, ∀).

• Tuples and strings: The notation Σᵏ and Σ∗ where Σ is some finite set which is called the alphabet (quite often Σ = {0, 1}).

• Graphs: Undirected and directed graphs, connectivity, paths, and cycles.

• Basic combinatorics: Notions such as the binomial coefficient (푛 choose 푘), the number of 푘-sized subsets of a set of size 푛.

• Discrete probability: We will extensively use probability theory, and specifically probability over finite sample spaces such as tossing 푛 coins, including notions such as random variables, expectation, and concentration.

• Modular arithmetic: We will use modular arithmetic (i.e., addition and multiplication modulo some number 푚), and in particular talk about operations on vectors and matrices whose elements are taken modulo 푚. If 푛 is an integer, then we denote by 푎 (mod 푛) the remainder of 푎 when divided by 푛. That is, 푎 (mod 푛) is the number 푟 ∈ {0, … , 푛 − 1} such that 푎 = 푘푛 + 푟 for some integer 푘. It will be very useful that 푎 (mod 푛) + 푏 (mod 푛) = (푎 + 푏) (mod 푛) and 푎 (mod 푛) ⋅ 푏 (mod 푛) = (푎 ⋅ 푏) (mod 푛), and so modular arithmetic inherits all the rules (associativity, commutativity, etc.) of integer arithmetic. If 푎, 푏 are positive integers then 푔푐푑(푎, 푏) is the largest integer that divides both 푎 and 푏. It is known that for every 푎, 푏 there exist (not necessarily positive) integers 푥, 푦 such that 푎푥 + 푏푦 = 푔푐푑(푎, 푏) (it’s a good exercise to prove this on your own). In particular, if 푔푐푑(푎, 푛) = 1 then there exists a modular inverse 푏 for 푎, which is a number such that 푎푏 = 1 (mod 푛). We sometimes write 푏 as 푎⁻¹ (mod 푛).

• Group theory, linear algebra: In later parts of the course we will need the notions of matrices, vectors, matrix multiplication and inverse, determinant, eigenvalues, and eigenvectors. These can be picked up in any basic text on linear algebra. In some parts we might also use some basic facts of group theory (finite groups only, and mostly only commutative ones). These also can be picked up as we go along, and a prior course on group theory is not necessary.

• Discrete probability: Probability theory, and specifically probability over finite sample spaces such as tossing 푛 coins, is a crucial part of cryptography, since (as we’ll see) there is no secrecy without randomness.
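As a small illustration of the modular-arithmetic facts above (a sketch of our own, not code from these notes), the extended Euclidean algorithm computes integers 푥, 푦 with 푎푥 + 푏푦 = 푔푐푑(푎, 푏), which immediately yields the modular inverse 푎⁻¹ (mod 푛) whenever 푔푐푑(푎, 푛) = 1. The function names here are our own choices:

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = extended_gcd(b, a % b)
    # gcd(a, b) == gcd(b, a % b); back-substitute the coefficients.
    return (g, y, x - (a // b) * y)

def mod_inverse(a, n):
    """Return b with a*b == 1 (mod n); requires gcd(a, n) == 1."""
    g, x, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a has no inverse modulo n")
    return x % n

# a*x + b*y = gcd(a, b):
g, x, y = extended_gcd(12, 18)
assert g == 6 and 12 * x + 18 * y == 6

# Modular inverse: 7 * b = 1 (mod 40).
b = mod_inverse(7, 40)
assert (7 * b) % 40 == 1

# Modular arithmetic respects addition and multiplication:
a, c, n = 123, 456, 37
assert (a % n + c % n) % n == (a + c) % n
assert ((a % n) * (c % n)) % n == (a * c) % n
```

This is the same algorithm that, later in the course, underlies computing inverses in the groups used for RSA and Diffie-Hellman style systems.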

0.2 MATHEMATICAL PROOFS

Arguably the mathematical prerequisite needed for this course is a certain level of comfort with mathematical proofs. Many students tend to think of mathematical proofs as a very formal object, like the proofs studied in school in geometry, consisting of a sequence of axioms and statements derived from them by very specific rules. In fact, a proof is a piece of writing meant to convince human readers that a particular statement is true.

(In this class, the particular humans you are trying to convince are me and the teaching fellows.) To write a proof of some statement X you need to follow three steps:

1. Make sure that you completely understand the statement X.

2. Think about X until you are able to convince yourself that X is true.

3. Think how to present the argument in the clearest possible way so you can convince the reader as well.

Like any good piece of writing, a proof should be concise and not be overly formal or cumbersome. In fact, overuse of formalism can often be detrimental to the argument, since it can mask weaknesses in the argument from both the writer and the reader. Sometimes students try to “throw the kitchen sink” at an answer, trying to list all possibly relevant facts in the hope of getting partial credit. But a proof is a piece of writing, and a badly written proof will not get credit even if it contains some correct elements. It is better to write a clear proof of a partial statement. In particular, if you haven’t been able to convince yourself that the statement is true, you should be honest about it and explain which parts of the statement you have been able to verify and which parts you haven’t.

0.2.1 Example: The existence of infinitely many primes.
In the spirit of "do what I say and not what I do", I will now demonstrate the importance of conciseness by belaboring the point and spending several paragraphs on a simple proof, written by Euclid around 300 BC. Recall that a prime number is an integer $p > 1$ whose only divisors are $p$ and $1$. Euclid's Theorem is the following:

Theorem 0.1 — Infinitude of primes. There exist infinitely many primes.

Instead of simply writing down the proof, let us try to understand how we might figure this proof out. (If you haven't seen this proof before, or you don't remember it, you might want to stop reading at this point and try to come up with it on your own before continuing.) The first (and often most important) step is to understand what the statement means. Saying that the number of primes is infinite means that it is not finite. More precisely, this means that for every natural number $k$, there are more than $k$ primes.

Now that we understand what we need to prove, let us try to convince ourselves of this fact. At first, it might seem obvious: since there are infinitely many natural numbers, and every one of them can be factored into primes, there must be infinitely many primes, right? Wrong. Since we can multiply a prime many times with itself, a finite number of primes can generate infinitely many numbers. Indeed the single prime $3$ generates the infinite set of all numbers of the form $3^n$. So, what we really need to show is that for every finite set of primes $\{p_1, \ldots, p_k\}$, there exists a number $n$ that has a prime factor outside this set.

Now we need to start playing around. Suppose that we had just two primes $p$ and $q$. How would we find a number $n$ that is not generated by $p$ and $q$? If you try to draw things on the number line, you will see that there is always some gap between multiples of $p$ and $q$ in

the sense that they are never consecutive. It is possible to prove this (in fact, it's not a bad exercise), but this observation already suggests a guess for what would be a number that is divisible by neither $p$ nor $q$, namely $n = pq + 1$. Indeed, the remainder of $n = pq + 1$ when dividing by either $p$ or $q$ would be $1$ (which in particular is not zero). This observation generalizes: we can set $n = pqr + 1$ to be a number that is divisible by neither $p$, $q$ nor $r$, and more generally $n = p_1 \cdots p_k + 1$ is not divisible by $p_1, \ldots, p_k$.

Now we have convinced ourselves of the statement and it is time to think of how to write this down in the clearest way. One issue that arises is that we want to prove things truly from the definition of primes and first principles, and so not assume properties of division and remainders or even the existence of a prime factorization, without proving it. Here is what a proof could look like. We will prove the following two lemmas:

Lemma 0.2 — Existence of prime divisor. For every integer $n > 1$, there exists a prime $p > 1$ that divides $n$.

Lemma 0.3 — Existence of co-prime. For every set of integers $p_1, \ldots, p_k > 1$, there exists a number $n$ such that none of $p_1, \ldots, p_k$ divide $n$.

From these two lemmas it follows that there exist infinitely many primes, since otherwise if we let $p_1, \ldots, p_k$ be the set of all primes, then we would get a contradiction, as by combining Lemma 0.2 and Lemma 0.3 we would get a number $n$ with a prime factor outside this set. We now prove the lemmas:

Proof of Lemma 0.2. Let $n > 1$ be a number, and let $p$ be the smallest divisor of $n$ that is larger than $1$ (there exists such a number since $n$ divides itself). We claim that $p$ is a prime. Indeed, suppose otherwise that there was some $1 < q < p$ that divides $p$. Then since $n = pc$ for some integer $c$ and $p = qc'$ for some integer $c'$, we'll get that $n = qc'c$ and hence $q$ divides $n$, in contradiction to the choice of $p$ as the smallest divisor of $n$. ■

Proof of Lemma 0.3.
Let $n = p_1 \cdots p_k + 1$ and suppose for the sake of contradiction that there exists some $i$ such that $n = p_i \cdot c$ for some integer $c$. Then if we divide the equation $n - p_1 \cdots p_k = 1$ by $p_i$ then we get $c$ minus an integer on the lefthand side, and the fraction $1/p_i$ on the righthand side, which is a contradiction since an integer cannot equal the non-integer $1/p_i$. ■

This completes the proof of Theorem 0.1.
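Euclid's argument is constructive: given any finite list of primes, multiplying them and adding one yields a number whose smallest nontrivial divisor (a prime, by Lemma 0.2) lies outside the list. A small sketch for modest inputs; the helper names are our own:

```python
# Given a finite list of primes, n = p_1 * ... * p_k + 1 has a prime
# factor outside the list (Lemma 0.3), and Lemma 0.2's "smallest divisor
# larger than 1" is always prime. (Illustration only; helper names are ours.)

def smallest_prime_factor(n):
    """The smallest divisor of n that is larger than 1 (a prime, by Lemma 0.2)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

def prime_outside(primes):
    """Return a prime that is not in the given list of primes."""
    n = 1
    for p in primes:
        n *= p
    n += 1
    # No prime in the list divides n (the remainder is always 1),
    # so any prime factor of n lies outside the list.
    return smallest_prime_factor(n)
```

For example, `prime_outside([2, 3, 5, 7])` returns `211` (here $2 \cdot 3 \cdot 5 \cdot 7 + 1 = 211$ happens to be prime itself), while `prime_outside([2, 3, 5, 7, 11, 13])` returns `59`, a prime factor of $30031 = 59 \cdot 509$.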

0.3 PROBABILITY AND SAMPLE SPACES

Perhaps the main mathematical background needed in cryptography is probability theory since, as we will see, there is no secrecy without randomness. Luckily, we only need fairly basic notions of probability theory and in particular only probability over finite sample spaces. If you have a good understanding of what happens when we toss $k$ random coins, then you know most of the probability you'll need. The discussion below is not meant to replace a course on probability theory, and if you have not seen this material before, I highly recommend you look at additional resources to get up to speed. (Harvard's STAT 110 class, whose lectures are available on youtube, is a highly recommended introduction to probability. See also the lecture notes from MIT's "Mathematics for Computer Science" course, as well as notes 12-17 of Berkeley's CS 70.)

The nature of randomness and probability is a topic of great philosophical, scientific and mathematical depth. Is there actual randomness in the world, or does it proceed in a deterministic clockwork fashion from some initial conditions set at the beginning of time? Does probability refer to our uncertainty of beliefs, or to the frequency of occurrences in repeated experiments? How can we define probability over infinite sets? These are all important questions that have been studied and debated by scientists, mathematicians, statisticians, and philosophers. Fortunately, we will not need to deal directly with these questions here. We will be mostly interested in the setting of tossing $n$ random, unbiased and independent coins. Below we define the basic probabilistic objects of events and random variables when restricted to this setting. These can be defined for much more general probabilistic experiments or sample spaces, and later on we will briefly discuss how this can be done. However, the $n$-coin case is sufficient for almost everything we'll need in this course.
If instead of "heads" and "tails" we encode the sides of each coin by "zero" and "one", we can encode the result of tossing $n$ coins as a string in $\{0,1\}^n$. Each particular outcome $x \in \{0,1\}^n$ is obtained with probability $2^{-n}$. For example, if we toss three coins, then we obtain each of the 8 outcomes $000, 001, 010, 011, 100, 101, 110, 111$ with probability $2^{-3} = 1/8$ (see also Fig. 1). We can describe the experiment of tossing $n$ coins as choosing a string $x$ uniformly at random from $\{0,1\}^n$, and hence we'll use the shorthand $x \sim \{0,1\}^n$ for $x$ that is chosen according to this experiment.

(Figure 1: The probabilistic experiment of tossing three coins corresponds to making $2 \times 2 \times 2 = 8$ choices, each with equal probability. In this example, the blue set corresponds to the event $A = \{x \in \{0,1\}^3 \mid x_0 = 0\}$ where the first coin toss is equal to $0$, and the pink set corresponds to the event $B = \{x \in \{0,1\}^3 \mid x_1 = 1\}$ where the second coin toss is equal to $1$, with their intersection having a purplish color. As we can see, each of these events contains $4$ elements (out of $8$ total) and so has probability $1/2$. The intersection of $A$ and $B$ contains two elements, and so the probability that both of these events occur is $2/8 = 1/4$.)

An event is simply a subset $A$ of $\{0,1\}^n$. The probability of $A$, denoted by $\Pr_{x \sim \{0,1\}^n}[A]$ (or $\Pr[A]$ for short, when the sample space is understood from the context), is the probability that an $x$ chosen uniformly at random will be contained in $A$. Note that this is the same as $|A|/2^n$ (where as usual $|A|$ denotes the number of elements in the set $A$). For example, the probability that $x$ has an even number of ones is $\Pr[A]$ where $A = \{x : \sum_{i=0}^{n-1} x_i = 0 \bmod 2\}$. In the case $n = 3$,

$A = \{000, 011, 101, 110\}$, and hence $\Pr[A] = \frac{4}{8} = \frac{1}{2}$ (see Fig. 2). It turns out this is true for every $n$:

Lemma 0.4 For every $n > 0$,

$$\Pr_{x \sim \{0,1\}^n}\left[\textstyle\sum_{i=0}^{n-1} x_i \text{ is even}\right] = 1/2$$

P To test your intuition on probability, try to stop here and prove the lemma on your own.

(Figure 2: The event that if we toss three coins $x_0, x_1, x_2 \in \{0,1\}$ then the sum of the $x_i$'s is even has probability $1/2$ since it corresponds to exactly $4$ out of the $8$ possible strings of length $3$.)

Proof of Lemma 0.4. We prove the lemma by induction on $n$. For the case $n = 1$ it is clear since $x = 0$ is even and $x = 1$ is odd, and hence the probability that $x \in \{0,1\}$ is even is $1/2$. Let $n > 1$. We assume by induction that the lemma is true for $n - 1$ and we will prove it for $n$. We split the set $\{0,1\}^n$ into four disjoint sets $E_0, E_1, O_0, O_1$, where for $b \in \{0,1\}$, $E_b$ is defined as the set of $x \in \{0,1\}^n$ such that $x_0 \cdots x_{n-2}$ has an even number of ones and $x_{n-1} = b$, and similarly $O_b$ is the set of $x \in \{0,1\}^n$ such that $x_0 \cdots x_{n-2}$ has an odd number of ones and $x_{n-1} = b$. Since $E_0$ is obtained by simply extending an $(n-1)$-length string with an even number of ones by the digit $0$, the size of $E_0$ is simply the number of such $(n-1)$-length strings, which by the induction hypothesis is $2^{n-1}/2 = 2^{n-2}$. The same reasoning applies for $E_1$, $O_0$, and $O_1$. Hence each one of the four sets $E_0, E_1, O_0, O_1$ is of size $2^{n-2}$. Since $x \in \{0,1\}^n$ has an even number of ones if and only if $x \in E_0 \cup O_1$ (i.e., either the first $n-1$ coordinates sum up to an even number and the final coordinate is $0$, or the first $n-1$ coordinates sum up to an odd number and the final coordinate is $1$), we get that the probability that $x$ satisfies this property is
$$\frac{|E_0 \cup O_1|}{2^n} = \frac{2^{n-2} + 2^{n-2}}{2^n} = \frac{1}{2}$$
using the fact that $E_0$ and $O_1$ are disjoint and hence $|E_0 \cup O_1| = |E_0| + |O_1|$. ■

We can also use the intersection ($\cap$) and union ($\cup$) operators to talk about the probability of both event $A$ and event $B$ happening, or the probability of event $A$ or event $B$ happening. For example, the probability $p$ that $x$ has an even number of ones and $x_0 = 1$ is the same as $\Pr[A \cap B]$ where $A = \{x \in \{0,1\}^n : \sum_{i=0}^{n-1} x_i = 0 \bmod 2\}$ and $B = \{x \in \{0,1\}^n : x_0 = 1\}$. This probability is equal to $1/4$ for $n > 1$. (It is a great exercise for you to pause here and verify that you understand why this is the case.)
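Since the sample space $\{0,1\}^n$ is finite, claims like these can be checked by brute-force enumeration for small $n$. A sketch; the helper `prob_event` is our own:

```python
# Brute-force probabilities over x ~ {0,1}^n by enumerating the sample
# space. (Illustration only; the helper name is ours.)
from itertools import product

def prob_event(n, predicate):
    """Pr over x ~ {0,1}^n of the event {x in {0,1}^n : predicate(x)}."""
    outcomes = list(product([0, 1], repeat=n))
    return sum(1 for x in outcomes if predicate(x)) / len(outcomes)

# Lemma 0.4: Pr[number of ones is even] = 1/2, for several n:
assert all(prob_event(n, lambda x: sum(x) % 2 == 0) == 1 / 2 for n in range(1, 8))

# Pr[even number of ones AND x_0 = 1] = 1/4 for n > 1:
p = prob_event(3, lambda x: sum(x) % 2 == 0 and x[0] == 1)
```

Here `p` comes out to $1/4$, matching the exercise above: for $n = 3$ the event is $\{101, 110\}$, which has $2$ of the $8$ outcomes.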

Because intersection corresponds to considering the logical AND of the conditions that two events happen, while union corresponds to considering the logical OR, we will sometimes use the $\wedge$ and $\vee$ operators instead of $\cap$ and $\cup$, and so write the probability $p = \Pr[A \cap B]$ defined above also as

$$\Pr_{x \sim \{0,1\}^n}\left[\textstyle\sum_i x_i = 0 \bmod 2 \;\wedge\; x_0 = 1\right] .$$

If $A \subseteq \{0,1\}^n$ is an event, then $\overline{A} = \{0,1\}^n \setminus A$ corresponds to the event that $A$ does not happen. Since $|\overline{A}| = 2^n - |A|$, we get that
$$\Pr[\overline{A}] = \frac{|\overline{A}|}{2^n} = \frac{2^n - |A|}{2^n} = 1 - \frac{|A|}{2^n} = 1 - \Pr[A] .$$
This makes sense: since $A$ happens if and only if $\overline{A}$ does not happen, the probability of $A$ should be one minus the probability of $\overline{A}$.

R Remark 0.5 — Remember the sample space. While the above definition might seem very simple and almost trivial, the human mind seems not to have evolved for probabilistic reasoning, and it is surprising how often people can get even the simplest settings of probability wrong. One way to make sure you don't get confused when trying to calculate probability statements is to always ask yourself the following two questions: (1) Do I understand what is the sample space that this probability is taken over?, and (2) Do I understand what is the definition of the event that we are analyzing? For example, suppose that I were to randomize seating in my course, and then it turned out that students sitting in row 7 performed better on the final: how surprising should we find this? If we started out with the hypothesis that there is something special about the number 7 and chose it ahead of time, then the event $A$ that we are discussing is the event that students sitting in row 7 had better performance on the final, and we might find it surprising. However, if we first looked at the results and then chose the row whose average performance is best, then the event we are discussing is the event $B$ that there exists some row where the performance is higher than the overall average. $B$ is a superset of $A$, and its probability (even if there is no correlation between sitting and performance) can be quite significant.

0.3.1 Random variables
Events correspond to Yes/No questions, but often we want to analyze finer questions. For example, if we make a bet at the roulette wheel,

we don’t want to just analyze whether we won or lost, but also how much we’ve gained. A (real valued) random variable is simply a way to associate a number with the result of a probabilistic experiment. Formally, a random variable is a function that maps every outcome to an element .푛 For example, the function 푛 that maps to푋 the ∶ {0, sum 1} of→ its ℝ coordinates (i.e., to 푥) is∈ a {0,푛 random 1} variable. 푋(푥) ∈ ℝ The expectation푠푢푚푛−1 ∶ {0,of 1} a random→ ℝ variable푥 , denoted by , is the 푖=0 푖 average∑ value푥 that that this number takes, taken over all draws from the probabilistic experiment. In other words,푋 the expectation피[푋] of is defined as follows: 푋

$$\mathbb{E}[X] = \sum_{x \in \{0,1\}^n} 2^{-n} X(x) .$$

If $X$ and $Y$ are random variables, then we can define $X + Y$ as simply the random variable that maps a point $x \in \{0,1\}^n$ to $X(x) + Y(x)$. One basic and very useful property of the expectation is that it is linear:

Lemma 0.6 — Linearity of expectation.

$$\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$

Proof.

$$\mathbb{E}[X + Y] = \sum_{x \in \{0,1\}^n} 2^{-n} \left(X(x) + Y(x)\right) = \sum_{x \in \{0,1\}^n} 2^{-n} X(x) + \sum_{x \in \{0,1\}^n} 2^{-n} Y(x) = \mathbb{E}[X] + \mathbb{E}[Y] . \quad ■$$

Similarly, $\mathbb{E}[kX] = k \, \mathbb{E}[X]$ for every $k \in \mathbb{R}$. For example, using the linearity of expectation, it is very easy to show that the expectation of the sum of the $x_i$'s for $x \sim \{0,1\}^n$ is equal to $n/2$. Indeed, if we write $X = \sum_{i=0}^{n-1} x_i$ then $X = X_0 + \cdots + X_{n-1}$, where $X_i$ is the random variable $x_i$. Since for every $i$, $\Pr[X_i = 0] = 1/2$ and $\Pr[X_i = 1] = 1/2$, we get that $\mathbb{E}[X_i] = (1/2) \cdot 0 + (1/2) \cdot 1 = 1/2$ and hence $\mathbb{E}[X] = \sum_{i=0}^{n-1} \mathbb{E}[X_i] = n \cdot (1/2) = n/2$.

P If you have not seen discrete probability before, please go over this argument again until you are sure you follow it; it is a prototypical simple example of the type of reasoning we will employ again and again in this course.
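The calculation $\mathbb{E}[X] = n/2$ can likewise be confirmed by direct enumeration for small $n$; a sketch (the helper name is our own):

```python
# Computing expectations over x ~ {0,1}^n by enumeration, matching the
# linearity-of-expectation argument above. (Illustration only.)
from itertools import product

def expectation(n, X):
    """E[X] where x ~ {0,1}^n and X maps outcomes to numbers."""
    return sum(X(x) for x in product([0, 1], repeat=n)) / 2**n

# E[sum of the coins] = n/2; here n = 6 gives 3.0.
e = expectation(6, sum)
# Linearity on a small example: E[X + Y] = E[X] + E[Y].
lin = expectation(3, lambda x: sum(x) + x[0])
```

Here `lin` equals `expectation(3, sum) + expectation(3, lambda x: x[0])`, i.e. $3/2 + 1/2 = 2$.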

If $A$ is an event, then $1_A$ is the random variable such that $1_A(x)$ equals $1$ if $x \in A$, and $1_A(x) = 0$ otherwise. Note that $\Pr[A] = \mathbb{E}[1_A]$ (can you see why?). Using this and the linearity of expectation, we can show one of the most useful bounds in probability theory:

Lemma 0.7 — Union bound. For every two events $A, B$, $\Pr[A \cup B] \le \Pr[A] + \Pr[B]$.

P Before looking at the proof, try to see why the union bound makes intuitive sense. We can also prove it directly from the definition of probabilities and the cardinality of sets, together with the equation $|A \cup B| \le |A| + |B|$. Can you see why the latter equation is true? (See also Fig. 3.)

Proof of Lemma 0.7. For every $x$, the variable $1_{A \cup B}(x) \le 1_A(x) + 1_B(x)$. Hence, $\Pr[A \cup B] = \mathbb{E}[1_{A \cup B}] \le \mathbb{E}[1_A + 1_B] = \mathbb{E}[1_A] + \mathbb{E}[1_B] = \Pr[A] + \Pr[B]$. ■

(Figure 3: The union bound tells us that the probability of $A$ or $B$ happening is at most the sum of the individual probabilities. We can see it by noting that for every two sets $|A \cup B| \le |A| + |B|$, with equality only if $A$ and $B$ have no intersection.)

The way we often use this in theoretical computer science is to argue that, for example, if there is a list of 100 bad events that can happen, and each one of them happens with probability at most $1/10000$, then with probability at least $1 - 100/10000 = 0.99$, no bad event happens.

0.3.2 Distributions over strings
While most of the time we think of random variables as having as output a real number, we sometimes consider random variables whose output is a string. That is, we can think of a map $Y : \{0,1\}^n \to \{0,1\}^*$ and consider the "random variable" $Y$ such that for every $y \in \{0,1\}^*$, the probability that $Y$ outputs $y$ is equal to $\frac{1}{2^n} |\{x \in \{0,1\}^n \mid Y(x) = y\}|$. To avoid confusion, we will typically refer to such string-valued random variables as distributions over strings. So, a distribution $Y$ over strings $\{0,1\}^*$ can be thought of as a finite collection of strings $y_0, \ldots, y_{M-1} \in \{0,1\}^*$ and probabilities $p_0, \ldots, p_{M-1}$ (which are non-negative numbers summing up to one), so that $\Pr[Y = y_i] = p_i$.

Two distributions $Y$ and $Y'$ are identical if they assign the same probability to every string. For example, consider the following two functions $Y, Y' : \{0,1\}^2 \to \{0,1\}^2$. For every $x \in \{0,1\}^2$, we define $Y(x) = x$ and $Y'(x) = x_0 (x_0 \oplus x_1)$ where $\oplus$ is the XOR operation. Although these are two different functions, they induce the same distribution over $\{0,1\}^2$ when invoked on a uniform input. The distribution $Y(x)$ for $x \sim \{0,1\}^2$ is of course the uniform distribution over $\{0,1\}^2$.
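We can confirm that $Y$ and $Y'$ induce the same distribution by enumerating all four inputs; a sketch, representing two-bit strings as tuples:

```python
# Y(x) = x and Y'(x) = (x_0, x_0 XOR x_1) are different functions with
# the same output distribution on a uniform input. (Illustration only.)
from itertools import product
from collections import Counter

def Y(x):          # Y(x) = x
    return x

def Y_prime(x):    # Y'(x) = (x_0, x_0 XOR x_1)
    return (x[0], x[0] ^ x[1])

inputs = list(product([0, 1], repeat=2))
dist_Y = Counter(map(Y, inputs))
dist_Y_prime = Counter(map(Y_prime, inputs))
same = (dist_Y == dist_Y_prime)   # both are uniform over {0,1}^2
```

Each of the four outputs appears exactly once under both maps, so `same` is `True`.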

On the other hand, $Y'$ is simply the map $00 \mapsto 00$, $01 \mapsto 01$, $10 \mapsto 11$, $11 \mapsto 10$, which is a permutation over $\{0,1\}^2$. (We can view $Y$ and $Y'$ as the map $F : \{0,1\}^2 \to \{0,1\}^2$ defined as $F(x_0 x_1) = x_0 x_1$ and the map $G : \{0,1\}^2 \to \{0,1\}^2$ defined as $G(x_0 x_1) = x_0 (x_0 \oplus x_1)$.)

0.3.3 More general sample spaces.
While in this chapter we assume that the underlying probabilistic experiment corresponds to tossing $n$ independent coins, everything we say easily generalizes to sampling $x$ from a more general finite or countable set $S$ (and not-so-easily generalizes to uncountable sets as well). A probability distribution over a finite set $S$ is simply a function $\mu : S \to [0,1]$ such that $\sum_{s \in S} \mu(s) = 1$. We think of this as the experiment where we obtain every $x \in S$ with probability $\mu(x)$, and sometimes denote this as $x \sim \mu$. An event $A$ is a subset of $S$, and the probability of $A$, which we denote by $\Pr_\mu[A]$, is $\sum_{x \in A} \mu(x)$. A random variable is a function $X : S \to \mathbb{R}$, where the probability that $X = y$ is equal to $\sum_{x \in S \text{ s.t. } X(x) = y} \mu(x)$.

0.4 CORRELATIONS AND INDEPENDENCE

One of the most delicate but important concepts in probability is the notion of independence (and the opposing notion of correlations). Subtle correlations are often behind surprises and errors in probability and statistical analysis, and several mistaken predictions have been blamed on miscalculating the correlations between, say, housing prices in Florida and Arizona, or voter preferences in Ohio and Michigan. See also Joe Blitzstein's aptly named talk "Conditioning is the Soul of Statistics". (Another thorny issue is of course the difference between correlation and causation. Luckily, this is another point we don't need to worry about in our clean setting of tossing $n$ coins.)

Two events $A$ and $B$ are independent if the fact that $A$ happens makes $B$ neither more nor less likely to happen. For example, if we think of the experiment of tossing $3$ random coins $x \in \{0,1\}^3$, and we let $A$ be the event that $x_0 = 1$ and $B$ the event that $x_0 + x_1 + x_2 \ge 2$, then if $A$ happens it is more likely that $B$ happens, and hence these events are not independent. On the other hand, if we let $C$ be the event that $x_1 = 1$, then because the second coin toss is not affected by the result of the first one, the events $A$ and $C$ are independent.

The formal definition is that events $A$ and $B$ are independent if $\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$. If $\Pr[A \cap B] > \Pr[A] \cdot \Pr[B]$ then we say that $A$ and $B$ are positively correlated, while if $\Pr[A \cap B] < \Pr[A] \cdot \Pr[B]$ then we say that $A$ and $B$ are negatively correlated (see Fig. 4).

(Figure 4: Two events $A$ and $B$ are independent if $\Pr[A \cap B] = \Pr[A] \cdot \Pr[B]$. In the two figures above, the empty $x \times x$ square is the sample space, and $A$ and $B$ are two events in this sample space. In the left figure, $A$ and $B$ are independent, while in the right figure they are negatively correlated, since $B$ is less likely to occur if we condition on $A$ (and vice versa). Mathematically, one can see this by noticing that in the left figure the areas of $A$ and $B$ respectively are $a \cdot x$ and $b \cdot x$, and so their probabilities are $\frac{a \cdot x}{x^2} = \frac{a}{x}$ and $\frac{b \cdot x}{x^2} = \frac{b}{x}$ respectively, while the area of $A \cap B$ is $a \cdot b$, which corresponds to the probability $\frac{a \cdot b}{x^2}$. In the right figure, the area of the triangle $B$ is $\frac{b \cdot x}{2}$, which corresponds to a probability of $\frac{b}{2x}$, but the area of $A \cap B$ is $\frac{b' \cdot a}{2}$ for some $b' < b$. This means that the probability of $A \cap B$ is $\frac{b' \cdot a}{2x^2} < \frac{b}{2x} \cdot \frac{a}{x}$, or in other words $\Pr[A \cap B] < \Pr[A] \cdot \Pr[B]$.)

If we consider the above examples on the experiment of choosing $x \in \{0,1\}^3$ then we can see that

$$\Pr[x_0 = 1] = \frac{1}{2}$$
but
$$\Pr[x_0 + x_1 + x_2 \ge 2] = \Pr[\{011, 101, 110, 111\}] = \frac{4}{8} = \frac{1}{2}$$

$$\Pr[x_0 = 1 \wedge x_0 + x_1 + x_2 \ge 2] = \Pr[\{101, 110, 111\}] = \frac{3}{8} > \frac{1}{2} \cdot \frac{1}{2}$$
and hence, as we already observed, the events $\{x_0 = 1\}$ and $\{x_0 + x_1 + x_2 \ge 2\}$ are not independent and in fact are positively correlated. On the other hand,
$$\Pr[x_0 = 1 \wedge x_1 = 1] = \Pr[\{110, 111\}] = \frac{2}{8} = \frac{1}{2} \cdot \frac{1}{2}$$
and hence the events $\{x_0 = 1\}$ and $\{x_1 = 1\}$ are indeed independent.

R Remark 0.8 — Disjointness vs independence. People sometimes confuse the notions of disjointness and independence, but these are actually quite different. Two events $A$ and $B$ are disjoint if $A \cap B = \emptyset$, which means that if $A$ happens then $B$ definitely does not happen. They are independent if $\Pr[A \cap B] = \Pr[A] \Pr[B]$, which means that knowing that $A$ happens gives us no information about whether $B$ happened or not. If $A$ and $B$ have nonzero probability, then being disjoint implies that they are not independent, since in particular it means that they are negatively correlated.
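Returning to the three-coin experiment, the correlation claims above can be verified directly by enumeration (a sketch; the helper name is our own):

```python
# Checking the independence/correlation computations over x ~ {0,1}^3.
# (Illustration only.)
from itertools import product

outcomes = list(product([0, 1], repeat=3))

def pr(pred):
    return sum(1 for x in outcomes if pred(x)) / len(outcomes)

pA = pr(lambda x: x[0] == 1)                       # 1/2
pB = pr(lambda x: sum(x) >= 2)                     # 1/2
pAB = pr(lambda x: x[0] == 1 and sum(x) >= 2)      # 3/8 > pA * pB
pC = pr(lambda x: x[1] == 1)                       # 1/2
pAC = pr(lambda x: x[0] == 1 and x[1] == 1)        # 1/4 = pA * pC
```

The strict inequality `pAB > pA * pB` exhibits the positive correlation, while `pAC == pA * pC` exhibits independence.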

Conditional probability: If $A$ and $B$ are events, and $A$ happens with nonzero probability, then we define the probability that $B$ happens conditioned on $A$ to be $\Pr[B|A] = \Pr[A \cap B]/\Pr[A]$. This corresponds to calculating the probability that $B$ happens if we already know that $A$ happened. Note that $A$ and $B$ are independent if and only if $\Pr[B|A] = \Pr[B]$.

More than two events: We can generalize this definition to more than two events. We say that events $A_1, \ldots, A_k$ are mutually independent if knowing that any set of them occurred or didn't occur does not change the probability that an event outside the set occurs. Formally, the condition is that for every subset $I \subseteq [k]$,
$$\Pr\left[\wedge_{i \in I} A_i\right] = \prod_{i \in I} \Pr[A_i] .$$
For example, if $x \sim \{0,1\}^3$, then the events $\{x_0 = 1\}$, $\{x_1 = 1\}$ and $\{x_2 = 1\}$ are mutually independent. On the other hand, the events $\{x_0 = 1\}$, $\{x_1 = 1\}$ and $\{x_0 + x_1 = 0 \bmod 2\}$ are not mutually independent, even though every pair of these events is independent (can you see why? see also Fig. 5).

(Figure 5: Consider the sample space $\{0,1\}^3$ and the events $A, B, C, D, E$ corresponding to $A$: $x_0 = 1$, $B$: $x_1 = 1$, $C$: $x_0 + x_1 + x_2 \ge 2$, $D$: $x_0 + x_1 + x_2 = 0 \bmod 2$, and $E$: $x_0 + x_1 = 0 \bmod 2$. We can see that $A$ and $B$ are independent, and $C$ is positively correlated with $A$ and positively correlated with $B$. While every pair out of $A, B, E$ is independent, the three events $A, B, E$ are not mutually independent, since their intersection has probability $\frac{2}{8} = \frac{1}{4}$ instead of $\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{8}$.)

0.4.1 Independent random variables
We say that two random variables $X : \{0,1\}^n \to \mathbb{R}$ and $Y : \{0,1\}^n \to \mathbb{R}$ are independent if for every $u, v \in \mathbb{R}$, the events $\{X = u\}$ and $\{Y = v\}$ are independent. (We use $\{X = u\}$ as shorthand for $\{x \mid X(x) = u\}$.) In other words, $X$ and $Y$ are independent if $\Pr[X = u \wedge Y = v] = \Pr[X = u] \Pr[Y = v]$ for every $u, v \in \mathbb{R}$. For example, if two random variables depend on the result of tossing different coins then they are independent:

Lemma 0.9 Suppose that $S = \{s_0, \ldots, s_{k-1}\}$ and $T = \{t_0, \ldots, t_{m-1}\}$ are disjoint subsets of $\{0, \ldots, n-1\}$ and let $X, Y : \{0,1\}^n \to \mathbb{R}$ be random variables such that $X = F(x_{s_0}, \ldots, x_{s_{k-1}})$ and $Y = G(x_{t_0}, \ldots, x_{t_{m-1}})$ for some functions $F : \{0,1\}^k \to \mathbb{R}$ and $G : \{0,1\}^m \to \mathbb{R}$. Then $X$ and $Y$ are independent.

P The notation in the lemma's statement is a bit cumbersome, but at the end of the day, it simply says that if $X$ and $Y$ are random variables that depend on two disjoint sets $S$ and $T$ of coins (for example, $X$ might be the sum of the first $n/2$ coins, and $Y$ might be the largest consecutive stretch of zeroes in the second $n/2$ coins), then they are independent.

Proof of Lemma 0.9. Let $a, b \in \mathbb{R}$, and let $A = \{x \in \{0,1\}^k : F(x) = a\}$ and $B = \{x \in \{0,1\}^m : G(x) = b\}$. Since $S$ and $T$ are disjoint, we can reorder the indices so that $S = \{0, \ldots, k-1\}$ and $T = \{k, \ldots, k+m-1\}$ without affecting any of the probabilities. Hence we can write $\Pr[X = a \wedge Y = b] = |C|/2^n$ where $C = \{x_0, \ldots, x_{n-1} : (x_0, \ldots, x_{k-1}) \in A \wedge (x_k, \ldots, x_{k+m-1}) \in B\}$. Another way to write this using string concatenation is that $C = \{xyz : x \in A, y \in B, z \in \{0,1\}^{n-k-m}\}$, and hence $|C| = |A||B|2^{n-k-m}$, which means that
$$\frac{|C|}{2^n} = \frac{|A|}{2^k} \frac{|B|}{2^m} \frac{2^{n-k-m}}{2^{n-k-m}} = \Pr[X = a] \Pr[Y = b] . \quad ■$$
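Lemma 0.9 can be checked exhaustively for a small case, say $n = 4$ with $X$ depending on the first two coins and $Y$ on the last two (a sketch; the particular choices of $F$ and $G$ are our own):

```python
# Exhaustive check of Lemma 0.9 for n = 4: X is a function of coins
# {0, 1} and Y of coins {2, 3}, so they are independent. (Illustration only.)
from itertools import product

outcomes = list(product([0, 1], repeat=4))
X = lambda x: x[0] + x[1]        # F applied to the first two coins
Y = lambda x: max(x[2], x[3])    # G applied to the last two coins

def pr(pred):
    return sum(1 for x in outcomes if pred(x)) / len(outcomes)

independent = all(
    pr(lambda x: X(x) == u and Y(x) == v)
    == pr(lambda x: X(x) == u) * pr(lambda x: Y(x) == v)
    for u in {0, 1, 2} for v in {0, 1}
)
```

Every joint probability factors exactly, so `independent` is `True`.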

Note that if $X$ and $Y$ are independent random variables then (if we let $S_X, S_Y$ denote all the numbers that have positive probability of being the output of $X$ and $Y$, respectively) it holds that:

$$\mathbb{E}[XY] = \sum_{a \in S_X, b \in S_Y} \Pr[X = a \wedge Y = b] \cdot ab \overset{(1)}{=} \sum_{a \in S_X, b \in S_Y} \Pr[X = a] \Pr[Y = b] \cdot ab \overset{(2)}{=} \left( \sum_{a \in S_X} \Pr[X = a] \cdot a \right) \left( \sum_{b \in S_Y} \Pr[Y = b] \cdot b \right) \overset{(3)}{=} \mathbb{E}[X] \, \mathbb{E}[Y]$$

where the first equality $(1)$ follows from the independence of $X$ and $Y$, the second equality $(2)$ follows by "opening the parentheses" of the righthand side, and the third equality $(3)$ follows from the definition of expectation. (This is not an "if and only if"; see Exercise 0.8.)

Another useful fact is that if $X$ and $Y$ are independent random variables, then so are $F(X)$ and $G(Y)$ for all functions $F, G : \mathbb{R} \to \mathbb{R}$. This is intuitively true since learning $F(X)$ can only provide us with less information than does learning $X$ itself. Hence, if learning $X$ does not teach us anything about $Y$ (and so also about $F(Y)$) then neither will learning $F(X)$. Indeed, to prove this we can write for every $a, b \in \mathbb{R}$:
$$\Pr[F(X) = a \wedge G(Y) = b] = \sum_{\substack{x \text{ s.t. } F(x) = a,\\ y \text{ s.t. } G(y) = b}} \Pr[X = x \wedge Y = y] = \sum_{\substack{x \text{ s.t. } F(x) = a,\\ y \text{ s.t. } G(y) = b}} \Pr[X = x] \Pr[Y = y] = \left( \sum_{x \text{ s.t. } F(x) = a} \Pr[X = x] \right) \cdot \left( \sum_{y \text{ s.t. } G(y) = b} \Pr[Y = y] \right) = \Pr[F(X) = a] \Pr[G(Y) = b] .$$

0.4.2 Collections of independent random variables.
We can extend the notions of independence to more than two random variables: we say that the random variables $X_0, \ldots, X_{n-1}$ are mutually independent if for every $a_0, \ldots, a_{n-1} \in \mathbb{R}$,

$$\Pr[X_0 = a_0 \wedge \cdots \wedge X_{n-1} = a_{n-1}] = \Pr[X_0 = a_0] \cdots \Pr[X_{n-1} = a_{n-1}] .$$
And similarly, we have that

Lemma 0.10 — Expectation of product of independent random variables. If $X_0, \ldots, X_{n-1}$ are mutually independent then

$$\mathbb{E}\left[\prod_{i=0}^{n-1} X_i\right] = \prod_{i=0}^{n-1} \mathbb{E}[X_i] .$$

Lemma 0.11 — Functions preserve independence. If $X_0, \ldots, X_{n-1}$ are mutually independent, and $Y_0, \ldots, Y_{n-1}$ are defined as $Y_i = F_i(X_i)$ for some functions $F_0, \ldots, F_{n-1} : \mathbb{R} \to \mathbb{R}$, then $Y_0, \ldots, Y_{n-1}$ are mutually independent as well.

P We leave proving Lemma 0.10 and Lemma 0.11 as Exercise 0.9 and Exercise 0.10. It is a good idea for you to stop now and do these exercises to make sure you are comfortable with the notion of independence, as we will use it heavily later on in this course.
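A quick spot-check of Lemma 0.10 for three independent coins (a sketch; the particular random variables are our own choice):

```python
# For X_i(x) = x_i + 1, which depend on distinct coins and hence are
# mutually independent, E[prod X_i] should equal prod E[X_i].
# (Illustration only.)
from itertools import product

outcomes = list(product([0, 1], repeat=3))

def E(f):
    """E[f] over x ~ {0,1}^3."""
    return sum(f(x) for x in outcomes) / len(outcomes)

lhs = E(lambda x: (x[0] + 1) * (x[1] + 1) * (x[2] + 1))
rhs = E(lambda x: x[0] + 1) * E(lambda x: x[1] + 1) * E(lambda x: x[2] + 1)
```

Both sides come out to $(3/2)^3 = 27/8$, as Lemma 0.10 predicts.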

0.5 CONCENTRATION AND TAIL BOUNDS

The name "expectation" is somewhat misleading. For example, suppose that you and I place a bet on the outcome of 10 coin tosses, where if they all come out to be $1$'s then I pay you 100,000 dollars and otherwise you pay me 10 dollars. If we let $X : \{0,1\}^{10} \to \mathbb{R}$ be the random variable denoting your gain, then we see that

But we don’t really “expect”−10 the result of−10 this experiment to be for you to gain 90 dollars.피[푋] = 2 Rather,⋅ 100000 99.9% − of (1 the − 2 time)10 you ∼ will 90. pay me 10 dollars, and you will hit the jackpot 0.01% of the times. However, if we repeat this experiment again and again (with fresh and hence independent coins), then in the long run we do expect your average earning to be close to 90 dollars, which is the reason why casinos can make money in a predictable way even though every individual bet is random. For example, if we toss independent and unbiased coins, then as grows, the number of coins that come up ones will be more and more concentrated around 푛 according to the famous “bell curve” (see푛Fig. 6). Much of probability theory is concerned with푛/2 so called concentration or tail bounds, which are upper bounds on the probability that a ran- dom variable deviates too much from its expectation. The first and simplest one of them is Markov’s inequality: 푋 Theorem 0.12 — Markov’s inequality. If is a non-negative random variable then Pr . 푋 [푋 ≥ 푘 피[푋]] ≤ 1/푘 Figure 6: The probabilities that we obtain a particular sum when we toss coins P converge quickly to the Gaussian/normal distribution. Markov’s Inequality is actually a very natural state- 푛 = 10, 20, 100, 1000 ment (see also Fig. 7). For example, if you know that the average (not the median!) household income in the US is 70,000 dollars, then in particular you can deduce that at most 25 percent of households make more than 280,000 dollars, since otherwise, even if the remaining 75 percent had zero income, the top 25 percent alone would cause the average income to be larger than 70,000 dollars. From this example you can already see that in many situations, Markov’s inequality will not be tight and the probability of devi- ating from expectation will be much smaller: see the Chebyshev and Chernoff inequalities below. 38 an intensive introduction to cryptography

Proof of Theorem 0.12. Let $\mu = \mathbb{E}[X]$ and define $Y = 1_{X \ge k\mu}$. That is, $Y(x) = 1$ if $X(x) \ge k\mu$ and $Y(x) = 0$ otherwise. Note that by definition, for every $x$, $Y(x) \le X(x)/(k\mu)$. We need to show $\mathbb{E}[Y] \le 1/k$. But this follows since $\mathbb{E}[Y] \le \mathbb{E}[X/(k\mu)] = \mathbb{E}[X]/(k\mu) = \mu/(k\mu) = 1/k$. ■

(Figure 7: Markov's Inequality tells us that a non-negative random variable $X$ cannot be much larger than its expectation, with high probability. For example, if the expectation of $X$ is $\mu$, then the probability that $X > 4\mu$ must be at most $1/4$, as otherwise just the contribution from this part of the sample space will be too large.)

Going beyond Markov's Inequality: Markov's inequality says that a (non-negative) random variable $X$ can't go too crazy and be, say, a million times its expectation, with significant probability. But ideally we would like to say that with high probability, $X$ should be very close to its expectation, e.g., in the range $[0.99\mu, 1.01\mu]$ where $\mu = \mathbb{E}[X]$. This is not generally true, but does turn out to hold when $X$ is obtained by combining (e.g., adding) many independent random variables. This phenomenon, variants of which are known as "law of large numbers", "central limit theorem", "invariance principles" and "Chernoff bounds", is one of the most fundamental in probability and statistics, and is one that we heavily use in computer science as well.

0.5.1 Chebyshev's Inequality
A standard way to measure the deviation of a random variable from its expectation is by using its standard deviation. For a random variable $X$, we define the variance of $X$ as $\mathrm{Var}[X] = \mathbb{E}[(X - \mu)^2]$ where $\mu = \mathbb{E}[X]$; i.e., the variance is the average squared distance of $X$ from its expectation. The standard deviation of $X$ is defined as $\sigma[X] = \sqrt{\mathrm{Var}[X]}$. (This is well-defined since the variance, being an average of a square, is always a non-negative number.)

Using Chebyshev's inequality, we can control the probability that a random variable is too many standard deviations away from its expectation.

Theorem 0.13 — Chebyshev's inequality. Suppose that $\mu = \mathbb{E}[X]$ and $\sigma^2 = \mathrm{Var}[X]$. Then for every $k > 0$, $\Pr[|X - \mu| \ge k\sigma] \le 1/k^2$.

Proof. The proof follows from Markov's inequality. We define the random variable $Y = (X - \mu)^2$. Then $\mathbb{E}[Y] = \mathrm{Var}[X] = \sigma^2$, and hence by Markov the probability that $Y > k^2\sigma^2$ is at most $1/k^2$. But clearly $(X - \mu)^2 \ge k^2\sigma^2$ if and only if $|X - \mu| \ge k\sigma$. ■

One example of how to use Chebyshev's inequality is the setting when $X = X_1 + \cdots + X_n$ where the $X_i$'s are independent and identically distributed (i.i.d for short) variables with values in $[0,1]$ where each has expectation $1/2$. Since $\mathbb{E}[X] = \sum_i \mathbb{E}[X_i] = n/2$, we would like to say that $X$ is very likely to be in, say, the interval $[0.499n, 0.501n]$. Using Markov's inequality directly will not help us, since it will only tell

us that $X$ is very likely to be at most $100n$ (which we already knew, since it always lies between $0$ and $n$). However, since $X_1, \ldots, X_n$ are independent,

$$\mathrm{Var}[X_1 + \cdots + X_n] = \mathrm{Var}[X_1] + \cdots + \mathrm{Var}[X_n]. \tag{1}$$

(We leave showing this to the reader as Exercise 0.11.) For every random variable $X_i$ with values in $[0,1]$, $\mathrm{Var}[X_i] \leq 1$ (if the variable is always in $[0,1]$, it can't be more than $1$ away from its expectation), and hence (1) implies that $\mathrm{Var}[X] \leq n$ and hence $\sigma[X] \leq \sqrt{n}$. For large $n$, $\sqrt{n} \ll 0.001n$, and in particular if $\sqrt{n} \leq 0.001n/k$, we can use Chebyshev's inequality to bound the probability that $X$ is not in $[0.499n, 0.501n]$ by $1/k^2$.

0.5.2 The Chernoff bound

Chebyshev's inequality already shows a connection between independence and concentration, but in many cases we can hope for a quantitatively much stronger result. If, as in the example above, $X = X_1 + \cdots + X_n$ where the $X_i$'s are bounded i.i.d. random variables of mean $1/2$, then as $n$ grows, the distribution of $X$ would be roughly the normal or Gaussian distribution, that is, distributed according to the bell curve (see Fig. 6 and Fig. 8). This distribution has the property of being very concentrated, in the sense that the probability of deviating $k$ standard deviations from the mean is not merely $1/k^2$ as is guaranteed by Chebyshev, but rather is roughly $e^{-k^2}$. Specifically, for a normal random variable $X$ of expectation $\mu$ and standard deviation $\sigma$, the probability that $|X - \mu| \geq k\sigma$ is at most $2e^{-k^2/2}$. That is, we have an exponential decay of the probability of deviation. The following extremely useful theorem shows that such exponential decay occurs every time we have a sum of independent and bounded variables. This theorem is known under many names in different communities, though it is mostly called the Chernoff bound in the computer science literature:

Theorem 0.14 — Chernoff/Hoeffding bound. If $X_0, \ldots, X_{n-1}$ are i.i.d. random variables such that $X_i \in [0,1]$ and $\mathbb{E}[X_i] = p$ for every $i$, then for every $\epsilon > 0$,

$$\Pr\left[\left|\sum_{i=0}^{n-1} X_i - pn\right| > \epsilon n\right] \leq 2 \cdot e^{-2\epsilon^2 n}.$$

Figure 8: In the normal distribution or the Bell curve, the probability of deviating $k$ standard deviations from the expectation shrinks exponentially in $k^2$; specifically, with probability at least $1 - 2e^{-k^2/2}$, a random variable $X$ of expectation $\mu$ and standard deviation $\sigma$ satisfies $\mu - k\sigma \leq X \leq \mu + k\sigma$. This figure gives more precise bounds for $k = 1, 2, 3, 4, 5, 6$. (Image credit: Imran Baghirov)

We omit the proof, which appears in many texts, and uses Markov's inequality on i.i.d. random variables $Y_0, \ldots, Y_n$ that are of the form $Y_i = e^{\lambda X_i}$ for some carefully chosen parameter $\lambda$. See Exercise 0.14 for a proof of the simple (but highly useful and representative) case where each $X_i$ is $\{0,1\}$ valued and $p = 1/2$. (See also Exercise 0.15 for a generalization.)
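To get a feel for the quantitative gap between Theorem 0.13 and Theorem 0.14, the following sketch evaluates both bounds for the event that a sum of $n$ fair $\{0,1\}$ coins deviates from $n/2$ by $\epsilon n$. The specific values of $n$ and $\epsilon$ below are arbitrary illustrative choices.

```python
import math

# Comparing the Chebyshev and Chernoff bounds on
# Pr[|sum X_i - n/2| > eps*n] for n fair {0,1} coins
# (so p = 1/2 and sigma[X] = sqrt(n)/2). Parameters are illustrative.
def chebyshev_bound(n: int, eps: float) -> float:
    sigma = math.sqrt(n) / 2
    k = eps * n / sigma          # deviation measured in standard deviations
    return 1 / k ** 2            # Theorem 0.13: polynomial decay

def chernoff_bound(n: int, eps: float) -> float:
    return 2 * math.exp(-2 * eps ** 2 * n)   # Theorem 0.14: exponential decay

n, eps = 10 ** 7, 0.001
print(chebyshev_bound(n, eps))   # 1/(4*eps^2*n) = 0.025
print(chernoff_bound(n, eps))    # 2*e^{-20}, roughly 4e-9
```

With these parameters Chebyshev only guarantees a deviation probability below $2.5\%$, while Chernoff pushes it down to about $4 \cdot 10^{-9}$; this difference is what makes Chernoff-style bounds so useful in cryptographic probability arguments.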

0.6 EXERCISES

Exercise 0.1 Prove that for every finite $S, T$, there are $(|T| + 1)^{|S|}$ partial functions from $S$ to $T$. ■

Exercise 0.2 — $O$-notation. For every pair of functions $F, G$ below, determine which of the following relations holds: $F = O(G)$, $F = \Omega(G)$, $F = o(G)$, or $F = \omega(G)$.

a. $F(n) = n$, $G(n) = 100n$.
b. $F(n) = n \log n$, $G(n) = n^2$.
c. $F(n) = n^{\log(n)}$, $G(n) = 2^{\sqrt{n}}$.
d. $F(n) = \sqrt{n}$, $G(n) = 2^{\sqrt{\log(n)}}$.
e. $F(n) = \binom{n}{\lceil 0.2n \rceil}$, $G(n) = 2^{0.1n}$ (where $\binom{n}{k}$ is the number of $k$-sized subsets of a set of size $n$). See footnote for hint.²

Footnote 2: One way to do this is to use Stirling's approximation for the factorial function. ■

Exercise 0.3 Give an example of a pair of functions $F, G : \mathbb{N} \to \mathbb{N}$ such that neither $F = O(G)$ nor $G = O(F)$ holds. ■

Exercise 0.4 — Properties of expectation and variance. In the following exercise $X, Y$ denote random variables over some sample space $S$. You can assume that the probability on $S$ is the uniform distribution: every point $s$ is output with probability $1/|S|$. Thus $\mathbb{E}[X] = (1/|S|)\sum_{s \in S} X(s)$. We define the variance and standard deviation of $X$ and $Y$ as above (e.g., $\mathrm{Var}[X] = \mathbb{E}[(X - \mathbb{E}[X])^2]$ and the standard deviation is the square root of the variance). You can reuse your answers to prior questions in the later ones.

1. Prove that $\mathrm{Var}[X]$ is always non-negative.

2. Prove that $\mathrm{Var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2$.

3. Prove that $\mathbb{E}[X^2] \geq \mathbb{E}[X]^2$ always.

4. Give an example for a random variable $X$ such that $\mathbb{E}[X^2] > \mathbb{E}[X]^2$.

5. Give an example for a random variable $X$ such that its standard deviation is not equal to $\mathbb{E}[|X - \mathbb{E}[X]|]$.

6. Give an example for a random variable $X$ such that its standard deviation is equal to $\mathbb{E}[|X - \mathbb{E}[X]|]$.

7. Give an example for two random variables $X, Y$ such that $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$.

8. Give an example for two random variables $X, Y$ such that $\mathbb{E}[XY] \neq \mathbb{E}[X]\mathbb{E}[Y]$.

9. Prove that if $X$ and $Y$ are independent random variables (i.e., for every $x, y$, $\Pr[X = x \wedge Y = y] = \Pr[X = x]\Pr[Y = y]$) then $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$ and $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$. ■

Exercise 0.5 — Random hash function. Suppose that $H$ is chosen to be a random function mapping the numbers $\{1, \ldots, n\}$ to the numbers $\{1, \ldots, m\}$. That is, for every $i \in \{1, \ldots, n\}$, $H(i)$ is chosen to be a random number in $\{1, \ldots, m\}$ and that choice is done independently for every $i$. For every $i < j \in \{1, \ldots, n\}$, define the random variable $X_{i,j}$ to equal $1$ if there was a collision between $H(i)$ and $H(j)$ in the sense that $H(i) = H(j)$, and to equal $0$ otherwise.

1. For every $i < j$, compute $\mathbb{E}[X_{i,j}]$.

2. Define $Y = \sum_{i < j} X_{i,j}$ to be the total number of collisions. Compute $\mathbb{E}[Y]$ as a function of $n$ and $m$. In particular your answer should imply that if $m < n^2/1000$ then $\mathbb{E}[Y] > 1$, and hence in expectation there should be at least one collision and so the function $H$ will not be one to one.

3. Prove that if $m > 1000 \cdot n^2$ then the probability that $H$ is one to one is at least $0.9$.

4. Give an example of a random variable $Z$ (unrelated to the function $H$) that is always equal to a non-negative integer, and such that $\mathbb{E}[Z] \geq 1000$ but $\Pr[Z > 0] \leq 0.001$.

5. Prove that if $m < n^2/1000$ then the probability that $H$ is one to one is at most $0.1$.
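Exercise 0.5 can be explored with a quick simulation before proving anything: each pair $i < j$ collides with probability $1/m$, so linearity of expectation suggests $\mathbb{E}[Y] = \binom{n}{2}/m$. The sketch below checks this empirically; the parameters $n$, $m$ and the trial count are arbitrary illustrative choices.

```python
import itertools
import random

# Simulating Exercise 0.5: H maps {1,...,n} to a uniform element of
# {1,...,m}, independently for each input; Y counts colliding pairs.
# Linearity of expectation gives E[Y] = (n choose 2)/m.
# Parameters below are arbitrary illustrative choices.
random.seed(1)
n, m, trials = 40, 1000, 5_000

def num_collisions() -> int:
    H = [random.randint(1, m) for _ in range(n)]
    return sum(1 for i, j in itertools.combinations(range(n), 2) if H[i] == H[j])

avg = sum(num_collisions() for _ in range(trials)) / trials
expected = n * (n - 1) / 2 / m   # = 780/1000 = 0.78 collisions on average
print(avg, expected)
```

This is of course the "birthday paradox" calculation: collisions become likely as soon as $n \approx \sqrt{m}$, which is why the exercise contrasts the regimes $m > 1000n^2$ and $m < n^2/1000$.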

0.7 EXERCISES

Exercise 0.6 Suppose that we toss three independent fair coins $a, b, c \in \{0,1\}$. What is the probability that the XOR of $a$, $b$, and $c$ is equal to $1$? What is the probability that the AND of these three values is equal to $1$? Are these two events independent? ■

Exercise 0.7 Give an example of random variables $X, Y : \{0,1\}^3 \to \mathbb{R}$ such that $\mathbb{E}[XY] \neq \mathbb{E}[X]\mathbb{E}[Y]$. ■

Exercise 0.8 Give an example of random variables $X, Y : \{0,1\}^3 \to \mathbb{R}$ such that $X$ and $Y$ are not independent but $\mathbb{E}[XY] = \mathbb{E}[X]\mathbb{E}[Y]$. ■

Exercise 0.9 — Product of expectations. Prove Lemma 0.10. ■

Exercise 0.10 — Transformations preserve independence. Prove Lemma 0.11. ■

Exercise 0.11 — Variance of independent random variables. Prove that if $X_0, \ldots, X_{n-1}$ are independent random variables then $\mathrm{Var}[X_0 + \cdots + X_{n-1}] = \sum_{i=0}^{n-1} \mathrm{Var}[X_i]$. ■

Exercise 0.12 — Entropy (challenge). Recall the definition of a distribution $\mu$ over some finite set $S$. Shannon defined the entropy of a distribution $\mu$, denoted by $H(\mu)$, to be $\sum_{x \in S} \mu(x)\log(1/\mu(x))$. The idea is that if $\mu$ is a distribution of entropy $k$, then encoding members of $\mu$ will require $k$ bits, in an amortized sense. In this exercise we justify this definition. Let $\mu$ be such that $H(\mu) = k$.

1. Prove that for every one to one function $F : S \to \{0,1\}^*$, $\mathbb{E}_{x \sim \mu}|F(x)| \geq k$.

2. Prove that for every $\epsilon > 0$, there is some $n$ and a one-to-one function $F : S^n \to \{0,1\}^*$, such that $\mathbb{E}_{x \sim \mu^n}|F(x)| \leq n(k + \epsilon)$, where $x \sim \mu^n$ denotes the experiment of choosing $x_0, \ldots, x_{n-1}$ each independently from $S$ using the distribution $\mu$. ■

Exercise 0.13 — Entropy approximation to binomial. Let $H(p) = p\log(1/p) + (1-p)\log(1/(1-p))$.³ Prove that for every $p \in (0,1)$ and $\epsilon > 0$, if $n$ is large enough then⁴

$$2^{(H(p)-\epsilon)n} \leq \binom{n}{pn} \leq 2^{(H(p)+\epsilon)n},$$

where $\binom{n}{k}$ is the binomial coefficient $\frac{n!}{k!(n-k)!}$, which is equal to the number of $k$-size subsets of $\{0, \ldots, n-1\}$. ■

Footnote 3: While you don't need this to solve this exercise, this is the function that maps $p$ to the entropy (as defined in Exercise 0.12) of the $p$-biased coin distribution over $\{0,1\}$, which is the function $\mu : \{0,1\} \to [0,1]$ s.t. $\mu(0) = 1-p$ and $\mu(1) = p$.

Footnote 4: Hint: Use Stirling's formula for approximating the factorial function.

Exercise 0.14 — Chernoff using Stirling.

1. Prove that $\Pr_{x \sim \{0,1\}^n}[\sum x_i = k] = \binom{n}{k}2^{-n}$.

2. Use this and Exercise 0.13 to prove (an approximate version of) the Chernoff bound for the case that $X_0, \ldots, X_{n-1}$ are i.i.d. random variables over $\{0,1\}$, each equaling $0$ and $1$ with probability $1/2$. That is, prove that for every $\epsilon > 0$, and $X_0, \ldots, X_{n-1}$ as above, $\Pr[|\sum_{i=0}^{n-1} X_i - n/2| > \epsilon n] < 2^{-0.1\epsilon^2 n}$. ■
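The approximation in Exercise 0.13 is easy to check numerically. The sketch below (with an arbitrary choice of $p$) compares $\log_2\binom{n}{pn}/n$ against $H(p)$, using the log-gamma function to avoid computing enormous factorials directly.

```python
import math

# Numerical check of Exercise 0.13: log2(binom(n, pn))/n approaches
# H(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)) as n grows.
# The choice p = 0.2 is arbitrary.
def H(p: float) -> float:
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def log2_binom(n: int, k: int) -> float:
    # log2 of n!/(k!(n-k)!) via lgamma, avoiding giant integers
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

p = 0.2
for n in [100, 1000, 10000]:
    approx = log2_binom(n, int(p * n)) / n
    print(n, approx, H(p))   # approx converges to H(p) from below
```

The convergence is from below because $\binom{n}{pn} = 2^{nH(p)}/\Theta(\sqrt{n})$, and the $\sqrt{n}$ correction term is exactly what the Stirling-based proof of the exercise has to control.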

Exercise 0.15 — Poor man's Chernoff. Exercise 0.14 establishes the Chernoff bound for the case that $X_0, \ldots, X_{n-1}$ are i.i.d. variables over $\{0,1\}$ with expectation $1/2$. In this exercise we use a slightly different method (bounding the moments of the random variables) to establish a version of Chernoff where the random variables range over $[0,1]$ and their expectation is some number $p \in [0,1]$ that may be different than $1/2$. Let $X_0, \ldots, X_{n-1}$ be i.i.d. random variables with $\mathbb{E}[X_i] = p$ and $\Pr[0 \leq X_i \leq 1] = 1$. Define $Y_i = X_i - p$.

1. Prove that for every $j_0, \ldots, j_{n-1} \in \mathbb{N}$, if there exists one $i$ such that $j_i$ is odd then $\mathbb{E}[\prod_{i=0}^{n-1} Y_i^{j_i}] = 0$.

2. Prove that for every $k$, $\mathbb{E}[(\sum_{i=0}^{n-1} Y_i)^k] \leq (10kn)^{k/2}$.⁵

3. Prove that for every $\epsilon > 0$, $\Pr[|\sum_i Y_i| \geq \epsilon n] \leq 2^{-\epsilon^2 n/(10000 \log(1/\epsilon))}$.⁶

Footnote 5: Hint: Bound the number of tuples $j_0, \ldots, j_{n-1}$ such that every $j_i$ is even and $\sum j_i = k$ using the binomial coefficient and the fact that in any such tuple there are at most $k/2$ distinct indices.

Footnote 6: Hint: Set $k = 2\lceil \epsilon^2 n/1000 \rceil$ and then show that if the event $|\sum Y_i| \geq \epsilon n$ happens then the random variable $(\sum Y_i)^k$ is a factor of $\epsilon^{-k}$ larger than its expectation. ■

Exercise 0.16 — Lower bound for distinguishing coins. The Chernoff bound can be used to show that if you were given a coin of bias at least $\epsilon$, you should only need $O(1/\epsilon^2)$ samples to be able to reject the "null hypothesis" that the coin is completely unbiased with extremely high confidence. In the following somewhat more challenging question, we try to show a converse to this, proving that distinguishing between a fair coin and a coin that outputs "heads" with probability $1/2 + \epsilon$ requires at least $\Omega(1/\epsilon^2)$ samples. Let $P$ be the uniform distribution over $\{0,1\}^n$ and $Q$ be the $1/2+\epsilon$-biased distribution corresponding to tossing $n$ coins in which each one has a probability of $1/2 + \epsilon$ of equaling $1$ and probability $1/2 - \epsilon$ of equaling $0$. Namely the probability of $x \in \{0,1\}^n$ according to $Q$ is equal to $\prod_{i=1}^{n}(1/2 - \epsilon + 2\epsilon x_i)$.

1. Prove that for every threshold $\theta$ between $0$ and $n$, if $n < 1/(100\epsilon)^2$ then the probabilities that $\sum x_i \leq \theta$ under $P$ and $Q$ respectively differ by at most $0.1$. Therefore, one cannot use the test of whether the number of heads is above or below some threshold to reliably distinguish between these two possibilities unless the number of samples $n$ of the coins is at least some constant times $1/\epsilon^2$.

2. Prove that for every function $F$ mapping $\{0,1\}^n$ to $\{0,1\}$, if $n < 1/(100\epsilon)^2$ then the probabilities that $F(x) = 1$ under $P$ and $Q$ respectively differ by at most $0.1$. Therefore, if the number of samples is smaller than a constant times $1/\epsilon^2$ then there is simply no test that can reliably distinguish between these two possibilities. ■

Exercise 0.17 — Simulating distributions using coins. Our model for probability involves tossing $n$ coins, but sometimes algorithms require sampling from other distributions, such as selecting a uniform number in $\{0, \ldots, M-1\}$ for some $M$. Fortunately, we can simulate this with an exponentially small probability of error: prove that for every $M$ and $k$, if $n > k\lceil \log M \rceil$, then there is a function $F : \{0,1\}^n \to \{0, \ldots, M-1\} \cup \{\bot\}$ such that (1) the probability that $F(x) = \bot$ is at most $2^{-k}$ and (2) the distribution of $F(x)$ conditioned on $F(x) \neq \bot$ is equal to the uniform distribution over $\{0, \ldots, M-1\}$.⁷

Footnote 7: Hint: Think of $x \in \{0,1\}^n$ as choosing $k$ numbers $y_1, \ldots, y_k \in \{0, \ldots, 2^{\lceil \log M \rceil} - 1\}$. Output the first such number that is in $\{0, \ldots, M-1\}$. ■

Exercise 0.18 — Sampling. Suppose that a country has 300,000,000 citizens, 52 percent of which prefer the color "green" and 48 percent of which prefer the color "orange". Suppose we sample $n$ random citizens and ask them their favorite color (assume they will answer truthfully). What is the smallest value of $n$ among the following choices so that the probability that the majority of the sample answers "orange" is at most $0.05$?

a. 1,000
b. 10,000
c. 100,000
d. 1,000,000

Exercise 0.19 Would the answer to Exercise 0.18 change if the country had 300,000,000,000 citizens? ■

Exercise 0.20 — Sampling (2). Under the same assumptions as Exercise 0.18, what is the smallest value of $n$ among the following choices so that the probability that the majority of the sample answers "orange" is at most $2^{-100}$?

a. 1,000
b. 10,000
c. 100,000
d. 1,000,000
e. It is impossible to get such a low probability since there are fewer than $2^{100}$ citizens. ■

1 Introduction

Additional reading: Chapters 1 and 2 of the Katz-Lindell book. Sections 2.1 (Introduction) and 2.2 (Shannon ciphers and perfect security) in the Boneh-Shoup book.¹

Footnote 1: Referring to a book such as Katz-Lindell or Boneh-Shoup can be useful during this course to supplement these notes with additional discussions, extensions, details, practical applications, or references. In particular, in the current state of these lecture notes, almost all references and credits are omitted unless the name has become standard in the literature, or I believe that the story of some discovery can serve a pedagogical point. See the Katz-Lindell book for historical notes and references. This lecture shares a lot of text with (though is not identical to) my lecture on cryptography in the introduction to theoretical computer science lecture notes.

Ever since people started to communicate, there were some messages that they wanted kept secret. Thus cryptography has an old though arguably undistinguished history. For a long time cryptography shared similar features with alchemy as a domain in which many otherwise smart people would be drawn into making fatal mistakes. Indeed, the history of cryptography is littered with the figurative corpses of cryptosystems believed secure and then broken, and sometimes with the actual corpses of those who have mistakenly placed their faith in these cryptosystems. The definitive text on the history of cryptography is David Kahn's "The Codebreakers", whose title already hints at the ultimate fate of most cryptosystems.² (See also "The Code Book" by Simon Singh.)

Footnote 2: Traditionally, cryptography was the name for the activity of making codes, while cryptanalysis is the name for the activity of breaking them, and cryptology is the name for the union of the two. These days cryptography is often used as the name for the broad science of constructing and analyzing the security of not just encryptions but many schemes and protocols for protecting the confidentiality and integrity of communication and computation.

We recount below just a few stories to get a feel for this field. But before we do so, we should introduce the cast of characters. The basic setting of "encryption" or "secret writing" is the following: one person, whom we will call Alice, wishes to send another person, whom we will call Bob, a secret message. Since Alice and Bob are not in the same room (perhaps because Alice is imprisoned in a castle by her cousin the queen of England), they cannot communicate directly and need to send their message in writing. Alas, there is a third person, whom we will call Eve, that can see their message. Therefore Alice needs to find a way to encode or encrypt the message so that only Bob (and not Eve) will be able to understand it.

1.1 SOME HISTORY

In 1587, Mary the queen of Scots, and heir to the throne of England, wanted to arrange the assassination of her cousin, queen Elisabeth I of England, so that she could ascend to the throne and finally escape


the house arrest under which she had been for the last 18 years. As part of this complicated plot, she sent a coded letter to Sir Anthony Babington.

Mary used what's known as a substitution cipher, where each letter is transformed into a different obscure symbol (see Fig. 1.1). At a first look, such a letter might seem rather inscrutable: a meaningless sequence of strange symbols. However, after some thought, one might recognize that these symbols repeat several times and moreover that different symbols repeat with different frequencies. Now it doesn't take a large leap of faith to assume that perhaps each symbol corresponds to a different letter and the more frequent symbols correspond to letters that occur in the alphabet with higher frequency. From this observation, there is a short gap to completely breaking the cipher, which was in fact done by queen Elisabeth's spies, who used the decoded letters to learn of all the co-conspirators and to convict queen Mary of treason, a crime for which she was executed. Trusting in superficial security measures (such as using "inscrutable" symbols) is a trap that users of cryptography have been falling into again and again over the years. (As in many things, this is the subject of a great XKCD cartoon, see Fig. 1.2.)

Figure 1.1: Snippet from encrypted communication between queen Mary and Sir Babington
Figure 1.2: XKCD's take on the added security of using uncommon symbols

The Vigenère cipher is named after Blaise de Vigenère, who described it in a book in 1586 (though it was invented earlier by Bellaso). The idea is to use a collection of substitution ciphers: if there are $n$ different ciphers then the first letter of the plaintext is encoded with the first cipher, the second with the second cipher, the $n^{th}$ with the $n^{th}$ cipher, and then the $(n+1)^{st}$ letter is again encoded with the first cipher. The key is usually a word or a phrase of $n$ letters, and the $i^{th}$ substitution cipher is obtained by shifting each letter $k_i$ positions in the alphabet. This "flattens" the frequencies and makes it much harder to do frequency analysis, which is why this cipher was considered "unbreakable" for 300+ years and got the nickname "le chiffre indéchiffrable" ("the unbreakable cipher"). Nevertheless, Charles Babbage cracked the Vigenère cipher in 1854 (though he did not publish it). In 1863 Friedrich Kasiski broke the cipher and published the result. The idea is that once you guess the length of the cipher, you can reduce the task to breaking a simple substitution cipher, which can be done via frequency analysis (can you see why?). Confederate generals used Vigenère regularly during the civil war, and their messages were routinely cryptanalyzed by Union officers.

Figure 1.3: Confederate Cipher Disk for implementing the Vigenère cipher
Figure 1.4: Confederate encryption of the message "Gen'l Pemberton: You can expect no help from this side of the river. Let Gen'l Johnston know, if possible, when you can attack the same point on the enemy's lines. Inform me also and I will endeavor to make a diversion. I have sent some caps. I subjoin a despatch from General Johnston."

The Enigma cipher was a mechanical cipher (looking like a typewriter, see Fig. 1.5) where each letter typed would get mapped into a different letter depending on the (rather complicated) key and current state of the machine, which had several rotors that rotated at different paces. An identically wired machine at the other end could be used

to decrypt. Just as with many ciphers in history, this one was also believed by the Germans to be "impossible to break", and even quite late in the war they refused to believe it was broken despite mounting evidence to that effect. (In fact, some German generals refused to believe it was broken even after the war.) Breaking Enigma was a heroic effort which was initiated by the Poles and then completed by the British at Bletchley Park, with Alan Turing (of the Turing machines) playing a key role. As part of this effort the Brits built arguably the world's first large scale mechanical computation devices (though they looked more similar to washing machines than to iPhones). They were also helped along the way by some quirks and errors of the German operators. For example, the fact that their messages ended with "Heil Hitler" turned out to be quite useful.

Here is one entertaining anecdote: the Enigma machine would never map a letter to itself. In March 1941, Mavis Batey, a cryptanalyst at Bletchley Park, received a very long message that she tried to decrypt. She then noticed a curious property: the message did not contain the letter "L".³ She realized that the probability that no "L"s appeared in the message is too small for this to happen by chance. Hence she surmised that the original message must have been composed only of L's. That is, it must have been the case that the operator, perhaps to test the machine, had simply sent out a message where he repeatedly pressed the letter "L". This observation helped her decode the next message, which helped inform of a planned Italian attack and secure a resounding British victory in what became known as "the Battle of Cape Matapan". Mavis also helped break another Enigma machine. Using the information she provided, the Brits were able to feed the Germans the false information that the main allied invasion would take place in Pas de Calais rather than in Normandy. In the words of General Eisenhower, the intelligence from Bletchley Park was of "priceless value". It made a huge difference for the Allied war effort, thereby shortening World War II and saving millions of lives. See also this interview with Sir Harry Hinsley.

Figure 1.5: In the Enigma mechanical cipher the secret key would be the settings of the rotors and internal wires. As the operator types up their message, the encrypted version appeared in the display area above, and the internal state of the cipher was updated (so typing the same letter twice would generally result in two different letters output). Decrypting follows the same process: if the sender and receiver are using the same key then typing the ciphertext would result in the plaintext appearing in the display.

Footnote 3: Here is a nice exercise: compute (up to an order of magnitude) the probability that a 50-letter long message composed of random letters will end up not containing the letter "L".
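Going back to the Vigenère cipher: the scheme described earlier (shift the $i^{th}$ letter by the alphabet position of the $i^{th}$ key letter, cycling through the key) is simple enough to sketch in a few lines. The key and message below are the classic textbook example, not taken from these notes.

```python
# A minimal sketch of the Vigenère cipher described earlier: the i-th
# letter of the message is shifted by the alphabet position of the
# i-th key letter (cycling through the key). Decryption shifts back.
# Key/message are the classic illustrative example.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def vigenere(message: str, key: str, decrypt: bool = False) -> str:
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(message):
        shift = sign * ALPHABET.index(key[i % len(key)])
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)  # LXFOPVEFRNHR
assert vigenere(ciphertext, "LEMON", decrypt=True) == "ATTACKATDAWN"
```

Note how the two A's in "ATTACK" encrypt to different letters (L and O): this is exactly the "flattening" of letter frequencies that defeated naive frequency analysis, until Babbage and Kasiski showed how to recover the key length and undo it.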

1.2 DEFINING ENCRYPTIONS

Many of the troubles that cryptosystem designers faced over history (and still face!) can be attributed to not properly defining or understanding what the goals they want to achieve are in the first place. We now turn to actually defining what an encryption scheme is. Clearly we can encode every message as a string of bits, i.e., an element of $\{0,1\}^\ell$ for some $\ell$. Similarly, we can encode the key as a string of bits as well, i.e., an element of $\{0,1\}^n$ for some $n$. Thus, we can think of an encryption scheme as composed of two functions. The encryption function $E$ maps a secret key $k \in \{0,1\}^n$ and a message (known also as plaintext) $m \in \{0,1\}^\ell$ into a ciphertext $c \in \{0,1\}^L$ for some $L$. We write this as $c = E_k(m)$. The decryption function $D$ does the reverse operation, mapping the secret key $k$ and the ciphertext $c$ back into the plaintext message $m$, which we write as $m = D_k(c)$. The basic equation is that if we use the same key for encryption and decryption, then we should get the same message back. That is, for every $k \in \{0,1\}^n$ and $m \in \{0,1\}^\ell$,

$$m = D_k(E_k(m)).$$

This motivates the following definition, which attempts to capture what it means for an encryption scheme to be valid or "make sense", regardless of whether or not it is secure:

Definition 1.1 — Valid encryption scheme. Let $\ell : \mathbb{N} \to \mathbb{N}$ and $C : \mathbb{N} \to \mathbb{N}$ be two functions mapping natural numbers to natural numbers. A pair of polynomial-time computable functions $(E, D)$ mapping strings to strings is a valid private key encryption scheme (or encryption scheme for short) with plaintext length function $\ell(\cdot)$ and ciphertext length function $C(\cdot)$ if for every $n \in \mathbb{N}$, $k \in \{0,1\}^n$ and $m \in \{0,1\}^{\ell(n)}$, $|E_k(m)| = C(n)$ and

$$D(k, E(k, m)) = m. \tag{1.1}$$

We will often write the first input (i.e., the key) to the encryption and decryption as a subscript, and so can write (1.1) also as $D_k(E_k(m)) = m$.

푘 푘 퐷 (퐸 (푚)) = 푚 Figure 1.6: A private-key encryption scheme is a pair of algorithms such that for every key and plaintext ,

is a ciphertext푛 of length퐸, 퐷 . The encryptionℓ(푛) scheme 푘is ∈valid {0,if 1} for every such ,푥 ∈ {0, 1} . That푐 = is, 퐸 the푘(푚) decryption of an encryption퐶(푛) of is , as long as both encryption and decryption푦 퐷 use푘(푦) the = same 푥 key. 푥 푥

The validity condition implies that for any fixed $k$, the map $m \mapsto E_k(m)$ is one to one (can you see why?) and hence the ciphertext length is always at least the plaintext length. Thus we typically focus on the plaintext length as the quantity to optimize in an encryption scheme. The larger $\ell(n)$ is, the better the scheme, since it means we need a shorter secret key to protect messages of the same length.
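As a concrete toy illustration of Definition 1.1, here is a sketch of perhaps the simplest valid scheme, with $\ell(n) = C(n) = n$: XOR the message with the key. Validity is only a correctness requirement; whether this scheme is actually secure is a separate question.

```python
# A toy example of a *valid* encryption scheme per Definition 1.1,
# with plaintext and ciphertext length both equal to the key length
# (l(n) = C(n) = n): E_k(m) = k XOR m and D_k(c) = k XOR c.
# Validity is about correctness only; it says nothing about security.
def xor(a: str, b: str) -> str:
    assert len(a) == len(b)
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def E(k: str, m: str) -> str:
    return xor(k, m)

def D(k: str, c: str) -> str:
    return xor(k, c)

k, m = "1011", "0110"
c = E(k, m)
print(c)                      # 1101
assert len(c) == len(m)       # |E_k(m)| = C(n) = n
assert D(k, E(k, m)) == m     # the basic equation D_k(E_k(m)) = m
```

Since XOR with a fixed key is its own inverse, the same function serves as both $E_k$ and $D_k$, and the validity equation $D_k(E_k(m)) = m$ holds for every key and message of matching length.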

Remark 1.2 — A note on notation, and comparison with Katz-Lindell, Boneh-Shoup, and other texts. We will always use $i, j, \ell, n$ to denote natural numbers. The number $n$ will often denote the length of our secret key. The length of the key (or another closely related number) is often known as the security parameter in the literature. Katz-Lindell also uses $n$ to denote this parameter, while Boneh-Shoup and Rosulek use $\lambda$ for it. (Some texts also use the Greek letter $\kappa$ for the same parameter.) We chose to denote the security parameter by $n$ so as to correspond with the standard algorithmic notation for input length (as in $O(n)$ or $O(n^2)$ time algorithms). We often use $\ell$ to denote the length of the message, sometimes also known as "block length" since longer messages are simply chopped into "blocks" of length $\ell$ and also appropriately padded. We will use $k$ to denote the secret key, $m$ to denote the secret plaintext message, and $c$ to denote the encrypted ciphertext. Note that $k, m, c$ are not numbers but rather bit strings of lengths $n, \ell(n), C(n)$ respectively. We will also sometimes use $x$ and $y$ to denote strings, and so sometimes use $x$ as the plaintext and $y$ as the ciphertext. In general, while we try to reserve variable names for particular purposes, cryptography uses so many concepts that it would sometimes need to "reuse" the same letter for different purposes.

For simplicity, we denote the space of possible keys as $\{0,1\}^n$ and the space of possible messages as $\{0,1\}^\ell$ for $\ell = \ell(n)$. Boneh-Shoup uses a more general notation of $\mathcal{K}$ for the space of all possible keys and $\mathcal{M}$ for the space of all possible messages. This does not make much difference since we can represent every discrete object such as a key or message as a binary string. (One difference is that in principle the space of all possible messages could include messages of unbounded length, though in such a case what is done in both theory and practice is to break these up into finite-size blocks and encrypt one block at a time.)

1.3 DEFINING SECURITY OF ENCRYPTION

Definition 1.1 says nothing about security and does not rule out trivial "encryption" schemes such as the scheme $E_k(m) = m$ that simply outputs the plaintext as is. Defining security is tricky, and we'll take it one step at a time, but let's start by pondering what is secret and what

is not. A priori we are thinking of an attacker Eve that simply sees the ciphertext $c = E_k(m)$ and does not know anything on how it was generated. So, it does not know the details of $E$ and $D$, and certainly does not know the secret key $k$. However, many of the troubles past cryptosystems went through were caused by them relying on "security through obscurity": trusting that the fact their methods are not known to their enemy will protect them from being broken. This is a faulty assumption: if you reuse a method again and again (even with a different key each time) then eventually your adversaries will figure out what you are doing. And if Alice and Bob meet frequently in a secure location to decide on a new method, they might as well take the opportunity to exchange their secret messages… These considerations led Auguste Kerckhoffs in 1883 to state the following principle:

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.⁴

Footnote 4: The actual quote is "Il faut qu'il n'exige pas le secret, et qu'il puisse sans inconvénient tomber entre les mains de l'ennemi", loosely translated as "The system must not require secrecy and can be stolen by the enemy without causing trouble". According to Steve Bellovin the NSA version is "assume that the first copy of any device we make is shipped to the Kremlin".

Why is it OK to assume the key is secret and not the algorithm? Because we can always choose a fresh key. But of course that won't help us much if our key is "1234" or "passw0rd!". In fact, if you use any deterministic algorithm to choose the key then eventually your adversary will figure this out. Therefore for security we must choose the key at random, and can restate Kerckhoffs's principle as follows:

There is no secrecy without randomness

This is such a crucial point that it is worth repeating:

There is no secrecy without randomness

At the heart of every cryptographic scheme there is a secret key, and the secret key is always chosen at random. A corollary of that is that to understand cryptography, you need to know some probability theory. Fortunately, we don't need much probability: only probability over finite spaces, and basic notions such as expectation, variance, concentration and the union bound suffice for most of what we need. In fact, understanding the following two statements will already get you much of what you need for cryptography:

• For every fixed string $x \in \{0,1\}^n$, if you toss a coin $n$ times, the probability that the heads/tails pattern will be exactly $x$ is $2^{-n}$.

• A probability of $2^{-128}$ is really, really small.

1.3.1 Generating randomness in actual cryptographic systems

How do we actually get random bits in actual systems? The main idea is to use a two stage approach. First we need to get some data that

is unpredictable from the point of view of an attacker on our system. Some sources for this could be measuring latency on the network or hard drives (getting harder with solid state disks), user keyboard and mouse movement patterns (problematic when you need fresh randomness at boot time), clock drift, and more; there are some other sources including audio, video, and network. All of these can be problematic, especially for servers or virtual machines, and so hardware based random number generators based on phenomena such as thermal noise or nuclear decay are becoming more popular. Once we have some data $X$ that is unpredictable, we need to estimate the entropy in it. You can roughly imagine that $X$ has $k$ bits of entropy if the probability that an attacker can guess $X$ is at most $2^{-k}$. People then use a hash function (an object we'll talk about more later) to map $X$ into a string of length $k$ which is then hopefully distributed (close to) uniformly at random. All of this process, and especially understanding the amount of information an attacker may have on the entropy sources, is a bit of a dark art, and indeed a number of attacks on cryptographic systems were actually enabled by weak generation of randomness. Here are a few examples.

One of the first attacks was on the SSL implementation of Netscape (the browser at the time). Netscape used the following "unpredictable" information: the time of day and a process ID, both of which turned out to be quite predictable (who knew attackers have clocks too?). Netscape tried to protect its security through "security through obscurity" by not releasing the source code for their pseudorandom generator, but it was reverse engineered by Ian Goldberg and David Wagner (Ph.D. students at the time), who demonstrated this attack.

In 2006 a programmer removed a line of code from the procedure to generate entropy in the OpenSSL package distributed by Debian, since it caused a warning in some automatic verification code.
As a result, for two years (until this was discovered) all the randomness generated by this procedure used only the process ID as an "unpredictable" source. This means that all communication done by users in that period is fairly easily breakable (and in particular, if some entities recorded that communication they could break it also retroactively). This caused a huge headache and a worldwide regeneration of keys, though it is believed that many of the weak keys are still used. See XKCD's take on that incident (Figure 1.7: XKCD Cartoon: Random number generator).

In 2012 two separate teams of researchers scanned a large number of RSA keys on the web and found out that about 4 percent of them are easy to break. The main issue were devices such as routers, internet-connected printers and such. These devices sometimes run variants of Linux (a desktop operating system) but without a hard

drive, mouse or keyboard, they don't have access to many of the entropy sources that desktops have. Coupled with some good old fashioned ignorance of cryptography and software bugs, this led to many keys that are downright trivial to break; see this blog post and this web page for more details.

After the entropy is collected and then "purified" or "extracted" to a uniformly random string that is, say, a few hundred bits long, we often need to "expand" it into a longer string that is also uniform (or at least looks like that for all practical purposes). We will discuss how to go about that in the next lecture. This step has its weaknesses too, and in particular the Snowden documents, combined with observations of Shumow and Ferguson, strongly suggest that the NSA has deliberately inserted a trapdoor in one of the pseudorandom generators published by the National Institute of Standards and Technology (NIST). Fortunately, this generator wasn't widely adopted, but apparently the NSA did pay 10 million dollars to RSA Security so the latter would make this generator their default option in their products.
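The gather-then-extract pipeline described above can be sketched in a few lines of Python. This is only an illustration of the shape of the process: the timing-jitter "entropy source" and the use of SHA-256 as the extraction hash are choices made for this sketch, and this code should never be used in place of an operating system's random number generator.

```python
import hashlib
import time

def gather_entropy_samples():
    # Stand-in entropy source: timing jitter of small computations.
    # Real systems combine many sources (interrupts, disk/network
    # latency, hardware RNGs); this alone is NOT enough for real use.
    samples = []
    for _ in range(1000):
        t0 = time.perf_counter_ns()
        sum(range(100))  # a little work whose duration jitters
        samples.append(time.perf_counter_ns() - t0)
    return b"".join(s.to_bytes(8, "big") for s in samples)

def extract_seed(raw):
    # "Purify" the raw, biased samples into a 32-byte seed by hashing.
    return hashlib.sha256(raw).digest()

seed = extract_seed(gather_entropy_samples())
assert len(seed) == 32
```

The hard part in practice is not this code but the step it glosses over: estimating how many bits of entropy the raw samples actually contain from the attacker's point of view.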

1.4 DEFINING THE SECRECY REQUIREMENT.

Defining the secrecy requirement for an encryption is not simple. Over the course of history, many smart people got it wrong and convinced themselves that ciphers were impossible to break. The first person to truly ask the question in a rigorous way was Claude Shannon in 1945 (though a partial version of his manuscript was only declassified in 1949). Simply by asking this question, he made an enormous contribution to the science of cryptography and practical security. We now will try to examine how one might answer it.

Let me warn you ahead of time that we are going to insist on a mathematically precise definition of security. That means that the definition must capture security in all cases, and the existence of a single counterexample, no matter how "silly", would make us rule out a candidate definition. This exercise of coming up with "silly" counterexamples might seem, well, silly. But in fact it is this method that has led Shannon to formulate his theory of secrecy, which (after much followup work) eventually revolutionized cryptography, and brought this science to a new age where Edgar Allan Poe's maxim no longer holds, and we are able to design ciphers which human (or even non-human) ingenuity cannot break.

The most natural way to attack an encryption is for Eve to guess all possible keys. In many encryption schemes this number is enormous and this attack is completely infeasible. For example, the theoretical number of possibilities in the Enigma cipher was about $10^{113}$, which roughly means that even if we filled the milky way galaxy with computers operating at light speed, the sun would still die out before it

finished examining all the possibilities.5 One can understand why the Germans thought it was impossible to break. (Note that despite the number of possibilities being so enormous, such a key can still be easily specified and shared between Alice and Bob by writing down 113 digits on a piece of paper.) Ray Miller of the NSA had calculated that, in the way the Germans used the machine, the number of possibilities was "only" $10^{23}$, but this is still extremely difficult to pull off even today, and many orders of magnitude above the computational powers during the WW-II era. Thus clearly, it is sometimes possible to break an encryption without trying all possibilities. A corollary is that having a huge number of key combinations does not guarantee security, as an attacker might find a shortcut (as the allies did for Enigma) and recover the key without trying all options.

Since it is possible to recover the key with some tiny probability (e.g. by guessing it at random), perhaps one way to define security of an encryption scheme is that an attacker can never recover the key with probability significantly higher than that. Here is an attempt at such a definition:

5 There are about $10^{68}$ atoms in the galaxy, so even if we assumed that each one of those atoms was a computer that can process say $10^{21}$ decryption attempts per second (as the speed of light is $10^9$ meters per second and the diameter of an atom is about $10^{-12}$ meters), then it would still take $10^{113-89} = 10^{24}$ seconds, which is about $10^{17}$ years, to exhaust all possibilities, while the sun is estimated to burn out in about 5 billion years.

Definition 1.3 — Security of encryption: first attempt. An encryption scheme $(E,D)$ is $n$-secure if no matter what method Eve employs, the probability that she can recover the true key $k$ from the ciphertext $c$ is at most $2^{-n}$.

P When you see a mathematical definition that attempts to model some real-life phenomenon such as security, you should pause and ask yourself:

1. Do I understand mathematically what the definition is stating?

2. Is it a reasonable way to capture the real life phenomenon we are discussing?

One way to answer question 2 is to try to think of both examples of objects that satisfy the definition and examples of objects that violate it, and see if this conforms to your intuition about whether these objects display the phenomenon we are trying to capture. Try to do this for Definition 1.3.

You might wonder if Definition 1.3 is not too strong. After all, how are we going to ever prove that Eve cannot recover the secret key no matter what she does? Edgar Allan Poe would say that there can always be a method that we overlooked. However, in fact this definition is too weak! Consider the following encryption: the secret key $k$ is chosen at random in $\{0,1\}^n$ but our encryption scheme simply ignores it

and lets $E_k(m) = m$ and $D_k(c) = c$. This is a valid encryption since $D_k(E_k(m)) = m$, but it is of course completely insecure, as we are simply outputting the plaintext in the clear. Yet, no matter what Eve does, if she only sees $c$ and not $k$, there is no way she can guess the true value of $k$ with probability better than $2^{-n}$, since it was chosen completely at random and she gets no information about it. Formally, one can prove the following result:

Lemma 1.4 Let $(E,D)$ be the encryption scheme above. For every function $Eve : \{0,1\}^\ell \to \{0,1\}^n$ and for every $m \in \{0,1\}^\ell$, the probability that $Eve(E_k(m)) = k$ is exactly $2^{-n}$.

Proof. This follows because $E_k(m) = m$ and hence $Eve(E_k(m)) = Eve(m)$, which is some fixed value $k' \in \{0,1\}^n$ that is independent of $k$. Hence the probability that $k = k'$ is $2^{-n}$. QED ■

The math behind the above argument is very simple, yet I urge you to read and re-read the last two paragraphs until you are sure that you completely understand why this encryption is in fact secure according to the above definition. This is a "toy example" of the kind of reasoning that we will be employing constantly throughout this course, and you want to make sure that you follow it.

So, Lemma 1.4 is true, but one might question its meaning. Clearly this silly example was not what we meant when stating this definition. However, as mentioned above, we are not willing to ignore even silly examples and must amend the definition to rule them out. One obvious objection is that we don't care about hiding the key; it is the message that we are trying to keep secret. This suggests the next attempt:
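For a small key length, the lemma can be checked by brute force: since $E_k(m) = m$, Eve's output is one fixed string, which matches a uniformly random key for exactly one of the $2^n$ keys. A sketch, where the particular function Eve (flipping every bit) is an arbitrary illustrative choice:

```python
from itertools import product

n = 4  # key length, small enough to enumerate every key

def E(k, m):
    # The "silly" scheme of Lemma 1.4: the key is ignored entirely.
    return m

def Eve(c):
    # An arbitrary attacker strategy: any fixed function of the
    # ciphertext; here we flip every bit (chosen for illustration).
    return tuple(1 - b for b in c)

m = (0, 1, 0, 1)
hits = sum(1 for k in product([0, 1], repeat=n) if Eve(E(k, m)) == k)
# Eve(E_k(m)) = Eve(m) is one fixed n-bit string, so it matches the
# key for exactly one of the 2^n equally likely keys.
assert hits == 1
```

The same count of exactly one hit would come out for any other choice of Eve that outputs an $n$-bit string, which is the content of the lemma.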

Definition 1.5 — Security of encryption: second attempt. An encryption scheme $(E,D)$ is $n$-secure if for every message $m$, no matter what method Eve employs, the probability that she can recover $m$ from the ciphertext $c = E_k(m)$ is at most $2^{-n}$.

Now this seems like it captures our intended meaning. But remember that we are being anal, and truly insist that the definition holds as stated, namely that for every plaintext message $m$ and every function $Eve : \{0,1\}^C \to \{0,1\}^\ell$, the probability over the choice of $k$ that $Eve(E_k(m)) = m$ is at most $2^{-n}$. But now we see that this is clearly impossible. After all, this is supposed to work for every message $m$ and every function $Eve$, but clearly if $m$ is the all-zeroes message $0^\ell$ and $Eve$ is the function that ignores its input and simply outputs $0^\ell$, then it will hold that $Eve(E_k(m)) = m$ with probability one.

So, if before the definition was too weak, the new definition is too strong and is impossible to achieve. The problem is that of course we could guess a fixed message with probability one, so perhaps we could try to consider a definition with a random message. That is:

Definition 1.6 — Security of encryption: third attempt. An encryption scheme $(E,D)$ is $n$-secure if no matter what method Eve employs, if $m$ is chosen at random from $\{0,1\}^\ell$, the probability that she can recover $m$ from the ciphertext $c = E_k(m)$ is at most $2^{-n}$.

This weakened definition can in fact be achieved, but we have again weakened it too much. Consider an encryption that hides the last $\ell/2$ bits of the message, but completely reveals the first $\ell/2$ bits. The probability of guessing a random message is $2^{-\ell/2}$, and so such a scheme would be "$\ell/2$-secure" per Definition 1.6, but this is still a scheme that you would not want to use. The point is that in practice we don't encrypt random messages: our messages might be in English, might have common headers, and might have even more structure based on the context. In fact, it may be that the message is either "Yes" or "No" (or perhaps either "Attack today" or "Attack tomorrow") but we want to make sure Eve doesn't learn which one it is. So, using an encryption scheme that reveals the first half of the message (or frankly even only the first bit) is unacceptable.
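The half-revealing counterexample is easy to make concrete. In this sketch (one possible instantiation of such a scheme, using a one-time pad on the second half only), Eve reads half the plaintext straight off the ciphertext even though the scheme is "secure" per Definition 1.6:

```python
def E(k, m):
    # Reveal the first half of m in the clear, one-time-pad the rest
    # with the (half-length) key k.
    half = len(m) // 2
    return m[:half] + tuple(mi ^ ki for mi, ki in zip(m[half:], k))

k = (1, 0, 1)
m = (1, 1, 0, 0, 1, 0)
c = E(k, m)
# Eve reads the first half of any message directly off the ciphertext:
assert c[:3] == m[:3]
# Yet the scheme is valid: the second half decrypts with the key.
assert tuple(ci ^ ki for ci, ki in zip(c[3:], k)) == m[3:]
```

Guessing a uniformly random 6-bit message here still requires guessing the 3 hidden bits, so Eve's success on random messages is $2^{-3}$, exactly the "$\ell/2$-secure" bound the text describes.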

1.5 PERFECT SECRECY

So far all of our attempts at definitions oscillated between being too strong (and hence impossible) or too weak (and hence not guaranteeing actual security). The key insight of Shannon was that in a secure encryption scheme the ciphertext should not reveal any additional information about the plaintext. So, if for example it was a priori possible for Eve to guess the plaintext with some probability $1/k$ (e.g., because there were only $k$ possibilities for it) then she should not be able to guess it with higher probability after seeing the ciphertext. This can be formalized as follows:

Definition 1.7 — Perfect secrecy. An encryption scheme $(E,D)$ is perfectly secret if for every set $M \subseteq \{0,1\}^\ell$ of plaintexts, and for every strategy used by Eve, if we choose at random $m \in M$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m$ after seeing $E_k(m)$ is at most $1/|M|$.

In particular, if we encrypt either "Yes" or "No" with probability $1/2$, then Eve won't be able to guess which one it is with probability better than half. In fact, that turns out to be the heart of the matter:

Theorem 1.8 — Two to many theorem. An encryption scheme $(E,D)$ is perfectly secret if and only if for every two distinct plaintexts $\{m_0, m_1\} \subseteq \{0,1\}^\ell$ and every strategy used by Eve, if we choose at random $b \in \{0,1\}$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2$.

Proof. The "only if" direction is obvious: this condition is a special case of the perfect secrecy condition for a set $M$ of size $2$.

The "if" direction is trickier. We will use a proof by contradiction. We need to show that if there is some set $M$ (of size possibly much larger than $2$) and some strategy for Eve to guess (based on the ciphertext) a plaintext chosen from $M$ with probability larger than $1/|M|$, then there is also some set $M'$ of size two and a strategy $Eve'$ for Eve to guess a plaintext chosen from $M'$ with probability larger than $1/2$.

Let's fix the message $m_0$ to be the all zeroes message and pick $m_1$ at random in $M$. Under our assumption, it holds that for random key $k$ and message $m_1 \in M$,

$$\Pr_{k \leftarrow_R \{0,1\}^n,\, m_1 \leftarrow_R M}[Eve(E_k(m_1)) = m_1] > 1/|M|. \qquad (1.2)$$

On the other hand, for every choice of $k$, $m' = Eve(E_k(m_0))$ is a fixed string independent of the choice of $m_1$, and so if we pick $m_1$ at random in $M$, then the probability that $m_1 = m'$ is at most $1/|M|$, or in other words

$$\Pr_{k \leftarrow_R \{0,1\}^n,\, m_1 \leftarrow_R M}[Eve(E_k(m_0)) = m_1] \leq 1/|M|. \qquad (1.3)$$

We can also write (1.2) and (1.3) as

$$\mathbb{E}_{m_1 \leftarrow_R M} \Pr[Eve(E_k(m_1)) = m_1] > 1/|M|$$

and

$$\mathbb{E}_{m_1 \leftarrow_R M} \Pr[Eve(E_k(m_0)) = m_1] \leq 1/|M|,$$

where these expectations are taken over the choice of $m_1$. Hence by linearity of expectation

$$\mathbb{E}_{m_1 \leftarrow_R M}\Big(\Pr[Eve(E_k(m_1)) = m_1] - \Pr[Eve(E_k(m_0)) = m_1]\Big) > 0. \qquad (1.4)$$

(In words, for random $m_1$, the probability that Eve outputs $m_1$ given an encryption of $m_1$ is higher than the probability that Eve outputs $m_1$ given an encryption of $m_0$.)

In particular, by the averaging argument (the argument that if the average of numbers is larger than $\alpha$ then one of the numbers is larger than $\alpha$) there must exist $m_1 \in M$ satisfying

$$\Pr[Eve(E_k(m_1)) = m_1] > \Pr[Eve(E_k(m_0)) = m_1].$$

(Can you see why? This is worth stopping and reading again.)

But this can be turned into an attacker $Eve'$ such that for $b \leftarrow_R \{0,1\}$, the probability that $Eve'(E_k(m_b)) = m_b$ is larger than $1/2$. Indeed, we can define $Eve'(c)$ to output $m_1$ if $Eve(c) = m_1$ and otherwise output a random message in $\{m_0, m_1\}$. The probability that $Eve'(c)$ equals $m_1$ is higher when $c = E_k(m_1)$ than when $c = E_k(m_0)$, and since $Eve'$ outputs either $m_0$ or $m_1$, this means that the probability that $Eve'(E_k(m_b)) = m_b$ is larger than $1/2$. (Can you see why?) ■

P The proof of Theorem 1.8 is not trivial, and is worth reading again and making sure you understand it. An excellent exercise, which I urge you to pause and do now, is to prove the following: $(E,D)$ is perfectly secret if for every plaintexts $m, m' \in \{0,1\}^\ell$, the two random variables $\{E_k(m)\}$ and $\{E_{k'}(m')\}$ (for randomly chosen keys $k$ and $k'$) have precisely the same distribution.

Solved Exercise 1.1 — Perfect secrecy, equivalent definition. Prove that a valid encryption scheme $(E,D)$ with plaintext length $\ell(\cdot)$ is perfectly secret if and only if for every $n \in \mathbb{N}$ and plaintexts $m, m' \in \{0,1\}^{\ell(n)}$, the following two distributions $Y$ and $Y'$ over $\{0,1\}^*$ are identical:

• $Y$ is obtained by sampling $k \leftarrow_R \{0,1\}^n$ and outputting $E_k(m)$.

• $Y'$ is obtained by sampling $k \leftarrow_R \{0,1\}^n$ and outputting $E_k(m')$.

■

Solution: We only sketch the proof. The condition in the exercise is equivalent to perfect secrecy with $|M| = 2$. For every $M = \{m, m'\}$, if $Y$ and $Y'$ are identical then clearly for every $Eve$ and possible output $y$, $\Pr[Eve(E_k(m)) = y] = \Pr[Eve(E_k(m')) = y]$, since these correspond to applying $Eve$ on the same distribution. On the other hand, if $Y$ and $Y'$ are not identical then there must exist some ciphertext $c^*$ such that $\Pr[Y = c^*] > \Pr[Y' = c^*]$ (or vice versa). The adversary that on input $c$ guesses that $c$ is an encryption of $m$ if $c = c^*$ and otherwise tosses a coin will have some advantage over $1/2$ in distinguishing an encryption of $m$ from an encryption of $m'$. ■

We summarize the equivalent definitions of perfect secrecy in the following theorem, whose (omitted) proof follows from Theorem 1.8 and Solved Exercise 1.1 as well as similar proof ideas.

Theorem 1.9 — Perfect secrecy equivalent conditions. Let $(E,D)$ be a valid encryption scheme with message length $\ell(n)$. Then the following conditions are equivalent:

1. $(E,D)$ is perfectly secret as per Definition 1.7.

2. For every pair of messages $m_0, m_1 \in \{0,1\}^{\ell(n)}$, the distributions $\{E_k(m_0)\}_{k \leftarrow_R \{0,1\}^n}$ and $\{E_k(m_1)\}_{k \leftarrow_R \{0,1\}^n}$ are identical.

3. (Two-message security: Eve can't guess which one of two messages was encrypted with success better than half.) For every function $Eve : \{0,1\}^{C(n)} \to \{0,1\}^{\ell(n)}$ and pair of messages $m_0, m_1 \in \{0,1\}^{\ell(n)}$,

$$\Pr_{b \leftarrow_R \{0,1\},\, k \leftarrow_R \{0,1\}^n}[Eve(E_k(m_b)) = m_b] \leq 1/2.$$

4. (Arbitrary prior security: Eve can't guess which message was encrypted with success better than her prior information.) For every distribution $\mathcal{D}$ over $\{0,1\}^{\ell(n)}$ and $Eve : \{0,1\}^{C(n)} \to \{0,1\}^{\ell(n)}$,

$$\Pr_{m \leftarrow_R \mathcal{D},\, k \leftarrow_R \{0,1\}^n}[Eve(E_k(m)) = m] \leq \max(\mathcal{D}),$$

where we denote $\max(\mathcal{D}) = \max_{m^* \in \{0,1\}^{\ell(n)}} \Pr_{m \leftarrow_R \mathcal{D}}[m = m^*]$ to be the largest probability of any element under $\mathcal{D}$.

1.5.1 Achieving perfect secrecy

So, perfect secrecy is a natural condition, and does not seem to be too weak for applications, but can it actually be achieved? After all, the condition that two different plaintexts are mapped to the same distribution seems somewhat at odds with the condition that Bob would succeed in decrypting the ciphertexts and find out if the plaintext was in fact $m$ or $m'$. It turns out the answer is yes! For example, Fig. 1.8 details a perfectly secret encryption for two bits. In fact, this can be generalized to any number of bits:6

Theorem 1.10 — One Time Pad (Vernam 1917, Shannon 1949). There is a perfectly secret valid encryption scheme $(E,D)$ with $\ell(n) = n$.

Proof Idea:

6 The one-time pad is typically credited to Gilbert Vernam of Bell and Joseph Mauborgne of the U.S. Army Signal Corps, but Steve Bellovin discovered an earlier inventor, Frank Miller, who published a description of the one-time pad in 1882. However, it is unclear if Miller realized the fact that the security of this system can be mathematically proven, and so the theorem below should probably still be credited to Vernam and Mauborgne.

Figure 1.8: A perfectly secret encryption scheme for two-bit keys and messages. The blue vertices represent plaintexts and the red vertices represent ciphertexts; each edge mapping a plaintext $m$ to a ciphertext $c = E_k(m)$ is labeled with the corresponding key $k$. Since there are four possible keys, the degree of the graph is four and it is in fact a complete bipartite graph. The encryption scheme is valid in the sense that for every $k \in \{0,1\}^2$, the map $m \mapsto E_k(m)$ is one-to-one, which in other words means that the set of edges labeled with $k$ is a matching.

Our scheme is the one-time pad, also known as the "Vernam Cipher"; see Fig. 1.9. The encryption is exceedingly simple: to encrypt a message $m \in \{0,1\}^n$ with a key $k \in \{0,1\}^n$ we simply output $m \oplus k$, where $\oplus$ is the bitwise XOR operation that outputs the string corresponding to XORing each coordinate of $m$ and $k$. ⋆

Figure 1.9: In the one time pad encryption scheme we encrypt a plaintext $m \in \{0,1\}^n$ with a key $k \in \{0,1\}^n$ by the ciphertext $m \oplus k$, where $\oplus$ denotes the bitwise XOR operation.

Proof of Theorem 1.10. For two binary strings $a$ and $b$ of the same length $n$, we define $a \oplus b$ to be the string $c \in \{0,1\}^n$ such that $c_i = a_i + b_i \mod 2$ for every $i \in [n]$. The encryption scheme $(E,D)$ is defined as follows: $E_k(m) = m \oplus k$ and $D_k(c) = c \oplus k$. By the associative law of addition (which works also modulo two), $D_k(E_k(m)) = (m \oplus k) \oplus k = m \oplus (k \oplus k) = m \oplus 0^n = m$, using the fact that for every bit $\sigma \in \{0,1\}$, $\sigma + \sigma \mod 2 = 0$ and $\sigma + 0 = \sigma \mod 2$. Hence $(E,D)$ form a valid encryption.

To analyze the perfect secrecy property, we claim that for every $m \in \{0,1\}^n$, the distribution $Y_m = E_k(m)$ where $k \leftarrow_R \{0,1\}^n$ is simply the uniform distribution over $\{0,1\}^n$, and hence in particular the distributions $Y_m$ and $Y_{m'}$ are identical for every $m, m' \in \{0,1\}^n$. Indeed, for every particular $y \in \{0,1\}^n$, the value $y$ is output by $Y_m$ if and only if $y = m \oplus k$, which holds if and only if $k = m \oplus y$. Since $k$ is chosen uniformly at random in $\{0,1\}^n$, the probability that $k$ happens to equal $m \oplus y$ is exactly $2^{-n}$, which means that every string $y$ is output by $Y_m$ with probability $2^{-n}$.
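The uniformity claim at the heart of this proof can be verified exhaustively for small $n$: for any fixed plaintext, each of the $2^n$ ciphertexts is produced by exactly one key. A minimal check:

```python
from collections import Counter
from itertools import product

n = 3

def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))

m = (1, 0, 1)
# The distribution Y_m: m XOR k over a uniformly random key k.
dist = Counter(xor(m, k) for k in product([0, 1], repeat=n))
# Every one of the 2^n strings is hit by exactly one key, so Y_m is
# the uniform distribution over {0,1}^n, whichever m we started from.
assert len(dist) == 2 ** n
assert all(count == 1 for count in dist.values())
```

Changing `m` to any other 3-bit tuple leaves both assertions true, which is exactly why the distributions $Y_m$ and $Y_{m'}$ coincide.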

Figure 1.10: For any key length $n$, we can visualize an encryption scheme $(E,D)$ as a graph with a vertex for every one of the $2^{\ell(n)}$ possible plaintexts $x \in \{0,1\}^{\ell(n)}$ and for every one of the ciphertexts in $\{0,1\}^*$ of the form $E_k(x)$ for $k \in \{0,1\}^n$ and $x \in \{0,1\}^{\ell(n)}$. For every plaintext $x$ and key $k$, we add an edge labeled $k$ between $x$ and $E_k(x)$. By the validity condition, if we pick any fixed key $k$, the map $x \mapsto E_k(x)$ must be one-to-one. The condition of perfect secrecy simply corresponds to requiring that every two plaintexts $x$ and $x'$ have exactly the same set of neighbors (or multi-set, if there are parallel edges).

P The argument above is quite simple but is worth reading again. To understand why the one-time pad is perfectly secret, it is useful to envision it as a bipartite graph as we've done in Fig. 1.8. (In fact the encryption scheme of Fig. 1.8 is precisely the one-time pad for $n = 2$.) For every $n$, the one-time pad encryption scheme corresponds to a bipartite graph with $2^n$ vertices on the "left side" corresponding to the plaintexts in $\{0,1\}^n$ and $2^n$ vertices on the "right side" corresponding to the ciphertexts $\{0,1\}^n$. For every $x \in \{0,1\}^n$ and $k \in \{0,1\}^n$, we connect $x$ to the vertex $y = E_k(x)$ with an edge that we label with $k$. One can see that this is the complete bipartite graph, where every vertex on the left is connected to all vertices on the right. In particular this means that for every left vertex $x$, the distribution on the ciphertexts obtained by taking a random $k \in \{0,1\}^n$ and going to the neighbor of $x$ on the edge labeled $k$ is the uniform distribution over $\{0,1\}^n$. This ensures the perfect secrecy condition.
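The graph view suggests a mechanical test for toy schemes: compute each plaintext's multiset of neighbors (its ciphertexts over all keys) and check that all plaintexts agree. A small checker in this spirit (the function and scheme names are ours, and plaintexts/keys are represented as bit tuples):

```python
from collections import Counter
from itertools import product

def neighbors(E, x, n):
    # Multiset of right-side neighbors of plaintext x: E_k(x) over all keys.
    return Counter(E(k, x) for k in product([0, 1], repeat=n))

def perfectly_secret(E, n, ell):
    # Perfect secrecy (condition 2 of Theorem 1.9): every two
    # plaintexts have identical neighbor multisets.
    msgs = list(product([0, 1], repeat=ell))
    ref = neighbors(E, msgs[0], n)
    return all(neighbors(E, x, n) == ref for x in msgs[1:])

def otp(k, x):  # one-time pad
    return tuple(ki ^ xi for ki, xi in zip(k, x))

def leaky(k, x):  # ignores the key, so each plaintext has one neighbor
    return x

assert perfectly_secret(otp, 2, 2)
assert not perfectly_secret(leaky, 2, 2)
```

For the one-time pad every neighbor multiset is all of $\{0,1\}^n$ with multiplicity one, matching the complete bipartite picture of Fig. 1.8.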

1.6 NECESSITY OF LONG KEYS

So, does Theorem 1.10 give the final word on cryptography, and mean that we can all communicate with perfect secrecy and live happily ever after? No, it doesn't. While the one-time pad is efficient, and gives perfect secrecy, it has one glaring disadvantage: to communicate $n$ bits you need to store a key of length $n$. In contrast, practically


used cryptosystems such as AES-128 have a short key of $128$ bits (i.e., $16$ bytes) that can be used to protect terabytes or more of communication! Imagine that we all needed to use the one time pad. If that was the case, then if you had to communicate with $m$ people, you would have to maintain (securely!) $m$ huge files that are each as long as the length of the maximum total communication you expect with that person. Imagine that every time you opened an account with Amazon, Google, or any other service, they would need to send you in the mail (ideally with a secure courier) a DVD full of random numbers, and every time you suspected a virus, you'd need to ask all these services for a fresh DVD. This doesn't sound so appealing.

This is not just a theoretical issue. The Soviets have used the one-time pad for their confidential communication since before the 1940's. In fact, even before Shannon's work, the U.S. intelligence already knew in 1941 that the one-time pad is in principle "unbreakable" (see page 32 in the Venona document). However, it turned out that the hassle of manufacturing so many keys for all the communication took its toll on the Soviets and they ended up reusing the same keys for more than one message. They did try to use them for completely different receivers in the (false) hope that this wouldn't be detected. The Venona Project of the U.S. Army was founded in February 1943 by Gene Grabeel (see Fig. 1.11), a former home economics teacher from Madison Heights, Virginia, and Lt. Leonard Zubko. In October 1943, they had their breakthrough when it was discovered that the Russians were reusing their keys. In the 37 years of its existence, the project has resulted in a treasure chest of intelligence, exposing hundreds of KGB agents and Russian spies in the U.S. and other countries, including Julius Rosenberg, Harry Gold, Klaus Fuchs, Alger Hiss, Harry Dexter White and many others.

Figure 1.11: Gene Grabeel, who founded the U.S. Russian SigInt program on 1 Feb 1943. Photo taken in 1942; see Page 7 in the Venona historical study.

Unfortunately it turns out that such long keys are necessary for perfect secrecy:

Theorem 1.11 — Perfect secrecy requires long keys. For every perfectly secret encryption scheme $(E,D)$ the length function $\ell$ satisfies $\ell(n) \leq n$.

Figure 1.12: An encryption scheme where the number of keys is smaller than the number of plaintexts corresponds to a bipartite graph where the degree is smaller than the number of vertices on the left side. Together with the validity condition this implies that there will be two left vertices $x, x'$ with non-identical neighborhoods, and hence the scheme does not satisfy perfect secrecy.

Proof Idea: The idea behind the proof is illustrated in Fig. 1.12. We define a graph between the plaintexts and ciphertexts, where we put an edge between plaintext $x$ and ciphertext $y$ if there is some key $k$ such that $y = E_k(x)$. The degree of this graph is at most the number of potential keys. The fact that the degree is smaller than the number of plaintexts (and hence of ciphertexts) implies that there would be two plaintexts $x$ and $x'$ with different sets of neighbors, and hence the distribution

of a ciphertext corresponding to $x$ (with a random key) will not be identical to the distribution of a ciphertext corresponding to $x'$. ⋆

Proof of Theorem 1.11. Let $(E,D)$ be a valid encryption scheme with messages of length $\ell$ and key of length $n < \ell$. We will show that $(E,D)$ is not perfectly secret by providing two plaintexts $x_0, x_1 \in \{0,1\}^\ell$ such that the distributions $Y_{x_0}$ and $Y_{x_1}$ are not identical, where $Y_x$ is the distribution obtained by picking $k \leftarrow_R \{0,1\}^n$ and outputting $E_k(x)$.

We choose $x_0 = 0^\ell$. Let $S_0 \subseteq \{0,1\}^*$ be the set of all ciphertexts that have nonzero probability of being output in $Y_{x_0}$. That is, $S_0 = \{y \mid \exists_{k \in \{0,1\}^n}\, y = E_k(x_0)\}$. Since there are only $2^n$ keys, we know that $|S_0| \leq 2^n$.

We will show the following claim:

Claim I: There exists some $x_1 \in \{0,1\}^\ell$ and $k \in \{0,1\}^n$ such that $E_k(x_1) \notin S_0$.

Claim I implies that the string $E_k(x_1)$ has positive probability of being output by $Y_{x_1}$ and zero probability of being output by $Y_{x_0}$, and hence in particular $Y_{x_0}$ and $Y_{x_1}$ are not identical. To prove Claim I, just choose a fixed $k \in \{0,1\}^n$. By the validity condition, the map $x \mapsto E_k(x)$ is a one to one map of $\{0,1\}^\ell$ to $\{0,1\}^*$ and hence in particular the image of this map, which is the set $I_k = \{y \mid \exists_{x \in \{0,1\}^\ell}\, y = E_k(x)\}$, has size at least (in fact exactly) $2^\ell$. Since $|S_0| \leq 2^n < 2^\ell$, this means that $|I_k| > |S_0|$ and so in particular there exists some string $y$ in $I_k \setminus S_0$. But by the definition of $I_k$ this means that there is some $x \in \{0,1\}^\ell$ such that $E_k(x) \notin S_0$, which concludes the proof of Claim I and hence of Theorem 1.11. ■
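The counting in this proof can be played out on a toy scheme. In the sketch below (our own illustrative scheme: a 1-bit key XORed into a 2-bit message), the set $S_0$ has at most $2^n = 2$ elements, while a fixed key's image has $2^\ell = 4$ elements, so a plaintext $x_1$ with a ciphertext outside $S_0$ must exist:

```python
from itertools import product

n, ell = 1, 2  # key shorter than message: perfect secrecy must fail

def E(k, x):
    # A toy valid scheme: XOR every message bit with the single key bit.
    return tuple(xi ^ k for xi in x)

x0 = (0, 0)
S0 = {E(k, x0) for k in (0, 1)}  # all ciphertexts x0 can map to
assert len(S0) <= 2 ** n

# For a fixed key the map x -> E_k(x) is one-to-one, so its image has
# 2^ell > 2^n elements and must contain a point outside S0:
x1 = next(x for x in product([0, 1], repeat=ell) if E(0, x) not in S0)
assert E(0, x1) not in S0  # Y_{x1} can output this string; Y_{x0} never does
```

The string `E(0, x1)` witnesses that $Y_{x_0}$ and $Y_{x_1}$ are different distributions, which is all the theorem needs.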

R Remark 1.12 — Adding probability into the picture. There is a sense in which both our secrecy and our impossibility results might not be fully convincing, and that is that we did not explicitly consider algorithms that use randomness. For example, maybe Eve can break a perfectly secret encryption if she is not modeled as a deterministic function $Eve : \{0,1\}^o \to \{0,1\}^\ell$ but rather a probabilistic process. Similarly, maybe the encryption and decryption functions could be probabilistic processes as well. It turns out that none of those matter. For the former, note that a probabilistic process can be thought of as a distribution over functions, in the sense that we have a collection of functions $f_1, \ldots, f_N$ mapping $\{0,1\}^o$ to $\{0,1\}^\ell$, and some probabilities $p_1, \ldots, p_N$ (non-negative numbers summing to $1$), so we now think of Eve as selecting the function $f_i$ with probability $p_i$. But if none of those functions can give an advantage better than $1/2$, then neither can this collection (this is related to the averaging principle in probability). A similar (though more involved) argument shows that the impossibility result showing that the key must be at least as long as the message still holds even if the encryption and decryption algorithms are allowed to be probabilistic processes as well (working this out is a great exercise).

1.6.1 Amplifying success probability

Theorem 1.11 implies that for every encryption scheme $(E,D)$ with $\ell(n) > n$, there is a pair of messages $x_0, x_1$ and an attacker $Eve$ that can distinguish between an encryption of $x_0$ and an encryption of $x_1$ with success better than $1/2$. But perhaps Eve's success is only marginally better than half, say $0.50001$? It turns out that's not the case. If the message is even somewhat larger than the key, the success of Eve can be very close to $1$:

Theorem 1.13 — Short keys imply high probability attack. Let $(E,D)$ be an encryption scheme with $\ell(n) = n + t$. Then there is a function $Eve$ and a pair of messages $x_0, x_1$ such that

$$\Pr_{k \leftarrow_R \{0,1\}^n,\, b \leftarrow_R \{0,1\}}[Eve(E_k(x_b)) = x_b] \geq 1 - 2^{-t-1}.$$

Proof. As in the proof of Theorem 1.11, let $\ell = \ell(n)$, let $x_0 = 0^\ell$, and let $S_0 = \{E_k(x_0) : k \in \{0,1\}^n\}$ be the set of size at most $2^n$ of all ciphertexts corresponding to $x_0$. We claim that

$$\Pr_{k \leftarrow_R \{0,1\}^n,\, x \leftarrow_R \{0,1\}^\ell}[E_k(x) \in S_0] \leq 2^{-t}. \qquad (1.5)$$

We show this by arguing that this bound holds for every fixed $k$, when we take the probability over $x$, and so in particular it holds also for random $k$. Indeed, for every fixed $k$, the map $x \mapsto E_k(x)$ is a one-to-one map, and so the distribution of $E_k(x)$ for random $x \in \{0,1\}^\ell$ is uniform over some set $T_k$ of size $2^{n+t}$. For every $k$, the probability over $x$ that $E_k(x) \in S_0$ is equal to

$$\frac{|T_k \cap S_0|}{|T_k|} \leq \frac{|S_0|}{|T_k|} \leq \frac{2^n}{2^{n+t}} = 2^{-t},$$

thus proving (1.5). Now, for every $x$, define $p_x$ to be $\Pr_{k \leftarrow_R \{0,1\}^n}[E_k(x) \in S_0]$. By (1.5), the expectation of $p_x$ over random $x \leftarrow_R \{0,1\}^\ell$ is at most $2^{-t}$, and so

in particular, by the averaging argument there exists some $x_1$ such that $p_{x_1} \leq 2^{-t}$. Yet that means that the following adversary $Eve$ will be able to distinguish between an encryption of $x_0$ and an encryption of $x_1$ with probability at least $1 - 2^{-t-1}$:

• Input: A ciphertext $y \in \{0,1\}^*$

• Operation: If $y \in S_0$, output $x_0$; otherwise output $x_1$.

The probability that $Eve(E_k(x_0)) = x_0$ is equal to $1$, while the probability that $Eve(E_k(x_1)) = x_1$ is equal to $1 - p_{x_1} \geq 1 - 2^{-t}$. Hence the overall probability of $Eve$ guessing correctly is at least

$$\tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot (1 - 2^{-t}) = 1 - 2^{-t-1}. \qquad \blacksquare$$
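The adversary from this proof is concrete enough to run on a toy scheme. In the sketch below, the scheme (our own illustrative choice: XOR with the key repeated to the message length) is valid, and the attack achieves the theorem's bound; here the averaging step is implemented simply by picking the $x_1$ that minimizes $p_x$:

```python
from itertools import product

n, t = 2, 2
ell = n + t

def E(k, x):
    # Toy valid scheme: XOR x with the key repeated to length ell.
    pad = (k * ell)[:ell]
    return tuple(xi ^ pi for xi, pi in zip(x, pad))

keys = list(product([0, 1], repeat=n))
x0 = (0,) * ell
S0 = {E(k, x0) for k in keys}  # at most 2^n ciphertexts of x0

def p(x):  # probability over the key that E_k(x) lands in S0
    return sum(E(k, x) in S0 for k in keys) / len(keys)

# Averaging argument: some x1 has p(x1) <= 2^-t; take the minimizer.
x1 = min(product([0, 1], repeat=ell), key=p)

def Eve(y):
    return x0 if y in S0 else x1

correct = sum(Eve(E(k, xb)) == xb for k in keys for xb in (x0, x1))
assert correct / (2 * len(keys)) >= 1 - 2 ** (-t - 1)
```

For this particular scheme the minimizing $x_1$ actually has $p_{x_1} = 0$, so Eve is right every time; the theorem only promises the weaker $1 - 2^{-t-1}$ bound for arbitrary schemes.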

1.7 BIBLIOGRAPHICAL NOTES

Much of this text is shared with my Introduction to Theoretical Computer Science textbook. Shannon's manuscript was written in 1945 but was classified, and a partial version was only published in 1949. Still it has revolutionized cryptography, and is the forerunner to much of what followed.

The Venona project's history is described in this document. Aside from Grabeel and Zubko, credit for the discovery that the Soviets were reusing keys is shared by Lt. Richard Hallock, Carrie Berry, Frank Lewis, and Lt. Karl Elmquist, and there are others that have made important contributions to this project. See pages 27 and 28 in the document.

In a 1955 letter to the NSA that only recently came forward, John Nash proposed an "unbreakable" encryption scheme. He wrote "I hope my handwriting, etc. do not give the impression I am just a crank or circle-squarer… The significance of this conjecture [that certain encryption schemes are exponentially secure against key recovery attacks] … is that it is quite feasible to design ciphers that are effectively unbreakable." John Nash made seminal contributions in mathematics and game theory, and was awarded both the Abel Prize in mathematics and the Nobel Memorial Prize in Economic Sciences. However, he struggled with mental illness throughout his life. His biography, A Beautiful Mind, was made into a popular movie. It is natural to compare Nash's 1955 letter to the NSA to the 1956 letter by Kurt Gödel to John von Neumann. From the theoretical computer science point of view, the crucial difference is that while Nash informally talks about exponential vs polynomial computation time, he does not mention the word "Turing Machine" or other models of computation, and it is not clear if he is aware or not

that his conjecture can be made mathematically precise (assuming a formalization of “sufficiently complex types of enciphering”).

II PRIVATE KEY CRYPTOGRAPHY

2 Computational Security

Additional reading: Sections 2.2 and 2.3 in Boneh-Shoup book. Chapter 3 up to and including Section 3.3 in Katz-Lindell book.

Recall our cast of characters: Alice and Bob want to communicate securely over a channel that is monitored by the nosy Eve. In the last lecture, we saw the definition of perfect secrecy that guarantees that Eve cannot learn anything about their communication beyond what she already knew. However, this security came at a price. For every bit of communication, Alice and Bob have to exchange in advance a bit of a secret key. In fact, the proof of this result gives rise to the following simple Python program that can break every encryption scheme that uses, say, a 128 bit key with a 129 bit message:

```python
from itertools import product # Import an iterator for cartesian products
from random import choice # choose random element of list

# Gets ciphertext as input and two potential plaintexts
# Returns most likely plaintext
# We assume we have access to the function Encrypt(key,plaintext)
def Distinguish(ciphertext,plaintext1,plaintext2):
    for key in product([0,1], repeat = 128): # Iterate over all possible keys of length 128
        if Encrypt(key, plaintext1)==ciphertext: return plaintext1
        if Encrypt(key, plaintext2)==ciphertext: return plaintext2
    return choice([plaintext1,plaintext2])
```
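To see this brute-force attack in action at a scale where the loop actually terminates, here is a toy experiment (illustrative, not from the text): the made-up cipher XORs an 8-bit key into the first 8 bits of a 9-bit message, and the distinguisher is the same exhaustive loop with `repeat=8`.

```python
from itertools import product
from random import choice, getrandbits

# Toy cipher (made up for this sketch): XOR an 8-bit key into the
# first 8 bits of a 9-bit message, leaving the last bit in the clear.
def Encrypt(key, plaintext):
    return tuple(p ^ k for p, k in zip(plaintext, key)) + plaintext[8:]

# Scaled-down version of Distinguish: brute force over all 2^8 keys.
def Distinguish(ciphertext, plaintext1, plaintext2, keylen=8):
    for key in product([0, 1], repeat=keylen):
        if Encrypt(key, plaintext1) == ciphertext:
            return plaintext1
        if Encrypt(key, plaintext2) == ciphertext:
            return plaintext2
    return choice([plaintext1, plaintext2])

# Estimate the success probability over a random key and message choice.
m0, m1 = (0,) * 9, (1,) * 9
trials, wins = 500, 0
for _ in range(trials):
    key = tuple(getrandbits(1) for _ in range(8))
    mb = choice([m0, m1])
    wins += (Distinguish(Encrypt(key, mb), m0, m1) == mb)
print(wins / trials)  # prints 1.0 for this toy cipher (the claim only promises 0.75)
```

For this particular toy cipher the last message bit is sent in the clear, so the attack in fact always succeeds; the general claim guarantees only probability at least 0.75.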

The program Distinguish will break any 128-bit key and 129-bit message encryption Encrypt, in the sense that there exists a pair of messages $m_0, m_1$ such that $\mathtt{Distinguish}(\mathtt{Encrypt}(k, m_b), m_0, m_1) = b$ with probability at least $0.75$ over $k \leftarrow_R \{0,1\}^{128}$ and $b \leftarrow_R \{0,1\}$.


Now, generating, distributing, and protecting huge keys causes immense logistical problems, which is why almost all encryption schemes used in practice do in fact utilize short keys (e.g., 128 bits long) with messages that can be much longer (sometimes even terabytes or more of data).

So, why can't we use the above Python program to break all encryptions in the Internet and win infamy and fortune? We can in fact, but we'll have to wait a really long time, since the loop in Distinguish will run $2^{128}$ times, which will take much more than the lifetime of the universe to complete, even if we used all the computers on the planet.

However, the fact that this particular program is not a feasible attack does not mean there does not exist a different attack. But this still suggests a tantalizing possibility: if we consider a relaxed version of perfect secrecy that restricts Eve to performing computations that can be done in this universe (e.g., less than $2^{256}$ steps should be safe not just for humans but for all potential alien civilizations) then can we bypass the impossibility result and allow the key to be much shorter than the message?

This in fact does seem to be the case, but as we've seen, defining security is a subtle task, and will take some care. As before, the way we avoid (at least some of) the pitfalls of so many cryptosystems in history is that we insist on very precisely defining what it means for a scheme to be secure.

Let us defer the discussion of how one defines a function being computable in "less than $T$ operations" and just say that there is a way to formally do so. We will want to say that a scheme has "256 bits of security" if it is not possible to break it using less than $2^{256}$ operations, and more generally that it has $t$ bits of security if it can't be broken using less than $2^t$ operations. Given the perfect secrecy definition we saw last time, a natural attempt for defining computational secrecy would be the following:

Definition 2.1 — Computational secrecy (first attempt). An encryption scheme $(E,D)$ has $t$ bits of computational secrecy if for every two distinct plaintexts $\{m_0, m_1\} \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t$ computational steps, if we choose at random $b \in \{0,1\}$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2$.

Note: It is important to keep track of what is known and unknown to the adversary Eve. The adversary knows the set $\{m_0, m_1\}$ of potential messages, and the ciphertext $y = E_k(m_b)$. The only things she doesn't know are whether $b = 0$ or $b = 1$, and the value of the secret key $k$. In particular, because $m_0$ and $m_1$ are known to Eve, it does not matter

whether we define Eve's goal in this "security game" as outputting $m_b$ or as outputting $b$.

Definition 2.1 seems very natural, but is in fact impossible to achieve if the key is shorter than the message.
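This impossibility is also easy to observe experimentally. The following small simulation (the cipher and all names are made up for this sketch) uses a 4-bit key and a 5-bit message; an Eve who tries just a single random key already wins with probability about $1/2 + 2^{-5} \approx 0.53$, beating the $1/2$ bound of Definition 2.1.

```python
from random import choice, getrandbits

N = 4  # key length; message length is N + 1

# Toy cipher (hypothetical): XOR the key into the first N message bits.
def encrypt(key, msg):
    return tuple(m ^ k for m, k in zip(msg, key)) + msg[N:]

# Eve guesses one random key instead of looping over all 2^N of them.
def eve_guess_key(ciphertext, m0, m1):
    key_guess = tuple(getrandbits(1) for _ in range(N))
    if encrypt(key_guess, m0) == ciphertext:
        return m0
    if encrypt(key_guess, m1) == ciphertext:
        return m1
    return choice([m0, m1])

m0, m1 = (0,) * (N + 1), (1,) * (N + 1)
trials, wins = 100_000, 0
for _ in range(trials):
    key = tuple(getrandbits(1) for _ in range(N))
    mb = choice([m0, m1])
    wins += (eve_guess_key(encrypt(key, mb), m0, m1) == mb)
print(wins / trials)  # concentrates around 1/2 + 2^(-N-1) = 0.53125
```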

P
Before reading further, you might want to stop and think if you can prove that there is no, say, encryption scheme with $\sqrt{n}$ bits of computational security satisfying Definition 2.1 with $\ell = n + 1$ and where the time to compute the encryption is polynomial.

The reason Definition 2.1 can't be achieved is that if the message is even one bit longer than the key, we can always have a very efficient procedure that achieves success probability of about $1/2 + 2^{-n-1}$ by guessing the key. This is because we can replace the loop in the Python program Distinguish by choosing the key at random. Since we have some small chance of guessing correctly, we will get a small advantage over half.

Of course an advantage of $2^{-256}$ in guessing the message is not really something we would worry about. For example, since the earth is about 5 billion years old, we can estimate the chance that an asteroid of the magnitude that caused the dinosaurs' extinction will hit us this very second to be about $2^{-60}$. Hence we want to relax the notion of computational security so it would not consider guessing with such a tiny advantage as a "true break" of the scheme. The resulting definition is the following:

Definition 2.2 — Computational secrecy (concrete). An encryption scheme $(E,D)$ has $t$ bits of computational secrecy¹ if for every two distinct plaintexts $\{m_0, m_1\} \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t$ computational steps, if we choose at random $b \in \{0,1\}$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2 + 2^{-t}$.

¹ Another version of "$t$ bits of security" is that a scheme has $t$ bits of security if for every $t_1 + t_2 \leq t$, an attacker running in time $2^{t_1}$ can't get success probability advantage more than $2^{-t_2}$. However, these two definitions only differ from one another by at most a factor of two. This may be important for practical applications (where the difference between 64 and 32 bits of security could be crucial) but won't matter for our concerns.

Having learned our lesson, let's try to see that this strategy does give us the kind of conditions we desired. In particular, let's verify that this definition implies the analogous condition to perfect secrecy.

Theorem 2.3 — Guessing game for computational secrecy. If $(E,D)$ has $t$ bits of computational secrecy as per Definition 2.2 then for every subset $M \subseteq \{0,1\}^\ell$ and every strategy of Eve using at most $2^t - (100\ell + 100)$ computational steps, if we choose at random $m \in M$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m$ after seeing $E_k(m)$ is at most $1/|M| + 2^{-t+1}$.

Before proving this theorem note that it gives us a pretty strong guarantee. In the exercises we will strengthen it even further, showing that no matter what prior information Eve had on the message before, she will never get any non-negligible new information on it.² One way to phrase it is that if the sender used a 256-bit secure encryption to encrypt a message, then your chances of getting to learn any additional information about it before the universe collapses are more or less the same as the chances that a fairy will materialize and whisper it in your ear.

² The latter property is known as "semantic security"; see also Section 3.2.2 of Katz-Lindell on "semantic security" and Section 2 of Boneh-Shoup on "computational ciphers and semantic security".

P Before reading the proof, try to again review the proof of Theorem 1.8, and see if you can generalize it yourself to the computational setting.

Proof of Theorem 2.3. The proof is rather similar to the equivalence of guessing one of two messages vs. one of many messages for perfect secrecy (i.e., Theorem 1.8). However, in the computational context we need to be careful in keeping track of Eve’s running time. In the proof of Theorem 1.8 we showed that if there exists:

• A subset of messages $M \subseteq \{0,1\}^\ell$, and

• An adversary $Eve : \{0,1\}^o \rightarrow \{0,1\}^\ell$ such that

$$\Pr_{m \leftarrow_R M, k \leftarrow_R \{0,1\}^n}[Eve(E_k(m)) = m] > 1/|M|$$

then there exist two messages $m_0, m_1$ and an adversary $Eve' : \{0,1\}^o \rightarrow \{0,1\}^\ell$ such that $\Pr_{b \leftarrow_R \{0,1\}, k \leftarrow_R \{0,1\}^n}[Eve'(E_k(m_b)) = m_b] > 1/2$.

To adapt this proof to the computational setting and complete the proof of the current theorem it suffices to show that:

• If the probability of $Eve$ succeeding was $\tfrac{1}{|M|} + \epsilon$ then the probability of $Eve'$ succeeding is at least $\tfrac{1}{2} + \epsilon/2$.

• If $Eve$ can be computed in $T$ operations, then $Eve'$ can be computed in $T + 100\ell + 100$ operations.

This will imply that if $Eve$ ran in polynomial time and had polynomial advantage over $1/|M|$ in guessing a plaintext chosen from $M$, then $Eve'$ would run in polynomial time and have polynomial advantage over $1/2$ in guessing a plaintext chosen from $\{m_0, m_1\}$.

The first item can be shown by simply doing the same proof more carefully, keeping track of how the advantage over $\tfrac{1}{|M|}$ for $Eve$ translates into an advantage over $\tfrac{1}{2}$ for $Eve'$. As the world's most annoying saying goes, doing this is an excellent exercise for the reader. The second item is obtained by looking at the definition of $Eve'$ from that proof. On input $c$, $Eve'$ computed $m = Eve(c)$ (which costs $T$ operations), checked if $m = m_0$ (which costs, say, at most $5\ell$ operations), and then outputted either $1$ or a random bit (which is a constant, say at most $100$, operations). ■

2.0.1 Proof by reduction
The proof of Theorem 2.3 is a model of how a great many of the results in this course will look. Generally we will have many theorems of the form:

"If there is a scheme $S'$ satisfying security definition $X'$ then there is a scheme $S$ satisfying security definition $X$."

In the context of Theorem 2.3, $X'$ was "having $t$ bits of security" (in the context of distinguishing between encryptions of two plaintexts) and $X$ was the more general notion of hardness of getting a non-trivial advantage over guessing for an encryption of a random $m \in M$. While in Theorem 2.3 the encryption scheme $S$ was the same as $S'$, this need not always be the case. However, all of the proofs of such statements will have the same global structure: we will assume towards a contradiction that there is an efficient adversary strategy $Eve$ demonstrating that the scheme $S$ violates the security notion $X$, and build from $Eve$ a strategy $Eve'$ demonstrating that $S'$ violates $X'$.

This is such an important point that it deserves repeating:

The way you show that if $S'$ is secure then $S$ is secure is by giving a transformation from an adversary that breaks $S$ into an adversary that breaks $S'$.

For computational secrecy, we will always want that $Eve'$ will be efficient if $Eve$ is, and that will usually be the case because $Eve'$ will simply use $Eve$ as a black box, which it will not invoke too many times, and in addition will use some polynomial time preprocessing and postprocessing. The more challenging parts of such proofs are typically:

• Coming up with the strategy $Eve'$.

• Analyzing the probability of success, and in particular showing that if $Eve$ had non-negligible advantage then so will $Eve'$.

Note that, just like in the context of NP completeness or uncomputability reductions, security reductions work backwards. That is, we construct the scheme $S$ based on the scheme $S'$, but then prove that we can transform an algorithm breaking $S$ into an algorithm breaking $S'$. Just like in computational complexity, it can sometimes be hard to keep track of the direction of the reduction. In fact, cryptographic reductions can be even subtler, since they involve an interplay of several entities (for example, sender, receiver, and adversary) and probabilistic choices (e.g., over the message to be sent and the key).

2.1 THE ASYMPTOTIC APPROACH

For practical security, often every bit of security matters. We want our keys to be as short as possible and our schemes to be as fast as possible while satisfying a particular level of security. In practice we would usually like to ensure that when we use a smallish security parameter $n$, such as in the few hundreds or thousands, then:

• The honest parties (the parties running the encryption and decryption algorithms) are extremely efficient, something like 100-1000 cycles per byte of data processed. In theory terms we would want them to be using $O(n)$ or at worst $O(n^2)$ time algorithms with not-too-big hidden constants.

• We want to protect against adversaries (the parties trying to break the encryption) that have much vaster computational capabilities. A typical modern encryption is built so that using standard key sizes it can withstand the combined computational powers of all computers on earth for several decades. In theory terms we would want the time to break the scheme to be $2^{\Omega(n)}$ (or if not, at least $2^{\Omega(\sqrt{n})}$ or $2^{\Omega(n^{1/3})}$) with not too small hidden constants.

For implementing cryptography in practice, the tradeoff between security and efficiency can be crucial. However, for understanding the principles behind cryptography, keeping track of concrete security can be a distraction, and so just like we do in algorithms courses, we will use asymptotic analysis (also known as big Oh notation) to sweep many of those details under the carpet. To a first approximation, there will be only two types of running times we will encounter in this course:

• Polynomial running time of the form $d \cdot n^c$ for some constants $d, c > 0$ (or $poly(n) = n^{O(1)}$ for short), which we will consider as efficient.

• Exponential running time of the form $2^{d \cdot n^\epsilon}$ for some constants $d, \epsilon > 0$ (or $2^{n^{\Omega(1)}}$ for short) which we will consider as infeasible.³

³ Some texts reserve the term exponential to functions of the form $2^{\epsilon n}$ for some $\epsilon > 0$ and call a function such as, say, $2^{\sqrt{n}}$ subexponential. However, we will generally not make this distinction in this course.

Another way to say it is that in this course, if a scheme has any security at all, it will have at least $n^\epsilon$ bits of security where $n$ is the length of the key and $\epsilon > 0$ is some absolute constant such as $\epsilon = 1/3$. Hence in this course, whenever you hear the term "super polynomial", you can equate it in your mind with "exponential" and you won't be far off the truth. These are not all the theoretically possible running times. One can have intermediate functions such as $n^{\log n}$ though we will generally not encounter those. To make things clean (and to correspond to standard terminology), we will generally associate "efficient computation" with polynomial time in $n$ where $n$ is either its input length or the key size (the key size and input length will always be polynomially related, and so this choice won't matter). We want our algorithms (encryption, decryption, etc.) to be computable in polynomial time, but to require super polynomial time to break.
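As an illustration of why we treat $2^{n^{\Omega(1)}}$ as infeasible, the following quick check (with the particular functions and values chosen arbitrarily for the demo, not taken from the text) shows a bound like $2^{n^{1/3}}$ overtaking the polynomial $n^{10}$ once $n$ is large enough, even though the polynomial dominates for moderate $n$.

```python
# Compare a polynomial running time with an exponential-type one.
def poly(n):
    return n ** 10  # "efficient": a fixed polynomial

def exp_type(n):
    # "infeasible": 2^(n^(1/3)); round() recovers the integer cube root
    # for the perfect cubes used below.
    return 2 ** round(n ** (1 / 3))

print(poly(10**6) > exp_type(10**6))  # for moderate n the polynomial is larger
print(poly(10**9) < exp_type(10**9))  # eventually the exponential dwarfs it
```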

Negligible probabilities. In cryptography, we care not just about the running time of the adversary but also about their probability of success (which should be as small as possible). If $\mu : \mathbb{N} \rightarrow [0, \infty)$ is a function (which we'll often think of as corresponding to the adversary's probability of success or advantage over the trivial probability, as a function of the key size $n$) then we say that $\mu(n)$ is negligible if it's smaller than the inverse of every (positive) polynomial. Our security definitions will have the following form:

"Scheme $S$ is secure if for every polynomial $p(\cdot)$ and $p(n)$ time adversary $Eve$, there is some negligible function $\mu$ such that the probability that $Eve$ succeeds in the security game for $S$ is at most $trivial + \mu(n)$."

We now make these notions more formal.

Definition 2.4 — Negligible function. A function $\mu : \mathbb{N} \rightarrow [0, \infty)$ is negligible if for every polynomial $p : \mathbb{N} \rightarrow \mathbb{N}$ there exists $N \in \mathbb{N}$ such that $\mu(n) < \tfrac{1}{p(n)}$ for every $n > N$.⁴

⁴ Negligible functions are sometimes defined with image equalling $[0,1]$ as opposed to the set $[0, \infty)$ of non-negative real numbers, since they are typically used to bound probabilities. However, this does not make much difference since if $\mu$ is negligible then for large enough $n$, $\mu(n)$ will be smaller than one.

The following exercise provides a good way to get some comfort with this definition:

Exercise 2.1 — Negligible functions properties. 1. Let $\mu : \mathbb{N} \rightarrow [0, \infty)$ be a negligible function. Prove that for every polynomials $p, q : \mathbb{R} \rightarrow \mathbb{R}$ with non-negative coefficients such that $p(0) = 0$, the function $\mu' : \mathbb{N} \rightarrow [0, \infty)$ defined as $\mu'(n) = p(\mu(q(n)))$ is negligible.

2. Let $\mu : \mathbb{N} \rightarrow [0, \infty)$. Prove that $\mu$ is negligible if and only if for every constant $c$, $\lim_{n \rightarrow \infty} n^c \mu(n) = 0$. ■
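A quick numeric sanity check of Definition 2.4 (the specific functions here are chosen purely for illustration): the negligible function $\mu(n) = 2^{-\sqrt{n}}$ is larger than the inverse polynomial $1/n^{10}$ for small $n$, but drops below it, and stays below it, once $n$ is large enough.

```python
import math

# mu(n) = 2^(-sqrt(n)): a standard example of a negligible function.
def mu(n):
    return 2.0 ** (-math.sqrt(n))

# 1/n^10: an example of an inverse polynomial bound.
def inv_poly(n):
    return n ** -10.0

print(mu(100) > inv_poly(100))          # small n: mu is still the bigger one
print(mu(100_000) < inv_poly(100_000))  # large n: mu has dropped below 1/n^10
```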

R Remark 2.5 — Asymptotic analysis. The above definitions could be confusing if you haven't encountered asymptotic analysis before. Reading the beginning of Chapter 3 (pages 43-51) in the KL book, as well as the mathematical background lecture in my intro to TCS notes, can be extremely useful. As a rule of thumb, if every time you see the word "polynomial" you imagine the function $n^{10}$ and every time you see the word "negligible" you imagine the function $2^{-\sqrt{n}}$ then you will get the right intuition. What you need to remember is that negligible is much smaller than any inverse polynomial, while polynomials are closed under multiplication, and so we have the "equations"

$$negligible \times polynomial = negligible$$

and

$$polynomial \times polynomial = polynomial$$

As mentioned, in practice people really want to get as close as possible to $n$ bits of security with an $n$ bit key, but we would be happy as long as the security grows with the key, so when we say a scheme is "secure" you can think of it having $\sqrt{n}$ bits of security (though any function growing faster than $\log n$ would be fine as well).

From now on, we will require all of our encryption schemes to be efficient, which means that the encryption and decryption algorithms should run in polynomial time. Security will mean that any efficient adversary can make at most a negligible gain in the probability of guessing the message over its a priori probability.⁵

⁵ Note that there is a subtle issue here with the order of quantifiers. For a scheme to be efficient, the algorithms such as encryption and decryption need to run in some fixed polynomial time such as $n^2$ or $n^3$. In contrast we allow the adversary to run in any polynomial time. That is, for every $c$, if $n$ is large enough, then the scheme should be secure against an adversary that runs in time $n^c$. This is in line with the general principle in cryptography that we always allow the adversary potentially much more resources than those used by the honest users. In practical security we often assume that the gap between the honest use and the adversary resources can be exponential. For example, a low power embedded device can encrypt messages that, as far as we know, are undecipherable even by a nation-state using super-computers and massive data centers.

We can now formally define computational secrecy in asymptotic terms:

Definition 2.6 — Computational secrecy (asymptotic). An encryption scheme $(E,D)$ is computationally secret if for every two distinct plaintexts $\{m_0, m_1\} \subseteq \{0,1\}^\ell$ and every efficient (i.e., polynomial time) strategy of Eve, if we choose at random $b \in \{0,1\}$ and a random key $k \in \{0,1\}^n$, then the probability that Eve guesses $m_b$ after seeing $E_k(m_b)$ is at most $1/2 + \mu(n)$ for some negligible function $\mu(\cdot)$.

2.1.1 Counting number of operations.
One more detail that we've so far ignored is what it means exactly for a function to be computable using at most $T$ operations. Fortunately, when we don't really care about the difference between $T$ and,

say, $2T$, then essentially every reasonable definition gives the same answer.⁶ Formally, we can use the notions of Turing machines, Boolean circuits, or straightline programs to define complexity. For concreteness, let's define that a function $F : \{0,1\}^n \rightarrow \{0,1\}^m$ has complexity at most $T$ if there is a Boolean circuit that computes $F$ using at most $T$ Boolean gates (say AND/OR/NOT or NAND; alternatively you can choose your favorite universal gate set). We will often also consider probabilistic functions, in which case we allow the circuit a RAND gate that outputs a single random bit (though this in general does not give extra power). The fact that we only care about asymptotics means you don't really need to think of gates when arguing in cryptography. However, it is comforting to know that this notion has a precise mathematical formulation.

⁶ With some caveats that need to be added due to quantum computers: we'll get to those later in the course, though they won't change most of our theory. See also this discussion in my intro TCS textbook and this presentation of Aaronson on the "extended Church-Turing thesis".

Uniform vs non-uniform models. While many computational texts focus on models such as Turing machines, in cryptography it is more convenient to use Boolean circuits, which are a non-uniform model of computation in the sense that we allow a different circuit for every given input length. The reasons are the following:

1. Circuits can express finite computation, while Turing machines only make sense for computing on arbitrarily large input lengths, and so we can make sense of notions such as "$t$ bits of computational security".

2. Circuits allow the notion of "hardwiring", whereby if we can compute a certain function $F : \{0,1\}^{n+s} \rightarrow \{0,1\}^m$ using a circuit of $T$ gates and have a string $w \in \{0,1\}^s$ then we can compute the function $x \mapsto F(xw)$ using $T$ gates as well. This is useful in many cryptographic proofs.

One can build the theory of cryptography using Turing machines as well, but it is more cumbersome.
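A loose Python analogue of "hardwiring" (illustrative only; the real notion operates at the circuit level, where fixing $w$'s input wires yields a circuit for $x \mapsto F(xw)$ of the same size): fixing part of a function's input yields a new function of the remaining input.

```python
from functools import partial

# Toy two-input function on bit-tuples (made up for this sketch):
# F(x, w) is the bitwise XOR of x and w.
def F(x, w):
    return tuple(a ^ b for a, b in zip(x, w))

w = (1, 0, 1)
F_w = partial(F, w=w)  # "hardwire" the fixed string w into F

print(F_w((0, 0, 0)))  # prints (1, 0, 1)
print(F_w((1, 1, 1)))  # prints (0, 1, 0)
```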

R Remark 2.7 — Computing beyond functions. Later on in the course, both our cryptographic schemes and the adversaries will extend beyond simple functions that map an input to an output, and we will consider interactive algorithms that exchange messages with one another. Such an algorithm can be implemented using circuits or Turing machines that take as input the prior state and the history of messages up to a certain point in the interaction, and output the next message in the interaction. The number of operations used in such a strategy is the total number of gates used in computing all the messages.

2.2 OUR FIRST CONJECTURE

We are now ready to make our first conjecture:

The Cipher Conjecture:⁷ There exists a computationally secret encryption scheme $(E,D)$ (where $E, D$ are efficient) with length function $\ell(n) = n + 1$.

⁷ As will be the case for other conjectures we talk about, the name "The Cipher Conjecture" is not a standard name, but rather one we'll use in this course. In the literature this conjecture is mostly referred to as the conjecture of existence of one way functions, a notion we will learn about later. These two conjectures a priori seem quite different but have been shown to be equivalent.

A conjecture is a well defined mathematical statement which (1) we believe is true but (2) don't know yet how to prove. Proving the cipher conjecture will be a great achievement and would in particular settle the P vs NP question, which is arguably the fundamental question of computer science. That is, the following theorem is known:

Theorem 2.8 — Breaking crypto if P=NP. If $P = NP$ then there does not exist a computationally secret encryption with efficient $E$ and $D$ and where the message is longer than the key.

Proof. We just sketch the proof, as this is not the focus of this course. If $P = NP$ then whenever we have a loop that searches through some domain to find some string that satisfies a particular property (like the loop in the Distinguish subroutine above that searches over all keys), then this loop can be sped up exponentially. ■

While it is very widely believed that $P \neq NP$, at the moment we do not know how to prove this, and so have to settle for accepting the cipher conjecture as essentially an axiom, though we will see later in this course that we can show it follows from some seemingly weaker conjectures. There are several reasons to believe the cipher conjecture. We now briefly mention some of them:

• Intuition: If the cipher conjecture is false then it means that for every possible cipher we can make the exponential time attack described above become efficient. It seems “too good to be true” in a similar way that the assumption that P=NP seems too good to be true.

• Concrete candidates: As we will see in the next lecture, there are several concrete candidate ciphers using keys shorter than messages for which, despite tons of effort, no one knows how to break them. Some of them are widely used and hence governments and other benign or not so benign organizations have every reason to invest huge resources in trying to break them. Despite that, as far as we know (and we know a little more after Edward Snowden's revelations) there is no significant break known for the most popular ciphers. Moreover, there are other ciphers that can be based on

canonical mathematical problems such as factoring large integers or decoding random linear codes that are immensely interesting in their own right, independently of their cryptographic applications.

• Minimalism: Clearly if the cipher conjecture is false then we also don't have a secure encryption with a message, say, twice as long as the key. But it turns out the cipher conjecture is in fact necessary for essentially every cryptographic primitive, including not just private key and public key encryptions but also digital signatures, hash functions, pseudorandom generators, and more. That is, if the cipher conjecture is false then to a large extent cryptography does not exist, and so we essentially have to assume this conjecture if we want to do any kind of cryptography.

2.3 WHY CARE ABOUT THE CIPHER CONJECTURE?

“Give me a place to stand, and I shall move the world” Archimedes, circa 250 BC

Every perfectly secure encryption scheme is clearly also computationally secret, and so if we required a message of size $n$ instead of $n + 1$, then the conjecture would have been trivially satisfied by the one-time pad. However, having a message longer than the key by just a single bit does not seem that impressive. Sure, if we used such a scheme with 128-bit long keys, our communication will be smaller by a factor of $128/129$ (or a saving of about $0.8\%$) over the one-time pad, but this doesn't seem worth the risk of using an unproven conjecture. However, it turns out that if we assume this rather weak condition, we can actually get a computationally secret encryption scheme with a message of size $p(n)$ for every polynomial $p(\cdot)$. In essence, we can fix a single $n$-bit long key and communicate securely as many bits as we want!

Moreover, this is just the beginning. There is a huge range of other useful cryptographic tools that we can obtain from this seemingly innocent conjecture. (We will see what all these names and some of these reductions mean later in the course.) We will soon see the first of the many reductions we'll learn in this course. Together this "web of reductions" forms the scientific core of cryptography, connecting many of the core concepts and enabling us to construct increasingly sophisticated tools based on relatively simple "axioms" such as the cipher conjecture.

2.4 PRELUDE: COMPUTATIONAL INDISTINGUISHABILITY

The task of Eve in breaking an encryption scheme is to distinguish between an encryption of $m_0$ and an encryption of $m_1$. It turns out


Figure 2.2: Web of reductions between notions equiva- lent to ciphers with larger than key messages

to be useful to consider this question of when two distributions are computationally indistinguishable more broadly:

Definition 2.9 — Computational Indistinguishability (concrete definition). Let $X$ and $Y$ be two distributions over $\{0,1\}^m$. We say that $X$ and $Y$ are $(T, \epsilon)$-computationally indistinguishable, denoted by $X \approx_{T,\epsilon} Y$, if for every function $D : \{0,1\}^m \rightarrow \{0,1\}$ computable with at most $T$ operations,

$$|\Pr[D(X) = 1] - \Pr[D(Y) = 1]| \leq \epsilon .$$

Solved Exercise 2.1 — Computational Indistinguishability game. Prove that for every $T, \epsilon$ and $X, Y$ as above, $X \approx_{T,\epsilon} Y$ if and only if for every $Eve$ computable with at most $T$ operations, the probability that $Eve$ wins in the following game is at most $1/2 + \epsilon/2$:

1. We pick $b \leftarrow_R \{0,1\}$.

2. If $b = 0$, we let $w \leftarrow_R X$. If $b = 1$, we let $w \leftarrow_R Y$.

3. We give $Eve$ the input $w$, and $Eve$ outputs $b' \in \{0,1\}$.

4. $Eve$ wins if $b' = b$. ■

P Working out this exercise on your own is a great way to get comfortable with computational indistinguishability, which is a fundamental notion.

Solution: For every function $Eve : \{0,1\}^m \rightarrow \{0,1\}$, let $p_X = \Pr[Eve(X) = 1]$ and $p_Y = \Pr[Eve(Y) = 1]$. Then the probability that $Eve$ wins the game is:

$$\Pr[b = 0](1 - p_X) + \Pr[b = 1] p_Y$$

and since $\Pr[b = 0] = \Pr[b = 1] = 1/2$ this is

$$\tfrac{1}{2} - \tfrac{1}{2} p_X + \tfrac{1}{2} p_Y = \tfrac{1}{2} + \tfrac{1}{2}(p_Y - p_X) .$$

We see that $Eve$ wins the game with success $1/2 + \epsilon/2$ if and only if

$$\Pr[Eve(Y) = 1] - \Pr[Eve(X) = 1] = \epsilon .$$

Since $\Pr[Eve(Y) = 1] - \Pr[Eve(X) = 1] \leq |\Pr[Eve(X) = 1] - \Pr[Eve(Y) = 1]|$, this already shows that if $X$ and $Y$ are $(T, \epsilon)$-indistinguishable then $Eve$ will win the game with probability at most $1/2 + \epsilon/2$.

For the other direction, assume that $X$ and $Y$ are not computationally indistinguishable, and let $Eve$ be a $T$-operation function such that

$$|\Pr[Eve(X) = 1] - \Pr[Eve(Y) = 1]| \geq \epsilon .$$

Then by definition of absolute value, there are two options. Either $\Pr[Eve(Y) = 1] - \Pr[Eve(X) = 1] \geq \epsilon$, in which case $Eve$ wins the game with probability at least $1/2 + \epsilon/2$. Otherwise $\Pr[Eve(Y) = 1] - \Pr[Eve(X) = 1] \leq -\epsilon$, in which case the function $Eve'(w) = 1 - Eve(w)$ (which is just as easy to compute) wins the game with probability at least $1/2 + \epsilon/2$.

Note that above we assume that the class of "functions computable in at most $T$ operations" is closed under negation, in the sense that if $F$ is in this class, then $1 - F$ is also. For standard Boolean circuits, this can be done if we don't count negation gates (which can change the total circuit size by at most a factor of two), or we can allow for $Eve'$ to require a constant additional number of operations, in which case the exercise is still essentially true but is slightly more cumbersome to state. ■

As we did with computational secrecy, we can also define an asymptotic definition of computational indistinguishability.

Definition 2.10 — Computational indistinguishability (asymptotic). Let $m : \mathbb{N} \rightarrow \mathbb{N}$ be some function and let $\{X_n\}_{n \in \mathbb{N}}$ and $\{Y_n\}_{n \in \mathbb{N}}$ be two sequences of distributions such that $X_n$ and $Y_n$ are distributions over $\{0,1\}^{m(n)}$.

We say that $\{X_n\}_{n \in \mathbb{N}}$ and $\{Y_n\}_{n \in \mathbb{N}}$ are computationally indistinguishable, denoted by $\{X_n\}_{n \in \mathbb{N}} \approx \{Y_n\}_{n \in \mathbb{N}}$, if for every polynomial $p : \mathbb{N} \rightarrow \mathbb{N}$ and sufficiently large $n$, $X_n \approx_{p(n), 1/p(n)} Y_n$.

Solving the following asymptotic analog of Solved Exercise 2.1 is a good way to get comfortable with the asymptotic definition of computational indistinguishability:

Exercise 2.2 — Computational Indistinguishability game (asymptotic). Let $\{X_n\}_{n \in \mathbb{N}}$, $\{Y_n\}_{n \in \mathbb{N}}$ and $m : \mathbb{N} \rightarrow \mathbb{N}$ be as above. Then $\{X_n\}_{n \in \mathbb{N}} \approx \{Y_n\}_{n \in \mathbb{N}}$ if and only if for every polynomial-time $Eve$, there is some negligible function $\mu$ such that $Eve$ wins the following game with probability at most $1/2 + \mu(n)$:

1. We pick $b \leftarrow_R \{0,1\}$.

2. If $b = 0$, we let $w \leftarrow_R X_n$. If $b = 1$, we let $w \leftarrow_R Y_n$.

3. We give $Eve$ the input $w$, and $Eve$ outputs $b' \in \{0,1\}$.

4. $Eve$ wins if $b' = b$. ■

Dropping the index $n$. Since the index $n$ of our distributions will often be clear from context (indeed in most cases it will be the length of the key), we will sometimes drop it from our notation. So if $X$ and $Y$ are two random variables that depend on some index $n$, we will say that $X$ is computationally indistinguishable from $Y$ (denoted as $X \approx Y$) when the sequences $\{X_n\}_{n \in \mathbb{N}}$ and $\{Y_n\}_{n \in \mathbb{N}}$ are computationally indistinguishable.

We can use computational indistinguishability to phrase the definition of computational secrecy more succinctly:

Theorem 2.11 — Computational indistinguishability phrasing of security. Let $(E, D)$ be a valid encryption scheme. Then $(E, D)$ is computationally secret if and only if for every two messages $m_0, m_1 \in \{0,1\}^\ell$,

$$\{E_k(m_0)\}_{n\in\mathbb{N}} \approx \{E_k(m_1)\}_{n\in\mathbb{N}}$$

where each of these two distributions is obtained by sampling a random $k \leftarrow_R \{0,1\}^n$.

Working out the proof is an excellent way to make sure you understand both the definition of computational secrecy and computational indistinguishability, and hence we leave it as an exercise.

One intuition for computational indistinguishability is that it is related to some notion of distance. If two distributions are computationally indistinguishable, then we can think of them as "very close"

to one another, at least as far as efficient observers are concerned. Intuitively, if $X$ is close to $Y$ and $Y$ is close to $Z$ then $X$ should be close to $Z$.$^{8}$ Similarly, if four distributions $X, X', Y, Y'$ satisfy that $X$ is close to $Y$ and $X'$ is close to $Y'$, then you might expect that the distribution $(X, X')$ where we take two independent samples from $X$ and $X'$ respectively is close to the distribution $(Y, Y')$ where we take two independent samples from $Y$ and $Y'$ respectively. We will now verify that these intuitions are in fact correct:

Footnote 8: Results of this form are known as "triangle inequalities" since they can be viewed as generalizations of the statement that for every three points $x, y, z$ on the plane, the distance from $x$ to $z$ is not larger than the distance from $x$ to $y$ plus the distance from $y$ to $z$. In other words, the edge $x, z$ of the triangle $(x, y, z)$ is not longer than the sum of the lengths of the other two edges $x, y$ and $y, z$.

Theorem 2.12 — Triangle inequality for computational indistinguishability. Suppose $X_1 \approx_{T,\epsilon} X_2 \approx_{T,\epsilon} \cdots \approx_{T,\epsilon} X_m$. Then $X_1 \approx_{T,(m-1)\epsilon} X_m$.

Proof. Suppose that there exists a $T$-time $Eve$ such that

$$\left|\Pr[Eve(X_1)=1] - \Pr[Eve(X_m)=1]\right| > (m-1)\epsilon.$$

Write

$$\Pr[Eve(X_1)=1] - \Pr[Eve(X_m)=1] = \sum_{i=1}^{m-1}\left(\Pr[Eve(X_i)=1] - \Pr[Eve(X_{i+1})=1]\right).$$

Thus,

$$\sum_{i=1}^{m-1}\left|\Pr[Eve(X_i)=1] - \Pr[Eve(X_{i+1})=1]\right| > (m-1)\epsilon$$

and hence in particular there must exist some $i \in \{1,\ldots,m-1\}$ such that

$$\left|\Pr[Eve(X_i)=1] - \Pr[Eve(X_{i+1})=1]\right| > \epsilon,$$

contradicting the assumption that $\{X_i\} \approx_{T,\epsilon} \{X_{i+1}\}$ for all $i \in \{1,\ldots,m-1\}$. ■

Theorem 2.13 — Computational indistinguishability is preserved under repetition. Suppose that $X_1,\ldots,X_\ell, Y_1,\ldots,Y_\ell$ are distributions over $\{0,1\}^n$ such that $X_i \approx_{T,\epsilon} Y_i$. Then $(X_1,\ldots,X_\ell) \approx_{T-10\ell n,\, \ell\epsilon} (Y_1,\ldots,Y_\ell)$.

Proof. For every $i \in \{0,\ldots,\ell\}$ we define $H_i$ to be the distribution $(X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell)$. Clearly $H_\ell = (X_1,\ldots,X_\ell)$ and $H_0 = (Y_1,\ldots,Y_\ell)$. We will prove that for every $i$, $H_{i-1} \approx_{T-10\ell n,\,\epsilon} H_i$, and the proof will then follow from the triangle inequality (can you see why?). Indeed, suppose towards the sake of contradiction that there was some $i \in \{1,\ldots,\ell\}$ and some $(T-10\ell n)$-time $Eve': \{0,1\}^{n\ell} \to \{0,1\}$ such that

$$\left|\mathbb{E}[Eve'(H_{i-1})] - \mathbb{E}[Eve'(H_i)]\right| > \epsilon.$$

In other words,

$$\left|\mathbb{E}_{X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell}\left[Eve'(X_1,\ldots,X_{i-1},Y_i,\ldots,Y_\ell)\right] - \mathbb{E}_{X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell}\left[Eve'(X_1,\ldots,X_i,Y_{i+1},\ldots,Y_\ell)\right]\right| > \epsilon.$$

By linearity of expectation we can write the difference of these two expectations as

$$\mathbb{E}_{X_1,\ldots,X_{i-1},X_i,Y_i,Y_{i+1},\ldots,Y_\ell}\left[Eve'(X_1,\ldots,X_{i-1},Y_i,Y_{i+1},\ldots,Y_\ell) - Eve'(X_1,\ldots,X_{i-1},X_i,Y_{i+1},\ldots,Y_\ell)\right].$$

By the averaging principle$^{9}$ this means that there exist some values $x_1,\ldots,x_{i-1},y_{i+1},\ldots,y_\ell$ such that

$$\left|\mathbb{E}_{X_i,Y_i}\left[Eve'(x_1,\ldots,x_{i-1},Y_i,y_{i+1},\ldots,y_\ell) - Eve'(x_1,\ldots,x_{i-1},X_i,y_{i+1},\ldots,y_\ell)\right]\right| > \epsilon.$$

Now $X_i$ and $Y_i$ are simply independent draws from the distributions $X$ and $Y$ respectively, and so if we define $Eve(z) = Eve'(x_1,\ldots,x_{i-1},z,y_{i+1},\ldots,y_\ell)$ then $Eve$ runs in time at most the running time of $Eve'$ plus $10\ell n$,$^{10}$ and it satisfies

$$\left|\mathbb{E}[Eve(X_i)] - \mathbb{E}[Eve(Y_i)]\right| > \epsilon,$$

contradicting the assumption that $X_i \approx_{T,\epsilon} Y_i$. ■

Footnote 9: This is the principle that if the average grade in an exam was at least $\alpha$ then someone must have gotten at least $\alpha$, or in other words that if a real-valued random variable $Z$ satisfies $\mathbb{E}[Z] \geq \alpha$ then $\Pr[Z \geq \alpha] > 0$.

Footnote 10: The cost of $10\ell n$ operations is for feeding the "hardwired" strings $x_1,\ldots,x_{i-1}, y_{i+1},\ldots,y_\ell$ into $Eve'$. These take up at most $\ell n$ bits, and depending on the computational model, storing and feeding them into $Eve'$ may take $c\ell n$ steps for some small constant $c < 10$. In the future, we will usually ignore such minor details and simply say that if $Eve'$ runs in polynomial time then so will $Eve$.

R
Remark 2.14 — The hybrid argument. The above proof illustrates a powerful technique known as the hybrid argument, whereby we show that two distributions $C_0$ and $C_1$ are close to each other by coming up with a sequence of distributions $H_0,\ldots,H_t$ such that $H_t = C_1$, $H_0 = C_0$, and we can argue that $H_i$ is close to $H_{i+1}$ for all $i$. This type of argument repeats itself time and again in cryptography, and so it is important to get comfortable with it.
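The "averaging" (pigeonhole) step that powers both the triangle inequality and the hybrid argument is plain arithmetic, and can be sanity-checked in Python. The helper below is a hypothetical illustration: it takes a list of acceptance probabilities $\Pr[Eve(H_i)=1]$ along a chain of hybrids and locates an adjacent pair whose gap is at least the end-to-end gap divided by the number of steps:

```python
def largest_adjacent_gap(accept_probs):
    """Given Pr[Eve(H_0)=1], ..., Pr[Eve(H_t)=1] along a chain of hybrids,
    return (i, gap) for the adjacent pair with the largest distinguishing
    gap.  The telescoping sum guarantees gap >= |end-to-end gap| / t."""
    gaps = [abs(b - a) for a, b in zip(accept_probs, accept_probs[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return i, gaps[i]

probs = [0.10, 0.12, 0.35, 0.40, 0.50]      # hypothetical values for t = 4 steps
i, gap = largest_adjacent_gap(probs)
total = abs(probs[-1] - probs[0])
assert gap >= total / (len(probs) - 1)      # the pigeonhole step of the proof
```

If Eve distinguishes $H_0$ from $H_t$ with advantage more than $t\epsilon$, some single step $i$ must have advantage more than $\epsilon$, which is exactly the contradiction the proofs exploit.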

2.5 THE LENGTH EXTENSION THEOREM OR STREAM CIPHERS

We now turn to show the length extension theorem, stating that if we have an encryption for $(n+1)$-length messages with $n$-length keys, then we can obtain an encryption with $p(n)$-length messages for every polynomial $p(n)$. For a warm-up, let's show the easier fact that we can transform an encryption such as above into one that has keys of length $tn$ and messages of length $t(n+1)$ for every integer $t$:

Theorem 2.15 — Security of repetition. Suppose that $(E', D')$ is a computationally secret encryption scheme with $n$ bit keys and $n+1$ bit messages. Then the scheme $(E, D)$ where

$$E_{k_1,\ldots,k_t}(m_1,\ldots,m_t) = \left(E'_{k_1}(m_1),\ldots,E'_{k_t}(m_t)\right) \quad \text{and} \quad D_{k_1,\ldots,k_t}(c_1,\ldots,c_t) = \left(D'_{k_1}(c_1),\ldots,D'_{k_t}(c_t)\right)$$

is a computationally secret scheme with $tn$ bit keys and $t(n+1)$ bit messages.

Proof. This might seem "obvious" but in cryptography, even obvious facts are sometimes wrong, so it's important to prove this formally. Luckily, this is a fairly straightforward implication of the fact that computational indistinguishability is preserved under many samples. That is, by the security of $(E', D')$ we know that for every two messages $m, m' \in \{0,1\}^{n+1}$, $E'_k(m) \approx E'_k(m')$ where $k$ is chosen from the distribution $U_n$. Therefore by the indistinguishability of many samples lemma, for every two tuples $m_1,\ldots,m_t \in \{0,1\}^{n+1}$ and $m'_1,\ldots,m'_t \in \{0,1\}^{n+1}$,

$$\left(E'_{k_1}(m_1),\ldots,E'_{k_t}(m_t)\right) \approx \left(E'_{k_1}(m'_1),\ldots,E'_{k_t}(m'_t)\right)$$

for random $k_1,\ldots,k_t$ chosen independently from $U_n$, which is exactly the condition that $(E, D)$ is computationally secret. ■

Randomized encryption scheme. We can now prove the full length extension theorem. Before doing so, we will need to generalize the notion of an encryption scheme to allow a randomized encryption scheme. That is, we will consider encryption schemes where the encryption algorithm can "toss coins" in its computation. There is a crucial difference between key material and such "ad hoc" (sometimes also known as "ephemeral") randomness. Keys need to be not only chosen at random, but also shared in advance between the sender and receiver, and stored securely throughout their lifetime. The "coin tosses" used by a randomized encryption scheme are generated "on the fly" and are not known to the receiver, nor do they need to be stored long term by the sender. So, allowing such randomized encryption does not make a difference for most applications of encryption schemes. In fact, as we will see later in this course, randomized encryption is necessary for security against more sophisticated attacks such as chosen plaintext and chosen ciphertext attacks, as well as for obtaining secure public key encryptions.

We will use the notation $E_k(m; r)$ to denote the output of the encryption algorithm on key $k$, message $m$, and using internal randomness $r$. We often suppress the notation for the randomness, and hence use $E_k(m)$ to denote the random variable obtained by sampling a random $r$ and outputting $E_k(m; r)$.

We can now show that given an encryption scheme with messages one bit longer than the key, we can obtain a (randomized) encryption scheme with arbitrarily long messages:

Theorem 2.16 — Length extension of ciphers. Suppose that there exists a computationally secret encryption scheme $(E', D')$ with key length $n$ and message length $n+1$. Then for every polynomial $t(n)$ there exists a (randomized) computationally secret encryption scheme $(E, D)$ with key length $n$ and message length $t(n)$.

Figure 2.3: Constructing a cipher with $t$ bit long messages from one with $n+1$ bit long messages

P
This is perhaps our first example of a non-trivial cryptographic theorem, and the blueprint for this proof will be one that we will follow time and again during this course. Please make sure you read this proof carefully and follow the argument.

Proof of Theorem 2.16. The construction, depicted in Fig. 2.3, is actually quite natural and variants of it are used in practice for stream ciphers, which are ways to encrypt arbitrarily long messages using a fixed size key. The idea is that we use a key $k_0$ of size $n$ to encrypt (1) a fresh key $k_1$ of size $n$ and (2) one bit of the message. Now we can encrypt $k_2$ using $k_1$ and so on and so forth. We now describe the construction and analysis in detail.

Let $t = t(n)$. We are given a cipher $E'$ which can encrypt $(n+1)$-bit long messages with an $n$-bit long key and we need to encrypt a $t$-bit long message $m = (m_1,\ldots,m_t) \in \{0,1\}^t$. Our idea is simple (at least in hindsight). Let $k_0 \leftarrow_R \{0,1\}^n$ be our key (which is chosen at random). To encrypt $m$ using $k_0$, the encryption function will choose $t$ random strings $k_1,\ldots,k_t \leftarrow_R \{0,1\}^n$.$^{11}$ We will then encrypt the $(n+1)$-bit long message $(k_1, m_1)$ with the key $k_0$ to obtain the ciphertext $c_1$, then encrypt the $(n+1)$-bit long message $(k_2, m_2)$ with the key $k_1$ to obtain the ciphertext $c_2$, and so on and so forth until we encrypt the message $(k_t, m_t)$ with the key $k_{t-1}$.$^{12}$ We output $(c_1,\ldots,c_t)$ as the final ciphertext.

Footnote 11: The keys $k_1,\ldots,k_t$ are sometimes known as ephemeral keys in the crypto literature, since they are created only for the purposes of this particular interaction.

Footnote 12: The astute reader might note that the key $k_t$ is actually not used anywhere in the encryption nor decryption, and hence we could encrypt $n$ more bits of the message instead in this final round. We used the current description for the sake of symmetry and simplicity of exposition.
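The chain construction above can be sketched in a few lines of Python. This toy version works at byte rather than bit granularity, and it fakes the underlying cipher $E'$ with a SHA-256-derived XOR pad; both choices are illustrative assumptions, so the sketch shows only the wiring of the construction and carries no security guarantee:

```python
import os, hashlib

N = 16  # ephemeral-key length in bytes (a toy stand-in for n bits)

def stretch(key: bytes, nbytes: int) -> bytes:
    # Stand-in pad generator based on SHA-256; for illustration only.
    out, counter = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:nbytes]

def enc_block(key: bytes, block: bytes) -> bytes:
    # Toy E'_k: XOR an (N+1)-byte block (next key || one message byte)
    # with a key-derived pad.  XOR is its own inverse, so D' = E'.
    pad = stretch(key, len(block))
    return bytes(a ^ b for a, b in zip(block, pad))

dec_block = enc_block

def encrypt(k0: bytes, message: bytes) -> list:
    """Chain construction: block i encrypts (k_i, m_i) under k_{i-1},
    with fresh ephemeral keys k_1, ..., k_t chosen on the fly."""
    ciphertext, key = [], k0
    for m_byte in message:
        k_next = os.urandom(N)                      # ephemeral key k_i
        ciphertext.append(enc_block(key, k_next + bytes([m_byte])))
        key = k_next
    return ciphertext

def decrypt(k0: bytes, ciphertext: list) -> bytes:
    message, key = bytearray(), k0
    for c in ciphertext:
        block = dec_block(key, c)
        key, m_byte = block[:N], block[N]
        message.append(m_byte)
    return bytes(message)
```

Note how decryption recovers each ephemeral key from the previous block before it can open the next one, exactly as in the analysis below.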

To decrypt $(c_1,\ldots,c_t)$ using the key $k_0$, first decrypt $c_1$ to learn $(k_1, m_1)$, then use $k_1$ to decrypt $c_2$ to learn $(k_2, m_2)$, and so on until we use $k_{t-1}$ to decrypt $c_t$ and learn $(k_t, m_t)$. Finally we can simply output $(m_1,\ldots,m_t)$.

The above are clearly valid encryption and decryption algorithms, and hence the real question becomes: is it secure? The intuition is that $c_1$ hides all information about $(k_1, m_1)$, and so in particular the first bit of the message is encrypted securely, and $k_1$ still can be treated as an unknown random string even to an adversary that saw $c_1$. Thus, we can think of $k_1$ as a random secret key for the encryption $c_2$, and hence the second bit of the message is encrypted securely, and so on and so forth.

Our discussion above looks like a reasonable intuitive argument, but to make sure it's true we need to give an actual proof. Let $m, m' \in \{0,1\}^t$ be two messages. We need to show that $E_{U_n}(m) \approx E_{U_n}(m')$. The heart of the proof will be the following claim:

Claim: Let $\hat{E}$ be the algorithm that on input a message $m$ and key $k_0$ works like $E$ except that its $i^{th}$ block contains $E'_{k_{i-1}}(k'_i, m_i)$ where $k'_i$ is a random string in $\{0,1\}^n$ that is chosen independently of everything else, including the key $k_i$. Then, for every message $m \in \{0,1\}^t$,

$$E_{U_n}(m) \approx \hat{E}_{U_n}(m). \tag{2.1}$$

Note that $\hat{E}$ is not a valid encryption scheme, since it's not at all clear that there is a decryption algorithm for it. It is just a hypothetical tool we use for the proof. Since both $E$ and $\hat{E}$ are randomized encryption schemes (with $E$ using $tn$ bits of randomness for the ephemeral keys $k_1,\ldots,k_t$ and $\hat{E}$ using $(2t-1)n$ bits of randomness for the ephemeral keys $k_1,\ldots,k_{t-1},k'_1,\ldots,k'_t$), we can also write (2.1) as

$$E_{U_n}(m; U_{tn}) \approx \hat{E}_{U_n}(m; U_{(2t-1)n})$$

where we use $U_\ell$ to denote a random variable that is chosen uniformly at random from $\{0,1\}^\ell$ and independently from the choice of $U_n$ (which is chosen uniformly at random from $\{0,1\}^n$). Once we prove the claim then we are done, since we know that for every pair of messages $m, m'$, $E_{U_n}(m) \approx \hat{E}_{U_n}(m)$ and $E_{U_n}(m') \approx \hat{E}_{U_n}(m')$,

but $\hat{E}_{U_n}(m) \approx \hat{E}_{U_n}(m')$ since $\hat{E}$ is essentially the same as the $t$-times repetition scheme we analyzed above. Thus by the triangle inequality we can conclude that $E_{U_n}(m) \approx E_{U_n}(m')$ as we desired.

Proof of claim: We prove the claim by the hybrid method. For $j \in \{0,\ldots,t\}$, let $H_j$ be the distribution of ciphertexts where in the first $j$ blocks we act like $\hat{E}$ and in the last $t-j$ blocks we act like $E$. That is, we choose $k_0,\ldots,k_t, k'_1,\ldots,k'_t$ independently at random from $U_n$, and the $i^{th}$ block of $H_j$ is equal to $E'_{k_{i-1}}(k_i, m_i)$ if $i > j$ and is equal to

$E'_{k_{i-1}}(k'_i, m_i)$ if $i \leq j$. Clearly $H_0 = E_{U_n}(m)$ and $H_t = \hat{E}_{U_n}(m)$, and so it suffices to prove that for every $j \in \{1,\ldots,t\}$, $H_{j-1} \approx H_j$. Indeed, suppose towards the sake of contradiction that there exists an efficient $Eve'$ such that

$$\left|\mathbb{E}[Eve'(H_{j-1})] - \mathbb{E}[Eve'(H_j)]\right| \geq \epsilon \qquad (*)$$

where $\epsilon = \epsilon(n)$ is noticeable. By the averaging principle, there exists some fixed choice for $k'_1,\ldots,k'_t, k_0,\ldots,k_{j-2}, k_j,\ldots,k_t$ such that $(*)$ still holds. Note that in this case the only randomness is the choice of $k_{j-1} \leftarrow_R U_n$, and moreover the first $j-1$ blocks and the last $t-j$ blocks of $H_{j-1}$ and $H_j$ would be identical, and we can denote them by $\alpha$ and $\beta$ respectively and hence write $(*)$ as

$$\left|\mathbb{E}_{k_{j-1}}\left[Eve'(\alpha, E'_{k_{j-1}}(k_j, m_j), \beta) - Eve'(\alpha, E'_{k_{j-1}}(k'_j, m_j), \beta)\right]\right| \geq \epsilon. \qquad (**)$$

But now consider the adversary $Eve$ that is defined as $Eve(c) = Eve'(\alpha, c, \beta)$. Then $Eve$ is also efficient and by $(**)$ it can distinguish between $E'_{U_n}(k_j, m_j)$ and $E'_{U_n}(k'_j, m_j)$, thus contradicting the security of $(E', D')$. This concludes the proof of the claim and hence the theorem. ■

2.5.1 Appendix: The computational model

For concreteness sake let us give a precise definition of what it means for a function or probabilistic process $f$ mapping $\{0,1\}^n$ to $\{0,1\}^m$ to be computable using $T$ operations.

• If you have taken any course on computational complexity (such as Harvard CS 121), then this is the model of Boolean circuits, except that we also allow randomization.

• If you have not taken such a course, you might simply take it on faith that it is possible to model what it means for an algorithm to be able to map an input $x$ into an output $f(x)$ using $T$ "elementary operations".

In both cases you might want to skip this appendix and only return to it if you find something confusing.

The model we use is a Boolean circuit that also has a RAND gate that outputs a random bit. We could use as the basic set of gates the standard AND, OR and NOT, but for simplicity we use the one-element set NAND. We represent the circuit as a straightline program, but this is of course just a matter of convenience. As shown (for example) in the CS 121 textbook, these two representations are identical.

Definition 2.17 — Probabilistic straightline program. A probabilistic straightline program consists of a sequence of lines, each one of them one of the following forms:

• foo = NAND(bar, baz) where foo,bar,baz are variable identifiers.

• foo = RAND() where foo is a variable identifier.

Given a program $\pi$, we say that its size is the number of lines it contains. Variables of the form X[$i$] or Y[$j$] are considered input and output variables respectively. If the input variables range from X[$0$] to X[$n-1$] and the output variables range from Y[$0$] to Y[$m-1$] then the program computes the probabilistic process that maps $\{0,1\}^n$ to $\{0,1\}^m$ in the natural way. If $F$ is a (probabilistic or deterministic) map of $\{0,1\}^n$ to $\{0,1\}^m$, the complexity of $F$ is the size of the smallest program $P$ that computes it.

If you haven't taken a class such as CS 121 before, you might wonder how such a simple model captures complicated programs that use loops, conditionals, and more complex data types than simply a bit in $\{0,1\}$, not to mention some special purpose crypto-breaking devices that might involve tailor-made hardware. It turns out that it does (for the same reason we can compile complicated programming languages to run on silicon chips with a very limited instruction set). In fact, as far as we know, this model can capture even computations that happen in nature, whether it's in a bee colony or the human brain (which contains about $10^{10}$ neurons, so should in principle be simulatable by a program that has up to a few orders of magnitude of the same number of lines). Crucially, for cryptography, we care about such programs not because we want to actually run them, but because we want to argue about their non-existence. If we have a process that cannot be computed by a straightline program of length shorter than $2^{128} > 10^{38}$ then it seems safe to say that a computer the size of the human brain (or even all the human and nonhuman brains on this planet) will not be able to perform it either.
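To make Definition 2.17 concrete, here is a tiny Python interpreter for such straightline programs; the tuple encoding of lines is my own invention, purely for illustration. The example program computes the XOR of two input bits using four NAND lines:

```python
import random

def run_program(lines, x, rng=random):
    """Evaluate a probabilistic straightline program (Definition 2.17).
    Each line is ('NAND', dst, a, b) or ('RAND', dst).  Variables X[i]
    are inputs and Y[j] are outputs."""
    env = {f"X[{i}]": bit for i, bit in enumerate(x)}
    for line in lines:
        if line[0] == "NAND":
            _, dst, a, b = line
            env[dst] = 1 - (env[a] & env[b])
        elif line[0] == "RAND":
            env[line[1]] = rng.randrange(2)
    out, j = [], 0
    while f"Y[{j}]" in env:
        out.append(env[f"Y[{j}]"])
        j += 1
    return out

# XOR of two input bits, built from four NAND lines.
xor_prog = [
    ("NAND", "u",    "X[0]", "X[1]"),
    ("NAND", "v",    "X[0]", "u"),
    ("NAND", "w",    "X[1]", "u"),
    ("NAND", "Y[0]", "v",    "w"),
]
```

The size of `xor_prog` is 4, so in this model the complexity of XOR on two bits is at most 4.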

Advanced note: non-uniformity. The computational model we use in this class is non-uniform (corresponding to Boolean circuits) as opposed to uniform (corresponding to Turing machines). If this distinction doesn't mean anything to you, you can ignore it as it won't play a significant role in what we do next. It basically means that we do allow our programs to have hardwired constants of $poly(n)$ bits where $n$ is the input/key length. In fact, to be precise, we will hold ourselves to a higher standard than our adversary, in the sense that we require our

algorithms to be efficient in the stronger sense of being computable in uniform probabilistic polynomial time (for some fixed polynomial, often $O(n)$ or $O(n^2)$), while the adversary is allowed to use non-uniformity.

Quantum computing. An interesting potential exception to this principle that every natural process should be simulatable by a straightline program of comparable complexity are processes where the quantum mechanical notions of interference and entanglement play a significant role. We will talk about this notion of quantum computing towards the end of the course, though note that much of what we say does not really change when we add quantum into the picture. As discussed in the CS 121 text, we can still capture these processes by straightline programs (that now have somewhat more complex form), and so most of what we'll do just carries over in the same way to the quantum realm as long as we are fine with conjecturing the strong form of the cipher conjecture, namely that the cipher is infeasible to break even for quantum computers. All current evidence points toward this strong form being true as well. The field of constructing encryption schemes that are potentially secure against quantum computers is known as post quantum cryptography and we will return to this later in the course.

3 Pseudorandomness

Reading: Katz-Lindell Section 3.3, Boneh-Shoup Chapter 3.$^{1}$

Footnote 1: Edited and expanded by Richard Xu in Spring 2020.

The nature of randomness has troubled philosophers, scientists, statisticians and laypeople for many years. (Footnote: Even lawyers grapple with this question, with a recent example being the debate of whether fantasy football is a game of chance or of skill.) Over the years people have given different answers to the question of what does it mean for data to be random, and what is the nature of probability. The movements of the planets initially looked random and arbitrary, but then early astronomers managed to find order and make some predictions on them. Similarly, we have made great advances in predicting the weather and will probably continue to do so. So, while these days it seems as if the event of whether or not it will rain a week from today is random, we could imagine that in the future we will be able to predict the weather accurately. Even the canonical notion of a random experiment (tossing a coin) might not be as random as you'd think: the second toss will have the same result as the first one with about a 51% chance. (Though see also this experiment.) It is conceivable that at some point someone would discover some

function $F$ that, given the first 100 coin tosses by any given person, can predict the value of the $101^{st}$. (Footnote: In fact such a function must exist in some sense, since in the entire history of the world, presumably no sequence of 100 fair coin tosses has ever repeated.)

In all these examples, the physics underlying the event, whether it's the planets' movement, the weather, or coin tosses, did not change but only our powers to predict them. So to a large extent, randomness is a function of the observer, or in other words

If a quantity is hard to compute, it might as well be random.

Much of cryptography is about trying to make this intuition more formal, and harnessing it to build secure systems. The basic object we want is the following:


Definition 3.1 — Pseudorandom generator (concrete). A function $G: \{0,1\}^n \to \{0,1\}^\ell$ is a $(T, \epsilon)$ pseudorandom generator if $G(U_n) \approx_{T,\epsilon} U_\ell$ where $U_t$ denotes the uniform distribution on $\{0,1\}^t$.

That is, $G$ is a $(T, \epsilon)$ pseudorandom generator if no circuit of at most $T$ gates can distinguish with bias better than $\epsilon$ between the output of $G$ (on a random input) and a uniformly random string of the same length. Spelling this out fully, this means that for every function $D: \{0,1\}^\ell \to \{0,1\}$ computable using at most $T$ operations,

$$\left|\Pr_{x \leftarrow_R \{0,1\}^n}[D(G(x)) = 1] - \Pr_{y \leftarrow_R \{0,1\}^\ell}[D(y) = 1]\right| < \epsilon.$$

As we did for the case of encryption, we will typically use asymptotic terms to describe cryptographic pseudorandom generators. We say that $G$ is simply a pseudorandom generator if it is efficiently computable and it is a $(p(n), 1/p(n))$-pseudorandom generator for every polynomial $p(\cdot)$. In other words, we define pseudorandom generators as follows:

Definition 3.2 — Pseudorandom generator. Let $G: \{0,1\}^* \to \{0,1\}^*$ be some function computable in polynomial time. We say that $G$ is a pseudorandom generator with length function $\ell: \mathbb{N} \to \mathbb{N}$ (where $\ell(n) > n$) if:

• For every $x \in \{0,1\}^*$, $|G(x)| = \ell(|x|)$.

• For every polynomial $p(\cdot)$ and sufficiently large $n$, if $D: \{0,1\}^{\ell(n)} \to \{0,1\}$ is computable by at most $p(n)$ operations, then

$$\left|\Pr[D(G(U_n)) = 1] - \Pr[D(U_{\ell(n)}) = 1]\right| < \frac{1}{p(n)}. \tag{3.1}$$

Another way to say it is that a polynomial-time computable function $G$ mapping $n$ bit strings to $\ell(n) > n$ bit strings is a pseudorandom generator if the two distributions $G(U_n)$ and $U_{\ell(n)}$ are computationally indistinguishable.

P
This definition (as is often the case in cryptography) is a bit long, but the concept of a pseudorandom generator is central to cryptography, and so you should take your time and make sure you understand it. Intuitively, a function $G$ is a pseudorandom generator if (1) it expands its input (mapping $n$ bits to $n+1$ or more) and (2) we cannot distinguish between the output $G(x)$ for a short (i.e., $n$ bit long) random string $x$, often known as the seed of the pseudorandom generator, and a truly random long (i.e., of length $\ell(n)$) string chosen uniformly at random from $\{0,1\}^{\ell(n)}$.

Figure 3.1: A function $G: \{0,1\}^n \to \{0,1\}^{\ell(n)}$ is a pseudorandom generator if $G(x)$ for a random short $x \leftarrow_R \{0,1\}^n$ is computationally indistinguishable from a long truly random $y \leftarrow_R \{0,1\}^{\ell(n)}$.

Note that the requirement that $\ell > n$ is crucial to make this notion non-trivial, as for $\ell = n$ the function $G(x) = x$ clearly satisfies that $G(U_n)$ is identical to (and hence in particular indistinguishable from) the distribution $U_n$. (Make sure that you understand this last statement!) However, for $\ell > n$ this is no longer trivial at all. In particular, if we didn't restrict the running time of $Eve$ then no such pseudorandom generator would exist:

Lemma 3.3 Suppose that $G: \{0,1\}^n \to \{0,1\}^{n+1}$. Then there exists an (inefficient) algorithm $Eve: \{0,1\}^{n+1} \to \{0,1\}$ such that $\mathbb{E}[Eve(G(U_n))] = 1$ but $\mathbb{E}[Eve(U_{n+1})] \leq 1/2$.

Proof. On input $y \in \{0,1\}^{n+1}$, consider the algorithm $Eve$ that goes over all possible $x \in \{0,1\}^n$ and outputs $1$ if and only if $y = G(x)$ for some $x$. Clearly $\mathbb{E}[Eve(G(U_n))] = 1$. However, the set $S = \{G(x) : x \in \{0,1\}^n\}$ on which Eve outputs $1$ has size at most $2^n$, and hence a random $y \leftarrow_R U_{n+1}$ will fall in $S$ with probability at most $1/2$. ■

It is not hard to show that if $P = NP$ then the above algorithm Eve can be made efficient. In particular, at the moment we do not know how to prove the existence of pseudorandom generators. Nevertheless we believe that pseudorandom generators exist, and hence we make the following conjecture:
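Lemma 3.3's brute-force Eve is easy to run for toy parameters. The "generator" below, which just appends a parity bit, is obviously not pseudorandom, but it maps $n$ bits to $n+1$ bits, which is all the counting argument needs:

```python
from itertools import product

def brute_force_eve(G, n):
    """The inefficient Eve of Lemma 3.3: enumerate all 2^n seeds and
    accept y iff y lies in the image of G."""
    image = {tuple(G(x)) for x in product((0, 1), repeat=n)}
    return lambda y: 1 if tuple(y) in image else 0

def G(x):                       # toy map from n bits to n+1 bits
    return list(x) + [sum(x) % 2]

n = 8
eve = brute_force_eve(G, n)
accept_on_outputs = all(eve(G(x)) == 1 for x in product((0, 1), repeat=n))
accept_on_uniform = sum(eve(y) for y in product((0, 1), repeat=n + 1)) / 2 ** (n + 1)
# Eve accepts every G(x), but at most half of all (n+1)-bit strings.
```

The enumeration costs $2^n$ evaluations of $G$, which is exactly why restricting Eve's running time is what makes the definition non-trivial.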

Conjecture (The PRG conjecture): For every $n$, there exists a pseudorandom generator $G$ mapping $n$ bits to $n+1$ bits.

As was the case for the cipher conjecture, and any other conjecture, there are two natural questions regarding the PRG conjecture: why should we believe it and why should we care. Fortunately, the answer to the first question is simple: it is known that the cipher conjecture implies the PRG conjecture, and hence if we believe the former we should believe the latter. (The proof is highly non-trivial and we may not get to see it in this course.) As for the second question, we will see that the PRG conjecture implies a great number of useful crypto- graphic tools, including the cipher conjecture (i.e., the two conjectures are in fact equivalent). We start by showing that once we can get to an output that is one bit longer than the input, we can in fact obtain any polynomial number of bits.

Theorem 3.4 — Length extension for PRGs. Suppose that the PRG conjecture is true. Then for every polynomial $t(n)$, there exists a pseudorandom generator mapping $n$ bits to $t(n)$ bits.

Figure 3.2: Length extension for pseudorandom generators

Proof. The proof of this theorem is very similar to the length extension theorem for ciphers, and in fact this theorem can be used to give an alternative proof for the former theorem.

The construction is illustrated in Fig. 3.2. We are given a pseudorandom generator $G'$ mapping $n$ bits into $n+1$ bits and need to construct a pseudorandom generator $G$ mapping $n$ bits to $t = t(n)$ bits for some polynomial $t(\cdot)$. The idea is that we maintain a state of $n$ bits, which are originally our input seed $s_0$ (because we use a small input to grow a large pseudorandom string, the input to a pseudorandom generator is often known as its seed), and at the $i^{th}$ step we use $G'$ to map $s_{i-1}$ to the $(n+1)$-long bit string $(s_i, y_i)$, output $y_i$ and keep $s_i$ as our new state. To prove the security of this construction we need to show that the distribution $G(U_n) = (y_1,\ldots,y_t)$ is computationally indistinguishable from the uniform distribution $U_t$. As usual, we will use the hybrid argument. For $i \in \{0,\ldots,t\}$ we define $H_i$ to be the distribution where the first $i$ bits are chosen uniformly at random, whereas the last $t-i$ bits are computed as above. Namely, we choose $s_i$ at random in $\{0,1\}^n$ and continue the computation of $y_{i+1},\ldots,y_t$ from the state $s_i$. Clearly $H_0 = G(U_n)$ and $H_t = U_t$, and hence by the triangle inequality it suffices to prove that $H_i \approx H_{i+1}$ for all $i \in \{0,\ldots,t-1\}$. We illustrate these two hybrids in Fig. 3.3.

Figure 3.3: Hybrids $H_i$ and $H_{i+1}$; dotted boxes refer to values that are chosen independently and uniformly at random

Now suppose otherwise, that there exists some adversary $Eve$ such that $|\mathbb{E}[Eve(H_i)] - \mathbb{E}[Eve(H_{i+1})]| \geq \epsilon$ for some non-negligible $\epsilon$. From $Eve$, we will construct an adversary $Eve'$ breaking the security of the pseudorandom generator $G'$ (see Fig. 3.4).

Figure 3.4: Building an adversary $Eve'$ for $G'$ from an adversary $Eve$ distinguishing $H_i$ and $H_{i+1}$. The boxes marked with question marks are those that are random or pseudorandom depending on whether we are in $H_i$ or $H_{i+1}$. Everything inside the dashed red lines is simulated by $Eve'$, which gets as input the $(n+1)$-bit string $(s_{i+1}, y_{i+1})$.

On input a string $y$ of length $n+1$, $Eve'$ will interpret $y$ as $(s_{i+1}, y_{i+1})$ where $s_{i+1} \in \{0,1\}^n$. She then chooses $y_1,\ldots,y_i$ uniformly at random and computes $y_{i+2},\ldots,y_t$ as in our pseudorandom generator's construction. $Eve'$ will then feed $(y_1,\ldots,y_t)$ to $Eve$ and output whatever $Eve$ does. Clearly, $Eve'$ is efficient if $Eve$ is. Moreover, one can see that if $y$ was random then $Eve'$ is feeding $Eve$ with an input distributed according to $H_{i+1}$, while if $y$ was of the form $G'(s)$ for a random $s$ then $Eve'$ will feed $Eve$ with an input distributed according to $H_i$. Hence we get that

$$\left|\mathbb{E}[Eve'(G'(U_n))] - \mathbb{E}[Eve'(U_{n+1})]\right| \geq \epsilon,$$

contradicting the security of $G'$. ■
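The iterated construction from this proof fits in a few lines of Python. The one-bit-expansion $G'$ below is faked with SHA-256 purely so the sketch runs; the theorem, of course, needs a $G'$ that actually satisfies the PRG conjecture:

```python
import hashlib

def G_prime(s: bytes):
    """Stand-in for G': maps an n-byte state to (next n-byte state, one
    output bit).  SHA-256 here is an illustrative assumption only."""
    h = hashlib.sha256(s).digest()
    return h[: len(s)], h[-1] & 1

def G(seed: bytes, t: int) -> list:
    """Length extension (Theorem 3.4): iterate G', emitting one bit y_i
    per step and carrying the state s_i forward."""
    state, bits = seed, []
    for _ in range(t):
        state, y = G_prime(state)
        bits.append(y)
    return bits

stream = G(b"\x00" * 16, 128)   # 128 output bits from a 16-byte seed
```

Emitting one bit per iteration matches the proof exactly; practical designs typically output many bits per step, which corresponds to a $G'$ with larger stretch.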

R Remark 3.5 — Pseudorandom generators in practice. The proof of Theorem 3.4 is indicative of many practical constructions of pseudorandom generators. In many operating systems and programming environments, pseudorandom generators work as follows:

1. Upon initialization, the system obtains an initial seed of randomness $x_0 \in \{0,1\}^n$ (where often $n$ is something like $128$ or $256$).

2. At the $t$-th call to a function such as 'rand' to obtain new randomness, the system uses some underlying pseudorandom generator $G': \{0,1\}^n \to \{0,1\}^{n+m}$ to let $x_t \| y = G'(x_{t-1})$, updates its state to $x_t$ and outputs $y$.

There are often some additional complications on how to obtain this seed from some "unpredictable" or "high entropy" observations (which can sometimes include network latency, user typing and mouse patterns, and more), and whether the state of the system is periodically "refreshed" using additional observations.
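The two steps of the remark translate into a small stateful class. SHA-256 again stands in for the underlying $G'$ (its 32-byte output split into a 16-byte next state $x_t$ and 16 output bytes $y$); this is an illustrative assumption, not the construction any particular operating system actually uses:

```python
import hashlib

class Rand:
    """Stateful PRG pattern of Remark 3.5: keep an n-byte state x_t and,
    on each call, compute x_t || y = G'(x_{t-1}), store x_t, output y."""
    def __init__(self, seed: bytes):
        self.state = seed                 # x_0: the initial seed

    def rand(self, m: int = 16) -> bytes:
        out = b""
        while len(out) < m:
            block = hashlib.sha256(self.state).digest()   # stand-in G'
            self.state, y = block[:16], block[16:]
            out += y
        return out[:m]

rng = Rand(b"some-high-entropy-seed")
a, b = rng.rand(), rng.rand()             # successive outputs differ
```

Because the old state is overwritten on every call, the output stream is determined entirely by the seed, which is why the quality of the initial seed is so critical in practice.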

3.0.1 Unpredictability: an alternative approach for proving the length extension theorem The notion that being random is the same as being “unpredictable”, as discussed at the beginning of this chapter, can be formalized as follows.

Definition 3.6 — Unpredictable function. An efficiently computable func- tion is unpredictable if, for any , and polynomially-sized∗ circuit∗ , 퐺 ∶ {0, 1} → {0, 1} 푛 1 ≤ 푖 < ℓ(푛) Pr 푃

1 푖−1 푖 1 푦←퐺(푈푛)[푃 (푦 , … , 푦 ) = 푦 ] ≤ + 푛푒푔푙(푛). Here, is the length function of and 2 denotes that is a random output of . In other words, no polynomial-sized cir- 푛 ℓ(푛) 퐺 푦 ← 퐺(푈 ) 푦 퐺 pseudorandomness 97

cuit can predict the next bit of the output of given the previous bits significantly better than guessing. 퐺 We now show that the condition for a function to be unpre- dictable is equivalent to the condition for it to be a secure PRG. Please make sure you follow the proof, because it is an important퐺 theorem, and because it is another example of a canonical cryptographic proof. Lemma 3.7 Let be a function with length function , then is a secure PRG∗ iff it∗ is unpredictable. 퐺 ∶ {0, 1} → {0, 1} Proof. For the forward direction, suppose for contradiction that there ℓ(푛) 퐺 exists some and some circuit can predict given with probability for non-negligible . Consider the adversary 푖 1 푖−1 that, given푖 a1 string , runs푃 the circuit on푦 푦 , …, , checks 푦 if 2 the output is푝 equal ≥ + to 휖(푛)and if so output 1. 휖 1 푖−1 퐸푣푒If for a uniform푦 , then succeeds푃 푦 with, … , 푦 probability 푖 . If is uniformly random,푦 then we can imagine that the bit is generated푦 = 퐺(푥)after finished its푥 calculation.푃 The bit is or with equal 푖 푝probability,푦 so succeeds with probability . Since outputs푦 1 푖 when succeeds,푃 1 푦 0 1 푃 2 퐸푣푒 푃Pr Pr

$$\left| \Pr[Eve(G(U_n)) = 1] - \Pr[Eve(U_\ell) = 1] \right| = \left| p - \frac{1}{2} \right| \geq \epsilon(n),$$

a contradiction.

For the backward direction, let $G$ be an unpredictable function. Let $H_i$ be the distribution where the first $i$ bits come from $G(U_n)$ while the last $\ell - i$ bits are all random. Notice that $H_0 = U_\ell$ and $H_\ell = G(U_n)$, so it suffices to show that $H_{i-1} \approx H_i$ for all $i$.

Suppose $H_{i-1} \not\approx H_i$ for some $i$, i.e., there exists some $Eve$ and non-negligible $\epsilon$ such that

$$\Pr[Eve(H_i) = 1] - \Pr[Eve(H_{i-1}) = 1] > \epsilon(n).$$

Consider the program $P$ that, on input $(y_1, \ldots, y_{i-1})$, picks the bits $\hat{y}_i, \ldots, \hat{y}_\ell$ uniformly at random. Then, $P$ calls $Eve$ on the generated input. If $Eve$ outputs $1$ then $P$ outputs $\hat{y}_i$, and otherwise it outputs $1 - \hat{y}_i$.

The string $(y_1, \ldots, y_{i-1}, \hat{y}_i, \ldots, \hat{y}_\ell)$ has the same distribution as $H_{i-1}$. However, conditioned on $\hat{y}_i = y_i$, the string has distribution equal to $H_i$. Let $p$ be the probability that $Eve$ outputs $1$ if $\hat{y}_i = y_i$ and $q$ be the same probability when $\hat{y}_i \neq y_i$; then we get

$$p - \frac{1}{2}(p + q) = \Pr[Eve(H_i) = 1] - \Pr[Eve(H_{i-1}) = 1] > \epsilon(n).$$

Therefore, the probability that $P$ outputs the correct value $y_i$ is equal to $\frac{1}{2}p + \frac{1}{2}(1 - q) = \frac{1}{2} + \frac{1}{2}(p - q) > \frac{1}{2} + \epsilon(n)$, a contradiction. ■

The definition of unpredictability is useful because many of our candidates for pseudorandom generators appeal to the unpredictabil- ity definition in their proofs. For example, the Blum-Blum-Shub gen- erator we will see later in the chapter is proved to be unpredictable if the “quadratic residuosity problem” is hard. It is also nice to know that our intuition at the beginning of the chapter can be formalized.

3.1 STREAM CIPHERS

We now show a connection between pseudorandom generators and encryption schemes:

Theorem 3.8 — PRG conjecture implies Cipher conjecture. If the PRG conjecture is true then so is the cipher conjecture.

It turns out that the converse direction is also true, and hence these two conjectures are equivalent. We will probably not show the (quite non-trivial) proof of this fact in this course. (We might show a weaker version though.)

Proof. Recall that the one time pad is a perfectly secure cipher, but its only problem was that to encrypt an $n+1$ bit long message it needed an $n+1$ bit long key. Now using a pseudorandom generator, we can map an $n$-bit long key into an $n+1$-bit long string that looks random enough that we could use it as a key for the one-time pad. That is, our cipher will look as follows:

$$E_k(m) = G(k) \oplus m \quad \text{and} \quad D_k(c) = G(k) \oplus c.$$

Just like in the one time pad, $D_k(E_k(m)) = G(k) \oplus G(k) \oplus m = m$. Moreover, the encryption and decryption algorithms are clearly efficient. We will prove security of this encryption by showing the stronger claim that $E_{U_n}(m) \approx U_{n+1}$ for any $m$. Notice that $U_{n+1} = U_{n+1} \oplus m$, as we showed in the security of the one-time pad. Suppose that for some non-negligible $\epsilon = \epsilon(n) > 0$ there is an efficient adversary $Eve'$ such that

$$\left| \mathbb{E}[Eve'(G(U_n) \oplus m)] - \mathbb{E}[Eve'(U_{n+1} \oplus m)] \right| \geq \epsilon .$$

Then the adversary $Eve$ defined as $Eve(y) = Eve'(y \oplus m)$ would also be efficient. Furthermore, if $y$ is pseudorandom then $Eve(y) = Eve'(G(U_n) \oplus m)$, and if $y$ is uniformly random then $Eve(y) = Eve'(U_{n+1} \oplus m)$. Then $Eve$ can distinguish the two distributions $G(U_n)$ and $U_{n+1}$ with advantage $\epsilon$, a contradiction. ■

If the PRG outputs $t(n)$ bits instead of $n+1$ then we automatically get an encryption scheme with $t(n)$ long message length. In fact, in practice, if we use the length extension for PRG's, we don't need to decide on the length of messages in advance. Every time we need to encrypt another bit (or another block) $m_i$ of the message, we run the basic PRG to update our state and obtain some new randomness $y_i$ that we can XOR with the message block and output. Such constructions are known as stream ciphers in the literature. In much of the practical literature, the name stream cipher is used both for the pseudorandom generator itself as well as for the encryption scheme that is obtained by combining it with the one-time pad.
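The "update the state, emit a keystream block, XOR with the message" structure can be sketched in a few lines of Python. SHA-256 is used below only as a stand-in for the basic PRG; real stream ciphers such as ChaCha use dedicated designs, so treat this purely as an illustration of the structure:

```python
import hashlib

def keystream_blocks(key: bytes, nblocks: int):
    """Run the basic 'PRG' repeatedly: each step maps the current
    state to a new state and a 32-byte keystream block."""
    state = key
    for _ in range(nblocks):
        state = hashlib.sha256(b"state" + state).digest()
        yield hashlib.sha256(b"output" + state).digest()

def xor_with_keystream(key: bytes, msg: bytes) -> bytes:
    nblocks = (len(msg) + 31) // 32
    ks = b"".join(keystream_blocks(key, nblocks))[:len(msg)]
    return bytes(m ^ k for m, k in zip(msg, ks))

# As in the one-time pad, encryption and decryption are the same XOR.
key = b"0" * 16
ct = xor_with_keystream(key, b"attack at dawn")
assert xor_with_keystream(key, ct) == b"attack at dawn"
```

Note that, exactly as in the one-time pad, reusing the same key (and hence the same keystream) for two messages would be fatal.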

R Remark 3.9 — Using pseudorandom generators for coin tossing over the phone. The following is a cute application of pseudorandom generators. Alice and Bob want to toss a fair coin over the phone. They use a pseudorandom generator $G : \{0,1\}^n \to \{0,1\}^{3n}$.

1. Alice will send $z \leftarrow_R \{0,1\}^{3n}$ to Bob.

2. Bob picks $s \leftarrow_R \{0,1\}^n$ and $b \leftarrow_R \{0,1\}$. If $b = 0$ then Bob sends $y = G(s)$, and if $b = 1$ he sends $y = G(s) \oplus z$. In other words, $y = G(s) \oplus b \cdot z$ where $b \cdot z$ is the vector $(b \cdot z_1, \ldots, b \cdot z_{3n})$.

3. Alice then picks a random $b' \leftarrow_R \{0,1\}$ and sends it to Bob.

4. Bob sends to Alice the string $s$ and the bit $b$. Alice verifies that indeed $y = G(s) \oplus b \cdot z$. Otherwise Alice aborts.

5. The output of the protocol is $b \oplus b'$.

It can be shown that (assuming the protocol is completed) the output is a random coin, which neither Alice nor Bob can control or predict with more than negligible advantage over half. Trying to formalize this and prove it is an excellent exercise. Two main components in the proofs are:

• With probability $1 - negl(n)$ over $z \leftarrow_R \{0,1\}^{3n}$, the sets $S_0 = \{G(x) \mid x \in \{0,1\}^n\}$ and $S_1 = \{G(x) \oplus z \mid x \in \{0,1\}^n\}$ will be disjoint. Hence by choosing $z$ at random, Alice can ensure that Bob is committed to the choice of $b$ after sending $y$.

• For every $z$, both the distributions $G(U_n)$ and $G(U_n) \oplus z$ are pseudorandom. This can be shown to imply that no matter what string $z$ Alice chooses, she cannot predict $b$ from the string $y$ sent by Bob with probability better than $1/2 + negl(n)$. Hence her choice of $b'$ will be essentially independent of $b$.
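The message flow of the remark's protocol can be simulated directly. The sketch below uses a toy length-tripling "PRG" built from SHA-256 (an illustrative stand-in; any concrete $G$ mapping $n$ bits to $3n$ bits would do) just to exercise the five steps:

```python
import secrets, hashlib

N = 16  # seed length in bytes, so G outputs 3*N bytes

def G(s: bytes) -> bytes:
    # toy stand-in for a length-tripling PRG
    return b"".join(hashlib.sha256(bytes([i]) + s).digest()[:N]
                    for i in range(3))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Step 1: Alice sends a random z.
z = secrets.token_bytes(3 * N)
# Step 2: Bob commits to b by sending y = G(s) xor (b * z).
s, b = secrets.token_bytes(N), secrets.randbelow(2)
y = G(s) if b == 0 else xor(G(s), z)
# Step 3: Alice sends her own random bit.
b_prime = secrets.randbelow(2)
# Step 4: Bob reveals (s, b); Alice checks the commitment.
assert y == (G(s) if b == 0 else xor(G(s), z))
# Step 5: the shared coin.
coin = b ^ b_prime
assert coin in (0, 1)
```

Of course this simulation runs both parties honestly in one process; the security argument in the remark is about what a cheating Alice or Bob could do across the network.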

3.2 WHAT DO PSEUDORANDOM GENERATORS ACTUALLY LOOK LIKE?

So far we have made the conjectures that objects such as ciphers and pseudorandom generators exist, without giving any hint as to how they would actually look. (Though we have examples such as the Caesar cipher, Vigenere, and Enigma of what secure ciphers don't look like.) As mentioned above, we do not know how to prove that any particular function is a pseudorandom generator. However, there are quite simple candidates (i.e., functions that are conjectured to be secure pseudorandom generators), though care must be taken in constructing them.

We now consider candidates for functions that map $n$ bits to $n+1$ bits (or more generally to $n+c$ bits for some constant $c$) and look at least somewhat "randomish". As these constructions are typically used as a basic component for obtaining a longer length PRG via the length extension theorem (Theorem 3.4), we will think of these pseudorandom generators as mapping a string $s \in \{0,1\}^n$ representing the current state into a string $s' \in \{0,1\}^n$ representing the new state as well as a string $b \in \{0,1\}^c$ representing the current output. See also Section 6.1 in Katz-Lindell and (for greater depth) Sections 3.6-3.9 in the Boneh-Shoup book.

3.2.1 Attempt 0: The counter generator

To get started, let's look at an example of an obviously bogus pseudorandom generator. We define the "counter pseudorandom generator" $G : \{0,1\}^n \to \{0,1\}^{n+1}$ as follows: $G(s) = (s', b)$ where $s' = s + 1 \mod 2^n$ (treating $s$ and $s'$ as numbers in $\{0, \ldots, 2^n - 1\}$) and $b$ is the least significant digit of $s'$. It's a great exercise to work out why this is not a secure pseudorandom generator.
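For concreteness, the counter generator is a direct transcription into Python (shown only to make the exercise below concrete; working out the attack is left to you):

```python
def counter_gen(s: int, n: int):
    """G(s) = (s', b): s' = s + 1 mod 2^n, and b is the least
    significant bit of the new state s'."""
    s_new = (s + 1) % (2 ** n)
    return s_new, s_new & 1
```

Try running it on a few consecutive states and looking at the sequence of output bits.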

P You should really pause here and make sure you see why the "counter pseudorandom generator" is not a secure pseudorandom generator. Show that this is true even if we replace the least significant digit by the $k$-th digit for every $0 \leq k < n$.

3.2.2 Attempt 1: The linear checksum / linear feedback shift register (LFSR)

LFSRs can be thought of as the "mother" (or maybe more like the sick great-uncle) of all pseudorandom generators. One of the simplest ways to generate a "randomish" extra digit given an $n$ digit number is to use a checksum - some linear combination of the digits - with a canonical example being the cyclic redundancy check or CRC.⁶ This motivates the notion of a linear feedback shift register generator (LFSR):

⁶ CRCs are often used to generate a "control digit" to detect mistypes of credit card or social security card numbers. This has very different goals than its use for pseudorandom generators, though there are some common intuitions behind the two usages.

if the current state is $s \in \{0,1\}^n$ then the output is $f(s)$ where $f$ is a linear function (modulo 2), and the new state $s'$ is obtained by right shifting the previous state and putting $f(s)$ at the leftmost location. That is, $s'_1 = f(s)$ and $s'_i = s_{i-1}$ for $i \in \{2, \ldots, n\}$.

LFSR's have several good properties - if the function $f(\cdot)$ is chosen properly then they can have very long periods (i.e., it can take an exponential number of steps until the state repeats itself), though that also holds for the simple "counter" generator we saw above. They also have the property that every individual output bit is equal to $0$ or $1$ with probability exactly half (the counter generator also shares this property).

A more interesting property is that (if the function is selected properly) every two coordinates are independent from one another. That is, there is some super-polynomial function $t(n)$ (in fact $t(n)$ can be exponential in $n$) such that if $\ell \neq \ell' \in \{0, \ldots, t(n)\}$, then if we look at the two random variables corresponding to the $\ell$-th and $\ell'$-th output of the generator (where the randomness is the initial state) then they are distributed like two independent random coins. (This is non-trivial to show, and depends on the choice of $f$ - it is a challenging but useful exercise to work this out.) The counter generator fails badly at this condition: the least significant bits between two consecutive states always flip.

There is a more general notion of a linear generator where the new state can be any invertible linear transformation of the previous state.
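To make the basic LFSR step above concrete (before we turn to the more general linear generators), here is a minimal sketch in Python, with the linear function $f$ given by a set of "tap" positions; the taps below are an arbitrary example, not a recommended choice:

```python
def lfsr_step(state, taps):
    """One LFSR step. state: list of n bits; the output f(s) is the
    XOR of the tapped bits; the new state is the old state shifted
    right with f(s) placed at the leftmost location."""
    out = 0
    for t in taps:
        out ^= state[t]
    new_state = [out] + state[:-1]
    return new_state, out

# run a 4-bit LFSR for a few steps (taps chosen arbitrarily)
state = [1, 0, 0, 1]
bits = []
for _ in range(8):
    state, b = lfsr_step(state, taps=[0, 3])
    bits.append(b)
```

Choosing the taps so that the corresponding polynomial is primitive is what gives the long-period and pairwise-independence properties mentioned above; arbitrary taps, like the ones here, need not achieve them.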
That is, we interpret the state $s$ as an element of $\mathbb{Z}_q^t$ for some integers $q, t$,⁷ and let $s' = F(s)$ and the output $b = G(s)$, where $F : \mathbb{Z}_q^t \to \mathbb{Z}_q^t$ and $G : \mathbb{Z}_q^t \to \mathbb{Z}_q$ are invertible linear transformations (modulo $q$). This includes as a special case the linear congruential generator where $t = 1$ and the map $F(s)$ corresponds to taking $a \cdot s \pmod{q}$ where $a$ is a number co-prime to $q$.

⁷ A ring is a set of elements where addition and multiplication are defined and obey the natural rules of associativity and commutativity (though without necessarily having a multiplicative inverse for every element). For every integer $q$ we define $\mathbb{Z}_q$ (known as the ring of integers modulo $q$) to be the set $\{0, \ldots, q-1\}$ where addition and multiplication are done modulo $q$.

All these generators are unfortunately insecure due to the great bane of cryptography - the Gaussian elimination algorithm⁸ which students typically encounter in any linear algebra class.

⁸ Despite the name, the algorithm goes at least as far back as the Chinese Jiuzhang Suanshu manuscript, circa 150 B.C.

Theorem 3.10 — The unfortunate theorem for cryptography. There is a polynomial time algorithm to solve $m$ linear equations in $n$ variables (or to certify no solution exists) over any ring.

Despite its seeming simplicity and ubiquity, Gaussian elimination (and some generalizations and related algorithms such as Euclid's extended g.c.d algorithm and the LLL lattice reduction algorithm) has been used time and again to break candidate cryptographic constructions. In particular, if we look at the first $n$ outputs $b_1, \ldots, b_n$ of a linear generator then we can write $n$ linear equations in the unknown

initial state, of the form $f_1(s) = b_1, \ldots, f_n(s) = b_n$, where the $f_i$'s are known linear functions. Either those functions are linearly independent, in which case we can solve the equations to get the unique solution for the original state $s$, from which point we can predict all outputs of the generator, or they are dependent, which means that we can predict some of the outputs even without recovering the original state. Either way, the generator is $*♯!$'ed (where $*♯!$ refers to whatever verb you prefer to use when your system is broken). See also this 1977 paper of James Reed.
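To see the "unfortunate theorem" in action, here is a sketch of breaking a multiplicative congruential generator $s_{t+1} = a \cdot s_t \pmod q$ from just two consecutive outputs, by solving the single linear equation $a \cdot s_1 = s_2 \pmod q$ via modular inversion. The parameters below are toy values chosen only for illustration:

```python
q = 2 ** 31 - 1        # a prime modulus (toy choice)
a_secret = 16807       # multiplier, unknown to the attacker

def step(s: int) -> int:
    return (a_secret * s) % q

# The attacker observes two consecutive states.
s1 = 123456789
s2 = step(s1)

# Solve a * s1 = s2 (mod q) for the single unknown a:
a_recovered = (s2 * pow(s1, -1, q)) % q
assert a_recovered == a_secret

# Now every future output is predictable.
assert (a_recovered * s2) % q == step(s2)
```

The same idea scales up: a linear generator over $\mathbb{Z}_q^t$ yields a system of linear equations that Gaussian elimination solves in polynomial time. (The three-argument `pow` with exponent `-1` computes a modular inverse in Python 3.8+.)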

R Remark 3.11 — Non-cryptographic PRGs. The above means that it is a bad idea to use a linear checksum as a pseudorandom generator in a cryptographic application, and in fact in any adversarial setting (e.g., one shouldn't hope that an attacker would not be able to reverse engineer the algorithm⁹ that computes the control digit of a credit card number). However, that does not mean that there are no legitimate cases where linear generators can be used. In a setting where the application is not adversarial and you have the ability to test if the generator is actually successful, it might be reasonable to use such insecure non-cryptographic generators. They tend to be more efficient (though often not by much) and hence are often the default option in many programming environments such as the C rand() command. (In fact, the real bottleneck in using cryptographic pseudorandom generators is often the generation of entropy for their seed, as discussed in the previous lecture, and not their actual running time.)

⁹ That number is obtained by applying an algorithm of Hans Peter Luhn which applies a simple map to each digit of the card and then sums them up modulo 10.

3.2.3 From insecurity to security

It is often the case that we want to "fix" a broken cryptographic primitive, such as a pseudorandom generator, to make it secure. At the moment this is still more of an art than a science, but there are some principles that cryptographers have used to try to make this more principled. The main intuition is that there are certain properties of computational problems that make them more amenable to algorithms (i.e., "easier"), and when we want to make the problems useful for cryptography (i.e., "hard") we often seek variants that don't possess these properties. The following table illustrates some examples of such properties. (These are not formal statements, but rather are intended to give some intuition.)

Easy          Hard
----          ----
Continuous    Discrete
Convex        Non-convex
Linear        Non-linear (degree ≥ 2)
Noiseless     Noisy
Local         Global
Shallow       Deep
Low degree    High degree

Many cryptographic constructions can be thought of as trying to transform an easy problem into a hard one by moving from the left to the right column of this table. The discrete logarithm problem is the discrete version of the continuous real logarithm problem. The learning with errors problem can be thought of as the noisy version of the linear equations problem (or the discrete version of least squares minimization). When constructing block ciphers we often have mixing transformations to ensure that the dependency structure between different bits is global, S-boxes to ensure non-linearity, and many rounds to ensure deep structure and large algebraic degree.

This also works in the other direction. Many algorithmic and machine learning advances work by embedding a discrete problem in a continuous convex one. Some attacks on cryptographic objects can be thought of as trying to recover some of the structure (e.g., by embedding modular arithmetic in the real line or "linearizing" non-linear equations).

3.2.4 Attempt 2: Linear Congruential Generators with dropped bits

One approach that is widely used in implementations of pseudorandom generators is to take a linear generator such as the linear congruential generators described above, and use for the output a "chopped" version of the linear function, dropping some of the least significant bits. The operation of dropping these bits is non-linear and hence the attack above does not immediately apply. Nevertheless, it turns out this attack can be generalized to handle this case, and hence even with dropped bits Linear Congruential Generators are completely insecure and should be used (if at all) only in applications such as simulations where there is no adversary. Section 3.7.1 in the Boneh-Shoup book describes one attack against such generators that uses the notion of lattice algorithms that we will encounter later in this course in very different contexts.
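A sketch of such a "chopped" generator is below; the multiplier and increment are the constants commonly attributed to Knuth's MMIX LCG, used here purely as an example of typical parameters (this is exactly the kind of generator the lattice attacks mentioned above break):

```python
M = 2 ** 64
A = 6364136223846793005   # multiplier (Knuth's MMIX LCG constants,
C = 1442695040888963407   # increment - used only for illustration)

def truncated_lcg(state: int, nsteps: int, drop: int = 32):
    """Advance the LCG nsteps times, outputting only the top bits of
    each state. Dropping the low bits makes the output non-linear in
    the state, but lattice techniques still recover the state."""
    out = []
    for _ in range(nsteps):
        state = (A * state + C) % M
        out.append(state >> drop)   # drop the 32 least significant bits
    return state, out

state, outputs = truncated_lcg(2021, 5)
assert all(0 <= y < 2 ** 32 for y in outputs)
```

Such generators are fine for simulations, where the only requirement is statistical quality, but as the text stresses they must never be used against an adversary.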

3.3 SUCCESSFUL EXAMPLES

Let’s now describe some successful (at least per current knowledge) pseudorandom generators:

3.3.1 Case Study 1: Subset Sum Generator

Here is an extremely simple generator that is yet still secure¹⁰ as far as we know.

¹⁰ Actually modern computers will be able to break this generator via brute force, but if the length and number of the constants were doubled (or perhaps quadrupled) this should be sufficiently secure, though longer to write down.

from functools import reduce

# seed is a list of 40 zero/one values
# output is a 48 bit integer
def subset_sum_gen(seed):
    modulo = 0x1000000
    constants = [
        0x3D6EA1, 0x1E2795, 0xC802C6, 0xBF742A, 0x45FF31,
        0x53A9D4, 0x927F9F, 0x70E09D, 0x56F00A, 0x78B494,
        0x9122E7, 0xAFB10C, 0x18C2C8, 0x8FF050, 0x0239A3,
        0x02E4E0, 0x779B76, 0x1C4FC2, 0x7C5150, 0x81E05E,
        0x154647, 0xB80E68, 0xA042E5, 0xE20269, 0xD3B7F3,
        0xCC5FB9, 0x0BFC55, 0x847AE0, 0x8CFDF8, 0xE304B7,
        0x869ACE, 0xB4CDAB, 0xC8E31F, 0x00EDC7, 0xC50541,
        0x0D6DDD, 0x695A2F, 0xA81062, 0x0123CA, 0xC6C5C3 ]

    # return the modular sum of the constants
    # corresponding to ones in the seed
    return reduce(lambda x,y: (x+y) % modulo,
                  map(lambda a,b: a*b, constants, seed))

The seed to this generator is an array seed of 40 bits, with 40 hard-wired constants each 48 bits long (these constants were generated at random, but are fixed once and for all, and are not kept secret, and hence are not considered part of the secret random seed). The output is simply

$$\sum_{i=1}^{40} \text{seed}[i] \cdot \text{constants}[i] \pmod{2^{48}},$$

and hence it expands the $40$ bit input into a $48$ bit output. This generator is loosely motivated by the "subset sum" computational problem, which is NP hard. However, since NP hardness is a worst case notion of complexity, it does not imply security for pseudorandom generators, which requires hardness of an average case variant. To get some intuition for its security, we can work out why (given that it seems to be linear) we cannot break it by simply using Gaussian elimination.

P This is an excellent point for you to stop and try to answer this question on your own.

Given the known constants and known output, figuring out the set of potential seeds can be thought of as solving a single equation in 40 variables. However, this equation is clearly underdetermined, and will have a solution regardless of whether the observed value is indeed an output of the generator, or it is chosen uniformly at random. More concretely, we can use linear-equation solving to compute (given the known constants $c_1, \ldots, c_{40} \in \mathbb{Z}_{2^{48}}$ and the output $y \in \mathbb{Z}_{2^{48}}$) the linear subspace $V$ of all vectors $(s_1, \ldots, s_{40}) \in (\mathbb{Z}_{2^{48}})^{40}$ such that $\sum s_i c_i = y \pmod{2^{48}}$. But, regardless of whether $y$ was generated at random from $\mathbb{Z}_{2^{48}}$, or $y$ was generated as an output of the generator, the subspace $V$ will always have the same dimension (specifically, since it is formed by a single linear equation over $40$ variables, the dimension will be $39$). To break the generator we seem to need to be able to decide whether this linear subspace $V \subseteq (\mathbb{Z}_{2^{48}})^{40}$ contains a Boolean vector (i.e., a vector $s \in \{0,1\}^{40}$). Since the condition that a vector is Boolean is not defined by linear equations, we cannot use Gaussian elimination to break the generator. Generally, the task of finding a vector with small coefficients inside a discrete linear subspace is closely related to a classical problem known as finding the shortest vector in a lattice. (See also the short integer solution (SIS) problem.)

3.3.2 Case Study 2: RC4

The following is another example of an extremely simple generator known as RC4 (this stands for Rivest Cipher 4, as Ron Rivest invented this in 1987), and it is still fairly widely used today.

def RC4(P,i,j):
    i = (i + 1) % 256
    j = (j + P[i]) % 256
    P[i], P[j] = P[j], P[i]
    return (P,i,j,P[(P[i]+P[j]) % 256])

The function RC4 takes as input the current state P,i,j of the generator and returns the new state together with a single output byte. The state of the generator consists of an array P of 256 bytes, which can be thought of as a permutation of the numbers $0, \ldots, 255$ in the sense that we maintain the invariant that $P[i] \neq P[j]$ for every $i \neq j$, and two indices $i, j \in \{0, \ldots, 255\}$. We can consider the initial state as the case where P is a completely random permutation and $i$ and $j$ are initialized to zero, although to save on initial seed size, typically RC4 uses some "pseudorandom" way to generate P from a shorter seed as well.

RC4 has extremely efficient software implementations and hence has been widely implemented. However, it has several issues with its security. In particular it was shown by Mantin and Shamir¹¹ that the second bit of RC4 is not random, even if the initialization was random. This and other issues led to a practical attack on the 802.11b WiFi protocol, see Section 9.9 in Boneh-Shoup. The initial response to those attacks was to suggest to drop the first 1024 bytes of the output, but by now the attacks have been sufficiently extended that RC4 is simply not considered a secure cipher anymore. The ciphers Salsa and ChaCha, designed by Dan Bernstein, have a similar design to RC4, and are considered secure and deployed in several standard protocols such as TLS, SSH and QUIC; see Section 3.6 in Boneh-Shoup.

¹¹ I typically do not include references in these lecture notes, and leave them to the texts, but I make here an exception because Itsik Mantin was a close friend of mine in grad school.
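To see the generator in motion, the sketch below repeats the RC4 round function from above and drives it from a random initial permutation, using Python's `random.shuffle` in place of RC4's real key-scheduling algorithm (which is not shown in these notes):

```python
import random

def RC4(P, i, j):
    i = (i + 1) % 256
    j = (j + P[i]) % 256
    P[i], P[j] = P[j], P[i]
    return (P, i, j, P[(P[i] + P[j]) % 256])

# stand-in for the key schedule: a random permutation of 0..255
P = list(range(256))
random.shuffle(P)
i = j = 0

keystream = []
for _ in range(16):
    P, i, j, out = RC4(P, i, j)
    keystream.append(out)

assert sorted(P) == list(range(256))     # P stays a permutation
assert all(0 <= b < 256 for b in keystream)
```

Note how each round only swaps two entries of P, which is why the permutation invariant is preserved; the Mantin-Shamir bias mentioned above shows that this simple update is nonetheless not good enough.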

3.3.3 Case Study 3: Blum, Blum and Shub

B.B.S., which stands for the authors Blum, Blum and Shub, is a simple generator constructed from a potentially hard problem in number theory. Let $N = P \cdot Q$, where $P, Q$ are primes. (We will generally use $P, Q$ of size roughly $2^n$, where $n$ is our security parameter, and so use capital letters to emphasize that the magnitude of these numbers is exponential in the security parameter.) We define $QR_N$ to be the set of quadratic residues modulo $N$, which are the numbers that have a modular square root. Formally,

$$QR_N = \{ X^2 \bmod N \mid \gcd(X, N) = 1 \}.$$

This definition extends the concept of "perfect squares" when we are working with standard integers. Notice that each number $Y \in QR_N$ has at least one square root (a number $X$ such that $Y = X^2 \bmod N$). We will see later in the course that if $N = P \cdot Q$ for primes $P, Q$ then each $Y \in QR_N$ has exactly $4$ square roots. The B.B.S. generator chooses $N = P \cdot Q$, where $P, Q$ are prime and $P, Q \equiv 3 \pmod{4}$. The second condition guarantees that for each $Y \in QR_N$, exactly one of its square roots falls in $QR_N$, and hence the map $X \mapsto X^2$ is a one-to-one and onto map from $QR_N$ to itself. It is defined as follows:

def BBS(X):
    return (X * X % N, X % 2)

In other words, on input $X$, BBS outputs $X^2 \bmod N$ and the least significant bit of $X$. We can think of BBS as a map $BBS : QR_N \to QR_N \times \{0,1\}$, and so it maps a domain into a larger domain. We can also extend it to output $t$ additional bits, by repeatedly squaring the input, letting $X_0 = X$ and $X_{i+1} = X_i^2 \bmod N$ for $i = 0, \ldots, t-1$, and outputting $X_t$ together with the least significant bits of $X_0, \ldots, X_{t-1}$.

It turns out that, assuming there is no polynomial-time algorithm (where "polynomial-time" means polynomial in the number of bits needed to represent $N$, i.e., polynomial in $\log N$) to factor randomly chosen integers $N = P \cdot Q$, for every $t$ that is polynomial in the number of bits of $N$, the output of the $t$-step BBS generator will be computationally indistinguishable from $U_{QR_N} \times U_t$, where $U_{QR_N}$ denotes the uniform distribution over $QR_N$. The number theory required to show this takes a while to develop. However, it is interesting and I recommend the reader to search up this particular generator; see for example this survey by Junod.
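The $t$-step extension can be written out directly. The primes below are tiny toy values (both congruent to $3 \pmod 4$), chosen only so the example runs instantly; real instantiations use primes of cryptographic size:

```python
from math import gcd

P, Q = 7, 11          # toy primes, both congruent to 3 mod 4
N = P * Q

def bbs_extend(X0: int, t: int):
    """Output X_t together with the least significant bits of
    X_0, ..., X_{t-1}, where X_{i+1} = X_i^2 mod N."""
    X, bits = X0 % N, []
    for _ in range(t):
        bits.append(X & 1)
        X = (X * X) % N
    return X, bits

# start from a quadratic residue: 3^2 mod 77 = 9
final_state, bits = bbs_extend(9, 8)
assert all(b in (0, 1) for b in bits)

# sanity check: squaring is a bijection on QR_N when P,Q = 3 mod 4
QR = {x * x % N for x in range(1, N) if gcd(x, N) == 1}
assert {y * y % N for y in QR} == QR
```

The last assertion checks, for this toy modulus, exactly the property stated above: since $P, Q \equiv 3 \pmod 4$, squaring permutes $QR_N$.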

3.4 NON-CONSTRUCTIVE EXISTENCE OF PSEUDORANDOM GEN- ERATORS

We now show that, if we don’t insist on constructivity of pseudoran- dom generators, then there exists pseudorandom generators with output that are exponentially larger than the input length. Lemma 3.12 — Existence of inefficient pseudorandom generators. There is some absolute constant such that for every , if log log and , then there is an pseudorandom generator .퐶 휖, 푇 ℓ > 퐶( 푇 + (1/휖)) 푚ℓ ≤ 푇 푚 (푇 , 휖) 퐺 ∶ Proof{0, 1} Idea:→ {0, 1} The proof uses an extremely useful technique known as the “prob- abilistic method” which is not too hard mathematically but can be 12 There is a whole (highly recommended) book by confusing at first.12 The idea is to give a “non constructive” proof of Alon and Spencer devoted to this method. existence of the pseudorandom generator by showing that if was chosen at random, then the probability that it would be a valid pseudorandom generator is positive. In particular퐺 this means that퐺 there exists a single that is a valid pseudorandom generator.(푇 , 휖) The probabilistic method is just a proof technique to demonstrate the existence of such a function.퐺 Ultimately,(푇 , 휖) our goal is to show the ex- istence of a deterministic function that satisfies the conditions of a PRG. 퐺 (푇 , 휖) The above discussion might be rather abstract at this point, but ⋆ would become clearer after seeing the proof.

Proof of Lemma 3.12. Let $\epsilon, T, \ell, m$ be as in the lemma's statement. We need to show that there exists a function $G : \{0,1\}^\ell \to \{0,1\}^m$ that "fools" every $T$ line program $P$ in the sense of (3.1). We will show that this follows from the following claim:

Claim I: For every fixed NAND program / Boolean circuit $P$, if we pick $G : \{0,1\}^\ell \to \{0,1\}^m$ at random then the probability that (3.1) is violated is at most $2^{-T^2}$.

Before proving Claim I, let us see why it implies Lemma 3.12. We can identify a function $G : \{0,1\}^\ell \to \{0,1\}^m$ with its "truth table", or simply the list of evaluations on all its $2^\ell$ possible inputs. Since each output is an $m$ bit string, we can also think of $G$ as a string in $\{0,1\}^{m \cdot 2^\ell}$. We define $\mathcal{F}^m_\ell$ to be the set of all functions from $\{0,1\}^\ell$ to $\{0,1\}^m$. As discussed above we can identify $\mathcal{F}^m_\ell$ with $\{0,1\}^{m \cdot 2^\ell}$, and choosing a random function $G \sim \mathcal{F}^m_\ell$ corresponds to choosing a random $m \cdot 2^\ell$-long bit string.

For every NAND program / Boolean circuit $P$, let $B_P$ be the event that, if we choose $G$ at random from $\mathcal{F}^m_\ell$, then (3.1) is violated with respect to the program $P$. It is important to understand what is the sample space that the event $B_P$ is defined over, namely this event depends on the choice of $G$, and so $B_P$ is a subset of $\mathcal{F}^m_\ell$. An equivalent way to define the event $B_P$ is that it is the subset of all functions mapping $\{0,1\}^\ell$ to $\{0,1\}^m$ that violate (3.1), or in other words:

$$B_P = \left\{ G \in \mathcal{F}^m_\ell \;\middle|\; \left| \frac{1}{2^\ell} \sum_{s \in \{0,1\}^\ell} P(G(s)) - \frac{1}{2^m} \sum_{r \in \{0,1\}^m} P(r) \right| > \epsilon \right\} \tag{3.2}$$

(We've replaced here the probability statements in (3.1) with the equivalent sums so as to reduce confusion as to what is the sample space that $B_P$ is defined over.)

To understand this proof it is crucial that you pause here and see how the definition of $B_P$ above corresponds to (3.2). This may well take re-reading the above text once or twice, but it is a good exercise at parsing probabilistic statements and learning how to identify the sample space that these statements correspond to.

Now, the number of programs of size $T$ (or circuits of size $T$) is at most $2^{O(T \log T)}$. Since $T \log T = o(T^2)$, this means that if Claim I is true, then by the union bound the probability of the union of $B_P$ over all NAND programs of at most $T$ lines is at most $2^{O(T \log T)} \cdot 2^{-T^2}$, which is smaller than $1$ for sufficiently large $T$. What is important for us about this number is that it is smaller than $1$. In particular this means that there exists a function $G \in \mathcal{F}^m_\ell$ that lies outside all the events $B_P$, i.e., a single $G$ that is a valid $(T, \epsilon)$ pseudorandom generator.

It remains to prove Claim I. Let $L = 2^\ell$, and let $y_0, \ldots, y_{L-1}$ be the values of the random $G$ on all $L$ inputs; since $G$ is chosen at random, these are independent and uniform in $\{0,1\}^m$. For every $P : \{0,1\}^m \to \{0,1\}$, if $L > 2^{C(\log T + \log(1/\epsilon))}$ (which by setting $C > 4$, we can ensure is larger than $10T^2/\epsilon^2$) then the probability that

$$\left| \frac{1}{L} \sum_{i=0}^{L-1} P(y_i) - \Pr_{s \leftarrow_R \{0,1\}^m}[P(s) = 1] \right| > \epsilon \tag{3.3}$$

is at most $2^{-T^2}$. (3.3) follows directly from the Chernoff bound. If we let for every $i \in [L]$ the random variable $X_i$ denote $P(y_i)$, then since $y_0, \ldots, y_{L-1}$ are chosen independently at random, these are independently and identically distributed random variables with mean $\mathbb{E}_{y \leftarrow_R \{0,1\}^m}[P(y)] = \Pr_{y \leftarrow_R \{0,1\}^m}[P(y) = 1]$, and hence the probability that they deviate from their expectation by $\epsilon$ is at most $2 \cdot 2^{-\epsilon^2 L / 2}$. ■

4 Pseudorandom functions

Reading: Rosulek Chapter 6 has a good description of pseudorandom functions. Katz-Lindell cover pseudorandom functions in a different order than us. The topics of this lecture and the next ones are covered in KL sections 3.4-3.5 (PRFs and CPA security), 4.1-4.3 (MACs), and 8.5 (construction of PRFs from PRG).

In the last lecture we saw the notion of pseudorandom generators, and introduced the PRG conjecture, which stated that there exists a pseudorandom generator mapping $n$ bits to $n+1$ bits. We have seen the length extension theorem, which states that given such a pseudorandom generator, there exists a generator mapping $n$ bits to $m(n)$ bits for an arbitrarily large polynomial $m(n)$. But can we extend it even further? Say, to $2^n$ bits? Does this question even make sense? And why would we want to do that? This is the topic of this lecture.

At a first look, the notion of extending the output length of a pseudorandom generator to $2^n$ bits seems nonsensical. After all, we want our generator to be efficient, and just writing down the output will take exponential time. However, there is a way around this conundrum. While we can't efficiently write down the full output, we can require

that it would be possible, given an index $i \in \{0, \ldots, 2^n - 1\}$,¹ to compute the $i$-th bit of the output in polynomial time. That is, we require that the function $i \mapsto G(s)_i$ is efficiently computable and (by security of the pseudorandom generator) indistinguishable from a function that maps each index $i$ to an independent random bit in $\{0,1\}$. This is the notion of a pseudorandom function generator, which is a bit subtle to define and construct, but turns out to have a great many applications in cryptography.

¹ We will often identify the strings of length $n$ with the numbers between $0$ and $2^n - 1$, and switch freely between the two representations, and hence can think of $i$ also as a string in $\{0,1\}^n$. We will also switch between indexing strings starting from $0$ and starting from $1$ based on convenience.

Definition 4.1 — Pseudorandom Function Generator. An efficiently computable function $F$ taking two inputs $s \in \{0,1\}^*$ and $i \in \{0, \ldots, 2^{|s|} - 1\}$ and outputting a single bit $F(s, i)$ is a pseudorandom function


(PRF) generator if for every polynomial time adversary $A$ outputting a single bit and every polynomial $p(n)$, if $n$ is large enough then:

$$\left| \mathop{\mathbb{E}}_{s \in \{0,1\}^n}\left[ A^{F(s,\cdot)}(1^n) \right] - \mathop{\mathbb{E}}_{H \leftarrow_R [2^n] \to \{0,1\}}\left[ A^{H}(1^n) \right] \right| < 1/p(n).$$

Some notes on notation are in order. The input $1^n$ is simply a string of $n$ ones, and it is a typical cryptography convention to assume that such an input is always given to the adversary. This is simply because by "polynomial time adversary" we really mean polynomial in $n$ (which is our key size or security parameter).² The notation $A^{F(s,\cdot)}$ means that $A$ has black box (also known as oracle) access to the function that maps $i$ to $F(s,i)$. That is, $A$ can choose an index $i$, query the box and get $F(s,i)$, then choose a new index $i'$, query the box to get $F(s,i')$, and so on for a polynomial number of queries. The notation $H \leftarrow_R [2^n] \to \{0,1\}$ means that $H$ is a completely random function that maps every index $i$ to an independent and random bit.

² This also allows us to be consistent with the notion of "polynomial in the size of the input."

R Remark 4.2 — Completely Random Functions. This notion of a randomly chosen function can be difficult to wrap your mind around. Try to imagine a table of all of the strings in $\{0,1\}^n$. We now go to each possible input, randomly generate a bit to be its output, and write down the result in the table. When we're done, we have a length $2^n$ lookup table that maps each input to an output that was generated uniformly at random and independently of all other outputs. This lookup table is now our random function $H$. In practice it's too cumbersome to actually generate all $2^n$ bits, and sometimes in theory it's convenient to think of each output as generated only after a query is made. This leads to adopting the lazy evaluation model.

In the lazy evaluation model, we imagine that a lazy person is sitting in a room with the same lookup table as before, but with all entries blank. If someone makes some query $s$, the lazy person checks if the entry for $s$ in the lookup table is blank. If so, the lazy evaluator generates a random bit, writes down the result $H(s)$ for $s$, and returns it. Otherwise, if an output has already been generated for $s$ previously (because $s$ has been queried before), the lazy evaluator simply returns this value. Can you see why this model is more convenient in some ways?

One last way to think about how a completely random function is determined is to first observe that there exist a total of $2^{2^n}$ functions from $\{0,1\}^n$ to $\{0,1\}$ (can you see why? It may be easier to think of them as functions from $[2^n]$ to $\{0,1\}$). We choose one of them

uniformly at random to be , and it’s still the case that for any given input the result is or with equal probability independent퐻 of any other input. Regardless of which model푠 we use to퐻(푠) think0 about1 gen- erating , after we’ve chosen and put it in a black box, the behavior of is in some sense “determinis- tic” because퐻 given the same query퐻 it will always return the same result. However,퐻 before we ever make any given query we can only guess correctly with probability , because without previously observing it is effectively1푠 random and퐻(푠) undecided to us (just like in the lazy2 evaluator model). 퐻(푠)
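The lazy evaluation model is easy to capture in code. Here is a minimal Python sketch (the class name and interface are ours, purely for illustration):

```python
import secrets

class LazyRandomFunction:
    """A random function H: {0,1}^n -> {0,1}, sampled lazily.

    Each output bit is generated the first time its input is queried,
    and cached so repeated queries return the same answer.
    """

    def __init__(self, n):
        self.n = n
        self.table = {}  # input string -> output bit, filled on demand

    def __call__(self, x):
        assert len(x) == self.n and set(x) <= {"0", "1"}
        if x not in self.table:
            self.table[x] = secrets.randbits(1)  # toss the coin only now
        return self.table[x]

H = LazyRandomFunction(8)
b = H("01100111")
assert H("01100111") == b  # same query, same answer: "deterministic" once asked
```

Note that only the queried entries ever exist in memory, which is exactly why this view is more convenient than writing down all $2^n$ bits up front.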

P Now would be a fantastic time to stop and think deeply about the three constructions in the remark above, and in particular why they are all equivalent. If you don’t feel like thinking then at the very least you should make a mental note to come back later if you’re confused, because this idea will be very useful down the road.

Thus, the notation $A^H$ in the PRF definition means $A$ has access to a completely random black box that returns a random bit for any new query $A$ makes, and on previously seen queries returns the same bit as before. Finally, one last note: below we will identify the set $[2^n] = \{0,\dots,2^n-1\}$ with the set $\{0,1\}^n$ (there is a one to one mapping between those sets using the binary representation), and so we will treat $i$ interchangeably as a number in $[2^n]$ or a string in $\{0,1\}^n$.

Ensembles of PRFs. If $F$ is a pseudorandom function generator, then if we choose a random string $s$ and consider the function $f_s$ defined by $f_s(i) = F(s,i)$, no efficient algorithm can distinguish between black box access to $f_s(\cdot)$ and black box access to a completely random function (see Fig. 4.1). Notably, black box access implies that a priori the adversary does not know which function it's querying. From the adversary's point of view, they query some oracle $O$ (which behind the scenes is either $f_s(\cdot)$ or $H$), and must decide if $O = f_s(\cdot)$ or $O = H$. Thus often instead of talking about a pseudorandom function generator we will refer to a pseudorandom function ensemble $\{f_s\}_{s\in\{0,1\}^*}$. Formally, this is defined as follows:

Definition 4.3 — PRF ensembles. Let $\{f_s\}_{s\in\{0,1\}^*}$ be an ensemble of functions such that for every $s \in \{0,1\}^*$, $f_s \colon \{0,1\}^{|s|} \to \{0,1\}$. We say that $\{f_s\}$ is a pseudorandom function ensemble if the function $F$ that on input $s \in \{0,1\}^*$ and $i \in \{0,\dots,2^{|s|}-1\}$ outputs $f_s(i)$ is a PRF generator.

Note that the condition of Definition 4.3 corresponds to requiring that for every polynomial $p$ and $p(n)$-time adversary $A$, if $n$ is large enough then

$$\left| \mathbb{E}_{s\in\{0,1\}^n}\left[A^{f_s(\cdot)}(1^n)\right] - \mathbb{E}_{h\leftarrow_R \mathcal{F}_{n,1}}\left[A^h(1^n)\right] \right| < 1/p(n) ,$$

where $\mathcal{F}_{n,1}$ is the set of all functions mapping $\{0,1\}^n$ to $\{0,1\}$.

P It is worthwhile to pause and make sure you understand why Definition 4.3 and Definition 4.1 give different ways to talk about the same object.

Figure 4.1: In a pseudorandom function, an adversary cannot tell whether they are given a black box that computes the function $i \mapsto F(s,i)$ for some secret $s$ that was chosen at random and fixed, or whether the black box computes a completely random function that tosses a fresh random coin whenever it's given a new input $i$.

In the next lecture we will see the proof of the following theorem (due to Goldreich, Goldwasser, and Micali):

Theorem 4.4 — PRFs from PRGs. Assuming the PRG conjecture, there exists a secure pseudorandom function generator.

But before we see the proof of Theorem 4.4, let us see why pseudorandom functions could be useful.

4.1 ONE TIME PASSWORDS (E.G. GOOGLE AUTHENTICATOR, RSA ID, ETC.)

Until now we have talked about the task of encryption, or protecting the secrecy of messages. But the task of authentication, or protecting the integrity of messages, is no less important. For example, consider the case that you receive a software update for your PC, phone, car, pacemaker, etc. over an open channel such as an unencrypted Wi-Fi connection. The contents of that update are not secret, but it is of crucial importance that it was unchanged from the message sent out by the company and that no malicious attacker had modified the code. Similarly, when you log into your bank, you might be much more concerned about the possibility of someone impersonating you and cleaning out your account than you are about the secrecy of your information. Let's start with a very simple scenario which we'll call the login problem. Alice and Bob share a key as before, but now Alice wants to simply prove her identity to Bob. What makes this challenging is that this time they need to contend with not the passive eavesdropping Eve but the active adversary Mallory, who completely controls the communication channel between them and can modify (or mall) any message that they send. Specifically for the identity proving case, we think of the following scenario. Each instance of such an identification protocol consists of some interaction between Alice and Bob that ends with Bob deciding whether to accept it as authentic or reject as an impersonation attempt. Mallory's goal is to fool Bob into accepting her as Alice. The most basic way to try to solve the login problem is by simply using a password. That is, if we assume that Alice and Bob can share a key, we can treat this key as some secret password $p$ that was selected at random from $\{0,1\}^n$ (and hence can only be guessed with probability $2^{-n}$).
Why doesn't Alice simply send $p$ to Bob to prove to him her identity? A moment's thought shows that this would be a very bad idea. Since Mallory is controlling the communication line, she would learn $p$ after the first identification attempt and could then easily impersonate Alice in future interactions. However, we seem to have just the tool to protect the secrecy of $p$ — encryption. Suppose that Alice and Bob share a secret key $k$ and an additional secret password $p$. Wouldn't a simple way to solve the login problem be for Alice to send Bob an encryption of the password $p$? After all, the security of the encryption should guarantee that Mallory can't learn $p$, right?

P This would be a good time to stop reading and try to think for yourself whether using a secure encryption to encrypt $p$ would guarantee security for the login problem. (No really, stop and think about it.)

The problem is that Mallory does not have to learn the password $p$ in order to impersonate Alice. For example, she can simply record the message $c_1$ Alice sends to Bob in the first session and then replay it to Bob in the next session. Since the message $c_1$ is a valid encryption of $p$, Bob would accept it from Mallory! (This is known as a replay attack and is a common attack one needs to protect against in cryptographic protocols.) One can try to put in countermeasures to defend against this particular attack, but its existence demonstrates that secrecy of the password does not guarantee security of the login protocol.

4.1.1 How do pseudorandom functions help in the login problem?
The idea is that they create what's known as a one time password. Alice and Bob will share an index $s \in \{0,1\}^n$ for the pseudorandom function generator $\{f_s\}$. When Alice wants to prove her identity to Bob, Bob will choose a random $i \leftarrow_R \{0,1\}^n$, send $i$ to Alice, and then Alice will send $f_s(i), f_s(i+1), \dots, f_s(i+\ell-1)$ to Bob, where $\ell$ is some parameter (you can think of $\ell = n$ for simplicity). Bob will check that the values he receives are indeed $f_s(i), \dots, f_s(i+\ell-1)$, and if so accept the session as authentic. The formal protocol is as follows:

Protocol PRF-Login:

• Shared input: $s \in \{0,1\}^n$. Alice and Bob treat it as a seed for a pseudorandom function generator $\{f_s\}$.
• In every session Alice and Bob do the following:

1. Bob chooses a random $i \leftarrow_R [2^n]$ and sends $i$ to Alice.
2. Alice sends $y_1, \dots, y_\ell$ to Bob where $y_j = f_s(i+j-1)$.
3. Bob checks that for every $j \in \{1,\dots,\ell\}$, $y_j = f_s(i+j-1)$, and if so accepts the session; otherwise he rejects it.

As we will see, it's not really crucial that the input $i$ (which is known in crypto parlance as a nonce) is random. What is crucial is that it never repeats itself, to foil a replay attack. For this reason in many applications Alice and Bob compute $i$ as a function of the current time (for example, the index of the current minute based on some agreed-upon starting point), and hence we can make it into a one message protocol. Also the parameter $\ell$ is sometimes chosen to be deliberately short so that it will be easy for people to type the values $y_1, \dots, y_\ell$.

Figure 4.2: The Google Authenticator app is one popular example of a one-time password scheme using pseudorandom functions. Another example is RSA's SecurID token.

Why is this secure? The key to understanding schemes using pseudorandom functions is to imagine what would happen if $f_s$ was an actual random function instead of a pseudorandom function. In a truly random function, every one of the values $f_s(0), \dots, f_s(2^n-1)$ is chosen independently and uniformly at random from $\{0,1\}$. One useful way to imagine this is using the concept of "lazy evaluation". We can think of $f$ as determined by tossing $2^n$ different coins for the values $f(0), \dots, f(2^n-1)$. Now consider the case where we don't actually toss the $i$-th coin until we need it. The crucial point is that if we have queried the function in $T \ll 2^n$ places, then when Bob chooses a random $i \in [2^n]$ it is extremely unlikely that any one of the set $\{i, i+1, \dots, i+\ell-1\}$ will be one of those locations that we previously queried. Thus, if the function was truly random, Mallory has no information on the value of the function in these coordinates, and would be able to predict (or rather, guess) it in all these locations with probability at most $2^{-\ell}$.

P Please make sure you understand the informal reasoning above, since we will now translate this into a formal theorem and proof.
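Before the formal treatment, here is a minimal Python sketch of Protocol PRF-Login. HMAC-SHA256 truncated to one bit stands in for the abstract PRF $f_s$, and we fix $\ell = 4$; both choices are ours, purely for illustration:

```python
import hmac, hashlib, secrets

ELL = 4  # the parameter ℓ: number of one-time bits sent per session

def f(s: bytes, i: int) -> int:
    # Stand-in for the PRF f_s(i), with one-bit output.
    tag = hmac.new(s, i.to_bytes(16, "big"), hashlib.sha256).digest()
    return tag[0] & 1

def bob_challenge() -> int:
    # Bob picks a fresh random nonce i.
    return secrets.randbits(64)

def alice_respond(s: bytes, i: int) -> list:
    # Alice sends y_j = f_s(i + j - 1) for j = 1..ℓ.
    return [f(s, i + j) for j in range(ELL)]

def bob_verify(s: bytes, i: int, ys: list) -> bool:
    # Bob recomputes the same values and accepts iff they all match.
    return ys == [f(s, i + j) for j in range(ELL)]

s = secrets.token_bytes(32)  # shared seed
i = bob_challenge()
assert bob_verify(s, i, alice_respond(s, i))  # honest Alice is accepted
```

A Mallory who replays an old response will almost surely face a fresh nonce, and each of her $\ell$ guessed bits is then an independent coin flip, matching the $2^{-\ell}$ bound from the informal reasoning above.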

Theorem 4.5 — Login protocol via PRF. Suppose that $\{f_s\}$ is a secure pseudorandom function generator and Alice and Bob interact using Protocol PRF-Login for some polynomial number $T$ of sessions (over a channel controlled by Mallory). After observing these interactions, Mallory then interacts with Bob, where Bob follows the protocol's instructions but Mallory has access to arbitrary efficient computation. Then, the probability that Bob accepts the interaction is at most $2^{-\ell} + \mu(n)$ where $\mu(\cdot)$ is some negligible function.

Proof. This proof, as so many others in this course, uses an argument via contradiction. We assume, towards the sake of contradiction, that there exists an adversary $M$ (for Mallory) that can break the identification scheme PRF-Login with probability $2^{-\ell} + \epsilon$ after $T$ interactions. We then construct an attacker $A$ that can distinguish access to $\{f_s\}$ from access to a random function in $poly(T)$ time and with bias at least $\epsilon/2$.

How do we construct this adversary $A$? The idea is as follows. First, we prove that if we ran the protocol PRF-Login using an actual random function, then $M$ would not be able to succeed in impersonating with probability better than $2^{-\ell} + negligible$. Therefore, if $M$ does do better, then we can use that to distinguish $f_s$ from a random function. The adversary $A$ gets some black box (or oracle) $O(\cdot)$ and will use it while internally simulating all the parties — Alice, Bob and Mallory (using $M$) — in the $T+1$ interactions of the PRF-Login protocol. Whenever any of the parties needs to evaluate $f_s(i)$, $A$ will forward $i$ to its black box and return the value $O(i)$. It will then output $1$ if and only if $M$ succeeds in impersonation in this internal simulation. The argument above showed that if $O(\cdot)$ is a truly random function, then the probability that $A$ outputs $1$ is at most $2^{-\ell} + negligible$ (and so in particular less than $2^{-\ell} + \epsilon/2$). On the other hand, if $O(\cdot)$ is the function $i \mapsto f_s(i)$ for some fixed and random $s$, then this probability is at least $2^{-\ell} + \epsilon$. Thus $A$ will distinguish between the two cases with bias at least $\epsilon/2$. We now turn to the formal proof:

Claim 1: Let PRF-Login* be the hypothetical variant of the protocol PRF-Login where Alice and Bob share a completely random function $H \colon [2^n] \to \{0,1\}$. Then, no matter what Mallory does, the probability she can impersonate Alice after observing $T$ interactions is at most $2^{-\ell} + (8\ell T)/2^n$.

(If PRF-Login* is easier to prove secure than PRF-Login, you might wonder why we bother with PRF-Login in the first place and not simply use PRF-Login*. The reason is that specifying a random function $H$ requires specifying $2^n$ bits, and so that would be a huge shared key. So PRF-Login* is not a protocol we can actually run but rather a hypothetical "mental experiment" that helps us in arguing about the security of PRF-Login.)

Proof of Claim 1: Let $i_1, \dots, i_{2T}$ be the nonces chosen by Bob and received by Alice in the first $T$ iterations. That is, $i_1$ is the nonce chosen by Bob in the first iteration while $i_2$ is the nonce that Alice received in the first iteration (if Mallory doesn't modify it then $i_1 = i_2$). Similarly, $i_3$ is the nonce chosen by Bob in the second iteration while $i_4$ is the nonce received by Alice, and so on and so forth. Let $i$ be the nonce chosen in the $(T+1)$-st iteration, in which Mallory tries to impersonate Alice. We claim that the probability that there exists some $j \in \{1,\dots,2T\}$ such that $|i - i_j| < 2\ell$ is at most $8\ell T/2^n$. Indeed, let $S$ be the union of all the intervals of the form $\{i_j - 2\ell + 1, \dots, i_j + 2\ell - 1\}$ for $1 \le j \le 2T$. Since it's a union of $2T$ intervals each of length less than $4\ell$, $S$ contains at most $8T\ell$ elements, so the probability that $i \in S$ is $|S|/2^n \le (8T\ell)/2^n$. Now, if there does not exist a $j$ such that $|i - i_j| < 2\ell$ then it means in particular that all the queries to $H(\cdot)$ made by either Alice or Bob during the first $T$ iterations are disjoint from the interval $\{i, i+1, \dots, i+\ell-1\}$. Since $H(\cdot)$ is a completely random function, the values $H(i), \dots, H(i+\ell-1)$ are chosen uniformly and independently from all the rest of the values of this function. Since Mallory's message $y$ to Bob in the $(T+1)$-st iteration depends only on what she observed in the past, the values $H(i), \dots, H(i+\ell-1)$ are independent from $y$, and hence under the condition that there is no overlap between this interval and prior queries, the probability that they equal $y$ is $2^{-\ell}$. QED (Claim 1).

The proof of Claim 1 is not hard but it is somewhat subtle, so it's good to go over it again and make sure you understand it. Now that we have Claim 1, the proof of the theorem follows as outlined above. We build an adversary $A$ to the pseudorandom function generator from $M$ by having $A$ simulate "inside its belly" all the parties Alice, Bob and Mallory, and output $1$ if Mallory succeeds in impersonating. Since we assumed $\epsilon$ is non-negligible and $T$ is polynomial, we can assume that $(8\ell T)/2^n < \epsilon/2$, and hence by Claim 1, if the black box is a random function, then we are in the PRF-Login* setting and Mallory's success will be at most $2^{-\ell} + \epsilon/2$. If the black box is $f_s(\cdot)$, then we get exactly the PRF-Login setting and hence under our assumption the success will be at least $2^{-\ell} + \epsilon$. We conclude that the difference in probability of $A$ outputting $1$ between the random and pseudorandom case is at least $\epsilon/2$, thus contradicting the security of the pseudorandom function generator. ■

4.1.2 Modifying input and output lengths of PRFs
In the course of constructing this one-time-password scheme from a PRF, we have actually proven a general statement that is useful on its own: that we can transform a standard PRF, which is a collection $\{f_s\}$ of functions mapping $\{0,1\}^n$ to $\{0,1\}$, into a PRF where the functions have a longer output $\ell$. Specifically, we can make the following definition:

Definition 4.6 — PRF ensemble (varying inputs and outputs). Let $\ell_{in}, \ell_{out} \colon \mathbb{N} \to \mathbb{N}$. An ensemble of functions $\{f_s\}_{s\in\{0,1\}^*}$ is a PRF ensemble with input length $\ell_{in}$ and output length $\ell_{out}$ if:

1. For every $n \in \mathbb{N}$ and $s \in \{0,1\}^n$, $f_s \colon \{0,1\}^{\ell_{in}(n)} \to \{0,1\}^{\ell_{out}(n)}$.
2. For every polynomial $p$ and $p(n)$-time adversary $A$, if $n$ is large enough then

$$\left| \mathbb{E}_{s\in\{0,1\}^n}\left[A^{f_s(\cdot)}(1^n)\right] - \mathbb{E}_{h\leftarrow_R \{0,1\}^{\ell_{in}}\to\{0,1\}^{\ell_{out}}}\left[A^h(1^n)\right] \right| < 1/p(n) .$$

Standard PRFs as we defined in Definition 4.3 correspond to generalized PRFs where $\ell_{in}(n) = n$ and $\ell_{out}(n) = 1$ for all $n \in \mathbb{N}$. It is a good exercise (which we will leave to the reader) to prove the following theorem:

Theorem 4.7 — PRF length extension. Suppose that PRFs exist. Then for every constant $c$ and polynomial-time computable functions $\ell_{in}, \ell_{out} \colon \mathbb{N} \to \mathbb{N}$ with $\ell_{in}(n), \ell_{out}(n) \le n^c$, there exists a PRF ensemble with input length $\ell_{in}$ and output length $\ell_{out}$.

Thus from now on whenever we are given a PRF, we will allow ourselves to assume that it has any polynomial output size that is convenient for us.
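One simple way to extend the output length (a small piece of the idea behind Theorem 4.7, not its proof) is to evaluate a one-bit PRF on the input concatenated with the bit position, at the cost of spending part of the input length on that index. A hedged Python sketch, with HMAC truncated to one bit as a stand-in base PRF (all names are ours):

```python
import hmac, hashlib

def f1(s: bytes, x: bytes) -> int:
    # Stand-in for a PRF with one-bit output.
    return hmac.new(s, x, hashlib.sha256).digest()[0] & 1

def f_ext(s: bytes, x: bytes, ell_out: int) -> list:
    # Bit j of the extended PRF is the base PRF applied to x || j.
    # If the base PRF were truly random, these bits would be
    # independent coins, since the inputs x || j are all distinct.
    return [f1(s, x + j.to_bytes(4, "big")) for j in range(ell_out)]

key = b"\x00" * 32
out = f_ext(key, b"input", 16)
assert len(out) == 16 and out == f_ext(key, b"input", 16)  # deterministic
```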

4.2 MESSAGE AUTHENTICATION CODES

One time passwords are a tool allowing you to prove your identity to, say, your email server. But even after you did so, how can the server trust that future communication comes from you and not from some attacker that can interfere with the communication channel between you and the server (a so-called "man in the middle" attack)? Similarly, one time passwords may allow a software company to prove their identity before they send you a software update, but how do you know that an attacker does not change some bits of this software update en route between their servers and your device? This is where Message Authentication Codes (MACs) come into play: their role is to authenticate not only the identity of the parties but also their communication. Once again we have Alice and Bob, and the adversary Mallory who can actively modify messages (in contrast to the passive eavesdropper Eve). Similar to the case of encryption, Alice has a message $m$ she wants to send to Bob, but now we are not concerned with Mallory learning the contents of the message. Rather, we want to make sure that Bob gets precisely the message $m$ sent by Alice. Actually this is too much to ask for, since Mallory can always decide to block all communication, but we can ask that either Bob gets precisely $m$ or he detects failure and accepts no message at all. Since we are in the private key setting, we assume that Alice and Bob share a key $k$ that is unknown to Mallory. What kind of security would we want? We clearly want Mallory not to be able to cause Bob to accept a message $m' \neq m$. But, like in the encryption setting, we want more than that. We would like Alice and Bob to be able to use the same key for many messages. So, Mallory might observe the interactions of Alice and Bob on messages $m_1, \dots, m_T$ before trying to cause Bob to accept a message $m_{T+1}$ that differs from all of them.
In fact, to make our notion of security more robust, we will even allow Mallory to choose the messages $m_1, \dots, m_T$ (this is known as a chosen message or chosen plaintext attack). The resulting formal definition is below:

Definition 4.8 — Message Authentication Codes (MAC). Let $(S,V)$ (for sign and verify) be a pair of efficiently computable algorithms where $S$ takes as input a key $k$ and a message $m$, and produces a tag $\tau \in \{0,1\}^*$, while $V$ takes as input a key $k$, a message $m$, and a tag $\tau$, and produces a bit $b \in \{0,1\}$. We say that $(S,V)$ is a Message Authentication Code (MAC) if:

• For every key $k$ and message $m$, $V_k(m, S_k(m)) = 1$.
• For every polynomial-time adversary $A$ and polynomial $p(n)$, it is with less than $1/p(n)$ probability over the choice of $k \leftarrow_R \{0,1\}^n$ that $A^{S_k(\cdot)}(1^n) = (m', \tau')$ such that $m'$ is not one of the messages $A$ queried and $V_k(m', \tau') = 1$.

3 Clearly if the adversary outputs a pair $(m, \tau)$ that it did query from its oracle then that pair will pass verification. This suggests the possibility of a replay attack whereby Mallory resends to Bob a message that Alice sent him in the past. As above, one can thwart this by insisting that every message begins with a fresh nonce or a value derived from the current time.

If Alice and Bob share the key $k$, then to send a message $m$ to Bob, Alice will simply send over the pair $(m, \tau)$ where $\tau = S_k(m)$. If Bob receives a message $(m', \tau')$, then he will accept $m'$ if and only if $V_k(m', \tau') = 1$. Mallory now observes $t$ rounds of communication of the form $(m_i, S_k(m_i))$ for messages $m_1, \dots, m_t$ of her choice, and her goal is to try to create a new message $m'$ that was not sent by Alice, but for which she can forge a valid tag $\tau'$ that will pass verification. Our notion of security guarantees that she'll only be able to do so with negligible probability, in which case the MAC is CMA-secure.

4 A priori you might ask if we should not also give Mallory an oracle to $V_k(\cdot)$ as well. After all, in the course of those many interactions, Mallory could also send Bob many messages of her choice, and observe from his behavior whether or not these passed verification. It is a good exercise to show that adding such an oracle does not change the power of the definition, though we note that this is decidedly not the case in the analogous question for encryption.

R Remark 4.9 — Why can Mallory choose the messages?. The notion of a "chosen message attack" might seem a little "over the top". After all, Alice is going to send to Bob the messages of her choice, rather than those chosen by her adversary Mallory. However, as cryptographers have learned time and again the hard way, it is better to be conservative in our security definitions and think of an attacker that has as much power as possible. First of all, we want a message authentication code that will work for any sequence of messages, and so it's better to consider this "worst case" setting of allowing Mallory to choose them. Second, in many realistic settings an adversary could have some effect on the messages that are being sent by the parties. This has occurred time and again in cases ranging from web servers to German submarines in World War II, and we'll return to this point when we talk about chosen plaintext and chosen ciphertext attacks on encryption schemes.

R Remark 4.10 — Strong unforgeability. Some texts (such as Boneh-Shoup) define a stronger notion of unforgeability where the adversary cannot even produce new signatures for messages it has queried in the attack. That is, the adversary cannot produce a valid message-signature pair that it has not seen before. This stronger definition can be useful for some applications. It is fairly easy to transform MACs satisfying Definition 4.8 into MACs satisfying strong unforgeability. In particular, if the signing function is deterministic, and we use a canonical verifier algorithm where $V_k(m, \sigma) = 1$ iff $S_k(m) = \sigma$, then weak unforgeability automatically implies strong unforgeability since every message has a single signature that would pass verification (can you see why?).
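As a concrete illustration of the syntax in Definition 4.8, with a deterministic signer and the canonical verifier of Remark 4.10, here is a Python sketch; HMAC-SHA256 is our stand-in for the signing function, not a construction the text has proven anything about:

```python
import hmac, hashlib, secrets

def S(k: bytes, m: bytes) -> bytes:
    # Sign: produce a tag for message m under key k (deterministic).
    return hmac.new(k, m, hashlib.sha256).digest()

def V(k: bytes, m: bytes, tau: bytes) -> bool:
    # Canonical verifier: recompute the tag and compare.
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(S(k, m), tau)

k = secrets.token_bytes(32)
tau = S(k, b"install update v2.1")
assert V(k, b"install update v2.1", tau)      # correctness
assert not V(k, b"install update v6.6", tau)  # modified message is rejected
```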

4.3 MACS FROM PRFS

We now show how pseudorandom function generators yield message authentication codes. In fact, the construction is so immediate that much of the more applied cryptographic literature does not distinguish between these two concepts, and uses the name "Message Authentication Codes" to refer to both MACs and PRFs. However, since this is not applied cryptographic literature, the distinction is rather important.

Theorem 4.11 — MAC Theorem. Under the PRF Conjecture, there exists a secure MAC.

Proof. Let $F(\cdot,\cdot)$ be a secure pseudorandom function generator with $n/2$ bits of output (which we can obtain using Theorem 4.7). We define $S_k(m) = F(k,m)$ and $V_k(m, \tau)$ to output $1$ iff $F(k,m) = \tau$. Suppose towards the sake of contradiction that there exists an adversary $A$ that breaks the security of this construction of a MAC. That is, $A$ queries $S_k(\cdot)$ $poly(n)$ many times and with probability $1/p(n)$ for some polynomial $p$ outputs $(m', \tau')$ that she did not ask for such that $F(k, m') = \tau'$. We use $A$ to construct an adversary $A'$ that can distinguish between oracle access to a PRF and a random function by simulating the MAC security game inside $A'$. Every time $A$ requests the signature of some message $m$, $A'$ returns $O(m)$. When $A$ returns $(m', \tau')$ at the end of the MAC game, $A'$ returns $1$ if $O(m') = \tau'$, and $0$ otherwise. If $O(\cdot) = H(\cdot)$ for some completely random function $H(\cdot)$, then the value $H(m')$ would be completely random in $\{0,1\}^{n/2}$ and independent of all prior queries. Hence the probability that this value would equal $\tau'$ is at most $2^{-n/2}$. If instead $O(\cdot) = F(k, \cdot)$, then by the fact that $A$ wins the MAC security game with probability $1/p(n)$, the adversary $A'$ will output $1$ with probability $1/p(n)$. That means that such an adversary $A'$ can distinguish between an oracle to $F(k, \cdot)$ and an oracle to a random function $H$, which gives us a contradiction. ■

4.4 ARBITRARY INPUT LENGTH EXTENSION FOR MACS AND PRFS

So far we required the message $m$ to be signed to be no longer than the key $k$ (i.e., both $n$ bits long). However, it is not hard to see that this requirement is not really needed. If our message is longer, we can divide it into blocks $m_1, \dots, m_t$ and sign each message $(i, m_i)$ individually. The disadvantage here is that the size of the tag (i.e., MAC output) will grow with the size of the message. However, even this is not really needed. Because the tags $\tau_1, \dots, \tau_t$ have length $n/2$ for length-$n$ messages, we can sign the tags and only output that signature. The verifier can repeat this computation to verify it. We can continue this way and so get tags of length $O(n)$ for arbitrarily long messages. Hence in the future, whenever we need to, we can assume that our PRFs and MACs can get inputs in $\{0,1\}^*$, i.e., arbitrary-length strings. We note that this issue of length extension is actually quite a thorny and important one in practice. The above approach is not the most efficient way to achieve this, and there are several more practical variants in the literature (see Boneh-Shoup Sections 6.4-6.8). Also, one needs to be very careful about the exact way one chops the message into blocks and pads it to an integer multiple of the block size. Several attacks have been mounted on schemes that performed this incorrectly.
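The block-then-sign-the-tags idea above can be sketched as follows. HMAC again stands in for the PRF-based signing function, and, as the text warns, real schemes must treat padding and message length far more carefully than this toy version does:

```python
import hmac, hashlib, secrets

def S(k: bytes, m: bytes) -> bytes:
    # Stand-in for the fixed-length signing function S_k.
    return hmac.new(k, m, hashlib.sha256).digest()

def sign_long(k: bytes, m: bytes, block: int = 32) -> bytes:
    # Split m into blocks, sign each (index, block) pair, then sign
    # the concatenation of the resulting tags, so the final tag has
    # fixed length regardless of how long m is.
    blocks = [m[i:i + block] for i in range(0, len(m), block)] or [b""]
    tags = b"".join(S(k, i.to_bytes(8, "big") + b)
                    for i, b in enumerate(blocks))
    return S(k, tags)

def verify_long(k: bytes, m: bytes, tau: bytes, block: int = 32) -> bool:
    return hmac.compare_digest(sign_long(k, m, block), tau)

k = secrets.token_bytes(32)
msg = b"x" * 100  # longer than one block
assert verify_long(k, msg, sign_long(k, msg))
```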

4.5 ASIDE: NATURAL PROOFS

Pseudorandom functions play an important role in computational complexity, where they have been used as a way to give "barrier results" for proving results such as $P \neq NP$. Specifically, the Natural Proofs barrier for proving circuit lower bounds says that if strong enough pseudorandom functions exist, then certain types of arguments are bound to fail. These are arguments which come up with a property EASY of a Boolean function $f \colon \{0,1\}^n \to \{0,1\}$ such that:

5 This discussion has more to do with computational complexity than cryptography, and so can be safely skipped without harming understanding of future material in this course.

• If $f$ can be computed by a polynomial sized circuit, then it has the property EASY.
• The property EASY fails to hold for a random function with high probability.
• Checking whether EASY holds can be done in time polynomial in the truth table size of $f$. That is, in $2^{O(n)}$ time.

A priori these technical conditions might not seem very "natural" but it turns out that many approaches for proving circuit lower bounds (for restricted families of circuits) have this form. The idea is that such approaches find a "non generic" property of easily computable functions, such as finding some interesting correlations between some input bits and the output. These are correlations that are unlikely to occur in random functions. The lower bound typically follows by exhibiting a function $f_0$ that does not have this property, and then using that to derive that $f_0$ cannot be efficiently computed by this particular restricted family of circuits. The existence of strong enough pseudorandom functions can be shown to contradict the existence of such a property EASY, since a pseudorandom function can be computed by a polynomial sized circuit, but it cannot be distinguished from a random function. While a priori a pseudorandom function is only secure for polynomial time distinguishers, under certain assumptions it might be possible to create a pseudorandom function with a seed of size, say, $n^5$, that would be secure with respect to adversaries running in time $2^{O(n^2)}$.

5 Pseudorandom functions from pseudorandom generators and CPA security

In this lecture we will see that the PRG conjecture implies the PRF conjecture. We will also see how PRFs imply an encryption scheme that is secure even when we encrypt multiple messages with the same key. We have seen that PRF’s (pseudorandom functions) are extremely useful, and we’ll see some more applications of them later on. But are they perhaps too amazing to exist? Why would someone imagine that such a wonderful object is feasible? The answer is the following theorem:

Theorem 5.1 — The PRF Theorem. Suppose that the PRG Conjecture is true. Then there exists a secure PRF collection $\{f_s\}_{s\in\{0,1\}^*}$ such that for every $s \in \{0,1\}^n$, $f_s$ maps $\{0,1\}^n$ to $\{0,1\}^n$.

Figure 5.1: The construction of a pseudorandom function from a pseudorandom generator can be illustrated by a depth-$n$ binary tree. The root is labeled by the seed $s$ and for every internal node $v$ labeled by a string $x \in \{0,1\}^n$, we use that label as a seed into the PRG $G$ to label $v$'s two children. In particular, the children of $v$ are labeled with $G_0(x)$ and $G_1(x)$ respectively. The output of the function $f_s$ on input $i$ is the label of the $i$-th leaf counting from left to right. Note that the numbering of leaf $i$ is related to the bitstring representation of $i$ and the path to leaf $i$ in the following way: we traverse to leaf $i$ from the root by reading off the $n$ bits of $i$ left to right and descend into the left child of the current node for every 0 we encounter and traverse right for every 1.


Proof. We describe the proof, see also Chapter 6 of Rosulek or Section 8.5 of Katz-Lindell (section 7.5 in 2nd edition) for alternative exposi- tions. If the PRG Conjecture is true then in particular by the length exten- sion theorem there exists a PRG that maps bits into bits. Let’s denote 푛 where2푛 denotes concatenation. That is, denotes퐺 ∶ {0, the 1} first→ {0,bits 1} and denotes푛 0 1 the last 2푛bits of . 퐺(푠) = 퐺 (푠) ∘ 퐺 (푠) ∘ 0 1 For , we define퐺 (푠) as 푛 퐺 (푠) 푛 푛퐺(푠) 푠 푖 ∈ {0, 1} 푓 (푖)

푖푛 푖푛−1 푖1 This corresponds to composed퐺 (퐺 applications(⋯ 퐺 (푠))). of for . If the bit of ’s binary string is 0 then the application of the PRG is 푏 otherwise푡ℎ it is 푛. This series of successive푡ℎ applications퐺 푏 ∈ starts{0, 1} with the initial푗 seed푖 . 푗 0 1 퐺 This definition퐺 directly corresponds to the depiction in Fig. 5.1, where the successive푠 applications of correspond to the recursive labeling procedure. 푏 By the definition above we can퐺 see that to evaluate we need to evaluate the pseudorandom generator times on inputs of length , 푠 and so if the pseudorandom generator is efficiently푓 computable(푖) then so is the pseudorandom function. Thus,푛 “all” that’s left is to prove푛 that the construction is secure and this is the heart of this proof. I’ve mentioned before that the first step of writing a proof is con- vincing yourself that the statement is true, but there is actually an often more important zeroth step which is understanding what the statement actually means. In this case what we need to prove is the following: We need to show that the security of the PRG implies the security of the PRF ensemble . Via the contrapositive, this means that we assume that there is an adversary that can distinguish퐺 in time 푠 a black box for {푓from} a black-box for a random function with advantage . We need to use come up퐴 with an adversary that 푠 can푇 distinguish in푓 time(⋅) an input of the form (where is random in 휖 ) from an input퐴 of the form where is random퐷 in with bias푛 at least푝표푙푦(푇 ) . 퐺(푠) 푠 Figure 5.2: In the “lazy evaluation” implementation Assume2푛 {0, that 1} as above is a -time adversary푦 that wins푦 in the “PRF of the black box to the adversary, we label every {0,game” 1} with advantage . Let휖/푝표푙푦(푇 us consider ) the “lazy evaluation” imple- node in the tree only when we need it. Subsequent mentation of the퐴 black box for 푇 illustrated in Fig. 5.2. That is, at every traversals do not reevaluate the PRG, leading to reuse 휖 of the intermediate seeds. 
point in time there are nodes in the full binary tree that are labeled and nodes which we haven't yet labeled. When $A$ makes a query $i$, this query corresponds to the path $i_1 \ldots i_n$ in the tree. We look at the lowest (furthest away from the root) node $v$ on this path which has been labeled by some value $y$, and then we continue labeling the path

Figure 5.2: In the "lazy evaluation" implementation of the black box to the adversary, we label every node in the tree only when we need it. Subsequent traversals do not reevaluate the PRG, leading to reuse of the intermediate seeds. Thus for example, two sibling leaves will correspond to a single call to $G(x)$, where $x$ is their parent's label, with the left child receiving the first $n$ bits and the right child receiving the second $n$ bits of $G(x)$. In this figure check marks correspond to nodes that have been labeled and question marks to nodes that are still unlabeled.

from $v$ downwards until we reach $i$. In other words, we label the two children of $v$ by $G_0(y)$ and $G_1(y)$, and then if the path involves the first child then we label its children by $G_0(G_0(y))$ and $G_1(G_0(y))$, and so on and so forth (see Fig. 5.3). Note that because $G_0(y)$ and $G_1(y)$ correspond to a single call to $G$, regardless of whether the traversal continues left or right (i.e., whether the current level corresponds to a value $0$ or $1$ in $i$), we label both children at the same time.

Figure 5.3: When the adversary queries $i$, the oracle takes the path from the $i^{th}$ leaf to the root and computes the generator on the minimum number of internal nodes that is needed to obtain the label of the $i^{th}$ leaf.
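Before turning to the security analysis, the construction and its lazy-evaluation oracle can be sketched in code. This is a minimal illustration, not the real construction: SHA-256 stands in for the assumed length-doubling PRG $G$, and the seed is a byte string rather than a bit string.

```python
import hashlib

def G(s: bytes) -> bytes:
    """Stand-in length-doubling PRG mapping n bytes to 2n bytes.
    (The theorem assumes an arbitrary secure PRG; SHA-256 here is
    purely illustrative.)"""
    n, out, c = len(s), b"", 0
    while len(out) < 2 * n:
        out += hashlib.sha256(bytes([c]) + s).digest()
        c += 1
    return out[:2 * n]

def G0(s): return G(s)[:len(s)]   # first half of G(s)
def G1(s): return G(s)[len(s):]   # second half of G(s)

def f(s: bytes, i: str) -> bytes:
    """Direct GGM evaluation: f_s(i) = G_{i_n}(... G_{i_1}(s) ...),
    walking down the tree according to the bits of i."""
    label = s
    for bit in i:
        label = G0(label) if bit == "0" else G1(label)
    return label

class LazyOracle:
    """Lazy-evaluation oracle: a tree node, keyed by its root-to-node
    path, is labeled at most once; one PRG call labels both children."""
    def __init__(self, seed: bytes):
        self.labels = {"": seed}   # the root label is the seed
        self.prg_calls = 0

    def label(self, path: str) -> bytes:
        if path not in self.labels:
            parent = self.label(path[:-1])
            y = G(parent)          # one call labels BOTH children
            self.prg_calls += 1
            n = len(parent)
            self.labels[path[:-1] + "0"] = y[:n]
            self.labels[path[:-1] + "1"] = y[n:]
        return self.labels[path]
```

Evaluating the PRF on an $n$-bit input costs $n$ PRG calls, and with lazy evaluation each internal node is labeled at most once across all queries, matching the bound $T' \leq 2nT$ used below.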

A moment's thought shows that this is just another (arguably cumbersome) way to describe the oracle that simply computes the map $i \mapsto f_s(i)$. And so the experiment of running $A$ with this oracle produces precisely the same result as running $A$ with access to $f_s(\cdot)$. Note that since $A$ has running time at most $T$, the number of times our oracle will need to label an internal node is at most $T' \leq 2nT$ (since we label at most $2n$ nodes for every query).

We now define the following hybrids: in the $j^{th}$ hybrid, we run this experiment but for the first $j$ times the oracle needs to label internal nodes, it uses independent random labels. That is, for the first $j$ times we label a node $v$, instead of letting the label of $v$ be $G_b(u)$ (where $u$ is the label of the parent of $v$, and $b \in \{0,1\}$ corresponds to whether $v$ is the left or right child of its parent), we label $v$ by a random string in $\{0,1\}^n$. Note that the $0^{th}$ hybrid corresponds to the case where the oracle implements the function $i \mapsto f_s(i)$, while in the $T'^{th}$ hybrid all labels are random and hence the oracle implements a random function. By the hybrid argument, if $A$ can distinguish between the $0^{th}$ hybrid and the $T'^{th}$ hybrid with bias $\epsilon$ then there must exist some $j$ such that it distinguishes between the $j^{th}$ hybrid (pictured in Fig. 5.4) and the $(j+1)^{st}$ hybrid (pictured in Fig. 5.5) with bias at least $\epsilon/T'$. We will use this $A$ and $j$ to break the pseudorandom generator.

We can now describe our distinguisher $D$ (see Fig. 5.6) for the pseudorandom generator. On input a string $y \in \{0,1\}^{2n}$, $D$ will run

Figure 5.4: In the $j^{th}$ hybrid the first $j$ internal labels are drawn uniformly at random from $U_n$. All subsequent children's labels are produced in the usual way by seeding $G$ with the label $z$ of the parent and assigning the first $n$ bits ($G_0(z)$) to the left child and the last $n$ bits ($G_1(z)$) to the right child. For example, for some node $v_{j-1}^L$ at the $(j-1)^{th}$ level, we generate the pseudorandom string $G(v_{j-1}^L)$ and label the left child $v_j^L = G_0(v_{j-1}^L)$ and the right child $v_j^R = G_1(v_{j-1}^L)$. Note that the labeling scheme for this diagram is different from that in the previous figures. This is simply for ease of exposition; we could still index our nodes via the path reaching them from the root.

Figure 5.5: The $(j+1)^{st}$ hybrid differs from the $j^{th}$ in that the process of assigning random labels continues until the $(j+1)^{st}$ step as opposed to the $j^{th}$. The hybrids are otherwise constructed identically.

Figure 5.6: Distinguisher $D$ is similar to hybrid $j$, in that the nodes in the first $j$ layers are assigned completely random labels. When evaluating along a particular path through $v_{j-1}^L$, rather than labeling the two children by applying $G$ to its label, $D$ simply splits its input $y$ into the two strings $y_{0 \ldots n}$ and $y_{n+1 \ldots 2n}$. If $y$ is truly random, $D$ is identical to hybrid $j+1$. If $y = G(s)$ for some random seed $s$, then $D$ simulates hybrid $j$.

$A$ and the $j^{th}$ oracle inside its belly, with one difference: when the time comes to label the $j^{th}$ node $v$, instead of doing this by applying the pseudorandom generator to the label of $v$'s parent (which is what should happen in the $j^{th}$ oracle), it uses its input $y$ to label the two children of $v$.

Now, if $y$ was completely random then we get exactly the distribution of the $(j+1)^{st}$ oracle, and hence in this case $D$ simulates internally the $(j+1)^{st}$ hybrid. However, if $y = G(s)$ for some randomly sampled $s \in \{0,1\}^n$, though it may not be obvious at first, we actually get the distribution of the $j^{th}$ oracle.

The equivalence between hybrid $j$ and distinguisher $D$ under the condition that $y = G(s)$ is non-obvious, because in hybrid $j$, the label for the children of $v$ was supposed to be the result of applying the pseudorandom generator to the label of $v$ and not to some other random string (see Fig. 5.6). However, because $v$ was labeled before the $j^{th}$ step, we know that it was actually labeled by a random string. Moreover, since we use lazy evaluation we know that step $j$ is the first time where we actually use the value of the label of $v$. Hence, if at this point we resampled this label and used a completely independent random string, the resulting distribution would be identical. The key observations here are:

1. The output of $A$ does not directly depend on the internal labels, but only on the labels of the leaves (since those are the only values returned by the oracle).

2. The label for an internal vertex $v$ is only used once, and that is for generating the labels for its children.

Hence the distribution of $y = G(s)$, for $s$ drawn from $U_n$, is identical to the distribution $G(v_{j-1}^L)$ used to label the children in the $j^{th}$ hybrid, and thus if $A$ had advantage $\epsilon$ in breaking the PRF $\{f_s\}$ then $D$ will have advantage $\epsilon/T'$ in breaking the PRG $G$, thus obtaining a contradiction. ■

R Remark 5.2 — PRFs in practice. While this construction reassures us that we can rely on the existence of pseudorandom functions even on days where we remember to take our meds, this is not the construction people use when they need a PRF in practice, because it is still somewhat inefficient, making $n$ calls to the underlying pseudorandom generator. There are constructions (e.g., HMAC) based on hash functions that require stronger assumptions but can use as few as two calls to the underlying function. We will cover these constructions when we talk about hash functions and the random oracle model. One can also obtain practical constructions of PRFs from block ciphers, which we'll see later in this lecture.

5.1 SECURELY ENCRYPTING MANY MESSAGES - CHOSEN PLAINTEXT SECURITY

Let's get back to our favorite task of encryption. We seem to have nailed down the definition of secure encryption, or did we?

P Try to think what kind of security guarantees are not provided by the notion of computational secrecy we saw in Definition 2.6.

Definition 2.6 talks about encrypting a single message, but this is not how we use encryption in the real world. Typically, Alice and Bob (or Amazon and Boaz) set up a shared key and then engage in many back and forth messages between one another.

At first, we might think that this issue of a single long message vs. many short ones is merely a technicality. After all, if Alice wants to send a sequence of messages $(m_1, m_2, \ldots, m_t)$ to Bob, she can simply treat them as a single long message. Moreover, the way that stream ciphers work, Alice can compute the encryption for the first few bits of the message before she decides what the next bits will be, and so she can send the encryption of $m_1$ to Bob and later the encryption of $m_2$. There is some truth to this sentiment, but there are issues with using stream ciphers for multiple messages. For Alice and Bob to encrypt messages in this

way, they must maintain a synchronized shared state. If the message $m_1$ was dropped by the network, then Bob would not be able to correctly decrypt the encryption of $m_2$.

There is another way in which treating many messages as a single tuple is unsatisfactory. In real life, Eve might be able to have some impact on what messages Alice encrypts. For example, the Katz-Lindell book describes several instances in World War II where Allied forces made particular military maneuvers for the sole purpose of causing the Axis forces to send encryptions of messages of the Allies' choosing. To consider a more modern example, today Google uses encryption for all of its search traffic, including (for the most part) the ads that are displayed on the page. But this means that an attacker, by paying Google, can cause it to encrypt arbitrary text of their choosing. This kind of attack, where Eve chooses the message she wants to be encrypted, is called a chosen plaintext attack. You might think that we are already covering this with our current definition, which requires security for every pair of messages, and so in particular this pair could be chosen by Eve. However, in the case of multiple messages, we would want to allow Eve to choose $m_2$ after she saw the encryption of $m_1$.

All of this leads us to the following definition, which is a strengthening of our definition of computational security:

Definition 5.3 — Chosen Plaintext Attack (CPA) secure encryption. An encryption scheme $(E, D)$ is secure against chosen plaintext attack (CPA secure) if for every polynomial time $Eve$, Eve wins with probability at most $1/2 + negl(n)$ in the game defined below:

1. The key $k$ is chosen at random in $\{0,1\}^n$ and fixed.
2. Eve gets the length of the key $1^n$ as input.¹
3. Eve interacts with $E$ for $t = poly(n)$ rounds as follows: in the $i^{th}$ round, Eve chooses a message $m_i$ and obtains $c_i = E_k(m_i)$.
4. Then Eve chooses two messages $m_0, m_1$, and gets $c^* = E_k(m_b)$ for $b \leftarrow_R \{0,1\}$.
5. Eve continues to interact with $E$ for another $poly(n)$ rounds, as in Step 3.
6. Eve wins if she outputs $b$.

¹ Giving Eve the key length as a sequence of $n$ 1's, as opposed to in binary representation, is a common notational convention in cryptography. It makes no difference except that it makes the input for Eve of length $n$, which makes sense since we want to allow Eve to run in $poly(n)$ time.
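The game above can be sketched as a short simulation. The harness and the adversary interface below are hypothetical (not part of the formal definition), and for concreteness they are paired with a toy Eve who exploits a deterministic encryption algorithm by first querying one of her two challenge messages:

```python
import secrets

class Eve:
    """Toy adversary (illustrative interface): queries the all-zero
    message in step 3, then compares the challenge ciphertext with
    the answer. This only works when E is deterministic."""
    def __init__(self, msg_len: int):
        self.msg_len = msg_len
        self.c1 = None

    def queries(self):                        # step 3: chosen plaintexts
        return [b"\x00" * self.msg_len]

    def receive(self, cts):
        self.c1 = cts[0]

    def challenge(self):                      # step 4: challenge pair
        return b"\x00" * self.msg_len, b"\xff" * self.msg_len

    def guess(self, c_star):                  # step 6: output a bit
        return 0 if c_star == self.c1 else 1

def cpa_game(E, eve, n: int) -> bool:
    """One run of the CPA game; returns True iff Eve wins.
    E(key, message) -> ciphertext may be randomized."""
    k = secrets.token_bytes(n)                # step 1: random key
    eve.receive([E(k, m) for m in eve.queries()])
    m0, m1 = eve.challenge()
    b = secrets.randbits(1)
    c_star = E(k, (m0, m1)[b])
    return eve.guess(c_star) == b             # (step 5 omitted for brevity)
```

Against any randomized scheme satisfying Definition 5.3, this Eve would win only about half the time; against a deterministic scheme she wins always, a point we return to shortly.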

Definition 5.3 is illustrated in Fig. 5.7. Our previous notion of computational secrecy (i.e., Definition 2.6) corresponds to the case that we skip Steps 3 and 5 above. Since Steps 3 and 5 only give the adversary more power (and hence she is only more likely to win), CPA security (Definition 5.3) is stronger than computational secrecy (Definition 2.6), in the sense that every CPA secure encryption $(E, D)$ is also computationally secure.

Figure 5.7: In the CPA game, Eve interacts with the encryption oracle and at the end chooses $m_0, m_1$, gets an encryption $c^* = E_k(m_b)$ and outputs $b'$. She wins if $b' = b$.

It turns out that CPA security is strictly

stronger, in the sense that without modification, our stream ciphers cannot be CPA secure. In fact, we have a stronger, and initially somewhat surprising, theorem:

Theorem 5.4 — CPA security requires randomization. There is no CPA secure $(E, D)$ where $E$ is deterministic.

Proof. The proof is very simple: Eve will only use a single round of interacting with $E$, where she will ask for the encryption $c_1$ of the message $0^\ell$. In the challenge round, Eve will choose $m_0 = 0^\ell$ and $m_1 = 1^\ell$, get $c^* = E_k(m_b)$, and then output $0$ if and only if $c^* = c_1$. ■

This proof is so simple that you might think it shows a problem with the definition, but it is actually a real problem with security. If you encrypt many messages and some of them repeat themselves, it is possible to get significant information by seeing the repetition pattern (cue the XKCD cartoon again, see Fig. 5.8). To avoid this issue we need to use a randomized (or probabilistic) encryption, such that if we encrypt the same message twice we won't see two copies of the same ciphertext.² But how do we do that? Here pseudorandom functions come to the rescue:

Figure 5.8: Insecurity of deterministic encryption.

² If the messages are guaranteed to have high entropy, which roughly means that the probability that a message repeats itself is negligible, then it is possible to have a secure deterministic private-key encryption, and this is sometimes used in practice. (Though often some sort of randomization or padding is added to ensure this property, hence in effect creating a randomized encryption.) Deterministic encryptions can sometimes be useful for applications such as efficient queries on encrypted databases. See this lecture in Dan Boneh's Coursera course.

Theorem 5.5 — CPA security from PRFs. Suppose that $\{f_s\}$ is a PRF collection where $f_s \colon \{0,1\}^n \rightarrow \{0,1\}^\ell$. Then the following is a CPA secure encryption scheme: $E_s(m) = (r, f_s(r) \oplus m)$ where $r \leftarrow_R \{0,1\}^n$, and $D_s(r, z) = f_s(r) \oplus z$.

Proof. I leave it to you to verify that $D_s(E_s(m)) = m$. We need to show the CPA security property. As is usual in PRF-based constructions, we first show that this scheme would be secure if $f_s$ were an actually random function, and then use that to derive security.

Consider the game above when played with a completely random function, and let $r_i$ be the random string chosen by $E$ in the $i^{th}$ round and $r^*$ the string chosen in the challenge round. We start with the following simple but crucial claim:

Claim: The probability that $r^* = r_i$ for some $i$ is at most $T/2^n$.

Proof of claim: For any particular $i$, since $r^*$ is chosen independently of $r_i$, the probability that $r^* = r_i$ is $2^{-n}$. Hence the claim follows from the union bound. QED

Given this claim we know that with probability $1 - T/2^n$ (which is $1 - negl(n)$), the string $r^*$ is distinct from any string that was chosen before. This means that by the lazy evaluation principle, if $f_s(\cdot)$ is a completely random function then the value $f_s(r^*)$ can be thought of as being chosen at random in the challenge round independently of

anything that happened before. But then $f_s(r^*) \oplus m_b$ amounts to simply using the one-time pad to encrypt $m_b$. That is, the distributions $f_s(r^*) \oplus m_0$ and $f_s(r^*) \oplus m_1$ (where we think of $r^*, m_0, m_1$ as fixed and the randomness comes from the choice of the random function $f_s(\cdot)$) are both equal to the uniform distribution $U_n$ over $\{0,1\}^n$, and hence Eve gets absolutely no information about $b$.

This shows that if $f_s(\cdot)$ were a random function then Eve would win the game with probability at most $1/2$. Now if we have some efficient Eve that wins the game with probability at least $1/2 + \epsilon$, then we can build an adversary $A$ for the PRF that will run this entire game with black box access to $f_s(\cdot)$ and will output $1$ if and only if Eve wins. By the argument above, there would be a difference of at least $\epsilon$ in the probability that $A$ outputs $1$ when $f_s(\cdot)$ is random vs. when it is pseudorandom, hence contradicting the security property of the PRF. ■
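The scheme of Theorem 5.5 is short enough to sketch directly. HMAC-SHA256 stands in for the assumed PRF $f_s$ (the theorem holds for any PRF; this particular choice is only for illustration):

```python
import hmac, hashlib, secrets

def f(s: bytes, r: bytes) -> bytes:
    """PRF stand-in: the theorem assumes an arbitrary PRF collection
    {f_s}; HMAC-SHA256 here is illustrative only."""
    return hmac.new(s, r, hashlib.sha256).digest()

def encrypt(s: bytes, m: bytes) -> tuple:
    """E_s(m) = (r, f_s(r) XOR m) for a fresh random r."""
    assert len(m) <= 32            # one PRF output pads one message
    r = secrets.token_bytes(16)
    pad = f(s, r)[:len(m)]
    return r, bytes(a ^ b for a, b in zip(pad, m))

def decrypt(s: bytes, ct: tuple) -> bytes:
    """D_s(r, z) = f_s(r) XOR z."""
    r, z = ct
    pad = f(s, r)[:len(z)]
    return bytes(a ^ b for a, b in zip(pad, z))
```

Because a fresh $r$ is drawn on every call, encrypting the same message twice yields different ciphertexts, which is exactly what Theorem 5.4 showed is necessary.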

5.2 PSEUDORANDOM PERMUTATIONS / BLOCK CIPHERS

Now that we have pseudorandom functions, we might get greedy and want such functions with even more magical properties. This is where the notion of pseudorandom permutations comes in.

::: {.definition title="Pseudorandom permutations" #PRPdef}
Let $\ell \colon \mathbb{N} \rightarrow \mathbb{N}$ be some function that is polynomially bounded (i.e., there are some $0 < c < C$ such that $n^c < \ell(n) < n^C$ for every $n$). A collection of functions $\{f_s\}$ where $f_s \colon \{0,1\}^\ell \rightarrow \{0,1\}^\ell$ for $\ell = \ell(|s|)$ is called a pseudorandom permutation (PRP) collection if:

1. It is a pseudorandom function collection (i.e., the map $s, x \mapsto f_s(x)$ is efficiently computable and there is no efficient distinguisher between $f_s(\cdot)$ with a random $s$ and a random function).

2. Every function $f_s$ is a permutation of $\{0,1\}^\ell$ (i.e., a one-to-one and onto map).

3. There is an efficient algorithm that on input $s, y$ returns $f_s^{-1}(y)$.

The parameter $n$ is known as the key length of the pseudorandom permutation collection and the parameter $\ell = \ell(n)$ is known as the input length or block length. Often, $\ell = n$, and so in most cases you can safely ignore this distinction.
:::

P At first look, the definition above might seem not to make sense, since on one hand it requires the map $x \mapsto f_s(x)$ to be a permutation, but on the other hand it can be shown that with high probability a random map $H \colon \{0,1\}^\ell \rightarrow \{0,1\}^\ell$ will not be a permutation. How can then such a collection be pseudorandom? The key insight is that while a random map might not be a permutation, it is not possible to distinguish with a polynomial number of queries between a black box that computes a random function and a black box that computes a random permutation. Understanding why this is the case, and why this means that the definition above is reasonable, is crucial to getting intuition for this notion, and so I suggest you pause now and make sure you understand these points.

As usual with a new concept, we want to know whether it is possible to achieve it and whether it is useful. The former is established by the following theorem:

Theorem 5.6 — PRPs from PRFs. If the PRF conjecture holds (and hence by Theorem 5.1 also if the PRG conjecture holds) then there exists a pseudorandom permutation collection.

Figure 5.9: We build a PRP $p_{s_1,s_2,s_3}$ on $2n$ bits from three PRFs $f_{s_1}, f_{s_2}, f_{s_3}$ on $n$ bits by letting $p_{s_1,s_2,s_3}(x_1, x_2) = (z_1, y_2)$, where $y_1 = x_1 \oplus f_{s_1}(x_2)$, $y_2 = x_2 \oplus f_{s_2}(y_1)$ and $z_1 = f_{s_3}(y_2) \oplus y_1$.

Proof. Fig. 5.9 illustrates the construction of a pseudorandom permutation from a pseudorandom function. The construction (known as the Luby-Rackoff construction) uses several rounds of what is known as the Feistel transformation, which maps a function $f \colon \{0,1\}^n \rightarrow \{0,1\}^n$ into a permutation $g \colon \{0,1\}^{2n} \rightarrow \{0,1\}^{2n}$ using the map $(x, y) \mapsto (x, f(x) \oplus y)$.

Specifically, given a PRF family $\{f_s\}$ with $n$-bit keys, inputs, and outputs, our candidate PRP family will be $\{p_{s_1,s_2,s_3}\}$. Here, $p_{s_1,s_2,s_3} \colon \{0,1\}^{2n} \rightarrow \{0,1\}^{2n}$ is calculated on input $(x_1, x_2) \in \{0,1\}^{2n}$ as follows (see Fig. 5.9):

• First, map $(x_1, x_2) \mapsto (y_1, x_2)$, where $y_1 = x_1 \oplus f_{s_1}(x_2)$.

• Next, map $(y_1, x_2) \mapsto (y_1, y_2)$, where $y_2 = x_2 \oplus f_{s_2}(y_1)$.
• Next, map $(y_1, y_2) \mapsto (z_1, y_2)$, where $z_1 = y_1 \oplus f_{s_3}(y_2)$.
• Finally, output $p_{s_1,s_2,s_3}(x_1, x_2) = (z_1, y_2)$.

Each of the first three steps above corresponds to a single round of the Feistel transformation, which is easily seen to be both efficiently computable and efficiently invertible. In fact, we can efficiently calculate $p_{s_1,s_2,s_3}^{-1}(z_1, y_2)$ for an arbitrary string $(z_1, y_2) \in \{0,1\}^{2n}$ by running the above three rounds of Feistel transformations in reverse order.

Thus, the real challenge in proving Theorem 5.6 is not showing that $\{p_{s_1,s_2,s_3}\}$ is a valid permutation collection, but rather showing that it is pseudorandom. The details of the remainder of this proof are a bit technical, and can be safely skipped on a first reading.

Intuitively, the argument goes like this. Consider an oracle $\mathcal{O}$ for $p_{s_1,s_2,s_3}$ that answers an adversary's query $(x_1, x_2)$ by carrying out the three Feistel transformations outlined above and outputting $(z_1, y_2)$. First, we'll show that with high probability, $\mathcal{O}$ will never encounter the same intermediate string $y_1$ twice over the course of all queries (unless the adversary makes a duplicate query). Since the string $y_1$, calculated in Step 1, determines the input on which $f_{s_2}$ is evaluated in Step 2, it follows that the strings $y_2$ calculated in Step 2 will appear to be chosen independently and at random. In particular, they too will be pairwise distinct with high probability. Since the string $y_2$ is in turn passed as input to $f_{s_3}$ in Step 3, it follows that the strings $z_1$ encountered over the course of all queries will also appear to be chosen independently and at random. Ultimately, this means that the oracle's outputs $(z_1, y_2)$ will look like freshly independent, random strings.
To make this reasoning precise, notice first that it suffices to establish the security of a variant of $p_{s_1,s_2,s_3}$ in which the pseudorandom functions $f_{s_1}$, $f_{s_2}$, and $f_{s_3}$ used in the construction are replaced by truly random functions $h_1, h_2, h_3 \colon \{0,1\}^n \rightarrow \{0,1\}^n$. Call this variant $p_{h_1,h_2,h_3}$. Indeed, the assumption that $\{f_s\}$ is a PRF collection tells us that making this change has only a negligible effect on the output of an adversary with oracle access to $p$. With this in mind, our job is to show that for every efficient adversary $A$, the difference $|\Pr[A^{p_{h_1,h_2,h_3}(\cdot)}(1^n) = 1] - \Pr[A^{H(\cdot)}(1^n) = 1]|$ is negligible. In this expression, the first probability is taken over the choice of the random functions $h_1, h_2, h_3 \colon \{0,1\}^n \rightarrow \{0,1\}^n$ used in the Feistel transformation, and the second probability is taken over the random function $H \colon \{0,1\}^{2n} \rightarrow \{0,1\}^{2n}$. To simplify matters, suppose without loss of generality that $A$ always makes $q(n)$ distinct queries to its oracle, denoted $(x_1^{(1)}, x_2^{(1)}), \ldots, (x_1^{(q(n))}, x_2^{(q(n))})$ in order. Similarly, let $y_1^{(i)}, y_2^{(i)}, z_1^{(i)}$

denote the intermediate strings calculated in the three rounds of the Feistel transformation. Here, $q$ is a polynomial in $n$.

Consider the case in which the adversary $A$ is interacting with the oracle for $p_{h_1,h_2,h_3}$, as opposed to the random oracle. Let us say that a collision occurs at $y_1$ if for some $1 \leq i < j \leq q(n)$, the string $y_1^{(i)}$ computed while answering $A$'s $i$th query coincides with the string $y_1^{(j)}$ computed while answering $A$'s $j$th query. We claim the probability that a collision occurs at $y_1$ is negligibly small. Indeed, if a collision occurs at $y_1$, then $y_1^{(i)} = y_1^{(j)}$ for some $i \neq j$. By the construction of $p_{h_1,h_2,h_3}$, this means that $x_1^{(i)} \oplus h_1(x_2^{(i)}) = x_1^{(j)} \oplus h_1(x_2^{(j)})$. In particular, it cannot be the case that $x_1^{(i)} \neq x_1^{(j)}$ and $x_2^{(i)} = x_2^{(j)}$. Since we assumed that $A$ makes distinct queries to its oracle, it follows that $x_2^{(i)} \neq x_2^{(j)}$ and hence that $h_1(x_2^{(i)})$ and $h_1(x_2^{(j)})$ are uniform and independent. In other words, $\Pr[y_1^{(i)} = y_1^{(j)}] = \Pr[x_1^{(i)} \oplus h_1(x_2^{(i)}) = x_1^{(j)} \oplus h_1(x_2^{(j)})] = 2^{-n}$. Taking a union bound over all choices of $i$ and $j$, we see that the probability of a collision at $y_1$ is at most $q(n)^2/2^n$, which is negligible.

Next, define a collision at $y_2$ by a pair of queries $1 \leq i < j \leq q(n)$ such that $y_2^{(i)} = y_2^{(j)}$. We argue that the probability of a collision at $y_2$ is also negligible, provided that we condition on the overwhelmingly likely event that no collision occurs at $y_1$. Indeed, if $y_1^{(i)} \neq y_1^{(j)}$ for all $i \neq j$, then $h_2(y_1^{(1)}), \ldots, h_2(y_1^{(q(n))})$ are distributed independently and uniformly at random. In particular, we have $\Pr[y_2^{(i)} = y_2^{(j)} \mid \text{no collision at } y_1] = 2^{-n}$, which is negligible even after taking a union bound over all $i$ and $j$. The same argument applied to the third round of the Feistel transformation similarly shows that, conditioned on the overwhelmingly likely event that no collision occurs at $y_1$ or $y_2$, the strings $z_1^{(1)}, \ldots, z_1^{(q(n))}$ are also distributed as fresh, independent, random strings.
At this point, we've shown that the adversary cannot distinguish the outputs $(z_1^{(1)}, y_2^{(1)}), \ldots, (z_1^{(q(n))}, y_2^{(q(n))})$ of the oracle for $p_{h_1,h_2,h_3}$ from the outputs of a random oracle unless an event with negligibly small probability occurs. We conclude that the collection $\{p_{h_1,h_2,h_3}\}$, and hence our original collection $\{p_{s_1,s_2,s_3}\}$, is a secure PRP collection.

For more details regarding this proof, see Section 4.5 in Boneh-Shoup or Section 8.6 (7.6 in 2nd ed) in Katz-Lindell, whose proof was used as a model for ours. ■
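The three Feistel rounds and their inversion can be sketched as follows, with HMAC-SHA256 as a hypothetical stand-in for the PRFs $f_{s_1}, f_{s_2}, f_{s_3}$ on $n = 16$ bytes:

```python
import hmac, hashlib

def f(s: bytes, x: bytes) -> bytes:
    """PRF stand-in on 16-byte blocks (illustration only; the
    construction works with any PRF)."""
    return hmac.new(s, x, hashlib.sha256).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def p(s1, s2, s3, x1, x2):
    """Three Feistel rounds: the Luby-Rackoff permutation."""
    y1 = xor(x1, f(s1, x2))
    y2 = xor(x2, f(s2, y1))
    z1 = xor(y1, f(s3, y2))
    return z1, y2

def p_inv(s1, s2, s3, z1, y2):
    """Invert by running the rounds in reverse order."""
    y1 = xor(z1, f(s3, y2))
    x2 = xor(y2, f(s2, y1))
    x1 = xor(y1, f(s1, x2))
    return x1, x2
```

Note that inversion never needs to invert $f$ itself: each round only XORs a PRF output into one half, which is why a (non-invertible) PRF suffices to build an invertible permutation.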

R Remark 5.7 — How many Feistel rounds?. The construction in the proof of Theorem 5.6 constructed a PRP $p$ by performing $3$ rounds of the Feistel transformation with a known PRF $f$. It is an interesting exercise to try to show that doing just $1$ or $2$ rounds of the Feistel transformation does not suffice to achieve a PRP. Hint: consider an adversary that makes queries of the form $(x_1, x_2)$ where $x_2$ is held fixed and $x_1$ is varied.

The more common name for a pseudorandom permutation is block cipher (though typically block ciphers are expected to meet additional security properties on top of being PRPs). The constructions for block ciphers used in practice don't follow the construction of Theorem 5.6 (though they use some of the ideas) but have a more ad-hoc nature.

One of the first modern block ciphers was the Data Encryption Standard (DES) constructed by IBM in the 1970's. It is a fairly good cipher: to this day, as far as we know, it provides a pretty good number of security bits compared to its key length. The trouble is that its key is only $56$ bits long, which is no longer outside the reach of modern computing power. (It turns out that subtle variants of DES are far less secure and fall prey to a technique known as differential cryptanalysis; the IBM designers of DES were aware of this technique but kept it secret at the behest of the NSA.)

Between 1997 and 2001, the U.S. National Institute of Standards and Technology (NIST) ran a competition to replace DES, which resulted in the adoption of the block cipher Rijndael as the new Advanced Encryption Standard (AES). It has a block size (i.e., input length) of 128 bits and a key size (i.e., seed length) of 128, 192, or 256 bits.

The actual construction of AES (or DES for that matter) is not extremely illuminating, but let us say a few words about the general principle behind many block ciphers. They are typically constructed by repeating one after the other a number of very simple permutations (see Fig. 5.10). Each such iteration is called a round. If there are $t$ rounds, then the key $k$ is typically expanded into a longer string, which we think of as a tuple of $t$ strings $(k_1, \ldots, k_t)$, via some pseudorandom generator known as the key scheduling algorithm. The $i$-th string in the tuple is known as the round key and is used in the $i^{th}$ round.
Each round is typically composed of several components: there is a "key mixing component" that performs some simple permutation based on the key (often as simple as XOR'ing the key), there is a "mixing component" that mixes the bits of the block so that bits that were initially nearby don't stay close to one another, and then there is some non-linear component (often obtained by applying some simple non-linear functions known as "S boxes" to each small block of the input) that ensures that the overall cipher will not be an affine function. Each one of these operations is easily reversible, and hence decrypting the cipher simply involves running the rounds backwards.

Figure 5.10: A typical round of a block cipher: $k_i$ is the $i^{th}$ round key, $x_i$ is the block before the $i^{th}$ round, and $x_{i+1}$ is the block at the end of this round.

5.3 ENCRYPTION MODES

How do we use a block cipher to actually encrypt traffic? Well, we could use it as a PRF in the construction above, but in practice people use other ways.³

³ Partially this is because in the above construction we had to encode a plaintext of length $n$ with a ciphertext of length $2n$, meaning an overhead of 100 percent in the communication.

The most natural approach would be that to encrypt a message $m$, we simply output $p_s(m)$ where $\{p_s\}$ is the PRP/block cipher. This is known as the electronic codebook (ECB) mode of a block cipher (see Fig. 5.11). Note that we can easily decrypt since we can compute $p_s^{-1}$. If the PRP only accepts inputs of a fixed length $\ell$, we can use ECB mode to encrypt a message $m$ whose length is a multiple of $\ell$ by writing $m = (m_1, m_2, \ldots, m_t)$, where each block $m_i$ has length $\ell$, and then encrypting each block separately. The ciphertext output by this encryption scheme is $(p_s(m_1), \ldots, p_s(m_t))$. A major drawback of ECB mode is that it is a deterministic encryption scheme and hence cannot be CPA secure. Moreover, this is actually a real problem of security on realistic inputs (see Fig. 5.12), so ECB mode should never be used.

Figure 5.11: In the Electronic Codebook (ECB) mode, every message is encrypted deterministically and independently.

Figure 5.12: An encryption of the Linux penguin (left image) using ECB mode (middle image) vs CBC mode (right image). The ECB encryption is insecure as it reveals much structure about the original image. Image taken from Wikipedia.

A more secure way to use a block cipher to encrypt is the cipher block chaining (CBC) mode. The idea of cipher block chaining is to encrypt the blocks of a message $m = (m_1, \ldots, m_t)$ sequentially. To encrypt the first block $m_1$, we XOR $m_1$ with a random string known as the initialization vector, or IV, before applying the block cipher $p_s$. To encrypt one of the later blocks $m_i$, where $i > 1$, we XOR $m_i$ with $c_{i-1}$ before applying the block cipher $p_s$. Formally, the ciphertext consists of the tuple $(IV, c_1, \ldots, c_t)$, where $IV$ is chosen uniformly at random and $c_i = p_s(c_{i-1} \oplus m_i)$ for $1 \leq i \leq t$ (we use the convention that $c_0 = IV$). This encryption process is depicted in Fig. 5.13.
In order to decrypt $(IV, c_1, \ldots, c_t)$, we simply calculate $m_i = p_s^{-1}(c_i) \oplus c_{i-1}$ for $1 \leq i \leq t$. Note that if we lose the block $c_i$ to traffic in the CBC mode, then we are unable to decrypt the next block $c_{i+1}$, but we can recover from that point onwards.

On the one hand, CBC mode is vastly superior to a simple electronic codebook, since CBC mode with a random IV is CPA secure (proving this is an excellent exercise). On the other hand, CBC mode suffers from the drawback that the encryption process cannot be parallelized: the ciphertext block $c_i$ must be computed before $c_{i+1}$.

Figure 5.13: In the Cipher-Block-Chaining (CBC) mode, the encryption of the previous message is XOR'ed into the current message prior to encrypting. The first message is XOR'ed with an initialization vector (IV) that, if chosen randomly, ensures CPA security.

In the output feedback (OFB) mode we first encrypt the all-zero string using CBC mode to get a sequence $(y_1, y_2, \ldots)$ of pseudorandom outputs that we can use as a stream cipher. To transmit a message $m \in \{0,1\}^*$, we send the XOR of $m$ with the bits output by this stream cipher, along with the IV used to generate the sequence. The receiver can decrypt a ciphertext $(IV, c)$ by first using $IV$ to recover $(y_1, y_2, \ldots)$, and then taking the XOR of $c$ with the appropriate number of bits

from this sequence. Like CBC mode, OFB mode is CPA secure when the IV is chosen at random. Some advantages of OFB mode over CBC mode include the ability for the sender to precompute the sequence $(y_1, y_2, \ldots)$ well before the message to be encrypted is known, as well as the fact that the underlying function $p_s$ used to generate $(y_1, y_2, \ldots)$ only needs to be a PRF (not necessarily a PRP).

Perhaps the simplest mode of operation is counter (CTR) mode, where we convert a block cipher to a stream cipher by using the stream $p_s(IV), p_s(IV + 1), p_s(IV + 2), \ldots$, where $IV$ is a random string in $\{0,1\}^n$ which we identify with $[2^n]$ (and perform addition modulo $2^n$). That is, to encrypt a message $m = (m_1, \ldots, m_t)$, we choose $IV$ at random, and output $(IV, c_1, \ldots, c_t)$, where $c_i = p_s(IV + i) \oplus m_i$ for $1 \leq i \leq t$. Decryption is performed similarly. For a modern block cipher, CTR mode is no less secure than CBC or OFB, and in fact offers several advantages. For example, CTR mode can easily encrypt and decrypt blocks in parallel, unlike CBC mode. In addition, CTR mode only needs to evaluate $p_s$ once to decrypt any single block of the ciphertext, unlike OFB mode.

A fairly comprehensive study of the different modes of block ciphers is in this document by Rogaway. His conclusion is that if we simply consider CPA security (as opposed to the stronger notions of chosen ciphertext security we'll see in the next lecture) then counter mode is the best choice, but CBC, OFB and CFB are widely implemented due to legacy reasons. ECB should not be used (except as a building block as part of a construction achieving stronger security).
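Counter mode is simple enough to sketch in a few lines. Here HMAC-SHA256 plays the role of $p_s$ (a hypothetical stand-in for a real block cipher such as AES; note that, as with OFB, CTR mode only needs a PRF, not a PRP):

```python
import hmac, hashlib, secrets

BLOCK = 16

def prf(key: bytes, x: bytes) -> bytes:
    """Stand-in for the block cipher p_s (illustration only)."""
    return hmac.new(key, x, hashlib.sha256).digest()[:BLOCK]

def _stream_xor(key: bytes, iv: int, data: bytes) -> bytes:
    """XOR data with the pad stream p_s(IV+1), p_s(IV+2), ..."""
    out = b""
    for i in range(0, len(data), BLOCK):
        ctr = ((iv + 1 + i // BLOCK) % 2**128).to_bytes(BLOCK, "big")
        pad = prf(key, ctr)
        out += bytes(a ^ b for a, b in zip(pad, data[i:i + BLOCK]))
    return out

def ctr_encrypt(key: bytes, m: bytes):
    iv = secrets.randbelow(2**128)       # random IV, sent in the clear
    return iv, _stream_xor(key, iv, m)

def ctr_decrypt(key: bytes, iv: int, c: bytes) -> bytes:
    return _stream_xor(key, iv, c)       # decryption is the same XOR
```

Since each block's pad depends only on the counter value, the loop in `_stream_xor` could process blocks in any order or in parallel, which is exactly the parallelism advantage noted above.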

5.4 OPTIONAL, ASIDE: BROADCAST ENCRYPTION

At the beginning of this chapter, we saw the proof of Theorem 5.1, which states that the PRG Conjecture implies the existence of a secure PRF collection. At the heart of this proof was a rather clever construction based on a binary tree. As it turns out, similar tree constructions have been used time and again to solve many other problems in cryptography. In this section, we will discuss just one such application of these tree constructions, namely broadcast encryption.

Let's put ourselves in the shoes of Hollywood executives facing the following problem: we've just released a new movie for sale (in the form of a download or a Blu-ray disc), and we'd like to prevent it from being pirated. On the one hand, consumers who've purchased a copy of the movie should be able to watch it on certain approved, standalone devices such as TVs and Blu-ray players without needing an external internet connection. On the other hand, to minimize the risk of piracy, these consumers should not have access to the movie data itself.

One way to protect the movie data, which we model as a string $x$, is to provide consumers with a secure encryption $E_k(x)$ of the data. Although the secret key $k$ used to encrypt the data is hidden from consumers, it is provided to device manufacturers so that they can embed it in their TVs and Blu-ray players in some secure, tamper-resistant manner. As long as the key $k$ is never leaked to the public, this system ensures that only approved devices can decrypt and play a consumer's copy of the movie. For this reason, we will sometimes refer to $k$ as the device key. This setup is depicted in Fig. 5.14.

Figure 5.14: The problem setup for broadcast encryption.

Unfortunately, if we were to implement this scheme exactly as written, it would almost certainly be broken in a matter of days. After all, as soon as even a single device is hacked, the device key $k$ would be revealed. This would allow the public to access our movie's data, as well as the data for all future movies we release for these devices! This latter consequence is one that we would certainly want to avoid, and doing so requires the notion of distinct, revocable keys:

Definition 5.8 — Broadcast Encryption Scheme. For our purposes, a broadcast encryption scheme consists of:

• A set of $m$ distinct devices (or device manufacturers), each of which has access to one of the $n$-bit device keys $k_1, \ldots, k_m$.

• A decryption algorithm $D$ that receives as input a ciphertext $y$ and a key $k_i$.

• An encryption algorithm $E$ that receives as input a plaintext $x$, a key $k_{master}$, and a revocation set $R \subseteq [m]$ of devices (or device manufacturers) that are no longer to be trusted.

Intuitively, a broadcast encryption scheme is secure if $D$ can successfully recover $x$ from $E_{k_{master},R}(x)$ using $k_i$ whenever $i \notin R$, but fails to do so whenever $i \in R$. In our example of movie piracy, such an encryption scheme would allow us to revoke certain device keys when we find out that they have been leaked. To revoke a key $k_i$, we would simply include $i \in R$ when encrypting all future movies. Doing so prevents $k_i$ from being used to decrypt these movies. Crucially, revoking the key $k_i$ of the hacked device $i$ doesn't prevent a secure device $j \neq i$ from continuing to perform decryption on future movie releases; this is exactly what we want in our system.

For the sake of brevity, we will not provide a formal definition of security for broadcast encryption schemes, although this can and has been done. Instead, in the remainder of this section, we will describe a couple examples of broadcast encryption schemes, one of which makes clever use of a tree construction, as promised.

The simplest construction of a broadcast encryption scheme involves letting $k_{master} = (k_1, \ldots, k_m)$ be the collection of all device keys and letting $E_{k_{master},R}(x)$ be the concatenation over all $i \notin R$ of a secure encryption $E_{k_i}(x)$. Device $i$ performs decryption by looking up the relevant substring $E_{k_i}(x)$ of the ciphertext and decrypting it with $k_i$. Intuitively, with this scheme, if $x$ represents our movie data and there are $m \approx$ one million devices, then $E_{k_{master},R}(x)$ is just an encryption of one million copies of the movie (one for each device key). Revoking the key $k_i$ amounts to only encrypting $999{,}999$ copies of all future movies, so that device $i$ can no longer perform decryption.

Clearly, this simple solution to the broadcast encryption problem has two serious inefficiencies: the length of the master key is $O(nm)$, and the length of each encryption is $O(|x|m)$. One way to address the former problem is to use a pseudorandom function. That is, we can shorten the master key by choosing a fixed PRF collection $\{f_k\}$, and calculating each device key by the rule $k_i = f_{k_{master}}(i)$. The latter problem can be addressed using a technique known as hybrid encryption. In hybrid encryption, we encrypt $x$ by first choosing an ephemeral key $\hat{k} \leftarrow \{0,1\}^n$, encrypting $\hat{k}$ using each device key $k_i$ where $i \notin R$, and then outputting the concatenation of these strings $E_{k_i}(\hat{k})$, along with a single encryption $E_{\hat{k}}(x)$ of the movie using the ephemeral key. Incorporating these two optimizations reduces the length of $k_{master}$ to $O(n)$ and the length of each encryption to $O(nm + |x|)$.
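The two optimizations above (PRF-derived device keys plus hybrid encryption) can be sketched as follows. This is a toy for intuition only: HMAC-SHA256 stands in for the PRF $f$, the XOR-stream "cipher" is purely illustrative, and all function names are our own.

```python
import hmac, hashlib, os

def device_key(master, i):
    # k_i = f_{k_master}(i), with HMAC-SHA256 as a stand-in PRF.
    return hmac.new(master, str(i).encode(), hashlib.sha256).digest()

def stream(key, length):
    # Toy hash-based pad for illustration only (not a production cipher).
    out = b""
    for ctr in range((length + 31) // 32):
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
    return out[:length]

def enc(key, data):  # XOR stream cipher: encryption and decryption coincide
    return bytes(a ^ b for a, b in zip(data, stream(key, len(data))))

def broadcast_encrypt(master, revoked, movie, m):
    # Hybrid encryption: wrap a fresh ephemeral key under every
    # non-revoked device key, then encrypt the movie once.
    ephemeral = os.urandom(32)
    header = {i: enc(device_key(master, i), ephemeral)
              for i in range(m) if i not in revoked}
    return header, enc(ephemeral, movie)

def device_decrypt(k_i, i, header, body):
    if i not in header:                      # device i has been revoked
        return None
    return enc(enc(k_i, header[i]), body)    # recover ephemeral key, then x

master = os.urandom(32)
header, body = broadcast_encrypt(master, revoked={3}, movie=b"movie data", m=8)
assert device_decrypt(device_key(master, 5), 5, header, body) == b"movie data"
assert device_decrypt(device_key(master, 3), 3, header, body) is None
```

Note the ciphertext contains one small wrapped-key entry per non-revoked device plus a single encryption of the movie, matching the $O(nm + |x|)$ bound.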

Figure 5.15: A tree based construction of broadcast encryption with revocable keys.

It turns out that we can construct a broadcast encryption scheme with even shorter ciphertexts by considering a tree of keys (see Fig. 5.15). The root of this tree is labeled $k_\emptyset$, its children are $k_0$ and $k_1$, their children are $k_{00}, k_{01}, k_{10}, k_{11}$, and so on. The depth of the tree is $\log_2 m$, and the value of each key in the tree is decided uniformly at random, or by applying a key derivation function to a string $k_{master}$. Each device $i$ receives all the keys on the path from the root to the $i$-th leaf. For example, if $m = 8$, then device $i = 011$ receives the keys $k_\emptyset, k_0, k_{01}, k_{011}$.

To encrypt a message $x$, we carry out the following procedure: initially, when no keys have been revoked, we encrypt $x$ using an ephemeral key $\hat{k}$ (as described above) and encrypt $\hat{k}$ with the single key $k_\emptyset$. This is sufficient since all devices have access to $k_\emptyset$. In order to add a hacked device $i$ to the revocation set, we discard all $\log_2 m$ keys belonging to device $i$, which comprise a root-to-leaf path in the tree. Instead of using these keys, we will make sure to encrypt all future $\hat{k}$'s using the siblings of the vertices along this path. Doing so ensures that (1) device $i$ can no longer decrypt secure content and (2) every device $j \neq i$ can continue to decrypt content using at least one of the keys along the path from the root to the $j$-th leaf. With this scheme, the total length of a ciphertext is only $O(n|R| \log_2 m + |x|)$ bits, where $|R|$ is the number of devices revoked so far. When $|R|$ is small, this bound is much better than what we previously achieved without a tree-based construction.
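Computing which tree keys to encrypt under after some revocations is a small combinatorial exercise, sketched below. This is our own illustrative Python (hypothetical helper names): for a revocation set it returns node labels such that every non-revoked device holds at least one corresponding key, while no revoked device holds any of them.

```python
def key_labels(leaf, depth):
    # Labels of the keys device `leaf` holds: all prefixes of its
    # root-to-leaf path, starting with the root label "".
    bits = format(leaf, f"0{depth}b")
    return [""] + [bits[:i + 1] for i in range(depth)]

def cover(revoked, depth):
    # Node labels under which to wrap the ephemeral key: the siblings
    # along each revoked path that have no revoked leaf beneath them.
    revoked_paths = set()
    for leaf in revoked:
        revoked_paths.update(key_labels(leaf, depth))
    if not revoked_paths:
        return [""]  # nobody revoked: encrypt under the root key alone
    out = set()
    for leaf in revoked:
        bits = format(leaf, f"0{depth}b")
        for i in range(depth):
            sibling = bits[:i] + ("1" if bits[i] == "0" else "0")
            if sibling not in revoked_paths:
                out.add(sibling)
    return sorted(out)

# With m = 8 (depth 3) and device 3 = "011" revoked, the ephemeral key
# is wrapped under k_1, k_00 and k_010 -- at most |R| log2(m) keys.
assert cover({3}, 3) == ["00", "010", "1"]
```

Each revoked device contributes at most $\log_2 m$ sibling labels, which gives the $O(n|R|\log_2 m + |x|)$ ciphertext length stated above.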

5.5 READING COMPREHENSION EXERCISES

I recommend students do the following exercises after reading the lecture. They do not cover all material, but can be a good way to check your understanding.

Exercise 5.1 Let $(E, D)$ be the encryption scheme that we saw in Lecture 2 where $E_k(m) = G(k) \oplus m$ and $G(\cdot)$ is a pseudorandom generator. Is this scheme CPA secure?

a. No, it is never CPA secure.

b. It is always CPA secure.

c. It is sometimes CPA secure and sometimes not, depending on the properties of the PRG

$G$.

■

Exercise 5.2 Consider the proof constructing PRFs from PRGs. Up to an order of magnitude, how many invocations of the underlying pseudorandom generator does the pseudorandom function collection make when queried on an input $i \in \{0,1\}^n$?

a. $n$

b. $n^2$

c. $2^n$

d. $1$

■

Exercise 5.3 In the following we identify a block cipher with a pseudorandom permutation (PRP) collection. Which of these statements is true:

a. Every PRP collection is also a PRF collection

b. Every PRF collection is also a PRP collection

c. If $\{f_s\}$ is a PRP collection then the encryption scheme $E_s(m) = f_s(m)$ is a CPA secure encryption scheme.

d. If $\{f_s\}$ is a PRF collection then the encryption scheme $E_s(m) = f_s(m)$ is a CPA secure encryption scheme.

■

6 Chosen Ciphertext Security

6.1 SHORT RECAP

Let’s start by reviewing what we have learned so far:

• We can mathematically define security for encryption schemes. A natural definition is perfect secrecy: no matter what Eve does, she can't learn anything about the plaintext that she didn't know before. Unfortunately this requires the key to be as long as the message, thus severely limiting its usability.

• To get around this, we need to consider computational considerations. A basic object is a pseudorandom generator and we considered the PRG Conjecture which stipulates the existence of an efficiently computable function $G: \{0,1\}^n \rightarrow \{0,1\}^{n+1}$ such that $G(U_n) \approx U_{n+1}$ (where $U_m$ denotes the uniform distribution on $\{0,1\}^m$ and $\approx$ denotes computational indistinguishability).1

1 The PRG conjecture is the name we use in this course. In the literature this is known as the conjecture of the existence of pseudorandom generators, and through the work of Håstad, Impagliazzo, Levin and Luby (HILL) it is known to be equivalent to the existence of one way functions, see Vadhan, Chapter 7.

• We showed that the PRG conjecture implies a pseudorandom generator of any polynomial output length, which in particular via the stream cipher construction implies a computationally secure encryption with plaintext arbitrarily larger than the key. (The only restriction is that the plaintext is of polynomial size which is needed anyway if we want to actually be able to read and write it.)
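The stretching step in the stream cipher construction mentioned above can be sketched: iterate $G$, keeping $n$ output bits as the next state and emitting the remaining bit. In this toy Python sketch (our own names), SHA-256 merely stands in for a conjectured PRG; it is an illustration of the iteration, not a proof of pseudorandomness.

```python
import hashlib

def G(state):
    # Stand-in for a PRG {0,1}^n -> {0,1}^{n+1}: return a new n-byte
    # state plus one extra output bit (illustration only).
    h = hashlib.sha256(state).digest()
    return h[:len(state)], h[-1] & 1

def stretch(seed, t):
    # Get t pseudorandom bits from a short seed by iterating G.
    state, out = seed, []
    for _ in range(t):
        state, bit = G(state)
        out.append(bit)
    return out

def stream_encrypt(seed, msg_bits):
    # One-time-pad style encryption with the stretched output;
    # the same function decrypts, since XOR is an involution.
    return [m ^ b for m, b in zip(msg_bits, stretch(seed, len(msg_bits)))]

seed = b"s" * 16
msg = [1, 0, 1, 1, 0, 0, 1]
ct = stream_encrypt(seed, msg)
assert stream_encrypt(seed, ct) == msg
```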

• We then showed that the PRG conjecture actually implies a stronger object known as a pseudorandom function (PRF) collection: this is a collection of functions $\{f_s\}$ such that if we choose $s$ at random and fix it, and give an adversary a black box computing $i \mapsto f_s(i)$, then she can't tell the difference between this and a black box computing a random function.2

2 This was done by Goldreich, Goldwasser and Micali.

• Pseudorandom functions turn out to be useful for identification protocols, message authentication codes and this strong notion of security


of encryption known as chosen plaintext attack (CPA) security where we are allowed to encrypt many messages of Eve’s choice and still re- quire that the next message hides all information except for what Eve already knew before.

6.2 GOING BEYOND CPA

It may seem that we have finally nailed down the security definition for encryption. After all, what could be stronger than allowing Eve unfettered access to the encryption function? Clearly an encryption satisfying this property will hide the contents of the message in all practical circumstances. Or will it?

P Please stop and play an ominous sound track at this point.

6.2.1 Example: The Wired Equivalence Privacy (WEP)

The Wired Equivalence Privacy (WEP) protocol is perhaps one of the most inaccurately named protocols of all time. It was invented in 1999 for the purpose of securing Wi-Fi networks so that they would have virtually the same level of security as wired networks, but already early on several security flaws were pointed out. In particular, in 2001, Fluhrer, Mantin, and Shamir showed how the RC4 flaws we mentioned in a prior lecture can be used to completely break WEP in less than one minute. Yet the protocol lingered on, and for many years after it was still the most widely used WiFi encryption protocol, as many routers had it as the default option. In 2007 WEP was blamed for a hack stealing 45 million credit card numbers from T.J. Maxx. In 2012 (after 11 years of attacks!) it was estimated that it was still used in about a quarter of encrypted wireless networks, and in 2014 it was still the default option on many Verizon home routers. It is still (!) used in some routers, see Fig. 6.1. This is a great example of how hard it is to remove insecure protocols from practical usage (and so how important it is to get these protocols right).

Here we will talk about a different flaw of WEP that is in fact shared by many other protocols, including the first versions of the secure socket layer (SSL) protocol that is used to protect all encrypted web traffic. To avoid superfluous details we will consider a highly abstract (and somewhat inaccurate) version of WEP that still demonstrates our main point. In this protocol Alice (the user) sends to Bob (the access point) an IP packet that she wants routed somewhere on the internet.

Figure 6.1: WEP usage over time according to Wigle.net. Despite having documented security issues since 2001 and being officially deprecated since 2004, WEP continued to be the most popular WiFi encryption protocol up to 2012, and at the time of writing, it is still used by a non-trivial number of devices (though see this stackoverflow answer for more).

Thus we can think of the message Alice sends to Bob as a string $m \in \{0,1\}^\ell$ of the form $m = m_1 \| m_2$ where $m_1$ is the IP address this packet needs to be routed to and $m_2$ is the actual message that needs to be delivered. In the WEP protocol, the message that Alice sends to Bob has the form $E_k(m \| CRC(m))$ (where $\|$ denotes concatenation and $CRC(m)$ is some cyclic redundancy check). A CRC is some function mapping $\{0,1\}^n$ to $\{0,1\}^t$ which is meant to enable detection of errors in typing or communication. The idea is that if a message $m$ is mistyped into $m'$, then it is very likely that $CRC(m) \neq CRC(m')$. It is similar to the checksum digits used in credit card numbers and many other cases. Unlike a message authentication code, a CRC does not have a secret key and is not secure against adversarial perturbations.

The actual encryption WEP used was RC4, but for us it doesn't really matter. What does matter is that the encryption has the form $E_k(m') = pad \oplus m'$ where $pad$ is computed as some function of the key. In particular the attack we will describe works even if we use our stronger CPA secure PRF-based scheme where $pad = f_k(r)$ for some random (or counter) $r$ that is sent out separately.

Now the security of the encryption means that an adversary seeing the ciphertext $c = E_k(m \| CRC(m))$ will not be able to know $m$, but since this is traveling over the air, the adversary could "spoof" the signal and send a different ciphertext $c'$ to Bob. In particular, if the adversary knows the IP address $m_1$ that Alice was using (e.g., the adversary can guess that Alice is probably one of the billions of people that visit the website boazbarak.org on a regular basis) then she can XOR the ciphertext with a string of her choosing and hence convert the ciphertext $c = pad \oplus (m_1 \| m_2 \| CRC(m_1, m_2))$ into the ciphertext $c' = c \oplus x$ where $x = x_1 \| x_2 \| x_3$ is computed so that $x_1 \oplus m_1$ is equal to the adversary's own IP address!

So, the adversary doesn't need to decrypt the message - by spoofing the ciphertext she can ensure that Bob (who is an access point,

and whose job is to decrypt and then deliver packets) simply delivers it unencrypted straight into her hands. One issue is that if Eve modifies $m_1$ then it is unlikely that the CRC code will still check out, and hence Bob would reject the packet. However, CRC-32 - the CRC algorithm used by WEP - is linear modulo $2$, that is, $CRC(x \oplus x') = CRC(x) \oplus CRC(x')$. This means that if the original ciphertext $c$ was an encryption of the message $m = m_1 \| m_2 \| CRC(m_1, m_2)$ then $c' = c \oplus (x_1 \| 0^t \| CRC(x_1 \| 0^t))$ will be an encryption of the message $m' = (m_1 \oplus x_1) \| m_2 \| CRC((m_1 \oplus x_1) \| m_2)$ (where $0^t$ denotes a string of zeroes of the same length $t$ as $m_2$, and hence $m_2 \oplus 0^t = m_2$). Therefore by XOR'ing $c$ with $x_1 \| 0^t \| CRC(x_1 \| 0^t)$, the adversary Mallory can ensure that Bob will deliver the message $m_2$ to the IP address $m_1 \oplus x_1$ of her choice (see Fig. 6.2).

Figure 6.2: The attack on the WEP protocol allowing the adversary Mallory to read encrypted messages even when Alice uses a CPA secure encryption.
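The attack above can be demonstrated end to end in Python against the same abstracted WEP (a toy stream cipher over $dest \| payload \| CRC$; all names are ours). One honest caveat: the deployed CRC-32 (as in `zlib.crc32`, with its initial and final XOR) is affine rather than strictly linear over XOR, so for equal-length strings $CRC(x \oplus y) = CRC(x) \oplus CRC(y) \oplus CRC(0^n)$, and Mallory's fix-up term must include the $CRC(0^n)$ correction:

```python
import os, zlib

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def crc(data):
    return zlib.crc32(data).to_bytes(4, "little")

def wep_encrypt(pad, dest, payload):
    # Abstracted WEP: stream-cipher encryption of dest || payload || CRC.
    m = dest + payload
    return xor(pad, m + crc(m))

dest, payload = bytes([10, 0, 0, 1]), b"attack at dawn!!"
pad = os.urandom(4 + len(payload) + 4)   # Mallory never learns the pad
c = wep_encrypt(pad, dest, payload)

# CRC-32 is affine over XOR: crc(x ^ y) = crc(x) ^ crc(y) ^ crc(0^n),
# so Mallory can redirect the packet and fix the checksum blindly.
evil = bytes([192, 168, 1, 66])
delta = xor(dest, evil) + bytes(len(payload))
crc_fix = xor(crc(delta), crc(bytes(len(delta))))
forged = xor(c, delta + crc_fix)

# Bob decrypts: the checksum still verifies, but the destination is now
# Mallory's address and the payload is intact.
recovered = xor(pad, forged)
assert crc(recovered[:-4]) == recovered[-4:]
assert recovered[:4] == evil and recovered[4:-4] == payload
```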

6.2.2 Chosen ciphertext security

This is not an isolated example but in fact an instance of a general pattern of many breaks in practical protocols. Some examples of protocols broken through similar means include XML encryption, IPSec (see also here) as well as JavaServer Faces, Ruby on Rails, ASP.NET, and the Steam gaming client (see the Wikipedia page on Padding Oracle Attacks). The point is that often our adversaries can be active and modify the communication between sender and receiver, which in effect gives them access not just to choose plaintexts of their choice to encrypt but even to have some impact on the ciphertexts that are decrypted. This motivates the following notion of security (see also Fig. 6.3):

Definition 6.1 — CCA security. An encryption scheme $(E, D)$ is chosen ciphertext attack (CCA) secure if every efficient adversary Mallory wins in the following game with probability at most $1/2 + negl(n)$:

• Mallory gets $1^n$ where $n$ is the length of the key.

• For $poly(n)$ rounds, Mallory gets access to the functions $m \mapsto E_k(m)$ and $c \mapsto D_k(c)$.

• Mallory chooses a pair of messages $\{m_0, m_1\}$, a secret $b$ is chosen at random in $\{0,1\}$, and Mallory gets $c^* = E_k(m_b)$.

• Mallory now gets another $poly(n)$ rounds of access to the functions $m \mapsto E_k(m)$ and $c \mapsto D_k(c)$ except that she is not allowed to query $c^*$ to her second oracle.

• Mallory outputs $b'$ and wins if $b' = b$.

Figure 6.3: The CCA security game.

This might seem a rather strange definition, so let's try to digest it slowly. Most people, once they understand what the definition says, don't like it that much. There are two natural objections to it:

• This definition seems to be too strong: There is no way we would let Mallory play with a decryption box - that basically amounts to letting her break the encryption scheme. Sure, she could have some impact on the ciphertexts that Bob decrypts and observe some resulting side effects, but there is a long way from that to giving her oracle access to the decryption algorithm.

The response to this is that it is very hard to model what is the "realistic" information Mallory might get about the ciphertexts she might cause Bob to decrypt. The goal of a security definition is not to capture exactly the attack scenarios that occur in real life but rather to be sufficiently conservative so that these real life attacks can be modeled in our game. Therefore, having too strong a definition is not a bad thing (as long as it can be achieved!). The WEP example shows that the definition does capture a practical issue in security, and similar attacks on practical protocols have been shown time and again (see for example the discussion of "padding attacks" in Section 3.7.2 of the Katz-Lindell book).

• This definition seems to be too weak: What justification do we have for not allowing Mallory to make the query $c^*$ to the decryption box? After all she is an adversary so she could do whatever she wants. The answer is that the definition would be clearly impossible to achieve if Mallory could simply get the decryption of $c^*$ and learn whether it was an encryption of $m_0$ or $m_1$. So this restriction is the absolutely minimal one we could make without causing the notion to be obviously impossible. Perhaps surprisingly, it turns out that once we make this minimal restriction, we can in fact construct CCA-secure encryptions.

What does CCA have to do with WEP? The CCA security game is somewhat strange, and it might not be immediately clear whether it has anything to do with the attack we described on the WEP protocol. However, it turns out that using a CCA secure encryption would have prevented that attack. The key is the following claim:

Lemma 6.2 Suppose that $(E, D)$ is a CCA secure encryption. Then, there is no efficient algorithm that given an encryption $c$ of the plaintext $(m_1, m_2)$ outputs a ciphertext $c'$ that decrypts to $(m'_1, m_2)$ where $m'_1 \neq m_1$.

In particular Lemma 6.2 rules out the attack of transforming $c$ that encrypts a message starting with some address $IP$ to a ciphertext that starts with a different address $IP'$. Let us now sketch its proof.

Proof. We'll show that if we had an adversary $M'$ that violates the conclusion of the claim, then there is an adversary $M$ that can win in the CCA game.

The proof is simple and relies on the crucial fact that the CCA game allows $M$ to query the decryption box on any ciphertext of her choice, as long as it's not exactly identical to the challenge ciphertext $c^*$. In particular, if $M'$ is able to morph an encryption $c$ of $m$ to some encryption $c'$ of some different $m'$ that agrees with $m$ on some set of bits, then $M$ can do the following: in the security game, use $m_0$ to be some random

message and $m_1$ to be this plaintext $m$. Then, when receiving $c^*$, apply $M'$ to it to obtain a ciphertext $c'$ (note that if the plaintext differs then the ciphertext must differ also; can you see why?), ask the decryption box to decrypt it, and output $1$ if the resulting message agrees with $m$ in the corresponding set of bits (otherwise output a random bit). If $M'$ was successful with probability $\epsilon$, then $M$ would win in the CCA game with probability at least $1/2 + \epsilon/10$ or so. ■

P The proof above is rather sketchy. However it is not very difficult and proving Lemma 6.2 on your own is an excellent way to ensure familiarity with the definition of CCA security.

6.3 CONSTRUCTING CCA SECURE ENCRYPTION

The definition of CCA seems extremely strong, so perhaps it is not surprising that it is useful, but can we actually construct it? The WEP attack shows that the CPA secure encryption we saw before (i.e., $E_k(m) = (r, f_k(r) \oplus m)$) is not CCA secure. We will see other examples of non-CCA secure encryptions in the exercises. So, how do we construct such a scheme? The WEP attack actually already hints at the crux of CCA security. We want to ensure that Mallory is not able to modify the challenge ciphertext $c^*$ to some related $c'$. Another way to say it is that we need to ensure the integrity of messages to achieve their confidentiality if we want to handle active adversaries that might modify messages on the channel. Since in a great many practical scenarios an adversary might be able to do so, this is an important message that deserves to be repeated:

To ensure confidentiality, you need integrity.

This is a lesson that has been shown time and again, and many protocols have been broken due to the mistaken belief that if we only care about secrecy, it is enough to use only encryption (and one that is only CPA secure) and there is no need for authentication. Matthew Green writes this more provocatively as:

Nearly all of the symmetric encryption modes you learned about in school, textbooks, and Wikipedia are (potentially) insecure.3

3 I also like the part where Green says about a block cipher mode that "if OCB was your kid, he'd play three sports and be on his way to Harvard." We will have an exercise about a simplified version of the GCM mode (which perhaps only plays a single sport and is on its way to …). You can read about OCB in Exercise 9.14 in the Boneh-Shoup book; it uses the notion of a "tweakable block cipher" which simply means that given a single key $k$, you actually get a set $\{p_{k,1}, \ldots, p_{k,t}\}$ of permutations that are indistinguishable from $t$ independent random permutations (the set $\{1, \ldots, t\}$ is called the set of "tweaks" and we sometimes index it using strings instead of numbers).

exactly because these basic modes only ensure security for passive eavesdropping adversaries and do not ensure chosen ciphertext security, which is the "gold standard" for online applications. (For

symmetric encryption people often use the name “authenticated en- cryption” in practice rather than CCA security; those are not identical but are extremely related notions.) All of this suggests that Message Authentication Codes might help us get CCA security. This turns out to be the case. But one needs to take some care exactly how to use MAC’s to get CCA security. At this point, you might want to stop and think how you would do this…

P You should stop here and try to think how you would implement a CCA secure encryption by combining MAC's with a CPA secure encryption.

P If you didn't stop before, then you should really stop and think now.

OK, so now that you had a chance to think about this on your own, we will describe one way that works to achieve CCA security from MACs. We will explore other approaches that may or may not work in the exercises.

Theorem 6.3 — CCA from CPA and MAC (encrypt-then-sign). Let $(E, D)$ be a CPA-secure encryption scheme and $(S, V)$ be a CMA-secure MAC with $n$-bit keys and a canonical verification algorithm.4 Then the following encryption $(E', D')$ with keys of $2n$ bits is CCA secure:

• $E'_{k_1,k_2}(m)$ is obtained by computing $c = E_{k_1}(m)$, $\sigma = S_{k_2}(c)$ and outputting $(c, \sigma)$.

• $D'_{k_1,k_2}(c, \sigma)$ outputs nothing (e.g., an error message) if $V_{k_2}(c, \sigma) \neq 1$, and otherwise outputs $D_{k_1}(c)$.

4 By a canonical verification algorithm we mean that $V_k(m, \sigma) = 1$ iff $S_k(m) = \sigma$.

Proof. Suppose, for the sake of contradiction, that there exists an adversary $M'$ that wins the CCA game for the scheme $(E', D')$ with probability at least $1/2 + \epsilon$. We consider the following two cases:

Case I: With probability at least $\epsilon/10$, at some point during the CCA game, $M'$ sends to its decryption box a ciphertext $(c, \sigma)$ that is not identical to one of the ciphertexts it previously obtained from its encryption box, and obtains from it a non-error response.

Case II: The event above happens with probability smaller than $\epsilon/10$.

We will derive a contradiction in either case. In the first case, we will use $M'$ to obtain an adversary that breaks the MAC $(S, V)$, while in the second case, we will use $M'$ to obtain an adversary that breaks the CPA-security of $(E, D)$.

Let's start with Case I: When this case holds, we will build an adversary $F$ (for "forger") for the MAC $(S, V)$. We can assume the adversary $F$ has access to both the signing and verification algorithms as black boxes for some unknown key $k_2$ that is chosen at random and fixed.5 $F$ will choose $k_1$ on its own, and will also choose at random a number $i_0$ from $1$ to $T$, where $T$ is the total number of queries that $M'$ makes to the decryption box. $F$ will run the entire CCA game with $M'$, using $k_1$ and its access to the black boxes to execute the encryption and decryption boxes, all the way until just before $M'$ makes the $i_0$-th query $(c, \sigma)$ to its decryption box. At that point, $F$ will output $(c, \sigma)$. We claim that with probability at least $\epsilon/(10T)$, our forger $F$ will succeed in the CMA game in the sense that (i) the query $(c, \sigma)$ will pass verification, and (ii) the message $c$ was not previously queried to the signing oracle.

5 Since we use a MAC with canonical verification, access to the signature algorithm implies access to the verification algorithm.

Indeed, because we are in Case I, with probability $\epsilon/10$, in this game some query $(c, \sigma)$ that $M'$ makes will be one that was not asked before, and hence $c$ was not queried by $F$ to its signing oracle; moreover, the returned message is not an error message, and hence the signature $\sigma$ passes verification. Since $i_0$ is random, with probability $\epsilon/(10T)$ this query will be at the $i_0$-th round. Let us assume that this event GOOD happened, in which the $i_0$-th query to the decryption box is a pair $(c, \sigma)$ that both passes verification and was not returned before by the encryption oracle. Since we pass (canonical) verification, we know that $\sigma = S_{k_2}(c)$, and because all encryption queries return pairs of the form $(c', S_{k_2}(c'))$, this means that no such query returned $c$ as its first element either. In other words, when the event GOOD happens the $i_0$-th query contains a pair $(c, \sigma)$ such that $c$ was not queried before to the signature box, but $(c, \sigma)$ passes verification. This is the definition of breaking $(S, V)$ in a chosen message attack, and hence we obtain a contradiction to the CMA security of $(S, V)$.

Now for Case II: In this case, we will build an adversary Eve for the CPA-game in the original scheme $(E, D)$. As you might expect, the adversary Eve will choose by herself the key $k_2$ for the MAC scheme, and attempt to play the CCA security game with $M'$. When $M'$ makes encryption queries this should not be a problem - Eve can forward the plaintext $m$ to its encryption oracle to get $c = E_{k_1}(m)$ and then compute $\sigma = S_{k_2}(c)$ since she knows the signing key $k_2$.

However, what does Eve do when $M'$ makes decryption queries? That is, suppose that $M'$ sends a query of the form $(c, \sigma)$ to its decryption box. To simulate the algorithm $D'$, Eve will need access to a decryption box for $D$, but she doesn't get such a box in the CPA game. (This is a subtle point - please pause here and reflect on it until you are sure you understand it!)

To handle this issue Eve will follow the common approach of "winging it and hoping for the best". When $M'$ sends a query of the form $(c, \sigma)$, Eve will first check if it happens to be the case that $(c, \sigma)$ was returned before as an answer to an encryption query $m$. In this case Eve will breathe a sigh of relief and simply return $m$ to $M'$ as the answer. (This is obviously correct: if $(c, \sigma)$ is the encryption of $m$ then $m$ is the decryption of $(c, \sigma)$.) However, if the query $(c, \sigma)$ has not been returned before as an answer, then Eve is in a bit of a pickle. The way out of it is for her to simply return "error" and hope that everything will work out. The crucial observation is that because we are in Case II things will work out. After all, the only way Eve makes a mistake is if she returns an error message where the original decryption box would not have done so, but this happens with probability at most $\epsilon/10$. Hence, if $M'$ has success $1/2 + \epsilon$ in the CCA game, then even

if it's the case that $M'$ always outputs the wrong answer when Eve makes this mistake, we will still get success at least $1/2 + 0.9\epsilon$. Since $\epsilon$ is non-negligible, this would contradict the CPA security of $(E, D)$, thereby concluding the proof of the theorem. ■

P This proof is emblematic of a general principle for proving CCA security. The idea is to show that the de- cryption box is completely “useless” for the adversary, since the only way to get a non error response from it is to feed it with a ciphertext that was received from the encryption box.
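As a concrete illustration of the encrypt-then-sign construction of Theorem 6.3, here is a minimal Python sketch. The helper names are ours: HMAC-SHA256 stands in both for the PRF $f$ in the CPA-secure scheme $E_k(m) = (r, f_k(r) \oplus m)$ from the text and for the MAC $(S, V)$; a production system would instead use a vetted authenticated-encryption mode.

```python
import hmac, hashlib, os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cpa_encrypt(k1, m):
    # CPA-secure scheme from the text: E_k(m) = (r, f_k(r) XOR m),
    # with HMAC-SHA256 standing in for the PRF f (messages <= 32 bytes).
    r = os.urandom(16)
    return r + xor(hmac.new(k1, r, hashlib.sha256).digest()[:len(m)], m)

def enc(k1, k2, m):
    # Encrypt-then-sign: MAC the *ciphertext*, output (c, sigma).
    c = cpa_encrypt(k1, m)
    return c, hmac.new(k2, c, hashlib.sha256).digest()

def dec(k1, k2, c, sigma):
    # Reject (return None) unless the tag verifies; only then decrypt.
    if not hmac.compare_digest(hmac.new(k2, c, hashlib.sha256).digest(),
                               sigma):
        return None
    r, body = c[:16], c[16:]
    return xor(hmac.new(k1, r, hashlib.sha256).digest()[:len(body)], body)

k1, k2 = os.urandom(32), os.urandom(32)
c, sigma = enc(k1, k2, b"hello world")
assert dec(k1, k2, c, sigma) == b"hello world"
# A WEP-style bit flip now fails verification instead of decrypting:
tampered = xor(c, b"\x01" + bytes(len(c) - 1))
assert dec(k1, k2, tampered, sigma) is None
```

The tampering check at the end is exactly the "useless decryption box" principle: any ciphertext not produced by the encryption procedure is rejected before decryption is ever attempted.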

6.4 (SIMPLIFIED) GCM ENCRYPTION

The construction above works as a generic construction, but it is some- what costly in the sense that we need to evaluate both the block cipher and the MAC. In particular, if messages have blocks, then we would need to invoke two cryptographic operations (a block cipher encryp- tion and a MAC computation) per block. The푡GCM (Galois Counter Mode) is a way around this. We are going to describe a simplified ver- sion of this mode. For simplicity, assume that the number of blocks is fixed and known (though many of the annoying but important de- tails in block cipher modes of operations involve dealing with padding푡 to multiple of blocks and dealing with variable block size). A universal hash function collection is a family of functions such that for every , the random variablesℓ and푛 (taken over the choice′ of a randomℓ from{ℎ ∶ {0,this 1} family)→ {0, are 1} pairwise} ′ independent푥 in ≠ 푥 ∈ {0,. That 1} is, for every two potentialℎ(푥) outputsℎ(푥 ) , 2푛 ℎ ′ 푛 {0, 1} Pr (6.1) 푦, 푦 ∈ {0, 1} ′ ′ −2푛 Universal hash functionsℎ [ℎ(푥) = have 푦 ∧ rather ℎ(푥 ) = efficient 푦 ] = 2 constructions, and in particular if we relax the definition to allow almost universal hash func- tions (where we replace the factor in the righthand side of (6.1) by a slightly bigger, though still−2푛 negligible quantity) then the con- structions become extremely2 efficient and the size of the description of 6 6 In -almost universal hash functions we require that is only related to , no matter how big is. for every , and ,

Our encryption scheme is defined as follows. The key is (k, h), where k is an index to a pseudorandom permutation {p_k} and h is the key for a universal hash function.⁷

⁷ In practice the key h is derived from the key k by applying the PRP to some particular input.

To encrypt a message m = (m_1, …, m_t) ∈ {0,1}^{nt} do the following:

• Choose IV at random in [2^n].

• Let z_i = E_k(IV + i) for i = 1, …, t+1.

• Let c_i = z_i ⊕ m_i.

• Let c_{t+1} = h(c_1, …, c_t) ⊕ z_{t+1}.

• Output (IV, c_1, …, c_{t+1}).

The communication overhead includes one additional output block plus the IV (whose transmission can often be avoided or reduced, depending on the settings; see the notion of “nonce based encryption”). This is fairly minimal. The additional computational cost on top of t block-cipher evaluations is the application of h(⋅). For the particular choice of h used in Galois Counter Mode, this function can be evaluated very efficiently, at a cost of a single multiplication in the Galois field of size 2^128 per block (one can think of it as some very particular operation that maps two 128 bit strings to a single one, and can be carried out quite efficiently). We leave it as an (excellent!) exercise to prove that the resulting scheme is CCA secure.
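A sketch of the simplified scheme above. Everything here is an illustrative assumption: HMAC-SHA256 truncated to one block stands in for the PRP E_k (a real implementation would use AES), and polynomial evaluation modulo a prime stands in for the universal hash (GCM itself multiplies in GF(2^128)).

```python
import hashlib
import hmac
import secrets

N = 16                 # block size in bytes
P = 2**127 - 1         # prime modulus for the toy universal hash

def E(k: bytes, i: int) -> bytes:
    """Stand-in block cipher E_k on counter value i (assumption: HMAC-SHA256
    truncated to one block, used as a PRF)."""
    return hmac.new(k, i.to_bytes(N, "big"), hashlib.sha256).digest()[:N]

def uhash(hkey: int, blocks) -> bytes:
    """Toy universal hash: polynomial evaluation at the point hkey mod P."""
    acc = 0
    for b in blocks:
        acc = (acc + int.from_bytes(b, "big")) * hkey % P
    return acc.to_bytes(N, "big")

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(k: bytes, hkey: int, msg_blocks):
    iv = secrets.randbelow(2**64)
    z = [E(k, iv + i) for i in range(1, len(msg_blocks) + 2)]
    c = [xor(zi, mi) for zi, mi in zip(z, msg_blocks)]
    tag = xor(uhash(hkey, c), z[-1])            # c_{t+1} in the notes
    return iv, c, tag

def decrypt(k: bytes, hkey: int, iv: int, c, tag):
    z = [E(k, iv + i) for i in range(1, len(c) + 2)]
    if xor(uhash(hkey, c), z[-1]) != tag:
        return None                             # reject modified ciphertexts
    return [xor(zi, ci) for zi, ci in zip(z, c)]
```

A roundtrip succeeds, while flipping any ciphertext bit changes the hash value and makes `decrypt` return `None`.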

6.5 PADDING, CHOPPING, AND THEIR PITFALLS: THE “BUFFER OVERFLOW” OF CRYPTOGRAPHY

In this course we typically focus on the simplest case where messages have a fixed size. But in fact, in real life we often need to chop long messages into blocks, or pad messages so that their length becomes an integral multiple of the block size. Moreover, there are several subtle ways to get this wrong, and these have been used in several practical attacks.

Chopping into blocks: A block cipher a priori provides a way to encrypt a message of length n, but we often have much longer messages and need to “chop” them into blocks. This is where the block cipher modes discussed in the previous lecture come in. However, the basic popular modes such as CBC and OFB do not provide security against chosen ciphertext attacks, and in fact typically make it easy to extend a ciphertext with an additional block or to remove the last block from a ciphertext, both being operations which should not be feasible in a CCA secure encryption.

Padding: Oftentimes messages are not an integer multiple of the block size and hence need to be padded. The padding is typically a map that takes the last partial block of the message (i.e., a string m of length in {0, …, n−1}) and maps it into a full block (i.e., a string m ∈ {0,1}^n). The map needs to be invertible, which in particular means that if the message is already an integer multiple of the block size we will need to add an extra block.

(Since we have to map all the 1 + 2 + ⋯ + 2^{n−1} messages of length at most n − 1 into the 2^n messages of length n in a one-to-one fashion.) One approach for doing so is to pad an n' < n length message with the string 10^{n−n'−1}. Sometimes people use a different padding which involves encoding the length of the pad.

6.6 CHOSEN CIPHERTEXT ATTACK AS IMPLEMENTING METAPHORS

The classical “metaphor” for an encryption is a sealed envelope, but as we have seen in the case of WEP, this metaphor can lead you astray. If you placed a message m in a sealed envelope, you should not be able to modify it to the message m ⊕ m' without opening the envelope, and yet this is exactly what happens in the canonical CPA secure encryption E_k(m) = (r, f_k(r) ⊕ m). CCA security comes much closer to realizing the metaphor, and hence is considered the “gold standard” of secure encryption.

This is important even if you do not intend to write poetry about encryption. Formal verification of computer programs is an area that is growing in importance as computer programs become both more complex and more mission critical. Cryptographic protocols can fail in subtle ways, and even published proofs of security can turn out to have bugs in them. Hence there is a line of research dedicated to finding ways to automatically prove security of cryptographic protocols. Much of this line of research is based on simple models to describe protocols that are known as Dolev-Yao models, based on the first paper that proposed such models. These models define an algebraic form of security, where rather than thinking of messages, keys, and ciphertexts as binary strings, we think of them as abstract entities. There are certain rules for manipulating these symbols. For example, given a key k and a message m you can create the ciphertext {m}_k, which you can decrypt back to m using the same key. However, the assumption is that any information that cannot be obtained by such manipulation is unknown.

Translating a proof of security in this algebra to a proof for real world adversaries is highly non trivial. However, to have even a fighting chance, the encryption scheme needs to be as strong as possible, and in particular it turns out that security notions such as CCA play a crucial role.

6.7 READING COMPREHENSION EXERCISES

I recommend students do the following exercises after reading the lecture. They do not cover all material, but can be a good way to check your understanding.

Exercise 6.1 Let (E, D) be the “canonical” PRF-based CPA secure encryption, where E_k(m) = (r, f_k(r) ⊕ m), {f_k} is a PRF collection, and r is chosen at random. Is this scheme CCA secure?

Figure 6.4: The Dolev-Yao Algebra of what an adver- sary or “intruder” knows. Figure taken from here.

a. No, it is never CCA secure.

b. It is always CCA secure.

c. It is sometimes CCA secure and sometimes not, depending on the properties of the PRF {f_k}.

■

Exercise 6.2 Suppose that we allow a key to be as long as the message, and so we can use the one time pad. Would the one-time pad be:

a. CPA secure

b. CCA secure

c. Neither CPA nor CCA secure.

Exercise 6.3 Which of the following statements is true about the proof of Theorem 6.3?

a. Case I corresponds to breaking the MAC and Case II corresponds to breaking the CPA security of the underlying encryption scheme.

b. Case I corresponds to breaking the CPA security of the underlying encryption scheme and Case II corresponds to breaking the MAC.

c. Both cases correspond to both breaking the MAC and the encryption scheme.

d. If neither Case I nor Case II happens then we obtain an adversary breaking the security of the underlying encryption scheme.

7 Hash Functions and Random Oracles

We have seen pseudorandom generators, functions and permutations, as well as message authentication codes, CPA and CCA secure encryptions. This week we will talk about cryptographic hash functions and some of their magical properties. We motivate this by the Bitcoin cryptocurrency. As usual our discussion will be highly abstract and idealized, and any resemblance to real cryptocurrencies, living or dead, is purely coincidental.

7.1 THE “BITCOIN” PROBLEM

Using cryptography to create a centralized digital-currency is fairly straightforward, and indeed this is what is done by Visa, Mastercard, and so on. The main challenge with Bitcoin is that it is decentralized. There is no trusted server, there are no “user accounts”, no central authority to adjudicate claims. Rather we have a collection of anony- mous and autonomous parties that somehow need to agree on what is a valid payment.

7.1.1 The Currency Problem

Before talking about cryptocurrencies, let’s talk about currencies in general.¹ At an abstract level, a currency requires two components:

¹ I am not an economist by any stretch of the imagination, so please take the discussion below with a huge grain of salt. I would appreciate any comments on it.

• A scarce resource.

• A mechanism for determining and transferring ownership of certain quantities of this resource.

Some currencies are/were based on commodity money. The scarce resource was some commodity having intrinsic value, such as gold or silver, or even salt or tea, and ownership was based on physical possession. However, for various financial and political reasons, some societies shifted to representative money, where the currency is not the commodity itself but rather a certificate that provides the right to the commodity. Representative money requires trust in some central


authority that would respect the certificate. The next step in the evolution of currencies was fiat money, which is a currency (like today’s dollar, ever since the U.S. moved off the gold standard) that does not correspond to any commodity, but rather relies only on trust in a central authority. (Another example is the Roman coins, which though originally made of silver, underwent a continuous process of debasement until they contained less than two percent of it.) One advantage (sometimes disadvantage) of a fiat currency is that it allows for more flexible monetary policy on the part of the central authority.

7.1.2 Bitcoin Architecture

Bitcoin is a fiat currency without a central authority. A priori this seems like a contradiction in terms. If there is no trusted central authority, how can we ensure a scarce resource? Who settles claims of ownership? And who sets monetary policy? For instance, one problem we are particularly concerned with is the double-spend problem. The following scenario is a double-spend:

1. Adversary A orders a pizza from Pinocchio’s.

2. A gives Pinocchio’s a particular “set” of money m.

3. A eats the pizza.

4. A gives that same set of money m to another pizzeria, Domino’s, such that Pinocchio’s no longer has that money.

5. A eats the second pizza.

With cash, this situation is unfathomable. But think about a credit card: if you can “revoke” (or dispute) the first payment, you could take money away from Pinocchio’s after you’ve received some goods or services. Also consider that rather than giving m to Domino’s in step 4, A could just give m back to itself. We want to make it difficult or impossible for anyone to perform a double-spend like this. Bitcoin (and other cryptocurrencies) aims to provide cryptographic solutions to this problem and more.

The basic unit in the Bitcoin system is a coin.² Each coin has a unique identifier and a current owner. Transactions in the system have either the form of “mint coin with identifier ID and owner P” or “transfer the coin ID from P to Q”. All of these transactions are recorded in a public ledger.

² This is one of the places where we simplify and deviate from the actual Bitcoin system. In the actual Bitcoin system, the atomic unit is known as a Satoshi and one Bitcoin (abbreviated BTC) is 10^8 Satoshis. For reasons of efficiency, there is no individual identifier per Satoshi, and transactions can involve transfer and creation of multiple Satoshis. However, conceptually we can think of atomic coins each of which has a unique identifier.

Since there are no user accounts in Bitcoin, the “entities” P and Q are not identifiers of any physical person. Rather P and Q are “computational puzzles”. A computational puzzle can be thought of as a string α that specifies some “problem” such that it’s easy to verify whether some other string β is a “solution” for α, but it is hard to find such a solution on your own. (Students with complexity background

will recognize here the class NP.) So when we say “transfer the coin ID from P to Q” we mean that whomever holds a solution for the puzzle Q is now the owner of the coin ID (and to verify the authenticity of this transfer, you provide a solution to the puzzle P). More accurately, a transaction involving the coin ID is self-validating if it contains a solution to the puzzle that is associated with ID according to the latest transaction in the ledger.

P Please re-read the previous paragraph, to make sure you follow the logic.

One theoretical example of a puzzle is the following: if the puzzle α is a number N, an entity can “prove” that they own coins assigned to α if they can produce numbers A, B such that N = A ⋅ B. Another more generic example (that you can keep in mind as a potential implementation for the puzzles we use here) is: α is some string in {0,1}^{2n} and β will be a string in {0,1}^n such that α = G(β), where G : {0,1}^n → {0,1}^{2n} is some pseudorandom generator. The real Bitcoin system typically uses puzzles based on digital signatures, a concept we will learn about later in this course, but you can simply think of P as specifying some abstract puzzle, and every person that can solve P can construct transactions with the coins owned by P.³ Unfortunately, this means that if you lose the solution to the puzzle P then you have no access to the coin. More alarmingly, if someone steals the solution from you, then you have no recourse or way to get your coin back. People have managed to lose millions of dollars in this way.

³ There are reasons why Bitcoin uses digital signatures and not these puzzles. The main issue is that we want to bind the puzzle not just to the coin but also to the particular transaction, so that if you know the solution to the puzzle P corresponding to the coin ID and want to use that to transfer it to Q, it won’t be possible for someone to take your solution and use that to transfer the coin to Q' before your transaction is added to the public ledger. We will come back to this issue after we learn about digital signatures. As a quick preview, in Bitcoin the puzzle is as follows: whoever can produce a digital signature with the private key corresponding to the public key P can claim these coins.

7.2 THE BITCOIN LEDGER

The main idea behind Bitcoin is that there is a public ledger that contains an ordered list of all the transactions that were ever performed and are considered as valid in the system.
Given such a ledger, it is easy to answer the question of who owns any particular coin. The main problem is: how does a collection of anonymous parties without any central authority agree on this ledger? This is an instance of the consensus problem in distributed computing. This seems quite scary, as there are very strong negative results known for this problem; for example the famous Fischer, Lynch, Paterson (FLP) result showed that if there is even one party that can have a benign failure (i.e., it halts and stops responding) then it is impossible to guarantee consensus in a completely asynchronous network. Things are better if we assume some degree of partial synchrony (i.e., a global clock and

some bounds on the latency of messages) as well as that a majority or supermajority of the parties behave correctly.

The partial synchrony assumption is typically approximately maintained on the Internet, but the honest majority assumption seems quite suspicious. What does a “majority of parties” mean in an anonymous network where a single person can create multiple “entities” and cause them to behave arbitrarily maliciously (known as “byzantine” faults in distributed parlance)? Also, why would we assume that even one party would behave honestly? If there is no central authority and it is profitable to cheat, then everyone would cheat, wouldn’t they?

Figure 7.1: The Bitcoin ledger consists of an ordered list of transactions. At any given point in time there might be several “forks” that continue the ledger, and different parties do not necessarily have to agree on them. However, the Bitcoin architecture is designed to ensure that the parties corresponding to a majority of the computing power will reach consensus on a single ledger.

Perhaps the main idea behind Bitcoin is that “majority” will correspond to a “majority of computing power”, or as the original Bitcoin paper says, “one CPU one vote” (or perhaps more accurately, “one cycle one vote”). It might not be immediately clear how to implement this, but at least it means that creating fictitious new entities (sometimes known as a Sybil attack after the movie about multiple-personality disorder) cannot help. To implement it we turn to a cryptographic concept known as “proof of work”, which was originally suggested by Dwork and Naor in 1991 as a way to combat mass marketing email.⁴

⁴ This was a rather visionary paper in that it foresaw this issue before the term “spam” was introduced and indeed when email itself, let alone spam email, was hardly widespread.

Consider a pseudorandom function {f_k} mapping n bits to ℓ bits. On average, it will take a party Alice 2^ℓ queries to obtain an input x such that f_k(x) = 0^ℓ. So, if we’re not too careful, we might think of such an input x as a proof that Alice spent 2^ℓ time.

P Stop here and try to think if indeed it is the case that one cannot find an input x such that f_k(x) = 0^ℓ using much fewer than 2^ℓ steps.

The main question in using PRFs for proofs of work is who is holding the key k for the pseudorandom function. If there is a trusted server holding the key, then sure, finding such an input x would take on average 2^ℓ queries, but the whole point of Bitcoin is to not have a

trusted server. If we give k to a party Alice, then can we guarantee that she can’t find a “shortcut” to find such an input without running 2^ℓ queries? The answer, in general, is no.

P Indeed, it is an excellent exercise to prove that (under the PRF conjecture) there exists a PRF {f_k} mapping n bits to ℓ bits and an efficient algorithm A such that A(k) = x where f_k(x) = 0^ℓ.

However, suppose that {f_k} was somehow a “super-strong PRF” that would behave like a random function even to a party that holds the key k. In this case, we can imagine that making a query to f_k corresponds to tossing ℓ independent random coins, and it would not be feasible to obtain x such that f_k(x) = 0^ℓ using much less than 2^ℓ cycles. Thus presenting such an input x can serve as a “proof of work” that you’ve spent 2^ℓ cycles or so. By adjusting ℓ we can obtain a proof of spending T cycles for a value T of our choice. Now if things would go as usual in this course then I would state a result like the following:

Theorem: Under the PRG conjecture, there exist super strong PRFs.

Where again, the “super strong PRF” behaves like a truly random function even to a party that holds the key. Unfortunately such a result is not known to be true, and for a very good reason. Most natural ways to define “super strong PRF” will result in properties that can be shown to be impossible to achieve. Nevertheless, the intuition behind it still seems useful and so we have the following heuristic:

The random oracle heuristic (aka “random oracle model”, Bellare-Rogaway 1993): If a “natural” protocol is secure when all parties have access to a random function H : {0,1}^n → {0,1}^ℓ, then it remains secure even when we give the parties the description of a cryptographic hash function with the same input and output lengths.

We don’t have a good characterization as to what makes a proto- col “natural” and we do have fairly strong counterexamples to this heuristic (though they are arguably “unnatural”). That said, it still seems useful as a way to get intuition for security, and in particular to analyze Bitcoin (and many other practical protocols) we do need to assume it, at least given current knowledge.

R

Remark 7.1 — Important caveat on the random oracle model. The random oracle heuristic is very different from all the conjectures we considered before. It is not a formal conjecture since we don’t have any good way to define “natural” and we do have examples of protocols that are secure when all parties have ac- cess to a random function but are insecure whenever we replace this random function by any efficiently computable function (see the homework exercises).

Under the random oracle model, we can now specify the “proof of work” protocol for Bitcoin. Given some identifier ID ∈ {0,1}^n, an integer T ≪ 2^n, and a hash function H : {0,1}^{2n} → {0,1}^n, the proof of work corresponding to ID and T will be some x ∈ {0,1}^* such that the first ⌈log T⌉ bits of H(ID‖x) are zero.⁵

⁵ The actual Bitcoin protocol is slightly more general, where the proof is some x such that H(ID‖x), when interpreted as a number in [2^n], is at most T. There are also other issues about how exactly x is placed and ID is computed from past history that we ignore here.

7.2.1 From Proof of Work to Consensus on Ledger

How does proof of work help us in achieving consensus? We want every transaction t_i in the Bitcoin system to have a corresponding proof of work: in particular, some proof of T_i “amount” of work with respect to some identifier that is unique to t_i. The length of a ledger (t_1, …, t_n) is the sum of the corresponding T_i’s. In other words, the length corresponds to the total number of cycles invested in creating this ledger. A ledger is valid if every transaction in the ledger of the form “transfer the coin ID from P to Q” is self-certified by a solution to P.

Critically, participants (specifically miners) in the Bitcoin network are rewarded for adding valid entries to the ledger. In other words, they are given Bitcoins (which are newly minted for them) for performing the “work” required to add an entry to the ledger. However, honest participants (including non-miners, people who just read the ledger) will accept the longest known ledger as the ground truth. In addition, Bitcoin miners are rewarded for adding entry i only after entry i + 100 is added to the ledger. This gives miners an incentive to choose the longest ledger to contribute their work towards.

To see why, consider the following rough approximation of the incentive structure. Remember that Bitcoin miners are rewarded for adding entry i only after entry i + 100 is added to the ledger. Thus, by spending “work” (which directly corresponds to CPU cycles, which directly corresponds to monetary value), miners are “betting” on whether a particular ledger will “win”. Think of yourself as a miner, and consider a scenario in which there are two competing ledgers. Ledger 1 has length 3 and Ledger 2 has length 6. That means miners have put roughly 2x the amount of work (= CPU cycles = money) into Ledger 2. In order for Ledger 1 to “win” (from your perspective that means reach a length great enough

to claim your prize and to become longer than Ledger 2), you would have to perform 4 entries’ worth of work just to get Ledger 1 past length 6. But in the meantime, other miners will already be working on Ledger 2, further increasing its length! Thus you want to add entries to Ledger 2.

If a ledger L corresponds to the majority of the cycles that were available in this network then every honest party would accept it, as any alternative ledger would be necessarily shorter. (See Fig. 7.1.) Thus one can hope that the consensus ledger will continue to grow. (This is a rather hand-wavy and imprecise argument; see this paper for a more in depth analysis; this is also related to the phenomenon known as preferential attachment.)
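The proof-of-work search described in this section (find x such that the first ⌈log T⌉ bits of H(ID‖x) are zero) can be sketched as follows, with SHA-256 standing in for the random oracle H (an assumption for illustration; the notes leave H abstract):

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a byte string."""
    n = 0
    for byte in digest:
        if byte == 0:
            n += 8
        else:
            return n + (8 - byte.bit_length())
    return n

def proof_of_work(identifier: bytes, T: int) -> bytes:
    """Brute-force search for x with the first ceil(log2 T) bits of
    H(ID || x) equal to zero."""
    difficulty = (T - 1).bit_length()          # ceil(log2 T)
    x = 0
    while True:
        attempt = x.to_bytes(8, "big")
        digest = hashlib.sha256(identifier + attempt).digest()
        if leading_zero_bits(digest) >= difficulty:
            return attempt
        x += 1
```

With T = 2^10 this takes about a thousand hash evaluations on average; doubling T doubles the expected work, which is exactly what makes the count of satisfied puzzles a measure of invested cycles.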

Cost to mine, mining pools: Generally, if you know that completing a T-cycle proof will get you a single coin, then making a single query (which will succeed with probability 1/T) is akin to buying a lottery ticket that costs you a single cycle and has probability 1/T to win a single coin. One difference from the actual lottery is that there is also some probability that you’re working on the wrong fork of the ledger, but this incentivizes people to avoid this as much as possible. Another, perhaps even more major, difference is that things are set up so that this is a profitable enterprise and the cost of a cycle is smaller than the value of 1/T coins. Just like in the lottery, people can and do gather in groups (known as “mining pools”) where they pool together all their computing resources, and then split the reward if they win it. Joining a pool doesn’t change your expectation of winning but reduces the variance. In the extreme case, if everyone is in the same pool, then for every cycle you spend you get exactly 1/T coins. The way these pools work in practice is that someone who spent C cycles looking for an output with all zeroes only has probability C/T of getting it, but is very likely to get an output that begins with log C zeroes. This output can serve as their own “proof of work” that they spent C cycles, and they can send it to the pool management so they get an appropriate share of the reward.

The real Bitcoin: There are several aspects in which the protocol described above differs from the real Bitcoin protocol. Some of them were already discussed above: Bitcoin typically uses digital signatures for puzzles (though it has a more general scripting language to specify them), and transactions involve a number of Satoshis (and the user interface typically displays currency in units of BTC, which are 10^8 Satoshis).
The Bitcoin protocol also has a formula designed to factor in the decrease in dollar cost per cycle so that Bitcoins become more

expensive to mine with time. There is also a fee mechanism apart from the mining to incentivize parties to add to the ledger. (The issue of incentives in Bitcoin is quite subtle and not fully resolved, and it is possible that parties’ behavior will change with time.) The ledger does not grow by a single transaction at a time but rather by a block of transac- tions, and there is also some timing synchronization mechanism (which is needed, as per the consensus impossibility results). There are other differences as well; see the Bonneau et al paper as well as the Tschorsch and Scheuermann survey for more.

7.3 COLLISION RESISTANCE HASH FUNCTIONS AND CREATING SHORT “UNIQUE” IDENTIFIERS

Another issue we “swept under the carpet” is how do we come up with these unique identifiers per transaction. We want each transaction t_i to be bound to the ledger state (t_1, …, t_{i−1}), and so the ID of t_i should also contain the IDs of all the prior transactions. But yet we want this ID to be only n bits long. Ideally, we could solve this if we had a one to one mapping H from {0,1}^N to {0,1}^n for some very large N ≫ n. Then the ID corresponding to the task of appending t_i to (t_1, …, t_{i−1}) would simply be H(t_1‖⋯‖t_i). The only problem is that this is of course clearly impossible: 2^N is much bigger than 2^n and there is no one to one map from a large set to a smaller set. Luckily we are in the magical world of crypto where the impossible is routine and the unimaginable is occasional. So, we can actually find a function H that is “essentially” one to one. The main idea is the following simple result, which can be thought of as one side of the so called “birthday paradox”:

Lemma 7.2 If H is a random function from some domain S to {0,1}^n, then the probability that after T queries an attacker finds x ≠ x' such that H(x) = H(x') is at most T²/2^n.

Figure 7.2: A collision-resistant hash function is a map from a large universe to a small one that is “practically one to one”, in the sense that collisions for the function do exist but are hard to find.

Proof. Let us think of H in the “lazy evaluation” mode, where for every query the adversary makes, we choose a random answer in {0,1}^n at the time it is made. (We can assume the adversary never makes the same query twice, since a repeat query can be simulated by repeating the same answer.) For i < j in [T] let E_{i,j} be the event that H(x_i) = H(x_j). Since H(x_j) is chosen at random and independently from the prior choice of H(x_i), the probability of E_{i,j} is 2^{-n}.
Thus the 푖 푗 푗 퐻(푥probability) = 퐻(푥 of the) union퐻(푥 of ) over all ’s is less than −푛 , but this 푖 푖,푗 probability is exactly what퐻(푥 we) needed to calculate.퐸 2 2 푛 푖,푗 퐸 푖, 푗 푇 /2 ■ hash functions and random oracles 169

This means that a random function H is collision resistant, in the sense that it is hard for an efficient adversary to find two inputs that collide. Thus the random oracle heuristic would suggest that a cryptographic hash function can be used to obtain the following object:

Definition 7.3 — Collision resistant hash functions. A collection of functions {h_k}, where h_k : {0,1}^* → {0,1}^n for k ∈ {0,1}^n, is a collision resistant hash function (CRH) collection if the map (k, x) ↦ h_k(x) is efficiently computable and for every efficient adversary A, the probability over k that A(k) = (x, x') such that x ≠ x' and h_k(x) = h_k(x') is negligible.⁶

⁶ Note that the other side of the birthday bound shows that you can always find a collision in h_k using roughly 2^{n/2} queries. For this reason we typically need to double the output length of hash functions compared to the key size of other cryptographic primitives (e.g., 256 bits as opposed to 128 bits).

Once more, we do not know a theorem saying that under the PRG conjecture there exists a collision resistant hash function collection, even though this property is considered as one of the desiderata for cryptographic hash functions. However, we do know how to obtain collections satisfying this condition under various assumptions that we will see later in the course, such as the learning with errors problem and the factoring and discrete logarithm problems. Furthermore, if we consider the weaker notion of security under a second preimage attack (also known as being a “universal one way hash function” or UOWHF), then it is known how to derive such a function from the PRG assumption.

R Remark 7.4 — CRH vs PRF. A collection {h_k} of collision resistant hash functions is an incomparable object to a collection {f_s} of pseudorandom functions with the same input and output lengths. On one hand, the condition of being collision-resistant does not imply that h_k is indistinguishable from random. For example, it is possible to construct a valid collision resistant hash function where the first output bit always equals zero (and hence is easily distinguishable from a random function). On the other hand, unlike Definition 4.1, the adversary of Definition 7.3 is not merely given a “black box” to compute the hash function, but rather the key to the hash function. This is a much stronger attack model, and so a PRF does not have to be collision resistant. (Constructing a PRF that is not collision resistant is a nice and recommended exercise.)

7.4 PRACTICAL CONSTRUCTIONS OF CRYPTOGRAPHIC HASH FUNCTIONS

While we discussed hash functions as keyed collections, in practice people often think of a hash function as being a fixed keyless function. However, this is because most practical constructions involve some hardwired standardized constants (often known as IV) that can be thought of as a choice of the key.

Practical constructions of cryptographic hash functions start with a basic block which is known as a compression function h : {0,1}^{2n} → {0,1}^n. The hash function H : {0,1}^* → {0,1}^n is defined as H(m_1, …, m_t) = h(h(⋯h(h(IV, m_1), m_2)⋯), m_t) when the message is composed of t blocks (and we can pad it otherwise). See Fig. 7.3. This construction is known as the Merkle-Damgard construction, and we know that it does preserve collision resistance:

Figure 7.3: The Merkle-Damgard construction converts a compression function h : {0,1}^{2n} → {0,1}^n into a hash function that maps strings of arbitrary length into {0,1}^n. The transformation preserves collision resistance but does not yield a PRF even if h was pseudorandom. Hence for many applications it should not be used directly but rather composed with a transformation such as HMAC.
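The Merkle-Damgard iteration can be sketched as follows, with a hypothetical compression function built from SHA-256 purely as a stand-in (practical designs use dedicated compression functions, and padding is handled separately):

```python
import hashlib

BLOCK = 32  # n bytes; messages are chopped into BLOCK-byte pieces

def compress(chaining: bytes, block: bytes) -> bytes:
    """Hypothetical compression function h: {0,1}^{2n} -> {0,1}^n,
    built here from SHA-256 for illustration only."""
    return hashlib.sha256(chaining + block).digest()

def merkle_damgard(blocks, iv: bytes = b"\x00" * BLOCK) -> bytes:
    """H(m_1, ..., m_t) = h(...h(h(IV, m_1), m_2)..., m_t): thread a
    chaining value through one compression call per block."""
    state = iv
    for block in blocks:
        state = compress(state, block)
    return state
```

Each call consumes one n-byte block and the previous n-byte chaining value, so the output length stays n no matter how many blocks the message has.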

Theorem 7.5 — Merkle-Damgard preserves collision resistance. Let H : {0,1}^{tn} → {0,1}^n be constructed from h as above. Then given two messages m ≠ m' ∈ {0,1}^{tn} such that H(m) = H(m'), we can efficiently find two messages x ≠ x' ∈ {0,1}^{2n} such that h(x) = h(x').

Proof. The intuition behind the proof is that if h was invertible then we could invert H by simply going backwards. Thus in principle if a collision for H exists then so does a collision for h. Now of course this is a vacuous statement since both h and H shrink their inputs and hence clearly have collisions. But we want to show a constructive proof for this statement that will allow us to transform a collision in H to a collision in h. This is very simple. We look at the computation of H(m) and H(m') and at the first block in which the inputs differ but the output is the same (there must be such a block). This block will yield a collision for h. ■

7.4.1 Practical Random-ish Functions
In practice we want much more than collision resistance from our hash functions. In particular we often would like them to be PRF's as well. Unfortunately, the Merkle-Damgard construction is not a PRF even when IV is random and secret. This is because we can perform a length extension attack on it. Even if we don't know IV, given y = H_IV(m_1, …, m_t) and a block m_{t+1}, we can compute y' = h(y, m_{t+1}), which equals H_IV(m_1, …, m_{t+1}). One fix for this is to use a different IV' at the end of the computation. That is, we define:

H'_{IV,IV'}(m_1, …, m_t) = h(IV', H_IV(m_1, …, m_t))

A variant of this construction (where IV' is obtained as some simple function of IV) is known as HMAC, and it can be shown to be a pseudorandom function under some pseudorandomness assumptions on the compression function h. It is very widely implemented. In many cases where I say "use a cryptographic hash function" in this course I actually mean to use an HMAC-like construction that can be conjectured to give at least a PRF if not stronger "random oracle"-like properties.

The simplest implementation for a compression function is to take a block cipher E with an n bit key and an n bit message and then simply define h(x_1, …, x_{2n}) = E_{x_{n+1},…,x_{2n}}(x_1, …, x_n). A more common variant is known as Davies-Meyer where we also XOR the output with x_{n+1}, …, x_{2n}. In practice people often use tailor made block ciphers that are designed for some efficiency or security concerns.
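The length-extension attack can be demonstrated end to end with a toy Merkle-Damgard iteration. This is a self-contained sketch under my own assumptions: SHA-256 truncated to the block size plays the role of h, and padding is omitted so a message is simply a sequence of full blocks.

```python
import hashlib

N = 32  # block size in bytes

def h(chain: bytes, block: bytes) -> bytes:
    """Toy compression function on 2n-bit inputs."""
    return hashlib.sha256(chain + block).digest()[:N]

def H(iv: bytes, blocks) -> bytes:
    """Plain Merkle-Damgard over full blocks (no padding, for clarity)."""
    chain = iv
    for m in blocks:
        chain = h(chain, m)
    return chain

# The attacker knows y = H(iv, blocks) but NOT the secret iv, and can
# still compute the hash of the extended message blocks + [extra]:
secret_iv = b"\x13" * N
blocks = [b"A" * N, b"B" * N]
y = H(secret_iv, blocks)
extra = b"C" * N
forged = h(y, extra)  # computed without any knowledge of secret_iv
assert forged == H(secret_iv, blocks + [extra])
```

This is exactly why the plain construction fails as a MAC, and why the HMAC-style "envelope" with a second IV closes the hole.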

7.4.2 Some History
Almost all practically used hash functions are based on the Merkle-Damgard paradigm. Hash functions are designed to be extremely efficient,7 which also means that they are often at the "edge of insecurity" and indeed have fallen over the edge.

7 For example, the Boneh-Shoup book quotes processing times of up to 255MB/sec on a 1.83 Ghz Intel Core 2 processor, which is more than enough to handle not just Harvard's network but even Lamar College's.

In 1990 Ron Rivest proposed MD4, in which weaknesses were already shown in 1991, and a full collision was found in 1995. Even faster attacks have since been found and MD4 is considered completely insecure. In response to these weaknesses, Rivest designed MD5 in 1991. A weakness was shown for it in 1996 and a full collision was shown in 2004. Hence it is now also considered insecure. In 1993 the National Institute of Standards proposed a standard for a hash function known as the Secure Hash Algorithm (SHA), which has quite a few similarities with the MD4 and MD5 functions. This function is known as SHA-0, and the standard was replaced in 1995 with SHA-1 that includes an extra "mixing" (i.e., bit rotation) operation. At the time no explanation was given for this change but SHA-0 was later

found to be insecure. In 2002 a variant with longer output, known as SHA-256, was added (as well as some others). In 2005, following the MD5 collision, significant weaknesses were shown in SHA-1. In 2017, a full SHA-1 collision was found. Today SHA-1 is considered insecure and SHA-256 is recommended. Given the weaknesses in MD5 and SHA-1, NIST started in 2006 a competition for a new hashing standard, based on functions that seem sufficiently different from the MD5/SHA-0/SHA-1 family. (SHA-256 is unbroken but it seems too close for comfort to those other systems.) The hash function Keccak was selected as the new standard SHA-3 in August of 2015.

7.4.3 The NSA and Hash Functions
The NSA is the world's largest employer of mathematicians, and is very heavily invested in cryptographic research. It seems quite possible that they devote far more resources to analyzing symmetric primitives such as block ciphers and hash functions than the open research community. Indeed, the history above suggests that the NSA has consistently discovered attacks on hash functions before the cryptographic community (and the same holds for the differential cryptanalysis technique for block ciphers). That said, despite the "mythic" powers that are sometimes ascribed to the NSA, this history suggests that they are ahead of the open community but not so much ahead, discovering attacks on hash functions about 5 years or so before they appear in the open literature.
There are a few ways we can get "insider views" of the NSA's thinking. Some such insights can be obtained from the Snowden documents. The Flame malware was discovered in Iran in 2012 after operating since at least 2010. It used an MD5 collision to achieve its goals. Such a collision was known in the open literature since 2008, but Flame used a different variant that was unknown in the literature. For this reason it is suspected that it was designed by a western intelligence agency.
Another insight into NSA's thoughts can be found in pages 12-19 of NSA's internal Cryptolog newsletter which has been recently declassified; one can find there a rather entertaining and opinionated (or obnoxious, depending on your point of view) review of the CRYPTO 1992 conference. On page 14 the author remarks that certain weaknesses of MD5 demonstrated in the conference are unlikely to be extended to the full version, which suggests that the NSA (or at least the author) was not aware of the MD5 collisions at the time. (The full archive of the Cryptolog newsletter makes for some interesting reading!)

7.4.4 Cryptographic vs Non-Cryptographic Hash Functions
Hash functions are of course also widely used for non-cryptographic applications such as building hash tables and load balancing. For these applications people often use linear hash functions known as cyclic redundancy codes (CRC). Note however that even in those seemingly non-cryptographic applications, an adversary might cause significant slowdown to the system if he can generate many collisions. This can and has been used to mount denial of service attacks. As a rule of thumb, if the inputs to your system might be generated by someone who does not have your best interests at heart, you're better off using a cryptographic hash function.

7.5 READING COMPREHENSION EXERCISES

I recommend students do the following exercises after reading the lecture. They do not cover all material, but can be a good way to check your understanding.

Exercise 7.1 Choose the strongest true statement from the following options. (That is, choose the mathematical statement from these options that is both true, and one can derive the other true statements as direct corollaries.)

a. For every function h : {0,1}^1024 → {0,1}^128 there exist two strings x ≠ x' in {0,1}^1024 such that h(x) = h(x').

b. There is a randomized algorithm A that makes at most 2^128 queries to a given black box computing a function h : {0,1}^1024 → {0,1}^128 such that with probability at least 0.9, A outputs a pair x ≠ x' in {0,1}^1024 such that h(x) = h(x').

c. There is a randomized algorithm A that makes at most 100 ⋅ 2^64 queries to a given black box computing a function h : {0,1}^1024 → {0,1}^128 such that with probability at least 0.9, A outputs a pair x ≠ x' in {0,1}^1024 such that h(x) = h(x').

d. There is a randomized algorithm A that makes at most 0.01 ⋅ 2^64 queries to a given black box computing a function h : {0,1}^1024 → {0,1}^128 such that with probability at least 0.9, A outputs a pair x ≠ x' in {0,1}^1024 such that h(x) = h(x').

■

Exercise 7.2 Suppose that h : {0,1}^1024 → {0,1}^128 is chosen at random. If y is chosen at random in {0,1}^128 and we pick x_1, …, x_t independently at random in {0,1}^1024, how large does t need to be so that the probability that there is some i such that h(x_i) = y is at least 1/2? (Pick the answer with the closest estimate):

a. 2^1024

b. 2^256

c. 2^128

d. 2^64

■

Exercise 7.3 Suppose that a message authentication code (S, V), where Alice and Bob use a function h as one of its components as a black box, is secure when h is a random function. Is it still secure when Alice and Bob use a hash function h that is chosen from some PRF collection and whose key is given to the adversary?

a. It can sometimes be secure and sometimes insecure.

b. It is always secure.

c. It is always insecure.

■

8 Key derivation, protecting passwords, slow hashes, Merkle trees

Last lecture we saw the notion of cryptographic hash functions, which are functions that behave like a random function even in settings (unlike that of standard PRFs) where the adversary has access to the key that allows them to evaluate the hash function. Hash functions have found a variety of uses in cryptography, and in this lecture we survey some of their other applications. In some of these cases, we only need the relatively mild and well-defined property of collision resistance, while in others we only know how to analyze security under the stronger (and not precisely well defined) random oracle heuristic.

8.1 KEYS FROM PASSWORDS

We have seen great cryptographic tools, including PRFs, MACs, and CCA secure encryptions, that Alice and Bob can use when they share a cryptographic key of 128 bits or so. But unfortunately, many of the current users of cryptography are humans which, generally speaking, have extremely faulty memory capacity for storing large numbers. There are ways to select a password of 8 upper and lower case letters and numbers, which is one of 62^8 ≈ 2^48 possibilities, but some letter/number combinations end up being chosen much more frequently than others. Due to several large scale hacks, very large databases of passwords have been made public, and one estimate is that 91 percent of the passwords chosen by users are contained in a list of about 1,000 ≈ 2^10 strings.
If we choose a password at random from some set D then the entropy of the password is simply log_2 |D|. However, estimating the entropy of real life passwords is rather difficult. For example, suppose that I use the winning Massachusetts Mega-Lottery numbers as my password. A priori, my password consists of 5 numbers between 1 and 75 and so its entropy is log_2(75^5) ≈ 31. However, if an attacker knew that I did this, the entropy might be something like log_2(520) ≈ 9 (since


there were only 520 such numbers selected in the last 10 years). Moreover, if they knew exactly what draw I based my password on, then they would know it exactly and hence the entropy (from their point of view) would be zero. This is worthwhile to emphasize:

The entropy of a secret is always measured with respect to the attacker's point of view.
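The back-of-the-envelope entropy figures above can be checked directly. A small illustrative computation (the 520 figure is the text's count of past lottery draws, taken as given):

```python
import math

# 8-character password over 26 + 26 + 10 = 62 symbols
print(round(math.log2(62 ** 8)))  # about 48 bits
# password known to lie in a 1,000-element list
print(round(math.log2(1_000)))    # about 10 bits
# 5 lottery numbers, each between 1 and 75 (a priori bound)
print(round(math.log2(75 ** 5)))  # about 31 bits
# attacker knows the password is one of 520 past draws
print(round(math.log2(520)))      # about 9 bits
```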

The exact security of passwords is of course a matter of intense practical interest, but we will simply model the password as being chosen at random from some set D ⊆ {0,1}^n (which is sometimes called the "dictionary"). The set D is known to the attacker, but she has no information on the particular choice of the password.
Much of the challenge for using passwords securely relies on the distinction between offline and online attacks. If each guess for a password requires interacting online with a server, as is the case when typing a PIN number in the ATM, then even a weak password (such as a 4 digit PIN that at best provides about 13 bits of entropy) can yield meaningful security guarantees, as typically an alarm would be raised after five or so failed attempts. However, if the adversary has the ability to check offline whether a password is correct then the number of guesses they can try can be as high as the number of computing cycles at their disposal, which can easily run into the billions and so break passwords of 30 or more bits of entropy. (This is an issue we'll return to after we learn about public key cryptography when we'll talk about password authenticated key exchange.)
Consider a password manager application. In such an application, a user typically chooses a master password p_master which she can then use to access all her other passwords p_1, …, p_t. To enable her to do so without requiring online access to a server, the master password p_master is used to encrypt the other passwords. However to do that, we need to derive a key k_master from the password p_master.

P  A natural approach is to simply let the key be the password. For example, if the password p is a string of at most 16 bytes, then we can simply treat it as a 128 bit key and use it for encryption. Stop and think why this would not be a good idea.
In particular, think of an example of a secure encryption (E, D) and a distribution P over {0,1}^n of entropy at least n/2 such that if the key k is chosen at random from P then the encryption will be completely insecure.

A classical approach is to simply use a cryptographic hash function H : {0,1}^* → {0,1}^n, and let k_master = H(p_master). If we think of H as

a random oracle and p_master as chosen randomly from D, then as long as an attacker makes ≪ |D| queries to the oracle, they are unlikely to make the query p_master and hence the value k_master will be completely random from their point of view.
However, since |D| is not too large, it might not be so hard to perform such |D| queries. For this reason, people typically use a deliberately slow hash function as a key derivation function. The rationale is that the honest user only needs to evaluate H once, and so could afford for it to take a while, while the adversary would need to evaluate it |D| times. For example, if |D| is about 100,000 and the honest user is willing to spend 1 cent of computation resources every time they need to derive k_master from p_master, then we could set H(⋅) so that it costs 1 cent to evaluate it and hence on average it will cost the adversary 1,000 dollars to recover it.
There are several approaches for trying to make H deliberately "slow" or "costly" to evaluate, but the most popular and simplest one is to simply let H be obtained by iterating many times a basic hash function such as SHA-256. That is, H(x) = h(h(h(⋯h(x)))) where h is some standard ("fast") cryptographic hash function and the number of iterations is tailored to be the largest one that the honest users can tolerate.1

1 Since CPU speeds can vary quite radically and attackers might even use special-purpose hardware to evaluate iterated hash functions quickly, Abadi, Burrows, Manasse, and Wobber suggested in 2003 to use memory bound functions as an alternative approach, where these are functions designed so that evaluating them will consume at least T bits of memory for some large T. See also the followup paper of Dwork, Goldberg and Naor.

In fact, typically we will set k_master = H(p_master‖r) where r is a long random but public string known as a "salt" (see Fig. 8.1). Including such a "salt" can be important to foiling an adversary's attempts to amortize the computation costs, see the exercises. This approach has also been used in some practical key derivation functions such as scrypt and Argon2.

Figure 8.1: To obtain a key from a password we will typically use a "slow" hash function to map the password and a unique-to-user public "salt" value to a cryptographic key. Even with such a procedure, the resulting key cannot be considered as secure and unpredictable as a key that was chosen truly at random, especially if we are in a setting where an adversary can launch an offline attack to guess all possibilities.

Even when we don't use one password to encrypt others, it is generally considered the best practice to never store a password in the

clear but always in this "slow hashed and salted" form, so that if the passwords file falls into the hands of an adversary it will be expensive to recover them.
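A minimal sketch of the iterated, salted approach described above. The iteration count here is a made-up example value, and SHA-256 is an arbitrary choice for the fast base hash h; real deployments would use a vetted construction such as PBKDF2, scrypt, or Argon2, and a constant-time comparison (e.g. hmac.compare_digest).

```python
import hashlib
import os

def slow_hash(password: bytes, salt: bytes, iterations: int = 100_000) -> bytes:
    """H(password ‖ salt): iterate a fast hash to make each guess costly."""
    state = password + salt
    for _ in range(iterations):
        state = hashlib.sha256(state).digest()
    return state

def new_password_record(password: bytes):
    """Store (salt, slow hash) rather than the password in the clear."""
    salt = os.urandom(16)  # unique-to-user public salt
    return salt, slow_hash(password, salt)

def check_password(guess: bytes, record) -> bool:
    salt, digest = record
    return slow_hash(guess, salt) == digest
```

Because each user gets a fresh salt, an attacker cannot amortize one expensive hash computation across many users' records.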

8.2 MERKLE TREES AND VERIFYING STORAGE.

Suppose that you outsource to the cloud storing your huge data file x ∈ {0,1}^N. You now need the i-th bit of that file and ask the cloud for x_i. How can you tell that you actually received the correct bit? Ralph Merkle came up in 1979 with a clever solution, which is known as "Merkle hash trees". The idea is the following (see Fig. 8.2): suppose we have a collision-resistant hash function h : {0,1}^{2n} → {0,1}^n, and think of the string x as composed of t blocks of size n. We then hash every pair of consecutive blocks to transform x into a string x_1 of t/2 blocks, and continue in this way for log t steps until we get a single block y ∈ {0,1}^n. (Assume here t is a power of two for simplicity, though it doesn't make much difference.)

Figure 8.2: In the Merkle Tree construction we map a long string x into a block y ∈ {0,1}^n that is a "digest"

of the long string x. As in a collision resistant hash we can imagine that this map is "one to one" in the sense that it won't be possible to find x' ≠ x with the same digest. Moreover, we can efficiently certify that a certain bit of x is equal to some value without sending out all of x but rather the log t blocks that are on the path between x_i and the root, together with their "siblings" used in the hash function, for a total of at most 2 log t blocks.

Alice who sends x to the cloud Bob will keep the short block y. Whenever Alice queries the value x_i she will ask for a certificate that it is indeed the right value. This certificate will consist of the block that contains i, as well as all of the 2 log t blocks that were used in the hash from this block to the root. The security of this scheme follows from the following simple theorem:

Theorem 8.1 — Merkle Tree security. Suppose that π is a valid certificate that x_i = b. Then either this statement is true, or one can efficiently extract from π and x two inputs z ≠ z' in {0,1}^{2n} such that h(z) = h(z').

Proof. The certificate π consists of a sequence of log t pairs of size-n blocks that are obtained by following the path on the tree from the i-th coordinate of x to the final root y. The last pair of blocks is the preimage of y under h, while each pair on this list is a preimage of one of the blocks in the next pair. If x_i ≠ b, then the first pair of blocks cannot be identical to the pair of blocks of x that contains the i-th coordinate. However, since we know the final root y is identical, if we compare the corresponding path in x to π, we will see that at some point there must be an input z in the path from x and a distinct input z' in π that hash to the same output. ■
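The construction and the certificate check can be sketched as follows. This is a toy version under my own simplifications: the number of leaf blocks t is a power of two, and SHA-256 stands in for the collision-resistant h.

```python
import hashlib

def h(left: bytes, right: bytes) -> bytes:
    """Stand-in collision-resistant compression of two blocks into one."""
    return hashlib.sha256(left + right).digest()

def merkle_root(blocks):
    level = list(blocks)
    while len(level) > 1:
        level = [h(level[j], level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]

def certificate(blocks, i):
    """The log t sibling blocks on the path from leaf i up to the root."""
    level, proof = list(blocks), []
    while len(level) > 1:
        proof.append(level[i ^ 1])  # sibling at this level
        level = [h(level[j], level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return proof

def verify(root, i, block, proof):
    """Recompute the path; a false claim would force a collision in h."""
    cur = block
    for sib in proof:
        cur = h(sib, cur) if i % 2 else h(cur, sib)
        i //= 2
    return cur == root

blocks = [bytes([j]) * 32 for j in range(8)]  # t = 8 blocks
y = merkle_root(blocks)
proof = certificate(blocks, 5)
assert verify(y, 5, blocks[5], proof)
```

Note that the certificate has length log t blocks here because the queried block and the intermediate hashes are recomputed by the verifier; sending them explicitly gives the "at most 2 log t blocks" of the text.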

8.3 PROOFS OF RETRIEVABILITY

The above provides a way to ensure Alice that the value retrieved from a cloud storage is correct, but how can Alice be sure that the cloud server still stores the values that she did not ask about? A priori, you might think that she obviously can't. If Bob is lazy, or short on storage, he could decide to store only some small fraction of x that he thinks Alice is more likely to query for. As long as Bob wasn't unlucky and Alice doesn't ask these queries, then it seems Bob could get away with this. In a proof of retrievability, first proposed by Juels and Kaliski in 2007, Alice would be able to get convinced that Bob does in fact store her data.
First, note that Alice can guarantee that Bob stores at least 99 percent of her data, by periodically asking him to provide answers (with proofs!) of the value of x at 100 or so random locations. The idea is that if Bob dropped more than 1 percent of the bits, then he'd be very likely to be caught "red handed" and get a question from Alice about a location he did not retain.
Now, if we used some redundancy to store x, such as the RAID format, where it is composed of some small number c of parts and we can recover any bit of the original data as long as at most one of the parts were lost, then we might hope that even if 1% of x was in fact lost by Bob, we could still recover the whole string. This is not a fool-proof guarantee since it could possibly be that the data lost by Bob was not confined to a single part. To handle this case one needs to consider generalizations of RAID known as "local reconstruction codes" or "locally decodable codes". The paper by Dodis, Vadhan and Wichs is a good source for this; see also these slides by Seny Kamara for a more recent overview of the theory and implementations.

8.4 ENTROPY EXTRACTION

As we've seen time and again, randomness is crucial to cryptography. But how do we get these random bits we need? If we only have a small number n of random bits (e.g., n = 128 or so) then we can expand them to as large a number as we want using a pseudorandom generator, but where do we get those initial n bits from?
The approach used in practice is known as "harvesting entropy". The idea is that we make a great many measurements x_1, …, x_m of events that are considered "unpredictable" to some extent, including mouse movements, hard-disk and network latency, sources of noise etc., and accumulate them in an entropy "pool" which would simply be some memory array. When we estimate that we have accumulated more than 128 bits of randomness, then we hash this array into a 128 bit string which we'll use as a seed for a pseudorandom generator (see Fig. 8.3).2 Because entropy needs to be measured from the point of view of the attacker, this "entropy estimation" routine is a bit of a "black art" and there isn't a very principled way to perform it. In practice people try to be very conservative (e.g., assume that there is only one bit of entropy for 64 bits of measurements or so) and hope for the best, which often works but sometimes also spectacularly fails, especially in embedded systems that do not have access to many of these sources.

2 The reason that people use entropy "pools" rather than simply adding the entropy to the generator's state as it comes along is that the latter alternative might be insecure. Suppose that the initial state of the generator was known to the adversary and now the entropy is "trickling in" one bit at a time while we continuously use the generator to produce outputs that can be observed by the adversary. Every time a new bit of entropy is added, the adversary now has uncertainty between two potential states of the generator, but once an output is produced this eliminates this uncertainty. In contrast, if we wait until we accumulate, say, 128 bits of entropy, then the adversary will have 2^128 possible options for the state to consider, and it could be computationally infeasible to cull them using further observation.

Figure 8.3: To obtain pseudorandom bits for cryptographic applications, we hash down measurements which contain some entropy in them to a shorter string that is hopefully truly uniformly random or at least statistically close to it, and then expand this to get as many pseudorandom bits as we need using a pseudorandom generator.

How do hash functions figure into this? The idea is that if an input x has n bits of entropy then h(x) would still have the same n bits of entropy, as long as the output of h is larger than n. In practice people use the notion of "entropy" in a rather loose sense, but we will try to be more precise below.

The entropy of a distribution D is meant to capture the amount of "uncertainty" you have over the distribution. The canonical example is when D is the uniform distribution over {0,1}^n, in which case it has n bits of entropy. If you learn a single bit of D then you reduce the entropy by one bit. For example, if you learn that the 17th bit is equal to 0, then the new conditional distribution D' is the uniform distribution over all strings x ∈ {0,1}^n such that x_17 = 0, and it has n − 1 bits of entropy. Entropy is invariant under permutations of the sample space, and only depends on the vector of probabilities, and thus for every set S all notions of entropy will give log_2 |S| bits of entropy for the uniform distribution over S. A distribution that is uniform over some set S is known as a flat distribution.
Where different notions of entropy begin to differ is when the distributions are not flat. The Shannon entropy follows the principle that "original uncertainty = knowledge learned + new uncertainty". That is, it obeys the chain rule which is that if a random variable (X, Y) has n bits of entropy, and X has k bits of entropy, then after learning X, on average Y will have n − k bits of entropy. That is,

H_Shannon(X) + H_Shannon(Y|X) = H_Shannon(X, Y)

where the entropy of a conditional distribution Y|X is simply E_{x←X} H_Shannon(Y|X = x), where Y|X = x is the distribution on Y obtained by conditioning on the event that X = x.
Suppose (p_1, …, p_m) is a vector of probabilities summing up to 1, and let us assume they are rounded so that for every i, p_i = k_i/2^n for some integer k_i. We can then split the set {0,1}^n into m disjoint sets S_1, …, S_m where |S_i| = k_i, and consider the probability distribution (X, Y) where Y is uniform over {0,1}^n, and X is equal to i whenever Y ∈ S_i. Therefore, by the principles above we know that H_Shannon(X, Y) = n (since X is completely determined by Y and hence (X, Y) is uniform over a set of 2^n elements), and H_Shannon(Y|X = i) = log_2 k_i. Thus the chain rule tells us that

H_Shannon(X) = n − E_i log_2 k_i = n − Σ_{i=1}^m p_i log_2 k_i.

Since p_i = k_i/2^n, we see that this means that

H_Shannon(X) = n − Σ_i p_i log_2(2^n p_i) = n − Σ_i p_i (n + log_2 p_i) = − Σ_i p_i log_2 p_i

using the fact that Σ_i p_i = 1.
The Shannon entropy has many attractive properties, but it turns out that for cryptographic applications, the notion of min entropy is more appropriate. For a distribution X the min-entropy is simply defined as H_∞(X) = min_x log_2(1/Pr[X = x]).3 Note that if X is flat then H_∞(X) = H_Shannon(X), and that H_∞(X) ≤ H_Shannon(X) for all X. We can now formally define the notion of an extractor:

3 The notation H_∞(⋅) for min entropy comes from the fact that one can define a family of entropy-like functions, containing a function H_p for every non-negative number p based on the p-norm of the probability distribution. That is, the Rényi entropy of order p is defined as H_p(X) = (1 − p)^{-1} log_2(Σ_x Pr[X = x]^p). The min entropy can be thought of as the limit of H_p when p tends to infinity, while the Shannon entropy is the limit as p tends to 1. The entropy H_2(⋅) is related to the collision probability of X and is often used as well. The min entropy is the smallest among all the entropies and hence it is the most conservative (and so appropriate for usage in cryptography). For flat sources, which are uniform over a certain subset, all entropies coincide.

Definition 8.2 — Randomness extractor. A function h : {0,1}^{ℓ+n} → {0,1}^n is a randomness extractor ("extractor" for short) if for every distribution X over {0,1}^ℓ with min entropy at least 2n, if we pick s to be a random "salt", the distribution h_s(X) is computationally indistinguishable from the uniform distribution.4

4 The pseudorandomness literature studies the notion of extractors much more generally and considers all possible variations for parameters such as the entropy requirement, the salt (more commonly known as seed) size, the distance from uniformity, and more. The type of notion we consider here is known in that literature as a "strong seeded extractor". See Vadhan's monograph for an in-depth treatment of this topic.

The idea is that we apply the hash function to our measurements in {0,1}^ℓ; if those measurements had at least 2n bits of entropy (with some extra "security margin") then the output h(X) will be as good as random. Since the "salt" value s is not secret, it can be chosen once at random and hardwired into the description of the function. (Indeed in practice people often do not explicitly use such a "salt", but the hash function description contains some parameters IV that play a similar role.)

Theorem 8.3 — Random function is an extractor. Suppose that h : {0,1}^{ℓ+n} → {0,1}^n is chosen at random, and ℓ < n^100. Then with high probability h is an extractor.

Proof. Let h be chosen as above, and let X be some distribution over {0,1}^ℓ with max_x Pr[X = x] ≤ 2^{-2n}. Now, for every s ∈ {0,1}^n, let h_s be the function that maps x ∈ {0,1}^ℓ to h(s‖x), and let Y_s = h_s(X). We want to prove that Y_s is pseudorandom. We will use the following claim:

Claim: Let Col(Y_s) be the probability that two independent samples from Y_s are identical. Then with probability at least 0.99, Col(Y_s) < 2^{-n} + 100 ⋅ 2^{-2n}.

Proof of claim: E_s Col(Y_s) = Σ_{x,x'} Pr[X = x] Pr[X = x'] Σ_{y∈{0,1}^n} Pr_s[h(s, x) = y ∧ h(s, x') = y]. Let's separate this into the contribution when x = x' and when they differ. The contribution from the first case is Σ_x Pr[X = x]^2, which is simply Col(X) = Σ_x Pr[X = x]^2 ≤ 2^{-2n} since Pr[X = x] ≤ 2^{-2n}. In the second case, the events that h(s, x) = y and h(s, x') = y are independent, and hence the contribution here is at most Σ_{x,x'} Pr[X = x] Pr[X = x'] 2^{-n} ≤ 2^{-n}. The claim follows from Markov.

Now suppose that T is some efficiently computable function from {0,1}^n to {0,1}. Then by Cauchy-Schwarz,

|E[T(U_n)] − E[T(Y_s)]| = |Σ_{y∈{0,1}^n} T(y)(2^{-n} − Pr[Y_s = y])| ≤ √(2^n Σ_y (2^{-n} − Pr[Y_s = y])^2),

but opening up Σ_y (2^{-n} − Pr[Y_s = y])^2 we get 2^{-n} − 2 ⋅ 2^{-n} Σ_y Pr[Y_s = y] + Σ_y Pr[Y_s = y]^2 = Col(Y_s) − 2^{-n}, which is at most the negligible quantity 100 ⋅ 2^{-2n}. ■

R Remark 8.4 — Statistical randomness. This proof actually proves a much stronger statement. First, note that we did not at all use the fact that T is efficiently computable, and hence the distribution h_s(X) will not be merely pseudorandom but actually statistically indistinguishable from the truly random distribution. Second, we didn't use the fact that h is completely random but rather what we needed was merely pairwise independence: that for every x ≠ x' and y, Pr_s[h_s(x) = h_s(x') = y] = 2^{-2n}. There are efficient constructions of functions h(⋅) with this property, though in practice people still often use cryptographic hash functions for this purpose.
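One classical pairwise independent family, shown here over a small prime field rather than {0,1}^n purely for illustration, is h_{a,b}(x) = ax + b mod p. For fixed x ≠ x', the map (a, b) ↦ (h_{a,b}(x), h_{a,b}(x')) is a bijection on pairs, so every output pair is equally likely; for a tiny p we can verify this exhaustively:

```python
# Pairwise independent family h_{a,b}(x) = (a*x + b) % p over Z_p.
# Illustrative only: the text's extractors work over {0,1}^n, and this
# toy uses a small prime field instead.
P = 5

def h(a: int, b: int, x: int) -> int:
    return (a * x + b) % P

x0, x1 = 0, 1  # any two distinct inputs
pairs = {(h(a, b, x0), h(a, b, x1)) for a in range(P) for b in range(P)}

# All p^2 keys (a, b) give p^2 distinct output pairs, so each pair
# (y, y') has probability exactly 1/p^2 over a random key -- this is
# exactly the pairwise independence condition of the remark.
assert len(pairs) == P * P
```

Note that collisions h_{a,b}(x0) = h_{a,b}(x1) occur for exactly p of the p^2 keys, i.e. with probability 1/p, matching what a truly random function would give.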

8.4.1 Forward and backward secrecy
A cryptographic tool such as encryption is clearly insecure if the adversary learns the private key, and similarly the output of a pseudorandom generator is insecure if the adversary learns the seed. So, it might seem as if it's "game over" once this happens. However, there is still some hope. For example, if the adversary learns the key at time t but didn't know it before then, then one could hope that she does not learn the information that was exchanged up to time t − 1. This property is known as "forward secrecy". It has recently gained interest as a means to protect against powerful "attackers" such as the NSA that may record communication transcripts in the hope of deciphering them at some future point after they have learned the secret key. In the context of pseudorandom generators, one could hope for both forward and backward secrecy. Forward secrecy means that the state of the generator is updated at every point in time in a way that learning the state at time t does not help in recovering past states, and "backward secrecy" means that we can recover from the adversary knowing our internal state by updating the generator with fresh entropy. See this paper of mine and Halevi for some discussions of this issue, as well as this later work by Dodis et al.

9 Private key crypto recap

We now review all that we have learned about private key cryptography before we embark on the wonderful journey to public key cryptography. This material is mostly covered in Chapters 1 to 9 of the Katz-Lindell book, and now would be a good time for you to read the corresponding proofs in the book. It is often helpful to see the same proof presented in a slightly different way. Below is a review of some of the various reductions we saw in class that are covered in the KL book, with pointers to the corresponding sections.

• Pseudorandom generator (PRG) length extension (from $n+1$ output PRG to $poly(n)$ output PRG): Section 7.4.2
• PRG's to pseudorandom functions (PRF's): Section 7.5
• PRF's to Chosen Plaintext Attack (CPA) secure encryption: Section 3.5.2
• PRF's to secure Message Authentication Codes (MAC's): Section 4.3
• MAC's + CPA secure encryption to chosen ciphertext attack (CCA) secure encryption: Section 4.5.4
• Pseudorandom permutations (PRP's) to CPA secure encryption / block cipher modes: Section 3.5.2, Section 3.6.2
• Hash function applications: fingerprinting, Merkle trees, passwords: Section 5.6
• Coin tossing over the phone: we saw a construction in class that used a commitment scheme built out of a pseudorandom generator. Section 5.6.5 shows an alternative construction using random oracles.
• PRP's from PRF's: we only sketched the construction, which can be found in Section 7.6
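The first reduction in the list can be sketched as follows, with SHA-256 serving purely as a heuristic stand-in for a PRG $G: \{0,1\}^n \to \{0,1\}^{n+1}$ (this illustrates the iteration structure, not a proof of security):

```python
import hashlib

def prg_one_bit_stretch(state: bytes):
    """Heuristic stand-in for a PRG G: {0,1}^n -> {0,1}^(n+1).
    Returns (next_state, extra_bit)."""
    digest = hashlib.sha256(state).digest()
    return digest, digest[-1] & 1

def prg_extend(seed: bytes, out_len: int):
    """Length extension: iterate G, keeping n bits as the chained state
    and outputting the one extra bit each round, for out_len rounds."""
    state, out = seed, []
    for _ in range(out_len):
        state, bit = prg_one_bit_stretch(state)
        out.append(bit)
    return out

assert len(prg_extend(b"some-seed", 32)) == 32
```

The hybrid argument for why this works is exactly the one covered in the section cited above: each round replaces one genuine PRG invocation with truly random bits.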

One major point we did not talk about in this course was one way functions. The definition of a one way function is quite simple:


A function $f: \{0,1\}^* \to \{0,1\}^*$ is a one way function if it is efficiently computable and for every $n$ and $poly(n)$ time adversary $A$, the probability over $x \leftarrow_R \{0,1\}^n$ that $A(f(x))$ outputs $x'$ such that $f(x') = f(x)$ is negligible.

The "OWF conjecture" is the conjecture that one way functions exist. It turns out to be a necessary and sufficient condition for much of cryptography. That is, the following theorem is known (by combining the works of many people):

Theorem: The following are equivalent:

• One way functions exist
• Pseudorandom generators (with non-trivial stretch) exist
• Pseudorandom functions exist
• CPA secure private key encryptions exist
• CCA secure private key encryptions exist
• Message Authentication Codes exist
• Commitment schemes exist

(and others as well)

The key result in the proof of this theorem is the result of Hastad, Impagliazzo, Levin and Luby that if one way functions exist then pseudorandom generators exist. If you are interested in finding out more, Sections 7.2-7.4 in the KL book cover a special case of this theorem for the case that the one way function is a permutation on $\{0,1\}^n$ for every $n$. This proof has been considerably simplified and quantitatively improved in works of Haitner, Holenstein, Reingold, Vadhan, Wee and Zheng. See this talk of Salil Vadhan for more on this. See also these lecture notes from a Princeton seminar I gave on this topic (though the proof has been simplified since then by the above works).
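To make the definition concrete, here is a sketch of one classical candidate one way function, the "knapsack"/subset-sum function; the parameter choices below are illustrative only:

```python
import secrets

# Candidate one-way function (illustrative): subset sum / knapsack.
# Public parameters: N random N-bit weights. f maps an N-bit input x to
# the sum of the weights selected by the 1-bits of x. Computing f is
# easy; inverting it is an instance of the subset-sum problem, believed
# hard on average in suitable parameter regimes.

N = 64
WEIGHTS = [secrets.randbits(N) for _ in range(N)]

def f(x: int) -> int:
    assert 0 <= x < 2 ** N
    return sum(w for i, w in enumerate(WEIGHTS) if (x >> i) & 1)
```

Note that being hard to invert says nothing about hiding partial information about $x$; turning a one way function into an encryption scheme is exactly what the reductions cited above accomplish.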

9.0.1 Attacks on private key cryptosystems

Another topic we did not discuss in depth is attacks on private key cryptosystems. These attacks often work by "opening the black box" and looking at the internal operation of block ciphers or hash functions. One then often assigns variables to various internal registers, and then looks for collections of inputs that would satisfy some non-trivial relation between those variables. This is a rather vague description, but you can read KL Section 6.2.6 on linear and differential cryptanalysis for more information. See also this course of Adi Shamir. There is also the fascinating area of side channel attacks on both public and private key crypto.

III PUBLIC KEY CRYPTOGRAPHY

10 Public key cryptography

People have been dreaming about heavier-than-air flight since at least the days of Leonardo Da Vinci (not to mention Icarus from Greek mythology). Jules Verne wrote with rather insightful details about going to the moon in 1865. But, as far as I know, no one had considered the possibility of communicating securely without first exchanging a key until about 50 years ago. This is surprising given the thousands of years people have been using secret writing! However, in the late 1960's and early 1970's, several people started to question this "common wisdom".

Perhaps the most surprising of these visionaries was an undergraduate student at Berkeley named Ralph Merkle. In the fall of 1974, he wrote a project proposal for his computer security course saying that while "it might seem intuitively obvious that if two people have never had the opportunity to prearrange an encryption method, then they will be unable to communicate securely over an insecure channel… I believe it is false". The project proposal was rejected by his professor as "not good enough". Merkle later submitted a paper to the Communications of the ACM, where he apologized for the lack of references since he was unable to find any mention of the problem in the scientific literature, and the only source where he saw the problem even raised was in a science fiction story. The paper was rejected with the comment that "Experience shows that it is extremely dangerous to transmit key information in the clear." Merkle showed that one can design a protocol where Alice and Bob can use $T$ invocations of a hash function to exchange a key, but an adversary (in the random oracle model, though he of course didn't use this name) would need roughly $T^2$ invocations to break it. He conjectured that it may be possible to obtain such protocols where breaking is exponentially harder than using them, but could not think of any concrete way to do so.
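Merkle's idea, now known as "Merkle puzzles", can be sketched as follows. This is a toy Python sketch with SHA-256 as the hash function; the puzzle format, tag, and parameters are illustrative choices, not Merkle's:

```python
import hashlib
import secrets

PUZZLE_BITS = 12   # toy difficulty: ~2**12 hash calls to open one puzzle
BODY_LEN = 24      # 4-byte tag + 4-byte identifier + 16-byte key
TAG = b"OKOK"      # redundancy so a brute-forcer recognizes a correct guess

def pad(weak: bytes) -> bytes:
    return hashlib.sha256(b"puzzle-pad" + weak).digest()[:BODY_LEN]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def alice_make_puzzles(T: int):
    """Alice publishes T puzzles; each hides (identifier, key) behind a
    weak secret costing about 2**PUZZLE_BITS work to brute-force."""
    puzzles, table = [], {}
    for i in range(T):
        weak = secrets.randbelow(2 ** PUZZLE_BITS).to_bytes(2, "big")
        key = secrets.token_bytes(16)
        puzzles.append(xor(TAG + i.to_bytes(4, "big") + key, pad(weak)))
        table[i] = key
    return puzzles, table

def bob_solve_one(puzzles):
    """Bob brute-forces a single random puzzle and announces its
    identifier in the clear. An eavesdropper expects to solve about T/2
    puzzles (roughly T * 2**PUZZLE_BITS / 2 hash calls) before hitting
    the one Bob chose, versus Bob's 2**PUZZLE_BITS calls."""
    ct = secrets.choice(puzzles)
    for guess in range(2 ** PUZZLE_BITS):
        body = xor(ct, pad(guess.to_bytes(2, "big")))
        if body[:4] == TAG:
            return int.from_bytes(body[4:8], "big"), body[8:]
    raise RuntimeError("no weak secret opened the puzzle")

puzzles, table = alice_make_puzzles(64)
ident, key = bob_solve_one(puzzles)
assert table[ident] == key   # Alice and Bob now share `key`
```

The quadratic honest-vs-adversary gap appears when the per-puzzle difficulty is set to roughly $T$ itself.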
We only found out much later that in the late 1960’s, a few years be- fore Merkle, James Ellis of the British Intelligence agency GCHQ was having similar thoughts. His curiosity was spurred by an old World


War II manuscript from Bell Labs that suggested the following way for two people to communicate securely over a phone line. Alice would inject noise into the line, Bob would relay his messages, and then Alice would subtract the noise to get the signal. The idea is that an adversary over the line sees only the sum of Alice's and Bob's signals and doesn't know what came from where. This got James Ellis thinking about whether it would be possible to achieve something like that digitally. As he later recollected, in 1970 he realized that in principle this should be possible. He could think of a hypothetical black box $B$ that on input a "handle" $\alpha$ and plaintext $p$ would give a "ciphertext" $c$. There would be a secret key $\beta$ corresponding to $\alpha$, such that feeding $\beta$ and $c$ to the box would recover $p$. However, Ellis had no idea how to actually instantiate this box. He and others kept giving this question as a puzzle to bright new recruits until one of them, Clifford Cocks, came up in 1973 with a candidate solution loosely based on the factoring problem; in 1974 another GCHQ recruit, Malcolm Williamson, came up with a solution using modular exponentiation.

But among all those thinking of public key cryptography, probably the people who saw the furthest were two researchers at Stanford, Whit Diffie and Martin Hellman. They realized that with the advent of electronic communication, cryptography would find new applications beyond the military domain of spies and submarines. And they understood that in this new world of many users and point to point communication, cryptography would need to scale up. They envisioned an object which we now call a "trapdoor permutation" though they called it a "one way trapdoor function" or sometimes simply "public key encryption". This is a collection of permutations $\{p_k\}$ where $p_k$ is a permutation over (say) $\{0,1\}^{|k|}$, and the map $(x,k) \mapsto p_k(x)$ is efficiently computable, but the reverse map $(k,y) \mapsto p_k^{-1}(y)$ is computationally hard.
Yet, there is also some secret key $s(k)$ (i.e., the "trapdoor") such that using $s(k)$ it is possible to efficiently compute $p_k^{-1}$. Their idea was that using such a trapdoor permutation, Alice, who knows $s(k)$, would be able to publish $k$ on some public file such that everyone who wants to send her a message $x$ could do so by computing $p_k(x)$. (While today we know, due to the work of Goldwasser and Micali, that such a deterministic encryption is not a good idea, at the time Diffie and Hellman had amazing intuitions but didn't really have proper definitions of security.) But they didn't stop there. They realized that protecting the integrity of communication is no less important than protecting its secrecy. Thus, they imagined that Alice could "run encryption in reverse" in order to certify or sign messages. That is, given some message $m$, Alice would send the value $x = p_k^{-1}(h(m))$ (for a hash function $h$) as a way to certify that she endorses $m$, and every person who knows $k$ could verify this by checking that $p_k(x) = h(m)$.

However, Diffie and Hellman were in a position not unlike physicists who predicted that a certain particle should exist but had no experimental verification. Luckily they met Ralph Merkle. His ideas about a probabilistic key exchange protocol, together with a suggestion from their Stanford colleague John Gill, inspired them to come up with what today is known as the Diffie-Hellman Key Exchange (which, unbeknownst to them, had been found two years earlier at GCHQ by Malcolm Williamson). They published their paper "New Directions in Cryptography" in 1976, and it is considered to have brought about the birth of modern cryptography. However, they still didn't find their elusive trapdoor function. This was done the next year by Rivest, Shamir and Adleman, who came up with the RSA trapdoor function, which through the framework of Diffie and Hellman yielded not just encryption but also signatures (this was essentially the same function discovered earlier by Clifford Cocks at GCHQ, though as far as I can tell Cocks, Ellis and Williamson did not realize the application to digital signatures). From this point on began a flurry of advances in cryptography which hasn't really died down till this day.

10.1 PRIVATE KEY CRYPTO RECAP

Before we embark on the wonderful journey to public key cryptogra- phy, let’s briefly look back and see what we learned about private key cryptography. This material is mostly covered in Chapters 1 to 9 of the Katz Lindell (KL) book and Part I (Chapters 1-9) of the Boneh Shoup (BS) book. Now would be a good time for you to read the correspond- ing proofs in one or both of these books. It is often helpful to see the same proof presented in a slightly different way. Below is a review of some of the various reductions we saw in class, with pointers to the corresponding sections in these books.

• Pseudorandom generator (PRG) length extension (from $n+1$ output PRG to $poly(n)$ output PRG): KL 7.4.2, BS 3.4.2
• PRG's to pseudorandom functions (PRF's): KL 7.5, BS 4.6
• PRF's to Chosen Plaintext Attack (CPA) secure encryption: KL 3.5.2, BS 5.5
• PRF's to secure Message Authentication Codes (MAC's): KL 4.3, BS 6.3
• MAC's + CPA secure encryption to chosen ciphertext attack (CCA) secure encryption: KL 4.5.4, BS 9.4
• Pseudorandom permutations (PRP's) to CPA secure encryption / block cipher modes: KL 3.5.2, KL 3.6.2, BS 4.1, 4.4, 5.4

• Hash function applications: fingerprinting, Merkle trees, passwords: KL 5.6, BS Chapter 8
• Coin tossing over the phone: we saw a construction in class that used a commitment scheme built out of a pseudorandom generator. This is shown in BS 3.12; KL 5.6.5 shows an alternative construction using random oracles.
• PRP's from PRF's: we only sketched the construction, which can be found in KL 7.6 or BS 4.5

One major point we did not talk about in this course was one way functions. The definition of a one way function is quite simple:

Definition 10.1 — One Way Functions. A function $f: \{0,1\}^* \to \{0,1\}^*$ is a one way function if it is efficiently computable and for every $n$ and $poly(n)$ time adversary $A$, the probability over $x \leftarrow_R \{0,1\}^n$ that $A(f(x))$ outputs $x'$ such that $f(x') = f(x)$ is negligible.

The "OWF conjecture" is the conjecture that one way functions exist. It turns out to be a necessary and sufficient condition for much of private key cryptography. That is, the following theorem is known (by combining the works of many people):

Theorem 10.2 — One way functions and private key cryptography. The following are equivalent:

• One way functions exist
• Pseudorandom generators (with non-trivial stretch) exist
• Pseudorandom functions exist
• CPA secure private key encryptions exist
• CCA secure private key encryptions exist
• Message Authentication Codes exist
• Commitment schemes exist

The key result in the proof of this theorem is the result of Hastad, Impagliazzo, Levin and Luby that if one way functions exist then pseudorandom generators exist. If you are interested in finding out more, Sections 7.2-7.4 in the KL book cover a special case of this theorem for the case that the one way function is a permutation on $\{0,1\}^n$ for every $n$. This proof has been considerably simplified and quantitatively improved in works of Haitner, Holenstein, Reingold, Vadhan, Wee and Zheng. See this talk of Salil Vadhan for more on this. See also these lecture notes from a Princeton seminar I gave on this topic (though the proof has been simplified since then by the above works).

R

Remark 10.3 — Attacks on private key cryptosystems. Another topic we did not discuss in depth is attacks on private key cryptosystems. These attacks often work by "opening the black box" and looking at the internal operation of block ciphers or hash functions. We then assign variables to various internal registers, and look to find collections of inputs that would satisfy some non-trivial relation between those variables. This is a rather vague description, but you can read KL Section 6.2.6 on linear and differential cryptanalysis and BS Sections 3.7-3.9 and 4.3 for more information. See also this course of Adi Shamir. There is also the fascinating area of side channel attacks on both public and private key crypto.

R Remark 10.4 — Digital Signatures. We will discuss in this lecture digital signatures, which are the public key analog of message authentication codes. Surprisingly, despite being a "public key" object, it is possible to base digital signatures on one-way functions (this is obtained using ideas of Lamport, Merkle, Goldwasser-Goldreich-Micali, Naor-Yung, and Rompel). However, these constructions are not very efficient (and this may be inherent), and so in practice people use digital signatures that are built using similar techniques to those used for public key encryption.

10.2 PUBLIC KEY ENCRYPTIONS: DEFINITION

We now discuss how we define security for public key encryption. As mentioned above, it took quite a while for cryptographers to arrive at the "right" definition, but in the interest of time we will skip ahead to what by now is the standard basic notion (see also Fig. 10.1):

Definition 10.5 — Public key encryption. A triple of efficient algorithms $(G, E, D)$ is a public key encryption scheme if it satisfies the following:

• $G$ is a probabilistic algorithm known as the key generation algorithm that on input $1^n$ outputs a distribution over pairs of keys $(e, d)$.
• $E$ is the encryption algorithm that takes a pair of inputs $e, m$ with $m \in \{0,1\}^n$ and outputs $c = E_e(m)$.
• $D$ is the decryption algorithm that takes a pair of inputs $d, c$ and outputs $m' = D_d(c)$.

Figure 10.1: In a public key encryption, the receiver Bob generates a pair of keys $(e, d)$. The encryption key $e$ is used for encryption, and the decryption key $d$ is used for decryption. We call it a public key system since the security of the scheme does not rely on the adversary Eve not knowing the encryption key. Hence, Bob can publicize the key $e$ to a great many potential senders and still ensure confidentiality of the messages he receives.

• For every $m \in \{0,1\}^n$, with probability $1 - negl(n)$ over the choice of $(e,d)$ output from $G(1^n)$ and the coins of $E$ and $D$, $D_d(E_e(m)) = m$.

We say that $(G, E, D)$ is CPA secure if every efficient adversary $A$ wins the following game with probability at most $1/2 + negl(n)$:

• $(e,d) \leftarrow_R G(1^n)$
• $A$ is given $e$ and outputs a pair of messages $m_0, m_1 \in \{0,1\}^n$.
• $A$ is given $c = E_e(m_b)$ for $b \leftarrow_R \{0,1\}$.
• $A$ outputs $b' \in \{0,1\}$ and wins if $b' = b$.

P Despite this being a "chosen plaintext attack", we don't explicitly give $A$ access to the encryption oracle in the public key setting. Make sure you understand why giving it such access would not give it more power.

One metaphor for a public key encryption is a "self-locking lock", where you don't need the key to lock it (rather, you simply push the shackle until it clicks and locks), but you do need the key to unlock it. So, if Alice generates $(e,d) = G(1^n)$, then $e$ serves as the "lock" that can be used to encrypt messages for Alice, while only $d$ can be used to decrypt the messages. Another way to think about it is that $e$ is a "hobbled key" that can be used for only some of the functions of $d$.

10.2.1 The obfuscation paradigm

Why would someone imagine that such a magical object could exist? The writing of both James Ellis as well as Diffie and Hellman suggests that their thought process was roughly as follows. You imagine a "magic black box" $B$ such that if all parties have access to $B$ then we could get a public key encryption scheme. Now if public key encryption were impossible, it would mean that for every possible program $P$ that computes the functionality of $B$, if we distribute the code of $P$ to all parties, then we don't get a secure encryption scheme. That means that no matter what program $P$ the adversary gets, she will always be able to get some information out of that code that helps break the encryption, even though she wouldn't have been able to break it if $P$ were a black box. Now, intuitively, understanding arbitrary code is a very hard problem, so Diffie and Hellman imagined that it might be possible to take this ideal $B$ and compile it to some sufficiently low level assembly language so that it would behave as a "virtual black box". In particular, if you took, say, the encoding procedure

푘 푚 ↦ 푝 (푚) public key cryptography 195

of a block cipher with a particular key $k$ and ran it through an optimizing compiler, you might hope that while it would be possible to perform this map using the resulting executable, it would be hard to extract $k$ from it. Hence, you could treat this code as a "public key". This suggests the following approach for getting an encryption scheme:

"Obfuscation based public key encryption":

Ingredients: (i) A pseudorandom permutation collection

$\{p_k\}_{k \in \{0,1\}^*}$ where for every $k \in \{0,1\}^n$, $p_k: \{0,1\}^n \to \{0,1\}^n$; (ii) an "obfuscating compiler", i.e., a polynomial-time computable $O: \{0,1\}^* \to \{0,1\}^*$ such that for every circuit $C$, $O(C)$ is a circuit that computes the same function as $C$.

• Key Generation: The private key is $k \leftarrow_R \{0,1\}^n$; the public key is $E = O(C_k)$, where $C_k$ is the circuit that maps $x \in \{0,1\}^n$ to $p_k(x)$.
• Encryption: To encrypt $m \in \{0,1\}^n$ with public key $E$, choose $IV \leftarrow_R \{0,1\}^n$ and output $(IV, E(IV \oplus m))$.
• Decryption: To decrypt $(IV, y)$ with key $k$, output $IV \oplus p_k^{-1}(y)$.

Diffie and Hellman couldn't really find a way to make this work, but it convinced them that this notion of public key cryptography is not inherently impossible. This concept of compiling a program into a functionally equivalent but "inscrutable" form is known as software obfuscation. It has turned out to be quite a tricky object to both define formally and achieve, but it serves as very good intuition for what can be achieved, even if, as with the random oracle, this intuition can sometimes be too optimistic. (Indeed, if software obfuscation were possible then we could obtain a "random oracle like" hash function by taking the code of a function $f_k$ chosen from a PRF family and compiling it through an obfuscating compiler.)

We will not formally define obfuscators yet, but on an intuitive level an obfuscator would be a compiler that takes a program $P$ and maps it into a program $P'$ such that:

• $P'$ is not much slower/bigger than $P$ (e.g., as a Boolean circuit it would be at most polynomially larger).
• $P'$ is functionally equivalent to $P$, i.e., $P'(x) = P(x)$ for every input $x$.¹
• $P'$ is "inscrutable" in the sense that seeing the code of $P'$ is not more informative than getting black box access to $P$.

1. For simplicity, assume that the program $P$ is side effect free and hence it simply computes some function, say from $\{0,1\}^n$ to $\{0,1\}^\ell$ for some $n, \ell$.
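The recipe above can be sketched in code. Two loud caveats: the PRP here is a toy four-round Feistel network keyed via SHA-256 (an illustration, not a vetted cipher), and `obfuscate` is the identity map, so this sketch has no security whatsoever; the entire unsolved difficulty is making `obfuscate` hide the key.

```python
import hashlib
import secrets

def feistel_prp(k: bytes, x: bytes, inverse: bool = False) -> bytes:
    """Toy pseudorandom permutation on 16-byte blocks: a 4-round Feistel
    network with round functions derived from SHA-256. Inversion runs
    the rounds with the key schedule reversed."""
    L, R = x[:8], x[8:]
    rounds = range(4)
    if inverse:
        L, R = R, L
        rounds = reversed(rounds)
    for i in rounds:
        F = hashlib.sha256(k + bytes([i]) + R).digest()[:8]
        L, R = R, bytes(a ^ b for a, b in zip(L, F))
    if inverse:
        L, R = R, L
    return L + R

def obfuscate(circuit):
    # The hypothetical ingredient: a compiler producing a functionally
    # equivalent but inscrutable program. The identity map is
    # functionally correct but leaks everything.
    return circuit

def keygen():
    k = secrets.token_bytes(16)
    E = obfuscate(lambda x: feistel_prp(k, x))  # "public key": obfuscated p_k
    return E, k

def encrypt(E, m: bytes):
    iv = secrets.token_bytes(16)
    return iv, E(bytes(a ^ b for a, b in zip(iv, m)))

def decrypt(k: bytes, ciphertext):
    iv, y = ciphertext
    x = feistel_prp(k, y, inverse=True)          # recover iv XOR m
    return bytes(a ^ b for a, b in zip(iv, x))

E, k = keygen()
m = b"sixteen byte msg"
assert decrypt(k, encrypt(E, m)) == m
```

Functionally the scheme decrypts correctly; what is missing is exactly the inscrutability property discussed next.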

Let me stress again that there is no known construction of obfuscators achieving something similar to this definition. In fact, the most natural formalization of this definition is impossible to achieve (as we might see later in this course). Only very recently (exciting!) has progress finally been made towards obfuscator-like notions strong enough to achieve these and other applications, and there are some significant caveats (see my survey on this topic). However, when trying to stretch your imagination to consider the amazing possibilities that could be achieved in cryptography, it is not a bad heuristic to first ask yourself what could be possible if only everyone involved had access to a magic black box. It certainly worked well for Diffie and Hellman.

10.3 SOME CONCRETE CANDIDATES:

We would have loved to prove a theorem of the form:

“Theorem”: If the PRG conjecture is true, then there exists a CPA-secure public key encryption.

This would have meant that we do not need to assume anything more than the already minimal notion of pseudorandom generators (or equivalently, one way functions) to obtain public key cryptogra- phy. Unfortunately, no such result is known (and this may be inher- ent). The kind of results we know have the following form:

Theorem: If problem $X$ is hard, then there exists a CPA-secure public key encryption.

Here, $X$ is some problem that people have tried to solve and couldn't. Thus, we have various candidates for public key encryption, and we fervently hope that at least one of them is actually secure. The dirty little secret of cryptography is that we actually don't have that many candidates. We really have only two well studied families.² One is the "group theoretic" family that relies on the difficulty of the discrete logarithm problem (over modular arithmetic or elliptic curves) or the integer factoring problem. The other is the "coding/lattice theoretic" family that relies on the difficulty of solving noisy linear equations or related problems such as finding short vectors in a lattice and solving instances of the "knapsack" problem. Moreover, problems from the first family are known to be efficiently solvable in a computational model known as "quantum computing". If large scale physical devices that simulate this model, known as quantum computers, exist, then they could break all cryptosystems relying on these problems, and we'll be down to only having a single family of candidate public key encryption schemes.

2. There have been some other more exotic suggestions for public key encryption (including some by yours truly as well as suggestions such as the isogeny star problem, though see also this), but they have not yet received wide scrutiny.

We will start by describing cryptosystems based on the first family (which was discovered before the other and was more widely imple- mented), and talk about the second family in future lectures.

10.3.1 Diffie-Hellman Encryption (aka El-Gamal)

The Diffie-Hellman public key system is built on the presumed difficulty of the discrete logarithm problem:

For any number $p$, let $\mathbb{Z}_p$ be the set of numbers $\{0,\ldots,p-1\}$, where addition and multiplication are done modulo $p$. We will think of numbers $p$ that are of magnitude roughly $2^n$, so they can be described with about $n$ bits. We can clearly multiply and add such numbers modulo $p$ in $poly(n)$ time. If $g \in \mathbb{Z}_p$ and $a$ is any natural number, we can define $g^a$ to be simply $g \cdot g \cdots g$ ($a$ times). A priori one might think that it would take $a \cdot poly(n)$ time to compute $g^a$, which might be exponential if $a$ itself is roughly $2^n$. However, we can compute this in $poly((\log a) \cdot n)$ time using the repeated squaring trick. The idea is that if $a = 2^\ell$, then we can compute $g^a$ by squaring $\ell$ times, and a general $a$ can be decomposed into powers of two using the binary representation.

The discrete logarithm problem is the problem of computing, given $g, h \in \mathbb{Z}_p$, a number $a$ such that $g^a = h$. If such a solution exists then there is always also a solution of size at most $p$ (can you see why?), and so the solution can be represented using $n$ bits. However, currently the best-known algorithm for computing the discrete logarithm runs in time roughly $2^{n^{1/3}}$, which is currently prohibitively expensive when $p$ is a prime of length about $2048$ bits.³

3. The running time of the best known algorithms for computing the discrete logarithm modulo $n$ bit primes is $2^{f(n) \cdot n^{1/3}}$, where $f(n)$ is a function that depends polylogarithmically on $n$. If $f(n)$ were equal to one, then we'd need numbers of $128^3 \approx 2 \cdot 10^6$ bits to get $128$ bits of security, but because $f(n)$ is larger than one, the current estimates are that we need to let $n = 3072$ to get a $128$ bit level of security. Still, the existence of such a non-trivial algorithm means that we need much larger keys than those used for private key systems to get the same level of security. In particular, to double the estimated security to $256$ bits, NIST recommends that we multiply the RSA keysize five-fold to $15{,}360$. (The same document also says that SHA-256 gives $256$ bits of security as a pseudorandom generator but only $128$ bits when used to hash documents for digital signatures; can you see why?)

John Gill suggested to Diffie and Hellman that modular exponentiation can be a good source for the kind of "easy-to-compute but hard-to-invert" functions they were looking for.
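The repeated squaring trick can be written out as follows (Python's built-in `pow(g, a, p)` implements the same idea):

```python
def mod_exp(g: int, a: int, p: int) -> int:
    """Compute g**a mod p with repeated squaring: O(log a) modular
    multiplications instead of a - 1."""
    result = 1
    base = g % p
    while a > 0:
        if a & 1:                  # the current binary digit of a is 1
            result = (result * base) % p
        base = (base * base) % p   # base now holds g**(2**i) mod p
        a >>= 1
    return result

assert mod_exp(3, 2**20 + 17, 1_000_003) == pow(3, 2**20 + 17, 1_000_003)
```

Each iteration squares the running base, so the exponent $a$ is consumed one binary digit at a time.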
Diffie and Hellman based a public key encryption scheme on this as follows:

• The key generation algorithm, on input $n$, samples a prime number $p$ of $n$ bits description (i.e., between $2^{n-1}$ and $2^n$), a number $g \leftarrow_R \mathbb{Z}_p$, and $a \leftarrow_R \{0,\ldots,p-1\}$. We also sample a hash function $H: \{0,1\}^n \to \{0,1\}^\ell$. The public key $e$ is $(p, g, g^a, H)$, while the secret key $d$ is $a$.⁴
• The encryption algorithm, on input a message $m \in \{0,1\}^\ell$ and a public key $e = (p, g, h, H)$, will choose a random $b \leftarrow_R \{0,\ldots,p-1\}$ and output $(g^b, H(h^b) \oplus m)$.
• The decryption algorithm, on input a ciphertext $(f, y)$ and the secret key, will output $H(f^a) \oplus y$.

4. Formally, the secret key $d$ should contain all the information in the public key plus the extra secret information, but we omit the public information for simplicity of notation.

The correctness of the decryption algorithm follows from the fact that $(g^a)^b = (g^b)^a = g^{ab}$, and hence the value $H(h^b)$ computed by the encryption

algorithm is the same as the value $H(f^a)$ computed by the decryption algorithm. A simple relation between the discrete logarithm and the Diffie-Hellman system is the following:

Lemma 10.6 If there is a polynomial time algorithm for the discrete logarithm problem, then the Diffie-Hellman system is insecure.

Proof. Using a discrete logarithm algorithm, we can compute the private key $a$ from the parameters $p, g, g^a$ present in the public key, and clearly once we know the private key $a$ we can decrypt any message of our choice. ■
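Putting the pieces together, here is a toy sketch of the scheme with deliberately insecure parameters (a small Mersenne prime, an arbitrary base $g = 3$, and SHA-256 standing in for the random oracle $H$); real deployments use ~2048-bit primes or elliptic-curve groups:

```python
import hashlib
import secrets

# Toy parameters, for illustration only.
P = 2**61 - 1   # a Mersenne prime, far too small for actual security
G = 3           # an arbitrary base, not necessarily a generator
ELL = 16        # message length in bytes

def H(group_elem: int) -> bytes:
    # Stand-in for the random oracle H mapping a group element to a pad
    return hashlib.sha256(group_elem.to_bytes(8, "big")).digest()[:ELL]

def keygen():
    a = 2 + secrets.randbelow(P - 3)      # secret exponent
    return pow(G, a, P), a                # public h = g^a, secret a

def encrypt(h: int, m: bytes):
    assert len(m) == ELL
    b = 2 + secrets.randbelow(P - 3)
    pad = H(pow(h, b, P))                 # H(g^(ab))
    return pow(G, b, P), bytes(x ^ y for x, y in zip(pad, m))

def decrypt(a: int, ciphertext) -> bytes:
    f, y = ciphertext
    pad = H(pow(f, a, P))                 # (g^b)^a = (g^a)^b = g^(ab)
    return bytes(x ^ y for x, y in zip(pad, y))
```

The correctness assertion $(g^b)^a = (g^a)^b$ from the text is exactly what makes the two pads agree.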

Unfortunately, no such result is known in the other direction. However, in the random oracle model, we can prove that this protocol is secure assuming the task of computing $g^{ab}$ from $g^a$ and $g^b$ (which is now known as the Diffie-Hellman problem) is hard.⁵

5. One can get security results for this protocol without a random oracle if we assume a stronger variant known as the Decisional Diffie-Hellman (DDH) assumption: for random $a, b, u \in \mathbb{Z}_p$ ($p$ prime), the triple $(g^a, g^b, g^{ab}) \approx (g^a, g^b, g^u)$. This implies CDH (can you see why?). DDH also restricts our focus to groups of prime order. In particular, DDH does not hold in even-order groups. For example, DDH does not hold in $\mathbb{Z}_p^* = \{1, 2, \ldots, p-1\}$ (with group operation multiplication mod $p$), since half of its elements are quadratic residues and it is efficient to test if an element is a quadratic residue using Fermat's little theorem (can you see why? See Exercise 10.7). However, DDH holds in subgroups of $\mathbb{Z}_p^*$ of prime order. If $p$ is a safe prime (i.e., $p = 2q + 1$ for a prime $q$), then we can instead use the subgroup of quadratic residues, which has prime order $q$. See Boneh-Shoup 10.4.1 for more details on the underlying groups for CDH and DDH.

Computational Diffie-Hellman Assumption: Let $\mathbb{G}$ be a group whose elements can be described in $n$ bits, with an associative and commutative multiplication operation that can be computed in $poly(n)$ time. The Computational Diffie-Hellman (CDH) assumption holds with respect to the group $\mathbb{G}$ if for every generator (see below) $g$ of $\mathbb{G}$ and efficient algorithm $A$, the probability that on input $g, g^a, g^b$, $A$ outputs the element $g^{ab}$ is negligible as a function of $n$.⁶

6. Formally, since it is an asymptotic statement, the CDH assumption needs to be defined with a sequence of groups. However, to make notation simpler we will ignore this issue, and use it only for groups (such as the numbers modulo some $n$ bit primes) where we can easily increase the "security parameter" $n$.

In particular we can make the following conjecture:

Computational Diffie-Hellman Conjecture for mod prime groups: For a random $n$-bit prime $p$ and random $g \in \mathbb{Z}_p$, the CDH holds with respect to the group $\mathbb{G} = \{g^a \bmod p \mid a \in \mathbb{Z}\}$.

That is, for every polynomial $q: \mathbb{N} \to \mathbb{N}$, if $n$ is large enough, then with probability at least $1 - 1/q(n)$ over the choice of a uniform prime $p \in [2^n]$ and $g \in \mathbb{Z}_p$, for every circuit $A$ of size at most $q(n)$, the probability that $A(g, p, g^a, g^b)$ outputs $h$ such that $g^{ab} = h \bmod p$ is at most $1/q(n)$, where the probability is taken over $a, b$ chosen at random in $\mathbb{Z}_p$.⁷

7. In practice people often take $g$ to be a generator of a group significantly smaller in size than $p$, which enables $a, b$ to be smaller numbers and hence multiplication to be more efficient. We ignore this optimization in our discussions.

P Please take your time to re-read the following conjecture until you are sure you understand what it means. Victor Shoup's excellent book A Computational Introduction to Number Theory and Algebra, available online, has an in-depth treatment of groups, generators, and the discrete log and Diffie-Hellman problems. See also Chapters 10.4 and 10.5 in the Boneh-Shoup book, and Chapters 8.3 and 11.4 in the Katz-Lindell book. There are also solved group theory exercises at the end of this chapter.

Theorem 10.7 — Diffie-Hellman security in Random Oracle Model. Suppose that the Computational Diffie-Hellman Conjecture for mod prime groups is true. Then, the Diffie-Hellman public key encryption is CPA secure in the random oracle model.

Proof. For CPA security we need to prove that (for a fixed group $\mathbb{G}$ of size $p$ and random oracle $H$) the following two distributions are computationally indistinguishable for every two strings $m, m' \in \{0,1\}^\ell$:

• $(g^a, g^b, H(g^{ab}) \oplus m)$ for $a, b$ chosen uniformly and independently in $\mathbb{Z}_p$.
• $(g^a, g^b, H(g^{ab}) \oplus m')$ for $a, b$ chosen uniformly and independently in $\mathbb{Z}_p$.

(Can you see why this implies CPA security? You should pause here and verify this!)

We make the following claim:

CLAIM: For a fixed $\mathbb{G}$ of size $p$, generator $g$ for $\mathbb{G}$, and given random oracle $H$, if there is a size $T$ distinguisher $A$ with advantage $\epsilon$ between the distribution $(g^a, g^b, H(g^{ab}))$ and the distribution $(g^a, g^b, U_\ell)$ (where $a, b$ are chosen uniformly and independently in $\mathbb{Z}_p$), then there is a size $poly(T)$ algorithm $A'$ to solve the Diffie-Hellman problem with respect to $\mathbb{G}, g$ with success at least $\epsilon/(2T)$. That is, for random $a, b \in \mathbb{Z}_p$, $A'(g, g^a, g^b) = g^{ab}$ with probability at least $\epsilon/(2T)$.

Proof of claim: The proof is simple. We claim that under the assumptions above, $A$ makes the query $g^{ab}$ to its oracle $H$ with probability at least $\epsilon/2$, since otherwise, by the "lazy evaluation" paradigm, we can assume that $H(g^{ab})$ is chosen independently at random after $A$'s attack is completed, and hence (conditioned on the adversary not making that query) the value $H(g^{ab})$ is indistinguishable from a uniform output. Therefore, on input $g, g^a, g^b$, $A'$ can simulate $A$ and simply output one of the at most $T$ queries that $A$ makes to $H$ at random, and will be successful with probability at least $\epsilon/(2T)$.

Now given the claim, we can complete the proof of security via the following hybrids. Define the following "hybrid" distributions (where in all cases $a, b$ are chosen uniformly and independently in $\mathbb{Z}_p$):

• H_0: (g^a, g^b, H(g^{ab}) ⊕ m)

• H_1: (g^a, g^b, U_ℓ ⊕ m)

• H_2: (g^a, g^b, U_ℓ ⊕ m′)

• H_3: (g^a, g^b, H(g^{ab}) ⊕ m′)

The claim implies that H_0 ≈ H_1. Indeed, otherwise we could transform a distinguisher T between H_0 and H_1 into a distinguisher T′ violating the claim by letting T′(h, h′, z) = T(h, h′, z ⊕ m).

The distributions H_1 and H_2 are identical by the same argument as the security of the one time pad (since U_ℓ ⊕ m is identical to U_ℓ ⊕ m′).

The distributions H_2 and H_3 are computationally indistinguishable by the same argument that H_0 ≈ H_1.

Together these imply that H_0 ≈ H_3, which yields the CPA security of the scheme. ■
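To make the scheme in the proof concrete, here is a minimal sketch of the Diffie-Hellman public key encryption in Python, instantiating the random oracle H with SHA-256 truncated to the message length. The parameters P, Q, G and all function names are illustrative choices (a tiny safe-prime group), and the sizes are far too small for any real security.

```python
import hashlib
import secrets

# Toy parameters (illustrative only): P = 2Q + 1 is a small "safe prime",
# and G = 4 generates the subgroup of quadratic residues, which has
# prime order Q.  Real deployments use groups of 2048 bits or more.
P, Q, G = 2039, 1019, 4

def H(x: int, length: int) -> bytes:
    """Stand-in for the random oracle H, built from SHA-256."""
    return hashlib.sha256(str(x).encode()).digest()[:length]

def keygen():
    a = secrets.randbelow(Q - 1) + 1      # secret exponent a
    return a, pow(G, a, P)                # (a, h = g^a)

def encrypt(h: int, m: bytes):
    """Encrypt m (up to 32 bytes here) as (g^b, H(g^ab) XOR m)."""
    b = secrets.randbelow(Q - 1) + 1
    f = pow(G, b, P)                      # g^b
    pad = H(pow(h, b, P), len(m))         # H(g^ab), since h^b = g^ab
    return f, bytes(x ^ y for x, y in zip(pad, m))

def decrypt(a: int, ciphertext):
    f, c = ciphertext
    pad = H(pow(f, a, P), len(c))         # f^a = g^ab as well
    return bytes(x ^ y for x, y in zip(pad, c))
```

Correctness holds because both parties compute the same g^{ab}; CPA security in the random oracle model is exactly what Theorem 10.7 argues.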

R Remark 10.8 — Elliptic curve cryptography. As mentioned, the Diffie-Hellman system can be run with many variants of Abelian groups. Of course, for some of those groups the discrete logarithm problem might be easy, and so they would be inappropriate to use for this system. One variant that has been proposed is elliptic curve cryptography. This is a group consisting of points of the form (x, y, z) ∈ ℤ_p^3 that satisfy a certain equation, and multiplication can be defined in a certain way. The main advantage of elliptic curve cryptography is that the best known algorithms run in time 2^{≈n} as opposed to 2^{≈n^{1/3}}, which allows for much shorter keys. Unfortunately, elliptic curve cryptography is just as susceptible to quantum algorithms as the discrete logarithm problem over ℤ_p.

R Remark 10.9 — Encryption vs Key Exchange and El Gamal. In most of the cryptography literature the protocol above is called the Diffie-Hellman Key Exchange protocol, and when considered as a public key system it is sometimes known as ElGamal encryption.^8 The reason for this mostly stems from the early confusion on what the right security definitions are. Diffie and Hellman thought of encryption as a deterministic process, and so they called their scheme a "key exchange protocol". The work of Goldwasser and Micali showed that encryption must be probabilistic for security. Also, because of efficiency considerations, these days public key encryption is mostly used as a mechanism to exchange a key for a private key encryption that is then used for the bulk of the communication. Together this means that there is not much point in distinguishing between a two-message key exchange algorithm and a public key encryption.

8 ElGamal's actual contribution was to design a signature scheme based on the Diffie-Hellman problem, a variant of which is the Digital Signature Algorithm (DSA) described below.

10.3.2 Sampling random primes
To sample a random n-bit prime, one can sample a random number 0 ≤ p < 2^n and then test if p is prime. If it is not prime, then we can sample a new random number again. To make this work we need to show two properties:

Efficient testing: That there is a poly(n)-time algorithm to test whether an n-bit number is a prime. It turns out that there are such known algorithms. Randomized algorithms have been known since the 1970's. Moreover, in a 2002 breakthrough, Manindra Agrawal, Neeraj Kayal, and Nitin Saxena (a professor and two undergraduate students from the Indian Institute of Technology Kanpur) came up with the first deterministic polynomial time algorithm for testing primality.

Prime density: That the probability that a random n-bit number is prime is at least 1/poly(n). This probability is in fact 1/ln(2^n) = Ω(1/n) by the Prime Number Theorem. However, for the sake of completeness, we sketch below a simple argument showing the probability is at least Ω(1/n²).

Lemma 10.10 The number of primes between 1 and N is Ω(N/log N).

Proof. Recall that the least common multiple (LCM) of two or more numbers a_1, …, a_t is the smallest number that is a multiple of all of the a_i's. One way to compute the LCM of a_1, …, a_t is to take the prime factorizations of all the a_i's, and then the LCM is the product of all the primes that appear in these factorizations, each taken to the corresponding highest power that appears in the factorization. Let k be the number of primes between 1 and N. The lemma will follow from the following two claims:

CLAIM 1: LCM(1, …, N) ≤ N^k.

CLAIM 2: If N is odd, then LCM(1, …, N) ≥ 2^{N−1}.

The two claims immediately imply the result, since they imply that 2^{N−1} ≤ N^k, and taking logs we get that N − 2 ≤ k log N, or k ≥ (N − 2)/log N. (We can assume that N is odd without loss of generality, since changing from N to N + 1 can change the number of primes by at most one.) Thus, all that is left is to prove the two claims.

Proof of CLAIM 1: Let p_1, …, p_k be all the prime numbers between 1 and N, and let e_i be the largest integer such that p_i^{e_i} ≤ N, and L = p_1^{e_1} ⋯ p_k^{e_k}. Since L is the product of k terms, each of size at most N, L ≤ N^k. But we claim that every number 1 ≤ a ≤ N divides L. Indeed, every prime p in the prime factorization of a is one of the p_i's, and since a ≤ N, the power in which p appears in a is at most e_i. By the definition of the least common multiple, this means that LCM(1, …, N) ≤ L ≤ N^k. QED (CLAIM 1)

Proof of CLAIM 2: Consider the integral I = ∫_0^1 x^{(N−1)/2}(1 − x)^{(N−1)/2} dx. This is clearly some positive number and so I > 0. On one hand, for every x between zero and one, x(1 − x) ≤ 1/4, and hence I is at most 4^{−(N−1)/2} = 2^{−N+1}. On the other hand, x^{(N−1)/2}(1 − x)^{(N−1)/2} is some polynomial of degree at most N − 1 with integer coefficients, and so I = Σ_{k=0}^{N−1} C_k ∫_0^1 x^k dx for some integer coefficients C_0, …, C_{N−1}. Since ∫_0^1 x^k dx = 1/(k+1), we see that I is a sum of fractions with integer numerators and with denominators that are at most N. Since all the denominators are at most N and I > 0, it follows that I ≥ 1/LCM(1, …, N), and so 2^{−N+1} ≥ I ≥ 1/LCM(1, …, N), which implies LCM(1, …, N) ≥ 2^{N−1}. QED (CLAIM 2 and hence the lemma) ■
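The sampling procedure above can be sketched as follows. The Miller-Rabin test stands in for the randomized primality tests of the 1970's mentioned above; the helper names are mine, and a production implementation would take more care (e.g., with side channels).

```python
import secrets

def is_probable_prime(p: int, rounds: int = 40) -> bool:
    """Miller-Rabin randomized primality test."""
    if p < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if p % small == 0:
            return p == small
    # Write p - 1 = 2^r * d with d odd.
    d, r = p - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = secrets.randbelow(p - 3) + 2          # random base in [2, p-2]
        x = pow(a, d, p)
        if x in (1, p - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, p)
            if x == p - 1:
                break
        else:
            return False                          # a witnesses compositeness
    return True

def random_prime(n: int) -> int:
    """Sample a random n-bit prime by rejection sampling.  By the prime
    density bound above, O(n) attempts suffice in expectation."""
    while True:
        p = secrets.randbits(n) | (1 << (n - 1)) | 1   # force n bits, odd
        if is_probable_prime(p):
            return p
```

Each Miller-Rabin round errs on a composite with probability at most 1/4, so 40 rounds give error at most 4^{−40}.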

10.3.3 A little bit of group theory. If you haven’t seen group theory, it might be useful for you to do a quick review. We will not use much group theory and mostly use the theory of finite commutative (also known as Abelian) groups (in fact often cyclic) which are such a baby version that it might not be considered true “group theory” by many group theorists. Shoup’s excellent book contains everything we need to know (and much more than that). What you need to remember is the following:

• A finite commutative group 픾 is a finite set together with a multiplication operation that satisfies a · b = b · a and (a · b) · c = a · (b · c).

• 픾 has a special element known as 1, where g · 1 = 1 · g = g for every g ∈ 픾, and for every g ∈ 픾 there exists an element g^{−1} ∈ 픾 such that g · g^{−1} = 1.

• For every g ∈ 픾, the order of g, denoted order(g), is the smallest positive integer a such that g^a = 1.

The following basic facts are all not too hard to prove and would be useful exercises:

• For every g ∈ 픾, the map a ↦ g^a is a k-to-1 map from {0, …, |픾| − 1} to 픾, where k = |픾|/order(g). See the footnote for a hint.^9

9 For every f ∈ 픾, you can show a one-to-one and onto mapping between the set {a : g^a = 1} and the set {b : g^b = f}, by choosing some element b from the latter set and looking at the map a ↦ a + b mod |픾|.

• As a corollary, the order of g is always a divisor of |픾|. This is a special case of a more general phenomenon: the set {g^a | a ∈ ℤ} is a subset of the group 픾 that is closed under multiplication, and such

subsets are known as subgroups of 픾. It is not hard to show (using the same approach as above) that for every group 픾 and subgroup ℍ, the size of ℍ divides the size of 픾. This is known as Lagrange's Theorem in group theory.

• An element g of 픾 is called a generator if order(g) = |픾|. A group is called cyclic if it has a generator. If 픾 is cyclic then there is a (not necessarily efficiently computable) isomorphism φ : 픾 → ℤ_{|픾|}, which is a one-to-one and onto map satisfying φ(g · h) = φ(g) + φ(h) for every g, h ∈ 픾.

When using a group 픾 for the Diffie-Hellman protocol, we want the property that g is a generator of the group, which also means that the map a ↦ g^a is a one-to-one mapping from {0, …, |픾| − 1} to 픾. This can be efficiently tested if we know the order of the group and its factorization, since it will occur if and only if g^a ≠ 1 for every a < |픾| (can you see why this holds?), and we know that if g^a = 1 then a must divide |픾| (and this?).

It is not hard to show that a random element g ∈ 픾 will be a generator with non-trivial probability (for similar reasons that a random number is prime with non-trivial probability). Hence, an approach to getting such a generator is to simply choose g at random and test that g^{|픾|/q} ≠ 1 for all of the fewer than log |픾| numbers of the form |픾|/q, where q is a prime factor of |픾|.

P Try to stop here and verify all the facts on groups mentioned above. There are additional group theory exercises at the end of the chapter as well.
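The generator test in the last paragraph can be sketched as follows for the group ℤ*_p, whose order is p − 1. The trial-division factorization is only meant for toy sizes; the function names are mine.

```python
def prime_factors(n: int) -> set:
    """Prime factors of n by trial division (fine for toy sizes only)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def is_generator(g: int, p: int) -> bool:
    """Check whether g generates Z_p^* (a group of order p - 1):
    g is a generator iff g^((p-1)/q) != 1 mod p for every prime q | p - 1."""
    order = p - 1
    return all(pow(g, order // q, p) != 1 for q in prime_factors(order))
```

For example, in ℤ*_7 this test confirms that exactly 3 and 5 are generators, matching the solved exercises at the end of the chapter.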

10.3.4 Digital Signatures
Public key encryption solves the confidentiality problem, but we still need to solve the authenticity or integrity problem, which might be even more important in practice. That is, suppose Alice wants to endorse a message m that everyone can verify, but only she can sign. This of course is extremely widely used in many settings, including software updates, web pages, financial transactions, and more.

Definition 10.11 — Digital Signatures. A triple of algorithms (G, S, V) is a chosen-message-attack secure digital signature scheme if it satisfies the following:

• On input 1^n, the probabilistic key generation algorithm G outputs a pair of keys (s, v), where s is the private signing key and v is the public verification key.

• On input a message m and the signing key s, the signing algorithm S outputs a string σ = S_s(m) such that with probability 1 − negl(n), V_v(m, S_s(m)) = 1.

• Every efficient adversary A wins the following game with at most negligible probability:

1. The keys (s, v) are chosen by the key generation algorithm.

2. The adversary gets the inputs 1^n, v, and black box access to the signing algorithm S_s(·).

3. The adversary wins if they output a pair (m*, σ*) such that m* was not queried before to the signing algorithm and V_v(m*, σ*) = 1.

R Remark 10.12 — Strong unforgeability. Just like for MACs (see Definition 4.8), our definition of security for digital signatures with respect to a chosen message attack does not preclude the ability of the adversary to produce a new signature for the same message that it has seen a signature of. Just like in MACs, people sometimes consider the notion of strong unforgeability which requires that it would not be possible for the adversary to produce a new message-signature pair (even if the message itself was queried before). Some signature schemes (such as the full domain hash and the DSA scheme) satisfy this stronger notion while others do not. However, just like MACs, it is possible to transform any signature with standard security into a signature that satisfies this stronger unforgeability condition.

10.3.5 The Digital Signature Algorithm (DSA)
The Diffie-Hellman protocol can be turned into a signature scheme. This was first done by ElGamal, and a variant of his scheme was developed by the NSA and standardized by NIST as the Digital Signature Algorithm (DSA) standard. When based on an elliptic curve this is known as ECDSA. The starting point is the following generic idea of how to turn an encryption scheme into an identification protocol.

If Alice published a public encryption key e, then one natural approach for Alice to prove her identity to Bob is as follows. Bob will send an encryption c = E_e(x) of some random message x ←_R {0,1}^n to Alice, and Alice will send back x′ = D_d(c). If x′ = x, then she has

proven that she can decrypt ciphertexts encrypted with e, and so Bob can be assured that she is the rightful owner of the public key e. However, this falls short of a signature scheme in two aspects:

• This is only an identification protocol and does not allow Alice to endorse a particular message m.

• This is an interactive protocol, and so Alice cannot generate a static signature based on m that can be verified by any party without further interaction.

The first issue is not so significant, since we can always have the ciphertext be an encryption of x = H(m), where H is some hash function presumed to behave as a random oracle. (We do not want to simply run this protocol with x = m. Can you see why?) The second issue is more serious. We could imagine Alice trying to run this protocol on her own by generating the ciphertext and then decrypting it, and then sending over the transcript to Bob. But this does not really prove that she knows the corresponding private key. After all, even without knowing d, any party can generate a ciphertext c and its corresponding decryption. The idea behind the DSA protocol is that we require Alice to generate a ciphertext c and its decryption satisfying some additional extra conditions, which would prove that Alice truly knew the secret key.

DSA Signatures: The DSA signature algorithm works as follows: (See also Section 12.5.2 in the KL book)

• Key generation: Pick a generator g for 픾 and a ∈ {0, …, |픾| − 1}, and let h = g^a. Pick H : {0,1}^ℓ → 픾 and F : 픾 → 픾 to be some functions that can be thought of as "hash functions".^10 The public key is (g, h) (as well as the functions H, F) and the secret key is a.

10 It is a bit cumbersome, but not so hard, to transform functions that map strings to strings into functions whose domain or range are group elements. As noted in the KL book, in the actual DSA protocol F is not a cryptographic hash function but rather some very simple function that is still assumed to be "good enough" for security.

• Signature: To sign a message m with the key a, pick b at random, and let f = g^b, and then let σ = b^{−1}[H(m) + a · F(f)], where all computation is done modulo |픾|. The signature is (f, σ).

• Verification: To verify a signature (f, σ) on a message m, check that σ ≠ 0 and f^σ = g^{H(m)} h^{F(f)}.

P You should pause here and verify that this is indeed a valid signature scheme, in the sense that for every m, V_{(g,h)}(m, S_a(m)) = 1.

Very roughly speaking, the idea behind security is that on one hand σ does not reveal information about b and a, because this is "masked" by the "random" value H(m). On the other hand, if an adversary is

able to come up with valid signatures, then at least if we treated H and F as oracles, if the signature passes verification then (by taking logarithms to the base g) the answers x, y of these oracles will satisfy bσ = x + ay, which means that sufficiently many such equations should be enough to recover the discrete log a.

P Before seeing the actual proof, it is a very good exercise to try to see how to convert the intuition above into a formal proof.
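Here is a minimal sketch of the scheme above in Python over a toy subgroup of ℤ*_p. Both H and F are instantiated with SHA-256 reduced mod Q, which is an illustrative choice (not the standardized DSA), and the parameter sizes are insecure; `pow(b, -1, Q)` for the modular inverse requires Python 3.8+.

```python
import hashlib
import secrets

# Toy group: the order-Q subgroup of Z_P^*, where P = 2Q + 1 is a safe
# prime and G = 4 has order Q.  Sizes are for illustration only.
P, Q, G = 2039, 1019, 4

def H(m: bytes) -> int:
    """'Hash function' H into Z_Q (modeled as a random oracle)."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % Q

def F(f: int) -> int:
    """'Hash function' F from group elements into Z_Q."""
    return int.from_bytes(hashlib.sha256(str(f).encode()).digest(), "big") % Q

def keygen():
    a = secrets.randbelow(Q - 1) + 1
    return a, pow(G, a, P)                # secret a, public h = g^a

def sign(a: int, m: bytes):
    while True:
        b = secrets.randbelow(Q - 1) + 1
        f = pow(G, b, P)                              # f = g^b
        s = pow(b, -1, Q) * (H(m) + a * F(f)) % Q     # b^-1 (H(m) + a F(f))
        if s != 0:
            return f, s

def verify(h: int, m: bytes, sig) -> bool:
    f, s = sig
    # Check sigma != 0 and f^sigma = g^H(m) * h^F(f) mod P.
    return s != 0 and pow(f, s, P) == pow(G, H(m), P) * pow(h, F(f), P) % P
```

Correctness follows since f^σ = g^{bσ} = g^{H(m) + a·F(f)} = g^{H(m)} h^{F(f)}, exactly the verification equation.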

Theorem 10.13 — Random-Oracle Model Security of DSA signatures. Suppose that the discrete logarithm assumption holds for the group 픾. Then the DSA signature scheme over 픾 is secure when H, F are modeled as random oracles.

Proof. Suppose, for the sake of contradiction, that there was a T-time adversary A that succeeds with probability ε in a chosen message attack against the DSA scheme. We will show that there is an adversary that can compute the discrete logarithm with running time and probability polynomially related to T and ε respectively.

Recall that in a chosen message attack in the random oracle model, the adversary interacts with a signature oracle and oracles that compute the functions H and F. For starters, we consider the following experiment CMA′, where in the chosen message attack we replace the signature box with the following "fake signature oracle" and "fake function oracle" F. On input a message m, the fake box will choose σ, r at random in {0, …, p − 1} (where p = |픾|), and compute

f = (g^{H(m)} h^r)^{σ^{−1} mod p}    (10.1)

and output (f, σ). We will then record the value F(f) = r and answer r on future queries to F(f). If we've already answered F(f) before with a different value, then we halt the experiment and output an error.

We claim that the adversary's chance of succeeding in CMA′ is computationally indistinguishable from its chance of succeeding in the original CMA experiment. Indeed, since we choose the value r = F(f) at random, as long as we don't repeat a value f that was queried before, the function F is completely random. But since the adversary makes at most T queries, and each f is chosen according to (10.1), which yields a random element of the group 픾 (which has size roughly 2^n), the probability that f is repeated is at most T²/|픾|, which is negligible. Now we computed σ in the fake box as a random value, but we can also compute σ as equaling b^{−1}(H(m) + a · r) mod p, where b = log_g f mod |픾|

is uniform as well, and so the distribution of the signature (f, σ) is identical to the distribution produced by a real box. Note that we can simulate the result of the CMA′ experiment without access to the value a such that h = g^a.

We now transform an algorithm A′ that manages to forge a signature in the CMA′ experiment into an algorithm that, given 픾, g, g^a, manages to recover a. We let (m*, f*, σ*) be the message and signature that the adversary outputs at the end of a successful attack. We can assume without loss of generality that f* is queried to the F oracle at some point during the attack. (For example, by modifying A′ to make this query just before she outputs the final signature.) So, we split into two cases:

Case I: The value F(f*) is first queried by the signature box.

Case II: The value F(f*) is first queried by the adversary.

If Case I happens with non-negligible probability, then we know that the value f* is queried when producing the signature (f*, σ) for some message m ≠ m*, and so we know the following two equations hold:

g^{H(m)} h^{F(f*)} = (f*)^σ

and

g^{H(m*)} h^{F(f*)} = (f*)^{σ*}

Taking logs we get the following equations on a = log_g h and b = log_g f*:

H(m) + aF(f*) = bσ

and

H(m*) + aF(f*) = bσ*

or

b = (H(m*) − H(m))(σ* − σ)^{−1} mod p.

Since all of the values H(m*), H(m), σ, σ* are known, this means we can compute b, and hence also recover the unknown value a.

If Case II happens, then we split it into two cases as well. Case IIa is that this happens and F(f*) is queried before H(m*) is queried, and Case IIb is that this happens and F(f*) is queried after H(m*) is queried.

We start by considering the setting that Case IIa happens with non-negligible probability ε′. By the averaging argument there are some t < t′ ∈ {1, …, T} such that with probability at least ε′/T², f* is queried by the adversary at the t-th query and m* is queried by the adversary at its t′-th query. We run the CMA′ experiment twice, using the same randomness up until the (t′ − 1)-th query and independent randomness from then onwards. With probability at least (ε′/T²)², both experiments will result in a successful forge, and since f* was

queried before at stage t < t′, we get the following equations

H_1(m*) + aF(f*) = bσ

and

H_2(m*) + aF(f*) = bσ*

where H_1(m*) and H_2(m*) are the answers of H to the query m* in the first and second time we run the experiment. (The answers of F to f* are the same since this happens before the t′-th step.) As before, we can use this to recover a = log_g h.

If Case IIb happens with non-negligible probability ε′ > 0, then again by the averaging argument there are some t < t′ ∈ {1, …, T} such that with probability at least ε′/T², m* is queried by the adversary at the t-th query, and f* is queried by the adversary at its t′-th query. We run the CMA′ experiment twice, using the same randomness up until the (t′ − 1)-th query and independent randomness from then onwards. This time we will get the two equations

H(m*) + aF_1(f*) = bσ

and

H(m*) + aF_2(f*) = bσ*

where F_1(f*) and F_2(f*) are our two answers in the first and second experiment, and now we can use this to learn a = b(σ − σ*)(F_1(f*) − F_2(f*))^{−1}.

The bottom line is that we obtain a probabilistic polynomial time algorithm that on input 픾, g, g^a recovers a with non-negligible probability, hence violating the assumption that the discrete log problem is hard for the group 픾. ■

R Remark 10.14 — Non-random oracle model security. In this lecture both our encryption scheme and digital signature schemes were not proven secure under a well-stated computational assumption but rather used the random oracle model heuristic. However, it is known how to obtain schemes that do not rely on this heuristic, and we will see such schemes later on in this course.

10.4 PUTTING EVERYTHING TOGETHER - SECURITY IN PRACTICE.

Let us discuss briefly how public key cryptography is used to secure web traffic through the SSL/TLS protocol that we all use when we use https:// URLs. The security this achieves is quite amazing. No matter what wired or wireless network you are using, no matter what country you are in, as long as your device (e.g., phone/laptop/etc.) and the server you are talking to (e.g., Google, Amazon, Microsoft, etc.) are functioning properly, you can communicate securely without any party in the middle able to either learn or modify the contents of your interaction.^11

11 They are able to know that such an interaction took place and the amount of bits exchanged. Preventing these kinds of attacks is more subtle, and approaches for solutions are known as steganography and anonymous routing.

In the web setting, there are servers who have public keys, and users who generally don't have such keys. Ideally, as a user, you should already know the public keys of all the entities you communicate with, e.g., amazon.com, google.com, etc. However, how are you going to learn those public keys? The traditional answer was that because they are public, these keys are much easier to communicate, and the servers could even post them as ads in the New York Times. Of course these days everyone reads the Times through nytimes.com, and so this seems like a chicken-and-egg type of problem.

The solution goes back again to the quote of Archimedes: "Give me a fulcrum, and I shall move the world". The idea is that trust can be transitive. Suppose you have a Mac. Then you have already trusted Apple with quite a bit of your personal information, and so you might be fine if this Mac came pre-installed with the Apple public key, which you trust to be authentic. Now, suppose that you want to communicate with Amazon.com. You might not know the correct public key for Amazon, but Apple surely does. So Apple can supply Amazon with a signed message to the effect of

“I Apple certify that the public key of Amazon.com is 30 82 01 0a 02 82 01 01 00 94 9f 2e fd 07 63 33 53 b1 be e5 d4 21 9d 86 43 70 0e b5 7c 45 bb ab d1 ff 1f b1 48 7b a3 4f be c7 9d 0f 5c 0b f1 dc 13 15 b0 10 e3 e3 b6 21 0b 40 b0 a3 ca af cc bf 69 fb 99 b8 7b 22 32 bc 1b 17 72 5b e5 e5 77 2b bd 65 d0 03 00 10 e7 09 04 e5 f2 f5 36 e3 1b 0a 09 fd 4e 1b 5a 1e d7 da 3c 20 18 93 92 e3 a1 bd 0d 03 7c b6 4f 3a a4 e5 e5 ed 19 97 f1 dc ec 9e 9f 0a 5e 2c ae f1 3a e5 5a d4 ca f6 06 cf 24 37 34 d6 fa c4 4c 7e 0e 12 08 a5 c9 dc cd a0 84 89 35 1b ca c6 9e 3c 65 04 32 36 c7 21 07 f4 55 32 75 62 a6 b3 d6 ba e4 63 dc 01 3a 09 18 f5 c7 49 bc 36 37 52 60 23 c2 10 82 7a 60 ec 9d 21 a6 b4 da 44 d7 52 ac c4 2e 3d fe 89 93 d1 ba 7e dc 25 55 46 50 56 3e f0 8e c3 0a aa 68 70 af ec 90 25 2b 56 f6 fb f7 49 15 60 50 c8 b4 c4 78 7a 6b 97 ec cd 27 2e 88 98 92 db 02 03 01 00 01”

Such a message is known as a certificate, and it allows you to extend your trust in Apple to a trust in Amazon. Now when your browser communicates with Amazon, it can request this message, and if it is not present, not continue with the interaction or at least display some warning. Clearly a person in the middle can stop this message from travelling and hence not allow the interaction to continue, but they cannot spoof the message and send a certificate for their own public key, unless they know Apple's secret key. (In today's actual implementation, for various business and other reasons, the trusted keys that come pre-installed in browsers and devices do not belong to Apple or Microsoft but rather to particular companies such as Verisign known as certificate authorities. The security of these certificate authorities' private keys is crucial to the security of the whole protocol, and it has been attacked before.)

Using certificates, we can assume that Bob the user has the public verification key v of Alice the server. Now Alice can send Bob also a public encryption key e, which is authenticated by v and hence guaranteed to be correct.^12 Once Bob knows Alice's public key e, they are in business: he can use that to send an encryption of some private key k, which they can then use for all the rest of their communication.

12 If this key is ephemeral (generated on the spot for this interaction and deleted afterward), then this has the benefit of ensuring the forward secrecy property: even if some entity that is in the habit of recording all communication later finds out Alice's private key, it still will not be able to decrypt the information. In applied crypto circles this property is somewhat misnamed as "perfect forward secrecy" and associated with the Diffie-Hellman key exchange (or its elliptic curve variants), since in those protocols there is not much additional overhead for implementing it (see this blog post). The importance of forward security was emphasized by the discovery of the Heartbleed vulnerability (see this paper) that allowed, via a buffer-overflow attack in OpenSSL, to learn the private key of the server.

This is, at a very high level, the SSL/TLS protocol, but there are many details inside it, including the exact security notions needed from the encryption, how the two parties negotiate which cryptographic algorithm to use, and more. All these issues can and have been used for attacks on this protocol. For two recent discussions see this blog post and this website.

Figure 10.2: When you connect to a webpage protected by SSL/TLS, the browser displays information on the certificate's authenticity.
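To make the certificate idea concrete, here is a minimal sketch in which a "certificate" is simply the CA's signature on the pair (domain, server public key), using a DSA-style signature over a toy group as in Section 10.3.5. All sizes, hash choices, and the names `issue_cert`/`check_cert` are illustrative; real certificates (X.509) carry much more structure.

```python
import hashlib
import secrets

# Toy DSA-style signatures over the order-Q subgroup of Z_P^* (insecure sizes).
P, Q, G = 2039, 1019, 4

def Hq(data: bytes) -> int:
    """Hash into Z_Q (stands in for the random-oracle hash functions)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    a = secrets.randbelow(Q - 1) + 1
    return a, pow(G, a, P)               # (secret key, public key h = g^a)

def sign(a: int, m: bytes):
    while True:
        b = secrets.randbelow(Q - 1) + 1
        f = pow(G, b, P)
        s = pow(b, -1, Q) * (Hq(m) + a * Hq(str(f).encode())) % Q
        if s != 0:
            return f, s

def verify(h: int, m: bytes, sig) -> bool:
    f, s = sig
    return s != 0 and pow(f, s, P) == pow(G, Hq(m), P) * pow(h, Hq(str(f).encode()), P) % P

# A certificate binds a domain name to a server key under the CA's signature.
def issue_cert(ca_secret: int, domain: str, server_pub: int):
    return sign(ca_secret, f"{domain}:{server_pub}".encode())

def check_cert(ca_pub: int, domain: str, server_pub: int, cert) -> bool:
    return verify(ca_pub, f"{domain}:{server_pub}".encode(), cert)
```

A browser shipping only the CA's public key can then authenticate any server key for which a valid certificate is presented.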

Figure 10.3: The cipher and certificate used by "Google.com". Note that Google has a 2048-bit RSA signature key, which it then uses to authenticate an elliptic curve based Diffie-Hellman key exchange protocol to create session keys for the block cipher AES with 128 bit key in Galois Counter Mode.

Figure 10.4: Digital signatures and other forms of electronic signatures are legally binding in many jurisdictions. This is some material from the website of the electronic signing company DocuSign.

Example: Here is the list of certificate authorities that were trusted by default (as of spring 2016) by Mozilla products: Actalis, Amazon, AS Sertifitseer- imiskeskuse (SK), Atos, Autoridad de Certificacion Firmaprofesional, Buypass, CA Disig a.s., Camer- firma, Certicámara S.A., Certigna, Certinomis, certSIGN, China Financial Certification Author- ity (CFCA), China Internet Network Information Center (CNNIC), Chunghwa Telecom Corpora- tion, Comodo, ComSign, Consorci Administració Oberta de Catalunya (Consorci AOC, CATCert), Cybertrust Japan / JCSI, D-TRUST, Deutscher Sparkassen Verlag GmbH (S-TRUST, DSV-Gruppe), DigiCert, DocuSign (OpenTrust/Keynectis), e- tugra, EDICOM, Entrust, GlobalSign, GoDaddy, Government of France (ANSSI, DCSSI), Gov- ernment of Hong Kong (SAR), Hongkong Post, Certizen, Government of Japan, Ministry of Internal Affairs and Communications, Government of Spain, Autoritat de Certificació de la Comunitat Valen- ciana (ACCV), Government of Taiwan, Government Root Certification Authority (GRCA), Government of The Netherlands, PKIoverheid, Government of Turkey, Kamu Sertifikasyon Merkezi (Kamu SM), HARICA, IdenTrust, Izenpe S.A., Microsec e-Szignó CA, NetLock Ltd., PROCERT, QuoVadis, RSA the Security Division of EMC, SECOM Trust Systems Co. Ltd., Start Commercial (StartCom) Ltd., Swisscom (Switzerland) Ltd, SwissSign AG, Symantec / GeoTrust, Symantec / Thawte, Syman- tec / VeriSign, T-Systems International GmbH (Deutsche Telekom), Taiwan-CA Inc. (TWCA), TeliaSonera, Trend Micro, Trustis, Trustwave, Turk- Trust, Unizeto Certum, Visa, Web.com, Wells Fargo Bank N.A., WISeKey, WoSign CA Limited

10.5 APPENDIX: AN ALTERNATIVE PROOF OF THE DENSITY OF PRIMES

I record here an alternative way to show that the fraction of primes in [2^n] is Ω(1/n).^13

13 It might be that the two ways are more or less the same, as if we open up the polynomial (1 − x)^k x^k we get the binomial coefficients.

Lemma 10.15 The probability that a random n-bit number is prime is at least Ω(1/n).

Proof. Let N = 2^n. We need to show that the number of primes between 1 and N is at least Ω(N/log N). Consider the number (2N choose N) = (2N)!/(N!N!). By Stirling's formula we know that log (2N choose N) = (1 − o(1))2N, and in particular N ≤ log (2N choose N) ≤ 2N. Also, by the formula using factorials, all the prime factors of (2N choose N) are between 0 and 2N, and each factor P cannot appear more than k = ⌊log 2N / log P⌋ times. Indeed, for every N, the number of times P appears in the factorization of N! is Σ_i ⌊N/P^i⌋, since we get ⌊N/P⌋ times a factor P in the factorizations of {1, …, N}, ⌊N/P²⌋ times a factor of the form P², etc. Thus, the number of times P appears in the factorization of (2N choose N) = (2N)!/(N!N!) is equal to Σ_i (⌊2N/P^i⌋ − 2⌊N/P^i⌋): a sum of at most k elements (since P^{k+1} > 2N), each of which is either 0 or 1.

Thus, (2N choose N) ≤ Π_{P prime, 1 ≤ P ≤ 2N} P^{⌊log 2N / log P⌋}. Taking logs we get that

N ≤ log (2N choose N) ≤ Σ_{P prime ∈ [2N]} ⌊log 2N / log P⌋ log P ≤ Σ_{P prime ∈ [2N]} log 2N,

establishing that the number of primes in [1, N] is Ω(N/log N). ■

10.6 ADDITIONAL GROUP THEORY EXERCISES AND PROOFS

Below are optional group theory related exercises and proofs meant to help gain an intuition with group theory. Note that in this class, we tend only to talk about finite commutative groups 픾, but there are more general groups:

• For example, the integers ℤ (i.e. infinitely many elements) where the operation is addition form a commutative group: if a, b, c are integers, then a + b = b + a (commutativity), (a + b) + c = a + (b + c) (associativity), a + 0 = a (so 0 is the identity element here; we typically think of the identity as 1, especially when the group operation is multiplication), and a + (−a) = 0 (i.e. for any integer, we are allowed to think of its additive inverse, which is also an integer).

• A non-commutative group (or a non-abelian group) is a group such that there exist a, b ∈ 픾 with a ∗ b ≠ b ∗ a (where ∗ is the group operation). One example (of an infinite, non-commutative group) is the set of 2 × 2 matrices (over the real numbers) which are invertible, where the operation is matrix multiplication. The identity element is the traditional identity matrix, and each matrix has an inverse (and the product of two invertible matrices is still invertible), and matrix multiplication satisfies associativity. However, matrix multiplication here need not satisfy commutativity.

In this class, we restrict ourselves to finite commutative groups to avoid complications with infinite group orders and annoyances with non-commutative operations. For the problems below, assume that a "group" is really a "finite commutative group". Here are five more important groups used in cryptography other than ℤ_p. Recall that groups are given by a set and a binary operation.

• For some푝 prime , , with operation multiplication ℤ mod (Note: the is to distinguish this group from with an 푝 additive operation푝 ℤ and= from {1, …GF , 푝 −.) 1} 푝 • The quadratic푝 residues of : withℤ operation multiplication mod (푝) 2 푝 푝 푝 • , where (productℤ of푄 two= primes){푎 ∶ 푎 ∈ ℤ } • The quadratic residues푝 of :: , where 푛 • Ellipticℤ curve푛 = groups 푝 ⋅ 푞 2 ∗ 푛 푛 푛 ℤ 푄 = {푎 ∶ 푎 ∈ ℤ } 푛 = 푝 ⋅ 푞 For more familiarity with group definitions, you could verify that the first 4 groups satisfy the group axioms. For cryptography, two operations need be efficient for elements in group : * Exponen- tiation: . This is done efficiently using repeated squaring, i.e. generate all the푏 squares up to and then푎, 푏 use the binary픾 represen- tation. *푎, Inverse: 푏 ↦ 푎 . This is done푘 efficiently in by Fermat’s little theorem. −1 mod .2 푝 −1푎 ↦ 푎푝−2 ℤ 10.6.1 Solved exercises:푎 = 푎 푝 Solved Exercise 10.1 Is the set a group if the oper- ation is multiplication mod ? What if the operation is addition mod ? 푆 = {1, 2, 3, 4, 5, 6} 7 ■ 7 Solution: Yes (if multiplication) and no (if addition). To prove that some- thing is a group, we run through the definition of a group. This set is finite, and multiplication (even multiplication mod some number) will satisfy commutativity and associativity. The identity element is because any number times , even mod , is still itself. To find inverses, we can in this case literally find the inverses. mod 1 mod (so the inverse of 1 is ). 7 mod mod mod (so the inverse of is , and from commutativ-1 ∗ 1 ity, the7 inverse = 1 of is7 ). mod 1 1mod2 ∗ 4 mod7 =(so 8 the inverse7 = 1 of is ,7 and the inverse of 2 is 4). mod mod mod4 2(so3 ∗is 5 its own7 inverse; = 15 notice7 =that 1 an element7 can be its own3 inverse,5 even if it is not the5 identity3 6 ∗ 6). The set7 = 36 is not7 a group = 1 if the7 operation6 is addition for many reasons: one way to see this mod mod , but 1is not an element푆

of S, so this group is not closed under its operation (implicit in the definition of a group is the idea that a group's operation must send two group elements to another element within the same set of group elements). ■

Solved Exercise 10.2 What are the generators of the group {1, 2, 3, 4, 5, 6}, where the operation is multiplication mod 7? ■

Solution: 3 and 5. Recall that a generator of a group is an element g such that {g, g², g³, ⋯} is the entire group. We can directly check the elements here: {1, 1², 1³, ⋯} = {1}, so 1 is not a generator. 2 is not a generator because 2³ mod 7 = 8 mod 7 = 1 mod 7, so the set {2, 2², 2³, 2⁴, ⋯} is really the set {2, 4, 1}, which is not the entire group. 3 will be a generator because 3² mod 7 = 9 mod 7 = 2 mod 7, 3³ mod 7 = 27 mod 7 = 6 mod 7, 3⁴ mod 7 = 3 ∗ 6 mod 7 = 18 mod 7 = 4 mod 7, 3⁵ mod 7 = 3 ∗ 4 mod 7 = 12 mod 7 = 5 mod 7, 3⁶ mod 7 = 3 ∗ 5 mod 7 = 15 mod 7 = 1 mod 7, so {3, 3², 3³, 3⁴, 3⁵, 3⁶} = {3, 2, 6, 4, 5, 1}, which are all of the elements. 4 is not a generator because 4³ mod 7 = 64 mod 7 = 1 mod 7, so just like 2, we won't get every element. 5 is a generator because 5² mod 7 = 4 mod 7, 5³ mod 7 = 5 ∗ 4 mod 7 = 20 mod 7 = 6 mod 7, 5⁴ mod 7 = 5 ∗ 6 mod 7 = 30 mod 7 = 2 mod 7, 5⁵ mod 7 = 5 ∗ 2 mod 7 = 10 mod 7 = 3 mod 7, 5⁶ mod 7 = 5 ∗ 3 mod 7 = 15 mod 7 = 1 mod 7, so just like 3, 5 is a generator. 6 is not a generator because 6² mod 7 = 36 mod 7 = 1 mod 7, so just like 2 and 4, the set {6, 6², 6³, ⋯} cannot contain all elements (it will just have 1 and 6). ■

Solved Exercise 10.3 What is the order of every element in the group {1, 2, 3, 4, 5, 6}, where the operation is multiplication mod 7? ■

Solution: The orders (of 1, 2, 3, 4, 5, 6) are 1, 3, 6, 3, 6, 2, respectively. This can be seen from the work of the previous problem, where we test out powers of elements. Notice that all of these orders divide the number of elements in our group. This is not a coincidence, and it is an example of Lagrange's Theorem, which states that the size of every subgroup of a group will divide the order of the group.
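The generator and order computations in these exercises can be checked mechanically. Here is a small Python sketch (the helper name `order` is ours, not from the text):

```python
def order(g, p):
    """Multiplicative order of g in the group {1, ..., p-1} under mult. mod p."""
    x, k = g % p, 1
    while x != 1:
        x = (x * g) % p
        k += 1
    return k

orders = [order(g, 7) for g in range(1, 7)]
generators = [g for g in range(1, 7) if order(g, 7) == 6]

assert orders == [1, 3, 6, 3, 6, 2]       # Exercise 10.3
assert generators == [3, 5]               # Exercise 10.2
assert all(6 % k == 0 for k in orders)    # Lagrange: each order divides |G| = 6
```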
Recall that a subgroup is simply a subset of the group which is a group in its own right and is closed under the operation of the group. ■

Solved Exercise 10.4 Suppose we have some (finite, commutative) group 픾. Prove that the inverse of any element is unique (i.e. prove that if a ∈ 픾 and b, c ∈ 픾 are such that ab = 1 and ac = 1, then b = c). ■

Solution: Suppose that a ∈ 픾 and that b, c ∈ 픾 are such that ab = 1 and ac = 1. Then we know that ab = ac, and then we can apply a^{−1} to both sides (we are guaranteed that a has SOME inverse a^{−1} in the group), and so we have a^{−1}ab = a^{−1}ac, but we know that a^{−1}a = 1 (and we can use associativity of a group), so (1)b = (1)c, so b = c. QED. ■

Solved Exercise 10.5 Suppose we have some (finite, commutative) group 픾. Prove that the identity element is unique (i.e. if ca = c for all c ∈ 픾 and cb = c for all c ∈ 픾, then a = b). ■

Solution: Suppose that ca = c for all c ∈ 픾 and that cb = c for all c ∈ 픾. Then we can say that ca = c = cb (for any c, but we can choose some c in particular; we could have picked c = 1). And then c has some inverse element c^{−1} in the group, so c^{−1}ca = c^{−1}cb, but c^{−1}c = 1, so a = b as desired. QED ■

The next few problems are related to quadratic residues, but these problems are a bit more general (in particular, we are considering some group, and a subgroup consisting of all of the elements of the first group which are squares).

Solved Exercise 10.6 Suppose that 픾 is some (finite, commutative) group, and ℍ is the set defined by ℍ ∶= {h ∈ 픾 ∶ ∃g ∈ 픾, g² = h}. Verify that ℍ is a subgroup of 픾. ■

Solution: To be a subgroup, we need to make sure that ℍ is a group in its own right (in particular, that it contains the identity, that it contains inverses, and that it is closed under multiplication; associativity and commutativity follow because we are within a larger set which satisfies associativity and commutativity). Identity: Well, 1 = 1², so 1 ∈ ℍ, so ℍ has the identity element. Inverses: If h ∈ ℍ, then g² = h for some g ∈ 픾, but g has an inverse g^{−1} in 픾, and we can look at g²(g^{−1})² = (gg^{−1})² = 1² = 1 (where I used commutativity and associativity, as well as the definition of the inverse). It is clear that (g^{−1})² ∈ ℍ because there exists an element in 픾 (specifically, g^{−1}) whose square is (g^{−1})². Therefore h = g² has an inverse in ℍ, namely h^{−1} = (g^{−1})². Closure under operation: If h_1, h_2 ∈ ℍ, then there exist g_1, g_2 ∈ 픾 where h_1 = (g_1)², h_2 = (g_2)². So h_1h_2 = (g_1)²(g_2)² = (g_1g_2)², so h_1h_2 ∈ ℍ. Therefore, ℍ is a subgroup of 픾. ■

Solved Exercise 10.7 Assume that |픾| is an even number and is known, and that g^{|픾|} = 1 for any g ∈ 픾. Also assume that 픾 is a cyclic group, i.e. there is some f ∈ 픾 such that any element g ∈ 픾 can be written as f^k for some integer k. Also assume that exponentiation is efficient in this context (i.e. we can compute g^r for any g ∈ 픾 in an efficient time for any 0 ≤ r ≤ |픾|).

Under the assumptions stated above, prove that there is an efficient way to check if some element of 픾 is also an element of ℍ, where ℍ is still the subgroup of squares of elements of 픾 (note: running through all possible elements of 픾 may not be efficient, so this cannot be your strategy). ■

Solution: Suppose that we receive some element g ∈ 픾. We want to know if there exists some g′ ∈ 픾 such that g = (g′)² (this is equivalent to g being in ℍ). To do so, compute g^{|픾|/2}. I claim that g ∈ ℍ if and only if g^{|픾|/2} = 1.

(Proving the if): If g ∈ ℍ, then g = (g′)² for some g′ ∈ 픾. We then have that g^{|픾|/2} = ((g′)²)^{|픾|/2} = (g′)^{|픾|}. But from our assumption, an element raised to the order of the group is 1, so (g′)^{|픾|} = 1, and hence g^{|픾|/2} = 1. As a result, if g ∈ ℍ, then g^{|픾|/2} = 1.

(Proving the only if): Now suppose that g^{|픾|/2} = 1. At this point, we use the fact that 픾 is cyclic, so let f ∈ 픾 be the generator of 픾. We know that g is some power of f, and this power is either even or odd. If the power is even, we are done. If the power is odd, then g = f^{2k+1} for some natural number k. And then we see 1 = g^{|픾|/2} = (f^{2k+1})^{|픾|/2} = f^{k|픾|} f^{|픾|/2}. We can use the assumption that any element raised to its group's order is 1, so 1 = g^{|픾|/2} = f^{k|픾|} f^{|픾|/2} = f^{|픾|/2}. This tells us that the order of f is at most |픾|/2, but this is a contradiction because f is a generator of 픾, so its order cannot be less than |픾| (if it were, then looking at {f, f², f³, ⋯}, we would only count at most half of the elements 1, f, f², ⋯ before cycling back to 1, so this set wouldn't contain all

of 픾). As a result, we have reached a contradiction, so g^{|픾|/2} = 1 means that g = f^{2k} = (f^k)², so g ∈ ℍ.

We are given that this exponentiation is efficient, so checking g^{|픾|/2} == 1 is an efficient and correct way to test if g ∈ ℍ. QED. This proof idea came from here as well as from the 2/25/20 lecture at Harvard given by MIT professor Yael Kalai.

Commentary on assumptions and proof: Proving that g^{|픾|} = 1 is a useful exercise in its own right, but it overlaps somewhat with our problem sets of 2020, so we will not prove it here; observe that if 픾 is the set {1, 2, 3, ⋯ , p − 1} for some prime p, then this is a special case of Fermat's Little Theorem, which states that a^{p−1} = 1 (mod p) for a ∈ {1, 2, 3, ⋯ , p − 1}. Also, one can prove that Z_p (the set of numbers 0, 1, 2, ⋯ , p − 1, with operation multiplication mod p) for a prime p is cyclic, where one method can be found here, where the proof comes down to factorizing certain polynomials and decomposing numbers in terms of prime powers. We can then see that this proof above says that there is an efficient way to test whether an element of Z_p is a square or not. ■

11 Concrete candidates for public key crypto

In the previous lecture we talked about public key cryptography and saw the Diffie-Hellman system and the DSA signature scheme. In this lecture, we will see the RSA trapdoor function and how to use it for both encryption and signatures.

11.1 SOME NUMBER THEORY.

(See Shoup's excellent and freely available book for extensive coverage of these and many other topics.)

For every number m, we define ℤ_m to be the set {0, … , m − 1} with the addition and multiplication operations modulo m. When two elements are in ℤ_m then we will always assume that all operations are done modulo m unless stated otherwise. We let ℤ*_m = {a ∈ ℤ_m ∶ gcd(a, m) = 1}. Note that m is prime if and only if |ℤ*_m| = m − 1. For every a ∈ ℤ*_m we can find, using the extended gcd algorithm, an element b (typically denoted as a^{−1}) such that ab = 1 (can you see why?). The set ℤ*_m is an abelian group with the multiplication operation, and hence by the observations of the previous lecture, a^{|ℤ*_m|} = 1 for every a ∈ ℤ*_m. In the case that m is prime, this result is known as "Fermat's Little Theorem" and is typically stated as a^{p−1} = 1 (mod p) for every a ≠ 0.

R Remark 11.1 — Note on n bits vs a number n. One aspect that is often confusing in number-theoretic based cryptography is that one needs to always keep track whether we are talking about "big" numbers or "small" numbers. In many cases in crypto, we use n to talk about our key size or security parameter, in which case we think of n as a "small" number of size 100 − 1000 or so. However, when we work with ℤ*_m we often think of m as a "big" number having about 100 − 1000 digits; that is, m would be roughly 2^100 to 2^1000 or so. I will try to reserve the notation n


for "small" numbers but may sometimes forget to do so, and other descriptions of RSA etc. often use n for "big" numbers. It is important that whenever you see a number x, you make sure you have a sense whether it is a "small" number (in which case poly(x) time is considered efficient) or whether it is a "large" number (in which case only poly(log(x)) time would be considered efficient).

R Remark 11.2 — The number m vs the message m. In much of this course we use m to denote a string which is our plaintext message to be encrypted or authenticated. In the context of integer factoring, it is convenient to use m = pq as the composite number that is to be factored. To keep things interesting (or more honestly, because I keep running out of letters) in this lecture we will have both usages of m (though hopefully not in the same theorem or definition!). When we talk about factoring, RSA, and Rabin, then we will use m as the composite number, while in the context of the abstract trapdoor-permutation based encryption and signatures we will use m for the message. When you see an instance of m, make sure you understand what is its usage.
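The extended gcd computation of a^{−1} mentioned above can be sketched in a few lines of Python (the function names are ours; recent Python can also compute `pow(a, -1, m)` directly):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g == s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def inverse(a, m):
    """The element b with a*b = 1 in Z*_m; requires gcd(a, m) = 1."""
    g, s, _ = extended_gcd(a, m)
    assert g == 1, "a is not in Z*_m"
    return s % m

m = 35  # = 5 * 7, not prime, so Fermat's little theorem would not apply
assert all((a * inverse(a, m)) % m == 1
           for a in range(1, m) if extended_gcd(a, m)[0] == 1)
```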

11.1.1 Primality testing
One procedure we often need is to find a prime of n bits. The typical way people do it is by choosing a random n-bit number p, and testing whether it is prime. We showed in the previous lecture that a random n bit number is prime with probability at least Ω(1/n) (in fact, by the Prime Number Theorem, the probability is (1 ± o(1))/(n ln 2)). We now discuss how we can test for primality.
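The outer "sample until prime" loop looks as follows; this is a minimal sketch in which trial division stands in for a real primality test (so it is only suitable for tiny bit-lengths), and the function names are ours:

```python
import random

def is_prime(m):
    """Trial division -- only suitable for the tiny bit-lengths used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def random_prime(n, rng=random):
    """Sample uniform n-bit numbers until one is prime; by the prime
    number theorem this takes O(n) attempts in expectation."""
    while True:
        p = rng.randrange(2 ** (n - 1), 2 ** n)
        if is_prime(p):
            return p

random.seed(1)
p = random_prime(16)
assert p.bit_length() == 16 and is_prime(p)
```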

Theorem 11.3 — Primality Testing. There is a poly(n)-time algorithm to test whether a given n-bit number is prime or composite.

Theorem 11.3 was first shown in the 1970's by Solovay, Strassen, Miller and Rabin via a probabilistic algorithm (that can make a mistake with probability exponentially small in the number of coins it uses), and in a 2002 breakthrough, Agrawal, Kayal, and Saxena gave a deterministic polynomial time algorithm for the same problem.

Lemma 11.4 There is a probabilistic polynomial time algorithm A that on input a number m, if m is prime outputs YES with probability 1 and if m is not even a "pseudoprime" it outputs NO with probability

at least 1/2. (The definition of "pseudo-prime" will be clarified in the proof below.)

Proof. The algorithm is very simple and is based on Fermat's Little Theorem: on input m, pick a random a ∈ {2, … , m − 1}, and if gcd(a, m) ≠ 1 or a^{m−1} ≠ 1 (mod m) return NO and otherwise return YES.

By Fermat's little theorem, the algorithm will always return YES on a prime m. We define a "pseudoprime" to be a non-prime number m such that a^{m−1} = 1 (mod m) for all a such that gcd(a, m) = 1. If m is not a pseudoprime then the set S = {a ∈ ℤ*_m ∶ a^{m−1} = 1} is a strict subset of ℤ*_m. But it is easy to see that S is a group and hence |S| must divide |ℤ*_m|, and hence in particular it must be the case that |S| ≤ |ℤ*_m|/2, and so with probability at least 1/2 the algorithm will output NO. ■
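One round of this test is a one-liner; note in the sketch below how the Carmichael number 561 = 3 · 11 · 17 fools it for a coprime to 561 (the name `fermat_round` is ours):

```python
from math import gcd

def fermat_round(m, a):
    """One round of the Lemma 11.4 algorithm with candidate witness a:
    False means "NO" (m is certainly composite), True means "YES"."""
    return gcd(a, m) == 1 and pow(a, m - 1, m) == 1

assert all(fermat_round(17, a) for a in range(2, 17))  # a prime always passes
assert not fermat_round(15, 2)    # 2^14 = 4 (mod 15), so 15 is caught
assert fermat_round(561, 2)       # but the pseudoprime 561 passes for a = 2
```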

Lemma 11.4 on its own might not seem very meaningful since it's not clear how many pseudoprimes there are. However, it turns out these pseudoprimes, also known as "Carmichael numbers", are much less prevalent than the primes: specifically, there are about N ⋅ 2^{−Θ(log N/ log log log N)} pseudoprimes between 1 and N. If we choose a random number m ∈ [2^n] and output it if and only if the algorithm of Lemma 11.4 outputs YES (otherwise resampling), then the probability we make a mistake and output a pseudoprime is equal to the ratio of the set of pseudoprimes in [2^n] to the set of primes in [2^n].

Since there are Ω(2^n/n) primes in [2^n], this ratio is 2^{−Ω(n/ log n)}, which is a negligible quantity. Moreover, as mentioned above, there are better algorithms that succeed for all numbers.

In contrast to testing if a number is prime or composite, there is no known efficient algorithm to actually find the factorization of a composite number. The best known algorithms run in time roughly 2^{Õ(n^{1/3})} where n is the number of bits.

11.1.2 Fields
If p is a prime then ℤ_p is a field, which means it is closed under addition and multiplication and has 0 and 1 elements. One property of a field is the following:

Theorem 11.5 — Fundamental Theorem of Algebra, mod p version. If f is a nonzero polynomial of degree d over ℤ_p then there are at most d distinct inputs x such that f(x) = 0.

(If you're curious why, you can see that the task of, given x_1, … , x_{d+1}, finding the coefficients for a polynomial vanishing on


the x_i's amounts to solving a linear system in d + 1 variables with d + 1 equations that are independent due to the non-singularity of the Vandermonde matrix.)

In particular every x ∈ ℤ_p has at most two square roots (numbers s such that s² = x mod p). In fact, just like over the reals, every x ∈ ℤ_p either has no square roots or exactly two square roots of the form ±s.

We can efficiently find square roots modulo a prime. In fact, the following result is known:

Theorem 11.6 — Finding roots. There is a probabilistic poly(log p, d) time algorithm to find the roots of a degree d polynomial over ℤ_p.

This is a special case of the problem of factoring polynomials over finite fields, shown in 1967 by Berlekamp and on which much other work has been done; see Chapter 20 in Shoup.
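The "no roots or exactly two roots ±s" fact can be checked by brute force for a small prime (0 is the one exception, with the single root 0):

```python
p = 11
two_rooted = 0
for x in range(1, p):
    roots = [s for s in range(p) if (s * s) % p == x]
    assert len(roots) in (0, 2)          # nonzero x has 0 or 2 square roots
    if roots:
        a, b = roots
        assert (a + b) % p == 0          # the roots come in a +/- s pair
        two_rooted += 1
assert two_rooted == (p - 1) // 2        # exactly half the nonzero elements
```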

11.1.3 Chinese remainder theorem
Suppose that m = pq is a product of two primes. In this case Z*_m does not contain all the numbers from 1 to m − 1. Indeed, all the numbers of the form p, 2p, 3p, … , (q − 1)p and q, 2q, … , (p − 1)q will have non-trivial g.c.d. with m. There are exactly q − 1 + p − 1 such numbers (because p and q are prime, all the numbers of the forms above are distinct). Hence |Z*_m| = m − 1 − (p − 1) − (q − 1) = pq − p − q + 1 = (p − 1)(q − 1).

Note that |Z*_m| = |ℤ*_p| ⋅ |ℤ*_q|. It turns out this is no accident:

Theorem 11.7 — Chinese Remainder Theorem (CRT). If m = pq then there is an isomorphism φ ∶ ℤ*_m → ℤ*_p × ℤ*_q. That is, φ is one to one and onto and maps x ∈ ℤ*_m into a pair (φ_1(x), φ_2(x)) ∈ ℤ*_p × ℤ*_q such that for every x, y ∈ ℤ*_m:

• φ_1(x + y) = φ_1(x) + φ_1(y) (mod p)
• φ_2(x + y) = φ_2(x) + φ_2(y) (mod q)
• φ_1(x ⋅ y) = φ_1(x) ⋅ φ_1(y) (mod p)
• φ_2(x ⋅ y) = φ_2(x) ⋅ φ_2(y) (mod q)

Proof. φ simply maps x ∈ ℤ*_m to the pair (x mod p, x mod q). Verifying that it satisfies all desired properties is a good exercise. QED ■
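The map φ and its inverse are easy to write down explicitly; here is a sketch for tiny p and q (the inverse-map coefficients are found by brute force for clarity — in practice one would use the extended gcd):

```python
p, q = 5, 7
m = p * q

def phi(x):
    """The CRT isomorphism, sending x to (x mod p, x mod q)."""
    return (x % p, x % q)

# coefficients for the inverse map: u = p^{-1} mod q, v = q^{-1} mod p
u = next(u for u in range(q) if (u * p) % q == 1)
v = next(v for v in range(p) if (v * q) % p == 1)

def phi_inv(xp, xq):
    return (xp * v * q + xq * u * p) % m

for x in range(m):
    assert phi_inv(*phi(x)) == x                     # one to one and onto
for x in range(1, m):
    for y in range(1, m):
        xp, xq = phi(x)
        yp, yq = phi(y)
        # multiplication (and addition) act coordinate-wise
        assert phi((x * y) % m) == ((xp * yp) % p, (xq * yq) % q)
```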

In particular, for every polynomial f() and x ∈ ℤ*_m, f(x) = 0 (mod m) iff f(x) = 0 (mod p) and f(x) = 0 (mod q). Therefore finding the roots of a polynomial f() modulo a composite m is easy if you know m's factorization. However, if you don't know the factorization then this is hard. In particular, extracting square roots is as hard as finding out the factors:

Theorem 11.8 — Square root extraction implies factoring. Suppose m = pq and there is an efficient algorithm A such that for every m ∈ ℕ and a ∈ ℤ*_m, A(m, a² (mod m)) = b such that a² = b² (mod m). Then, there is an efficient algorithm to recover p, q from m.

Proof. Suppose that there is such an algorithm A. Using the CRT we can define f ∶ ℤ*_p × ℤ*_q → ℤ*_p × ℤ*_q as f(x, y) = φ(A(m, φ^{−1}(x, y)²)) for all x ∈ ℤ*_p and y ∈ ℤ*_q. Now, for any x, y let (x′, y′) = f(x, y). Since x′² = x² (mod p) and y′² = y² (mod q), we know that x′ ∈ {±x} and y′ ∈ {±y}. Since flipping signs doesn't change the value of f(x, y) (the square is the same), by flipping one or both of the signs of x or y we can ensure that x′ = x and y′ = −y. Hence (x, y) − (x′, y′) = (0, 2y). In other words, if c = φ^{−1}(x − x′, y − y′) then c = 0 (mod p) but c ≠ 0 (mod q), which in particular means that the greatest common divisor of c and m is p. So, by taking gcd(a − A(m, a²), m) for a = φ^{−1}(x, y) we will find p, from which we can find q = m/p.

This almost works, but there is a question of how can we find φ^{−1}(x, y), given that we don't know p and q? The crucial observation is that we don't need to. We can simply pick a value a at random in {1, … , m}. With very high probability (namely 1 − (p + q − 1)/pq) a will be in ℤ*_m, and so we can imagine this process as equivalent to the process of taking a random x ∈ ℤ*_p, a random y ∈ ℤ*_q and then flipping the signs of x and y randomly and taking a = φ^{−1}(x, y). By the arguments above, with probability at least 1/4 it will hold that gcd(a − A(m, a²), m) will equal p. ■

Note that this argument generalizes to work even if the algorithm A is an average case algorithm that only succeeds in finding a square root for a significant fraction of the inputs. This observation is crucial for cryptographic applications.
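The reduction can be seen concretely with a toy "adversarial" square-root oracle that always returns a root whose sign mod q is flipped — exactly the case exploited in the proof. All names and parameters below are illustrative, and the oracle secretly uses the factorization (which is fine: the point is only that any valid oracle leaks a factor):

```python
from math import gcd

p, q = 11, 17          # the secret factors
m = p * q

def crt(xp, xq):
    # the unique z in Z_m with z = xp (mod p) and z = xq (mod q)
    return next(z for z in range(m) if z % p == xp and z % q == xq)

def sqrt_oracle(m, s):
    """A valid square-root oracle: given s = a^2 mod m it returns some b
    with b^2 = s mod m -- here, the smallest root with its sign flipped
    mod q."""
    a = next(x for x in range(1, m) if (x * x) % m == s)
    return crt(a % p, (-a) % q)

a = 13                                 # our element of Z*_m
b = sqrt_oracle(m, (a * a) % m)
assert (b * b) % m == (a * a) % m      # b really is a square root of a^2
assert b not in (a, m - a)             # ... but not +/- a
assert gcd(a - b, m) == p              # so gcd(a - b, m) factors m
```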

11.1.4 The RSA and Rabin functions We are now ready to describe the RSA and Rabin trapdoor functions:

Definition 11.9 — RSA function. Given a number m = pq and e such that gcd((p − 1)(q − 1), e) = 1, the RSA function w.r.t m and e is the map f_{m,e} ∶ ℤ*_m → ℤ*_m such that RSA_{m,e}(x) = x^e (mod m).

Definition 11.10 — Rabin function. Given a number m = pq, the Rabin function w.r.t. m is the map Rabin_m ∶ ℤ*_m → ℤ*_m such that Rabin_m(x) = x² (mod m).

Note that both maps can be computed in polynomial time. Using the Chinese Remainder Theorem and Theorem 11.6, we know that both functions can be inverted efficiently if we know the factorization.¹ However Theorem 11.6 is much too big of a hammer to invert the RSA and Rabin functions, and there are direct and simple inversion algorithms (see homework exercises). By Theorem 11.8, inverting the Rabin function amounts to factoring m. No such result is known for the RSA function, but there is no better algorithm known to attack it than proceeding via factorization of m. The RSA function has the advantage that it is a permutation over ℤ*_m:

¹ Using Theorem 11.6 to invert the function requires e to be not too large. However, as we will see below, it turns out that using the factorization we can invert the RSA function for every e. Also, in practice people often use a small value for e (sometimes as small as e = 3) for reasons of efficiency.

Lemma 11.11 RSA_{m,e} is one to one over ℤ*_m.

Proof. Suppose that RSA_{m,e}(a) = RSA_{m,e}(a′). By the CRT, it means that there is (x, y) ≠ (x′, y′) ∈ ℤ*_p × ℤ*_q such that x^e = x′^e (mod p) and y^e = y′^e (mod q). But if that's the case we get that (xx′^{−1})^e = 1 (mod p) and (yy′^{−1})^e = 1 (mod q). But this means that e has to be a multiple of the order of xx′^{−1} and yy′^{−1} (at least one of which is not 1 and hence has order > 1). But since the order always divides the group size, this implies that e has to have non-trivial gcd with either |ℤ*_p| or |ℤ*_q| and hence with (p − 1)(q − 1). ■
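The "direct and simple" inversion for RSA (left to the homework in the text) boils down to exponentiation by d = e^{−1} mod (p−1)(q−1): since a^{(p−1)(q−1)} = 1 in ℤ*_m, we get (x^e)^d = x. A sketch with toy parameters (`pow(e, -1, n)` needs Python 3.8+):

```python
from math import gcd

p, q, e = 11, 17, 3
m = p * q
assert gcd((p - 1) * (q - 1), e) == 1   # the condition of Definition 11.9

d = pow(e, -1, (p - 1) * (q - 1))       # the trapdoor exponent
# exponentiation by d inverts the RSA map (here checked on all of Z_m)
assert all(pow(pow(x, e, m), d, m) == x for x in range(m))
```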

R Remark 11.12 — Plain/Textbook RSA. The RSA trapdoor function is known also as "plain" or "textbook" RSA encryption. This is because initially Diffie and Hellman (and following them, RSA) thought of an encryption scheme as a deterministic procedure and so considered simply encrypting a message x by applying RSA_{m,e}(x). Today however we know that it is insecure to use a trapdoor function directly as an encryption scheme without adding some randomization.

11.1.5 Abstraction: trapdoor permutations We can abstract away the particular construction of the RSA and Rabin functions to talk about a general trapdoor permutation family. We make the following definition

Definition 11.13 — Trapdoor permutation. A trapdoor permutation family (TDP) is a family of functions {p_k} such that for every k ∈ {0, 1}^n, the function p_k is a permutation on {0, 1}^n and:

• There is a key generation algorithm G such that on input 1^n it outputs a pair (k, τ) such that the maps k, x ↦ p_k(x) and τ, y ↦ p_k^{−1}(y) are efficiently computable.

• For every efficient adversary A, Pr_{(k,τ)←_R G(1^n), y←_R {0,1}^n}[A(k, y) = p_k^{−1}(y)] < negl(n).

R Remark 11.14 — Domain of permutations. The RSA function is not a permutation over the set of all strings but rather over ℤ*_m for some m = pq. However, if we find primes p, q in the interval [2^{n/2}(1 − negl(n)), 2^{n/2}], then m will be in the interval [2^n(1 − negl(n)), 2^n] and hence ℤ*_m (which has size pq − p − q + 1 = 2^n(1 − negl(n))) can be thought of as essentially identical to {0, 1}^n, since we will always pick elements from {0, 1}^n at random and hence they will be in ℤ*_m with probability 1 − negl(n). It is widely believed that for every sufficiently large n there is a prime in the interval [2^n − poly(n), 2^n] (this follows from the Extended Riemann Hypothesis), and Baker, Harman and Pintz proved that there is a prime in the interval [2^n − 2^{0.6n}, 2^n].²

² Another, more minor issue is that the description of the key might not have the same length as log m; I defined them to be the same for simplicity of notation, and this can be ensured via some padding and concatenation tricks.

11.1.6 Public key encryption from trapdoor permutations
Here is how we can get a public key encryption from a trapdoor permutation scheme {p_k}.

TDP-based public key encryption (TDPENC):

• Key generation: Run the key generation algorithm G of the TDP to get (k, τ). k is the public encryption key and τ is the secret decryption key.

• Encryption: To encrypt a message m ∈ {0, 1}^ℓ with key k, choose x ←_R {0, 1}^n and output (p_k(x), H(x) ⊕ m) where H ∶ {0, 1}^n → {0, 1}^ℓ is a hash function we model as a random oracle.

• Decryption: To decrypt the ciphertext (y, z) with key τ, output m = H(p_k^{−1}(y)) ⊕ z.

P Please verify that you understand why TDPENC is a valid encryption scheme, in the sense that decryption of an encryption of m yields m.

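A toy end-to-end sketch of TDPENC, using the RSA function as the TDP and SHA-256 as a stand-in for the random oracle H. The parameters are far too small for real security and all names (`H`, `encrypt`, `decrypt`, `ELL`) are ours:

```python
import hashlib

p, q, e = 11, 17, 3                    # toy RSA trapdoor permutation p_k
m = p * q
d = pow(e, -1, (p - 1) * (q - 1))      # the secret trapdoor (tau)

ELL = 16                               # message length ell, in bytes

def H(x):
    """Random-oracle stand-in: hash x in Z_m to ELL bytes."""
    return hashlib.sha256(str(x).encode()).digest()[:ELL]

def xor(a, b):
    return bytes(s ^ t for s, t in zip(a, b))

def encrypt(msg, x):
    """Encrypt an ELL-byte msg with randomness x: (p_k(x), H(x) XOR msg)."""
    return pow(x, e, m), xor(H(x), msg)

def decrypt(y, z):
    """Decrypt with the trapdoor: H(p_k^{-1}(y)) XOR z."""
    return xor(H(pow(y, d, m)), z)

msg = b"attack at dawn!!"
assert decrypt(*encrypt(msg, 42)) == msg
```

A real instantiation would of course use full-size RSA parameters and sample the randomness x freshly for every encryption.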

Theorem 11.15 — Public key encryption from trapdoor permutations. If {p_k} is a secure TDP and H is a random oracle then TDPENC is a CPA secure public key encryption scheme.

Proof. Suppose, towards the sake of contradiction, that there is a polynomial-size adversary A that succeeds in the CPA game of TDPENC (with access to a random oracle H) with non-negligible advantage ε over half. We will use A to design an algorithm I that inverts the trapdoor permutation.

Recall that the CPA game works as follows:

• The adversary A gets as input a key k ∈ {0, 1}^n.

• The algorithm A makes some polynomial amount of computation and T_1 = poly(n) queries to the random oracle H and produces a pair of messages m_0, m_1 ∈ {0, 1}^ℓ.

• The "challenger" chooses b* ←_R {0, 1}, chooses x* ←_R {0, 1}^n and computes the ciphertext (y* = p_k(x*), z* = H(x*) ⊕ m_{b*}) which is an encryption of m_{b*}.

• The adversary A gets (y*, z*) as input, makes some additional polynomial amount of computation and T_2 = poly(n) queries to H, and then outputs b.

• The adversary wins if b = b*.

We make the following claim:

CLAIM: With probability at least ε, the adversary A will make the query x* to the random oracle.

PROOF: Suppose otherwise. We will prove the claim using the "forgetful gnome" technique as used in the Boneh Shoup book. By the "lazy evaluation" paradigm, we can imagine that queries to H are answered by a "faithful gnome" that, whenever presented with a new query x, chooses a uniform and independent value w ←_R {0, 1}^ℓ as a response, and then records that H(x) = w to use as the answer for future queries.

Now consider the experiment where in the challenge part we use a "forgetful gnome" that answers H(x*) by a uniform and independent string w* ←_R {0, 1}^ℓ and does not record the answer for future queries. In the "forgetful experiment", the second component of the ciphertext z* = w* ⊕ m_{b*} is distributed uniformly in {0, 1}^ℓ and independently from all other random choices, regardless of whether b* = 0 or b* = 1. Hence in this "forgetful experiment" the adversary gets no information about b* and its probability of winning is at most 1/2. But the forgetful experiment is identical to the actual experiment if the

value x* is only queried to H once. Apart from the query of x* by the challenger, all other queries to H are made by the adversary. Under our assumption, the adversary makes the query x* with probability at most ε, and conditioned on this not happening the two experiments are identical. Since the probability of winning in the forgetful experiment is at most 1/2, the probability of winning in the overall experiment is less than 1/2 + ε, thus yielding a contradiction and establishing the claim. (These kinds of analyses on sample spaces can be confusing; see Fig. 11.1 for a graphical illustration of this argument.)

Given the claim, we can now construct our inverter algorithm I as follows:

• The input to I is the key k to the trapdoor permutation and y* = p_k(x*). The goal of I is to output x*.

• The inverter simulates the adversary A in a CPA attack, answering all its queries to the oracle H by random values if they are new or the previously supplied answers if they were asked before. Whenever the adversary makes a query x to H, I checks if p_k(x) = y* and if so halts and outputs x.

• When the time comes to produce the challenge, the inverter I chooses z* at random and provides the adversary with (y*, z*).³

• The inverter continues the simulation, again halting and outputting x if the adversary makes a query x such that p_k(x) = y* to H.

³ It would have been equivalent to answer the adversary with a uniformly chosen z* in {0, 1}^ℓ, can you see why?

We claim that up to the point we halt, the experiment is identical to the actual attack. Indeed, since p_k is a permutation, we know that if the time came to produce the challenge and we have not halted, then the query x* has not been made yet to H. Therefore we are free to choose an independent random value w* as the value H(x*). (Our inverter does not know what the value x* is, but this does not matter for this argument: can you see why?)
Therefore, since by the claim the adversary will make the query x* to H with probability at least ε, our inverter will succeed with the same probability. ■

Figure 11.1: In the proof of security of TDPENC, we show that if the assumption of the claim is violated, the "forgetful experiment" is identical to the real experiment with probability larger than 1 − ε. In such a case, even if all that probability mass was on the points in the sample space where the adversary in the forgetful experiment will lose and the adversary of the real experiment will win, the probability of winning in the latter experiment would still be less than 1/2 + ε.

P This proof of Theorem 11.15 is not very long but it is somewhat subtle. Please re-read it and make sure you understand it. I also recommend you look at the version of the same proof in Boneh Shoup: Theorem 11.2 in Section 11.4 ("Encryption based on a trapdoor function scheme").


R Remark 11.16 — Security without random oracles. We do not need to use a random oracle to get security in this scheme, especially if ℓ is sufficiently short. We can replace H() with a hash function of specific properties known as a hard core construction; this was first shown by Goldreich and Levin.

11.1.7 Digital signatures from trapdoor permutations
Here is how we can get digital signatures from trapdoor permutations {p_k}. This construction is known as "full domain hash" signatures.

Full domain hash signatures (FDHSIG):

• Key generation: Run the key generation algorithm G of the TDP to get (k, τ). k is the public verification key and τ is the secret signing key.

• Signing: To sign a message m with key τ, we output p_k^{−1}(H(m)) where H ∶ {0, 1}^* → {0, 1}^n is a hash function modeled as a random oracle.

• Verification: To verify a message-signature pair (m, x) we check that p_k(x) = H(m).
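A toy sketch of FDHSIG, again with the RSA function standing in for the trapdoor permutation and SHA-256 standing in for the random oracle (tiny illustrative parameters; the names are ours):

```python
import hashlib

p, q, e = 11, 17, 3                  # toy RSA trapdoor permutation p_k
m = p * q
d = pow(e, -1, (p - 1) * (q - 1))    # secret signing key (the trapdoor)

def H(msg):
    """Random-oracle stand-in, hashing arbitrary bytes into Z_m."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % m

def sign(msg):                       # uses the trapdoor: p_k^{-1}(H(m))
    return pow(H(msg), d, m)

def verify(msg, x):                  # public check: p_k(x) = H(m)
    return pow(x, e, m) == H(msg)

sig = sign(b"hello")
assert verify(b"hello", sig)
assert not verify(b"hello", (sig + 1) % m)  # any other value fails
```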

We now prove the security of full domain hash:

Theorem 11.17 — Full domain hash security. If {p_k} is a secure TDP and H is a random oracle then FDHSIG is a chosen message attack secure digital signature scheme.

Proof. Suppose towards the sake of contradiction that there is a polynomial-sized adversary A that succeeds in a chosen message attack with non-negligible probability ε > 0. We will construct an inverter I for the trapdoor permutation collection that succeeds with non-negligible probability as well.

Recall that in a chosen message attack the adversary makes T′ queries m′_1, … , m′_{T′} to its signing box which are interspersed with T queries m_1, … , m_T to the random oracle H. We can assume without loss of generality (by modifying the adversary and at most doubling the number of queries) that the adversary always queries the message m′_i to the random oracle before it queries it to the signing box, though it can also make additional queries to the random oracle (and hence in particular T ≥ T′). At the end of the attack the adversary outputs with probability ε a pair (x*, m*) such that m* was not queried to the signing box and p_k(x*) = H(m*).

Our inverter I works as follows:

• Input: k and y* = p_k(x*). Goal is to output x*.

• I will guess at random t* which is the step in which the adversary will query to H the message m* that it is eventually going to forge on. With probability 1/T the guess will be correct.

• I simulates the execution of A. Except for step t*, whenever A makes a new query m to the random oracle, I will choose a random x ←_R {0, 1}^n, compute y = p_k(x) and designate H(m) = y. In step t*, when the adversary makes the query m*, the inverter will return H(m*) = y*. I will record the values (x, y), and so in particular I will always know p_k^{−1}(H(m)) for every value H(m) ≠ y* that it returned as an answer from its oracle on query m.

• When A makes the query m to the signature box, then since m was queried before to H, if m ≠ m* then I returns x = p_k^{−1}(H(m)) using its records. If m = m* then I halts and outputs "failure".

• At the end of the game, the adversary A outputs (m*, x*). If p_k(x*) = y* then I outputs x*.

We claim that, conditioned on the probability ≥ ε/T event that the adversary is successful and the final message m* is the one queried in step t*, we provide a perfect simulation of the actual game. Indeed, while in an actual game the value y = H(m) will be chosen independently at random in {0, 1}^n, this is equivalent to choosing x ←_R {0, 1}^n and letting y = p_k(x). After all, a permutation applied to the uniform distribution is uniform.

Therefore with probability at least ε/T the inverter I will output x* such that p_k(x*) = y*, hence succeeding in the inversion. ■

P
Once again, this proof is somewhat subtle. I recommend you also read the version of this proof in Section 13.4 of Boneh-Shoup.

R
Remark 11.18 — Hash and sign. There is another reason to use hash functions with signatures. By combining a collision-resistant hash function $h \colon \{0,1\}^* \to \{0,1\}^\ell$ with a signature scheme $(S, V)$ for $\ell$-length messages, we can obtain a signature scheme for arbitrary length messages by defining $S'_s(m) = S_s(h(m))$ and $V'_v(m, \sigma) = V_v(h(m), \sigma)$.
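The hash-and-sign wiring can be sketched in a few lines. This is only an illustration of the composition, not a secure scheme: the fixed-length "signature scheme" $(S, V)$ below is a hypothetical stand-in (a keyed hash, which is not a real signature scheme), with SHA-256 playing the role of the collision-resistant hash $h$.

```python
import hashlib

# Toy stand-in for a fixed-length signature scheme (S, V): we "sign" a digest
# by hashing it together with a secret.  NOT a secure signature scheme; it
# only illustrates the wiring S'_s(m) = S_s(h(m)), V'_v(m, sigma) = V_v(h(m), sigma).
def S(secret: bytes, digest: bytes) -> bytes:
    return hashlib.sha256(secret + digest).digest()

def V(secret: bytes, digest: bytes, sigma: bytes) -> bool:
    return S(secret, digest) == sigma

def h(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()   # collision-resistant hash

def S_prime(secret: bytes, m: bytes) -> bytes:
    return S(secret, h(m))              # sign the fixed-length digest

def V_prime(secret: bytes, m: bytes, sigma: bytes) -> bool:
    return V(secret, h(m), sigma)       # verify against the digest
```

Note that a forgery on the combined scheme requires either a forgery on $(S, V)$ or a collision in $h$.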

11.2 HARDCORE BITS AND SECURITY WITHOUT RANDOM ORACLES

The main problem with using trapdoor functions as the basis of public key encryption is twofold:

• The fact that $f$ is a trapdoor function does not rule out the possibility of computing $x$ from $f(x)$ when $x$ is of some special form. Recall that the security of a one-way function is only guaranteed over a uniformly random input. Usually messages to be sent are not drawn from a uniform distribution, and it is possible that for certain values of $x$ it is easy to invert $f(x)$, and those values of $x$ also happen to be commonly sent messages.

• The fact that $f$ is a trapdoor function does not rule out the possibility of easily computing some partial information about $x$ from $f(x)$. Suppose we wished to play poker over a channel of bits. If even the suit or color of a card can be revealed from the encryption of that card, then it doesn't matter if the entire encryption cannot be inverted; being able to compute even a single bit of the plaintext makes the entire game invalid. The RSA and Rabin functions have not been successfully inverted, but nobody has been able to prove that they provide semantic security.

The solution to these issues is to use a hardcore predicate of a one-way function $f$. We first define the security of a hardcore predicate, then show how it can be used to construct semantically secure encryption.

Definition 11.19 — Hardcore predicate. Let $f \colon \{0,1\}^n \to \{0,1\}^n$ be a one-way function (we assume $f$ is length preserving for simplicity), $\ell(n)$ be a length function, and $h \colon \{0,1\}^n \to \{0,1\}^{\ell(n)}$ be polynomial time computable. We say $h$ is a hardcore predicate of $f$ if for every efficient adversary $A$, every polynomial $p$, and all sufficiently large $n$,

$$\left| \Pr[A(f(X_n), h(X_n)) = 1] - \Pr[A(f(X_n), R_{\ell(n)}) = 1] \right| < \frac{1}{p(n)}$$

where $X_n$ and $R_{\ell(n)}$ are independently and uniformly distributed over $\{0,1\}^n$ and $\{0,1\}^{\ell(n)}$, respectively.

That is, given $f(x)$ for an input $x \leftarrow_R \{0,1\}^n$ chosen uniformly at random, no efficient adversary can distinguish between a random string $r$ and $h(x)$ with non-negligible advantage. This allows us to construct semantically secure public key encryption:

Hardcore predicate-based public key encryption:

• Key generation: Run the standard key generation algorithm for the one-way function $f$ to get $(e, d)$, where $e$ is a public key used to compute the function $f$ and $d$ is a corresponding secret trapdoor key that makes it easy to invert $f$.

• Encryption: To encrypt a message $m$ of length $\ell(n)$ with public key $e$, pick $x \leftarrow_R \{0,1\}^n$ uniformly at random and compute $(f_e(x), b(x) \oplus m)$.

• Decryption: To decrypt the ciphertext $(c, c')$ we first use the secret trapdoor key $d$ to compute $D_d(c) = D_d(f_e(x)) = x$, then compute $b(x)$ and $b(x) \oplus c' = m$.

P
Please stop to verify that this is a valid public key encryption scheme.

Note that in this construction of public key encryption, the input $x$ to $f$ is drawn uniformly at random from $\{0,1\}^n$, so the definition of the one-wayness of $f$ can be applied directly. Furthermore, since $b(x)$ is indistinguishable from a random string $r$ even given $f(x)$, the output $b(x) \oplus m$ is essentially a one-time pad encryption of $m$, where the key can only be retrieved by someone who can invert $f$. Proving the security formally is left as an exercise.

This is all fine and good, but how do we actually construct a hardcore predicate? Blum and Micali were the first to construct a hardcore predicate based on the discrete logarithm problem, but the first construction for general one-way functions was given by Goldreich and Levin. Their idea is that if $f$ is one-way, then it is hard to guess the exclusive or of a random subset of the input to $f$ when given $f(x)$ and the subset itself.
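To make the scheme above concrete, here is a toy sketch using a tiny RSA permutation as the trapdoor function and the least significant bit as the hardcore predicate (the LSB is in fact known to be hardcore for RSA). The parameters are far too small to be secure and are for illustration only.

```python
import secrets

# Toy trapdoor permutation: RSA with N = 3233 = 61 * 103, e = 17, d = 2753.
# Real deployments use moduli of 2048 bits or more.
N, e, d = 3233, 17, 2753

def f(x):      return pow(x, e, N)   # public direction f_e(x)
def f_inv(y):  return pow(y, d, N)   # inversion using the trapdoor d

def b(x):      return x & 1          # least significant bit as the hardcore predicate

def encrypt(m_bit):
    x = secrets.randbelow(N)         # x chosen uniformly at random
    return f(x), b(x) ^ m_bit        # (f_e(x), b(x) XOR m)

def decrypt(c, c2):
    x = f_inv(c)                     # recover x with the trapdoor
    return b(x) ^ c2                 # b(x) XOR (b(x) XOR m) = m
```

The decryptor recovers $x$ with the trapdoor and strips off the one-time pad bit $b(x)$.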

Theorem 11.20 — A hardcore predicate for arbitrary one-way functions. Let $f$ be a one-way function, and let $g$ be defined as $g(x, r) = (f(x), r)$, where $|x| = |r|$. Let $b(x, r) = \oplus_{i \in [n]} x_i r_i$ be the inner product mod 2 of $x$ and $r$. Then $b$ is a hardcore predicate of the function $g$.

The proof of this theorem follows the classic proof-by-reduction method, where we assume the existence of an adversary that can predict $b(x, r)$ given $g(x, r)$ with non-negligible advantage and construct an adversary that inverts $f$ with non-negligible probability. Let $A$ be a (possibly randomized) program and $\epsilon_A(n) > \frac{1}{p(n)}$ for some polynomial $p$ such that

$$\Pr[A(g(X_n, R_n)) = b(X_n, R_n)] = \frac{1}{2} + \epsilon_A(n)$$

where $X_n$ and $R_n$ are uniform and independent distributions over $\{0,1\}^n$. We observe that $b$ being insecure and having an output of a single bit implies that such a program exists. First, we show that on at least an $\epsilon_A(n)$ fraction of the possible inputs, program $A$ has an $\frac{\epsilon_A(n)}{2}$ advantage in predicting the output of $b$.

Lemma 11.21 There exists a set $S \subseteq \{0,1\}^n$ where $|S| > \epsilon_A(n) \cdot 2^n$ such that for all $x \in S$,

$$s(x) = \Pr[A(g(x, R_n)) = b(x, R_n)] \geq \frac{1}{2} + \frac{\epsilon_A(n)}{2}$$

Proof. The result follows from an averaging argument. For notational convenience we set $\epsilon = \epsilon_A(n)$. Let $k = \frac{|S|}{2^n}$, and let $\alpha = \frac{1}{k} \sum_{x \in S} s(x)$ and $\beta = \frac{1}{1-k} \sum_{x \notin S} s(x)$ be the averages of $s(x)$ over values in $S$ and not in $S$, respectively, so $k\alpha + (1-k)\beta = \frac{1}{2} + \epsilon$. By definition $\mathbb{E}[s(X_n)] = \frac{1}{2} + \epsilon$, so the fact that $\alpha \leq 1$ and $\beta < \frac{1}{2} + \frac{\epsilon}{2}$ gives $k + (1-k)\left(\frac{1}{2} + \frac{\epsilon}{2}\right) > \frac{1}{2} + \epsilon$, and solving finds that $k > \epsilon$. ■

Now we observe that for any $r \in \{0,1\}^n$, we have

$$x_i = b(x, r) \oplus b(x, r \oplus e^i)$$

where $e^i$ is the vector with all $0$s except a $1$ in the $i$th location. This observation follows from the definition of $b$, and it motivates the main idea of the reduction: guess $b(x, r)$ and use $A$ to compute $b(x, r \oplus e^i)$, then put them together to find $x_i$ for all $i$. The reason guessing works will become clear later, but intuitively the reason we cannot simply use $A$ to compute both $b(x, r)$ and $b(x, r \oplus e^i)$ is that the probability $A$ guesses both correctly is only (by a standard union bound) bounded below by $1 - 2\left(\frac{1}{2} - \epsilon_A(n)\right) = 2\epsilon_A(n)$. However, if we can guess $b(x, r)$ correctly, then we only need to invoke $A$ one time to get a better than half probability of correctly determining $x_i$. It is then a simple matter of taking a majority vote over several such $r$ to determine each $x_i$.

Now the natural question is how can we possibly guess (and here we literally mean randomly guess) each value of $b(x, r)$? The key is that the values of $r$ only need to be pairwise independent, since down the line we plan to use Chebyshev's inequality on

the accuracy of our guesses.4 This means that while we need $poly(n)$ many values of $r$, we can get away with guessing $\log(n)$ values of $b(x, r)$ and combining them with some trickery to get more while preserving pairwise independence. Since $2^{-\log n} = \frac{1}{n}$, with non-negligible probability we can correctly guess all of our $b(x, r)$ for polynomially many $r$. We then use $A$ to compute $b(x, r \oplus e^i)$ for all $r$ and $i$, and since $A$ has a non-negligible advantage, by majority vote we can retrieve each value of $x_i$ to invert $f$, thus contradicting the one-wayness of $f$.

4 This has to do with the fact that Chebyshev's inequality is based on the variances of random variables. If we had to use the Chernoff bound we would be in trouble, since that requires full independence. For more on these and other concentration bounds, we recommend referring to the text Probability and Computing, by Mitzenmacher and Upfal.

P
It is important that you understand why we cannot rely on invoking $A$ twice, on both $b(x, r)$ and $b(x, r \oplus e^i)$. It is also important that you understand why, with non-negligible probability, we can correctly guess $b(x, r^1), \ldots, b(x, r^\ell)$ for $r^1, \ldots, r^\ell$ chosen independently and uniformly at random and $\ell = O(\log n)$. At the moment, it is not important what trickery is used to combine our guesses, but it will reduce confusion down the line if you understand why we can get away with pairwise independence in our inputs instead of complete mutual independence.
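The basic identity $x_i = b(x, r) \oplus b(x, r \oplus e^i)$ driving the reduction is easy to check by direct computation. Here is a small sketch with bit-vectors packed into Python integers:

```python
import secrets

# Check the Goldreich-Levin identity x_i = b(x, r) XOR b(x, r XOR e^i),
# where b(x, r) = <x, r> mod 2 is the inner product of bit-vectors.
n = 16

def b(x, r):
    return bin(x & r).count("1") % 2  # parity of the bitwise AND

x = secrets.randbits(n)
for i in range(n):
    r = secrets.randbits(n)
    e_i = 1 << i                      # all zeros except a 1 in position i
    x_i = (x >> i) & 1
    assert x_i == b(x, r) ^ b(x, r ^ e_i)
```

Flipping bit $i$ of $r$ toggles whether $x_i$ participates in the inner product, which is exactly why the two predicate values XOR to $x_i$.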

Before moving on to the formal proof of our theorem, please stop to convince yourself that, given that some such trickery exists, this strategy works for inverting $f$. ■

Proof of Theorem 11.20. We use the assumed existence of $A$ to construct $B$, a program that inverts $f$ (which we assume is length preserving for notational convenience). Let $n = |x|$ and $l = \lceil \log(2n \cdot p(n)^2 + 1) \rceil$, where $\epsilon_A(n) > \frac{1}{p(n)}$. Next, choose $s^1, \ldots, s^l \in \{0,1\}^n$ and $\sigma^1, \ldots, \sigma^l \in \{0,1\}$ all independently and uniformly at random. Here we set $\sigma^i$ to be the guess for the value of $b(x, s^i)$. For each non-empty subset $J$ of $\{1, 2, \ldots, l\}$ let $r^J = \oplus_{j \in J} s^j$. We can observe that

$$b(x, r^J) = b(x, \oplus_{j \in J} s^j) = \oplus_{j \in J} b(x, s^j)$$

by the properties of addition modulo 2, so we can say $\rho^J = \oplus_{j \in J} \sigma^j$ is the correct guess for $b(x, r^J)$ as long as each of the $\sigma^j$ for $j \in J$ is correct. We can easily verify that the values $r^J$ are pairwise independent and uniform, so this construction gives us $poly(n)$ many correct pairs $(b(x, r^J), \rho^J)$ with probability $\frac{1}{poly(n)}$, exactly as needed. Define $G(J, i) = \rho^J \oplus A(f(x), r^J \oplus e^i)$ to be the guess for $x_i$ computed using input $r^J$. From here, $B$ simply needs to set $x_i$ to the majority value of our guesses $G(J, i)$ over the possible choices of $J$ and output $x$. Now we prove that, given that our guesses $\rho^J$ are all correct, for all $x \in S$ and for every $1 \leq i \leq n$, we have

$$\Pr\left[ |\{J \mid G(J, i) = x_i\}| > \tfrac{1}{2}(2^l - 1) \right] > 1 - \frac{1}{2n}$$

That is, with probability at least $1 - \frac{1}{2n}$, more than half of our guesses for $x_i$ are correct, where $2^l - 1$ is the number of non-empty subsets of $\{1, 2, \ldots, l\}$.

For every $J$, define $I_J$ to be the indicator that $G(J, i) = x_i$, and we can observe that $I_J$ is Bernoulli with expected value $s(x)$ (again, given that our guess for $b(x, r^J)$ is correct). Pairwise independence of the $I_J$ is given by the pairwise independence of the $r^J$. Setting $m = 2^l - 1$, defining $s(x) = \frac{1}{2} + \frac{1}{q(n)}$, and using Chebyshev's inequality, we get

$$\Pr\left[ \sum_J I_J \leq \frac{m}{2} \right] \leq \Pr\left[ \left| \sum_J I_J - \left(\frac{1}{2} + \frac{1}{q(n)}\right) m \right| \geq \frac{m}{q(n)} \right] = \Pr\left[ \left| \sum_J I_J - \mathbb{E}\left[\sum_J I_J\right] \right| \geq \frac{m}{q(n)} \right] \leq \frac{m \cdot \mathrm{Var}(I_J)}{\left(\frac{m}{q(n)}\right)^2} \leq \frac{\frac{1}{4}}{\left(\frac{1}{q(n)}\right)^2 m}$$

Since $x \in S$ we know $\frac{1}{q(n)} \geq \frac{\epsilon_A(n)}{2} \geq \frac{1}{2p(n)}$, so

$$\frac{\frac{1}{4}}{\left(\frac{1}{q(n)}\right)^2 m} \leq \frac{\frac{1}{4}}{\left(\frac{1}{2p(n)}\right)^2 m} = \frac{p(n)^2}{m} \leq \frac{p(n)^2}{2n \cdot p(n)^2} = \frac{1}{2n}$$

using that $m = 2^l - 1 \geq 2n \cdot p(n)^2$.

Putting it all together, $B$ must first pick an $x \in S$, then correctly guess $\sigma^j$ for all $j \in \{1, 2, \ldots, l\}$, then $A$ must correctly compute $b(x, r^J \oplus e^i)$ on more than half of the $r^J$. Since each of these events happens independently, we get $B$'s success probability to be

$$\epsilon_A(n) \left(\frac{1}{2n\, p(n)^2}\right) \left(1 - \frac{1}{2n}\right)^n > \left(\frac{1}{p(n)}\right) \left(\frac{1}{2n\, p(n)^2}\right) \left(\frac{1}{2}\right) = \frac{1}{4n\, p(n)^3}$$

which is non-negligible in $n$. This contradicts the assumption that $f$ is a one-way function, so no adversary can predict $b(x, r)$ given $(f(x), r)$ with a non-negligible advantage, and $b$ is a hardcore predicate of $g$. ■

11.2.1 Extending to more than one hardcore bit

By definition, $b$ as constructed above is only a hardcore predicate of length 1. While it's great that this method works for any arbitrary one-way function, in the real world messages are sometimes longer than a single bit. Fortunately, there is hope: Goldreich and Levin's hardcore bit construction can be used repeatedly to get a hardcore predicate of logarithmic length.

Theorem 11.22 — Logarithmically many hardcore bits for arbitrary one-way functions. Let $f$ be a one-way function, and define $g_2(x, s) = (f(x), s)$, where $|x| = n$ and $|s| = 2n$. Let $c > 0$ be a constant, and $l(n) = \lceil c \log n \rceil$. Let $b_i(x, s)$ denote the inner product mod 2 of the binary vectors $x$ and $(s_{i+1}, \ldots, s_{i+n})$, where $s = (s_1, \ldots, s_{2n})$. Then the function $h(x, s) = b_1(x, s) \ldots b_{l(n)}(x, s)$ is a hardcore function of $g_2$.

It is clear that this is an important improvement on a single hardcore bit, but it is still nowhere near usable in general; imagine encrypting a text document with a key exponentially long in the size of the document. A completely different approach is needed to obtain a hardcore predicate with length polynomial in the key size. Bellare, Stepanovs, and Tessaro manage to pull it off using indistinguishability obfuscation of circuits, a cryptographic primitive which, like the existence of PRGs, is assumed to exist.

Theorem 11.23 — Polynomially many hardcore bits for arbitrary one-way functions. Let F be a one-way function family and G be a punctured PRF with the same input length as F. Then under the assumed existence of indistinguishability obfuscators, there exists a function family H that is hardcore for F. Furthermore, the output length of H is the same as the output length of G.

Since the output length of $G$ can be polynomial in the length of its input, it follows that $H$ outputs polynomially many hardcore bits in the length of its input. The proofs of Theorem 11.22 and Theorem 11.23 require results and concepts not yet covered in this course, but we refer interested readers to the original papers:

Goldreich, O., 1995. Three XOR-lemmas: an exposition. In Electronic Colloquium on Computational Complexity (ECCC).

Bellare, M., Stepanovs, I. and Tessaro, S., 2014, December. Poly-many hardcore bits for any one-way function and a framework for differing-inputs obfuscation. In International Conference on the Theory and Application of Cryptology and Information Security (pp. 102-121). Springer, Berlin, Heidelberg.

12 Chosen ciphertext security for public key encryption

To be completed


13 Lattice based cryptography

Lattice based public key encryption (and its cousins known as knapsack and coding based encryption) have almost as long a history as discrete logarithm and factoring based schemes. Already in 1976, right after the Diffie-Hellman key exchange was discovered (and before RSA), Ralph Merkle was working on building public key encryption from the NP hard knapsack problem (see Diffie's recollection). This can be thought of as the task of solving a linear equation of the form $Ax = y$ (where $A$ is a given matrix, $y$ is a given vector, and the unknowns are the entries of $x$) over the real numbers but with the additional constraint that the entries of $x$ must be either $0$ or $1$. His proposal evolved into the Merkle-Hellman system proposed in 1978 (which was broken in 1984).

McEliece proposed in 1978 a system based on the difficulty of the decoding problem for general linear codes. This is the task of solving noisy linear equations where one is given $A$ and $y$ such that $y = Ax + e$ for a "small" error vector $e$, and needs to recover $x$. Crucially, here we work in a finite field, such as working modulo $q$ for some prime $q$ (that can even be $2$) rather than over the reals or rationals. There are special matrices $A^*$ for which we know how to solve this problem efficiently: these are known as efficiently decodable error correcting codes. McEliece suggested a scheme where the key generator lets $A$ be a "scrambled" version of a special $A^*$ (based on the Goppa algebraic geometric code). So, someone that knows the scrambling could solve the problem, but (hopefully) someone that doesn't know it wouldn't. McEliece's system has so far not been broken.

In a 1996 breakthrough, Ajtai showed a private key scheme based on integer lattices that had a very curious property: its security could be based on the assumption that certain problems were only hard in the worst case, and moreover variants of these problems were known to be NP hard.
This re-ignited the hope that we could perhaps realize the old dream of basing crypto on the mere assumption that $P \neq NP$.


Alas, we now understand that there are fundamental barriers to this approach.

Nevertheless, Ajtai's work attracted significant interest, and within a year both Ajtai and Dwork, as well as Goldreich, Goldwasser and Halevi, came up with lattice based constructions for public key encryption (the former based also on worst case assumptions). At about the same time, Hoffstein, Pipher, and Silverman came up with their NTRU public key system which is based on stronger assumptions but offers better performance, and they started a company around it together with Daniel Lieman.

You may note that I haven't yet said what lattices are; we will do so later, but for now if you simply think of questions involving linear equations modulo some prime $q$, you will get enough of the intuition that you need. (The lattice viewpoint is more geometric, and we'll discuss it more below; it was first used to attack cryptosystems and in particular break the Merkle-Hellman knapsack scheme and many of its variants.)

Lattice based cryptography has captured a lot of attention recently from both theory and practice. On the theory side, many cool new constructions are now based on lattice based cryptography, and chief among them fully homomorphic encryption, as well as indistinguishability obfuscation (though the latter's security foundations are still far less solid). On the applied side, the steady advances in the technology of quantum computers have finally gotten practitioners worried about RSA, Diffie Hellman and Elliptic Curves. While current constructions for quantum computers are nowhere near being able to, say, factor larger numbers than can be done classically (or even than can be done by hand), given that it takes many years to develop new standards and get them deployed, many believe the effort to transition away from these factoring/dlog based schemes should start today (or perhaps should have started several years ago).
The NSA has suggested that it plans to initiate the process to "transition to quantum resistant algorithms in the not too distant future"; see also this very interesting FAQ on this topic.

Cryptography has the peculiar/unfortunate feature that if a machine is built that can factor large integers in 20 years, it can still be used to break the communication we transmit today, provided this communication was recorded. So, if you have some data that you expect you'd want still kept secret in 20 years (as many government and commercial entities do), you might have reasons to worry. Currently lattice based cryptography is the only real "game in town" for potentially quantum-resistant public key encryption schemes.

Lattice based cryptography is a huge area, and in this lecture and this course we only touch on a few aspects of it. I highly recommend Chris Peikert's survey for a much more in depth treatment of this area.

13.0.1 Quick linear algebra recap

A field $\mathbb{F}$ is a set that supports the operations $+, \cdot$ and contains the numbers $0$ and $1$ (more formally the additive identity and multiplicative identity) with the usual properties that the real numbers have. (That is, $+$ and $\cdot$ are associative and commutative and satisfy the distributive law, for every $x \in \mathbb{F}$ there is an element $-x$ such that $x + (-x) = 0$, and if $x \neq 0$ there is an element $x^{-1}$ such that $x \cdot x^{-1} = 1$.) Apart from the real numbers, the main field we will be interested in this section is the field $\mathbb{Z}_q$ of the numbers $\{0, 1, \ldots, q-1\}$ with addition and multiplication done modulo $q$, where $q$ is a prime number.1

1 While this won't be of interest for us in this chapter, one can also define finite fields whose size is a prime power of the form $q^k$ where $q$ is a prime and $k$ is an integer; this is sometimes useful and in particular fields of size $2^k$ are sometimes used in practice. In such fields we usually think of the elements as vectors in $(\mathbb{Z}_q)^k$ with addition done component-wise, but multiplication is not defined component-wise (since otherwise a vector with a single coordinate zero would not have an inverse) but in a different way, via interpreting these vectors as coefficients of a degree $k-1$ polynomial.

You should be comfortable with the following notions:

• A vector $v \in \mathbb{F}^n$ and a matrix $M \in \mathbb{F}^{m \times n}$. An $m \times n$ matrix has $m$ rows and $n$ columns. We think of vectors as column vectors and so we can think of a vector $v \in \mathbb{F}^n$ as an $n \times 1$ matrix. We write the $i$-th coordinate of $v$ as $v_i$ and the $(i,j)$-th coordinate of $M$ as $M_{i,j}$ (i.e., the coordinate in the $i$-th row and the $j$-th column). We often write a vector $v$ as $(v_1, \ldots, v_n)$ but we still mean that it's a column vector unless we say otherwise.

• If $\alpha \in \mathbb{F}$ is a scalar (i.e., a number) and $v \in \mathbb{F}^n$ is a vector then $\alpha v$ is the vector $(\alpha v_1, \ldots, \alpha v_n)$. If $u, v$ are $n$-dimensional vectors then $u + v$ is the vector $(u_1 + v_1, \ldots, u_n + v_n)$.

• A linear subspace $V \subseteq \mathbb{F}^n$ is a non-empty set of vectors such that for every vectors $u, v \in V$ and $\alpha, \beta \in \mathbb{F}$, $\alpha u + \beta v \in V$. In particular this means that $V$ contains the all-zero vector (can you see why?). A subset $A \subseteq V$ is linearly independent if there is no collection $a_1, \ldots, a_k \in A$ and scalars $\alpha_1, \ldots, \alpha_k$ (not all zero) such that $\sum \alpha_i a_i = 0$. It is known (and not hard to prove) that if $A$ is linearly independent then $|A| \leq n$. It is known that for every such linear subspace $V$ there is a linearly independent set $B = \{b_1, \ldots, b_d\}$ of vectors, with $d \leq n$, such that for every $u \in V$ there exist $\alpha_1, \ldots, \alpha_d$ such that $u = \sum \alpha_i b_i$. Such a set is known as a basis for $V$. A subspace $V$ has many bases, but all of them have the same size $d$ which is known as the dimension of $V$. An affine subspace is a set $U$ of the form $\{u_0 + v \colon v \in V\}$ where $V$ is a linear subspace. We can also write $U$ as $u_0 + V$. We denote the dimension of $U$ as the dimension of $V$ in such a case.

• The inner product (also known as "dot product") $\langle u, v \rangle$ between two vectors of the same dimension $n$ is defined as $\sum u_i v_i$ (addition done in the field $\mathbb{F}$).2

2 There is a much more general notion of inner product typically defined, and in particular over fields such as the complex numbers we would define the inner product as $\sum u_i \overline{v_i}$ where for $a \in \mathbb{C}$, $\overline{a}$ denotes the complex conjugate of $a$. However, we stick to the simple case above for this chapter.

• The matrix product $AB$ of an $m \times k$ and a $k \times n$ matrix, which results in an $m \times n$ matrix. If we think of the rows of $A$ as the vectors $A_1, \ldots, A_m \in \mathbb{F}^k$ and the columns of $B$ as $B_1, \ldots, B_n$, then the $(i,j)$-th coordinate of $AB$ is $\langle A_i, B_j \rangle$. Matrix product is associative and satisfies the distributive law but is not commutative: there are pairs of square matrices $A, B$ such that $AB \neq BA$.
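Non-commutativity is easy to check concretely; here is a quick sketch with plain Python lists (any linear algebra library would do the same):

```python
# Matrix product via the row-by-column inner product rule; then a 2x2
# example showing AB != BA.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]
assert matmul(A, B) != matmul(B, A)   # the products differ
```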

• The transpose of an $n \times m$ matrix $A$ is the $m \times n$ matrix $A^\top$ such that $(A^\top)_{i,j} = A_{j,i}$.

• The inverse of a square $n \times n$ matrix $A$ is the matrix $A^{-1}$ such that $AA^{-1} = I$ where $I$ is the $n \times n$ identity matrix such that $I_{i,j} = 1$ if $i = j$ and $I_{i,j} = 0$ otherwise.

• The rank of an $m \times n$ matrix $A$ is the minimum number $r$ such that we can write $A$ as $\sum_{i=1}^r u_i (v_i)^\top$ where $u_i \in \mathbb{F}^m$ and $v_i \in \mathbb{F}^n$. We can think of the $u_i$'s as the columns of an $m \times r$ matrix $U$ and the $v_i$'s as the rows of an $r \times n$ matrix $V$, and hence the rank of $A$ is the minimum $r$ such that $A = UV$ where $U$ is $m \times r$ and $V$ is $r \times n$. It can be shown that an $n \times n$ matrix is full rank if and only if it has an inverse.

• Solving linear equations can be thought of as the task of: given an $m \times n$ matrix $A$ and an $m$-dimensional vector $y$, find the $n$-dimensional vector $x$ such that $Ax = y$. If the rank of $A$ is at least $n$ (which in particular means that $m \geq n$) then by dropping $m - n$ rows of $A$ and coordinates of $y$ we can obtain the equation $A'x = y'$ where $A'$ is an $n \times n$ matrix that has an inverse. In this case a solution (if it exists) will be equal to $(A')^{-1} y'$. If for a set of equations we have $m > n$ and we can find two such matrices $A', A''$ such that $(A')^{-1} y' \neq (A'')^{-1} y''$ then we say it is over-determined and in such a case it has no solutions. If a set of equations has more variables $n$ than equations $m$ we say it's under-determined. In such a case it either has no solutions or the solutions form an affine subspace of dimension at least $n - m$.

• The Gaussian elimination algorithm can be used to obtain, given a set of equations $Ax = y$, a solution $x$ if such exists or a certification that no solution exists. It can be executed in time polynomial in the dimensions and the bit complexity of the numbers involved. This algorithm can also be used to obtain an inverse of a given matrix $A$, if such an inverse exists.
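Since all the arithmetic in this chapter is modulo a prime $q$, here is a minimal sketch of Gaussian elimination over $\mathbb{Z}_q$ for a square full-rank system (division becomes multiplication by the modular inverse, which exists for nonzero elements since $q$ is prime):

```python
# Solve A x = y over Z_q for a square full-rank A, by Gauss-Jordan
# elimination on the augmented matrix [A | y].
def solve_mod_q(A, y, q):
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        # find a pivot row with a nonzero entry in this column
        pivot = next(r for r in range(col, n) if M[r][col] % q != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)                 # pivot^{-1} mod q
        M[col] = [a * inv % q for a in M[col]]        # scale pivot row to lead with 1
        for r in range(n):
            if r != col and M[r][col] != 0:           # eliminate the column elsewhere
                f = M[r][col]
                M[r] = [(a - f * b) % q for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]
```

(The three-argument `pow(b, -1, q)` for modular inverses requires Python 3.8+.)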

R
Remark 13.1 — Keep track of dimensions!. Throughout this chapter, and while working in lattice based cryptography in general, it is crucial to keep track of the dimensions. Whenever you see a symbol such as $v, A, x, y$ ask yourself:

• Is it a scalar, a vector or a matrix?
• If it is a vector or a matrix, what are its dimensions?
• If it's a matrix, is it "square" (i.e., $m = n$), "short and fat" (i.e., $m \gg n$) or "tall and skinny" (i.e., $m \ll n$)?

13.1 A WORLD WITHOUT GAUSSIAN ELIMINATION

The general approach people used to get a public key encryption is to obtain a hard computational problem with some mathematical structure. We've seen this in the discrete logarithm problem, where the task is to invert the map $a \mapsto g^a \pmod{p}$, and the integer factoring problem, where the task is to invert the map $a, b \mapsto a \cdot b$. Perhaps the simplest structure to consider is the task of solving linear equations. Pretend that we didn't know of Gaussian elimination,3 and that if we picked a "generic" matrix $A$ then the map $x \mapsto Ax$ would be hard to invert. (Here and elsewhere, our default interpretation of a vector $x$ is as a column vector, and hence if $x$ is $n$ dimensional and $A$ is $m \times n$ then $Ax$ is $m$ dimensional. We use $x^\top$ to denote the row vector obtained by transposing $x$.) Could we use that to get a public key encryption scheme?

3 Despite the name, Gaussian elimination has been known to Chinese mathematicians since 150BC or so, and was popularized in the west through the 1670 notes of Isaac Newton.

Here is a concrete approach. Let us fix some prime $q$ (think of it as polynomial size, e.g., $q$ is smaller than $1024$ or so, though people can and sometimes do consider $q$ of exponential size), and all computation below will be done modulo $q$. The secret key is a vector $x \in \mathbb{Z}_q^n$, and the public key is $(A, y)$ where $A$ is a random $m \times n$ matrix with entries in $\mathbb{Z}_q$ and $y = Ax$. Under our assumption, it is hard to recover the secret key $x$ from the public key, but how do we use the public key to encrypt?

The crucial observation is that even if we don't know how to solve linear equations, we can still combine several equations to get new ones. To keep things simple, let's consider the case of encrypting a single bit.

P
If you have a CPA secure public key encryption scheme for single bit messages then you can extend it to a CPA secure encryption scheme for messages of any length. Can you see why?

If $a_1, \ldots, a_m$ are the rows of $A$, we can think of the public key as the set of equations $\langle a_1, x \rangle = y_1, \ldots, \langle a_m, x \rangle = y_m$ in the unknown variables $x$. The idea is that to encrypt the value $0$ we will generate a new correct equation on $x$, while to encrypt the value $1$ we will generate an incorrect equation. To decrypt a ciphertext $(a, \sigma) \in \mathbb{Z}_q^{n+1}$, we think of it as an equation of the form $\langle a, x \rangle = \sigma$ and output $0$ if and only if the equation is correct.

How does the encrypting algorithm, that does not know $x$, get a correct or incorrect equation on demand? One way would be to simply take two equations $\langle a_i, x \rangle = y_i$ and $\langle a_j, x \rangle = y_j$ and add them together to get the equation $\langle a_i + a_j, x \rangle = y_i + y_j$. This equation is correct and so one can use it to encrypt $0$, while to encrypt $1$ we simply add some fixed nonzero number $\alpha \in \mathbb{Z}_q$ to the right hand side to get the incorrect equation $\langle a_i + a_j, x \rangle = y_i + y_j + \alpha$. However, even if it's hard to solve for $x$ given the equations, an attacker (who also knows the public key $(A, y)$) can try itself all pairs of equations and do the same thing.

Our solution for this is simple: just add more equations! If the encryptor adds a random subset of equations then there are $2^m$ possibilities for that, and an attacker can't guess them all. That is, if the rows of $A$ are $a_1, \ldots, a_m$, then we can pick a vector $w \in \{0,1\}^m$ at random, and consider the equation $\langle a, x \rangle = y$ where $a = \sum w_i a_i$ and $y = \sum w_i y_i$. In other words, we can think of this as the equation $w^\top A x = \langle w, y \rangle$ (note that $Ax = y$ and so we can think of this as the equation that we obtain from $Ax = y$ by multiplying both sides on the left by the row vector $w^\top$). Thus, at least intuitively, the following encryption scheme would be "secure" in the Gaussian-elimination free world of attackers that haven't taken freshman linear algebra:

Scheme “LwoE-ENC”: Public key encryption under the hardness of “learning linear equations without errors”.

• Key generation: Pick a random $m \times n$ matrix $A$ over $\mathbb{Z}_q$, and $x \leftarrow_R \mathbb{Z}_q^n$. The secret key is $x$ and the public key is $(A, y)$ where $y = Ax$.

• Encryption: To encrypt a message $b \in \{0,1\}$, pick $w \in \{0,1\}^m$ and output $(w^\top A, \langle w, y \rangle + \alpha b)$ for some fixed nonzero $\alpha \in \mathbb{Z}_q$.

• Decryption: To decrypt a ciphertext $(a, \sigma)$, output $0$ iff $\langle a, x \rangle = \sigma$.

P
Please stop here and make sure that you see why this is a valid encryption scheme (not in the sense that it is secure - it's not - but in the sense that decryption of an encryption of $b$ returns the bit $b$), and that this description corresponds to the previous one; as usual all calculations are done modulo $q$.
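As a sanity check, here is a toy implementation of the (insecure!) LwoE-ENC scheme above, with hypothetical tiny parameters, verifying only that decryption inverts encryption:

```python
import secrets

# Toy parameters for illustration; the scheme is completely insecure
# against Gaussian elimination regardless of parameter size.
q, n, m, alpha = 101, 4, 8, 1

def keygen():
    A = [[secrets.randbelow(q) for _ in range(n)] for _ in range(m)]
    x = [secrets.randbelow(q) for _ in range(n)]          # secret key
    y = [sum(A[i][j] * x[j] for j in range(n)) % q for i in range(m)]
    return x, (A, y)                                      # (sk, pk)

def encrypt(pk, bit):
    A, y = pk
    w = [secrets.randbelow(2) for _ in range(m)]          # random row subset
    a = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]
    sigma = (sum(w[i] * y[i] for i in range(m)) + alpha * bit) % q
    return a, sigma

def decrypt(x, ct):
    a, sigma = ct
    # the equation <a, x> = sigma is correct iff the plaintext bit was 0
    return 0 if sum(a[j] * x[j] for j in range(n)) % q == sigma else 1
```

Decryption works because $\langle a, x \rangle = \langle w, y \rangle$ always holds, so the ciphertext equation is off by exactly $\alpha b$.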

13.2 SECURITY IN THE REAL WORLD.

Like it or not (and cryptographers typically don't), Gaussian elimination is possible in the real world and the scheme above is completely insecure. However, the Gaussian elimination algorithm is extremely brittle: errors tend to be amplified when you combine equations. This is usually thought of as a bad thing, and numerical analysis is much about dealing with this issue. However, from the cryptographic point of view, these errors can be our saving grace and enable us to salvage the security of the ridiculous scheme above.

To see why Gaussian elimination is brittle, let us recall how it works. Think of $m = n$ for simplicity. Given equations $Ax = y$ in the unknown variables $x$, the goal of Gaussian elimination is to transform them into the equations $Ix = y'$ where $I$ is the identity matrix (and hence the solution is simply $x = y'$). Recall how we do it: by rearranging and scaling, we can assume that the top left corner of $A$ is equal to $1$, and then we add the first equation to the other equations (scaled appropriately) to zero out the first entry in all the other rows of $A$ (i.e., make the first column of $A$ equal to $(1, 0, \ldots, 0)$) and continue onwards to the second column and so on and so forth.

Now, suppose that the equations were noisy, in the sense that we added to $y$ a vector $e \in \mathbb{Z}_q^m$ such that $|e_i| < \delta q$ for every $i$.4 Even ignoring the effect of the scaling step, simply adding the first equation to the rest of the equations would typically roughly double the magnitude of the noise in them, and as the elimination proceeds the noise grows until it overwhelms the values we are trying to solve for.

4 Over $\mathbb{Z}_q$, we can think of $q - 1$ also as the number $-1$, and so on. Thus if $a \in \mathbb{Z}_q$, we define $|a|$ to be the minimum of $a$ and $q - a$. This ensures the absolute value satisfies the natural property $|a| = |-a|$.

Conjecture (Learning with Errors, Regev 2005): Let $q = q(n)$ and $\delta = \delta(n)$ be some functions. The Learning with Error (LWE) conjecture with respect to $q, \delta$, denoted as $LWE_{q,\delta}$, is the following conjecture: for every polynomial $m(n)$ and polynomial-time adversary $R$,

$$\Pr[R(A, Ax + e) = x] < negl(n)$$

where for $q = q(n)$ and $\delta = \delta(n)$, this probability is taken over a random $m \times n$ matrix $A$ over $\mathbb{Z}_q$, a random vector $x$ in $\mathbb{Z}_q^n$, and a random "noise vector" $e$ in $\mathbb{Z}_q^m$ where $|e_i| < \delta q$ for every $i \in [m]$.5

The LWE conjecture (without any parameters) is that there is some absolute constant $c$ such that for every polynomial $p(n)$, if $q(n) > p(n)^c$ then $LWE_{q,\delta}$ holds with respect to $q$ and $\delta(n) = 1/p(n)$.6

5 One can think of $e$ as chosen by simply letting every coordinate be chosen at random in $\{-\delta q, -\delta q + 1, \ldots, +\delta q\}$. For technical reasons, we sometimes consider other distributions and in particular the discrete Gaussian distribution which is obtained by letting every coordinate of $e$ be an independent Gaussian random variable with standard deviation $\delta q$, conditioned on it being an integer. (A closely related distribution is obtained by picking such a Gaussian random variable and then rounding it to the nearest integer.)

6 People sometimes also consider variants where both $p(n)$ and $q(n)$ can be as large as exponential.

It is important to note the order of quantifiers in the learning with errors conjecture. If we want to handle a noise of low enough magnitude (say $\delta(n) = 1/n^2$) then we need to choose the modulus $q$ to be large enough (for example it is believed that $q > n^4$ will be good enough for this case), and then the adversary can choose $m(n)$ to be as big a polynomial as they like, and of course run in time which is an arbitrary polynomial in $n$. Therefore we can think of such an adversary $R$ as getting access to a "magic box" that they can use $m = poly(n)$ number of times to get "noisy equations on $x$" of the form $(a_i, y_i)$ with $a_i \in \mathbb{Z}_q^n$, $y_i \in \mathbb{Z}_q$, where $y_i = \langle a_i, x \rangle + e_i$.
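For concreteness, the "magic box" samples can be sketched as follows. The parameters are illustrative toys (far from secure), and the noise here is uniform in $[-\delta q, \delta q]$ rather than a discrete Gaussian:

```python
import random

# Sample m "noisy equations" (a_i, y_i) with y_i = <a_i, x> + e_i (mod q),
# in the style of the LWE conjecture.  Toy parameters, not secure.
q, n, m = 4093, 8, 16
delta = 1.0 / (n * n)                    # noise bound: |e_i| < delta * q

def lwe_samples(x):
    samples = []
    for _ in range(m):
        a = [random.randrange(q) for _ in range(n)]
        e = random.randint(-int(delta * q), int(delta * q))
        y = (sum(ai * xi for ai, xi in zip(a, x)) + e) % q
        samples.append((a, y))
    return samples
```

Note the order of quantifiers at work: the secret $x$ is fixed once, and the adversary may request polynomially many such samples.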
13.3 SEARCH TO DECISION

It turns out that if LWE is hard, then it is even hard to distinguish between random equations and nearly correct ones:

Theorem 13.2 — Search to decision reduction for LWE. If the LWE conjecture is true then for every q = poly(n), δ = 1/poly(n) and m = poly(n), the following two distributions are computationally indistinguishable:

• {(A, Ax + e)} where A is a random m × n matrix over Z_q, x is random in Z_q^n, and e ∈ Z_q^m is a random noise vector of magnitude δ (i.e., |e_i| < δq for every i).

• {(A, y)} where A is a random m × n matrix over Z_q and y is random in Z_q^m.

Figure 13.1: The search to decision reduction (Theorem 13.2) implies that under the LWE conjecture, for every m = poly(n), if we choose and fix a random m × n matrix A over Z_q, the distribution Ax + e is indistinguishable from a random vector in Z_q^m, where x is a random vector in Z_q^n and e is a random "short" vector in Z_q^m. The two distributions are indistinguishable even to an adversary that knows A.

Proof. Suppose that we had a decisional adversary D that succeeds in distinguishing the two distributions above with bias ε. For example,

suppose that D outputs 1 with probability p + ε on inputs from the first distribution, and outputs 1 with probability p on inputs from the second distribution.

We will show how we can use this to obtain a polynomial-time algorithm S that, on input m noisy equations on x and a value a ∈ Z_q, will learn with high probability whether or not the first coordinate of x equals a. Clearly, we can repeat this for all the q possible values of a to learn the first coordinate exactly, and then continue in this way to learn all coordinates.

Our algorithm S gets as input the pair (A, y) where y = Ax + e, and we need to decide whether x_1 = a. Now consider the instance (A + (r‖0^m‖⋯‖0^m), y + ar), where r is a random vector in Z_q^m and the matrix (r‖0^m‖⋯‖0^m) is simply the matrix with first column equal to r and all other columns equal to 0. If A is random then A + (r‖0^m‖⋯‖0^m) is random as well. Now note that Ax + (r‖0^m‖⋯‖0^m)x = Ax + x_1·r, and hence if x_1 = a then we still have an input of the same form (A′, A′x + e).

In contrast, we claim that if x_1 ≠ a, then the distribution (A′, y′), where A′ = A + (r‖0^m‖⋯‖0^m) and y′ = Ax + e + ar, is identical to the uniform distribution over a uniformly chosen matrix A′ and a random and independent uniformly chosen vector y′. Indeed, we can write this distribution as (A′, y′) where A′ is chosen uniformly at random and y′ = A′x + e + (a − x_1)r, where r is a random and independent vector. (Can you see why?) Since a − x_1 ≠ 0, this amounts to adding a random and independent vector (a − x_1)r to y′, which means that the distribution (A′, y′) is uniform and independent.

Hence if we send the input (A′, y′) to our decision algorithm D, then we would get 1 with probability p + ε if x_1 = a, and an output of 1 with probability p otherwise.
Now the crucial observation is that if our decision algorithm D requires m equations to succeed with bias ε, we can use 100mn/ε² equations (which is still polynomial) to invoke it 100n/ε² times. This allows us to distinguish with probability 1 − 2^{−n} between the case that D outputs 1 with probability p + ε and the case that it outputs 1 with probability p (this follows from the Chernoff bound; can you see why?). Hence by using polynomially more samples than the decision algorithm D, we get a search algorithm S that can actually recover x. ■
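The re-randomization step at the heart of this proof can be checked numerically. The sketch below (helper name is mine) applies the transformation and verifies that a correct guess a = x_1 preserves the LWE form with exactly the same noise:

```python
import random

def guess_transform(A, y, a, q):
    """Given an LWE instance (A, y) and a guess a for x_1, return
    (A', y') with A' = A + (r|0|...|0) and y' = y + a*r, r random."""
    m = len(A)
    r = [random.randrange(q) for _ in range(m)]
    Ap = [row[:] for row in A]
    for i in range(m):
        Ap[i][0] = (Ap[i][0] + r[i]) % q   # add r to the first column
    yp = [(yi + a * ri) % q for yi, ri in zip(y, r)]
    return Ap, yp

# When a = x_1:  y'_i - <a'_i, x> = e_i (mod q), the original noise.
# When a != x_1: the extra term (a - x_1) * r_i makes y' uniform.
```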

13.4 AN LWE BASED ENCRYPTION SCHEME

We can now show the secure variant of our original encryption scheme:

LWE-based encryption LWE-ENC:

• Parameters: Let δ(n) = 1/n⁴ and let q = poly(n) be a prime such that LWE holds w.r.t. q, δ. We let m = n² log q.

• Key generation: Pick x ∈ Z_q^n. The private key is x, and the public key is (A, y) with y = Ax + e, with e a δ-noise vector and A a random m × n matrix.

• Encrypt: To encrypt a bit b ∈ {0, 1} given the key (A, y), pick w ∈ {0, 1}^m and output (w^⊤A, ⟨w, y⟩ + b⌊q/2⌋) (all computations are done in Z_q).

• Decrypt: To decrypt (a, σ), output 0 iff |⟨a, x⟩ − σ| < q/10.

P The scheme LWEENC is also described in Fig. 13.2 with slightly different notation. I highly recommend you stop and verify you understand why the two descriptions are equivalent.
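A toy end-to-end implementation of LWE-ENC may also help. The parameters below are mine and are insecurely small; they are chosen only so that the correctness condition m·δq < q/10 holds (here `bound` plays the role of δq):

```python
import random

def keygen(n, m, q, bound):
    """Toy LWE-ENC key generation; need m*bound < q/10 for correctness."""
    x = [random.randrange(q) for _ in range(n)]                  # private key
    A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
    e = [random.randint(-bound, bound) for _ in range(m)]
    y = [(sum(a * s for a, s in zip(row, x)) + ei) % q
         for row, ei in zip(A, e)]
    return x, (A, y)                                             # (sk, pk)

def encrypt(pk, b, q):
    A, y = pk
    m, n = len(A), len(A[0])
    w = [random.randrange(2) for _ in range(m)]                  # w in {0,1}^m
    a = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]
    sigma = (sum(wi * yi for wi, yi in zip(w, y)) + b * (q // 2)) % q
    return a, sigma                                              # (w^T A, <w,y> + b*floor(q/2))

def decrypt(x, ct, q):
    a, sigma = ct
    d = (sigma - sum(ai * xi for ai, xi in zip(a, x))) % q
    return 0 if min(d, q - d) < q // 10 else 1                   # near 0 mod q -> bit 0
```

Decryption computes σ − ⟨a, x⟩ mod q, which is ⟨w, e⟩ (small) for an encryption of 0 and ⟨w, e⟩ + ⌊q/2⌋ (far from 0) for an encryption of 1.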

Unlike our typical schemes, here it is not immediately clear that this encryption is valid, in the sense that decrypting an encryption of b returns the value b. But this is the case:

Lemma 13.3 With high probability, the decryption of the encryption of b equals b.

Proof. ⟨w^⊤A, x⟩ = ⟨w, Ax⟩. Hence, if y = Ax + e then ⟨w, y⟩ = ⟨w, Ax⟩ + ⟨w, e⟩. But since every coordinate of w is either 0 or 1, |⟨w, e⟩| < δmq < q/10 for our choice of parameters. (Footnote 7) So, we get that if a = w^⊤A and σ = ⟨w, y⟩ + b⌊q/2⌋, then σ − ⟨a, x⟩ = ⟨w, e⟩ + b⌊q/2⌋, which will be smaller than q/10 iff b = 0. ■

Footnote 7: In fact, due to the fact that the signs of the error vector's entries are different, we expect the errors to have significant cancellations, and hence we would expect |⟨w, e⟩| to only be roughly of magnitude √m·δq, but this is not crucial for our discussions.

Figure 13.2: In the encryption scheme LWEENC, the public key is a matrix A′ = (A|y), where y = As + e and s is the secret key. To encrypt a bit b, we choose a random w ← {0, 1}^m and output w^⊤A′ + (0, ..., 0, b⌊q/2⌋). We decrypt c ∈ Z_q^{n+1} to zero with key s iff |⟨c, (s, −1)⟩| ≤ q/10, where the inner product is done modulo q.

We now prove security of the LWE based encryption:

Theorem 13.4 — CPA security of LWEENC. If the LWE conjecture is true then LWEENC is CPA secure.

For a public key encryption scheme with messages that are just bits, CPA security means that an encryption of 0 is indistinguishable from an encryption of 1, even given the public key. Thus Theorem 13.4 will follow from the following lemma:

Lemma 13.5 Let q, m, δ be set as in LWEENC. Then, assuming the LWE conjecture, the following distributions are computationally indistinguishable:

• D: The distribution over four-tuples of the form (A, y, w^⊤A, ⟨w, y⟩), where A is uniform in Z_q^{m×n}, x is uniform in Z_q^n, e ∈ Z_q^m is chosen with e_i ∈ {−δq, ..., +δq}, y = Ax + e, and w is uniform in {0, 1}^m.

• D̄: The distribution over four-tuples (A, y′, a, σ) where all entries are uniform: A is uniform in Z_q^{m×n}, y′ is uniform in Z_q^m, a is uniform in Z_q^n, and σ is uniform in Z_q.

P You should stop here and verify that (i) you understand the statement of Lemma 13.5 and (ii) you understand why this lemma implies Theorem 13.4. The idea is that Lemma 13.5 shows that the concatenation of the public key and encryption of 0 is indistinguishable from something that is completely random. You can then use it to show that the concatenation of the public key and encryption of 1 is indistinguishable from the same thing, and then finish using the hybrid argument.

We now prove Lemma 13.5, which will complete the proof of Theo- rem 13.4.

Proof of Lemma 13.5. Define D to be the distribution (A, y, w^⊤A, ⟨w, y⟩) as in the lemma's statement (i.e., y = Ax + e for some x, e chosen as above). Define D′ to be the distribution (A, y′, w^⊤A, ⟨w, y′⟩) where y′ is chosen uniformly in Z_q^m.

We claim that D′ is computationally indistinguishable from D under the LWE conjecture. Indeed, by Theorem 13.2 (search to decision reduction) this conjecture implies that the distribution X over pairs (A, y) with y = Ax + e is indistinguishable from the distribution X′ over pairs (A, y′) where y′ is uniform. But if there was some polynomial-time algorithm T distinguishing D from D′, then we can design a randomized polynomial-time algorithm T′ distinguishing X from X′ with the same advantage by setting T′(A, y) = T(A, y, w^⊤A, ⟨w, y⟩) for random w ← {0, 1}^m.

We will finish the proof by showing that the distribution D′ is statistically indistinguishable (i.e., has negligible total variation distance) from D̄. This follows from the following claim:

CLAIM: Suppose that m > 100·n log q. If A′ is a random m × (n + 1) matrix over Z_q, then with probability at least 1 − 2^{−n} over the choice of A′, the distribution Z_{A′} over Z_q^{n+1}, which is obtained by choosing w at random in {0, 1}^m and outputting w^⊤A′, has at most 2^{−n} statistical distance from the uniform distribution over Z_q^{n+1}.

Note that the randomness used for the distribution Z_{A′} is only obtained by the choice of w, and not by the choice of A′, which is fixed. (This passes a basic "sanity check", since w has m random bits, while the uniform distribution over Z_q^{n+1} requires (n + 1) log q ≪ m random bits, and hence Z_{A′} at least has a "fighting chance" of being statistically close to it.) Another way to state the same claim is that the pair (A′, w^⊤A′) is statistically indistinguishable from the uniform distribution (A′, z), where z is a vector chosen independently at random from Z_q^{n+1}.

The claim completes the proof of the theorem, since letting A′ be the matrix (A|y) and z = (a, σ), we see that the distribution D′ has the form (A′, z), where A′ is a uniformly random m × (n + 1) matrix and z is sampled from Z_{A′} (i.e., z = w^⊤A′ where w is uniformly chosen in {0, 1}^m). Hence this means that the statistical distance of D′ from D̄ (where all elements are uniform) is O(2^{−n}). (Please make sure you understand this reasoning!)

The proof of this claim relies on the leftover hash lemma. First, the basic idea of the proof: for every m × (n + 1) matrix A′ over Z_q, define h_{A′} : Z_q^m → Z_q^{n+1} to be the map h_{A′}(w) = w^⊤A′. This collection can be shown to be a "good" hash function collection in some specific technical sense, which in particular implies that for every distribution D with much more than (n + 1) log q bits of min-entropy, with all but negligible probability over the choice of A′, h_{A′}(D) is statistically indistinguishable from the uniform distribution. Now when we choose w at random in {0, 1}^m, it is coming from a distribution with m bits of entropy. If m ≫ (n + 1) log q, then because the output of this function is so much smaller than m, we expect it to be completely uniform, and this is what's shown by the leftover hash lemma.
Now we'll formalize this blueprint. First we need the leftover hash lemma.

Lemma 13.6 Fix ε > 0. Let ℋ be a universal hash family with functions h : 풲 → 풱. Let W be a random variable with output in 풲 with H_∞(W) ≥ log |풱| + 2 log(1/ε) − 2. Then (H(W), H), where H follows a uniform distribution over ℋ, has statistical difference less than ε from (V, H), where V is uniform over 풱.

To explain what a universal hash family is: a family ℋ of functions h : 풲 → 풱 is a universal hash family if Pr_{h ← ℋ}[h(x) = h(x′)] ≤ 1/|풱| for all x ≠ x′.

First, let's see why Lemma 13.6 implies the claim. Consider the hash family ℋ = {h_{A′}}, where h_{A′} : Z_q^m → Z_q^{n+1} is defined by h_{A′}(w) = w^⊤A′. For this hash family, the probability over A′ of w ≠ w′ colliding is Pr[w^⊤A′ = w′^⊤A′] = Pr[(w − w′)^⊤A′ = 0]. Since A′ is random, this is 1/q^{n+1}. So ℋ is a universal hash family.

The min entropy of w ← {0, 1}^m is the same as the entropy (because it is uniform), which is m. The output of the hash family is in Z_q^{n+1}, and log |Z_q^{n+1}| = (n + 1) log q. Since m ≥ (n + 1) log q + 20n − 2 by assumption, Lemma 13.6 (with ε = 2^{−10n}) implies that (w^⊤A′, A′) is 2^{−10n}-close in statistical distance to (z, A′), where z is chosen uniformly in Z_q^{n+1}.

Now, we'll show this implies that with probability at least 1 − 2^{−n} over the selection of A′, the statistical distance between w^⊤A′ and z is less than 2^{−n}. If not, the distance between (w^⊤A′, A′) and (z, A′) would be at least 2^{−n} · 2^{−n} > 2^{−10n}, a contradiction.

Now for the proof of Lemma 13.6 (based on notes from Daniel Wichs's class [WZM15]). Let Z be the random variable (H(W), H), where the probability is over H and W. Let Z′ be an independent copy of Z.

Step 1: we'll show that Pr[Z = Z′] ≤ (1 + 4ε²)/(|ℋ|·|풱|).

    Pr[Z = Z′] = Pr[(H(W), H) = (H′(W′), H′)]
               = Pr[H = H′] · Pr[H(W) = H(W′)]
               = (1/|ℋ|) · ( Pr[W = W′] + Pr[H(W) = H(W′) ∧ W ≠ W′] )
               ≤ (1/|ℋ|) · ( 4ε²/|풱| + 1/|풱| )
               = (1 + 4ε²)/(|ℋ|·|풱|).

Step 2: we'll show this implies that the statistical difference between (H(W), H) and (V, H) is less than ε. Denote the statistical difference by Δ((H(W), H), (V, H)).

    Δ((H(W), H), (V, H)) = (1/2) Σ_{h,w} | Pr[Z = (h(w), h)] − 1/(|ℋ|·|풱|) |.

Define x_{h,w} = Pr[Z = (h(w), h)] − 1/(|ℋ|·|풱|) and s_{h,w} = sign(x_{h,w}). Write x for the vector of all the x_{h,w} and s for the vector of all the s_{h,w}. Then

    Δ((H(W), H), (V, H)) = (1/2) ⟨x, s⟩
                         ≤ (1/2) ‖x‖₂ · ‖s‖₂        (Cauchy-Schwarz)
                         = (√(|ℋ|·|풱|)/2) · ‖x‖₂.

Let's expand ‖x‖₂²:

    ‖x‖₂² = Σ_{h,w} ( Pr[Z = (h(w), h)] − 1/(|ℋ|·|풱|) )²
          = Σ_{h,w} ( Pr[Z = (h(w), h)]² − 2·Pr[Z = (h(w), h)]/(|ℋ|·|풱|) + 1/(|ℋ|·|풱|)² )
          ≤ (1 + 4ε²)/(|ℋ|·|풱|) − 2/(|ℋ|·|풱|) + 1/(|ℋ|·|풱|)
          = 4ε²/(|ℋ|·|풱|).

When we plug this in to our expression for the statistical distance, we get

    Δ((H(W), H), (V, H)) ≤ (√(|ℋ|·|풱|)/2) · ‖x‖₂ ≤ (√(|ℋ|·|풱|)/2) · √(4ε²/(|ℋ|·|풱|)) = ε. ■
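The universality computation used in the claim can be verified by brute force for tiny parameters. The helper below is my own; it takes the n = 1 case, where h_A(w) = Σ_i w_i A_i mod q:

```python
import itertools

def collision_rate(w, wp, q):
    """Exact probability over A in Z_q^m that h_A(w) = h_A(wp), where
    h_A(v) = sum_i v_i * A_i (mod q) -- the single-column (n = 1) case."""
    coll = total = 0
    for A in itertools.product(range(q), repeat=len(w)):
        total += 1
        coll += (sum(vi * a for vi, a in zip(w, A)) % q ==
                 sum(vi * a for vi, a in zip(wp, A)) % q)
    return coll / total

# For any fixed pair w != wp, the collision event is (w - wp)^T A = 0,
# which for random A happens with probability exactly 1/q = 1/|V|.
```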

P The proof of Theorem 13.4 is quite subtle and requires some re-reading and thought. To read more about this, you can look at the survey of Oded Regev, “On the Learning with Error Problem” Sections 3 and 4.

13.5 BUT WHAT ARE LATTICES?

You can think of a lattice as a discrete version of a subspace. A lattice L is simply a discrete subset of R^n such that if u, v ∈ L and a, b are integers then au + bv ∈ L. (Footnote 8) A lattice is given by a basis, which is simply a matrix B such that every vector u ∈ L is obtained as u = Bx for some vector of integers x. It can be shown that we can assume without loss of generality that B is full dimensional and hence it's an n by n invertible matrix. Note that given a basis B we can generate vectors in L, as well as test whether a vector v is in L, by testing if B⁻¹v is an integer vector. There can be many different bases for the same lattice, and some of them are easier to work with than others (see Fig. 13.3).

Footnote 8: By discrete we mean that points in L are isolated. One formal way to define it is that there is some ε > 0 such that every distinct u, v ∈ L are of distance at least ε from one another.

Some classical computational questions on lattices are:

• Shortest vector problem: Given a basis B for L, find the nonzero vector v with smallest norm in L.

• Closest vector problem: Given a basis B for L and a vector u that is not in L, find the closest vector to u in L.

Figure 13.3: A lattice is a discrete subspace L ⊆ R^n that is closed under integer combinations. A basis for the lattice L is a minimal set b₁, ..., b_m (typically m = n) such that every u ∈ L is an integer combination of b₁, ..., b_m. The same lattice can have different bases. In this figure the lattice is a set of points in R², and the black vectors v₁, v₂ and the red vectors u₁, u₂ are two alternative bases for it. Generally we consider the basis u₁, u₂ "better" since the vectors are shorter and it is less "skewed".

• Bounded distance decoding: Given a basis B for L and a vector u of the form u = v + e, where v is in L and e is a particularly short "error" vector (so in particular no other vector in the lattice is within distance ‖e‖ of u), recover v. Note that this is a special case of the closest vector problem.

In particular, if V is a linear subspace of Z_q^n, we can think of it also as a lattice V̂ of R^n, where we simply say that a vector û is in V̂ if all of û's coordinates are integers and, if we let u_i = û_i (mod q), then u ∈ V. The learning with errors task of recovering x from Ax + e can then be thought of as an instance of the bounded distance decoding problem for V̂.

A natural algorithm to try to solve the closest vector and bounded distance decoding problems is to take the vector u, express it in the basis B by computing w = B⁻¹u, then round all the coordinates of w to obtain an integer vector w̃, and let v = Bw̃ be a vector in the lattice. If we have an extremely good basis B for the lattice then v may indeed be the closest vector in the lattice, but in other more "skewed" bases it can be extremely far from it.
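The rounding algorithm just described can be sketched in a few lines (plain Gaussian elimination stands in for a linear-algebra library; the helper names are mine):

```python
def solve(B, u):
    """Solve B w = u for w, where B is an invertible n-by-n matrix whose
    columns are the basis vectors. Gaussian elimination with pivoting."""
    n = len(B)
    M = [[float(B[i][j]) for j in range(n)] + [float(u[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))   # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def round_off(B, u):
    """Express u in basis B, round the coordinates to integers, map back.
    Returns a lattice point; close to u only if B is a 'good' basis."""
    w = [round(wi) for wi in solve(B, u)]
    return [sum(B[i][j] * w[j] for j in range(len(w))) for i in range(len(B))]
```

With a near-orthogonal basis this recovers the closest lattice point for small errors; with a skewed basis of the same lattice, the rounding step can land very far away.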

13.6 RING BASED LATTICES

One of the biggest issues with lattice based cryptosystems is the key size. In particular, the scheme above uses an m × n matrix where each entry takes log q bits to describe. (It also encrypts a single bit using a whole vector, but more efficient "multi-bit" variants are known.) Schemes using ideal lattices are an attempt to get more practical variants. These have a very similar structure, except that the matrix A chosen is not completely random but rather can be described by a single vector. One common variant is the following: we fix some polynomial p over Z_q with degree n and then treat vectors in Z_q^n as the coefficients of degree n − 1 polynomials, and always work modulo this polynomial p. (By this I mean that for every polynomial t of degree at least n, we write t as ps + r where p is the polynomial above, s is some polynomial and r is the "remainder" polynomial of degree < n; then t (mod p) = r.) Now for every fixed polynomial t, the operation A_t, defined as s ↦ ts (mod p), is a linear operation mapping polynomials of degree at most n − 1 to polynomials of degree at most n − 1, or, put another way, a linear map over Z_q^n. However, the map A_t can be described using the n coefficients of t, as opposed to the n² entries of a general matrix. It also turns out that by using the Fast Fourier Transform we can evaluate this operation in roughly n log n steps, as opposed to n². The ideal lattice based cryptosystems use matrices of this form to save on key size and computation time.

It is still unclear if this structure can be used for attacks; recent papers attacking principal ideal lattices have shown that one needs to be careful about this. One ideal-lattice based system is the "New Hope" cryptosystem (see also paper) that has been experimented with by Google. People have also made highly optimized general (non ideal) lattice based constructions; see in particular the "Frodo" system (paper here, can you guess what's behind the name?). Both New Hope and Frodo have been submitted to the NIST competition to select a "post quantum" public key encryption standard.

14 Establishing secure connections over insecure channels

We've now compiled all the tools that are needed for the basic goal of cryptography (which is still being subverted quite often): allowing Alice and Bob to exchange messages assuring their integrity and confidentiality over a channel that is observed or controlled by an adversary. Our tools for achieving this goal are:

• Public key (aka asymmetric) encryption schemes.

• Public key (aka asymmetric) digital signatures schemes.

• Private key (aka symmetric) encryption schemes - block ciphers and stream ciphers.

• Private key (aka symmetric) message authentication codes and pseudorandom functions.

• Hash functions that are used both as ways to compress messages for authentication as well as key derivation and other tasks.

The notions of security we require from these building blocks can vary as well. For encryption schemes we talk about CPA (chosen plaintext attack) and CCA (chosen ciphertext attacks), for hash func- tions we talk about collision-resistance, being used (combined with keys) as pseudorandom functions, and then sometimes we simply model those as random oracles. Also, all of those tools require access to a source of randomness, and here we use hash functions as well for entropy extraction.

14.1 CRYPTOGRAPHY’S OBSESSION WITH ADJECTIVES.

As we learn more and more cryptography we see more and more adjectives: every notion seems to have modifiers such as "non-malleable", "leakage-resilient", "identity based", "concurrently secure", "adaptive", "non-interactive", etc. Indeed, this motivated a parody web page of an automatic crypto paper title generator. Unlike algorithms, where typically there are straightforward quantitative tradeoffs (e.g., faster is better), in cryptography there are many qualitative ways protocols can vary based on the assumptions they operate under and the notions of security they provide.

In particular, the following issues arise when considering the task of securely transmitting information between two parties Alice and Bob:

• Infrastructure/setup assumptions: What kind of setup can Alice and Bob rely upon? For example, in the TLS protocol, typically Alice is a website and Bob is a user; using the infrastructure of certificate authorities, Bob has a trusted way to obtain Alice's public signature key, while Alice doesn't know anything about Bob. But there are many other variants as well. Alice and Bob could share a (low entropy) password. One of them might have some hardware token, or they might have a secure out of band channel (e.g., text messages) to transmit a short amount of information. There are even variants where the parties authenticate by something they know, with one recent example being the notion of witness encryption (Garg, Gentry, Sahai, and Waters), where one can encrypt information in a "digital time capsule" to be opened by anyone who, for example, finds a proof of the Riemann hypothesis.

• Adversary access: What kind of attacks do we need to protect against? The simplest setting is a passive eavesdropping adversary (often called "Eve"), but we sometimes consider active person-in-the-middle attacks (the attacker sometimes called "Mallory"). We sometimes consider notions of graceful recovery. For example, if the adversary manages to hack into one of the parties then it can clearly read their communications from that time onwards, but we would want their past communication to be protected (a notion known as forward secrecy). If we rely on trusted infrastructure such as certificate authorities, we could ask what happens if the adversary breaks into those. Sometimes we rely on the security of several entities or secrets, and we want to consider adversaries that control some but not all of them, a notion known as threshold cryptography.

• Interaction: Do Alice and Bob get to interact and relay several messages back and forth or is it a “one shot” protocol? You may think that this is merely a question about efficiency but it turns out to be crucial for some applications. Sometimes Alice and Bob might not be two parties separated in space but the same party separated in time. That is, Alice wishes to send a message to her future self by storing an encrypted and authenticated version of it establishing secure connections over insecure channels 257

on some media. In this case, absent a time machine, back and forth interaction between the two parties is obviously impossible.

• Security goal: The security goals of a protocol are usually stated in the negative: what does it mean for an adversary to win the security game? We typically want the adversary to learn absolutely no information about the secret beyond what she obviously can. For example, if we use a shared password chosen out of t possibilities, then we might need to allow the adversary success probability 1/t, but we wouldn't want her to get anything beyond 1/t + negl(n). In some settings, the adversary can obviously completely disconnect the communication channel between Alice and Bob, but we want her to be essentially limited to either dropping communication completely or letting it go by unmolested, and not have the ability to modify communication without detection. Then in some settings, such as in the case of steganography and anonymous routing, we would want the adversary not to find out even the fact that a conversation had taken place.

14.2 BASIC KEY EXCHANGE PROTOCOL

The basic primitive for secure communication is a key exchange protocol, whose goal is to have Alice and Bob share a common random secret key k ∈ {0, 1}^n. Once this is done, they can use a CCA secure / authenticated private-key encryption to communicate with confidentiality and integrity.

The canonical example of a basic key exchange protocol is the Diffie-Hellman protocol. It uses as public parameters a group 𝔾 with generator g, and then follows the following steps:

1. Alice picks random a ← {0, ..., |𝔾| − 1} and sends A = g^a.

2. Bob picks random b ← {0, ..., |𝔾| − 1} and sends B = g^b.

3. They both set their key as k = H(g^{ab}) (which Alice computes as H(B^a) and Bob computes as H(A^b)), where H is some hash function.

Another variant is using an arbitrary public key encryption scheme such as RSA:

1. Alice generates keys (d, e) and sends e to Bob.

2. Bob picks random k ← {0, 1}^n and sends E_e(k) to Alice.

3. They both set their key to k (which Alice computes by decrypting Bob's ciphertext).
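A toy instantiation of the Diffie-Hellman variant above can be sketched as follows. The small Mersenne prime, the element g = 3, and SHA-256 as the hash H are my illustrative choices; real deployments use standardized large groups:

```python
import hashlib
import random

p = 2**61 - 1    # a Mersenne prime, illustrative stand-in for a proper group modulus
g = 3            # assumed group element g in Z_p^* (choice is illustrative)

def dh_keygen():
    """Return (secret exponent a, public value A = g^a mod p)."""
    a = random.randrange(1, p - 1)
    return a, pow(g, a, p)

def dh_shared(secret, other_public):
    """k = H(g^(ab)): both sides derive the same key from the other's message."""
    gab = pow(other_public, secret, p)
    return hashlib.sha256(str(gab).encode()).digest()
```

Alice runs `dh_keygen()` to get (a, A), Bob gets (b, B); after exchanging A and B, `dh_shared(a, B)` and `dh_shared(b, A)` produce the same key because B^a = g^{ab} = A^b.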

Under plausible assumptions, it can be shown that these protocols are secure against a passive eavesdropping adversary Eve. The notion of security here means that, similar to encryption, if after observing the transcript Eve receives with probability 1/2 the value of k and with probability 1/2 a random string k′ ← {0, 1}^n, then her probability of guessing which is the case would be at most 1/2 + negl(n) (where n can be thought of as log |𝔾| or some other parameter related to the length of bit representation of members in the group).

14.3 AUTHENTICATED KEY EXCHANGE

The main issue with this key exchange protocol is of course that adversaries often are not passive. In particular, an active Eve could agree on her own key with Alice and Bob separately and then be able to see and modify all future communication. She might also be able to create weird (with some potential security implications) correlations by, say, modifying the message A to be A², etc.

For this reason, in actual applications we typically use authenticated key exchange. The notion of authentication used depends on what we can assume on the setup assumptions. A standard assumption is that Alice has some public keys but Bob doesn't. (This is the case when Alice is a website and Bob is a user.) However, one needs to take care in how to use this assumption. Indeed, the standard protocol for securing the web, the Transport Layer Security (TLS) protocol (and its predecessor SSL), has gone through six revisions (including a name change from SSL to TLS) largely because of security concerns. We now illustrate one of those attacks.

14.3.1 Bleichenbacher's attack on RSA PKCS ♯1 V1.5 and SSL V3.0

If you have a public key, a natural approach is to take the encryption-based protocol and simply skip the first step since Bob already knows the public key e of Alice. This is basically what happened in the SSL V3.0 protocol. However, as was shown by Bleichenbacher in 1998, it turns out this is susceptible to the following attack:

• The adversary listens in on a conversation, and in particular observes c = E_e(k), where k is the secret key.

• The adversary then starts many connections with the server with ciphertexts related to c, and observes whether they succeed or fail (and in what way they fail, if they do). It turns out that based on this information, the adversary would be able to recover the key k.

Specifically, the version of RSA (known as PKCS ♯1 V1.5) used in the SSL V3.0 protocol requires the value x to have a particular format, with the top two bytes having a certain form. If in the course of the protocol, a server decrypts y and gets a value x not of this form, then it would send an error message and halt the connection. While the designers of SSL V3.0 might not have thought of it that way, this amounts to saying that an SSL V3.0 server supplies to any party an oracle that on input y outputs 1 iff y^d (mod m) has this form, where d = e⁻¹ (mod |Z*_m|) is the secret decryption key. It turned out that one can use such an oracle to invert the RSA function. For a result of a similar flavor, see the (1/2 page) proof of Theorem 11.31 (page 418) in KL, where they show that an oracle that given y outputs the least significant bit of y^d (mod m) allows to invert the RSA function. (Footnote 1)

For this reason, new versions of the SSL used a different variant of RSA known as PKCS ♯1 V2.0, which satisfies (under assumptions) chosen ciphertext security (CCA) and in particular such oracles cannot be used to break the encryption. (Nonetheless, there are still some implementation issues that allowed to perform some attacks; see the note in KL page 425 on Manger's attack.)

Footnote 1: The first attack of this flavor was given in the 1982 paper of Goldwasser, Micali, and Tong. Interestingly, this notion of "hardcore bits" has been used for both practical attacks against cryptosystems as well as theoretical (and sometimes practical) constructions of other cryptosystems.

14.4 CHOSEN CIPHERTEXT ATTACK SECURITY FOR PUBLIC KEY CRYPTOGRAPHY

The concept of chosen ciphertext attack security makes perfect sense for public key encryption as well. It is defined in the same way as it was in the private key setting:

Definition 14.1 — CCA secure public key encryption. A public key encryption scheme (G, E, D) is chosen ciphertext attack (CCA) secure if every efficient Mallory wins in the following game with probability at most 1/2 + negl(n):

• The keys (e, d) are generated via G(1^n), and Mallory gets the public encryption key e and 1^n.

• For poly(n) rounds, Mallory gets access to the function c ↦ D_d(c). (She doesn't need access to m ↦ E_e(m) since she already knows e.)

• Mallory chooses a pair of messages {m₀, m₁}, a secret b is chosen at random in {0, 1}, and Mallory gets c* = E_e(m_b). (Note that she of course does not get the randomness used to generate this challenge encryption.)

• Mallory now gets another poly(n) rounds of access to the function c ↦ D_d(c), except that she is not allowed to query c*.

• Mallory outputs b′ and wins if b′ = b.

In the private key setting, we achieved CCA security by combining a CPA-secure private key encryption scheme with a message authentication code (MAC), where to CCA-encrypt a message m, we first used the CPA-secure scheme on m to obtain a ciphertext c, and then added an authentication tag τ by signing c with the MAC. The decryption algorithm first verified the MAC before decrypting the ciphertext. In the public key setting, one might hope that we could repeat the same construction using a CPA-secure public key encryption and replacing the MAC with digital signatures.

P Try to think what would be such a construction, and whether there is a fundamental obstacle to combin- ing digital signatures and public key encryption in the same way we combined MACs and private key encryption.

Alas, as you may have realized, there is a fly in this ointment. In a signature scheme (necessarily) it is the signing key that is secret, and the verification key that is public. But in a public key encryption, the encryption key is public, and hence it makes no sense for it to use a secret signing key. (It's not hard to see that if you reveal the secret signing key then there is no point in using a signature scheme in the first place.)

Why CCA security matters. For the reasons above, constructing CCA secure public key encryption is very challenging. But is it worth the trouble? Do we really need this "ultra conservative" notion of security? The answer is yes. Just as we argued for private key encryption, chosen ciphertext security is the notion that gets us as close as possible to designing encryptions that fit the metaphor of secure sealed envelopes. Digital analogies will never be a perfect imitation of physical ones, but such metaphors are what people have in mind when designing cryptographic protocols, which is a hard enough task even when we don't have to worry about the ability of an adversary to reach inside a sealed envelope and XOR the contents of the note written there with some arbitrary string. Indeed, several practical attacks, including Bleichenbacher's attack above, exploited exactly this gap between the physical metaphor and the digital realization. For more on this, please see Victor Shoup's survey where he also describes the Cramer-Shoup encryption scheme, which was the first practical public key system to be shown CCA secure without resorting to the random oracle heuristic. (The first definition of CCA security, as well as the first polynomial-time construction, was given in a seminal 1991 work of Dolev, Dwork and Naor.)

14.5 CCA SECURE PUBLIC KEY ENCRYPTION IN THE RANDOM ORACLE MODEL

We now show how to convert any CPA-secure public key encryption scheme to a CCA-secure scheme in the random oracle model (this construction is taken from Fujisaki and Okamoto, CRYPTO 99). In the homework, you will see a somewhat simpler direct construction of a CCA secure scheme from a trapdoor permutation; a variant of it, known as OAEP (which has better ciphertext expansion), has been standardized as PKCS #1 v2.0 and is used in several protocols. The advantage of a generic construction is that it can be instantiated not just with the RSA and Rabin schemes, but also directly with Diffie-Hellman and lattice based schemes (though there are direct and more efficient variants for these as well).

CCA-ROM-ENC Scheme:

• Ingredients: A public key encryption scheme $(G', E', D')$ and two hash functions $H, H' : \{0,1\}^* \to \{0,1\}^n$ (which we model as two independent random oracles; recall that it's easy to obtain two independent random oracles from a single oracle $H''$, for example by letting $H(x) = H''(0\|x)$ and $H'(x) = H''(1\|x)$).

• Key generation: We generate keys $(e, d) = G'(1^n)$ for the underlying encryption scheme.

• Encryption: To encrypt a message $m \in \{0,1\}^\ell$, we select randomness $r \leftarrow_R \{0,1\}^n$ for the underlying encryption algorithm and output $E_e(m) = E'_e(r; H(m\|r)) \| (H'(r) \oplus m)$, where by $E'_e(m'; r')$ we denote the result of encrypting the plaintext $m'$ using the key $e$ and the randomness $r'$ (we assume the scheme takes $n$ bits of randomness as input; otherwise modify the output length of $H$ accordingly).

• Decryption: To decrypt a ciphertext $c\|y$, first let $r = D'_d(c)$ and $m = H'(r) \oplus y$, and then check that $c = E'_e(r; H(m\|r))$. If the check fails we output error; otherwise we output $m$.

Theorem 14.2 — CCA security from random oracles. The above CCA-ROM-ENC scheme is CCA secure.

Proof. Note: The proof here refers to the original scheme in the notes (which was not secure) - should be updated by the scribes to the correct proof as presented in lecture.

Suppose towards a contradiction that there exists an adversary $M$ that wins the CCA game with probability at least $1/2 + \epsilon$ where $\epsilon$ is non-negligible. Our aim is to show that the decryption box would be "useless" to $M$ and hence reduce CCA security to CPA security (which we'll then derive from the CPA security of the underlying scheme).

Consider the following "box" $\hat{D}$ that will answer decryption queries $c\|y\|z$ of the adversary as follows:

* If $z$ was returned before to the adversary as an answer to $H'(m\|r)$ for some $m, r$, and $c = E'_e(r; H(m\|r))$ and $y = r \oplus m$, then return $m$.

* Otherwise return error.

Claim: The probability that $\hat{D}$ answers a query differently than $D$ is negligible.

Proof of claim: If $\hat{D}$ gives a non error response to a query $c\|y\|z$ then it must be that $z = H'(m\|r)$ for some $m, r$ such that $y = r \oplus m$ and $c = E'_e(r; H(m\|r))$, in which case $D$ will return $m$ as well. The only way that $D$ will answer a query differently is if $z = H'(m\|r)$ but the query $m\|r$ hasn't been asked before by the adversary. Here there are two options. If this query has never been asked before at all, then by the lazy evaluation principle in this case we can think of $H'(m\|r)$ as being independently chosen at this point, and the probability it happens to equal $z$ will be $2^{-n}$. If this query was asked by someone apart from the adversary, then it could only have been asked by the encryption oracle while producing the challenge ciphertext $c^*\|y^*\|z^*$, but since the adversary is not allowed to ask this precise ciphertext, it must be a ciphertext of the form $c\|y\|z^*$ where $(c, y) \neq (c^*, y^*)$, and such a ciphertext would get an error response from both oracles. QED (claim)

Note that we can assume without loss of generality that if $m^*$ is the challenge message and $r^*$ is the randomness chosen in this challenge, the adversary never asks the query $m^*\|r^*$ to its $H$ or $H'$ oracles, since we can modify it so that before making a query $m\|r$, it will first check whether $E'_e(r; H(m\|r)) = c^*$ where $c^*\|y^*\|z^*$ is the challenge ciphertext, and if so use this to win the game.
In other words, we can modify the experiment so that the values $R^* = H(m^*\|r^*)$ and $z^* = H'(m^*\|r^*)$ chosen while producing the challenge ciphertext are simply random strings chosen completely independently of everything else. Now note that our oracle $\hat{D}$ did not need to use the decryption key $d$. So, if the adversary wins the CCA game, then it wins the CPA game for the encryption scheme $E_e(m) = E'_e(r; R)\|(r \oplus m)\|R'$ where $R$ and $R'$ are simply independent random strings; we leave proving that this scheme is CPA secure as an exercise to the reader. ■
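To make the construction concrete, here is a toy Python sketch of the CCA-ROM-ENC transform. The random oracles are stood in by domain-separated SHA-256, and the underlying CPA scheme $(G', E', D')$ is replaced by a deliberately insecure symmetric toy (the names `gen_keys`, `Ep`, `Dp` are ours, not from the notes); the point is only to illustrate the flow $E_e(m) = E'_e(r; H(m\|r)) \| (H'(r) \oplus m)$ and the re-encryption check during decryption, not to provide a secure implementation.

```python
import hashlib
import os

N = 32  # bytes; messages in this sketch are exactly N bytes long

def H(data: bytes) -> bytes:
    """Stand-in for random oracle H (domain-separated SHA-256)."""
    return hashlib.sha256(b"H0" + data).digest()

def Hp(data: bytes) -> bytes:
    """Stand-in for random oracle H' (domain-separated SHA-256)."""
    return hashlib.sha256(b"H1" + data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Toy stand-in for the underlying CPA scheme (G', E', D').  It is NOT
# secure (it is symmetric, and the "public" key equals the secret key);
# it only exposes the interface the transform needs: E'(e, plaintext; rho).
def gen_keys():
    k = os.urandom(N)
    return k, k  # (e, d): one key plays both roles in this toy scheme

def Ep(e: bytes, plaintext: bytes, rho: bytes) -> bytes:
    pad = hashlib.sha256(b"E" + e + rho).digest()
    return rho + xor(plaintext, pad)

def Dp(d: bytes, c: bytes) -> bytes:
    rho, body = c[:N], c[N:]
    pad = hashlib.sha256(b"E" + d + rho).digest()
    return xor(body, pad)

def encrypt(e: bytes, m: bytes) -> bytes:
    """E_e(m) = E'_e(r; H(m||r)) || (H'(r) xor m)."""
    r = os.urandom(N)
    return Ep(e, r, H(m + r)) + xor(Hp(r), m)

def decrypt(d: bytes, e: bytes, ct: bytes):
    """Recover r, then m, then re-encrypt to check well-formedness."""
    c, y = ct[:-N], ct[-N:]
    r = Dp(d, c)
    m = xor(Hp(r), y)
    if Ep(e, r, H(m + r)) != c:  # the re-encryption check
        return None              # "error": ciphertext rejected
    return m
```

Flipping even one bit of the ciphertext changes the recovered message, hence the recomputed randomness $H(m\|r)$, hence the re-encryption, so the check fails; this is exactly what makes a decryption oracle useless to a CCA adversary.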

14.5.1 Defining secure authenticated key exchange
The basic goal of secure communication is to set up a secure channel between two parties Alice and Bob. We want to do so over an open network, where messages between Alice and Bob might be read, modified, deleted, or added by the adversary. Moreover, we want Alice and Bob to be sure that they are talking to one another rather than to other parties. This raises the question of what identity is and how it is verified. Ultimately, if we want to use identities, then we need to trust some authority that decides which party has which identity. This is typically done via a certificate authority (CA). This is some trusted authority, whose verification key $v_{CA}$ is public and known to all parties. Alice proves in some way to the CA that she is indeed Alice, then generates a pair $(s_{Alice}, v_{Alice})$, and gets from the CA the message "the key $v_{Alice}$ belongs to Alice" signed with $s_{CA}$; call this signature $\sigma_{Alice}$. (The registration process could be more subtle than that; for example, Alice might need to prove to the CA that she does indeed know the corresponding secret key.) Now Alice can send $(v_{Alice}, \sigma_{Alice})$ to Bob to certify that the owner of this public key is indeed Alice.

For example, in the web setting, certain certificate authorities can certify that a certain public key is associated with a certain website. If you go to a website using the https protocol, you should see a "lock" symbol on your browser which will give you details on the certificate. Often the certificate is a chain of certificates. If I click on this lock symbol in my Chrome browser, I see that the certificate that amazon.com's public key is some particular string (corresponding to a 2048 bit RSA modulus and exponent) is signed by the Symantec Certificate Authority, whose own key is certified by Verisign. My communication with Amazon is an example of a setting of one sided authentication.
It is important for me to know that I am truly talking to amazon.com, while Amazon is willing to talk to any client. (Though of course once we establish a secure channel, I could use it to log in to my Amazon account.) Chapter 21 of Boneh Shoup contains an in depth discussion of authenticated key exchange protocols.

P You should stop here and read Section 21.9 of Boneh Shoup with the formal definitions of authenticated key exchange, going back as needed to the previous section for the definitions of protocols AEK1 - AEK4.

14.5.2 The compiler approach for authenticated key exchange There is a generic “compiler” approach to obtaining authenticated key exchange protocols:

• Start with a protocol such as the basic Diffie-Hellman protocol that is only secure with respect to a passive eavesdropping adversary.

• Then compile it into a protocol that is secure with respect to an active adversary using authentication tools such as digital signatures, message authentication codes, etc., depending on what kind of setup you can assume and what properties you want to achieve.

This approach has the advantage of being modular in both the construction and the analysis. However, direct constructions might be more efficient. There are a great many potentially desirable properties of key exchange protocols, and different protocols achieve different subsets of these properties at different costs. The most common variant of authenticated key exchange protocols is to use some version of the Diffie-Hellman key exchange. If both parties have public signature keys, then they can simply sign their messages, which effectively rules out an active attack, reducing active security to passive security (though one needs to include identities in the signatures to ensure non-repetition of messages, see here). The most efficient variants of Diffie-Hellman achieve authentication implicitly, where the basic protocol remains the same (sending $X = g^x$ and $Y = g^y$) but the computation of the secret shared key involves some authentication information. Of these protocols a particularly efficient variant is the MQV protocol of Law, Menezes, Qu, Solinas and Vanstone (which is based on similar principles as DSA signatures), and its variant HMQV by Krawczyk that has some improved security properties and analysis.
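As an illustration of the compiler approach, here is a minimal Python sketch of "sign your Diffie-Hellman messages." Everything here is a stand-in: the group parameters are toy values (not a vetted DH group), and the `sign`/`verify` interface is simulated with HMAC under pre-shared long-term keys, whereas a real compiler would use public-key signatures with CA-certified verification keys. The identities baked into the signed messages illustrate the point above about including identities to prevent message replay.

```python
import hashlib
import hmac
import secrets

# Toy parameters: 2**127 - 1 is prime, but this is NOT a standardized
# group; real deployments use vetted groups or elliptic curves.
P = 2**127 - 1
g = 3

def sign(key: bytes, msg: bytes) -> bytes:
    # Hypothetical signing interface; HMAC with a long-term key is only a
    # stand-in showing WHERE authentication attaches to the message flow.
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify(key: bytes, msg: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(key, msg), sig)

# Long-term authentication keys (assumed certified out of band).
alice_lt = secrets.token_bytes(32)
bob_lt = secrets.token_bytes(32)

# Alice -> Bob: X = g^x, signed, with the identities named in the message.
x = secrets.randbelow(P - 2) + 1
X = pow(g, x, P)
msg_a = b"Alice->Bob|" + X.to_bytes(16, "big")
sig_a = sign(alice_lt, msg_a)

# Bob -> Alice: Y = g^y, likewise signed and identity-bound.
y = secrets.randbelow(P - 2) + 1
Y = pow(g, y, P)
msg_b = b"Bob->Alice|" + Y.to_bytes(16, "big")
sig_b = sign(bob_lt, msg_b)

# Each side verifies the other's signature before deriving the shared key.
assert verify(alice_lt, msg_a, sig_a) and verify(bob_lt, msg_b, sig_b)
k_alice = hashlib.sha256(pow(Y, x, P).to_bytes(16, "big")).digest()
k_bob = hashlib.sha256(pow(X, y, P).to_bytes(16, "big")).digest()
assert k_alice == k_bob
```

An active adversary who substitutes its own $g^{x'}$ cannot produce a valid signature on the replaced message, which is exactly how the compiler reduces active security to the passive security of the underlying protocol.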

14.6 PASSWORD AUTHENTICATED KEY EXCHANGE.

To be completed (the most natural candidate: use MACs with a password-derived key to authenticate communication - completely fails)

P Please skim Boneh Shoup Chapter 21.11

14.7 CLIENT TO CLIENT KEY EXCHANGE FOR SECURE TEXT MES- SAGING - ZRTP, OTR, TEXTSECURE

To be completed. See Matthew Green's blog, TextSecure, OTR. Security requirements: forward secrecy, deniability.

14.8 HEARTBLEED AND LOGJAM ATTACKS

• Vestiges of past crypto policies.

• Importance of "perfect forward secrecy"

IV ADVANCED TOPICS

15 Zero knowledge proofs

The notion of proof is central to so many fields. In mathematics, we want to prove that a certain assertion is correct. In other sciences, we often want to accumulate a preponderance of evidence (or statistical significance) to reject certain hypotheses. In criminal law the prosecution famously needs to prove its case "beyond a reasonable doubt". Cryptography turns out to give some new twists on this ancient notion. Typically a proof that some assertion X is true also reveals some information about why X is true. When Hercule Poirot proves that Norman Gale killed Madame Giselle, he does so by showing how Gale committed the murder by dressing up as a flight attendant and stabbing Madame Giselle with a poisoned dart. Could Hercule convince us beyond a reasonable doubt that Gale did the crime without giving any information on how the crime was committed? Can the Russians prove to the U.S. that a sealed box contains an authentic nuclear warhead without revealing anything about its design? Can I prove to you that the number

m = 385,608,108,395,369,363,400,501,273,594,475,104,405,448,848,047,062,278,473,983

has a prime factor whose last digit is $7$, without giving you any information about $m$'s prime factors? (In case you are curious, the factors of $m$ are 1,172,192,558,529,627,184,841,954,822,099 and 328,963,108,995,562,790,517,498,071,717.) We won't answer the first question, but will show some insights on the latter two. Zero knowledge proofs are proofs that fully convince that a statement is true without yielding any additional knowledge. So, after seeing a zero knowledge proof that $m$ has a factor ending with $7$, you'll be no closer to knowing $m$'s factorization than you were before. Zero knowledge proofs were invented by Goldwasser, Micali and Rackoff in 1982 and have since been used in a great many settings. How would you achieve such a thing, or even define it? And why on earth would it be useful? This is the topic of this lecture.


15.1 APPLICATIONS FOR ZERO KNOWLEDGE PROOFS.

Before we talk about how to achieve zero knowledge, let us discuss some of its potential applications:

15.1.1 Nuclear disarmament
The United States and Russia have reached a dangerous and expensive equilibrium by which each has about 7000 nuclear warheads, much more than is needed to decimate each others' population (and the population of much of the rest of the world). (To be fair, "only" about 170 million Americans live in the 50 largest metropolitan areas, and so arguably many people would survive at least the initial impact of a nuclear war, though it has been estimated that even a "small" nuclear war involving detonation of 100 not too large warheads could have devastating global consequences.) Having so many weapons increases the chance of "leakage" of weapons, or of an accidental launch (which can result in an all out war) through fault in communications or rogue commanders. This also threatens the delicate balance of the Non-Proliferation Treaty, which at its core is a bargain where non-weapons states agree not to pursue nuclear weapons and the five nuclear weapon states agree to make progress on nuclear disarmament. These huge quantities of nuclear weapons are not only dangerous, as they increase the chance of a leak or of an individual failure or rogue commander causing a world catastrophe, but also extremely expensive to maintain.

For all of these reasons, in 2009, U.S. President Obama called to set as a long term goal a "world without nuclear weapons" and in 2012 talked about concretely talking to Russia about reducing "not only our strategic nuclear warheads, but also tactical weapons and warheads in reserve". On the other side, Russian President Putin said already in 2000 that he sees "no obstacles that could hamper future deep cuts of strategic offensive armaments". (Though as of 2018, political winds on both sides have shifted away from disarmament and more toward armament.)
There are many reasons why progress on nuclear disarmament has been so slow, and most of them have nothing to do with zero knowledge or any other piece of technology. But there are some technical hurdles as well. One of those hurdles is that for the U.S. and Russia to go beyond restricting the number of deployed weapons to significantly reducing the stockpiles, they need to find a way for one country to verifiably prove that it has dismantled warheads. As mentioned in my work with Glaser and Goldston (see also this page), a key stumbling block is that the design of a nuclear warhead is of course highly classified and about the last thing in the world that the U.S. would like to share with Russia and vice versa. So, how can the U.S. convince the Russians that it has destroyed a warhead, when it cannot let Russian experts anywhere near it?

15.1.2 Voting Electronic voting has been of great interest for many reasons. One potential advantage is that it could allow completely transparent vote counting, where every citizen could verify that the votes were counted correctly. For example, Chaum suggested an approach to do so by publishing an encryption of every vote and then having the central authority prove that the final outcome corresponds to the counts of all the plaintexts. But of course to maintain voter privacy, we need to prove this without actually revealing those plaintexts. Can we do so?

15.1.3 More applications
I chose these two examples above precisely because they are hardly the first that come to mind when thinking about zero knowledge. Zero knowledge has been used for many cryptographic applications. One such application (originating from work of Fiat and Shamir) is the use for identification protocols. Here Alice knows a solution $x$ to a puzzle $P$, and proves her identity to Bob by, for example, providing an encryption $c$ of $x$ and proving in zero knowledge that $c$ is indeed an encryption of a solution for $P$. (As we'll see, technically what Alice needs to do in such a scenario is use a zero knowledge proof of knowledge of a solution for $P$.) Bob can verify the proof, but because it is zero knowledge, he learns nothing about the solution of the puzzle and will not be able to impersonate Alice. An alternative approach to such identification protocols is through using digital signatures; this connection goes both ways and zero knowledge proofs have been used by Schnorr and others as a basis for signature schemes.

Another very generic application is for "compiling protocols". As we've seen time and again, it is often much easier to handle passive adversaries than active ones. (For example, it's much easier to get CPA security against the eavesdropping Eve than CCA security against the person-in-the-middle Mallory.) Thus it would be wonderful if we could "compile" a protocol that is secure with respect to passive attacks into one that is secure with respect to active ones. As was first shown by Goldreich, Micali, and Wigderson, zero knowledge proofs yield a very general such compiler. The idea is that all parties prove in zero knowledge that they follow the protocol's specifications. Normally, such proofs might require the parties to reveal their secret inputs, hence violating security, but zero knowledge precisely guarantees that we can verify correct behaviour without access to these inputs.

15.2 DEFINING AND CONSTRUCTING ZERO KNOWLEDGE PROOFS

So, zero knowledge proofs are wonderful objects, but how do we get them? In fact, we haven't answered the even more basic question of how do we define zero knowledge? We have to start with the most basic task of defining what we mean by a proof.

A proof system can be thought of as an algorithm $V$ (for "verifier") that takes as input a statement $x$, which is some string, and another string $\pi$ known as the proof, and outputs $1$ if and only if $\pi$ is a valid proof that the statement $x$ is correct. For example:

• In Euclidean geometry, statements are geometric facts such as "in any triangle the degrees sum to 180 degrees" and the proofs are step by step derivations of the statements from the five basic postulates.

• In Zermelo-Fraenkel + Axiom of Choice (ZFC) a statement is some purported fact about sets (e.g., the Riemann Hypothesis), and a proof is a step by step derivation of it from the axioms. (Integers can be coded as sets in various ways; for example, one can encode $0$ as $\emptyset$, and if $N$ is the set encoding $n$, we can encode $n+1$ using the $(n+1)$-element set $\{N\} \cup N$.)

• We can define many other "theories". For example, a theory where the statements are pairs $(x, m)$ such that $x$ is a quadratic residue modulo $m$, and a proof for such a statement is a number $s$ such that $x = s^2 \pmod{m}$; or a theory where the theorems are Hamiltonian graphs $G$ (graphs on $n$ vertices that contain an $n$-long cycle) and the proofs are descriptions of the cycle.

All these proof systems have the property that the verifying algorithm $V$ is efficient. Indeed, that's the whole point of a proof $\pi$: it's a sequence of symbols that makes it easy to verify that the statement is true.

To achieve the notion of zero knowledge proofs, Goldwasser and Micali had to consider a generalization of proofs from static sequences of symbols to interactive probabilistic protocols between a prover and a verifier. Let's start with an informal example. The vast majority of humans have three types of cone cells in their eyes. This is why we perceive the sky as blue (see also this): despite its color being quite a different spectrum than the blue of the rainbow, the projection of the sky's color to our cones is closest to the projection of blue. It has been suggested that a tiny fraction of the human population might have four functioning cones (in fact, only women, as it would require two X chromosomes and a certain mutation). How would a person prove to another that she is in fact such a tetrachromat?

Proof of tetrachromacy: Suppose that Alice is a tetrachromat and can distinguish between the colors of two pieces of plastic that would be identical to a trichromat.
She wants to prove to a trichromat Bob that the two pieces are not identical. She can do this as follows: Alice and Bob will repeat the following experiment $n$ times: Alice turns her back and Bob tosses a coin and with probability 1/2 leaves the pieces

as they are, and with probability 1/2 switches the right piece with the left piece. Alice needs to guess whether Bob switched the pieces or not. If Alice is successful in all of the $n$ repetitions then Bob will have $1 - 2^{-n}$ confidence that the pieces are truly different.

We now consider a more "mathematical" example along similar lines. Recall that if $x$ and $m$ are numbers then we say that $x$ is a quadratic residue modulo $m$ if there is some $s$ such that $x = s^2 \pmod{m}$. Let us define the function NQR$(m, x)$ to output $1$ if and only if $x \neq s^2 \pmod{m}$ for every $s \in \{0, \ldots, m-1\}$. There is a very simple way to prove statements of the form "NQR$(m, x) = 0$": just give out $s$. However, here is an interactive proof system to prove statements of the form "NQR$(m, x) = 1$":

• We have two parties: Alice and Bob. The common input is $(m, x)$ and Alice wants to convince Bob that NQR$(m, x) = 1$. (That is, that $x$ is not a quadratic residue modulo $m$.)

• We assume that Alice can compute NQR$(m, w)$ for every $w \in \{0, \ldots, m-1\}$ but Bob is polynomial time.

• The protocol will work as follows:

1. Bob will pick some random $s \in \mathbb{Z}_m^*$ (e.g., by picking a random number in $\{1, \ldots, m-1\}$ and discarding it if it has nontrivial g.c.d. with $m$) and toss a coin $b \in \{0,1\}$. If $b = 0$ then Bob will send $s^2 \pmod{m}$ to Alice, and otherwise he will send $xs^2 \pmod{m}$ to Alice.

2. Alice will use her ability to compute NQR$(m, \cdot)$ to respond with $b' = 0$ if Bob sent a quadratic residue and with $b' = 1$ otherwise.

3. Bob accepts the proof if $b' = b$.

To see that Bob will indeed accept the proof, note that if $x$ is a non-residue then $xs^2$ will have to be a non-residue as well (since if it had a root $s'$ then $s's^{-1}$ would be a root of $x$). Hence it will always be the case that $b' = b$.

Moreover, if $x$ was a quadratic residue of the form $x = s'^2 \pmod{m}$ for some $s'$, then $xs^2 = (s's)^2$ is simply a random quadratic residue, which means that in this case Bob's message is distributed the same regardless of whether $b = 0$ or $b = 1$, and no matter what she does, Alice has probability at most $1/2$ of guessing $b$. Hence if Alice is always successful then after $n$ repetitions Bob would have $1 - 2^{-n}$ confidence that $x$ is indeed a non-residue modulo $m$.
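A toy simulation of one round of this protocol may help; the modulus below is our own small illustrative choice, and Alice's computationally unbounded power is played by brute-force search (which is only feasible because the example is tiny).

```python
import math
import random

def is_qr(m: int, x: int) -> bool:
    """Brute-force quadratic residue test.  Alice is computationally
    unbounded in the protocol, so exhaustive search is fair game here."""
    return any(pow(s, 2, m) == x % m for s in range(m))

def nqr_round(m: int, x: int) -> bool:
    """One round of the interactive proof that NQR(m, x) = 1.
    Returns True iff Bob accepts the round."""
    # Bob: pick s coprime to m and a secret coin b.
    s = random.randrange(1, m)
    while math.gcd(s, m) != 1:
        s = random.randrange(1, m)
    b = random.randrange(2)
    challenge = pow(s, 2, m) if b == 0 else (x * pow(s, 2, m)) % m
    # Alice: answer 0 iff the challenge is a quadratic residue.
    b_prime = 0 if is_qr(m, challenge) else 1
    return b == b_prime

# 5 is a non-residue modulo 21 (no square lands on 5 mod 21), so the
# honest Alice guesses Bob's coin correctly in every round.
m, x = 21, 5
assert all(nqr_round(m, x) for _ in range(25))
```

Running the same rounds with a residue such as $x = 16$ would make the challenge distribution independent of $b$, so any strategy for Alice succeeds in each round with probability only $1/2$.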

P Please stop and make sure you see the similarities be- tween this protocol and the one for demonstrating that the two pieces of plastic do not have identical colors.

Let us now make the formal definition:

Definition 15.1 — Proof systems. Let $f : \{0,1\}^* \to \{0,1\}$ be some function. A probabilistic proof for $f$ (i.e., a proof for statements of the form "$f(x) = 1$") is a pair of interactive algorithms $(P, V)$ such that $V$ runs in polynomial time and they satisfy:

• Completeness: If $f(x) = 1$ then, if $P$ and $V$ are given input $x$ and interact, at the end of the interaction $V$ will output Accept with probability at least $0.9$.

• Soundness: If $f(x) = 0$ then for any arbitrary (efficient or non efficient) algorithm $P^*$, if $P^*$ and $V$ are given input $x$ and interact, then at the end $V$ will output Accept with probability at most $0.1$.

R Remark 15.2 — Functions vs languages. In many texts proof systems are defined with respect to languages as opposed to functions. That is, instead of talking about a function $f : \{0,1\}^* \to \{0,1\}$ we talk about a language $L \subseteq \{0,1\}^*$. These two viewpoints are completely equivalent via the mapping $f \longleftrightarrow L$ where $L = \{x \,|\, f(x) = 1\}$.

Note that we don't necessarily require the prover to be efficient (and indeed, in some cases it might not be). On the other hand, our soundness condition holds even if the prover uses a non efficient strategy. (People have considered the notion of zero knowledge systems where soundness holds only with respect to efficient provers; these are known as argument systems.) We say that a proof system has an efficient prover if there is an NP-type proof system $\Pi$ for $L$ (that is, some efficient algorithm $\Pi$ such that there exists $\pi$ with $\Pi(x, \pi) = 1$ iff $x \in L$, and such that $\Pi(x, \pi) = 1$ implies $|\pi| \leq poly(|x|)$), such that the strategy for $P$ can be implemented efficiently given any static proof $\pi$ for $x$ in this system.

R Remark 15.3 — Notation for strategies. Up until now, we always considered cryptographic protocols where Alice and Bob trusted one another, but were worried about some adversary controlling the channel between zero knowledge proofs 273

them. Now we are in a somewhat more "suspicious" setting where the parties do not fully trust one another. In such protocols there is always a "prescribed" or honest strategy that a particular party should follow, but we generally don't want the other parties' security to rely on someone else's good intentions, and hence we analyze also the case where a party uses an arbitrary malicious strategy. We sometimes also consider the honest but curious case where the adversary is passive and only collects information, but does not deviate from the prescribed strategy. Protocols typically only guarantee security for party A when it behaves honestly - a party can always choose to violate its own security and there is not much we can (or should?) do about it.

15.3 DEFINING ZERO KNOWLEDGE

So far we merely defined the notion of an interactive proof system, but we need to define what it means for a proof to be zero knowledge. Before we attempt a definition, let us consider an example. Going back to the notion of quadratic residuosity, suppose that $x$ and $m$ are public and Alice knows $s$ such that $x = s^2 \pmod{m}$. She wants to convince Bob that this is the case. However she prefers not to reveal $s$. Can she convince Bob that such an $s$ exists without revealing any information about it? Here is a way to do so:

Protocol ZK-QR: Public input for Alice and Bob: $x, m$; Alice's private input is $s$ such that $x = s^2 \pmod{m}$.

1. Alice will pick a random $s'$ and send to Bob $x' = xs'^2 \pmod{m}$.

2. Bob will pick a random bit $b \in \{0,1\}$ and send $b$ to Alice.

3. If $b = 0$ then Alice reveals $ss'$, hence giving out a root for $x'$; if $b = 1$ then Alice reveals $s'$, hence showing a root for $x'x^{-1}$.

4. Bob checks that the value $s''$ revealed by Alice is indeed a root of $x'x^{-b}$; if so then he "accepts" the proof.

If $x$ was not a quadratic residue then no matter how $x'$ was chosen, either $x'$ or $x'x^{-1}$ is not a residue and hence Bob will reject the proof with probability at least $1/2$. By repeating this $n$ times, we can reduce the probability of Bob accepting a proof of a non residue to $2^{-n}$.

On the other hand, we claim that we didn't really reveal anything about $s$. Indeed, if Bob chooses $b = 0$, then the two messages $(x', ss')$ he sees can be thought of as a random quadratic residue $x'$ and its root. If Bob chooses $b = 1$ then after dividing $x'$ by $x$ (which he could have done by himself) he still gets a random residue $x'' = x'x^{-1}$ and its root $s'$.

In both cases, the distribution of these two messages is completely independent of $s$, and hence intuitively yields no additional information about it beyond whatever Bob knew before. To define zero knowledge mathematically we follow the following intuition:

A proof system is zero knowledge if the verifier did not learn anything after the interaction that he could not have learned on his own.

Here is how we formally define this:

Definition 15.4 — Zero knowledge proofs. A proof system $(P, V)$ for $f$ is zero knowledge if for every efficient verifier strategy $V^*$ there exists an efficient probabilistic algorithm $S^*$ (known as the simulator) such that for every $x$ s.t. $f(x) = 1$, the following random variables are computationally indistinguishable:

• The output of $V^*$ after interacting with $P$ on input $x$.

• The output of $S^*$ on input $x$.

That is, we can show the verifier does not gain anything from the interaction, because no matter what algorithm $V^*$ he uses, whatever he learned as a result of interacting with the prover, he could have just as equally learned by simply running the standalone algorithm $S^*$ on the same input.

R Remark 15.5 — The simulation paradigm. The natural way to define security is to say that a system is secure if some "laundry list" of bad outcomes X, Y, Z can't happen. The definition of zero knowledge is different. Rather than giving a list of the events that are not allowed to occur, it gives a maximalist simulation condition. At its heart the definition of zero knowledge says the following: clearly, we cannot prevent the verifier from running an efficient algorithm $S^*$ on the public input, but we want to ensure that this is the most he can learn from the interaction. This simulation paradigm has become the standard way to define security of a great many cryptographic applications. That is, we bound what an adversary Eve can learn by postulating some hypothetical adversary Lilith that is under much harsher conditions (e.g., does not get to interact with the prover) and ensuring that Eve cannot learn anything that Lilith couldn't have learned either. This has the advantage of being the most conservative definition

possible, and also phrasing security in positive terms - there exists a simulation - as opposed to the typical negative terms - events X, Y, Z can't happen. Since it's often easier for us to think in positive terms, paradoxically sometimes this stronger security condition is easier to prove. Zero knowledge is in some sense the simplest setting of the simulation paradigm and we'll see it time and again in dealing with more advanced notions.

The definition of zero knowledge is confusing since intuitively one would think that if the verifier gained confidence that the statement is true then surely he must have learned something. This is another one of those cases where cryptography is counterintuitive. To understand it better, it is worthwhile to see the formal proof that the protocol above for quadratic residuosity is zero knowledge:

Theorem 15.6 — Zero knowledge for quadratic residuosity. Protocol ZK-QR above is a zero knowledge protocol.

Proof. Let $V^*$ be an arbitrary efficient strategy for Bob. Since Bob only sends a single bit, we can think of this strategy as composed of two functions:

• $V_1(x, m, x')$ outputs the bit $b$ that Bob chooses on input $x, m$ and after Alice's first message is $x'$.

• $V_2(x, m, x', s'')$ is whatever Bob outputs after seeing Alice's response $s''$ to the bit $b$.

Both $V_1$ and $V_2$ are efficiently computable. We now need to come up with an efficient simulator $S^*$ that is a standalone algorithm that on input $x, m$ will output a distribution indistinguishable from the output of $V^*$. The simulator $S^*$ will work as follows:

1. Pick $b' \leftarrow_R \{0,1\}$.

2. Pick $s''$ at random in $\mathbb{Z}_m^*$. If $b' = 0$ then let $x' = s''^2 \pmod{m}$. Otherwise let $x' = xs''^2 \pmod{m}$.

3. Let $b = V_1(x, m, x')$. If $b \neq b'$ then go back to step 1.

4. Output $V_2(x, m, x', s'')$.

The correctness of the simulator follows from the following claims (all of which assume that $x$ is actually a quadratic residue, since otherwise we don't need to make any guarantees and in any case Alice's behaviour is not well defined):

Claim 1: The distribution of $x'$ computed by $S^*$ is identical to the distribution of $x'$ chosen by Alice.

Claim 2: With probability at least $1/2$, $b' = b$.

Claim 3: Conditioned on $b' = b$ and the value $x'$ computed in step 2, the value $s''$ computed by $S^*$ is identical to the value that Alice sends when her first message is $x'$ and Bob's response is $b$.

Together these three claims imply that in expectation $S^*$ only invokes $V_1$ and $V_2$ a constant number of times (since every time it goes back to step 1 with probability at most $1/2$). They also imply that the output of $S^*$ is in fact identical to the output of $V^*$ in a true interaction with Alice. Thus, we only need to prove the claims, which is actually quite easy:

Proof of Claim 1: In both cases, $x'$ is a random quadratic residue. QED

Proof of Claim 2: This is a corollary of Claim 1; since the distribution of $x'$ is identical to the distribution chosen by Alice, in particular $x'$ gives out no information about the choice of $b'$. QED

Proof of Claim 3: This follows from a direct calculation. The value $s''$ sent by Alice is a square root of $x'$ if $b = 0$ and of $x'x^{-1}$ if $b = 1$. But this is identical to what happens for $S^*$ if $b' = b$. QED

Together these complete the proof of the theorem. ■
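The proof can be made concrete with a toy Python sketch that runs both Protocol ZK-QR and the simulator. The parameters ($m = 35$, $s = 4$) and the `choose_bit` callback standing in for the verifier's challenge function $V_1$ are our own illustrative choices; the key thing to observe is that the simulator produces accepting transcripts without ever touching a square root of $x$.

```python
import math
import random

def rand_unit(m: int) -> int:
    """Random element of Z_m^*."""
    while True:
        s = random.randrange(1, m)
        if math.gcd(s, m) == 1:
            return s

def zkqr_round(m: int, x: int, s: int, choose_bit) -> bool:
    """One round of Protocol ZK-QR; Alice knows s with x = s^2 (mod m).
    choose_bit plays the verifier's challenge strategy V_1."""
    sp = rand_unit(m)
    xp = (x * pow(sp, 2, m)) % m          # Alice's first message x' = x s'^2
    b = choose_bit(x, m, xp)              # Bob's challenge bit
    reveal = (s * sp) % m if b == 0 else sp
    target = xp if b == 0 else (xp * pow(x, -1, m)) % m
    return pow(reveal, 2, m) == target    # Bob checks a root of x' * x^(-b)

def zkqr_simulate(m: int, x: int, choose_bit):
    """The simulator S*: produces an accepting transcript (x', b, reveal)
    WITHOUT knowing a square root of x, by guessing b in advance."""
    while True:
        b_guess = random.randrange(2)
        reveal = rand_unit(m)
        # Cook x' so that `reveal` answers the challenge b_guess.
        xp = pow(reveal, 2, m) if b_guess == 0 else (x * pow(reveal, 2, m)) % m
        if choose_bit(x, m, xp) == b_guess:   # rewind if the guess was wrong
            return xp, b_guess, reveal

honest = lambda x, m, xp: random.randrange(2)   # honest V_1: a coin toss
m, s = 35, 4
x = pow(s, 2, m)                                # x = 16 is a residue mod 35
assert all(zkqr_round(m, x, s, honest) for _ in range(20))
xp, b, reveal = zkqr_simulate(m, x, honest)
target = xp if b == 0 else (xp * pow(x, -1, m)) % m
assert pow(reveal, 2, m) == target              # simulated transcript verifies
```

The retry loop in `zkqr_simulate` is exactly step 3 of the simulator above: the guess $b'$ matches the verifier's challenge with probability $1/2$ per attempt, so the expected number of rewinds is constant.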

Theorem 15.6 is interesting but not yet good enough to guarantee security in practice. After all, the protocol that we really need to show is zero knowledge is the one where we repeat this procedure $n$ times. There is a general theorem saying that if a protocol is zero knowledge then repeating it polynomially many times one after the other (so called "sequential repetition") preserves zero knowledge. You can think of this as cryptography's version of the equality "$0 + 0 = 0$", but as usual, intuitive things are not always correct and so this theorem does require a (not entirely trivial) proof. It is a good exercise to try to prove it on your own. There are known ways to achieve zero knowledge with negligible soundness error and a constant number of communication rounds, see Goldreich's book (Vol 1, Sec 4.9).

15.4 ZERO KNOWLEDGE PROOF FOR HAMILTONICITY.

We now show a proof for another language. Suppose that Alice and Bob know an $n$-vertex graph $H$ and Alice knows a Hamiltonian cycle $C$ in this graph (i.e., a length-$n$ simple cycle, one that traverses all vertices exactly once). Here is how Alice can prove that such a cycle exists without revealing any information about it:

Protocol ZK-Ham:

0. Common input: graph $H$ (in the form of an $n \times n$ adjacency matrix); Alice's private input: a Hamiltonian cycle $C = (C_1, \ldots, C_n)$, which are distinct vertices such that $(C_\ell, C_{\ell+1})$ is an edge in $H$ for all $\ell \in \{1, \ldots, n-1\}$ and $(C_n, C_1)$ is an edge as well.

1. Bob chooses a random string $z \in \{0,1\}^{3n}$.

2. Alice chooses a random permutation $\pi$ on $\{1, \ldots, n\}$ and lets $M$ be the $\pi$-permuted adjacency matrix of $H$ (i.e., $M_{\pi(i),\pi(j)} = 1$ iff $(i,j)$ is an edge in $H$). For every $i, j$, Alice chooses a random string $x_{i,j} \in \{0,1\}^n$ and lets $y_{i,j} = G(x_{i,j}) \oplus M_{i,j} z$, where $G: \{0,1\}^n \rightarrow \{0,1\}^{3n}$ is a pseudorandom generator. She sends $\{y_{i,j}\}_{i,j \in [n]}$ to Bob.

3. Bob chooses a bit $b \in \{0,1\}$.

4. If $b = 0$ then Alice sends out $\pi$ and the strings $\{x_{i,j}\}$ for all $i, j$; if $b = 1$ then Alice sends out the $n$ strings $x_{\pi(C_1),\pi(C_2)}, \ldots, x_{\pi(C_n),\pi(C_1)}$ together with their indices.

5. If $b = 0$ then Bob computes $M$ to be the $\pi$-permuted adjacency matrix of $H$ and verifies that all the $y_{i,j}$'s were computed from the $x_{i,j}$'s appropriately. If $b = 1$ then he verifies that the indices of the strings $\{x_{i,j}\}$ sent by Alice form a cycle and that indeed $y_{i,j} = G(x_{i,j}) \oplus z$ for every string $x_{i,j}$ that was sent by Alice.

Theorem 15.7 — Zero Knowledge proof for Hamiltonian Cycle. Protocol ZK-Ham is a zero knowledge proof system for the language of Hamiltonian graphs.⁶

⁶ Goldreich, Micali and Wigderson were the first to come up with a zero knowledge proof for an NP complete problem, though the Hamiltonicity protocol here is from a later work by Blum. We use Naor's commitment scheme.

Proof. We need to prove completeness, soundness, and zero knowledge. Completeness can be easily verified, and so we leave this to the reader. For soundness, we recall that (as we've seen before) with extremely high probability the sets $S_0 = \{G(x) : x \in \{0,1\}^n\}$ and $S_1 = \{G(x) \oplus z : x \in \{0,1\}^n\}$ will be disjoint (this probability is over the choice of $z$ that is done by the verifier). Now, assuming this is the case, given the messages $\{y_{i,j}\}$ sent by the prover in the first step, define an $n \times n$ matrix $M'$ with entries in $\{0, 1, ?\}$ as follows: $M'_{i,j} = 0$ if $y_{i,j} \in S_0$, $M'_{i,j} = 1$ if $y_{i,j} \in S_1$ and $M'_{i,j} = ?$ otherwise. We split into two cases. The first case is that there exists some permutation $\pi$ such that (i) $M'$ is a $\pi$-permuted version of the input graph $H$ and (ii) $M'$ contains a Hamiltonian cycle. Clearly in this case $H$ contains a Hamiltonian cycle as well, and hence we don't need to consider it when analyzing soundness. In the other case we claim that

with probability at least $1/2$ the verifier will reject the proof. Indeed, if (i) is violated then the proof will be rejected if Bob chooses $b = 0$ and if (ii) is violated then the proof will be rejected if Bob chooses $b = 1$.

We now turn to showing zero knowledge. For this we need to build a simulator $S^*$ for an arbitrary efficient strategy $V^*$ of Bob. Recall that $S^*$ gets as input the graph $H$ (but not the Hamiltonian cycle $C$) and needs to produce an output that is indistinguishable from the output of $V^*$. It will do so as follows:

0. Pick $b' \in \{0,1\}$.

1. Let $z \in \{0,1\}^{3n}$ be the first message computed by $V^*$ on input $H$.

2. If $b' = 0$ then $S^*$ computes the second message as Alice does: it chooses a random permutation $\pi$ on $\{1,\ldots,n\}$ and lets $M$ be the $\pi$-permuted adjacency matrix of $H$ (i.e., $M_{\pi(i),\pi(j)} = 1$ iff $(i,j)$ is an edge in $H$). In contrast, if $b' = 1$ then $S^*$ lets $M$ be the all-$1$ matrix. For every $i,j$, $S^*$ chooses a random string $x_{i,j} \in \{0,1\}^n$ and lets $y_{i,j} = G(x_{i,j}) \oplus M_{i,j} z$, where $G : \{0,1\}^n \rightarrow \{0,1\}^{3n}$ is a pseudorandom generator.

3. Let $b$ be the output of $V^*$ when given the input $H$ and the first message $\{y_{i,j}\}$ computed as above. If $b \neq b'$ then go back to step 0.

4. We compute the fourth message of the protocol similarly to how Alice does it: if $b = 0$ then it consists of $\pi$ and the strings $\{x_{i,j}\}$ for all $i,j$; if $b = 1$ then we pick a random length-$n$ cycle $C'$ and the message consists of the $n$ strings $x_{C'_1,C'_2}, \ldots, x_{C'_n,C'_1}$ together with their indices.

5. Output whatever $V^*$ outputs when given the prior message.

We prove the output of the simulator is indistinguishable from the output of $V^*$ in an actual interaction by the following claims:

Claim 1: The message $\{y_{i,j}\}$ computed by $S^*$ is computationally indistinguishable from the first message computed by Alice.

Claim 2: The probability that $b = b'$ is at least $1/3$.

Claim 3: The fourth message computed by $S^*$ is computationally indistinguishable from the fourth message computed by Alice.
We will simply sketch here the proofs (again see Goldreich's book for full proofs): For Claim 1, note that if $b' = 0$ then the message is identical to the way Alice computes it. If $b' = 1$ then the difference is that $S^*$ computes some strings $y_{i,j}$ of the form $G(x_{i,j}) \oplus z$ where Alice would compute the corresponding strings as $G(x_{i,j})$. This is indistinguishable because $G$ is a pseudorandom generator (and the distribution $U_{3n} \oplus z$ is the same as $U_{3n}$).

Claim 2 is a corollary of Claim 1. If $V^*$ managed to pick a message $z$ such that $\Pr[b = b'] < 1/2 - negl(n)$ then in particular it could distinguish between the first message of Alice (which is computed independently of $b'$ and hence contains no information about it) and the first message of $S^*$.

For Claim 3, note that again if $b = 0$ then the message is computed in a way identical to what Alice does. If $b = 1$ then this message is also computed in a way identical to Alice, since it does not matter if instead of picking $C'$ at random, we picked a random permutation $\pi$ and let $C'$ be the image of the Hamiltonian cycle $C$ under this permutation.

This completes the proof of the theorem. ■
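The commitment step of ZK-Ham (step 2) and the $b = 0$ opening check can be sketched in a few lines of Python. The length-tripling "PRG" below is a stand-in built from SHA-256 purely for illustration, and the toy 4-vertex graph is an assumption of this sketch, not part of the notes.

```python
import hashlib
import random

n = 4  # number of vertices (toy size)


def G(x: bytes) -> bytes:
    """Stand-in length-tripling generator: n bytes -> 3n bytes."""
    out = b""
    for i in range(3):
        out += hashlib.sha256(bytes([i]) + x).digest()[:n]
    return out


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))


def commit(adj, perm, z):
    """Step 2 of ZK-Ham: commit to the permuted adjacency matrix via
    y_ij = G(x_ij) xor (M_ij * z)."""
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            M[perm[i]][perm[j]] = adj[i][j]
    xs = [[random.randbytes(n) for _ in range(n)] for _ in range(n)]
    ys = [[xor(G(xs[i][j]), z if M[i][j] else bytes(3 * n))
           for j in range(n)] for i in range(n)]
    return M, xs, ys


# Run the b = 0 branch: Alice reveals perm and all x_ij, and Bob
# recomputes the permuted matrix and checks every commitment.
adj = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]  # 4-cycle graph
perm = list(range(n))
random.shuffle(perm)
z = random.randbytes(3 * n)
M, xs, ys = commit(adj, perm, z)
for i in range(n):
    for j in range(n):
        assert ys[i][j] == xor(G(xs[i][j]), z if M[i][j] else bytes(3 * n))
```

In the $b = 1$ branch Alice would instead reveal only the $n$ strings along the permuted cycle, which opens exactly those commitments to $1$ without disclosing anything about the rest of the matrix.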

15.4.1 Why is this interesting?

The reason that a protocol for Hamiltonicity is more interesting than a protocol for quadratic residuosity is that Hamiltonicity is an NP-complete question. This means that for every other NP language $L$, we can use the reduction from $L$ to Hamiltonicity combined with protocol ZK-Ham to give a zero knowledge proof system for $L$. In particular this means that we can have zero knowledge proofs for the following languages:

• The language of numbers $m$ such that there exists a prime $p$ dividing $m$ whose remainder modulo $10$ is $7$.

• The language of tuples $X, e, c_1, \ldots, c_n$ such that $c_i$ is an encryption of a number $x_i$ with $\sum x_i = X$. (This is essentially what we needed in the voting example above.)

• For every efficient function $F$, the language of pairs $x, y$ such that there exists some input $r$ satisfying $y = F(x\|r)$. (This is what we often need in the "protocol compiling" applications to show that a particular output was produced by the correct program $F$ on public input $x$ and private input $r$.)

15.5 PARALLEL REPETITION AND TURNING ZERO KNOWLEDGE PROOFS TO SIGNATURES.

While we talked about amplifying zero knowledge proofs by running them $n$ times one after the other, one could also imagine running the $n$ copies in parallel. It is not trivial that we get the same benefit of reducing the error to $2^{-n}$, but it turns out that we do in the cases we are interested in here. Unfortunately, zero knowledge is not necessarily preserved. It's an important open problem whether zero knowledge is preserved for the ZK-Ham protocol mentioned above.

Figure 15.1: Using a zero knowledge protocol for Hamiltonicity we can obtain a zero knowledge protocol for any language $L$ in NP. For example, if the public input is a SAT formula $\varphi$ and the Prover's secret input is a satisfying assignment $x$ for $\varphi$, then the verifier can run the reduction on $\varphi$ to obtain a graph $H$ and the prover can run the same reduction to obtain from $x$ a Hamiltonian cycle $C$ in $H$. They can then run the ZK-Ham protocol to prove that indeed $H$ is Hamiltonian (and hence the original formula was satisfiable) without revealing any information the verifier could not have obtained on his own.

However, Fiat and Shamir showed that in protocols (such as the ones we showed here) where the verifier only sends random bits, if we replace this verifier by a random function then both soundness and zero knowledge are preserved. This yields a non-interactive version of these protocols in the random oracle model, which is indeed widely used. Schnorr designed signatures based on this non-interactive version.
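A minimal sketch of the Fiat-Shamir heuristic, applied here to the quadratic residuosity protocol: the verifier's random bit is replaced by a hash of the first message, standing in for the random oracle. The toy modulus and function names are illustrative assumptions; a single instance has soundness error $1/2$, so real use repeats many instances in parallel.

```python
import hashlib
import random
from math import gcd

m = 101 * 103  # toy modulus (product of two small primes)


def rand_unit(m):
    while True:
        a = random.randrange(1, m)
        if gcd(a, m) == 1:
            return a


def challenge(x, x_prime):
    """Random-oracle stand-in replacing the verifier's random bit."""
    h = hashlib.sha256(f"{x},{x_prime}".encode()).digest()
    return h[0] & 1


def ni_prove(x, s):
    """Non-interactive prover: Alice knows s with s^2 = x (mod m)."""
    s_prime = rand_unit(m)
    x_prime = (s_prime * s_prime) % m
    b = challenge(x, x_prime)          # derived by hashing, no interaction
    resp = s_prime if b == 0 else (s_prime * s) % m
    return x_prime, resp


def ni_verify(x, proof):
    x_prime, resp = proof
    b = challenge(x, x_prime)          # verifier recomputes the challenge
    target = x_prime if b == 0 else (x_prime * x) % m
    return (resp * resp) % m == target


s = rand_unit(m)
x = (s * s) % m
assert ni_verify(x, ni_prove(x, s))
```

To turn this into a signature in the Schnorr style, one also hashes the message being signed into the challenge, so that a proof is bound to that message.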

15.5.1 "Bonus features" of zero knowledge

• Proof of knowledge

• Deniability / non-transferability

16 Fully homomorphic encryption: Introduction and bootstrapping

In today's era of "cloud computing", much of individuals' and businesses' data is stored and computed on by third parties such as Google, Microsoft, Apple, Amazon, Facebook, Dropbox and many others. Classically, cryptography provided solutions to protecting data in motion from point A to point B. But these are not always sufficient to protect data at rest and particularly data in use.

For example, suppose that Alice has some data $x \in \{0,1\}^n$ (in modern applications $x$ could well be terabytes in length or larger) that she wishes to store with the cloud service Bob, but she is afraid that Bob will be hacked, subpoenaed, or she simply does not completely trust Bob. Encryption does not seem to immediately solve the problem. Alice could store at Bob an encrypted version of the data and keep the secret key for herself. But then she would be at a loss if she wanted to do with the data anything more than retrieving particular blocks of it. If she wanted to outsource computation to Bob as well, and compute $f(x)$ for some function $f$, then she would need to share the secret key with Bob, thus defeating the purpose of encrypting the data in the first place.

For example, after the computing systems of the Office of Personnel Management (OPM) were discovered to be hacked in June of 2015, revealing sensitive information, including fingerprints and all data gathered during security clearance checks of up to 18 million people, DHS assistant secretary for cybersecurity and communications Andy Ozment said that encryption wouldn't have helped preventing it since "if an adversary has the credentials of a user on the network, then they can access data even if it's encrypted, just as the users on the network have to access data". So, can we encrypt data in a way that still allows some access and computing on it?

Already in 1978, Rivest, Adleman and Dertouzos considered this problem of a business that wishes to use a "commercial time-sharing


service" to store some sensitive data. They envisioned a potential solution for this task which they called a privacy homomorphism. This notion later became known as fully homomorphic encryption (FHE), which is an encryption that allows a party (such as the cloud provider) that does not know the secret key to modify a ciphertext $c$ encrypting $x$ to a ciphertext $c'$ encrypting $f(x)$ for every efficiently computable $f$. In particular in our scenario above (see Fig. 16.1), such a scheme will allow Bob, given an encryption of $x$, to compute the encryption of $f(x)$ and send this ciphertext to Alice without ever getting the secret key and so without ever learning anything about $x$ (or $f(x)$ for that matter).

Figure 16.1: A fully homomorphic encryption can be used to store data on the cloud in encrypted form, but still have the cloud provider be able to evaluate functions on the data in encrypted form (without ever learning either the inputs or the outputs of the function they evaluate).

Unlike the case of a trapdoor function, where it only took a year for Diffie and Hellman's challenge to be answered by RSA, in the case of fully homomorphic encryption for more than 30 years cryptographers had no constructions achieving this goal. In fact, some people suspected that there is something inherently incompatible between the security of an encryption scheme and the ability of a user to perform all these operations on ciphertexts. Stanford cryptographer Dan Boneh used to joke to incoming graduate students that he would immediately sign the thesis of anyone who came up with a fully homomorphic encryption. But he never expected that he would actually encounter such a thesis, until in 2009, Boneh's student Craig Gentry released a paper doing just that. Gentry's paper shook the world of cryptography, and instigated a flurry of research results making his scheme more efficient, reducing the assumptions it relied on, extending and applying it, and much more.
In particular, Brakerski and Vaikuntanathan managed to obtain a fully homomorphic encryption scheme based only on the Learning with Errors (LWE) assumption we have seen before.

Although there is an open source library, as well as other implementations, there is still much work to be done in order to turn FHE from theory to practice. For a comparable level of security, the encryption and decryption operations of a fully homomorphic encryption scheme are several orders of magnitude slower than a conventional public key system, and (depending on its complexity) homomorphically evaluating a circuit can be significantly more taxing. However, this is a fast evolving field, and already since 2009 significant optimizations have been discovered that reduced the computational and storage overhead by many orders of magnitude. As in public key encryption, one would imagine that for larger data one would use a "hybrid" approach of combining FHE with symmetric encryption, though one might need to come up with tailor-made symmetric encryption schemes that can be efficiently evaluated.¹

¹ In 2012 the state of the art on homomorphically evaluating AES was about six orders of magnitude slower than non-homomorphic AES computation. I don't know what's the current record.

In this lecture and the next one we will focus on the fully homomorphic encryption schemes that are easiest to describe, rather than the

ones that are most efficient (though the efficient schemes share many similarities with the ones we will talk about). As is generally the case for lattice based encryption, the current most efficient schemes are based on ideal lattices and on assumptions such as ring LWE or the security of the NTRU cryptosystem.²

² As we mentioned before, as a general rule of thumb, the difference between the ideal schemes and the one that we describe is that in the ideal setting one deals with structured matrices that have a compact representation as a single vector and also enable fast FFT-like matrix-vector multiplication. This saves a factor of about $n$ in the storage and computation requirements (where $n$ is the dimension of the subspace/lattice). However, there can be some subtle security implications for ideal lattices as well, see e.g., here, here, here, and here.

R Remark 16.1 — Lesson from verifying computation. To take the distance between theory and practice in perspective, it might be useful to consider the case of verifying computation. In the early 1990's researchers (motivated initially by zero knowledge proofs) came up with the notion of probabilistically checkable proofs (PCP's) which could yield in principle extremely succinct ways to check correctness of computation. Probabilistically checkable proofs can be thought of as "souped up" versions of NP completeness reductions and like these reductions, have been mostly used for negative results, especially since the initial proofs were extremely complicated and also included enormous hidden constants. However, with time people have slowly understood these better and made them more efficient (e.g., see this survey) and it has now reached the point where these results are nearly practical (see also this) and in fact these ideas underlie at least one startup. Overall, constructions for verifying computation have improved by at least 20 orders of magnitude over the last two decades.
(We will talk about some of these constructions later in this course.) If progress on fully homomorphic encryption follows a similar trajectory, then we can expect the road to practical utility to be very long, but there is hope that it’s not a “bridge to nowhere”.

R Remark 16.2 — Poor man's FHE via hardware. Since large scale fully homomorphic encryption is still impractical, people have been trying to achieve at least weaker security goals using certain assumptions. In particular Intel chips have so called "secure enclaves" which one can think of as a somewhat tamper-protected region of the processor that is supposed to be out of reach for the outside world. The idea is that a client of a cloud provider would treat this enclave as a trusted party that it can communicate with through the cloud provider. The client can store their data on the cloud encrypted with some key $k$, and then set up a secure channel with the enclave using an authenticated key exchange protocol, and send $k$ over. Then, when the client sends over a function $f$ to the cloud provider, the latter party

can simulate FHE by asking the enclave to compute the encryption of $f(x)$ given the encryption of $x$. In this solution ultimately the private key does reside on the cloud provider's computers, and the client has to trust the security of the enclave. In practice, this could provide reasonable security against remote hackers, but (unlike FHE) probably not against sophisticated attackers (e.g., governments) that have physical access to the server.

16.1 DEFINING FULLY HOMOMORPHIC ENCRYPTION

We start by defining partially homomorphic encryption. We focus on encryption for single bits. This is without loss of generality for CPA security (CCA security is in any case ruled out for homomorphic encryption; can you see why?), though there are more efficient constructions that encrypt several bits at a time.

Definition 16.3 — Partially Homomorphic Encryption. Let $\mathcal{F} = \cup \mathcal{F}_\ell$ be a class of functions where every $f \in \mathcal{F}_\ell$ maps $\{0,1\}^\ell$ to $\{0,1\}$. An $\mathcal{F}$-homomorphic public key encryption scheme is a CPA secure public key encryption scheme $(G, E, D)$ such that there exists a polynomial-time algorithm $EVAL: \{0,1\}^* \rightarrow \{0,1\}^*$ such that for every $(e,d) = G(1^n)$, $\ell = poly(n)$, $x_1, \ldots, x_\ell \in \{0,1\}$, and $f \in \mathcal{F}_\ell$ of description size $|f|$ at most $poly(\ell)$ it holds that:

• $c = EVAL_e(f, E_e(x_1), \ldots, E_e(x_\ell))$ has length at most $n$.

• $D_d(c) = f(x_1, \ldots, x_\ell)$.

Figure 16.2: In a valid encryption scheme $E$, the set of ciphertexts $c$ such that $D_d(c) = b$ is a superset of the set of ciphertexts $c$ such that $c = E_e(b; r)$ for some $r \in \{0,1\}^t$ where $t$ is the number of random bits used by the encryption algorithm. Our definition of partially homomorphic encryption scheme requires that for every $f: \{0,1\}^\ell \rightarrow \{0,1\}$ in our family and $x \in \{0,1\}^\ell$, if $c_i \in E_e(x_i; \{0,1\}^t)$ for $i = 1..\ell$ then $EVAL(f, c_1, \ldots, c_\ell)$ is in the superset $\{c \mid D_d(c) = f(x)\}$ of $E_e(f(x); \{0,1\}^t)$. For example if we apply EVAL to the OR function and ciphertexts $c, c'$ that were obtained as encryptions of $1$ and $0$ respectively, then the output is a ciphertext $c''$ that would be decrypted to $OR(1,0) = 1$, even if $c''$ is not in the smaller set of possible outputs of the encryption algorithm on $1$. This distinction between the smaller and larger set is the reason why we cannot automatically apply the EVAL function to ciphertexts that are obtained from the outputs of previous EVAL operations.

P Please stop and verify you understand the definition. In particular you should understand why some bound on the length of the output of EVAL is needed to rule out trivial constructions that are the analogue of the cloud provider sending over to Alice the entire encrypted database every time she wants to evaluate a function of it. By artificially increasing the randomness for the key generation algorithm, this is equivalent to requiring that $|c| \leq p(n)$ for some fixed polynomial $p(\cdot)$ that does not grow with $\ell$ or $|f|$. You should also understand the distinction between ciphertexts that are the output of the encryption algorithm on the plaintext $b$, and ciphertexts that decrypt to $b$; see Fig. 16.2.

A fully homomorphic encryption is simply a partially homomorphic encryption scheme for the family $\mathcal{F}$ of all functions, where the description of a function is as a circuit (say composed of NAND gates, which are known to be a universal basis).

16.1.1 Another application: fully homomorphic encryption for verifying computation

The canonical application of fully homomorphic encryption is for a client to store encrypted data $E(x)$ on a server, send a function $f$ to the server, and get back the encryption $E(f(x))$ of $f(x)$. This ensures that the server does not learn any information about $x$, but does not ensure that it actually computes the correct function!

Here is a cute protocol to achieve the latter goal (due to Chung, Kalai and Vadhan). Curiously the protocol involves "doubly encrypting" the input, and homomorphically evaluating the EVAL function itself.

• Assumptions: We assume that all functions $f$ that the client will be interested in can be described by a string of length $n$.

• Preprocessing: The client generates a pair of keys $(e, d)$. In the initial stage the client computes the encrypted database $c = E_e(x)$ and sends it to the server. It also computes $c^* = E_e(f^*)$ for some function $f^*$ as well as $c^{**} = EVAL_e(eval, c^*\|c)$, and keeps $c^*, c^{**}$ for herself, where $eval(f, x) = f(x)$ is the circuit evaluation function.

• Client query: To ask for an evaluation of $f$, the client generates a new random FHE keypair $(e', d')$, chooses $b \leftarrow_R \{0,1\}$ and lets $c_b = E_{e'}(E_e(f))$ and $c_{1-b} = E_{e'}(c^*)$. It sends the triple $e', c_0, c_1$ to the server.

• Server response: Given the queries $c_0, c_1$, the server defines the function $g: \{0,1\}^* \rightarrow \{0,1\}^*$ where $g(c') = EVAL_e(eval, c'\|c)$ (for the fixed $c$ received) and computes $c'_0, c'_1$ where $c'_b = EVAL_{e'}(g, c_b)$. (Please pause here and make sure you understand what this step is doing! Note that we use here crucially the fact that EVAL itself is a polynomial time computation.)

• Client check: The client checks whether $D_{d'}(c'_{1-b}) = c^{**}$ and if so accepts $D_d(D_{d'}(c'_b))$ as the answer.

We claim that if the server cheats then the client will detect this with probability at least $1/2 - negl(n)$. Working this out is a great exercise. The probability of detection can be amplified to $1 - negl(n)$ using appropriate repetition, see the paper for details.
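The bookkeeping in this protocol is easy to get lost in, so here is a trace of the message flow using a deliberately transparent "toy FHE": a ciphertext is just a tagged (key, plaintext) pair, so EVAL can trivially peek inside. Nothing about security is modeled here, only the flow of ciphertexts between the parties; all names are illustrative.

```python
import random

# Transparent toy "FHE": a ciphertext is a tagged (key, plaintext) pair.
# This models none of the security, only the protocol's bookkeeping.


def keygen():
    return random.randrange(2**32)


def E(key, x):                 # deterministic toy encryption
    return ("ct", key, x)


def D(key, c):
    tag, k, x = c
    assert tag == "ct" and k == key
    return x


def EVAL(key, f, *cs):         # "homomorphic" evaluation in the toy model
    return E(key, f(*[D(key, c) for c in cs]))


univ = lambda fn, arg: fn(arg)     # the circuit-evaluation function eval(f, x) = f(x)

# --- Preprocessing ---
e = keygen()
x = 7
c = E(e, x)                        # encrypted database, sent to the server
f_star = lambda t: t + 1           # some fixed function f*
c_star = E(e, f_star)
c_starstar = EVAL(e, univ, c_star, c)   # kept secret by the client

# --- Client query for f ---
f = lambda t: t * t
e_prime = keygen()
b = random.randrange(2)
cs = [None, None]
cs[b] = E(e_prime, E(e, f))        # the real query, doubly encrypted
cs[1 - b] = E(e_prime, c_star)     # the decoy the server cannot tell apart

# --- Server response (knows e, e', c but not b) ---
g = lambda ct: EVAL(e, univ, ct, c)
responses = [EVAL(e_prime, g, query) for query in cs]

# --- Client check ---
assert D(e_prime, responses[1 - b]) == c_starstar  # catches cheating w.p. ~1/2
answer = D(e, D(e_prime, responses[b]))
assert answer == f(x)              # 49
```

Since the server cannot tell which of $c_0, c_1$ is the decoy, a server that deviates on one of them is caught with probability about $1/2$, matching the claim above.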

16.2 EXAMPLE: AN XOR HOMOMORPHIC ENCRYPTION

It turns out that Regev’s LWE-based encryption LWEENC we saw be- fore is homomorphic with respect to the class of linear (mod 2) func- tions. Let us recall the LWE assumption and the encryption scheme based on it.

Definition 16.4 — LWE (simplified decision variant). Let $q = q(n)$ be some function mapping the natural numbers to primes. The $q(n)$-decision learning with error ($q(n)$-dLWE) conjecture is the following: for every $m = poly(n)$ there is a distribution $LWE_q$ over pairs $(A, s)$ such that:

• $A$ is an $m \times n$ matrix over $\mathbb{Z}_q$ and $s \in \mathbb{Z}_q^n$ satisfies $s_1 = \lfloor \tfrac{q}{2} \rfloor$ and $|(As)_i| \leq \sqrt{q}$ for every $i \in \{1, \ldots, m\}$.

• The distribution of $A$ where $(A, s)$ is sampled from $LWE_q$ is computationally indistinguishable from the uniform distribution over $m \times n$ matrices over $\mathbb{Z}_q$.

푞 푚 × 푛 ℤ The dLWE conjecture is that -dLWE holds for every that is at most . This is not exactly the same phrasing we used before, but as we sketch below, it is essentially푞(푛) equivalent to it. One푞(푛) can also make the푝표푙푦(푛) stronger conjecture that -dLWE holds even for that is super polynomial in (e.g., magnitude roughly - note that such a number can still be described푞(푛) in bits and we can푛 푞(푛) still efficiently perform operations푛 푞(푛) such as addition and2 multiplication modulo ). This stronger conjecture also seems푛 well supported by evidence and we will use it in future lectures. 푞

P It is a good idea for you to pause here and try to show the equivalence on your own.

Equivalence between LWE and dLWE: The reason the two conjectures are equivalent is the following. Before, we phrased the conjecture as recovering $s'$ from a pair $(A', y)$ where $y = A's' + e$ and $|e_i| \leq \delta q$ for every $i$. We then showed a search to decision reduction (Theorem 13.2) demonstrating that this is equivalent to the task of distinguishing between this case and the case that $y$ is a random vector. If we now let $\alpha = \lfloor \tfrac{q}{2} \rfloor$ and $\beta = \alpha^{-1} \pmod{q}$, and consider the matrix $A = (-\beta y | A')$ and the column vector $s = (\alpha; s')$, we see that $As = -e$ (which satisfies the same magnitude bound as $e$). Note that if $y$ is a random vector in $\mathbb{Z}_q^m$ then so is $-\beta y$, and so the current form of the conjecture follows from the previous one. (To reduce the number of free parameters, we fixed $\delta$ to equal $1/\sqrt{q}$; in this form the conjecture becomes stronger as $q$ grows.)

A linearly homomorphic encryption scheme: The following variant of the LWE-ENC scheme described in Section 13.4 turns out to be linearly homomorphic:

LWE-ENC’ encryption:

• Key generation: Choose $(A, s)$ from $LWE_q$ where $m$ satisfies $q^{1/4} \gg m \log q \gg n$.

• To encrypt $b \in \{0,1\}$, choose $w \in \{0,1\}^m$ and output $w^\top A + (b, 0, \ldots, 0)$.

• To decrypt $c \in \mathbb{Z}_q^n$, output $0$ iff $|\langle c, s \rangle| \leq q/10$, where for $x \in \mathbb{Z}_q$ we defined $|x| = \min\{x, q - x\}$. (Recall that the first coordinate of $s$ is $\lfloor \tfrac{q}{2} \rfloor$.)

The decryption algorithm recovers the original plaintext since $\langle c, s \rangle = w^\top A s + s_1 b$ and $|w^\top A s| \leq m\sqrt{q} \ll q$. It turns out that this scheme is homomorphic with respect to the class of linear functions modulo $2$. Specifically we make the following claim:

Lemma 16.5 For every $\ell \ll q^{1/4}$, there is an algorithm $EVAL_\ell$ that on input $c_1, \ldots, c_\ell$ encrypting via LWEENC bits $b_1, \ldots, b_\ell \in \{0,1\}$, outputs a ciphertext $c$ whose decryption is $b_1 \oplus \cdots \oplus b_\ell$.

P This claim is not hard to prove, but working it out for yourself can be a good way to get more familiarity with LWE-ENC' and the kind of manipulations we'll be making time and again in the constructions of many lattice based cryptographic primitives. Try to show that $c = c_1 + \cdots + c_\ell$ (where addition is done as vectors in $\mathbb{Z}_q$) will be the encryption of $b_1 \oplus \cdots \oplus b_\ell$. Note that if $q$ is super polynomial in $n$ then $\ell$ can be an arbitrarily large polynomial in $n$.

Proof of Lemma 16.5. The proof is quite simple. EVAL will simply add the ciphertexts as vectors in $\mathbb{Z}_q$. If $c = \sum c_i$ then

$$\langle c, s \rangle = \sum b_i \lfloor \tfrac{q}{2} \rfloor + \xi \pmod{q}$$

where $\xi \in \mathbb{Z}_q$ is a "noise term" such that $|\xi| \leq \ell m \sqrt{q} \ll q$. Since $|\lfloor \tfrac{q}{2} \rfloor - \tfrac{q}{2}| < 1$, adding at most $\ell$ terms of this difference adds at most $\ell$, and so we can also write

$$\langle c, s \rangle = \lfloor \sum b_i \tfrac{q}{2} \rfloor + \xi' \pmod{q}$$

for $|\xi'| \leq \ell m \sqrt{q} + \ell \ll q$. If $\sum b_i$ is even then $\sum b_i \tfrac{q}{2}$ is an integer multiple of $q$ and hence in this case $|\langle c, s \rangle| \ll q$. If $\sum b_i$ is odd then $\lfloor \sum b_i \tfrac{q}{2} \rfloor = \lfloor q/2 \rfloor \pmod{q}$ and so in this case $|\langle c, s \rangle| = q/2 \pm o(q) > q/10$. ■
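The calculation above can be checked numerically with a toy instantiation of LWE-ENC'. The parameters below are far too small for any security, and the matrix $A$ is built directly from a small error vector so that each $|(As)_i| \leq \sqrt{q}$, mimicking the guarantee of the LWE distribution rather than sampling it honestly.

```python
import random

n, m = 10, 20
q = 10**6 + 3                     # odd modulus, so s_1 = (q-1)/2 is invertible
B = int(q ** 0.5)                 # error bound ~ sqrt(q)


def keygen():
    s = [q // 2] + [random.randrange(q) for _ in range(n - 1)]
    inv_s1 = pow(s[0], -1, q)
    A = []
    for _ in range(m):
        row = [0] + [random.randrange(q) for _ in range(n - 1)]
        err = random.randrange(-B, B + 1)
        # fix the first entry so that <row, s> = err (mod q)
        row[0] = (err - sum(r * si for r, si in zip(row[1:], s[1:]))) * inv_s1 % q
        A.append(row)
    return A, s


def enc(A, b):
    w = [random.randrange(2) for _ in range(m)]
    c = [sum(w[i] * A[i][j] for i in range(m)) % q for j in range(n)]
    c[0] = (c[0] + b) % q          # add (b, 0, ..., 0); recall s_1 = floor(q/2)
    return c


def dec(s, c):
    v = sum(ci * si for ci, si in zip(c, s)) % q
    return 0 if min(v, q - v) <= q // 10 else 1


def eval_xor(cs):
    """EVAL for XOR: just add the ciphertexts coordinate-wise over Z_q."""
    return [sum(col) % q for col in zip(*cs)]


A, s = keygen()
bits = [1, 0, 1]
cts = [enc(A, b) for b in bits]
assert all(dec(s, c) == b for c, b in zip(cts, bits))
assert dec(s, eval_xor(cts)) == (bits[0] ^ bits[1] ^ bits[2])  # 0
```

With these sizes the accumulated noise after summing $\ell = 3$ ciphertexts is at most about $\ell m \sqrt{q} \approx 60{,}000$, comfortably below the decryption threshold $q/10 = 100{,}000$, which is exactly the slack the proof above exploits.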

Several other encryption schemes are also homomorphic with respect to linear functions. Even before Gentry’s construction there were constructions of encryption schemes that are homomorphic with respect to somewhat larger classes (e.g., quadratic functions by Boneh, Goh and Nissim) but not significantly so.

16.2.1 Abstraction: A trapdoor pseudorandom generator.

It is instructive to consider the following abstraction (which we'll use in the next lecture) of the above encryption scheme as a trapdoor generator (see Fig. 16.3). On input $1^n$, the key generation algorithm outputs a vector $s \in \mathbb{Z}_q^n$ with $s_1 = \lfloor \tfrac{q}{2} \rfloor$ and a probabilistic algorithm $G_s$ such that the following holds:

• Any polynomial number of samples from the distribution $G_s(1^n)$ is computationally indistinguishable from independent samples from the uniform distribution over $\mathbb{Z}_q^n$.

• If $c$ is output by $G_s(1^n)$ then $|\langle c, s \rangle| \leq n\sqrt{q}$.

Thus $s$ can be thought of as a "trapdoor" for the generator that allows to distinguish between a random vector $c \in \mathbb{Z}_q^n$ (that with high probability would satisfy $|\langle c, s \rangle| \geq q/10$) and an output of the generator. We use $G_s$ to encrypt a bit $b$ by letting $c \leftarrow_R G_s(1^n)$ and outputting $c + (b, 0, \ldots, 0)$. In the particular instantiation above we obtain $G_s$ by sampling the matrix $A$ from the LWE assumption and having $G_s$ output $w^\top A$ for a random $w \in \{0,1\}^m$, but we can ignore this particular implementation detail in the foregoing.

Note that this trapdoor generator satisfies the following stronger property: we can generate an alternative generator $G'$ such that the description of $G'$ is indistinguishable from the description of $G_s$ but such that $G'$ actually does produce (up to exponentially small statistical error) the uniform distribution over $\mathbb{Z}_q^n$. We can define trapdoor generators formally as follows.

Figure 16.3: In a trapdoor generator, we have two ways to generate randomized algorithms. That is, we have some algorithms GEN and GEN' such that GEN outputs a pair $(G_s, s)$ and GEN' outputs $G'$, with $G_s, G'$ being themselves algorithms (e.g., randomized circuits). The conditions we require are that (1) the descriptions of the circuits $G_s$ and $G'$ (considering them as distributions over strings) are computationally indistinguishable, (2) the distribution $G'(1^n)$ is statistically indistinguishable from the uniform distribution, and (3) there is an efficient algorithm that given the secret "trapdoor" $s$ can distinguish the output of $G_s$ from the uniform distribution. In particular (1), (2), and (3) together imply that it is not feasible to extract $s$ from the description of $G_s$.

Definition 16.6 — Trapdoor generators. A trapdoor generator is a pair of randomized algorithms GEN, GEN' that satisfy the following:

• On input $1^n$, GEN outputs a pair $(G_s, s)$ where $G_s$ is a string describing a randomized circuit that itself takes $1^n$ as input and outputs a string of length $t$ where $t = t(n)$ is some polynomial.

• On input $1^n$, GEN' outputs $G'$ where $G'$ is a string describing a randomized circuit that itself takes $1^n$ as input.

• The distributions $GEN(1^n)_1$ (i.e., the first output of $GEN(1^n)$) and $GEN'(1^n)$ are computationally indistinguishable.

• With probability $1 - negl(n)$ over the choice of $G'$ output by GEN', the distribution $G'(1^n)$ is statistically indistinguishable (i.e., within $negl(n)$ total variation distance) from $U_t$.

• There is an efficient algorithm $T$ such that for every pair $(G_s, s)$ output by GEN, $\Pr[T(s, G_s(1^n)) = 1] \geq 1 - negl(n)$ (where this probability is over the internal randomness used by $G_s$ on the input $1^n$), but $\Pr[T(s, U_t) = 1] \leq 1/3$.³

³ The choice of $1/3$ is arbitrary, and can be amplified as needed.

P This is not an easy definition to parse, but looking at Fig. 16.3 can help. Make sure you understand why LWEENC gives rise to a trapdoor generator satisfying all the conditions of Definition 16.6.
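Here is a small sketch of the distinguisher $T$ from the definition, using a stripped-down stand-in for $G_s$ that outputs vectors with a small inner product with $s$ (the toy parameters and the way the generator is built are illustrative assumptions, not the honest LWE sampling).

```python
import random

n = 10
q = 10**6 + 3
B = int(q ** 0.5)                  # "small" means magnitude at most ~sqrt(q)


def gen():
    """Toy GEN: returns (G_s, s) where G_s samples c with <c, s> small."""
    s = [q // 2] + [random.randrange(q) for _ in range(n - 1)]
    inv_s1 = pow(s[0], -1, q)

    def G_s():
        c = [0] + [random.randrange(q) for _ in range(n - 1)]
        err = random.randrange(-B, B + 1)
        # fix the first coordinate so that <c, s> = err (mod q)
        c[0] = (err - sum(ci * si for ci, si in zip(c[1:], s[1:]))) * inv_s1 % q
        return c

    return G_s, s


def T(s, c):
    """Trapdoor distinguisher: accept iff <c, s> is abnormally close to 0 mod q."""
    v = sum(ci * si for ci, si in zip(c, s)) % q
    return 1 if min(v, q - v) <= q // 10 else 0


G_s, s = gen()
assert T(s, G_s()) == 1            # generator outputs are always accepted
uniform = [random.randrange(q) for _ in range(n)]
# a uniform vector lands in the acceptance window with probability only ~1/5
```

Without $s$, the two distributions look alike (under LWE); with $s$, the test is a one-liner, which is exactly the asymmetry the definition captures.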

R Remark 16.7 — Trapdoor generators in real life. In the above we use the notion of a "trapdoor" in the pseudorandom generator as a mathematical abstraction, but generators with actual trapdoors have arisen in practice. In 2007 the National Institute of Standards and Technology (NIST) released standards for pseudorandom generators. Pseudorandom generators are the quintessential private key primitive, typically built out of hash functions, block ciphers, and the like, and so it was surprising that NIST included in the list a pseudorandom generator based on public key tools - the Dual EC DRBG generator, based on elliptic curve cryptography. This was already strange, but became even more worrying when Microsoft researchers Dan Shumow and Niels Ferguson showed that this generator could have a trapdoor, in the sense that it contained some hardwired constants such that, if those constants were generated in a particular way, there would be some information that (just like $s$ in $G_s$ above) allows one to distinguish the generator from random (see here for a 2007 blog post on this issue). We learned more about this when leaks from the Snowden documents showed that the NSA secretly paid 10 million dollars to RSA to make this generator the default option in their BSafe software.
You'd think that this generator is long dead, but it turns out to be the "gift that keeps on giving". In December of 2015, Juniper Networks announced that they had discovered malicious code in their systems, dating back to at least 2012 (possibly 2008), that would

allow an attacker to surreptitiously decrypt all VPN traffic through their firewalls. The issue is that Juniper had been using the Dual EC DRBG, and someone managed to replace the constant they were using with another one, one that they presumably knew the trapdoor for (see here and here for more; of course unless you know to check for this, it's very hard by looking at the code to see that one arbitrary-looking constant has been replaced by another). Apparently, even though this is very surprising to many people in law enforcement and government, inserting back doors into cryptographic primitives might end up making them less secure.

16.3 FROM LINEAR HOMOMORPHISM TO FULL HOMOMORPHISM

Gentry’s breakthrough had two components:

• First, he gave a scheme that is homomorphic with respect to arithmetic circuits (involving not just addition but also multiplications) of logarithmic depth.

• Second, he showed the amazing "bootstrapping theorem": if a scheme is homomorphic enough to evaluate its own decryption circuit, then it can be turned into a fully homomorphic encryption that can evaluate any function.

Combining these two insights led to his fully homomorphic encryption. [Footnote 4: The story is a bit more complex than that. Frustratingly, the decryption circuit of Gentry's basic scheme was just a little bit too deep for the bootstrapping theorem to apply. A lesser man, such as yours truly, would at this point surmise that fully homomorphic encryption was just not meant to be, and perhaps take up knitting or playing bridge as an alternative hobby. However, Craig persevered and managed to come up with a way to "squash" the decryption circuit so it can fit the bootstrapping parameters. Follow up works, and in particular the paper of Brakerski and Vaikuntanathan, managed to get schemes with a much better relation between the homomorphism depth and the decryption circuit, and hence avoid the need for squashing and also improve the security assumptions.]

In this lecture we will focus on the second component - the bootstrapping theorem. We will show a "partially homomorphic encryption" (based on a later work of Gentry, Sahai and Waters) that can fit that theorem in the next lecture.

16.4 BOOTSTRAPPING: FULLY HOMOMORPHIC "ESCAPE VELOCITY"

The bootstrapping theorem is quite surprising. A priori you might expect that given that a homomorphic encryption for linear functions was not trivial to do, a homomorphic encryption for quadratics would be harder, cubics even harder and so on and so forth. But it turns out that there is some special degree $t^*$ such that if we obtain homomorphic encryption for degree-$t^*$ polynomials then we can obtain fully homomorphic encryption that works for all functions. (Specifically, if the decryption algorithm $c \mapsto D_d(c)$ is a degree-$t$ polynomial, then homomorphically evaluating polynomials of degree $t^* = 2t$ will be sufficient.) That is, it turns out that once an encryption scheme

Figure 16.4: The "Bootstrapping Theorem" shows that once a partially homomorphic encryption scheme is homomorphic with respect to a rich enough family of functions, and specifically a family that contains its own decryption algorithm, then it can be converted to a fully homomorphic encryption scheme that can be used to evaluate any function.

is strong enough to homomorphically evaluate its own decryption algorithm, then we can use it to obtain a fully homomorphic encryption by "pulling itself up by its own bootstraps". One analogy is that at this point the encryption reaches "escape velocity" and we can continue onwards evaluating gates in perpetuity. We now show the bootstrapping theorem:

Theorem 16.8 — Bootstrapping Theorem, Gentry 2009. Suppose that $(G, E, D)$ is a CPA circular [Footnote 5: You can ignore the condition of circular security in a first read - we will discuss it later.] secure partially homomorphic encryption scheme for the family $\mathcal{F}$, and suppose that for every pair of ciphertexts $c, c'$ the map $d \mapsto NAND(D_d(c), D_d(c'))$ is in $\mathcal{F}$. Then $(G, E, D)$ can be turned into a fully homomorphic encryption scheme.

16.4.1 Radioactive legos analogy
Here is one analogy for bootstrapping, inspired by Gentry's survey. Suppose that you need to construct some complicated object from a highly toxic material (see Fig. 16.5). You are given a supply of sealed bags that are flexible enough so you can manipulate the object from outside the bag. However, each bag can only hold for 10 seconds of such manipulations before it leaks. The idea is that if you can open one bag inside another within 9 seconds then you can perform the manipulations for arbitrary length. That is, if the object is in the $i^{th}$ bag then you put this bag inside the $(i+1)^{st}$ bag, spend 9 seconds on opening the $i^{th}$ bag inside the $(i+1)^{st}$ bag and then spend another second of whatever manipulations you wanted to perform. We then continue this process by putting the $(i+1)^{st}$ bag inside the $(i+2)^{nd}$ bag and so on and so forth.

Figure 16.5: To build a castle from radioactive Lego bricks, which can be kept safe in a special ziploc bag for 10 seconds, we can: 1) Place the bricks in a bag, and place the bag inside an outer bag. 2) Manipulate the inner bag through the outer bag to remove the bricks from it in 9 seconds, and spend 1 second putting one brick in place. 3) Now the outer bag has 9 seconds of life left, and we can put it inside a new bag and repeat the process.

16.4.2 Proving the bootstrapping theorem
We now turn to the formal proof of Theorem 16.8

Proof. The idea behind the proof is simple but ingenious. Recall that the NAND gate $b, b' \mapsto \neg(b \wedge b')$ is a universal gate that allows us to compute any function $f : \{0,1\}^n \to \{0,1\}$ that can be efficiently computed. Thus, to obtain a fully homomorphic encryption it suffices to obtain a function NANDEVAL such that $D_d(\text{NANDEVAL}(c, c')) = NAND(D_d(c), D_d(c'))$. (Note that this is stronger than the typical notion of homomorphic evaluation, since we require that NANDEVAL outputs an encryption of $NAND(b, b')$ when given any pair of ciphertexts that decrypt to $b$ and $b'$ respectively, regardless whether these ciphertexts were produced by the encryption algorithm or by some other method, including the NANDEVAL procedure itself.)
Thus to prove the theorem, we need to modify $(G, E, D)$ into an encryption scheme supporting the NANDEVAL operation. Our new

scheme will use the same encryption and decryption algorithms $E$ and $D$, but the following modification of the key generation algorithm: after running $(d, e) = G(1^n)$, we will append to the public key an encryption $c^* = E_e(d)$ of the secret key. We have now defined the key generation, encryption and decryption. CPA security follows from the security of the original scheme, where by circular security we refer exactly to the condition that the scheme is secure even if the adversary gets a single encryption of the secret key. [Footnote 6: Without this assumption we can still obtain a form of FHE known as a leveled FHE where the size of the public key grows with the depth of the circuit to be evaluated. We can do this by having $\ell$ public keys where $\ell$ is the depth we want to evaluate, and encrypting the private key of the $i^{th}$ key with the $(i+1)^{st}$ public key. However, since circular security seems quite likely to hold, we ignore this extra complication in the rest of the discussion.] This latter condition is not known to be implied by standard CPA security, but as far as we know it is satisfied by all natural public key encryptions, including the LWE-based ones we will plug into this theorem later on.
So, now all that is left is to define the NANDEVAL operation. On input two ciphertexts $c$ and $c'$, we will construct the function $f_{c,c'} : \{0,1\}^n \to \{0,1\}$ (where $n$ is the length of the secret key) such that $f_{c,c'}(d) = NAND(D_d(c), D_d(c'))$. It would be useful to pause at this point and make sure you understand what are the inputs to $f_{c,c'}$, what are its "hardwired constants", and what is its output. The ciphertexts $c$ and $c'$ are simply treated as fixed strings and are not part of the input to $f_{c,c'}$. Rather, $f_{c,c'}$ is a function (depending on the strings $c, c'$) that maps the secret key into a bit. When running NANDEVAL we of course do not know the secret key $d$, but we can still design a circuit that computes this function $f_{c,c'}$. Now $\text{NANDEVAL}(c, c')$ will simply be defined as $EVAL(f_{c,c'}, c^*)$.
Since $c^* = E_e(d)$, we get that

$D_d(\text{NANDEVAL}(c, c')) = D_d(EVAL(f_{c,c'}, c^*)) = f_{c,c'}(d) = NAND(D_d(c), D_d(c'))\;.$

Thus indeed we map any pair of ciphertexts $c, c'$ that decrypt to $b, b'$ into a ciphertext $c''$ that decrypts to $NAND(b, b')$. This is all that we needed to prove. ■

P Don't let the short proof fool you. This theorem is quite deep and subtle, and requires some reading and re-reading to truly "get" it.

17 Fully homomorphic encryption: Construction

In the last lecture we defined fully homomorphic encryption, and showed the “bootstrapping theorem” that transforms a partially ho- momorphic encryption scheme into a fully homomorphic encryption, as long as the original scheme can homomorphically evaluate its own decryption circuit. In this lecture we will show an encryption scheme

(due to Gentry, Sahai and Waters, henceforth GSW) meeting the latter property. That is, this lecture is devoted to proving the following theorem: [Footnote 1: This theorem as stated was proven by Brakerski and Vaikuntanathan (ITCS 2014) building on a line of work initiated by Gentry's original STOC 2009 work. We will actually prove a weaker version of this theorem, due to Brakerski and Vaikuntanathan (FOCS 2011), which assumes a quantitative strengthening of LWE. However, we will not follow the proof of Brakerski and Vaikuntanathan but rather a scheme of Gentry, Sahai and Waters (CRYPTO 2013). Also note that, as noted in the previous lecture, all of these results require the extra assumption of circular security on top of LWE to achieve a non-leveled fully homomorphic encryption scheme.]

Theorem 17.1 — FHE from LWE. Assuming the LWE conjecture, there exists a partially homomorphic public key encryption $(G, E, D, EVAL)$ that fits the conditions of the bootstrapping theorem (Theorem 16.8). That is, for every two ciphertexts $c$ and $c'$, the function $d \mapsto NAND(D_d(c), D_d(c'))$ can be homomorphically evaluated by EVAL.

Before the detailed description and analysis, let us first outline our strategy. Something of essential importance is the following.

Definition 17.2 — Noisy Homomorphic Encryption. Suppose that $(G, E, D)$ is a CPA secure public key scheme and that $\eta$ is a measure which maps any ciphertext $c$ to its "noise" $\eta(c) \in [0, \infty)$. Denote $\mathcal{C}_b^\theta = \{c : D_d(c) = b, \eta(c) \leq \theta\}$. $(G, E, D, \text{ENAND})$ is called a noisy homomorphic encryption scheme if the following holds for some $q = q(n)$:

• $E_e(b) \in \mathcal{C}_b^{\sqrt{q}}$ for any plaintext $b$.

• If $c \in \mathcal{C}_b^\eta$ with $\eta \leq q/4$, then $D_d(c) = b$.


• For any $c \in \mathcal{C}_b^\eta$ and $c' \in \mathcal{C}_{b'}^{\eta'}$, it holds that $\text{ENAND}(c, c') \in \mathcal{C}_{NAND(b,b')}^{n^3 \cdot \max\{\eta, \eta'\}}$ as long as $n^3 \cdot \max\{\eta, \eta'\} < q/4$.

The noisy homomorphic encryption condition states that if $c$ and $c'$ encrypt $b$ and $b'$ up to error $\eta$ and $\eta'$, respectively, then $\text{ENAND}(c, c')$ encrypts $NAND(b, b')$ up to some error which can be controlled by $\eta, \eta'$. The coefficient $n^3$ is not essential here; we just need it to be of order $poly(n)$. This property allows us to perform the ENAND operation repeatedly as long as we can guarantee that the accumulated error stays smaller than $q/4$, which means that decryption can still be done correctly. The next theorem tells us up to what depth a circuit can be computed homomorphically.

Theorem 17.3 If there exists a noisy homomorphic encryption scheme with $q = 2^{\sqrt{n}}$, then it can be extended to a homomorphic encryption scheme for any circuit with depth smaller than $polylog(n)$.

Proof. For any function $f : \{0,1\}^m \to \{0,1\}$ which can be described by a circuit of depth $\ell$, we can compute $EVAL(f, E_e(x_1), \ldots, E_e(x_m))$ with error up to $\sqrt{q}(n^3)^\ell$. (The initial error for $E_e(x_i)$ is smaller than $\sqrt{q}$ and the error accumulates at rate up to $n^3$ per level.) Thus, to guarantee that $EVAL(f, E_e(x_1), \ldots, E_e(x_m))$ can be decrypted to $f(x_1, \ldots, x_m)$ correctly, we only need $\sqrt{q}(n^3)^\ell \ll q$, i.e., $(n^3)^\ell \ll \sqrt{q} = 2^{\sqrt{n}/2}$. This is equivalent to $3\ell \log n \ll \sqrt{n}/2$, which can be guaranteed when $\ell = n^{o(1)}$ or $\ell = polylog(n)$. ■

With Theorem 17.3, what remains is to verify that the function $d \mapsto NAND(D_d(c), D_d(c'))$ can be computed by a circuit of depth $polylog(n)$; then we can obtain a fully homomorphic encryption scheme. We will go into the details of how to construct what we want in the rest of this chapter. The most technical and interesting part will be how to upper bound the noise/error.
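To make the arithmetic in the proof of Theorem 17.3 concrete, the sketch below just redoes it in base-2 logarithms: with $q = 2^{\sqrt{n}}$, initial noise $\sqrt{q}$, and per-level growth factor $n^3$, the largest evaluable depth is roughly $\sqrt{n}/(6 \log n)$, which eventually dominates any fixed $polylog(n)$. The function name `max_depth` and the concrete "$< q/4$" cutoff are ours.

```python
import math

def max_depth(n):
    # noise after depth L is sqrt(q) * (n^3)^L with q = 2^sqrt(n); decryption
    # works while this stays below q/4, i.e. sqrt(n)/2 + 3*L*log2(n) < sqrt(n) - 2
    budget = math.sqrt(n) - 2 - math.sqrt(n) / 2
    return int(budget // (3 * math.log2(n)))

assert max_depth(2**20) == 8
assert max_depth(2**30) == 182
# supported depth keeps growing with n, eventually beating any fixed polylog(n)
assert max_depth(2**40) > max_depth(2**30) > max_depth(2**20)
print([max_depth(2**k) for k in (20, 30, 40)])
```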

17.1 PRELUDE: FROM VECTORS TO MATRICES

In the linear homomorphic scheme we saw in the last lecture, every ciphertext was a vector $c \in \mathbb{Z}_q^n$ such that $\langle c, s \rangle$ equals (up to scaling by $\lfloor q/2 \rfloor$) the plaintext bit. We saw that adding two ciphertexts modulo $q$ corresponded to XOR'ing (i.e., adding modulo $2$) the corresponding two plaintexts. That is, if we define $c \oplus c'$ as $c + c' \pmod{q}$ then

performing the $\oplus$ operation on the ciphertexts corresponds to adding modulo $2$ the plaintexts.
However, to get to a fully, or even partially, homomorphic scheme, we need to find a way to perform the NAND operation on the two plaintexts. The challenge is that it seems that to do so we need to find a way to evaluate multiplications: find a way to define some operation $\otimes$ on ciphertexts that corresponds to multiplying the plaintexts. Alas, a priori, there doesn't seem to be a natural way to multiply two vectors.
The GSW approach to handling this is to move from vectors to matrices. As usual, it is instructive to first consider the cryptographer's dream world where Gaussian elimination doesn't exist. In this case, the GSW ciphertext encrypting $b \in \{0,1\}$ would be an $n \times n$ matrix $C$ over $\mathbb{Z}_q$ such that $Cs = bs$, where $s \in \mathbb{Z}_q^n$ is the secret key. That is, the encryption of a bit $b$ is a matrix $C$ such that the secret key is an eigenvector (modulo $q$) of $C$ with corresponding eigenvalue $b$. (We defer discussion of how the encrypting party generates such a ciphertext, since this is in any case only a "dream" toy example.)

P You should make sure you understand the types of all the identifiers we refer to. In particular, above $C$ is an $n \times n$ matrix with entries in $\mathbb{Z}_q$, $s$ is a vector in $\mathbb{Z}_q^n$, and $b$ is a scalar (i.e., just a number) in $\{0,1\}$. See Fig. 17.1 for a visual representation of the ciphertexts in this "naive" encryption scheme. Keeping track of the dimensions of all objects will become only more important in the rest of this lecture.

Given $C$ and $s$ we can recover $b$ by just checking if $Cs = s$ or $Cs = 0$. The scheme allows homomorphic evaluation of both addition (modulo $q$) and multiplication, since if $Cs = bs$ and $C's = b's$ then we can define $C \oplus C' = C + C'$ (where on the righthand side addition is simply done in $\mathbb{Z}_q$) and $C \otimes C' = CC'$ (where again this refers to matrix multiplication in $\mathbb{Z}_q$). Indeed, one can verify that both addition and multiplication succeed since

$(C + C')s = (b + b')s$

and

$CC's = C(b's) = b'Cs = bb's$

where all these equalities are in $\mathbb{Z}_q$.

Figure 17.1: In the "naive" version of the GSW encryption, to encrypt a bit $b$ we output an $n \times n$ matrix $C$ such that $Cs = bs$ where $s \in \mathbb{Z}_q^n$ is the secret key. In this scheme we can transform encryptions $C, C'$ of $b, b'$ respectively to an encryption $C''$ of $NAND(b, b')$ by letting $C'' = I - CC'$.

Addition modulo $q$ is not the same as XOR, but given these multiplication and addition operations we can implement the NAND operation as well. Specifically, for every $b, b' \in \{0,1\}$, $NAND(b, b') = 1 - bb'$. Hence we can take a ciphertext $C$ encrypting $b$ and a ciphertext $C'$

encrypting $b'$ and transform these two ciphertexts to the ciphertext $C'' = (I - CC')$ that encrypts $NAND(b, b')$ (where $I$ is the identity matrix). Thus in a world without Gaussian elimination it is not hard to get a fully homomorphic encryption.
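In the dream world we could even run this toy scheme. The sketch below (parameters and helper names are ours, and of course a real scheme could never hand out such ciphertexts, since with Gaussian elimination anyone could recover $s$ from them) builds $C = bI + A$ with $As = 0$, so that $s$ is an eigenvector of $C$ with eigenvalue $b$, and checks that $I - CC'$ indeed encrypts $NAND(b, b')$.

```python
import random

random.seed(4)
n, q = 8, 257   # q prime, so every nonzero entry of s is invertible mod q

def ann_row(s):
    # a random row r with <r, s> = 0 (mod q): pick n-1 entries, solve the last
    r = [random.randrange(q) for _ in range(n - 1)]
    r.append((-sum(ri * si for ri, si in zip(r, s)) * pow(s[-1], -1, q)) % q)
    return r

def encrypt(b, s):
    # dream-world ciphertext: C = b*I + A with A s = 0, hence C s = b s
    A = [ann_row(s) for _ in range(n)]
    return [[(A[i][j] + (b if i == j else 0)) % q for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(a * x for a, x in zip(row, v)) % q for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % q for j in range(n)]
            for i in range(n)]

s = [random.randrange(q) for _ in range(n - 1)] + [random.randrange(1, q)]

for b in (0, 1):
    assert matvec(encrypt(b, s), s) == [(b * si) % q for si in s]   # C s = b s

for b1 in (0, 1):
    for b2 in (0, 1):
        P = matmul(encrypt(b1, s), encrypt(b2, s))
        nand = [[(int(i == j) - P[i][j]) % q for j in range(n)] for i in range(n)]
        want = 1 - b1 * b2                        # NAND(b1, b2)
        assert matvec(nand, s) == [(want * si) % q for si in s]
print("naive GSW: eigenvalue homomorphism verified")
```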

R Remark 17.4 — Private key FHE. We have not shown how to generate a ciphertext without knowledge of $s$, and hence strictly speaking we only showed in this world how to get a private key fully homomorphic encryption. Our "real world" scheme will be a full-fledged public key FHE. However, we note that private key homomorphic encryption is already very interesting and in fact sufficient for many of the "cloud computing" applications. Moreover, Rothblum gave a generic transformation from a private key homomorphic encryption to a public key homomorphic encryption.

17.2 REAL WORLD PARTIALLY HOMOMORPHIC ENCRYPTION

We now discuss how we can obtain an encryption in the real world where, as much as we'd like to ignore it, there are people who walk among us (not to mention some computer programs) that actually know how to invert matrices. As usual, the idea is to "fool Gaussian elimination with noise", but we will see that we have to be much more careful about "noise management"; otherwise even for the party holding the secret key the noise will overwhelm the signal. [Footnote 2: For this reason, Craig Gentry called his highly recommended survey on fully homomorphic encryption and other advanced constructions computing on the edge of chaos.]
The main idea is that we can expect the following problem to be hard for a random secret $s \in \mathbb{Z}_q^n$: distinguish between samples of random matrices $C$ and matrices where $Cs = bs + e$ for some $b \in \{0,1\}$ and "short" $e$ satisfying $|e_i| \leq \sqrt{q}$ for all $i$. This yields a natural candidate for an encryption scheme where we encrypt $b$ by a matrix $C$ satisfying $Cs = bs + e$ where $e$ is a "short" vector. [Footnote 3: We deliberately leave some flexibility in the definition of "short". While initially "short" might mean that $|e_i| < \sqrt{q}$ for every $i$, decryption will succeed as long as $|e_i|$ is, say, at most $q/100$.]
We can now try to check what adding and multiplying two matrices does to the noise. If $Cs = bs + e$ and $C's = b's + e'$ then

$(C + C')s = (b + b')s + (e + e') \qquad (17.1)$

and

$CC's = C(b's + e') = b'Cs + Ce' = bb's + (b'e + Ce')\;. \qquad (17.2)$

P I recommend you pause here and check for yourself whether it will be the case that $C + C'$ encrypts $b + b'$ and $CC'$ encrypts $bb'$ up to small noise or not.
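The two cases can be checked numerically. In the sketch below (parameters are ours), adding two short vectors keeps the noise short, but hitting a short vector with a random matrix $C$, as happens to $e'$ in the $Ce'$ term of (17.2), almost certainly produces coordinates of magnitude around $q/4$ or more.

```python
import random

random.seed(2)
n, q = 64, 1 << 16
sq = 256   # sqrt(q): our bound on "short" entries

def size(x):           # |x| = min(x mod q, q - x mod q)
    x %= q
    return min(x, q - x)

e  = [random.randrange(-sq, sq + 1) for _ in range(n)]
e2 = [random.randrange(-sq, sq + 1) for _ in range(n)]

# addition keeps noise short: |e_i + e'_i| <= 2*sqrt(q), as in (17.1)
assert all(size(a + b) <= 2 * sq for a, b in zip(e, e2))

# but the C e' term of (17.2) blows up: C is (indistinguishable from) random,
# so each entry of C e' is essentially uniform in Z_q
C = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
Ce2 = [sum(c * x for c, x in zip(row, e2)) % q for row in C]
assert max(size(y) for y in Ce2) > q // 4
print("max |(C e')_i| =", max(size(y) for y in Ce2), "vs q =", q)
```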

We would have loved to say that we can define $C \oplus C' = C + C' \pmod{q}$ and $C \otimes C' = CC' \pmod{q}$. For this we would need that $(C + C')s$ equals $(b + b')s$ plus a "short" vector and that $CC's$ equals $bb's$ plus a "short" vector. The former statement indeed holds: looking at (17.1) we see that $(C + C')s$ equals $(b + b')s$ up to the "noise" vector $e + e'$, and if $e, e'$ are "short" then $e + e'$ is not too long either. That is, if $|e_i| < \delta q$ and $|e'_i| < \delta q$ for every $i$, then $|e_i + e'_i| < 2\delta q$. So we can at least handle a significant number of additions before the noise gets out of hand.
However, if we consider (17.2), we see that $CC's$ will be equal to $bb's$ plus the "noise vector" $b'e + Ce'$. The first component $b'e$ of this noise vector is "short" (after all $b' \in \{0,1\}$ and $e$ is "short"). However, the second component $Ce'$ could be a very large vector. Indeed, since $C$ looks like a random matrix in $\mathbb{Z}_q$, no matter how small the entries of $e'$, many of the entries of $Ce'$ are quite likely to be of magnitude at least, say, $q/2$, and so multiplying $e'$ by $C$ takes us "beyond the edge of chaos".

17.3 NOISE MANAGEMENT VIA ENCODING

The problem we had above is that the entries of $C$ are elements in $\mathbb{Z}_q$ that can be very large, while we would have loved them to be small numbers such as $0$ or $1$. At this point one could say

"If only there was some way to encode numbers between $0$ and $q-1$ using only $0$'s and $1$'s…"

If you think about it hard enough, it turns out that there is something known as the "binary basis" that allows us to encode a number $x \in \mathbb{Z}_q$ as a vector $\hat{x} \in \{0,1\}^{\log q}$. [Footnote 4: If we were being pedantic the length of the vector (and other constants below) should be the integer $\lceil \log q \rceil$, but I omit the ceiling symbols for simplicity of notation.] What's even more surprising is that this seemingly trivial trick turns out to be immensely useful. We will define the binary encoding of a vector or matrix $x$ over $\mathbb{Z}_q$ by $\hat{x}$. That is, $\hat{x}$ is obtained by replacing every coordinate $x_i$ with $\log q$ coordinates $x_{i,0}, \ldots, x_{i,\log q - 1}$ such that

$x_i = \sum_{j=0}^{\log q - 1} 2^j x_{i,j}\;. \qquad (17.3)$

Specifically, if $s \in \mathbb{Z}_q^n$, then we denote by $\hat{s}$ the $n\log q$-dimensional vector with entries in $\{0,1\}$, such that each $\log q$-sized block of $\hat{s}$ encodes a coordinate of $s$. Similarly, if $C$ is an $m \times n$ matrix, then we denote by $\hat{C}$ the $m \times n\log q$ matrix with entries in $\{0,1\}$ that corresponds to encoding every $n$-dimensional row of $C$ by an $n\log q$-dimensional row, where each $\log q$-sized block corresponds to a single entry. (We still think of the entries of these vectors and matrices as elements of $\mathbb{Z}_q$ and so all calculations are still done modulo $q$.)
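A short sketch of (17.3) in code (helper names are ours): `binarize_vec` computes the binary encoding, and since decoding is linear it can be done by a fixed matrix $Q$ of the kind described next in the text; the sketch also checks the row-by-row matrix identity $\hat{C}Q^\top = C$.

```python
import random

random.seed(0)
n, L = 3, 8
q = 1 << L    # so log q = L exactly

def binarize_vec(v):
    # hat(v): replace each entry m in Z_q by its L bits, m = sum_j 2^j m_{i,j}
    return [(m >> j) & 1 for m in v for j in range(L)]

# Q: the n x (n*L) linear decoding matrix; row i holds (1,2,...,2^{L-1}) in block i
Q = [[(1 << j) if blk == i else 0 for blk in range(n) for j in range(L)]
     for i in range(n)]

s = [random.randrange(q) for _ in range(n)]
# decoding is linear: Q s_hat = s, which is just (17.3) applied blockwise
assert [sum(a * b for a, b in zip(row, binarize_vec(s))) % q for row in Q] == s

# the same works row-by-row for matrices: hat(C) Q^T = C
C = [[random.randrange(q) for _ in range(n)] for _ in range(n)]
C_hat = [binarize_vec(row) for row in C]           # n x (n*L), entries in {0,1}
QT = [list(col) for col in zip(*Q)]                # Q^T: (n*L) x n
prod = [[sum(a * b for a, b in zip(h_row, q_col)) % q for q_col in zip(*QT)]
        for h_row in C_hat]
assert prod == C
print("Q s_hat == s and hat(C) Q^T == C hold")
```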

While encoding in the binary basis is not a linear operation, the decoding operation is linear, as one can see in (17.3). We let $Q$ be the $n \times (n\log q)$ "decoding" matrix that maps an encoding vector $\hat{s}$ back to the original vector $s$. Specifically, every row of $Q$ is composed of $n$ blocks each of size $\log q$, where the $i$-th row has only the $i$-th block nonzero, and equal to the values $(1, 2, 4, \ldots, 2^{\log q - 1})$. It's a good exercise to verify that for every vector $s \in \mathbb{Z}_q^n$ and matrix $C \in \mathbb{Z}_q^{n \times n}$, $Q\hat{s} = s$ and $\hat{C}Q^\top = C$. (See Fig. 17.2 and Fig. 17.3.)

Figure 17.2: We can encode a vector $s \in \mathbb{Z}_q^n$ as a vector $\hat{s} \in \mathbb{Z}_q^{n\log q}$ that has only entries in $\{0,1\}$ by using the binary encoding, replacing every coordinate of $s$ with a $\log q$-sized block in $\hat{s}$. The decoding operation is linear and so we can write $s = Q\hat{s}$ for a specific (simple) $n \times (n\log q)$ matrix $Q$.

In our final scheme the ciphertext encrypting $b$ will be an $(n\log q) \times (n\log q)$ matrix $C$ with small coefficients such that $Cv = bv + e$ for a "short" $e \in \mathbb{Z}_q^{n\log q}$ and $v = Q^\top s$, where $s \in \mathbb{Z}_q^n$. Now given ciphertexts $C, C'$ that encrypt $b, b'$ respectively, we will define $C \oplus C' = C + C' \pmod{q}$ and $C \otimes C' = (\widehat{CQ^\top})C'$.

Since we have $Cv = bv + e$ and $C'v = b'v + e'$, we get that

$(C \oplus C')v = (C + C')v = (b + b')v + (e + e') \qquad (17.4)$

and

$(C \otimes C')v = (\widehat{CQ^\top})C'v = (\widehat{CQ^\top})(b'v + e')\;. \qquad (17.5)$

But since $v = Q^\top s$ and $\hat{A}Q^\top = A$ for every matrix $A$, the righthand side of (17.5) equals

$(\widehat{CQ^\top})(b'Q^\top s + e') = b'(\widehat{CQ^\top})Q^\top s + (\widehat{CQ^\top})e' = b'Cv + (\widehat{CQ^\top})e'\;. \qquad (17.6)$

But since $\widehat{CQ^\top}$ is a matrix with small ($0/1$) coefficients and $e'$ is short, the righthand side of (17.6) equals $b'Cv$ up to a short vector, and since $Cv = bv + e$ and $e$ is short, we get that $(C \otimes C')v$ equals $bb'v$ plus a short vector, as desired.
If we keep track of the parameters in the above analysis, we can see that if we define

$C \overline{\wedge} C' = I - C \otimes C'$

then if $C$ encrypts $b$ and $C'$ encrypts $b'$ with noise vectors $e, e'$ satisfying $\max_i |e_i| \leq \mu$ and $\max_i |e'_i| \leq \mu'$, then $C \overline{\wedge} C'$ encrypts $NAND(b, b')$ up to a vector of maximum magnitude at most $O(\mu n + n\log q \cdot \mu')$, which is definitely smaller than $n^3 \cdot \max\{\mu, \mu'\}$ for $q = 2^{\sqrt{n}}$.

Figure 17.3: We can encode an $n \times n$ matrix $C$ over $\mathbb{Z}_q$ by an $n \times (n\log q)$ matrix $\hat{C}$ using the binary basis. We have the equation $\hat{C}Q^\top = C$ where $Q$ is the same matrix we use to decode a vector.

17.4 PUTTING IT ALL TOGETHER

We now describe the full scheme. We are going to use a quantitatively stronger version of LWE: namely, the $q(n)$-dLWE assumption for $q(n) = 2^{\sqrt{n}}$. It is not hard to show that we can relax our assumption to $q(n) = 2^{polylog(n)}$-dLWE, and Brakerski and Vaikuntanathan showed how to relax the assumption to standard (i.e., $q(n) = poly(n)$) LWE, though we will not present this here.

FHEENC:

• Key generation: As in the scheme of the last lecture, the secret key is $s \in \mathbb{Z}_q^n$ and the public key is a generator $G_s$ such that samples from $G_s(1^n)$ are indistinguishable from independent random samples from $\mathbb{Z}_q^n$, but if $c$ is output by $G_s(1^n)$ then $|\langle c, s \rangle| < \sqrt{q}$, where the inner product (as all other computations) is done modulo $q$ and for every $x \in \mathbb{Z}_q = \{0, \ldots, q-1\}$ we define $|x| = \min\{x, q - x\}$. As before, we can assume that $s_1 = \lfloor q/2 \rfloor$, which implies that $(Q^\top s)_1$ is also $\lfloor q/2 \rfloor$, since (as can be verified by direct inspection) the first row of $Q^\top$ is $(1, 0, \ldots, 0)$.

• Encryption: To encrypt $b \in \{0,1\}$, let $d_1, \ldots, d_{n\log q} \leftarrow_R G_s(1^n)$ and output $C = \widehat{(bQ^\top + D)}$, where $D$ is the matrix whose rows are the vectors $d_1, \ldots, d_{n\log q}$ generated from $G_s$. (See Fig. 17.4.)

• Decryption: To decrypt the ciphertext $C$, we output $0$ if $|(CQ^\top s)_1| < 0.1q$ and output $1$ if $0.6q > |(CQ^\top s)_1| > 0.4q$; see Fig. 17.5. (It doesn't matter what we output in other cases.)

• NAND evaluation: Given ciphertexts $C, C'$, we define $C \overline{\wedge} C'$ (sometimes also denoted as $\text{NANDEVAL}(C, C')$) to equal $I - (\widehat{CQ^\top})C'$, where $I$ is the $(n\log q) \times (n\log q)$ identity matrix.

Figure 17.4: In our fully homomorphic encryption, the public key is a trapdoor generator $G_s$. To encrypt a bit $b$, we output $C = \widehat{(bQ^\top + D)}$ where $D$ is an $(n\log q) \times n$ matrix whose rows are generated using $G_s$.

P Please take your time to read the definition of the scheme, and go over Fig. 17.4 and Fig. 17.5 to make sure you understand it.
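To see all the pieces of FHEENC interact, here is an end-to-end toy sketch in Python. All parameter choices and helper names are ours, and the trapdoor generator is faked: instead of a real $G_s$, `fake_Gs_row` samples rows $d$ with $\langle d, s \rangle$ small by solving for the last coordinate. So this sketch demonstrates only correctness and the NAND homomorphism of the scheme's algebra, not its security.

```python
import random

random.seed(1)
n, L = 4, 16
q = 1 << L                       # q = 2^16, log q = L

def binarize(M):
    # hat(M): replace each Z_q entry by its L bits (m = sum_j 2^j m_j)
    return [[(m >> j) & 1 for m in row for j in range(L)] for row in M]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]

def matmul(A, B):
    Bcols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) % q for col in Bcols] for row in A]

def size(x):                     # |x| = min(x, q - x)
    x %= q
    return min(x, q - x)

# secret key: s_1 = q/2 (so (Q^T s)_1 = q/2) and s_n odd so we can fake G_s
s = [q // 2] + [random.randrange(q) for _ in range(n - 2)] + [random.randrange(1, q, 2)]
s_inv = pow(s[-1], -1, q)

# Q^T: the (n*L) x n matrix with 2^j in row i*L+j, column i; first row is (1,0,...,0)
QT = [[(1 << j) if col == i else 0 for col in range(n)]
      for i in range(n) for j in range(L)]

def fake_Gs_row():
    # stand-in for one sample of the trapdoor generator: <d, s> = e with |e| <= 2
    d = [random.randrange(q) for _ in range(n - 1)]
    e = random.randrange(-2, 3)
    d.append(((e - sum(di * si for di, si in zip(d, s))) * s_inv) % q)
    return d

def encrypt(b):
    D = [fake_Gs_row() for _ in range(n * L)]
    M = [[(b * QT[i][k] + D[i][k]) % q for k in range(n)] for i in range(n * L)]
    return binarize(M)           # C = hat(b*Q^T + D), a 0/1 square matrix

def decrypt(C):
    v = matvec(QT, s)            # v = Q^T s, so (C v)_1 = b*q/2 + small noise
    return 1 if size(matvec(C, v)[0]) > q // 4 else 0

def nand_eval(C1, C2):           # C1 NAND-bar C2 = I - hat(C1 Q^T) C2
    P = matmul(binarize(matmul(C1, QT)), C2)
    return [[(int(i == k) - P[i][k]) % q for k in range(n * L)] for i in range(n * L)]

for b in (0, 1):
    assert decrypt(encrypt(b)) == b
for b1 in (0, 1):
    for b2 in (0, 1):
        assert decrypt(nand_eval(encrypt(b1), encrypt(b2))) == 1 - b1 * b2
print("toy FHEENC: decryption and NAND evaluation correct")
```

After one NAND the noise is at most about $2 + 2nL$, far below the $q/4$ decryption threshold, matching the noisy homomorphism analysis that follows.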

17.5 ANALYSIS OF OUR SCHEME

To show that this scheme is a valid partially homomorphic scheme we need to show the following properties:

1. Correctness: The decryption of an encryption of $b \in \{0,1\}$ equals $b$.

2. CPA security: An encryption of $0$ is computationally indistinguishable from an encryption of $1$ to someone that got the public key.

Figure 17.5: We decrypt a ciphertext $C = \widehat{(bQ^\top + D)}$ by looking at the first coordinate of $CQ^\top s$ (or equivalently, $CQ^\top Q\hat{s}$). If $b = 0$ then this equals the first coordinate of $Ds$, which is at most $\sqrt{q}$ in magnitude. If $b = 1$ then we get an extra factor of $(Q^\top s)_1$, which we set to be in the interval $(0.499q, 0.51q)$. We can think of either $s$ or $\hat{s}$ as our secret key.

3. Homomorphism: If $C$ encrypts $b$ and $C'$ encrypts $b'$ then $C \overline{\wedge} C'$ encrypts $NAND(b, b')$ (with a higher amount of noise). The growth of the noise will be the reason that we will not immediately get a fully homomorphic encryption.

4. Shallow decryption circuit: To plug this scheme into the bootstrapping theorem we will need to show that its decryption algorithm (or more accurately, the function in the statement of the bootstrapping theorem) can be evaluated in depth $polylog(n)$ (independently of $q$), and that moreover, the noise grows slowly enough that our scheme is homomorphic with respect to such circuits.

Once we obtain 1-4 above, we can plug FHEENC into the Bootstrapping Theorem (Theorem 16.8) and thus complete the proof of existence of a fully homomorphic encryption scheme. We now address those points one by one.

17.5.1 Correctness
Correctness of the scheme will follow from the following stronger condition:

Lemma 17.5 For every $b \in \{0,1\}$, if $C$ is the encryption of $b$ then it is an $(n\log q) \times (n\log q)$ matrix satisfying

$CQ^\top s = bQ^\top s + e$

where $\max_i |e_i| \ll \sqrt{q}$.

푖 √ Proof. For starters,|푒 | ≪ let푞 us see that the dimensions make sense: the encryption of is computed by where is an

log matrix satisfying ⊤for every . Since is푏 also an log 퐶 =matrix,(푏푄̂ adding+ 퐷) (i.e.퐷 either 푖 √ or(푛 the푞) all-zeroes ×⊤ 푛 matrix, depending|퐷푠| on≤ whether푞 or not푖 ⊤ ) to ⊤ makes sense푄 and applying(푛 the푞) ×operation 푛 will transform푏푄 every row푄 to length log and hence is indeed a square log 푏 = 1 log퐷 matrix. ̂⋅ Let us now푛 see푞 what this퐶 matrix does to the(푛 vector푞) × (푛 .푞) Using the fact that for every matrix , we get that ⊤ ⊤ 퐶 푣 = 푄 푠 푀푄̂ = 푀 푀 ⊤ but by construction 퐶푣 = (푏푄 +for 퐷)푠 every = 푏푣. + 퐷푠 ■ 푖 √ |(퐷푠) | ≤ 푞 푖 Lemma 17.5 implies correctness of decryption since by construction we ensured that and hence we get that if then ⊤ and if then 1 . (푄 푠) ∈ (0.499푞, 0.5001푞) 1 푣 1 푏 = 0 |(퐶푣) | = 표(푞) 푏 = 1 0.499푞 − 표(푞) ≤ |(퐶 ) | ≤ 0.501푞 + 표(푞) fully homomorphic encryption: construction 301

17.5.2 CPA Security
To show CPA security we need to show that an encryption of $0$ is indistinguishable from an encryption of $1$. However, by the security of the trapdoor generator, an encryption of $b$ computed according to our algorithm will be indistinguishable from an encryption of $b$ obtained when the matrix $D$ is a random $(n\log q) \times n$ matrix. Now in this case the encryption is obtained by applying the $\hat{\cdot}$ operation to $bQ^\top + D$, but if $D$ is uniformly random then for every choice of $b$, $bQ^\top + D$ is uniformly random (since a fixed matrix plus a random matrix yields a random matrix), and hence the matrix $bQ^\top + D$ (and so also the matrix $\widehat{bQ^\top + D}$) contains no information about $b$. This completes the proof of CPA security (can you see why?).
If we want to plug this scheme into the bootstrapping theorem, then we will also need to assume that it is circular secure. It seems a reasonable assumption, though unfortunately at the moment we do not know how to derive it from LWE. (If we don't want to make this assumption we can still obtain a leveled fully homomorphic encryption, as discussed in the previous lecture.)

17.5.3 Homomorphism
Let $v = Q^\top s$, $b \in \{0,1\}$, and $C$ be a ciphertext such that $Cv = bv + e$. We define the noise of $C$, denoted $\mu(C)$, to be the maximum of $|e_i|$ over all $i \in [n\log q]$. We make the following lemma, which we'll call the "noisy homomorphism lemma":

Lemma 17.6 Let $C, C'$ be ciphertexts encrypting $b, b'$ respectively with $\mu(C), \mu(C') \leq q/4$. Then $C'' = C \overline{\wedge} C'$ encrypts $NAND(b, b')$ and satisfies

$\mu(C'') \leq (2n\log q)\max\{\mu(C), \mu(C')\}\;. \qquad (17.7)$

Proof. This follows from the calculations we have done before. As we've seen,

$\widehat{CQ^\top}C'v = \widehat{CQ^\top}(b'v + e') = b'\widehat{CQ^\top}Q^\top s + \widehat{CQ^\top}e' = b'(Cv) + \widehat{CQ^\top}e' = bb'v + b'e + \widehat{CQ^\top}e'\;.$

But since $\widehat{CQ^\top}$ is a $0/1$ matrix with every row of length $n\log q$, for every $i$, $(\widehat{CQ^\top}e')_i \leq (n\log q)\max_j |e'_j|$. We see that the noise vector in the product has magnitude at most $\mu(C) + n\log q \cdot \mu(C')$. Adding the identity for the NAND operation adds at most $\mu(C) + \mu(C')$ to the noise, and so the total noise magnitude is bounded by the righthand side of (17.7). ■

17.5.4 Shallow decryption circuit Recall that to plug in our homomorphic encryption scheme into the bootstrapping theorem, we needed to show that for every cipher- texts (generated by the encryption algorithm) the function ′ 퐶, 퐶 302 an intensive introduction to cryptography

$f : \{0,1\}^{n\log q} \to \{0,1\}$ defined as

$$f(d) = D_d(C) \text{ NAND } D_d(C')$$

can be homomorphically evaluated, where $d$ is the secret key and $D_d(C)$ denotes the decryption algorithm applied to $C$. In our case we can think of the secret key as the binary string $\hat{s}$ which describes our vector $s$ as a bit string of length $n\log q$. Given a ciphertext $C$, the decryption algorithm takes the dot product modulo $q$ of $s$ with the first row of $CQ^\top$ (or, equivalently, the dot product of $\hat{s}$ with the first row of $CQ^\top Q$) and outputs $0$ (respectively $1$) if the resulting number is small (respectively large).

By repeatedly applying the noisy homomorphism lemma (Lemma 17.6), we can show that we can homomorphically evaluate every circuit of NAND gates whose depth $\ell$ satisfies $(2n\log q)^\ell \ll q$. If $q = 2^{\sqrt{n}}$ then (assuming $n$ is sufficiently large) this will be satisfied as long as $\ell < n^{0.49}$.

In particular to show that $f(\cdot)$ can be homomorphically evaluated it will suffice to show that for every fixed vector $c \in \mathbb{Z}_q^{n\log q}$ there is a circuit $F$ of depth $polylog(n) \ll n^{0.49}$ that on input a string $\hat{s} \in \{0,1\}^{n\log q}$ will output $0$ if $|\langle c, \hat{s}\rangle| < q/10$ and output $1$ if $|\langle c, \hat{s}\rangle| > q/5$. (We don't care what $F$ does otherwise. The above suffices since given a ciphertext $C$ we can use $F$ with the vector $c$ being the top row of $CQ^\top Q$, and hence $\langle c, \hat{s}\rangle$ would correspond to the first entry of $CQ^\top s$. Note that if $F$ has depth $\ell$ then the function $f()$ above has depth at most $\ell + 1$.)

P
Please make sure you understand the above argument.

If $c = (c_1, \ldots, c_{n\log q})$ is a vector, then to compute its inner product with a $0/1$ vector $\hat{s}$ we simply need to sum up the numbers $c_i$ for those $i$ where $\hat{s}_i = 1$. Summing up $m$ numbers can be done via the obvious recursion in depth that is $\log m$ times the depth of a single addition of two numbers. However, the naive way to add two numbers in $\mathbb{Z}_q$ (each represented by $\log q$ bits) will have depth $O(\log q)$, which is too much for us.

P
Please stop here and see if you understand why the natural circuit to compute the addition of two numbers modulo $q$ (represented as $\log q$-length binary strings) will require depth $O(\log q)$. As a hint, one needs to keep track of the "carry".
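As a concrete illustration of that hint, here is a sketch (my own, not from the notes) that tracks the depth at which each carry of a ripple-carry adder becomes available; the carry chain makes the depth linear in the bit-width, i.e., $\Theta(\log q)$:

```python
def ripple_carry_depth(w):
    """Circuit depth of the standard ripple-carry adder on two w-bit inputs.
    Each gate's output depth is 1 + max depth of its inputs, and the carry
    into position i depends on the carry into position i-1, so the carry
    chain forces depth linear in w."""
    input_depth = 0   # the input bits are available at depth 0
    carry_depth = 0
    for _ in range(w):
        # carry_out = majority(a_i, b_i, carry_in): about two gate levels
        carry_depth = max(input_depth, carry_depth) + 2
    return carry_depth

# Depth grows linearly with the bit-width, i.e. Theta(log q) for w = log q:
assert ripple_carry_depth(10) == 20
assert ripple_carry_depth(100) == 200
```

Carry-lookahead tricks can reduce this to $O(\log w) = O(\log\log q)$ per addition, but the truncation idea below is what the notes actually use.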

Fortunately, because we only care about accuracy up to $q/10$, if we add $m$ numbers, we can drop all but the first $100\log m$ most significant digits of our numbers, since including them can change the sum of the numbers by at most $m(q/m^{100}) \ll q$. Hence we can easily do this work in $poly(\log m)$ depth, which is $poly(\log n)$ since $m = poly(n)$. Let us now show this more formally:

Lemma 17.7 For every $c \in \mathbb{Z}_q^m$ there exists some function $f : \{0,1\}^m \to \{0,1\}$ such that:

1. For every $\hat{s} \in \{0,1\}^m$ such that $|\langle \hat{s}, c\rangle| < 0.1q$, $f(\hat{s}) = 0$.

2. For every $\hat{s} \in \{0,1\}^m$ such that $0.4q < |\langle \hat{s}, c\rangle| < 0.6q$, $f(\hat{s}) = 1$.

3. There is a circuit computing $f$ of depth at most $100(\log m)^3$.

Proof. For every number $x \in \mathbb{Z}_q$, write $\tilde{x}$ for the number obtained by writing $x$ in the binary basis and setting all digits except the $10\log m$ most significant ones to zero. Note that $\tilde{x} \leq x \leq \tilde{x} + q/m^{10}$. We define $f(\hat{s})$ to equal $1$ if $|\sum \hat{s}_i \tilde{c}_i \pmod{\tilde{q}}| \geq 0.3\tilde{q}$ and to equal $0$ otherwise (where as usual the absolute value of $x$ modulo $\tilde{q}$ is the minimum of $x$ and $\tilde{q} - x$). Note that all numbers involved have zeroes in all but the $10\log m$ most significant digits, and so these less significant digits can be ignored. Hence we can add any pair of such numbers modulo $\tilde{q}$ in depth $O(\log m)^2$ using the standard elementary school algorithm to add two $\ell$-digit numbers in $O(\ell^2)$ steps. Now we can add the $m$ numbers by adding pairs, and then adding up the results, and so on, in a binary tree of depth $\log m$, to get a total depth of $O(\log m)^3$. So, all that is left is to prove that this function $f$ satisfies conditions (1) and (2).

Note that $|\sum \hat{s}_i \tilde{c}_i - \sum \hat{s}_i c_i| < m \cdot q/m^{10} = q/m^9$, so now we want to show that the effect of taking the sum modulo $\tilde{q}$ is not much different from taking it modulo $q$. Indeed, note that this sum (before a modular reduction) is an integer between $0$ and $qm$. If $x$ is such an integer and we divide $x$ by $q$ to write $x = kq + r$ for $r < q$, then since $x < qm$, $k < m$, and so we can write $x = k\tilde{q} + k(q - \tilde{q}) + r$, so the difference between $x \bmod q$ and $x \bmod \tilde{q}$ will be (in our standard modular metric) at most $mq/m^{10} = q/m^9$. Overall we get that if $\sum \hat{s}_i c_i \bmod q$ is in the interval $[0.4q, 0.6q]$ then $\sum \hat{s}_i \tilde{c}_i \pmod{\tilde{q}}$ will be in the interval $[0.4q - 100q/m^9, 0.6q + 100q/m^9]$, which is contained in $[0.3\tilde{q}, 0.7\tilde{q}]$. ■

This completes the proof that our scheme can fit into the bootstrapping theorem (i.e., of Theorem 17.1), hence completing the description of the fully homomorphic encryption scheme.
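The truncation step is easy to check numerically. In this sketch (all parameters are my own toy choices), we zero all but the $10\log m$ most significant bits of each number and confirm that the sum moves by far less than $q/10$:

```python
import math
import random

random.seed(0)

m = 16                       # number of summands
q = (1 << 60) + 987654321    # a 61-bit modulus (deliberately not a power of two)
t = 10 * int(math.log2(m))   # keep the 10*log(m) = 40 most significant bits
low = q.bit_length() - t     # number of low-order bits that get zeroed out

def trunc(x):
    # write x in binary and zero all but the t most significant digit positions
    return (x >> low) << low

cs = [random.randrange(q) for _ in range(m)]
exact = sum(cs)
approx = sum(trunc(c) for c in cs)

# Each truncation drops less than 2**low <= 2q/m**10 from a number, so the
# total sum moves by less than m * 2**low, which is far smaller than q/10.
assert all(trunc(c) <= c < trunc(c) + (1 << low) for c in cs)
assert 0 <= exact - approx < m * (1 << low)
assert m * (1 << low) < q // 10
```

With only $40$-bit numbers left, schoolbook addition is cheap, which is exactly the depth saving the lemma exploits.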

P

Now would be a good point to go back and check that you understand how all the pieces fit together to obtain the complete construction of the fully homomorphic encryption scheme.

17.6 ADVANCED TOPICS

17.6.1 Fully homomorphic encryption for approximate computation over the real numbers: CKKS

We have seen how a fully homomorphic encryption for a plaintext bit $b$ can be constructed, and we are able to evaluate addition and multiplication of ciphertexts, as well as a NAND gate, in the ciphertext space. One can also extend the FHEENC scheme to encrypt a plaintext message $\mu \in \mathbb{Z}_q$ and evaluate multi-bit integer additions and multiplications more efficiently. The natural next question is floating/fixed point operations. They are similar to integer operations, but we need to be able to evaluate a rounding operation after every computation. Unfortunately, it has been considered difficult to evaluate the rounding operation while ensuring the correctness property. An easier solution is to assume approximate computation from the beginning and embrace the errors it causes.

The CKKS scheme, one of the recent schemes, addressed this challenge by allowing small errors in the decrypted results. Its correctness property is more relaxed than what we've seen before: the decryption need not be precisely the original message, and indeed this resolved the rounding operation problem, supporting approximate computation over the real numbers. To get more of a sense of its construction, recall that when we decrypt a ciphertext in the FHEENC scheme, we have $CQ^\top s = bQ^\top s + e$ where $\max_i |e_i| \ll \sqrt{q}$. Since $(Q^\top s)_1 \in (0.499q, 0.5001q)$, multiplying $b$ by this term places the plaintext bit near the most significant bits of the ciphertext, where the plaintext cannot be polluted by the encryption noise. Therefore, we are able to precisely remove the noise $e$ we added for security. However, this kind of separated placement actually makes evaluating the rounding operation difficult. The CKKS scheme, on the other hand, doesn't clearly separate the plaintext message and the noise in its decryption structure.
Specifically, decryption has the form $c^\top s = m + e$, where the noise lies in the least significant bits of the message and does pollute the lowest bits of the message. Note that this is acceptable as long as it preserves enough precision. Now we can evaluate rounding (i.e., rescaling in the paper) homomorphically, by dividing both a ciphertext $c$ and the parameter $q$ by some factor $p$, so that the new parameter is $q' = q/p$. The concept of handling ciphertexts with a different encryption parameter is already known to be possible. You can find more details on this modulus

switching technique in this paper if you are interested. Besides, it has also been proved that the precision loss of the decrypted evaluation result is at most one bit more than that of the plaintext computation, which means the scheme's precision guarantee is nearly optimal. This scheme offers an efficient homomorphic encryption setting for many practical data science and machine learning applications which do not require precise values, but approximate ones. You may check existing open source libraries of this scheme, such as MS SEAL and HEAAN, as well as many practical applications, including logistic regression, in the literature.
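The rescaling idea can be illustrated at the plaintext level, with no encryption at all. In the following sketch (the scale $\Delta = 2^{20}$ and the inputs are arbitrary choices of mine), multiplying two fixed-point encodings yields a value at scale $\Delta^2$, and dividing by $\Delta$ returns it to scale $\Delta$ at the cost of roughly one extra bit of error:

```python
DELTA = 2 ** 20                 # fixed-point scale (illustrative choice)

def encode(x):
    # real -> scaled integer; the rounding plays the role of a small "noise"
    return round(x * DELTA)

def decode(m):
    return m / DELTA

a, b = 3.141592, 2.718281
ma, mb = encode(a), encode(b)

prod = ma * mb                  # the product lives at scale DELTA**2
rescaled = round(prod / DELTA)  # "rescale": divide to get back to scale DELTA

# the approximate result is close to the true product of the reals
assert abs(decode(rescaled) - a * b) < 1e-5
```

In CKKS the same division is applied to the ciphertext (and to the modulus, via modulus switching), so that the homomorphic noise is rounded away together with the low-order digits it pollutes.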

17.6.2 Bandwidth efficient fully homomorphic encryption

When we defined homomorphic encryption in Definition 16.3, we only considered a class of single-output functions $\mathcal{F}$. Now we want to extend the definition to multiple-output functions and consider how bandwidth-efficient fully homomorphic encryption can be. More specifically, if we want to guarantee that the result of decryption is (or contains) $f(x_1, \ldots, x_\ell)$, what will be the minimal possible length of the ciphertext? Let us first define the compressible fully homomorphic encryption scheme.

::: {.definition title="Compressible Fully Homomorphic Encryption" #compFHE}
A compressible fully homomorphic public key encryption scheme is a CPA secure public key encryption scheme $(G, E, D)$ such that there exist polynomial-time algorithms $EVAL, COMP$ such that for every $(e,d) = G(1^n)$, $\ell = poly(n)$, $x_1, \ldots, x_\ell \in \{0,1\}^*$, and $f : \{0,1\}^* \to \{0,1\}^*$ which can be described by a circuit, it holds that:

• $c = EVAL_e(f, E_e(x_1), \ldots, E_e(x_\ell))$.

• $c^* = COMP(c)$.

• $f(x_1, \ldots, x_\ell)$ is a prefix of $D_d(c^*)$.
:::

This definition is similar to standard fully homomorphic encryption except for the additional compression step. The bandwidth efficiency of a compressible fully homomorphic encryption scheme is often described by its rate, which is defined as follows:

::: {.definition title="Rate of Compressible Fully Homomorphic Encryption" #ratecompFHE}
A compressible fully homomorphic public key encryption scheme has rate $\alpha = \alpha(n)$ if for every $(e,d) = G(1^n)$, $\ell = poly(n)$, $x_1, \ldots, x_\ell \in \{0,1\}^*$, and $f$ with sufficiently long output, it holds that

$$\alpha |c^*| \leq |f(x_1, \ldots, x_\ell)|.$$
:::

The following theorem answers the earlier question: a nearly optimal rate, i.e., a rate arbitrarily close to 1, can be achieved.

Theorem 17.8 — Nearly Optimal Rate [Gentry and Halevi 2019] (https://eprint.iacr.org/2019/733.pdf)

For any $\epsilon > 0$, there exists a compressible fully homomorphic encryption scheme with rate $1 - \epsilon$ under the LWE assumption.

17.6.3 Using fully homomorphic encryption to achieve private information retrieval

Private information retrieval (PIR) allows a client to retrieve the $i$-th entry of a database with $n$ entries in total, without letting the server learn $i$. We only consider the single-server case here. Obviously, a trivial solution is for the server to send the entire database to the client.

One simple case of PIR is when each entry is a bit, for which the trivial solution above has communication complexity $n$. Kushilevitz and Ostrovsky 1997 reduced the complexity to $O(n^\epsilon)$ for any $\epsilon > 0$. After that, another work (Cachin et al. 1999) further reduced the complexity to $polylog(n)$. More discussion about PIR and related FHE techniques can be found in Ostrovsky and Skeith 2007, Yi et al. 2013, and the references therein. One interesting observation is that fully homomorphic encryption can be applied to single-server PIR via the following procedure:

• The client computes $E_e(i)$ and sends it to the server.

• The server evaluates $c = EVAL_e(f, E_e(i))$, where $f(i)$ returns the $i$-th entry of the database, and sends it (or its compressed version $c^*$) back to the client.

• The client decrypts $D_d(c)$ or $D_d(c^*)$ and obtains the $i$-th entry of the database.
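To make this flow concrete, here is a toy single-server PIR in Python. I stand in for the fully homomorphic scheme with a bare-bones additively homomorphic LWE-style encryption (all parameters are illustrative and far too small to be secure): the client encrypts the indicator vector of $i$, and the server homomorphically takes its inner product with the database.

```python
import random

random.seed(42)

n, q = 32, 2 ** 16   # toy LWE dimension and modulus (insecure, illustration only)

def keygen():
    return [random.randrange(q) for _ in range(n)]

def enc(s, bit):
    # LWE-style encryption of a bit placed in the high-order position
    a = [random.randrange(q) for _ in range(n)]
    e = random.randrange(-4, 5)  # small noise
    b = (sum(x * y for x, y in zip(a, s)) + e + bit * (q // 2)) % q
    return (a, b)

def dec(s, ct):
    a, b = ct
    d = (b - sum(x * y for x, y in zip(a, s))) % q
    return 1 if q // 4 < d < 3 * q // 4 else 0

def add(c1, c2):
    # additive homomorphism: ciphertexts add componentwise
    (a1, b1), (a2, b2) = c1, c2
    return ([(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q)

db = [0, 1, 1, 0, 1, 0, 0, 1]   # server's database of bits
i = 4                           # the index the client wants

s = keygen()
# client -> server: encryptions of the indicator vector of i
query = [enc(s, int(j == i)) for j in range(len(db))]

# server: homomorphically compute <db, indicator_i> = db[i], reply with acc
acc = ([0] * n, 0)              # trivial (noiseless) encryption of 0
for j, bit in enumerate(db):
    if bit:
        acc = add(acc, query[j])

assert dec(s, acc) == db[i]     # client recovers the requested bit
```

The communication here is $n$ ciphertexts up and one down; compressible schemes as in Theorem 17.8 are about shrinking the reply when the entries are long.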

Since there exists a compressible fully homomorphic encryption scheme with nearly optimal rate, i.e., rate arbitrarily close to $1$ (see Theorem 17.8), we immediately get rate-$(1-\epsilon)$ PIR for any $\epsilon$. (Note that this result holds only for databases whose entries are quite large, since the rate is defined for circuits with sufficiently long output.) Prior to the theorem by Gentry and Halevi 2019, Kiayias et al. 2015 also constructed a PIR scheme with a nearly optimal rate/bandwidth efficiency. The application of fully homomorphic encryption to PIR is a fascinating field; beyond bandwidth efficiency, you

may also be interested in the computational cost. We refer to Gentry and Halevi 2019 for more details.

18 Multiparty secure computation I: Definition and Honest-But-Curious to Malicious compiler

Wikipedia defines cryptography as “the practice and study of techniques for secure communication in the presence of third par- ties called adversaries”. However, I think a better definition would be: Cryptography is about replacing trust with mathematics.

After all, the reason we work so hard in cryptography is because of a lack of trust. We wouldn't need encryption if Alice and Bob could be guaranteed that their communication, despite going through wireless and wired networks controlled and snooped upon by a plethora of entities, would be as reliable as if it had been hand delivered by a letter-carrier as reliable as Patti Whitcomb, as opposed to the nosy Eve who might look at the messages, or the malicious Mallory, who might tamper with them. We wouldn't need zero knowledge proofs if Vladimir could simply say "trust me Barack, this is an authentic nuke". We wouldn't need electronic signatures if we could trust that all software updates are designed to make our devices safer and not, to pick a random example, to turn our phones into surveillance devices.

Unfortunately, the world we live in is not as ideal, and we need these cryptographic tools. But what is the limit of what we can achieve? Are these examples of encryption, authentication, zero knowledge etc. isolated cases of good fortune, or are they special cases of a more general theory of what is possible in cryptography? It turns out that the latter is the case and there is in fact an extremely general formulation that (in some sense) captures all of the above and much more. This notion is called multiparty secure computation or sometimes secure function evaluation and is the topic of this lecture. We will show (a relaxed version of) what I like to call "the fundamental theorem of cryptography", namely that under natural computational


conjectures (and in particular the LWE conjecture, as well as the RSA or Factoring assumptions) essentially every cryptographic task can be achieved. This theorem emerged from the 1980's works of Yao, Goldreich-Micali-Wigderson, and many others. As we'll see, like the "fundamental theorems" of other fields, this is a result that does not so much close off the field but rather opens up many other questions. But before we can even state the result, we need to talk about how we can even define security in such a general setting.

18.1 IDEAL VS. REAL MODEL SECURITY.

The key notion is that cryptography aims to replace trust. Therefore, we imagine an ideal world where there is some universally trusted party (cryptographer Silvio Micali likes to denote this party by Jimmy Carter, but feel free to swap in your own favorite trustworthy personality) that communicates with all participants of the protocol or interaction, including potentially the adversary. We define security by stating that whatever the adversary can achieve in our real world could have also been achieved in the ideal world. For example, for obtaining secure communication, Alice will send her message to the trusted party, who will then convey it to Bob. The adversary learns nothing about the message's contents, nor can she change them. In the zero knowledge application, to prove that there exists some secret $x$ such that $f(x) = 1$ where $f(\cdot)$ is a public function, the prover Alice sends to the trusted party her secret input $x$, the trusted party then verifies that $f(x) = 1$ and simply sends to Bob the message "the statement is true". It does not reveal to Bob anything about the secret $x$ beyond that.

But this paradigm goes well beyond these examples. For instance, second price (or Vickrey) auctions are known as a way to incentivize bidders to bid their true value. In these auctions, every potential buyer sends a sealed bid, and the item goes to the highest bidder, who only needs to pay the price of the second-highest bid. We could imagine a digital version, where buyers send encrypted versions of their bids. The auctioneer could announce who the winner is and what was the second largest bid, but could we really trust him to do so faithfully? Perhaps we would want an auction where even the auctioneer doesn't learn anything about the bids beyond the identity of the winner and the value of the second highest bid? Wouldn't it be great if there was a trusted party that all bidders could share their private values with, and it would announce the results of the auction but nothing more than that?
This could be useful not just in second price auctions but to implement many other mechanisms, especially if you are a Danish sugar beet farmer.

There are other examples as well. Perhaps two hospitals might want to figure out if the same patient visited both, but do not want (or are legally not allowed) to share with one another the list of people that visited each one. A trusted party could get both lists and output only their intersection. The list goes on and on. Maybe we want to securely aggregate information on the performance of Estonian IT firms or the financial health of Wall Street banks. Almost every cryptographic task could become trivial if we just had access to a universally trusted party. But of course in the real world, we don't. This is what makes the notion of secure multiparty computation so exciting.

18.2 FORMALLY DEFINING SECURE MULTIPARTY COMPUTATION

We now turn to formal definitions. As we discuss below, there are many variants of secure multiparty computation, and we pick one simple version below. A $k$-party protocol is a set of $k$ efficiently computable prescribed interactive strategies for all $k$ parties. (Note that here $k$ denotes not a string such as the secret key, but the number of parties in the protocol.) We assume the existence of an authenticated and private point to point channel between every pair of parties; this can be implemented using signatures and encryptions. (Protocols for $k > 2$ parties also require a broadcast channel, but these can be implemented using the combination of authenticated channels and digital signatures.) A $k$-party functionality $F$ is a probabilistic process mapping $k$ inputs in $\{0,1\}^n$ into $k$ outputs in $\{0,1\}^n$. (Fixing the input and output sizes to $n$ is done for notational simplicity and is without loss of generality. More generally, the inputs and outputs could have sizes up to polynomial in $n$, and some inputs or outputs can also be empty. Also, note that one can define a more general notion of stateful functionalities, though it is not hard to reduce the task of building a protocol for stateful functionalities to building protocols for stateless ones.)

18.2.1 First attempt: a slightly "too ideal" definition

Here is one attempt at a definition that is clean but a bit too strong, which nevertheless captures much of the spirit of secure multiparty computation:

Definition 18.1 — MPC without aborts. Let $F$ be a $k$-party functionality. A secure protocol for $F$ is a protocol for $k$ parties satisfying that
for every $T \subseteq [k]$ and every efficient adversary $A$, there exists an efficient "ideal adversary" (i.e., an efficient interactive algorithm) $S$ such that for every set of inputs $\{x_i\}_{i \in [k] \setminus T}$ the following two distributions are computationally indistinguishable:

• The tuple $(y_1, \ldots, y_k)$ of outputs of all the parties (both controlled and not controlled by the adversary) in an execution of the protocol where $A$ controls the parties in $T$ and the inputs of the parties not in $T$ are given by $\{x_i\}_{i \in [k] \setminus T}$.

• The tuple $(y_1, \ldots, y_k)$ that is computed using the following process:

a. We let $\{x_i\}_{i \in T}$ be chosen by $S$, and compute $(y'_1, \ldots, y'_k) = F(x_1, \ldots, x_k)$.

b. For every $i \in [k]$, if $i \notin T$ (i.e., party $i$ is "honest") then $y_i = y'_i$, and otherwise we let $S$ choose $y_i$.

That is, the protocol is secure if whatever an adversary can gain by taking complete control over the set of parties in $T$ could have been gained by simply using this control to choose particular inputs $\{x_i\}_{i \in T}$, run the protocol honestly, and observe the outputs of the functionality. Note that in particular if $T = \emptyset$ (and hence there is no adversary) then if the parties' inputs are $(x_1, \ldots, x_k)$ then their outputs will equal $F(x_1, \ldots, x_k)$.

18.2.2 Allowing for aborts

The definition above is a little too strong, in the following sense. Consider the case $k = 2$ where there are two parties Alice (Party $1$) and Bob (Party $2$) that wish to compute some output $F(x_1, x_2)$. If Bob is controlled by the adversary then he clearly can simply abort the protocol and prevent Alice from computing $y_1$. Thus, in this case in the actual execution of the protocol the output $y_1$ will be some error message (which we denote by $\bot$). But we did not allow this possibility for the idealized adversary $S$: if $1 \notin T$ then it must be the case that the output $y_1$ is equal to $y'_1$ for some $(y'_1, y'_2) = F(x_1, x_2)$. This means that we would be able to distinguish between the output in the real and ideal setting. (As a side note, we can avoid this issue if we have an honest majority of players, i.e., if $|T| < k/2$, but this of course makes no sense in the two party setting.) This motivates the following, slightly more messy definition, that allows for the ability of the adversary to abort the execution at any point in time:

Definition 18.2 — MPC with aborts. Let $F$ be a $k$-party functionality. A secure protocol for $F$ is a protocol for $k$ parties satisfying that for every $T \subseteq [k]$ and every efficient adversary $A$, there exists an efficient "ideal adversary" (i.e., an efficient interactive algorithm) $S$ such that for every set of inputs $\{x_i\}_{i \in [k] \setminus T}$ the following two distributions are computationally indistinguishable:

• The tuple $(y_1, \ldots, y_k)$ of outputs of all the parties (both controlled and not controlled by the adversary) in an execution of the protocol where $A$ controls the parties in $T$ and the inputs of the parties not in $T$ are given by $\{x_i\}_{i \in [k] \setminus T}$ (we denote $y_i = \top$ if the $i$-th party aborted the protocol).

• The tuple $(y_1, \ldots, y_k)$ that is computed using the following process:

a. We let $\{x_i\}_{i \in T}$ be chosen by $S$, and compute $(y'_1, \ldots, y'_k) = F(x_1, \ldots, x_k)$.

b. For $i = 1, \ldots, k$ do the following: ask $S$ if it wishes to abort at this stage, and if it doesn't then the $i$-th party learns $y'_i$. If the adversary did abort then we exit the loop at this stage and the parties $i+1, \ldots, k$ (regardless of whether they are honest or malicious) do not learn the corresponding outputs.

c. Let $k'$ be the last non-abort stage we reached above. For every $i \notin T$, if $i \leq k'$ then $y_i = y'_i$ and if $i > k'$ then $y_i = \bot$. We let the adversary $S$ choose $\{y_i\}_{i \in T}$.
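Steps a-c above can be written out as a short routine. This is only my own rendering of the ideal-world bookkeeping (the names `ideal_execution`, `NaiveS`, and the majority functionality are illustrative, not from the notes), but it makes explicit who learns what after an abort at stage $i$:

```python
def ideal_execution(F, S, T, honest_inputs, k):
    """Ideal-world process of Definition 18.2 (illustrative rendering).
    F: functionality mapping a k-tuple of inputs to a k-tuple of outputs.
    S: ideal adversary with methods choose_inputs(), wants_abort(i),
       choose_outputs(y_prime), acting for the corrupted set T.
    honest_inputs: dict {i: x_i} for parties i not in T."""
    # (a) S picks inputs for the corrupted parties; F is evaluated once.
    xs = dict(honest_inputs)
    xs.update(S.choose_inputs())
    y_prime = F(tuple(xs[i] for i in range(1, k + 1)))

    # (b) outputs are released one party at a time; S may abort the loop.
    k_last = 0
    for i in range(1, k + 1):
        if S.wants_abort(i):
            break
        k_last = i                  # party i learns y'_i

    # (c) honest parties up to k_last get their output, the rest get bottom;
    #     S freely chooses the outputs of the corrupted parties.
    ys = {}
    for i in range(1, k + 1):
        if i not in T:
            ys[i] = y_prime[i - 1] if i <= k_last else "⊥"
    ys.update(S.choose_outputs(y_prime))
    return ys

class NaiveS:
    """A toy ideal adversary: fixed inputs, aborts before party 3 learns."""
    def __init__(self, T, inputs): self.T, self.inputs = T, inputs
    def choose_inputs(self): return self.inputs
    def wants_abort(self, i): return i == 3
    def choose_outputs(self, y): return {j: y[j - 1] for j in self.T}

majority = lambda xs: tuple(int(sum(xs) > len(xs) / 2) for _ in xs)
out = ideal_execution(majority, NaiveS({2}, {2: 1}), {2}, {1: 1, 3: 0, 4: 0}, 4)
assert out == {1: 0, 2: 0, 3: "⊥", 4: "⊥"}   # parties 3, 4 never learn outputs
```

Note how an abort at stage 3 leaves parties 3 and 4 with $\bot$ while party 1, who was served first, keeps its correct output.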

Figure 18.1: We define security of a protocol implementing a functionality $F$ by stipulating that for every adversary $A$ that controls a subset of the parties, $A$'s view in an actual execution of the protocol would be indistinguishable from its view in an ideal setting where all the parties send their inputs to an idealized and perfectly trusted party, who then computes the outputs and sends them to each party.

Here are some good exercises to make sure you follow the definition:

• Let $F$ be the two party functionality such that $F(H\|C, H')$ outputs $(1,1)$ if the graph $H$ equals the graph $H'$ and $C$ is a Hamiltonian cycle in $H$, and otherwise outputs $(0,0)$. Prove that a protocol for computing $F$ is a zero knowledge proof system for the language of Hamiltonicity. (Actually, if we want to be pedantic, this is what's known as a zero knowledge argument system, since soundness is only guaranteed against efficient provers; however, this distinction is not important in almost all applications. Also, our treatment of the input graph $H$ is an instance of a general principle: while the definition of a functionality only talks about private inputs, it's very easy to include public inputs as well. If we want to include some public input $Z$ we can simply have $Z$ concatenated to all the private inputs, and have the functionality check that they are all the same, otherwise outputting an error or some similar result.)

• Let $F$ be the $k$-party functionality that on inputs $x_1, \ldots, x_k \in \{0,1\}$ outputs to all parties the majority value of the $x_i$'s. Then, in any protocol that securely computes $F$, for any adversary that controls less than half of the parties, if at least $k/2 + 1$ of the other parties' inputs equal $0$, then the adversary will not be able to cause an honest party to output $1$.

P It is an excellent idea for you to pause here and try to work out at least informally these exercises.

Amazingly, we can obtain such a protocol for every functionality:

Theorem 18.3 — Fundamental theorem of cryptography. Under reasonable assumptions, for every polynomial-time computable $k$-party functionality $F$ there is a polynomial-time protocol that computes it securely. (Originally this was shown under the assumption of trapdoor permutations, which can be derived from the Factoring or RSA conjectures, but it is known today under a variety of other assumptions, including in particular the LWE conjecture.)

Theorem 18.3 was originally proven by Yao in 1982 for the special case of two party functionalities, and then proved for the general case by Goldreich, Micali, and Wigderson in 1987. As discussed below, many variants of this theorem have been shown, and this line of research is still ongoing.

18.2.3 Some comments

There is in fact not a single theorem but rather many variants of this fundamental theorem, obtained by a great many people, depending on the different security properties desired, as well as the different cryptographic and setup assumptions. Some of the issues studied in the literature include the following:

• Fairness, guaranteed output delivery: The definition above does not attempt to protect against "denial of service" attacks, in the sense that the adversary is allowed, even in the ideal case, to prevent the honest parties from receiving their outputs. As mentioned above, without an honest majority this is essential, for reasons similar to the ones we discussed in our lecture on bitcoin for why achieving consensus is hard without an honest majority. When there is an honest majority, we can achieve the property of guaranteed output delivery, which offers protection against such "denial of service" attacks. Even when there is no guaranteed output delivery, we might want the property of fairness, whereby we guarantee that if the honest parties don't get the output then neither does the adversary. There has been extensive study of fairness and there are protocols achieving variants of it under various computational and setup assumptions.

• Network models: The current definition assumes we have a set of $k$ parties with known identities, with pairwise secure (confidential and authenticated) channels between them. Other network models studied include a broadcast channel, non-private networks, and even networks with no authentication.

• Setup assumptions: The definition does not assume a trusted third party, but people have studied different setup assumptions including a public key infrastructure, a common reference string, and more.

• Adversarial power: It turns out that under certain conditions, it can be possible to obtain secure multiparty computation with respect to adversaries that have unbounded computational power (so called "information theoretic security"). People have also studied different variants of adversaries, including "honest but curious" or "passive" adversaries, as well as "covert" adversaries that only deviate from the protocol if they won't be caught. Other settings studied limit the adversary's ability to control parties (e.g., honest majority, a smaller fraction of parties, or particular patterns of control, adaptive vs. static corruption).

• Concurrent compositions: The definitions displayed above are for standalone execution, which is known not to automatically imply security with respect to concurrent composition, where many copies of the same protocol (or different protocols) could be executed simultaneously. This opens up all sorts of new attacks. (One example of the kind of issues that can arise is the "grandmasters attack", whereby someone with no knowledge of chess could play two grandmasters simultaneously, relaying their moves to one another and thereby guaranteeing a win in at least one of the games, or a draw in both.) See Yehuda Lindell's thesis (or this updated version) for more. A very general notion known as "UC security" (which stands for "Universally Composable" or maybe "Ultimate Chuck") has been proposed to achieve security in these settings, though at a price of additional setup assumptions, see here and here.

• Communication: The communication cost for Theorem 18.3 can be proportional to the size of the circuit that computes $F$. This can be a very steep cost, especially when computing over large amounts of data. It turns out that we can sometimes avoid this cost using fully homomorphic encryption or other techniques.

• Efficiency vs. generality: While Theorem 18.3 tells us that essentially every protocol problem can be solved in principle, its proof will almost never yield a protocol you actually want to run, since it has enormous efficiency overhead. The issue of efficiency is the biggest reason why secure multiparty computation has so far not had a great many practical applications. However, researchers have been showing more efficient tailor-made protocols for particular problems of interest, and there has been steady progress in making those results more practical. See the slides and videos from this workshop for more.

Is multiparty secure computation the end of crypto? The notion of secure multiparty computation seems so strong that you might think that once it is achieved, aside from efficiency issues, there is nothing else to

be done in cryptography. This is very far from the truth. Multiparty secure computation does give a way to solve a great many problems in the setting where we have arbitrary rounds of interaction and unbounded communication, but this is far from always being the case. As we mentioned before, interaction can sometimes make a qualitative difference (when Alice and Bob are separated by time rather than space). As we've seen in the discussion of fully homomorphic encryption, there are also other properties, such as compact communication, which are not implied by multiparty secure computation but can make all the difference in contexts such as cloud computing. That said, multiparty secure computation is an extremely general paradigm that does apply to many cryptographic problems.

Further reading: The survey of Lindell and Pinkas gives a good overview of the different variants and security properties considered in the literature, see also Section 7 in this survey of Goldreich. Chapter 6 in Pass and Shelat’s notes is also a good source.

18.3 EXAMPLE: SECOND PRICE AUCTION USING BITCOIN

Suppose we have the following setup: an auctioneer wants to sell some item and run a second-price auction, where each party submits a sealed bid, and the highest bidder gets the item for the price of the second highest bid. However, as mentioned above, the bidders do not want the auctioneer to learn what their bids were, and in general nothing else other than the identity of the highest bidder and the value of the second highest bid. Moreover, we might want the payment to be via an electronic currency such as bitcoin, so that the auctioneer not only gets the information about the winning bid but an actual self-certifying transaction they can use to get the payment.

Here is how we could obtain such a protocol using secure multiparty computation:

• We have $k$ parties where the first party is the auctioneer and parties $2, \ldots, k$ are the bidders. Let’s assume for simplicity that each party $i$ has a public key $v_i$ that is associated with some bitcoin account. We treat all these keys as the public input. (As we discussed before, bitcoin doesn’t have the notion of accounts; what we mean by that is that for each one of these public keys, the public ledger contains a sufficiently large amount of bitcoins that have been transferred to these keys, in the sense that whomever can sign w.r.t. these keys can transfer the corresponding coins.)

• The private input of bidder $i$ is the value $x_i$ that it wants to bid as well as the secret key $s_i$ that corresponds to their public key.

• The functionality only provides an output to the auctioneer, which would be the identity $i$ of the winning bidder as well as a valid signature of this bidder on a transaction transferring $x_j$ bitcoins to the key $v_1$ of the auctioneer, where $x_j$ is the value of the second largest valid bid (i.e., $x_j$ equals the second largest $x_\ell$ such that $s_\ell$ is indeed the private key corresponding to $v_\ell$).
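Stripped of the cryptography, the ideal functionality itself is a small computation. Here is a sketch in Python of what the trusted party would compute in the ideal model; the key-validity check is stubbed out as a list of booleans, standing in for verifying that each secret key matches its public key:

```python
def second_price_outcome(bids, keys_valid):
    """Ideal second-price auction functionality (cleartext sketch).

    bids[i] is the bid x_i of bidder i; keys_valid[i] stands in for the
    check that bidder i's secret key matches their public key (in the
    real functionality this is a signature-key consistency check).
    Returns (winner, price): the identity of the highest valid bidder
    and the second-highest valid bid, or None if there are fewer than
    two valid bids.
    """
    valid = [(x, i) for i, (x, ok) in enumerate(zip(bids, keys_valid)) if ok]
    if len(valid) < 2:
        return None
    valid.sort(reverse=True)   # highest valid bid first
    winner = valid[0][1]
    price = valid[1][0]        # second-highest valid bid
    return winner, price

# Bidder 0's key is invalid, so only the bids 7 and 10 count:
print(second_price_outcome([5, 7, 10], [False, True, True]))  # -> (2, 7)
```

The secure protocol computes exactly this output for the auctioneer while keeping the losing bids, and even the winner’s own bid, hidden.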

It’s worthwhile to think about what a secure protocol for this func- tionality accomplishes. For example:

• The fact that in the ideal model the adversary needs to choose its queries independently means that the adversary cannot get any information about the honest parties’ bids before deciding on its bid.

• Despite all parties using their signing keys as inputs to the protocol, we are guaranteed that no one will learn anything about another party’s signing key except the single signature that will be produced.

• Note that if $i$ is the highest bidder and $j$ is the second highest, then at the end of the protocol we get a valid signature using $s_i$ on a transaction transferring $x_j$ bitcoins to $v_1$, despite $i$ not knowing the value $x_j$ (and in fact never learning the identity of $j$). Nonetheless, $i$ is guaranteed that the signature produced will be on an amount not larger than its own bid and an amount that one of the other bidders actually bid for.

I find the ability to obtain such strong notions of security pretty remarkable. This demonstrates the tremendous power of obtaining protocols for general functionalities.

18.3.1 Another example: distributed and threshold cryptography
It sometimes makes sense to use multiparty secure computation for cryptographic computations as well. For example, there might be several reasons why we would want to “split” a secret key between several parties, so that no party knows it completely.

• Some proposals for key escrow (giving government or other entity an option for decrypting communication) suggested splitting a cryptographic key between several agencies or institutions (say the FBI, the courts, etc.) so that they must collaborate in order to decrypt communication, thus hopefully preventing unlawful access.

• On the other side, a company might wish to split its own key between several servers residing in different countries, to ensure that no single one of them is completely under one jurisdiction. Or it might do such splitting for technical reasons, so that if there is a break-in at a single site, the key is not compromised.

There are several other such examples. One problem with this approach is that splitting a cryptographic key is not the same as cutting a 100 dollar bill in half. If you simply give half of the bits to each party,

you could significantly harm security. (For example, it is possible to recover the full RSA key from only 27% of its bits.) Here is a better approach, known as secret sharing: To securely share a string $s \in \{0,1\}^n$ among $k$ parties so that any $k-1$ of them have no information about it, we choose $s_1, \ldots, s_{k-1}$ at random in $\{0,1\}^n$ and let $s_k = s \oplus s_1 \oplus \cdots \oplus s_{k-1}$ ($\oplus$ as usual denotes the XOR operation), and give party $i$ the string $s_i$, which is known as the $i^{th}$ share of $s$. Note that $s = s_1 \oplus \cdots \oplus s_k$ and so given all $k$ pieces we can reconstruct the key. Clearly the first $k-1$ parties did not receive any information about $s$ (since their shares were generated independently of $s$), but the following not-too-hard claim shows that this holds for every set of $k-1$ parties:

Lemma 18.4 For every $s \in \{0,1\}^n$ and set $T \subseteq [k]$ of size $k-1$, we get exactly the same distribution over $(s_1, \ldots, s_k)$ as above if we choose $s_i$ for $i \in T$ at random and set $s_t = s \oplus \left(\bigoplus_{i \in T} s_i\right)$ where $\{t\} = [k] \setminus T$.

We leave the proof of Lemma 18.4 as an exercise.

Secret sharing solves the problem of protecting the key “at rest”, but if we actually want to use the secret key in order to sign or decrypt some message, then it seems we need to collect all the pieces together into one place, which is exactly what we wanted to avoid doing. This is where multiparty secure computation comes into play: we can define a functionality $F$ taking public input $m$ and secret inputs $s_1, \ldots, s_k$ and producing a signature or decryption of $m$. In fact, we can go beyond that and even have the parties sign or decrypt a message without them knowing what this message is, except that it satisfies some conditions. Moreover, secret sharing can be generalized so that a threshold other than $k$ is necessary and sufficient to reconstruct the secret (and people have also studied more complicated access patterns). Similarly multiparty secure computation can be used to achieve distributed cryptography with finer access control mechanisms.
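The XOR-based scheme described above is only a few lines of code. Here is a minimal sketch in Python (using the standard `secrets` module to draw the random shares):

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def share(s: bytes, k: int) -> list:
    """Split s into k shares: the first k-1 are uniformly random, and
    the last is s XOR (all the others), so any k-1 shares are jointly
    uniform and reveal nothing about s."""
    shares = [secrets.token_bytes(len(s)) for _ in range(k - 1)]
    shares.append(reduce(xor, shares, s))
    return shares

def reconstruct(shares: list) -> bytes:
    """The XOR of all k shares recovers the secret."""
    return reduce(xor, shares)

key = b"top secret signing key"
pieces = share(key, 5)
assert reconstruct(pieces) == key   # all 5 shares together recover the key
```

Note that any proper subset of the shares is just a tuple of independent uniform strings, matching Lemma 18.4.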

18.4 PROVING THE FUNDAMENTAL THEOREM:

We will complete the proof of (a relaxed version of) the fundamental theorem over this lecture and the next one. The proof consists of two phases:

1. A protocol for the “honest but curious” case using fully homomorphic encryption.

2. A reduction of the general case into the “honest but curious” case where the adversary follows the protocol precisely but merely attempts to learn some information on top of the output that it is

“entitled to” learn. (This reduction is based on zero knowledge proofs and is due to Goldreich, Micali and Wigderson)

We note that while fully homomorphic encryption yields a conceptually simple approach for the first step, it is not currently the most efficient approach; most practical implementations are instead based on the technique known as “Yao’s Garbled Circuits” (see this book or this paper or this survey), which in turn is based on a notion known as oblivious transfer, which can be thought of as a “baby private information retrieval” (though it preceded the latter notion). We will focus on the case of two parties. The same ideas extend to $k > 2$ parties but with some additional complications.

18.5 MALICIOUS TO HONEST BUT CURIOUS REDUCTION

We start with the second stage: giving a reduction transforming a protocol secure in the “honest but curious” setting into a protocol secure in the malicious setting. That is, we will prove the following theorem:

Theorem 18.5 — Honest-but-curious to malicious security compiler. There is a polynomial-time “compiler” $C$ such that for every $k$-party protocol $(P_1, \ldots, P_k)$ (where all $P_i$’s are polynomial-time computable, potentially randomized strategies), if we let $(\tilde{P}_1, \ldots, \tilde{P}_k) = C(P_1, \ldots, P_k)$ then $(\tilde{P}_1, \ldots, \tilde{P}_k)$ is a $k$-tuple of polynomial-time computable strategies, and moreover if $(P_1, \ldots, P_k)$ was a protocol for computing some (potentially randomized) functionality $F$ secure with respect to honest-but-curious adversaries, then $(\tilde{P}_1, \ldots, \tilde{P}_k)$ is a protocol for computing the same $F$ secure with respect to malicious adversaries.

The remainder of this section is devoted to the proof of Theorem 18.5. For ease of notation we will focus on the case $k = 2$, where there are only two parties (“Alice” and “Bob”), although these techniques generalize to an arbitrary number of parties $k$. Note that a priori, it is not obvious at all that such a compiler should exist. In the “honest but curious” setting we assume the adversary follows the protocol to the letter. Thus a protocol where Alice gives away all her secrets to Bob if he merely asks her to do so politely can be secure in the “honest but curious” setting if Bob’s instructions are not to ask. More seriously, it could very well be that Bob has the ability to deviate from the protocol in subtle ways that would be completely undetectable but would allow him to learn Alice’s secrets. Any transformation of the protocol to obtain security in the malicious setting will need to rule out such deviations.

The main idea is the following: we do the compilation one party at a time. We first transform the protocol so that it will remain secure even if Alice tries to cheat, and then transform it so it will remain secure even if Bob tries to cheat. Let’s focus on Alice. Let’s imagine (without loss of generality) that Alice and Bob alternate sending messages in the protocol with Alice going first, and so Alice sends the odd messages and Bob sends the even ones. Let’s denote by $m_i$ the message sent in the $i^{th}$ round of the protocol. Alice’s instructions can be thought of as a sequence of functions $f_1, f_3, \ldots, f_t$ (where $t$ is the last round in which Alice speaks), where each $f_i$ is an efficiently computable function mapping Alice’s secret input $x_1$, (possibly) her random coins $r_1$, and the transcript of the previous messages $m_1, \ldots, m_{i-1}$ to the next message $m_i$. The functions $\{f_i\}$ are publicly known and part of the protocol’s instructions. The only thing that Bob doesn’t know is $x_1$ and $r_1$. So, our idea would be to change the protocol so that after Alice sends the message $m_i$, she proves to Bob that it was indeed computed correctly using $f_i$. If $x_1$ and $r_1$ weren’t secret, Alice could simply send those to Bob so he could verify the computation on his own. But because they are secret (and the security of the protocol could depend on that), we instead use a zero knowledge proof.

Let’s assume for starters that Alice’s strategy is deterministic (and so there is no random tape $r_1$). A first attempt to ensure she can’t use a malicious strategy would be for Alice to follow the message $m_i$ with a zero knowledge proof that there exists some $x_1$ such that $m_i = f_i(x_1, m_1, \ldots, m_{i-1})$. However, this will actually not be secure. It is worthwhile at this point for you to pause and think if you can understand the problem with this solution.

P Really, please stop and think why this will not be secure.

P Did you stop and think?

The problem is that at every step Alice proves that there exists some input $x_1$ that can explain her message, but she doesn’t prove that it’s the same input for all messages. If Alice was being truly honest, she should have picked her input once and used it throughout the protocol, and she could not compute the first message according to the input $x_1$ and then the third message according to some input $x_1' \neq x_1$. Of course we can’t have Alice reveal the input, as this would violate security. The solution is for Alice to commit in advance to the input. We have seen commitments before, but let us now formally define the notion:

Definition 18.6 — Commitment scheme. A commitment scheme for strings of length $\ell$ is a two party protocol between the sender and receiver satisfying the following:

• Hiding (sender’s security): For every two sender inputs $x, x' \in \{0,1\}^\ell$, and no matter what efficient strategy the receiver uses, it cannot distinguish between the interaction with the sender when the latter uses $x$ as opposed to when it uses $x'$.

• Binding (receiver’s security): No matter what (efficient or non-efficient) strategy the sender uses, if the receiver follows the protocol then with probability $1 - negl(n)$, there will exist at most a single string $x \in \{0,1\}^\ell$ such that the transcript is consistent with the input $x$ and some sender randomness $r$.

That is, a commitment is the digital analog of placing a message in a sealed envelope to be opened at a later time. To commit to a message $x$ the sender and receiver interact according to the protocol, and to open the commitment the sender simply sends $x$ as well as the random coins it used during the commitment phase. The variant we defined above is known as computationally hiding and statistically binding, since the sender’s security is only guaranteed against efficient receivers while the binding property is guaranteed against all senders. There are also statistically hiding and computationally binding commitments, though it can be shown that we need to restrict to efficient strategies for at least one of the parties. We have already seen a commitment scheme before (due to Naor): the receiver sends a random $z \leftarrow_R \{0,1\}^{3n}$ and the sender commits to a bit $b$ by choosing a random $s \in \{0,1\}^n$ and sending $y = PRG(s) + bz \pmod{2}$, where $PRG: \{0,1\}^n \to \{0,1\}^{3n}$ is a pseudorandom generator.
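To make Naor’s scheme concrete, here is a toy sketch in Python. SHAKE-128 is used as a stand-in for the pseudorandom generator purely for illustration (this instantiation is an assumption, not part of the scheme), and the mod-2 addition is bitwise XOR:

```python
import hashlib
import secrets

N = 16  # seed length in bytes (toy security parameter)

def prg(seed: bytes) -> bytes:
    """Stand-in PRG stretching n bytes to 3n bytes. SHAKE-128 plays the
    role of the pseudorandom generator here; any PRG works in Naor's
    scheme."""
    return hashlib.shake_128(seed).digest(3 * N)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Receiver's first message: a random 3n-byte string z
z = secrets.token_bytes(3 * N)

def commit(b: int, z: bytes):
    """Sender's message: y = PRG(s) + b*z (mod 2), i.e. bitwise XOR."""
    s = secrets.token_bytes(N)
    y = xor(prg(s), z) if b == 1 else prg(s)
    return y, s   # y is sent; (b, s) is the opening, kept secret for now

def verify_open(y: bytes, z: bytes, b: int, s: bytes) -> bool:
    """Receiver checks an opening (b, s) against the commitment y."""
    return y == (xor(prg(s), z) if b == 1 else prg(s))

y, s = commit(1, z)
assert verify_open(y, z, 1, s)       # honest opening is accepted
assert not verify_open(y, z, 0, s)   # same coins cannot open the other bit
```

Hiding follows from the pseudorandomness of $PRG(s)$; binding holds because for a random $z$, with overwhelming probability no pair $s, s'$ satisfies $PRG(s) \oplus PRG(s') = z$.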

It’s a good exercise to verify that it satisfies the above definitions. By running this protocol times in parallel we can commit to a string of any polynomial length. We can now describeℓ the transformation ensuring the protocol is secure against a malicious Alice in full, for the case that that the original strategy of Alice is deterministic (and hence uses no random coins)

• Initially Alice and Bob engage in a commitment scheme where Alice commits to her input $x_1$. Let $\tau$ be the transcript of this commitment phase and $r_{com}$ be the randomness Alice used during it. (Note that even though we assumed that in the original honest-but-curious protocol Alice used a deterministic strategy, we will transform the protocol into one in which Alice uses a randomized strategy in both the commitment and zero knowledge phases.)

• For $i = 1, 2, \ldots$:

– If $i$ is even then Bob sends $m_i$ to Alice.

– If $i$ is odd then Alice sends $m_i$ to Bob and then they engage in a zero knowledge proof that there exists $x_1, r_{com}$ such that (1) $x_1, r_{com}$ is consistent with $\tau$, and (2) $m_i = f_i(x_1, m_1, \ldots, m_{i-1})$. The proof is repeated a sufficient number of times to ensure that if the statement is false then Bob rejects with probability $1 - negl(n)$.

– If the proof is rejected then Bob aborts the protocol.
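Schematically, one odd round of the compiled protocol looks as follows; `zk_prove`, `zk_verify`, and the commitment data `tau`, `r_com` are hypothetical stand-ins for the primitives above, not real implementations:

```python
def compiled_round(x1, r_com, tau, transcript, f_i, zk_prove, zk_verify):
    """One odd round of the compiled protocol. The zero knowledge proof
    system (zk_prove/zk_verify) and the commitment transcript/coins
    (tau/r_com) are placeholder interfaces for illustration only."""
    # Alice computes her next message m_i = f_i(x1, m_1, ..., m_{i-1})
    m_i = f_i(x1, transcript)
    # ... and proves in zero knowledge that m_i is consistent with the
    # input committed to in tau (the witness is x1 and the coins r_com)
    statement = (tau, tuple(transcript), m_i)
    proof = zk_prove(statement, (x1, r_com))
    if not zk_verify(statement, proof):
        raise RuntimeError("Bob aborts: zero knowledge proof rejected")
    transcript.append(m_i)
    return m_i
```

The point of the structure is that the statement proved in every round refers back to the same commitment transcript `tau`, which is what forces Alice to use a single input throughout.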

We will not prove security but will only sketch it here; see Section 7.3.2 in Goldreich’s survey for a more detailed proof:

• To argue that we maintain security for Alice we use the zero knowledge property: we claim that Bob could not learn anything from the zero knowledge proofs precisely because he could have simulated them by himself. We also use the hiding property of the commitment scheme. To prove security formally we need to show that whatever Bob learns in the modified protocol, he could have learned in the original protocol as well. We do this by simulating Bob by replacing the commitment scheme with a commitment to some random junk instead of $x_1$ and the zero knowledge proofs with their simulated version. The proof of security requires a hybrid argument, and it is again a good exercise to try to do it on your own.

• To argue that we maintain security for Bob we use the binding property of the commitment scheme as well as the soundness property of the zero knowledge system. Once again for the formal proof we need to show that we could transform any potentially malicious strategy for Alice in the modified protocol into an “honest but curious” strategy in the original protocol (also allowing Alice the ability to abort the protocol). It turns out that to do so, it is not

enough that the zero knowledge system is sound but we need a stronger property known as a proof of knowledge. We will not define it formally, but roughly speaking it means we can transform any prover strategy that convinces the verifier that a statement is true with non-negligible probability into an algorithm that outputs the underlying secret (i.e., $x_1$ and $r_{com}$ in our case). This is crucial in order to transform Alice’s potentially malicious strategy into an honest but curious strategy.

We can repeat this transformation for Bob (or Charlie, David, etc. in the $k > 2$ party case) to transform a protocol secure in the honest but curious setting into a protocol secure (allowing for aborts) in the malicious setting.

18.5.1 Handling probabilistic strategies
So far we assumed that the original strategy of Alice in the honest but curious protocol is deterministic, but of course we need to consider probabilistic strategies as well. One approach could be to simply think of Alice’s random tape $r_1$ as part of her secret input $x_1$. However, while in the honest but curious setting Alice is still entitled to freely choose her own input $x_1$, she is not entitled to choose the random tape as she wishes but is supposed to follow the instructions of the protocol and choose it uniformly at random. Hence we need to use a coin tossing protocol to choose the randomness, or more accurately what’s known as a “coin tossing in the well” protocol, where Alice and Bob engage in a coin tossing protocol at the end of which they generate some random coins $r$ that only Alice knows but Bob is still guaranteed are random. Such a protocol can actually be achieved very simply. Suppose we want to generate $m$ coins:

• Alice selects $r' \leftarrow_R \{0,1\}^m$ at random and engages in a commitment protocol to commit to $r'$.

• Bob selects $r'' \leftarrow_R \{0,1\}^m$ and sends it to Alice in the clear.

• The result of the coin tossing protocol will be the string $r = r' \oplus r''$.

Note that Alice knows $r$. Bob doesn’t know $r$, but because he chose $r''$ after Alice committed to $r'$ he knows that it must be fully random regardless of Alice’s choice of $r'$. It can be shown that if we use this coin tossing protocol at the beginning and then modify the zero knowledge proofs to show that $m_i = f_i(x_1, r_1, m_1, \ldots, m_{i-1})$ where $r_1$ is the string that is consistent with the transcript of the coin tossing protocol, then we get a general transformation of an honest but curious adversary into the malicious setting. The notion of multiparty secure computation, defining it and achieving it, is quite subtle and I do urge you to read some of the

other references listed above as well. In particular, the slides and videos from the Bar Ilan winter school on secure computation and efficiency, as well as the ones from the winter school on advances in practical multiparty computation, are great sources for this and related materials.

19 Multiparty secure computation II: Construction using Fully Homomorphic Encryption

In the last lecture we saw the definition of secure multiparty computation, as well as the compiler reducing the task of achieving security in the general (malicious) setting to the passive (honest-but-curious) setting. In this lecture we will see how using fully homomorphic encryption we can achieve security in the honest-but-curious setting. (This is by no means the only way to get multiparty secure computation. In fact, multiparty secure computation was known well before FHE was discovered. One common construction for achieving this uses a technique known as Yao’s Garbled Circuit.) We focus on the two party case, and so prove the following theorem:

Theorem 19.1 — Two party honest-but-curious MPC. Assuming the LWE conjecture, for every two party functionality $F$ there is a protocol computing $F$ in the honest but curious model.

Before proving the theorem it might be worthwhile to recall the definition of secure multiparty computation, when specialized to the $k = 2$ and honest but curious case. The definition significantly simplifies here since we don’t have to deal with the possibility of aborts.

Definition 19.2 — Two party honest-but-curious secure computation. Let $F$ be a (possibly probabilistic) map of $\{0,1\}^n \times \{0,1\}^n$ to $\{0,1\}^n \times \{0,1\}^n$. A secure protocol for $F$ is a two party protocol such that for every party $t \in \{1, 2\}$, there exists an efficient “ideal adversary” (i.e., efficient interactive algorithm) $S$ such that for every pair of inputs $(x_1, x_2)$ the following two distributions are computationally indistinguishable:

• The tuple $(y_1, y_2, v)$ obtained by running the protocol on inputs $x_1, x_2$, letting $y_1, y_2$ be the outputs of the two parties, and letting $v$ be the view (all internal randomness, inputs, and messages received) of party $t$.


• The tuple $(y_1, y_2, v)$ that is computed by letting $(y_1, y_2) = F(x_1, x_2)$ and $v = S(x_t, y_t)$.

That is, $S$, which only gets the input $x_t$ and output $y_t$, can simulate all the information that an honest-but-curious adversary controlling party $t$ will view.

19.1 CONSTRUCTING 2 PARTY HONEST BUT CURIOUS COMPUTATION FROM FULLY HOMOMORPHIC ENCRYPTION

Let $F$ be a two party functionality. Let’s start with the case that $F$ is deterministic and that only Alice receives an output. We’ll later show an easy reduction from the general case to this one. Here is a suggested protocol for Alice and Bob to run on inputs $x, y$ respectively so that Alice will learn $F(x, y)$ but nothing more about $y$, and Bob will learn nothing about $x$ that he didn’t know before.

Protocol 2PC: (See Fig. 19.1)

• Assumptions: $(G, E, D, EVAL)$ is a fully homomorphic encryption scheme.

• Inputs: Alice’s input is $x \in \{0,1\}^n$ and Bob’s input is $y \in \{0,1\}^n$. The goal is for Alice to learn only $F(x, y)$ and Bob to learn nothing.

• Alice->Bob: Alice generates $(e, d) \leftarrow_R G(1^n)$ and sends $e$ and $c = E_e(x)$.

• Bob->Alice: Bob defines $f$ to be the function $f(x) = F(x, y)$ and sends $c' = EVAL(f, c)$ to Alice.

• Alice’s output: Alice computes $z = D_d(c')$.

[Figure 19.1: An honest but curious protocol for two party computation using a fully homomorphic encryption scheme with circuit privacy.]

First, note that if Alice and Bob both follow the protocol, then indeed at the end of the protocol Alice will compute $F(x, y)$. We now claim that Bob does not learn anything about Alice’s input:

Claim B: For every $x, y$, there exists a standalone algorithm $S$ such that $S(y)$ is indistinguishable from Bob’s view when interacting with Alice when their corresponding inputs are $(x, y)$.

Proof: Bob only receives a single message in this protocol, of the form $(e, c)$ where $e$ is a public key and $c = E_e(x)$. The simulator $S$ will generate $(e, d) \leftarrow_R G(1^n)$ and compute $(e, c)$ where $c = E_e(0^n)$. (As usual $0^n$ denotes the length-$n$ string consisting of all zeroes.) No matter what $x$ is, the output of $S$ is indistinguishable from the message Bob receives, by the security of the encryption scheme. QED
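The message flow of Protocol 2PC can be sketched in a few lines of Python. The `TransparentFHE` class below is a placeholder with no security whatsoever (its “ciphertexts” are the plaintexts themselves); it exists only so that the flow of $G$, $E$, EVAL, and $D$ runs end to end:

```python
class TransparentFHE:
    """Placeholder for (G, E, D, EVAL) with ciphertext == plaintext.
    It provides NO security; it only lets the protocol flow execute."""
    def gen(self):                 # G(1^n) -> (e, d)
        return "pk", "sk"
    def enc(self, e, x):           # E_e(x)
        return x
    def dec(self, d, c):           # D_d(c)
        return c
    def eval(self, e, f, c):       # EVAL(f, c)
        return f(c)

def protocol_2pc(F, x, y, fhe):
    """Message flow of Protocol 2PC: Alice learns F(x, y); with a real
    FHE (with circuit privacy), Bob would learn nothing about x."""
    # Alice -> Bob: a fresh public key and an encryption of x
    e, d = fhe.gen()
    c = fhe.enc(e, x)
    # Bob -> Alice: homomorphic evaluation of f(x) = F(x, y),
    # with Bob's input y hardwired into f
    f = lambda x_: F(x_, y)
    c_prime = fhe.eval(e, f, c)
    # Alice's output: decrypt
    return fhe.dec(d, c_prime)

print(protocol_2pc(max, 3, 7, TransparentFHE()))  # -> 7
```

Swapping `TransparentFHE` for a real scheme changes none of the flow, which is exactly why the rest of this chapter focuses on the properties the scheme itself must satisfy.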

(In fact, Claim B holds even against a malicious strategy of Bob; can you see why?) We would now hope that we can prove the same regarding Alice’s security. That is, prove the following:

Claim A: For every $x, y$, there exists a standalone algorithm $S$ such that $S(x, F(x, y))$ is indistinguishable from Alice’s view when interacting with Bob when their corresponding inputs are $(x, y)$.

P At this point, you might want to try to see if you can prove Claim A on your own. If you’re having difficulties proving it, try to think whether it’s even true.

......

So, it turns out that Claim A is not generically true. The reason is the following: the definition of fully homomorphic encryption only requires that $EVAL(f, E(x))$ decrypts to $f(x)$, but it does not require that it hides the contents of $f$. For example, for every FHE, if we modify EVAL to append to the ciphertext the first $100$ bits of the description of $f$ (and have the decryption algorithm ignore this extra information) then this would still be a secure FHE. (It’s true that strictly speaking, we allowed EVAL’s output to have length at most $n$, while this would make the output be $n + 100$, but this is just a technicality that can be easily bypassed, for example by having a new scheme that on security parameter $n$ runs the original scheme with parameter $n/2$, and hence will have a lot of “room” to pad the output of EVAL with extra bits.) Now, we didn’t exactly specify how we describe the function $f$ defined as $x \mapsto F(x, y)$, but there are clearly representations in which the first $100$ bits of the description would reveal the first few bits of the hardwired constant $y$, meaning that Alice will learn those bits from Bob’s message. Thus we need a stronger property, known as circuit privacy. This is a property that’s useful in other contexts where we use FHE. Let us now define it:

Definition 19.3 — Perfect circuit privacy. Let $\mathcal{E} = (G, E, D, EVAL)$ be an FHE. We say that $\mathcal{E}$ satisfies perfect circuit privacy if for every $(e, d)$ output by $G(1^n)$, every function $f: \{0,1\}^\ell \to \{0,1\}$ of $poly(n)$ description size, and every ciphertexts $c_1, \ldots, c_\ell$ and $x_1, \ldots, x_\ell \in \{0,1\}$ such that $c_i$ is output by $E_e(x_i)$, the distribution of $EVAL_e(f, c_1, \ldots, c_\ell)$ is identical to the distribution of $E_e(f(x))$. That is, for every $z \in \{0,1\}^*$, the probability that $EVAL_e(f, c_1, \ldots, c_\ell) = z$ is the same as the probability that $E_e(f(x)) = z$. We stress that these probabilities are taken only over the coins of the algorithms EVAL and $E$.

Perfect circuit privacy is a strong property that also automatically implies that $D_d(EVAL(f, E_e(x_1), \ldots, E_e(x_\ell))) = f(x)$ (can you see why?). In particular, once you understand the definition, the following lemma is a fairly straightforward exercise.

Lemma 19.4 If $(G, E, D, EVAL)$ satisfies perfect circuit privacy then for every $(e, d) = G(1^n)$, every two functions $f, f': \{0,1\}^\ell \to \{0,1\}$ of $poly(n)$ description size, every $x \in \{0,1\}^\ell$ such that $f(x) = f'(x)$, and every algorithm $A$,

$$|\Pr[A(d, EVAL(f, E_e(x_1), \ldots, E_e(x_\ell))) = 1] - \Pr[A(d, EVAL(f', E_e(x_1), \ldots, E_e(x_\ell))) = 1]| < negl(n). \tag{19.1}$$

P Please stop here and try to prove Lemma 19.4
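If you want to check your answer, the proof is a short chain of distributional identities (a sketch, under Definition 19.3):

```latex
% Since each c_i is output by E_e(x_i), perfect circuit privacy gives
\mathrm{EVAL}_e(f, c_1, \ldots, c_\ell) \;\equiv\; E_e(f(x))
   \;=\; E_e(f'(x)) \;\equiv\; \mathrm{EVAL}_e(f', c_1, \ldots, c_\ell),
% using f(x) = f'(x) in the middle step. Hence the two inputs fed to A
% in (19.1) are identically distributed (even given the secret key d),
% and the left-hand side of (19.1) is in fact 0.
```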

The algorithm $A$ above gets the secret key $d$ as input, but still cannot distinguish whether the EVAL algorithm used $f$ or $f'$. In fact, the

expression on the lefthand side of (19.1) is equal to zero when the scheme satisfies perfect circuit privacy. However, for our applications bounding it by a negligible function is enough. Hence, we can use the relaxed notion of “imperfect” circuit privacy, defined as follows:

Definition 19.5 — Statistical circuit privacy. Let $\mathcal{E} = (G, E, D, EVAL)$ be an FHE. We say that $\mathcal{E}$ satisfies statistical circuit privacy if for every $(e, d)$ output by $G(1^n)$, every function $f: \{0,1\}^\ell \to \{0,1\}$ of $poly(n)$ description size, and every ciphertexts $c_1, \ldots, c_\ell$ and $x_1, \ldots, x_\ell \in \{0,1\}$ such that $c_i$ is output by $E_e(x_i)$, the distribution of $EVAL_e(f, c_1, \ldots, c_\ell)$ is equal up to total variation distance $negl(n)$ to the distribution of $E_e(f(x))$. That is,

$$\sum_{z \in \{0,1\}^*} |\Pr[EVAL_e(f, c_1, \ldots, c_\ell) = z] - \Pr[E_e(f(x)) = z]| < negl(n)$$

where once again, these probabilities are taken only over the coins of the algorithms EVAL and $E$.

If you find Definition 19.5 hard to퐸 parse, the most important points you need to remember about it are the following:

• Statistical circuit privacy is as good as perfect circuit privacy for all applications, and so you can imagine the latter notion when using it.

• Statistical circuit privacy can be easier to achieve in constructions.

(The third point, which goes without saying, is that you can always ask clarifying questions in class, Piazza, sections, or office hours…) Intuitively, circuit privacy corresponds to what we need in the above protocol to protect Bob’s security and ensure that Alice doesn’t get any information about his input that she shouldn’t have from the output of EVAL. But before working this out, let us see how we can construct fully homomorphic encryption schemes satisfying this property.

19.2 ACHIEVING CIRCUIT PRIVACY IN A FULLY HOMOMORPHIC ENCRYPTION

We now discuss how we can modify our fully homomorphic encryption schemes to achieve the notion of circuit privacy. In the scheme we saw, the encryption of a bit $b$, whether obtained through the encryption algorithm or EVAL, always had the form of a matrix $C$ over $\mathbb{Z}_q$ (for $q = 2^{\sqrt{n}}$) where $Cv = bv + e$ for some vector $e$ that is “small”

(e.g., for every $i$, $|e_i| < n^{polylog(n)} \ll \sqrt{q} = 2^{\sqrt{n}/2}$). However, the EVAL algorithm was deterministic, and hence this vector $e$ is a function of whatever function $f$ we are evaluating; someone that knows the secret key could recover $e$ and then obtain from it some information about $f$. We want to make EVAL probabilistic and lose that information, and we use the following approach:

To kill a signal, drown it in lots of noise

That is, if we manage to add some additional random noise $e'$ that has magnitude much larger than $e$, then it would essentially “erase” any structure $e$ had. More formally, we will use the following lemma:

Lemma 19.6 Let $a \in \mathbb{Z}_q$ and $T \in \mathbb{N}$ be such that $aT < q/2$. If we let $X$ be the distribution obtained by taking $x \pmod{q}$ for an integer $x$ chosen at random in $[-T, +T]$, and let $X'$ be the distribution obtained by taking $a + x \pmod{q}$ for $x$ chosen in the same way, then

$$\sum_{y \in \mathbb{Z}_q} |\Pr[X = y] - \Pr[X' = y]| < |a|/T$$

Proof. This has a simple “proof by picture”: consider the intervals $[-T, +T]$ and $[-T + a, +T + a]$ on the number line (see Fig. 19.2). Note that the symmetric difference of these two intervals is only about an $a/T$ fraction of their union. More formally, $X$ is the uniform distribution over the $2T + 1$ numbers in the interval $[-T, +T]$ while $X'$ is the uniform distribution over the shifted version of this interval $[-T + a, +T + a]$. There are exactly $2|a|$ numbers which get probability zero under one of those distributions and probability $(2T + 1)^{-1} < (2T)^{-1}$ under the other. ■

[Figure 19.2: If $a \ll T$ then the uniform distribution over the interval $[-T, +T]$ is statistically close to the uniform distribution over the interval $[-T + a, +T + a]$, since the statistical distance is proportional to the probability (roughly $a/T$) that a random sample from one distribution falls inside the symmetric difference of the two intervals.]

We will also use the following lemma:

Lemma 19.7 If two distributions over numbers $X$ and $X'$ satisfy $\Delta(X, X') = \sum_{y \in \mathbb{Z}} |\Pr[X = y] - \Pr[X' = y]| < \delta$ then the distributions $X^m$ and $X'^m$ over $m$ dimensional vectors, where every entry is sampled independently from $X$ or $X'$ respectively, satisfy $\Delta(X^m, X'^m) \leq m\delta$.

P We omit the proof of Lemma 19.7 and leave it as an exercise to prove it using the hybrid argument. We will actually only use Lemma 19.7 for the distributions above; you can obtain intuition for it by considering the case $m = 2$, where we compare the rectangles of the form $[-T, +T] \times [-T, +T]$ and $[-T + a, +T + a] \times [-T + b, +T + b]$. You can see that their union has size roughly $4T^2$ while their symmetric difference has size roughly $2T \cdot 2a + 2T \cdot 2b$, and so if $|a|, |b| \leq \delta T$ then the symmetric difference is roughly a $2\delta$ fraction of the union.

We will not provide the full details, but together these lemmas show that EVAL can use bootstrapping to reduce the magnitude of the noise to roughly $2^{n^{0.1}}$ and then add an additional random noise of magnitude roughly, say, $2^{n^{0.2}}$, which would make it statistically indistinguishable from an actual encryption. Here are some hints on how to make this work: the idea is that in order to “re-randomize” a ciphertext $C$ we take a very noisy encryption of zero and add it to $C$. The normal encryption will use noise of magnitude $2^{n^{0.2}}$ but we will provide an encryption of the secret key with smaller magnitude $2^{n^{0.1}}/polylog(n)$ so we can use bootstrapping to reduce the noise. The main idea that allows adding noise is that at the end of the day, our scheme boils down to LWE instances that have the form $(c, \sigma)$ where $c$ is a random vector in $\mathbb{Z}_q^{n-1}$ and $\sigma = \langle c, s \rangle + a$ where $a \in [-\eta, +\eta]$ is a small noise addition. If we take any such input and add to $\sigma$ some $a' \in [-\eta', +\eta']$ then we create the effect of completely re-randomizing the noise. However, completely analyzing this requires a non-trivial amount of care and work.
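Lemma 19.6 is easy to sanity-check numerically. The sketch below computes the exact $L_1$ distance between the uniform distribution on $[-T, +T]$ and its shift by $a$ (assuming no wraparound mod $q$, as in the lemma’s setting):

```python
from fractions import Fraction

def l1_shift(a: int, T: int) -> Fraction:
    """Exact L1 distance between the uniform distribution X on [-T, +T]
    and its shift X' by a, computed by brute force (no wraparound
    mod q, matching the aT < q/2 condition of Lemma 19.6)."""
    lo, hi = -T - abs(a), T + abs(a)
    p = Fraction(1, 2 * T + 1)
    total = Fraction(0)
    for y in range(lo, hi + 1):
        px = p if -T <= y <= T else Fraction(0)
        px_shift = p if -T + a <= y <= T + a else Fraction(0)
        total += abs(px - px_shift)
    return total

# A small shift a drowned in much larger noise T:
a, T = 3, 1000
d = l1_shift(a, T)
assert d == Fraction(2 * a, 2 * T + 1)   # exactly 2|a| points differ
assert d < Fraction(a, T)                # the bound of Lemma 19.6
```

The exact value $2|a|/(2T+1)$ matches the “proof by picture”: only the $2|a|$ points in the symmetric difference of the two intervals contribute.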

19.2.1 Bottom line: A two party secure computation protocol

Using the above we can obtain the following theorem:

Theorem 19.8 — Re-randomizable FHE. If the LWE conjecture is true then there exists a tuple $(G, E, D, EVAL, RERAND)$ of polynomial-time randomized algorithms such that:

• $(G, E, D, EVAL)$ is a CPA secure fully-homomorphic encryption scheme for one bit messages. That is, if $(d,e) = G(1^n)$ then for every Boolean circuit $C$ with $\ell$ inputs and one output, and $x \in \{0,1\}^\ell$, the ciphertext $c = EVAL_e(C, E_e(x_1), \ldots, E_e(x_\ell))$ has length $n$ and $D_d(c) = C(x)$ with probability one over the random choices of the algorithms $E$ and $EVAL$.

• For every pair of keys $(e,d) = G(1^n)$ there are two distributions $\mathcal{C}^0, \mathcal{C}^1$ over $\{0,1\}^n$ such that:

– For $b \in \{0,1\}$, $\Pr_{c \sim \mathcal{C}^b}[D_d(c) = b] = 1$. That is, $\mathcal{C}^b$ is distributed over ciphertexts that decrypt to $b$.

– For every ciphertext $c \in \{0,1\}^n$ in the image of either $E_e(\cdot)$ or $EVAL_e(\cdot)$, if $D_d(c) = b$ then $RERAND_e(c)$ is statistically indistinguishable from $\mathcal{C}^b$. That is, the output of $RERAND_e(c)$

is a ciphertext that decrypts to the same plaintext as $c$, but whose distribution is essentially independent of $c$.

Proof Idea: We do not include the full proof, but the idea is that we use our standard LWE-based FHE, and to re-randomize a ciphertext $c$ we add to it an encryption of $0$ (which will not change the corresponding plaintext) and an additional noise vector that is of much larger magnitude than the original noise vector of $c$, but still small enough so that decryption succeeds. ⋆

Using the above re-randomizable encryption scheme, we can redefine EVAL to add a RERAND step at the end and achieve statistical circuit privacy. If we use Protocol 2PC with such a scheme then we get a two party computation protocol secure with respect to honest but curious adversaries. Using the compiler of Theorem 18.5 we obtain a proof of Theorem 18.3 for the two party setting:

Theorem 19.9 — Two party secure computation. If the LWE conjecture is true then for every (potentially randomized) functionality $F: \{0,1\}^{n_1} \times \{0,1\}^{n_2} \rightarrow \{0,1\}^{m_1} \times \{0,1\}^{m_2}$ there exists a polynomial-time protocol for computing the functionality $F$ secure with respect to potentially malicious adversaries.

19.3 BEYOND TWO PARTIES

We now sketch how to go beyond two parties. It turns out that the compiler from honest-but-curious to malicious security works just as well in the many party setting, and so the crux of the matter is to obtain an honest but curious secure protocol for $k > 2$ parties.

We start with the case of three parties - Alice, Bob, and Charlie. First, let us introduce some convenient notation (which is used in other settings as well).³ We will assume that each party initially generates private/public key pairs with respect to some fully homomorphic encryption scheme (satisfying statistical circuit privacy) and sends them to the other parties. We will use $\{x\}_A$ to denote the encryption of $x \in \{0,1\}^\ell$ using Alice's public key (similarly $\{x\}_B$ and $\{x\}_C$ will denote the encryptions of $x$ with respect to Bob's and Charlie's public keys). We can also compose these, and so denote by $\{\{x\}_A\}_B$ the encryption under Bob's key of the encryption under Alice's key of $x$.

³ I believe this notation originates with Burrows–Abadi–Needham (BAN) logic, though I would be happy if scribe writers verify this.

With the notation above, Protocol 2PC can be described as follows:

Protocol 2PC: (Using BAN notation)

• Inputs: Alice’s input is and Bob’s input is . The goal is for Alice to learn only 푛 and Bob to learn nothing.푛 푥 ∈ {0, 1} 푦 ∈ {0, 1} 퐹 (푥, 푦) multiparty secure computation ii: construction using fully homomorphic encryption 333

• Alice->Bob: Alice sends $\{x\}_A$ to Bob. (We omit from this description the public key of Alice, which can be thought of as being concatenated to the ciphertext.)

• Bob->Alice: Bob sends $\{F(x,y)\}_A$ to Alice by running EVAL on the ciphertext $\{x\}_A$ and the map $x \mapsto F(x,y)$.

• Alice's output: Alice computes $F(x,y)$ by decrypting the ciphertext sent from Bob.
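The two messages of Protocol 2PC can be made concrete with a toy sketch. `MockFHE`, `fhe_eval`, and `two_pc` below are my illustrative names: the "scheme" only mocks the *interface* of an FHE (key generation, encryption, decryption, homomorphic evaluation) with no security whatsoever - ciphertexts literally carry the plaintext, guarded by a key check - so only the message flow, not the cryptography, is being modeled.

```python
import secrets

class MockFHE:
    """Interface mock of an FHE scheme; NOT secure in any sense."""
    def __init__(self):
        self.key = secrets.token_hex(8)      # stands in for (e, d) = G(1^n)

    def enc(self, x):
        return ("ct", self.key, x)           # plays the role of {x}_A

    def dec(self, ct):
        tag, owner, x = ct
        assert owner == self.key, "wrong key"
        return x

def fhe_eval(func, ct):
    # Homomorphic evaluation: apply func "under" the encryption. (In the
    # real scheme this is EVAL followed by RERAND for circuit privacy.)
    tag, owner, x = ct
    return ("ct", owner, func(x))

def two_pc(F, x, y):
    alice = MockFHE()
    msg1 = alice.enc(x)                          # Alice -> Bob: {x}_A
    msg2 = fhe_eval(lambda x_: F(x_, y), msg1)   # Bob -> Alice: {F(x,y)}_A
    return alice.dec(msg2)                       # Alice outputs F(x, y)

inner = lambda x, y: sum(a * b for a, b in zip(x, y)) % 2
assert two_pc(inner, [1, 0, 1], [1, 1, 1]) == 0
```

Note that in this mock Bob never holds Alice's key, mirroring the fact that in the real protocol Bob sees only ciphertexts under Alice's public key.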

We can now describe the protocol for three parties. We will focus on the case where the goal is for Alice to learn $F(x,y,z)$ (where $x, y, z$ are the private inputs of Alice, Bob, and Charlie, respectively) and for Bob and Charlie to learn nothing. As usual we can reduce the general case to this by running the protocol multiple times with parties switching the roles of Alice, Bob, and Charlie.

Protocol 3PC: (Using BAN notation)

• Inputs: Alice’s input is , Bob’s input is , and Charlie’s input is . The푛 goal is for Alice to learn only푛 and for Bob and푥 ∈ Charlie푚 {0, 1} to learn nothing.푦 ∈ {0, 1} 푧 ∈ {0, 1} • 퐹Alice->Bob: (푥, 푦, 푧) Alice sends to Bob.

• Bob->Charlie: Bob sends $\{\{x\}_A, y\}_B$ to Charlie.

• Charlie->Bob: Charlie sends $\{\{F(x,y,z)\}_A\}_B$ to Bob. Charlie can do this by running EVAL on the ciphertext $\{\{x\}_A, y\}_B$ and on the (efficiently computable) map $c, y \mapsto EVAL(f_y, c)$ where $f_y$ is the circuit $x \mapsto F(x,y,z)$. (Please read this line several times!)

• Bob->Alice: Bob sends $\{F(x,y,z)\}_A$ to Alice by decrypting the ciphertext sent from Charlie.

• Alice's output: Alice computes $F(x,y,z)$ by decrypting the ciphertext sent from Bob.
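The nested-encryption step is the one worth rereading, so here is a self-contained toy trace of the 3PC message flow. As before, `keygen`/`enc`/`dec`/`fhe_eval`/`three_pc` are my names for an interface mock with no security at all - it exists only to show how Charlie evaluates under Bob's layer a map that itself evaluates under Alice's layer.

```python
import secrets

def keygen():
    return secrets.token_hex(8)

def enc(key, payload):
    return ("ct", key, payload)          # "encryption" = key-tagged payload

def dec(key, ct):
    tag, owner, payload = ct
    assert owner == key, "wrong key"
    return payload

def fhe_eval(func, ct):
    # Run func "under" the outermost layer of encryption.
    tag, owner, payload = ct
    return ("ct", owner, func(payload))

def three_pc(F, x, y, z):
    ka, kb = keygen(), keygen()
    m1 = enc(ka, x)                      # Alice -> Bob:     {x}_A
    m2 = enc(kb, (m1, y))                # Bob -> Charlie:   {({x}_A, y)}_B
    # Charlie -> Bob: {{F(x,y,z)}_A}_B, by evaluating under Bob's key the
    # map (c, y) -> EVAL_A(x -> F(x, y, z), c):
    m3 = fhe_eval(lambda p: fhe_eval(lambda x_: F(x_, p[1], z), p[0]), m2)
    m4 = dec(kb, m3)                     # Bob -> Alice:     {F(x,y,z)}_A
    return dec(ka, m4)                   # Alice outputs F(x, y, z)

maj = lambda x, y, z: int(x + y + z >= 2)
assert three_pc(maj, 1, 0, 1) == 1
```

In the real protocol Bob can strip only his own layer, so what he forwards to Alice is still encrypted under her key; the mock reproduces exactly that layering.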

Theorem 19.10 — Three party honest-but-curious secure computation. If the underlying encryption scheme is a fully homomorphic statistically circuit private encryption then Protocol 3PC is a secure protocol for the functionality $(x,y,z) \mapsto (F(x,y,z), \bot, \bot)$ with respect to honest-but-curious adversaries.

Proof. To be completed. ■

20 Quantum computing and cryptography I

“I think I can safely say that nobody understands quantum mechanics.”, Richard Feynman, 1965

“The only difference between a probabilistic classical world and the equations of the quantum world is that somehow or other it appears as if the probabilities would have to go negative”, Richard Feynman, 1982

For much of the history of mankind, people believed that the ultimate "theory of everything" would be of the "billiard ball" type. That is, at the end of the day, everything is composed of some elementary particles and adjacent particles interact with one another according to some well specified laws. The types of particles and laws might differ, but not the general shape of the theory. Note that this in particular means that a system of $N$ particles can be simulated by a computer with $poly(N)$ memory and time.

Alas, at the beginning of the 20th century, several experimental results called into question the "billiard ball" theory of the world. One such experiment is the famous "double slit" experiment. Suppose we shoot an electron at a wall that has a single slit at position $i$ and put somewhere behind this slit a detector. If we let $p_i$ be the probability that the electron goes through the slit and $q_i$ be the probability that, conditioned on this event, the electron hits this detector, then the fraction of times the electron hits our detector should be (and indeed is) $\alpha = p_i q_i$. Similarly, if we close this slit and open a second slit at position $j$ then the new fraction of times the electron hits our detector will be $\beta = p_j q_j$. Now if we open both slits then it seems that the fraction should be $\alpha + \beta$ and in particular, "obviously" the probability that the electron hits our detector should only increase if we open a second slit. However, this is not what actually happens when we run this experiment. It can be that the detector is hit a smaller number of times when two slits are open than when only a single one is open. It's


almost as if the electron checks whether two slits are open, and if they are, it changes the path it takes. If we try to "catch the electron in the act" and place a detector right next to each slit so we can count which electron went through which slit, then something even more bizarre happens. The mere fact that we measured the electron's path changes the actual path it takes, and now this "destructive interference" pattern is gone and the detector will be hit an $\alpha + \beta$ fraction of the time. Quantum mechanics is a mathematical theory that allows us to calculate and predict the results of this and many other examples. If you think of quantum mechanics as an explanation as to what "really" goes on in the world, it can be rather confusing. However, if you simply "shut up and calculate" then it works amazingly well at predicting the results of a great many experiments.

In the double slit experiment, quantum mechanics still allows us to compute numbers $\alpha$ and $\beta$ that denote the "probabilities" that the electron hits the detector through the first and second slit, respectively. The only difference is that in quantum mechanics these probabilities might be negative numbers. However, probabilities can only be negative when no one is looking at them. When we actually measure what happened to the detector, we make the probabilities positive by squaring them. So, if only the first slit is open, the detector will be hit an $\alpha^2$ fraction of the time. If only the second slit is open, the detector will be hit a $\beta^2$ fraction of the time. And if both slits are open, the detector will be hit an $(\alpha+\beta)^2$ fraction of the time. Note that it can well be that $(\alpha+\beta)^2 < \alpha^2 + \beta^2$ and so this calculation explains why the number of times a detector is hit when two slits are open might be smaller than the number of times it is hit when either slit is open. If you haven't seen this before, it may seem like complete nonsense and at this point I'll have to politely point you back to the part where I said we should not question quantum mechanics but simply "shut up and calculate".¹

Figure 20.1: The setup of the double slit experiment

Figure 20.2: In the double slit experiment, opening two slits can actually cause some positions to receive fewer electrons than before.

¹ If you have seen quantum mechanics before, I should warn that I am making many simplifications here. In particular, in quantum mechanics the "probabilities" can actually be complex numbers, though one gets most of the qualitative understanding by considering them as potentially negative real numbers. I will also be focusing throughout this presentation on so called "pure" quantum states, and ignore the fact that generally the states of a quantum subsystem are mixed states that are a convex combination of pure states and can be described by a so called density matrix. This issue does not arise as much in quantum algorithms, precisely because the goal is for a quantum computer to be an isolated system that can evolve to continue to be in a pure state; in real world quantum computers, however, there will be interference from the outside world that causes the state to become mixed and increase its so called "von Neumann entropy". Fighting this interference and the second law of thermodynamics is much of what the challenge of building quantum computers is all about. More generally, this lecture is not meant to be a complete or accurate description of quantum mechanics, quantum information theory, or quantum computing, but rather just to give a sense of the main points that are different about it from classical computing and how they relate to cryptography.

Some of the counterintuitive properties that arise from these negative probabilities include:

• Interference - As we see here, probabilities can "cancel each other out".

• Measurement - The idea that probabilities are negative as long as "no one is looking" and "collapse" to positive probabilities when they are measured is deeply disturbing. Indeed, people have shown that it can yield various strange outcomes such as "spooky action at a distance", where we can measure an object at one place and instantaneously (faster than the speed of light) cause a difference in the results of a measurement in a place far removed. Unfortunately (or fortunately?) these strange outcomes have been confirmed experimentally.

• Entanglement - The notion that two parts of the system could be connected in this weird way where measuring one will affect the other is known as quantum entanglement.

Again, as counter-intuitive as these concepts are, they have been experimentally confirmed, so we just have to live with them.
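The interference arithmetic is easy to check numerically. In the sketch below the particular values of $\alpha$ and $\beta$ are made up for illustration; the point is just that squaring possibly-negative "probabilities" can make the two-slit rate $(\alpha+\beta)^2$ smaller than either single-slit rate.

```python
# "Negative probabilities" (amplitudes) alpha and beta; measured rates are
# their squares, so destructive interference can reduce the hit count.
alpha, beta = 0.6, -0.5

one_slit_a = alpha ** 2            # first slit only
one_slit_b = beta ** 2             # second slit only
both_slits = (alpha + beta) ** 2   # both slits open: destructive interference

assert both_slits < one_slit_a and both_slits < one_slit_b
assert abs(both_slits - 0.01) < 1e-9
```

With classical (non-negative) probabilities, opening a second slit could only increase the hit rate, which is exactly the intuition the experiment refutes.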

20.0.1 Quantum computing and computation - an executive summary.

One of the strange aspects of the quantum-mechanical picture of the world is that unlike in the billiard ball example, there is no obvious algorithm to simulate the evolution of $n$ particles over $t$ time periods in $poly(n,t)$ steps. In fact, the natural way to simulate $n$ quantum particles will require a number of steps that is exponential in $n$. This is a huge headache for scientists that actually need to do these calculations in practice.

In 1981, physicist Richard Feynman proposed to "turn this lemon to lemonade" by making the following almost tautological observation:

If a physical system cannot be simulated by a computer in $T$ steps, the system can be considered as performing a computation that would take more than $T$ steps.

So, he asked whether one could design a quantum system such that its outcome $y$ based on the initial condition $x$ would be some function $y = f(x)$ such that (a) we don't know how to efficiently compute $f$ in any other way, and (b) $f$ is actually useful for something.² In 1985, David Deutsch formally suggested the notion of a quantum Turing machine, and the model has been since refined in works of Deutsch and Jozsa and Bernstein and Vazirani. Such a system is now known as a quantum computer.

² As its title suggests, Feynman's lecture was actually focused on the other side of simulating physics with a computer, but he mentioned as a "side remark" that one could wonder if it's possible to simulate physics with a new kind of computer - a "quantum computer" which would "not [be] a Turing machine, but a machine of a different kind". As far as I know, Feynman did not suggest that such a computer could be useful for computations completely outside the domain of quantum simulation, and in fact he found the question of whether quantum mechanics could be simulated by a classical computer to be more interesting.

For a while these hypothetical quantum computers seemed useful for one of two things.
First, to provide a general-purpose mechanism to simulate a variety of the real quantum systems that people care about. Second, as a challenge to the theory of computation's approach to model efficient computation by Turing machines, though a challenge that has little bearing on practice, given that this theoretical "extra power" of quantum computers seemed to offer little advantage in the problems people actually want to solve, such as combinatorial optimization, machine learning, data structures, etc.

To a significant extent, this is still true today. We have no real evidence that quantum computers, if built, will offer truly significant³ advantage in 99 percent of the applications of computing.⁴ However, there is one cryptography-sized exception: In 1994 Peter Shor showed that quantum computers can solve the integer factoring and discrete logarithm problems in polynomial time. This result has captured the imagination of a great many people, and completely energized research into quantum computing.

³ I am using the theorist's definition of conflating "significant" with "super-polynomial". As we'll see, Grover's algorithm does offer a very generic quadratic advantage in computation. Whether that quadratic advantage will ever be good enough to offset in practice the significant overhead in building a quantum computer remains an open question. We also don't have evidence that super-polynomial speedups can't be achieved for some problems outside the Factoring/Dlog or quantum simulation domains, and there is at least one company banking on such speedups actually being feasible.

⁴ This "99 percent" is a figure of speech, but not completely so. It seems that for many web servers, the TLS protocol (which based on the current non-lattice based systems would be completely broken by quantum computing) is responsible for about 1 percent of the CPU usage.

This is both because the hardness of these particular problems provides the foundations for securing such a huge part of our communications (and these days, our economy), as well as because it was a powerful demonstration that quantum computers could turn out to be useful for problems that a priori seemed to have nothing to do with quantum physics. As we'll discuss later, at the moment there are several intensive efforts to construct large scale quantum computers. It seems safe to say that, as far as we know, in the next five years or so there will not be a quantum computer large enough to factor, say, a $1024$ bit number, but it is quite possible that some quantum computer will be built that is strong enough to achieve some task that is too inefficient to achieve with a non-quantum or "classical" computer (or at least requires more resources classically than it would for this computer). When and if such a computer is built that can break reasonable parameters of Diffie Hellman, RSA and elliptic curve cryptography is anybody's guess. It could also be a "self destroying prophecy," whereby the existence of a small-scale quantum computer would cause everyone to shift to lattice-based crypto, which in turn would diminish the motivation to invest the huge resources needed to build a large scale quantum computer.⁵

⁵ Of course, given that we're still hearing of attacks exploiting "export grade" cryptography that was supposed to disappear with the 1990's, I imagine that we'll still have products running 1024 bit RSA when everyone has a quantum laptop.

The above summary might be all that you need to know as a cryptographer, and enough motivation to study lattice-based cryptography as we do in this course. However, because quantum computing is such a beautiful and (like cryptography) counter-intuitive concept, we will try to give at least a hint of what it is about and how Shor's algorithm works.

20.1 QUANTUM 101

We now present some of the basic notions in quantum information. It is very useful to contrast these notions to the setting of probabilistic systems and see how “negative probabilities” make a difference. This discussion is somewhat brief. The chapter on quantum computation in my book with Arora (see draft here) is one relatively short resource that contains essentially everything we discuss here. See also this blog post of Aaronson for a high level explanation of Shor’s algorithm which ends with links to several more detailed expositions. See also this lecture of Aaronson for a great discussion of the feasibility of quantum computing (Aaronson’s course lecture notes and the book that they spawned are fantastic reads as well).

States: We will consider a simple quantum system that includes $n$ objects (e.g., electrons/photons/transistors/etc.), each of which can be

in either an "on" or "off" state - i.e., each of them can encode a single bit of information, but to emphasize the "quantumness" we will call it a qubit. A probability distribution over such a system can be described as a $2^n$ dimensional vector $v$ with non-negative entries summing up to $1$, where for every $x \in \{0,1\}^n$, $v_x$ denotes the probability that the system is in state $x$. As we mentioned, quantum mechanics allows negative (in fact even complex) probabilities, and so a quantum state of the system can be described as a $2^n$ dimensional vector $v$ such that $\|v\|^2 = \sum_x |v_x|^2 = 1$.

Measurement: Suppose that we were in the classical probabilistic setting, and that the $n$ bits are simply random coins. Thus we can describe the state of the system by the $2^n$-dimensional vector $v$ such that $v_x = 2^{-n}$ for all $x$. If we measure the system and see how the coins came out, we will get the value $x$ with probability $v_x$. Naturally, if we measure the system twice we will get the same result. Thus, after we see that the coin is $x$, the new state of the system collapses to a vector $v$ such that $v_y = 1$ if $y = x$ and $v_y = 0$ if $y \neq x$. In a quantum state, we do the same thing: measuring a vector $v$ corresponds to turning it, with probability $|v_x|^2$, into a vector that has $1$ in coordinate $x$ and zero in all the other coordinates.

Operations: In the classical probabilistic setting, if we have a system in state $v$ and we apply some function $f: \{0,1\}^n \rightarrow \{0,1\}^n$ then this transforms $v$ to the state $w$ such that $w_y = \sum_{x: f(x)=y} v_x$. Another way to state this is that $w = M_f v$, where $M_f$ is the matrix such that $(M_f)_{f(x),x} = 1$ for all $x$ and all other entries are $0$. If we toss a coin and decide with probability $1/2$ to apply $f$ and with probability $1/2$ to apply $g$, this corresponds to the matrix $(1/2)M_f + (1/2)M_g$. More generally, the set of operations that we can apply can be captured as the set of convex combinations of all such matrices - this is simply the set of non-negative matrices whose columns all sum up to $1$ - the stochastic matrices. In the quantum case, the operations we can apply to a quantum state are encoded as a unitary matrix, which is a matrix $M$ such that $\|Mv\| = \|v\|$ for all vectors $v$.
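These definitions can be sanity-checked in a few lines. The sketch below (my own illustration, using NumPy) contrasts a classical distribution with a quantum state on one bit/qubit, and a stochastic matrix with a unitary one (the Hadamard gate).

```python
import numpy as np

p = np.array([0.5, 0.5])                 # classical: uniform coin
v = np.array([1.0, 1.0]) / np.sqrt(2)    # quantum state (|0> + |1>)/sqrt(2)
assert abs(p.sum() - 1) < 1e-9           # probabilities sum to 1
assert abs((v ** 2).sum() - 1) < 1e-9    # squared amplitudes sum to 1

S = np.array([[0.5, 0.5],                # stochastic: columns sum to 1
              [0.5, 0.5]])
H = np.array([[1.0, 1.0],                # Hadamard gate: unitary
              [1.0, -1.0]]) / np.sqrt(2)
assert np.allclose(S.sum(axis=0), 1)
assert np.allclose(H @ H.T, np.eye(2))   # H preserves norms: unitary

# Measuring H|0> yields x with probability |(Hv)_x|^2 -- here 1/2 each.
probs = (H @ np.array([1.0, 0.0])) ** 2
assert np.allclose(probs, [0.5, 0.5])
```

Note the contrast: applying $S$ to a distribution mixes probabilities irreversibly, while the unitary $H$ is its own inverse.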

Elementary operations: Of course, even in the probabilistic setting, not every function $f: \{0,1\}^n \rightarrow \{0,1\}^n$ is efficiently computable. We think of a function as efficiently computable if it is composed of polynomially many elementary operations that involve at most $2$ or $3$ bits or so (i.e., Boolean gates). That is, we say that a matrix $M$ is elementary if it only modifies three bits. That is, $M$ is obtained by "lifting" some $8 \times 8$ matrix $M'$ that operates on three bits $i,j,k$, leaving all the rest of the bits intact. Formally, given an $8 \times 8$ matrix $M'$ (indexed by strings in $\{0,1\}^3$) and three distinct indices $i < j < k \in \{1,\ldots,n\}$,

we define the $n$-lift of $M'$ with indices $i,j,k$ to be the $2^n \times 2^n$ matrix $M$ such that for every strings $x$ and $y$ that agree with each other on all coordinates except possibly $i,j,k$, $M_{x,y} = M'_{x_i x_j x_k, y_i y_j y_k}$, and otherwise $M_{x,y} = 0$. Note that if $M'$ is of the form $M'_f$ for some function $f: \{0,1\}^3 \rightarrow \{0,1\}^3$ then $M = M_g$ where $g: \{0,1\}^n \rightarrow \{0,1\}^n$ is defined by applying $f$ to the coordinates $i,j,k$ (i.e., to $x_i x_j x_k$) and keeping the remaining coordinates intact. We define $M$ as an elementary stochastic matrix or a probabilistic gate if $M$ is equal to an $n$-lift of some stochastic $8 \times 8$ matrix $M'$. The quantum case is similar: a quantum gate is a $2^n \times 2^n$ matrix that is an $n$-lift of some $8 \times 8$ unitary matrix $M'$. It is an exercise to prove that lifting preserves stochasticity and unitarity. That is, every probabilistic gate is a stochastic matrix and every quantum gate is a unitary matrix.
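The lift definition translates almost verbatim into code. The sketch below (my own illustration; dense matrices, so only practical for small $n$) builds the $n$-lift exactly as defined, lifts the Toffoli gate to $n = 4$ qubits, and checks that unitarity survives the lift.

```python
import numpy as np
from itertools import product

def lift(Mp, n, i, j, k):
    """n-lift of an 8x8 matrix Mp acting on bit positions i, j, k:
    M[x, y] = Mp[x_i x_j x_k, y_i y_j y_k] if x, y agree elsewhere, else 0."""
    strings = list(product([0, 1], repeat=n))
    idx = {s: t for t, s in enumerate(strings)}
    rest = [c for c in range(n) if c not in (i, j, k)]
    M = np.zeros((2 ** n, 2 ** n))
    for x in strings:
        for y in strings:
            if all(x[c] == y[c] for c in rest):
                xi = x[i] * 4 + x[j] * 2 + x[k]
                yi = y[i] * 4 + y[j] * 2 + y[k]
                M[idx[x], idx[y]] = Mp[xi, yi]
    return M

# Toffoli gate (a, b, c) -> (a, b, c XOR (a AND b)) as a permutation matrix.
toffoli = np.zeros((8, 8))
for a, b, c in product([0, 1], repeat=3):
    src = a * 4 + b * 2 + c
    dst = a * 4 + b * 2 + (c ^ (a & b))
    toffoli[dst, src] = 1

T4 = lift(toffoli, 4, 0, 1, 3)             # act on bits 0, 1, 3 of 4 qubits
assert np.allclose(T4 @ T4.T, np.eye(16))  # lifting preserved unitarity
```

The same code with a stochastic $M'$ would show that the lift's columns still sum to $1$, i.e., that lifting preserves stochasticity as well.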

Complexity: For every stochastic matrix $M$ we can define its randomized complexity, denoted $R(M)$, to be the minimum number $T$ such that $M$ can be (approximately) obtained by combining $T$ elementary probabilistic gates. To be concrete, we can define $R(M)$ to be the minimum $T$ such that there exist $T$ elementary matrices $M_1, \ldots, M_T$ such that for every $x$, $\sum_y |M_{y,x} - (M_T \cdots M_1)_{y,x}| < 0.1$. (It can be shown that $R(M)$ is finite and in fact at most $10^n$ for every $M$; we can do so by writing $M$ as a convex combination of function matrices and writing every function as a composition of functions that map a single string $x$ to $y$, keeping all other inputs intact.) We will say that a probabilistic process $M$ mapping distributions on $\{0,1\}^n$ to distributions on $\{0,1\}^n$ is efficiently classically computable if $R(M) \leq poly(n)$. If $M$ is a unitary matrix, then we define the quantum complexity of $M$, denoted $Q(M)$, to be the minimum number $T$ such that there are $T$ quantum gates $M_1, \ldots, M_T$ satisfying that for every $x$, $\sum_y |M_{y,x} - (M_T \cdots M_1)_{y,x}|^2 < 0.1$. We say that $M$ is efficiently quantumly computable if $Q(M) \leq poly(n)$.

Computing functions: We have defined what it means for an operator to be probabilistically or quantumly efficiently computable, but we typically are interested in computing some function $f: \{0,1\}^m \rightarrow \{0,1\}^\ell$. The idea is that we say that $f$ is efficiently computable if the corresponding operator is efficiently computable, except that we also allow the use of extra memory, and so we embed $f$ in some $n = poly(m)$. We define $f$ to be efficiently classically computable if there is some $n = poly(m)$ such that the operator $M_g$ is efficiently classically computable where $g: \{0,1\}^n \rightarrow \{0,1\}^n$ is defined such that $g(x_1, \ldots, x_n) = f(x_1, \ldots, x_m)$. In the quantum case we have a slight twist, since the operator $M_g$ is not necessarily a unitary matrix.⁶ Therefore we say that $f$ is efficiently quantumly computable if there is $n = poly(m)$ such that the operator $M_q$ is efficiently quantumly computable where $q: \{0,1\}^n \rightarrow \{0,1\}^n$ is defined as $q(x_1, \ldots, x_n) = x_1 \cdots x_m \| (f(x_1 \cdots x_m)0^{n-m-\ell} \oplus x_{m+1} \cdots x_n)$.

⁶ It is a good exercise to verify that for every $g: \{0,1\}^n \rightarrow \{0,1\}^n$, $M_g$ is unitary if and only if $g$ is a permutation.

Quantum and classical computation: The way we defined what it means for a function to be efficiently quantumly computable, it might not be clear that if $f: \{0,1\}^m \rightarrow \{0,1\}^\ell$ is a function that we can compute by a polynomial size Boolean circuit (e.g., combining polynomially many AND, OR and NOT gates) then it is also quantumly efficiently computable. The idea is that for every gate $g: \{0,1\}^2 \rightarrow \{0,1\}$ we can define an $8 \times 8$ unitary matrix $M_h$ where $h: \{0,1\}^3 \rightarrow \{0,1\}^3$ has the form $h(a,b,c) = a,b,c \oplus g(a,b)$. So, if $f$ has a circuit of $s$ gates, then we can dedicate an extra bit for every one of these gates and then run the corresponding $s$ unitary operations one by one, at the end of which we will get an operator that maps $x_1, \ldots, x_{m+\ell+s}$ to $x_1 \cdots x_m \| (x_{m+1} \cdots x_{m+\ell+s} \oplus f(x_1, \ldots, x_m)\|g(x_1 \cdots x_m))$ where for $i = 1, \ldots, s$ the $i$-th bit of $g(x_1, \ldots, x_m)$ is the result of applying the $i$-th gate in the calculation of $f(x_1, \ldots, x_m)$. So this is "almost" what we wanted, except that we have this "extra junk" $g(x_1, \ldots, x_m)$ that we need to get rid of. The idea is that we now simply run the same computation again, which basically means we XOR another copy of $g(x_1, \ldots, x_m)$ into the last $s$ bits; since $g(x) \oplus g(x) = 0^s$, we get that we compute the map $x \mapsto x_1 \cdots x_m \| (f(x_1, \ldots, x_m)0^s \oplus x_{m+1} \cdots x_{m+\ell+s})$ as desired.

The "obviously exponential" fallacy: A priori it might seem "obvious" that quantum computing is exponentially powerful, since to compute a quantum computation on $n$ bits we need to maintain the $2^n$ dimensional state vector and apply $2^n \times 2^n$ matrices to it. Indeed, popular descriptions of quantum computing (too) often say something along the lines that the difference between a quantum and a classical computer is that a classical bit can either be zero or one while a qubit can be in both states at once, and so with many qubits a quantum computer can perform exponentially many computations at once. Depending on how you interpret this, this description is either false or would apply equally well to probabilistic computation. However, for probabilistic computation it is a not too hard exercise to show that if

$f: \{0,1\}^m \rightarrow \{0,1\}^n$ is an efficiently computable function then it has a polynomial size circuit of AND, OR and NOT gates.⁷ Moreover, this "obvious" approach for simulating a quantum computation will take not just exponential time but exponential space as well, while it is not hard to show that using a simple recursive formula one can calculate the final quantum state using polynomial space (in physics parlance this is known as "Feynman path integrals"). So, the exponentially long vector description by itself does not imply that quantum computers are exponentially powerful. Indeed, we cannot prove that they are (since in particular we can't prove that every polynomial space calculation can be done in polynomial time; in complexity parlance, we don't know how to rule out that $P = PSPACE$), but we do have some problems (integer factoring most prominently) for which they do provide exponential speedup over the currently best known classical (deterministic or probabilistic) algorithms.

⁷ It is a good exercise to show that if $M$ is a probabilistic process with $R(M) \leq T$ then there exists a probabilistic circuit of size, say, $100 T n^2$ that approximately computes $M$ in the sense that for every input $x$, $\sum_{y \in \{0,1\}^n} |\Pr[C(x) = y] - M_{y,x}| < 1/3$.
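The polynomial-space recursion mentioned above ("Feynman path integrals") is short enough to write out. The sketch below (my own illustration) computes a single output amplitude $\langle y | M_T \cdots M_1 | x \rangle$ via the recursion $\mathrm{amp}(t, y) = \sum_z (M_t)_{y,z} \, \mathrm{amp}(t-1, z)$: the memory used is proportional to the recursion depth $T$, not to the $2^n$-dimensional state vector, at the cost of exponential time.

```python
import numpy as np

def amplitude(gates, y, x):
    """<y| M_T ... M_1 |x> by recursion over paths (poly space, exp time)."""
    def amp(t, yy):
        if t == 0:
            return 1.0 if yy == x else 0.0
        g = gates[t - 1]
        # Sum over all intermediate states z at step t - 1.
        return sum(g[yy, z] * amp(t - 1, z) for z in range(g.shape[1]))
    return amp(len(gates), y)

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# H applied twice is the identity: amplitude of returning to |0> is 1.
assert abs(amplitude([H, H], 0, 0) - 1) < 1e-9
assert abs(amplitude([H, H], 1, 0)) < 1e-9
# Agrees with the full state-vector computation.
assert np.allclose(H @ H @ np.array([1.0, 0.0]), [1.0, 0.0])
```

This is exactly why the $2^n$-long vector by itself proves nothing: any single amplitude is a polynomial-space computation.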

20.1.1 Physically realizing quantum computation

To realize quantum computation one needs to create a system with $n$ independent binary states (i.e., "qubits"), and to be able to manipulate small subsets of two or three of these qubits to change their state. While by the way we defined operations above it might seem that one needs to be able to perform arbitrary unitary operations on these two or three qubits, it turns out that there are several choices for universal sets - a small constant number of gates that generate all others. The biggest challenge is how to keep the system from being measured and collapsing to a single classical combination of states. This is sometimes known as the coherence time of the system. The threshold theorem says that there is some absolute constant level of errors $\tau$ so that if errors are created at every gate at a rate smaller than $\tau$ then we can recover from those errors and perform arbitrarily long computations. (Of course there are different ways to model the errors, and so there are actually several threshold theorems corresponding to various noise models.)

There have been several proposals to build quantum computers:

• Superconducting quantum computers use superconducting electric circuits to do quantum computation. Recent works have shown that one can keep these superconducting qubits fairly robust, to the point that one can do some error correction on them (see also here).

• Trapped ion quantum computers use the states of an ion to simulate a qubit. People have made some recent advances on these computers too. While it's not clear that it is the right measuring yard, the current best implementation of Shor's algorithm (for factoring 15) was done using an ion-trap computer.

• Topological quantum computers use a different technology, which is more stable by design but arguably harder to manipulate to cre- ate quantum computers.

These approaches are not mutually exclusive and it could be that ultimately quantum computers are built by combining all of them together. I am not at all an expert on this matter, but it seems that progress has been slow but steady, and it is quite possible that we'll see a 20-50 qubit computer sometime in the next 5-10 years.

20.1.2 Bra-ket notation

Quantum computing is very confusing and counterintuitive for many reasons. But there is also a "cultural" reason why people sometimes find quantum arguments hard to follow. Quantum folks follow their own special notation for vectors. Many non-quantum people find it ugly and confusing, while quantum folks secretly wish people used it all the time, not just for non-quantum linear algebra, but also for restaurant bills and elementary school math classes.

The notation is actually not so confusing. If $x \in \{0,1\}^n$ then $|x\rangle$ denotes the $x$-th standard basis vector in $2^n$ dimensions. That is, $|x\rangle$ is the $2^n$-dimensional column vector that has $1$ in the $x$-th coordinate and zero everywhere else. So, we can describe the column vector that has $\alpha_x$ in the $x$-th entry as $\sum_{x \in \{0,1\}^n} \alpha_x |x\rangle$. One more piece of notation that is useful is that if $x \in \{0,1\}^n$ and $y \in \{0,1\}^m$ then we identify $|x\rangle|y\rangle$ with $|xy\rangle$ (that is, the $2^{n+m}$ dimensional vector that has $1$ in the coordinate corresponding to the concatenation of $x$ and $y$, and zero everywhere else). This is more or less all you need to know about this notation to follow this lecture.⁸

⁸ If you are curious, there is an analogous notation for row vectors as $\langle x|$. Generally, if $u$ is a vector then $|u\rangle$ would be its form as a column vector and $\langle u|$ would be its form as a row vector. Hence, since $u^\top v = \langle u,v \rangle$, the inner product of $u$ and $v$ can be thought of as $\langle u||v\rangle$. The outer product (the matrix whose $i,j$ entry is $u_i v_j$) is denoted as $|u\rangle\langle v|$.

A quantum gate is an operation on at most three bits, and so it can be completely specified by what it does to the $8$ vectors $|000\rangle, \ldots, |111\rangle$. Quantum states are always unit vectors and so we sometimes omit the normalization for convenience; for example we will identify the state $|0\rangle + |1\rangle$ with its normalized version $\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$.

20.1.3 Bell's Inequality
There is something weird about quantum mechanics. In 1935 Einstein, Podolsky and Rosen (EPR) tried to pinpoint this issue by highlighting a previously unrealized corollary of this theory. It was already realized that the fact that quantum measurement collapses the state to a definite aspect yields the uncertainty principle, whereby if you measure a quantum system in one orthogonal basis, you will not know how it would have measured in an incoherent basis to it (such as position vs. momentum). What EPR showed was that quantum mechanics results in so-called "spooky action at a distance", where if you have a system of two particles then measuring one of them would instantaneously affect the state of the other. Since this "state" is just a mathematical description, as far as I know the EPR paper was considered to be a thought experiment showing troubling aspects of quantum mechanics, without bearing on experiment. This changed when in 1965 John Bell showed an actual experiment to test the predictions of EPR and hence pit intuitive common sense against the predictions of quantum mechanics. Quantum mechanics won. Nonetheless, since the results of these experiments are so obviously wrong to anyone

that has ever sat in an armchair, there are still a number of Bell denialists arguing that quantum mechanics is wrong in some way.

So, what is this Bell's Inequality? Suppose that Alice and Bob try to convince you they have telepathic ability, and they aim to prove it via the following experiment. Alice and Bob will be in separate closed rooms.⁹ You will interrogate Alice and your associate will interrogate Bob. You choose a random bit $x \in \{0,1\}$ and your associate chooses a random $y \in \{0,1\}$. We let $a$ be Alice's response and $b$ be Bob's response. We say that Alice and Bob win this experiment if $a \oplus b = x \wedge y$.

⁹ If you are extremely paranoid about Alice and Bob communicating with one another, you can coordinate with your assistant to perform the experiment exactly at the same time, and make sure that the rooms are far enough apart that Alice and Bob couldn't communicate the results of the coin tosses to each other in time even if they do so at the speed of light.

Now if Alice and Bob are not telepathic, then they need to agree in advance on some strategy. The most general form of such a strategy is that Alice and Bob agree on some distribution over a pair of functions $f, g : \{0,1\} \to \{0,1\}$, such that Alice will set $a = f(x)$ and Bob will set $b = g(y)$. Therefore, the following claim, which is basically Bell's Inequality,¹⁰ implies that Alice and Bob cannot succeed in this game with probability higher than $3/4$:

¹⁰ This form of Bell's game was shown by CHSH.

Claim: For every two functions $f, g : \{0,1\} \to \{0,1\}$ there exist some $x, y \in \{0,1\}$ such that $f(x) \oplus g(y) \neq x \wedge y$.

Proof: Suppose toward a contradiction that $f, g$ satisfy $f(x) \oplus g(y) = x \wedge y$ (∗) for all $x, y \in \{0,1\}$, or equivalently $f(x) = (x \wedge y) \oplus g(y)$. If $y = 0$ then for (∗) to hold it must be that $f(x) = g(0)$ for all $x$, meaning $f$ is constant. But on the other hand, if $y = 1$ then for (∗) to hold it must be that $f(x) = x \oplus g(1)$, which means that $f$ cannot be constant. QED

An amazing, experimentally verified fact is that quantum mechanics allows for telepathy:¹¹

¹¹ More accurately, one either has to give up on a "billiard ball type" theory of the universe or believe in telepathy (believe it or not, some scientists went for the latter option).

Claim: There is a strategy for Alice and Bob to succeed in this game with probability at least $0.8$.

Proof: The main idea is for Alice and Bob to first prepare a 2-qubit quantum system in the state (up to normalization) $|00\rangle + |11\rangle$ (this is known as an EPR pair). Alice takes the first qubit in this system to her room, and Bob takes the second qubit to his room. Now, when Alice receives $x$, if $x = 0$ she does nothing and if $x = 1$ she applies the unitary map $R_{-\pi/8}$ to her qubit, where $R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$. When Bob receives $y$, if $y = 0$ he does nothing and if $y = 1$ he applies the unitary map $R_{\pi/8}$ to his qubit. Then each one of them measures their qubit and sends this as their response. Recall that to win the game Bob and Alice want their outputs to be more likely to differ if $x = y = 1$ and to be more likely to agree otherwise.

If $x = y = 0$ then the state does not change and Alice and Bob always output either both $0$ or both $1$, and hence in both cases $a \oplus b = x \wedge y$. If $x = 0$ and $y = 1$ then after Alice measures her bit, if she gets $0$ then Bob's state is equal to $\cos(\pi/8)|0\rangle - \sin(\pi/8)|1\rangle$, which will equal $0$ with probability $\cos^2(\pi/8)$. The case that Alice gets $1$, or that $x = 1$ and $y = 0$, is symmetric, and so in all the cases where $x \neq y$ (and hence $x \wedge y = 0$) the probability that $a = b$ will be $\cos^2(\pi/8) \geq 0.85$. For the case that $x = 1$ and $y = 1$, direct calculation via trigonometric identities yields that all four options for $(a, b)$ are equally likely, and hence in this case $a = b$ with probability $0.5$. The overall probability of winning the game is at least $\frac{1}{4}\cdot 1 + \frac{1}{2}\cdot 0.85 + \frac{1}{4}\cdot 0.5 = 0.8$. QED

Quantum vs probabilistic strategies: It is instructive to understand what it is about quantum mechanics that enabled this gain in Bell's Inequality. For this, consider the following analogous probabilistic strategy for Alice and Bob. They agree that each one of them outputs $0$ if he or she gets $0$ as input and outputs $1$ with probability $p$ if they get $1$ as input. In this case one can see that their success probability would be $\frac{1}{4}\cdot 1 + \frac{1}{2}(1-p) + \frac{1}{4}[2p(1-p)] = 0.75 - 0.5p^2 \leq 0.75$. The quantum strategy we described above can be thought of as a variant of the probabilistic strategy for $p = \sin^2(\pi/8) = 0.15$. But in the case $x = y = 1$, instead of disagreeing only with probability $2p(1-p) = 1/4$, because we can use these "negative probabilities" in the quantum world and rotate the state in opposite directions, the probability of disagreement ends up being $\sin^2(\pi/4) = 0.5$.
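The case analysis in the proof can be checked numerically. The sketch below (plain Python, illustrative and not from the notes) re-derives the overall success probability of the rotation strategy from the per-case measurement probabilities established above.

```python
# Numerical check of the rotation strategy analyzed above: per-case win
# probabilities come from the proof (agreement with prob. cos^2(pi/8) when
# the inputs differ, disagreement with prob. 1/2 when x = y = 1), averaged
# over the four equally likely input pairs.
import math

c = math.cos(math.pi / 8) ** 2      # ~ 0.8536

def win_prob(x, y):
    if x == 0 and y == 0:
        return 1.0                  # state unchanged: outputs always agree
    if x != y:
        return c                    # agree with probability cos^2(pi/8)
    return 0.5                      # x = y = 1: all four outputs equally likely

total = sum(win_prob(x, y) for x in (0, 1) for y in (0, 1)) / 4
print(round(total, 3))              # -> 0.802, beating the classical bound 0.75
```

Note that the exact value $\frac{1}{4}(1 + 2\cos^2(\pi/8) + \frac{1}{2}) \approx 0.802$ is slightly above the $0.8$ lower bound stated in the claim.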

20.2 GROVER’S ALGORITHM

Shor's Algorithm, which we'll see in the next lecture, is an amazing achievement, but it only applies to very particular problems. It does not seem to be relevant to breaking AES, lattice based cryptography, or problems not related to quantum computing at all such as scheduling, constraint satisfaction, traveling salesperson, etc. Indeed, for the most general form of these search problems, classically we don't know how to do anything much better than brute force search, which takes $2^n$ time over an $n$-bit domain. Lov Grover showed that quantum computers can obtain a quadratic improvement over this brute force search, solving SAT in $2^{n/2}$ time. The effect of Grover's algorithm on cryptography is fairly mild: one essentially needs to double the key lengths of symmetric primitives. But beyond cryptography, if large scale quantum computers end up being built, Grover search and its variants might end up being some of the most useful computational problems they will tackle. Grover's theorem is the following:

Theorem (Grover search, 1996): There is a quantum $O(2^{n/2} poly(n))$-time algorithm that given a $poly(n)$-sized circuit computing a function $f : \{0,1\}^n \to \{0,1\}$ outputs a string $x^* \in \{0,1\}^n$ such that $f(x^*) = 1$.

Proof sketch: The proof is not hard but we only sketch it here. The general idea can be illustrated in the case that there exists a single $x^*$ satisfying $f(x^*) = 1$. (There is a classical reduction from the general case to this problem.) As in Simon's algorithm, we can efficiently initialize an $n$-qubit system to the uniform state $u = 2^{-n/2}\sum_{x\in\{0,1\}^n}|x\rangle$, which has dot product $2^{-n/2}$ with $|x^*\rangle$. Of course if we measure $u$, we only have probability $(2^{-n/2})^2 = 2^{-n}$ of obtaining the value $x^*$. Our goal would be to use $O(2^{n/2})$ calls to the oracle to transform the system to a state $v$ with dot product at least some constant $\epsilon > 0$ with the state $|x^*\rangle$.

It is an exercise to show that using Hadamard gates we can efficiently compute the unitary operator $U$ such that $Uu = u$ and $Uv = -v$ for every $v$ orthogonal to $u$. Also, using the circuit for $f$, we can efficiently compute the unitary operator $U^*$ such that $U^*|x\rangle = |x\rangle$ for all $x \neq x^*$ and $U^*|x^*\rangle = -|x^*\rangle$. It turns out that $O(2^{n/2})$ applications of $UU^*$ to $u$ yield a vector $v$ with $\Omega(1)$ inner product with $|x^*\rangle$. To see why, consider what these operators do in the two dimensional linear subspace spanned by $u$ and $|x^*\rangle$. (Note that the initial state $u$ is in this subspace and all our operators preserve this property.) Let $u_\perp$ be the unit vector orthogonal to $u$ in this subspace and let $x^*_\perp$ be the unit vector orthogonal to $|x^*\rangle$ in this subspace. Restricted to this subspace, $U^*$ is a reflection along the axis $x^*_\perp$ and $U$ is a reflection along the axis $u$.

Now, let $\theta$ be the angle between $u$ and $x^*_\perp$. These vectors are very close to each other and so $\theta$ is very small but not zero: it is equal to $\sin^{-1}(2^{-n/2})$, which is roughly $2^{-n/2}$. Now if our state $v$ has angle $\alpha \geq 0$ with $u$, then as long as $\alpha$ is not too large (say $\alpha < \pi/8$) this means that $v$ has angle $\alpha + \theta$ with $x^*_\perp$. That means that $U^* v$ will have angle $-\alpha - \theta$ with $x^*_\perp$, or $-\alpha - 2\theta$ with $u$, and hence $UU^* v$ will have angle $\alpha + 2\theta$ with $u$. Hence with each application of $UU^*$ we move $2\theta$ radians away from $u$, and in $O(2^{n/2})$ steps the angle between $u$ and our state will be at least some constant $\epsilon > 0$. Since we live in the two dimensional space spanned by $u$ and $|x^*\rangle$, this would mean that the dot product of our state and $|x^*\rangle$ will be at least some constant as well. QED

21 Quantum computing and cryptography II
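The two reflections in the sketch above can be simulated classically for small $n$. The following plain-Python state-vector simulation (illustrative, not from the notes) applies the oracle reflection $U^*$ and the reflection $U$ about the uniform state, and checks that after roughly $(\pi/4)2^{n/2}$ iterations the marked item is measured with probability close to $1$.

```python
# Classical state-vector simulation of Grover's algorithm for small n:
# U* flips the sign of the marked basis state, and U (inversion about the
# average) is the reflection 2|u><u| - I about the uniform state u.
import math

def grover(n, marked, num_iters):
    N = 2 ** n
    amp = [1 / math.sqrt(N)] * N            # uniform superposition u
    for _ in range(num_iters):
        amp[marked] = -amp[marked]          # oracle reflection U*
        mean = sum(amp) / N
        amp = [2 * mean - a for a in amp]   # diffusion U: a -> 2*mean - a
    return amp

n, marked = 8, 37
iters = round(math.pi / 4 * math.sqrt(2 ** n))   # ~ (pi/4) * 2^(n/2) steps
amp = grover(n, marked, iters)
p_success = amp[marked] ** 2
print(round(p_success, 2))                  # close to 1 (vs 1/256 initially)
```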

Bell's Inequality is a powerful demonstration that there is something very strange going on with quantum mechanics. But could this "strangeness" be of any use to solve computational problems not directly related to quantum systems? A priori, one could guess the answer is no. In 1994 Peter Shor showed that one would be wrong:

Theorem 21.1 — Shor's Theorem. The map that takes an integer $m$ into its prime factorization is efficiently quantumly computable. Specifically, it can be computed using $O(\log^3 m)$ quantum gates.

This is an exponential improvement over the best known classical algorithms, which as we mentioned before, take roughly $2^{\tilde O(\log^{1/3} m)}$ time.

We will now sketch the ideas behind Shor's algorithm. In fact, Shor proved the following more general theorem:

Theorem 21.2 — Order Finding Algorithm. There is a quantum polynomial time algorithm that given a multiplicative Abelian group $\mathbb{G}$ and element $g \in \mathbb{G}$ computes the order of $g$ in the group.

Recall that the order of $g$ in $\mathbb{G}$ is the smallest positive integer $a$ such that $g^a = 1$. By "given a group" we mean that we can represent the elements of the group as strings of length $O(\log|\mathbb{G}|)$ and there is a $poly(\log|\mathbb{G}|)$ algorithm to perform multiplication in the group.

21.1 FROM ORDER FINDING TO FACTORING AND DISCRETE LOG

The order finding problem allows us not just to factor integers in polynomial time, but also to solve the discrete logarithm over arbitrary Abelian groups, thereby showing that quantum computers will break not just RSA but also Diffie-Hellman and Elliptic Curve Cryptography. We merely sketch how one reduces the factoring and discrete logarithm


problems to order finding: (see some of the sources above for the full details)

• For factoring, let us restrict to the case $m = pq$ for distinct primes $p, q$. Recall that we showed that finding the size $(p-1)(q-1) = m - p - q + 1$ of the group $\mathbb{Z}^*_m$ is sufficient to recover $p$ and $q$. One can show that if we pick a few random $x$'s in $\mathbb{Z}^*_m$ and compute their orders, the least common multiple of these orders is likely to be the group size.
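As a reminder of why the group size suffices, here is a short sketch of the classical step recovering $p$ and $q$ from $m$ and $(p-1)(q-1)$ (the helper name is hypothetical, plain Python): since $p + q = m - \varphi + 1$, the factors are the roots of a quadratic.

```python
# Recovering p, q from m = p*q and the group size phi = (p-1)(q-1):
# p + q = m - phi + 1, so p and q are the roots of z^2 - (p+q) z + m.
import math

def factor_from_group_size(m, phi):
    s = m - phi + 1                  # s = p + q
    d = math.isqrt(s * s - 4 * m)    # square root of the discriminant
    p, q = (s - d) // 2, (s + d) // 2
    assert p * q == m
    return p, q

print(factor_from_group_size(3233, 3120))   # 3233 = 53 * 61 -> (53, 61)
```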

• For discrete log in a group $\mathbb{G}$, if we get $X = g^x$ and need to recover $x$, we can compute the order of various elements of the form $X^a g^b$. The order of such an element is a number $c$ satisfying $c(xa + b) = 0 \pmod{|\mathbb{G}|}$. Again, with a few random examples we will get a non-trivial example (where $c \neq 0 \pmod{|\mathbb{G}|}$) and be able to recover the unknown $x$.

21.2 FINDING PERIODS OF A FUNCTION: SIMON'S ALGORITHM

Let $\mathbb{H}$ be some Abelian group with a group operation that we'll denote by $\oplus$, and $f$ be some function mapping $\mathbb{H}$ to an arbitrary set (which we can encode as $\{0,1\}^*$). We say that $f$ has period $h^*$ for some $h^* \in \mathbb{H}$ if for every $x, y \in \mathbb{H}$, $f(x) = f(y)$ if and only if $y = x \oplus kh^*$ for some integer $k$. Note that if $\mathbb{G}$ is some Abelian group, then if we define $\mathbb{H} = \mathbb{Z}_{|\mathbb{G}|}$, for every element $g \in \mathbb{G}$ the map $f(a) = g^a$ is a periodic map over $\mathbb{H}$ with period the order of $g$. So, finding the order of an item reduces to the question of finding the period of a function.

How do we generally find the period of a function? Let us consider the simplest case, where $f$ is a function from $\mathbb{R}$ to $\mathbb{R}$ that is $h^*$-periodic for some number $h^*$, in the sense that $f$ repeats itself on the intervals $[0, h^*]$, $[h^*, 2h^*]$, $[2h^*, 3h^*]$, etc. How do we find this number $h^*$? The key idea would be to transform $f$ from the time to the frequency domain. That is, we use the Fourier transform to represent $f$ as a sum of wave functions. In this representation wavelengths that divide the period $h^*$ would get significant mass, while wavelengths that don't would likely "cancel out".

Similarly, the main idea behind Shor's algorithm is to use a tool known as the quantum Fourier transform that given a circuit computing the function $f : \mathbb{H} \to \mathbb{R}$, creates a quantum state over roughly $\log|\mathbb{H}|$ qubits (and hence dimension $|\mathbb{H}|$) that corresponds to the Fourier transform of $f$. Hence when we measure this state, we get a group element with probability proportional to the square of the corresponding Fourier coefficient. One can show that if $f$ is $h^*$-periodic then we can recover $h^*$ from this distribution.

Shor carried out this approach for the group $\mathbb{H} = \mathbb{Z}_q$ for some $q$, but we will start by seeing this for the group $\mathbb{H} = \{0,1\}^n$ with

Figure 21.1: If $f$ is a periodic function then when we represent it in the Fourier transform, we expect the coefficients corresponding to wavelengths that do not evenly divide the period to be very small, as they would tend to "cancel out".

the XOR operation. This case is known as Simon’s algorithm (given by Dan Simon in 1994) and actually preceded (and inspired) Shor’s algorithm:

Theorem 21.3 — Simon's Algorithm. If $f : \{0,1\}^n \to \{0,1\}^*$ is polynomial time computable and satisfies the property that $f(x) = f(y)$ iff $x \oplus y = h^*$ then there exists a quantum polynomial-time algorithm that outputs a random $h \in \{0,1\}^n$ such that $\langle h, h^*\rangle = 0 \pmod 2$.

Note that given $O(n)$ such samples, we can recover $h^*$ with high probability by solving the corresponding linear equations.

Proof. Let HAD be the $2\times 2$ unitary matrix corresponding to the one qubit operation $|0\rangle \mapsto \frac{1}{\sqrt 2}(|0\rangle + |1\rangle)$ and $|1\rangle \mapsto \frac{1}{\sqrt 2}(|0\rangle - |1\rangle)$, or equivalently $|a\rangle \mapsto \frac{1}{\sqrt 2}(|0\rangle + (-1)^a|1\rangle)$. Given the state $|0^{n+m}\rangle$ we can apply this map to each one of the first $n$ qubits to get the state $2^{-n/2}\sum_{x\in\{0,1\}^n}|x\rangle|0^m\rangle$, and then we can apply the gates of $f$ to map this to the state $2^{-n/2}\sum_{x\in\{0,1\}^n}|x\rangle|f(x)\rangle$. Now suppose that we apply this operation again to the first $n$ qubits; then we get the state $2^{-n}\sum_{x\in\{0,1\}^n}\prod_{i=1}^n(|0\rangle + (-1)^{x_i}|1\rangle)|f(x)\rangle$, which, if we open up each one of these products and look at all $2^n$ choices $y \in \{0,1\}^n$ (with $y_i = 0$ corresponding to picking $|0\rangle$ and $y_i = 1$ corresponding to picking $|1\rangle$ in the $i$-th product), we can rewrite as $2^{-n}\sum_{x\in\{0,1\}^n}\sum_{y\in\{0,1\}^n}(-1)^{\langle x,y\rangle}|y\rangle|f(x)\rangle$. Now under our assumptions, for every particular $z$ in the image of $f$, there exist exactly two preimages $x$ and $x\oplus h^*$ such that $f(x) = f(x\oplus h^*) = z$. So, if $\langle y, h^*\rangle = 0 \pmod 2$, we get that $(-1)^{\langle x,y\rangle} + (-1)^{\langle x\oplus h^*,y\rangle} = 2(-1)^{\langle x,y\rangle}$, and otherwise we get $(-1)^{\langle x,y\rangle} + (-1)^{\langle x\oplus h^*,y\rangle} = 0$. Therefore, if we measure the state we will get a pair $(y, z)$ such that $\langle y, h^*\rangle = 0 \pmod 2$. QED
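The classical post-processing mentioned after the theorem can be sketched as follows (hypothetical helper names, plain Python, not from the notes): each quantum run yields a random $y$ orthogonal to $h^*$ mod 2, and $h^*$ is the unique nonzero vector orthogonal to all the samples. For illustration we brute-force over all $2^n$ candidates instead of doing Gaussian elimination over GF(2); bit vectors are encoded as Python ints.

```python
# Recover h* from samples y satisfying <y, h*> = 0 (mod 2): return all
# nonzero h orthogonal (mod 2) to every sample.
def recover_period(samples, n):
    return [h for h in range(1, 2 ** n)
            if all(bin(h & y).count("1") % 2 == 0 for y in samples)]

h_star = 0b101
# the sample set the quantum subroutine would draw from when h* = 101:
samples = [y for y in range(1, 2 ** 3)
           if bin(y & h_star).count("1") % 2 == 0]
print(recover_period(samples, 3))   # -> [5], i.e. h* = 0b101
```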

Simon's algorithm seems to really use the special bit-wise structure of the group $\{0,1\}^n$, so one could wonder if it has any relevance for the group $\mathbb{Z}^*_m$ for some exponentially large $m$. It turns out that the same insights that underlie the well known Fast Fourier Transform (FFT) algorithm can be used to essentially follow the same strategy for this group as well.

21.3 FROM SIMON TO SHOR

(Note: The presentation here is adapted from the quantum computing chapter in my textbook with Arora.) We now describe how to achieve Shor's algorithm for order finding. We will not do this for a general group but rather focus our attention on the group $\mathbb{Z}^*_\ell$ for some number $\ell$, which is the case of interest for integer factoring and the discrete logarithm modulo primes problems. That is, we prove the following theorem:

Theorem 21.4 — Shor's Algorithm, restated. For every $\ell$ and $a \in \mathbb{Z}^*_\ell$, there is a $poly(\log\ell)$ quantum algorithm to find the order of $a$ in $\mathbb{Z}^*_\ell$.

The idea is similar to Simon's algorithm. We consider the map $x \mapsto a^x \pmod\ell$, which is a periodic map over $\mathbb{Z}_m$, where $m = |\mathbb{Z}^*_\ell|$, with period being the order of $a$.

To find the period of this map we will now need to perform a Quantum Fourier Transform (QFT) over the group $\mathbb{Z}_m$ instead of $\{0,1\}^n$. This is a quantum algorithm that takes a register from some arbitrary state $f \in \mathbb{C}^m$ into a state whose vector is the Fourier transform $\hat f$ of $f$. The QFT takes only $O(\log^2 m)$ elementary steps and is thus very efficient. Note that we cannot say that this algorithm "computes" the Fourier transform, since the transform is stored in the amplitudes of the state, and as mentioned earlier, quantum mechanics gives no way to "read out" the amplitudes per se. The only way to get information from a quantum state is by measuring it, which yields a single basis state with probability that is related to its amplitude. This is hardly representative of the entire Fourier transform vector, but sometimes (as is the case in Shor's algorithm) this is enough to get highly non-trivial information, which we do not know how to obtain using classical (non-quantum) computers.

21.3.1 The Fourier transform over $\mathbb{Z}_m$
We now define the Fourier transform over $\mathbb{Z}_m$ (the group of integers in $\{0,\ldots,m-1\}$ with addition modulo $m$). We give a definition that is specialized to the current context. For every vector $f \in \mathbb{C}^m$, the Fourier transform of $f$ is the vector $\hat f$ where the $x$-th coordinate of $\hat f$ is defined as¹

$\hat f(x) = \frac{1}{\sqrt m}\sum_{y\in\mathbb{Z}_m} f(y)\omega^{xy}$

where $\omega = e^{2\pi i/m}$.

¹ In the context of the Fourier transform it is customary and convenient to denote the $x$-th coordinate of a vector $f$ by $f(x)$ rather than $f_x$.

The Fourier transform is simply a representation of $f$ in the Fourier basis $\{\chi_x\}_{x\in\mathbb{Z}_m}$, where $\chi_x$ is the vector/function whose $y$-th coordinate is $\frac{1}{\sqrt m}\omega^{xy}$. Now the inner product of any two vectors $\chi_x, \chi_z$ in this basis is equal to

$\langle\chi_x, \chi_z\rangle = \frac{1}{m}\sum_{y\in\mathbb{Z}_m}\omega^{xy}\overline{\omega^{zy}} = \frac{1}{m}\sum_{y\in\mathbb{Z}_m}\omega^{(x-z)y}.$

But if $x = z$ then $\omega^{(x-z)y} = 1$ and hence this sum is equal to $1$. On the other hand, if $x \neq z$, then this sum is equal to $\frac{1}{m}\cdot\frac{1-\omega^{(x-z)m}}{1-\omega^{x-z}} = \frac{1}{m}\cdot\frac{1-1}{1-\omega^{x-z}} = 0$, using the formula for the sum of a geometric series. In other words, this is an orthonormal basis, which means that the Fourier transform map $f \mapsto \hat f$ is a unitary operation.

What is so special about the Fourier basis? For one thing, if we identify vectors in $\mathbb{C}^m$ with functions mapping $\mathbb{Z}_m$ to $\mathbb{C}$, then it's easy to see that every function $\chi$ in the Fourier basis is a homomorphism from $\mathbb{Z}_m$ to $\mathbb{C}$ in the sense that $\chi(y+z) = \chi(y)\chi(z)$ for every $y, z \in \mathbb{Z}_m$. Also, every such function $\chi$ is periodic in the sense that there exists $r \in \mathbb{Z}_m$ such that $\chi(y+r) = \chi(y)$ for every $y \in \mathbb{Z}_m$ (indeed if $\chi(y) = \omega^{xy}$ then we can take $r$ to be $\ell/x$ where $\ell$ is the least common multiple of $x$ and $m$). Thus, intuitively, if a function $f : \mathbb{Z}_m \to \mathbb{C}$ is itself periodic (or roughly periodic) then when representing $f$ in the Fourier basis, the coefficients of basis vectors with periods agreeing with the period of $f$ should be large, and so we might be able to discover $f$'s period from this representation. This does turn out to be the case, and is a crucial point in Shor's algorithm.
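The intuition that a periodic $f$ concentrates its Fourier mass on matching frequencies can be checked numerically. The sketch below (plain Python with `cmath`, illustrative only) computes the transform directly from the definition above for an $r$-periodic indicator function on $\mathbb{Z}_m$ and lists the heavy coefficients.

```python
# Direct computation of fhat(x) = (1/sqrt(m)) * sum_y f(y) w^{xy} with
# w = e^{2 pi i / m}, applied to the indicator of multiples of r in Z_m.
import cmath, math

def dft(f):
    m = len(f)
    w = cmath.exp(2j * cmath.pi / m)
    return [sum(f[y] * w ** (x * y) for y in range(m)) / math.sqrt(m)
            for x in range(m)]

m, r = 32, 8                          # the period r divides m here
f = [1.0 if y % r == 0 else 0.0 for y in range(m)]
fhat = dft(f)
heavy = [x for x in range(m) if abs(fhat[x]) > 1e-6]
print(heavy)                          # -> the multiples of m/r = 4
```

All other coefficients cancel out exactly, as the geometric-series argument above predicts.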

21.3.2 Fast Fourier Transform.
Denote by $FT_m$ the operation that maps every vector $f \in \mathbb{C}^m$ to its Fourier transform $\hat f$. The operation $FT_m$ is represented by an $m\times m$ matrix whose $(x,y)$-th entry is $\omega^{xy}$. The trivial algorithm to compute it takes $m^2$ operations. The famous Fast Fourier Transform (FFT) algorithm computes the Fourier transform in $O(m\log m)$ operations. We now sketch the idea behind the FFT algorithm, as the same idea is used in the quantum Fourier transform algorithm.

Note that

$\hat f(x) = \frac{1}{\sqrt m}\sum_{y\in\mathbb{Z}_m} f(y)\omega^{xy} = \frac{1}{\sqrt m}\sum_{y\in\mathbb{Z}_m,\ y\ \text{even}} f(y)\omega^{2x(y/2)} + \omega^x\frac{1}{\sqrt m}\sum_{y\in\mathbb{Z}_m,\ y\ \text{odd}} f(y)\omega^{2x(y-1)/2}.$

Now since $\omega^2$ is an $m/2$-th root of unity and $\omega^{m/2} = -1$, letting $W$ be the $m/2\times m/2$ diagonal matrix with diagonal entries $\omega^0, \ldots, \omega^{m/2-1}$, we get that

$FT_m(f)_{low} = FT_{m/2}(f_{even}) + W\,FT_{m/2}(f_{odd})$
$FT_m(f)_{high} = FT_{m/2}(f_{even}) - W\,FT_{m/2}(f_{odd})$

where for an $m$-dimensional vector $\vec v$, we denote by $\vec v_{even}$ (resp. $\vec v_{odd}$) the $m/2$-dimensional vector obtained by restricting $\vec v$ to the coordinates whose indices have least significant bit equal to $0$ (resp. $1$), and by $\vec v_{low}$ (resp. $\vec v_{high}$) the restriction of $\vec v$ to coordinates with most significant bit $0$ (resp. $1$).

The equations above are the crux of the divide-and-conquer idea of the FFT algorithm, since they allow us to replace a size-$m$ problem with two size-$m/2$ subproblems, leading to a recursive time bound of the form $T(m) = 2T(m/2) + O(m)$, which solves to $T(m) = O(m\log m)$.

21.3.3 Quantum Fourier Transform over $\mathbb{Z}_m$
The quantum Fourier transform is an algorithm to change the state of a quantum register from $f \in \mathbb{C}^m$ to its Fourier transform $\hat f$.

Theorem 21.5 — Quantum Fourier Transform (Bernstein-Vazirani). For every $m = 2^n$ there is a quantum algorithm that uses $O(n^2)$ elementary quantum operations and transforms a quantum register in state $f = \sum_{x\in\mathbb{Z}_m} f(x)|x\rangle$ into the state $\hat f = \sum_{x\in\mathbb{Z}_m}\hat f(x)|x\rangle$, where

$\hat f(x) = \frac{1}{\sqrt m}\sum_{y\in\mathbb{Z}_m}\omega^{xy} f(y).$

The crux of the algorithm is the FFT equations, which allow the problem of computing $FT_m$, a problem of size $m$, to be split into two identical subproblems of size $m/2$ involving computation of $FT_{m/2}$, which can be carried out recursively using the same elementary operations. (Aside: Not every divide-and-conquer classical algorithm can be implemented as a fast quantum algorithm; we are really using the structure of the problem here.)

We now describe the algorithm and the state, neglecting normalizing factors.
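The FFT equations translate directly into a recursive program. Here is an illustrative plain-Python version (using `cmath`, keeping the same $1/\sqrt m$ normalization, not optimized):

```python
# Recursive FFT following the equations above:
#   FT_m(f)_low/high = FT_{m/2}(f_even) +/- W * FT_{m/2}(f_odd),
# where W multiplies the x-th entry by w^x. The length must be a power of 2.
import cmath, math

def fft(f):
    m = len(f)
    if m == 1:
        return list(f)
    even = fft(f[0::2])               # coordinates with least significant bit 0
    odd = fft(f[1::2])                # coordinates with least significant bit 1
    w = cmath.exp(2j * cmath.pi / m)
    low = [even[x] + w ** x * odd[x] for x in range(m // 2)]
    high = [even[x] - w ** x * odd[x] for x in range(m // 2)]
    return [c / math.sqrt(2) for c in low + high]   # keep 1/sqrt(m) overall
```

Since the normalized transform is unitary, it preserves the squared norm of its input, which gives a quick sanity check on the implementation.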

1. Initial state: $f = \sum_{x\in\mathbb{Z}_m} f(x)|x\rangle$

2. Recursively run $FT_{m/2}$ on the $n-1$ most significant qubits. (state: $(FT_{m/2} f_{even})|0\rangle + (FT_{m/2} f_{odd})|1\rangle$)

3. If the least significant qubit is $1$ then compute $W$ on the $n-1$ most significant qubits (see below). (state: $(FT_{m/2} f_{even})|0\rangle + (W\,FT_{m/2} f_{odd})|1\rangle$)

4. Apply the Hadamard gate $H$ to the least significant qubit. (state: $(FT_{m/2} f_{even})(|0\rangle + |1\rangle) + (W\,FT_{m/2} f_{odd})(|0\rangle - |1\rangle) = (FT_{m/2} f_{even} + W\,FT_{m/2} f_{odd})|0\rangle + (FT_{m/2} f_{even} - W\,FT_{m/2} f_{odd})|1\rangle$)

5. Move the least significant qubit to the most significant position. (state: $|0\rangle(FT_{m/2} f_{even} + W\,FT_{m/2} f_{odd}) + |1\rangle(FT_{m/2} f_{even} - W\,FT_{m/2} f_{odd}) = \hat f$)

The transformation $W$ on $n-1$ qubits can be defined by $|x\rangle \mapsto \omega^x|x\rangle$, where $x = \sum_{i=0}^{n-2} 2^i x_i$ ($x_i$ being the $i$-th qubit of $x$). It can be easily seen to be the result of applying, for every $i \in \{0,\ldots,n-2\}$, the following elementary operation on the $i$-th qubit of the register: $|0\rangle \mapsto |0\rangle$ and $|1\rangle \mapsto \omega^{2^i}|1\rangle$.

The final state is equal to $\hat f$ by the FFT equations (we leave this as an exercise).

21.4 SHOR’S ORDER-FINDING ALGORITHM.

We now present the central step in Shor's factoring algorithm: a quantum polynomial-time algorithm to find the order of an integer $A$ modulo an integer $N$.

Theorem 21.6 — Order finding algorithm, restated. There is a polynomial-time quantum algorithm that on input $A, N$ (represented in binary) finds the smallest $r$ such that $A^r = 1 \pmod N$.

Let $t = \lceil 5\log(A+N)\rceil$. Our register will consist of $t + polylog(N)$ qubits. Note that the function $x \mapsto A^x \pmod N$ can be computed in $polylog(N)$ time, and so we will assume that we can compute the map $|x\rangle|y\rangle \mapsto |x\rangle|y \oplus (A^x \pmod N)\rangle$ (where we identify a number $X \in \{0,\ldots,N-1\}$ with its representation as a binary string of length $\log N$).² Now we describe the order-finding algorithm. It uses a tool of elementary number theory called continued fractions, which allows us to approximate (using a classical algorithm) an arbitrary real number $\alpha$ with a rational number $p/q$ where there is a prescribed upper bound on $q$ (see below). We now describe the algorithm and the state, this time including normalizing factors.

² To compute this map we may need to extend the register by some additional $polylog(N)$ many qubits, but we can ignore them as they will always be equal to zero except in intermediate computations.

1. Apply the Fourier transform to the first $t$ bits. (state: $\frac{1}{\sqrt m}\sum_{x\in\mathbb{Z}_m}|x\rangle|0^n\rangle$, where $m = 2^t$)

2. Compute the transformation $|x\rangle|y\rangle \mapsto |x\rangle|y \oplus (A^x \pmod N)\rangle$. (state: $\frac{1}{\sqrt m}\sum_{x\in\mathbb{Z}_m}|x\rangle|A^x \pmod N\rangle$)

3. Measure the second register to get a value $y_0$. (state: $\frac{1}{\sqrt K}\sum_{\ell=0}^{K-1}|x_0 + \ell r\rangle|y_0\rangle$, where $x_0$ is the smallest number such that $A^{x_0} = y_0 \pmod N$ and $K = \lfloor(m - 1 - x_0)/r\rfloor$.)

4. Apply the Fourier transform to the first register. (state: $\frac{1}{\sqrt m\sqrt K}\left(\sum_{x\in\mathbb{Z}_m}\sum_{\ell=0}^{K-1}\omega^{(x_0+\ell r)x}|x\rangle\right)|y_0\rangle$)

In the analysis, it will suffice to show that this algorithm outputs the order $r$ with probability at least $\Omega(1/\log N)$ (we can always amplify the algorithm's success by running it several times and taking the smallest output).
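The classical post-processing after step 4 can be sketched as follows (hypothetical helper, plain Python, not from the notes): with good probability the measured $x$ satisfies $|x/m - c/r| < 1/(10m)$ with $c$ coprime to $r$, so the rational-approximation procedure of Section 21.5, played here by the standard-library `Fraction.limit_denominator`, recovers $c/r$ in lowest terms, and hence $r$.

```python
# Recover the order r from a "good" measurement x of the first register.
from fractions import Fraction

def recover_order(x, m, A, N):
    # best rational approximation to x/m with denominator at most N
    r = Fraction(x, m).limit_denominator(N).denominator
    return r if pow(A, r, N) == 1 else None

# Toy run: A = 2 has order 4 modulo N = 15; a good measurement with
# m = 2^11 is an x near c*m/r with gcd(c, r) = 1, e.g. x = 3*2048/4:
print(recover_order(1536, 2048, 2, 15))   # -> 4
```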

21.4.1 Analysis: the case that $r|m$
We start by analyzing the algorithm in the case that $m = rc$ for some integer $c$. Though very unrealistic (remember that $m$ is a power of $2$!), this gives the intuition why Fourier transforms are useful for detecting periods.

Claim: In this case the value $x$ measured will be equal to $ac$ for a random $a \in \{0,\ldots,r-1\}$.

The claim concludes the proof since it implies that $x/m = a/r$ where $a$ is a random integer less than $r$. Now for every $r$, at least $\Omega(r/\log r)$ of the numbers in $[r-1]$ are co-prime to $r$. Indeed, the prime number theorem says that there are at least this many primes in this interval, and since $r$ has at most $\log r$ prime factors, all but $\log r$ of these primes are co-prime to $r$. Thus, when the algorithm computes a rational approximation for $x/m$, the denominator it will find will indeed be $r$.

To prove the claim, we compute for every $x \in \mathbb{Z}_m$ the absolute value of $|x\rangle$'s coefficient before the measurement. Up to some normalization factor this is

$\left|\sum_{\ell=0}^{c-1}\omega^{(x_0+\ell r)x}\right| = \left|\omega^{x_0 x}\right|\cdot\left|\sum_{\ell=0}^{c-1}\omega^{r\ell x}\right| = 1\cdot\left|\sum_{\ell=0}^{c-1}\omega^{r\ell x}\right|.$

If $c$ does not divide $x$ then $\omega^{rx}$ is a $c$-th root of unity different from $1$, so $\sum_{\ell=0}^{c-1}\omega^{r\ell x} = 0$ by the formula for sums of geometric progressions. Thus, such a number $x$ would be measured with zero probability. But if $x = cj$ then $\omega^{r\ell x} = \omega^{rcj\ell} = \omega^{mj\ell} = 1$, and hence the amplitudes of all such $x$'s are equal for all $j \in \{0,1,\ldots,r-1\}$.

21.4.2 The general case
In the general case, where $r$ does not necessarily divide $m$, we will not be able to show that the measured value $x$ satisfies $m|xr$. However, we will show that with $\Omega(1/\log r)$ probability, (1) $xr$ will be "almost divisible" by $m$ in the sense that $0 \leq xr \pmod m < r/10$ and (2) $\lfloor xr/m\rfloor$ is coprime to $r$.

Condition (1) implies that $|xr - cm| < r/10$ for $c = \lfloor xr/m\rfloor$. Dividing by $rm$ gives $\left|\frac{x}{m} - \frac{c}{r}\right| < \frac{1}{10m}$. Therefore, $\frac{c}{r}$ is a rational number with denominator at most $N$ that approximates $\frac{x}{m}$ to within $1/(10m) < 1/(4N^4)$. It is not hard to see that such an approximation is unique (again left as an exercise) and hence in this case the algorithm will come up with $c/r$ and output the denominator $r$. Thus all that is left is to prove the next two lemmas. The first shows that there are $\Omega(r/\log r)$ values of $x$ that satisfy the above two conditions, and the second shows that each such $x$ is measured with probability $\Omega((1/\sqrt r)^2) = \Omega(1/r)$.

Lemma 1: There exist $\Omega(r/\log r)$ values $x \in \mathbb{Z}_m$ such that:

1. $0 < xr \pmod m < r/10$

2. $\lfloor xr/m\rfloor$ and $r$ are coprime

Lemma 2: If $x$ satisfies $0 < xr \pmod m < r/10$ then, before the measurement in the final step of the order-finding algorithm, the coefficient of $|x\rangle$ is at least $\Omega(\frac{1}{\sqrt r})$.

Proof of Lemma 1: We prove the lemma for the case that $m$ is coprime to $r$, leaving the general case to the reader. In this case, the map $x \mapsto rx \pmod m$ is a permutation of $\mathbb{Z}^*_m$. There are at least $\Omega(r/\log r)$ numbers in $[1..r/10]$ that are coprime to $r$ (take primes in this range that are not one of $r$'s at most $\log r$ prime factors), and hence $\Omega(r/\log r)$ numbers $x$ such that $rx \pmod m = xr - \lfloor xr/m\rfloor m$ is in $[1..r/10]$ and coprime to $r$. But this means that $\lfloor rx/m\rfloor$ can not have a nontrivial shared factor with $r$, as otherwise this factor would be shared with $rx \pmod m$ as well.

Proof of Lemma 2: Let $x$ be such that $0 < xr \pmod m < r/10$. The absolute value of $|x\rangle$'s coefficient in the state before the measurement is

$\frac{1}{\sqrt K\sqrt m}\left|\sum_{\ell=0}^{K-1}\omega^{\ell r x}\right|$

where $K = \lfloor(m - x_0 - 1)/r\rfloor$. Note that $\frac{m}{2r} < K < \frac{m}{r}$, since $x_0 < N \ll m$. Setting $\beta = \omega^{rx}$ (note that $\beta \neq 1$ since $rx \pmod m \neq 0$) and using the formula for the sum of a geometric series, this is at least

$\frac{\sqrt r}{2m}\left|\frac{1 - \beta^{\lceil m/r\rceil}}{1 - \beta}\right| = \frac{\sqrt r}{2m}\cdot\frac{\sin(\theta\lceil m/r\rceil/2)}{\sin(\theta/2)},$

where $\theta$ is the angle such that $\beta = e^{i\theta}$ (see Figure [quantum:fig:theta] for a proof by picture of the last equality). Under our assumptions $\lceil m/r\rceil\theta < 1/10$, and hence (using the fact that $\sin\alpha \sim \alpha$ for small angles $\alpha$) the coefficient of $x$ is at least $\frac{\sqrt r}{4m}\lceil m/r\rceil \geq \frac{1}{8\sqrt r}$.

This completes the proof of Theorem 21.6.

In many settings, including Shor's algorithm, we are given a real number in the form of a program that can compute its first $t$ bits in $poly(t)$ time. We are interested in finding a close approximation to this real number of the form $a/b$, where there is a prescribed upper bound on $b$. Continued fractions is a tool in number theory that is useful for this.

A continued fraction is a number of the following form:

$a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ldots}}}$

for $a_0$ a non-negative integer and $a_1, a_2, \ldots$ positive integers.

Given a real number $\alpha > 0$, we can find its representation as an infinite fraction as follows: split $\alpha$ into the integer part $\lfloor\alpha\rfloor$ and fractional part $\alpha - \lfloor\alpha\rfloor$, find recursively the representation $R$ of $1/(\alpha - \lfloor\alpha\rfloor)$, and

then write

$\alpha = \lfloor\alpha\rfloor + \frac{1}{R}.$

If we continue this process for $n$ steps, we get a rational number, denoted by $[a_0, a_1, \ldots, a_n]$, which can be represented as $\frac{p_n}{q_n}$ with $p_n, q_n$ coprime. The following facts can be proven using induction:

• $p_0 = a_0$, $q_0 = 1$, and for every $n > 1$, $p_n = a_n p_{n-1} + p_{n-2}$ and $q_n = a_n q_{n-1} + q_{n-2}$.

• $\frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} = \frac{(-1)^{n-1}}{q_n q_{n-1}}$

Furthermore, it is known that

$\left|\frac{p_n}{q_n} - \alpha\right| < \frac{1}{q_n q_{n+1}}, \quad (∗)$

which implies that $\frac{p_n}{q_n}$ is the closest rational number to $\alpha$ with denominator at most $q_n$. It also means that if $\alpha$ is extremely close to a rational number, say $\left|\alpha - \frac{a}{b}\right| < \frac{1}{4b^4}$ for some coprime $a, b$, then we can find $a, b$ by iterating the continued fraction algorithm for $polylog(b)$ steps. Indeed, let $q_n$ be the first denominator such that $q_{n+1} \geq b$. If $q_{n+1} > 2b^2$ then (∗) implies that $\left|\frac{p_n}{q_n} - \alpha\right| < \frac{1}{2b^2}$. But this means that $\frac{p_n}{q_n} = \frac{a}{b}$, since there is at most one rational number of denominator at most $b$ that is so close to $\alpha$. On the other hand, if $q_{n+1} \leq 2b^2$ then, since $\frac{p_{n+1}}{q_{n+1}}$ is closer to $\alpha$ than $\frac{a}{b}$, again $\left|\frac{p_{n+1}}{q_{n+1}} - \alpha\right| < \frac{1}{4b^4}$, meaning that $\frac{p_{n+1}}{q_{n+1}} = \frac{a}{b}$. It's not hard to verify that $q_n \geq 2^{n/2}$, implying that $p_n$ and $q_n$ can be computed in $polylog(q_n)$ time.

21.5.1 Quantum cryptography
There is another way in which quantum mechanics interacts with cryptography. These "spooky actions at a distance" have been suggested by Wiesner and Bennett-Brassard as a way in which parties can create a secret shared key over an insecure channel. On one hand, this concept does not require as much control as general-purpose quantum computing, and so it has in fact been demonstrated physically. On the other hand, unlike transmitting standard digital information, this "insecure channel" cannot be an arbitrary medium such as wifi etc., but rather one needs fiber optics, lasers, etc.
Unlike quantum computers, where we only need one of those to break RSA, to actually use quantum key exchange at scale we need to set up these types of networks, and so it is unclear if this approach will ever dominate the solution of Alice sending to Bob a Brink's truck with the shared secret key. People have proposed some other ways to use the interesting properties of quantum mechanics for cryptographic purposes, including quantum money and quantum software protection.

22 Software Obfuscation

Let us stop and think of the notions we have seen in cryptography. We have seen that under reasonable computational assumptions (such as LWE) we can achieve the following:

• CPA secure private key encryption and message authentication codes (which can be combined to get CCA security or authenticated encryption) - this means that two parties that share a key can have a virtually secure channel between them. An adversary cannot get any additional information beyond her prior knowledge given an encryption of a message sent from Alice to Bob. Moreover, she cannot modify this message by even a single bit. It's lucky we only discovered these results from the 1970's onwards - if the Germans had used such an encryption instead of ENIGMA in World War II there's no telling how many more lives would have been lost.

• Public key encryption and digital signatures that enable Alice and Bob to set up such a virtually secure channel without sharing a prior key. This enables our "information economy" and protects virtually every financial transaction over the web. Moreover, it is the crucial mechanism for supplying "over the air" software updates to smart devices, whether they are phones, cars, thermostats or anything else. Some had predicted that this invention would change the nature of our form of government to crypto anarchy, and while this may be hyperbole, governments everywhere are worried about this invention.

• Hash functions and pseudorandom functions enable us to create authentication tokens for deriving one-time passwords out of shared keys, or deriving long keys from short passwords. They are also useful as a tool in password based key exchange, which enables two parties to communicate securely (with fairly good but not overwhelming probability) when they share a 6 digit PIN, even if the adversary can easily afford much much more than 10^6 computational cycles.


• Fully homomorphic encryption allows computing over encrypted data. Bob could prepare Alice’s taxes without knowing what her income is, and more generally store all her data and perform computations on it, without knowing what the data is.

• Zero knowledge proofs can be used to prove a statement is true without revealing why it is true. In particular, since you can use zero knowledge proofs to prove that you possess X bitcoins without giving any information about their identity, they have been used to obtain fully anonymous electronic currency.

• Multiparty secure computation is a fully general tool that enables Alice and Bob (and Charlie, David, Elana, Fran, …) to perform any computation on their private inputs, whether it is to compute the result of a vote, a second-price auction, or privacy-preserving data mining, to perform a cryptographic operation in a distributed manner (without any party ever learning the secret key), or simply to play poker online without needing to trust any central server.

(BTW all of the above points are notions that you should be familiar with, and you should be able to explain their security guarantees if you ever need to use them - for example, in the unlikely event that you ever find yourself needing to take a cryptography final exam…) While clearly there are issues of efficiency, is there anything more in terms of functionality we could ask for? Given all these riches, can we be even more greedy? It turns out that the answer is yes. Here are some scenarios that are still not covered by the above tools:

22.1 WITNESS ENCRYPTION

Suppose that you have uncovered a conspiracy that involves very powerful people, and you are afraid that something bad might happen to you. You would like an "insurance policy" in the form of writing down everything you know and making sure it is published in the case of your untimely death, but you are afraid these powerful people could find and attack any trusted agent. Ideally you would like to publish an encrypted form of your manuscript far and wide, and make sure the decryption key is automatically revealed if anything happens to you, but how could you do that? A UA-secure encryption (which stands for secure against an Underwood attack) gives an ability to create an encryption 푐 of a message 푚 that is CPA secure, but such that there is an algorithm 퐷 that on input 푐 and any string 푤 which is a (digitally signed) New York Times obituary for Janine Skorsky will output 푚.


The technical term for this notion is witness encryption, by which we mean that for every circuit 퐹 we have an algorithm 퐸 that on input 퐹 and a message 푚 creates a ciphertext 푐 that is CPA secure, and there is an algorithm 퐷 that on input 푐 and some string 푤 outputs 푚 if 퐹(푤) = 1. In other words, instead of the key being a unique string, the key is any string 푤 that satisfies a certain condition. Witness encryption can be used for other applications. For example, you could encrypt a message to future members of humanity, that can be decrypted only using a valid proof of the Riemann Hypothesis.

22.2 DENIABLE ENCRYPTION

Here is another scenario that is seemingly not covered by our current tools. Suppose that Alice uses a public key system (퐺, 퐸, 퐷) to encrypt a message 푚 by computing 푐 = 퐸_푒(푚, 푟) and sending 푐 to Bob, who will compute 푚 = 퐷_푑(푐). The ciphertext is intercepted by Bob's archenemy Freddie Baskerville Ignatius (or FBI for short), who has the means to force Alice to reveal the message and, as proof, reveal the randomness used in encryption as well. Could Alice find, for any choice of 푚′, some string 푟′ that is pseudorandom and such that 퐸_푒(푚′, 푟′) still equals 푐?

An encryption scheme with this property is called deniable, since Alice can deny she sent 푚 and claim she sent 푚′ instead.¹

¹ One could also think of a deniable witness encryption, and so if Janine in the scenario above is forced to open the ciphertexts she sent by revealing the randomness used to create them, she can credibly claim that she didn't encrypt her knowledge of the conspiracy, but merely wanted to make sure that her family secret recipe for pumpkin pie is not lost when she passes away.

22.3 FUNCTIONAL ENCRYPTION

It's not just individuals that don't have all their needs met by our current tools. Think of a large enterprise that uses a public key encryption (퐺, 퐸, 퐷). When a ciphertext 푐 = 퐸_푒(푚) is received by the enterprise's servers, it needs to be decrypted using the secret key 푑. But this creates a single point of failure. It would be much better if we could create a "weakened key" 푑_1 that, for example, can only decrypt messages related to sales that were sent in the date range X-Y, a key 푑_2 that can only decrypt messages that contain certain keywords, or maybe a key 푑_3 that only allows detecting whether the message encoded by a particular ciphertext satisfies a certain regular expression. This would allow us to give the key 푑_1 to the manager of the sales department (and not worry about her taking the key with her if she leaves the company), or more generally give every employee a key that corresponds to his or her role. Furthermore, if the company receives a subpoena for all emails relating to a particular topic, it could give out a cryptographic key that reveals precisely these emails and nothing else. It could also run a spam filter on encrypted messages without needing to give the server performing this filter access to the full contents of the messages (and so perhaps even outsource spam filtering to a different company).

The general form of this is called functional encryption. The idea is that for every function 푓 ∶ {0, 1}^∗ → {0, 1}^∗ we can create a decryption key 푑_푓 such that on input 푐 = 퐸_푒(푚), 퐷_{푑_푓}(푐) = 푓(푚), but 푑_푓 cannot be used to gain any other information on the message except for 푓(푚). This should hold even if several parties holding 푑_{푓_1}, … , 푑_{푓_푘} collude together: they can't learn more than simply 푓_1(푚), … , 푓_푘(푚). Note that using fully homomorphic encryption we can easily transform an encryption of 푚 to an encryption of 푓(푚), but what we want here is the ability to selectively decrypt only some information about the message.

The formal definition of functional encryption is the following:

Definition 22.1 — Functional Encryption. A tuple (퐺, 퐸, 퐷, 퐾푒푦퐷푖푠푡) is a functional encryption scheme if:

• For every function 푓 ∶ {0, 1}^ℓ → {0, 1}, if (푑, 푒) = 퐺(1^푛) and 푑_푓 = 퐾푒푦퐷푖푠푡(푑, 푓), then for every message 푚, 퐷_{푑_푓}(퐸_푒(푚)) = 푓(푚).

• Every efficient adversary Eve wins the following game with probability at most 1/2 + 푛푒푔푙(푛):

1. We generate (푑, 푒) ←_푅 퐺(1^푛).
2. Eve is given 푒 and for 푖 = 1, … , 푇 = 푝표푙푦(푛) repeatedly chooses 푓_푖 and receives 푑_{푓_푖}.
3. Eve chooses two messages 푚_0, 푚_1 such that 푓_푖(푚_0) = 푓_푖(푚_1) for all 푖.
4. For 푏 ←_푅 {0, 1}, Eve receives 푐^∗ = 퐸_푒(푚_푏) and outputs 푏′.
5. Eve wins if 푏′ = 푏.
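To make the syntax of Definition 22.1 concrete, here is a toy Python instantiation that satisfies only the correctness condition (the first bullet): "encryption" is just XOR with the master secret, and the functional key 푑_푓 simply bundles the master secret together with 푓, which is exactly what a real scheme must avoid. All names here are illustrative; this is a sketch of the interfaces, not a secure scheme.

```python
import os

def gen(n):
    """Toy key generation: the master secret serves as both d and e."""
    d = os.urandom(n)
    e = d  # toy: private-key setting; a real FE scheme has a public e
    return d, e

def enc(e, m: bytes):
    """Toy 'encryption': XOR with the key (insecure once any d_f leaks d)."""
    assert len(m) == len(e)
    return bytes(a ^ b for a, b in zip(m, e))

def keydist(d, f):
    """Functional key for f. Here it just bundles (d, f) -- precisely what
    a real FE scheme must avoid, since d_f should reveal only f(m)."""
    return (d, f)

def dec_f(d_f, c):
    d, f = d_f
    m = bytes(a ^ b for a, b in zip(c, d))
    return f(m)

# Correctness check: D_{d_f}(E_e(m)) = f(m)
d, e = gen(16)
m = b"attack at dawn!!"
f = lambda msg: msg[:6]          # this f reveals only the first 6 bytes
c = enc(e, m)
d_f = keydist(d, f)
assert dec_f(d_f, c) == f(m)
```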

22.4 THE SOFTWARE PATCH PROBLEM

It’s not only exotic forms of encryption that we’re missing. Here is an- other application that is not yet solved by the above tools. From time to time software companies discover a vulnerability in their products. For example, they might discover that if fed an input of some partic- ular form (e.g., satisfying a regular expression ) to a server running their software could give an adversary unlimited access푥 to it. In such a case, you might want to release a patch that modifies푅 the software to check if and if so rejects the input. However the fear is that hackers who didn’t know about the vulnerability before could dis- cover it by푅(푥) examining = 1 the patch and then use it to attack the customers who are slow to update their software. Could we come up for a reg- ular expression with a program such that if and only

푅 푃 푃 (푥) = 1 software obfuscation 361

if but examining the code of doesn’t make it any easier to find some satisfying ? 푅(푥) = 1 푃 푥 푅 22.5 SOFTWARE OBFUSCATION

All these applications and more could in principle be solved by a single general tool known as virtual black-box (VBB) secure software obfuscation. In fact, such an obfuscation is a general tool that can also be directly used to yield public key encryption, fully homomorphic encryption, zero knowledge proofs, secure function evaluation, and many more applications. We will now give the definition of VBB secure obfuscation and prove the central result about it, which is unfortunately that secure VBB obfuscators do not exist. We will then talk about the relaxed notion of indistinguishability obfuscators (IO) - this object turns out to be good enough for many of the above applications, and whether it exists is one of the most exciting open questions in cryptography at the moment. We will survey some of the research on this front.

Let's define a compiler to be an efficient (i.e., polynomial time) possibly probabilistic map 풪 that takes a Boolean circuit 퐶 on 푛 bits of input and outputs a Boolean circuit 퐶′ that also takes 푛 input bits and computes the same function; i.e., 퐶(푥) = 퐶′(푥) for every 푥 ∈ {0, 1}^푛. (If 풪 is probabilistic then this should happen for every choice of its coins.) This might seem a strange definition, since it even allows the trivial compiler 풪(퐶) = 퐶. That is OK, since later we will require additional properties such as the following:

Definition 22.2 — VBB secure obfuscation. A compiler 풪 is a virtual black box (VBB) secure obfuscator if it satisfies the following property: for every efficient adversary 퐴 mapping {0, 1}^∗ to {0, 1}, there exists an efficient simulator 푆 such that for every circuit 퐶 the following random variables are computationally indistinguishable:

• 퐴(풪(퐶))

• 푆^퐶(1^{|퐶|}), where by this we mean the output of 푆 when it is given the length of 퐶 and access to the function 푥 ↦ 퐶(푥) as a black box (aka oracle access).

(Note that the distributions above are of a single bit, and so being indistinguishable simply means that the probability of outputting 1 is equal in both cases up to a negligible additive factor.)

22.6 APPLICATIONS OF OBFUSCATION

The writings of Diffie and Hellman, James Ellis, and others who thought of public key encryption show that one of the first approaches they considered was to use obfuscation to transform a private-key encryption scheme into a public key one. That is, given a private key encryption scheme (퐸, 퐷), we can transform it to a public key encryption scheme (퐺, 퐸′, 퐷) by having the key generation algorithm select a private key 푘 ←_푅 {0, 1}^푛 that will serve as the decryption key, and letting the encryption key 푒 be the circuit 풪(퐶), where 풪 is an obfuscator and 퐶 is a circuit mapping 푚 to 퐸_푘(푚). The new encryption algorithm 퐸′ takes 푒 and 푚 and simply outputs 푒(푚).

These days we know other approaches for obtaining public key encryption, but the obfuscation-based approach has significant additional flexibility. To turn this into a fully homomorphic encryption, we simply publish the obfuscation of 푐, 푐′ ↦ 퐷_푘(푐) NAND 퐷_푘(푐′). To turn this into a functional encryption, for every function 푓 we can define 푑_푓 as the obfuscation of 푐 ↦ 푓(퐷_푘(푐)). We can also use obfuscation to get a witness encryption: to encrypt a message 푚 to be opened using any 푤 such that 퐹(푤) = 1, we can obfuscate the function that maps 푤 to 푚 if 퐹(푤) = 1 and outputs an error otherwise. To solve the patch problem, for a given regular expression 푅 we can obfuscate the function that maps 푥 to 푅(푥).
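As a purely illustrative sketch of these constructions, the following Python snippet uses a closure as a stand-in for the obfuscator 풪. A closure of course hides nothing - closing that gap is precisely what a real VBB obfuscator would have to do - but the shapes of the private-to-public and witness encryption transformations come out as described; the keyed-hash "encryption" and the condition 퐹 are made up for illustration.

```python
import hashlib, os

def obfuscate(circuit):
    """Stand-in for a VBB obfuscator: returns the circuit as a closure.
    A Python closure can be inspected, so it hides nothing -- that gap
    is exactly what a real obfuscator would need to fill."""
    return circuit

def E(k, m):
    """Toy private key 'encryption': XOR with a keyed hash pad (one block)."""
    pad = hashlib.sha256(k).digest()[:len(m)]
    return bytes(a ^ b for a, b in zip(m, pad))

k = os.urandom(32)
# Public encryption key: the "obfuscation" of the circuit m -> E_k(m).
e = obfuscate(lambda m: E(k, m))
c = e(b"hello")
assert E(k, c) == b"hello"   # decryption with the private key k (XOR is an involution)

# Witness encryption for a condition F: obfuscate w -> m if F(w) = 1.
def witness_encrypt(F, m):
    return obfuscate(lambda w: m if F(w) == 1 else None)

F = lambda w: 1 if w.endswith(b"42") else 0   # hypothetical condition
D = witness_encrypt(F, b"secret")
assert D(b"w=42") == b"secret" and D(b"nope") is None
```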

22.7 IMPOSSIBILITY OF OBFUSCATION푥 푅(푥)

So far, we’ve learned that in cryptography no concept is too fantastic to be realized. Unfortunately, VBB secure obfuscation is an exception:

Theorem 22.3 — Impossibility of obfuscation. Under the PRG assumption, there does not exist a VBB secure obfuscating compiler.

22.7.1 Proof of impossibility of VBB obfuscation

We will now show the proof of Theorem 22.3. For starters, note that obfuscation is trivial for learnable functions. That is, if 퐹 is a function such that given black-box access to 퐹 we can recover a circuit that computes it, then we can obfuscate it. Given a circuit 퐶, the obfuscator 풪 will simply use it as a black box to learn a circuit 퐶′ that computes the same function and output it. Since 풪 itself only uses black box access to 퐶, it can be trivially simulated perfectly. (Verifying that this is indeed the case is a good way to make sure you followed the definition.) However, this is not so useful, since it's not hard to see that all the examples above where we wanted to use obfuscation involved functions that were unlearnable. But it already suggests that we should use an unlearnable function for our negative result. Here is an extremely simple unlearnable function. For every 훼, 훽 ∈ {0, 1}^푛, we define 퐹_{훼,훽} ∶ {0, 1}^푛 → {0, 1}^푛 to be the function that on input 푥 outputs 훽 if 푥 = 훼 and otherwise outputs 0^푛.

Given black box access to this function for random 훼, 훽, it's extremely unlikely that we would hit 훼 with a polynomial number of queries, and hence we will not be able to recover 훽, and so in particular will not be able to learn a circuit that computes 퐹_{훼,훽}.²

² Pseudorandom functions can be used to construct examples of functions that are unlearnable in the much stronger sense that we cannot achieve the machine learning goal of outputting some circuit that approximately predicts the function.

This function already yields a counterexample for a stronger version of the VBB definition. We define a strong VBB obfuscator to be a compiler 풪 that satisfies the above definition for adversaries that can output not just one bit but an arbitrarily long string. We can now prove the following:

Lemma 22.4 There does not exist a strong VBB obfuscator.

Proof. Suppose towards a contradiction that there exists a strong VBB obfuscator 풪. Let 퐹_{훼,훽} be defined as above, and let 퐴 be the adversary that on input a circuit 퐶′ simply outputs 퐶′. We claim that for every efficient simulator 푆 there exist some 훼, 훽 and an efficient algorithm 퐷_{훼,훽} such that

|Pr[퐷_{훼,훽}(퐴(풪(퐹_{훼,훽}))) = 1] − Pr[퐷_{훼,훽}(푆^{퐹_{훼,훽}}(1^{10푛})) = 1]| > 0.9   (∗)

where these probabilities are over the coins of 풪 and the simulator 푆. Note that we identify the function 퐹_{훼,훽} with the obvious circuit of size at most 10푛 that computes it.

Clearly (∗) implies that these two distributions are not indistinguishable, and so proving it will finish the proof. The algorithm 퐷_{훼,훽} on input a circuit 퐶′ will simply output 1 iff 퐶′(훼) = 훽. By the definition of a compiler and the algorithm 퐴, for every 훼, 훽, Pr[퐷_{훼,훽}(퐴(풪(퐹_{훼,훽}))) = 1] = 1. On the other hand, for 퐷_{훼,훽} to output 1 on 퐶′ = 푆^{퐹_{훼,훽}}(1^{10푛}), it must be the case that 퐶′(훼) = 훽. We claim that there exist some 훼, 훽 for which this happens with negligible probability. Indeed, assume 푆 makes 푇 = 푝표푙푦(푛) queries and pick 훼, 훽 independently and uniformly at random from {0, 1}^푛. For every 푖 = 1, … , 푇, let 퐸_푖 be the event that the 푖-th query of 푆 is the first in which it gets a response other than 0^푛. The probability of 퐸_푖 is at most 2^{−푛}, because as long as 푆 got all responses to be 0^푛, it got no information about 훼, and so the choice of 푆's 푖-th query is independent of 훼, which is chosen at random in {0, 1}^푛. By a union bound, the probability that 푆 got any response other than 0^푛 is negligible. In which case, if we let 퐶′ be the output of 푆 and let 훽′ = 퐶′(훼), then 훽′ is independent of 훽, and so the probability that they are equal is at most 2^{−푛}. ■
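The counting argument at the heart of this proof is easy to check empirically. In the following sketch (with a deliberately small 푛, and a hypothetical "simulator" that just makes random queries), the fraction of runs in which the simulator ever sees a nonzero answer - and hence has any chance of learning 훽 - stays close to the 푇/2^푛 bound from the proof:

```python
import random

n = 16  # bits; small enough to experiment with, large enough that hits are rare

def make_F(alpha, beta):
    """F_{alpha,beta}: outputs beta on input alpha, and 0 (standing in for 0^n) otherwise."""
    return lambda x: beta if x == alpha else 0

T = 1000        # number of oracle queries the "simulator" makes
trials = 200
hits = 0        # trials in which the simulator ever saw beta
rng = random.Random(1)
for _ in range(trials):
    alpha = rng.randrange(2**n)
    beta = rng.randrange(1, 2**n)   # nonzero, so a hit is unambiguous
    F = make_F(alpha, beta)
    if any(F(rng.randrange(2**n)) != 0 for _ in range(T)):
        hits += 1
# Expected fraction of trials that ever see beta: about T / 2^n, i.e. ~1.5%
assert hits < trials // 5
```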

The adversary in the proof of Lemma 22.4 does not seem very impressive. After all, it merely printed out its input. Indeed, the definition of strong VBB security might simply be overkill, and "plain" VBB is enough for almost all applications. However, as mentioned above, plain VBB is impossible to achieve as well. We'll prove a slightly weaker version of Theorem 22.3:

Theorem 22.5 — Impossibility of obfuscation from FHE. If fully homomorphic encryption exists then there is no VBB secure obfuscating compiler.

(To get the original theorem from this, note that if VBB obfuscation exists then we can transform any private key encryption into a fully homomorphic public key encryption.)

Proof. Let (퐺, 퐸, 퐷, 퐸푉퐴퐿) be a fully homomorphic encryption scheme. For strings 푑, 푒, 푐, 훼, 훽, 훾, we will define the function 퐹_{푑,푒,푐,훼,훽,훾} as follows: for inputs of the form 00푥, it will output 훽 if and only if 푥 = 훼, and otherwise output 0^푛. For inputs of the form 01푐′, it will output 훾 iff 퐷_푑(푐′) = 훽, and otherwise output 0^푛. For the input 1^푛, it will output 푐. For all other inputs it will output 0^푛.

We will use this function family where 푑, 푒 are the keys of the FHE and 푐 = 퐸_푒(훼). We now define our adversary 퐴. On input some circuit 퐶′, 퐴 will compute 푐 = 퐶′(1^푛) and let 퐶″ be the circuit that on input 푥 outputs 퐶′(00푥). It will then let 푐″ = 퐸푉퐴퐿(퐶″, 푐). Note that if 푐 is an encryption of 훼 and 퐶′ computes 퐹 = 퐹_{푑,푒,푐,훼,훽,훾}, then 푐″ will be an encryption of 퐹(00훼) = 훽. The adversary 퐴 will then compute 훾′ = 퐶′(01푐″) and output 훾′.

We claim that for every simulator 푆, there exist some tuple (푑, 푒, 푐, 훼, 훽, 훾) and a distinguisher 퐷 such that

|Pr[퐷(퐴(풪(퐹_{푑,푒,푐,훼,훽,훾}))) = 1] − Pr[퐷(푆^{퐹_{푑,푒,푐,훼,훽,훾}}(1^{|퐹_{푑,푒,푐,훼,훽,훾}|})) = 1]| ≥ 0.1

Indeed, the distinguisher 퐷 will depend on 훾, and on input a bit 푏 will simply output 1 iff 푏 = 훾. Clearly, if (푑, 푒) are keys of the FHE and 푐 = 퐸_푒(훼), then no matter what circuit 퐶′ the obfuscator 풪 outputs on input 퐹_{푑,푒,푐,훼,훽,훾}, the adversary 퐴 will output 훾 on 퐶′, and hence 퐷 will output 1 with probability one on 퐴's output.

In contrast, if we let 푆 be a simulator and generate (푑, 푒) = 퐺(1^푛), pick 훼, 훽 independently at random in {0, 1}^푛 and 훾 at random in {0, 1}, and let 푐 = 퐸_푒(훼), we claim that the probability that 푆 will output 훾 will be equal to 1/2 ± 푛푒푔푙(푛). Indeed, suppose otherwise, and define the event 퐸_푖 to be that the 푖-th query is the first query (apart from the query 1^푛, whose answer is 푐) on which 푆 receives an answer other than 0^푛. Now there are two cases:

Case 1: The query is equal to 00훼.

Case 2: The query is equal to 01푐′ for some 푐′ such that 퐷_푑(푐′) = 훽.

Case 2 only happens with negligible probability, because if 푆 only received the value 푐 (which is independent of 훽) and did not receive any other non-0^푛 response up to the 푖-th point, then it did not learn any information about 훽. Therefore the value 훽 is independent of the 푖-th query, and the probability that this query decrypts to 훽 is at most 2^{−푛}.

Case 1 only happens with negligible probability, because otherwise 푆 is an algorithm that on input an encryption of 훼 (and a bunch of answers of the form 0^푛, which are of course not helpful) manages to output 훼 with non-negligible probability, hence violating the CPA security of the encryption scheme.

Now if neither case happens, then 푆 does not receive any information about 훾, and hence the probability that its output is 훾 is at most 1/2. ■

This proof is simple but deserves a second read. A crucial point here is to use FHE to allow the adversary to essentially "feed 퐶′ to itself", so that it can obtain from an encryption of 훼 an encryption of 훽, even though that would not be possible using black box access only.

22.8 INDISTINGUISHABILITY OBFUSCATION

The proof can be generalized to give private key encryption for which the transformation to public key encryption would be insecure, and many other such constructions. So, this result might (and indeed to a large extent did) seem like a death blow to general-purpose obfus- cation. However, already in that paper we noticed that there was a variant of obfuscation that we could not rule out, and this is the fol- lowing:

Definition 22.6 — Indistinguishability Obfuscation. We say a compiler 풪 is an indistinguishability obfuscator (IO) if for every two circuits 퐶, 퐶′ that have the same size and compute the same function, the random variables 풪(퐶) and 풪(퐶′) are computationally indistinguishable.

It is a good exercise to understand why the proof of the impossibility result above does not apply to rule out IO. Nevertheless, a reasonable guess would be that:

1. IO is impossible to achieve.

2. Even if it was possible to achieve, it is not good enough for most of the interesting applications of obfuscation.

However, it turns out that this guess is (most likely) wrong. New results have shown that IO is extremely useful for many applications, including those outlined above. They also gave some evidence that it might be possible to achieve. We'll talk about those works in the next lecture.

23 More obfuscation, exotic encryptions

Fully homomorphic encryption is an extremely powerful notion, but it does not allow us to obtain fine control over the access to information. With the public key you can do all sorts of computation on the encrypted data, but you still do not learn it, while with the private key you learn everything. But in many situations we want fine grained access control: some people should get access to some of the information for some of the time. This makes the "all or nothing" nature of traditional encryptions problematic. While one could still implement such access control by interacting with the holder(s) of the secret key, this is not always possible.

The most general notion of an encryption scheme allowing fine control is known as functional encryption, as was described in the previous lecture. This can be viewed as an object dual to fully homomorphic encryption, and incomparable to it. For every function 푓, we can construct an 푓-restricted decryption key 푑_푓 that allows recovery of 푓(푚) from an encryption of 푚 but not anything else.

In this lecture we will focus on a weaker notion known as identity based encryption (IBE). Unlike the case of full fledged functional encryption, there are fairly efficient constructions known for IBE.

23.1 SLOWER, WEAKER, LESS SECURER

In a sense, functional encryption or IBE is all about selective leaking of information. That is, in some sense we want to modify an encryption scheme so that it actually is "less secure" in some very precise sense, so that it would be possible to learn something about the plaintext even without knowing the (full) decryption key.

There is actually a history of cryptographic techniques meant to support such operations. Perhaps the "mother" of all such "quasi encryption" schemes is the modular exponentiation operation 푥 ↦ 푔^푥 for some discrete group 픾. The map 푥 ↦ 푔^푥 is not exactly an encryption of 푥 - for one thing, we don't know how to decrypt it. Also, as a deterministic map, it cannot be semantically secure. Nevertheless, if 푥


is random, or even of high entropy, in groups such as a cyclic subgroup of a multiplicative group modulo some prime, we don't know how to recover 푥 from 푔^푥. However, given 푔^{푥_1}, … , 푔^{푥_푘} and 푎_1, … , 푎_푘, we can find out if ∑ 푎_푖 푥_푖 = 0, and this can be quite useful in many applications.

More generally, even in the private key setting, people have studied encryption schemes such as

• Deterministic encryption: an encryption scheme that maps 푥 to 퐸(푥) in a deterministic way. This cannot be semantically secure in general, but can be good enough if the message 푥 has high enough entropy or doesn't repeat, and it allows checking whether two encryptions encrypt the same object. (We can also do this by publishing a hash of 푥 under some secret salt.)

• Order preserving encryption: an encryption scheme mapping numbers in some range {1, … , 푁} to ciphertexts so that given 퐸(푥) and 퐸(푦) one can efficiently compare whether 푥 < 푦. This is quite problematic for security. For example, given 푝표푙푦(푡) random such encryptions you can more or less know where they lie in the interval up to a (1 ± 1/푡) multiplicative factor.

• Searchable encryption: a generalization of deterministic encryption that allows some more sophisticated searches (such as not only exact match).
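The "hash of 푥 under some secret salt" idea from the first bullet can be sketched with a keyed hash (HMAC). The tags below are deterministic, so a server holding only tags can test equality of repeated values, but for high-entropy 푥 learns nothing else; the record values are of course illustrative:

```python
import hmac, hashlib, os

salt = os.urandom(32)  # secret salt held by the data owner

def tag(x: bytes) -> bytes:
    """Deterministic tag for x: supports equality tests, nothing more
    (for high-entropy x and a secret salt)."""
    return hmac.new(salt, x, hashlib.sha256).digest()

records = [b"alice", b"bob", b"alice"]
tags = [tag(r) for r in records]
# The server can see that records 0 and 2 match without learning them.
assert tags[0] == tags[2] and tags[0] != tags[1]
```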

Some of these constructions can be quite efficient. In particular the system CryptDB, developed by Popa et al, uses these kinds of encryptions to automatically turn a SQL database into one that works on encrypted data and still supports the required queries. However, the issue of how dangerous the "leakage" can be is somewhat subtle. See this paper and blog post claiming weaknesses in practical use cases for CryptDB, as well as this response by the CryptDB authors.

While the constructions of IBE and functional encryption often use maps such as 푥 ↦ 푔^푥 as subroutines, they offer a stronger control over the leakage in the sense that, in the absence of publishing a (restricted) decryption key, we always get at least CPA security.

23.2 HOW TO GET IBE FROM PAIRING BASED ASSUMPTIONS.

23.2 HOW TO GET IBE FROM PAIRING BASED ASSUMPTIONS.

The standard exponentiation mapping 푥 ↦ 푔^푥 allows us to compute linear functions in the exponent. That is, given any linear map 퐿 of the form 퐿(푥_1, … , 푥_푘) = ∑ 푎_푖 푥_푖, we can efficiently compute the map 푔^{푥_1}, … , 푔^{푥_푘} ↦ 푔^{퐿(푥_1,…,푥_푘)}. But can we do more? In particular, can we compute quadratic functions? This is an issue, as even computing the map 푔^푥, 푔^푦 ↦ 푔^{푥푦} is exactly the Diffie Hellman problem that is considered hard in many of the groups we are interested in.
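The first claim - that linear functions can be computed in the exponent - is easy to verify concretely: given only the values 푔^{푥_푖} mod 푝, anyone can compute 푔^{∑ 푎_푖 푥_푖} as ∏ (푔^{푥_푖})^{푎_푖}, and in particular test a linear relation without knowing the 푥_푖. A sketch with parameters chosen for illustration, not security:

```python
import random

p = 2**127 - 1          # a Mersenne prime (fine for a demo, not for security)
g = 3
q = p - 1               # exponents are taken modulo p - 1 (Fermat)

rng = random.Random(0)
x = [rng.randrange(q) for _ in range(3)]
gx = [pow(g, xi, p) for xi in x]       # all we are given: the g^{x_i}

a = [2, 5, -1]
# g^{sum a_i x_i} computed WITHOUT knowing the x_i:
lhs = 1
for gxi, ai in zip(gx, a):
    lhs = lhs * pow(gxi, ai % q, p) % p
# Check against the direct computation (which does use the x_i):
L = sum(ai * xi for ai, xi in zip(a, x)) % q
assert lhs == pow(g, L, p)
# In particular, sum a_i x_i = 0 (mod q) if and only if the product equals 1.
```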

Pairing based cryptography begins with the observation that in some elliptic curve groups we can use a map based on the so called Weil or Tate pairings. The idea is that we have an efficiently computable isomorphism from a group 픾_1 to a group 픾_2 mapping 푔 to ̂푔, such that we can efficiently map the elements 푔^푥 and 푔^푦 to the element 휑(푔^푥, 푔^푦) = ̂푔^{푥푦}. This in particular means that given 푔^{푥_1}, … , 푔^{푥_푘} we can compute ̂푔^{푄(푥_1,…,푥_푘)} for every quadratic 푄. Note that we cannot repeat this to compute, say, degree 4 functions in the exponent, since we don't know how to invert the map 휑.

The Pairing Diffie Hellman Assumption is that we can find two such groups 픾_1, 픾_2 and a generator 푔 for 픾_1 such that there is no efficient algorithm 퐴 that on input 푔^푎, 푔^푏, 푔^푐 (for random 푎, 푏, 푐 ∈ {0, … , |픾_1| − 1}) computes ̂푔^{푎푏푐}. That is, while we can compute a quadratic in the exponent, we can't compute a cubic.

We now show an IBE construction due to Boneh and Franklin¹ showing how we can obtain from the pairing Diffie Hellman assumption an identity based encryption:

¹ The construction we show was first published in the CRYPTO 2001 conference. The Weil and Tate pairings were used before for cryptographic attacks, but were used for a positive cryptographic result by Antoine Joux in his 2000 paper getting a three-party Diffie Hellman protocol, and then Boneh and Franklin used this to obtain an identity based encryption scheme, answering an open question of Shamir. At approximately the same time as these papers, Sakai, Ohgishi and Kasahara presented a paper in the SCIS 2000 conference in Japan showing an identity-based key exchange protocol from pairings. Also Clifford Cocks (who as we mentioned above invented in the 1970's the RSA scheme at GCHQ before R, S, and A did) came up in 2001 with a different identity-based encryption scheme using the quadratic residuosity assumption.

• Master key generation: We generate 픾_1, 픾_2, 푔, 휑 as above and choose 푎 at random in {0, … , |픾_1| − 1}. The master private key is 푎 and the master public key is 픾_1, 픾_2, 푔, ℎ = 푔^푎. We let 퐻 ∶ {0, 1}^∗ → 픾_1 and 퐻′ ∶ 픾_2 → {0, 1}^ℓ be two hash functions modeled as random oracles.

• Key distribution: Given an arbitrary string 푖푑 ∈ {0, 1}^∗, we generate the decryption key corresponding to 푖푑 as 푑_{푖푑} = 퐻(푖푑)^푎.

• Encryption: To encrypt a message 푚 ∈ {0, 1}^ℓ given the public parameters and some id 푖푑, we choose 푐 ∈ {0, … , |픾_1| − 1} at random and output 푔^푐, 퐻′(푖푑‖휑(ℎ, 퐻(푖푑))^푐) ⊕ 푚.

• Decryption: Given the secret key 푑_{푖푑} and a ciphertext ℎ′, 푦, we output 퐻′(푖푑‖휑(푑_{푖푑}, ℎ′)) ⊕ 푦.

Correctness: We claim that 퐷_{푑_{푖푑}}(퐸_{푖푑}(푚)) = 푚. Indeed, write ℎ_{푖푑} = 퐻(푖푑) and let 푏 = log_푔 ℎ_{푖푑}. Then an encryption of 푚 has the form ℎ′ = 푔^푐, 퐻′(푖푑‖휑(푔^푎, ℎ_{푖푑})^푐) ⊕ 푚, and so the second term is equal to 퐻′(푖푑‖ ̂푔^{푎푏푐}) ⊕ 푚. However, since 푑_{푖푑} = ℎ_{푖푑}^푎 = 푔^{푎푏}, we get that 휑(푑_{푖푑}, ℎ′) = ̂푔^{푎푏푐}, and hence decryption will recover the message. QED

Security: To prove security we need to first present a definition of IBE security. The definition allows the adversary to request keys corresponding to arbitrary identities, as long as it does not ask for keys corresponding to the target identity it wants to attack. There are several variants, including CCA type security definitions, but we stick to a simple one here:

Definition: An IBE scheme is said to be CPA secure if every efficient adversary Eve wins the following game with probability at most $1/2 + negl(n)$:

• The keys are generated and Eve gets the master public key.

• For $i = 1, \ldots, T = poly(n)$, Eve chooses an identity $id_i \in \{0,1\}^*$ and gets the key $d_{id_i}$.

• Eve chooses an identity $id^* \notin \{id_1, \ldots, id_T\}$ and two messages $m_0, m_1$.

• We choose $b \leftarrow_R \{0,1\}$ and Eve gets the encryption of $m_b$ with respect to the identity $id^*$.

• Eve outputs $b'$ and wins if $b' = b$.

Theorem: If the pairing Diffie-Hellman assumption holds and $H, H'$ are random oracles, then the scheme above is CPA secure.

Proof: Suppose for the sake of contradiction that there exists some $T = poly(n)$ time adversary $A$ that succeeds in the IBE-CPA game with probability at least $1/2 + \epsilon$ for some non-negligible $\epsilon$. We assume without loss of generality that whenever $A$ makes a query to the key distribution function with identity $id$, or a query to $H'$ with prefix $id$, it had already previously made the query $id$ to $H$. ($A$ can easily be modified to have this behavior.)

We will build an algorithm $B$ that on input $\mathbb{G}, \hat{\mathbb{G}}, g, g^a, g^b, g^c$ will output $\hat{g}^{abc}$ with probability $poly(\epsilon, 1/T)$.

The algorithm $B$ will guess $i_0, j_0 \leftarrow_R \{1, \ldots, T\}$ and simulate $A$ "in its belly", giving it the public key $g^a$, and act as follows:

• When $A$ makes a query to $H$ with $id$, then for all but the $i_0$-th query, $B$ will choose a random $e_{id} \in \{0, \ldots, |\mathbb{G}|-1\}$ (as usual we'll assume $|\mathbb{G}|$ is prime) and define $H(id) = g^{e_{id}}$. Let $id_0$ be the $i_0$-th query made to the oracle. We define $H(id_0) = g^b$ (where $g^b$ is the input to $B$; recall that $B$ does not know $b$).

• When $A$ makes a query to the key distribution oracle with $id$, then if $id \neq id_0$, $B$ will respond with $d_{id} = (g^a)^{e_{id}}$. If $id = id_0$ then $B$ aborts and fails.

• When $A$ makes a query to the $H'$ oracle with input $id\|\hat{h}$, then for all but the $j_0$-th query $B$ answers with a random string in $\{0,1\}^\ell$. In the $j_0$-th query, if $id \neq id_0$ then $B$ stops and fails. Otherwise, it outputs $\hat{h}$.

• $B$ stops the simulation and fails if we get to the challenge part.
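Stepping away from the reduction for a moment, the correctness equation $\varphi(d_{id}, g^c) = \hat{g}^{abc}$ can be checked mechanically in a toy model where every group element is represented directly by its discrete logarithm. The sketch below is completely insecure (it gives away every exponent) and purely illustrative; the prime $q$, the hash constructions, and all names are made up:

```python
import hashlib
import secrets

# Toy model: a source-group element g^x is represented by the exponent x
# mod q, a target-group element ghat^y by y mod q, and the pairing is
# phi(g^x, g^y) = ghat^(x*y).  Insecure by construction -- it only checks
# the exponent arithmetic in the correctness claim.
q = (1 << 61) - 1  # a prime "group order", chosen arbitrarily

def pairing(x, y):            # phi(g^x, g^y) = ghat^(x*y)
    return (x * y) % q

def H(identity: bytes):       # random oracle H: identities -> group elements
    return int.from_bytes(hashlib.sha256(b"H|" + identity).digest(), "big") % q

def H_prime(identity: bytes, t: int):  # random oracle H': -> one-time pad
    return int.from_bytes(
        hashlib.sha256(b"H'|" + identity + t.to_bytes(16, "big")).digest()[:8],
        "big")

a = secrets.randbelow(q)      # master secret key; master public key is "g^a"

def keygen(identity: bytes):  # d_id = H(id)^a, i.e. exponent a * b_id
    return (a * H(identity)) % q

def encrypt(identity: bytes, m: int):
    c = secrets.randbelow(q)
    mask = pairing(a, H(identity)) * c % q   # ghat^(a * b_id * c)
    return c, H_prime(identity, mask) ^ m    # ("g^c", pad xor message)

def decrypt(identity: bytes, d_id: int, ciphertext):
    c, y = ciphertext
    # phi(d_id, g^c) recovers the same ghat^(a * b_id * c)
    return H_prime(identity, pairing(d_id, c)) ^ y
```

For example, `decrypt(b"alice", keygen(b"alice"), encrypt(b"alice", 12345))` recovers `12345`, while decrypting with another identity's key yields garbage, matching the correctness calculation above.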

It might seem weird that we stop the simulation before we reach the challenge part, but the correctness of this reduction follows from the following claim:

Claim: In the actual attack game, with probability at least $\epsilon/10$, $A$ will make the query $id^*\|\hat{g}^{abc}$ to the $H'$ oracle, where $H(id^*) = g^b$ and the public key is $g^a$.

Proof: If $A$ does not make this query then the message $m_b$ in the challenge is XOR'ed by a completely random string, and $A$ cannot distinguish between $m_0$ and $m_1$ in this case with probability better than $1/2$. QED

Given this claim, to prove the theorem we just need to observe that, assuming it does not fail, $B$ provides answers to $A$ that are identically distributed to the answers $A$ receives in an actual execution of the CPA game. Hence with probability at least $\epsilon/(10T^2)$, $B$ will guess the query $i_0$ at which $A$ queries $H(id^*)$ and set the answer to be $g^b$, and then guess the query $j_0$ at which $A$ queries $id^*\|\hat{g}^{abc}$, in which case $B$'s output will be correct. QED

23.3 BEYOND PAIRING BASED CRYPTOGRAPHY

Boneh and Silverberg asked the question of whether we could go beyond quadratic polynomials and get schemes that allow us to compute higher degrees. The idea is to get a multilinear map, which would be a set of isomorphic groups $\mathbb{G}_1, \ldots, \mathbb{G}_d$ with generators $g_1, \ldots, g_d$ such that we can map $g_i^a$ and $g_j^b$ to $g_{i+j}^{ab}$. This way we would be able to compute any degree $d$ polynomial in $x_1, \ldots, x_k$ in the exponent given $g_1^{x_1}, \ldots, g_1^{x_k}$.

We will now show how using such a multilinear map we can get a construction for a witness encryption scheme. We will only show the construction, without talking about the security definition, the assumption, or security reductions.

Given some circuit $C : \{0,1\}^n \rightarrow \{0,1\}$ and some message $x$, we want to "encrypt" $x$ in a way that given $w$ such that $C(w) = 1$ it would be possible to decrypt $x$, and otherwise it should be hard. It should be noted that the encrypting party itself does not know any such $w$, and indeed (as in the case of the proof of the Riemann hypothesis) might not even know if such a $w$ exists.

The idea is the following. We use the fact that the Exact Cover problem is NP complete to map $C$ into a collection of subsets $S_1, \ldots, S_m$ of the universe $U$ (where $m, |U| = poly(|C|, n)$) such that there exists $w$ with $C(w) = 1$ if and only if there exist sets $S_{i_1}, \ldots, S_{i_d}$ that are a partition of $U$ (i.e., every element in $U$ is covered by exactly one of these sets), and moreover there is an efficient way to map $w$ to such a partition and vice versa. Now, to encrypt the message $x$ we take a degree $d$ instance of multilinear maps $(\mathbb{G}_1, \ldots, \mathbb{G}_d, g_1, \ldots, g_d)$ (with all groups of size $p$) and choose random $a_1, \ldots, a_m \leftarrow_R \{0, \ldots, p-1\}$. We then output the ciphertext $g_1^{\prod_{j \in S_1} a_j}, \ldots, g_1^{\prod_{j \in S_m} a_j}, H(g_d^{\prod_{j \in U} a_j}) \oplus x$.

Now, given a partition $S_{i_1}, \ldots, S_{i_d}$ of the universe $U$, we can use the multilinear operations to compute $g_d^{\prod_{j \in U} a_j}$ and recover the message. Intuitively, since the numbers $a_j$ are random, that would be the only way to come up with this value, but showing that requires formulating precise security definitions for both multilinear maps and witness encryption, and of course a proof.

The first candidate construction for a multilinear map was given by Garg, Gentry and Halevi. It is based on computational questions on lattices and so (perhaps not surprisingly) it involves significant complications due to noise. At a very high level, the idea is to use a fully homomorphic encryption scheme that can evaluate polynomials up to some degree $d$, but release a "hobbled decryption key" that contains just enough information to provide what's known as a zero test: check if an encryption is equal to zero. Because of the homomorphic properties, that means that we can check, given encryptions of $x_1, \ldots, x_n$ and some degree $d$ polynomial $P$, whether $P(x_1, \ldots, x_n) = 0$. Moreover, the notion of security this and similar constructions satisfy is rather subtle and indeed not fully understood. Constructions of indistinguishability obfuscators are built based on this idea, but are significantly more involved than the construction of a witness encryption. One central tool they use is the observation that FHE reduces the task of obfuscation to essentially obfuscating a decryption circuit, which can often be rather shallow. But beyond that there is significant work to be done to actually carry out the obfuscation.

24 Anonymous communication

Encryption is meant to protect the contents of communication, but sometimes the bigger secret is that the communication existed in the first place. If a whistleblower wants to leak some information to the New York Times, the mere fact that she sent an email would reveal her identity. There are two main concepts aimed at achieving anonymity:

• Anonymous routing is about ensuring that Alice and Bob can communicate without that fact being revealed.

• Steganography is about having Alice and Bob hide an encrypted communication in the context of a seemingly innocuous conversation.

24.1 STEGANOGRAPHY

The goal in steganographic communication is to hide cryptographic (or non-cryptographic) content without being detected. The idea is simple: let's start with the symmetric case and assume Alice and Bob share a key $k$ and Alice wants to transmit a bit $b$ to Bob. We assume that Alice has a choice of $t$ words $w_1, \ldots, w_t$ that would be reasonable for her to send at this point in the conversation. Alice will choose a word $w_i$ such that $f_k(w_i) = b$, where $\{f_k\}$ is a pseudorandom function collection. With probability $1 - 2^{-t}$ there will be such a word. Bob will decode the message using $f_k(w_i)$. Alice and Bob can use an error correcting code to compensate for the $2^{-t}$ probability that Alice is forced to send the wrong bit.

In the public key setting, suppose that Bob publishes a public key $e$ for an encryption scheme that has pseudorandom ciphertexts. That is, to a party that does not know the key, an encryption is indistinguishable from a random string. To send some message $m$ to Bob, Alice computes $c = E_e(m)$ and transmits it to Bob one bit at a time. Given the words $w_1, \ldots, w_t$, to transmit the bit $c_j$ Alice chooses a word $w_i$ such that $H(w_i) = c_j$, where $H : \{0,1\}^* \rightarrow \{0,1\}$ is a hash function modeled as a random oracle. The distribution of words $w_1, \ldots, w_\ell$ output by Alice is uniform conditioned on $(H(w_1), \ldots, H(w_\ell)) = c$. But note that if $H$ is a random oracle, then $H(w_1), \ldots, H(w_\ell)$ is going to be uniform, and hence indistinguishable from $c$.

24.2 ANONYMOUS ROUTING

• Low latency communication: Aqua, Crowds, LAP, ShadowWalker, Tarzan, Tor

• Message at a time, protection against timing / traffic analysis: Mix-nets, e-voting, Dining Cryptographers network (DC-net), Dissent, Herbivore, Riposte
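To give a feel for one of the designs listed above, here is a minimal sketch of a single dining-cryptographers (DC-net) broadcast round in the honest-but-curious setting: every pair of parties shares a one-time pad, each party announces the XOR of its pads (the designated sender also XORs in its message), and the XOR of all announcements reveals the message while each announcement alone is uniformly random. The party count, message length, and names below are made up for illustration; real systems such as Dissent add accountability and collision handling:

```python
import secrets

N_PARTIES = 5
MSG_LEN = 8  # bytes per round

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def dc_net_round(sender: int, message: bytes) -> bytes:
    # pairwise shared one-time pads: pads[i][j] == pads[j][i]
    pads = [[None] * N_PARTIES for _ in range(N_PARTIES)]
    for i in range(N_PARTIES):
        for j in range(i + 1, N_PARTIES):
            pads[i][j] = pads[j][i] = secrets.token_bytes(MSG_LEN)

    result = bytes(MSG_LEN)
    for i in range(N_PARTIES):
        # party i's announcement: XOR of all its shared pads...
        announcement = bytes(MSG_LEN)
        for j in range(N_PARTIES):
            if j != i:
                announcement = xor(announcement, pads[i][j])
        # ...plus the message, if party i is the broadcaster
        if i == sender:
            announcement = xor(announcement, message)
        result = xor(result, announcement)  # each pad cancels in pairs
    return result

print(dc_net_round(2, b"meet@9pm"))  # -> b'meet@9pm'
```

Since every pad enters the global XOR exactly twice, all pads cancel and only the message survives; no observer of the announcements can tell which party was the sender.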

24.3 TOR

Basic architecture. Attacks

24.4 TELEX

24.5 RIPOSTE

V CONCLUSIONS

25 Ethical, moral, and policy dimensions to cryptography

This will not be a lecture but rather a discussion on some of the questions that arise from cryptography. I would like you to read some of the sources below (and maybe others) and reflect on the following questions:

The discussion is often framed as weighing privacy against security, but I encourage you to look critically at both issues. It is often instructive to try to compare the current situation with both the historical past as well as some ideal desired world. It is also worthwhile to consider cryptography in broader contexts. Some people in both the pro-regulation and anti-regulation camps exaggerate the role of cryptography. On one hand, cryptography is likely not to bring about the "crypto anarchy" regime hoped for in the crypto anarchist manifesto. For example, more than the growth of bitcoin, we are seeing a turn away from cash toward credit cards and other forms of much more traceable and less anonymous payment (interestingly, these forms of payment are often enabled by cryptography). On the other hand, despite the fears raised by government agencies of "going dark", there are powerful commercial incentives to collect vast amounts of data and store them at search-warrant friendly servers. Clearly technology is shifting the landscape of relationships among individuals, as well as between individuals and large organizations and governments. Cryptography is an important component in these technologies, but not the only one; moreover, the ways technologies end up being used often have more to do with social and commercial factors than with the technologies themselves.

All that said, significant changes often pose non-trivial dangers, and it is important to have an informed and reasoned discussion of the ways cryptography can help or harm the general and private good. Some questions that are worth considering are:


• Is communicating privately a basic human right? Should it extend to communicating at a distance? Should this be absolute privacy that cannot be violated even with a legal warrant? If there were a secure way to implement wiretapping only with a legal warrant, would it be morally just?

• Is privacy a basic good in its own right? Or a necessary condition for the freedom of expression, and peaceful assembly and association?

• Are we less or more secure today than in the past? In what ways did the balance between government and individuals shift in the last few decades? Do governments have more or less data and tools for monitoring individuals at their disposal? Do individuals and non-governmental groups have more or less ability to inflict harm (and hence need to be protected against)?

• Do we have more or less privacy today than in the past? Does cryptography regulation play a big part in that?

• What would be the balance between security and privacy in an ideal world?

• Is the focus on encryption misguided in that the main issue affecting privacy and security is the so-called metadata? Can cryptographic techniques protect such metadata? Even if they could, is there a commercial interest in doing so?

• One argument against the regulation of cryptography is that, given that the mathematics of cryptography is not secret, the "bad guys" will always be able to access it. Is this a valid argument? Note that similar arguments are made in the context of gun control. Also, perhaps the "true dissidents" will be able to access cryptography as well, and so regulation will affect the masses of "run of the mill" private good and not-so-good citizens?

• What would be the practical impact of regulations forbidding the use of end-to-end crypto without access by governments?

• Rogaway argues that cryptography is inherently political, and research should acknowledge this and be directed at achieving beneficial political goals. Has cryptography research failed the public? What more could be done?

• Are some cryptographic (or crypto related) tools inherently morally problematic? Rogaway suggests that this may be true for fully homomorphic encryption and differential privacy. Do you agree?

• What are the most significant scenarios where cryptography can impact positively or negatively? Large scale terror attacks? "Ordinary" crimes (that still claim the lives of many more people than terror attacks)? Attacks against cyber infrastructure or personal data? Political dissidents in oppressive regimes? Mass government or corporate surveillance?

• How are these issues different in the U.S. as opposed to other countries? Is the debate too U.S.-centric?

25.1 READING PRIOR TO LECTURE:

• Moral Character of Cryptographic Work - please read at least parts 1-3 (pages 1-30 in the footnoted version) - it's long and should not be taken uncritically, but is a very good and thought provoking read.

• "Going Dark" Berkman report - this is a report written by a committee, and as such not as exciting (though arguably more balanced) than Rogaway's paper. Please read at least the introduction, and you might also find the personal statements in Appendix A interesting.

• Digital Equilibrium project - optional reading - this is a group of very senior current and former officials, in particular in government, and as such would tend to fall on the more "establishment" or "pro regulation" side. Their "foundational paper" has even more of a "written by committee" feel but is still worthwhile reading.

• Crypto anarchist manifesto - optional reading - very much not "written by committee"; can be an interesting read even if it sounds more like science fiction than a description of actual current or near future reality.

25.2 CASE STUDIES.

Since such a discussion might sometimes be hard to hold in the abstract, let us consider some actual cases:

25.2.1 The Snowden revelations
The impetus for the current iteration of the security vs privacy debate were the Snowden revelations on the massive scale of surveillance by the NSA on citizens in the U.S. and around the globe. Concurrently, in plain sight, companies such as Apple, Google, Facebook, and others are also collecting massive amounts of information on their users. Some of the backlash to the Snowden revelations was increased pressure on companies to support stronger "end-to-end" encryption, so that some data does not reside on companies' servers, which have become suspect. We're now seeing some "backlash to the backlash" with law

enforcement and government officials around the globe trying to ban such encryption technology or mandate government backdoors.

25.2.2 FBI vs Apple case
We've mentioned this case in the past. (I also blogged about it.) The short summary is that an iPhone belonging to one of the San Bernardino terrorists was found by the FBI. The iPhone's memory was encrypted by a key $k$ that is obtained as $k = H(uid\|passcode)$, where $passcode$ is the six digit passcode of the user and $uid$ is a secret 128 bit key that is hardwired into the processor. The processor will only allow ten attempts at guessing the passcode before erasing all memory. The FBI wanted Apple's help in creating a digitally signed software update that essentially runs a brute force search over the $10^6$ passcodes and outputs the key $k$. The software update could be restricted to run only on that particular iPhone. Eventually, the FBI managed to extract the information out of the iPhone without Apple's help. The method they used is unknown, but it may be possible to physically extract the $uid$ from the processor. It might also be possible to prevent erasure of the memory by disconnecting it from the processor, or rewriting it after erasure. Some questions that one could ask:

• Given that the FBI had a legal warrant for the information on the iPhone, was it wrong of Apple to refuse to provide the help re- quired?

• Was it wrong for Apple to have designed their iPhone so that they are unable to easily extract information out of it? Should they be required to make sure that such devices can be searched as a result of a legal warrant?

• If the only way for the FBI to get the information was to get Apple's master signature key (which would allow them to completely break into any iPhone, and even turn it into a recording/surveillance device), would it have been OK for them to do it? Should Apple design their devices in a way that even their master signature key cannot break them? Is that even possible, given that software updates are crucial for the proper functioning of such devices? (It was recently claimed that the Canadian police have had access to the master decryption key of Blackberry since 2010.)
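To see concretely why the six-digit passcode is protected only by the ten-attempt limit, here is a toy sketch of the brute-force search the FBI wanted to run. SHA-256 stands in for the phone's actual (device-specific, undisclosed) key derivation function, and the `uid` value is made up:

```python
import hashlib

def derive_key(uid: bytes, passcode: str) -> bytes:
    # stand-in for the phone's key derivation k = H(uid || passcode);
    # the real function is device-specific, SHA-256 is only illustrative
    return hashlib.sha256(uid + passcode.encode()).digest()

def brute_force(uid: bytes, target_key: bytes):
    # with the ten-try limit removed, all 10^6 six-digit passcodes
    # can be enumerated in seconds on any laptop
    for i in range(10 ** 6):
        guess = f"{i:06d}"
        if derive_key(uid, guess) == target_key:
            return guess
    return None

uid = bytes.fromhex("00112233445566778899aabbccddeeff")  # made-up 128-bit uid
k = derive_key(uid, "123456")
print(brute_force(uid, k))  # prints 123456
```

The whole security of the design therefore rests on the attempt counter (and on keeping the 128-bit `uid` inside the processor), not on the entropy of the passcode itself.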

In the San Bernardino case, the utility of breaking into the phone was questioned, given that both perpetrators were killed and there was no evidence of them receiving any assistance. But there are cases where things are more complicated. Brittney Mills was 29 years old and 8 months pregnant when she was shot and killed in April 2015

in Baton Rouge, Louisiana. Her baby was delivered via emergency C-section but also died a week later. There was no sign of forced entry, and so it is quite likely she knew her assailant. Her family believes that the clues to her murderer's identity could be found in her iPhone, but since it is locked they have no way of extracting this information. One can imagine other cases as well. Recently a mother found her kidnapped daughter using the Find My iPhone feature. It is not hard to conceive of a case where unlocking a phone is the key to saving someone's life. Would such cases change your view of the above questions?

25.2.3 Juniper backdoor case and the OPM break-in
We've also mentioned the case of the Juniper backdoor. This was a break-in to the firewalls of Juniper networks by an unknown party that was crucially enabled by a backdoor allegedly inserted by the NSA into the Dual EC pseudorandom generator (see also here and here for more). Because of the nature of this break-in, whoever is responsible for it could have decrypted much of the traffic without leaving any traces, and so we don't know the damage caused, but such hacks can have much more significant consequences than forcing people to change their credit card numbers. When the federal Office of Personnel Management was hacked, sensitive information about millions of people who had gone through the security clearance process was extracted. This includes fingerprints, extensive personal information from interviews and polygraph sessions, and much more. Such information can then help attackers gain access to even more, whether it's using a fingerprint to unlock a phone or using the extensive knowledge of social connections, habits and interests to launch very targeted attacks to extract information from particular individuals. Here one could ask if stronger cryptography, and in particular cryptographic tools that would have enabled an individual to control access to his or her own data, would have helped prevent such attacks.

26 Course recap

It might be worthwhile to recall what we learned in this course:

• Perhaps first and foremost, that it is possible to mathematically define what it means for a cryptographic scheme to be secure. In the cases we studied, such a definition could always be described as a "security game". That is, we first define what it means for a scheme to be insecure. Then, a scheme is secure if it is not insecure. The notion of "insecurity" is that there exists some adversarial strategy that succeeds with higher probability than what it should have. We normally don't limit the strategy of the adversary but only its capabilities: its computational power and the type of access it has to the system (e.g., chosen plaintext, chosen ciphertext, etc.). We also talked about how the notion of secrecy requires randomness and how many real-life failures of cryptosystems amount to faulty assumptions on the sources of randomness.

• We saw the importance of being conservative in security definitions. For example, how despite the fact that the notion of chosen ciphertext attack (CCA) security seems too strong to capture any realistic scenario (e.g., when do we let an adversary play with a decryption box?), there are many natural cases where using a CPA-secure instead of a CCA-secure encryption would lead to an attack on the overall protocol.

• We saw how we can prove security by reductions. Suppose we have a scheme $S$ that achieves some security notion $X$ (for example, $S$ might be a function that achieves the security notion of being a pseudorandom generator) and we use it to build a scheme $T$ that we want to achieve a security notion $Y$ (for example, we want $T$ to be a message authentication code). To prove $T$ is secure, we show how we can transform an adversary $A$ that wins against $T$ in the security game of $Y$ into an adversary $B$ that wins against $S$ in the security game of $X$. Typically, the adversary $B$ will run $A$ "in its belly", simulating for $A$ the security game of $Y$ with respect to $T$. This can be somewhat confusing, so please re-read the last three sentences and make sure you understand this crucial notion.

• We also saw some of the concrete wonderful things we can do in cryptography:

• In the world of private key cryptography, we saw that based on the PRG conjecture we can get a CPA secure private key encryption (which in particular has key shorter than message), pseudorandom functions, message authentication codes, CCA secure encryption, commitment schemes, and even zero knowledge proofs for NP complete languages.

• We saw that assuming the existence of collision resistant hash functions, we can get message authentication codes (and digital signatures) where the key is shorter than the message. We talked about the heuristic of how we can model hash functions as a random oracle, and use that for "proofs of work" in the context of bitcoin and password derivation, as well as many other settings.

• We also discussed practical constructions of private key primitives such as the AES block cipher, how such block ciphers are modeled as pseudorandom permutations, and how we can use them to get CPA or CCA secure encryption via various modes such as CBC or GCM. We also discussed the Merkle-Damgård and Davies-Meyer length extension constructions for hash functions, and how the Merkle tree construction can be used for secure storage.

• We saw the revolutionary notion of public key encryption, that two people can talk without having coordinated in advance. We saw constructions for this based on discrete log (e.g., the Diffie-Hellman protocol), factoring (e.g., the Rabin and RSA trapdoor permutations), and the learning with errors (LWE) problem. We saw the notion of digital signatures, and gave several different constructions. We saw how we can use digital signatures to create a "chain of trust" via certificates, and how the TLS protocol, which protects web traffic, works.

• We talked about some advanced notions, and in particular saw the construction of the surprising concept of a fully homomorphic encryption (FHE) scheme, which has been rightly called by Brian Hayes "one of the most amazing magic tricks in all of computer science". Using FHE and zero knowledge proofs, we can get multiparty secure computation, which basically means that in the setting of interactive protocols between several parties, we can establish

a “virtual trusted third party” (or, as I prefer to call it, a “virtual Chuck Norris”).

• We also saw other variants of encryption such as functional encryption, witness encryption and identity based encryption, which allow for "selective leaking" of information. For functional encryption and witness encryption we don't yet have clean constructions under standard assumptions but only under obfuscation, but we saw how we could get identity based encryption using the random oracle heuristic and the assumption of the difficulty of the discrete logarithm problem in a group that admits an efficient pairing operation.

• We talked about the notion of obfuscation, which can be thought of as the one tool that, if it exists, would imply all the others. We saw that virtual black box obfuscation does not exist, but there might exist a weaker notion known as "indistinguishability obfuscation", and we saw how it can be useful via the example of a witness encryption and a digital signature scheme. We mentioned (without proof) that it can also be used to obtain a functional encryption scheme.

• We talked about how quantum computing can change the landscape of cryptography, making lattice based constructions our main candidate for public key schemes.

• Finally, we discussed some of the ethical and policy issues that arise in the applications of cryptography, and what is the impact cryptography has now, or can have in the future, on society.
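The reduction pattern recapped above ("running $A$ in its belly") can be sketched concretely. In this toy example the "PRG" is deliberately broken in a made-up way (its last output byte is always zero); an adversary $A$ that exploits the resulting encryption scheme is converted into a distinguisher $B$ against the PRG:

```python
import hashlib
import secrets

N = 16  # pad length in bytes

def bad_prg(seed: bytes) -> bytes:
    # deliberately broken toy "PRG": its last output byte is always zero
    out = bytearray(hashlib.sha256(seed).digest()[:N])
    out[-1] = 0
    return bytes(out)

def encrypt(pad: bytes, m: bytes) -> bytes:
    # one-time-pad style encryption using the PRG output as the pad
    return bytes(x ^ y for x, y in zip(pad, m))

def adversary_A(ciphertext: bytes) -> int:
    # A attacks the *encryption scheme*: it guesses which message was
    # encrypted by looking at the last ciphertext byte, which leaks the
    # last message byte whenever the pad came from bad_prg
    return 0 if ciphertext[-1] == 0 else 1

def distinguisher_B(y: bytes) -> int:
    # B attacks the *PRG*: it runs A "in its belly", simulating the
    # encryption security game with y as the pad, and outputs 1
    # ("pseudorandom") iff A wins the game
    m = [bytes(N - 1) + b"\x00", bytes(N - 1) + b"\x01"]  # m0, m1
    b = secrets.randbelow(2)
    return 1 if adversary_A(encrypt(y, m[b])) == b else 0
```

On outputs of `bad_prg`, $B$ outputs 1 every time, while on truly random strings it outputs 1 with probability about one half, so $B$ has a large distinguishing advantage. A security proof is exactly the contrapositive: if no such $B$ can exist (the PRG is secure), then no such $A$ can exist either.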

26.1 SOME THINGS WE DID NOT COVER

• We did not cover what is arguably the other "fundamental theorem of cryptography", namely the equivalence of one-way functions and pseudorandom generators. A one-way function is an efficient map $F : \{0,1\}^* \rightarrow \{0,1\}^*$ that is hard to invert on a random input. That is, for any efficient algorithm $A$, if $A$ is given $y = F(x)$ for a uniformly chosen $x \leftarrow_R \{0,1\}^n$, then the probability that $A$ outputs $x'$ with $F(x') = y$ is negligible. It can be shown that one-way functions are minimal in the sense that they are necessary for a great many cryptographic applications including pseudorandom generators and functions, encryption with key shorter than the message, hash functions, message authentication codes, and many more. (Most of these results are obtained via the work of Impagliazzo and Luby, who showed that if one-way functions do not exist then there is a universal posterior sampler, in the sense that for every probabilistic process $F$ that maps $x$ to $y$, there is an efficient algorithm that, given $y$, can sample from a distribution close to the posterior

distribution of $x$ conditioned on $F(x) = y$. This result is typically known as the equivalence of standard one-way functions and distributional one-way functions.) The fundamental result of Hastad, Impagliazzo, Levin and Luby is that one-way functions are also sufficient for much of private key cryptography, since they imply the existence of pseudorandom generators.

• Related to this, although we mentioned this briefly, we did not go in depth into "Impagliazzo's Worlds" of algorithmica, heuristica, pessiland, minicrypt, cryptomania (and the new one of "obfustopia"). If this piques your curiosity, please read this 1995 survey.

• We did not go in detail into the design of private key cryptosystems such as the AES. Though we discussed modes of operation of block ciphers, we did not give a full description of all the modes that are used in practice. We also did not discuss cryptanalytic techniques such as linear and differential cryptanalysis, nor all the technical issues that arise with length extension and padding of encryptions in practice.

• While we talked about bitcoin, the TLS protocol, two factor authentication systems, and some aspects of pretty good privacy, we restricted ourselves to abstractions of these systems and did not attempt a full "end to end" analysis of a complete system. I do hope you have learned the tools that would let you understand the full operation of such a system if you need to.

• While we talked about Shor’s algorithm, the algorithm people actually use today to factor numbers is the number field sieve. It and its predecessor, the quadratic sieve, are well worth studying. The (freely available online) book of Shoup is an excellent source not just for these algorithms but general algorithmic group/number theory.

• We talked about some attacks on practical systems, but there are many other attacks that teach us important lessons, not just about these particular systems, but also about security and cryptography in general (as well as some human tendencies to repeat certain types of mistakes).
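As a tiny illustration of the inversion game in the one-way function definition above, the sketch below uses SHA-256 as an arbitrary stand-in for $F$ and shows that on a small domain the brute-force adversary always wins, which is why one-wayness only makes sense asymptotically:

```python
import hashlib
import secrets

def F(x: int) -> bytes:
    # toy candidate one-way function; SHA-256 is just an arbitrary stand-in
    return hashlib.sha256(x.to_bytes(4, "big")).digest()

def invert(y: bytes, n_bits: int):
    # the adversary in the inversion game: exhaustive search over the
    # domain, "efficient" only because 2^n_bits is tiny here
    for x in range(1 << n_bits):
        if F(x) == y:
            return x
    return None

# the inversion game: sample x uniformly, hand the adversary y = F(x)
x = secrets.randbelow(1 << 16)
assert invert(F(x), 16) == x  # a 16-bit input domain offers no security
```

For a genuine one-way function the same experiment with, say, 128-bit inputs would require on the order of $2^{128}$ evaluations, which is exactly what the "for any efficient algorithm $A$" quantifier rules out.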

26.2 WHAT I HOPE YOU LEARNED

I hope you got an appreciation for cryptography, and an understanding of how it can surprise you both in the amazing security properties it can deliver, as well as in the subtle, but often devastating, ways that it can fail. Beyond cryptography, I hope you got out of this course the

ability to think a little differently - to be paranoid enough to see the world from the point of view of an adversary, but also the lesson that sometimes if something sounds crazy but is not downright impossible it might just be feasible. But if these philosophical ramblings don't speak to you, as long as you know the difference between CPA and CCA and I won't catch you reusing a one-time pad, you should be in good shape :)

I did not intend this course to teach you how to implement cryptographic algorithms, but I do hope that if you need to use cryptography at any point, you now have the skills to read up on what's needed and be able to argue intelligently about the security of real-world systems. I also hope that you now have sufficient background not to be scared by the technical jargon and the abundance of adjectives in cryptography research papers, and to be able to read up on what you need to follow any paper that is interesting to you. Mostly, I just hope you enjoyed this last term and felt like this course was a good use of your time. I certainly did.

