Summary Part I Security: Applications & Aspects Cryptographic Features Introduction Software vs Hardware Support Hardware Acceleration Solutions Processor Extensions for Security Part II Basic Cyphering Symmetric vs Asymmetric Cryptography Security Background Theoretical Attacks Arnaud Tisserand Cryptographic Hash Functions Physical Attacks CNRS, IRISA laboratory, CAIRN research team Random Number Generators (RNG) Part III Cryptographic Processors ´ Cryptographic Co-Processors & Accelerators Ecole ARCHI Processors and Co-Processors Lille, Nord Trusted Platform Module (TPM) June 8–12th 2015 Part IV Instruction Set Instruction Set Extensions Instruction Set Extensions Addition of Long Operands Extension for Finite Fields Arithmetic Extensions for AES Conclusion, future prospects, references A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 2/81

Applications with Security Needs Part I

Introduction

Security: Applications & Aspects

Cryptographic Features

Software vs Hardware Support

Hardware Acceleration Solutions We need protections against:

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 3/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 4/81 Security Aspects Steganography Cryptography: art of secret security Steganography: art of dissimulation

Principle: hide a secret message into another message (support) system security cryptology

secret message landings in Normandy on data steganography June 6th, 1944.

networks cryptography

operating systems cryptanalysis program - programs physical devices theoretical support image result difference

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 5/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 6/81

Cryptographic Features Software vs Hardware Support I Objectives: Cryptographic primitives: instructions managment + control SW • Confidentiality • Encryption @ • Integrity • @ • Authenticity • Hash function

memory D reg. hierarchy FU FU FU • Non-repudiation • Random numbers generation LSU file 1 2 3 • ... • ... Implementation issues: EXCELLENT slow large large small • Performances: speed, delay, throughput, latency FLEXIBILITY SPEED AREA ENERGY DEVEL. COST • Cost: device (memory, size, weight), low power/energy consumption, design limited fast small small HUGE • Security: protection against attacks op. op. op. op.

Applications: smart cards, computers, Internet, telecommunications, reg. reg. reg. reg. set-top boxes, data storage, RFID tags, WSN, smart grids. . . CTRL SECURITY?memory HW A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 7/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 8/81 Hardware Acceleration Solutions Part II At cluster/network/system level: (autonomous) dedicated processors • Digital signal processors (DSPs) Security Background • Network processors • Multimedia processors Basic Cyphering • Cryptographic processors Symmetric vs Asymmetric Cryptography At computer level: co-processors and accelerators Theoretical Attacks • Dedicated cards for specific applications: video (GPU), audio, . . . • Cryptographic co-processors Cryptographic Hash Functions

At processor/core level: instruction set extensions Physical Attacks • Vector/matrix computations, SIMD, FMA, small floats, data shuffling, bit manipulation, cache interaction, prefetching Random Number Generators (RNG) • Multimedia & signal processing applications • Cryptographic extensions (AES, GF(2m) multiplication)

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 9/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 10/81

Basic Cyphering Symmetric / Private- Cryptography Alice wants to secretly send a message to Bob in such a way Eve E D (eavesdropper/spy) should have no information Ek (M) M A B Dk (Ek (M)) = M secret k E k

• A : Alice, B : Bob • M: plain text/message • E: encryption/ciphering algorithm, D: decryption/deciphering algorithm E plain text • k: secret key to be shared by A and B M A B • Ek (M): encrypted text • communication Dk (Ek (M)): decrypted text secured zone secured zone channel • E: eavesdropper/spy

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 11/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 12/81 Symmetric Cryptography Limitation Asymmetric / Public-Key Cryptography

people required keys E D n list list number Ek (M) 2 A, B k 1 A B M A B Dk0 (Ek (M)) = M k2 k3 0 3 A, B, C 3 k E k k1 A B k C k2 k3 k 4 A, B, C, D A 1 B 6 k6 • k: B’s public key (known to everyone including E) k4 k5 • D Ek (M): ciphered text • k0: B’s private key (must be kept secret) n A,. . . n×(n−1) 2 • Dk0 (Ek (M)): deciphered text

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 13/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 14/81

Symmetric or Asymmetric Cryptography? Theoretical Attacks Private-key or symmetric cryptography: simple algorithms attack k, M??? fast computation E limited cost (silicon area, energy) E D requires a key exchange Ek (M) key distribution problem for n persons M A B Dk (Ek (M)) = M

k k Public-key or asymmetric cryptography:

no key exchange Notations: only 2 keys per person (1 private, 1 public) • M plain text allows digital signature • E encryption algorithm • C = Ek (M) ciphered text more complex algorithms • D decryption algorithm • secured zone slower computation • k secret key higher cost

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 15/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 16/81 RSA 768 Attack in December 2009 RSA Ciphering (Rivest, Shamir, Adleman 1977)

(n, e) d 6 months on 80 parallel computers (≡ 1 500 years for a single computer!) public key

RSA-768 = m ∈ [0, n − 1] 3347807169895689878604416984821269081770479498371376856891 private key c 2431388982883793878002287614711652531743087737814467999489 c = me mod n A B m = cd mod n × 3674604366679959042824463379962795263227915816434308764267 6032283815739666511279233373417143396810270092798736308917 RSA key pair generation: Source: article 1. generate primes p and q (length l/2) http://eprint.iacr.org/2010/006.pdf 2. compute n = pq and φ = (p − 1)(q − 1) Factorization of a 768-bit RSA modulus. Thorsten Kleinjung, Kazumaro 3. select e such that 1 < e < φ and gcd(e, φ) = 1 Aoki, Jens Franke, Arjen K. Lenstra, Emmanuel Thome, Joppe W. Bos, 4. compute d satisfying 1 < d < φ and ed ≡ 1 mod φ Pierrick Gaudry, Alexander Kruppa, Peter L. Montgomery, Dag Arne Osvik, Herman te Riele, Andrey Timofeev, and Paul Zimmermann Security: • integer factorization problem: compute (p,q) knowing just n is hard • minimal key size recommendation: 1024 bits

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 17/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 18/81

RSA Signature Cryptographic Hash Functions (1/2)

m signature verification signature generation • message m (arbitrary block of data) l bits • variable size l ACCEPT signature (n, e) d H public key • hash or digest h k bits h=h0 h0 = se mod n • fixed size k (in practice k << l) private key h = H(m) (m, s) comparison d A B s = h mod n Security properties of cryptographic hash functions: • preimage resistance (one way function): h m | h = H(m) h6=h0 h = H(m) h = H(m) 9 1 • second preimage resistance : m19m2 6= m1 | H(m1) = H(m2) REJECT signature m • collision resistance: finding (m1, m2) such that m1 6= m2 and H(m1) = H(m2) is very hard

H a cryptographic hash function Examples: MD5, WHIRLPOOL, SHA-1, SHA-2, SHA-3 (selection 2010)

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 19/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 20/81 Cryptographic Hash Functions (2/2) [Trapdoor] One Way Function

One way function: f : x 7→ y = f (x) Examples using openssl: • given x, computing y is easy • given y, computing x is very hard > echo "string" | openssl dgst -sha224 -hex Trapdoor one way function: f : x 7→ y = f (x) string digest • given x, computing y is easy 0123456789 95ae4be607f065743ac1c81d1180591a919d08b8d4765e176b26f214 • given y, computing x is very hard 1123456789 5c527cd1341a4338f09086e71d1a0d69f818d74a828c974b9433524a 0123446789 94e5aa1d275dc3a21c76d28b011f4ea6121fa228af3ec7fa329da44f • given some (secret) information and y, computing x is easy 0123456788 b04c6b0b1d663ad0c00d749441747cc6df211ea6c98f4fd2dbf283ff

Example: p and q primes, computing n = pq is easy but finding (p, q) knowing just n is very hard

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 21/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 22/81

Elliptic Curve Cryptography (ECC) Key Size vs Security Level E : y 2 = x 3 + 4x + 20 over GF(1009) security RSA ECC encryption points on E: P, Q= (x, y) or (x, y, z) level GF(p) GF(2m) signature coordinates: x, y, z ∈ GF(·) |n| [bits] |p| [bits] m [bits] 56 512 112 113 etc GF(p), GF(2m), t : 160–600 bits protocol level 64 704 128 131 k = (kt−1kt−2 ... k1k0)2 ∈ N 80 1024 160 163 J [k]P Scalar multiplication operation 96 1536 192 193 for i from 0 to t − 1 do 112 2048 224 233 JJ if ki = 1 then Q = ADD(P, Q) 128 3072 256 283 P = DBL(P) 192 7680 384 409 P + P curve level Point addition/doubling operations 256 15360 521 571 ADD(P, Q) DBL(P) sequence of finite field operations 2 h DBL: v1 = z1 , v2 = x1 − v1,... • Security level of h: the best known algorithm takes2 steps for 2 ADD: w1 = z1 , w2 = z1 × w1,... breaking the cryptosystem GF(p) or GF(2m) operations • RSA: Z/nZ with n = pq, p and q primes x±y x×y ... operation modulo large prime (GF(p)) • ECC: GF(p) with p prime or GF(2m) m

field level or irreducible polynomial (GF(2 )) Source: SEC2 recommendations from Certicom (v1.0, Jan. 2000) A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 23/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 24/81 Various Types of Attacks Side Channel Attacks

timing analysis Attack: attempt to find, without any knowledge about the secret: power analysis • the message (or parts of the message) • informations on the message EMR analysis • the secret (or parts of the secret) observation “Old style” side channel attacks: attack perturbation theoretical + invasive fault injection

clic good value maths dico etc probing reverse engineering clac bad value EMR = Electromagnetic radiation A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 25/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 26/81

Side Channel Analysis/Attacks (SCA) What Should be Measured?

Answer: everything that can “enter” and/or “get out” in/from the device • power consumption E D • electromagnetic radiation Ek (M) • M A B Dk (Ek (M)) = M temperature • sound k k • computation time measure • number of cache misses • number and type of error messages attack • ... E k, M???

The measured parameters may provide informations on: General principle: measure external parameter(s) on running device in • global behavior (temperature, power, sound...) order to deduce internal informations • local behavior (EMR, # cache misses...)

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 27/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 28/81 Power Consumption Analysis “Read” the Traces

General principle: 1. measure the current i(t) in the cryptosystem 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 2. use those measurements to “deduce” secret informations

crypto. secret key = 962571. . .

i(t) R • algorithm decomposition into steps • VDD detect loops I constant time for the loop iterations traces I non-constant time for the loop iterations

Source: [8] Kocher, Jaffe and Jun. Differential Power Analysis, Crypto99

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 29/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 30/81

Differences & External Signature Simple Power Analysis (SPA) An algorithm has a current signature and a time signature: r = c0 for i from 1 to n do if ai = 0 then r = r+c1 else r = r×c2

T

t T+T× I

t I+ I× i 1 2 3 4 5 6 7 8 ai 0 1 1 0 1 0 0 1 Source: [8]

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 31/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 32/81 SPA in Practice Limits of the SPA

General principle: Example of behavior difference: (activity into a register) algorithm t 0000000000000000 0000000000000000

difference in the behavior analysis t + 1 1111111111111111 0000000000000001

Important: a small difference may be evaluated has a noise during the difference in the trace measurement traces cannot be distinguished Methods: interpretation of the differences in • control signals Question: what can be done when differences are too small? • computation time • operand values Answer: use statistics over several traces • ...

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 33/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 34/81

DPA Example Electromagnetic Radiation Analysis (1/2)

General principle: use a probe to measure the EMR average VDD

correct circuit

incorrect GND

incorrect EMR measurement: • global EMR with a large probe • local EMR with a microprobe

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 35/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 36/81 Electromagnetic Radiation Analysis (2/2) Basic Power Analysis Attack on ECC

VDD EMR analysis methods: encryption signature I circuit • simple electromagnetic analysis: SEMA traces etc

• protocol level differential electromagnetic analysis: DEMA GND DBL DBL DBL ADD DBL ADD DBL DBL

[k]P Local EMR analysis may be used to determine internal architecture 0 0 0 1 1 0 details, and then select weak parts of curve level the circuit for the attack ADD(P, Q) DBL(P)

X-Y table Scalar multiplication operation for i from 0 to t − 1 do x±y x×y ... if ki = 1 then Q = ADD(P, Q)

field level P = DBL(P)

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 37/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 38/81

Random Number Generators (RNG) Historical Hardware TRNGs

Pseudo random number generator (PRNG): • deterministic algorithms • very high throughput and good statistical properties • various algorithms quality/throughput/cost tradeoffs

True random number generator (TRNG): • non-deterministic algorithms (physical random source) • limited throughput • quality = func(environment parameters, . . . ) attacks

Hybrid random number generator (HRNG): • HRNG= TRNG+ PRNG • very high speed and very good quality • selection needs more research ATT Patent 1946, source: P. Kohlbrenner A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 39/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 40/81 TRNGs Selection Free Running Ring Oscillator Physical noise source: odd # of inverters (n) • quantum physics inverter: • radioactive decay in out 1 0 1 0 S • atmospheric noise 0 1 • thermal/Johnson noise 1 0 ring oscillator • jitter in ring oscillator sampling • meta-stability S • noises in circuits: 1/f, shot, popcorn, crosstalk, . . . • ... time period = f (n,...) Characteristics: • throughput (? Mb/s) S φ • randomness quality (bias, /bit, stability, effects of φ random jitter environment variations, . . . ) (timing/phase instability) time • security fully integrated in the chip period = f (n, φ, . . .)

• cost (silicon area, power consumption) VLSI implement.

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 41/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 42/81

Example of Ring Oscillator (RO) Based TRNG Post Processing Purpose: enhance statistical parameters of the output sequence RO1 xor post random • reduce bias Pr(x = 1) = 0.5 +  (AIS 31:  < 0.0173) RO 2 tree processing bits • increase entropy per bit (the real randomness) RO 3 Typical post processing methods: . fs . optional • Von Neumann correction RO on-line k alarm input bits (0,0) (0,1) (1,0) (1,1) test output bit none 1 0 none quality evaluation • Linear feedback shift register (LFSR) • Hash function (e.g. SHA) Description: • Ciphering (e.g. AES) • k free running ring oscillators • Resilient function (e.g. error code computations) • f is the sampling frequency s • ... • post processing: enhance statistical parameters • on-line quality test (environment variations, attacks, . . . ) Trade-off: entropy per bit, data rate, cost, quality

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 43/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 44/81 RO Based TRNG Example Example of Measurements on FPGAs

[11] TRNG from [6] (Altera Stratix II):

Run test AIS31 Description: 100

• k = 114 RO of 13 inverters 90 • resilient function: BCH(256, 13, 113) code 80 • mathematical model (but not realistic assumptions) 70 • data rate 2.5 Mb/s on FPGA 60 50 Problems: 40 30

• very complex calibration (external measurement of the jitter!!!) Success percentage (%) 20 • too many transitions in the xor tree 10 • TRNG by Dichtl and al. setup/hold violations in the flip-flop 0 1 2 3 4 5 6 7 8 9 6 • ... Data rates (Mb/s) x 10

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 45/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 46/81

Cryptographic Processors Part III • Mechanical devices Processors and Co-Processors Example 1: Confederate Cipher Disc used during the American Civil War (1861-1865) Example 2: CD-57 portable, produced in 1957 Cryptographic Processors • Electromechanical devices

Cryptographic Co-Processors & Accelerators Example: Enigma used during the 2nd World War

• Electronic circuits Trusted Platform Module (TPM) 1970s for bank applications

• Tomorrow?

Images sources: http://fr.wikipedia.org/ & http://cryptomuseum.com/

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 47/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 48/81 The Bombe IBM 4765 PCIe Cryptographic Coprocessor Electromechanical device designed for deciphering (i.e. breaking) Enigma

AES, DES, TDES RSA sign. ≤4096b ECDSA sign. p521 SHA-1, SHA-2, . . .

Bletchley Park (http://www.bletchleypark.org.uk/) ...

Images source: https://www-03.ibm.com/security/cryptocards/pciecc/pdf/PCIe_Spec_Sheet.pdf Image source: http://fr.wikipedia.org/ NIST certification: http://csrc.nist.gov/groups/STM/cmvp/documents/140-1/140sp/140sp1505.pdf

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 49/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 50/81

Trusted Platform Module (TPM) TPM Integration in a PC Platform Hardware device for and accreditation of the platform CPU • Boot process integrity: system start-up is tamper free Verification of (many) stored measurements from previous boots Verification of code & behavior: BIOS, chip-set, peripherals, firmwares, North boot loader, kernel, . . . RAM Bridge • TPM Data protection: robust against software and physical attacks LPC Security keys, , certificates South • Improved Security support for : Super IO Bridge Encryption, hash functions, RNG, & management Memory protection, session isolation, protected partition, security PCI parallel support for virtual machines IDE serial • Device physically locked to the USB PS/2 ...... Specifications: Group (TCG, created in 1999) http://www.trustedcomputinggroup.org/ LPC: A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 51/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 52/81 TPM Example from Infineon Our ECC (Co)Processor

counter- measures COMM. TPM Block diagram (from Infineon white paper [5]):

key recoding CTRL register file AGU

±, × ±, × 1/x CTRL CTRL CTRL local register(s) local register(s) local register(s)

• Functional units: ±, ×, 1/x for GF(p) or GF(2m), key recoding • Memory: main register file + internal registers in FUs • Control: operations (curve and field levels) schedule, parameters management, active countermeasures. . .

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 53/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 54/81

Instruction Set Part IV Programming interface of a processor: Instruction Set Extensions • Instruction set (or instruction set architecture)

I Control flow operations Instruction Set I Computations operations (ALU, floating-point) I Memory and data handling operations

Instruction Set Extensions • Registers features and organization • Memory mapping Addition of Long Operands • Virtual memory support • Virtual machines support Extension for Finite Fields Arithmetic • Interruption and exception handling • Permissions Extensions for AES • I/O space organization • ...

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 55/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 56/81 Instruction Set Extensions Examples of Instruction Set Extensions Objectives: For x86 architectures: • Improve efficiency for (useful) operations and memory handling: • MMX (1996, ) +57 instructions and +8 registers 64b multimedia, signal processing, codecs, cryptography, . . . • 3D Now (1997, AMD, ) +21 instructions and +8 registers 64b (FP/MMX) • Increase internal parallelism: • SSE (1999, Intel) +70 instructions and +8 registers 128b vector, matrix, SIMD (single instruction multiple data) • SSE2 (2001, Intel) +144 instructions and +8 registers 128b • Efficient support for specific operations: • SSE3 (2004, Intel) +13 instructions dedicated hardware operators • SSE4 (2006, Intel) +54 instructions • AVX (2008, Intel) +12 instructions and registers 128→256b Programming models: • AES (2008) • Compiler assisted generation: • F16C (2009, AMD) compiler identifies patterns to be mapped on the extended IS • XOP (2009, AMD) • Optimized library based design: • FMA (2011) replace a standard and generic library by a specific library for the target • BMI (2012) processor extended IS Instruction set extensions have been proposed for other architectures: • Intrinsics: MAX-1 for PA-RISC, VIS for Sparc, AltiVec for Apple-IBM-Motorola), access to low-level instructions from a high-level programming language MIPS-3D for MIPS, NEON for ARM, . . . A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 57/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 58/81

MMX Example AVX Example • 8 new 64b registers (in 80b floating-point ones): MM0, MM1,..., MM7 • 4 new vector data types: 63 0 packed bytes

packed words

packed double words

quad word

• 57 new instructions: (examples) Example of new instructions: I logic: , , , , , , , ,... por pand pxor psrlw psrld psrlq psraw psraq • VBROADCASTSS, VBROADCASTSD, VBROADCASTF128: broadcast 32b, I maths: paddb, paddsb, paddusb, paddw, paddsw, paddusw, paddd, pmulhw, pmullw,... 64b or 128b word to ALL elements I comparisons: pcmpeqb, pcmpeqw,... • permutations I data movement: movd, movq,... • shuffles I data packing: packsswb, packssdw, punpckhbw,...

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 59/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 60/81 XOP Example Addition of Long Operands

Addition of long operands is very important in: • multiprecision arithmetic, e.g. GMP, MPRI, MPFR, MPFI libraries • asymmetric cryptography:

I RSA 1024–8192 bits integers and modular arithmetic m I ECC 160–600 bits finite fields elements GF(p) and GF(2 ) 5 I Fully Homomorphic Encryption (> 10 bits integers or polynomials) I El Gamal signature, Diffie-Hellman key exchange, . . .

Addition of long integers is not efficient using a standard ISA: • Addition of w-bit words (w ∈ {32, 64}) produces a w-bit sum

• Carry out (cout) handled as a flag (not an accessible value) 1 • Bad branch prediction since Proba(cout = 1) ≈ 2 Images source: https://chessprogramming.wikispaces.com/XOP

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 61/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 62/81

Hardware Extension for Large Additions Software Usage of ADC Instruction

Proposed solution: local flip-flop to store carries “between” sub-words Programming models: • At assembly language level: explicit use of the instruction b a ADC not very popular cout cin + • At compiler level: c s = a + b + (c==1 ? 1 : 0) s = a + b is identified as a call to the ADC instruction does not work! • c produced at step i is used as c at step i + 1 out in • Using intrinsic: fake function call (replaced by assembler code) • very cheap, no more control issues unsigned char _addcarry_u64 ( • c = 0 for the first sub-word in unsigned char c_in, • New instruction ADC (i.e. add with carry) unsigned __int64 a, unsigned __int64 b, Problem: how can I use this instruction from a high level programming unsigned __int64 * out ) language?

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 63/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 64/81 Hyper Short Introduction to GF(2m) Arithmetic Extensions for GF(2m) Arithmetic

Pm−1 i • Element of the field: A = i=0 ai x with ai ∈ {0, 1} SSE2-4, AVX support: • PXOR instruction: 128-bit exclusive OR 3 2 3 2 A = 1 · x + 1 · x + 0 · x + 1 and B = 0 · x + 1 · x + 1 · x + 1 addition in GF(2m) A: 1 1 0 1 B: 0 1 1 1 • PCLMULQDQ instruction: 64-bit × 64 → 128-bit product product in GF(2m) • Field addition: component wise addition in GF(2) (i.e. bit wise XOR) Carry-less multiplication A + B: 1 0 1 0 Other instructions: PCLMULLQLQDQ, PCLMULHQLQDQ, PCLMULLQHQDQ, PCLMULHQHQDQ • Field multiplication: polynomial multiplication • inversion in the field: euclidean algorithm with GF(2m) addition, bit A × B: 0 0 1 0 0 0 1 1 manipulation instructions

Future extensions (?): 256 × 256 → 512 bits In many cases, we need arithmetic modulo an irreducible polynomial

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 65/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 66/81

Advanced Encryption Standard (AES) AES Round Operations

Established by NIST in 2001

Symmetric encryption

Block size: 128 bits

key length #round 128 10 192 12 256 14

Based on substitution- permutation network Image source: http://fr.wikipedia.org/ Images source: http://fr.wikipedia.org/ NIST: National Institute of Standards and Technology A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 67/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 68/81 Instruction Set Extension for AES

• AESKEYGENASSIST key generation (partial operations) Example

• AESENC one round of encryption Source: • AESENCLAST last round of encryption Shay Gueron • AESDEC one round of decryption Intel’s New AES Instructions for Enhanced Performance and Security • AESDECLAST last round of decryption Proc. Fast Software Encryption (FSE) 2009 • AESIMC inverse mix columns http://iacr.org/archive/ fse2009/56650054/56650054.pdf • PCLMULQDQ GF(2m) multiplication (carry less)

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 69/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 70/81

Hardware Support for Security: Pros/Cons Conclusion & Future Prospects Security support in processors: At cluster/network/system level: (autonomous) dedicated processors • Requires hardware blocks (software is not secure enough) very high security (isolation, strong storage, protections • Use secured libraries at OS, cryptographic and application levels against SCAs) • Important features: isolation, authentication, identification, strong cost, not flexible, requires system level integration cryptographic primitives (cipher, hash, RNG, key management) (client/server prg. model) • Important threats: attacks at ALL levels (protocols, software, IP, OS, maths, physical. . . ) At computer level: co-processors and accelerators Future security support: very high security (isolation, protections against SCAs) • Advanced TPMs not flexible, requires computer level integration • Advanced security co-processors, accelerators, IPs • End-to-end trust chain At processor/core level: instruction set extensions • Security solutions vs economic models vs social aspects flexible security against SCAs privacy protection, anti-trust, small vs huge industries, . . . • ...

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 71/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 72/81 ReferencesI ReferencesII

M. Alioto, L. Giancane, G. Scotti, and A. Trifiletti. M. Dichtl and J. D. Golic. Leakage power analysis attacks: A novel class of attacks to nanometer cryptographic High-speed true with logic gates only. circuits. In Proc. Cryptographic Hardware and Embedded Systems (CHES), volume 4727 of LNCS, IEEE Transactions on Circuits and Systems I, 57(2):355–367, February 2010. pages 45–62. Springer, September 2007. R. Anderson, M. Bond, J. Clulow, and S. Skorobogatov. P. C. Kocher. Cryptographic processors – a survey. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. Technical report, University of Cambridge, Computer Laboratory, Cambridge, UK, August In Proc. Advances in Cryptology (CRYPTO), volume 1109 of LNCS, pages 104–113. 2005. Springer, August 1996. Paper version [3]. P. C. Kocher, J. Jaffe, and B. Jun. R. Anderson, M. Bond, J. Clulow, and S. Skorobogatov. Differential power analysis. Cryptographic processors – a survey. In Proc. Advances in Cryptology (CRYPTO), volume 1666 of LNCS, pages 388–397. Proceedings of the IEEE, 94(2):357–369, February 2006. Springer, August 1999. Research report [2]. F. Koeune and F.-X. Standaert. H. Bar-El, H. Choukri, D. Naccache, M. Tunstall, and C. Whelan. A tutorial on physical security and side-channel attacks. The sorcerer’s apprentice guide to fault attacks. In 5th International School on Foundations of Security Analysis and Design (FOSAD), Proceedings of the IEEE, 94(2):370–382, February 2006. volume 3655 of LNCS, pages 78–108. Springer-Verlag, 2005. H. Brandl and T. Rosteck. L. Lin and W. Burleson. Technology, implementation and application of the standard Leakage-based differential power analysis (LDPA) on sub-90nm CMOS cryptosystems. (TCG). In Proc. IEEE International Symposium on Circuits and Systems (ISCAS), pages 252–255, White paper, Infineon, 2004. Seattle, WA, USA, May 2008.

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 73/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 74/81

ReferencesIII Good Books (in French)

Histoire des codes secrets Simon Singh 1999 Livre de poche B. Sunar, W. J. Martin, and D. R. Stinson. A provably secure true random number generator with built-in tolerance to active attacks. IEEE Transactions on Computers, 56(1):109–119, January 2007.

Math´ematiques,espionnage et piratage informatique Joan Gomez 2010 Le monde est math´ematique,RBA

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 75/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 76/81 Good Books (in French) Good Books (in French)

Cryptographie appliqu´ee Courbes elliptiques Bruce Schneier Philippe Guillot 1997, 2`eme´edition 2010 Wiley Hermes ISBN: 2–84180–036–9 ISBN: 978-2-7462-2392-9

Micro et nano-´electronique Bases, Composants, Circuits Cours de cryptographie Herv´eFanet Gilles Z´emor 2006 2000 Dunod Cassini ISBN: 2–10–049141–5 ISBN: 2–84225–020–6

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 77/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 78/81

Good Books (in English) Good Books (in English)

CMOS VLSI Design A Circuits and Systems Perspective Neil Weste and David Harris 3rd edition, 2004 Handbook of Applied Cryptography Addison Wesley Alfred J. Menezes, Paul C. van Oorschot and ISBN: 0–321–14901–7 Scott A. Vanstone 2001 CRC Press Power Analysis Attacks ISBN:0-8493-8523-7 Revealing the Secrets of Smart Cards Web: http://cacr.uwaterloo.ca/hac/ Stefan Mangard, Elisabeth Oswald and Thomas Popp 2007 Springer ISBN:978-0-387-30857-9

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 79/81 A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 80/81 The end, questions ?

Contact: • mailto:[email protected] • http://people.irisa.fr/Arnaud.Tisserand/ • CAIRN Group http://www.irisa.fr/cairn/ • IRISA Laboratory, CNRS–INRIA–Univ. Rennes 1 6 rue Kerampont, CS 80518, F-22305 Lannion cedex, France

Thank you

A. Tisserand, CNRS–IRISA–CAIRN. Processor Extensions for Security 81/81