EdDSA signatures and Ed25519

Peter Schwabe

Joint work with Daniel J. Bernstein, Niels Duif, Tanja Lange, and Bo-Yin Yang

February 20, 2012

Coding Theory and Seminar, University of Basel

A few words about Taiwan and Academia Sinica

- Taiwan (台灣) is an island south of China
- About 36,200 km² large
- Territory of the Republic of China (not to be confused with the People's Republic of China)
- Capital is Taipei (台北)
- Marine tropical climate
- 99 summits over 3000 meters (highest peak: 3952 m)
- Wildlife includes black bears, salmon, monkeys, ...
- Academia Sinica is a research facility funded by the ROC
- About 30 institutes
- About 800 principal investigators, more than 750 postdocs

Introduction: the NaCl library

How it started

- My Ph.D. research was within the European project CACE (Computer Aided Cryptography Engineering)
- One of the deliverables: Networking and Cryptography Library (NaCl, pronounced "salt")
- Aim of this library: high-speed, high-security, easy-to-use cryptographic protection for network communication
- We are willing to sacrifice compatibility with other crypto libraries
- At the end of 2010 the library contained
  - the stream cipher Salsa20,
  - the Poly1305 secret-key authenticator, and
  - Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
- This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
- This serves the typical one-to-one communication of most internet connections
- Still required at the end of 2010: one-to-many authentication, i.e., cryptographic signatures

Designing a public-key signature scheme

- Core requirements: 128-bit security, fast signing, fast verification, secure software implementation
- Obvious candidates: RSA, ElGamal, DSA, ECDSA, Schnorr, ...
- Conventional wisdom: ECC is faster than anything based on factoring or the DLP in $\mathbb{Z}_n^*$
- (Twisted) Edwards curves support very fast arithmetic
- Edwards addition is complete (important for secure implementations)
- Curve25519 has an Edwards representation and offers very high security
- Looks like "some" signature scheme using Edwards arithmetic on Curve25519 is a good choice

One step back: Is ECC really faster than, e.g., RSA?

- RSA with public exponent e = 3 can verify signatures with just one modular multiplication and one squaring
- Very hard to beat with any elliptic-curve-based signature scheme
- Verification speed primarily matters in applications that need to verify many signatures
- Idea: To get close to RSA verification speed, support batch verification
  - Easier: Verify batches of signatures under the same public key
  - Harder (but much more useful!): Verify batches of signatures under different public keys
- We don't know where the NaCl library is used, so support the latter
- None of the above-mentioned schemes supports fast batch verification
- Schnorr signatures only require small changes (and have many nice features anyway)

⇒ Start with Schnorr signatures, modify as required
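To make the "one multiplication and one squaring" claim concrete, here is a minimal sketch of textbook RSA verification with e = 3. The primes, the unpadded integer message, and the function names are illustrative assumptions; real RSA signatures use large moduli and a padding scheme.

```python
# Toy textbook RSA with e = 3 (tiny primes, no padding): illustration only.
p, q = 11, 17                 # hypothetical toy primes; p-1 and q-1 are not divisible by 3
n = p * q                     # modulus 187
phi = (p - 1) * (q - 1)       # 160
e = 3
d = pow(e, -1, phi)           # private exponent (needs Python 3.8+)

def rsa_sign(m: int) -> int:
    """Signing is a full modular exponentiation with the private exponent."""
    return pow(m, d, n)

def rsa_verify_e3(m: int, s: int) -> bool:
    """Verification with e = 3: one squaring and one multiplication mod n."""
    s2 = (s * s) % n          # the squaring
    s3 = (s2 * s) % n         # the multiplication
    return s3 == m % n

message_rep = 42              # stands in for a padded message hash
sig = rsa_sign(message_rep)
print(rsa_verify_e3(message_rep, sig))
```

The asymmetry is the point: signing costs a full exponentiation, while verification costs two modular multiplications, which is why a single elliptic-curve verification struggles to compete.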

Recall Schnorr signatures

- Variant of ElGamal signatures
- Many more variants (DSA, ECDSA, KCDSA, ...)
- Uses finite group $G = \langle B \rangle$, with $|G| = \ell$
- Uses hash function $H : G \times \mathbb{Z} \to \{0, \dots, 2^t - 1\}$
- Originally: $G \le \mathbb{F}_q^*$; here: consider an elliptic-curve group
- Private key: $a \in \{1, \dots, \ell\}$, public key: $A = -aB$
- Sign: Generate secret random $r \in \{1, \dots, \ell\}$, compute signature $(H(R, M), S)$ on $M$ with

    $R = rB$,  $S = (r + H(R, M)\,a) \bmod \ell$

- Verifier computes $\bar{R} = SB + H(R, M)A$ and checks that

    $H(\bar{R}, M) = H(R, M)$
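The scheme above can be sketched in a few lines of Python in the original multiplicative setting $G \le \mathbb{F}_q^*$. The group (order-11 subgroup of $\mathbb{F}_{23}^*$), the 64-bit hash truncation, and all function names are assumptions chosen for readability; the parameters are far too small for any real use. In multiplicative notation $rB$ becomes $B^r$ and $A = -aB$ becomes $B^{-a}$.

```python
import hashlib
import secrets

# Toy parameters (illustration only): the subgroup of order 11 in F_23^*.
p = 23          # field prime
ell = 11        # prime order of the subgroup
B = 2           # generator of the subgroup: 2**11 % 23 == 1
t = 64          # hash output length in bits

def H(R: int, M: bytes) -> int:
    """Hash (R, M) to an integer in {0, ..., 2^t - 1}."""
    digest = hashlib.sha256(R.to_bytes(4, "little") + M).digest()
    return int.from_bytes(digest, "little") % (2 ** t)

def keygen(a: int) -> int:
    """Public key A = -aB, written multiplicatively as B^(-a) mod p."""
    return pow(B, -a % ell, p)

def sign(M: bytes, a: int, r: int):
    """Return the signature (H(R, M), S) for a per-message secret r."""
    R = pow(B, r, p)                 # R = rB
    h = H(R, M)
    S = (r + h * a) % ell            # S = (r + H(R, M) a) mod ell
    return h, S

def verify(M: bytes, A: int, sig) -> bool:
    h, S = sig
    R_bar = (pow(B, S, p) * pow(A, h, p)) % p   # R_bar = SB + hA
    return H(R_bar, M) == h

a = 7                                   # private key
A = keygen(a)
r = secrets.randbelow(ell - 1) + 1      # fresh secret r in {1, ..., ell - 1}
print(verify(b"hello", A, sign(b"hello", a, r)))
```

Verification works because $SB + hA = (r + ha)B - haB = rB$, so the verifier recomputes the same $R$ that the signer hashed.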

The EdDSA signature scheme

EdDSA and Ed25519 parameters

| EdDSA | Ed25519-SHA-512 |
|---|---|
| Integer $b \ge 10$ | $b = 256$ |
| Prime power $q \equiv 1 \pmod 4$ | $q = 2^{255} - 19$ (prime) |
| $(b-1)$-bit encoding of elements of $\mathbb{F}_q$ | little-endian encoding of $\{0, \dots, 2^{255} - 20\}$ |
| Hash function $H$ with $2b$-bit output | $H$ = SHA-512 |
| Non-square $d \in \mathbb{F}_q$ | $d = -121665/121666$ |
| $B \in \{(x, y) \in \mathbb{F}_q \times \mathbb{F}_q : -x^2 + y^2 = 1 + d x^2 y^2\}$ (twisted Edwards curve $E$) | $B = (x, 4/5)$, with $x$ "even" |
| Prime $\ell \in (2^{b-4}, 2^{b-3})$ with $\ell B = (0, 1)$ | $\ell$ a 253-bit prime |

The Ed25519 curve is birationally equivalent to the Curve25519 curve.
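The conditions in the left column can be checked numerically for the Ed25519 values. The sketch below is written for clarity, not speed or side-channel safety. The decimal value of $\ell$ is the standard Ed25519 group order, which the slide does not spell out, and the helper names are assumptions.

```python
b = 256
q = 2**255 - 19
ell = 2**252 + 27742317777372353535851937790883648493   # order of B (not stated on the slide)
d = (-121665 * pow(121666, -1, q)) % q                   # pow(x, -1, q) needs Python 3.8+

def inv(x: int) -> int:
    return pow(x % q, -1, q)

def is_square(x: int) -> bool:
    """Euler's criterion in F_q."""
    return pow(x % q, (q - 1) // 2, q) in (0, 1)

def on_curve(P) -> bool:
    x, y = P
    return (-x * x + y * y - 1 - d * x * x * y * y) % q == 0

def recover_even_x(y: int) -> int:
    """Solve -x^2 + y^2 = 1 + d x^2 y^2 for the even root x."""
    xx = (y * y - 1) * inv(d * y * y + 1) % q
    x = pow(xx, (q + 3) // 8, q)              # works because q = 5 (mod 8)
    if (x * x - xx) % q != 0:
        x = x * pow(2, (q - 1) // 4, q) % q   # multiply by a square root of -1
    if (x * x - xx) % q != 0:
        raise ValueError("y is not the coordinate of a curve point")
    return x if x % 2 == 0 else q - x

def add(P, Q):
    """Complete twisted Edwards addition (a = -1)."""
    (x1, y1), (x2, y2) = P, Q
    k = d * x1 * x2 * y1 * y2
    return ((x1 * y2 + x2 * y1) * inv(1 + k) % q,
            (y1 * y2 + x1 * x2) * inv(1 - k) % q)

def scalarmult(P, n: int):
    """Plain double-and-add; fine here because n is public."""
    result = (0, 1)                           # neutral element
    while n > 0:
        if n & 1:
            result = add(result, P)
        P = add(P, P)
        n >>= 1
    return result

By = 4 * inv(5) % q
B = (recover_even_x(By), By)

print(q % 4 == 1, not is_square(d), on_curve(B))
print(2**(b - 4) < ell < 2**(b - 3), scalarmult(B, ell) == (0, 1))
```

Because $d$ is a non-square, the denominators $1 \pm d x_1 x_2 y_1 y_2$ never vanish, which is what makes the single `add` formula usable for every pair of curve points, including doublings and the neutral element.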

EdDSA keys

- Secret key: $b$-bit string $k$
- Compute $H(k) = (h_0, \dots, h_{2b-1})$
- Derive integer $a = 2^{b-2} + \sum_{3 \le i \le b-3} 2^i h_i$
- Note that $a$ is a multiple of 8
- Compute $A = aB$
- Public key: Encoding $\underline{A}$ of $A = (x_A, y_A)$ as $y_A$ and one (parity) bit of $x_A$ (needs $b$ bits)
- Compute $A$ from $\underline{A}$: $x_A = \pm\sqrt{(y_A^2 - 1)/(d y_A^2 + 1)}$
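The key-generation steps can be sketched for Ed25519 as follows. This is an illustrative, non-constant-time sketch: function names are assumptions, the curve arithmetic uses slow affine formulas, and bit $h_i$ is taken as bit $i$ of the hash output read as a little-endian integer.

```python
import hashlib

b = 256
q = 2**255 - 19
d = (-121665 * pow(121666, -1, q)) % q       # pow(x, -1, q) needs Python 3.8+

def inv(x: int) -> int:
    return pow(x % q, -1, q)

def recover_x(y: int, parity: int) -> int:
    """x_A = +-sqrt((y^2 - 1)/(d y^2 + 1)), choosing the root with the given parity."""
    xx = (y * y - 1) * inv(d * y * y + 1) % q
    x = pow(xx, (q + 3) // 8, q)
    if (x * x - xx) % q != 0:
        x = x * pow(2, (q - 1) // 4, q) % q   # multiply by a square root of -1
    if (x * x - xx) % q != 0 or (x == 0 and parity == 1):
        raise ValueError("invalid point encoding")
    return x if x % 2 == parity else q - x

def add(P, Q):
    (x1, y1), (x2, y2) = P, Q
    k = d * x1 * x2 * y1 * y2
    return ((x1 * y2 + x2 * y1) * inv(1 + k) % q,
            (y1 * y2 + x1 * x2) * inv(1 - k) % q)

def scalarmult(P, n: int):
    """Double-and-add; branches on bits of n, so NOT constant time."""
    result = (0, 1)
    while n > 0:
        if n & 1:
            result = add(result, P)
        P = add(P, P)
        n >>= 1
    return result

By = 4 * inv(5) % q
B = (recover_x(By, 0), By)

def secret_scalar(k: bytes) -> int:
    """a = 2^(b-2) + sum over 3 <= i <= b-3 of 2^i h_i."""
    h_int = int.from_bytes(hashlib.sha512(k).digest(), "little")
    return 2**(b - 2) + sum(2**i * ((h_int >> i) & 1) for i in range(3, b - 2))

def encode_point(P) -> bytes:
    """y in the low b-1 bits, parity of x in the top bit."""
    x, y = P
    return (y | ((x & 1) << (b - 1))).to_bytes(b // 8, "little")

def decode_point(s: bytes):
    v = int.from_bytes(s, "little")
    y, parity = v & ((1 << (b - 1)) - 1), v >> (b - 1)
    if y >= q:
        raise ValueError("invalid point encoding")
    return (recover_x(y, parity), y)

def public_key(k: bytes) -> bytes:
    return encode_point(scalarmult(B, secret_scalar(k)))

k = bytes(32)                                 # all-zero secret key, for demonstration only
print(secret_scalar(k) % 8, len(public_key(k)))
```

The sum starts at $i = 3$, so the three lowest bits of $a$ are zero (hence the multiple of 8), and the fixed $2^{b-2}$ term pins the top bit so every $a$ has the same bit length.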

EdDSA signatures

Signing

- Message $M$ determines $r = H(h_b, \dots, h_{2b-1}, M) \in \{0, \dots, 2^{2b} - 1\}$
- Define $R = rB$
- Define $S = (r + H(\underline{R}, \underline{A}, M)\,a) \bmod \ell$
- Signature: $(\underline{R}, \underline{S})$, with $\underline{S}$ the $b$-bit little-endian encoding of $S$
- $(\underline{R}, \underline{S})$ has $2b$ bits (3 known to be zero)

Verification

- Verifier parses $A$ from $\underline{A}$ and $R$ from $\underline{R}$
- Computes $H(\underline{R}, \underline{A}, M)$
- Checks group equation

    $8SB = 8R + 8H(\underline{R}, \underline{A}, M)A$

- Rejects if parsing fails or equation does not hold
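Signing and verification as described above can be sketched end to end for Ed25519-SHA-512. This is a slow, non-constant-time illustration under stated assumptions: the function names are invented here, affine coordinates with one inversion per denominator are used for simplicity, and $r$ is reduced mod $\ell$ before the scalar multiplication (valid because $\ell B = (0, 1)$). It is not a replacement for a vetted library.

```python
import hashlib

b = 256
q = 2**255 - 19
ell = 2**252 + 27742317777372353535851937790883648493
d = (-121665 * pow(121666, -1, q)) % q        # pow(x, -1, q) needs Python 3.8+

def inv(x): return pow(x % q, -1, q)

def recover_x(y, parity):
    xx = (y * y - 1) * inv(d * y * y + 1) % q
    x = pow(xx, (q + 3) // 8, q)
    if (x * x - xx) % q != 0:
        x = x * pow(2, (q - 1) // 4, q) % q
    if (x * x - xx) % q != 0 or (x == 0 and parity == 1):
        raise ValueError("invalid point encoding")
    return x if x % 2 == parity else q - x

def add(P, Q):
    (x1, y1), (x2, y2) = P, Q
    k = d * x1 * x2 * y1 * y2
    return ((x1 * y2 + x2 * y1) * inv(1 + k) % q,
            (y1 * y2 + x1 * x2) * inv(1 - k) % q)

def scalarmult(P, n):
    result = (0, 1)                # NOT constant time: branches on bits of n
    while n > 0:
        if n & 1:
            result = add(result, P)
        P = add(P, P)
        n >>= 1
    return result

By = 4 * inv(5) % q
B = (recover_x(By, 0), By)

def encode_point(P):
    return (P[1] | ((P[0] & 1) << (b - 1))).to_bytes(b // 8, "little")

def decode_point(s):
    v = int.from_bytes(s, "little")
    y, parity = v & ((1 << (b - 1)) - 1), v >> (b - 1)
    if y >= q:
        raise ValueError("invalid point encoding")
    return (recover_x(y, parity), y)

def Hint(data):
    return int.from_bytes(hashlib.sha512(data).digest(), "little")

def expand(k):
    """Return (a, prefix) where prefix holds the bits h_b, ..., h_{2b-1}."""
    h = hashlib.sha512(k).digest()
    h_int = int.from_bytes(h, "little")
    a = 2**(b - 2) + sum(2**i * ((h_int >> i) & 1) for i in range(3, b - 2))
    return a, h[b // 8:]

def public_key(k):
    return encode_point(scalarmult(B, expand(k)[0]))

def sign(M, k):
    a, prefix = expand(k)
    A_enc = encode_point(scalarmult(B, a))
    r = Hint(prefix + M)                              # deterministic session key
    R_enc = encode_point(scalarmult(B, r % ell))      # R = rB
    S = (r + Hint(R_enc + A_enc + M) * a) % ell
    return R_enc + S.to_bytes(b // 8, "little")

def verify(M, sig, A_enc):
    if len(sig) != b // 4 or len(A_enc) != b // 8:
        return False
    try:
        R, A = decode_point(sig[:b // 8]), decode_point(A_enc)
    except ValueError:
        return False                                  # parsing failed
    S = int.from_bytes(sig[b // 8:], "little")
    h = Hint(sig[:b // 8] + A_enc + M)
    return scalarmult(B, 8 * S) == add(scalarmult(R, 8), scalarmult(A, 8 * h))

k = bytes(range(32))                                  # demo secret key only
sig = sign(b"hello", k)
print(verify(b"hello", sig, public_key(k)))
```

Since $S < \ell < 2^{253}$, the top three bits of the 256-bit encoding of $S$ are always zero, which is the "3 known to be zero" remark on the slide.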

EdDSA and Ed25519 security

Collision resilience

- ECDSA uses $H(M)$
- Collisions in $H$ allow existential forgery
- Schnorr signatures and EdDSA include $R$ in the hash
  - Schnorr: $H(R, M)$
  - EdDSA: $H(\underline{R}, \underline{A}, M)$
- Signatures are hash-function-collision resilient
- Including $\underline{A}$ alleviates concerns about attacks against multiple keys

Foolproof session keys

- Each message needs a different, hard-to-predict $r$ ("session key")
- Just knowing a few bits of $r$ for many signatures allows an attacker to recover $a$
- Usual approach (e.g., Schnorr signatures): Choose random $r$ for each message
- Potential problems: Bad random-number generators, off-by-one(-byte) bugs
- Even worse: No random-number generator: Sony's PS3 security disaster
- EdDSA uses deterministic, pseudo-random session keys $H(h_b, \dots, h_{2b-1}, M)$
- Same security as random $r$ under standard PRF assumptions
- Does not consume per-message randomness
- Better for testing (deterministic output)
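To see why a repeated $r$ is fatal, take two signatures with $S_i = (r + h_i a) \bmod \ell$ that share the same $r$: subtracting gives $a = (S_1 - S_2)(h_1 - h_2)^{-1} \bmod \ell$. The sketch below works on the scalar equations only; the concrete numbers and function names are made up for illustration, and the PS3 case involved ECDSA, where the algebra differs in detail but the failure is of the same kind.

```python
import hashlib

ell = 2**252 + 27742317777372353535851937790883648493

def recover_secret(S1: int, h1: int, S2: int, h2: int) -> int:
    """Recover a from two signatures that reused r: S_i = r + h_i * a (mod ell)."""
    return (S1 - S2) * pow((h1 - h2) % ell, -1, ell) % ell   # needs Python 3.8+

def session_key(prefix: bytes, M: bytes) -> int:
    """EdDSA-style r = H(h_b, ..., h_{2b-1}, M); prefix stands for the upper hash half."""
    return int.from_bytes(hashlib.sha512(prefix + M).digest(), "little")

# Attack on a reused session key (all values hypothetical).
a, r = 123456789, 987654321
h1, h2 = 1111, 2222                      # stand-ins for the two hash values
S1, S2 = (r + h1 * a) % ell, (r + h2 * a) % ell
print(recover_secret(S1, h1, S2, h2) == a)

# Deterministic derivation: same message gives the same r, different messages differ.
prefix = bytes(32)                       # demo value only
print(session_key(prefix, b"m1") == session_key(prefix, b"m1"))
print(session_key(prefix, b"m1") != session_key(prefix, b"m2"))
```

Deriving $r$ from the secret prefix and the message removes the random-number generator from signing altogether, so the reuse scenario cannot arise from a faulty RNG.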

Constant-time implementation: Avoiding secret branch conditions

- Many scalar-multiplication algorithms contain parts like

    if(s) do A else do B

  where s is a part (e.g., a bit) of the secret scalar
- Program takes a different amount of time depending on the value of s
- This is true even if A and B take the same amount of time!
- Reason: Branch predictors contained in all modern CPUs
- Attacker can gain information about the secret scalar by timing the execution of the program
- In 2011, Brumley and Tuveri recovered the OpenSSL ECDSA secret signing key through such a timing attack
- Ed25519 software does not contain any secret branch conditions
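A common way to remove a secret branch is to compute both alternatives and pick one with a bit mask. The sketch below shows only the data flow on 64-bit words; the function names are assumptions, and Python itself gives no timing guarantees, so this illustrates the pattern that C or assembly code would use rather than a hardened implementation.

```python
MASK64 = (1 << 64) - 1

def select_branching(s: int, a: int, b: int) -> int:
    """Leaky version: control flow depends on the secret bit s."""
    return a if s else b

def select_masked(s: int, a: int, b: int) -> int:
    """Return a if s == 1 else b, with no branch on s (64-bit words)."""
    mask = (-s) & MASK64          # all ones if s == 1, all zeros if s == 0
    return b ^ (mask & (a ^ b))

# Both alternatives are always available; the secret bit only drives the mask.
result_A, result_B = 0x1111, 0x2222
for s in (0, 1):
    print(s, hex(select_masked(s, result_A, result_B)))
```

The price is that both A and B are computed every time, which is the usual trade of a little speed for a control flow that does not depend on the secret scalar.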

Constant-time implementation: Avoiding secret lookup indices

I In particular, fixed-basepoint scalar-multiplication algorithms contain steps like P += precomputed_points[s], where s is a part (e.g., a bit) of the secret scalar

I Loading from memory can take a different amount of time depending on the (secret) address s

I Reason: Access to memory is cached; if the data is found in cache, the load is fast (cache hit), otherwise it is slow (cache miss)

I Again: Attacker can gain information about the secret scalar by timing the execution of the program

I In 2005, Osvik, Shamir, and Tromer discovered the AES key used for hard-disk encryption in Linux in just 65 ms using such a cache-timing attack

I Ed25519 software does not perform any loads from secret addresses
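
The idea of avoiding loads from secret addresses can be sketched as follows: read every table entry and combine the entries with masks, so that the sequence of memory addresses touched does not depend on the secret index. This is only an illustration of the data flow. The names `ct_eq_mask` and `ct_lookup` are hypothetical (not taken from the Ed25519 software), entries are assumed to fit in 64 bits, and Python integers themselves give no constant-time guarantee.

```python
MASK64 = (1 << 64) - 1

def ct_eq_mask(a: int, b: int) -> int:
    """Return 2^64 - 1 if a == b, else 0, without branching on a or b."""
    x = (a ^ b) & MASK64
    # x is zero iff a == b; (x | -x) has its top bit set iff x != 0.
    nonzero = ((x | (-x & MASK64)) >> 63) & 1
    return (nonzero - 1) & MASK64

def ct_lookup(table, secret_index: int) -> int:
    """Return table[secret_index] while reading every entry of the table."""
    result = 0
    for j, entry in enumerate(table):
        # The mask is all ones for the wanted entry and zero for all others.
        result |= entry & ct_eq_mask(j, secret_index)
    return result
```

The loop always performs the same loads in the same order; only the arithmetic on the loaded values depends on the secret.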

Speed of Ed25519

Fast arithmetic in $\mathbb{F}_{2^{255}-19}$

Radix $2^{64}$

I Standard: break elements of $\mathbb{F}_{2^{255}-19}$ into 4 64-bit integers

I (Schoolbook) multiplication breaks down into 16 64-bit integer multiplications

I Adding up partial results requires many add-with-carry (adc)

I Westmere bottleneck: 1 adc every two cycles vs. 3 add per cycle

Radix $2^{51}$

I Instead break into 5 64-bit integers, use radix $2^{51}$

I Schoolbook multiplication now 25 64-bit integer multiplications

I Partial results have < 128 bits, adding upper part is add, not adc

I Easy to merge multiplication with reduction (multiplies by 19)

I Better performance on Westmere/Nehalem, worse on 65 nm Core 2 and AMD processors
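
A minimal sketch of the radix-$2^{51}$ representation, assuming field elements are stored as five limbs, least significant first. It shows the 25 limb products and how the reduction is merged into the multiplication: a product that would land at limb position 5 or higher is multiplied by 19 and folded back, because $2^{255} \equiv 19 \pmod{2^{255}-19}$. The function names are illustrative, and Python integers stand in for the 64-bit and 128-bit registers of the assembly implementation.

```python
P = 2**255 - 19
MASK51 = (1 << 51) - 1

def to_limbs(x: int) -> list:
    """Split 0 <= x < 2^255 into five 51-bit limbs, least significant first."""
    return [(x >> (51 * i)) & MASK51 for i in range(5)]

def from_limbs(f) -> int:
    """Recombine limbs into an integer (not necessarily reduced mod P)."""
    return sum(limb << (51 * i) for i, limb in enumerate(f))

def fe_mul(f, g) -> list:
    """Multiply two field elements given as five limbs in radix 2^51."""
    h = [0] * 5
    for i in range(5):
        for j in range(5):
            prod = f[i] * g[j]              # one of the 25 limb products
            if i + j >= 5:
                # 2^(51*(i+j)) = 2^255 * 2^(51*(i+j-5)), and 2^255 = 19 mod P
                h[i + j - 5] += 19 * prod
            else:
                h[i + j] += prod
    # Two carry passes bring the limbs back to roughly 51 bits each.
    for _ in range(2):
        for i in range(4):
            h[i + 1] += h[i] >> 51
            h[i] &= MASK51
        top = h[4] >> 51
        h[4] &= MASK51
        h[0] += 19 * top                    # carry out of limb 4 wraps around times 19
    return h
```

With 51-bit inputs, each accumulated limb is a sum of five terms below $19 \cdot 2^{102}$, which stays well under 128 bits; this is why plain additions suffice.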

Fast signing

I Main computational task: Compute $R = rB$

I First compute $r \bmod \ell$, write it as $r_0 + 16 r_1 + \dots + 16^{63} r_{63}$, with

$r_i \in \{-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7\}$

I Precompute $16^i |r_i| B$ for $i = 0, \dots, 63$ and $|r_i| \in \{1, \dots, 8\}$, in a lookup table at compile time

I Compute $R = \sum_{i=0}^{63} 16^i r_i B$

I 64 table lookups, 64 conditional point negations, 63 point additions

I Wait, table lookups?

I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one

I Signing takes 87548 cycles on an Intel Westmere CPU

I Key generation takes about 6000 cycles more (read from /dev/urandom)
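
The signed radix-16 recoding of the scalar can be sketched as follows. This is an illustrative routine rather than the code of the Ed25519 software: it starts from the ordinary 4-bit digits and pushes a carry upwards whenever a digit is 8 or larger. It assumes $0 \le r < 2^{253}$, which holds for any $r$ reduced modulo $\ell$.

```python
def signed_radix16(r: int) -> list:
    """Write 0 <= r < 2^253 as sum(16^i * d[i]) with 64 digits d[i] in {-8, ..., 7}."""
    digits = [(r >> (4 * i)) & 15 for i in range(64)]   # unsigned digits 0..15
    carry = 0
    for i in range(64):
        d = digits[i] + carry        # 0..16
        carry = (d + 8) >> 4         # 1 if d >= 8, else 0
        digits[i] = d - (carry << 4) # now in -8..7
    # For r < 2^253 the top digit is at most 2, so no carry is left over.
    assert carry == 0
    return digits
```

Each digit $r_i$ then selects the table entry for $|r_i|$ at position $i$, and the sign of $r_i$ decides the conditional point negation.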

Fast verification

I First part: point decompression, compute x coordinate $x_R$ of $R$ as $x_R = \pm\sqrt{(y_R^2 - 1)/(d y_R^2 + 1)}$

I Looks like a square root and an inversion is required

I As $q \equiv 5 \pmod 8$, for each square $\alpha$ we have $\alpha^2 = \beta^4$, with $\beta = \alpha^{(q+3)/8}$

I Standard: Compute $\beta$, conditionally multiply by $\sqrt{-1}$ if $\beta^2 = -\alpha$

I Decompression has $\alpha = u/v$, merge square root with inversion:

$\beta = (u/v)^{(q+3)/8} = u^{(q+3)/8} v^{q-1-(q+3)/8} = u^{(q+3)/8} v^{(7q-11)/8} = u v^3 (u v^7)^{(q-5)/8}$.

I Second part: computation of SB − H(R,A,M)A

I Double-scalar multiplication using signed sliding windows

I Different window sizes for B (compile time) and A (run time)

I Verification takes 273364 cycles
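
The merged square root and inversion can be checked with a short sketch, assuming $q = 2^{255} - 19$ and $v \ne 0$. The function name `sqrt_ratio` is illustrative. The constant for $\sqrt{-1}$ is computed as $2^{(q-1)/4}$, which works because 2 is a non-square modulo $q$. The branches are harmless here since verification handles only public data.

```python
q = 2**255 - 19
SQRT_M1 = pow(2, (q - 1) // 4, q)    # a square root of -1 modulo q

def sqrt_ratio(u: int, v: int):
    """Return x with v * x^2 == u (mod q), or None if u/v is not a square."""
    v3 = v * v % q * v % q
    v7 = v3 * v3 % q * v % q
    # beta = u * v^3 * (u * v^7)^((q-5)/8): a single exponentiation, no inversion
    beta = u * v3 % q * pow(u * v7 % q, (q - 5) // 8, q) % q
    check = v * beta % q * beta % q  # equals u or -u when u/v is a square
    if check == u % q:
        return beta
    if check == (-u) % q:
        return beta * SQRT_M1 % q    # conditional multiplication by sqrt(-1)
    return None
```

For decompression one would call it with $u = y_R^2 - 1$ and $v = d y_R^2 + 1$ and then pick the sign of the result according to the encoded sign bit.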

Faster batch verification

I Verify a batch of $(M_i, A_i, R_i, S_i)$, where $(R_i, S_i)$ is the alleged signature of $M_i$ under key $A_i$

I Choose independent uniform random 128-bit integers $z_i$

I Compute $H_i = H(R_i, A_i, M_i)$

I Verify the equation

$\left(-\sum_i z_i S_i \bmod \ell\right) B + \sum_i z_i R_i + \sum_i (z_i H_i \bmod \ell) A_i = 0$

I Use Bos-Coster algorithm for multi-scalar multiplication

I Verifying a batch of 64 valid signatures takes 8.55 million cycles (i.e., < 134000 cycles/signature)
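
A short derivation of why a batch of valid signatures satisfies this equation, assuming the single-signature check is written as $S_i B - H_i A_i = R_i$ and that all points involved lie in the subgroup of order $\ell$ (so scalars may be reduced modulo $\ell$):

```latex
\begin{align*}
S_i B &= R_i + H_i A_i && \text{for each valid signature } i \\
z_i S_i B &= z_i R_i + z_i H_i A_i && \text{multiply by the random } z_i \\
\Bigl(\sum_i z_i S_i\Bigr) B &= \sum_i z_i R_i + \sum_i z_i H_i A_i && \text{sum over the batch} \\
0 &= \Bigl(-\sum_i z_i S_i \bmod \ell\Bigr) B + \sum_i z_i R_i + \sum_i (z_i H_i \bmod \ell) A_i
\end{align*}
```

The random $z_i$ are what make the converse plausible: without them, errors in two forged signatures could be chosen to cancel in the sum.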

The Bos-Coster algorithm

I Computation of $Q = \sum_{i=1}^{n} s_i P_i$

I Idea: Assume $s_1 > s_2 > \dots > s_n$. Recursively compute $Q = (s_1 - s_2) P_1 + s_2 (P_1 + P_2) + s_3 P_3 + \dots + s_n P_n$

I Each step requires the two largest scalars, one scalar subtraction and one point addition

I Each step “eliminates” expected log n scalar bits

I Requires fast access to the two largest scalars: put scalars into a heap

I Crucial for good performance: fast heap implementation

A fast heap

I Heap is a binary tree, each parent node is larger than the two child nodes

I Data structure is stored as a simple array, positions in the array determine positions in the tree

I Root is at position 0, left child node at position 1, right child node at position 2 etc.

I For node at position $i$, child nodes are at position $2i + 1$ and $2i + 2$, parent node is at position $\lfloor (i - 1)/2 \rfloor$

I Typical heap root replacement (pop operation): start at the root, swap down a variable number of times

I Floyd’s heap: swap down to the bottom, swap up a variable number of times, advantages:

I Each swap-down step needs only one comparison (instead of two)

I Swap-down loop is more friendly to branch predictors
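
Floyd's variant of the pop operation can be sketched on an array-based max-heap with the index layout described above. This is a plain Python illustration under those assumptions, not the assembly-level heap of the Ed25519 software, and the name `floyd_pop_max` is hypothetical.

```python
def floyd_pop_max(heap: list):
    """Remove and return the largest element of a non-empty array-based max-heap."""
    top = heap[0]
    last = heap.pop()                 # element that has to be re-inserted
    n = len(heap)
    if n == 0:
        return top
    # Swap down: move the hole at the root all the way to the bottom, always
    # promoting the larger child. One comparison per level (child against child).
    i = 0
    while 2 * i + 1 < n:
        child = 2 * i + 1
        if child + 1 < n and heap[child + 1] > heap[child]:
            child += 1
        heap[i] = heap[child]
        i = child
    # Swap up: put `last` into the hole and move it up while it beats its parent.
    heap[i] = last
    while i > 0 and heap[(i - 1) // 2] < heap[i]:
        parent = (i - 1) // 2
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent
    return top
```

The swap-down loop always runs to the bottom of the tree, so its trip count depends only on the heap size; the data-dependent part is the usually short swap-up loop.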

The Bos-Coster algorithm (continued)

I Further optimization: Start with heap without the $z_i$ until largest scalar has ≤ 128 bits

I Then: extend heap with the $z_i$

I Optimize the heap on the assembly level
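
The Bos-Coster recursion can be sketched for a generic additive group. In this sketch, Python's standard `heapq` module stands in for the hand-optimized heap, the group is passed in as an `add` function and a `zero` element (both hypothetical parameters), and the final single term is finished with ordinary double-and-add. None of this is taken from the Ed25519 software; it only illustrates the identity $s_1 P_1 + s_2 P_2 = (s_1 - s_2) P_1 + s_2 (P_1 + P_2)$.

```python
import heapq

def bos_coster(scalars, points, add, zero):
    """Compute sum(s_i * P_i) for non-negative integer scalars s_i."""
    # heapq is a min-heap, so scalars are stored negated to pop the largest first.
    # The unique index k breaks ties so that points are never compared.
    heap = [(-s, k, P) for k, (s, P) in enumerate(zip(scalars, points)) if s > 0]
    heapq.heapify(heap)
    while len(heap) > 1:
        neg_s1, k1, P1 = heapq.heappop(heap)    # largest scalar
        neg_s2, k2, P2 = heapq.heappop(heap)    # second-largest scalar
        s1, s2 = -neg_s1, -neg_s2
        # One scalar subtraction and one point addition per step.
        heapq.heappush(heap, (-s2, k2, add(P1, P2)))
        if s1 != s2:
            heapq.heappush(heap, (-(s1 - s2), k1, P1))
    if not heap:
        return zero
    neg_s, _, P = heap[0]
    s, result = -neg_s, zero
    while s > 0:                                # finish the last term s * P
        if s & 1:
            result = add(result, P)
        P = add(P, P)
        s >>= 1
    return result
```

For example, `bos_coster([13, 7], [3, 10], lambda a, b: a + b, 0)` uses the integers under addition as a toy group and returns `13*3 + 7*10`. The sketch is slow when one scalar is far larger than all others, since each step then removes only a small amount from it.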

Results

I New fast and secure signature scheme

I (Slow) C and Python reference implementations

I Fast AMD64 assembly implementations

I Also new speed records for Curve25519 ECDH

I All software in the public domain and included in eBATS

I All reported benchmarks (except batch verification) are eBATS benchmarks

I All reported benchmarks had TurboBoost switched off

I Software to be included in the NaCl library

http://ed25519.cr.yp.to/ http://nacl.cr.yp.to/
