Eddsa Signatures and Ed25519
EdDSA signatures and Ed25519
Peter Schwabe
Joint work with Daniel J. Bernstein, Niels Duif, Tanja Lange, and Bo-Yin Yang
February 20, 2012
Coding Theory and Cryptography Seminar, University of Basel I 99 summits over 3000 meters (highest peak: 3952 m)
I Wildlife includes black bears, salmon, monkeys...
I Academia Sinica is a research facility funded by ROC
I About 30 institutes
I About 800 principal investigators, more than 750 postdocs
A few words about Taiwan and Academia Sinica
I Taiwan (台灣) is an island south of China 2 I About 36,200 km large
I Territory of the Republic of China (not to be confused with the People’s Republic of China)
I Capital is Taipei (台北)
I Marine tropical climate
EdDSA signatures and Ed255192 I Academia Sinica is a research facility funded by ROC
I About 30 institutes
I About 800 principal investigators, more than 750 postdocs
A few words about Taiwan and Academia Sinica
I Taiwan (台灣) is an island south of China 2 I About 36,200 km large
I Territory of the Republic of China (not to be confused with the People’s Republic of China)
I Capital is Taipei (台北)
I Marine tropical climate
I 99 summits over 3000 meters (highest peak: 3952 m)
I Wildlife includes black bears, salmon, monkeys...
EdDSA signatures and Ed255192 A few words about Taiwan and Academia Sinica
I Taiwan (台灣) is an island south of China 2 I About 36,200 km large
I Territory of the Republic of China (not to be confused with the People’s Republic of China)
I Capital is Taipei (台北)
I Marine tropical climate
I 99 summits over 3000 meters (highest peak: 3952 m)
I Wildlife includes black bears, salmon, monkeys...
I Academia Sinica is a research facility funded by ROC
I About 30 institutes
I About 800 principal investigators, more than 750 postdocs
EdDSA signatures and Ed255192 Introduction – the NaCl library
EdDSA signatures and Ed255193 I the stream cipher Salsa20,
I the Poly1305 secret-key authenticator, and
I Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
I Aim of this library: High-speed, high-security, easy-to-use cryptographic protection for network communication
I We are willing to sacrifice compatability to other crypto libraries I At the end of 2010 the library contained
I This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
I This serves the typical one-to-one communication of most internet connections
I Still required at the end of 2010: One-to-many authentication, i.e. cryptographic signatures
How it started
I My research during Ph.D. was within the European project CACE (Computer Aided Cryptography Engineering)
I One of the deliverables: Networking and Cryptography Library (NaCl, pronounced “salt”)
EdDSA signatures and Ed255194 I the stream cipher Salsa20,
I the Poly1305 secret-key authenticator, and
I Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
I We are willing to sacrifice compatability to other crypto libraries I At the end of 2010 the library contained
I This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
I This serves the typical one-to-one communication of most internet connections
I Still required at the end of 2010: One-to-many authentication, i.e. cryptographic signatures
How it started
I My research during Ph.D. was within the European project CACE (Computer Aided Cryptography Engineering)
I One of the deliverables: Networking and Cryptography Library (NaCl, pronounced “salt”)
I Aim of this library: High-speed, high-security, easy-to-use cryptographic protection for network communication
EdDSA signatures and Ed255194 I the stream cipher Salsa20,
I the Poly1305 secret-key authenticator, and
I Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
I At the end of 2010 the library contained
I This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
I This serves the typical one-to-one communication of most internet connections
I Still required at the end of 2010: One-to-many authentication, i.e. cryptographic signatures
How it started
I My research during Ph.D. was within the European project CACE (Computer Aided Cryptography Engineering)
I One of the deliverables: Networking and Cryptography Library (NaCl, pronounced “salt”)
I Aim of this library: High-speed, high-security, easy-to-use cryptographic protection for network communication
I We are willing to sacrifice compatability to other crypto libraries
EdDSA signatures and Ed255194 I This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
I This serves the typical one-to-one communication of most internet connections
I Still required at the end of 2010: One-to-many authentication, i.e. cryptographic signatures
How it started
I My research during Ph.D. was within the European project CACE (Computer Aided Cryptography Engineering)
I One of the deliverables: Networking and Cryptography Library (NaCl, pronounced “salt”)
I Aim of this library: High-speed, high-security, easy-to-use cryptographic protection for network communication
I We are willing to sacrifice compatability to other crypto libraries I At the end of 2010 the library contained
I the stream cipher Salsa20,
I the Poly1305 secret-key authenticator, and
I Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
EdDSA signatures and Ed255194 I Still required at the end of 2010: One-to-many authentication, i.e. cryptographic signatures
How it started
I My research during Ph.D. was within the European project CACE (Computer Aided Cryptography Engineering)
I One of the deliverables: Networking and Cryptography Library (NaCl, pronounced “salt”)
I Aim of this library: High-speed, high-security, easy-to-use cryptographic protection for network communication
I We are willing to sacrifice compatability to other crypto libraries I At the end of 2010 the library contained
I the stream cipher Salsa20,
I the Poly1305 secret-key authenticator, and
I Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
I This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
I This serves the typical one-to-one communication of most internet connections
EdDSA signatures and Ed255194 How it started
I My research during Ph.D. was within the European project CACE (Computer Aided Cryptography Engineering)
I One of the deliverables: Networking and Cryptography Library (NaCl, pronounced “salt”)
I Aim of this library: High-speed, high-security, easy-to-use cryptographic protection for network communication
I We are willing to sacrifice compatability to other crypto libraries I At the end of 2010 the library contained
I the stream cipher Salsa20,
I the Poly1305 secret-key authenticator, and
I Curve25519 elliptic-curve Diffie-Hellman key-exchange software.
I This is wrapped in a crypto_box API that performs high-security public-key authenticated encryption
I This serves the typical one-to-one communication of most internet connections
I Still required at the end of 2010: One-to-many authentication, i.e. cryptographic signatures
EdDSA signatures and Ed255194 I Conventional wisdom: ECC is faster than anything based on ∗ factoring or the DLP in Zn I (Twisted) Edwards curves support very fast arithmetic
I Edwards addition is complete (important for secure implementations)
I Curve25519 has an Edwards representation and offers very high security
I Looks like “some” signature scheme using Edwards arithmetic on Curve25519 is a good choice
Designing a public-key signature scheme
I Core requirements: 128-bit security, fast signing, fast verification, secure software implementation
I Obvious candidates: RSA, ElGamal, DSA, ECDSA, Schnorr...
EdDSA signatures and Ed255195 I Looks like “some” signature scheme using Edwards arithmetic on Curve25519 is a good choice
Designing a public-key signature scheme
I Core requirements: 128-bit security, fast signing, fast verification, secure software implementation
I Obvious candidates: RSA, ElGamal, DSA, ECDSA, Schnorr...
I Conventional wisdom: ECC is faster than anything based on ∗ factoring or the DLP in Zn I (Twisted) Edwards curves support very fast arithmetic
I Edwards addition is complete (important for secure implementations)
I Curve25519 has an Edwards representation and offers very high security
EdDSA signatures and Ed255195 Designing a public-key signature scheme
I Core requirements: 128-bit security, fast signing, fast verification, secure software implementation
I Obvious candidates: RSA, ElGamal, DSA, ECDSA, Schnorr...
I Conventional wisdom: ECC is faster than anything based on ∗ factoring or the DLP in Zn I (Twisted) Edwards curves support very fast arithmetic
I Edwards addition is complete (important for secure implementations)
I Curve25519 has an Edwards representation and offers very high security
I Looks like “some” signature scheme using Edwards arithmetic on Curve25519 is a good choice
EdDSA signatures and Ed255195 I Verification speed primarily matters in applications that need to verify many signatures
I Idea: To get close to RSA verification speed, support batch verification
I Easier: Verify batches of signatures under the same public key
I Harder (but much more useful!): Verify batches of signatures under different public keys
I We don’t know where the NaCl library is used, so support the latter
I None of the above-mentioned schemes supports fast batch verification
I Schnorr signatures only require small changes (and have many nice features anyways) ⇒ Start with Schnorr signatures, modify as required
One step back: Is ECC really faster than, e.g., RSA?
I RSA with public exponent e = 3 can verify signatures with just one modular multiplication and one squaring
I Very hard to beat with any elliptic-curve-based signature scheme
EdDSA signatures and Ed255196 I Easier: Verify batches of signatures under the same public key
I Harder (but much more useful!): Verify batches of signatures under different public keys
I We don’t know where the NaCl library is used, so support the latter
I None of the above-mentioned schemes supports fast batch verification
I Schnorr signatures only require small changes (and have many nice features anyways) ⇒ Start with Schnorr signatures, modify as required
One step back: Is ECC really faster than, e.g., RSA?
I RSA with public exponent e = 3 can verify signatures with just one modular multiplication and one squaring
I Very hard to beat with any elliptic-curve-based signature scheme
I Verification speed primarily matters in applications that need to verify many signatures
I Idea: To get close to RSA verification speed, support batch verification
EdDSA signatures and Ed255196 I None of the above-mentioned schemes supports fast batch verification
I Schnorr signatures only require small changes (and have many nice features anyways) ⇒ Start with Schnorr signatures, modify as required
One step back: Is ECC really faster than, e.g., RSA?
I RSA with public exponent e = 3 can verify signatures with just one modular multiplication and one squaring
I Very hard to beat with any elliptic-curve-based signature scheme
I Verification speed primarily matters in applications that need to verify many signatures
I Idea: To get close to RSA verification speed, support batch verification
I Easier: Verify batches of signatures under the same public key
I Harder (but much more useful!): Verify batches of signatures under different public keys
I We don’t know where the NaCl library is used, so support the latter
EdDSA signatures and Ed255196 ⇒ Start with Schnorr signatures, modify as required
One step back: Is ECC really faster than, e.g., RSA?
I RSA with public exponent e = 3 can verify signatures with just one modular multiplication and one squaring
I Very hard to beat with any elliptic-curve-based signature scheme
I Verification speed primarily matters in applications that need to verify many signatures
I Idea: To get close to RSA verification speed, support batch verification
I Easier: Verify batches of signatures under the same public key
I Harder (but much more useful!): Verify batches of signatures under different public keys
I We don’t know where the NaCl library is used, so support the latter
I None of the above-mentioned schemes supports fast batch verification
I Schnorr signatures only require small changes (and have many nice features anyways)
EdDSA signatures and Ed255196 One step back: Is ECC really faster than, e.g., RSA?
I RSA with public exponent e = 3 can verify signatures with just one modular multiplication and one squaring
I Very hard to beat with any elliptic-curve-based signature scheme
I Verification speed primarily matters in applications that need to verify many signatures
I Idea: To get close to RSA verification speed, support batch verification
I Easier: Verify batches of signatures under the same public key
I Harder (but much more useful!): Verify batches of signatures under different public keys
I We don’t know where the NaCl library is used, so support the latter
I None of the above-mentioned schemes supports fast batch verification
I Schnorr signatures only require small changes (and have many nice features anyways) ⇒ Start with Schnorr signatures, modify as required EdDSA signatures and Ed255196 I Private key: a ∈ {1, . . . , `}, public key: A = −aB
I Sign: Generate secret random r ∈ {1, . . . , `}, compute signature (H(R,M),S) on M with
R = rB S = (r + H(R,M)a) mod `
I Verifier computes R = SB + H(R,M)A and checks that
H(R,M) = H(R,M)
Recall Schnorr signatures
I Variant of ElGamal Signatures
I Many more variants (DSA, ECDSA, KCDSA, ... )
I Uses finite group G = hBi, with |G| = ` t I Uses hash-function H : G × Z → {0,..., 2 − 1} ∗ I Originally: G ≤ Fq , here: consider elliptic-curve group
EdDSA signatures and Ed255197 I Sign: Generate secret random r ∈ {1, . . . , `}, compute signature (H(R,M),S) on M with
R = rB S = (r + H(R,M)a) mod `
I Verifier computes R = SB + H(R,M)A and checks that
H(R,M) = H(R,M)
Recall Schnorr signatures
I Variant of ElGamal Signatures
I Many more variants (DSA, ECDSA, KCDSA, ... )
I Uses finite group G = hBi, with |G| = ` t I Uses hash-function H : G × Z → {0,..., 2 − 1} ∗ I Originally: G ≤ Fq , here: consider elliptic-curve group I Private key: a ∈ {1, . . . , `}, public key: A = −aB
EdDSA signatures and Ed255197 I Verifier computes R = SB + H(R,M)A and checks that
H(R,M) = H(R,M)
Recall Schnorr signatures
I Variant of ElGamal Signatures
I Many more variants (DSA, ECDSA, KCDSA, ... )
I Uses finite group G = hBi, with |G| = ` t I Uses hash-function H : G × Z → {0,..., 2 − 1} ∗ I Originally: G ≤ Fq , here: consider elliptic-curve group I Private key: a ∈ {1, . . . , `}, public key: A = −aB
I Sign: Generate secret random r ∈ {1, . . . , `}, compute signature (H(R,M),S) on M with
R = rB S = (r + H(R,M)a) mod `
EdDSA signatures and Ed255197 Recall Schnorr signatures
I Variant of ElGamal Signatures
I Many more variants (DSA, ECDSA, KCDSA, ... )
I Uses finite group G = hBi, with |G| = ` t I Uses hash-function H : G × Z → {0,..., 2 − 1} ∗ I Originally: G ≤ Fq , here: consider elliptic-curve group I Private key: a ∈ {1, . . . , `}, public key: A = −aB
I Sign: Generate secret random r ∈ {1, . . . , `}, compute signature (H(R,M),S) on M with
R = rB S = (r + H(R,M)a) mod `
I Verifier computes R = SB + H(R,M)A and checks that
H(R,M) = H(R,M)
EdDSA signatures and Ed255197 The EdDSA signature scheme
EdDSA signatures and Ed255198 255 I Prime power q ≡ 1 (mod 4) I q = 2 − 19 (prime)
I (b − 1)-bit encoding of I little-endian encoding of 255 elements of Fq {0,..., 2 − 20}
I Hash function H with 2b-bit I H = SHA-512 output
I Non-square d ∈ Fq I d = −121665/121666
I B ∈ {(x, y) ∈ I B = (x, 4/5), with x “even” 2 2 2 2 Fq ×Fq, −x +y = 1+dx y } (twisted Edwards curve E) b−4 b−3 I prime ` ∈ (2 , 2 ) with I ` a 253-bit prime `B = (0, 1) Ed25519 curve is birationally equivalent to the Curve25519 curve.
EdDSA and Ed25519 parameters
EdDSA Ed25519-SHA-512
I Integer b ≥ 10 I b = 256
EdDSA signatures and Ed255199 I Hash function H with 2b-bit I H = SHA-512 output
I Non-square d ∈ Fq I d = −121665/121666
I B ∈ {(x, y) ∈ I B = (x, 4/5), with x “even” 2 2 2 2 Fq ×Fq, −x +y = 1+dx y } (twisted Edwards curve E) b−4 b−3 I prime ` ∈ (2 , 2 ) with I ` a 253-bit prime `B = (0, 1) Ed25519 curve is birationally equivalent to the Curve25519 curve.
EdDSA and Ed25519 parameters
EdDSA Ed25519-SHA-512
I Integer b ≥ 10 I b = 256 255 I Prime power q ≡ 1 (mod 4) I q = 2 − 19 (prime)
I (b − 1)-bit encoding of I little-endian encoding of 255 elements of Fq {0,..., 2 − 20}
EdDSA signatures and Ed255199 I Non-square d ∈ Fq I d = −121665/121666
I B ∈ {(x, y) ∈ I B = (x, 4/5), with x “even” 2 2 2 2 Fq ×Fq, −x +y = 1+dx y } (twisted Edwards curve E) b−4 b−3 I prime ` ∈ (2 , 2 ) with I ` a 253-bit prime `B = (0, 1) Ed25519 curve is birationally equivalent to the Curve25519 curve.
EdDSA and Ed25519 parameters
EdDSA Ed25519-SHA-512
I Integer b ≥ 10 I b = 256 255 I Prime power q ≡ 1 (mod 4) I q = 2 − 19 (prime)
I (b − 1)-bit encoding of I little-endian encoding of 255 elements of Fq {0,..., 2 − 20}
I Hash function H with 2b-bit I H = SHA-512 output
EdDSA signatures and Ed255199 Ed25519 curve is birationally equivalent to the Curve25519 curve.
EdDSA and Ed25519 parameters
EdDSA Ed25519-SHA-512
I Integer b ≥ 10 I b = 256 255 I Prime power q ≡ 1 (mod 4) I q = 2 − 19 (prime)
I (b − 1)-bit encoding of I little-endian encoding of 255 elements of Fq {0,..., 2 − 20}
I Hash function H with 2b-bit I H = SHA-512 output
I Non-square d ∈ Fq I d = −121665/121666
I B ∈ {(x, y) ∈ I B = (x, 4/5), with x “even” 2 2 2 2 Fq ×Fq, −x +y = 1+dx y } (twisted Edwards curve E) b−4 b−3 I prime ` ∈ (2 , 2 ) with I ` a 253-bit prime `B = (0, 1)
EdDSA signatures and Ed255199 EdDSA and Ed25519 parameters
EdDSA Ed25519-SHA-512
I Integer b ≥ 10 I b = 256 255 I Prime power q ≡ 1 (mod 4) I q = 2 − 19 (prime)
I (b − 1)-bit encoding of I little-endian encoding of 255 elements of Fq {0,..., 2 − 20}
I Hash function H with 2b-bit I H = SHA-512 output
I Non-square d ∈ Fq I d = −121665/121666
I B ∈ {(x, y) ∈ I B = (x, 4/5), with x “even” 2 2 2 2 Fq ×Fq, −x +y = 1+dx y } (twisted Edwards curve E) b−4 b−3 I prime ` ∈ (2 , 2 ) with I ` a 253-bit prime `B = (0, 1) Ed25519 curve is birationally equivalent to the Curve25519 curve.
EdDSA signatures and Ed255199 b−2 P i I Derive integer a = 2 + 3≤i≤b−3 2 hi I Note that a is a multiple of 8
I Compute A = aB
I Public key: Encoding A of A = (xA, yA) as yA and one (parity) bit of xA (needs b bits) p 2 2 I Compute A from A: xA = ± (yA − 1)/(dyA + 1)
EdDSA keys
I Secret key: b-bit string k
I Compute H(k) = (h0, . . . , h2b−1)
EdDSA signatures and Ed25519 10 I Compute A = aB
I Public key: Encoding A of A = (xA, yA) as yA and one (parity) bit of xA (needs b bits) p 2 2 I Compute A from A: xA = ± (yA − 1)/(dyA + 1)
EdDSA keys
I Secret key: b-bit string k
I Compute H(k) = (h0, . . . , h2b−1) b−2 P i I Derive integer a = 2 + 3≤i≤b−3 2 hi I Note that a is a multiple of 8
EdDSA signatures and Ed25519 10 p 2 2 I Compute A from A: xA = ± (yA − 1)/(dyA + 1)
EdDSA keys
I Secret key: b-bit string k
I Compute H(k) = (h0, . . . , h2b−1) b−2 P i I Derive integer a = 2 + 3≤i≤b−3 2 hi I Note that a is a multiple of 8
I Compute A = aB
I Public key: Encoding A of A = (xA, yA) as yA and one (parity) bit of xA (needs b bits)
EdDSA signatures and Ed25519 10 EdDSA keys
I Secret key: b-bit string k
I Compute H(k) = (h0, . . . , h2b−1) b−2 P i I Derive integer a = 2 + 3≤i≤b−3 2 hi I Note that a is a multiple of 8
I Compute A = aB
I Public key: Encoding A of A = (xA, yA) as yA and one (parity) bit of xA (needs b bits) p 2 2 I Compute A from A: xA = ± (yA − 1)/(dyA + 1)
EdDSA signatures and Ed25519 10 Verification
I Verifier parses A from A and R from R
I Computes H(R,A,M)
I Checks group equation
8SB = 8R + 8H(R,A,M)A
I Rejects if parsing fails or equation does not hold
EdDSA signatures
Signing
2b I Message M determines r = H(hb, . . . , h2b−1,M) ∈ {0,..., 2 − 1}
I Define R = rB
I Define S = (r + H(R,A,M)a) mod `
I Signature: (R,S), with S the b-bit little-endian encoding of S
I (R,S) has 2b bits (3 known to be zero)
EdDSA signatures and Ed25519 11 EdDSA signatures
Signing
2b I Message M determines r = H(hb, . . . , h2b−1,M) ∈ {0,..., 2 − 1}
I Define R = rB
I Define S = (r + H(R,A,M)a) mod `
I Signature: (R,S), with S the b-bit little-endian encoding of S
I (R,S) has 2b bits (3 known to be zero)
Verification
I Verifier parses A from A and R from R
I Computes H(R,A,M)
I Checks group equation
8SB = 8R + 8H(R,A,M)A
I Rejects if parsing fails or equation does not hold
EdDSA signatures and Ed25519 11 EdDSA and Ed25519 security
EdDSA signatures and Ed25519 12 I Schnorr: H(R,M)
I EdDSA: H(R,A,M)
I Schnorr signatures and EdDSA include R in the hash
I Signatures are hash-function-collision resilient
I Including A alleviates concerns about attacks against multiple keys
Collision resilience
I ECDSA uses H(M)
I Collisions in H allow existential forgery
EdDSA signatures and Ed25519 13 I Including A alleviates concerns about attacks against multiple keys
Collision resilience
I ECDSA uses H(M)
I Collisions in H allow existential forgery I Schnorr signatures and EdDSA include R in the hash
I Schnorr: H(R,M)
I EdDSA: H(R,A,M)
I Signatures are hash-function-collision resilient
EdDSA signatures and Ed25519 13 Collision resilience
I ECDSA uses H(M)
I Collisions in H allow existential forgery I Schnorr signatures and EdDSA include R in the hash
I Schnorr: H(R,M)
I EdDSA: H(R,A,M)
I Signatures are hash-function-collision resilient
I Including A alleviates concerns about attacks against multiple keys
EdDSA signatures and Ed25519 13 I Potential problems: Bad random-number generators, off-by-one(-byte) bugs
I Even worse: No random-number generator: Sony’s PS3 security disaster
I EdDSA uses deterministic, pseudo-random session keys H(hb, . . . , h2b−1,M)
I Same security as random r under standard PRF assumptions
I Does not consume per-message randomness
I Better for testing (deterministic output)
Foolproof session keys
I Each message needs a different, hard-to-predict r (“session key”)
I Just knowing a few bits of r for many signatures allows to recover a
I Usual approach (e.g., Schnorr signatures): Choose random r for each message
EdDSA signatures and Ed25519 14 I Even worse: No random-number generator: Sony’s PS3 security disaster
I EdDSA uses deterministic, pseudo-random session keys H(hb, . . . , h2b−1,M)
I Same security as random r under standard PRF assumptions
I Does not consume per-message randomness
I Better for testing (deterministic output)
Foolproof session keys
I Each message needs a different, hard-to-predict r (“session key”)
I Just knowing a few bits of r for many signatures allows to recover a
I Usual approach (e.g., Schnorr signatures): Choose random r for each message
I Potential problems: Bad random-number generators, off-by-one(-byte) bugs
EdDSA signatures and Ed25519 14 I EdDSA uses deterministic, pseudo-random session keys H(hb, . . . , h2b−1,M)
I Same security as random r under standard PRF assumptions
I Does not consume per-message randomness
I Better for testing (deterministic output)
Foolproof session keys
I Each message needs a different, hard-to-predict r (“session key”)
I Just knowing a few bits of r for many signatures allows to recover a
I Usual approach (e.g., Schnorr signatures): Choose random r for each message
I Potential problems: Bad random-number generators, off-by-one(-byte) bugs
I Even worse: No random-number generator: Sony’s PS3 security disaster
EdDSA signatures and Ed25519 14 I Same security as random r under standard PRF assumptions
I Does not consume per-message randomness
I Better for testing (deterministic output)
Foolproof session keys
I Each message needs a different, hard-to-predict r (“session key”)
I Just knowing a few bits of r for many signatures allows to recover a
I Usual approach (e.g., Schnorr signatures): Choose random r for each message
I Potential problems: Bad random-number generators, off-by-one(-byte) bugs
I Even worse: No random-number generator: Sony’s PS3 security disaster
I EdDSA uses deterministic, pseudo-random session keys H(hb, . . . , h2b−1,M)
EdDSA signatures and Ed25519 14 Foolproof session keys
I Each message needs a different, hard-to-predict r (“session key”)
I Just knowing a few bits of r for many signatures allows to recover a
I Usual approach (e.g., Schnorr signatures): Choose random r for each message
I Potential problems: Bad random-number generators, off-by-one(-byte) bugs
I Even worse: No random-number generator: Sony’s PS3 security disaster
I EdDSA uses deterministic, pseudo-random session keys H(hb, . . . , h2b−1,M)
I Same security as random r under standard PRF assumptions
I Does not consume per-message randomness
I Better for testing (deterministic output)
EdDSA signatures and Ed25519 14 I Program takes different amount of time depending on the value of s
I This is true, even if A and B take the same amount of time!
I Reason: Branch predictors contained in all modern CPUs
I Attacker can gain information about the secret scalar by timing the execution of the program
I In 2011, Brumley and Tuveri recoverd the OpenSSL ECDSA secret signing key through such a timing attack
I Ed25519 software does not contain any secret branch conditions
Constant-time implementation Avoiding secret branch conditions
I Many scalar-multiplication algorithms contain parts like if(s) do A else do B where s is a part (e.g., a bit) of the secret scalar
EdDSA signatures and Ed25519 15 I This is true, even if A and B take the same amount of time!
I Reason: Branch predictors contained in all modern CPUs
I Attacker can gain information about the secret scalar by timing the execution of the program
I In 2011, Brumley and Tuveri recoverd the OpenSSL ECDSA secret signing key through such a timing attack
I Ed25519 software does not contain any secret branch conditions
Constant-time implementation Avoiding secret branch conditions
I Many scalar-multiplication algorithms contain parts like if(s) do A else do B where s is a part (e.g., a bit) of the secret scalar
I Program takes different amount of time depending on the value of s
EdDSA signatures and Ed25519 15 I Attacker can gain information about the secret scalar by timing the execution of the program
I In 2011, Brumley and Tuveri recoverd the OpenSSL ECDSA secret signing key through such a timing attack
I Ed25519 software does not contain any secret branch conditions
Constant-time implementation Avoiding secret branch conditions
I Many scalar-multiplication algorithms contain parts like if(s) do A else do B where s is a part (e.g., a bit) of the secret scalar
I Program takes different amount of time depending on the value of s
I This is true, even if A and B take the same amount of time!
I Reason: Branch predictors contained in all modern CPUs
EdDSA signatures and Ed25519 15 I In 2011, Brumley and Tuveri recoverd the OpenSSL ECDSA secret signing key through such a timing attack
I Ed25519 software does not contain any secret branch conditions
Constant-time implementation Avoiding secret branch conditions
I Many scalar-multiplication algorithms contain parts like if(s) do A else do B where s is a part (e.g., a bit) of the secret scalar
I Program takes different amount of time depending on the value of s
I This is true, even if A and B take the same amount of time!
I Reason: Branch predictors contained in all modern CPUs
I Attacker can gain information about the secret scalar by timing the execution of the program
EdDSA signatures and Ed25519 15 I Ed25519 software does not contain any secret branch conditions
Constant-time implementation Avoiding secret branch conditions
I Many scalar-multiplication algorithms contain parts like if(s) do A else do B where s is a part (e.g., a bit) of the secret scalar
I Program takes different amount of time depending on the value of s
I This is true, even if A and B take the same amount of time!
I Reason: Branch predictors contained in all modern CPUs
I Attacker can gain information about the secret scalar by timing the execution of the program
I In 2011, Brumley and Tuveri recoverd the OpenSSL ECDSA secret signing key through such a timing attack
EdDSA signatures and Ed25519 15 Constant-time implementation Avoiding secret branch conditions
I Many scalar-multiplication algorithms contain parts like if(s) do A else do B where s is a part (e.g., a bit) of the secret scalar
I Program takes different amount of time depending on the value of s
I This is true, even if A and B take the same amount of time!
I Reason: Branch predictors contained in all modern CPUs
I Attacker can gain information about the secret scalar by timing the execution of the program
I In 2011, Brumley and Tuveri recoverd the OpenSSL ECDSA secret signing key through such a timing attack
I Ed25519 software does not contain any secret branch conditions
EdDSA signatures and Ed25519 15 I Loading from memory can take a different amount of time depending on the (secret) address s
I Reason: Access to memory is cached, if data is found in cache the load is fast (cache hit), otherwise it’s slow
I Again: Attacker can gain information about the secret scalar by timing the exeuction of the program
I In 2005, Osvik, Shamir, and Tromer discovered the AES key used for hard-disk encryption in Linux in just 65 ms using such a cache-timing attack
I Ed25519 software does not perform any loads from secret addresses
Constant-time implementation Avoiding secret lookup indices
I In particular fixed-basepoint scalar-multiplication algorithms contain parts like P += precomputed_points[s] where s is a part (e.g., a bit) of the secret scalar
EdDSA signatures and Ed25519 16 I Again: Attacker can gain information about the secret scalar by timing the exeuction of the program
I In 2005, Osvik, Shamir, and Tromer discovered the AES key used for hard-disk encryption in Linux in just 65 ms using such a cache-timing attack
I Ed25519 software does not perform any loads from secret addresses
Constant-time implementation Avoiding secret lookup indices
I In particular fixed-basepoint scalar-multiplication algorithms contain parts like P += precomputed_points[s] where s is a part (e.g., a bit) of the secret scalar
I Loading from memory can take a different amount of time depending on the (secret) address s
I Reason: Access to memory is cached, if data is found in cache the load is fast (cache hit), otherwise it’s slow
EdDSA signatures and Ed25519 16 I In 2005, Osvik, Shamir, and Tromer discovered the AES key used for hard-disk encryption in Linux in just 65 ms using such a cache-timing attack
I Ed25519 software does not perform any loads from secret addresses
Constant-time implementation Avoiding secret lookup indices
I In particular fixed-basepoint scalar-multiplication algorithms contain parts like P += precomputed_points[s] where s is a part (e.g., a bit) of the secret scalar
I Loading from memory can take a different amount of time depending on the (secret) address s
I Reason: Access to memory is cached, if data is found in cache the load is fast (cache hit), otherwise it’s slow
I Again: Attacker can gain information about the secret scalar by timing the exeuction of the program
EdDSA signatures and Ed25519 16 I Ed25519 software does not perform any loads from secret addresses
Constant-time implementation Avoiding secret lookup indices
I In particular fixed-basepoint scalar-multiplication algorithms contain parts like P += precomputed_points[s] where s is a part (e.g., a bit) of the secret scalar
I Loading from memory can take a different amount of time depending on the (secret) address s
I Reason: Access to memory is cached, if data is found in cache the load is fast (cache hit), otherwise it’s slow
I Again: Attacker can gain information about the secret scalar by timing the exeuction of the program
I In 2005, Osvik, Shamir, and Tromer discovered the AES key used for hard-disk encryption in Linux in just 65 ms using such a cache-timing attack
EdDSA signatures and Ed25519 16 Constant-time implementation Avoiding secret lookup indices
I In particular fixed-basepoint scalar-multiplication algorithms contain parts like P += precomputed_points[s] where s is a part (e.g., a bit) of the secret scalar
I Loading from memory can take a different amount of time depending on the (secret) address s
I Reason: Access to memory is cached, if data is found in cache the load is fast (cache hit), otherwise it’s slow
I Again: Attacker can gain information about the secret scalar by timing the exeuction of the program
I In 2005, Osvik, Shamir, and Tromer discovered the AES key used for hard-disk encryption in Linux in just 65 ms using such a cache-timing attack
I Ed25519 software does not perform any loads from secret addresses
EdDSA signatures and Ed25519 16 Speed of Ed25519
EdDSA signatures and Ed25519 17 Radix 251 51 I Instead break into 5 64-bit integers, use radix 2
I Schoolbook multiplication now 25 64-bit integer multiplications
I Partial results have < 128 bits, adding upper part is add, not adc
I Easy to merge multiplication with reduction (multiplies by 19)
I Better performance on Westmere/Nehalem, worse on 65 nm Core 2 and AMD processors
Fast arithmetic in F2255−19
Radix 264
I Standard: break elements of F2255−19 into 4 64-bit integers I (Schoolbook) multiplication breaks down into 16 64-bit integer multiplications
I Adding up partial results requires many add-with-carry (adc)
I Westmere bottleneck: 1 adc every two cycles vs. 3 add per cycle
EdDSA signatures and Ed25519 18 Fast arithmetic in F2255−19
Radix 264
I Standard: break elements of F2255−19 into 4 64-bit integers I (Schoolbook) multiplication breaks down into 16 64-bit integer multiplications
I Adding up partial results requires many add-with-carry (adc)
I Westmere bottleneck: 1 adc every two cycles vs. 3 add per cycle
Radix 251 51 I Instead break into 5 64-bit integers, use radix 2
I Schoolbook multiplication now 25 64-bit integer multiplications
I Partial results have < 128 bits, adding upper part is add, not adc
I Easy to merge multiplication with reduction (multiplies by 19)
I Better performance on Westmere/Nehalem, worse on 65 nm Core 2 and AMD processors
EdDSA signatures and Ed25519 18 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB
EdDSA signatures and Ed25519 19 i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
EdDSA signatures and Ed25519 19 P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time
EdDSA signatures and Ed25519 19 I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB
EdDSA signatures and Ed25519 19 I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
EdDSA signatures and Ed25519 19 I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
EdDSA signatures and Ed25519 19 I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
EdDSA signatures and Ed25519 19 Fast signing
I Main computational task: Compute R = rB 63 I First compute r mod `, write it as r0 + 16r1 + ··· + 16 r63, with
ri ∈ {−8, −7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}
i I Precompute 16 |ri|B for i = 0,..., 63 and |ri| ∈ {1,..., 8}, in a lookup table at compile time P63 i I Compute R = i=0 16 riB I 64 table lookups, 64 conditional point negations, 63 point additions
I Wait, table lookups?
I In each lookup load all 8 relevant entries from the table, use arithmetic to obtain the desired one
I Signing takes 87548 cycles on an Intel Westmere CPU
I Key generation takes about 6000 cycles more (read from /dev/urandom)
EdDSA signatures and Ed25519 19 = u(q+3)/8vq−1−(q+3)/8 = u(q+3)/8v(7q−11)/8 = uv3(uv7)(q−5)/8.
2 4 I As q ≡ 5 (mod 8) for each square α we have α = β , with β = α(q+3)/8 √ 2 I Standard: Compute β, conditionally multiply by −1 if β = −α
I Decompression has α = u/v, merge square root with inversion:
β = (u/v)(q+3)/8
I Second part: computation of SB − H(R,A,M)A
I Double-scalar multiplication using signed sliding windows
I Different window sizes for B (compile time) and A (run time)
I Verification takes 273364 cycles
Fast verification
I First part: point decompression, compute x coordinate xR of R as q 2 2 xR = ± (yR − 1)/(dyR + 1)
I Looks like a square root and an inversion is required
EdDSA signatures and Ed25519 20 = u(q+3)/8vq−1−(q+3)/8 = u(q+3)/8v(7q−11)/8 = uv3(uv7)(q−5)/8.
I Decompression has α = u/v, merge square root with inversion:
β = (u/v)(q+3)/8
I Second part: computation of SB − H(R,A,M)A
I Double-scalar multiplication using signed sliding windows
I Different window sizes for B (compile time) and A (run time)
I Verification takes 273364 cycles
Fast verification
I First part: point decompression, compute x coordinate xR of R as q 2 2 xR = ± (yR − 1)/(dyR + 1)
I Looks like a square root and an inversion is required 2 4 I As q ≡ 5 (mod 8) for each square α we have α = β , with β = α(q+3)/8 √ 2 I Standard: Compute β, conditionally multiply by −1 if β = −α
EdDSA signatures and Ed25519 20 = u(q+3)/8vq−1−(q+3)/8 = u(q+3)/8v(7q−11)/8 = uv3(uv7)(q−5)/8.
I Second part: computation of SB − H(R,A,M)A
I Double-scalar multiplication using signed sliding windows
I Different window sizes for B (compile time) and A (run time)
I Verification takes 273364 cycles
Fast verification
I First part: point decompression, compute x coordinate xR of R as q 2 2 xR = ± (yR − 1)/(dyR + 1)
I Looks like a square root and an inversion is required 2 4 I As q ≡ 5 (mod 8) for each square α we have α = β , with β = α(q+3)/8 √ 2 I Standard: Compute β, conditionally multiply by −1 if β = −α
I Decompression has α = u/v, merge square root with inversion:
β = (u/v)(q+3)/8
EdDSA signatures and Ed25519 20 I Second part: computation of SB − H(R,A,M)A
I Double-scalar multiplication using signed sliding windows
I Different window sizes for B (compile time) and A (run time)
I Verification takes 273364 cycles
Fast verification
I First part: point decompression, compute x coordinate xR of R as q 2 2 xR = ± (yR − 1)/(dyR + 1)
I Looks like a square root and an inversion is required 2 4 I As q ≡ 5 (mod 8) for each square α we have α = β , with β = α(q+3)/8 √ 2 I Standard: Compute β, conditionally multiply by −1 if β = −α
I Decompression has α = u/v, merge square root with inversion:
β = (u/v)(q+3)/8 = u(q+3)/8vq−1−(q+3)/8 = u(q+3)/8v(7q−11)/8 = uv3(uv7)(q−5)/8.
EdDSA signatures and Ed25519 20 I Verification takes 273364 cycles
Fast verification
I First part: point decompression, compute x coordinate xR of R as q 2 2 xR = ± (yR − 1)/(dyR + 1)
I Looks like a square root and an inversion is required 2 4 I As q ≡ 5 (mod 8) for each square α we have α = β , with β = α(q+3)/8 √ 2 I Standard: Compute β, conditionally multiply by −1 if β = −α
I Decompression has α = u/v, merge square root with inversion:
β = (u/v)(q+3)/8 = u(q+3)/8vq−1−(q+3)/8 = u(q+3)/8v(7q−11)/8 = uv3(uv7)(q−5)/8.
I Second part: computation of SB − H(R,A,M)A
I Double-scalar multiplication using signed sliding windows
I Different window sizes for B (compile time) and A (run time)
EdDSA signatures and Ed25519 20 Fast verification
I First part: point decompression, compute x coordinate xR of R as q 2 2 xR = ± (yR − 1)/(dyR + 1)
I Looks like a square root and an inversion is required 2 4 I As q ≡ 5 (mod 8) for each square α we have α = β , with β = α(q+3)/8 √ 2 I Standard: Compute β, conditionally multiply by −1 if β = −α
I Decompression has α = u/v, merge square root with inversion:
β = (u/v)(q+3)/8 = u(q+3)/8vq−1−(q+3)/8 = u(q+3)/8v(7q−11)/8 = uv3(uv7)(q−5)/8.
I Second part: computation of SB − H(R,A,M)A
I Double-scalar multiplication using signed sliding windows
I Different window sizes for B (compile time) and A (run time)
I Verification takes 273364 cycles
EdDSA signatures and Ed25519 20 I Choose independent uniform random 128-bit integers zi
I Compute Hi = H(Ri,Ai,Mi)
I Verify the equation
X X X − ziSi mod ` B + ziRi + (ziHi mod `)Ai = 0 i i i
I Use Bos-Coster algorithm for multi-scalar multiplication
I Verifying a batch of 64 valid signatures takes 8.55 million cycles (i.e., < 134000 cycles/signature)
Faster batch verification
I Verify a batch of (Mi,Ai,Ri,Si), where (Ri,Si) is the alleged signature of Mi under key Ai
EdDSA signatures and Ed25519 21 I Verify the equation
X X X − ziSi mod ` B + ziRi + (ziHi mod `)Ai = 0 i i i
I Use Bos-Coster algorithm for multi-scalar multiplication
I Verifying a batch of 64 valid signatures takes 8.55 million cycles (i.e., < 134000 cycles/signature)
Faster batch verification
I Verify a batch of (Mi,Ai,Ri,Si), where (Ri,Si) is the alleged signature of Mi under key Ai
I Choose independent uniform random 128-bit integers zi
I Compute Hi = H(Ri,Ai,Mi)
EdDSA signatures and Ed25519 21 I Use Bos-Coster algorithm for multi-scalar multiplication
I Verifying a batch of 64 valid signatures takes 8.55 million cycles (i.e., < 134000 cycles/signature)
Faster batch verification
I Verify a batch of (Mi,Ai,Ri,Si), where (Ri,Si) is the alleged signature of Mi under key Ai
I Choose independent uniform random 128-bit integers zi
I Compute Hi = H(Ri,Ai,Mi)
I Verify the equation
X X X − ziSi mod ` B + ziRi + (ziHi mod `)Ai = 0 i i i
EdDSA signatures and Ed25519 21 I Verifying a batch of 64 valid signatures takes 8.55 million cycles (i.e., < 134000 cycles/signature)
Faster batch verification
I Verify a batch of (Mi,Ai,Ri,Si), where (Ri,Si) is the alleged signature of Mi under key Ai
I Choose independent uniform random 128-bit integers zi
I Compute Hi = H(Ri,Ai,Mi)
I Verify the equation
X X X − ziSi mod ` B + ziRi + (ziHi mod `)Ai = 0 i i i
I Use Bos-Coster algorithm for multi-scalar multiplication
EdDSA signatures and Ed25519 21 Faster batch verification
I Verify a batch of (Mi,Ai,Ri,Si), where (Ri,Si) is the alleged signature of Mi under key Ai
I Choose independent uniform random 128-bit integers zi
I Compute Hi = H(Ri,Ai,Mi)
I Verify the equation
X X X − ziSi mod ` B + ziRi + (ziHi mod `)Ai = 0 i i i
I Use Bos-Coster algorithm for multi-scalar multiplication
I Verifying a batch of 64 valid signatures takes 8.55 million cycles (i.e., < 134000 cycles/signature)
EdDSA signatures and Ed25519 21 I Idea: Assume s1 > s2 > ··· > sn. Recursively compute Q = (s1 − s2)P1 + s2(P1 + P2) + s3P3 ··· + snPn
I Each step requires the two largest scalars, one scalar subtraction and one point addition
I Each step “eliminates” expected log n scalar bits
I Requires fast access to the two largest scalars: put scalars into a heap
I Crucial for good performance: fast heap implementation
I Further optimization: Start with heap without the zi until largest scalar has ≤ 128 bits
I Then: extend heap with the zi
The Bos-Coster algorithm
Pn I Computation of Q = 1 siPi
EdDSA signatures and Ed25519 22 I Requires fast access to the two largest scalars: put scalars into a heap
I Crucial for good performance: fast heap implementation
I Further optimization: Start with heap without the zi until largest scalar has ≤ 128 bits
I Then: extend heap with the zi
The Bos-Coster algorithm
Pn IIComputation of Q = 1 siPi I Idea: Assume s1 > s2 > ··· > sn. Recursively compute Q = (s1 − s2)P1 + s2(P1 + P2) + s3P3 ··· + snPn
I Each step requires the two largest scalars, one scalar subtraction and one point addition
I Each step “eliminates” expected log n scalar bits
EdDSA signatures and Ed25519 22 I Further optimization: Start with heap without the zi until largest scalar has ≤ 128 bits
I Then: extend heap with the zi
The Bos-Coster algorithm
Pn IIComputation of Q = 1 siPi I Idea: Assume s1 > s2 > ··· > sn. Recursively compute Q = (s1 − s2)P1 + s2(P1 + P2) + s3P3 ··· + snPn
I Each step requires the two largest scalars, one scalar subtraction and one point addition
I Each step “eliminates” expected log n scalar bits
I Requires fast access to the two largest scalars: put scalars into a heap
I Crucial for good performance: fast heap implementation
EdDSA signatures and Ed25519 22 I Typical heap root replacement (pop operation): start at the root, swap down for a variable amount of times
I Floyd’s heap: swap down to the bottom, swap up for a variable amount of times, advantages:
I Each swap-down step needs only one comparison (instead of two)
I Swap-down loop is more friendly to branch predictors
A fast heap
IIHeap is a binary tree, each parent node is larger than the two child nodes
I Data structure is stored as a simple array, positions in the array determine positions in the tree
I Root is at position 0, left child node at position 1, right child node at position 2 etc.
I For node at position i, child nodes are at position 2 · i + 1 and 2 · i + 2, parent node is at position b(i − 1)/2c
EdDSA signatures and Ed25519 23 I Floyd’s heap: swap down to the bottom, swap up for a variable amount of times, advantages:
I Each swap-down step needs only one comparison (instead of two)
I Swap-down loop is more friendly to branch predictors
A fast heap
I Heap is a binary tree, each parent node is larger than the two child nodes
I Data structure is stored as a simple array, positions in the array determine positions in the tree
I Root is at position 0, left child node at position 1, right child node at position 2 etc.
I For node at position i, child nodes are at position 2 · i + 1 and 2 · i + 2, parent node is at position b(i − 1)/2c
I Typical heap root replacement (pop operation): start at the root, swap down for a variable amount of times
EdDSA signatures and Ed25519 23 A fast heap
I Heap is a binary tree, each parent node is larger than the two child nodes
I Data structure is stored as a simple array, positions in the array determine positions in the tree
I Root is at position 0, left child node at position 1, right child node at position 2 etc.
I For node at position i, child nodes are at position 2 · i + 1 and 2 · i + 2, parent node is at position b(i − 1)/2c
I Typical heap root replacement (pop operation): start at the root, swap down for a variable amount of times
I Floyd’s heap: swap down to the bottom, swap up for a variable amount of times, advantages:
I Each swap-down step needs only one comparison (instead of two)
I Swap-down loop is more friendly to branch predictors
EdDSA signatures and Ed25519 23 I Further optimization: Start with heap without the zi until largest scalar has ≤ 128 bits
I Then: extend heap with the zi
I Optimize the heap on the assembly level
The Bos-Coster algorithm
Pn I Computation of Q = 1 siPi I Idea: Assume s1 > s2 > ··· > sn. Recursively compute Q = (s1 − s2)P1 + s2(P1 + P2) + s3P3 ··· + snPn
I Each step requires the two largest scalars, one scalar subtraction and one point addition
I Each step “eliminates” expected log n scalar bits
I Requires fast access to the two largest scalars: put scalars into a heap
I Crucial for good performance: fast heap implementation
EdDSA signatures and Ed25519 24 I Optimize the heap on the assembly level
The Bos-Coster algorithm
Pn I Computation of Q = 1 siPi I Idea: Assume s1 > s2 > ··· > sn. Recursively compute Q = (s1 − s2)P1 + s2(P1 + P2) + s3P3 ··· + snPn
I Each step requires the two largest scalars, one scalar subtraction and one point addition
I Each step “eliminates” expected log n scalar bits
I Requires fast access to the two largest scalars: put scalars into a heap
I Crucial for good performance: fast heap implementation
I Further optimization: Start with heap without the zi until largest scalar has ≤ 128 bits
I Then: extend heap with the zi
EdDSA signatures and Ed25519 24 The Bos-Coster algorithm
Pn I Computation of Q = 1 siPi I Idea: Assume s1 > s2 > ··· > sn. Recursively compute Q = (s1 − s2)P1 + s2(P1 + P2) + s3P3 ··· + snPn
I Each step requires the two largest scalars, one scalar subtraction and one point addition
I Each step “eliminates” expected log n scalar bits
I Requires fast access to the two largest scalars: put scalars into a heap
I Crucial for good performance: fast heap implementation
I Further optimization: Start with heap without the zi until largest scalar has ≤ 128 bits
I Then: extend heap with the zi
I Optimize the heap on the assembly level
EdDSA signatures and Ed25519 24 Results
I New fast and secure signature scheme
I (Slow) C and Python reference implementations
I Fast AMD64 assembly implementations
I Also new speed records for Curve25519 ECDH
I All software in the public domain and included in eBATS
I All reported benchmarks (except batch verification) are eBATS benchmarks
I All reported benchmarks had TurboBoost switched off
I Software to be included in the NaCl library
http://ed25519.cr.yp.to/ http://nacl.cr.yp.to/
EdDSA signatures and Ed25519 25