<<

1 2 Speed, speed, speed $1000 TCR hashing competition D. J. Bernstein Crowley: “I have a problem where I need to make some University of Illinois at Chicago; faster, and I’m Ruhr University Bochum setting up a $1000 competition funded from my own pocket for Reporting some recent work towards the solution.” symmetric-speed discussions, Not fast enough: Signing H(M), especially from RWC 2020. where M is a long message. Not included in this talk: “[On a] 900MHz Cortex-A7 NISTLWC. • [SHA-256] takes 28.86 cpb ::: Short inputs. • BLAKE2b is nearly twice as FHE/MPC ciphers. • fast ::: However, this is still a lot slower than I’m happy with.” 1 2 3 Speed, speed, speed $1000 TCR hashing competition Instead choose random R and sign (R;H(R;M)). D. J. Bernstein Crowley: “I have a problem where I need to make some Note that H needs only “TCR”, University of Illinois at Chicago; cryptography faster, and I’m not full collision resistance. Ruhr University Bochum setting up a $1000 competition Does this allow faster H design? funded from my own pocket for TCR breaks how many rounds? Reporting some recent work towards the solution.” symmetric-speed discussions, Not fast enough: Signing H(M), especially from RWC 2020. where M is a long message. Not included in this talk: “[On a] 900MHz Cortex-A7 NISTLWC. • [SHA-256] takes 28.86 cpb ::: Short inputs. • BLAKE2b is nearly twice as FHE/MPC ciphers. • fast ::: However, this is still a lot slower than I’m happy with.” 1 2 3 Speed, speed, speed $1000 TCR hashing competition Instead choose random R and sign (R;H(R;M)). D. J. Bernstein Crowley: “I have a problem where I need to make some Note that H needs only “TCR”, University of Illinois at Chicago; cryptography faster, and I’m not full collision resistance. Ruhr University Bochum setting up a $1000 competition Does this allow faster H design? funded from my own pocket for TCR breaks how many rounds? Reporting some recent work towards the solution.” symmetric-speed discussions, Not fast enough: Signing H(M), especially from RWC 2020. where M is a long message. Not included in this talk: “[On a] 900MHz Cortex-A7 NISTLWC. • [SHA-256] takes 28.86 cpb ::: Short inputs. • BLAKE2b is nearly twice as FHE/MPC ciphers. • fast ::: However, this is still a lot slower than I’m happy with.” 1 2 3 Speed, speed, speed $1000 TCR hashing competition Instead choose random R and sign (R;H(R;M)). D. J. Bernstein Crowley: “I have a problem where I need to make some Note that H needs only “TCR”, University of Illinois at Chicago; cryptography faster, and I’m not full collision resistance. Ruhr University Bochum setting up a $1000 competition Does this allow faster H design? funded from my own pocket for TCR breaks how many rounds? Reporting some recent work towards the solution.” symmetric-speed discussions, Not fast enough: Signing H(M), especially from RWC 2020. where M is a long message. Not included in this talk: “[On a] 900MHz Cortex-A7 NISTLWC. • [SHA-256] takes 28.86 cpb ::: Short inputs. • BLAKE2b is nearly twice as FHE/MPC ciphers. • fast ::: However, this is still a lot slower than I’m happy with.” 2 3 $1000 TCR hashing competition Instead choose random R and sign (R;H(R;M)). Crowley: “I have a problem where I need to make some Note that H needs only “TCR”, cryptography faster, and I’m not full collision resistance. setting up a $1000 competition Does this allow faster H design? funded from my own pocket for TCR breaks how many rounds? work towards the solution.”

Not fast enough: Signing H(M), where M is a long message. “[On a] 900MHz Cortex-A7 [SHA-256] takes 28.86 cpb ::: BLAKE2b is nearly twice as fast ::: However, this is still a lot slower than I’m happy with.” 2 3 $1000 TCR hashing competition Instead choose random R and sign (R;H(R;M)). Crowley: “I have a problem where I need to make some Note that H needs only “TCR”, cryptography faster, and I’m not full collision resistance. setting up a $1000 competition Does this allow faster H design? funded from my own pocket for TCR breaks how many rounds? work towards the solution.” “As far as I know, no-one Not fast enough: Signing H(M), has ever proposed a TCR as a where M is a long message. primitive, designed to be faster than existing hash functions, “[On a] 900MHz Cortex-A7 and that’s what I need.” [SHA-256] takes 28.86 cpb ::: BLAKE2b is nearly twice as fast ::: However, this is still a lot slower than I’m happy with.” 2 3 $1000 TCR hashing competition Instead choose random R and sign (R;H(R;M)). Crowley: “I have a problem where I need to make some Note that H needs only “TCR”, cryptography faster, and I’m not full collision resistance. setting up a $1000 competition Does this allow faster H design? funded from my own pocket for TCR breaks how many rounds? work towards the solution.” “As far as I know, no-one Not fast enough: Signing H(M), has ever proposed a TCR as a where M is a long message. primitive, designed to be faster than existing hash functions, “[On a] 900MHz Cortex-A7 and that’s what I need.” [SHA-256] takes 28.86 cpb ::: BLAKE2b is nearly twice as More desiderata: tree hash, fast ::: However, this is still a new tweak at each vertex, lot slower than I’m happy with.” multi-message security. 2 3 4 $1000 TCR hashing competition Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). Crowley: “I have a problem 70%, 23%, 35%, 21% rounds or where I need to make some Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of cryptography faster, and I’m not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 setting up a $1000 competition Does this allow faster H design? are “broken” or “practically broken”. funded from my own pocket for TCR breaks how many rounds? “Inconsistent security margins”. work towards the solution.” “As far as I know, no-one Not fast enough: Signing H(M), has ever proposed a TCR as a where M is a long message. primitive, designed to be faster than existing hash functions, “[On a] 900MHz Cortex-A7 and that’s what I need.” [SHA-256] takes 28.86 cpb ::: BLAKE2b is nearly twice as More desiderata: tree hash, fast ::: However, this is still a new tweak at each vertex, lot slower than I’m happy with.” multi-message security. 2 3 4 $1000 TCR hashing competition Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). Crowley: “I have a problem 70%, 23%, 35%, 21% rounds or where I need to make some Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of cryptography faster, and I’m not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 setting up a $1000 competition Does this allow faster H design? are “broken” or “practically broken”. funded from my own pocket for TCR breaks how many rounds? “Inconsistent security margins”. work towards the solution.” “As far as I know, no-one Not fast enough: Signing H(M), has ever proposed a TCR as a where M is a long message. primitive, designed to be faster than existing hash functions, “[On a] 900MHz Cortex-A7 and that’s what I need.” [SHA-256] takes 28.86 cpb ::: BLAKE2b is nearly twice as More desiderata: tree hash, fast ::: However, this is still a new tweak at each vertex, lot slower than I’m happy with.” multi-message security. 2 3 4 $1000 TCR hashing competition Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). Crowley: “I have a problem 70%, 23%, 35%, 21% rounds or where I need to make some Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of cryptography faster, and I’m not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 setting up a $1000 competition Does this allow faster H design? are “broken” or “practically broken”. funded from my own pocket for TCR breaks how many rounds? “Inconsistent security margins”. work towards the solution.” “As far as I know, no-one Not fast enough: Signing H(M), has ever proposed a TCR as a where M is a long message. primitive, designed to be faster than existing hash functions, “[On a] 900MHz Cortex-A7 and that’s what I need.” [SHA-256] takes 28.86 cpb ::: BLAKE2b is nearly twice as More desiderata: tree hash, fast ::: However, this is still a new tweak at each vertex, lot slower than I’m happy with.” multi-message security. 3 4 Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). 70%, 23%, 35%, 21% rounds or Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one has ever proposed a TCR as a primitive, designed to be faster than existing hash functions, and that’s what I need.” More desiderata: tree hash, new tweak at each vertex, multi-message security. 3 4 Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). 70%, 23%, 35%, 21% rounds or Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one “Attacks don’t really get better”. has ever proposed a TCR as a primitive, designed to be faster than existing hash functions, and that’s what I need.” More desiderata: tree hash, new tweak at each vertex, multi-message security. 3 4 Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). 70%, 23%, 35%, 21% rounds or Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one “Attacks don’t really get better”. has ever proposed a TCR as a “Thousands of papers, stagnating primitive, designed to be faster results and techniques”. than existing hash functions, and that’s what I need.” More desiderata: tree hash, new tweak at each vertex, multi-message security. 3 4 Instead choose random R Aumasson, “Too much crypto” and sign (R;H(R;M)). 70%, 23%, 35%, 21% rounds or Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one “Attacks don’t really get better”. has ever proposed a TCR as a “Thousands of papers, stagnating primitive, designed to be faster results and techniques”. than existing hash functions, and that’s what I need.” “What we want: More scientific and rational approach More desiderata: tree hash, to choosing round numbers, new tweak at each vertex, tolerance for corrections”. multi-message security. 3 4 5 Instead choose random R Aumasson, “Too much crypto” New BLAKE3 hash function = and sign (R;H(R;M)). 7-round BLAKE2s + tree mode, 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of “Much faster than MD5, SHA-1, not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 SHA-2, SHA-3, and BLAKE2.” Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one “Attacks don’t really get better”. has ever proposed a TCR as a “Thousands of papers, stagnating primitive, designed to be faster results and techniques”. than existing hash functions, and that’s what I need.” “What we want: More scientific and rational approach More desiderata: tree hash, to choosing round numbers, new tweak at each vertex, tolerance for corrections”. multi-message security. 3 4 5 Instead choose random R Aumasson, “Too much crypto” New BLAKE3 hash function = and sign (R;H(R;M)). 7-round BLAKE2s + tree mode, 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of “Much faster than MD5, SHA-1, not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 SHA-2, SHA-3, and BLAKE2.” Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one “Attacks don’t really get better”. has ever proposed a TCR as a “Thousands of papers, stagnating primitive, designed to be faster results and techniques”. than existing hash functions, and that’s what I need.” “What we want: More scientific and rational approach More desiderata: tree hash, to choosing round numbers, new tweak at each vertex, tolerance for corrections”. multi-message security. 3 4 5 Instead choose random R Aumasson, “Too much crypto” New BLAKE3 hash function = and sign (R;H(R;M)). 7-round BLAKE2s + tree mode, 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. Note that H needs only “TCR”, 50%, 8%, 25%, 20% rounds of “Much faster than MD5, SHA-1, not full collision resistance. AES-128/B2b/ChaCha20/SHA-3 SHA-2, SHA-3, and BLAKE2.” Does this allow faster H design? are “broken” or “practically broken”. TCR breaks how many rounds? “Inconsistent security margins”. “As far as I know, no-one “Attacks don’t really get better”. has ever proposed a TCR as a “Thousands of papers, stagnating primitive, designed to be faster results and techniques”. than existing hash functions, and that’s what I need.” “What we want: More scientific and rational approach More desiderata: tree hash, to choosing round numbers, new tweak at each vertex, tolerance for corrections”. multi-message security. 4 5 Aumasson, “Too much crypto” New BLAKE3 hash function = 7-round BLAKE2s + tree mode, 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. 50%, 8%, 25%, 20% rounds of “Much faster than MD5, SHA-1, AES-128/B2b/ChaCha20/SHA-3 SHA-2, SHA-3, and BLAKE2.” are “broken” or “practically broken”. “Inconsistent security margins”. “Attacks don’t really get better”. “Thousands of papers, stagnating results and techniques”. “What we want: More scientific and rational approach to choosing round numbers, tolerance for corrections”. 4 5 Aumasson, “Too much crypto” New BLAKE3 hash function = 7-round BLAKE2s + tree mode, 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. 50%, 8%, 25%, 20% rounds of “Much faster than MD5, SHA-1, AES-128/B2b/ChaCha20/SHA-3 SHA-2, SHA-3, and BLAKE2.” are “broken” or “practically broken”. “Inconsistent security margins”. Crowley: “Android disk crypto is always right up against the wall “Attacks don’t really get better”. of acceptable speed (and battery “Thousands of papers, stagnating use). uses ChaCha12 results and techniques”. and is still IMHO too slow. “What we want: More [10.6 Cortex-A7 cycles/byte.] It scientific and rational approach sometimes seems like no-one in to choosing round numbers, the crypto world feels the user’s tolerance for corrections”. pain here; it always looks better to call for more rounds.” 4 5 6 Aumasson, “Too much crypto” New BLAKE3 hash function = Huge influence of CPU. 7-round BLAKE2s + tree mode, cycles/byte for two ciphers: 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. 50%, 8%, 25%, 20% rounds of #1 #2 Intel “Much faster than MD5, SHA-1, AES-128/B2b/ChaCha20/SHA-3 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” are “broken” or “practically broken”. 0.38 0.88 2017 Cascade Lake “Inconsistent security margins”. Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X always right up against the wall 1.94 1.90 2016 “Attacks don’t really get better”. of acceptable speed (and battery 0.77 0.98 2016 “Thousands of papers, stagnating use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake results and techniques”. and is still IMHO too slow. 0.77 1.01 2014 Broadwell “What we want: More [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell scientific and rational approach sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge to choosing round numbers, the crypto world feels the user’s tolerance for corrections”. pain here; it always looks better to call for more rounds.” 4 5 6 Aumasson, “Too much crypto” New BLAKE3 hash function = Huge influence of CPU. 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. 50%, 8%, 25%, 20% rounds of #1 #2 Intel microarchitecture “Much faster than MD5, SHA-1, AES-128/B2b/ChaCha20/SHA-3 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” are “broken” or “practically broken”. 0.38 0.88 2017 Cascade Lake “Inconsistent security margins”. Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X always right up against the wall 1.94 1.90 2016 Goldmont “Attacks don’t really get better”. of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake “Thousands of papers, stagnating use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake results and techniques”. and is still IMHO too slow. 0.77 1.01 2014 Broadwell “What we want: More [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell scientific and rational approach sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge to choosing round numbers, the crypto world feels the user’s tolerance for corrections”. pain here; it always looks better to call for more rounds.” 4 5 6 Aumasson, “Too much crypto” New BLAKE3 hash function = Huge influence of CPU. 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: 70%, 23%, 35%, 21% rounds or parallel XOF + more changes. 50%, 8%, 25%, 20% rounds of #1 #2 Intel microarchitecture “Much faster than MD5, SHA-1, AES-128/B2b/ChaCha20/SHA-3 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” are “broken” or “practically broken”. 0.38 0.88 2017 Cascade Lake “Inconsistent security margins”. Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X always right up against the wall 1.94 1.90 2016 Goldmont “Attacks don’t really get better”. of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake “Thousands of papers, stagnating use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake results and techniques”. and is still IMHO too slow. 0.77 1.01 2014 Broadwell “What we want: More [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell scientific and rational approach sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge to choosing round numbers, the crypto world feels the user’s tolerance for corrections”. pain here; it always looks better to call for more rounds.” 5 6 New BLAKE3 hash function = Huge influence of CPU. 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: parallel XOF + more changes. #1 #2 Intel microarchitecture “Much faster than MD5, SHA-1, 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” 0.38 0.88 2017 Cascade Lake Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X always right up against the wall 1.94 1.90 2016 Goldmont of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake and is still IMHO too slow. 0.77 1.01 2014 Broadwell [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge the crypto world feels the user’s pain here; it always looks better to call for more rounds.” 5 6 New BLAKE3 hash function = Huge influence of CPU. 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: parallel XOF + more changes. #1 #2 Intel microarchitecture “Much faster than MD5, SHA-1, 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” 0.38 0.88 2017 Cascade Lake Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X always right up against the wall 1.94 1.90 2016 Goldmont of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake and is still IMHO too slow. 0.77 1.01 2014 Broadwell [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge the crypto world feels the user’s #1: ChaCha12. #2: AES-256. pain here; it always looks better to call for more rounds.” 5 6 7 New BLAKE3 hash function = Huge influence of CPU. Deck functions: e.g., Xoofff 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes parallel XOF + more changes. #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. “Much faster than MD5, SHA-1, 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. always right up against the wall 1.94 1.90 2016 Goldmont of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake and is still IMHO too slow. 0.77 1.01 2014 Broadwell [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge the crypto world feels the user’s #1: ChaCha12. #2: AES-256. pain here; it always looks better to call for more rounds.” 5 6 7 New BLAKE3 hash function = Huge influence of CPU. Deck functions: e.g., Xoofff 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes parallel XOF + more changes. #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. “Much faster than MD5, SHA-1, 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. always right up against the wall 1.94 1.90 2016 Goldmont of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake and is still IMHO too slow. 0.77 1.01 2014 Broadwell [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge the crypto world feels the user’s #1: ChaCha12. #2: AES-256. pain here; it always looks better to call for more rounds.” 5 6 7 New BLAKE3 hash function = Huge influence of CPU. Deck functions: e.g., Xoofff 7-round BLAKE2s + tree mode, Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes parallel XOF + more changes. #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. “Much faster than MD5, SHA-1, 0.37 0.68 2018 Cannon Lake SHA-2, SHA-3, and BLAKE2.” Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; Crowley: “Android disk crypto is 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. always right up against the wall 1.94 1.90 2016 Goldmont of acceptable speed (and battery 0.77 0.98 2016 Kaby Lake use). Adiantum uses ChaCha12 0.74 0.95 2015 Skylake and is still IMHO too slow. 0.77 1.01 2014 Broadwell [10.6 Cortex-A7 cycles/byte.] It 0.77 1.03 2013 Haswell sometimes seems like no-one in 1.71 1.29 2012 Ivy Bridge the crypto world feels the user’s #1: ChaCha12. #2: AES-256. pain here; it always looks better to call for more rounds.” 6 7 Huge influence of CPU. Deck functions: e.g., Xoofff Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake 0.74 0.95 2015 Skylake 0.77 1.01 2014 Broadwell 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge #1: ChaCha12. #2: AES-256. 6 7 Huge influence of CPU. Deck functions: e.g., Xoofff Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake Syntax of deck function: F :( 0; 1 ) 0; 1 . 0.74 0.95 2015 Skylake k { }∗ ∗ → { }∞ 0.77 1.01 2014 Broadwell 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge #1: ChaCha12. #2: AES-256. 6 7 Huge influence of CPU. Deck functions: e.g., Xoofff Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake Syntax of deck function: F :( 0; 1 ) 0; 1 . 0.74 0.95 2015 Skylake k { }∗ ∗ → { }∞ 0.77 1.01 2014 Broadwell Security goal: PRF. 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge #1: ChaCha12. #2: AES-256. 6 7 Huge influence of CPU. Deck functions: e.g., Xoofff Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake Syntax of deck function: F :( 0; 1 ) 0; 1 . 0.74 0.95 2015 Skylake k { }∗ ∗ → { }∞ 0.77 1.01 2014 Broadwell Security goal: PRF. 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge Efficiency goal: quickly compute substring of Fk (X0), then #1: ChaCha12. #2: AES-256. substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 6 7 8 Huge influence of CPU. Deck functions: e.g., Xoofff Deck-Stream: Fk (N). Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake Syntax of deck function: F :( 0; 1 ) 0; 1 . 0.74 0.95 2015 Skylake k { }∗ ∗ → { }∞ 0.77 1.01 2014 Broadwell Security goal: PRF. 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge Efficiency goal: quickly compute substring of Fk (X0), then #1: ChaCha12. #2: AES-256. substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 6 7 8 Huge influence of CPU. Deck functions: e.g., Xoofff Deck-Stream: Fk (N). Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake Syntax of deck function: F :( 0; 1 ) 0; 1 . 0.74 0.95 2015 Skylake k { }∗ ∗ → { }∞ 0.77 1.01 2014 Broadwell Security goal: PRF. 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge Efficiency goal: quickly compute substring of Fk (X0), then #1: ChaCha12. #2: AES-256. substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 6 7 8 Huge influence of CPU. Deck functions: e.g., Xoofff Deck-Stream: Fk (N). Intel cycles/byte for two ciphers: Keccak team says: Xoofff takes #1 #2 Intel microarchitecture 0.51 cycles/byte on Skylake-X. 0.37 0.68 2018 Cannon Lake Deck functions are “a new useful 0.38 0.88 2017 Cascade Lake API to make modes trivial”; 0.38 0.89 2017 Skylake-X they “allow efficient ciphers”. 1.94 1.90 2016 Goldmont 0.77 0.98 2016 Kaby Lake Syntax of deck function: F :( 0; 1 ) 0; 1 . 0.74 0.95 2015 Skylake k { }∗ ∗ → { }∞ 0.77 1.01 2014 Broadwell Security goal: PRF. 0.77 1.03 2013 Haswell 1.71 1.29 2012 Ivy Bridge Efficiency goal: quickly compute substring of Fk (X0), then #1: ChaCha12. #2: AES-256. substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 7 8 Deck functions: e.g., Xoofff Deck-Stream: Fk (N). Keccak team says: Xoofff takes 0.51 cycles/byte on Skylake-X. Deck functions are “a new useful API to make modes trivial”; they “allow efficient ciphers”.

Syntax of deck function: F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Security goal: PRF. Efficiency goal: quickly compute substring of Fk (X0), then substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 7 8 Deck functions: e.g., Xoofff Deck-Stream: Fk (N).

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 0.51 cycles/byte on Skylake-X. Deck functions are “a new useful API to make modes trivial”; they “allow efficient ciphers”.

Syntax of deck function: F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Security goal: PRF. Efficiency goal: quickly compute substring of Fk (X0), then substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 7 8 Deck functions: e.g., Xoofff Deck-Stream: Fk (N).

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 0.51 cycles/byte on Skylake-X. Deck-SANE session: Deck functions are “a new useful 128 bits of F (N) tag; k → API to make modes trivial”; use more bits of Fk (N) they “allow efficient ciphers”. as stream C ; → 1 128 bits of F (N;A ;C ) tag; Syntax of deck function: k 1 1 → etc. F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Security goal: PRF. Efficiency goal: quickly compute substring of Fk (X0), then substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 7 8 Deck functions: e.g., Xoofff Deck-Stream: Fk (N).

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 0.51 cycles/byte on Skylake-X. Deck-SANE session: Deck functions are “a new useful 128 bits of F (N) tag; k → API to make modes trivial”; use more bits of Fk (N) they “allow efficient ciphers”. as stream ciphertext C ; → 1 128 bits of F (N;A ;C ) tag; Syntax of deck function: k 1 1 → etc. F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Deck-SANSE: misuse resistance. Security goal: PRF. Efficiency goal: quickly compute substring of Fk (X0), then substring of Fk (X0;X1), then substring of Fk (X0;X1;X2), etc. 7 8 Deck functions: e.g., Xoofff Deck-Stream: Fk (N).

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 0.51 cycles/byte on Skylake-X. Deck-SANE session: Deck functions are “a new useful 128 bits of F (N) tag; k → API to make modes trivial”; use more bits of Fk (N) they “allow efficient ciphers”. as stream ciphertext C ; → 1 128 bits of F (N;A ;C ) tag; Syntax of deck function: k 1 1 → etc. F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Deck-SANSE: misuse resistance. Security goal: PRF. Deck-WBC: wide-. Efficiency goal: quickly compute substring of Fk (X0), then For speed, the wide-block cipher substring of Fk (X0;X1), then combines Xoofff and Xoofffie, substring of Fk (X0;X1;X2), etc. (sort of) built from Xoodoo. 7 8 9 Deck functions: e.g., Xoofff Deck-Stream: Fk (N). MAC speed

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 0.51 cycles/byte on Skylake-X. 29 bit ops per message bit, Deck-SANE session: using mults in field of size 2256. Deck functions are “a new useful 128 bits of F (N) tag; k → API to make modes trivial”; use more bits of Fk (N) (I’ve started investigating they “allow efficient ciphers”. as stream ciphertext C ; bit ops for integer mults.) → 1 128 bits of F (N;A ;C ) tag; Syntax of deck function: k 1 1 → etc. F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Deck-SANSE: misuse resistance. Security goal: PRF. Deck-WBC: wide-block cipher. Efficiency goal: quickly compute substring of Fk (X0), then For speed, the wide-block cipher substring of Fk (X0;X1), then combines Xoofff and Xoofffie, substring of Fk (X0;X1;X2), etc. (sort of) built from Xoodoo. 7 8 9 Deck functions: e.g., Xoofff Deck-Stream: Fk (N). MAC speed

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 0.51 cycles/byte on Skylake-X. 29 bit ops per message bit, Deck-SANE session: using mults in field of size 2256. Deck functions are “a new useful 128 bits of F (N) tag; k → API to make modes trivial”; use more bits of Fk (N) (I’ve started investigating they “allow efficient ciphers”. as stream ciphertext C ; bit ops for integer mults.) → 1 128 bits of F (N;A ;C ) tag; Syntax of deck function: k 1 1 → etc. F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Deck-SANSE: misuse resistance. Security goal: PRF. Deck-WBC: wide-block cipher. Efficiency goal: quickly compute substring of Fk (X0), then For speed, the wide-block cipher substring of Fk (X0;X1), then combines Xoofff and Xoofffie, substring of Fk (X0;X1;X2), etc. (sort of) built from Xoodoo. 7 8 9 Deck functions: e.g., Xoofff Deck-Stream: Fk (N). MAC speed

Keccak team says: Xoofff takes Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 0.51 cycles/byte on Skylake-X. 29 bit ops per message bit, Deck-SANE session: using mults in field of size 2256. Deck functions are “a new useful 128 bits of F (N) tag; k → API to make modes trivial”; use more bits of Fk (N) (I’ve started investigating they “allow efficient ciphers”. as stream ciphertext C ; bit ops for integer mults.) → 1 128 bits of F (N;A ;C ) tag; Syntax of deck function: k 1 1 → etc. F :( 0; 1 ) 0; 1 . k { }∗ ∗ → { }∞ Deck-SANSE: misuse resistance. Security goal: PRF. Deck-WBC: wide-block cipher. Efficiency goal: quickly compute substring of Fk (X0), then For speed, the wide-block cipher substring of Fk (X0;X1), then combines Xoofff and Xoofffie, substring of Fk (X0;X1;X2), etc. (sort of) built from Xoodoo. 8 9 Deck-Stream: Fk (N). MAC speed

Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, Deck-SANE session: using mults in field of size 2256. 128 bits of F (N) tag; k → use more bits of Fk (N) (I’ve started investigating as stream ciphertext C ; bit ops for integer mults.) → 1 128 bits of F (N;A ;C ) tag; k 1 1 → etc. Deck-SANSE: misuse resistance. Deck-WBC: wide-block cipher.

For speed, the wide-block cipher combines Xoofff and Xoofffie, (sort of) built from Xoodoo. 8 9 Deck-Stream: Fk (N). MAC speed

Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, Deck-SANE session: using mults in field of size 2256. 128 bits of F (N) tag; k → use more bits of Fk (N) (I’ve started investigating as stream ciphertext C ; bit ops for integer mults.) → 1 128 bits of F (N;A ;C ) tag; k 1 1 → sounds slower, but etc. aims for PRF or PRP or SPRP. Deck-SANSE: misuse resistance. How many rounds are needed in the context of a MAC? Deck-WBC: wide-block cipher.

For speed, the wide-block cipher combines Xoofff and Xoofffie, (sort of) built from Xoodoo. 8 9 Deck-Stream: Fk (N). MAC speed

Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, Deck-SANE session: using mults in field of size 2256. 128 bits of F (N) tag; k → use more bits of Fk (N) (I’ve started investigating as stream ciphertext C ; bit ops for integer mults.) → 1 128 bits of F (N;A ;C ) tag; k 1 1 → Encryption sounds slower, but etc. aims for PRF or PRP or SPRP. Deck-SANSE: misuse resistance. How many rounds are needed in the context of a MAC? Deck-WBC: wide-block cipher. OCB etc. try to skip MAC, For speed, the wide-block cipher but can these modes safely use combines Xoofff and Xoofffie, as few rounds as counter mode? (sort of) built from Xoodoo. 8 9 10 Deck-Stream: Fk (N). MAC speed Bit operations per bit of plaintext (assuming precomputed subkeys): Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, ops/bit cipher Deck-SANE session: using mults in field of size 2256. 256 54 ChaCha8 128 bits of Fk (N) tag; → 256 78 ChaCha12 use more bits of Fk (N) (I’ve started investigating 128 88 : 62 ops broken as stream ciphertext C1; bit ops for integer mults.) → 128 100 NOEKEON 128 bits of Fk (N;A1;C1) tag; → Encryption sounds slower, but 128 117 Skinny etc. aims for PRF or PRP or SPRP. 256 126 ChaCha20 Deck-SANSE: misuse resistance. How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? Deck-WBC: wide-block cipher. 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny For speed, the wide-block cipher but can these modes safely use 128 162.75 Piccolo combines Xoofff and Xoofffie, as few rounds as counter mode? 128 202.5 AES (sort of) built from Xoodoo. 256 283.5 AES 8 9 10 Deck-Stream: Fk (N). MAC speed Bit operations per bit of plaintext (assuming precomputed subkeys): Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, key ops/bit cipher Deck-SANE session: using mults in field of size 2256. 256 54 ChaCha8 128 bits of Fk (N) tag; → 256 78 ChaCha12 use more bits of Fk (N) (I’ve started investigating 128 88 Simon: 62 ops broken as stream ciphertext C1; bit ops for integer mults.) → 128 100 NOEKEON 128 bits of Fk (N;A1;C1) tag; → Encryption sounds slower, but 128 117 Skinny etc. aims for PRF or PRP or SPRP. 256 126 ChaCha20 Deck-SANSE: misuse resistance. How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? Deck-WBC: wide-block cipher. 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny For speed, the wide-block cipher but can these modes safely use 128 162.75 Piccolo combines Xoofff and Xoofffie, as few rounds as counter mode? 128 202.5 AES (sort of) built from Xoodoo. 256 283.5 AES 8 9 10 Deck-Stream: Fk (N). MAC speed Bit operations per bit of plaintext (assuming precomputed subkeys): Deck-MAC: 128 bits of Fk (M). 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, key ops/bit cipher Deck-SANE session: using mults in field of size 2256. 256 54 ChaCha8 128 bits of Fk (N) tag; → 256 78 ChaCha12 use more bits of Fk (N) (I’ve started investigating 128 88 Simon: 62 ops broken as stream ciphertext C1; bit ops for integer mults.) → 128 100 NOEKEON 128 bits of Fk (N;A1;C1) tag; → Encryption sounds slower, but 128 117 Skinny etc. aims for PRF or PRP or SPRP. 256 126 ChaCha20 Deck-SANSE: misuse resistance. How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? Deck-WBC: wide-block cipher. 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny For speed, the wide-block cipher but can these modes safely use 128 162.75 Piccolo combines Xoofff and Xoofffie, as few rounds as counter mode? 128 202.5 AES (sort of) built from Xoodoo. 256 283.5 AES 9 10 MAC speed Bit operations per bit of plaintext (assuming precomputed subkeys): 2014 Bernstein–Chou Auth256: 29 bit ops per message bit, key ops/bit cipher using mults in field of size 2256. 256 54 ChaCha8 256 78 ChaCha12 (I’ve started investigating 128 88 Simon: 62 ops broken bit ops for integer mults.) 128 100 NOEKEON Encryption sounds slower, but 128 117 Skinny aims for PRF or PRP or SPRP. 256 126 ChaCha20 How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny but can these modes safely use 128 162.75 Piccolo as few rounds as counter mode? 128 202.5 AES 256 283.5 AES 9 10 11 MAC speed Bit operations per bit of plaintext More virtues of mult-based MACs: (assuming precomputed subkeys): Easy masking. 2014 Bernstein–Chou Auth256: • Binary mults: Share area with 29 bit ops per message bit, key ops/bit cipher • code-based crypto. using mults in field of size 2256. 256 54 ChaCha8 Integer mults: Share area with 256 78 ChaCha12 • (I’ve started investigating lattice-based crypto and ECC. 128 88 Simon: 62 ops broken bit ops for integer mults.) Use existing CPU multipliers. 128 100 NOEKEON • Encryption sounds slower, but 128 117 Skinny aims for PRF or PRP or SPRP. 256 126 ChaCha20 How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny but can these modes safely use 128 162.75 Piccolo as few rounds as counter mode? 128 202.5 AES 256 283.5 AES 9 10 11 MAC speed Bit operations per bit of plaintext More virtues of mult-based MACs: (assuming precomputed subkeys): Easy masking. 2014 Bernstein–Chou Auth256: • Binary mults: Share area with 29 bit ops per message bit, key ops/bit cipher • code-based crypto. using mults in field of size 2256. 256 54 ChaCha8 Integer mults: Share area with 256 78 ChaCha12 • (I’ve started investigating lattice-based crypto and ECC. 128 88 Simon: 62 ops broken bit ops for integer mults.) Use existing CPU multipliers. 128 100 NOEKEON • Encryption sounds slower, but 128 117 Skinny aims for PRF or PRP or SPRP. 256 126 ChaCha20 How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny but can these modes safely use 128 162.75 Piccolo as few rounds as counter mode? 128 202.5 AES 256 283.5 AES 9 10 11 MAC speed Bit operations per bit of plaintext More virtues of mult-based MACs: (assuming precomputed subkeys): Easy masking. 2014 Bernstein–Chou Auth256: • Binary mults: Share area with 29 bit ops per message bit, key ops/bit cipher • code-based crypto. using mults in field of size 2256. 256 54 ChaCha8 Integer mults: Share area with 256 78 ChaCha12 • (I’ve started investigating lattice-based crypto and ECC. 128 88 Simon: 62 ops broken bit ops for integer mults.) Use existing CPU multipliers. 128 100 NOEKEON • Encryption sounds slower, but 128 117 Skinny aims for PRF or PRP or SPRP. 256 126 ChaCha20 How many rounds are needed 256 144 Simon: 106 ops broken in the context of a MAC? 128 147.2 PRESENT OCB etc. try to skip MAC, 256 156 Skinny but can these modes safely use 128 162.75 Piccolo as few rounds as counter mode? 128 202.5 AES 256 283.5 AES 10 11 Bit operations per bit of plaintext More virtues of mult-based MACs: (assuming precomputed subkeys): Easy masking. • Binary mults: Share area with key ops/bit cipher • code-based crypto. 256 54 ChaCha8 Integer mults: Share area with 256 78 ChaCha12 • lattice-based crypto and ECC. 128 88 Simon: 62 ops broken Use existing CPU multipliers. 128 100 NOEKEON • 128 117 Skinny 256 126 ChaCha20 256 144 Simon: 106 ops broken 128 147.2 PRESENT 256 156 Skinny 128 162.75 Piccolo 128 202.5 AES 256 283.5 AES 10 11 Bit operations per bit of plaintext More virtues of mult-based MACs: (assuming precomputed subkeys): Easy masking. • Binary mults: Share area with key ops/bit cipher • code-based crypto. 256 54 ChaCha8 Integer mults: Share area with 256 78 ChaCha12 • lattice-based crypto and ECC. 128 88 Simon: 62 ops broken Use existing CPU multipliers. 128 100 NOEKEON • 128 117 Skinny If int mults are available anyway, 256 126 ChaCha20 should we renew attention to 256 144 Simon: 106 ops broken ciphers that use some mults? 128 147.2 PRESENT 256 156 Skinny 128 162.75 Piccolo 128 202.5 AES 256 283.5 AES 10 11 Bit operations per bit of plaintext More virtues of mult-based MACs: (assuming precomputed subkeys): Easy masking. • Binary mults: Share area with key ops/bit cipher • code-based crypto. 256 54 ChaCha8 Integer mults: Share area with 256 78 ChaCha12 • lattice-based crypto and ECC. 128 88 Simon: 62 ops broken Use existing CPU multipliers. 128 100 NOEKEON • 128 117 Skinny If int mults are available anyway, 256 126 ChaCha20 should we renew attention to 256 144 Simon: 106 ops broken ciphers that use some mults? 128 147.2 PRESENT e.g. x *= 0xdf26f9 is same as 256 156 Skinny x-=x<<3; x-=x<<8; x+=x<<13. 128 162.75 Piccolo Mix with ^, >>>16, maybe +. 128 202.5 AES Try 16-bit mults for Intel, ARM. 256 283.5 AES