
Efficient Algorithms in Software
Julio López
Institute of Computing, University of Campinas
September 2017, Habana, Cuba. ASCrypto 2017

Agenda
1 Efficient Software Implementations
• Software Efficiency
• Parallel Computation - SIMD
2 Symmetric-Key Cryptography
• Data Encryption
• Hash Functions
• SHA2 Implementation
• SHA3 Implementation
3 Elliptic Curve Cryptography
• Elliptic Curves
• Elliptic Curve Diffie-Hellman
• Digital Signatures
• EdDSA Scheme

Julio López (IC-UNICAMP) Efficient Algorithms in Software ASCrypto 2017 2 / 83

Section 1 Efficient Software Implementations

1.1 Software Efficiency

Software Efficiency
Optimizing a software implementation of a cryptographic algorithm involves several goals:
• Security.
• Running time.
• Code size.
• Memory consumption.
• Characteristics of the target platform.
• Energy consumption.
These goals sometimes conflict with each other. For example, accelerating an operation with look-up tables increases code size and, if not implemented carefully, may make the implementation vulnerable to cache-timing attacks.

How Is Performance Measured?
• Elapsed time does not allow timings to be compared across different computers, so clock cycles are measured instead.
• The RDTSC instruction reads the processor's Time-Stamp Counter:

    #include <stdint.h>

    /* Read the 64-bit Time-Stamp Counter (x86, GCC/Clang inline assembly).
       RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX. */
    uint64_t get_cycles(void) {
        uint32_t lo, hi;
        __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
        return ((uint64_t)hi << 32) | lo;
    }

• To reduce sources of noise in the measurements, it is recommended to disable technologies such as Turbo Boost and Hyper-Threading.
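The measurement advice above can be combined into a small benchmarking harness. The following is a sketch under stated assumptions and not the author's code. It reuses the RDTSC reader and discards one warm-up call. It then repeats the measured function many times and reports the median cycle count, which is less sensitive to interrupts and other outliers than the mean. The names `bench_cycles`, `median_u64`, `cmp_u64` and `BENCH_RUNS` are hypothetical. The non-x86 fallback is also an assumption: it uses C11 `timespec_get`, which counts nanoseconds rather than cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Read the Time-Stamp Counter on x86 (GCC/Clang inline assembly).
 * Assumption for other targets: fall back to C11 timespec_get, which
 * returns nanoseconds, not cycles. */
static inline uint64_t get_cycles(void) {
#if defined(__x86_64__) || defined(__i386__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* qsort comparator for uint64_t values (avoids the overflow of a - b). */
static int cmp_u64(const void *pa, const void *pb) {
    uint64_t a = *(const uint64_t *)pa, b = *(const uint64_t *)pb;
    return (a > b) - (a < b);
}

/* Median of n samples; sorts the array in place. */
static uint64_t median_u64(uint64_t *v, size_t n) {
    assert(n > 0);
    qsort(v, n, sizeof v[0], cmp_u64);
    return v[n / 2];
}

/* Hypothetical parameter: an odd count, so the median is an actual sample. */
#define BENCH_RUNS 1001

/* Median number of cycles for one call of fn(arg). The static sample
 * buffer keeps the stack small but makes this function non-reentrant. */
static uint64_t bench_cycles(void (*fn)(void *), void *arg) {
    static uint64_t samples[BENCH_RUNS];
    fn(arg); /* warm-up: bring code and data into the caches */
    for (size_t i = 0; i < BENCH_RUNS; i++) {
        uint64_t start = get_cycles();
        fn(arg);
        samples[i] = get_cycles() - start;
    }
    return median_u64(samples, BENCH_RUNS);
}
```

For example, `bench_cycles(my_hash, &ctx)` would return the median cost of one call to a hypothetical `my_hash` function. RDTSC is not a serializing instruction, so the processor may reorder it relative to the measured code. For very short functions, RDTSCP or a fence instruction is commonly added. On recent processors the counter also ticks at a constant nominal rate regardless of the current core frequency, which is one more reason to disable Turbo Boost.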
1.2 Parallel Computation - SIMD

Single Instruction Multiple Data
• Single Instruction Multiple Data (SIMD) is a class of parallel computers in which a single instruction is applied simultaneously to a set of data.
• Recent processors support SIMD through a bank of wider registers, known as vector registers.

Vector Instructions
Instructions that operate on vector registers are known as vector instructions. They operate on words packed into the vector registers.

Releases of Vector Instructions
[Figure: timeline of x86 vector instruction set releases, 1994 to 2020. Instruction sets: MMX, SSE, SSE2, SSE3, SSE4, AES-NI + CLMUL, AVX, AVX2, BMI, SHA1-SHA2, AVX-512. Application areas: integer arithmetic, floating-point arithmetic, string manipulation, cryptography, bit manipulation. Register files: MMX (64-bit), XMM (128-bit), YMM (256-bit), ZMM (512-bit).]

Relevant AVX2 Instructions
• Integer arithmetic on 64-bit words: 1 cycle for add/sub; 5 cycles for multiplications.
• Variable logical shifts: 1 cycle for fixed shifts; 2 cycles for variable shifts.
• Permutation of words: 3 cycles for permutations.
• Combination/selection of registers: up to 3 instructions per cycle without dependencies.
Each operation acts on four 64-bit words, A = (a3, a2, a1, a0) and B = (b3, b2, b1, b0), producing C = (c3, c2, c1, c0):
• C = ADD(A, B): c_i = a_i + b_i, for i = 0, ..., 3.
• C = VSHL(A, B): c_i = a_i << b_i, with each word shifted left by its own count.
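The four operation classes listed above correspond to AVX2 intrinsics documented in Intel's Intrinsics Guide. The following sketch is not taken from the slides. It applies each operation to four 64-bit words stored in memory:
- ADD uses `_mm256_add_epi64`.
- VSHL uses `_mm256_sllv_epi64`.
- PERM uses `_mm256_permutevar8x32_epi32`. Because this instruction permutes 32-bit lanes, each word selector is expanded into a pair of lane indices.
- BLEND uses `_mm256_blend_epi32` with a compile-time mask.

The helper names (`add4`, `vshl4`, `perm4`, `blend_odd4`, `BLEND_IMM`, `AVX2_FN`) are hypothetical. The `target("avx2")` attribute assumes GCC or Clang on x86-64.

```c
#include <immintrin.h>
#include <stdint.h>

/* Lets GCC/Clang compile these helpers for AVX2 without -mavx2;
 * the processor running them must still support AVX2. */
#define AVX2_FN __attribute__((target("avx2")))

/* C = ADD(A, B): c_i = a_i + b_i mod 2^64 (VPADDQ). */
static AVX2_FN void add4(uint64_t c[4], const uint64_t a[4], const uint64_t b[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)c, _mm256_add_epi64(va, vb));
}

/* C = VSHL(A, B): c_i = a_i << b_i, one count per word (VPSLLVQ).
 * Counts greater than 63 produce 0. */
static AVX2_FN void vshl4(uint64_t c[4], const uint64_t a[4], const uint64_t b[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)c, _mm256_sllv_epi64(va, vb));
}

/* C = PERM(A, M): c_i = a_{m_i}, m_i in {0, 1, 2, 3}, chosen at run time.
 * VPERMD permutes 32-bit lanes, so word selector m_i becomes the lane
 * pair (2*m_i, 2*m_i + 1). */
static AVX2_FN void perm4(uint64_t c[4], const uint64_t a[4], const uint32_t m[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i idx = _mm256_setr_epi32(
        (int)(2 * (m[0] & 3)), (int)(2 * (m[0] & 3) + 1),
        (int)(2 * (m[1] & 3)), (int)(2 * (m[1] & 3) + 1),
        (int)(2 * (m[2] & 3)), (int)(2 * (m[2] & 3) + 1),
        (int)(2 * (m[3] & 3)), (int)(2 * (m[3] & 3) + 1));
    _mm256_storeu_si256((__m256i *)c, _mm256_permutevar8x32_epi32(va, idx));
}

/* C = BLEND(A, B, M) with M fixed at compile time (VPBLENDD). The 8-bit
 * immediate has one bit per 32-bit lane, so each word bit m_i is doubled;
 * a set bit selects b_i, a clear bit keeps a_i. */
#define BLEND_IMM(m0, m1, m2, m3) \
    ((m0) * 0x03 | (m1) * 0x0C | (m2) * 0x30 | (m3) * 0xC0)

/* Example mask M = (m0, m1, m2, m3) = (0, 1, 0, 1): c = (a0, b1, a2, b3). */
static AVX2_FN void blend_odd4(uint64_t c[4], const uint64_t a[4], const uint64_t b[4]) {
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)c,
                        _mm256_blend_epi32(va, vb, BLEND_IMM(0, 1, 0, 1)));
}
```

Real implementations keep values in vector registers across many operations. The loads and stores here only make each helper easy to test. The blend mask must be an immediate, which is why `blend_odd4` fixes M. PERM accepts run-time selectors, at the cost of building an index vector.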
In the permutation and blend operations, M is a vector of selectors:
• C = PERM(A, M), with m_i ∈ {0, 1, 2, 3}: c_i = a_{m_i}, so C = (a_{m3}, a_{m2}, a_{m1}, a_{m0}).
• C = BLEND(A, B, M), with m_i ∈ {0, 1}: each c_i is taken from either a_i or b_i according to m_i (in the AVX2 blend instructions, a mask bit of 1 selects b_i).

Vector Instruction Guide
Full documentation is available at: http://software.intel.com/sites/landingpage/IntrinsicsGuide

Skylake Execution Engine
The Skylake processor has eight execution ports, which increases the available Instruction-Level Parallelism (ILP).

Section 2 Symmetric-Key Cryptography

2.1 Data Encryption

Secure Communication
• Alice and Bob would like to communicate through an insecure channel.
• Charles is a malicious third party who also has access to the channel.
• Alice and Bob want Charles to be unable to read the messages they exchange.