<<

Energy, performance, area versus security trade-offs for stream ciphers

L. Batina, J. Lano, N. Mentens, S. B. Örs, B. Preneel, I. Verbauwhede

Katholieke Universiteit Leuven, ESAT/SCD-COSIC Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium

Lejla.Batina,Joseph.Lano,Nele.Mentens,Siddika.Berna Ors,Bart.Preneel,[email protected]. ac.be Motivation z Area evaluation z Performance evaluation – Performance for bulk – Performance in actual applications z Energy evaluation – LFSR designs for low power consumption z Conclusions Gate Counts of the Basis Blocks used in 3 E0, A5/1, RC4 stream ciphers

3500 3000 2500 2000 1500 1000 500 0 128-bit 61-bit 256-byte 4-bit 8-bit 8-bit LFSR LFSR RAM Adder Adder register Throughput in Hardware Implementations (1/2)

Throughput = N * clock frequency N is the number of bits produced in every clock cycle. z N = 1 for E0 and A5/1. z RC4 produces 8 bits at a time, and it needs 3 clocks to produce 8 bits of output. N = 8/3 for RC4.

Designers have moved away from bit serial design and have adopted a “word serial” approach

Throughput = N * Wordlength * clock frequency N is the number of words produced in every clock cycle. Throughput in Hardware Implementations (2/2) z Parallelism of the operations in one algorithm increases the word length z Using a pipeline structure increases the clock frequency. – Stream ciphers with tight feedback loops (meaning that the current input depends on recent outputs) cannot be pipelined. Area and Throughput Evaluation for Bulk Encryption

Throughput [Mbits/s] 180 160 140 120 E0 100 A5/1 80 60 RC4 40 20 0 0 5000 10000 15000 Area [CLB slices] Throughput in Software Implementations z The clock frequency is now a constant for a given processor. – N should be as high as possible – The number of instructions should be as low as possible Performance in Actual Applications

z In synchronous stream ciphers the stream is generated independently from the plaintext. – Sender and receiver need to remain synchronized. – The is divided into small packets. In GSM 224 bits, in 2745 bits. Half of the packets circulating on the Internet is only 512 bits long. z Resynchronizing induces some overhead: the algorithm has to perform several clocks before it can start outputting information. – The shorter the packets, the more important this overhead becomes.

Resynchronization Factor (RF)=Total number of clocks performed / number of clocks performed to generate key stream

z The RF increases the clock frequency required from the design to achieve a certain throughput. Throughput clock frequency = * RF N * Wordlength z Stream ciphers should achieve a good tradeoff resulting in a secure and fast resynchronization mechanism. RF of RC4, E0 and A5/1

6

5

4 RC4 3 E0 2 A5/1

1

0 Resynchronization Factor (RF) 1000 3000 5000 7000 9000 Length of the packet in bits •The overhead of RC4 is significantly larger than for E0 and A5/1 ¾Becomes a problem for small packets. ¾The state of RC4 is much bigger, and initializing this state requires more work. Energy Evaluation

z The power consumption in CMOS circuits is primarily attributed to the voltage changes on load capacitors and is given by,

z The energy consumed by a circuit can be divided into two main categories: – The fundamental energy: the energy necessary to calculate the intended result. – The overhead energy: the energy associated with all overhead functions, e.g. z moving data around, z logic that switches but does not create outputs, etc. z This overhead is huge in software implementations because there is a large mismatch between the cryptographic operations and the data paths of the general purpose processors. z In embedded devices, crypto algorithms are usually implemented in hardware accelerators, or dedicated co-processors. LFSR designs for low power consumption z Not for cryptographic applications. z Using where n is the order of the polynomial is proposed. z Kitsos et al. proposed a reconfigurable LFSR design – The critical path of the proposed LFSR is 2.912 nsec and the bit-rate is 340.7 Mbits/s. – Number of library cells is 884. – The average power consumption is 0.77 mW. Conclusions z We have shown the hardware implementation results of the most popular stream ciphers RC4, E0 and A5/1 z The throughput, area and power consumption values of these stream ciphers can be used as a reference for the future stream ciphers z We have tried to give design guidelines to increase throughput and decrease area and power consumption