A High-Speed Hardware Implementation of the Hermes8-128 Stream Cipher

Conference Paper · September 2007 · DOI: 10.1109/ECCTD.2007.4529608 · Source: IEEE Xplore

Paris Kitsos
Computer Science, School of Science and Technology, Hellenic Open University, Patras, Greece
e-mail: [email protected]

Ulrich Kaiser
Texas Instruments Deutschland GmbH, 85350 Freising, Germany
e-mail: [email protected]

Abstract: An efficient high-speed hardware implementation of the Hermes8-128 stream cipher is presented in this paper. Hermes8-128 is proposed for hardware-based implementations in the eSTREAM project [1]. Two FPGA devices are used for the hardware implementations, specifically the XILINX (Spartan-2) 2S100-6 and (VIRTEX-4) 4VFX12-11. A maximum throughput of 56.5 Mbps is achieved at a clock frequency of 49 MHz with the XC2S100-6 device, while a throughput of 361 Mbps at 313 MHz is achieved with the 4VFX12-11 device. Since only one previously reported Hermes8-128 hardware implementation exists so far, a comparison with the proposed one is given.

I. INTRODUCTION

The continuous growth of mobility requires that engineers and developers design new cryptographic primitives with special care for speed, security and simplicity. RFID tags, smart cards and mobile pervasive-computing devices are typical examples of products where the amount of memory and power is very limited. The hardware implementations of today's algorithms, such as the AES cipher, are costly for devices with limited hardware resources, e.g. chip area or FPGA logic units. Stream ciphers are therefore useful in cases where low hardware complexity is needed.

The European Network of Excellence in Cryptology set up the eSTREAM project [1] with the main task to provide and recommend efficient stream ciphers for a wide variety of applications. One of the candidates is the Hermes8 [2] stream cipher. This cipher is proposed for both software (Profile-I) and hardware (Profile-II) byte-orientated implementations, e.g. Hermes8-128 with a key length of 16 bytes. Until now, one hardware implementation [3] of Hermes8 has been presented in the literature. That implementation is very compact, but at the cost of performance. The proposed implementation follows a different philosophy than that in [3], with the major goal of increasing the performance for efficient use in applications with high throughput requirements.

When evaluating the performance of Profile-II candidates, the area requirements and time performance of an implementation are the most important metrics [4].

The paper is organized as follows: In Section II, a brief introduction to the Hermes8-128 stream cipher is given. In Section III, the design methodology and the performance metrics are examined. The proposed architecture and VLSI implementation are presented in Section IV. Implementation results and discussion (comparison with other works) are reported in Section V. Finally, Section VI concludes this paper.

II. HERMES8-128 STREAM CIPHER SPECIFICATIONS

Hermes8 is based on the Substitution-Permutation-Network (SPN) principle. The substitution (confusion) is performed by means of an S-BOX. The permutation and diffusion are performed by means of addressing the different state bytes, the different key bytes, and most importantly the chaining with the help of the Accu. A basic block diagram of the Hermes8-128 cipher is illustrated in Fig. 1.

Figure 1. The basic Data Flow Diagram of the Hermes8 Stream Cipher

Hermes8-128 contains 16 key bytes and 37 state bytes. There are two pointers involved: p1 addresses one of the state bytes, p2 addresses one of the key bytes (see Fig. 1). The pointers obey modulo addition in order to ensure that they always address valid register space. The use of pointers is favorable over shift register designs when low-power requirements dominate the design.

The core state operation (called sub-round) consists of the following steps:
1. Select a certain state byte and EXOR it with Accu,
2. Select a certain key byte and EXOR it with the previous result,
3. Take the previous result and apply the S-BOX function,
4. Store the previous result in Accu,
5. Copy Accu into the same state byte selected in step 1.

The S-BOX is 8 bits wide in order to provide a proper non-linear Boolean function needed for substitution, i.e. confusion. The first choice is the known S-BOX of AES, which is strong against differential cryptanalysis; however, random-number-based S-BOXes are also suitable if their differential distribution table (ddt) demonstrates good quality with respect to differential cryptanalysis attacks.

The key bytes are modified every KEY_STEP3 (i.e. seven) steps during the sub-round loops, depending on the position of p2. Two temporary pointers p3 and p4 address the key bytes following the byte addressed by p2. The byte k[p2] is not modified because it has to be used in the following sub-round. But the bytes k[p3] and k[p4] are 'rather old' and are therefore candidates for modification; they are replaced by SBOX[ k[p3] exor k[p2] ] and SBOX[ k[p4] exor k[p2] ] respectively. The exor'ing with k[p2] is advantageous over the direct application of the SBOX, because the inverse function of the SBOX does exist. Therefore, backtracking is hampered by means of this method. The dashed pointer in Fig. 2 represents the next p2 position (because KEY_STEP1=3) when addressing the next key byte needed for the next sub-round.

A similar method is followed for the generation of the key stream ks[]. The key stream bytes are derived from the state bytes state[]. Since the pointer p1 has been incremented after the last sub-round, it points to the 'oldest' available state byte. This is the first byte to be packed into the key stream block of sixteen bytes. Further bytes then follow by means of the output pointer po, which is incremented by two in order to separate consecutive sub-rounds from each other. Since a new output block of key stream bytes follows not earlier than after the next STREAM_ROUNDS=3 are completed, the state byte contents corresponding to the same address are separated by 3 x 37 sub-rounds. During these 111 Hermes8-128 sub-rounds there are nearly 16 occurrences of key modification, i.e. about 32 key bytes are modified per output block in relation to 16 key byte registers. More information, including the Hermes8-128 cipher pseudo code, can be found in the original specification and a related paper [2].

III. DESIGN METHODOLOGY

The design of Hermes8-128 is developed in VHDL with structural description logic such that it can be synthesized for FPGA devices. Specifically, two XILINX FPGA devices [5], the SPARTAN-II XC2S30 and VIRTEX-IV X4VFX12, are used in order to evaluate the performance of the proposed implementation. The following performance metrics are used in this paper.

· Circuit Area (A)
The term A represents the total circuit area that is required for the implementation, expressed in CLB numbers (# CLBs).

· Maximum Clock Frequency (F)
The maximum clock frequency, given in MHz, is determined by the critical path of the circuit.

· Total Throughput (T)
The total throughput of the algorithm expresses the number of cipher text bits simultaneously generated by the algorithm per second. It can be calculated from the following equation:

$$T = \frac{\#\text{bits} \times F}{\#\text{clock cycles}} \qquad (1)$$

IV. HERMES8-128 HARDWARE ARCHITECTURE

Hermes8 is designed for a dedicated byte-oriented hardware implementation. The architecture that generates the Hermes8-128 stream cipher's key stream is shown in Fig. 2. This architecture mainly consists of the State Register, the Key Register, one S-Box and the Accu register. In addition, some multiplexers support the correct operation of the Hermes8-128 cipher.

Two important modules are the Modulo Counters generator and the Control Unit. The Modulo Counters generate the appropriate count values (p1, p2, etc.) used by the cipher. For the initialization of these counters some predefined values (derived from the XOR of a number of key bytes) must be loaded. The Control Unit produces all signals that are responsible for the correct synchronization and operation of the overall design.

Fig. 3 shows the implementation of the State Register. This register consists of 37 byte registers, a codec circuit, and 37 2-input byte OR gates. It initially stores the 37 IV bytes, and each byte is updated by the output of the Accu register through the 2x1 byte OR gates. The codec circuit block takes the p1 value as input and produces the proper byte register enable signals in order to update the right byte at the right time according to the p1 value.

Figure 3. The Implementation of the State Register

Finally, the implementation of the Key Register is depicted in Fig. 4. This register consists of 16 byte registers, 16 2x1 8-bit Multiplexers (MUXes), three 16x1 byte Multiplexers (MUXes), 16 2-input byte OR gates, two S-Boxes and 16 3x1 OR gates. The register initially stores the 16 key bytes, and each byte is updated by either the K[p3]new or the K[p4]new value through the 2x1 byte MUXes.

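The five sub-round steps and the key modification rule of Section II can be sketched in software as follows. This is a minimal illustration, not the authors' VHDL design or the reference code of [2]. The function names are hypothetical, the AES S-BOX is generated from its standard definition (the text names it as the first choice), and the pointer values are supplied by the caller because the exact pointer schedule is not reproduced here. The state and key contents in the demonstration are placeholders, not test vectors.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_aes_sbox():
    """Build the AES S-BOX: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(
            y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            # XOR in the inverse rotated left by 1, 2, 3 and 4 bits.
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def sub_round(state, key, accu, p1, p2, sbox=SBOX):
    """Apply one sub-round in place and return the new Accu value."""
    t = state[p1] ^ accu   # step 1: state byte EXOR Accu
    t ^= key[p2]           # step 2: EXOR with the selected key byte
    t = sbox[t]            # step 3: apply the S-BOX
    accu = t               # step 4: store the result in Accu
    state[p1] = accu       # step 5: copy Accu back into the state byte
    return accu


def modify_key(key, p2, p3, p4, sbox=SBOX):
    """Replace k[p3] and k[p4] as described in the text; k[p2] is kept."""
    key[p3] = sbox[key[p3] ^ key[p2]]
    key[p4] = sbox[key[p4] ^ key[p2]]


if __name__ == "__main__":
    state = list(range(37))   # placeholder state, NOT a real IV
    key = list(range(16))     # placeholder key, NOT a test vector
    accu = sub_round(state, key, accu=0, p1=0, p2=0)
    modify_key(key, p2=0, p3=1, p4=2)
    print(hex(accu), [hex(k) for k in key[:3]])
```

Passing the pointers as arguments keeps the sketch independent of the modulo counter schedule, which in the hardware is produced by the Modulo Counters block.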