
International Journal of Computer Applications (0975 – 8887) Volume 107 – No 13, December 2014

High Performance Architecture for LILI-II

N. B. Hulle, GHRIEET, Pune, Domkhel Road, Wagholi, Pune
R. D. Kharadkar, Ph.D., GHRIEET, Pune, Domkhel Road, Wagholi, Pune
S. S. Dorle, Ph.D., GHRCE, Nagpur, Hingana Road, Nagpur

ABSTRACT
This work presents a high performance architecture for the LILI-II stream cipher. The cipher uses a 128 bit key and a 128 bit IV to initialize two LFSRs. The proposed architecture uses a single clock for both LFSRs, which makes it useful in high speed communication applications. It shifts LFSRD by one to four bits in a single clock cycle without losing any data items from function FC. The architecture is coded in VHDL using the CAD tool Xilinx ISE Design Suite 13.2, and the targeted hardware is a Xilinx Virtex5 FPGA having device xc4vlx60 with package ff1148. The proposed architecture achieved a throughput of 224.7 Mbps at a frequency of 224.7 MHz.

General Terms
Hardware implementation of stream ciphers

Keywords
LILI, Stream cipher, clock controlled, FPGA, LFSR.

1. INTRODUCTION
Message secrecy is one of the most important aspects of communication. Messages are especially insecure in a wireless environment, where encryption is a must. Implementations of cryptographic algorithms in hardware usually achieve superior performance compared with software based ones, which meets the growing requirement for high-speed and highly secure communications. Security algorithms and their associated keys are more secure when implemented in hardware because they cannot easily be modified by an outside attacker [1]. Reconfigurable devices such as FPGAs are a highly attractive option for a hardware implementation, as they provide the flexibility of dynamic system evolution as well as the ability to easily implement a wide range of algorithms [2], [3].

During the last three decades, the art of securing all forms of data has improved to a great extent. The complexity of the functions that can now be performed by security equipment has increased. The majority of the more effective cipher systems have become far more mathematically intensive, and hence their evaluation has become complex and abstract. The LILI-II stream cipher is an LFSR based synchronous stream cipher, designed with larger internal components than previous ciphers in this class. It is an improved version of the LILI-128 cipher and is capable of providing a higher level of security [4]. The cipher is less efficient in software because of the large LFSRs and the bigger Boolean function used to increase the level of security. It uses a 128 bit key and a 128 bit IV for initializing the LFSRs [5].

This work presents a high performance LILI-II stream cipher. The architecture uses the same clock for both LFSRs. It is capable of shifting the LFSRD content by one to four stages in a single clock cycle, depending on the value of function FC, without losing any data from function FC.

The work described in this article is part of a sponsored project by BCUD, Savitribai Phule Pune University, Pune, to strengthen research activity among academicians. Proposal No. 13ENG001265.

2. LILI-II STREAM CIPHER
LILI-II is a synchronous stream cipher developed by A. Clark et al. in 2002 by removing the existing weaknesses of the LILI-128 stream cipher. It consists of two subsystems, the clock controlled subsystem and the data generation subsystem, as shown in Fig. 1.

[Block diagram: the 128 bit KEY and IV initialize LFSRC and LFSRD; stages X0 and X126 of LFSRC feed function fc in the clock controlled subsystem; stages X0, X1, ..., X96, X122 of LFSRD feed function fd in the data generation subsystem, which outputs the key stream.]
Fig. 1: Block diagram of LILI-II stream cipher

The clock controlled subsystem consists of LFSRC and function FC. LFSRC is a 128 bit shift register with the primitive feedback polynomial given by equation (01) [5].

$$
\begin{aligned}
& x^{128} + x^{126} + x^{125} + x^{124} + x^{123} + x^{122} + x^{119} + x^{117} + x^{115} + x^{111} + x^{108} + x^{106} + x^{105} \\
& + x^{104} + x^{103} + x^{102} + x^{96} + x^{94} + x^{90} + x^{87} + x^{82} + x^{81} + x^{80} + x^{79} + x^{77} + x^{74} + x^{73} \\
& + x^{72} + x^{71} + x^{70} + x^{67} + x^{66} + x^{65} + x^{61} + x^{60} + x^{58} + x^{57} + x^{56} + x^{55} + x^{54} + x^{53} \\
& + x^{52} + x^{51} + x^{50} + x^{49} + x^{47} + x^{44} + x^{43} + x^{40} + x^{39} + x^{36} + x^{35} + x^{30} + x^{29} + x^{25} \\
& + x^{23} + x^{18} + x^{17} + x^{16} + x^{15} + x^{14} + x^{11} + x^{9} + x^{8} + x^{7} + x^{6} + x + 1
\end{aligned}
\tag{01}
$$

Function FC accepts the stage 0 and stage 126 output bits of LFSRC as input and produces four possible output combinations. These four combinations decide the shifting of LFSRD by one to four stages. Function FC is given by equation (02).

$$
F_C(x_0, x_{126}) = 2x_0 + x_{126} + 1 \tag{02}
$$

The data generation subsystem consists of shift register LFSRD and function FD. LFSRD is a 127 bit shift register with the primitive feedback polynomial given by equation (03).

$$
\begin{aligned}
& x^{127} + x^{121} + x^{120} + x^{114} + x^{107} + x^{106} + x^{103} + x^{101} + x^{97} + x^{96} + x^{94} + x^{92} + x^{89} \\
& + x^{87} + x^{84} + x^{83} + x^{81} + x^{76} + x^{75} + x^{74} + x^{72} + x^{69} + x^{68} + x^{65} + x^{64} + x^{62} + x^{59} \\
& + x^{57} + x^{56} + x^{54} + x^{52} + x^{50} + x^{48} + x^{46} + x^{45} + x^{43} + x^{40} + x^{39} + x^{37} + x^{36} + x^{35} \\
& + x^{30} + x^{29} + x^{28} + x^{27} + x^{25} + x^{23} + x^{22} + x^{21} + x^{20} + x^{19} + x^{18} + x^{14} + x^{10} + x^{8} \\
& + x^{7} + x^{6} + x^{4} + x^{3} + x^{2} + x + 1
\end{aligned}
\tag{03}
$$

LFSRD shifting is controlled by function FC. The function FD takes twelve inputs from LFSRD stages (0, 1, 3, 7, 12, 20, 30, 44, 65, 80, 96, and 122) and provides a one bit output as key stream for each clock cycle. Function FD is a lookup table given in the specification [5] of the LILI-II stream cipher.

3. PROPOSED LILI-II ARCHITECTURE
The proposed LILI-II architecture is shown in Fig. 2. It consists of two subsystems, the clock controlled subsystem and the data generation subsystem. The initial values of the LFSRs in both subsystems are obtained from the 128 bit secret key and the 128 bit publicly known IV. If the publicly known IV is not 128 bits long, multiple copies are concatenated or truncated to form a 128 bit vector.

The initial state of LFSRC is obtained by XORing the 128 bit key with the 128 bit IV. Similarly, the initial state of LFSRD is obtained by deleting the first bit of the 128 bit key and the last bit of the 128 bit IV, and XORing the two 127 bit strings.

If a cryptanalyst has access to the output Z(i) and the IV during the initialization process, the key may be recovered for the available set of Z(i) and IV. Appropriate security is maintained when the key loading process does not leak information about the key. To deny the cryptanalyst access to Z(i) during key loading, a DFF is used at the output. This DFF does not latch the input string until the initialization process is complete [5], [6], [7], [8], [9].

The actual initialization process uses the LILI-II structure two times, for a period of 255 clock cycles each. At the start, LFSRC and LFSRD are loaded with the initial state as explained above and the architecture runs to produce a 255 bit binary string. This string is used as the new initial state of LFSRC and LFSRD: the first 128 bits are used for LFSRC and the remaining 127 bits for LFSRD [5], [6].

The architecture then runs a second time to produce another 255 bit string, starting from the state taken from the previously generated string. This second 255 bit string is loaded into LFSRC and LFSRD to begin key stream generation. As before, the first 128 bits are used for LFSRC and the remaining 127 bits for LFSRD [5], [6].

The proposed architecture uses a two input multiplexer for loading initial values during the initialization stage and for shifting the LFSR content during normal operation. When the "Initial Data Load LFSRC" terminal is '1', the initial values are loaded into the LFSRC stages at the positive edge of the clock; otherwise the LFSRC contents are shifted by one bit towards the right at the positive edge of each clock.

The Boolean expression of function FC has three terms: 2X0, X126 and the constant 1. The term 2X0 has only two possible values, either 00 or 10 (only the MSB changes). The term X126 also has two possible values, 0 or 1, and the third term is the constant 1. The truth table has four terms, A1 (2X0, MSB), A0 (constant '0', LSB), B0 (X126) and C0 (constant '1'), as shown in Table 1.

Table 1. Truth table for Boolean function FC

Inputs      Case 1   Case 2   Case 3   Case 4
A = 2X0     00       00       10       10
B = X126    0        1        0        1
C = 1       1        1        1        1
Sum         1        0        1        0
Carry       0        1        1        10
FC          001      010      011      100
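The truth table in Table 1 can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the gate-level form is simply one decomposition that reproduces Table 1, not necessarily the logic used in the authors' design.

```python
def fc_arithmetic(x0, x126):
    """FC as written in equation (02): 2*x0 + x126 + 1, giving 1 to 4."""
    return 2 * x0 + x126 + 1


def fc_gates(x0, x126):
    """One possible gate-level form of FC (an assumption, see lead-in)."""
    lsb = x126 ^ 1      # sum of x126 and the constant 1
    mid = x0 ^ x126     # x0 plus the carry out of the LSB addition
    msb = x0 & x126     # carry out of the middle bit
    return (msb << 2) | (mid << 1) | lsb


def mux_select(x0, x126):
    """Two low-order bits of FC: 01, 10, 11, 00 for shifts of 1, 2, 3, 4."""
    return fc_gates(x0, x126) & 0b11


# Print the four cases in the same order as Table 1.
for x0 in (0, 1):
    for x126 in (0, 1):
        value = fc_gates(x0, x126)
        print(x0, x126, format(value, "03b"), format(mux_select(x0, x126), "02b"))
```

Only the two low-order bits are needed as a select code, since a shift of four stages is encoded as 00.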

[Block diagram: LFSRC (stages 1 to 128) with its feedback function and initial data load input; function FC followed by DFFs; LFSRD1 to LFSRD4 (127 stages each) fed by feedback functions f1 to f4, with an initial data load input and an update path; function FD (4096x1 ROM); output DFF producing Z(t). Select codes: 01 = LFSRD1, 10 = LFSRD2, 11 = LFSRD3, 00 = LFSRD4.]
Fig. 2: Proposed Architecture for LILI-II
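The derivation of the initial LFSR states described in Section 3 can be sketched as follows. This is a minimal illustration under stated assumptions: bits are held in Python lists, index 0 is taken to be the "first" bit, and the helper names `fit_iv`, `initial_states` and `split_reload` are hypothetical. The two 255-cycle initialization runs themselves are not modelled here.

```python
def fit_iv(iv_bits, width=128):
    """Repeat or truncate the IV so that it is exactly `width` bits long."""
    if not iv_bits:
        raise ValueError("IV must contain at least one bit")
    copies = -(-width // len(iv_bits))  # ceiling division
    return (list(iv_bits) * copies)[:width]


def initial_states(key_bits, iv_bits):
    """Return the (LFSRC, LFSRD) starting states derived from key and IV."""
    if len(key_bits) != 128:
        raise ValueError("key must be 128 bits")
    iv = fit_iv(iv_bits)
    # LFSRC: key XOR IV (128 bits).
    state_c = [k ^ v for k, v in zip(key_bits, iv)]
    # LFSRD: key without its first bit XOR IV without its last bit (127 bits).
    state_d = [k ^ v for k, v in zip(key_bits[1:], iv[:-1])]
    return state_c, state_d


def split_reload(bits_255):
    """Split a generated 255 bit string: first 128 bits for LFSRC, rest for LFSRD."""
    if len(bits_255) != 255:
        raise ValueError("expected 255 bits")
    return bits_255[:128], bits_255[128:]


# Example with an all-ones key and a short two bit IV that gets repeated.
state_c, state_d = initial_states([1] * 128, [0, 1])
print(len(state_c), len(state_d))
```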

When the output of FC is 01, 10, 11 or 00, the content of LFSRD is shifted by one, two, three or four stages respectively in one clock cycle.

The shift register LFSRD is designed so that it can shift its content by one to four stages in one clock cycle, depending on the output of function FC. The proposed architecture uses an LFSRD made of four sets of 127 bit shift registers, named LFSRD1, LFSRD2, LFSRD3 and LFSRD4. LFSRD1 to LFSRD4 are used for shifting the original content of LFSRD by one to four bits respectively in a single clock cycle, depending on the output of function FC. Shifting of LFSRD by one to four bits is shown in Fig. 3.
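Shifting by k stages in one clock cycle requires k new feedback bits that are all computed from the current register contents. The Python sketch below illustrates this look-ahead idea and checks it against k ordinary single shifts. It is not the authors' VHDL: the tap positions in `TAPS` are hypothetical placeholders rather than the polynomial of equation (03), and the list layout (index 0 holds stage 1, new bits enter at stage 1) is an assumption.

```python
import random

N = 127                # length of LFSRD
TAPS = (127, 64, 1)    # hypothetical 1-based tap positions, for illustration only


def feedback(stage):
    """XOR of the tapped stages; stage(p) returns the bit at position p."""
    bit = 0
    for p in TAPS:
        bit ^= stage(p)
    return bit


def shift_once(state):
    """Ordinary single shift: the feedback bit enters at stage 1."""
    new_bit = feedback(lambda p: state[p - 1])
    return [new_bit] + state[:-1]


def lookahead_bits(state, k):
    """Compute [f1, ..., fk] from the current state without shifting it."""
    f = []
    for j in range(k):
        def stage(p, j=j):
            # After j shifts, position p holds old stage p - j if p > j,
            # otherwise it holds the earlier feedback bit f_(j - p + 1).
            return state[p - j - 1] if p > j else f[j - p]
        f.append(feedback(stage))
    return f


def shift_k(state, k):
    """Shift by k stages in one step: layout is fk ... f1, old stages 1 .. N-k."""
    f = lookahead_bits(state, k)
    return f[::-1] + state[:-k]


# Compare the one-step result with k successive single shifts.
rng = random.Random(0)
state = [rng.randint(0, 1) for _ in range(N)]
for k in (1, 2, 3, 4):
    serial = state
    for _ in range(k):
        serial = shift_once(serial)
    assert shift_k(state, k) == serial
```

Because stage 1 is one of the taps in this toy example, the second feedback bit depends on the first, which is why the later feedback bits need the earlier ones as inputs.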


Single Shift:  1  2  3  4  5  6  7 . . . 123 124 125 126 127
Double Shift:  f2 f1 1  2  3  4  5 . . . 121 122 123 124 125
Triple Shift:  f3 f2 f1 1  2  3  4 . . . 120 121 122 123 124
Quad Shift:    f4 f3 f2 f1 1  2  3 . . . 119 120 121 122 123
Fig. 3: Shifting of LFSRD in single clock cycle

The proposed architecture uses a two input multiplexer for loading initial values into the LFSRD stages. When the "Initial Data Load LFSRD" terminal of LFSRD1 is '1', the initial values are loaded into the LFSRD1 stages. During normal operation this terminal is '0' and normal shifting takes place. The architecture also contains a four input multiplexer for loading the shifted value of LFSRD1 to LFSRD4 into LFSRD1, as required for the next clock cycle. When the select line of this multiplexer is 01, 10, 11 or 00, the shifted content of LFSRD1, LFSRD2, LFSRD3 or LFSRD4 respectively is loaded into LFSRD1.

The architecture presented in this paper uses four sets of feedback functions, f1, f2, f3 and f4, as shown in Fig. 4. Feedback functions f1 to f4 are used for shifting LFSRD by one to four stages respectively in a single clock cycle.

The functions f1 to f4 are calculated from the current state of LFSRD1 in each clock cycle. For example, the first input used by feedback function f1 is X127; the same input for feedback functions f2, f3 and f4 is taken from X126, X125 and X124 respectively. Similarly, the last input for feedback function f1 is X1; the same input for feedback functions f2, f3 and f4 is f1, f2 and f3 respectively. This is because f1 is the new value taken as input after the first shift, and similarly for the third and fourth shifts.

In the proposed architecture, the twelve stage outputs (0, 1, 3, 7, 12, 20, 30, 44, 65, 80, 96, and 122) of LFSRD required for function FD are connected through four input multiplexers. When the output of function FC is 01, 10, 11 or 00, the output of LFSRD1, LFSRD2, LFSRD3 or LFSRD4 respectively is selected at the multiplexer and delivered to function FD.

At the same time, the selected value of LFSRD (1, 2, 3 or 4) in the current clock cycle needs to be loaded into LFSRD1 in the next clock cycle, so that these contents can be shifted by one to four stages depending on the value of function FC. This is done by storing the value of function FC in the FFs and using it in the next clock cycle, as shown in Fig. 2.

Fig. 4: Feedback function of LFSRD

The function FD of the data generation subsystem is implemented as a lookup table in ROM having twelve address lines, with one bit of data at each location. Depending on the input address, the one bit key stream is available at the output.

4. RESULTS AND DISCUSSION
The proposed architecture uses a compact implementation of function FC compared with existing implementations [3], [5], [6]. The specification of LILI-II [5] and the previous implementation [6] use a full adder to implement function FC. We used a compact architecture for function FC as shown by equations (04) and (05).

The previous implementation [6] uses a clock pulse mechanism and an AND gate to shift the content of LFSRD by one to four states depending on the value of function FC. If LFSRD4 of that architecture is clocked four times, it provides four outputs, whereas the specification expects one. Similarly, that architecture does not use all the values generated by function FC. For example, when the content of LFSRD is to be shifted by four states, LFSRD does not accept any value from function FC during those four clock cycles.

Implementation [6] does not use a proper loading arrangement after each cycle. For example, if the content of LFSRD4 is shifted by four stages, the same content needs to be shifted in the next clock cycle depending on the value of function FC. The previous implementation instead uses the initial contents for shifting LFSRD1, LFSRD2 or LFSRD3. This may be a serious security issue.

Reconfiguration of the LFSR with different feedback functions is one of the good concepts proposed in the previous implementation [6]. This concept may be used for providing different security levels. The feedback logic of the proposed architecture is not reconfigurable. The architecture proposed in this paper is capable of shifting the content of LFSRD by one to four states in a single clock cycle instead of using multiple clock cycles. The proposed implementation can also load the shifted content in the next cycle and shift it by the requisite number of states depending on the value of function FC.

The proposed implementation is coded in VHDL and uses the XILINX xc4vlx60 FPGA device with package ff1148. A comparison of synthesis results between the previous [6] and the proposed implementation is shown in Table 2. The comparison shows that the proposed architecture is compact compared with previous implementations. The architecture achieves a maximum throughput of 224.7 Mbps at a frequency of 224.7 MHz. The achieved throughput is higher than that of the basic architecture but lower than that of implementations on higher-end FPGAs proposed by other authors.

Table 2. Comparisons of hardware resources used

Logic term                   Utilized in proposed   Utilized in previous [6]
                             implementation         implementation
Function Generators          386                    938
Number of Slices             514                    469
Number of Slice Flip Flops   261                    693
Number of bonded IOBs        260                    391
Block RAM                    0                      1
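The reported throughput follows from producing one key stream bit per clock cycle. The short Python sketch below reproduces that arithmetic and, purely as a hypothetical comparison, estimates the throughput of a serial scheme that spends one clock per stage shifted. The serial scheme and the assumption that the four shift amounts are equally likely are illustrative only and do not describe any measured implementation.

```python
def throughput_mbps(clock_mhz, clocks_per_bit=1.0):
    """Key stream throughput in Mbps for a design that needs
    `clocks_per_bit` clock cycles per key stream bit."""
    return clock_mhz / clocks_per_bit


CLOCK_MHZ = 224.7  # maximum frequency reported for the proposed architecture

# Proposed scheme: one key stream bit every clock cycle.
single_cycle = throughput_mbps(CLOCK_MHZ)

# Hypothetical serial scheme: one clock per stage shifted, with the four
# shift amounts (1 to 4) assumed equally likely.
average_shift = sum((1, 2, 3, 4)) / 4
serial = throughput_mbps(CLOCK_MHZ, clocks_per_bit=average_shift)

print(f"single-cycle shifting: {single_cycle:.1f} Mbps")
print(f"serial shifting (assumed model): {serial:.2f} Mbps")
```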


5. CONCLUSION
The proposed architecture reduces the hardware requirements compared with previous implementations and achieves a throughput of 224.7 Mbps at a clock frequency of 224.7 MHz. The architecture is capable of shifting the LFSRD content by one to four states in a single clock cycle without losing any states from function FC. It can also load the shifted content in the next clock cycle, as required for further shifting. It may be more secure than previous architectures.

6. REFERENCES
[1] Noman, A. Al, Sidek, R. M., & Ramli, A. R., "Hardware Implementation of RC4A Stream Cipher", International Journal of Cryptology Research, 2009, 1(2), 225–233.
[2] Galanis, M., Kitsos, P., Kostopoulos, G., Sklavos, N., & Goutis, C., "Comparison of the Hardware Implementation of Stream Ciphers", The International Arab Journal of Information Technology, 2005, 2(4), 267–274.
[3] Philippe Leglise, Francois-Xavier Standaert, Gael Rouvroy, Jean-Jacques Quisquater, "Efficient Implementation of Recent Stream Ciphers on Reconfigurable Hardware Devices", In Proc. of the 26th Symposium on Information Theory in the Benelux, May 19th-May 20th, 2005, Brussels, Belgium.
[4] A. Clark, E. Dawson, J. Fuller, J. Golic, H-J. Lee, William Millan, S-J. Moon, and L. Simpson, "The LILI-128 generator", In Selected Areas in Cryptography, SAC 2000, volume 2012 of Lecture Notes in Computer Science. Springer-Verlag, 2000.
[5] Clark, Andrew J. and Fuller, Joanne E. and Golic, Jovan Dj and Dawson, Edward P. and Lee, H-J and Millan, William L. and Moon, Sang-Jae and Simpson, L. R., "The LILI-II Keystream Generator", In Information Security and Privacy, 2002, pp. 25–39.
[6] Kitsos, P., Sklavos, N., & Koufopavlou, O., "A High-Speed Hardware Implementation of the LILI-II Keystream Generator", In 5th International Symposium on Communications Systems, Networks & Digital Signal Processing, 2006, pp. 6–9.
[7] Kircanski, A., & Youssef, A. M., "On the Sliding Property of SNOW 3G and SNOW 2.0", In SAS 2012, LNCS 7707, 2013, Springer-Verlag (Vol. 5, pp. 119–134).
[8] ETSI/SAGE: "Document 2: Specification of the 3GPP Confidentiality and Integrity Algorithms 128-EEA3 & 128-EUA3: ZUC specification", Version 1.4, 2010.
[9] H. Englund and T. Johnson, "A New Distinguisher for Clock Controlled Stream Ciphers", Fast Software Encryption 2005, LNCS Vol 3557, pages 181-195, Springer, 2005.
