
An Efficient Implementation of the Blowfish Encryption Algorithm

A thesis submitted to the

Graduate School of the University of Cincinnati In partial fulfillment of the requirements For the degree of

Master of Science

In the Department of Electrical and Computer Engineering

Of the College of Engineering and Applied Sciences

July 2014

By

Ramya Krishna Addluri

B.E, Electronics and Communications Engineering, University College of Engineering, Osmania University, India, 2011

Thesis Advisor and Committee Chair: Dr. Carla Purdy

ABSTRACT

Global networking and mobile computing are prevalent today, driven by the growth of communication channels such as telephones, computers, the Internet and broadcasting. Because these transmission channels are open, there is no guarantee that information is protected from unauthorized access through eavesdropping. Several encryption algorithms have therefore been designed to secure information that is stored or transmitted. Some applications require encryption and decryption throughputs so high that they cannot be achieved on a general-purpose microprocessor. FPGAs are a suitable platform for such implementations.

We briefly study several cryptographic algorithms designed to prevent eavesdropping and then focus in detail on the Blowfish encryption algorithm. It is widely used in tools for password management, database security, email encryption, etc. The algorithm is implemented in VHDL and then deployed onto a Nios II soft-core processor on an Altera DE1 board. Altera Quartus II is used to synthesize the design. Software-based Huffman encoding is applied to the input of the algorithm to reduce redundancy and to achieve lossless data compression. The results are analyzed.


ACKNOWLEDGEMENT

First of all, I would like to thank my advisor, Dr. Carla Purdy, for her continued support and guidance throughout my thesis. I would like to thank Dr. George Purdy and Dr. Wen Ben Jone for being a part of my defense committee.

Thanks to my family and friends for their support and cooperation.


TABLE OF CONTENTS

1. INTRODUCTION

1.1 Motivation

1.2 Goals of Thesis

1.3 Thesis Organization

2. BACKGROUND

2.1 Need for Cryptography

2.2 Types of Cryptographic Algorithms

3. OUTLINE OF RESEARCH

3.1 Blowfish Encryption Algorithm

3.2 HDL Implementation

3.3 Huffman Encoding

4. RESULTS

4.1 Implementation and Simulation

4.2 Analysis

5. CONCLUSIONS AND FUTURE WORK

5.1 Conclusions

5.2 Future Work

REFERENCES

APPENDIX

Tutorial for Simulation in Modelsim and Quartus II

LIST OF FIGURES

Figure 2.1: Types of Cryptography [17]

Figure 2.2: ECB Mode Encryption [22]

Figure 2.3: ECB Mode Decryption [22]

Figure 2.4: CBC Mode Encryption [22]

Figure 2.5: CBC Mode Decryption [22]

Figure 2.6: CFB Mode Encryption [22]

Figure 2.7: CFB Mode Decryption [22]

Figure 2.8: OFB Mode Encryption [22]

Figure 2.9: OFB Mode Decryption [22]

Figure 3.1: Blowfish Feistel Structure [25]

Figure 3.2: Blowfish Feistel Function [25]

Figure 3.3: Blowfish Algorithm [28]

Figure 3.4: Block Diagram of Blowfish Implementation

Figure 3.5: Initialization of Memory in Form of Arrays

Figure 3.6: Huffman Encoding Example

Figure 4.1: Simulation Results Using Altera Quartus II

Figure 4.2: Logic Circuitry Utilization in EP2C70F896C7 and EP2C20F484C7 (DE1)

Figure 4.3: Utilization of Pins in EP2C70F896C7 and EP2C20F484C7 (DE1)

Figure 4.4: 3-DES and Blowfish Resource Utilization Comparison

Figure A.1: Device Settings in Quartus II

Figure A.2: Unsuccessful Compilation Window

Figure A.3: Successful Compilation Window

Figure A.4: Blowfish Encryption

Figure A.5: Blowfish Decryption

LIST OF TABLES

Table 3.1: Symbols and Their Corresponding Codes

Table 3.2: American English Alphabet and Their Corresponding Frequencies [31]

1. INTRODUCTION

1.1 Motivation

Encryption allows us to hide information being transmitted or stored. Depending on the chosen keys, a given algorithm transforms the message from one form into another, i.e., it converts plaintext into ciphertext. Encryption does not in itself prevent interception, but it denies the message content to the interceptor [1]. The ciphertext can be read only if it is decrypted. Key management is as important as protecting the data, and several hardware modules exist to protect the key from unauthorized access.

Choosing the length of an encryption key is a crucial issue. It depends on how long the information needs to remain secure, the type of encryption used, how valuable the data is, and several other factors. A 64-bit symmetric key is sufficient for data that needs to be secure for less than several weeks. A 128-bit key is sufficient for data that needs to be secure for years or decades. However, a 160-bit key is required if the data has to be secure for a longer period of time [2]. Hence, encryption algorithms that support varying key sizes are a worthwhile area of focus.

DES has been the predominant block cipher since it was developed by IBM in the 1970s [3]. It was designed for efficient hardware implementation, and several alternative block cipher designs have been proposed because of its relatively slow operation in software. On January 2, 1997, NIST (National Institute of Standards and Technology) announced that it wanted to choose a successor to DES [4]. The algorithm then known as Rijndael was selected as the replacement and became the Advanced Encryption Standard (AES) [5]. It supports three key lengths, 128, 192 and 256 bits, with a block size of 128 bits [5]. Blowfish's successor, Twofish [6], was one of the finalists.

Blowfish is a symmetric block cipher with a variable key length from 32 bits to 448 bits. It is thus suitable for both domestic and export use. Blowfish was designed by Bruce Schneier in 1993 and can be used as a replacement for DES (Data Encryption Standard) or IDEA (International Data Encryption Algorithm) [7].

The degree of security of the algorithm has been evaluated using cryptographic tests such as randomness tests [8], the avalanche criterion and the correlation coefficient [9], and Blowfish passed all of them. Several experiments compared the execution speed [10] and security of Blowfish with other encryption algorithms such as AES, DES and 3DES, and Blowfish was the best performing among them [11]. It is, however, important to weigh the tradeoffs between speed, size and implementation [12]. Serge Vaudenay examined weak keys in Blowfish and found a class of keys that can be detected but cannot be broken in full (16-round) Blowfish. Vincent Rijmen developed a second-order differential attack on 4-round Blowfish, but it cannot be extended to more rounds [13].

An advantage of Blowfish is that it is unpatented and license-free, and it has been widely accepted as a strong encryption algorithm by NIST [14]. Blowfish is used on many software platforms, in tools for password management, file and disk encryption, database security, email encryption, secure shell, file transfer, etc. [15]. It has been part of the mainline Linux kernel since v2.5.47 [7].

1.2 Goals of Thesis

The goal of this thesis work is to implement the Blowfish encryption algorithm on an FPGA equipped with the Altera Nios II processor. The Altera FPGA CAD tool Quartus II is used to verify that the implementation can be synthesized onto hardware. The results are analyzed to study the efficiency of implementing the algorithm in an embedded system.

1.3 Thesis Organization

The thesis is organized as follows:

Chapter 1 is the introduction which contains the motivation for the work, thesis goal and organization of the thesis.

Chapter 2 explains the need for cryptography, types of cryptographic algorithms and different modes of operation.

Chapter 3 contains the description of the Blowfish Encryption Algorithm, its HDL implementation and a software implementation of Huffman coding, which is used to compress the data before it is input to the Blowfish algorithm.

Chapter 4 presents the implementation and simulation results and the analysis of the implementation of the Blowfish encryption algorithm on an FPGA.

Chapter 5 contains the conclusions and suggested future work.


2. BACKGROUND

The core of cryptography is secure communication in the presence of third parties. More precisely, cryptography is defined as "the science and study of secret writing", which concerns the ways in which communications and data can be encoded to prevent disclosure of their contents through eavesdropping or message interception, using codes, ciphers and other methods, so that only certain people can see the real message [16].

Today, as the Internet, telecommunication and other forms of electronic communication have become more prevalent, electronic security has become extremely important. One possible way of keeping data safe is physical security, i.e., putting the machine inside physical walls. However, this is impractical in terms of cost and efficiency. Instead, computers are openly connected to each other, which exposes both the communication channels and the machines themselves. This problem has to be addressed.

2.1 Need for Cryptography

For any application, there are four main security goals to be kept in mind [17]:

● Authentication - The recipient should be able to verify the sender's identity, the origin and the path the message travelled to validate emitter claims and recipient expectations [17]

● Confidentiality - Only the intended receiver should be able to read the message from its encrypted form [17]

● Integrity - We must ensure that the received message has not been altered in any way [18]

● Non-repudiation - The sender should not be able to deny sending the message [18]

2.2 Types of Cryptographic Algorithms

Cryptography currently used is mostly key based. The different types of cryptography can be seen in Figure 2.1 [17].

Figure 2.1: Types of Cryptography [17]

Key based cryptography can be broadly classified into two categories:

I. Public Key Cryptography - This is also called asymmetric cryptography. It uses two keys, a public key and a private key. The former is published while the latter is known only to its owner. The keys work as a pair: a message encrypted with the public key can be decrypted only with the corresponding private key. As the private key never needs to be communicated among the interacting parties, the key distribution process is eliminated. However, there is no assurance of the true identity of the other party, i.e., of whether the public key was actually sent by the party we want to communicate with. In this case, we need a trusted third party that authenticates the communicating parties to each other. This trusted third party is called a Certificate Authority and issues certificates to guarantee authenticity [16].

II. Secret Key Cryptography - This is also called symmetric cryptography, as the same key is used for both encryption and decryption. Both the sender and the receiver know the secret key. For them to communicate, the key must first be passed between them through key distribution, which can become quite complicated. Several encryption algorithms such as Blowfish, DES, 3DES and RC2 use this mechanism [16]. Secret key cryptography is further divided into two categories:

a. Stream Ciphers - Stream ciphers operate on a single bit, byte or computer word at a time. Each unit of the plaintext is encrypted with the corresponding unit of the keystream. In most cases, the unit is a bit and the combining operation is exclusive-or. The keystream can be generated from digital shift registers using a random seed value. Because the keystream changes as encryption proceeds, the same plaintext unit encrypts to different ciphertext at different positions, even under the same key. Stream ciphers that use pseudorandom bits are typically periodic in nature, so the keystream eventually repeats [17]. Stream ciphers are of two kinds:

i. Self-synchronizing Stream Ciphers - These are also called asynchronous stream ciphers or ciphertext autokey ciphers. Each bit in the keystream is calculated as a function of the previous N ciphertext digits. After receiving N ciphertext digits, the receiver automatically synchronizes with the keystream generator. This makes recovery easy if digits are added to or dropped from the message stream. A single-digit error affects at most N plaintext digits, so its effect is limited [17].

ii. Synchronous Stream Ciphers - A stream of pseudo-random digits is generated independently of the plaintext and the ciphertext. It is then combined with the plaintext to encrypt and with the ciphertext to decrypt. For decryption to succeed, the sender and receiver must be exactly in step. Synchronization is lost if digits are added or deleted during transmission. To restore synchronization, different offsets are tried or the ciphertext is tagged with markers at regular points in the output. A digit corrupted during transmission affects only one digit of the plaintext. This property makes the cipher more prone to active attacks [17].

b. Block Ciphers - A block cipher is a symmetric key cipher operating on fixed-length groups of bits, called blocks, with an unvarying transformation. The same key is used on each block. When using the same key, the same plaintext block always encrypts to the same ciphertext block.

A block cipher can also encrypt a message longer than the block size. The message is broken down into blocks and each block is encrypted individually. The disadvantage is that each block is encrypted with the same key, so security is compromised: any repetition in the plaintext is reflected in the ciphertext. To overcome this problem, different modes of operation are used to make encryption probabilistic [19]. The different modes of operation are:

i. Electronic Codebook Mode (ECB mode): This is the simplest of the modes. The message is broken down into independent blocks. A fixed ciphertext block is assigned to each plaintext block, similar to the assignment of words in a codebook [20].

During encryption, the forward cipher function is applied directly and independently to each plaintext block. The resulting sequence of output blocks forms the ciphertext. Similarly, during decryption, the inverse cipher function is applied to each ciphertext block to obtain the plaintext [8].

In ECB encryption and decryption, multiple forward and inverse cipher functions can be computed in parallel. The mode is mainly used when only one or very few blocks of data have to be sent.

The disadvantage of this mode is that two identical plaintext blocks produce two identical ciphertext blocks [21]. Also, the blocks can be rearranged and modified by eavesdroppers [21]. The mode does not hide data patterns and is not suitable for long messages. It is also susceptible to replay attacks [22].

Encryption and decryption in this mode can be seen in Figures 2.2 and 2.3 respectively. Here E = Encryption, D = Decryption, Pi = Plaintext block i, Ci = Ciphertext block i and K = Secret Key.

Figure 2.2: ECB Mode Encryption [22]

Figure 2.3: ECB Mode Decryption [22]
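The pattern leakage described for ECB mode can be made concrete with a small sketch. It assumes a toy 64-bit Feistel cipher built from SHA-256; the names `toy_encrypt`, `toy_decrypt`, `ecb_encrypt` and `ecb_decrypt` are hypothetical, and the cipher is a stand-in for illustration only (it is not Blowfish and not secure). The point is only that identical plaintext blocks map to identical ciphertext blocks.

```python
import hashlib

BLOCK = 8  # bytes; 64 bits, the block size used by Blowfish


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Hypothetical round function: 4 bytes of SHA-256 over key, round index and half-block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:4]


def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Forward cipher function of a toy Feistel cipher (illustration only)."""
    left, right = block[:4], block[4:]
    for i in range(rounds):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Inverse cipher function: the same rounds undone in reverse order."""
    left, right = block[:4], block[4:]
    for i in reversed(range(rounds)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a whole number of blocks")
    # Every block is processed independently, so this loop could run in parallel.
    return b"".join(toy_encrypt(key, plaintext[i:i + BLOCK])
                    for i in range(0, len(plaintext), BLOCK))


def ecb_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return b"".join(toy_decrypt(key, ciphertext[i:i + BLOCK])
                    for i in range(0, len(ciphertext), BLOCK))


key = b"demo key"
plaintext = b"ATTACK!!" + b"ATTACK!!" + b"RETREAT!"
ciphertext = ecb_encrypt(key, plaintext)
blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]

print(blocks[0] == blocks[1])                     # True: the repetition is visible
print(ecb_decrypt(key, ciphertext) == plaintext)  # True
```

An eavesdropper who sees two equal ciphertext blocks learns that the corresponding plaintext blocks are equal, without knowing the key.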


ii. Cipher Block Chaining Mode (CBC mode): Each ciphertext block is chained with the next plaintext block, hence the name [22]. In CBC encryption, each successive plaintext block is exclusive-ORed with the previous ciphertext block to produce the new input block [20]. The ciphertext block is obtained by applying the forward cipher function to each input block [20]. An IV (initialization vector) is required to combine with the first plaintext block. The IV need not be secret but it must be unpredictable [20]. The IV's integrity should be protected [20].

In CBC decryption, to recover any plaintext block (except the first), the inverse cipher function is applied to the corresponding ciphertext block, and the resulting block is exclusive-ORed with the previous ciphertext block [20]. To recover the first plaintext block, the output block from applying the inverse cipher function to the first ciphertext block is exclusive-ORed with the initialization vector [20].

Encryption cannot be performed in parallel because each input block depends on the previous ciphertext block, whereas decryption can be performed in parallel. CBC mode is used whenever large amounts of data need to be sent securely, provided that all the data is available beforehand, as in FTP, email, web traffic, etc. [22].

The disadvantage of CBC mode is that the chaining process makes every ciphertext block depend on all the plaintext blocks before it. The encrypted message cannot be modified without destroying the subsequent data [22]. Also, to prevent an attack on the first block of data, one has to ensure that the IV is either a fixed value or sent encrypted in ECB mode.

Encryption and decryption in this mode can be seen in Figures 2.4 and 2.5 respectively. Here E = Encryption, D = Decryption, Pi = Plaintext block i, Ci = Ciphertext block i and K = Secret Key.

Figure 2.4: CBC Mode Encryption [22]

Figure 2.5: CBC Mode Decryption [22]
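The chaining steps of CBC mode can be sketched as follows. As an assumption for illustration, the block cipher is again a toy 64-bit Feistel construction over SHA-256 rather than Blowfish, and the function names are hypothetical. The sketch shows that two identical plaintext blocks no longer give identical ciphertext blocks, and that decryption XORs each inverse-cipher output with the previous ciphertext block (or the IV for the first block).

```python
import hashlib
import secrets

BLOCK = 8  # bytes (64-bit blocks)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Hypothetical round function of the toy cipher.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:4]


def toy_encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    left, right = block[:4], block[4:]
    for i in range(rounds):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    left, right = block[:4], block[4:]
    for i in reversed(range(rounds)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a whole number of blocks")
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        # Input block = plaintext block XOR previous ciphertext block (IV first).
        previous = toy_encrypt(key, _xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Each block needs only ciphertext that is already known, so this
        # loop could be parallelised, unlike the encryption loop.
        out.append(_xor(toy_decrypt(key, block), previous))
        previous = block
    return b"".join(out)


key = b"demo key"
iv = secrets.token_bytes(BLOCK)  # unpredictable IV, not secret
plaintext = b"ATTACK!!" + b"ATTACK!!" + b"RETREAT!"
ciphertext = cbc_encrypt(key, iv, plaintext)

print(ciphertext[:BLOCK] == ciphertext[BLOCK:2 * BLOCK])  # False: repetition is hidden
print(cbc_decrypt(key, iv, ciphertext) == plaintext)       # True
```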


iii. Cipher Feedback Mode (CFB mode): In CFB mode, the message is treated as a stream of bits [22]. Each new input block is formed by feeding the previous ciphertext segment back into the previous input block, hence the name cipher feedback mode [20]. The mode requires an initialization vector (IV) as the initial input block [20]. It also needs an integer parameter s such that 1 ≤ s ≤ b, where b is the block size in bits. Any number of bits, e.g., 1, 8, 64 or 128, can be fed back, but it is ideal to use all the bits in the block [22]. The block cipher is essentially used as a pseudo-random number generator and these random bits are combined with the message [22].

In CFB encryption, the IV is the first input block and the first output block is obtained by applying the forward cipher function to the IV [20]. The s most significant bits of the output block are exclusive-ORed with the first plaintext segment to form the first ciphertext segment. The second input block is formed by concatenating the b-s least significant bits of the IV with the s bits of the first ciphertext segment. The process is repeated with the successive input blocks until each plaintext segment has produced a ciphertext segment [20].

In CFB decryption, the initialization vector is the first input block. The b-s least significant bits of the previous input block are concatenated with the s most significant bits of the previous ciphertext to form each successive input block. Plaintext segments are recovered by exclusive-ORing the s most significant bits of the output blocks with the corresponding ciphertext segments [20].

CFB encryption cannot be performed in parallel, whereas CFB decryption can be performed in parallel provided that the input blocks are first constructed from the initialization vector and the ciphertext. A transmission error propagates for several blocks after the error [22].

CFB encryption and decryption can be seen in Figures 2.6 and 2.7 respectively. Here E is Encryption, D is Decryption, Pi is Plaintext block i, Ci is Ciphertext block i, K is Secret Key, Si is Shift register, Ti is Temporary register and IV is Initial Vector (S1).

Figure 2.6: CFB Mode Encryption [22]

Figure 2.7: CFB Mode Decryption [22]

iv. Output Feedback Mode (OFB mode): In OFB mode, the message is treated as a stream of bits [22]. The generation of random bits is totally independent of the message being encrypted. Under a given key, the initialization vector must be unique for each execution of the mode [20].

In OFB encryption, the first output block is formed by applying the forward cipher function to the IV [20]. It is then exclusive-ORed with the first plaintext block to form the first ciphertext block. The forward cipher function is applied to the first output block to form the second output block, which is exclusive-ORed with the second plaintext block to produce the second ciphertext block. The third output block is obtained by applying the forward cipher function to the second output block, and so on. Thus, the forward cipher function is applied to the previous output blocks to produce successive output blocks, and ciphertext blocks are produced by exclusive-ORing the output blocks with the corresponding plaintext blocks.

In OFB decryption, the forward cipher function is applied to the previous output blocks to form the successive output blocks. The plaintext blocks are recovered by exclusive-ORing the output blocks with the corresponding ciphertext blocks.

Neither OFB encryption nor OFB decryption can be performed in parallel. However, the output blocks can be generated before the plaintext or ciphertext is available if the initialization vector is known. The advantage is that bit errors in transmission do not propagate. The mode is, however, more vulnerable to message stream modification attacks than CFB. The sender and receiver must also always remain synchronized with each other.

OFB encryption and decryption can be seen in Figures 2.8 and 2.9. Here E is Encryption, D is Decryption, Pi is Plaintext block i, Ci is Ciphertext block i, K is Secret Key, Si is Shift register, Ti is Temporary register and IV is Initial Vector (S1).

Figure 2.8: OFB Mode Encryption [22]

Figure 2.9: OFB Mode Decryption [22]
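The difference in error propagation between the two feedback modes can be illustrated with a short sketch. It makes several assumptions that are not part of the text above: the forward cipher function is a toy Feistel construction over SHA-256 (hypothetical name `forward_cipher`), CFB is used with s = b (the whole block is fed back), and the IV is fixed only to keep the demonstration repeatable. Note that both modes use only the forward cipher function, for encryption and for decryption.

```python
import hashlib

BLOCK = 8  # bytes (b = 64 bits)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def forward_cipher(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    """Toy Feistel forward function standing in for the block cipher."""
    left, right = block[:4], block[4:]
    for i in range(rounds):
        f = hashlib.sha256(key + bytes([i]) + right).digest()[:4]
        left, right = right, _xor(left, f)
    return left + right


def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    input_block, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        segment = _xor(plaintext[i:i + BLOCK], forward_cipher(key, input_block))
        out.append(segment)
        input_block = segment  # with s = b the ciphertext block is the next input
    return b"".join(out)


def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    input_block, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        segment = ciphertext[i:i + BLOCK]
        out.append(_xor(segment, forward_cipher(key, input_block)))
        input_block = segment
    return b"".join(out)


def ofb_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """OFB encryption and decryption are the same operation."""
    output_block, out = iv, []
    for i in range(0, len(data), BLOCK):
        output_block = forward_cipher(key, output_block)  # independent of the data
        out.append(_xor(data[i:i + BLOCK], output_block))
    return b"".join(out)


def flip_first_bit(data: bytes) -> bytes:
    return bytes([data[0] ^ 0x80]) + data[1:]


def changed_blocks(a: bytes, b: bytes) -> list:
    return [i // BLOCK for i in range(0, len(a), BLOCK)
            if a[i:i + BLOCK] != b[i:i + BLOCK]]


key = b"demo key"
iv = bytes(range(BLOCK))  # fixed for repeatability; a real IV must be unique per message
message = b"BLOCK-0!BLOCK-1!BLOCK-2!"

cfb_ct = cfb_encrypt(key, iv, message)
ofb_ct = ofb_crypt(key, iv, message)

# Simulate a single-bit transmission error in the first ciphertext block.
cfb_damaged = cfb_decrypt(key, iv, flip_first_bit(cfb_ct))
ofb_damaged = ofb_crypt(key, iv, flip_first_bit(ofb_ct))

print(changed_blocks(message, cfb_damaged))  # [0, 1]: the error spreads to the next block
print(changed_blocks(message, ofb_damaged))  # [0]: the error stays where it occurred
```

With a smaller feedback size s, a damaged CFB segment stays in the input block for more steps, so the error affects correspondingly more segments.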


3. OUTLINE OF RESEARCH

3.1 Blowfish Encryption Algorithm

Blowfish is a symmetric-key block cipher. The key length varies between 32 bits and 448 bits, which makes it suitable for both domestic and exportable applications. The block size is 64 bits.

A 16-round Feistel network and large substitution boxes (S-boxes) are used. The S-boxes depend on the key. A Feistel network is a symmetric structure used in the construction of block ciphers. Feistel ciphers are iterated ciphers with an internal function called the round function [23]. They were first described by Horst Feistel during his work on the cipher Lucifer at IBM [24]. Lucifer was a predecessor to the Data Encryption Standard (DES). The advantage of a Feistel network is that encryption and decryption are very similar, making the circuitry and code smaller [23].

Other ciphers that use Feistel or Feistel-like networks include RC5 and Skipjack [24].

The action of Blowfish can be seen in Figure 3.1.

Figure 3.1: Blowfish Feistel Structure [25]


Each line represents 32 bits. The algorithm uses an 18-entry P-array and four 256-entry S-boxes, which contain the subkeys. The S-boxes take 8-bit inputs and give 32-bit outputs. One entry of the P-array is used in each round. After the last round, each half of the data block is XORed with one of the two remaining unused P-entries [25]. Blowfish's Feistel function is shown in Figure 3.2.

Figure 3.2: Blowfish Feistel Function [25]

The 32-bit input is split into four 8-bit quarters, each of which serves as the input to one of the S-boxes. The S-box outputs are added modulo $2^{32}$ and XORed to produce the final 32-bit output.
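The paragraph above does not spell out the order of the additions and the XOR. In the published description of Blowfish the function is usually written as below, where the 32-bit input x is split into the bytes a (most significant) to d (least significant) and S_1 to S_4 denote the four S-boxes; the symbol names are chosen here for illustration.

```latex
x = a \,\|\, b \,\|\, c \,\|\, d, \qquad
F(x) = \Bigl( \bigl( (S_1[a] + S_2[b]) \bmod 2^{32} \bigr) \oplus S_3[c] \Bigr) + S_4[d] \pmod{2^{32}}
```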

The algorithm starts with a key initialization step. A key schedule is an algorithm which calculates the subkey for each round given the key [26]. Blowfish's key schedule starts by initializing the P-array and S-boxes with values taken from the hexadecimal digits of pi, which contain no obvious pattern. The key is then XORed, byte by byte and cycling the key if necessary, with all the P-entries in sequence. An all-zero 64-bit block is encrypted with the algorithm, and P1 and P2 are replaced with the resulting ciphertext. The same ciphertext is then encrypted again using the new subkeys, and P3 and P4 are replaced with the new ciphertext. This process continues until the entire P-array and all the S-box entries have been replaced. In total, 521 iterations are needed to calculate all the subkeys. During the subkey generation process, the subkeys change slightly with every pair of generated subkeys. This protects against attacks on the subkey generation process that exploit fixed and known subkeys [27].

Once all the subkeys have been calculated, data encryption is performed. The action of Blowfish is shown in Figure 3.3.

Figure 3.3: Blowfish Algorithm [28]

First, the 64-bit input block is divided into two 32-bit halves. Several exclusive-OR and Feistel function operations are performed in each round, and one entry of the P-array is used in each round. There are 16 such rounds. After the 16 rounds, each half of the data block is XORed with one of the two remaining unused P-entries. The 32-bit halves are then concatenated to obtain the ciphertext. Decryption is exactly the same as encryption, except that P1, P2, ..., P18 are used in the reverse order.
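The structure described in this section (key schedule, 16 Feistel rounds, decryption with the P-array reversed) can be sketched in software as follows. This is a structural illustration only and not the VHDL design of this thesis: the initial P-array and S-box contents are placeholder words derived from SHA-256 instead of the hexadecimal digits of pi, so the outputs are not real Blowfish ciphertexts. All function names are hypothetical.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def placeholder_words(label: bytes, count: int) -> list:
    """Deterministic 32-bit filler words. NOT the digits of pi used by real Blowfish."""
    words, counter = [], 0
    while len(words) < count:
        digest = hashlib.sha256(label + struct.pack(">I", counter)).digest()
        words.extend(struct.unpack(">8I", digest))
        counter += 1
    return words[:count]


def f(s, x):
    """Feistel function: four 8-bit S-box lookups combined by add, XOR, add."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    return ((((s[0][a] + s[1][b]) & MASK) ^ s[2][c]) + s[3][d]) & MASK


def encrypt_block(p, s, left, right):
    for i in range(16):              # one P-entry per round
        left ^= p[i]
        right ^= f(s, left)
        left, right = right, left
    left, right = right, left        # undo the swap of the last round
    right ^= p[16]                   # the two remaining P-entries
    left ^= p[17]
    return left, right


def decrypt_block(p, s, left, right):
    # Same datapath as encryption, with the P-array used in reverse order.
    return encrypt_block(p[::-1], s, left, right)


def key_schedule(key: bytes):
    if not 4 <= len(key) <= 56:      # 32 to 448 bits
        raise ValueError("key must be 32 to 448 bits long")
    p = placeholder_words(b"P", 18)
    s = [placeholder_words(b"S%d" % i, 256) for i in range(4)]
    for i in range(18):              # XOR the key into P, cycling the key bytes
        word = bytes(key[(4 * i + j) % len(key)] for j in range(4))
        p[i] ^= int.from_bytes(word, "big")
    left = right = 0                 # start from the all-zero block
    iterations = 0
    for table in [p] + s:            # replace P first, then each S-box, in place
        for i in range(0, len(table), 2):
            left, right = encrypt_block(p, s, left, right)
            table[i], table[i + 1] = left, right
            iterations += 1
    return p, s, iterations


p, s, iterations = key_schedule(b"thesis demo key")
block = (0x01234567, 0x89ABCDEF)
ciphertext = encrypt_block(p, s, *block)

print(iterations)                                # 521
print(decrypt_block(p, s, *ciphertext) == block)  # True
```

The iteration count follows directly from the table sizes: 18 + 4 × 256 = 1042 subkey words, replaced two at a time.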

Many implementations support key sizes up to 576 bits because, during initialization, the key bytes are XORed with all 576 bits of the P-array. While this is certainly possible here, we limit the key size to 448 bits to ensure that every bit of every subkey depends on every bit of the key, as the last four values of the P-array do not affect every bit of the ciphertext [27].

3.2 HDL Implementation

The most common mode of operation for Blowfish encryption is Electronic Codebook (ECB) mode because it allows encryption and decryption of multiple blocks in parallel [29]. We use the same mode for our implementation. In this mode, the data is divided into several blocks and each block is processed separately.

One way in which Blowfish differs from other cryptographic algorithms is that the nonlinear S-boxes depend on the key. As a result, initialization of the S-boxes is very expensive, since it must be repeated each time the key changes, and any implementation needs a considerable number of finite state machines to control initialization and encryption.

Any implementation would require the following circuit blocks:

● An encryption core that implements the Feistel network

● The function F(xL), which the core depends on for each round of the Feistel network

● A generated array of subkeys, called the P-array, which is also used by the encryption core for each round

● The key-dependent S-boxes, which are read by the F(xL) function in each round

● Control logic required to initialize the P-array and S-boxes

One way to implement this algorithm is with a pipelined core. As all 16 rounds run in parallel in this approach, every location of the P-array and S-boxes must be readable in each clock cycle. This would require 1042 memory ports, which is very expensive. The core and the F(xL) hardware must also be duplicated 16 times.

Another approach is to have a single register which feeds back to itself after each round. The P-array and S-boxes then need only 5 memory ports together, far fewer than the 1042 ports of the pipelined approach. Also, the hardware for the encryption core and F(xL) has to exist only once. The S-boxes and P-array are implemented synchronously.
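The two port counts can be reproduced with simple arithmetic. The breakdown of the 5 ports into one P-array read and one read per S-box is an interpretation assumed here, since the text only gives the total; the variable names are illustrative.

```python
P_ENTRIES = 18      # 32-bit words in the P-array
SBOX_COUNT = 4
SBOX_ENTRIES = 256  # 32-bit words per S-box
ROUNDS = 16

# Pipelined core: every stored subkey word must be readable in every clock cycle.
pipelined_ports = P_ENTRIES + SBOX_COUNT * SBOX_ENTRIES
pipelined_round_units = ROUNDS  # core and F hardware duplicated per round

# Iterative core (assumed breakdown): per round, one P-array read
# plus one read from each of the four S-boxes.
iterative_ports = 1 + SBOX_COUNT
iterative_round_units = 1

print(pipelined_ports, iterative_ports)              # 1042 5
print(pipelined_round_units, iterative_round_units)  # 16 1
```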

We follow the latter approach for our implementation. Individual modules for the S-boxes and P-array are implemented using the hexadecimal digits of pi. The Feistel function is implemented in the Blowfish Cipher module. The PiROM module serves as the lookup table. MiniFIFOs are implemented and act as the interface between the Blowfish Cipher block and the key, input and output blocks. The block diagram of the implementation can be seen in Figure 3.4.

Figure 3.4: Block Diagram of Blowfish Implementation

As the key size is 448 bits, supplying the key directly would use a lot of I/O pins. To reduce this overhead, we use memory arrays 64 bits wide in our implementation. The key is broken down into 64-bit blocks, which are input to the Blowfish Cipher module. The key blocks are stored in individual memory arrays and are fetched when needed. The initialization of the memory arrays can be seen in Figure 3.5.

Figure 3.5: Initialization of Memory in Form of Arrays
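As a software illustration of the key partitioning (the thesis design does this in VHDL memory arrays), a 448-bit key divides into seven 64-bit blocks. The function name `split_key` and the big-endian byte order are assumptions made for this sketch.

```python
KEY_BITS = 448
WORD_BITS = 64


def split_key(key: bytes) -> list:
    """Split a 448-bit key into 64-bit words (big-endian order is an assumption)."""
    if len(key) * 8 != KEY_BITS:
        raise ValueError("this sketch expects a full 448-bit key")
    step = WORD_BITS // 8
    return [int.from_bytes(key[i:i + step], "big") for i in range(0, len(key), step)]


key = bytes(range(56))  # placeholder key bytes 0x00 .. 0x37
words = split_key(key)

print(len(words))      # 7
print(hex(words[0]))   # 0x1020304050607
```

Only a 64-bit data path is then needed to load the key, one block at a time, instead of 448 parallel inputs.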

3.3 Huffman Encoding

The Huffman encoding algorithm is a compression algorithm that uses only the frequencies of the individual letters to compress the data [30]. It achieves lossless data compression. Redundancy in a message makes the code easier to crack. If redundancy is reduced enough, the same key can be used for longer and cracking the code becomes more difficult. Compression also reduces the average number of coding bits per message.

The idea behind the algorithm is to use fewer bits to encode more frequent letters. The scheme uses a table of frequency of occurrence for each character or symbol in the input.

During the compression process, a binary tree of nodes is created. The nodes are stored in an array whose size depends on the number of symbols in the input. A node can be either an internal node or a leaf node. Initially all nodes are leaf nodes, each containing a symbol and its frequency. An internal node contains a frequency, links to two child nodes and an optional link to a parent node. Conventionally, bit '0' represents the left child (the one with the higher frequency of the two children) and bit '1' represents the right child (the one with the lower frequency of the two children). If ‘x’ is the number of symbols, the final tree contains ‘x-1’ internal nodes and ‘x’ leaf nodes.

A priority queue is used in the simplest construction algorithm. The node with the lowest frequency is given the highest priority. The steps followed in the implementation process are:

● A leaf node is created for each symbol and added to the priority queue.

● While there is more than one node in the queue

1. The two nodes with least frequency are removed from the queue

2. A new internal node is created with these two nodes as children and a frequency equal to the sum of their frequencies.

3. The new node is added to the queue.

4. This process is repeated until there is only one node left.

● The remaining node is called the root node and the tree is now complete.

Priority queue data structures need O(log x) time per insertion. As a tree with ‘x’ leaves contains ‘2x-1’ nodes, the overall time complexity of building the tree is O(x log x).
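The construction steps listed above can be sketched with a binary heap as the priority queue. The thesis implementation is written in C; Python is used here only for brevity, and the function name `huffman_codes` and the tie-breaking rule (on equal frequencies, the node removed first becomes the left child) are assumptions of this sketch. The frequencies are those of the worked example in this section.

```python
import heapq
import itertools


def huffman_codes(freqs: dict) -> dict:
    """Build a Huffman tree with a priority queue and return symbol -> code."""
    order = itertools.count()  # unique tie-breaker so heap entries always compare
    heap = [(f, next(order), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Remove the two nodes with the lowest frequencies.
        f_first, _, first = heapq.heappop(heap)
        f_second, _, second = heapq.heappop(heap)
        # Left child ('0') is the one with the higher frequency.
        if f_second > f_first:
            left, right = second, first
        else:
            left, right = first, second
        # Internal node: a (left, right) tuple with the summed frequency.
        heapq.heappush(heap, (f_first + f_second, next(order), (left, right)))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                         # leaf: a symbol
            codes[node] = prefix or "0"

    walk(root, "")
    return codes


example = {"a1": 0.4, "a2": 0.35, "a3": 0.15, "a4": 0.05, "a5": 0.05}
codes = huffman_codes(example)
print(codes)
# {'a2': '00', 'a3': '010', 'a4': '0110', 'a5': '0111', 'a1': '1'}
```

Each of the x - 1 merges performs two removals and one insertion on the heap, which gives the O(x log x) bound stated above.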

The following example generates binary codes for given set of symbols based on the above mentioned steps.

Example:

Consider a set of symbols {a1, a2, a3, a4, a5} with frequencies of occurrence {0.4, 0.35, 0.15, 0.05, 0.05}. The two symbols with the lowest frequencies are a4 and a5. These symbols are removed from the queue, and a new internal node with these two nodes as children and frequency 0.1 is added to the queue. This process continues until only one node is left. A tree has now been formed and, starting from the root node, each node is assigned a code. The left child (the one with the higher frequency of the two children) is assigned ‘0’ and the right child (the one with the lower frequency of the two children) is assigned ‘1’. This assignment continues down to the leaf nodes. The example is illustrated in Figure 3.6.

Figure 3.6: Huffman Encoding Example

Table 3.1 shows resulting codes for each symbol.

| Symbol | a1 | a2 | a3 | a4 | a5 |
| --- | --- | --- | --- | --- | --- |
| Code | 1 | 00 | 010 | 0110 | 0111 |

Table 3.1: Symbols and Their Corresponding Codes

It can be observed that the symbol with the highest frequency is represented with the fewest bits. The algorithm is implemented in the C language.
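The codes of Table 3.1 can be used directly to encode and decode a message and to compute the average code length. The sketch below is in Python for brevity (the thesis implementation is in C); the helper names `encode` and `decode` and the sample message are illustrative.

```python
CODES = {"a1": "1", "a2": "00", "a3": "010", "a4": "0110", "a5": "0111"}
FREQS = {"a1": 0.4, "a2": 0.35, "a3": 0.15, "a4": 0.05, "a5": 0.05}


def encode(symbols: list) -> str:
    return "".join(CODES[s] for s in symbols)


def decode(bits: str) -> list:
    # No code is a prefix of another, so the first match is always the right one.
    inverse = {code: sym for sym, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return out


message = ["a2", "a1", "a5", "a3", "a1"]
bits = encode(message)
average_bits = sum(FREQS[s] * len(CODES[s]) for s in CODES)

print(bits)                       # 00101110101
print(decode(bits) == message)    # True
print(round(average_bits, 2))     # 1.95
```

A fixed-length code for five symbols needs 3 bits per symbol, so the variable-length code saves roughly a third of the bits on average for this distribution.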


The input data is first compressed using the Huffman encoding algorithm, and the resulting binary data is encrypted using the Blowfish Encryption module and then transmitted. At the decompressing end, we need the same binary tree that was used for compression to extract the original input data. There are two ways in which this can be done.

● Along with the Blowfish key, the binary tree obtained from compression can be sent to the decompressing end using RSA.

● The standard letter frequencies for the language of the message can be used. In our case, it is American English. The tree never changes with this method, so it can be made part of the decoding machinery.

For our implementation, we can use the latter approach and build a binary tree using standard frequencies of the American English alphabet and encode our message using the resulting binary codes.

Table 3.2 shows the letters of the alphabet with their corresponding frequencies, based on the Cornell University Math Explorer's Project, which produced the table after measuring 40,000 words [31]. ‘F’ denotes the frequency as a percentage. Note that prior knowledge of the symbol frequencies is needed to create the tree in this method. The efficiency of compression and transmission can be improved by updating the symbol frequencies during transmission. This method is known as dynamic or adaptive Huffman coding.


Symbol  F      Symbol  F     Symbol  F     Symbol  F

E       12.02  R       6.02  F       2.30  K       0.69

T       9.10   H       5.92  Y       2.11  X       0.17

A       8.12   D       4.32  W       2.09  Q       0.11

O       7.68   L       3.98  G       2.03  J       0.10

I       7.31   U       2.88  P       1.82  Z       0.07

N       6.95   C       2.71  B       1.49

S       6.28   M       2.61  V       1.11

Table 3.2: American English Alphabet and Their Corresponding Frequencies [31]
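To illustrate the fixed-tree approach, the sketch below builds one code table from the frequencies in Table 3.2 and uses it at both ends, so no tree has to be transmitted. It is a Python illustration under stated assumptions, not the C implementation of this thesis: the names `ENGLISH_FREQ`, `build_codes`, `encode` and `decode` are hypothetical, the 0.11 entry is read as Q, and only the 26 upper-case letters are handled (spaces and punctuation would need their own entries).

```python
import heapq
from itertools import count

# Frequencies in percent, copied from Table 3.2. The 0.11 entry is read as Q,
# the one letter that has no other row in the table.
ENGLISH_FREQ = {
    "E": 12.02, "T": 9.10, "A": 8.12, "O": 7.68, "I": 7.31, "N": 6.95,
    "S": 6.28, "R": 6.02, "H": 5.92, "D": 4.32, "L": 3.98, "U": 2.88,
    "C": 2.71, "M": 2.61, "F": 2.30, "Y": 2.11, "W": 2.09, "G": 2.03,
    "P": 1.82, "B": 1.49, "V": 1.11, "K": 0.69, "X": 0.17, "Q": 0.11,
    "J": 0.10, "Z": 0.07,
}


def build_codes(freqs):
    """Return {symbol: bit string} for a Huffman tree built from freqs."""
    order = count()                          # unique tie-breaker for the heap
    heap = [(f, next(order), s) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, n1 = heapq.heappop(heap)      # least frequent node
        f2, _, n2 = heapq.heappop(heap)      # second least frequent node
        heapq.heappush(heap, (f1 + f2, next(order), (n2, n1)))
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:                             # iterative walk from the root
        node, prefix = stack.pop()
        if isinstance(node, tuple):          # internal node
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


# Both sender and receiver derive the same tables from the same frequencies.
CODES = build_codes(ENGLISH_FREQ)
DECODE = {code: sym for sym, code in CODES.items()}


def encode(text):
    """Concatenate the codes of the letters in text (letters only)."""
    return "".join(CODES[ch] for ch in text.upper())


def decode(bits):
    """Recover the text; works because Huffman codes are prefix-free."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    return "".join(out)


message = "BLOWFISH"
bits = encode(message)
print(len(bits), "bits instead of", 8 * len(message))
print(decode(bits))
```

In the design described here, the bit string produced by `encode` would correspond to the data handed to the Blowfish encryption module, and `decode` to the step applied after decryption.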


4. RESULTS

4.1 Implementation and Simulation

An Altera Cyclone II device, model number EP2C20F484C7, is used for our implementation. This is the device on the Altera DE1 board. It is used with a core voltage of 1.2 V and contains 4 PLLs, 315 pins and 16 global clocks. It is equipped with a Nios II processor. The maximum clock speed obtained is 50 MHz.

The functionality of the design is first checked by simulating it in Modelsim with several test inputs. The design is then synthesized using Quartus II. The Appendix describes the steps involved in using Modelsim and creating a project in Quartus II. Once the design has been compiled, the compilation report gives the physical properties of the design. These details can be used for optimizing the design.

The initial design was first implemented on the EP2C70F896C7 because it needed 585 pins and the DE1 board does not have enough pins. The EP2C70F896C7 also belongs to the Altera Cyclone II family. It contains 4 PLLs and 622 pins. When the design was implemented on this device, it used 94% of the available pins. To reduce the pin usage and make the design fit on the DE1 board, we used memory arrays. With memory arrays, pin usage dropped from 94% of the pins on the EP2C70F896C7 to 64% of the pins on the DE1 device. The latter design is the design under consideration in this thesis.
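The two percentages can be checked from the pin counts. The short Python sketch below is only a worked calculation; the helper name `pin_utilization` is hypothetical, and the figure of 201 pins used on the DE1 device is the one reported in Section 4.2.

```python
def pin_utilization(pins_used, pins_available):
    """Return the percentage of device pins occupied by a design."""
    return 100.0 * pins_used / pins_available


# Design without memory arrays on the EP2C70F896C7 (622 pins).
without_arrays = pin_utilization(585, 622)

# Design with memory arrays on the EP2C20F484C7 of the DE1 board (315 pins).
with_arrays = pin_utilization(201, 315)

print(f"without memory arrays: {without_arrays:.0f}%")   # 94%
print(f"with memory arrays:    {with_arrays:.0f}%")      # 64%
```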

Figure 4.1 shows the simulation results of our design. It gives details about the number of combinational functions, dedicated logic registers and the number of pins used in our design.


Figure 4.1: Simulation Results Using Altera Quartus II

4.2 Analysis

Figure 4.2 below compares the logic circuitry used by the design without memory arrays and the design with memory arrays. The number of pins used is much lower for the latter approach. However, the percentages of logic elements, combinational functions and dedicated logic registers used are higher when memory arrays are used. This increase is acceptable, given the large number of logic elements available on the device.

[Bar chart: Logic Circuitry Utilization on the FPGA. Categories: percentage of total logic elements used, percentage of total combinational functions used, percentage of dedicated logic registers used. Series: EP2C70F896C7 and EP2C20F484C7 (DE1). Bar labels recovered from the chart: 28, 24, 13, 7, 6 and 1.]

Figure 4.2: Logic Circuitry Utilization in EP2C70F896C7 and EP2C20F484C7 (DE1)

Figure 4.3 shows the number of pins used by both implementations. The design with memory arrays uses far fewer pins than the one without: the design without memory arrays needs 585 pins, whereas the design with memory arrays uses only 201 of the 315 available pins. The number of pins used has decreased by about 65.6% (from 585 to 201). The unused pins can be left unconnected to save power or can be used to implement a different design, thus increasing the performance of the system.

[Bar chart: Utilization of pins on the FPGA. Number of pins used: 585 for EP2C70F896C7 and 201 for EP2C20F484C7 (DE1).]

Figure 4.3: Utilization of Pins in EP2C70F896C7 and EP2C20F484C7 (DE1)

Our design uses more combinational functions than the implementation without memory arrays. This is due to the addition of memory arrays. However, given the large amount of resources available on the device, this difference is insignificant.

Figure 4.4 compares the resource utilization of the Triple DES implementation [35] and the Blowfish implementation. The number of pins used and the power consumed are almost the same for both algorithms. The Blowfish implementation uses more resources, but it provides more secure encryption because its large key size makes it less prone to brute force attacks. An attacker would have to try up to 2^448 key combinations, which is computationally infeasible.
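To give a sense of scale for a key space of 2 raised to the power 448, the following Python sketch computes the number of possible keys and a rough exhaustive-search time. The attacker speed of 10^12 guesses per second is a purely hypothetical figure chosen for illustration; it is not a measurement from this work.

```python
KEY_BITS = 448                      # maximum Blowfish key length in bits

keyspace = 2 ** KEY_BITS            # exact value, Python integers are unbounded
digits = len(str(keyspace))         # number of decimal digits in the key count

# Hypothetical attacker speed, assumed only for this illustration.
GUESSES_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 365 * 24 * 3600

# Years needed to try every key at the assumed rate.
years = keyspace // (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

print("decimal digits in key count:", digits)
print("decimal digits in years of search:", len(str(years)))
```

The key count is a 135-digit number, so even at the assumed rate an exhaustive search would take far more than 10^100 years.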


Figure 4.4: TDES and Blowfish Resource Utilization and Total Power Used


5. CONCLUSIONS AND FUTURE WORK

5.1 Conclusions

We have discussed the definition of cryptography and the need for it. Different types of cryptography have been discussed, along with various modes of operation and their advantages and disadvantages. The Blowfish encryption algorithm, a symmetric key block cipher, has been studied, and its structure and operation have been explained in detail.

The results of the HDL implementation have been analyzed. The design of the hardware was modified to reduce the number of pins used. The Blowfish algorithm was implemented on an Altera DE1 board with a Nios II processor. The design was synthesized using Altera Quartus II. The use of memory arrays caused a significant decrease in the number of pins. Huffman coding was used to compress the data losslessly before encryption by the Blowfish algorithm.

The number of logic elements used on the hardware has increased due to the inclusion of memory arrays. Memory plays a crucial role in storing the key and retrieving it later. Hence there is a tradeoff between the number of logic elements and the number of IO pins used. The performance of Blowfish and TDES has been compared. Both the algorithms use almost the same number of pins and power. More resources are used by Blowfish but it provides a higher level of security.


5.2 Future Work

We have implemented simple Huffman coding for compressing data in our design. Adaptive Huffman coding can be implemented as future work.

Different techniques can be implemented to further decrease the pin count. Blowfish’s successors, such as the Twofish [6] and Threefish [36] encryption algorithms, can be implemented on FPGAs and their performance can be compared as future work.


REFERENCES

[1] O. Goldreich, Foundations of Cryptography: Volume 2, Basic Applications, Cambridge University Press, May 2004.

[2] W. Stallings, Cryptography and Network Security: Principles and Practice, Prentice Hall, Upper Saddle River, New Jersey, 2003.

[3] Fu Li, Pan Ming, “A simplified FPGA implementation based on an Improved DES algorithm”, IEEE Genetic and Evolutionary Computing, WGEC, 3rd International Conference, pp. 227-230, 2009.

[4] National Institute of Standards and Technology, http://csrc.nist.gov/archive/aes/pre-pound1/aes_9701.txt, Date accessed: April 16, 2013.

[5] National Institute of Standards and Technology, http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf, Date accessed: April 16, 2013.

[6] Twofish, http://en.wikipedia.org/wiki/Twofish, Date accessed: 22 May, 2013.

[7] Schneier on Security, ://www.schneier.com/blowfish.html, Date accessed: June 13, 2013.

[8] A. Alabaichi, F. Ahmad, and R. Mahmood, “Security analysis of Blowfish algorithm”, Informatics and Applications (ICIA), 2013 Second International Conference, 2013.

[9] A. Alabaichi, F. Ahmad, and R. Mahmood, “Randomness analysis on Blowfish block cipher using ECB and CBC modes”, Journal of Applied Science, 13(6), pp. 768-769, 2014.

[10] A. Nadeem and M.Y. Javed, “A performance comparison of data encryption algorithms”, Information and Communication Technologies, ICICT 2005, pp. 27-28, 2005.


[11] O.P. Verma, R. Agarwal, D. Dafouti, and S. Tyagi, “Performance analysis of data encryption algorithms”, Electronics Computer Technology (ICECT), 3rd International Conference, Volume 5, pp. 399-403, 2011.

[12] Strong Encryption, http://www.tropsoft.com, Date accessed: June 17, 2014.

[13] Security and Risk Management, http://www.counterpane.com/bfsverlag.html, Date accessed: June 17, 2014.

[14] Security and Privacy products, http://www.tropsoft.com, Date accessed: June 22, 2014.

[15] Blowfish products, https://www.schneier.com/blowfish-products.html, Date accessed: June 24, 2014.

[16] Network Security Center, Cryptography, http://www.netsec.org.sa/cryptography.html, Date accessed: June 25, 2014.

[17] An overview of Cryptography, http://www.garykessler.net/library/crypto.html#intro, Date accessed: June 28, 2014.

[18] Cryptography, http://en.wikibooks.org/wiki/Cryptography/Introduction, Date accessed: June 29, 2014.

[19] Block Cipher, http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Block_cipher.html, Date accessed: June 29, 2014.

[20] Computer Security, http://csrc.nist.gov/publications/nistpubs/800-38a/sp800-38a.pdf, Date accessed: June 29, 2014.

[21] Modes of Operation, http://www.cs.odu.edu/~mukka/cs472f12/Lectures/day2-4/chapter4.ppt, Date accessed: June 30, 2014.


[22] Encipherment using modern symmetric key cryptography, http://web2.utc.edu/~Li-Yang/documents/c04-Crypto-symmetric-modes.ppt, Date accessed: July 1, 2014.

[23] Feistel Cipher, http://en.wikipedia.org/wiki/Feistel_cipher, Date accessed: July 1, 2014.

[24] B. Michael, Cryptography, Block Ciphers, http://web.cs.du.edu/~ramki/courses/security/2011Winter/notes/feistelProof.pdf, Date accessed: July 2, 2014.

[25] Blowfish Cipher, http://en.wikipedia.org/wiki/Blowfish_(cipher), Date accessed: July 4, 2014.

[26] Key Schedule, http://en.wikipedia.org/wiki/Key_schedule, Date accessed: July 4, 2014.

[27] B. Schneier, “Description of a new variable-length key, 64-bit block cipher (Blowfish)”, Cambridge Security Workshop Proceedings, pp. 191-204, 1994.

[28] C. Michael, J. Lin, and L. Youn-Long, “A VLSI implementation of the Blowfish encryption/decryption algorithm”, ASP-DAC Proceedings of the 2000 Asia and South Pacific Design Automation Conference, pp. 1-2, 2000.

[29] Encryption operation modes, https://www.adayinthelifeof.nl/2010/12/08/encryption-operating-modes-ecb-vs-cbc/, Date accessed: July 5, 2014.

[30] D. Huffman, “A method for the construction of minimum-redundancy codes”, Proceedings of the IRE, 40 (9), pp. 1098–1101, 1952.

[31] English Letter Frequency, Cornell University, http://www.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html, Date accessed: July 7, 2014.

[32] Nios II, http://en.wikipedia.org/wiki/Nios_II, Date accessed: July 10, 2014.

[33] Altera, Nios II processor, http://www.altera.com/devices/processor/nios2/ni2-index.html, Date accessed: July 10, 2014.


[34] Altera, Nios II Processor Reference, http://www.altera.com/literature/hb/nios2/n2cpu_nii5v1.pdf, Date accessed: July 10, 2014.

[35] Namburi Lathika SriDatha, An Efficient VHDL Description and Hardware Implementation of the Triple DES Algorithm, MS Thesis, University of Cincinnati, 2014.

[36] Threefish, http://en.wikipedia.org/wiki/Threefish, Date accessed: 30 May, 2013.


APPENDIX

Tutorial for Simulation in Modelsim and Quartus II

To implement any algorithm on an FPGA, we first have to describe it in a hardware description language. The design is synthesized using Altera Quartus II. The functionality is verified by writing a test bench and simulating the design in Modelsim. Version 10.3b of Modelsim and version 13.0 of Altera Quartus II are used in this tutorial.

Quartus II is used to synthesize and analyze designs that are described in hardware description languages (HDLs) for programmable logic devices. We download Quartus II version 13.0 from the official Altera website and install it on our computer. When we open Quartus II, we first create a new project by clicking “New Project Wizard” in the File menu. After the introduction window, we create a working directory and name our project. In the next window, we add all the required files and click ‘Next’. In the family and device settings window, we select Cyclone II as the family. Cyclone II is a family of FPGAs provided by Altera. Since we are targeting the Altera DE1 board, we then select the device EP2C20F484C7 from the list of devices. This can be seen in Figure A.1 below.


Figure A.1: Device Settings in Quartus II

We can choose from a list of EDA tools on the next page, but as we do not need any for our implementation, we click ‘Next’. We view a summary of our settings in the last window and then click ‘Finish’.

Now that the project is ready for compilation, we click ‘Processing’ and select ‘Start Compilation’. We observe the progress in the task window. There are several stages in the compilation process: Analysis and Synthesis, Analysis and Elaboration, Fitter, Assembler and Timing Analysis. Any errors during compilation are reported in the corresponding stage and should be rectified accordingly. An unsuccessful compilation is shown in Figure A.2.


Figure A.2: Unsuccessful Compilation Window

When the compilation is successful, a compilation report is generated. It contains information about the hardware used for the design, such as total logic elements, PLLs, pins, registers etc. Using this information, we can tweak the design to achieve better performance.

Figure A.3 shows a successful compilation.


Figure A.3: Successful Compilation Window

In our design, we have used VHDL to implement the algorithm. The algorithm is divided into various modules and each module is implemented separately. We have separate modules for S Boxes, P array, Round function, Feistel function (Blowfish Cipher) and FIFOs. We now write a test bench to verify the functionality of our design. The testbench will use simulation to verify that the design does the correct computation.

To test our implementation we choose the key and input as follows:

Key = 0001000100010001000100010001000100010001000100010001000100010001


Input data = 0001000100010001000100010001000100010001000100010001000100010001

The clock frequency is 50 MHz. Encryption is performed. The simulation result is shown in Figure A.4.

Figure A.4: Blowfish Encryption

Output data obtained is

0010010001100110110111011000011110001011100101100011110010011101

We now give this output data as input to Blowfish with the same key and perform decryption. The output of decryption is the same as the input to encryption. This can be seen in Figure A.5. Hence the functionality of the design is verified.
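The 64-bit strings used in this test are easier to compare in hexadecimal. The Python sketch below is only a convenience for reading the values; the helper name `bits_to_hex` is hypothetical and is not part of the VHDL design or its test bench.

```python
def bits_to_hex(bits):
    """Convert a string of '0'/'1' characters to upper-case hexadecimal."""
    width = len(bits) // 4                  # one hex digit per four bits
    return format(int(bits, 2), "0{}X".format(width))


# Key and input data from the test above: the pattern 0001 repeated 16 times.
key_bits = "0001" * 16
plain_bits = "0001" * 16

# Output data obtained from the encryption simulation.
cipher_bits = "0010010001100110110111011000011110001011100101100011110010011101"

print("key       :", bits_to_hex(key_bits))      # 1111111111111111
print("plaintext :", bits_to_hex(plain_bits))    # 1111111111111111
print("ciphertext:", bits_to_hex(cipher_bits))   # 2466DD878B963C9D
```

In hexadecimal, the key and input are both 1111111111111111 and the output is 2466DD878B963C9D. This triple agrees with one of the test vectors published for Blowfish, which gives an independent check of the encryption result.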


Figure A.5: Blowfish Decryption
