CALIFORNIA STATE UNIVERSITY NORTHRIDGE

NOISE REDUCTION AND IMAGE SMOOTHING USING GAUSSIAN BLUR.

A graduate project in fulfillment of the requirements

For the degree of Master of Science

In Electrical Engineering

By

Sachin Eswar

MAY 2015

The graduate project of Sachin Eswar is approved:

______

Dr. Ramin Roosta, Ph.D. Date

______

Dr. Matthew Radmanesh, Ph.D. Date

______

Dr. Shahnam Mirzaei, Ph.D., Chair Date

California State University Northridge


ACKNOWLEDGEMENT

Working on this “Noise Reduction and Image Smoothing using Gaussian Blur” project was a source of immense knowledge to me. I would like to express my sincere gratitude to Dr. Shahnam Mirzaei for his encouragement and unlimited support towards the completion of my project. His guidance and support were the key to its completion. I would also like to thank Dr. Ramin Roosta and Dr. Matthew Radmanesh for their extended support, scholarly advice and inspiration in completing my project successfully.

I am grateful to the Department of Electrical and Computer Engineering for giving me this opportunity and providing me with the knowledge required to complete my project.

I would also like to thank my family and friends, who encouraged me to extend my reach, and for their love, care and support through all the tough times during my graduation.


TABLE OF CONTENTS

SIGNATURE PAGE: ...... ii

ACKNOWLEDGEMENT ...... iii

LIST OF FIGURES ...... vi

ABSTRACT ...... ix

CHAPTER 1: INTRODUCTION ...... 1

Introduction on Image Processing: ...... 1

Introduction on Gaussian/Normal distribution and Filter: ...... 2

Gaussian distribution: ...... 2

Gaussian Filter: ...... 4

Gaussian Blur ...... 6

CHAPTER 2: FPGA MEMORY UNITS ...... 9

Block RAM: ...... 9

Write modes:...... 12

WRITE_FIRST mode: ...... 12

READ_FIRST mode: ...... 13

NO_CHANGE mode: ...... 14

FIFO: ...... 14

FIFO mode of operation: ...... 16


CHAPTER 3: GAUSSIAN BLUR PRINCIPLE ...... 21

CHAPTER 4: IMPLEMENTATION OF GAUSSIAN BLUR USING MATLAB ...... 27

Implementing using inbuilt Matlab function: ...... 27

Steps using inbuilt function: ...... 27

Implementing without using inbuilt Matlab function: ...... 28

CHAPTER 5: IMPLEMENTATION OF GAUSSIAN BLUR USING FPGA ...... 31

Top level design flowchart: ...... 31

Implement Convolution in FPGA: ...... 32

Generating COE files: ...... 34

Finite state machine:...... 35

Displaying the output image using Matlab...... 40

CHAPTER 6: SYNTHESIS AND SIMULATION ...... 42

Synthesis using XST: ...... 42

Simulation using ISim: ...... 47

CHAPTER 7: CONCLUSION ...... 51

REFERENCES ...... 52

APPENDIX A: FPGA SUMMARY REPORTS ...... 54

APPENDIX B: SOURCE CODE ...... 61

VHDL code for Gaussian Filter module: ...... 61

VHDL test bench for Gaussian Filter module: ...... 76


LIST OF FIGURES

Figure 1: Gaussian distribution ...... 3

Figure 2: Gaussian distribution in 3 dimension...... 4

Figure 3: Discrete Gaussian Function ...... 5

Figure 4: 3-dimensional discrete Gaussian distribution...... 6

Figure 5: Kernel matrix of size 7x7...... 7

Figure 6: Input image sample with random noise...... 8

Figure 7: Blurred image sample after applying Gaussian blur ...... 8

Figure 8: Single port BRAM pin diagram...... 10

Figure 9: Single port RAM architecture...... 10

Figure 10: Cascaded BRAM/Dual port BRAM pin diagram...... 11

Figure 11: Cascaded BRAM architecture...... 12

Figure 12: WRITE_FIRST mode waveform...... 13

Figure 13: READ_FIRST mode waveform...... 13

Figure 14: NO_CHANGE mode waveform...... 14

Figure 15: FIFO-18 ...... 15

Figure 16: FIFO-36 ...... 15

Figure 17: FIFO capacity ...... 16

Figure 18: Read cycle timing diagram...... 17

Figure 19: Synchronous FIFO timing diagram...... 18

Figure 20: Dual-Clock FIFO Flag Assertion and De-assertion Latency ...... 20

Figure 21: Input pixel filtering using Gaussian kernel function...... 22


Figure 22: Matrix convolution algorithm...... 23

Figure 23: Input image that needs to be blurred...... 24

Figure 24: Blurred image with kernel size 3 and sigma 0.5,2,8...... 25

Figure 25: Blurred image with kernel size 7 and sigma 0.5,2,8 ...... 25

Figure 26: Blurred image with kernel size 11 and sigma 0.5,2,8 ...... 25

Figure 27: Matlab snippet showing the convolution steps using inbuilt function...... 28

Figure 28: Snippet showing zero padding...... 29

Figure 29: Snippet showing the convolution logic...... 30

Figure 30: Top level design flowchart showing the project outline...... 31

Figure 31: Flowchart showing the overall process...... 33

Figure 32: Matlab snippet to create COE file...... 35

Figure 33: Finite state machine...... 36

Figure 34: VHDL snippet showing set address state for first 2 cycles...... 37

Figure 35: VHDL snippet showing read_input state for first 2 cycles...... 37

Figure 36: VHDL snippet showing the last read input cycle...... 37

Figure 37: VHDL snippet showing the computation cycle...... 38

Figure 38: VHDL snippet showing combinational process block used for convolution. . 39

Figure 39: VHDL snippet showing store_output state...... 39

Figure 40: VHDL snippet showing complete state...... 40

Figure 41: Matlab snippet used to display the output image...... 41

Figure 42: Top level design...... 42

Figure 43: Complete design schematic...... 43

Figure 44: Schematic showing BRAM...... 44


Figure 45: Schematic showing FSM...... 45

Figure 46: Schematic showing the output storage phase...... 46

Figure 47: Simulation showing the read cycle...... 47

Figure 48: Simulation showing computation, storing and complete state...... 48

Figure 49: Simulation showing the last cycle of the matrix convolution...... 49

Figure 50: Simulation showing output data written to the text file...... 50

Figure 51: HDL Synthesis Report...... 54

Figure 52: Device Utilization summary report...... 55

Figure 53: Design summary report...... 56

Figure 54: Report showing Finite State Machine analysis...... 57

Figure 55: Total time taken and total memory used...... 57

Figure 56: Overview of design summary report - part 1 ...... 58

Figure 57: Overview of design summary report - part 2 ...... 59

Figure 58: Map report showing device info...... 60


ABSTRACT

Noise reduction and image smoothing using Gaussian blur

By

Sachin Eswar

Master of Science in Electrical Engineering

The main goal of this project is to design a filter to smoothen a given image based on the Gaussian blur technique. Image blurring or image smoothing is a technique of averaging a group of pixels in order to reduce noise and sharpness at the edges.

Noise reduction is one of the major concerns in the image analysis and application industry. The main goal of noise reduction is to remove unwanted information that may corrupt the image. This can be achieved by numerous techniques, such as median filtering, Gaussian filtering, averaging, applying Fourier transforms and many more.

Edges in an image are the outlines that detail the structure of the objects in the image.

Blurring is in fact softening the edges and making the transition from one color to the next smoother. The blurring or image smoothing effect can be achieved by choosing an appropriate filter kernel, which performs the averaging in a selected neighborhood of pixels and normalizes the pixel value. In an adaptive filter, the kernel coefficients change their values according to the structure of the image being smoothed. Adaptive smoothing is a nonlinear process, which reduces the noise but still preserves the valuable features of the image.

There are a few trade-offs that have to be taken into consideration in order to bring a product to market in which size and performance are crucial. Size refers to the actual hardware required to implement the design, whereas performance refers to how fast the system can perform the operation. In this project, I have mainly concentrated on the first factor, size, making use of limited hardware at the cost of some performance.


CHAPTER 1: INTRODUCTION

My project mainly focuses on the design of a filter to smoothen an image using the Gaussian blur principle. The design is implemented using Matlab and VHDL (Very High Speed Integrated Circuit Hardware Description Language), and the simulation is done on the ISE simulator (ISim).

Introduction on Image Processing:

Any form of signal processing operation performed on an input image, either a photograph or a video frame, in order to extract the characteristics or parameters of the image being processed is referred to as image processing. The output of this process is either an image or a set of parameters that exhibits the required characteristics of the input image. There are three types of image processing, namely analog, optical and digital image processing, but in today’s world, image processing mainly refers to digital image processing, as analog and optical image processing are not widely used due to various limitations.

An image has spatial variation (can be represented in the rectangular co-ordinate system) and hence, can be considered as a two-dimensional signal for standard processing techniques.

Data acquisition, or imaging, is the first step involved in processing an image. Data acquisition means collecting the required information or parameters contained in the input image and performing the required transformation operation. The image data is usually stored in the form of a matrix, in which each element holds the value of the pixel at the corresponding position.


During imaging, random variations may occur in the brightness, edge or color information of the input image; these variations are referred to as noise. This noise is produced either by the source used to capture the image, such as optical sensors, scanners and digital cameras, or by external elements such as interference, aging of the printed copy and many more. Noise adds extraneous information to the image. Noise magnitude can vary from a few specks on a digital photograph to an image that is almost entirely noise, from which even sophisticated processing techniques can recover very little image information. In signal processing, it is therefore often desired to perform some kind of noise reduction on an image or signal.

Introduction on Gaussian/Normal distribution and Filter:

Gaussian distribution:

This design is based on the Gaussian distribution function stated as “The Gaussian distribution is a continuous function which approximates the exact binomial distribution of the given set of events.” and is given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}$$ for events having variations in one dimension, and

$$f(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}$$ for events having variations in two dimensions,

where x is the deviation of a point from the center pixel along the x-axis (horizontal plane), y is the deviation along the y-axis (vertical plane) and σ (sigma) is the standard deviation that defines the amount of blurring. The blurring effect is directly proportional to the value of σ (sigma).


Figure 1: Gaussian distribution

Values from this distribution are used to build a convolution matrix which is applied to the original image. Each pixel’s new value is set to a weighted average of that pixel’s neighborhood. The original pixel’s value receives the heaviest weight (having the highest Gaussian value) and neighboring pixels receive smaller weights as their distance from the original pixel increases. This results in a blur that preserves boundaries and edges better than other, more uniform blurring filters.

The Gaussian distribution is an extension of the binomial distribution: when the number of events is large, the binomial distribution may be treated as a continuous function, which is known as the Gaussian distribution. It is also referred to as the “normal distribution” and is described as a “bell-shaped curve,” as its impulse response yields a graph in the shape of a bell. Theoretically, the Gaussian function is non-zero everywhere and has a domain from −∞ (negative infinity) to +∞ (positive infinity). However, this function decays rapidly, and hence the required information can be obtained within the FWHM (full width at half maximum), which implies that the image features can be effectively reconstructed from samples in this window.

$$\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.35\,\sigma$$
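As a worked example (using the σ = 0.84 kernel that appears later in this chapter, chosen here only for illustration):

$$\mathrm{FWHM} = 2\sqrt{2\ln 2}\times 0.84 \approx 2.35 \times 0.84 \approx 1.98 \text{ samples},$$

so nearly all of the useful kernel weight lies within about two pixels of the center.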

Figure 2: Gaussian distribution in 3 dimension.

Gaussian Filter:

A Gaussian filter is a linear filter whose impulse response is a Gaussian function. Each pixel in the given image is transformed using the Gaussian function. It is often reasonable to truncate the filter window and implement the filter directly for narrow windows, using a simple rectangular window function. As we know, the Gaussian function is a continuous function over a wide domain and hence the Gaussian kernel is also continuous. A discrete kernel can be derived by sampling this continuous function at a sampling rate greater than the cut-off frequency:

$$F_s = f_c \,(2\pi\sigma)$$


where Fs is the sampling frequency, fc is the cut-off frequency of the filter and σ is the standard deviation, measured in samples.

When this Gaussian kernel needs to be convolved with a two-dimensional matrix, it has to be converted to an equivalent matrix structure; this matrix of the discrete Gaussian function is known as the Gaussian kernel matrix. Filtering can then be implemented by convolving the input matrix with the Gaussian kernel matrix.

Figure 3: Discrete Gaussian Function.


Figure 4: 3-dimensional discrete Gaussian distribution.

The Fourier transform of a Gaussian function is itself a Gaussian function; hence the signal (after being divided into overlapping windowed blocks) can be transformed with a Fast Fourier Transform, multiplied with a Gaussian function and transformed back. This is the standard procedure for applying a finite impulse response filter with an explicitly known filter window.

Gaussian Blur

Gaussian blur is the result of modifying the image using the Gaussian function. It is mainly used to reduce the noise in the image and to suppress fine image detail. The visual effect of this blurring resembles that of viewing the image through a translucent lens. The process is also used as a pre-processing stage in order to enhance image structure.


Applying Gaussian blur to an image is equivalent to convolving the image with the Gaussian kernel matrix; this is also known as a two-dimensional Weierstrass transform. The filter removes the high-frequency components in the image and hence is a low-pass filter. The Gaussian function is non-zero at every point of the image, so in principle the entire image would need to be included in the calculation for each pixel. In practice, when computing a discrete approximation of the Gaussian function, pixels at a distance of more than 3σ from the center have weights small enough to be considered effectively zero, and their contributions can be ignored, as the change is not significant. So an image processing system needs to calculate over only a fixed-size matrix, as the values beyond that range tend to zero. To ensure a result very close to the one obtained from the entire Gaussian distribution, a kernel matrix of size [6σ]x[6σ] is sufficient.

Below is an example of a Gaussian kernel matrix of size 7x7 with σ = 0.84.

Figure 5: Kernel matrix of size 7x7.

As we can observe, the center element has the highest value, and the value of each element decreases as its distance from the center increases. Beyond a certain radius, the value of the kernel is negligible, exactly as portrayed by the Gaussian distribution function.
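For reference, a minimal Matlab sketch (not the project’s own code; fspecial requires the Image Processing Toolbox, and the size and sigma simply mirror Figure 5) that builds such a kernel both directly from the 2-D formula and with the inbuilt function:

% Build a 7x7 Gaussian kernel with sigma = 0.84 directly from the 2-D formula
sigma = 0.84;
half  = 3;                                   % (7-1)/2 elements on each side of the center
[x, y] = meshgrid(-half:half, -half:half);   % deviations from the center element
K = exp(-(x.^2 + y.^2) / (2*sigma^2)) / (2*pi*sigma^2);
K = K / sum(K(:));                           % normalize so the weights sum to 1

% The inbuilt generator gives essentially the same matrix
K2 = fspecial('gaussian', [7 7], sigma);
disp(max(abs(K(:) - K2(:))))                 % difference is negligible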


Figure 6: Input image sample with random noise.

Figure 7: Blurred image sample after applying Gaussian blur


CHAPTER 2: FPGA MEMORY UNITS

Block RAM:

Block RAM (BRAM) is a memory element that stores up to 36 Kb of data and can be configured either as one complete 36 Kb block or as two independent 18 Kb RAM blocks. There are also several width/depth combinations in which the RAM can be configured, such as 16K x 1, 8K x 2 and so on up to 512 x 36 in simple dual-port mode. The BRAM also provides a 64-bit error correction coding block with separate encoding and decoding functionality. BRAM is synchronous, meaning that the read and write operations from and to the memory are based on the active edge of the clock input signal. The read and write operations also depend on the read/write enable port. When the write enable pin is set high, the memory location pointed to by the address bus is written with the data available on the input data bus; when the pin is set low, the data at the memory location pointed to by the address bus is available on the output data bus at the rising edge of the clock.


Figure 8: Single port BRAM pin diagram.

Figure 9: Single port RAM architecture.

These memory blocks can also be cascaded in order to increase the available memory locations without using any logic blocks or interconnects to combine them. The structure is fully symmetrical and both ports are interchangeable. Data can be written to either or both of the ports and can also be read from either or both of the ports at any given time. The read and write operations are synchronous and require a clock edge, as this is a synchronous component. Each port has its own address, data-in and data-out busses, clock, write enable and clock enable. Each individual port can also be configured with a different data bit width. For example, port A can have a 36-bit read width and a 9-bit write width, and port B a 16-bit read width and a 36-bit write width. This feature increases the efficiency of implementing a content addressable memory in the BRAM.

Figure 10: Cascaded BRAM/Dual port BRAM pin diagram.


Figure 11: Cascaded BRAM architecture.

Write modes:

There are three data write modes that determine the behavior of the data available on the output latches after a write clock edge, namely WRITE_FIRST, READ_FIRST and NO_CHANGE, with the default mode being WRITE_FIRST.

WRITE_FIRST mode:

In this mode, the data on the input data bus is first written to the memory location pointed to by the address bus, and the newly written data is then driven onto the output bus. This mode is also known as transparent mode.


Figure 12: WRITE_FIRST mode waveform.

READ_FIRST mode:

In this mode, the data already stored at the memory location is first driven onto the output bus, and the new data on the input data bus is then written to the memory location pointed to by the address bus. This mode is also known as read-before-write mode.

Figure 13: READ_FIRST mode waveform.


NO_CHANGE mode:

In this mode, the data on the output bus remains unchanged when a write operation is performed. This mode is more power efficient as the data is not read from the memory during the write operation.

Figure 14: NO_CHANGE mode waveform.

FIFO:

A FIFO is an extension of the BRAM behavior. FIFO stands for first in, first out: the data that is stored first is retrieved first. This block also contains the logic needed for operations such as counters, comparators and status flag generation. When a rising clock edge is sensed, the first available memory location is written with the data on the input data bus, and the oldest unread entry is made available on the output bus. FIFOs support two modes of operation, namely standard mode and FWFT (first word fall through) mode. They can be configured as either 18 Kbit or 36 Kbit memories, as shown below.


Figure 15: FIFO-18

Figure 16: FIFO-36

The 18Kb FIFO supports 4K x 4, 2K x 9, 1K x 18, and 512 x 36 configurations and the

36 Kb FIFO supports 8K x 4, 4K x 9, 2K x 18, 1K x 36, and 512 x 72 configurations.


Figure 17: FIFO capacity

FIFO mode of operation:

There are 2 modes of operation in FIFO module. They are standard mode and first word fall through mode. They only differ in the way the output is presented immediately after the first word is written onto the previous empty FIFO. In standard mode, after the first word is written into the empty FIFO, the empty flag is de-asserted (logic low) synchronously with the read clock. Following that, when the read enable (RDEN) is asserted (logic high), the first word appears on the output data bus (DO) on the rising edge of clock (RDCLK). In FWFT mode, after the first word is written into the empty

FIFO, it appears on the output data bus (DO) even before the read enable is asserted.

Subsequent read operations require empty flag to be de-asserted and the read enable to be asserted. The timing diagram shown below illustrates the operation in different modes.


Figure 18: Read cycle timing diagram.

FIFOs can be categorized as dual-clock FIFOs and synchronous FIFOs based on the clocks used for the read and write operations.

Dual clock FIFO:

This provides a very simple user interface. The design relies on the free-running clocks

(read and write) of identical or different frequencies up to the maximum allowed frequency limit. This design helps to avoid ambiguities, glitches and metastable problems even with different clock frequency for the read and write operation. The write operation is synchronous, the data available on the DI is written to the memory when the WREN is active one setup time before the write clock edge. The read operation is also synchronous, presenting the next data on the output data bus DO when the read enable RDEN is active one setup time before the read clock edge. These read and write operations can be done on the same clock or on different clock with different frequency.

Synchronous FIFO:

In this mode, the same clock is used for both read and write operations, and the attribute EN_SYN is set to TRUE to enable synchronous mode. The reset can be either synchronous or asynchronous. FWFT mode is not available in synchronous mode, as the operation is performed after the clock edge.

Figure 19: Synchronous FIFO timing diagram.

FIFO has 6 status flags that are used to notify the current status of the operation and these flags need one or more clock cycle to either assert or de-assert the flag. There may be some offset experienced in dual clock FIFO but the latency is negligible in case of synchronous FIFO. The 6 status flags are:

Empty flag: This flag is synchronous with the read clock and is asserted when the last element of the FIFO is read. The read pointer is frozen when there are no more entries in the queue. When new data is written into the FIFO, it takes 3 or 4 clock cycles to de-assert the empty flag. The empty flag is used mostly in the read clock domain or the read cycle. The empty condition can only be terminated by the write clock: when data is written to the FIFO, the memory starts storing data and the empty flag is cleared.


Almost empty flag: This flag is set when the FIFO contains the number of entries specified by the ALMOST_EMPTY_OFFSET value or fewer entries. This flag symbolizes that the FIFO memory is being used less than the configured threshold and warns the user to stop reading further values. This flag is de-asserted when the number of entries in the FIFO is greater than the configured ALMOST_EMPTY_OFFSET.

Assertion and de-assertion are synchronous to the read clock.

Read error flag: This flag is asserted when the empty flag is set and the user attempts to read the data. This triggering is to avoid further increment of the address and to notify that the end of the memory is reached and there is no data to be fetched. This flag is synchronous to the read clock and is de-asserted when the read enable (RDEN) or Empty flag is set to low.

Full flag: This flag is set when there are no more available entries in the FIFO queue. The write pointer is frozen when this flag is set (the FIFO is full). It is synchronous with the write clock and is de-asserted 3 clock cycles after a subsequent read operation.

Write error flag: This flag is asserted when the full flag is set and attempts are made to write to the FIFO. This triggering is to avoid further increment of the write address and to notify user that there is no more available memory to write the data. This flag is synchronous to write clock and is de-asserted when write enable (WREN) or full flag is set to low.

Almost full flag: This flag is set when the number of available (free) entries in the FIFO is equal to or less than the ALMOST_FULL_OFFSET value. It indicates that the FIFO is filled beyond the configured threshold and warns the user to stop writing further values. The flag is de-asserted when the number of available entries in the FIFO is greater than the configured ALMOST_FULL_OFFSET. Assertion and de-assertion are synchronous to the write clock.

Figure 20: Dual-Clock FIFO Flag Assertion and De-assertion Latency


CHAPTER 3: GAUSSIAN BLUR PRINCIPLE

Gaussian blur is an image processing algorithm used to smoothen a given image. The blurring effect reduces sharp edges and gives a smooth color transition at the edges. This is done by applying the Gaussian filter. The Gaussian filter is a linear low-pass filter whose cut-off frequency is given by

$$f_c = \frac{F_s}{2\pi\sigma}$$

where fc is the cut-off frequency, Fs is the sampling rate and σ is the standard deviation that defines the amount of blurring, measured in samples.

An image can be seen as a variation in the horizontal and vertical directions and hence can be represented in the form of a two-dimensional matrix. The kernel vector is a discrete vector obtained by sampling the continuous kernel function. As the image is in the form of a 2-D matrix, the kernel must be converted to a similar dimension in order to apply the filtering appropriately. Hence the kernel is converted into a 2-D square matrix whose weighted elements are calculated from their deviation from the center element. The weighted kernel is calculated using the expressions below:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}$$ for events having variations in one dimension, and

$$f(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}$$ for events having variations in two dimensions,

where x is the deviation of a point from the center pixel along the x-axis (horizontal plane), y is the deviation along the y-axis (vertical plane) and σ (sigma) is the standard deviation that defines the amount of blurring. The blurring effect is directly proportional to the value of σ (sigma).

The Gaussian blur can be obtained by convolving the image matrix with the kernel matrix.

Figure 21: Input pixel filtering using Gaussian kernel function.

The image matrix is convolved with the kernel as shown in the above figure. The kernel matrix is placed over the image pixels in such a way that it overlaps the image matrix, centered on the pixel being convolved. The result of the convolution is stored in another matrix, with the respective pixel elements placed in the same order in which they were convolved. The resulting matrix is the blurred image, stored in the form of a 2-D matrix. This matrix convolution can be implemented by multiplying each pixel element of the image with the corresponding weighted kernel element and then adding the results.


Pixel patch:
X0 X1 X2
X3 X4 X5
X6 X7 X8

Kernel:
K0 K1 K2
K3 K4 K5
K6 K7 K8

Result = X0*K0 + X1*K1 + X2*K2 + X3*K3 + X4*K4 + X5*K5 + X6*K6 + X7*K7 + X8*K8

Figure 22: Matrix convolution algorithm.
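The sum of products in Figure 22 can be checked with a couple of Matlab lines (an illustrative sketch with made-up pixel values; fspecial comes from the Image Processing Toolbox):

patch  = double([10 20 30; 40 50 60; 70 80 90]);   % example 3x3 neighborhood of the image
kernel = fspecial('gaussian', [3 3], 0.84);        % example 3x3 Gaussian weights
result = sum(sum(patch .* kernel));                % element-wise multiply, then add the 9 products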

These convolved values are then placed in the form of a matrix similar to the image matrix, and this matrix, when displayed as an image, is compared with the original image to verify the blurring effect. For an image matrix of size a x b and a kernel matrix of size c x d, the resultant image matrix is of size (a+c-1) x (b+d-1), i.e., the sum of the image and kernel dimensions reduced by 1. Hence the blurred image is larger than the actual input image; for example, the 90x90 image and 3x3 kernel used later in this project yield a 92x92 result.

The blurring effect depends on two main factors: the kernel matrix size and σ (sigma). The blurring effect is directly proportional to sigma. As the sigma value increases, the Gaussian weights spread more evenly over the neighborhood, so the neighboring pixels contribute relatively more to each convolved value and fine detail is averaged out, which increases the blurring of the image. The other factor that affects blurring is the kernel matrix size: the larger the kernel, the more the blurring. Hence we can say that

Blurring effect ∝ kernel size ∝ sigma

Figure 23: Input image that needs to be blurred.


Figure 24: Blurred image with kernel size 3 and sigma 0.5,2,8.

Figure 25: Blurred image with kernel size 7 and sigma 0.5,2,8

Figure 26: Blurred image with kernel size 11 and sigma 0.5,2,8


In the above images we can see the amount of blurring produced by different kernel matrix sizes and different values of sigma. From the images it is evident that the blurring effect increases with the kernel matrix size and with sigma. Figure 24 shows the blurring effect for a kernel size of 3x3 and multiple values of sigma; the blurring effect is observed to increase with sigma. A similar variation is seen in Figure 25 and Figure 26, along with the change in the size of the kernel matrix.


CHAPTER 4: IMPLEMENTATION OF GAUSSIAN BLUR USING MATLAB

Matlab is a high-level language and interactive environment used by millions of engineers and scientists. It is a powerful and easy-to-use tool applied in many fields. It lets the user explore, visualize and share ideas across multiple disciplines such as digital signal and image processing, communication system analysis, control system analysis and many more. Most common functions are already defined, so the inbuilt functions can be used directly to achieve the expected result.

Implementing using inbuilt Matlab function:

Here I have used the inbuilt functions to convolve the matrices. First we read the input image and convert it to grayscale. The Matlab function used to convolve two 2-D matrices is conv2, and the fspecial function is used to get the Gaussian kernel matrix. The parameters of fspecial define the filter to be used (Gaussian in our case), the kernel matrix size, denoted as [3, 3] for a 3x3 matrix, and the sigma value.

Steps using inbuilt convolution function:

The image is first read and stored in the form of a matrix. It is then converted to grayscale in order to normalize the image to black-and-white intensities. We then extract a 2-D matrix of a definite size; in this design, I have taken an image of size 90x90. The kernel matrix is obtained using the ‘fspecial’ function for different sigma values, and the image is convolved with the kernel matrix using the inbuilt Matlab function conv2, which convolves two 2-D matrices. The resulting image is displayed using ‘imshow’, the inbuilt function that displays a matrix as an image.


Figure 27: Matlab snippet showing the convolution steps using inbuilt function.

Figure 24 is the set of output images obtained from the above code snippet.
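Since Figure 27 is reproduced only as an image, a hedged sketch of the same flow is shown here for reference; the file name is a placeholder, the 90x90 crop matches the size used in this project, and fspecial/imshow come from the Image Processing Toolbox:

img = imread('input_image.png');            % placeholder file name, assumed to be an RGB image
img = rgb2gray(img);                        % normalize to grayscale
img = double(img(1:90, 1:90));              % fixed 90x90 region, as used in the project
h   = fspecial('gaussian', [3 3], 2);       % 3x3 Gaussian kernel, sigma = 2
out = conv2(img, h, 'same');                % inbuilt 2-D convolution
imshow(uint8(out))                          % display the blurred image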

Implementing without using inbuilt Matlab function:

The convolution can also be implemented with explicit code as an alternative to the inbuilt function. This is convenient when implementing the convolution principle on an FPGA, as the FPGA does not have any inbuilt function for this purpose.

The image is first extracted and stored in matrix form. The resulting image is converted to grayscale to extract the image details in the form of black/white pixels; the reason for gray-scaling is to normalize the image values between the two boundaries. It is then cropped to the required size so that the dimensions of the matrix are known. The kernel matrix is obtained from the fspecial filter function and is also stored as a 2-D matrix. Now the two matrices have to be convolved in order to get the resultant matrix, which is nothing but the blurred image.


Matrix convolution is performed by multiplying each pixel element of the image with the corresponding weighted kernel element and then adding the results. For this, the kernel matrix has to be placed over the image matrix in such a way that the center pixel and the center of the weighted matrix coincide, so that the corresponding pixel elements and weighted elements can be multiplied and then added to get the convolved output pixel value. If we place the weighted kernel matrix over the current image matrix, the outer rows and columns will not be convolved, as it is not possible to have negative indices. Hence we append (n-1) rows and columns of zeros to the matrix so that all the image pixel elements can be convolved. This process of appending zeros to the matrix is called zero padding.

Figure 28: Snippet showing zero padding.

Here we have taken a Gaussian kernel of size 3x3, and the matrix is padded with 2 rows and 2 columns of zeros on every side. Now, when the padded matrix is convolved with the Gaussian kernel, the convolved matrix will contain the corner rows and columns of pixel information as well. A minimal sketch of this padding step follows.
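A sketch of the padding step using stand-in data (the 90x90 size and the pad width of 2 on every side follow the numbers quoted in the text; the project’s own snippet is the one shown in Figure 28):

img = double(rand(90) * 255);          % stand-in for the 90x90 grayscale image matrix
[r, c] = size(img);
img_pad = zeros(r + 4, c + 4);         % 2 extra rows and columns of zeros on every side
img_pad(3:r+2, 3:c+2) = img;           % place the image in the middle
size(img_pad)                          % ans = 94 94, matching the dimensions used later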

Now we start convolving the image matrix with the kernel, taking 9 elements at a time in the form of a 3x3 matrix. Once one convolution is performed, the next set of elements is taken and convolved with the kernel, and this process continues until all the pixel elements have been convolved; after each convolution cycle, the result is stored in the form of a 2-D matrix. This matrix is then displayed in the form of an image using a Matlab function. A minimal sketch of this loop is given after Figure 29.


Figure 29: Snippet showing the convolution logic.
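A minimal nested-loop version of that logic, under the same assumptions (94x94 zero-padded input, 3x3 kernel, 92x92 result); the variable names are illustrative, and padarray/fspecial/imshow come from the Image Processing Toolbox:

img_pad = padarray(double(rand(90)*255), [2 2]);   % 94x94 zero-padded image (stand-in data)
kernel  = fspecial('gaussian', [3 3], 2);          % 3x3 Gaussian weights
out = zeros(92, 92);                               % one output pixel per window position
for r = 1:92
    for c = 1:92
        patch    = img_pad(r:r+2, c:c+2);          % current 3x3 neighborhood
        out(r,c) = sum(sum(patch .* kernel));      % multiply-and-accumulate, as in Figure 22
    end
end
imshow(uint8(out))                                 % display the blurred result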


CHAPTER 5: IMPLEMENTATION OF GAUSSIAN BLUR USING FPGA

Top level design flowchart:

Figure 30: Top level design flowchart showing the project outline.

The above figure shows the outline or block diagram of my project. The input image is first read into Matlab and stored in the form of a matrix. This matrix has to be fed to the FPGA, but the FPGA memory units can only be loaded from a vector, so the matrix is converted to a 1-D vector read row-wise. Along with the image pixels, the Gaussian kernel is also converted to a one-dimensional vector. These values are written to .COE files which are loaded onto the FPGA memory units, one to store the image pixels and the other to store the kernel vector. The convolution operation is performed in the FPGA, the processed pixel values are stored in another block RAM, and they are finally written to a text file as a one-dimensional vector. In order to display the output as an image, this vector has to be rearranged and stored as a two-dimensional matrix. The matrix is then converted into an image and displayed at the output.

Implement Convolution in FPGA:

Once the vector is loaded into the memory, it can be retrieved by writing the respective address on the address bus of the BRAM. When the rising edge of the clock is encountered, the value at the memory location pointed to by the address bus is available at the BRAM output. One memory location can be accessed per rising clock edge, hence to read the 9 pixel elements we need 9 clock cycles. As the image pixels and the kernel are stored in different memory blocks, both memory locations can be fetched on the same clock, so 9 clock cycles are sufficient to read both the image pixel and kernel elements; the address arithmetic behind these reads is sketched below. Once the data is read, the two matrices are convolved and the convolved output is written to the output memory block. This convolved output is then written to a text file using the FIFO block. The text file is then read using Matlab, converted to a matrix, and the output image is displayed using a Matlab function.
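Because the padded image is stored in the BRAM as a row-wise vector of width 94, the nine read addresses of a 3x3 window follow directly from the address of its top-left pixel. A small Matlab sketch of that arithmetic (the offsets are the same ones used by the set_address state in Appendix B):

width = 94;                                  % row length of the zero-padded image vector
base  = 0;                                   % address of the window's top-left pixel
offsets = [0 1 2, width width+1 width+2, 2*width 2*width+1 2*width+2];
disp(base + offsets)                         % 0 1 2 94 95 96 188 189 190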


Figure 31: Flowchart showing the overall process.

The above flowchart shows the procedure followed to implement the Gaussian blur principle.


Generating COE files:

The image is first read and converted to a matrix. The matrix is then converted to grayscale in order to normalize the pixel elements to black-and-white values, and is then cropped to the required size; in our case, it is saved as a 90x90 matrix. It is then padded with zeros using the Matlab function ‘padarray’, and the resulting matrix is of dimension 94x94. This matrix is then written to a text file called image.txt using a Matlab function, and the file is then renamed with the extension .COE. Similar steps are followed to generate the COE file for the kernel matrix. These COE files are then loaded into the BRAM memory. The snippet used to generate the COE files is shown in the figure below.


Figure 32: Matlab snippet to create COE file.
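The project’s own script is the one shown in Figure 32; purely as a rough illustration, a sketch of what such a script can look like (the file name is a placeholder, and the radix/comma layout follows the usual Xilinx .COE conventions rather than the author’s exact file):

img = padarray(double(rand(90)*255), [2 2]);       % 94x94 padded image (stand-in data)
vec = round(reshape(img', 1, []));                 % read the matrix row-wise into one vector
fid = fopen('image.coe', 'w');                     % placeholder file name
fprintf(fid, 'memory_initialization_radix=10;\n');
fprintf(fid, 'memory_initialization_vector=\n');
fprintf(fid, '%d,\n', vec(1:end-1));
fprintf(fid, '%d;\n', vec(end));                   % the last entry ends with a semicolon
fclose(fid);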

Finite state machine:

In this project, I have designed the matrix convolution using a finite state machine approach. The FSM has five states, namely set_address, read_input, computation, store_output and complete. Each state has its own significance, and the state names are largely self-explanatory.


Figure 33: Finite state machine.

The state machine has five states as shown above. “End” is not a state in the FSM but is used to represent the point at which the state machine terminates. The set_address state sets the address of the memory elements for both the image pixels and the kernel weighted elements. It requires one clock cycle to write the address onto the address bus. Once the address is set, the next state, read_input, is entered, where the data stored at the memory locations pointed to by the address bus is read. Then the next address is set in the set_address state and the values at that memory location are read in the following read_input state. The set_address and read_input states are repeated in a loop until all the elements required for the convolution have been read successfully.


Figure 34: VHDL snippet showing set address state for first 2 cycles.

Figure 35: VHDL snippet showing read_input state for first 2 cycles.

Figure 36: VHDL snippet showing the last read input cycle.

Once the elements are read successfully, the state changes to the computation state. Here, a flag ‘compute’ is set, which triggers a combinational process block that performs the convolution. This process produces a result only when the ‘compute’ flag is high, which happens during the computation state as shown below.

Figure 37: VHDL snippet showing the computation cycle.

The combinational process block executes whenever the ‘compute’ flag value changes, but the result is calculated only when the flag is high. The image pixel values are multiplied with the respective kernel weighted elements, and the products are then added and assigned to a signal named ‘output_conv’.


Figure 38: VHDL snippet showing combinational process block used for

convolution.

Once the convolved result is computed, the state changes to store_output, where the value available on the signal bus is written to the BRAM memory location pointed to by the address bus of the output memory block.

Figure 39: VHDL snippet showing store_output state.

Once the memory is written with the data available on its input, the state changes to the complete state. Here we check whether we have reached the end of a particular row or the end of the matrix; local counters are used to keep track of the elements convolved and stored. When a row end is encountered, the input matrix address is increased by 3 to point to the corresponding pixel element in the next row. We check for row end against a value of 91 because the input matrix is of size 94x94 (after zero padding): column 91 is the last column at which a 3x3 window can start, two addresses before the last element of the row, so adding 3 to the address from there moves the window to the start of the next row. When the last element of the input matrix is reached, the state machine terminates and the data present in the output BRAM memory block is written to a text file.

Figure 40: VHDL snippet showing complete state.

Displaying the output image using Matlab.

The output matrix values are now available in a text file containing 8464 data values. These values are read from the file using Matlab and stored in the form of a 92x92 2-D matrix (92 x 92 = 8464). This matrix is then displayed as an image using the ‘imshow’ function in Matlab.


Figure 41: Matlab snippet used to display the output image.
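A hedged sketch of that read-back step (the file name is a placeholder; the 92x92 size and the 8464 values follow the numbers in the text, with one decimal value per line written row by row):

vals = load('output.txt');        % 8464 values, one per line
out  = reshape(vals, 92, 92)';    % rebuild the matrix; transpose because the values were written row-wise
imshow(out, [])                   % scale to the data range and display the blurred image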


CHAPTER 6: SYNTHESIS AND SIMULATION

The synthesis and simulation in this project are done using two different tools. Synthesis is performed with the ISE tool called XST (Xilinx Synthesis Technology) and simulation is performed with ISim (ISE Simulator); both are available as part of the Xilinx ISE package.

Synthesis using XST:

Figure 42: Top level design.

The above figure shows the top-level design: the system clock and reset at the input and ‘ima_hex_out’ at the output port. The data is loaded into the memory from the COE files and hence is not shown in the top-level architecture.


Figure 43: Complete design schematic.

The above figure shows the complete schematic of the design. The main blocks are enlarged and shown below.


Figure 44: Schematic showing BRAM.

The memory blocks shown above are loaded from the COE files, hence the input data bus is shown as not connected in the above schematic. As there is no further write operation on the input BRAMs and the write enable pin has to be kept low for read operations, the write enable pin is grounded. The other inputs, such as the address bus and clock, are shown connected, as these values are continuously driven during the execution of the design. The range mentioned in parentheses defines the size of the bus; for example, “douta (7:0)” means that 8 bits of data can be sent over this data bus.

Figure 45: Schematic showing FSM.

The above figure shows the finite state machine used in my design. The matrix considered for convolution is of size 3x3, so to count the 9 pixel elements we need 4 bits (0000 to 1001); hence the 4-bit “pixel count” used to track the number of pixel elements read before the convolution process. I have also used 5 states in the state machine, and representing 5 states requires a combination of 3 bits. This is shown by the 3 state lines of the FSM on the right side of the above figure.

Figure 46: Schematic showing the output storage phase.

The above schematic shows the output storage phase. The convolved result is available on the input data bus of the output BRAM. Based on the address available on the address bus and the status of the write enable signal, the data on the input data bus is stored in the BRAM. Once all the convolved results are stored in the BRAM, they are read back and written to the FIFO on the following clock cycles. The output is extracted from the FIFO and written to the text file.


Simulation using ISim:

Simulation for my project is done using ISim (ISE Simulator). It provides a complete, full-featured HDL simulator integrated within ISE. It has two operation modes, Graphical User Interface (GUI) and command-line mode. The GUI mode provides a graphical view of the data, whereas the command-line mode has no graphical interface and the user has to execute a set of commands on the command line. In this project, the GUI mode is used, as the data is shown visually in the form of graphs and waveforms that are easier to analyze and debug; this is more convenient than the command-line mode, which only presents a set of values that is difficult to piece together.

Figure 47: Simulation showing the read cycle.


The above figure shows the simulation of the read cycle. In this design, the read cycle is a combination of two FSM states, set_address and read_input. As shown in Figure 47, setting the address and reading the input from that memory location is repeated until the elements required for convolution have been read completely. It takes one clock cycle to set the address and one clock cycle to read the data; therefore, it takes 18 clock cycles to read one set of inputs required for convolution. The values read from the memory are stored in temporary arrays called image_array and kernel_array, which are used in the combinational block to perform the convolution.

Figure 48: Simulation showing computation, storing and complete state.

The above figure shows the simulation details for the computation, store_output and complete states of the FSM. The computation is performed in a combinational process block: whenever the signal governing the combinational process changes, the result is computed within the clock cycle and the data becomes available on the input bus of the BRAM used to store the output. In the next clock cycle, the store_output state is reached and the data is written to the memory address available on the address bus. Then the state changes to complete, where we check whether the end of a row or the end of the image has been encountered. If the end of a row is encountered, the input address is incremented by 3 and the state returns to the read cycle so that the next row of elements is taken into consideration.

Figure 49: Simulation showing the last cycle of the matrix convolution.

When the last pixel is encountered, the state machine stays in the complete state until the simulation ends. In this scenario, the done flag is set high, which ends the convolution process, and the data available in the output BRAM has to be loaded into the FIFO one element at a time. This is achieved by resetting the address and making the write enable low so that the BRAM now performs read operations. The data is placed into the FIFO at every clock cycle and the same data is driven onto the output bus, which in turn writes the data to the text file as shown in the figure below.

Figure 50: Simulation showing output data written to the text file.


CHAPTER 7: CONCLUSION

Based on the analysis of the design, the simulation waveforms and the obtained results, we can say that the implemented design behaves as desired and hence the design project has been completed successfully. The main goal of this project was to gain more knowledge and experience in coding and design verification, and to gain experience in using design tools such as Matlab, ISE, XST and ISim. In learning these tools and gaining an understanding of how synthesis of the design works, I also encountered some challenges related to the constraints placed on the design. Finally, sufficient knowledge and experience was gained in the design process, which could be of great importance in the future.


REFERENCES

1. http://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf Retrieved March 2015

2. http://en.wikipedia.org/wiki/Gaussian_filter Retrieved March 2015

3. Shapiro, L. G. & Stockman, G. C.: "Computer Vision", pages 137, 150. Prentice Hall, 2001 Retrieved March 2015

4. R.A. Haddad and A.N. Akansu, "A Class of Fast Gaussian Binomial Filters for Speech and Image Processing," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 39, pp. 723-727, March 1991 Retrieved March 2015

5. http://hyperphysics.phy-astr.gsu.edu/hbase/math/gaufcn.html#c1 Retrieved March 2015

6. http://users.ecs.soton.ac.uk/msn/book/new_demo/gaussian/ Retrieved March 2015

7. http://www.dsp.toronto.edu/~kostas/Publications2008/pub/bookchapters/bch5.pdf Retrieved March 2015

8. Filtering in the Time and Frequency Domains by Herman J. Blinchikoff and Anatol I. Zverev Retrieved March 2015

9. http://en.wikipedia.org/wiki/Image_editing#Sharpening_and_softening_images Retrieved March 2015

10. "Virtual Art: From Illusion to Immersion," MIT Press 2002; Cambridge, Mass Retrieved March 2015

11. http://en.wikipedia.org/wiki/Gaussian_blur Retrieved March 2015

12. Mark S. Nixon and Alberto S. Aguado. Feature Extraction and Image Processing. Academic Press, 2008, p. 88 Retrieved March 2015


13. Erik Reinhard. High dynamic range imaging: Acquisition, Display, and Image-Based Lighting. Morgan Kaufmann, 2006, pp. 233–234. Retrieved March 2015

14. http://www.mathworks.com/products/matlab/ Retrieved March 2015

15. Fisher, Perkins, Walker & Wolfart (2003). "Spatial Filters - Laplacian of Gaussian" Retrieved March 2015


APPENDIX A: FPGA SUMMARY REPORTS

Figure 51: HDL Synthesis Report.


Figure 52: Device Utilization summary report.


Figure 53: Design summary report.


Figure 54: Report showing Finite State Machine analysis.

Figure 55: Total time taken and total memory used.


Figure 56: Overview of design summary report - part 1


Figure 57: Overview of design summary report - part 2


Figure 58: Map report showing device info.


APPENDIX B: SOURCE CODE

VHDL code for Gaussian Filter module:

------

-- Company : California State University Northridge.

-- Engineer : Sachin Eswar

-- Student ID : 104958241

-- Create Date : 15:16:24 03/07/2015

-- Design Name : Gaussian Filter

-- Module Name : Gaussian - Behavioral

-- Project Name : NOISE REDUCTION AND IMAGE SMOOTHING USING GAUSSIAN BLUR

------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity gaussian is

Port ( clk : in STD_LOGIC;

reset : in STD_LOGIC;

ima_hex_out : out STD_LOGIC_VECTOR (15 downto 0)

);
end gaussian;


architecture Behavioral of gaussian is

signal ima_hex_0 : std_logic_vector (7 downto 0);
signal ker_hex_0 : std_logic_vector (7 downto 0);
signal ima_hex_add : std_logic_vector (15 downto 0) := (others => '0');
signal ker_hex_add : std_logic_vector (3 downto 0) := (others => '0');
signal output_addr : std_logic_vector (15 downto 0) := (others => '0');
signal result, result_1, output_fifo : std_logic_vector (15 downto 0);
signal ima_hex : STD_LOGIC_VECTOR (7 downto 0);
signal ker_hex : STD_LOGIC_VECTOR (7 downto 0);
signal temp_out : std_logic_vector (15 downto 0);
signal ram_out : std_logic_vector (17 downto 0);
signal output_conv : std_logic_vector (15 downto 0);
signal Mul1, Mul2, Mul3, Mul4, Mul5, Mul6, Mul7, Mul8, Mul9 : STD_LOGIC_VECTOR (15 downto 0);
signal a : integer range 0 to 65535;
signal b : integer range 0 to 16;

TYPE array_8 is array (8 downto 0) of STD_LOGIC_VECTOR(7 downto 0);
signal image_array, kernel_array : array_8;

TYPE States IS (set_address, read_input, computation, store_output, complete);
signal current_state : States;
signal compute : std_logic := '0';
signal wea_out : std_logic_vector (0 downto 0) := (others => '0');

COMPONENT b_ram

PORT (

clka : IN STD_LOGIC;

wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);

addra : IN STD_LOGIC_VECTOR(15 DOWNTO 0);

dina : IN STD_LOGIC_VECTOR(7 DOWNTO 0);

douta : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)

);

END COMPONENT;

COMPONENT fifo_out

PORT (

clk : IN STD_LOGIC;

rst : IN STD_LOGIC;

din : IN STD_LOGIC_VECTOR(15 DOWNTO 0);

wr_en : IN STD_LOGIC;

rd_en : IN STD_LOGIC;

dout : OUT STD_LOGIC_VECTOR(15 DOWNTO 0);

full : OUT STD_LOGIC;

empty : OUT STD_LOGIC

);


END COMPONENT;

COMPONENT bram_k

PORT (

clka : IN STD_LOGIC;

wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);

addra : IN STD_LOGIC_VECTOR(3 DOWNTO 0);

dina : IN STD_LOGIC_VECTOR(7 DOWNTO 0);

douta : OUT STD_LOGIC_VECTOR(7 DOWNTO 0)

);

END COMPONENT;

COMPONENT BRAM_OUT IS

PORT (

clka : IN STD_LOGIC;

wea : IN STD_LOGIC_VECTOR(0 DOWNTO 0);

addra : IN STD_LOGIC_VECTOR(15 DOWNTO 0);

dina : IN STD_LOGIC_VECTOR(15 DOWNTO 0);

douta : OUT STD_LOGIC_VECTOR(15 DOWNTO 0)

);

END COMPONENT;

begin

Bram_image : b_ram

PORT MAP (

clka => clk,


wea => "0",

addra => ima_hex_add,

dina => ima_hex,

douta => ima_hex_0

);

File_fifo : fifo_out

PORT MAP (

clk => clk,

rst => '0',

din => output_fifo,

wr_en => '1',

rd_en => '1',

dout => ima_hex_out,

full => open,

empty => open

);

Bram_kernel : bram_k

PORT MAP (

clka => clk,

wea => "0",

addra => ker_hex_add,

dina => ker_hex,

douta => ker_hex_0


);

Bram_output : BRAM_OUT

PORT MAP (

clka => clk,

wea => wea_out,

addra => output_addr,

dina => result,

douta => result_1

);

--- State machine ---
process(clk, reset)

variable row_end_count, address_count_image, address_count_kernel, address_count_output : integer := 0;
variable count, pixel_count : integer range 0 to 9 := 0;
variable row_end, done : std_logic := '0';

begin

if (reset = '0') then

if (rising_edge (clk)) then

if (done = '0') then

-- State machine description.
case current_state IS


when set_address =>

wea_out <= "0";

if (pixel_count = 0) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 1) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+1),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 2) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+2),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;


elsif (pixel_count = 3) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+94),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

address_count_kernel := address_count_kernel+1;

current_state <= read_input;

elsif (pixel_count = 4) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+95),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 5) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+96),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 6) then


ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+188),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 7) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+189),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 8) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+190),16));

ker_hex_add <= std_logic_vector(to_unsigned((pixel_count),4));

current_state <= read_input;

elsif (pixel_count = 9) then

ima_hex_add <= std_logic_vector(to_unsigned((address_count_image+1),16));

ker_hex_add <= std_logic_vector(to_unsigned((0),4));


current_state <= read_input;

end if;

when read_input =>

if (pixel_count = 0) then

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 1) then

image_array(0) <= ima_hex_0;

kernel_array(0) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 2) then

image_array(1) <= ima_hex_0;

kernel_array(1) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 3) then

image_array(2) <= ima_hex_0;

kernel_array(2) <= ker_hex_0;


pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 4) then

image_array(3) <= ima_hex_0;

kernel_array(3) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 5) then

image_array(4) <= ima_hex_0;

kernel_array(4) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 6) then

image_array(5) <= ima_hex_0;

kernel_array(5) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 7) then

image_array(6) <= ima_hex_0;


kernel_array(6) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 8) then

image_array(7) <= ima_hex_0;

kernel_array(7) <= ker_hex_0;

pixel_count := pixel_count+1;

current_state <= set_address;

elsif (pixel_count = 9) then

image_array(8) <= ima_hex_0;

kernel_array(8) <= ker_hex_0;

pixel_count := 0;

current_state <= computation;

end if;

when computation =>

wea_out <= "0";

compute <= '1';

result <= output_conv;


current_state <= store_output;

when store_output =>

output_addr <= std_logic_vector(to_unsigned((address_count_output),16));

address_count_output := address_count_output+1;

compute <= '0';

current_state <= complete;

when complete =>

wea_out <= "1";

if(address_count_image >= 8835) then

done := '1';

address_count_output :=0;

else

if (row_end_count = 91) then

address_count_image := address_count_image+3;

row_end_count := 0;


done := '0';

current_state <= set_address;

else

address_count_image := address_count_image+1;

row_end_count := row_end_count + 1;

done := '0';

current_state <= set_address;

end if;

end if;

end case;

else

-- When done is 1, convolution is completed. Write the output to memory.--

wea_out <= "0";

output_addr <= std_logic_vector(to_unsigned((address_count_output),16));

output_fifo <= result_1;

address_count_output := address_count_output+1;


end if;
end if;
end if;
end process;

-- Process block to perform convolution. --

process (compute)
begin
if (compute = '1') then

Mul1 <= image_array(0)*kernel_array(0);

Mul2 <= image_array(1)*kernel_array(1);

Mul3 <= image_array(2)*kernel_array(2);

Mul4 <= image_array(3)*kernel_array(3);

Mul5 <= image_array(4)*kernel_array(4);

Mul6 <= image_array(5)*kernel_array(5);

Mul7 <= image_array(6)*kernel_array(6);

Mul8 <= image_array(7)*kernel_array(7);

Mul9 <= image_array(8)*kernel_array(8);

output_conv <= Mul1+Mul2+Mul3+Mul4+Mul5+Mul6+Mul7+Mul8+Mul9;

end if;
end process;

end Behavioral;

VHDL test bench for Gaussian Filter module:

------

-- Company : California State University Northridge.

-- Engineer : Sachin Eswar

-- Student ID : 104958241

-- Create Date : 18:23:02 03/08/2015

-- Design Name : Gaussian Filter

-- Module Name : Test Bench

-- Project Name : NOISE REDUCTION AND IMAGE SMOOTHING USING GAUSSIAN BLUR

-- VHDL Test Bench Created by ISE for module: gaussian

------

LIBRARY ieee;

USE ieee.std_logic_1164.ALL;

USE ieee.numeric_std.ALL;

USE ieee.std_logic_unsigned.ALL;

USE std.textio.ALL;

USE ieee.std_logic_textio.ALL;


ENTITY testbench IS

END testbench;

ARCHITECTURE behavior OF testbench IS

-- Component Declaration for the Unit Under Test (UUT)

COMPONENT image

PORT(

clk : IN std_logic;

reset : IN std_logic;

ima_hex_out : OUT std_logic_vector(15 downto 0)

);

END COMPONENT;

--Inputs

signal clk : std_logic := '0';

signal reset : std_logic := '0';

--Outputs

signal ima_hex_out : std_logic_vector(15 downto 0);


-- Clock period definitions

constant clk_period : time := 10 ns;

BEGIN

-- Instantiate the Unit Under Test (UUT)

uut: image PORT MAP (

clk => clk,

reset => reset,

ima_hex_out => ima_hex_out

);

-- Clock process definitions

clk_process :process

begin

clk <= '0';

wait for clk_period/2;

clk <= '1';

wait for clk_period/2;

end process;

-- Stimulus process

stim_proc: process

begin


reset <= '1';

wait for 10 ns;

reset <= '0';

-- hold reset state for 100 ns.

wait for 100 ns;

wait for clk_period*10;

-- insert stimulus here

wait;

end process;

-- process block to write the output to a text file. --

process(clk)

file file_out : text is out "C:\temp\output.txt";

variable line_out : line;

variable output_tmp : integer range 0 to 65535;

begin

if (rising_edge (clk)) then

output_tmp := to_integer(unsigned (ima_hex_out));

write(line_out, output_tmp);

writeline(file_out , line_out);

end if;

end process;

END;
