DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE

STOCKHOLM, SWEDEN 2019

FPGA-Based Lane Tracking System for Autonomous Vehicles

ROHITH RAJ RAM PRAKASH

KTH ROYAL INSTITUTE OF TECHNOLOGY
ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Abstract

The application of image processing to autonomous driving has drawn significant attention recently. However, the demanding nature of image processing algorithms imposes a considerable burden on any conventional real-time implementation. On the other hand, the emergence of FPGAs has brought numerous facilities for fast prototyping and implementation of ASICs, so that an image processing algorithm can be designed, tested and synthesized in a relatively short period compared to traditional approaches. This thesis investigates the best combination of current algorithms to reach an optimal solution to the problem of lane detection and tracking, while aiming to fit the design to a minimal system. The proposed structure realizes three algorithms, namely an Edge Detector, the Hough Transform, and a Kalman filter. For each module, the theoretical background is investigated and a detailed description of the realization is given, followed by an analysis of both the achievements and the shortcomings of the design. The thesis concludes by describing the advantages of implementing this architecture and the use of these kinds of systems.

Keywords

Autonomous drive, Image processing, FPGA, Sobel Edge detector, Hough Transform.

Sammanfattning

Tillämpningen av bildbehandling inom autonoma fordon har fått stor uppmärksamhet den senaste tiden. Emellertid förmedlar den krävande karaktären hos bildbehandlingsalgoritmerna en stor belastning på vilken konventionell realtidsimplementering som helst. Å andra sidan har framväxten av FPGAer medfört många möjligheter till snabb prototypering och implementering av ASICar så att en bildbehandlingsalgoritm kan utformas, testas och syntetiseras på relativt kort tid jämfört med traditionella tillvägagångssätt. Denna avhandling undersöker den bästa kombinationen av nuvarande algoritmer för att uppnå en optimal lösning på problemet med spårning och fildetektering, med målet att krympa designen till ett minimalt system. Den föreslagna strukturen realiserar tre algoritmer, nämligen Edge Detector, Hough Transform och Kalman filter. För varje modul undersöks den teoretiska bakgrunden och en detaljerad beskrivning av realiseringen ges följd av en analys av både fördelar och brister i konstruktionen. Avhandlingen avslutas med en beskrivning av fördelarna med att implementera lösningen på det sätt den görs och hur dessa system kan användas.

Nyckelord

Autonoma enheter, Bildbehandling, FPGA, Sobel Edge Detector, Hough Transform.

Acknowledgements

First of all, I would like to thank my industrial supervisors Gunnar Stjernberg and Adrian Sparrenborn at Synective Labs AB, who were greatly involved in all the discussions and decisions related to the specifications and scope of the project. Next, I would like to extend a note of thanks to Synective Labs AB for allowing me to work on such an interesting and challenging project.

I would like to thank my examiner Prof. Johnny Öberg for the constant feedback and patience. And last but not least, I would like to gratefully express my thanks and love to my family and friends who have always been supporting me unconditionally.

Contents

List of Figures vi

List of Tables 1

1 Introduction 2
  1.1 Background 2
  1.2 Thesis Goal 2
  1.3 Outline 2

2 Related Work 4
  2.1 Role of FPGA in Autonomous systems 4
  2.2 Literature Survey on the Conventional Design Methodologies on FPGAs for image processing 4
  2.3 Detection of lines 5
  2.4 Detection of circles 6
  2.5 Active driver assistance system 7
  2.6 Sobel Edge Detection System Design and Integration on an FPGA 8

3 Methodology 10
  3.1 Edge Detection 10
    3.1.1 Theory 10
    3.1.2 Matlab Simulation 11
  3.2 Lane Detection 12
    3.2.1 Theory 12
    3.2.2 Implementation 13
  3.3 MATLAB Simulation 14

4 Architecture 17
  4.1 Hardware Architecture of the Line Detection System 17
    4.1.1 Camera Unit 17
    4.1.2 Edge Extraction Unit 18
    4.1.3 Hough Transform Unit 19
    4.1.4 Line Identification Unit 21
  4.2 System Architecture in FPGA 24

    4.2.1 Camera Pass-through Design 24
    4.2.2 Sobel Edge Detector Design 35
    4.2.3 Hough Transform Design 36

5 Result and Analysis 37
  5.1 Hardware Specification 38
  5.2 Hardware test and Analysis 39
    5.2.1 Edge detection test 39
    5.2.2 Hough Transform Test 41
    5.2.3 Resource Usage 43
    5.2.4 Prototype 44

6 Conclusions 45
  6.1 Conclusion 45
  6.2 Future work 46
    6.2.1 Insights and suggestions for further work 46
    6.2.2 Cost analysis 46

References 47

A MATLAB codes 51
  A.1 MATLAB code for Sobel, Canny, Roberts and Prewitt filters 51
  A.2 MATLAB code for Hough Transform 52

List of Figures

3.1 Input image to the filter 11
3.2 Sobel filter 12
3.3 Canny filter 12
3.4 Roberts filter 12
3.5 Prewitt filter 12
3.6 Hough transform lines 13
3.7 Sobel filter output / Hough Transform input 15
3.8 Hough transform space 16
3.9 Hough transform lines 16
4.1 Configuration of Line Detection System 17
4.2 FMC-IMAGEON Hardware – Connectivity Diagram [21] 18
4.3 FMC-IMAGEON Hardware – Block Diagram [21] 18
4.4 Configuration of Hough Transform Unit 20
4.5 Configuration of Peak Detector 21
4.6 The Memory Controller 23
4.7 The architecture of Camera Pass-through Block Design 24
4.8 Channel Architecture of Reads [26] 25
4.9 Channel Architecture of Writes [26] 25
4.10 Block Diagram of Video In to AXI4-Stream Core with the Video Timing Controller [27] 27
4.11 RGB and CMY Bayer CFA Patterns [28] 28
4.12 Block Diagram of Color Filter Array Interpolation [28] 29
4.13 AXI Interconnect Core Diagram [29] 30
4.14 AXI IIC Bus Interface Top-Level Block Diagram [30] 31
4.15 AXI4-Stream to Video Out core with the Video Timing Controller [34] 34
4.16 The architecture of Sobel Edge filter Design 35
4.17 The architecture of the Hough Transform Design 36
5.1 Zedboard and FMC camera setup with JTAG programmer 38
5.2 Implemented Sobel Edge detector in FPGA 39
5.3 Camera pass through Implemented 39

5.4 Sobel filter implemented 40
5.5 Implemented random XY lines in FPGA 41
5.6 Implemented random XY lines with Video stream in FPGA 41
5.7 Implemented Hough Transform in FPGA 42
5.8 Implemented Hough Transform in MATLAB 42
5.9 Prototype of miniature scale Autonomous car 44

List of Tables

3.1 Filter masks 10
5.1 Design resource summary 43

1 Introduction

1.1 Background

Nowadays, image processing is in demand in a wide range of industrial and day-to-day applications. Image processing units are embedded in many devices, ranging from mobile phones to autonomous cars on the streets. However, the processing speed of some complex algorithms puts a barrier on real-world implementations. FPGAs, on the other hand, are more flexible and handle these complex algorithms better in a real-time environment, which makes them a good choice for autonomous driving. Most autonomous cars today use GPUs as the brain for image processing, and this thesis describes a real-world implementation of FPGAs for autonomous cars.

1.2 Thesis Goal

This thesis project aims to design and implement a miniature autonomous car that can detect and track lanes and move around in a controlled environment. The challenging task is extracting the required information from a video stream, which is performed by pre-processing extraction algorithms. There are external disturbances in the environment in which the experiment is conducted, such as lighting and uneven lane markings, which must be kept in mind when deciding on the optimal algorithm and designing the controller for this platform. Hence, it must be decided which controller design has minimal resource utilization and power consumption.

1.3 Outline

• chapter 2 conveys the results of the survey conducted on the conventional approaches already practiced by experts for implementing lane detection on an FPGA.

• chapter 3 explains the theory behind the edge detection and steerable filters used for autonomous driving. Then the advantages and disadvantages brought to the design by applying this approach are explained. Finally, the MATLAB simulations of these filters are plotted and explained.

• chapter 4 explains the architecture of the design implemented in the FPGA. All the blocks are explained in detail with regard to their function and how they are connected to each other.

• chapter 5 describes the implementation of lane detection in the FPGA. All the electronics used for the implementation are explained, together with the hardware implementation challenges, followed by an analysis of the performance of the implemented hardware.

• chapter 6 summarizes the implemented design and its achievements. This chapter also describes both the advantages and the shortcomings of the design, and gives suggestions for future work.

2 Related Work

The scope of FPGA-based systems is growing day by day, covering new fields of engineering. Since this thesis aims to design and implement an FPGA system for autonomous driving, a literature survey was conducted on previous studies related to this work. A summary of the different approaches is given in this chapter. The lane detection and tracking algorithms are discussed in detail in later chapters.

2.1 Role of FPGA in Autonomous systems

Every year the lives of more than 1.25 million people are cut short as a result of road traffic crashes. 90% of the world’s fatalities on the roads occur in low and middle-income countries, even though these countries only have approximately 54% of the world’s vehicles.

Currently, driverless technology is being tested around the world by companies like Tesla, Waymo, Uber and Intel, with the aim of bringing it to the market by 2020 [1]. By using FPGA-based systems for autonomous cars, the aim is to decrease the cost of driverless technology and to make its controllers energy efficient. The advantages of using FPGAs in self-driving are their energy-efficient computing and their parallel processing architecture. Image processing applications can be accelerated to a great extent using FPGAs, and such systems are therefore faster and more efficient in safety-critical real-time environments.

The camera mounted at the front of the car captures several images per second of the lane ahead. The camera is interfaced with the FPGA, which processes the image data.

2.2 Literature Survey on the Conventional Design Methodologies on FPGAs for image processing

In [2], the approach was to split the Region of Interest (ROI) into multiple ROIs, then apply the Hough transform, and finally apply an image reconstruction algorithm, which increased the recognition rate by 0.6%. However, it showed poor recognition at sharp curves, which requires improvement of the algorithm.

In [3], Elhossini et al. used a Hough transform approach and limited the region of interest by removing unnecessary areas without removing road lines. Instead of checking the entire 360 degrees during the Hough transform, the angle range is limited in the voting process, which in turn reduces the complexity and leads to faster real-time detection.

2.3 Detection of lines

Several researchers [4] suggested using the CORDIC algorithm, which computes trigonometric functions such as sine and cosine. In [4] the Multi-Sector Algorithm (MSA) is used together with the CORDIC algorithm. This architecture is fully pipelined and can achieve a throughput of 30 frames/second on 8-bit gray level images of size 256 × 256. Distributed arithmetic is used in [5] to implement the original HT on an FPGA. The design was implemented on a Virtex II (XC2V2000) FPGA. It took between 0.1 ms and 1.2 ms to process 256 × 256 images depending on the ratio of edge pixels in the image (from 2% to 50%). Upon synthesis, the FPGA utilizes 271 Configurable Logic Blocks and 1210 four-input Look Up Tables (LUTs), and the maximum clock frequency achieved is 233 MHz. The design is not pipelined and, as such, its processing time is variable and relatively long. In [6] the parameter space is partitioned to decrease the memory requirements. The design can detect each line along with its start and end points, and the lines are then sorted by their length. The design is implemented on an XC4VLX200 FPGA. The main advantage of this design is the detection of finite lines, which enables a better representation of the features in the image. The design consumed 15% of the 89,088 slices in the FPGA, with a maximum clock frequency of 67 MHz. It can detect up to 256 lines in an 8-bit image of size 800×600 and it can process up to 149 images per second.

2.4 Detection of circles

To detect circles, the original HT was modified in several ways. The architecture presented in [7] detects circles based on using any 3 non-collinear pixels to form a circle. It processes gray level (8-bit) images of size 256×256 and targets an Altera Stratix 1S25 FPGA chip, which has more than 1 Mb of embedded RAM. This design is not pipelined and takes around 4.5 seconds to process a single image. A hardware/software co-design implementation of traffic sign recognition is presented in [8]. A soft-core processor, Nios II, is implemented on an Altera FPGA chip using a Cyclone II DE2 board. A traffic sign detection algorithm that uses the HT to detect circles was coded in ANSI-C for the Nios II processor. The system processes color images with a resolution of 320×240 in over 3 seconds, with the HT consuming 2.3 seconds. An architecture for the implementation of the generalized HT on FPGA was introduced in [9]. The generalized HT is a modification to detect any shape with different scales and orientations. However, in [9] it is used to find the similarity between two video frames. One of the frames is used as a template to be found in the image. The processed images are scaled down to a size of 44 × 36, which is sufficient for a similarity check. As the calculation of the HT depends on the edge gradients, a large part of the architecture is designed to ensure the accuracy of the edge point calculations and thereby the accuracy of the detection process. Because of the small frame size, the architecture achieves a high frame rate.

A hardware implementation on both FPGA and ASIC is presented in [10] for a driving assistance system. The HT is implemented as the basic component to detect lines and circles, which are used to detect road lanes and traffic signs. The HT employs a window approach to detect peaks for objects in the image. A common feature of all architectures reviewed in this section is the use of external memory to store the HT data, which leads to operating on relatively small frame sizes. These limitations narrow the range of applications that can benefit from implementing the HT on FPGAs. In this thesis, both of these issues are addressed using a memory architecture that eliminates the need for external memory while processing large images at a high frame rate. In [11], El et al. proposed a methodology that has two parts: a Sobel operator with the Hough transform, and a Kalman filter for lane tracking. To meet real-time requirements, they used the gradient directions from the edge stage to simplify the CORDIC algorithm. The obtained results show that the proposed algorithm can reliably detect and track lanes in different illumination conditions.

2.5 Active driver assistance system

Active driver assistance systems (ADAS) are presented today as an efficient way to increase road safety. Recent trends and developments in several technologies (micro-sensors, embedded electronics) and in computer vision play an important role in the development of these systems, increasing the ability to perform intensive embedded calculations with low power, more intelligence and low cost. Several examples in the literature [12][13] report applications such as lane departure prevention, detection and recognition of road signs, pedestrian detection, parking aid, automatic cruise control, automatic switching of beams, collision risk warning, etc. One main problem of the road scene is the visibility and its dependence on the illumination conditions: sunny, foggy, rainy, cloudy, etc. In such conditions, the camera can deliver noisy frames that may generate a false alarm to the driver or misinterpret a dangerous situation. As a first approach [14], ADAS have to adhere to at least the three following conditions: 1) Real-time operation is the most important constraint for embedded functions, since ADAS are going to be integrated into more complex intelligent vehicle systems that consume a lot of computing resources; 2) Results have to be accurate, especially for the segmentation function, which is the first processing block; any false detection will affect the final result and make it erroneous, and segmentation has to be done independently of the illumination of the road scene; and 3) The algorithm has to be able to cope with different illumination scenarios such as sunny, cloudy and rainy. In [15], Hwang et al. use a camera for lane tracking and apply several optimization techniques, such as a Modified Lane Detection Algorithm that eliminates floating-point operations by simplifying them to the closest approximation. Even though the proposed operation reduces the quality of the image, it preserves the edge patterns, which are useful for lane tracking.

2.6 Sobel Edge Detection System Design and Integration on an FPGA

The reliability of an image processing system can be gauged by two major factors. The first is the quality of the image transmitted from the external environment, since cameras replace human sight, especially in assistance systems. The second is the processing precision and speed. A single missed detection can cause problems in the processing and can pose a risk at the output. For this reason, using a high-definition streaming architecture to integrate pixel processing applications has become a viable way to improve the decision making of embedded image processing systems. However, moving from standard definition (SD) to high definition (HD) represents roughly a six-fold increase in the data that needs to be transmitted, and processing this data in real time with precision requires intensive computing resources. Software-based CPU, DSP and GPU architectures cannot satisfy the real-time, power consumption and cost constraints. These software architectures cannot achieve the expected performance when working on 1920 x 1080 HD video, which needs a minimum throughput of 60 frames per second.

Edge detection is the initial element in image analysis. It has an important role in many applications, such as assistance for visually impaired people and ADAS, and it provides information as a precursor step for object recognition, segmentation and feature extraction. Edge detection operators include Sobel, Prewitt, Canny, Laplacian and Roberts [16]. These operators detect the edges in the image by calculating the gradient in different directions and require a large amount of computation power. Therefore, a hardware implementation can solve this problem, offering a higher throughput than software solutions [17]. With the new high-resolution streaming architectures, edge detection operators must handle up to 3 Gbit of input data per second. For example, a high definition 1080p (1920x1080) video stream at 60 fps corresponds to about 2.1 megapixels per frame and 126 megapixels per second, resulting in a data throughput of about 3 Gigabit per second using 24 bits per pixel.

Recently, Zynq-based FPGAs have become more attractive to image processing system developers, due to their parallelism and high computational power and density compared to software-based processors.
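As a check on the figures quoted above, the required raw data rate for 1080p at 60 frames per second with 24 bits per pixel (ignoring blanking intervals) works out as:

1920 x 1080 pixels/frame x 24 bits/pixel x 60 frames/s ≈ 2.99 Gbit/s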

In our Sobel edge detection application, a Zynq-7000 based Xilinx SoC, which combines an FPGA fabric with a dual-core ARM Cortex-A9 hard processor on a single chip, can achieve the required processing power. Our custom IP core is designed to be interfaced with the ARM processor on the Zynq chip through a protocol called AXI [18]. It can be validated with reduced board complexity and greater flexibility.

3 Methodology

3.1 Edge Detection

3.1.1 Theory

As a formal definition, any step discontinuity is regarded as an edge. Hence, traditional approaches of edge detection are simply a process of finding the local maxima in the first derivative or the zero crossings in the second derivative by convolving the image by some form of linear filter that approximates either first or second derivatives. While an odd symmetric function can approximate the first derivative, the second derivative is approximated by an even symmetric function.

In fact, in the discrete domain, the gradient of the image can simply be calculated by taking differences of the gray values in the image. This procedure is equivalent to convolving the image with one of the masks from Table 3.1. The obvious disadvantage of this simplification is that it is not clear with which pixel the result should be associated. There are various approaches to this issue, among which the following filter masks offer a first-derivative solution.

Operator    ∂I/∂x                   ∂I/∂y

Roberts     [ 0  1 ]                [ 1  0 ]
            [-1  0 ]                [ 0 -1 ]

Prewitt     [-1  0  1]              [ 1  1  1]
            [-1  0  1]              [ 0  0  0]
            [-1  0  1]              [-1 -1 -1]

Sobel       [-1  0  1]              [ 1  2  1]
            [-2  0  2]              [ 0  0  0]
            [-1  0  1]              [-1 -2 -1]

Table 3.1: Filter masks

To obtain the gradient map of the frame, in each case the magnitude of the gradient is calculated by equation (1)

√( (∂I/∂x)² + (∂I/∂y)² )    (1)

The main concern in derivative-based edge detection is the effect of noise, since local maxima due to white noise can mask the real gradient maxima due to an edge. That is why it is necessary to convolve the image with a smoothing function, e.g. a Gaussian, so that the effect of the white noise is minimized. Both of the mentioned operators, i.e. the gradient and the Gaussian, are linear, and it is therefore computationally more efficient to combine them.
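Because the Gaussian and the gradient masks are both linear operators, the smoothing and the differentiation can be folded into a single kernel before convolving the frame. A minimal MATLAB sketch of this idea is given below; the kernel size, sigma and the input file name are illustrative assumptions and not values taken from the final design.

% Combine Gaussian smoothing with the Sobel gradient into a single mask per direction
I  = im2double(rgb2gray(imread('road.png')));   % hypothetical input image
g  = fspecial('gaussian', 5, 1.0);              % 5x5 Gaussian, sigma = 1
sx = [-1 0 1; -2 0 2; -1 0 1];                  % Sobel d/dx mask (Table 3.1)
sy = [ 1 2 1;  0 0 0; -1 -2 -1];                % Sobel d/dy mask (Table 3.1)

kx = conv2(g, sx);                              % merged derivative-of-Gaussian masks
ky = conv2(g, sy);

Gx = conv2(I, kx, 'same');
Gy = conv2(I, ky, 'same');
G  = sqrt(Gx.^2 + Gy.^2);                       % gradient magnitude, equation (1)
imshow(G, []);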

3.1.2 Matlab Simulation

The image processing algorithms are simulated using Matlab. Different edge detection filters were tested in Matlab: Sobel, Canny, Prewitt and Roberts. The test image, figure 3.1, was a simple straight road, which was the input to the filters. Figures 3.2, 3.3, 3.4 and 3.5 show the Sobel, Canny, Roberts and Prewitt filter outputs, respectively. The Matlab code for all these filters can be seen in Appendix A.1.
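The Appendix A.1 code is not reproduced here, but a sketch of the comparison along the same lines is shown below; the file name is a placeholder for the test image in figure 3.1.

% Compare the four edge detection filters on the test image
I = rgb2gray(imread('straight_road.png'));
figure; imshow(edge(I, 'sobel'));   title('Sobel');
figure; imshow(edge(I, 'canny'));   title('Canny');
figure; imshow(edge(I, 'roberts')); title('Roberts');
figure; imshow(edge(I, 'prewitt')); title('Prewitt');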

Figure 3.1: Input image to the filter

Figure 3.2: Sobel filter Figure 3.3: Canny filter

Figure 3.4: Roberts filter Figure 3.5: Prewitt filter

3.2 Lane Detection

3.2.1 Theory

The detection of lanes can be done in many ways, and a long-used approach is the Hough Transform [19]. The Hough transform is a feature extraction technique used in image analysis, computer vision, and digital image processing. The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure [20].

The simplest case of the Hough transform is detecting straight lines. In general, the straight line y = mx + b can be represented as a point (b, m) in the parameter space. However, vertical lines pose a problem. They would give rise to unbounded values of the slope parameter m. Thus, for computational reasons, we use the Hesse normal form:

r = x cos θ + y sin θ    (2)

where r is the distance from the origin to the closest point on the straight line, and θ (theta) is the angle between the x-axis and the line connecting the origin with that closest point. It is, therefore, possible to associate with each line of the image a pair (r, θ). The (r, θ) plane is sometimes referred to as the Hough space for the set of straight lines in two dimensions. This representation makes the Hough transform conceptually very close to the two-dimensional Radon transform.

Figure 3.6: Hough transform lines

Given a single point in the plane, then the set of all straight lines going through that point corresponds to a sinusoidal curve in the (r,θ) plane, which is unique to that point. A set of two or more points that form a straight line will produce sinusoids which cross at the (r,θ) for that line. Thus, the problem of detecting collinear points can be converted to the problem of finding concurrent curves.

3.2.2 Implementation

The linear Hough transform algorithm uses a two-dimensional array, called an accumulator, to detect the existence of a line described by the equation below

r = x cos θ + y sin θ (3)

The dimension of the accumulator equals the number of unknown parameters, i.e. two for the quantized pair (r, θ). For each pixel at (x, y) and its neighborhood, the Hough transform algorithm determines if there is enough evidence of a straight line at that pixel. If so, it calculates the parameters (r, θ) of that line, looks up the accumulator bin that the parameters fall into, and increments the value of that bin. By finding the bins with the highest values, typically by looking for local maxima in the accumulator space, the most likely lines can be extracted and their (approximate) geometric definitions read off. The simplest way of finding these peaks is by applying some form of threshold, but other techniques may yield better results under different circumstances, determining which lines are found as well as how many. Since the lines returned do not contain any length information, it is often necessary in a next step to find which parts of the image match up with which lines. Moreover, due to imperfections in the edge detection step, there will usually be errors in the accumulator space, which may make it non-trivial to find the appropriate peaks, and thus the appropriate lines [20].

The final result of the linear Hough transform is a two-dimensional array (matrix) similar to the accumulator—one dimension of this matrix is the quantized angle θ and the other dimension is the quantized distance r. Each element of the matrix has a value equal to the sum of the points or pixels that are positioned on the line represented by quantized parameters (r, θ). So the element with the highest value indicates the straight line that is most represented in the input image.
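A compact MATLAB model of the voting procedure described above is sketched below. E is assumed to be a binary edge image, and the one-degree theta step and the relative peak threshold are illustrative choices rather than values from the implemented design.

% Hough accumulator voting over a binary edge image E
[rows, cols] = size(E);
thetas = deg2rad(-90:1:89);                  % quantized theta values
rmax   = ceil(hypot(rows, cols));
acc    = zeros(2*rmax + 1, numel(thetas));   % accumulator over (r, theta)

[ys, xs] = find(E);                          % coordinates of edge pixels
for k = 1:numel(xs)
    r   = round(xs(k) .* cos(thetas) + ys(k) .* sin(thetas));   % equation (3)
    idx = sub2ind(size(acc), r + rmax + 1, 1:numel(thetas));
    acc(idx) = acc(idx) + 1;                 % one vote per (r, theta) bin
end

% Simple peak extraction by thresholding the accumulator
[rIdx, tIdx] = find(acc > 0.6 * max(acc(:)));
rPeaks     = rIdx - rmax - 1;
thetaPeaks = thetas(tIdx);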

3.3 MATLAB Simulation

The input for the Hough transform is an edge detected image as shown in figure 3.7. Here a Sobel filter is used for the edge detection process.

As was discussed in section 3.2.2, the Hough transform space is calculated using equation 3. It is plotted as shown in figure 3.8 and the final image with Hough lines is shown in the figure 3.9. The Matlab code for the Hough transform can be found in Appendix A.2
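For reference, a sketch in the spirit of the Appendix A.2 simulation, using the Image Processing Toolbox functions, is shown below; the input file name is a placeholder for the road image.

% Hough transform simulation using the MATLAB toolbox functions
BW = edge(rgb2gray(imread('straight_road.png')), 'sobel');
[H, theta, rho] = hough(BW);               % Hough transform space (figure 3.8)
peaks = houghpeaks(H, 4);                  % strongest accumulator bins
lines = houghlines(BW, theta, rho, peaks); % extracted line segments (figure 3.9)

imshow(BW); hold on;
for k = 1:numel(lines)
    xy = [lines(k).point1; lines(k).point2];
    plot(xy(:,1), xy(:,2), 'LineWidth', 2, 'Color', 'green');
end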

Figure 3.7: Sobel filter output / Hough Transform input

The Hough Transform is a very demanding algorithm in terms of computational complexity. However, in most cases either the search space is limited to certain parts of the image, or several measures are combined to reduce the number of calculations. A second approach divides the image into a number of tiles and searches each tile for the lane passing through the center of mass of that tile. A third approach that is widely used in the literature aims to find the lanes present in the image by interpolating a variety of Splines to the interest points in the image. Among the investigated Splines, B-Spline snakes were found to be the most accurate and popular. Finally, statistical approaches have drawn a lot of attention in the literature. The last two approaches are very expensive in terms of computational demands and have never been the subject of a hardware implementation. The first, Hough-transform-based approach is therefore the better choice for hardware implementation.

Figure 3.8: Hough transform space

Figure 3.9: Hough transform lines

4 Architecture

4.1 Hardware Architecture of the Line Detection System

The line detection system has been implemented as a hardware accelerator on an FPGA. The conceptual hardware architecture is shown in Figure 4.1. The system is divided into four units: the Camera Input Unit, the Edge Extraction Unit, the Hough Transform Unit, and the Line Identification Unit. Each module's input and output sequence and internal operation are described in the following subsections.



Figure 4.1: Configuration of Line Detection System

4.1.1 Camera Unit

The camera unit consists of an ON Semiconductor Image sensor and HDMI Input/Output FMC module [21]. The FMC module connects to an FMC carrier and provides HDMI Input, HDMI Output and LCEDI Interface for VITA Image Sensor modules [22].

The block diagrams in figures 4.2 and 4.3 illustrate the connectivity of the FMC module.

Figure 4.2: FMC-IMAGEON Hardware – Connectivity Diagram [21]

Figure 4.3: FMC-IMAGEON Hardware – Block Diagram [21]

4.1.2 Edge Extraction Unit

1. Edge Calculator : The edge calculator extracts an edge image from the raw image using real-time window processing [23]. The raw image pixel and its position are transferred from the camera input unit. The edge calculator can store the local region of a raw image using the window buffer, but the edge calculator does not use a frame buffer. By using the Sobel algorithm, the edge calculator determines whether or not the center pixel of the window buffer is an edge point. If this pixel is an edge, then the edge calculator outputs a ’1’. If this pixel is not an edge point, the edge calculator outputs a ’0’, so the edge calculator converts a 1920*1080*8 bit raw image to a 1920*1080*1 bit

edge image, which is a binary image. This edge image is transferred to the edge position estimator and the memory controller of the line identification unit.

2. Edge Position Estimator : The edge position estimator builds an edge position list from the edge image. The edge image is not directly suitable for the line equation calculation, because the line equation needs x and y coordinates while the edge image contains only binary information. The edge position estimator checks the result of the edge calculator; if a pixel is determined to be an edge, the estimator stores that pixel's position in the edge position list. At the end of the edge image, the edge position estimator writes a final word with all bits set to '1' to the edge position list to indicate the end of the list. Since a word with all bits set to '1' is not a valid position, the system can distinguish between the final word and valid positions. (A behavioural sketch of this edge extraction data flow is given after this list.)

3. Edge Position List : The edge position list stores the position information of every edge point in one edge image. The edge position list maintains two identical lists to prevent conflicts during read and write operations, since a single list cannot be read and written simultaneously: one list is used for reading while the other is being written. By using this double buffering method, a reduction of the frame processing rate is prevented. Moreover, the Hough transform unit can operate independently of the camera clock, so the Hough transform can be processed at a higher speed using a higher frequency clock.
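A behavioural MATLAB sketch of the data flow through these blocks is given below. It is a software model only, not the RTL; the edge threshold and the input file name are assumed values.

% Software model of the edge extraction unit
I  = double(rgb2gray(imread('frame.png')));   % hypothetical 1920x1080 raw frame
sx = [-1 0 1; -2 0 2; -1 0 1];                % Sobel masks (Table 3.1)
sy = [ 1 2 1;  0 0 0; -1 -2 -1];
G  = hypot(conv2(I, sx, 'same'), conv2(I, sy, 'same'));

edgeImg = G > 100;                            % 1-bit edge image (assumed threshold)
[y, x]  = find(edgeImg);                      % edge position estimator output
posList = [x, y];                             % edge position list entries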

4.1.3 Hough Transform Unit

The Hough transform unit is configured as four modules and a controller. The first is the "line equation calculator" that calculates the rho value using the line equation. The second is the "voting memory" that builds the partitioned parameter space using the rho value. The third is the "peak detector" that finds the peak position in the partitioned parameter space. The fourth is the "peak table" that stores the theta and rho values of the detected peak.

The last is the Hough transform controller, which controls these four modules. The peak table transfers information to the line identification unit to identify the exact line position. Figure 4.4 presents the Hough transform unit. The Hough transform unit contains 15 line equation calculator and voting memory pairs, so it can operate 15 times faster. The line equation calculators and voting memories operate in parallel to calculate 15 values of theta simultaneously, and this operation is repeated 21 times to cover all 315 theta values.



Figure 4.4: Configuration of Hough Transform Unit
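The scheduling of the 315 theta values described above can be modelled in software as 21 passes of 15 parallel evaluations. The sketch below illustrates this; the theta grid and the example pixel are assumptions for illustration only.

% Software model of the 15-wide, 21-pass theta scheduling
nParallel = 15;                        % line equation calculator / voting memory pairs
nPasses   = 21;                        % 15 * 21 = 315 theta values in total
thetaAll  = linspace(0, pi, nParallel * nPasses);
x = 640; y = 360;                      % one example edge pixel position

rhoAll = zeros(1, nParallel * nPasses);
for p = 1:nPasses
    cols = (p-1)*nParallel + (1:nParallel);
    % each of the 15 hardware pairs evaluates one theta of this batch per pass
    rhoAll(cols) = x * cos(thetaAll(cols)) + y * sin(thetaAll(cols));
end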

1. Line Equation Calculator : The line equation calculator operates according to equation 4. It has a sine table, a cosine table, two multipliers, and one adder. The sine and cosine tables are constructed using read-only memory (ROM). These sub-modules are configured as a pipelined structure. After a fixed pipeline latency, a final result consisting of the rho value is produced every clock cycle. The total pipeline latency involved in converting the (x, y, theta) information into the rho value is 7 clock cycles.

ρ = x · cos(ϕ) + y · sin(ϕ)    (4)

2. Voting Memory : The voting memory is configured in one block memory of the FPGA. The voting memory has two operation modes. One mode is to increase and the other is to clear. This voting memory accumulates the input rho values in increase mode. First, it reads the value of the memory cell in the voting memory that is indicated by the rho value, and this memory cell’s value is increased and updated. In clear mode, the contents of the voting

memory are transferred to the peak detector, and then the whole memory is cleared. The address of the voting memory is sequentially incremented in clear mode.

3. Peak Detector : The peak detector detects the peak points in the voting memory, which is partitioned in the parameter space. The partitioned parameter space size is 15*800. Figure 4.5 shows the configuration of the peak detector. The peak detector finds local maxima in a 15*15 parameter space and then compares the maximum value with a predefined threshold value. (A behavioural sketch of this peak search is given after this list.)


Figure 4.5: Configuration of Peak Detector

4. Peak Table : The peak table stores the line parameters of the detected peak point in the partitioned parameter space. The contents of the peak table are transferred to the line identification unit to identify the exact line position.
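As mentioned in item 3, a behavioural sketch of the peak search over the partitioned parameter space is given below. The threshold and the random test data are assumptions used only to make the sketch self-contained.

% Software model of the peak detector over the 15 x 800 partitioned space
votes  = randi([0 50], 15, 800);            % stand-in for the voting memory contents
thresh = 40;                                % predefined peak threshold (assumed)

peaks = [];                                 % rows: [thetaIndex, rhoIndex, votes]
for c = 1:15:800-14
    block = votes(:, c:c+14);               % one 15x15 window of the parameter space
    [m, idx] = max(block(:));
    if m > thresh
        [tIdx, rIdx] = ind2sub(size(block), idx);
        peaks(end+1, :) = [tIdx, c + rIdx - 1, m];   % entry for the peak table
    end
end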

4.1.4 Line Identification Unit

The line identification unit identifies the exact line position and its peak line parameters in an edge image. The line identification unit is configured with three modules. The first is the inverse line equation calculator, which finds positions in the image domain using the line parameters. The second is the line identifier, which checks the existence of a line in an edge image using the position values from the inverse

line equation calculator. The last is the "memory controller", which stores the edge image and transfers the edge pixels to the line identifier.

1. Inverse Line Equation Calculator : The inverse line equation calculator operates using equations 5 and 6.

for i = 1:1920 (or 1:1080)
    if 0.79 ≤ θ ≤ 2.35 then
        y = (ρ − i cos θ) / sin θ        (5)
    else
        x = (ρ − i sin θ) / cos θ        (6)
    end if
end for

Using these equations, it is possible to determine the line position from the line parameters rho and theta. The index i varies, and the choice of equation is a function of theta. If theta lies in the range between 0.79 and 2.35, the x coordinate is used as the index and equation 5 calculates the corresponding y position. In this case, the index value x ranges from 1 to 1920. (A software sketch of this scan is given at the end of this section.)

Alternatively, the y coordinate can be used as an index for equation 6, with the y value ranging from 1 to 1080. The inverse line equation produces the (x, y) positions of the line described by the line parameters. This position data is transferred to the line identifier and the memory controller to find the exact line position. The line parameters come from the peak table of the Hough transform unit. If the index reaches its maximum value (either 1080 or 1920), the inverse line equation calculator reads the next line parameter in the peak table. The inverse line equation calculator repeats these operations until no more parameters remain in the peak table.

2. Line Identifier : The line identifier identifies the line using the line parameters and the edge image. The line identifier receives the position data of the line parameters from the inverse line equation calculator, and it requests this position's edge data from the memory controller. The edge image in the memory controller is a 1920*1080*8 bit image. If the pixel is an edge, then the value of the pixel is "11111111", otherwise the value of the pixel is "00000000".



Figure 4.6: The Memory Controller

3. Memory Controller : The memory controller stores the edge image and transfers edge pixels from this edge image on the line identifier's request. The memory controller controls two FIFO memories and two BRAMs. The FIFOs form a frame buffer that decouples the operating speed of the line identification unit from the camera clock speed. The BRAM stores the edge image, and because it must be read and written simultaneously, the memory controller uses two BRAMs and switches between them when performing read and write operations. Figure 4.6 shows the configuration of the memory controller.

This double buffer technique is used to store the HT space. Two instances of the HT memory are used; the first one is updated using the current frame data while the second one stores the HT data of the previous frame which is then used for further processing. With each new frame, the buffers are switched using multiplexers.
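As noted under the inverse line equation calculator, a software sketch of the line identification scan following equations 5 and 6 is given below. The image dimensions follow the 1920x1080 format used in this design, and edgeImg is assumed to be a 1080x1920 logical edge image.

% Software model of the line identification scan for one (rho, theta) pair
function pts = identifyLine(edgeImg, rho, theta)
    pts = [];
    if theta >= 0.79 && theta <= 2.35          % x is used as the index
        for x = 1:1920
            y = round((rho - x*cos(theta)) / sin(theta));     % equation (5)
            if y >= 1 && y <= 1080 && edgeImg(y, x)
                pts(end+1, :) = [x, y];
            end
        end
    else                                       % y is used as the index
        for y = 1:1080
            x = round((rho - y*sin(theta)) / cos(theta));     % equation (6)
            if x >= 1 && x <= 1920 && edgeImg(y, x)
                pts(end+1, :) = [x, y];
            end
        end
    end
end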

4.2 System Architecture in FPGA

4.2.1 Camera Pass-through Design

Figure 4.7: The architecture of Camera Pass-through Block Design

Figure 4.7 shows the system architecture of the camera pass-through design implemented in the Zynq-7000 FPGA. The design contains the following blocks.

• ZYNQ7 Processing System The Processing System IP is the software interface around the Zynq-7000 Processing System. The Zynq-7000 family consists of a system-on-chip (SoC) style integrated processing system (PS) and a Programmable Logic (PL) unit, providing an extensible and flexible SoC solution on a single die. The Processing System IP Wrapper acts as a logical connection between the processing system (PS) and the Programmable Logic (PL) [24].

The ZYNQ7 Processing System in this project is set to generate two clocks, 150 MHz and 200 MHz. The 150 MHz clock is used for the AXI4-Stream interconnect and cores, and the 200 MHz clock is used for the VITA receiver block's Input Delay (IDELAY) primitives.

AXI Overview

AXI stands for Advanced eXtensible Interface and is part of the ARM Advanced Microcontroller Bus Architecture 3 (AXI3) and 4 (AXI4) specifications. AXI was introduced in 2003 with the AMBA 3 specification. In 2010, a new revision of AMBA, AMBA 4, defined the AXI4, AXI4-Lite and AXI4-Stream protocols. AXI is royalty-free and its specification is freely available from ARM [25].

AMBA AXI4, AXI4-Lite, and AXI4-Stream have been adopted by Xilinx [26]:

• AXI4: for high-performance memory-mapped requirements.
• AXI4-Lite: for simple, low-throughput memory-mapped communication (for example, to and from control and status registers).
• AXI4-Stream: for high-speed streaming data.

How AXI works: Both the AXI4 and AXI4-Lite interfaces consist of five different channels, namely the Read Address, Write Address, Read Data, Write Data and Write Response channels.

Figure 4.8: Channel Architecture of Reads [26]

Figure 4.9: Channel Architecture of Writes [26]

In this project, AXI4-Stream is used for the transfer of data between video processing blocks.

AXI4-Stream: The AXI4-Stream protocol defines a single channel for the transmission of streaming data. The AXI4-Stream channel is modeled after the Write Data channel of AXI4. Unlike AXI4, AXI4-Stream interfaces can burst an unlimited amount of data.

• FMC Imageon VITA Block The input Imageon block comprises the following cores.

– VITA Receiver This core contains logic that will de-serialize the raw pixels from the VITA-2000 image sensor.

– Video In to AXI4 Stream The Video In to AXI4 Stream IP core converts common parallel video signals to an AXI4-Stream interface. The input video signals have data, clock, DE, sync signals (Vsync and Hsync) and/or blanking signals (Vblank and Hblank). The AXI4- Stream interface signals are compliant to the AXI4-Stream Video Protocol as defined in the AXI Reference Guide [26]. This core works in conjunction with the Xilinx Video Timing Controller (VTC) core to detect characteristics of the incoming video format that can be read by a system processor and used to configure subsequent processing blocks [27].

A block diagram of a Video In to AXI4-Stream core with a video timing generator is shown in Fig 4.10.

Video In is defined as parallel video data with a pixel clock and one of the following sets of timing signals (Vsync, Hsync, and Data Valid or Vblank, Hblank, and Data Valid or Vsync, Hsync, Vblank, Hblank, and Data Valid ). In this project, Video In has Vsync, Hsync, Vblank, Hblank, and Data Valid.

The output side of the core is an AXI4-Stream interface in the master mode. This interface consists of parallel video data, tdata, handshaking signals tvalid and tready, and two flags, tlast and tuser, which identify certain pixels in the video stream. The flag tlast designates the last valid pixel of each line and is also known as the end of line (EOL). The flag tuser designates the first valid pixel of a frame and is known as the start of frame (SOF). These two flags are necessary to identify pixel locations on the AXI4-Stream bus because there are no sync or blank signals. Only active pixels are carried on the bus.

Figure 4.10: Block Diagram of Video In to AXI4-Stream Core with the Video Timing Controller [27]

The Video In to AXI4S core is used in parallel with the detector functionality of the VTC. The video timing detector detects the line standard of the incoming video and makes the detected timing values, such as the number of active pixels per line and the number of active lines available to video processing cores downstream of the Video In to AXI4-Stream core via an AXI4-Lite interface.

The parameters configured for this IP are Pixels per clock set to one, Video Input component width is set to 8, AXI4S Output component width set to 8 and the FIFO depth is set to 1024. The Input and output widths are set based on video format.

– Colour Filter Array Interpolation

The Color Filter Array Interpolation (CFA) IP is used for converting Bayer sensor data to the RGB color domain. The generation of images from a digital sensor requires a conversion from RAW to RGB data, which is handled by a color filter array interpolation algorithm. This IP generates the missing color components associated with the commonly used Bayer pattern in digital camera systems.
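A software reference for this demosaicing step can be written with MATLAB's demosaic function, as sketched below. The raw file name is a placeholder, and mapping the BGBG phase used in this project (described later in this section) to MATLAB's 'bggr' alignment is an assumption.

% Software model of the CFA interpolation (demosaicing) step
rawBayer = imread('vita2000_raw.png');        % hypothetical 8-bit Bayer frame
rgb = demosaic(rawBayer, 'bggr');             % reconstruct the missing colour samples
imshow(rgb);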

Background for CFA: Images captured by a CMOS/CCD image sensor are monochrome. In this project, we used a VITA 2000 camera which is a CMOS image sensor. To generate a color image, three primary colors - typically Red, Green, and Blue - are required for each pixel. Before the invention of color image sensors, the color image was created by superimposing three identical images with three different primary colors. These images were captured by placing different color filters in front of the sensor, allowing a certain bandwidth of the visible light to pass through. Kodak scientist Dr. Bryce Bayer realized that an image sensor with a Color Filter Array (CFA) pattern would allow the reconstruction of all the colors of a scene from single image capture. Example CFA patterns are shown in Figure 4.11. These are called Bayer patterns and are used in most digital imaging systems [28].

Figure 4.11: RGB and CMY Bayer CFA Patterns [28]

The original data for each pixel contains information only about one color, based on which color filter is positioned over that pixel. However, information for all three primary colors is needed at each pixel to reconstruct a color image. Some missing information can be recreated from the information available in neighboring pixels. This process of recreating the missing color information is called color interpolation or demosaicing.

A variety of simple interpolation methods, such as Pixel Replication, Nearest Neighbor Interpolation, Bilinear Interpolation, and Bicubic Interpolation, have been widely used for CFA demosaicing. Simple methods usually compromise quality, and more elaborate methods require the use of an external frame buffer. The CFA core was designed to efficiently suppress interpolation artifacts, such as the zipper and color aliasing effects, by minimizing chrominance variances in a 5x5 neighborhood. Figure 4.12 shows a block diagram outlining the CFA core.

Figure 4.12: Block Diagram of Color Filter Array Interpolation [28]

Image sensor datasheets specify whether the starting position, pixel (0,0), of the Bayer sampling grid is on a red-green or a blue-green line, and whether or not the first pixel is green. In this project, the image sensor VITA 2000 starts on a blue-green line, so the CFA core is set to the BGBG Bayer phase. The CFA IP core supports all four possible Bayer phase combinations: RGRG, GRGR, GBGB, and BGBG.

• AXI Interconnect The AXI Interconnect IP core is used to connect one or more AXI memory-mapped master devices to one or more memory-mapped slave devices, which can vary from one another in terms of data width, clock domain and AXI sub-protocol (AXI4, AXI3, or AXI4-Lite) [29]. In this project, the AXI Interconnect is configured to connect 4 masters and 2 slaves. The master interfaces are the Color Filter Array Interpolation (CFA) IP, the I2C controller, the FMC Imageon VITA receiver IP and the Video Timing Controller. The slave interfaces are connected to GPIO 1 and GPIO 2 of the Zynq Processing System.

Figure 4.13: AXI Interconnect Core Diagram [29]

Figure 4.13 shows the AXI Interconnect core block diagram. Inside the AXI Interconnect core, a crossbar core routes traffic between the Slave Interfaces (SI) and Master Interfaces (MI). Along each pathway connecting an SI or MI to the crossbar, an optional series of AXI infrastructure cores (couplers) can perform various conversion and buffering functions. The couplers include the Register Slice, Data FIFO, Clock Converter, Data Width Converter, and Protocol Converter.

• VITA IO VITA IO, shown in fig. 4.7, is an external port connection of the VITA 2000 camera.

• AXI I2C This is the AXI I2C controller block, which is used to configure the I2C peripherals on the FMC module. This core allows the processor to configure the FMC-IMAGEON hardware peripherals, namely the ADV7611 (HDMI input device), the ADV7511 (HDMI output device) and the CDCE913 (video clock synthesizer). The AXI IIC core is connected to the ZYNQ7 Processing System's GP0 via the AXI Interconnect.

Figure 4.14: AXI IIC Bus Interface Top-Level Block Diagram [30]

Figure 4.14 illustrates the top-level block diagram of the AXI I2C bus interface. The modules shown in the diagram are explained as follows. The AXI4-Lite Interface module implements a 32-bit AXI4-Lite slave interface for accessing the AXI IIC registers. The Interrupt Control module gathers the interrupt status from the AXI IIC and generates an interrupt to the host. The Registers Interface contains the control and status registers; it also provides access to the TX FIFO and RX FIFO. Registers are accessed through the AXI4-Lite interface. The TX and RX FIFOs are used to store data before it is transmitted on the bus or sent to the processor. The Dynamic Master controls the mode of the IIC block dynamically; this block operates when a start bit and a stop bit are written to the transmit FIFO.

Soft Reset resets the block using software. IIC Control contains the state machine that controls the IIC interface; it interfaces with the Dynamic Master block to configure the core as master or slave. Interrupt Control generates interrupts for various conditions based on the Interrupt Enable register settings [30].

• FMC Imageon iic shown in fig.4.7 is an external port connection of the FMC module.

• FMC Imageon HDMI Out Block This core contains logic that will embed the synchronization signals in the 16-bit video data sent to the ADV7511 HDMI output device. It comprises the following cores:

– Video timing Controller The Xilinx Video Timing Controller LogiCORE IP is a general-purpose video timing detector and generator, which automatically detects blanking and active data timing based on the input horizontal and vertical synchronization pulses. The Video Timing Controller can generate video timing signals and allows for adjustment of timing within a video design. The core is programmable through registers, provides a comprehensive set of interrupt controls, and supports multiple system configurations. The core is commonly used with the Video In to AXI4-Stream IP Core to detect the format and timing of incoming video or with the AXI4-Stream to Video Out IP Core to generate outgoing video timing for downstream sinks [31].

– RGB to YCrCb Colour space converter The RGB to YCrCb Color-Space Converter core transforms RGB video data into YCrCb 4:4:4 or YUV 4:4:4 video data. A color space is a mathematical representation of a set of colors. The most popular color models are RGB (or R'G'B', gamma-corrected RGB), used in computer graphics, and YIQ, YUV, and YCrCb, used in video systems. YUV or YCrCb represents hue, saturation and brightness [32].

RGB to YUV is used in this project since implementing a filter with a YUV input is easier than having RGB input for a filter.
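A hedged software sketch of the conversion is given below. The BT.601 coefficients are one common choice and are an assumption about how the core is configured in this design, not a statement of its exact register settings.

% RGB to YCbCr as a 3x3 matrix multiply per pixel (assumed BT.601 coefficients)
rgb = im2double(imread('frame.png'));          % hypothetical RGB input frame
M = [ 0.299     0.587     0.114 ;              % Y  row
     -0.168736 -0.331264  0.5   ;              % Cb row
      0.5      -0.418688 -0.081312 ];          % Cr row
pix = reshape(rgb, [], 3).';                   % 3 x N matrix of pixels
ycc = M * pix;                                 % the 3x3 matrix multiply
ycc(2:3, :) = ycc(2:3, :) + 0.5;               % offset the chroma components
ycbcr = reshape(ycc.', size(rgb));             % back to an image array
% The built-in rgb2ycbcr(rgb) gives a comparable (BT.601, video-range) result.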

– Chroma Resampler

Background The human eye is not as receptive to chrominance (color) detail as luminance (brightness) detail. Using color-space conversion, it is possible to convert RGB into the YCbCr color space, where Y is Luminance information, and Cb and Cr are derived from color difference signals. At normal viewing distances, there is no perceptible loss incurred by sampling the color difference signals (Cb and Cr) at a lower rate to provide a simple and effective video compression to reduce storage and transmission costs.

The Chroma Resampler core converts between chroma formats of 4:4:4, 4:2:2, and 4:2:0. Conversion is achieved using an FIR filter approach. Some conversions require filtering only in the horizontal dimension, some in the vertical dimensions, and some in both. Interpolation operations are implemented using a two-phase polyphase FIR filter. Decimation operations are implemented using a low-pass FIR filter to suppress chroma aliasing [33].

In this project, the core is set to perform a conversion from 4:4:4 to 4:2:2 to reduce storage.
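A software sketch of this 4:4:4 to 4:2:2 decimation is given below. The 3-tap low-pass filter is an illustrative choice and not the FIR coefficients used by the Xilinx core.

% Horizontal chroma decimation from 4:4:4 to 4:2:2
ycbcr = rgb2ycbcr(imread('frame.png'));        % hypothetical input frame
Y  = ycbcr(:,:,1);                             % luma is left untouched
lp = [1 2 1] / 4;                              % simple horizontal low-pass FIR
Cb = conv2(double(ycbcr(:,:,2)), lp, 'same');
Cr = conv2(double(ycbcr(:,:,3)), lp, 'same');
Cb422 = Cb(:, 1:2:end);                        % keep every second chroma sample
Cr422 = Cr(:, 1:2:end);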

– AXI Stream to Video Out The AXI4-Stream to Video Out IP core converts AXI4-Stream interface signals to a standard parallel video output interface with timing signals. The AXI4-Stream interface accepts signals that are compliant with the AXI4-Stream Video Protocol as defined in the AXI Reference Guide [26]. The output interface contains standard video timing signals including Vsync, Hsync, Vblank, Hblank, DE and the pixel clock. This core works in conjunction with the Xilinx Video Timing Controller (VTC) core to generate the video format timing signals [34]. In this project, the AXI4-Stream interface runs at a separate clock of 150 MHz.

A diagram of an AXI4-Stream to Video Out core with a video timing generator is shown in Figure 4.15.

Figure 4.15: AXI4-Stream to Video Out core with the Video Timing Controller [34]

• HDMIO IO is the external port of the HDMI out port on the FMC Imageon module as shown in fig 4.7.

4.2.2 Sobel Edge Detector Design

Figure 4.16: The architecture of Sobel Edge filter Design

Figure 4.16 shows the architecture of the Sobel filter design. When implementing the Sobel edge filter, three more blocks are needed in addition to the pass-through design: the RGB to YCrCb converter, the Sobel filter block and the YCrCb to RGB converter. The blocks are described below.

• RGB to YCrCb Converter The RGB to YCrCb Color-Space Converter core transforms RGB video data into YCrCb 4:4:4 or YUV 4:4:4 video data. This core is the one implemented in the above section 4.2.1 Pass-through design.

• Sobel filter block This block contains the RTL design of the Sobel filter. The architecture of the Sobel filter implementation is described in the previous section 4.1.2 Edge Extraction Unit. The output of this block is a finite set of pixel points in a 1920x1080 array representing the edge data.

• YCrCb to RGB Converter The YCrCb to RGB Color-Space Converter core is a simplified 3x3 matrix multiplier converting three input color samples to three output samples in a single clock cycle. The optimized structure uses only four XtremeDSP slices [35] by taking advantage of the dependencies between coefficients in the conversion matrix of most YCrCb 4:4:4 or YUV 4:4:4 to RGB standards [36].

4.2.3 Hough Transform Design

Figure 4.17: The architecture of the Hough Transform Design

Figure 4.17 shows the architecture of the Hough transform design. When implementing the Hough transform, in addition to the Sobel filter design, one more block is needed, which takes the edge positions as input from the Sobel filter block and calculates the Hough transform. The block is described below.

• Hough Transform block This block contains the RTL design of the Hough transform. The architecture of the Hough transform implementation is described in the previous section 4.1.3 Hough Transform Unit. The output of this block is a finite set of pixel points in a 1920x1080 array describing a line, which in turn represents a lane.

5 Result and Analysis

In video streaming, pixels arrive one after another from the video source. The HT algorithm operates on a pixel-by-pixel basis, which makes it suitable for hardware implementation. With each new pixel, the HT space is updated according to the parameterized model of the line or circle. To reduce the memory requirements of the HT space, several measures are proposed. First, the HT space is sampled in an efficient way to reduce its size and hence the memory required. Second, the mathematical calculation of the HT is distributed over several math units that operate in parallel, each accessing its own memory module. This allows multiple parallel reads from and writes to the memory. Third, a double buffer technique is used to allow read and write operations at the same time. All these techniques are combined with a fully pipelined architecture that operates on a pixel-by-pixel basis. This eliminates the need to buffer frames during video processing and hence reduces the memory requirement.

The HT works on the contour points resulting from the edge detection process. Each contour point contributes to a voting process for several instances of the object being detected. For each instance, a memory location is used to store its vote. The memory used to store the votes of all instances of the object is called the HT space. The local maxima in the HT space represent the objects that received the maximum number of votes. A parameterized model of the object has to be provided to relate the contour pixel positions to the object template. The straight-line equation can be expressed in a parameterized form as

ρ = x · cos(ϕ) + y · sin(ϕ)    (7)

The HT space constructed using Equation 7 is a two-dimensional space where each point represents a line that corresponds to a single value of ρ and ϕ. Each new pixel is used to compute the value of ρ from the position of the pixel (x, y) for each angle ϕ. The vote for the line defined by ρ and ϕ is then incremented. The HT space has to be sampled to allow a limited memory size to store the voting results. Sampling the HT space affects the accuracy of the resulting transform, but is essential for an efficient hardware implementation. The HT can be modified to detect other shapes by modifying the parameterized model, while the basic flow of the algorithm remains the same.

5.1 Hardware Specification

Before mounting the camera on the autonomous car, the test setup for the camera was constructed as shown in figure 5.1. The setup included a ZedBoard, an FMC camera and a JTAG programmer.

Board specification: ZedBoard [37] is a development board for the Xilinx Zynq-7000 all programmable SoC (AP SoC). The Zynq-7000 AP SoC tightly couples an ARM processing system with 7-series programmable logic.

Camera specification: The ON Semiconductor VITA-2000-color [22] image sensor module is a high-definition camera with high frame rates and a global shutter.

Figure 5.1: Zedboard and FMC camera setup with JTAG programmer

5.2 Hardware Test and Analysis

Figure 5.2 below shows the test setup, which also includes an LCD monitor to display the filter output. A detailed explanation of how the different filters work is given in later sections.

Figure 5.2: Implemented Sobel Edge detector in FPGA

5.2.1 Edge detection test

1. Camera Pass-through

Figure 5.3 shows the output of the camera without any filters.

Figure 5.3: Camera pass through Implemented

2. Sobel Edge Detector

The edge detection filter implemented is a Sobel edge filter. Figure 5.4 below shows the Sobel filter output.

Figure 5.4: Sobel filter implemented

The block diagram of IPs and RTL blocks used to create the Sobel implementation in Vivado is shown in Appendix B.2.

5.2.2 Hough Transform Test

1. Plotting lines in a video stream

The Hough transform implementation was divided into multiple steps. Initially, two lines at different fixed angles were plotted without any video input, showing that lines can be plotted at any angle on the monitor. Figure 5.5 shows this implementation, and a minimal software sketch of the plotting step is given after the figure.

Figure 5.5: Implemented random XY lines in FPGA
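To illustrate this plotting step, the MATLAB sketch below rasterizes lines at two fixed angles across a 1920x1080 frame using the line equation. The (ρ, θ) values are arbitrary examples, and the sketch only mimics what the plotting block has to do before the pixels are sent to the monitor.

% Sketch: rasterize lines at fixed angles over a 1920x1080 frame, similar in
% spirit to the fixed-line plotting test. The (rho, theta) pairs are examples.
width = 1920; height = 1080;
frame = zeros(height, width, 'uint8');            % blank frame (no video input)

lineDefs = [ 400, 30;                             % [rho in pixels, theta in degrees]
             900, 120 ];                          % arbitrary example values

for k = 1:size(lineDefs, 1)
    rho = lineDefs(k, 1); theta = deg2rad(lineDefs(k, 2));
    for x = 1:width                               % walk along x, solve the line equation for y
        y = round((rho - x * cos(theta)) / sin(theta));
        if y >= 1 && y <= height
            frame(y, x) = 255;                    % mark the line pixel
        end
    end
end

imshow(frame);                                    % display, standing in for the monitor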

The second step combined the fixed lines with the input video stream. Figure 5.6 shows this implementation.

Figure 5.6: Implemented random XY lines with Video stream in FPGA

2. Hough Transform testing in FPGA

Figure 5.7 shows the actual implementation of the Hough transform in the FPGA using the double-buffer approach described in the previous chapters; the red lines in the picture are the Hough lines. The block diagram implemented in Vivado is shown in Appendix B.3.

Figure 5.7: Implemented Hough Transform in FPGA

Figure 5.8 shows a MATLAB simulation of the Hough transform applied to a road lane.

Figure 5.8: Implemented Hough Transform in MATLAB

5.2.3 Resource Usage

Resources        Utilization   Available   Utilization %
Flip Flops       17703         53200       33 %
LUTs             17793         70600       25 %
BRAMs            120           140         85 %
DSPs             80            220         36 %
Global clocks    7             32          21 %

Table 5.1: Design resource summary

Table 5.1 above shows the resource usage for the real-time lane detection system implemented on the FPGA. The system is designed in VHDL and implemented on a Xilinx Zynq-7020 FPGA. The clock frequencies used are 150 MHz and 200 MHz.

5.2.4 Prototype

A prototype was designed and constructed as shown in figure 5.9. It includes a robot car kit [38], a ZedBoard, an FMC Imageon board, an FMC HD camera and a LiPo battery to power all the circuits.

Figure 5.9: Prototype of miniature scale Autonomous car

6 Conclusions

6.1 Conclusion

This thesis aimed to design and implement a lane detection and tracking system on an FPGA. It was shown that all the modules meet the required functionalities. The edge detection module satisfies the requirements of the edge detection problem: not only is the frame edge-detected, but redundant lanes in undesired directions are also suppressed. However, there is still some distortion in the edge-detected frame, caused by the limited numeric resolution and the truncation policy applied when implementing the algorithms on the FPGA.

Although the implemented Hough transform performs precisely with adjusted threshold values, its performance can vary under different lighting setups, which requires minor calibration. It was observed that this implementation is heavily dependent on the lighting conditions of the environment, which are not fixed, making the analysis troublesome.

The implemented double-buffer approach resulted in faster reads and writes. The accuracy of the module was verified in several experiments with noisy data. Overall, the combination of all the implemented modules proved able to accurately distinguish, detect and track the desired road lanes.

This thesis project is important in many ways, mainly for reducing accident rates worldwide and reducing traffic congestion. Upgrading to newer autonomous systems leads to more efficient lane detection and faster response times. Moreover, this architecture aims to help commercial vehicles move in the direction of autonomous cars, trucks and buses. This vehicle automation will have manifold benefits for people's safety, due to a reduction in human errors while driving in autonomous mode.

The FPGA-based system for autonomous cars is a new concept for the automotive industry and has clear advantages over a traditional GPU or CPU controller for autonomous tasks: faster image processing than GPUs and CPUs, and reprogrammability compared to ASICs. Automotive companies such as Uber, Volvo and Tesla spend large amounts of money to test their autonomous cars, and by using these reprogrammable devices they can iterate on their designs faster and reduce the time to market.

6.2 Future work

There are several possibilities for further work on this topic. The subsections below give insights and suggestions for future work, as well as a cost analysis of the development of these prototypes.

6.2.1 Insights and suggestions for further work

One of the most important insights from this work is the need to understand the different algorithmic filters involved in an autonomous driving system in order to implement them in a hardware description language. The limited resources of a hardware design, such as memory and processing speed, must be kept in mind when applying these filter algorithms, in contrast to the comparatively unconstrained resources available in software. A natural next step is to implement a Kalman filter, which can help the controller estimate future lane coordinates from the present and past coordinates; the Kalman filter approach can make the car drive more smoothly and precisely.
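As a starting point for this future work, the sketch below shows a minimal constant-velocity Kalman filter that predicts the next lane coordinate from noisy measurements. The motion model, noise parameters and synthetic measurements are assumptions made purely for illustration and were not part of the implemented design.

% Minimal constant-velocity Kalman filter sketch for tracking one lane
% coordinate (e.g. the rho of a detected lane line) over successive frames.
% Model and noise parameters are assumed values for illustration only.
dt = 1/30;                        % assumed frame period (30 fps)
F  = [1 dt; 0 1];                 % state transition: position and velocity
H  = [1 0];                       % only the position is measured
Q  = 0.01 * eye(2);               % assumed process noise covariance
R  = 4;                           % assumed measurement noise variance

x = [0; 0];                       % initial state [position; velocity]
P = eye(2);                       % initial state covariance

z = 100 + cumsum(0.5 * ones(1, 50)) + 2 * randn(1, 50);   % synthetic noisy measurements

for k = 1:numel(z)
    % Predict the next state from the model
    x = F * x;
    P = F * P * F' + Q;

    % Correct the prediction with the new measurement
    K = P * H' / (H * P * H' + R);        % Kalman gain
    x = x + K * (z(k) - H * x);
    P = (eye(2) - K * H) * P;
end

fprintf('Estimated position %.1f, velocity %.2f per frame\n', x(1), x(2));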

6.2.2 Cost analysis

The current prototype works, but from a cost perspective it is an impractical solution. Future work must reduce the cost of this solution.

References

[1] World Health Organisation. Road traffic injuries. 2018. URL: http://www.who.int/mediacentre/factsheets/fs358/en/.

[2] Cho, Jae-Hyun et al. “Improved lane detection system using Hough transform with super-resolution reconstruction algorithm and multi-ROI”. In: International Conference on Electronics, Information and Communications (ICEIC), 2014 (2014), pp. 1–4. DOI: 10.1109/ELINFOCOM.2014.6914446.

[3] Elhossini, Ahmed. “A Memory Efficient FPGA Implementation of Hough Transform for Line and Circle Detection”. System and Computer Engineering, Faculty of Engineering, Azhar University, Cairo. In: (2012).

[4] Jolly, E. K. and Fleury, M. “Multi-sector algorithm for hardware acceleration of the general Hough transform”. In: Image and Vision Computing 24.9 (2006), pp. 970–976. ISSN: 02628856. DOI: 10.1016/j.imavis.2006.02.016.

[5] Mayasandra, Karthik, Ladak, Hanif M, and Wang, Wei. “FOR REAL-TIME HOUGH TRANSFORM BASED SEGMENTATION”. In: May (2005), pp. 1469–1472.

[6] Kim, Dongkyun et al. “A Real-time Finite Line Detection System Based on FPGA”. In: ().

[7] Jau, Ruen Jen, Mon, Chau Shie, and Charlie, Chen. “A circular hough transform hardware for industrial circle detection applications”. In: 2006 1st IEEE Conference on Industrial Electronics and Applications (2006), pp. 1–6. DOI: 10.1109/ICIEA.2006.257148.

[8] Souki, M. A., Boussaid, L., and Abid, M. “An embedded system for real-time traffic sign recognizing”. In: Proceedings - 2008 3rd International Design and Test Workshop, IDT 2008 (2008), pp. 273–276. ISSN: 2162-0601. DOI: 10.1109/IDT.2008.4802512.

[9] Geninatti, Sergio Rubén et al. “FPGA implementation of the generalized hough transform”. In: ReConFig’09 - 2009 International Conference on ReConFigurable Computing and FPGAs (2009), pp. 172–177. ISSN: 2325-6532. DOI: 10.1109/ReConFig.2009.78.

[10] Hardzeyeu, Valiantsin and Klefenz, Frank. “On using the hough transform for driving assistance applications”. In: Proceedings - 2008 IEEE 4th International Conference on Intelligent Computer Communication and Processing, ICCP 2008 (2008), pp. 91–98. DOI: 10.1109/ICCP.2008.4648359.

[11] El, I et al. “FPGA Based Real-Time Lane Detection and Tracking Implementation”. In: (2016).

[12] Kuang, Xianyan, Fu, Wenbin, and Yang, Liu. “Real-Time detection and recognition of road traffic signs using MSER and random forests”. In: International Journal of Online Engineering 14.3 (2018), pp. 34–51. ISSN: 18612121. DOI: 10.3991/ijoe.v14i03.7925.

[13] Possa, Paulo Ricardo et al. “A multi-resolution FPGA-based architecture for real-time edge and corner detection”. In: IEEE Transactions on Computers 63.10 (2014), pp. 2376–2388. ISSN: 00189340. DOI: 10.1109/TC.2013.130.

[14] He, Jia et al. “A lane detection method for lane departure warning system”. In: Proceedings - 2010 International Conference on Optoelectronics and Image Processing, ICOIP 2010 1 (2010), pp. 28–31. DOI: 10.1109/ICOIP.2010.307.

[15] Hwang, Seokha and Lee, Youngjoo. “FPGA-based Real-time Lane Detection for Advanced Driver Assistance Systems”. In: 1 (2016), pp. 218–219.

[16] Hung, Donald L. “Design of a hardware accelerator for real-time moment computation: a wavefront array approach”. In: IEEE Transactions on Industrial Electronics 46.1 (1999), pp. 207–218. ISSN: 02780046. DOI: 10.1109/41.744413.

[17] Zhan, Zhen Huan et al. “A design of versatile image processing platform based on the dual multi-core DSP and FPGA”. In: Proceedings - 2012 5th International Symposium on Computational Intelligence and Design, ISCID 2012 2 (2012), pp. 236–239. DOI: 10.1109/ISCID.2012.210.

[18] AXI Protocol. URL: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ihi0022b/index.html.

[19] HoughTransformWiki. URL: https://en.wikipedia.org/wiki/Hough_transform.

[20] Hough Transform. URL: https://en.wikipedia.org/wiki/Hough_transform.

[21] FMCImageon. URL: https://forums.xilinx.com/xlnx/attachments/xlnx/NewUser/44940/2/FMC-IMAGEON-Tutorial.pdf.

[22] ONSemiC. URL: https://www.onsemi.com/PowerSolutions/product.do?id=VITA2000.

[23] Yang, Baohai. “The design of 3 * 3 real time Videoimage convolver on the basis of FPGA”. In: Proceedings of 2011 International Conference on Electronic and Mechanical Engineering and Information Technology, EMEIT 2011 8 (2011), pp. 4381–4384. DOI: 10.1109/EMEIT.2011.6023928.

[24] ZynqPS. URL: https://www.xilinx.com/support/documentation/ip_documentation/processing_system7/v5_5/pg082-processing-system7.pdf.

[25] AXI. URL: https://en.wikipedia.org/wiki/Advanced_eXtensible_Interface.

[26] AXI_REf_Gd. URL: https://www.xilinx.com/support/documentation/ip_documentation/ug761_axi_reference_guide.pdf.

[27] VintoAXI4S. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_vid_in_axi4s/v4_0/pg043_v_vid_in_axi4s.pdf.

[28] CFA. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_cfa/v7_0/pg002_v_cfa.pdf.

[29] AXIinterconnect. URL: https://www.xilinx.com/support/documentation/ip_documentation/axi_interconnect/v2_1/pg059-axi-interconnect.pdf.

[30] AXI IIC. URL: https://www.xilinx.com/support/documentation/ip_documentation/axi_iic/v2_0/pg090-axi-iic.pdf.

[31] VTC. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_tc/v6_1/pg016_v_tc.pdf.

[32] RGBtoYUV. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_rgb2ycrcb/v7_1/pg013_v_rgb2ycrcb.pdf.

[33] CR. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_cresample/v4_0/pg012_v_cresample.pdf.

[34] AXItoVOUT. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_axi4s_vid_out/v4_0/pg044_v_axis_vid_out.pdf.

[35] XtremeDSP. URL: https://www.xilinx.com/support/documentation/user_guides/ug431.pdf.

[36] YUVtoRGB. URL: https://www.xilinx.com/support/documentation/ip_documentation/v_ycrcb2rgb/v7_1/pg014_v_ycrcb2rgb.pdf.

[37] Zedboard specification. URL: http://zedboard.org/product/zedboard.

[38] Robotic Car Kit. URL: https://www.sunfounder.com/smart-car-kit-v2-0-for-arduino.html.

Appendices

A MATLAB codes

A.1 MATLAB code for Sobel, Canny, Roberts and Prewitt filters

original = imread('bw_curve_2.jpg');
figure(); imshow(original)

clipped = original(:, :, :);

%%% Convert the image to black and white
I = uint8((1/3)*(double(clipped(:,:,1)) + double(clipped(:,:,2)) + double(clipped(:,:,3))));

% Find edges using the Sobel method.
BW1 = edge(I, 'Sobel');
figure(); imshow(BW1); title('Sobel method')

% Find edges using the Canny method.
BW2 = edge(I, 'Canny');
figure(); imshow(BW2); title('Canny method')

% Find edges using the Prewitt method.
BW3 = edge(I, 'Prewitt');
figure(); imshow(BW3); title('Prewitt method')

% Find edges using the Roberts method.
BW4 = edge(I, 'Roberts');
figure(); imshow(BW4); title('Roberts method')

A.2 MATLAB code for Hough Transform

original = imread('bw_curve_2.jpg');
figure(); imshow(original)

clipped = original(:, :, :);

%%% Convert the image to black and white
I = uint8((1/3)*(double(clipped(:,:,1)) + double(clipped(:,:,2)) + double(clipped(:,:,3))));

% Find edges using the Sobel method.
BW = edge(I, 'Sobel');
figure(); imshow(BW); title('Sobel method')

% Compute the Hough transform and display the accumulator.
[H, T, R] = hough(BW);
imshow(H, [], 'XData', T, 'YData', R, 'InitialMagnification', 'fit');
xlabel('\theta'); ylabel('\rho');
axis on, axis normal, hold on;

% Find the strongest peaks and the corresponding line segments.
P = houghpeaks(H, 5, 'threshold', ceil(0.3*max(H(:))));
x = T(P(:,2)); y = R(P(:,1));
plot(x, y, 's', 'Color', 'white');
lines = houghlines(BW, T, R, P, 'FillGap', 5, 'MinLength', 7);

figure, imshow(clipped), hold on
max_len = 0;
for k = 1:length(lines)
    xy = [lines(k).point1; lines(k).point2];
    plot(xy(:,1), xy(:,2), 'LineWidth', 2, 'Color', 'green');
    % Plot beginnings and ends of lines
    plot(xy(1,1), xy(1,2), 'x', 'LineWidth', 2, 'Color', 'yellow');
    plot(xy(2,1), xy(2,2), 'x', 'LineWidth', 2, 'Color', 'red');
    % Determine the endpoints of the longest line segment
    len = norm(lines(k).point1 - lines(k).point2);
    if (len > max_len)
        max_len = len;
        xy_long = xy;
    end
end
plot(xy_long(:,1), xy_long(:,2), 'LineWidth', 2, 'Color', 'cyan');

TRITA-EECS-EX-2020:75

www.kth.se