Real Time Distance Calculation using Stereo Vision Technique

Session 2005-2009

Project Advisor: Mr. Khan Asmar Azar

Submitted by:

Ahmed Tassaduq 050620-158

Aisha Ashraf 050620-128

Fatima Zehra Hassan 060820-089

Shajieuddin Hyder Khan 050620-086

Department of Electrical Engineering University of Management and Technology


A report submitted to the

Department of Electrical Engineering

In partial fulfillment of the requirements for the

Degree

Bachelor of Science in Electrical Engineering

by

Ahmed Tassaduq (050620-158)

Aisha Ashraf (050620-128)

Fatima Zehra Hassan (060820-089)

Shajieuddin Hyder Khan (050620-086)

University of Management and Technology

October 12, 2009


CERTIFICATE OF APPROVAL

It is certified that the work contained in this project report, entitled

“Real Time Distance Calculation using Stereo Vision Technique”

carried out by

Ahmed Tassaduq (050620-158)

Aisha Ashraf (050620-128)

Fatima Zehra Hassan (060820-089)

Shajieuddin Hyder Khan (050620-086)

under the supervision of Mr. Khan Asmar Azar, in partial fulfillment of the requirements for the degree of Bachelor of Science in Electrical Engineering.

Approved By

______                              ______

Dr. Aziz Bhatti                     Khan Asmar Azar
Dean SST                            Project Advisor


Acknowledgements

We truly acknowledge the cooperation and help provided by our advisor, Mr. Khan Asmar Azar, University of Management and Technology. He has been a constant source of guidance throughout the course of this project. We would also like to thank Mr. Saeed-ur-Rehman Turk, Government College University, Lahore, for his help and guidance in understanding many important issues. We are also thankful to our friends and families whose silent support led us to complete our project.

(Signed)

Ahmed Tassaduq (050620-158)

Aisha Ashraf (050620-128)

Fatima Zehra Hassan (060820-089)

Shajieuddin Hyder Khan (050620-086)

Date: October 12, 2009


TABLE OF CONTENTS

Chapter 1: Stereo Vision
1.0.1 Introduction
1.0.2 Background
1.1 Why Stereovision?
1.1.1 A few possibilities for camera based range sensing exist
1.1.2 Key advantages of camera based systems
1.2 Methodology
1.2.1 Pixel to Distance Calculation
1.2.2 Formula
1.2.3 Correlation
1.3 Camera calibration can be performed by two techniques
1.3.1 Photogrammetric calibration
1.3.2 Self-calibration
1.4 Proof of Concept
1.4.1 Calculation of Disparity
1.4.2 Disparity and distance inverse relation

Chapter 2: USB Protocol and Hardware Implementation
2.1 The Hardware
2.1.1 A4-Tech PK-5 Webcam (ZC0301)
2.1.2 Xilinx Spartan 3 starter board
2.1.3 A computer
2.2 USB Protocol
2.2.1 How to use a Core?
2.2.2 Issues with pure HDL
2.2.3 Embedded development kit
2.2.4 Creating a USB IP core

Chapter 3: uClinux and Device Integration
3.1 Matlab
3.2 uClinux
3.3 Operating System
3.3.1 Microblaze & Xilkernel
3.3.2 Purpose of an OS
3.3.3 Responsibilities of OS
3.4 Device Manager
3.4.1 Kernel
3.4.2 Device drivers
3.4.3 Device controller
3.5 Driver Issues
3.5.1 Camera Controller: ZC030x
3.5.2 Driver GSPCA
3.5.3 JPEG decoder
3.5.4 Compiler gcc-3.4
3.5.5 Additional libraries
3.6 Suitable Kernel
3.6.1 Why not Xilkernel?
3.6.2 Why import another Linux kernel?
3.6.3 Suitable kernel for device driver GSPCA
3.7 Grabbers
3.7.1 Spcagui frame grabber
3.7.2 Spcaview frame grabber

Chapter 4: Image Processing (C++ and VHDL)
4.1 Disparity
4.2 Template Matching
4.2.1 Methods
4.2.1.1 Sum of Absolute Differences (SAD)
4.2.1.2 Sum of Square Differences (SSD)
4.2.1.3 Normalized Cross Correlation (NCC)
4.3 Applied Algorithm
4.3.1 Implementation
4.5 Hardware Implementation

CONCLUSION
Future Projects
Appendix A: Correlation (C++)
Appendix B: Correlation using State Machine (VHDL)
Appendix C: Correlation using For Loop (VHDL)
References


LIST OF FIGURES

Figure 1: Image plane of projection with cameras
Figure 2: Overview of the visual pathways from eyes to striate cortex
Figure 3: Image rectification
Figure 4: Simple lens model
Figure 5: Concept of disparity calculation
Figure 6: Ray diagram of a convex lens
Figure 7: Two images taken for correlation calculation with two cameras
Figure 8: Offset calculation in the left and right CCDs
Figure 9: Hardware
Figure 10: Independent and dependent parts of the device manager
Figure 11: Communication between hardware and software layers
Figure 12: X offset between the left and right images
Figure 13: Images for template matching
Figure 14: Mask


Chapter 1: Stereo Vision


Abstract

The word "stereo" comes from the Greek word "stereos" which means firm or solid. Stereo vision you can see an object in three spatial dimensions i.e. according to its width, height and depth simply according to its x, y and z-axis. To make stereovision so rich and special in its attributes and qualities concept of depth dimension i.e. Disparity calculation has been added. Stereoscopic vision probably evolved as a means of survival. With stereo vision, we can see where objects are in relation to our own bodies with much greater precision especially when those objects are moving toward or away from us in the depth dimension. We can see a little bit around solid objects without moving our heads and we can even perceive and measure "empty" space with our eyes and brains.

According to the website of the American Academy of Ophthalmology, September 1996: "many occupations are not open to people who have good vision in one eye only [that means people without stereo vision]". Each eye captures its own view, and the two separate images are sent on to the brain for processing. The two images arrive simultaneously at the back of the brain, where they are united into one picture. The brain combines the two images by matching up the similarities and adding in the small differences. These small differences between the two images add up to a big difference in the final picture. Leonardo da Vinci had also realized that "objects at different distances from the eyes project images in the two eyes that differ in their horizontal positions, but had concluded only that this made it impossible for a painter to portray a realistic depiction of the depth in a scene from a single canvas". Leonardo chose for his near object a column with a circular cross section and for his far object a flat wall. Had he chosen any other near object, he might have discovered horizontal disparity of its features. His column was one of the few objects that projects identical images of itself in the two eyes [1].

Stereopsis (from stereo meaning solid, and opsis meaning vision) is the process in visual perception leading to the sensation of depth from the two slightly different projections of the world onto the retinas of the two eyes. The differences in the two retinal images are called horizontal disparity, retinal disparity, or binocular disparity. The differences arise from the eyes' different positions in the head. Stereopsis is commonly referred to as depth perception. This is inaccurate, as depth perception relies on many more monocular cues than stereoptical ones, and individuals with only one functional eye still have full depth perception except in artificial cases (such as stereoscopic images) where only binocular cues are present. Stereopsis became popular during Victorian times with the invention of the prism stereoscope by David Brewster. This, combined with photography, meant that tens of thousands of stereograms were produced. Until about the 1960s, research into stereopsis was dedicated to exploring its limits and its relationship to singleness of vision. Researchers included Peter Ludwig Panum, Ewald Hering, Adelbert Ames Jr., and Kenneth N. Ogle. Depth perception is the visual ability to perceive the world in three dimensions.

Although animals are able to sense the distance of objects in their environment, the term perception is reserved for humans, who are, as far as is known, the only beings that can tell each other about their experiences of distances. Depth sensation is the ability to move accurately, or to respond consistently, based on the distances of objects in an environment. With this definition, every moving animal has some sensation of depth. Depth perception arises from a variety of depth cues. These are typically classified into binocular cues, which require input from both eyes, and monocular cues, which require the input of just one eye.

Binocular cues include stereopsis, yielding depth from binocular vision through exploitation of parallax. Monocular cues include size: distant objects subtend smaller visual angles than near objects. A third class of cues requires synthetic integration of binocular and monocular cues. Distance can be calculated with a two-camera technique, in which the cameras capture a left image and a right image from different positions; the positional difference between the two images, i.e. the disparity mentioned earlier, is then computed. Given below is an arrangement of cameras according to image plane projection.

Figure 1: Image plane of projection with cameras. (The object projects to points P1 and P2 on the image plane; the left and right cameras view it from separated positions.)

The principle of two-camera calibration is based on human eye separation. Computer stereo vision is a part of the field of computer vision. It is sometimes used in mobile robotics to detect obstacles. Two cameras take pictures of the same scene, but they are separated by a distance, exactly like our eyes. A computer compares the images while shifting the two images over each other to find the parts that match. The positional difference between matching parts of the left and right images is called the disparity. The disparity at which objects in the image best match is used by the computer to calculate their distance [1].
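The inverse relation between disparity and distance (Section 1.4.2) follows from similar triangles in the standard pinhole stereo geometry. As a sketch, assuming two parallel cameras with baseline b and focal length f (symbols ours, not the report's), a point at depth Z and lateral position X projects to

\[ x_l = \frac{f\,X}{Z}, \qquad x_r = \frac{f\,(X - b)}{Z}, \]

so the disparity is

\[ d = x_l - x_r = \frac{f\,b}{Z} \quad\Longrightarrow\quad Z = \frac{f\,b}{d}. \]

Halving the distance to an object therefore doubles its disparity.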


Figure 2: Overview of the visual pathways from eyes to striate cortex [2]

As the figure above illustrates, human eyes change their angle according to the distance to the observed object, but for a computer this represents significant extra complexity in the geometrical calculations. The figure shows the simplest geometrical case, which explains how the image actually forms. Images may alternatively be converted, by re-projection through a linear transformation, to lie on the same image plane; this is called image rectification. Image rectification is a process which projects multiple images onto a common surface. The two-camera principle works exactly like the human eyes, with the camera baseline playing the role of the distance between the two eyes. The figure below further explains the image rectification principle.

Figure 3: Image rectification [2]


The first part of the figure shows the search space before rectification, whereas the second part shows the search space after rectification. Computer stereo vision with multiple cameras under fixed lighting is sometimes called multi-view stereo; techniques using a fixed camera and known lighting are called photometric stereo techniques [1]. Epipolar geometry in stereo vision requires the cameras to be arranged at a fixed distance from the object in order to measure the object's depth. Epipolar geometry assumes a pinhole camera, which is shown below in the figure.
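Rectification is what makes the later correlation search (Section 1.2.3) cheap: once both images have been re-projected onto a common plane, corresponding points lie on the same scan line, so matching becomes a one-dimensional search. In the usual notation (ours, not the report's), a rectified pair satisfies

\[ y_l = y_r, \qquad d = x_l - x_r \ge 0, \]

i.e. a feature can only be displaced horizontally between the left and right images.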

Before stating the lens formula, which relates the object and image sides of the lens, it is necessary to mention the concept of refractive index: the ratio of the speed of light in vacuum to the speed of light in the medium.

Figure 4: Simple lens model [2]

In this lens model, "u" is the distance between the lens and the object, "v" is the distance between the lens and the image, and "f" is the focal length: the distance to the point at which initially collimated rays of light meet after passing through a convex lens. The image is inverted and diminished relative to the size of the object.
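For reference, the quantities u, v and f in this lens model are tied together by the thin-lens equation, and the image-to-object size ratio follows from similar triangles:

\[ \frac{1}{f} = \frac{1}{u} + \frac{1}{v}, \qquad \frac{h}{H} = \frac{v}{u}, \]

where H is the object height and h the image height (these two symbols are used again in Section 1.2.1).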

The goal with respect to stereo vision was to be able to tell how far away things in a scene are from the camera, or relative to one another. It is astonishing how little disparity there has to be between the images in one's left and right eyes for one to tell which object lies at a different depth from the others [1]. The question, then, is how to calibrate a camera pair to find the X and Y offsets at which a position in the right camera image corresponds to the same position in the left camera image when both are looking at a far-off "infinity point". Once this step has been done, the same basic algorithm can dynamically keep the two cameras "looking" at the same thing even when the subject matter is close enough for the two cameras to actually register a difference. The figure below elaborates how the factor and offset are found.


Figure 5: Concept of disparity calculation [2]


The star represents an object at which the two cameras are looking. It has been assumed that the cameras are perfectly aligned with each other. That is, when the object is sufficiently far away from the cameras, you can blend the images from both cameras and the result looks the same as the left or right camera image alone. But if you hold your hand in front of the camera pair at a near distance and look at the combined image, you see two "hands" partly overlapping. Measure the X (horizontal) offset of one version of the hand from the other; suppose it is about 18 pixels. Now change the code to overlap the pictures so that the right-hand image is offset by 18 pixels: the two hands overlap perfectly, and it is the background scene that is doubled up. The middle section of the diagram above is evocative of this, where the pink star falls in different places in the left and right camera "projections" (the projections being the output images). The idea is that the object seen by the two cameras is the same but simply offset to different positions in each image. The same formula is used to find the factor from the distance between the camera and the object; once the factor is known, the distances of objects can be found. The offset mentioned above is the shift of the right camera image needed to bring the same object into registration.
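The offset search just described can be written down compactly. Below is a minimal C++ sketch, assuming 8-bit grayscale images stored row-major; the function and variable names are ours for illustration and are not the report's Appendix A code:

#include <cstdint>
#include <climits>

// Find the horizontal shift (in pixels) of the right image that best aligns
// it with the left image, by exhaustive search over candidate offsets.
// Both images are 8-bit grayscale, row-major, of identical size w x h.
int findXOffset(const uint8_t* left, const uint8_t* right,
                int w, int h, int maxOffset)
{
    int bestOffset = 0;
    long bestCost = LONG_MAX;
    for (int off = 0; off <= maxOffset; ++off) {
        long sad = 0;                       // sum of absolute differences
        for (int y = 0; y < h; ++y)
            for (int x = off; x < w; ++x) { // overlap region for this shift
                int diff = int(left[y * w + x]) - int(right[y * w + (x - off)]);
                sad += (diff < 0) ? -diff : diff;
            }
        // Normalize by the overlap area so wide and narrow overlaps compare fairly.
        long cost = sad / ((long)(w - off) * h);
        if (cost < bestCost) { bestCost = cost; bestOffset = off; }
    }
    return bestOffset;
}

On the hand experiment above, such a search would settle on an offset of roughly 18 pixels, after which shifting the right image by that amount makes the two hands coincide.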

Background

This project was inspired by the work of Alexandria, in which the invention relates to a method for the automatic calibration of a stereo vision system intended to be mounted on board a motor vehicle. The inventive method comprises the following steps:

1. acquiring a left image and a right image of the same scene, comprising at least one traffic lane for the vehicle, using a first and a second acquisition device;

2. searching the left image and the right image for at least two vanishing lines corresponding to two essentially parallel straight lines of the lane;

3. upon detection of the at least two vanishing lines, determining the coordinates of their point of intersection for the left image and for the right image;

4. determining the pitch error and the yaw error, in the form of the inter-camera difference in pitch angle and yaw angle, from the coordinates of the points of intersection determined for the left image and the right image.

1.1 Why Stereovision?

Numerous methods exist for determining the range to an object. The most popular for robotics use are active sensing methods such as infrared, ultrasonic and laser ranging devices. The trouble with all of these methods is that they give only quite low-resolution information. For infrared and ultrasonic sensors, the region of space within which objects may be detected is quite wide (several degrees at least), so the precise position of an object may be known only in very vague terms.

A laser ranger, on the other hand, has a far more specific narrow beam, but at any point in time gives only a tiny, tunnel-vision-like amount of information about the depth of the scene. To overcome this problem, scanning laser rangefinders have become increasingly popular in recent years. By scanning in one or more directions, using a spinning mirror to direct the laser beam, it is possible to build up detailed information about the geometric structure of the environment. However, this scanning process is relatively slow and relies upon precision-engineered mechanical parts. At the time of writing, scanning laser rangefinders are still very expensive devices costing thousands of dollars, and remain only within the realm of high-end academic, industrial or military applications.

Cameras can also be used as range-sensing devices. When more than one camera is arranged in a configuration such that the same features in the environment can be observed from multiple viewpoints, matches made between corresponding features may be used to calculate range values. Potentially, every pixel in the camera image can be used as a highly focused ranging device.

1.1.1 A few possibilities for camera based range sensing exist

1. A single camera taking pictures as it moves through space, with features being tracked and correlated over time.

2. Two cameras taking photos near-simultaneously, calculating ranges from stereo correspondences under the epipolar constraint.

3. Three cameras (a trinocular system) working in a similar manner to stereo, but with greater possibilities for accuracy and disambiguation of features. The only disadvantage here is the greater computing resources needed to process three images rather than two and to calculate three-dimensional light-ray intersections.

4. Use of two or three cameras together with tracking of features over time to produce a spatio-temporal correlation system. This is the ideal visual perception system, since tracking features over time facilitates accurate long-range measurements.

1.1.2 Key advantages of camera based systems

1. Safety. Laser ranging devices may give good mapping results, but whether such devices will be deemed safe to use in a domestic environment containing people and pets remains unknown. The lower-power-rated lasers seem to rely upon the human blink response for protection, which seems dubious and may not apply to cats and dogs.

2. Minimal complexity. Although a trinocular system might be more accurate, using only two cameras gives a minimally complex solution.

3. High speed. Unlike methods which require mechanical scanning, cameras can gather a large amount of range data in a very short space of time.

4. Very low cost. In the last five years digital imagers have become ubiquitous devices, used in digital cameras and mobile phones. Low cost means that robots using stereo vision fall easily within the means of the robotics hobbyist or even the consumer robotics domain.

5. Reliability. Cameras are entirely solid state, whereas laser scanners have moving parts which could potentially become unreliable during the rough and tumble of robotic excursions.

6. Color. Color information can be easily acquired at the same time as range data, helping to build realistic full-color 3D models of the environment.

7. Energy efficiency. Active sensors need to pump a lot of energy into the environment, whereas passive sensors such as cameras do not, and so may have a lower energy requirement [3].

1.2 Methodology

1.2.1 Pixel to Distance Calculation

As shown in the figure below, distance can be calculated in the following manner; it is important to mention that the focal length of the camera is known. The figure defines the idea of pixel-to-distance calculation from the perspective-geometry viewpoint: Euclidean geometry is a special case of perspective geometry, and the use of perspective geometry in computer vision makes for a simpler and more elegant expression of the computational processes that render vision possible. A perspective projection is the projection of a three-dimensional object onto a two-dimensional surface by straight lines that pass through a single point.
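Concretely, under perspective projection a scene point (X, Y, Z) maps to the image point

\[ x = \frac{f\,X}{Z}, \qquad y = \frac{f\,Y}{Z} \]

(standard pinhole notation, not taken from the report), so the projected size of an object shrinks as 1/Z. This is exactly the inverse relation measured in the table of Section 1.2.2.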

Figure 6: Ray diagram of a convex lens, which produces an inverted and diminished image (labels: H, F, u, v, D').

H = height of object
D' = distance between object and image
F = focal length of camera (4.9 mm)
u = distance between object and lens
v = distance between lens and image

1.2.2 Formula

From the lens model above, the magnification relation h/H = v/u, with u ≈ D and v ≈ F for a distant object, gives

\[ h \approx \frac{H\,F}{D}, \]

so the height in pixels is inversely proportional to the distance. The table below confirms this: as the distance decreases, the pixel height increases, tending to infinity as D approaches zero.

Distance D (m)      Height h (pixels)
2.115               0.76
1.88                0.86
1.64                0.98
1.41                1.14
1.17                1.376
0.94                1.72
0.70                2.29
0.47                3.44
0.23                6.88
0                   infinite


Distance has been measured in meters, and the corresponding height converted into pixels is symbolized by "h". The calculations show that pixel height is inversely proportional to distance.
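The claim is easy to check numerically: the product D·h is nearly constant (about 1.61) across the whole table. The following small C++ check is ours, written for illustration rather than taken from the report:

#include <cstdio>

// Verify the inverse distance-to-pixel-height relation h ~ k / D
// using the nine measurements from the table above.
int main()
{
    const double D[] = {2.115, 1.88, 1.64, 1.41, 1.17, 0.94, 0.70, 0.47, 0.23};
    const double h[] = {0.76, 0.86, 0.98, 1.14, 1.376, 1.72, 2.29, 3.44, 6.88};
    for (int i = 0; i < 9; ++i)
        std::printf("D = %5.3f m   h = %5.3f   D*h = %5.3f\n",
                    D[i], h[i], D[i] * h[i]);
    // Every product comes out near 1.61, so h = 1.61 / D fits the data well.
    return 0;
}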

1.2.3 Correlation

A real-time image-processing stereo vision system is required to support high-level object-based tasks in a tele-operated environment. Stereo vision is computationally expensive, due to having to find corresponding pixels. Correlation is a fast, standard way to solve the correspondence problem. A major part of this work is analyzing the behavior of correlation-based stereo to find ways to improve its quality while maintaining its real-time suitability. Three methods are suggested: two of them aim to improve the disparity image, especially at depth discontinuities, while one targets the identification of possible errors in general. Results are given on real stereo images with ground truth, as mentioned above. Finally, performance results are given for the individual parts of the stereo algorithm, including rectification, filtering and correlation, using all proposed methods. The implemented system shows that the errors of simple stereo correlation, especially in object border regions, can be reduced in real time using non-specialized computer hardware. Many refinements have been made in order to reduce ambiguity [4].
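As an illustration of window-based correlation, here is a minimal C++ sketch of SAD matching along a scan line, in the spirit of (but not identical to) the code in Appendix A; the function and parameter names are ours:

#include <cstdint>
#include <climits>

// For a template window centered at (cx, cy) in the left image, find the
// disparity (horizontal shift) with the lowest SAD cost in the right image.
// Both images are 8-bit grayscale, row-major, of size w x h; 'win' is the
// window half-size and 'maxDisp' limits the search range. The caller must
// choose (cx, cy) so that the window fits inside both images.
int matchDisparity(const uint8_t* left, const uint8_t* right,
                   int w, int h, int cx, int cy, int win, int maxDisp)
{
    (void)h;  // vertical bounds are the caller's responsibility in this sketch
    int bestDisp = 0;
    long bestSad = LONG_MAX;
    for (int d = 0; d <= maxDisp && cx - win - d >= 0; ++d) {
        long sad = 0;   // sum of absolute differences for this shift
        for (int dy = -win; dy <= win; ++dy)
            for (int dx = -win; dx <= win; ++dx) {
                int xl = cx + dx, y = cy + dy;
                int diff = int(left[y * w + xl]) - int(right[y * w + (xl - d)]);
                sad += (diff < 0) ? -diff : diff;
            }
        if (sad < bestSad) { bestSad = sad; bestDisp = d; }
    }
    return bestDisp;    // larger disparity means a nearer object
}

SAD needs no multiplications, which is one reason such correlation measures are attractive for real-time and FPGA implementations; Chapter 4 compares it with SSD and NCC.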

1.3 Camera calibration can be performed by two techniques

1.3.1 Photogrammetric calibration

Camera calibration is performed by observing a calibration object whose geometry in 3-D space is known with very good accuracy. With this technique, calibration can be done very efficiently. The calibration object usually consists of two or three planes orthogonal to each other; this approach requires an expensive calibration apparatus and an elaborate setup.

1.3.2 Self-calibration

Techniques in this category do not require any calibration object; instead, they require moving a camera through a static scene. The rigidity of the scene provides, in general, two constraints on the camera's internal parameters from one camera displacement, using image information alone. Therefore, if images are taken by the same camera with fixed internal parameters, correspondences between three images are sufficient to recover both the internal and external parameters, which allows the 3-D structure to be reconstructed. While this approach is very flexible, it is not yet mature: because there are many parameters to estimate, reliable results cannot always be obtained. The technique is more broadly applicable, but is ambiguous with respect to FPGA (field-programmable gate array) implementation. Other techniques exist, such as vanishing points for orthogonal directions and calibration from pure rotation. Our current research is focused on a desktop vision system (DVS), since the potential for using DVSs is large: cameras are becoming cheap and ubiquitous. A DVS is aimed at the general public, who are not experts in computer vision. A typical computer user will perform vision tasks only from time to time, and so will not be willing to invest money in expensive equipment. Therefore, flexibility, robustness and low cost are important. The camera calibration technique described here was developed with these considerations in mind [5].

1.4 Proof of Concept
