Blind Leap - Realtime Object Recognition with Results Converted to Audio for Blind People
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

A PROJECT REPORT ON

"BLIND LEAP - REALTIME OBJECT RECOGNITION WITH RESULTS CONVERTED TO AUDIO FOR BLIND PEOPLE"

Submitted in partial fulfilment for the award of the degree of

BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING

BY

M K SUBRAMANI (1NH15CS066)

Under the guidance of
Ms. JAYA R
Sr. Assistant Professor, Dept. of CSE, NHCE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

It is hereby certified that the project work entitled "BLIND LEAP - REALTIME OBJECT RECOGNITION WITH RESULTS CONVERTED TO AUDIO FOR BLIND PEOPLE" is a bonafide work carried out by M K SUBRAMANI (1NH15CS066) in partial fulfilment for the award of Bachelor of Engineering in COMPUTER SCIENCE AND ENGINEERING of the New Horizon College of Engineering during the year 2018-2019. It is certified that all corrections/suggestions indicated for Internal Assessment have been incorporated in the report deposited in the departmental library. The project report has been approved as it satisfies the academic requirements in respect of project work prescribed for the said degree.

.......................        .......................        .......................
Signature of Guide             Signature of HOD               Signature of Principal
(Ms. Jaya R)                   (Dr. B. Rajalakshmi)           (Dr. Manjunatha)

EXTERNAL VIVA

Name of Examiner                         Signature with date
1. ..................................    ..................................
2. ..................................    ..................................

ABSTRACT

This project transforms the visual world into the audio world, with the potential to inform blind people of both objects and their spatial locations. Objects detected in the scene are represented by their names and converted to speech. Their spatial locations are encoded into two-channel audio with the help of 3D binaural sound simulation.

The system is composed of several modules. Video is captured with a portable camera device on the client side and streamed to a server for real-time image recognition with an existing object detection model (YOLO). The 3D location of each object is estimated from the location and size of its bounding box in the detection output. A 3D sound generation application based on the Unity game engine then renders binaural sound with the locations encoded, and the sound is transmitted to the user through wireless earphones. Sound is played at an interval of a few seconds, or as soon as the recognized object differs from the previous one, whichever comes first. With the help of the device, the user is able to identify objects that are 3 to 5 meters away. Known limitations of the current prototype are detection failure when objects are too close or too far, and information overload when the system tries to notify the user of too many objects.
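The direction estimation and result filtering summarized above can be made concrete with a short sketch. The following Python fragment is a minimal illustration only, not the implementation described later in this report (see Chapter 5): the 60-degree field of view, the 640x480 frame size, the 3-second announcement interval, and all function and class names are assumptions made for this example.

    import time

    # Assumed parameters for this sketch; a real deployment would calibrate
    # them for the actual camera (e.g. the Raspberry Pi camera module).
    FIELD_OF_VIEW_DEG = 60.0   # horizontal field of view of the camera
    FRAME_WIDTH = 640          # capture width in pixels
    FRAME_HEIGHT = 480         # capture height in pixels
    ANNOUNCE_INTERVAL = 3.0    # seconds between repeated announcements

    def estimate_azimuth(box_x, box_w):
        """Map the horizontal centre of a bounding box to an azimuth angle
        in degrees (negative = left of centre, positive = right)."""
        centre = box_x + box_w / 2.0
        offset = (centre - FRAME_WIDTH / 2.0) / (FRAME_WIDTH / 2.0)
        return offset * (FIELD_OF_VIEW_DEG / 2.0)

    def estimate_distance_score(box_h):
        """Rough distance proxy from box size: taller boxes read as closer.
        Returns a unitless score; calibration against known object sizes
        would be needed to obtain metres."""
        return 1.0 / max(box_h / FRAME_HEIGHT, 1e-6)

    class AnnouncementFilter:
        """Allow an announcement every ANNOUNCE_INTERVAL seconds, or
        immediately when the detected object differs from the previous
        one, whichever comes first."""

        def __init__(self):
            self.last_label = None
            self.last_time = 0.0

        def should_announce(self, label):
            now = time.time()
            if label != self.last_label or now - self.last_time >= ANNOUNCE_INTERVAL:
                self.last_label = label
                self.last_time = now
                return True
            return False

    # Example: a "bottle" whose box is 120 px wide starting at x = 400 in a
    # 640 px frame maps to an azimuth of about +13 degrees (user's right).
    f = AnnouncementFilter()
    if f.should_announce("bottle"):
        print("bottle at", round(estimate_azimuth(400, 120), 1), "degrees")

In the full pipeline, the azimuth would be handed to the Unity application to position the binaural sound source and the label to a text-to-speech engine, while the filter keeps the user from being overloaded with repeated announcements.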
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task would be incomplete without the mention of the people who made it possible, whose constant guidance and encouragement crowned our efforts with success.

I have great pleasure in expressing my deep sense of gratitude to Dr. Mohan Manghnani, Chairman of New Horizon Educational Institutions, for providing the necessary infrastructure and creating a good environment.

I take this opportunity to express my profound gratitude to Dr. Manjunatha, Principal, NHCE, for his constant support and encouragement. I am grateful to Dr. Prashanth C.S.R, Dean Academics, for his unfailing encouragement and the suggestions given to me in the course of my project work. I would also like to thank Dr. B. Rajalakshmi, Professor and Head, Department of Computer Science and Engineering, for her constant support.

I express my gratitude to Ms. Jaya R, Senior Assistant Professor, my project guide, for constantly monitoring the development of the project and setting up precise deadlines. Her valuable suggestions were the motivating factors in completing the work.

Finally, a note of thanks to the teaching and non-teaching staff of the Department of Computer Science and Engineering for their cooperation, and to my friends, who helped me directly or indirectly in the course of the project work.

M K SUBRAMANI (1NH15CS066)

CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
1. INTRODUCTION
   1.1 VISUAL IMPAIRMENT
   1.2 OBJECTIVES OF THE PROPOSED PROJECT WORK
   1.3 PROJECT DEFINITION
   1.4 PROJECT FEATURES
2. LITERATURE SURVEY
   2.1 OBJECT RECOGNITION
       2.1.1 WHAT IS OBJECT RECOGNITION
       2.1.2 HOW OBJECT RECOGNITION WORKS
   2.2 EXISTING SYSTEM
       2.2.1 COMPUTER VISION TECHNOLOGIES
       2.2.2 SENSORY SUBSTITUTION TECHNOLOGIES
       2.2.3 ELECTRONIC TRAVEL AIDS
   2.3 OBJECT DETECTION ALGORITHMS
       2.3.1 R-CNN
       2.3.2 FAST R-CNN
       2.3.3 FASTER R-CNN
       2.3.4 YOLO (YOU ONLY LOOK ONCE)
   2.4 PROPOSED SYSTEM
       2.4.1 OBJECT DETECTION ALGORITHM
       2.4.2 DIRECTION ESTIMATION
       2.4.3 DATA STREAMING
       2.4.4 RESULT FILTERING
       2.4.5 3D SOUND GENERATION
3. REQUIREMENT ANALYSIS
   3.1 METHODOLOGY FOLLOWED
       3.1.1 CONVOLUTIONAL NEURAL NETWORKS
       3.1.2 YOLO (YOU ONLY LOOK ONCE)
   3.2 FUNCTIONAL REQUIREMENTS
       3.2.1 RASPBERRY PI
       3.2.2 UNITY 3D
   3.3 NON-FUNCTIONAL REQUIREMENTS
       3.3.1 ACCESSIBILITY
       3.3.2 MAINTAINABILITY
       3.3.3 PORTABILITY
   3.4 HARDWARE REQUIREMENTS
   3.5 SOFTWARE REQUIREMENTS
4. DESIGN
   4.1 DESIGN GOALS
   4.2 SYSTEM ARCHITECTURE
   4.3 DATA FLOW DIAGRAM
5. IMPLEMENTATION
   5.1 CONNECTIVITY
   5.2 OBJECT DETECTION PSEUDO CODE
6. RESULT
   6.1 DETECTION DATA SET CLASSES
   6.2 OUTPUT
7. CONCLUSION
REFERENCES
ANNEXURE

LIST OF FIGURES

2.1   Object recognition used to identify objects
2.2   Deep learning technique for object recognition
2.3   Machine learning technique for object recognition
2.4   Illustration of R-CNN
2.5   Illustration of Fast R-CNN
2.6   Illustration of Faster R-CNN
2.7   Illustration of YOLO
2.8   Data flow pipeline of the system
3.1   Layers of a convolutional neural network
3.2   Bounding boxes, input and output for YOLO
4.1   System architecture of the proposed system
4.2   Data flow of the proposed system
6.1   Raspberry Pi top view
6.2   Raspberry Pi side view
6.3   Raspberry Pi camera
6.4   Remote desktop connection login panel
6.5   Raspberry Pi terminal (client)
6.6   Windows terminal (server)
6.7   Object detected from a live stream video (Bottle)
6.8   Object detected from a live stream video (Phone)
6.9   Object detected from a live stream video (Banana)
6.10  Object detected from a live stream video (Scissors)
6.11  Depiction of the whole system

CHAPTER 1

INTRODUCTION

1.1 VISUAL IMPAIRMENT

Visual impairment, also called vision loss, is a problem with vision that cannot be corrected with ordinary glasses. Blindness is the state in which a person has lost vision completely. People with mild to severe visual impairment and blind people alike find it difficult to carry out their day-to-day activities. They learn to cope, but usually only in familiar environments.
But when it’s a completely unfamiliar environment, thing get tougher for them. Millions of people live in this world with incapacities of understanding the environment due to visual impairment. Although they can develop alternative approaches to deal with daily routines, they also suffer from certain navigation difficulties as well as social awkwardness. For example, it is very difficult for them to find a particular room in an unfamiliar environment. And blind and visually impaired people find it difficult to know whether a person is talking to them or someone else during a conversation. According to WHO report there are around 36 million blind people and 217 million people with mild to severe visual impairment. The visual impairment reported in this WHO report include cataracts, river blindness and trachoma infections, glaucoma, diabetic retinopathy, uncorrected refractive errors and some cases of childhood blindness. Many people with significant visual impairment benefit from vision rehabilitation, changes in their environment, and tools. With the advancements in the field of technology we are able to find the solution of nearly everything. In last few years computer vision technology especially “deep neural network” has developed swiftly. Access technologies such as screen readers, screen amplifiers and refreshing Braille displays allow the blind to use mainstream computer Dept of CSE, NHCE 1 Blind Leap applications and cell phones. The availability of tools is increasing, accompanied by joint efforts to ensure the accessibility of information technology to all potential users, including the blind. Later versions of Microsoft Windows include an Accessibility Wizard and magnifier for those with partial vision, and Microsoft Narrator, a simple screen reader. Linux distributions (as live CDs) for the blind include Vinux and Adriane Knoppix, the latter developed in part by Adriane Knopper who has a visual impairment. macOS and iOS also come with a built-in screen reader called VoiceOver, while Google TalkBack is built in to most Android devices. Few of the other existing tools which are present to help these special people are “Blindsight”, “TapTapSee” etc. The “Blindsight” offers a mobile app Text Detective featuring OCR or optical character recognition technology to detect and read text from pictures captured by using the camera [2]. “TapTapSee” is mobile app that uses computer vision and crowd sourcing in order to define a picture captured by the blind users in about 10 seconds.