
Turkish Journal of Physiotherapy and Rehabilitation; 32(2) ISSN 2651-4451 | e-ISSN 2651-446X

AUTONOMOUS SELF-DRIVING VEHICLE USING DEEP Q-LEARNING

M. SANGEETHA1, K. NIMALA1, D. SAVEETHA1, P. SURESH2*, S. SIVAPERUMAL2
1 Dept. of Information Technology, SRM Institute of Science and Technology, Kattankulathur, 603203, Tamil Nadu, India.
2 Dept. of ECE, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
* Corresponding author: [email protected]

ABSTRACT
Most modern self-driving cars lack the ability to make a quick judgement based on the objects that appear in front of them. In this paper we propose a hierarchy method by which a vehicle can decide how it should respond in a deadlock situation, i.e. when the car has no option other than crashing. Our aim is a model of a self-driving car that can learn to keep itself on the road, avoid obstacles in front of it, and assign a priority to the different objects so that it knows which object is more valuable.

Keywords: Deep Q-learning, self-driving, autonomous system

1. INTRODUCTION
Transportation improves continually in modern times. Increasing development can be seen in the industry from the many new automobile manufacturers that enter the market at an ever-increasing pace. Developments are also being made towards road safety. Today, road accidents claiming numerous human lives are at a peak [1 - 3]. At the end of the day, most mishaps occur because of driver error, so it is natural to look for a solution that increases road safety by removing human effort completely, or at least partially, to reduce risk on the road. Historically, a key reason this was not a viable solution in day-to-day life was the slow processing power of computers, with differences of up to seven hundredths of a second between the decision making of a computer and that of the human brain [4]. Furthermore, computers do not possess the rationality that a human brings to the table. However, computers have since caught up in processing speed and can judge fairly rationally in some given scenarios [5, 6].

The trust in these new systems can be seen even in consumer markets, where manufacturers such as Tesla and Porsche have already started selling mass-market cars with self-driving capabilities built into them. A few isolated incidents aside, the track record of such systems has been impeccable, giving us more and more reason to take this field seriously.

With the above in mind, we decided to create a project in which the model (vehicle) can judge for itself, with some accuracy, how a human mind would have taken decisions in a deadlock situation where avoiding a mishap is not a choice. Using software tools from the open-source world such as Deep Q-learning, relational learning, TensorFlow and OpenCV, we created a simulation that represents such a scenario and performs fairly well [8].

We have implemented a model with a simulated camera that captures the frames in front of the vehicle and processes them into expected results so that the car can proceed with maximum accuracy [9]. Furthermore, the same camera is used to analyse potential obstacles and judge them against a pre-existing database, which gives us values based on the severity of the damage that might be caused.
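As a rough illustration of this front-camera pipeline, the sketch below grabs frames from a (possibly simulated) video source with OpenCV and converts them into small normalised images for a downstream network. The source index, frame size and grayscale preprocessing are assumptions for illustration, not details reported in the paper.

```python
import cv2
import numpy as np

def capture_frames(source=0, size=(84, 84)):
    """Yield preprocessed frames from a camera or simulator video feed.

    The source index and the 84x84 grayscale preprocessing are assumed
    defaults (common for DQN-style agents), not values from the paper.
    """
    cap = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            small = cv2.resize(gray, size, interpolation=cv2.INTER_AREA)
            yield small.astype(np.float32) / 255.0  # normalise to [0, 1]
    finally:
        cap.release()
```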
2. PROPOSED SYSTEM
We propose a system that uses a machine learning model trained with Deep Q-Learning to detect and avoid obstacles. In the case of a deadlock, the model is able to decide how to resolve the deadlock with minimum damage and penalty.

2.1 Deep Learning
Deep Learning is a subset of Machine Learning and Artificial Intelligence. It is the key technology behind self-driving cars, facial recognition, text translation and much more. Its core structure, the Artificial Neural Network built from perceptrons, is the foundation of Deep Learning. Neural networks were inspired by the working of the human brain: its information processing and the interaction between its neurons were adapted. Artificial neural networks, however, are static and limited in their computation, whereas the human brain is extremely dynamic in its neural computation. The original intention was for Deep Learning to reach performance similar to the human brain itself; in practice it now targets specific tasks, and neural networks are designed accordingly, which is a deviation from the biological inspiration. Artificial neural networks are intended to produce results that conventional algorithms cannot reach with a certain level of accuracy or efficiency. Such complex problems required attention, as they paved the way towards the automation we see today.

2.2 Q-Learning
Reinforcement learning is a branch of Machine Learning. It contains methods for maximizing the reward obtained by taking suitable actions, which eventually contributes to solving the problem. Q-Learning is a reinforcement learning algorithm in which the main goal of the system is to learn a Q function, represented as Q(s, a), where "s" is the current state of the system and "a" is the next action the system must take in order to change its current state.

Figure 1. Q-Learning Process

In Q-learning we build a memory table, represented as Q[s, a], holding the Q function value for every possible combination of "s" and "a", where "s" denotes the current state and "a" denotes the action. The following algorithm is used to fit the Q value to the sampled reward. If α, the discount factor, is less than one, it is easier for the Q value to converge.

Algorithm:
    Start with Q_0(s, a) for all s and a
    Get the initial state s
    For k = 1, 2, 3, ... until Q converges:
        Sample an action a and observe the next state s'
        if s' is a terminal state:
            target = R(s, a, s')
            sample a new initial state s'
        else:
            target = R(s, a, s') + α · max_a' Q_k(s', a')
        Q_{k+1}(s, a) = (1 - α) Q_k(s, a) + α [target]
        s = s'

The problem commonly faced with the algorithm above is that, if the combined set of actions and states is too large, the memory and processing power required for Q become too high. To solve this problem we use a deep Q network (DQN) and find an approximate value for Q(s, a). Our reason for using a deep Q network in this project is its ability to convert high-dimensional sensory data, such as vision, directly into decisions implementable by the agent. The DQN uses a convolutional network that can extract high-level features from raw sensory data.
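A minimal tabular sketch of this update rule is given below. The paper's pseudocode writes a single α, while the sketch keeps a separate learning rate and discount factor as in the standard formulation; the Gym-style environment interface (reset/step), the epsilon-greedy exploration and all parameter values are assumptions for illustration.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done); alpha is the learning
    rate and gamma the discount factor (the paper's pseudocode uses a
    single symbol for both).
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # terminal states contribute no future value
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
            s = s_next
    return Q
```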
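For the function-approximation case, a small convolutional Q-network of the kind described above could be sketched in TensorFlow/Keras as follows. The input shape, layer sizes and five-action output head are assumptions chosen to match the action space defined in the next subsection, not the authors' exact architecture.

```python
import tensorflow as tf

def build_dqn(input_shape=(84, 84, 4), n_actions=5):
    """Convolutional Q-network mapping stacked camera frames to Q-values.

    Layer sizes follow common DQN practice and are assumptions,
    not the architecture reported in the paper.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        # one Q-value per discrete action (no action, accelerate,
        # decelerate, change lane left, change lane right)
        tf.keras.layers.Dense(n_actions, activation="linear"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss=tf.keras.losses.Huber())
    return model
```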
2.3 Action State
The action space is defined as a discrete action space. Discrete spaces are easier for the DQN to predict and are therefore an advantage in terms of processing.

There are two types of actions to consider for the movement of the vehicle: longitudinal and lateral. In terms of longitudinal motion, three kinds of actions need to be considered:
1. Increasing the cruise-control speed, calculated as v + vc, where vc is the additional target speed that lets the vehicle accelerate on an empty path with no obstacles. vc is set to 2 units/h.
2. Maintaining the current speed.
3. Reducing the longitudinal velocity using the same value vc, in this case subtracting it from the velocity v, i.e. v - vc.

In terms of lateral movement, we have the following actions:
1. Staying in the current lane of motion
2. Changing lane to the left
3. Changing lane to the right

Since an autonomous vehicle has to manage both longitudinal and lateral motion at the same time to avoid hitting objects, we define 5 actions as follows:
• No action
• Accelerate
• Decelerate
• Change lane to left
• Change lane to right

2.4 Reward Function
When an action is selected in reinforcement learning, a certain value is assigned to it, called the reward. For the vehicle to find the ideal driving policy, it has to maximize its expected future reward. It follows that the final policy the system learns can vary depending on how the reward function is designed. It is therefore important to choose a reward function well suited to the task being performed, so that the system learns the ideal driving policy for the vehicle.

Since we are implementing a hierarchy for the objects present on the streets, the reward function has to be designed such that when the vehicle hits an object near the top of the hierarchy it is penalized more than when it hits an object lower in the hierarchy. By doing this we can train the vehicle to avoid important objects on the road when there is a deadlock, the vehicle cannot stop and a collision is imminent. We have designed a function that satisfies these conditions, given as follows.

For a constant speed:

    r_v = \frac{v - v_{min}}{v_{max} - v_{min}}    (1)

For a collision:

    r_{col} = -r_{collision} - r_{object}    (2)

where r_v is the reward for the vehicle travelling at a constant speed, v is the velocity, r_col is the total reward for a collision, r_collision is a constant value for any collision, and r_object is the penalty for hitting the object according to the hierarchy table of objects.
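As an illustration, the combined five-action space can be encoded as a mapping from the discrete DQN output index to a longitudinal/lateral command. The action names and vc = 2 units/h come from the text above; the lane indexing and the velocity arithmetic are assumptions.

```python
from enum import IntEnum

VC = 2  # additional target speed (units/h), as defined in Section 2.3

class Action(IntEnum):
    NO_ACTION = 0
    ACCELERATE = 1
    DECELERATE = 2
    LANE_LEFT = 3
    LANE_RIGHT = 4

def apply_action(v, lane, action):
    """Return the (velocity, lane) pair after taking one discrete action.

    Lane indices and the bare velocity arithmetic are illustrative
    assumptions; the paper only defines the five actions themselves.
    """
    if action == Action.ACCELERATE:
        return v + VC, lane
    if action == Action.DECELERATE:
        return v - VC, lane
    if action == Action.LANE_LEFT:
        return v, lane - 1
    if action == Action.LANE_RIGHT:
        return v, lane + 1
    return v, lane  # NO_ACTION: keep current speed and lane
```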
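Equations (1) and (2) can be combined into a small reward routine as sketched below. The hierarchy table (a pedestrian penalized more heavily than, say, a traffic cone) and the constant collision penalty are hypothetical values chosen for illustration, not figures from the paper.

```python
# Hypothetical object hierarchy: a larger penalty means the object is
# more important to avoid under the deadlock policy.
OBJECT_PENALTY = {
    "pedestrian": 10.0,
    "vehicle": 5.0,
    "traffic_cone": 1.0,
}
R_COLLISION = 5.0  # constant penalty applied to any collision (assumed value)

def reward(v, v_min, v_max, collided=False, hit_object=None):
    """Reward following Eqs. (1) and (2).

    Without a collision, the reward is the normalised speed
    r_v = (v - v_min) / (v_max - v_min).  On a collision, the reward is
    r_col = -r_collision - r_object, where r_object depends on where the
    hit object sits in the hierarchy table.
    """
    if collided:
        return -R_COLLISION - OBJECT_PENALTY.get(hit_object, 0.0)
    return (v - v_min) / (v_max - v_min)
```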