Deep Q Learning, FCQ, and what’s next?
Ashis Kumer Biswas, Ph.D.
Deep Learning, November 28, 2018

Outlines
1 What is next?
2 More on deep Q network – References
3 Deep Q-network (DQN)
What's next?

Adaptive architectures for Artificial Neural Networks.
Utilizing Reinforcement Learning strategies to aid supervised learning algorithms.
Greater perfection, moving closer to real intelligence! The problem-to-algorithm mapping needs to be robust and friendly enough to cover a broad spectrum of application fields.
References and material courtesy
Thomas Simonini, “An introduction to Deep Q-learning: let’s play Doom” (https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8)
Q learning
Deep Q-Network
Using a Q-table to implement Q-learning works fine in small, discrete environments. But with many states, or a continuous state space, a Q-table is not feasible. The solution is deep Q-learning, which uses a deep neural network to approximate the Q-table.
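For contrast, a minimal tabular Q-learning update can be sketched as follows; the state names, learning rate, and discount factor are illustrative assumptions, not from the slides:

```python
from collections import defaultdict

# Tabular Q-learning: one row per state, feasible only for small state spaces.
Q = defaultdict(lambda: [0.0, 0.0])  # two actions per state (illustrative)
alpha, gamma = 0.1, 0.9              # learning rate and discount (assumed values)

def q_update(state, action, reward, next_state):
    # Move Q(s, a) toward the Bellman (TD) target.
    td_target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (td_target - Q[state][action])

q_update("s0", 0, 1.0, "s1")
```

With continuous or high-dimensional states, this explicit table is replaced by a neural network that generalizes across states.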
There are two approaches to build the Q-network:

The input is a state-action pair, and the prediction is the Q-value for that pair.
The input is the state, and the prediction is the Q-value for each action (the preferred approach).
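A minimal sketch of the preferred approach, as a tiny fully connected network in NumPy; the layer sizes and initialization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, N_ACTIONS = 4, 16, 3  # illustrative sizes

# Weights of a tiny two-layer Q-network: state in, one Q-value per action out.
W1 = rng.normal(0, 0.1, (STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (HIDDEN, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def q_values(state):
    # One forward pass yields Q(s, a) for every action a at once.
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

q = q_values(np.ones(STATE_DIM))
print(q.shape)  # (3,): one Q-value per action from a single forward pass
```

The advantage of this layout is that evaluating all actions costs a single forward pass, instead of one pass per state-action pair.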
The most desirable action is simply the one with the largest Q-value.
An example Deep Q network
A stack of 4 frames is taken as input and passed through the network, which outputs a vector of Q-values, one for each action possible in the given state.
Preprocessing
Converting the RGB image to grayscale (saves computation and space).
Cropping out the roof!
Stacking 4 frames, to get a sense of motion!
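These preprocessing steps can be sketched in NumPy as follows; the frame size and the number of rows cropped are illustrative assumptions, not the values used in the slides:

```python
import numpy as np

def preprocess(frame):
    """Grayscale + crop one RGB frame (the crop row count is an assumed value)."""
    gray = frame.mean(axis=2)  # average the R, G, B channels
    return gray[30:, :]        # drop the top rows (e.g. the uninformative roof)

def stack_frames(frames):
    """Stack the last 4 preprocessed frames so the network can sense motion."""
    return np.stack(frames[-4:], axis=0)

frames = [preprocess(np.zeros((120, 160, 3))) for _ in range(4)]
state = stack_frames(frames)
print(state.shape)  # (4, 90, 160)
```

A single frame cannot show velocity or direction; the 4-frame stack is what gives the network that temporal information.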
Three CNN layers
These layers extract spatial properties across the frames. Each convolutional layer uses ELU (Exponential Linear Unit) as its activation function, with α > 0:

f(x) = x           if x > 0
f(x) = α(e^x − 1)  otherwise

In contrast to ReLUs, ELUs have negative values, which push the mean of the activations closer to zero and speed up learning. For more on this activation, read the article by Clevert et al., "Fast and accurate deep network learning by exponential linear units (ELUs)".
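The ELU definition translates directly into code; a minimal NumPy version:

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential Linear Unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Negative inputs map to values in (-alpha, 0), pulling the mean activation
# toward zero; positive inputs pass through unchanged, as with ReLU.
print(elu(np.array([-2.0, 0.0, 2.0])))
```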
One fully connected layer with the ELU activation function, and one output layer that produces the Q-value estimate for each action.
Using a replay buffer

Storing experiences in a replay buffer, and sampling from it during training, helps to avoid forgetting past experiences.
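A replay buffer can be sketched with a fixed-size deque; the capacity and batch size here are assumed values:

```python
import random
from collections import deque

class ReplayBuffer:
    """A fixed-size store of (state, action, reward, next_state, done) tuples.

    Sampling random minibatches lets the agent reuse past experiences and
    breaks the correlation between consecutive experiences.
    """
    def __init__(self, capacity=10000):
        self.memory = deque(maxlen=capacity)  # oldest experiences fall off

    def add(self, experience):
        self.memory.append(experience)

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

buf = ReplayBuffer()
for i in range(100):
    buf.add((i, 0, 1.0, i + 1, False))
batch = buf.sample(8)
print(len(batch))  # 8
```

Because `deque(maxlen=...)` discards the oldest entries automatically, the buffer keeps a sliding window of recent experience without manual eviction logic.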
Reducing correlation between experiences
Solution: Explore vs Exploit
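The explore-vs-exploit trade-off is commonly handled with an ε-greedy policy, sketched below; the example Q-values are assumptions for illustration:

```python
import random

import numpy as np

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore (random action); otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore: try something new
    return int(np.argmax(q_values))             # exploit: biggest Q-value

# Typically epsilon starts near 1 (mostly exploring) and decays over training,
# so the agent relies more on its learned Q-values as they improve.
q = np.array([0.1, 0.5, 0.2])
action = epsilon_greedy(q, epsilon=0.0)
print(action)  # 1: with epsilon = 0 the greedy action is always chosen
```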
Bellman equation to update Q values
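The Bellman-based update trains the network toward the target r + γ · max over a′ of Q(s′, a′); a minimal sketch, where the discount factor is an assumed value:

```python
import numpy as np

gamma = 0.95  # discount factor (assumed value)

def td_target(reward, next_q_values, done):
    """Bellman target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * float(np.max(next_q_values))

# The training loss is the squared difference between this target and the
# network's current estimate of Q(s, a).
target = td_target(reward=1.0, next_q_values=np.array([0.2, 0.8]), done=False)
loss = (target - 0.5) ** 2
print(round(target, 3))  # 1.76
```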
Overall algorithm
Thanks! Questions?