
Deep Q Learning, FCQ, and what’s next?

Ashis Kumer Biswas, Ph.D.

[email protected]

Deep Learning, November 28, 2018

Outlines

1 What is next?

2 More on deep Q network – References

3 Deep Q-network (DQN)

What’s next?

Adaptive architectures for Artificial Neural Networks.
Utilizing reinforcement learning strategies to help supervised learning algorithms.
More refinement, moving closer to real intelligence!
Problem-to-algorithm mapping needs to be robust and flexible enough to cover a broad spectrum of application fields.

References and material courtesy

Thomas Simonini, “An introduction to Deep Q-learning: let’s play Doom” (https://medium.freecodecamp.org/an-introduction-to-deep-q-learning-lets-play-doom-54d02d8017d8)

Q learning

Deep Q-Network

Implementing Q-learning with a Q-table works fine in small, discrete environments. But with many states, or in continuous state spaces, a Q-table is no longer feasible. The solution is Deep Q-learning, which uses a deep neural network to approximate the Q-function.


Two approaches to building the Q-network:

1 The input is a state-action pair, and the output is the Q-value of that pair.

2 The input is the state, and the output is one Q-value per action (the preferred approach: a single forward pass evaluates every action).
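A minimal sketch of the preferred architecture: a tiny two-layer network (numpy only, with hypothetical names like `init_qnet` and `q_values`) that maps a state vector to one Q-value per action. This is an illustration, not the network used in the slides.

```python
import numpy as np

def init_qnet(state_dim, n_actions, hidden=32, seed=0):
    """Randomly initialize a tiny two-layer Q-network (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, 0.1, (state_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }

def q_values(params, state):
    """Forward pass: one state in, one Q-value per action out."""
    h = np.maximum(0.0, state @ params["W1"] + params["b1"])  # hidden layer
    return h @ params["W2"] + params["b2"]

params = init_qnet(state_dim=4, n_actions=3)
q = q_values(params, np.ones(4))           # vector of 3 Q-values
best_action = int(np.argmax(q))            # the greedy action: largest Q-value
```

Note how selecting the greedy action is just an argmax over the output vector, which is why the state-in, all-Q-values-out layout is preferred.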


The most desirable action is simply the one with the largest Q-value.

An example Deep Q-network

A stack of 4 frames is passed as input through the network, which outputs a vector of Q-values, one for each action possible in the given state.

Preprocessing

Convert the RGB image to grayscale (saves computation and space).
Crop out the roof!
Stack 4 consecutive frames, to get a sense of motion!
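The preprocessing steps above can be sketched in numpy. The frame size and the number of rows cropped (`crop_top=30`) are illustrative assumptions, not the values used in the slides.

```python
import numpy as np

def preprocess(frame_rgb, crop_top=30):
    """Grayscale + crop the roof (top rows). crop_top is a hypothetical value."""
    gray = frame_rgb.mean(axis=2)          # simple channel average as grayscale
    return gray[crop_top:, :] / 255.0      # crop the roof, scale to [0, 1]

def stack_frames(frames):
    """Stack 4 preprocessed frames along a new axis to capture motion."""
    return np.stack(frames, axis=0)

rgb = np.zeros((110, 84, 3), dtype=np.uint8)   # dummy 110x84 RGB frame
processed = preprocess(rgb)                    # shape (80, 84)
state = stack_frames([processed] * 4)          # shape (4, 80, 84)
```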

Three CNN layers

Extract spatial properties across the frames. Each layer uses ELU (Exponential Linear Unit) as its activation function, with α > 0:

f(x) = x            if x > 0
f(x) = α(e^x − 1)   otherwise

In contrast to ReLUs, ELUs have negative values, which pushes the mean of the activations closer to zero and speeds up learning. For more on this activation, see the article by Clevert et al., “Fast and accurate deep network learning by exponential linear units (ELUs)”.

An example Deep Q-network

One fully connected layer with an ELU activation function, and one output layer that produces the Q-value estimate for each action.

Using a replay buffer

to avoid forgetting past experiences
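A replay buffer can be sketched as a fixed-size deque of experience tuples; sampling random batches from it both reuses past experiences and reduces the correlation between consecutive training samples. Class and method names here are my own.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences dropped first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the temporal correlation between experiences
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for i in range(150):
    buf.add(i, 0, 1.0, i + 1, False)   # buffer keeps only the 100 most recent
batch = buf.sample(32)
```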

Deep Learning November 28, 2018 17 / 26 Reducing correlation between experiences

Deep Learning November 28, 2018 18 / 26 Reducing correlation between experiences

Deep Learning November 28, 2018 19 / 26 Reducing correlation between experiences

Deep Learning November 28, 2018 20 / 26 Reducing correlation between experiences

Deep Learning November 28, 2018 21 / 26 Reducing correlation between experiences

Solution: Explore vs Exploit
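The explore/exploit trade-off is commonly handled with an epsilon-greedy policy: act randomly with probability epsilon, otherwise take the greedy action. A minimal sketch (function name is mine):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: best known action

rng = np.random.default_rng(0)
q = np.array([0.1, 0.9, 0.3])
greedy = epsilon_greedy(q, epsilon=0.0, rng=rng)    # always exploits -> action 1
random_a = epsilon_greedy(q, epsilon=1.0, rng=rng)  # always explores
```

In practice epsilon is typically annealed from near 1.0 (mostly exploring) toward a small value (mostly exploiting) as training progresses.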

Bellman equation to update Q values
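The standard Q-learning update moves Q(s, a) toward the Bellman target r + γ max over a' of Q(s', a'). A small numeric sketch (helper names are mine):

```python
import numpy as np

def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Bellman target: r + gamma * max_a' Q(s', a'); just r at terminal states."""
    return reward if done else reward + gamma * float(np.max(next_q_values))

def q_update(q_sa, target, lr=0.1):
    """Move Q(s, a) a step of size lr toward the Bellman target."""
    return q_sa + lr * (target - q_sa)

next_q = np.array([0.0, 1.0])
target = td_target(reward=1.0, next_q_values=next_q, gamma=0.9)  # 1 + 0.9 * 1 = 1.9
new_q = q_update(q_sa=0.5, target=target, lr=0.5)                # 0.5 + 0.5 * 1.4 = 1.2
```

In the deep Q-network setting the same target is used, but instead of updating a table entry, the network is trained by gradient descent on the squared difference between its prediction Q(s, a) and the target.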

Overall algorithm
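Putting the pieces together, the overall training loop can be sketched as follows. The environment, network, and training step are trivial stubs here (all names hypothetical); only the control flow matters: act epsilon-greedily, store experiences in the replay buffer, train on random batches, and anneal epsilon.

```python
import random
import numpy as np
from collections import deque

random.seed(0)

# Toy stand-ins for the real pieces (hypothetical stubs):
def env_reset():        return np.zeros(4)                               # initial state
def env_step(s, a):     return np.ones(4), 1.0, random.random() < 0.2    # s', reward, done
def q_net(state):       return np.zeros(2)                               # network stub
def train_step(batch):  pass                                             # gradient update stub

buffer = deque(maxlen=1000)
epsilon, batch_size, n_episodes = 1.0, 8, 5

for episode in range(n_episodes):
    state, done = env_reset(), False
    while not done:
        # Epsilon-greedy action selection
        q = q_net(state)
        action = random.randrange(len(q)) if random.random() < epsilon else int(np.argmax(q))
        next_state, reward, done = env_step(state, action)
        buffer.append((state, action, reward, next_state, done))   # store experience
        if len(buffer) >= batch_size:
            train_step(random.sample(buffer, batch_size))          # learn from a random batch
        state = next_state
    epsilon = max(0.05, epsilon * 0.95)  # anneal exploration over time
```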

Thanks! Questions?
