Playing Snake with Q Learning
Paul Mora Sancho
Agenda
1 Q Learning Introduction
2 Q Tables
3 Deep Learning
4 Human vs. Machine
Let us consider an example
States (Observation): Dressed up as… 1, 2, 3, 4
Actions (Observation): Going to… 1, 2, 3, 4
How do we know what to do when dressed in a certain way?
The Oracle predicts our reward for an action
State: 1
Action: 1 2 3 4
Predict Q Values: Medium High Low Low Medium
Q Tables
General Information
• Value-based algorithm in reinforcement learning
• Fancy word for a lookup table
• Calculating maximum future reward
• Iterative process, as the Q-Table needs improvement
• Makes use of the Bellman equation
• Initial values of Q-Tables are zero, but there are multiple approaches → Optimal intrinsic values
[Panels: Performance Preview, Workings overview, Reward Function]
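The iterative improvement of the lookup table can be sketched as follows. This is a minimal illustration of the standard Q-learning update derived from the Bellman equation, not the code used in the talk. The four-action layout, the learning rate `alpha`, the discount factor `gamma`, and the names `make_q_table` and `q_update` are assumptions made for the example.

```python
from collections import defaultdict

N_ACTIONS = 4  # assumption: one entry per direction the snake can move


def make_q_table():
    """Lookup table: state -> list of Q values (one per action), all starting at zero."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the new Q value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q_table[next_state])      # maximum future reward seen so far
    target = reward + gamma * best_next       # Bellman target
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


q = make_q_table()
first = q_update(q, "s0", 2, 1.0, "s1")   # reward of 1 for action 2 in state "s0"
second = q_update(q, "s1", 0, 0.0, "s0")  # no reward, but "s0" now looks promising
```

Because the table starts at zero, the first update only moves the entry a fraction `alpha` toward the observed reward; later updates propagate that value backwards to earlier states. A full implementation would also skip the `gamma * best_next` term when `next_state` is terminal (the snake died).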
Q Tables - Example
[Panel: Situation]
Procedure
1. Try different actions given this state
2. Calculate Q Values
3. Take the highest Q Value
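The procedure above (try actions, then take the highest Q value) is commonly implemented as an epsilon-greedy rule. The sketch below is one such variant under stated assumptions: the slides do not say how exploration was handled, and `choose_action` and `epsilon` are names introduced here for illustration.

```python
import random


def choose_action(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over the Q values of the current state.

    With probability epsilon a random action is tried (exploration);
    otherwise the action with the highest Q value is taken (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random so an all-zero table does not always pick action 0.
    candidates = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(candidates)


greedy = choose_action([0.0, 0.5, 0.2, -1.0], epsilon=0.0)  # always the best action
```

Setting `epsilon` to 0 gives the pure "take the highest Q value" step; a value above 0 keeps the agent trying other actions while the table is still being filled.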
Deep Learning
General Information
• Model based approach
• Goal is that the algorithm predicts whether a movement in a certain direction gets the snake closer to the apple
• Data is first gathered by purely random movements; the model is then trained in epochs
• Trade-off between too many variables and state information
Input & Output Variables
• Danger proximity variables
• Apple location information
• Movement
Reward Function
• 1: Getting closer to the apple
• 0: Getting further from the apple
• -1: Dying
[Panel: Performance Preview]
Q Tables - Example
Situation: [figure, candidate moves numbered 1 to 4]
Input of DL Model: 0 1 1 0 | 0 1 2 0 | 0 1 3 0 | 0 1 4 0
Output: 0 -1 1 0
Action: Down
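The example can be read as: label every recorded move with the reward function, train a model on those labels, then score each candidate move and take the best one. The sketch below illustrates that reading only. `predict` is a hypothetical stand-in for the trained network, the feature layout (state features followed by a move index) and the move order `Up, Right, Down, Left` are assumptions, and the distance to the apple is assumed to be measured in grid steps.

```python
def reward(old_distance, new_distance, died):
    """Label one recorded move: 1 closer to the apple, 0 further away, -1 dying."""
    if died:
        return -1
    # On a grid every move changes the step distance, so "not closer" means "further".
    return 1 if new_distance < old_distance else 0


def pick_move(predict, state_features, moves=("Up", "Right", "Down", "Left")):
    """Score every candidate move with the model and return the best one.

    `predict` is a hypothetical callable: it receives the state features with the
    index of the candidate move appended and returns a predicted reward.
    """
    scores = [predict(list(state_features) + [i]) for i in range(len(moves))]
    best_index = max(range(len(moves)), key=lambda i: scores[i])
    return moves[best_index], scores


# Stub model that returns fixed predictions, used only to demonstrate the selection step.
def stub_predict(features):
    return [0, -1, 1, 0][features[-1]]


move, scores = pick_move(stub_predict, [0, 1])
```

With the stub returning 0, -1, 1, 0 for the four candidates, the third move is selected, which is `Down` under the assumed move order.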
The results of the fight human vs. machine
Algorithm/Person        Score
Random Policy            1.5
Samson                   3.5
Aarun                    3.7
Rachel                   5.1
Dino                     6.9
Deep Learning (100k)     7.2
Stathis                  8.2
Peter                    8.7
Q Tables (200k)         17.6
Stas                    18.2
Adam                    19.2
Thank You
Email: [email protected]
LinkedIn: Paul Michael Mora Sancho
The Game played is…
The rules are…
1. Eat the green bunnies
2. The relevant score is Food/Death Average
3. 60 Seconds time limit
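The slides do not define the Food/Death Average precisely. One plausible reading, shown here purely as an assumption, is the amount of food eaten per life within the 60-second session. The function name `food_per_death` and the treatment of a session without any death as a single life are choices made for this sketch.

```python
def food_per_death(food_eaten, deaths):
    """Average food eaten per life during one timed session.

    Assumption: a session that ends without a death counts as one life,
    which also avoids a division by zero.
    """
    lives = max(deaths, 1)
    return food_eaten / lives


example_score = food_per_death(36, 2)  # 36 items eaten over 2 lives
```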