
Playing Snake with Q Learning
Paul Mora Sancho

Agenda

1 Q Learning Introduction

2 Q Tables

3 Deep Learning

4 Human vs. Machine


Let us consider an example

States (observation): dressed up as… 1, 2, 3, 4
Actions (observation): going to… 1, 2, 3, 4

How do we know what to do when dressed in a certain way?

The Oracle predicts our reward for an action

State: 1
Actions: 1, 2, 3, 4
Predicted Q values: Medium, High, Low, Low, Medium
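The Oracle plays the role of a Q-function: given the current state, it returns a predicted reward (Q value) for each available action, and we pick the action with the highest prediction. A minimal sketch, assuming the qualitative labels Low/Medium/High map to the hypothetical numbers 0, 1 and 2; the names `LEVEL`, `ORACLE` and `best_action` and the per-action values are illustrative, not taken from the deck:

```python
# Hypothetical mapping from the slide's qualitative labels to numbers.
LEVEL = {"Low": 0, "Medium": 1, "High": 2}

# Illustrative oracle: state -> {action: predicted reward}.
ORACLE = {
    1: {1: LEVEL["Medium"], 2: LEVEL["High"], 3: LEVEL["Low"], 4: LEVEL["Low"]},
}


def best_action(state):
    """Return the action with the highest predicted reward in this state."""
    predictions = ORACLE[state]
    return max(predictions, key=predictions.get)


print(best_action(1))  # 2
```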


Q Tables

General:
• Value-based algorithm
• "Q-table" is a fancy word for a lookup table
• Calculates the maximum expected future reward of each action in each state
• Iterative process: the Q-table is improved over many updates
• Makes use of the Bellman equation
• Initial Q-table values are zero, but there are multiple approaches, e.g. optimistic initial values
Figure: Performance Preview
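The Bellman equation enters Q-learning through the standard update rule, written here with the usual symbols (state s, action a, reward r, next state s', learning rate α, discount factor γ). The deck does not state the values it used for α or γ:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The term r + γ max Q(s', a') is the estimate of the maximum future reward; repeating this update over many games is the iterative improvement of the Q-table.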

Figures: Workings overview; Reward function
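A minimal sketch of a Q-table as a lookup table with the Q-learning update, assuming string states, a four-direction action set and the hypothetical hyperparameters `alpha=0.1` and `gamma=0.9` (none of these are specified in the deck). The `initial_value` parameter shows the difference between zero initialisation and optimistic initial values:

```python
from collections import defaultdict

# Assumed Snake action set; the deck does not list its exact encoding.
ACTIONS = ("up", "down", "left", "right")


def make_q_table(initial_value=0.0):
    """Return a Q-table: a lookup table mapping state -> {action: Q value}.

    initial_value=0.0 is the zero initialisation from the slide; a positive
    value gives optimistic initial values, which encourage exploration.
    """
    return defaultdict(lambda: {a: initial_value for a in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """Apply one Q-learning (Bellman) update for a single transition."""
    # Estimate of the maximum future reward; none after a terminal step.
    future = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * future
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q = make_q_table()
print(q_update(q, "s0", "up", reward=1.0, next_state="s1"))  # 0.1
```

Because each update only moves the stored value a fraction `alpha` towards the target, the table improves gradually over many transitions rather than in one pass.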

Q Tables - Example

Figure: Situation
Procedure:
1. Try different actions given this state
2. Calculate the Q values
3. Take the action with the highest Q value
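Once the Q values are known, the procedure above is greedy. The "try different actions" step is commonly implemented with epsilon-greedy exploration; the deck does not say which exploration scheme was used, so this sketch is an assumption, with a hypothetical `epsilon` parameter and one Q-table row passed in as a dict:

```python
import random


def choose_action(q_row, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over one Q-table row ({action: Q value}).

    With probability epsilon, try a random action (exploration);
    otherwise take the action with the highest Q value (exploitation).
    """
    if rng.random() < epsilon:
        return rng.choice(list(q_row))
    return max(q_row, key=q_row.get)


row = {"up": 0.2, "down": 0.7, "left": -1.0, "right": 0.0}
print(choose_action(row, epsilon=0.0))  # down
```

Setting `epsilon` high early in training and lowering it later is a common way to move from trying actions to exploiting the best one.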


Deep Learning

General information:
• Uses a deep learning model in place of the Q-table
• Goal: the model predicts whether a move in a given direction brings the snake closer to the apple
• Training data is gathered through purely random moves at first; the model is then trained in epochs
• Trade-off between using too many input variables and capturing enough state information
Figure: Performance Preview
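The data-gathering phase described above can be sketched as follows: the snake moves purely at random and each move is labelled with the reward values used in this section (1 for getting closer to the apple, 0 for getting further away, -1 for dying). Everything else is a hypothetical simplification: a bare square grid where only leaving the board counts as dying (no snake body), Manhattan distance as the measure of "closer", and a restart from the centre after each death or meal:

```python
import random

# Assumed direction encoding as (dx, dy) on a grid whose y axis points down.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def reward(head, new_head, apple, size):
    """Reward values: -1 dying, 1 closer to the apple, 0 further away."""
    x, y = new_head
    if not (0 <= x < size and 0 <= y < size):  # leaving the board counts as dying
        return -1
    return 1 if manhattan(new_head, apple) < manhattan(head, apple) else 0


def spawn_apple(rng, size, head):
    """Place the apple on a random cell other than the snake's head."""
    while True:
        apple = (rng.randrange(size), rng.randrange(size))
        if apple != head:
            return apple


def collect_random_samples(n, size=10, seed=0):
    """Gather (head, apple, move, reward) samples using purely random moves."""
    rng = random.Random(seed)
    start = (size // 2, size // 2)
    head = start
    apple = spawn_apple(rng, size, head)
    samples = []
    for _ in range(n):
        move = rng.choice(list(MOVES))
        dx, dy = MOVES[move]
        new_head = (head[0] + dx, head[1] + dy)
        r = reward(head, new_head, apple, size)
        samples.append((head, apple, move, r))
        if r == -1 or new_head == apple:
            # Simplification: restart from the centre after dying or eating.
            head = start
            apple = spawn_apple(rng, size, head)
        else:
            head = new_head
    return samples


data = collect_random_samples(1000)
```

Samples like these would then be converted into model inputs (danger, apple location, movement) and used to train the network over several epochs; that training step depends on the deep learning library and is not shown.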

Input & output variables:
• Danger proximity variables
• Apple location information
• Movement

Reward function:
• 1: Getting closer to the apple
• 0: Getting further from the apple
• -1: Dying

Deep Learning - Example

Figure: Situation (direction labels 4, 1, 3, 2)
Input of DL model: 0 1 1 0 | 0 1 2 0 | 0 1 3 0 | 0 1 4 0
Output: 0 | -1 | 1 | 0
Action: Down
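The example above scores candidate moves with the model and picks the move with the highest output (Down on the slide). A sketch of that selection step, where the direction-to-index mapping, the appended direction index, and the `fake_model` stand-in with its outputs are all hypothetical; in the real setup `predict` would be the trained deep learning model:

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical mapping of direction indices 1-4 to names.
DIRECTIONS = ("up", "right", "down", "left")


def choose_direction(state_features: Sequence[float],
                     predict: Callable[[List[float]], float]) -> str:
    """Score each candidate direction with the model and return the best one."""
    scores: Dict[str, float] = {}
    for index, name in enumerate(DIRECTIONS, start=1):
        # One input row per direction: the state features plus the direction index.
        scores[name] = predict(list(state_features) + [index])
    return max(scores, key=scores.get)


# Stand-in for the trained model, returning 1 (closer), 0 (further) or -1 (dying).
FAKE_OUTPUTS = {1: 0, 2: -1, 3: 1, 4: 0}


def fake_model(x: List[float]) -> float:
    return FAKE_OUTPUTS[int(x[-1])]


print(choose_direction([0, 1, 0], fake_model))  # down
```

Scoring each direction separately keeps the model's output a single number, so the move with the highest predicted reward can be chosen with a plain `max`.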

The results of the fight: human vs. machine

Figure: bar chart of the score achieved by each algorithm/person

Algorithm/Person        Score
Random Policy             1.5
Samson                    3.5
Aarun                     3.7
Rachel                    5.1
Dino                      6.9
Deep Learning (100k)      7.2
Stathis                   8.2
Peter                     8.7
Q Tables (200k)          17.6
Stas                     18.2
Adam                     19.2

Thank You

Email: [email protected]
LinkedIn: Paul Michael Mora Sancho

The game played is…
The rules are…
1. Eat the green bunnies
2. The relevant score is the food/death average
3. 60-second time limit
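The food/death average used to rank players and algorithms is food eaten divided by the number of deaths. A one-line sketch, assuming (this is not stated in the rules) that a run with no deaths within the 60-second limit counts as a single life to avoid dividing by zero; the function name is hypothetical:

```python
def food_per_death(food_eaten: int, deaths: int) -> float:
    """Food/death average: bunnies eaten divided by the number of deaths.

    Assumption: a run with no deaths within the time limit is treated
    as a single life, to avoid dividing by zero.
    """
    return food_eaten / max(deaths, 1)


print(food_per_death(35, 2))  # 17.5
```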