Playing Snake with Q Learning
Paul Mora Sancho

Agenda


1 Q Learning Introduction

2 Q Tables

3 Deep Learning

4 Human vs. Machine


Let us consider an example

States observation: Dressed up as… (1, 2, 3, 4)

Actions observation: Going to… (1, 2, 3, 4)

How do we know what to do when dressed in a certain way?

The Oracle predicts our reward for an action

[Figure: State and Action (1, 2, 3, 4) → Predict → Q Values: Medium, High, Low, Low, Medium]
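The Oracle can be read as a function from a (state, action) pair to a predicted Q value. Below is a minimal sketch. It assumes hypothetical numbers for the qualitative labels (Low = 0.1, Medium = 0.5, High = 0.9). The predictions for outfit 1 are illustrative entries, not data from the talk.

```python
# Hypothetical numeric stand-ins for the qualitative Q values.
LEVELS = {"Low": 0.1, "Medium": 0.5, "High": 0.9}

# Hypothetical predictions for one outfit (state 1) and the four
# destinations (actions 1-4); illustrative only.
PREDICTIONS = {
    (1, 1): "Medium",
    (1, 2): "High",
    (1, 3): "Low",
    (1, 4): "Low",
}


def oracle(state: int, action: int) -> float:
    """Return the predicted Q value for taking `action` while in `state`."""
    return LEVELS[PREDICTIONS[(state, action)]]


def rank_actions(state: int, actions=(1, 2, 3, 4)) -> list[int]:
    """Order actions from highest to lowest predicted Q value (stable on ties)."""
    return sorted(actions, key=lambda a: oracle(state, a), reverse=True)


print(rank_actions(1))  # [2, 1, 3, 4]
```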


Q Tables

General Information
• Value-based algorithm in reinforcement learning
• "Q-table" is a fancy word for a lookup table
• Calculates the maximum expected future reward for each action in a given state
• Iterative process, as the Q-table needs to be improved over time
• Makes use of the Bellman equation
• Initial values of the Q-table are zero, but there are multiple approaches → optimal intrinsic values

[Figure: Performance Preview]
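For reference, the standard tabular Q-learning update applies the Bellman equation iteratively. In it, s is the current state, a the action taken, r the reward received, and s' the next state. The slides do not specify the learning rate α or the discount factor γ, and the numbers in the worked step below are hypothetical.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]

% Worked step with hypothetical values:
% Q(s,a) = 0, \alpha = 0.1, r = 1, \gamma = 0.9, \max_{a'} Q(s',a') = 0.5
Q(s, a) \leftarrow 0 + 0.1 \left[ 1 + 0.9 \cdot 0.5 - 0 \right] = 0.145
```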

[Figures: Workings overview; Reward Function]
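The iterative improvement of a zero-initialised Q-table can be sketched on a toy problem. This is a hypothetical 1-D corridor rather than the Snake board, with the "apple" in the last cell. The corridor, the hyperparameters, and the purely random behaviour policy are all assumptions chosen to keep the example short. Because Q-learning is off-policy, the table still converges towards the greedy optimum.

```python
import random

N_CELLS = 5          # hypothetical corridor: cells 0..4
GOAL = 4             # reaching cell 4 ("the apple") ends the episode
ACTIONS = (-1, +1)   # move left or move right


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Apply a move; walls keep the agent in place. Reward 1 only at the goal."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def train(episodes: int = 200, alpha: float = 0.5, gamma: float = 0.9, seed: int = 0):
    rng = random.Random(seed)
    # Q-table initialised to zero: one entry per (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Purely random behaviour policy (exploration only).
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Terminal transitions have no future reward.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Bellman-based Q-learning update.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = train()
# Greedy policy read off the learned table for every non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1]: move right everywhere
```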

Q Tables - Example

Situation: [Figure: example situation]

Procedure:
1. Try different actions given this state
2. Calculate the Q values
3. Take the action with the highest Q value
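The three procedure steps can be sketched as one function over a lookup table. In this sketch, unseen (state, action) pairs default to zero, matching the zero initialisation of the Q-table. The state encoding, a tuple of hypothetical features, and the stored Q values are invented for illustration. The optional `epsilon` parameter adds ε-greedy exploration, a common choice that the slides do not specify. With the default of 0, the function always takes the highest Q value.

```python
import random
from collections import defaultdict

ACTIONS = ("up", "down", "left", "right")

# Q-table as a lookup table: unseen (state, action) pairs start at zero.
q_table: defaultdict = defaultdict(float)

# Hypothetical state: (danger ahead, apple is to the left, apple is below).
state = (0, 0, 1)
q_table[(state, "up")] = -0.2
q_table[(state, "down")] = 0.8
q_table[(state, "left")] = 0.1


def choose_action(q, state, epsilon=0.0, rng=random):
    # Optional exploration: with probability epsilon, try a random action.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # 1) try each action in this state, 2) look up its Q value,
    # 3) take the action with the highest Q value.
    q_values = {a: q[(state, a)] for a in ACTIONS}
    return max(q_values, key=q_values.get)


print(choose_action(q_table, state))  # down
```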


Deep Learning

General Information
• Model-based approach: a deep learning model is used instead of a lookup table
• Goal: the algorithm predicts whether moving in a certain direction brings the snake closer to the apple
• Data is first gathered through purely random movements; the model is then trained in epochs
• Trade-off between using too many input variables and providing enough state information

[Figure: Performance Preview]
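The data-gathering phase can be sketched as a snake that moves purely at random while each move is labelled with the reward function given for this approach: 1 for getting closer to the apple, 0 for getting further away, -1 for dying. This sketch makes several assumptions: a hypothetical 10x10 board, Manhattan distance to the apple, and only leaving the board counting as dying (body collisions are ignored). The helper names are hypothetical. Converting the samples into the danger/apple/movement input variables and the epoch-based training of the network are not shown.

```python
import random

SIZE = 10  # hypothetical 10x10 board
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def label_move(head, move, apple):
    """Label a move: 1 closer to the apple, 0 further away, -1 dying.

    Simplification: only leaving the board counts as dying.
    """
    dx, dy = MOVES[move]
    new = (head[0] + dx, head[1] + dy)
    if not (0 <= new[0] < SIZE and 0 <= new[1] < SIZE):
        return -1
    return 1 if manhattan(new, apple) < manhattan(head, apple) else 0


def gather_random_samples(n_steps, seed=0):
    """Play with purely random moves and record (head, apple, move, label)."""
    rng = random.Random(seed)

    def random_cell():
        return (rng.randrange(SIZE), rng.randrange(SIZE))

    head, apple = random_cell(), random_cell()
    samples = []
    for _ in range(n_steps):
        move = rng.choice(list(MOVES))
        label = label_move(head, move, apple)
        samples.append((head, apple, move, label))
        if label == -1:
            # The snake died: start a new game.
            head, apple = random_cell(), random_cell()
        else:
            dx, dy = MOVES[move]
            head = (head[0] + dx, head[1] + dy)
            if head == apple:
                # Apple eaten: place a new one.
                apple = random_cell()
    return samples


data = gather_random_samples(1000)
print(len(data), sorted({s[3] for s in data}))
```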

Input & Output Variables
• Danger proximity variables
• Apple location information
• Movement

Reward Function
• 1: Getting closer to the apple
• 0: Getting further from the apple
• -1: Dying

Deep Learning - Example

Situation: [Figure: the snake's four possible movements, labelled 4, 1, 3, 2]

Input of DL Model    Output
0 1 1 0              0
0 1 2 0              -1
0 1 3 0              1
0 1 4 0              0

Action: Down
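Turning the network's outputs into an action amounts to taking the movement with the highest predicted value. Below is a minimal sketch. It assumes the example's outputs (0, -1, 1, 0) are the predictions for movements 1 to 4 in row order, which is an inference from the slide layout. Under that reading, movement 3 corresponds to the chosen action "Down". The function name is hypothetical.

```python
def pick_movement(predicted: dict[int, float]) -> int:
    """Return the movement whose model output (predicted reward) is highest."""
    return max(predicted, key=predicted.get)


# Model outputs from the example, assumed to belong to movements 1-4 in row order.
outputs = {1: 0, 2: -1, 3: 1, 4: 0}
print(pick_movement(outputs))  # 3
```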


The results of the fight: human vs. machine