Neurorobotics
An introduction
Marc-Oliver Gewaltig

In this lecture you’ll learn

1. What is Neurorobotics
2. Examples of simple neurorobots
   1. attraction and avoidance
   2. reflexes vs. learned behavior
3. The sensory-motor loop
4. Learning in neurorobotics
   1. unsupervised learning for sensory representations
   2. reinforcement learning for action learning

What is Neurorobotics

Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems.

Source: https://en.wikipedia.org/wiki/Neurorobotics

Neurorobotics: Embodied in silico neuroscience

[Figure labels: spinal cord reflexes / CDPs; reconstructed spinal cord models; embodiment and virtual environments; musculo-skeletal system with compliant actuators and mechanics]

Starting simple: Valentino Braitenberg’s Vehicles

Valentino Braitenberg (1926-2011)
Photo: Alfred Wegener, commons.wikimedia.org
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.

[Figures: Vehicle 1, Vehicle 2a, Vehicle 2b, and Vehicle 3, added one after another]

Exercise

How will vehicle 3 move?

Generalizing the Braitenberg vehicle

Exercise

Using weights in {-1,+1}, which weight configurations implement the vehicles 2a, 2b, and 2c?

[Figure labels: speed, light]
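The idea of a generalized vehicle, where each motor speed is a weighted sum of the sensor readings, can be sketched as follows. This is a minimal illustration, not the solution to the exercise: the weight matrices below use values in {0, 1} (an assumption made here to keep the wiring obvious), and the names `motor_speeds` and `turn_direction` are hypothetical.

```python
import numpy as np

def motor_speeds(weights, sensors):
    """Motor speeds (left, right) as a weighted sum of sensor readings (left, right)."""
    return weights @ sensors

def turn_direction(speeds):
    """A differential-drive vehicle turns away from its faster wheel."""
    left, right = speeds
    if left > right:
        return "right"
    if right > left:
        return "left"
    return "straight"

# Rows: motors (left, right). Columns: sensors (left, right).
# Assumed illustrative wirings with weights in {0, 1}:
UNCROSSED = np.array([[1.0, 0.0],
                      [0.0, 1.0]])  # each sensor drives the motor on its own side
CROSSED = np.array([[0.0, 1.0],
                    [1.0, 0.0]])    # each sensor drives the motor on the opposite side

# The light source is on the left, so the left sensor reads more light.
LIGHT_ON_LEFT = np.array([0.9, 0.2])

print(turn_direction(motor_speeds(UNCROSSED, LIGHT_ON_LEFT)))  # turns away from the light
print(turn_direction(motor_speeds(CROSSED, LIGHT_ON_LEFT)))    # turns toward the light
```

With uncrossed excitatory wiring the wheel nearer the light spins faster, so the vehicle steers away from it; with crossed wiring it steers toward the light.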

Biological and Non-biological bodies

Sensors (cameras, microphones, etc.) → encode → Artificial brain with neurons → decode → Servo motors with wheels

Perception and Action

[Figure, built up over several slides: overview of the sensory-motor architecture.
Perception: vision, hearing, smell, touch, temperature, vestibular, proprioception.
Internal processing: sensor fusion, short-term memory, long-term memory, working memory, drives & motivation, cognitive control, reward & punishment, action selection.
Action: behaviors, central pattern generators, reflexes, muscle contraction.
Annotations added in later builds: "Learning of sensory representations" and "Learning of skills and behaviours".]

Example: Somato-sensory maps in the cortex

1 Touch-sensitive regions on the mouse body
2 Somato-sensory map on the mouse brain
3 Schema of how these regions are mapped to the somato-sensory cortex

[Figure labels: hind limbs, trunk, forelimbs, whiskers, mouth, nose]

Source: Zembrzycki et al. 2013, "Sensory cortex limits cortical maps and drives top-down plasticity in thalamocortical circuits"

• Trunk: largest area of the body, smallest part of the cortical map
• Nose: small area of the body, largest area of the map

The size of a somato-sensory representation in the brain corresponds to the frequency of its stimulation.

[Figure: the sensory-motor architecture overview again, first with the annotation "Learning of sensory representations", then with "Learning of skills and behaviours" added]

Example of behavior learning: the Morris Water Maze

Rats learn to find a hidden platform because they don’t like cold water.

[Figure: time to find the platform over 10 trials]

Foster, Morris, Dayan 2000

Different types of learning:

1. Supervised learning
   • learning from labelled examples
2. Unsupervised learning
   • learning from unlabelled examples
3. Reinforcement learning
   • learning actions from rewards

Supervised learning

[Figure: example images, each described by features 1,...,n and paired with the label "tree"]

Unsupervised learning

Find structure in the data

• Data: D = {x1, x2, x3, ...}
• discover different classes of stimuli (e.g. trees and non-trees)
• find a ‘useful’ feature basis

[Figure: scatter plot over feature 1 and feature 2 with two groups of points, "Trees" and "Something else"]
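Discovering classes of stimuli without labels can be illustrated with a small clustering sketch. k-means is used here as one possible method (an assumption; the slides do not name an algorithm), and the data points, standing in for "trees" and "something else" in a two-feature space, are made up for illustration.

```python
import numpy as np

def k_means(data, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = centroids.astype(float).copy()
    for _ in range(iterations):
        # Distance of every point to every centroid, shape (n_points, n_centroids).
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        for k in range(len(centroids)):
            if np.any(labels == k):
                centroids[k] = data[labels == k].mean(axis=0)
    return centroids, labels

# Hypothetical stimuli described by (feature 1, feature 2); no labels are given.
DATA = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
                 [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]])

# Start from two of the data points (an arbitrary initialisation choice).
INITIAL_CENTROIDS = DATA[[0, 3]]

centroids, labels = k_means(DATA, INITIAL_CENTROIDS)
print(centroids)
print(labels)
```

The algorithm never sees the words "tree" or "something else"; it only groups points that lie close together in feature space.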

Reinforcement-Learning: Some examples

• An agent is in a state S
• It can take one of several actions
• Each action leads to a new state S’
• In some states, the agent is rewarded or punished
• Goal: maximize reward

S = state
A = action
S’ = new state
R = reward

Agent environment interaction

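[Figure: agent and environment as two connected boxes]

Learning through interaction with the environment

[Figure: agent-environment loop. In state St with reward Rt, the agent sends action At to the environment, which returns the next state St+1 and the reward Rt+1]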
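The loop of state, action, new state, and reward can be sketched with tabular Q-learning on a tiny corridor world. Everything specific here is an assumption for illustration: the five-state corridor with a rewarded goal at one end (loosely in the spirit of a hidden platform), the parameter values, and the use of systematic sweeps over all state-action pairs instead of random exploration, which keeps the example deterministic.

```python
import numpy as np

N_STATES = 5
GOAL = N_STATES - 1   # reaching the right-most state is rewarded
ACTIONS = (-1, +1)    # action index 0 = step left, 1 = step right

def step(state, action_index):
    """Environment: returns the new state S' and the reward R."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(sweeps=20, alpha=1.0, gamma=0.9):
    """Q-learning update applied to every (state, action) pair in each sweep."""
    q = np.zeros((N_STATES, len(ACTIONS)))
    for _ in range(sweeps):
        for s in range(GOAL):              # the goal state is terminal
            for a in range(len(ACTIONS)):
                s_next, r = step(s, a)
                # No future reward is counted once the goal is reached.
                future = 0.0 if s_next == GOAL else gamma * q[s_next].max()
                q[s, a] += alpha * (r + future - q[s, a])
    return q

q_values = q_learning()
greedy_policy = q_values[:GOAL].argmax(axis=1)
print(q_values)
print(greedy_policy)  # action index chosen in each non-terminal state
```

After learning, the greedy policy steps toward the goal from every state, and the learned values shrink by the discount factor with each additional step of distance.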

[Figure labels: sensory stimulation; stimulus, response + reward; learning; synaptic plasticity; perceptual/behavioural changes]

Learning and synaptic plasticity

Synaptic plasticity: change in connection strengths

Behavioural learning and synaptic plasticity

[Figure: synapse with axon terminal, vesicle, neurotransmitter, synaptic cleft, receptor, and dendrite]

Learning through synaptic plasticity

[Figure: a pre-synaptic neuron j and a post-synaptic neuron i connected by a synapse. An action potential (spike) of the pre-synaptic neuron evokes a post-synaptic response; its amplitude over time is shown before learning and after learning]

Hebb’s postulate

“When an axon of cell j repeatedly or persistently takes part in firing cell i, then j’s efficiency as one of the cells firing i is increased”

Donald O. Hebb, The Organization of Behavior (1949)

Hebbian postulate explained: cell assemblies

• Items are encoded by groups of cells, so-called assemblies
• If an item is sensed or recalled, the neurons in the assembly are activated
• As a result, the strength of the connections in the assembly increases

[Figures: assemblies for Item A and Item B]

• Partial activation of A triggers activation of the remaining neurons → pattern completion
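Pattern completion in cell assemblies can be sketched with a small binary network. This is an illustrative toy, not a model from the lecture: the eight-neuron network, the outer-product form of the Hebbian weights, the threshold of 0.5, and the single update step are all assumptions.

```python
import numpy as np

def hebbian_weights(patterns):
    """Strengthen the connection between every pair of neurons that are
    active together in a stored item (no self-connections)."""
    n_neurons = patterns.shape[1]
    weights = np.zeros((n_neurons, n_neurons))
    for pattern in patterns:
        weights += np.outer(pattern, pattern)
    np.fill_diagonal(weights, 0.0)
    return weights

def complete(weights, cue, threshold=0.5):
    """One update step: a neuron switches on if its input from the active
    neurons exceeds the threshold; cued neurons stay on."""
    drive = weights @ cue
    return np.maximum(cue, (drive > threshold).astype(float))

# Two assemblies stored in a network of eight neurons.
ITEM_A = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
ITEM_B = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
WEIGHTS = hebbian_weights(np.stack([ITEM_A, ITEM_B]))

# Only half of assembly A is activated.
PARTIAL_A = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float)

print(complete(WEIGHTS, PARTIAL_A))
```

The remaining neurons of assembly A are recruited through the strengthened connections, while the neurons of assembly B receive no input and stay silent.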

Hebbian learning in experiments (schematic)

1. Before: a spike of the pre-synaptic neuron j evokes an EPSP in the post-synaptic neuron i through the synapse wij, but no spike of i.
2. Induction: both neurons are simultaneously active.
3. After: the pre-synaptic spike again evokes an EPSP and no spike of i, but with increased amplitude ⇒ Δwij > 0.

Hebbian plasticity

Donald Hebb’s postulate (1949): When an axon of cell j repeatedly or persistently takes part in firing cell i, then j’s efficiency as one of the cells firing i is increased.

[Figure: pre-synaptic neuron j connected to post-synaptic neuron i through the weight wij, with a further neuron k; activity shown over time]

• learns correlations (simultaneous activity)
• acts locally on the neurons activated

Summary: Hebbian plasticity in experiments

• Synaptic changes are induced by co-activation of pre- and post-synaptic neurons
• Changes persist for a long time
• Changes can lead to an increase or decrease of the post-synaptic potential

Functionality
• useful for learning a new behaviour
• useful for development (e.g., wiring for receptive field development)
• useful for activity control in the network (homeostasis)
• useful for coding

Hebbian learning is unsupervised learning

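[Figure: a pre-synaptic and a post-synaptic neuron; the weight change depends only on local signals]

Reinforcement learning = Hebb + reward signal

[Figure: the same pair of neurons, now with a global SUCCESS signal in addition to the local pre- and post-synaptic signals]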
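The contrast between a purely local Hebbian update and one gated by a global success signal can be sketched in a few lines. The multiplicative form below is one common way to combine the three factors and is an assumption here, as are the function names and the learning rate.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Unsupervised: the change depends only on local pre- and post-synaptic activity."""
    return weight + learning_rate * pre * post

def reward_modulated_update(weight, pre, post, reward, learning_rate=0.1):
    """Reinforcement: the same local term, scaled by a global reward signal
    (assumed multiplicative form)."""
    return weight + learning_rate * reward * pre * post

w = 0.5
print(hebbian_update(w, pre=1.0, post=2.0))                        # grows whenever both are active
print(reward_modulated_update(w, pre=1.0, post=2.0, reward=0.0))   # no success signal, no change
print(reward_modulated_update(w, pre=1.0, post=2.0, reward=1.0))   # success, Hebbian change applied
print(reward_modulated_update(w, pre=1.0, post=2.0, reward=-1.0))  # punishment, weight decreases
```

Without the reward term, co-activity alone drives the change; with it, the same co-activity only leaves a trace when the global signal reports success or failure.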

Types of learning

Unsupervised learning
• Aim: detect structure in data
• exploits statistical structure of the data (stimuli)
• Important during development

Reinforcement learning
• Aim: learn new skills and actions
• maximises expected reward for actions
• Important to learn from experience

[Figure: a pre/post neuron pair for each type; the reinforcement learning pair additionally receives a success signal]

Simple firing rate neuron model

• Spikes are summarized by the spike/firing rate: activity = number of spikes per unit time
• x - stimulus
• y - response
• The stimulus reaches the neuron through a weight, and the response depends on a threshold

[Figure: spike train and the corresponding firing rate; neuron with stimulus x, weight, threshold, and response y]
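The two ingredients of this slide, a rate computed from spikes and a response that depends on weight and threshold, can be sketched as follows. The transfer function is not preserved in the slide text, so the threshold-linear form used here is an assumption, as are the function names and default values.

```python
def rate_from_spikes(spike_times, duration):
    """Activity = number of spikes per unit time."""
    return len(spike_times) / duration

def firing_rate(stimulus, weight=1.0, threshold=0.5):
    """Assumed threshold-linear rate neuron: the response y is zero until the
    weighted stimulus w * x exceeds the threshold, then grows linearly."""
    return max(0.0, weight * stimulus - threshold)

print(rate_from_spikes([0.1, 0.4, 0.7, 0.9], duration=2.0))  # 4 spikes in 2 time units
print(firing_rate(0.2))   # weighted stimulus below threshold: no response
print(firing_rate(2.0))   # weighted stimulus above threshold
```

Other transfer functions, such as a sigmoid, would fit the same picture; the threshold-linear choice just keeps the arithmetic easy to follow.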

Rate-based Hebbian learning

[Figure: pre-synaptic neuron j connected to post-synaptic neuron i]

The change of synaptic strength depends on the pre- and post-synaptic firing rates. This dependence can be written as a Taylor expansion in the two rates.
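The formulas for this step are not preserved in the slide text. A common way to write them in the rate-based literature is sketched below; the notation is an assumption: $\nu_j$ is the pre-synaptic rate, $\nu_i$ the post-synaptic rate, $F$ an unspecified function, and the coefficients $c_0, c_1^{\text{pre}}, \dots$ are expansion constants that may themselves depend on $w_{ij}$.

```latex
% Weight change as a function of the weight and the two firing rates
\frac{d w_{ij}}{dt} = F\left(w_{ij};\, \nu_i, \nu_j\right)

% Taylor expansion of F around \nu_i = \nu_j = 0
\frac{d w_{ij}}{dt} = c_0 + c_1^{\text{pre}}\,\nu_j + c_1^{\text{post}}\,\nu_i
  + c_2^{\text{pre}}\,\nu_j^2 + c_2^{\text{post}}\,\nu_i^2
  + c_2^{\text{corr}}\,\nu_i\,\nu_j + \mathcal{O}(\nu^3)

% Simplest Hebbian choice: keep only the correlation term
\frac{d w_{ij}}{dt} = c_2^{\text{corr}}\,\nu_i\,\nu_j, \qquad c_2^{\text{corr}} > 0
```

Only the term containing the product $\nu_i \nu_j$ is sensitive to joint activity of both neurons, which is why it is the one kept in the simplest Hebbian rule.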

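Hebb’s learning rule

[Figure: pre-synaptic neuron j connected to post-synaptic neuron i]

Because firing rates cannot be negative, the weight change under Hebb’s rule is always ≥ 0.

▪ In a system where weights can only grow, eventually all weights will reach their maximum value!

Weight in Hebb’s learning rule

[Figure: pre-synaptic neuron j connected to post-synaptic neuron i]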
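The saturation problem can be made concrete with a short simulation. This is a sketch under stated assumptions: a single linear post-synaptic neuron, constant positive pre-synaptic rates, a hard upper bound `w_max`, and arbitrary parameter values.

```python
import numpy as np

def run_hebb(pre_rates, w_init, w_max=1.0, learning_rate=0.1, steps=200):
    """Plain Hebbian updates (change proportional to post * pre) with a hard upper bound."""
    weights = w_init.astype(float).copy()
    for _ in range(steps):
        post_rate = weights @ pre_rates           # linear neuron, rates are non-negative
        weights = weights + learning_rate * post_rate * pre_rates
        weights = np.minimum(weights, w_max)      # hard bound at the maximum weight
    return weights

PRE_RATES = np.array([0.5, 1.0, 2.0])  # three inputs with different but positive rates
W_INIT = np.array([0.1, 0.1, 0.1])

print(run_hebb(PRE_RATES, W_INIT))
```

Every update is non-negative, so all three weights end at the upper bound and the differences between the inputs are lost. This is the motivation for rules that can also decrease weights.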

Hebbian Learning: rate model

[Figure: pre-synaptic neuron j, post-synaptic neuron i, and a further neuron k]

Sign of the weight change for each combination of activity states of the two neurons (slide labels: post, pre):

Rule                      ON/ON  ON/OFF  OFF/ON  OFF/OFF
Simple Hebbian              +      0       0       0
Gated                       +      -       -       -
Post-synaptically gated     +      0       -       0
Pre-synaptically gated      +      -       0       0
Covariance rule             +      -       -       +
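Two rows of this table can be reproduced directly from the rule definitions. The sketch below covers only the simple Hebbian rule and the covariance rule, because both are symmetric in the two neurons and therefore do not depend on which neuron a column names first; the gated rules are left out for that reason. Coding ON as rate 1, OFF as rate 0, and using a mean rate of 0.5 are assumptions.

```python
# Activity states of the two neurons, in the column order of the table.
STATES = {
    "ON/ON": (1.0, 1.0),
    "ON/OFF": (1.0, 0.0),
    "OFF/ON": (0.0, 1.0),
    "OFF/OFF": (0.0, 0.0),
}

def simple_hebbian(rate_a, rate_b):
    """Weight change proportional to the product of the two rates."""
    return rate_a * rate_b

def covariance_rule(rate_a, rate_b, mean_rate=0.5):
    """Weight change proportional to the product of the deviations from the mean rate."""
    return (rate_a - mean_rate) * (rate_b - mean_rate)

def sign_symbol(value):
    if value > 0:
        return "+"
    if value < 0:
        return "-"
    return "0"

def sign_row(rule):
    """Sign of the weight change for each activity state."""
    return [sign_symbol(rule(a, b)) for a, b in STATES.values()]

print("Simple Hebbian ", sign_row(simple_hebbian))
print("Covariance rule", sign_row(covariance_rule))
```

The covariance rule strengthens a synapse when both neurons deviate from their mean in the same direction, including when both are silent, and weakens it when they deviate in opposite directions.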

Oja’s Rule (1982)
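The formula on this slide is not preserved in the text. Oja's rule is usually written as Δw = η · y · (x − y · w) with y = w · x: a Hebbian term η · y · x plus a decay term that keeps the weight vector from growing without bound. The sketch below applies it to made-up two-dimensional data; the data, learning rate, sample count, and names are assumptions.

```python
import numpy as np

def train_oja(samples, w_init, learning_rate=0.005):
    """Oja's rule for a single linear neuron: dw = lr * y * (x - y * w)."""
    w = np.asarray(w_init, dtype=float).copy()
    for x in samples:
        y = w @ x                                # linear response
        w += learning_rate * y * (x - y * w)     # Hebbian growth with built-in decay
    return w

# Hypothetical inputs: large variance along one direction, small variance across it.
rng = np.random.default_rng(0)
N_SAMPLES = 20000
MAIN_DIRECTION = np.array([1.0, 1.0]) / np.sqrt(2.0)
CROSS_DIRECTION = np.array([1.0, -1.0]) / np.sqrt(2.0)
SAMPLES = (2.0 * rng.standard_normal((N_SAMPLES, 1)) * MAIN_DIRECTION
           + 0.5 * rng.standard_normal((N_SAMPLES, 1)) * CROSS_DIRECTION)

weights = train_oja(SAMPLES, w_init=[1.0, 0.0])
print(weights)
print(np.linalg.norm(weights))          # stays close to 1
print(abs(weights @ MAIN_DIRECTION))    # close to 1: aligned with the main direction
```

Unlike the plain Hebbian rule, the weights do not saturate: their length settles near one and their direction follows the axis of largest variance in the input.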

Summary: Types of learning

Unsupervised learning
• Aim: detect structure in data
• exploits statistical structure of the data (stimuli)
• Important during development

Reinforcement learning
• Aim: learn new skills and actions
• maximises expected reward for actions
• Important to learn from experience

[Figure: a pre/post neuron pair for each type; the reinforcement learning pair additionally receives a success signal]

Summary

• Neurorobotics is the study of embodied neural systems
• It studies the interaction of a neural system with its environment
• The body is the interface between the neural system and the environment
• Learning is the ability of an agent to acquire new sensory or motor skills
• Sensory skills are e.g. the development of visual orientation selectivity → feature discrimination
• Motor skills are e.g. the abilities to walk or to orient in a maze
• Learning is a system property.
• Synaptic plasticity is a synaptic property.
• Learning requires plasticity, but not vice versa.

[Figure: the sensory-motor architecture overview with the annotations "Learning of sensory representations" and "Learning of skills and behaviours"]