Neurorobotics
An introduction
Marc-Oliver Gewaltig

In this lecture you’ll learn
1. What is Neurorobotics
2. Examples of simple neurorobots
   1. attraction and avoidance
   2. reflexes vs. learned behavior
3. The sensory-motor loop
4. Learning in neurorobotics
   1. unsupervised learning for sensory representations
   2. reinforcement learning for action learning

What is Neurorobotics
Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems.
https://en.wikipedia.org/wiki/Neurorobotics

Neurorobotics: Embodied in silico neuroscience
[Figure labels: spinal cord; reconstructed reflexes; spinal cord / CDPs; brain models; embodiment and virtual environments; musculo-skeletal system, compliant actuators and mechanics]

Starting simple: Valentino Braitenberg’s Vehicles
Valentino Braitenberg (1926-2011). Photo: Alfred Wegener, commons.wikimedia.org
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.

Vehicle 1
[Figure: vehicle 1]

Vehicle 2a
[Figure: vehicles 1 and 2a]

Vehicle 2b
[Figure: vehicles 1, 2a and 2b]

Vehicle 3
[Figure: vehicles 1, 2a, 2b and 3]

Exercise
How will vehicle 3 move?

Generalizing the Braitenberg vehicle

Exercise
Using weights in {-1,+1}, which weight configurations implement the vehicles 2a, 2b, and 2c?
[Figure labels: speed, light]
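One way to make the generalized vehicle concrete is a small weight matrix that maps two light sensors onto two motor speeds. The sketch below is only an illustration: the function name `motor_speeds`, the two-sensor layout and the use of 0 for "no connection" are assumptions that are not on the slides, and deciding which wiring corresponds to which vehicle is left to the exercise.

```python
import numpy as np

def motor_speeds(weights, light):
    """Map sensor readings [left, right] to motor speeds [left, right].

    weights[m][s] is the (hypothetical) connection from sensor s to motor m.
    """
    return np.asarray(weights, dtype=float) @ np.asarray(light, dtype=float)

# Illustrative wirings; 0 stands for "no connection" (an assumption of this sketch).
UNCROSSED = [[1, 0],
             [0, 1]]   # each sensor excites the motor on its own side
CROSSED = [[0, 1],
           [1, 0]]     # each sensor excites the motor on the opposite side

light = [0.9, 0.3]     # light source on the vehicle's left

for name, w in [("uncrossed", UNCROSSED), ("crossed", CROSSED)]:
    left, right = motor_speeds(w, light)
    # A faster left wheel turns the vehicle to the right, and vice versa.
    turn = "right (away from the light)" if left > right else "left (towards the light)"
    print(f"{name}: left={left:.1f}, right={right:.1f}, turns {turn}")
```

Changing the sign of a weight turns an excitatory connection into an inhibitory one, which is the degree of freedom the exercise asks about.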
Biological and Non-biological bodies
Sensors: cameras, microphones, etc.
  ↓ encode
Artificial brain with neurons
  ↓ decode
Servo motors with wheels
[Figure: the sensory-motor loop, built up over several slides.
 Perception: vision, hearing, smell, touch, temperature, vestibular, proprioception.
 In between: sensor fusion, short-term memory, working memory, long-term memory, drives & motivation, reward & punishment, cognitive control, action selection.
 Action: behaviors, central pattern generators, reflexes, muscle contraction.
 Overlays: "Learning of sensory representations" on the perception side, "Learning of skills and behaviours" on the action side.]

Example: Somato-sensory maps in the cortex
1. Touch-sensitive regions on the mouse body
2. Somato-sensory map on the mouse brain
3. Schema of how these regions are mapped to the somato-sensory cortex
[Figure labels: hind limbs, trunk, forelimbs, whiskers, mouth, nose]
Source: Zembrzycki et al. 2013, "Sensory cortex limits cortical maps and drives top-down plasticity in thalamocortical circuits"
Trunk: largest area of the body, smallest part of the cortical map
Nose: small area of the body, largest area of the map
The size of a somato-sensory representation in the brain corresponds to the frequency of its stimulation.

Example of behavior learning: the Morris Water Maze
Rats learn to find a hidden platform: they don’t like cold water.
[Figure: time to find the platform, plotted over 10 trials. Foster, Morris, Dayan 2000]

Different types of learning:
1. Supervised learning
   • learning from labelled examples
2. Unsupervised learning
   • learning from unlabelled examples
3. Reinforcement learning
   • learning actions from rewards

Supervised learning
[Figure: example images, each described by features 1,...,n and paired with the label "tree"]

Unsupervised learning
Find structure in the data:
• Data: D = {x1, x2, x3, ...}
• discover different classes of stimuli (e.g. trees and non-trees)
• find ‘useful’ feature basis
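Discovering classes of stimuli without labels can be sketched with k-means clustering. The slides do not name a specific algorithm, so k-means is a swapped-in example here, and the data points, the function name `kmeans` and the choice of starting centers are all made up for illustration.

```python
import numpy as np

def kmeans(data, centers, n_iter=10):
    """Plain k-means: alternate between assigning points and moving centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Distance of every point to every center, shape (n_points, n_centers).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):          # leave an empty cluster where it is
                centers[k] = data[labels == k].mean(axis=0)
    return labels, centers

# Made-up data: two groups of stimuli described by (feature 1, feature 2).
data = np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9],    # e.g. "trees"
                 [4.0, 4.1], [4.2, 3.9], [3.9, 4.0]])   # e.g. "something else"

# Start the two centers at two of the data points (an arbitrary choice).
labels, centers = kmeans(data, centers=[data[0], data[3]])
print(labels)
print(centers)
```

The algorithm never sees the words "tree" or "something else"; it only groups points that lie close together in feature space.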
[Figure: data points in the plane of feature 1 and feature 2, grouped into "Trees" and "Something else"]

Reinforcement-Learning: Some examples
• An agent is in a state S
• It can take one of several actions
• Each action leads to a new state S’.
• In some states, the agent is rewarded or punished.
• Goal: Maximize reward

S = State
S’ = new state
A = Action
R = Reward

Agent environment interaction
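[Figure: agent and environment]

Learning through interaction with the environment
[Figure: at time t the agent receives state St and reward Rt from the environment and emits action At; the environment then returns the new state St+1 and reward Rt+1]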
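The interaction loop (state St, action At, reward Rt+1, new state St+1) can be sketched with tabular Q-learning on a toy corridor. Q-learning is not named on the slides; it is used here as one standard example. The corridor, the reward of 1 at the right end, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

N_STATES = 5            # corridor states 0..4; state 4 is the rewarded goal
ACTIONS = (-1, +1)      # action 0 = step left, action 1 = step right

def step(state, action):
    """Environment: return the new state S' and the reward R."""
    new_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if new_state == N_STATES - 1 else 0.0
    return new_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, len(ACTIONS)))   # value estimate per (state, action)
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = int(rng.integers(len(ACTIONS)))   # explore at random
            new_state, reward = step(state, action)
            # Move Q(S, A) towards R + gamma * max_a Q(S', a).
            target = reward + gamma * q[new_state].max()
            q[state, action] += alpha * (target - q[state, action])
            state = new_state
    return q

q = q_learning()
policy = q[:-1].argmax(axis=1)   # greedy action in each non-goal state
print(policy)
```

The agent behaves randomly while learning, yet the greedy policy read out from the learned values heads for the rewarded state.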
[Figure labels: sensory stimulation; perceptual/behavioural changes; stimulus, response + reward; learning; synaptic plasticity]

Learning and synaptic plasticity
Synaptic plasticity: change in connection strengths

Behavioural learning and synaptic plasticity
Synaptic plasticity: change in connection strengths
[Figure: a synapse, labelled with axon terminal, vesicle, neurotransmitter, synaptic cleft, receptor and dendrite]

Learning through synaptic plasticity
[Figure: an action potential (spike) of pre-synaptic neuron j and the response of post-synaptic neuron i, drawn as amplitude over time, shown once before learning and once after learning; the synapse is labelled with axon terminal, vesicle, neurotransmitter, synaptic cleft, receptor and dendrite]

Hebb’s postulate
“When an axon of cell j repeatedly or persistently takes part in firing cell i, then j’s efficiency as one of the cells firing i is increased”
Donald O. Hebb, The Organization of Behavior (1949)

Hebbian postulate explained: cell assemblies
• Items are encoded by groups of cells, so-called assemblies
• If an item is sensed or recalled, the neurons in the assembly are activated
• As a result, the strength of the connection in the assembly increases
[Figures: the assembly for item A; the assembly for item B; partial activation of A]
Partial activation of A triggers activation of the remaining neurons → pattern completion

Hebbian learning in experiments (schematic)
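Pattern completion in a cell assembly, as described for item A, can be illustrated with a very small network. This is a minimal sketch under stated assumptions: eight binary neurons, two non-overlapping assemblies, Hebbian storage by outer products and a single threshold update. None of these choices come from the slides.

```python
import numpy as np

N = 8
item_a = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # assembly encoding item A
item_b = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # assembly encoding item B

# Hebbian storage: strengthen connections between co-active neurons.
weights = np.zeros((N, N))
for pattern in (item_a, item_b):
    weights += np.outer(pattern, pattern)
np.fill_diagonal(weights, 0)                  # no self-connections

def recall(cue, threshold=1):
    """One update step: a neuron is active if it was cued or if its
    summed input from active neurons reaches the threshold."""
    drive = weights @ cue
    return np.where((drive >= threshold) | (cue == 1), 1, 0)

partial_a = np.array([1, 1, 0, 0, 0, 0, 0, 0])  # only half of assembly A is active
completed = recall(partial_a)
print(completed)
```

Because the strengthened connections only link neurons of the same assembly, the partial cue recruits the rest of A and leaves B silent.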
[Figure, three stages:
 1. A spike of pre-synaptic neuron j evokes an EPSP u in post-synaptic neuron i through the weight wij; no spike of i.
 2. Both neurons are simultaneously active.
 3. The same pre-synaptic spike now evokes a larger EPSP; still no spike of i.]
Increased amplitude ⇒ Δwij > 0

Hebbian plasticity
Donald Hebb’s postulate (1949): When an axon of cell j repeatedly or persistently takes part in firing cell i, then j’s efficiency as one of the cells firing i is increased
[Figure labels: pre j; post i, k; weight wij; time]
• learns correlations (simultaneous activity)
• acts locally on the neurons activated

Summary: Hebbian plasticity in experiments
• Synaptic changes are induced by co-activation of pre- and post-synaptic neurons
• Changes persist for a long time
• Changes can lead to an increase or decrease of the post-synaptic potential
[Figure labels: pre j, post i]

Functionality
• useful for learning a new behaviour
• useful for development (e.g., wiring for receptive field development)
• useful for activity control in network (homeostasis)
• useful for coding

Hebbian learning is unsupervised learning
[Figure labels: pre, post, i, j, local]

Reinforcement learning = Hebb + reward signal
[Figure labels: SUCCESS, pre, post, i, j, local, global]
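The difference between a purely local Hebbian update and one that also uses a global success signal can be sketched in a few lines. Multiplying the Hebbian term by the reward is one common reading of "Hebb + reward signal"; it is an assumption here, as are the function names and the learning rate.

```python
def hebb_update(w, pre, post, eta=0.1):
    """Unsupervised Hebbian step: uses only local pre/post activity."""
    return w + eta * pre * post

def reward_modulated_update(w, pre, post, reward, eta=0.1):
    """Same local term, gated by a global reward (success) signal."""
    return w + eta * reward * pre * post

w = 0.5
print(hebb_update(w, pre=1.0, post=1.0))                           # grows on co-activation
print(reward_modulated_update(w, pre=1.0, post=1.0, reward=0.0))   # no success: unchanged
print(reward_modulated_update(w, pre=1.0, post=1.0, reward=1.0))   # success: grows
```

In the second variant, co-activation alone is not enough: the weight only changes when the global signal reports success.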
Types of learning
Unsupervised learning
• Aim: detect structure in data
• exploits statistical structure of the data (stimuli)
• Important during development

Reinforcement learning
• Aim: learn new skills and actions
• maximises expected reward for actions
• Important to learn from experience

[Figure: two synapse diagrams with pre j and post i, one with local signals only and one with an additional success signal]
Simple firing rate neuron model
[Figure labels: spikes; spike/firing rate; weight; threshold; stimulus; response]
x - stimulus
y - response
activity = number of spikes per unit time
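A rate neuron with a weight and a threshold can be written as a single function. The slides do not state the exact transfer function, so the threshold-linear form below is an assumption, as are the function names and example values.

```python
def firing_rate(stimulus, weight=1.0, threshold=0.0):
    """Threshold-linear rate neuron: silent below threshold, linear above."""
    return max(0.0, weight * stimulus - threshold)

def rate_from_spikes(spike_times, duration):
    """Activity as number of spikes per unit time."""
    return len(spike_times) / duration

print(firing_rate(2.0, weight=0.5, threshold=0.4))   # weighted stimulus above threshold
print(firing_rate(0.5, weight=0.5, threshold=0.4))   # below threshold: no response
print(rate_from_spikes([0.1, 0.4, 0.5, 0.9], duration=1.0))
```

The first function maps a stimulus x to a response y; the second turns a recorded spike train into the rate that such a model describes.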

Rate-based Hebbian learning
[Figure labels: pre j, post i]
Change of strength depends on the pre- and post-synaptic firing rates:
Taylor expansion:
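A sketch of both expressions in a common rate-model notation. The symbols are assumptions and are not taken from the slides: ν_j and ν_i denote the pre- and post-synaptic firing rates, F is an unspecified function, and the coefficient names c are hypothetical labels. The expansion is taken around ν_i = ν_j = 0.

```latex
% General form: the weight change depends on the weight and both rates
\frac{d w_{ij}}{dt} = F\left(w_{ij};\, \nu_i, \nu_j\right)

% Taylor expansion of F in the two rates
\frac{d w_{ij}}{dt} =
    c_0(w_{ij})
  + c_1^{\mathrm{pre}}(w_{ij})\,\nu_j
  + c_1^{\mathrm{post}}(w_{ij})\,\nu_i
  + c_2^{\mathrm{pre}}(w_{ij})\,\nu_j^{2}
  + c_2^{\mathrm{post}}(w_{ij})\,\nu_i^{2}
  + c_{11}^{\mathrm{corr}}(w_{ij})\,\nu_i\,\nu_j
  + \mathcal{O}(\nu^{3})
```

In this notation, the simplest Hebbian rule keeps only the correlation term, with a positive coefficient multiplying the product ν_i ν_j.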
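Hebb’s learning rule
[Figure labels: pre j, post i]
The Hebbian weight change, driven by the product of the pre- and post-synaptic firing rates, is always ≥ 0.
▪ In a system where weights can only grow, eventually all weights will reach their maximum value!

Weight homeostasis in Hebb’s learning rule
[Figure labels: pre j, post i]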
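The saturation problem, and one way a homeostatic term can counteract it, can be seen in a short simulation. The decay term used below is an assumed, simple form of weight homeostasis and is not taken from the slides; the rates are random numbers in [0, 1) and all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, w_max, steps = 0.05, 1.0, 2000

w_hebb = np.full(5, 0.1)     # plain Hebb, clipped at w_max
w_decay = np.full(5, 0.1)    # Hebb plus a weight-dependent decay (assumed form)

for _ in range(steps):
    pre = rng.random(5)              # non-negative pre-synaptic rates
    post = rng.random()              # non-negative post-synaptic rate
    hebb_term = eta * pre * post     # never negative, so it can only add
    w_hebb = np.minimum(w_hebb + hebb_term, w_max)
    w_decay = w_decay + hebb_term - eta * w_decay   # decay balances growth

print(w_hebb)    # every weight sits at w_max
print(w_decay)   # weights stay below w_max
```

With plain Hebbian growth all weights end up identical at the upper bound, so they no longer carry information about the inputs; the decay term keeps them in a range where differences between inputs can still show.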
Hebbian Learning: rate model
[Figure labels: pre j; post i, k]
Sign of the weight change for each combination of post- and pre-synaptic activity:

post/pre                   ON/ON   ON/OFF   OFF/ON   OFF/OFF
Simple Hebbian               +       0        0        0
Gated                        +       -        -        -
Post-synaptically gated      +       0        -        0
Pre-synaptically gated       +       -        0        0
Covariance rule              +       -        -        +

Example: Oja’s rule
Oja’s Rule (1982)
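In its usual textbook form, Oja's rule for a linear neuron with output y = w·x updates the weights by Δw = η y (x − y w): a Hebbian term η y x plus a decay η y² w that keeps the weight vector from growing without bound. The sketch below applies this update to made-up two-dimensional inputs; the data, the learning rate and the starting weights are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 2-D inputs: large variance along `main`, small variance across it.
main = np.array([1.0, 1.0]) / np.sqrt(2.0)
across = np.array([-1.0, 1.0]) / np.sqrt(2.0)
n = 5000
x_all = (2.0 * rng.standard_normal(n)[:, None] * main
         + 0.3 * rng.standard_normal(n)[:, None] * across)

w = np.array([1.0, 0.0])   # arbitrary starting weights
eta = 0.005                # learning rate
for x in x_all:
    y = w @ x                      # linear rate neuron: post-synaptic activity
    w += eta * y * (x - y * w)     # Hebbian term y*x minus decay y^2 * w

print(np.linalg.norm(w))   # close to 1
print(abs(w @ main))       # close to 1: w lines up with the main direction
```

The weight vector keeps roughly unit length and turns towards the direction of largest input variance, which is why Oja's rule is often described as extracting the first principal component.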
Summary: Types of learning
Unsupervised learning
• Aim: detect structure in data
• exploits statistical structure of the data (stimuli)
• Important during development

Reinforcement learning
• Aim: learn new skills and actions
• maximises expected reward for actions
• Important to learn from experience

[Figure: two synapse diagrams with pre j and post i, one with local signals only and one with an additional success signal]
Summary
• Neurorobotics is the study of embodied neural systems
• It studies the interaction of a neural system with its environment
• The body is the interface between the neural system and the environment
• Learning is the ability of an agent to acquire new sensory or motor skills
• Sensory skills are e.g. the development of visual orientation selectivity → feature discrimination
• Motor skills are e.g. the abilities to walk or to orient in a maze
• Learning is a system property.
• Synaptic plasticity is a synaptic property.
• Learning requires plasticity, but not vice versa.