arXiv:1605.03269v1 [cs.RO] 11 May 2016

A Hierarchical Emotion Regulated Sensorimotor Model: Case Studies

Junpei Zhong 1,2, Rony Novianto 3, Mingjun Dai 4, Xinzheng Zhang 5, Angelo Cangelosi 1

1. Centre for Robotics and Neural Systems, Plymouth University, Plymouth, PL4 8AA, United Kingdom. E-mail: {junpei.zhong, a.cangelosi}@plymouth.ac.uk
2. Laboratory for Intelligent Dynamics and Representation, Waseda University, Tokyo, Japan.
3. Centre of Quantum Computation and Intelligent Systems, University of Technology, Sydney, Australia. E-mail: [email protected]
4. Shenzhen Key Lab of Advanced Communication and Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen, China. E-mail: [email protected]
5. School of Electrical and Information Engineering, Jinan University, Zhuhai, China. E-mail: [email protected]

Abstract: Inspired by the hierarchical cognitive architecture and the perception-action model (PAM), we propose that the internal status acts as a kind of common-coding representation which affects, mediates and even regulates the sensorimotor behaviours. This regulation can be depicted in the Bayesian framework; it explains why cognitive agents are able to generate behaviours with subtle differences according to their emotions, or to recognize an emotion from perception. A novel recurrent neural network with parametric bias units (RNNPB), which runs in three modes and constructs a two-level emotion regulated learning model, was further applied to test this theory in two different cases.

Key Words: Embodied Emotion Modelling, Cognitive Architecture, Recurrent Neural Networks, Non-verbal Emotion Expression, Social Robotics

Acknowledgements: JZ and AC were supported by the EU project POETICON++ under grant agreement 288382 and the UK EPSRC project BABEL. MD was supported by the NSF of China (61301182), the Specialized Research Fund for the Doctoral Program of Higher Education (20134408120004), the NSF of Guangdong (S2013040016857), the Yumiao Engineering Program of the Education Department of Guangdong (2013LYM 0077), the Foundation of Shenzhen City (KQCX20140509172609163), and the NSF of Shenzhen University (00002501, 00036107). JZ would like to acknowledge the data-set and inspirations from L. Cañamero and M. Lewis from the University of Hertfordshire.

1 Introduction

It is widely agreed that the internal state mediates and interacts with various cognitive processes, including sensorimotor processes, in the sense that sensorimotor expressions, for instance the physical expression of emotions, are partially caused by, or co-occur with, motor primitives. Studies [1, 3, 5, 7] have found that body languages constitute a subtle signal of affective information, as certain motor behaviours of the human body can be thought of as expressing emotional states. For instance, the tilt angle of the head can be linked with inferiority- and superiority-related emotions [12]. Since such behaviour modulation can be accounted for as part of the cortical responses (such as arousal) to the neurotransmitter dopamine, these bodily states tend to be relatively slow to arise and to dissipate. This facilitates people in differentiating emotions by observing these subtle and slow behaviours.

Conversely, similar to what we do in our daily social communication, people are able to recognize others' emotions either by static postures [23] or by dynamic motions [22]. Studies suggest that there exists a hierarchical coding system [2, 7, 6] or a set of 'critical features' [17] for emotion recognition. From a technical perspective, the relations between sensorimotor behaviours and emotions are also a useful learning model 1) to endow an artificial agent (e.g. a social robotic platform) with natural and non-verbal emotion expression, and 2) to recognize the counterpart's emotion. Thus, extending the hierarchical sensorimotor architecture of [25], we propose an emotion regulation model based on the perception-action model (PAM) theory [15]. A two-level neural learning model will be used to realize such an architecture. This neural model will show its feasibility in two different cases of personalized emotion generation and motor recognition.

2 A Bayesian Framework of the Emotion Regulated Perception-Action Model

The framework of the PAM is based on the common coding theory, which advocates that action and perception are intertwined by sharing the same representational basis [16]. In the perception-action model (PAM) [15], we propose that the common representation between perception and action is simply formed by either the mapping from perception or the perceptual events that actions produce. Specifically, the representation does not explicitly represent actions; instead, there is an encoding of the possible future percept which is learnt from prior sensorimotor knowledge. Therefore, this perception-action framework is derived from the ideomotor principle [9], which advocates that actions are represented with predictions of their perceptual consequences, i.e. the representation encodes the forthcoming perception that is going to happen when an action is executed (i.e. motor imagery) [4].

Due to the modulation role of the emotion state, we propose that there is an intermediate level of internal state between the cognitive processes and the perception-action cycle. As we stated, this level of internal state, such as emotion, regulates the expressions of sensorimotor behaviours. To some extent, we can also regard this level of internal state as a level of common coding. As such, the representation of the internal state reflects both the perception input and the motor imagination, forming an imagination of affective understanding such as empathy. Instead of a direct linkage between action and perception (e.g. sensorimotor contingency), the common coding theory proposes that perception and action may modulate each other directly via the shared coding by a similarity-based matching of common codes, in our model even regulated by the internal state. Therefore, the pairing of perception and action, i.e. the acquisition of the 'common coding', emerges from the prior experience of the agent. For instance, assume we have the hierarchical PAM as a two-level architecture; it then degenerates to the model stated in [14]. If the agent (called the 'observer') observes that a person (called the 'presenter') is hurt by a knife, the model may trigger involuntary and subtle movements while the agent performs a certain kind of hand movement (e.g. moving the arm a little). The observer may even feel sympathy about the incident. In this example, both the current afferent information (referring to the perceived event) and the predictive efferent information (referring to events intended from actions) have the same format and structure as the internal state representation.

From the aforementioned sensorimotor functions, we propose a hierarchical cognitive architecture, as shown in Fig. 1, focusing on the emotion regulated functions of the sensorimotor behaviours. Following the common coding framework of perception and action, the information of the feedback pathways is formed through various levels, regulated by the internal states of the cognitive agent in the hierarchical architecture. Between the cognitive processes and the hierarchical motor controls, the internal state (such as emotion) regulates perception-action cycle functions such as perception, motor imagery, and action planning. To establish the links between movements (a), the sensory effects (e) and the internal state (s), one may need one or more processes of latent learning [21], reinforcement learning [24] or evolutionarily hard-wired acquisition [10]. This link, once established, can be described by the following operations:

• First, these associations allow the agent to estimate the internal state, given the perceived behaviours performed by others, by means of forward models (e.g. a Bayesian model) (e → s). In the formulation of Bayesian inference, this can be written as Eq. 1:

P(S|A, E) ∝ P(S)P(A|S)    (1)

where S is the internal state to be estimated, given an executed action A (e.g. motor imagination by the observer itself) and the perception E. The term P(A|S) represents a pre-learnt internal model of the possibility that a motor action A will be executed given a certain internal state S. Since this model is learnt solely by the observer itself, it may differ across cultures, ages and personalities. Therefore, learning such an internal model is crucial. Here another assumption is held: one's own action (or imagined action) A is the same as its perceived action (E).

• Second, these associations allow the agent to move with a certain voluntary or involuntary behaviour given an internal state. From the backward computations introduced in Eq. 2 (s → a), a predictive/generative sensorimotor integration occurs:

P(A|S) ∝ P(A)P(S|E) + P(A)P(S|·)    (2)

where A indicates a motor action given the internal state S. In the equation we omit the factor of the goal, although it is also the main target of any voluntary action. Here we assume that one's internal state is determined by the current sensory input together with further factors (P(S|·)).

• In terms of its hierarchical organization, the architecture also allows the following operation: with bidirectional information pathways, a low-level perception representation can be expressed on a higher level, with a more complex receptive field, and vice versa (e_low ↔ e_high). These operations can be achieved by extracting statistical regularities, which may be expressed as deep learning architectures.

Thus, this Bayesian framework proposes that an internal state, as one of the priors of the feedback pathways, represents a linkage between perception and motor actions. We will propose a simple learning model addressing the first two operations in the next section. After that, we will demonstrate two case studies of its feasibility.

3 A Learning Model

3.1 Parametric Bias

The Recurrent Neural Network with Parametric Bias units [19, 20, 28, 27, 26] (Fig. 2) is a family of recurrent neural networks that contain an additional set of bias units acting as bifurcation parameters for the nonlinear dynamics. In other words, the adjustable values of such a small number of units can determine different dynamics in a network of large dimension. Unlike ordinary biases, these PB units are trained in a self-organising way.

3.2 Three Modes

There are three running modes in an RNNPB network, namely the learning, recognition and generation modes. They functionally simulate different stages between sensorimotor sequences and the high-level internal states of these sequences. The neural dynamics and algorithms of these modes are as follows:

• Learning mode (Fig. 3a): The learning algorithm performs in an off-line and supervised way, as in generic recurrent neural networks. When the training stimulus provides a new pattern, the weights connecting the neurons are updated with BPTT (back-propagation through time). Besides, the residual error from the BPTT updates the internal values of the PB units; these internal values are updated in a self-organising way by the error derived from the back-propagation. To achieve a slow update of the internal value, in each epoch e, the k-th PB unit updates its internal value based on the summation of the back-propagated error over one complete sequence.

• Recognition mode (Fig. 3b): In this mode, the network recognises the type of a sequence by updating the internal values of the PB units. The information flow in this mode is mostly the same as in the learning mode, i.e. the error is back-propagated from the output neurons to the hidden neurons, but the synaptic weights are not updated; rather, the back-propagated error only contributes to the updating of the PB units. By this means, if a trained sequence is presented to the network, the values of the PB units will converge to the ones that were previously obtained in the learning mode, so as to restore the PB values trained before.

• Prediction mode (Fig. 3c): After learning, once the synaptic weights are determined, the RNNPB can act in a closed-loop way: the output prediction can be used as the input for the next time step. In principle, the network can generate a trained sequence by providing an initial value of the input and externally setting the PB values.

Figure 1: A Hierarchical Perception-Action Model with Action and Visual Dorsal Stream

Figure 2: Recurrent Neural Network with Parametric Bias Units
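The role of the PB units as bifurcation parameters can be illustrated with a minimal fixed-weight recurrent network in closed-loop (prediction-mode) operation. This is only a sketch: the weights are random and illustrative, not the paper's trained model, and the PB values are set by hand rather than learnt.

```python
import numpy as np

rng = np.random.default_rng(0)

n_hidden = 8
W_rec = rng.normal(0, 0.8, (n_hidden, n_hidden))   # fixed recurrent weights
W_pb = rng.normal(0, 1.0, (n_hidden, 2))           # weights from the 2 PB units
W_out = rng.normal(0, 1.0, (1, n_hidden))          # single output neuron

def generate(pb, steps=50):
    """Closed-loop rollout: the PB vector is held constant and biases
    the hidden dynamics at every time step (cf. prediction mode)."""
    h = np.zeros(n_hidden)
    outputs = []
    for _ in range(steps):
        h = np.tanh(W_rec @ h + W_pb @ pb + 0.1)   # PB acts as an extra bias
        outputs.append((W_out @ h).item())
    return np.array(outputs)

seq_a = generate(np.array([1.0, -1.0]))
seq_b = generate(np.array([-1.0, 1.0]))
# Identical weights, different PB values -> different generated dynamics.
print(np.abs(seq_a - seq_b).mean())
```

The point of the sketch is that a two-dimensional PB vector alone selects among qualitatively different sequences while the high-dimensional weight matrices stay untouched.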
The PB units are trained from the training sequences with back-propagation through time (BPTT) in a self-organizing way, so that the units exhibit the differences between the features of the training sequences. Furthermore, acting as bifurcation parameters of nonlinear functions, the parametric bias units (PB units) endow the network with the ability to learn different nonlinear dynamic attractors, which makes it advantageous over generic recurrent networks. Due to these features, an RNNPB network can be regarded as a two-layer hierarchical structure for cognitive systems. Based on this feature, the network has also been adopted to mimic cognitive processes such as language acquisition [27], language generalisation [18] and other cognitive processes.

Figure 3: Simulation results for the network: (a) Learning Mode; (b) Recognition Mode; (c) Generation Mode.

3.2.1 Learning Mode

During learning mode, the training progress is basically determined by the cost function C; following gradient descent, each weight update in the network is proportional to the negative gradient of the cost with respect to the specific weight w_{ij} that will be updated:

Δw_{ij} = −η_{ij} · ∂C/∂w_{ij}    (3)

where η_{ij} is the adaptive learning rate of the weight between neurons i and j, which is adjusted in every epoch. To determine whether the learning rate has to be increased or decreased, we compute the changes of the weight w_{i,j} in consecutive epochs:

σ_{i,j} = (∂C/∂w_{i,j})(e−1) · (∂C/∂w_{i,j})(e)    (4)

The update of the learning rate is

η_{i,j}(e) = min(η_{i,j}(e−1) · ξ+, η_max)   if σ_{i,j} > 0
             max(η_{i,j}(e−1) · ξ−, η_min)   if σ_{i,j} < 0
             η_{i,j}(e−1)                    otherwise       (5)

where ξ+ > 1 and ξ− < 1 represent the increasing/decreasing rates of the adaptive learning rates, with η_min and η_max as lower and upper bounds, respectively. Thus, the learning rate of a particular weight increases by ξ+ to speed up the learning when the changes of that weight in two consecutive epochs have the same sign, and vice versa.

As mentioned before, besides the usual weight update according to back-propagation through time, the accumulated error over the whole time-series also contributes to the update of the PB units. The update for the i-th unit in the PB vector for a time-series of length T is defined as:

ρ_i(e+1) = ρ_i(e) + γ_i Σ_{t=1..T} δ^{PB}_{i,t}    (6)

where δ^{PB} is the error back-propagated to the PB units, e indexes the epoch over the whole time-series, and γ_i is the PB units' adaptive updating rate, which is proportional to the absolute mean value of the back-propagation error at the i-th PB node over the complete time-series of length T (Eq. 7).

3.2.2 Recognition Mode

The recognition mode is executed with a similar information flow as the learning mode: given a set of spatio-temporal sequences, the error between the target and the real output is back-propagated through the network to the PB units. However, the synaptic weights remain constant and only the PB units are updated, so that the PB units self-organize towards the pre-trained values after a certain number of epochs. Assuming the length of the observed sequence is a, the update rule is defined as:

ρ_i(e+1) = ρ_i(e) + γ Σ_{t=T−a..T} δ^{PB}_{i,t}    (8)

where δ^{PB} is the error back-propagated from a certain sensory information sequence to the PB units, and γ is the updating rate of the PB units in recognition mode, which should be larger than the adaptive rate γ_i used in the learning mode.

3.2.3 Generation Mode

The values of the PB units can also be manually set or obtained from recognition, so that the network can generate the upcoming sequence with one-step prediction. Furthermore, according to [8], a trained RNNPB not only can retrieve and recognise different types of pre-learnt, non-linear oscillation dynamics, as an expansion of the storage capability of working memory within the sensory system, but also adds the generalisation ability to recognise and generate untrained non-linear dynamics and their compositions.
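The essence of recognition mode, holding the synaptic weights frozen and moving only the PB values down the prediction-error gradient, can be sketched with a toy generator. Here an illustrative sine generator stands in for a trained RNNPB, and the gradient is taken numerically; none of these choices come from the paper.

```python
import numpy as np

def frozen_net(pb, steps=40):
    """Stand-in for a trained network with frozen weights: the PB vector
    parameterises the generated dynamics (illustrative only)."""
    t = np.arange(steps)
    return np.sin(0.2 * t + pb[0]) * pb[1]

target_pb = np.array([0.8, 1.5])
observed = frozen_net(target_pb)          # sequence shown to the network

pb = np.array([0.0, 1.0])                 # initial PB guess
gamma = 0.05                              # PB updating rate (cf. Eq. 8)
for epoch in range(2000):
    # Numerical gradient of the prediction error w.r.t. the PB values only;
    # the "weights" inside frozen_net are never touched.
    err0 = np.mean((frozen_net(pb) - observed) ** 2)
    grad = np.zeros_like(pb)
    for i in range(len(pb)):
        d = np.zeros_like(pb)
        d[i] = 1e-5
        grad[i] = (np.mean((frozen_net(pb + d) - observed) ** 2) - err0) / 1e-5
    pb -= gamma * grad

print(pb)   # moves towards the PB value that generated the observation
```

As in the RNNPB, a trained sequence presented to the frozen model drives the free PB parameters back towards the values that would have generated it.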

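The sign-based learning-rate adaptation of Eqs. 4 and 5 resembles the Rprop family of schemes and can be sketched directly. The constants ξ+, ξ−, η_max, η_min and the initial rate follow Tab. 1; the gradient history is invented for illustration.

```python
XI_PLUS, XI_MINUS = 1.000001, 0.999999    # increasing/decreasing rates (Tab. 1)
ETA_MAX, ETA_MIN = 1.0e-4, 1.0e-8         # upper/lower bounds (Tab. 1, Case 1)

def update_learning_rate(eta_prev, grad_prev, grad_now):
    """Eqs. 4-5: grow eta while the gradient keeps its sign across
    consecutive epochs, shrink it when the sign flips."""
    sigma = grad_prev * grad_now          # Eq. 4: sign test on the gradients
    if sigma > 0:
        return min(eta_prev * XI_PLUS, ETA_MAX)
    if sigma < 0:
        return max(eta_prev * XI_MINUS, ETA_MIN)
    return eta_prev                       # no change when sigma == 0

eta = 2.0e-6                              # initial learning rate (Tab. 1)
grads = [0.3, 0.2, 0.25, -0.1, -0.05]     # illustrative gradients of one w_ij
for g_prev, g_now in zip(grads, grads[1:]):
    eta = update_learning_rate(eta, g_prev, g_now)
print(eta)
```

With ξ+ and ξ− this close to 1, the rate drifts very slowly, which matches the paper's emphasis on stable rather than fast convergence of the PB dynamics.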
The adaptive updating rate γ_i introduced above is proportional to the absolute mean value of the back-propagated error at the i-th PB node over the complete time-series of length T:

γ_i ∝ (1/T) |Σ_{t=1..T} δ^{PB}_{i,t}|    (7)

The reason for applying this adaptive technique is that it is otherwise hard for the PB units to converge in a stable way. Usually a smaller learning rate is used in the generic version of the RNNPB to ensure the convergence of the network, which results in a trade-off against convergence speed. The adaptive learning rate we used is an efficient technique to overcome this trade-off.

4 Case Studies

4.1 Case 1: Generation of Non-verbal Emotion Expression by Avatar Simulation

In this demonstration, we used a simulation based on an RNNPB with the parameters shown in Tab. 1 to generate specialised non-verbal robot behaviours. We separated the training data into two kinds based on two different behaviours, in order to eliminate the subtle differences between the behaviours during generation; correspondingly, we omit the goal G in Eq. 2. The training data were taken from an inertial motion capture system by Xsens (www.xsens.com/en/general/mvn), while the actor was performing two sets of basic behaviours (standing and walking) expressing five kinds of emotions (joy, sadness, fear, anger, pride). This three-dimensional skeleton motion capture was used instead of camera-based systems such as the Microsoft Kinect, because it would be easier to map the generated motion onto humanoid robots for further project developments. Also, motions from various body parts could be well isolated in a quantitative way for network training. Such quantitative data can be reviewed and utilised in external editors and programs (Fig. 4). The sampling frequency of this system was 120 Hz.

Figure 4: Three Dimensional Skeleton

After training, as depicted in Eq. 2, given the PB values we obtained from training, we can generate pre-trained or un-trained temporal sequences in the network's generation mode. These behaviours were represented in a 78-dimensional data-set by reconstructing the network output. These simulations were done by adjusting the PB units to the pre-trained values, and to novel values at the midpoints between pairs of pre-trained values in the PB space. In the avatar demonstrations, only the body skeleton was shown by means of lines connecting the skeleton points. Part of the demonstrations can be viewed online (http://youtu.be/9yahOKcEi-A). Apart from the trained emotions, some 'interval' emotions could also be perceived from the behaviours when the novel points in the PB space were selected.

4.2 Case 2: Personalised Emotion Recognition

In the second demonstration, the recognition mode of the network was investigated (Eq. 1), using the parameters shown in Tab. 1. Here we also test the real-time recognition ability of this method. Training was done by capturing Kinect tracking data via OpenNI from one person. We only selected the upper body of the data (9 dimensions: head, neck, torso, left shoulder, left elbow, left hand, right shoulder, right elbow, right hand), since the emotion is more significant in the upper body. After training, the Kinect captured the data again to test whether the network could distinguish the emotion correctly. The real-time demonstration video (http://youtu.be/JusCuKvHg44) showed that the PB value has a slight delay depending on the choice of the updating rate. However, such a delay can be used as a confidence level to simulate decision processes for the robot.

To quantitatively evaluate the performance of the learning, we also compared the differences between the original PB values and the recognised PB values. During recognition, a stopping criterion was set: updating stopped if the change of the PB values was smaller than a threshold of 0.1 for 100 consecutive iterations. Tab. 2 lists the distances of the PB values under the training and recognition modes. Although during the experiments we could not guarantee that the training and recognition data-sets were identical, we can still observe that behaviours from the same emotion drove the PB values closer to each other. Furthermore, we can also notice that the behaviours from 'joy' and 'pride', and from 'fear' and 'anger', were also quite close, which may indicate some connections between these two pairs of emotions.

5 Summary

In this paper, we proposed a cognitive framework focusing on the regulation role of emotions with respect to sensorimotor behaviours. This hierarchical architecture is based on the perception-action model, in which the internal state (emotion) acts as a prior in the feedback pathways to further modulate perception and actions. Following the two proposed Bayesian inferences between the internal state and the sensorimotor behaviours, we propose to use a recurrent neural network with parametric bias units (RNNPB) to construct a two-layer architecture in which the parametric bias represents an internal state. The non-verbal expression generation and emotion recognition demonstrations witness the feasibility of this hierarchical architecture. In the next steps, a few possibilities can be explored to further extend this architecture: 1) more factors, such as the goal, perception from previous states and prior knowledge, can be considered in the Bayesian inferences; 2) a more sophisticated hierarchical learning method can be used to extract and generate more statistical regularities; 3) we will further extend this hierarchical PAM model with the ASMO architecture [13] to allow flexible attention between various conditions, as well as modulation by internal status such as emotion.

Table 1: Network parameters

Parameter | Description                                            | Value
η         | Initial learning rate                                  | 2.0 × 10^-6
η_max     | Maximum value of the learning rate (Case 1)            | 1.0 × 10^-4
η_min     | Minimum value of the learning rate (Case 1)            | 1.0 × 10^-8
η_r       | Updating rate in Case 2                                | 8.0 × 10^-3
M_γ       | Proportionality constant of the PB units' updating rate | 0.001
n_h       | Size of hidden layer                                   | 100
n_PB      | Number of PB units                                     | 2
ξ−        | Decreasing rate of the learning rate                   | 0.999999
ξ+        | Increasing rate of the learning rate                   | 1.000001

REFERENCES

[1] M. Argyle. Bodily communication. Routledge, 2013.

[2] N. Dael, M. Mortillaro, and K. R. Scherer. The body action and posture coding system (BAP): Development and reliability. Journal of Nonverbal Behavior, 36(2):97–121, 2012.

[3] C. Darwin. The expression of the emotions in man and animals. Oxford Uni. Press, 1998.

[4] A. G. Greenwald. Sensory feedback mechanisms in performance control: with special reference to the ideomotor mechanism. Psychol. Rev., 77(2):73, 1970.

Train \ Rec. |    joy |  pride |   fear |  anger | sadness
joy          | 0.3587 | 0.9755 | 3.5893 | 3.2201 | 5.8812
pride        | 0.9074 | 1.0143 | 3.5741 | 4.1577 | 7.6670
fear         | 3.1214 | 4.0362 | 0.9783 | 1.2778 | 3.0431
anger        | 4.2789 | 3.6609 | 1.2128 | 0.9955 | 9.1021
sadness      | 4.6976 | 7.2737 | 3.4684 | 9.4351 | 0.8954

Table 2: Differences between PB Values (Standing Behaviour); rows index the trained emotion, columns the recognised emotion.
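The reading of Tab. 2 can be checked mechanically with a nearest-trained-PB rule. The matrix below copies the standing-behaviour distances from the table; the nearest-neighbour interpretation itself is our own illustrative check, not an experiment from the paper, and it also reproduces the joy/pride proximity noted in the text.

```python
import numpy as np

emotions = ["joy", "pride", "fear", "anger", "sadness"]

# Rows: trained emotion; columns: recognised emotion (Tab. 2, standing).
dist = np.array([
    [0.3587, 0.9755, 3.5893, 3.2201, 5.8812],
    [0.9074, 1.0143, 3.5741, 4.1577, 7.6670],
    [3.1214, 4.0362, 0.9783, 1.2778, 3.0431],
    [4.2789, 3.6609, 1.2128, 0.9955, 9.1021],
    [4.6976, 7.2737, 3.4684, 9.4351, 0.8954],
])

# For each recognised sequence, find the closest trained PB value.
for j, rec in enumerate(emotions):
    nearest = emotions[int(np.argmin(dist[:, j]))]
    print(f"recognised {rec!r} -> nearest trained PB: {nearest!r}")
```

Running the check, most recognised emotions land closest to their own trained PB value, while the recognised 'pride' sequence sits marginally closer to the trained 'joy' value, consistent with the connection between these emotions suggested above.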

[5] N. Hadjikhani and B. de Gelder. Seeing fearful body expressions activates the fusiform cortex and amygdala. Curr. Biol., 13(24):2201–2205, 2003.

[6] E. M. Huis et al. The body action coding system II: Muscle activations during the perception and expression of emotion. Frontiers in Behavioral Neuroscience, 8, 2014.

[7] E. M. Huis in 't Veld, G. J. Van Boxtel, and B. de Gelder. The body action coding system I: Muscle activations during the perception and expression of emotion. Social Neuroscience, 9(3):249–264, 2014.

[8] M. Ito and J. Tani. Generalization in learning multiple temporal patterns using RNNPB. In Neural Information Processing, pages 592–598. Springer, 2004.

[9] W. James. The consciousness of self. The Principles of Psychology, 8, 1890.

[10] J. LeDoux. The amygdala. Current Biology, 17(20):R868–R874, 2007.

[11] M. Lewis and L. Cañamero. Are discrete emotions useful in human-robot interaction? Feedback from motion capture analysis. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), pages 97–102. IEEE, 2013.

[12] A. Mignault and A. Chaudhuri. The many faces of a neutral face: Head tilt and perception of dominance and emotion. J. of Nonverbal Behavior, 27(2):111–132, 2003.

[13] R. Novianto. Flexible attention-based cognitive architecture for robots. 2014.

[14] S. D. Preston. A perception-action model for empathy. Empathy in Mental Illness, pages 428–447, 2007.

[15] S. D. Preston and F. De Waal. Empathy: Its ultimate and proximate bases. Behavioral and Brain Sciences, 25(01):1–20, 2002.

[16] W. Prinz. Perception and action planning. European Journal of Cognitive Psychology, 9(2):129–154, 1997.

[17] C. L. Roether, L. Omlor, A. Christensen, and M. A. Giese. Critical features for the perception of emotion from gait. J. of Vision, 9(6):15, 2009.

[18] Y. Sugita and J. Tani. Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adaptive Behavior, 13(1):33, 2005.

[19] J. Tani and M. Ito. Self-organization of behavioral primitives as multiple attractor dynamics: A robot experiment. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 33(4):481–488, 2003.

[20] J. Tani, M. Ito, and Y. Sugita. Self-organization of distributedly represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks, 17(8-9):1273–1289, 2004.

[21] D. Thistlethwaite. A critical review of latent learning and related experiments. Psychological Bulletin, 48(2):97, 1951.

[22] R. D. Walk and C. P. Homan. Emotion and dance in dynamic light displays. Bulletin of the Psychonomic Society, 22(5):437–440, 1984.

[23] K. L. Walters and R. D. Walk. Perception of emotion from body posture. 24(5):329–329, 1986.

[24] I. Wright. Reinforcement learning and animat emotions. In From Animals to Animats IV, Proceedings of the Fourth International Conference on the Simulation of Adaptive Behavior, pages 272–281, 1996.

[25] J. Zhong. Artificial neural models for feedback pathways for sensorimotor integration. 2015.

[26] J. Zhong and L. Cañamero. From continuous affective space to continuous expression space: Non-verbal behaviour recognition and generation. In Development and Learning and Epigenetic Robotics (ICDL-Epirob), 2014 Joint IEEE International Conferences on, pages 75–80. IEEE, 2014.

[27] J. Zhong, A. Cangelosi, and S. Wermter. Towards a self-organizing pre-symbolic neural model representing sensorimotor primitives. Frontiers in Behavioral Neuroscience, 8:22, 2014.

[28] J. Zhong, C. Weber, and S. Wermter. Robot trajectory prediction and recognition based on a computational mirror neurons model. In Artificial Neural Networks and Machine Learning – ICANN 2011, pages 333–340. Springer, 2011.