Stabilization of an Inverted Pendulum Via Human Brain Inspired Controller Design

Hedyeh Jafari, George Nikolakopoulos and Thomas Gustafsson

Abstract: The human body is mechanically unstable, and the brain, as the main controller, is responsible for maintaining our balance. However, the mechanisms by which the brain achieves balance are still an open research question. In this article, we therefore propose a novel modeling architecture for replicating and understanding the fundamental mechanisms that generate balance in humans. Towards this aim, a nonlinear Recurrent Neural Network (RNN) is proposed that can predict, with high accuracy, the performance of the Central Nervous System (CNS) in stabilizing the human body. The network is trained on multiple collected human balancing data sets by utilizing system identification techniques. One fundamental contribution of the article is that the obtained network is experimentally evaluated on a single link inverted pendulum, which replicates the basic model of human balance and can be directly extended to the area of humanoids and balancing exoskeletons.

Keywords: Postural control, system identification, recurrent neural network, inverted pendulum

*This work was funded by the Swedish Research Council Office under the Grant Agreement No. K2015-99X-22756-01-4.
The authors are with the Control Engineering Group, Department of Computer, Electrical and Space Engineering, Luleå University of Technology, Luleå SE-97187, Sweden.
Corresponding Author's email: ([email protected])

I. INTRODUCTION

Maintaining the balance of the human body is a dynamic operation of the human brain that compensates for the limited passive stiffness of the joints and for neural transmission delays in the presence of the gravitational force [1]. The performance of this continuous neural control can be affected by injuries or aging [2], [3]. Therefore, studying human motor control and, in parallel, developing assistive robotic applications, such as lower limb exoskeletons or robotic prosthetics, are attracting increasing interest [4]. However, the mathematical representation of the brain mechanisms, and specifically of the Central Nervous System (CNS) as the main controller of the body, is still an open question in the computational motor control field [5]. Moreover, techniques that make the control of these robotic applications identical to human motor control have not yet been developed.

There is a considerable amount of literature on the hypothetical representation of how the CNS stabilizes the body in any posture or activity. Among the most widespread hypotheses are the internal models of the CNS [6], [7]. Based on this theory, the CNS can predict the movement through an internal feedback control called efference copy and adapt itself to changes in the human body and its environment [8], [5]. However, the mathematical representation of this assumption is still questionable.

On the other hand, in most robotic balancing applications, the Proportional Derivative (PD) control scheme or the Proportional Integral Derivative (PID) controller is widely used [9], [10]. Despite their simplicity, these approaches are still disparate from human neural control and cannot explain how the brain generates motor commands to the musculoskeletal system by integrating multi-sensory signals with neural feedback transmission delays, or how it adapts to changes in the body and its environment [7].

In the related literature, there have been many attempts to present the nervous system as an adaptive controller with different methodologies, such as adaptive sliding control [11], gain scheduling [12] or model predictive control [13], and their resemblance to the biological internal model is reviewed in [6]. However, in recent articles on this topic [14], [15], and with the latest developments in computing and sensor technology, system identification and machine learning, the corresponding algorithms are becoming more attractive. In [16], the authors used deep reinforcement learning as a controller for a humanoid robot, in [17] a neural network based on the internal model was used to control robot motion planning, and in [1] the authors simulated a reinforcement learning algorithm to stabilize an inverted pendulum.

However, to the best of our knowledge, few articles have studied controllers for balance robotic applications based on human data. In [18], [19] we proposed a novel conceptual scheme for modeling the CNS in an upright posture, which was verified by human data and can be used as a controller of an inverted pendulum, the main model for most balance robotic applications. As an extension, in this work we experimentally evaluate the proposed scheme with the following additional contributions. First, the performance of the proposed human inspired controller is evaluated on a real application, while guaranteeing real-time performance. Second, multiple experiments verify the performance of the presented controller with various parameters of the inverted pendulum, including the existence of an exogenous disturbance. Finally, its performance is compared with a tuned PID controller. This is a new concept towards the control of balance robotic applications that can resemble a human motor control system.

The rest of the article is structured as follows. The methodology of the proposed problem is presented in Section II, followed by the description of the experimental data collection and the establishment of the inverted pendulum set up in Section III. In Section IV, the experimental results are presented with the corresponding comparison and discussion, and finally the article is concluded in Section V.

II. METHODOLOGY

Figure 1 illustrates the schematic of the proposed architecture for stabilizing an inverted pendulum in a feedback loop with the human inspired controller. The dynamics of the proposed controller are inspired by the internal modeling of the CNS and have been trained and verified with the collected human data.

Fig. 1: Illustration of controlling a single link inverted pendulum by a human inspired control scheme trained from multiple gathered human data.

In the human balance structure, the CNS generates proper motor commands to the musculoskeletal system by integrating the feedback joint kinematics, which are affected by different information from multi-sensory organs, such as vestibular, vision and proprioception [7]. The internal model hypothesis of the CNS indicates that the CNS has an internal feedback loop, the so called efference copy, to reduce the error between the actual and the desired posture. This hypothesis also claims that the CNS should be able to predict the motor commands regardless of any time delay in the sensory feedback perception data and adapt itself to changes in the human body or its surrounding environment [20].

Considering these conditions, the structure of the internal model of the CNS can be represented by a recurrent neural network (RNN). This method can predict the output by processing the sequence of inputs and the internal feedback memory [21]. Since the data to be predicted are time series, and in order to avoid long-time dependencies and the vanishing gradient problem, a nonlinear autoregressive modeling approach with exogenous inputs (NARX) was applied [22]. Based on this method, the output of the network, which in this case is the ankle torque $\tau_{ankle} \in \mathbb{R}$, can be predicted as:

$$\hat{\tau}_{ankle}(k) = g\big(\tau_{ankle}(k-1), \ldots, \tau_{ankle}(k-d_y),\; u(k), u(k-1), \ldots, u(k-d_u)\big)$$

where $u \in \mathbb{R}$ is the input of the network with a time delay of $d_u \in \mathbb{Z}^+$, which is measured as the ankle angular position perceived from the multi-sensory organs. The efference copy can be explained by $\tau_{ankle}(k-d_y)$, with $d_y \in \mathbb{Z}^+$ the time delay, acting as an embedded memory and exogenous input [23].

The overall function $g(\cdot) \in \mathbb{R}$ that predicts the current ankle joint torque can be divided into two layers: a) the adaptive correlation unit and b) the command generation unit. First, the adaptive correlation unit receives the angular position $u$ and the exogenous output $\tau_{ankle}(k-d_y)$ with the corresponding delays, and its output is generated as follows:

$$z_q(k) = f\Big(b_h + \sum_{i=0}^{d_u} w_{hu}^{i}\, u(k-i) + \sum_{m=1}^{d_y} w_{hy}^{m}\, \hat{\tau}_{ankle}(k-m)\Big) \tag{1}$$

where $q \in \{1, 2, \ldots, N\}$, with $N \in \mathbb{Z}^+$ the number of nodes, $b_h \in \mathbb{R}$, $w_{hu}^{i} \in \mathbb{R}$ and $w_{hy}^{m} \in \mathbb{R}$ are the bias, the weights of the input and the weights of the exogenous input in this layer, respectively, while $f(\cdot)$ is the tanh activation function to map the negative inputs. In the sequel, the outcome of this layer, $Z(k) = \{z_1, z_2, \ldots, z_N\}$, is sent to the command generation unit, where the proper joint torque is estimated as:

$$\hat{\tau}_{ankle}(k) = h\Big(b_o + \sum_{q=1}^{N} w_{oq}\, z_q(k)\Big) \tag{2}$$

where $h$ is the linear activation function, with $b_o \in \mathbb{R}$ and $w_{oq} \in \mathbb{R}$ representing the bias value and the weights of this layer, respectively. With this configuration, the predicted torque is calculated by minimizing the following cost function:

$$E(k) = \frac{\lambda}{l} \sum_{j=1}^{l} \big(\hat{\tau}^{j}_{ankle}(k) - \tau^{j}_{ankle}(k)\big)^2 + \frac{1-\lambda}{n} \sum_{i=1}^{n} w_i^2(k) \tag{3}$$

where $l \in \mathbb{Z}^+$ is the number of training epochs, $\lambda \in \mathbb{R}$ is the performance ratio and $W = \{w_i \mid w_i \in [W_h, W_o]\}$, $i = 1, \ldots, n$, are all the weights used in both the hidden (adaptive correlation) and the output (command generation) layers. The second term in (3) is the regularization term, which avoids overfitting of the network to the training data.

III. DATA COLLECTION AND EXPERIMENTAL SET UP

A. Data collection

The human data were collected at the Human Health and Performance Lab - Movement Science at Luleå University of Technology, Luleå, Sweden [24], [25] in accordance with the Helsinki declaration, and the study was approved by the Regional Ethical Review Board in Umeå, Sweden (ref no. 2015-182-31).

Data were collected from forty-five participants, 27 women and 18 men, with a mean age of 75.2 (±4.5) years, a mean height of 167.2 (±9.9) cm, and a mean weight of 73.0 (±12.2) kg. All the participants were community living residents who were able to read 100 pt large block letters, could stand unaided for 30 seconds or more and were able to perceive simple instructions in Swedish.

Body kinematics were measured by a Qualisys Motion Capture System with eight cameras and a 200 Hz sampling rate. The body kinetics, such as the force, the torque and the Center of Pressure (COP), were measured by a force plate sampled at 3000 Hz that was synchronized with the Qualisys Track Manager (QTM) software.

To consider the changes in the sensory motor system, such as vision and proprioception, the subjects were asked to stand still for 30 seconds in four different scenarios: a) upright stance on a stable surface with open eyes, b) upright stance on a stable surface with closed eyes, c) standing on the foam with open eyes, and d) standing on the foam with closed eyes. The measured angular position of the ankle and the relevant ankle torque from all the trials were used to train the RNN.

Fig. 3: A subject standing still on a foam as an unstable surface during the experiment.

B. Experimental inverted pendulum setup

The single link inverted pendulum, depicted in Fig. 4, is controlled by a DC motor (Pololu 25 mm diameter Metal Gear-motor with 47:1 gear) coupled with a 10 [kOhm] rotary potentiometer as an absolute position sensor. At the coupling element, a carbon fibre rod with an extra load is installed. Two rods with different lengths (65 [cm] and 25 [cm]) are used to check the effect of changing the length parameter on the controller.

Fig. 4: Inverted pendulum hardware setup.

Finally, an Arduino Mega ADK with an Adafruit Motor Shield V2 is utilized to control the motor speed and direction. As shown in Fig. 2, the rosserial-python serial-node protocol is used to connect the hardware setup to the main PC. The controller is then implemented in the Matlab/Simulink environment, which is coupled with the Robot Operating System (ROS) [26].

Fig. 2: Software and hardware set-up configuration.

IV. EXPERIMENTAL RESULTS

A. Identifying the controller

The raw data of the ankle angular positions and the corresponding torques in the sagittal plane are sampled at 200 [Hz]. Since there is a high correlation between the left and right ankle positions in upright stance, the mean of the left and right ankle angles is chosen as the angular position of each subject. First, the signals of each subject are detrended and the baseline noise is filtered. To train the network, the signal data of all the subjects in the different trials mentioned above are divided into two sets, 70% for training and 30% for validation, to avoid overfitting. The NARX network has 5 nodes in the hidden layer, and the Bayesian regularization back propagation training method is utilized to obtain a robust model and avoid overfitting [27]. The input and output delays are chosen initially based on the cross-correlation between the input and output signals and are later tuned empirically to find the best performance. Thus, an input delay $z^{-d_u}$ of 30 samples (150 [ms]) and an output delay $z^{-d_y}$ of 0.01 [s] have been selected. It should also be mentioned that these values fit well with the findings on neural transmission latencies presented in [28].
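The two-layer structure of Eqs. (1) and (2), together with the delays selected above, can be illustrated with a minimal NumPy sketch. This is not the authors' Matlab/Simulink implementation: the function names, the random weights and the zero-padding of the first samples are hypothetical, the per-node weight matrices are an assumption (Eq. (1) is written without a node index on the weights), and the output delay of 2 samples is simply 0.01 s converted at the stated 200 Hz.

```python
import numpy as np

FS_HZ = 200.0               # sampling rate stated in the text
D_U = 30                    # input delay in samples (150 ms at 200 Hz)
D_Y = round(0.01 * FS_HZ)   # output delay: 0.01 s converted to samples
N_HIDDEN = 5                # hidden nodes stated in the text


def narx_step(u_hist, tau_hist, params):
    """One prediction step following Eqs. (1)-(2).

    u_hist   : [u(k), u(k-1), ..., u(k-D_U)]
    tau_hist : [tau_hat(k-1), ..., tau_hat(k-D_Y)]
    """
    b_h, w_hu, w_hy, b_o, w_o = params
    z = np.tanh(b_h + w_hu @ u_hist + w_hy @ tau_hist)  # Eq. (1), f = tanh
    return float(b_o + w_o @ z)                         # Eq. (2), h linear


def narx_closed_loop(u, params):
    """Predict torque over an angle sequence, feeding predictions back."""
    tau_hat = np.zeros(len(u))
    for k in range(len(u)):
        # Assumption: samples before the start of the record are zero.
        u_hist = np.array(
            [u[k - i] if k - i >= 0 else 0.0 for i in range(D_U + 1)]
        )
        tau_hist = np.array(
            [tau_hat[k - m] if k - m >= 0 else 0.0 for m in range(1, D_Y + 1)]
        )
        tau_hat[k] = narx_step(u_hist, tau_hist, params)
    return tau_hat


def random_params(seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = np.random.default_rng(seed)
    return (
        rng.normal(scale=0.1, size=N_HIDDEN),             # b_h
        rng.normal(scale=0.1, size=(N_HIDDEN, D_U + 1)),  # w_hu
        rng.normal(scale=0.1, size=(N_HIDDEN, D_Y)),      # w_hy
        float(rng.normal(scale=0.1)),                     # b_o
        rng.normal(scale=0.1, size=N_HIDDEN),             # w_o
    )
```

With trained weights in place of `random_params()`, calling `narx_closed_loop(angle_trace, params)` would give a multi-step-ahead torque prediction of the kind evaluated in this section, since each step uses previously predicted rather than measured torques.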

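The cost function (3) that underlies the regularized training can be written out as a short sketch. It assumes a fixed performance ratio `lam`, which is a simplification: Bayesian regularization as described in [27] estimates the trade-off between the error term and the weight term from the data instead of fixing it. The function name and argument layout are hypothetical.

```python
import numpy as np


def regularized_cost(tau_hat, tau, weights, lam):
    """Evaluate Eq. (3): lam * mean squared error + (1 - lam) * mean squared weight.

    weights is a list of arrays or scalars holding all hidden and output
    layer parameters that should be penalized.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau = np.asarray(tau, dtype=float)
    w = np.concatenate([np.ravel(p) for p in weights])
    mse = np.mean((tau_hat - tau) ** 2)   # first term of Eq. (3)
    msw = np.mean(w ** 2)                 # regularization term of Eq. (3)
    return float(lam * mse + (1.0 - lam) * msw)
```

Setting `lam = 1.0` reduces the expression to the plain mean squared error, which makes the role of the second term as a penalty on large weights easy to see.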
Fig. 5: Prediction of ankle torque for three random individuals (Subjects #1 to #3; torque [N.m] versus time [s]). The solid blue line presents the measured torque and the dashed red line shows the predicted torque.

Figure 5 shows the 100 steps ahead prediction of the joint torque for three random subjects in the validation data set, with a prediction error of RMSE ∼ 0.43 [N.m].

B. Stabilizing the inverted pendulum

The real time experimental performance of the controller for the task of stabilizing the inverted pendulum from the horizontal position is presented in Fig. 6. As shown, the controller is able to bring the pendulum from the horizontal position (−90°) to the upright position (0°) and maintain its balance in almost 5 [s].

Fig. 6: Stabilizing the inverted pendulum from the horizontal initial position by the proposed control scheme trained by human data. The upper plot shows the angular position of the inverted pendulum and the lower plot shows the manipulated variable generated by the controller.

Figure 7 presents the performance of the controller when different disturbances are applied to the setup. The results indicate that the proposed controller is able to stabilize the pendulum even in the case of disturbances with large amplitudes, while the settling time after a disturbance is fast, in the range of 1-2 [s].

The robustness of the controller to changes in the system, such as the length of the inverted pendulum, is evaluated by experimenting with a shorter rod. Figure 9 shows that the controller, without any tuning, is able to stabilize the pendulum with a slightly larger settling time. This can be improved by training the network with a wider variety of selected data sets. The following link provides a video summary of the overall results: https://youtu.be/rIvPQagVpVs.

Furthermore, in order to compare the efficiency of the proposed control scheme with the prevailing controllers, a PID controller is implemented and tuned by exhaustive experimental trials. As shown in Fig. 8, the PID controller can stabilize the inverted pendulum in a shorter time. However, when the torque generated by the proposed control scheme is compared with that of the PID controller, the nonlinear characteristics of the activation function are noticeable in the neural controller, while the overall response is almost similar to the inverse dynamics of the set-up. It should be highlighted that the purpose of this comparison is to demonstrate the efficiency of the presented scheme. The advantage and main contribution of the proposed controller over the model-free state-of-the-art controllers is its similarity to the performance of the human brain, which can maintain the balance of an inverted pendulum, regardless of the number of links, with a single controller.
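For readers who want a concrete picture of the PID baseline used in this comparison, a textbook discrete-time form is sketched below. The article does not report the PID structure, gains or sample time, so the class name, the rectangular integration, the backward-difference derivative and the optional output saturation are all assumptions for illustration only.

```python
class DiscretePID:
    """Textbook discrete PID; gains and sample time are hypothetical."""

    def __init__(self, kp, ki, kd, dt, u_max=None):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.u_max = u_max          # optional symmetric actuator limit
        self.integral = 0.0
        self.prev_error = None      # no derivative action on the first call

    def step(self, reference, measurement):
        error = reference - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        if self.u_max is not None:
            u = max(-self.u_max, min(self.u_max, u))
        return u
```

In a loop of the kind described here, `step(0.0, theta_deg)` would be called once per sample with the upright position as reference and the potentiometer angle as measurement.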

Fig. 7: Stabilizing the pendulum by the human-inspired controller after applying point disturbances (upper plot: angular position θ [deg] and reference; lower plot: torque [N.m]; both versus time [s]). The maximum peaks indicate the times at which these disturbances were applied to the set up.

Fig. 9: Balancing the inverted pendulum by utilizing the human brain inspired controller for the case of the smaller rod (25 cm). The maximum peaks indicate the times of the applied disturbances.
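The settling times quoted in Section IV-B can be read off a logged angle trace with a small helper such as the one sketched below. The tolerance band of 2 degrees and the function name are hypothetical; the article does not state which criterion was used.

```python
import numpy as np


def settling_time(t, theta, ref=0.0, band=2.0):
    """Return the first time after which |theta - ref| stays within band.

    Returns None if the trace is still outside the band at its last sample.
    The band (in the units of theta) is an assumed criterion.
    """
    t = np.asarray(t, dtype=float)
    theta = np.asarray(theta, dtype=float)
    outside = np.abs(theta - ref) > band
    if not outside.any():
        return float(t[0])              # already settled at the start
    last_out = np.flatnonzero(outside)[-1]
    if last_out == len(t) - 1:
        return None                     # never settles within the record
    return float(t[last_out + 1])
```

For a disturbance test, the same function can be applied to the part of the trace that starts at the disturbance, and the disturbance time subtracted from the result.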

Fig. 8: Stabilizing the pendulum after applying disturbances with the PID controller (upper plot: angular position θ [deg] and reference; lower plot: torque [N.m]; both versus time [s]). The maximum peaks indicate the times at which these disturbances were applied to the set up.

V. CONCLUSION AND FUTURE WORK

This study has provided an experimental evaluation of a human-brain inspired control scheme that is able to mimic the human motor control system in maintaining balance. The obtained results satisfactorily show that the controller can stabilize the inverted pendulum, as a main model of the human body, while inheriting the neural feedback latencies of the CNS. This approach has the additional potential to be evaluated on more complicated balance robot applications in order to demonstrate the applicability of the human balancing mechanisms in artificial apparatuses.

The data set utilized in this work was from an elderly group with various challenges in standing still, in order to validate the performance of the proposed method in the most challenging cases. However, on a wider level, additional research is needed to validate and train the controller with a data set that has a wider variety of age, height and mass among the subjects. Additionally, this approach can be extended to multiple joints and multi-link inverted pendulums.

REFERENCES

[1] K. Michimoto, Y. Suzuki, K. Kiyono, Y. Kobayashi, P. Morasso, T. Nomura, Reinforcement learning for stabilizing an inverted pendulum naturally leads to intermittent feedback control as in human quiet standing, in: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2016, pp. 37–40.
[2] World Health Organization, Global report on falls prevention in older age, fact sheet, http://www.who.int/mediacentre/factsheets/fs344/en/, last accessed 30/01/2018 (January 2018).
[3] S. F. Tyson, M. Hanley, J. Chillala, A. Selley, R. C. Tallis, Balance disability after stroke, Physical Therapy 86 (1) (2006) 30–38.
[4] A. J. Young, D. P. Ferris, State of the art and future directions for lower limb robotic exoskeletons, IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (2) (2017) 171–182.
[5] A. Karniel, Open questions in computational motor control, Journal of Integrative Neuroscience 10 (03) (2011) 385–411.
[6] C. Tin, C.-S. Poon, Internal models in sensorimotor integration: perspectives from adaptive control theory, Journal of Neural Engineering 2 (3) (2005) S147.
[7] R. Shadmehr, M. A. Smith, J. W. Krakauer, Error correction, sensory prediction, and adaptation in motor control, Annual Review of Neuroscience 33 (2010) 89–108.
[8] D. Robinson, G. Lennerstrand, P. Bach-y Rita, Basic mechanisms of ocular motility and their clinical implications.
[9] R. Chiba, K. Takakusaki, J. Ota, A. Yozu, N. Haga, Human upright posture control models based on multisensory inputs; in fast and slow dynamics, Neuroscience Research 104 (2016) 96–104.
[10] K. Tahboub, T. Mergner, Biological and engineering approaches to human postural control, Integrated Computer-Aided Engineering 14 (1) (2007) 15–31.
[11] J.-J. Slotine, J. Coetsee, Adaptive sliding controller synthesis for nonlinear systems, International Journal of Control 43 (6) (1986) 1631–1651.
[12] A. D. Kuo, An optimal control model for analyzing human postural balance, IEEE Transactions on Biomedical Engineering 42 (1) (1995) 87–101.
[13] S. J. Sober, P. N. Sabes, Multisensory integration during motor planning, Journal of Neuroscience 23 (18) (2003) 6982–6992.
[14] T. Yan, M. Cempini, C. M. Oddo, N. Vitiello, Review of assistive strategies in powered lower-limb orthoses and exoskeletons, Robotics and Autonomous Systems 64 (2015) 120–136.
[15] C. Ott, B. Henze, G. Hettich, T. N. Seyde, M. A. Roa, V. Lippi, T. Mergner, Good posture, good balance: comparison of bioinspired and model-based approaches for posture control of humanoid robots, IEEE Robotics & Automation Magazine 23 (1) (2016) 22–33.
[16] C. Yang, K. Yuan, W. Merkt, T. Komura, S. Vijayakumar, Z. Li, Learning whole-body motor skills for humanoids, in: 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), IEEE, 2018, pp. 270–276.
[17] I. V. Blagouchine, E. Moreau, Control of a speech robot via an optimum neural-network-based internal model with constraints, IEEE Transactions on Robotics 26 (1) (2010) 142–159.
[18] H. Jafari, M. Pauelsen, U. Röijezon, L. Nyberg, G. Nikolakopoulos, T. Gustafsson, On internal modeling of the upright postural control in elderly, in: 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), IEEE, 2018, pp. 231–236.
[19] H. Jafari, G. Nikolakopoulos, T. Gustafsson, Replicating human brain mechanisms towards balancing, in: 17th European Control Conference (ECC), Naples, June 25-28, 2019.
[20] R. Shadmehr, Learning to predict and control the physics of our movements, Journal of Neuroscience 37 (7) (2017) 1663–1671.
[21] M. Han, J. Xi, S. Xu, F.-L. Yin, Prediction of chaotic time series based on the recurrent predictor neural network, IEEE Transactions on Signal Processing 52 (12) (2004) 3409–3416.
[22] E. Diaconescu, The use of NARX neural networks to predict chaotic time series, WSEAS Transactions on Computer Research 3 (3) (2008) 182–191.
[23] J. M. P. Menezes Jr, G. A. Barreto, Long-term time series prediction with the NARX network: An empirical evaluation, Neurocomputing 71 (16-18) (2008) 3335–3343.
[24] M. Pauelsen, L. Nyberg, U. Röijezon, I. Vikman, Both psychological factors and physical performance are associated with fall-related concerns, Aging Clinical and Experimental Research (2017) 1–7.
[25] M. Pauelsen, I. Vikman, V. J. Strandkvist, A. Larsson, U. Röijezon, Decline in sensorimotor systems explains reduced falls self-efficacy, Journal of Electromyography and Kinesiology 42 (2018) 104–110.
[26] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, A. Y. Ng, ROS: an open-source robot operating system, in: ICRA Workshop on Open Source Software, Vol. 3, Kobe, Japan, 2009, p. 5.
[27] F. Burden, D. Winkler, Bayesian regularization of neural networks, in: Artificial Neural Networks, Springer, 2008, pp. 23–42.
[28] M. Mano, A. Lécuyer, E. Bannier, L. Perronnet, S. Noorzadeh, C. Barillot, How to build a hybrid neurofeedback platform combining EEG and fMRI, Frontiers in Neuroscience 11 (2017) 140.