International Journal of Humanoid Robotics Vol. 13, No. 1 (2016) 1650011 (28 pages) © World Scientific Publishing Company DOI: 10.1142/S0219843616500110

Whole-Body Balancing Walk Controller for Position Controlled Humanoid Robots

Seung-Joon Yi GRASP Laboratory, University of Pennsylvania, Philadelphia PA 19104, USA [email protected]

Byoung-Tak Zhang BI Laboratory, Seoul National University, Seoul, Korea [email protected]

Dennis Hong RoMeLa Laboratory, University of California, Los Angeles CA 90095, USA [email protected]

Daniel D. Lee GRASP Laboratory, University of Pennsylvania, Philadelphia PA 19104, USA [email protected]

Received 25 May 2015 Accepted 13 January 2016 Published 17 March 2016

Int. J. Human. Robot. 2016.13. Downloaded from www.worldscientific.com by SEOUL NATIONAL UNIVERSITY on 06/18/16. For personal use only.

Bipedal humanoid robots are intrinsically unstable against unforeseen perturbations. Conventional zero moment point (ZMP)-based locomotion algorithms can reject perturbations by incorporating sensory feedback, but they are less effective than the dynamic full body behaviors humans exhibit when pushed. Recently, a number of biomechanically motivated push recovery behaviors have been proposed that can handle larger perturbations. However, these methods are based upon simplified and transparent dynamics of the robot, which makes them suboptimal to implement on common humanoid robots with local position-based controllers. To address this issue, we propose a hierarchical control architecture. Three low-level push recovery controllers that replicate human recovery behaviors are implemented for position controlled humanoid robots. These low-level controllers are integrated with a ZMP-based walk controller that is capable of generating reactive step motions. The high-level controller constructs empirical decision boundaries to choose the appropriate behavior based upon trajectory information gathered during experimental trials. Our approach is evaluated in physically realistic simulations and on a commercially available small humanoid robot.

Keywords: Position controlled humanoid robot; biomechanically motivated push recovery; low-dimensional policy; online learning.

1650011-1 S.-J. Yi et al.

1. Introduction

Due to their small footprint and high center of mass (COM), bipedal humanoid robots are prone to losing balance because of uneven floors, robot modeling errors, or imprecise actuators. Thus, active stabilization of humanoid robots has been an important topic in robotics research. Biomechanical studies of human walking and balancing behavior have shown that humans use three basic balance control strategies, denoted the ankle, hip and step strategies, which are illustrated in Figs. 1(a)–1(c).1 The ankle strategy controls torque at the ankle joint, the hip strategy uses the angular acceleration of the torso and free limbs to apply counteractive ground reaction force (GRF), and the step strategy changes the base of support to a new position. All three strategies seek to control the horizontal position of the system's COM by changing the horizontal component of the GRF.

The conventional approach for bipedal locomotion control is zero moment point (ZMP)-based control built upon the linear inverted pendulum model (LIPM).2 The reference ZMP trajectory is typically designed in advance according to footstep locations, then the torso and foot trajectories are calculated based on the reference ZMP using the LIPM.3 Stabilization is accomplished by measuring the state error and applying feedback control to track the reference ZMP, which updates the COM trajectory and generates an inertial force resulting in an effective control torque at the ankle joints, as shown in Fig. 1(d). The closed-loop ZMP tracking approaches are usually confined to the ankle strategy, as reactive stepping requires online modification of the ZMP trajectory. However, there has recently been some work on simultaneously generating COM and ZMP trajectories in real time to enable the step strategy.4–6

The main advantage of ZMP tracking-based approaches is that they can easily be integrated in existing walk controllers, and they have been successfully incorporated

Fig. 1. A comparison of three biomechanically motivated push recovery approaches and the ZMP tracking approach. (a) Ankle strategy. (b) Hip strategy. (c) Step strategy. (d) ZMP tracking approach.


Fig. 2. Comparison of the ZMP tracking approach and the biomechanical push recovery approach under lateral pushes during walking. An impulsive lateral force for 0.01 s is applied to the COM of the robot at the middle of the single support phase. Note that the step strategy is not possible for this case due to kinematic constraints. (a) The ZMP tracking approach, 0.9 Ns of lateral push. (b) The ZMP tracking approach, 1.2 Ns of lateral push. (c) The ankle and hip strategies, 1.2 Ns of lateral push.

on many humanoid robot platforms. However, they usually require fast online computation, a precise dynamic model of the robot, and accurate estimation of the current dynamic state, which makes them harder to use on resource constrained robots with restricted actuation, sensing and processing capabilities. The ankle strategy alone has limited effectiveness against strong perturbations. The step strategy can be used after large perturbations, but it is not always physically feasible due to step timing or foot configuration. Figure 2 shows an example where the ZMP tracking-based ankle/step controller fails to stabilize the robot.

On the other hand, an active line of research has focused on the theoretical analysis of biomechanically motivated push recovery controllers using an abstract model of the robot. These models include ankle control torque for the ankle strategy, a flywheel body and hip control torque for the hip strategy, and a secondary support point for the step strategy.7–9 Such approaches result in very simple analytical controllers that can reject stronger perturbations as they utilize angular momentum degrees of freedom. However, the biggest drawback of these approaches is that most of them assume simplified and transparent dynamics of the robot, which is often hard to realize as most of the humanoid robots currently available have highly distributed mass and local position-based controllers with high feedback gain.

Our aim is to get the best of both worlds, devising an integrated walk controller that can exhibit the full range of biomechanically inspired behaviors to respond to external perturbations. We take a hybrid approach where walking is governed by a ZMP-based walk controller, and large perturbations trigger simple, biomechanically motivated push recovery controllers.
First, we design a simple ZMP-based walking controller that simultaneously plans the ZMP and COM trajectories in real time for reactive stepping. To incorporate the biomechanically motivated push recovery controllers, we utilize a hierarchical architecture which consists of low-level controllers that govern each biomechanically motivated push recovery behavior and a high-level controller that switches between the low-level controllers based on the current state of the robot. Instead of relying upon the accuracy of the theoretical model parameters,10,11 we use empirical decision boundaries between the controllers, which are learned from experience.

The main contribution of this work is twofold. From the theoretical point of view, we show that the physical humanoid robot has similarly shaped but quantitatively different stability regions from those derived by theoretical models of varying simplicity. In terms of implementation, we propose an integrated system that effectively combines three push recovery behaviors and a walk controller to enable a humanoid robot to perform push recovery behaviors while walking. We demonstrate how this controller is learned from experience and evaluate its performance on a small humanoid robot.

The remainder of the paper is organized as follows. Section 2 reviews three biomechanically motivated push recovery controllers and their implementations on position controlled humanoid robots. Section 3 explains the step-based omnidirectional walk controller, which can perform reactive stepping for push recovery control during walking. Section 4 shows how to learn the high-level controller from repeated trials in a simulated environment, and Sec. 5 shows the experimental results using the DARwIn-OP humanoid robot. Finally, we conclude with a discussion of outstanding issues and potential future directions arising from this work.

2. Biomechanically Motivated Push Recovery Controllers for Position Controlled Robots

Biomechanical studies show that humans display three distinctive motion patterns in response to sudden external perturbations, which we denote as ankle, hip and step push recovery strategies.1 The ankle strategy applies control torque at the ankle joint, the hip strategy uses the angular acceleration of torso and free limbs to apply counteractive GRF, and finally the step strategy changes the base of support to a new position. For each push recovery strategy, we first review the basic push recovery controllers for the simplified model, and then explain how we implement the behaviors of such controllers on resource constrained humanoid robots which lack force/torque control and only provide position-based control with high proportional gain. Finally, we explain how we handle possible issues with those controllers when the robot is moving.

2.1. Ankle push recovery

The ankle strategy applies control torque on the ankle joints to keep the COM within the base of support. It is widely adopted in the form of closed-loop ZMP tracking, and this approach has been successfully implemented on a number of full-sized humanoid robots equipped with force/torque sensors at the ankles.3,12–16 Furthermore, it was recently shown that the approach is robust enough to make a full-sized humanoid robot walk on a public street with unknown surface inclinations and unevenness.17 It also has been widely implemented on small humanoid robots and


Fig. 3. The ankle strategy that applies control torque on ankle joints. (a) The abstract model for the ankle strategy. (b) The ankle strategy implemented on DARwIn-OP humanoid robot.

two currently available commercial small humanoid robots, Nao^a and DARwIn-OP,^b are provided with walk controllers using the ankle strategy for stabilization.18,19 There have also been other closed-loop walk control implementations utilizing the ankle strategy on small humanoid robots.20,21

We first examine the abstract model in Fig. 3(a), where ankle torque $\tau_{ankle}$ is applied to a LIPM with mass $m$, COM height $z_0$ and COM horizontal position $x$ from the current support point. The resulting linearized dynamic model is

$$\ddot{x} = \omega^2 (x - \tau_{ankle}/mg), \tag{1}$$

where $\omega = \sqrt{g/z_0}$ and $g$ is the gravitational acceleration. If we assume a reference trajectory $x_{ref}$ which satisfies the LIPM without additional ankle torque,

$$\ddot{x}_{ref} = \omega^2 x_{ref}, \tag{2}$$

then the state error $x_{err} = x - x_{ref}$ follows the same dynamic model as (1):

$$\ddot{x}_{err} = \omega^2 (x_{err} - \tau_{ankle}/mg), \tag{3}$$

which can be controlled by PD control on $x_{err}$:

$$\tau_{ankle} = K_p x_{err} + K_d \dot{x}_{err}, \tag{4}$$

where $K_p$ and $K_d$ are control gains. This requires torque control of the ankle actuators, but in practice it can be approximated for position controlled actuators with proportional control by directly setting the target angle of the ankle actuator

$$\theta_{ankle} = K'_p x_{err} + K'_d \dot{x}_{err}, \tag{5}$$

a http://www.aldebaran-robotics.com/. b http://www.robotis.com/xe/darwin en.


where $\theta_{ankle}$ is the target ankle angle bias.10,19 In addition to the ankle joints, we use the same control law to modulate arm position to apply additional effective torque at the ankles in a similar way, unless overridden by the hip controller. When the robot is walking, we only apply ankle bias to the current support foot during the middle phase of single support. This prevents the ankle strategy from setting a nonzero ankle bias for the foot currently in the air, which can result in premature landing. We use a trapezoid function $f(\phi)$ to make a smooth transition at landing and takeoff:

$$\theta_{ankle} = f(\phi_{single}) (K'_p x_{err} + K'_d \dot{x}_{err}), \tag{6}$$

where $0 \le \phi_{single} < 1$ is the single support phase and $f(\phi)$ is the following function:

$$f(\phi) = \begin{cases} \phi/\phi_{lift}, & 0 \le \phi < \phi_{lift}, \\ 1, & \phi_{lift} \le \phi < \phi_{land}, \\ (1 - \phi)/(1 - \phi_{land}), & \phi_{land} \le \phi < 1, \end{cases} \tag{7}$$

where $\phi_{lift}$ and $\phi_{land}$ are timing parameters. Figure 3(b) shows the ankle strategy controller implemented on the DARwIn-OP small humanoid robot.
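The position-based ankle controller in (6) and (7) can be illustrated with a short sketch. The gain values and the timing parameters below are placeholders chosen only for illustration, not values reported in this paper, and the function names are hypothetical.

```python
def trapezoid(phi_single, phi_lift=0.2, phi_land=0.8):
    """Trapezoid weight of Eq. (7); phi_lift and phi_land are assumed values."""
    if not 0.0 <= phi_single < 1.0:
        return 0.0  # outside the single support phase no bias is applied
    if phi_single < phi_lift:
        return phi_single / phi_lift  # ramp up after takeoff
    if phi_single < phi_land:
        return 1.0  # full bias in the middle of single support
    return (1.0 - phi_single) / (1.0 - phi_land)  # ramp down before landing


def ankle_bias(x_err, xdot_err, phi_single, kp=0.5, kd=0.05):
    """Target ankle angle bias of Eq. (6); kp and kd are hypothetical gains."""
    return trapezoid(phi_single) * (kp * x_err + kd * xdot_err)


if __name__ == "__main__":
    for phi in (0.0, 0.1, 0.5, 0.9):
        print(phi, ankle_bias(0.02, 0.1, phi))
```

The weight is zero at takeoff and landing, so the bias command fades in and out rather than jumping when the support foot changes.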

2.2. Hip push recovery

The hip strategy uses angular acceleration of the torso and limbs to generate a backward GRF that pulls the COM back towards the base of support. A two-phase hip strategy for a humanoid has been suggested, which uses angular acceleration to absorb the disturbance in the reflex phase and returns to the initial pose in the recovery phase.22 An extended LIPM with angular momentum was used to derive analytic control laws for the hip and step strategies, and the concept of the capture point was suggested as the calculated stepping position for the step strategy.7 This approach was further extended by using a simplified model that results in analytic decision surfaces for push recovery strategies as functions of the state of the robot.8,9 These approaches were extended to control the GRF and ZMP at each foot using angular momentum, and were shown to balance a 3D full-body model of a humanoid robot in a simulated environment with nonlevel and nonstationary ground.23 The hip strategy for a stationary robot has also been implemented on a full-sized, torque controlled humanoid robot.24

The abstract model in Fig. 4(a) includes a flywheel with mass $m$, COM height $z_0$ and rotational inertia $I$, and control torque $\tau_{hip}$ applied at the center of the flywheel. The resulting linearized dynamic model is then

$$\ddot{x} = \omega^2 (x - \tau_{hip}/mg), \tag{8}$$

$$\ddot{\theta}_{hip} = \tau_{hip}/I. \tag{9}$$

However, the flywheel should not exceed joint limits. In this case, the following bang–bang profile can be used for applying hip torque to maximize the effect while


Fig. 4. The hip strategy uses angular acceleration of torso and limbs to apply counteractive GRF. (a) The abstract model for the hip strategy. (b) The hip strategy implemented on DARwIn-OP humanoid robot.

satisfying the joint angle constraint,7

$$\tau_{hip}(t) = \begin{cases} \tau_{hip}^{MAX}, & 0 \le t < T_{H1}, \\ -\tau_{hip}^{MAX}, & T_{H1} \le t < 2T_{H1}, \end{cases} \tag{10}$$

where $\tau_{hip}^{MAX}$ is the maximum torque that can be applied to the torso and $T_{H1}$ is the time at which the torso stops accelerating. This torque profile angularly accelerates the torso with maximum torque and then decelerates it with maximum negative torque, making it stop at angle $\theta_{hip}^{MAX}$. This behavior can be approximately implemented with high gain position controlled actuators by directly setting the hip target angle bias $\theta_{hip}^{TARGET}$ to $\theta_{hip}^{MAX}$, which makes the torso accelerate with the maximum torque and stop at that position with nearly maximum deceleration. After $t = 2T_{H1}$, the hip angle bias should return to zero.22 This two-phase behavior can be simply implemented as

$$\theta_{hip}^{TARGET} = \begin{cases} \theta_{hip}^{MAX}, & 0 \le t < 2T_{H1}, \\ \theta_{hip}^{MAX} \dfrac{2T_{H1} + T_{H2} - t}{T_{H2}}, & 2T_{H1} \le t < 2T_{H1} + T_{H2}, \end{cases} \tag{11}$$

where $T_{H2}$ is the duration of the returning phase. The same controller is used for arm angles to apply additional GRF from the angular momentum of the limbs as well.

When the robot is pushed hard during walking, the robot may lift its currently tipped foot, which can instantly destabilize the robot. To prevent this, when the hip strategy is initiated, we shorten the single support phase and extend the double support phase until the hip strategy is completed and the robot stands stably on two feet. Figure 4(b) shows the hip strategy controller implemented on the DARwIn-OP small humanoid robot.
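As a minimal sketch of the two-phase hip target in (11), the function below returns the hip target angle bias as a function of the time since the hip strategy was triggered. The numerical values in the usage lines are hypothetical, and returning zero outside the interval covered by (11) is an assumption of this sketch.

```python
def hip_target_bias(t, theta_max, t_h1, t_h2):
    """Hip target angle bias of Eq. (11).

    t         -- time since the hip strategy was triggered [s]
    theta_max -- maximum hip angle bias (theta_hip^MAX)
    t_h1      -- T_H1; the reflex phase lasts 2 * T_H1
    t_h2      -- T_H2, duration of the returning phase
    """
    if t < 0.0:
        return 0.0  # not triggered yet (assumption of this sketch)
    if t < 2.0 * t_h1:
        return theta_max  # reflex phase: hold the maximum bias
    if t < 2.0 * t_h1 + t_h2:
        # recovery phase: linear return to the zero bias
        return theta_max * (2.0 * t_h1 + t_h2 - t) / t_h2
    return 0.0  # strategy completed


if __name__ == "__main__":
    for t in (0.0, 0.1, 0.3, 0.5, 0.7):
        print(t, hip_target_bias(t, theta_max=0.5, t_h1=0.1, t_h2=0.4))
```

With a high gain position controller, the step change to `theta_max` plays the role of the bang–bang torque profile in (10), since the servo saturates while accelerating and again while stopping.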


2.3. Step push recovery

When the magnitude of the disturbance exceeds the capability of the other two push recovery controllers, the step controller can be used to move the base of support in the direction of the push by taking a step. If we assume that the push occurs while the robot is in the single support phase, this strategy can be implemented in a straightforward manner by changing the landing position of the currently lifted foot towards the direction of perturbation. This step strategy has been implemented on various full-sized humanoid robots, including HRP-2 (Refs. 25 and 11) and the Sarcos robot (Ref. 26) while walking, and Hubo (Ref. 10) and the Toyota partner robot (Ref. 6) while hopping in place. There have been some analytical studies about where the robot should step assuming simplified models, including the capture point,7 foot placement estimator27 and generalized foot placement estimator28 approaches. They all share the inverted pendulum model shown in Fig. 5(a), which models the step strategy as three stages: an initial single support stage from the initial condition, a support point transition stage, and a final single support stage to a stable state. Their main difference is how they model each stage. In the capture point approach, a LIPM is used for all three stages, and the support point transition is assumed to occur instantaneously while preserving linear momentum, which results in the following landing position measured from the initial support point7:

$$x_{capture} = \dot{x}/\omega + x. \tag{12}$$

In reality, we cannot instantly change the support point, and landing impacts reduce the linear momentum. In Ref. 26, an inverted pendulum model with fixed leg length $z_0$ and pendulum tilt angle $\theta$ is used for the first and second stages, and a LIPM with body height $z_0$ is used for the third stage. Landing is modeled as an impulse force along the landing leg, which brings the vertical velocity to zero. In Refs. 28 and 10, an inverted pendulum model with leg length $l$ and an angular momentum conserving impact model for the transition are used. Those models do not admit a closed form solution in general, but an approximate solution is


Fig. 5. The step strategy changes the support point by stepping. (a) The abstract model for the step strategy. (b) The step strategy implemented on the DARwIn-OP humanoid robot.


provided in Ref. 10 as:

xcapture ¼ 2 cosða=2Þ; ð13Þ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : a ¼ 2 cos1 1 ðl2=2 þ cos 1Þ=8 : ð14Þ

One practical issue for a physical implementation of the step strategy is the landing shock. As the step strategy is meant to be used with large perturbations, it can lead to a hard landing that can make the robot bounce back and fall down. There have been approaches to handle this by incorporating mechanical or electrical compliance, and we use a simpler approach of lowering the proportional gain for the swing leg during the later part of stepping. Figure 5(b) shows the step strategy implemented on the DARwIn-OP robot.

We should also consider that the step strategy may not always be possible for a walking humanoid robot due to kinematic and timing constraints. Most humanoid robots cannot cross their legs due to kinematic constraints, and the amount by which the robot can change the landing position of the currently lifted foot decreases over time due to velocity constraints. Also, if the robot is pushed when it is in double support or is about to land its foot, it needs to take a new step for push recovery. In this case, we have to determine which foot the robot should use for stepping, as lifting the foot with the current support edge will result in the robot instantly falling. Figure 6 shows three possible stepping cases according to the direction of perturbation from the same foot stance. The support foot for the capture step can be determined based on the angle between the two feet and the perturbation vector, as shown in Figs. 6(a) and 6(b). For cases like Fig. 6(c), the step strategy is not available due to a kinematic constraint.


Fig. 6. Determining step foot based upon the direction of perturbation.


Fig. 7. The hierarchical control structure for push recovery.

2.4. The high-level push recovery controller

We have explained three biomechanically motivated push recovery controllers and their implementations for walking in a position controlled humanoid robot. When pushed, humans perform a combination of push recovery behaviors according to the particular situation. To select the appropriate set of push recovery behaviors as humans do, we use the hierarchical controller shown in Fig. 7, where the ankle, hip and step push recovery controllers work as low-level sub-controllers and the high-level push recovery controller triggers each of them according to the direction and the amount of the external disturbance estimated using the onboard sensors. For the abstract models we have seen in Figs. 3–5, there have been analytic studies of the decision boundaries of each controller.8,29 If we assume a maximum ankle torque of $\tau_{ankle}^{MAX}$, then the stability region for the ankle push recovery controller, i.e., the region of state space in which the system can be stabilized, can be derived as

$$|\dot{x}/\omega + x| < \tau_{ankle}^{MAX}/mg, \tag{15}$$

and the stability region for the hip strategy plus the ankle strategy is

$$|\dot{x}/\omega + x| < (\tau_{ankle}^{MAX} + \tau_{hip}^{MAX} (e^{\omega T_{H1}} - 1)^2)/mg. \tag{16}$$

Finally, if we assume instantaneous support point transition without loss of linear momentum, we have the following stability region for using all three strategies at once:

$$|\dot{x}/\omega + x| < (\tau_{ankle}^{MAX} + \tau_{hip}^{MAX} (e^{\omega T_{H1}} - 1)^2)/mg + x_{capture}^{MAX}, \tag{17}$$

where $x_{capture}^{MAX}$ is the maximum step size available. In this case we can use the two boundary conditions in (15) and (16) to select between controllers based on the current state. For the more realistic case with a multi-segmented body with motor dynamics as on a physical robot, these theoretical boundaries do not fit well and the high-level controller needs to be trained from experience. This is covered in more detail later in this paper.
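For the abstract model, the analytic boundaries (15)–(17) already define a simple high-level selection rule. The sketch below is one way to write it down under the stated LIPM assumptions; the function name, the returned labels and all numerical parameter values are hypothetical and serve only to illustrate the nesting of the three regions.

```python
import math


def select_strategy(x, xdot, m, z0, tau_ankle_max, tau_hip_max, t_h1,
                    x_capture_max, g=9.81):
    """Pick the smallest set of strategies whose analytic region contains the state."""
    omega = math.sqrt(g / z0)
    s = abs(xdot / omega + x)  # left-hand side of (15)-(17)

    ankle_limit = tau_ankle_max / (m * g)  # Eq. (15)
    hip_limit = (tau_ankle_max
                 + tau_hip_max * (math.exp(omega * t_h1) - 1.0) ** 2) / (m * g)  # Eq. (16)
    step_limit = hip_limit + x_capture_max  # Eq. (17)

    if s < ankle_limit:
        return "ankle"
    if s < hip_limit:
        return "ankle+hip"
    if s < step_limit:
        return "ankle+hip+step"
    return "unrecoverable"


if __name__ == "__main__":
    # Hypothetical small-robot parameters, not measured values.
    params = dict(m=2.8, z0=0.3, tau_ankle_max=0.5, tau_hip_max=1.0,
                  t_h1=0.1, x_capture_max=0.05)
    for xdot in (0.05, 0.15, 0.4, 1.0):
        print(xdot, select_strategy(0.0, xdot, **params))
```

Because each region contains the previous one, the rule escalates from the ankle strategy to the hip and step strategies as the disturbance grows.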


3. Integration with Walk Controller

As we have seen before, the step strategy requires reactive modification of the stepping sequence and the foot trajectory, as shown in Fig. 8. However, reactive stepping is generally not possible with typical ZMP tracking approaches, where the reference ZMP trajectory is calculated in advance and the COM trajectory is generated to minimize the ZMP error. Recently, there have been approaches to generate walking patterns online to overcome this limitation, including a ZMP preview-based algorithm that updates the COM trajectory at a high frequency,30 and a real-time gait planning method based on the analytic solution of the LIPM with a parametrized ZMP trajectory.4,5 Step push recovery based on these approaches has been successfully implemented on the HRP-2 robot25,11 and the Toyota partner robot.6 Another method for real-time walk pattern generation is the biologically inspired, central-pattern-generator-based approach. This approach has been implemented on the Hubo robot and demonstrated step push recovery behavior while hopping in place.10 Due to its simplicity, this approach has been widely used for small, resource constrained humanoid robots,19,31,32 but it is generally harder to design a stable trajectory as it is not based on an explicit stability criterion.

Our walk controller is based on the analytic solution of the LIPM, but further simplified to be implemented on resource constrained robots. The walk pattern is divided into discrete steps, and the overall walk control is separated into a footstep generation controller and a trajectory controller. The footstep generation controller generates the parameters for the next step, including the initial and final position of each foot and support foot information, and generates the reference ZMP trajectory based on them. The trajectory controller generates foot and torso trajectories for the current step based on those parameters. We describe more details of our walk controller in the following subsections.

3.1. Footstep generation controller

Our first assumption is that walking is divided into discrete steps, which start and end with a double support phase. Then we can define the ith step as a set of


Fig. 8. Two different cases of reactive stepping. (a) The inter-step override which uses the same support foot and updates the foot trajectory for the next step. (b) The intra-step override which updates the current foot trajectory during stepping.


parameters

$$STEP_i = \{SF_i, L_i, C_i, R_i, L_{i+1}, C_{i+1}, R_{i+1}\}, \tag{18}$$

where $SF_i$ denotes the support foot, and $L_i, C_i, R_i$ and $L_{i+1}, C_{i+1}, R_{i+1}$ are the initial and final 2D poses of the left foot, torso and right foot in $(x, y, \theta)$ coordinates. The landing foot pose is calculated from the current foot configuration, the commanded walk velocity, and kinematic and self-collision constraints. To make the step transition occur at the most stable posture, we set the boundary torso pose $C_i$ to be the midpoint of $L_i$ and $R_i$ for all $i$. When an inter-step override is required as in Fig. 8(a), the current commanded walk velocity is overridden and the next landing foot position is determined according to the push direction.

A single step is further divided into three stages: the first double support stage, when the ZMP moves to the current support foot; the single support stage, when the ZMP lies on the support foot; and the second double support stage, when the ZMP moves back to the final torso position. If we define the walk phase $\phi$ as $t/t_{STEP}$, where $t$ is the time passed since the step started and $t_{STEP}$ is the duration of the step, we can design the ZMP trajectory $p_i(\phi)$ as a piecewise linear function of $\phi$ as

$$p_i(\phi) = \begin{cases} C_i \left(1 - \dfrac{\phi}{\phi_1}\right) + L_i \dfrac{\phi}{\phi_1}, & 0 \le \phi < \phi_1, \\ L_i, & \phi_1 \le \phi < \phi_2, \\ C_{i+1} \left(1 - \dfrac{1 - \phi}{1 - \phi_2}\right) + L_i \dfrac{1 - \phi}{1 - \phi_2}, & \phi_2 \le \phi < 1, \end{cases} \tag{19}$$

for the left support foot case and

$$p_i(\phi) = \begin{cases} C_i \left(1 - \dfrac{\phi}{\phi_1}\right) + R_i \dfrac{\phi}{\phi_1}, & 0 \le \phi < \phi_1, \\ R_i, & \phi_1 \le \phi < \phi_2, \\ C_{i+1} \left(1 - \dfrac{1 - \phi}{1 - \phi_2}\right) + R_i \dfrac{1 - \phi}{1 - \phi_2}, & \phi_2 \le \phi < 1, \end{cases} \tag{20}$$

for the right support foot case, where $\phi_1$ and $\phi_2$ are the timing parameters determining the transition between the single support and double support phases. The step controller and resulting ZMP trajectory are shown in Fig. 9.
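The step parameters in (18) and the piecewise linear ZMP reference in (19) and (20) can be sketched as follows. To keep the example short it uses scalar lateral positions instead of the 2D poses $(x, y, \theta)$ used in the paper; the class and field names are hypothetical, and the default timing parameters are the values quoted in the caption of Fig. 9.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One-dimensional slice of STEP_i in Eq. (18) (illustrative only)."""
    support_is_left: bool
    left0: float   # L_i
    torso0: float  # C_i
    right0: float  # R_i
    left1: float   # L_{i+1}
    torso1: float  # C_{i+1}
    right1: float  # R_{i+1}


def zmp_reference(step, phi, phi1=0.2, phi2=0.8):
    """Piecewise linear ZMP p_i(phi) of Eqs. (19) and (20) for 0 <= phi < 1."""
    support = step.left0 if step.support_is_left else step.right0
    if phi < phi1:
        w = phi / phi1  # first double support: torso -> support foot
        return step.torso0 * (1.0 - w) + support * w
    if phi < phi2:
        return support  # single support: ZMP stays on the support foot
    w = (1.0 - phi) / (1.0 - phi2)  # second double support: support foot -> torso
    return step.torso1 * (1.0 - w) + support * w


if __name__ == "__main__":
    step = Step(True, 0.04, 0.0, -0.04, 0.04, 0.0, -0.04)
    for phi in (0.0, 0.1, 0.5, 0.9):
        print(phi, zmp_reference(step, phi))
```

The same function serves both support cases, since (19) and (20) differ only in which foot position is used as the support point.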

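3.2. Trajectory controller

The trajectory controller generates the foot and torso trajectories for the current step defined in (18). First, we define the single support walk phase $\phi_{single}$ as

$$\phi_{single} = \begin{cases} 0, & 0 \le \phi < \phi_1, \\ \dfrac{\phi - \phi_1}{\phi_2 - \phi_1}, & \phi_1 \le \phi < \phi_2, \\ 1, & \phi_2 \le \phi < 1, \end{cases} \tag{21}$$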


Fig. 9. The step-based walk controller. (a) An example of walking behavior which is composed of two steps, $STEP_i$ and $STEP_{i+1}$. (b) Corresponding lateral ZMP and torso trajectories $p(\phi)$ and $x(\phi)$. Timing parameters of $\phi_1 = 0.2$ and $\phi_2 = 0.8$ are used.

then we use the following heuristic trajectory function with parameters $\alpha$, $\beta$:

$$f_T(\phi) = \phi^{\alpha} + \beta \phi (1 - \phi), \tag{22}$$

to generate the foot trajectories for both feet $l_i(\phi)$, $r_i(\phi)$:

$$l_i(\phi) = L_i (1 - f_T(\phi_{single})) + L_{i+1} f_T(\phi_{single}), \tag{23}$$

$$r_i(\phi) = R_i (1 - f_T(\phi_{single})) + R_{i+1} f_T(\phi_{single}). \tag{24}$$

Then the torso trajectory $x_i$ is calculated to satisfy the following ZMP criterion for the LIPM:

$$\ddot{x}_i = (x_i - p_i(\phi))/t_{ZMP}^2, \tag{25}$$

where $t_{ZMP} = \sqrt{z_0/g}$. The piecewise linear ZMP trajectory we use in (19) and (20) yields the following closed-form solution of $x_i(\phi)$ during the step period $0 \le \phi < 1$:

$$x_i(\phi) = \begin{cases} p_i(\phi) + a_i^p e^{\phi/\phi_{ZMP}} + a_i^n e^{-\phi/\phi_{ZMP}} - m_i \phi_{ZMP} \sinh\dfrac{\phi - \phi_1}{\phi_{ZMP}}, & 0 \le \phi < \phi_1, \\ p_i(\phi) + a_i^p e^{\phi/\phi_{ZMP}} + a_i^n e^{-\phi/\phi_{ZMP}}, & \phi_1 \le \phi < \phi_2, \\ p_i(\phi) + a_i^p e^{\phi/\phi_{ZMP}} + a_i^n e^{-\phi/\phi_{ZMP}} - n_i \phi_{ZMP} \sinh\dfrac{\phi - \phi_2}{\phi_{ZMP}}, & \phi_2 \le \phi < 1, \end{cases} \tag{26}$$

where $\phi_{ZMP} = t_{ZMP}/t_{STEP}$ and $m_i$, $n_i$ are ZMP slopes which are defined as follows for the left support case:

$$m_i = (L_i - C_i)/\phi_1, \tag{27}$$

$$n_i = -(L_i - C_{i+1})/(1 - \phi_2), \tag{28}$$


and for the right support case:

$$m_i = (R_i - C_i)/\phi_1, \tag{29}$$

$$n_i = -(R_i - C_{i+1})/(1 - \phi_2). \tag{30}$$

The parameters $a_i^p$ and $a_i^n$ can then be uniquely determined from the boundary conditions $x_i(0) = C_i$ and $x_i(1) = C_{i+1}$. This analytic solution of the torso trajectory is continuous and has zero ZMP error during each step period, but may have discontinuous velocity at the transition when the commanded velocity is changing. However, we found this does not hamper stability, as the transition occurs in the middle of the most stable double support stance.

In addition to calculating foot trajectories based upon predetermined target foot poses from the step controller, the trajectory controller handles the intra-step override shown in Fig. 8(b) by updating the landing position of the current swing foot towards the capture point. As the new landing point has to satisfy kinematic and velocity constraints, the override is most effective at the initial phase of the step.
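The closed-form torso trajectory can be checked numerically. The sketch below works in the normalized phase $\phi$ with scalar lateral positions and assumes the sinh correction terms take the form $-m_i \phi_{ZMP} \sinh((\phi - \phi_1)/\phi_{ZMP})$ and $-n_i \phi_{ZMP} \sinh((\phi - \phi_2)/\phi_{ZMP})$, which is the form that keeps the torso position and velocity continuous at $\phi_1$ and $\phi_2$ with a single pair of coefficients. The coefficients $a_i^p$ and $a_i^n$ are solved from $x_i(0) = C_i$ and $x_i(1) = C_{i+1}$. The function name and the numerical values are hypothetical.

```python
import math


def torso_trajectory(c0, c1, support, phi1, phi2, t_zmp, t_step):
    """Return x(phi) and p(phi) for one step with torso boundary poses c0, c1."""
    phi_z = t_zmp / t_step                    # phi_ZMP
    m = (support - c0) / phi1                 # ZMP slope in the first double support
    n = -(support - c1) / (1.0 - phi2)        # ZMP slope in the second double support

    # Boundary conditions x(0) = c0 and x(1) = c1 give a 2x2 linear system.
    b0 = -m * phi_z * math.sinh(phi1 / phi_z)
    b1 = n * phi_z * math.sinh((1.0 - phi2) / phi_z)
    e1 = math.exp(1.0 / phi_z)
    a_p = (b1 - b0 / e1) / (e1 - 1.0 / e1)
    a_n = b0 - a_p

    def zmp(phi):
        if phi < phi1:
            return c0 + m * phi
        if phi < phi2:
            return support
        return c1 - n * (1.0 - phi)

    def x(phi):
        value = zmp(phi) + a_p * math.exp(phi / phi_z) + a_n * math.exp(-phi / phi_z)
        if phi < phi1:
            value -= m * phi_z * math.sinh((phi - phi1) / phi_z)
        elif phi >= phi2:
            value -= n * phi_z * math.sinh((phi - phi2) / phi_z)
        return value

    return x, zmp


if __name__ == "__main__":
    x, zmp = torso_trajectory(0.0, 0.0, 0.04, 0.2, 0.8, t_zmp=0.175, t_step=0.5)
    for phi in (0.0, 0.2, 0.5, 0.8, 1.0):
        print(phi, zmp(phi), x(phi))
```

In normalized phase the ZMP criterion (25) reads $d^2 x_i/d\phi^2 = (x_i - p_i(\phi))/\phi_{ZMP}^2$, which can be verified on the returned functions with a finite difference.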

4. Learning the High-Level Push Recovery Controller

In the previous sections, we have described our hierarchical push recovery controller structure and its implementation for a position controlled robot. As we have discussed, although there are analytic decision boundaries for simplified models to select the appropriate set of push recovery controllers based on the current state, such decision rules may not work well with more realistic dynamic models. Instead of relying on the abstract model, our previous work used a machine learning approach, where we directly trained the parametrized controller from experience. We implemented three parametrized push recovery strategies for a resource constrained robot with high gain position control, used reinforcement learning to learn the high-level controller that governs the three push recovery controllers from raw sensory inputs using a full-body model of the robot in a simulated environment, and used the learned controller on a small humanoid robot walking in place.33

An insight gained from physical experiments is that modest pushes can be effectively stabilized using the ankle strategy alone, and the magnitudes of the hip and step strategies are limited on the physical robot due to kinematic and motor constraints. In other words, it is sufficient to fix $\|\theta_{hip}^{MAX}\|$ and $\|x_{capture}\|$, which greatly reduces the action space compared to previous parametrized controllers. Still, such a direct approach is not data efficient as it does not utilize knowledge of the decision surface, and applying this approach on a real robot with scarce training data requires much simplification of the controller.34

In this work, we take a hybrid approach. We use a low-dimensional decision boundary in state space, but instead of relying on a theoretical boundary from a simplified model, we use the training data from a simulated environment to get the empirical decision boundary for the push recovery controllers. The model can then be easily trained afterwards with a limited amount of data from the physical robot.


4.1. Resource constrained humanoid platform

Most of the physical implementations of push recovery controllers introduced so far use human-sized robots, usually equipped with harmonic gear drive trains, triaxial force–torque sensors and torque-controlled actuators. On the other hand, lightweight, low-cost humanoid robots with off-the-shelf servomotors are now gaining popularity, in part due to the commercial availability of affordable small humanoids. Although those affordable humanoids are limited in terms of their sensory, motor and processing power, they have served as viable research platforms in many areas, including balancing control during walking. A number of push recovery approaches have been implemented on such platforms, including a crouching reflex similar to the hip strategy,35 frontal hip strategy,36,37 lateral step strategy31 and frontal ankle and step strategy.38

For this work, we use the commercially available DARwIn-OP humanoid robot and its simulation model as the test platform. It is 45 cm tall, weighs 2.8 kg, and has 20 degrees of freedom. It has a 3-axis accelerometer and gyroscope for inertial sensing, and joint encoders at each joint for proprioceptive sensing. Position-controlled Dynamixel servos are used as actuators, which are controlled by a custom microcontroller connected to an embedded PC at a control frequency of 100 Hz.

4.2. The extended inverted pendulum model

The abstract model we used in the previous sections does not fit the physical humanoid platform well. The most notable difference is that the physical robot has feet with nonzero size, and the robot can be tipped on the boundary of the foot. Furthermore, the ankle torque is only indirectly controlled by proportional control. Finally, estimating the linear position and velocity of the COM from noisy sensors can be very hard. Proprioceptive sensors can be used to determine the COM position if we assume the support foot is on the ground, but this assumption will not hold if the robot is perturbed hard. Instead, we have found that the angular velocity and tilt angle information from inertial sensors is more reliable.

Thus, we propose a new abstract model for a resource constrained humanoid robot, which is shown in Fig. 10(a). It is an inverted pendulum with the tilt angle $\theta$ as state, and has a foot with toe position $\delta^{+}$ and heel position $\delta^{-}$ from the ankle joint. The ankle torque $\tau_{ankle}$ is controlled by a PD control of $\theta$ with saturation values $mg\delta^{+}$ and $mg\delta^{-}$:

$$\ddot{\theta} = \omega^{2}\left(\sin(\theta) - (\tau_{ankle} + \tau_{hip})/mgz_0\right), \tag{31}$$

$$\tau_{ankle} = f_{sat}(K_p''\theta + K_d''\dot{\theta}). \tag{32}$$


Fig. 10. The extended inverted pendulum model and the phase space trajectory plots generated using different push recovery strategies. (a) An inverted pendulum model of robot with position controlled ankle joint and foot. (b) Ankle strategy. (c) Ankle plus hip strategy. (d) Ankle plus step strategy. White and gray regions in (b)–(d) are theoretical stable and unstable regions from (34). Darker gray regions in (c) and (d) are the increased stable region compared to using the ankle strategy alone.

We can linearize the stability regions in (15), (16) and consider the saturated case to get the following stability regions for the ankle, hip and step strategies:

$$\delta^{-}/z_0 < \dot{\theta}/\omega + \theta < \delta^{+}/z_0, \tag{34}$$

$$\begin{aligned} \dot{\theta}/\omega + \theta &> \delta^{-}/z_0 - \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^{2}/mgz_0, \\ \dot{\theta}/\omega + \theta &< \delta^{+}/z_0 + \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^{2}/mgz_0, \end{aligned} \tag{35}$$

$$\begin{aligned} \dot{\theta}/\omega + \theta &> \delta^{-}/z_0 - \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^{2}/mgz_0 - x_{capture}^{MAX}/z_0, \\ \dot{\theta}/\omega + \theta &< \delta^{+}/z_0 + \tau_{hip}^{MAX}(e^{\omega T_{H1}} - 1)^{2}/mgz_0 + x_{capture}^{MAX}/z_0. \end{aligned} \tag{36}$$

Figures 10(b)–10(d) show three trajectory plots acquired from various initial pushes using the extended inverted pendulum model and three different sets of push recovery strategies. Parameters used are $m = 2$, $z_0 = 0.295$, $\delta^{+} = 0.05$, $\delta^{-} = -0.05$, $K_p'' = 500$, $K_d'' = 57.83$, $\tau_{hip}^{MAX} = 1$, $T_{H1} = 0.3$, $x_{capture}^{MAX} = 0.08$, which are based on the multi-body model of the DARwIn-OP robot. A reduced mass of $m = 2$ is used to compensate for the large leg mass of the robot. We see that the hip and step strategies help to enlarge the stability region, and even with the nonlinear dynamic model we use, the empirical stability region of the ankle strategy closely follows the theoretical one derived using the simpler LIPM.
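The ankle-only case can be reproduced numerically with a short sketch. It integrates the pendulum dynamics of (31) with the saturated PD ankle torque of (32) and compares the outcome with the linearized region of (34), using the parameter values quoted above. Several details are assumptions of this sketch and not taken from the paper: $f_{sat}$ is treated as a simple clamp, the hip torque is set to zero, a semi-implicit Euler integrator with a 1 ms step is used, and a tilt beyond 0.5 rad is counted as a fall.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def f_sat(u, lower, upper):
    """Clamp the commanded ankle torque to [lower, upper] (assumed form)."""
    return max(lower, min(upper, u))


def in_ankle_region(theta, theta_dot, z0=0.295, d_plus=0.05, d_minus=-0.05):
    """Linearized ankle-strategy stability test of the form in (34)."""
    omega = math.sqrt(G / z0)
    s = theta_dot / omega + theta
    return d_minus / z0 < s < d_plus / z0


def recovers(theta, theta_dot, m=2.0, z0=0.295, d_plus=0.05, d_minus=-0.05,
             kp=500.0, kd=57.83, dt=0.001, t_end=3.0, fall_angle=0.5):
    """Simulate the ankle strategy alone; True if the tilt settles near zero."""
    omega2 = G / z0
    for _ in range(round(t_end / dt)):
        tau = f_sat(kp * theta + kd * theta_dot,
                    m * G * d_minus, m * G * d_plus)
        theta_ddot = omega2 * (math.sin(theta) - tau / (m * G * z0))
        theta_dot += theta_ddot * dt  # semi-implicit Euler step
        theta += theta_dot * dt
        if abs(theta) > fall_angle:
            return False  # treated as a fall
    return abs(theta) < 0.01


for push in (0.8, 1.3):  # initial tilt velocity after an impulsive push
    print(push, in_ankle_region(0.0, push), recovers(0.0, push))
```

For these two pushes the simulated outcome agrees with the linear test, which is consistent with the observation that the empirical region of the ankle strategy follows the LIPM-derived one.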

4.3. The ankle strategy with multi-body model

To model more realistic, multi-body dynamics of the robot, we use the Webots commercial robotic simulator,39 which is based on the Open Dynamics Engine physics library, together with the supplied simulated model of the DARwIn-OP robot. We use our modular open source humanoid framework40 to control the robot. The controller update frequency and physics simulation frequency are both set to 100 Hz. For the walk parameters, we use the COM height $z_0 = 0.295$, step duration $t_{STEP} = 0.50$ and robot center to ankle width $d_{stance} = 0.375$. For the ankle strategy gain parameters, we use values of $K_p'' = 0$, $K_d'' = 0.15$, which are found to be effective in practice, as the position controlled joints and nonpoint feet already apply a positional negative feedback to the system. The robot is pushed with impulse forces for one timestep (0.01 s) with different magnitudes and directions, various combinations of push recovery strategies are evaluated, and the state trajectories are logged.

Figures 11(a) and 11(b) show the empirical decision boundaries found for the ankle strategy from frontal and lateral pushes. Although the empirical trajectory plots have shapes similar to those in Fig. 10, the empirical stability regions differ significantly from those obtained via the abstract models. From the trajectory curves, we fit a linear classifier that best separates the two regions over the duration 0.03 < t < 0.3, as our impulse impact setup produces an unrealistically large spike in the sensor readings for one or two simulation steps. We then obtain the estimated values for $\hat{\delta}$ and $\hat{z}_0$ shown in Table 1, which imply the following empirical stability boundaries for the ankle strategy:

$$-\frac{\dot{\theta}}{\sqrt{g/\hat{z}_0^{-}}} + \frac{\hat{\delta}^{-}}{\hat{z}_0^{-}} < \theta < -\frac{\dot{\theta}}{\sqrt{g/\hat{z}_0^{+}}} + \frac{\hat{\delta}^{+}}{\hat{z}_0^{+}}. \tag{37}$$

Fig. 11. Phase space trajectory plots generated with the multi-body model and different push recovery strategies in physically realistic simulations. (a) Ankle strategy, frontal push. (b) Ankle strategy, lateral push. (c) Ankle plus hip strategy, lateral push. (d) Ankle plus step strategy, frontal push. White and gray regions are theoretical stable and unstable regions from (34)–(36). Thick dashed lines are estimated linear boundary between stable and unstable regions.

Table 1. Parameter values estimated from the multi-body model.

Parameter   $\hat{\delta}^{+}$   $\hat{z}_0^{+}$   $\hat{\delta}^{-}$   $\hat{z}_0^{-}$
Frontal     0.45                 1.09              −0.42                1.02
Lateral     0.84                 1.45              −0.84                1.45

4.4. Deciding between hip and step strategies

Given the empirical stability region of the ankle strategy controller, if the perturbations fall outside that region, we need to employ other push recovery controllers in addition to the ankle controller to handle them. The LIPM-based abstract models (Figs. 4 and 5) imply the theoretical stability regions described in (16) and (17), which can grow quite large with large $\tau_{hip}^{MAX}$ and $x_{capture}^{MAX}$. However, due to kinematic and velocity constraints, we have practical limits for those values. Taking a step also takes time, which further restricts the effectiveness of the step strategy. We set $\|x_{hip}\| = 40^{\circ}$, $T_{H1} = 0.15$, $T_{H2} = 0.3$ and $\|x_{capture}\| = 0.08$ for the hip and step strategy parameters and compare the results of the two strategies.

Figures 11(c) and 11(d) show trajectory plots acquired from two sets of push recovery controllers: the ankle plus hip strategy and the ankle plus step strategy. In this case, a clear boundary for the ankle plus hip strategy is not as evident as in Fig. 10(c), as the inertial sensor of our robot lies in the torso and rotates when the hip strategy is triggered. Decoupling the hip rotation from the sensory readings turned out to be very hard with the noisy sensor model we use, so we instead compared the outcomes of the two push recovery strategies against various magnitudes of perturbation. We have found that against a frontal push, the step strategy can withstand slightly larger maximum perturbations than the hip strategy (1.04 Ns for the hip strategy versus 1.05 Ns for the step strategy), and that the step strategy has a wider region of stability than the hip strategy with fixed parameter values $\tau_{hip}^{MAX}$ and $x_{capture}^{MAX}$. On the other hand, the step strategy is not available for purely lateral perturbations due to kinematic constraints, and we have to rely on the hip strategy for such cases.

In summary, the decision rule for push recovery strategies is as follows. We keep the ankle strategy active at all times, and if the state estimate moves beyond the empirical stability boundaries in (37), the step strategy is triggered. If the step strategy is not available due to constraints, the hip strategy is triggered instead.
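The decision rule summarized above can be written as a short sketch. It assumes the empirical boundary (37) has the same form as the LIPM region, i.e. $\hat{\delta}^{-}/\hat{z}_0^{-} - \dot{\theta}/\sqrt{g/\hat{z}_0^{-}} < \theta < \hat{\delta}^{+}/\hat{z}_0^{+} - \dot{\theta}/\sqrt{g/\hat{z}_0^{+}}$, and that the four columns of Table 1 are ordered $\hat{\delta}^{+}$, $\hat{z}_0^{+}$, $\hat{\delta}^{-}$, $\hat{z}_0^{-}$. The function names and the `step_available` flag are hypothetical; how step availability is determined from the kinematic constraints is not modeled here.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

# (delta_plus, z0_plus, delta_minus, z0_minus), read from Table 1 under the
# column order assumed in the text above.
TABLE_1 = {
    "frontal": (0.45, 1.09, -0.42, 1.02),
    "lateral": (0.84, 1.45, -0.84, 1.45),
}


def inside_ankle_boundary(theta, theta_dot, params):
    """True if the state lies inside the assumed empirical ankle region."""
    d_plus, z_plus, d_minus, z_minus = params
    upper = d_plus / z_plus - theta_dot / math.sqrt(G / z_plus)
    lower = d_minus / z_minus - theta_dot / math.sqrt(G / z_minus)
    return lower < theta < upper


def select_strategies(theta, theta_dot, axis, step_available):
    """Return the set of active push recovery strategies for one axis."""
    if inside_ankle_boundary(theta, theta_dot, TABLE_1[axis]):
        return ("ankle",)            # ankle strategy is always active
    if step_available:
        return ("ankle", "step")     # preferred response outside the region
    return ("ankle", "hip")          # fallback when stepping is not possible


print(select_strategies(0.0, 0.0, "frontal", True))
print(select_strategies(0.0, 2.0, "frontal", True))
print(select_strategies(0.0, 3.0, "lateral", False))
```

Passing `step_available=False` for lateral perturbations mirrors the statement that the step strategy cannot be used for purely lateral pushes.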

4.5. Comparison with ZMP tracking controller

To demonstrate the effectiveness of the hierarchical push recovery controller, we compare it to the commonly used closed-loop ZMP tracking controller. We implement the ZMP tracking controller based on Ref. 18, with the single difference that the current state is estimated using an inertial sensor rather than joint encoders and forward kinematics. All other parameters remain unchanged. Various amounts of frontal and lateral impulses were applied to the robot, and the outcome of the push recovery effort was logged for each controller. Figure 12 shows the comparison of the two controllers for forward, backward and sideways pushes. Figure 13 shows the stability regions of four different combinations of push recovery controller settings. We can see that the step strategy handles frontal perturbations fairly well, and the hip strategy is effective for lateral perturbations, where the step strategy cannot be used due to the kinematic constraint. Overall, the stability region of the suggested approach is approximately 21% larger than, and completely encompasses, that of the ZMP tracking method.

Fig. 12. A comparison of the ZMP tracking controller and suggested hierarchical push recovery controller with different impulse forces. (a) ZMP tracking controller, 1.05 Ns of frontal push. (b) Hierarchical push recovery controller, 1.05 Ns of frontal push. (c) ZMP tracking controller, 1.04 Ns of backward push. (d) Hierarchical push recovery controller, 1.04 Ns of backward push. (e) ZMP tracking controller, 1.61 Ns of lateral push. (f) Hierarchical push recovery controller, 1.61 Ns of lateral push.

Fig. 13. Comparison of stability regions for four different push recovery settings.

4.6. Extension to the full-sized humanoid robots

In this paper, we have used only the DARwIn-OP miniature humanoid robot for testing in both the simulated and the real environments. This robot has relatively large feet and a larger power to weight ratio compared to common full-sized robots. To see how our method can extend to a larger robot, we have made a comparison to a full-sized position controlled humanoid robot, THOR-RD, which we have used for the DARPA Robotics Challenge.41,42 The overall dimensions of the two robots are shown in Fig. 14, and a more detailed comparison between the two robots is provided in Table 2.

We have found that, due to its relatively lower COM height, the larger THOR-RD robot has a slightly larger foot length to COM height ratio, which affects the maximum tilt angle the robot can recover from. On the other hand, due to the lower power to weight ratio and larger dimensions of the robot, the maximum horizontal torso acceleration possible with full ankle torque is approximately three times smaller than that of the DARwIn-OP robot. Overall, we therefore expect the ankle strategy to work similarly with the larger robot, albeit less responsively. From (16) and (35), we can assume that the effectiveness of the hip strategy is roughly proportional to $\tau_{hip}^{MAX}/mg$. Comparing this quantity for the two robots shows that, under this assumption, the hip strategy will be approximately 15% less effective with the larger THOR-RD robot. Finally, the THOR-RD robot has a longer natural pendulum period due to its higher COM height, and can take a larger step relative to the foot length. We expect both of these factors to help the step strategy.

In summary, we expect the suggested controller to work with larger position controlled humanoid robots as well, although the torque limit of the actuators can moderately degrade the performance of some strategies. Unfortunately, at the time of writing, we could not test the controller with the THOR-RD robot, as we could not risk possible hardware damage. This remains future work.

Fig. 14. Comparison of the dimensions of the DARwIn-OP miniature humanoid robot and the THOR-RD full-sized humanoid robot.

Table 2. Detailed comparison of the DARwIn-OP miniature humanoid robot and the THOR-RD full-sized humanoid robot.

Quantity                           DARwIn-OP   THOR-RD   Ratio
Total height (m)                   0.454       1.54      3.4
COM height (m)                     0.295       0.70      2.37
Foot length (m)                    0.104       0.260     2.5
Foot width (m)                     0.066       0.160     2.42
Leg link length (m)                0.186       0.600     3.22
Weight (kg)                        2.8         58        20.7
Max torque (Nm)                    2.5         44.2      17.68
Foot length/COM height ratio       0.35        0.37      1.06
Leg/foot length ratio              1.78        2.30      1.29
Natural pendulum period (s)        0.17        0.27      1.59
Max COM acceleration (m/s^2)       3.03        1.09      0.36
Max torque/mass ratio              0.89        0.76      0.85
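The derived rows of Table 2 can be recomputed from the basic dimensions with a short sketch. The formulas here are inferred because they reproduce the tabulated values to within rounding, and are not stated explicitly in the text: the natural pendulum period is taken as $\sqrt{z_0/g}$ and the maximum COM acceleration as $\tau^{MAX}/(m z_0)$. The dictionary keys and function name are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

# Basic quantities copied from Table 2 (SI units).
ROBOTS = {
    "DARwIn-OP": {"com_height": 0.295, "foot_length": 0.104,
                  "leg_length": 0.186, "mass": 2.8, "max_torque": 2.5},
    "THOR-RD": {"com_height": 0.70, "foot_length": 0.260,
                "leg_length": 0.600, "mass": 58.0, "max_torque": 44.2},
}


def derived(r):
    """Derived comparison quantities (formulas assumed, see lead-in)."""
    return {
        "foot_to_com": r["foot_length"] / r["com_height"],
        "leg_to_foot": r["leg_length"] / r["foot_length"],
        "pendulum_period": math.sqrt(r["com_height"] / G),
        "max_com_accel": r["max_torque"] / (r["mass"] * r["com_height"]),
        "torque_per_mass": r["max_torque"] / r["mass"],
    }


small = derived(ROBOTS["DARwIn-OP"])
large = derived(ROBOTS["THOR-RD"])

# Relative hip-strategy effectiveness under the tau_hip^MAX / (m g) assumption;
# g cancels in the ratio.
hip_ratio = large["torque_per_mass"] / small["torque_per_mass"]

for key in small:
    print(f"{key:16s} {small[key]:.3f} {large[key]:.3f}")
print(f"hip effectiveness ratio: {hip_ratio:.2f}")
```

The ratio of torque per unit mass comes out close to 0.85, which matches the statement that the hip strategy is expected to be about 15% less effective on the larger robot.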

5. Experimental Results

In addition to the simulated environment, we have implemented the integrated walk controller with push recovery on a commercially available DARwIn-OP small humanoid robot. All code and parameter values used for simulation are also used to control the physical robot, with the help of our modular open source humanoid framework.

5.1. Hardware setup

To generate repeatable external perturbations, a motorized moving platform was constructed using Dynamixel servomotors (Fig. 15). To generate maximum peak acceleration, the platform is slowly accelerated in one direction and then suddenly accelerated in the opposite direction. We have found that the platform can generate accelerations greater than 0.5 g while carrying the robot, providing perturbations large enough to make the robot fall without stabilization.

Fig. 15. The servo platform to generate controlled perturbation. (a) The ankle strategy alone cannot withstand the perturbation generated by the moving platform. (b) The robot can withstand the same magnitude of perturbation with the hip strategy.

5.2. Empirical decision boundary with physical robot

We applied perturbations of various magnitudes to the robot from the front, back, and one side while running the ankle strategy controller, and measured the inertial sensor readings for one second to generate state trajectories of the robot. Figure 16 shows the state trajectories in the frontal and lateral axes, which are filtered with a moving average filter with n = 3. For the frontal pushes, the trajectory plot shown in the top part of Fig. 16(a) closely follows the graph acquired using the simulated multi-body model in Fig. 11(b), showing an almost linear boundary between stable and unstable trajectories, although its slope is quite different from the theoretical LIPM one shown in gray shade. For backward pushes, however, the boundary is nonlinear in the initial part of the trajectory. This is due to mechanical backlash of the joints, and it is only noticeable for backward pushes because the robot leans slightly forward in the default standing pose, which eliminates the effect of backlash for frontal pushes. From the sets of state trajectories, we obtain the linear boundaries with the estimated parameters $\hat{\delta}$ and $\hat{z}_0$ shown in Table 3.
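How a fitted boundary line maps to the parameters $\hat{\delta}$ and $\hat{z}_0$ can be sketched as follows. The sketch assumes the boundary has the LIPM form $\dot{\theta}/\omega + \theta = \delta/z_0$ with $\omega = \sqrt{g/z_0}$, so that in the $(\theta, \dot{\theta})$ plane the line has slope $-\omega$ and intercept $\omega\delta/z_0$. For brevity it substitutes an ordinary least-squares line through hand-picked boundary samples for the linear classifier used in the paper; the function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def fit_line(points):
    """Least-squares fit of theta_dot = slope * theta + intercept."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def boundary_params(slope, intercept):
    """Convert a boundary line to (delta_hat, z0_hat) under the LIPM form."""
    omega = -slope                       # boundary slope is -omega
    z0_hat = G / omega ** 2              # from omega = sqrt(g / z0)
    delta_hat = intercept * z0_hat / omega
    return delta_hat, z0_hat


# Synthetic boundary samples generated from delta = 0.45, z0 = 1.09,
# for which omega = sqrt(9.81 / 1.09) = 3.0.
samples = [(t, -3.0 * t + 3.0 * 0.45 / 1.09) for t in (-0.2, 0.0, 0.1, 0.3)]
delta_hat, z0_hat = boundary_params(*fit_line(samples))
print(delta_hat, z0_hat)
```

Because the slope fixes $\hat{z}_0$ and the intercept then fixes $\hat{\delta}$, a single line in the phase plane is enough to determine one $(\hat{\delta}, \hat{z}_0)$ pair per push direction.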

Fig. 16. Phase space trajectory plot acquired from frontal and lateral push experiment with the DARwIn-OP robot. (a) Frontal push. (b) Lateral push.

Table 3. Parameter values estimated from the DARwIn-OP robot.

Parameter   $\hat{\delta}^{+}$   $\hat{z}_0^{+}$   $\hat{\delta}^{-}$   $\hat{z}_0^{-}$
Frontal     1.3                  2.7               1.1                  2.7
Lateral     1.4                  2.7               1.4                  2.7

5.3. Testing the push recovery controller

After estimating the boundary values shown in Table 3, we test the hierarchical push recovery controller against perturbations in a realistic setting. Figure 17(a) shows the experimental setup. In each test, the pendulum starts swinging from a stationary state, where the initial position is determined experimentally so that the perturbation is large enough to knock down the standing robot without any stabilization. We use a pendulum mass of 500 g and length of 75 cm, and swing angles of 30° and 45° for frontal and lateral trials, which translate into 1.35 Ns and 1.61 Ns of perturbation, respectively. For each of the three push recovery settings, we performed five trials in total to obtain the standard deviation, where each trial consists of 20 tests. Figure 17(b) shows the comparison of the three stabilization methods. Due to a number of causes, such as battery depletion, slight differences in impact position and temperature buildup at the actuators, there is some deviation in the results, but our controller still significantly helps the robot to withstand large disturbances. Interestingly, we have found that the physical robot can withstand larger perturbations than the simulated one in Fig. 13, probably due to the longer impact duration in the physical setup.

We then let the robot walk at nonzero speed and applied disturbances to it using a soft tipped stick to see how the walk controller handles reactive stepping during locomotion. Figure 18 shows some examples of the robot's response to external disturbances. We see that the suggested controller can successfully trigger appropriate push recovery behaviors during locomotion to keep the robot from falling down.


Fig. 17. The comparison of push recovery controller performances using the DARwIn-OP robot. (a) The experimental setup. (b) Test results.


Fig. 18. Responses of the push recovery controller against perturbation while walking. (a) The ankle and step strategies while walking forward at 18 cm/s. (b) The ankle and step strategies while walking backward at 12 cm/s. (c) The ankle and hip strategies while walking forward at 18 cm/s. (d) The ankle and hip strategies while turning at 0.6 rad/s.


6. Conclusions

We have demonstrated an integrated controller that enables full-body push recovery for humanoid robots without specialized sensors and actuators. Three low-level biomechanically motivated push recovery strategies are implemented on a position controlled humanoid robot, and integrated with a ZMP-based walk controller that allows reactive stepping. Instead of relying on inaccurate theoretical decision surfaces, we propose to use a low-dimensional empirical decision surface for a hierarchical controller that is learned from repeated trials both in a simulated environment using a multi-body model with proportional control joints, and in a real environment using a servo-controlled moving platform and DARwIn-OP small humanoid robot. Experimental results show that the trained controller can successfully initiate a full body push recovery behavior under external perturbations. Potential future work includes incorporating more sophisticated learning algorithms to better utilize the limited training data, and implementing these algorithms on full-sized humanoid robots.

Acknowledgments

We acknowledge the support of the NSF PIRE program under contract OISE-0730206, and ONR SAFFIR program under contract N00014-11-1-0074.

References

1. A. G. Hofmann, Robust Execution of Bipedal Walking Tasks from Biomechanical Principles, Ph.D. Thesis, Computer Science Department (Massachusetts Institute of Technology, Cambridge, MA, USA, 2006), 407 pp.
2. S. Kajita and K. Tani, Study of dynamic biped locomotion on rugged terrain, in IEEE Int. Conf. Robotics and Automation (Sacramento, CA, 1991), pp. 1405–1411.
3. S. Kajita, F. Kanehiro, K. Kaneko, K. Fujiwara, K. Harada and K. Yokoi, Biped walking pattern generation by using preview control of zero-moment point, in IEEE Int. Conf. Robotics and Automation (2003), pp. 1620–1626.
4. K. Harada, S. Kajita, K. Kaneko and H. Hirukawa, An analytical method on real-time gait planning for a humanoid robot, in IEEE–RAS Int. Conf. Humanoid Robots, Vol. 2 (2004), pp. 640–655.
5. M. Morisawa, K. Harada, S. Kajita, K. Kaneko, F. Kanehiro, K. Fujiwara, S. Nakaoka and H. Hirukawa, A biped pattern generation allowing immediate modification of foot placement in real-time, in IEEE–RAS Int. Conf. Humanoid Robots (2006), pp. 581–586.
6. R. Tajima, D. Honda and K. Suga, Fast running experiments involving a humanoid robot, in IEEE Int. Conf. Robotics and Automation (Piscataway, NJ, USA, 2009), pp. 1418–1423.
7. J. Pratt, J. Carff and S. Drakunov, Capture point: A step toward humanoid push recovery, in 6th IEEE–RAS Int. Conf. Humanoid Robots (2006), pp. 200–207.
8. B. Stephens, Humanoid push recovery, in IEEE–RAS Int. Conf. Humanoid Robots (2007).
9. B. Jalgha, D. C. Asmar and I. Elhajj, A hybrid ankle/hip pre-emptive falling scheme for humanoid robots, in IEEE Int. Conf. Robotics and Automation (2011), pp. 1256–1262.
10. B.-K. Cho, S.-S. Park and J.-H. Oh, Stabilization of a hopping humanoid robot for a push, in IEEE–RAS Int. Conf. Humanoid Robots (2010), pp. 60–65.
11. M. Morisawa, F. Kanehiro, K. Kaneko, N. Mansard, J. Sol, E. Yoshida, K. Yokoi and J.-P. Laumond, Combining suppression of the disturbance and reactive stepping for recovering balance, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2010), pp. 3150–3156.
12. K. Hirai, M. Hirose, Y. Haikawa and T. Takenaka, The development of Honda humanoid robot, in IEEE Int. Conf. Robotics and Automation, Vol. 2 (IEEE, 1998), pp. 1321–1326.
13. I.-W. Park, J.-Y. Kim, J. Lee and J.-H. Oh, Mechanical design of humanoid robot platform khr-3 (kaist humanoid robot 3: Hubo), in IEEE–RAS Int. Conf. Humanoid Robots (2005), pp. 321–326.
14. S. Kajita, T. Nagasaki, K. Kaneko, K. Yokoi and K. Tanie, A running controller of humanoid biped hrp-2lr, in IEEE Int. Conf. Robotics and Automation (2005), pp. 616–622.
15. T. Buschmann, S. Lohmeier and H. Ulbrich, Humanoid robot Lola: Design and walking control, J. Physiology-Paris 103(3–5) (2009) 141–148.
16. B.-K. Cho, J.-H. Kim and J.-H. Oh, Online balance controllers for a hopping and running humanoid robot, Adv. Robot. 25(9–10) (2011) 1209–1225.
17. S. Kajita, M. Morisawa, K. Miura, S. Nakaoka, K. Harada, K. Kaneko, F. Kanehiro and K. Yokoi, Biped walking stabilization based on linear inverted pendulum tracking, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2010), pp. 4489–4496.
18. D. Gouaillier, C. Collette and C. Kilner, Omni-directional closed-loop walk for , in IEEE–RAS Int. Conf. Humanoid Robots (2010), pp. 448–454.
19. I. Ha, Y. Tamura and H. Asama, Gait pattern generation and stabilization for humanoid robot based on coupled oscillators, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2011), pp. 3207–3212.
20. V. Prahlad, D. Goswami and M.-H. Chia, Disturbance rejection by online ZMP compensation, Robotica 26 (2008) 9–17.
21. C. Graf and T. Röfer, A closed-loop 3D-LIPM gait for the Robocup standard platform league humanoid, in Fourth Workshop on Humanoid Soccer Robots (2010), pp. 18–22.
22. M. Abdallah and A. Goswami, A biomechanically motivated two-phase strategy for biped upright balance control, in IEEE Int. Conf. Robotics and Automation (2005), pp. 2008–2013.
23. S.-H. Lee and A. Goswami, Ground reaction force control at each foot: A momentum-based humanoid balance controller for non-level and non-stationary ground, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2010), pp. 3157–3162.
24. B. Stephens, Push Recovery Control for Force-Controlled Humanoid Robots, Ph.D. Thesis (Pittsburgh, PA, USA, 2011), 180 pp.
25. H. Diedam, D. Dimitrov, P.-B. Wieber, K. Mombaur and M. Diehl, Online walking gait generation with adaptive foot positioning through linear model predictive control, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2008), pp. 1121–1126.
26. B. Stephens and C. Atkeson, Modeling and control of periodic humanoid balance using the linear biped model, in IEEE–RAS Int. Conf. Humanoid Robots (2009), pp. 379–384.
27. D. L. Wight, E. G. Kubica and D. W. L. Wang, Introduction of the foot placement estimator: A dynamic measure of balance for bipedal robotics, J. Comput. Nonlinear Dynam. 3(1) (2008) 011009.
28. S.-K. Yun and A. Goswami, Momentum-based reactive stepping controller on level and non-level ground for humanoid robot push recovery, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IEEE, 2011), pp. 3943–3950.
29. T. Sugihara, Standing stabilizability and stepping maneuver in planar bipedalism based on the best COM-ZMP regulator, in Proc. 2009 IEEE Int. Conf. Robotics and Automation (ICRA'09) (2009), pp. 669–674.
30. K. Nishiwaki and S. Kagami, High frequency walking pattern generation based on preview control of ZMP, in IEEE Int. Conf. Robotics and Automation (2006), pp. 2667–2672.
31. M. Missura and S. Behnke, Lateral capture steps for bipedal walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011), pp. 401–408.
32. M. Missura and S. Behnke, Omnidirectional capture steps for bipedal walking, in IEEE Int. Conf. Humanoid Robots (2013), pp. 14–20.
33. S.-J. Yi, B.-T. Zhang, D. Hong and D. D. Lee, Learning full body push recovery control for small humanoid robots, in IEEE Int. Conf. Robotics and Automation (2011), pp. 2047–2052.
34. S.-J. Yi, B.-T. Zhang, D. Hong and D. D. Lee, Online learning of a full body push recovery controller for omnidirectional walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011), pp. 1–6.
35. R. Renner and S. Behnke, Instability detection and fall avoidance for a humanoid using attitude sensors and reflexes, in IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2006), pp. 2967–2973.
36. D. N. Nenchev and A. Nishio, Ankle and hip strategies for balance recovery of a biped subjected to an impact, Robotica 26(5) (2008) 643–653.
37. B. Jalgha and D. Asmar, A simple momentum controller for humanoid push recovery, in Advances in Robotics, Vol. 5744 (Springer, Berlin, 2009), Lecture Notes in Computer Science, pp. 95–102.
38. B. Hengst, M. Lange and B. White, Learning ankle-tilt and foot-placement control for flat-footed bipedal balancing and walking, in IEEE–RAS Int. Conf. Humanoid Robots (2011), pp. 288–293.
39. O. Michel, Webots: Professional mobile robot simulation, J. Adv. Robot. Syst. 1(1) (2004) 39–42.
40. S. G. McGill, J. Brindza, S.-J. Yi and D. D. Lee, Unified humanoid robotics software platform, in 5th Workshop on Humanoid Soccer Robots (2010), pp. 7–11.
41. S.-J. Yi, S. G. McGill, L. Vadakedathu, Q. He, I. Ha, J. Han, H. Song, M. Rouleau, B.-T. Zhang, D. Hong, M. Yim and D. D. Lee, Team THOR's entry in the DARPA robotics challenge trials 2013, J. Field Robot. 32(3) (2014) 315–335.
42. S.-G. McGill, S. Yi and D. D. Lee, Team THOR's adaptive autonomy for disaster response humanoids, in IEEE Int. Conf. Humanoid Robots (2015), pp. 453–460.

Seung-Joon Yi received the B.Sc. degree from the School of Electrical Engineering and the Ph.D. degree from the School of Computer Science and Engineering, Seoul National University, Seoul, Korea, in 2000 and 2013, respectively. He is currently a Postdoctoral Fellow at the GRASP Laboratory, University of Pennsylvania, where he has also worked as a Visiting Scholar from 2009 to 2013. He is the author of over 20 technical publications, proceedings, editorials and books. He has been the main developer of the University of Pennsylvania RoboCup robotic soccer team and the DARPA Robotics Challenge team. His research interests include reinforcement learning and humanoid robotics.

Byoung-Tak Zhang received the B.Sc. and M.Sc. degrees in Computer Science and Engineering from Seoul National University, Seoul, Korea, in 1986 and 1988, respectively, and the Ph.D. degree in Computer Science from the University of Bonn, Bonn, Germany, in 1992. He is currently a Professor with the School of Computer Science and Engineering and the Graduate Programs in Bioinformatics, Brain Science and Cognitive Science, SNU, and directs the Biointelligence Laboratory and the Center for Bioinformation Technology. Prior to joining SNU, he was a Research Associate with the German National Research Center for Information Technology (GMD) from 1992 to 1995. From August 2003 to August 2004, he was a Visiting Professor with the Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT, Cambridge. His research interests include probabilistic models of learning and evolution, biomolecular/DNA computing, and molecular learning/evolvable machines.

Dennis Hong is an Associate Professor and the Founding Director of the Robotics and Mechanisms Laboratory (RoMeLa) of the Mechanical Engineering Department at Virginia Tech. His research focuses on and manipulation, autonomous vehicles and humanoid robots. His past awards include the NSF CAREER, the SAE Ralph R. Teetor Award and the ASME Freudenstein/GM Young Investigator Award, and he has been named to Popular Science's "Brilliant 10", to name a few. As the inventor of a number of novel robots and mechanisms, Dr. Hong was called "the Leonardo da Vinci of robots" by Washington Post magazine. He received his degrees in Mechanical Engineering: B.Sc. from the University of Wisconsin Madison (1994), and M.Sc. and Ph.D. degrees from Purdue University (1999, 2002).

Daniel D. Lee is currently a Professor in the School of Engineering and Applied Science at the University of Pennsylvania. He studied Physics, receiving his A.B. from Harvard in 1990, and his Ph.D. in Condensed Matter Physics from MIT in 1995. After completing his studies, he joined Bell Labs, the research and development arm of Lucent Technologies, where he was a Researcher in the Theoretical Physics and Biological Computation departments. After six years in industrial research, he joined the faculty at Penn in 2001, where he is currently in the Electrical and Systems Engineering Department and at the GRASP Robotics Laboratory. His research interests include machine learning, robotics and computational neuroscience.
