Machine Learning Algorithms in Bipedal Robot Control

Shouyi Wang, Wanpracha Chaovalitwongse and Robert Babuška

Abstract—In the past decades, machine learning techniques, such as supervised learning, reinforcement learning, and unsupervised learning, have been increasingly used in the control engineering community. Various learning algorithms have been developed to achieve autonomous operation and intelligent decision making for many complex and challenging control problems. One such problem is bipedal walking robot control. Although still in their early stages, learning techniques have demonstrated promising potential to build adaptive control systems for bipedal robots. This paper gives a review of recent advances in the state-of-the-art learning algorithms and their applications to bipedal robot control. The effects and limitations of different learning techniques are discussed through a representative selection of examples from the literature. Guidelines for future research on learning control of bipedal robots are provided in the end.

Index Terms—Bipedal walking robots, learning control, supervised learning, reinforcement learning, unsupervised learning

I. INTRODUCTION

Bipedal robot control is one of the most challenging and popular research topics in the field of robotics. We have witnessed an escalating development of bipedal walking robots based on various types of control mechanisms. However, unlike well-solved classical control problems (e.g., the control of industrial robot arms), the control problem of bipedal robots is still far from fully solved. Although many classical model-based control techniques have been applied to bipedal robot control, such as trajectory tracking control [76], robust control [105], and model predictive control [57], these control laws are generally pre-computed and inflexible. The resulting bipedal robots are usually not satisfactory in terms of stability, adaptability and robustness. There are five exceptional characteristics of bipedal robots that present challenges and constraints to the design of control systems:

• Non-linear dynamics: Bipedal robots are highly non-linear and naturally unstable systems. The well-developed classical control theories for linear systems cannot be applied directly.
• Discretely changing dynamics: Each walking cycle consists of two different situations in sequence: the statically stable double-support phase (both feet in contact with the ground) and the statically unstable single-support phase (only one foot in contact with the ground). Suitable control strategies are required for step-to-step transitions.
• Under-actuated system: Walking robots are not connected to the ground. Even if all joints of a bipedal robot are controlled perfectly, it is still not enough to completely control all the degrees of freedom (DOFs) of the robot.
• Multi-variable system: Walking systems usually have many DOFs, especially in 3-dimensional spaces. The interactions between DOFs and the coordination of multi-joint movements have been recognized as a very difficult control problem.
• Changing environments: Bipedal robots have to be adaptive to uncertainties and respond to environmental changes correctly. For example, the ground may become uneven, elastic, sticky, soft or stiff, and there may be obstacles on the ground. A bipedal robot has to adjust its control strategies quickly enough to cope with such environmental changes.

In recent years, great advances in computing power have enabled the implementation of highly sophisticated learning algorithms in practice. Learning algorithms are among the most valuable tools to solve complex problems that need 'intelligent decision making', and to design truly intelligent machines with human-like capabilities. Robot learning is a rapidly growing area of research at the intersection of robotics and machine learning [22]. With a classical control approach, a robot is explicitly programmed to perform the desired task using a complete mathematical model of the robot and its environment. The parameters of the control algorithms are often chosen by hand after extensive empirical testing. In a learning control approach, on the other hand, a robot is only provided with a partial model, and a machine learning algorithm is employed to fine-tune the parameters of the control system until the robot acquires the desired skills. A learning controller is capable of improving its control policy autonomously over time, in some sense tending towards an ultimate goal. Learning control techniques have shown great potential for adaptability and flexibility, and have thus become extremely active in recent years. There have been a number of successful applications of learning algorithms on bipedal robots [11], [25], [51], [82], [104], [123]. Learning control techniques appear to be promising in making bipedal robots reliable, adaptive and versatile. In fact, building intelligent humanoid walking robots has been one of the main research streams in machine learning. If such robots are ever to become a reality, learning control techniques will definitely play an important role.

Shouyi Wang is with the Department of Industrial and Manufacturing Systems Engineering, University of Texas at Arlington, Arlington, TX 76019, USA. E-mail: [email protected]
Wanpracha Chaovalitwongse is with the Department of Industrial and Systems Engineering and the Department of Radiology, University of Washington, Seattle, WA 98104, USA. E-mail: [email protected]
Robert Babuška is with the Delft Center for Systems and Control, Delft University of Technology, Delft, 2628 CD, Netherlands. E-mail: [email protected]
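The contrast drawn above, between hand-tuned control parameters and a controller that tunes its own parameters from measured performance, can be illustrated with a minimal sketch. The first-order plant, the single proportional gain, and the learning rate below are invented for illustration and do not correspond to any robot or method cited in this survey.

```python
import numpy as np

def simulate(k, target=1.0, dt=0.05, steps=200):
    """Mean squared tracking error of a toy first-order plant x' = -x + u
    under a proportional control law u = k * (target - x)."""
    x, errs = 0.0, []
    for _ in range(steps):
        e = target - x
        u = k * e                 # control law with a single tunable gain
        x += dt * (-x + u)        # Euler step of the (hypothetical) plant
        errs.append(e * e)
    return float(np.mean(errs))

# Classical approach: the gain k would be fixed by hand after testing.
# Learning approach: adapt k from closed-loop performance alone, here by
# a finite-difference gradient step on the measured cost.
k, lr, eps = 0.5, 2.0, 1e-3
for _ in range(50):
    grad = (simulate(k + eps) - simulate(k - eps)) / (2 * eps)
    k -= lr * grad                # gradient descent on the tracking cost

improved = simulate(k) < simulate(0.5)   # the adapted gain tracks better
```

In the same spirit, a learning controller on a real robot only needs a measure of performance, not a complete analytical model of the closed loop.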
There are several comprehensive reviews of bipedal walking robots [16], [50], [109]. However, none of them has been specifically dedicated to reviewing the state-of-the-art learning techniques in the area of bipedal robot control. This paper aims at bridging this gap. The objectives of this paper are two-fold. The first goal is to review the recent advances of mainstream learning algorithms. The second objective is to investigate how learning techniques can be applied to bipedal walking control through the most representative examples.

The rest of the paper is organized as follows. Section II presents an overview of the three major types of learning paradigms and surveys the recent advances of the most influential learning algorithms. Section III provides an overview of the background of bipedal robot control, including stability criteria and classical model-based and biologically-inspired control approaches. Section IV presents the state-of-the-art learning control techniques that have been applied to bipedal robots. Section V gives a technical comparison of learning algorithms by their advantages and disadvantages. Finally, we identify some important open issues and promising directions for future research.

II. LEARNING ALGORITHMS

Learning algorithms specify how the changes in a learner's behavior depend on the inputs it has received and on the feedback from the environment. Given the same input, a learning agent may respond differently later on than it did earlier on. With respect to the sort of feedback that a learner has access to, learning algorithms generally fall into three broad categories: supervised learning (SL), reinforcement learning (RL) and unsupervised learning (UL). The basic structures of the three learning paradigms are illustrated in Figure 1.

Fig. 1. Basic structures of the three learning paradigms: supervised learning (output corrected by a teacher/target error signal), reinforcement learning (output shaped by a reward-based performance evaluation), and unsupervised learning (no external feedback).

A. Supervised Learning (SL)

SL is a machine learning mechanism that first finds a mapping between inputs and outputs based on a training data set, and then makes predictions for inputs that it has never seen in training. To achieve good generalization performance, the training data set should contain a fully representative collection of data so that a valid general mapping between inputs and outputs can be found. SL is one of the most frequently used learning mechanisms in designing learning systems. A large number of SL algorithms have been developed in the past decades. They can be categorized into several major groups, as discussed in the following.

1) Neural Networks (NNs): NNs are powerful tools that have been widely used to solve many SL tasks where there exists a sufficient amount of training data. There are several popular learning algorithms for training NNs (such as the perceptron learning rule and the Widrow-Hoff rule), but the most well-known and commonly used one is backpropagation (BP), developed by Rumelhart in the 1980's [88]. BP adjusts the weights of a NN by calculating how the error changes as each weight is increased or decreased slightly. The basic update rule of BP is given by:

ω_j = ω_j − α ∂E/∂ω_j, (1)

where α is the learning rate that controls the size of weight changes at each iteration, and ∂E/∂ω_j is the partial derivative of the error function E with respect to weight ω_j. BP-based NNs have become popular in practice since they can often find a good set of weights in a reasonable amount of time. They can be used to solve many problems involving large amounts of data and complex mapping relationships. As a gradient-based method, BP is subject to the local minima problem, which makes it inefficient in searching for globally optimal solutions. One approach to tackle this problem is to try different initial weights until a satisfactory solution is found [119].

In general, the major advantage of NN-based SL methods is that they are convenient to use and one does not have to understand the solution in great detail. For example, one does not need to know anything about a robot's model; a NN can be trained to estimate the robot's model from the robot's input-output data. However, the drawback is that the learned NN is usually difficult to interpret due to its complicated structure.

2) Locally Weighted Learning (LWL): Instead of mapping nonlinear functions globally (as BP does), LWL represents another class of methods, which fit complex nonlinear functions by local kernel functions. A demonstration of LWL is shown in Figure 2. There are two major types of LWL: memory-based LWL, which simply stores all training data in memory and uses efficient interpolation techniques to make predictions for new inputs [1]; and non-memory-based LWL, which constructs compact representations of the training data by recursive techniques so as to avoid storing large amounts of data in memory [62], [107]. The key part of all LWL algorithms is to determine the region of validity in which a local model can be trusted. Suppose there are K local models; the region of validity can be calculated from a Gaussian kernel by:

ω_k = exp(−(1/2)(x − c_k)^T D_k (x − c_k)), (2)

where c_k is the center of the k-th linear model, and D_k is the distance metric that determines the size and shape of the validity region of the k-th linear model. Given a query point x, every linear model calculates a prediction ŷ_k(x) based on the obtained local validity. The output of LWL is then the normalized weighted mean of all K linear models, calculated by:

ŷ = (Σ_{k=1}^K ω_k ŷ_k) / (Σ_{k=1}^K ω_k). (3)

Fig. 2. A schematic view of locally weighted regression.

LWL achieves low computational complexity and efficient learning in high-dimensional spaces. Another attractive feature of LWL is that local models can be allocated as needed, and the modeling process can be easily controlled by adjusting the parameters of the local models. LWL techniques have been used quite successfully to learn inverse dynamics or kinematic mappings in robot control systems [6], [7]. One of the most popular LWL algorithms is locally weighted projection regression (LWPR), which has shown good capability to solve several on-line learning problems of humanoid robot control [108].

3) Support Vector Machine (SVM): SVM is a widely used classification technique in machine learning [20]. It has been used in pattern recognition and classification problems, such as handwriting recognition [96], speaker identification [95], face detection in images [74], and text categorization [42]. The most important idea of SVM is that every data instance can be classified by a hyperplane if the data set is transformed into a space with sufficiently high dimensions [14]. Therefore, a SVM first projects input data instances into a higher-dimensional space, and then divides the space with a separating hyperplane which not only minimizes the misclassification error but also maximizes the margin separating the two classes. One of the most successful optimization formalisms of SVM is based on robust linear programming. Consider two data groups in the n-dimensional real space R^n; the optimization formalism is given by:

min_{ω,γ,y,z}  (e^T y)/m + (e^T z)/k  (4)
s.t.  Aω − eγ + y ≥ e  (5)
      −Bω + eγ + z ≥ e  (6)
      y ≥ 0, z ≥ 0,  (7)

where A is an m × n matrix representing the m observations in group one, and B is a k × n matrix representing the k observations in group two. The two data groups are separated by a hyperplane (ideally satisfying Aω ≥ eγ and Bω ≤ eγ), and y and z are nonnegative slack variables indicating how far a data instance in group A or B violates its hyperplane constraint. The objective function therefore minimizes the average misclassification error subject to the hyperplane constraints separating the data instances of A from those of B. The training of a SVM obtains a global solution instead of a local optimum. However, one drawback of SVM is that the results are sensitive to the choice of the kernel function. The problem of choosing an appropriate kernel function is still left to the user's creativity and experience.

4) Decision Tree: Decision trees use a hierarchical tree model to classify or predict data instances. Given a set of training data with associated attributes, a decision tree can be induced by algorithms such as ID3 [83], CART [13], and C4.5 [84]. While ID3 and C4.5 are primarily suitable for classification tasks, CART has been specifically developed for regression problems. The most well-known algorithm is C4.5 [84], which builds decision trees using the concept of Shannon entropy [98]. Based on the assumption that each attribute of a data instance can be used to make a decision, C4.5 examines the relative entropy for each attribute and accordingly splits the data set into smaller subsets. The attribute with the highest normalized information gain is used to make the decision. Ruggieri [87] provided an efficient version of C4.5, called EC4.5, which is claimed to achieve a performance gain of up to five times while computing the same decision trees as C4.5. Olcay and Onur [120] presented three parallel C4.5 algorithms designed to be applicable to large data sets. Baik and Bala [9] presented a distributed version of decision trees, which generates partial trees and communicates the temporary results among them in a collaborative way. The distributed decision trees are efficient for large data sets collected in distributed systems.

One of the most useful characteristics of decision trees is that they are simple to understand and easy to interpret; people can understand decision tree models after a brief explanation. It should be noticed that a common assumption made in decision trees is that data instances belonging to different classes have different values in at least one of their attributes. Therefore, decision trees tend to perform better when dealing with discrete or categorical attributes, and will encounter problems when dealing with continuous data. Moreover, another limitation of decision trees is that they are usually sensitive to noise.

B. Reinforcement Learning

Among other modes of learning, humans heavily rely on learning from interaction: repeating experiments with small variations and then finding out what works and what does not. Consider a child learning to walk: it tries out various movements; some actions work and are rewarded (moving forward), while others fail and are punished (falling). Inspired by animal and human learning, the reinforcement learning (RL) approach enables an agent to learn a mapping from states to actions by trial and error so that the expected cumulative future reward is maximized.

1) General RL Scheme: RL is capable of learning while gaining experience through interactions with the environment.
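The interaction-based learning scheme just introduced can be sketched as a simple loop in which an agent acts, observes the resulting state and reward, and accumulates a discounted sum of rewards. The five-state "corridor" environment, the random action choice, and the discount value below are invented purely for illustration.

```python
import random

def env_step(state, action):
    """Toy environment: states 0..4; the agent earns reward 1 on reaching
    state 4, which ends the episode. Purely illustrative."""
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

random.seed(0)
gamma = 0.9                 # discount rate, 0 < gamma < 1
state, ret, discount = 0, 0.0, 1.0
for t in range(100):
    action = random.choice([-1, +1])    # a (deliberately naive) policy
    state, reward, done = env_step(state, action)
    ret += discount * reward            # discounted return seen from t = 0
    discount *= gamma
    if done:
        break
```

A learning agent goes beyond this sketch by using the observed rewards to improve its action choices over repeated episodes.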
It provides both qualitative and quantitative frameworks for understanding and modeling adaptive decision-making problems in the form of rewards and punishments. There are three fundamental elements in a typical RL scheme:

• a state set S, in which a state s ∈ S describes the system's current situation in its environment,
• an action set A, from which an action a ∈ A is chosen at the current state,
• a scalar reward r ∈ R, which indicates how well the agent is currently doing with respect to the given task.

At each discrete time step t, a RL agent receives its state information s_t ∈ S, and takes an action a_t ∈ A to interact with its environment. The action a_t changes the environment state from s_t to s_{t+1}, and this change is communicated to the learning agent through a scalar reward r_{t+1}. Usually, the sign of the reward indicates whether the chosen action a_t was good (positive reward) or bad (negative reward). The RL agent attempts to learn a policy that maps state s_t to action a_t so that the sum of the expected future rewards R_t is maximized. The sum of future rewards is usually formulated in a discounted way [102], given by:

R_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... = Σ_{k=0}^∞ γ^k r_{t+k+1}, (8)

where γ is called the discounting rate and satisfies 0 < γ < 1.

Applications of RL have been reported in areas such as robotics, manufacturing, computer game playing, and economics [60]. Recently, RL has also been used in psychology and cognitive models to simulate human learning in problem-solving and skill acquisition [31].

2) Two Basic RL Structures: Many RL algorithms are available in the literature. The key element of most of them is to approximate the expected future rewards for each state or each state-action pair (under the current policy). There are two prevalent RL structures: the actor-critic scheme [56] and the Q-learning scheme [114].

• An actor-critic algorithm has two separate function approximators, for the action policy and the state values, respectively. The learned policy function is known as the actor, because it is used to select actions. The estimated value function is known as the critic, since it evaluates the actions made by the actor. The value function and the policy function are usually both updated by the temporal difference (TD) error.
• Q-learning algorithms learn a state-action value function, known as the Q-function, which is often represented by a lookup table indexed by state-action pairs. Since the Q-table is constructed on the state-action space rather than just the state space, it discriminates the effects of choosing different actions in each state. Compared to actor-critic algorithms, Q-learning is easier to understand and implement.

The basic structures of actor-critic learning and Q-learning algorithms are shown in Figure 3 and Figure 4, respectively.

Fig. 3. The actor-critic learning architecture for robot control.

Fig. 4. The Q-learning architecture for robot control.

3) Recent Advances in RL: Most RL algorithms suffer from the curse of dimensionality, as the number of parameters to be learned grows exponentially with the size of the state space. Thus most RL methods are not applicable to high-dimensional systems. One of the open questions in RL is how to scale up RL algorithms to high-dimensional state-action spaces. Recently, policy-gradient methods have attracted great attention in RL research, since they are considered applicable to high-dimensional systems. Policy-gradient RL has been applied to complex systems with many degrees of freedom (DOFs), such as robot walking [25], [55], [70], [104], and traffic control [86]. Peters et al. [77] made a comprehensive survey of policy-gradient-based RL methods, and developed a class of RL algorithms called natural actor-critic learning, in which the action policy is updated based on natural policy gradients [48]. The efficiency of the proposed learning algorithms was demonstrated on a 7-DOF real robot arm that was programmed to learn to hit a baseball. The natural actor-critic algorithm is currently considered the best choice among the policy-gradient methods [78]. In recent years, hierarchical RL approaches have also been developed to handle the curse of dimensionality [61]. Multi-agent or distributed RL is also an emerging topic in current RL research [33]. Some researchers also use predictive state representations to improve the generalization ability of RL [85].

C. Unsupervised Learning

UL is inspired by the brain's ability to extract patterns and recognize complex visual scenes, sounds, and odors from sensory data. It has roots in neuroscience and psychology, and is based on information theory and statistics. An unsupervised learner receives no feedback from its environment at all; it only responds to the received inputs. At first glance this seems impractical: how can we train a learner if we do not know what it is supposed to do? In fact, most of these algorithms perform some kind of clustering or associative rule learning.

1) Clustering: Clustering is the most important form of UL. It deals with data that have not been pre-classified in any way, and does not need any type of supervision during its learning process. Clustering is a learning paradigm that automatically partitions the input data into meaningful clusters based on their degree of similarity.

The most well-known clustering algorithm is k-means clustering, which finds k cluster centers that minimize a squared-error criterion function [23]. Cluster centers are represented by the centers of gravity of the data instances; that is, the cluster centers are the arithmetic means of all data samples in the cluster. k-means clustering assigns each data instance to the cluster whose center is nearest to it. Since k-means clustering generates partitions such that each pattern belongs to one and only one cluster, the obtained clusters are disjoint. Fuzzy c-means (FCM) was developed to allow one data instance to belong to two or more clusters rather than being assigned completely to one cluster [24]. Each data instance is associated with each cluster by a membership function, which indicates the degree of membership to that cluster. The FCM algorithm finds the weighted mean of each cluster and then assigns a membership degree to each data sample in the cluster. For example, data samples on the edge of a cluster belong to the cluster to a lower degree than the data around the center of the cluster.

Recently, distributed clustering algorithms have attracted considerable attention for extracting knowledge from large data sets [41], [4]. Instead of being transmitted to a central site, the data can first be clustered independently at different local sites. In a subsequent step, the central site establishes a global clustering based on the local clustering results.

2) Hebbian Learning: The key idea of Hebbian learning [37] is that neurons with correlated activity increase their synaptic connection strength. It is used in artificial neural networks to learn associations between patterns that frequently occur together. The original Hebb's hypothesis does not explicitly address the update mechanism for synaptic weights. A generalized version of Hebbian learning, called the differential Hebbian rule [58], [54], can be used to update the synaptic weights. The basic update rule of differential Hebbian learning is given by:

w_ij^new = w_ij^old + η Δx_i Δy_j, (9)

where w_ij is the synaptic strength from neuron i to neuron j, Δx_i and Δy_j denote the temporal changes of presynaptic and postsynaptic activities, and η is the learning rate that controls how fast the weights are modified in each step. Notably, differential Hebbian learning can be used to model a simple level of adaptive control that is analogous to self-organizing cortical function in humans. It can be applied to construct an unsupervised, self-organized learning control system for a robot to interact with its environment without any evaluative information. Although it seems to be a low level of learning, Porr and Wörgötter [80] showed that this autonomous mechanism can develop rather complex behavioral patterns in closed-loop feedback systems. They confirmed this idea on a real bipedal robot, which was capable of walking stably using unsupervised differential Hebbian learning [32].

III. BACKGROUND OF BIPEDAL WALKING CONTROL

According to a US army report, more than 50 percent of the earth's surface is inaccessible to traditional vehicles with wheels or tracks [5], [10]. However, we have to traverse rough terrain in many real-world tasks, such as emergency rescue in isolated areas with unpaved roads, relief after a natural disaster, and alternatives to human labor in dangerous working environments. To date, the devices available to assist people in such tasks are still very limited. As promising tools to solve these problems, bipedal robots have become one of the most exciting and emerging topics in the field of robotics. Moreover, bipedal robots can also be used to develop new types of rehabilitation tools for disabled people and to help the elderly with household work. The important prospective applications of bipedal walking robots are shown in Figure 5.

Fig. 5. Prospective applications of bipedal walking robots: entertainment, work assistance, emergency rescue and search after a natural disaster, replacing humans in dangerous working environments, efficient transport on rough terrains, and rehabilitation.

A. Stability Criteria in Bipedal Robot Control

Bipedal robot walking can be broadly characterized as static walking, quasi-dynamic walking and dynamic walking. The different types of walking are generated by different walking stability criteria, as follows.

• Static Stability: The positions of the Center of Mass (COM) and the Center of Pressure (COP) are often used as stability criteria for static walking. A robot is considered stable if its COM or COP is within the convex hull of the foot support area. Static stability is the oldest and most constrained stability criterion, often used in the early days of bipedal robots. A typical static walking robot is SD-2, built by Salatian et al. [89].
• Quasi-Dynamic Stability: The most well-known criterion for quasi-dynamic walking is based on the concept of the Zero Moment Point (ZMP), introduced by M. Vukobratović et al. in [111]. The ZMP is the point on the ground where the resultant of the ground reaction force acts. A stable gait can be achieved by making the ZMP of a bipedal robot stay within the convex hull of the foot support area during walking. The ZMP is frequently used as a guideline in designing reference walking trajectories for many bipedal robots. An illustration of the ZMP criterion is shown in Figure 6. Recently, Sardain and Bessonnet [92] proposed a virtual COP-ZMP, which extends the concept of the ZMP to stability on uneven terrains. Another criterion for quasi-dynamic walking is the foot rotation indicator (FRI) point,
which is the point on the ground where the net ground reaction force would have to act to keep the foot stationary [36]. This walking stability criterion requires keeping the FRI point within the convex hull of the foot support area. One advantage of the FRI point is that it is capable of indicating the severity of instability: the longer the distance between the FRI point and the foot support boundary, the greater the degree of instability.

Fig. 6. An illustration of the ZMP criterion (ZMP = center of ground reaction).

TABLE I
WALKING SPEED OF BIPEDAL ROBOTS USING DIFFERENT STABILITY CRITERIA
(RELATIVE SPEED = WALKING SPEED / LEG LENGTH)

Stability Criterion | Robot       | Walking speed (m/s) | Leg length (m) | Relative speed (1/s)
Static              | SD-2 [89]   | 0.12                | 0.51           | 0.24
Quasi-Dynamic       | HRP-2L [46] | 0.31                | 0.69           | 0.44
Quasi-Dynamic       | QRIO [26]   | 0.23                | 0.30           | 0.77
Quasi-Dynamic       | ASIMO [38]  | 0.44                | 0.61           | 0.72
Quasi-Dynamic       | Meta [97]   | 0.58                | 0.60           | 0.97
Dynamic             | Rabbit [17] | 1.2                 | 0.8            | 1.50
Dynamic             | RunBot [32] | 0.8                 | 0.23           | 3.48

[Figure: bipedal walking control approaches, divided into dynamic-model-based approaches (e.g., various trajectory tracking methods) and biologically-inspired approaches (e.g., neural, fuzzy logic, evolutionary and passive-dynamic methods).]
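The static stability criterion above (the ground projection of the COM must lie inside the convex hull of the foot support area) reduces to a point-in-convex-polygon test. The rectangular single-support foot outline and the COM positions below are invented for illustration.

```python
def inside_convex_polygon(point, polygon):
    """True if `point` lies inside the convex `polygon` (counter-clockwise
    vertex order): the point must be on the same side of every edge."""
    px, py = point
    sign = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False   # point is on the outer side of this edge
    return True

# Hypothetical single-support phase: the support polygon is one foot
# outline, in meters, listed counter-clockwise.
foot = [(0.00, 0.00), (0.10, 0.00), (0.10, 0.25), (0.00, 0.25)]

statically_stable = inside_convex_polygon((0.05, 0.12), foot)  # COM over foot
tipping = not inside_convex_polygon((0.30, 0.12), foot)        # COM outside
```

The ZMP and FRI criteria replace the COM projection with the ZMP or FRI point, but the containment test against the support polygon has the same form.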