Latent-Space Control with Semantic Constraints for Quadruped Locomotion
Total Page:16
File Type:pdf, Size:1020Kb
First Steps: Latent-Space Control with Semantic Constraints for Quadruped Locomotion Alexander L. Mitchell1;2, Martin Engelcke1, Oiwi Parker Jones1, David Surovik2, Siddhant Gangapurwala2, Oliwier Melon2, Ioannis Havoutis2, and Ingmar Posner1 Abstract— Traditional approaches to quadruped control fre- quently employ simplified, hand-derived models. This signif- Constraint 1 icantly reduces the capability of the robot since its effective kinematic range is curtailed. In addition, kinodynamic con- y' straints are often non-differentiable and difficult to implement in an optimisation approach. In this work, these challenges are addressed by framing quadruped control as optimisation in a Robot Trajectory structured latent space. A deep generative model captures a statistical representation of feasible joint configurations, whilst x z x' complex dynamic and terminal constraints are expressed via high-level, semantic indicators and represented by learned classifiers operating upon the latent space. As a consequence, Initial robot configuration VAE and constraint complex constraints are rendered differentiable and evaluated predictors an order of magnitude faster than analytical approaches. We s' validate the feasibility of locomotion trajectories optimised us- Constraint 2 ing our approach both in simulation and on a real-world ANY- mal quadruped. Our results demonstrate that this approach is Constraint N capable of generating smooth and realisable trajectories. To the best of our knowledge, this is the first time latent space control Fig. 1: A VAE encodes the robot state and captures cor- has been successfully applied to a complex, real robot platform. relations therein in a structured latent space. Once trained, performance predictors (triangles) attached to the latent space predict if constraints are satisfied and apply arbitrarily I. INTRODUCTION complex constraints such as robot stability. The flexibility Four-legged robots (quadrupeds) are capable of traversing of this approach allows for the application of any number a broader range of terrains than wheeled robots and can of constraints by performing activation maximisation of a carry larger payloads with longer battery lives than drones loss summed over all active constraints (see, for example, ([1], [2], [3], [4]). Unlike wheeled robots, legged robots can Eq.8). Decoding positions in latent space (blue) along the choose to place their feet on specific locations within the optimisation trajectory translates to movement of the robot. terrain which can support the mass of the robot. This makes them highly suitable for inspection and monitoring tasks in capabilities, leading to overly narrow convergence basins unstructured domains ([2], [4], [5]). However, this flexibility given feasible initial states (e.g. [6], [8], [9]). As a result, tra- comes at the cost of platform complexity. Kinematics and jectories solved using linear approximations are only suitable dynamics are considerably more complicated for quadrupeds for specific use-cases and do not easily generalise to novel than for wheeled robots or drones. This makes trajectory situations. In contrast, approaches which account for the full optimisation, i.e. computing feasible paths for robot end- kinodynamic representation of the robot require iterative, effectors and for the robot's centre of mass (CoM) given sequential optimisations to check for constraint satisfaction kinematic, dynamic, and environmental constraints, one of [10]. This often makes them computationally infeasible for the central challenges in their deployment ([6], [7]). real-time deployment. Traditionally, trajectory optimisation for quadrupeds is Here, we propose a radically different approach to solved using constrained optimisation. However, the robot's quadruped control. Using a generative model of the robot feasible joint space and dynamics such as stability, torque state we perform trajectory optimisation by directly optimis- limits, and contact forces, require complex often non- ing the position in a structured latent space, which captures differentiable constraints. This makes the optimisation in- a statistical model of the robot's feasible joint-space (see, tractable. A typical approach, therefore, is to use approx- Fig.1). In particular, we employ a variational autoencoder imate, hand-derived dynamic models and arbitrarily reduce (VAE) ([11], [12]) to encode the robot state including joint the kinematic range of the robot [6]. These linear approxima- positions. Inspired by [13], operational constraints are in- tions typically over-simplify the problem and limit platform duced via high-level, semantic indicators and represented by learned performance predictors operating directly in the la- 1Applied AI Lab (A2I), 2Dynamic Robot Systems (DRS) Oxford Robotics Institute (ORI), University of Oxford tent space. Arbitrarily complex constraints – and goals – can Correspondence to: [email protected] thus be enforced by performing gradient-based optimisation Non-Linear Program Latent Space e.g. TOWR Optimisation (joint trajectory) find (CoM position) (CoM euler angles) s.t. (stability condition) for every foot f: (stance condition) (feet positions) (contact forces) s.t. (initial condition) Key (robot dynamics) Kinematics approximated by for every foot : the Latent Space (kinematic limits) if foot in contact: Dynamics approximated by (no slip condition) Stability Predictor (contact forces) (friction cone) Dynamics approximated by Stance Predictor if foot in air: (no force) Fig. 2: Conceptual comparison of solving a non-linear program (NLP) such as TOWR with latent space optimisation. The NLP splits locomotion into an optimisation subject to a series of constraints, whilst trajectories in our approach are solved via gradient descent in a structured latent space. Complex constraints such as stability are enforced using learned classifiers (performance predictors) and gradients from differentiating a target loss are used to update a trajectory in latent space. Performance predictors replace multiple and often non-differentiable dynamics exhibited in the NLP. in the latent space driven by the performance predictors via ([7], [6]). For example, TOWR [6] is a general approach for activation maximisation [14]. solving locomotion tasks for any legged robot over known By implicitly capturing a full kinodynamic model, our terrain. Specifically, TOWR solves a non-linear program for approach enables efficient trajectory optimisation without the centre of mass and feet trajectories whilst optimising the relying on hand-specified dynamics models or linear ap- contact forces. This is computationally slow and prohibits proximations. In contrast to prior art, it predicts whether online updates to the current plan to react to environmen- constraints are satisfied with a single pass through our neural tal changes. Furthermore, kinematic constraints restrict the network. More importantly, it enables direct optimisation range of the robot's movement. While TOWR is suitable of traditionally non-differentiable dynamics (see, Fig. 2) to- for highly-dynamic locomotion, our approach is conceived gether with the correlations captured by the latent space, and to be sufficiently general for low-velocity operation such as this results in inherently smooth, feasible robot trajectories locomotion over rough terrain. when decoding poses into state-space along the optimised [15] estimates robot stability by finding a “margin of sta- path in latent space. bility” while taking into account the robot's centre of mass, We validate our approach to latent-space control in the friction forces, and torque limits. We use this formulation context of quadruped locomotion and constrain robot stabil- to create training data for our stability predictor and as a ity and end-effector contact dynamics to obtain a walking performance baseline. While the original approach requires gait (Fig. 1). Using both simulated and real robot exper- sampling and evaluating multiple points, our method directly iments, we demonstrate that our approach to latent-space optimises for stable trajectories using a learned stability control is able to find trajectories from unstable to stable predictor. robot configurations as well as to execute realisable gait Reinforcement learning (RL) (e.g. [17], [18]) is a promis- cycles. This is achieved with execution times almost an order ing alternative to optimisation-based approaches. However, of magnitude faster than a comparable analytical approach RL policies are difficult to train, requiring vast quantities [15]. of data, and suffer from unstable gradient estimates. Once To the best of our knowledge this is the first approach to trained, specific constraints such as foothold placements are latent-space control for a complex dynamic system which is impossible to enforce. In contrast, our approach is flexible: validated both in simulation and on a real platform. additional constraints are applied by constructing an appro- priate loss function summed over all active constraints, which II. RELATED WORK is differentiated to provide a gradient update. As torque controlled robots have become more com- Recent advances in latent-space optimisation also attempt plex, the weaknesses of traditional trajectory optimisation to overcome the limitations of traditional approaches. Uni- approaches have become more pronounced. These mostly versal Planning Networks (UPN) [19] use a gradient-based stem from large configuration-spaces and the complexity of method in a structured latent space to find trajectories model