An Analytical Diabolo Model for Robotic Learning and Control

Felix von Drigalski∗1, Devwrat Joshi∗†1,3, Takayuki Murooka†1,4, Kazutoshi Tanaka1, Masashi Hamaya1 and Yoshihisa Ijiri1,2

arXiv:2011.09068v1 [cs.RO] 18 Nov 2020

∗ Authors contributed equally.
† Work done at OMRON SINIC X Corp. as part of an internship.
1 Felix von Drigalski, Masashi Hamaya, Kazutoshi Tanaka and Yoshihisa Ijiri are with OMRON SINIC X Corporation, Hongo 5-24-5, Bunkyo-ku, Tokyo, Japan {f.drigalski, masashi.hamaya, kazutoshi.tanaka, yoshihisa.ijiri} [email protected]
2 Yoshihisa Ijiri is with OMRON Corporation, Konan 2-3-13, Minato-ku, Tokyo, Japan yoshihisa.ijiri@omron.com
3 Devwrat Joshi is with the University of Osaka, Hosoda Laboratory [email protected]
4 Takayuki Murooka is with the University of Tokyo, JSK Laboratory [email protected]

Abstract: In this paper, we present a diabolo model that can be used for training agents in simulation to play diabolo, as well as running it on a real dual robot arm system. We first derive an analytical model of the diabolo-string system and evaluate its accuracy using data recorded via motion capture, which we release as a public dataset of skilled play with diabolos of different dynamics. We show that our model outperforms a deep-learning-based predictor, both in terms of precision and physically consistent behavior. Next, we describe a method based on optimal control to generate robot trajectories that produce the desired diabolo trajectory, as well as a system to transform higher-level actions into robot motions. Finally, we test our method on a real robot system by playing the diabolo, and throwing it to and catching it from a human player.

I. INTRODUCTION

Movement arts such as dance, acrobatics and juggling have been enchanting audiences for millennia, and have recently become a popular challenge for robotics researchers, both for entertainment and teaching purposes [1]–[3], and to push the limits of current technology. Juggling, in particular, has seen increased interest [4]–[6] since the 1990s, as robotic actuators have become more capable of executing these high-acceleration tasks. It is a non-prehensile manipulation task, which is one of the most challenging types of manipulation tasks [7].

Fig. 1. Top: Robots playing diabolo (or “kongzhu” or “Chinese yo-yo”), a juggling prop that is accelerated by a string attached to the two sticks which the player manipulates to perform tricks. Bottom: Internal representation of our proposed baseline method. Yellow: Predicted diabolo trajectory. Green: Goal states. White: Prediction until the start of the next trajectory. Red/blue: Stick trajectories with via points. Red transparent: Spheroid used to calculate the string effects.

The diabolo is a juggling prop that has received very little attention, likely due to its complex and unstable dynamics, which are difficult to model and to formulate control laws for. However, following the recent advances in machine learning and papers such as [8], a diabolo-playing robot appears increasingly tractable.

While investigating the diabolo as a learning problem, we realized that it offers a number of favorable properties for robot learning. For example, mistakes during play rarely cause sudden failure, but only a change in the diabolo’s orientation, leading to dense rewards in reinforcement learning settings compared to other juggling tasks. Recent platforms such as OpenAI Gym [9] offer learning environments whose dynamics have continuous state and action spaces close to those of real robot systems. Whereas these platforms provide only a few tasks per environment (e.g. the agent learns only to move forward in the Ant and Hopper environments [9]), the diabolo environment can offer more varied and agile tasks within the same environment, and poses new and interesting problems for the fields of continual learning and transfer learning [10]. For example, we can start by learning to accelerate the diabolo, then to stabilize it, and then gradually increase the difficulty from easier tricks, such as moving in figure patterns, to more complex ones. We can also apply the learned agents to other diabolo environments, which have different dynamics depending on the string length and the diabolo geometry and weight. Considering this, the diabolo task could be a new and suitable environment for testing learning algorithms.

As setting up and supervising two real robot arms to learn diabolo playing is expensive, and fast-moving robot arms can be dangerous, it would be desirable to train agents in a simulated environment and thus make learning more practical. However, there are currently no environments that describe the diabolo’s behavior. Consequently, we propose an analytical model that predicts the motion of the diabolo in simulation and can be used in a learning environment, as well as a model-based control approach that should serve as a baseline solution to the problem, to which learning agents can be compared.

The contributions of this paper are as follows:
• A diabolo model generating realistic diabolo trajectories
• An optimal-control-based method that determines stick trajectories which produce the desired diabolo state
• A robot system that plays diabolo and can react by adjusting the stick trajectories based on the system state
• A dataset of high-precision diabolo and stick positions during different diabolo play situations to train learning agents and evaluate our proposed model

Fig. 2. The system controlling the diabolo play. The player node receives high-level goals such as “accelerate with technique A”, “accelerate with technique B”, “throw upwards”, “throw to the side”. [Block diagram. Blocks: Player (with diabolo predictor), Robot IK, Robot control node, State estimation. Signals: target diabolo trajectory, stick trajectory, joint trajectory, joint targets, diabolo state, camera images. Rate labels: 2 Hz, 2 Hz, 500 Hz, 100 Hz, 100 Hz.]
We release the diabolo model and code along with a wrapper for the open-source simulator Gazebo [11], so it can be used as a plugin with existing robot systems or stand-alone.

The remainder of this paper is structured as follows. We describe related works in Section II. The proposed method is described in detail in Section III, and the experimental validation is shown in Section IV. We discuss our results in Section VI and conclude in Section VII.

II. RELATED WORK

A. Juggling with robots

Many juggling disciplines have been tackled in research to varying degrees of success and completeness. The following studies focus mostly on the most fundamental techniques for each type of juggling, such as throwing and catching a ball, rolling off and returning a yo-yo, or keeping a devilstick in the air.

The history of robot juggling has been well described by Ploeger et al. [4], who most recently used reinforcement learning to train a robot arm to juggle two balls using a cup, with a model that outputs via-points and their speed, which is similar to how we represent diabolo stick trajectories.

Ball catching and throwing have been realized with two robots [5], as has one-handed catching and throwing of balls from and to human participants [12]. Kizaki et al. maintained a two-ball cascade with a high-speed robot arm and an actuated end effector [6]. Reist et al. designed a paddle with a one-degree-of-freedom actuator that can keep a bouncing ball centered due to its concave shape, and extended this “Blind Juggler” into various arrangements [13].

Devilstick manipulation with a mechanical aid and 3-ball juggling were also realized with hydraulic robotic arms by Schaal et al. [14], [15]. Flowerstick (similar to devilstick) manipulation was addressed with visual feedback control using a high-speed vision system [16]. Contact juggling was simplified to a disk-on-disk system and demonstrated with closed-loop [17] or feedback stabilization control [18], [19]. Periodic control to realize a yo-yo’s up-and-down motion has been demonstrated with a robotic hand [20]–[22].

B. Diabolo modeling

The interactions between the string and the diabolo are highly non-linear. We have found no analytical model in the literature, and very few systematic investigations of diabolo behavior and control. Sharpe et al. [23] used a strobe light to compare the effect of different diabolo acceleration techniques, reporting maximum rotation speeds of 5700 rpm. Guebara et al. [24] collected diabolo play data with motion capture setups similar to ours.

Murooka et al. [8] proposed a deep-learning-based diabolo stabilization control method which was demonstrated with a mobile humanoid robot. However, their method assumes that the diabolo does not move and that the sticks always perform the same motion. Applying the method to other diabolo motions and tricks would require collecting appropriate training data and retraining, which can be a challenging and uncertain process. In our work, we formulate an analytical model that can be used for arbitrary diabolo manipulations and adjusted by setting meaningful physical parameters.

III. METHOD

Our method consists of three main parts:
1) A predictor model estimating the diabolo’s motion as a function of the stick tips’ motion (“stick trajectory”)
2) A “player” that uses the predictor to find a stick trajectory that results in a target diabolo position and velocity
3) A robot control module that monitors the diabolo state and blends new trajectories
The system layout is shown in Fig. 2.

A. Diabolo Predictor

The goal for the predictor is to estimate the state of the diabolo at time t, given the diabolo state at time t−1 and the stick positions at time t. This predictor can then be used

Fig. 3. The ellipse which defines the auxiliary spheroid used to evaluate the diabolo position during play. The ellipse’s focal points are at the stick tips.

Fig. 4. The positions and velocity vectors of the diabolo (in state
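The geometry behind the auxiliary spheroid of Fig. 3 can be illustrated with a short sketch. It assumes an inextensible string of fixed length, a point-like diabolo, and no string wrapped around the axle; under those assumptions the diabolo can never leave the prolate spheroid whose focal points are the stick tips and whose major axis equals the string length. The function names (`spheroid_axes`, `string_slack`) and the example numbers are hypothetical and are not taken from the released code.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def spheroid_axes(left_tip: Vec3, right_tip: Vec3,
                  string_length: float) -> Tuple[Vec3, float, float]:
    """Return (center, semi_major, semi_minor) of the auxiliary spheroid.

    Assumption: the spheroid is the set of points whose summed distance
    to the two stick tips equals the string length.
    """
    focal_distance = math.dist(left_tip, right_tip)
    if focal_distance >= string_length:
        raise ValueError("stick tips are farther apart than the string is long")
    semi_major = string_length / 2.0
    half_focal = focal_distance / 2.0
    semi_minor = math.sqrt(semi_major ** 2 - half_focal ** 2)
    center = tuple((l + r) / 2.0 for l, r in zip(left_tip, right_tip))
    return center, semi_major, semi_minor


def string_slack(diabolo_pos: Vec3, left_tip: Vec3, right_tip: Vec3,
                 string_length: float) -> float:
    """Unused string length at the given diabolo position.

    Positive: diabolo is inside the spheroid (string slack, free flight).
    Zero:     diabolo is on the spheroid surface (string taut).
    Negative: position is unreachable for an inextensible string.
    """
    used = math.dist(diabolo_pos, left_tip) + math.dist(diabolo_pos, right_tip)
    return string_length - used


if __name__ == "__main__":
    # Hypothetical example: tips 0.6 m apart at 1 m height, 1 m of string.
    left, right = (-0.3, 0.0, 1.0), (0.3, 0.0, 1.0)
    print(spheroid_axes(left, right, 1.0))
    print(string_slack((0.0, 0.0, 0.6), left, right, 1.0))
```

A predictor step could use the sign of `string_slack` to decide whether the diabolo is in free flight or constrained by the string; how the model in this paper handles the constrained case is described in the remainder of Section III, not in this sketch.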
