Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling (ICAPS 2019)

POMDP-Based Candy Server: Lessons Learned from a Seven Day Demo

Marcus Hoerger,∗ Joshua Song,† Hanna Kurniawati,‡ Alberto Elfes§
∗†School of Information Technology & Electrical Engineering, University of Queensland, Australia
‡Research School of Computer Science, Australian National University
§Robotics and Autonomous Systems Group, Data61, CSIRO
{m.hoerger, [email protected]}@uq.edu.au, [email protected], [email protected]
∗CSIRO and UQ International Scholarship. †Australian Postgraduate Award.
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

An autonomous robot must decide on a good strategy to achieve its long-term goal, despite various types of uncertainty. Partially Observable Markov Decision Processes (POMDPs) are a principled framework for addressing such decision making problems. Despite the computational intractability of solving POMDPs, the past decade has seen substantial advancement in POMDP solvers. This paper presents our experience in enabling on-line POMDP solving to become the sole motion planner for a robot manipulation demo at IEEE SIMPAR and ICRA 2018. The demo scenario is a candy-serving robot: a 6-DOFs robot arm must pick up a cup placed on a table by a user, use the cup to scoop candies from a box, and put the cup of candies back on the table. The average perception error is ∼3cm (≈ the radius of the cup), affecting the position of the cup and the surface level of the candies. This paper presents a strategy to alleviate the curse of history issue plaguing this scenario, the perception system and its integration with the planner, and lessons learned in enabling an on-line POMDP solver to become the sole motion planner of this entire task. The POMDP-based system was tested through a seven-day live demo at the two conferences. In this demo, 150 runs were attempted and 98% of them were successful. We also conducted further experiments to test the capability of our POMDP-based system when the environment is relatively cluttered by obstacles and when the user moves the cup while the robot tries to pick it up. In both cases, our POMDP-based system reaches a success rate of 90% and above.

Introduction

Decision making in the presence of uncertainty is a critical component in robot autonomy. Partially Observable Markov Decision Processes (POMDPs) are a principled framework for solving such decision making problems. Although computing the exact POMDP solution is intractable (Papadimitriou and Tsitsiklis 1987) in general, the past decade has seen approximate POMDP solvers starting to become practical for various robotics problems.

As part of the Kinova MOVO Beta program at IEEE SIMPAR and ICRA 2018, we apply a general on-line POMDP solver as the sole motion planner of a 6-DOFs manipulator performing a task that requires a relatively large number of steps to accomplish: the robot must move to pick up a cup placed on a table, scoop candies using the cup, and put the cup back on the table (Figure 1(a) illustrates the task). In this scenario, the robot must determine when scanning of the environment is required and cope with ∼3cm perception errors (due to minimal effort in camera calibration), users moving the cup while the robot is trying to pick it up, and changes to the surface of the candies after each scoop.

We divide the above task into four motion planning problems: picking up the cup, approaching the candy box, scooping, and placing the cup back on the table. Each problem was modeled as a POMDP and solved using an on-line solver. We found that with almost naive modelling, existing solvers are capable of handling the size of the state and observation spaces posed by each problem.

However, for the picking problem, a major difficulty comes from the curse of history. Due to the various "playful" user behaviours, a planner needs to perform a large number of lookahead steps to compute a robust strategy. The problem is that the number of possible future scenarios a POMDP solver must take into account grows exponentially with the number of lookahead steps being considered. As a result, even recent sampling-based on-line solvers (Kurniawati and Yadav 2013; Silver and Veness 2010; Somani et al. 2013; Sunberg and Kochenderfer 2018) have a time complexity that grows exponentially with the number of lookahead steps needed to generate a good solution. Methods have been proposed to alleviate the curse of history issue (He, Brunskill, and Roy 2010; Kurniawati et al. 2011). In general, however, on-line POMDP solvers perform well only for lookaheads of up to 10–15 steps (within reasonable computational resources), which is often insufficient for the above problem.

Moreover, the curse of history is amplified by the large action space imposed by a 6-DOFs manipulator. Methods have been proposed to alleviate this issue (Seiler, Kurniawati, and Singh 2015; Sunberg and Kochenderfer 2018). However, existing solvers can only perform well for problems with 3–4-dimensional continuous action spaces (Seiler, Kurniawati, and Singh 2015), while a method that can perform well for problems with 100,000 discrete actions was only recently proposed (Wang, Kurniawati, and Kroese 2018). To alleviate this issue, we resort to using a small discrete set of fixed increments and decrements of the joint angles as the action space. This strategy helps to keep the size of the action space manageable, but at the cost of an increased planning horizon.
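To make this discretisation concrete, the sketch below enumerates such an action space. The step size and the one-joint-per-action construction are our illustrative assumptions, not parameters reported in the paper:

```python
NUM_JOINTS = 6      # 6-DOFs Jaco arm
STEP_RAD = 0.05     # fixed joint-angle step; illustrative value only

# One discrete action per (joint, direction) pair: 2 * 6 = 12 actions,
# a manageable set compared to the full 6-D continuous joint space.
ACTIONS = [(joint, direction)
           for joint in range(NUM_JOINTS)
           for direction in (+1, -1)]

def apply_action(joint_angles, action):
    """Successor configuration after one fixed increment/decrement."""
    joint, direction = action
    successor = list(joint_angles)
    successor[joint] += direction * STEP_RAD
    return successor
```

The trade-off described above is visible here: with only 12 coarse actions, many actions must be chained to realise a long reaching motion, which is exactly what lengthens the planning horizon.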
In this paper, we propose to alleviate the curse of history issue by using macro-actions, represented as finite state machines. We also present the POMDP models for each of the four problems mentioned above and how they are combined together, so as to allow a smooth transition between problems. Furthermore, we present the perception system and how it is integrated with the POMDP-based planner, and some implementation tips and tricks to ensure smooth operation of the POMDP planner on a physical 6-DOFs manipulator.

The above robotics scenario and system was demonstrated as part of a robotics demo that ran for a total of seven days. During this time, 150 runs were attempted, and our solution reached a 98% success rate. After the demo, we slightly improved the system and conducted more systematic experiments, in particular in scenarios where a user displaces the cup that the robot tries to pick up and in cluttered environments. Experiments in such scenarios indicate that our solution reaches no less than a 90% success rate.

The average perception error in the object localization is 3cm. Furthermore, the cup location can change at run-time, e.g., a user might pick up the cup and place it at a different location on the table while the robot is moving to pick it up, whereas the surface of the candies in the box changes after every scoop.

We divide the above task into four motion planning problems, i.e.:

Picking: Moving the manipulator from its initial configuration to pick up a cup from a table, despite not knowing the exact initial position of the table and the cup, possible changes of the cup's position during run-time, and perception errors.

Pre-Scoop: Moving the cup to a pose above the candy box that is beneficial for scooping.

Scooping: Scooping candies from a box using the cup, despite not knowing the exact surface of the candies and perception errors.

Placement: Placing the cup back on the table, while ensuring that no candies fall out of the cup.

To enable robust operation despite the various types of uncertainties, we formulate each problem as a Partially Observable Markov Decision Process (POMDP). The different POMDP problems are designed to have overlapping state variables, so that the end of one problem automatically transitions into the beginning of the next problem with ease, as sketched below.
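The following is a minimal sketch of how such chaining could be implemented; the solver and subproblem interfaces are hypothetical names for illustration, not the paper's software:

```python
def run_task(subproblems, initial_belief, solver, execute):
    """Run a sequence of POMDP subproblems, threading one belief through.

    'subproblems' would be the Picking, Pre-Scoop, Scooping, and
    Placement POMDPs; 'execute' sends an action to the physical arm
    and returns the resulting observation.
    """
    belief = initial_belief
    for pomdp in subproblems:
        while not pomdp.is_terminal(belief):
            action = solver.plan(pomdp, belief)   # on-line planning step
            observation = execute(action)
            belief = pomdp.update_belief(belief, action, observation)
        # Because consecutive subproblems share state variables (e.g. the
        # arm configuration and the cup pose), the final belief of one
        # subproblem directly seeds the initial belief of the next.
    return belief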
The System

The manipulator we use is a Kinova 6-DOFs Jaco arm with a KG3 gripper, attached to a Kinova MOVO mobile manipulator. In this work, we use only one of the arms of the MOVO and view the MOVO as a static base for the Jaco arm. The sensors used are those attached to the MOVO, i.e., a Kinect v2 mounted on the MOVO's head and encoders at the joints of the Jaco arm. By default, MOVO comes with a non-calibrated Kinect v2 sensor. We perform internal calibration of the sensor, but not the external calibration. We found that the error due to no external calibration causes model and perception errors that can easily be handled by our solver. MOVO runs the Robot Operating System (ROS) and comes with two Next Unit of Computing (NUC) computers.

POMDP Background

Formally, a POMDP is a tuple (S, A, O, T, Z, R, b₀, γ). The notations S, A and O are the state, action, and observation spaces of the robot. At each time-step, a POMDP agent is in a state s ∈ S, executes an action a ∈ A, perceives an observation o ∈ O, and moves to the next state s′ ∈ S. Which next state it ends up in depends on the transition function T(s, a, s′), which is a conditional probability function P(s′ | s, a) representing uncertainty in the effect of actions, while the observation o perceived depends on the observation function Z(s′, a, o), a conditional probability function P(o | s′, a) representing uncertainty in sensing.
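For readers who prefer code, the tuple maps naturally onto a generative-model interface of the kind that sampling-based on-line solvers consume. This is our sketch of that correspondence, not the paper's implementation:

```python
from abc import ABC, abstractmethod

class GenerativePOMDP(ABC):
    """Generative-model view of the tuple (S, A, O, T, Z, R, b0, gamma)."""

    discount = 0.99  # gamma; illustrative value

    @abstractmethod
    def sample_initial_state(self):
        """Sample a state s ~ b0."""

    @abstractmethod
    def step(self, state, action):
        """Sample s' ~ T(s, a, .) and o ~ Z(s', a, .); return
        (next_state, observation, reward) with reward = R(s, a)."""

    @abstractmethod
    def actions(self):
        """The discrete action set A."""
```

An on-line solver needs nothing beyond this interface: it repeatedly calls step() to simulate future scenarios, which is why its cost grows with the number of lookahead steps rather than with the size of S.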