Using Geometric Primitives to Unify Perception and Action for Object-Based Manipulation
USING GEOMETRIC PRIMITIVES TO UNIFY PERCEPTION AND ACTION FOR OBJECT-BASED MANIPULATION

by Hunter Brown

A thesis submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Science

Department of Mechanical Engineering
The University of Utah
May 2018

Copyright © Hunter Brown 2018
All Rights Reserved

The University of Utah Graduate School

STATEMENT OF THESIS APPROVAL

The thesis of Hunter Brown has been approved by the following supervisory committee members:

Tucker Hermans, Chair, March 14, 2018 (Date Approved)
Mark Minor, Member, March 14, 2018 (Date Approved)
Jake Abbott, Member, March 14, 2018 (Date Approved)

and by Tim Ameel, Chair/Dean of the Department/College/School of Mechanical Engineering, and by David B. Kieda, Dean of The Graduate School.

ABSTRACT

In this work we consider task-based planning under uncertainty. To make progress on this problem, we propose an end-to-end method that moves toward the unification of perception and manipulation. Critical to this unification is the geometric primitive. A geometric primitive is a 3D geometry that can be fit to a single view from a 3D image. Geometric primitives are a consistent structure in many scenes, and by leveraging this, perceptual tasks such as segmentation, localization, and recognition can be solved. Sharing this information between these subroutines also makes the method computationally efficient.

Geometric primitives can be used to define a set of actions the robot can use to influence the world. Leveraging the rich 3D information in geometric primitives allows the designer to develop actions with a high chance of success. In this work, we consider a pick-and-place action, parameterized by the object and scene constraints. The design of the perceptual capabilities and actions is independent of the task given to the robot, giving the robot the versatility to complete a range of tasks.

With a large number of available actions, the robot needs to select which action to perform. We propose a task-specific reward function to determine the next-best action for completing the task. A key insight for making the action selection tractable is reasoning about the occluded regions of the scene. We propose not to reason about what could be in the occluded regions, but instead to treat them as parts of the scene to explore. Defining reward functions that encourage this exploration while still making progress on the given task gives the robot the versatility to perform many different tasks. Reasoning about occlusion in this way also makes actions in the scene more robust to scene uncertainty and increases the computational efficiency of the method overall.

In this work, we show results for segmentation of geometric primitives on real data and discuss problems with fitting their parameters. While positive segmentation results are shown, there are problems with fitting consistent parameters to the geometric primitives. We also present simulation results showing the action selection process solving a singulation task. We show that our method is able to perform this task in several scenes with varying levels of complexity. We compare against selecting actions at random and show that our method consistently takes fewer actions to solve the scene.
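As a rough illustration of the next-best-action selection idea summarized above, the Python sketch below scores candidate pick-and-place actions with a task-specific reward plus an exploration bonus for the occluded volume an action is expected to reveal. The Action fields, the explore_weight parameter, and the greedy argmax are illustrative assumptions, not the formulation used in this thesis.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Action:
        # Hypothetical pick-and-place candidate: which primitive to grasp
        # and where to place it on the supporting surface.
        target_id: int
        place_pose: tuple        # (x, y, theta) placement on the support surface
        task_reward: float       # task-specific reward, e.g., progress toward singulation
        revealed_volume: float   # occluded volume the action is expected to expose

    def select_next_best_action(candidates: List[Action],
                                explore_weight: float = 0.5) -> Action:
        # Greedy next-best-action selection: task reward plus a bonus that
        # encourages exploring (revealing) occluded regions of the scene.
        return max(candidates,
                   key=lambda a: a.task_reward + explore_weight * a.revealed_volume)

In practice, the candidate set would be generated from the grasp and placement routines described in Chapter 3.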
For my partner Lucia, who is always supportive, and my Grandmother Linda Rae, who has always been there.

CONTENTS

ABSTRACT
LIST OF FIGURES
CHAPTERS
1. INTRODUCTION
2. RELATED WORKS
   2.1 Segmentation
   2.2 Recognition and Localization
   2.3 Robotic Approaches
3. METHODS
   3.1 Problem Definition
   3.2 Method Overview
   3.3 Geometric Primitive Segmentation
   3.4 Action Selection
   3.5 Actions
       3.5.1 Grasping
       3.5.2 Object Placement Locations
   3.6 Update State Estimation
   3.7 Object Singulation Task Formulation
   3.8 Put Away Groceries Task Formulation
4. EXPERIMENTS AND RESULTS
   4.1 Geometric Primitive Segmentation
   4.2 Geometric Primitive Manipulation Planner
5. CONCLUSION
6. FUTURE WORK
APPENDIX: ALGORITHMS
REFERENCES

LIST OF FIGURES

1.1 Examples of common tasks robots might need to complete. (A) An example of a bathroom cabinet with medicine the robot must find, (B) an example of a pantry with messy shelves the robot must organize, and (C) an example of a table with groceries the robot must put away.
1.2 An example of an over-segmentation created by using the geometric primitive. Each color represents a segment of the scene with a different geometric primitive fit to it.
3.1 Examples of a box (A), cylinder (B), and sphere (C) geometric primitive.
3.2 Given a point cloud of a drill (A), we can fit two cylinders and a box to represent this object (B).
3.3 Given an observation (A), possible true states (B).
3.4 An example of Geometric Primitive Segmentation results. (A) Initial 3D point cloud data. (B) Segmentation of the scene. Here each color represents all the pixels assigned to a given segment. (C) Geometric primitives fit to the different segments. Blue pixels have a box fit to them, green pixels have a cylinder, and red have a sphere.
3.5 An example of generated grasp candidates for (a) box, (b) sphere, and (c) cylinder.
3.6 Example of how to generate placement locations. (A) A possible state estimate the robot could encounter. The table is colored green, the objects purple, and occluded regions are black. If the robot tries to find the placement locations for the cylinder in the middle, (B) is the footprint of the scene without the cylinder, (C) is the footprint of the cylinder (scaled to be easily visible), and (D) is the convolution, where green indicates placement locations without collision and red placement locations with collision.
3.7 An example scene for the Put Away Groceries task. (A) Counter full of groceries for the robot to put away. (B) Cabinet the robot must place the groceries into.
4.1 Three examples of segmentations generated using Algorithm 2. The first column is an image of the scene. The second column is the segmentation, where each color is a unique segment in the scene. The third column indicates the type of geometric primitive that was assigned to each segment. Green is a cylinder, red a sphere, and blue a box.
4.2 Two different fits for the same data: (A) is an ideal fit and (B) is an error.
4.3 Examples of each scene type: (A) is a large scene and (B) is a small scene. The supporting surface is shown in green and the objects in red.
4.4 The mean and CI for the data as a function of the number of objects in the scene.
4.5 The mean and CI for the data as a function of the size of the scene.
4.6 The mean and CI for the data as a function of the Method.
4.7 The mean and CI for the data as a function of the Method and the scene size.
4.8 The mean and CI for the data as a function of the Method and the number of objects in the scene.

CHAPTER 1

INTRODUCTION

Robots are becoming more relevant in domains beyond their industrial origin, such as domestic settings and assistive care. A robot in these domains has to handle a wide assortment of challenging tasks. For example, an assistive-care robot could face many different jobs throughout the day. In the morning it might need to search the bathroom cabinet for a medication (Figure 1.1 A), in the afternoon organize a messy shelf (Figure 1.1 B), and in the evening put away groceries (Figure 1.1 C). While the robot might deal with similar objects and use similar actions, planning to solve a given task may vary significantly between different scenes. For example, when finding the medicine, there may be several new objects in the cabinet because someone bought a new toothbrush and deodorant the previous night, and the objects that were in the cabinet yesterday could have been moved to new locations. To complete many of these tasks the robot will need to manipulate the scene, such as moving the lotion to reveal the medicine behind it. This can be difficult for the robot to do without knocking objects out of the cabinet, running into unseen parts of the scene, and even destroying itself or other objects. For robots to be successful in these new domains, they must be able to safely handle a multitude of tasks in partially known and unstructured environments.

Figure 1.1. Examples of common tasks robots might need to complete. (A) An example of a bathroom cabinet with medicine the robot must find, (B) an example of a pantry with messy shelves the robot must organize, and (C) an example of a table with groceries the robot must put away.

While there is limited structure in object-based manipulation problems, it is important to leverage the structure that does exist. We can exploit this structure by giving the robot a set of basic perceptual skills. Differentiating objects from each other (i.e., segmenting objects) gives the robot the ability to determine which parts of the scene are objects. Given a segmentation of the scene, the robot should determine the location of the objects in relation to the robot (i.e., localize the objects). Segmenting and localizing objects gives the robot critical information about how to manipulate objects in the scene.
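To make the segment-then-localize idea above concrete, the following minimal Python sketch clusters a 3D point cloud into candidate object segments and localizes each segment by its centroid in the sensor frame. The Euclidean (DBSCAN) clustering and the parameter values are illustrative stand-ins, not the geometric-primitive segmentation developed in Chapter 3.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def segment_and_localize(points: np.ndarray, eps: float = 0.02, min_samples: int = 30):
        # points: (N, 3) array of 3D points from a single depth-camera view.
        # Euclidean clustering groups nearby points into candidate object segments.
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
        objects = {}
        for label in set(labels):
            if label == -1:      # DBSCAN marks unclustered points as noise
                continue
            segment = points[labels == label]
            objects[int(label)] = {
                "points": segment,
                "centroid": segment.mean(axis=0),  # simple localization in the sensor frame
            }
        return objects

In the full method, each segment would additionally be fit with a box, cylinder, or sphere primitive before actions are planned over it.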