Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes from 2D Ones in RGB-Depth Images
Zhuo Deng, Longin Jan Latecki
Temple University, Philadelphia, USA
Abstract

This paper addresses the problem of amodal perception of 3D object detection. The task is to not only find object localizations in the 3D world, but also estimate their physical sizes and poses, even if only parts of them are visible in the RGB-D image. Recent approaches have attempted to harness point clouds from the depth channel to exploit 3D features directly in 3D space and have demonstrated superiority over traditional 2.5D representation approaches. We revisit the amodal 3D detection problem by sticking to the 2.5D representation framework, and directly relate 2.5D visual appearance to 3D objects. We propose a novel 3D object detection system that simultaneously predicts objects' 3D locations, physical sizes, and orientations in indoor scenes.

localizations from monocular imagery [6, 13, 20, 3], and 3D object recognitions on CAD models [29, 27]. But these works either rely on a huge number of ideal 3D graphics models by assuming the locations are known, or are inclined to fail in cluttered environments where occlusions are very common while depth orders are uncertain.

The recent advent of Microsoft Kinect and similar sensors alleviated some of these challenges, and thus enabled an exciting new direction of approaches to 3D object detection [18, 17, 12, 11, 19, 25, 24]. Equipped with an active infrared structured light sensor, Kinect is able to provide much more accurate depth locations of objects associated with their visual appearances. The RGB-Depth detection approaches can be roughly categorized into two groups according to how they formulate feature representations from