Development of an Underwater Vision Sensor for 3D Mapping

Andrew Hogue and Michael Jenkin
Department of Computer Science and Engineering and the Centre for Vision Research
York University, Toronto, Ontario, Canada
{hogue, jenkin}@cse.yorku.ca

Abstract— Coral reef health is an indicator of global climate change, and coral reefs themselves are important for sheltering fish and other aquatic life. Monitoring reefs is a time-consuming and potentially dangerous task, and as a consequence autonomous robotic mapping and surveillance is desired. This paper describes an underwater vision-based sensor to aid in this task. Underwater environments present many challenges for vision-based sensors and robotic vehicles. Lighting is highly variable, optical snow and particulate matter can confound traditional noise models, the environment lacks visual structure, and limited communication between autonomous agents, divers, and surface support exacerbates the potentially dangerous environment. We describe experiments with our multi-camera stereo reconstruction algorithm geared towards coral reef monitoring. The sensor is used to estimate volumetric scene structure while simultaneously estimating sensor ego-motion. Preliminary field trials indicate the utility of the sensor for 3D reef monitoring, and results of land-based evaluation of the sensor are presented to assess the accuracy of the system.

I. INTRODUCTION

Small changes in climate can produce devastating results on the world's coral reef population1. For example, an increase in ocean temperature of only a few degrees destroyed most of the coral in Okinawa in 1998[1]. Thus, coral reefs are an excellent indicator of global climate change. Monitoring reefs can be a very labor intensive task. An international organization called Reef Check2 was established in 1997 to frequently and accurately monitor coral reef environments. The methods used by Reef Check rely on hundreds of volunteer divers to identify aquatic species over 100m transects. This task is very time-consuming, error-prone, and potentially dangerous for the divers.

Advances in computer vision and robotic technology can be used to aid divers in monitoring tasks performed by organizations such as Reef Check. The development of an autonomous robot to survey a particular reef area or transect would be of enormous value. After a reef section has been selected for monitoring, the robot could be placed near the reef and could then travel autonomously along the designated transect collecting data for later analysis. Algorithms such as [2] or [3] could then be used for the automatic classification of coral reefs. The raw data stream could be stored for later viewing by a human operator to identify and categorize aquatic life. Alternatively, automatic algorithms could be developed for fish classification in a similar vein to the coral reef classifiers noted above. After the data has been analyzed, the robot could be re-deployed over the same site to enable long-term monitoring of a particular coral reef. An autonomous tool such as this would facilitate a monitoring paradigm that is much safer to deploy and would provide a bias-free and less dangerous long-term monitoring strategy.

To this end, this paper describes an underwater sensor capable of generating 3D models of underwater coral reefs. Our sensor is designed to be deployed and operated by a single diver, and ongoing research is investigating full integration of the sensor with AQUA[4], an amphibious underwater robot (see Figure 1). The sensor is capable of collecting high-resolution video data, generating accurate three-dimensional models of the environment, and estimating the trajectory of the sensor as it travels. The algorithm developed is primarily used offline to extract highly accurate 3D models; however, if low-resolution images are used, the system operates at 7fps using off-the-shelf hardware.

The underwater environment presents numerous challenges for the design of robotic systems and vision-based algorithms. Yet it is these constraints and challenges that make this environment almost ideal for the development and evaluation of robotic and sensing technologies. Vehicles operating in this environment must cope with the potentially unpredictable six-degree-of-freedom (6DOF) motion of the vehicle. Currents, surf and surge can produce unwanted and unexpected vehicle motion.

A natural choice of sensing for an aquatic vehicle is to use cameras and computer vision techniques to aid in navigation and trajectory reconstruction. Unfortunately, a host of problems plague underwater computer vision techniques. For example, the turbidity of the water caused by floating sedimentation ("aquatic snow") and other floating debris can cause problems for standard techniques for visual understanding of the world. The optical transmittance of water varies with wavelength, and dynamic lighting conditions can make it difficult to reliably track visual features over time. Dynamic objects such as fish and fan coral can create spurious measurements influencing pose estimation between frames. These challenges have prompted recent research in robotic vehicle design, sensing, localization and mapping for underwater vehicles (cf. [5], [6], [7], [8]).

1http://www.marinebiology.org/coralbleaching.htm
2http://www.reefcheck.org

Fig. 1. The AQUA Robot (a) swimming (b) walking with amphibious legs.

Fig. 2. Trinocular stereo rig and example reference images with corresponding dense disparity.

In the terrestrial domain, sensing technologies such as stereo vision coupled with good vehicle odometry have been used to construct 3D environmental models. Obtaining such odometry information can be difficult without merging sensor data over the vehicle's trajectory, and as a result there is a long history of research in simultaneous localization and mapping (SLAM) (cf. [9], [10], [11], [12]) for robotic vehicles. Terrestrial SLAM algorithms often assume a predictable vehicle odometry model to assist in the probabilistic association of sensor information with environment features. The lack of such predictable vehicle odometry underwater necessitates SLAM approaches that are more dependent upon sensor information than is traditional in the terrestrial domain. The algorithm presented here utilizes passive stereo vision to obtain local depth estimates and uses temporal feature tracking information to estimate the vehicle trajectory. Experimental results obtained with the algorithm during recent sea trials illustrate the effectiveness of the approach but are difficult to evaluate objectively. Land-based reconstructions conducted using the same algorithm and hardware are therefore presented to evaluate the accuracy of the system against ground-truth trajectory data.

II. THE AQUA ROBOT

The algorithm described here is designed to operate under manual operation and also as a component of the AQUA robot (see [4], [13], [14], [15] and Figure 1). AQUA is a six-legged amphibious robot based on a terrestrial hexapod named RHex[16]. RHex was the product of a research collaboration between the Ambulatory Robotics Lab at McGill, the University of Michigan, the University of California at Berkeley and Carnegie Mellon University. AQUA differs from RHex in its ability to swim. AQUA has been designed specifically for amphibious operation and is capable of walking on land, swimming in the open water, station keeping at depths to 15m, and crawling along the bottom of the ocean. The vehicle deviates from traditional ROVs in that it uses legs for locomotion. When swimming, the legs act as paddles to displace the water, and on land the legs allow AQUA to walk on grass, gravel, sand, and snow. The paddle configuration gives the robot direct control over five of the six degrees of freedom: surge, heave, pitch, roll and yaw. An inclinometer, forward and rear facing video cameras, and an on-board compass are used to assist in the control of the robot's motion.

A. The AQUASENSOR

We have developed three versions of a standalone sensor package (see Figures 2 and 3) that is to be integrated into the AQUA vehicle in the future. The configuration and limitations of each version of the sensor are discussed briefly here; the reader is referred to [17] for further details. AQUASENSOR V1.0 consists of three fully calibrated Firewire (IEEE1394a) cameras in an L-shaped trinocular configuration used to extract dense depth information from the underwater environment. It also integrates a 6DOF inertial measurement unit (Crossbow IMU-300CC) to maintain the relative orientation and position of the unit as it moves, and the necessary electronics to drive the system. The cameras and inertial unit are tethered to a base computer system that is operated on land or on a boat via a custom-built 62m fibre-optic cable. The cameras capture 640x480 resolution gray-scale images at 30 frames per second and all three video streams are saved for later analysis. Prior to operation, the intrinsic and extrinsic camera parameters are estimated using Zhang's camera calibration algorithm[18] and an underwater calibration target. AQUASENSOR V1.0 was field tested in Holetown, Barbados, and hundreds of gigabytes of trinocular and synchronized IMU data have been captured using the device.
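The paper does not include code for the calibration step; the following is a minimal sketch of how intrinsics for one camera might be estimated with OpenCV's implementation of Zhang's method, assuming a planar checkerboard target. File names, board geometry, and square size are illustrative assumptions, and the authors' underwater target and trinocular extrinsic calibration are not reproduced here.

```python
# Hypothetical calibration sketch (not the authors' code): estimate intrinsics for
# one camera from images of a planar checkerboard using Zhang's method via OpenCV.
import glob
import cv2
import numpy as np

BOARD = (9, 6)          # inner corners per row/column (assumed)
SQUARE = 0.025          # square size in metres (assumed)

# 3D coordinates of the board corners in the target's own plane (Z = 0).
obj = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts = [], []
for fname in glob.glob("calib/left_*.png"):          # assumed image location
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(obj)
        img_pts.append(corners)

# Intrinsic matrix K, lens distortion, and per-view extrinsics (R, t).
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
```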

Fig. 3. Evolution of the AQUASENSOR design. V2.0 (a-c) and V2.1 (d-e). (a) Hand-held unit (b) Hand-held unit with Bumblebee (c) Complete unit (d) Final system (e) Revised hand-held unit.

Cable management and mobility were issues in the field with this version of the sensor. This initial version required significant surface support personnel to manage the long tether, two divers were needed to orient and move the sensor housing, and weather conditions were problematic for the base unit and operator.

Building on our previous design with new mobility constraints in mind, we re-designed the entire sensor to accommodate a more mobile solution that a single diver could deploy and operate. For AQUASENSOR V2.0 and V2.1 we adopted a Point Grey3 Bumblebee™ to capture colour stereo images, which had the added benefit of a smaller footprint than our previous camera system. Due to the smaller size of the Bumblebee and the adoption of a smaller inertial unit (a 3DOF Inertiacube3™ from Intersense4), we were able to re-design the camera system to be smaller and simpler to operate. We placed all computing hardware into our previous underwater housing and connected the cameras to the computing unit via a short (1.5m) cable. This new sensor design eliminated issues with tether management and removed the need for surface support personnel. The computing unit could be harnessed directly to the diver, and the neutrally buoyant hand-held unit was much simpler to orient towards reefs and other structures for data collection by a single diver.

3http://www.ptgrey.com
4http://www.isense.com

III. MAPPING ALGORITHM

For AQUASENSOR V1.0, we utilized a hierarchical approach for stereo disparity extraction based upon the real-time algorithm of Mulligan et al. [19]. After rectification, an image pyramid is created and the stereo matching algorithm is performed over the specified disparity range beginning at the coarsest level of the pyramid. An occlusion map is extracted by performing the stereo algorithm in the reverse direction on the left-right image pair at the coarsest level. For each level in the pyramid, the disparity map is transformed to the appropriate resolution and the disparities are refined using the successive pyramid levels. Example disparity images are shown in Figure 2. AQUASENSOR V2.0 leveraged the real-time sum-of-squared-differences algorithm implemented in the Point Grey Triclops library to extract disparity maps more efficiently.
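As a rough illustration of this coarse-to-fine strategy (not the Mulligan et al. [19] algorithm or the Triclops library), the sketch below runs OpenCV's block matcher at each level of a Gaussian pyramid and uses the upsampled coarser result to fill in pixels where matching at the finer level fails; the reverse-direction occlusion check is omitted. All parameters are assumptions.

```python
# Illustrative coarse-to-fine stereo sketch (not the authors' implementation).
# Assumes rectified 8-bit grayscale images; OpenCV's block matcher stands in for
# the correlation step at each pyramid level.
import cv2
import numpy as np

def pyramid_disparity(left, right, max_disp=64, levels=3, win=11):
    # Build Gaussian pyramids of both images (index 0 = full resolution).
    pl, pr = [left], [right]
    for _ in range(levels - 1):
        pl.append(cv2.pyrDown(pl[-1]))
        pr.append(cv2.pyrDown(pr[-1]))

    disp = None
    for lvl in reversed(range(levels)):
        # Disparity range shrinks with resolution; numDisparities must be a multiple of 16.
        ndisp = max(16, ((max_disp >> lvl) + 15) // 16 * 16)
        bm = cv2.StereoBM_create(numDisparities=ndisp, blockSize=win)
        d = bm.compute(pl[lvl], pr[lvl]).astype(np.float32) / 16.0  # fixed point -> pixels
        if disp is None:
            disp = d
        else:
            # Upsample the coarser estimate (disparities double with resolution) and
            # keep it wherever matching failed at this level (negative values).
            up = 2.0 * cv2.resize(disp, (d.shape[1], d.shape[0]),
                                  interpolation=cv2.INTER_NEAREST)
            disp = np.where(d < 0, up, d)
    return disp

# left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # assumed inputs
# right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
# disparity = pyramid_disparity(left, right)
```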
A. Ego-motion estimation

In order to estimate the motion of the camera, we utilize both 2D image motion and 3D data from the extracted disparities. First, "good" features are extracted from the left camera at time t using the Kanade-Lucas-Tomasi feature tracking algorithm (see [20], [21]) and are tracked into the subsequent image at time t+1. Using the disparity maps previously extracted for both time steps, tracked points that do not have a corresponding disparity at both time t and t+1 are eliminated. The surviving points are subsequently triangulated to determine the metric 3D points associated with each disparity.

In underwater scenes, many objects and points are visually similar and thus many of the feature tracks will be incorrect. Dynamic illumination effects and moving objects (e.g. fish) increase the number of incorrect points that are tracked from frame to frame. To overcome these problems, we employ robust statistical estimation techniques to label the feature tracks as either static or non-static. This is achieved by estimating a rotation and translation model under the assumption that the scene is stationary. The resulting 3D temporal correspondences are associated with stable scene points and form the basis of later processing.

We represent the camera orientation using a quaternion and compute the least-squares best-fit rotation and translation for the sequence in a two-stage process. First, using RANSAC[22] we compute the best linear least-squares transformation using Horn's absolute orientation method[23]. Given two 3D point clouds $r_{t_0}$ and $r_{t_1}$ at times $t_0$ and $t_1$ respectively, we estimate the rotation and translation that bring $r_{t_1}$ into accordance with $r_{t_0}$. The centroids $\bar{r}_{t_0}$ and $\bar{r}_{t_1}$ of each point cloud are computed and subtracted from the points to obtain two new point sets, $r'_{t_0} = r_{t_0} - \bar{r}_{t_0}$ and $r'_{t_1} = r_{t_1} - \bar{r}_{t_1}$. To compute the rotation, $R(\cdot)$, represented as a quaternion, we minimize the error function

$$\sum_{i=1}^{n} \left\| r'_{t_0,i} - s\,R(r'_{t_1,i}) \right\|^2. \qquad (1)$$

The rotation, $R(\cdot)$, and scale, $s$, are estimated using a linear least-squares approach (detailed in [23]). After estimating the rotation, the translation is estimated by transforming the centroids into a common frame and subtracting.

The final step is to refine the rotation and translation using a nonlinear Levenberg-Marquardt minimization[24] over six parameters. For this stage we parameterize the rotation as a Rodrigues vector[25] and estimate the rotation and translation parameters by minimizing the transformation error

$$\sum_{i=1}^{n} \left\| r_{t_0,i} - \left( R(r_{t_1,i}) + T \right) \right\|^2. \qquad (2)$$

In practice, we found that the minimization takes fewer than five iterations to reduce the error to acceptable levels. This approach to pose estimation differs from traditional bundle adjustment[26] in the structure-from-motion literature in that it does not refine the 3D locations of the features. We chose not to refine the 3D structure in order to limit the number of unknowns in our minimization and thus obtain a solution more quickly. Our minimization is therefore prone to error in the 3D structure, which is correlated with the error in the pose. We overcome this limitation by utilizing Point Grey's error model of the disparity estimation on a per-point basis. The accuracy of each 3D point is determined from the stereo parameters used in the disparity estimation process, and this information is used in the 3D modeling phase. We plan on utilizing this information as part of a weighted least-squares estimation of pose in the future.
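To make the first stage concrete, the sketch below runs RANSAC over 3D-3D correspondences with Horn's closed-form absolute orientation as the model estimator, matching Eq. (1); the nonlinear Levenberg-Marquardt refinement of Eq. (2) is omitted. This is a minimal illustration under assumed thresholds and sample sizes, not the authors' implementation.

```python
# Hypothetical sketch of the frame-to-frame pose step: RANSAC over 3D-3D
# correspondences with Horn's closed-form absolute orientation as the estimator.
import numpy as np

def horn(p, q):
    """Closed-form best-fit similarity q ~= s*R*p + t (Horn 1987); p, q are (N, 3)."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    P, Q = p - pc, q - qc
    S = P.T @ Q                                   # 3x3 correlation matrix
    A = S - S.T
    delta = np.array([A[1, 2], A[2, 0], A[0, 1]])
    # 4x4 symmetric matrix whose largest eigenvector is the optimal unit quaternion.
    N = np.zeros((4, 4))
    N[0, 0] = np.trace(S)
    N[0, 1:] = N[1:, 0] = delta
    N[1:, 1:] = S + S.T - np.trace(S) * np.eye(3)
    w, V = np.linalg.eigh(N)                      # eigenvalues in ascending order
    qw, qx, qy, qz = V[:, -1]                     # quaternion of the best rotation
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)]])
    s = np.sum(Q * (P @ R.T)) / np.sum(P * P)     # least-squares scale, as in Eq. (1)
    t = qc - s * R @ pc
    return R, s, t

def ransac_pose(p_t1, p_t0, iters=200, thresh=0.02, rng=np.random.default_rng(0)):
    """Label tracks as static/non-static and fit the pose on the inliers."""
    best_inliers = np.zeros(len(p_t1), bool)
    for _ in range(iters):
        idx = rng.choice(len(p_t1), 3, replace=False)
        R, s, t = horn(p_t1[idx], p_t0[idx])
        residual = np.linalg.norm(p_t0 - (s * (p_t1 @ R.T) + t), axis=1)
        inliers = residual < thresh               # e.g. 2 cm, an assumed threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    R, s, t = horn(p_t1[best_inliers], p_t0[best_inliers])
    return R, s, t, best_inliers
```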

Fig. 4. Reference Images from Underwater Sequences.


Fig. 5. Volumetric reconstruction of underwater sequences with camera trajectory shown in (b) and (d).

B. Volumetric Reconstruction

We use a volumetric approach to visualize the resulting 3D model. We register the 3D point clouds into a global frame using the previously computed camera poses, and each point is added to an octree. Upon adding the points to the octree, they are averaged to maintain a constant number of points per node. The octree is pruned to remove isolated points, which produces a result that is less noisy in appearance and can be manipulated in real time for visualization. The octree can be viewed at any level to produce a coarse or fine representation of the underwater data. Subsequently, a mesh can be extracted using algorithms such as the constrained elastic surface net algorithm[27].
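The paper gives no code for the octree accumulation; the following is a minimal sketch, assuming a fixed world bounding box and a fixed leaf depth, in which registered points are inserted, averaged per occupied cell, and sparsely supported cells are pruned as likely noise. It approximates the octree's finest level with a flat voxel hash; class names, bounds, and thresholds are illustrative assumptions.

```python
# Minimal point-accumulation sketch (not the authors' implementation). Each occupied
# leaf keeps a running mean so the stored data per node stays constant, and leaves
# supported by too few measurements are pruned as likely noise.
import numpy as np
from collections import defaultdict

class AveragingOctree:
    def __init__(self, origin, size, depth=8):
        self.origin = np.asarray(origin, float)   # minimum corner of the mapped volume
        self.size = float(size)                   # edge length of the cubic volume
        self.depth = depth                        # leaf resolution = size / 2**depth
        self.leaves = defaultdict(lambda: [np.zeros(3), 0])  # key -> [sum, count]

    def _key(self, p, level=None):
        level = self.depth if level is None else level
        cell = np.floor((p - self.origin) / self.size * (1 << level)).astype(int)
        return (level, *cell)

    def insert(self, points):
        for p in np.asarray(points, float):
            acc = self.leaves[self._key(p)]
            acc[0] += p
            acc[1] += 1

    def prune(self, min_points=3):
        # Drop isolated / weakly supported leaves.
        self.leaves = {k: v for k, v in self.leaves.items() if v[1] >= min_points}

    def cloud(self):
        # One averaged point per occupied leaf; re-keying these points at a smaller
        # level would yield the coarser views mentioned in the text.
        return np.array([v[0] / v[1] for v in self.leaves.values()])

# Usage sketch with assumed bounds and points registered by the pose above:
# tree = AveragingOctree(origin=(-10, -10, -10), size=20.0, depth=8)
# tree.insert(world_points)      # world_points: (N, 3) array
# tree.prune(min_points=3)
# model = tree.cloud()
```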



IV. EXPERIMENTS

In this section, we describe the evaluation of our reconstruction system. We performed several experiments and field trials to demonstrate the utility and accuracy of the sensor. Results from field experiments show the reconstruction of a coral bed and pieces of a sunken barge found in Folkstone Marine Reserve in Holetown, Barbados. Sample images from the underwater sequences are shown in Figure 4 and their corresponding reconstructions can be seen in Figure 5. The recovered camera trajectory is shown in the reconstructions as well. In Figures 5(a) and 5(b), the structure of the rope and hole in the barge is clearly visible. The second sequence (Figures 5(c) and 5(d)) shows a bed of coral growing on the barge.

Land-based reconstructions show the ability to use the sensor to reconstruct terrestrial scenes captured with 6DOF hand-held motion. To evaluate the accuracy of the sensor, we performed experiments and compared the recovered trajectory against ground truth trajectory data. To acquire ground truth 3D data, we constructed a scene with known measurements and independently tracked the camera sensor with an IS-900™ motion tracking system from Intersense5. We also manually measured the distance between identified points in the physical scene using a standard measuring tape and augmented our system to allow for interactive picking of 3D points in the model.

We placed two yellow markers 63.5 cm apart on a platform and moved the sensor in an approximately straight line from one marker to the other. Figures 6(a) and 6(b) show the stereo pairs associated with the first and last images of the sequence. A second test was also constructed in which the sensor was moved in a circular trajectory (Figure 6(f)-(h)). An Intersense IS-900 tracking system was used to provide absolute pose, and the pose error was computed on a frame-to-frame basis. We performed a manual calibration of the offset between the IS-900 sensor and the camera coordinate frame to bring the measurements into the same space.

The euclidean distance between the 3D camera positions of the vision-based reconstruction and the IS-900 absolute positions was used as an error metric, and the resulting error is plotted in Figure 7 for the two reconstructions. In the first error plot, the mean error was found to be 2.1 cm and in the second the mean error was 1.7 cm. The reconstructed 3D models for these experiments are shown in Figure 6. The error is computed as the euclidean distance between the position estimate given by our system and the absolute position given by the IS-900 tracker.

5http://www.isense.com

Fig. 6. Land-based experiments. (a-b) Reference stereo images from two frames in the sequence. (c) The markers and two measured points with a line drawn between them (in green); the distance between these points was manually measured to be 63.5 cm while the vision-based reconstruction reported a distance of 63.1 cm. (d-e) Reconstruction of the scene with the markers placed within it. (f-h) A second sequence of a colleague sitting in a chair. In all reconstructions, the computed trajectory is shown in green while the absolute IS-900 trajectory is shown in red.

Fig. 7. Error plots for two trajectory reconstructions: (a) error for the trajectory shown in Figure 6(d)-(e); (b) error for the trajectory shown in Figure 6(f)-(h). The euclidean distance error (plotted in meters) is shown over time (in frames). The error was computed between the absolute position reported by the IS-900 tracking system and the camera position reported by our vision-based reconstruction algorithm.
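As a small illustration of this evaluation (not the authors' code), the sketch below computes the per-frame euclidean position error and its mean between the reconstructed trajectory and the ground-truth trajectory, assuming both have already been expressed in a common coordinate frame and sampled frame by frame. Variable names are illustrative.

```python
# Hypothetical evaluation sketch: per-frame euclidean error between estimated camera
# positions and ground-truth tracker positions in a common frame.
import numpy as np

def trajectory_error(estimated, ground_truth):
    """Both inputs are (N, 3) arrays of camera positions, one row per frame."""
    estimated = np.asarray(estimated, float)
    ground_truth = np.asarray(ground_truth, float)
    err = np.linalg.norm(estimated - ground_truth, axis=1)  # metres, per frame
    return err, err.mean()

# per_frame, mean_err = trajectory_error(vision_positions, is900_positions)
# print(f"mean position error: {100.0 * mean_err:.1f} cm")
```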
V. DISCUSSION AND FUTURE WORK

Traditional underwater sensing devices have relied on active sensors (sonar in particular) to recover three-dimensional environmental structure. Advances in stereo sensing and data fusion technologies demonstrate that passive stereo is now a sufficiently robust technology to be applied in the aquatic domain as well. Although the underwater domain presents unique challenges to traditional vision algorithms, robust noise-suppression mechanisms can be used to overcome many of these difficulties. Results from land-based experiments demonstrate the accuracy of our reconstruction system, and results from underwater field trials demonstrate that the system can operate underwater to reconstruct submerged structures. The sensing system described in this paper is part of the AQUA robot sensor package. Future work includes embedding the technology within the robot housing and integrating the stereo reconstruction with inertial data to improve frame-to-frame motion estimation when no texture is available.

ACKNOWLEDGMENTS

We graciously acknowledge the funding provided by NSERC and the IRIS NCE for the AQUA project. We would also like to thank the McGill AQUA team for engineering support and the McGill Bellairs Research Institute for providing a positive atmosphere during the field research trials. We also thank Urszula Jasiobedzka for her patience and proofreading.

REFERENCES

[1] Y. Furushima, H. Yamamoto, T. Maruyama, T. Ohyagi, Y. Yamamura, S. Imanaga, S. Fujishima, Y. Nakazawa, and A. Shimamura, "Necessity of bottom topography measurements in coral reef regions," in MTS/IEEE OCEANS, Nov. 9-12 2004, pp. 930-935.
[2] M. Soriano, S. Marcos, C. Saloma, M. Quibilan, and P. Alino, "Image classification of coral reef components from underwater color video," in MTS/IEEE OCEANS, Nov. 5-8 2001, pp. 1008-1013.
[3] M. Marcos, M. Soriano, and C. Saloma, "Classification of coral reef images from underwater video using neural networks," Optics Express, no. 13, pp. 8766-8771, 2005.
[4] G. Dudek, M. Jenkin, C. Prahacs, A. Hogue, J. Sattar, P. Giguere, A. German, H. Liu, S. Saunderson, A. Ripsman, S. Simhon, L. Torres, E. Milios, P. Zhang, and I. Rekleitis, "A visually guided swimming robot," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Aug. 2-6 2005.
[5] R. Eustice, H. Singh, and J. Leonard, "Exactly sparse delayed-state filters," in IEEE International Conference on Robotics and Automation (ICRA), Apr. 18-22 2005.
[6] R. Eustice, "Large-area visually augmented navigation for autonomous underwater vehicles," Ph.D. dissertation, MIT - Woods Hole Oceanographic Institution, June 2005.
[7] O. Pizarro, R. Eustice, and H. Singh, "Large area 3D reconstructions from underwater surveys," in MTS/IEEE OCEANS Conference and Exhibition, Kobe, Japan, November 2004, pp. 678-687.
[8] S. Williams and I. Mahon, "Simultaneous localisation and mapping on the Great Barrier Reef," in IEEE International Conference on Robotics and Automation (ICRA), April 26-May 1 2004.
[9] R. Smith and P. Cheeseman, "On the representation and estimation of spatial uncertainty," International Journal of Robotics Research, vol. 5, no. 4, pp. 56-68, 1986.
[10] R. Smith, M. Self, and P. Cheeseman, "Estimating uncertain spatial relationships in robotics," Autonomous Robot Vehicles, pp. 167-193, 1990.
[11] E. Nettleton, P. Gibbens, and H. Durrant-Whyte, "Closed form solutions to the multiple platform simultaneous localisation and map building (SLAM) problem," in Sensor Fusion: Architectures, Algorithms, and Applications IV, B. Dasarathy, Ed., 2000, pp. 428-437.
[12] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit, "FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges," in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), 2003.
[13] C. Georgiades, A. German, A. Hogue, H. Liu, C. Prahacs, A. Ripsman, R. Sim, L. Torres, P. Zhang, M. Buehler, G. Dudek, M. Jenkin, and E. Milios, "AQUA: an aquatic walking robot," in The 7th Unmanned Underwater Vehicle Showcase (UUVS), Southampton, UK, Sep. 28-29 2004.
[14] ——, "AQUA: an aquatic walking robot," in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sept. 28 - Oct. 2 2004, pp. 3525-3531.
[15] C. Georgiades, A. Hogue, H. Liu, A. Ripsman, R. Sim, L. Torres, P. Zhang, C. Prahacs, M. Buehler, G. Dudek, M. Jenkin, and E. Milios, "AQUA: an aquatic walking robot," Dalhousie University, Nova Scotia, Canada, Tech. Rep. CS-2003-08, 2003.
[16] R. Altendorfer, N. Moore, H. Komsuoglu, M. Buehler, H. B. Brown Jr., D. McMordie, U. Saranli, R. Full, and D. Koditschek, "RHex: A biologically inspired hexapod runner," Autonomous Robots, vol. 11, pp. 207-213, 2001.
[17] A. Hogue, A. German, J. Zacher, and M. Jenkin, "Underwater 3D mapping: Experiences and lessons learned," in Third Canadian Conference on Computer and Robot Vision (CRV), June 7-9 2006.
[18] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 22, no. 11, pp. 1330-1334, November 2000.
[19] J. Mulligan, V. Isler, and K. Daniilidis, "Trinocular stereo: a real-time algorithm and its evaluation," International Journal of Computer Vision, vol. 47, pp. 51-61, 2002.
[20] J. Shi and C. Tomasi, "Good features to track," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 21-23 1994, pp. 593-600.
[21] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in International Joint Conference on Artificial Intelligence (IJCAI), 1981, pp. 674-679.
[22] M. Fischler and R. Bolles, "Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381-385, 1981.
[23] B. Horn, "Closed-form solution of absolute orientation using unit quaternions," Journal of the Optical Society of America A, vol. 4, no. 4, pp. 629-642, April 1987.
[24] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C. Cambridge University Press, 2002.
[25] E. Weisstein, "Rodrigues' rotation formula," from MathWorld - A Wolfram Web Resource, http://mathworld.wolfram.com/RodriguesRotationFormula.html, 2006.
[26] B. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon, Bundle Adjustment - A Modern Synthesis, ser. Lecture Notes in Computer Science. Springer-Verlag, 2000, pp. 278-375.
[27] S. Frisken, "Constrained elastic surfacenets: Generating smooth models from binary segmented data," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), October 1998, pp. 888-898.