Dynamic Reconfiguration of Mission Parameters in Underwater Human-Robot Collaboration
Md Jahidul Islam1, Marc Ho2, and Junaed Sattar3

The authors are with the Department of Computer Science and Engineering, University of Minnesota-Twin Cities, MN 55455, USA. E-mail: 1islam034, 2hoxxx323, [email protected]

arXiv:1709.08772v8 [cs.RO] 20 Feb 2018

Abstract— This paper presents a real-time programming and parameter reconfiguration method for autonomous underwater robots in human-robot collaborative tasks. Using a set of intuitive and meaningful hand gestures, we develop a syntactically simple framework that is computationally more efficient than a complex, grammar-based approach. In the proposed framework, a convolutional neural network is trained to provide accurate hand gesture recognition; subsequently, a finite-state machine-based deterministic model performs efficient gesture-to-instruction mapping and further improves the robustness of the interaction scheme. The key aspect of this framework is that it can be easily adopted by divers for communicating simple instructions to underwater robots without using artificial tags such as fiducial markers or requiring memorization of a potentially complex set of language rules. Extensive experiments are performed both on field-trial data and through simulation, which demonstrate the robustness, efficiency, and portability of this framework in a number of different scenarios. Finally, a user interaction study is presented that illustrates the gain in ease of use of our proposed interaction framework compared to existing methods for the underwater domain.

I. INTRODUCTION

Underwater robotics is an area of significantly increasing importance and applications, and is experiencing a rapid rise in research endeavors. Truly autonomous underwater navigation is still an open problem, with the underwater domain posing unique challenges to robotic sensing, perception, navigation, and manipulation. However, a simple yet robust human-robot communication framework [4], [9], [23] is desired in many tasks that require the use of autonomous underwater vehicles (AUVs). In particular, the ability to accept direct human guidance and instructions during task execution (see Fig. 1) is of vital importance. Additionally, such semi-autonomous behavior of a mobile robot with human-in-the-loop guidance reduces operational overhead by eliminating the need for teleoperation (and one or more teleoperators). Simple and intuitive instruction sets and robust instruction-to-action mapping are therefore essential for the successful use of AUVs in a number of critical applications such as search-and-rescue, surveillance, underwater infrastructure inspection, and marine ecosystem monitoring.

Fig. 1: Divers programming an AUV using the RoboChat [9] language with ARTag [11] markers; note the thick "tag book" carried by the diver, which, while necessary, adds to the diver's cognitive load and impacts mission performance.

The ability to alter parts of an instruction (i.e., to modify subtasks in a larger instruction set) and to reconfigure program parameters is often important for underwater exploration and data collection processes. Because of the specific challenges in the underwater domain, what would otherwise be straightforward deployments in terrestrial settings often become extremely complex undertakings for underwater robots, which require close human supervision. Since Wi-Fi or radio (i.e., electromagnetic) communication is unavailable or severely degraded underwater [7], such methods cannot be used to instruct an AUV to dynamically reconfigure command parameters. The current task thus needs to be interrupted, and the robot brought to the surface, in order to reconfigure its parameters. This is inconvenient and often expensive in terms of time and physical resources. Therefore, triggering parameter changes based on human input while the robot is underwater, without requiring a trip to the surface, is a simpler and more efficient alternative.

Controlling a robot using speech, direct input (e.g., a keyboard or joystick), or free-form gestures is a general paradigm [3], [5], [22] in the context of Human-Robot Interaction (HRI). Unlike in relatively less challenging terrestrial environments, the use of keyboard or joystick interfaces or tactile sensors is unappealing in underwater applications, since it entails costly waterproofing and introduces an additional point of failure. Additionally, since speech- or RGB-D (i.e., visual and depth image)-based interfaces such as a Leap Motion™ or Kinect™ are not feasible underwater, vision-based communication schemes are more natural for diver-robot interaction.

This work explores the challenges involved in designing a hand gesture-based human-robot communication framework for underwater robots. In particular, a simple interaction framework is developed in which a diver can use a set of intuitive and meaningful hand gestures to program the accompanying robot or reconfigure program parameters on the fly. A convolutional neural network-based robust hand gesture recognizer is used with a simple set of gesture-to-instruction mappings. A finite-state machine-based interpreter ensures predictable robot behavior by eliminating spurious inputs and incorrect instruction compositions.

II. RELATED WORK

Modulating robot control based on human input in the form of speech, hand gestures, or keyboard interfaces has been explored extensively for terrestrial environments [3], [5], [20], [22]. However, most of these human-robot communication modules are not readily applicable in underwater applications due to environmental and operational constraints [7]. Since visual communication is a feasible and operationally simpler method, a number of visual diver-robot interaction frameworks have been developed in the literature.

A gesture-based framework for underwater visual servo control was introduced in [8], where a human operator on the surface was required to interpret the gestures and modulate robot movements. Due to challenging visual conditions underwater [7] and the lack of robust gesture recognition techniques, fiducial markers were used in lieu of free-form hand gestures, as they are efficiently and robustly detectable under noisy conditions. In this regard, the most commonly used fiducial markers have been those with square, black-and-white patterns providing high contrast, such as ARTags [11] and AprilTags [17], among others. These consist of black symbols on a white background (or the opposite) in different patterns enclosed within a square. Circular markers with similar patterns, such as the Photomodeler Coded Targets Module system and Fourier Tags [18], have also been used in practice.

RoboChat [9] is the first visual language proposed for underwater diver-robot communication. Divers use a set of ARTag markers printed on cards to display predefined sequences of symbolic patterns to the robot, though the system is independent of the exact family of fiducial markers used. These symbol sequences are mapped to commands using a set of grammar rules defined for the language. These grammar rules include both terse imperative action commands and complex procedural statements. Despite its utility, […] observed motion and mapped to the robot instructions. Since more information is embeddable in each trajectory, a large number of instructions can be supported using only two fiducial markers. However, this method introduces additional computational overhead to track the marker motion, and it needs robust detection of the shape, orientation, and size of the motion trajectory. Furthermore, these problems are exacerbated since both robot and human are suspended in a six-degrees-of-freedom (6DOF) environment. Also, the symbol-to-instruction mapping remains unintuitive.

Since the traditional method of communication between scuba divers is hand gestures, instructing robots in the same way is more intuitive and flexible than using fiducial markers. Additionally, it relieves divers of the task of carrying a set of markers, which, if lost, would put the mission in peril. There exist a number of hand gesture-based HRI frameworks [3], [5], [21], [22] for terrestrial robots. In addition, recent visual hand gesture recognition techniques [13]–[15] based on convolutional neural networks have been shown to be highly accurate and robust to noise and visual distortions [10]. A number of such visual recognition and tracking techniques have been successfully used for underwater tracking [19] and have proven to be more robust than purely feature-based methods (e.g., [12]). However, the feasibility of these models for hand gesture-based diver-robot communication has not yet been explored in depth.

III. METHODOLOGY

The proposed framework is built on a number of components: the choice of hand gestures to map to command tokens, the robust recognition of hand gestures, and the use of a finite-state machine to enforce command structure and ignore erroneous detections or malformed commands. Each of these components is described in detail in the following sections.

A. Mapping Hand Gestures to Language Tokens

The key objective of this work is to design a simple yet expressive framework that can be easily adopted by divers for communicating instructions to the robot without using fiducial markers or memorizing complex language rules. Therefore, we choose a small collection of visually distinctive and intuitive gestures, which would improve the likelihood of robust recognition in degraded visual conditions.
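Robust recognition in degraded visual conditions typically also involves temporal filtering of the per-frame classifier output before tokens reach the interpreter. The sliding-window majority vote below is a minimal sketch of one such filter; the window size, vote threshold, and gesture labels are illustrative assumptions, not details taken from this paper.

```python
from collections import Counter, deque

def stable_tokens(frame_predictions, window=5, min_votes=4):
    """Turn a noisy stream of per-frame gesture labels (e.g., CNN outputs)
    into a stream of stable tokens: a label is emitted only when it wins
    at least `min_votes` of the last `window` frames, and consecutive
    duplicates are suppressed. Window and vote sizes are illustrative."""
    recent = deque(maxlen=window)   # sliding window of recent predictions
    last_emitted, out = None, []
    for label in frame_predictions:
        recent.append(label)
        winner, votes = Counter(recent).most_common(1)[0]
        if votes >= min_votes and winner != last_emitted:
            out.append(winner)      # emit a debounced, stable token
            last_emitted = winner
    return out
```

A single misclassified frame (e.g., one spurious "STOP" inside a run of "GO" frames) then never reaches the instruction interpreter, at the cost of a few frames of latency.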
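The finite-state machine interpreter that maps gesture tokens to instructions can likewise be sketched as a small transition table. The states, token names, and command structure below are hypothetical stand-ins for the paper's actual instruction language; the point is the behavior the paper describes: out-of-place detections are ignored, and only well-formed token sequences produce an instruction.

```python
# Hypothetical transition table: (state, token) -> next state.
# Token and state names are illustrative, not the paper's vocabulary.
TRANSITIONS = {
    ("IDLE",  "START"): "CMD",      # a START delimiter opens a command
    ("CMD",   "GO"):    "PARAM",    # a motion command expects a parameter
    ("CMD",   "STOP"):  "EXECUTE",  # STOP takes no parameter
    ("PARAM", "LEFT"):  "EXECUTE",
    ("PARAM", "RIGHT"): "EXECUTE",
}

def interpret(tokens):
    """Consume gesture tokens; return the accepted instruction tuple, or
    None if no well-formed command is completed. Tokens with no valid
    transition from the current state are treated as spurious and skipped."""
    state, accepted = "IDLE", []
    for tok in tokens:
        nxt = TRANSITIONS.get((state, tok))
        if nxt is None:
            continue                 # ignore erroneous / out-of-place input
        accepted.append(tok)
        state = nxt
        if state == "EXECUTE":
            return tuple(accepted[1:])   # drop the START delimiter
    return None                          # incomplete or malformed command
```

Because every instruction must pass through this deterministic machine, a stray recognized gesture cannot trigger an action by itself, which is how the framework keeps robot behavior predictable.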