Uncooperative Spacecraft Pose Estimation Using Monocular Monochromatic Images
JOURNAL OF SPACECRAFT AND ROCKETS
Vol. 58, No. 2, March–April 2021

Jian-Feng Shi*
MDA, Brampton, Ontario L6S 4J3, Canada
and
Steve Ulrich†
Carleton University, Ottawa, Ontario K1S 5B6, Canada
https://doi.org/10.2514/1.A34775

Imaging cameras are cost-effective sensors for spacecraft navigation. Image-driven techniques to extract the target spacecraft from its background are efficient and do not require pretraining. In this paper, we introduce several image-driven foreground extraction methods, including combining difference-of-Gaussian-based scene detection with graph manifold ranking-based foreground saliency generation. We successfully apply our foreground extraction method to infrared images from the STS-135 flight mission captured by the space shuttle's Triangulation and LIDAR Automated Rendezvous and Docking System (TriDAR) thermal camera. Our saliency approach demonstrates state-of-the-art performance and provides an order-of-magnitude reduction in processing time relative to traditional methods. Furthermore, we develop a new uncooperative spacecraft pose estimation method by combining our foreground extraction technique with level-set region-based pose estimation featuring novel initialization and gradient descent enhancements. Our method is validated using synthetically generated Envisat, Radarsat model, and International Space Station motion sequences. The proposed process is also validated with real rendezvous flight images of the International Space Station.

Presented as Paper 2018-5281 at the 2018 AIAA SPACE and Astronautics Forum and Exposition, Orlando, FL, September 17–19, 2018; received 17 February 2020; revision received 26 June 2020; accepted for publication 23 September 2020; published online 8 February 2021. Copyright © 2021 by Jian-Feng Shi and Steve Ulrich. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission. All requests for copying and permission to reprint should be submitted to CCC at www.copyright.com; employ the eISSN 1533-6794 to initiate your request. See also AIAA Rights and Permissions www.aiaa.org/randp.
*Senior GNC Engineer, 9445 Airport Road. Senior Member AIAA.
†Associate Professor, Department of Mechanical and Aerospace Engineering, 1125 Colonel By Drive. Senior Member AIAA.

Nomenclature
A, B, C = pose quaternion matrices
Ã = unnormalized optimal affinity matrix
B(x) = box filter operator
C = image curve entity
C_G(x), C_L(x) = image converting functions from color to gray and from red green blue to International Commission on Illumination L*a*b, respectively
D = degree matrix
D̂, D = maximum and minimum feature vector differences, respectively
D(x) = normalized difference of Gaussian function
d_ij = ith row, jth column element in the degree matrix
E = image energy function
F(x) = fast explicit diffusion function
F = coordinate frame vectrix
f = camera focal length
f, f = superpixel and optimized graph manifold ranking functions, respectively
f̂_i = unit vector of the i vector gradient
G = 11 × 11 Gaussian filter
G(x) = Gaussian distribution map generation function
H(z), δ(z) = Heaviside step and smoothed Dirac-delta functions, respectively
h_i = step size in the i direction, where i is x, y
h_i = feature vector for the ith superpixel
I, I = input color image and normalized image, respectively
I_Lab = image in CIE L*a*b color space
K = intrinsic camera matrix
K_B, k_B = box filter kernel and kernel size, respectively
K_i, k_i = elliptical kernel and kernel size, respectively, where i is dilation d or erosion e
K_L = Laplace filter kernel size
K_s = 3 × 3 Scharr kernel
L(x) = Laplacian filter
M(x) = mean value binary threshold function
MB(x) = minimum boundary distance saliency response generation function
MC(x) = image moment centroid generation function
MD(x), ME(x) = morphological dilation and erosion functions, respectively
M_i = appearance model, where i is foreground f or background b
N(x) = image normalization function
O(x) = Otsu thresholding function
o_i = image center position in the i direction, where i is x, y
P_i = probability density function, where i is foreground f or background b
q, q_i = quaternion-represented target orientation and ith parameter quaternion, respectively
q̃ = nonnormalized quaternion
R = rotation from target body frame to camera frame
R(x) = apply contours and keep only the maximum perimeter region
RC(x) = regional contrast saliency response generation function
r_i(x) = likelihood of the pixel property, where i is foreground f or background b
r_tol = foreground/background classification ratio tolerance
S_bg, S_fg = background and foreground masks, respectively
S_d = simplified difference of Gaussian normalized response map
S_e, S_fe = Scharr filtered edge response map and edge contour response, respectively
S_fed = mean value fast explicit diffusion response map
S_G = Gaussian centroid response map
S_hsf = high-frequency saliency feature response map
S_i = camera image scaling in the image i direction, where i is x, y, θ; θ is the skew scaling
S_i = normalized image intersected with mask, where i is foreground f or background b
SL(x) = simple linear iterative clustering superpixel generation function
SR(x), S_SR = spectral residual response function and response map, respectively
S_L = Laplace algorithm foreground response map
S_m, S_M = mean value binary threshold contours and response map, respectively
S_mask = processed foreground mask
S_mbd = minimum boundary distance response map
S_o, S_O = contour region Otsu threshold and Otsu thresholded response map, respectively
S_p = saliency enhanced response map
S_rc = regional contrast response map
S_u = main region contour mask
T_i = transformation matrix for the i axis, where i is x, y, z
t = translation vector from camera to target body expressed in camera frame
t̃ = noncentralized initial translation vector
W, w_ij = affinity matrix and its ith row, jth column parameter, respectively
X, Y, Z = x-, y-, z-axis directions
X_i = target body coordinate, where i is the body b or camera c frame
x, y, z = translational x, y, and z, or, for x, y, image pixel position
x, x_k = input image or pose state vector at the kth step
x_γi = image coordinate partial derivative with respect to the γ_i direction
x, y = x-, y-pixel location from image center
y = pixel intensity or color feature vector
z = manifold ranking indication vector
α = graph manifold ranking scaling parameter
ϵ, η = [x y z]^T and w quaternion components, respectively
μ_i, σ_i = pixel mean and standard deviation, where i is foreground f or background b
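As a rough illustration of the difference-of-Gaussian foreground extraction summarized in the abstract, the Python sketch below band-pass filters a monochrome frame, normalizes the response, and thresholds it into a coarse foreground mask. It is a minimal sketch of the general idea only, not the pipeline developed in this paper; the kernel sizes, the Otsu threshold, and the morphological closing step are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): a difference-of-Gaussian
# band-pass response, normalized and thresholded into a rough foreground mask.
# Kernel sizes and the Otsu/closing steps are illustrative assumptions.
import cv2
import numpy as np

def dog_foreground_mask(gray: np.ndarray) -> np.ndarray:
    """Return a binary foreground mask from a monochrome image."""
    gray = gray.astype(np.float32) / 255.0
    fine = cv2.GaussianBlur(gray, (5, 5), 0)      # small-scale smoothing
    coarse = cv2.GaussianBlur(gray, (31, 31), 0)  # large-scale smoothing
    dog = np.abs(fine - coarse)                   # band-pass (DoG) response
    dog = cv2.normalize(dog, None, 0.0, 1.0, cv2.NORM_MINMAX)  # rescale to [0, 1]
    dog8 = (dog * 255).astype(np.uint8)
    # Otsu threshold separates the salient (foreground) response from the rest
    _, mask = cv2.threshold(dog8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Morphological closing fills small gaps in the spacecraft silhouette
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

# Example usage on a monochrome frame:
# mask = dog_foreground_mask(cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE))
```

A band-pass response alone can be distracted by structured background such as the Earth limb, which is why the approach described in the abstract pairs scene detection with graph manifold ranking-based saliency.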
and quality training data that are time-consuming to generate. To this end, recent work by Kisantal et al. [20] developed a photorealistic spacecraft pose estimation dataset for machine learning algorithm training and inference. The approach developed in this paper limits image clutter by extracting the spacecraft foreground using saliency detection. Specifically, we use an image-driven approach to detect and extract the spacecraft target in real time and to reduce pose estimation processing time and prediction errors.

Object image pose estimation is a core problem in computer vision. Rosenhahn et al. [21] and Lepetit and Fua [22] summarized methods ranging from feature-based structures such as points, lines, circles, and chains to freeform objects such as active contours and implicit surfaces. The state-of-the-art methods in spacecraft pose estimation are based on image features and registration. Both Sharma et al. [12] and Capuano et al. [23] present similar pipelines using region-of-interest estimation, feature extraction, and point or shape registration. As Capuano et al. [23] point out, "the assumption of a number of pixels in the region of interest must be large enough to allow the extraction of a minimum number of point features." For the extracted features to be useful, the region of interest must be estimated precisely. Unsupervised image processing methods [12] for region estimation can be computed quickly; they may, however, be confused by the intense geographical features of the Earth background. Alternatively, regions of interest can be obtained with good confidence using machine learning or deep learning [24,25] techniques; this is, however, a highly time-consuming task [23]. The difficulty in satisfying the latter feature extraction assumption is often caused by intense space lighting and shadowing.
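To make the preceding tradeoff concrete, the following is a deliberately simple, hypothetical example of an unsupervised region-of-interest estimate (it is not the specific method of Ref. [12]): a global Otsu threshold followed by selection of the largest external contour. Such a detector runs in milliseconds on a single frame, but a bright Earth limb or cloud field can capture the threshold and pull the region away from the spacecraft.

```python
# Illustrative sketch only: fast unsupervised region-of-interest estimation
# via Otsu thresholding and the largest external contour (OpenCV 4.x API).
import cv2
import numpy as np

def estimate_roi(gray: np.ndarray):
    """Return (x, y, w, h) of the dominant bright region, or None."""
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress sensor noise
    _, binary = cv2.threshold(blurred, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # global threshold
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)  # keep the dominant region
    return cv2.boundingRect(largest)
```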
Key points based on image corners [26,27] and extrema image blobs [28,29] can be difficult to classify because of their quantity and quality under extreme lighting or on nondistinctive visual targets. The feature descriptors used for key-point matching can be time-consuming to compute and can be erroneous, especially under affine or viewpoint transformations and illumination distortions. Edge-based pose estimation [30] relies on stable features to track with many invariant properties, but extracting relevant edge features and discarding line clutter requires robust decision tools. Region-based pose estimation, on the other hand, uses invariant region boundaries with better recognition properties. In this context, we develop a real-time region-based method for monocular infrared (IR) image spacecraft pose estimation.

II. Related Work
We organize foreground object extraction into three branches of computer vision research: techniques using sequential images for background subtraction [31,32], techniques in machine learning