
Article

Matching Multi-Sensor Remote Sensing Images via an Affinity Tensor

Shiyu Chen 1, Xiuxiao Yuan 2,3, Wei Yuan 2,4,*, Jiqiang Niu 1, Feng Xu 1 and Yong Zhang 5

1 School of Geographic Sciences, Xinyang Normal University, 237 Nanhu Road, Xinyang 464000, China; [email protected] (S.C.); [email protected] (J.N.); [email protected] (F.X.)
2 School of Remote Sensing and Information Engineering, Wuhan University, 129 Luoyu Road, Wuhan 430079, China; [email protected]
3 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China
4 Center for Spatial Information Science, University of Tokyo, Kashiwa 277-8568, Japan
5 Visiontek Research, 6 Phoenix Avenue, Wuhan 430205, China; [email protected]
* Correspondence: [email protected]; Tel.: +86-27-6877-1228

Received: 10 May 2018; Accepted: 26 June 2018; Published: 11 July 2018

Abstract: Matching multi-sensor remote sensing images is still a challenging task due to textural changes and non-linear intensity differences. In this paper, a novel matching method is proposed for multi-sensor remote sensing images. To establish feature correspondences, an affinity tensor is used to integrate geometric and radiometric information. The matching process consists of three steps. First, features from an accelerated segment test are extracted from both the source and target images, and two complete graphs are constructed with their nodes representing these features. Then, the geometric and radiometric similarities of the feature points are represented by a third-order affinity tensor, and the initial feature correspondences are established by tensor power iteration. Finally, a tensor-based mismatch detection process is conducted to purify the initial matched points. The robustness and capability of the proposed method are tested with a variety of remote sensing images such as Ziyuan-3 backward, Ziyuan-3 nadir, Gaofen-1, Gaofen-2, unmanned aerial vehicle platform, and Jilin-1. The experiments show that the average matching recall is greater than 0.5, which outperforms state-of-the-art multi-sensor image-matching algorithms such as SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT.

Keywords: image matching; multi-sensor remote sensing image; graph theory; affinity tensor; matching blunder detection

1. Introduction

Image matching aims to find feature correspondences among two or more images, and it is a crucial step in many remote sensing applications such as image registration, change detection, 3D scene reconstruction, and aerial triangulation. Among all the matching methods, feature-based matching methods receive the most attention. These methods can be summarized in three steps: (1) feature detection; (2) feature description; and (3) descriptor matching. In feature detection, salient features such as corners, blobs, lines, and regions are extracted from images. The commonly used feature detection methods are the Harris corner detector [1], features from accelerated segment test (FAST) [2], difference of Gaussians (DoG) features [3], binary robust invariant scalable keypoints (BRISK) [4], oriented FAST and rotated BRIEF (ORB) [5], Canny [6], and maximally stable extremal regions (MSER) [7]. Among these image features, Harris, FAST, and DoG are point features which are invariant to image rotation. In particular, DoG is invariant to scale and linear illumination changes. Canny and MSER are line and region features, respectively.
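For readers unfamiliar with the segment test behind FAST [2], the following minimal sketch (plain Python/NumPy, without the machine-learned decision tree or non-maximum suppression of the full detector) illustrates the FAST-9 criterion: a pixel is declared a corner when at least 9 contiguous pixels on a radius-3 Bresenham circle are all brighter, or all darker, than the center by a threshold t. The image and threshold here are illustrative, not values from this paper.

```python
import numpy as np

# (row, col) offsets of the 16-pixel Bresenham circle of radius 3 used by FAST,
# listed in circular order so contiguity can be checked by a running count.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def fast9_corners(img, t=50):
    """Simplified FAST-9 segment test: a pixel is a corner if at least 9
    contiguous circle pixels are all brighter than p + t or all darker than p - t."""
    img = np.asarray(img, dtype=np.int32)
    h, w = img.shape
    corners = []
    for r in range(3, h - 3):
        for c in range(3, w - 3):
            p = img[r, c]
            # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
            s = [1 if img[r + dr, c + dc] > p + t
                 else -1 if img[r + dr, c + dc] < p - t
                 else 0
                 for dr, dc in CIRCLE]
            for sign in (1, -1):
                run, best = 0, 0
                for v in s + s:  # doubled list handles circular wrap-around
                    run = run + 1 if v == sign else 0
                    best = max(best, run)
                if min(best, 16) >= 9:
                    corners.append((r, c))
                    break
    return corners

# Synthetic test image: a bright square on a dark background.
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 5:15] = 255
pts = fast9_corners(img)
```

On this synthetic image the test fires at the square's corners, e.g. (5, 5), but rejects edge midpoints such as (5, 10) and interior pixels such as (10, 10), which is exactly the corner/edge discrimination the segment test is designed for.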

Remote Sens. 2018, 10, 1104; doi:10.3390/rs10071104 www.mdpi.com/journal/remotesensing

In feature description, every image feature is assigned a unique signature to distinguish it from the others, so that similar features have close descriptors while different features have clearly different descriptors. State-of-the-art feature descriptors include scale-invariant feature transform (SIFT) [3], speeded-up robust features (SURF) [8], principal component analysis SIFT (PCA-SIFT) [9], gradient location and orientation histogram (GLOH) [10], local self-similarity (LSS) [11], distinctive order-based self-similarity (DOBSS) [12], local intensity order pattern (LIOP) [13], and the multi-support region order-based gradient histogram (MROGH) [14]. For line features, the mean-standard deviation line descriptor (MSLD) [15] and the line intersection context feature (LICF) [16] perform well. For region features, Forssén and Lowe [17] gave a solution based on the SIFT descriptor.

In descriptor matching, the similarities between one feature and its candidates are evaluated by the distances between description vectors. The commonly used distance metrics for descriptor matching are Euclidean distance, Hamming distance, Mahalanobis distance, and the normalized correlation coefficient. To determine the final correspondences, a threshold, such as the nearest neighbor distance ratio (NNDR) [3], is sometimes applied in descriptor matching.

Though many researchers have made massive improvements in the aforementioned matching steps in trying to obtain stable and practical matching results, matching multi-sensor remote sensing images is still a challenging task. The matching difficulties are the result of two issues. First, images of the same scene from different sensors often present different radiometric characteristics, known as non-linear intensity differences [12,18]. These intensity differences are exemplified by Figure 1a, which shows a pair of images acquired by the Ziyuan-3 (ZY-3) and Gaofen-1 (GF-1) sensors. Due to non-linear intensity differences caused by different sensors and illumination, the distance between the local descriptors of two conjugated feature points is not small enough, which leads to false matches. Second, images from different sensors are often shot at different moments, which gives rise to considerable textural changes. As shown in Figure 1b, this image pair was captured by the ZY-3 and Gaofen-2 (GF-2) sensors. The shot time spans over three years, and the rapid changes of the modern city resulted in abundant textural changes between the images. Textural changes bring down the repetitive rate of image features and lead to fewer correct matches. Both of these aspects reduce the matching recall [10] and thus lead to unsatisfying matching results.
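The NNDR test mentioned above can be sketched as follows. This is a brute-force toy implementation (real pipelines use k-d trees or approximate nearest-neighbor search), and the two-dimensional descriptors are made-up values standing in for real SIFT/SURF vectors.

```python
import numpy as np

def match_nndr(desc_src, desc_tgt, ratio=0.8):
    """Brute-force descriptor matching with the nearest neighbor distance
    ratio (NNDR) test [3]: accept a match only if the closest target
    descriptor is sufficiently closer than the second closest."""
    matches = []
    for i, d in enumerate(desc_src):
        dists = np.linalg.norm(desc_tgt - d, axis=1)   # Euclidean distances
        order = np.argsort(dists)
        j1, j2 = int(order[0]), int(order[1])          # two nearest neighbors
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches

# Toy descriptors: source feature 0 has one clear match (target 1), while
# source feature 1 is almost equidistant from targets 2 and 3, so the
# ambiguity is rejected by the ratio test.
src = np.array([[1.0, 0.0], [0.0, 1.0]])
tgt = np.array([[5.0, 5.0], [1.0, 0.1], [0.05, 1.0], [0.0, 1.05]])
result = match_nndr(src, tgt)
```

The design point of NNDR is precisely the second case: an ambiguous feature with two near-equal candidates is dropped rather than matched to the marginally closer one, which raises precision at the cost of some recall.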


Figure 1. Non-linear intensity differences and textural changes between image pairs: (a) the image pair consists of GF-1 and ZY-3 sensor images and (b) GF-2 and ZY-3 sensor images.

As the most representative method of feature-based matching, SIFT-based methods have been studied extensively, and their upgraded versions have been successfully applied to matching multi-sensor remote sensing images. Yi et al. [19] proposed gradient orientation modification SIFT (GOM-SIFT), which modified the gradient orientation of SIFT and imposed restrictions on scale changes to improve the matching precision. A similar strategy was also used by Mehmet et al. [20]; they proposed an orientation-restricted SIFT (OR-SIFT) descriptor which has only half the length of GOM-SIFT. When non-linear intensity differences and considerable textural changes are present between images, simply reversing the gradient orientation of image features cannot increase the number of correct matches. Hasan et al. [21] took advantage of neighborhood information around SIFT key points. The experimental results showed that this method could generate more correct matches because the geometric structures between images were relatively static. Saleem and Sablatnig [22] used

normalized gradients SIFT (NG-SIFT) for matching multispectral images; their results showed that the proposed method achieved a better matching performance against non-linear intensity changes between multispectral images. Considering the self-similarity of an image patch, local self-similarity (LSS) captures the internal geometric layouts of feature points. LSS is another theory which has been successfully applied to the registration of thermal and visible videos, where it has handled complex intensity variations. Shechtman [11] first proposed the LSS descriptor and used it as an image-based and video-based similarity measure. Kim et al. [23] improved LSS based on the observation that the self-similarity existing within images is less sensitive to modality variations; they proposed a dense adaptive self-correlation (DASC) descriptor for multi-modal and multi-spectral correspondences. In an earlier study, Kim et al. [24] extended the standard LSS descriptor into the frequency domain and proposed the LSS frequency (LSSF) descriptor for matching multispectral and RGB-NIR image pairs. Their experimental results showed that LSSF was invariant to image rotation and outperformed other state-of-the-art descriptors. Despite the attractive illumination-invariance property, both standard LSS and extended LSS have drawbacks in remote sensing image matching because of their relatively low descriptor discriminability. To overcome this shortcoming, Ye and Shan [18] used GOM-SIFT in the coarse registration process, and then utilized the LSS descriptor to find feature correspondences in the fine registration process.
Their proposed method alleviated the low-distinctiveness effect of LSS descriptors and achieved satisfactory matching results on multispectral remote sensing image pairs. Sedaghat and Ebadi [12] combined the merits of LSS and the intensity order pooling scheme, and proposed DOBSS for matching multi-sensor remote sensing images. Similar to Ye and Shan's matching strategy [18], Liu et al. [25] proposed a multi-stage matching approach for multi-source optical satellite imagery. They first estimated a homographic transformation between the source and target image by initial matching, and then used probability relaxation to expand the matching. Though the matching scheme presented in [25] obtained satisfactory matching results on satellite images, it has the same defect as [18]: if the initial matching fails, the whole matching process ends in failure.

Most feature-based matching methods boil down to finding the closest descriptor in distance while seldom considering the topological structures between image features. In multi-sensor image pairs, non-linear intensity differences lead to dissimilarities of conjugate features in radiometry, and changed textures caused by different shooting moments have no feature correspondences in essence. Descriptor-based matching recall, which is a very crucial factor for a robust matching method, depends only on the radiometric similarity of feature pairs. On the contrary, although captured by different sensors or shot at different moments, the structures among images remain relatively static [22], and they are invariant to intensity changes. Therefore, structural information can be utilized to obtain robust and reliable matching results. Based on this fact, many researchers cast the feature correspondence problem as a graph matching problem (e.g., [26–29]).
The affinity tensor between graphs, as the core of graph matching [26], paves the way for multi-sensor remote sensing image matching, since the radiometric and geometric information of image features can be easily encoded in a tensor. Duchenne et al. [28] used geometric constraints in feature matching and gained favorable matching results. Wang et al. [30] proposed a new matching method based on a multi-tensor which merged multi-granularity geometric affinities. The experimental results showed that their method was more robust than the method proposed in [28]. Chen et al. [31] observed that outliers were inevitable in matching, so they proposed a weighted affinity tensor for poor textural image matching and gained better experimental results compared to feature-based matching algorithms.

To address the multi-sensor image matching problem, an affinity tensor-based matching (ATBM) method is proposed, in which both the radiometric information between feature pairs and the structural information between feature tuples are utilized. Compared to traditional descriptor-based matching methods, the main differences are fourfold. First, image features are not isolated in matching; instead, they form ordered triplets used to compute the affinity tensor elements (Section 2.1). Second, the topological structures between image features are not discarded; instead, they are treated as geometric similarities in the matching process. Geometric similarities and radiometric similarities play the same role in matching through vector normalization and balancing (Sections 2.2 and 2.3). Third, the affinity tensor-based matching method inherently has the ability to detect matching blunders (Section 2.4). Last, the tensor-based model is a generalized matching model in which geometric and radiometric information can be easily integrated.

2. Methodology

Given a source and a target image acquired from different sensors, image matching aims to find point correspondences between the two images. The proposed method involves a four-step process, namely complete graph building [32], affinity tensor construction, tensor power iteration, and matching gross error detection. In the first stage, an upgraded FAST detector (named uniform robust FAST, abbreviated as UR-FAST, which will be detailed in Section 2.2) is used to extract image features, and two complete graphs are built in the source and target images with their nodes representing FAST features. Then, the affinity tensor between the graphs is built, with its elements expressing the node similarities. Next, tensor power iteration is applied to obtain the leading vector, which contains the coarse matching results. Finally, a gross error detection method follows to purify the matching results. Figure 2 shows the main process of the proposed method.

Figure 2. The workflow of tensor matching and matching blunder detection.

2.1. Definition of the Affinity Tensor

Given a source and a target image, F_S and F_T are the two image feature point sets extracted from the source and target image, respectively; the numbers of features in F_S and F_T are m and n; i (i ≤ m) and i′ (i′ ≤ n) are feature indices of F_S and F_T; and G_S and G_T are the two complete graphs built from F_S and F_T (as illustrated in Figure 2). In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct nodes is connected by a unique edge. As can be seen from Figure 2, there are ten and eight features in the source and target image, respectively. According to complete graph theory, there are C(10,2) = 45 unique edges that connect node pairs in G_S, and C(8,2) = 28 unique edges that connect node pairs in G_T.

The affinity tensor A is built on the two complete graphs (G_S and G_T) and describes the similarity of the two graphs. In this paper, tensor A can be considered a three-way array of size mn × mn × mn, i.e., A ∈ R^{mn×mn×mn}, where a_{I;J;K} is the tensor element located at column I, row J, and tube K of the tensor cube (as shown in Figure 2). To explicitly express how the tensor elements are constructed, we use (i,i′), (j,j′), and (k,k′) to replace I, J, and K. The indices i, j, and k are feature indices in the source image ranging from 0 to m − 1; i′, j′, and k′ are feature indices in the target image ranging from 0 to n − 1. In this way, the tensor element a_{I;J;K} is rewritten as a_{i,i′;j,j′;k,k′}, located at the (i × n + i′) column, (j × n + j′) row, and (k × n + k′) tube, and it expresses the similarity of triangles T^S_{i,j,k} and T^T_{i′,j′,k′}, which are the two triangles in graphs G_S and G_T, respectively.

The tensor element a_{i,i′;j,j′;k,k′} can be computed by Equation (1):

a_{i,i′;j,j′;k,k′} = exp(−(w_b · ||f_{i,j,k} − f_{i′,j′,k′}||₂)² / ε²), if i = j = k and i′ = j′ = k′
a_{i,i′;j,j′;k,k′} = exp(−||f_{i,j,k} − f_{i′,j′,k′}||₂² / ε²), if i ≠ j ≠ k and i′ ≠ j′ ≠ k′ (1)
a_{i,i′;j,j′;k,k′} = 0, otherwise

where f_{i,j,k} and f_{i′,j′,k′} are the geometric descriptors of T^S_{i,j,k} and T^T_{i′,j′,k′}. As shown in Figure 3, the geometric descriptors are usually expressed as the cosines of the three interior angles of a triangle (i.e., f_{i,j,k} = (cos θ_i, cos θ_j, cos θ_k)). The variable ε represents the Gaussian kernel bandwidth; ||·|| is the length of a vector; and w_b is a balancing factor which will be detailed below.

Figure 3. Diagram of the triangle descriptors.

As shown in Equation (1), triangles T^S_{i,j,k} and T^T_{i′,j′,k′} degenerate to a pair of points when i = j = k and i′ = j′ = k′; thus, f_{i,j,k} and f_{i′,j′,k′} become radiometric descriptors (e.g., the 128-dimensional SIFT descriptor or the 64-dimensional SURF descriptor) of image features i and i′. However, the geometric descriptors are related to the cosines of angles, while the radiometric descriptors are related to the intensity of image pixels; the two kinds of descriptors have different physical dimensions. In addition, the radiometric descriptors are more distinctive than the geometric descriptors because they have higher dimensions. To balance the differences in physical dimensions and descriptor distinctiveness, Wang et al. [30] suggested normalizing all the descriptors to a unit norm. However, simple normalization can only alleviate the physical dimensional differences. Thus, a constant factor w_b is used to balance the differences in descriptor distinctiveness. In this way, geometric and radiometric information are equally important when encoded in the affinity tensor. The constant factor w_b can be estimated from correct matches of image pairs obtained a priori:

w_b = f_r / f_g (2)

where f_r and f_g are the average distances of the normalized radiometric and geometric descriptors, respectively. Though the balancing factor w_b differs between image pairs, the values share the same order of magnitude. The order of magnitude of w_b can be estimated as follows:

(1) Select an image pair from the images to be matched, and extract a number of tie points manually or automatically (the automatic matching method could be SIFT, SURF, and so on);
(2) Remove the erroneous tie points by manual check or by a gross error detection algorithm such as random sample consensus (RANSAC) [33];
(3) Estimate the average feature descriptor distance of the matched features, named f_r;
(4) Construct a triangulated irregular network (TIN) from the matched points, and compute the average triangle descriptor distance of the matched triangles, named f_g;
(5) Compute the balancing factor w_b with Equation (2).
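The estimation steps above can be condensed into the following sketch, assuming the verified matches and their descriptors are already available; the descriptor values are toy data and `estimate_wb` is a hypothetical helper name, not a function from the paper.

```python
import numpy as np

def normalize(v):
    """Normalize a descriptor to unit norm, as suggested in [30]."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def estimate_wb(radiometric_pairs, geometric_pairs):
    """Estimate the balancing factor w_b = f_r / f_g of Equation (2):
    f_r and f_g are the mean distances of normalized radiometric (feature)
    and geometric (triangle) descriptors over verified correct matches."""
    f_r = np.mean([np.linalg.norm(normalize(a) - normalize(b))
                   for a, b in radiometric_pairs])
    f_g = np.mean([np.linalg.norm(normalize(a) - normalize(b))
                   for a, b in geometric_pairs])
    return f_r / f_g

# Toy data: descriptor pairs of matched features (steps 1-3) and angle-cosine
# descriptors of matched TIN triangles (step 4).
rad_pairs = [([1, 2, 3, 4], [1.1, 2.0, 3.2, 3.9]),
             ([0, 1, 0, 1], [0.1, 1.0, 0.0, 0.9])]
geo_pairs = [([0.0, 0.8, 0.6], [0.1, 0.75, 0.62])]
w_b = estimate_wb(rad_pairs, geo_pairs)
```

Since only the order of magnitude of w_b matters, the same estimate can then be reused for the remaining image pairs of the dataset, as the text notes below.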

In practical matching tasks, calculating w_b for every image pair is time-consuming and unnecessary, as only the order of magnitude of w_b is significant. Thus, we sample a pair of images from the image set to be matched, estimate the order of magnitude of w_b, and apply this factor to the rest of the image pairs. The configuration of w_b also follows some principles in tensor matching: if both geometric and radiometric distortions are small, then w_b can be an arbitrary positive real number, because either geometric or radiometric information alone is sufficient for matching; if geometric distortions are small, then a greater w_b is preferable, because radiometric information should be suppressed in matching; if radiometric distortions are small, then a smaller w_b is better, because large geometric distortions will contaminate the affinity tensor if not suppressed.

2.2. Construction of the Affinity Tensor

The affinity tensor of two complete graphs can be constructed by Equation (1). However, a pair of images may contain thousands of image features, and using all the features to construct a complete tensor is very memory- and time-consuming; a partial tensor is often sufficient for matching. Besides, the feature repeatability is relatively low in multi-sensor remote sensing images, leading to small overlaps between feature sets. In addition, outliers among the image features mix with inliers and introduce unrelated information into the tensor, which leads to local minima in power iteration [34]. Moreover, to speed up tensor power iteration, the affinity tensor should maintain a certain sparseness. Based on the above requirements, the following four strategies are proposed to construct a small, relatively pure, and sparse tensor.

(1) Extracting highly repetitive and evenly distributed features. Repetitiveness of features is a critical factor for successful matching [2]. We evaluated commonly used image features such as SIFT, SURF, and so on, and found that the FAST feature had the highest repetitiveness. As mentioned in Section 2.1, the measure of structural similarity of the graph nodes is the three inner angles of the triangles. However, if any two vertices in a triangle are a small distance apart, then a small shift of a vertex may lead to tremendous differences in the inner angles. Consequently, noise is introduced in computing the tensor elements. Inspired by UR-SIFT (uniform robust SIFT [35]), we design a new version of FAST named UR-FAST to extract evenly distributed image features. As shown in Figure 4, the standard FAST features are clustered in violently changing textures, while the modified FAST features have a better distribution. It should be noted that the UR-FAST detector is only applied to the source image, and the features in the target image are acquired by standard FAST and approximate nearest neighbors (ANN) [36], which will be detailed in strategy (2).


Figure 4. Comparison of standard FAST and UR-FAST. (a) Standard FAST and (b) UR-FAST.
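UR-FAST itself is only introduced at a high level here; a common way to achieve the even distribution of Figure 4b is grid-based selection of the strongest responses per cell. The sketch below is a hypothetical stand-in for that uniformity step, not the exact UR-FAST procedure (which follows the UR-SIFT strategy [35]); the grid size and per-cell quota are illustrative.

```python
import numpy as np

def uniform_select(points, scores, img_shape, grid=(4, 4), per_cell=2):
    """Spread detected features evenly: partition the image into a grid and
    keep at most `per_cell` highest-scoring features in each cell. This is a
    hypothetical stand-in for UR-FAST's uniformity step."""
    h, w = img_shape
    cell_h, cell_w = h / grid[0], w / grid[1]
    buckets = {}
    for (r, c), s in zip(points, scores):
        key = (min(int(r // cell_h), grid[0] - 1),
               min(int(c // cell_w), grid[1] - 1))
        buckets.setdefault(key, []).append((s, (r, c)))
    kept = []
    for cell in buckets.values():
        cell.sort(reverse=True)               # strongest responses first
        kept.extend(p for _, p in cell[:per_cell])
    return kept

# 10 corners clustered in one textured region plus 2 isolated ones: uniform
# selection caps the cluster at 2 while keeping both isolated features.
pts = [(5 + i % 3, 5 + i // 3) for i in range(10)] + [(50, 50), (90, 10)]
scores = list(range(12))
sel = uniform_select(pts, scores, (100, 100))
```

This trades a few strong-but-redundant responses for spatial coverage, which matters here because well-spread vertices produce more stable triangle angles in the affinity tensor.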

(2) Reducing the number of image features. The detected features are filtered by ANN [36] (2)(2) ReducingReducing thethe numbernumber ofof imageimage features.features. The detected features are filteredfiltered by ANN [[36]36] searching algorithm for constructing two feature sets F and F . As illustrated in Figure 5a, there searching algorithm for constructing two feature sets FSS and FTT . As illustrated in Figure 5a, there searching algorithm for constructing two feature sets FS and FT. As illustrated in Figure5a, there are are 3 UR-FAST features and 13 FAST features in the source and target images, respectively. As shown 3are UR-FAST 3 UR-FAST features features and and 13 13 FAST FAST features features in in the the source source and and target target images, images, respectively. respectively. AsAs shownshown in Figure 5b, to guarantee feature repetitiveness, for every feature element in FFS , ANN searching inin FigureFigure5 5b,b, toto guaranteeguarantee featurefeature repetitiveness,repetitiveness, forfor everyevery featurefeature elementelement inin FSS,, ANNANN searchingsearching algorithmalgorithm is is used used to to search search its its 4 4 most most probable probable matches matches amongamong thethe 1313 featuresfeatures inin thethe targettarget image.image. All the candidate matches of feature elements in F constitute F (as shown in Figure 5b, the AllAll thethe candidatecandidate matches matches of of feature feature elements elements in FinS constituteFSS constituteFT (as F shownTT (as shown in Figure in5 b,Figure the original 5b, the originalfeatureoriginal number featurefeature numbernumber in target inin image targettarget is imageimage 13, while isis 13,13, the whilewhile feature thethe feature numberfeature numbernumber decreases decreasesdecreases to 9 after toto filtering).99 afterafter filtering).filtering). 
In this way, the tensor size decreases to (3 × 9)^3 from the original size of (3 × 13)^3, which sharply decreases the memory consumption.
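For illustration, this filtering step can be sketched as follows. The brute-force nearest-neighbour search below is a minimal stand-in for the ANN searching algorithm of [36], and the descriptor values are toy data; only the neighbour count k = 4 follows the text.

```python
import numpy as np

def filter_target_features(desc_s, desc_t, k=4):
    """For every source descriptor, keep its k nearest target descriptors.

    The union of the retained indices forms the reduced target feature
    set FT (a brute-force stand-in for ANN searching).
    """
    kept = set()
    for d in desc_s:
        # Euclidean distance from this source descriptor to every target one.
        dist = np.linalg.norm(desc_t - d, axis=1)
        kept.update(np.argsort(dist)[:k].tolist())
    return sorted(kept)

# Toy example: 3 source features, 13 target features, 8-D descriptors.
rng = np.random.default_rng(0)
desc_s = rng.random((3, 8))
desc_t = rng.random((13, 8))
f_t = filter_target_features(desc_s, desc_t, k=4)
# At most 3 * 4 = 12 target features survive (fewer if candidates overlap).
```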

Figure 5. An illustration of outlier filtering; the pink dots are outliers: (a) original detected feature points; (b) filtered feature sets.


Remote Sens. 2018, 10, 1104 8 of 19

(3) Computing a partial tensor. When using the definition of an affinity tensor in Equation (1), the tensor element a_{i,i';j,j';k,k'} means the geometric similarity of T^S_{i,j,k} and T^T_{i',j',k'}. As described in strategy (2), (3 × 9)^3 elements would have to be computed for the complete tensor. Actually, completely computing the tensor would be redundant and would make the tensor less sparse. Therefore, in this algorithm, only a small part of the tensor elements is computed. As shown in Figure 6, to compute a partial tensor, the ANN searching algorithm is applied once again to find the 3 most similar triangles in G_T for T^S_{1,2,3} (as shown in Figure 6, T^T_{1',2',3'}, T^T_{4',5',6'}, and T^T_{5',6',7'} are the searching results). That is, the tensor elements a_{1,4';2,6';3,5'}, a_{1,5';2,6';3,7'}, and a_{1,1';2,2';3,3'} are non-zero, and the remaining tensor elements are zero. In this way, the effects of outliers are alleviated, and the tensor is much sparser.
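The sparse bookkeeping for the partial tensor can be sketched as follows. Equation (1) itself appears earlier in the paper and is not reproduced in this section, so `triangle_similarity` below is a hypothetical placeholder (a Gaussian of interior-angle differences); only ANN-suggested triangle pairs receive an entry, and all other elements remain implicitly zero.

```python
import numpy as np

def triangle_similarity(tri_s, tri_t):
    """Hypothetical placeholder for the Equation (1) affinity: a Gaussian
    of the difference between sorted interior angles of two triangles."""
    def angles(tri):
        a = np.roll(tri, -1, axis=0) - tri   # vector to the next vertex
        b = np.roll(tri, 1, axis=0) - tri    # vector to the previous vertex
        cosv = np.sum(a * b, axis=1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
        return np.sort(np.arccos(np.clip(cosv, -1.0, 1.0)))
    d = np.linalg.norm(angles(tri_s) - angles(tri_t))
    return float(np.exp(-d ** 2 / (2 * (np.pi / 15) ** 2)))

# Only ANN-suggested triangle pairs get an entry; everything else stays zero.
tri_s = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
tri_t = tri_s * 2.0 + 5.0  # scaled and shifted: identical interior angles
partial_tensor = {((1, 1), (2, 2), (3, 3)): triangle_similarity(tri_s, tri_t)}
```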

Figure 6. ANN searching for triangles.

(4) Making the graph nodes distribute evenly. Though UR-FAST makes image features distribute evenly, there is still some possibility that triangles in the graphs occasionally have small areas, and the vertices of the triangles are approximately collinear. Thus, a threshold of 15 pixels is set to make the triangles have relatively large areas. If a triangle in either graph has an area of fewer than 15 pixels, then all the tensor elements related to this triangle are set to zero. Formally, if Area(T^S_{i,j,k}) < 15, then a_{i,·;j,·;k,·} = 0; if Area(T^T_{i',j',k'}) < 15, then a_{·,i';·,j';·,k'} = 0. The threshold (15 pixels) has little relationship with sensor types; it is not case specific but more of an empirical value. Since UR-FAST avoids clustered image features, the area constraint condition is seldom violated unless the triangle vertices are approximately collinear.
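The area constraint can be sketched with the standard cross-product formula; the 15-pixel threshold follows the text, while the sample coordinates are illustrative.

```python
def triangle_area(p1, p2, p3):
    """Area of a triangle from its vertex coordinates via the cross product."""
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))

# A near-collinear triangle fails the 15-pixel area threshold, so every
# tensor element referencing it would be set to zero.
assert triangle_area((0, 0), (10, 0), (0, 10)) == 50.0
assert triangle_area((0, 0), (10, 1), (20, 2)) < 15
```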

2.3. Power Iteration of the Affinity Tensor

From the perspective of graph theory, the feature correspondence problem is to find the maximally similar sub-graphs (i.e., to maximize the summation of the node similarities). The best matching should satisfy

    z* = arg max_z Σ_{i,i';j,j';k,k'} a_{i,i';j,j';k,k'} z_{i,i'} z_{j,j'} z_{k,k'},  z ∈ {0, 1}^{mn×1}
    s.t. Z* 1 ≤ 1 and (Z*)^T 1 ≤ 1                                                   (3)

where Z* ∈ {0, 1}^{m×n} is the assignment matrix; z_{i,i'} = 1 indicates that i, i' are the indices of matched points; z* is the vectorization of Z*; and 1 is a vector with all elements equal to 1.

The constraint term in Equation (3) shows that all the elements in Z* 1 and (Z*)^T 1 are no greater than 1, i.e., every node in G_S has at most one corresponding node in G_T, and every node in G_T has at most one corresponding node in G_S (that is, the node mapping between G_S and G_T is an injective mapping). Equation (3) is actually a sub-graph isomorphism problem with no known polynomial-time algorithm (it is also known as an NP-complete problem [37]).
However, the solution of Equation (3) can be well approximated by the leading vector of the affinity tensor, which can be obtained by tensor power iteration [34]; the interpretation is as follows.


Based on Equation (1), if a_{i,i';j,j';k,k'} is bigger, then the possibility that T^S_{i,j,k} and T^T_{i',j',k'} are matched triangles is higher, and vice versa. Thus, the tensor elements can be expressed with probability theory under the assumption that the correspondences of the node pairs are independent [37]:

    a_{i,i';j,j';k,k'} = P(i, i') P(j, j') P(k, k')                                  (4)

where P(i, i') is the matching possibility of feature pair i and i'.
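The factorization in Equation (4) can be checked numerically on a toy probability vector; `numpy.einsum` plays the role of the triple Kronecker/outer product here, and the vector values are illustrative.

```python
import numpy as np

# For an ideal (noise- and outlier-free) matching, the affinity tensor is
# the triple outer product of the match-probability vector p.
# Toy p over mn = 4 candidate pairs:
p = np.array([0.9, 0.1, 0.8, 0.2])
A = np.einsum('i,j,k->ijk', p, p, p)   # A[i, j, k] = p[i] * p[j] * p[k]

# Check one element against Equation (4): a_{i;j;k} = P(i) P(j) P(k).
assert np.isclose(A[0, 2, 3], 0.9 * 0.8 * 0.2)
```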

Equation (4) can be rewritten in a tensor product form [38]

    A = p ⊗_K p ⊗_K p                                                               (5)

where p is the vector whose elements express the matching possibilities of the feature pairs; ⊗_K is the Kronecker tensor product symbol; and A is a tensor constructed from two graphs that have no noise or outliers and whose node correspondences are all independent.

Though in practical matching tasks noise and outliers are inevitable, and the assumption that the node correspondences are independent is weak, the matching problem can be approximated by the following function

    p* = arg min_p || A − p ⊗_K p ⊗_K p ||_F                                         (6)

where ||·||_F is the operator of the Frobenius norm.

Though the exact solution of Equation (3) generally cannot be obtained in polynomial time, it can be well approximated by Equation (6), whose computational complexity is O((mn)^3). The tensor power iteration is illustrated in Algorithm 1.

Algorithm 1: Affinity tensor power iteration.
Input: Affinity tensor A
Output: Leading eigenvector p* of A
1 begin
2   A^(0) ← A, p^(0) ← 1
3   for n = 0 to NIteration do
4     p^(n+1) ← A ⊗_K p^(n) ⊗_K p^(n)  (i.e., ∀ i,i': p_{i,i'} ← Σ_{j,j';k,k'} a_{i,i';j,j';k,k'} p_{j,j'} p_{k,k'})
5     p^(n+1) ← p^(n+1) / || p^(n+1) ||_2
6   end
7   p* ← p^(n) / || p^(n) ||_2
8 end
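Algorithm 1 can be sketched for a sparsely stored tensor as follows. The dictionary of non-zero triples, the toy affinity values, and the symmetric accumulation over index placements are illustrative assumptions; the five iterations follow the implementation details reported in Section 3.2.

```python
import numpy as np

def tensor_power_iteration(entries, mn, n_iter=5):
    """Leading vector of a sparse, symmetric third-order affinity tensor.

    `entries` maps index triples (ii, jj, kk) -- flattened candidate-pair
    indices -- to affinity values a_{ii;jj;kk} (each triple stored once).
    """
    p = np.ones(mn) / np.sqrt(mn)
    for _ in range(n_iter):
        q = np.zeros(mn)
        for (ii, jj, kk), a in entries.items():
            # p_{ii} <- sum over (jj, kk) of a_{ii;jj;kk} p_{jj} p_{kk},
            # accumulated for every symmetric placement of the triple.
            q[ii] += a * p[jj] * p[kk]
            q[jj] += a * p[ii] * p[kk]
            q[kk] += a * p[ii] * p[jj]
        p = q / np.linalg.norm(q)
    return p

# Candidate pairs 0, 1, 3 are strongly supported by a triangle; pair 2 is not.
entries = {(0, 1, 2): 0.05, (0, 1, 3): 0.9}
p = tensor_power_iteration(entries, mn=4)
# The leading vector concentrates mass on the well-supported pairs.
```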

The output of Algorithm 1 is the leading vector of the tensor A (i.e., the solution to Equation (6)). The size of the affinity tensor A is mn × mn × mn (i.e., the tensor has mn columns, mn rows, and mn tubes), so p*, the leading vector of the tensor A, has a length of mn. Besides, the elements of the tensor A are real numbers, so the elements of p* are real numbers too. To obtain the assignment matrix, we should first transform p* to an m × n matrix (the matrix is called the soft assignment matrix because its elements are real numbers; in this manuscript, we denote it by Z), then discretize Z to Z* by use of the greedy algorithm [39] or the Hungarian algorithm [40]. Finally, the correspondences of image features are generated by the assignment matrix Z*.
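The greedy discretization [39] of the soft assignment matrix can be sketched as follows; the scores in Z are toy values, and the Hungarian algorithm [40] would serve equally well here.

```python
import numpy as np

# Leading vector p* reshaped to the m-by-n soft assignment matrix Z
# (toy values: m = 2 source features, n = 3 target features).
Z = np.array([[0.9, 0.05, 0.1],
              [0.1, 0.80, 0.2]])

def greedy_discretize(Z):
    """Greedy discretization: repeatedly take the largest remaining score
    and block its row and column, enforcing the one-to-one constraint
    of Equation (3)."""
    Z = Z.copy()
    matches = []
    for _ in range(min(Z.shape)):
        i, j = np.unravel_index(np.argmax(Z), Z.shape)
        if Z[i, j] <= 0:
            break
        matches.append((int(i), int(j)))
        Z[i, :] = -1.0   # row used: source feature i is matched
        Z[:, j] = -1.0   # column used: target feature j is matched
    return sorted(matches)

assert greedy_discretize(Z) == [(0, 0), (1, 1)]
```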


2.4. Matching Blunder Detection by Affinity Tensor

Though the aforementioned tensor matching algorithm gets rid of most of the mismatched points, there are still matching blunders caused by outliers, image distortions, and noise. Thus, this paper proposes an affinity tensor-based method for eliminating erroneous matches.

The proposed method is based on the observation that the structures among images are relatively static. Meanwhile, the affinity tensor includes structural information of the image features. Therefore, the two matched graphs can induce an attributed graph. In the induced graph, every node has an attribute that is computed by the summation of the similarities of the triangles which include that node. As shown in Figure 7, G_S and G_T are two matched graphs including six pairs of matched nodes, and the node pair 6 and 6' is an erroneous correspondence. Therefore, in the induced graph, the node induced by the erroneous matched node pair (i.e., node 6 in G_A, which is induced by node pair 6 and 6') has a small attribute, which measures the similarity support that comes from the other nodes. In a more formal expression, the attribute of induced node i can be expressed as follows

    s_I = (1/N) Σ_{j,j'} Σ_{k,k'} a_{i,i';j,j';k,k'},  i ≠ j ≠ k, i' ≠ j' ≠ k'      (7)

where s_I is the attribute of node i in G_A; i and i', j and j', and k and k' are the matched node pairs; and N is the number of the matched triangles.

Figure 7. The induced graph of two matched graphs.

Equation (7) is the basic formula for matching blunder detection. In a practical matching task, the number of blunders is larger than illustrated in Figure 7, and it is very hard to distinguish blunders from noise. Thus, we embed Equation (7) in an iterative algorithm: the big blunders are eliminated first and then the small ones, until the similarities are kept constant. The algorithm is illustrated in Algorithm 2.

In Algorithm 2, I, J, K are the abbreviations for the feature index pairs (i,i'), (j,j'), and (k,k'); p represents the p-th iteration. c is a constant; it can be calculated as the average similarity of the triangles constructed by two error-free matched feature sets (the calculating formula is shown in Equation (7)). In this paper, it is empirically set to 0.85.


Algorithm 2: Blunder detection by the affinity tensor.
Input: Affinity tensor A, matched feature set {I, J, K, L, …}
Output: Error-free matching set
1 begin
2   do
3     for i = 1…n, i' = 1…n' do
4       s_I ← Σ_{J,K} a_{I,J,K},  I ≠ J ≠ K
5     end
6     rank {s_i} in descending order
7     remove the node pair corresponding to s_n
8     s_n^(p+1) ← s_n^(p)
9     p ← p + 1
10  while ( abs(s_n^(p) − s_n^(p+1)) > 0.01 or s_n^(p+1) < c )
11 end

To verify the effectiveness of Algorithm 2, it was compared to the RANSAC algorithm integrated with a homographic transformation (the threshold of the re-projected errors was set to 3.0 pixels). The results are shown in Figure 8.
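A minimal sketch of Algorithm 2 follows, under the assumption that the affinity tensor is stored as a dictionary of non-zero triples. The constant c = 0.85 and the 0.01 convergence tolerance follow the text, while the function name and the stopping bookkeeping are a simplified reading of the pseudocode.

```python
def remove_blunders(entries, n_pairs, c=0.85, eps=0.01):
    """Iteratively drop the worst-supported pair (sketch of Algorithm 2).

    Recomputes the per-pair attribute (Equation (7)) after each removal and
    stops once the weakest attribute exceeds c or stops changing.
    """
    alive = set(range(n_pairs))
    prev_worst = None
    while True:
        attrs = {}
        for (I, J, K), a in entries.items():
            if {I, J, K} <= alive:   # only triangles among surviving pairs
                for idx in (I, J, K):
                    s, n = attrs.get(idx, (0.0, 0))
                    attrs[idx] = (s + a, n + 1)
        if not attrs:
            return sorted(alive)
        scores = {i: s / n for i, (s, n) in attrs.items()}
        worst = min(scores, key=scores.get)
        if scores[worst] >= c or (prev_worst is not None
                                  and abs(scores[worst] - prev_worst) <= eps):
            return sorted(alive)
        alive.discard(worst)
        prev_worst = scores[worst]

# Pair 3 is weakly supported and gets eliminated; pairs 0-2 survive.
entries = {(0, 1, 2): 0.95, (0, 1, 3): 0.10, (1, 2, 3): 0.12}
kept = remove_blunders(entries, 4)
```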


Figure 8. A comparison of RANSAC and tensor-based gross error detection: (a) manually selected tie points; and (b) results of the comparison.

As can be seen from Figure 8a, 30 pairs of evenly distributed tie points were selected from the sub-image pairs. To verify the robustness of the two algorithms, from 10 to 100 pairs of randomly generated tie points were added to the matched feature sets as matching blunders. Two parameters were used to evaluate the effectiveness of the algorithms. The first one is rf (rf = fp/t), where fp is the number of falsely detected blunders (that is, detected matching blunders that are actually correct matches, which the detection algorithm falsely marks as erroneous) and t is the total number of correctly matched points. In this experiment, t is 30. In the ideal case, rf is 0, as a lower rf indicates a better algorithm.
The other parameter to measure the usefulness of the algorithms is rc (rc = ct/g), where ct is the number of correctly detected blunders and g is the total number of matching blunders in the matched points; in this experiment, g varies from 10 to 100. According to the experimental results, the rc of both RANSAC and the tensor-based algorithm was 1.0, which means both algorithms detected all the gross errors. The results for rf are demonstrated in Figure 8b. Although both algorithms can detect all the blunders, the tensor-based algorithm performs better than RANSAC because the average rf of the former is only 0.03, which is far below that of the latter. The essence of RANSAC is random sampling: the global geometric transform model is fitted by the sampled points, and the rest of the points are treated as the verification set. The drawback of RANSAC is that the probability of correct sampling is very low when it encounters considerable mismatches, so RANSAC may become stuck in a local minimum. However, the graph-based detection method avoids these drawbacks, because the node attributes represent the other nodes' support. If a node is an outlier, the support from the other nodes will be very small, and thus the attribute indicates whether the node is an outlier or not.
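The two evaluation parameters can be computed as follows; the detector output here is synthetic, with t = 30 correct matches in line with the experiment.

```python
def blunder_detection_rates(detected, true_blunders, n_correct):
    """rf = false detections / correct matches; rc = caught / total blunders."""
    fp = len(detected - true_blunders)   # correct matches flagged as bad
    ct = len(detected & true_blunders)   # blunders actually caught
    return fp / n_correct, ct / len(true_blunders)

# 30 correct matches (ids 0-29) plus 10 injected blunders (ids 30-39);
# the detector flags all 10 blunders and one good match.
detected = set(range(30, 40)) | {5}
rf, rc = blunder_detection_rates(detected, set(range(30, 40)), n_correct=30)
# rf ≈ 0.033, rc = 1.0
```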

3. Experimental Results

The proposed ATBM was evaluated with five pairs of images which were captured by ZY-3 backward (ZY3-BWD), ZY-3 nadir (ZY3-NAD), GF-1, GF-2, unmanned aerial vehicle (UAV) platform, Jilin-1 (JL1), SPOT 5, and SPOT 6 sensors. To examine its effectiveness, five descriptor-based matching algorithms were chosen as the competitors, namely, SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT. These five descriptor matching algorithms are state-of-the-art in matching multi-sensor images. The matcher for these algorithms was ANN searching, and the NNDR threshold was set to 0.8. In the following, the evaluation criteria, experimental datasets, implementation details, and experimental results are presented.

3.1. Evaluation Criterion and Datasets

In order to evaluate the proposed method, two common criteria, recall and precision [10], were used. The two criteria are defined as follows: recall = CM/C and precision = CM/M, where CM (correct matches) is the number of correctly matched point pairs, C (correspondences) is the total number of existing correspondences in the initial feature point sets, and M (matches) is the number of matched points. However, for large images covering mountainous and urban areas, a global transformation such as a projective or second-order polynomial model cannot accurately express the geometric transformation among such images [18]. Thus, in the estimation of C and CM, 25 pairs of sub-images grabbed from the complete image pairs were used for evaluation. Each of the sub-image pairs was typically characterized by non-linear intensity differences and considerable textural changes (the details of the complete image pairs are listed in Table 1; the sub-image pairs are shown in Figure 9). C and CM were determined as follows: a skilled operator manually selected 10–30 evenly distributed tie points between the image pairs, and an accurate homographic transformation was computed using the selected tie points. Then the computed homographic matrix, with a back-projection error threshold of 3.0 pixels, was used to determine the number of correct matches and correspondences. Apart from recall and precision, positional accuracy was used to evaluate the matching effectiveness of ATBM and the other five matching algorithms. The positional accuracy was also used by Ye and San [18] and was computed as follows. First, a TIN was constructed using the matched points of the complete image pair, and 13–38 evenly distributed checkpoints (CPs, for short) were fed to the Delaunay triangulation. Then, an affine transformation model was fitted by the matched triangles.
Lastly, the positional accuracy was estimated via the root mean square error (RMSE), which was computed through the affine transformation of the CPs.
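The homography-based verification of CM can be sketched as follows; the 3.0-pixel threshold follows the text, while the homography and point coordinates are toy values.

```python
import numpy as np

def verify_matches(pts_s, pts_t, H, tol=3.0):
    """Count matches whose target point lies within `tol` pixels of the
    homography-projected source point (the paper's 3.0-pixel threshold)."""
    pts_h = np.hstack([pts_s, np.ones((len(pts_s), 1))]) @ H.T
    proj = pts_h[:, :2] / pts_h[:, 2:3]   # dehomogenize
    err = np.linalg.norm(proj - pts_t, axis=1)
    return int(np.sum(err < tol))

# Identity homography: the second match is off by 10 pixels, so CM = 1.
H = np.eye(3)
pts_s = np.array([[100.0, 50.0], [20.0, 30.0]])
pts_t = np.array([[101.0, 50.5], [30.0, 30.0]])
cm = verify_matches(pts_s, pts_t, H)
recall, precision = cm / 2, cm / 2   # with C = M = 2 in this toy case
```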



Figure 9. Sub-image pairs for estimating recall, precision, and number of correct matches: (a) sub-image pairs from GF-1 and ZY-3 sensors; (b) GF-2 and ZY-3 sensors; (c) ZY3-NAD and ZY3-BWD sensors; (d) UAV and JL1 sensors; and (e) SPOT 5 and SPOT 6 sensors.

Table 1. Experimental image pairs.

No. | Platform | Acquisition Date | Location         | Image Size (Pixels) | Pixel Size (m/Pixel) | CPs Number
1   | GF-1     | 2013             | Xinjiang, China  | 18,192 × 18,000     | 2.0                  | 30
    | ZY-3     | 2012             |                  | 24,530 × 24,575     | 2.1                  |
2   | GF-2     | 2015             | Beijing, China   | 9376 × 9136         | 1.0                  | 32
    | ZY-3     | 2012             |                  | 24,530 × 24,575     | 2.1                  |
3   | ZY3-BWD  | 2014             | Wuhan, China     | 24,525 × 24,419     | 2.1                  | 38
    | ZY3-NAD  | 2015             |                  | 16,292 × 16,348     | 3.5                  |
4   | UAV      | 2014             | Changchun, China | 4990 × 3908         | 0.5                  | 36
    | JL1      | 2014             |                  | 4765 × 4070         | 1.0                  |
5   | SPOT 5   | 2009             | Barcelona, Spain | 598 × 551           | 2.5                  | 13
    | SPOT 6   | 2012             |                  | 792 × 729           | 1.5                  |

3.2. Implementation Details and Experimental Results

In the process of feature extraction, UR-FAST was applied to the source image, and standard FAST was used in the target image. Our experiments showed that this feature extracting strategy could greatly improve the feature repetitiveness compared to commonly used feature detectors such as Harris features, BRISK, DoG, and ORB. In the construction of the affinity tensors, to guarantee a high repetitive rate of image features, 50 UR-FAST features were retained in the source image. For every UR-FAST feature, the ANN algorithm was applied to search for its 4 candidate correspondences;
for every It triangle was shown in the source by our image, experiments ANN algorithm that was this applied feature again extracting to search strategy could greatlyfor its improve 3 candidate the matched feature triangle repetitivenesss (by our experiments, compared the to matching commonly recall usedreached feature the peak detectors when such as the numbers of candidate matching features and triangles were 4 and 3, respectively). The Gaussian Harris features,kernel bandwidth BRISK, DoG, ε in and Equation ORB. (1) In was the set construction to π /15, and of any the computed affinity tensor tensors, elements to guarantee that were repetitive rate of imagehigher feature, than π it/5retained were set to 50 zero. UR-FAST The balanced features weighted in the factor source in Equation image. (1) For was every empirically UR-FAST set feature, ANN algorithmto 0.01 (the was more applied precise tobalanced search weighted for its factor 4 candidate could be calculated correspondences; by Equation for (2) if every necessary). triangle in the source image,Besides, ANN to trade algorithm off computational was applied cost and again accuracy, to search we made for 5 its trials 3 candidate in power iterations matched (the triangles power (by our iteration was shown in Algorithm 1). experiments, theThe matchingcomplete image recall pairs reached listed in the Table peak 1 were when the thetesting numbers datasets offor candidate estimating positional matching features and trianglesaccuracy. were There 4 and were 3, respectively).thousands of feature The points Gaussian in these kernel image pairs. bandwidth Thus, computingε in Equation the affinity (1) was set to π/15, and any computed tensor elements that were higher than π/5 were set to zero. The balanced weighted factor in Equation (1) was empirically set to 0.01 (the more precise balanced weighted factor could be calculated by Equation (2) if necessary). 
Besides, to trade off computational cost and accuracy, we made 5 trials in power iterations (the power iteration is shown in Algorithm 1).

The complete image pairs listed in Table 1 were the testing datasets for estimating positional accuracy. There were thousands of feature points in these image pairs; thus, computing the affinity tensors for such image pairs was very time consuming. Meanwhile, storing all the feature descriptors in computer memory was nearly impossible. Therefore, the SIFT algorithm (SIFT was optional; other algorithms such as SURF and ORB were also sufficient for the coarse matching) was used to obtain some tie points in down-sampled image pairs. Then, we roughly computed the homographic transformation between the image pairs. Next, the two images of an image pair were gridded into sub-image pairs, and every sub-image pair would be overlapped under the constraint of the homographic transformation (this coarse-to-fine strategy was also applied by the compared algorithms). In the end, all the sub-image pairs were matched by ATBM as illustrated in Figure 2.

ATBM, SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT were also evaluated with the testing data listed in Table 1.
All (this the coarse-to-fine test image strategy pairs was also were applied selected by the compared from differentalgorithms). In sensors the end, and, hence, had considerableall the illumination sub-image pairs were differences matched by particularly ABTM as illustrated in images in Figure with 2. higher resolution. In addition, ABTM, SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT were also evaluated with the testing these test imagesdata listed had in different Table 1. All scales the test andimage topographic pairs were selected relief, from different so obvious sensors geometric and, hence, had distortions and considerable textureconsiderable changes illumination were differences present. particularly in images with higher resolution. In addition, these test images had different scales and topographic relief, so obvious geometric distortions and Figure 10considerable showed thetexture matching changes were results present. of ABTM on the five test image pairs. It seemed clear that ABTM obtained favorableFigure 10 showed matching the matching results results though of ABTM thereon the five were test non-linearimage pairs. It seemed intensity clear that differences with considerable textureABTM obtained changes favorable and matching somewhat results obvious though there geometric were non-linear distortions. intensity differences with considerable texture changes and somewhat obvious geometric distortions.


Figure 10. Matching results of ATBM: (a–e) show the matching results of the GF-1 and ZY-3, GF-2 and ZY-3, ZY3-BWD and ZY3-NAD, UAV and JL1, and SPOT 5 and SPOT 6 sensor pairs; (f–j) show the matching details of the sub-images, which are marked by red rectangles in (a–e).

matching detailsAs shown of the in sub-images, Figure 10, ATBM which obtained was abundant marked and by redevenly rectangles distributed in matches (a–e). in all of the experimental images. The first pair of images was from Xinjiang, China and captured by GF-1 and ZY-3 sensors; therefore, the illumination differences were obvious as evidenced by the matching details. Besides, there were also changes in scale caused by the differences of ground sampling distance (GSD) of sensors and the vast topographic relief caused by the Chi-lien Mountains. These two factors resulted in locally non-linear intensity differences and geometric distortions. Nevertheless, ATBM obtained evenly distributed tie points in the sub-image pairs (as shown in Figure 10f). However, in the middle of the first image pair (Figure 10a), there was no tie point. This is because the two images were captured at different moments, and the image textures changed dramatically. Thus, there was hardly any feature repetition. Although feature correspondences by geometric constraints are found in the initial matching process, the tensor-based detection algorithm eliminated all the false positive matches in the gross error detection process of ATBM. The second image pair consisted of two images from the GF-2 and ZY-3 sensors. The two images were captured at different moments which spanned over three years; therefore, considerable textural changes were presented as a result of the rapid development of modern cities in China (as shown Figure 10g). The tie points of the second sub-image pair were clustered in unchanged image patches because ATBM detected feature correspondences using radiometric and geometric constraints. It could find feature matches in the unchanged textures which were surrounded by numerous changed textures. Therefore, the tie points of the complete image pair (as shown in Figure 10b) distributed well. The third pair of images was captured by the ZY3-BWD and ZY3-NAD sensors. 
The two images were taken in different seasons—namely early spring and midsummer. The different seasons led to much severer intensity changes compared to the first two image pairs. For example, it seemed clear that in Figure 10h, some of the pixel intensity was reversed. However, ATBM was immune to intensity changes and obtained stable matching results in this image pair as well. The fourth pair of images was shot by UAV and JL1 sensors (the camera equipped on UAV platform was Canon EOS 5D Mark III), and it could be seen that JL1 image was blurrier than UAV image. The transmission distance between the sensed target and satellite sensors was much longer than that of UAV sensors and their targets, the reflected light lost more in the transmission, thus the UAV images looked sharper than satellite images. Besides, as in the second image pair, the two images were captured at different seasons. JL1 image was covered by thick snow, while the UAV image was covered by growing trees (as shown Figure 10i). However, both these two challenges had few impacts on the matching results, and ATBM still obtained abundant tie points. The fifth pair of images was from an urban area and the image resolution was high (2.5 m for SPOT5 and 1.5 m for SPOT6), and the two images were characteristic with structured textures (as shown in Figure 10j). These textures benefitted FAST corner detector, and thus the matching result was satisfied too.

Figure 11 gives a quantitative description of ATBM and a comparison with the five feature descriptors. ATBM outperformed the other five matching algorithms in all the evaluation criteria (matching recall, matching precision, number of correct matches, and positional accuracy) on all the test images, which contained locally non-linear intensity differences, geometric distortions, and numerous textural changes.
The better performance was owed to the higher matching recall (above 0.5 for ATBM, as shown in Figure 11a), which resulted in more correct matches between image pairs. In addition, ATBM used the affinity tensor-based gross error detection algorithm, which could distinguish true matches among a number of wrong ones. Thus, the matching precision and the number of correct matches were higher too (the average number of correct matches for ATBM was approximately 20 for the sub-image pairs). Higher positional accuracy (an average of 2.5 pixels for ATBM) followed as a consequence, because positional accuracy was mainly determined by the number of correct matches and the matching precision.
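The four criteria can be pinned down concretely. The counting conventions below are our own illustrative assumptions, since this section reports the criteria without formulas:

```python
import math

def evaluate_matching(n_putative, n_correct, n_ground_truth):
    """Illustrative definitions: recall = correct matches over all true
    correspondences; precision = correct matches over returned matches."""
    recall = n_correct / n_ground_truth if n_ground_truth else 0.0
    precision = n_correct / n_putative if n_putative else 0.0
    return recall, precision

def positional_rmse(residuals_px):
    """Positional accuracy as the root mean square of the tie-point
    residuals (in pixels) after fitting the image-to-image transformation."""
    return math.sqrt(sum(r * r for r in residuals_px) / len(residuals_px))

# Example: 40 putative matches returned, 30 correct, 50 true correspondences.
print(evaluate_matching(40, 30, 50))  # → (0.6, 0.75)
print(round(positional_rmse([2.0, 3.0, 2.5]), 3))
```

Under these conventions, a higher recall directly yields more correct matches per pair, which in turn stabilizes the fitted transformation and lowers the positional RMSE.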


Figure 11. Quantitative matching results of test images: (a) matching recall; (b) number of correct matches; (c) matching precision; and (d) positional accuracy.

ATBM tended to match feature points using their similarities in both geometry and radiometry. Both geometric and radiometric information played the same role in matching texturally fine and well-structured images (i.e., images with small geometric and radiometric distortions). The higher matching recall benefitted from the affinity tensor, which encoded geometric and radiometric information: geometry described the intrinsic relations of the feature points, while radiometry represented their local appearances. When geometric or radiometric distortions were considerable, ATBM could make the two constraints compensate for each other. For example, in sub-image pair 1, the geometric distortions were relatively large, and thus radiometric information played the leading role in matching; in sub-image pair 3, the flat farmlands resulted in relatively small geometric distortions, and the geometric information compensated for the radiometric information in matching.

In general, descriptor-based matching algorithms determine matches via the nearest neighbor distance ratio (NNDR) of the radiometric descriptors. When confronted with non-linear intensity differences and changed textures, the drawback of the NNDR strategy in matching multi-sensor images was exposed: the descriptor distance of two matched features was not distinctly small. OR-SIFT and GOM-SIFT modified the descriptor to adapt to textures in which the radiometric intensity differences were reversed, but these image pairs were not such a case because the intensity differences were non-linear. NG-SIFT used normalized descriptors and abandoned the gradient magnitude; though this gave it a certain capability to resist non-linear intensity differences, it caused problems in descriptor distinctiveness. Therefore, the descriptor-based matching methods such as SIFT, SURF, OR-SIFT, GOM-SIFT, and NG-SIFT had relatively lower matching recall on the experimental image pairs. ATBM avoided the NNDR rule and tended to find the best matches in both geometry and radiometry, thus leading to higher matching recall. Benefitting from the higher matching recall, there were more correct matches for ATBM in the sub-image pairs (shown in
Figure 11c), and it can also be concluded that there would be more correct matches in the complete image pairs (evidence is shown in Figure 10a–e). With more correct matches, the positional accuracy was higher too (as shown in Figure 11d). The experimental results of the five descriptor-based matching algorithms were strongly dependent on the image contents: among them, GOM-SIFT obtained the most correct matches and the highest matching recall (Figure 11a,b), while SIFT had the best matching precision (Figure 11c). In most cases, OR-SIFT and
GOM-SIFT had similar matching results, because OR-SIFT built feature descriptors in the same manner as GOM-SIFT; the difference between them was that OR-SIFT had half the descriptor dimensions of GOM-SIFT, and thus OR-SIFT had an efficiency advantage in ANN searching. SIFT had the best matching precision on the experimental data because the SIFT descriptor had higher distinctiveness than the other four feature descriptors. NG-SIFT and SURF showed no obvious regularity on the testing data, though both of them were characterized by invariance to scale, rotation, and partial illumination change, which were also capabilities of SIFT, OR-SIFT, and GOM-SIFT. ATBM outperformed the other five matching algorithms again in positional accuracy (shown in Figure 11d). Positional accuracy was mainly determined by the matching precision, the number of correct matches, and their spatial distribution. In most cases, a larger number of correct matches (as can be seen from Figure 11b, ATBM had more correct matches) meant a better spatial distribution, and the matching precision of ATBM was higher than that of the other five algorithms too (shown in Figure 11c). Therefore, ATBM had higher positional accuracy than the other five descriptor-based matching algorithms.
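The NNDR rule that the descriptor-based baselines rely on is easy to state in code. The following is a generic sketch of Lowe's ratio test, not any of the compared implementations; the descriptors and the 0.8 ratio are illustrative:

```python
import numpy as np

def nndr_match(desc_a, desc_b, ratio=0.8):
    """Classic NNDR (Lowe's ratio test): accept a match only when the
    nearest descriptor in B is sufficiently closer than the second
    nearest. Under non-linear intensity change, true matches often fail
    this test because their descriptor distance is no longer distinctly small."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distances to all of B
        j1, j2 = np.argsort(dists)[:2]              # nearest and second nearest
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches

# Toy descriptors: two features in A, three candidates in B.
a = np.array([[0.0, 0.0], [10.0, 10.0]])
b = np.array([[0.0, 0.1], [10.0, 10.0], [5.0, 5.0]])
print(nndr_match(a, b))  # → [(0, 0), (1, 1)]
```

When non-linear intensity differences inflate the distance of the true nearest neighbor, the ratio test rejects it even though it is correct, which is exactly the recall loss observed for the descriptor-based methods above.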

4. Conclusions and Future Work

Image matching is a fundamental step in remote sensing image registration, aerial triangulation, and object detection. Although it has been well addressed by feature-based matching algorithms, it remains a challenge to match multi-sensor images because of non-linear intensity differences, geometric distortions, and textural changes. Conventional feature-based algorithms are prone to failure because they only use radiometric information. This study presented a novel matching algorithm that integrates geometric and radiometric information into an affinity tensor and utilizes this tensor to address matching blunders. The proposed method involves three steps: graph building, tensor-based matching, and tensor-based detection. In graph building, the UR-FAST and ANN searching algorithms were applied, and the extracted image features were regarded as graph nodes. Then the affinity tensor was built, with its elements representing the similarities of nodes and triangles. The initial matching results were obtained by tensor power iteration. Finally, the affinity tensor was used again to eliminate matching errors. The proposed method was evaluated using five pairs of multi-sensor remote sensing images covering six different sensors: ZY3-BWD, ZY3-NAD, GF-1, GF-2, JL-1, and a UAV platform. Compared with traditionally used feature matching descriptors such as SIFT, SURF, NG-SIFT, OR-SIFT, and GOM-SIFT, the proposed ATBM achieved reliable matching results on the experimental data in terms of matching recall, matching precision, number of correct matches, and positional accuracy. Because the tensor-based model is a generalized matching model, the proposed method can also be applied to point cloud registration and multi-source data fusion. However, a few problems should be addressed in future research. The bottleneck of computing efficiency is the ANN algorithm when it is used to search for the n most similar triangles.
For example, if a feature set consists of 50 image features, these features constitute on the order of 50³ triangles; the KD-tree will then have about 50³ nodes, which makes its construction very time consuming. In addition, the efficiency of searching for the n most similar triangles in a KD-tree with 50³ nodes is inherently low. The computational time of ATBM is higher than that of SURF and SIFT because the considerable size of the affinity tensor increases the number of computing operations in the power iterations. Further research can introduce new strategies to make the tensor sparser and reduce the computational complexity of the power iterations. Besides, the power iterations could also be implemented in a GPU-based parallel computing framework, which could immensely speed them up.
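To see where this cost arises, the power iteration at the heart of tensor-based matching can be sketched with a dense toy tensor. This is only an illustration under our own assumptions: a real affinity tensor must be kept sparse, and the dense `einsum` contraction below is exactly the operation that sparsification or GPU parallelism would accelerate.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=5):
    """Power iteration on a 3rd-order affinity tensor T (shape n x n x n,
    n = number of candidate correspondences). The dominant vector scores
    each correspondence; each step contracts T with the current vector
    twice, which is the expensive O(n^3) operation."""
    n = T.shape[0]
    v = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(n_iter):
        v = np.einsum('ijk,j,k->i', T, v, v)  # v_i <- sum_jk T_ijk v_j v_k
        v /= np.linalg.norm(v)                # re-normalize each iteration
    return v

# Toy example: correspondences 0, 1, 2 form one high-affinity triangle
# against a weak random background, so they should score highest.
rng = np.random.default_rng(0)
T = rng.random((4, 4, 4)) * 0.1
T[0, 1, 2] = T[1, 2, 0] = T[2, 0, 1] = 5.0
scores = tensor_power_iteration(T)
print(np.argsort(scores)[::-1][:3])  # the triangle's correspondences rank highest
```

With n candidate correspondences the dense contraction costs O(n³) per iteration, which is why a sparser tensor, or a GPU implementation of this loop, is the natural direction for future work.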

Author Contributions: W.Y. advised on the data analysis, provided remote sensing experience, and gave the whole structure of the idea. S.C. developed the algorithm, conducted the primary data analysis, and crafted the manuscript. X.Y. significantly edited the manuscript. J.N., F.X. and Y.Z. revised the manuscript.

Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 41371432, Grant No. 41671405, and Grant No. 41771438).

Acknowledgments: We would like to thank Yifei Kang, Ying Guo, and Cailong Deng for their help in image pre-processing.

Conflicts of Interest: The authors declare no conflict of interest.


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).