Matching RGB and Infrared Remote Sensing Images with Densely-Connected Convolutional Neural Networks
Ruojin Zhu 1, Dawen Yu 1, Shunping Ji 1,* and Meng Lu 2

1 School of Remote Sensing and Information Engineering, Wuhan University, 129 Luoyu Road, Wuhan 430079, China; [email protected] (R.Z.); [email protected] (D.Y.)
2 Department of Physical Geography, Faculty of Geoscience, Utrecht University, Princetonlaan 8, 3584 CB Utrecht, The Netherlands; [email protected]
* Correspondence: [email protected]

Received: 17 October 2019; Accepted: 26 November 2019; Published: 29 November 2019

Abstract: We develop a deep learning-based method for matching an RGB (red, green, and blue) image and an infrared image captured by satellite sensors. The method includes a convolutional neural network (CNN) that compares an RGB and infrared image pair, and a template searching strategy that, for a given point in the reference image, searches for the correspondent point within a search window in the target image. A densely-connected CNN is developed to extract common features from the different spectral bands. The network consists of a series of densely-connected convolutions, to make full use of low-level features, and an augmented cross-entropy loss, to avoid model overfitting. The network takes band-wise concatenated RGB and infrared images as input and outputs a similarity score for the pair. For a given reference point, similarity scores within the search window are calculated pixel by pixel, and the pixel with the highest score becomes the matching candidate. Experiments on a satellite RGB and infrared image dataset demonstrated that our method obtained more than a 75% improvement in matching rate (the ratio of successfully matched points to all reference points) over conventional methods such as SURF, RIFT, and PSO-SIFT, and more than a 10% improvement over other recent CNN-based structures.
Our experiments also demonstrated the high performance and generalization ability of our method when applied to multitemporal remote sensing images and close-range images.

Keywords: image matching; convolutional neural network; remote sensing image; template matching

1. Introduction

Image matching, an important topic in computer vision and image processing, has been widely applied in image registration, fusion, geo-localization, and 3D reconstruction. With the development of sensors, multimodal remote sensing data captured over the same scene commonly need to be matched so that their complementary information can be used. Specifically, visible and infrared sensors are two sensor types commonly used together in many kinds of tasks, such as face detection, land cover classification, object tracking, and driverless cars. Panchromatic or RGB imaging is the closest to human vision but is seriously affected by lighting and atmospheric conditions; near-infrared imaging has relatively lower contrast but is more robust to weather conditions. To make full use of both, image matching and registration is the first step. However, there are great radiometric and geometric differences between visible and infrared images, as they collect spectral reflectance at different wavelengths with different imaging mechanisms. These visual differences can prevent the successful application of conventional matching methods that rely heavily on intensity and gradient.

Remote Sens. 2019, 11, 2836; doi:10.3390/rs11232836 www.mdpi.com/journal/remotesensing
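The template searching strategy summarized in the abstract can be sketched in a few lines: score every candidate position in a search window of the target image against a template around the reference point, and keep the highest-scoring position. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: `match_point`, `score_fn`, and the patch and window sizes are illustrative, and `score_fn` stands in for any patch-similarity measure (for the proposed method, a CNN similarity score).

```python
import numpy as np

def match_point(score_fn, reference, target, ref_xy, patch=32, search=64):
    """Exhaustive template search: score every candidate position in the
    search window of the target image and keep the highest-scoring one.
    Assumes the window lies fully inside the images (bounds checks omitted)."""
    rx, ry = ref_xy
    h = patch // 2
    ref_patch = reference[ry - h:ry + h, rx - h:rx + h]  # template around the reference point
    s = search // 2
    best_score, best_xy = -np.inf, None
    for cy in range(ry - s, ry + s + 1):          # slide pixel by pixel
        for cx in range(rx - s, rx + s + 1):
            cand = target[cy - h:cy + h, cx - h:cx + h]
            score = score_fn(ref_patch, cand)     # e.g., a learned CNN similarity
            if score > best_score:
                best_score, best_xy = score, (cx, cy)
    return best_xy, best_score
```

With identical reference and target images, the search recovers the reference point itself, since the similarity peaks at the true location.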
Classic optical image matching is commonly classified into area-based and feature-based methods [1]. Area-based methods (also called patch matching) seek matches between a patch template and an equal-size sliding window within the search window of the target image, whereas feature-based methods seek matches between local features extracted from the reference and search images, respectively, by comparing their similarity. Patch-based methods have been comprehensively studied and applied in photogrammetry [2–4]. Feature-based methods are currently more widely used for their robustness against geometric transformations (e.g., rotation, scale). For example, SIFT is a widely used local feature that is invariant to 2D similarity transformations and stable under noise [5]. Many local features have been developed from SIFT, such as Affine-SIFT [6], the UC (Uniform Competency-based) features [7], and PSO-SIFT [8]. Some studies build feature descriptors from structure attributes instead of intensities; examples include HOPC [9], RIFT [10], MSPC [11], and MPFT [12]. Some works attempt to increase the percentage of correct correspondences through the Markov random field [13] and the Gaussian field [14]. The combination of feature-based and area-based methods has also been studied [15,16]. Some works used line features for image matching [17–19], and other studies combined line and point features [20–22]. However, line-based methods are often restricted to man-made scenes, as line structures are uncommon in nature. Although these handcrafted features have alleviated the influence of radiometric and geometric deformations to some extent, they may be weak in matching an optical image with an infrared image, as well as other multimodal images, because the large spectral and spatial differences leave fewer correspondences to be extracted from both images.
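Area-based matching as described above compares a template with each sliding-window position using an intensity-based similarity measure. Zero-normalized cross-correlation (ZNCC) is a common choice for optical images because it factors out intensity offset and gain; it is shown here as an illustrative example of such a measure, not as a method from any of the cited works.

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation between two equal-size patches.
    Values near 1 indicate a strong match; a constant intensity offset or
    a positive linear gain between the patches does not change the score."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Its invariance to linear radiometric changes is exactly what fails between visible and infrared bands, where the intensity relationship is not linear; this is the gap learned similarity measures aim to close.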
Compared with handcrafted features, using machine learning to automatically extract suitable features for measuring the similarity between images has drawn a lot of attention. Recently, deep convolutional neural networks (CNNs) have been applied to measure the similarity between optical images [23,24]. The classic CNNs for image matching use a Siamese network with a two-tower structure: each tower consists of a series of convolutional layers and processes one patch independently, and a similarity score computed between the top features of the two towers measures the similarity.

There are mainly three variations of the Siamese structure. The first one (Figure 1a) shares weights between the corresponding layers of the two towers [25], which is adapted to homogenous data input. The SCNN [26] used a six-layer Siamese network to obtain similarities between multitemporal optical images. The MatchNet [27] applied bottleneck layers at the top of each tower to reduce the dimension of the features.
The MSPSN [28] introduced multiscale spatial information into the Siamese network to
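The shared-weight Siamese variation described above can be made concrete with a toy sketch: both patches are pushed through the same tower with the same kernels (which is what weight sharing means), and the similarity is the cosine of the resulting feature vectors. This is an illustrative NumPy sketch, not any of the cited architectures; `conv_tower` and `siamese_similarity` are hypothetical names, and a real network would learn the kernels rather than fix them.

```python
import numpy as np

def conv_tower(x, kernels):
    """A toy feature tower: a stack of 'valid' convolutions with ReLU.
    Both inputs of the Siamese pair pass through this same function with
    the same kernels, i.e., the two towers share their weights."""
    for k in kernels:
        kh, kw = k.shape
        h, w = x.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
        x = np.maximum(out, 0.0)  # ReLU
    return x.ravel()

def siamese_similarity(patch_a, patch_b, kernels):
    """Cosine similarity between the top features of the two towers."""
    fa = conv_tower(patch_a, kernels)
    fb = conv_tower(patch_b, kernels)
    denom = np.linalg.norm(fa) * np.linalg.norm(fb)
    return float(fa @ fb / denom) if denom > 0 else 0.0
```

Because the weights are shared, identical patches always map to identical features and hence to the maximum similarity, which is the behavior that suits homogenous inputs; heterogeneous pairs such as RGB and infrared motivate the other variations.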