entropy

Article

An Indoor Localization System Using Residual Learning with Channel State Information

Chendong Xu 1, Weigang Wang 1,2,*, Yunwei Zhang 1, Jie Qin 1, Shujuan Yu 1 and Yun Zhang 1

1 College of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China; [email protected] (C.X.); [email protected] (Y.Z.); [email protected] (J.Q.); [email protected] (S.Y.); [email protected] (Y.Z.)
2 National and Local Joint Engineering Laboratory of RF Integration and Micro-Assembly Technology, Nanjing 210023, China
* Correspondence: [email protected]

Abstract: With the increasing demand for location-based services, neural network (NN)-based intelligent indoor localization has attracted great interest due to its high localization accuracy. However, deep NNs are usually affected by degradation and gradient vanishing. To fill this gap, we propose a novel indoor localization system, comprising a denoising NN and a residual network (ResNet), to predict the location of a moving object from the channel state information (CSI). In the ResNet, to prevent overfitting, we replace all the residual blocks with stochastic residual blocks. Specially, we explore the long-range stochastic shortcut connection (LRSSC) to solve the degradation problem and gradient vanishing. To obtain a large receptive field without losing information, we leverage dilated convolution at the rear of the ResNet. Experimental results confirm that our system outperforms state-of-the-art methods in a representative indoor environment.

Keywords: indoor localization; channel state information (CSI); denoising neural network (NN); residual network (ResNet)

Citation: Xu, C.; Wang, W.; Zhang, Y.; Qin, J.; Yu, S.; Zhang, Y. An Indoor Localization System Using Residual Learning with Channel State Information. Entropy 2021, 23, 574. https://doi.org/10.3390/e23050574

Academic Editor: Adam Lipowski

Received: 25 March 2021; Accepted: 28 April 2021; Published: 7 May 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Due to the large demand for indoor localization, it has attracted plenty of attention as an emerging technology. In the past, indoor localization schemes based on WiFi, Bluetooth, RFID, etc. have been proposed. Among them, indoor localization based on WiFi promises to become a large-scale implemented technology, because the widespread deployment of WiFi access points (APs) enables users to obtain their locations anytime and anywhere in public places. WiFi-based indoor localization schemes mainly fall into four categories: angle of arrival-based [1], time of arrival-based [2], signal propagation model-based [3], and fingerprint-based [4]. Since fingerprint-based localization has superior performance, it has become a hotspot of research.

Because received signal strength (RSS) is relatively easy to measure and use [5], it has been utilized as the fingerprint in many existing methods. The first RSS-based fingerprint system, named Radar, utilized a deterministic method for location estimation [6]. Horus utilized a probabilistic method for indoor localization with RSS values [7], which achieves better localization accuracy than Radar. However, due to multipath effects, RSS fluctuates greatly over time at the same location. In addition, RSS does not exploit the rich channel information from different subcarriers. Thus, an RSS-based fingerprint system can hardly satisfy the requirements for high localization accuracy.

Recently, an alternative fingerprint, termed CSI in the IEEE 802.11 standard [8], has been applied to indoor localization. We can obtain CSI from some advanced WiFi network interface cards (NICs) and extract fine-grained information from its amplitude and phase. Compared to RSS, CSI has better time stability and location discrimination. With the great achievement of deep learning, many indoor fingerprint systems based on neural networks have been proposed for localization. DeepFi [9,10] learned 90 CSI amplitude data from three antennas for indoor localization and trained the deep network with a greedy learning algorithm. However, there were too many network parameters to be trained and stored, which limits its application. Different from DeepFi, ConFi [11] converted the CSI data into CSI images and formulated indoor localization as a classification problem. The CSI images were fed into a five-layer convolutional neural network (CNN) to obtain features. The convolution operation in ConFi effectively improved the localization accuracy in the indoor scenario. However, degradation and gradient vanishing arise as the depth of a CNN increases.

Compared to CNN, ResNet has superior performance in image classification [12], object detection [13], instance segmentation [14], etc. As we know, ResNet1D [15] utilized ResNet for indoor localization and outperformed ConFi in localization accuracy. Unfortunately, ResNet1D has a poor ability of feature expression and network convergence. Thus, in this paper, we propose a ResNet-based indoor WiFi localization scheme that uses the CSI amplitude as the feature. In our scheme, the raw CSI amplitude information is first extracted from three wireless links. Then, we convert the amplitude information into CSI amplitude images and use them to train a 50-layer ResNet, which has a good ability of feature propagation and can solve the degradation problem well.

Although Zhou et al. [16] proved that training can converge to a global optimum, we still find it necessary to perform denoising first. According to [17,18], the raw CSI amplitudes are sensitive to noise, so localization is severely disturbed by ubiquitous random noise. The goal of image denoising is to obtain a clean image from a noisy image. Most existing methods based on deep learning utilize many pairs of noisy/clean images as training samples.
However, it is difficult to obtain clean CSI images, because noise, especially Gaussian noise, always exists in wireless links. Recently, some research has been conducted to train denoising NNs with only noisy images. The Noise2Noise (N2N) method [19] used many pairs of noisy images of the same scene to train a denoising NN model. However, it is still difficult to collect extensive image pairs. Self2Self (S2S) [20] was proposed to remove the noise by using Bernoulli sampled instances of an input noisy image. Although S2S greatly reduces the difficulty of collecting image pairs, it does not make full use of low-level features, such as edges, colour, and pixels. Based on the above, it is urgent to explore a new method that covers these shortcomings.

The contributions of this paper can be summarized as follows:
(1) We design a novel residual network and solve the degradation problem effectively. All the ordinary residual blocks are replaced by the proposed stochastic residual blocks, which can prevent overfitting.
(2) We add long-range stochastic shortcut connections (LRSSCs) to alleviate gradient vanishing and strengthen feature propagation.
(3) Since some information may be lost in convolution and pooling layers, we use dilated convolution on the small-size layers to gain a larger receptive field at a low memory cost.
(4) We elaborate a denoising NN to make it suitable for learning clean images. By leveraging the concatenation operation, we can further improve the denoising performance. Meanwhile, since the deep layers reuse the features learned from the shallow layers, we can reduce the parameters of the deep layers.

2. Related Works

2.1. Channel State Information

The main idea of orthogonal frequency division multiplexing (OFDM) is to divide the channel into several orthogonal subchannels, which can reduce the mutual interference between the subchannels. By using the Intel 5300 NIC [21] or the Atheros AR9390 chipset [22], we can obtain CSI from the subchannels, which reveals the channel characteristics. For an OFDM system, the WiFi channel at the 2.4 GHz band can be regarded as a narrowband flat fading channel. The channel model is defined as

R = HT + G, (1)

where R and T represent the received and transmitted signal, respectively, G is the additive white Gaussian noise, and H represents the channel frequency response (CFR). Ignoring G, H can be calculated by

H = R / T.  (2)

The CFR of the ith subcarrier can be represented as

H_i = |H_i| e^{j∠H_i},  (3)

where |H_i| and ∠H_i are the amplitude and phase response of the ith subcarrier, respectively. Generally, since random noise and the unsynchronized clocks between transmitter and receiver introduce large errors into the phase measurement, we only use the amplitude as the fingerprint in this paper.
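As a quick illustration of Equations (2) and (3), the sketch below recovers the CFR from a received signal and splits it into amplitude and phase with NumPy. The 30 × 3 shape mirrors the 30 subcarriers and three antennas described later; the channel values are randomly generated stand-ins, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1.0                                   # known transmitted pilot (normalized, illustrative)
H_true = rng.normal(size=(30, 3)) + 1j * rng.normal(size=(30, 3))  # synthetic CFR
R = H_true * T                            # received signal, noise G ignored as in Eq. (2)

H = R / T                                 # Eq. (2): H = R / T
amplitude = np.abs(H)                     # |H_i|, used as the fingerprint
phase = np.angle(H)                       # angle(H_i), discarded due to clock offsets

# Eq. (3): H_i = |H_i| * exp(j * angle(H_i)) reconstructs the CFR exactly
H_rebuilt = amplitude * np.exp(1j * phase)
assert np.allclose(H_rebuilt, H)
```

Since the phase term is discarded in this paper, only the `amplitude` array would be carried forward into the fingerprint images.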

2.2. Image Denoising

We can collect only one noisy image at a given time and place, but traditional non-learning-based methods cannot handle the denoising problem with it. Recently, some learning-based methods have been proposed to solve this problem. Deep image prior (DIP) [23] showed that the structure of a convolutional image generator can capture a large number of image statistics without any learning. Although the algorithm and network model are simple, the optimal number of iterations is hard to determine and the performance is unsatisfactory. S2S was proposed for image denoising using Bernoulli sampled instances, which include the major information of the noisy image. By using Bernoulli dropout to reduce the variance of the prediction, the output of S2S gradually approximates the clean image. Furthermore, in order to overcome the shortcoming of S2S in insufficiently using low-level features, we combine the low-level feature maps with multiple deep layers. By reusing the low-level features, we can obtain abundant background information.

2.3. ResNet

ResNet was first introduced in [12] to address the degradation problem. The bottleneck architecture, using a stack of three convolutional layers and one shortcut connection, was designed to fit a residual mapping. The first 1 × 1 layer is adopted to reduce dimensions, so that the 3 × 3 layer has smaller input/output dimensions. Extensive experiments show that this architecture can reduce the time complexity and model size. To handle one-dimensional CSI fingerprints, ResNet was converted into ResNet1D. In order to retain the features of the raw CSI and improve the model performance, the network uses pooling layers only in the input and output layers. The degradation problem can also be largely addressed by batch normalization (BN) [24], which ensures that forward propagated signals have non-zero variances. The success of ResNet is attributed to the hypothesis that a residual mapping is easier to fit than the original mapping. Furthermore, we suppose that a nested residual mapping is easier to fit than the original residual mapping. Hence, we add several shortcut connections to alleviate the degradation problem and strengthen information propagation.
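The parameter saving of the bottleneck design can be checked with simple arithmetic. The sketch below compares a plain stack of two 3 × 3 convolutions with the 1 × 1 → 3 × 3 → 1 × 1 bottleneck for a 256-channel case; the channel counts follow the design in [12], while the helper function is our own illustration.

```python
# Count weights of a plain stack of two 3x3 convolutions versus the
# 1x1 -> 3x3 -> 1x1 bottleneck, for 256 input/output channels.
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution; biases omitted for simplicity."""
    return c_in * c_out * k * k

plain = conv_params(256, 256, 3) + conv_params(256, 256, 3)
bottleneck = (conv_params(256, 64, 1)    # 1x1 reduces dimensions
              + conv_params(64, 64, 3)   # 3x3 operates on the smaller space
              + conv_params(64, 256, 1)) # 1x1 restores dimensions

print(plain, bottleneck)  # the bottleneck uses far fewer parameters
```

The bottleneck needs roughly 70 K weights against about 1.18 M for the plain stack, which is why the 50-layer ResNet adopted later remains affordable.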

3. Localization System

The two main networks of our system are illustrated in Figure 1. The “Denoiser” network works as a denoising NN which outputs a clean image, and the “ResFi” network works as a classification NN which outputs the corresponding location of a CSI amplitude image.



Figure 1. Pipeline of our system.

The input is a noisy CSI image, and we can get a clean image by removing the noise from it. After denoising, we can classify clean images by the ResFi. The designs of the Denoiser and the ResFi will be elaborated in the rest of this section.

3.1. CSI Image Construction

An Intel WiFi Link (IWL) 5300 NIC, which can read the CSI values of 30 of the 56 subcarriers, is used as the receiving equipment, and a TP-Link wireless router is used as the transmitting equipment. Since only one antenna of the wireless router is utilized, there are three wireless links between transmitter and receiver. We obtain 90 CSI values over the three wireless links per collected packet. For one wireless link, we take N packets at the same location and convert the CSI data into one channel of an RGB image. Thus, we can construct the RGB image by utilizing the CSI data of the three wireless links. We set N to 30,000 and converted the packets into 1000 images. As shown in Figure 2, the curves of the three colors represent CSI data from the three wireless links, and the curve of each color is composed of 30 packets. The horizontal axis denotes the 30 subcarriers of a wireless link, and the vertical axis denotes the amplitude of the CSI value. Figure 3 illustrates the CSI images at four different locations. The different data distributions of the CSI images indicate that CSI images can be used as fingerprints for localization.

Figure 2. CSI amplitude of three different antennas in the same location.
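The image construction of Section 3.1 can be sketched as follows. The only facts taken from the text are the 30 × 30 × 3 image shape, the three links, and the 30,000-packet collection; the amplitude values themselves are synthetic stand-ins for real Intel 5300 readings.

```python
import numpy as np

rng = np.random.default_rng(1)
packets_per_image, subcarriers, links = 30, 30, 3

# raw CSI amplitude for one image: (links, packets, subcarriers)
csi = rng.uniform(0, 40, size=(links, packets_per_image, subcarriers))

# one wireless link -> one channel; stack so channels come last (RGB-like)
image = np.stack([csi[l] for l in range(links)], axis=-1)
assert image.shape == (30, 30, 3)

# 30,000 packets per location therefore yield 1000 such images
print(30000 // packets_per_image)  # -> 1000
```

In practice the amplitudes would come from the CSI tool on the IWL 5300 rather than a random generator, but the stacking logic is the same.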



Figure 3. CSI images in different locations.

3.2. Modification of S2S

The architecture of the modified S2S is shown in Figure 4. Given a noisy CSI amplitude image with the size of 30 × 30 × 3, we first utilize Bernoulli sampling to obtain a set of M image pairs {n̂_m, n_m}_{m=1}^M, and then n̂_m is processed by the following three encoder blocks (EBs). Each of the first two EBs is composed of a partial convolutional (PConv) layer [25] and a max pooling layer. The last EB is composed of only a PConv layer. We use the rectified linear unit (ReLU) [26] as the activation function. The number of channels of the EBs increases from 32 to 64, and then to 128. The output of the last EB is a feature map with size of 8 × 8 × 128.

Figure 4. The architecture of the modified S2S.

After the EBs, there are three decoder blocks (DBs). The first DB is composed of a convolutional (Conv) layer, an upsampling layer, a Conv layer and a concatenation (Concate) operation. The second DB is composed of an upsampling layer, a Conv layer and a Concate operation. The last DB is composed of three Conv layers that map the features to an image of size 30 × 30 × 3. The numbers of output channels of these Conv layers are 48, 24 and 3, respectively. For low-level tasks, such as denoising, it is necessary to make full use of low-level features. Inspired by DenseNet [27], the Concate operation combines a low-level feature map with two deep layers. We reuse the low-level features in the deep layers twice and improve the information flow between layers by adding connections. Moreover, because feature reuse reduces the learning of redundant feature maps in the deep layers, this network requires fewer parameters than S2S.

Similar to S2S, we first sample a set of image pairs {n̂_m, n_m}_{m=1}^M from n, and they are defined as



n̂_m := b_m ⊙ n;   n_m := (1 − b_m) ⊙ n,  (4)

then the training objective L_D(θ) can be formulated by the mean squared error

L_D(θ) = Σ_{m=1}^{M} ‖ b_m ⊙ (F_θ(n̂_m) − n_m) ‖²₂,  (5)

where ⊙ denotes elementwise multiplication. The loss of each image pair is calculated only on those pixels that are not eliminated by b_m. Since we use Bernoulli sampling to randomly select pixels, the sum of the losses of all pairs measures the difference over all image pixels, and the expectation of L_D(θ) over the noise is the same as

E[L_D(θ)] = Σ_{m=1}^{M} ‖ F_θ(n̂_m) − x ‖²_{b_m} + Σ_{m=1}^{M} ‖ δ ‖²_{b_m},  (6)

where ‖ · ‖²_{b_m} = ‖ b_m ⊙ · ‖²₂ and δ denotes the standard deviation of the noise. When enough image pairs are used for training, the Denoiser will learn a clean image from the noisy image n. The denoised results corresponding to Figure 3 are displayed in Figure 5. We can observe that only the main line features have been preserved and the random noise has been removed well.
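A minimal NumPy sketch of the Bernoulli pair sampling in Eq. (4) and the masked loss in Eq. (5). The network F_θ is mocked by an identity function and the dropout probability p = 0.3 is an arbitrary choice, so this only illustrates the data flow, not the trained Denoiser.

```python
import numpy as np

rng = np.random.default_rng(2)
n = rng.normal(size=(30, 30, 3))               # one noisy CSI image
p = 0.3                                        # Bernoulli dropout probability (illustrative)

def sample_pair(n, p, rng):
    """Eq. (4): split n into a masked input n_hat and its complement n_m."""
    b = rng.binomial(1, 1 - p, size=n.shape)   # keep-mask b_m
    return b * n, (1 - b) * n, b

def masked_mse(pred, target, b):
    """Eq. (5) for a single pair: squared error on the masked pixels only."""
    return np.sum((b * (pred - target)) ** 2)

n_hat, n_m, b = sample_pair(n, p, rng)
F_theta = lambda x: x                          # placeholder for the Denoiser network
loss = masked_mse(F_theta(n_hat), n_m, b)
print(loss >= 0.0)
```

Averaging such losses over many sampled pairs is what lets the expectation in Eq. (6) approach the clean image x plus a constant noise term.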

Figure 5. The denoised results.

3.3. Structure of the ResFi

CNNs have an outstanding performance in image classification [28]. However, as the depth of the network increases, the training results get worse. ResNet can solve this problem by learning identity mappings. In order to balance model performance and parameters, we finally adopt a 50-layer ResNet as the basic model.

The proposed ResFi is inspired by FCN, CNN, and ResNet, which are theoretically proved and experimentally validated as effective techniques in image classification. We elaborate the structure of ResFi in this subsection.

3.3.1. Stochastic Residual Block

According to [12], the identity block can be mathematically defined as

ỹ = F(x̃, w_b) + x̃,  (7)

where x̃ and ỹ are the vectors of the input and output layer, respectively, w_b are the weights of the convolutional kernels, and F(x̃, w_b) represents the residual mapping to be learned. The operation F + x̃ is performed by a shortcut connection and element-wise addition. Once the dimensions of x̃ and F are unequal, a convolutional layer w_s is added to the shortcut connection:

ỹ = F(x̃, w_b) + w_s x̃,  (8)


Inspired by “Dropout” [29], we add randomness to the shortcut connections. The identity and convolutional blocks can be rewritten as

ỹ = F(x̃, w_b) + B ⊙ x̃,  (9)

ỹ = F(x̃, w_b) + B ⊙ w_s x̃,  (10)

where B is a matrix with the same dimensions as x̃ and w_s x̃, and each element of B obeys a Bernoulli distribution. We replace each residual block by a stochastic residual block. Since the residual connections are randomly preserved, the stochastic residual block has the same function as Dropout, such as improving the model generalization ability and preventing overfitting.

3.3.2. Long-Range Stochastic Shortcut Connection

Veit et al. [30] proposed a novel analysis that residual networks can be interpreted as ensembles of many paths of differing length, instead of a single ultra-deep network. Inspired by the aforementioned identity and convolutional blocks, we propose the long-range stochastic shortcut connection (LRSSC) to enhance this ensemble behavior, which can further mitigate the impact of network degradation and gradient vanishing. As shown in Figure 6a, the LRSSC combines low-level feature maps with deep layers. When the shallow layers have learned a desired residual mapping, the deep layers of ResFi can retain the feature mapping of the shallow layers well. The LRSSC can also help propagate the gradients from deep layers to shallow layers. We build the LRSSC as referred to by (4.5). Since the dimensions of shallow and deep layers are unequal, we add a convolutional layer to the LRSSC. As shown in Figure 6a, there are 5 LRSSCs in ResFi. Specially, all the LRSSCs combine the shallow layer with the deep layer by a concatenation operation instead of element-wise addition. Thus, we can prevent losing information from previous layers and learn more feature maps by increasing the number of channels.

Figure 6. The network structure of ResFi. (a) ResFi; (b) stochastic identity block; (c) stochastic convolutional block.

3.3.3. Dilated Convolution

As shown in Figure 6a, different from the original ResNet architecture with two pooling layers, we only preserve the average pooling, to avoid losing too much information of the CSI image at the front of ResFi, since pooling layers lose information when the receptive field is enlarged. We adopt a dilated convolution [31] to increase the receptive field instead of a pooling layer. By increasing the interval between the weights in the kernel, the dilated convolution obtains a larger receptive field without additional parameters. The dilation rate is set to two, and the comparison of standard convolution and dilated convolution is shown in Figure 7.

Hence, the 3 × 3 kernel can obtain a 5 × 5 receptive field. Although dilated convolution is usually used in semantic segmentation, it is also effective in CSI image classification, and this will be testified in the experiments later. To reduce computation and memory, we put the dilated convolution in the rear of ResFi. In the actual implementation, one dilated convolution is enough to obtain a sufficient effective receptive field.
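The receptive-field claim follows from the standard formula for the effective extent of a dilated kernel, r(k − 1) + 1, which the short check below evaluates; the helper function is our own illustration.

```python
# Effective extent of a k x k kernel with dilation rate r: r * (k - 1) + 1.
# The kernel still holds k * k weights, so the larger field costs no parameters.
def effective_kernel(k, rate):
    return rate * (k - 1) + 1

print(effective_kernel(3, 1))  # standard 3x3 -> 3
print(effective_kernel(3, 2))  # dilated 3x3 with rate 2 -> 5 (a 5x5 field)
```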

Figure 7. The comparison of standard and dilated convolution.

3.4. Training Scheme
In order to train the network, the cross-entropy [32] with a regularization term is selected as the loss function to minimize the loss between the predicted label and the ground-truth label. The loss function L_R(w) can be written as

L_R(w) = -(1/(2N)) Σ_{i=1}^{N} Σ_{j=1}^{K} 1{z^(i) = j} log( e^{w_j^T x̂^(i)} / Σ_{l=1}^{K} e^{w_l^T x̂^(i)} ) + (1/2) Σ_{i=1}^{N} Σ_{j=1}^{K} w_{ij}^2,   (11)

where N is the size of the input training set and K is the total number of output neurons, which is equal to the number of locations. 1{·} is the indicator function, z^(i) is the index of the location of the i-th CSI image, and j is the index of the output neurons. x̂^(i) is the output of the second-last layer, and w_j is the weight vector connecting the neurons in the second-last layer to the output layer.
In the training stage, we optimize the network parameters w by minimizing L_R(w) iteratively with the momentum optimizer [33]. In the testing stage, for a clean CSI image x*, we feed it into the ResFi network and adopt the output of the fully-connected layer as the optimized deep image features. Then, we obtain the estimated location by using a Softmax classifier.
The pseudocode for the weight training of our system is given in Algorithm 1. The inputs of Algorithm 1 are the CSI images from all training locations, the location labels, the maximum iterations, and the learning rates. Firstly, a set of image pairs is generated by Bernoulli sampling. In each iteration, we update the Denoiser weights θ by descending the stochastic gradient. Then, we obtain a clean image by removing the noise from the noisy image. After the weight training of the Denoiser, we randomly select a mini-batch of N training samples and feed them into ResFi. Finally, the weights w are updated by descending the stochastic gradient.
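The loss in Equation (11) can be sketched numerically as follows (plain Python; the 1/(2N) and 1/2 coefficients follow the equation as printed, while the weights and the single training sample are purely illustrative):

```python
import math

def resfi_loss(w, x_hats, z, reg=1.0):
    """Softmax cross-entropy over N samples plus an L2 penalty on the output weights.

    w      : K weight vectors (one per output neuron / candidate location)
    x_hats : N feature vectors from the second-last layer
    z      : N ground-truth location indices
    reg    : scale of the L2 term (1.0 matches the 1/2 coefficient in Eq. (11))
    """
    N = len(x_hats)
    ce = 0.0
    for x, zi in zip(x_hats, z):
        # Logits w_j^T x_hat for every output neuron j.
        logits = [sum(wd * xd for wd, xd in zip(wj, x)) for wj in w]
        denom = sum(math.exp(v) for v in logits)
        ce -= math.log(math.exp(logits[zi]) / denom)
    l2 = sum(wd * wd for wj in w for wd in wj)
    return ce / (2 * N) + (reg / 2) * l2

# Illustrative call: K = 2 locations, one 2-D feature vector labeled location 0.
print(resfi_loss([[0.1, -0.2], [0.0, 0.3]], [[1.0, 2.0]], [0]))
```

With zero weights the softmax is uniform, so the unregularized loss for one sample over K = 2 locations is log(2)/2, which is a convenient sanity check on the coefficients.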


Algorithm 1 Weights Training of the Denoiser and ResFi
Input: a set of noisy images n, labels l, max iterations of Denoiser maxid, max iterations of ResFi maxir, learning rates α and β
Output: Trained weights w*
// Weight training of Denoiser
Generate Bernoulli sampled image pairs of a noisy image: {n̂_m, n_m}, m = 1, ..., M
Randomly initialize θ
for iteration = 1 : maxid do
    Update the Denoiser by descending the stochastic gradient:
        θ* = θ - α ∂L_D(θ)/∂θ
end
Obtain the clean image: x* = F_θ(n̂_m)
// Weight training of ResFi
Randomly initialize w
for iteration = 1 : maxir do
    Randomly select a mini-batch of N training samples: {x*_i, l_i}, i = 1, ..., N
    Update the ResFi by descending the stochastic gradient:
        w* = w - β ∂L_R(w)/∂w
end
Obtain the optimal weights: w*
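The two-stage structure of Algorithm 1 amounts to running gradient descent twice: first on the Denoiser objective, then on the ResFi objective over the denoised data. A toy sketch with scalar stand-ins for L_D(θ) and L_R(w) (illustrative only, not the actual networks):

```python
def train(grad, param, lr, iters):
    """Generic descent loop shared by both stages of Algorithm 1."""
    for _ in range(iters):
        param = param - lr * grad(param)
    return param

# Stage 1: Denoiser weights theta minimize a stand-in L_D(t) = (t - 3)^2.
theta = train(lambda t: 2 * (t - 3), param=0.0, lr=0.1, iters=200)

# Stage 2: ResFi weights w minimize a stand-in L_R(v) = (v - theta)^2,
# i.e., the second stage consumes the output of the first.
w = train(lambda v: 2 * (v - theta), param=0.0, lr=0.1, iters=200)
print(round(theta, 3), round(w, 3))  # both converge near 3.0
```

The point of the sketch is only the control flow: the Denoiser is trained to completion before ResFi training begins, exactly as in the pseudocode.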

4. Experiments

4.1. Experimental Setup
Our CSI collecting equipment is composed of two parts: the access point and the mobile terminal. We use a TP-Link wireless router as the AP, which is responsible for continuously transmitting packets. A Lenovo laptop equipped with an Intel 5300 network interface card serves as the mobile terminal to collect raw CSI values. A desktop PC with an NVIDIA RTX 2070 SUPER graphics card serves as the model training server (based on the TensorFlow framework and CUDA Toolkit 7.5). We conduct experiments to evaluate the performance of our system in a typical indoor scenario. As shown in Figure 8, this is a 4 × 10 m laboratory with some obstacles, such as desktop computers, chairs, and tables. The wireless router and PC are placed at the end of the area at a fixed height of 0.6 m. We choose 10 locations (marked as black dots) to be tested. The raw CSI values are collected by the CSI Tool [34] at each location. When the PC pings the AP, the AP returns a packet to the PC. In these experiments, we set the interval between pings to 0.01 s and record for 5 min at every location. Thus, we obtain 30,000 packets at every location and then convert them into 1000 CSI images. Finally, the number of CSI images is increased to 63,000 by data augmentation.
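The collection numbers above can be checked with a few lines of arithmetic (the packets-per-image count of 30 is inferred from 30,000 packets yielding 1000 images per location; it is an assumption rather than a figure stated in the text):

```python
# Bookkeeping behind the data-collection numbers in Section 4.1.
ping_interval_s = 0.01            # one packet every 0.01 s
record_time_s = 5 * 60            # 5 min of recording per location
packets = int(record_time_s / ping_interval_s)

packets_per_image = 30            # assumed: 30,000 packets -> 1000 CSI images
images_per_location = packets // packets_per_image
print(packets, images_per_location)  # 30000 1000
```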


Figure 8. Training and test in the lab.

4.2. Analysis of the Experimental Parameters and Settings
In this subsection, we empirically evaluate the impact of different parameters of ResFi and of the experimental settings.

4.2.1. Impact of the Convolutional Kernel Size
Since we need to match the dimensions of the feature maps in the branches and the backbone, the stride and size of the convolutional kernels in the branches need to be fixed first. Thus, we only analyze the impact of the kernel size in the backbone. Figure 9 shows the model performance with different kernel sizes. We find that the 5 × 5 kernel is the best choice, because it is well suited to feature extraction from CSI images.

Figure 9. The comparison of different kernel sizes.

4.2.2. Impact of the Number of Dilated Convolutions
As shown in Figure 10, we observe that the test accuracy is improved by about 2.80% with one dilated convolution. This result confirms that dilated convolution is effective for CSI image classification. The kernel size of the dilated convolution is 3 × 3 with a dilation rate of two. Compared to the pooling operation, the receptive field increases without losing spatial information, which is undoubtedly beneficial for the localization task. In addition, dilated convolution should also be suitable for other classification tasks.

Figure 10. The comparison of different numbers of dilated convolutions.

4.2.3. Impact of the Number of Convolutional Kernels
As we know, more convolutional kernels require more computational cost. Therefore, we conduct some experiments to seek a suitable number of convolutional kernels. Firstly, we set the number of convolutional kernels to be the same as in the original ResNet. Then, we halve the number of convolutional kernels. As shown in Figure 11, when the number of convolutional kernels is halved, the localization performance shows a subtle increase. This means that we do not need so many parameters, so we halve the number of convolutional kernels of ResNet-50 to reduce the computational cost.
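The saving from halving the kernel counts can be seen from the parameter count of a single convolutional layer (a rough sketch; the 64-channel layer shape is illustrative, not the exact ResFi configuration):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights in a k x k convolution: k*k*c_in*c_out, plus c_out biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)

full = conv_params(3, 64, 64)    # an illustrative ResNet-style layer
halved = conv_params(3, 32, 32)  # both input and output channel counts halved
print(full, halved)              # halving the widths cuts the weights ~4x
```

Because both the input and output channel counts shrink, halving the kernel numbers reduces the weight count of each such layer by roughly a factor of four, which is why the computational saving is substantial even though accuracy is unaffected.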

Figure 11. The comparison of different numbers of convolutional kernels.

4.2.4. Impact of the Number of Iterations
Since a proper number of iterations can prevent overfitting and reduce computational cost, we compared different numbers of iterations of ResFi to seek a suitable one. Figure 12 shows that 400,000 iterations and 500,000 iterations give the best performance, which indicates that the loss function has converged when the number of iterations reaches 400,000. Therefore, we choose 400,000 as the maximum number of iterations.

Figure 12. The comparison of different iterations.

4.2.5. Analysis of the Robustness
To test the robustness of our localization method to different routers, we construct Datasets 2 and 3 by using two additional TP-Link routers to measure the CSI data, respectively. In addition, we replaced the tester when we constructed Dataset 2. The original test dataset is named Dataset 1, and the combination of Datasets 1, 2, and 3 is named Dataset 4. Moreover, the measurement environment of Datasets 2 and 3 differs slightly from that of Dataset 1. As shown in Figure 13, ResFi performs stably on the different datasets, which demonstrates that the proposed method is robust to different routers, a certain degree of environmental change, and the replacement of the tester.


Figure 13. The comparison of different routers.

4.3. Ablation Experiments
To test the impact of the Denoiser, we use the original noisy CSI images and the denoised CSI images as the training data, respectively. As shown in Figure 14, we observe that the test accuracy is improved by about 0.8%, which demonstrates that the random noise interferes with the network to a certain degree. The denoised CSI images can improve the localization accuracy by preserving the main line features.


Figure 14. The comparison of different datasets.

4.4. Comparison with Existing Methods
We have compared ResFi with three existing NN-based methods: DANN, DeepFi, and ConFi. The parameters of these algorithms are all tuned to give the best performance. Since the overfitting problem is serious in ConFi, we add a Dropout layer at the end of that network. For a fair comparison, all schemes use the same data set to estimate the position of the moving object.
We use the mean error M̄ estimated on the test dataset as the metric of localization performance. For M mistakenly estimated locations, (a_i^*, b_i^*) represents the estimated location of object i, and (a_i, b_i) represents the real location. The mean error is defined as

M̄ = (1/M) Σ_{i=1}^{M} sqrt( (a_i^* - a_i)^2 + (b_i^* - b_i)^2 ).   (12)
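Equation (12) is simply the mean Euclidean distance over the M mis-estimated locations; a minimal sketch with hypothetical coordinates:

```python
import math

def mean_error(estimated, actual):
    """Mean Euclidean distance between estimated and true 2-D locations, Eq. (12)."""
    assert len(estimated) == len(actual)
    return sum(math.hypot(ae - a, be - b)
               for (ae, be), (a, b) in zip(estimated, actual)) / len(estimated)

# Two mis-estimated locations, each off by a 3-4-5 triangle: mean error is 5.0.
print(mean_error([(3.0, 4.0), (6.0, 8.0)], [(0.0, 0.0), (3.0, 4.0)]))  # 5.0
```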

Table 1. The comparison of localization error.

Algorithm   Mean Error (m)   Std. Dev. (m)   Parameters (M)
DANN        2.3910           1.6507          0.85
DeepFi      2.1082           1.4821          1.76
ConFi       1.9365           1.3554          8.11
ResFi       1.7873           1.2806          14.07


As shown in Table 1, we provide the mean error and the standard deviation of the localization errors. Our system achieves a mean error of 1.7873 m and a standard deviation of 1.2806 m, which indicates that ResFi-based indoor localization is the most precise among these methods. ResFi also shows robust performance across different locations, with the smallest standard deviation. As shown in Figure 15, compared to ConFi, ResFi improves the localization accuracy by about 1.96%. In the actual experiments, ResFi outperforms the other three schemes in localization accuracy.



Figure 15. The comparison of different methods.

We also apply ResNet-50 to indoor localization in another experiment. The results are illustrated in Figure 16. Compared to ResNet-50, ResFi improves the localization accuracy by about 1.6%, which indicates that ResFi can extract more effective features from CSI images than ResNet-50.


Figure 16. The comparison of ResNet-50 and ResFi.

5. Conclusions
In this paper, we proposed a denoising NN and a novel ResNet architecture to classify CSI images. By making full use of the low-level features in the deep layers of the denoising NN, we could improve the denoising performance and reduce the number of parameters. Moreover, the stochastic residual block was proposed to effectively prevent overfitting. Specifically, the long-range stochastic shortcut connection was used to further boost information propagation between shallow and deep layers. Through empirical validation and analysis, ResFi was shown to achieve a significant improvement in indoor localization. The experimental results also confirm that ResNet has better performance in indoor localization than CNN. However, the indoor localization of multiple objects is still a challenging task, which is worthy of further study in the future.



Author Contributions: C.X. and Y.Z. (Yunwei Zhang) collected the data; W.W. and C.X. conceived of the study; C.X. designed the network model, analysed the data, and drafted and revised the manuscript; W.W., Y.Z. (Yunwei Zhang), J.Q., S.Y., and Y.Z. (Yun Zhang) revised the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the National Natural Science Foundation of China (Grant No. 61871232 and No. 61771257). This work was also supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant No. SJCX19_0275 and No. KYCX20_0802).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data are available at https://github.com/Jacriper/ResFi (accessed on 30 April 2021).
Conflicts of Interest: The authors declare no conflict of interest.

References and Note
1. Cidronali, A.; Maddio, S.; Giorgetti, G. Analysis and Performance of a Smart Antenna for 2.45-GHz Single-Anchor Indoor Positioning. IEEE Trans. Microw. Theory Tech. 2010, 58, 21–31. [CrossRef]
2. Wang, Y.; Ma, S.; Chen, C.L. TOA-Based Passive Localization in Quasi-Synchronous Networks. IEEE Commun. Lett. 2014, 18, 592–595. [CrossRef]
3. Ng, J.K.; Lam, K.; Cheng, Q.J. An effective signal strength-based wireless location estimation system for tracking indoor mobile users. J. Comput. Syst. Sci. 2013, 79, 1005–1016. [CrossRef]
4. Jaffe, A.; Wax, M. Single-Site Localization via Maximum Discrimination Multipath Fingerprinting. IEEE Trans. Signal Process. 2014, 62, 1718–1728. [CrossRef]
5. Wang, W.; Li, T.; Wang, W. Multiple Fingerprints-Based Indoor Localization via GBDT: Subspace and RSSI. IEEE Access 2019, 7, 80519–80529. [CrossRef]
6. Bahl, P.; Padmanabhan, V.N. RADAR: An in-building RF-based user location and tracking system. In Proceedings of IEEE INFOCOM, the Nineteenth Annual Joint Conference of the IEEE Computer and Communications Societies, Tel Aviv, Israel, 26–30 March 2000.
7. Youssef, M.; Agrawala, A.K. The Horus WLAN location determination system. In Proceedings of the International Conference on Mobile Systems, Applications, and Services, Seattle, WA, USA, 6–8 June 2005; pp. 205–218.
8. IEEE Standard for Information Technology—Telecommunications and Information Exchange between Systems Local and Metropolitan Area Networks—Specific Requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications—Redline, IEEE Std 802.11-2012 (Revision of IEEE Std 802.11-2007)—Redline, pp. 1–5229, 29 March 2012.
9. Wang, X.; Gao, L.; Mao, S. DeepFi: Deep learning for indoor fingerprinting using channel state information. In Proceedings of the 2015 IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, LA, USA, 9–12 March 2015; pp. 1666–1671. [CrossRef]
10. Wang, X.; Gao, L.; Mao, S.
CSI-Based Fingerprinting for Indoor Localization: A Deep Learning Approach. IEEE Trans. Veh. Technol. 2017, 66, 763–776. [CrossRef]
11. Chen, H.; Zhang, Y.; Li, W. ConFi: Convolutional Neural Networks Based Indoor Wi-Fi Localization Using Channel State Information. IEEE Access 2017, 5, 18066–18074. [CrossRef]
12. He, K.; Zhang, X.; Ren, S. Deep Residual Learning for Image Recognition. In Proceedings of the Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
13. Ren, S.; He, K.; Girshick, R. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
14. He, K.; Gkioxari, G.; Dollár, P. Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 386–397. [CrossRef]
15. Wang, F.; Feng, J.; Zhao, Y. Joint Activity Recognition and Indoor Localization with WiFi Fingerprints. IEEE Access 2019, 7, 80058–80068. [CrossRef]
16. Zhou, M.; Liu, T.; Li, Y. Towards Understanding the Importance of Noise in Training Neural Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
17. Liu, W.; Cheng, Q.; Deng, Z. C-Map: Hyper-Resolution Adaptive Preprocessing System for CSI Amplitude-based Fingerprint Localization. IEEE Access 2019, 7, 135063–135075. [CrossRef]
18. Ye, H.; Gao, F.; Qian, J. Deep Learning based Denoise Network for CSI Feedback in FDD Massive MIMO Systems. IEEE Commun. Lett. 2020, 24, 1742–1746. [CrossRef]

19. Lehtinen, J.; Munkberg, J.; Hasselgren, J. Noise2Noise: Learning Image Restoration without Clean Data. arXiv 2018, arXiv:1803.04189.
20. Quan, Y.; Chen, M.; Pang, T. Self2Self with dropout: Learning self-supervised denoising from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
21. Halperin, D.; Hu, W.; Sheth, A. Predictable 802.11 packet delivery from wireless channel measurements. In Proceedings of the ACM Special Interest Group on Data Communication, New York, NY, USA, 3 September 2010; Volume 40, pp. 159–170.
22. Xie, Y.; Li, Z.; Li, M. Precise Power Delay Profiling with Commodity WiFi. In Proceedings of the ACM/IEEE International Conference on Mobile Computing and Networking, Paris, France, 7–9 September 2015; pp. 53–64.
23. Lempitsky, V.; Vedaldi, A.; Ulyanov, D. Deep Image Prior. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9446–9454.
24. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015.
25. Liu, G.; Reda, F.A.; Shih, K.J. Image Inpainting for Irregular Holes Using Partial Convolutions. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
26. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010.
27. Huang, G.; Liu, Z.; Der Maaten, L.V. Densely Connected Convolutional Networks. In Proceedings of the Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
28. Simard, P.Y.; Steinkraus, D.W.; Platt, J.
Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the International Conference on Document Analysis and Recognition, Edinburgh, Scotland, 3–6 August 2003; pp. 958–963. 29. Srivastava, N.; Hinton, G.E.; Krizhevsky, A. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. 30. Veit, A.; Wilber, M.; Belongie, S. Residual Networks Behave Like Ensembles of Relatively Shallow Networks. arXiv 2016, arXiv:1605.06431. 31. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122. 32. De Boer, P.; Kroese, D.P.; Mannor, S. A Tutorial on the Cross-Entropy Method. Ann. Oper. Res. 2005, 134, 19–67. [CrossRef] 33. Qian, N. On the momentum term in learning algorithms. Neural Netw. 1999, 12, 145–151. [CrossRef] 34. Halperin, D.; Hu, W.; Sheth, A. Tool release: Gathering 802.11n traces with channel state information. ACM SIGCOMM Comput. Commun. Rev. 2011, 41, 53. [CrossRef]