Article

Radio–Image Transformer: Bridging Radio Modulation Classification and ImageNet Classification

Shichuan Chen 1,*, Kunfeng Qiu 2,*, Shilian Zheng 1, Qi Xuan 2 and Xiaoniu Yang 1
1 Science and Technology on Communication Information Security Control Laboratory, Jiaxing 314033, China; [email protected] (S.Z.); [email protected] (X.Y.)
2 Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; [email protected]
* Correspondence: [email protected] (S.C.); [email protected] (K.Q.)

 Received: 10 August 2020; Accepted: 29 September 2020; Published: 9 October 2020 

Abstract: Radio modulation classification is widely used in the field of communication. In this paper, in order to realize radio modulation classification with the help of the existing ImageNet classification models, we propose a radio–image transformer which extracts the instantaneous amplitude, instantaneous phase and instantaneous frequency from the received radio complex baseband signals, and then converts the signals into images by the proposed signal rearrangement method or convolution mapping method. We finally use the existing ImageNet classification network models to classify the modulation type of the signal. The experimental results show that the proposed signal rearrangement method and convolution mapping method are superior to the methods using constellation diagrams and time–frequency images, which shows their performance advantages. In addition, by comparing the results of the seven ImageNet classification network models, it can be seen that, except for the relatively poor performance of the MNASNet1_0 architecture, the modulation classification performance obtained by the other six network architectures is similar, indicating that the proposed methods do not have high requirements for the architecture of the selected ImageNet classification network models. Moreover, the experimental results show that our method has good classification performance for signal datasets with different sampling rates, Orthogonal Frequency Division Multiplexing (OFDM) signals and real measured signals.

Keywords: modulation classification; deep learning; ImageNet; convolutional neural network

1. Introduction

Radio signal classification is widely used in the field of communication. For example, in [1], the secondary user needs to monitor the primary user's signal to avoid harmful interference to the primary user. Therefore, the secondary user needs to classify the primary user's signal to accurately know the type of primary user currently using the spectrum, so as to access the spectrum in a more efficient manner [2]. In spectrum management [3,4], some malicious users may transmit specific signals to disturb the spectrum use. In these circumstances with the overlapping and coexistence of various radio signals, if each radio signal can be identified, it will help to determine the existence of malicious users, and then take effective measures to deal with the situation. In adaptive modulation and coding communication [5–7], if the receiver is able to recognize the modulation and coding scheme adopted by the transmitter, it can choose the corresponding demodulation and decoding algorithm for information recovery, so as to save the signaling overhead for informing the receiver of the adopted modulation and coding scheme. Modulation type is a basic attribute of a radio communication signal, and in this paper, we mainly focus on radio modulation classification.


The purpose of the radio modulation classification problem is to identify the type of modulation to which the signal belongs. This is actually a multi-category classification problem, and the number of categories is the number of modulation types in the signal dataset. In this paper, we treat the problem of radio modulation classification by making full use of the existing mature ImageNet classification network structures. Since the raw radio signals are represented in IQ form, which is different from the image, we propose a radio–image transformer (RIT) to convert the radio signal into the image format. To be specific, we firstly extract the instantaneous amplitude, instantaneous phase and instantaneous frequency of the radio IQ signals, and then convert the signals into images by the proposed signal rearrangement method (SRM) or convolution mapping method (CMM). We finally complete the task of signal modulation classification by using the existing ImageNet classification network models. We verify the effectiveness of the proposed methods through simulations. The main contribution of this paper is that the proposed RIT can transform the signal classification problem into the image classification problem by mapping signals to images, and thus complete the radio signal modulation recognition using sophisticated classification techniques in the image field. We will show by extensive experiments that our method achieves better classification accuracy than some existing signal-to-image classification methods.

The rest of the paper is organized as follows. Section 2 gives the related work and literature review. Section 3 presents the definition of the radio modulation classification problem. Section 4 introduces the proposed radio modulation classification methods. Section 5 introduces the experiments and performance evaluation. Section 6 summarizes the paper.

2. Related Work and Literature Review

In this section, we mainly introduce some related work of radio modulation classification. The traditional modulation classification methods are mainly based on feature design [8–13]. The quality of the designed features directly determines the quality of recognition performance. However, these designed features are often associated with a specific modulation, and it is difficult to find features that are widely applicable to a variety of modulations. As an end-to-end learning method, deep learning [14] unifies feature extraction and recognition tasks, avoids the process of manual feature design, and hence greatly enhances its universal applicability. For instance, in the field of image recognition, a lot of convolutional neural network (CNN) structures have been designed for the ImageNet dataset [15] and these CNNs achieved pretty good classification performance. In view of the great success of deep learning in image classification and other fields, more and more work has introduced deep learning into radio modulation recognition. These methods can be divided into two categories. The first is to design a special CNN or long short term memory (LSTM) network for modulation classification according to the raw radio signals' input form (the in-phase and quadrature (IQ) form) [16–19]. The second is to convert the radio signals into images, and then classify the images by referring to the network structures used for image classification, so as to realize the classification of radio signals. At present, there are two main ways to transform IQ signals into images, namely, constellation diagrams [20] and time–frequency images [21–23], but they may affect the modulation classification performance due to the loss of the temporal correlation information between signal sampling points. The classification methods proposed in this paper make full use of the technology in the field of image classification, especially the content related to ImageNet. ImageNet is an image dataset widely used by vision researchers and plays an important role in object detection and recognition and other fields. The image dataset was created by Professor Fei-Fei Li's team. At that time, academic scholars mostly focused on designing better algorithms. Prof. Li, however, believed that a good dataset is beneficial to the training of models, and then began to collect images with annotated information on a large scale, finally officially launching the ImageNet dataset in 2009. This dataset contains about 15 million annotated images, which can be divided into about 22,000 categories [15]. The dataset can be used to train models to improve their performance, which makes a great contribution to the academic research in this area.

In 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), based on the ImageNet dataset, was launched globally. The competition includes image classification, target detection, target positioning, scene classification, and so on. A subset of the ImageNet dataset is used in this competition, which contains 1000 categories, and each category contains about 1000 images. Various classic CNN models, such as AlexNet [24], ZFNet [25], VGG [26], GoogLeNet [27] and ResNet [28], emerged in the ILSVRC competitions held over 7 years. In particular, ResNet learns the residual representation between inputs and outputs by using multiple learnable parameterized layers, thus avoiding the problem of gradient disappearance and making the model easier to optimize. In the field of image classification, the CNN models trained with a large amount of high-quality images can be transferred to other image datasets for classification and recognition, and they can also be fine-tuned and trained on this basis. The advantage of this is that it is unnecessary to train the neural network from scratch, which can save time and computing resources. Most of the classic CNN models mentioned above have a pre-trained model based on the ImageNet dataset, which is convenient for relevant personnel to conduct research. In addition, some lightweight CNN models for edge application scenarios also have pre-trained models based on the ImageNet dataset, such as SqueezeNet [29], Xception [30], MobileNet [31], and ShuffleNet [32]. Taking MobileNet as an example, it is a small model proposed by Google, which can be effectively applied to recognition tasks on intelligent devices. Unlike the traditional CNN models, MobileNet uses different convolution kernels to process different input channels, and then uses 1 × 1 convolution kernels to process the output obtained from the above operation, which greatly reduces the number of model parameters while ensuring that its performance is not far from standard convolution.

3. Definition of the Problem

In this section, we give the definition of the signal modulation classification problem. After the signal is sent by the transmitter, it travels through the wireless channel and reaches the receiving end. The received signal can be expressed as:

r(n) = h(n) ∗ g(n) ∗ s(n)e^{j(2π∆f n + ∆θ)} + w(n), n = 0, 1, ..., N − 1, (1)

where ∗ denotes the convolution operation, s(n) is the transmitted complex baseband signal, h(n) is the channel response, g(n) is the response of the shaping filter, ∆f is the carrier frequency deviation, ∆θ is the phase deviation, w(n) is additive white Gaussian noise (AWGN), and N is the number of sampling points. Let the set of modulation categories to be classified be 𝓜 = {1, 2, ..., M}, where M is the number of modulation categories; then the radio modulation classification problem can be expressed as an M-hypothesis testing problem, that is:

H_i : 𝒯(r_{N×1}) = i, i = 1, 2, ..., M, (2)

where r_{N×1} = [r(0), r(1), ..., r(N − 1)]^T and 𝒯(r_{N×1}) represents the modulation category to which the signal belongs. Obviously, this problem can be solved by a classifier with M categories.
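To make Equation (1) concrete, the following NumPy sketch synthesizes a received signal. The function name is ours, and the channel and shaping-filter responses are caller-supplied placeholders rather than the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def received_signal(s, h, g, dfreq, dtheta, snr_db):
    """Synthesize r(n) of Equation (1) for a given baseband signal s(n).

    h and g are the channel and shaping-filter impulse responses; dfreq and
    dtheta are the carrier frequency and phase deviations; the AWGN term is
    scaled to the requested SNR. A sketch only.
    """
    n = np.arange(len(s))
    x = s * np.exp(1j * (2 * np.pi * dfreq * n + dtheta))  # s(n) e^{j(2pi*df*n + dtheta)}
    x = np.convolve(g, x, mode="same")                     # * g(n), shaping filter
    x = np.convolve(h, x, mode="same")                     # * h(n), channel response
    noise_power = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)
    w = np.sqrt(noise_power / 2) * (rng.standard_normal(len(x))
                                    + 1j * rng.standard_normal(len(x)))
    return x + w                                           # + w(n), AWGN
```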

4. The Proposed Radio Modulation Classification Method

As previously pointed out, the research on ImageNet classification is very rich, and many CNN architectures with good classification capability have been proposed. The main idea of this paper is to use the existing CNN networks for ImageNet recognition to realize radio modulation classification. Since the image size of ImageNet is W × W × 3 and the radio signal is a complex signal of size N × 1, the core problem we have to solve is how to convert the complex signals into images of the right size. This will be achieved through RIT. The function ℛ(·) of RIT can be expressed as:

y_{W×W×3} = ℛ(r_{N×1}), (3)

where y_{W×W×3} is the image output by RIT. Based on this, the overall architecture of radio modulation classification is shown in Figure 1. The figure shows that by constructing a bridge between radio signals and images through RIT, the mature neural networks used in the ImageNet classification domain can be used for radio modulation classification, thereby avoiding the complex process of designing specialized networks for radio signal classification and thus facilitating the cross-domain use of the same model. It should be noted that, in order to meet the requirement of the number of modulation classification categories, the number of neurons in the classification layer of the ImageNet network needs to be replaced with M. The RIT algorithms proposed in this paper are described in detail in the following.
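In code, the whole pipeline of Figure 1 amounts to composing an RIT module with a pretrained backbone whose head is replaced by M outputs. A minimal PyTorch sketch (the class name is ours; `rit` stands for either of the two transformers described next):

```python
import torch.nn as nn
from torchvision import models

class RadioModulationClassifier(nn.Module):
    """Figure 1: radio signal -> RIT -> image -> existing ImageNet classifier."""

    def __init__(self, rit, num_classes):
        super().__init__()
        self.rit = rit                                # maps signals to 3 x 224 x 224 images
        self.cnn = models.resnet18(pretrained=True)   # any ImageNet model could be used here
        # Replace the classification layer so it has M neurons, as required above.
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, num_classes)

    def forward(self, r):
        return self.cnn(self.rit(r))
```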

[Figure 1 block diagram: Radio signal → RIT → Image → Existing ImageNet Classifier → Radio modulation classification result]

Figure 1. Radio modulation classifier architecture.

4.1. The Proposed Radio–Image Transformer

4.1.1. Signal Rearrangement Method

The SRM directly arranges the one-dimensional signal sample points into two-dimensional images. We note that the image has three channels, and if the real and imaginary parts of the signal r_{N×1} are extracted separately, there are still only two channels. Instead, from the received signal r_{N×1} we extract the instantaneous amplitude A_{N×1} = [a(0), a(1), ..., a(N − 1)]^T, instantaneous phase P_{N×1} = [p(0), p(1), ..., p(N − 1)]^T and instantaneous frequency F_{N×1} = [f(0), f(1), ..., f(N − 1)]^T. Each element in the instantaneous amplitude is:

a(n) = |r(n)|, n = 0, 1, ..., N − 1, (4)

Each element of the instantaneous phase is:

p(n) = arctan(Im(r(n)) / Re(r(n))), n = 0, 1, ..., N − 1, (5)

where Im(r(n)) and Re(r(n)) represent the imaginary and real parts of r(n), respectively. Each element of the instantaneous frequency is:

f(n) = p(n + 1) − p(n), n = 0, 1, ..., N − 2, (6)

In order to make the dimension of the instantaneous frequency vector consistent with the instantaneous amplitude and the instantaneous phase, we set f(N − 1) = 0.

Then we use a sliding window to reorder the instantaneous amplitude A_{N×1}, instantaneous phase P_{N×1} and instantaneous frequency F_{N×1}, where the window length is W and the interval is L. The schematic diagram is shown in Figure 2. For convenience, A_n, P_n and F_n in the figure represent a(n), p(n) and f(n), respectively. After the input signal is rearranged, the output size is W × W × 3, where W = 224, which is consistent with the image size of ImageNet. In this way, the existing ImageNet classification networks can be used to classify the radio modulation.
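A minimal NumPy sketch of the SRM follows. The paper fixes N = 1024, W = 224 and L = 4 (Figure 2) but does not state how each sequence is expanded to 1116 elements; by analogy with the time–frequency method in Section 5.1.4, we assume the first samples are repeated at the tail.

```python
import numpy as np

def srm_transform(r, W=224, L=4):
    """Signal rearrangement method: complex signal of length N -> W x W x 3 image."""
    a = np.abs(r)                                     # instantaneous amplitude, Eq. (4)
    p = np.angle(r)                                   # instantaneous phase, Eq. (5)
    f = np.append(np.diff(p), 0.0)                    # instantaneous frequency, Eq. (6), f(N-1) = 0

    target = (W - 1) * L + W                          # 1116 for W = 224, L = 4
    rows = np.arange(W)[:, None] * L + np.arange(W)   # sliding-window indices, W x W
    channels = []
    for x in (a, p, f):
        x = np.concatenate([x, x[:target - len(x)]])  # assumed expansion: 1024 -> 1116
        channels.append(x[rows])                      # one 224 x 224 matrix per quantity
    return np.stack(channels, axis=-1)                # splice into a 224 x 224 x 3 image

image = srm_transform(np.random.randn(1024) + 1j * np.random.randn(1024))
print(image.shape)  # (224, 224, 3)
```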


Figure 2. Schematic diagram of the signal rearrangement method. In the figure, N = 1024, W = 224, L = 4. The instantaneous amplitude, instantaneous phase and instantaneous frequency of the input all contain 1024 elements. First, each input sequence is expanded to a vector with a length of 1116, and then it is transformed into a matrix of 224 × 224 by sliding. Finally, the three matrices are spliced to obtain an image of 224 × 224 × 3, whose size is consistent with the size of images in ImageNet.

4.1.2. Convolution Mapping Method

The CMM also firstly calculates the instantaneous amplitude A_{N×1} = [a(0), a(1), ..., a(N − 1)]^T, instantaneous phase P_{N×1} = [p(0), p(1), ..., p(N − 1)]^T and instantaneous frequency F_{N×1} = [f(0), f(1), ..., f(N − 1)]^T of the received signal r_{N×1}, and then adds a layer of convolution and pooling operations to turn the input into an image of the right size. The schematic diagram is shown in Figure 3. Similarly, after convolution mapping, the output size is W × W × 3, where W = 224, which is consistent with the image size of ImageNet. In this way, the existing ImageNet classification networks can be used to classify radio modulation.

Figure 3. Schematic diagram of the convolution mapping method. The input instantaneous amplitude, instantaneous phase and instantaneous frequency are arranged into a tensor with a height of 1024, width of 3 and channel of 1, and then the first 102 values of the height dimension are copied to the tail, so that a tensor with a height of 1126 is obtained. Then, after the convolution operation (the number of convolution kernels is 224), the number of output channels is expanded to 224; the height is reduced to 224 after pooling, and the width and channel dimensions are exchanged to obtain an image with the size of 224 × 224 × 3, which is consistent with the size of images in ImageNet.
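The CMM can likewise be written as a small PyTorch module. The paper fixes the padded height (1126) and the number of kernels (224) but not the kernel or pooling sizes; a height-7 kernel followed by 5 × 1 max pooling reproduces the reduction in Figure 3, since (1126 − 7 + 1)/5 = 224. Those two sizes are therefore our assumption.

```python
import torch
import torch.nn as nn

class CMM(nn.Module):
    """Convolution mapping method: (A, P, F) of length 1024 -> 3 x 224 x 224 image."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 224, kernel_size=(7, 1))  # 224 kernels; width 3 is kept
        self.pool = nn.MaxPool2d(kernel_size=(5, 1))       # height 1120 -> 224

    def forward(self, a, p, f):
        x = torch.stack([a, p, f], dim=-1)                 # [batch, 1024, 3]
        x = torch.cat([x, x[:, :102]], dim=1)              # copy first 102 rows to the tail -> 1126
        x = x.unsqueeze(1)                                 # [batch, 1 channel, 1126, 3]
        x = self.pool(self.conv(x))                        # [batch, 224 channels, 224, 3]
        return x.permute(0, 3, 2, 1)                       # swap width/channel -> [batch, 3, 224, 224]

cmm = CMM()
a, p, f = (torch.randn(8, 1024) for _ in range(3))
print(cmm(a, p, f).shape)  # torch.Size([8, 3, 224, 224])
```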


4.2. The Training Procedure

The purpose of training is to make use of the training dataset, optimize the network parameters, make the network achieve better performance on the training set, and at the same time make the network generalize to other data outside the training set as much as possible. First, we need to build the training set for training. The training set contains the received signal samples and corresponding labels, which can be expressed as:

𝒞 = {(r_{N×1}^{(i)}, m^{(i)})}_{i=1}^{S}, (7)

where m^{(i)} represents the modulation type corresponding to the i-th sample, and S represents the number of samples in the training set. Since the RIT proposed in this paper (SRM and CMM) needs to convert the received signals into instantaneous amplitude, instantaneous phase and instantaneous frequency, the training set can also be expressed as:

𝒞₂ = {(A_{N×1}^{(i)}, P_{N×1}^{(i)}, F_{N×1}^{(i)}, m^{(i)})}_{i=1}^{S}. (8)

In this training set, the loss function used in training is the commonly used cross entropy. For a mini-batch containing N_B samples, the loss function is defined as:

ℰ = −(1/N_B) Σ_{i=1}^{N_B} Σ_{k=1}^{M} d_{ik} log(p_{ik}), (9)

where p_{ik} is the confidence output on the k-th category when the i-th sample is taken as input, and d_{ik} is the k-th dimension of the true label of the i-th sample. In this paper, the Adam optimization algorithm [33] is adopted for network training. It keeps an element-wise moving average of both the parameter gradients and their squared values and uses these averages to update the network parameters as

m_t = β₁ m_{t−1} + (1 − β₁) ∇ℰ(θ_t), (10)

v_t = β₂ v_{t−1} + (1 − β₂) (∇ℰ(θ_t))², (11)

θ_{t+1} = θ_t − α · m_t / (√v_t + ε̂), (12)

where θ_t is the network parameter vector (including weights and biases), t is the iterative index, α > 0 is the learning rate, β₁ ∈ [0, 1) is the exponential decay rate of the first-order moment estimation, β₂ ∈ [0, 1) is the exponential decay rate of the second-order moment estimation, and ε̂ is a very small positive number which is used to prevent dividing by zero during the calculation.

We consider two training strategies in this paper. The first training strategy is based on transfer learning. During training, only the weights of the last classification layer and the convolutional mapping layer are adjusted, while the weights of the other layers are fixed. The second training strategy is to train the entire network as a new network. Compared with the first training strategy, the second method has higher computational complexity.
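In PyTorch terms, the difference between the two strategies is only which parameters keep `requires_grad`. A sketch for ResNet18 on Sig1024 (M = 12); the model names follow the torchvision package, everything else is illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

M = 12  # number of modulation categories in Sig1024

# Strategy 1 (transfer learning): freeze the pretrained backbone and train
# only the new M-way classification layer (plus the CMM layer, if used).
net = models.resnet18(pretrained=True)
for param in net.parameters():
    param.requires_grad = False
net.fc = nn.Linear(net.fc.in_features, M)  # new head is trainable by default

# Strategy 2 (train the entire network): simply skip the freezing loop above.

criterion = nn.CrossEntropyLoss()          # the cross entropy of Eq. (9)
optimizer = torch.optim.Adam(              # the Adam update of Eqs. (10)-(12)
    [p for p in net.parameters() if p.requires_grad], lr=1e-3)
```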

5. Experiments and Performance Evaluation

5.1. The Experimental Setup

5.1.1. Dataset

In this paper, four datasets are used to test the performance of the proposed methods.

• Sig1024

The dataset is generated with simulation and contains 12 modulations, including BPSK, QPSK, 8PSK, OQPSK, 2FSK, 4FSK, 8FSK, 16QAM, 32QAM, 64QAM, 4PAM and 8PAM. When the IQ data is generated, the original information is generated in a random way. After modulation, a pulse shaping filter is used. The shaping filter used in the simulation is the raised cosine filter, and the roll-off factor is randomly chosen within the range of [0.2, 0.7]. Since the receiver and transmitter are not strictly synchronized, we add random phase and carrier frequency deviations to the signals, where the phase deviation is randomly chosen within the range of [−π, π], and the normalized carrier frequency offset (relative to the sampling frequency) is randomly chosen within the range of [−0.1, 0.1]. The noise is assumed to be AWGN, and the signal-to-noise ratio (SNR) ranges from −20 dB to 30 dB with an interval of 2 dB. Each IQ signal contains 128 symbols and each symbol has eight sampling points, so the number of samples per signal is 1024. In the training set, the sample size of each modulation under each SNR is 1000. In the test set, the sample size is half of the training set.

• Sig1024_2

This dataset is similar to Sig1024, except that two sampling rates are considered. Signal samples with an oversampling ratio of four are also included, and therefore this dataset has twice as many signal samples as Sig1024.

• RealSig

This real dataset is collected over the air. A signal source generates a specific modulated signal, and the collector receives and samples the signal. Two modulations are considered, i.e., 16QAM and 64QAM. We add AWGN with SNRs ranging from 0 dB to 30 dB with an interval of 2 dB. In the training set and test set, the sample size of each modulation under each SNR is 3860.

• OFDMSig

The dataset contains OFDM signals, including six modulations. The number of subcarriers and the subcarrier modulation method for each signal type are as follows: 16 subcarriers, BPSK; 16 subcarriers, QPSK; 16 subcarriers, 64QAM; 64 subcarriers, BPSK; 64 subcarriers, QPSK; 64 subcarriers, 64QAM. Similarly, the SNR ranges from −20 dB to 30 dB, with an interval of 2 dB.

5.1.2. Network Models

In the simulation, seven ImageNet classification network models are used, including the typical convolutional neural networks (CNNs) ResNet18, ResNet101, VGG16 and DenseNet121, and the lightweight models MobileNet, ShuffleNet and MNASNet. The number of parameters of these models and their accuracy on ImageNet classification are shown in Table 1.

Table 1. The number of parameters of the models used and their accuracy on ImageNet.

Models               Parameters    Top-1 Accuracy    Top-5 Accuracy
ResNet18             11,689,512    69.76%            89.08%
ResNet101            44,549,160    77.37%            93.56%
VGG16_bn             138,365,992   73.37%            91.50%
DenseNet121          7,978,856     74.65%            92.17%
MobileNet_V2         3,504,872     71.88%            90.29%
ShuffleNet_v2_x1_0   2,278,604     69.36%            88.32%
MNASNet1_0           4,383,312     73.51%            91.54%

5.1.3. Training Environment

In the experiments, the loss function was cross entropy loss, the optimizer was Adam, the learning rate was set at 0.001 and, after every 20 epochs, the learning rate decreased to 80% of the previous value. The deep learning models were trained on an NVIDIA GeForce RTX 2080, and the framework used was PyTorch.
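The reported schedule (Adam, initial learning rate 0.001, multiplied by 0.8 every 20 epochs) maps directly onto the standard PyTorch scheduler; a sketch, where `net` is the model from the previous snippet and the epoch-loop body is elided:

```python
import torch

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.8)

for epoch in range(num_epochs):  # num_epochs: training length, not stated in the paper
    # ... one pass over the training mini-batches ...
    scheduler.step()             # decays the learning rate to 80% every 20 epochs
```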


5.1.4. Traditional Methods for Comparison

In this paper, constellation diagrams and time–frequency images are used for performance comparison. These two methods convert IQ signals into constellation diagrams and time–frequency images, respectively, and then use the existing CNNs for classification. The process of these two methods is described below.

Constellation diagram: Take the radio signal's I channel as the x-coordinate and the corresponding Q channel as the y-coordinate, and then specify the number of bins for both the abscissa and ordinate dimensions to be 224 to calculate the two-dimensional histogram of the I channel and Q channel data. For each signal, we can get a 224 × 224 sized matrix, and the matrix is copied and expanded into an image of 224 × 224 × 3, which is used as the input of CNNs.

Time–frequency image: The short-time Fourier transform (STFT) is used to obtain the time–frequency image of the radio signal. First, the original 1024-length signal is expanded to 1116 length by repeating the first 92 values of the radio signal. We then compute the STFT of the signal. The window used is the Hamming window, the length of the window is set as 224, the overlap number of the window is set as 220, and the length of the STFT is set as 224. Through the above processing, a radio signal of 1116 length in complex form can be transformed into a matrix of 224 × 224 size in complex form. A time–frequency image of size 224 × 224 × 3 can be obtained by taking and combining the real part, imaginary part and amplitude of the matrix. This time–frequency image is used as the input of CNNs to realize the classification task.
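Both baseline conversions are a few lines each; a sketch with NumPy and SciPy, where cropping the STFT output to exactly 224 frames is our simplification of the framing described above:

```python
import numpy as np
from scipy.signal import stft

def constellation_image(r, bins=224):
    """2-D histogram of (I, Q) with 224 bins per axis, copied to 3 channels."""
    h, _, _ = np.histogram2d(r.real, r.imag, bins=bins)
    return np.repeat(h[:, :, None], 3, axis=2)            # 224 x 224 x 3

def time_frequency_image(r):
    """STFT image: Hamming window of 224, overlap 220, FFT length 224."""
    x = np.concatenate([r, r[:92]])                        # expand 1024 -> 1116 samples
    _, _, z = stft(x, window="hamming", nperseg=224,
                   noverlap=220, nfft=224, return_onesided=False)
    z = z[:, :224]                                         # keep a 224 x 224 complex matrix
    return np.stack([z.real, z.imag, np.abs(z)], axis=-1)  # 224 x 224 x 3
```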
5.2. The Experimental Results

5.2.1. Comparison of Training Strategies

At first, we compare the classification performance of the two training strategies mentioned above on dataset Sig1024. The method used is the signal rearrangement method and the model used is ResNet18. The first training strategy only adjusts the weights of the last classification layer during training, and the second training strategy trains the whole network from the start. Figure 4 shows the classification accuracy of the two training strategies under different SNRs, and Table 2 shows the overall classification accuracy of the two training strategies on the whole test dataset. Although the computational complexity of the first training strategy is low, the classification effect is unsatisfactory. In the following simulations, we adopt the strategy of training the entire network.

Figure 4. Performance comparison between training the entire network and training only the classification layer. The classification method is the signal rearrangement method (SRM), the model used is ResNet18, and the dataset is Sig1024.


Table 2. Overall classification accuracy of training the entire network and training only the classification layer. The classification method is SRM, the model used is ResNet18, and the dataset is Sig1024.

Methods    Train the Classification Layer    Train the Entire Network
Accuracy   32.30%                            65.99%

5.2.2. Comparison with Traditional Methods

We now compare the performance of our proposed method with the traditional constellation diagram and time–frequency image methods on dataset Sig1024. ResNet18 is used to carry out the classification tasks. Figure 5 shows the classification accuracy of the four methods under different SNRs, and Table 3 shows the overall classification accuracy of the four methods. As can be seen from Figure 5, the performance of the signal rearrangement method and the convolution mapping method proposed in this paper is almost the same, and the classification accuracy can approach 100% at high SNR. The method based on time–frequency images performs better at low SNR, but has lower classification accuracy at high SNR. The method using the constellation diagrams generated from IQ signals has relatively poor results under both low and high SNR. On the whole, the two proposed methods have better performance than the traditional constellation diagram method and time–frequency image method, which shows the superiority of our proposed methods.

Figure 5. The performance of the radio–image transformer (RIT) methods on dataset Sig1024, compared with the constellation diagram method and time–frequency image method.

Table 3. Comparison of the overall classification accuracy of the four methods. The dataset is Sig1024.

Methods    SRM      Constellation   Time–Frequency   CMM
ResNet18   65.99%   40.27%          55.79%           65.49%

5.2.3. Performance of Different ImageNet Models

Using the convolution mapping method, we compare the classification accuracy obtained by using different ImageNet classification models on dataset Sig1024, including the typical convolutional neural networks ResNet18, ResNet101, VGG16 and DenseNet121, as well as the lightweight models MobileNet, ShuffleNet and MNASNet. Table 4 shows the overall classification accuracy of these models and Figure 6 shows the classification accuracy of these models under different SNRs. We can see that, except for the relatively low classification accuracy of the MNASNet model, the classification results of the other models have little difference. Among them, DenseNet121 has the highest overall accuracy.

Table 4. Comparison of the overall classification accuracy of different models. The convolution mapping method was adopted.

CNNs                 Accuracy
ResNet18             65.49%
ResNet101            66.47%
VGG16_bn             65.43%
DenseNet121          67.47%
MobileNet_V2         66.63%
ShuffleNet_v2_x1_0   65.06%
MNASNet1_0           49.27%

Figure 6. Comparison of the performance of different ImageNet classification network models. The convolution mapping method was adopted.


5.2.4. Performance with Different Sampling Rates

We test the performance of our method on dataset Sig1024_2 with different sampling rates. We use the CMM and the MobileNet_V2 model, and the classification accuracy at each SNR is shown in Figure 7. We can see that in this case, our method still has good classification performance. This shows that the method can be used to classify radio signals with different sampling rates as long as the model has been trained with signal samples at these sampling rates.

Figure 7. Classification accuracy of each SNR of dataset Sig1024_2 with 2 different sampling rates.

5.2.5. Performance with Real Measured Data

We also verify the effectiveness of our method by using the real measured dataset RealSig to test the MobileNet_V2 model trained in the third experiment. Figure 8 shows the classification accuracy of each SNR. It is clear that the model we trained on the simulation dataset Sig1024 with CMM and MobileNet_V2 also has good classification accuracy for the real measured 16QAM and 64QAM signals.

Figure 8. Classification accuracy of each SNR of the real dataset RealSig. The classification method is CMM and the CNN model used is MobileNet_V2.

To further improve the performance, we use the training set of dataset RealSig to fine-tune the MobileNet_V2 model trained with dataset Sig1024, with a learning rate of 0.00001, and then test the model's classification performance on the test set. The classification accuracy of each SNR is shown in Figure 9. We can see that the classification accuracy can be improved to nearly 100% at high SNR.
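The fine-tuning step is the same training loop with a 100× smaller learning rate; a sketch, where the RealSig data loader and the epoch count are our assumptions:

```python
import torch

optimizer = torch.optim.Adam(net.parameters(), lr=1e-5)  # reported fine-tuning rate
for epoch in range(10):                                   # epoch count: our assumption
    for images, labels in realsig_train_loader:           # hypothetical RealSig DataLoader
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()
        optimizer.step()
```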


Figure 9. Classification accuracy of each SNR of the real dataset RealSig. The MobileNet_V2 model is fine-tuned.

5.2.6. Performance of OFDM Signal Classification

Finally, we carry out experiments on dataset OFDMSig to verify the effectiveness of our method, using CMM and MobileNet_V2. The classification result is shown in Figure 10. It can be seen that our method also has a good classification performance on the dataset OFDMSig, achieving a classification accuracy of 100% at high SNR.

Figure 10. Classification accuracy of each SNR of dataset OFDMSig.

6. Conclusions

In this paper, we have proposed a RIT that transforms radio signals into images, and then the existing ImageNet classification network models are utilized to realize radio modulation classification, thus building a bridge between radio modulation classification and ImageNet classification. The contribution of this paper is to transform the signal modulation classification problem into the image classification problem by the proposed RIT, enabling signal classification using ImageNet models from the image classification domain. Experiments have shown that the performance of this method is better than those of the constellation diagram method and time–frequency image method. The performance comparison of different ImageNet classification network models shows that, except for MNASNet1_0, the other six models have similar performance, which indicates that this method has relatively loose requirements for ImageNet classification network models. Furthermore, the experimental results show that our method has good classification performance for signal datasets with different sampling rates, and our method can also handle OFDM signals and real measured signal data well. The method in this paper establishes a relationship between radio modulation classification and ImageNet image classification, two seemingly unrelated tasks, and verifies the cross-domain transfer ability of ImageNet classification network models. In our future work, we will consider applying a similar RIT to other radio signal classification tasks, such as interference classification, coding recognition, and protocol identification.


Author Contributions: Conceptualization, S.C.; methodology, K.Q. and S.Z.; software, K.Q.; validation, S.C., S.Z. and Q.X.; formal analysis, S.C.; investigation, S.Z.; resources, S.Z.; data curation, K.Q. and S.Z.; writing—original draft preparation, S.C., K.Q. and S.Z.; writing—review and editing, Q.X. and X.Y.; visualization, K.Q.; supervision, X.Y.; project administration, X.Y.; funding acquisition, Q.X. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Haykin, S.; Setoodeh, P. Cognitive Radio Networks: The Spectrum Supply Chain Paradigm. IEEE Trans. Cogn. Commun. Netw. 2015, 1, 3–28. [CrossRef]
2. Yucek, T.; Arslan, H. A survey of spectrum sensing algorithms for cognitive radio applications. IEEE Commun. Surv. Tutor. 2009, 11, 116–130. [CrossRef]
3. Zheng, S.; Chen, S.; Yang, L.; Zhu, J.; Luo, Z.; Hu, J.; Yang, X. Big Data Processing Architecture for Radio Signals Empowered by Deep Learning: Concept, Experiment, Applications and Challenges. IEEE Access 2018, 6, 55907–55922. [CrossRef]

4. Rajendran, S.; Calvo-Palomino, R.; Fuchs, M.; Van den Bergh, B.; Cordobes, H.; Giustiniano, D.; Pollin, S.; Lenders, V. Electrosense: Open and Big Spectrum Data. IEEE Commun. Mag. 2018, 56, 210–217. [CrossRef]
5. Ulversoy, T. Software Defined Radio: Challenges and Opportunities. IEEE Commun. Surv. Tutor. 2010, 12, 531–550. [CrossRef]
6. Emam, A.; Shalaby, M.; Aboelazm, M.A.; Bakr, H.E.A.; Mansour, H.A.A. A Comparative Study between CNN, LSTM, and CLDNN Models in the Context of Radio Modulation Classification. In Proceedings of the 2020 12th International Conference on Electrical Engineering (ICEENG), Cairo, Egypt, 7–9 July 2020.
7. Kim, S.H.; Kim, C.Y.; Yoo, S.H.; Kim, D.-S. Design of Deep Learning Model for Automatic Modulation Classification in Cognitive Radio. J. Korean Inst. Commun. Inf. Sci. 2020, 45, 1364–1372.
8. Zhu, Z.; Nandi, A.K. Automatic Modulation Classification: Principles, Algorithms and Applications; John Wiley & Sons: Hoboken, NJ, USA, 2015; ISBN 9781118906491.
9. Dobre, O.A.; Abdi, A.; Bar-Ness, Y.; Su, W. Survey of automatic modulation classification techniques: Classical approaches and new trends. IET Commun. 2007, 1, 137–156. [CrossRef]
10. Swami, A.; Sadler, B.M. Hierarchical digital modulation classification using cumulants. IEEE Trans. Commun. 2000, 48, 416–429. [CrossRef]
11. Soliman, S.S.; Hsue, S.-Z. Signal classification using statistical moments. IEEE Trans. Commun. 1992, 40, 908–916. [CrossRef]
12. Grimaldi, D.; Rapuano, S.; Vito, L.D. An Automatic Digital Modulation Classifier for Measurement on Telecommunication Networks. IEEE Trans. Instrum. Meas. 2007, 56, 1711–1720. [CrossRef]
13. Majhi, S.; Gupta, R.; Xiang, W.; Glisic, S. Hierarchical Hypothesis and Feature-Based Blind Modulation Classification for Linearly Modulated Signals. IEEE Trans. Veh. Technol. 2017, 66, 11057–11069. [CrossRef]
14. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
15. Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
16. O'Shea, T.J.; Corgan, J.; Clancy, T.C. Convolutional radio modulation recognition networks. Proc. Int. Conf. Eng. Appl. Neural Netw. 2016, 629, 213–226.
17. O'Shea, T.J.; Roy, T.; Clancy, T.C. Over-the-Air Deep Learning Based Radio Signal Classification. IEEE J. Sel. Top. Signal Process. 2018, 12, 168–179. [CrossRef]
18. Zheng, S.; Qi, P.; Chen, S.; Yang, X. Fusion Methods for CNN-Based Automatic Modulation Classification. IEEE Access 2019, 7, 66496–66504. [CrossRef]
19. Rajendran, S.; Meert, W.; Giustiniano, D.; Lenders, V.; Pollin, S. Deep Learning Models for Wireless Signal Classification With Distributed Low-Cost Spectrum Sensors. IEEE Trans. Cogn. Commun. Netw. 2018, 4, 433–445. [CrossRef]
20. Peng, S.; Jiang, H.; Wang, H.; Alwageed, H.; Zhou, Y.; Sebdani, M.M.; Yao, Y. Modulation Classification Based on Signal Constellation Diagrams and Deep Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 718–727. [CrossRef]
21. Akeret, J.; Chang, C.; Lucchi, A.; Refregier, A. Radio frequency interference mitigation using deep convolutional neural networks. Astron. Comput. 2017, 18, 35–39. [CrossRef]
22. Czech, D.; Mishra, A.; Inggs, M. A CNN and LSTM-based approach to classifying transient radio frequency interference. Astron. Comput. 2018, 25, 52–57. [CrossRef]
23. Selim, A.; Paisana, F.; Arokkiam, J.A.; Zhang, Y.; Doyle, L.; DaSilva, L.A.
Spectrum Monitoring for Radar Bands Using Deep Convolutional Neural Networks. In Proceedings of the GLOBECOM 2017—2017 IEEE Global Communications Conference, Singapore, 4–8 December 2017; pp. 1–6.
24. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [CrossRef]
25. Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. Lect. Notes Comput. Sci. 2014, 8689, 818–833.
26. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
27. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.

28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
29. Iandola, F.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2017, arXiv:1602.07360.
30. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807.
31. Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
32. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856.
33. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).