
Article

Decision Tree-Based Sensitive Information Identification and Encrypted Transmission System

Shuang Liu, Ziheng Yang *, Yi Li and Shuiqing Wang Electronic Engineering College, Heilongjiang University, Harbin 150080, China; [email protected] (S.L.); [email protected] (Y.L.); [email protected] (S.W.) * Correspondence: [email protected]

Received: 29 December 2019; Accepted: 4 February 2020; Published: 7 February 2020

Abstract: With the advent of the information age, the effective identification of sensitive information and the prevention of its leakage during transmission have become increasingly pressing issues. We designed a sensitive information recognition and encrypted transmission system based on a decision tree. By training on sensitive data to build a decision tree, unknown data can be classified and identified. The identified sensitive information can be marked and encrypted to achieve intelligent recognition and protection of sensitive information. This lays the foundation for the development of an information recognition and encrypted transmission system.

Keywords: decision tree; hardware encryption; ZYNQ; SM4

1. Introduction

Encryption technology is an important measure for information security. Therefore, this paper adopts FPGA (Field Programmable Gate Array) hardware to implement the domestic SM4 encryption algorithm (a block cipher standard adopted by the government of the People's Republic of China), and proposes a hardware encryption transmission design scheme based on a fully programmable SOC (System on Chip) platform [1,2]. Furthermore, we make use of an FPGA and ARM (Advanced RISC Machine) processor co-design to enhance the practicability and extensibility of the encryption system. In the FPGA development environment, the PCIe (Peripheral Component Interconnect express) interface design is implemented with the IPI (Intellectual Property Integrator) tool to perform high-speed data transmission between the sending end and the FPGA board card [3]. The Verilog HDL (Hardware Description Language) language is used to implement the data encryption and decryption operations of the SM4 block cipher algorithm, which is encapsulated into an independent encryption IP (Intellectual Property) core and called in the system [4].

In this paper, a face recognition technique using a decision tree is proposed to encrypt the matched faces. When receiving the photos of people who need to be queried, the system automatically reads the camera information and extracts the people, so as to complete face recognition. After identifying important people, we add their tag information and encrypt the transmission with the hardware; this achieves the effect of intelligent recognition and the protection of sensitive information.

The information security problem is related not only to personal privacy but also to business interests and national security; therefore, ensuring information security has become an important problem that must be solved in this new era of development. Implementing the encryption algorithm on an FPGA provides a set of solutions for the hardware encryption of important information in network information security.

In the field of communications, with the improvement of bandwidth and performance requirements, traditional buses have gradually been phased out. The PCIe bus, as a third-generation interconnect bus, successfully solved the bottleneck problem of high-speed data transmission and found wide application [5,6], which also means that the era in which the traditional parallel bus develops towards the high-speed serial bus has arrived. The performance of the PCIe bus is excellent. It supports point-to-point serial connections between chips and devices and is characterized by low overhead and low latency, thus greatly improving the effective bus data bandwidth [7,8]. In high-speed data transmission and storage systems, transmission through the PCIe bus has become the first choice. In this design, high-speed data transmission is also required for encryption and decryption. Therefore, the PCIe interface on the FPGA board card was developed to realize high-speed communication between the transmitter and the FPGA by using its DMA (Direct Memory Access) function.

Software encryption has the advantages of simple implementation, low cost, and convenient updating, but it occupies a large amount of CPU resources and can cause system stalls. Additionally, it is easy for attackers to attack it by means such as program tracing and decompiling [9]; it has poor security, its key management is complex, and implementation is difficult. Hardware encryption is mainly implemented by a special encryption card or an FPGA, with a fast encryption and decryption speed and a relatively complicated encryption process, and it offers protection against being traced and attacked from the outside [10,11]; the hardware is also difficult to crack. Therefore, the security and reliability of hardware encryption are relatively high, which is why hardware encryption is being adopted by more and more enterprises.

A decision tree is a simple but widely used classifier. By constructing the decision tree with the training data, unknown data can be classified efficiently. The decision tree has two advantages:

• The decision tree model is readable and descriptive, which is helpful for manual analysis;
• It is highly efficient: the decision tree only needs to be constructed once and can be used repeatedly, and the maximum computation time of each prediction does not exceed the depth of the decision tree [12].

2. Sensitive Information Identification

2.1. Decision Tree

The decision tree algorithm is a method to approximate the value of discrete functions. It is a typical classification method that first processes the data, generating readable rules and decision trees using inductive algorithms, and then analyzes new data using the decision tree [13,14]. In essence, a decision tree is a process of classifying data through a series of rules. Decision tree algorithms first emerged between the 1960s and the late 1970s. The ID3 algorithm was proposed by J. Ross Quinlan; it aims to reduce the depth of the tree, but ignores the study of the number of leaves [15]. The C4.5 algorithm improved on the ID3 algorithm: the handling of missing values in predictive variables, pruning technology, and rule derivation were greatly improved, making it suitable for both classification and regression problems [16,17].

The decision tree algorithm constructs a decision tree to discover the classification rules contained in the data. How to construct a decision tree with high precision and a small scale is the core content of a decision tree algorithm. The classification problem represents the process of classifying instances based on features, which can be considered as an "if-then" set, or as a conditional probability distribution defined on the feature space and class space. Decision tree learning usually has three steps: feature selection, decision tree generation, and decision tree pruning [18].

Classification by decision tree: starting from the root node, a certain feature of the instance is tested, and the instance is assigned to the next node according to the test result; each child node corresponds to one value of the feature. The instance is tested and assigned recursively until it reaches a leaf node, and finally the instance is assigned to the leaf node's class.

2.2. Information Gain

2.2.1. The Meaning of Information Entropy to a Decision Tree

Because there are many bifurcation cases in the process of constructing a decision tree, the decision tree introduces a very important concept: information entropy. We apply "information entropy" as an index to measure the uncertainty (purity) of the sample set, use the information gain as a measure of purity, and select the feature with the largest information gain to split on (as the node). The larger the information gain, the more concise the tree will be, because this property is the root of the tree; therefore, the decision tree construction principle is to choose the maximum information gain. Intuitively, if a feature has better classification ability, or if the training dataset is divided into subsets according to this feature so that each subset has the best classification under the current conditions, then this feature should be preferred. Information gain is a good representation of this intuitive criterion. Entropy is defined as the expected value of information. If the thing to be classified may be divided into multiple classes, the information of the symbol $x_i$ is defined as

$l(x_i) = -\log_2 p(x_i)$, (1)

where $p(x_i)$ is the probability of choosing category $x_i$. In order to calculate the entropy, we need to calculate the expected value of the information contained in all possible values of all categories, which can be obtained by the following formula:

$H = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$, (2)

where n is the number of classifications; the greater the entropy, the greater the uncertainty of the random variable. When the probabilities in the entropy are obtained from data estimates (especially maximum likelihood estimates), the corresponding entropy is called the empirical entropy, given by

$H(D) = -\sum_{k} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}$. (3)

Before we get to the information gain, we should clarify what is meant by conditional entropy. The information gain represents the degree to which knowing feature X reduces the uncertainty about class Y. The conditional entropy H(Y|X) represents the uncertainty of the random variable Y under the condition that the random variable X is known; it is defined by

$H(Y|X) = \sum_{i=1}^{n} p_i H(Y|X = x_i)$, (4)

where $p_i = P(X = x_i)$. When the probabilities in the entropy and the conditional entropy are obtained by data estimation (especially maximum likelihood estimation), the corresponding values are called the empirical entropy and the empirical conditional entropy, respectively. In this case, if a probability is zero, we set $0 \log 0 = 0$.

2.2.2. Information Gain

Information gain is defined relative to a feature. The information gain g(D, A) of feature A with respect to the training dataset D is defined as the difference between the empirical entropy H(D) of set D and the empirical conditional entropy H(D|A) of D given feature A, namely,

$g(D, A) = H(D) - H(D|A)$. (5)

Generally, the difference between the entropy H(D) and the conditional entropy H(D|A) is called the mutual information; the information gain is thus equivalent to the mutual information of the classes and features in the training dataset. The size of the information gain value is relative to the training dataset and has no absolute significance. When the classification problem is difficult, that is, when the empirical entropy of the training dataset is large, the information gain value is large; otherwise, it is small. The information gain ratio, another criterion for feature selection, corrects this problem. The information gain ratio gR(D, A) of feature A with respect to training dataset D is defined as the ratio of its information gain g(D, A) to the empirical entropy of training dataset D:

$g_R(D, A) = \frac{g(D, A)}{H(D)}$. (6)
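For illustration, the following Python sketch (ours, not part of the original system) computes the empirical entropy, conditional entropy, information gain, and gain ratio of Equations (2)–(6) on a toy feature; all names are our own:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Empirical entropy H(D), Equation (3)."""
    n = len(labels)
    return -sum((c / n) * np.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature, labels):
    """Empirical conditional entropy H(D|A), Equation (4)."""
    n = len(labels)
    total = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        total += (len(subset) / n) * entropy(subset)
    return total

def information_gain(feature, labels):
    """g(D, A) = H(D) - H(D|A), Equation (5)."""
    return entropy(labels) - conditional_entropy(feature, labels)

def gain_ratio(feature, labels):
    """gR(D, A) = g(D, A) / H(D), Equation (6), as defined in the text."""
    return information_gain(feature, labels) / entropy(labels)

# Toy example: one binary feature A against binary class labels y.
A = [0, 0, 1, 1, 1, 0]
y = [0, 0, 1, 1, 0, 0]
print(information_gain(A, y), gain_ratio(A, y))  # ~0.459, ~0.5
```

At each node, a decision tree built under this criterion would split on the feature with the largest gain (or gain ratio).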

2.3. Constructing a Classifier with Feature Faces

At the core of the eigenface is an unsupervised dimensionality reduction technique called principal component analysis (PCA). We demonstrate how to apply this general technology to face recognition. Face recognition is the challenge of classifying faces from the input image. A simple way is to take a new image, flatten it into a vector, and calculate the Euclidean distance between it and all other flattened images in the database. If we have a large face database, then the comparison will take a while for each face. The larger our dataset, the slower our algorithm. However, more faces also produce better results. We want a system that is both fast and accurate. We want to take high-dimensional images and compress them to smaller dimensions, while preserving the essence or important parts of the image.

Dimensionality reduction is unsupervised learning. We want to take high-dimensional data (such as images) and represent them in a low-dimensional space. In the simple 2D case, we want to find a line to project the points onto. After projecting the points, we will have the data in 1D instead of 2D. Similarly, if we have 3D data, we want to find a plane to project these points down to, reducing the dimension of the data from 3D to 2D.

2.3.1. Principal Component Analysis

Principal component analysis (PCA) is a multivariate statistical analysis method in which a number of variables are transformed by a linear transformation to select a small number of important variables. The core steps involved in PCA selecting these features are as follows:

1. Zero averaging: take the mean of each column, and then subtract that mean from all the numbers in the column;
2. Find the covariance matrix of this matrix;
3. Compute the eigenvalues and eigenvectors;
4. Retain the main components (i.e., the first n components with the largest eigenvalues).

In the actual dimensionality reduction, it is often necessary to rotate the coordinate axes to cover as much data as possible, that is, to find the direction of the largest variance of the data, which often carries the most important information of the data. After we find it, we record that direction as the x-axis; the y-axis is then perpendicular to the x-axis. Therefore, the idea is to subtract the mean from each element of the original matrix, then find the covariance of the result, and finally find the eigenvalues and eigenvectors of the covariance matrix.
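These steps can be written compactly in NumPy; the following is a minimal sketch of our own, with the principal components stored as rows of E:

```python
import numpy as np

def pca(D, n_components):
    """PCA following the four steps above; D holds one sample per row."""
    mean = D.mean(axis=0)
    centered = D - mean                      # 1. zero averaging
    cov = np.cov(centered, rowvar=False)     # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # 3. eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1][:n_components]
    E = eigvecs[:, order].T                  # 4. keep top components as rows
    return mean, E

# Usage: project data onto the retained components.
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))
mean, E = pca(D, n_components=2)
P = (D - mean) @ E.T                         # reduced representation
```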

2.3.2. Building a Decision Tree with Sklearn

Sklearn.tree provides a decision tree model for solving classification and regression problems. Now that we have discussed PCA and eigenfaces, we discuss how to write a face recognition algorithm using sklearn. First, we need a dataset. The dataset for this project is the Cropped Yale face dataset, which is an open dataset. Each folder is related to a person and contains multiple photos of that person. If you want to create your own facial dataset, you will need multiple pictures of each person's face (from different angles and in different lighting) and the true labels. The more varied the faces you use, the better your recognizer will be. The easiest way to create a dataset for facial recognition is to create a folder for each person and then put the facial images into it. The pictures should all be the same size, and they should be stored in a NumPy array of size (num_examples, height, width). Finally, we divide the dataset into training and test sets. We must choose the number of components to reduce to, that is, the output dimension (the number of feature vectors to project onto), and adjust this parameter to try to get the best results. Now that we have smaller facial representations, we apply a classifier that takes the dimensionality-reduced input and generates a class label.
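As a hedged sketch of such a pipeline (the dataset is replaced here by stand-in arrays so the snippet is self-contained; n_components is the tunable output dimension):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Stand-in for a real (num_examples, height * width) array of flattened
# grayscale faces and their identity labels.
rng = np.random.default_rng(0)
X = rng.random((380, 32 * 32))
y = np.repeat(np.arange(38), 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Number of eigenfaces to project onto; adjust to get the best results.
pca = PCA(n_components=50).fit(X_train)
clf = DecisionTreeClassifier(random_state=0).fit(
    pca.transform(X_train), y_train)

pred = clf.predict(pca.transform(X_test))
print("test accuracy:", accuracy_score(y_test, pred))
```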

3. Encrypted Transmission System

3.1. Basic Functions of the SM4 Encryption Algorithm

Aiming at the problem of wireless LAN equipment security, the SM4 algorithm, as the first commercial block cipher algorithm published by China's cryptography administration, has attracted much attention since its release. In 2012, the SM4 algorithm was established as a national cryptographic industry standard in China. Both the plaintext block length and the key length of the SM4 algorithm are 128 bits, and the key expansion process and the encryption and decryption processes all use 32 rounds of nonlinear iterative operations. The SM4 algorithm has a typical Feistel structure. This structure is symmetrical, so the decryption process is very similar to the encryption process; the only difference is that the subkeys are used in reverse order, i.e., the decryption subkey sequence is the reverse of the encryption subkey sequence. The encrypted and decrypted data blocks are likewise 128 bits.

3.2. SM4 Encryption Algorithm Encryption and Decryption Process

Both the data packet length and the key length of the SM4 block encryption algorithm are 128 bits. In addition, both encryption and decryption consist of 32 rounds of iteration, and each round of iteration requires the participation of a round key. Let the 128-bit input plaintext be $(X_0, X_1, X_2, X_3)$. The input round keys are $rk_i$, $i = 0, 1, \ldots, 31$, for a total of 32, and the 128-bit output ciphertext is $(Y_0, Y_1, Y_2, Y_3)$. Then, the encryption algorithm can be expressed as:

$X_{i+4} = F(X_i, X_{i+1}, X_{i+2}, X_{i+3}, rk_i) = X_i \oplus T(X_{i+1} \oplus X_{i+2} \oplus X_{i+3} \oplus rk_i)$, (7)

$(Y_0, Y_1, Y_2, Y_3) = R(X_{32}, X_{33}, X_{34}, X_{35}) = (X_{35}, X_{34}, X_{33}, X_{32})$, (8)

where the R function is the reverse-order output introduced earlier. The decryption algorithm of the SM4 block cipher has the same architecture as the encryption algorithm, except that the 32 round keys used in decryption are applied in reverse order. When encrypting, the input order of the round keys is rk0, rk1, rk2, ..., rk31; when decrypting, it is rk31, rk30, rk29, ..., rk0.

Figure 1 shows the implementation flowchart of the SM4 encryption algorithm. Firstly, 128-bit plaintext or ciphertext is input and divided into four groups. The key expansion process produces 32 rounds of subkeys, which participate in the encryption algorithm through the round function. After 32 rounds of iteration, the encryption or decryption operation is completed, and the encrypted ciphertext is output after the reverse-order operation. For the decryption algorithm, only the round keys are used in reverse order; the final output data are likewise transformed in reverse order and then combined for output.
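The round structure of Equations (7) and (8) can be sketched as follows; this is a structural illustration only, with the SM4 mixer substitution T (S-box plus linear transform) passed in as a placeholder and the round keys assumed to come from key expansion:

```python
def sm4_rounds(block, rk, T):
    """32 nonlinear iterations, Equation (7), then reverse order, Equation (8).

    block: (X0, X1, X2, X3), four 32-bit words of a 128-bit block.
    rk:    the 32 round keys (pass them reversed to decrypt).
    T:     the SM4 round substitution (S-box + linear transform),
           left as a placeholder in this sketch.
    """
    X = list(block)
    for i in range(32):
        # X[i+4] = X[i] XOR T(X[i+1] XOR X[i+2] XOR X[i+3] XOR rk[i])
        X.append(X[i] ^ T(X[i + 1] ^ X[i + 2] ^ X[i + 3] ^ rk[i]))
    # Reverse-order output R: (Y0, Y1, Y2, Y3) = (X35, X34, X33, X32)
    return (X[35], X[34], X[33], X[32])

# Decryption uses the same structure with the round keys reversed:
# plain = sm4_rounds(cipher, rk[::-1], T)
```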

Figure 1. SM4 encryption implementation block diagram.

The SM4 encryption core is mainly divided into the control module and the function realization module.

The main function of the control module is to identify the type of the data frames transmitted and store them accordingly. When a data frame is sent to the protocol analysis module, it is first identified according to the identification flag FFE25D of the frame header (this frame header is selected because it is the byte sequence with the lowest probability in the Barker code, which is not easily confused with erroneous data and is easy to identify). The data frame carries valid data, and the frame header information is then received. The frame header information includes the length of the data and the type of the data. A case statement is then used to determine the type of data frame. According to the function of the encryption core, data frames are divided into three types: key (00), initial plaintext (01), and initial ciphertext (02). After determining the frame type, the corresponding FIFO (First Input First Output) channel is enabled to store the data, which then waits for the enable signal of the function module for the next operation.

3.3. SM4 Encryption IP Core Design

The SM4 encryption IP core is divided into two modules: the control module and the function realization module. The internal design block diagram of the encryption core is shown in Figure 2. The control module comprises the protocol analysis module and the storage module, while the function realization module comprises the key expansion module, the encryption module, and the decryption module.

In order to meet the requirements of the different functions of the data in the process, the data is first sent to the control module of the encryption core. The protocol analysis module first buffers the valid data in FIFOs of different categories according to the data's frame header, and then sends it to the channels of the different functions. Because the block encryption mode operates on an integer multiple of the block length, when the data bit width is wrong, a FIFO must be used for bit width conversion. In the function realization module, the key expansion function is carried out first, and the generated subkeys are provided to the encryption and decryption modules for the iterative operation.


The transmitter first sends the 128-bit initial key to the encryption core through the PCIe interface; the key is expanded into 32 subkeys in the key expansion module, the expanded subkeys are stored in the RAM, and an expansion completion signal is sent to the encryption and decryption modules after the expansion. The encryption and decryption modules each output ciphertext or plaintext after 32 rounds of iteration. The encryption core operates in full-duplex mode: the encryption and decryption modules are two separate modules that do not interfere with each other, so encryption and decryption work can proceed at the same time. However, decryption needs the subkeys in reverse order, and the subkey used in its first round is the one generated last during key expansion; it is therefore worth noting that the key expansion work must be completed before the work in the encryption and decryption modules begins.

Figure 2. SM4 encryption core internal design block diagram.

3.4. Hardware Implementation

3.4.1. PL-Side Design and Implementation

In the basic configuration of the XDMA (Xilinx DMA IP core) module, we use the X4 link channel design. The transmission method of the DMA is AXI (Advanced eXtensible Interface) memory mapped. The bandwidth of a single channel can reach 5 GT/s. The AXI4 interface is selected. The data bus bit width is set to 128 bits, the PCIe reference clock to 100 MHz, and the 125 MHz AXI interface clock provides clocks for the other design modules on the PL (Programmable Logic) side. The IP core can be configured with eight independent DMA channels, comprising four H2C channels and four C2H channels. Through these channels, read and write transaction packets can be generated to the PCIe integrated module or the host computer.

For the setting of the clock module, first, the differential clock is connected to the utility buffer IP through the pcie sys_clk external interface, and the reference clock frequency is set to 100 MHz. We use XDMA's sys_clk interface to input the clock signals, allowing it to generate its own internal clock and provide external output clock signals. The AXI4 clock domain is output from the axi_aclk clock interface on the other end of the IP and provides the clock for the other IP cores with AXI interfaces on the PL side. The PL-side PCIe module design diagram is shown in Figure 3.
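On the host side, the Xilinx XDMA Linux driver typically exposes each DMA channel as a character device; the following sketch (the device node names are an assumption about a typical driver installation, not taken from the paper) shows how a buffer could be pushed to the board over an H2C channel and read back over a C2H channel:

```python
import os

# Typical device nodes created by the Xilinx XDMA Linux driver;
# the exact names may differ on a given installation.
H2C_DEV = "/dev/xdma0_h2c_0"   # host-to-card DMA channel
C2H_DEV = "/dev/xdma0_c2h_0"   # card-to-host DMA channel

def dma_write(data: bytes, offset: int = 0) -> None:
    """Send a buffer to the board over one H2C DMA channel."""
    fd = os.open(H2C_DEV, os.O_WRONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        os.write(fd, data)
    finally:
        os.close(fd)

def dma_read(length: int, offset: int = 0) -> bytes:
    """Read a buffer back from the board over one C2H DMA channel."""
    fd = os.open(C2H_DEV, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)

# Example: push one 128-bit plaintext block down to the SM4 core,
# then read the 128-bit ciphertext back.
dma_write(bytes(16))
ciphertext = dma_read(16)
```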

Figure 3. PL (Programmable Logic)-side PCIe (Peripheral Component Interconnect express) module design diagram.

The internal block diagram of the encryption module is shown in Figure 4. The encryption module reads the 32 round keys from the subkey storage module rk_ram_en, and they participate in the iterative encryption operation. The 32 encrypted sub-ciphertexts are stored in the sub-ciphertext storage module cipher_ram. The final ciphertext synthesis module cipher_result reads the last four rounds of sub-ciphertext and reverses them into the final ciphertext data.

Figure 4. Block diagram of the encryption module.

The internal block diagram of the decryption module is shown in Figure 5. The decryption module reads the 32 round keys from the subkey storage module rk_ram_en, and they participate in the iterative decryption operation. The 32 decrypted sub-plaintexts are stored in plain_ram, and the final plaintext synthesis module plain_result reads the last four rounds of plaintext and reverses them into the final plaintext data.

Figure 5. Decryption module internal block diagram.

After we implement the two major modules of control and function in the encryption core, the last thing to do is to encapsulate the SM4 encryption module into a complete IP core, as shown in Figure 6, and call it in the system design. Because inter-module communication uses the AXI bus protocol, we encapsulate the encryption core into a module with an AXI interface and call it in the system.


Figure 6. AXI (Advanced eXtensible Interface) interface SM4 encryption IP core.

After encapsulating the IP, we select the repository manager in the IP settings to add the IP project address; then we can call our IP from the user repository in the IP catalog.

3.4.2. Communication between PL (Programmable Logic) and PS (Processing System)

When the plaintext data to be encrypted is transmitted from the PCIe interface on the PL side to the SM4 encryption IP core, the processed data needs to be stored, because the PCIe and SM4 encryption IP cores mainly work off the AXI interface clock derived from XDMA, which is 125 MHz, whereas the PS side derives a 100 MHz clock from the ZYNQ (Industry's First Scalable Processing Platform from Xilinx) IP core. Therefore, when the PL side wants to communicate with the PS side, it needs to cross the clock domain. In order to meet this condition and store the data, the design uses an asynchronous FIFO.

The encrypted data output from the encryption core is buffered in the FIFO. After the PS end is enabled, the data is read from the FIFO. The main interface of the FIFO is connected to the slave interface of the AXI DMA. The main function of the AXI DMA is to provide high-bandwidth direct memory access between memory and AXI interface peripherals. It has a complete on-chip DMA transfer function, which frees the PS (Processing System)-side CPU from the data transfer task. In ZYNQ, the AXI DMA is the bridge for the FPGA to access the DDR3 (third-generation double-data-rate synchronous dynamic random access memory), but this process is monitored and managed by the ARM (Advanced RISC Machine) core.

The data transmitted through the AXI_DMA is connected to the ZYNQ processor through the AXI interconnect. For the ARM processing core, we need to define some settings. The AXI_HP interface on ZYNQ is a high-speed interface with high-performance transmission. The PL module can be connected as a master device; this interface is mainly used by the PL to access the memory on the PS. We use this interface here to connect the PS-side DDR for high-speed encrypted data storage transition. We first need to enable the HP interface in the processor IP. When the data transfer is complete, we also need to enable the interrupt interface and notify the ARM side to perform the corresponding processing. Later, on the PS side, the GPIO is used to transmit instructions on the ARM side, and the network port is used to send data to the Internet. Therefore, we need to enable the GPIO interface and the Ethernet interface in the processor IP. The peripherals enabled in ZYNQ are shown in Figure 7.


Figure 7. ZYNQ (Industry's First Scalable Processing Platform from Xilinx) IP (Intellectual Property) core setting peripherals.

After the IP setting is completed, the next step is to open the address setting page to assign addresses to each IP core controlled by the ARM side, as shown in Figure 8.

Figure 8. IP core address allocation.

After that, we only need to synthesize and implement the built hardware system, and finally generate a bit stream file and import the hardware system into the SDK (Software Development Kit). The next step is to develop, in the SDK, the bare-metal program for the PL and PS data communication and the network port control module. The frame diagram of the PS-end system design is shown in Figure 9.


Figure 9. PL and PS communication hardware block diagram.

4. Results

4.1. Face Recognition

The dataset used in this project is the Cropped Yale face dataset; it is an open dataset, each folder is associated with a person, and it contains multiple photos of that person. There are 38 unique faces in the dataset, each with a maximum of 64 photos. The dataset is high-dimensional, so we reduced its dimension.

In this project, we chose to use principal component analysis (PCA), because PCA uses eigenvectors/eigenvalues to selectively identify important features that fully represent the image. Therefore, the first step is to find the characteristics of the face.

In order to find the characteristics of the face, a gradual process is followed:

1. Construct a data matrix D;
2. Find a mean face (like Figure 10);
3. Reduce the dimension of the data matrix. Zero-averaging the data matrix means subtracting the mean from each row to ensure that the mean of any column of D is 0;
4. Analyze the obtained matrix using principal component analysis to identify the eigenfaces.

Figure 10. The mean face of the dataset.

We plot the first nine principal components, reshaping each W * H principal component into an image of width W and height H. These principal components, called characteristic surfaces (eigenfaces), are shown in Figure 11. We call this the E matrix; it stores the principal components as rows.

Figure 11. The characteristic surfaces of the dataset.

• Data matrix structure.

1. Load the dataset in grayscale format;
2. Convert each image into vector format;
3. Stack these vectors horizontally into a matrix D: if the width and height of each image are W and H, respectively, and the total number of images is N, then the size of D is N * (W * H);
4. Each data point is labeled with the person's identity, which is stored in an N-dimensional vector Y.

The reduced data is obtained by projecting the data points onto the principal components by matrix multiplication, i.e.,

$P = D \cdot E^{T}$. (9)

This means that the projections of a data point onto the principal components are obtained by taking the dot product of the data point with each of the principal components. The matrix multiplication just expresses this in a compact and computationally efficient format.

Therefore, the ith row of P consists of the principal component projections for the ith data point, i.e., principal component scores that can be used to reconstruct the ith data point using the principal components in E. The jth column of P captures the projection of the data points onto the jth principal component.

Therefore, by applying PCA, we found the reduced form of all the data points represented using the first nine principal components.

• Building a classification system.

Now, we use decision trees to build a face classification system. This can be done using the original features as well as the features obtained from the PCA-based dimensionality reduction P. The ith row in both matrices corresponds to the ith data point. The label for each data point is the person's identity, stored in the N-dimensional vector Y.
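The following sketch illustrates the data-matrix construction and the Equation (9) projection described above (the folder-per-person layout and the "CroppedYale" path are assumptions; Pillow is used for loading); the resulting P and y are what the classifier consumes:

```python
import os
import numpy as np
from PIL import Image

def load_dataset(root):
    """Build D (N, W*H) and labels y from a folder-per-person layout."""
    images, labels = [], []
    for person in sorted(os.listdir(root)):
        person_dir = os.path.join(root, person)
        for fname in sorted(os.listdir(person_dir)):
            img = Image.open(os.path.join(person_dir, fname)).convert("L")
            images.append(np.asarray(img, dtype=np.float64).ravel())
            labels.append(person)
    return np.vstack(images), np.array(labels)

D, y = load_dataset("CroppedYale")           # path is an assumption
mean = D.mean(axis=0)                        # the mean face (Figure 10)

# Eigenfaces via SVD of the centered data; rows of E are components.
U, S, Vt = np.linalg.svd(D - mean, full_matrices=False)
E = Vt[:9]                                   # first nine principal components
P = (D - mean) @ E.T                         # Equation (9)
```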


For the purpose of cross-validation, we split the data (D, P, and y) into 80% training data and 20% test data.

• Fitting a decision tree model to the original data.

Initially, we fit a decision tree model using the training data D and y without any pruning; that is, we let the tree grow until it is impossible to split (all records in the data subset have the same output or the same set of input attributes). We use this model to predict the test set y. These predicted labels are evaluated against the real labels to calculate the test error.

• Tree pruning.

Pruning is one of the methods used to stop the branching of a decision tree. There are two types of pruning: pre-pruning and post-pruning. Pre-pruning sets an index during the growth of the tree and stops the growth when the index is reached. Doing so easily results in a "horizon limit": once a branch is stopped and node N becomes a leaf node, its successors are severed. These stopped branches mislead the learning algorithm, greatly reducing the purity of the generated tree near the root node.

During post-pruning, the tree must first be fully grown until the leaf nodes have the smallest impurity value, so the "horizon limit" can be overcome. Then, for all adjacent pairs of leaf nodes, elimination is considered; if the elimination causes only a satisfactorily small growth in impurity, the elimination is performed, and their common parent node becomes a new leaf node. This method of "merging" leaf nodes is exactly the opposite of the process of node branching. After pruning, the leaf nodes are often distributed over a wide range of levels, and the tree becomes unbalanced. The advantage of post-pruning is that it overcomes the "horizon limit" and does not need to retain samples for cross-validation, making full use of the information in the entire training set. However, the computational cost of post-pruning is much greater than that of pre-pruning, especially on large sample sets; for small samples, though, post-pruning is still better than pre-pruning. Here, we describe the method of post-pruning.

The purpose of pruning is not to minimize the loss function; the purpose of pruning is to achieve a better generalization ability. For a decision tree, the larger the number of leaf nodes, the more the decision tree responds to the details of the training data, which weakens its generalization ability. To improve the generalization ability, pruning is needed.

• Fitting a decision tree model using the principal component training data P and y.

At the beginning, we fit a decision tree model using the training data P and y without any pruning. We use this model to predict the test set y. These predicted labels are evaluated against the true labels to compute the test error. The tree has 1783 nodes, a maximum depth of 29, and 896 leaf nodes. Since the tree is a bit too large, pruning has to be done. In order to fit a pruned tree, the maximum depth hyperparameter is chosen using 10-fold cross-validation (CV). This pruned tree is evaluated on the test data on the basis of the test error. The maximum depth hyperparameter versus the cross-validation error is plotted in Figure 12 to find the optimal hyperparameter. The test error is about 0.658.
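A minimal sketch of this depth selection, assuming the reduced training features and labels are named P_train and y_train (stand-in data here):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Balanced stand-in data in place of the real reduced training set.
rng = np.random.default_rng(0)
P_train = rng.random((300, 9))
y_train = np.repeat(np.arange(30), 10)

# 10-fold CV over the maximum-depth hyperparameter, as in the text.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": list(range(1, 31))},
    cv=10)
search.fit(P_train, y_train)

print("best depth:", search.best_params_["max_depth"])
# 1 - mean CV accuracy approximates the cross-validation error
# plotted in Figure 12.
cv_error = 1 - search.cv_results_["mean_test_score"]
```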

Figure 12. The maximum depth hyperparameter versus the cross-validation error.

• 2D hyperparameter search.

Now, we not only adjust the maximum tree depth but also perform 2D cross-validation on the training data P and y to adjust two hyperparameters (1—the maximum tree depth in the decision tree training; 2—the number of top principal components used as features). A heat map was drawn in which the X-axis is the maximum tree depth hyperparameter and the Y-axis is the number of top principal components.

The heat value is the cross-validation error. We use this heat map, shown in Figure 13, to compute the hyperparameter combination that minimizes the cross-validation error.

Figure 13. Heatmap wherein the heat value is the cross-validation error. The combination of hyperparameters that minimizes the cross-validation error is computed using this heatmap.

• The above model is used to fit the decision tree model.

Using the optimal combination of the hyperparameters that we found, we train the model with the whole training set. Using this model, we predict y in the test set and evaluate these predictions against the real label y to calculate the test error.
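A sketch of this 2D search (stand-in data and our own variable names; grid sizes shrunk for brevity) computes the 10-fold cross-validation error for each hyperparameter pair and takes the minimizing cell:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
P_full = rng.random((300, 40))           # projections onto 40 components
y_train = np.repeat(np.arange(30), 10)   # balanced stand-in labels

depths = list(range(1, 11))              # X-axis of the heat map
components = list(range(5, 45, 5))       # Y-axis of the heat map
errors = np.empty((len(components), len(depths)))

# For each (number of top components, maximum depth) pair, compute
# the 10-fold cross-validation error, as in Figure 13.
for i, k in enumerate(components):
    for j, d in enumerate(depths):
        scores = cross_val_score(
            DecisionTreeClassifier(max_depth=d, random_state=0),
            P_full[:, :k], y_train, cv=10)
        errors[i, j] = 1 - scores.mean()

i, j = np.unravel_index(errors.argmin(), errors.shape)
print("best components:", components[i], "best depth:", depths[j])
```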


The test error obtained with the parameters from the 2D hyperparameter search is about 0.36, which is about half of the original value. Obviously, compared with a one-dimensional hyperparameter search over the maximum tree depth, a two-dimensional hyperparameter search helps improve the test error.

By projecting the unknown face onto the eigenvectors, we can see the difference between the unknown face and the average image in different directions. Therefore, if we do this with each row of the bias matrix that we obtained, each row gives us a weight vector. We use the Euclidean distance between Ω and Ωk to judge the difference between the unknown face and the kth trained face. The k with the smallest distance is chosen, and the corresponding training-set face is taken as the match. In practice, a distance threshold is usually set: when the distance is less than the threshold, the face to be identified is judged to be the same person as the kth face in the training set. When the distance exceeds the threshold for the entire training set, the input is classified as either a new face or a non-face, according to the size of the distance value. The threshold is not fixed; it is set according to the training set.
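A minimal sketch of this distance-based matching (our own names; the threshold would be tuned on the training set):

```python
import numpy as np

def match_face(omega, trained_omegas, threshold):
    """Compare an unknown face's weight vector against the training set.

    omega:          weight vector of the unknown face (its projection
                    onto the eigenfaces).
    trained_omegas: matrix with one weight vector Omega_k per row.
    Returns the index k of the best match, or None when every distance
    exceeds the threshold (a new face or a non-face).
    """
    distances = np.linalg.norm(trained_omegas - omega, axis=1)
    k = int(distances.argmin())
    return k if distances[k] < threshold else None

# Toy usage with made-up weight vectors.
trained = np.array([[1.0, 0.0], [0.0, 1.0]])
print(match_face(np.array([0.9, 0.1]), trained, threshold=0.5))  # -> 0
```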

Table 1. Quality report.

               Precision    Recall    F1-Score    Support
YaleB02        0.92         0.72      0.81        50
YaleB03        0.89         0.91      0.90        65
YaleB05        0.87         0.63      0.73        46
Average/total  0.89         0.75      0.81        161

We see precision, recall, F1-score, and support. Support is simply the number of times a given ground-truth label appears in our test set. The F1-score is the harmonic mean of the precision and recall scores. Precision and recall are more informative indicators than a single accuracy score; the higher the value, the better.
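As a spot check of Table 1 (using the standard definition of the F1-score; the per-class values are taken from the table), for class YaleB02:

\[
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
    = \frac{2 \times 0.92 \times 0.72}{0.92 + 0.72} \approx 0.81,
\]

which agrees with the tabulated value.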

4.2. Encryption Transmission

In the overall system design, it was necessary to use both the FPGA and the ARM platform. To satisfy this, a chip from the Xilinx Zynq-7000 series was selected; this series is based on Xilinx's fully programmable extensible processing platform architecture. The processor development platform (hereinafter referred to as the PS-side) is based on the ARM dual-core Cortex-A9 processor, while the programmable logic design side (hereinafter referred to as the PL-side) is developed on Xilinx programmable logic resources. The system was designed on the basis of the SOC platform, with the PL-end and PS-end jointly used in development, which ensured the practicability and extensibility of the encryption system.

One of the faces from the Cropped Yale Face dataset was selected for the encrypted transmission test, as shown in Figure 14. The performance of this high-speed PCIe encrypted transmission card was compared with that of similar encrypted transmission cards, as shown in Table 2. The PCIe encryption card in SG DMA mode used in this design has an obvious advantage in data transmission speed over similar encryption cards: when the data volume reaches about 1M, the transmission speed reaches 1 GB/s.

Figure 14. The encryption of sensitive information and additional location information.

Table 2. Horizontal comparison of encryption card performance.

Encryption Card Type            Clock      Speed
PCI encryption card             100 MHz    80 MB/s
PCIe encryption card (BL DMA)   100 MHz    500 MB/s
PCIe encryption card (SG DMA)   100 MHz    1 GB/s
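Section 5 notes that the SM4 encryption core was functionally tested. As a hedged illustration of how a host-side sanity check of such a core's output might look, the sketch below verifies a software reference against the standard single-block test vector from the SM4 specification (GB/T 32907-2016); the third-party gmssl Python package and this verification flow are assumptions, not the authors' actual test harness.

```python
# Hedged host-side sanity check for an SM4 implementation, using the
# standard single-block test vector from the SM4 specification.
# The third-party `gmssl` package is assumed as the software reference.
from gmssl.sm4 import CryptSM4, SM4_ENCRYPT

KEY       = bytes.fromhex("0123456789abcdeffedcba9876543210")
PLAINTEXT = bytes.fromhex("0123456789abcdeffedcba9876543210")
EXPECTED  = bytes.fromhex("681edf34d206965e86b3e94f536e4246")

sm4 = CryptSM4()
sm4.set_key(KEY, SM4_ENCRYPT)
# crypt_ecb applies PKCS7 padding, so only the first 16-byte block is
# the raw single-block ciphertext defined by the test vector.
ciphertext = sm4.crypt_ecb(PLAINTEXT)[:16]
assert ciphertext == EXPECTED, "SM4 reference check failed"
print("SM4 single-block test vector verified:", ciphertext.hex())
```

Output from a hardware core fed the same key and plaintext (for example, read back over PCIe) could be compared against the same vector.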

5. Conclusions and Future Work

Since the tree was a bit too large, pruning had to be done. To fit a pruned tree, the maximum depth hyperparameter was chosen using 10-fold cross-validation (CV). The pruned tree was then evaluated on the test data on the basis of the test error.

In view of the increasingly serious information security problem, this paper presents a hardware encryption transmission card, which realizes face recognition using a decision tree and performs encryption and decryption functions based on an FPGA chip, providing a high level of security. The system adopts the Xilinx Zynq-7000 series chip; under the fully programmable SOC architecture, it utilizes the flexibility of the FPGA and ARM processor co-design and adopts the IPI tool for development and design, which improves the practicability and extensibility of the encryption system.

This paper is an in-depth study of the hardware design of high-speed PCIe encryption cards. The specific contents are as follows:

1. The construction of a classifier using characteristic faces to achieve face recognition;
2. The maximum depth hyperparameter versus the cross-validation error is plotted to find the optimal hyperparameter;
3. The parameters are obtained through a 2D hyperparameter search. Clearly, the 2D hyperparameter search helped improve the test error compared to the 1D hyperparameter search over the maximum tree depth;
4. Hardware implementation of each module of the encryption transmission system. On the FPGA side, the PCIe interface is developed based on XDMA, the hardware is realized, and high-speed data transmission between the host and the board is completed. In the SM4 encryption core module, the control module implements protocol parsing and storage, while the function module implements key expansion, encryption, and decryption; this is encapsulated into an IP core with an AXI interface and called in the system. In the ARM processor environment, DMA data communication between the PL-end and the PS-end is realized, and the MAC network control module is used to realize data transmission;
5. Hardware testing is carried out on the sensitive information recognition and encryption transmission system. First, the system test hardware platform was built; then, the function and performance of the PCIe interface were tested and analyzed.

After that, the function of the SM4 encryption core was tested. Finally, the whole encryption transmission system was tested and found to provide a complete set of functions. There are still many areas to be optimized in this design:

1. Although the decision tree is used to realize face recognition, little improvement has been made to the algorithm itself, and the recognition accuracy needs to be improved;
2. In the future, we will create and train our own datasets and realize real-time face recognition on video frames;
3. The implementation of the SM4 encryption algorithm can be improved by converting it into a pipelined design; this will increase the total resource consumption of the design but improve its running speed and the efficiency of the system implementation;
4. When PCIe transfers data, PIO can be used to transfer the key and DMA to transfer the data according to the address; this removes the need for a frame-header resolution module and improves the usability and scalability of the design;
5. In the future, we will use FPGAs to implement public key encryption algorithms to complete digital signatures, key exchanges, etc. These modules will be combined with our existing SM4 IP.

Author Contributions: Conceptualization, S.W.; Formal analysis, Z.Y.; Investigation, S.W.; Project administration, Z.Y.; Resources, S.L.; Software, S.L.; Supervision, Z.Y.; Validation, Y.L.; Visualization, Y.L.; Writing—original draft, S.L.; Writing—review & editing, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).