applied sciences

Article Deep Neural Network-Based Guidance Law Using Supervised Learning

Minjeong Kim , Daseon Hong and Sungsu Park * Department of Aerospace Engineering, Sejong University, Seoul 05006, Korea; [email protected] (M.K.); [email protected] (D.H.) * Correspondence: [email protected]; Tel.: +82-2-3408-3769

 Received: 13 October 2020; Accepted: 2 November 2020; Published: 6 November 2020 

Abstract: This paper proposes that the deep neural network-based guidance (DNNG) law replace the proportional navigation guidance (PNG) law. This approach is performed by adopting a supervised learning (SL) method using a large amount of simulation data from the system with PNG. Then, the proposed DNNG is compared with the PNG, and its performance is evaluated via the hitting rate and the energy function. In addition, the DNN-based only line-of-sight (LOS) rate input guidance (DNNLG) law, in which only the LOS rate is an input variable, is introduced and compared with the PN and DNNG laws. Then, the DNNG and DNNLG laws examine behavior in an initial position other than the training data.

Keywords: proportional navigation guidance (PNG) law; deep neural networks-based guidance (DNNG) law; supervised learning (SL); homing missile

1. Introduction Proportional navigation guidance (PNG) is one of the most widely known laws in existence [1]. This guidance law generates an acceleration command proportional to the line-of-sight (LOS) rate from the perspective of a missile looking at a target. This is easy to implement based on simple principles. Raj [2] demonstrated the performance of PNG by showing that PNG is effective in intercepting a low-maneuvering target in a two-dimensional missile target engagement scenario. Bryson and Ho [3] showed that PNG is the optimal solution for engagement scenarios for a missile with constant speed and a stationary target by solving linearized engagement equations. This guidance law enables various designs according to the navigation constants with arbitrary real number values. Based on the fact that the form or size of the guidance command varies depending on the navigation constants, Yang [4] proposed a recursive optimal pure proportional navigation (PPN) scheme that predicts recursively optimal navigation gain and time-to-go. It also showed that the proposed algorithm applies to maneuvering targets. Kim [5] proposed a robust PNG law, a form of combination of the conventional PNG and the switching part made by using the Lyapunov approach. This was robust when used against a highly maneuvering target. Recently, deep learning [6]-based artificial intelligence technology has received attention in academia and industry. Various studies using artificial neural networks (ANNs) are also being studied in the defense technology sector, and among them, guidance laws have become the focus of various research as they are essential in homing [7–13]. Balakrishnan and Biega [7] presented a new neural structure providing feedback guidance solutions for any initial conditions and time-to-go for two-dimensional scenarios. Lee et al. [8,9] proposed a neural network guidance algorithm for bank-to-turn (BTT) missiles. This guidance performance was verified by comparing it with PNG. Rahbar and Bahrami [10] presented a method of designing a feedback controller of homing missile

Appl. Sci. 2020, 10, 7865; doi:10.3390/app10217865 www.mdpi.com/journal/applsci Appl. Sci. 2020, 10, x FOR PEER REVIEW 2 of 13 Appl. Sci. 2020, 10, 7865 2 of 12 using feed-forward multi-layer perceptron (MLP). They demonstrated the efficiency of the proposed methodusing feed-forward by performing multi-layer simulations perceptron on various (MLP). initial They input demonstrated values. Rajagopalan the efficiency et al. of[14] the presented proposed anmethod ANN-based by performing approach simulations using MLP on variousto replace initial PNG. input It values. showed Rajagopalan that MLP-based et al. [14 guidance] presented can an effectivelyANN-based replace approach the PNG using in MLP a wide to replace range of PNG. engagement It showed scenarios. that MLP-based guidance can effectively replaceIn this the paper, PNG in we a widepropose range a deep of engagement neural network-based scenarios. guidance (DNNG) law imitating PNG. This approachIn this paper, learns we the propose DNN a model deep neural with network-basedtraining data acquired guidance from (DNNG) PNG. law The imitating DNN model PNG. appliesThis approach to the missile learns system the DNN instead model of with PNG, training and its data performance acquired fromis comparable PNG. The with DNN PNG. model Then, applies we designto the missilethe controller system insteadmodel and of PNG, discuss and how its performance the proposed is comparableDNNG law withchanges PNG. according Then, we to design time constants.the controller model and discuss how the proposed DNNG law changes according to time constants. ThisThis paper paper is is organized organized as as follows: follows: In In Section Section 2,2, the the two-dimensional two-dimensional missile–target missile–target engagement scenarioscenario and and equations equations of of motion motion of of the the missile missile will be explained. Section Section 3 brieflybriefly describesdescribes thethe conventionalconventional PNG PNG law. law. In In Section Section 44,, the DNNG law th thatat applied supervised learning, such as training datadata collection,collection, DNNG DNNG structure, structure, training training methods methods and evaluation, and evaluation, is described is indescribed detail. Additionally, in detail. Additionally,the simulation the results simulation of the DNNG results law of arethe shownDNNG through law are numerical shown through simulation. numerical Section 5simulation. introduces Sectiona new DNN-based 5 introduces guidance a new DNN-based law with the guidance input variables law with of the only input the LOSvariables rates. of We only run the simulations LOS rates. at Wethe run initial simulations position outside at the theinitial range position of training outside data. the In range addition, of training we design data. the In first-order addition, systemwe design and theexamine first-order the energy system performance and examine of the the missile energy according performance to the of time the constant. missile according Finally, conclusions to the time are constant.given in SectionFinally,6 .conclusions are given in Section 6.

2.2. Problem Problem Statement Statement

2.1.2.1. Equations Equations of of Motion Motion TheThe missile missile is is attempting attempting to to hi hitt the the stationary stationary target. target. The The miss missileile has has a a constant constant speed speed and and hits hits thethe target target by by controlling controlling the the lateral acceleration command. Figure Figure 1 showsshows thethe two-dimensionaltwo-dimensional engagementengagement geometry geometry between between the the missile missile and and the the target. target.

FigureFigure 1. 1. Two-dimensionalTwo-dimensional engagement engagement geometry. geometry.

Here, R is the relative distance from the missile to the target, Vm is the speed of the missile, am is Here, is the relative distance from the missile to the target, is the speed of the missile, the lateral acceleration command of the missile, λ is the look angle, ψm is the heading angle of the is the lateral acceleration command of the missile, is the look angle, is the heading angle of themissile missile and andΩ is Ω the is LOS the rate.LOS Letrate. us Let define us define the LOS the angle LOS (angleσ ) as a( contained ) as a contained angle formed angle betweenformed betweenthe look anglethe look and angle the heading and the angle.heading The angle. motion The of motion the missile of the is expressedmissile is expressed via nonlinear via dinonlinearfferential differentialequations, suchequations, as Equations such as (1)–(4).Equations (1)–(4). . R== −V mcoscos ((λ −ψm)) (1)(1) − −

. Vm λ = = sin ((λ −ψm)) (2)(2) R − . a = m ψm (3) =Vm (3) σ = λ ψm (4) − Appl. Sci. 2020, 10, 7865 3 of 12

The allowable range of lateral acceleration of the missile is set as follows: Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 13 10g am 10g (5) − ≤ ≤ =− (4) 2 where the accelerationThe allowable of range gravity of lateral is g acceleration= 9.81 m of/ sthe. missile This paper is set as addresses follows: the engagement scenario between the anti-ship missile and the target. It is assumed that the missile flies at a constant speed −10 10 (5) between Mach 0.5 and 0.95, and that the target is stationary. where the acceleration of gravity is = 9.81 m/s. This paper addresses the engagement scenario 2.2. Engagementbetween Scenariothe anti-ship missile and the target. It is assumed that the missile flies at a constant speed between Mach 0.5 and 0.95, and that the target is stationary. The assumptions of the engagement scenarios of this paper are as follows: First, the initial position 2.2. Engagement Scenario of the missile starts at any position 20, 000 30, 000 (m) away from the target. Second, the initial ∼ heading angleThe of assumptions the missile of is the any engagement values between scenarios ofπ andthis paperπ (rad are) ,as centered follows: First, on the the look initial angle of the position of the missile starts at any position 20,000~ 30,000 (m)− 2 2 away from the target. Second, the missile. Third,initial heading the missile angle of isthe regarded missile is any as values having between hit the− targetand (rad) if the, centered relative on the distance look angle between the missile andof the the missile. target Third, is within the missile 2 m. is Finally, regarded the as ha targetving hit is stationarythe target if the in relative position distance(xt, ybetweent) = (0, 0).

θ inthe Figure missile2 andand the Table target1 isrepresent within 2 m. the Finally, angle the of target the is missile stationary viewed in position by the(, target)=(0,0) in. the polar coordinate system. in Figure In Figure 2 and 2Table, the 1 positive represent xthe-axis angle direction of the missile is 0, viewed 2π (rad by) ,the measured target in the counter-clockwise polar 0, 2 (rad) to indicatecoordinate the magnitude system. In of Figure that angle. 2, the positive x-axis direction is , measured counter- clockwise to indicate the magnitude of that angle.

Figure 2. Missile–target engagement scenario. Figure 2. Missile–target engagement scenario.

Table 1. Initial conditions of the missile. Table 1. Initial conditions of the missile. () () () () (/) R (m) (rad) (rad) (rad) V (m/s) 20,000,30,000 0 θ0 0, 2 λ0 −ψ0, 170,323 m h 2 2 i [ ][ ][ + ] π π [ ] 20, 000, 30, 000 0, 2π θ π λ0 2 , λ0 + 2 170, 323 The initial conditions of the missile are shown in Table 1. The− initial values of the missile have random values within the presented range, and the simulation is terminated if the final distance of The initial2 m is conditionssatisfied. of the missile are shown in Table1. The initial values of the missile have random values within the presented range, and the simulation is terminated if the final distance of 2 m is satisfied.

3. Proportional Navigation Guidance Law The proportional navigation guidance (PNG) law is used on almost all homing missiles. Recently, it has actively used methods to improve the performances of missiles using modern control theory, but this is basically an extension of the PNG. PNG is based on the principle that the missile’s acceleration is generated in proportion to the LOS rate of the target if the LOS is constant when the two objects Appl. Sci. 2020Appl., 10 ,Sci. 7865 2020, 10, x FOR PEER REVIEW 4 of 13 4 of 12

3. Proportional Navigation Guidance Law approach eachThe other. proportional Since the navigation LOS rate guidance is measured (PNG) law by is used the on seeker almost of all the homing missile, missiles. the Recently, performance of the seeker hasit has a actively significant used methods effect on to theimprove missile’s the perf motionormances trajectory. of missiles using modern control theory, In addition,but this is PNG basically is veryan extension useful of for the missilePNG. PNG guidance is based on problems the principle against that the stationary missile’s targets. acceleration is generated in proportion to the LOS rate of the target if the LOS is constant when the The characteristic of this guidance law is that if the acceleration is generated such that the LOS two objects approach each other. Since the LOS rate is measured by the seeker of the missile, the rate is zero,performance it can hit of the the target seeker withouthas a significant any special effect on maneuvering the missile’s motion of the trajectory. missile. Moreover, the same principle couldIn be addition, applied PNG to movingis very useful targets. for missile Figure guidance3 shows problems the form against of the stationary guidance targets. geometry The of the impact triangle,characteristic which of ensures this guidance the looklaw is anglethat if the is parallelacceleration to is the generated moving such target. that the In LOS other rate words,is zero, in order it can hit the target without any special maneuvering of the missile. Moreover, the same principle to apply this guidance law, the missile’s seeker requires two pieces of information: the speed and LOS could be applied to moving targets. Figure 3 shows the form of the guidance geometry of the impact rate of thetriangle, missile. which The formensures of the the look PNG angle is is as parallel follows: to the moving target. In other words, in order to apply this guidance law, the missile’s seeker requires two pieces of information: the speed and LOS rate of the missile. The form of the PNG isa asm follows:= NVm Ω (6)

=Ω (6) where N is navigation constant, Vm is the speed of the missile and Ω is LOS rate. where is navigation constant, is the speed of the missile and Ω is LOS rate.

Figure 3.FigureConceptual 3. Conceptual diagram diagram of of proportionalproportional navigation navigation guidance. guidance.

4. DNN-Based4. DNN-Based Guidance Guidance (DNNG) (DNNG) Law Law In this section, we explain the DNN-based guidance (DNNG) law, which begins with the In this section, we explain the DNN-based guidance (DNNG) law, which begins with the acquisition acquisition of training data from missile systems with the PNG law. The DNNG law is a deep neural of trainingnetwork data from that deduces missile one systems output variable with the for PNGtwo input law. variables, The DNNG and has lawthe same is a principle deep neural as the network that deducesPNG one law. output This law variable outputs an for acceleration, two input variables,, by inputting and the has LOS the rate, same Ω, and principle the speed as of the the PNG law. missile, . Figure 4 shows a schematic diagram for the approach of this paper, indicating that the This law outputs an acceleration, am, by inputting the LOS rate, Ω, and the speed of the missile, Vm. DNNG replaces the PNG. This missile system consists of three main modules: the seeker module, the Figure4 shows a schematic diagram for the approach of this paper, indicating that the DNNG replaces guidance module and the dynamics module. The seeker module identifies the target position, and the PNG. Thisthen missilethis outputs system the LOS consists rate and of the three speed main of the modules: missile. The the guidance seeker module module, then the generates guidance the module and the dynamicsacceleration module. commands The for seeker the missile module in accordance identifies with the the target guidance position, law. The and dynamics then module this outputs the LOS rate andcalculates the speed the state of theof the missile. missile through The guidance numerical module integration then based generates on the missile the acceleration dynamics using commands control values. for the missile in accordance with the guidance law. The dynamics module calculates the state of the missile through numerical integration based on the missile dynamics using control values. Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 13

Figure 4. Schematic diagram of the missile system. Figure 4. Schematic diagram of the missile system.

Figure5 shows the processFigure 5 shows of the supervised process of supervised learning learning forfor the the DNNG. DNNG. The learning The process learning consists process consists of of a collection of the training data, the DNN architecture design and optimization. The details of the a collection of the traininglearning process data, are thecovered DNN in the following architecture paragraph. The design performance and ev optimization.aluation of DNNG is The details of the carried out in the missile system simulator established in this study, and its performance is compared learning process arewith covered the PNG. in the following paragraph. The performance evaluation of DNNG is

Figure 5. Learning process of supervised learning.

4.1. Data Collection The dataset used for supervised learning (SL) is generated from the simulation data of the missile system with PNG for the missile–target engagement scenario. This missile system with PNG was constructed with MATLAB/Simulink. The DNNG model is trained to learn about the relationship between state and action from the PNG-based generated dataset. The dataset consists of (∗,∗) ∗ pairs, the state variables are the LOS rate Ω and the speed of the missile , and the action ∗ variable is the lateral acceleration of the missile . The state–action pairs are clean data without considering noise. The purpose is to create a DNN-based guidance law using information received from the seeker. Two pieces of information are provided from the seeker: the LOS rate and the speed of the missile. The initial conditions used for data collection were randomly generated within the

bounds shown in Table 1. That is, the initialization area is defined as ∈, which is an area that contains all the values within the initial bounds in Table 1. The training data consisted of a total of 89,025 trajectories and 1,108,587,789 state–control pairs, which were sampled at 0.01 s intervals. Figure 6 shows the initial position distribution of the missile. It appears that its position is uniformly Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 13

Figure 4. Schematic diagram of the missile system.

Appl. Sci. 2020Figure, 10, 7865 5 shows the process of supervised learning for the DNNG. The learning process consists5 of 12 of a collection of the training data, the DNN architecture design and optimization. The details of the learning process are covered in the following paragraph. The performance evaluation of DNNG is carriedcarried out inout the in the missile missile system system simulator simulator established established in in this this study, study, and and its itsperformance performance is compared is compared with thewith PNG. the PNG.

FigureFigure 5. 5.Learning Learning processprocess of supervised supervised learning. learning.

4.1. Data4.1. Data Collection Collection The datasetThe dataset used used for for supervised supervised learning learning (SL)(SL) is ge generatednerated from from the the simulation simulation data data of the of missile the missile systemsystem with with PNG PNG for for the the missile–target missile–target engagement engagement scenario. This This missil missilee system system with with PNG PNG was was constructedconstructed with with MATLAB MATLAB/Simulink./Simulink. TheThe DNNGDNNG model model is is trained trained to tolearn learn about about the therelationship relationship ∗ ∗ between state and action from the PNG-based generated dataset. The dataset consists of( ( ,)) between state and action from∗ the PNG-based generated dataset. The dataset consists of x∗, u∗ pairs, pairs, the state variables are the LOS rate Ω and the speed of the missile , and the action the state variables∗ x∗ are the LOS rate Ω and the speed of the missile Vm, and the action variable u∗ is variable is the lateral acceleration of the missile . The state–action pairs are clean data without the lateralconsidering acceleration noise. The of purpose the missile is toa createm. The a DNN-based state–action guidance pairs are law clean using data information without received considering noise.from The the purpose seeker.is Two to createpieces aof DNN-based information are guidance provided law from using the informationseeker: the LOS received rate and from the speed the seeker. Two piecesof the missile. of information The initial are conditions provided used from for thedata seeker: collection the were LOS randomly rate and thegenerated speed within of the the missile.

The initialbounds conditions shown in Table used 1. for That data is, collectionthe initialization were area randomly is defined generated as within∈, which the is bounds an area shown that in Tablecontains1. That is,all the the initialization values within area the initialA is defined bounds asin xTable0 A 1., which The training is an area data that consisted contains of a all total the of values 89,025 1,108,587,789 ∈ withinAppl. the initialSci. trajectories 2020,bounds 10, x FOR and PEER in Table REVIEW1.The trainingstate–control data pairs, consisted which ofwere a total sampled of 89, at 0250.01trajectories s intervals.6 of 13 and 1, 108,Figure 587, 789 6 showsstate–control the initial pairs,position which distribution were sampledof the missile. at 0.01 It appears s intervals. that its Figure position6 shows is uniformly the initial distributed. In addition, the initial heading angle of the missile is uniformly distributed within the position distribution of the missile. It appears that its position is uniformly distributed. In addition, range − ~ , based on the look angle. This shows that the training data used to learnπ theπ DNN model the initial heading angle of the missile is uniformly distributed within the range ~ , based on the − 2 2 look angle.are trained This shows in various that the initial training conditions, data used and tothat learn at the the same DNN time model the are initial trained conditions in various have initial a sufficiently uniform distribution. The constant value used for learning the DNNG is as follows: the conditions, and that at the same time the initial conditions have a sufficiently uniform distribution. navigation constant is 3, and the speed of the missile has an arbitrary value between Mach The constant value used for learning the DNNG is as follows: the navigation constant N is 3, and the 0.5 and 0.95, respectively. speed of the missile Vm has an arbitrary value between Mach 0.5 and 0.95, respectively.

Figure 6. Distribution of the initial position and heading angle of the training dataset. Figure 6. Distribution of the initial position and heading angle of the training dataset.

4.2. DNNG Architecture The DNNG architecture consists of a fully connected layer. The output for each layer can be expressed mathematically, and the output of the i-th layer can be represented as shown in Equation (7).

= ( ) (7)

where is the activation function, are the weighting vectors, is the output of the previous layer, is bias and is the number of hidden layers. The selection of an activation function is very important in designing a DNN model. In this study, we experimented with various activation functions, and among them, the DNN model consisting of only a hyperbolic tangent function had the best performance. Thus, in this study, the activation function selects a hyperbolic tangent function as defined in Equation (8), which applies for both the output and hidden layers. − () = ( = ) (8)

4.3. Training In order to obtain the optimal value of the weight parameter from the training data at the training stage, whenever the training process is repeated, the loss function for the DNNG-driven actions and the PNG-driven actions tries to minimize the loss. Each iteration was evaluated using the

mean squared error (MSE) loss function for the difference between the PNG-driven action, , and the DNNG-driven action, , as defined in Equation (9). 1 = ( − ) 2 (9) The DNNG model has been trained using an adaptive moment estimation (Adam) optimizer. The Adam optimization algorithm is not affected by the rescaling of the gradient. Therefore, regardless of the size of the gradient, since the step size is bound, it will be a stable gradient descent. Appl. Sci. 2020, 10, 7865 6 of 12

4.2. DNNG Architecture The DNNG architecture consists of a fully connected layer. The output for each layer can be expressed mathematically, and the output of the i-th layer can be represented as shown in Equation (7).

oi = ai(Wioi 1 + bi) (7) − where ai is the activation function, Wi are the weighting vectors, oi 1 is the output of the previous layer, − bi is bias and i is the number of hidden layers. The selection of an activation function is very important in designing a DNN model. In this study, we experimented with various activation functions, and among them, the DNN model consisting of only a hyperbolic tangent function had the best performance. Thus, in this study, the activation function selects a hyperbolic tangent function as defined in Equation (8), which applies for both the output and hidden layers. ex e x ( ) = − ( = + ) tanh x x − x x Wioi 1 bi (8) e + e− − 4.3. Training In order to obtain the optimal value of the weight parameter from the training data at the training stage, whenever the training process is repeated, the loss function for the DNNG-driven actions and the PNG-driven actions tries to minimize the loss. Each iteration was evaluated using the mean squared error (MSE) loss function for the difference between the PNG-driven action, aPN, and the DNNG-driven action, aDNN, as defined in Equation (9).

1 X 2 Loss = (aDNN aPN) (9) 2 − i

The DNNG model has been trained using an adaptive moment estimation (Adam) optimizer. The Adam optimization algorithm is not affected by the rescaling of the gradient. Therefore, regardless of the size of the gradient, since the step size is bound, it will be a stable gradient descent. This feature is considered an appropriate optimization algorithm to solve this problem. Here, hyper parameters use the default value of the python class of Adam optimization, and these are β 0.9, β 0.999,  10 8 1 ≈ ≈ ≈ − and η = 0.0001, respectively [15].

4.4. Evaluation The action can be derived through the DNNG model for any state, and the mean absolute error (MAE) can evaluate the difference between the accelerations derived from the DNNG and from the PNG. The MAE is represented in Equation (10).

aDNN aPN (10) | − | This measurement can only evaluate the performance of the DNNG model, but how the DNNG law works in the missile environment cannot be known exactly. A small error in generating the trajectory from system dynamics may be enough to result in a failure to hit the target. Thus, we need an additional evaluation for the DNNG law. The additional evaluation is performed by replacing the PNG law of the missile simulator with the DNNG law. We find the DNNG-driven trajectory through the numerical integration of missile system dynamics, such as in Equation (11).

Z t f x(t) = f (x(t), DNN(x(t)))dt (11) 0 Appl. Sci. 2020, 10, 7865 7 of 12

Comparing the trajectories of missiles is a useful method to compare the DNNG and PNG laws. Moreover, the energy of the missile used at this time is an important criterion to confirm the performance of the missile guidance law. Therefore, the energy equation of the missile is defined as follows in Equation (12): Z t f 2 Energy = amdt (12) 0 The total energy required for a missile flight is an important measure of whether the missile trajectory is optimized or not. The simulation results will show how well the DNNG law has been learned from the PNG law through energy performance.

4.5. Simulation Results This section shows the simulation results of the DNNG law for various initial conditions. The simulation was performed on clean data without considering noise. The energy performance of the DNNG law was evaluated for 100 random initial conditions. The simulation was performed with an integration step of 0.01 s. In addition, DNNG is compared not only with PNG, but also with the reinforcement learning-based guidance (RLG) law presented in the previous paper [16], and this demonstrates its performance through the energy term. Table2 shows the average value of the simulation of three missile guidance laws for 100 random initial conditions. Here, t f is the final time, R f is the final distance and the energy is the value calculated using Equation (12). The DNNG law derives almost the same results as the PNG, although very small errors can occur in numerical calculations. The simulation results show that DNNG uses less energy than RL-based guidance, and is implied to have a sufficient performance to replace PNG.

Table 2. Performance comparison of three guidance laws for 100 random initial conditions.

Guidance Law tf (s) Rf (m) Energy PNG 117.2338 0.7323 1507.7367 DNNG 116.2438 0.6440 1507.7142 RLG [16] 113.0026 0.9789 1670.2785

Figures7 and8 shows the acceleration and the trajectory of the missile, respectively. The initial Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 13 conditions are as follows: R0 = 28, 516 (m), λ0 = 4.0915 (rad), ψ0 = 5.1782 (rad), Vm = 230 (m/s). The simulationsenergy of performance the DNNG of the and PNG PNG law laws are 1.1356 m show almost and 1302.3000 the same, and result. those of The the finalDNNG distance law are and the energy performance0.2785 m and of the 1302.3153 PNG law, respectively. are 1.1356 Bothm and guidance1302.3000 laws were, and successful those of in the hitting DNNG the target law are by 0.2785 m entering the 2 m margin. The simulation result suggests that the DNNG law is able to replace the and 1302.3153 , respectively. Both guidance laws were successful in hitting the target by entering the 2 m PNG law. margin. The simulation result suggests that the DNNG law is able to replace the PNG law.

Figure 7.FigureComparison 7. Comparison of the of the acceleration acceleration of PNG PNG and and DNNG DNNG laws. laws.

Figure 8. Comparison of the trajectory of PNG and DNNG laws.

5. Results In this section, we show the additional experiments for the DNNG law. The following three experiments were performed. The first experiment was to subtract the speed information from the input variables of the DNNG module. In other words, this will show the behavior of the missile when there is only one input variable of the DNNG law, the LOS rate. The second examines the behavior of the missile in its initial position not used in the DNN learning. The last interprets the guidance and control loops for all guidance laws.

Appl. Sci. 2020, 10, x FOR PEER REVIEW 8 of 13

energy performance of the PNG law are 1.1356 m and 1302.3000, and those of the DNNG law are 0.2785 m and 1302.3153 , respectively. Both guidance laws were successful in hitting the target by entering the 2 m margin. The simulation result suggests that the DNNG law is able to replace the PNG law.

Appl. Sci. 2020, 10, 7865 Figure 7. Comparison of the acceleration of PNG and DNNG laws. 8 of 12

Figure 8.FigureComparison 8. Comparison of the of trajectorythe trajectory of of PNGPNG and and DNNG DNNG laws. laws.

5. Results 5. Results In this section,In this we section, show we the show additional the additional experiments experiments forfor the DNNG DNNG law. law. The following The following three three experiments were performed. The first experiment was to subtract the speed information from the experimentsinput were variables performed. of the DNNG The firstmodule. experiment In other words, was this to will subtract show the the behavior speed of informationthe missile when from the input variablesthere of is the only DNNG one input module. variable of In the other DNNG words, law, the this LOS will rate. show The second the behavior examines of the the behavior missile when there is onlyof one the inputmissile variablein its initial of position the DNNG not used law, in the the DNN LOS learning. rate. TheThe last second interprets examines the guidance the behaviorand of the missile incontrol its initial loops for position all guidance not usedlaws. in the DNN learning. The last interprets the guidance and control loops for all guidance laws. Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 13 5.1. Additional Experiment 5.1. Additional Experiment In this section, we assess the behavior of the DNNG-driven trajectory when the input variables of In this section, we assess the behavior of the DNNG-driven trajectory when the input variables the guidance module are reduced to one. This is called the DNN-based only LOS rate input guidance of the guidance module are reduced to one. This is called the DNN-based only LOS rate input (DNNLG) law because there is only one input variable entering the guidance loop. The DNNLG law guidance (DNNLG) law because there is only one input variable entering the guidance loop. The learns state–action relationships using state variable Ω and action variable am from the training data DNNLG law learns state–action relationships using state variable Ω and action variable from generated by the PNG law. Figure9 shows the schematic diagram of the missile system with the the training data generated by the PNG law. Figure 9 shows the schematic diagram of the missile DNNLG law and the guidance law operates with only the LOS rate. system with the DNNLG law and the guidance law operates with only the LOS rate.

Figure 9. Missile system with DNNLG law. Figure 9. Missile system with DNNLG law. The data collection, DNNG architecture, DNNG training and evaluation to create DNNLG are carried The data collection, DNNG architecture, DNNG training and evaluation to create DNNLG are out via the same method as in Section4. Figures 10 and 11 show the simulation results for three guidance carried out via the same method as in Section 4. Figures 10 and 11 show the simulation results for laws. The initial conditions are as follows: R0 = 26, 252 (m), λ0 = 0.5564 (rad), ψ0 = 0.2497 (rad), three guidance laws. The initial conditions are as follows: = 26252 (m), = 0.5564 − (rad), = Vm = 268 (m/s) The energy values of the missile for PNG, DNNG and DNNLG are 3.0942, 3.0935 and −0.2497 (rad), = 268 (m/s) The energy values of the missile for PNG, DNNG and DNNLG are 3.4034, respectively. 3.0942, 3.0935 and 3.4034, respectively. The average values of the DNNLG were obtained using 100 random items from the validation data in Section4. These results are given in Table3. Comparing Table2 to Table3, DNNG produces simulation results that are roughly the same as PNG, but DNNLG shows inferior performance to the other two guidance systems in terms of optimality. However, in terms of hitting the targets, DNNLG indicates that it can be used to replace PNG. This experiment tells us that the missile can maneuver while only knowing the minimum input variable, the value for the LOS rate.

Figure 10. Comparison of the acceleration for all guidance law. Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 13

5.1. Additional Experiment In this section, we assess the behavior of the DNNG-driven trajectory when the input variables of the guidance module are reduced to one. This is called the DNN-based only LOS rate input guidance (DNNLG) law because there is only one input variable entering the guidance loop. The

DNNLG law learns state–action relationships using state variable Ω and action variable from the training data generated by the PNG law. Figure 9 shows the schematic diagram of the missile system with the DNNLG law and the guidance law operates with only the LOS rate.

Figure 9. Missile system with DNNLG law.

The data collection, DNNG architecture, DNNG training and evaluation to create DNNLG are carried out via the same method as in Section 4. Figures 10 and 11 show the simulation results for three guidance laws. The initial conditions are as follows: = 26252 (m), = 0.5564 (rad), = −0.2497 (rad), = 268 (m/s) The energy values of the missile for PNG, DNNG and DNNLG are 3.0942, 3.0935 and 3.4034, respectively. Appl. Sci. 2020, 10, 7865 9 of 12

Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 13

Figure 10. Comparison of the acceleration for all guidance law. Figure 10. Comparison of the acceleration for all guidance law.

Figure 11. Comparison of the trajectory for all guidance law. Figure 11. Comparison of the trajectory for all guidance law.

The average valuesTable 3.ofResults the DNNLG of DNNLG were law forobtained 100 random using initial 100 conditions. random items from the validation data in Section 4. These results are given in Table 3. Comparing Table 2 to Table 3, DNNG produces Guidance Law t f (s) R f (m) Energy simulation results that are roughlyDNNLG the same117.0477 as PNG,0.5424 but DNNLG 1568.9078 shows inferior performance to the other two guidance systems in terms of optimality. However, in terms of hitting the targets, DNNLG indicates that it can be used to replace PNG. This experiment tells us that the missile can maneuver This simulation result shows the limitation of SL. SL produces completely identical results for whilecomplete only input–output knowing the pairs, minimum however, input changing variable, the the input–output value for variablesthe LOS rate. results in a difference from the training data. Therefore, it is very important to construct training data in supervised learning. Table 3. Results of DNNLG law for 100 random initial conditions. 5.2. Initial Conditions Outside of the Learning Range Guidance Law () () Energy This study examined howDNNLG DNNG works 117.0477 in an untrained 0.5424 initial position. 1568.9078 We assumed that the missile departs from a position that is less than 20, 000 m and greater than 30, 000 m from away the This simulation result shows the limitation of SL. SL produces completely identical results for complete input–output pairs, however, changing the input–output variables results in a difference from the training data. Therefore, it is very important to construct training data in supervised learning.

5.2. Initial Conditions Outside of the Learning Range This study examined how DNNG works in an untrained initial position. We assumed that the missile departs from a position that is less than 20,000 and greater than 30,000 m from away the position used in the learning (20,000~30,000 m ). All guidance is simulated from 100 arbitrary random validation data. The hit rate and the energy average for the 100 random initial conditions are shown in Table 4. In Table 4, all guidance laws satisfy the accuracy of 2 m .

Table 4. Average energy for the 100 random initial conditions outside the range of the training data.

Guidance Law Hit Rate (%) Energy PNG 100 1195.4409 DNNG 100 1195.6002 DNNLG 100 1252.1283 RLG [16] 100 1287.4488

The simulation results show that DNNG can be used for any initial position. In addition, the DNNG model shows that the training data pairs of the PNG are well learned. In other words, it Appl. Sci. 2020, 10, 7865 10 of 12 position used in the learning (20, 000 30, 000 m ). All guidance is simulated from 100 arbitrary random ∼ validation data. The hit rate and the energy average for the 100 random initial conditions are shown in Table4. In Table4, all guidance laws satisfy the accuracy of 2 m . The simulation results show that DNNG can be used for any initial position. In addition, the DNNG model shows that the training data pairs of the PNG are well learned. In other words, it suggests that DNNG is enough to replace PNG. On the other hand, DNNLG results in a slightly different maneuver from the PNG because this has not completely learned the state–control pairs of the PNG.

Table 4. Average energy for the 100 random initial conditions outside the range of the training data.

Guidance Law Hit Rate (%) Energy Appl. Sci. 2020, 10, x FOR PEER REVIEWPNG 100 1195.4409 11 of 13 DNNG 100 1195.6002 suggests that DNNG is enoughDNNLG to replace PNG. On 100 the other 1252.1283hand, DNNLG results in a slightly different maneuver from the PNGRLG because [16] this has 100not completely 1287.4488 learned the state–control pairs of the PNG. 5.3. Design of the Controller Model 5.3. Design of the Controller Model A controller model is designed for the control of the acceleration of a missile. The practical system A controller model is designed for the control of the acceleration of a missile. The practical model is expressed with a high-order system, but this study is defined as a first-order system with a system model is expressed with a high-order system, but this study is defined as a first-order system time constant τ for simple analysis. The first-order control system can be expressed mathematically with a time constant for simple analysis. The first-order control system can be expressed with a first-order differential equation, and the characteristic of the transient response relies entirely mathematically with a first-order differential equation, and the characteristic of the transient onresponse the time relies constant entirelyτ. Thus,on the accelerationtime constant commands . Thus, acceleration which are commands intended towhich be controlled are intended can to be representedbe controlled by thecan first-order be represented system, by such the first-order as in Equation system, (13). such The as system in Eq responseuation (13). is determinedThe system by the time constant τ. response is determined by the time constant. . τam + am = acmd (13) = (13) where τ is the time constant, a is the derived acceleration and a is the acceleration command. where is the time constant,m is the derived acceleration and cmd is the acceleration command. Figure 12 shows a schematic diagram of the missile simulator with a first-order system. Figure 12 shows a schematic diagram of the missile simulator with a first-order system. The first- The first-order system can be viewed as a linear time-invariant (LTI) input–output system. The order system can be viewed as a linear time-invariant (LTI) input–output system. The missile is missilemaneuvered is maneuvered by acceleration by acceleration derived from derived the contro fromller the model. controller Thus, model.the trajectory Thus, simulation the trajectory is simulationexecuted iswith executed acceleration with accelerationcommands that commands have passed that through have passed the first-order through system. the first-order system.

FigureFigure 12. 12.Conceptual Conceptual diagram diagram ofof thethe missile system with with added added first-order first-order system. system.

TheThe DNNG DNNG and and DNNLG DNNLG laws laws are are trained trained without without a controllera controller model. model. These These two two guidance guidance laws laws are trainedare trained as a state–control as a state–control pair of pair the PNGof the with PNG a navigationwith a navigation constant constant of 3. The of simulation 3. The simulation is performed is forperformed the time constantfor the time0, 0.05,constant 0.07, 0,0.05,0.07,0.1,0.5,1 0.1, 0.5, 1 on 150 random on 150 initial random conditions. initial conditions. The simulation The { } resultssimulation are shown results in are Table shown5. Thein Table DNNG 5. The and DNNG DNNLG and DNNLG laws have laws a 100%have a hit 100% rate hit under rate under all initial all conditions.initial conditions. The DNNG The makesDNNG a makes difference a difference from the from PNG the as thePNG time as the constant time constant grows, which grows, appears which to beappears the result to ofbe training the result without of training considering without the considering controller modelthe controller during themodel DNNG during model’s the DNNG training. Thus,model’s SL cannot training. achieve Thus, theSL cannot same results achieve entirely the same for resu thelts data entirely other for than the thedata training other than data, thewhile training fully data, while fully learning about training data. These results seem to reveal the limit of SL, and to learning about training data. These results seem to reveal the limit of SL, and to make up for it, we need make up for it, we need to explore other ways, such as building a very large database that takes into to explore other ways, such as building a very large database that takes into account all the factors, account all the factors, or using reinforcement learning (RL). or using reinforcement learning (RL). Table 5. Hit rate and average energy of the three different guidance laws for 150 random initial conditions according to time constant.

Time Constant Guidance Hit Rate (%) Energy PNG 100 1061.5301 0 DNNG 100 1061.5235 DNNLG 100 1108.8668 PNG 100 1061.7394 0.05 DNNG 100 1062.4931 DNNLG 100 1109.1051 PNG 100 1061.8672 0.07 DNNG 100 1062.6036 DNNLG 100 1109.1523 PNG 100 1062.0621 0.1 DNNG 100 1062.7446 DNNLG 100 1109.2414 Appl. Sci. 2020, 10, 7865 11 of 12

Table 5. Hit rate and average energy of the three different guidance laws for 150 random initial conditions according to time constant.

Time Constant Guidance Hit Rate (%) Energy PNG 100 1061.5301 0 DNNG 100 1061.5235 DNNLG 100 1108.8668 PNG 100 1061.7394 0.05 DNNG 100 1062.4931 DNNLG 100 1109.1051 PNG 100 1061.8672 0.07 DNNG 100 1062.6036 DNNLG 100 1109.1523 PNG 100 1062.0621 0.1 DNNG 100 1062.7446 DNNLG 100 1109.2414 PNG 100 1064.7169 0.5 DNNG 100 1065.1657 DNNLG 100 1110.9848 PNG 100 1068.1829 1 DNNG 100 1068.5402 DNNLG 100 1113.7670

6. Conclusions In this paper, we proposed that DNNG can replace conventional PNG. The proposed guidance law was simulated in a two-dimensional missile–target engagement scenario. DNNG completely imitated PNG, and this was enough to replace PNG. Additionally, three experimental results were obtained regarding DNNG’s abilities. Firstly, we introduced DNNLG such that only the LOS rate was the input variable. DNNLG had a hit rate of 100%, but this guidance law did not fully follow PNG. Secondly, DNNG also showed the same as PNG at initial positions outside of the training data. Finally, we designed an acceleration controller model with a first-order system and compared the energy performance of the guidance laws according to time constants. This experiment was an important study, showing the capabilities and, at the same time, the clear limitations of SL. The DNN model fully learned the training data of the state–action pairs of the PNG, but showed slightly incomplete aspects for the elements subtracted or added after learning. It is thought that methods such as RL, or building a large database, could sufficiently resolve these problems.

Author Contributions: Methodology, S.P.; software and simulation, M.K., D.H. and S.P.; writing—original draft preparation, M.K. and D.H.; writing—review and editing, M.K., D.H. and S.P. All authors have read and agreed to the published version of the manuscript. Funding: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1F1A1060749). Conflicts of Interest: The authors declare no conflict of interest.

References

1. Qi, Z.K.; Lin, D.F. Design of Guidance and Control Systems for Tactical Missiles, 1st ed.; CRC Press: Boca Raton, FL, USA, 2020; ISBN 978-03-6726-041-5. 2. Raj, K.D.S. Performance Evaluation of Proportional Navigation Guidance for Low-Maneuvering Targets. Int. J. Sci. Eng. Res. 2014, 5, 93–99. 3. Bryson, A.E.; Ho, Y.C. Applied Optimal Control: Optimization, Estimation, and Control, 1st ed.; Taylor & Francis Group: New York, NY, USA, 1975; ISBN 978-0470114810. 4. Yang, C.D.; Yang, C.C. Optimal Pure Proportional Navigation for Maneuvering Targets. IEEE Trans. Aerosp. Electr. Syst. 1997, 33, 949–957. [CrossRef] Appl. Sci. 2020, 10, 7865 12 of 12

5. Kim, S.H.; Kim, H.J. Robust Proportional Navigation Guidance Against Highly Maneuvering Targets. In Proceedings of the 13th International Conference on Control, Automation and Systems (ICCAS 2013), Gwangju, South Korea, 20–23 October 2013. [CrossRef] 6. Yann, L.C.; Bengio, Y.S.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] 7. Balakrishnan, S.N.; Biega, V. A New Neural Architecture for Homing Missile Guidance. In Proceedings of the 1995 American Control Conference (ACC95), Seattle, WA, USA, 21–23 June 1995. [CrossRef] 8. Lee, H.G.; Lee, Y.I.; Song, E.J.; Sun, B.C.; Tahk, M.J. Missile guidance using neural networks. Control Eng. Pr. 1997, 5, 753–762. [CrossRef] 9. Lee, Y.I.; Lee, H.; Sun, B.C.; Tahk, M.J. BTT Missile Guidance Using Neural Networks. In Proceedings of the 13th World Congress of IFAC, San Francisco, CA, USA, 30 June–5 July 1996. [CrossRef] 10. Rahbar, N.; Bahrami, M. Synthesis of optimal feedback guidance law for homing missiles using neural networks. Optim. Control Appl. Methods 2000, 21, 137–142. [CrossRef] 11. Jan, H.Y.; Lin, C.L.; Chen, K.M.; Lai, C.W.; Hwang, T.S. Missile Guidance Design Using Optimal Trajectory Shaping and Neural Network. 16th IFAC World Congress 2005, 38, 277–282. [CrossRef] 12. Wang, C.H.; Chen, C.Y. Intelligent Missile Guidance by Using Adaptive Recurrent Neural Networks. In Proceedings of the 11th IEEE International Conference on Networking, Sensing and Control, Miami, FL, USA, 7–9 April 2014. [CrossRef] 13. Li, Z.; Xia, Y.; Su, C.Y.; Jun, D.; Jun, F.; He, W. Missile Guidance Law Based on Robust Model Predictive Control Using Neural-Network Optimization. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1803–1809. [CrossRef][PubMed] 14. Rajagopalan, A.; Faruqi, F.A.; Nandagopal, D.N. Intelligent missile guidance using artificial neural networks. Artif. Intell. Res. 2015, 4.[CrossRef] 15. Diederik, P.K.; Ba, J.L. ADAM: A method for stochastic optimization. arXiv 2017, arXiv:1412.6980v9. 16. Hong, D.S.; Kim, M.J.; Park, S.S. Study on Reinforcement Learning-Based Missile Guidance Law. Appl. Sci. 2020, 10, 6567. [CrossRef]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).