Neural Comput & Applic DOI 10.1007/s00521-012-1156-2

ORIGINAL ARTICLE

Optimal control problem via neural networks

Sohrab Effati • Morteza Pakdaman

Received: 22 May 2012 / Accepted: 28 August 2012
© Springer-Verlag London Limited 2012

Abstract  This paper proposes a new method, based on the function-approximation capabilities of artificial neural networks, for obtaining the solution of optimal control problems. To do so, we approximate the solution of the Hamiltonian conditions arising from the Pontryagin minimum principle (PMP). For this purpose, we introduce an error function that contains all of the PMP conditions. In the proposed error function, we use trial solutions for the trajectory function, the control function and the Lagrange multipliers. These trial solutions are constructed from neurons, so the error function contains only the weights of the trial solutions, which we then minimize. Substituting the optimal values of the weights into the trial solutions, we obtain the optimal trajectory function, the optimal control function and the optimal Lagrange multipliers.

Keywords  Pontryagin minimum principle · Optimal control problem · Artificial neural networks

S. Effati (✉) · M. Pakdaman
Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
e-mail: [email protected]

S. Effati
e-mail: [email protected]

S. Effati · M. Pakdaman
Center of Excellence on Soft Computing and Intelligent Information Processing, Ferdowsi University of Mashhad, Mashhad, Iran

1 Introduction

A very important, extensive and widely applicable mathematical model is the optimal control problem. A wide variety of practical problems arising in science and engineering involve a dynamical system whose behaviour must be controlled to attain an objective. In recent years, several researchers have attempted to propose and extend new methods for solving optimal control problems. For example, Krabs et al. [1] proposed a mathematical model for the control of the growth of tumor cells, formulated as an optimal control problem. Modares et al. [2] presented a hybrid algorithm, integrating an improved particle swarm optimization with successive quadratic programming (SQP), for solving nonlinear optimal control problems.

The solutions of optimal control problems can be calculated either by using Pontryagin's minimum principle (PMP), which provides a necessary condition for optimality, or by solving the Hamilton–Jacobi–Bellman (HJB) partial differential equation (PDE), which provides a sufficient condition (see e.g. [3, 4]). Solving the HJB PDE is a very tedious task, and several approximation methods have been proposed for it. Hilscher [5] considered Hamilton–Jacobi theory over time scales and its applications to linear-quadratic problems. Based on the variational iteration method, Berkani et al. [6] proposed a method for solving optimal control problems. Garg et al. [7] presented a unified framework for the numerical solution of optimal control problems using collocation at Legendre–Gauss, Legendre–Gauss–Radau and Legendre–Gauss–Lobatto points. An adaptive multilevel generalized SQP method was presented in [8] to solve PDAE-constrained (partial differential algebraic equation) optimization problems. The notion of KT-invexity from mathematical programming was extended to the classical optimal control problem by the authors of [9]. Optimal control problems subject to mixed control-state constraints were investigated by Gerdts [10], who stated the necessary conditions in terms of a local minimum principle and the use of the Fischer–Burmeister function.

Buldaev [11] used perturbation methods in optimal control problems. Numerical methods based on extended one-step methods were investigated for solving optimal control problems in [12]. Existence results for optimal control problems governed by a variational inequality were given in [13]. Chryssoverghi et al. [14] considered an optimal control problem described by nonlinear ordinary differential equations (ODEs) with control and state constraints, including point-wise state constraints; because their problem may have no classical solutions, they formulated a relaxed form of the problem and used a discretization method. England et al. [15], in an interesting work, expressed optimal control problems as differential algebraic equations. Local stability of the solution to optimal control problems was analyzed by Rodriguez [16]. An approximate-analytical solution of the HJB equation was proposed via the homotopy perturbation method in [17]. Cheng et al., in several works [18–20], proposed neural network solutions for different types of optimal control problems: in [18] a neural network solution for suboptimal control of non-holonomic chained form systems, in [19] a neural network solution for finite-horizon H-infinity constrained optimal control of nonlinear systems, and in [20] fixed-final-time-constrained optimal control laws using neural networks to solve HJB equations for general constrained nonlinear affine systems. Vrabie and Lewis [21] presented a neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems.

In the last decade, artificial neural networks and other elements of soft computing and artificial intelligence have played an important role in solving hard problems arising in science and engineering. Applying these methods in many contexts has been successful, and the results are comparable with those obtained by classical mathematical algorithms. Lagaris et al. [22] used artificial neural networks to solve ODEs and PDEs for both boundary value problems and initial value problems, and Vrabie and Lewis [21], as mentioned above, used a neural-network-based scheme for continuous-time direct adaptive optimal control of partially unknown nonlinear systems.

In Sect. 2, we introduce the optimal control problem and present some basic concepts of neural network models. Section 3 contains the main idea based on neural network models. In Sect. 4, we apply the new method to some numerical problems, and finally Sect. 5 contains concluding remarks.

2 Preliminaries

In this paper, we consider the following type of optimal control problem:

$$\min \int_{t_0}^{t_f} f_0(x(t), u(t), t)\, dt \quad \text{s.t.} \quad \dot{x} = g(x(t), u(t), t), \quad x(t_0) = x_0, \tag{1}$$

where $x(t) \in \mathbb{R}^n$ is the state variable, $u(t) \in \mathbb{R}^m$ is the control variable and $t \in \mathbb{R}$. It is assumed that the integrand $f_0$ has continuous first and second partial derivatives with respect to all its arguments. We also assume that $t_0$ and $t_f$ are fixed and that $g$ is Lipschitz continuous on a set $\Omega \subseteq \mathbb{R}^n$.

According to problem (1), we can construct the well-known Hamiltonian as $H(x(t), u(t), p(t), t) = f_0(x(t), u(t), t) + p(t)\, g(x(t), u(t), t)$, where $p(t) \in \mathbb{R}^n$ is the costate vector. Suppose that we denote the optimal state, costate and control functions by $x^*(t)$, $p^*(t)$ and $u^*(t)$, respectively. Then a necessary condition for $u^*(t)$ to minimize the objective functional in (1) is that

$$H(x^*(t), u^*(t), p^*(t), t) \le H(x^*(t), u(t), p^*(t), t) \tag{2}$$

for all $t \in [t_0, t_f]$ and for all admissible controls. Equation (2), which states that an optimal control must minimize the Hamiltonian, is called the PMP (see [3]); it provides a necessary condition for optimality. The PMP shows that if $x(t)$, $p(t)$ and $u(t)$ are the optimal state, costate and control, respectively, they must satisfy

$$\frac{\partial H(x, u, p, t)}{\partial x} = -\dot{p}(t), \qquad \frac{\partial H(x, u, p, t)}{\partial p} = \dot{x}(t), \qquad \frac{\partial H(x, u, p, t)}{\partial u} = 0. \tag{3}$$

Substituting the known functions $f_0$ and $g$ into the Hamiltonian, Eq. (3) gives a system of ODEs, which can be solved via numerical methods or other existing methods. In some cases, Eq. (3) yields a straightforward ODE system that can be solved easily, but in most cases (especially in practical problems) the system cannot be solved easily, and an approximation scheme must be applied. In the next section, we apply the function-approximation ability of neural networks to solve (3).

A basic neuron based on a perceptron can be seen in Fig. 1. It is proved that multi-layer perceptrons can approximate any nonlinear function with arbitrary accuracy (see [23]). In Fig. 1, W is the weight vector of the input layer, b is the vector of bias weights, and V contains the output-layer weights.

Fig. 1 Basic perceptron (input weighted by W, summation with bias b, sigmoid activation, output weights V)

The output of this perceptron can be calculated from the following formulation:

$$\text{out} = \sum_{i=1}^{k} v_i\, \sigma(z_i), \qquad z_i = w_i x + b_i, \tag{4}$$

where $k$ is the number of sigmoid units. The activation function used here is the sigmoid function

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{5}$$

Based on the Kolmogorov theorem, it is proved that any continuous function can be implemented with a multi-layer perceptron (for more details, see [23]). According to this theorem, we use the function-approximation ability of neural networks to approximate the state, costate and control functions of the optimal control problem (1), as discussed in detail in the next section.
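As an illustration (ours, not from the paper), the map (4) with the sigmoid (5) can be evaluated, together with its derivative with respect to the input, in a few lines of Python; having this derivative in closed form is what makes the trial-solution approach of the next section convenient, since the time derivatives of the trial functions then become available analytically. The function and weight values below are purely illustrative.

```python
# Sketch of the single-hidden-layer map (4)-(5) and its input derivative.
# Weight values here are arbitrary examples, not taken from the paper.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, v):
    """out = sum_i v_i * sigmoid(w_i * x + b_i), cf. Eq. (4)."""
    s = sigmoid(w * x + b)
    out = np.dot(v, s)
    dout_dx = np.dot(v, s * (1.0 - s) * w)  # sigma'(z) = sigma(z) * (1 - sigma(z))
    return out, dout_dx

w = np.array([1.0, -2.0, 0.5])   # input weights (k = 3 sigmoid units)
b = np.array([0.0, 1.0, -1.0])   # bias weights
v = np.array([0.3, -0.7, 1.1])   # output-layer weights
print(perceptron(0.4, w, b, v))
```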
3 Main idea

In this section, we propose an approximation scheme for solving the equations arising from the PMP (i.e., Eq. 3). We consider a separate neural network for each function: the state (its neural network is $n_x$), the costate (its neural network is $n_p$) and the control (its neural network is $n_u$), where each neural network model contains its own adjustable parameters, as in Fig. 1. Note that the structures of the neural network models must be constructed such that they satisfy the initial or boundary conditions. The proposed neural network models have the following forms:

$$n_x = \sum_{i=1}^{I} v_x^i\, \sigma(z_x^i), \quad z_x^i = w_x^i t + b_x^i; \qquad
n_p = \sum_{i=1}^{I} v_p^i\, \sigma(z_p^i), \quad z_p^i = w_p^i t + b_p^i; \qquad
n_u = \sum_{i=1}^{I} v_u^i\, \sigma(z_u^i), \quad z_u^i = w_u^i t + b_u^i, \tag{6}$$

for $i = 1, 2, \ldots, I$, where $I$ is the number of neurons, which can be different for each neural network.

Now we are ready to use the neural networks (6) to define the main trial solutions. The trial solutions (for the state, costate and control functions) contain the neural networks and satisfy the initial or boundary conditions; thus, they can be defined with the following structures:

$$x_T = x_0 + (t - t_0)\, n_x, \qquad p_T = n_p, \qquad u_T = n_u. \tag{7}$$

It is easy to check that $x_T$ satisfies the initial condition ($x_T(t_0) = x_0$). Note that we may have $p(\cdot) = 0$ at free endpoints. For example, if $x(t_0)$ is free, we must have $p(t_0) = 0$, and thus we can define $p_T$ in (7) as $p_T = (t - t_0)\, n_p$. For other initial (or boundary) conditions, we can construct appropriate trial functions.

By replacing the trial solutions in the Hamiltonian function, we can define a trial Hamiltonian $H_T$, which is the conventional Hamiltonian $H$ with the functions $x$, $p$ and $u$ replaced by their corresponding trial forms ($x_T$, $p_T$ and $u_T$, respectively): $H_T(x_T(t), u_T(t), p_T(t), t) = f_0(x_T(t), u_T(t), t) + p_T(t)\, g(x_T(t), u_T(t), t)$. Thus, the trial Hamiltonian contains the weights of the neural networks. Since the trial solutions (7) must satisfy conditions (3), we substitute them into Eq. (3):

$$\frac{\partial H_T}{\partial x_T} + \dot{p}_T = 0, \qquad \frac{\partial H_T}{\partial p_T} - \dot{x}_T = 0, \qquad \frac{\partial H_T}{\partial u_T} = 0. \tag{8}$$

To solve the system (8), we define three error functions, one for each equation:

$$E_1(\phi, t) = \left[\frac{\partial H_T}{\partial x_T} + \dot{p}_T\right]^2, \qquad
E_2(\phi, t) = \left[\frac{\partial H_T}{\partial p_T} - \dot{x}_T\right]^2, \qquad
E_3(\phi, t) = \left[\frac{\partial H_T}{\partial u_T}\right]^2, \tag{9}$$

and finally a total error function $E(\phi, t) = E_1(\phi, t) + E_2(\phi, t) + E_3(\phi, t)$, where $\phi$ is a vector containing all the weights of the three neural networks (6). Indeed, $\phi$ contains all the weights $w_x, w_p, w_u, b_x, b_p, b_u, v_x, v_p$ and $v_u$. Now, instead of solving Eq. (8), we discretize the interval $[t_0, t_f]$ (by $m$ points) and solve the following unconstrained optimization problem:

$$\min_{\phi} \sum_{k=1}^{m} E(t_k, \phi). \tag{10}$$

To solve (10), which is an unconstrained optimization problem, we can use any optimization algorithm, such as steepest descent, Newton or quasi-Newton methods, as well as heuristic algorithms such as the genetic algorithm (GA) or particle swarm optimization (PSO).

After terminating the optimization step, we substitute the optimal values of the weights $\phi$ (containing the weights of the input and output layers and the bias vector) into Eq. (7) and obtain the trial structures of the state, costate and control functions.

The main advantages of this method are that the implementation of the algorithm is not very complicated, and we can use more hidden layers or more training points over the interval $[t_0, t_f]$ to obtain more accurate approximations. Finally, the state, costate and control functions are obtained as functions of time $t$, so we can calculate the solution at every arbitrary point of the interval $[t_0, t_f]$. Also, the proposed control and state functions are differentiable, which can be useful in applications.

4 Numerical simulations

In this section, we implement the proposed algorithm to solve four optimal control problems. For all problems, we used five parameters for each of the input, output and bias weight vectors. The intervals were discretized into ten equal parts. For the optimization step, we used the MATLAB 7 optimization toolbox with the quasi-Newton BFGS algorithm. The user can use other optimization algorithms, such as steepest descent, Newton-based methods, or heuristic algorithms such as GA or particle swarm optimization.
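To make this workflow concrete, the following is a minimal sketch (ours, not the authors' MATLAB implementation) for the linear-quadratic problem of Example 4.1 below: the trial solutions (14) are built from three five-neuron networks (6), the PMP residuals (13) are squared and summed over a uniform grid as in (9)–(10), and the result is minimized with SciPy's BFGS in place of the MATLAB optimization toolbox. The grid size, random initialization and helper names are our assumptions.

```python
# Minimal sketch of the method for Example 4.1 (not the authors' code).
# Trial solutions (14) with I = 5 sigmoid units per network, trained by
# minimizing the discretized PMP residual (10) with quasi-Newton BFGS.
import numpy as np
from scipy.optimize import minimize

I = 5                                  # sigmoid units per network
t = np.linspace(0.0, 1.0, 11)          # training grid on [0, 1]

def net(tt, w, b, v):
    """Value and d/dt of the network (6): sum_i v_i * sigmoid(w_i*t + b_i)."""
    s = 1.0 / (1.0 + np.exp(-(np.outer(tt, w) + b)))   # shape (len(tt), I)
    return s @ v, (s * (1.0 - s) * w) @ v

def unpack(phi):
    # phi stacks (w, b, v) for the x-, p- and u-networks, 3*I numbers each.
    return [phi[k * 3 * I:(k + 1) * 3 * I].reshape(3, I) for k in range(3)]

def total_error(phi):
    (wx, bx, vx), (wp, bp, vp), (wu, bu, vu) = unpack(phi)
    nx, dnx = net(t, wx, bx, vx)
    npv, dnpv = net(t, wp, bp, vp)
    nu, _ = net(t, wu, bu, vu)
    xT, dxT = 1.0 + t * nx, nx + t * dnx                 # trial state (14)
    pT, dpT = (t - 1.0) * npv, npv + (t - 1.0) * dnpv    # enforces p(1) = 0
    uT = nu
    e1 = 2.0 * xT + dpT        # dH/dx + p_dot, cf. (13) and (9)
    e2 = uT - dxT              # dH/dp - x_dot
    e3 = 2.0 * uT + pT         # dH/du
    return np.sum(e1**2 + e2**2 + e3**2)

phi0 = 0.5 * np.random.default_rng(0).standard_normal(9 * I)
res = minimize(total_error, phi0, method="BFGS")

# Compare with the closed-form solution of the linear PMP system (13).
(wx, bx, vx), _, (wu, bu, vu) = unpack(res.x)
tt = np.linspace(0.0, 1.0, 101)
xT, uT = 1.0 + tt * net(tt, wx, bx, vx)[0], net(tt, wu, bu, vu)[0]
x_star = np.cosh(1.0 - tt) / np.cosh(1.0)
u_star = -np.sinh(1.0 - tt) / np.cosh(1.0)
print(res.fun, np.abs(xT - x_star).max(), np.abs(uT - u_star).max())
```

The closed-form comparison at the end uses the solution of the linear system (13), derived after Example 4.1.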

Example 4.1 Consider the following optimization problem:

$$\min \int_0^1 \left[x^2(t) + u^2(t)\right] dt \quad \text{s.t.} \quad \dot{x} = u(t), \quad x(0) = 1, \ x(1) \text{ free}. \tag{11}$$

First we must construct the Hamiltonian function:

$$H(x, u, p, t) = x^2(t) + u^2(t) + p\, u(t). \tag{12}$$

Following Eq. (3), we must have

$$2x(t) = -\dot{p}, \qquad \dot{x} = u(t), \qquad 2u(t) + p = 0. \tag{13}$$

Because $x(1)$ is free, we have $p(1) = 0$. Considering this condition and the initial condition $x(0) = 1$, we can choose the trial solutions as

$$x_T = 1 + t\, n_x, \qquad p_T = (t - 1)\, n_p, \qquad u_T = n_u. \tag{14}$$

For this example, we used 15 weights for each neural network (five weights for each of the input-layer, output-layer and bias vectors). The approximate and exact solutions for $u(t)$ and $x(t)$ can be seen in Figs. 2 and 3, respectively. Figures 4 and 5 show the solution accuracy.

Fig. 2 Exact and approximated control function (Example 4.1)
Fig. 3 Exact and approximated state function (Example 4.1)
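For reference, system (13) is linear and admits a closed form (this derivation is ours; the paper only plots the exact curves). From $2u + p = 0$ and $\dot{x} = u$ we get $p = -2\dot{x}$, and substituting into $\dot{p} = -2x$ gives $\ddot{x} = x$ with $x(0) = 1$ and $\dot{x}(1) = -\tfrac{1}{2}p(1) = 0$, so

$$x^*(t) = \frac{\cosh(1-t)}{\cosh(1)}, \qquad u^*(t) = \dot{x}^*(t) = -\frac{\sinh(1-t)}{\cosh(1)}, \qquad p^*(t) = \frac{2\sinh(1-t)}{\cosh(1)},$$

which is consistent with the ranges plotted in Figs. 2 and 3 ($u^*(0) = -\tanh(1) \approx -0.76$ and $x^*(1) = 1/\cosh(1) \approx 0.65$).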

Example 4.2 Consider the following optimization problem:

$$\min \int_0^1 \left[(2 - x(t))^2 + u^2(t)\right] dt \quad \text{s.t.} \quad \dot{x} = -0.25\sqrt{x(t)} + u(t), \quad x(0) = 0, \ x(1) = 2. \tag{15}$$

First we must construct the Hamiltonian function:

$$H(x, u, p, t) = (2 - x(t))^2 + u^2(t) + p\left(-0.25\sqrt{x(t)} + u(t)\right). \tag{16}$$

Following Eq. (3), we must have

$$-2(2 - x(t)) - \frac{0.25\, p}{2\sqrt{x(t)}} = -\dot{p}, \qquad \dot{x} = -0.25\sqrt{x(t)} + u(t), \qquad 2u(t) + p = 0. \tag{17}$$

Considering the boundary conditions $x(0) = 0$ and $x(1) = 2$, we can choose the trial solutions as

$$x_T = 2t + t(t - 1)\, n_x, \qquad p_T = n_p, \qquad u_T = n_u. \tag{18}$$
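The quadratic blending term $t(t-1)$ in (18) is what pins down both boundary values regardless of the network output; a quick illustrative check (ours, with a stand-in function in place of the trained network):

```python
# Verify that the trial state of (18), x_T(t) = 2t + t(t-1) n_x(t),
# satisfies x(0) = 0 and x(1) = 2 for any choice of n_x.
import numpy as np

def x_trial(t, n_x):
    return 2.0 * t + t * (t - 1.0) * n_x(t)

n_x = lambda t: np.sin(3.0 * t) + 0.5        # stand-in for the trained network
print(x_trial(0.0, n_x), x_trial(1.0, n_x))  # prints 0.0 and 2.0
```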

Fig. 4 Error for estimating control function (Example 4.1)
Fig. 5 Error for estimating state function (Example 4.1)
Fig. 6 Exact and approximated control function (Example 4.2)
Fig. 7 Exact and approximated state function (Example 4.2)

For this example, we used 15 weights for each neural network (five weights for each of the input-layer, output-layer and bias vectors). The approximate and exact solutions can be seen in Figs. 6 and 7, and Figs. 8 and 9 show the solution accuracy. This example is solved in [6] by a variational iteration method. Our results are comparable with the results in [6]; however, the neural network method gives the state and control as differentiable functions of time, and the method is simpler to implement.

Fig. 8 Error for estimating control function (Example 4.2)

Example 4.3 Consider the following optimization problem:

$$\min\ J = -x(2) \quad \text{s.t.} \quad \dot{x} = \frac{5}{2}\left(-x + ux - u^2\right), \quad x(0) = 1. \tag{19}$$

Fig. 9 Error for estimating state function (Example 4.2)
Fig. 11 Error for estimating state function (Example 4.3)

The exact state and control functions are as follows:

$$x(t) = \frac{4}{1 + 3e^{5t/2}}, \qquad u(t) = \frac{x(t)}{2}. \tag{20}$$
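As a consistency check (our derivation; the paper states (20) without proof), applying conditions (3) to (19) with $H = p\cdot\tfrac{5}{2}(-x + ux - u^2)$ gives $\partial H/\partial u = \tfrac{5}{2}p\,(x - 2u) = 0$, hence $u = x/2$ whenever $p \neq 0$ (and the transversality condition $p(2) = \partial(-x(2))/\partial x = -1 < 0$ ensures this stationary point minimizes $H$). Substituting $u = x/2$ into the dynamics yields

$$\dot{x} = \frac{5}{2}\left(-x + \frac{x^2}{4}\right), \qquad x(0) = 1,$$

a separable equation whose solution is exactly $x^*(t) = 4/(1 + 3e^{5t/2})$, in agreement with (20).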

Figures 10 and 11 show the state function approximation and the corresponding error. Figures 12 and 13 show the control function approximation and its corresponding error.

Fig. 12 Exact and approximated control function (Example 4.3)

Example 4.4 Consider the following nonlinear optimal control problem [24, 25]:

$$\min \int_0^1 u^2(t)\, dt \quad \text{s.t.} \quad \dot{x} = 0.5 x^2(t) \sin(x(t)) + u(t), \quad x(0) = 0, \ x(1) = 0.5. \tag{21}$$

Fig. 10 Exact and approximated state function (Example 4.3)
Fig. 13 Error for estimating control function (Example 4.3)


This problem is solved in [25] by a variational method. For this example, we have $H(x(t), u(t), p(t), t) = u^2(t) + p\left(0.5 x^2(t)\sin(x(t)) + u(t)\right)$. Thus, conditions (3) can be derived as the following system:

$$\dot{x}(t) = 0.5 x^2(t)\sin(x(t)) - 0.5 p(t), \qquad
\dot{p}(t) = -p(t)\, x(t)\sin(x(t)) - 0.5\, p(t)\, x^2(t)\cos(x(t)), \qquad
2u(t) + p(t) = 0, \tag{22}$$

with the initial and final conditions $x(0) = 0$ and $x(1) = 0.5$. This system was also solved by a numerical method (the Euler method), and the results are displayed and compared with the results obtained by the neural network in Figs. 14 and 15.

Fig. 14 Control function (Example 4.4)
Fig. 15 State function (Example 4.4)
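The paper does not spell out this numerical baseline beyond naming the Euler method, so the following is one plausible reading (a sketch under our own assumptions, not the authors' code): integrate the boundary value problem (22) with explicit Euler and a shooting iteration on the unknown $p(0)$, then recover $u = -p/2$ and the cost of (21). The step count and shooting bracket are guesses.

```python
# Sketch of an Euler/shooting baseline for the two-point BVP (22).
# The bracket for p(0), the step count and all names are our assumptions.
import numpy as np
from scipy.optimize import brentq

N, t0, tf = 1000, 0.0, 1.0
h = (tf - t0) / N

def integrate(p0):
    """Explicit Euler for (22) starting from x(0) = 0, p(0) = p0."""
    x, p = 0.0, p0
    xs, ps = [x], [p]
    for _ in range(N):
        dx = 0.5 * x**2 * np.sin(x) - 0.5 * p
        dp = -p * x * np.sin(x) - 0.5 * p * x**2 * np.cos(x)
        x, p = x + h * dx, p + h * dp
        xs.append(x)
        ps.append(p)
    return np.array(xs), np.array(ps)

# Shoot on p(0) so that the terminal condition x(1) = 0.5 is met.
p0 = brentq(lambda g: integrate(g)[0][-1] - 0.5, -2.0, 0.0)
xs, ps = integrate(p0)
u = -0.5 * ps                                   # from 2u + p = 0
J = np.sum(0.5 * (u[1:]**2 + u[:-1]**2)) * h    # trapezoidal cost of (21)
print(p0, xs[-1], J)
```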

The optimal value of the objective functional obtained by the neural network method is $J^* = 0.2353$, and $x(t_f)$ is exactly equal to 0.5. Comparing our result with the results obtained in [25] shows the accuracy of the method based on neural networks.

5 Concluding remarks

This paper presented an approximate solution of optimal control problems based on the neural network approach. One advantage of the proposed method is that, to attain more accurate solutions, we can use more hidden layers and more training points, and we may also use heuristic algorithms such as GA and PSO, or other existing unconstrained optimization algorithms, in the optimization step. The proposed solution is a differentiable function (for the state, costate and control functions). Work is in progress to apply the method to approximate the solution of the HJB equation and also to problems arising in the calculus of variations.

References

1. Krabs W, Pickl S (2010) An optimal control problem in cancer chemotherapy. Appl Math Comput 217:1117–1124
2. Modares H, Naghibi Sistani MB (2011) Solving nonlinear optimal control problems using a hybrid IPSO–SQP algorithm. Eng Appl Artif Intell 24:476–484
3. Kirk DE (2004) Optimal control theory—an introduction. Dover Publications, Mineola, NY
4. Lewis F, Syrmos VL (1995) Optimal control. Wiley, New York
5. Hilscher RS, Zeidan V (2012) Hamilton–Jacobi theory over time scales and applications to linear-quadratic problems. Nonlinear Anal 75:932–950
6. Berkani S, Manseur F, Maidi A (2012) Optimal control based on the variational iteration method. Comput Math Appl 64:604–610
7. Garg D, Patterson M, Hager WW, Rao AV, Benson DA, Huntington GT (2010) A unified framework for the numerical solution of optimal control problems using pseudo-spectral methods. Automatica 46:1843–1851
8. Clever D, Lang J, Ulbrich S, Ziems JC (2010) Combination of an adaptive multilevel SQP method and a space-time adaptive PDAE solver for optimal control problems. Procedia Comput Sci 1:1435–1443
9. de Oliveira VA, Silva GN, Rojas-Medar MA (2009) KT-invexity in optimal control problems. Nonlinear Anal 71:4790–4797
10. Gerdts M (2008) A non-smooth Newton's method for control-state constrained optimal control problems. Math Comput Simul 79:925–936
11. Buldaev AS (2008) Perturbation methods in optimal control problems. Ecol Model 216:157–159
12. Salama AA (2006) Numerical methods based on extended one-step methods for solving optimal control problems. Appl Math Comput 183:243–250
13. Zhou YY, Yang XQ, Teo KL (2006) The existence results for optimal control problems governed by a variational inequality. J Math Anal Appl 321:595–608
14. Chryssoverghi I, Coletsos I, Kokkinis B (2006) Discretization methods for optimal control problems with state constraints. J Comput Appl Math 191:1–31
15. England R, Gomez S, Lamour R (2005) Expressing optimal control problems as differential algebraic equations. Comput Chem Eng 29:1720–1730
16. Rodriguez A (2004) On the local stability of the solution to optimal control problems. J Econ Dyn Control 28:2475–2484
17. Saberi Nik H, Effati S, Shirazian M (2012) An approximate-analytical solution for the Hamilton–Jacobi–Bellman equation via homotopy perturbation method. Appl Math Model 36(11):5614–5623
18. Cheng T, Sun H, Qu Z, Lewis FL (2009) Neural network solution for suboptimal control of non-holonomic chained form system. Trans Inst Meas Control 31(6):475–494
19. Cheng T, Lewis FL (2007) Neural network solution for finite-horizon H-infinity constrained optimal control of nonlinear systems. J Control Theory Appl 5(1):1–11
20. Cheng T, Lewis FL, Abu-Khalaf M (2007) Fixed-final-time-constrained optimal control of nonlinear systems using neural network HJB approach. IEEE Trans Neural Netw 18(6):1725–1737
21. Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw 22:237–246
22. Lagaris IE, Likas A, Fotiadis DI (1998) Artificial neural networks for solving ordinary and partial differential equations. IEEE Trans Neural Netw 9(5):987–1000
23. Kecman V (2001) Learning and soft computing. MIT Press, Cambridge, MA
24. Rubio JE (1986) Control and optimization: the linear treatment of nonlinear problems. Manchester University Press, Manchester
25. Shirazian M, Effati S (2012) Solving a class of nonlinear optimal control problems via He's variational iteration method. Int J Control Autom Syst 10(2):249–256
