Spacecraft Attitude Control using Path Integral Method via Riemann Manifold Hamiltonian Monte Carlo

Bryce G. Doerr∗ and Richard Linares†
University of Minnesota, Minnesota, 55455, USA

Christopher Petersen‡
Air Force Research Laboratory, Kirtland AFB, New Mexico, 87117, USA

The separation principle for control and estimation is the traditional method for stochastic optimal control of linear systems. However, such an approach does not hold as systems become increasingly nonlinear. This paper presents a new method for control through estimation based on the Path Integral (PI) formulation and the Riemann Manifold Hamiltonian Monte Carlo (RMHMC) method. The approach uses the path integral method to formulate the control problem as an estimation problem and adds the estimation of the model parameters to the problem. A solution is then found by solving the estimation problem with RMHMC sampling, which includes a solution for the control parameters. By solving the nonlinear control and estimation problem using the path integral method via RMHMC, no open algorithmic tuning parameters other than exploration noise are required, and the control solution is numerically robust to high dimensionality in the system control input. The methodology is applied specifically to the spacecraft attitude control problem, though it should be noted that the approach is generalizable. Simulation results are presented which demonstrate good performance of the PI via RMHMC method.

I. Introduction

There exists uncertainty in how dynamical systems are modeled, how information is obtained from sensors, and how external disturbances affect the system. This is traditionally reflected by adding process and measurement noise to the equations of motion. While some of these uncertainties are small, they present a large problem in areas that need precise control. For spacecraft specifically, such precision is needed in small satellites for astrophysics research, surveillance, and autonomous servicing, all of which require precise attitude control.1 The difficulty in controlling spacecraft precisely is that the system is highly nonlinear, and dynamic uncertainties can therefore be amplified, making the system difficult to control.2 Nevertheless, there is strong motivation to develop estimation and control methods that provide precise tracking so spacecraft can return the necessary data for analysis. Our work develops and extends control theory for the spacecraft attitude problem using the path integral method via Riemann Manifold Hamiltonian Monte Carlo (RMHMC) in order to control spacecraft with precision in the presence of uncertainties.

Several methods have been used to develop control and estimation techniques for nonlinear stochastic systems. For control of the spacecraft attitude problem, nonlinear optimal control strategies include solving the Hamilton-Jacobi equation3 and using physically-based penalty functions.4 For estimation, Kalman filtering provides accurate estimates of the spacecraft orientation and angular rates.5,6 Unfortunately, nonlinearities in the dynamics do not allow control and estimation to be combined without repercussions for performance and robustness to uncertainty. Fortunately, combined control and estimation methods exist through the separation principle.

∗Graduate Student, Aerospace Engineering and Mechanics. AIAA Student Member. email: [email protected].
†Assistant Professor, Aerospace Engineering and Mechanics. Senior AIAA Member. email: [email protected].
‡Research Engineer, Space Vehicles Directorate. AIAA Member. email: [email protected].


By developing a stochastic adaptive control system, the separation principle can be used to estimate and control the stochastic system simultaneously.7,8 The separation of the estimation and control can be optimal with respect to a known objective function, or it can be used as an assumption to simplify the problem. For example, the separation principle holds when the process and measurement noise in the system is Gaussian, the system is linear in the unknown parameters, and the objective function is quadratic.9 By separating the controller and estimator design, both can be combined to form a stochastic adaptive control that estimates the parameters and then uses that information to update the controller. The benefit of this solution is that if both the estimator and controller are optimal with respect to their individual objective functions, the resulting control input is optimal (if the additive uncertainty is Gaussian). This also follows the certainty equivalence principle, in which the additive uncertainty in the system does not change the optimality of the control solution. Unfortunately, this does not let the controller exploit the uncertainties in the parameters to improve itself; such controllers are non-dual adaptive controllers.

Dual control model predictive control (MPC) has been used for simultaneous identification and control of uncertain systems while satisfying constraints.10-12 MPC computes the control input by solving a finite-horizon optimization problem at the current time instant. The benefit of dual control MPC is that it provides robustness to uncertainties through simultaneous identification and control for linear systems. Dual control MPC has been studied for spacecraft rendezvous and attitude control applications.13,14 Most dual control MPC methods, however, require additional hard constraints to produce a stabilizing control. An example is the persistent excitation condition, a constraint that keeps the input full rank by forcing it to be persistently exciting.15 With this constraint, the problem becomes more complex and more computationally expensive.15 Nonlinear MPC adds further complexity, as complicated programming methods are necessary and a solution may not even be guaranteed. In our formulation, the persistent excitation condition is not needed to provide robustness to uncertainties because Riemann Manifold Hamiltonian Monte Carlo (RMHMC) is exploited instead.

Stochastic adaptive control and dual control MPC, which combine estimation and control, are viable under different conditions, but the nonlinear spacecraft attitude problem needs methods that are robust to uncertainties and not computationally expensive. The objective of this paper is to control a spacecraft's attitude through the use of the path integral formulation. By setting up an objective function that includes a control and an estimation term, the problem can be converted to an inference problem that Markov Chain Monte Carlo (MCMC) can solve. In previous work, the inference problem was solved using a variant of MCMC called Hamiltonian Monte Carlo (HMC).16 In this work, a variation of HMC called RMHMC is used, which overcomes the difficulty of tuning the exploration noise.
Compared to other MCMC algorithms, RMHMC requires fewer iterations than the Metropolis-Hastings (MH) approach to form the sampled distribution of the probability density, and it accepts the proposed states in most runs of the Monte Carlo simulations.17,18 RMHMC can explore the probability density function (PDF) of the inference problem more efficiently than MH at higher dimensionality. By solving the nonlinear control and estimation problem using the path integral formulation via RMHMC, the path integral method provides a straightforward formulation, no open algorithmic tuning parameters other than exploration noise are needed, and the control solution is numerically robust to high dimensionality in the system.19

The paper is organized as follows. In Section II, the path integral and MCMC theory is discussed, including methods and solutions for the path integral, the Metropolis-Hastings algorithm, and the Riemann Manifold Hamiltonian Monte Carlo algorithm. Section III discusses the dynamical models used to demonstrate the methods and solutions of the previous section. Section IV discusses simulation results for the path integral formulation, MCMC, and RMHMC. Lastly, concluding remarks and future research directions are provided in Section V.

II. Theory

A. Stochastic Optimal Control Problem

The following discussion of stochastic optimal control closely follows that of Theodorou.19 The stochastic dynamical system is defined as

\dot{x}_t = f(x_t, t, \theta) + G(x_t)(u_t + \epsilon_t),    (1)

y_t = h(t, x_t) + v_t,    (2)


where x_t is the system state, u_t is the control input, ε_t is zero-mean Gaussian noise with variance Σ_ε, f(x_t, t, θ) is the dynamics as a function of the state, and G(x_t) is the control matrix. It is desired to determine both the control trajectory and the best parameters θ that match the output data y_t. The approach of this paper is to achieve this objective using the path integral control method to formulate the control problem as an inference problem and then add the inference of the model parameters θ to the problem.30-32 A generalized finite-horizon cost function for the trajectory is given by

R(\tau_k) = \phi_{t_N} + \int_{t_k}^{t_N} r_t \, dt,    (3)

where t_k is the initial time, t_N is the final time, τ_k is the trajectory, φ_{t_N} is the terminal cost, and r_t is the immediate cost. The immediate cost is given in the form

r_t = q(x_t, t) + \frac{1}{2} u_t^\top R u_t,    (4)

where q(x_t, t) is the state-dependent cost and R is the positive semi-definite control weight matrix. The cost is assumed to be quadratic in the control input. For deterministic optimal control, the input is found by minimizing the cost function with respect to the control; for stochastic optimal control, the input is found by minimizing the expectation of the cost function with respect to the control,

V(x_{t_k}) = \min_{u_{t_k:t_N}} E_{\tau_k}[R(\tau_k)],    (5)

where E_{τ_k}[·] is the expectation over all trajectories. The stochastic optimal control problem, Eq. (5), can be equivalently expressed as the Hamilton-Jacobi-Bellman (HJB) equation

-\partial_t V_t = \min_u \left( r_t + (\nabla_x V_t)^\top F_t + \frac{1}{2} \mathrm{Trace}\left[ (\nabla_{xx} V_t) \, G_t \Sigma_\epsilon G_t^\top \right] \right),    (6)

where F_t is the nonlinear dynamics without the noise, F_t = f(x_t, t) + G(x_t) u_t, ∂_t is the partial derivative with respect to time, ∇_x V_t is the Jacobian of the value function, and ∇_{xx} V_t is the Hessian of the value function.20,21

B. Hamilton-Jacobi-Bellman (HJB) Solution

To obtain the HJB solution, the optimal control must first be found by minimizing Eq. (6). Equation (4) is substituted into Eq. (6), and the minimum is found by taking the gradient with respect to the control and setting it to zero, which results in

u_t = -R^{-1} G_t^\top (\nabla_{x_t} V_t).    (7)

The second-order partial differential equation (PDE) that results from substituting Eqs. (4) and (7) is

-\partial_t V_t = q_t + (\nabla_x V_t)^\top f_t - \frac{1}{2} (\nabla_x V_t)^\top G_t R^{-1} G_t^\top (\nabla_x V_t) + \frac{1}{2} \mathrm{Trace}\left( (\nabla_{xx} V_t) \, G_t \Sigma_\epsilon G_t^\top \right).    (8)

The goal is to transform this second-order PDE into a linear PDE for which solutions exist. To form a linear PDE from Eq. (8), an exponential transformation of the value function is taken,

V_t = -\lambda \log \Psi_t,    (9)

where Ψ_t is the exponential cost. The partial derivatives with respect to time (∂_t V_t) and state (∇_x V_t and ∇_{xx} V_t) are given by

\partial_t V_t = -\lambda \frac{1}{\Psi_t} \partial_t \Psi_t,    (10)

\nabla_x V_t = -\lambda \frac{1}{\Psi_t} \nabla_x \Psi_t,    (11)

\nabla_{xx} V_t = \lambda \frac{1}{\Psi_t^2} \nabla_x \Psi_t \nabla_x \Psi_t^\top - \lambda \frac{1}{\Psi_t} \nabla_{xx} \Psi_t.    (12)


These are substituted back into Eq. (8) to form

\frac{\lambda}{\Psi_t} \partial_t \Psi_t = q_t - \frac{\lambda}{\Psi_t} (\nabla_x \Psi_t)^\top f_t - \frac{\lambda^2}{2 \Psi_t^2} (\nabla_x \Psi_t)^\top G_t R^{-1} G_t^\top (\nabla_x \Psi_t) + \frac{1}{2} \mathrm{Trace}(\Gamma),    (13)

where

\Gamma = \left( \lambda \frac{1}{\Psi_t^2} \nabla_x \Psi_t \nabla_x \Psi_t^\top - \lambda \frac{1}{\Psi_t} \nabla_{xx} \Psi_t \right) G_t \Sigma_\epsilon G_t^\top.    (14)

Equation (14) can be simplified further using a property of traces,

\mathrm{Trace}(\Gamma) = \lambda \frac{1}{\Psi_t^2} \mathrm{Trace}\left( \nabla_x \Psi_t^\top G_t \Sigma_\epsilon G_t^\top \nabla_x \Psi_t \right) - \lambda \frac{1}{\Psi_t} \mathrm{Trace}\left( \nabla_{xx} \Psi_t \, G_t \Sigma_\epsilon G_t^\top \right).    (15)

Next, it is assumed that λR^{−1} = Σ_ε. This condition relates the noise variance to the control cost and implies that a higher noise variance should correspond to a lower control cost;22,23 conversely, a lower noise variance should correspond to a higher control cost. For example, a system may be under a large disturbance (high noise variance), so a more significant control input is required to bring the system back to stability or to a desired state; a more significant control input corresponds to a lower control weight R. Thus, the control weight matrix R is inversely proportional to the variance of the noise. Using this assumption, the expression inside Eq. (13) simplifies through

\lambda G_t R^{-1} G_t^\top = G_t \Sigma_\epsilon G_t^\top = \Sigma(x_t) = \Sigma_t,    (16)

and the quadratic term in Eq. (13) cancels with the first term in Eq. (15). The PDE reduces to

-\partial_t \Psi_t = -\frac{1}{\lambda} q_t \Psi_t + f_t^\top \nabla_x \Psi_t + \frac{1}{2} \mathrm{Trace}\left( \nabla_{xx} \Psi_t \, G_t \Sigma_\epsilon G_t^\top \right),    (17)

with boundary condition Ψ_{t_N} = exp(−(1/λ) φ_{t_N}). The second-order linear PDE of Eq. (17) is known as the Chapman-Kolmogorov partial differential equation. Unfortunately, analytical solutions usually cannot be found for general problems with nonlinear dynamics and cost functions, but the Chapman-Kolmogorov equation can be converted into an inference problem, which can then be approximated by well-known approximate inference techniques such as Markov chain Monte Carlo (MCMC) sampling or, in this work, RMHMC. To convert the equation into an inference problem, a relation between the solution of PDEs and their representation as stochastic differential equations (SDEs) is used. The Feynman-Kac formula provides this relation by connecting the distributions of random processes that solve SDEs to numerical methods for solving PDEs.24,25 Applying the Feynman-Kac formula to Eq. (17) gives

\Psi_{t_k} = E_{\tau_k}\left[ \Psi_{t_N} \, e^{-\int_{t_k}^{t_N} \frac{1}{\lambda} q_t \, dt} \right] = E_{\tau_k}\left[ \exp\left( -\frac{1}{\lambda} \phi_{t_N} - \frac{1}{\lambda} \int_{t_k}^{t_N} q_t \, dt \right) \right]    (18)

in continuous time, or

\Psi_{t_k} = \lim_{dt \to 0} \int p(\tau_k | x_{t_k}) \exp\left[ -\frac{1}{\lambda} \left( \phi_{t_N} + \sum_{j=k}^{N-1} q_{t_j} \, dt \right) \right] d\tau_k    (19)

in discrete time, where p(τ_k | x_{t_k}) is the probability of a sample trajectory conditioned on the initial state x_{t_k}. The integration is taken with respect to each sample trajectory τ_k. Therefore, the exponential cost-to-go Ψ_{t_k} can be solved for. The conditional probability p(τ_k | x_{t_k}) is developed in the next section using path integral theory.

C. Path Integral Formulation

The probability p(τ_k | x_{t_k}) can be expanded as

p(\tau_k | x_{t_k}) = p(\tau_{k+1} | x_{t_k}) = p(x_{t_{k+1}}, \ldots, x_{t_N} | x_{t_k}) = \prod_{j=k}^{N-1} p(x_{t_{j+1}} | x_{t_j}),    (20)


where the Markov property makes the state transitions conditionally independent and the initial state x_{t_k} does not contribute to the joint conditional probability. The transition probability p(x_{t_{k+1}} | x_{t_k}), under the assumption of Brownian motion with zero-mean Gaussian increments, is

p(x_{t_{k+1}} | x_{t_k}) = \mathcal{N}(x_{t_k}; \; x_{t_{k+1}} - f_{t_k} dt, \; \Sigma_{t_k}),    (21)

where Σ_{t_k} is formed as G_{t_k} Σ_ε G_{t_k}^⊤ dt. The probability density with inclusion of the parameters and measurements is

p(\tau_k | x_{t_k}) = \prod_{j=k}^{N-1} p(y_{t_{j+1}} | x_{t_{j+1}}) \, p(x_{t_{j+1}} | x_{t_j}, \theta).    (22)

Equation (22) is simplified by using Eq. (16), resulting in

\Sigma_{t_j} = G_{t_j} \Sigma_\epsilon G_{t_j}^\top dt = \lambda G_{t_j} R^{-1} G_{t_j}^\top dt = \lambda H_{t_j} dt,    (23)

where H_{t_j} = G_{t_j} R^{-1} G_{t_j}^\top. The probability p(τ_k | x_{t_k}) then simplifies to

p(\tau_k | x_{t_k}) = \prod_{j=k}^{N-1} \frac{1}{(2\pi \lambda \, dt)^{l/2} \sqrt{\det(H_{t_j})}} \, \exp\left( -\frac{1}{2\lambda} \sum_{j=k}^{N-1} \left\| \frac{x_{t_{j+1}} - x_{t_j}}{dt} - f_{t_j} \right\|^2_{H_{t_j}^{-1}} dt \right),    (24)

where ‖v‖²_M = v^⊤ M v is the weighted square norm (Mahalanobis distance) and l is the dimension of the state. With Eqs. (19) and (24), the path integral cost-to-go Ψ_{t_k} becomes

\Psi_{t_k} = \lim_{dt \to 0} \int \exp\left( -\frac{1}{\lambda} Z(\tau_k) \right) d\tau_k,    (25)

where

Z(\tau_k) = \tilde{S}(\tau_k) + \frac{\lambda (N-k) l}{2} \log(2\pi \, dt \, \lambda),    (26)

\tilde{S}(\tau_k) = S(\tau_k) + \frac{\lambda}{2} \sum_{j=k}^{N-1} \log \det(H_{t_j}),    (27)

and

S(\tau_k) = \phi_{t_N} + \sum_{j=k}^{N-1} q_{t_j} \, dt + \frac{1}{2} \sum_{j=k}^{N-1} \left\| \frac{x_{t_{j+1}} - x_{t_j}}{dt} - f_{t_j} \right\|^2_{H_{t_j}^{-1}} dt.    (28)

S(τ_k) is the cost function of the optimal control problem, and it can be compared directly to Eqs. (3) and (4). In Eq. (27), if G_{t_k} is not state dependent (i.e., G_{t_k} = G), then the (λ/2) Σ_{j=k}^{N−1} log det(H_{t_j}) term is constant, drops out of the normalized probability, and S̃(τ_k) = S(τ_k).
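To make the discrete-time quantities concrete, the sketch below evaluates the trajectory cost S(τ_k) of Eq. (28) for one sampled trajectory. This is a minimal illustration, not code from the paper: the array shapes, the helper name path_cost, and the assumption that G (and hence H) is state independent and invertible (fully actuated system) are ours.

```python
import numpy as np

def path_cost(xs, f, q, phi, H_inv, dt):
    """Evaluate S(tau_k) of Eq. (28) for one sampled state trajectory.

    xs    : (N+1, n) array of states x_{t_k}, ..., x_{t_N}
    f     : callable, drift f(x) (G assumed state independent)
    q     : callable, state-dependent immediate cost q(x)
    phi   : callable, terminal cost phi(x_{t_N})
    H_inv : (n, n) inverse of H = G R^{-1} G^T (assumed invertible)
    dt    : integration time step
    """
    S = phi(xs[-1])                                   # terminal cost
    for j in range(len(xs) - 1):
        S += q(xs[j]) * dt                            # running state cost
        # Mahalanobis-weighted deviation of the realized increment from the drift
        dev = (xs[j + 1] - xs[j]) / dt - f(xs[j])
        S += 0.5 * (dev @ H_inv @ dev) * dt
    return S
```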

D. Path Integral Optimal Control

It was shown in Eq. (7) that the minimization of the HJB equation with respect to u gives the optimal control in terms of the Jacobian of the value function. Substituting the transformation of the value function into Eq. (7), the control becomes

u_{t_k} = \lambda R^{-1} G_{t_k}^\top \frac{\nabla_{x_{t_k}} \Psi_{t_k}}{\Psi_{t_k}}.    (29)

The above equation can be reduced by substituting Eq. (25), giving

u_{t_k} = \int P(\tau_k) \, u_L(\tau_k) \, d\tau_k,    (30)

P(\tau_k) = \frac{e^{-\frac{1}{\lambda} \tilde{S}(\tau_k)}}{\int e^{-\frac{1}{\lambda} \tilde{S}(\tau_k)} \, d\tau_k},    (31)


and

u_L(\tau_k) = R^{-1} G_{t_k}^\top \left( G_{t_k} R^{-1} G_{t_k}^\top \right)^{-1} \left( G_{t_k} \epsilon_{t_k} - \frac{1}{2} \lambda H_{t_k} \left[ \mathrm{Trace}\left( H_{t_k}^{-1} \, \partial_{x_j} H_{t_k} \right) \right]_j \right).    (32)

Equations (30), (31), and (32) are the path integral solution to the stochastic optimal control problem. With these final equations, both the estimation part p(y_{t_{k+1}} | x_{t_{k+1}}) and the objective function of the control problem are included in an overall inference problem that MCMC sampling can solve.
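As a numerical illustration of Eqs. (30) and (31), the sketch below approximates the optimal control as a weighted average over a finite set of sampled trajectories. The softmax-style weighting with a log-sum-exp shift for numerical stability is our implementation choice, not a detail given in the paper.

```python
import numpy as np

def path_integral_control(S_tilde, u_local, lam):
    """Approximate Eq. (30): u = sum_i P(tau_i) u_L(tau_i).

    S_tilde : (M,) costs S~(tau_i) of M sampled trajectories, Eq. (27)
    u_local : (M, m) local controls u_L(tau_i), Eq. (32)
    lam     : temperature lambda from the exponential transform, Eq. (9)
    """
    logw = -S_tilde / lam
    logw -= logw.max()            # shift for numerical stability
    w = np.exp(logw)
    P = w / w.sum()               # discrete version of Eq. (31)
    return P @ u_local            # expectation over trajectories
```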

E. Markov Chain Monte Carlo (Metropolis-Hastings)

Markov Chain Monte Carlo (MCMC) methods sample from probability densities that are known only up to a normalizing constant.26 These samples can then be used to provide estimates of moments or other statistical properties of the PDF. Suppose an integral of the form

\bar{U} = \int_{-\infty}^{\infty} U(x) \, p(x) \, dx    (33)

is to be determined, but the PDF p(x) is known only up to a scale factor. MCMC solves this problem by providing a sequence of samples x_i ∼ p(x). Additionally, even when p(x) is known, computing such multidimensional integrals by standard numerical integration can be computationally intensive. This computation can be alleviated by sampling the distribution and using statistical properties of the samples to determine the integral over the high-dimensional state. The integral in Eq. (33) can be viewed as an expectation with respect to p(x), so the integration problem can be expressed as a numerical expectation approximated by samples from the distribution,

\bar{U} \approx \frac{1}{N} \sum_{i=1}^{N} U(x_i),    (34)

where x_i ∼ p(·) and N is the number of samples. This determines the integral of U(x) by averaging U over the accepted samples. Both MH and HMC, explained below, sample the distribution to determine the integral, but they differ in how they propose and accept samples while exploring the PDF.

The idea of the MH algorithm is to pick a neighboring state of the current state and either accept or reject it depending on the change in energy. First, a candidate near the last state is selected,

x_n' = x_n + w_k,    (35)

where w_k is zero-mean Gaussian noise; the candidate is thus created through a random walk. With probability

p = \min\left[ 1, \exp\left( \log p(x_n') - \log p(x_n) \right) \right],    (36)

the new state is accepted as x_{n+1} = x_n'; otherwise it is rejected with probability 1 − p and x_{n+1} = x_n. From Eq. (36), the candidate is always accepted if the energy of the candidate state is lower than that of the current state (i.e., p(x_n') ≥ p(x_n)); otherwise, it is accepted with probability p(x_n')/p(x_n).
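A minimal random-walk Metropolis-Hastings sampler implementing Eqs. (35) and (36) is sketched below. The target is supplied as a log-density so the ratio in Eq. (36) stays numerically stable; the step size and the interface are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_samples, step=0.5, rng=None):
    """Random-walk MH: propose x' = x + w (Eq. (35)) and accept with
    probability min[1, exp(log p(x') - log p(x))] (Eq. (36))."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lp = log_p(x)
    samples = np.empty((n_samples, x.size))
    accepted = 0
    for i in range(n_samples):
        cand = x + step * rng.standard_normal(x.size)   # random walk
        lp_cand = log_p(cand)
        if np.log(rng.uniform()) < lp_cand - lp:        # MH acceptance test
            x, lp = cand, lp_cand
            accepted += 1
        samples[i] = x
    return samples, accepted / n_samples
```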


F. Hamiltonian Monte Carlo

HMC provides a more efficient way of sampling the PDF of a distribution compared to the MH algorithm for high-dimensional statistical problems.17 HMC can be pictured as a hockey puck sliding over a frictionless surface: it stops at a certain point in time and is then pushed again in a random direction. With this behavior, it can explore the PDF much more readily than a random walk. While the MH algorithm proposes a candidate through a random walk, HMC proposes a candidate by simulating Hamiltonian dynamics,

H(x_t, p_t) = V(x_t) + T(x_t, p_t),    (37)

where V(x_t) is the potential energy given by the negative log of the target PDF (V(x_t) = −log p(τ_k)) and T(x_t, p_t) is the kinetic energy, which depends on the state parameters and an auxiliary momentum variable. The gradient term leads the Hamiltonian toward higher-probability states. The momentum inside T(x_t, p_t), which plays the role of the velocity of the Hamiltonian system, is randomly generated for each proposed candidate. The kinetic energy is assumed to have the form

T(p_t) = \frac{1}{2} p_t^\top M^{-1} p_t,    (38)

where M is the symmetric, positive definite mass matrix of the system. Substituting this equation into Eq. (37), the Hamiltonian is

H(x_t, p_t) = V(x_t) + \frac{1}{2} p_t^\top M^{-1} p_t.    (39)

From this Hamiltonian, the trajectories are determined by taking partial derivatives with respect to p_t and x_t,

\dot{x}_t = \frac{\partial H}{\partial p_t} = M^{-1} p_t    (40)

and

\dot{p}_t = -\frac{\partial H}{\partial x_t} = -\nabla_{x_t} V(x_t),    (41)

where ∇_{x_t} is the gradient with respect to the state. These dynamics preserve total energy and volume and are time reversible. The state candidate is determined by numerically integrating Eq. (40) over some integration time t, which is chosen to increase the probability of acceptance. To determine the state candidate by integration, a numerical integrator that preserves energy, volume, and time reversibility is needed; the leapfrog method preserves these attributes.17 The leapfrog integration moves forward by a step Δ with

p_{t+\frac{\Delta}{2}} = p_t - \frac{\Delta}{2} \nabla_x V(x_t),    (42)

x_{t+\Delta} = x_t + \Delta M^{-1} p_{t+\frac{\Delta}{2}},    (43)

and

p_{t+\Delta} = p_{t+\frac{\Delta}{2}} - \frac{\Delta}{2} \nabla_x V(x_{t+\Delta}).    (44)

Unfortunately, numerical approximation introduces errors to the integration, and the leapfrog method only approximately conserves energy. To correct for this, an accept/reject step similar to that of the MH algorithm is used,

\alpha(x_t, x_t^*) = \min\left[ 1, \exp\left( -H(x_t^*, p_t^*) + H(x_t, p_t) \right) \right].    (45)

If the numerical integration produced no errors (i.e., the Hamiltonian were exactly conserved), the proposed candidates would have a 100% probability of acceptance. In reality, numerical integration errors can cause the trajectory to head toward a lower-probability state, which increases the probability that the candidate is rejected; if the trajectory increases the probability, it is always accepted. Thus, with a set number of samples, the PDF can be explored through a position and a momentum term, which provides a more thorough exploration than MH, with an acceptance rate that does not degrade with the dimensionality of the candidate state.
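The sketch below implements one HMC transition using the leapfrog updates of Eqs. (42)-(44) and the acceptance test of Eq. (45). The step size, number of leapfrog steps, and identity mass matrix M = I are illustrative assumptions.

```python
import numpy as np

def hmc_step(log_p, grad_log_p, x, eps=0.1, n_leapfrog=20, rng=None):
    """One HMC transition with V(x) = -log p(x) and M = I (Eq. (39))."""
    rng = np.random.default_rng() if rng is None else rng
    p = rng.standard_normal(x.size)                 # fresh random momentum
    H0 = -log_p(x) + 0.5 * (p @ p)                  # initial Hamiltonian
    x_new, p_new = x.copy(), p.copy()
    for _ in range(n_leapfrog):
        p_new += 0.5 * eps * grad_log_p(x_new)      # half kick, Eq. (42)
        x_new += eps * p_new                        # drift, Eq. (43)
        p_new += 0.5 * eps * grad_log_p(x_new)      # half kick, Eq. (44)
    H1 = -log_p(x_new) + 0.5 * (p_new @ p_new)
    if np.log(rng.uniform()) < H0 - H1:             # accept test, Eq. (45)
        return x_new, True
    return x, False
```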


Equations (42) and (43) can be combined and written in terms of x_{t+Δ} as

x_{t+\Delta} = x_t + \Delta M^{-1} p_t - \frac{\Delta^2}{2} M^{-1} \nabla_x V(x_t),    (46)

which has the form of a preconditioned gradient descent or Langevin diffusion. From this equation, a tuned mass matrix is necessary to obtain good exploration characteristics, but tuning the mass matrix M is difficult without knowledge of the target density.

G. Riemann Manifold Hamiltonian Monte Carlo

The Riemann Manifold Hamiltonian Monte Carlo (RMHMC) method removes the difficulty of manually tuning the mass matrix by using the local structure of the target density. This is done by utilizing the Fisher information matrix of the PDF, which depends on the state,

M(x) = F = E\left[ \left( \frac{\partial}{\partial x} \log p(\cdot) \right) \left( \frac{\partial}{\partial x} \log p(\cdot) \right)^\top \right].    (47)

Note, however, that the mass matrix is now state dependent, so the standard leapfrog method is not applicable to the resulting non-separable Hamiltonian system. Nevertheless, a generalized leapfrog algorithm can be used to simulate the non-separable dynamics using the Fisher information matrix. The Hamiltonian for the RMHMC approach is

H(x_t, p_t) = V(x_t) + \frac{1}{2} p_t^\top F^{-1} p_t + \frac{1}{2} \log\left[ \det(F) \right],    (48)

which must be solved to perform HMC on the Riemann manifold defined by the geometry of the parameter space. This is challenging because the non-separable Hamiltonian dynamics require gradients of F. Therefore, in this approach, F is held constant while the Hamiltonian is simulated to generate each proposal state; F is then updated before the next proposal candidate. Thus, the method uses the Riemann manifold geometry without a generalized leapfrog integrator or further derivatives of F. With this simplification, the Hamiltonian is

H(x_t, p_t) = V(x_t) + \frac{1}{2} p_t^\top F^{-1} p_t,    (49)

where the (1/2) log[det(F)] term disappears since the mass matrix is held constant for the proposal step. The difficulty of tuning the mass matrix is thereby removed, and the sampler can explore the PDF without difficulty.
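Under the paper's simplification (F held fixed while each proposal is simulated), a transition reduces to an HMC step preconditioned by the current Fisher matrix. The sketch below shows this, reusing the leapfrog scheme with M = F; the Cholesky-based momentum draw and the linear solves are our implementation choices.

```python
import numpy as np

def rmhmc_step(log_p, grad_log_p, fisher, x, eps=0.1, n_leapfrog=20, rng=None):
    """One transition with mass matrix M = F(x) held fixed for the proposal
    (Eq. (49)); F is re-evaluated at the start of every step (Eq. (47))."""
    rng = np.random.default_rng() if rng is None else rng
    F = fisher(x)                                    # local metric, Eq. (47)
    L = np.linalg.cholesky(F)
    p = L @ rng.standard_normal(x.size)              # momentum p ~ N(0, F)
    H0 = -log_p(x) + 0.5 * (p @ np.linalg.solve(F, p))
    x_new, p_new = x.copy(), p.copy()
    for _ in range(n_leapfrog):                      # leapfrog with M = F
        p_new += 0.5 * eps * grad_log_p(x_new)
        x_new += eps * np.linalg.solve(F, p_new)
        p_new += 0.5 * eps * grad_log_p(x_new)
    H1 = -log_p(x_new) + 0.5 * (p_new @ np.linalg.solve(F, p_new))
    if np.log(rng.uniform()) < H0 - H1:
        return x_new, True
    return x, False
```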

H. Path Integral via RMHMC

For the nonlinear control and estimation problem, a path integral method can be developed using Eqs. (30), (31), and (32). Specifically, the probability P(τ_k) in Eq. (31) has a PDF of the form

p(\tau_k) = \exp\left( -\frac{1}{\lambda} \tilde{S}(\tau_k) \right),    (50)

where S̃(τ_k) is the cost function of the optimal control problem and λ is the constant used in the exponential transformation of Eq. (9). Using Eq. (50), MCMC simulation can determine the mean in Eq. (34), which approximates the integral in Eq. (33). By inspection, Eq. (33) has the same form as the path integral optimal control solution of Eq. (30); thus, by utilizing MCMC sampling, the control trajectory for a nonlinear control and estimation problem can be found. The improved MCMC method, RMHMC, is used to determine the mean of the control trajectory by sampling more efficiently than its MH counterpart.
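Putting the pieces together, the control solution is the sample mean of the control trajectories drawn from Eq. (50). A high-level sketch of the loop, under the assumption that the sampler state is the stacked control vector and reusing rmhmc_step from above, is:

```python
import numpy as np

def pi_rmhmc_control(S_tilde, grad_S_tilde, fisher, U0, lam, n_samples=3000):
    """Sample p(U) ~ exp(-S~(U)/lambda), Eq. (50), with RMHMC and return
    the mean control trajectory (Eqs. (30) and (34))."""
    log_p = lambda U: -S_tilde(U) / lam            # V(U) = S~(U)/lambda
    grad_log_p = lambda U: -grad_S_tilde(U) / lam
    U = np.asarray(U0, dtype=float)
    draws = []
    for _ in range(n_samples):
        U, _ = rmhmc_step(log_p, grad_log_p, fisher, U)
        draws.append(U.copy())
    return np.mean(draws, axis=0)                  # control input, Eq. (30)
```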

III. Dynamical Models

A double integrator system and a spacecraft attitude system are used to determine the viability of the path integral method via MCMC. The double integrator system is used specifically because the linearized spacecraft attitude equations of motion reduce to a form similar to a double integrator system. The dynamics and kinematics of each are given below.


A. Double Integrator Model

The double integrator linear time-invariant (LTI) system is defined by

A_c = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B_c = \begin{bmatrix} 0 \\ 1 \end{bmatrix},    (51)

where A_c and B_c are the continuous-time state matrices. The A_c and B_c matrices are discretized over a fixed time interval utilizing a zero-order hold on the control (i.e., the control is held constant over the interval). This results in the discretized A and B matrices and the equations of motion

x_{k+1} = A x_k + B (u_k + \epsilon_k).    (52)
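A compact way to obtain the zero-order-hold discretization is through the matrix exponential of an augmented matrix; the sketch below is a standard construction under that approach, not code from the paper, and the 0.01 s step is a placeholder.

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(Ac, Bc, dt):
    """Zero-order-hold discretization, x_{k+1} = A x_k + B u_k:
    expm([[Ac, Bc], [0, 0]] dt) = [[A, B], [0, I]]."""
    n, m = Ac.shape[0], Bc.shape[1]
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = Ac, Bc
    Md = expm(M * dt)
    return Md[:n, :n], Md[:n, n:]

# Double integrator from Eq. (51)
Ac = np.array([[0.0, 1.0], [0.0, 0.0]])
Bc = np.array([[0.0], [1.0]])
A, B = zoh_discretize(Ac, Bc, dt=0.01)
```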

To solve for the control trajectory U_k = [u_k, …, u_{k+N−1}]^⊤, the dynamical equation is stacked so that the state trajectory depends only on the initial state and the control input at each time step, as given by

X_{k+1} = \Pi x_k + \Upsilon U_k,    (53)

Y_k = \Phi X_k,    (54)

where

\Pi = \begin{bmatrix} A \\ A^2 \\ \vdots \\ A^N \end{bmatrix},    (55)

\Upsilon = \begin{bmatrix} B & 0 & \cdots & \cdots & 0 \\ AB & B & 0 & \cdots & 0 \\ A^2 B & AB & B & \ddots & \vdots \\ \vdots & \vdots & \ddots & \ddots & 0 \\ A^{N-1} B & A^{N-2} B & \cdots & AB & B \end{bmatrix},    (56)

\Phi = \begin{bmatrix} C & 0 & \cdots & 0 \\ 0 & C & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & C \end{bmatrix},    (57)

x_k is the initial state, Y_k = [y_{k+1}, …, y_{k+N}]^⊤ is the output trajectory vector, and X_k = [x_{k+1}, …, x_{k+N}]^⊤ is the state trajectory vector. An optimal control problem is determined through a finite-time-horizon optimization,

\min_{U_k = [u_k, \ldots, u_{k+N-1}]^\top} \; \frac{1}{2} U_k^\top Q U_k + H U_k,    (58)

where Q and H are given by

Q = 2 L_1 + 2 \Upsilon^\top L_2 \Upsilon,    (59)

H = 2 x_k^\top \Pi^\top L_2 \Upsilon.    (60)

L_1 and L_2 are defined as

L_1 = \begin{bmatrix} R & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & R \end{bmatrix}, \quad L_2 = \begin{bmatrix} Q & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & Q & 0 \\ 0 & \cdots & 0 & P \end{bmatrix}.    (61)
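The stacked matrices of Eqs. (55) and (56) and the quadratic cost terms of Eqs. (59) and (60) can be assembled directly; the sketch below does so, with the weight matrices Qx, Pf, and Ru as illustrative placeholders.

```python
import numpy as np

def stack_dynamics(A, B, N):
    """Build Pi (Eq. (55)) and Upsilon (Eq. (56)) so X = Pi x0 + Ups U."""
    n, m = B.shape
    Pi = np.vstack([np.linalg.matrix_power(A, i + 1) for i in range(N)])
    Ups = np.zeros((N * n, N * m))
    for i in range(N):
        for j in range(i + 1):
            Ups[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    return Pi, Ups

def quadratic_cost(Pi, Ups, x0, Qx, Pf, Ru, N):
    """Q and H of Eqs. (59)-(60), with L1 = diag(R,...,R) and
    L2 = diag(Q,...,Q,P) as in Eq. (61)."""
    n = Qx.shape[0]
    L1 = np.kron(np.eye(N), Ru)
    L2 = np.kron(np.eye(N), Qx)
    L2[-n:, -n:] = Pf                       # terminal weight P
    Qbar = 2 * L1 + 2 * Ups.T @ L2 @ Ups    # Eq. (59)
    Hbar = 2 * x0 @ Pi.T @ L2 @ Ups         # Eq. (60), a row vector
    return Qbar, Hbar
```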


The cost function given by Eq. (58) is the S̃(τ_k) used in Eq. (50). For both the MH and RMHMC algorithms, the potential energy V(x_t) = −log p(τ_k) is used, which is given by

V(x_t) = -\log\left[ \exp\left( -\frac{1}{\lambda} \tilde{S} \right) \right] = \frac{1}{\lambda} \left( \frac{1}{2} U_k^\top Q U_k + H U_k \right).    (62)

The RMHMC algorithm additionally uses the gradient

\frac{\partial V(x_t)}{\partial U_k} = \frac{1}{\lambda} \frac{\partial}{\partial U_k} \left( \frac{1}{2} U_k^\top Q U_k + H U_k \right) = \frac{1}{\lambda} \left( Q U_k + H^\top \right).    (63)

With these, the path integral via MH and RMHMC can be computed.
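For this quadratic case, the potential energy of Eq. (62) and its gradient, Eq. (63), reduce to one-liners; a minimal sketch using Qbar and Hbar from the construction above (lam is a placeholder temperature):

```python
def V(U, Qbar, Hbar, lam=1.0):
    """Potential energy of Eq. (62)."""
    return (0.5 * U @ Qbar @ U + Hbar @ U) / lam

def grad_V(U, Qbar, Hbar, lam=1.0):
    """Gradient of Eq. (63); Hbar is a 1-D array, so no transpose is needed."""
    return (Qbar @ U + Hbar) / lam
```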

B. Spacecraft Attitude Model

The quaternion attitude kinematics are given by27

\dot{q} = \frac{1}{2} \Xi(q) \, \omega = \frac{1}{2} \Omega(\omega) \, q,    (64)

where q = [q_1, q_2, q_3, q_4]^⊤ is the quaternion of the system and ω = [ω_x, ω_y, ω_z]^⊤ is the angular velocity of the body-fixed frame with respect to the inertial frame. The quaternion describes the orientation of the spacecraft body-fixed frame with respect to the inertial frame. The dynamics of a rigid-body spacecraft are given by27

J \dot{\omega} = -[\omega \times] J \omega + L,    (65)

where J is the moment of inertia matrix, L is the applied torque, and [ω×] is the skew-symmetric matrix of angular velocity components,

[\omega \times] = \begin{bmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{bmatrix}.    (66)

By assumption, no other forces act on the spacecraft. Using the attitude dynamics, a cost function is defined as28

J = \int_0^T \left( \delta q^\top Q_q \, \delta q + \delta \omega^\top Q_\omega \, \delta \omega + u^\top R u \right) dt,    (67)

subject to

q^\top q = 1,    (68)

\dot{x} = f(x, t),    (69)

where x = [ω^⊤ q^⊤]^⊤, δq is the quaternion error between the current quaternion q and the desired quaternion q_d, and similarly δω = ω − ω_d. For the regulation problem, q_d = [0, 0, 0, 1]^⊤, where q_4 is the scalar component. The quaternion error then simplifies as

\delta q = q \otimes q_d^{-1} = q \otimes q_d = q,    (70)

where the quaternion product q ⊗ q^* has the form

q \otimes q^* = \begin{bmatrix} q_4^* \, q_{1:3} + q_4 \, q_{1:3}^* - q_{1:3} \times q_{1:3}^* \\ q_4 q_4^* - q_{1:3}^\top q_{1:3}^* \end{bmatrix}.    (71)


Therefore, the cost function simplifies to

J = \int_0^T \left( q^\top Q_q \, q + \omega^\top Q_\omega \, \omega + u^\top R u \right) dt.    (72)

This cost function provides the full nonlinear control solution using the path integral method, but the control solution can also be approximated with a linear model about the current quaternion q. The dynamics are linearized by a first-order Taylor series expansion,

\frac{d}{dt} \begin{bmatrix} \omega \\ q \end{bmatrix} = \begin{bmatrix} -J^{-1} [\omega \times] J & 0_{3 \times 4} \\ \frac{1}{2} \Xi(q) & \frac{1}{2} \Omega(\omega) \end{bmatrix} \begin{bmatrix} \omega \\ q \end{bmatrix} + \begin{bmatrix} J^{-1} \\ 0 \end{bmatrix} L,    (73)

where

A(t) = \begin{bmatrix} -J^{-1} [\omega \times] J & 0_{3 \times 4} \\ \frac{1}{2} \Xi(q) & \frac{1}{2} \Omega(\omega) \end{bmatrix}, \quad B = \begin{bmatrix} J^{-1} \\ 0 \end{bmatrix}.    (74)

The A(t) and B matrices are discretized over a fixed time interval utilizing a zero-order hold on the control. The problem is then in the form of the same finite-horizon optimal control problem used for the double integrator, Eq. (58). The spacecraft attitude cost is nonlinear, but the MH and RMHMC algorithms use the potential energy and gradient of the linearized time-varying model for the path integral method, given by Eqs. (62) and (63).
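For reference, a minimal propagation of the nonlinear attitude equations, Eqs. (64) and (65), is sketched below. The simple Euler integration with quaternion renormalization is our choice; the inertia matrix and torque are left as inputs.

```python
import numpy as np

def skew(w):
    """[w x] of Eq. (66)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def xi(q):
    """Xi(q) such that qdot = 0.5 Xi(q) w (Eq. (64)), scalar-last convention."""
    q1, q2, q3, q4 = q
    return np.array([[ q4, -q3,  q2],
                     [ q3,  q4, -q1],
                     [-q2,  q1,  q4],
                     [-q1, -q2, -q3]])

def attitude_step(w, q, torque, J, dt):
    """One Euler step of Eqs. (64) and (65)."""
    w_dot = np.linalg.solve(J, -skew(w) @ (J @ w) + torque)  # Eq. (65)
    q_dot = 0.5 * xi(q) @ w                                  # Eq. (64)
    w, q = w + dt * w_dot, q + dt * q_dot
    return w, q / np.linalg.norm(q)                          # enforce Eq. (68)
```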

IV. Results

(a) The PDF for the control input at t1 and t2 and the optimal control (red point). (b) The control input using path integral via MH (blue) and LQR (red).

Figure 1: Simulation Results for Case 1.

For the simulations, the double integrator and spacecraft dynamical models are tested under the path integral dual control method using MCMC sampling. The MH and RMHMC methods are compared to each other and to the Linear Quadratic Regulator (LQR) method. Results are shown for the double integrator dynamics and the spacecraft attitude dynamics.

A. Double Integrator Path Integral via Metropolis-Hastings

First, the LTI double integrator system is simulated to determine the characteristics of Metropolis-Hastings under different conditions. The double integrator system uses 11 time-steps (Case 1) and 201 time-steps (Case 2) to determine how exploring the PDF at different dimensions affects the sample propagation. Figure 1(a) shows the contour plot of the path integral PDF on a mesh of [−20, 20] (dimensionless) in both axes, along with the sampled PDF for the control input at t1 and t2. The red point is the LQR optimal control solution at t1 and t2 seconds. 10,000 samples were generated from the initial sample in a random-walk motion. Through the MH acceptance criterion, each sample was either accepted or rejected by comparing its energy to that of the previous point. For this simulation, the acceptance ratio was 62.56% after tuning the MH


parameters. From Figure 1(a), the control samples concentrate in the higher-probability regions of the PDF rather than the lower-probability regions, as the theory predicts. The control input was determined by taking the mean of the sampled PDF, following the formulation in Eq. (30). Figure 1(b) shows the linear response and control input using the PI via MH method and a baseline LQR method. Starting from the initial condition [1, 1]⊺ and simulating for only 11 time-steps (an 11-element control input vector), the path integral method provides performance very similar to the LQR control method.

(a) The PDF for the control input at t1 and t2 and the optimal control (red point). (b) The control input using path integral via MH (blue) and LQR (red).

Figure 2: Simulation Results for Case 2.

Figure 2(a) shows the contour plot of the path integral PDF on a mesh of [−20, 20] (dimensionless) in both axes and the sampled PDF for the control input at t1 and t2 seconds. The red point is the LQR control solution at t1 and t2 seconds. Again, 10,000 samples were generated in a random-walk motion. The acceptance ratio for this simulation was 78.62% after tuning the MH parameters. From Figure 2(a), the control input does not explore the PDF evenly. By increasing the number of samples, it may be possible to explore the PDF space, but it would take more time. Also, the exploration itself does not look like a random walk when comparing this plot to Figure 1(a). Since Case 2 contains a 201-dimensional control vector, MH rarely proposes candidates whose energy is lower than that of the previous step; the larger the dimension, the more inefficient MH becomes at exploring the PDF and accepting samples. Figure 2(b) shows the linear response and control input for both the PI via MH and LQR methods with the initial condition [1, 1]⊺ and 201 time-steps. The control input deviates considerably more from the LQR control, and the state response also shows diminished performance compared to the LQR state response. For the MH method to work in this high-dimensional system, more samples are needed to reach performance comparable to LQR.

B. Double Integrator Path Integral via Riemann Manifold Hamiltonian Monte Carlo

The LTI double integrator system was simulated using the RMHMC method for 11 time-steps (Case 3) and 201 time-steps (Case 4) to show how the samples propagate at different dimensions. For Case 3, Figure 3(a) shows the contour plot of the path integral PDF on a mesh of [−20, 20] (dimensionless) in both axes and the sampled PDF for the control input at t1 and t2 seconds. The red point is the LQR control solution at t1 and t2 seconds. 3,000 samples were generated using Hamiltonian dynamics with an acceptance ratio of 50.00%, which provides sufficient sample exploration of the PDF. The samples were accepted or rejected using the candidate's Hamiltonian. If the Hamiltonian were exactly conserved, the acceptance probability would be 100%, but numerical errors from the integration steps of the Hamiltonian dynamics increase the rejection rate. The control input itself propagates in areas of higher probability in the PDF. By taking the mean of the sampled control inputs, the control input for the whole trajectory was determined using Eq. (30). Figure 3(b) shows the linear response and control input using the RMHMC method and the LQR method with 11 time-steps and initial condition [1, 1]⊺. PI via RMHMC provides performance very similar to the LQR controller. The RMHMC method at low dimensionality also performs very similarly to the MH method shown in Figure 1(b). Thus, path integral via MH or RMHMC can be used interchangeably for low-dimensional problems without any drop in


performance in the controller.

(a) The PDF for the control input at t1 and t2 and the optimal control (red point). (b) The control input using path integral via RMHMC (blue) and LQR (red).

Figure 3: Simulation Results for Case 3.

(a) The PDF for the control input at t1 and t2 and the optimal control (red point). (b) The control input using path integral via RMHMC (blue) and LQR (red).

Figure 4: Simulation Results for Case 4.

The contour plot shown in Figure 4(a) was simulated for 201 time-steps (Case 4). The red point is the LQR control input at t1 and t2 seconds. 3,000 samples were generated using Hamiltonian dynamics with an acceptance rate of 49.75%. In contrast with Figure 2(a), the samples explore the PDF sufficiently. Using fewer than half the samples of MH, the RMHMC method concentrates the sampled control inputs more in areas of higher probability and less in areas of lower probability. The dimensionality of the system does not affect the acceptance or rejection rates for exploration; acceptance is affected only by errors in the integration, which move state candidates toward low-probability states and give them a chance of being rejected. Thus, the RMHMC method can explore a PDF sufficiently with fewer samples than the MH method. Figure 4(b) shows the linear response and control input for PI via RMHMC and the LQR controller with 201 time-steps and initial condition [1, 1]⊺. The control input deviates far less from the LQR control than the MH-based controller shown in Figure 2(b), and there is less variance in the RMHMC sampling. The state response also follows the LQR solution much more closely than the MH method, with less noise incorporated in the solution. Thus, path integral via RMHMC provides better sampling than random-walk methods for high-dimensional problems.

C. Spacecraft via Riemann Manifold Hamiltonian Monte Carlo

A spacecraft with a mass of 2.6 kg and dimensions of 0.1 x 0.1 x 0.2 m is simulated using the RMHMC method for 121 time-steps. Since the control input at each time-step has dimension 3, the dimension of the RMHMC simulation is 363.


Figure 5: The time histories of the angular velocity ω (rad/s), quaternion q, and control input u (Nm) using path integral via RMHMC.

Figure 5 shows the response and control input for the spacecraft system using the stacked cost function from the linear approximation given by Eq. (58). With an initial orientation of [2.0°, 3.0°, 4.0°]⊺, the system stabilized in all states using only 1,000 RMHMC samples. Due to the small-angle approximation, q4 is not shown since it stays approximately 1 throughout the simulation. For a dimension of 363, 1,000 RMHMC samples are sufficient to provide a good approximation of the PDF and to stabilize the spacecraft optimally with respect to the cost function.

V. Conclusion

This work studied the control problem for nonlinear systems. The Path Integral (PI) method was used to convert a stochastic optimal control problem into an inference problem connecting estimation and control for nonlinear systems. This inference problem was then solved using the Riemann Manifold Hamiltonian Monte Carlo (RMHMC) method, forming a new approach called Path Integral via Riemann Manifold Hamiltonian Monte Carlo (PI-RMHMC). Compared to the path integral via MH approach, PI-RMHMC is able to solve high-dimensional problems (a large number of control inputs over a large number of time-steps) with a better approximation of the PDF using fewer samples. The PI-RMHMC method is therefore more numerically robust than the MH method. Additionally, the PI-RMHMC control approach is viable for spacecraft attitude systems, which are high dimensional in the number of control inputs over the horizon; hence, PI-RMHMC can be extended to problems of even larger dimension than the spacecraft problem solved here. For future work, the full nonlinear spacecraft model will be used to determine the viability of the approach for fully nonlinear systems. With the full nonlinear model, we suspect that the optimal controller will achieve a lower cost than an LQR based on the linearized spacecraft dynamics. PI-RMHMC will also be extended to the dual control problem for spacecraft systems: by including process and measurement noise in the spacecraft model, PI-RMHMC may be able to simultaneously control and estimate the spacecraft over the time interval. One last extension for PI-RMHMC is to use neural networks to develop parameterized control methods for the spacecraft attitude problem. PI-RMHMC is generalizable to nonlinear systems, and it may be useful for more general stochastic control problems.

Acknowledgments

This work was made possible by support from the Air Force Research Laboratory at Kirtland Air Force Base, through the Universities Space Research Association.


References

1. Cebrowski, Arthur K., and John W. Raymond. "Operationally Responsive Space: A New Defense Business Model." Army War College, Carlisle Barracks, PA, 2005.
2. MacKunis, Will, et al. "Adaptive Satellite Attitude Control in the Presence of Inertia and CMG Gimbal Friction Uncertainties." The Journal of the Astronautical Sciences 56.1 (2008): 121-134.
3. Sharma, Rajnish, and Ashish Tewari. "Optimal Nonlinear Tracking of Spacecraft Attitude Maneuvers." IEEE Transactions on Control Systems Technology 12.5 (2004): 677-682.
4. Schaub, Hanspeter, John L. Junkins, and Rush D. Robinett. "New Penalty Functions and Optimal Control Formulation for Spacecraft Attitude Control Problems." Journal of Guidance, Control, and Dynamics 20 (1997): 428-434.
5. Lefferts, Ern J., F. Landis Markley, and Malcolm D. Shuster. "Kalman Filtering for Spacecraft Attitude Estimation." Journal of Guidance, Control, and Dynamics 5.5 (1982): 417-429.
6. Kim, Son-Goo, et al. "Kalman Filtering for Relative Spacecraft Attitude and Position Estimation." Journal of Guidance, Control, and Dynamics 30.1 (2007): 133-143.
7. Wittenmark, Björn. "Adaptive Dual Control." Control Systems, Robotics and Automation, Encyclopedia of Life Support Systems (EOLSS), UNESCO, 2002.
8. Wittenmark, Björn. "Adaptive Dual Control Methods: An Overview." IFAC Proceedings Volumes 28.13 (1995): 67-72.
9. Lobo, Miguel Sousa, and Stephen Boyd. "Policies for Simultaneous Estimation and Optimization." Proceedings of the 1999 American Control Conference, Vol. 2, IEEE, 1999.
10. Weiss, Avishai, and Stefano Di Cairano. "Robust Dual Control MPC with Guaranteed Constraint Satisfaction." 53rd IEEE Conference on Decision and Control (CDC), IEEE, 2014.
11. Rathouský, Jan, and Vladimír Havlena. "Multiple-Step Active Control with Dual Properties." IFAC Proceedings Volumes 44.1 (2011): 1522-1527.
12. Rathouský, Jan, and Vladimír Havlena. "MPC-Based Approximate Dual Controller by Information Matrix Maximization." International Journal of Adaptive Control and Signal Processing 27.11 (2013): 974-999.
13. Mitani, Shinji. "Satisficing Nonlinear Spacecraft Rendezvous Under Control Magnitude and Direction Constraints." 2013.
14. Manikonda, V., et al. "A Model Predictive Control-Based Approach for Spacecraft Formation Keeping and Attitude Control." Proceedings of the 1999 American Control Conference, Vol. 6, IEEE, 1999.
15. Marafioti, G., R. Bitmead, and M. Hovd. "Persistently Exciting Model Predictive Control Using FIR Models." International Conference Cybernetics and Informatics, Vol. 6, 2010.
16. van den Broek, Bart, Wim Wiegerinck, and Bert Kappen. "Stochastic Optimal Control of State Constrained Systems." International Journal of Control 84.3 (2011): 597-615.
17. Neal, Radford M. "MCMC Using Hamiltonian Dynamics." Handbook of Markov Chain Monte Carlo 2 (2011): 113-162.
18. Betancourt, Michael. "A Conceptual Introduction to Hamiltonian Monte Carlo." arXiv preprint arXiv:1701.02434 (2017).
19. Theodorou, Evangelos, Jonas Buchli, and Stefan Schaal. "A Generalized Path Integral Control Approach to Reinforcement Learning." Journal of Machine Learning Research 11 (2010): 3137-3181.
20. Stengel, Robert F. Optimal Control and Estimation. Courier Corporation, 2012.
21. Fleming, Wendell H., and Halil Mete Soner. Controlled Markov Processes and Viscosity Solutions. Vol. 25. Springer Science & Business Media, 2006.
22. Kappen, Hilbert J. "An Introduction to Stochastic Control Theory, Path Integrals and Reinforcement Learning." AIP Conference Proceedings, Vol. 887, No. 1, AIP, 2007.
23. van den Broek, L. J., W. A. J. J. Wiegerinck, and H. J. Kappen. "Graphical Model Inference in Optimal Control of Stochastic Multi-Agent Systems." 2008.
24. Øksendal, Bernt. "Stochastic Differential Equations." Stochastic Differential Equations, Springer Berlin Heidelberg, 2003, 65-84.
25. Yong, Jiongmin. "Relations Among ODEs, PDEs, FSDEs, BSDEs, and FBSDEs." Proceedings of the 36th IEEE Conference on Decision and Control, Vol. 3, IEEE, 1997.
26. Geyer, Charles. "Introduction to Markov Chain Monte Carlo." Handbook of Markov Chain Monte Carlo (2011): 3-48.
27. Hughes, Peter C. Spacecraft Attitude Dynamics. Courier Corporation, 2012.
28. Schaub, Hanspeter, and John L. Junkins. Analytical Mechanics of Space Systems. AIAA, 2003.
29. Crassidis, John L., and John L. Junkins. Optimal Estimation of Dynamic Systems. CRC Press, 2011.
30. Yang, Insoon, et al. "Path Integral Formulation of Stochastic Optimal Control with Generalized Costs." IFAC Proceedings Volumes 47.3 (2014): 6994-7000.
31. Williams, Grady, Andrew Aldrich, and Evangelos A. Theodorou. "Model Predictive Path Integral Control: From Theory to Parallel Computation." Journal of Guidance, Control, and Dynamics, AIAA, 2017.
32. Mensink, Thomas, Jakob Verbeek, and Bert Kappen. "EP for Efficient Stochastic Control with Obstacles." ECAI 2010 - 19th European Conference on Artificial Intelligence, IOS Press, 2010.

