SOLVING MINIMAL-DISTANCE PROBLEMS OVER THE MANIFOLD OF REAL SYMPLECTIC MATRICES

SIMONE FIORI∗

Abstract. The present paper discusses the question of formulating and solving minimal-distance problems over the manifold of real symplectic matrices. In order to tackle the related optimization problem, the real symplectic group is regarded as a pseudo-Riemannian manifold and a metric is chosen that affords the computation of geodesic arcs in closed form. Then, the considered minimal-distance problem can be solved numerically via a gradient steepest descent algorithm implemented through a geodesic-stepping method. The minimal-distance problem investigated in this paper relies on a suitable notion of distance, induced by the Frobenius norm, as opposed to the natural pseudo-distance that corresponds to the pseudo-Riemannian metric that the real symplectic group is endowed with. Numerical tests about the computation of the empirical mean of a collection of symplectic matrices illustrate the discussed framework.

Key words. Group of real symplectic matrices. Pseudo-Riemannian geometry. Gradient-based optimization on manifolds. Geodesic stepping method.

Citation: Cite this paper as: S. Fiori, Solving minimal-distance problems over the manifold of real symplectic matrices, SIAM Journal on Matrix Analysis and Applications, Vol. 32, No. 3, pp. 938-968, 2011.

1. Introduction. Optimization problems over smooth manifolds have received considerable attention due to their broad application range (see, for instance, [2, 9, 10, 16] and references therein). In particular, the formulation of minimal-distance problems on compact Riemannian Lie groups relies on closed forms of geodesic curves and geodesic distances. Notwithstanding, on certain manifolds the problem of formulating a minimal-distance criterion function and of its optimization is substantially more involved, because it might be hard to compute geodesic distances in closed form. One of such manifolds is the real symplectic group. Real symplectic matrices form an algebraic group denoted as $\mathrm{Sp}(2n,\mathbb{R})$, with $n \ge 1$. The real symplectic group is defined as follows:

$$\mathrm{Sp}(2n,\mathbb{R}) \stackrel{\mathrm{def}}{=} \left\{ X \in \mathbb{R}^{2n\times 2n} \,\middle|\, X^T Q_{2n} X = Q_{2n} \right\}, \quad \text{with } Q_{2n} \stackrel{\mathrm{def}}{=} \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix}, \qquad (1.1)$$

where the symbol $I_n$ denotes an $n\times n$ identity matrix and the symbol $0_n$ denotes a whole-zero $n\times n$ matrix. A noticeable application of real symplectic matrices is to the control of beam systems in particle accelerators [12, 29], where Lie-group tools are applied to the characterization of beam dynamics in charged-particle optical systems. Such methods are applicable to accelerator design, charged-particle beam transport and electron microscopes. In the context of vibration analysis, real symplectic 'transfer matrices' are widely used for the dynamic analysis of engineering structures as well as for static analysis, and are particularly useful in the treatment of repetitive structures [38]. Other notable applications of real symplectic matrices are to coding theory [8], quantum computing [6, 22], time-series prediction [3], and automatic control [17]. A further list of applications of symplectic matrices is reported in [14].
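As a concrete illustration of the definition (1.1), the following minimal sketch (in Python with NumPy; the helper names are chosen here for illustration only, they are not part of the paper) builds the matrix $Q_{2n}$ and tests the symplectic condition for a candidate matrix:

```python
import numpy as np

def Q(n):
    """Build the skew-symmetric matrix Q_{2n} of definition (1.1)."""
    Z, I = np.zeros((n, n)), np.eye(n)
    return np.block([[Z, I], [-I, Z]])

def is_symplectic(X, tol=1e-10):
    """Check the condition X^T Q X = Q up to a numerical tolerance."""
    n = X.shape[0] // 2
    return np.allclose(X.T @ Q(n) @ X, Q(n), atol=tol)

# For n = 1, Sp(2, R) coincides with the unit-determinant 2x2 matrices
X = np.array([[2.0, 3.0], [1.0, 2.0]])   # det = 1, hence in Sp(2, R)
print(is_symplectic(X))                  # True
```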

∗Dipartimento di Ingegneria Biomedica, Elettronica e Telecomunicazioni (DiBET), Facoltà di Ingegneria, Università Politecnica delle Marche, Via Brecce Bianche, Ancona I-60131, Italy. E-mail: [email protected]

The minimal-distance problem over the set of real symplectic matrices plays an important role in applied fields. A recent application which involves the solution of a minimal-distance problem in the real symplectic group of matrices is found in the study of optical systems in computational ophthalmology, where it is assumed that the optical nature of a centered optical system is completely described by a real symplectic matrix [23, 25] and the involved symplectic matrices are of size $4\times 4$ ($n = 2$). A notable application is found in the control of beam systems in particle accelerators [15], where the size of the involved symplectic matrices is $6\times 6$ ($n = 3$), and in the assessment of the fidelity of dynamical gates in quantum analog computation [41], where the integer $n$ corresponds to the number of quantum observables and may be quite large (for example $n = 20$).

Although results are available about the real symplectic group [13, 14, 30], optimization on the manifold of real symplectic matrices appears to be far less studied than for other Lie groups. The present manuscript aims at discussing such a problem and at presenting a solution based on the key idea of treating the space $\mathrm{Sp}(2n,\mathbb{R})$ as a manifold endowed with a pseudo-Riemannian metric that affords the computation of geodesic arcs in closed form. Then, the considered minimal-distance problem can be solved numerically via a gradient steepest descent algorithm implemented through a geodesic-stepping method. A proper minimal-distance objective function on the manifold $\mathrm{Sp}(2n,\mathbb{R})$ is constructed by building on a result available from the scientific literature about the Frobenius norm of the difference of two symplectic matrices. Given a collection $\{X_1, X_2, \ldots, X_N\}$ of real symplectic matrices of size $2n\times 2n$, the goal of this paper is the minimization of the function

$$f(X) \stackrel{\mathrm{def}}{=} \frac{1}{2N} \sum_{k=1}^{N} \|X - X_k\|_F^2 \qquad (1.2)$$

over the manifold $\mathrm{Sp}(2n,\mathbb{R})$ to find the minimizing symplectic matrix.

The remainder of this paper is organized as follows. The §2 recalls those notions from differential geometry (with particular emphasis on pseudo-Riemannian geometry) that are instrumental in the development of an optimization algorithm on the manifold of symplectic matrices. Results about the computation of geodesic arcs and of the pseudo-Riemannian gradient of a regular function on the manifold of real symplectic matrices are presented. The §3 defines minimal-distance problems on manifolds and recalls the basic idea of the gradient-steepest-descent optimization method on pseudo-Riemannian manifolds. It deals explicitly with minimal-distance problems on the manifold of real symplectic matrices and recalls how to measure the discrepancy between real symplectic matrices. The §4 presents results of numerical tests performed on an averaging problem over the manifold $\mathrm{Sp}(2n,\mathbb{R})$. The §5 concludes the paper.

2. The real symplectic matrix group and its geometry. The present section recalls notions of differential geometry that are instrumental in the development of the following topics, such as affine geodesic curves and the pseudo-Riemannian gradient. For a general-purpose reference on differential geometry, see [39], while for a specific reference on pseudo-Riemannian geometry and the calculus of variations on manifolds, see [32]. Throughout the present section and the rest of the paper, matrices are denoted by upper-case letters (e.g., $X$) while their entries are denoted by indexed lower-case letters (e.g., $x_{ij}$).

2.1. Notes on pseudo-Riemannian geometry. Let $\mathcal{M}$ denote a $p$-dimensional pseudo-Riemannian manifold. Associated with each point $x$ in a $p$-dimensional differentiable manifold $\mathcal{M}$ is a tangent space, denoted by $T_x\mathcal{M}$. This is a $p$-dimensional vector space whose elements can be thought of as equivalence classes of curves passing

through the point $x$. The symbol $T\mathcal{M} \stackrel{\mathrm{def}}{=} \{(x,v) \mid x \in \mathcal{M},\, v \in T_x\mathcal{M}\}$ denotes the tangent bundle associated to the manifold $\mathcal{M}$. The symbol $\Omega^1(\mathcal{M})$ denotes the set of analytic 1-forms on $\mathcal{M}$ and the symbol $\mathfrak{X}(\mathcal{M})$ denotes the set of vector fields on $\mathcal{M}$. A vector field $\mathsf{F} \in \mathfrak{X}(\mathcal{M})$ is a map $\mathsf{F} : \mathcal{M} \ni x \mapsto \mathsf{F}(x) \in T_x\mathcal{M}$. The symbol $\langle \cdot, \cdot \rangle^E$ denotes the Euclidean inner product.

The pseudo-Riemannian manifold $\mathcal{M} \ni x$ is endowed with an indefinite inner product $\langle \cdot, \cdot \rangle_x : T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}$. An indefinite inner product is a non-degenerate, smooth, symmetric, bilinear map which assigns a real number to pairs of tangent vectors at each tangent space of the manifold. That the metric is non-degenerate means that there are no non-zero $v \in T_x\mathcal{M}$ such that $\langle v, w \rangle_x = 0$ for all $w \in T_x\mathcal{M}$.

Let $x = (x^1, x^2, \ldots, x^p)$ denote a local parametrization and $\{\partial_i\}$ denote the canonical basis of $T_x\mathcal{M}$. The metric tensor field $G : \mathcal{M} \to \mathbb{R}^{p\times p}$ of components $g_{ij}(x)$ associated to the indefinite inner product $\langle \cdot, \cdot \rangle_x$ is a covariant tensor field defined by:

$$g_{ij}(x) \stackrel{\mathrm{def}}{=} \langle \partial_i, \partial_j \rangle_x, \qquad (2.1)$$

which is smooth in $x$. Given tangent vectors $u, v \in T_x\mathcal{M}$ parameterized by $u = u^i \partial_i$ and $v = v^i \partial_i$ (Einstein summation convention used), their inner product expresses as:

$$\langle u, v \rangle_x = g_{ij}(x) u^i v^j. \qquad (2.2)$$

In pseudo-Riemannian geometry, the metric tensor $g_{ij}$ is symmetric and invertible (but not necessarily positive-definite; in fact, positive-definiteness is an additional property that characterizes Riemannian geometry). Therefore, it might hold that $\langle u, u \rangle_x = 0$ even for nonzero tangent vectors $u$. The inverse $G^{-1}$ of the metric tensor field is a contravariant tensor field whose components are denoted by $g^{ij}(x)$. Given a regular function $f : \mathcal{M} \to \mathbb{R}$, the differential $\mathrm{d}f : \mathcal{M} \to \Omega^1(\mathcal{M})$ is expressed by:

$$\mathrm{d}f_x(v) = \langle \nabla_x f, v \rangle_x, \quad v \in T_x\mathcal{M}, \qquad (2.3)$$

where $\nabla_x f \in T_x\mathcal{M}$ denotes the pseudo-Riemannian gradient. In local coordinates, it holds that:

$$\mathrm{d}f_x(v) = g_{ij}(x) (\nabla_x f)^i v^j. \qquad (2.4)$$

As the differential does not depend on the choice of coordinates, it holds that $\mathrm{d}f_x(v) = (\partial_i f)_x v^i$. It follows that $g_{ij}(x)(\nabla_x f)^j = (\partial_i f)_x$. The sharp isomorphism $\sharp : \Omega^1(\mathcal{M}) \to \mathfrak{X}(\mathcal{M})$ takes a 1-form on $\mathcal{M}$ and returns a vector field on $\mathcal{M}$. It holds that:

$$(\mathrm{d}f_x)^\sharp = \nabla_x f. \qquad (2.5)$$

The pseudo-Riemannian gradient may be computed by the metric compatibility condition:

$$\langle \partial_x f, v \rangle^E = \langle \nabla_x f, v \rangle_x, \quad \forall v \in T_x\mathcal{M}. \qquad (2.6)$$

The notion of differential generalizes to the notion of pushforward map. Given differentiable manifolds $\mathcal{M}$ and $\mathcal{N}$ and a regular function $f : \mathcal{M} \to \mathcal{N}$, the pushforward map $\mathrm{d}f : T\mathcal{M} \to T\mathcal{N}$ is a linear map such that $\mathrm{d}f|_x : T_x\mathcal{M} \to T_{f(x)}\mathcal{N}$ for every $x \in \mathcal{M}$. Let $\gamma : [-a, a] \to \mathcal{M}$ be a smooth curve on the manifold $\mathcal{M}$ for some $a > 0$ and let $x \stackrel{\mathrm{def}}{=} \gamma(0)$. The pushforward map $\mathrm{d}f|_x$ is defined by:

$$\left. \frac{\mathrm{d}}{\mathrm{d}t} f(\gamma(t)) \right|_{t=0} = \mathrm{d}f|_x \left( \left. \frac{\mathrm{d}\gamma}{\mathrm{d}t} \right|_{t=0} \right). \qquad (2.7)$$
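In a flat chart, the compatibility condition (2.6) reduces to $\nabla_x f = G^{-1} \partial_x f$. The following minimal sketch (Python with NumPy; the indefinite metric $\mathrm{diag}(1,-1)$ and the test function are arbitrary choices made here only for illustration) verifies (2.6) numerically:

```python
import numpy as np

G = np.diag([1.0, -1.0])           # an indefinite metric tensor
Ginv = np.linalg.inv(G)

def f(x):                          # an arbitrary smooth test function
    return x[0]**2 + 3.0 * x[0] * x[1]

def euclidean_grad(x):             # partial derivatives (d_1 f, d_2 f)
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

x = np.array([0.7, -1.2])
grad = Ginv @ euclidean_grad(x)    # pseudo-Riemannian gradient via (2.6)

v = np.array([0.3, 0.5])           # an arbitrary tangent vector
lhs = euclidean_grad(x) @ v        # <d_x f, v>^E
rhs = grad @ (G @ v)               # <grad f, v>_x = g_ij (grad f)^i v^j
print(np.isclose(lhs, rhs))        # True
```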


The following result holds true.

Lemma 2.1 (Adapted from [5]). Let $\mathcal{M}$ denote a manifold of $n\times n$ real-valued matrices and let $f : \mathcal{M} \to \mathbb{R}^{n\times n}$ denote an analytic map about the point $X_0 \in \mathcal{M}$, namely:

$$f(X) = a_0 I_n + \sum_{k=1}^{\infty} a_k (X - X_0)^k, \quad a_k \in \mathbb{R}, \qquad (2.8)$$

in a neighborhood of $X_0$. Then it holds that:

$$\mathrm{d}f|_X(V) = \sum_{k=1}^{\infty} a_k \sum_{r=1}^{k} (X - X_0)^{r-1} V (X - X_0)^{k-r}, \qquad (2.9)$$

for any $V \in T_X\mathcal{M}$.

The covariant derivative (or connection) of a vector field $\mathsf{F} \in \mathfrak{X}(\mathcal{M})$ in the direction of a vector $v \in T_x\mathcal{M}$ is denoted as $\nabla_v \mathsf{F}$. The covariant derivative is defined axiomatically by the following properties:

$$\nabla_{fv+gw} \mathsf{F} = f \nabla_v \mathsf{F} + g \nabla_w \mathsf{F}, \qquad (2.10)$$
$$\nabla_v (\mathsf{F} + \mathsf{G}) = \nabla_v \mathsf{F} + \nabla_v \mathsf{G}, \qquad (2.11)$$
$$\nabla_v (f \mathsf{F}) = f \nabla_v \mathsf{F} + \mathsf{F}\, \mathrm{d}f_x(v), \qquad (2.12)$$

for any vector fields $\mathsf{F}, \mathsf{G}$, tangent vectors $v, w$ and scalar functions $f, g$. The covariant derivative of a vector field $\mathsf{F} \in \mathfrak{X}(\mathcal{M})$ along a vector $v \in T_x\mathcal{M}$ may be extended to the covariant derivative of a vector field $\mathsf{F}$ along a vector field $\mathsf{G}$ by $\nabla_{\mathsf{G}} \mathsf{F}(x) \stackrel{\mathrm{def}}{=} \nabla_{\mathsf{G}(x)} \mathsf{F}$. Such a rule defines a connection $\nabla : \mathfrak{X}(\mathcal{M}) \times \mathfrak{X}(\mathcal{M}) \to \mathfrak{X}(\mathcal{M})$. The fundamental relationship for the connection is:

$$\nabla_{\partial_i} \partial_j = \Gamma_{ij}^k \partial_k, \qquad (2.13)$$

where the quantities $\Gamma_{ij}^k : \mathcal{M} \to \mathbb{R}$ are termed Christoffel symbols of the second kind and describe the structure of the connection. The derivative $\nabla_{\partial_i} \partial_j$ measures the change of the elementary vector field $\partial_j = \partial_j(x)$ in the direction $\partial_i$. By the properties (2.10)-(2.12), it is readily obtained that:

$$\nabla_v (u^i \partial_i) = v^i \left( \Gamma_{ij}^k u^j + \frac{\partial u^k}{\partial x^i} \right) \partial_k, \qquad (2.14)$$

where the functions $v^i = v^i(x)$ and the functions $u^i = u^i(x)$ are the components of vector fields in $\mathfrak{X}(\mathcal{M})$ in the basis $\{\partial_i\}$. The Christoffel symbols of the second kind may be specified arbitrarily and give rise to an arbitrary connection. In the Levi-Civita geometry, the Christoffel symbols of the second kind are associated to the metric tensor of components $g_{ij}$ and are defined as:

$$\Gamma_{ij}^k \stackrel{\mathrm{def}}{=} \frac{1}{2} g^{kh} \left( \frac{\partial g_{hj}}{\partial x^i} + \frac{\partial g_{ih}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^h} \right). \qquad (2.15)$$

The Christoffel symbols of the second kind defined as in (2.15) are symmetric in the covariant indices, namely, $\Gamma_{ij}^k = \Gamma_{ji}^k$. The associated Christoffel form $\Gamma_x : T_x\mathcal{M} \times T_x\mathcal{M} \to \mathbb{R}^p$ is defined in local coordinates by $[\Gamma_x(v,w)]^k \stackrel{\mathrm{def}}{=} \Gamma_{ij}^k v^i w^j$. In the present paper, the notion of connection refers to a Levi-Civita connection, unless otherwise stated.

The notion of covariant derivative is intimately tied to the notion of parallel translation (or transport) along a curve. Parallel translation allows, e.g., comparing vectors belonging to different tangent spaces. On a smooth pseudo-Riemannian manifold $\mathcal{M}$ with connection $\nabla$, fix a smooth curve $\gamma : [-a, a] \to \mathcal{M}$ ($a > 0$). The parallel translation map $\Gamma_s^t(\gamma) : T_{\gamma(s)}\mathcal{M} \to T_{\gamma(t)}\mathcal{M}$ associated to the curve is a linear isomorphism for every $s, t \in [-a, a]$. The parallel translation map depends smoothly on its arguments and is such that $\Gamma_s^s(\gamma)$ is an identity map and $\Gamma_u^t(\gamma) \circ \Gamma_s^u(\gamma) = \Gamma_s^t(\gamma)$ for every $s, u, t \in [-a, a]$. Let $v = \dot\gamma(0)$. The covariant derivative of a vector field $\mathsf{F} \in \mathfrak{X}(\mathcal{M})$ in the direction $v \in T_x\mathcal{M}$ is given by:

$$\nabla_v \mathsf{F} = \lim_{\varepsilon \to 0} \frac{\Gamma_\varepsilon^0(\gamma) \mathsf{F}(\gamma(\varepsilon)) - \mathsf{F}(\gamma(0))}{\varepsilon} = \left. \frac{\mathrm{d}}{\mathrm{d}t} \Gamma_t^0(\gamma) \mathsf{F}(\gamma(t)) \right|_{t=0}. \qquad (2.16)$$

The notion of geodesic curve generalizes the notion of straight line of Euclidean spaces. A distinguishing feature of a straight line of a Euclidean space is that it translates parallel to itself, namely, it is self-parallel. The notion of 'straight line' on a curved space inherits such a distinguishing feature. An affine geodesic on a manifold $\mathcal{M}$ with connection $\nabla$ and associated parallel translation map $\Gamma_\cdot^\cdot(\cdot)$ is a curve $\gamma$ such that $\dot\gamma$ is parallel translated along $\gamma$ itself, namely, for every $s, t \in [-a, a]$, it holds that:

$$\Gamma_s^t(\gamma)\, \dot\gamma(s) = \dot\gamma(t). \qquad (2.17)$$

Setting $s = 0$, $\mathsf{F}(\gamma(t)) \stackrel{\mathrm{def}}{=} \dot\gamma(t)$, and invoking the relationship (2.16), it is seen that an affine geodesic curve must satisfy the condition:

$$\nabla_{\dot\gamma}\, \dot\gamma = 0. \qquad (2.18)$$

A geodesic curve is denoted throughout the paper by $\rho(t)$. On a pseudo-Riemannian manifold $\mathcal{M}$, the parallel translation map along a curve $\gamma : [-a, a] \to \mathcal{M}$ preserves the angle between tangent vectors, namely:

$$\langle \Gamma_s^t(\gamma) v, \Gamma_s^t(\gamma) w \rangle_{\gamma(t)} = \langle v, w \rangle_{\gamma(s)}, \qquad (2.19)$$

for every pair of tangent vectors $v, w \in T_{\gamma(s)}\mathcal{M}$ and for every $s, t \in [-a, a]$. For a geodesic curve $\rho : [0,1] \to \mathcal{M}$, setting $v = w = \dot\rho(0)$, thanks to the self-translation property (2.17), it is immediate to verify that:

$$\frac{\mathrm{d}}{\mathrm{d}t} \langle \dot\rho(t), \dot\rho(t) \rangle_{\rho(t)} = 0, \qquad (2.20)$$

for every $t \in [0,1]$. (A regular re-parametrization does not modify the distinguishing properties of an affine geodesic curve.) The 'total action' associated to a curve $\gamma : [0,1] \to \mathcal{M}$ is defined as:

$$\mathcal{A}(\gamma) \stackrel{\mathrm{def}}{=} \int_0^1 \langle \dot\gamma(t), \dot\gamma(t) \rangle_{\gamma(t)} \,\mathrm{d}t. \qquad (2.21)$$

On a geodesic arc $\rho : [0,1] \to \mathcal{M}$, the integrand is constant, therefore the total action takes on the value $\mathcal{A}(\rho) = \langle \dot\rho(0), \dot\rho(0) \rangle_{\rho(0)}$. On a Riemannian manifold, the length of a curve $\gamma : [0,1] \to \mathcal{M}$ is defined as:

$$\mathcal{L}(\gamma) \stackrel{\mathrm{def}}{=} \int_0^1 \langle \dot\gamma(t), \dot\gamma(t) \rangle_{\gamma(t)}^{\frac{1}{2}} \,\mathrm{d}t, \qquad (2.22)$$

hence, the length of a geodesic arc $\rho$ on a Riemannian manifold is computed by $\mathcal{L}(\rho) = \langle \dot\rho(0), \dot\rho(0) \rangle_{\rho(0)}^{1/2} = \mathcal{A}^{\frac{1}{2}}(\rho)$. On a Riemannian manifold, the total action is positive definite and the distance between two points may be calculated as the length of a geodesic arc which joins them (if it exists). On a pseudo-Riemannian manifold, however, such a definition is no longer valid because the total action has indefinite sign.

2.2. An equivalence principle of pseudo-Riemannian geometry. The geodesic arc associated to a given metric on a pseudo-Riemannian manifold may be calculated through a variational stationary-action principle rather than via covariant derivation.

Lemma 2.2. The equation (2.18) is equivalent to the equation:

$$\delta \int_0^1 \langle \dot\gamma, \dot\gamma \rangle_\gamma \,\mathrm{d}t = 0, \qquad (2.23)$$

where the symbol $\delta$ denotes the variation of the integral functional.

Proof. Substituting $u^k = v^k = \dot\gamma^k$ in the equation (2.14) leads to the expression $\nabla_{\dot\gamma}\, \dot\gamma = \dot\gamma^i \dot\gamma^j \Gamma_{ij}^k \partial_k + \ddot\gamma^k \partial_k$. Setting the above covariant derivative to zero leads to the geodesic equations for the components $\gamma^k$:

$$\ddot\gamma^k + \Gamma_{ij}^k \dot\gamma^i \dot\gamma^j = 0. \qquad (2.24)$$

The integral functional in (2.23) represents the total action associated with the parameterized smooth curve $\gamma : [0,1] \to \mathcal{M}$ of local coordinates $x^k = \gamma^k(t)$ and may be written explicitly as:

$$\int_0^1 \langle \dot\gamma, \dot\gamma \rangle_\gamma \,\mathrm{d}t = \int_0^1 g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t) \,\mathrm{d}t. \qquad (2.25)$$

The variation $\delta$ in the expression (2.23) corresponds to a perturbation of the total action (2.25). Let $\eta : [0,1] \to \mathcal{M}$ denote an arbitrary parameterized smooth curve of local coordinates $\eta^k(t)$ such that $\eta^k(0) = \eta^k(1) = 0$. Define the perturbed action:

$$\mathcal{I}_\eta(\varepsilon) \stackrel{\mathrm{def}}{=} \int_0^1 g_{ij}(\gamma + \varepsilon\eta)\, (\dot\gamma^i + \varepsilon\dot\eta^i)(\dot\gamma^j + \varepsilon\dot\eta^j) \,\mathrm{d}t, \qquad (2.26)$$

with $\varepsilon \in [-a, a]$, $a > 0$. The condition (2.23) may be expressed explicitly in terms of the perturbed action as:

$$\lim_{\varepsilon \to 0} \frac{\mathcal{I}_\eta(\varepsilon) - \mathcal{I}_\eta(0)}{\varepsilon} = 0, \quad \forall \eta. \qquad (2.27)$$

The perturbed total action may be expanded around $\varepsilon = 0$ as follows:

$$\begin{aligned}
\mathcal{I}_\eta(\varepsilon) &= \int_0^1 \left[ g_{ij}(x) + \varepsilon \frac{\partial g_{ij}}{\partial x^k} \eta^k + o(\varepsilon) \right] \left[ \dot\gamma^i \dot\gamma^j + \varepsilon (\dot\gamma^i \dot\eta^j + \dot\gamma^j \dot\eta^i) + o(\varepsilon) \right] \mathrm{d}t \\
&= \int_0^1 g_{ij}(x)\, \dot\gamma^i \dot\gamma^j \,\mathrm{d}t + \varepsilon \int_0^1 \left[ g_{ij}(x)(\dot\gamma^i \dot\eta^j + \dot\gamma^j \dot\eta^i) + \eta^k \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^i \dot\gamma^j \right] \mathrm{d}t + o(\varepsilon) \\
&= \mathcal{I}_\eta(0) + \varepsilon \int_0^1 \left[ g_{ik}(x)\, \dot\gamma^i \dot\eta^k + g_{kj}(x)\, \dot\gamma^j \dot\eta^k + \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^i \dot\gamma^j \eta^k \right] \mathrm{d}t + o(\varepsilon), \qquad (2.28)
\end{aligned}$$

where $o(\varepsilon)$ is such that $\lim_{\varepsilon \to 0} o(\varepsilon)/\varepsilon = 0$. The first term within the integral on the right-hand side of the expression (2.28) may be integrated by parts, namely:

$$\int_0^1 g_{ik}(x)\, \dot\gamma^i \dot\eta^k \,\mathrm{d}t = \left[ g_{ik}(\gamma)\, \dot\gamma^i \eta^k \right]_0^1 - \int_0^1 \frac{\mathrm{d}(g_{ik}(\gamma)\dot\gamma^i)}{\mathrm{d}t}\, \eta^k \,\mathrm{d}t. \qquad (2.29)$$

The first term on the right-hand side vanishes because $\eta^k(0) = \eta^k(1) = 0$, hence:

$$\int_0^1 g_{ik}(x)\, \dot\gamma^i \dot\eta^k \,\mathrm{d}t = -\int_0^1 \left[ \frac{\partial g_{ik}}{\partial x^j} \dot\gamma^i \dot\gamma^j + g_{ik}(x)\, \ddot\gamma^i \right] \eta^k \,\mathrm{d}t. \qquad (2.30)$$

A similar result holds true for the second term within the integral on the right-hand side of the expression (2.28). Therefore, it holds that:

$$\frac{\mathcal{I}_\eta(\varepsilon) - \mathcal{I}_\eta(0)}{\varepsilon} = \int_0^1 \left[ \left( \frac{\partial g_{ij}}{\partial x^k} - \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{kj}}{\partial x^i} \right) \dot\gamma^i \dot\gamma^j - 2 g_{ik}(x)\, \ddot\gamma^i \right] \eta^k \,\mathrm{d}t + \frac{o(\varepsilon)}{\varepsilon}. \qquad (2.31)$$

The condition (2.27), therefore, implies that:

$$g_{ik}\, \ddot\gamma^i + \frac{1}{2} \left( -\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{kj}}{\partial x^i} \right) \dot\gamma^i \dot\gamma^j = 0. \qquad (2.32)$$

As the metric tensor is invertible, the equation (2.32) is equivalent to the equation (2.24). $\square$

The geodesic equation (2.24) may be written in compact form as:

$$\ddot\gamma(t) + \Gamma_{\gamma(t)}(\dot\gamma(t), \dot\gamma(t)) = 0. \qquad (2.33)$$

In the above expressions, the over-dot and the double over-dot denote first-order and second-order derivation with respect to the parameter $t$, respectively. The solution of the geodesic equation may be written in terms of two parameters, such as the initial values $\gamma(0) = x \in \mathcal{M}$ and $\dot\gamma(0) = v \in T_x\mathcal{M}$, in which case the solution of the geodesic equation (2.33) will be denoted as $\rho_{x,v}(t)$. A geodesic arc $\rho_{x,v} : [0,1] \to \mathcal{M}$ is associated to a map $\exp_x : T_x\mathcal{M} \to \mathcal{M}$ defined as:

$$\exp_x(v) \stackrel{\mathrm{def}}{=} \rho_{x,v}(1). \qquad (2.34)$$

The function (2.34) is termed exponential map with pole $x \in \mathcal{M}$. The notion of exponential map is illustrated in the Figure 2.1.

The equivalency stated in the Lemma 2.2 proves useful when working in intrinsic coordinates. Intrinsic coordinates appear more appealing when it comes to implementing an optimization method on a computation platform (for a detailed discussion, see, e.g., [11]). In order to make such an equivalency profitable from a computational point of view, it pays to examine the variational method invoked in the Lemma 2.2 on smooth manifolds from an intrinsic-coordinate perspective.

Lemma 2.3. The first variation of the integral functional

$$\mathcal{F}(\gamma) \stackrel{\mathrm{def}}{=} \int_0^1 F(\gamma, \dot\gamma) \,\mathrm{d}t, \qquad (2.35)$$

with $\gamma : [0,1] \to \mathcal{M}$ denoting a curve on a pseudo-Riemannian manifold $\mathcal{M}$ and $F : T\mathcal{M} \to \mathbb{R}$ integrable, reads:

$$\delta\mathcal{F}(\gamma) = \int_0^1 \left\langle \frac{\partial F}{\partial \gamma} - \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial F}{\partial \dot\gamma},\, \delta\gamma \right\rangle^E \mathrm{d}t. \qquad (2.36)$$


Fig. 2.1. Notion of exponential mapping on a manifold $\mathcal{M}$. The point $\exp_x(v)$ lies in a neighborhood of the pole $x$ denoted by the dashed circle.

Proof. Define a smooth family of smooth curves $c : [0,1] \times [-a, a] \to \mathcal{M}$, $a > 0$, such that:

$$c(t, 0) = \gamma(t), \quad \forall t \in [0,1], \qquad (2.37)$$
$$c(0, \varepsilon) = \gamma(0), \quad c(1, \varepsilon) = \gamma(1), \quad \forall \varepsilon \in [-a, a], \qquad (2.38)$$

for a fixed $\varepsilon$, $t \mapsto c(t, \varepsilon)$ is a smooth curve on the manifold $\mathcal{M}$; for a fixed $t$, $\varepsilon \mapsto c(t, \varepsilon)$ is a smooth curve on the manifold $\mathcal{M}$ for some $a > 0$; and $\frac{\partial^2 c}{\partial t \partial \varepsilon} = \frac{\partial^2 c}{\partial \varepsilon \partial t}$. Define:

$$\delta\gamma(t) \stackrel{\mathrm{def}}{=} \left. \frac{\partial}{\partial \varepsilon} c(t, \varepsilon) \right|_{\varepsilon=0}. \qquad (2.39)$$

Note that $\delta\gamma \in T_\gamma\mathcal{M}$ and $\delta\gamma(0) = \delta\gamma(1) = 0$. The variation $\delta\mathcal{F}(\gamma)$ may be written as:

$$\begin{aligned}
\delta \int_0^1 F(\gamma, \dot\gamma) \,\mathrm{d}t &\stackrel{\mathrm{def}}{=} \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \left[ \int_0^1 F(c(t,\varepsilon), \dot c(t,\varepsilon)) \,\mathrm{d}t - \int_0^1 F(c(t,0), \dot c(t,0)) \,\mathrm{d}t \right] \\
&= \lim_{\varepsilon \to 0} \int_0^1 \frac{F(c(t,\varepsilon), \dot c(t,\varepsilon)) - F(c(t,0), \dot c(t,0))}{\varepsilon} \,\mathrm{d}t \\
&= \int_0^1 \left. \frac{\partial}{\partial \varepsilon} F(c(t,\varepsilon), \dot c(t,\varepsilon)) \right|_{\varepsilon=0} \mathrm{d}t. \qquad (2.40)
\end{aligned}$$

The partial derivative within the integral in equation (2.40) may be written as:

$$\left. \frac{\partial F(c, \dot c)}{\partial \varepsilon} \right|_{\varepsilon=0} = \left. \left\langle \frac{\partial F(c, \dot c)}{\partial c}, \frac{\partial c}{\partial \varepsilon} \right\rangle^E \right|_{\varepsilon=0} + \left. \left\langle \frac{\partial F(c, \dot c)}{\partial \dot c}, \frac{\partial \dot c}{\partial \varepsilon} \right\rangle^E \right|_{\varepsilon=0}. \qquad (2.41)$$

The variation of the integral functional reads, therefore:

$$\delta \int_0^1 F(\gamma, \dot\gamma) \,\mathrm{d}t = \int_0^1 \left\langle \frac{\partial F(\gamma, \dot\gamma)}{\partial \gamma}, \delta\gamma \right\rangle^E \mathrm{d}t + \int_0^1 \left\langle \frac{\partial F(\gamma, \dot\gamma)}{\partial \dot\gamma}, \frac{\mathrm{d}\,\delta\gamma}{\mathrm{d}t} \right\rangle^E \mathrm{d}t. \qquad (2.42)$$

Integration by parts yields:

$$\int_0^1 \left\langle \frac{\partial F}{\partial \dot\gamma}, \frac{\mathrm{d}\,\delta\gamma}{\mathrm{d}t} \right\rangle^E \mathrm{d}t = \left[ \left\langle \frac{\partial F}{\partial \dot\gamma}, \delta\gamma \right\rangle^E \right]_0^1 - \int_0^1 \left\langle \frac{\mathrm{d}}{\mathrm{d}t} \frac{\partial F}{\partial \dot\gamma}, \delta\gamma \right\rangle^E \mathrm{d}t. \qquad (2.43)$$

The first term on the right-hand side equals zero, hence the variation of the integral functional assumes the final expression (2.36). $\square$

A noticeable consequence of Lemma 2.3 is that the variation $\delta\mathcal{F}(\gamma)$ depends only on the tangential component $\delta\gamma \in T_\gamma\mathcal{M}$ of the perturbation. It is worth noting the difference between Riemannian geodesics, which minimize the total action, and pseudo-Riemannian geodesics, which are only stationary points of the total action.

2.3. A pseudo-Riemannian geometric structure of the real symplectic group. The skew-symmetric matrix $Q_{2n}$ defined in (1.1) enjoys the properties $Q_{2n}^2 = -I_{2n}$ and $Q_{2n}^{-1} = Q_{2n}^T = -Q_{2n}$. From now on, the subscript $2n$ on the symbol $Q_{2n}$ will be dropped for a tidier notation. The subscript of the zero-matrix $0_p$ will be dropped as well whenever its dimension is clear from the context.

The space $\mathrm{Sp}(2n,\mathbb{R})$ is a curved smooth manifold that may also be endowed with an algebraic-group structure (namely, group multiplication, group inverse and a unique identity element) in a manner that is compatible with the manifold structure. Therefore, the space $\mathrm{Sp}(2n,\mathbb{R})$ has the structure of a Lie group of dimension $(2n+1)n$. Standard matrix multiplication and inversion work as algebraic-group operations. (Any symplectic matrix $X \in \mathrm{Sp}(2n,\mathbb{R})$ is such that $\det^2(X) = 1$, where the symbol $\det(\cdot)$ denotes determinant. Therefore, symplectic matrices are invertible.) The identity element of the group $\mathrm{Sp}(2n,\mathbb{R})$ is the matrix $I_{2n}$. The following identities hold: $\det(X) = 1$, $X^T = -QX^{-1}Q$, $X^{-T} = -QXQ$. It follows that the group $\mathrm{Sp}(2n,\mathbb{R})$ is a subgroup of $\mathrm{Sl}(2n,\mathbb{R})$ as well as of $\mathrm{Gl}(2n,\mathbb{R})$. The tangent space $T_X\mathrm{Sp}(2n,\mathbb{R})$ has the structure:

$$T_X \mathrm{Sp}(2n,\mathbb{R}) = \left\{ V \in \mathbb{R}^{2n\times 2n} \,\middle|\, V^T Q X + X^T Q V = 0 \right\}. \qquad (2.44)$$

The tangent space at the identity of the Lie group, namely the Lie algebra $\mathfrak{sp}(2n,\mathbb{R})$, has the structure:

$$\mathfrak{sp}(2n,\mathbb{R}) = \left\{ H \in \mathbb{R}^{2n\times 2n} \,\middle|\, H^T Q + Q H = 0 \right\} \qquad (2.45)$$

and the elements of the Lie algebra $\mathfrak{sp}(2n,\mathbb{R})$ are termed Hamiltonian matrices. By the embedding of the manifold $\mathrm{Sp}(2n,\mathbb{R})$ into the Euclidean space $\mathbb{R}^{2n\times 2n}$, at any point $X \in \mathrm{Sp}(2n,\mathbb{R})$ is associated the normal space:

$$N_X \mathrm{Sp}(2n,\mathbb{R}) \stackrel{\mathrm{def}}{=} \left\{ P \in \mathbb{R}^{2n\times 2n} \,\middle|\, \mathrm{tr}(P^T V) = 0,\ \forall V \in T_X \mathrm{Sp}(2n,\mathbb{R}) \right\}, \qquad (2.46)$$

where the symbol $\mathrm{tr}(\cdot)$ denotes the trace operator. The tangent space and the Lie algebra associated to the real symplectic group may be characterized as follows:

$$\begin{cases}
T_X \mathrm{Sp}(2n,\mathbb{R}) = \left\{ XQS \mid S \in \mathbb{R}^{2n\times 2n},\ S^T = S \right\}, \\
\mathfrak{sp}(2n,\mathbb{R}) = \left\{ QS \mid S \in \mathbb{R}^{2n\times 2n},\ S^T = S \right\}, \\
N_X \mathrm{Sp}(2n,\mathbb{R}) = \left\{ QX\Omega \mid \Omega \in \mathfrak{so}(2n) \right\},
\end{cases} \qquad (2.47)$$

where the symbol $\mathfrak{so}(p)$ denotes the Lie algebra:

$$\mathfrak{so}(p) \stackrel{\mathrm{def}}{=} \left\{ \Omega \in \mathbb{R}^{p\times p} \,\middle|\, \Omega^T = -\Omega \right\}. \qquad (2.48)$$
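The characterization (2.47) yields a practical way of producing tangent vectors. The following sketch (Python with NumPy; the symplectic 'shear' matrix used as the base point is an illustrative choice, not taken from the paper) generates a random tangent vector at a symplectic point and verifies the defining condition (2.44):

```python
import numpy as np

n = 2
Z, I = np.zeros((n, n)), np.eye(n)
Q = np.block([[Z, I], [-I, Z]])            # the matrix Q_{2n} of (1.1)

B = np.random.randn(n, n); B = B + B.T     # symmetric block
X = np.block([[I, B], [Z, I]])             # a symplectic shear: X^T Q X = Q

A = np.random.randn(2 * n, 2 * n)
S = A + A.T                                # random symmetric matrix
V = X @ Q @ S                              # tangent vector at X, by (2.47)

print(np.allclose(X.T @ Q @ X, Q))                    # True: X is symplectic
print(np.allclose(V.T @ Q @ X + X.T @ Q @ V, 0.0))    # True: (2.44) holds
```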

Consider the following indefinite inner product on the general linear group of matrices $\mathrm{Gl}(p,\mathbb{R})$:

$$\langle U, V \rangle_X \stackrel{\mathrm{def}}{=} \mathrm{tr}(X^{-1} U X^{-1} V), \quad X \in \mathrm{Gl}(p,\mathbb{R}),\ U, V \in T_X \mathrm{Gl}(p,\mathbb{R}), \qquad (2.49)$$

referred to as the Khvedelidze-Mladenov metric [26]. The above inner product gives rise to a metric which is not positive-definite on the space $\mathrm{Sp}(2n,\mathbb{R}) \subset \mathrm{Gl}(2n,\mathbb{R})$. To verify such a property, it is sufficient to evaluate the structure of the squared norm $\|V\|_X^2 = \mathrm{tr}((X^{-1}V)^2)$ with $X \in \mathrm{Sp}(2n,\mathbb{R})$ and $V \in T_X \mathrm{Sp}(2n,\mathbb{R})$. By the structure of the tangent space $T_X \mathrm{Sp}(2n,\mathbb{R})$, it is known that $X^{-1}V = QS$ with $S \in \mathbb{R}^{2n\times 2n}$ symmetric. It holds that:

$$QS = \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix} \begin{bmatrix} S_1 & S_2 \\ S_2^T & S_3 \end{bmatrix} = \begin{bmatrix} S_2^T & S_3 \\ -S_1 & -S_2 \end{bmatrix},$$

with $S_1, S_3 \in \mathbb{R}^{n\times n}$ symmetric and $S_2 \in \mathbb{R}^{n\times n}$ arbitrary. Hence, $\mathrm{tr}((QS)^2) = 2\,\mathrm{tr}(S_2^2) - 2\,\mathrm{tr}(S_1 S_3)$ has indefinite sign.

Under the pseudo-Riemannian metric (2.49), it is indeed possible to solve the geodesic equation in closed form. The solution makes use of the exponential of a matrix $X \in \mathbb{R}^{p\times p}$, which is defined by the series:

$$\exp(X) \stackrel{\mathrm{def}}{=} I_p + \sum_{r=1}^{\infty} \frac{X^r}{r!}. \qquad (2.50)$$

Theorem 2.4. The geodesic curve $\rho_{X,V} : [0,1] \to \mathrm{Sp}(2n,\mathbb{R})$ with $X \in \mathrm{Sp}(2n,\mathbb{R})$, $V \in T_X \mathrm{Sp}(2n,\mathbb{R})$ corresponding to the indefinite Khvedelidze-Mladenov metric (2.49) has the expression:

$$\rho_{X,V}(t) = X \exp(t X^{-1} V). \qquad (2.51)$$

Proof. By the equivalence principle stated in the Lemma 2.2, the geodesic equation in variational form writes:

$$\delta \int_0^1 \mathrm{tr}(\gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma) \,\mathrm{d}t = 0, \qquad (2.52)$$

where the natural parametrization is assumed. The above variation is computed on the basis of the variational method on manifolds recalled in the Lemma 2.3 and is facilitated by the following rules of calculus of variations:

$$\delta(X^{-1}) = -X^{-1} (\delta X) X^{-1}, \qquad (2.53)$$
$$\delta \left( \frac{\mathrm{d}X}{\mathrm{d}t} \right) = \frac{\mathrm{d}}{\mathrm{d}t} (\delta X), \qquad (2.54)$$
$$\delta(XZ) = (\delta X) Z + X (\delta Z), \qquad (2.55)$$

for curves $X, Z : [0,1] \to \mathrm{Sp}(2n,\mathbb{R})$. By computing the variations, integrating by parts and recalling that the variation vanishes at the endpoints, it is found that the geodesic equation in variational form reads:

$$\int_0^1 \mathrm{tr}\left( \delta\gamma \left( \gamma^{-1} \ddot\gamma \gamma^{-1} - \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1} \right) \right) \mathrm{d}t = 0. \qquad (2.56)$$

The variation $\delta\gamma \in T_\gamma \mathrm{Sp}(2n,\mathbb{R})$ is arbitrary. By the structure of the normal space $N_X \mathrm{Sp}(2n,\mathbb{R})$, the equation $\mathrm{tr}(P^T \delta\gamma) = 0$, with $\delta\gamma \in T_\gamma \mathrm{Sp}(2n,\mathbb{R})$, implies that $P^T = \Omega Q \gamma^{-1}$ with $\Omega \in \mathfrak{so}(2n)$. Therefore, the equation (2.56) is satisfied if and only if:

$$\gamma^{-1} \ddot\gamma \gamma^{-1} - \gamma^{-1} \dot\gamma \gamma^{-1} \dot\gamma \gamma^{-1} = \Omega Q \gamma^{-1}, \quad \Omega \in \mathfrak{so}(2n), \qquad (2.57)$$

or, equivalently,

$$\ddot\gamma - \dot\gamma \gamma^{-1} \dot\gamma = \gamma \Omega Q, \qquad (2.58)$$

for some $\Omega \in \mathfrak{so}(2n)$. In order to determine the value of the matrix $\Omega$, note that:

$$\gamma^T Q \gamma - Q = 0 \ \Rightarrow\ \ddot\gamma^T Q \gamma + 2 \dot\gamma^T Q \dot\gamma + \gamma^T Q \ddot\gamma = 0.$$

Substituting the expression $\ddot\gamma = \dot\gamma \gamma^{-1} \dot\gamma + \gamma \Omega Q$ into the above equation yields the condition $Q \Omega Q = 0$. Hence, $\Omega = 0$ and the geodesic equation in Christoffel form reads:

$$\ddot\gamma - \dot\gamma \gamma^{-1} \dot\gamma = 0. \qquad (2.59)$$

Its solution, with initial conditions $\gamma(0) = X \in \mathrm{Sp}(2n,\mathbb{R})$ and $\dot\gamma(0) = V \in T_X \mathrm{Sp}(2n,\mathbb{R})$, is found to be of the form (2.51). $\square$

By the definition of matrix exponential, it follows that $\dot\rho_{X,V}(t) = V \exp(t X^{-1} V)$. The Lemma 2.1 applies to the map $\exp : \mathfrak{sp}(2n,\mathbb{R}) \to \mathrm{Sp}(2n,\mathbb{R})$ around $0 \in \mathfrak{sp}(2n,\mathbb{R})$ and implies, in particular, that $\mathrm{d}\exp|_0(H) = H$ for every $H \in \mathfrak{sp}(2n,\mathbb{R})$.

The structure of the pseudo-Riemannian gradient associated to the Khvedelidze-Mladenov metric (2.49) applied to the case of the real symplectic group $\mathrm{Sp}(2n,\mathbb{R})$ is given by the following result.

Theorem 2.5. The pseudo-Riemannian gradient of a sufficiently regular function $f : \mathrm{Sp}(2n,\mathbb{R}) \to \mathbb{R}$ associated to the Khvedelidze-Mladenov metric (2.49) reads:

$$\nabla_X f = \frac{1}{2} X Q \left( X^T \partial_X f\, Q - Q\, \partial_X^T f\, X \right). \qquad (2.60)$$

Proof. The pseudo-Riemannian gradient of a regular function $f : \mathrm{Sp}(2n,\mathbb{R}) \to \mathbb{R}$ associated to the metric (2.49) is computed as the solution of the following system of equations:

$$\begin{cases}
\mathrm{tr}(X^{-1} \nabla_X f\, X^{-1} V) = \mathrm{tr}(\partial_X^T f\, V), & \forall V \in T_X \mathrm{Sp}(2n,\mathbb{R}), \\
\nabla_X^T f\, Q X + X^T Q\, \nabla_X f = 0.
\end{cases} \qquad (2.61)$$

The first equation expresses the compatibility of the pseudo-Riemannian gradient with the metric, while the second equation expresses the requirement that the pseudo-Riemannian gradient be a tangent vector. The metric compatibility condition may be rewritten as:

$$\mathrm{tr}\left( V^T \left( \partial_X f - X^{-T} \nabla_X^T f\, X^{-T} \right) \right) = 0, \qquad (2.62)$$

where the superscript $-T$ denotes the inverse of the transposed matrix. The above condition implies that $\partial_X f - X^{-T} \nabla_X^T f\, X^{-T} \in N_X \mathrm{Sp}(2n,\mathbb{R})$, hence that $\partial_X f - X^{-T} \nabla_X^T f\, X^{-T} = QX\Omega$ with $\Omega \in \mathfrak{so}(2n)$. Therefore, the pseudo-Riemannian gradient of the criterion function $f$ has the expression:

$$\nabla_X f = X \partial_X^T f\, X - X \Omega Q. \qquad (2.63)$$

In order to determine the value of the matrix Ω, it is sufficient to substitute the expression (2.63) of the gradient within the tangency condition, which becomes:

$$\left( X \partial_X^T f\, X - X \Omega Q \right)^T Q X + X^T Q \left( X \partial_X^T f\, X - X \Omega Q \right) = 0. \qquad (2.64)$$

Solving for $\Omega$ gives:

$$\Omega = -\frac{1}{2} \left( Q X^T \partial_X f + \partial_X^T f\, X Q \right). \qquad (2.65)$$

Substituting the above expression of the variable $\Omega$ into the expression (2.63) of the pseudo-Riemannian gradient gives the result (2.60). $\square$
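The closed forms (2.51) and (2.60) are straightforward to check numerically. The sketch below (Python with NumPy/SciPy; the criterion function $f(X) = \frac{1}{2}\|X - C\|_F^2$ anticipates the one of §3.3, and the target matrix $C$ is a hypothetical arbitrary choice) verifies that the geodesic stays on the manifold and that the gradient is a tangent vector:

```python
import numpy as np
from scipy.linalg import expm

n = 2
Z, I = np.zeros((n, n)), np.eye(n)
Q = np.block([[Z, I], [-I, Z]])

X = np.eye(2 * n)                       # pole of the geodesic
A = np.random.randn(2 * n, 2 * n)
V = X @ Q @ (A + A.T)                   # tangent vector at X, by (2.47)

# Geodesic (2.51): rho(t) = X expm(t X^{-1} V) remains symplectic for all t
for t in (0.25, 0.5, 1.0):
    R = X @ expm(t * np.linalg.solve(X, V))
    assert np.allclose(R.T @ Q @ R, Q)

# Pseudo-Riemannian gradient (2.60) of f(X) = ||X - C||_F^2 / 2,
# whose Euclidean gradient is dX f = X - C
C = np.random.randn(2 * n, 2 * n)       # hypothetical target matrix
dXf = X - C
G = 0.5 * X @ Q @ (X.T @ dXf @ Q - Q @ dXf.T @ X)

# Tangency condition (2.44) for the gradient
print(np.allclose(G.T @ Q @ X + X.T @ Q @ G, 0.0))   # True
```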

2.4. A Riemannian structure of the real symplectic group. For completeness of exposition, it is instructive to recall an advanced result about a Riemannian structure of the real symplectic group of matrices.

Theorem 2.6 ([7]). Let $\sigma : \mathfrak{sp}(2n,\mathbb{R}) \to \mathfrak{sp}(2n,\mathbb{R})$ be a symmetric positive-definite operator with respect to the Euclidean inner product on the space $\mathfrak{sp}(2n,\mathbb{R})$. The minimizing curve of the integral

$$\int_0^1 \langle H(t), \sigma(H(t)) \rangle^E \,\mathrm{d}t$$

over all curves $\gamma : [0,1] \to \mathrm{Sp}(2n,\mathbb{R})$ with fixed endpoints $\gamma(0)$, $\gamma(1)$, having defined $H(t) \stackrel{\mathrm{def}}{=} \gamma^{-1}(t)\, \dot\gamma(t)$, is the solution of the system:

$$\begin{cases}
\dot\gamma(t) = \gamma(t) H(t), \\
\dot M(t) = \sigma^T(H(t))\, M(t) - H(t)\, \sigma^T(H(t)), \\
H(t) = \sigma^{-1}(M(t)),
\end{cases} \qquad (2.66)$$

where the symbol $\sigma^{-1}$ denotes the inverse of the operator $\sigma$. The simplest choice for the symmetric positive-definite operator $\sigma$ is the identity map of the space $\mathfrak{sp}(2n,\mathbb{R})$, which corresponds to a Riemannian metric for the real symplectic group $\mathrm{Sp}(2n,\mathbb{R})$ given by the inner product:

$$\langle U, V \rangle_X = \mathrm{tr}\left( (X^{-1} U)^T (X^{-1} V) \right), \quad X \in \mathrm{Sp}(2n,\mathbb{R}),\ U, V \in T_X \mathrm{Sp}(2n,\mathbb{R}). \qquad (2.67)$$

Such a metric leads to the 'natural gradient' on the space of real invertible matrices $\mathrm{Gl}(p,\mathbb{R})$ studied in [1]. The above choice for the operator $\sigma$ implies that $M = H$ and that the corresponding curve on the real symplectic group satisfies the equations:

$$\begin{cases}
\dot H(t) = H^T(t) H(t) - H(t) H^T(t), \\
H(t) = \gamma^{-1}(t)\, \dot\gamma(t),
\end{cases} \qquad (2.68)$$

or, in Christoffel form:

$$\ddot\gamma - \dot\gamma \gamma^{-1} \dot\gamma + \gamma \dot\gamma^T Q \gamma Q \gamma^{-1} \dot\gamma - \dot\gamma \dot\gamma^T Q \gamma Q = 0. \qquad (2.69)$$

The above equations describe geodesic arcs on the real symplectic group corresponding to the metric (2.67). Closed form solutions of the above equations are unknown to the authors of [7] and to the present author. The structure of the Riemannian gradient is given by the following result.

Theorem 2.7. The Riemannian gradient $\nabla_X f$ of a sufficiently regular function $f : \mathrm{Sp}(2n,\mathbb{R}) \to \mathbb{R}$ corresponding to the metric (2.67) reads:

$$\nabla_X f = \frac{1}{2} X Q \left( \partial_X^T f\, X Q - Q X^T \partial_X f \right). \qquad (2.70)$$

Proof. The Riemannian gradient $\nabla_X f$ of a regular function $f : \mathrm{Sp}(2n,\mathbb{R}) \to \mathbb{R}$ must satisfy the condition:

$$\nabla_X f = X Q \left( \Omega - X^{-1} Q\, \partial_X f \right), \quad \text{with} \quad \Omega = \frac{1}{2} \left( X^{-1} Q\, \partial_X f + \partial_X^T f\, X Q \right), \qquad (2.71)$$

from which the expression of the Riemannian gradient associated to the metric (2.67) follows. $\square$

Clearly, the expressions of the gradients (2.60) and (2.70) differ from each other because the metrics that they correspond to are different.

3. Minimal-distance optimization on pseudo-Riemannian manifolds. Several numerical applications require the solution of a least-squares problem on a pseudo-Riemannian manifold. As an example, given a 'cloud' of points $x_k \in \mathcal{M}$ on a manifold $\mathcal{M}$ endowed with a distance function $d(\cdot,\cdot)$, its empirical mean $\mu \in \mathcal{M}$ is defined as:

$$\mu \stackrel{\mathrm{def}}{=} \arg\min_{x \in \mathcal{M}} \sum_k d^2(x, x_k). \qquad (3.1)$$

For a recent review, see, e.g., [19]. The empirical mean value $\mu$ of a distribution of points on a manifold is instrumental in several applications. The mean value $\mu$ is, by definition, close to all points in the distribution. Therefore, the tangent space $T_\mu\mathcal{M}$ associated to a cloud of data-points may serve as a reference tangent space in the development of algorithms to process those data (see, for example, the discussion in [37] about 'principal geodesic analysis' of data). The notion of averaging over a curved manifold is depicted in the Figure 3.1. In addition, given a cloud of $N$ points $x_k \in \mathcal{M}$ on a manifold $\mathcal{M}$ endowed with a distance function $d(\cdot,\cdot)$, its empirical $m$th-order (centered) moment is defined as:

$$\mu_m \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_k d^m(\mu, x_k), \quad m \ge 2, \qquad (3.2)$$

where $\mu \in \mathcal{M}$ denotes the empirical mean of the cloud. In particular, the moment $\mu_2$ denotes the empirical variance of the manifold-valued samples, which measures the width of the cloud around its center. The definition of the empirical moments $\mu_m$ is paired with the definition of the empirical mean as, applied on a manifold, they provide a characterization of the distribution of a data-cloud.

3.1. Optimization on manifolds. Optimization on manifolds is a modern formulation of some classical constrained optimization problems. In particular, given a differentiable manifold $\mathcal{M}$ and a sufficiently regular function $f : \mathcal{M} \to \mathbb{R}$, minimization on a manifold is expressed as:

$$\min_{x \in \mathcal{M}} \{ f(x) \}. \qquad (3.3)$$

The main tool for building algorithms to solve the optimization problem (3.3) was Riemannian geometry.


Fig. 3.1. Notion of averaging over a manifold $\mathcal{M}$. Dots represent the samples $x_k \in \mathcal{M}$ to average while the open circle represents the empirical mean $\mu \in \mathcal{M}$. The dashed lines represent the distances among the sample-points and the midpoint.

Two seminal contributions in this area are the geodesic-stepping Riemannian-gradient-steepest-descent method of Luenberger [28] and its extension to Riemannian-Newtonian and quasi-Newtonian methods by Gabay [21], which have been inspiring further research in the area of optimization on curved spaces since their inception.

In particular, the Riemannian-gradient-steepest-descent numerical optimization method is based on the knowledge of the explicit form of the geodesic arc emanating from a given point along a given direction and of the Riemannian gradient of the (sufficiently regular) criterion function to optimize. The geodesic-based Riemannian-gradient-steepest-descent optimization method may be expressed as:

$$x_{(\ell+1)} = \exp_{x_{(\ell)}} \left( -t_{(\ell)} \nabla_{x_{(\ell)}} f \right), \quad \text{with} \qquad (3.4)$$

$$t_{(\ell)} = \arg\min_{t>0} \left\{ f\left( \exp_{x_{(\ell)}} (-t \nabla_{x_{(\ell)}} f) \right) \right\}, \qquad (3.5)$$

with $\ell \ge 0$ and with the initial guess $x_{(0)} \in \mathcal{M}$ being arbitrarily chosen. The rule (3.5) to compute a stepsize schedule is well defined for any integer $\ell$ such that $x_{(\ell)}$ is not a critical point. In fact, it holds:

$$\frac{\mathrm{d}}{\mathrm{d}t} f\left( \exp_{x_{(\ell)}}(-t \nabla_{x_{(\ell)}} f) \right) = \left\langle \nabla_{\exp_{x_{(\ell)}}(-t \nabla_{x_{(\ell)}} f)} f,\ \mathrm{d}\exp_{x_{(\ell)}}\big|_{-t \nabla_{x_{(\ell)}} f} \left( -\nabla_{x_{(\ell)}} f \right) \right\rangle_{\exp_{x_{(\ell)}}(-t \nabla_{x_{(\ell)}} f)}, \qquad (3.6)$$

hence, it holds that:

$$\left. \frac{\mathrm{d}}{\mathrm{d}t} f\left( \exp_{x_{(\ell)}}(-t \nabla_{x_{(\ell)}} f) \right) \right|_{t=0} = \left\langle \nabla_{x_{(\ell)}} f, -\nabla_{x_{(\ell)}} f \right\rangle_{x_{(\ell)}} = -\left\| \nabla_{x_{(\ell)}} f \right\|_{x_{(\ell)}}^2 < 0, \qquad (3.7)$$

provided $x_{(\ell)}$ is not a critical point of the function $f$. The convergence of the geodesic-based Riemannian-gradient-steepest-descent optimization method (3.4) endowed with the stepsize selection rule (3.5) is ensured by the following result.

Theorem 3.1 ([21]). Assume that the function $f$ is continuously differentiable, that its critical values are distinct and that the connected subset of the level-set $\{x \in \mathcal{M} \mid$

$f(x) \le f(x_{(0)})\}$ containing the initial guess $x_{(0)}$ is compact. Then the sequence $x_{(\ell)}$ generated by the method (3.4)-(3.5) converges to a critical point of the function $f$.

Algorithm 1 Descent method on an affine manifold proposed in [34] to numerically solve the optimization problem (3.3). The method is expressed in a completely coordinate-free fashion.
  Set $x_{(0)}$ to an initial guess in $\mathcal{M}$
  Set $\ell = 0$
  repeat

    Choose a tangent vector $v_{(\ell)} \in T_{x_{(\ell)}}\mathcal{M}$ such that $\mathrm{d}f_{x_{(\ell)}}(v_{(\ell)}) < 0$
    Choose a connection $\nabla^{(\ell)}$
    Determine a value $t_{(\ell)} \in \mathbb{R}$ such that $f(\exp_{x_{(\ell)}}^{\nabla^{(\ell)}}(t_{(\ell)} v_{(\ell)})) < f(x_{(\ell)})$
    Set $x_{(\ell+1)} = \exp_{x_{(\ell)}}^{\nabla^{(\ell)}}(t_{(\ell)} v_{(\ell)})$
    Set $\ell = \ell + 1$
  until the stopping criterion (3.9) is met

Note that the requirement $\mathrm{d}f_{x_{(\ell)}}(v_{(\ell)}) < 0$ restricts the applicability of the method to the case that the initial guess $x_{(0)}$ is not a critical point of the criterion function $f$ already (a preventive check of this situation was omitted from the Algorithm 1). At any optimization step, a different connection $\nabla^{(\ell)}$ may be chosen (not necessarily a Levi-Civita connection), implying that no metrics are selected a priori and that the method depends on local affine structures. The paper [34] does not comment on how to choose the move-along direction $v_{(\ell)}$ nor a stepsize schedule $t_{(\ell)}$ but suggests a stopping criterion to halt the iteration. Let $\Gamma^\nabla$ denote the parallel translation operation defined by the connection $\nabla$ and $t \mapsto \exp_x^\nabla(tv)$ the curve associated to the exponential map $\exp_x^\nabla$ emanating from the point $x$ along the direction $v \in T_x\mathcal{M}$ of an extent $t$. At $x_{(0)}$, select a basis $\{e_1^{(0)}, \ldots, e_p^{(0)}\}$ of the tangent space $T_{x_{(0)}}\mathcal{M}$ and a constant $\varepsilon > 0$. A basis $\{e_1^{(\ell)}, \ldots, e_p^{(\ell)}\}$ of the tangent space $T_{x_{(\ell)}}\mathcal{M}$ may be parallel translated along the curve $\gamma_{x_{(\ell)},v_{(\ell)}}(t) \stackrel{\mathrm{def}}{=} \exp_{x_{(\ell)}}^{\nabla^{(\ell)}}(t v_{(\ell)})$ to give a basis

$\{e_1^{(\ell+1)}, \ldots, e_p^{(\ell+1)}\}$ of the tangent space $T_{x_{(\ell+1)}}\mathcal{M}$, namely, by

$$e_i^{(\ell+1)} = \Gamma^{\nabla^{(\ell)}}(\gamma_{x_{(\ell)},v_{(\ell)}})_0^{t_{(\ell)}}\, e_i^{(\ell)}. \qquad (3.8)$$

The proposed stopping criterion is to halt the iteration at step ℓ if it holds that:

$$\left| \mathrm{d}f_{x_{(\ell)}}(e_i^{(\ell)}) \right| < \varepsilon \quad \text{for every } i \in \{1, \ldots, p\}. \qquad (3.9)$$

The analysis of convergence of the method in the Algorithm 1 with the stopping criterion (3.9) is summarized as follows.

Theorem 3.2 ([34]). Let $\mathcal{M}$ be a $p$-dimensional differentiable manifold, $f$ be a sufficiently regular function and $\{x_{(\ell)}\}$ be the sequence generated by the method of the Algorithm 1. Suppose that the function $f$ is bounded from below and that the connected component of the level set $\{x \mid f(x) \le f(x_{(0)})\}$ containing $x_{(0)}$ is compact.
(i) If for every $\varepsilon > 0$ the method halts according to the stopping criterion (3.9), then there exists a subsequence of $\{x_{(\ell)}\}$ converging to a critical point of $f$.
(ii) If the function $f$ is convex along $\gamma$, then the sequence $\{x_{(\ell)}\}$ converges to a unique minimum point of the function $f$.

It is to be underlined that such a method relies on parallel translation to propagate a basis of the tangent spaces, which is not surely available or efficiently computable for manifolds of interest.
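The abstract loop of the Algorithm 1 may be rendered, for instance, by the following schematic sketch (Python with NumPy). It is not the method of [34] in full (no parallel translation of frames is performed and a gradient norm is used as a surrogate stopping test); the Euclidean toy instance at the bottom, with $\exp_x(v) = x + v$, is a hypothetical stand-in chosen only to make the sketch runnable:

```python
import numpy as np

def descent(x0, exp_map, grad, step=0.1, tol=1e-8, max_iter=1000):
    """Coordinate-free descent: move along exp_map from x along -grad(x)."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # surrogate of criterion (3.9)
            break
        x = exp_map(x, -step * g)          # geodesic-type update step
    return x

# Euclidean toy instance: exp_x(v) = x + v, f(x) = |x - c|^2 / 2
c = np.array([1.0, -2.0])
x_star = descent(np.zeros(2),
                 exp_map=lambda x, v: x + v,
                 grad=lambda x: x - c)
print(x_star)                              # approximately equals c
```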

3.2. Gradient-based optimization on pseudo-Riemannian manifolds. Let $\mathcal{M}$ denote a $p$-dimensional pseudo-Riemannian manifold. A key step in pseudo-Riemannian geometry is to decompose each tangent space $T_x\mathcal{M}$ as follows:

$$\begin{cases}
T_x^+\mathcal{M} \stackrel{\mathrm{def}}{=} \{ v \in T_x\mathcal{M} \mid \langle v, v \rangle_x > 0 \}, \\
T_x^0\mathcal{M} \stackrel{\mathrm{def}}{=} \{ v \in T_x\mathcal{M} \mid \langle v, v \rangle_x = 0 \}, \\
T_x^-\mathcal{M} \stackrel{\mathrm{def}}{=} \{ v \in T_x\mathcal{M} \mid \langle v, v \rangle_x < 0 \}.
\end{cases} \qquad (3.10)$$

The above three components may be characterized by defining $Q_x^\alpha \stackrel{\mathrm{def}}{=} \{ v \in T_x\mathcal{M} \mid \langle v, v \rangle_x = \alpha \}$. If the set $Q_x^\alpha$ with $\alpha \ne 0$ is nonempty, it is a submanifold of the space $T_x\mathcal{M}$ of dimension $p - 1$. The set $Q_x^0 = T_x^0\mathcal{M}$ is nonempty as $0 \in Q_x^0$ and is a set of null measure in $T_x\mathcal{M}$.

Example 1. The space $\mathrm{Sp}(2,\mathbb{R})$ is a 3-dimensional manifold (in fact, $\mathrm{Sp}(2,\mathbb{R}) = \mathrm{Sl}(2,\mathbb{R})$, the special linear group), therefore the following parametrization may be taken advantage of:

$$\begin{cases}
(x_{11}, x_{12}, x_{21}) \leftrightarrow \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & \frac{1 + x_{12} x_{21}}{x_{11}} \end{bmatrix} \in \mathrm{Sp}_{\mathrm{I}}(2,\mathbb{R}), & \text{for } x_{11} \ne 0, \\[2mm]
(0, x_{12}, x_{22}) \leftrightarrow \begin{bmatrix} 0 & x_{12} \\ -\frac{1}{x_{12}} & x_{22} \end{bmatrix} \in \mathrm{Sp}_{\mathrm{II}}(2,\mathbb{R}), & \text{for } x_{12} \ne 0,
\end{cases} \qquad (3.11)$$

where $(\mathrm{Sp}_{\mathrm{I}}(2,\mathbb{R}), \mathrm{Sp}_{\mathrm{II}}(2,\mathbb{R}))$ is a partition of the space $\mathrm{Sp}(2,\mathbb{R})$. For $X \in \mathrm{Sp}_{\mathrm{II}}(2,\mathbb{R})$, it holds that:

$$X^{-1} = \begin{bmatrix} 0 & x_{12} \\ -\frac{1}{x_{12}} & x_{22} \end{bmatrix}^{-1} = \begin{bmatrix} x_{22} & -x_{12} \\ \frac{1}{x_{12}} & 0 \end{bmatrix}, \qquad (3.12)$$

$$T_X \mathrm{Sp}_{\mathrm{II}}(2,\mathbb{R}) \ni V = \begin{bmatrix} v_{11} & v_{12} \\ v_{11}\frac{x_{22}}{x_{12}} + v_{12}\frac{1}{x_{12}^2} & v_{22} \end{bmatrix}. \qquad (3.13)$$

Hence, straightforward calculations lead to:

$$\langle V, V \rangle_X = 2 \left( \frac{v_{12}^2}{x_{12}^2} + \frac{v_{11} x_{22} v_{12}}{x_{12}} - v_{11} v_{22} \right). \qquad (3.14)$$

As a consequence, it holds that:

$$T_X^0 \mathrm{Sp}_{\mathrm{II}}(2,\mathbb{R}) = \left\{ \begin{bmatrix} v_{11} & v_{12} \\ v_{11}\frac{x_{22}}{x_{12}} + v_{12}\frac{1}{x_{12}^2} & v_{22} \end{bmatrix} \,\middle|\, v_{12}^2 + v_{11} v_{12} x_{22} x_{12} - v_{11} v_{22} x_{12}^2 = 0 \right\}. \qquad (3.15)$$
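To make the indefiniteness concrete, the following sketch (Python with NumPy; the specific numeric values are arbitrary choices for illustration) evaluates the Khvedelidze-Mladenov squared norm $\mathrm{tr}((X^{-1}V)^2)$ at a point of $\mathrm{Sp}(2,\mathbb{R})$, exhibiting positive, negative and null directions:

```python
import numpy as np

Q = np.array([[0.0, 1.0], [-1.0, 0.0]])
X = np.eye(2)                       # a point of Sp(2, R)

def km_sq_norm(S):
    """Squared KM norm tr((X^{-1} V)^2) of the tangent vector V = X Q S."""
    V = X @ Q @ S
    W = np.linalg.solve(X, V)       # X^{-1} V = Q S
    return np.trace(W @ W)

S_pos  = np.array([[0.0, 1.0], [1.0, 0.0]])   # gives  2 (positive direction)
S_neg  = np.array([[1.0, 0.0], [0.0, 1.0]])   # gives -2 (negative direction)
S_null = np.array([[1.0, 1.0], [1.0, 1.0]])   # gives  0 (null direction)

print(km_sq_norm(S_pos), km_sq_norm(S_neg), km_sq_norm(S_null))  # 2.0 -2.0 0.0
```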

When a pseudo-Riemannian manifold $\mathcal{M}$ of interest and a regular criterion function $f : \mathcal{M} \to \mathbb{R}$ are specified, an optimization rule is the 'gradient steepest descent', expressed by [20]:

$$\dot x = \begin{cases} -\nabla_x f & \text{if } \nabla_x f \in T_x^+\mathcal{M} \cup T_x^0\mathcal{M}, \\ \nabla_x f & \text{if } \nabla_x f \in T_x^-\mathcal{M}, \end{cases} \qquad (3.16)$$

whose solution induces the dynamics:

$$\dot f = \begin{cases} -\|\nabla_x f\|_x^2 & \text{if } \nabla_x f \in T_x^+\mathcal{M} \cup T_x^0\mathcal{M} \\ \|\nabla_x f\|_x^2 & \text{if } \nabla_x f \in T_x^-\mathcal{M} \end{cases} \Bigg\} \le 0. \qquad (3.17)$$

The differential equation (3.16) on the manifold $\mathcal{M}$ may be solved numerically by a geodesic-based stepping method. It generalizes the forward Euler method, which moves a point along a straight line in the direction of the Euclidean gradient. In the Euler method, the extension of each step is controlled by a parameter termed stepsize. In the same manner, a geodesic-based stepping method moves a point along a short geodesic arc in the direction of the pseudo-Riemannian gradient. The optimization method based on geodesic stepping reads:

$$x_{(\ell+1)} = \begin{cases} \exp_{x_{(\ell)}} \left( -t_{(\ell)} \nabla_{x_{(\ell)}} f \right) & \text{if } \nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^+\mathcal{M} \cup T_{x_{(\ell)}}^0\mathcal{M}, \\ \exp_{x_{(\ell)}} \left( t_{(\ell)} \nabla_{x_{(\ell)}} f \right) & \text{if } \nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^-\mathcal{M}, \end{cases} \qquad (3.18)$$

where $\ell \ge 0$ denotes the iteration step-counter, the symbol $\exp_x(v)$ denotes an exponential map on $\mathcal{M}$ at a point $x \in \mathcal{M}$ applied to the tangent vector $v \in T_x\mathcal{M}$, while the succession $t_{(\ell)} > 0$ denotes an optimization stepsize schedule and the succession $x_{(\ell)} \in \mathcal{M}$ denotes an approximation of the actual solution of the differential equation (3.16) at time $\sum_{i=1}^{\ell-1} t_{(i)}$. The initial guess to start the iteration is denoted by $x_{(0)} \in \mathcal{M}$.

The stepsize $t_{(\ell)}$ at any step may be defined by the counterpart of the 'line search' method of the geodesic-based gradient-steepest-descent optimization on Riemannian manifolds (3.5) adapted to the present geometrical setting. Such a method may be expressed as:

$$t_{(\ell)} \stackrel{\mathrm{def}}{=} \begin{cases} \arg\min_{t>0} \left\{ f(\exp_{x_{(\ell)}}(-t \nabla_{x_{(\ell)}} f)) \right\} & \text{if } \nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^+\mathcal{M}, \\ 1 & \text{if } \nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^0\mathcal{M}, \\ \arg\min_{t>0} \left\{ f(\exp_{x_{(\ell)}}(t \nabla_{x_{(\ell)}} f)) \right\} & \text{if } \nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^-\mathcal{M}, \end{cases} \qquad (3.19)$$

provided $x_{(\ell)}$ is not a critical point. The method (3.19) involves a nonlinear optimization sub-problem in the parameter $t$ that does not admit any closed-form solution, in general. Moreover, the definition (3.19) is given in a way that takes into account that the function $f(\exp_{x_{(\ell)}}(\mp t \nabla_{x_{(\ell)}} f))$ is locally decreasing for $\nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^\pm\mathcal{M}$ while it is locally flat for $\nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^0\mathcal{M}$. A consequence of such a phenomenon is that the line-search method (3.5) cannot be extended directly, otherwise the resulting stepsize would not be well-defined. The obstruction just encountered is due to the presence of the tangent-space part $T_x^0\mathcal{M}$, as moving along a direction $v \in T_x^0\mathcal{M}$ might increase the value of the criterion function for any value of the stepsize $t$, unless $t = 0$. The case $t = 0$ would clearly imply that the optimization method has reached an impasse point, namely, the method cannot proceed any further in the optimization of the criterion function $f$.

In the hypothesis that over an optimization trajectory it holds that $\nabla_x f \notin T_x^0\mathcal{M}$, a stepsize schedule $\hat t_{(\ell)}$ that approximates the result of the line-search method (3.19) may be chosen on the basis of the following argument. Define the function:

$$\tilde f_{(\ell)} \stackrel{\mathrm{def}}{=} f \circ \rho_{x_{(\ell)}, v_{(\ell)}} : [0,1] \to \mathbb{R}, \qquad (3.20)$$

where $v_{(\ell)} = -\nabla_{x_{(\ell)}} f$ if $\nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^+\mathcal{M}$ while $v_{(\ell)} = \nabla_{x_{(\ell)}} f$ if $\nabla_{x_{(\ell)}} f \in T_{x_{(\ell)}}^-\mathcal{M}$. The value $\tilde f_{(\ell)}(0) - \tilde f_{(\ell)}(t) \ge 0$ denotes the decrease of the criterion function $f$ subjected to a pseudo-geodesic step of stepsize $t$. If the value of the geodesic stepsize $t$ is small enough, such a decrease may be expanded in a Taylor series truncated to the second order:

$$\tilde f_{(\ell)}(0) - \tilde f_{(\ell)}(t) = -\tilde f_{1(\ell)}\, t - \frac{1}{2} \tilde f_{2(\ell)}\, t^2 + o(t^2), \qquad (3.21)$$

where, by definition of Taylor series expansion:

$$\tilde f_{1(\ell)} \stackrel{\mathrm{def}}{=} \left. \frac{\mathrm{d}\tilde f_{(\ell)}}{\mathrm{d}t} \right|_{t=0}, \qquad \tilde f_{2(\ell)} \stackrel{\mathrm{def}}{=} \left. \frac{\mathrm{d}^2 \tilde f_{(\ell)}}{\mathrm{d}t^2} \right|_{t=0}. \qquad (3.22)$$

Under such a second-order Taylor approximation, the stepsize value that maximizes the decrease rate is:

$$\hat t_{(\ell)} \stackrel{\mathrm{def}}{=} \arg\max_t \left\{ -\tilde f_{1(\ell)}\, t - \frac{1}{2} \tilde f_{2(\ell)}\, t^2 \right\} = -\tilde f_{1(\ell)}\, \tilde f_{2(\ell)}^{-1}. \qquad (3.23)$$

(The above choice of optimization stepsize is sound only if $\tilde f_{1(\ell)} \le 0$ and $\tilde f_{2(\ell)} > 0$, hence $\hat t_{(\ell)} \ge 0$.) The above procedure may increase significantly the computational complexity of the optimization algorithm, as the repeated computation of the coefficient $\tilde f_{2(\ell)}$ at every iteration may be computationally cumbersome. Therefore, as far as the computational complexity of the optimization algorithm is of concern, the above procedure should be limited to cases where the computation of second derivatives is simple.

A stopping criterion for the iteration (3.18) may be adapted from the criterion (3.9). Denote by $\{e_i^{(\ell)}\}$ a frame of the tangent space $T_{x_{(\ell)}}\mathcal{M}$ of the pseudo-Riemannian manifold $\mathcal{M}$. Given a precision value $\varepsilon > 0$, the stopping criterion (3.9) halts the iteration at step $\ell$ if:

$$\left| \langle \nabla_{x_{(\ell)}} f,\, e_i^{(\ell)} \rangle_{x_{(\ell)}} \right| < \varepsilon \quad \text{for every } i \in \{1, \ldots, p\}, \qquad (3.24)$$

where the initial frame $\{e_i^{(0)}\}$ is propagated via the rule (3.8). Another stopping criterion, given a precision value $\varepsilon > 0$ and under the hypothesis that over an optimization trajectory it holds that $\nabla_x f \notin T_x^0\mathcal{M}$, halts the iteration at step $\ell$ if:

$$-\tilde f_{1(\ell)} < \varepsilon. \qquad (3.25)$$

Such a stopping criterion is simpler to implement than (3.24) and will be actually used in the following sections.

Unlike the method (3.4)-(3.5), which converges under appropriate conditions, the method (3.18) is not necessarily convergent, in general, due to the presence of the tangent-space component $T_x^0\mathcal{M}$. However, numerical experiments suggest that optimization trajectories $\ell \mapsto x_{(\ell)}$ are such that $\nabla_{x_{(\ell)}} f \notin T_{x_{(\ell)}}^0\mathcal{M}$ and that, as a consequence, the optimization method (3.18) endowed with the stepsize selection method (3.19) enjoys the convergence properties expressed by the Theorem 3.1 and, provided that it generates a sequence converging to a critical point of the criterion function, its convergence is linear.

3.3. A distance function on the real symplectic group. In the present paper, the real symplectic group is treated as a pseudo-Riemannian manifold with a bi-invariant pseudo-Riemannian metric. Although it is possible to introduce a pseudo-distance function that is compatible with the pseudo-Riemannian metric (see, e.g., [33]), such a function is not positive definite and cannot be interpreted as a distance function. The research work [42] is concerned with the distance function on the manifold of real symplectic matrices defined by:

$$d^2(X, Y) \stackrel{\mathrm{def}}{=} \|X - Y\|_F^2 = \mathrm{tr}\left( (X - Y)^T (X - Y) \right), \quad X, Y \in \mathrm{Sp}(2n,\mathbb{R}). \qquad (3.26)$$

The paper [42] investigated the nature of the critical points of the function $d^2(X, \bar Y)$ with $\bar Y \in \mathrm{Sp}(2n,\mathbb{R})$ fixed, namely, of the points that satisfy the equation $\nabla_X d^2(X, \bar Y) = 0$. Generally, there exist multiple solutions for the critical points, which form a submanifold of the space $\mathrm{Sp}(2n,\mathbb{R})$. A topological study of the critical submanifold based on Hessian analysis reveals that it is formed by a number of separate submanifolds, each of which is characterized by a dimension and a local optimality status (local maximum, minimum or saddle). The result of such a non-trivial analysis is summarized by the following theorem.

Theorem 3.3 ([42]). The least-squares distance function $X \mapsto d^2(X, \bar Y)$ with fixed $\bar Y \in \mathrm{Sp}(2n,\mathbb{R})$ has a unique minimum $X = \bar Y$ in $\mathrm{Sp}(2n,\mathbb{R})$ and the rest of the critical submanifolds are all saddles.

On the manifold $\mathrm{Sp}(2n,\mathbb{R})$, the empirical mean $\mu \in \mathrm{Sp}(2n,\mathbb{R})$ of a data-set $\{X_k\} \subset \mathrm{Sp}(2n,\mathbb{R})$ of cardinality $N$ may be defined as in equation (3.1) with the distance function chosen as in the definition (3.26). Likewise, the empirical variance of the data-set may be defined as in equation (3.2) with $m = 2$. The criterion function $f : \mathrm{Sp}(2n,\mathbb{R}) \to \mathbb{R}$ to minimize is:

$$f(X) \stackrel{\mathrm{def}}{=} \frac{1}{2N} \sum_k \|X - X_k\|_F^2. \qquad (3.27)$$

For the sake of notational convenience, set:

def 1 2n×2n C = Xk R . (3.28) N ∈ k X 1 2 The criterion function (3.27) may be recast as f(X)= 2 X C F + constant. The analytical study of the critical landscape of suchk functio− kn is nontrivial and the present author is not aware of any result about the critical points of the function X C 2 over X Sp(2n, R). Hence, the problem of its minimization is treated k − kF ∈ numerically in the present paper. It holds that ∂X f = X C, hence, according to the Theorem 2.5, the pseudo-Riemannian gradient of the criterion− function (3.27) on the real symplectic group endowed with the Khvedelidze-Mladenov metric reads:

$$\nabla_X f = \frac{1}{2} Q(X - C) Q + \frac{1}{2} X (X - C)^T X. \qquad (3.29)$$

In order to compute the partition (3.10), it is worth noting that:

$$2\,\mathrm{tr}\left( (X^{-1} \nabla_X f)^2 \right) = \mathrm{tr}\left( (X - C)^T X (X - C)^T X + (X - C)^T Q (X - C) Q \right). \qquad (3.30)$$

The computation of the stepsize at each iteration of the algorithm (3.18) to optimize the criterion function (3.27) according to the rule (3.23) is based on the following results:

$$\frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}t} \left\| X \exp(t X^{-1} V) - Y \right\|_F^2 = \mathrm{tr}\left( \left( X \exp(t X^{-1} V) - Y \right)^T V \exp(t X^{-1} V) \right), \qquad (3.31)$$

$$\frac{1}{2} \frac{\mathrm{d}^2}{\mathrm{d}t^2} \left\| X \exp(t X^{-1} V) - Y \right\|_F^2 = \left\| V \exp(t X^{-1} V) \right\|_F^2 + \mathrm{tr}\left( \left( X \exp(t X^{-1} V) - Y \right)^T V X^{-1} V \exp(t X^{-1} V) \right). \qquad (3.32)$$

The coefficients $\tilde f_1$ and $\tilde f_2$ of the function $\tilde f = f \circ \rho_{X,V}$, with $f$ as in the definition (3.27), read:

$$\tilde f_1 = \mathrm{tr}\left( (X - C)^T V \right), \qquad (3.33)$$
$$\tilde f_2 = \mathrm{tr}\left( V^T V + (X - C)^T V X^{-1} V \right). \qquad (3.34)$$

The sign of the coefficients $\tilde f_1$ and $\tilde f_2$ may be evaluated as follows. In the case that $\nabla_X f \in T_X^+\mathrm{Sp}(2n,\mathbb{R}) \cup T_X^0\mathrm{Sp}(2n,\mathbb{R})$, the algorithm (3.18) takes $V = -\nabla_X f$, hence $\tilde f_1 = -\mathrm{tr}((X - C)^T \nabla_X f) = -\frac{1}{2}\mathrm{tr}((X - C)^T X (X - C)^T X + (X - C)^T Q (X - C) Q)$, therefore, by equation (3.30), it holds that $\tilde f_1 = -\mathrm{tr}((X^{-1} \nabla_X f)^2) \le 0$. Conversely, when $\nabla_X f \in T_X^-\mathrm{Sp}(2n,\mathbb{R})$, the algorithm (3.18) takes the search direction $V = \nabla_X f$, hence $\tilde f_1 = \mathrm{tr}((X^{-1} \nabla_X f)^2) \le 0$. The coefficient $\tilde f_2$ has a quadratic dependence on the matrix $V$, hence it has the same value for $V = \pm\nabla_X f$. The coefficient $\tilde f_2$ is computed as the sum of two terms, $\|V\|_F^2$ and $\mathrm{tr}((X - C)^T V X^{-1} V)$. The first term is non-negative for every $V \in T_X\mathrm{Sp}(2n,\mathbb{R})$, while the second term is indefinite. To evaluate the magnitude of the second term, note that:

$$\left| \mathrm{tr}((X - C)^T V X^{-1} V) \right| \le \|X - C\|_F \|V X^{-1} V\|_F \le \|X - C\|_F \|V\|_F^2 \|X^{-1}\|_F. \qquad (3.35)$$

By the identity $X^{-1} = -Q X^T Q$, it follows that $\|X^{-1}\|_F = \|X\|_F$. Moreover, by the triangle inequality it follows that $\|X\|_F \le \|X - C\|_F + \|C\|_F$. Hence:

$$\left| \mathrm{tr}((X - C)^T V X^{-1} V) \right| \le \left( \|X - C\|_F^2 + \|X - C\|_F \|C\|_F \right) \|V\|_F^2. \qquad (3.36)$$

Fixed $\|C\|_F$, for $\|X - C\|_F$ sufficiently small, the term $\|V\|_F^2$ dominates the sum in the expression (3.34), hence $\tilde f_2$ may be non-negative as well. The initial guess $X_{(0)}$ may be chosen or randomly generated in $\mathrm{Sp}(2n,\mathbb{R})$, provided it meets the condition

$$\mathrm{tr}\left( \nabla_{X_{(0)}}^T f\, \nabla_{X_{(0)}} f + (X_{(0)} - C)^T \nabla_{X_{(0)}} f\, X_{(0)}^{-1} \nabla_{X_{(0)}} f \right) > 0. \qquad (3.37)$$

The proposed procedure to optimize the criterion function (3.27) may be summarized by the pseudocode listed in the Algorithm 2, where it is assumed that the optimization sequence $\ell \mapsto X_{(\ell)}$ is such that $\nabla_{X_{(\ell)}} f \notin T_{X_{(\ell)}}^0\mathcal{M}$. In the Algorithm 2, the quantity $\ell$ denotes a step counter, the matrix $J_{(\ell)}$ represents the Euclidean gradient of the criterion function (3.27), the matrix $U_{(\ell)}$ represents its pseudo-Riemannian gradient and the sign of the scalar quantity $s_{(\ell)}$ determines whether the matrix $U_{(\ell)}$ belongs to the space $T_{X_{(\ell)}}^+\mathrm{Sp}(2n,\mathbb{R})$ or to the space $T_{X_{(\ell)}}^-\mathrm{Sp}(2n,\mathbb{R})$.

3.4. Computational issues. From a computational viewpoint, the evaluation of the geodesic function (2.51) is the most expensive operation required in the implementation of the optimization algorithm (3.18). By definition, the exponential of a Hamiltonian matrix $H \in \mathfrak{sp}(2n,\mathbb{R})$ is given by the infinite series (2.50). A fundamental result to ease such a computation is the Cayley-Hamilton theorem. The characteristic polynomial of $H \in \mathfrak{sp}(2n,\mathbb{R})$ is defined as $q(\lambda) \stackrel{\mathrm{def}}{=} \det(\lambda I_{2n} - H)$, where $\lambda \in \mathbb{C}$. According to the Cayley-Hamilton theorem applied to the present case, the matrix $H$ satisfies the equation $q(H) = 0$. A direct consequence of such a result is that the power $H^{2n}$ may be expressed as a linear combination of the powers $H^r$ with $r = 0, 1, \ldots, 2n-1$. Hence, the geodesic equation (2.51) may be expressed as the finite sum:

$$\rho_{X,V}(t) = X \left[ \rho_0(t, X^{-1}V)\, I_{2n} + \rho_1(t, X^{-1}V)\, X^{-1}V + \cdots + \rho_{2n-1}(t, X^{-1}V)\, (X^{-1}V)^{2n-1} \right], \qquad (3.38)$$

with $\rho_i : [0,1] \times \mathfrak{sp}(2n,\mathbb{R}) \to \mathbb{R}$. The advantages of the expression (3.38) of the matrix exponential are that the expression is exact and that its computational complexity is very reasonable, as the coefficients $\rho_i$ are scalar functions of the entries of the matrix $X^{-1}V$ and a careful implementation of the matrix-powers in the expression (3.38) costs as much as the computation of the highest-order term $(X^{-1}V)^{2n-1}$. The problem of the computation of the coefficients $\rho_i$ has been studied and expressions of such coefficients are available from the literature [35, 36]. The computation of the scalar coefficients $\rho_i$ is facilitated by the highly-structured form of the Hamiltonian matrices.

Algorithm 2 Pseudocode to implement averaging over the real symplectic group according to the optimization rule (3.18) endowed with the stepsize-selection rule (3.23) and the stopping criterion (3.25).
  Set $Q = \begin{bmatrix} 0_n & I_n \\ -I_n & 0_n \end{bmatrix}$
  Set $\ell = 0$
  Set $C = \frac{1}{N} \sum_k X_k$
  Set $X_{(0)}$ to an initial guess in $\mathrm{Sp}(2n,\mathbb{R})$
  Set $\varepsilon$ to the desired precision
  repeat
    Compute $J_{(\ell)} = X_{(\ell)} - C$
    Compute $U_{(\ell)} = \frac{1}{2} (Q J_{(\ell)} Q + X_{(\ell)} J_{(\ell)}^T X_{(\ell)})$
    Compute $s_{(\ell)} = \mathrm{tr}((X_{(\ell)}^{-1} U_{(\ell)})^2)$
    if $s_{(\ell)} > 0$ then
      Set $V_{(\ell)} = -U_{(\ell)}$
    else
      Set $V_{(\ell)} = U_{(\ell)}$
    end if
    Compute $\tilde f_{1(\ell)} = \mathrm{tr}(J_{(\ell)}^T V_{(\ell)})$
    Compute $\tilde f_{2(\ell)} = \mathrm{tr}(V_{(\ell)}^T V_{(\ell)}) + \mathrm{tr}(J_{(\ell)}^T V_{(\ell)} X_{(\ell)}^{-1} V_{(\ell)})$
    Set $\hat t_{(\ell)} = -\tilde f_{1(\ell)} / \tilde f_{2(\ell)}$
    Set $X_{(\ell+1)} = X_{(\ell)} \exp(\hat t_{(\ell)} X_{(\ell)}^{-1} V_{(\ell)})$
    Set $\ell = \ell + 1$
  until $-\tilde f_{1(\ell)} < \varepsilon$

Example 2. From the expression of the tangent space $T_X\mathrm{Sp}_{\mathrm{I}}(2,\mathbb{R})$ evaluated at the identity ($X = I_2$), it follows that the Lie algebra $\mathfrak{sp}(2,\mathbb{R})$ coincides with the set of real $2\times 2$ matrices with zero trace. Namely, any matrix $H \in \mathfrak{sp}(2,\mathbb{R})$ may be represented as:

$$H = \begin{bmatrix} h_{11} & h_{12} \\ h_{21} & -h_{11} \end{bmatrix}. \qquad (3.39)$$

It follows that $H^2 = (h_{11}^2 + h_{12}h_{21})\, I_2$, $H^3 = (h_{11}^2 + h_{12}h_{21})\, H$, $H^4 = (h_{11}^2 + h_{12}h_{21})^2\, I_2$, and so forth. Hence, by the definition of matrix exponential, it follows that:

$$\begin{aligned}
\exp(H) &= \left[ 1 + \frac{1}{2!}(h_{11}^2 + h_{12}h_{21}) + \frac{1}{4!}(h_{11}^2 + h_{12}h_{21})^2 + \ldots \right] I_2 \\
&\quad + \left[ 1 + \frac{1}{3!}(h_{11}^2 + h_{12}h_{21}) + \frac{1}{5!}(h_{11}^2 + h_{12}h_{21})^2 + \ldots \right] H, \qquad (3.40)
\end{aligned}$$

namely:

$$\exp(H) = \begin{cases} I_2 \cosh\sqrt{-\det(H)} + H\, \dfrac{\sinh\sqrt{-\det(H)}}{\sqrt{-\det(H)}} & \text{if } \det(H) < 0, \\[2mm] I_2 + H & \text{if } \det(H) = 0, \\[2mm] I_2 \cos\sqrt{\det(H)} + H\, \dfrac{\sin\sqrt{\det(H)}}{\sqrt{\det(H)}} & \text{if } \det(H) > 0. \end{cases} \qquad (3.41)$$
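As a quick sanity check, the closed form (3.41) may be compared against a general-purpose matrix exponential. A minimal sketch in Python (NumPy/SciPy; the test matrix is an arbitrary traceless choice) follows:

```python
import numpy as np
from scipy.linalg import expm

def exp_sp2(H):
    """Closed-form exponential (3.41) of a traceless 2x2 matrix H."""
    d = np.linalg.det(H)
    I2 = np.eye(2)
    if d < 0:
        s = np.sqrt(-d)
        return I2 * np.cosh(s) + H * np.sinh(s) / s
    elif d > 0:
        s = np.sqrt(d)
        return I2 * np.cos(s) + H * np.sin(s) / s
    return I2 + H                              # det(H) = 0

H = np.array([[0.3, 1.1], [-0.7, -0.3]])       # traceless, hence in sp(2, R)
print(np.allclose(exp_sp2(H), expm(H)))        # True
```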

Another exact method applies when the matrix to be exponentiated is diagonalizable. As a reference on the eigen-structure of a Hamiltonian matrix see, e.g., [40]. Assume that there exist matrices $U, D \in \mathbb{R}^{2n\times 2n}$ with $D$ diagonal such that $X^{-1}V = UDU^{-1}$. Then, it holds that:

$$\rho_{X,V}(t) = XU \exp(tD)\, U^{-1}. \qquad (3.42)$$

In addition, in numerical linear algebra, there is a large body of numerical recipes to compute the exponential of a matrix (see, for example, [24]). An interesting example is based on the following approximation:

$$\exp(H) = \exp\left( \frac{H}{2} \right) \exp\left( -\frac{H}{2} \right)^{-1} \approx \left( I_{2n} + \frac{H}{2} \right) \left( I_{2n} - \frac{H}{2} \right)^{-1}, \qquad (3.43)$$

for $H \in \mathfrak{sp}(2n,\mathbb{R})$ and $H - 2I_{2n} \in \mathrm{Gl}(2n,\mathbb{R})$. The map on the rightmost side is known as the Cayley map, which is defined as $\mathrm{cay}(X) \stackrel{\mathrm{def}}{=} (I_p + X)(I_p - X)^{-1}$ for $X \in \mathbb{R}^{p\times p}$ and $X - I_p$ nonsingular. Note that $\mathrm{cay}(\frac{1}{2}H)$ is an approximation of $\exp(H)$, yet the Cayley function maps a Hamiltonian matrix into a real symplectic matrix¹ [4].

The computation burden associated to the operations listed in the Algorithm 2, besides the exponential function, may be evaluated in terms of the size $2n$ of the involved matrices. A few preliminary observations are in order:
• The implementation of the Algorithm 2 requires the evaluation of the inverse $X^{-1}$ of a symplectic matrix $X \in \mathrm{Sp}(2n,\mathbb{R})$. By the definition of symplectic matrices, it is inferred that $X^{-1} = -QX^TQ$, which essentially corresponds to block swapping and sign-switching. Namely, upon the $n\times n$ 4-block partition $X = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$, it results that $X^{-1} = \begin{bmatrix} D^T & -B^T \\ -C^T & A^T \end{bmatrix}$. Hence, in the present evaluation of the computation burden, the inversion of a symplectic matrix is considered as costless.
• Some terms repeat across the lines of the pseudocode of the Algorithm 2, hence a careful implementation is assumed, which does not perform the same computation more than once.
• Except for the matrix exponentiation and one scalar division, the rest of the operations are matrix multiplications and additions. The latter are grouped into MAC (Multiply & ACcumulate) operations.
On the basis of the above observations, the computational complexity of the terms involved in the Algorithm 2 has been estimated as shown in the Table 3.1. The total computation burden of the algorithm has been estimated in $64n^3$ MACs, 1 scalar division and 1 matrix exponential evaluation.
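A compact rendition of the Algorithm 2 may look as follows. This is a sketch in Python (NumPy/SciPy), not the author's original MATLAB implementation; the helper names are chosen here for illustration, and the costless block-wise inverse of the first bullet above is used in place of a general matrix inversion:

```python
import numpy as np
from scipy.linalg import expm

def Qmat(n):
    """The skew-symmetric matrix Q_{2n} of definition (1.1)."""
    Z, I = np.zeros((n, n)), np.eye(n)
    return np.block([[Z, I], [-I, Z]])

def sp_inv(X):
    """Costless symplectic inverse X^{-1} = -Q X^T Q via block swapping."""
    n = X.shape[0] // 2
    A, B = X[:n, :n], X[:n, n:]
    C, D = X[n:, :n], X[n:, n:]
    return np.block([[D.T, -B.T], [-C.T, A.T]])

def symplectic_mean(Xs, eps=1e-8, max_iter=200):
    """Empirical mean of symplectic matrices, in the spirit of Algorithm 2."""
    n = Xs[0].shape[0] // 2
    Q = Qmat(n)
    C = sum(Xs) / len(Xs)                    # arithmetic mean, as in (3.28)
    X = np.eye(2 * n)                        # initial guess
    for _ in range(max_iter):
        J = X - C                            # Euclidean gradient
        U = 0.5 * (Q @ J @ Q + X @ J.T @ X)  # pseudo-Riemannian gradient (3.29)
        Xinv = sp_inv(X)
        s = np.trace((Xinv @ U) @ (Xinv @ U))
        V = -U if s > 0 else U               # descent direction, as in (3.18)
        f1 = np.trace(J.T @ V)               # coefficient (3.33)
        f2 = np.trace(V.T @ V) + np.trace(J.T @ V @ Xinv @ V)   # (3.34)
        if -f1 < eps:                        # stopping criterion (3.25)
            break
        t = -f1 / f2                         # stepsize (3.23)
        X = X @ expm(t * Xinv @ V)           # geodesic step (2.51)
    return X
```

The routine is exercised in the numerical-test sketch reported at the end of the §4.1 below.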

¹Indeed, the Cayley map may be applied in the numerical integration of ordinary differential equations on quadratic matrix group-manifolds [27] that generalize the notion of symplectic group.

Table 3.1
Estimate of the computational complexity of the terms involved in the Algorithm 2. (The symbol 'div' denotes scalar division while the symbol 'exp' denotes matrix exponential.)

Term                   MACs       div    exp
$U_{(\ell)}$           $16n^3$    –      –
$s_{(\ell)}$           $16n^3$    –      –
$\tilde f_{1(\ell)}$   $8n^3$     –      –
$\tilde f_{2(\ell)}$   $16n^3$    –      –
$\hat t_{(\ell)}$      –          1      –
$X_{(\ell+1)}$         $8n^3$     –      1

4. Numerical tests. The numerical behavior of the developed minimal-distance algorithm will be illustrated via different tests. A numerical test concerns the computation of the empirical mean out of a set of real symplectic matrices of size $2\times 2$. The space $\mathrm{Sp}(2,\mathbb{R})$ is 3-dimensional, therefore a graphical representation of the quantities of interest is allowed. A further numerical test concerns the computation of the empirical mean of a set of given real symplectic matrices in $\mathrm{Sp}(2n,\mathbb{R})$ with $n > 1$. The optimization algorithm was coded in MATLAB 6.0.

The numerical experiments presented in the following sections rely on the availability of a way to generate pseudo-random samples on the real symplectic group $\mathrm{Sp}(2n,\mathbb{R})$. Given a point $X \in \mathrm{Sp}(2n,\mathbb{R})$, that will be referred to in the following as the 'center of mass' or simply the center of the random distribution, it is possible to generate a random sample $Y \in \mathrm{Sp}(2n,\mathbb{R})$ in a neighborhood of the matrix $X$ by the rule:

$$Y = X \exp(X^{-1} V), \qquad (4.1)$$

where the direction $V \in T_X\mathrm{Sp}(2n,\mathbb{R})$ is randomly generated around $0 \in T_X\mathrm{Sp}(2n,\mathbb{R})$. That the points $Y$ generated by the exponential rule (4.1) distribute around the point $X$ is confirmed by the following result.

Lemma 4.1. Let $(X, V) \in T\mathrm{Sp}(2n,\mathbb{R})$ and $Y = X \exp(X^{-1}V)$. Then it holds that:

$$\|Y - X\|_F \le \|X\|_F \left[ \exp\left( \|X^{-1}V\|_F \right) - 1 \right]. \qquad (4.2)$$

Proof. As ‖Y − X‖_F = ‖X(exp(X^{-1}V) − I_{2n})‖_F, it follows that:

‖Y − X‖_F ≤ ‖X‖_F Σ_{k=1}^{∞} ‖(X^{-1}V)^k‖_F / k! ≤ ‖X‖_F Σ_{k=1}^{∞} ‖X^{-1}V‖_F^k / k!.    (4.3)

The series on the rightmost side converges to exp(‖X^{-1}V‖_F) − 1.
Recall that, given a point X ∈ Sp(2n, R) and a symmetric matrix S ∈ R^{2n×2n}, because of the structure of the tangent space T_X Sp(2n, R), the matrix V = XQS belongs to the space T_X Sp(2n, R). Hence, it is sufficient to generate a random symmetric matrix S ∈ R^{2n×2n} to get a random point X exp(QS) in Sp(2n, R). In turn, a symmetric 2n × 2n matrix may be generated by the rule S = A^T + A, with A ∈ R^{2n×2n} having zero-mean randomly-generated entries. All in one:

Y = X exp(Q(A^T + A)).    (4.4)

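The sampling rule (4.4) and the bound of Lemma 4.1 are straightforward to exercise numerically. A minimal sketch follows (Python/NumPy/SciPy as executable pseudocode; the function name and the spread parameter sigma are ours):

    import numpy as np
    from scipy.linalg import expm

    def random_symplectic_near(X, sigma=0.1, rng=None):
        # Rule (4.4): Y = X exp(Q(A^T + A)), with A having zero-mean random
        # entries; sigma controls the spread of Y around the center of mass X
        rng = np.random.default_rng() if rng is None else rng
        n = X.shape[0] // 2
        Q = np.block([[np.zeros((n, n)), np.eye(n)],
                      [-np.eye(n), np.zeros((n, n))]])
        A = sigma * rng.standard_normal((2 * n, 2 * n))
        H = Q @ (A.T + A)             # the Hamiltonian direction X^{-1}V = QS
        return X @ expm(H), H, Q

    X = np.eye(4)                     # a center of mass in Sp(4, R)
    Y, H, Q = random_symplectic_near(X)
    # Bound (4.2): ||Y - X||_F <= ||X||_F [exp(||X^{-1}V||_F) - 1]
    assert (np.linalg.norm(Y - X, 'fro')
            <= np.linalg.norm(X, 'fro') * (np.exp(np.linalg.norm(H, 'fro')) - 1.0))
    print(np.linalg.norm(Y.T @ Q @ Y - Q, 'fro'))   # ~ 1e-15: Y is symplectic

Applying the same rule around a randomly-selected center X yields clouds of target matrices X_k like those used in the tests below.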
Note that ‖Q(A^T + A)‖_F^2 = 2(tr(A^2) + ‖A‖_F^2); hence, by Lemma 4.1, ‖Y − X‖_F ≤ ‖X‖_F [exp(√2 (tr(A^2) + ‖A‖_F^2)^{1/2}) − 1] ≈ √2 ‖X‖_F (tr(A^2) + ‖A‖_F^2)^{1/2} for small A. Namely, the variance of the entries a_ij controls the spread of the random real-symplectic sample-matrices Y around the center of mass X.
4.1. Averaging over the manifold Sp(2, R). A close-up of the numerical behavior of the minimal-distance optimization algorithm (3.18) stems from the examination of the case of averaging over the manifold Sp(2n, R) for n = 1 and N = 100. The elements of the group Sp(2, R) may be rendered on a 3-dimensional drawing.

Figure 4.1 shows a result obtained with the iterative algorithm (3.18) and the 100 samples X_k generated randomly around a randomly-selected center of mass: the location of the target matrices X_k (circles), the location of the center of mass (cross), the trajectory of the optimization algorithm over the search space (solid-dotted line), and the location of the final point computed by the algorithm (diamond). In order to emphasize the behavior of the optimization method (3.18) and to evidence its numerical stability, in this experiment a constant stepsize schedule t(ℓ) = 1/2 was used and no stopping criterion was applied.

Fig. 4.1. Averaging over the real symplectic group Sp(2, R). Target matrices X_k are denoted by circles. The center of mass is denoted by a cross mark. The trajectory of the optimization algorithm is denoted by a solid-dotted line, where each dot corresponds to an approximate solution X(ℓ). The last point of the trajectory is denoted by a diamond mark and represents the computed empirical mean µ.

Figure 4.1 shows that the algorithm is convergent toward the center of mass. Figure 4.2 shows the value of the criterion function (1/(2N)) Σ_k d^2(X, X_k) during iteration, the value of the Frobenius norm of its pseudo-Riemannian gradient during iteration, and the distances d(X, X_k) before iteration (with initial guess chosen as X(0) = I_2) and after iteration.

Fig. 4.2. Averaging over the real symplectic group Sp(2, R). Criterion function (3.27) during iteration, value of the Frobenius norm of the pseudo-Riemannian gradient of the criterion function during iteration, distances d(X, X_k) before iteration (X = X(0)) and after iteration (X = µ).
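For reference, the criterion whose decay is plotted in Figure 4.2 is inexpensive to evaluate. A minimal sketch follows (assuming, as in the rest of the paper, that d denotes the Frobenius-norm-induced distance d(X, X_k) = ‖X − X_k‖_F; the function name is ours):

    import numpy as np

    def criterion(X, samples):
        # f(X) = (1/(2N)) sum_k d^2(X, X_k), with d the Frobenius-induced distance
        N = len(samples)
        return sum(np.linalg.norm(X - Xk, 'fro') ** 2 for Xk in samples) / (2.0 * N)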

4.2. Averaging over the manifold Sp(2n, R) with n > 1. Figure 4.3 shows a result obtained with the iterative algorithm (3.18) for n = 5 and N = 50, namely, the value of the criterion function (1/(2N)) Σ_k d^2(X, X_k) as well as the value of the Frobenius norm of its pseudo-Riemannian gradient during iteration. In the shown example, the squared pseudo-norm of the pseudo-Riemannian gradient of the criterion function to optimize may assume negative, zero, as well as positive values during optimization; therefore, the Frobenius norm was selected for display, in order not to miss the link with the zero-gradient condition at convergence. Figure 4.3 also shows the distances d(X, X_k) before iteration (with initial guess chosen as X(0) = I_{10}) and after iteration. Such numerical results were obtained by the adaptive optimization stepsize schedule explained in subsection 3.2 and by employing the stopping condition (3.25) with precision ε = 10^{-6}.

Figure 4.3 shows that the algorithm converges steadily toward the minimal-distance configuration (in fact, the distances from the found empirical mean are much smaller than the distances from the initial guess).

Fig. 4.3. Optimization over the symplectic group Sp(10, R). Criterion function (3.27) during iteration, value of the Frobenius norm of the pseudo-Riemannian gradient of the criterion function during iteration, distances d(X, X_k) before iteration (X = X(0)) and after iteration (X = µ).

Figure 4.4 shows the value of the 'symplecticity' index ‖X^T QX − Q‖_F during iteration as well as the value of the optimization stepsize schedule; the values of the coefficients f̃1 and f̃2 during iteration are displayed as well. The panels show that the matrix sequence X(ℓ) remains on the manifold Sp(10, R) at each iteration (up to machine precision).

Fig. 4.4. Optimization over the symplectic group Sp(10, R). Value of the symplecticity index ‖X(ℓ)^T Q X(ℓ) − Q‖_F, of the coefficient f̃1(ℓ) and of the coefficient f̃2(ℓ) during iteration, and value of the optimization stepsize schedule t̂(ℓ) (the index ℓ is denoted as 'Iteration').

Figure 4.5 shows the result of an empirical statistical analysis of the behavior of the optimization algorithm over 500 independent trials. In each trial, the algorithm starts from a randomly generated initial guess X(0). In particular, Figure 4.5 shows the distribution of the number of iterations that the optimization takes to reach the desired precision in each trial. The convergence speed varies with the initial guess, while the algorithm converges in every trial to the same value of the criterion function (namely, to (1/2)µ_2 ≈ 0.9827 in this test) and takes no more than 10 seconds to run on an average computation platform (4 GB RAM, 2.17 GHz clock).

Fig. 4.5. Optimization over the symplectic group Sp(10, R). Distribution of the number of iterations to converge in each trial, over a total of 500 independent trials.

5. Conclusion. Gradient-based optimization methods can be used to solve applied-mathematics problems and are relatively easy to implement on a computer. The present paper discusses a minimal-distance problem formulated over the manifold of real symplectic matrices. The present research stems from the following observations:
• Minimal-distance problems may be extended from flat spaces to curved, smooth, metrizable manifolds by the choice of an appropriate distance function.
• The resulting minimization problem on a manifold may be tackled via a gradient-descent algorithm tailored to the geometry of the parameter manifold through a geodesic-stepping method.


The real symplectic group is regarded as a pseudo-Riemannian manifold and a metric is chosen that affords the computation of geodesic arcs in closed form. Within such a geometric setting, the considered minimal-distance problem may be set up and the geodesic-based numerical stepping method may be implemented. The paper also presents a way of selecting an appropriate stepsize schedule, based on a local quadratic approximation of the criterion function restricted to a geodesic arc.
The present manuscript is related to the previous manuscript [18]. The present manuscript is entirely dedicated to optimization over the manifold of real symplectic matrices. The mathematical instruments needed within the present paper are expressed in §2 in a fully differential-geometric fashion, while paper [18] was based on different concepts (for example, the Lagrange multipliers method). The optimization problem tackled here differs from the problem considered in the previous paper [18] and is fully supported by the study conducted in [42]. §3 presents a discussion of the difficulties encountered when trying to extend the Riemannian optimization setting to a pseudo-Riemannian optimization setting and, more generally, to a general manifold, along with a discussion of the selection of a stepsize schedule as well as of stopping criteria, of convergence, and of the computational issues related to the discussed optimization method. In particular, §3 evidences how the computational complexity of the discussed optimization algorithm on the space Sp(2n, R) is of order (2n)^3 (without any specifically-optimized linear-algebra tool).
Although the present paper focuses on the study of minimal-distance problems on the manifold of real symplectic matrices, the main results of the paper and the relevant calculations were presented in a rather general fashion, so that they could be applied to other manifolds of interest to the readers.


Numerical tests have been performed with reference to the computation of the empirical mean of a collection of symplectic matrices. The obtained numerical results show that the pseudo-Riemannian-gradient-based algorithm, along with the pseudo-geodesic-based stepping method, is suitable for the numerical solution of the posed minimal-distance problem.
Acknowledgments. The author wishes to gratefully thank the associate editor for handling the review of the present manuscript and the anonymous referees for offering precious and stimulating comments and suggestions about the technical content of the paper and for substantially contributing to the quality of its final version.

REFERENCES

[1] S.-i. Amari, Natural gradient works efficiently in learning, Neural Computation, Vol. 10, pp. 251 – 276, 1998
[2] P.A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008
[3] M. Aoki, Two complementary representations of multiple time series in state-space innovation forms, Journal of Forecasting (Special Issue on "Times Series Advances in Economic Forecasting"), Vol. 13, No. 2, pp. 69 – 90, 2006
[4] V.I. Arnol'd and A.B. Givental', Symplectic geometry, in "Dynamical Systems IV: Symplectic Geometry & Its Applications" (V.I. Arnol'd and S.P. Novikov, Eds.), Encyclopaedia of Mathematical Sciences, Vol. 4, pp. 1 – 138, Springer-Verlag (Second Edition), 2001
[5] V. Arsigny, P. Fillard, X. Pennec and N. Ayache, Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM Journal on Matrix Analysis and Applications, Vol. 29, No. 1, pp. 328 – 347, 2006


[6] S.D. Bartlett, B. Sanders, S. Braunstein and K. Nemoto, Efficient classical simulation of continuous variable quantum information processes, Physical Review Letters, Vol. 88, 097904/1-4, 2002
[7] A.M. Bloch, P.E. Crouch, J.E. Marsden and A.K. Sayal, Optimal control and geodesics on quadratic matrix Lie groups, Foundations of Computational Mathematics, Vol. 8, pp. 469 – 500, 2008
[8] Y. Borissov, Minimal/nonminimal codewords in the second order binary Reed-Muller codes: revisited, Proceedings of the Eleventh International Workshop on Algebraic and Combinatorial Coding Theory (June 16-22, 2008, Pamporovo, Bulgaria), pp. 29 – 34, 2008
[9] E. Celledoni and S. Fiori, Neural learning by geometric integration of reduced 'rigid-body' equations, Journal of Computational and Applied Mathematics (JCAM), Vol. 172, No. 2, pp. 247 – 269, December 2004
[10] E. Celledoni and S. Fiori, Descent methods for optimization on homogeneous manifolds, Journal of Mathematics and Computers in Simulation (Special issue on "Structural Dynamical Systems: Computational Aspects", Guest Editors: N. Del Buono, L. Lopez and T. Politi), Vol. 79, No. 4, pp. 1298 – 1323, December 2008
[11] A. Edelman, T.A. Arias and S.T. Smith, The geometry of algorithms with orthogonality constraints, SIAM Journal on Matrix Analysis and Applications, Vol. 20, No. 2, pp. 303 – 353, 1998
[12] A.J. Dragt, F. Neri, G. Rangarajan, D.R. Douglas, L.M. Healy and R.D. Ryne, Lie algebraic treatment of linear and nonlinear beam dynamics, Annual Review of Nuclear and Particle Science, Vol. 38, pp. 455 – 496, December 1988
[13] L. Dieci and L. Lopez, Smooth singular value decomposition on the symplectic group and Lyapunov exponents approximation, CALCOLO, Vol. 43, No. 1, pp. 1 – 15, March 2006
[14] F.M. Dopico and C.R. Johnson, Parametrization of the matrix symplectic group and applications, SIAM Journal on Matrix Analysis and Applications, Vol. 31, pp. 650 – 673, 2009
[15] A.J. Dragt, F. Neri, G. Rangarajan, D.R. Douglas, L.M. Healy and R.D. Ryne, Lie algebraic treatment of linear and nonlinear beam dynamics, Annual Review of Nuclear and Particle Science, Vol. 38, pp. 455 – 496, 1988
[16] J. Fan and P. Nie, Quadratic problems over the Stiefel manifold, Operations Research Letters, Vol. 34, pp. 135 – 141, 2006
[17] H. Fassbender, Symplectic Methods for the Symplectic Eigenproblem, Kluwer Academic/Plenum Publishers, New York, 2000

[18] S. Fiori, Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds, IEEE Transactions on Neural Networks, Vol. 21, No. 5, pp. 841 – 852, May 2010
[19] S. Fiori and T. Tanaka, An algorithm to compute averages on matrix Lie groups, IEEE Transactions on Signal Processing, Vol. 57, No. 12, pp. 4734 – 4743, December 2009
[20] S. Fiori, Learning by natural gradient on noncompact matrix-type pseudo-Riemannian manifolds, IEEE Transactions on Neural Networks, Vol. 21, No. 5, pp. 841 – 852, May 2010
[21] D. Gabay, Minimizing a differentiable function over a differentiable manifold, Journal of Optimization Theory and Applications, Vol. 37, No. 2, pp. 177 – 219, 1982
[22] V. Guillemin and S. Sternberg, Symplectic Techniques in Physics, Cambridge University Press, 1984
[23] W.F. Harris, Astigmatic optical systems with separated and prismatic or noncoaxial elements: system matrices and system vectors, Optometry and Vision Science, Vol. 70, No. 6, pp. 545 – 551, 1994
[24] N.J. Higham, Functions of Matrices: Theory and Computation, SIAM, 2008
[25] M.P. Keating, A matrix formulation of spectacle magnification, Ophthalmic and Physiological Optics, Vol. 2, No. 2, pp. 145 – 158, 1982
[26] A. Khvedelidze and D. Mladenov, Generalized Calogero-Moser-Sutherland models from geodesic motion on GL+(n, R) group manifold, Physics Letters A, Vol. 299, Nos. 5-6, pp. 522 – 530, July 2002
[27] L. Lopez, Numerical methods for ordinary differential equations on matrix manifolds, Journal of Computational and Applied Mathematics, Vol. 210, pp. 232 – 243, 2007
[28] D.G. Luenberger, The gradient projection method along geodesics, Management Science, Vol. 18, No. 11, pp. 620 – 631, July 1972
[29] W.W. MacKay and A.U. Luccio, Symplectic interpolation, Proceedings of the Tenth European Particle Accelerator Conference (EPAC 2006, June 26 - 30, 2006, Edinburgh, Scotland), 2006
[30] C. Mehl, V. Mehrmann, A.C.M. Ran and L. Rodman, Perturbation analysis of Lagrangian invariant subspaces of symplectic matrices, Linear and Multilinear Algebra, Vol. 57, No. 2, pp. 141 – 184, January 2009
[31] J. Nocedal and S.J. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering (2nd Edition), July 2006
[32] B. O'Neill, Semi-Riemannian Geometry - With Applications to Relativity, Academic Press, 1983
[33] J. Peleska, A characterization for isometries and conformal mappings of pseudo-Riemannian manifolds, Aequationes Mathematicae, Vol. 27, No. 1, pp. 20 – 31, 1984
[34] C.L. Pripoae and G.T. Pripoae, General descent algorithm on affine manifolds, Proceedings of "The Conference of Geometry and its Applications in Technology" and "The Workshop on Global Analysis, Differential Geometry and Lie Algebras" (Editor: Gr. Tsagas, 1999), pp. 165 – 169, Balkan Society of Geometers, Geometry Balkan Press, 2001
[35] V. Ramakrishna and F. Costa, On the exponentials of some structured matrices, Journal of Physics A: Mathematical and General, Vol. 37, pp. 11613 – 11627, 2004
[36] F. Silva-Leite and P. Crouch, Closed forms for the exponential mapping on matrix Lie groups, based on Putzer's method, Journal of Mathematical Physics, Vol. 40, No. 7, pp. 3561 – 3568, July 1999
[37] S. Sommer, F. Lauze, S. Hauberg and M. Nielsen, Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations, Proceedings of the 11th European Conference on Computer Vision (ECCV'10, Crete, Greece, September 5-11, 2010), Part VI, pp. 43 – 56, 2010
[38] N.G. Stephen, Transfer matrix analysis of the elastostatics of one-dimensional repetitive structures, Proceedings of the Royal Society A, Vol. 462, No. 2072, pp. 2245 – 2270, August 2006
[39] M. Spivak, A Comprehensive Introduction to Differential Geometry, 2nd Edition, Berkeley, CA: Publish or Perish Press, 1979
[40] C.F. Van Loan, A symplectic method for approximating all the eigenvalues of a Hamiltonian matrix, Linear Algebra and Its Applications, Vol. 61, pp. 233 – 251, 1984
[41] R.-B. Wu, R. Chakrabarti and H. Rabitz, Optimal control theory for continuous-variable quantum gates, Physical Review A, Vol. 77, 052303, 2008
[42] R.-B. Wu, R. Chakrabarti and H. Rabitz, Critical landscape topology for optimization on the symplectic group, Journal of Optimization Theory and Applications, Vol. 145, No. 2, pp. 387 – 406, January 2010